date:20170730

Re: [PATCH v5 net-next] net: systemport: Support 64bit statistics

2017-07-30 Thread Florian Fainelli

On July 30, 2017 5:01:15 PM PDT, "Jianming.qiao"  wrote:
>When using the Broadcom Systemport device on a 32-bit platform, ifconfig
>can only report tx/rx statistics up to 4G. The counters wrap to 0 once
>the number of incoming or outgoing packets exceeds 4G, which takes only
>around 2 hours in a busy network environment (such as streaming).
>This makes it hard for network diagnostic tools to get reliable
>statistics, so this patch adds 64-bit statistics support for the
>Broadcom Systemport device on 32-bit platforms.
>
>Signed-off-by: Jianming.qiao 

Reviewed-by: Florian Fainelli 

-- 
Florian
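
For illustration only (not part of the patch): a minimal user-space sketch
of the wraparound the commit message describes, assuming the counter is
32 bits wide, as "unsigned long" is on a 32-bit platform. The function names
are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit statistics counter update. */
static uint32_t counter32_add(uint32_t counter, uint32_t n)
{
	return counter + n;	/* unsigned arithmetic wraps modulo 2^32 */
}

/* The same update on a 64-bit counter keeps counting past 2^32. */
static uint64_t counter64_add(uint64_t counter, uint32_t n)
{
	return counter + n;
}

/* Seconds until a 32-bit counter wraps at a steady rate per second. */
static uint64_t seconds_to_wrap32(uint64_t rate_per_sec)
{
	if (rate_per_sec == 0)
		return UINT64_MAX;	/* never wraps if nothing is counted */
	return (UINT64_C(1) << 32) / rate_per_sec;
}
```

At roughly 600,000 events per second the 32-bit counter wraps after about
7158 seconds, which is close to the "around 2 hours" estimate.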

Re: [PATCH net] ipv6: set fc_protocol with 0 when rtm_protocol is RTPROT_REDIRECT

2017-07-30 Thread Xin Long

On Mon, Jul 31, 2017 at 2:35 PM, David Ahern  wrote:
> On 7/30/17 6:51 AM, Xin Long wrote:
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 4d30c96..187580f 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -2912,9 +2912,11 @@ static int rtm_to_fib6_config(struct sk_buff *skb, 
>> struct nlmsghdr *nlh,
>>   cfg->fc_dst_len = rtm->rtm_dst_len;
>>   cfg->fc_src_len = rtm->rtm_src_len;
>>   cfg->fc_flags = RTF_UP;
>> - cfg->fc_protocol = rtm->rtm_protocol;
>>   cfg->fc_type = rtm->rtm_type;
>>
>> + if (rtm->rtm_protocol != RTPROT_REDIRECT)
>> + cfg->fc_protocol = rtm->rtm_protocol;
>> +
>>   if (rtm->rtm_type == RTN_UNREACHABLE ||
>>   rtm->rtm_type == RTN_BLACKHOLE ||
>>   rtm->rtm_type == RTN_PROHIBIT ||
Hi, David
>
> Did you look at removing this hunk from rt6_fill_node:
>
> if (rt->rt6i_flags & RTF_DYNAMIC)
> rtm->rtm_protocol = RTPROT_REDIRECT;
> else if (rt->rt6i_flags & RTF_ADDRCONF) {
> if (rt->rt6i_flags & (RTF_DEFAULT | RTF_ROUTEINFO))
> rtm->rtm_protocol = RTPROT_RA;
> else
> rtm->rtm_protocol = RTPROT_KERNEL;
> }
The issue seems to affect "ip -6 route flush all" as well, not only the cache
case, since the 'else if {}' branch also causes the rtm proto to differ from
the rt6 proto.

>
> And have rtm_protocol set properly on the route when it is installed?
The code has not kept the rtm proto consistent with the rt6 proto since day 1.
Any idea why it didn't use the rt6 proto in the kernel properly?

Thanks.

Re: [PATCH net] ipv6: set fc_protocol with 0 when rtm_protocol is RTPROT_REDIRECT

2017-07-30 Thread David Ahern

On 7/30/17 6:51 AM, Xin Long wrote:
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 4d30c96..187580f 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -2912,9 +2912,11 @@ static int rtm_to_fib6_config(struct sk_buff *skb, 
> struct nlmsghdr *nlh,
>   cfg->fc_dst_len = rtm->rtm_dst_len;
>   cfg->fc_src_len = rtm->rtm_src_len;
>   cfg->fc_flags = RTF_UP;
> - cfg->fc_protocol = rtm->rtm_protocol;
>   cfg->fc_type = rtm->rtm_type;
>  
> + if (rtm->rtm_protocol != RTPROT_REDIRECT)
> + cfg->fc_protocol = rtm->rtm_protocol;
> +
>   if (rtm->rtm_type == RTN_UNREACHABLE ||
>   rtm->rtm_type == RTN_BLACKHOLE ||
>   rtm->rtm_type == RTN_PROHIBIT ||

Did you look at removing this hunk from rt6_fill_node:

if (rt->rt6i_flags & RTF_DYNAMIC)
rtm->rtm_protocol = RTPROT_REDIRECT;
else if (rt->rt6i_flags & RTF_ADDRCONF) {
if (rt->rt6i_flags & (RTF_DEFAULT | RTF_ROUTEINFO))
rtm->rtm_protocol = RTPROT_RA;
else
rtm->rtm_protocol = RTPROT_KERNEL;
}

And have rtm_protocol set properly on the route when it is installed?
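
For illustration only: a user-space sketch of the mapping in the hunk above.
It assumes the protocol stored on the route is reported unless one of the
flag checks overrides it. The constant values are the ones I recall from the
uapi headers and should be treated as assumptions; reported_protocol() is a
hypothetical name.

```c
#include <stdint.h>

/* Flag and protocol values assumed from the uapi headers. */
#define RTF_DYNAMIC	0x00000010u
#define RTF_DEFAULT	0x00010000u
#define RTF_ADDRCONF	0x00040000u
#define RTF_ROUTEINFO	0x00800000u

#define RTPROT_REDIRECT	1
#define RTPROT_KERNEL	2
#define RTPROT_RA	9

/* Protocol reported in a dump for a route with these flags. */
static uint8_t reported_protocol(uint32_t flags, uint8_t stored_protocol)
{
	if (flags & RTF_DYNAMIC)
		return RTPROT_REDIRECT;
	if (flags & RTF_ADDRCONF) {
		if (flags & (RTF_DEFAULT | RTF_ROUTEINFO))
			return RTPROT_RA;
		return RTPROT_KERNEL;
	}
	return stored_protocol;	/* no override: report what was stored */
}
```

Whenever a flag check overrides the stored value, a request built from the
dumped protocol no longer matches the protocol stored on the route.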

Re: [PATCH net-next v12 0/4] net sched actions: improve dump performance

2017-07-30 Thread David Miller


Series applied, thanks.

Re: [PATCH net-next] net: fec: Allow reception of frames bigger than 1522 bytes

2017-07-30 Thread Andrew Lunn

On Sun, Jul 30, 2017 at 07:26:29PM -0700, David Miller wrote:
> From: Andrew Lunn 
> Date: Sun, 30 Jul 2017 19:36:05 +0200
> 
> > The FEC Receive Control Register has a 14 bit field indicating the
> > longest frame that my be received. It is being set to 1522. Frames
> > longer than this are discarded, but counted as being in error.
> > 
> > When using DSA, frames from the switch has an additional header,
> > either 4 or 8 bytes if a Marvell switch is used. Thus a full MTU frame
> > of 1522 bytes received by the switch on a port becomes 1530 bytes when
> > passed to the host via the FEC interface.
> > 
> > Change the maximum receive size to 2048 - 64, where 64 is the maximum
> > rx_alignment applied on the receive buffer for AVB capable FEC
> > cores. Use this value also for the maximum receive buffer size. The
> > driver is already allocating a receive SKB of 2048 bytes, so this
> > change should not have any significant effects.
> > 
> > Tested on imx51, imx6, vf610.
> > 
> > Signed-off-by: Andrew Lunn 
> 
> Applied with commit log message typo fixed.

Hi David

Thanks for fixing the typo.

   Andrew

Re: [PATCH net-next v12 1/4] net netlink: Add new type NLA_BITFIELD32

2017-07-30 Thread David Ahern

On 7/30/17 1:59 PM, Jamal Hadi Salim wrote:
> On D. Ahern: I don't think we are disagreeing anymore on the need to
> generalize the check. He is saying it should be a helper and I already
> had the validation data; either works. I don't see the gaping need
> to remove the validation data.

I never disagreed on general code; I have always disagreed on validating
values as part of the policy check.
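
For illustration only: a sketch of the kind of value check being discussed,
written as a standalone helper. It assumes a bitfield attribute carries a
value and a selector, and that the caller supplies the mask of allowed bits.
The struct and function names are hypothetical, not the kernel's.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical payload: selector says which bits of value are meaningful. */
struct bitfield32 {
	uint32_t value;
	uint32_t selector;
};

/* Returns 0 if the payload only uses allowed bits, -EINVAL otherwise. */
static int bitfield32_check(const struct bitfield32 *bf, uint32_t allowed)
{
	if (bf->selector & ~allowed)
		return -EINVAL;	/* selects a bit the user may not touch */
	if (bf->value & ~allowed)
		return -EINVAL;	/* sets a bit the user may not touch */
	if (bf->value & ~bf->selector)
		return -EINVAL;	/* sets a bit that was not selected */
	return 0;
}
```

Whether such a check runs inside the policy validation or is called by each
user as a helper is the design question in this thread; the sketch takes no
side.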

Re: [PATCH net-next] net: fec: Allow reception of frames bigger than 1522 bytes

2017-07-30 Thread David Miller

From: Andrew Lunn 
Date: Sun, 30 Jul 2017 19:36:05 +0200

> The FEC Receive Control Register has a 14 bit field indicating the
> longest frame that my be received. It is being set to 1522. Frames
> longer than this are discarded, but counted as being in error.
> 
> When using DSA, frames from the switch has an additional header,
> either 4 or 8 bytes if a Marvell switch is used. Thus a full MTU frame
> of 1522 bytes received by the switch on a port becomes 1530 bytes when
> passed to the host via the FEC interface.
> 
> Change the maximum receive size to 2048 - 64, where 64 is the maximum
> rx_alignment applied on the receive buffer for AVB capable FEC
> cores. Use this value also for the maximum receive buffer size. The
> driver is already allocating a receive SKB of 2048 bytes, so this
> change should not have any significant effects.
> 
> Tested on imx51, imx6, vf610.
> 
> Signed-off-by: Andrew Lunn 

Applied with commit log message typo fixed.
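
For illustration only: the arithmetic from the commit message as a small
user-space sketch. The helper names are hypothetical; the numbers (1522,
8, 2048, 64, 14 bits) come from the message above.

```c
#include <assert.h>

/* Round x down to a multiple of align; align must be a power of two. */
static unsigned int round_down_pow2(unsigned int x, unsigned int align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Length of a frame once a switch tag of tag_len bytes is added. */
static unsigned int tagged_frame_len(unsigned int frame_len,
				     unsigned int tag_len)
{
	return frame_len + tag_len;
}

/* Largest value a register field of the given bit width can hold. */
static unsigned int field_max(unsigned int bits)
{
	return (1u << bits) - 1;
}
```

A 1522 byte frame with an 8 byte tag is 1530 bytes, above the old 1522 limit.
The new limit, 2048 - 64 rounded down to a multiple of 64, is 1984, which
still fits in a 14 bit field (maximum 16383).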

Re: [PATCH v2 net-next 0/4] net: dsa: lan9303: Fix MDIO issues.

2017-07-30 Thread David Miller

From: Egil Hjelmeland 
Date: Sun, 30 Jul 2017 19:58:52 +0200

> This series fixes the MDIO interface for the lan9303 DSA driver.
> Bugs found after testing on actual HW.
> 
> This series is extracted from the first patch of my first large
> series. Significant changes from that version are:
>  - use mdiobus_write_nested, mdiobus_read_nested.
>  - EXPORT lan9303_indirect_phy_ops
> 
> Unfortunately I do not have access to i2c based system for
> testing.
> 
> Changes from first version:
>  - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL

Series applied, thanks.

RE: [PATCH net-next] net: fec: Issue error for missing but expected PHY

2017-07-30 Thread Andy Duan

From: Andrew Lunn  Sent: Monday, July 31, 2017 4:11 AM
>If the PHY is missing but expected, e.g. because of a typ0 in the dt file,
>it is not possible to open the interface. ip link returns:
>
>RTNETLINK answers: No such device
>
>It is not very obvious what the problem is. Add a netdev_err() in this case to
>make it easier to debug the issue.
>
>[   21.409385] fec 2188000.ethernet eth0: Unable to connect to phy
>RTNETLINK answers: No such device
>
>Signed-off-by: Andrew Lunn 

Acked-by: Fugang Duan 

>---
> drivers/net/ethernet/freescale/fec_main.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/ethernet/freescale/fec_main.c
>b/drivers/net/ethernet/freescale/fec_main.c
>index cc0c2a58c4de..c5995f07f821 100644
>--- a/drivers/net/ethernet/freescale/fec_main.c
>+++ b/drivers/net/ethernet/freescale/fec_main.c
>@@ -1907,8 +1907,10 @@ static int fec_enet_mii_probe(struct net_device *ndev)
>   phy_dev = of_phy_connect(ndev, fep->phy_node,
>&fec_enet_adjust_link, 0,
>fep->phy_interface);
>-  if (!phy_dev)
>+  if (!phy_dev) {
>+  netdev_err(ndev, "Unable to connect to phy\n");
>   return -ENODEV;
>+  }
>   } else {
>   /* check for attached phy */
>   for (phy_id = 0; (phy_id < PHY_MAX_ADDR); phy_id++) {
>--
>2.13.2

RE: [PATCH net-next] net: fec: Allow reception of frames bigger than 1522 bytes

2017-07-30 Thread Andy Duan

From: Andrew Lunn  Sent: Monday, July 31, 2017 1:36 AM
>The FEC Receive Control Register has a 14 bit field indicating the longest
>frame that my be received. It is being set to 1522. Frames longer than this are

My -> may

>discarded, but counted as being in error.
>
>When using DSA, frames from the switch has an additional header, either 4 or
>8 bytes if a Marvell switch is used. Thus a full MTU frame of 1522 bytes
>received by the switch on a port becomes 1530 bytes when passed to the host
>via the FEC interface.
>
>Change the maximum receive size to 2048 - 64, where 64 is the maximum
>rx_alignment applied on the receive buffer for AVB capable FEC cores. Use this
>value also for the maximum receive buffer size. The driver is already
>allocating a receive SKB of 2048 bytes, so this change should not have any
>significant effects.
>
>Tested on imx51, imx6, vf610.
>
>Signed-off-by: Andrew Lunn 
>---
> drivers/net/ethernet/freescale/fec_main.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/net/ethernet/freescale/fec_main.c
>b/drivers/net/ethernet/freescale/fec_main.c
>index a6e323f15637..47ee74a17a9f 100644
>--- a/drivers/net/ethernet/freescale/fec_main.c
>+++ b/drivers/net/ethernet/freescale/fec_main.c
>@@ -173,10 +173,12 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
> #endif /* CONFIG_M5272 */
>
> /* The FEC stores dest/src/type/vlan, data, and checksum for receive packets.
>+ *
>+ * 2048 byte skbufs are allocated. However, alignment requirements
>+ * varies between FEC variants. Worst case is 64, so round down by 64.
>  */
>-#define PKT_MAXBUF_SIZE   1522
>+#define PKT_MAXBUF_SIZE   (round_down(2048 - 64, 64))
> #define PKT_MINBUF_SIZE   64
>-#define PKT_MAXBLR_SIZE   1536
>
> /* FEC receive acceleration */
> #define FEC_RACC_IPDIS   (1 << 1)
>@@ -851,7 +853,7 @@ static void fec_enet_enable_ring(struct net_device *ndev)
>   for (i = 0; i < fep->num_rx_queues; i++) {
>   rxq = fep->rx_queue[i];
>   writel(rxq->bd.dma, fep->hwp + FEC_R_DES_START(i));
>-  writel(PKT_MAXBLR_SIZE, fep->hwp + FEC_R_BUFF_SIZE(i));
>+  writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_R_BUFF_SIZE(i));
>
>   /* enable DMA1/2 */
>   if (i)
>--
>2.13.2

[PATCH] drivers/net/wan/z85230.c: Use designated initializers

2017-07-30 Thread Kees Cook

In preparation for the randstruct gcc plugin performing randomization of
structures that are entirely function pointers, use designated initializers
so the compiler doesn't get angry.

Reported-by: kbuild test robot 
Signed-off-by: Kees Cook 
---
This is a prerequisite for the future randstruct fptr randomization. I'd
prefer to carry this in my gcc-plugin tree for v4.14 with an Ack from
someone on net-dev, or if possible, have it applied to v4.13 via net-dev.

Thanks!
---
 drivers/net/wan/z85230.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/net/wan/z85230.c b/drivers/net/wan/z85230.c
index 2f0bd6955f33..deea41e96f01 100644
--- a/drivers/net/wan/z85230.c
+++ b/drivers/net/wan/z85230.c
@@ -483,11 +483,10 @@ static void z8530_status(struct z8530_channel *chan)
write_zsctrl(chan, RES_H_IUS);
 }
 
-struct z8530_irqhandler z8530_sync =
-{
-   z8530_rx,
-   z8530_tx,
-   z8530_status
+struct z8530_irqhandler z8530_sync = {
+   .rx = z8530_rx,
+   .tx = z8530_tx,
+   .status = z8530_status,
 };
 
 EXPORT_SYMBOL(z8530_sync);
@@ -605,15 +604,15 @@ static void z8530_dma_status(struct z8530_channel *chan)
 }
 
 static struct z8530_irqhandler z8530_dma_sync = {
-   z8530_dma_rx,
-   z8530_dma_tx,
-   z8530_dma_status
+   .rx = z8530_dma_rx,
+   .tx = z8530_dma_tx,
+   .status = z8530_dma_status,
 };
 
 static struct z8530_irqhandler z8530_txdma_sync = {
-   z8530_rx,
-   z8530_dma_tx,
-   z8530_dma_status
+   .rx = z8530_rx,
+   .tx = z8530_dma_tx,
+   .status = z8530_dma_status,
 };
 
 /**
@@ -678,11 +677,10 @@ static void z8530_status_clear(struct z8530_channel *chan)
write_zsctrl(chan, RES_H_IUS);
 }
 
-struct z8530_irqhandler z8530_nop=
-{
-   z8530_rx_clear,
-   z8530_tx_clear,
-   z8530_status_clear
+struct z8530_irqhandler z8530_nop = {
+   .rx = z8530_rx_clear,
+   .tx = z8530_tx_clear,
+   .status = z8530_status_clear,
 };
 
 
-- 
2.7.4


-- 
Kees Cook
Pixel Security
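
For illustration only: a small sketch of why positional initializers are
fragile when the member order of a struct of function pointers can change,
and why designated initializers are not. All names here are hypothetical.

```c
/* A struct made up entirely of function pointers, like an ops table. */
struct irq_ops {
	int (*rx)(int);
	int (*tx)(int);
	int (*status)(int);
};

static int do_rx(int x) { return x + 1; }
static int do_tx(int x) { return x + 2; }
static int do_status(int x) { return x + 3; }

/* Positional: correct only while the members stay in rx, tx, status order. */
static const struct irq_ops positional_ops = { do_rx, do_tx, do_status };

/* Designated: each member is named, so the declared order does not matter. */
static const struct irq_ops designated_ops = {
	.status = do_status,
	.rx = do_rx,
	.tx = do_tx,
};
```

If the members of struct irq_ops were reordered, positional_ops would
silently bind handlers to the wrong slots, while designated_ops would keep
working unchanged.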

[PATCH v5 net-next] net: systemport: Support 64bit statistics

2017-07-30 Thread Jianming.qiao

When using the Broadcom Systemport device on a 32-bit platform, ifconfig
can only report tx/rx statistics up to 4G. The counters wrap to 0 once
the number of incoming or outgoing packets exceeds 4G, which takes only
around 2 hours in a busy network environment (such as streaming).
This makes it hard for network diagnostic tools to get reliable
statistics, so this patch adds 64-bit statistics support for the
Broadcom Systemport device on 32-bit platforms.

Signed-off-by: Jianming.qiao 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 68 --
 drivers/net/ethernet/broadcom/bcmsysport.h |  9 +++-
 2 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index 5333601..c0df4f9 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -662,6 +662,7 @@ static int bcm_sysport_alloc_rx_bufs(struct bcm_sysport_priv *priv)
 static unsigned int bcm_sysport_desc_rx(struct bcm_sysport_priv *priv,
unsigned int budget)
 {
+   struct bcm_sysport_stats *stats64 = &priv->stats64;
struct net_device *ndev = priv->netdev;
unsigned int processed = 0, to_process;
struct bcm_sysport_cb *cb;
@@ -765,6 +766,10 @@ static unsigned int bcm_sysport_desc_rx(struct bcm_sysport_priv *priv,
skb->protocol = eth_type_trans(skb, ndev);
ndev->stats.rx_packets++;
ndev->stats.rx_bytes += len;
+   u64_stats_update_begin(&priv->syncp);
+   stats64->rx_packets++;
+   stats64->rx_bytes += len;
+   u64_stats_update_end(&priv->syncp);
 
napi_gro_receive(&priv->napi, skb);
 next:
@@ -787,17 +792,15 @@ static void bcm_sysport_tx_reclaim_one(struct bcm_sysport_tx_ring *ring,
struct device *kdev = &priv->pdev->dev;
 
if (cb->skb) {
-   ring->bytes += cb->skb->len;
*bytes_compl += cb->skb->len;
dma_unmap_single(kdev, dma_unmap_addr(cb, dma_addr),
 dma_unmap_len(cb, dma_len),
 DMA_TO_DEVICE);
-   ring->packets++;
(*pkts_compl)++;
bcm_sysport_free_cb(cb);
/* SKB fragment */
} else if (dma_unmap_addr(cb, dma_addr)) {
-   ring->bytes += dma_unmap_len(cb, dma_len);
+   *bytes_compl += dma_unmap_len(cb, dma_len);
dma_unmap_page(kdev, dma_unmap_addr(cb, dma_addr),
   dma_unmap_len(cb, dma_len), DMA_TO_DEVICE);
dma_unmap_addr_set(cb, dma_addr, 0);
@@ -808,9 +811,10 @@ static void bcm_sysport_tx_reclaim_one(struct bcm_sysport_tx_ring *ring,
 static unsigned int __bcm_sysport_tx_reclaim(struct bcm_sysport_priv *priv,
 struct bcm_sysport_tx_ring *ring)
 {
-   struct net_device *ndev = priv->netdev;
unsigned int c_index, last_c_index, last_tx_cn, num_tx_cbs;
+   struct bcm_sysport_stats *stats64 = &priv->stats64;
unsigned int pkts_compl = 0, bytes_compl = 0;
+   struct net_device *ndev = priv->netdev;
struct bcm_sysport_cb *cb;
u32 hw_ind;
 
@@ -849,6 +853,11 @@ static unsigned int __bcm_sysport_tx_reclaim(struct bcm_sysport_priv *priv,
last_c_index &= (num_tx_cbs - 1);
}
 
+   u64_stats_update_begin(&priv->syncp);
+   ring->packets += pkts_compl;
+   ring->bytes += bytes_compl;
+   u64_stats_update_end(&priv->syncp);
+
ring->c_index = c_index;
 
netif_dbg(priv, tx_done, ndev,
@@ -1671,24 +1680,6 @@ static int bcm_sysport_change_mac(struct net_device 
*dev, void *p)
return 0;
 }
 
-static struct net_device_stats *bcm_sysport_get_nstats(struct net_device *dev)
-{
-   struct bcm_sysport_priv *priv = netdev_priv(dev);
-   unsigned long tx_bytes = 0, tx_packets = 0;
-   struct bcm_sysport_tx_ring *ring;
-   unsigned int q;
-
-   for (q = 0; q < dev->num_tx_queues; q++) {
-   ring = &priv->tx_rings[q];
-   tx_bytes += ring->bytes;
-   tx_packets += ring->packets;
-   }
-
-   dev->stats.tx_bytes = tx_bytes;
-   dev->stats.tx_packets = tx_packets;
-   return &dev->stats;
-}
-
 static void bcm_sysport_netif_start(struct net_device *dev)
 {
struct bcm_sysport_priv *priv = netdev_priv(dev);
@@ -1923,6 +1914,37 @@ static int bcm_sysport_stop(struct net_device *dev)
return 0;
 }
 
+static void bcm_sysport_get_stats64(struct net_device *dev,
+   struct rtnl_link_stats64 *stats)
+{
+   struct bcm_sysport_priv *priv = netdev_priv(dev);
+   struct bcm_sysport_stats *stats64 = &priv->stats64;
+   struct bcm_sysport_tx_ring *ring;
+   u64 tx_packets = 0, tx_bytes = 0;
+   unsigned int start;
+   unsigned int q;
+
+

Re: Performance regression with virtio_net

2017-07-30 Thread Euan Kemp

I've also observed this performance regression.

The minimal fix for me is removing the two
> if (unlikely(len > (unsigned long)ctx))
checks added in 680557c.

After digging a little more, the reason that check can fail appears to
be that add_recvbuf_mergeable sometimes includes a hole at the end,
which is included in len but not ctx.

I'd send a patch removing those conditions, but I'm not certain
whether "truesize" in receive_mergeable should also be changed back to
be the max of len/ctx, or should remain as-is.

- Euan
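
For illustration only: a toy model of the mismatch described above, not the
driver's code. It assumes the posted length is extended by the hole at the
end of the page fragment while the recorded context keeps the unextended
length. The struct, the function names, and the exact hole condition are
hypothetical simplifications.

```c
/* Hypothetical record of one posted receive buffer. */
struct posted_buf {
	unsigned int len;	/* length handed to the device */
	unsigned int ctx;	/* length remembered as context */
};

/* Post a buffer of buf_len bytes from a fragment with frag_remaining
 * bytes left; the caller guarantees frag_remaining >= buf_len. */
static struct posted_buf post_buf(unsigned int buf_len,
				  unsigned int frag_remaining)
{
	struct posted_buf b = { .len = buf_len, .ctx = buf_len };
	unsigned int hole = frag_remaining - buf_len;

	/* Too little left for another buffer: give the tail away as well. */
	if (hole < buf_len)
		b.len += hole;
	return b;
}

/* The check in question: is the received length larger than the context? */
static int check_rejects(const struct posted_buf *b, unsigned int received_len)
{
	return received_len > b->ctx;
}
```

With buf_len 1536 and 2000 bytes remaining, the device may legitimately fill
up to 2000 bytes, yet any receive above 1536 bytes trips the check.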

[PATCH] hyperv: netvsc: Neaten netvsc_send_pkt by using a temporary

2017-07-30 Thread Joe Perches

Repeated dereference of nvmsg.msg.v1_msg.send_rndis_pkt can be
shortened by using a temporary.  Do so.

No change in object code.

Signed-off-by: Joe Perches 
---
 drivers/net/hyperv/netvsc.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 06f39a99da7c..fede1546cdc6 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -743,6 +743,7 @@ static inline int netvsc_send_pkt(
struct sk_buff *skb)
 {
struct nvsp_message nvmsg;
+   struct nvsp_1_message_send_rndis_packet *rpkt;
struct netvsc_channel *nvchan
= &net_device->chan_table[packet->q_idx];
struct vmbus_channel *out_channel = nvchan->channel;
@@ -754,21 +755,17 @@ static inline int netvsc_send_pkt(
u32 ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound);
 
nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
-   if (skb != NULL) {
-   /* 0 is RMC_DATA; */
-   nvmsg.msg.v1_msg.send_rndis_pkt.channel_type = 0;
-   } else {
-   /* 1 is RMC_CONTROL; */
-   nvmsg.msg.v1_msg.send_rndis_pkt.channel_type = 1;
-   }
+   rpkt = &nvmsg.msg.v1_msg.send_rndis_pkt;
+   if (skb != NULL)
+   rpkt->channel_type = 0; /* 0 is RMC_DATA */
+   else
+   rpkt->channel_type = 1; /* 1 is RMC_CONTROL */
 
-   nvmsg.msg.v1_msg.send_rndis_pkt.send_buf_section_index =
-   packet->send_buf_index;
+   rpkt->send_buf_section_index = packet->send_buf_index;
if (packet->send_buf_index == NETVSC_INVALID_INDEX)
-   nvmsg.msg.v1_msg.send_rndis_pkt.send_buf_section_size = 0;
+   rpkt->send_buf_section_size = 0;
else
-   nvmsg.msg.v1_msg.send_rndis_pkt.send_buf_section_size =
-   packet->total_data_buflen;
+   rpkt->send_buf_section_size = packet->total_data_buflen;
 
req_id = (ulong)skb;
 
@@ -776,8 +773,7 @@ static inline int netvsc_send_pkt(
return -ENODEV;
 
if (packet->page_buf_cnt) {
-   pgbuf = packet->cp_partial ? (*pb) +
-   packet->rmsg_pgcnt : (*pb);
+   pgbuf = packet->cp_partial ? *pb + packet->rmsg_pgcnt : *pb;
ret = vmbus_sendpacket_pagebuffer_ctl(out_channel,
  pgbuf,
  packet->page_buf_cnt,
-- 
2.10.0.rc2.1.g053435c

Re: Kernel TLS in 4.13-rc1

2017-07-30 Thread David Oberhollenzer

On 07/24/2017 11:10 PM, Dave Watson wrote:
> On 07/23/17 09:39 PM, David Oberhollenzer wrote:
>> After fixing the benchmark/test tool that the patch description
>> linked to (https://github.com/Mellanox/tls-af_ktls_tool) to make
>> sure that the server and client actually *agree* on AES-128-GCM,
>> I simply ran the client program with the --verify-sendpage option.
>>
>> The handshake and setting up of the sockets appears to work but
>> the program complains that the sent and received page contents
>> do not match (sent is 0x12 repeated all over and received looks
>> pretty random).
> 
> The --verify functions depend on the RX path as well, which has not
> been merged.  Any programs / tests using OpenSSL + patches should work
> fine.
> 
> If you want to use the tool, something like this should work, so that
> the receive path uses gnutls:
> 
> ./server --no-echo
> 
> ./client --server-port 12345 --sendfile some_file --server-host localhost
> 

Thanks! This appears to work as expected (output from the server matches the
input from the client and the pcap dumps look fine).

From briefly browsing through the code of the test tool I was initially under
the impression that it would generate an error message and terminate if an
attempt was made at configuring ktls for the RX path.

Anyway, I already read in the patch description that RX wasn't included yet,
still requires a few cleanups and would follow at some point.

Is there currently a "not-so-clean" version of the RX patches floating around
somewhere that we could take a look at?


Thanks,

David

[PATCH net-next 5/7] net: phy: marvell: Refactor m88e1121 RGMII delay configuration

2017-07-30 Thread Andrew Lunn

Turns out that MII_M1116R_CONTROL_REG_MAC is the same as
MII_88E1121_PHY_MSCR_REG. Refactor the code to set the RGMII delays
into a shared helper.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 62 +--
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 275647ebaa81..408442bdef0a 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -80,7 +80,7 @@
 #define MII_88E1121_PHY_MSCR_REG   21
 #define MII_88E1121_PHY_MSCR_RX_DELAY  BIT(5)
 #define MII_88E1121_PHY_MSCR_TX_DELAY  BIT(4)
-#define MII_88E1121_PHY_MSCR_DELAY_MASK    (~(0x3 << 4))
+#define MII_88E1121_PHY_MSCR_DELAY_MASK    (~(BIT(5) | BIT(4)))
 
 #define MII_88E1121_MISC_TEST  0x1a
 #define MII_88E1510_MISC_TEST_TEMP_THRESHOLD_MASK  0x1f00
@@ -127,8 +127,6 @@
 #define MII_M1011_PHY_STATUS_RESOLVED  0x0800
 #define MII_M1011_PHY_STATUS_LINK  0x0400
 
-#define MII_M1116R_CONTROL_REG_MAC 21
-
 #define MII_88E3016_PHY_SPEC_CTRL  0x10
 #define MII_88E3016_DISABLE_SCRAMBLER  0x0200
 #define MII_88E3016_AUTO_MDIX_CROSSOVER    0x0030
@@ -442,7 +440,7 @@ static int marvell_of_reg_init(struct phy_device *phydev)
 }
 #endif /* CONFIG_OF_MDIO */
 
-static int m88e1121_config_aneg(struct phy_device *phydev)
+static int m88e1121_config_aneg_rgmii_delays(struct phy_device *phydev)
 {
int err, oldpage, mscr;
 
@@ -450,25 +448,40 @@ static int m88e1121_config_aneg(struct phy_device *phydev)
if (oldpage < 0)
return oldpage;
 
-   if (phy_interface_is_rgmii(phydev)) {
-   mscr = phy_read(phydev, MII_88E1121_PHY_MSCR_REG) &
-   MII_88E1121_PHY_MSCR_DELAY_MASK;
-
-   if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
-   mscr |= (MII_88E1121_PHY_MSCR_RX_DELAY |
-MII_88E1121_PHY_MSCR_TX_DELAY);
-   else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID)
-   mscr |= MII_88E1121_PHY_MSCR_RX_DELAY;
-   else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
-   mscr |= MII_88E1121_PHY_MSCR_TX_DELAY;
-
-   err = phy_write(phydev, MII_88E1121_PHY_MSCR_REG, mscr);
-   if (err < 0)
-   return err;
+   mscr = phy_read(phydev, MII_88E1121_PHY_MSCR_REG);
+   if (mscr < 0) {
+   err = mscr;
+   goto out;
}
 
+   mscr &= MII_88E1121_PHY_MSCR_DELAY_MASK;
+
+   if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
+   mscr |= (MII_88E1121_PHY_MSCR_RX_DELAY |
+MII_88E1121_PHY_MSCR_TX_DELAY);
+   else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID)
+   mscr |= MII_88E1121_PHY_MSCR_RX_DELAY;
+   else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
+   mscr |= MII_88E1121_PHY_MSCR_TX_DELAY;
+
+   err = phy_write(phydev, MII_88E1121_PHY_MSCR_REG, mscr);
+
+out:
marvell_set_page(phydev, oldpage);
 
+   return err;
+}
+
+static int m88e1121_config_aneg(struct phy_device *phydev)
+{
+   int err = 0;
+
+   if (phy_interface_is_rgmii(phydev)) {
+   err = m88e1121_config_aneg_rgmii_delays(phydev);
+   if (err)
+   return err;
+   }
+
err = genphy_soft_reset(phydev);
if (err < 0)
return err;
@@ -650,16 +663,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
if (err < 0)
return err;
 
-   err = marvell_set_page(phydev, MII_MARVELL_MSCR_PAGE);
-   if (err < 0)
-   return err;
-   temp = phy_read(phydev, MII_M1116R_CONTROL_REG_MAC);
-   temp |= (1 << 5);
-   temp |= (1 << 4);
-   err = phy_write(phydev, MII_M1116R_CONTROL_REG_MAC, temp);
-   if (err < 0)
-   return err;
-   err = marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
+   err = m88e1121_config_aneg_rgmii_delays(phydev);
if (err < 0)
return err;
 
-- 
2.13.2
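
For illustration only: the delay mask has to clear bits 4 and 5 before the
new delay bits are ORed in. The sketch below contrasts building such a mask
with a bitwise OR and with a logical OR. The names are hypothetical
stand-ins for the register definitions in the patch.

```c
#include <stdint.h>

#define BIT(n)		(1u << (n))
#define RX_DELAY	BIT(5)
#define TX_DELAY	BIT(4)

/* Build a "clear these bits" mask from two bit values. */
static uint32_t delay_mask(uint32_t a, uint32_t b, int use_logical_or)
{
	if (use_logical_or)
		return ~(uint32_t)(a || b);	/* a || b is 1, so this is ~1 */
	return ~(a | b);			/* clears exactly bits a and b */
}

/* Read-modify-write of the delay bits in a 16-bit register value. */
static uint16_t set_delays(uint16_t mscr, uint32_t mask, int rx, int tx)
{
	mscr &= (uint16_t)mask;
	if (rx)
		mscr |= RX_DELAY;
	if (tx)
		mscr |= TX_DELAY;
	return mscr;
}
```

With the logical OR the mask only clears bit 0, so a TX delay bit that was
already set stays set even when only the RX delay is requested.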

[PATCH net-next 7/7] net: phy: marvell: Refactor setting downshift into a helper

2017-07-30 Thread Andrew Lunn

The 1116r has code to set downshift. Refactor this into a helper, so
that other Marvell PHYs can use it in future.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 34fd15b904e7..361fe9927ef2 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -58,6 +58,7 @@
 #define MII_M1011_PHY_SCR  0x10
 #define MII_M1011_PHY_SCR_DOWNSHIFT_EN BIT(11)
 #define MII_M1011_PHY_SCR_DOWNSHIFT_SHIFT  12
+#define MII_M1011_PHY_SRC_DOWNSHIFT_MASK   0x7800
 #define MII_M1011_PHY_SCR_MDI  (0x0 << 5)
 #define MII_M1011_PHY_SCR_MDI_X        (0x1 << 5)
 #define MII_M1011_PHY_SCR_AUTO_CROSS   (0x3 << 5)
@@ -263,6 +264,23 @@ static int marvell_set_polarity(struct phy_device *phydev, 
int polarity)
return 0;
 }
 
+static int marvell_set_downshift(struct phy_device *phydev, bool enable,
+u8 retries)
+{
+   int reg;
+
+   reg = phy_read(phydev, MII_M1011_PHY_SCR);
+   if (reg < 0)
+   return reg;
+
+   reg &= ~MII_M1011_PHY_SRC_DOWNSHIFT_MASK;
+   reg |= ((retries - 1) << MII_M1011_PHY_SCR_DOWNSHIFT_SHIFT);
+   if (enable)
+   reg |= MII_M1011_PHY_SCR_DOWNSHIFT_EN;
+
+   return phy_write(phydev, MII_M1011_PHY_SCR, reg);
+}
+
 static int marvell_config_aneg(struct phy_device *phydev)
 {
int err;
@@ -643,7 +661,6 @@ static int marvell_config_init(struct phy_device *phydev)
 
 static int m88e1116r_config_init(struct phy_device *phydev)
 {
-   int temp;
int err;
 
err = genphy_soft_reset(phydev);
@@ -660,10 +677,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
if (err < 0)
return err;
 
-   temp = phy_read(phydev, MII_M1011_PHY_SCR);
-   temp |= (7 << MII_M1011_PHY_SCR_DOWNSHIFT_SHIFT);
-   temp |= MII_M1011_PHY_SCR_DOWNSHIFT_EN;
-   err = phy_write(phydev, MII_M1011_PHY_SCR, temp);
+   err = marvell_set_downshift(phydev, true, 8);
if (err < 0)
return err;
 
-- 
2.13.2
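
For illustration only: a minimal sketch of the field update such a helper
performs, assuming the downshift field is first cleared with the complement
of its mask so that the other register bits survive. The names are
hypothetical stand-ins; the bit positions follow the defines in the patch.

```c
#include <stdint.h>

#define BIT(n)			(1u << (n))
#define DOWNSHIFT_EN		BIT(11)
#define DOWNSHIFT_SHIFT		12
#define DOWNSHIFT_MASK		0x7800u	/* bits 14:11 */

/* Update the downshift field of a register value; retries is 1..8. */
static uint16_t downshift_update(uint16_t reg, int enable, uint8_t retries)
{
	reg &= (uint16_t)~DOWNSHIFT_MASK;	/* keep every unrelated bit */
	reg |= (uint16_t)((retries - 1u) << DOWNSHIFT_SHIFT);
	if (enable)
		reg |= DOWNSHIFT_EN;
	return reg;
}
```

Eight retries with downshift enabled gives (7 << 12) | BIT(11) = 0x7800,
matching what the open-coded version wrote, and crossover bits such as 0x0060
are left untouched.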

[PATCH net-next 6/7] net: phy: marvell: Use the set_polarity helper

2017-07-30 Thread Andrew Lunn

Some of the init functions unilaterally enable auto cross over
without using the helper. Make use of the helper, and respect the
phydev MDI configuration.

Clean up the #define used while setting polarity, and the other
functions of the bits in the register.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 408442bdef0a..34fd15b904e7 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -55,10 +55,12 @@
 #define MII_M1011_IMASK_INIT   0x6400
 #define MII_M1011_IMASK_CLEAR  0x0000
 
-#define MII_M1011_PHY_SCR  0x10
-#define MII_M1011_PHY_SCR_MDI  0x0000
-#define MII_M1011_PHY_SCR_MDI_X        0x0020
-#define MII_M1011_PHY_SCR_AUTO_CROSS   0x0060
+#define MII_M1011_PHY_SCR  0x10
+#define MII_M1011_PHY_SCR_DOWNSHIFT_EN BIT(11)
+#define MII_M1011_PHY_SCR_DOWNSHIFT_SHIFT  12
+#define MII_M1011_PHY_SCR_MDI  (0x0 << 5)
+#define MII_M1011_PHY_SCR_MDI_X        (0x1 << 5)
+#define MII_M1011_PHY_SCR_AUTO_CROSS   (0x3 << 5)
 
 #define MII_M_PHY_LED_CONTROL  0x18
 #define MII_M_PHY_LED_DIRECT   0x4100
@@ -486,8 +488,7 @@ static int m88e1121_config_aneg(struct phy_device *phydev)
if (err < 0)
return err;
 
-   err = phy_write(phydev, MII_M1011_PHY_SCR,
-   MII_M1011_PHY_SCR_AUTO_CROSS);
+   err = marvell_set_polarity(phydev, phydev->mdix_ctrl);
if (err < 0)
return err;
 
@@ -655,10 +656,13 @@ static int m88e1116r_config_init(struct phy_device 
*phydev)
if (err < 0)
return err;
 
+   err = marvell_set_polarity(phydev, phydev->mdix_ctrl);
+   if (err < 0)
+   return err;
+
temp = phy_read(phydev, MII_M1011_PHY_SCR);
-   temp |= (7 << 12);  /* max number of gigabit attempts */
-   temp |= (1 << 11);  /* enable downshift */
-   temp |= MII_M1011_PHY_SCR_AUTO_CROSS;
+   temp |= (7 << MII_M1011_PHY_SCR_DOWNSHIFT_SHIFT);
+   temp |= MII_M1011_PHY_SCR_DOWNSHIFT_EN;
err = phy_write(phydev, MII_M1011_PHY_SCR, temp);
if (err < 0)
return err;
@@ -891,8 +895,7 @@ static int m88e1118_config_aneg(struct phy_device *phydev)
if (err < 0)
return err;
 
-   err = phy_write(phydev, MII_M1011_PHY_SCR,
-   MII_M1011_PHY_SCR_AUTO_CROSS);
+   err = marvell_set_polarity(phydev, phydev->mdix_ctrl);
if (err < 0)
return err;
 
-- 
2.13.2

[PATCH net-next 4/7] net: phy: marvell: Consolidate setting the phy-mode

2017-07-30 Thread Andrew Lunn

The same code is repeated a few times. Refactor into a helper.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 88 +--
 1 file changed, 40 insertions(+), 48 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index c1b724ab5f25..275647ebaa81 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -60,11 +60,6 @@
 #define MII_M1011_PHY_SCR_MDI_X        0x0020
 #define MII_M1011_PHY_SCR_AUTO_CROSS   0x0060
 
-#define MII_M1145_PHY_EXT_SR   0x1b
-#define MII_M1145_HWCFG_MODE_SGMII_NO_CLK  0x4
-#define MII_M1145_HWCFG_MODE_MASK  0xf
-#define MII_M1145_HWCFG_FIBER_COPPER_AUTO  0x8000
-
 #define MII_M_PHY_LED_CONTROL  0x18
 #define MII_M_PHY_LED_DIRECT   0x4100
 #define MII_M_PHY_LED_COMBINE  0x411c
@@ -74,12 +69,13 @@
 #define MII_M_PHY_EXT_SR   0x1b
 
 #define MII_M_HWCFG_MODE_MASK  0xf
-#define MII_M_HWCFG_MODE_COPPER_RGMII  0xb
 #define MII_M_HWCFG_MODE_FIBER_RGMII   0x3
 #define MII_M_HWCFG_MODE_SGMII_NO_CLK  0x4
+#define MII_M_HWCFG_MODE_RTBI  0x7
 #define MII_M_HWCFG_MODE_COPPER_RTBI   0x9
-#define MII_M_HWCFG_FIBER_COPPER_AUTO  0x8000
-#define MII_M_HWCFG_FIBER_COPPER_RES   0x2000
+#define MII_M_HWCFG_MODE_COPPER_RGMII  0xb
+#define MII_M_HWCFG_FIBER_COPPER_RES   BIT(13)
+#define MII_M_HWCFG_FIBER_COPPER_AUTO  BIT(15)
 
 #define MII_88E1121_PHY_MSCR_REG   21
 #define MII_88E1121_PHY_MSCR_RX_DELAY  BIT(5)
@@ -693,6 +689,27 @@ static int m88e3016_config_init(struct phy_device *phydev)
return marvell_config_init(phydev);
 }
 
+static int m88e_config_init_hwcfg_mode(struct phy_device *phydev,
+  u16 mode,
+  int fibre_copper_auto)
+{
+   int temp;
+
+   temp = phy_read(phydev, MII_M_PHY_EXT_SR);
+   if (temp < 0)
+   return temp;
+
+   temp &= ~(MII_M_HWCFG_MODE_MASK |
+ MII_M_HWCFG_FIBER_COPPER_AUTO |
+ MII_M_HWCFG_FIBER_COPPER_RES);
+   temp |= mode;
+
+   if (fibre_copper_auto)
+   temp |= MII_M_HWCFG_FIBER_COPPER_AUTO;
+
+   return phy_write(phydev, MII_M_PHY_EXT_SR, temp);
+}
+
 static int m88e_config_init_rgmii_delays(struct phy_device *phydev)
 {
int temp;
@@ -740,17 +757,11 @@ static int m88e_config_init_rgmii(struct phy_device 
*phydev)
 static int m88e_config_init_sgmii(struct phy_device *phydev)
 {
int err;
-   int temp;
 
-   temp = phy_read(phydev, MII_M_PHY_EXT_SR);
-   if (temp < 0)
-   return temp;
-
-   temp &= ~(MII_M_HWCFG_MODE_MASK);
-   temp |= MII_M_HWCFG_MODE_SGMII_NO_CLK;
-   temp |= MII_M_HWCFG_FIBER_COPPER_AUTO;
-
-   err = phy_write(phydev, MII_M_PHY_EXT_SR, temp);
+   err = m88e_config_init_hwcfg_mode(
+   phydev,
+   MII_M_HWCFG_MODE_SGMII_NO_CLK,
+   MII_M_HWCFG_FIBER_COPPER_AUTO);
if (err < 0)
return err;
 
@@ -760,22 +771,16 @@ static int m88e_config_init_sgmii(struct phy_device 
*phydev)
 
 static int m88e_config_init_rtbi(struct phy_device *phydev)
 {
-   int temp;
int err;
 
err = m88e_config_init_rgmii_delays(phydev);
if (err)
return err;
 
-   temp = phy_read(phydev, MII_M_PHY_EXT_SR);
-   if (temp < 0)
-   return temp;
-
-   temp &= ~(MII_M_HWCFG_MODE_MASK |
- MII_M_HWCFG_FIBER_COPPER_RES);
-   temp |= 0x7 | MII_M_HWCFG_FIBER_COPPER_AUTO;
-
-   err = phy_write(phydev, MII_M_PHY_EXT_SR, temp);
+   err = m88e_config_init_hwcfg_mode(
+   phydev,
+   MII_M_HWCFG_MODE_RTBI,
+   MII_M_HWCFG_FIBER_COPPER_AUTO);
if (err < 0)
return err;
 
@@ -784,16 +789,10 @@ static int m88e_config_init_rtbi(struct phy_device 
*phydev)
if (err < 0)
return err;
 
-   temp = phy_read(phydev, MII_M_PHY_EXT_SR);
-   if (temp < 0)
-   return temp;
-
-   temp &= ~(MII_M_HWCFG_MODE_MASK |
- MII_M_HWCFG_FIBER_COPPER_RES);
-   temp |= MII_M_HWCFG_MODE_COPPER_RTBI |
-   MII_M_HWCFG_FIBER_COPPER_AUTO;
-
-   return phy_write(phydev, MII_M_PHY_EXT_SR, temp);
+   return m88e_config_init_hwcfg_mode(
+   phydev,
+   MII_M_HWCFG_MODE_RTBI,
+   MII_M_HWCFG_FIBER_COPPER_AUTO);
 }
 
 static int m88e_config_init(struct phy_device *phydev)
@@ -999,16 +998,9 @@ static int m88e1145_config_init_rgmii(struct phy_device 
*phydev)
 
 static int m88e1145_config_init_sgmii(struct phy_device *phydev)
 {
-
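
The read-modify-write that this patch moves into m88e_config_init_hwcfg_mode() can be sketched in user space as a pure function over the register value. This is only an illustration under stated assumptions: hwcfg_mode_apply() and the HWCFG_* names are hypothetical, the bit positions are copied from the MII_M_HWCFG_* defines in the diff, and masking the mode argument is an extra precaution that the patch itself does not take.

```c
#include <stdint.h>

/* Field layout copied from the MII_M_HWCFG_* defines in the diff. */
#define HWCFG_MODE_MASK         0xfu
#define HWCFG_FIBER_COPPER_RES  (1u << 13)
#define HWCFG_FIBER_COPPER_AUTO (1u << 15)

/*
 * Hypothetical helper: compute the new extended-status register value
 * from the old one. No hardware access, so it is easy to test.
 */
static uint16_t hwcfg_mode_apply(uint16_t old, uint16_t mode,
				 int fibre_copper_auto)
{
	uint16_t val = old;

	/* Clear every field the helper owns so no stale bit survives. */
	val &= (uint16_t)~(HWCFG_MODE_MASK |
			   HWCFG_FIBER_COPPER_AUTO |
			   HWCFG_FIBER_COPPER_RES);

	/* Assumption of this sketch: keep the mode inside its 4-bit field. */
	val |= (uint16_t)(mode & HWCFG_MODE_MASK);

	if (fibre_copper_auto)
		val |= (uint16_t)HWCFG_FIBER_COPPER_AUTO;

	return val;
}
```

Because all three fields are cleared before the new mode is written, every caller gets the same masking, which the open-coded variants did not guarantee (the SGMII path only cleared the mode field).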

[PATCH net-next 0/7] More Marvell PHY refactoring and cleanup

2017-07-30 Thread Andrew Lunn

Consolidate more duplicated code into helpers, make use of core
helpers, move code into a helper for later adding functionality to add
marvell PHYs, etc.

Andrew Lunn (7):
  net: phy: marvell: tabification
  net: phy: marvell: Use core genphy_soft_reset()
  net: phy: marvell: consolidate RGMII delay code
  net: phy: marvell: Consolidate setting the phy-mode
  net: phy: marvell: Refactor m88e1121 RGMII delay configuration
  net: phy: marvell: Use the set_polarity helper
  net: phy: marvell: Refactor setting downshift into a helper

 drivers/net/phy/marvell.c | 320 ++
 1 file changed, 150 insertions(+), 170 deletions(-)

-- 
2.13.2

[PATCH net-next 2/7] net: phy: marvell: Use core genphy_soft_reset()

2017-07-30 Thread Andrew Lunn

Rather than using an open coded equivalent, use the core
genphy_soft_reset() function.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 47 ---
 1 file changed, 12 insertions(+), 35 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 6a5256ceb11e..33a52532fac6 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -292,17 +292,11 @@ static int marvell_config_aneg(struct phy_device *phydev)
return err;
 
if (phydev->autoneg != AUTONEG_ENABLE) {
-   int bmcr;
-
/* A write to speed/duplex bits (that is performed by
 * genphy_config_aneg() call above) must be followed by
 * a software reset. Otherwise, the write has no effect.
 */
-   bmcr = phy_read(phydev, MII_BMCR);
-   if (bmcr < 0)
-   return bmcr;
-
-   err = phy_write(phydev, MII_BMCR, bmcr | BMCR_RESET);
+   err = genphy_soft_reset(phydev);
if (err < 0)
return err;
}
@@ -318,8 +312,7 @@ static int m88e1101_config_aneg(struct phy_device *phydev)
 * that certain registers get written in order
 * to restart autonegotiation
 */
-   err = phy_write(phydev, MII_BMCR, BMCR_RESET);
-
+   err = genphy_soft_reset(phydev);
if (err < 0)
return err;
 
@@ -354,7 +347,7 @@ static int m88e_config_aneg(struct phy_device *phydev)
 * that certain registers get written in order
 * to restart autonegotiation
 */
-   err = phy_write(phydev, MII_BMCR, BMCR_RESET);
+   err = genphy_soft_reset(phydev);
 
err = marvell_set_polarity(phydev, phydev->mdix_ctrl);
if (err < 0)
@@ -370,17 +363,11 @@ static int m88e_config_aneg(struct phy_device *phydev)
return err;
 
if (phydev->autoneg != AUTONEG_ENABLE) {
-   int bmcr;
-
/* A write to speed/duplex bits (that is performed by
 * genphy_config_aneg() call above) must be followed by
 * a software reset. Otherwise, the write has no effect.
 */
-   bmcr = phy_read(phydev, MII_BMCR);
-   if (bmcr < 0)
-   return bmcr;
-
-   err = phy_write(phydev, MII_BMCR, bmcr | BMCR_RESET);
+   err = genphy_soft_reset(phydev);
if (err < 0)
return err;
}
@@ -493,7 +480,7 @@ static int m88e1121_config_aneg(struct phy_device *phydev)
 
marvell_set_page(phydev, oldpage);
 
-   err = phy_write(phydev, MII_BMCR, BMCR_RESET);
+   err = genphy_soft_reset(phydev);
if (err < 0)
return err;
 
@@ -656,9 +643,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
int temp;
int err;
 
-   temp = phy_read(phydev, MII_BMCR);
-   temp |= BMCR_RESET;
-   err = phy_write(phydev, MII_BMCR, temp);
+   err = genphy_soft_reset(phydev);
if (err < 0)
return err;
 
@@ -689,14 +674,10 @@ static int m88e1116r_config_init(struct phy_device 
*phydev)
if (err < 0)
return err;
 
-   temp = phy_read(phydev, MII_BMCR);
-   temp |= BMCR_RESET;
-   err = phy_write(phydev, MII_BMCR, temp);
+   err = genphy_soft_reset(phydev);
if (err < 0)
return err;
 
-   mdelay(500);
-
return marvell_config_init(phydev);
 }
 
@@ -804,14 +785,10 @@ static int m88e_config_init_rtbi(struct phy_device 
*phydev)
return err;
 
/* soft reset */
-   err = phy_write(phydev, MII_BMCR, BMCR_RESET);
+   err = genphy_soft_reset(phydev);
if (err < 0)
return err;
 
-   do
-   temp = phy_read(phydev, MII_BMCR);
-   while (temp & BMCR_RESET);
-
temp = phy_read(phydev, MII_M_PHY_EXT_SR);
if (temp < 0)
return temp;
@@ -850,7 +827,7 @@ static int m88e_config_init(struct phy_device *phydev)
if (err < 0)
return err;
 
-   return phy_write(phydev, MII_BMCR, BMCR_RESET);
+   return genphy_soft_reset(phydev);
 }
 
 static int m88e1121_config_init(struct phy_device *phydev)
@@ -912,7 +889,7 @@ static int m88e1118_config_aneg(struct phy_device *phydev)
 {
int err;
 
-   err = phy_write(phydev, MII_BMCR, BMCR_RESET);
+   err = genphy_soft_reset(phydev);
if (err < 0)
return err;
 
@@ -961,7 +938,7 @@ static int m88e1118_config_init(struct phy_device *phydev)
if (err < 0)
return err;
 
-   return phy_write(phydev, MII_BMCR, BMCR_RESET);
+   return genphy_soft_reset(phydev);
 }
 
 static int m88e1149_config_init(struct phy_device *phydev)
@@ -987,7 +964,7 @@ static int
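
The pattern this patch standardises on is "set the self-clearing reset bit, then wait for the PHY to clear it". A minimal user-space sketch follows, assuming a fake PHY whose reset bit clears after a few reads; struct fake_phy, fake_read_bmcr(), fake_write_bmcr() and soft_reset_sketch() are hypothetical names and this is not the kernel's genphy_soft_reset() implementation.

```c
#include <stdint.h>

#define BMCR_RESET 0x8000u	/* self-clearing reset bit in the BMCR register */

/* Hypothetical stand-in for a PHY: reset completes after a few reads. */
struct fake_phy {
	uint16_t bmcr;
	int reads_until_done;
};

static int fake_read_bmcr(struct fake_phy *phy)
{
	/* Once the countdown expires, the "hardware" clears the reset bit. */
	if ((phy->bmcr & BMCR_RESET) && phy->reads_until_done-- <= 0)
		phy->bmcr &= (uint16_t)~BMCR_RESET;
	return phy->bmcr;
}

static void fake_write_bmcr(struct fake_phy *phy, uint16_t val)
{
	phy->bmcr = val;
}

/* Set the reset bit, then poll with a bounded number of attempts. */
static int soft_reset_sketch(struct fake_phy *phy, int max_polls)
{
	int i;

	fake_write_bmcr(phy, (uint16_t)BMCR_RESET);

	for (i = 0; i < max_polls; i++) {
		if (!(fake_read_bmcr(phy) & BMCR_RESET))
			return 0;	/* reset finished */
	}

	return -1;	/* timed out; the caller decides what to do */
}
```

The bounded loop is the main difference from the do/while removed from m88e_config_init_rtbi(), which would spin for as long as the bit stayed set.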

[PATCH net-next 3/7] net: phy: marvell: consolidate RGMII delay code

2017-07-30 Thread Andrew Lunn

The same code is repeated for different PHY versions. Put it into a
helper and call it when needed.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 54 +++
 1 file changed, 22 insertions(+), 32 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 33a52532fac6..c1b724ab5f25 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -61,13 +61,6 @@
 #define MII_M1011_PHY_SCR_AUTO_CROSS   0x0060
 
 #define MII_M1145_PHY_EXT_SR   0x1b
-#define MII_M1145_PHY_EXT_CR   0x14
-#define MII_M1145_RGMII_RX_DELAY   0x0080
-#define MII_M1145_RGMII_TX_DELAY   0x0002
-#define MII_M1145_HWCFG_MODE_SGMII_NO_CLK  0x4
-#define MII_M1145_HWCFG_MODE_MASK  0xf
-#define MII_M1145_HWCFG_FIBER_COPPER_AUTO  0x8000
-
 #define MII_M1145_HWCFG_MODE_SGMII_NO_CLK  0x4
 #define MII_M1145_HWCFG_MODE_MASK  0xf
 #define MII_M1145_HWCFG_FIBER_COPPER_AUTO  0x8000
@@ -76,8 +69,8 @@
 #define MII_M_PHY_LED_DIRECT   0x4100
 #define MII_M_PHY_LED_COMBINE  0x411c
 #define MII_M_PHY_EXT_CR   0x14
-#define MII_M_RX_DELAY 0x80
-#define MII_M_TX_DELAY 0x2
+#define MII_M_RGMII_RX_DELAY   BIT(7)
+#define MII_M_RGMII_TX_DELAY   BIT(1)
 #define MII_M_PHY_EXT_SR   0x1b
 
 #define MII_M_HWCFG_MODE_MASK  0xf
@@ -700,9 +693,8 @@ static int m88e3016_config_init(struct phy_device *phydev)
return marvell_config_init(phydev);
 }
 
-static int m88e_config_init_rgmii(struct phy_device *phydev)
+static int m88e_config_init_rgmii_delays(struct phy_device *phydev)
 {
-   int err;
int temp;
 
temp = phy_read(phydev, MII_M_PHY_EXT_CR);
@@ -710,16 +702,24 @@ static int m88e_config_init_rgmii(struct phy_device 
*phydev)
return temp;
 
if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) {
-   temp |= (MII_M_RX_DELAY | MII_M_TX_DELAY);
+   temp |= (MII_M_RGMII_RX_DELAY | MII_M_RGMII_TX_DELAY);
} else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID) {
-   temp &= ~MII_M_TX_DELAY;
-   temp |= MII_M_RX_DELAY;
+   temp &= ~MII_M_RGMII_TX_DELAY;
+   temp |= MII_M_RGMII_RX_DELAY;
} else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID) {
-   temp &= ~MII_M_RX_DELAY;
-   temp |= MII_M_TX_DELAY;
+   temp &= ~MII_M_RGMII_RX_DELAY;
+   temp |= MII_M_RGMII_TX_DELAY;
}
 
-   err = phy_write(phydev, MII_M_PHY_EXT_CR, temp);
+   return phy_write(phydev, MII_M_PHY_EXT_CR, temp);
+}
+
+static int m88e_config_init_rgmii(struct phy_device *phydev)
+{
+   int temp;
+   int err;
+
+   err = m88e_config_init_rgmii_delays(phydev);
if (err < 0)
return err;
 
@@ -760,16 +760,11 @@ static int m88e_config_init_sgmii(struct phy_device 
*phydev)
 
 static int m88e_config_init_rtbi(struct phy_device *phydev)
 {
-   int err;
int temp;
+   int err;
 
-   temp = phy_read(phydev, MII_M_PHY_EXT_CR);
-   if (temp < 0)
-   return temp;
-
-   temp |= (MII_M_RX_DELAY | MII_M_TX_DELAY);
-   err = phy_write(phydev, MII_M_PHY_EXT_CR, temp);
-   if (err < 0)
+   err = m88e_config_init_rgmii_delays(phydev);
+   if (err)
return err;
 
temp = phy_read(phydev, MII_M_PHY_EXT_SR);
@@ -969,15 +964,10 @@ static int m88e1149_config_init(struct phy_device *phydev)
 
 static int m88e1145_config_init_rgmii(struct phy_device *phydev)
 {
+   int temp;
int err;
-   int temp = phy_read(phydev, MII_M1145_PHY_EXT_CR);
-
-   if (temp < 0)
-   return temp;
-
-   temp |= (MII_M1145_RGMII_RX_DELAY | MII_M1145_RGMII_TX_DELAY);
 
-   err = phy_write(phydev, MII_M1145_PHY_EXT_CR, temp);
+   err = m88e_config_init_rgmii_delays(phydev);
if (err < 0)
return err;
 
-- 
2.13.2
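
The mode-to-delay mapping in m88e_config_init_rgmii_delays() can be checked in isolation with a small user-space sketch. The enum and rgmii_delays_apply() below are hypothetical stand-ins for the PHY_INTERFACE_MODE_RGMII* values and the kernel helper; the bit positions match MII_M_RGMII_RX_DELAY and MII_M_RGMII_TX_DELAY from the diff.

```c
#include <stdint.h>

#define RGMII_RX_DELAY (1u << 7)	/* same bit as MII_M_RGMII_RX_DELAY */
#define RGMII_TX_DELAY (1u << 1)	/* same bit as MII_M_RGMII_TX_DELAY */

/* Hypothetical stand-ins for the PHY_INTERFACE_MODE_RGMII* values. */
enum rgmii_mode { RGMII, RGMII_ID, RGMII_RXID, RGMII_TXID };

/* Compute the new extended-control register value for a given mode. */
static uint16_t rgmii_delays_apply(uint16_t old, enum rgmii_mode mode)
{
	uint16_t val = old;

	switch (mode) {
	case RGMII_ID:		/* delay on both RX and TX */
		val |= (uint16_t)(RGMII_RX_DELAY | RGMII_TX_DELAY);
		break;
	case RGMII_RXID:	/* delay on RX only */
		val &= (uint16_t)~RGMII_TX_DELAY;
		val |= (uint16_t)RGMII_RX_DELAY;
		break;
	case RGMII_TXID:	/* delay on TX only */
		val &= (uint16_t)~RGMII_RX_DELAY;
		val |= (uint16_t)RGMII_TX_DELAY;
		break;
	case RGMII:		/* as in the diff: leave both bits untouched */
		break;
	}

	return val;
}
```

Note that, as in the patch, plain RGMII leaves whatever delay bits were already set, whereas the code removed from m88e_config_init_rtbi() and m88e1145_config_init_rgmii() set both delays unconditionally.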

[PATCH net-next 1/7] net: phy: marvell: tabification

2017-07-30 Thread Andrew Lunn

Convert spaces to tabs where appropriate, and fix up some otherwise
odd indentation.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 5d314f143aea..6a5256ceb11e 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -108,24 +108,24 @@
 #define MII_88E1318S_PHY_MSCR1_PAD_ODD BIT(6)
 
 /* Copper Specific Interrupt Enable Register */
-#define MII_88E1318S_PHY_CSIER  0x12
+#define MII_88E1318S_PHY_CSIER 0x12
 /* WOL Event Interrupt Enable */
-#define MII_88E1318S_PHY_CSIER_WOL_EIE  BIT(7)
+#define MII_88E1318S_PHY_CSIER_WOL_EIE BIT(7)
 
 /* LED Timer Control Register */
-#define MII_88E1318S_PHY_LED_TCR0x12
-#define MII_88E1318S_PHY_LED_TCR_FORCE_INT  BIT(15)
-#define MII_88E1318S_PHY_LED_TCR_INTn_ENABLEBIT(7)
-#define MII_88E1318S_PHY_LED_TCR_INT_ACTIVE_LOW BIT(11)
+#define MII_88E1318S_PHY_LED_TCR   0x12
+#define MII_88E1318S_PHY_LED_TCR_FORCE_INT BIT(15)
+#define MII_88E1318S_PHY_LED_TCR_INTn_ENABLE   BIT(7)
+#define MII_88E1318S_PHY_LED_TCR_INT_ACTIVE_LOWBIT(11)
 
 /* Magic Packet MAC address registers */
-#define MII_88E1318S_PHY_MAGIC_PACKET_WORD2 0x17
-#define MII_88E1318S_PHY_MAGIC_PACKET_WORD1 0x18
-#define MII_88E1318S_PHY_MAGIC_PACKET_WORD0 0x19
+#define MII_88E1318S_PHY_MAGIC_PACKET_WORD20x17
+#define MII_88E1318S_PHY_MAGIC_PACKET_WORD10x18
+#define MII_88E1318S_PHY_MAGIC_PACKET_WORD00x19
 
-#define MII_88E1318S_PHY_WOL_CTRL   0x10
-#define MII_88E1318S_PHY_WOL_CTRL_CLEAR_WOL_STATUS  BIT(12)
-#define MII_88E1318S_PHY_WOL_CTRL_MAGIC_PACKET_MATCH_ENABLE BIT(14)
+#define MII_88E1318S_PHY_WOL_CTRL  0x10
+#define MII_88E1318S_PHY_WOL_CTRL_CLEAR_WOL_STATUS BIT(12)
+#define MII_88E1318S_PHY_WOL_CTRL_MAGIC_PACKET_MATCH_ENABLEBIT(14)
 
 #define MII_88E1121_PHY_LED_CTRL   16
 #define MII_88E1121_PHY_LED_DEF0x0030
@@ -152,7 +152,7 @@
 #define LPA_FIBER_1000HALF 0x40
 #define LPA_FIBER_1000FULL 0x20
 
-#define LPA_PAUSE_FIBER0x180
+#define LPA_PAUSE_FIBER0x180
 #define LPA_PAUSE_ASYM_FIBER   0x100
 
 #define ADVERTISE_FIBER_1000HALF   0x40
@@ -596,7 +596,7 @@ static int marvell_config_aneg_fiber(struct phy_device 
*phydev)
 
if (changed == 0) {
/* Advertisement hasn't changed, but maybe aneg was never on to
-* begin with?  Or maybe phy was isolated?
+* begin with?  Or maybe phy was isolated?
 */
int ctl = phy_read(phydev, MII_BMCR);
 
@@ -1515,7 +1515,7 @@ static void marvell_get_strings(struct phy_device 
*phydev, u8 *data)
 }
 
 #ifndef UINT64_MAX
-#define UINT64_MAX  (u64)(~((u64)0))
+#define UINT64_MAX (u64)(~((u64)0))
 #endif
 static u64 marvell_get_stat(struct phy_device *phydev, int i)
 {
-- 
2.13.2

Re: [PATCH V2 net-next 0/2] liquidio: Add support for managing liquidio adapter

2017-07-30 Thread Simon Horman

On Fri, Jul 28, 2017 at 11:17:07PM -0700, Felix Manlunas wrote:
> From: Veerasenareddy Burru 
> 
> The LiquidIO adapter has processor cores that can run Linux. This patch
> set adds support to create a virtual Ethernet interface on host to
> communicate with applications running on Linux in the LiquidIO adapter.
> The virtual Ethernet interface also provides login access to Linux on
> LiquidIO through ssh for management and debugging.

As per the somewhat more detailed feedback provided by my colleague Jakub
Kicinski on v1 of this patchset[1], I am concerned that this patchset breaks
down the long-standing practice of not granting direct access to firmware
from userspace.

[1] https://www.spinics.net/lists/netdev/msg444929.html

Re: [PATCH v2] ravb: add wake-on-lan support via magic packet

2017-07-30 Thread Niklas Söderlund

Hi Andrew,

On 2017-07-30 22:07:31 +0200, Andrew Lunn wrote:
> On Sun, Jul 30, 2017 at 09:51:54PM +0200, Niklas Söderlund wrote:
> > Hi Andrew,
> > 
> > On 2017-07-30 19:07:38 +0200, Andrew Lunn wrote:
> > > Hi Niklas
> > > 
> > > > @@ -2041,6 +2073,11 @@ static int ravb_probe(struct platform_device 
> > > > *pdev)
> > > >  
> > > > priv->chip_id = chip_id;
> > > >  
> > > > +   /* Get clock, if not found that's OK but Wake-On-Lan is 
> > > > unavailable */
> > > > +   priv->clk = devm_clk_get(&pdev->dev, NULL);
> > > > +   if (IS_ERR(priv->clk))
> > > > +   priv->clk = NULL;
> > > 
> > > Can you get EPROBE_DEFER returned?
> > 
> > I don't think so, but I'm not sure :-)
> > 
> > The clock I'm trying to get is the module clock of the ravb itself, so 
> > if that clock is not available (and enabled) no register writes to the 
> > ravb would be possible in the first place, so i guess it's safe to 
> > assume -EPROBE_DEFER can not happen here?
> > 
> > I'm just trying to play it safe here since the clock is only needed to 
> > support WoL, I thought it best to not change behavior here. Try to get 
> > the clock, if we can great we can do WoL if not then user-space will be 
> > prevented from enabling WoL and nothing in the current behavior changes.
> 
> Hi Niklas
> 
> Well, if it can return -EPROBE_DEFER, it means sometimes WoL will be
> available and other times not, depending on when the clock driver

Ahh, I see. Yes, that would indeed be bad.

> probes. However, it sounds like this is the SoC's core clock driver. If
> so, it gets loaded very early, so you are safe.

Yes, this is renesas-cpg-mssr, which, if I understand things correctly, is a
core clock driver. It is registered in drivers/clk/renesas/renesas-cpg-mssr.c
using:

 subsys_initcall(cpg_mssr_init);

So I take it I'm safe. Thanks, however, for bringing this to my attention;
I learnt something new today :-)

> 
>Andrew

-- 
Regards,
Niklas Söderlund
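
For the case where an optional clock's provider might probe later, the distinction discussed above can be sketched as follows. This is a user-space illustration only: handle_optional_clk(), struct probe_state and the SK_* constants are hypothetical, and 517 is the numeric value the kernel uses for EPROBE_DEFER.

```c
/* Hypothetical error codes for this sketch; the kernel's EPROBE_DEFER is 517. */
enum { SK_ENOENT = 2, SK_EPROBE_DEFER = 517 };

struct probe_state {
	int have_clk;	/* nonzero when Wake-on-LAN can be offered */
};

/*
 * lookup_err: 0 when the clock was found, or a negative error code
 * from the clock lookup.
 * Returns 0 to continue probing, or a negative code to abort the probe.
 */
static int handle_optional_clk(int lookup_err, struct probe_state *st)
{
	st->have_clk = 0;

	/* Provider not ready yet: propagate so the probe is retried later. */
	if (lookup_err == -SK_EPROBE_DEFER)
		return lookup_err;

	/* Clock genuinely absent: carry on, but without WoL support. */
	if (lookup_err < 0)
		return 0;

	st->have_clk = 1;
	return 0;
}
```

With this shape, WoL availability no longer depends on probe ordering: a deferred lookup retries the whole probe instead of silently disabling the feature. As concluded in the thread, the ravb module clock comes from a core clock driver that is registered early, so the simpler code in the patch is considered safe there.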

Re: [patch net-next 09/20] net: sched: convert actions array into rcu list

2017-07-30 Thread Jamal Hadi Salim


On 17-07-28 10:40 AM, Jiri Pirko wrote:

From: Jiri Pirko 

Currently the actions are stored in an array together with the array size.
To traverse this array in the fastpath, tcf_tree_lock is taken to protect it.
Convert the array into a singly linked list, similar to the filter chains
style, and allow traversal protected by rcu.



Looks sane. But let's have Cong provide an opinion (since he said he was
trying to rcu the actions).

cheers,
jamal

[PATCH net-next] net: fec: Issue error for missing but expected PHY

2017-07-30 Thread Andrew Lunn

If the PHY is missing but expected, e.g. because of a typo in the dt
file, it is not possible to open the interface. ip link returns:

RTNETLINK answers: No such device

It is not very obvious what the problem is. Add a netdev_err() in this
case to make it easier to debug the issue.

[   21.409385] fec 2188000.ethernet eth0: Unable to connect to phy
RTNETLINK answers: No such device

Signed-off-by: Andrew Lunn 
---
 drivers/net/ethernet/freescale/fec_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index cc0c2a58c4de..c5995f07f821 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1907,8 +1907,10 @@ static int fec_enet_mii_probe(struct net_device *ndev)
phy_dev = of_phy_connect(ndev, fep->phy_node,
 &fec_enet_adjust_link, 0,
 fep->phy_interface);
-   if (!phy_dev)
+   if (!phy_dev) {
+   netdev_err(ndev, "Unable to connect to phy\n");
return -ENODEV;
+   }
} else {
/* check for attached phy */
for (phy_id = 0; (phy_id < PHY_MAX_ADDR); phy_id++) {
-- 
2.13.2

Re: [PATCH v2] ravb: add wake-on-lan support via magic packet

2017-07-30 Thread Andrew Lunn

On Sun, Jul 30, 2017 at 09:51:54PM +0200, Niklas Söderlund wrote:
> Hi Andrew,
> 
> On 2017-07-30 19:07:38 +0200, Andrew Lunn wrote:
> > Hi Niklas
> > 
> > > @@ -2041,6 +2073,11 @@ static int ravb_probe(struct platform_device *pdev)
> > >  
> > >   priv->chip_id = chip_id;
> > >  
> > > + /* Get clock, if not found that's OK but Wake-On-Lan is unavailable */
> > > + priv->clk = devm_clk_get(&pdev->dev, NULL);
> > > + if (IS_ERR(priv->clk))
> > > + priv->clk = NULL;
> > 
> > Can you get EPROBE_DEFER returned?
> 
> I don't think so, but I'm not sure :-)
> 
> The clock I'm trying to get is the module clock of the ravb itself, so 
> if that clock is not available (and enabled) no register writes to the 
> ravb would be possible in the first place, so i guess it's safe to 
> assume -EPROBE_DEFER can not happen here?
> 
> I'm just trying to play it safe here since the clock is only needed to 
> support WoL, I thought it best to not change behavior here. Try to get 
> the clock, if we can great we can do WoL if not then user-space will be 
> prevented from enabling WoL and nothing in the current behavior changes.

Hi Niklas

Well, if it can return -EPROBE_DEFER, it means sometimes WoL will be
available and other times not, depending on when the clock driver
probes. However, it sounds like this is the SoC's core clock driver. If
so, it gets loaded very early, so you are safe.

   Andrew

[PATCH 1/1 v2] netfilter: constify nf_conntrack_l3/4proto parameters

2017-07-30 Thread Julia Lawall

When a nf_conntrack_l3/4proto parameter is not on the left hand side
of an assignment, its address is not taken, and it is not passed to a
function that may modify its fields, then it can be declared as const.

This change is useful from a documentation point of view, and can
possibly facilitate making some nf_conntrack_l3/4proto structures const
subsequently.

Done with the help of Coccinelle.

Some spacing adjusted to fit within 80 characters.

Signed-off-by: Julia Lawall 

---

v2: Added consideration of array parameters.  This adds transformation of
nf_ct_l4proto_pernet_register and nf_ct_l4proto_pernet_unregister.

This patch also adds transformation of ctnl_timeout_parse_policy that was
somehow overlooked previously.

 include/net/netfilter/nf_conntrack_l3proto.h |6 ++---
 include/net/netfilter/nf_conntrack_l4proto.h |   14 ++--
 include/net/netfilter/nf_conntrack_timeout.h |2 -
 net/netfilter/nf_conntrack_core.c|8 +++
 net/netfilter/nf_conntrack_netlink.c |6 ++---
 net/netfilter/nf_conntrack_proto.c   |   30 +--
 net/netfilter/nfnetlink_cttimeout.c  |5 ++--
 7 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_l3proto.h 
b/include/net/netfilter/nf_conntrack_l3proto.h
index 6d14b36..4782b15 100644
--- a/include/net/netfilter/nf_conntrack_l3proto.h
+++ b/include/net/netfilter/nf_conntrack_l3proto.h
@@ -76,17 +76,17 @@ struct nf_conntrack_l3proto {
 #ifdef CONFIG_SYSCTL
 /* Protocol pernet registration. */
 int nf_ct_l3proto_pernet_register(struct net *net,
- struct nf_conntrack_l3proto *proto);
+ const struct nf_conntrack_l3proto *proto);
 #else
 static inline int nf_ct_l3proto_pernet_register(struct net *n,
-   struct nf_conntrack_l3proto *p)
+   const struct nf_conntrack_l3proto *p)
 {
return 0;
 }
 #endif
 
 void nf_ct_l3proto_pernet_unregister(struct net *net,
-struct nf_conntrack_l3proto *proto);
+const struct nf_conntrack_l3proto *proto);
 
 /* Protocol global registration. */
 int nf_ct_l3proto_register(struct nf_conntrack_l3proto *proto);
diff --git a/include/net/netfilter/nf_conntrack_l4proto.h 
b/include/net/netfilter/nf_conntrack_l4proto.h
index 7032e04..c86e946 100644
--- a/include/net/netfilter/nf_conntrack_l4proto.h
+++ b/include/net/netfilter/nf_conntrack_l4proto.h
@@ -125,23 +125,23 @@ struct nf_conntrack_l4proto 
*__nf_ct_l4proto_find(u_int16_t l3proto,
 
 struct nf_conntrack_l4proto *nf_ct_l4proto_find_get(u_int16_t l3proto,
u_int8_t l4proto);
-void nf_ct_l4proto_put(struct nf_conntrack_l4proto *p);
+void nf_ct_l4proto_put(const struct nf_conntrack_l4proto *p);
 
 /* Protocol pernet registration. */
 int nf_ct_l4proto_pernet_register_one(struct net *net,
- struct nf_conntrack_l4proto *proto);
+   const struct nf_conntrack_l4proto *proto);
 void nf_ct_l4proto_pernet_unregister_one(struct net *net,
-struct nf_conntrack_l4proto *proto);
+   const struct nf_conntrack_l4proto *proto);
 int nf_ct_l4proto_pernet_register(struct net *net,
- struct nf_conntrack_l4proto *proto[],
+ struct nf_conntrack_l4proto * const proto[],
  unsigned int num_proto);
 void nf_ct_l4proto_pernet_unregister(struct net *net,
-struct nf_conntrack_l4proto *proto[],
-unsigned int num_proto);
+   struct nf_conntrack_l4proto * const proto[],
+   unsigned int num_proto);
 
 /* Protocol global registration. */
 int nf_ct_l4proto_register_one(struct nf_conntrack_l4proto *proto);
-void nf_ct_l4proto_unregister_one(struct nf_conntrack_l4proto *proto);
+void nf_ct_l4proto_unregister_one(const struct nf_conntrack_l4proto *proto);
 int nf_ct_l4proto_register(struct nf_conntrack_l4proto *proto[],
   unsigned int num_proto);
 void nf_ct_l4proto_unregister(struct nf_conntrack_l4proto *proto[],
diff --git a/include/net/netfilter/nf_conntrack_timeout.h 
b/include/net/netfilter/nf_conntrack_timeout.h
index d40b893..b222957 100644
--- a/include/net/netfilter/nf_conntrack_timeout.h
+++ b/include/net/netfilter/nf_conntrack_timeout.h
@@ -68,7 +68,7 @@ struct nf_conn_timeout *nf_ct_timeout_ext_add(struct nf_conn 
*ct,
 
 static inline unsigned int *
 nf_ct_timeout_lookup(struct net *net, struct nf_conn *ct,
-struct nf_conntrack_l4proto *l4proto)
+const struct nf_conntrack_l4proto *l4proto)
 {

[PATCH 0/1 v2] constify nf_conntrack_l3/4proto parameters

2017-07-30 Thread Julia Lawall

When a nf_conntrack_l3/4proto parameter is not on the left hand side
of an assignment, its address is not taken, and it is not passed to a
function that may modify its fields, then it can be declared as const.

This change is useful from a documentation point of view, and can
possibly facilitate making some nf_conntrack_l4proto structures const
subsequently.

Done with the help of Coccinelle.  The following semantic patch shows
the nf_conntrack_l4proto case.

// 
virtual update_results
virtual after_start

@initialize:ocaml@
@@

let unsafe = Hashtbl.create 101

let is_unsafe f = Hashtbl.mem unsafe f

let changed = ref false


(* The next three rules relate to the fact that we do not know the type of
void * variables.  Fortunately this is only needed on the first iteration,
but it still means that the whole kernel will end up being considered. *)

@has depends on !after_start && !update_results@
identifier f,x;
position p;
@@

f@p(...,struct nf_conntrack_l3proto *x,...) { ... }

@hasa depends on !after_start@
identifier f,x;
position p;
@@

f@p(...,struct nf_conntrack_l3proto *x[],...) { ... }

@others depends on !after_start && !update_results@
position p != {has.p,hasa.p};
identifier f,x;
@@

f@p(...,void *x,...) { ... }

@script:ocaml@
f << others.f;
@@

changed := true;
Hashtbl.add unsafe f ()


@fpb depends on !update_results disable optional_qualifier, drop_cast exists@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x,fld;
identifier bad : script:ocaml() { is_unsafe(bad) };
assignment operator aop;
expression e;
local idexpression fp;
type T;
@@

f(...,struct nf_conntrack_l3proto *x,...)
{
...
(
  return x;
|
  (<+...x...+>) aop e
|
  e aop x
|
  (T)x
|
  &(<+...x...+>)
|
  bad(...,x,...)
|
  fp(...,x,...)
|
  (<+...e->fld...+>)(...,x,...)
)
...when any
 }

@script:ocaml@
f << fpb.f;
@@

changed := true;
Printf.eprintf "%s is unsafe\n" f;
Hashtbl.add unsafe f ()

@fpba depends on !update_results disable optional_qualifier, drop_cast exists@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x,fld;
identifier bad : script:ocaml() { is_unsafe(bad) };
assignment operator aop;
expression e;
local idexpression fp;
type T;
@@

f(...,struct nf_conntrack_l3proto *x[],...)
{
...
(
  return \(x\|x[...]\);
|
  (<+...x...+>) aop e
|
  e aop \(x\|x[...]\)
|
  (T)\(x\|x[...]\)
|
  &(<+...x...+>)
|
  bad(...,\(x\|x[...]\),...)
|
  fp(...,\(x\|x[...]\),...)
|
  (<+...e->fld...+>)(...,\(x\|x[...]\),...)
)
... when any
 }

@script:ocaml@
f << fpba.f;
@@

changed := true;
Printf.eprintf "%s is unsafe\n" f;
Hashtbl.add unsafe f ()

@finalize:ocaml depends on !update_results@
tbls << merge.unsafe;
c << merge.changed;
@@

List.iter
(fun t ->
  Hashtbl.iter
(fun k v ->
  if not (Hashtbl.mem unsafe k) then Hashtbl.add unsafe k ()) t)
tbls;
changed := false;
let changed = List.exists (fun x -> !x) c in
let it = new iteration() in
it#add_virtual_rule After_start;
(if not changed
then it#add_virtual_rule Update_results);
it#register()

@depends on update_results disable optional_qualifier@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x;
@@

f(...,
+ const
  struct nf_conntrack_l3proto *x,...) { ... }

@depends on update_results disable optional_qualifier@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x;
type T;
@@

T f(...,
+ const
  struct nf_conntrack_l3proto *x,...);

@depends on update_results disable optional_qualifier@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x;
@@

f(...,
+ const
  struct nf_conntrack_l3proto *x[],...) { ... }

@depends on update_results disable optional_qualifier@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x;
type T;
@@

T f(...,
+ const
  struct nf_conntrack_l3proto *x[],...);
// 

---

v2: Added consideration of array parameters.  This adds transformation of
nf_ct_l4proto_pernet_register and nf_ct_l4proto_pernet_unregister.

This patch also adds transformation of ctnl_timeout_parse_policy that was
somehow overlooked previously.

 include/net/netfilter/nf_conntrack_l3proto.h |6 ++---
 include/net/netfilter/nf_conntrack_l4proto.h |   14 ++--
 include/net/netfilter/nf_conntrack_timeout.h |2 -
 net/netfilter/nf_conntrack_core.c|8 +++
 net/netfilter/nf_conntrack_netlink.c |6 ++---
 net/netfilter/nf_conntrack_proto.c   |   30 +--
 net/netfilter/nfnetlink_cttimeout.c  |5 ++--
 7 files changed, 36 insertions(+), 35 deletions(-)
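
The two const placements used in this series mean different things, which a short user-space sketch can make concrete. struct proto, proto_id() and sum_ids() are hypothetical stand-ins, not netfilter code.

```c
#include <stddef.h>

struct proto {
	int id;
};	/* hypothetical stand-in for struct nf_conntrack_l4proto */

/*
 * Pointer to const, as in "const struct nf_conntrack_l4proto *proto":
 * the callee promises not to modify the pointed-to structure.
 */
static int proto_id(const struct proto *p)
{
	return p->id;
}

/*
 * Array of const pointers, as in "struct nf_conntrack_l4proto * const proto[]":
 * the callee may not reseat the array slots, but the structures the
 * slots point to remain writable through them.
 */
static int sum_ids(struct proto * const protos[], size_t n)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += protos[i]->id;

	return sum;
}
```

A caller can pass an ordinary "struct proto *arr[]" to sum_ids() without a cast, since adding const at the first pointer level is an implicit conversion in C; that is why the array form can be introduced without touching call sites.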

Re: [PATCH net-next v12 1/4] net netlink: Add new type NLA_BITFIELD32

2017-07-30 Thread Jamal Hadi Salim


Jiri,

This is getting exhausting, seriously.
I posted the code you are commenting on two days ago so I don't have to
repost.

On D. Ahern: I don't think we are disagreeing anymore on the need to
generalize the check. He is saying it should be a helper and I already
had the validation data; either works. I don't see the gaping need
to remove the validation data.

cheers,
jamal

On 17-07-30 02:42 PM, Jiri Pirko wrote:

Sun, Jul 30, 2017 at 07:24:49PM CEST, j...@mojatatu.com wrote:

From: Jamal Hadi Salim 

Generic bitflags attribute content sent to the kernel by user.
With this netlink attr type the user can either set or unset a
flag in the kernel.

The value is a bitmap that defines the bit values being set.
The selector is a bitmask that defines which value bit is to be
considered.

A check is made to ensure that the attribute always conforms to the
bitflags the kernel already knows about, i.e.
if the user tries to set a bit flag that is not understood then
_it will be rejected_.

In the most basic form, the user specifies the attribute policy as:
[ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },

where myvalidflags is the bit mask of the flags the kernel understands.

If the user _does not_ provide myvalidflags then the attribute will
also be rejected.

Examples:
value = 0x0, and selector = 0x1
implies we are selecting bit 1 and we want to set its value to 0.

value = 0x2, and selector = 0x2
implies we are selecting bit 2 and we want to set its value to 1.
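
The value/selector semantics in the examples above can be sketched in user space as below. The names struct bitfield32, bitfield32_validate() and bitfield32_apply() are hypothetical, and the second validation check (value bits outside the selector) is an assumption of this sketch rather than something the description states.

```c
#include <stdint.h>

/* Hypothetical user-space mirror of the value/selector pair. */
struct bitfield32 {
	uint32_t value;		/* new values for the selected bits */
	uint32_t selector;	/* which bits the sender wants to change */
};

/* Returns 0 if the request is acceptable, -1 otherwise. */
static int bitfield32_validate(struct bitfield32 bf, uint32_t valid)
{
	/* Reject any selected bit the receiver does not know about. */
	if (bf.selector & ~valid)
		return -1;

	/* Assumption: value bits that are not selected are also an error. */
	if (bf.value & ~bf.selector)
		return -1;

	return 0;
}

/* Update only the selected bits of flags, leaving the rest unchanged. */
static uint32_t bitfield32_apply(uint32_t flags, struct bitfield32 bf)
{
	return (flags & ~bf.selector) | (bf.value & bf.selector);
}
```

With flags initially 0x3, a request of value 0x0 and selector 0x1 clears only the lowest bit; a request of value 0x2 and selector 0x2 sets only the second bit, whatever the other bits hold.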

Suggested-by: Jiri Pirko 
Signed-off-by: Jamal Hadi Salim 
---
include/net/netlink.h| 16 
include/uapi/linux/netlink.h | 17 +
lib/nlattr.c | 30 ++
3 files changed, 63 insertions(+)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index ef8e6c3..82dd298 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -178,6 +178,7 @@ enum {
NLA_S16,
NLA_S32,
NLA_S64,
+   NLA_BITFIELD32,
__NLA_TYPE_MAX,
};

@@ -206,6 +207,7 @@ enum {
  *NLA_MSECSLeaving the length field zero will verify the
  * given type fits, using it verifies minimum length
  * just like "All other"
+ *NLA_BITFIELD32  A 32-bit bitmap/bitselector attribute
  *All otherMinimum length of attribute payload
  *
  * Example:
@@ -213,11 +215,13 @@ enum {
  * [ATTR_FOO] = { .type = NLA_U16 },
  * [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
  * [ATTR_BAZ] = { .len = sizeof(struct mystruct) },
+ * [ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },


Checkpatch warns you about the line being too long, please wrap it.

Btw, I did not see you reached a consensus with DavidA regarding this.
Did I miss it?



  * };
  */
struct nla_policy {
u16 type;
u16 len;
+   void*validation_data;
};

/**
@@ -1203,6 +1207,18 @@ static inline struct in6_addr nla_get_in6_addr(const 
struct nlattr *nla)
}

/**
+ * nla_get_bitfield32 - return payload of 32 bitfield attribute
+ * @nla: nla_bitfield32 attribute
+ */
+static inline struct nla_bitfield32 nla_get_bitfield32(const struct nlattr *nla)
+{
+   struct nla_bitfield32 tmp;
+
+   nla_memcpy(&tmp, nla, sizeof(tmp));
+   return tmp;
+}
+
+/**
  * nla_memdup - duplicate attribute memory (kmemdup)
  * @src: netlink attribute to duplicate from
  * @gfp: GFP mask
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index f86127a..f4fc9c9 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -226,5 +226,22 @@ struct nlattr {
#define NLA_ALIGN(len)  (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN  ((int) NLA_ALIGN(sizeof(struct nlattr)))

+/* Generic 32 bitflags attribute content sent to the kernel.
+ *
+ * The value is a bitmap that defines the values being set
+ * The selector is a bitmask that defines which value is legit
+ *
+ * Examples:
+ *  value = 0x0, and selector = 0x1
+ *  implies we are selecting bit 1 and we want to set its value to 0.
+ *
+ *  value = 0x2, and selector = 0x2
+ *  implies we are selecting bit 2 and we want to set its value to 1.
+ *
+ */
+struct nla_bitfield32 {
+   __u32 value;
+   __u32 selector;
+};

#endif /* _UAPI__LINUX_NETLINK_H */
diff --git a/lib/nlattr.c b/lib/nlattr.c
index fb52435..ee79b7a 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -27,6 +27,30 @@
[NLA_S64]   = sizeof(s64),
};

+static int validate_nla_bitfield32(const struct nlattr *nla,
+  u32 *valid_flags_allowed)
+{
+   const struct nla_bitfield32 *bf = nla_data(nla);
+   u32 *valid_flags_mask = valid_flags_allowed;


I pointed this out already. This is weird.
You do *u32 = *u32, just with a different name. Just use valid_flags_allowed

Re: [PATCH v2] ravb: add wake-on-lan support via magic packet

2017-07-30 Thread Niklas Söderlund

Hi Andrew,

On 2017-07-30 19:07:38 +0200, Andrew Lunn wrote:
> Hi Niklas
> 
> > @@ -2041,6 +2073,11 @@ static int ravb_probe(struct platform_device *pdev)
> >  
> > priv->chip_id = chip_id;
> >  
> > +   /* Get clock, if not found that's OK but Wake-On-Lan is unavailable */
> > +   priv->clk = devm_clk_get(&pdev->dev, NULL);
> > +   if (IS_ERR(priv->clk))
> > +   priv->clk = NULL;
> 
> Can you get EPROBE_DEFER returned?

I don't think so, but I'm not sure :-)

The clock I'm trying to get is the module clock of the ravb itself, so
if that clock is not available (and enabled) no register writes to the
ravb would be possible in the first place, so I guess it's safe to
assume -EPROBE_DEFER can not happen here?

I'm just trying to play it safe here. Since the clock is only needed to
support WoL, I thought it best to not change behavior here. Try to get
the clock; if we can, great, we can do WoL; if not, then user-space will be
prevented from enabling WoL and nothing in the current behavior changes.

> 
> Andrew

-- 
Regards,
Niklas Söderlund

Re: [patch net-next 04/20] net: sched: use tcf_exts_has_actions in tcf_exts_exec

2017-07-30 Thread Jamal Hadi Salim


I am probably missing something. All those changes to just
replace "if (exts->nr_actions)" with "if (tcf_exts_has_actions(exts))" ?

cheers,
jamal

On 17-07-28 10:40 AM, Jiri Pirko wrote:

From: Jiri Pirko 

Use the tcf_exts_has_actions helper instead of directly testing
exts->nr_actions in tcf_exts_exec.

Signed-off-by: Jiri Pirko 
---
  include/net/pkt_cls.h | 46 +++---
  1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 7f25636..322a282 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -177,29 +177,6 @@ tcf_exts_stats_update(const struct tcf_exts *exts,
  }
  
  /**

- * tcf_exts_exec - execute tc filter extensions
- * @skb: socket buffer
- * @exts: tc filter extensions handle
- * @res: desired result
- *
- * Executes all configured extensions. Returns 0 on a normal execution,
- * a negative number if the filter must be considered unmatched or
- * a positive action code (TC_ACT_*) which must be returned to the
- * underlying layer.
- */
-static inline int
-tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
-  struct tcf_result *res)
-{
-#ifdef CONFIG_NET_CLS_ACT
-   if (exts->nr_actions)
-   return tcf_action_exec(skb, exts->actions, exts->nr_actions,
-  res);
-#endif
-   return 0;
-}
-
-/**
   * tcf_exts_has_actions - check if at least one action is present
   * @exts: tc filter extensions handle
   *
@@ -229,6 +206,29 @@ static inline bool tcf_exts_has_one_action(struct tcf_exts *exts)
  #endif
  }
  
+/**

+ * tcf_exts_exec - execute tc filter extensions
+ * @skb: socket buffer
+ * @exts: tc filter extensions handle
+ * @res: desired result
+ *
+ * Executes all configured extensions. Returns 0 on a normal execution,
+ * a negative number if the filter must be considered unmatched or
+ * a positive action code (TC_ACT_*) which must be returned to the
+ * underlying layer.
+ */
+static inline int
+tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
+ struct tcf_result *res)
+{
+#ifdef CONFIG_NET_CLS_ACT
+   if (tcf_exts_has_actions(exts))
+   return tcf_action_exec(skb, exts->actions, exts->nr_actions,
+  res);
+#endif
+   return 0;
+}
+
  int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
  struct nlattr **tb, struct nlattr *rate_tlv,
  struct tcf_exts *exts, bool ovr);

Re: [patch net-next 03/20] net: sched: change names of action number helpers to be aligned with the rest

2017-07-30 Thread Jamal Hadi Salim


On 17-07-28 10:40 AM, Jiri Pirko wrote:

From: Jiri Pirko 

The rest of the helpers are named tcf_exts_*, so change the name of
the action number helpers to be aligned. While at it, change to inline
functions.

Signed-off-by: Jiri Pirko 


Looks reasonable.

Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [PATCH net-next v12 4/4] net sched actions: add time filter for action dumping

2017-07-30 Thread Jiri Pirko

Sun, Jul 30, 2017 at 07:24:52PM CEST, j...@mojatatu.com wrote:
>From: Jamal Hadi Salim 
>
>This patch adds support for filtering based on time since last used.
>When we are dumping a large number of actions it is useful to
>have the option of filtering based on when the action was last
>used to reduce the amount of data crossing to user space.
>
>With this patch the user space app sets the TCA_ROOT_TIME_DELTA
>attribute with the value in milliseconds with "time of interest
>since now".  The kernel converts this to jiffies and does the
>filtering comparison matching entries that have seen activity
>since then and returns them to user space.
>Old kernels and old tc continue to work in legacy mode since
>they dont specify this attribute.
>
>An example (we have 400 actions bound to 400 filters): at
>installation time, using an updated tc and setting the time of
>interest to 120 seconds earlier, we see 400 actions:
>prompt$ hackedtc actions ls action gact since 12| grep index | wc -l
>400
>
>go get some coffee and wait for > 120 seconds and try again:
>
>prompt$ hackedtc actions ls action gact since 12 | grep index | wc -l
>0
>
>Lets see a filter bound to one of these actions:
>
>filter pref 10 u32
>filter pref 10 u32 fh 800: ht divisor 1
>filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule 
>hit 2 success 1)
>  match 7f02/ at 12 (success 1 )
>action order 1: gact action pass
> random type none pass val 0
> index 23 ref 2 bind 1 installed 1145 sec used 802 sec
>Action statistics:
>Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>
>that coffee took long, no? It was good.
>
>Now lets ping -c 1 127.0.0.2, then run the actions again:
>prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
>1
>
>More details please:
>prompt$ hackedtc -s actions ls action gact since 12
>
>action order 0: gact action pass
> random type none pass val 0
> index 23 ref 2 bind 1 installed 1270 sec used 30 sec
>Action statistics:
>Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>And the filter?
>
>filter pref 10 u32
>filter pref 10 u32 fh 800: ht divisor 1
>filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule 
>hit 4 success 2)
>  match 7f02/ at 12 (success 2 )
>action order 1: gact action pass
> random type none pass val 0
> index 23 ref 2 bind 1 installed 1324 sec used 84 sec
>Action statistics:
>Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>Signed-off-by: Jamal Hadi Salim 

Reviewed-by: Jiri Pirko

Re: [PATCH net-next v12 3/4] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-07-30 Thread Jiri Pirko

Sun, Jul 30, 2017 at 07:24:51PM CEST, j...@mojatatu.com wrote:
>From: Jamal Hadi Salim 
>
>When you dump hundreds of thousands of actions, getting only 32 per
>dump batch even when the socket buffer and memory allocations allow
>is inefficient.
>
>With this change, the user will get as many as possibly fitting
>within the given constraints available to the kernel.
>
>The top level action TLV space is extended. An attribute
>TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
>is set by the user indicating the user is capable of processing
>these large dumps. Older user space which doesn't set this flag
>doesn't get the larger (than 32) batches.
>The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
>actions are put in a single batch. As such user space app knows how long
>to iterate (independent of the type of action being dumped)
>instead of hardcoded maximum of 32 thus maintaining backward compat.
>
>Some results dumping 1.5M actions below:
>first an unpatched tc which doesnt understand these features...
>
>prompt$ time -p tc actions ls action gact | grep index | wc -l
>150
>real 1388.43
>user 2.07
>sys 1386.79
>
>Now lets see a patched tc which sets the correct flags when requesting
>a dump:
>
>prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
>150
>real 178.13
>user 2.02
>sys 176.96
>
>That is about 8x performance improvement for tc app which sets its
>receive buffer to about 32K.
>
>Signed-off-by: Jamal Hadi Salim 

If DavidA is ok with the "validation_data", I am fine with this patch.

Reviewed-by: Jiri Pirko

Re: [patch net-next 01/20] net: sched: sch_atm: use Qdisc_class_common structure

2017-07-30 Thread Jamal Hadi Salim


On 17-07-28 10:40 AM, Jiri Pirko wrote:

From: Jiri Pirko 

Even if it is only for classid now, use this common struct to be aligned
with the rest of the classful qdiscs.

Signed-off-by: Jiri Pirko 


Looks good to me.

Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [PATCH net-next v12 2/4] net sched actions: Use proper root attribute table for actions

2017-07-30 Thread Jiri Pirko

Sun, Jul 30, 2017 at 07:24:50PM CEST, j...@mojatatu.com wrote:
>From: Jamal Hadi Salim 
>
>Bug fix for an issue which has been around for about a decade.
>We got away with it because the enumeration was larger than needed.
>
>Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new 
>netlink API")
>Suggested-by: Jiri Pirko 
>Reviewed-by: Simon Horman 
>Signed-off-by: Jamal Hadi Salim 

Reviewed-by: Jiri Pirko

Re: [PATCH net-next v12 1/4] net netlink: Add new type NLA_BITFIELD32

2017-07-30 Thread Jiri Pirko

Sun, Jul 30, 2017 at 07:24:49PM CEST, j...@mojatatu.com wrote:
>From: Jamal Hadi Salim 
>
>Generic bitflags attribute content sent to the kernel by user.
>With this netlink attr type the user can either set or unset a
>flag in the kernel.
>
>The value is a bitmap that defines the bit values being set
>The selector is a bitmask that defines which value bit is to be
>considered.
>
>A check is made to ensure that a kernel subsystem only ever accepts
>bitflags the kernel already knows about, i.e.
>if the user tries to set a bit flag that is not understood then
>_it will be rejected_.
>
>In the most basic form, the user specifies the attribute policy as:
>[ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },
>
>where myvalidflags is the bit mask of the flags the kernel understands.
>
>If the user _does not_ provide myvalidflags then the attribute will
>also be rejected.
>
>Examples:
>value = 0x0, and selector = 0x1
>implies we are selecting bit 1 and we want to set its value to 0.
>
>value = 0x2, and selector = 0x2
>implies we are selecting bit 2 and we want to set its value to 1.
>
>Suggested-by: Jiri Pirko 
>Signed-off-by: Jamal Hadi Salim 
>---
> include/net/netlink.h| 16 
> include/uapi/linux/netlink.h | 17 +
> lib/nlattr.c | 30 ++
> 3 files changed, 63 insertions(+)
>
>diff --git a/include/net/netlink.h b/include/net/netlink.h
>index ef8e6c3..82dd298 100644
>--- a/include/net/netlink.h
>+++ b/include/net/netlink.h
>@@ -178,6 +178,7 @@ enum {
>   NLA_S16,
>   NLA_S32,
>   NLA_S64,
>+  NLA_BITFIELD32,
>   __NLA_TYPE_MAX,
> };
> 
>@@ -206,6 +207,7 @@ enum {
>  *NLA_MSECSLeaving the length field zero will verify the
>  * given type fits, using it verifies minimum length
>  * just like "All other"
>+ *NLA_BITFIELD32  A 32-bit bitmap/bitselector attribute
>  *All otherMinimum length of attribute payload
>  *
>  * Example:
>@@ -213,11 +215,13 @@ enum {
>  *[ATTR_FOO] = { .type = NLA_U16 },
>  *[ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
>  *[ATTR_BAZ] = { .len = sizeof(struct mystruct) },
>+ *[ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },

Checkpatch warns you about the line being too long, please wrap it.

Btw, I did not see you reached a consensus with DavidA regarding this.
Did I miss it?


>  * };
>  */
> struct nla_policy {
>   u16 type;
>   u16 len;
>+  void*validation_data;
> };
> 
> /**
>@@ -1203,6 +1207,18 @@ static inline struct in6_addr nla_get_in6_addr(const 
>struct nlattr *nla)
> }
> 
> /**
>+ * nla_get_bitfield32 - return payload of 32 bitfield attribute
>+ * @nla: nla_bitfield32 attribute
>+ */
>+static inline struct nla_bitfield32 nla_get_bitfield32(const struct nlattr *nla)
>+{
>+  struct nla_bitfield32 tmp;
>+
>+  nla_memcpy(&tmp, nla, sizeof(tmp));
>+  return tmp;
>+}
>+
>+/**
>  * nla_memdup - duplicate attribute memory (kmemdup)
>  * @src: netlink attribute to duplicate from
>  * @gfp: GFP mask
>diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
>index f86127a..f4fc9c9 100644
>--- a/include/uapi/linux/netlink.h
>+++ b/include/uapi/linux/netlink.h
>@@ -226,5 +226,22 @@ struct nlattr {
> #define NLA_ALIGN(len)(((len) + NLA_ALIGNTO - 1) & 
> ~(NLA_ALIGNTO - 1))
> #define NLA_HDRLEN((int) NLA_ALIGN(sizeof(struct nlattr)))
> 
>+/* Generic 32 bitflags attribute content sent to the kernel.
>+ *
>+ * The value is a bitmap that defines the values being set
>+ * The selector is a bitmask that defines which value is legit
>+ *
>+ * Examples:
>+ *  value = 0x0, and selector = 0x1
>+ *  implies we are selecting bit 1 and we want to set its value to 0.
>+ *
>+ *  value = 0x2, and selector = 0x2
>+ *  implies we are selecting bit 2 and we want to set its value to 1.
>+ *
>+ */
>+struct nla_bitfield32 {
>+  __u32 value;
>+  __u32 selector;
>+};
> 
> #endif /* _UAPI__LINUX_NETLINK_H */
>diff --git a/lib/nlattr.c b/lib/nlattr.c
>index fb52435..ee79b7a 100644
>--- a/lib/nlattr.c
>+++ b/lib/nlattr.c
>@@ -27,6 +27,30 @@
>   [NLA_S64]   = sizeof(s64),
> };
> 
>+static int validate_nla_bitfield32(const struct nlattr *nla,
>+ u32 *valid_flags_allowed)
>+{
>+  const struct nla_bitfield32 *bf = nla_data(nla);
>+  u32 *valid_flags_mask = valid_flags_allowed;

I pointed this out already. This is weird.
You do *u32 = *u32, just with a different name. Just use valid_flags_allowed
directly.


>+
>+  if (!valid_flags_allowed)
>+  return -EINVAL;
>+
>+  /*disallow invalid bit selector */

Fix all the comments in this function. Should be
/* something */
with spaces in front and at the end.


>+  if (bf->selector & ~*valid_flags_mask)
>+

Fw: [Bug 196533] New: kernel stack infoleaks

2017-07-30 Thread Stephen Hemminger

Begin forwarded message:

Date: Sun, 30 Jul 2017 05:13:08 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 196533] New: kernel stack infoleaks

https://bugzilla.kernel.org/show_bug.cgi?id=196533

Bug ID: 196533
   Summary: kernel stack infoleaks
   Product: Networking
   Version: 2.5
Kernel Version: 4.12.2
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: sohu0...@126.com
Regression: No

bug in net/irda/af_irda.c  

Sometimes irda_getsockopt() doesn't initialize all members of the list field of
the irda_device_list struct.  This structure is then copied to
userland, which leaks the contents of kernel stack memory.  We have to
initialize them to zero, or it will allow local users to obtain potentially
sensitive information from kernel stack memory by reading a copy of this
structure.
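
As a rough illustration of the fix pattern described above (not the actual
af_irda.c change), the sketch below zeroes a stack structure before only part
of it is filled in. struct demo_list, struct demo_entry and demo_fill_list()
are hypothetical stand-ins; the real irda_device_list layout is not
reproduced here.

```c
#include <string.h>

#define DEMO_MAX_ENTRIES 4

/* Hypothetical stand-in for one discovered device. */
struct demo_entry {
	unsigned int addr;
	char name[12];
};

/* Hypothetical stand-in for a list that is later copied out whole. */
struct demo_list {
	unsigned int len;
	struct demo_entry dev[DEMO_MAX_ENTRIES];
};

/*
 * Fill in only the first 'found' entries. The memset() up front guarantees
 * that unused entries and any padding hold zeroes rather than whatever was
 * previously on the stack.
 */
static void demo_fill_list(struct demo_list *list, unsigned int found)
{
	unsigned int i;

	memset(list, 0, sizeof(*list));
	if (found > DEMO_MAX_ENTRIES)
		found = DEMO_MAX_ENTRIES;
	list->len = found;
	for (i = 0; i < found; i++) {
		list->dev[i].addr = 0x100 + i;
		/* copy at most size - 1 bytes; the last byte stays zero */
		strncpy(list->dev[i].name, "dev", sizeof(list->dev[i].name) - 1);
	}
}
```

Without the memset(), every entry past 'found' would be copied out with stale
contents, which is the kind of leak this report describes.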

https://github.com/torvalds/linux/pull/440

-- 
You are receiving this mail because:
You are the assignee for the bug.

[PATCH v2 net-next 4/4] net: dsa: lan9303: MDIO access phy registers directly

2017-07-30 Thread Egil Hjelmeland

Indirect access (PMI) to PHY registers only works in I2C mode. In
MDIO mode PHY registers must be accessed directly. Introduce
struct lan9303_phy_ops to handle the two modes.
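
The ops-table idea can be sketched in plain C as follows. This is only an
illustration under stated assumptions: all demo_* names are hypothetical, the
"hardware" is a fake register array, and the indirect path merely counts
accesses instead of talking to PMI registers.

```c
#include <stdint.h>

struct demo_chip;

/* Same shape as the ops struct described above; names are illustrative. */
struct demo_phy_ops {
	int (*phy_read)(struct demo_chip *chip, int addr, int regnum);
	int (*phy_write)(struct demo_chip *chip, int addr, int regnum,
			 uint16_t val);
};

struct demo_chip {
	const struct demo_phy_ops *ops; /* chosen once by the bus front end */
	uint16_t regs[32];              /* fake PHY register file */
	int indirect_accesses;          /* counts trips through the indirect path */
};

static int demo_indirect_read(struct demo_chip *chip, int addr, int regnum)
{
	(void)addr;
	chip->indirect_accesses++;
	return chip->regs[regnum & 31];
}

static int demo_indirect_write(struct demo_chip *chip, int addr, int regnum,
			       uint16_t val)
{
	(void)addr;
	chip->indirect_accesses++;
	chip->regs[regnum & 31] = val;
	return 0;
}

static int demo_direct_read(struct demo_chip *chip, int addr, int regnum)
{
	(void)addr;
	return chip->regs[regnum & 31];
}

static int demo_direct_write(struct demo_chip *chip, int addr, int regnum,
			     uint16_t val)
{
	(void)addr;
	chip->regs[regnum & 31] = val;
	return 0;
}

static const struct demo_phy_ops demo_indirect_ops = {
	.phy_read = demo_indirect_read,
	.phy_write = demo_indirect_write,
};

static const struct demo_phy_ops demo_direct_ops = {
	.phy_read = demo_direct_read,
	.phy_write = demo_direct_write,
};

/* Core code never needs to know which bus it is running on. */
static int demo_phy_read(struct demo_chip *chip, int addr, int regnum)
{
	return chip->ops->phy_read(chip, addr, regnum);
}

static int demo_phy_write(struct demo_chip *chip, int addr, int regnum,
			  uint16_t val)
{
	return chip->ops->phy_write(chip, addr, regnum, val);
}
```

The bus-specific probe code sets chip->ops once, and everything in the core
calls through the pointer, so no "which mode am I in" checks are needed at
the call sites.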

Signed-off-by: Egil Hjelmeland 
Reviewed-by: Andrew Lunn 
Reviewed-by: Vivien Didelot 
Reviewed-by: Florian Fainelli 
---
 drivers/net/dsa/lan9303-core.c | 20 +---
 drivers/net/dsa/lan9303.h  | 11 +++
 drivers/net/dsa/lan9303_i2c.c  |  2 ++
 drivers/net/dsa/lan9303_mdio.c | 21 +
 4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 9427c3b0ced2..8e430d1ee297 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -334,6 +334,12 @@ static int lan9303_indirect_phy_write(struct lan9303 
*chip, int addr,
return ret;
 }
 
+const struct lan9303_phy_ops lan9303_indirect_phy_ops = {
+   .phy_read = lan9303_indirect_phy_read,
+   .phy_write = lan9303_indirect_phy_write,
+};
+EXPORT_SYMBOL_GPL(lan9303_indirect_phy_ops);
+
 static int lan9303_switch_wait_for_completion(struct lan9303 *chip)
 {
int ret, i;
@@ -435,7 +441,7 @@ static int lan9303_detect_phy_setup(struct lan9303 *chip)
 * 0x, which means 'phy_addr_sel_strap' is 1 and the IDs are 1-2-3.
 * 0x is returned on MDIO read with no response.
 */
-   reg = lan9303_indirect_phy_read(chip, 3, MII_LAN911X_SPECIAL_MODES);
+   reg = chip->ops->phy_read(chip, 3, MII_LAN911X_SPECIAL_MODES);
if (reg < 0) {
dev_err(chip->dev, "Failed to detect phy config: %d\n", reg);
return reg;
@@ -726,7 +732,7 @@ static int lan9303_phy_read(struct dsa_switch *ds, int phy, 
int regnum)
if (phy > phy_base + 2)
return -ENODEV;
 
-   return lan9303_indirect_phy_read(chip, phy, regnum);
+   return chip->ops->phy_read(chip, phy, regnum);
 }
 
 static int lan9303_phy_write(struct dsa_switch *ds, int phy, int regnum,
@@ -740,7 +746,7 @@ static int lan9303_phy_write(struct dsa_switch *ds, int 
phy, int regnum,
if (phy > phy_base + 2)
return -ENODEV;
 
-   return lan9303_indirect_phy_write(chip, phy, regnum, val);
+   return chip->ops->phy_write(chip, phy, regnum, val);
 }
 
 static int lan9303_port_enable(struct dsa_switch *ds, int port,
@@ -773,13 +779,13 @@ static void lan9303_port_disable(struct dsa_switch *ds, 
int port,
switch (port) {
case 1:
lan9303_disable_packet_processing(chip, LAN9303_PORT_1_OFFSET);
-   lan9303_indirect_phy_write(chip, chip->phy_addr_sel_strap + 1,
-  MII_BMCR, BMCR_PDOWN);
+   lan9303_phy_write(ds, chip->phy_addr_sel_strap + 1,
+ MII_BMCR, BMCR_PDOWN);
break;
case 2:
lan9303_disable_packet_processing(chip, LAN9303_PORT_2_OFFSET);
-   lan9303_indirect_phy_write(chip, chip->phy_addr_sel_strap + 2,
-  MII_BMCR, BMCR_PDOWN);
+   lan9303_phy_write(ds, chip->phy_addr_sel_strap + 2,
+ MII_BMCR, BMCR_PDOWN);
break;
default:
dev_dbg(chip->dev,
diff --git a/drivers/net/dsa/lan9303.h b/drivers/net/dsa/lan9303.h
index d1512dad2d90..4d8be555ff4d 100644
--- a/drivers/net/dsa/lan9303.h
+++ b/drivers/net/dsa/lan9303.h
@@ -2,6 +2,15 @@
 #include 
 #include 
 
+struct lan9303;
+
+struct lan9303_phy_ops {
+   /* PHY 1 and 2 access*/
+   int (*phy_read)(struct lan9303 *chip, int port, int regnum);
+   int (*phy_write)(struct lan9303 *chip, int port,
+int regnum, u16 val);
+};
+
 struct lan9303 {
struct device *dev;
struct regmap *regmap;
@@ -11,9 +20,11 @@ struct lan9303 {
bool phy_addr_sel_strap;
struct dsa_switch *ds;
struct mutex indirect_mutex; /* protect indexed register access */
+   const struct lan9303_phy_ops *ops;
 };
 
 extern const struct regmap_access_table lan9303_register_set;
+extern const struct lan9303_phy_ops lan9303_indirect_phy_ops;
 
 int lan9303_probe(struct lan9303 *chip, struct device_node *np);
 int lan9303_remove(struct lan9303 *chip);
diff --git a/drivers/net/dsa/lan9303_i2c.c b/drivers/net/dsa/lan9303_i2c.c
index ab3ce0da5071..24ec20f7f444 100644
--- a/drivers/net/dsa/lan9303_i2c.c
+++ b/drivers/net/dsa/lan9303_i2c.c
@@ -63,6 +63,8 @@ static int lan9303_i2c_probe(struct i2c_client *client,
i2c_set_clientdata(client, sw_dev);
sw_dev->chip.dev = &client->dev;
 
+   sw_dev->chip.ops = &lan9303_indirect_phy_ops;
+
ret = lan9303_probe(&sw_dev->chip, client->dev.of_node);
if (ret != 0)
return ret;
diff --git a/drivers/net/dsa/lan9303_mdio.c

[PATCH v2 net-next 3/4] net: dsa: lan9303: Renamed indirect phy access functions

2017-07-30 Thread Egil Hjelmeland

Preparing for the following fix of MDIO phy access:

Renamed functions that access PHY 1 and 2 indirectly through PMI
registers.

 lan9303_port_phy_reg_wait_for_completion() to
 lan9303_indirect_phy_wait_for_completion()

 lan9303_port_phy_reg_read() to
 lan9303_indirect_phy_read()

 lan9303_port_phy_reg_write() to
 lan9303_indirect_phy_write()

Also changed "val" parameter of lan9303_indirect_phy_write() to u16,
for clarity.

Signed-off-by: Egil Hjelmeland 
Reviewed-by: Andrew Lunn 
Reviewed-by: Vivien Didelot 
Reviewed-by: Florian Fainelli 
---
 drivers/net/dsa/lan9303-core.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 96ebeb9bd59a..9427c3b0ced2 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -248,7 +248,7 @@ static int lan9303_virt_phy_reg_write(struct lan9303 *chip, 
int regnum, u16 val)
return regmap_write(chip->regmap, LAN9303_VIRT_PHY_BASE + regnum, val);
 }
 
-static int lan9303_port_phy_reg_wait_for_completion(struct lan9303 *chip)
+static int lan9303_indirect_phy_wait_for_completion(struct lan9303 *chip)
 {
int ret, i;
u32 reg;
@@ -268,7 +268,7 @@ static int lan9303_port_phy_reg_wait_for_completion(struct 
lan9303 *chip)
return -EIO;
 }
 
-static int lan9303_port_phy_reg_read(struct lan9303 *chip, int addr, int 
regnum)
+static int lan9303_indirect_phy_read(struct lan9303 *chip, int addr, int 
regnum)
 {
int ret;
u32 val;
@@ -278,7 +278,7 @@ static int lan9303_port_phy_reg_read(struct lan9303 *chip, 
int addr, int regnum)
 
mutex_lock(&chip->indirect_mutex);
 
-   ret = lan9303_port_phy_reg_wait_for_completion(chip);
+   ret = lan9303_indirect_phy_wait_for_completion(chip);
if (ret)
goto on_error;
 
@@ -287,7 +287,7 @@ static int lan9303_port_phy_reg_read(struct lan9303 *chip, 
int addr, int regnum)
if (ret)
goto on_error;
 
-   ret = lan9303_port_phy_reg_wait_for_completion(chip);
+   ret = lan9303_indirect_phy_wait_for_completion(chip);
if (ret)
goto on_error;
 
@@ -305,8 +305,8 @@ static int lan9303_port_phy_reg_read(struct lan9303 *chip, 
int addr, int regnum)
return ret;
 }
 
-static int lan9303_phy_reg_write(struct lan9303 *chip, int addr, int regnum,
-unsigned int val)
+static int lan9303_indirect_phy_write(struct lan9303 *chip, int addr,
+ int regnum, u16 val)
 {
int ret;
u32 reg;
@@ -317,7 +317,7 @@ static int lan9303_phy_reg_write(struct lan9303 *chip, int 
addr, int regnum,
 
mutex_lock(&chip->indirect_mutex);
 
-   ret = lan9303_port_phy_reg_wait_for_completion(chip);
+   ret = lan9303_indirect_phy_wait_for_completion(chip);
if (ret)
goto on_error;
 
@@ -435,7 +435,7 @@ static int lan9303_detect_phy_setup(struct lan9303 *chip)
 * 0x, which means 'phy_addr_sel_strap' is 1 and the IDs are 1-2-3.
 * 0x is returned on MDIO read with no response.
 */
-   reg = lan9303_port_phy_reg_read(chip, 3, MII_LAN911X_SPECIAL_MODES);
+   reg = lan9303_indirect_phy_read(chip, 3, MII_LAN911X_SPECIAL_MODES);
if (reg < 0) {
dev_err(chip->dev, "Failed to detect phy config: %d\n", reg);
return reg;
@@ -726,7 +726,7 @@ static int lan9303_phy_read(struct dsa_switch *ds, int phy, 
int regnum)
if (phy > phy_base + 2)
return -ENODEV;
 
-   return lan9303_port_phy_reg_read(chip, phy, regnum);
+   return lan9303_indirect_phy_read(chip, phy, regnum);
 }
 
 static int lan9303_phy_write(struct dsa_switch *ds, int phy, int regnum,
@@ -740,7 +740,7 @@ static int lan9303_phy_write(struct dsa_switch *ds, int 
phy, int regnum,
if (phy > phy_base + 2)
return -ENODEV;
 
-   return lan9303_phy_reg_write(chip, phy, regnum, val);
+   return lan9303_indirect_phy_write(chip, phy, regnum, val);
 }
 
 static int lan9303_port_enable(struct dsa_switch *ds, int port,
@@ -773,13 +773,13 @@ static void lan9303_port_disable(struct dsa_switch *ds, 
int port,
switch (port) {
case 1:
lan9303_disable_packet_processing(chip, LAN9303_PORT_1_OFFSET);
-   lan9303_phy_reg_write(chip, chip->phy_addr_sel_strap + 1,
- MII_BMCR, BMCR_PDOWN);
+   lan9303_indirect_phy_write(chip, chip->phy_addr_sel_strap + 1,
+  MII_BMCR, BMCR_PDOWN);
break;
case 2:
lan9303_disable_packet_processing(chip, LAN9303_PORT_2_OFFSET);
-   lan9303_phy_reg_write(chip, chip->phy_addr_sel_strap + 2,
-

[PATCH v2 net-next 0/4] net: dsa: lan9303: Fix MDIO issues.

2017-07-30 Thread Egil Hjelmeland

This series fixes the MDIO interface for the lan9303 DSA driver.
The bugs were found after testing on actual HW.

This series is extracted from the first patch of my first large
series. Significant changes from that version are:
 - use mdiobus_write_nested, mdiobus_read_nested.
 - EXPORT lan9303_indirect_phy_ops

Unfortunately I do not have access to an I2C-based system for
testing.

Changes from first version:
 - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL


Egil Hjelmeland (4):
  net: dsa: lan9303: Fix lan9303_detect_phy_setup() for MDIO
  net: dsa: lan9303: Multiply by 4 to get MDIO register
  net: dsa: lan9303: Renamed indirect phy access functions
  net: dsa: lan9303: MDIO access phy registers directly

 drivers/net/dsa/lan9303-core.c | 43 +++---
 drivers/net/dsa/lan9303.h  | 11 +++
 drivers/net/dsa/lan9303_i2c.c  |  2 ++
 drivers/net/dsa/lan9303_mdio.c | 23 ++
 4 files changed, 64 insertions(+), 15 deletions(-)

-- 
2.11.0

[PATCH v2 net-next 1/4] net: dsa: lan9303: Fix lan9303_detect_phy_setup() for MDIO

2017-07-30 Thread Egil Hjelmeland

Handle the case where an MDIO read with no response returns 0x.

Signed-off-by: Egil Hjelmeland 
Reviewed-by: Andrew Lunn 
Reviewed-by: Vivien Didelot 
Reviewed-by: Florian Fainelli 
---
 drivers/net/dsa/lan9303-core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index cd76e61f1fca..9d0ab77edb4a 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -427,6 +427,7 @@ static int lan9303_detect_phy_setup(struct lan9303 *chip)
 * Special reg 18 of phy 3 reads as 0x, if 'phy_addr_sel_strap' is 0
 * and the IDs are 0-1-2, else it contains something different from
 * 0x, which means 'phy_addr_sel_strap' is 1 and the IDs are 1-2-3.
+* 0x is returned on MDIO read with no response.
 */
reg = lan9303_port_phy_reg_read(chip, 3, MII_LAN911X_SPECIAL_MODES);
if (reg < 0) {
@@ -434,7 +435,7 @@ static int lan9303_detect_phy_setup(struct lan9303 *chip)
return reg;
}
 
-   if (reg != 0)
+   if ((reg != 0) && (reg != 0x))
chip->phy_addr_sel_strap = 1;
else
chip->phy_addr_sel_strap = 0;
-- 
2.11.0

[PATCH v2 net-next 2/4] net: dsa: lan9303: Multiply by 4 to get MDIO register

2017-07-30 Thread Egil Hjelmeland

lan9303_mdio_write()/_read() must multiply the register number by 4 to get
the offset.

Added some comments to the register definitions.
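
A small stand-alone sketch of the addressing rule, for illustration only.
The demo_* names and the fake 16-bit register array are hypothetical, and
the 16-bit mask (0xffff) used to split a 32-bit value into two halves is an
assumption made for this example.

```c
#include <stdint.h>

#define DEMO_NREGS 1024

/* Fake 16-bit register space, indexed by MDIO address offset. */
static uint16_t demo_mdio_regs[DEMO_NREGS];

/* Register number to address offset: multiply by 4. */
static uint32_t demo_reg_to_offset(uint32_t reg)
{
	return reg << 2;
}

/* Write a 32-bit value as two 16-bit halves at offset and offset + 2. */
static int demo_write32(uint32_t reg, uint32_t val)
{
	uint32_t off = demo_reg_to_offset(reg);

	if (off + 2 >= DEMO_NREGS)
		return -1;
	demo_mdio_regs[off] = val & 0xffff;
	demo_mdio_regs[off + 2] = (val >> 16) & 0xffff;
	return 0;
}

/* Read the two 16-bit halves back and combine them. */
static int demo_read32(uint32_t reg, uint32_t *val)
{
	uint32_t off = demo_reg_to_offset(reg);

	if (off + 2 >= DEMO_NREGS)
		return -1;
	*val = demo_mdio_regs[off];
	*val |= (uint32_t)demo_mdio_regs[off + 2] << 16;
	return 0;
}
```

For register number 0x14 the offset is 0x50, so the low half lands at 0x50
and the high half at 0x52. Without the shift, the access would go to 0x14
and 0x16 instead.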

Signed-off-by: Egil Hjelmeland 
Reviewed-by: Andrew Lunn 
Reviewed-by: Vivien Didelot 
Reviewed-by: Florian Fainelli 
---
 drivers/net/dsa/lan9303-core.c | 6 ++
 drivers/net/dsa/lan9303_mdio.c | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 9d0ab77edb4a..96ebeb9bd59a 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -20,6 +20,9 @@
 
 #include "lan9303.h"
 
+/* 13.2 System Control and Status Registers
+ * Multiply register number by 4 to get address offset.
+ */
 #define LAN9303_CHIP_REV 0x14
 # define LAN9303_CHIP_ID 0x9303
 #define LAN9303_IRQ_CFG 0x15
@@ -53,6 +56,9 @@
 #define LAN9303_VIRT_PHY_BASE 0x70
 #define LAN9303_VIRT_SPECIAL_CTRL 0x77
 
+/*13.4 Switch Fabric Control and Status Registers
+ * Accessed indirectly via SWITCH_CSR_CMD, SWITCH_CSR_DATA.
+ */
 #define LAN9303_SW_DEV_ID 0x
 #define LAN9303_SW_RESET 0x0001
 #define LAN9303_SW_RESET_RESET BIT(0)
diff --git a/drivers/net/dsa/lan9303_mdio.c b/drivers/net/dsa/lan9303_mdio.c
index 93c36c0541cf..2db7970fc88c 100644
--- a/drivers/net/dsa/lan9303_mdio.c
+++ b/drivers/net/dsa/lan9303_mdio.c
@@ -40,6 +40,7 @@ static int lan9303_mdio_write(void *ctx, uint32_t reg, 
uint32_t val)
 {
struct lan9303_mdio *sw_dev = (struct lan9303_mdio *)ctx;
 
+   reg <<= 2; /* reg num to offset */
mutex_lock(&sw_dev->device->bus->mdio_lock);
lan9303_mdio_real_write(sw_dev->device, reg, val & 0x);
lan9303_mdio_real_write(sw_dev->device, reg + 2, (val >> 16) & 0x);
@@ -57,6 +58,7 @@ static int lan9303_mdio_read(void *ctx, uint32_t reg, 
uint32_t *val)
 {
struct lan9303_mdio *sw_dev = (struct lan9303_mdio *)ctx;
 
+   reg <<= 2; /* reg num to offset */
mutex_lock(&sw_dev->device->bus->mdio_lock);
*val = lan9303_mdio_real_read(sw_dev->device, reg);
*val |= (lan9303_mdio_real_read(sw_dev->device, reg + 2) << 16);
-- 
2.11.0

[PATCH net-next] net: fec: Allow reception of frames bigger than 1522 bytes

2017-07-30 Thread Andrew Lunn

The FEC Receive Control Register has a 14 bit field indicating the
longest frame that may be received. It is being set to 1522. Frames
longer than this are discarded, but counted as being in error.

When using DSA, frames from the switch have an additional header,
either 4 or 8 bytes if a Marvell switch is used. Thus a full MTU frame
of 1522 bytes received by the switch on a port becomes 1530 bytes when
passed to the host via the FEC interface.

Change the maximum receive size to 2048 - 64, where 64 is the maximum
rx_alignment applied on the receive buffer for AVB capable FEC
cores. Use this value also for the maximum receive buffer size. The
driver is already allocating a receive SKB of 2048 bytes, so this
change should not have any significant effects.
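
The size arithmetic can be checked with a few lines of stand-alone C. This
is a sketch only: DEMO_ROUND_DOWN() is a local power-of-two round-down macro
written for this example, and demo_frame_fits() is a hypothetical helper,
not driver code.

```c
/* Round x down to a multiple of y; y must be a power of two. */
#define DEMO_ROUND_DOWN(x, y)	((x) & ~((y) - 1u))

#define DEMO_SKB_SIZE		2048u	/* receive buffer allocated per frame */
#define DEMO_MAX_ALIGN		64u	/* worst-case rx alignment */
#define DEMO_PKT_MAXBUF_SIZE \
	DEMO_ROUND_DOWN(DEMO_SKB_SIZE - DEMO_MAX_ALIGN, DEMO_MAX_ALIGN)

/* Returns 1 if a frame plus its switch tag fits in the receive limit. */
static int demo_frame_fits(unsigned int frame_len, unsigned int tag_len)
{
	return frame_len + tag_len <= DEMO_PKT_MAXBUF_SIZE;
}
```

2048 - 64 is 1984, which is already a multiple of 64, so the limit becomes
1984 bytes. That still fits in a 14 bit field and leaves room for the 1530
byte frames mentioned above.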

Tested on imx51, imx6, vf610.

Signed-off-by: Andrew Lunn 
---
 drivers/net/ethernet/freescale/fec_main.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index a6e323f15637..47ee74a17a9f 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -173,10 +173,12 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 #endif /* CONFIG_M5272 */
 
 /* The FEC stores dest/src/type/vlan, data, and checksum for receive packets.
+ *
+ * 2048 byte skbufs are allocated. However, alignment requirements
+ * varies between FEC variants. Worst case is 64, so round down by 64.
  */
-#define PKT_MAXBUF_SIZE 1522
+#define PKT_MAXBUF_SIZE (round_down(2048 - 64, 64))
 #define PKT_MINBUF_SIZE 64
-#define PKT_MAXBLR_SIZE 1536
 
 /* FEC receive acceleration */
 #define FEC_RACC_IPDIS (1 << 1)
@@ -851,7 +853,7 @@ static void fec_enet_enable_ring(struct net_device *ndev)
for (i = 0; i < fep->num_rx_queues; i++) {
rxq = fep->rx_queue[i];
writel(rxq->bd.dma, fep->hwp + FEC_R_DES_START(i));
-   writel(PKT_MAXBLR_SIZE, fep->hwp + FEC_R_BUFF_SIZE(i));
+   writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_R_BUFF_SIZE(i));
 
/* enable DMA1/2 */
if (i)
-- 
2.13.2

[PATCH net-next v12 4/4] net sched actions: add time filter for action dumping

2017-07-30 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

This patch adds support for filtering based on time since last used.
When we are dumping a large number of actions it is useful to
have the option of filtering based on when the action was last
used to reduce the amount of data crossing to user space.

With this patch the user space app sets the TCA_ROOT_TIME_DELTA
attribute with the value in milliseconds with "time of interest
since now".  The kernel converts this to jiffies and does the
filtering comparison matching entries that have seen activity
since then and returns them to user space.
Old kernels and old tc continue to work in legacy mode since
they don't specify this attribute.
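
The comparison described above can be sketched in user space as
follows. This is a simplified model, not the kernel code: the tick rate
SKETCH_HZ, the rounding in the conversion, and all function names are
assumptions made for illustration. Only the wrap-safe "after" test and
the "skip if last used before the window" rule come from the patch.

```c
#define SKETCH_HZ 250UL /* assumed tick rate, for illustration only */

/* Wrap-safe "a is after b", the same idea as the kernel's time_after(). */
static int sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Convert milliseconds to ticks, rounding up (simplified). */
static unsigned long sketch_msecs_to_jiffies(unsigned long msecs)
{
	return (msecs * SKETCH_HZ + 999UL) / 1000UL;
}

/* Tick value marking the start of the window; 0 means "no filter". */
static unsigned long sketch_jiffy_since(unsigned long now,
					unsigned long msecs_since)
{
	if (!msecs_since)
		return 0;
	return now - sketch_msecs_to_jiffies(msecs_since);
}

/* Returns 1 when the action should be left out of the dump. */
static int sketch_skip_action(unsigned long jiffy_since,
			      unsigned long lastuse)
{
	return jiffy_since && sketch_time_after(jiffy_since, lastuse);
}
```

An action whose last use is older than the start of the window is
skipped; with no attribute set (jiffy_since of 0) nothing is skipped,
which is the legacy behaviour.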

Some examples (we have 400 actions bound to 400 filters). At
installation time, using an updated tc and setting the time of
interest to 120 seconds earlier, we see all 400 actions:
prompt$ hackedtc actions ls action gact since 12| grep index | wc -l
400

Go get some coffee, wait for more than 120 seconds, and try again:

prompt$ hackedtc actions ls action gact since 12 | grep index | wc -l
0

Let's see a filter bound to one of these actions:

filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule 
hit 2 success 1)
  match 7f02/ at 12 (success 1 )
action order 1: gact action pass
 random type none pass val 0
 index 23 ref 2 bind 1 installed 1145 sec used 802 sec
Action statistics:
Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0


That coffee took long, no? It was good.

Now let's ping -c 1 127.0.0.2, then run the actions again:
prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
1

More details please:
prompt$ hackedtc -s actions ls action gact since 12

action order 0: gact action pass
 random type none pass val 0
 index 23 ref 2 bind 1 installed 1270 sec used 30 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

And the filter?

filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule 
hit 4 success 2)
  match 7f02/ at 12 (success 2 )
action order 1: gact action pass
 random type none pass val 0
 index 23 ref 2 bind 1 installed 1324 sec used 84 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim 
---
 include/uapi/linux/rtnetlink.h |  1 +
 net/sched/act_api.c| 21 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index bfa80a6..dab7dad 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -691,6 +691,7 @@ enum {
 #define TCAA_MAX TCA_ROOT_TAB
TCA_ROOT_FLAGS,
TCA_ROOT_COUNT,
+   TCA_ROOT_TIME_DELTA, /* in msecs */
__TCA_ROOT_MAX,
 #define TCA_ROOT_MAX (__TCA_ROOT_MAX - 1)
 };
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index d53653a..f19b118 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -111,6 +111,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
 {
int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
u32 act_flags = cb->args[2];
+   unsigned long jiffy_since = cb->args[3];
struct nlattr *nest;
 
spin_lock_bh(&hinfo->lock);
@@ -128,6 +129,11 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
if (index < s_i)
continue;
 
+   if (jiffy_since &&
+   time_after(jiffy_since,
+  (unsigned long)p->tcfa_tm.lastuse))
+   continue;
+
nest = nla_nest_start(skb, n_i);
if (nest == NULL)
goto nla_put_failure;
@@ -145,9 +151,11 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
}
}
 done:
+   if (index >= 0)
+   cb->args[0] = index + 1;
+
spin_unlock_bh(&hinfo->lock);
if (n_i) {
-   cb->args[0] += n_i;
if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
cb->args[1] = n_i;
}
@@ -1077,6 +1085,7 @@ static int tcf_action_add(struct net *net, struct nlattr 
*nla,
 static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = {
[TCA_ROOT_FLAGS] = { .type = NLA_BITFIELD32,
 .validation_data = &tcaa_root_flags_allowed },
+   [TCA_ROOT_TIME_DELTA]  = { .type = NLA_U32 },
 };
 
 static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n,
@@ -1166,8 +1175,10 @@ static int tc_dump_action(struct

[PATCH net-next v12 1/4] net netlink: Add new type NLA_BITFIELD32

2017-07-30 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Generic bitflags attribute content sent to the kernel by user.
With this netlink attr type the user can either set or unset a
flag in the kernel.

The value is a bitmap that defines the bit values being set.
The selector is a bitmask that defines which value bits are to be
considered.

A check is made to ensure that a kernel subsystem only accepts
bitflags the kernel already knows about, i.e. if the user tries to
set a bit flag that is not understood then _it will be rejected_.

In the most basic form, the user specifies the attribute policy as:
[ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },

where myvalidflags is the bit mask of the flags the kernel understands.

If the user _does not_ provide myvalidflags then the attribute will
also be rejected.

Examples:
value = 0x0, and selector = 0x1
implies we are selecting bit 1 and we want to set its value to 0.

value = 0x2, and selector = 0x2
implies we are selecting bit 2 and we want to set its value to 1.
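
The value/selector rules can be tried out in user space with the sketch
below. The three validation checks mirror the ones in
validate_nla_bitfield32() from this patch; the struct and function
names are local to the example, and bitfield32_apply() is a
hypothetical helper showing one way a subsystem might merge the
selected bits into its existing flags. It is not part of the patch.

```c
#include <errno.h>
#include <stdint.h>

struct bitfield32 {
	uint32_t value;    /* bit values to apply */
	uint32_t selector; /* which bits of value are meaningful */
};

/* Same three checks as the patch: unknown selector bits, unknown value
 * bits, and value bits that are set but not selected are all rejected.
 */
static int bitfield32_validate(const struct bitfield32 *bf,
			       const uint32_t *allowed)
{
	if (!allowed)
		return -EINVAL;
	if (bf->selector & ~*allowed)
		return -EINVAL;
	if (bf->value & ~*allowed)
		return -EINVAL;
	if (bf->value & ~bf->selector)
		return -EINVAL;
	return 0;
}

/* Convenience wrapper taking plain integers. */
static int bitfield32_check(uint32_t value, uint32_t selector,
			    uint32_t allowed)
{
	struct bitfield32 bf = { .value = value, .selector = selector };

	return bitfield32_validate(&bf, &allowed);
}

/* Hypothetical helper: replace only the selected bits of old_flags. */
static uint32_t bitfield32_apply(uint32_t old_flags, uint32_t value,
				 uint32_t selector)
{
	return (old_flags & ~selector) | (value & selector);
}
```

With an allowed mask of 0x3, both examples above validate; selecting a
bit outside the mask, or setting a value bit that is not selected, is
rejected with -EINVAL.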

Suggested-by: Jiri Pirko 
Signed-off-by: Jamal Hadi Salim 
---
 include/net/netlink.h| 16 
 include/uapi/linux/netlink.h | 17 +
 lib/nlattr.c | 30 ++
 3 files changed, 63 insertions(+)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index ef8e6c3..82dd298 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -178,6 +178,7 @@ enum {
NLA_S16,
NLA_S32,
NLA_S64,
+   NLA_BITFIELD32,
__NLA_TYPE_MAX,
 };
 
@@ -206,6 +207,7 @@ enum {
  *NLA_MSECSLeaving the length field zero will verify the
  * given type fits, using it verifies minimum length
  * just like "All other"
+ *NLA_BITFIELD32  A 32-bit bitmap/bitselector attribute
  *All otherMinimum length of attribute payload
  *
  * Example:
@@ -213,11 +215,13 @@ enum {
  * [ATTR_FOO] = { .type = NLA_U16 },
  * [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
  * [ATTR_BAZ] = { .len = sizeof(struct mystruct) },
+ * [ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },
  * };
  */
 struct nla_policy {
u16 type;
u16 len;
+   void*validation_data;
 };
 
 /**
@@ -1203,6 +1207,18 @@ static inline struct in6_addr nla_get_in6_addr(const 
struct nlattr *nla)
 }
 
 /**
+ * nla_get_bitfield32 - return payload of 32 bitfield attribute
+ * @nla: nla_bitfield32 attribute
+ */
+static inline struct nla_bitfield32 nla_get_bitfield32(const struct nlattr 
*nla)
+{
+   struct nla_bitfield32 tmp;
+
+   nla_memcpy(&tmp, nla, sizeof(tmp));
+   return tmp;
+}
+
+/**
  * nla_memdup - duplicate attribute memory (kmemdup)
  * @src: netlink attribute to duplicate from
  * @gfp: GFP mask
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index f86127a..f4fc9c9 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -226,5 +226,22 @@ struct nlattr {
 #define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
 #define NLA_HDRLEN ((int) NLA_ALIGN(sizeof(struct nlattr)))
 
+/* Generic 32 bitflags attribute content sent to the kernel.
+ *
+ * The value is a bitmap that defines the values being set
+ * The selector is a bitmask that defines which value is legit
+ *
+ * Examples:
+ *  value = 0x0, and selector = 0x1
+ *  implies we are selecting bit 1 and we want to set its value to 0.
+ *
+ *  value = 0x2, and selector = 0x2
+ *  implies we are selecting bit 2 and we want to set its value to 1.
+ *
+ */
+struct nla_bitfield32 {
+   __u32 value;
+   __u32 selector;
+};
 
 #endif /* _UAPI__LINUX_NETLINK_H */
diff --git a/lib/nlattr.c b/lib/nlattr.c
index fb52435..ee79b7a 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -27,6 +27,30 @@
[NLA_S64]   = sizeof(s64),
 };
 
+static int validate_nla_bitfield32(const struct nlattr *nla,
+  u32 *valid_flags_allowed)
+{
+   const struct nla_bitfield32 *bf = nla_data(nla);
+   u32 *valid_flags_mask = valid_flags_allowed;
+
+   if (!valid_flags_allowed)
+   return -EINVAL;
+
+   /*disallow invalid bit selector */
+   if (bf->selector & ~*valid_flags_mask)
+   return -EINVAL;
+
+   /*disallow invalid bit values */
+   if (bf->value & ~*valid_flags_mask)
+   return -EINVAL;
+
+   /*disallow valid bit values that are not selected*/
+   if (bf->value & ~bf->selector)
+   return -EINVAL;
+
+   return 0;
+}
+
 static int validate_nla(const struct nlattr *nla, int maxtype,
const struct nla_policy *policy)
 {
@@ -46,6 +70,12 @@ static int validate_nla(const struct nlattr *nla, int 
maxtype,
return -ERANGE;
break;
 
+

[PATCH net-next v12 2/4] net sched actions: Use proper root attribute table for actions

2017-07-30 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Bug fix for an issue which has been around for about a decade.
We got away with it because the enumeration was larger than needed.

Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new 
netlink API")
Suggested-by: Jiri Pirko 
Reviewed-by: Simon Horman 
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_api.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index f2e9ed3..848370e 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1072,7 +1072,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct 
nlmsghdr *n,
 struct netlink_ext_ack *extack)
 {
struct net *net = sock_net(skb->sk);
-   struct nlattr *tca[TCA_ACT_MAX + 1];
+   struct nlattr *tca[TCAA_MAX + 1];
u32 portid = skb ? NETLINK_CB(skb).portid : 0;
int ret = 0, ovr = 0;
 
@@ -1080,7 +1080,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct 
nlmsghdr *n,
!netlink_capable(skb, CAP_NET_ADMIN))
return -EPERM;
 
-   ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCA_ACT_MAX, NULL,
+   ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCAA_MAX, NULL,
  extack);
if (ret < 0)
return ret;
-- 
1.9.1

[PATCH net-next v12 3/4] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-07-30 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

When you dump hundreds of thousands of actions, getting only 32 per
dump batch even when the socket buffer and memory allocations allow
is inefficient.

With this change, the user will get as many actions as can fit
within the given constraints available to the kernel.

The top level action TLV space is extended. An attribute
TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
is set by the user indicating the user is capable of processing
these large dumps. Older user space which doesn't set this flag
doesn't get the larger (than 32) batches.
The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
actions are put in a single batch. As such, the user space app knows how long
to iterate (independent of the type of action being dumped)
instead of using a hardcoded maximum of 32, thus maintaining backward compat.
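
The batching rule can be modelled in a few lines of C. This is a sketch
under simplifying assumptions, not the walker itself: the function
names are invented, the flag value mirrors TCA_FLAG_LARGE_DUMP_ON from
this patch, and "fits_in_skb" stands in for whatever the socket buffer
can actually hold.

```c
#include <stdint.h>

#define SKETCH_ACT_MAX_PRIO       32        /* legacy per-batch cap */
#define SKETCH_FLAG_LARGE_DUMP_ON (1u << 0) /* mirrors TCA_FLAG_LARGE_DUMP_ON */

/* Returns 1 when the walker should stop filling the current batch. */
static int sketch_batch_full(uint32_t act_flags, int n_i)
{
	return !(act_flags & SKETCH_FLAG_LARGE_DUMP_ON) &&
	       n_i >= SKETCH_ACT_MAX_PRIO;
}

/* Number of actions placed in one batch, given how many remain and how
 * many the buffer can hold (a stand-in for the real skb space checks).
 */
static int sketch_fill_batch(uint32_t act_flags, int remaining,
			     int fits_in_skb)
{
	int n_i = 0;

	while (n_i < remaining && n_i < fits_in_skb) {
		n_i++;
		if (sketch_batch_full(act_flags, n_i))
			break;
	}
	return n_i;
}
```

Without the flag a batch stops at 32 actions even when more would fit;
with the flag the only limit left in this model is the buffer space.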

Some results dumping 1.5M actions below:
first an unpatched tc which doesn't understand these features...

prompt$ time -p tc actions ls action gact | grep index | wc -l
150
real 1388.43
user 2.07
sys 1386.79

Now let's see a patched tc which sets the correct flags when requesting
a dump:

prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
150
real 178.13
user 2.02
sys 176.96

That is about an 8x performance improvement for a tc app which sets its
receive buffer to about 32K.

Signed-off-by: Jamal Hadi Salim 
---
 include/uapi/linux/rtnetlink.h | 22 +--
 net/sched/act_api.c| 50 +-
 2 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index d148505..bfa80a6 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -683,10 +683,28 @@ struct tcamsg {
unsigned char   tca__pad1;
unsigned short  tca__pad2;
 };
+
+enum {
+   TCA_ROOT_UNSPEC,
+   TCA_ROOT_TAB,
+#define TCA_ACT_TAB TCA_ROOT_TAB
+#define TCAA_MAX TCA_ROOT_TAB
+   TCA_ROOT_FLAGS,
+   TCA_ROOT_COUNT,
+   __TCA_ROOT_MAX,
+#define TCA_ROOT_MAX (__TCA_ROOT_MAX - 1)
+};
+
 #define TA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct tcamsg))))
 #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
-#define TCA_ACT_TAB 1 /* attr type must be >=1 */  
-#define TCAA_MAX 1
+/* tcamsg flags stored in attribute TCA_ROOT_FLAGS
+ *
+ * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than 
TCA_ACT_MAX_PRIO
+ * actions in a dump. All dump responses will contain the number of actions
+ * being dumped stored in for user app's consumption in TCA_ROOT_COUNT
+ *
+ */
+#define TCA_FLAG_LARGE_DUMP_ON (1 << 0)
 
 /* New extended info filters for IFLA_EXT_MASK */
 #define RTEXT_FILTER_VF (1 << 0)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 848370e..d53653a 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -110,6 +110,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
   struct netlink_callback *cb)
 {
int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
+   u32 act_flags = cb->args[2];
struct nlattr *nest;
 
spin_lock_bh(&hinfo->lock);
@@ -138,14 +139,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, 
struct sk_buff *skb,
}
nla_nest_end(skb, nest);
n_i++;
-   if (n_i >= TCA_ACT_MAX_PRIO)
+   if (!(act_flags & TCA_FLAG_LARGE_DUMP_ON) &&
+   n_i >= TCA_ACT_MAX_PRIO)
goto done;
}
}
 done:
spin_unlock_bh(&hinfo->lock);
-   if (n_i)
+   if (n_i) {
cb->args[0] += n_i;
+   if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
+   cb->args[1] = n_i;
+   }
return n_i;
 
 nla_put_failure:
@@ -1068,11 +1073,17 @@ static int tcf_action_add(struct net *net, struct 
nlattr *nla,
return tcf_add_notify(net, n, , portid);
 }
 
+static u32 tcaa_root_flags_allowed = TCA_FLAG_LARGE_DUMP_ON;
+static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = {
+   [TCA_ROOT_FLAGS] = { .type = NLA_BITFIELD32,
+.validation_data = &tcaa_root_flags_allowed },
+};
+
 static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n,
 struct netlink_ext_ack *extack)
 {
struct net *net = sock_net(skb->sk);
-   struct nlattr *tca[TCAA_MAX + 1];
+   struct nlattr *tca[TCA_ROOT_MAX + 1];
u32 portid = skb ? NETLINK_CB(skb).portid : 0;
int ret = 0, ovr = 0;
 
@@ -1080,7 +1091,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct 
nlmsghdr *n,
!netlink_capable(skb, CAP_NET_ADMIN))
return -EPERM;
 
-   ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCAA_MAX,

[PATCH net-next v12 0/4] net sched actions: improve dump performance

2017-07-30 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Changes since v11:
--
1) Jiri - renames: nla_value to value and nla_selector to selector
2) Jiri - rename: validate_nla_bitfield_32 to validate_nla_bitfield32
3) Jiri - rename: NLA_BITFIELD_32 to NLA_BITFIELD32
4) Jiri - remove unnecessary break when we return in case statement
5) Jiri - rename and move nla_get_bitfield_32 to an earlier patch
6) Jiri - xmas tree alignment of var declaration
7) Jiri - rename all declarations of bitfield 32 vars to be consistent ("bf")
8) Jiri - improve validate_nla_bitfield32() validation to disallow valid
  bit values that are not selected by the selector

Changes since v10:
-
1) Jiri: move type->validate_content() to its own patch
Jamal: decided to remove it altogether so we can get this patch set in.

2) Change name of NLA_FLAG_BITS to NLA_BITFIELD_32 based on discussions
with D. Ahern and Jiri. D. Ahern suggests making this a variable bitmap size.
My analysis at this point is that it is too complex and I only need a few bit
flags. If we run out of bits someone else can create a new NLA_BITFIELD_XXX
and start using that. So please let this go.

3) Jamal - Add Suggested-by: Jiri for type NLA_BITFIELD_32

4) Jiri: Change name allowed_flags to tcaa_root_flags_allowed

5) Jiri: Introduce nla_get_flag_bits_values() helper instead of using
memcpy for retrieving nla_bitfield_32 fields.

Changes since v9:
-

1) General consensus:
- remove again the use of BIT() to maintain uapi consistency ;->

2) Jiri:
- Add a new netlink type NLA_FLAG_BITS to check for valid bits 
  and use it instead of inline vetting (patch 4/4 now)
  
Changes since v8:
-

1) Jiri:
- Add back the use of BIT(). Eventually fix iproute2 instead
- Rename VALID_TCA_FLAGS to VALID_TCA_ROOT_FLAGS

Changes since v7:
-

Jamal:
No changes.
Patch 1 went out twice. Resend without two copies of patch 1

changes since v6:
-

1) DaveM:
New rules for netlink messages. From now on we are going to start
checking for bits that are not used and rejecting anything we don't
understand. In the future this is going to require major changes
to user space code (tc etc). This is just a start.

To quote, David:
"
 Again, bits you aren't using now, make sure userspace doesn't
   set them.  And if it does, reject.
"
Added checks for ensuring things work as above.

2) Jiri:
a)Fix the commit message to properly use "Fixes" description
b)Align assignments for nla_policy

Changes since v5:


0)
Remove use of BIT() because it is kernel specific. Requires a separate
patch (Jiri can submit that in his cleanups)

1) To paraphrase Eric D.

"memcpy(nla_data(count_attr), &cb->args[1], sizeof(u32));
won't work on 64bit BE machines because cb->args[1]
(which is 64 bit) is larger in size than sizeof(u32)"

Fixed
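
For readers who want to see the problem in isolation, here is a small
user-space sketch. It models the 64-bit cb->args[1] slot with a
uint64_t, which is an assumption about the platform, and the function
names are invented for the example. It is not the code used in the fix.

```c
#include <stdint.h>
#include <string.h>

/* Portable: convert the value, then use the 32-bit result. */
static uint32_t count_from_arg(uint64_t arg)
{
	uint32_t count = (uint32_t)arg;

	return count;
}

/* Problematic: copies the first four bytes in memory, which are the
 * low-order bytes on little endian but the high-order bytes on big endian.
 */
static uint32_t count_from_arg_memcpy(uint64_t arg)
{
	uint32_t count;

	memcpy(&count, &arg, sizeof(count));
	return count;
}

/* Returns 1 on a big endian host, 0 on a little endian one. */
static int host_is_big_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 0;
}
```

On a little endian host both functions return the same count; on a
64-bit big endian host the memcpy variant returns 0 for any count that
fits in 32 bits.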

2) Jiri Pirko

i) Spotted a bug fix mixed in the patch for wrong TLV
fix. Add patch 1/3 to address this. Make part of this
series because of dependencies.

ii) Rename ACT_LARGE_DUMP_ON -> TCA_FLAG_LARGE_DUMP_ON

iii) Satisfy Jiri's obsession against the noun "tcaa"
a)Rename struct nlattr *tcaa --> struct nlattr *tb
b)Rename TCAA_ACT_XXX -> TCA_ROOT_XXX

Changes since v4:
-

1) Eric D.

pointed out that when all skb space is used up by the dump
there will be no space to insert the TCAA_ACT_COUNT attribute.

2) Jiri:

i) Change:

enum {
TCAA_UNSPEC,
TCAA_ACT_TAB,
TCAA_ACT_FLAGS,
TCAA_ACT_COUNT,
TCAA_ACT_TIME_FILTER,
__TCAA_MAX
};

#define TCAA_MAX (__TCAA_MAX - 1)
#define ACT_LARGE_DUMP_ON   (1 << 0)

to:
enum {
   TCAA_UNSPEC,
   TCAA_ACT_TAB,
#define TCA_ACT_TAB TCAA_ACT_TAB
   TCAA_ACT_FLAGS,
   TCAA_ACT_COUNT,
   __TCAA_MAX,
#define TCAA_MAX (__TCAA_MAX - 1)
};

#define ACT_LARGE_DUMP_ON  BIT(0)

Jiri plans to followup with the rest of the code to make the
style consistent.

ii) Rename attribute TCAA_ACT_TIME_FILTER --> TCAA_ACT_TIME_DELTA

iii) Rename variable jiffy_filter --> jiffy_since
iv) Rename msecs_filter --> msecs_since
v) get rid of unused cb->args[0] and rename cb->args[4] to cb->args[0]

Earlier Changes

- Jiri mostly on names of things.

Jamal Hadi Salim (4):
  net netlink: Add new type NLA_BITFIELD32
  net sched actions: Use proper root attribute table for actions
  net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
  net sched actions: add time filter for action dumping

 include/net/netlink.h  | 16 ++
 include/uapi/linux/netlink.h   | 17 ++
 include/uapi/linux/rtnetlink.h | 23 --
 lib/nlattr.c   | 30 ++
 net/sched/act_api.c| 71 +++---
 5 files changed, 144 insertions(+), 13 deletions(-)

-- 
1.9.1

Re: [PATCH v2] ravb: add wake-on-lan support via magic packet

2017-07-30 Thread Andrew Lunn

Hi Niklas

> @@ -2041,6 +2073,11 @@ static int ravb_probe(struct platform_device *pdev)
>  
>   priv->chip_id = chip_id;
>  
> + /* Get clock, if not found that's OK but Wake-On-Lan is unavailable */
> + priv->clk = devm_clk_get(>dev, NULL);
> + if (IS_ERR(priv->clk))
> + priv->clk = NULL;

Can you get EPROBE_DEFER returned?

Andrew

[RFC PATCH net-next 5/5] ipv6: sr: implement several seg6local actions

2017-07-30 Thread David Lebrun

This patch implements the following seg6local actions.

- SEG6_LOCAL_ACTION_END: regular SRH processing. The DA of the packet
  is updated to the next segment and forwarded accordingly.

- SEG6_LOCAL_ACTION_END_X: same as above, except that the packet is
  forwarded to the specified IPv6 next-hop.

- SEG6_LOCAL_ACTION_END_B6: insert the specified SRH directly after
  the IPv6 header of the packet.

- SEG6_LOCAL_ACTION_END_B6_ENCAP: encapsulate the packet within
  an outer IPv6 header, containing the specified SRH.
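
The core of the END behaviour (advance to the next segment, or drop
when there is none) can be sketched outside the kernel like this. The
struct layout is deliberately simplified and all names are made up for
the example: a real SRH has more fields and a variable number of
segments, and the patch relies on seg6_validate_srh() for bounds
checking rather than a fixed capacity.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SEGS 4 /* fixed capacity, for illustration only */

struct sketch_in6_addr {
	uint8_t bytes[16];
};

struct sketch_srh {
	uint8_t segments_left; /* index of the active segment */
	struct sketch_in6_addr segments[SKETCH_MAX_SEGS];
};

/* Decrement segments_left and return the segment that becomes the new
 * destination address, or NULL if the packet would be dropped.
 */
static const struct sketch_in6_addr *
sketch_end_advance(struct sketch_srh *srh)
{
	if (!srh || srh->segments_left == 0 ||
	    srh->segments_left > SKETCH_MAX_SEGS)
		return NULL;

	srh->segments_left--;
	return &srh->segments[srh->segments_left];
}
```

Starting with segments_left set to 2, the first call selects
segments[1], the second selects segments[0], and a third call returns
NULL, matching the "segments_left == 0" drop in
get_and_validate_srh().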

Signed-off-by: David Lebrun 
---
 net/ipv6/seg6_local.c | 176 ++
 1 file changed, 176 insertions(+)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index ab1fc1b..a7b346b 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -58,11 +58,187 @@ static struct seg6_local_lwt *seg6_local_lwtunnel(struct 
lwtunnel_state *lwt)
return (struct seg6_local_lwt *)lwt->data;
 }
 
+static struct ipv6_sr_hdr *get_srh(struct sk_buff *skb)
+{
+   struct ipv6_sr_hdr *srh;
+   struct ipv6hdr *hdr;
+   int len;
+
+   hdr = ipv6_hdr(skb);
+   if (hdr->nexthdr != IPPROTO_ROUTING)
+   return NULL;
+
+   srh = (struct ipv6_sr_hdr *)(hdr + 1);
+   len = (srh->hdrlen + 1) << 3;
+
+   if (!pskb_may_pull(skb, sizeof(*hdr) + len))
+   return NULL;
+
+   if (!seg6_validate_srh(srh, len))
+   return NULL;
+
+   return srh;
+}
+
+static struct ipv6_sr_hdr *get_and_validate_srh(struct sk_buff *skb)
+{
+   struct ipv6_sr_hdr *srh;
+
+   srh = get_srh(skb);
+   if (!srh)
+   return NULL;
+
+   if (srh->segments_left == 0)
+   return NULL;
+
+#ifdef CONFIG_IPV6_SEG6_HMAC
+   if (!seg6_hmac_validate_skb(skb))
+   return NULL;
+#endif
+
+   return srh;
+}
+
+static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+   struct ipv6_sr_hdr *srh;
+   struct in6_addr *addr;
+
+   srh = get_and_validate_srh(skb);
+   if (!srh)
+   goto drop;
+
+   srh->segments_left--;
+   addr = srh->segments + srh->segments_left;
+
+   ipv6_hdr(skb)->daddr = *addr;
+
+   skb_dst_drop(skb);
+   ip6_route_input(skb);
+
+   return dst_input(skb);
+
+drop:
+   kfree_skb(skb);
+   return -EINVAL;
+}
+
+static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+   struct net *net = dev_net(skb->dev);
+   struct ipv6_sr_hdr *srh;
+   struct in6_addr *addr;
+   struct ipv6hdr *hdr;
+   struct flowi6 fl6;
+   int flags;
+
+   srh = get_and_validate_srh(skb);
+   if (!srh)
+   goto drop;
+
+   srh->segments_left--;
+   addr = srh->segments + srh->segments_left;
+
+   hdr = ipv6_hdr(skb);
+   hdr->daddr = *addr;
+
+   skb_dst_drop(skb);
+
+   fl6.flowi6_iif = skb->dev->ifindex;
+   fl6.daddr = slwt->nh6;
+   fl6.saddr = hdr->saddr;
+   fl6.flowlabel = ip6_flowinfo(hdr);
+   fl6.flowi6_mark = skb->mark;
+   fl6.flowi6_proto = hdr->nexthdr;
+
+   flags = RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_REACHABLE;
+   skb_dst_set(skb, ip6_route_input_lookup(net, skb->dev, &fl6, flags));
+
+   return dst_input(skb);
+
+drop:
+   kfree_skb(skb);
+   return -EINVAL;
+}
+
+static int input_action_end_b6(struct sk_buff *skb, struct seg6_local_lwt 
*slwt)
+{
+   struct ipv6_sr_hdr *srh;
+   int err = -EINVAL;
+
+   srh = get_and_validate_srh(skb);
+   if (!srh)
+   goto drop;
+
+   err = seg6_do_srh_inline(skb, slwt->srh);
+   if (err)
+   goto drop;
+
+   ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+   skb_set_transport_header(skb, sizeof(struct ipv6hdr));
+
+   skb_dst_drop(skb);
+   ip6_route_input(skb);
+
+   return dst_input(skb);
+
+drop:
+   kfree_skb(skb);
+   return err;
+}
+
+static int input_action_end_b6_encap(struct sk_buff *skb,
+struct seg6_local_lwt *slwt)
+{
+   struct ipv6_sr_hdr *srh;
+   int err = -EINVAL;
+
+   srh = get_and_validate_srh(skb);
+   if (!srh)
+   goto drop;
+
+   skb_reset_inner_headers(skb);
+   skb->encapsulation = 1;
+
+   err = seg6_do_srh_encap(skb, slwt->srh);
+   if (err)
+   goto drop;
+
+   ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+   skb_set_transport_header(skb, sizeof(struct ipv6hdr));
+
+   skb_dst_drop(skb);
+   ip6_route_input(skb);
+
+   return dst_input(skb);
+
+drop:
+   kfree_skb(skb);
+   return err;
+}
+
 static struct seg6_action_desc seg6_action_table[] = {
{
.action = SEG6_LOCAL_ACTION_END,
.attrs  = 0,
+   .input  = input_action_end,
+   },
+

[RFC PATCH net-next 2/5] ipv6: sr: export SRH insertion functions

2017-07-30 Thread David Lebrun

This patch exports the seg6_do_srh_encap() and seg6_do_srh_inline()
functions. It also removes the CONFIG_IPV6_SEG6_INLINE knob
that enabled the compilation of seg6_do_srh_inline(). This function
is now built-in.

Signed-off-by: David Lebrun 
---
 include/net/seg6.h   |  2 ++
 net/ipv6/Kconfig | 12 
 net/ipv6/seg6_iptunnel.c | 12 
 3 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/include/net/seg6.h b/include/net/seg6.h
index 4e03575..a32abb0 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -58,5 +58,7 @@ extern int seg6_iptunnel_init(void);
 extern void seg6_iptunnel_exit(void);
 
 extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len);
+extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
+extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
 
 #endif
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 48c4529..50181a9 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -315,18 +315,6 @@ config IPV6_SEG6_LWTUNNEL
 
  If unsure, say N.
 
-config IPV6_SEG6_INLINE
-   bool "IPv6: direct Segment Routing Header insertion "
-   depends on IPV6_SEG6_LWTUNNEL
-   ---help---
- Support for direct insertion of the Segment Routing Header,
- also known as inline mode. Be aware that direct insertion of
- extension headers (as opposed to encapsulation) may break
- multiple mechanisms such as PMTUD or IPSec AH. Use this feature
- only if you know exactly what you are doing.
-
- If unsure, say N.
-
 config IPV6_SEG6_HMAC
bool "IPv6: Segment Routing HMAC support"
depends on IPV6
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 264d772..20b1ba8 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -91,7 +91,7 @@ static void set_tun_src(struct net *net, struct net_device 
*dev,
 }
 
 /* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
-static int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
+int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
 {
struct net *net = dev_net(skb_dst(skb)->dev);
struct ipv6hdr *hdr, *inner_hdr;
@@ -141,10 +141,10 @@ static int seg6_do_srh_encap(struct sk_buff *skb, struct 
ipv6_sr_hdr *osrh)
 
return 0;
 }
+EXPORT_SYMBOL(seg6_do_srh_encap);
 
 /* insert an SRH within an IPv6 packet, just after the IPv6 header */
-#ifdef CONFIG_IPV6_SEG6_INLINE
-static int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
+int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
 {
struct ipv6hdr *hdr, *oldhdr;
struct ipv6_sr_hdr *isrh;
@@ -193,7 +193,7 @@ static int seg6_do_srh_inline(struct sk_buff *skb, struct 
ipv6_sr_hdr *osrh)
 
return 0;
 }
-#endif
+EXPORT_SYMBOL(seg6_do_srh_inline);
 
 static int seg6_do_srh(struct sk_buff *skb)
 {
@@ -209,12 +209,10 @@ static int seg6_do_srh(struct sk_buff *skb)
}
 
switch (tinfo->mode) {
-#ifdef CONFIG_IPV6_SEG6_INLINE
case SEG6_IPTUN_MODE_INLINE:
err = seg6_do_srh_inline(skb, tinfo->srh);
skb_reset_inner_headers(skb);
break;
-#endif
case SEG6_IPTUN_MODE_ENCAP:
err = seg6_do_srh_encap(skb, tinfo->srh);
break;
@@ -357,10 +355,8 @@ static int seg6_build_state(struct nlattr *nla,
return -EINVAL;
 
switch (tuninfo->mode) {
-#ifdef CONFIG_IPV6_SEG6_INLINE
case SEG6_IPTUN_MODE_INLINE:
break;
-#endif
case SEG6_IPTUN_MODE_ENCAP:
break;
default:
-- 
2.10.2

[RFC PATCH net-next 4/5] ipv6: sr: add rtnetlink functions for seg6local action parameters

2017-07-30 Thread David Lebrun

This patch adds the necessary functions to parse, fill, and compare
seg6local rtnetlink attributes, for all defined action parameters.

- The SRH parameter defines an SRH to be inserted or encapsulated.
- The TABLE parameter defines the table to use for the route lookup of
  the next segment or the inner decapsulated packet.
- The NH4 parameter defines the IPv4 next-hop for an inner decapsulated
  IPv4 packet.
- The NH6 parameter defines the IPv6 next-hop for the next segment or
  for an inner decapsulated IPv6 packet.
- The IIF parameter defines an ingress interface index.
- The OIF parameter defines an egress interface index.
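
The SRH helpers in this patch all derive the header length from the
hdrlen field as (hdrlen + 1) << 3. The sketch below shows that
computation on raw bytes, together with the "at least one segment"
check and a length-aware compare. The function names are invented, and
the byte offsets assume the usual SRH layout (hdrlen in the second
byte, 8 bytes of fixed header before the first 16 byte segment).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SRH_FIXED_LEN 8u  /* fixed part of the SRH, in bytes */
#define SKETCH_IN6_ADDR_LEN  16u /* one IPv6 segment, in bytes */

/* hdrlen counts 8-octet units, not including the first 8 octets. */
static size_t sketch_srh_len(uint8_t hdrlen)
{
	return ((size_t)hdrlen + 1) << 3;
}

/* An SRH attribute must hold the fixed header plus one segment. */
static int sketch_srh_has_segment(size_t attr_len)
{
	return attr_len >= SKETCH_SRH_FIXED_LEN + SKETCH_IN6_ADDR_LEN;
}

/* Returns 0 when both headers have the same length and contents. */
static int sketch_cmp_srh(const uint8_t *a, const uint8_t *b)
{
	size_t len = sketch_srh_len(a[1]);

	if (len != sketch_srh_len(b[1]))
		return 1;

	return memcmp(a, b, len) != 0;
}
```

An SRH carrying a single segment has hdrlen 2, giving 24 bytes, which
is exactly the minimum accepted by the length check.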

Signed-off-by: David Lebrun 
---
 net/ipv6/seg6_local.c | 211 +-
 1 file changed, 193 insertions(+), 18 deletions(-)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 53615d7..ab1fc1b 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -104,6 +104,181 @@ static const struct nla_policy 
seg6_local_policy[SEG6_LOCAL_MAX + 1] = {
[SEG6_LOCAL_OIF]= { .type = NLA_U32 },
 };
 
+static int parse_nla_srh(struct nlattr **attrs, struct seg6_local_lwt *slwt)
+{
+   struct ipv6_sr_hdr *srh;
+   int len;
+
+   srh = nla_data(attrs[SEG6_LOCAL_SRH]);
+   len = nla_len(attrs[SEG6_LOCAL_SRH]);
+
+   /* SRH must contain at least one segment */
+   if (len < sizeof(*srh) + sizeof(struct in6_addr))
+   return -EINVAL;
+
+   if (!seg6_validate_srh(srh, len))
+   return -EINVAL;
+
+   slwt->srh = kmalloc(len, GFP_KERNEL);
+   if (!slwt->srh)
+   return -ENOMEM;
+
+   memcpy(slwt->srh, srh, len);
+
+   slwt->headroom += len;
+
+   return 0;
+}
+
+static int put_nla_srh(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+   struct ipv6_sr_hdr *srh;
+   struct nlattr *nla;
+   int len;
+
+   srh = slwt->srh;
+   len = (srh->hdrlen + 1) << 3;
+
+   nla = nla_reserve(skb, SEG6_LOCAL_SRH, len);
+   if (!nla)
+   return -EMSGSIZE;
+
+   memcpy(nla_data(nla), srh, len);
+
+   return 0;
+}
+
+static int cmp_nla_srh(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
+{
+   int len = (a->srh->hdrlen + 1) << 3;
+
+   if (len != ((b->srh->hdrlen + 1) << 3))
+   return 1;
+
+   return memcmp(a->srh, b->srh, len);
+}
+
+static int parse_nla_table(struct nlattr **attrs, struct seg6_local_lwt *slwt)
+{
+   slwt->table = nla_get_u32(attrs[SEG6_LOCAL_TABLE]);
+
+   return 0;
+}
+
+static int put_nla_table(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+   if (nla_put_u32(skb, SEG6_LOCAL_TABLE, slwt->table))
+   return -EMSGSIZE;
+
+   return 0;
+}
+
+static int cmp_nla_table(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
+{
+   if (a->table != b->table)
+   return 1;
+
+   return 0;
+}
+
+static int parse_nla_nh4(struct nlattr **attrs, struct seg6_local_lwt *slwt)
+{
+   memcpy(&slwt->nh4, nla_data(attrs[SEG6_LOCAL_NH4]),
+  sizeof(struct in_addr));
+
+   return 0;
+}
+
+static int put_nla_nh4(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+   struct nlattr *nla;
+
+   nla = nla_reserve(skb, SEG6_LOCAL_NH4, sizeof(struct in_addr));
+   if (!nla)
+   return -EMSGSIZE;
+
+   memcpy(nla_data(nla), &slwt->nh4, sizeof(struct in_addr));
+
+   return 0;
+}
+
+static int cmp_nla_nh4(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
+{
+   return memcmp(&a->nh4, &b->nh4, sizeof(struct in_addr));
+}
+
+static int parse_nla_nh6(struct nlattr **attrs, struct seg6_local_lwt *slwt)
+{
+   memcpy(&slwt->nh6, nla_data(attrs[SEG6_LOCAL_NH6]),
+  sizeof(struct in6_addr));
+
+   return 0;
+}
+
+static int put_nla_nh6(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+   struct nlattr *nla;
+
+   nla = nla_reserve(skb, SEG6_LOCAL_NH6, sizeof(struct in6_addr));
+   if (!nla)
+   return -EMSGSIZE;
+
+   memcpy(nla_data(nla), &slwt->nh6, sizeof(struct in6_addr));
+
+   return 0;
+}
+
+static int cmp_nla_nh6(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
+{
+   return memcmp(&a->nh6, &b->nh6, sizeof(struct in6_addr));
+}
+
+static int parse_nla_iif(struct nlattr **attrs, struct seg6_local_lwt *slwt)
+{
+   slwt->iif = nla_get_u32(attrs[SEG6_LOCAL_IIF]);
+
+   return 0;
+}
+
+static int put_nla_iif(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+   if (nla_put_u32(skb, SEG6_LOCAL_IIF, slwt->iif))
+   return -EMSGSIZE;
+
+   return 0;
+}
+
+static int cmp_nla_iif(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
+{
+   if (a->iif != b->iif)
+   return 1;
+
+   return 0;
+}
+
+static int parse_nla_oif(struct nlattr **attrs, struct seg6_local_lwt *slwt)
+{
+   slwt->oif = nla_get_u32(attrs[SEG6_LOCAL_OIF]);
+
+   return 0;
+}
+
+static int put_nla_oif(struct

[RFC PATCH net-next 3/5] ipv6: sr: define core operations for seg6local lightweight tunnel

2017-07-30 Thread David Lebrun

This patch implements a new type of lightweight tunnel named seg6local.
A seg6local lwt is defined by a type of action and a set of parameters.
The action represents the operation to perform on the packets matching the
lwt's route, and is not necessarily an encapsulation. The parameters are
the arguments passed to the processing function.

Each action is defined in a struct seg6_action_desc within
seg6_action_table[]. This structure contains the action, mandatory
attributes, the processing function, and a static headroom size required by
the action. The mandatory attributes are encoded as a bitmask field. The
static headroom is set to a non-zero value when the processing function
always adds a constant number of bytes to the skb (e.g. the header size for
encapsulations).
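
As an illustration only, here is a minimal userspace sketch of how such a descriptor table could be laid out and how the mandatory-attribute bitmask could be checked. The names demo_action_desc, demo_find_action and demo_attrs_ok are hypothetical and not taken from the patch; only the idea (action, attribute bitmask, static headroom) comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative attribute and action identifiers (not the uapi values). */
enum { ATTR_UNSPEC, ATTR_ACTION, ATTR_SRH, ATTR_TABLE, ATTR_NH4, ATTR_NH6 };
enum { ACTION_END = 1, ACTION_END_X = 2 };

/* Hypothetical descriptor: action, mandatory attributes, fixed headroom. */
struct demo_action_desc {
	int action;
	unsigned long attrs;	/* bitmask of mandatory attributes */
	int static_headroom;	/* bytes the action always adds to the packet */
};

static const struct demo_action_desc demo_action_table[] = {
	{ .action = ACTION_END,   .attrs = 0,               .static_headroom = 0 },
	{ .action = ACTION_END_X, .attrs = 1UL << ATTR_NH6, .static_headroom = 0 },
};

/* Return the descriptor for an action, or NULL if the action is unknown. */
static const struct demo_action_desc *demo_find_action(int action)
{
	size_t i;
	size_t n = sizeof(demo_action_table) / sizeof(demo_action_table[0]);

	for (i = 0; i < n; i++)
		if (demo_action_table[i].action == action)
			return &demo_action_table[i];
	return NULL;
}

/* True when every mandatory attribute of the action is set in 'present'. */
static int demo_attrs_ok(const struct demo_action_desc *desc,
			 unsigned long present)
{
	return (desc->attrs & present) == desc->attrs;
}
```

With this layout, a request for End.X that carries only an nh4 attribute would be rejected, while extra optional attributes do not affect the check.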

To facilitate rtnetlink-related operations such as parsing, fill_encap,
and cmp_encap, each type of action parameter is associated with three
function pointers, in seg6_action_params[].
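
The per-parameter function pointers can be pictured with the following userspace sketch. It is an assumption-laden simplification: netlink attributes are modelled as a plain array of 32-bit values, and demo_lwt, demo_param_ops, demo_parse_all and demo_cmp_all are hypothetical names rather than the ones used in the series.

```c
#include <assert.h>
#include <stdint.h>

enum { P_UNSPEC, P_TABLE, P_IIF, P_MAX };

/* Hypothetical tunnel state holding the decoded parameters. */
struct demo_lwt {
	uint32_t table;
	uint32_t iif;
};

/* Three operations per parameter: parse, put (serialize) and cmp. */
struct demo_param_ops {
	int (*parse)(const uint32_t *attrs, struct demo_lwt *lwt);
	int (*put)(uint32_t *attrs, const struct demo_lwt *lwt);
	int (*cmp)(const struct demo_lwt *a, const struct demo_lwt *b);
};

static int demo_parse_table(const uint32_t *attrs, struct demo_lwt *lwt)
{
	lwt->table = attrs[P_TABLE];
	return 0;
}

static int demo_put_table(uint32_t *attrs, const struct demo_lwt *lwt)
{
	attrs[P_TABLE] = lwt->table;
	return 0;
}

static int demo_cmp_table(const struct demo_lwt *a, const struct demo_lwt *b)
{
	return a->table != b->table;
}

static int demo_parse_iif(const uint32_t *attrs, struct demo_lwt *lwt)
{
	lwt->iif = attrs[P_IIF];
	return 0;
}

static int demo_put_iif(uint32_t *attrs, const struct demo_lwt *lwt)
{
	attrs[P_IIF] = lwt->iif;
	return 0;
}

static int demo_cmp_iif(const struct demo_lwt *a, const struct demo_lwt *b)
{
	return a->iif != b->iif;
}

/* One entry per parameter type, indexed by the parameter identifier. */
static const struct demo_param_ops demo_params[P_MAX] = {
	[P_TABLE] = { demo_parse_table, demo_put_table, demo_cmp_table },
	[P_IIF]   = { demo_parse_iif,   demo_put_iif,   demo_cmp_iif },
};

/* Run the parse operation of every parameter whose bit is set in 'mask'. */
static int demo_parse_all(unsigned long mask, const uint32_t *attrs,
			  struct demo_lwt *lwt)
{
	int i, err;

	for (i = 1; i < P_MAX; i++) {
		if (!(mask & (1UL << i)))
			continue;
		err = demo_params[i].parse(attrs, lwt);
		if (err)
			return err;
	}
	return 0;
}

/* Compare two states on every parameter in 'mask'; 0 means equal. */
static int demo_cmp_all(unsigned long mask, const struct demo_lwt *a,
			const struct demo_lwt *b)
{
	int i;

	for (i = 1; i < P_MAX; i++)
		if ((mask & (1UL << i)) && demo_params[i].cmp(a, b))
			return 1;
	return 0;
}
```

The point of the table is that generic code can loop over the attribute bitmask of an action without knowing how any individual parameter is encoded.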

All actions defined in seg6_local.h are detailed in [1].

[1] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01

Signed-off-by: David Lebrun 
---
 include/linux/seg6_local.h  |   6 +
 include/net/seg6.h  |   2 +
 include/uapi/linux/lwtunnel.h   |   1 +
 include/uapi/linux/seg6_local.h |  68 +
 net/core/lwtunnel.c |   2 +
 net/ipv6/Kconfig|   3 +-
 net/ipv6/Makefile   |   2 +-
 net/ipv6/seg6.c |   5 +
 net/ipv6/seg6_local.c   | 320 
 9 files changed, 407 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/seg6_local.h
 create mode 100644 include/uapi/linux/seg6_local.h
 create mode 100644 net/ipv6/seg6_local.c

diff --git a/include/linux/seg6_local.h b/include/linux/seg6_local.h
new file mode 100644
index 000..ee63e76
--- /dev/null
+++ b/include/linux/seg6_local.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_SEG6_LOCAL_H
+#define _LINUX_SEG6_LOCAL_H
+
+#include 
+
+#endif
diff --git a/include/net/seg6.h b/include/net/seg6.h
index a32abb0..5379f55 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -56,6 +56,8 @@ extern int seg6_init(void);
 extern void seg6_exit(void);
 extern int seg6_iptunnel_init(void);
 extern void seg6_iptunnel_exit(void);
+extern int seg6_local_init(void);
+extern void seg6_local_exit(void);
 
 extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len);
 extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index 92724cba..7fdd19c 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -11,6 +11,7 @@ enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_IP6,
LWTUNNEL_ENCAP_SEG6,
LWTUNNEL_ENCAP_BPF,
+   LWTUNNEL_ENCAP_SEG6_LOCAL,
__LWTUNNEL_ENCAP_MAX,
 };
 
diff --git a/include/uapi/linux/seg6_local.h b/include/uapi/linux/seg6_local.h
new file mode 100644
index 000..ef2d8c3
--- /dev/null
+++ b/include/uapi/linux/seg6_local.h
@@ -0,0 +1,68 @@
+/*
+ *  SR-IPv6 implementation
+ *
+ *  Author:
+ *  David Lebrun 
+ *
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_LINUX_SEG6_LOCAL_H
+#define _UAPI_LINUX_SEG6_LOCAL_H
+
+#include 
+
+enum {
+   SEG6_LOCAL_UNSPEC,
+   SEG6_LOCAL_ACTION,
+   SEG6_LOCAL_SRH,
+   SEG6_LOCAL_TABLE,
+   SEG6_LOCAL_NH4,
+   SEG6_LOCAL_NH6,
+   SEG6_LOCAL_IIF,
+   SEG6_LOCAL_OIF,
+   __SEG6_LOCAL_MAX,
+};
+#define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
+
+enum {
+   SEG6_LOCAL_ACTION_UNSPEC= 0,
+   /* node segment */
+   SEG6_LOCAL_ACTION_END   = 1,
+   /* adjacency segment (IPv6 cross-connect) */
+   SEG6_LOCAL_ACTION_END_X = 2,
+   /* lookup of next seg NH in table */
+   SEG6_LOCAL_ACTION_END_T = 3,
+   /* decap and L2 cross-connect */
+   SEG6_LOCAL_ACTION_END_DX2   = 4,
+   /* decap and IPv6 cross-connect */
+   SEG6_LOCAL_ACTION_END_DX6   = 5,
+   /* decap and IPv4 cross-connect */
+   SEG6_LOCAL_ACTION_END_DX4   = 6,
+   /* decap and lookup of DA in v6 table */
+   SEG6_LOCAL_ACTION_END_DT6   = 7,
+   /* decap and lookup of DA in v4 table */
+   SEG6_LOCAL_ACTION_END_DT4   = 8,
+   /* binding segment with insertion */
+   SEG6_LOCAL_ACTION_END_B6= 9,
+   /* binding segment with encapsulation */
+   SEG6_LOCAL_ACTION_END_B6_ENCAP  = 10,
+   /* binding segment with MPLS encap */
+   SEG6_LOCAL_ACTION_END_BM= 11,
+   /* lookup last seg in table */
+   SEG6_LOCAL_ACTION_END_S = 12,
+   /*

[RFC PATCH net-next 0/5] ipv6: sr: add support for advanced local segment processing

2017-07-30 Thread David Lebrun

The current implementation of IPv6 SR supports SRH insertion/encapsulation
and basic segment endpoint behavior (i.e., processing of an SRH contained in
a packet whose active segment (IPv6 DA) is routed to the local node). This
behavior simply consists of updating the DA to the next segment and forwarding
the packet accordingly. This processing is realised for all such packets,
regardless of the active segment.

The most recent specifications of IPv6 SR [1][2] extend the SRH processing
features as follows. Each segment endpoint defines a MyLocalSID table.
This table maps segments to operations to perform. For each ingress
IPv6 packet whose DA is part of a given prefix, the segment endpoint
looks up the active segment (i.e., the IPv6 DA) in the MyLocalSID table
and applies the corresponding operation. These specifications make it
possible to specify arbitrary operations besides the basic SRH processing
and allow for a more fine-grained classification.

This patch series implements those extended specifications by leveraging
a new type of lightweight tunnel, seg6local. The MyLocalSID table is
simply an arbitrary routing table (using CONFIG_IPV6_MULTIPLE_TABLES). The
following commands would assign the prefix fc00::/64 to the MyLocalSID
table, map the segment fc00::42 to the regular SRH processing function
(named "End"), and drop all packets received with an undefined active
segment:

ip -6 rule add fc00::/64 lookup 100
ip -6 route add fc00::42 encap seg6local action End dev eth0 table 100
ip -6 route add blackhole default table 100

As another example, the following command would assign the segment
fc00::1234 to the regular SRH processing function, except that the next
segment must be forwarded to the next-hop fc42::1 (this operation is named
"End.X"):

ip -6 route add fc00::1234 encap seg6local action End.X nh6 fc42::1 dev eth0 table 100

Those two basic operations (End and End.X) are defined in [1]. A more
extensive list of advanced operations is defined in [2].

The first two patches of the series are preliminary work that remove an
assumption about initial SRH format, and export the two functions used to
insert and encapsulate an SRH onto packets. The third patch defines the
new seg6local lightweight tunnel and implements the core functions. The
fourth patch implements the operations needed to handle the newly defined
rtnetlink attributes. The fifth patch implements a few SRH processing
operations, including End and End.X.

[1] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07
[2] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01

David Lebrun (5):
  ipv6: sr: allow SRH insertion with arbitrary segments_left value
  ipv6: sr: export SRH insertion functions
  ipv6: sr: define core operations for seg6local lightweight tunnel
  ipv6: sr: add rtnetlink functions for seg6local action parameters
  ipv6: sr: implement several seg6local actions

 include/linux/seg6_local.h  |   6 +
 include/net/seg6.h  |   4 +
 include/uapi/linux/lwtunnel.h   |   1 +
 include/uapi/linux/seg6_local.h |  68 
 net/core/lwtunnel.c |   2 +
 net/ipv6/Kconfig|  15 +-
 net/ipv6/Makefile   |   2 +-
 net/ipv6/exthdrs.c  |   4 +-
 net/ipv6/seg6.c |   7 +-
 net/ipv6/seg6_iptunnel.c|  12 +-
 net/ipv6/seg6_local.c   | 671 
 11 files changed, 767 insertions(+), 25 deletions(-)
 create mode 100644 include/linux/seg6_local.h
 create mode 100644 include/uapi/linux/seg6_local.h
 create mode 100644 net/ipv6/seg6_local.c

-- 
2.10.2

[RFC PATCH net-next 1/5] ipv6: sr: allow SRH insertion with arbitrary segments_left value

2017-07-30 Thread David Lebrun

The seg6_validate_srh() function only allows SRHs whose active segment is
the first segment of the path. However, an application may insert an SRH
whose active segment is not the first one. Such an application might be
for example an SR-aware Virtual Network Function.

This patch makes it possible to insert SRHs with an arbitrary active segment.

Signed-off-by: David Lebrun 
---
 net/ipv6/exthdrs.c | 4 ++--
 net/ipv6/seg6.c| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 4996d73..3b7d369 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -881,7 +881,7 @@ static void ipv6_push_rthdr4(struct sk_buff *skb, u8 *proto,
   (hops - 1) * sizeof(struct in6_addr));
 
sr_phdr->segments[0] = **addr_p;
-   *addr_p = &sr_ihdr->segments[hops - 1];
+   *addr_p = &sr_ihdr->segments[sr_ihdr->segments_left];
 
 #ifdef CONFIG_IPV6_SEG6_HMAC
if (sr_has_hmac(sr_phdr)) {
@@ -1173,7 +1173,7 @@ struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
{
struct ipv6_sr_hdr *srh = (struct ipv6_sr_hdr *)opt->srcrt;
 
-   fl6->daddr = srh->segments[srh->first_segment];
+   fl6->daddr = srh->segments[srh->segments_left];
break;
}
default:
diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c
index 15fba55..81c2339 100644
--- a/net/ipv6/seg6.c
+++ b/net/ipv6/seg6.c
@@ -40,7 +40,7 @@ bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len)
if (((srh->hdrlen + 1) << 3) != len)
return false;
 
-   if (srh->segments_left != srh->first_segment)
+   if (srh->segments_left > srh->first_segment)
return false;
 
tlv_offset = sizeof(*srh) + ((srh->first_segment + 1) << 4);
-- 
2.10.2
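
For readers following the seg6_validate_srh() change, the two checks touched by the hunk above can be sketched in userspace as follows. struct demo_srh carries only the three fields involved and is not the kernel's struct ipv6_sr_hdr; demo_validate_srh is a hypothetical name and omits the TLV checks of the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the fields needed for the two checks below. */
struct demo_srh {
	uint8_t hdrlen;		/* length in 8-octet units, first 8 octets excluded */
	uint8_t segments_left;	/* index of the active segment */
	uint8_t first_segment;	/* index of the last entry of the segment list */
};

static bool demo_validate_srh(const struct demo_srh *srh, int len)
{
	/* The header occupies (hdrlen + 1) * 8 bytes and must fill the buffer. */
	if (((srh->hdrlen + 1) << 3) != len)
		return false;

	/* After the patch: any active segment inside the list is accepted. */
	if (srh->segments_left > srh->first_segment)
		return false;

	return true;
}
```

For a two-segment SRH (8 bytes of fixed header plus 2 * 16 bytes of segments, so hdrlen is 4 and len is 40), both segments_left values 0 and 1 pass, whereas the previous equality test accepted only 1.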

[PATCH v2] ravb: add wake-on-lan support via magic packet

2017-07-30 Thread Niklas Söderlund

WoL is enabled in the suspend callback by setting MagicPacket detection
and disabling all interrupts except MagicPacket. In the resume path the
driver needs to reset the hardware to rearm the WoL logic. This prevents
the driver from simply restoring the registers and taking advantage of
the fact that ravb was not suspended to reduce resume time. To reset the
hardware the driver closes the device, sets it in reset mode and reopens
the device, just like it would do in a normal suspend/resume scenario
without WoL enabled, but it both closes and opens the device in the
resume callback since the device needs to be reset for WoL to work.

One quirk needed for WoL is that the module clock needs to be prevented
from being switched off by Runtime PM. To keep the clock alive the
suspend callback needs to call clk_enable() directly to increase the
usage count of the clock. Then when Runtime PM decreases the clock usage
count it won't reach 0 and be switched off.
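
The usage-count argument can be illustrated with a toy model. This is a userspace sketch under stated assumptions: demo_clk and its helpers are hypothetical stand-ins and do not reproduce the kernel clk API or the Runtime PM core; they only mimic "the clock is gated when the count reaches 0".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy clock with a usage count; it is gated off when the count hits 0. */
struct demo_clk {
	int enable_count;
};

static void demo_clk_enable(struct demo_clk *clk)
{
	clk->enable_count++;
}

static void demo_clk_disable(struct demo_clk *clk)
{
	if (clk->enable_count > 0)
		clk->enable_count--;
}

static bool demo_clk_is_on(const struct demo_clk *clk)
{
	return clk->enable_count > 0;
}

/*
 * Suspend path: with WoL enabled the driver takes an extra reference
 * before the (simulated) Runtime PM code drops its own reference.
 */
static void demo_suspend(struct demo_clk *clk, bool wol_enabled)
{
	if (wol_enabled)
		demo_clk_enable(clk);
	demo_clk_disable(clk);	/* Runtime PM releasing the clock */
}
```

Starting from a count of 1, the clock stays on across demo_suspend() only when wol_enabled is true, which is the behaviour the paragraph above relies on. The matching release in the resume path is left out of the sketch.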

Signed-off-by: Niklas Söderlund 
---
 drivers/net/ethernet/renesas/ravb.h  |   2 +
 drivers/net/ethernet/renesas/ravb_main.c | 130 ++-
 2 files changed, 128 insertions(+), 4 deletions(-)

Changes from v1
- Fix issue where device would fail to resume from PSCI suspend if WoL
  was enabled, reported by Geert. The fault was that the clock driver
  thinks the clock is on, but PSCI has disabled it. Added a workaround
  for this in the ravb driver, which can be removed once the clock driver
  is aware of the PSCI behavior.
- Only try to restore from WoL wake-up if netif is running. Since this
  is a condition for enabling WoL in the first place, this was a bug in v1.


diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
index 0525bd696d5d02e5..96a27b00c90e212a 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h
@@ -991,6 +991,7 @@ struct ravb_private {
struct net_device *ndev;
struct platform_device *pdev;
void __iomem *addr;
+   struct clk *clk;
struct mdiobb_ctrl mdiobb;
u32 num_rx_ring[NUM_RX_QUEUE];
u32 num_tx_ring[NUM_TX_QUEUE];
@@ -1033,6 +1034,7 @@ struct ravb_private {
 
unsigned no_avb_link:1;
unsigned avb_link_active_low:1;
+   unsigned wol_enabled:1;
 };
 
 static inline u32 ravb_read(struct net_device *ndev, enum ravb_reg reg)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 5931e859876c2aee..3d399f85417a83cf 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -680,6 +680,9 @@ static void ravb_emac_interrupt_unlocked(struct net_device *ndev)
 
ecsr = ravb_read(ndev, ECSR);
ravb_write(ndev, ecsr, ECSR);   /* clear interrupt */
+
+   if (ecsr & ECSR_MPD)
+   pm_wakeup_event(&priv->pdev->dev, 0);
if (ecsr & ECSR_ICD)
ndev->stats.tx_carrier_errors++;
if (ecsr & ECSR_LCHNG) {
@@ -1330,6 +1333,33 @@ static int ravb_get_ts_info(struct net_device *ndev,
return 0;
 }
 
+static void ravb_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
+{
+   struct ravb_private *priv = netdev_priv(ndev);
+
+   wol->supported = 0;
+   wol->wolopts = 0;
+
+   if (priv->clk) {
+   wol->supported = WAKE_MAGIC;
+   wol->wolopts = priv->wol_enabled ? WAKE_MAGIC : 0;
+   }
+}
+
+static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
+{
+   struct ravb_private *priv = netdev_priv(ndev);
+
+   if (!priv->clk || wol->wolopts & ~WAKE_MAGIC)
+   return -EOPNOTSUPP;
+
+   priv->wol_enabled = !!(wol->wolopts & WAKE_MAGIC);
+
+   device_set_wakeup_enable(&priv->pdev->dev, priv->wol_enabled);
+
+   return 0;
+}
+
 static const struct ethtool_ops ravb_ethtool_ops = {
.nway_reset = ravb_nway_reset,
.get_msglevel   = ravb_get_msglevel,
@@ -1343,6 +1373,8 @@ static const struct ethtool_ops ravb_ethtool_ops = {
.get_ts_info= ravb_get_ts_info,
.get_link_ksettings = ravb_get_link_ksettings,
.set_link_ksettings = ravb_set_link_ksettings,
+   .get_wol= ravb_get_wol,
+   .set_wol= ravb_set_wol,
 };
 
 static inline int ravb_hook_irq(unsigned int irq, irq_handler_t handler,
@@ -2041,6 +2073,11 @@ static int ravb_probe(struct platform_device *pdev)
 
priv->chip_id = chip_id;
 
+   /* Get clock, if not found that's OK but Wake-On-Lan is unavailable */
+   priv->clk = devm_clk_get(&pdev->dev, NULL);
+   if (IS_ERR(priv->clk))
+   priv->clk = NULL;
+
/* Set function */
ndev->netdev_ops = &ravb_netdev_ops;
ndev->ethtool_ops = &ravb_ethtool_ops;
@@ -2107,6 +2144,9 @@ static int ravb_probe(struct platform_device *pdev)
if (error)
goto out_napi_del;
 
+

Re: [net-next PATCH 11/12] net: add notifier hooks for devmap bpf map

2017-07-30 Thread Levin, Alexander (Sasha Levin)

On Mon, Jul 17, 2017 at 09:30:02AM -0700, John Fastabend wrote:
>@@ -341,9 +368,11 @@ static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
>* Remembering the driver side flush operation will happen before the
>* net device is removed.
>*/
>+  mutex_lock(&dev_map_list_mutex);
>   old_dev = xchg(&dtab->netdev_map[i], dev);
>   if (old_dev)
>   call_rcu(&old_dev->rcu, __dev_map_entry_free);
>+  mutex_unlock(&dev_map_list_mutex);
>
>   return 0;
> }

This function gets called inside an RCU read-side critical section, where
we can't grab mutexes:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 16315, name: syz-executor1
1 lock held by syz-executor1/16315:
 #0:  (rcu_read_lock){..}, at: [] map_delete_elem 
kernel/bpf/syscall.c:577 [inline]
 #0:  (rcu_read_lock){..}, at: [] SYSC_bpf 
kernel/bpf/syscall.c:1427 [inline]
 #0:  (rcu_read_lock){..}, at: [] SyS_bpf+0x1d32/0x4ba0 
kernel/bpf/syscall.c:1388
Preemption disabled at:
[] map_delete_elem kernel/bpf/syscall.c:582 [inline]
[] SYSC_bpf kernel/bpf/syscall.c:1427 [inline]
[] SyS_bpf+0x1d41/0x4ba0 kernel/bpf/syscall.c:1388
CPU: 2 PID: 16315 Comm: syz-executor1 Not tainted 4.13.0-rc2-next-20170727 #235
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 
04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x11d/0x1e5 lib/dump_stack.c:52
 ___might_sleep+0x3cc/0x520 kernel/sched/core.c:6001
 __might_sleep+0x95/0x190 kernel/sched/core.c:5954
 __mutex_lock_common kernel/locking/mutex.c:747 [inline]
 __mutex_lock+0x146/0x19b0 kernel/locking/mutex.c:893
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
 dev_map_delete_elem+0x82/0x110 kernel/bpf/devmap.c:325
 map_delete_elem kernel/bpf/syscall.c:585 [inline]
 SYSC_bpf kernel/bpf/syscall.c:1427 [inline]
 SyS_bpf+0x1deb/0x4ba0 kernel/bpf/syscall.c:1388
 do_syscall_64+0x26a/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x452309
RSP: 002b:7f8d83d66c08 EFLAGS: 0216 ORIG_RAX: 0141
RAX: ffda RBX: 00718000 RCX: 00452309
RDX: 0010 RSI: 20007000 RDI: 0003
RBP: 0270 R08:  R09: 
R10:  R11: 0216 R12: 004b85e4
R13:  R14: 0003 R15: 20007000

-- 

Thanks,
Sasha

[PATCH net] ipv6: set fc_protocol with 0 when rtm_protocol is RTPROT_REDIRECT

2017-07-30 Thread Xin Long

After commit c2ed1880fd61 ("net: ipv6: check route protocol when
deleting routes"), ipv6 route checks rt protocol when trying to
remove a rt entry.

It introduced a side effect when flushing caches with iproute, in
which all route caches get dumped from the kernel and then removed one
by one by sending an RTM_DELROUTE request to the kernel for each cache.

The problem is that iproute sends the request with the cache's proto
set to RTPROT_REDIRECT, as filled in by rt6_fill_node() when the kernel
dumps it. But in the kernel the rt cache's protocol is still 0, so the
cache is not found and not removed.

As rt6_fill_node() always sets the rtm proto to RTPROT_REDIRECT when
rt cache info goes to rtmsg, the reverse process is needed when
users remove a route cache and rtmsg goes to cfg.

This patch fixes it by keeping the cfg proto as 0 when the rtm proto
is REDIRECT. It's a safe fix, as the rtm proto is set to REDIRECT
only if the rt flags include RTF_DYNAMIC, which is set when creating a
rt cache in rt6_do_redirect(), where the cache's proto is always 0.

Note that this issue can also be avoided in iproute by changing the
rtm proto back to 0 before sending DELROUTE requests for a cache.
But the kernel-side fix is still necessary, as the kernel should
do the reverse conversion when rtm goes to cfg.

Fixes: c2ed1880fd61 ("net: ipv6: check route protocol when deleting routes")
Signed-off-by: Xin Long 
---
 net/ipv6/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4d30c96..187580f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2912,9 +2912,11 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
cfg->fc_dst_len = rtm->rtm_dst_len;
cfg->fc_src_len = rtm->rtm_src_len;
cfg->fc_flags = RTF_UP;
-   cfg->fc_protocol = rtm->rtm_protocol;
cfg->fc_type = rtm->rtm_type;
 
+   if (rtm->rtm_protocol != RTPROT_REDIRECT)
+   cfg->fc_protocol = rtm->rtm_protocol;
+
if (rtm->rtm_type == RTN_UNREACHABLE ||
rtm->rtm_type == RTN_BLACKHOLE ||
rtm->rtm_type == RTN_PROHIBIT ||
-- 
2.1.0
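
The "reverse conversion" argued for in the commit message can be sketched as a pair of small functions. This is a userspace illustration, not kernel code: the DEMO_ constants and both function names are hypothetical, and the dump side is reduced to the RTF_DYNAMIC case discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative protocol identifiers for this sketch only. */
#define DEMO_RTPROT_UNSPEC	0
#define DEMO_RTPROT_REDIRECT	1
#define DEMO_RTPROT_STATIC	4

/*
 * Dump direction: a redirect-created cache entry keeps protocol 0
 * internally but is reported to user space as REDIRECT.
 */
static uint8_t demo_dump_protocol(uint8_t rt_protocol, int is_redirect_cache)
{
	return is_redirect_cache ? DEMO_RTPROT_REDIRECT : rt_protocol;
}

/*
 * Request direction: undo the mapping so that the value compared against
 * the stored route is the internal one again.
 */
static uint8_t demo_cfg_protocol(uint8_t rtm_protocol)
{
	if (rtm_protocol == DEMO_RTPROT_REDIRECT)
		return DEMO_RTPROT_UNSPEC;
	return rtm_protocol;
}
```

A dump followed by a delete request then round-trips to the stored value for both a redirect cache (0) and an ordinary static route, which is what lets the protocol check on deletion match.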

Re: [PATCH] b43legacy: Fix a sleep-in-atomic bug in b43legacy_attr_interfmode_store

2017-07-30 Thread Michael Büsch

On Fri, 02 Jun 2017 09:18:14 +0800
Jia-Ju Bai  wrote:

> On 06/02/2017 12:11 AM, Jonathan Corbet wrote:
> > On Thu, 01 Jun 2017 09:05:07 +0800
> > Jia-Ju Bai  wrote:
> >  
> >> I admit my patches are not well tested, and they may not well fix the bugs.
> >> I am looking forward to opinions and suggestions :)  
> > May I politely suggest that sending out untested locking changes is a
> > dangerous thing to do?  You really should not be changing the locking in a
> > piece of kernel code without understanding very well what the lock is
> > protecting and being able to say why your changes are safe.  Without that,
> > the risk of introducing subtle bugs is very high.
> >
> > It looks like you have written a useful tool that could help us to make
> > the kernel more robust.  If you are interested in my suggestion, I would
> > recommend that you post the sleep-in-atomic scenarios that you are
> > finding, but refrain from "fixing" them in any case where you cannot offer
> > a strong explanation of why your fix is correct.
> >
> > Thanks for working to find bugs in the kernel!
> >
> > jon  
> Hi,
> 
> Thanks for your good and helpful advice. I am sorry for my improper patches.
> I will only report bugs instead of sending improper patches when I have 
> no good solution of fixing the bugs.


Is somebody still working on these fixes?
I think I found my old b43-legacy based 4306, so that I will
be able to get these patches into properly tested shape.

-- 
Michael



Re: [PATCH V2 net-next 20/21] net-next/hinic: Add ethtool and stats

2017-07-30 Thread Aviad Krawczyk

Hi,

I saw that netif_err is more common in the code. Is it preferred over netdev_err?
What is the preferred style, netif_ or netdev_?

Best Regards,
Aviad

On 7/27/2017 1:33 AM, Andrew Lunn wrote:
> On Wed, Jul 19, 2017 at 03:36:28PM +0300, Aviad Krawczyk wrote:
>> Hi Joe,
>>
>> I tried to be consistent with the comments before, that requested
>> that we will use dev_err exclude some special cases for use netif.
>>
>> We will replace the dev_err(>dev,.. to netdev_err in the
>> next fix.
> 
> netdev_err() should be used when possible. You just have to be careful
> in the probe() function, before netdev exists and you get "(NULL
> net_device):" or before it is registered and you get "(unnamed
> net_device)" instead of "eth42" etc.
> 
> Andrew
> 
> .
>

Re: [PATCH V5 net-next 2/8] net: hns3: Add support of the HNAE3 framework

2017-07-30 Thread Leon Romanovsky

On Fri, Jul 28, 2017 at 11:26:46PM +0100, Salil Mehta wrote:
> This patch adds the support of the HNAE3 (Hisilicon Network
> Acceleration Engine 3) framework support to the HNS3 driver.
>
> Framework facilitates clients like ENET(HNS3 Ethernet Driver), RoCE
> and user-space Ethernet drivers (like ODP etc.) to register with HNAE3
> devices and their associated operations.
>
> Signed-off-by: Daode Huang 
> Signed-off-by: lipeng 
> Signed-off-by: Salil Mehta 
> Signed-off-by: Yisen Zhuang 
> ---
> Patch V5: Addressed following comments
>   1. Leon Romanovsky:
>  https://lkml.org/lkml/2017/7/23/67
> Patch V4: Addressed following comments
>   1. Andrew Lunn:
>  https://lkml.org/lkml/2017/6/17/233
>  https://lkml.org/lkml/2017/6/18/105
>   2. Bo Yu:
>  https://lkml.org/lkml/2017/6/18/112
>   3. Stephen Hamminger:
>  https://lkml.org/lkml/2017/6/19/778
> Patch V3: Addressed below comments
>   1. Andrew Lunn:
>  https://lkml.org/lkml/2017/6/13/1025
> Patch V2: No change
> Patch V1: Initial Submit
> ---
>  drivers/net/ethernet/hisilicon/hns3/hnae3.c | 319 
>  drivers/net/ethernet/hisilicon/hns3/hnae3.h | 444 
> 
>  2 files changed, 763 insertions(+)
>  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.c
>  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.h
>
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c 
> b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
> new file mode 100644
> index 000..d28b69d
> --- /dev/null
> +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
> @@ -0,0 +1,319 @@
> +/*
> + * Copyright (c) 2016-2017 Hisilicon Limited.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "hnae3.h"
> +
> +static LIST_HEAD(hnae3_ae_algo_list);
> +static LIST_HEAD(hnae3_client_list);
> +static LIST_HEAD(hnae3_ae_dev_list);
> +
> +/* we are keeping things simple and using single lock for all the
> + * list. This is a non-critical code so other updations, if happen
> + * in parallel, can wait.
> + */
> +static DEFINE_MUTEX(hnae3_common_lock);
> +
> +static bool hnae3_client_match(enum hnae3_client_type client_type,
> +enum hnae3_dev_type dev_type)
> +{
> + if (dev_type == HNAE3_DEV_KNIC) {
> + switch (client_type) {
> + case HNAE3_CLIENT_KNIC:
> + case HNAE3_CLIENT_ROCE:
> + return true;
> + default:
> + return false;
> + }
> + } else if (dev_type == HNAE3_DEV_UNIC) {
> + switch (client_type) {
> + case HNAE3_CLIENT_UNIC:
> + return true;
> + default:
> + return false;
> + }
> + } else {
> + return false;
> + }
> +}

Slightly compact version:

static bool hnae3_client_match(enum hnae3_client_type client_type,
			       enum hnae3_dev_type dev_type)
{
	if (dev_type == HNAE3_DEV_KNIC &&
	    (client_type == HNAE3_CLIENT_KNIC ||
	     client_type == HNAE3_CLIENT_ROCE))
		return true;
	if (dev_type == HNAE3_DEV_UNIC && client_type == HNAE3_CLIENT_UNIC)
		return true;
	return false;
}


> +
> +static int hnae3_match_n_instantiate(struct hnae3_client *client,
> +  struct hnae3_ae_dev *ae_dev,
> +  bool is_reg, bool *matched)
> +{
> + int ret;
> +
> + *matched = false;
> +
> + /* check if this client matches the type of ae_dev */
> + if (!(hnae3_client_match(client->type, ae_dev->dev_type) &&
> +   hnae_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B))) {
> + return 0;
> + }
> + /* there is a match of client and dev */
> + *matched = true;
> +
> + if (!(ae_dev->ops && ae_dev->ops->init_client_instance &&
> +   ae_dev->ops->uninit_client_instance)) {
> + dev_err(&ae_dev->pdev->dev,
> + "ae_dev or client init/uninit ops are null\n");
> + return -EOPNOTSUPP;
> + }

You should check it during registration phase, IMHO in other places it is
safe to assume that you have init/uninit functions.

> +
> + /* now, (un-)instantiate client by calling lower layer */
> + if (is_reg) {
> + ret = ae_dev->ops->init_client_instance(client, ae_dev);
> + if (ret)
> + dev_err(&ae_dev->pdev->dev,
> + "fail to instantiate client\n");
> + return ret;
> + }
> +
> + ae_dev->ops->uninit_client_instance(client,

[PATCH] netfilter: ipvs: Fix space before '[' error.

2017-07-30 Thread Arvind Yadav

Fix checkpatch.pl error:
ERROR: space prohibited before open square bracket '['.

Signed-off-by: Arvind Yadav 
---
 net/netfilter/ipvs/ip_vs_proto_tcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 12dc8d5..4fc17fc 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -439,7 +439,7 @@ static bool tcp_state_active(int state)
return tcp_state_active_table[state];
 }
 
-static struct tcp_states_t tcp_states [] = {
+static struct tcp_states_t tcp_states[] = {
 /* INPUT */
 /*sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA*/
 /*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR }},
@@ -462,7 +462,7 @@ static struct tcp_states_t tcp_states [] = {
 /*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sCL }},
 };
 
-static struct tcp_states_t tcp_states_dos [] = {
+static struct tcp_states_t tcp_states_dos[] = {
 /* INPUT */
 /*sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA*/
 /*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSA }},
-- 
2.7.4

Re: [PATCH net] bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len

2017-07-30 Thread David Miller

From: Daniel Borkmann 
Date: Fri, 28 Jul 2017 17:05:25 +0200

> bpf_prog_size(prog->len) is not the correct length we want to dump
> back to user space. The code in bpf_prog_get_info_by_fd() uses this
> to copy prog->insnsi to user space, but bpf_prog_size(prog->len) also
> includes the size of struct bpf_prog itself plus program instructions
> and is usually used either in context of accounting or for bpf_prog_alloc()
> et al, thus we copy out of bounds in bpf_prog_get_info_by_fd()
> potentially. Use the correct bpf_prog_insn_size() instead.
> 
> Fixes: 1e2709769086 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
> Signed-off-by: Daniel Borkmann 

Applied, thanks Daniel.
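
To make the size mismatch described in the quoted commit message concrete, here is a userspace sketch with a made-up program layout. struct demo_prog, demo_prog_size() and demo_prog_insn_size() are hypothetical; they only mirror the distinction between "header plus instructions" and "instructions only", not the real struct bpf_prog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_insn {
	uint64_t raw;			/* one 8-byte instruction */
};

struct demo_prog {
	uint32_t len;			/* number of instructions */
	uint32_t flags;
	void *aux;
	struct demo_insn insns[];	/* instructions follow the header */
};

/* Allocation/accounting size: header plus all instructions. */
static size_t demo_prog_size(uint32_t len)
{
	return sizeof(struct demo_prog) + (size_t)len * sizeof(struct demo_insn);
}

/* Size of the instruction array alone, i.e. what may be copied out. */
static size_t demo_prog_insn_size(uint32_t len)
{
	return (size_t)len * sizeof(struct demo_insn);
}
```

Copying demo_prog_size(len) bytes starting at insns would read sizeof(struct demo_prog) bytes past the end of the instruction array, which is the kind of overrun the fix avoids.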

Re: [PATCH net] mcs7780: Silence uninitialized variable warning

2017-07-30 Thread David Miller

From: Dan Carpenter 
Date: Fri, 28 Jul 2017 17:45:11 +0300

> - __u16 rval;
> + __u16 rval = -1;

Fixing a bogus warning by assigning a signed constant to an
unsigned variable doesn't really make me all that happy.

I don't think I'll apply this, sorry.
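
For context, a small standalone sketch of what the rejected initialisation does in C. It uses uint16_t from <stdint.h> as a stand-in for the kernel's __u16, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The signed constant -1 is converted modulo 2^16 on assignment. */
static uint16_t demo_init_signed(void)
{
	uint16_t rval = -1;

	return rval;
}

/* The same bit pattern written as an unsigned constant. */
static uint16_t demo_init_unsigned(void)
{
	uint16_t rval = UINT16_MAX;

	return rval;
}
```

Both functions return 0xFFFF, so the two forms are equivalent at run time; the difference is only in how the intent reads.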

Re: [PATCH net] tcp: avoid bogus gcc-7 array-bounds warning

2017-07-30 Thread David Miller

From: Arnd Bergmann 
Date: Fri, 28 Jul 2017 16:41:37 +0200

> When using CONFIG_UBSAN_SANITIZE_ALL, the TCP code produces a
> false-positive warning:
> 
> net/ipv4/tcp_output.c: In function 'tcp_connect':
> net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds 
> [-Werror=array-bounds]
>tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
> ^~
> net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds 
> [-Werror=array-bounds]
>tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
>~^
> 
> I have opened a gcc bug for this, but distros have already shipped
> compilers with this problem, and it's not clear yet whether there is
> a way for gcc to avoid the warning. As the problem is related to the
> bitfield access, this introduces a temporary variable to store the old
> enum value.
> 
> I did not notice this warning earlier, since UBSAN is disabled when
> building with COMPILE_TEST, and that was always turned on in both
> allmodconfig and randconfig tests.
> 
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601
> Signed-off-by: Arnd Bergmann 

Applied, thanks Arnd.
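
The workaround described in the quoted commit message (reading the bitfield into a temporary before using it as an array index) can be sketched as follows. This is a userspace approximation with hypothetical names (demo_sock, demo_chrono_set); it is not the code from net/ipv4/tcp_output.c.

```c
#include <assert.h>
#include <stdint.h>

enum demo_chrono {
	DEMO_UNSPEC = 0,
	DEMO_BUSY,
	DEMO_RWND_LIMITED,
	DEMO_SNDBUF_LIMITED,
	DEMO_CHRONO_MAX
};

struct demo_sock {
	unsigned int chrono_type : 2;			/* bitfield holding the enum */
	uint32_t chrono_start;
	uint32_t chrono_stat[DEMO_CHRONO_MAX - 1];	/* one slot per non-zero type */
};

static void demo_chrono_set(struct demo_sock *tp, enum demo_chrono new_type,
			    uint32_t now)
{
	/* Read the bitfield once into a plain variable before indexing. */
	enum demo_chrono old = tp->chrono_type;

	if (old > DEMO_UNSPEC)
		tp->chrono_stat[old - 1] += now - tp->chrono_start;
	tp->chrono_start = now;
	tp->chrono_type = new_type;
}
```

The index expression now depends on an ordinary local whose range check is visible to the compiler, instead of on a second read of the bitfield.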

Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-30 Thread K. Den

On Wed, 2017-07-26 at 19:08 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 26, 2017 at 09:37:15PM +0800, Jason Wang wrote:
> > 
> > 
> > On 2017年07月26日 21:18, Jason Wang wrote:
> > > 
> > > 
> > > On 2017年07月26日 20:57, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
> > > > > This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> > > > > was reported to break vhost_net. We want to cache used event and use
> > > > > it to check for notification. We try to valid cached used event by
> > > > > checking whether or not it was ahead of new, but this is not correct
> > > > > all the time, it could be stale and there's no way to know about this.
> > > > > 
> > > > > Signed-off-by: Jason Wang
> > > > 
> > > > Could you supply a bit more data here please?  How does it get stale?
> > > > What does guest need to do to make it stale?  This will be helpful if
> > > > anyone wants to bring it back, or if we want to extend the protocol.
> > > > 
> > > 
> > > The problem we don't know whether or not guest has published a new used
> > > event. The check vring_need_event(vq->last_used_event, new + vq->num,
> > > new) is not sufficient to check for this.
> > > 
> > > Thanks
> > 
> > More notes, the previous assumption is that we don't move used event back,
> > but this could happen in fact if idx wraps around.
> 
> You mean if the 16 bit index wraps around after 64K entries.
> Makes sense.
> 
> > Will repost and add
> > this into commit log.
> > 
> > Thanks

Hi,

Just out of curiosity, I have a question.
AFAIU, if you wanted to keep the caching mechanism in the code base,
would the following two changes resolve the issue?
(1) Always fetch the latest event value from the guest when the signalled_used
event is invalid, which includes the case where last_used_idx wraps around.
Otherwise we might need changes that overcomplicate the logic for deciding
whether or not to skip signalling in the next vhost_notify round.
(2) On top of that, split the signal-postponing logic into three cases:
* if vq.num is in the interval [2^16, UINT_MAX]:
any cached event falls in the should-postpone-signalling interval, so
paradoxically we must always signal.
* else if vq.num is in the interval [2^15, 2^16):
the logic in the original patch (809ecb9bca6a9) suffices.
* else (vq.num less than 2^15) (optional):
checking only vring_need_event(vq->last_used_event, new + vq->num, new)
would suffice.

Am I missing something, or is this irrelevant?
I would appreciate it if you could elaborate a bit more on how the event idx
wrapping around and moving back causes trouble.
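
For reference, here is a minimal sketch of the 16-bit arithmetic under
discussion. need_event() uses the same expression as vring_need_event() in
include/uapi/linux/virtio_ring.h. skip_with_cache() is a hypothetical helper
that only illustrates the "cached event is ahead of new and was not crossed"
test. It is not the code from the reverted commit, and the index values in the
usage note are made up.

```c
#include <stdint.h>

/* Same arithmetic as vring_need_event(): true when the used index, moving
 * from old to new_idx, stepped past event_idx. All values are 16-bit and
 * wrap modulo 65536. */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/* Hypothetical helper (an assumption for illustration): returns 1 when a
 * host relying only on a cached event value would skip the signal, i.e. the
 * cached event lies in [new_idx, new_idx + num) and was not crossed by the
 * old -> new_idx update. */
static int skip_with_cache(uint16_t cached_event, uint16_t old,
			   uint16_t new_idx, uint16_t num)
{
	int ahead = need_event(cached_event, (uint16_t)(new_idx + num), new_idx);
	int crossed = need_event(cached_event, new_idx, old);

	return ahead && !crossed;
}
```

With made-up values num = 256, old = 90, new = 95 and a cached event of 100,
skip_with_cache(100, 90, 95, 256) returns 1, so the signal would be skipped.
If the guest had meanwhile published event = 92, need_event(92, 95, 90)
returns 1, meaning a signal was actually required. The cached value alone
cannot tell these two situations apart. The arithmetic itself handles
wrap-around: need_event(65534, 2, 65530) also returns 1.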

Thanks.

Re: [PATCH net-next v2 0/3] ethtool: support for forward error correction mode setting on a link

2017-07-30 Thread David Miller

From: Roopa Prabhu 
Date: Thu, 27 Jul 2017 16:47:25 -0700

> From: Roopa Prabhu 
> 
> Forward Error Correction (FEC) modes, i.e. Base-R
> and Reed-Solomon modes, were introduced in the 25G/40G/100G standards
> to provide good BER at high speeds. Various networking devices
> which support 25G/40G/100G provide the ability to manage supported FEC
> modes, and the lack of FEC encoding control and reporting today is a
> source of interoperability issues for many vendors.
> FEC capability as well as a specific FEC mode, i.e. Base-R
> or RS, can be requested or advertised through bits D44:47 of the base link
> codeword.
> 
> This patch set adds an ethtool option to manage and report
> FEC encoding settings for networking devices, as per the IEEE 802.3
> bj, bm and by specs.
> 
> v2 :
> - minor patch format fixes and typos pointed out by Andrew
> - there was a pending discussion on the use of 'auto' vs
>   'automatic' for fec settings. I have left it as 'auto'
>   because in most cases today auto is used in place of
>   automatic to represent automatically generated values.
>   We use it in other networking config too. I would prefer
>   leaving it as auto.

Series applied to net-next, thank you.
