Re: [PATCH net] tun: Fix use-after-free on XDP_TX

2018-07-12 Thread Jesper Dangaard Brouer
On Fri, 13 Jul 2018 13:05:04 +0800
Jason Wang  wrote:

> > On 2018-07-13 12:24, Toshiaki Makita wrote:
> > On XDP_TX we need to free up the frame only when tun_xdp_tx() returns a
> > negative value. A positive value indicates that the packet is
> > successfully enqueued to the ptr_ring, so freeing the page causes
> > use-after-free.
> >
> > Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking")
> > Signed-off-by: Toshiaki Makita 
> > ---
> >   drivers/net/tun.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index a192a01..f5727ba 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1688,7 +1688,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
> > case XDP_TX:
> > get_page(alloc_frag->page);
> > alloc_frag->offset += buflen;
> > -   if (tun_xdp_tx(tun->dev, &xdp))
> > +   if (tun_xdp_tx(tun->dev, &xdp) < 0)
> > goto err_redirect;
> > rcu_read_unlock();
> > local_bh_enable();  
> 
> Acked-by: Jason Wang 

Acked-by: Jesper Dangaard Brouer 

Thanks for catching and fixing this!

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


[PATCH net-next] net: ip6_gre: get ipv6hdr after skb_cow_head()

2018-07-12 Thread Prashant Bhole
A KASAN:use-after-free bug was found related to ip6-erspan
while running selftests/net/ip6_gre_headroom.sh

It happens because of the following sequence:
- ipv6hdr pointer is obtained from skb
- skb_cow_head() is called, skb->head memory is reallocated
- old data is accessed using ipv6hdr pointer

skb_cow_head() call was added in e41c7c68ea77 ("ip6erspan: make sure
enough headroom at xmit."), but looking at the history there was a
chance of a similar bug because gre_handle_offloads() and pskb_trim()
can also reallocate skb->head memory. The Fixes tag points to the commit
which introduced the possibility of this bug.

This patch moves ipv6hdr pointer assignment after skb_cow_head() call.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Prashant Bhole 
---
 net/ipv6/ip6_gre.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 367177786e34..fc7dd3a04360 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -927,7 +927,6 @@ static netdev_tx_t ip6gre_tunnel_xmit(struct sk_buff *skb,
 static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 struct net_device *dev)
 {
-   struct ipv6hdr *ipv6h = ipv6_hdr(skb);
struct ip6_tnl *t = netdev_priv(dev);
struct dst_entry *dst = skb_dst(skb);
struct net_device_stats *stats;
@@ -1012,6 +1011,8 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
goto tx_err;
}
} else {
+   struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+
switch (skb->protocol) {
case htons(ETH_P_IP):
memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
-- 
2.17.1




Re: [PATCH v3 net-next] net/sched: add skbprio scheduler

2018-07-12 Thread Cong Wang
On Wed, Jul 11, 2018 at 11:37 AM Marcelo Ricardo Leitner
 wrote:
>
> On Tue, Jul 10, 2018 at 07:32:43PM -0700, Cong Wang wrote:
> > On Mon, Jul 9, 2018 at 12:53 PM Marcelo Ricardo Leitner
> >  wrote:
> > >
> > > On Mon, Jul 09, 2018 at 02:18:33PM -0400, Michel Machado wrote:
> > > >
> > > >2. sch_prio.c does not have a global limit on the number of packets on all its queues, only a limit per queue.
> > >
> > > It can be useful to sch_prio.c as well, why not?
> > > prio_enqueue()
> > > {
> > > ...
> > > +   if (count > sch->global_limit)
> > > +   prio_tail_drop(sch);   /* to be implemented */
> > > ret = qdisc_enqueue(skb, qdisc, to_free);
> > >
> >
> > Isn't the whole point of sch_prio offloading the queueing to
> > each class? If you need a limit, there is one for each child
> > qdisc if you use for example pfifo or bfifo (depending on whether you
> > want to limit bytes or packets).
>
> Yes, but Michel wants to drop from other lower priorities if needed,
> and that's not possible if you handle the limit already in a child
> qdisc as they don't know about their siblings. The idea in the example
> above is to discard it from whatever lower priority is needed, then
> queue it. (ok, the example neglected to check the priority level)

So it disproves your point of adding a flag to sch_prio, right?

Also, you have to re-introduce qdisc->ops->drop() if you really want
to go this direction.

>
> As for the different units, sch_prio holds a count of how many packets
> are queued on its children, and that's what would be used for the limit.
>
> >
> > Also, what's your plan for backward compatibility here?
>
> say:
>   if (sch->global_limit && count > sch->global_limit)
> as in, only do the limit check/enforcing if needed.

That obviously doesn't work; users could pass 0 to effectively
prevent the qdisc from enqueueing any packet.


Re: [PATCH net] tun: Fix use-after-free on XDP_TX

2018-07-12 Thread Jason Wang




On 2018-07-13 12:24, Toshiaki Makita wrote:

On XDP_TX we need to free up the frame only when tun_xdp_tx() returns a
negative value. A positive value indicates that the packet is
successfully enqueued to the ptr_ring, so freeing the page causes
use-after-free.

Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Toshiaki Makita 
---
  drivers/net/tun.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a192a01..f5727ba 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1688,7 +1688,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
case XDP_TX:
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
-   if (tun_xdp_tx(tun->dev, &xdp))
+   if (tun_xdp_tx(tun->dev, &xdp) < 0)
goto err_redirect;
rcu_read_unlock();
local_bh_enable();


Acked-by: Jason Wang 


[PATCH net] tun: Fix use-after-free on XDP_TX

2018-07-12 Thread Toshiaki Makita
On XDP_TX we need to free up the frame only when tun_xdp_tx() returns a
negative value. A positive value indicates that the packet is
successfully enqueued to the ptr_ring, so freeing the page causes
use-after-free.

Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Toshiaki Makita 
---
 drivers/net/tun.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a192a01..f5727ba 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1688,7 +1688,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
case XDP_TX:
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
-   if (tun_xdp_tx(tun->dev, &xdp))
+   if (tun_xdp_tx(tun->dev, &xdp) < 0)
goto err_redirect;
rcu_read_unlock();
local_bh_enable();
-- 
1.8.3.1




Re: [PATCH net-next v6 00/11] Modify action API for implementing lockless actions

2018-07-12 Thread Cong Wang
On Sat, Jul 7, 2018 at 8:43 PM David Miller  wrote:
>
> From: Vlad Buslov 
> Date: Thu,  5 Jul 2018 17:24:22 +0300
>
> > Currently, all netlink protocol handlers for updating rules, actions and
> > qdiscs are protected with single global rtnl lock which removes any
> > possibility for parallelism. This patch set is a first step to remove
> > rtnl lock dependency from TC rules update path.
>  ...
>
> I'll apply this for now, I reviewed it a few more times and I see
> where you are going with this.

Dear David,

I don't understand why you even believe the claim of lockless
updates here; at the very least, a claim like this should raise
a red flag.

I know you don't trust me, so how about thinking of it this way:

Why does RCU still require a lock for RCU writers? (Or at least
RCU recommends a lock, if anyone really wants to point out some
lockless algorithm here.)

or:

If writers could really go lockless as easily as Vlad claims, why has
even Paul E. McKenney never brought it into RCU?

Maybe Vlad is much cleverer than any of us here, and maybe he really
discovers a very brilliant algorithm to allow TC actions to be updated
locklessly, why not wait until he shows a proof (either code or a paper)?
Is there a rush? I don't see it.

In fact, I discussed this with Vlad a little bit at the netdev TC workshop.
I never saw any brilliant algorithm in his slides, and he told me that he
used "copy and replace" to achieve parallel updates. I told him that is
basically how RCU works, and that RCU writers have to be sync'ed with a
lock (or at least that is recommended).

Also, to confirm my judgement, I checked this with Paul privately too.
Paul said you have to be extremely careful to go lockless, it is very hard
to be bug free for lockless, although he _never_ says it is impossible.

My _personal_ bet is that lockless updates for TC filters or actions
are impossible unless there is more hiding behind "copy and
replace", for example some brilliant lockless algorithm. If lockless is
really impossible in this circumstance, then many of your efforts in
this patchset are in vain, by the way.

I _do_ believe you can break RTNL down to per device, per filter or per
action, but no matter how small the locking scope is, there is still a lock.
With a lock, there is no need to make things friendly to lockless, like
making an integer increment inside an action to be atomic (your patch
02/11).

Please _do_ prove my personal judgement is wrong, by showing your
final code or a formal paper/article. I am very *happy* to be proved
to be wrong here, I am very open to change my mind here.

Vlad, we need your proof. Please prove I am wrong, seriously!!! :)

Thanks in advance to anyone who proves me wrong, just in case!!! :)


Re: [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update

2018-07-12 Thread Cong Wang
On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov  wrote:
>
> Implement functions to atomically update and free action cookie
> using rcu mechanism.

Without stating any reason. Is this even a changelog?

>
> Reviewed-by: Marcelo Ricardo Leitner 

Dear Marcelo, how did it pass your review? See below:


> +static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie,
> + struct tc_cookie *new_cookie)
> +{
> +   struct tc_cookie *old;
> +
> +   old = xchg(old_cookie, new_cookie);


This is an incorrect use of RCU, obviously should be rcu_assign_pointer()
here.


> @@ -65,10 +83,7 @@ static void free_tcf(struct tc_action *p)
> free_percpu(p->cpu_bstats);
> free_percpu(p->cpu_qstats);
>
> -   if (p->act_cookie) {
> -   kfree(p->act_cookie->data);
> -   kfree(p->act_cookie);
> -   }
> +   tcf_set_action_cookie(&p->act_cookie, NULL);

So, this is called in free_tcf(), where the action is already
invisible from readers so it is ready to be freed.

The question is:

If the action itself is already ready to be freed, why do you
need RCU here? What could still read 'act->act_cookie'
while 'act' is already invisible?

Its last refcnt is already gone, the fast path RCU readers
are gone too given filters use rcu work already.

Standalone action dump? Again, the last refcnt is already
gone.

Marcelo, Vlad, Jiri, please explain.

Thanks!


Re: [PATCH v2 net-next 9/9] lan743x: Add PTP support

2018-07-12 Thread Richard Cochran
On Thu, Jul 12, 2018 at 03:05:06PM -0400, Bryan Whitehead wrote:
> +static int lan743x_ethtool_get_ts_info(struct net_device *netdev,
> +struct ethtool_ts_info *ts_info)
> +{
> + struct lan743x_adapter *adapter = netdev_priv(netdev);
> +
> + ts_info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE |
> +SOF_TIMESTAMPING_RX_SOFTWARE |
> +SOF_TIMESTAMPING_SOFTWARE |
> +SOF_TIMESTAMPING_TX_HARDWARE |
> +SOF_TIMESTAMPING_RX_HARDWARE |
> +SOF_TIMESTAMPING_RAW_HARDWARE;
> +#ifdef CONFIG_PTP_1588_CLOCK

No need for this ifdeferry - ptp_clock_index() already returns -1 in
that case.

> + if (adapter->ptp.ptp_clock)
> + ts_info->phc_index = ptp_clock_index(adapter->ptp.ptp_clock);
> + else
> + ts_info->phc_index = -1;
> +#else
> + ts_info->phc_index = -1;
> +#endif
> + ts_info->tx_types = BIT(HWTSTAMP_TX_OFF) |
> + BIT(HWTSTAMP_TX_ON);
> + ts_info->rx_filters = BIT(HWTSTAMP_FILTER_NONE) |
> +   BIT(HWTSTAMP_FILTER_ALL);
> + return 0;
> +}
> +

> @@ -690,6 +717,7 @@ const struct ethtool_ops lan743x_ethtool_ops = {
>   .get_rxfh_indir_size = lan743x_ethtool_get_rxfh_indir_size,
>   .get_rxfh = lan743x_ethtool_get_rxfh,
>   .set_rxfh = lan743x_ethtool_set_rxfh,
> + .get_ts_info = lan743x_ethtool_get_ts_info,
>   .get_eee = lan743x_ethtool_get_eee,
>   .set_eee = lan743x_ethtool_set_eee,
>   .get_link_ksettings = phy_ethtool_get_link_ksettings,
> diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
> index 953b581..ca9ae49 100644
> --- a/drivers/net/ethernet/microchip/lan743x_main.c
> +++ b/drivers/net/ethernet/microchip/lan743x_main.c
> @@ -267,6 +267,10 @@ static void lan743x_intr_shared_isr(void *context, u32 int_sts, u32 flags)
>   lan743x_intr_software_isr(adapter);
>   int_sts &= ~INT_BIT_SW_GP_;
>   }
> + if (int_sts & INT_BIT_1588_) {
> + lan743x_ptp_isr(adapter);
> + int_sts &= ~INT_BIT_1588_;
> + }
>   }
>   if (int_sts)
>   lan743x_csr_write(adapter, INT_EN_CLR, int_sts);
> @@ -976,6 +980,7 @@ static void lan743x_phy_link_status_change(struct net_device *netdev)
>  ksettings.base.duplex,
>  local_advertisement,
>  remote_advertisement);
> + lan743x_ptp_update_latency(adapter, ksettings.base.speed);
>   }
>  }
>  
> @@ -1256,11 +1261,29 @@ static void lan743x_tx_release_desc(struct lan743x_tx *tx,
>   buffer_info->dma_ptr = 0;
>   buffer_info->buffer_length = 0;
>   }
> - if (buffer_info->skb) {
> + if (!buffer_info->skb)
> + goto clear_active;
> +
> + if (!(buffer_info->flags &
> + TX_BUFFER_INFO_FLAG_TIMESTAMP_REQUESTED)) {

Bad line break.

>   dev_kfree_skb(buffer_info->skb);
> - buffer_info->skb = NULL;
> + goto clear_skb;
>   }
>  
> + if (cleanup) {
> + lan743x_ptp_unrequest_tx_timestamp(tx->adapter);
> + dev_kfree_skb(buffer_info->skb);
> + } else {
> + lan743x_ptp_tx_timestamp_skb(tx->adapter,
> +  buffer_info->skb,
> +  (buffer_info->flags &
> +  TX_BUFFER_INFO_FLAG_IGNORE_SYNC)
> +  != 0);

This is poor coding style.  Please find a better way.

> + }
> +
> +clear_skb:
> + buffer_info->skb = NULL;
> +
>  clear_active:
>   buffer_info->flags &= ~TX_BUFFER_INFO_FLAG_ACTIVE;
>  
> @@ -1321,10 +1344,25 @@ static int lan743x_tx_get_avail_desc(struct lan743x_tx *tx)
>   return last_head - last_tail - 1;
>  }
>  
> +void lan743x_tx_set_timestamping_mode(struct lan743x_tx *tx,
> +   bool enable_timestamping,
> +   bool enable_onestep_sync)
> +{
> + if (enable_timestamping)
> + tx->ts_flags |= TX_TS_FLAG_TIMESTAMPING_ENABLED;
> + else
> + tx->ts_flags &= ~TX_TS_FLAG_TIMESTAMPING_ENABLED;
> + if (enable_onestep_sync)
> + tx->ts_flags |= TX_TS_FLAG_ONE_STEP_SYNC;
> + else
> + tx->ts_flags &= ~TX_TS_FLAG_ONE_STEP_SYNC;
> +}
> +
>  static int lan743x_tx_frame_start(struct lan743x_tx *tx,
> unsigned char *first_buffer,
> unsigned int first_buffer_length,
> unsigned int frame_length,
> +

Re: [PATCH net-next v3 02/11] devlink: Add callback to query for snapshot id before snapshot create

2018-07-12 Thread Jakub Kicinski
On Thu, 12 Jul 2018 15:13:09 +0300, Alex Vesker wrote:
> To restrict the driver with the snapshot ID selection a new callback
> is introduced for the driver to get the snapshot ID before creating
> a new snapshot. This will also allow giving the same ID to multiple
> snapshots taken of different regions at the same time.

I'm not in position to criticize other people's commit messages :), but
I find this one hard to parse.  I think what you meant to say is that
you add a helper for numbering the snapshot per-devlink instance.
There is no callback to be seen here.  You *prevent* giving the
same ID to multiple snapshots even if they are from different regions.

> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index cac8561..6c92ddd 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -4193,6 +4193,27 @@ void devlink_region_destroy(struct devlink_region *region)
>  }
>  EXPORT_SYMBOL_GPL(devlink_region_destroy);
>  
> +/**
> + *   devlink_region_shapshot_id_get - get snapshot ID
> + *
> + *   This callback should be called when adding a new snapshot,
> + *   Driver should use the same id for multiple snapshots taken
> + *   on multiple regions at the same time/by the same trigger.
> + *
> + *   @devlink: devlink
> + */
> +u32 devlink_region_shapshot_id_get(struct devlink *devlink)
> +{
> + u32 id;
> +
> + mutex_lock(&devlink->lock);
> + id = ++devlink->snapshot_id;

Any reason not to use an IDA?  The reuse may seem unlikely, OTOH IDA
isn't going to cost much, so why risk it...

> + mutex_unlock(&devlink->lock);
> +
> + return id;
> +}
> +EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);

Sorry for only spotting this now.


[PATCH net-next] TCP: make seq # error messages more readable

2018-07-12 Thread Randy Dunlap
From: Randy Dunlap 

Attempt to make cryptic TCP seq number error messages clearer by
(1) adding the function name, (2) identifying the errors as "seq # bug",
and (3) grouping the field identifiers and values by separating them
with commas.

E.g., the following message is changed from:

recvmsg bug 2: copied 73BCB6CD seq 70F17CBE rcvnxt 73BCB9AA fl 0
WARNING: CPU: 2 PID: 1501 at /linux/net/ipv4/tcp.c:1881 tcp_recvmsg+0x649/0xb90

to:

tcp_recvmsg: TCP recvmsg seq # bug 2: copied 73BCB6CD, seq 70F17CBE, rcvnxt 73BCB9AA, fl 0
WARNING: CPU: 2 PID: 1501 at /linux/net/ipv4/tcp.c:2011 tcp_recvmsg+0x694/0xba0

Suggested-by: 積丹尼 Dan Jacobson 
Signed-off-by: Randy Dunlap 
---
 net/ipv4/tcp.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- linux-next-20180712.orig/net/ipv4/tcp.c
+++ linux-next-20180712/net/ipv4/tcp.c
@@ -1994,9 +1994,9 @@ int tcp_recvmsg(struct sock *sk, struct
 * shouldn't happen.
 */
if (WARN(before(*seq, TCP_SKB_CB(skb)->seq),
-"recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
-*seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
-flags))
+"%s: TCP recvmsg seq # bug: copied %X, seq %X, rcvnxt %X, fl %X\n",
+__func__, *seq,
+TCP_SKB_CB(skb)->seq, tp->rcv_nxt, flags))
break;
 
offset = *seq - TCP_SKB_CB(skb)->seq;
@@ -2009,8 +2009,9 @@ int tcp_recvmsg(struct sock *sk, struct
if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
goto found_fin_ok;
WARN(!(flags & MSG_PEEK),
-"recvmsg bug 2: copied %X seq %X rcvnxt %X fl %X\n",
-*seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt, flags);
+"%s: TCP recvmsg seq # bug 2: copied %X, seq %X, rcvnxt %X, fl %X\n",
+__func__, *seq,
+TCP_SKB_CB(skb)->seq, tp->rcv_nxt, flags);
}
 
/* Well, if we have backlog, try to process it now yet. */



Re: [PATCH v4 net-next 19/19] net/mlx5e: Kconfig, mutually exclude compilation of TLS and IPsec accel

2018-07-12 Thread David Miller
From: Boris Pismenny 
Date: Thu, 12 Jul 2018 22:25:57 +0300

> We currently have no devices that support both TLS and IPsec using the
> accel framework, and the current code does not support both IPsec and
> TLS. This patch prevents such combinations.
> 
> Signed-off-by: Boris Pismenny 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> index 2545296..d3e8c70 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> @@ -93,6 +93,7 @@ config MLX5_EN_TLS
>   depends on TLS_DEVICE
>   depends on TLS=y || MLX5_CORE=m
>   depends on MLX5_ACCEL
> + depends on !MLX5_EN_IPSEC
>   default n

You absolutely cannot do this.

You are forcing a distribution to pick one offload or the other at
build time, that's insane.

Please find a way to support both offloads in the driver.  It is
absolutely valid for a distribution to ship the driver in a state that
supports both offloads and you must therefore support this properly.

Thank you.


Re: [PATCH net-next 5/5 v3] net: gemini: Indicate that we can handle jumboframes

2018-07-12 Thread David Miller
From: Linus Walleij 
Date: Wed, 11 Jul 2018 21:32:45 +0200

> The hardware supposedly handles frames up to 10236 bytes and
> implements .ndo_change_mtu() so accept 10236 minus the ethernet
> header for a VLAN tagged frame on the netdevices. Use
> ETH_MIN_MTU as minimum MTU.
> 
> Signed-off-by: Linus Walleij 

Applied.


Re: [PATCH net-next 1/5 v3] net: gemini: Look up L3 maxlen from table

2018-07-12 Thread David Miller
From: Linus Walleij 
Date: Wed, 11 Jul 2018 21:32:41 +0200

> The code to calculate the hardware register enumerator
> for the maximum L3 length isn't entirely simple to read.
> Use the existing defines and rewrite the function into a
> table look-up.
> 
> Acked-by: Michał Mirosław 
> Signed-off-by: Linus Walleij 

Applied.


Re: [PATCH net-next 4/5 v3] net: gemini: Move main init to port

2018-07-12 Thread David Miller
From: Linus Walleij 
Date: Wed, 11 Jul 2018 21:32:44 +0200

> The initialization sequence for the ethernet, setting up
> interrupt routing and such things, need to be done after
> both the ports are clocked and reset. Before this the
> config will not "take". Move the initialization to the
> port probe function and keep track of init status in
> the state.
> 
> Signed-off-by: Linus Walleij 

Applied.


Re: [PATCH net-next 2/5 v3] net: gemini: Improve connection prints

2018-07-12 Thread David Miller
From: Linus Walleij 
Date: Wed, 11 Jul 2018 21:32:42 +0200

> Switch over to using a module parameter and debug prints
> that can be controlled by this or ethtool like everyone
> else. Depromote all other prints to debug messages.
> 
> The phy_print_status() was already in place, albeit never
> really used because the debuglevel hiding it had to be
> set up using ethtool.
> 
> Signed-off-by: Linus Walleij 

Applied.


Re: [PATCH net-next 3/5 v3] net: gemini: Allow multiple ports to instantiate

2018-07-12 Thread David Miller
From: Linus Walleij 
Date: Wed, 11 Jul 2018 21:32:43 +0200

> The code was not tested with two ports actually in use at
> the same time. (I blame this on lack of actual hardware using
> that feature.) Now after locating a system using both ports,
> add necessary fix to make both ports come up.
> 
> Signed-off-by: Linus Walleij 

Applied.


Re: [PATCH net-next v3 00/11] devlink: Add support for region access

2018-07-12 Thread David Miller
From: Alex Vesker 
Date: Thu, 12 Jul 2018 15:13:07 +0300

> This is a proposal which will allow access to driver defined address
> regions using devlink. Each device can create its supported address
> regions and register them. A device which exposes a region will allow
> access to it using devlink.
> 
> The suggested implementation will allow exposing regions to the user,
> reading and dumping snapshots taken from different regions. 
> A snapshot represents a memory image of a region taken by the driver.
> 
> If a device collects a snapshot of an address region it can be later
> exposed using devlink region read or dump commands.
> This functionality allows for future analyses on the snapshots to be
> done.
> 
> The major benefit of this support is not only to provide access to
> internal address regions which were inaccessible to the user but also
> to provide an additional way to debug complex error states using the
> region snapshots.
 ...

Series applied, thanks!


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread महेश बंडेवार
On Thu, Jul 12, 2018 at 4:14 PM, Michal Soltys  wrote:
> On 2018-07-13 00:03, Jay Vosburgh wrote:
>> Mahesh Bandewar (महेश बंडेवार) wrote:
>>
>>>On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
>>> wrote:
 Michal Soltys  wrote:

>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
>> Mahesh Bandewar (महेश बंडेवार) wrote:
>>
>>> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:

 Hi,

 As weird as that sounds, this is what I observed today after bumping
 kernel version. I have a setup where 2 bonds are attached to linux
 bridge and physically are connected to two switches doing MSTP (and
 linux bridge is just passing them).

 Initially I suspected some changes related to bridge code - but quick
 peek at the code showed nothing suspicious - and the part of it that
 explicitly passes stp frames if stp is not enabled has seen little
 changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
 regular non-bonded interfaces are attached everything works fine.

 Just to be sure I detached the bond (802.3ad mode) and checked it with
 simple tcpdump (ether proto \\stp) - and indeed no hello packets were
 there (with them being present just fine on active enslaved interface,
 or on the bond device in earlier kernels).

 If time permits I'll bisect tommorow to pinpoint the commit, but from
 quick todays test - 4.9.x is working fine, while 4.16.16 (tested on
 debian) and 4.17.3 (tested on archlinux) are failing.

 Unless this is already a known issue (or you have any suggestions what
 could be responsible).

>>> I believe these are link-local-multicast messages and sometime back a
>>> change went into to not pass those frames to the bonding master. This
>>> could be the side effect of that.
>>
>>  Mahesh, I suspect you're thinking of:
>>
>> commit b89f04c61efe3b7756434d693b9203cc0cce002e
>> Author: Chonggang Li 
>> Date:   Sun Apr 16 12:02:18 2017 -0700
>>
>>  bonding: deliver link-local packets with skb->dev set to link that packets arrived on
>>
>>  Michal, are you able to revert this patch and test?
>>
>>  -J
>>
>> ---
>>  -Jay Vosburgh, jay.vosbu...@canonical.com
>>
>
>
>Just tested - yes, reverting that patch solves the issues.

 Chonggang,

 Reading the changelog in your commit referenced above, I'm not
 entirely sure what actual problem it is fixing.  Could you elaborate?

 As the patch appears to cause a regression, it needs to be
 either fixed or reverted.

 Mahesh, you signed-off on it as well, perhaps you also have some
 context?

>>>
>>>I think the original idea behind it was to pass the LLDPDUs to the
>>>stack on the interface that they came on since this is considered to
>>>be link-local traffic and passing it to the bond-master would lose its
>>>"linklocal-ness". This is true for LLDP and if you change the skb->dev
>>>of the packet, then you don't know which slave link it came on in
>>>(from LLDP consumer's perspective).
>>>
>>>I don't know much about STP but trunking two links and aggregating
>>>this link info through bond-master seems wrong. Just like LLDP, you
>>>are losing info specific to a link and the decision derived from that
>>>info could be wrong.
>>>
>>>Having said that, we determine "linklocal-ness" by looking at L2 and
>>>bondmaster shares this with its slaves. So it does seem fair to pass
>>>those frames to the bonding-master but at the same time link-local
>>>traffic is supposed to be limited to the physical link (LLDP/STP/LACP
>>>etc). Your thoughts?
>>
>>   I agree the whole thing sounds kind of weird, but I'm curious as
>> to what Michal's actual use case is; he presumably has some practical
>> use for this, since he noticed that the behavior changed.
>>
>
> The whole "link-local" term is a bit I don't know - at this point it
> feels like too many things were thrown into single bag and it got
> somewhat confusing (bpdu, lldp, pause frames, lacp, pae, qinq multicast
> that afaik has its own address) - I added some examples in another reply
> I did at the same time as you were typing this one =)
>
>>   Michal, you mentioned MSTP and using 802.3ad (LACP) mode; how
>> does that combination work rationally given that the bond might send and
>> receive traffic across multiple slaves?  Or does the switch side bundle
>> the ports together into a single logical interface for MSTP purposes?
>> On the TX side, I think the bond will likely balance all STP frames to
>> just one slave.
>>
>
> The basic concept - two "main" switches with "important" machines
> connected to those. One switch dies and everything keeps working. With
> no unused ports and so on.
>
> In 



Re: [PATCH] liquidio: Use %pad printk format for dma_addr_t values

2018-07-12 Thread Felix Manlunas
On Thu, Jul 12, 2018 at 10:36:29PM +0200, Helge Deller wrote:
> Use the existing %pad printk format to print dma_addr_t values.
> This avoids the following warnings when compiling on the parisc platform:
> 
> warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]
> 
> Signed-off-by: Helge Deller 
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c
> index 1f2e75da28f8..d5d9e47daa4b 100644
> --- a/drivers/net/ethernet/cavium/liquidio/request_manager.c
> +++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c
> @@ -110,8 +110,8 @@ int octeon_init_instr_queue(struct octeon_device *oct,
> 
>   memset(iq->request_list, 0, sizeof(*iq->request_list) * num_descs);
> 
> - dev_dbg(&oct->pci_dev->dev, "IQ[%d]: base: %p basedma: %llx count: %d\n",
> - iq_no, iq->base_addr, iq->base_addr_dma, iq->max_count);
> + dev_dbg(&oct->pci_dev->dev, "IQ[%d]: base: %p basedma: %pad count: %d\n",
> + iq_no, iq->base_addr, &iq->base_addr_dma, iq->max_count);
> 
>   iq->txpciq.u64 = txpciq.u64;
>   iq->fill_threshold = (u32)conf->db_min;

Acked-by: Felix Manlunas 


Re: [PATCH net-next] net: gro: properly remove skb from list

2018-07-12 Thread David Miller
From: Prashant Bhole 
Date: Thu, 12 Jul 2018 16:24:59 +0900

> Following crash occurs in validate_xmit_skb_list() when same skb is
> iterated multiple times in the loop and consume_skb() is called.
> 
> The root cause is calling list_del_init(&skb->list) and not clearing
> skb->next in d4546c2509b1. list_del_init(&skb->list) sets skb->next
> to point to skb itself. skb->next needs to be cleared because other
> parts of the network stack use a different kind of SKB list.
> validate_xmit_skb_list() uses such a list.
> 
> A similar type of bugfix was reported by Jesper Dangaard Brouer.
> https://patchwork.ozlabs.org/patch/942541/
> 
> This patch clears skb->next and changes list_del_init() to list_del()
> so that list->prev will maintain the list poison.
 ...
> Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.")
> Signed-off-by: Prashant Bhole 
> Reported-by: Tyler Hicks 

Applied, thank you.

Hopefully we can convert more layers to list_head SKB usage, and
thus no longer need hacks like this.

Thanks.


Re: [PATCH net] nsh: set mac len based on inner packet

2018-07-12 Thread David Miller
From: Willem de Bruijn 
Date: Wed, 11 Jul 2018 12:00:44 -0400

> From: Willem de Bruijn 
> 
> When pulling the NSH header in nsh_gso_segment, set the mac length
> based on the encapsulated packet type.
> 
> skb_reset_mac_len computes an offset to the network header, which
> here still points to the outer packet:
> 
>   > skb_reset_network_header(skb);
>   > [...]
>   > __skb_pull(skb, nsh_len);
>   > skb_reset_mac_header(skb);// now mac hdr starts nsh_len == 8B after net hdr
>   > skb_reset_mac_len(skb);   // mac len = net hdr - mac hdr == (u16) -8 == 65528
>   > [..]
>   > skb_mac_gso_segment(skb, ..)
> 
> Link: http://lkml.kernel.org/r/CAF=yd-keactson4axiraxl8m7qas8gbbe1w09eziywvpbbu...@mail.gmail.com
> Reported-by: syzbot+7b9ed9872dab8c323...@syzkaller.appspotmail.com
> Fixes: c411ed854584 ("nsh: add GSO support")
> Signed-off-by: Willem de Bruijn 

Applied and queued up for -stable.


Re: [PATCH net] packet: reset network header if packet shorter than ll reserved space

2018-07-12 Thread David Miller
From: Willem de Bruijn 
Date: Wed, 11 Jul 2018 12:00:45 -0400

> From: Willem de Bruijn 
> 
> If variable length link layer headers result in a packet shorter
> than dev->hard_header_len, reset the network header offset. Else
> skb->mac_len may exceed skb->len after skb_mac_reset_len.
> 
> packet_sendmsg_spkt already has similar logic.
> 
> Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation")
> Signed-off-by: Willem de Bruijn 

Applied and queued up for -stable.


Re: [PATCH net] selftests: in udpgso_bench do not test udp zerocopy

2018-07-12 Thread David Miller
From: Willem de Bruijn 
Date: Wed, 11 Jul 2018 12:00:46 -0400

> From: Willem de Bruijn 
> 
> The udpgso benchmark compares various configurations of UDP and TCP.
> Including one that is not upstream, udp zerocopy. This is a leftover
> from the earlier RFC patchset.
> 
> The test is part of kselftests and run in continuous spinners. Remove
> the failing case to make the test start passing.
> 
> Fixes: 3a687bef148d ("selftests: udp gso benchmark")
> Reported-by: Naresh Kamboju 
> Signed-off-by: Willem de Bruijn 

Applied.


Re: [PATCH net-next 00/10] s390/qeth: updates 2018-07-11

2018-07-12 Thread David Miller
From: Julian Wiedmann 
Date: Wed, 11 Jul 2018 17:42:37 +0200

> please apply this first batch of qeth patches for net-next. It brings the
> usual cleanups, and some performance improvements to the transmit paths.

Series applied, thank you.


Re: [net-next PATCH] net: ipv4: fix listify ip_rcv_finish in case of forwarding

2018-07-12 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Wed, 11 Jul 2018 17:01:20 +0200

> In commit 5fa12739a53d ("net: ipv4: listify ip_rcv_finish") calling
> dst_input(skb) was split-out.  The ip_sublist_rcv_finish() just calls
> dst_input(skb) in a loop.
> 
> The problem is that ip_sublist_rcv_finish() forgot to remove the SKB
> from the list before invoking dst_input().  Further more we need to
> clear skb->next as other parts of the network stack use another kind
> of SKB lists for xmit_more (see dev_hard_start_xmit).
> 
> A crash occurs if e.g. dst_input() invoke ip_forward(), which calls
> dst_output()/ip_output() that eventually calls __dev_queue_xmit() +
> sch_direct_xmit(), and a crash occurs in validate_xmit_skb_list().
> 
> This patch only fixes the crash, but there is a huge potential for
> a performance boost if we can pass an SKB-list through to ip_forward.
> 
> Fixes: 5fa12739a53d ("net: ipv4: listify ip_rcv_finish")
> Signed-off-by: Jesper Dangaard Brouer 
> ---
> Only driver sfc actually uses this, but I don't have this NIC, so I
> tested this on mlx5, with my own changes to make it use 
> netif_receive_skb_list(),
> but I'm not ready to upstream the mlx5 driver change yet.

Applied, thanks Jesper.

This whole:

list_del(&skb->list);
skb->next = NULL;

business is exactly the kind of dragons I was worried about when starting
to use list_head with SKBs.

There is a similar fix wrt. the GRO stuff that I'm about to apply as well.

It definitely is better if we don't have to forcefully hand off NULL
->next pointers like this in the long term.


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Michal Soltys
On 2018-07-13 00:03, Jay Vosburgh wrote:
> Mahesh Bandewar (महेश बंडेवार) wrote:
> 
>>On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
>> wrote:
>>> Michal Soltys  wrote:
>>>
On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
> Mahesh Bandewar (महेश बंडेवार) wrote:
>
>> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>>>
>>> Hi,
>>>
>>> As weird as that sounds, this is what I observed today after bumping
>>> kernel version. I have a setup where 2 bonds are attached to linux
>>> bridge and physically are connected to two switches doing MSTP (and
>>> linux bridge is just passing them).
>>>
>>> Initially I suspected some changes related to bridge code - but quick
>>> peek at the code showed nothing suspicious - and the part of it that
>>> explicitly passes stp frames if stp is not enabled has seen little
>>> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
>>> regular non-bonded interfaces are attached everything works fine.
>>>
>>> Just to be sure I detached the bond (802.3ad mode) and checked it with
>>> simple tcpdump (ether proto \\stp) - and indeed no hello packets were
>>> there (with them being present just fine on active enslaved interface,
>>> or on the bond device in earlier kernels).
>>>
>>> If time permits I'll bisect tommorow to pinpoint the commit, but from
>>> quick todays test - 4.9.x is working fine, while 4.16.16 (tested on
>>> debian) and 4.17.3 (tested on archlinux) are failing.
>>>
>>> Unless this is already a known issue (or you have any suggestions what
>>> could be responsible).
>>>
>> I believe these are link-local-multicast messages and sometime back a
>> change went into to not pass those frames to the bonding master. This
>> could be the side effect of that.
>
>  Mahesh, I suspect you're thinking of:
>
> commit b89f04c61efe3b7756434d693b9203cc0cce002e
> Author: Chonggang Li 
> Date:   Sun Apr 16 12:02:18 2017 -0700
>
>  bonding: deliver link-local packets with skb->dev set to link that packets arrived on
>
>  Michal, are you able to revert this patch and test?
>
>  -J
>
> ---
>  -Jay Vosburgh, jay.vosbu...@canonical.com
>


Just tested - yes, reverting that patch solves the issues.
>>>
>>> Chonggang,
>>>
>>> Reading the changelog in your commit referenced above, I'm not
>>> entirely sure what actual problem it is fixing.  Could you elaborate?
>>>
>>> As the patch appears to cause a regression, it needs to be
>>> either fixed or reverted.
>>>
>>> Mahesh, you signed-off on it as well, perhaps you also have some
>>> context?
>>>
>>
>>I think the original idea behind it was to pass the LLDPDUs to the
>>stack on the interface that they came on since this is considered to
>>be link-local traffic and passing to bond-master would lose its
>>"linklocal-ness". This is true for LLDP and if you change the skb->dev
>>of the packet, then you don't know which slave link it came on in
>>(from LLDP consumer's perspective).
>>
>>I don't know much about STP but trunking two links and aggregating
>>this link info through bond-master seems wrong. Just like LLDP, you
>>are losing info specific to a link and the decision derived from that
>>info could be wrong.
>>
>>Having said that, we determine "linklocal-ness" by looking at L2 and
>>bondmaster shares this with its slaves. So it does seem fair to pass
>>those frames to the bonding-master but at the same time link-local
>>traffic is supposed to be limited to the physical link (LLDP/STP/LACP
>>etc). Your thoughts?
> 
>   I agree the whole thing sounds kind of weird, but I'm curious as
> to what Michal's actual use case is; he presumably has some practical
> use for this, since he noticed that the behavior changed.
> 

The whole "link-local" term is a bit I don't know - at this point it
feels like too many things were thrown into single bag and it got
somewhat confusing (bpdu, lldp, pause frames, lacp, pae, qinq multicast
that afaik has its own address) - I added some examples in another reply
I did at the same time as you were typing this one =)

>   Michal, you mentioned MSTP and using 802.3ad (LACP) mode; how
> does that combination work rationally given that the bond might send and
> receive traffic across multiple slaves?  Or does the switch side bundle
> the ports together into a single logical interface for MSTP purposes?
> On the TX side, I think the bond will likely balance all STP frames to
> just one slave.
> 

The basic concept - two "main" switches with "important" machines
connected to those. One switch dies and everything keeps working. With
no unused ports and so on.

In more details:

Originally I was trying an MSTP daemon (on the "important" machines),
which seems quite complete and well coded, but cannot really work
correctly - as afaik you can't 


Re: [PATCH v2 net-next 7/9] lan743x: Add EEE support

2018-07-12 Thread Andrew Lunn
> +static int lan743x_ethtool_set_eee(struct net_device *netdev,
> +struct ethtool_eee *eee)
> +{
> + struct lan743x_adapter *adapter = netdev_priv(netdev);
> + struct phy_device *phydev = NULL;
> + u32 buf = 0;
> + int ret = 0;
> +
> + if (!netdev)
> + return -EINVAL;
> + adapter = netdev_priv(netdev);
> + if (!adapter)
> + return -EINVAL;
> + phydev = netdev->phydev;
> + if (!phydev)
> + return -EIO;
> + if (!phydev->drv) {
> + netif_err(adapter, drv, adapter->netdev,
> +   "Missing PHY Driver\n");
> + return -EIO;
> + }
> +
> + if (eee->eee_enabled) {
> + ret = phy_init_eee(phydev, 0);
> + if (ret) {
> + netif_err(adapter, drv, adapter->netdev,
> +   "EEE initialization failed\n");
> + return ret;
> + }
> +
> + buf = lan743x_csr_read(adapter, MAC_CR);
> + buf |= MAC_CR_EEE_EN_;
> + lan743x_csr_write(adapter, MAC_CR, buf);
> +
> + phy_ethtool_set_eee(phydev, eee);
> +
> + buf = (u32)eee->tx_lpi_timer;
> + lan743x_csr_write(adapter, MAC_EEE_TX_LPI_REQ_DLY_CNT, buf);
> + netif_info(adapter, drv, adapter->netdev, "Enabled EEE\n");
> + } else {
> + buf = lan743x_csr_read(adapter, MAC_CR);
> + buf &= ~MAC_CR_EEE_EN_;
> + lan743x_csr_write(adapter, MAC_CR, buf);
> + netif_info(adapter, drv, adapter->netdev, "Disabled EEE\n");
> + }
> +

Hi Bryan

You should call phy_ethtool_set_eee() in both cases, so that it gets
disabled in the PHY as well. It needs to stop advertising it.

   Andrew



Re: [PATCH v2 net-next 6/9] lan743x: Add power management support

2018-07-12 Thread Andrew Lunn
> +#ifdef CONFIG_PM
> +static void lan743x_ethtool_get_wol(struct net_device *netdev,
> + struct ethtool_wolinfo *wol)
> +{
> + struct lan743x_adapter *adapter = netdev_priv(netdev);
> +
> + wol->supported = WAKE_BCAST | WAKE_UCAST | WAKE_MCAST |
> + WAKE_MAGIC | WAKE_PHY | WAKE_ARP;
> +
> + wol->wolopts = adapter->wolopts;
> +}
> +#endif /* CONFIG_PM */
> +
> +#ifdef CONFIG_PM
> +static int lan743x_ethtool_set_wol(struct net_device *netdev,
> +struct ethtool_wolinfo *wol)
> +{
> + struct lan743x_adapter *adapter = netdev_priv(netdev);
> +
> + if (wol->wolopts & WAKE_MAGICSECURE)
> + return -EOPNOTSUPP;
> +
> + adapter->wolopts = 0;
> + if (wol->wolopts & WAKE_UCAST)
> + adapter->wolopts |= WAKE_UCAST;
> + if (wol->wolopts & WAKE_MCAST)
> + adapter->wolopts |= WAKE_MCAST;
> + if (wol->wolopts & WAKE_BCAST)
> + adapter->wolopts |= WAKE_BCAST;
> + if (wol->wolopts & WAKE_MAGIC)
> + adapter->wolopts |= WAKE_MAGIC;
> + if (wol->wolopts & WAKE_PHY)
> + adapter->wolopts |= WAKE_PHY;
> + if (wol->wolopts & WAKE_ARP)
> + adapter->wolopts |= WAKE_ARP;
> +
> + device_set_wakeup_enable(&adapter->pdev->dev, (bool)wol->wolopts);
> +
> + phy_ethtool_set_wol(netdev->phydev, wol);

Hi Bryan

This seems asymmetric. set_wol you call into the phylib to enable wol
in the PHY. But get_wol does not call into phylib. So the phy has no
chance to set what it supports.

   Andrew


Re: [PATCH v2 net-next 5/9] lan743x: Add support for ethtool eeprom access

2018-07-12 Thread Andrew Lunn
On Thu, Jul 12, 2018 at 03:05:02PM -0400, Bryan Whitehead wrote:
> Implement ethtool eeprom access
> Also provides access to OTP (One Time Programming)
> 
> Signed-off-by: Bryan Whitehead 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH v2 net-next 4/9] lan743x: Add support for ethtool message level

2018-07-12 Thread Andrew Lunn
On Thu, Jul 12, 2018 at 03:05:01PM -0400, Bryan Whitehead wrote:
> Implement ethtool message level
> 
> Signed-off-by: Bryan Whitehead 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH v2 net-next 3/9] lan743x: Add support for ethtool statistics

2018-07-12 Thread Andrew Lunn
On Thu, Jul 12, 2018 at 03:05:00PM -0400, Bryan Whitehead wrote:
> Implement ethtool statistics
> 
> Signed-off-by: Bryan Whitehead 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH v2 net-next 2/9] lan743x: Add support for ethtool link settings

2018-07-12 Thread Andrew Lunn
On Thu, Jul 12, 2018 at 03:04:59PM -0400, Bryan Whitehead wrote:
> Use default link setting functions
> 
> Signed-off-by: Bryan Whitehead 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH v2 net-next 1/9] lan743x: Add support for ethtool get_drvinfo

2018-07-12 Thread Andrew Lunn
On Thu, Jul 12, 2018 at 03:04:58PM -0400, Bryan Whitehead wrote:
> Implement ethtool get_drvinfo
> 
> Signed-off-by: Bryan Whitehead 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net] net: Don't copy pfmemalloc flag in __copy_skb_header()

2018-07-12 Thread David Miller
From: Stefano Brivio 
Date: Wed, 11 Jul 2018 14:39:42 +0200

> The pfmemalloc flag indicates that the skb was allocated from
> the PFMEMALLOC reserves, and the flag is currently copied on skb
> copy and clone.
> 
> However, an skb copied from an skb flagged with pfmemalloc
> wasn't necessarily allocated from PFMEMALLOC reserves, and on
> the other hand an skb allocated that way might be copied from an
> skb that wasn't.
> 
> So we should not copy the flag on skb copy, and rather decide
> whether to allow an skb to be associated with sockets unrelated
> to page reclaim depending only on how it was allocated.
> 
> Move the pfmemalloc flag before headers_start[0] using an
> existing 1-bit hole, so that __copy_skb_header() doesn't copy
> it.
> 
> When cloning, we'll now take care of this flag explicitly,
> contravening to the warning comment of __skb_clone().
> 
> While at it, restore the newline usage introduced by commit
> b19372273164 ("net: reorganize sk_buff for faster
> __copy_skb_header()") to visually separate bytes used in
> bitfields after headers_start[0], that was gone after commit
> a9e419dc7be6 ("netfilter: merge ctinfo into nfct pointer storage
> area"), and describe the pfmemalloc flag in the kernel-doc
> structure comment.
> 
> This doesn't change the size of sk_buff or cacheline boundaries,
> but consolidates the 15 bits hole before tc_index into a 2 bytes
> hole before csum, that could now be filled more easily.
> 
> Reported-by: Patrick Talbert 
> Fixes: c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves")
> Signed-off-by: Stefano Brivio 

Applied and queued up for -stable, thank you.


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Michal Soltys
On 2018-07-12 23:26, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
>  wrote:
>> Michal Soltys  wrote:
>>
>>>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
 Mahesh Bandewar (महेश बंडेवार) wrote:

> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>>
>> Hi,
>>
>> As weird as that sounds, this is what I observed today after bumping
>> kernel version. I have a setup where 2 bonds are attached to linux
>> bridge and physically are connected to two switches doing MSTP (and
>> linux bridge is just passing them).
>>
>> Initially I suspected some changes related to bridge code - but quick
>> peek at the code showed nothing suspicious - and the part of it that
>> explicitly passes stp frames if stp is not enabled has seen little
>> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
>> regular non-bonded interfaces are attached everything works fine.
>>
>> Just to be sure I detached the bond (802.3ad mode) and checked it with
>> simple tcpdump (ether proto \\stp) - and indeed no hello packets were
>> there (with them being present just fine on active enslaved interface,
>> or on the bond device in earlier kernels).
>>
>> If time permits I'll bisect tommorow to pinpoint the commit, but from
>> quick todays test - 4.9.x is working fine, while 4.16.16 (tested on
>> debian) and 4.17.3 (tested on archlinux) are failing.
>>
>> Unless this is already a known issue (or you have any suggestions what
>> could be responsible).
>>
> I believe these are link-local-multicast messages and sometime back a
> change went into to not pass those frames to the bonding master. This
> could be the side effect of that.

  Mahesh, I suspect you're thinking of:

 commit b89f04c61efe3b7756434d693b9203cc0cce002e
 Author: Chonggang Li 
 Date:   Sun Apr 16 12:02:18 2017 -0700

 bonding: deliver link-local packets with skb->dev set to link that packets arrived on

  Michal, are you able to revert this patch and test?

  -J

 ---
  -Jay Vosburgh, jay.vosbu...@canonical.com

>>>
>>>
>>>Just tested - yes, reverting that patch solves the issues.
>>
>> Chonggang,
>>
>> Reading the changelog in your commit referenced above, I'm not
>> entirely sure what actual problem it is fixing.  Could you elaborate?
>>
>> As the patch appears to cause a regression, it needs to be
>> either fixed or reverted.
>>
>> Mahesh, you signed-off on it as well, perhaps you also have some
>> context?
>>
> 
> I think the original idea behind it was to pass the LLDPDUs to the
> stack on the interface that they came on since this is considered to
> be link-local traffic and passing to bond-master would lose its
> "linklocal-ness". This is true for LLDP and if you change the skb->dev
> of the packet, then you don't know which slave link it came on in
> (from LLDP consumer's perspective).
> 
> I don't know much about STP but trunking two links and aggregating
> this link info through bond-master seems wrong. Just like LLDP, you
> are losing info specific to a link and the decision derived from that
> info could be wrong.
> 
> Having said that, we determine "linklocal-ness" by looking at L2 and
> bondmaster shares this with its slaves. So it does seem fair to pass
> those frames to the bonding-master but at the same time link-local
> traffic is supposed to be limited to the physical link (LLDP/STP/LACP
> etc). Your thoughts?
> 

But, isn't bond de-facto considered the "physical link" ? Not directly
of course, but say an LLDP daemon would likely be more interested in
getting LLDP data from a bond device (or a bridge device, if the bond is
attached to one), than from its enslaved interfaces (and enslaved
interfaces can be changed, not mentioning potentially complex setup
itself, even if usually it's just lacp ).

IOW, blocking link-local multicasts on bond level (among those - bpdu,
pae, lldp) is a bit like if the interface itself hid LACP before bond code.

A few other examples:

- putting bonds in a bridge is pretty normal thing - and whether the
bridge interpretes the spanning tree data itself (via in-kernel classic
stp or userspace daemon for e.g. rstp) or passes the trafic, it must see
the BPDU frames. Otherwise it becomes blind to the whole spanning tree
protocol - and implicitly other switches around - real or virtual ones.
It's literally instant loop disaster. br_input.c specifically takes care
to pass those frames if the bridge has stp turned off

- "group_fwd_mask" (again in bridge context) has been added to bridge
code - and recently as a per-port knob as well - to specifically allow
the control of what kind of "link-local" stuff is passed or not. LLDP
and 802.1X PAE were, afaik, the main reasons for that sysfs variable.
The per-port setting is even more 

Re: [PATCH mlx5-next v1 2/8] net/mlx5: Add support for flow table destination number

2018-07-12 Thread Jason Gunthorpe
On Fri, Jul 13, 2018 at 12:51:10AM +0300, Or Gerlitz wrote:
> On Fri, Jul 13, 2018 at 12:26 AM, Jason Gunthorpe  wrote:
> > On Fri, Jul 13, 2018 at 12:00:41AM +0300, Or Gerlitz wrote:
> >> On Wed, Jul 11, 2018 at 2:10 PM, Leon Romanovsky  wrote:
> >> > From: Yishai Hadas 
> >> >
> >> > Add support to set a destination from a flow table number.
> >> > This functionality will be used in downstream patches from this
> >> > series by the DEVX stuff.
> >>
> >> Reading your cover letter, I still don't understand what is missing
> >> in the current mlx5 fs core API for your needs. After all, you do
> >> create flow tables from the IB driver through fs core calls, right?
> >> so @ the end of the day, you have the FT pointer to provide the
> >> core, why you need the FT number?
> >
> > Via the devx API userspace can create flow tables directly without
> > going to the driver's flow steering core.
> 
> so why you change the core?

User space flow tables don't get any traffic until they are linked
into the main steering. The only ID the kernel gets for them when
adding this link is the actual PRM handle, not a pointer - hence the
change.

Jason


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Jay Vosburgh
Mahesh Bandewar (महेश बंडेवार) wrote:

>On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
> wrote:
>> Michal Soltys  wrote:
>>
>>>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
 Mahesh Bandewar (महेश बंडेवार) wrote:

> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>>
>> Hi,
>>
>> As weird as that sounds, this is what I observed today after bumping
>> kernel version. I have a setup where 2 bonds are attached to linux
>> bridge and physically are connected to two switches doing MSTP (and
>> linux bridge is just passing them).
>>
>> Initially I suspected some changes related to bridge code - but quick
>> peek at the code showed nothing suspicious - and the part of it that
>> explicitly passes stp frames if stp is not enabled has seen little
>> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
>> regular non-bonded interfaces are attached everything works fine.
>>
>> Just to be sure I detached the bond (802.3ad mode) and checked it with
>> simple tcpdump (ether proto \\stp) - and indeed no hello packets were
>> there (with them being present just fine on active enslaved interface,
>> or on the bond device in earlier kernels).
>>
>> If time permits I'll bisect tommorow to pinpoint the commit, but from
>> quick todays test - 4.9.x is working fine, while 4.16.16 (tested on
>> debian) and 4.17.3 (tested on archlinux) are failing.
>>
>> Unless this is already a known issue (or you have any suggestions what
>> could be responsible).
>>
> I believe these are link-local-multicast messages and sometime back a
> change went into to not pass those frames to the bonding master. This
> could be the side effect of that.

  Mahesh, I suspect you're thinking of:

 commit b89f04c61efe3b7756434d693b9203cc0cce002e
 Author: Chonggang Li 
 Date:   Sun Apr 16 12:02:18 2017 -0700

 bonding: deliver link-local packets with skb->dev set to link that packets arrived on

  Michal, are you able to revert this patch and test?

  -J

 ---
  -Jay Vosburgh, jay.vosbu...@canonical.com

>>>
>>>
>>>Just tested - yes, reverting that patch solves the issues.
>>
>> Chonggang,
>>
>> Reading the changelog in your commit referenced above, I'm not
>> entirely sure what actual problem it is fixing.  Could you elaborate?
>>
>> As the patch appears to cause a regression, it needs to be
>> either fixed or reverted.
>>
>> Mahesh, you signed-off on it as well, perhaps you also have some
>> context?
>>
>
>I think the original idea behind it was to pass the LLDPDUs to the
>stack on the interface that they came on since this is considered to
>be link-local traffic and passing to bond-master would lose its
>"linklocal-ness". This is true for LLDP and if you change the skb->dev
>of the packet, then you don't know which slave link it came on in
>(from LLDP consumer's perspective).
>
>I don't know much about STP but trunking two links and aggregating
>this link info through bond-master seems wrong. Just like LLDP, you
>are losing info specific to a link and the decision derived from that
>info could be wrong.
>
>Having said that, we determine "linklocal-ness" by looking at L2 and
>bondmaster shares this with its slaves. So it does seem fair to pass
>those frames to the bonding-master but at the same time link-local
>traffic is supposed to be limited to the physical link (LLDP/STP/LACP
>etc). Your thoughts?

I agree the whole thing sounds kind of weird, but I'm curious as
to what Michal's actual use case is; he presumably has some practical
use for this, since he noticed that the behavior changed.

Michal, you mentioned MSTP and using 802.3ad (LACP) mode; how
does that combination work rationally given that the bond might send and
receive traffic across multiple slaves?  Or does the switch side bundle
the ports together into a single logical interface for MSTP purposes?
On the TX side, I think the bond will likely balance all STP frames to
just one slave.

As for a resolution, presuming that Michal has some reasonable
use case, I'm thinking along the lines of reverting the new (leave frame
attached to slave) behavior for the general case and adding a special
case for LLDP and friends to get the new behavior.  I'd like to avoid
adding any new options to bonding.

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [PATCH iproute2-next v2] net:sched: add action inheritdsfield to skbedit

2018-07-12 Thread Stephen Hemminger
On Thu, 12 Jul 2018 12:09:26 -0400
Qiaobin Fu  wrote:

> + if (*flags & SKBEDIT_F_INHERITDSFIELD)
> + print_string(PRINT_ANY, "inheritdsfield", " %s",
> +  "inheritdsfield");

Flags should be represented in JSON output as a null value (or boolean):

	print_null(PRINT_ANY, "inheritdsfield", " %s", "inheritdsfield");

This will generate:
	"inheritdsfield" : null,
Instead of:
	"inheritdsfield" : "inheritdsfield",



Re: [PATCH net-next v2 0/2] net/sched: act_skbedit: lockless data path

2018-07-12 Thread David Miller
From: Davide Caratti 
Date: Wed, 11 Jul 2018 16:04:48 +0200

> the data path of act_skbedit can be faster if we avoid using spinlocks:
>  - patch 1 converts act_skbedit statistics to use per-cpu counters
>  - patch 2 lets act_skbedit use RCU to read/update its configuration 
> 
> test procedure (using pktgen from https://github.com/netoptimizer):
> 
>  # ip link add name eth1 type dummy
>  # ip link set dev eth1 up
>  # tc qdisc add dev eth1 clsact
>  # tc filter add dev eth1 egress matchall action skbedit priority c1a0:c1a0
>  # for c in 1 2 4 ; do
>  > ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $c -n 500 -i eth1
>  > done
> 
> test results (avg. pps/thread)
> 
>   $c | before patch |  after patch | improvement
>  ----+--------------+--------------+------------
>    1 | 3917464 ± 3% | 4000458 ± 3% |  irrelevant
>    2 | 3455367 ± 4% | 3953076 ± 1% |        +14%
>    4 | 2496594 ± 2% | 3801123 ± 3% |        +52%
> 
> v2: rebased on latest net-next

Series applied, thank you.


Re: [PATCH net 0/2] sfc: filter locking fixes

2018-07-12 Thread David Miller
From: Bert Kenward 
Date: Wed, 11 Jul 2018 11:39:39 +0100

> Two fixes for sfc ef10 filter table locking. Initially spotted
> by lockdep, but one issue has also been seen in normal use.

Series applied, thanks.


Re: [PATCH mlx5-next v1 2/8] net/mlx5: Add support for flow table destination number

2018-07-12 Thread Or Gerlitz
On Fri, Jul 13, 2018 at 12:26 AM, Jason Gunthorpe  wrote:
> On Fri, Jul 13, 2018 at 12:00:41AM +0300, Or Gerlitz wrote:
>> On Wed, Jul 11, 2018 at 2:10 PM, Leon Romanovsky  wrote:
>> > From: Yishai Hadas 
>> >
>> > Add support to set a destination from a flow table number.
>> > This functionality will be used in downstream patches from this
>> > series by the DEVX stuff.
>>
>> Reading your cover letter, I still don't understand what is missing
>> in the current mlx5 fs core API for your needs. After all, you do
>> create flow tables from the IB driver through fs core calls, right?
>> so @ the end of the day, you have the FT pointer to provide the
>> core, why you need the FT number?
>
> Via the devx API userspace can create flow tables directly without
> going to the driver's flow steering core.

so why you change the core?


[PATCH net] net/ipv6: Do not allow device only routes via the multipath API

2018-07-12 Thread dsahern
From: David Ahern 

Eric reported that reverting the patch that fixed and simplified IPv6
multipath routes means reverting back to invalid userspace notifications.
eg.,
$ ip -6 route add 2001:db8:1::/64 nexthop dev eth0 nexthop dev eth1

only generates a single notification:
2001:db8:1::/64 dev eth0 metric 1024 pref medium

While working on a fix for this problem I found another case that is just
broken completely - a multipath route with a gateway followed by device
followed by gateway:
$ ip -6 ro add 2001:db8:103::/64
  nexthop via 2001:db8:1::64
  nexthop dev dummy2
  nexthop via 2001:db8:3::64

In this case the device only route is dropped completely - no notification
to userpsace but no addition to the FIB either:

$ ip -6 ro ls
2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
2001:db8:2::/64 dev dummy2 proto kernel metric 256 pref medium
2001:db8:3::/64 dev dummy3 proto kernel metric 256 pref medium
2001:db8:103::/64 metric 1024
nexthop via 2001:db8:1::64 dev dummy1 weight 1
nexthop via 2001:db8:3::64 dev dummy3 weight 1 pref medium
fe80::/64 dev dummy1 proto kernel metric 256 pref medium
fe80::/64 dev dummy2 proto kernel metric 256 pref medium
fe80::/64 dev dummy3 proto kernel metric 256 pref medium

Really, IPv6 multipath is just FUBAR'ed beyond repair when it comes to
device only routes, so do not allow it at all.

This change will break any scripts relying on the mpath api for insert,
but I don't see any other way to handle the permutations. Besides, since
the routes are added to the FIB as standalone (non-multipath) routes the
kernel is not doing what the user requested, so it might as well tell the
user that.

Reported-by: Eric Dumazet 
Signed-off-by: David Ahern 
---
 net/ipv6/route.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 63f99411f0de..1f1f0f318d74 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4388,6 +4388,13 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
 			rt = NULL;
 			goto cleanup;
 		}
+		if (!rt6_qualify_for_ecmp(rt)) {
+			err = -EINVAL;
+			NL_SET_ERR_MSG(extack,
+				       "Device only routes can not be added for IPv6 using the multipath API.");
+			fib6_info_release(rt);
+			goto cleanup;
+		}
 
 		rt->fib6_nh.nh_weight = rtnh->rtnh_hops + 1;
 
-- 
2.11.0



Re: [PATCH 00/14] ARM BPF jit compiler improvements

2018-07-12 Thread Russell King - ARM Linux
On Thu, Jul 12, 2018 at 11:12:45PM +0200, Daniel Borkmann wrote:
> On 07/12/2018 11:02 PM, Russell King - ARM Linux wrote:
> > On Thu, Jul 12, 2018 at 09:02:41PM +0200, Daniel Borkmann wrote:
> >> Applied to bpf-next, thanks a lot Russell!
> > 
> > Thanks, I've just sent four more patches, which is the sum total of
> > what I'm intending to send for BPF improvements for the next merge
> > window.
> 
> Great, thanks a lot for the batch of improvements, Russell!
> 
> Did you manage to get the BPF kselftest suite working on arm32 under
> tools/testing/selftests/bpf/? In particular the test_verfier with
> bpf_jit_enabled set to 1 and test_kmod.sh has a bigger number of
> runtime tests that would stress it.

I have a big issue with almost all of the tools/ subdirectory, and
that is that it isn't "portable".

It seems that cross-build environments just weren't considered when
the tools subdirectory was created - it appears to require the entire
kernel tree and build tree to be accessible on the target in order
to build almost everything there.  (I also exclusively do split-object
builds, I never do an in-source-tree build.)

At least perf has the ability to ask Kbuild to package it up as a
tar.* file.  That can be easily transported to the target as a
self-contained buildable tree, and then be able to built from that.

My cross-build environment for the kernel is just for building
kernels, it does not have the facilities to build for userspace - I
have a wide range of userspaces across targets, with a multitude of
different glibc versions, and even when they're compatible versions,
they're built differently.

As far as I can see, basically, most tools/ stuff requires too much
effort to work around this to be of any use to me.  Even if I did
unpick it from the kernel source tree by hand, that would be wasted
effort, because I'd need to repeat that same process whenever
anything there gets updated.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: BUG: MAX_LOCK_DEPTH too low! (2)

2018-07-12 Thread syzbot

syzbot has found a reproducer for the following crash on:

HEAD commit:6e6fddc78323 bpf: fix panic due to oob in bpf_prog_test_ru..
git tree:   bpf
console output: https://syzkaller.appspot.com/x/log.txt?x=1364db9440
kernel config:  https://syzkaller.appspot.com/x/.config?x=2ca6c7a31d407f86
dashboard link: https://syzkaller.appspot.com/bug?extid=802a5abb8abae86eb6de
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1157279440
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16aff56840

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+802a5abb8abae86eb...@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by syz-executor169/4820:
 #0: (ptrval) (rcu_read_lock_bh){}, at:  
__dev_queue_xmit+0x328/0x3910 net/core/dev.c:3503
 #1: (ptrval) (&(&q->seqlock)->rlock){+...}, at: spin_trylock  
include/linux/spinlock.h:320 [inline]
 #1: (ptrval) (&(&q->seqlock)->rlock){+...}, at: qdisc_run_begin  
include/net/sch_generic.h:124 [inline]
 #1: (ptrval) (&(&q->seqlock)->rlock){+...}, at: qdisc_run  
include/net/pkt_sched.h:117 [inline]
 #1: (ptrval) (&(&q->seqlock)->rlock){+...}, at: __dev_xmit_skb  
net/core/dev.c:3229 [inline]
 #1: (ptrval) (&(&q->seqlock)->rlock){+...}, at:  
__dev_queue_xmit+0x13a3/0x3910 net/core/dev.c:3537
 #2: (ptrval) (dev->qdisc_running_key ?: &qdisc_running_key){+...},  
at: dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
 #3: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #3: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #4: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #4: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #5: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #5: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #6: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #6: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #7: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #7: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #8: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #8: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #9: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #9: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #10: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #10: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #11: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #11: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #12: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #12: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #13: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #13: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #14: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #14: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #15: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #15: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #16: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #16: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #17: (ptrval) (rcu_read_lock){}, at: __skb_pull  
include/linux/skbuff.h:2080 [inline]
 #17: (ptrval) (rcu_read_lock){}, at:  
skb_mac_gso_segment+0x229/0x740 net/core/dev.c:2787
 #18: (ptrval) (rcu_read_lock){}, at: __skb_pull  

Re: [PATCH] tcp: allow user to create repair socket without window probes

2018-07-12 Thread David Miller
From: Stefan Baranoff 
Date: Tue, 10 Jul 2018 17:31:10 -0400

> Under rare conditions where repair code may be used it is possible that
> window probes are either unnecessary or undesired. If the user knows that
> window probes are not wanted or needed this change allows them to skip
> sending them when a socket comes out of repair.
> 
> Signed-off-by: Stefan Baranoff 

Applied.


Re: [PATCH] tcp: fix sequence numbers for repaired sockets re-using TIME-WAIT sockets

2018-07-12 Thread David Miller
From: Stefan Baranoff 
Date: Tue, 10 Jul 2018 17:25:20 -0400

> This patch fixes a bug where the sequence numbers of a socket created using
> TCP repair functionality are lower than set after connect is called.
> This occurs when the repair socket overlaps with a TIME-WAIT socket and
> triggers the re-use code. The amount lower is equal to the number of times
> that a particular IP/port set is re-used and then put back into TIME-WAIT.
> Re-using the first time the sequence number is 1 lower, closing that socket
> and then re-opening (with repair) a new socket with the same addresses/ports
> puts the sequence number 2 lower than set via setsockopt. The third time is
> 3 lower, etc. I have not tested what the limit of this accrual is, if any.
> 
> The fix is, if a socket is in repair mode, to respect the already set
> sequence number and timestamp when it would have already re-used the
> TIME-WAIT socket.
> 
> Signed-off-by: Stefan Baranoff 

Applied.


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread महेश बंडेवार
On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
 wrote:
> Michal Soltys  wrote:
>
>>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
>>> Mahesh Bandewar (महेश बंडेवार) wrote:
>>>
 On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>
> Hi,
>
> As weird as that sounds, this is what I observed today after bumping
> kernel version. I have a setup where 2 bonds are attached to linux
> bridge and physically are connected to two switches doing MSTP (and
> linux bridge is just passing them).
>
> Initially I suspected some changes related to bridge code - but quick
> peek at the code showed nothing suspicious - and the part of it that
> explicitly passes stp frames if stp is not enabled has seen little
> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
> regular non-bonded interfaces are attached everything works fine.
>
> Just to be sure I detached the bond (802.3ad mode) and checked it with
> simple tcpdump (ether proto \\stp) - and indeed no hello packets were
> there (with them being present just fine on active enslaved interface,
> or on the bond device in earlier kernels).
>
> If time permits I'll bisect tomorrow to pinpoint the commit, but from
> today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
> debian) and 4.17.3 (tested on archlinux) are failing.
>
> Unless this is already a known issue (or you have any suggestions what
> could be responsible).
>
 I believe these are link-local-multicast messages and sometime back a
 change went into to not pass those frames to the bonding master. This
 could be the side effect of that.
>>>
>>>  Mahesh, I suspect you're thinking of:
>>>
>>> commit b89f04c61efe3b7756434d693b9203cc0cce002e
>>> Author: Chonggang Li 
>>> Date:   Sun Apr 16 12:02:18 2017 -0700
>>>
>>>  bonding: deliver link-local packets with skb->dev set to link that 
>>> packets arrived on
>>>
>>>  Michal, are you able to revert this patch and test?
>>>
>>>  -J
>>>
>>> ---
>>>  -Jay Vosburgh, jay.vosbu...@canonical.com
>>>
>>
>>
>>Just tested - yes, reverting that patch solves the issues.
>
> Chonggang,
>
> Reading the changelog in your commit referenced above, I'm not
> entirely sure what actual problem it is fixing.  Could you elaborate?
>
> As the patch appears to cause a regression, it needs to be
> either fixed or reverted.
>
> Mahesh, you signed-off on it as well, perhaps you also have some
> context?
>

I think the original idea behind it was to pass the LLDPDUs to the
stack on the interface that they came in on, since this is considered
to be link-local traffic and passing it to the bond-master would lose
its "linklocal-ness". This is true for LLDP: if you change the skb->dev
of the packet, then you don't know which slave link it came in on
(from the LLDP consumer's perspective).

I don't know much about STP but trunking two links and aggregating
this link info through bond-master seems wrong. Just like LLDP, you
are losing info specific to a link and the decision derived from that
info could be wrong.

Having said that, we determine "linklocal-ness" by looking at L2, and
the bond-master shares this with its slaves. So it does seem fair to
pass those frames to the bonding-master, but at the same time
link-local traffic is supposed to be limited to the physical link
(LLDP/STP/LACP etc.). Your thoughts?


> -J
>
> ---
> -Jay Vosburgh, jay.vosbu...@canonical.com


Re: [PATCH mlx5-next v1 2/8] net/mlx5: Add support for flow table destination number

2018-07-12 Thread Jason Gunthorpe
On Fri, Jul 13, 2018 at 12:00:41AM +0300, Or Gerlitz wrote:
> On Wed, Jul 11, 2018 at 2:10 PM, Leon Romanovsky  wrote:
> > From: Yishai Hadas 
> >
> > Add support to set a destination from a flow table number.
> > This functionality will be used in downstream patches from this
> > series by the DEVX stuff.
> 
> Reading your cover letter, I still don't understand what is missing
> in the current mlx5 fs core API for your needs. After all, you do
> create flow tables from the IB driver through fs core calls, right?
> so @ the end of the day, you have the FT pointer to provide the
> core, why you need the FT number?

Via the devx API userspace can create flow tables directly without
going to the driver's flow steering core.

Jason


Re: [PATCH 00/14] ARM BPF jit compiler improvements

2018-07-12 Thread Daniel Borkmann
On 07/12/2018 11:02 PM, Russell King - ARM Linux wrote:
> On Thu, Jul 12, 2018 at 09:02:41PM +0200, Daniel Borkmann wrote:
>> Applied to bpf-next, thanks a lot Russell!
> 
> Thanks, I've just sent four more patches, which is the sum total of
> what I'm intending to send for BPF improvements for the next merge
> window.

Great, thanks a lot for the batch of improvements, Russell!

Did you manage to get the BPF kselftest suite working on arm32 under
tools/testing/selftests/bpf/? In particular the test_verifier with
bpf_jit_enabled set to 1 and test_kmod.sh has a bigger number of
runtime tests that would stress it.

Thanks,
Daniel


Re: [PATCH bpf v2] bpf: don't leave partial mangled prog in jit_subprogs error path

2018-07-12 Thread Alexei Starovoitov
On Thu, Jul 12, 2018 at 09:44:28PM +0200, Daniel Borkmann wrote:
> syzkaller managed to trigger the following bug through fault injection:
> 
>   [...]
>   [  141.043668] verifier bug. No program starts at insn 3
>   [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
>  get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
>   [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
>  fixup_call_args kernel/bpf/verifier.c:5587 [inline]
>   [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
>  bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
>   [  141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
>   [  141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 
> 1.10.2-1 04/01/2014
>   [  141.049877] Call Trace:
>   [  141.050324]  __dump_stack lib/dump_stack.c:77 [inline]
>   [  141.050324]  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
>   [  141.050950]  ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
>   [  141.051837]  panic+0x238/0x4e7 kernel/panic.c:184
>   [  141.052386]  ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
>   [  141.053101]  ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
>   [  141.053814]  ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
>   [  141.054506]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
>   [  141.054506]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
>   [  141.054506]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
>   [  141.055163]  __warn.cold.8+0x163/0x1ba kernel/panic.c:538
>   [  141.055820]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
>   [  141.055820]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
>   [  141.055820]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
>   [...]
> 
> What happens in jit_subprogs() is that kcalloc() for the subprog func
> buffer is failing with NULL where we then bail out. Latter is a plain
> return -ENOMEM, and this is definitely not okay since earlier in the
> loop we are walking all subprogs and temporarily rewrite insn->off to
> remember the subprog id as well as insn->imm to temporarily point the
> call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing
> out in such state and handing this over to the interpreter is troublesome
> since later/subsequent e.g. find_subprog() lookups are based on wrong
> insn->imm.
> 
> Therefore, once we hit this point, we need to jump to out_free path
> where we undo all changes from earlier loop, so that interpreter can
> work on unmodified insn->{off,imm}.
> 
> Another point is that should find_subprog() fail in jit_subprogs() due
> to a verifier bug, then we also should not simply defer the program to
> the interpreter since also here we did partial modifications. Instead
> we should just bail out entirely and return an error to the user who is
> trying to load the program.
> 
> Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
> Reported-by: syzbot+7d427828b2ea6e592...@syzkaller.appspotmail.com
> Signed-off-by: Daniel Borkmann 

Applied, Thanks



Re: [PATCH 00/14] ARM BPF jit compiler improvements

2018-07-12 Thread Russell King - ARM Linux
On Thu, Jul 12, 2018 at 09:02:41PM +0200, Daniel Borkmann wrote:
> Applied to bpf-next, thanks a lot Russell!

Thanks, I've just sent four more patches, which is the sum total of
what I'm intending to send for BPF improvements for the next merge
window.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH mlx5-next v1 2/8] net/mlx5: Add support for flow table destination number

2018-07-12 Thread Or Gerlitz
On Wed, Jul 11, 2018 at 2:10 PM, Leon Romanovsky  wrote:
> From: Yishai Hadas 
>
> Add support to set a destination from a flow table number.
> This functionality will be used in downstream patches from this
> series by the DEVX stuff.

Reading your cover letter, I still don't understand what is missing in
the current mlx5
fs core API for your needs. After all, you do create flow tables from
the IB driver through
fs core calls, right? so @ the end of the day, you have the FT pointer
to provide the core,
why you need the FT number?


Re: [PATCH mlx5-next v1 1/8] net/mlx5: Add forward compatible support for the FTE match data

2018-07-12 Thread Or Gerlitz
On Wed, Jul 11, 2018 at 2:10 PM, Leon Romanovsky  wrote:
> From: Yishai Hadas 
>
> Use the PRM size including the reserved when working with the FTE
> match data.

is this actually a bug fix?

> This comes to support forward compatibility for cases that current
> reserved data will be exposed by the firmware and could be used by an
> application by DEVX without changing the kernel.

something went wrong in the phrasing/wording of "used by an application by DEVX"
I can't follow on that part of the sentence, please try to improve/fix it.

> Also drop some driver checks around the match criteria leaving the work
> for firmware to enable forward compatibility for future bits there.

not following,

On one hand, we can always patch the kernel to add new bits for
checking, so why remove these checks?

On the other hand, suppose today we check that one of four bits is set
and now bit #5 is added and the kernel doesn't check it - what does
removing the existing four checks buy you?


[PATCH net-next 3/4] ARM: net: bpf: improve 64-bit store implementation

2018-07-12 Thread Russell King
Improve the 64-bit store implementation from:

  ldr r6, [fp, #-8]
  str r8, [r6]
  ldr r6, [fp, #-8]
  mov r7, #4
  add r7, r6, r7
  str r9, [r7]

to:

  ldr r6, [fp, #-8]
  str r8, [r6]
  str r9, [r6, #4]

We leave the store as two separate STR instructions rather than using
STRD as the store may not be aligned, and STR can handle misalignment.

Signed-off-by: Russell King 
---
 arch/arm/net/bpf_jit_32.c | 52 +++
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 3a182e618441..026612ee8151 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -975,29 +975,42 @@ static inline void emit_a32_mul_r64(const s8 dst[], const 
s8 src[],
 }
 
 /* *(size *)(dst + off) = src */
-static inline void emit_str_r(const s8 dst, const s8 src,
- const s32 off, struct jit_ctx *ctx, const u8 sz){
+static inline void emit_str_r(const s8 dst, const s8 src[],
+ s32 off, struct jit_ctx *ctx, const u8 sz){
const s8 *tmp = bpf2a32[TMP_REG_1];
+   s32 off_max;
s8 rd;
 
rd = arm_bpf_get_reg32(dst, tmp[1], ctx);
-   if (off) {
+
+   if (sz == BPF_H)
+   off_max = 0xff;
+   else
+   off_max = 0xfff;
+
+   if (off < 0 || off > off_max) {
emit_a32_mov_i(tmp[0], off, ctx);
-   emit(ARM_ADD_R(tmp[0], rd, tmp[0]), ctx);
+   emit(ARM_ADD_R(tmp[0], tmp[0], rd), ctx);
rd = tmp[0];
+   off = 0;
}
switch (sz) {
-   case BPF_W:
-   /* Store a Word */
-   emit(ARM_STR_I(src, rd, 0), ctx);
+   case BPF_B:
+   /* Store a Byte */
+   emit(ARM_STRB_I(src_lo, rd, off), ctx);
break;
case BPF_H:
/* Store a HalfWord */
-   emit(ARM_STRH_I(src, rd, 0), ctx);
+   emit(ARM_STRH_I(src_lo, rd, off), ctx);
break;
-   case BPF_B:
-   /* Store a Byte */
-   emit(ARM_STRB_I(src, rd, 0), ctx);
+   case BPF_W:
+   /* Store a Word */
+   emit(ARM_STR_I(src_lo, rd, off), ctx);
+   break;
+   case BPF_DW:
+   /* Store a Double Word */
+   emit(ARM_STR_I(src_lo, rd, off), ctx);
+   emit(ARM_STR_I(src_hi, rd, off + 4), ctx);
break;
}
 }
@@ -1539,16 +1552,14 @@ static int build_insn(const struct bpf_insn *insn, 
struct jit_ctx *ctx)
case BPF_DW:
/* Sign-extend immediate value into temp reg */
emit_a32_mov_se_i64(true, tmp2, imm, ctx);
-   emit_str_r(dst_lo, tmp2[1], off, ctx, BPF_W);
-   emit_str_r(dst_lo, tmp2[0], off+4, ctx, BPF_W);
break;
case BPF_W:
case BPF_H:
case BPF_B:
emit_a32_mov_i(tmp2[1], imm, ctx);
-   emit_str_r(dst_lo, tmp2[1], off, ctx, BPF_SIZE(code));
break;
}
+   emit_str_r(dst_lo, tmp2, off, ctx, BPF_SIZE(code));
break;
/* STX XADD: lock *(u32 *)(dst + off) += src */
case BPF_STX | BPF_XADD | BPF_W:
@@ -1560,20 +1571,9 @@ static int build_insn(const struct bpf_insn *insn, 
struct jit_ctx *ctx)
case BPF_STX | BPF_MEM | BPF_H:
case BPF_STX | BPF_MEM | BPF_B:
case BPF_STX | BPF_MEM | BPF_DW:
-   {
-   u8 sz = BPF_SIZE(code);
-
rs = arm_bpf_get_reg64(src, tmp2, ctx);
-
-   /* Store the value */
-   if (BPF_SIZE(code) == BPF_DW) {
-   emit_str_r(dst_lo, rs[1], off, ctx, BPF_W);
-   emit_str_r(dst_lo, rs[0], off+4, ctx, BPF_W);
-   } else {
-   emit_str_r(dst_lo, rs[1], off, ctx, sz);
-   }
+   emit_str_r(dst_lo, rs, off, ctx, BPF_SIZE(code));
break;
-   }
/* PC += off if dst == src */
/* PC += off if dst > src */
/* PC += off if dst >= src */
-- 
2.7.4



[PATCH net-next 4/4] ARM: net: bpf: improve 64-bit ALU implementation

2018-07-12 Thread Russell King
Improve the 64-bit ALU implementation from:

  movw r8, #65532
  movt r8, #65535
  movw r9, #65535
  movt r9, #65535
  ldr r7, [fp, #-44]
  adds r7, r7, r8
  str r7, [fp, #-44]
  ldr r7, [fp, #-40]
  adc r7, r7, r9
  str r7, [fp, #-40]

to:

  movw r8, #65532
  movt r8, #65535
  movw r9, #65535
  movt r9, #65535
  ldrd r6, [fp, #-44]
  adds r6, r6, r8
  adc r7, r7, r9
  strd r6, [fp, #-44]

Signed-off-by: Russell King 
---
 arch/arm/net/bpf_jit_32.c | 29 -
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 026612ee8151..25b3ee85066e 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -716,11 +716,30 @@ static inline void emit_a32_alu_r(const s8 dst, const s8 
src,
 static inline void emit_a32_alu_r64(const bool is64, const s8 dst[],
  const s8 src[], struct jit_ctx *ctx,
  const u8 op) {
-   emit_a32_alu_r(dst_lo, src_lo, ctx, is64, false, op);
-   if (is64)
-   emit_a32_alu_r(dst_hi, src_hi, ctx, is64, true, op);
-   else
-   emit_a32_mov_i(dst_hi, 0, ctx);
+   const s8 *tmp = bpf2a32[TMP_REG_1];
+   const s8 *tmp2 = bpf2a32[TMP_REG_2];
+   const s8 *rd;
+
+   rd = arm_bpf_get_reg64(dst, tmp, ctx);
+   if (is64) {
+   const s8 *rs;
+
+   rs = arm_bpf_get_reg64(src, tmp2, ctx);
+
+   /* ALU operation */
+   emit_alu_r(rd[1], rs[1], true, false, op, ctx);
+   emit_alu_r(rd[0], rs[0], true, true, op, ctx);
+   } else {
+   s8 rs;
+
+   rs = arm_bpf_get_reg32(src_lo, tmp2[1], ctx);
+
+   /* ALU operation */
+   emit_alu_r(rd[1], rs, true, false, op, ctx);
+   emit_a32_mov_i(rd[0], 0, ctx);
+   }
+
+   arm_bpf_put_reg64(dst, rd, ctx);
 }
 
 /* dst = src (4 bytes)*/
-- 
2.7.4



Re: [PATCH iproute2-next v2] net:sched: add action inheritdsfield to skbedit

2018-07-12 Thread Marcelo Ricardo Leitner
On Thu, Jul 12, 2018 at 12:09:26PM -0400, Qiaobin Fu wrote:
> @@ -156,6 +162,9 @@ parse_skbedit(struct action_util *a, int *argc_p, char 
> ***argv_p, int tca_id,
>   if (flags & SKBEDIT_F_PTYPE)
>   addattr_l(n, MAX_MSG, TCA_SKBEDIT_PTYPE,
> &ptype, sizeof(ptype));
> + if (pure_flags != 0)
> + addattr_l(n, MAX_MSG, TCA_SKBEDIT_FLAGS,
> + &pure_flags, sizeof(pure_flags));

It is missing 2 spaces  ^--- here, to make the indentation right. (as
in the block above)

  Marcelo


[PATCH net-next 2/4] ARM: net: bpf: improve 64-bit sign-extended immediate load

2018-07-12 Thread Russell King
Improve the 64-bit sign-extended immediate from:

  mov r6, #1
  str r6, [fp, #-52]  ; 0xffcc
  mov r6, #0
  str r6, [fp, #-48]  ; 0xffd0

to:

  mov r6, #1
  mov r7, #0
  strd r6, [fp, #-52]  ; 0xffcc

Signed-off-by: Russell King 
---
 arch/arm/net/bpf_jit_32.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 6558bd73bbb9..3a182e618441 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -613,12 +613,11 @@ static void emit_a32_mov_i64(const s8 dst[], u64 val, 
struct jit_ctx *ctx)
 /* Sign extended move */
 static inline void emit_a32_mov_se_i64(const bool is64, const s8 dst[],
   const u32 val, struct jit_ctx *ctx) {
-   u32 hi = 0;
+   u64 val64 = val;
 
if (is64 && (val & (1<<31)))
-   hi = (u32)~0;
-   emit_a32_mov_i(dst_lo, val, ctx);
-   emit_a32_mov_i(dst_hi, hi, ctx);
+   val64 |= 0xULL;
+   emit_a32_mov_i64(dst, val64, ctx);
 }
 
 static inline void emit_a32_add_r(const u8 dst, const u8 src,
-- 
2.7.4



[PATCH net-next 1/4] ARM: net: bpf: improve 64-bit load immediate implementation

2018-07-12 Thread Russell King
Rather than writing each 32-bit half of the 64-bit immediate value
separately when the register is on the stack:

  movw r6, #45056  ; 0xb000
  movt r6, #60979  ; 0xee33
  str r6, [fp, #-44]  ; 0xffd4
  mov r6, #0
  str r6, [fp, #-40]  ; 0xffd8

arrange to use the double-word store when available instead:

  movw r6, #45056  ; 0xb000
  movt r6, #60979  ; 0xee33
  mov r7, #0
  strd r6, [fp, #-44]  ; 0xffd4

Signed-off-by: Russell King 
---
 arch/arm/net/bpf_jit_32.c | 32 
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index a9f68a924800..6558bd73bbb9 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -599,9 +599,20 @@ static inline void emit_a32_mov_i(const s8 dst, const u32 
val,
}
 }
 
+static void emit_a32_mov_i64(const s8 dst[], u64 val, struct jit_ctx *ctx)
+{
+   const s8 *tmp = bpf2a32[TMP_REG_1];
+   const s8 *rd = is_stacked(dst_lo) ? tmp : dst;
+
+   emit_mov_i(rd[1], (u32)val, ctx);
+   emit_mov_i(rd[0], val >> 32, ctx);
+
+   arm_bpf_put_reg64(dst, rd, ctx);
+}
+
 /* Sign extended move */
-static inline void emit_a32_mov_i64(const bool is64, const s8 dst[],
- const u32 val, struct jit_ctx *ctx) {
+static inline void emit_a32_mov_se_i64(const bool is64, const s8 dst[],
+  const u32 val, struct jit_ctx *ctx) {
u32 hi = 0;
 
if (is64 && (val & (1<<31)))
@@ -1309,7 +1320,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
break;
case BPF_K:
/* Sign-extend immediate value to destination reg */
-   emit_a32_mov_i64(is64, dst, imm, ctx);
+   emit_a32_mov_se_i64(is64, dst, imm, ctx);
break;
}
break;
@@ -1358,7 +1369,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
 * value into temporary reg and then it would be
 * safe to do the operation on it.
 */
-   emit_a32_mov_i64(is64, tmp2, imm, ctx);
+   emit_a32_mov_se_i64(is64, tmp2, imm, ctx);
emit_a32_alu_r64(is64, dst, tmp2, ctx, BPF_OP(code));
break;
}
@@ -1454,7 +1465,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
 * reg then it would be safe to do the operation
 * on it.
 */
-   emit_a32_mov_i64(is64, tmp2, imm, ctx);
+   emit_a32_mov_se_i64(is64, tmp2, imm, ctx);
emit_a32_mul_r64(dst, tmp2, ctx);
break;
}
@@ -1506,12 +1517,9 @@ static int build_insn(const struct bpf_insn *insn, 
struct jit_ctx *ctx)
/* dst = imm64 */
case BPF_LD | BPF_IMM | BPF_DW:
{
-   const struct bpf_insn insn1 = insn[1];
-   u32 hi, lo = imm;
+   u64 val = (u32)imm | (u64)insn[1].imm << 32;
 
-   hi = insn1.imm;
-   emit_a32_mov_i(dst_lo, lo, ctx);
-   emit_a32_mov_i(dst_hi, hi, ctx);
+   emit_a32_mov_i64(dst, val, ctx);
 
return 1;
}
@@ -1531,7 +1539,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
switch (BPF_SIZE(code)) {
case BPF_DW:
/* Sign-extend immediate value into temp reg */
-   emit_a32_mov_i64(true, tmp2, imm, ctx);
+   emit_a32_mov_se_i64(true, tmp2, imm, ctx);
emit_str_r(dst_lo, tmp2[1], off, ctx, BPF_W);
emit_str_r(dst_lo, tmp2[0], off+4, ctx, BPF_W);
break;
@@ -1620,7 +1628,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
rm = tmp2[0];
rn = tmp2[1];
/* Sign-extend immediate value */
-   emit_a32_mov_i64(true, tmp2, imm, ctx);
+   emit_a32_mov_se_i64(true, tmp2, imm, ctx);
 go_jmp:
/* Setup destination register */
rd = arm_bpf_get_reg64(dst, tmp, ctx);
-- 
2.7.4



[PATCH net-next 0/4] Further ARM BPF jit compiler improvements

2018-07-12 Thread Russell King - ARM Linux
Four further jit compiler improves for 32-bit ARM.

 arch/arm/net/bpf_jit_32.c | 120 --
 1 file changed, 73 insertions(+), 47 deletions(-)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


[PATCH] liquidio: Use %pad printk format for dma_addr_t values

2018-07-12 Thread Helge Deller
Use the existing %pad printk format to print dma_addr_t values.
This avoids the following warnings when compiling on the parisc platform:

warning: format '%llx' expects argument of type 'long long unsigned int', but 
argument 2 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]

Signed-off-by: Helge Deller 

diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c 
b/drivers/net/ethernet/cavium/liquidio/request_manager.c
index 1f2e75da28f8..d5d9e47daa4b 100644
--- a/drivers/net/ethernet/cavium/liquidio/request_manager.c
+++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c
@@ -110,8 +110,8 @@ int octeon_init_instr_queue(struct octeon_device *oct,
 
memset(iq->request_list, 0, sizeof(*iq->request_list) * num_descs);
 
-   dev_dbg(&oct->pci_dev->dev, "IQ[%d]: base: %p basedma: %llx count: 
%d\n",
-   iq_no, iq->base_addr, iq->base_addr_dma, iq->max_count);
+   dev_dbg(&oct->pci_dev->dev, "IQ[%d]: base: %p basedma: %pad count: 
%d\n",
+   iq_no, iq->base_addr, &iq->base_addr_dma, iq->max_count);
 
iq->txpciq.u64 = txpciq.u64;
iq->fill_threshold = (u32)conf->db_min;


Re: [net-next PATCH] net: ipv4: fix listify ip_rcv_finish in case of forwarding

2018-07-12 Thread Or Gerlitz
On Wed, Jul 11, 2018 at 11:06 PM, Jesper Dangaard Brouer
 wrote:

> Well, I would prefer you to implement those.  I just did a quick
> implementation (its trivially easy) so I have something to benchmark
> with.  The performance boost is quite impressive!

sounds good, but wait


> One reason I didn't "just" send a patch, is that Edward so far only
> implemented netif_receive_skb_list() and not napi_gro_receive_list().

sfc doesn't support gro?! doesn't make sense.. Edward?

> And your driver uses napi_gro_receive().  This sort-of disables GRO for
> your driver, which is not a choice I can make.  Interestingly I get
> around the same netperf TCP_STREAM performance.

Same TCP performance

with GRO and no rx-batching

or

without GRO and yes rx-batching

is by far not an intuitive result to me, unless both these techniques
mostly serve to eliminate lots of instruction cache misses and the
TCP stack is so heavily optimized that, if the code is in the cache,
going through it once with a 64K-byte GRO-ed packet is like going
through it ~40 (64K/1500) times with non-GRO-ed packets.

What's the baseline (with GRO and no rx-batching) number on your setup?

> I assume we can get even better perf if we "listify" napi_gro_receive.

yeah, that would be very interesting to get there


Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Heiner Kallweit
On 12.07.2018 21:53, Florian Fainelli wrote:
> 
> 
> On 07/12/2018 12:25 PM, Florian Fainelli wrote:
>>
>>
>> On 07/12/2018 12:10 PM, Heiner Kallweit wrote:
>>> On 12.07.2018 21:09, Andrew Lunn wrote:
> Like r8169 also tg3 driver doesn't wait for the speed-down-renegotiation
> to finish. Therefore, even though I share Andrew's concerns, there seem
> to be chips where it's safe to not wait for the renegotiation to finish
> (e.g. because device is in PCI D3 already and can't generate an 
> interrupt).
> Having said that I'd keep the sync parameter for phy_speed_down so that
> the driver can decide.

 Hi Heiner

 Please put a big fat comment about the dangers of sync=false in the
 function header. We want people to known it is dangerous by default,
 and should only be used in special conditions, when it is known to be
 safe.
Andrew

>>> OK ..
>>
>> What part do you find dangerous? Magic Packets are UDP packets and they
>> are not routed (unless specifically taken care of) so there is already
>> some "lossy" behavior involved with waking-up an Ethernet MAC, I don't
>> think that is too bad to retry several times until the link comes up.
> 
> I see the concern with the comment from v2, and indeed you could get an
> interrupt signaling the PHY auto-negotiated the link before or at the
> time we are suspending causing potentially an early wake-up. Not that
this should be a problem though, since there is usually a point of no
return past which you can't do early wake-up anyway.
> 
I think we should leave the comment in for the moment so that people
think twice about the described scenario. If we should find out that
the issue can't be triggered on all platforms then we still can remove
the comment.


Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Florian Fainelli



On 07/12/2018 12:25 PM, Florian Fainelli wrote:
> 
> 
> On 07/12/2018 12:10 PM, Heiner Kallweit wrote:
>> On 12.07.2018 21:09, Andrew Lunn wrote:
 Like r8169, the tg3 driver also doesn't wait for the speed-down
 renegotiation to finish. Therefore, even though I share Andrew's
 concerns, there seem to be chips where it's safe not to wait for the
 renegotiation to finish (e.g. because the device is already in PCI D3
 and can't generate an interrupt).
 Having said that, I'd keep the sync parameter for phy_speed_down so that
 the driver can decide.
>>>
>>> Hi Heiner
>>>
>>> Please put a big fat comment about the dangers of sync=false in the
>>> function header. We want people to know it is dangerous by default,
>>> and should only be used in special conditions, when it is known to be
>>> safe.
>>> Andrew
>>>
>> OK ..
> 
> What part do you find dangerous? Magic Packets are UDP packets and they
> are not routed (unless specifically taken care of) so there is already
> some "lossy" behavior involved with waking-up an Ethernet MAC, I don't
> think that is too bad to retry several times until the link comes up.

I see the concern with the comment from v2, and indeed you could get an
interrupt signaling the PHY auto-negotiated the link before or at the
time we are suspending, potentially causing an early wake-up. Not that
this should be a problem, though, since there is usually a point of no
return past which you can't do an early wake-up anyway.
-- 
Florian


[PATCH net-next] net: phy: realtek: add missing entry for RTL8211C to mdio_device_id table

2018-07-12 Thread Heiner Kallweit
Add missing entry for RTL8211C to mdio_device_id table.

Signed-off-by: Heiner Kallweit 
Fixes: cf87915cb9f8 ("net: phy: realtek: add support for RTL8211C")
---
 drivers/net/phy/realtek.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index f8f12783..0610148c 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -279,6 +279,7 @@ static struct mdio_device_id __maybe_unused realtek_tbl[] = {
{ 0x001cc816, 0x001fffff },
{ 0x001cc910, 0x001fffff },
{ 0x001cc912, 0x001fffff },
+   { 0x001cc913, 0x001fffff },
{ 0x001cc914, 0x001fffff },
{ 0x001cc915, 0x001fffff },
{ 0x001cc916, 0x001fffff },
-- 
2.18.0



[PATCH bpf v2] bpf: don't leave partial mangled prog in jit_subprogs error path

2018-07-12 Thread Daniel Borkmann
syzkaller managed to trigger the following bug through fault injection:

  [...]
  [  141.043668] verifier bug. No program starts at insn 3
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
 get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
 fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
 bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [  141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
  [  141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
  [  141.049877] Call Trace:
  [  141.050324]  __dump_stack lib/dump_stack.c:77 [inline]
  [  141.050324]  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  [  141.050950]  ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
  [  141.051837]  panic+0x238/0x4e7 kernel/panic.c:184
  [  141.052386]  ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
  [  141.053101]  ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
  [  141.053814]  ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
  [  141.054506]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.054506]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.054506]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [  141.055163]  __warn.cold.8+0x163/0x1ba kernel/panic.c:538
  [  141.055820]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.055820]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.055820]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [...]

What happens in jit_subprogs() is that kcalloc() for the subprog func
buffer is failing with NULL where we then bail out. Latter is a plain
return -ENOMEM, and this is definitely not okay since earlier in the
loop we are walking all subprogs and temporarily rewrite insn->off to
remember the subprog id as well as insn->imm to temporarily point the
call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing
out in such state and handing this over to the interpreter is troublesome
since later/subsequent e.g. find_subprog() lookups are based on wrong
insn->imm.

Therefore, once we hit this point, we need to jump to out_free path
where we undo all changes from earlier loop, so that interpreter can
work on unmodified insn->{off,imm}.

Another point is that should find_subprog() fail in jit_subprogs() due
to a verifier bug, then we also should not simply defer the program to
the interpreter since also here we did partial modifications. Instead
we should just bail out entirely and return an error to the user who is
trying to load the program.
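
The undo-on-failure pattern described above can be shown in miniature outside the kernel. The following is a hypothetical sketch (names like `process_subprogs` and the error value are invented, not the verifier's API): instructions are temporarily rewritten during the walk, and every change is rolled back on failure so a fallback consumer sees the original values.

```c
#include <stddef.h>

/* Hypothetical miniature, not the kernel code: stash state in insn->off,
 * poison insn->imm for the "JIT pass", and undo everything on failure so
 * the "interpreter" sees unmodified instructions again. */
struct insn { int off; int imm; };

int process_subprogs(struct insn *prog, size_t n, int alloc_fails)
{
	size_t i;

	for (i = 0; i < n; i++) {
		prog[i].off = prog[i].imm; /* remember the original imm */
		prog[i].imm = -1;          /* temporary marker          */
	}

	if (alloc_fails)
		goto out_undo_insn;        /* kcalloc() returned NULL   */

	return 0;

out_undo_insn:
	for (i = 0; i < n; i++) {          /* restore original insns    */
		prog[i].imm = prog[i].off;
		prog[i].off = 0;
	}
	return -1;
}
```

The point of the `out_undo_insn` label mirrors the patch: a plain early return would leave the array half-mutated.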

Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+7d427828b2ea6e592...@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann 
---
 v1 -> v2:
   - used label instead of if condition, bit cleaner and shorter

 kernel/bpf/verifier.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9e2bf83..63aaac5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5430,6 +5430,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
if (insn->code != (BPF_JMP | BPF_CALL) ||
insn->src_reg != BPF_PSEUDO_CALL)
continue;
+   /* Upon error here we cannot fall back to interpreter but
+* need a hard reject of the program. Thus -EFAULT is
+* propagated in any case.
+*/
subprog = find_subprog(env, i + insn->imm + 1);
if (subprog < 0) {
WARN_ONCE(1, "verifier bug. No program starts at insn %d\n",
@@ -5450,7 +5454,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 
func = kcalloc(env->subprog_cnt, sizeof(prog), GFP_KERNEL);
if (!func)
-   return -ENOMEM;
+   goto out_undo_insn;
 
for (i = 0; i < env->subprog_cnt; i++) {
subprog_start = subprog_end;
@@ -5515,7 +5519,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
tmp = bpf_int_jit_compile(func[i]);
if (tmp != func[i] || func[i]->bpf_func != old_bpf_func) {
verbose(env, "JIT doesn't support bpf-to-bpf calls\n");
-   err = -EFAULT;
+   err = -ENOTSUPP;
goto out_free;
}
cond_resched();
@@ -5552,6 +5556,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
if (func[i])
bpf_jit_free(func[i]);
kfree(func);
+out_undo_insn:
/* cleanup main prog to be interpreted */
prog->jit_requested = 0;
for (i = 0, insn = 

Re: [PATCH net-next] tc-testing: add geneve options in tunnel_key unit tests

2018-07-12 Thread David Miller
From: Jakub Kicinski 
Date: Tue, 10 Jul 2018 18:22:31 -0700

> From: Pieter Jansen van Vuuren 
> 
> Extend tc tunnel_key action unit tests with geneve options. Tests
> include testing single and multiple geneve options, as well as
> testing geneve options that are expected to fail.
> 
> Signed-off-by: Pieter Jansen van Vuuren 

Applied, thanks.


[PATCH net-next v2 1/2] net: phy: add helper phy_config_aneg

2018-07-12 Thread Heiner Kallweit
This functionality will also be needed in subsequent patches of this
series, therefore factor it out to a helper.

Signed-off-by: Heiner Kallweit 
Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
---
 drivers/net/phy/phy.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 537297d2..c4aa360d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -467,6 +467,14 @@ int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd)
 }
 EXPORT_SYMBOL(phy_mii_ioctl);
 
+static int phy_config_aneg(struct phy_device *phydev)
+{
+   if (phydev->drv->config_aneg)
+   return phydev->drv->config_aneg(phydev);
+   else
+   return genphy_config_aneg(phydev);
+}
+
 /**
  * phy_start_aneg_priv - start auto-negotiation for this PHY device
  * @phydev: the phy_device struct
@@ -493,10 +501,7 @@ static int phy_start_aneg_priv(struct phy_device *phydev, bool sync)
/* Invalidate LP advertising flags */
phydev->lp_advertising = 0;
 
-   if (phydev->drv->config_aneg)
-   err = phydev->drv->config_aneg(phydev);
-   else
-   err = genphy_config_aneg(phydev);
+   err = phy_config_aneg(phydev);
if (err < 0)
goto out_unlock;
 
-- 
2.18.0




[PATCH net-next v2 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Heiner Kallweit
Some network drivers include functionality to speed down the PHY when
suspending and just waiting for a WoL packet because this saves energy.
This functionality is quite generic, therefore let's factor it out to
phylib.
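
The speed-down policy this series implements can be illustrated with a standalone sketch. The `F_*` feature bits below are made-up stand-ins for the kernel's `PHY_*_FEATURES` masks, and `speed_down_adv` is an invented name; the branch structure mirrors the policy of keeping only the slowest speed both link partners support.

```c
#include <stdint.h>

#define F_10   0x1u  /* stand-in for PHY_10BT_FEATURES   */
#define F_100  0x2u  /* stand-in for PHY_100BT_FEATURES  */
#define F_1000 0x4u  /* stand-in for PHY_1000BT_FEATURES */

/* Hypothetical sketch: given what we advertise and what both sides
 * support (lp_advertising & supported), drop the faster speeds. */
uint32_t speed_down_adv(uint32_t adv, uint32_t lp_and_supported)
{
	if (lp_and_supported & F_10)
		adv &= ~(F_100 | F_1000); /* 10 Mbit/s is enough for WoL */
	else if (lp_and_supported & F_100)
		adv &= ~F_1000;           /* fall back to 100 Mbit/s     */
	return adv;
}
```

If the result equals the old advertisement, no renegotiation is needed, which is exactly the early-return in the patch below.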

Signed-off-by: Heiner Kallweit 
---
v2:
- add comment to phy_speed_down regarding use of sync = false
- remove sync parameter from phy_speed_up
---
 drivers/net/phy/phy.c | 78 +++
 include/linux/phy.h   |  2 ++
 2 files changed, 80 insertions(+)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index c4aa360d..e61864ca 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -551,6 +551,84 @@ int phy_start_aneg(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_start_aneg);
 
+static int phy_poll_aneg_done(struct phy_device *phydev)
+{
+   unsigned int retries = 100;
+   int ret;
+
+   do {
+   msleep(100);
+   ret = phy_aneg_done(phydev);
+   } while (!ret && --retries);
+
+   if (!ret)
+   return -ETIMEDOUT;
+
+   return ret < 0 ? ret : 0;
+}
+
+/**
+ * phy_speed_down - set speed to lowest speed supported by both link partners
+ * @phydev: the phy_device struct
+ * @sync: perform action synchronously
+ *
+ * Description: Typically used to save energy when waiting for a WoL packet
+ *
+ * WARNING: Setting sync to false may cause the system to be unable to suspend
+ * in case the PHY generates an interrupt when finishing the autonegotiation.
+ * This interrupt may wake up the system immediately after suspend.
+ * Therefore use sync = false only if you're sure it's safe with the respective
+ * network chip.
+ */
+int phy_speed_down(struct phy_device *phydev, bool sync)
+{
+   u32 adv = phydev->lp_advertising & phydev->supported;
+   u32 adv_old = phydev->advertising;
+   int ret;
+
+   if (phydev->autoneg != AUTONEG_ENABLE)
+   return 0;
+
+   if (adv & PHY_10BT_FEATURES)
+   phydev->advertising &= ~(PHY_100BT_FEATURES |
+PHY_1000BT_FEATURES);
+   else if (adv & PHY_100BT_FEATURES)
+   phydev->advertising &= ~PHY_1000BT_FEATURES;
+
+   if (phydev->advertising == adv_old)
+   return 0;
+
+   ret = phy_config_aneg(phydev);
+   if (ret)
+   return ret;
+
+   return sync ? phy_poll_aneg_done(phydev) : 0;
+}
+EXPORT_SYMBOL_GPL(phy_speed_down);
+
+/**
+ * phy_speed_up - (re)set advertised speeds to all supported speeds
+ * @phydev: the phy_device struct
+ *
+ * Description: Used to revert the effect of phy_speed_down
+ */
+int phy_speed_up(struct phy_device *phydev)
+{
+   u32 mask = PHY_10BT_FEATURES | PHY_100BT_FEATURES | PHY_1000BT_FEATURES;
+   u32 adv_old = phydev->advertising;
+
+   if (phydev->autoneg != AUTONEG_ENABLE)
+   return 0;
+
+   phydev->advertising = (adv_old & ~mask) | (phydev->supported & mask);
+
+   if (phydev->advertising == adv_old)
+   return 0;
+
+   return phy_config_aneg(phydev);
+}
+EXPORT_SYMBOL_GPL(phy_speed_up);
+
 /**
  * phy_start_machine - start PHY state machine tracking
  * @phydev: the phy_device struct
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 6cd09098..075c2f77 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -942,6 +942,8 @@ void phy_start(struct phy_device *phydev);
 void phy_stop(struct phy_device *phydev);
 int phy_start_aneg(struct phy_device *phydev);
 int phy_aneg_done(struct phy_device *phydev);
+int phy_speed_down(struct phy_device *phydev, bool sync);
+int phy_speed_up(struct phy_device *phydev);
 
 int phy_stop_interrupts(struct phy_device *phydev);
 int phy_restart_aneg(struct phy_device *phydev);
-- 
2.18.0




Re: [net v2] sch_fq_codel: zero q->flows_cnt when fq_codel_init fails

2018-07-12 Thread David Miller
From: Jacob Keller 
Date: Tue, 10 Jul 2018 14:22:27 -0700

> When fq_codel_init fails, qdisc_create_dflt will cleanup by using
> qdisc_destroy. This function calls the ->reset() op prior to calling the
> ->destroy() op.
> 
> Unfortunately, during the failure flow for sch_fq_codel, the ->flows
> parameter is not initialized, so the fq_codel_reset function will null
> pointer dereference.
 ...
> This is caused because flows_cnt is non-zero, but flows hasn't been
> initialized. fq_codel_init has left the private data in a partially
> initialized state.
> 
> To fix this, reset flows_cnt to 0 when we fail to initialize.
> Additionally, to make the state more consistent, also cleanup the flows
> pointer when the allocation of backlogs fails.
> 
> This fixes the NULL pointer dereference, since both the for-loop and
> memset in fq_codel_reset will be no-ops when flow_cnt is zero.
> 
> Signed-off-by: Jacob Keller 

Applied and queued up for -stable, thanks!
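
The invariant the quoted fix relies on, that reset degrades to a no-op when the count is zeroed on init failure, can be shown with a hypothetical miniature (this is not the sch_fq_codel code; names are invented):

```c
#include <stdlib.h>

/* Hypothetical miniature: on init failure, zero the flow count so that
 * a later reset(), which walks flows_cnt entries, degrades to a no-op
 * instead of dereferencing a NULL array. */
struct mini_sched {
	unsigned int flows_cnt;
	int *flows;
};

int mini_init(struct mini_sched *q, unsigned int n, int simulate_enomem)
{
	q->flows_cnt = n;
	q->flows = simulate_enomem ? NULL : calloc(n, sizeof(*q->flows));
	if (!q->flows) {
		q->flows_cnt = 0; /* keep state consistent for reset() */
		return -1;        /* -ENOMEM in the kernel             */
	}
	return 0;
}

void mini_reset(struct mini_sched *q)
{
	/* no-op when flows_cnt is zero, even if q->flows is NULL */
	for (unsigned int i = 0; i < q->flows_cnt; i++)
		q->flows[i] = 0;
}
```

Without the `flows_cnt = 0` line, calling `mini_reset` after a failed init would be the same NULL dereference the commit message describes.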


[PATCH net-next v2 0/2] net: phy: add functionality to speed down PHY when waiting for WoL packet

2018-07-12 Thread Heiner Kallweit
Some network drivers include functionality to speed down the PHY when
suspending and just waiting for a WoL packet because this saves energy.

This patch is based on our recent discussion about factoring out this
functionality to phylib. First user will be the r8169 driver.

v2:
- add warning comment to phy_speed_down regarding usage of sync = false
- remove sync parameter from phy_speed_up

Heiner Kallweit (2):
  net: phy: add helper phy_config_aneg
  net: phy: add phy_speed_down and phy_speed_up

 drivers/net/phy/phy.c | 91 +--
 include/linux/phy.h   |  2 +
 2 files changed, 89 insertions(+), 4 deletions(-)

-- 
2.18.0



Re: [PATCH v3 net-next 10/19] tls: Fix zerocopy_from_iter iov handling

2018-07-12 Thread Boris Pismenny




On 7/12/2018 12:46 PM, Dave Watson wrote:

On 07/11/18 10:54 PM, Boris Pismenny wrote:

zerocopy_from_iter iterates over the message, but it doesn't revert the
updates made by the iov iteration. This patch fixes it. Now, the iov can
be used after calling zerocopy_from_iter.


This breaks tests (which I will send up as selftests shortly).  I
believe we are depending on zerocopy_from_iter to advance the iter,
and if zerocopy_from_iter returns a failure, then we revert it.  So
you can revert it here if you want, but you'd have to advance it if we
actually used it instead.



Only on the send side do we depend on this semantic. On the receive
side, we need to revert it in case we go to the fallback flow.


[PATCH v4 net-next 06/19] tls: Split decrypt_skb to two functions

2018-07-12 Thread Boris Pismenny
Previously, decrypt_skb also updated the TLS context.
Now, decrypt_skb only decrypts the payload using the current context,
while decrypt_skb_update also updates the state.

Later, in the tls_device Rx flow, we will use decrypt_skb directly.
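
The split described above, a pure operation plus a wrapper that commits state, can be sketched in miniature. `do_decrypt`/`do_decrypt_update` below are invented names mirroring the structure, not the kernel functions:

```c
/* Hypothetical miniature of the refactoring: do_decrypt() is pure and
 * touches no context state; do_decrypt_update() wraps it and commits
 * state (sequence number, decrypted flag) only on success. */
struct mini_rx_ctx {
	int seq;
	int decrypted;
};

int do_decrypt(struct mini_rx_ctx *ctx, int payload_ok)
{
	(void)ctx; /* pure: decrypt with the current context, no updates */
	return payload_ok ? 0 : -1;
}

int do_decrypt_update(struct mini_rx_ctx *ctx, int payload_ok)
{
	int err = do_decrypt(ctx, payload_ok);

	if (err < 0)
		return err;
	ctx->seq++;         /* advance the record sequence number */
	ctx->decrypted = 1; /* mark the record as decrypted       */
	return 0;
}
```

Callers that must not advance the context (here, the coming tls_device Rx flow) call the pure half directly.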

Signed-off-by: Boris Pismenny 
---
 include/net/tls.h |  2 ++
 net/tls/tls_sw.c  | 44 ++--
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 5dcd808..49b8922 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -390,6 +390,8 @@ int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
  unsigned char *record_type);
 void tls_register_device(struct tls_device *device);
 void tls_unregister_device(struct tls_device *device);
+int decrypt_skb(struct sock *sk, struct sk_buff *skb,
+   struct scatterlist *sgout);
 
 struct sk_buff *tls_validate_xmit_skb(struct sock *sk,
  struct net_device *dev,
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 3bd7c14..99d0347 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -53,7 +53,6 @@ static int tls_do_decryption(struct sock *sk,
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
-   struct strp_msg *rxm = strp_msg(skb);
struct aead_request *aead_req;
 
int ret;
@@ -74,18 +73,6 @@ static int tls_do_decryption(struct sock *sk,
 
ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
 
-   if (ret < 0)
-   goto out;
-
-   rxm->offset += tls_ctx->rx.prepend_size;
-   rxm->full_len -= tls_ctx->rx.overhead_size;
-   tls_advance_record_sn(sk, &tls_ctx->rx);
-
-   ctx->decrypted = true;
-
-   ctx->saved_data_ready(sk);
-
-out:
kfree(aead_req);
return ret;
 }
@@ -670,8 +657,29 @@ static struct sk_buff *tls_wait_data(struct sock *sk, int flags,
return skb;
 }
 
-static int decrypt_skb(struct sock *sk, struct sk_buff *skb,
-  struct scatterlist *sgout)
+static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb,
+ struct scatterlist *sgout)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+   struct strp_msg *rxm = strp_msg(skb);
+   int err = 0;
+
+   err = decrypt_skb(sk, skb, sgout);
+   if (err < 0)
+   return err;
+
+   rxm->offset += tls_ctx->rx.prepend_size;
+   rxm->full_len -= tls_ctx->rx.overhead_size;
+   tls_advance_record_sn(sk, &tls_ctx->rx);
+   ctx->decrypted = true;
+   ctx->saved_data_ready(sk);
+
+   return err;
+}
+
+int decrypt_skb(struct sock *sk, struct sk_buff *skb,
+   struct scatterlist *sgout)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -821,7 +829,7 @@ int tls_sw_recvmsg(struct sock *sk,
if (err < 0)
goto fallback_to_reg_recv;
 
-   err = decrypt_skb(sk, skb, sgin);
+   err = decrypt_skb_update(sk, skb, sgin);
for (; pages > 0; pages--)
put_page(sg_page(&sgin[pages]));
if (err < 0) {
@@ -830,7 +838,7 @@ int tls_sw_recvmsg(struct sock *sk,
}
} else {
 fallback_to_reg_recv:
-   err = decrypt_skb(sk, skb, NULL);
+   err = decrypt_skb_update(sk, skb, NULL);
if (err < 0) {
tls_err_abort(sk, EBADMSG);
goto recv_end;
@@ -901,7 +909,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
}
 
if (!ctx->decrypted) {
-   err = decrypt_skb(sk, skb, NULL);
+   err = decrypt_skb_update(sk, skb, NULL);
 
if (err < 0) {
tls_err_abort(sk, EBADMSG);
-- 
1.8.3.1



[PATCH v4 net-next 12/19] net/mlx5: Accel, add TLS rx offload routines

2018-07-12 Thread Boris Pismenny
In Innova TLS, TLS contexts are added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.

Complete the implementation for Innova TLS (FPGA-based) hardware by
adding support for rx inline crypto offload.

Signed-off-by: Boris Pismenny 
Signed-off-by: Ilya Lesokhin 
---
 .../net/ethernet/mellanox/mlx5/core/accel/tls.c|  23 +++--
 .../net/ethernet/mellanox/mlx5/core/accel/tls.h|  26 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 113 -
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h |  18 ++--
 include/linux/mlx5/mlx5_ifc_fpga.h |   1 +
 5 files changed, 135 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
index 77ac19f..da7bd26 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
@@ -37,17 +37,26 @@
 #include "mlx5_core.h"
 #include "fpga/tls.h"
 
-int mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
-  struct tls_crypto_info *crypto_info,
-  u32 start_offload_tcp_sn, u32 *p_swid)
+int mlx5_accel_tls_add_flow(struct mlx5_core_dev *mdev, void *flow,
+   struct tls_crypto_info *crypto_info,
+   u32 start_offload_tcp_sn, u32 *p_swid,
+   bool direction_sx)
 {
-   return mlx5_fpga_tls_add_tx_flow(mdev, flow, crypto_info,
-start_offload_tcp_sn, p_swid);
+   return mlx5_fpga_tls_add_flow(mdev, flow, crypto_info,
+ start_offload_tcp_sn, p_swid,
+ direction_sx);
 }
 
-void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid)
+void mlx5_accel_tls_del_flow(struct mlx5_core_dev *mdev, u32 swid,
+bool direction_sx)
 {
-   mlx5_fpga_tls_del_tx_flow(mdev, swid, GFP_KERNEL);
+   mlx5_fpga_tls_del_flow(mdev, swid, GFP_KERNEL, direction_sx);
+}
+
+int mlx5_accel_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle, u32 seq,
+u64 rcd_sn)
+{
+   return mlx5_fpga_tls_resync_rx(mdev, handle, seq, rcd_sn);
 }
 
 bool mlx5_accel_is_tls_device(struct mlx5_core_dev *mdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h
index 6f9c9f4..2228c10 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h
@@ -60,10 +60,14 @@ struct mlx5_ifc_tls_flow_bits {
u8 reserved_at_2[0x1e];
 };
 
-int mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
-  struct tls_crypto_info *crypto_info,
-  u32 start_offload_tcp_sn, u32 *p_swid);
-void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid);
+int mlx5_accel_tls_add_flow(struct mlx5_core_dev *mdev, void *flow,
+   struct tls_crypto_info *crypto_info,
+   u32 start_offload_tcp_sn, u32 *p_swid,
+   bool direction_sx);
+void mlx5_accel_tls_del_flow(struct mlx5_core_dev *mdev, u32 swid,
+bool direction_sx);
+int mlx5_accel_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle, u32 seq,
+u64 rcd_sn);
 bool mlx5_accel_is_tls_device(struct mlx5_core_dev *mdev);
 u32 mlx5_accel_tls_device_caps(struct mlx5_core_dev *mdev);
 int mlx5_accel_tls_init(struct mlx5_core_dev *mdev);
@@ -71,11 +75,15 @@ int mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
 
 #else
 
-static inline int
-mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
-  struct tls_crypto_info *crypto_info,
-  u32 start_offload_tcp_sn, u32 *p_swid) { return 0; }
-static inline void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid) { }
+static int
+mlx5_accel_tls_add_flow(struct mlx5_core_dev *mdev, void *flow,
+   struct tls_crypto_info *crypto_info,
+   u32 start_offload_tcp_sn, u32 *p_swid,
+   bool direction_sx) { return -ENOTSUPP; }
+static inline void mlx5_accel_tls_del_flow(struct mlx5_core_dev *mdev, u32 swid,
+  bool direction_sx) { }
+static inline int mlx5_accel_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle,
+  u32 seq, u64 rcd_sn) { return 0; }
 static inline bool mlx5_accel_is_tls_device(struct mlx5_core_dev *mdev) { return false; }
 static inline u32 mlx5_accel_tls_device_caps(struct mlx5_core_dev *mdev) { return 0; }
 static inline int mlx5_accel_tls_init(struct mlx5_core_dev *mdev) { return 0; }

[PATCH v4 net-next 16/19] net/mlx5e: TLS, build TLS netdev from capabilities

2018-07-12 Thread Boris Pismenny
This patch enables TLS Rx based on available HW capabilities.

Signed-off-by: Boris Pismenny 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
index 541e6f4..eddd7702 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -183,13 +183,27 @@ static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
 
 void mlx5e_tls_build_netdev(struct mlx5e_priv *priv)
 {
+   u32 caps = mlx5_accel_tls_device_caps(priv->mdev);
struct net_device *netdev = priv->netdev;
 
if (!mlx5_accel_is_tls_device(priv->mdev))
return;
 
-   netdev->features |= NETIF_F_HW_TLS_TX;
-   netdev->hw_features |= NETIF_F_HW_TLS_TX;
+   if (caps & MLX5_ACCEL_TLS_TX) {
+   netdev->features  |= NETIF_F_HW_TLS_TX;
+   netdev->hw_features   |= NETIF_F_HW_TLS_TX;
+   }
+
+   if (caps & MLX5_ACCEL_TLS_RX) {
+   netdev->features  |= NETIF_F_HW_TLS_RX;
+   netdev->hw_features   |= NETIF_F_HW_TLS_RX;
+   }
+
+   if (!(caps & MLX5_ACCEL_TLS_LRO)) {
+   netdev->features  &= ~NETIF_F_LRO;
+   netdev->hw_features   &= ~NETIF_F_LRO;
+   }
+
netdev->tlsdev_ops = &mlx5e_tls_ops;
 }
 
-- 
1.8.3.1



[PATCH v4 net-next 03/19] net: Add TLS rx resync NDO

2018-07-12 Thread Boris Pismenny
Add new netdev tls op for resynchronizing HW tls context

Signed-off-by: Boris Pismenny 
---
 include/linux/netdevice.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b683971..0434df3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -903,6 +903,8 @@ struct tlsdev_ops {
void (*tls_dev_del)(struct net_device *netdev,
struct tls_context *ctx,
enum tls_offload_ctx_dir direction);
+   void (*tls_dev_resync_rx)(struct net_device *netdev,
+ struct sock *sk, u32 seq, u64 rcd_sn);
 };
 #endif
 
-- 
1.8.3.1



[PATCH v4 net-next 19/19] net/mlx5e: Kconfig, mutually exclude compilation of TLS and IPsec accel

2018-07-12 Thread Boris Pismenny
We currently have no devices that support both TLS and IPsec using the
accel framework, and the current code does not support both IPsec and
TLS. This patch prevents such combinations.

Signed-off-by: Boris Pismenny 
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 2545296..d3e8c70 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -93,6 +93,7 @@ config MLX5_EN_TLS
depends on TLS_DEVICE
depends on TLS=y || MLX5_CORE=m
depends on MLX5_ACCEL
+   depends on !MLX5_EN_IPSEC
default n
---help---
  Build support for TLS cryptography-offload accelaration in the NIC.
-- 
1.8.3.1



[PATCH v4 net-next 14/19] net/mlx5e: TLS, add Innova TLS rx data path

2018-07-12 Thread Boris Pismenny
Implement the TLS rx offload data path according to the
requirements of the TLS generic NIC offload infrastructure.

Special metadata ethertype is used to pass information to
the hardware.

When hardware loses synchronization a special resync request
metadata message is used to request resync.
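
The first step of that data path, detecting the inline metadata, can be sketched in userspace. This is a hypothetical sketch: `META_ETHERTYPE` and the 8-byte layout here are illustrative assumptions, not the real `MLX5E_METADATA_ETHER_TYPE` value or the driver's struct.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: the FPGA prepends 6 bytes of metadata content
 * followed by a special 2-byte ethertype right after the MAC addresses;
 * detect it before parsing the rest of the frame. */
#define META_ETHERTYPE 0x8CE4 /* made-up value for illustration */
#define META_LEN 8            /* assumed: 6 bytes content + ethertype */

int has_inline_metadata(const uint8_t *frame, size_t len)
{
	uint16_t ethtype;

	if (len < 12 + META_LEN) /* dst + src MAC, then metadata */
		return 0;
	memcpy(&ethtype, frame + 12, sizeof(ethtype));
	return ethtype == htons(META_ETHERTYPE);
}
```

The real handler additionally strips the metadata and dispatches on the syndrome, as the diff below shows.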

Signed-off-by: Boris Pismenny 
Signed-off-by: Ilya Lesokhin 
---
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 112 -
 .../mellanox/mlx5/core/en_accel/tls_rxtx.h |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   6 ++
 3 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
index c96196f..d460fda 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -33,6 +33,12 @@
 
 #include "en_accel/tls.h"
 #include "en_accel/tls_rxtx.h"
+#include <net/inet6_hashtables.h>
+#include <linux/ipv6.h>
+
+#define SYNDROM_DECRYPTED  0x30
+#define SYNDROM_RESYNC_REQUEST 0x31
+#define SYNDROM_AUTH_FAILED 0x32
 
 #define SYNDROME_OFFLOAD_REQUIRED 32
 #define SYNDROME_SYNC 33
@@ -44,10 +50,26 @@ struct sync_info {
skb_frag_t frags[MAX_SKB_FRAGS];
 };
 
-struct mlx5e_tls_metadata {
+struct recv_metadata_content {
+   u8 syndrome;
+   u8 reserved;
+   __be32 sync_seq;
+} __packed;
+
+struct send_metadata_content {
/* One byte of syndrome followed by 3 bytes of swid */
__be32 syndrome_swid;
__be16 first_seq;
+} __packed;
+
+struct mlx5e_tls_metadata {
+   union {
+   /* from fpga to host */
+   struct recv_metadata_content recv;
+   /* from host to fpga */
+   struct send_metadata_content send;
+   unsigned char raw[6];
+   } __packed content;
/* packet type ID field */
__be16 ethertype;
 } __packed;
@@ -68,7 +90,8 @@ static int mlx5e_tls_add_metadata(struct sk_buff *skb, __be32 swid)
2 * ETH_ALEN);
 
eth->h_proto = cpu_to_be16(MLX5E_METADATA_ETHER_TYPE);
-   pet->syndrome_swid = htonl(SYNDROME_OFFLOAD_REQUIRED << 24) | swid;
+   pet->content.send.syndrome_swid =
+   htonl(SYNDROME_OFFLOAD_REQUIRED << 24) | swid;
 
return 0;
 }
@@ -149,7 +172,7 @@ static void mlx5e_tls_complete_sync_skb(struct sk_buff *skb,
 
pet = (struct mlx5e_tls_metadata *)(nskb->data + sizeof(struct ethhdr));
memcpy(pet, &syndrome, sizeof(syndrome));
-   pet->first_seq = htons(tcp_seq);
+   pet->content.send.first_seq = htons(tcp_seq);
 
/* MLX5 devices don't care about the checksum partial start, offset
 * and pseudo header
@@ -276,3 +299,86 @@ struct sk_buff *mlx5e_tls_handle_tx_skb(struct net_device *netdev,
 out:
return skb;
 }
+
+static int tls_update_resync_sn(struct net_device *netdev,
+   struct sk_buff *skb,
+   struct mlx5e_tls_metadata *mdata)
+{
+   struct sock *sk = NULL;
+   struct iphdr *iph;
+   struct tcphdr *th;
+   __be32 seq;
+
+   if (mdata->ethertype != htons(ETH_P_IP))
+   return -EINVAL;
+
+   iph = (struct iphdr *)(mdata + 1);
+
+   th = ((void *)iph) + iph->ihl * 4;
+
+   if (iph->version == 4) {
+   sk = inet_lookup_established(dev_net(netdev), &tcp_hashinfo,
+iph->saddr, th->source, iph->daddr,
+th->dest, netdev->ifindex);
+#if IS_ENABLED(CONFIG_IPV6)
+   } else {
+   struct ipv6hdr *ipv6h = (struct ipv6hdr *)iph;
+
+   sk = __inet6_lookup_established(dev_net(netdev), &tcp_hashinfo,
+   &ipv6h->saddr, th->source,
+   &ipv6h->daddr, th->dest,
+   netdev->ifindex, 0);
+#endif
+   }
+   if (!sk || sk->sk_state == TCP_TIME_WAIT)
+   goto out;
+
+   skb->sk = sk;
+   skb->destructor = sock_edemux;
+
+   memcpy(&seq, &mdata->content.recv.sync_seq, sizeof(seq));
+   tls_offload_rx_resync_request(sk, seq);
+out:
+   return 0;
+}
+
+void mlx5e_tls_handle_rx_skb(struct net_device *netdev, struct sk_buff *skb,
+u32 *cqe_bcnt)
+{
+   struct mlx5e_tls_metadata *mdata;
+   struct ethhdr *old_eth;
+   struct ethhdr *new_eth;
+   __be16 *ethtype;
+
+   /* Detect inline metadata */
+   if (skb->len < ETH_HLEN + MLX5E_METADATA_ETHER_LEN)
+   return;
+   ethtype = (__be16 *)(skb->data + ETH_ALEN * 2);
+   if (*ethtype != cpu_to_be16(MLX5E_METADATA_ETHER_TYPE))
+   return;
+
+   /* Use the metadata */
+   mdata = (struct mlx5e_tls_metadata *)(skb->data + ETH_HLEN);
+   switch (mdata->content.recv.syndrome) {
+   case SYNDROM_DECRYPTED:
+   

[PATCH v4 net-next 08/19] tls: Fill software context without allocation

2018-07-12 Thread Boris Pismenny
This patch allows tls_set_sw_offload to fill the context in case it was
already allocated previously.

We will use it in TLS_DEVICE to fill the RX software context.
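
The reuse-or-allocate pattern the patch introduces can be shown in miniature. `get_or_alloc` below is an invented helper for illustration, not the kernel API:

```c
#include <stdlib.h>

/* Hypothetical miniature: fill a context slot, allocating only when
 * the caller has not provided one already, the way tls_set_sw_offload
 * now reuses a pre-set priv_ctx_tx/priv_ctx_rx. */
struct mini_sw_ctx {
	int async_wait;
};

struct mini_sw_ctx *get_or_alloc(struct mini_sw_ctx **slot)
{
	if (!*slot)
		*slot = calloc(1, sizeof(**slot)); /* allocate on demand */
	return *slot; /* otherwise reuse what the caller installed */
}
```

Note that one-time initialization (here the `async_wait` stand-in, `crypto_init_wait` in the patch) has to move after this step so it runs for both the fresh and the pre-filled case.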

Signed-off-by: Boris Pismenny 
---
 net/tls/tls_sw.c | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 86e22bc..5073676 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1090,28 +1090,38 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
}
 
if (tx) {
-   sw_ctx_tx = kzalloc(sizeof(*sw_ctx_tx), GFP_KERNEL);
-   if (!sw_ctx_tx) {
-   rc = -ENOMEM;
-   goto out;
+   if (!ctx->priv_ctx_tx) {
+   sw_ctx_tx = kzalloc(sizeof(*sw_ctx_tx), GFP_KERNEL);
+   if (!sw_ctx_tx) {
+   rc = -ENOMEM;
+   goto out;
+   }
+   ctx->priv_ctx_tx = sw_ctx_tx;
+   } else {
+   sw_ctx_tx =
+   (struct tls_sw_context_tx *)ctx->priv_ctx_tx;
}
-   crypto_init_wait(&sw_ctx_tx->async_wait);
-   ctx->priv_ctx_tx = sw_ctx_tx;
} else {
-   sw_ctx_rx = kzalloc(sizeof(*sw_ctx_rx), GFP_KERNEL);
-   if (!sw_ctx_rx) {
-   rc = -ENOMEM;
-   goto out;
+   if (!ctx->priv_ctx_rx) {
+   sw_ctx_rx = kzalloc(sizeof(*sw_ctx_rx), GFP_KERNEL);
+   if (!sw_ctx_rx) {
+   rc = -ENOMEM;
+   goto out;
+   }
+   ctx->priv_ctx_rx = sw_ctx_rx;
+   } else {
+   sw_ctx_rx =
+   (struct tls_sw_context_rx *)ctx->priv_ctx_rx;
}
-   crypto_init_wait(&sw_ctx_rx->async_wait);
-   ctx->priv_ctx_rx = sw_ctx_rx;
}
 
if (tx) {
+   crypto_init_wait(&sw_ctx_tx->async_wait);
crypto_info = &ctx->crypto_send;
cctx = &ctx->tx;
aead = &sw_ctx_tx->aead_send;
} else {
+   crypto_init_wait(&sw_ctx_rx->async_wait);
crypto_info = &ctx->crypto_recv;
cctx = &ctx->rx;
aead = &sw_ctx_rx->aead_recv;
-- 
1.8.3.1



[PATCH v4 net-next 10/19] tls: Fix zerocopy_from_iter iov handling

2018-07-12 Thread Boris Pismenny
zerocopy_from_iter iterates over the message, but it doesn't revert the
updates made by the iov iteration. This patch fixes it. Now, the iov can
be used after calling zerocopy_from_iter.
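
The fix can be modeled outside the kernel: treat the iov as a cursor that zerocopy consumption advances, and move it back by the number of bytes used so the caller can fall back to a copying path over the same data. A sketch under those assumptions — `iter`, `consume` and `revert` are invented names mirroring the iov_iter semantics, not kernel symbols:

```c
#include <stddef.h>

/* Toy model of an iov cursor. */
struct iter { size_t pos, len; };

/* Advance the cursor by up to `want` bytes, as zerocopy_from_iter
 * does while mapping user pages; return how much was consumed. */
static size_t consume(struct iter *it, size_t want)
{
	size_t left = it->len - it->pos;
	size_t n = want < left ? want : left;

	it->pos += n;
	return n;
}

/* Undo the advance, mirroring iov_iter_revert(from, size), so the
 * same bytes can be read again by the fallback path. */
static void revert(struct iter *it, size_t n)
{
	it->pos -= n;
}
```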

Fixes: 3c4d75591 ("tls: kernel TLS support")
Signed-off-by: Boris Pismenny 
---
 net/tls/tls_sw.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2a6ba0f..ea78678 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -267,7 +267,7 @@ static int zerocopy_from_iter(struct sock *sk, struct 
iov_iter *from,
  int length, int *pages_used,
  unsigned int *size_used,
  struct scatterlist *to, int to_max_pages,
- bool charge)
+ bool charge, bool revert)
 {
struct page *pages[MAX_SKB_FRAGS];
 
@@ -318,6 +318,8 @@ static int zerocopy_from_iter(struct sock *sk, struct 
iov_iter *from,
 out:
*size_used = size;
*pages_used = num_elem;
+   if (revert)
+   iov_iter_revert(from, size);
 
return rc;
 }
@@ -419,7 +421,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
&ctx->sg_plaintext_size,
ctx->sg_plaintext_data,
ARRAY_SIZE(ctx->sg_plaintext_data),
-   true);
+   true, false);
if (ret)
goto fallback_to_reg_send;
 
@@ -834,7 +836,7 @@ int tls_sw_recvmsg(struct sock *sk,
		err = zerocopy_from_iter(sk, &msg->msg_iter,
					 to_copy, &pages,
					 &chunk, &sgin[1],
-					 MAX_SKB_FRAGS, false);
+					 MAX_SKB_FRAGS, false, true);
if (err < 0)
goto fallback_to_reg_recv;
 
-- 
1.8.3.1



[PATCH v4 net-next 17/19] net/mlx5: Accel, add common metadata functions

2018-07-12 Thread Boris Pismenny
This patch adds common functions to handle mellanox metadata headers.
These functions are used by IPsec and TLS to process FPGA metadata.
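
A rough userspace model of the two helpers follows. The ethertype value and metadata length are the mlx5 constants as I understand them, but should be treated as assumptions here; the buffer handling mirrors the memmove() + skb_pull() in remove_metadata_hdr().

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define ETH_HLEN 14
#define META_ETHERTYPE 0x8CE4	/* assumed MLX5E_METADATA_ETHER_TYPE */
#define META_LEN 8		/* assumed MLX5E_METADATA_ETHER_LEN */

/* Frame is metadata-carrying if it is long enough and the ethertype
 * slot (right after the two MAC addresses) holds the magic value,
 * big-endian on the wire. */
static int metadata_hdr_valid(const uint8_t *data, size_t len)
{
	uint16_t ethtype;

	if (len < ETH_HLEN + META_LEN)
		return 0;
	ethtype = (uint16_t)((data[2 * ETH_ALEN] << 8) |
			     data[2 * ETH_ALEN + 1]);
	return ethtype == META_ETHERTYPE;
}

/* Slide the two MAC addresses forward over the metadata header;
 * the real ethertype is already in its new place. Returns the new
 * start of the frame, as skb_pull() would. */
static uint8_t *strip_metadata_hdr(uint8_t *data)
{
	memmove(data + META_LEN, data, 2 * ETH_ALEN);
	return data + META_LEN;
}
```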

Signed-off-by: Boris Pismenny 
---
 .../net/ethernet/mellanox/mlx5/core/accel/accel.h  | 37 ++
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c   | 19 +++
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 18 +++
 3 files changed, 45 insertions(+), 29 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/accel/accel.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/accel.h 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/accel.h
new file mode 100644
index 000..c132604
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/accel.h
@@ -0,0 +1,37 @@
+#ifndef __MLX5E_ACCEL_H__
+#define __MLX5E_ACCEL_H__
+
+#ifdef CONFIG_MLX5_ACCEL
+
+#include 
+#include 
+#include "en.h"
+
+static inline bool is_metadata_hdr_valid(struct sk_buff *skb)
+{
+   __be16 *ethtype;
+
+   if (unlikely(skb->len < ETH_HLEN + MLX5E_METADATA_ETHER_LEN))
+   return false;
+   ethtype = (__be16 *)(skb->data + ETH_ALEN * 2);
+   if (*ethtype != cpu_to_be16(MLX5E_METADATA_ETHER_TYPE))
+   return false;
+   return true;
+}
+
+static inline void remove_metadata_hdr(struct sk_buff *skb)
+{
+   struct ethhdr *old_eth;
+   struct ethhdr *new_eth;
+
+   /* Remove the metadata from the buffer */
+   old_eth = (struct ethhdr *)skb->data;
+   new_eth = (struct ethhdr *)(skb->data + MLX5E_METADATA_ETHER_LEN);
+   memmove(new_eth, old_eth, 2 * ETH_ALEN);
+   /* Ethertype is already in its new place */
+   skb_pull_inline(skb, MLX5E_METADATA_ETHER_LEN);
+}
+
+#endif /* CONFIG_MLX5_ACCEL */
+
+#endif /* __MLX5E_EN_ACCEL_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
index c245d8e..fda7929 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
@@ -37,6 +37,7 @@
 
 #include "en_accel/ipsec_rxtx.h"
 #include "en_accel/ipsec.h"
+#include "accel/accel.h"
 #include "en.h"
 
 enum {
@@ -346,19 +347,12 @@ struct sk_buff *mlx5e_ipsec_handle_tx_skb(struct 
net_device *netdev,
 }
 
 struct sk_buff *mlx5e_ipsec_handle_rx_skb(struct net_device *netdev,
- struct sk_buff *skb)
+ struct sk_buff *skb, u32 *cqe_bcnt)
 {
struct mlx5e_ipsec_metadata *mdata;
-   struct ethhdr *old_eth;
-   struct ethhdr *new_eth;
struct xfrm_state *xs;
-   __be16 *ethtype;
 
-   /* Detect inline metadata */
-   if (skb->len < ETH_HLEN + MLX5E_METADATA_ETHER_LEN)
-   return skb;
-   ethtype = (__be16 *)(skb->data + ETH_ALEN * 2);
-   if (*ethtype != cpu_to_be16(MLX5E_METADATA_ETHER_TYPE))
+   if (!is_metadata_hdr_valid(skb))
return skb;
 
/* Use the metadata */
@@ -369,12 +363,7 @@ struct sk_buff *mlx5e_ipsec_handle_rx_skb(struct 
net_device *netdev,
return NULL;
}
 
-   /* Remove the metadata from the buffer */
-   old_eth = (struct ethhdr *)skb->data;
-   new_eth = (struct ethhdr *)(skb->data + MLX5E_METADATA_ETHER_LEN);
-   memmove(new_eth, old_eth, 2 * ETH_ALEN);
-   /* Ethertype is already in its new place */
-   skb_pull_inline(skb, MLX5E_METADATA_ETHER_LEN);
+   remove_metadata_hdr(skb);
 
return skb;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
index ecfc764..92d3745 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -33,6 +33,8 @@
 
 #include "en_accel/tls.h"
 #include "en_accel/tls_rxtx.h"
+#include "accel/accel.h"
+
 #include 
 #include 
 
@@ -350,16 +352,9 @@ void mlx5e_tls_handle_rx_skb(struct net_device *netdev, 
struct sk_buff *skb,
 u32 *cqe_bcnt)
 {
struct mlx5e_tls_metadata *mdata;
-   struct ethhdr *old_eth;
-   struct ethhdr *new_eth;
-   __be16 *ethtype;
struct mlx5e_priv *priv;
 
-   /* Detect inline metadata */
-   if (skb->len < ETH_HLEN + MLX5E_METADATA_ETHER_LEN)
-   return;
-   ethtype = (__be16 *)(skb->data + ETH_ALEN * 2);
-   if (*ethtype != cpu_to_be16(MLX5E_METADATA_ETHER_TYPE))
+   if (!is_metadata_hdr_valid(skb))
return;
 
/* Use the metadata */
@@ -383,11 +378,6 @@ void mlx5e_tls_handle_rx_skb(struct net_device *netdev, 
struct sk_buff *skb,
return;
}
 
-   /* Remove the metadata from the buffer */
-   old_eth = (struct ethhdr *)skb->data;
-   new_eth = (struct ethhdr *)(skb->data + MLX5E_METADATA_ETHER_LEN);
-   

[PATCH v4 net-next 15/19] net/mlx5e: TLS, add software statistics

2018-07-12 Thread Boris Pismenny
This patch adds software statistics for TLS to count important
events.

Signed-off-by: Boris Pismenny 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c  |  3 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h  |  4 
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c | 11 ++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
index 68368c9..541e6f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -169,7 +169,10 @@ static void mlx5e_tls_resync_rx(struct net_device *netdev, 
struct sock *sk,
 
rx_ctx = mlx5e_get_tls_rx_context(tls_ctx);
 
+   netdev_info(netdev, "resyncing seq %d rcd %lld\n", seq,
+   be64_to_cpu(rcd_sn));
mlx5_accel_tls_resync_rx(priv->mdev, rx_ctx->handle, seq, rcd_sn);
+   atomic64_inc(&priv->tls->sw_stats.rx_tls_resync_reply);
 }
 
 static const struct tlsdev_ops mlx5e_tls_ops = {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
index 2d40ede..3f5d721 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
@@ -43,6 +43,10 @@ struct mlx5e_tls_sw_stats {
atomic64_t tx_tls_drop_resync_alloc;
atomic64_t tx_tls_drop_no_sync_data;
atomic64_t tx_tls_drop_bypass_required;
+   atomic64_t rx_tls_drop_resync_request;
+   atomic64_t rx_tls_resync_request;
+   atomic64_t rx_tls_resync_reply;
+   atomic64_t rx_tls_auth_fail;
 };
 
 struct mlx5e_tls {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
index d460fda..ecfc764 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -330,8 +330,12 @@ static int tls_update_resync_sn(struct net_device *netdev,
netdev->ifindex, 0);
 #endif
}
-   if (!sk || sk->sk_state == TCP_TIME_WAIT)
+   if (!sk || sk->sk_state == TCP_TIME_WAIT) {
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+
+   atomic64_inc(&priv->tls->sw_stats.rx_tls_drop_resync_request);
goto out;
+   }
 
skb->sk = sk;
skb->destructor = sock_edemux;
@@ -349,6 +353,7 @@ void mlx5e_tls_handle_rx_skb(struct net_device *netdev, 
struct sk_buff *skb,
struct ethhdr *old_eth;
struct ethhdr *new_eth;
__be16 *ethtype;
+   struct mlx5e_priv *priv;
 
/* Detect inline metadata */
if (skb->len < ETH_HLEN + MLX5E_METADATA_ETHER_LEN)
@@ -365,9 +370,13 @@ void mlx5e_tls_handle_rx_skb(struct net_device *netdev, 
struct sk_buff *skb,
break;
case SYNDROM_RESYNC_REQUEST:
tls_update_resync_sn(netdev, skb, mdata);
+   priv = netdev_priv(netdev);
+   atomic64_inc(&priv->tls->sw_stats.rx_tls_resync_request);
break;
case SYNDROM_AUTH_FAILED:
/* Authentication failure will be observed and verified by kTLS 
*/
+   priv = netdev_priv(netdev);
+   atomic64_inc(&priv->tls->sw_stats.rx_tls_auth_fail);
break;
default:
/* Bypass the metadata header to others */
-- 
1.8.3.1



[PATCH v4 net-next 00/19] TLS offload rx, netdev & mlx5

2018-07-12 Thread Boris Pismenny
Hi,

The following series provides TLS RX inline crypto offload.

v3->v4:
- Remove the iov revert for zero copy send flow 

v2->v3:
- Fix typo
- Adjust cover letter
- Fix bug in zero copy flows
- Use network byte order for the record number in resync
- Adjust the sequence provided in resync

v1->v2:
- Fix bisectability problems due to variable name changes
- Fix potential uninitialized return value

This series completes the generic infrastructure to offload TLS crypto to
network devices. It enables the kernel TLS socket to skip decryption and
authentication operations for SKBs marked as decrypted on the receive
side of the data path, leaving those computationally expensive operations
to the NIC.

This infrastructure doesn't require a TCP offload engine. Instead, the
NIC decrypts a packet's payload if the packet contains the expected TCP
sequence number. The TLS record authentication tag remains unmodified
regardless of decryption. If the packet is decrypted successfully and it
contains an authentication tag, then the authentication check has passed.
Otherwise, if the authentication fails, then the packet is provided
unmodified and the KTLS layer is responsible for handling it.
Out-Of-Order TCP packets are provided unmodified. As a result,
in the slow path some of the SKBs are decrypted while others remain as
ciphertext.

The GRO and TCP layers must not coalesce decrypted and non-decrypted SKBs.
In the worst case a received TLS record consists of both plaintext
and ciphertext packets. These partially decrypted records must be
re-encrypted, only to be decrypted again.

The notable differences between SW KTLS and NIC offloaded TLS
implementations are as follows:
1. Partial decryption - Software must handle the case of a TLS record
that was only partially decrypted by HW. This can happen due to packet
reordering.
2. Resynchronization - tls_read_size calls the device driver to
resynchronize HW whenever it loses track of the TLS record framing in
the TCP stream.

The infrastructure should be extendable to support various NIC offload
implementations.  However it is currently written with the
implementation below in mind:
The NIC identifies packets that should be offloaded according to
the 5-tuple and the TCP sequence number. If these match and the
packet is decrypted and authenticated successfully, then a syndrome
is provided to software. Otherwise, the packet is unmodified.
Decrypted and non-decrypted packets aren't coalesced by the network stack,
and the KTLS layer decrypts and authenticates partially decrypted records.
The NIC provides an indication whenever a resync is required. The resync
operation is triggered by the KTLS layer while parsing TLS record headers.
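
The resync handshake hinges on a single 64-bit word that packs the TCP sequence number and a pending flag, which the driver can publish with one atomic store, as in tls_offload_rx_resync_request() later in the series. A standalone sketch of that encoding (helper names are mine, only the bit layout comes from the patch):

```c
#include <stdint.h>

/* Pack the TCP sequence number into the upper 32 bits and set the
 * "resync request pending" flag in bit 0, matching
 * ((((uint64_t)seq) << 32) | 1) from tls_offload_rx_resync_request(). */
static uint64_t resync_pack(uint32_t seq)
{
	return ((uint64_t)seq << 32) | 1;
}

static int resync_pending(uint64_t req)
{
	return (int)(req & 1);
}

static uint32_t resync_seq(uint64_t req)
{
	return (uint32_t)(req >> 32);
}
```

The consumer clears the word after acting on it, so a zero value means no request is outstanding.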

Finally, we measure the performance obtained by running single stream
iperf with two Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz machines connected
back-to-back with Innova TLS (40Gbps) NICs. We compare TCP (upper bound)
and KTLS-Offload running both in Tx and Rx. The results show that the
performance of offload is comparable to TCP.

                   | Bandwidth (Gbps) | CPU Tx (%) | CPU Rx (%)
TCP                | 28.8             | 5          | 12
KTLS-Offload-Tx-Rx | 28.6             | 7          | 14

Paper: https://netdevconf.org/2.2/papers/pismenny-tlscrypto-talk.pdf

Boris Pismenny (18):
  net: Add decrypted field to skb
  net: Add TLS rx resync NDO
  tcp: Don't coalesce decrypted and encrypted SKBs
  tls: Refactor tls_offload variable names
  tls: Split decrypt_skb to two functions
  tls: Split tls_sw_release_resources_rx
  tls: Fill software context without allocation
  tls: Add rx inline crypto offload
  tls: Fix zerocopy_from_iter iov handling
  net/mlx5e: TLS, refactor variable names
  net/mlx5: Accel, add TLS rx offload routines
  net/mlx5e: TLS, add innova rx support
  net/mlx5e: TLS, add Innova TLS rx data path
  net/mlx5e: TLS, add software statistics
  net/mlx5e: TLS, build TLS netdev from capabilities
  net/mlx5: Accel, add common metadata functions
  net/mlx5e: IPsec, fix byte count in CQE
  net/mlx5e: Kconfig, mutually exclude compilation of TLS and IPsec
accel

Ilya Lesokhin (1):
  net: Add TLS RX offload feature

 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   1 +
 .../net/ethernet/mellanox/mlx5/core/accel/accel.h  |  37 +++
 .../net/ethernet/mellanox/mlx5/core/accel/tls.c|  23 +-
 .../net/ethernet/mellanox/mlx5/core/accel/tls.h|  26 +-
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c   |  20 +-
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.h   |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c |  69 +++--
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h |  33 ++-
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 117 +++-
 .../mellanox/mlx5/core/en_accel/tls_rxtx.h |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 113 ++--
 

[PATCH v4 net-next 09/19] tls: Add rx inline crypto offload

2018-07-12 Thread Boris Pismenny
This patch completes the generic infrastructure to offload TLS crypto to a
network device. It enables the kernel to skip decryption and
authentication of some skbs marked as decrypted by the NIC. In the fast
path, all packets received are decrypted by the NIC and the performance
is comparable to plain TCP.

This infrastructure doesn't require a TCP offload engine. Instead, the
NIC only decrypts packets that contain the expected TCP sequence number.
Out-Of-Order TCP packets are provided unmodified. As a result, in the
worst case a received TLS record consists of both plaintext and ciphertext
packets. These partially decrypted records must be re-encrypted,
only to be decrypted again.

The notable differences between SW KTLS Rx and this offload are as
follows:
1. Partial decryption - Software must handle the case of a TLS record
that was only partially decrypted by HW. This can happen due to packet
reordering.
2. Resynchronization - tls_read_size calls the device driver to
resynchronize HW after HW lost track of TLS record framing in
the TCP stream.

Signed-off-by: Boris Pismenny 
---
 include/net/tls.h |  63 +-
 net/tls/tls_device.c  | 278 ++
 net/tls/tls_device_fallback.c |   1 +
 net/tls/tls_main.c|  32 +++--
 net/tls/tls_sw.c  |  24 +++-
 5 files changed, 355 insertions(+), 43 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 7a485de..d8b3b65 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -83,6 +83,16 @@ struct tls_device {
void (*unhash)(struct tls_device *device, struct sock *sk);
 };
 
+enum {
+   TLS_BASE,
+   TLS_SW,
+#ifdef CONFIG_TLS_DEVICE
+   TLS_HW,
+#endif
+   TLS_HW_RECORD,
+   TLS_NUM_CONFIG,
+};
+
 struct tls_sw_context_tx {
struct crypto_aead *aead_send;
struct crypto_wait async_wait;
@@ -197,6 +207,7 @@ struct tls_context {
int (*push_pending_record)(struct sock *sk, int flags);
 
void (*sk_write_space)(struct sock *sk);
+   void (*sk_destruct)(struct sock *sk);
void (*sk_proto_close)(struct sock *sk, long timeout);
 
int  (*setsockopt)(struct sock *sk, int level,
@@ -209,13 +220,27 @@ struct tls_context {
void (*unhash)(struct sock *sk);
 };
 
+struct tls_offload_context_rx {
+   /* sw must be the first member of tls_offload_context_rx */
+   struct tls_sw_context_rx sw;
+   atomic64_t resync_req;
+   u8 driver_state[];
+   /* The TLS layer reserves room for driver specific state
+* Currently the belief is that there is not enough
+* driver specific state to justify another layer of indirection
+*/
+};
+
+#define TLS_OFFLOAD_CONTEXT_SIZE_RX\
+   (ALIGN(sizeof(struct tls_offload_context_rx), sizeof(void *)) + \
+TLS_DRIVER_STATE_SIZE)
+
 int wait_on_pending_writer(struct sock *sk, long *timeo);
 int tls_sk_query(struct sock *sk, int optname, char __user *optval,
int __user *optlen);
 int tls_sk_attach(struct sock *sk, int optname, char __user *optval,
  unsigned int optlen);
 
-
 int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx);
 int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tls_sw_sendpage(struct sock *sk, struct page *page,
@@ -290,11 +315,19 @@ static inline bool tls_is_pending_open_record(struct 
tls_context *tls_ctx)
return tls_ctx->pending_open_record_frags;
 }
 
+struct sk_buff *
+tls_validate_xmit_skb(struct sock *sk, struct net_device *dev,
+ struct sk_buff *skb);
+
 static inline bool tls_is_sk_tx_device_offloaded(struct sock *sk)
 {
-	return sk_fullsock(sk) &&
-	       /* matches smp_store_release in tls_set_device_offload */
-	       smp_load_acquire(&sk->sk_destruct) == &tls_device_sk_destruct;
+#ifdef CONFIG_SOCK_VALIDATE_XMIT
+	return sk_fullsock(sk) &&
+	       (smp_load_acquire(&sk->sk_validate_xmit_skb) ==
+	       &tls_validate_xmit_skb);
+#else
+   return false;
+#endif
 }
 
 static inline void tls_err_abort(struct sock *sk, int err)
@@ -387,10 +420,27 @@ static inline struct tls_sw_context_tx *tls_sw_ctx_tx(
return (struct tls_offload_context_tx *)tls_ctx->priv_ctx_tx;
 }
 
+static inline struct tls_offload_context_rx *
+tls_offload_ctx_rx(const struct tls_context *tls_ctx)
+{
+   return (struct tls_offload_context_rx *)tls_ctx->priv_ctx_rx;
+}
+
+/* The TLS context is valid until sk_destruct is called */
+static inline void tls_offload_rx_resync_request(struct sock *sk, __be32 seq)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_offload_context_rx *rx_ctx = tls_offload_ctx_rx(tls_ctx);
+
+   atomic64_set(&rx_ctx->resync_req, ((((uint64_t)seq) << 32) | 1));
+}
+
+
 int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
  unsigned char *record_type);
 void 

[PATCH v4 net-next 18/19] net/mlx5e: IPsec, fix byte count in CQE

2018-07-12 Thread Boris Pismenny
This patch fixes the byte count indication in CQE for processed IPsec
packets that contain a metadata header.
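
The rationale fits in two lines: once the 8-byte inline metadata header is stripped from the buffer, the completion's byte count must shrink by the same amount, or the stack would see a length covering bytes that are no longer in the frame. A hedged sketch, where META_LEN stands in for MLX5E_METADATA_ETHER_LEN:

```c
#include <stdint.h>

#define META_LEN 8	/* assumed MLX5E_METADATA_ETHER_LEN */

/* Mirror of the one-line fix: shrink the CQE byte count after the
 * metadata header has been removed from the skb. */
static void adjust_cqe_bcnt(uint32_t *cqe_bcnt)
{
	*cqe_bcnt -= META_LEN;
}
```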

Signed-off-by: Boris Pismenny 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
index fda7929..128a82b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
@@ -364,6 +364,7 @@ struct sk_buff *mlx5e_ipsec_handle_rx_skb(struct net_device 
*netdev,
}
 
remove_metadata_hdr(skb);
+   *cqe_bcnt -= MLX5E_METADATA_ETHER_LEN;
 
return skb;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
index 2bfbbef..ca47c05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
@@ -41,7 +41,7 @@
 #include "en.h"
 
 struct sk_buff *mlx5e_ipsec_handle_rx_skb(struct net_device *netdev,
- struct sk_buff *skb);
+ struct sk_buff *skb, u32 *cqe_bcnt);
 void mlx5e_ipsec_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 
 void mlx5e_ipsec_inverse_table_init(void);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 847e195..4a85b26 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1470,7 +1470,7 @@ void mlx5e_ipsec_handle_rx_cqe(struct mlx5e_rq *rq, 
struct mlx5_cqe64 *cqe)
mlx5e_free_rx_wqe(rq, wi);
goto wq_ll_pop;
}
-   skb = mlx5e_ipsec_handle_rx_skb(rq->netdev, skb);
+   skb = mlx5e_ipsec_handle_rx_skb(rq->netdev, skb, &cqe_bcnt);
if (unlikely(!skb)) {
mlx5e_free_rx_wqe(rq, wi);
goto wq_ll_pop;
-- 
1.8.3.1



[PATCH v4 net-next 05/19] tls: Refactor tls_offload variable names

2018-07-12 Thread Boris Pismenny
For symmetry, we rename tls_offload_context to
tls_offload_context_tx before we add tls_offload_context_rx.

Signed-off-by: Boris Pismenny 
---
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h |  6 +++---
 include/net/tls.h  | 16 +++---
 net/tls/tls_device.c   | 25 +++---
 net/tls/tls_device_fallback.c  |  8 +++
 4 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
index b616217..b82f4de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
@@ -50,7 +50,7 @@ struct mlx5e_tls {
 };
 
 struct mlx5e_tls_offload_context {
-   struct tls_offload_context base;
+   struct tls_offload_context_tx base;
u32 expected_seq;
__be32 swid;
 };
@@ -59,8 +59,8 @@ struct mlx5e_tls_offload_context {
 mlx5e_get_tls_tx_context(struct tls_context *tls_ctx)
 {
BUILD_BUG_ON(sizeof(struct mlx5e_tls_offload_context) >
-TLS_OFFLOAD_CONTEXT_SIZE);
-   return container_of(tls_offload_ctx(tls_ctx),
+TLS_OFFLOAD_CONTEXT_SIZE_TX);
+   return container_of(tls_offload_ctx_tx(tls_ctx),
struct mlx5e_tls_offload_context,
base);
 }
diff --git a/include/net/tls.h b/include/net/tls.h
index 70c2737..5dcd808 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -128,7 +128,7 @@ struct tls_record_info {
skb_frag_t frags[MAX_SKB_FRAGS];
 };
 
-struct tls_offload_context {
+struct tls_offload_context_tx {
struct crypto_aead *aead_send;
spinlock_t lock;/* protects records list */
struct list_head records_list;
@@ -147,8 +147,8 @@ struct tls_offload_context {
 #define TLS_DRIVER_STATE_SIZE (max_t(size_t, 8, sizeof(void *)))
 };
 
-#define TLS_OFFLOAD_CONTEXT_SIZE   
\
-   (ALIGN(sizeof(struct tls_offload_context), sizeof(void *)) +   \
+#define TLS_OFFLOAD_CONTEXT_SIZE_TX
\
+   (ALIGN(sizeof(struct tls_offload_context_tx), sizeof(void *)) +\
 TLS_DRIVER_STATE_SIZE)
 
 enum {
@@ -239,7 +239,7 @@ int tls_device_sendpage(struct sock *sk, struct page *page,
 void tls_device_init(void);
 void tls_device_cleanup(void);
 
-struct tls_record_info *tls_get_record(struct tls_offload_context *context,
+struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
   u32 seq, u64 *p_record_sn);
 
 static inline bool tls_record_is_start_marker(struct tls_record_info *rec)
@@ -380,10 +380,10 @@ static inline struct tls_sw_context_tx *tls_sw_ctx_tx(
return (struct tls_sw_context_tx *)tls_ctx->priv_ctx_tx;
 }
 
-static inline struct tls_offload_context *tls_offload_ctx(
-   const struct tls_context *tls_ctx)
+static inline struct tls_offload_context_tx *
+tls_offload_ctx_tx(const struct tls_context *tls_ctx)
 {
-   return (struct tls_offload_context *)tls_ctx->priv_ctx_tx;
+   return (struct tls_offload_context_tx *)tls_ctx->priv_ctx_tx;
 }
 
 int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
@@ -396,7 +396,7 @@ struct sk_buff *tls_validate_xmit_skb(struct sock *sk,
  struct sk_buff *skb);
 
 int tls_sw_fallback_init(struct sock *sk,
-struct tls_offload_context *offload_ctx,
+struct tls_offload_context_tx *offload_ctx,
 struct tls_crypto_info *crypto_info);
 
 #endif /* _TLS_OFFLOAD_H */
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a7a8f8e..332a5d1 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -52,9 +52,8 @@
 
 static void tls_device_free_ctx(struct tls_context *ctx)
 {
-   struct tls_offload_context *offload_ctx = tls_offload_ctx(ctx);
+   kfree(tls_offload_ctx_tx(ctx));
 
-   kfree(offload_ctx);
kfree(ctx);
 }
 
@@ -125,7 +124,7 @@ static void destroy_record(struct tls_record_info *record)
kfree(record);
 }
 
-static void delete_all_records(struct tls_offload_context *offload_ctx)
+static void delete_all_records(struct tls_offload_context_tx *offload_ctx)
 {
struct tls_record_info *info, *temp;
 
@@ -141,14 +140,14 @@ static void tls_icsk_clean_acked(struct sock *sk, u32 
acked_seq)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_record_info *info, *temp;
-   struct tls_offload_context *ctx;
+   struct tls_offload_context_tx *ctx;
u64 deleted_records = 0;
unsigned long flags;
 
if (!tls_ctx)
return;
 
-   ctx = tls_offload_ctx(tls_ctx);
+   ctx = tls_offload_ctx_tx(tls_ctx);
 
spin_lock_irqsave(&ctx->lock, flags);
  

[PATCH v4 net-next 13/19] net/mlx5e: TLS, add innova rx support

2018-07-12 Thread Boris Pismenny
Add the mlx5 implementation of the TLS Rx routines to add/del TLS
contexts, also add the tls_dev_resync_rx routine
to work with the TLS inline Rx crypto offload infrastructure.

Signed-off-by: Boris Pismenny 
Signed-off-by: Ilya Lesokhin 
---
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 46 +++---
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h | 15 +++
 2 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
index 7fb9c75..68368c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -110,9 +110,7 @@ static int mlx5e_tls_add(struct net_device *netdev, struct 
sock *sk,
u32 caps = mlx5_accel_tls_device_caps(mdev);
int ret = -ENOMEM;
void *flow;
-
-   if (direction != TLS_OFFLOAD_CTX_DIR_TX)
-   return -EINVAL;
+   u32 swid;
 
flow = kzalloc(MLX5_ST_SZ_BYTES(tls_flow), GFP_KERNEL);
if (!flow)
@@ -122,18 +120,23 @@ static int mlx5e_tls_add(struct net_device *netdev, 
struct sock *sk,
if (ret)
goto free_flow;
 
+   ret = mlx5_accel_tls_add_flow(mdev, flow, crypto_info,
+ start_offload_tcp_sn, &swid,
+ direction == TLS_OFFLOAD_CTX_DIR_TX);
+   if (ret < 0)
+   goto free_flow;
+
if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
struct mlx5e_tls_offload_context_tx *tx_ctx =
mlx5e_get_tls_tx_context(tls_ctx);
-   u32 swid;
-
-   ret = mlx5_accel_tls_add_tx_flow(mdev, flow, crypto_info,
-start_offload_tcp_sn, &swid);
-   if (ret < 0)
-   goto free_flow;
 
tx_ctx->swid = htonl(swid);
tx_ctx->expected_seq = start_offload_tcp_sn;
+   } else {
+   struct mlx5e_tls_offload_context_rx *rx_ctx =
+   mlx5e_get_tls_rx_context(tls_ctx);
+
+   rx_ctx->handle = htonl(swid);
}
 
return 0;
@@ -147,19 +150,32 @@ static void mlx5e_tls_del(struct net_device *netdev,
  enum tls_offload_ctx_dir direction)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
+   unsigned int handle;
 
-   if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
-   u32 swid = ntohl(mlx5e_get_tls_tx_context(tls_ctx)->swid);
+   handle = ntohl((direction == TLS_OFFLOAD_CTX_DIR_TX) ?
+  mlx5e_get_tls_tx_context(tls_ctx)->swid :
+  mlx5e_get_tls_rx_context(tls_ctx)->handle);
 
-   mlx5_accel_tls_del_tx_flow(priv->mdev, swid);
-   } else {
-   netdev_err(netdev, "unsupported direction %d\n", direction);
-   }
+   mlx5_accel_tls_del_flow(priv->mdev, handle,
+   direction == TLS_OFFLOAD_CTX_DIR_TX);
+}
+
+static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
+   u32 seq, u64 rcd_sn)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct mlx5e_tls_offload_context_rx *rx_ctx;
+
+   rx_ctx = mlx5e_get_tls_rx_context(tls_ctx);
+
+   mlx5_accel_tls_resync_rx(priv->mdev, rx_ctx->handle, seq, rcd_sn);
 }
 
 static const struct tlsdev_ops mlx5e_tls_ops = {
.tls_dev_add = mlx5e_tls_add,
.tls_dev_del = mlx5e_tls_del,
+   .tls_dev_resync_rx = mlx5e_tls_resync_rx,
 };
 
 void mlx5e_tls_build_netdev(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
index e26222a..2d40ede 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
@@ -65,6 +65,21 @@ struct mlx5e_tls_offload_context_tx {
base);
 }
 
+struct mlx5e_tls_offload_context_rx {
+   struct tls_offload_context_rx base;
+   __be32 handle;
+};
+
+static inline struct mlx5e_tls_offload_context_rx *
+mlx5e_get_tls_rx_context(struct tls_context *tls_ctx)
+{
+   BUILD_BUG_ON(sizeof(struct mlx5e_tls_offload_context_rx) >
+TLS_OFFLOAD_CONTEXT_SIZE_RX);
+   return container_of(tls_offload_ctx_rx(tls_ctx),
+   struct mlx5e_tls_offload_context_rx,
+   base);
+}
+
 void mlx5e_tls_build_netdev(struct mlx5e_priv *priv);
 int mlx5e_tls_init(struct mlx5e_priv *priv);
 void mlx5e_tls_cleanup(struct mlx5e_priv *priv);
-- 
1.8.3.1



[PATCH v4 net-next 07/19] tls: Split tls_sw_release_resources_rx

2018-07-12 Thread Boris Pismenny
This patch splits tls_sw_release_resources_rx into two functions one
which releases all inner software tls structures and another that also
frees the containing structure.

In TLS_DEVICE we will need to release the software structures without
freeing the containing structure, which contains other information.

Signed-off-by: Boris Pismenny 
---
 include/net/tls.h |  1 +
 net/tls/tls_sw.c  | 10 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 49b8922..7a485de 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -223,6 +223,7 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
 void tls_sw_close(struct sock *sk, long timeout);
 void tls_sw_free_resources_tx(struct sock *sk);
 void tls_sw_free_resources_rx(struct sock *sk);
+void tls_sw_release_resources_rx(struct sock *sk);
 int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
   int nonblock, int flags, int *addr_len);
 unsigned int tls_sw_poll(struct file *file, struct socket *sock,
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 99d0347..86e22bc 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1039,7 +1039,7 @@ void tls_sw_free_resources_tx(struct sock *sk)
kfree(ctx);
 }
 
-void tls_sw_free_resources_rx(struct sock *sk)
+void tls_sw_release_resources_rx(struct sock *sk)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -1058,6 +1058,14 @@ void tls_sw_free_resources_rx(struct sock *sk)
strp_done(>strp);
lock_sock(sk);
}
+}
+
+void tls_sw_free_resources_rx(struct sock *sk)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+
+   tls_sw_release_resources_rx(sk);
 
kfree(ctx);
 }
-- 
1.8.3.1



[PATCH v4 net-next 02/19] net: Add TLS RX offload feature

2018-07-12 Thread Boris Pismenny
From: Ilya Lesokhin 

This patch adds a netdev feature to configure TLS RX inline crypto offload.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
---
 include/linux/netdev_features.h | 2 ++
 net/core/ethtool.c  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 623bb8c..2b2a6dc 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -79,6 +79,7 @@ enum {
NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */
NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
NETIF_F_HW_TLS_TX_BIT,  /* Hardware TLS TX offload */
+   NETIF_F_HW_TLS_RX_BIT,  /* Hardware TLS RX offload */
 
NETIF_F_GRO_HW_BIT, /* Hardware Generic receive offload */
NETIF_F_HW_TLS_RECORD_BIT,  /* Offload TLS record */
@@ -151,6 +152,7 @@ enum {
 #define NETIF_F_HW_TLS_RECORD  __NETIF_F(HW_TLS_RECORD)
 #define NETIF_F_GSO_UDP_L4 __NETIF_F(GSO_UDP_L4)
 #define NETIF_F_HW_TLS_TX  __NETIF_F(HW_TLS_TX)
+#define NETIF_F_HW_TLS_RX  __NETIF_F(HW_TLS_RX)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index e677a20..c9993c6 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -111,6 +111,7 @@ int ethtool_op_get_ts_info(struct net_device *dev, struct 
ethtool_ts_info *info)
[NETIF_F_RX_UDP_TUNNEL_PORT_BIT] =   "rx-udp_tunnel-port-offload",
[NETIF_F_HW_TLS_RECORD_BIT] =   "tls-hw-record",
[NETIF_F_HW_TLS_TX_BIT] ="tls-hw-tx-offload",
+   [NETIF_F_HW_TLS_RX_BIT] ="tls-hw-rx-offload",
 };
 
 static const char
-- 
1.8.3.1
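The feature bit added above follows the standard netdev pattern: an enum of bit positions plus a macro (`__NETIF_F()`) that turns a position into a mask. A simplified userspace sketch of the same idiom (names here are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified version of the netdev feature-bit pattern: an enum gives
 * each feature a bit position, and a macro derives the mask. */
enum {
	F_HW_TLS_TX_BIT,
	F_HW_TLS_RX_BIT,	/* newly added bit, analogous to the patch */
	F_FEATURE_COUNT,
};

#define F_MASK(bit)	((uint64_t)1 << (bit))
#define F_HW_TLS_TX	F_MASK(F_HW_TLS_TX_BIT)
#define F_HW_TLS_RX	F_MASK(F_HW_TLS_RX_BIT)
```

Because masks are derived rather than hand-written, adding a bit in the middle of the enum renumbers everything consistently, which is why such patches only touch the enum and the ethtool string table.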



[PATCH v4 net-next 04/19] tcp: Don't coalesce decrypted and encrypted SKBs

2018-07-12 Thread Boris Pismenny
Prevent coalescing of decrypted and encrypted SKBs in the GRO
and TCP layers.

Signed-off-by: Boris Pismenny 
Signed-off-by: Ilya Lesokhin 
---
 net/ipv4/tcp_input.c   | 12 
 net/ipv4/tcp_offload.c |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 814ea43..f89d86a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4343,6 +4343,11 @@ static bool tcp_try_coalesce(struct sock *sk,
if (TCP_SKB_CB(from)->seq != TCP_SKB_CB(to)->end_seq)
return false;
 
+#ifdef CONFIG_TLS_DEVICE
+   if (from->decrypted != to->decrypted)
+   return false;
+#endif
+
	if (!skb_try_coalesce(to, from, fragstolen, &delta))
return false;
 
@@ -4872,6 +4877,9 @@ void tcp_rbtree_insert(struct rb_root *root, struct 
sk_buff *skb)
break;
 
memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
+#ifdef CONFIG_TLS_DEVICE
+   nskb->decrypted = skb->decrypted;
+#endif
TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start;
if (list)
__skb_queue_before(list, skb, nskb);
@@ -4899,6 +4907,10 @@ void tcp_rbtree_insert(struct rb_root *root, struct 
sk_buff *skb)
skb == tail ||
(TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | 
TCPHDR_FIN)))
goto end;
+#ifdef CONFIG_TLS_DEVICE
+   if (skb->decrypted != nskb->decrypted)
+   goto end;
+#endif
}
}
}
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index f5aee64..870b0a3 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -262,6 +262,9 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, 
struct sk_buff *skb)
 
flush |= (len - 1) >= mss;
flush |= (ntohl(th2->seq) + skb_gro_len(p)) ^ ntohl(th->seq);
+#ifdef CONFIG_TLS_DEVICE
+   flush |= p->decrypted ^ skb->decrypted;
+#endif
 
if (flush || skb_gro_receive(p, skb)) {
mss = 1;
-- 
1.8.3.1
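The rule the patch enforces is simple: two buffers may only be merged when both are decrypted or both are still encrypted; GRO expresses the same check as a "flush" bit via XOR. A small userspace model of both forms (a sketch, not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* TCP-layer form: refuse to coalesce when the decrypted state differs. */
static bool may_coalesce(bool to_decrypted, bool from_decrypted)
{
	return to_decrypted == from_decrypted;
}

/* GRO form: XOR yields 1 ("flush, don't merge") exactly when the two
 * skbs disagree on decryption state. */
static int gro_flush(bool p_decrypted, bool skb_decrypted)
{
	return p_decrypted ^ skb_decrypted;
}
```

Both predicates are equivalent (`gro_flush` is just the negation of `may_coalesce`); GRO uses the XOR form because it accumulates many conditions into one `flush` integer.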



[PATCH v4 net-next 11/19] net/mlx5e: TLS, refactor variable names

2018-07-12 Thread Boris Pismenny
For symmetry, we rename mlx5e_tls_offload_context to
mlx5e_tls_offload_context_tx before we add mlx5e_tls_offload_context_rx.

Signed-off-by: Boris Pismenny 
Reviewed-by: Aviad Yehezkel 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c  | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h  | 8 
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
index d167845..7fb9c75 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -123,7 +123,7 @@ static int mlx5e_tls_add(struct net_device *netdev, struct 
sock *sk,
goto free_flow;
 
if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
-   struct mlx5e_tls_offload_context *tx_ctx =
+   struct mlx5e_tls_offload_context_tx *tx_ctx =
mlx5e_get_tls_tx_context(tls_ctx);
u32 swid;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
index b82f4de..e26222a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
@@ -49,19 +49,19 @@ struct mlx5e_tls {
struct mlx5e_tls_sw_stats sw_stats;
 };
 
-struct mlx5e_tls_offload_context {
+struct mlx5e_tls_offload_context_tx {
struct tls_offload_context_tx base;
u32 expected_seq;
__be32 swid;
 };
 
-static inline struct mlx5e_tls_offload_context *
+static inline struct mlx5e_tls_offload_context_tx *
 mlx5e_get_tls_tx_context(struct tls_context *tls_ctx)
 {
-   BUILD_BUG_ON(sizeof(struct mlx5e_tls_offload_context) >
+   BUILD_BUG_ON(sizeof(struct mlx5e_tls_offload_context_tx) >
 TLS_OFFLOAD_CONTEXT_SIZE_TX);
return container_of(tls_offload_ctx_tx(tls_ctx),
-   struct mlx5e_tls_offload_context,
+   struct mlx5e_tls_offload_context_tx,
base);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
index 15aef71..c96196f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -73,7 +73,7 @@ static int mlx5e_tls_add_metadata(struct sk_buff *skb, __be32 
swid)
return 0;
 }
 
-static int mlx5e_tls_get_sync_data(struct mlx5e_tls_offload_context *context,
+static int mlx5e_tls_get_sync_data(struct mlx5e_tls_offload_context_tx 
*context,
   u32 tcp_seq, struct sync_info *info)
 {
int remaining, i = 0, ret = -EINVAL;
@@ -161,7 +161,7 @@ static void mlx5e_tls_complete_sync_skb(struct sk_buff *skb,
 }
 
 static struct sk_buff *
-mlx5e_tls_handle_ooo(struct mlx5e_tls_offload_context *context,
+mlx5e_tls_handle_ooo(struct mlx5e_tls_offload_context_tx *context,
 struct mlx5e_txqsq *sq, struct sk_buff *skb,
 struct mlx5e_tx_wqe **wqe,
 u16 *pi,
@@ -239,7 +239,7 @@ struct sk_buff *mlx5e_tls_handle_tx_skb(struct net_device 
*netdev,
u16 *pi)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   struct mlx5e_tls_offload_context *context;
+   struct mlx5e_tls_offload_context_tx *context;
struct tls_context *tls_ctx;
u32 expected_seq;
int datalen;
-- 
1.8.3.1
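The renamed `mlx5e_tls_offload_context_tx` uses the classic embedding pattern: a driver struct wraps a generic base, is recovered from the base pointer with `container_of()`, and a compile-time assert guarantees it fits the space the core reserved. A self-contained userspace sketch of the same pattern (the struct names and 64-byte budget are illustrative assumptions):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace container_of: step back from a member to its enclosing
 * struct, same as the kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct base_ctx {
	int refcount;
};

struct drv_ctx_tx {
	struct base_ctx base;		/* generic part, handed to the core */
	unsigned int expected_seq;	/* driver-private part */
};

/* Compile-time size check, analogous to the BUILD_BUG_ON against
 * TLS_OFFLOAD_CONTEXT_SIZE_TX in the patch (64 is an assumed budget). */
_Static_assert(sizeof(struct drv_ctx_tx) <= 64, "ctx too large");

static struct drv_ctx_tx *get_drv_ctx(struct base_ctx *base)
{
	return container_of(base, struct drv_ctx_tx, base);
}
```

Because `base` is the first member, the cast is also valid by layout alone, but `container_of()` keeps working even if the member moves.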



[PATCH v4 net-next 01/19] net: Add decrypted field to skb

2018-07-12 Thread Boris Pismenny
The decrypted bit is propagated to cloned/copied skbs.
This will be used later by the inline crypto receive-side offload
of TLS.

Signed-off-by: Boris Pismenny 
Signed-off-by: Ilya Lesokhin 
---
 include/linux/skbuff.h | 7 ++-
 net/core/skbuff.c  | 6 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7601838..3ceb8dc 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -630,6 +630,7 @@ enum {
  * @hash: the packet hash
  * @queue_mapping: Queue mapping for multiqueue devices
  * @xmit_more: More SKBs are pending for this queue
+ * @decrypted: Decrypted SKB
  * @ndisc_nodetype: router type (from link layer)
  * @ooo_okay: allow the mapping of a socket to a queue to be changed
  * @l4_hash: indicate hash is a canonical 4-tuple hash over transport
@@ -736,7 +737,11 @@ struct sk_buff {
peeked:1,
head_frag:1,
xmit_more:1,
-   __unused:1; /* one bit hole */
+#ifdef CONFIG_TLS_DEVICE
+   decrypted:1;
+#else
+   __unused:1;
+#endif
 
/* fields enclosed in headers_start/headers_end are copied
 * using a single memcpy() in __copy_skb_header()
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c4e24ac..cfd6c6f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -805,6 +805,9 @@ static void __copy_skb_header(struct sk_buff *new, const 
struct sk_buff *old)
 * It is not yet because we do not want to have a 16 bit hole
 */
new->queue_mapping = old->queue_mapping;
+#ifdef CONFIG_TLS_DEVICE
+   new->decrypted = old->decrypted;
+#endif
 
	memcpy(&new->headers_start, &old->headers_start,
   offsetof(struct sk_buff, headers_end) -
@@ -865,6 +868,9 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, 
struct sk_buff *skb)
C(head_frag);
C(data);
C(truesize);
+#ifdef CONFIG_TLS_DEVICE
+   C(decrypted);
+#endif
refcount_set(>users, 1);
 
atomic_inc(&(skb_shinfo(skb)->dataref));
-- 
1.8.3.1
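Note that `__copy_skb_header()` only needs an explicit line for `queue_mapping` because that field sits outside the `headers_start`/`headers_end` markers; any flag placed between the markers, like the new `decrypted` bit, rides along in the single `memcpy()`. A userspace sketch of that marker trick (zero-length array markers are a GNU C extension, the same one `struct sk_buff` uses; the struct here is invented for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pkt {
	int queue;		/* outside the region: copied by hand */
	char headers_start[0];	/* zero-size marker, GNU extension */
	int decrypted;
	int hash;
	char headers_end[0];	/* zero-size marker */
};

/* Everything between the two markers is copied in one memcpy, so new
 * fields added there are propagated with no extra code. */
static void copy_header(struct pkt *new, const struct pkt *old)
{
	new->queue = old->queue;
	memcpy(new->headers_start, old->headers_start,
	       offsetof(struct pkt, headers_end) -
	       offsetof(struct pkt, headers_start));
}
```

This is why the patch's only obligation was to place `decrypted` inside the marker region and mirror it in `__skb_clone()`, which copies fields individually.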



Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Florian Fainelli



On 07/12/2018 12:10 PM, Heiner Kallweit wrote:
> On 12.07.2018 21:09, Andrew Lunn wrote:
>>> Like r8169 also tg3 driver doesn't wait for the speed-down-renegotiation
>>> to finish. Therefore, even though I share Andrew's concerns, there seem
>>> to be chips where it's safe to not wait for the renegotiation to finish
>>> (e.g. because device is in PCI D3 already and can't generate an interrupt).
>>> Having said that I'd keep the sync parameter for phy_speed_down so that
>>> the driver can decide.
>>
>> Hi Heiner
>>
>> Please put a big fat comment about the dangers of sync=false in the
>> function header. We want people to known it is dangerous by default,
>> and should only be used in special conditions, when it is known to be
>> safe.
>>  Andrew
>>
> OK ..

What part do you find dangerous? Magic Packets are UDP packets and are
not routed (unless specifically configured), so there is already some
"lossy" behavior involved in waking up an Ethernet MAC; I don't think
it is too bad to retry several times until the link comes up.
-- 
Florian
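For reference, the Magic Packet being discussed is a fixed-format payload: 6 bytes of 0xFF followed by the target MAC address repeated 16 times (102 bytes), usually carried in a broadcast UDP datagram — which is why delivery is best-effort, as Florian notes. A minimal userspace builder (a sketch; sending it on the wire would additionally need a UDP socket):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAGIC_LEN (6 + 16 * 6)	/* sync stream + 16 copies of the MAC */

static void build_magic(uint8_t out[MAGIC_LEN], const uint8_t mac[6])
{
	int i;

	memset(out, 0xFF, 6);		/* synchronization stream */
	for (i = 0; i < 16; i++)	/* MAC repeated 16 times */
		memcpy(out + 6 + i * 6, mac, 6);
}
```

The NIC's wake logic scans incoming frames for this pattern regardless of the encapsulating protocol, so the payload can also be sent as a raw Ethernet frame.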


Re: [PATCH v3 net-next 00/19] TLS offload rx, netdev & mlx5

2018-07-12 Thread Boris Pismenny

Hi Dave,

On 7/12/2018 12:54 PM, Dave Watson wrote:

On 07/11/18 10:54 PM, Boris Pismenny wrote:

Hi,

The following series provides TLS RX inline crypto offload.


All the tls patches look good to me except #10

"tls: Fix zerocopy_from_iter iov handling"

which seems to break the non-device zerocopy flow.


Thanks for reviewing!

Sorry, it seems to break the zerocopy send flow, and I've tested only 
with the receive flow offload disabled.


I'll fix it in v4. I think that adding a flag to indicate whether a 
revert is needed should do the trick. In the receive flow the revert is 
needed to handle potential errors, while in the transmit flow it needs 
to be removed.




The integration is very clean, thanks!  



v2->v3:
 - Fix typo
 - Adjust cover letter
 - Fix bug in zero copy flows
 - Use network byte order for the record number in resync
 - Adjust the sequence provided in resync

v1->v2:
 - Fix bisectability problems due to variable name changes
 - Fix potential uninitialized return value



Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Heiner Kallweit
On 12.07.2018 21:09, Andrew Lunn wrote:
>> Like r8169 also tg3 driver doesn't wait for the speed-down-renegotiation
>> to finish. Therefore, even though I share Andrew's concerns, there seem
>> to be chips where it's safe to not wait for the renegotiation to finish
>> (e.g. because device is in PCI D3 already and can't generate an interrupt).
>> Having said that I'd keep the sync parameter for phy_speed_down so that
>> the driver can decide.
> 
> Hi Heiner
> 
> Please put a big fat comment about the dangers of sync=false in the
> function header. We want people to know it is dangerous by default,
> and should only be used in special conditions, when it is known to be
> safe.
>   Andrew
> 
OK ..

Heiner



Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Andrew Lunn
> Like r8169 also tg3 driver doesn't wait for the speed-down-renegotiation
> to finish. Therefore, even though I share Andrew's concerns, there seem
> to be chips where it's safe to not wait for the renegotiation to finish
> (e.g. because device is in PCI D3 already and can't generate an interrupt).
> Having said that I'd keep the sync parameter for phy_speed_down so that
> the driver can decide.

Hi Heiner

Please put a big fat comment about the dangers of sync=false in the
function header. We want people to know it is dangerous by default,
and should only be used in special conditions, when it is known to be
safe.
Andrew


[PATCH v2 net-next 7/9] lan743x: Add EEE support

2018-07-12 Thread Bryan Whitehead
Implement EEE support

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/lan743x_ethtool.c | 89 
 drivers/net/ethernet/microchip/lan743x_main.h|  3 +
 2 files changed, 92 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index f9d875d..3d95290 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -415,6 +415,93 @@ static int lan743x_ethtool_get_sset_count(struct 
net_device *netdev, int sset)
}
 }
 
+static int lan743x_ethtool_get_eee(struct net_device *netdev,
+  struct ethtool_eee *eee)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+   struct phy_device *phydev = netdev->phydev;
+   u32 buf;
+   int ret;
+
+   if (!phydev)
+   return -EIO;
+   if (!phydev->drv) {
+   netif_err(adapter, drv, adapter->netdev,
+ "Missing PHY Driver\n");
+   return -EIO;
+   }
+
+   ret = phy_ethtool_get_eee(phydev, eee);
+   if (ret < 0)
+   return ret;
+
+   buf = lan743x_csr_read(adapter, MAC_CR);
+   if (buf & MAC_CR_EEE_EN_) {
+   eee->eee_enabled = true;
+   eee->eee_active = !!(eee->advertised & eee->lp_advertised);
+   eee->tx_lpi_enabled = true;
+   /* EEE_TX_LPI_REQ_DLY & tx_lpi_timer are same uSec unit */
+   buf = lan743x_csr_read(adapter, MAC_EEE_TX_LPI_REQ_DLY_CNT);
+   eee->tx_lpi_timer = buf;
+   } else {
+   eee->eee_enabled = false;
+   eee->eee_active = false;
+   eee->tx_lpi_enabled = false;
+   eee->tx_lpi_timer = 0;
+   }
+
+   return 0;
+}
+
+static int lan743x_ethtool_set_eee(struct net_device *netdev,
+  struct ethtool_eee *eee)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+   struct phy_device *phydev = NULL;
+   u32 buf = 0;
+   int ret = 0;
+
+   if (!netdev)
+   return -EINVAL;
+   adapter = netdev_priv(netdev);
+   if (!adapter)
+   return -EINVAL;
+   phydev = netdev->phydev;
+   if (!phydev)
+   return -EIO;
+   if (!phydev->drv) {
+   netif_err(adapter, drv, adapter->netdev,
+ "Missing PHY Driver\n");
+   return -EIO;
+   }
+
+   if (eee->eee_enabled) {
+   ret = phy_init_eee(phydev, 0);
+   if (ret) {
+   netif_err(adapter, drv, adapter->netdev,
+ "EEE initialization failed\n");
+   return ret;
+   }
+
+   buf = lan743x_csr_read(adapter, MAC_CR);
+   buf |= MAC_CR_EEE_EN_;
+   lan743x_csr_write(adapter, MAC_CR, buf);
+
+   phy_ethtool_set_eee(phydev, eee);
+
+   buf = (u32)eee->tx_lpi_timer;
+   lan743x_csr_write(adapter, MAC_EEE_TX_LPI_REQ_DLY_CNT, buf);
+   netif_info(adapter, drv, adapter->netdev, "Enabled EEE\n");
+   } else {
+   buf = lan743x_csr_read(adapter, MAC_CR);
+   buf &= ~MAC_CR_EEE_EN_;
+   lan743x_csr_write(adapter, MAC_CR, buf);
+   netif_info(adapter, drv, adapter->netdev, "Disabled EEE\n");
+   }
+
+   return 0;
+}
+
 #ifdef CONFIG_PM
 static void lan743x_ethtool_get_wol(struct net_device *netdev,
struct ethtool_wolinfo *wol)
@@ -471,6 +558,8 @@ const struct ethtool_ops lan743x_ethtool_ops = {
.get_strings = lan743x_ethtool_get_strings,
.get_ethtool_stats = lan743x_ethtool_get_ethtool_stats,
.get_sset_count = lan743x_ethtool_get_sset_count,
+   .get_eee = lan743x_ethtool_get_eee,
+   .set_eee = lan743x_ethtool_set_eee,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
.set_link_ksettings = phy_ethtool_set_link_ksettings,
 #ifdef CONFIG_PM
diff --git a/drivers/net/ethernet/microchip/lan743x_main.h 
b/drivers/net/ethernet/microchip/lan743x_main.h
index 72b9beb..93cb60a 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.h
+++ b/drivers/net/ethernet/microchip/lan743x_main.h
@@ -82,6 +82,7 @@
((value << 0) & FCT_FLOW_CTL_ON_THRESHOLD_)
 
 #define MAC_CR (0x100)
+#define MAC_CR_EEE_EN_ BIT(17)
 #define MAC_CR_ADD_		BIT(12)
 #define MAC_CR_ASD_		BIT(11)
 #define MAC_CR_CNTR_RST_   BIT(5)
@@ -117,6 +118,8 @@
 
 #define MAC_MII_DATA   (0x124)
 
+#define MAC_EEE_TX_LPI_REQ_DLY_CNT (0x130)
+
 #define MAC_WUCSR  (0x140)
 #define MAC_WUCSR_RFE_WAKE_EN_ BIT(14)
 #define MAC_WUCSR_PFDA_EN_ BIT(3)
-- 
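The `set_eee` path above is a textbook read-modify-write of a device register: read `MAC_CR`, flip only the EEE-enable bit, write it back, leaving all other control bits intact. A userspace model of that pattern (the register is simulated by a variable; bit 17 matches the patch's `MAC_CR_EEE_EN_`):

```c
#include <assert.h>
#include <stdint.h>

#define MAC_CR_EEE_EN (1u << 17)

static uint32_t mac_cr;			/* stand-in for the device register */

static uint32_t csr_read(void)		{ return mac_cr; }
static void csr_write(uint32_t v)	{ mac_cr = v; }

/* Read-modify-write: only the EEE-enable bit changes. */
static void set_eee(int enable)
{
	uint32_t buf = csr_read();

	if (enable)
		buf |= MAC_CR_EEE_EN;
	else
		buf &= ~MAC_CR_EEE_EN;
	csr_write(buf);
}
```

In the real driver the read and write are MMIO accesses, so the same sequence must also be serialized against other writers of `MAC_CR`; the sketch ignores that concern.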

[PATCH v2 net-next 1/9] lan743x: Add support for ethtool get_drvinfo

2018-07-12 Thread Bryan Whitehead
Implement ethtool get_drvinfo

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/Makefile  |  2 +-
 drivers/net/ethernet/microchip/lan743x_ethtool.c | 21 +
 drivers/net/ethernet/microchip/lan743x_ethtool.h | 11 +++
 drivers/net/ethernet/microchip/lan743x_main.c|  2 ++
 4 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ethtool.c
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ethtool.h

diff --git a/drivers/net/ethernet/microchip/Makefile 
b/drivers/net/ethernet/microchip/Makefile
index 2e982cc..43f47cb 100644
--- a/drivers/net/ethernet/microchip/Makefile
+++ b/drivers/net/ethernet/microchip/Makefile
@@ -6,4 +6,4 @@ obj-$(CONFIG_ENC28J60) += enc28j60.o
 obj-$(CONFIG_ENCX24J600) += encx24j600.o encx24j600-regmap.o
 obj-$(CONFIG_LAN743X) += lan743x.o
 
-lan743x-objs := lan743x_main.o
+lan743x-objs := lan743x_main.o lan743x_ethtool.o
diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
new file mode 100644
index 000..0e20758
--- /dev/null
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/* Copyright (C) 2018 Microchip Technology Inc. */
+
+#include 
+#include "lan743x_main.h"
+#include "lan743x_ethtool.h"
+#include 
+
+static void lan743x_ethtool_get_drvinfo(struct net_device *netdev,
+   struct ethtool_drvinfo *info)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   strlcpy(info->driver, DRIVER_NAME, sizeof(info->driver));
+   strlcpy(info->bus_info,
+   pci_name(adapter->pdev), sizeof(info->bus_info));
+}
+
+const struct ethtool_ops lan743x_ethtool_ops = {
+   .get_drvinfo = lan743x_ethtool_get_drvinfo,
+};
diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.h 
b/drivers/net/ethernet/microchip/lan743x_ethtool.h
new file mode 100644
index 000..d0d11a7
--- /dev/null
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/* Copyright (C) 2018 Microchip Technology Inc. */
+
+#ifndef _LAN743X_ETHTOOL_H
+#define _LAN743X_ETHTOOL_H
+
+#include "linux/ethtool.h"
+
+extern const struct ethtool_ops lan743x_ethtool_ops;
+
+#endif /* _LAN743X_ETHTOOL_H */
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c 
b/drivers/net/ethernet/microchip/lan743x_main.c
index e1747a4..ade3b04 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include "lan743x_main.h"
+#include "lan743x_ethtool.h"
 
 static void lan743x_pci_cleanup(struct lan743x_adapter *adapter)
 {
@@ -2689,6 +2690,7 @@ static int lan743x_pcidev_probe(struct pci_dev *pdev,
goto cleanup_hardware;
 
	adapter->netdev->netdev_ops = &lan743x_netdev_ops;
+	adapter->netdev->ethtool_ops = &lan743x_ethtool_ops;
adapter->netdev->features = NETIF_F_SG | NETIF_F_TSO | NETIF_F_HW_CSUM;
adapter->netdev->hw_features = adapter->netdev->features;
 
-- 
2.7.4
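`get_drvinfo` above fills fixed-size ethtool string buffers with `strlcpy()`, which (unlike `strncpy()`) always NUL-terminates and returns the full source length so truncation is detectable. Userspace libcs often lack it, so this sketch carries its own version (`my_strlcpy` is a local name, not a library function):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local strlcpy: copy at most size-1 bytes, always NUL-terminate,
 * return the length of src (so ret >= size signals truncation). */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

This is why the driver can pass `sizeof(info->driver)` without first checking the name length: the destination can never overflow or end up unterminated.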



[PATCH v2 net-next 6/9] lan743x: Add power management support

2018-07-12 Thread Bryan Whitehead
Implement power management.
Supports suspend, resume, and Wake On LAN

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/lan743x_ethtool.c |  48 ++
 drivers/net/ethernet/microchip/lan743x_main.c| 184 +++
 drivers/net/ethernet/microchip/lan743x_main.h|  47 ++
 3 files changed, 279 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index f9ad237..f9d875d 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -415,6 +415,50 @@ static int lan743x_ethtool_get_sset_count(struct 
net_device *netdev, int sset)
}
 }
 
+#ifdef CONFIG_PM
+static void lan743x_ethtool_get_wol(struct net_device *netdev,
+   struct ethtool_wolinfo *wol)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   wol->supported = WAKE_BCAST | WAKE_UCAST | WAKE_MCAST |
+   WAKE_MAGIC | WAKE_PHY | WAKE_ARP;
+
+   wol->wolopts = adapter->wolopts;
+}
+#endif /* CONFIG_PM */
+
+#ifdef CONFIG_PM
+static int lan743x_ethtool_set_wol(struct net_device *netdev,
+  struct ethtool_wolinfo *wol)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   if (wol->wolopts & WAKE_MAGICSECURE)
+   return -EOPNOTSUPP;
+
+   adapter->wolopts = 0;
+   if (wol->wolopts & WAKE_UCAST)
+   adapter->wolopts |= WAKE_UCAST;
+   if (wol->wolopts & WAKE_MCAST)
+   adapter->wolopts |= WAKE_MCAST;
+   if (wol->wolopts & WAKE_BCAST)
+   adapter->wolopts |= WAKE_BCAST;
+   if (wol->wolopts & WAKE_MAGIC)
+   adapter->wolopts |= WAKE_MAGIC;
+   if (wol->wolopts & WAKE_PHY)
+   adapter->wolopts |= WAKE_PHY;
+   if (wol->wolopts & WAKE_ARP)
+   adapter->wolopts |= WAKE_ARP;
+
+	device_set_wakeup_enable(&adapter->pdev->dev, (bool)wol->wolopts);
+
+   phy_ethtool_set_wol(netdev->phydev, wol);
+
+   return 0;
+}
+#endif /* CONFIG_PM */
+
 const struct ethtool_ops lan743x_ethtool_ops = {
.get_drvinfo = lan743x_ethtool_get_drvinfo,
.get_msglevel = lan743x_ethtool_get_msglevel,
@@ -429,4 +473,8 @@ const struct ethtool_ops lan743x_ethtool_ops = {
.get_sset_count = lan743x_ethtool_get_sset_count,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
.set_link_ksettings = phy_ethtool_set_link_ksettings,
+#ifdef CONFIG_PM
+   .get_wol = lan743x_ethtool_get_wol,
+   .set_wol = lan743x_ethtool_set_wol,
+#endif
 };
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c 
b/drivers/net/ethernet/microchip/lan743x_main.c
index 1e2f8c6..8e9eff8 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "lan743x_main.h"
 #include "lan743x_ethtool.h"
 
@@ -2749,10 +2750,190 @@ static void lan743x_pcidev_shutdown(struct pci_dev 
*pdev)
lan743x_netdev_close(netdev);
rtnl_unlock();
 
+#ifdef CONFIG_PM
+   pci_save_state(pdev);
+#endif
+
/* clean up lan743x portion */
lan743x_hardware_cleanup(adapter);
 }
 
+#ifdef CONFIG_PM
+static u16 lan743x_pm_wakeframe_crc16(const u8 *buf, int len)
+{
+   return bitrev16(crc16(0x, buf, len));
+}
+#endif /* CONFIG_PM */
+
+#ifdef CONFIG_PM
+static void lan743x_pm_set_wol(struct lan743x_adapter *adapter)
+{
+   const u8 ipv4_multicast[3] = { 0x01, 0x00, 0x5E };
+   const u8 ipv6_multicast[3] = { 0x33, 0x33 };
+   const u8 arp_type[2] = { 0x08, 0x06 };
+   int mask_index;
+   u32 pmtctl;
+   u32 wucsr;
+   u32 macrx;
+   u16 crc;
+
+   for (mask_index = 0; mask_index < MAC_NUM_OF_WUF_CFG; mask_index++)
+   lan743x_csr_write(adapter, MAC_WUF_CFG(mask_index), 0);
+
+   /* clear wake settings */
+   pmtctl = lan743x_csr_read(adapter, PMT_CTL);
+   pmtctl |= PMT_CTL_WUPS_MASK_;
+   pmtctl &= ~(PMT_CTL_GPIO_WAKEUP_EN_ | PMT_CTL_EEE_WAKEUP_EN_ |
+   PMT_CTL_WOL_EN_ | PMT_CTL_MAC_D3_RX_CLK_OVR_ |
+   PMT_CTL_RX_FCT_RFE_D3_CLK_OVR_ | PMT_CTL_ETH_PHY_WAKE_EN_);
+
+   macrx = lan743x_csr_read(adapter, MAC_RX);
+
+   wucsr = 0;
+   mask_index = 0;
+
+   pmtctl |= PMT_CTL_ETH_PHY_D3_COLD_OVR_ | PMT_CTL_ETH_PHY_D3_OVR_;
+
+   if (adapter->wolopts & WAKE_PHY) {
+   pmtctl |= PMT_CTL_ETH_PHY_EDPD_PLL_CTL_;
+   pmtctl |= PMT_CTL_ETH_PHY_WAKE_EN_;
+   }
+   if (adapter->wolopts & WAKE_MAGIC) {
+   wucsr |= MAC_WUCSR_MPEN_;
+   macrx |= MAC_RX_RXEN_;
+   pmtctl |= PMT_CTL_WOL_EN_ | PMT_CTL_MAC_D3_RX_CLK_OVR_;
+   }
+   if (adapter->wolopts & WAKE_UCAST) {
+   wucsr |= MAC_WUCSR_RFE_WAKE_EN_ | MAC_WUCSR_PFDA_EN_;
+   macrx |= 
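The PM patch above (truncated in this archive) computes wake-frame filter checksums as `bitrev16(crc16(0xffff, buf, len))`. The kernel's `crc16()` is the reflected 0x8005 polynomial (0xA001); a userspace sketch of both helpers (local reimplementations, not the kernel symbols):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16, reflected polynomial 0xA001, matching lib/crc16.c.
 * With init 0xFFFF this is the CRC-16/MODBUS variant. */
static uint16_t crc16(uint16_t crc, const uint8_t *buf, size_t len)
{
	size_t i;
	int b;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (b = 0; b < 8; b++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
	}
	return crc;
}

/* Reverse the bit order of a 16-bit value, like the kernel's bitrev16(). */
static uint16_t bitrev16(uint16_t x)
{
	uint16_t r = 0;
	int i;

	for (i = 0; i < 16; i++)
		r |= (uint16_t)((x >> i) & 1) << (15 - i);
	return r;
}
```

The bit reversal exists because the hardware compares the checksum in the opposite bit order from the software CRC; composing the two reproduces what the filter registers expect.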
