Re: [PATCH mlx5-next 1/7] net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits

2018-12-03 Thread Leon Romanovsky
On Mon, Dec 03, 2018 at 01:32:45PM -0500, Doug Ledford wrote:
> On Mon, 2018-11-26 at 08:28 +0200, Leon Romanovsky wrote:
> > From: Yishai Hadas 
> >
> > Expose device capabilities for DEVX user context: which caps
> > the device supports and a matching bit to set as part of user
> > context creation.
> >
> > Signed-off-by: Yishai Hadas 
> > Reviewed-by: Artemy Kovalyov 
> > Signed-off-by: Leon Romanovsky 
>
> This looks fine to me.  Is it in mlx5-next yet?
>

9d43faac02e3 net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits

Thanks


signature.asc
Description: PGP signature


Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames

2018-12-03 Thread Eric Dumazet
On Mon, Dec 3, 2018 at 11:30 PM Cong Wang  wrote:
>
> On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet  wrote:
> >
> > The hardware has probably validated the L3 & L4 checksum just fine.
> >
> > Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding bytes (if any)
> > have no impact on the csum that has been verified by the NIC.
>
>
> Why? Why does the hardware validate the L3/L4 checksum when it
> supplies a full-packet checksum? What's the point here?

The point is that the driver author can decide what is best.

For native IP+TCP or IP+UDP, the NIC has the ability to fully
understand the packet and fully validate the checksum.

>
> If it really validates L3/L4 checksum, then a full-packet checksum
> is not needed.

Yes, this is exactly what CHECKSUM_UNNECESSARY means.
The Linux stack does not have to perform the check another time.

For example, no call to csum_partial() is needed, even for IPv6+TCP or IPv6+UDP.
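The distinction Eric draws between the two offload results can be sketched in plain C. This is illustrative only: the enum values mirror the kernel's ip_summed encodings, but the function and its name are ours, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the kernel's ip_summed encodings, for illustration */
enum { CHECKSUM_NONE, CHECKSUM_UNNECESSARY, CHECKSUM_COMPLETE };

/* Does the stack still need a full software pass (csum_partial())
 * over the packet data to verify the L4 checksum? */
static bool needs_sw_csum_pass(int ip_summed)
{
	switch (ip_summed) {
	case CHECKSUM_UNNECESSARY:
		/* NIC asserted L3/L4 are valid; padding bytes are irrelevant */
		return false;
	case CHECKSUM_COMPLETE:
		/* NIC supplied a full-packet sum in skb->csum; the stack only
		 * folds it against the pseudo-header, no data pass needed */
		return false;
	default:
		/* CHECKSUM_NONE: software must checksum the payload itself */
		return true;
	}
}
```

Only CHECKSUM_NONE forces the software fallback, which is why the choice between the other two modes is a driver-policy question rather than a correctness one.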


Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames

2018-12-03 Thread Cong Wang
On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet  wrote:
>
> The hardware has probably validated the L3 & L4 checksum just fine.
>
> Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding bytes (if any)
> have no impact on the csum that has been verified by the NIC.


Why? Why does the hardware validate the L3/L4 checksum when it
supplies a full-packet checksum? What's the point here?

If it really validates L3/L4 checksum, then a full-packet checksum
is not needed.

If a full-packet checksum is supplied, the software is able to use
it to validate the L3/L4 checksum, so the hardware doesn't need to
validate it.

I see no reason for it to provide both at the same time. If it really does,
then all the CHECKSUM_COMPLETE code here could just be removed,
which would be faster.

Something must be wrong with your argument.

>
> Sorry I do not get your point.


I don't get your point either.


Thanks.


Re: [PATCH mlx5-next 00/11] Remove SRQ code from mlx5_core

2018-12-03 Thread Leon Romanovsky
On Wed, Nov 28, 2018 at 08:53:32PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky 
>
> Hi Doug and Jason,
>
> This series is intended to be applied on mlx5-next after Saeed's
> infrastructure changes series are accepted:
> https://patchwork.kernel.org/project/linux-rdma/list/?series=47809
>
> --
> The following patchset brings mlx5 SRQ functionality to the module
> which is actually using it - mlx5_ib. The original code was not used
> in mlx5_core/mlx5_en, but its location there caused unneeded maintenance
> overhead.
>
> By moving the mlx5 SRQ code to mlx5_ib, we cut ~200 lines of code
> and minimized exports and header pollution.
>
> The proposed patches are divided into three logical groups: move from
> netdev, fix the IB part, and cleanup.
>
> Thanks
>
> Leon Romanovsky (11):
>   net/mlx5: Align SRQ licenses and copyright information
>   net/mlx5: Remove dead transobj code
>   net/mlx5: Remove not-used lib/eq.h header file
>   net/mlx5: Remove references to local mlx5_core functions
>   net/mlx5: Move SRQ functions to RDMA part
>   RDMA/mlx5: Remove SRQ signature global flag
>   RDMA/mlx5: Use stages for callback to setup and release DEVX
>   RDMA/mlx5: Update SRQ functions signatures to mlx5_ib format
>   RDMA/mlx5: Initialize SRQ tables on mlx5_ib
>   RDMA/mlx5: Unfold create RMP function
>   RDMA/mlx5: Unfold modify RMP function
>

Applied to mlx5-next.

Thanks


signature.asc
Description: PGP signature


Re: consistency for statistics with XDP mode

2018-12-03 Thread Jakub Kicinski
On Tue, 04 Dec 2018 09:03:52 +0200, Toke Høiland-Jørgensen wrote:
> David Miller  writes:
> 
> > From: David Ahern 
> > Date: Mon, 3 Dec 2018 17:15:03 -0700
> >  
> >> So, instead of a program tag which the program writer controls, how
> >> about some config knob that an admin controls that says at attach time
> >> use standard stats?  
> >
> > How about, instead of replacing it is in addition to, and admin can
> > override?  
> 
> Yeah, I was also thinking about something the program writer can set,
> but the admin can override. There could even be a system-wide setting
> that would make the verifier inject it into all programs at load time?

That'd be far preferable; having all drivers include the code when we
have JITs seems like a step backward.

We could probably fit the stats into the enormous padding of struct
xdp_rxq_info; the question is how to get at them in a clean way.


Re: [PATCH 0/3]: net: dsa: mt7530: support MT7530 in the MT7621 SoC

2018-12-03 Thread Greg Ungerer

Hi Bjorn,

On 3/12/18 9:34 pm, Bjørn Mork wrote:

[ fixed John's address - the openwrt.org email was apparently never restored? ]

Greg Ungerer  writes:


The following change helped a lot, but I still get some problems under
sustained load and some types of port setups. Still trying to figure
out what exactly is going on.

--- a/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1750,8 +1750,8 @@ static irqreturn_t mtk_handle_irq_rx(int irq, void *_eth)
        if (likely(napi_schedule_prep(&eth->rx_napi))) {
                __napi_schedule(&eth->rx_napi);
-               mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
        }
+       mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
        return IRQ_HANDLED;
 }
@@ -1762,11 +1762,53 @@ static irqreturn_t mtk_handle_irq_tx(int irq, void *_eth)
        if (likely(napi_schedule_prep(&eth->tx_napi))) {
                __napi_schedule(&eth->tx_napi);
-               mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
        }
+       mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
        return IRQ_HANDLED;
 }


Yes, sorry I didn't point to that as well.  Just to be clear:  I have no
clue how this thing is actually wired up, or if you could use three
interrupts on the MT7621 too. I just messed with it until I got
something to work, based on Renés original idea and code.


Understood.

By way of progress I have found that making the single IRQ handler
look like this is better than the last patch:

static irqreturn_t mtk_handle_irq(int irq, void *_eth)
{
struct mtk_eth *eth = _eth;
irqreturn_t rc = IRQ_NONE;

if (mtk_r32(eth, MTK_PDMA_INT_MASK) & MTK_RX_DONE_INT) {
if (mtk_r32(eth, MTK_PDMA_INT_STATUS) & MTK_RX_DONE_INT)
rc = mtk_handle_irq_rx(irq, _eth);
}

if (mtk_r32(eth, MTK_QDMA_INT_MASK) & MTK_TX_DONE_INT) {
if (mtk_r32(eth, MTK_QMTK_INT_STATUS) & MTK_TX_DONE_INT)
rc = mtk_handle_irq_tx(irq, _eth);
}

return rc;
}

So the RX or TX handlers are not called if their interrupts
were not enabled in the mask. For some workloads that is quite reliable.
Flood ping through one port and out the other can get a lot of
packets through.

However I still get dev_watchdog timeouts after a while and with
more mixed packet loads. Seems like something is racing on the
TX path side.

Regards
Greg




Anyway, this really looks like the right approach to me. This driver is
clearly capable of supporting the mt7621 ethernet ports. No need for the
staging driver.


Great! Thanks for doing this.

I did make a feeble attempt at testing this with current mainline
myself, but the only MT7621 board I have is using NAND flash.  So I
started trying to forward port the mtk-nand2 driver from OpenWrt. And
failed. Probably a simple mixup while trying to adjust to the many
changes in the raw NAND API between v4.14 and v4.20.  Then I
optimistically attempted to use the mainline mtk-nand driver instead,
assuming it would be as simple as with the mtk-eth driver.  Which it
wasn't, of course. I guess there are a lot of things I do not understand
wrt flash and HW ECC etc...

Short version: I won't be able to test the mainline mtk-eth driver with
MT7621 on newer kernels before smarter people like John upgrade the
OpenWrt kernel.


Bjørn



Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames

2018-12-03 Thread Eric Dumazet
Resent to netdev@ without html formatting, sorry.

On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet  wrote:
>
>
>
> On Mon, Dec 3, 2018 at 10:48 PM Cong Wang  wrote:
>>
>> On Mon, Dec 3, 2018 at 10:34 PM Eric Dumazet  wrote:
>> >
>> > On Mon, Dec 3, 2018 at 10:14 PM Cong Wang  wrote:
>> > >
>> > > When an ethernet frame is padded to meet the minimum ethernet frame
>> > > size, the padding octets are not covered by the hardware checksum.
>> > > Fortunately the padding octets are usually zeros, which don't affect
>> > > the checksum. However, we have a switch which pads non-zero octets; this
>> > > causes repeated kernel hardware checksum faults.
>> > >
>> > > Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and 
>> > > CHECKSUM_COMPLETE are friends"),
>> > > skb checksum was forced to be CHECKSUM_NONE when padding is detected.
>> > > After it, we need to keep skb->csum updated, like what we do for RXFCS.
>> > > However, fixing up CHECKSUM_COMPLETE requires verifying and parsing IP
>> > > headers; it is not worth the effort as the packets are so small that
>> > > CHECKSUM_COMPLETE can't save anything.
>> > >
>> > > I tested this patch with RXFCS on and off, it works fine without any
>> > > warning in both cases.
>> > >
>> > > Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are 
>> > > friends"),
>> > > Cc: Saeed Mahameed 
>> > > Cc: Eric Dumazet 
>> > > Cc: Tariq Toukan 
>> > > Signed-off-by: Cong Wang 
>> > > ---
>> > >  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++-
>> > >  1 file changed, 21 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
>> > > b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > > index 624eed345b5d..1c153b8091da 100644
>> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > > @@ -732,6 +732,13 @@ static u8 get_ip_proto(struct sk_buff *skb, int 
>> > > network_depth, __be16 proto)
>> > > ((struct ipv6hdr 
>> > > *)ip_p)->nexthdr;
>> > >  }
>> > >
>> > > +static bool is_short_frame(struct sk_buff *skb, bool has_fcs)
>> > > +{
>> > > +   u32 frame_len = has_fcs ? skb->len - ETH_FCS_LEN : skb->len;
>> > > +
>> > > +   return frame_len <= ETH_ZLEN;
>> > > +}
>> > > +
>> > >  static inline void mlx5e_handle_csum(struct net_device *netdev,
>> > >  struct mlx5_cqe64 *cqe,
>> > >  struct mlx5e_rq *rq,
>> > > @@ -755,9 +762,22 @@ static inline void mlx5e_handle_csum(struct 
>> > > net_device *netdev,
>> > > goto csum_unnecessary;
>> > >
>> > > if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
>> > > +   bool has_fcs = !!(netdev->features & NETIF_F_RXFCS);
>> > > +
>> > > if (unlikely(get_ip_proto(skb, network_depth, proto) == 
>> > > IPPROTO_SCTP))
>> > > goto csum_unnecessary;
>> > >
>> > > +   /* CQE csum doesn't cover padding octets in short 
>> > > ethernet
>> > > +* frames. And the pad field is appended prior to 
>> > > calculating
>> > > +* and appending the FCS field.
>> > > +*
>> > > +* Detecting these padded frames requires to verify and 
>> > > parse
>> > > +* IP headers, so we simply force all those small frames 
>> > > to be
>> > > +* CHECKSUM_NONE even if they are not padded.
>> > > +*/
>> > > +   if (unlikely(is_short_frame(skb, has_fcs)))
>> > > +   goto csum_none;
>> >
>> > Should not this go to csum_unnecessary instead ?
>>
>> I don't see why we don't even want to validate the protocol checksum
>> here.
>>
>> Any reason you are suggesting so?
>>
>>
>> >
>> > Probably not a big deal, but small UDP frames might hit this code path,
>> > so ethtool -S would show a lot of csum_none which could confuse mlx5 
>> > owners.
>>
>> Why is it confusing? We intentionally bypass the hardware checksum
>> and let the protocol layer validate it.
>
>
> The hardware has probably validated the L3 & L4 checksum just fine.
>
> Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding bytes (if any)
> have no impact on the csum that has been verified by the NIC.
>
>
>>
>>
>> >
>> > BTW,
>> > It looks like mlx5 prefers delivering skbs with CHECKSUM_COMPLETE instead 
>> > of
>> > CHECKSUM_UNNECESSARY but at least for ipv6 CHECKSUM_UNNECESSARY
>> > would be slightly faster, by avoiding  various csum_partial() costs
>> > when headers are parsed.
>>
>> Sure, it is certainly faster if you don't want to validate L4 checksum.
>
>
> Sorry I do not get your point.
>
>>
>> The only question is why we don't either validate hardware checksum
>> or L4 checksum?
>>
>
> CHECKSUM_UNNECESSARY means the NIC has fully validated the checksums,
> including L4 one.
>
> For example, mlx4 drivers first checks if 

Re: consistency for statistics with XDP mode

2018-12-03 Thread Toke Høiland-Jørgensen
David Miller  writes:

> From: David Ahern 
> Date: Mon, 3 Dec 2018 17:15:03 -0700
>
>> So, instead of a program tag which the program writer controls, how
>> about some config knob that an admin controls that says at attach time
>> use standard stats?
>
> How about, instead of replacing it is in addition to, and admin can
> override?

Yeah, I was also thinking about something the program writer can set,
but the admin can override. There could even be a system-wide setting
that would make the verifier inject it into all programs at load time?

-Toke


Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames

2018-12-03 Thread Cong Wang
On Mon, Dec 3, 2018 at 10:34 PM Eric Dumazet  wrote:
>
> On Mon, Dec 3, 2018 at 10:14 PM Cong Wang  wrote:
> >
> > When an ethernet frame is padded to meet the minimum ethernet frame
> > size, the padding octets are not covered by the hardware checksum.
> > Fortunately the padding octets are usually zeros, which don't affect
> > the checksum. However, we have a switch which pads non-zero octets; this
> > causes repeated kernel hardware checksum faults.
> >
> > Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE 
> > are friends"),
> > skb checksum was forced to be CHECKSUM_NONE when padding is detected.
> > After it, we need to keep skb->csum updated, like what we do for RXFCS.
> > However, fixing up CHECKSUM_COMPLETE requires verifying and parsing IP
> > headers; it is not worth the effort as the packets are so small that
> > CHECKSUM_COMPLETE can't save anything.
> >
> > I tested this patch with RXFCS on and off, it works fine without any
> > warning in both cases.
> >
> > Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are 
> > friends"),
> > Cc: Saeed Mahameed 
> > Cc: Eric Dumazet 
> > Cc: Tariq Toukan 
> > Signed-off-by: Cong Wang 
> > ---
> >  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
> > b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index 624eed345b5d..1c153b8091da 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -732,6 +732,13 @@ static u8 get_ip_proto(struct sk_buff *skb, int 
> > network_depth, __be16 proto)
> > ((struct ipv6hdr 
> > *)ip_p)->nexthdr;
> >  }
> >
> > +static bool is_short_frame(struct sk_buff *skb, bool has_fcs)
> > +{
> > +   u32 frame_len = has_fcs ? skb->len - ETH_FCS_LEN : skb->len;
> > +
> > +   return frame_len <= ETH_ZLEN;
> > +}
> > +
> >  static inline void mlx5e_handle_csum(struct net_device *netdev,
> >  struct mlx5_cqe64 *cqe,
> >  struct mlx5e_rq *rq,
> > @@ -755,9 +762,22 @@ static inline void mlx5e_handle_csum(struct net_device 
> > *netdev,
> > goto csum_unnecessary;
> >
> > if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
> > +   bool has_fcs = !!(netdev->features & NETIF_F_RXFCS);
> > +
> > if (unlikely(get_ip_proto(skb, network_depth, proto) == 
> > IPPROTO_SCTP))
> > goto csum_unnecessary;
> >
> > +   /* CQE csum doesn't cover padding octets in short ethernet
> > +* frames. And the pad field is appended prior to 
> > calculating
> > +* and appending the FCS field.
> > +*
> > +* Detecting these padded frames requires to verify and 
> > parse
> > +* IP headers, so we simply force all those small frames to 
> > be
> > +* CHECKSUM_NONE even if they are not padded.
> > +*/
> > +   if (unlikely(is_short_frame(skb, has_fcs)))
> > +   goto csum_none;
>
> Should not this go to csum_unnecessary instead ?

I don't see why we don't even want to validate the protocol checksum
here.

Any reason you are suggesting so?


>
> Probably not a big deal, but small UDP frames might hit this code path,
> so ethtool -S would show a lot of csum_none which could confuse mlx5 owners.

Why is it confusing? We intentionally bypass the hardware checksum
and let the protocol layer validate it.


>
> BTW,
> It looks like mlx5 prefers delivering skbs with CHECKSUM_COMPLETE instead of
> CHECKSUM_UNNECESSARY but at least for ipv6 CHECKSUM_UNNECESSARY
> would be slightly faster, by avoiding  various csum_partial() costs
> when headers are parsed.

Sure, it is certainly faster if you don't want to validate L4 checksum.
The only question is why we don't either validate hardware checksum
or L4 checksum?

Thanks.


[PATCH bpf 2/3] bpf: improve verifier branch analysis

2018-12-03 Thread Alexei Starovoitov
Pathological bpf programs may try to force the verifier to explode in
the number of branch states:
  20: (d5) if r1 s<= 0x2428 goto pc+0
  21: (b5) if r0 <= 0xe1fa20 goto pc+2
  22: (d5) if r1 s<= 0x7e goto pc+0
  23: (b5) if r0 <= 0xe880e000 goto pc+0
  24: (c5) if r0 s< 0x2100ecf4 goto pc+0
  25: (d5) if r1 s<= 0xe880e000 goto pc+1
  26: (c5) if r0 s< 0xf4041810 goto pc+0
  27: (d5) if r1 s<= 0x1e007e goto pc+0
  28: (b5) if r0 <= 0xe86be000 goto pc+0
  29: (07) r0 += 16614
  30: (c5) if r0 s< 0x6d0020da goto pc+0
  31: (35) if r0 >= 0x2100ecf4 goto pc+0

Teach the verifier to recognize always-taken and never-taken branches.
This analysis is already done for == and != comparison.
Expand it to all other branches.

It also helps real bpf programs to be verified faster:
                     before   after
bpf_lb-DLB_L3.o        2003    1940
bpf_lb-DLB_L4.o        3173    3089
bpf_lb-DUNKNOWN.o      1080    1065
bpf_lxc-DDROP_ALL.o   29584   28052
bpf_lxc-DUNKNOWN.o    36916   35487
bpf_netdev.o          11188   10864
bpf_overlay.o          6679    6643
bpf_lcx_jit.o         39555   38437

Reported-by: Anatoly Trosinenko 
Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Acked-by: Edward Cree 
---
 kernel/bpf/verifier.c   | 93 ++---
 tools/testing/selftests/bpf/test_verifier.c |  4 +-
 2 files changed, 82 insertions(+), 15 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 751bb30b7c5c..55a49703f423 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3751,6 +3751,79 @@ static void find_good_pkt_pointers(struct 
bpf_verifier_state *vstate,
}
 }
 
+/* compute branch direction of the expression "if (reg opcode val) goto target;"
+ * and return:
+ *  1 - branch will be taken and "goto target" will be executed
+ *  0 - branch will not be taken and fall-through to next insn
+ * -1 - unknown. Example: "if (reg < 5)" is unknown when register value range [0,10]
+ */
+static int is_branch_taken(struct bpf_reg_state *reg, u64 val, u8 opcode)
+{
+   if (__is_pointer_value(false, reg))
+   return -1;
+
+   switch (opcode) {
+   case BPF_JEQ:
+   if (tnum_is_const(reg->var_off))
+   return !!tnum_equals_const(reg->var_off, val);
+   break;
+   case BPF_JNE:
+   if (tnum_is_const(reg->var_off))
+   return !tnum_equals_const(reg->var_off, val);
+   break;
+   case BPF_JGT:
+   if (reg->umin_value > val)
+   return 1;
+   else if (reg->umax_value <= val)
+   return 0;
+   break;
+   case BPF_JSGT:
+   if (reg->smin_value > (s64)val)
+   return 1;
+   else if (reg->smax_value < (s64)val)
+   return 0;
+   break;
+   case BPF_JLT:
+   if (reg->umax_value < val)
+   return 1;
+   else if (reg->umin_value >= val)
+   return 0;
+   break;
+   case BPF_JSLT:
+   if (reg->smax_value < (s64)val)
+   return 1;
+   else if (reg->smin_value >= (s64)val)
+   return 0;
+   break;
+   case BPF_JGE:
+   if (reg->umin_value >= val)
+   return 1;
+   else if (reg->umax_value < val)
+   return 0;
+   break;
+   case BPF_JSGE:
+   if (reg->smin_value >= (s64)val)
+   return 1;
+   else if (reg->smax_value < (s64)val)
+   return 0;
+   break;
+   case BPF_JLE:
+   if (reg->umax_value <= val)
+   return 1;
+   else if (reg->umin_value > val)
+   return 0;
+   break;
+   case BPF_JSLE:
+   if (reg->smax_value <= (s64)val)
+   return 1;
+   else if (reg->smin_value > (s64)val)
+   return 0;
+   break;
+   }
+
+   return -1;
+}
+
 /* Adjusts the register min/max values in the case that the dst_reg is the
  * variable register that we are working on, and src_reg is a constant or we're
  * simply doing a BPF_K check.
@@ -4152,21 +4225,15 @@ static int check_cond_jmp_op(struct bpf_verifier_env 
*env,
 
	dst_reg = &regs[insn->dst_reg];
 
-   /* detect if R == 0 where R was initialized to zero earlier */
-   if (BPF_SRC(insn->code) == BPF_K &&
-   (opcode == BPF_JEQ || opcode == BPF_JNE) &&
-   dst_reg->type == SCALAR_VALUE &&
-   tnum_is_const(dst_reg->var_off)) {
-   if ((opcode == BPF_JEQ && dst_reg->var_off.value == insn->imm) ||
-   (opcode == BPF_JNE && dst_reg->var_off.value != insn->imm)) {
-   /* if 
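The range test at the heart of the new is_branch_taken() can be exercised standalone. Below is a userspace sketch of just the unsigned BPF_JGT case from the hunk above; the struct and function names are ours, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Tracked unsigned bounds of a register, as the verifier maintains them */
struct reg_range {
	uint64_t umin, umax;
};

/* Branch direction of "if (reg > val) goto target;" for the JGT case:
 *  1 = always taken, 0 = never taken, -1 = unknown. */
static int branch_taken_jgt(struct reg_range r, uint64_t val)
{
	if (r.umin > val)
		return 1;	/* every value in [umin, umax] satisfies reg > val */
	if (r.umax <= val)
		return 0;	/* no value in [umin, umax] can satisfy reg > val */
	return -1;		/* range straddles val: direction unknown */
}
```

The pathological programs in the commit message chain dozens of such conditionals; once each resolves to 0 or 1, only one successor state is explored instead of two.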

[PATCH bpf 0/3] bpf: improve verifier resilience

2018-12-03 Thread Alexei Starovoitov
Three patches to improve the verifier's ability to handle pathological bpf programs
with a lot of branches:
- make sure prog_load syscall can be aborted
- improve branch taken analysis
- introduce per-insn complexity limit for unprivileged programs

Alexei Starovoitov (3):
  bpf: check pending signals while verifying programs
  bpf: improve verifier branch analysis
  bpf: add per-insn complexity limit

 kernel/bpf/verifier.c   | 103 +---
 tools/testing/selftests/bpf/test_verifier.c |   4 +-
 2 files changed, 91 insertions(+), 16 deletions(-)

-- 
2.17.1



[PATCH bpf 3/3] bpf: add per-insn complexity limit

2018-12-03 Thread Alexei Starovoitov
A malicious bpf program may try to force the verifier to remember
a lot of distinct verifier states.
Put a limit on the number of per-insn 'struct bpf_verifier_state'.
Note that hitting the limit doesn't reject the program.
It potentially makes the verifier do more steps to analyze the program.
It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner
instead of spending cpu time walking a long linked list.

The limit of BPF_COMPLEXITY_LIMIT_STATES == 64 affects cilium progs
with a slight increase in the number of "steps" it takes to successfully verify
the programs:
                     before   after
bpf_lb-DLB_L3.o        1940    1940
bpf_lb-DLB_L4.o        3089    3089
bpf_lb-DUNKNOWN.o      1065    1065
bpf_lxc-DDROP_ALL.o   28052   28162
bpf_lxc-DUNKNOWN.o    35487   35541
bpf_netdev.o          10864   10864
bpf_overlay.o          6643    6643
bpf_lcx_jit.o         38437   38437

But it also makes a malicious program be rejected in 0.4 seconds vs 6.5.
Hence apply this limit to unprivileged programs only.

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Acked-by: Edward Cree 
---
 kernel/bpf/verifier.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 55a49703f423..fc760d00a38c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -175,6 +175,7 @@ struct bpf_verifier_stack_elem {
 
 #define BPF_COMPLEXITY_LIMIT_INSNS 131072
 #define BPF_COMPLEXITY_LIMIT_STACK 1024
+#define BPF_COMPLEXITY_LIMIT_STATES 64
 
 #define BPF_MAP_PTR_UNPRIV 1UL
 #define BPF_MAP_PTR_POISON ((void *)((0xeB9FUL << 1) + \
@@ -5047,7 +5048,7 @@ static int is_state_visited(struct bpf_verifier_env *env, 
int insn_idx)
struct bpf_verifier_state_list *new_sl;
struct bpf_verifier_state_list *sl;
struct bpf_verifier_state *cur = env->cur_state, *new;
-   int i, j, err;
+   int i, j, err, states_cnt = 0;
 
sl = env->explored_states[insn_idx];
if (!sl)
@@ -5074,8 +5075,12 @@ static int is_state_visited(struct bpf_verifier_env 
*env, int insn_idx)
return 1;
}
sl = sl->next;
+   states_cnt++;
}
 
+   if (!env->allow_ptr_leaks && states_cnt > BPF_COMPLEXITY_LIMIT_STATES)
+   return 0;
+
/* there were no equivalent states, remember current one.
 * technically the current state is not proven to be safe yet,
 * but it will either reach outer most bpf_exit (which means it's safe)
-- 
2.17.1
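A minimal sketch of the mechanism this hunk adds: count the explored-state list for an insn while walking it, and decline to remember a new state past the cap for unprivileged loads. All names here are illustrative stand-ins, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COMPLEXITY_LIMIT_STATES 64	/* mirrors BPF_COMPLEXITY_LIMIT_STATES */

/* One previously explored state for a given instruction index */
struct state_list {
	struct state_list *next;
};

/* Should a new verifier state be appended for this insn?  Unprivileged
 * programs stop accumulating states past the cap, so a malicious program
 * burns down the global insn budget instead of growing an ever-longer
 * list for the verifier to walk on every revisit. */
static bool should_remember_state(struct state_list *sl, bool allow_ptr_leaks)
{
	int states_cnt = 0;

	for (; sl; sl = sl->next)
		states_cnt++;

	if (!allow_ptr_leaks && states_cnt > COMPLEXITY_LIMIT_STATES)
		return false;
	return true;
}
```

Returning false mirrors the patch's early `return 0;` in is_state_visited(): verification continues, it just stops caching states at that insn.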



[PATCH bpf 1/3] bpf: check pending signals while verifying programs

2018-12-03 Thread Alexei Starovoitov
Malicious user space may try to force the verifier to use as much cpu
time and memory as possible. Hence check for pending signals
while verifying the program.
Note that suspending the sys_bpf(PROG_LOAD) syscall will lead to EAGAIN,
since the kernel has to release the resources used for program verification.

Reported-by: Anatoly Trosinenko 
Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Acked-by: Edward Cree 
---
 kernel/bpf/verifier.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6dd419550aba..751bb30b7c5c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5148,6 +5148,9 @@ static int do_check(struct bpf_verifier_env *env)
goto process_bpf_exit;
}
 
+   if (signal_pending(current))
+   return -EAGAIN;
+
if (need_resched())
cond_resched();
 
-- 
2.17.1



Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames

2018-12-03 Thread Eric Dumazet
On Mon, Dec 3, 2018 at 10:14 PM Cong Wang  wrote:
>
> When an ethernet frame is padded to meet the minimum ethernet frame
> size, the padding octets are not covered by the hardware checksum.
> Fortunately the padding octets are usually zeros, which don't affect
> the checksum. However, we have a switch which pads non-zero octets; this
> causes repeated kernel hardware checksum faults.
>
> Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE 
> are friends"),
> skb checksum was forced to be CHECKSUM_NONE when padding is detected.
> After it, we need to keep skb->csum updated, like what we do for RXFCS.
> However, fixing up CHECKSUM_COMPLETE requires verifying and parsing IP
> headers; it is not worth the effort as the packets are so small that
> CHECKSUM_COMPLETE can't save anything.
>
> I tested this patch with RXFCS on and off, it works fine without any
> warning in both cases.
>
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are 
> friends"),
> Cc: Saeed Mahameed 
> Cc: Eric Dumazet 
> Cc: Tariq Toukan 
> Signed-off-by: Cong Wang 
> ---
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 624eed345b5d..1c153b8091da 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -732,6 +732,13 @@ static u8 get_ip_proto(struct sk_buff *skb, int 
> network_depth, __be16 proto)
> ((struct ipv6hdr *)ip_p)->nexthdr;
>  }
>
> +static bool is_short_frame(struct sk_buff *skb, bool has_fcs)
> +{
> +   u32 frame_len = has_fcs ? skb->len - ETH_FCS_LEN : skb->len;
> +
> +   return frame_len <= ETH_ZLEN;
> +}
> +
>  static inline void mlx5e_handle_csum(struct net_device *netdev,
>  struct mlx5_cqe64 *cqe,
>  struct mlx5e_rq *rq,
> @@ -755,9 +762,22 @@ static inline void mlx5e_handle_csum(struct net_device 
> *netdev,
> goto csum_unnecessary;
>
> if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
> +   bool has_fcs = !!(netdev->features & NETIF_F_RXFCS);
> +
> if (unlikely(get_ip_proto(skb, network_depth, proto) == 
> IPPROTO_SCTP))
> goto csum_unnecessary;
>
> +   /* CQE csum doesn't cover padding octets in short ethernet
> +* frames. And the pad field is appended prior to calculating
> +* and appending the FCS field.
> +*
> +* Detecting these padded frames requires to verify and parse
> +* IP headers, so we simply force all those small frames to be
> +* CHECKSUM_NONE even if they are not padded.
> +*/
> +   if (unlikely(is_short_frame(skb, has_fcs)))
> +   goto csum_none;

Should not this go to csum_unnecessary instead ?

Probably not a big deal, but small UDP frames might hit this code path,
so ethtool -S would show a lot of csum_none which could confuse mlx5 owners.

BTW,
It looks like mlx5 prefers delivering skbs with CHECKSUM_COMPLETE instead of
CHECKSUM_UNNECESSARY but at least for ipv6 CHECKSUM_UNNECESSARY
would be slightly faster, by avoiding various csum_partial() costs
when headers are parsed.


> +
> skb->ip_summed = CHECKSUM_COMPLETE;
> skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
> if (network_depth > ETH_HLEN)
> @@ -768,7 +788,7 @@ static inline void mlx5e_handle_csum(struct net_device 
> *netdev,
> skb->csum = csum_partial(skb->data + ETH_HLEN,
>  network_depth - ETH_HLEN,
>  skb->csum);
> -   if (unlikely(netdev->features & NETIF_F_RXFCS))
> +   if (unlikely(has_fcs))
> skb->csum = csum_block_add(skb->csum,
>(__force 
> __wsum)mlx5e_get_fcs(skb),
>skb->len - ETH_FCS_LEN);
> --
> 2.19.1
>


[PATCH net-next] can: flexcan: flexcan_chip_start(): fix the error return code in flexcan_setup_stop_mode()

2018-12-03 Thread Wei Yongjun
The error return code PTR_ERR(gpr_np) is always 0 since gpr_np is
equal to NULL in this error handling case. Fix it by returning -ENOENT.

Fixes: de3578c198c6 ("can: flexcan: add self wakeup support")
Signed-off-by: Wei Yongjun 
---
 drivers/net/can/flexcan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 0f36eaf..f412d84 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -1432,7 +1432,7 @@ static int flexcan_setup_stop_mode(struct platform_device 
*pdev)
gpr_np = of_find_node_by_phandle(phandle);
if (!gpr_np) {
		dev_dbg(&pdev->dev, "could not find gpr node by phandle\n");
-   return PTR_ERR(gpr_np);
+   return -ENOENT;
}
 
priv = netdev_priv(dev);
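The bug class here is easy to reproduce outside the kernel: PTR_ERR() is just a cast, so calling it on a NULL failure (rather than an ERR_PTR-encoded one) yields 0, i.e. "success". A tiny userspace replica, with the macros mirroring include/linux/err.h:

```c
#include <assert.h>

/* Userspace replicas of the kernel's err.h macros, for demonstration */
#define ERR_PTR(err)	((void *)(long)(err))
#define PTR_ERR(ptr)	((long)(ptr))

/* of_find_node_by_phandle()-style failure: returns plain NULL,
 * not an ERR_PTR-encoded error */
static void *lookup_by_phandle_stub(void)
{
	return 0;	/* not found */
}
```

Since the lookup signals failure with NULL, the caller must pick an errno explicitly (-ENOENT here); PTR_ERR() only recovers a code from pointers that were produced by ERR_PTR().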





[Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames

2018-12-03 Thread Cong Wang
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zeros, which don't affect
the checksum. However, we have a switch which pads non-zero octets; this
causes repeated kernel hardware checksum faults.

Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are 
friends"),
skb checksum was forced to be CHECKSUM_NONE when padding is detected.
After it, we need to keep skb->csum updated, like what we do for RXFCS.
However, fixing up CHECKSUM_COMPLETE requires verifying and parsing IP
headers; it is not worth the effort as the packets are so small that
CHECKSUM_COMPLETE can't save anything.

I tested this patch with RXFCS on and off, it works fine without any
warning in both cases.

Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are 
friends"),
Cc: Saeed Mahameed 
Cc: Eric Dumazet 
Cc: Tariq Toukan 
Signed-off-by: Cong Wang 
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 624eed345b5d..1c153b8091da 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -732,6 +732,13 @@ static u8 get_ip_proto(struct sk_buff *skb, int 
network_depth, __be16 proto)
((struct ipv6hdr *)ip_p)->nexthdr;
 }
 
+static bool is_short_frame(struct sk_buff *skb, bool has_fcs)
+{
+   u32 frame_len = has_fcs ? skb->len - ETH_FCS_LEN : skb->len;
+
+   return frame_len <= ETH_ZLEN;
+}
+
 static inline void mlx5e_handle_csum(struct net_device *netdev,
 struct mlx5_cqe64 *cqe,
 struct mlx5e_rq *rq,
@@ -755,9 +762,22 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
goto csum_unnecessary;
 
if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
+   bool has_fcs = !!(netdev->features & NETIF_F_RXFCS);
+
if (unlikely(get_ip_proto(skb, network_depth, proto) == IPPROTO_SCTP))
goto csum_unnecessary;
 
+   /* CQE csum doesn't cover padding octets in short ethernet
+* frames. And the pad field is appended prior to calculating
+* and appending the FCS field.
+*
+* Detecting these padded frames requires to verify and parse
+* IP headers, so we simply force all those small frames to be
+* CHECKSUM_NONE even if they are not padded.
+*/
+   if (unlikely(is_short_frame(skb, has_fcs)))
+   goto csum_none;
+
skb->ip_summed = CHECKSUM_COMPLETE;
skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
if (network_depth > ETH_HLEN)
@@ -768,7 +788,7 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
skb->csum = csum_partial(skb->data + ETH_HLEN,
 network_depth - ETH_HLEN,
 skb->csum);
-   if (unlikely(netdev->features & NETIF_F_RXFCS))
+   if (unlikely(has_fcs))
skb->csum = csum_block_add(skb->csum,
   (__force __wsum)mlx5e_get_fcs(skb),
   skb->len - ETH_FCS_LEN);
-- 
2.19.1



[PATCH bpf-next v2 2/5] bpf: add BPF_PROG_TEST_RUN support for flow dissector

2018-12-03 Thread Stanislav Fomichev
The input is packet data, the output is struct bpf_flow_keys. This should
make it easy to test flow dissector programs without elaborate
setup.

Signed-off-by: Stanislav Fomichev 
---
 include/linux/bpf.h |  3 ++
 net/bpf/test_run.c  | 71 -
 net/core/filter.c   |  1 +
 3 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e82b7039fc66..7a572d15d5dd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -373,6 +373,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
  union bpf_attr __user *uattr);
 int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
  union bpf_attr __user *uattr);
+int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
+const union bpf_attr *kattr,
+union bpf_attr __user *uattr);
 
 /* an array of programs to be executed under rcu_lock.
  *
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index c89c22c49015..0368ee9eec14 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -16,12 +16,26 @@
 static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
 {
+   struct bpf_skb_data_end *cb;
+   struct sk_buff *skb;
u32 ret;
 
preempt_disable();
rcu_read_lock();
bpf_cgroup_storage_set(storage);
-   ret = BPF_PROG_RUN(prog, ctx);
+
+   switch (prog->type) {
+   case BPF_PROG_TYPE_FLOW_DISSECTOR:
+   skb = (struct sk_buff *)ctx;
+   cb = (struct bpf_skb_data_end *)skb->cb;
+   ret = __skb_flow_bpf_dissect(prog, ctx, &flow_keys_dissector,
+cb->qdisc_cb.flow_keys);
+   break;
+   default:
+   ret = BPF_PROG_RUN(prog, ctx);
+   break;
+   }
+
rcu_read_unlock();
preempt_enable();
 
@@ -220,3 +234,58 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
kfree(data);
return ret;
 }
+
+int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
+const union bpf_attr *kattr,
+union bpf_attr __user *uattr)
+{
+   u32 size = kattr->test.data_size_in;
+   u32 repeat = kattr->test.repeat;
+   struct bpf_flow_keys flow_keys;
+   struct bpf_skb_data_end *cb;
+   u32 retval, duration;
+   struct sk_buff *skb;
+   struct sock *sk;
+   void *data;
+   int ret;
+
+   if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
+   return -EINVAL;
+
+   data = bpf_test_init(kattr, size, NET_SKB_PAD + NET_IP_ALIGN,
+SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
+   if (IS_ERR(data))
+   return PTR_ERR(data);
+
+   sk = kzalloc(sizeof(*sk), GFP_USER);
+   if (!sk) {
+   kfree(data);
+   return -ENOMEM;
+   }
+   sock_net_set(sk, current->nsproxy->net_ns);
+   sock_init_data(NULL, sk);
+
+   skb = build_skb(data, 0);
+   if (!skb) {
+   kfree(data);
+   kfree(sk);
+   return -ENOMEM;
+   }
+   skb->sk = sk;
+
+   skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+   __skb_put(skb, size);
+   skb->protocol = eth_type_trans(skb,
+  current->nsproxy->net_ns->loopback_dev);
+   skb_reset_network_header(skb);
+
+   cb = (struct bpf_skb_data_end *)skb->cb;
+   cb->qdisc_cb.flow_keys = &flow_keys;
+   retval = bpf_test_run(prog, skb, repeat, &duration);
+
+   ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys),
+ retval, duration);
+   kfree_skb(skb);
+   kfree(sk);
+   return ret;
+}
diff --git a/net/core/filter.c b/net/core/filter.c
index bd0df75dc7b6..4eae6102399d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7657,6 +7657,7 @@ const struct bpf_verifier_ops flow_dissector_verifier_ops = {
 };
 
 const struct bpf_prog_ops flow_dissector_prog_ops = {
+   .test_run   = bpf_prog_test_run_flow_dissector,
 };
 
 int sk_detach_filter(struct sock *sk)
-- 
2.20.0.rc1.387.gf8505762e3-goog



[PATCH bpf-next v2 1/5] net/flow_dissector: move bpf case into __skb_flow_bpf_dissect

2018-12-03 Thread Stanislav Fomichev
This way, we can reuse it for flow dissector in BPF_PROG_TEST_RUN.

No functional changes.

Signed-off-by: Stanislav Fomichev 
---
 include/linux/skbuff.h|  5 +++
 net/core/flow_dissector.c | 85 +++
 2 files changed, 56 insertions(+), 34 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 73902acf2b71..c10b27bb0cab 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1215,6 +1215,11 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 }
 #endif
 
+struct bpf_flow_keys;
+bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
+   const struct sk_buff *skb,
+   struct flow_dissector *flow_dissector,
+   struct bpf_flow_keys *flow_keys);
 bool __skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
void *target_container,
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 2e8d91e54179..3c8a78decbc0 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -683,6 +683,41 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
}
 }
 
+bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
+   const struct sk_buff *skb,
+   struct flow_dissector *flow_dissector,
+   struct bpf_flow_keys *flow_keys)
+{
+   struct bpf_skb_data_end cb_saved;
+   struct bpf_skb_data_end *cb;
+   u32 result;
+
+   /* Note that even though the const qualifier is discarded
+* throughout the execution of the BPF program, all changes(the
+* control block) are reverted after the BPF program returns.
+* Therefore, __skb_flow_dissect does not alter the skb.
+*/
+
+   cb = (struct bpf_skb_data_end *)skb->cb;
+
+   /* Save Control Block */
+   memcpy(&cb_saved, cb, sizeof(cb_saved));
+   memset(cb, 0, sizeof(*cb));
+
+   /* Pass parameters to the BPF program */
+   memset(flow_keys, 0, sizeof(*flow_keys));
+   cb->qdisc_cb.flow_keys = flow_keys;
+   flow_keys->nhoff = skb_network_offset(skb);
+
+   bpf_compute_data_pointers((struct sk_buff *)skb);
+   result = BPF_PROG_RUN(prog, skb);
+
+   /* Restore state */
+   memcpy(cb, &cb_saved, sizeof(cb_saved));
+
+   return result == BPF_OK;
+}
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
 * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
@@ -714,7 +749,6 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector_key_vlan *key_vlan;
enum flow_dissect_ret fdret;
enum flow_dissector_key_id dissector_vlan = FLOW_DISSECTOR_KEY_MAX;
-   struct bpf_prog *attached = NULL;
int num_hdrs = 0;
u8 ip_proto = 0;
bool ret;
@@ -754,49 +788,32 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
  FLOW_DISSECTOR_KEY_BASIC,
  target_container);
 
-   rcu_read_lock();
if (skb) {
+   struct bpf_flow_keys flow_keys;
+   struct bpf_prog *attached = NULL;
+
+   rcu_read_lock();
+
if (skb->dev)
attached = rcu_dereference(dev_net(skb->dev)->flow_dissector_prog);
else if (skb->sk)
attached = rcu_dereference(sock_net(skb->sk)->flow_dissector_prog);
else
WARN_ON_ONCE(1);
-   }
-   if (attached) {
-   /* Note that even though the const qualifier is discarded
-* throughout the execution of the BPF program, all changes(the
-* control block) are reverted after the BPF program returns.
-* Therefore, __skb_flow_dissect does not alter the skb.
-*/
-   struct bpf_flow_keys flow_keys = {};
-   struct bpf_skb_data_end cb_saved;
-   struct bpf_skb_data_end *cb;
-   u32 result;
-
-   cb = (struct bpf_skb_data_end *)skb->cb;
-
-   /* Save Control Block */
-   memcpy(&cb_saved, cb, sizeof(cb_saved));
-   memset(cb, 0, sizeof(cb_saved));
 
-   /* Pass parameters to the BPF program */
-   cb->qdisc_cb.flow_keys = &flow_keys;
-   flow_keys.nhoff = nhoff;
-
-   bpf_compute_data_pointers((struct sk_buff *)skb);
-   result = BPF_PROG_RUN(attached, skb);
-
-   /* Restore state */
-   memcpy(cb, &cb_saved, sizeof(cb_saved));
-
-   __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
-target_container);
-   key_control->thoff = min_t(u16, key_control->thoff, skb->len);
+ 

[PATCH bpf-next v2 0/5] support flow dissector in BPF_PROG_TEST_RUN

2018-12-03 Thread Stanislav Fomichev
This patch series adds support for testing flow dissector BPF programs by
extending already existing BPF_PROG_TEST_RUN. The goal is to have a
packet as an input and `struct bpf_flow_key' as an output. That way
we can easily test flow dissector programs' behavior.
I've also modified existing test_progs.c test to do a simple flow
dissector run as well.

* first patch introduces new __skb_flow_bpf_dissect to simplify
  sharing between __skb_flow_bpf_dissect and BPF_PROG_TEST_RUN
* second patch adds actual BPF_PROG_TEST_RUN support
* third patch converts BPF flow dissector to thoff
* forth patch correctly caps nhoff and thoff returned from bpf flow
  dissector
* fifth patch adds example usage to the selftests

v2 changes:

* new patch to use thoff instead of nhoff in bpf flow dissector
* new patch to correctly cap thoff for BPF case
* add missing memset(flow_keys, 0, ...) to __skb_flow_bpf_dissect
* set test iterations to 10

Stanislav Fomichev (5):
  net/flow_dissector: move bpf case into __skb_flow_bpf_dissect
  bpf: add BPF_PROG_TEST_RUN support for flow dissector
  selftests/bpf: use thoff instead of nhoff in BPF flow dissector
  net/flow_dissector: correctly cap nhoff and thoff in case of BPF
  selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow
dissector

 include/linux/bpf.h   |  3 +
 include/linux/skbuff.h|  5 ++
 net/bpf/test_run.c| 71 ++-
 net/core/filter.c |  1 +
 net/core/flow_dissector.c | 88 ---
 tools/testing/selftests/bpf/Makefile  |  3 +
 tools/testing/selftests/bpf/bpf_flow.c| 36 
 .../selftests/bpf/flow_dissector_load.c   | 43 ++---
 .../selftests/bpf/flow_dissector_load.h   | 55 
 tools/testing/selftests/bpf/test_progs.c  | 78 +++-
 10 files changed, 289 insertions(+), 94 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/flow_dissector_load.h

-- 
2.20.0.rc1.387.gf8505762e3-goog



[PATCH bpf-next v2 4/5] net/flow_dissector: correctly cap nhoff and thoff in case of BPF

2018-12-03 Thread Stanislav Fomichev
We want to make sure that the following condition holds:
0 <= nhoff <= thoff <= skb->len

BPF program can set out-of-bounds nhoff and thoff, which is dangerous, see
recent commit d0c081b49137 ("flow_dissector: properly cap thoff field").

Signed-off-by: Stanislav Fomichev 
---
 net/core/flow_dissector.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index ac19da6f390b..bb1a54747d64 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -716,6 +716,10 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
/* Restore state */
memcpy(cb, &cb_saved, sizeof(cb_saved));
 
+   flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, 0, skb->len);
+   flow_keys->thoff = clamp_t(u16, flow_keys->thoff,
+  flow_keys->nhoff, skb->len);
+
return result == BPF_OK;
 }
 
@@ -808,8 +812,6 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 &flow_keys);
__skb_flow_bpf_to_target(&flow_keys, flow_dissector,
 target_container);
-   key_control->thoff = min_t(u16, key_control->thoff,
-  skb->len);
rcu_read_unlock();
return ret;
}
-- 
2.20.0.rc1.387.gf8505762e3-goog



[PATCH bpf-next v2 3/5] selftests/bpf: use thoff instead of nhoff in BPF flow dissector

2018-12-03 Thread Stanislav Fomichev
We are returning thoff from the flow dissector, not the nhoff. Pass
thoff along with nhoff to the bpf program (initially thoff == nhoff)
and expect the flow dissector to amend/return thoff, not nhoff.

This avoids confusion, when by the time bpf flow dissector exits,
nhoff == thoff, which doesn't make much sense (this is relevant
in the context of the next patch, where I add simple selftest
and manually construct expected flow_keys).

Signed-off-by: Stanislav Fomichev 
---
 net/core/flow_dissector.c  |  1 +
 tools/testing/selftests/bpf/bpf_flow.c | 36 --
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 3c8a78decbc0..ac19da6f390b 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -708,6 +708,7 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
memset(flow_keys, 0, sizeof(*flow_keys));
cb->qdisc_cb.flow_keys = flow_keys;
flow_keys->nhoff = skb_network_offset(skb);
+   flow_keys->thoff = flow_keys->nhoff;
 
bpf_compute_data_pointers((struct sk_buff *)skb);
result = BPF_PROG_RUN(prog, skb);
diff --git a/tools/testing/selftests/bpf/bpf_flow.c 
b/tools/testing/selftests/bpf/bpf_flow.c
index b9798f558ca7..284660f5aa95 100644
--- a/tools/testing/selftests/bpf/bpf_flow.c
+++ b/tools/testing/selftests/bpf/bpf_flow.c
@@ -70,18 +70,18 @@ static __always_inline void *bpf_flow_dissect_get_header(struct __sk_buff *skb,
 {
void *data_end = (void *)(long)skb->data_end;
void *data = (void *)(long)skb->data;
-   __u16 nhoff = skb->flow_keys->nhoff;
+   __u16 thoff = skb->flow_keys->thoff;
__u8 *hdr;
 
/* Verifies this variable offset does not overflow */
-   if (nhoff > (USHRT_MAX - hdr_size))
+   if (thoff > (USHRT_MAX - hdr_size))
return NULL;
 
-   hdr = data + nhoff;
+   hdr = data + thoff;
if (hdr + hdr_size <= data_end)
return hdr;
 
-   if (bpf_skb_load_bytes(skb, nhoff, buffer, hdr_size))
+   if (bpf_skb_load_bytes(skb, thoff, buffer, hdr_size))
return NULL;
 
return buffer;
@@ -158,13 +158,13 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
/* Only inspect standard GRE packets with version 0 */
return BPF_OK;
 
-   keys->nhoff += sizeof(*gre); /* Step over GRE Flags and Proto */
+   keys->thoff += sizeof(*gre); /* Step over GRE Flags and Proto */
if (GRE_IS_CSUM(gre->flags))
-   keys->nhoff += 4; /* Step over chksum and Padding */
+   keys->thoff += 4; /* Step over chksum and Padding */
if (GRE_IS_KEY(gre->flags))
-   keys->nhoff += 4; /* Step over key */
+   keys->thoff += 4; /* Step over key */
if (GRE_IS_SEQ(gre->flags))
-   keys->nhoff += 4; /* Step over sequence number */
+   keys->thoff += 4; /* Step over sequence number */
 
keys->is_encap = true;
 
@@ -174,7 +174,7 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
if (!eth)
return BPF_DROP;
 
-   keys->nhoff += sizeof(*eth);
+   keys->thoff += sizeof(*eth);
 
return parse_eth_proto(skb, eth->h_proto);
} else {
@@ -191,7 +191,6 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
if ((__u8 *)tcp + (tcp->doff << 2) > data_end)
return BPF_DROP;
 
-   keys->thoff = keys->nhoff;
keys->sport = tcp->source;
keys->dport = tcp->dest;
return BPF_OK;
@@ -201,7 +200,6 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
if (!udp)
return BPF_DROP;
 
-   keys->thoff = keys->nhoff;
keys->sport = udp->source;
keys->dport = udp->dest;
return BPF_OK;
@@ -252,8 +250,8 @@ PROG(IP)(struct __sk_buff *skb)
keys->ipv4_src = iph->saddr;
keys->ipv4_dst = iph->daddr;
 
-   keys->nhoff += iph->ihl << 2;
-   if (data + keys->nhoff > data_end)
+   keys->thoff += iph->ihl << 2;
+   if (data + keys->thoff > data_end)
return BPF_DROP;
 
if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
@@ -285,7 +283,7 @@ PROG(IPV6)(struct __sk_buff *skb)
keys->addr_proto = ETH_P_IPV6;
memcpy(&keys->ipv6_src, &ip6h->saddr, 2*sizeof(ip6h->saddr));
 
-   keys->nhoff += sizeof(struct ipv6hdr);
+   keys->thoff += sizeof(struct ipv6hdr);
 
return parse_ipv6_proto(skb, ip6h->nexthdr);
 }
@@ -301,7 +299,7 @@ PROG(IPV6OP)(struct __sk_buff 

[PATCH bpf-next v2 5/5] selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow dissector

2018-12-03 Thread Stanislav Fomichev
Use existing pkt_v4 and pkt_v6 to make sure flow_keys are what we want.

Also, add new bpf_flow_load routine (and flow_dissector_load.h header)
that loads bpf_flow.o program and does all required setup.

Signed-off-by: Stanislav Fomichev 
---
 tools/testing/selftests/bpf/Makefile  |  3 +
 .../selftests/bpf/flow_dissector_load.c   | 43 ++
 .../selftests/bpf/flow_dissector_load.h   | 55 +
 tools/testing/selftests/bpf/test_progs.c  | 78 ++-
 4 files changed, 139 insertions(+), 40 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/flow_dissector_load.h

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 73aa6d8f4a2f..387763e6d137 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -19,6 +19,9 @@ all: $(TEST_CUSTOM_PROGS)
 $(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c
$(CC) -o $(TEST_CUSTOM_PROGS) -static $< -Wl,--build-id
 
+flow_dissector_load.c: flow_dissector_load.h
+test_run.c: flow_dissector_load.h
+
 # Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs \
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
diff --git a/tools/testing/selftests/bpf/flow_dissector_load.c b/tools/testing/selftests/bpf/flow_dissector_load.c
index ae8180b11d5f..77cafa66d048 100644
--- a/tools/testing/selftests/bpf/flow_dissector_load.c
+++ b/tools/testing/selftests/bpf/flow_dissector_load.c
@@ -12,6 +12,7 @@
 #include 
 
 #include "bpf_rlimit.h"
+#include "flow_dissector_load.h"
 
 const char *cfg_pin_path = "/sys/fs/bpf/flow_dissector";
 const char *cfg_map_name = "jmp_table";
@@ -21,46 +22,13 @@ char *cfg_path_name;
 
 static void load_and_attach_program(void)
 {
-   struct bpf_program *prog, *main_prog;
-   struct bpf_map *prog_array;
-   int i, fd, prog_fd, ret;
+   int prog_fd, ret;
struct bpf_object *obj;
-   int prog_array_fd;
 
-   ret = bpf_prog_load(cfg_path_name, BPF_PROG_TYPE_FLOW_DISSECTOR, &obj,
-   &prog_fd);
+   ret = bpf_flow_load(&obj, cfg_path_name, cfg_section_name,
+   cfg_map_name, &prog_fd);
if (ret)
-   error(1, 0, "bpf_prog_load %s", cfg_path_name);
-
-   main_prog = bpf_object__find_program_by_title(obj, cfg_section_name);
-   if (!main_prog)
-   error(1, 0, "bpf_object__find_program_by_title %s",
- cfg_section_name);
-
-   prog_fd = bpf_program__fd(main_prog);
-   if (prog_fd < 0)
-   error(1, 0, "bpf_program__fd");
-
-   prog_array = bpf_object__find_map_by_name(obj, cfg_map_name);
-   if (!prog_array)
-   error(1, 0, "bpf_object__find_map_by_name %s", cfg_map_name);
-
-   prog_array_fd = bpf_map__fd(prog_array);
-   if (prog_array_fd < 0)
-   error(1, 0, "bpf_map__fd %s", cfg_map_name);
-
-   i = 0;
-   bpf_object__for_each_program(prog, obj) {
-   fd = bpf_program__fd(prog);
-   if (fd < 0)
-   error(1, 0, "bpf_program__fd");
-
-   if (fd != prog_fd) {
-   printf("%d: %s\n", i, bpf_program__title(prog, false));
-   bpf_map_update_elem(prog_array_fd, &i, &fd, BPF_ANY);
-   ++i;
-   }
-   }
+   error(1, 0, "bpf_flow_load %s", cfg_path_name);
 
ret = bpf_prog_attach(prog_fd, 0 /* Ignore */, BPF_FLOW_DISSECTOR, 0);
if (ret)
@@ -69,7 +37,6 @@ static void load_and_attach_program(void)
ret = bpf_object__pin(obj, cfg_pin_path);
if (ret)
error(1, 0, "bpf_object__pin %s", cfg_pin_path);
-
 }
 
 static void detach_program(void)
diff --git a/tools/testing/selftests/bpf/flow_dissector_load.h b/tools/testing/selftests/bpf/flow_dissector_load.h
new file mode 100644
index ..41dd6959feb0
--- /dev/null
+++ b/tools/testing/selftests/bpf/flow_dissector_load.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) */
+#ifndef FLOW_DISSECTOR_LOAD
+#define FLOW_DISSECTOR_LOAD
+
+#include 
+#include 
+
+static inline int bpf_flow_load(struct bpf_object **obj,
+   const char *path,
+   const char *section_name,
+   const char *map_name,
+   int *prog_fd)
+{
+   struct bpf_program *prog, *main_prog;
+   struct bpf_map *prog_array;
+   int prog_array_fd;
+   int ret, fd, i;
+
+   ret = bpf_prog_load(path, BPF_PROG_TYPE_FLOW_DISSECTOR, obj,
+   prog_fd);
+   if (ret)
+   return ret;
+
+   main_prog = bpf_object__find_program_by_title(*obj, section_name);
+   if (!main_prog)
+   return ret;
+
+   *prog_fd = bpf_program__fd(main_prog);
+   if (*prog_fd < 0)
+

Re: [PATCH bpf-next 2/3] bpf: add BPF_PROG_TEST_RUN support for flow dissector

2018-12-03 Thread Stanislav Fomichev
On 12/03, Stanislav Fomichev wrote:
> On 12/03, Song Liu wrote:
> > On Mon, Dec 3, 2018 at 11:00 AM Stanislav Fomichev  wrote:
> > >
> > > The input is packet data, the output is struct bpf_flow_key. This should
> > > make it easy to test flow dissector programs without elaborate
> > > setup.
> > >
> > > Signed-off-by: Stanislav Fomichev 
> > > ---
> > >  include/linux/bpf.h |  3 ++
> > >  net/bpf/test_run.c  | 76 +
> > >  net/core/filter.c   |  1 +
> > >  3 files changed, 74 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index e82b7039fc66..7a572d15d5dd 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -373,6 +373,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, 
> > > const union bpf_attr *kattr,
> > >   union bpf_attr __user *uattr);
> > >  int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr 
> > > *kattr,
> > >   union bpf_attr __user *uattr);
> > > +int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > > +const union bpf_attr *kattr,
> > > +union bpf_attr __user *uattr);
> > >
> > >  /* an array of programs to be executed under rcu_lock.
> > >   *
> > > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > > index c89c22c49015..bfa05d31c6e3 100644
> > > --- a/net/bpf/test_run.c
> > > +++ b/net/bpf/test_run.c
> > > @@ -14,21 +14,33 @@
> > >  #include 
> > >
> > >  static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void 
> > > *ctx,
> > > -   struct bpf_cgroup_storage 
> > > *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
> > > +   struct bpf_cgroup_storage 
> > > *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
> > > +   struct bpf_flow_keys *flow_keys)
> > >  {
> > > u32 ret;
> > >
> > > preempt_disable();
> > > rcu_read_lock();
> > > bpf_cgroup_storage_set(storage);
> > > -   ret = BPF_PROG_RUN(prog, ctx);
> > > +
> > > +   switch (prog->type) {
> > > +   case BPF_PROG_TYPE_FLOW_DISSECTOR:
> > > +   ret = __skb_flow_bpf_dissect(prog, ctx, 
> > > &flow_keys_dissector,
> > > +flow_keys);
> > > +   break;
> > > +   default:
> > > +   ret = BPF_PROG_RUN(prog, ctx);
> > > +   break;
> > > +   }
> > > +
> > 
> > Is it possible to fold the logic above into 
> > bpf_prog_test_run_flow_dissector()?
> > In that way, the logic flow is similar to other bpf_prog_test_run_XXX()
> > functions.
> I can probably do everything that __skb_flow_bpf_dissect does inside of
> bpf_prog_test_run_flow_dissector (pass flow_keys in cb); that would remove
> this ugly program type check down the stack. But my additional goal here was
> to test __skb_flow_bpf_dissect itself as well.
> 
> Attached some possible prototype below. The only issue I see with that
> approach is that we need to clear flow_keys on each test iteration.
> I think in my current patch I'm actually missing a memset(&flow_keys, 0, ...)
> for each iteration of __skb_flow_bpf_dissect;
> I haven't tested multiple runs :-/
> 
> So I'd prefer to continue doing this prog type check (plus, add a missing
> memset for flow_keys on each iteration in v2). WDYT?
I think I found another problem with not using __skb_flow_bpf_dissect:
I want to add some clamping to the thoff/nhoff returned by the bpf
program (to be safe); it's probably better to keep all this logic in one
place.

I can probably pass flow_keys in the skb->cb, so I don't have to add
flow_keys to the bpf_test_run_one/bpf_test_run signatures. I'll send v2
shortly, let me know if it addresses your point.

> --- 
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e82b7039fc66..7a572d15d5dd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -373,6 +373,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
> union bpf_attr *kattr,
> union bpf_attr __user *uattr);
>  int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> union bpf_attr __user *uattr);
> +int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> +  const union bpf_attr *kattr,
> +  union bpf_attr __user *uattr);
>  
>  /* an array of programs to be executed under rcu_lock.
>   *
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index c89c22c49015..04387ef1f95a 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -220,3 +220,60 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
> union bpf_attr *kattr,
>   kfree(data);
>   return ret;
>  }
> +
> +int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> +  const union bpf_attr *kattr,
> +   

[PATCH net-next] net: netem: use a list in addition to rbtree

2018-12-03 Thread Peter Oskolkov
When testing high-bandwidth TCP streams with large windows,
high latency, and low jitter, netem consumes a lot of CPU cycles
doing rbtree rebalancing.

This patch uses a linear list/queue in addition to the rbtree:
if an incoming packet is past the tail of the linear queue, it is
added there, otherwise it is inserted into the rbtree.

Without this patch, perf shows netem_enqueue, netem_dequeue,
and rb_* functions among the top offenders. With this patch,
only netem_enqueue is noticeable if jitter is low/absent.

Suggested-by: Eric Dumazet 
Signed-off-by: Peter Oskolkov 
---
 net/sched/sch_netem.c | 86 ++-
 1 file changed, 68 insertions(+), 18 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 2c38e3d07924..f1099486ecd3 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -77,6 +77,10 @@ struct netem_sched_data {
/* internal t(ime)fifo qdisc uses t_root and sch->limit */
struct rb_root t_root;
 
+   /* a linear queue; reduces rbtree rebalancing when jitter is low */
+   struct sk_buff  *t_head;
+   struct sk_buff  *t_tail;
+
/* optional qdisc for classful handling (NULL at netem init) */
struct Qdisc*qdisc;
 
@@ -369,26 +373,39 @@ static void tfifo_reset(struct Qdisc *sch)
rb_erase(&skb->rbnode, &q->t_root);
rtnl_kfree_skbs(skb, skb);
}
+
+   rtnl_kfree_skbs(q->t_head, q->t_tail);
+   q->t_head = NULL;
+   q->t_tail = NULL;
 }
 
 static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 {
struct netem_sched_data *q = qdisc_priv(sch);
u64 tnext = netem_skb_cb(nskb)->time_to_send;
-   struct rb_node **p = &q->t_root.rb_node, *parent = NULL;
 
-   while (*p) {
-   struct sk_buff *skb;
-
-   parent = *p;
-   skb = rb_to_skb(parent);
-   if (tnext >= netem_skb_cb(skb)->time_to_send)
-   p = &parent->rb_right;
+   if (!q->t_tail || tnext >= netem_skb_cb(q->t_tail)->time_to_send) {
+   if (q->t_tail)
+   q->t_tail->next = nskb;
else
-   p = &parent->rb_left;
+   q->t_head = nskb;
+   q->t_tail = nskb;
+   } else {
+   struct rb_node **p = &q->t_root.rb_node, *parent = NULL;
+
+   while (*p) {
+   struct sk_buff *skb;
+
+   parent = *p;
+   skb = rb_to_skb(parent);
+   if (tnext >= netem_skb_cb(skb)->time_to_send)
+   p = &parent->rb_right;
+   else
+   p = &parent->rb_left;
+   }
+   rb_link_node(&nskb->rbnode, parent, p);
+   rb_insert_color(&nskb->rbnode, &q->t_root);
}
-   rb_link_node(&nskb->rbnode, parent, p);
-   rb_insert_color(&nskb->rbnode, &q->t_root);
sch->q.qlen++;
 }
 
@@ -534,6 +551,15 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
last = t_last;
}
}
+   if (q->t_tail) {
+   struct netem_skb_cb *t_last =
+   netem_skb_cb(q->t_tail);
+
+   if (!last ||
+   t_last->time_to_send > last->time_to_send) {
+   last = t_last;
+   }
+   }
 
if (last) {
/*
@@ -611,11 +637,37 @@ static void get_slot_next(struct netem_sched_data *q, u64 now)
q->slot.bytes_left = q->slot_config.max_bytes;
 }
 
+static struct sk_buff *netem_peek(struct netem_sched_data *q)
+{
+   struct sk_buff *skb = skb_rb_first(&q->t_root);
+   u64 t1, t2;
+
+   if (!skb)
+   return q->t_head;
+   if (!q->t_head)
+   return skb;
+
+   t1 = netem_skb_cb(skb)->time_to_send;
+   t2 = netem_skb_cb(q->t_head)->time_to_send;
+   if (t1 < t2)
+   return skb;
+   return q->t_head;
+}
+
+static void netem_erase_head(struct netem_sched_data *q, struct sk_buff *skb)
+{
+   if (skb == q->t_head) {
+   q->t_head = skb->next;
+   if (!q->t_head)
+   q->t_tail = NULL;
+   } else
+   rb_erase(&skb->rbnode, &q->t_root);
+}
+
 static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 {
struct netem_sched_data *q = qdisc_priv(sch);
struct sk_buff *skb;
-   struct rb_node *p;
 
 tfifo_dequeue:
skb = __qdisc_dequeue_head(&sch->q);
@@ -625,20 +677,18 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
qdisc_bstats_update(sch, skb);
return skb;
}
-   p = rb_first(&q->t_root);
-   if (p) {
+   skb = netem_peek(q);
+   if 

Re: consistency for statistics with XDP mode

2018-12-03 Thread David Miller
From: David Ahern 
Date: Mon, 3 Dec 2018 17:15:03 -0700

> So, instead of a program tag which the program writer controls, how
> about some config knob that an admin controls that says at attach time
> use standard stats?

How about, instead of replacing, it is in addition to, and the admin can
override?

I'm all for choice so how can I object? :)


Re: [PATCH net-next 0/2] mlx4_core cleanups

2018-12-03 Thread David Miller
From: Tariq Toukan 
Date: Sun,  2 Dec 2018 17:40:24 +0200

> This patchset by Erez contains cleanups to the mlx4_core driver.
> 
> Patch 1 replaces -EINVAL with -EOPNOTSUPP for unsupported operations.
> Patch 2 fixes some coding style issues.
> 
> Series generated against net-next commit:
> 97e6c858a26e net: usb: aqc111: Initialize wol_cfg with memset in 
> aqc111_suspend

Series applied, thanks.


Re: [PATCH net-next v2] net: phy: Also request modules for C45 IDs

2018-12-03 Thread David Miller
From: Jose Abreu 
Date: Sun,  2 Dec 2018 16:33:14 +0100

> Logic of phy_device_create() requests PHY modules according to PHY ID
> but for C45 PHYs we use different field for the IDs.
> 
> Let's also request the modules for these IDs.
> 
> Changes from v1:
> - Only request C22 modules if C45 are not present (Andrew)
> 
> Signed-off-by: Jose Abreu 

Applied, thanks Jose.

Florian, just for the record, I actually like the changelogs to be in
the commit messages.  It can help people understand that something
was deliberately implemented a certain way and alternative approaches
were considered.


Re: [PATCH net-next v2 00/14] octeontx2-af: NIX and NPC enhancements

2018-12-03 Thread David Miller
From: Jerin Jacob 
Date: Sun,  2 Dec 2018 18:17:35 +0530

> This patchset is a continuation to earlier submitted four patch
> series to add a new driver for Marvell's OcteonTX2 SOC's
> Resource virtualization unit (RVU) admin function driver.
> 
> 1. octeontx2-af: Add RVU Admin Function driver
>https://www.spinics.net/lists/netdev/msg528272.html
> 2. octeontx2-af: NPA and NIX blocks initialization
>https://www.spinics.net/lists/netdev/msg529163.html
> 3. octeontx2-af: NPC parser and NIX blocks initialization
>https://www.spinics.net/lists/netdev/msg530252.html
> 4. octeontx2-af: NPC MCAM support and FLR handling
>https://www.spinics.net/lists/netdev/msg534392.html
> 
> This patch series adds support for the following:
> 
> NPC block:
> - Add NPC(mkex) profile support for various Key extraction configurations
> 
> NIX block:
> - Enable dynamic RSS flow key algorithm configuration
> - Enhancements on Rx checksum and error checks
> - Add support for Tx packet marking support
> - TL1 schedule queue allocation enhancements
> - Add LSO format configuration mbox
> - VLAN TPID configuration
> - Skip multicast entry init for broadcast tables
 ...

Series applied, thanks.


Re: [PATCH v2 bpf] mips: bpf: fix encoding bug for mm_srlv32_op

2018-12-03 Thread Jakub Kicinski
On Tue, 4 Dec 2018 00:06:24 +, Paul Burton wrote:
> If you have related patches the best thing to do would be to submit them
> together as a series. Then, once the maintainers involved can see the
> patches, we can figure out the best way to apply them.

Right, in hindsight that could've worked better, but for netdev/bpf
patches posting fixes and features in one series is a no-no :)

I guess the best way forward would be for Jiong to post the dependent
set (BPF_ALU | BPF_ARSH support) as an RFC and then decide.  The
conflict will be trivial, yet avoidable if we wait for this to
propagate to bpf-next.


Re: [PATCH net 0/2] mlx4 fixes for 4.20-rc

2018-12-03 Thread David Miller
From: Tariq Toukan 
Date: Sun,  2 Dec 2018 14:34:35 +0200

> This patchset includes small fixes for the mlx4_en driver.
> 
> First patch by Eran fixes the value used to init the netdevice's
> min_mtu field.
> Please queue it to -stable >= v4.10.
> 
> Second patch by Saeed adds missing Kconfig build dependencies.
> 
> Series generated against net commit:
> 35b827b6d061 tun: forbid iface creation with rtnl ops

Series applied and patch #1 queued up for -stable, thanks.


Re: consistency for statistics with XDP mode

2018-12-03 Thread David Ahern
On 12/3/18 5:00 PM, David Miller wrote:
> From: Toke Høiland-Jørgensen 
> Date: Mon, 03 Dec 2018 22:00:32 +0200
> 
>> I wonder if it would be possible to support both the "give me user
>> normal stats" case and the "let me do whatever I want" case by a
>> combination of userspace tooling and maybe a helper or two?
>>
>> I.e., create a "do_stats()" helper (please pick a better name), which
>> will either just increment the desired counters, or set a flag so the
>> driver can do it at napi poll exit. With this, the userspace tooling
>> could have a "--give-me-normal-stats" switch (or some other interface),
>> which would inject a call instruction to that helper at the start of the
>> program.
>>
>> This would enable the normal counters in a relatively painless way,
>> while still letting people opt out if they don't want to pay the cost in
>> terms of overhead. And having the userspace tooling inject the helper
>> call helps support the case where the admin didn't write the XDP
>> programs being loaded.
>>
>> Any reason why that wouldn't work?
> 
> I think this is a good idea, or even an attribute tag that gets added
> to the XDP program that controls stats handling.
> 

My argument is that the ebpf program writer should *not* get that
choice; the admin of the box should. Program writers make mistakes. Box
admins / customer support are the ones that have to deal with those
mistakes. Program writers - especially for xdp - are going to be focused
on benchmarks; admins are focused on the big picture and should be given
the option of trading a small amount of performance for simpler management.

So, instead of a program tag which the program writer controls, how
about some config knob that an admin controls that says at attach time
use standard stats?


Re: [PATCH v2 bpf] mips: bpf: fix encoding bug for mm_srlv32_op

2018-12-03 Thread Paul Burton
Hi Jakub,

On Mon, Dec 03, 2018 at 03:55:45PM -0800, Jakub Kicinski wrote:
> On Mon, 3 Dec 2018 22:42:04 +, Paul Burton wrote:
> > Jiong Wang wrote:
> > > For micro-mips, srlv inside POOL32A encoding space should use 0x50
> > > sub-opcode, NOT 0x90.
> > > 
> > > Some early version ISA doc describes the encoding as 0x90 for both srlv 
> > > and
> > > srav, this looks to me like a typo. I checked the Binutils libopcode
> > > implementation which is using 0x50 for srlv and 0x90 for srav.
> > > 
> > > v1->v2:
> > > - Keep mm_srlv32_op sorted by value.
> > > 
> > > Fixes: f31318fdf324 ("MIPS: uasm: Add srlv uasm instruction")
> > > Cc: Markos Chandras 
> > > Cc: Paul Burton 
> > > Cc: linux-m...@vger.kernel.org
> > > Acked-by: Jakub Kicinski 
> > > Acked-by: Song Liu 
> > > Signed-off-by: Jiong Wang   
> > 
> > Applied to mips-fixes.
> 
> Newbie process related question - are the arch JIT patches routed via
> arch trees or bpf-next?  Jiong has more (slightly conflicting) JIT
> patches to send - I wonder how they'll get applied and whether to wait
> for the mips -> Linus -> net -> bpf merge chain.

I'd expect that to be a case-by-case "what makes most sense this time?"
sort of question.

In this particular patch the code you're changing isn't specifically
BPF-related code, it's part of the MIPS uasm assembler which MIPS BPF
happens to use behind the scenes, so since it seemed like a pretty
standalone patch taking it through the MIPS tree made sense to me.

If you have related patches the best thing to do would be to submit them
together as a series. Then, once the maintainers involved can see the
patches, we can figure out the best way to apply them.

Thanks,
Paul


Re: [iproute2-next PATCH v6] tc: flower: Classify packets based port ranges

2018-12-03 Thread David Ahern
On 12/3/18 4:58 PM, Nambiar, Amritha wrote:
> A previous version v3 of this patch was already applied to iproute2-next.
> https://patchwork.ozlabs.org/patch/998644/
> 
> I think that needs to be reverted for this v6 to apply clean.

ugh. That's embarrassing. Looks like I inadvertently pushed the older
one. Reverted and applied. Thanks,


Re: [PATCH net] macvlan: return correct error value

2018-12-03 Thread David Miller
From: Matteo Croce 
Date: Sat,  1 Dec 2018 00:26:27 +0100

> A MAC address must be unique among all the macvlan devices with the same
> lower device. The only exception is the passthru [sic] mode,
> which shares the lower device address.
> 
> When duplicate addresses are detected, EBUSY is returned when bringing
> the interface up:
> 
> # ip link add macvlan0 link eth0 type macvlan
> # read addr < /sys/class/net/eth0/address
> # ip link set macvlan0 address $addr
> # ip link set macvlan0 up
> RTNETLINK answers: Device or resource busy
> 
> Use correct error code which is EADDRINUSE, and do the check also
> earlier, on address change:
> 
> # ip link set macvlan0 address $addr
> RTNETLINK answers: Address already in use
> 
> Signed-off-by: Matteo Croce 

Applied, thanks Matteo.


Re: consistency for statistics with XDP mode

2018-12-03 Thread David Miller
From: Toke Høiland-Jørgensen 
Date: Mon, 03 Dec 2018 22:00:32 +0200

> I wonder if it would be possible to support both the "give me user
> normal stats" case and the "let me do whatever I want" case by a
> combination of userspace tooling and maybe a helper or two?
> 
> I.e., create a "do_stats()" helper (please pick a better name), which
> will either just increment the desired counters, or set a flag so the
> driver can do it at napi poll exit. With this, the userspace tooling
> could have a "--give-me-normal-stats" switch (or some other interface),
> which would inject a call instruction to that helper at the start of the
> program.
> 
> This would enable the normal counters in a relatively painless way,
> while still letting people opt out if they don't want to pay the cost in
> terms of overhead. And having the userspace tooling inject the helper
> call helps support the case where the admin didn't write the XDP
> programs being loaded.
> 
> Any reason why that wouldn't work?

I think this is a good idea, or even an attribute tag that gets added
to the XDP program that controls stats handling.


Re: [PATCH net-next v4 0/3] udp msg_zerocopy

2018-12-03 Thread David Miller
From: Willem de Bruijn 
Date: Fri, 30 Nov 2018 15:32:38 -0500

> Enable MSG_ZEROCOPY for udp sockets

Series applied, thanks for keeping up with this.


Re: [iproute2-next PATCH v6] tc: flower: Classify packets based port ranges

2018-12-03 Thread Nambiar, Amritha
On 12/3/2018 3:52 PM, David Ahern wrote:
> On 11/27/18 3:40 PM, Amritha Nambiar wrote:
>> Added support for filtering based on port ranges.
>> UAPI changes have been accepted into net-next.
>>
>> Example:
>> 1. Match on a port range:
>> -
>> $ tc filter add dev enp4s0 protocol ip parent ffff:\
>>   prio 1 flower ip_proto tcp dst_port 20-30 skip_hw\
>>   action drop
>>
>> $ tc -s filter show dev enp4s0 parent ffff:
>> filter protocol ip pref 1 flower chain 0
>> filter protocol ip pref 1 flower chain 0 handle 0x1
>>   eth_type ipv4
>>   ip_proto tcp
>>   dst_port 20-30
>>   skip_hw
>>   not_in_hw
>> action order 1: gact action drop
>>  random type none pass val 0
>>  index 1 ref 1 bind 1 installed 85 sec used 3 sec
>> Action statistics:
>> Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>>
>> 2. Match on IP address and port range:
>> --
>> $ tc filter add dev enp4s0 protocol ip parent ffff:\
>>   prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port 100-200\
>>   skip_hw action drop
>>
>> $ tc -s filter show dev enp4s0 parent ffff:
>> filter protocol ip pref 1 flower chain 0 handle 0x2
>>   eth_type ipv4
>>   ip_proto tcp
>>   dst_ip 192.168.1.1
>>   dst_port 100-200
>>   skip_hw
>>   not_in_hw
>> action order 1: gact action drop
>>  random type none pass val 0
>>  index 2 ref 1 bind 1 installed 58 sec used 2 sec
>> Action statistics:
>> Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>>
>> v6:
>> Modified to change json output format as object for sport/dport.
>>
>>  "dst_port":{
>>"start":2000,
>>"end":6000
>>  },
>>  "src_port":{
>>"start":50,
>>"end":60
>>  }
>>
>> v5:
>> Simplified some code and used 'sscanf' for parsing. Removed
>> space in output format.
>>
>> v4:
>> Added man updates explaining filtering based on port ranges.
>> Removed 'range' keyword.
>>
>> v3:
>> Modified flower_port_range_attr_type calls.
>>
>> v2:
>> Addressed Jiri's comment to sync output format with input
>>
>> Signed-off-by: Amritha Nambiar 
>> ---
>>  man/man8/tc-flower.8 |   13 +---
>>  tc/f_flower.c|   85 
>> +-
>>  2 files changed, 84 insertions(+), 14 deletions(-)
>>
> 
> sorry for the delay; fell through the cracks. It no longer applies to
> net-next. Please update. Thanks,
> 

A previous version v3 of this patch was already applied to iproute2-next.
https://patchwork.ozlabs.org/patch/998644/

I think that needs to be reverted for this v6 to apply clean.


Re: [PATCH 0/3] net: macb: DMA race condition fixes

2018-12-03 Thread David Miller
From: 
Date: Mon, 3 Dec 2018 08:26:52 +

> Can you please delay the acceptance of this series a bit? I would like
> us to assess these findings with tests on our hardware before applying
> them.

Sure.


Re: [PATCH net] sctp: kfree_rcu asoc

2018-12-03 Thread David Miller
From: Xin Long 
Date: Sat,  1 Dec 2018 01:36:59 +0800

> In sctp_hash_transport/sctp_epaddr_lookup_transport, a transport's asoc
> is dereferenced under rcu_read_lock while the asoc may be freed without
> waiting for a grace period, which leads to a use-after-free panic.
> 
> This patch fixes it by calling kfree_rcu to make asoc be freed after
> a grace period.
> 
> Note that only the asoc's memory is delayed to free in the patch, it
> won't cause sk to linger longer.
> 
> Thanks to Neil and Marcelo for making this clear.
> 
> Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport 
> rhashtable")
> Fixes: cd2b70875058 ("sctp: check duplicate node before inserting a new 
> transport")
> Reported-by: syzbot+0b05d8aa7cb185107...@syzkaller.appspotmail.com
> Reported-by: syzbot+aad231d51b1923158...@syzkaller.appspotmail.com
> Suggested-by: Neil Horman 
> Signed-off-by: Xin Long 

Applied and queued up for -stable, thanks.


Re: [PATCH v2 bpf] mips: bpf: fix encoding bug for mm_srlv32_op

2018-12-03 Thread Jakub Kicinski
On Mon, 3 Dec 2018 22:42:04 +, Paul Burton wrote:
> Jiong Wang wrote:
> > For micro-mips, srlv inside POOL32A encoding space should use 0x50
> > sub-opcode, NOT 0x90.
> > 
> > Some early version ISA doc describes the encoding as 0x90 for both srlv and
> > srav, this looks to me like a typo. I checked the Binutils libopcode
> > implementation which is using 0x50 for srlv and 0x90 for srav.
> > 
> > v1->v2:
> > - Keep mm_srlv32_op sorted by value.
> > 
> > Fixes: f31318fdf324 ("MIPS: uasm: Add srlv uasm instruction")
> > Cc: Markos Chandras 
> > Cc: Paul Burton 
> > Cc: linux-m...@vger.kernel.org
> > Acked-by: Jakub Kicinski 
> > Acked-by: Song Liu 
> > Signed-off-by: Jiong Wang   
> 
> Applied to mips-fixes.

Newbie process related question - are the arch JIT patches routed via
arch trees or bpf-next?  Jiong has more (slightly conflicting) JIT
patches to send - I wonder how they'll get applied and whether to wait
for the mips -> Linus -> net -> bpf merge chain.


Re: [RFC PATCH 4/6] dt-bindings: update mvneta binding document

2018-12-03 Thread Rob Herring
On Mon, Nov 12, 2018 at 12:31:02PM +, Russell King wrote:
> Signed-off-by: Russell King 

Needs a better subject and a commit msg.

> ---
>  Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git 
> a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt 
> b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
> index bedcfd5a52cd..691f886cfc4a 100644
> --- a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
> +++ b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
> @@ -19,7 +19,7 @@
>"marvell,armada-370-neta" and 9800B for others.
>  - clock-names: List of names corresponding to clocks property; shall be
>"core" for core clock and "bus" for the optional bus clock.
> -
> +- phys: comphy for the ethernet port, see ../phy/phy-bindings.txt
>  
>  Optional properties (valid only for Armada XP/38x):
>  
> -- 
> 2.7.4
> 


Re: [PATCH net] net/ibmvnic: Fix RTNL deadlock during device reset

2018-12-03 Thread David Miller
From: Thomas Falcon 
Date: Fri, 30 Nov 2018 10:59:08 -0600

> Commit a5681e20b541 ("net/ibmnvic: Fix deadlock problem
> in reset") made the change to hold the RTNL lock during
> driver reset but still calls netdev_notify_peers, which
> results in a deadlock. Instead, use call_netdevice_notifiers,
> which is functionally the same except that it does not
> take the RTNL lock again.
> 
> Fixes: a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset")
> 
> Signed-off-by: Thomas Falcon 

Applied.


Re: [RFC PATCH 1/6] dt-bindings: phy: Armada 38x common phy bindings

2018-12-03 Thread Rob Herring
On Mon, 12 Nov 2018 12:30:47 +, Russell King wrote:
> Add the Marvell Armada 38x common phy bindings.
> 
> Signed-off-by: Russell King 
> ---
>  .../bindings/phy/phy-armada38x-comphy.txt  | 40 
> ++
>  1 file changed, 40 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/phy/phy-armada38x-comphy.txt
> 

Reviewed-by: Rob Herring 


Re: [PATCH net] rtnetlink: Refine sanity checks in rtnl_fdb_{add|del}

2018-12-03 Thread David Miller
From: Ido Schimmel 
Date: Fri, 30 Nov 2018 19:00:24 +0200

> Yes, agree. Patch is good. I'll tag your v2.

This means, I assume, that a new version of this fix is coming.

Eric, is this correct?


Re: [iproute2-next PATCH v6] tc: flower: Classify packets based port ranges

2018-12-03 Thread David Ahern
On 11/27/18 3:40 PM, Amritha Nambiar wrote:
> Added support for filtering based on port ranges.
> UAPI changes have been accepted into net-next.
> 
> Example:
> 1. Match on a port range:
> -
> $ tc filter add dev enp4s0 protocol ip parent ffff:\
>   prio 1 flower ip_proto tcp dst_port 20-30 skip_hw\
>   action drop
> 
> $ tc -s filter show dev enp4s0 parent ffff:
> filter protocol ip pref 1 flower chain 0
> filter protocol ip pref 1 flower chain 0 handle 0x1
>   eth_type ipv4
>   ip_proto tcp
>   dst_port 20-30
>   skip_hw
>   not_in_hw
> action order 1: gact action drop
>  random type none pass val 0
>  index 1 ref 1 bind 1 installed 85 sec used 3 sec
> Action statistics:
> Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> 
> 2. Match on IP address and port range:
> --
> $ tc filter add dev enp4s0 protocol ip parent ffff:\
>   prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port 100-200\
>   skip_hw action drop
> 
> $ tc -s filter show dev enp4s0 parent ffff:
> filter protocol ip pref 1 flower chain 0 handle 0x2
>   eth_type ipv4
>   ip_proto tcp
>   dst_ip 192.168.1.1
>   dst_port 100-200
>   skip_hw
>   not_in_hw
> action order 1: gact action drop
>  random type none pass val 0
>  index 2 ref 1 bind 1 installed 58 sec used 2 sec
> Action statistics:
> Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> 
> v6:
> Modified to change json output format as object for sport/dport.
> 
>  "dst_port":{
>"start":2000,
>"end":6000
>  },
>  "src_port":{
>"start":50,
>"end":60
>  }
> 
> v5:
> Simplified some code and used 'sscanf' for parsing. Removed
> space in output format.
> 
> v4:
> Added man updates explaining filtering based on port ranges.
> Removed 'range' keyword.
> 
> v3:
> Modified flower_port_range_attr_type calls.
> 
> v2:
> Addressed Jiri's comment to sync output format with input
> 
> Signed-off-by: Amritha Nambiar 
> ---
>  man/man8/tc-flower.8 |   13 +---
>  tc/f_flower.c|   85 
> +-
>  2 files changed, 84 insertions(+), 14 deletions(-)
> 

sorry for the delay; fell through the cracks. It no longer applies to
net-next. Please update. Thanks,


Re: [PATCH net-next v4 0/2] Add mii_bus to ixgbe driver for dsa devs

2018-12-03 Thread Florian Fainelli
On 12/3/18 3:42 PM, Steve Douthit wrote:
>> Not directly related to this patch series, are you using the legacy or
>> "new" way of passing platform data in order to register the DSA switch?
>> Since you mentioned 6390, I would assume this must be the "new" way of
>> registering DSA devices with mdio_boardinfo etc. In that case, have you
>> found yourself limited by the current dsa_chip_platform_data?
> 
> With the new method I had an off-by-one error where the 'enabled_ports'
> bitmask and 'port_names' array didn't match up in my first attempt.
> That's definitely my fault, but the loop that searches for the "cpu"
> string didn't like it and segfaulted.
> 
> I've got something of a chicken-and-the-egg problem between where I'm
> deferring while waiting for netdevs to show up, and registering the
> mdio_board_info that needs those same pointers.  For testing I exported
> mdiobus_setup_mdiodev_from_board_info() and mdiobus_create_device() so I
> could call the setup function again when the data was actually ready.

Yes the current solution whereby we need to get a hold on the network
device's struct device reference is not quite ideal, AFAIR, Andrew has
had the same problem.

> 
> It'd be nice to have more than one "cpu" port on a switch and have some
> way to associate downstream and "cpu" ports.  Not sure exactly what that
> would look like, and it's not something I'm going to try and tackle at
> the moment, but it's one for the wish list.

Yes, we have been discussing that topic with Andrew and have a few ideas
on how that could be achieved, but not code to use that at the moment.
One of the ideas was to finally allow enslaving the DSA master network
device, that way you could assign specific ports to a specific CPU/DSA
master network port within the switch, though that requires setting up a
bridge. Would that work for your use case?
--
Florian


Re: [Patch net v2] mlx5: fixup checksum for short ethernet frame padding

2018-12-03 Thread Cong Wang
On Mon, Dec 3, 2018 at 3:17 PM David Miller  wrote:
>
> Saeed, are you going to take care of this?

David, I will send v3 to switch to CHECKSUM_NONE for these short
frames, which also could avoid parsing IP headers.

Thanks.


Re: [PATCH net-next v4 0/2] Add mii_bus to ixgbe driver for dsa devs

2018-12-03 Thread Steve Douthit
> Not directly related to this patch series, are you using the legacy or
> "new" way of passing platform data in order to register the DSA switch?
> Since you mentioned 6390, I would assume this must be the "new" way of
> registering DSA devices with mdio_boardinfo etc. In that case, have you
> found yourself limited by the current dsa_chip_platform_data?

With the new method I had an off-by-one error where the 'enabled_ports'
bitmask and 'port_names' array didn't match up in my first attempt.
That's definitely my fault, but the loop that searches for the "cpu"
string didn't like it and segfaulted.

I've got something of a chicken-and-the-egg problem between where I'm
deferring while waiting for netdevs to show up, and registering the
mdio_board_info that needs those same pointers.  For testing I exported
mdiobus_setup_mdiodev_from_board_info() and mdiobus_create_device() so I
could call the setup function again when the data was actually ready.

It'd be nice to have more than one "cpu" port on a switch and have some
way to associate downstream and "cpu" ports.  Not sure exactly what that
would look like, and it's not something I'm going to try and tackle at
the moment, but it's one for the wish list.


[PATCH RFC 2/6] net: dsa: microchip: Add MIB counter reading support

2018-12-03 Thread Tristram.Ha
From: Tristram Ha 

Add MIB counter reading support.

Signed-off-by: Tristram Ha 
Reviewed-by: Woojung Huh 
---
 drivers/net/dsa/microchip/ksz9477.c| 121 ++---
 drivers/net/dsa/microchip/ksz_common.c | 101 +++
 drivers/net/dsa/microchip/ksz_common.h |   2 +
 drivers/net/dsa/microchip/ksz_priv.h   |   7 +-
 4 files changed, 186 insertions(+), 45 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz9477.c 
b/drivers/net/dsa/microchip/ksz9477.c
index 98aa25e..bd1ca33 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -259,6 +259,76 @@ static int ksz9477_reset_switch(struct ksz_device *dev)
return 0;
 }
 
+static void ksz9477_r_mib_cnt(struct ksz_device *dev, int port, u16 addr,
+ u64 *cnt)
+{
+   u32 data;
+   int timeout;
+   struct ksz_port *p = &dev->ports[port];
+
+   /* retain the flush/freeze bit */
+   data = p->freeze ? MIB_COUNTER_FLUSH_FREEZE : 0;
+   data |= MIB_COUNTER_READ;
+   data |= (addr << MIB_COUNTER_INDEX_S);
+   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, data);
+
+   timeout = 1000;
+   do {
+   ksz_pread32(dev, port, REG_PORT_MIB_CTRL_STAT__4,
+   &data);
+   usleep_range(1, 10);
+   if (!(data & MIB_COUNTER_READ))
+   break;
+   } while (timeout-- > 0);
+
+   /* failed to read MIB. get out of loop */
+   if (!timeout) {
+   dev_dbg(dev->dev, "Failed to get MIB\n");
+   return;
+   }
+
+   /* count resets upon read */
+   ksz_pread32(dev, port, REG_PORT_MIB_DATA, &data);
+   *cnt += data;
+}
+
+static void ksz9477_r_mib_pkt(struct ksz_device *dev, int port, u16 addr,
+ u64 *dropped, u64 *cnt)
+{
+   addr = ksz9477_mib_names[addr].index;
+   ksz9477_r_mib_cnt(dev, port, addr, cnt);
+}
+
+static void ksz9477_freeze_mib(struct ksz_device *dev, int port, bool freeze)
+{
+   struct ksz_port *p = &dev->ports[port];
+   u32 val = freeze ? MIB_COUNTER_FLUSH_FREEZE : 0;
+
+   /* enable/disable the port for flush/freeze function */
+   mutex_lock(&p->mib.cnt_mutex);
+   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, val);
+
+   /* used by MIB counter reading code to know freeze is enabled */
+   p->freeze = freeze;
+   mutex_unlock(&p->mib.cnt_mutex);
+}
+
+static void ksz9477_port_init_cnt(struct ksz_device *dev, int port)
+{
+   struct ksz_port_mib *mib = &dev->ports[port].mib;
+
+   /* flush all enabled port MIB counters */
+   mutex_lock(&mib->cnt_mutex);
+   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4,
+MIB_COUNTER_FLUSH_FREEZE);
+   ksz_write8(dev, REG_SW_MAC_CTRL_6, SW_MIB_COUNTER_FLUSH);
+   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, 0);
+   mutex_unlock(&mib->cnt_mutex);
+
+   mib->cnt_ptr = 0;
+   memset(mib->counters, 0, dev->mib_cnt * sizeof(u64));
+}
+
 static enum dsa_tag_protocol ksz9477_get_tag_protocol(struct dsa_switch *ds,
  int port)
 {
@@ -342,47 +412,6 @@ static void ksz9477_get_strings(struct dsa_switch *ds, int 
port,
}
 }
 
-static void ksz_get_ethtool_stats(struct dsa_switch *ds, int port,
- uint64_t *buf)
-{
-   struct ksz_device *dev = ds->priv;
-   int i;
-   u32 data;
-   int timeout;
-
-   mutex_lock(&dev->stats_mutex);
-
-   for (i = 0; i < TOTAL_SWITCH_COUNTER_NUM; i++) {
-   data = MIB_COUNTER_READ;
-   data |= ((ksz9477_mib_names[i].index & 0xFF) <<
-   MIB_COUNTER_INDEX_S);
-   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, data);
-
-   timeout = 1000;
-   do {
-   ksz_pread32(dev, port, REG_PORT_MIB_CTRL_STAT__4,
-   &data);
-   usleep_range(1, 10);
-   if (!(data & MIB_COUNTER_READ))
-   break;
-   } while (timeout-- > 0);
-
-   /* failed to read MIB. get out of loop */
-   if (!timeout) {
-   dev_dbg(dev->dev, "Failed to get MIB\n");
-   break;
-   }
-
-   /* count resets upon read */
-   ksz_pread32(dev, port, REG_PORT_MIB_DATA, &data);
-
-   dev->mib_value[i] += (uint64_t)data;
-   buf[i] = dev->mib_value[i];
-   }
-
-   mutex_unlock(&dev->stats_mutex);
-}
-
 static void ksz9477_cfg_port_member(struct ksz_device *dev, int port,
u8 member)
 {
@@ -1150,9 +1179,14 @@ static int ksz9477_setup(struct dsa_switch *ds)
/* queue based egress rate limit */
ksz_cfg(dev, REG_SW_MAC_CTRL_5, SW_OUT_RATE_LIMIT_QUEUE_BASED, true);
 
+   /* enable global MIB counter 

[PATCH RFC 5/6] net: dsa: microchip: Update tag_ksz.c to access switch driver

2018-12-03 Thread Tristram.Ha
From: Tristram Ha 

Update tag_ksz.c to access switch driver's tail tagging operations.

Signed-off-by: Tristram Ha 
---
 net/dsa/tag_ksz.c | 44 
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 0f62eff..307e58b 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -11,37 +11,25 @@
 #include 
 #include 
 #include 
+#include <linux/dsa/ksz_dsa.h>
 #include 
 #include "dsa_priv.h"
 
-/* For Ingress (Host -> KSZ), 2 bytes are added before FCS.
- * ---
- * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes)
- * ---
- * tag0 : Prioritization (not used now)
- * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5)
- *
- * For Egress (KSZ -> Host), 1 byte is added before FCS.
- * ---
- * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|FCS(4bytes)
- * ---
- * tag0 : zero-based value represents port
- *   (eg, 0x00=port1, 0x02=port3, 0x06=port7)
- */
-
-#defineKSZ_INGRESS_TAG_LEN 2
 #defineKSZ_EGRESS_TAG_LEN  1
 
 static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct dsa_port *dp = dsa_slave_to_port(dev);
+   struct ksz_device *sw = dp->ds->priv;
struct sk_buff *nskb;
+   int len;
int padlen;
-   u8 *tag;
+
+   len = sw->tag_ops->get_len(sw);
 
padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
 
-   if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) {
+   if (skb_tailroom(skb) >= padlen + len) {
/* Let dsa_slave_xmit() free skb */
if (__skb_put_padto(skb, skb->len + padlen, false))
return NULL;
@@ -49,7 +37,7 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct 
net_device *dev)
nskb = skb;
} else {
nskb = alloc_skb(NET_IP_ALIGN + skb->len +
-padlen + KSZ_INGRESS_TAG_LEN, GFP_ATOMIC);
+padlen + len, GFP_ATOMIC);
if (!nskb)
return NULL;
skb_reserve(nskb, NET_IP_ALIGN);
@@ -70,9 +58,8 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct 
net_device *dev)
consume_skb(skb);
}
 
-   tag = skb_put(nskb, KSZ_INGRESS_TAG_LEN);
-   tag[0] = 0;
-   tag[1] = 1 << dp->index; /* destination port */
+   sw->tag_ops->set_tag(sw, skb_put(nskb, len), skb_mac_header(nskb),
+dp->index);
 
return nskb;
 }
@@ -80,18 +67,27 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct 
net_device *dev)
 static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct net_device *dev,
   struct packet_type *pt)
 {
+   struct dsa_port *cpu_dp = dev->dsa_ptr;
+   struct dsa_switch_tree *dst = cpu_dp->dst;
+   struct dsa_switch *ds = dst->ds[0];
+   struct ksz_device *sw;
u8 *tag;
+   int len;
int source_port;
 
+   if (!ds)
+   return NULL;
+   sw = ds->priv;
+
tag = skb_tail_pointer(skb) - KSZ_EGRESS_TAG_LEN;
 
-   source_port = tag[0] & 7;
+   len = sw->tag_ops->get_tag(sw, tag, &source_port);
 
skb->dev = dsa_master_find_slave(dev, 0, source_port);
if (!skb->dev)
return NULL;
 
-   pskb_trim_rcsum(skb, skb->len - KSZ_EGRESS_TAG_LEN);
+   pskb_trim_rcsum(skb, skb->len - len);
 
return skb;
 }
-- 
1.9.1



[PATCH RFC 6/6] net: dsa: microchip: Add switch offload forwarding support

2018-12-03 Thread Tristram.Ha
From: Tristram Ha 

The flag offload_fwd_mark is set as the switch can forward frames by
itself.

Signed-off-by: Tristram Ha 
---
 drivers/net/dsa/microchip/ksz9477.c | 7 ---
 net/dsa/tag_ksz.c   | 2 ++
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz9477.c 
b/drivers/net/dsa/microchip/ksz9477.c
index c690c2b2..ca9199a 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -434,6 +434,7 @@ static void ksz9477_port_stp_state_set(struct dsa_switch 
*ds, int port,
struct ksz_port *p = &dev->ports[port];
u8 data;
int member = -1;
+   int forward = dev->member;
 
ksz_pread8(dev, port, P_STP_CTRL, &data);
data &= ~(PORT_TX_ENABLE | PORT_RX_ENABLE | PORT_LEARN_DISABLE);
@@ -501,10 +502,10 @@ static void ksz9477_port_stp_state_set(struct dsa_switch 
*ds, int port,
}
 
/* When topology has changed the function ksz_update_port_member
-* should be called to modify port forwarding behavior.  However
-* as the offload_fwd_mark indication cannot be reported here
-* the switch forwarding function is not enabled.
+* should be called to modify port forwarding behavior.
 */
+   if (forward != dev->member)
+   ksz_update_port_member(dev, port);
 }
 
 static void ksz9477_flush_dyn_mac_table(struct ksz_device *dev, int port)
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 307e58b..50511c2 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -89,6 +89,8 @@ static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct 
net_device *dev,
 
pskb_trim_rcsum(skb, skb->len - len);
 
+   skb->offload_fwd_mark = 1;
+
return skb;
 }
 
-- 
1.9.1



[PATCH RFC 3/6] net: dsa: microchip: Break ksz_priv.h into two files

2018-12-03 Thread Tristram.Ha
From: Tristram Ha 

Break ksz_priv.h into two files: private and public.  The public one is
put in include/linux/dsa/ksz_dsa.h.

This allows the tail tagging code tag_ksz.c to access the switch driver
as the tail tag format can be different when certain switch functions are
enabled.

Signed-off-by: Tristram Ha 
---
 drivers/net/dsa/microchip/ksz_priv.h | 86 +-
 include/linux/dsa/ksz_dsa.h  | 90 
 2 files changed, 92 insertions(+), 84 deletions(-)
 create mode 100644 include/linux/dsa/ksz_dsa.h

diff --git a/drivers/net/dsa/microchip/ksz_priv.h 
b/drivers/net/dsa/microchip/ksz_priv.h
index 1955ea6..5d93822 100644
--- a/drivers/net/dsa/microchip/ksz_priv.h
+++ b/drivers/net/dsa/microchip/ksz_priv.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0
  *
- * Microchip KSZ series switch common definitions
+ * Microchip KSZ series switch private definitions
  *
  * Copyright (C) 2017-2018 Microchip Technology Inc.
  */
@@ -13,89 +13,7 @@
 #include 
 #include 
 #include 
-
-struct ksz_io_ops;
-
-struct vlan_table {
-   u32 table[3];
-};
-
-struct ksz_port_mib {
-   struct mutex cnt_mutex; /* structure access */
-   u8 cnt_ptr;
-   u64 *counters;
-};
-
-struct ksz_port {
-   u16 member;
-   u16 vid_member;
-   int stp_state;
-   struct phy_device phydev;
-
-   u32 on:1;   /* port is not disabled by hardware */
-   u32 phy:1;  /* port has a PHY */
-   u32 fiber:1;/* port is fiber */
-   u32 sgmii:1;/* port is SGMII */
-   u32 force:1;
-   u32 link_just_down:1;   /* link just goes down */
-   u32 freeze:1;   /* MIB counter freeze is enabled */
-
-   struct ksz_port_mib mib;
-};
-
-struct ksz_device {
-   struct dsa_switch *ds;
-   struct ksz_platform_data *pdata;
-   const char *name;
-
-   struct mutex reg_mutex; /* register access */
-   struct mutex stats_mutex;   /* status access */
-   struct mutex alu_mutex; /* ALU access */
-   struct mutex vlan_mutex;/* vlan access */
-   const struct ksz_io_ops *ops;
-   const struct ksz_dev_ops *dev_ops;
-
-   struct device *dev;
-
-   void *priv;
-
-   /* chip specific data */
-   u32 chip_id;
-   int num_vlans;
-   int num_alus;
-   int num_statics;
-   int cpu_port;   /* port connected to CPU */
-   int cpu_ports;  /* port bitmap can be cpu port */
-   int phy_port_cnt;
-   int port_cnt;
-   int reg_mib_cnt;
-   int mib_cnt;
-   int mib_port_cnt;
-   int last_port;  /* ports after that not used */
-   phy_interface_t interface;
-   u32 regs_size;
-
-   struct vlan_table *vlan_cache;
-
-   u8 *txbuf;
-
-   struct ksz_port *ports;
-   struct timer_list mib_read_timer;
-   struct work_struct mib_read;
-   unsigned long mib_read_interval;
-   u16 br_member;
-   u16 member;
-   u16 live_ports;
-   u16 on_ports;   /* ports enabled by DSA */
-   u16 rx_ports;
-   u16 tx_ports;
-   u16 mirror_rx;
-   u16 mirror_tx;
-   u32 features;   /* chip specific features */
-   u32 overrides;  /* chip functions set by user */
-   u16 host_mask;
-   u16 port_mask;
-};
+#include <linux/dsa/ksz_dsa.h>
 
 struct ksz_io_ops {
int (*read8)(struct ksz_device *dev, u32 reg, u8 *value);
diff --git a/include/linux/dsa/ksz_dsa.h b/include/linux/dsa/ksz_dsa.h
new file mode 100644
index 000..3148cae
--- /dev/null
+++ b/include/linux/dsa/ksz_dsa.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Microchip KSZ series switch common definitions
+ *
+ * Copyright (C) 2017-2018 Microchip Technology Inc.
+ */
+
+#include 
+
+struct vlan_table {
+   u32 table[3];
+};
+
+struct ksz_port_mib {
+   struct mutex cnt_mutex; /* structure access */
+   u8 cnt_ptr;
+   u64 *counters;
+};
+
+struct ksz_port {
+   u16 member;
+   u16 vid_member;
+   int stp_state;
+   struct phy_device phydev;
+
+   u32 on:1;   /* port is not disabled by hardware */
+   u32 phy:1;  /* port has a PHY */
+   u32 fiber:1;/* port is fiber */
+   u32 sgmii:1;/* port is SGMII */
+   u32 force:1;
+   u32 link_just_down:1;   /* link just goes down */
+   u32 freeze:1;   /* MIB counter freeze is enabled */
+
+   struct ksz_port_mib mib;
+};
+
+struct ksz_device {
+   struct dsa_switch *ds;
+   struct ksz_platform_data *pdata;
+   const char *name;
+
+   struct mutex reg_mutex; /* register access */
+   struct mutex stats_mutex;   /* status access */
+   struct mutex alu_mutex; /* 

[PATCH RFC 4/6] net: dsa: microchip: Each switch driver has its own tail tagging operations

2018-12-03 Thread Tristram.Ha
From: Tristram Ha 

The tail tagging operations are implemented in each switch driver so that
the main tail tagging code in tag_ksz.c does not need to change once it
has been modified to support this mechanism.

Signed-off-by: Tristram Ha 
---
 drivers/net/dsa/microchip/ksz9477.c| 78 +-
 drivers/net/dsa/microchip/ksz_common.c |  4 +-
 drivers/net/dsa/microchip/ksz_priv.h   |  3 +-
 include/linux/dsa/ksz_dsa.h|  9 
 4 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz9477.c 
b/drivers/net/dsa/microchip/ksz9477.c
index bd1ca33..c690c2b2 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -21,6 +21,14 @@
 #include "ksz_common.h"
 #include "ksz9477_reg.h"
 
+/* features flags */
+#define GBIT_SUPPORT   BIT(0)
+#define NEW_XMII   BIT(1)
+#define IS_9893BIT(2)
+
+/* overrides flags */
+#define PTP_TAGBIT(0)
+
 static const struct {
int index;
char string[ETH_GSTRING_LEN];
@@ -1356,9 +1364,77 @@ static void ksz9477_switch_exit(struct ksz_device *dev)
.exit = ksz9477_switch_exit,
 };
 
+/* For Ingress (Host -> KSZ), 2 bytes are added before FCS.
+ * ---
+ * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes)
+ * ---
+ * tag0 : Prioritization (not used now)
+ * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5)
+ *
+ * For switch with 3 ports only one byte is needed.
+ * When PTP function is enabled additional 4 bytes are needed.
+ *
+ * For Egress (KSZ -> Host), 1 byte is added before FCS.
+ * ---
+ * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|FCS(4bytes)
+ * ---
+ * tag0 : zero-based value represents port
+ *   (eg, 0x00=port1, 0x02=port3, 0x06=port7)
+ *
+ * When PTP function is enabled BIT 7 indicates the received frame is a PTP
+ * message and so there are 4 additional bytes for the receive timestamp.
+ */
+
+static int ksz9477_get_len(struct ksz_device *dev)
+{
+   int len = 1;
+
+   if (!(dev->features & IS_9893))
+   len += 1;
+   if (dev->overrides & PTP_TAG)
+   len += 4;
+   return len;
+}
+
+static int ksz9477_get_tag(struct ksz_device *dev, u8 *tag, int *port)
+{
+   int len = 1;
+
+   if (tag[0] & BIT(7))
+   len += 4;
+   *port = tag[0] & 7;
+   return len;
+}
+
+static void ksz9477_set_tag(struct ksz_device *dev, void *ptr, u8 *addr, int p)
+{
+   if (dev->overrides & PTP_TAG) {
+   u32 *timestamp = (u32 *)ptr;
+
+   *timestamp = 0;
+   ptr = timestamp + 1;
+   }
+   if (dev->features & IS_9893) {
+   u8 *tag = (u8 *)ptr;
+
+   *tag = 1 << p;
+   } else {
+   u16 *tag = (u16 *)ptr;
+
+   *tag = 1 << p;
+   *tag = cpu_to_be16(*tag);
+   }
+}
+
+static const struct ksz_tag_ops ksz9477_tag_ops = {
+   .get_len = ksz9477_get_len,
+   .get_tag = ksz9477_get_tag,
+   .set_tag = ksz9477_set_tag,
+};
+
 int ksz9477_switch_register(struct ksz_device *dev)
 {
-   return ksz_switch_register(dev, &ksz9477_dev_ops);
+   return ksz_switch_register(dev, &ksz9477_dev_ops, &ksz9477_tag_ops);
 }
 EXPORT_SYMBOL(ksz9477_switch_register);
 
diff --git a/drivers/net/dsa/microchip/ksz_common.c 
b/drivers/net/dsa/microchip/ksz_common.c
index 7b8f57b..a72659b 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -399,7 +399,8 @@ struct ksz_device *ksz_switch_alloc(struct device *base,
 EXPORT_SYMBOL(ksz_switch_alloc);
 
 int ksz_switch_register(struct ksz_device *dev,
-   const struct ksz_dev_ops *ops)
+   const struct ksz_dev_ops *ops,
+   const struct ksz_tag_ops *tag_ops)
 {
int ret;
 
@@ -412,6 +413,7 @@ int ksz_switch_register(struct ksz_device *dev,
mutex_init(&dev->vlan_mutex);
 
dev->dev_ops = ops;
+   dev->tag_ops = tag_ops;
 
if (dev->dev_ops->detect(dev))
return -EINVAL;
diff --git a/drivers/net/dsa/microchip/ksz_priv.h 
b/drivers/net/dsa/microchip/ksz_priv.h
index 5d93822..d020c1e 100644
--- a/drivers/net/dsa/microchip/ksz_priv.h
+++ b/drivers/net/dsa/microchip/ksz_priv.h
@@ -78,7 +78,8 @@ struct ksz_dev_ops {
 struct ksz_device *ksz_switch_alloc(struct device *base,
const struct ksz_io_ops *ops, void *priv);
 int ksz_switch_register(struct ksz_device *dev,
-   const struct ksz_dev_ops *ops);
+   const struct 
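For readers following along, the tag layout described in the ksz9477.c comment above can be modeled outside the driver. Below is a minimal standalone sketch; the helper names and the 1-based port numbering are invented for illustration and are not the driver's code:

```c
#include <assert.h>
#include <stdint.h>

/* Egress (switch -> host) tag byte, per the layout in the patch comment:
 * bits 2:0 hold the zero-based source port, bit 7 flags a PTP timestamp.
 */
static int egress_tag_port(uint8_t tag0)
{
	return (tag0 & 7) + 1;              /* return a 1-based port number */
}

static int egress_tag_len(uint8_t tag0)
{
	return (tag0 & 0x80) ? 1 + 4 : 1;   /* tag byte + optional timestamp */
}

/* Ingress (host -> switch) tag1 port bitmap: one bit per port,
 * e.g. 0x01 = port 1, 0x10 = port 5, as in the comment above.
 */
static uint16_t ingress_tag_bits(int port)
{
	return (uint16_t)(1u << (port - 1));
}
```

The driver's ksz9477_get_tag()/ksz9477_set_tag() apply the same arithmetic against real skb buffers, with the 2-byte ingress tag stored big-endian.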

[PATCH RFC 0/6] net: dsa: microchip: Modify KSZ9477 DSA driver to support different tail tag formats

2018-12-03 Thread Tristram.Ha
From: Tristram Ha 

This series of patches modifies the KSZ9477 DSA driver to support
different tail tag formats so that new KSZ switch drivers can be
added without changing the base tail tagging code.

Tristram Ha (6):
  net: dsa: microchip: Prepare PHY for proper advertisement
  net: dsa: microchip: Add MIB counter reading support
  net: dsa: microchip: Break ksz_priv.h into two files
  net: dsa: microchip: Each switch driver has its own tail tagging
operations
  net: dsa: microchip: Update tag_ksz.c to access switch driver
  net: dsa: microchip: Add switch offload forwarding support

 drivers/net/dsa/microchip/ksz9477.c| 218 ++---
 drivers/net/dsa/microchip/ksz_common.c | 122 +-
 drivers/net/dsa/microchip/ksz_common.h |   4 +
 drivers/net/dsa/microchip/ksz_priv.h   |  94 ++
 include/linux/dsa/ksz_dsa.h|  99 +++
 net/dsa/tag_ksz.c  |  46 ---
 6 files changed, 426 insertions(+), 157 deletions(-)
 create mode 100644 include/linux/dsa/ksz_dsa.h

-- 
1.9.1



[PATCH RFC 1/6] net: dsa: microchip: Prepare PHY for proper advertisement

2018-12-03 Thread Tristram.Ha
From: Tristram Ha 

Prepare PHY for proper advertisement and get link status for the port.

Signed-off-by: Tristram Ha 
Reviewed-by: Woojung Huh 
---
 drivers/net/dsa/microchip/ksz9477.c| 12 
 drivers/net/dsa/microchip/ksz_common.c | 17 +
 drivers/net/dsa/microchip/ksz_common.h |  2 ++
 drivers/net/dsa/microchip/ksz_priv.h   |  2 ++
 4 files changed, 33 insertions(+)

diff --git a/drivers/net/dsa/microchip/ksz9477.c 
b/drivers/net/dsa/microchip/ksz9477.c
index 0684657..98aa25e 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -969,6 +969,16 @@ static void ksz9477_port_mirror_del(struct dsa_switch *ds, 
int port,
 PORT_MIRROR_SNIFFER, false);
 }
 
+static void ksz9477_phy_setup(struct ksz_device *dev, int port,
+ struct phy_device *phy)
+{
+   if (port < dev->phy_port_cnt) {
+   /* SUPPORTED_Asym_Pause and SUPPORTED_Pause can be removed to
+* disable flow control when rate limiting is used.
+*/
+   }
+}
+
 static void ksz9477_port_setup(struct ksz_device *dev, int port, bool cpu_port)
 {
u8 data8;
@@ -1151,6 +1161,7 @@ static int ksz9477_setup(struct dsa_switch *ds)
.setup  = ksz9477_setup,
.phy_read   = ksz9477_phy_read16,
.phy_write  = ksz9477_phy_write16,
+   .adjust_link= ksz_adjust_link,
.port_enable= ksz_enable_port,
.port_disable   = ksz_disable_port,
.get_strings= ksz9477_get_strings,
@@ -1298,6 +1309,7 @@ static void ksz9477_switch_exit(struct ksz_device *dev)
.get_port_addr = ksz9477_get_port_addr,
.cfg_port_member = ksz9477_cfg_port_member,
.flush_dyn_mac_table = ksz9477_flush_dyn_mac_table,
+   .phy_setup = ksz9477_phy_setup,
.port_setup = ksz9477_port_setup,
.shutdown = ksz9477_reset_switch,
.detect = ksz9477_switch_detect,
diff --git a/drivers/net/dsa/microchip/ksz_common.c 
b/drivers/net/dsa/microchip/ksz_common.c
index 9705808..39adc57 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -61,6 +61,22 @@ int ksz_phy_write16(struct dsa_switch *ds, int addr, int 
reg, u16 val)
 }
 EXPORT_SYMBOL_GPL(ksz_phy_write16);
 
+void ksz_adjust_link(struct dsa_switch *ds, int port,
+struct phy_device *phydev)
+{
+   struct ksz_device *dev = ds->priv;
struct ksz_port *p = &dev->ports[port];
+
+   if (phydev->link) {
+   dev->live_ports |= (1 << port) & dev->on_ports;
+   } else if (p->phydev.link) {
+   p->link_just_down = 1;
+   dev->live_ports &= ~(1 << port);
+   }
+   p->phydev = *phydev;
+}
+EXPORT_SYMBOL_GPL(ksz_adjust_link);
+
 int ksz_sset_count(struct dsa_switch *ds, int port, int sset)
 {
struct ksz_device *dev = ds->priv;
@@ -238,6 +254,7 @@ int ksz_enable_port(struct dsa_switch *ds, int port, struct 
phy_device *phy)
 
/* setup slave port */
dev->dev_ops->port_setup(dev, port, false);
+   dev->dev_ops->phy_setup(dev, port, phy);
 
/* port_stp_state_set() will be called after to enable the port so
 * there is no need to do anything.
diff --git a/drivers/net/dsa/microchip/ksz_common.h 
b/drivers/net/dsa/microchip/ksz_common.h
index 2dd832d..206b313 100644
--- a/drivers/net/dsa/microchip/ksz_common.h
+++ b/drivers/net/dsa/microchip/ksz_common.h
@@ -13,6 +13,8 @@
 
 int ksz_phy_read16(struct dsa_switch *ds, int addr, int reg);
 int ksz_phy_write16(struct dsa_switch *ds, int addr, int reg, u16 val);
+void ksz_adjust_link(struct dsa_switch *ds, int port,
+struct phy_device *phydev);
 int ksz_sset_count(struct dsa_switch *ds, int port, int sset);
 int ksz_port_bridge_join(struct dsa_switch *ds, int port,
 struct net_device *br);
diff --git a/drivers/net/dsa/microchip/ksz_priv.h 
b/drivers/net/dsa/microchip/ksz_priv.h
index a38ff08..fcb75a8 100644
--- a/drivers/net/dsa/microchip/ksz_priv.h
+++ b/drivers/net/dsa/microchip/ksz_priv.h
@@ -135,6 +135,8 @@ struct ksz_dev_ops {
u32 (*get_port_addr)(int port, int offset);
void (*cfg_port_member)(struct ksz_device *dev, int port, u8 member);
void (*flush_dyn_mac_table)(struct ksz_device *dev, int port);
+   void (*phy_setup)(struct ksz_device *dev, int port,
+ struct phy_device *phy);
void (*port_setup)(struct ksz_device *dev, int port, bool cpu_port);
void (*r_phy)(struct ksz_device *dev, u16 phy, u16 reg, u16 *val);
void (*w_phy)(struct ksz_device *dev, u16 phy, u16 reg, u16 val);
-- 
1.9.1



Re: [Patch net v2] mlx5: fixup checksum for short ethernet frame padding

2018-12-03 Thread David Miller
From: Cong Wang 
Date: Tue, 27 Nov 2018 22:10:13 -0800

> When an ethernet frame is padded to meet the minimum ethernet frame
> size, the padding octets are not covered by the hardware checksum.
> Fortunately the padding octets are usually zeros, which don't affect
> checksum. However, we have a switch which pads non-zero octets, this
> causes kernel hardware checksum fault repeatedly.
> 
> Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE 
> are friends"),
> skb checksum is forced to be CHECKSUM_NONE when padding is detected.
> After it, we need to keep skb->csum updated, like what we do for FCS.
> 
> The logic is a bit complicated when dealing with both FCS and padding,
> so I wrap it up in a helper function mlx5e_csum_padding().
> 
> I tested this patch with RXFCS on and off, it works fine without any
> warning in both cases.
> 
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are 
> friends"),
> Cc: Saeed Mahameed 
> Cc: Eric Dumazet 
> Signed-off-by: Cong Wang 

Saeed, are you going to take care of this?
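The arithmetic behind the padding problem discussed in this thread can be sketched standalone: with CHECKSUM_COMPLETE semantics the driver reports a full-packet ones'-complement sum, so non-zero padding octets shift it, and the fix must fold the padding's contribution back out. This is a hypothetical model — an RFC 1071-style sum with invented sample bytes, and csum16_sub() only mirrors the kernel's csum_sub() in spirit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 ones'-complement sum (not inverted), big-endian word order. */
static uint16_t csum16(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones'-complement subtraction, analogous to the kernel's csum_sub(). */
static uint16_t csum16_sub(uint16_t a, uint16_t b)
{
	uint32_t d = (uint32_t)a + (uint16_t)~b;

	return (uint16_t)((d & 0xffff) + (d >> 16));
}

/* Non-zero padding shifts the full-packet sum; subtracting the padding's
 * own sum restores it (frame kept word-aligned for simplicity).
 */
static int padding_demo(void)
{
	static const uint8_t padded[] = { 0x45, 0x00, 0x00, 0x1c, 0xa5, 0xa5 };
	uint16_t whole = csum16(padded, sizeof(padded));
	uint16_t frame = csum16(padded, 4);
	uint16_t pad = csum16(padded + 4, 2);

	return whole != frame && csum16_sub(whole, pad) == frame;
}
```

All-zero padding leaves the ones'-complement sum unchanged, which is why the bug only shows up with switches that pad non-zero octets.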


Re: [PATCH net-next] net: phy: improve generic EEE ethtool functions

2018-12-03 Thread David Miller
From: Heiner Kallweit 
Date: Tue, 27 Nov 2018 22:30:14 +0100

> So far the two functions consider neither member eee_enabled nor
> eee_active. Therefore network drivers have to do this in some kind
> of glue code. I think this can be avoided.
> 
> Getting EEE parameters:
> When not advertising any EEE mode, we can't consider EEE to be enabled.
> Therefore interpret "EEE enabled" as "we advertise at least one EEE
> mode". It's similar with "EEE active": interpret it as "EEE modes
> advertised by both link partner have at least one mode in common".
> 
> Setting EEE parameters:
> If eee_enabled isn't set, don't advertise any EEE mode and restart
> aneg if needed to switch off EEE. If eee_enabled is set and
> data->advertised is empty (e.g. because EEE was disabled), advertise
> everything we support by default. This way EEE can easily be switched
> on/off by doing ethtool --set-eee  eee on/off, w/o any additional
> parameters.
> 
> The changes to both functions shouldn't break any existing user.
> Once the changes have been applied, at least some users can be
> simplified.
> 
> Signed-off-by: Heiner Kallweit 

Applied.
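The semantics Heiner describes reduce to two bitmask checks. A standalone sketch follows; the mode bits are invented placeholders, not the real ethtool link-mode definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical EEE mode bits, one per link mode. */
#define EEE_100BASE_T   (1u << 0)
#define EEE_1000BASE_T  (1u << 1)

/* "EEE enabled": we advertise at least one EEE mode. */
static bool eee_enabled(uint32_t advertised)
{
	return advertised != 0;
}

/* "EEE active": our advertisement and the link partner's share a mode. */
static bool eee_active(uint32_t advertised, uint32_t lp_advertised)
{
	return (advertised & lp_advertised) != 0;
}
```

With these definitions, clearing the advertisement on "ethtool --set-eee ... eee off" makes both predicates false, which is exactly the behavior the patch aims for.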


Re: [PATCH bpf-next 2/3] bpf: add BPF_PROG_TEST_RUN support for flow dissector

2018-12-03 Thread Stanislav Fomichev
On 12/03, Song Liu wrote:
> On Mon, Dec 3, 2018 at 11:00 AM Stanislav Fomichev  wrote:
> >
> > The input is packet data, the output is struct bpf_flow_keys. This should
> > make it easy to test flow dissector programs without elaborate
> > setup.
> >
> > Signed-off-by: Stanislav Fomichev 
> > ---
> >  include/linux/bpf.h |  3 ++
> >  net/bpf/test_run.c  | 76 +
> >  net/core/filter.c   |  1 +
> >  3 files changed, 74 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index e82b7039fc66..7a572d15d5dd 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -373,6 +373,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
> > union bpf_attr *kattr,
> >   union bpf_attr __user *uattr);
> >  int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr 
> > *kattr,
> >   union bpf_attr __user *uattr);
> > +int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > +const union bpf_attr *kattr,
> > +union bpf_attr __user *uattr);
> >
> >  /* an array of programs to be executed under rcu_lock.
> >   *
> > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > index c89c22c49015..bfa05d31c6e3 100644
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -14,21 +14,33 @@
> >  #include 
> >
> >  static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void 
> > *ctx,
> > -   struct bpf_cgroup_storage 
> > *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
> > +   struct bpf_cgroup_storage 
> > *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
> > +   struct bpf_flow_keys *flow_keys)
> >  {
> > u32 ret;
> >
> > preempt_disable();
> > rcu_read_lock();
> > bpf_cgroup_storage_set(storage);
> > -   ret = BPF_PROG_RUN(prog, ctx);
> > +
> > +   switch (prog->type) {
> > +   case BPF_PROG_TYPE_FLOW_DISSECTOR:
> > +   ret = __skb_flow_bpf_dissect(prog, ctx, &flow_keys_dissector,
> > +flow_keys);
> > +   break;
> > +   default:
> > +   ret = BPF_PROG_RUN(prog, ctx);
> > +   break;
> > +   }
> > +
> 
> Is it possible to fold the logic above into 
> bpf_prog_test_run_flow_dissector()?
> In that way, the logic flow is similar to other bpf_prog_test_run_XXX()
> functions.
I can probably do everything that __skb_flow_bpf_dissect does inside of
bpf_prog_test_run_flow_dissector (pass flow_keys in cb); that would remove
this ugly program type check down the stack. But my additional goal here was
to test __skb_flow_bpf_dissect itself as well.

Attached some possible prototype below. The only issue I see with that
approach is that we need to clear flow_keys on each test iteration.
I think in my current patch I'm actually missing a memset(&flow_keys, 0, ...)
for each iteration of __skb_flow_bpf_dissect;
I haven't tested multiple runs :-/

So I'd prefer to continue doing this prog type check (plus, add a missing
memset for flow_keys on each iteration in v2). WDYT?
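To illustrate why the per-iteration memset matters: a dissector only writes the fields the current packet carries, so state from a previous run can leak into the next result. The following is a toy model — the struct and dissector are invented, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct fake_flow_keys {
	uint16_t n_proto;
	uint8_t  ip_proto;   /* only written for IP packets */
};

/* Toy dissector: fills in just what the packet carries. */
static void fake_dissect(struct fake_flow_keys *keys, int is_ip)
{
	keys->n_proto = is_ip ? 0x0800 : 0x0806;
	if (is_ip)
		keys->ip_proto = 6;   /* pretend TCP */
	/* note: ip_proto is left untouched for non-IP packets */
}

/* Run the dissector twice on different packets, with or without
 * zeroing the output between runs.
 */
static uint8_t run_twice(int clear_between_runs)
{
	struct fake_flow_keys keys = { 0 };

	fake_dissect(&keys, 1);               /* iteration 1: IP packet */
	if (clear_between_runs)
		memset(&keys, 0, sizeof(keys));
	fake_dissect(&keys, 0);               /* iteration 2: ARP packet */
	return keys.ip_proto;                 /* stale 6 vs. clean 0 */
}
```

Without the memset, the second iteration reports the first packet's ip_proto, which is the class of bug a multi-run BPF_PROG_TEST_RUN would otherwise hide.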

--- 

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e82b7039fc66..7a572d15d5dd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -373,6 +373,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
union bpf_attr *kattr,
  union bpf_attr __user *uattr);
 int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
  union bpf_attr __user *uattr);
+int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
+const union bpf_attr *kattr,
+union bpf_attr __user *uattr);
 
 /* an array of programs to be executed under rcu_lock.
  *
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index c89c22c49015..04387ef1f95a 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -220,3 +220,60 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
union bpf_attr *kattr,
kfree(data);
return ret;
 }
+
+int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
+const union bpf_attr *kattr,
+union bpf_attr __user *uattr)
+{
+   struct bpf_flow_keys flow_keys = {};
+   u32 size = kattr->test.data_size_in;
+   u32 repeat = kattr->test.repeat;
+   struct bpf_skb_data_end *cb;
+   u32 retval, duration;
+   struct sk_buff *skb;
+   struct sock *sk;
+   void *data;
+   int ret;
+
+   if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
+   return -EINVAL;
+
+   data = bpf_test_init(kattr, size, NET_SKB_PAD + NET_IP_ALIGN,
+SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
+   if (IS_ERR(data))
+   return PTR_ERR(data);
+
+   sk 

Re: [PATCH bpf-next v2 0/2] sample: xdp1 improvements

2018-12-03 Thread Jakub Kicinski
On Sun, 2 Dec 2018 02:23:41 +, Matteo Croce wrote:
> On Sat, Dec 1, 2018 at 6:11 AM Alexei Starovoitov wrote:
> >
> > On Sat, Dec 01, 2018 at 01:23:04AM +0100, Matteo Croce wrote:  
> > > Small improvements to improve the readability and easiness
> > > to use of the xdp1 sample.  
> >
> > Applied to bpf-next.
> >
> > I think that sample code could be more useful if it's wrapped with bash
> > script like selftests/test_xdp* tests do.
> > At that point it can move to selftests to get 0bot coverage.
> > Would you be interested in doing that?
> >  
> 
> It would be nice, but I think that the samples have more urgent issues
> right now.
> Many examples don't compile on my system (Fedora 29, GCC 8.2.1, Clang 
> 7.0.0),
> these are the errors that I encounter:
> 
>   HOSTCC  /home/matteo/src/kernel/linux/samples/bpf/test_lru_dist
> /home/matteo/src/kernel/linux/samples/bpf/test_lru_dist.c:39:8: error:
> redefinition of ‘struct list_head’
>  struct list_head {
> ^

I wonder if there is a way to catch that, and print "run make
headers_install, please" rather than confuse newcomers.  Hm..

Like this?

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 35444f4a846b..b1fdea806d4d 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -205,6 +205,15 @@ HOSTCC = $(CROSS_COMPILE)gcc
 CLANG_ARCH_ARGS = -target $(ARCH)
 endif
 
+HDR_PROBE := $(shell echo "\#include \n struct list_head { int 
a; }; int main() { return 0; }" | \
+   gcc $(KBUILD_HOSTCFLAGS) -x c - -o /dev/null 2>/dev/null && \
+   echo okay)
+
+ifeq ($(HDR_PROBE),)
+$(warning Detected possible issues with include path.)
+$(warning Please install kernel headers locally (make headers_install))
+endif
+
 BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
 BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
 BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 
'usage.*llvm')


Re: [PATCH v2] samples: bpf: fix: seg fault with NULL pointer arg

2018-12-03 Thread Daniel Borkmann
On 12/03/2018 11:39 AM, Daniel T. Lee wrote:
> When NULL pointer accidentally passed to write_kprobe_events,
> due to strlen(NULL), segmentation fault happens.
> Changed code returns -1 to deal with this situation.
> 
> Bug issued with Smatch, static analysis.
> 
> Signed-off-by: Daniel T. Lee 

Applied to bpf-next, thanks!


Re: [PATCH v2 bpf] mips: bpf: fix encoding bug for mm_srlv32_op

2018-12-03 Thread Paul Burton
Hello,

Jiong Wang wrote:
> For micro-mips, srlv inside POOL32A encoding space should use 0x50
> sub-opcode, NOT 0x90.
> 
> Some early versions of the ISA doc describe the encoding as 0x90 for both
> srlv and srav; this looks to me like a typo. I checked the binutils
> libopcodes implementation, which uses 0x50 for srlv and 0x90 for srav.
> 
> v1->v2:
> - Keep mm_srlv32_op sorted by value.
> 
> Fixes: f31318fdf324 ("MIPS: uasm: Add srlv uasm instruction")
> Cc: Markos Chandras 
> Cc: Paul Burton 
> Cc: linux-m...@vger.kernel.org
> Acked-by: Jakub Kicinski 
> Acked-by: Song Liu 
> Signed-off-by: Jiong Wang 

Applied to mips-fixes.

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.bur...@mips.com to report it. ]


Re: [iproute PATCH] ssfilter: Fix for inverted last expression

2018-12-03 Thread Stephen Hemminger
On Thu, 29 Nov 2018 13:20:37 +0100
Phil Sutter  wrote:

> When fixing the shift/reduce conflicts, the possibility to invert the last
> expression by prefixing it with '!' or 'not' was accidentally removed.
> 
> Fix this by allowing for expr to be an inverted expr so that any
> reference to it in exprlist accepts the inverted prefix.
> 
> Reported-by: Eric Dumazet 
> Fixes: b2038cc0b2403 ("ssfilter: Eliminate shift/reduce conflicts")
> Signed-off-by: Phil Sutter 

Applied. Thanks.


Re: [PATCH iproute2] tc: add a missing space between rate estimator and backlog

2018-12-03 Thread Stephen Hemminger
On Fri, 30 Nov 2018 05:57:02 -0800
Eric Dumazet  wrote:

> When a rate estimator is active, "tc -s qd" displays
> something like :
> 
> rate 12616bit 11ppsbacklog 0b 0p requeues 2
> 
> instead of :
> 
> rate 12616bit 11pps backlog 0b 0p requeues 2
> 
> Fixes: 4fcec7f3665b ("tc: jsonify stats2")
> Signed-off-by: Eric Dumazet 
> Cc: Jiri Pirko 

Applied, thanks.


Re: [PATCH bpf-next 2/3] bpf: add BPF_PROG_TEST_RUN support for flow dissector

2018-12-03 Thread Song Liu
On Mon, Dec 3, 2018 at 11:00 AM Stanislav Fomichev  wrote:
>
> The input is packet data, the output is struct bpf_flow_keys. This should
> make it easy to test flow dissector programs without elaborate
> setup.
>
> Signed-off-by: Stanislav Fomichev 
> ---
>  include/linux/bpf.h |  3 ++
>  net/bpf/test_run.c  | 76 +
>  net/core/filter.c   |  1 +
>  3 files changed, 74 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e82b7039fc66..7a572d15d5dd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -373,6 +373,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
> union bpf_attr *kattr,
>   union bpf_attr __user *uattr);
>  int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>   union bpf_attr __user *uattr);
> +int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> +const union bpf_attr *kattr,
> +union bpf_attr __user *uattr);
>
>  /* an array of programs to be executed under rcu_lock.
>   *
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index c89c22c49015..bfa05d31c6e3 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -14,21 +14,33 @@
>  #include 
>
>  static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
> -   struct bpf_cgroup_storage 
> *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
> +   struct bpf_cgroup_storage 
> *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
> +   struct bpf_flow_keys *flow_keys)
>  {
> u32 ret;
>
> preempt_disable();
> rcu_read_lock();
> bpf_cgroup_storage_set(storage);
> -   ret = BPF_PROG_RUN(prog, ctx);
> +
> +   switch (prog->type) {
> +   case BPF_PROG_TYPE_FLOW_DISSECTOR:
> +   ret = __skb_flow_bpf_dissect(prog, ctx, &flow_keys_dissector,
> +flow_keys);
> +   break;
> +   default:
> +   ret = BPF_PROG_RUN(prog, ctx);
> +   break;
> +   }
> +

Is it possible to fold the logic above into bpf_prog_test_run_flow_dissector()?
In that way, the logic flow is similar to other bpf_prog_test_run_XXX()
functions.

Thanks,
Song


> rcu_read_unlock();
> preempt_enable();
>
> return ret;
>  }
>
> -static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 
> *time)
> +static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 
> *time,
> +   struct bpf_flow_keys *flow_keys)
>  {
> struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { 0 
> };
> enum bpf_cgroup_storage_type stype;
> @@ -49,7 +61,7 @@ static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, 
> u32 repeat, u32 *time)
> repeat = 1;
> time_start = ktime_get_ns();
> for (i = 0; i < repeat; i++) {
> -   ret = bpf_test_run_one(prog, ctx, storage);
> +   ret = bpf_test_run_one(prog, ctx, storage, flow_keys);
> if (need_resched()) {
> if (signal_pending(current))
> break;
> @@ -165,7 +177,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const 
> union bpf_attr *kattr,
> __skb_push(skb, hh_len);
> if (is_direct_pkt_access)
> bpf_compute_data_pointers(skb);
> -   retval = bpf_test_run(prog, skb, repeat, &duration);
> +   retval = bpf_test_run(prog, skb, repeat, &duration, NULL);
> if (!is_l2) {
> if (skb_headroom(skb) < hh_len) {
> int nhead = HH_DATA_ALIGN(hh_len - skb_headroom(skb));
> @@ -212,7 +224,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
> union bpf_attr *kattr,
> rxqueue = 
> __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
> xdp.rxq = &rxqueue->xdp_rxq;
>
> -   retval = bpf_test_run(prog, &xdp, repeat, &duration);
> +   retval = bpf_test_run(prog, &xdp, repeat, &duration, NULL);
> if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
> xdp.data_end != xdp.data + size)
> size = xdp.data_end - xdp.data;
> @@ -220,3 +232,55 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
> union bpf_attr *kattr,
> kfree(data);
> return ret;
>  }
> +
> +int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> +const union bpf_attr *kattr,
> +union bpf_attr __user *uattr)
> +{
> +   struct bpf_flow_keys flow_keys = {};
> +   u32 size = kattr->test.data_size_in;
> +   u32 repeat = kattr->test.repeat;
> +   u32 retval, duration;
> +   struct sk_buff *skb;
> +   struct sock *sk;
> +   void *data;
> +   int ret;
> +
> +   if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
> +  

[PATCH v2 bpf] mips: bpf: fix encoding bug for mm_srlv32_op

2018-12-03 Thread Jiong Wang
For micro-mips, srlv inside POOL32A encoding space should use 0x50
sub-opcode, NOT 0x90.

Some early versions of the ISA doc describe the encoding as 0x90 for both
srlv and srav; this looks to me like a typo. I checked the binutils
libopcodes implementation, which uses 0x50 for srlv and 0x90 for srav.

v1->v2:
  - Keep mm_srlv32_op sorted by value.

Fixes: f31318fdf324 ("MIPS: uasm: Add srlv uasm instruction")
Cc: Markos Chandras 
Cc: Paul Burton 
Cc: linux-m...@vger.kernel.org
Acked-by: Jakub Kicinski 
Acked-by: Song Liu 
Signed-off-by: Jiong Wang 
---
 arch/mips/include/uapi/asm/inst.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/uapi/asm/inst.h 
b/arch/mips/include/uapi/asm/inst.h
index c05dcf5..273ef58 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -369,8 +369,8 @@ enum mm_32a_minor_op {
mm_ext_op = 0x02c,
mm_pool32axf_op = 0x03c,
mm_srl32_op = 0x040,
+   mm_srlv32_op = 0x050,
mm_sra_op = 0x080,
-   mm_srlv32_op = 0x090,
mm_rotr_op = 0x0c0,
mm_lwxs_op = 0x118,
mm_addu32_op = 0x150,
-- 
2.7.4



Re: [PATCH v7 0/4] Add VRF support for VXLAN underlay

2018-12-03 Thread David Miller
From: Alexis Bauvin 
Date: Mon,  3 Dec 2018 10:54:37 +0100

> We are trying to isolate the VXLAN traffic from different VMs with VRF as 
> shown
> in the schemas below:
 ...
> We faced some issue in the datapath, here are the details:
> 
> * Egress traffic:
> The vxlan packets are sent directly to the default VRF because it's where the
> socket is bound; therefore the traffic has a default route via eth0. The
> workaround is to force this traffic to VRF green with ip rules.
> 
> * Ingress traffic:
> When receiving the traffic on eth0.2030 the vxlan socket is unreachable from
> VRF green. The workaround is to enable *udp_l3mdev_accept* sysctl, but
> this breaks isolation between overlay and underlay: packets sent from
> blue or red by e.g. a guest VM will be accepted by the socket, allowing
> injection of VXLAN packets from the overlay.
> 
> This patch series fixes the issues described above by allowing the VXLAN
> socket to be bound to a specific VRF device, and therefore to look up the
> correct table.

Series applied to net-next, thanks.


Re: [PATCH v2 net] tun: remove skb access after netif_receive_skb

2018-12-03 Thread David Miller
From: Prashant Bhole 
Date: Mon,  3 Dec 2018 18:09:24 +0900

> In tun.c skb->len was accessed while doing stats accounting after a
> call to netif_receive_skb. We can not access skb after this call
> because buffers may be dropped.
> 
> The fix for this bug would be to store skb->len in a local variable and
> then use it after netif_receive_skb(). IMO using the xdp data size for
> accounting bytes is better because the input to tun_xdp_one() is an
> xdp_buff.
> 
> Hence this patch:
> - fixes a bug by removing skb access after netif_receive_skb()
> - uses xdp data size for accounting bytes
 ...
> Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()")
> Reviewed-by: Toshiaki Makita 
> Signed-off-by: Prashant Bhole 
> Acked-by: Jason Wang 
> ---
> v1 -> v2:
> No change. Reposted due to patchwork status.

Applied, thanks.
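The pattern behind this fix is general: cache any field you still need before handing the buffer to a consumer that may free it. A toy sketch with invented types (not the tun code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_buf {
	size_t len;
	uint8_t *data;
};

/* Stand-in for netif_receive_skb(): consumes (frees) the buffer. */
static void toy_receive(struct toy_buf *b)
{
	free(b->data);
	free(b);
}

/* Correct accounting: read len before the buffer is consumed. */
static size_t deliver_and_account(size_t len)
{
	struct toy_buf *b = malloc(sizeof(*b));
	size_t stats_bytes;

	if (!b)
		return 0;
	b->len = len;
	b->data = malloc(len);
	stats_bytes = b->len;   /* cache before toy_receive() frees b */
	toy_receive(b);
	return stats_bytes;     /* reading b->len here would be a use-after-free */
}
```

The tun patch goes one step further and accounts the xdp_buff size instead, which sidesteps the skb lifetime entirely.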


Re: [PATCH bpf] mips: bpf: fix encoding bug for mm_srlv32_op

2018-12-03 Thread Jiong Wang

On 03/12/2018 22:04, Paul Burton wrote:
> Hi Jiong,
>
> On Sat, Dec 01, 2018 at 04:10:01AM -0500, Jiong Wang wrote:
>> For micro-mips, srlv inside POOL32A encoding space should use 0x50
>> sub-opcode, NOT 0x90.
>>
>> Some early versions of the ISA doc describe the encoding as 0x90 for both
>> srlv and srav; this looks to me like a typo. I checked the binutils
>> libopcodes implementation, which uses 0x50 for srlv and 0x90 for srav.
>
> Are you aware of documentation that gets this right?

No, I am not aware of any.

> Looking at the
> latest microMIPS spec I have available (6.05) it looks like this is
> still documented incorrectly. I'll pass this along to the architecture
> team.
>
>> Fixes: f31318fdf324 ("MIPS: uasm: Add srlv uasm instruction")
>> CC: Markos Chandras 
>> CC: Paul Burton 
>> Acked-by: Jakub Kicinski 
>> Signed-off-by: Jiong Wang 
>> ---
>>  arch/mips/include/uapi/asm/inst.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/mips/include/uapi/asm/inst.h 
>> b/arch/mips/include/uapi/asm/inst.h
>> index c05dcf5..80f35e7 100644
>> --- a/arch/mips/include/uapi/asm/inst.h
>> +++ b/arch/mips/include/uapi/asm/inst.h
>> @@ -370,7 +370,7 @@ enum mm_32a_minor_op {
>> 	mm_pool32axf_op = 0x03c,
>> 	mm_srl32_op = 0x040,
>> 	mm_sra_op = 0x080,
>> -	mm_srlv32_op = 0x090,
>> +	mm_srlv32_op = 0x050,
>> 	mm_rotr_op = 0x0c0,
>> 	mm_lwxs_op = 0x118,
>> 	mm_addu32_op = 0x150,
>
> Could you also move it above mm_sra_op to keep them sorted by value?
>
> When submitting v2 please also copy the linux-mips mailing list -
> linux-m...@vger.kernel.org.

Thanks, will do.

Regards,
Jiong



Re: [PATCH bpf] mips: bpf: fix encoding bug for mm_srlv32_op

2018-12-03 Thread Paul Burton
Hi Jiong,

On Sat, Dec 01, 2018 at 04:10:01AM -0500, Jiong Wang wrote:
> For micro-mips, srlv inside POOL32A encoding space should use 0x50
> sub-opcode, NOT 0x90.
> 
> Some early versions of the ISA doc describe the encoding as 0x90 for both srlv
> and srav; this looks to me like a typo. I checked the Binutils libopcode
> implementation, which uses 0x50 for srlv and 0x90 for srav.

Are you aware of documentation that gets this right? Looking at the
latest microMIPS spec I have available (6.05) it looks like this is
still documented incorrectly. I'll pass this along to the architecture
team.

> Fixes: f31318fdf324 ("MIPS: uasm: Add srlv uasm instruction")
> CC: Markos Chandras 
> CC: Paul Burton 
> Acked-by: Jakub Kicinski 
> Signed-off-by: Jiong Wang 
> ---
>  arch/mips/include/uapi/asm/inst.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/mips/include/uapi/asm/inst.h 
> b/arch/mips/include/uapi/asm/inst.h
> index c05dcf5..80f35e7 100644
> --- a/arch/mips/include/uapi/asm/inst.h
> +++ b/arch/mips/include/uapi/asm/inst.h
> @@ -370,7 +370,7 @@ enum mm_32a_minor_op {
>   mm_pool32axf_op = 0x03c,
>   mm_srl32_op = 0x040,
>   mm_sra_op = 0x080,
> - mm_srlv32_op = 0x090,
> + mm_srlv32_op = 0x050,
>   mm_rotr_op = 0x0c0,
>   mm_lwxs_op = 0x118,
>   mm_addu32_op = 0x150,

Could you also move it above mm_sra_op to keep them sorted by value?

When submitting v2 please also copy the linux-mips mailing list -
linux-m...@vger.kernel.org.

Thanks,
Paul


Re: [PATCH net-next v2 0/3] mlxsw: Add 'fw_load_policy' devlink parameter

2018-12-03 Thread David Miller
From: Ido Schimmel 
Date: Mon, 3 Dec 2018 07:58:57 +

> Shalom says:
> 
> Currently, drivers do not have the ability to control the firmware
> loading policy and they always use their own fixed policy. This prevents
> drivers from running the device with a different firmware version for
> testing and/or debugging purposes. For example, testing a firmware bug
> fix.
> 
> For these situations, the new devlink generic parameter,
> 'fw_load_policy', gives the ability to control this option and allows
> drivers to run with a different firmware version than required by the
> driver.
> 
> Patch #1 adds the new parameter to devlink. The other two patches, #2
> and #3, add support for this parameter in the mlxsw driver.
> 
> Example:
 ...
> iproute2 patches available here:
> https://github.com/tshalom/iproute2-next
> 
> v2:
> * Change 'fw_version_check' to 'fw_load_policy' with values 'driver' and
>   'flash' (Jakub)

Series applied.


Re: [PATCH net] net: phy: don't allow __set_phy_supported to add unsupported modes

2018-12-03 Thread David Miller
From: Heiner Kallweit 
Date: Mon, 3 Dec 2018 08:19:33 +0100

> Currently __set_phy_supported allows adding modes without checking whether
> the PHY supports them. This is wrong; it should never add modes, but
> only remove modes we don't want to support.
> 
> The commit marked as fixed didn't do anything wrong, it just copied
> existing functionality to the helper which is being fixed now.
> 
> Fixes: f3a6bd393c2c ("phylib: Add phy_set_max_speed helper")
> Signed-off-by: Heiner Kallweit 
> ---
> This will cause a merge conflict once net is merged into net-next.
> And the fix will need minor tweaking when being backported to
> older kernel versions.

Applied and queued up for -stable.

I'll let you know if I need help with those backports.


Re: [PATCH net-next] net: phy: don't allow __set_phy_supported to add unsupported modes

2018-12-03 Thread David Miller
From: Heiner Kallweit 
Date: Mon, 3 Dec 2018 08:04:57 +0100

> Currently __set_phy_supported allows adding modes without checking whether
> the PHY supports them. This is wrong; it should never add modes, but
> only remove modes we don't want to support.
> 
> Signed-off-by: Heiner Kallweit 

Applied.


Re: [Bug 201829] New: Failed build kernel with nat

2018-12-03 Thread Stephen Hemminger
On Sat, 01 Dec 2018 16:55:40 -0800 (PST)
David Miller  wrote:

> Stephen please actually read the reports you forward here.
> 
> This is a report about arch/x86/lib/inat.c which is a set of CPU
> instruction attributes and has nothing to do with networkig.

Yes, bug moved to x86 platform.


Re: No address after suspend/resume with 4.18

2018-12-03 Thread Stephen Hemminger
On Mon, 03 Dec 2018 13:19:20 -0800
Jeff Kirsher  wrote:

> On Fri, 2018-11-30 at 15:26 -0800, Stephen Hemminger wrote:
> > On my box with Debian testing, I see a new problem with
> > suspend/resume of wired network device.
> > Using stock Debian kernel 4.18.0-2-amd64
> > 
> > After suspend/resume cycle, IP address is lost.
> > 
> > Device Info:
> > $ /sbin/ethtool -i enp12s0
> > driver: igb
> > version: 5.4.0-k
> > firmware-version:  0. 6-1
> > expansion-rom-version: 
> > bus-info: :0c:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> > 
> > 
> > $ lspci -v -s :0c:00.0
> > 0c:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > Connection (rev 03)
> > Subsystem: Gigabyte Technology Co., Ltd I211 Gigabit Network
> > Connection
> > Flags: bus master, fast devsel, latency 0, IRQ 17, NUMA node 0
> > Memory at dfb0 (32-bit, non-prefetchable) [size=128K]
> > I/O ports at c000 [size=32]
> > Memory at dfb2 (32-bit, non-prefetchable) [size=16K]
> > Capabilities: 
> > Kernel driver in use: igb
> > Kernel modules: igb
> > 
> > State before suspend:
> > $ ip addr show dev enp12s0
> > 4: enp12s0:  mtu 1500 qdisc mq state
> > UP group default qlen 1000
> > link/ether 1c:1b:0d:0a:4b:0e brd ff:ff:ff:ff:ff:ff
> > inet 192.168.1.18/24 brd 192.168.1.255 scope global enp12s0
> >valid_lft forever preferred_lft forever
> > 
> > State after resume:
> > 
> > $ ip addr show dev enp12s0
> > 4: enp12s0:  mtu 1500 qdisc mq state
> > UP group default qlen 1000
> > link/ether 1c:1b:0d:0a:4b:0e brd ff:ff:ff:ff:ff:f
> > 
> > Doing ifdown/ifup which restarts the DHCP client does restore the
> > address.
> > 
> > Not sure if this is a kernel issue with carrier handling, Intel
> > driver issue, or DHCP client issue.  
> 
> We have not been able to reproduce this using Fedora and 4.18 kernel. 
> This looks to be a Debian/DHCP client specific issue, we are continuing
> our reproduction efforts but with Debian to see if we can determine a
> root cause.

Great, thanks. I will reproduce with the latest update, and file a Debian bug.


Re: [PATCH 2/2] net: phy: ensure autoneg is configured when resuming a phydev

2018-12-03 Thread Heiner Kallweit
On 03.12.2018 11:43, Anssi Hannula wrote:
> On 1.12.2018 0:16, Heiner Kallweit wrote:
>> On 30.11.2018 19:45, Anssi Hannula wrote:
>>> When a PHY_HALTED phydev is resumed by phy_start(), it is set to
>>> PHY_RESUMING to wait for the link to come up.
>>>
>>> However, if the phydev was put to PHY_HALTED (by e.g. phy_stop()) before
>>> autonegotiation was ever started by phy_state_machine(), autonegotiation
>>> remains unconfigured, i.e. phy_config_aneg() is never called.
>>>
>> phy_stop() should only be called once the PHY was started. But it's
>> right that we don't have full control over it because misbehaving
>> drivers may call phy_stop() even if the PHY hasn't been started yet.
> 
> Having run phy_start() does not guarantee that the phy_state_machine()
> has been run at least once, though.
> 
> I was able to reproduce the issue by calling phy_start(); phy_stop().
> Then on the next phy_start() autoneg is not configured.
> 
Right, phy_start() schedules the state machine run, so there is a small
window where this can happen. Did you experience this in any real-life
scenario?

>> I think phy_error() and phy_stop() should be extended to set the state
>> to PHY_HALTED only if the PHY is in a started state, means:
>> don't touch the state if it's DOWN, READY, or HALTED.
>> In case of phy_error() it's more a precaution, because it's used within
>> phylib only and AFAICS it can be triggered from a started state only.
> 
> This wouldn't fix the issue that occurs when phy_stop() is called right
> after phy_start(), though, as PHY is in UP state then.
> 
After having removed the AN state, I was already wondering whether we really
need separate READY and HALTED states. I think we wouldn't have this
scenario if phy_stop() and phy_error() set the state to READY. Merging the
two states requires some upfront work, but I have some ideas in the back of
my mind.
Overall this should be cleaner than the proposed workaround.

>>
>> So yes, there is a theoretical issue. But as you wrote already, I think
>> there's a better way to deal with it.
>>
>> For checking whether PHY is in a started state you could use the helper
>> which is being discussed here:
>> https://marc.info/?l=linux-netdev&m=154360368104946&w=2
>>
>>
>>> Fix that by going to PHY_UP instead of PHY_RESUMING if autonegotiation
>>> has never been configured.
>>>
>>> Signed-off-by: Anssi Hannula 
>>> ---
>>>
>>> This doesn't feel as clean as I'd like, though. Maybe there is a
>>> better way?
>>>
>>>
>>>  drivers/net/phy/phy.c | 11 ++-
>>>  include/linux/phy.h   |  2 ++
>>>  2 files changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>>> index d46858694db9..7a650cce7599 100644
>>> --- a/drivers/net/phy/phy.c
>>> +++ b/drivers/net/phy/phy.c
>>> @@ -553,6 +553,8 @@ int phy_start_aneg(struct phy_device *phydev)
>>> if (err < 0)
>>> goto out_unlock;
>>>  
>>> +   phydev->autoneg_configured = 1;
>>> +
>>> if (phydev->state != PHY_HALTED) {
>>> if (AUTONEG_ENABLE == phydev->autoneg) {
>>> err = phy_check_link_status(phydev);
>>> @@ -893,7 +895,14 @@ void phy_start(struct phy_device *phydev)
>>> break;
>>> }
>>>  
>>> -   phydev->state = PHY_RESUMING;
>>> +   /* if autoneg/forcing was never configured, go back to PHY_UP
>>> +* to make that happen
>>> +*/
>>> +   if (!phydev->autoneg_configured)
>>> +   phydev->state = PHY_UP;
>>> +   else
>>> +   phydev->state = PHY_RESUMING;
>>> +
>>> break;
>>> default:
>>> break;
>>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>>> index 8f927246acdb..b306d5ed9d09 100644
>>> --- a/include/linux/phy.h
>>> +++ b/include/linux/phy.h
>>> @@ -389,6 +389,8 @@ struct phy_device {
>>> unsigned loopback_enabled:1;
>>>  
>>> unsigned autoneg:1;
>>> +   /* autoneg has been configured at least once */
>>> +   unsigned autoneg_configured:1;
>>> /* The most recently read link state */
>>> unsigned link:1;
>>>  
>>>
> 



Re: No address after suspend/resume with 4.18

2018-12-03 Thread Jeff Kirsher
On Fri, 2018-11-30 at 15:26 -0800, Stephen Hemminger wrote:
> On my box with Debian testing, I see a new problem with
> suspend/resume of wired network device.
> Using stock Debian kernel 4.18.0-2-amd64
> 
> After suspend/resume cycle, IP address is lost.
> 
> Device Info:
> $ /sbin/ethtool -i enp12s0
> driver: igb
> version: 5.4.0-k
> firmware-version:  0. 6-1
> expansion-rom-version: 
> bus-info: :0c:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> 
> 
> $ lspci -v -s :0c:00.0
> 0c:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> Connection (rev 03)
>   Subsystem: Gigabyte Technology Co., Ltd I211 Gigabit Network
> Connection
>   Flags: bus master, fast devsel, latency 0, IRQ 17, NUMA node 0
>   Memory at dfb0 (32-bit, non-prefetchable) [size=128K]
>   I/O ports at c000 [size=32]
>   Memory at dfb2 (32-bit, non-prefetchable) [size=16K]
>   Capabilities: 
>   Kernel driver in use: igb
>   Kernel modules: igb
> 
> State before suspend:
> $ ip addr show dev enp12s0
> 4: enp12s0:  mtu 1500 qdisc mq state
> UP group default qlen 1000
> link/ether 1c:1b:0d:0a:4b:0e brd ff:ff:ff:ff:ff:ff
> inet 192.168.1.18/24 brd 192.168.1.255 scope global enp12s0
>valid_lft forever preferred_lft forever
> 
> State after resume:
> 
> $ ip addr show dev enp12s0
> 4: enp12s0:  mtu 1500 qdisc mq state
> UP group default qlen 1000
> link/ether 1c:1b:0d:0a:4b:0e brd ff:ff:ff:ff:ff:f
> 
> Doing ifdown/ifup which restarts the DHCP client does restore the
> address.
> 
> Not sure if this is a kernel issue with carrier handling, Intel
> driver issue, or DHCP client issue.

We have not been able to reproduce this using Fedora and 4.18 kernel. 
This looks to be a Debian/DHCP client specific issue, we are continuing
our reproduction efforts but with Debian to see if we can determine a
root cause.




Re: [PATCH net-next v4 0/2] Add mii_bus to ixgbe driver for dsa devs

2018-12-03 Thread Florian Fainelli
On 12/3/18 12:14 PM, Steve Douthit wrote:
> Changes from v3 -> v4
> * Remove unnecessary pointer casts
> * Fix copy/paste issues in comments
> * Simplify setting of swfw semaphore flags
> * Collect Reviewed-by: tags
> 
> Changes from v2 -> v3
> * Added warnings about interactions between this code and PHY polling
>   unit to the commit messages.
> 
> Changes from v1 -> v2
> [PATCH 1/2] ixgbe: register a mdiobus
> * Add intel-wired-...@lists.osuosl.org to CC list, see
> * select MII in Kconfig (thanks to the kbuild bot)
> * Only call mdiobus_register for single x500em_a device
> * Use readx_poll_timeout() in ixgbe_msca_cmd()
> * Register different bus->read/write callbacks in ixgbe_mii_bus_init()
>   so there's device id check on every access
> * Use device pci_name() in bus->id instead of parent bridge's name
> 
> [PATCH 2/2] ixgbe: use mii_bus to handle MII related ioctls
> * Only use mdiobus_read/write for adapters that registered a mdiobus

Not directly related to this patch series, are you using the legacy or
"new" way of passing platform data in order to register the DSA switch?
Since you mentioned 6390, I would assume this must be the "new" way of
registering DSA devices with mdio_boardinfo etc. In that case, have you
found yourself limited by the current dsa_chip_platform_data?

> 
> Stephen Douthit (2):
>   ixgbe: register a mdiobus
>   ixgbe: use mii_bus to handle MII related ioctls
> 
>  drivers/net/ethernet/intel/Kconfig|   1 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h  |   2 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  23 ++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c  | 307 ++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h  |   2 +
>  5 files changed, 335 insertions(+)
> 


-- 
Florian


Re: [PATCH bpf-next] bpf: fix documentation for eBPF helpers

2018-12-03 Thread Daniel Borkmann
On 12/03/2018 01:13 PM, Quentin Monnet wrote:
> The missing indentation on the "Return" sections for bpf_map_pop_elem()
> and bpf_map_peek_elem() helpers break RST and man pages generation. This
> patch fixes them, and moves the description of those two helpers towards
> the end of the list (even though they are somehow related to the three
> first helpers for maps, the man page explicitly states that the helpers
> are sorted in chronological order).
> 
> While at it, bring other minor formatting edits for eBPF helpers
> documentation: mostly blank lines removal, RST formatting, or other
> small nits for consistency.
> 
> Signed-off-by: Quentin Monnet 
> Reviewed-by: Jakub Kicinski 

Applied to bpf-next, thanks!


Re: [PATCH bpf-next v2] bpf: allow BPF read access to qdisc pkt_len

2018-12-03 Thread Daniel Borkmann
On 12/03/2018 02:18 AM, Willem de Bruijn wrote:
> From: Petar Penkov 
> 
> The pkt_len field in qdisc_skb_cb stores the skb length as it will
> appear on the wire after segmentation. For byte accounting, this value
> is more accurate than skb->len. It is computed on entry to the TC
> layer, so only valid there.
> 
> Allow read access to this field from BPF tc classifier and action
> programs. The implementation is analogous to tc_classid, aside from
> restricting to read access.
> 
> To distinguish it from skb->len and self-describe export as wire_len.
> 
> Changes v1->v2
>   - Rename pkt_len to wire_len
> 
> Signed-off-by: Petar Penkov 
> Signed-off-by: Vlad Dumitrescu 
> Signed-off-by: Willem de Bruijn 

Applied to bpf-next, thanks!


RE: [PATCH v4 net-next 5/6] net: dsa: microchip: break KSZ9477 DSA driver into two files

2018-12-03 Thread Tristram.Ha
> From: Sørensen, Stefan 
> Sent: Monday, December 03, 2018 5:50 AM
> To: da...@davemloft.net; Tristram Ha - C24268
> 
> Cc: netdev@vger.kernel.org; pa...@ucw.cz; f.faine...@gmail.com;
> UNGLinuxDriver ; and...@lunn.ch
> Subject: Re: [PATCH v4 net-next 5/6] net: dsa: microchip: break KSZ9477 DSA
> driver into two files
> 
> On Tue, 2018-11-20 at 15:55 -0800, tristram...@microchip.com wrote:
> > From: Tristram Ha 
> >
> > Break KSZ9477 DSA driver into two files in preparation to add more
> > KSZ switch drivers.
> > Add common functions in ksz_common.h so that other KSZ switch drivers
> > can access code in ksz_common.c.
> > Add ksz_spi.h for common functions used by KSZ switch SPI drivers.
> 
> With this patch, the ethernet on my EVB-KSZ9477 is now completely non-
> functional, link is detected on the LAN ports, but no packets are
> forwarded to/from the CPU port.
> 
> Looking through the patch, it does much more than splitting up the
> existing code, e.g. the PORT_VLAN_MEMBERSHIP register is now written to
> - I suspect that change may be responsible for the problem.
> 

It is more likely the RGMII interface is not set properly.  There are upcoming
patches that address this issue.  The primary way to set RGMII delay is to set
phy-mode to "rgmii-txid" inside the device tree under the ksz9477 definition.
The chip can be set to MII/RMII with strap options, but there is no way to
change RGMII delay settings.

The patch will try to get the mode from hardware first.  When it is not set in
the device tree, this mode will be used.  When it differs from the one set in
the device tree, a warning will be displayed.  The one from the device tree
will be the one to use.

This patch will also support KSZ9893, a 3-port variant of KSZ9477.  It shares
most of the registers but has a different tail tag format, and uses a different
RGMII setting in the evaluation board.

Or do you use a different CPU port rather than the default port 6?



Re: [PATCH bpf-next] libbpf: Fix license in README.rst

2018-12-03 Thread Daniel Borkmann
On 12/03/2018 08:48 AM, Song Liu wrote:
> On Sun, Dec 2, 2018 at 1:04 PM Andrey Ignatov  wrote:
>>
>> The whole libbpf is licensed as (LGPL-2.1 OR BSD-2-Clause). I missed it
>> while adding README.rst. Fix it and use same license as all other files
>> in libbpf do. Since I'm the only author of README.rst so far, no others'
>> permissions should be needed.
>>
>> Fixes: 76d1b894c515 ("libbpf: Document API and ABI conventions")
>> Signed-off-by: Andrey Ignatov 
> Acked-by: Song Liu 

Applied, thanks!


[PATCH net-next v4 0/2] Add mii_bus to ixgbe driver for dsa devs

2018-12-03 Thread Steve Douthit
Changes from v3 -> v4
* Remove unnecessary pointer casts
* Fix copy/paste issues in comments
* Simplify setting of swfw semaphore flags
* Collect Reviewed-by: tags

Changes from v2 -> v3
* Added warnings about interactions between this code and PHY polling
  unit to the commit messages.

Changes from v1 -> v2
[PATCH 1/2] ixgbe: register a mdiobus
* Add intel-wired-...@lists.osuosl.org to CC list, see
* select MII in Kconfig (thanks to the kbuild bot)
* Only call mdiobus_register for single x500em_a device
* Use readx_poll_timeout() in ixgbe_msca_cmd()
* Register different bus->read/write callbacks in ixgbe_mii_bus_init()
  so there's device id check on every access
* Use device pci_name() in bus->id instead of parent bridge's name

[PATCH 2/2] ixgbe: use mii_bus to handle MII related ioctls
* Only use mdiobus_read/write for adapters that registered a mdiobus

Stephen Douthit (2):
  ixgbe: register a mdiobus
  ixgbe: use mii_bus to handle MII related ioctls

 drivers/net/ethernet/intel/Kconfig|   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |   2 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  23 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c  | 307 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h  |   2 +
 5 files changed, 335 insertions(+)

-- 
2.17.2



[PATCH net-next v4 1/2] ixgbe: register a mdiobus

2018-12-03 Thread Steve Douthit
Most dsa devices expect a 'struct mii_bus' pointer to talk to switches
via the MII interface.

While this works for dsa devices, it will not work safely with Linux
PHYs in all configurations since the firmware of the ixgbe device may
be polling some PHY addresses in the background.

Signed-off-by: Stephen Douthit 
Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
---
 drivers/net/ethernet/intel/Kconfig|   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |   2 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   5 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c  | 299 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h  |   2 +
 5 files changed, 309 insertions(+)

diff --git a/drivers/net/ethernet/intel/Kconfig 
b/drivers/net/ethernet/intel/Kconfig
index 59e1bc0f609e..79ad2c5860f0 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -159,6 +159,7 @@ config IXGBE
tristate "Intel(R) 10GbE PCI Express adapters support"
depends on PCI
select MDIO
+   select MII
imply PTP_1588_CLOCK
---help---
  This driver supports Intel(R) 10GbE PCI Express family of
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 143bdd5ee2a0..08d85e336bd4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -561,6 +562,7 @@ struct ixgbe_adapter {
struct net_device *netdev;
struct bpf_prog *xdp_prog;
struct pci_dev *pdev;
+   struct mii_bus *mii_bus;
 
unsigned long state;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b57fa0cfb222..b7907674c565 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -39,6 +39,7 @@
 #include "ixgbe.h"
 #include "ixgbe_common.h"
 #include "ixgbe_dcb_82599.h"
+#include "ixgbe_phy.h"
 #include "ixgbe_sriov.h"
 #include "ixgbe_model.h"
 #include "ixgbe_txrx_common.h"
@@ -11120,6 +11121,8 @@ static int ixgbe_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
IXGBE_LINK_SPEED_10GB_FULL | IXGBE_LINK_SPEED_1GB_FULL,
true);
 
+   ixgbe_mii_bus_init(hw);
+
return 0;
 
 err_register:
@@ -11170,6 +11173,8 @@ static void ixgbe_remove(struct pci_dev *pdev)
set_bit(__IXGBE_REMOVING, >state);
cancel_work_sync(>service_task);
 
+   if (adapter->mii_bus)
+   mdiobus_unregister(adapter->mii_bus);
 
 #ifdef CONFIG_IXGBE_DCA
if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c
index 919a7af84b42..cc4907f9ff02 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include "ixgbe.h"
@@ -658,6 +659,304 @@ s32 ixgbe_write_phy_reg_generic(struct ixgbe_hw *hw, u32 
reg_addr,
return status;
 }
 
+#define IXGBE_HW_READ_REG(addr) IXGBE_READ_REG(hw, addr)
+
+/**
+ *  ixgbe_msca_cmd - Write the command register and poll for completion/timeout
+ *  @hw: pointer to hardware structure
+ *  @cmd: command register value to write
+ **/
+static s32 ixgbe_msca_cmd(struct ixgbe_hw *hw, u32 cmd)
+{
+   IXGBE_WRITE_REG(hw, IXGBE_MSCA, cmd);
+
+   return readx_poll_timeout(IXGBE_HW_READ_REG, IXGBE_MSCA, cmd,
+ !(cmd & IXGBE_MSCA_MDI_COMMAND), 10,
+ 10 * IXGBE_MDIO_COMMAND_TIMEOUT);
+}
+
+/**
+ *  ixgbe_mii_bus_read_generic - Read a clause 22/45 register with gssr flags
+ *  @hw: pointer to hardware structure
+ *  @addr: address
+ *  @regnum: register number
+ *  @gssr: semaphore flags to acquire
+ **/
+static s32 ixgbe_mii_bus_read_generic(struct ixgbe_hw *hw, int addr,
+ int regnum, u32 gssr)
+{
+   u32 hwaddr, cmd;
+   s32 data;
+
+   if (hw->mac.ops.acquire_swfw_sync(hw, gssr))
+   return -EBUSY;
+
+   hwaddr = addr << IXGBE_MSCA_PHY_ADDR_SHIFT;
+   if (regnum & MII_ADDR_C45) {
+   hwaddr |= regnum & GENMASK(21, 0);
+   cmd = hwaddr | IXGBE_MSCA_ADDR_CYCLE | IXGBE_MSCA_MDI_COMMAND;
+   } else {
+   hwaddr |= (regnum & GENMASK(5, 0)) << IXGBE_MSCA_DEV_TYPE_SHIFT;
+   cmd = hwaddr | IXGBE_MSCA_OLD_PROTOCOL |
+   IXGBE_MSCA_READ_AUTOINC | IXGBE_MSCA_MDI_COMMAND;
+   }
+
+   data = ixgbe_msca_cmd(hw, cmd);
+   if (data < 0)
+   goto mii_bus_read_done;
+
+   /* For a clause 45 access the address cycle just completed, we still
+* need to do the read command, otherwise just get the data
+*/
+   

[PATCH net-next v4 2/2] ixgbe: use mii_bus to handle MII related ioctls

2018-12-03 Thread Steve Douthit
Use the mii_bus callbacks to address the entire clause 22/45 address
space.  Enables userspace to poke switch registers instead of a single
PHY address.

The ixgbe firmware may be polling PHYs in a way that is not protected by
the mii_bus lock.  This isn't new behavior, but as Andrew Lunn pointed
out there are more addresses available for conflicts.

Signed-off-by: Stephen Douthit 
Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b7907674c565..caa64afdedfb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8790,6 +8790,15 @@ ixgbe_mdio_read(struct net_device *netdev, int prtad, 
int devad, u16 addr)
u16 value;
int rc;
 
+   if (adapter->mii_bus) {
+   int regnum = addr;
+
+   if (devad != MDIO_DEVAD_NONE)
+   regnum |= (devad << 16) | MII_ADDR_C45;
+
+   return mdiobus_read(adapter->mii_bus, prtad, regnum);
+   }
+
if (prtad != hw->phy.mdio.prtad)
return -EINVAL;
	rc = hw->phy.ops.read_reg(hw, addr, devad, &value);
@@ -8804,6 +8813,15 @@ static int ixgbe_mdio_write(struct net_device *netdev, 
int prtad, int devad,
struct ixgbe_adapter *adapter = netdev_priv(netdev);
struct ixgbe_hw *hw = >hw;
 
+   if (adapter->mii_bus) {
+   int regnum = addr;
+
+   if (devad != MDIO_DEVAD_NONE)
+   regnum |= (devad << 16) | MII_ADDR_C45;
+
+   return mdiobus_write(adapter->mii_bus, prtad, regnum, value);
+   }
+
if (prtad != hw->phy.mdio.prtad)
return -EINVAL;
return hw->phy.ops.write_reg(hw, addr, devad, value);
-- 
2.17.2



Re: consistency for statistics with XDP mode

2018-12-03 Thread Toke Høiland-Jørgensen
David Miller  writes:

> From: David Ahern 
> Date: Mon, 3 Dec 2018 08:45:12 -0700
>
>> On 12/1/18 4:22 AM, Jesper Dangaard Brouer wrote:
>>> IMHO XDP_DROP should not be accounted as netdev stats drops, this is
>>> a user installed program like tc/iptables, that can also choose to
>>> drop packets.
>> 
>> sure and both tc and iptables have counters that can see the dropped
>> packets. A counter in the driver level stats ("xdp_drop" is fine with
>> with me).
>
> Part of the problem I have with this kind of logic is we take the
> choice away from the XDP program.

I wonder if it would be possible to support both the "give me user
normal stats" case and the "let me do whatever I want" case by a
combination of userspace tooling and maybe a helper or two?

I.e., create a "do_stats()" helper (please pick a better name), which
will either just increment the desired counters, or set a flag so the
driver can do it at napi poll exit. With this, the userspace tooling
could have a "--give-me-normal-stats" switch (or some other interface),
which would inject a call instruction to that helper at the start of the
program.

This would enable the normal counters in a relatively painless way,
while still letting people opt out if they don't want to pay the cost in
terms of overhead. And having the userspace tooling inject the helper
call helps support the case where the admin didn't write the XDP
programs being loaded.

Any reason why that wouldn't work?

-Toke


Re: [PATCH net-next v3 1/2] ixgbe: register a mdiobus

2018-12-03 Thread Florian Fainelli
On 12/3/18 11:44 AM, Steve Douthit wrote:
> On 12/3/18 2:07 PM, Florian Fainelli wrote:
>> On 12/3/18 10:55 AM, Steve Douthit wrote:
>>> Most dsa devices expect a 'struct mii_bus' pointer to talk to switches
>>> via the MII interface.
>>>
>>> While this works for dsa devices, it will not work safely with Linux
>>> PHYs in all configurations since the firmware of the ixgbe device may
>>> be polling some PHY addresses in the background.
>>>
>>> Signed-off-by: Stephen Douthit 
>>> ---
>>
>> [snip]
>>
>>> +/**
>>> + *  ixgbe_mii_bus_write - Write a clause 22/45 register
>>> + *  @hw: pointer to hardware structure
>>> + *  @addr: address
>>> + *  @regnum: register number
>>> + *  @regnum: valueto write
>>
>> This should be @val to match the function parameters
> 
> OK
> 
>>> + **/
>>> +static s32 ixgbe_mii_bus_write(struct mii_bus *bus, int addr, int regnum,
>>> +  u16 val)
>>> +{
>>> +   struct ixgbe_adapter *adapter = (struct ixgbe_adapter *)bus->priv;
>>
>> Nitpick: cast is not necessary since this is a void * pointer (for that
>> reason).
> 
> OK, I'll clean up this and other unnecessary casts.
> 
> I forgot Andrew's suggestion to squash the swfw semaphore masks from:
> 
> + u32 gssr = hw->phy.phy_semaphore_mask | IXGBE_GSSR_TOKEN_SM;
> +
> + if (hw->bus.lan_id)
> + gssr |= IXGBE_GSSR_PHY1_SM;
> + else
> + gssr |= IXGBE_GSSR_PHY0_SM;
> 
> to
> 
> + u32 gssr = hw->phy.phy_semaphore_mask;
> + gssr |= IXGBE_GSSR_TOKEN_SM | IXGBE_GSSR_PHY0_SM;
> 
> Is it ok to collect both of your 'Reviewed-by:'s with that additional
> change for v4?

I'd think so.
-- 
Florian


Re: [PATCH net-next v3 1/2] ixgbe: register a mdiobus

2018-12-03 Thread Steve Douthit
On 12/3/18 2:07 PM, Florian Fainelli wrote:
> On 12/3/18 10:55 AM, Steve Douthit wrote:
>> Most dsa devices expect a 'struct mii_bus' pointer to talk to switches
>> via the MII interface.
>>
>> While this works for dsa devices, it will not work safely with Linux
>> PHYs in all configurations since the firmware of the ixgbe device may
>> be polling some PHY addresses in the background.
>>
>> Signed-off-by: Stephen Douthit 
>> ---
> 
> [snip]
> 
>> +/**
>> + *  ixgbe_mii_bus_write - Write a clause 22/45 register
>> + *  @hw: pointer to hardware structure
>> + *  @addr: address
>> + *  @regnum: register number
>> + *  @regnum: valueto write
> 
> This should be @val to match the function parameters

OK

>> + **/
>> +static s32 ixgbe_mii_bus_write(struct mii_bus *bus, int addr, int regnum,
>> +   u16 val)
>> +{
>> +struct ixgbe_adapter *adapter = (struct ixgbe_adapter *)bus->priv;
> 
> Nitpick: cast is not necessary since this is a void * pointer (for that
> reason).

OK, I'll clean up this and other unnecessary casts.

I forgot Andrew's suggestion to squash the swfw semaphore masks from:

+   u32 gssr = hw->phy.phy_semaphore_mask | IXGBE_GSSR_TOKEN_SM;
+
+   if (hw->bus.lan_id)
+   gssr |= IXGBE_GSSR_PHY1_SM;
+   else
+   gssr |= IXGBE_GSSR_PHY0_SM;

to

+   u32 gssr = hw->phy.phy_semaphore_mask;
+   gssr |= IXGBE_GSSR_TOKEN_SM | IXGBE_GSSR_PHY0_SM;

Is it ok to collect both of your 'Reviewed-by:'s with that additional
change for v4?


Re: consistency for statistics with XDP mode

2018-12-03 Thread Arnaldo Carvalho de Melo
Em Mon, Dec 03, 2018 at 11:30:01AM -0800, David Miller escreveu:
> From: David Ahern 
> Date: Mon, 3 Dec 2018 08:45:12 -0700
> 
> > On 12/1/18 4:22 AM, Jesper Dangaard Brouer wrote:
> >> IMHO XDP_DROP should not be accounted as netdev stats drops, this is a
> >> user installed program like tc/iptables, that can also choose to drop
> >> packets.
> > 
> > sure and both tc and iptables have counters that can see the dropped
> > packets. A counter in the driver level stats ("xdp_drop" is fine
> > with me).
> 
> Part of the problem I have with this kind of logic is we take the choice
> away from the XDP program.
> 
> If I feel that the xdp_drop counter bump is too much overhead during a
> DDoS attack and I want to avoid it, you don't give me a choice in the
> matter.
> 
> If I want to represent the statistics for that event differently, you
> also give me no choice about it.
> 
> Really, if XDP_DROP is returned, zero resources should be devoted to
> the frame past that point.
> 
> I know you want to live in this magical world where XDP stuff behaves
> like the existing stack and give you all of the visibility to events
> and objects.
> 
> But that is your choice.
> 
> Please give others the choice to not live in that world and allow XDP
> programs to live in their own entirely different environment, with
> custom statistics and complete control over how counters are
> incremented and how objects are used and represented, if they choose
> to do so.
> 
> XDP is about choice.

Coming out of the blue...: the presence of a "struct xdp_stats" in the
XDP program BPF object file .BTF section, one could query and then parse
to figure out what stats, if any, are provided.

/me goes back to tweaking his btf_loader in pahole... :-)

- Arnaldo


Re: consistency for statistics with XDP mode

2018-12-03 Thread David Miller
From: David Ahern 
Date: Mon, 3 Dec 2018 08:56:28 -0700

> I do not want Linux + XDP to require custom anything to debug basic
> functionality - such as isolating a packet drop to a specific point.

Definitely we need to let people choose to arrange things that way
if they want to do so.  It is not our place to make that kind
of decision for them.

If they want custom statistics and a custom representation of objects
in their tools, etc.  That is completely fine.

XDP is not about forcing a model of these things upon people.


Re: consistency for statistics with XDP mode

2018-12-03 Thread David Miller
From: David Ahern 
Date: Mon, 3 Dec 2018 08:45:12 -0700

> On 12/1/18 4:22 AM, Jesper Dangaard Brouer wrote:
>> IMHO XDP_DROP should not be accounted as netdev stats drops, this is a
>> user installed program like tc/iptables, that can also choose to drop
>> packets.
> 
> sure and both tc and iptables have counters that can see the dropped
> packets. A counter in the driver level stats ("xdp_drop" is fine
> with me).

Part of the problem I have with this kind of logic is we take the choice
away from the XDP program.

If I feel that the xdp_drop counter bump is too much overhead during a
DDoS attack and I want to avoid it, you don't give me a choice in the
matter.

If I want to represent the statistics for that event differently, you
also give me no choice about it.

Really, if XDP_DROP is returned, zero resources should be devoted to
the frame past that point.

I know you want to live in this magical world where XDP stuff behaves
like the existing stack and give you all of the visibility to events
and objects.

But that is your choice.

Please give others the choice to not live in that world and allow XDP
programs to live in their own entirely different environment, with
custom statistics and complete control over how counters are
incremented and how objects are used and represented, if they choose
to do so.

XDP is about choice.


Re: [PATCH net-next v1 1/4] etf: Cancel timer if there are no pending skbs

2018-12-03 Thread Vinicius Costa Gomes
Hi,

Vinicius Costa Gomes  writes:

> From: Jesus Sanchez-Palencia 
>
> There is no point in firing the qdisc watchdog if there are no future
> skbs pending in the queue and the watchdog had been set previously.
>
> Signed-off-by: Jesus Sanchez-Palencia 

It seems that I made a mistake when submitting this series.

It should have been proposed to the net tree (instead of net-next) and
should have included a "Fixes" tag.

Is there a way to have this series considered for the net tree at this
point (given that it already landed on net-next)?


Cheers,
--
Vinicius


Re: [PATCH net-next v3 2/2] ixgbe: use mii_bus to handle MII related ioctls

2018-12-03 Thread Florian Fainelli
On 12/3/18 10:55 AM, Steve Douthit wrote:
> Use the mii_bus callbacks to address the entire clause 22/45 address
> space.  Enables userspace to poke switch registers instead of a single
> PHY address.
> 
> The ixgbe firmware may be polling PHYs in a way that is not protected by
> the mii_bus lock.  This isn't new behavior, but as Andrew Lunn pointed
> out there are more addresses available for conflicts.
> 
> Signed-off-by: Stephen Douthit 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH net-next v3 1/2] ixgbe: register a mdiobus

2018-12-03 Thread Florian Fainelli
On 12/3/18 10:55 AM, Steve Douthit wrote:
> Most dsa devices expect a 'struct mii_bus' pointer to talk to switches
> via the MII interface.
> 
> While this works for dsa devices, it will not work safely with Linux
> PHYs in all configurations since the firmware of the ixgbe device may
> be polling some PHY addresses in the background.
> 
> Signed-off-by: Stephen Douthit 
> ---

[snip]

> +/**
> + *  ixgbe_mii_bus_write - Write a clause 22/45 register
> + *  @hw: pointer to hardware structure
> + *  @addr: address
> + *  @regnum: register number
> + *  @regnum: valueto write

This should be @val to match the function parameters

> + **/
> +static s32 ixgbe_mii_bus_write(struct mii_bus *bus, int addr, int regnum,
> +u16 val)
> +{
> + struct ixgbe_adapter *adapter = (struct ixgbe_adapter *)bus->priv;

Nitpick: cast is not necessary since this is a void * pointer (for that
reason).

> + struct ixgbe_hw *hw = &adapter->hw;
> + u32 gssr = hw->phy.phy_semaphore_mask;
> +
> + return ixgbe_mii_bus_write_generic(hw, addr, regnum, val, gssr);
> +}
> +
> +/**
> + *  ixgbe_x550em_a_mii_bus_read - Read a clause 22/45 register on x550em_a
> + *  @hw: pointer to hardware structure
> + *  @addr: address
> + *  @regnum: register number
> + **/
> +static s32 ixgbe_x550em_a_mii_bus_read(struct mii_bus *bus, int addr,
> +int regnum)
> +{
> + struct ixgbe_adapter *adapter = (struct ixgbe_adapter *)bus->priv;

Likewise, the cast is not necessary.

Everything else looked fine, so with that fixed:

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH net-next v3 2/2] ixgbe: use mii_bus to handle MII related ioctls

2018-12-03 Thread Andrew Lunn
On Mon, Dec 03, 2018 at 06:55:26PM +, Steve Douthit wrote:
> Use the mii_bus callbacks to address the entire clause 22/45 address
> space.  Enables userspace to poke switch registers instead of a single
> PHY address.
> 
> The ixgbe firmware may be polling PHYs in a way that is not protected by
> the mii_bus lock.  This isn't new behavior, but as Andrew Lunn pointed
> out there are more addresses available for conflicts.
> 
> Signed-off-by: Stephen Douthit 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net-next v3 1/2] ixgbe: register a mdiobus

2018-12-03 Thread Andrew Lunn
On Mon, Dec 03, 2018 at 06:55:22PM +, Steve Douthit wrote:
> Most dsa devices expect a 'struct mii_bus' pointer to talk to switches
> via the MII interface.
> 
> While this works for dsa devices, it will not work safely with Linux
> PHYs in all configurations since the firmware of the ixgbe device may
> be polling some PHY addresses in the background.
> 
> Signed-off-by: Stephen Douthit 

Reviewed-by: Andrew Lunn 

Andrew


[PATCH bpf-next 2/3] bpf: add BPF_PROG_TEST_RUN support for flow dissector

2018-12-03 Thread Stanislav Fomichev
The input is packet data; the output is struct bpf_flow_keys. This should
make it easy to test flow dissector programs without elaborate
setup.

Signed-off-by: Stanislav Fomichev 
---
 include/linux/bpf.h |  3 ++
 net/bpf/test_run.c  | 76 +
 net/core/filter.c   |  1 +
 3 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e82b7039fc66..7a572d15d5dd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -373,6 +373,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
  union bpf_attr __user *uattr);
 int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
  union bpf_attr __user *uattr);
+int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
+const union bpf_attr *kattr,
+union bpf_attr __user *uattr);
 
 /* an array of programs to be executed under rcu_lock.
  *
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index c89c22c49015..bfa05d31c6e3 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -14,21 +14,33 @@
 #include 
 
 static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
-   struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
+   struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
+   struct bpf_flow_keys *flow_keys)
 {
u32 ret;
 
preempt_disable();
rcu_read_lock();
bpf_cgroup_storage_set(storage);
-   ret = BPF_PROG_RUN(prog, ctx);
+
+   switch (prog->type) {
+   case BPF_PROG_TYPE_FLOW_DISSECTOR:
+   ret = __skb_flow_bpf_dissect(prog, ctx, &flow_keys_dissector,
+flow_keys);
+   break;
+   default:
+   ret = BPF_PROG_RUN(prog, ctx);
+   break;
+   }
+
rcu_read_unlock();
preempt_enable();
 
return ret;
 }
 
-static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
+static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time,
+   struct bpf_flow_keys *flow_keys)
 {
struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { 0 };
enum bpf_cgroup_storage_type stype;
@@ -49,7 +61,7 @@ static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
repeat = 1;
time_start = ktime_get_ns();
for (i = 0; i < repeat; i++) {
-   ret = bpf_test_run_one(prog, ctx, storage);
+   ret = bpf_test_run_one(prog, ctx, storage, flow_keys);
if (need_resched()) {
if (signal_pending(current))
break;
@@ -165,7 +177,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
__skb_push(skb, hh_len);
if (is_direct_pkt_access)
bpf_compute_data_pointers(skb);
-   retval = bpf_test_run(prog, skb, repeat, &duration);
+   retval = bpf_test_run(prog, skb, repeat, &duration, NULL);
if (!is_l2) {
if (skb_headroom(skb) < hh_len) {
int nhead = HH_DATA_ALIGN(hh_len - skb_headroom(skb));
@@ -212,7 +224,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
xdp.rxq = &rxqueue->xdp_rxq;
 
-   retval = bpf_test_run(prog, &xdp, repeat, &duration);
+   retval = bpf_test_run(prog, &xdp, repeat, &duration, NULL);
if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
xdp.data_end != xdp.data + size)
size = xdp.data_end - xdp.data;
@@ -220,3 +232,55 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
kfree(data);
return ret;
 }
+
+int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
+const union bpf_attr *kattr,
+union bpf_attr __user *uattr)
+{
+   struct bpf_flow_keys flow_keys = {};
+   u32 size = kattr->test.data_size_in;
+   u32 repeat = kattr->test.repeat;
+   u32 retval, duration;
+   struct sk_buff *skb;
+   struct sock *sk;
+   void *data;
+   int ret;
+
+   if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
+   return -EINVAL;
+
+   data = bpf_test_init(kattr, size, NET_SKB_PAD + NET_IP_ALIGN,
+SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
+   if (IS_ERR(data))
+   return PTR_ERR(data);
+
+   sk = kzalloc(sizeof(*sk), GFP_USER);
+   if (!sk) {
+   kfree(data);
+   return -ENOMEM;
+   }
+   sock_net_set(sk, current->nsproxy->net_ns);
+   sock_init_data(NULL, sk);
+
+   

[PATCH bpf-next 3/3] selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow dissector

2018-12-03 Thread Stanislav Fomichev
Use existing pkt_v4 and pkt_v6 to make sure flow_keys are what we want.

Also, add a new bpf_flow_load routine (and flow_dissector_load.h header)
that loads the bpf_flow.o program and does all required setup.

Signed-off-by: Stanislav Fomichev 
---
 tools/testing/selftests/bpf/Makefile  |  3 +
 .../selftests/bpf/flow_dissector_load.c   | 43 ++-
 .../selftests/bpf/flow_dissector_load.h   | 55 ++
 tools/testing/selftests/bpf/test_progs.c  | 76 ++-
 4 files changed, 137 insertions(+), 40 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/flow_dissector_load.h

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 73aa6d8f4a2f..387763e6d137 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -19,6 +19,9 @@ all: $(TEST_CUSTOM_PROGS)
 $(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c
$(CC) -o $(TEST_CUSTOM_PROGS) -static $< -Wl,--build-id
 
+flow_dissector_load.c: flow_dissector_load.h
+test_run.c: flow_dissector_load.h
+
 # Order correspond to 'make run_tests' order
TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
diff --git a/tools/testing/selftests/bpf/flow_dissector_load.c b/tools/testing/selftests/bpf/flow_dissector_load.c
index ae8180b11d5f..77cafa66d048 100644
--- a/tools/testing/selftests/bpf/flow_dissector_load.c
+++ b/tools/testing/selftests/bpf/flow_dissector_load.c
@@ -12,6 +12,7 @@
 #include 
 
 #include "bpf_rlimit.h"
+#include "flow_dissector_load.h"
 
 const char *cfg_pin_path = "/sys/fs/bpf/flow_dissector";
 const char *cfg_map_name = "jmp_table";
@@ -21,46 +22,13 @@ char *cfg_path_name;
 
 static void load_and_attach_program(void)
 {
-   struct bpf_program *prog, *main_prog;
-   struct bpf_map *prog_array;
-   int i, fd, prog_fd, ret;
+   int prog_fd, ret;
struct bpf_object *obj;
-   int prog_array_fd;
 
-   ret = bpf_prog_load(cfg_path_name, BPF_PROG_TYPE_FLOW_DISSECTOR, &obj,
-                       &prog_fd);
+   ret = bpf_flow_load(&obj, cfg_path_name, cfg_section_name,
+                       cfg_map_name, &prog_fd);
if (ret)
-   error(1, 0, "bpf_prog_load %s", cfg_path_name);
-
-   main_prog = bpf_object__find_program_by_title(obj, cfg_section_name);
-   if (!main_prog)
-   error(1, 0, "bpf_object__find_program_by_title %s",
- cfg_section_name);
-
-   prog_fd = bpf_program__fd(main_prog);
-   if (prog_fd < 0)
-   error(1, 0, "bpf_program__fd");
-
-   prog_array = bpf_object__find_map_by_name(obj, cfg_map_name);
-   if (!prog_array)
-   error(1, 0, "bpf_object__find_map_by_name %s", cfg_map_name);
-
-   prog_array_fd = bpf_map__fd(prog_array);
-   if (prog_array_fd < 0)
-   error(1, 0, "bpf_map__fd %s", cfg_map_name);
-
-   i = 0;
-   bpf_object__for_each_program(prog, obj) {
-   fd = bpf_program__fd(prog);
-   if (fd < 0)
-   error(1, 0, "bpf_program__fd");
-
-   if (fd != prog_fd) {
-   printf("%d: %s\n", i, bpf_program__title(prog, false));
-   bpf_map_update_elem(prog_array_fd, &i, &fd, BPF_ANY);
-   ++i;
-   }
-   }
+   error(1, 0, "bpf_flow_load %s", cfg_path_name);
 
ret = bpf_prog_attach(prog_fd, 0 /* Ignore */, BPF_FLOW_DISSECTOR, 0);
if (ret)
@@ -69,7 +37,6 @@ static void load_and_attach_program(void)
ret = bpf_object__pin(obj, cfg_pin_path);
if (ret)
error(1, 0, "bpf_object__pin %s", cfg_pin_path);
-
 }
 
 static void detach_program(void)
diff --git a/tools/testing/selftests/bpf/flow_dissector_load.h b/tools/testing/selftests/bpf/flow_dissector_load.h
new file mode 100644
index ..9f2fd38e30b0
--- /dev/null
+++ b/tools/testing/selftests/bpf/flow_dissector_load.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef FLOW_DISSECTOR_LOAD
+#define FLOW_DISSECTOR_LOAD
+
+#include 
+#include 
+
+static inline int bpf_flow_load(struct bpf_object **obj,
+   const char *path,
+   const char *section_name,
+   const char *map_name,
+   int *prog_fd)
+{
+   struct bpf_program *prog, *main_prog;
+   struct bpf_map *prog_array;
+   int prog_array_fd;
+   int ret, fd, i;
+
+   ret = bpf_prog_load(path, BPF_PROG_TYPE_FLOW_DISSECTOR, obj,
+   prog_fd);
+   if (ret)
+   return ret;
+
+   main_prog = bpf_object__find_program_by_title(*obj, section_name);
+   if (!main_prog)
+   return ret;
+
+   *prog_fd = bpf_program__fd(main_prog);
+   if (*prog_fd < 0)
+   return ret;
+
+   

[PATCH bpf-next 1/3] net/flow_dissector: move bpf case into __skb_flow_bpf_dissect

2018-12-03 Thread Stanislav Fomichev
This way, we can reuse it for flow dissector in BPF_PROG_TEST_RUN.

No functional changes.

Signed-off-by: Stanislav Fomichev 
---
 include/linux/skbuff.h|  5 +++
 net/core/flow_dissector.c | 83 +++
 2 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 73902acf2b71..c10b27bb0cab 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1215,6 +1215,11 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 }
 #endif
 
+struct bpf_flow_keys;
+bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
+   const struct sk_buff *skb,
+   struct flow_dissector *flow_dissector,
+   struct bpf_flow_keys *flow_keys);
 bool __skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
void *target_container,
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 2e8d91e54179..b465f894ec20 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -683,6 +683,39 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
}
 }
 
+bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
+   const struct sk_buff *skb,
+   struct flow_dissector *flow_dissector,
+   struct bpf_flow_keys *flow_keys)
+{
+   struct bpf_skb_data_end cb_saved;
+   struct bpf_skb_data_end *cb;
+   u32 result;
+
+   /* Note that even though the const qualifier is discarded
+* throughout the execution of the BPF program, all changes(the
+* control block) are reverted after the BPF program returns.
+* Therefore, __skb_flow_dissect does not alter the skb.
+*/
+
+   cb = (struct bpf_skb_data_end *)skb->cb;
+
+   /* Save Control Block */
+   memcpy(&cb_saved, cb, sizeof(cb_saved));
+   memset(cb, 0, sizeof(cb_saved));
+
+   /* Pass parameters to the BPF program */
+   cb->qdisc_cb.flow_keys = flow_keys;
+
+   bpf_compute_data_pointers((struct sk_buff *)skb);
+   result = BPF_PROG_RUN(prog, skb);
+
+   /* Restore state */
+   memcpy(cb, &cb_saved, sizeof(cb_saved));
+
+   return result == BPF_OK;
+}
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
 * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
@@ -714,7 +747,6 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector_key_vlan *key_vlan;
enum flow_dissect_ret fdret;
enum flow_dissector_key_id dissector_vlan = FLOW_DISSECTOR_KEY_MAX;
-   struct bpf_prog *attached = NULL;
int num_hdrs = 0;
u8 ip_proto = 0;
bool ret;
@@ -754,49 +786,32 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
  FLOW_DISSECTOR_KEY_BASIC,
  target_container);
 
-   rcu_read_lock();
if (skb) {
+   struct bpf_flow_keys flow_keys = {};
+   struct bpf_prog *attached = NULL;
+
+   rcu_read_lock();
+
if (skb->dev)
attached = rcu_dereference(dev_net(skb->dev)->flow_dissector_prog);
else if (skb->sk)
attached = rcu_dereference(sock_net(skb->sk)->flow_dissector_prog);
else
WARN_ON_ONCE(1);
-   }
-   if (attached) {
-   /* Note that even though the const qualifier is discarded
-* throughout the execution of the BPF program, all changes(the
-* control block) are reverted after the BPF program returns.
-* Therefore, __skb_flow_dissect does not alter the skb.
-*/
-   struct bpf_flow_keys flow_keys = {};
-   struct bpf_skb_data_end cb_saved;
-   struct bpf_skb_data_end *cb;
-   u32 result;
-
-   cb = (struct bpf_skb_data_end *)skb->cb;
 
-   /* Save Control Block */
-   memcpy(&cb_saved, cb, sizeof(cb_saved));
-   memset(cb, 0, sizeof(cb_saved));
-
-   /* Pass parameters to the BPF program */
-   cb->qdisc_cb.flow_keys = &flow_keys;
-   flow_keys.nhoff = nhoff;
-
-   bpf_compute_data_pointers((struct sk_buff *)skb);
-   result = BPF_PROG_RUN(attached, skb);
-
-   /* Restore state */
-   memcpy(cb, &cb_saved, sizeof(cb_saved));
-
-   __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
-target_container);
-   key_control->thoff = min_t(u16, key_control->thoff, skb->len);
+   if (attached) {
+   ret = __skb_flow_bpf_dissect(attached, 

[PATCH bpf-next 0/3] support flow dissector in BPF_PROG_TEST_RUN

2018-12-03 Thread Stanislav Fomichev
This patch series adds support for testing flow dissector BPF programs by
extending already existing BPF_PROG_TEST_RUN. The goal is to have a
packet as an input and `struct bpf_flow_keys' as an output. That way
we can easily test flow dissector programs' behavior.
I've also modified the existing test_progs.c test to do a simple flow
dissector run as well.

* first patch introduces new __skb_flow_bpf_dissect to simplify
  sharing between __skb_flow_bpf_dissect and BPF_PROG_TEST_RUN
* second patch adds actual BPF_PROG_TEST_RUN support
* third patch adds example usage to the selftests

Stanislav Fomichev (3):
  net/flow_dissector: move bpf case into __skb_flow_bpf_dissect
  bpf: add BPF_PROG_TEST_RUN support for flow dissector
  selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow
dissector

 include/linux/bpf.h   |  3 +
 include/linux/skbuff.h|  5 ++
 net/bpf/test_run.c| 76 +++--
 net/core/filter.c |  1 +
 net/core/flow_dissector.c | 83 +++
 tools/testing/selftests/bpf/Makefile  |  3 +
 .../selftests/bpf/flow_dissector_load.c   | 43 ++
 .../selftests/bpf/flow_dissector_load.h   | 55 
 tools/testing/selftests/bpf/test_progs.c  | 76 -
 9 files changed, 265 insertions(+), 80 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/flow_dissector_load.h

-- 
2.20.0.rc1.387.gf8505762e3-goog



[PATCH net-next v3 2/2] ixgbe: use mii_bus to handle MII related ioctls

2018-12-03 Thread Steve Douthit
Use the mii_bus callbacks to address the entire clause 22/45 address
space.  Enables userspace to poke switch registers instead of a single
PHY address.

The ixgbe firmware may be polling PHYs in a way that is not protected by
the mii_bus lock.  This isn't new behavior, but as Andrew Lunn pointed
out there are more addresses available for conflicts.

Signed-off-by: Stephen Douthit 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 82af3b24d222..fb066c491abe 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8790,6 +8790,15 @@ ixgbe_mdio_read(struct net_device *netdev, int prtad, int devad, u16 addr)
u16 value;
int rc;
 
+   if (adapter->mii_bus) {
+   int regnum = addr;
+
+   if (devad != MDIO_DEVAD_NONE)
+   regnum |= (devad << 16) | MII_ADDR_C45;
+
+   return mdiobus_read(adapter->mii_bus, prtad, regnum);
+   }
+
if (prtad != hw->phy.mdio.prtad)
return -EINVAL;
rc = hw->phy.ops.read_reg(hw, addr, devad, &value);
@@ -8804,6 +8813,15 @@ static int ixgbe_mdio_write(struct net_device *netdev, int prtad, int devad,
struct ixgbe_adapter *adapter = netdev_priv(netdev);
struct ixgbe_hw *hw = &adapter->hw;
 
+   if (adapter->mii_bus) {
+   int regnum = addr;
+
+   if (devad != MDIO_DEVAD_NONE)
+   regnum |= (devad << 16) | MII_ADDR_C45;
+
+   return mdiobus_write(adapter->mii_bus, prtad, regnum, value);
+   }
+
if (prtad != hw->phy.mdio.prtad)
return -EINVAL;
return hw->phy.ops.write_reg(hw, addr, devad, value);
-- 
2.17.2


