[PATCH bpf-next] nfp: bpf: xdp_adjust_tail support

2018-08-03 Thread Jakub Kicinski
Add support for adjust_tail.  No FW changes are needed, but add a FW
capability anyway in case there is an issue with previously released
FW, or in case we have to change the ABI in the future.

The helper is trivial and shouldn't be used too often, so just inline
the body of the function.  We add the delta to the locally maintained
packet length register and check for overflow (carry), since adding a
negative value must overflow if the result is not negative.  Note that
if a delta of 0 were allowed by the kernel, this trick would stop
working and we would need one more instruction to compare the lengths
before and after the change.
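
The carry trick described above can be modelled on the host.  This is only
a sketch under stated assumptions, not the NFP code: model_adjust_tail() is
a hypothetical name, the packet length is assumed to be an unsigned 32-bit
value, and the carry flag is emulated by checking whether the sum wrapped.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_HLEN 14	/* Ethernet header length, the minimum the patch accepts */

/*
 * Hypothetical host-side model of the inlined helper.
 * Returns 0 and updates *pkt_len on success, -22 (-EINVAL) otherwise.
 */
int model_adjust_tail(uint32_t *pkt_len, int32_t delta)
{
	uint32_t old_len = *pkt_len;
	uint32_t new_len = old_len + (uint32_t)delta;	/* wraps modulo 2^32 */
	/* For a 32-bit add, the result is smaller than an operand exactly
	 * when a carry out of bit 31 occurred.
	 */
	bool carry = new_len < old_len;

	if (!carry)			/* models the BR_BCC branch */
		return -22;
	if (new_len < ETH_HLEN)		/* models the SUB + BR_BMI check */
		return -22;

	*pkt_len = new_len;
	return 0;
}
```

With a negative delta of magnitude d, the unsigned sum is len + 2^32 - d,
which carries exactly when len >= d.  A delta of 0 never carries, which is
why the trick depends on the kernel rejecting it.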

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/bpf/fw.h   |  1 +
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 47 +++
 drivers/net/ethernet/netronome/nfp/bpf/main.c | 13 +
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  2 +
 .../net/ethernet/netronome/nfp/bpf/verifier.c |  7 +++
 drivers/net/ethernet/netronome/nfp/nfp_asm.h  |  1 +
 6 files changed, 71 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/fw.h b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
index 4c7972e3db63..e4f9b7ec8528 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/fw.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
@@ -51,6 +51,7 @@ enum bpf_cap_tlv_type {
NFP_BPF_CAP_TYPE_MAPS   = 3,
NFP_BPF_CAP_TYPE_RANDOM = 4,
NFP_BPF_CAP_TYPE_QUEUE_SELECT   = 5,
+   NFP_BPF_CAP_TYPE_ADJUST_TAIL= 6,
 };
 
 struct nfp_bpf_cap_tlv_func {
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 3c22d27de9da..eff57f7d056a 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1642,6 +1642,51 @@ static int adjust_head(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
return 0;
 }
 
+static int adjust_tail(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	u32 ret_einval, end;
+	swreg plen, delta;
+
+	BUILD_BUG_ON(plen_reg(nfp_prog) != reg_b(STATIC_REG_PKT_LEN));
+
+	plen = imm_a(nfp_prog);
+	delta = reg_a(2 * 2);
+
+	ret_einval = nfp_prog_current_offset(nfp_prog) + 9;
+	end = nfp_prog_current_offset(nfp_prog) + 11;
+
+	/* Calculate resulting length */
+	emit_alu(nfp_prog, plen, plen_reg(nfp_prog), ALU_OP_ADD, delta);
+	/* delta == 0 is not allowed by the kernel, add must overflow to make
+	 * length smaller.
+	 */
+	emit_br(nfp_prog, BR_BCC, ret_einval, 0);
+
+	/* if (new_len < 14) then -EINVAL */
+	emit_alu(nfp_prog, reg_none(), plen, ALU_OP_SUB, reg_imm(ETH_HLEN));
+	emit_br(nfp_prog, BR_BMI, ret_einval, 0);
+
+	emit_alu(nfp_prog, plen_reg(nfp_prog),
+		 plen_reg(nfp_prog), ALU_OP_ADD, delta);
+	emit_alu(nfp_prog, pv_len(nfp_prog),
+		 pv_len(nfp_prog), ALU_OP_ADD, delta);
+
+	emit_br(nfp_prog, BR_UNC, end, 2);
+	wrp_immed(nfp_prog, reg_both(0), 0);
+	wrp_immed(nfp_prog, reg_both(1), 0);
+
+	if (!nfp_prog_confirm_current_offset(nfp_prog, ret_einval))
+		return -EINVAL;
+
+	wrp_immed(nfp_prog, reg_both(0), -22);
+	wrp_immed(nfp_prog, reg_both(1), ~0);
+
+	if (!nfp_prog_confirm_current_offset(nfp_prog, end))
+		return -EINVAL;
+
+	return 0;
+}
+
 static int
 map_call_stack_common(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
@@ -3041,6 +3086,8 @@ static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
switch (meta->insn.imm) {
case BPF_FUNC_xdp_adjust_head:
return adjust_head(nfp_prog, meta);
+   case BPF_FUNC_xdp_adjust_tail:
+   return adjust_tail(nfp_prog, meta);
case BPF_FUNC_map_lookup_elem:
case BPF_FUNC_map_update_elem:
case BPF_FUNC_map_delete_elem:
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index cce1d2945a32..970af07f4656 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -334,6 +334,14 @@ nfp_bpf_parse_cap_qsel(struct nfp_app_bpf *bpf, void __iomem *value, u32 length)
return 0;
 }
 
+static int
+nfp_bpf_parse_cap_adjust_tail(struct nfp_app_bpf *bpf, void __iomem *value,
+ u32 length)
+{
+   bpf->adjust_tail = true;
+   return 0;
+}
+
 static int nfp_bpf_parse_capabilities(struct nfp_app *app)
 {
struct nfp_cpp *cpp = app->pf->cpp;
@@ -380,6 +388,11 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
if (nfp_bpf_parse_cap_qsel(app->priv, value, length))
goto err_release_free;
break;
+   case NFP_BPF_CAP_TYPE_ADJUST_TAIL:
+   if 

Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31

2018-08-03 Thread Jakub Kicinski
On Fri, 3 Aug 2018 19:41:50 +0300, Ido Schimmel wrote:
> On Thu, Aug 02, 2018 at 03:53:15PM -0700, Jakub Kicinski wrote:
> > No one is requesting full RED offload here..  if someone sets the
> > parameters you can't support you simply won't offload them.  And ignore
> > the parameters which only make sense in software terms.  Look at the
> > docs for mlxsw:
> > 
> > https://github.com/Mellanox/mlxsw/wiki/Queues-Management#offloading-red
> > 
> > It says "not offloaded" in a number of places.
> >   
> ...
> > It's generally preferable to implement a subset of an existing,
> > well-defined API than to create vendor knobs, hence hardly a misuse.
> 
> Sorry for derailing the discussion, but you mentioned some points that
> have been bothering me for a while.
> 
> I think we didn't do a very good job with buffer management and this is
> exactly why you see some parameters marked as "not offloaded". Take the
> "limit" (queue size) for example. It's configured via devlink-sb, by
> setting a quota on the number of bytes that can be queued for the port
> and TC (queue) that RED manages. See:
> 
> https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service#pool-binding

FWIW I was implementing a very similar thing for the NFP a while back.
devlink-sb to configure per-port limits and RED offload.  I believe we
have some more qdisc offloads but out-of-tree/for appliances.
"Switchdev mode" + qdisc offloads work quite well.  For RED I think
we also don't offload the limit.

> It would have been much better and user friendly to not ignore this
> parameter and have users configure the limit using existing interfaces
> (tc), instead of creating a discrepancy between the software and
> hardware data paths by configuring the hardware directly via devlink-sb.
>
> I believe devlink-sb is mainly the result of Linux's shortcomings in
> this area and our lack of perspective back then. While the qdisc layer
> (Linux's shared buffers) works for end hosts, it requires enhancements
> (mainly on ingress) for switches (physical/virtual) that forward
> packets.

I could definitely agree with you.  But there is another way to look at
this.  Memory in ASICs is fundamentally more precious.  If the problem
was never solved for Linux (placing constraints on the number of
packets in the system by ingress port) maybe it's just not important
for software stacks?  Qdiscs are focused on egress.  Perhaps a better
software equivalent to Shared Buffers would be Jesper's Buffer Pools?

With Buffer Pools the concern that a pre-configured and pinned pool of
DMA-mapped pages will start growing and eat all host's memory is more
real.  To me, that's closer.  If we develop XDP-based fastpaths
with DMA pools shared between devices - that's much more like an ASIC's
SB.

In my view we don't offload the limit not because we configure it via
a different API, but because the limit assumes there is an abundance of
memory and the queue has to be capped.  Limit expresses how much queue
build up is okay, while SB config is strictly a resource quota.  In
practice the quota is always a lot lower than user's desired limit so
we don't even bother with the limit.

> For example, switches (I'm familiar with Mellanox ASICs, but I assume
> the concept is similar in other ASICs) have ingress buffers where
> packets are stored while going through the pipeline. Once out of the
> pipeline you know from which port and queue the packet should egress. In
> case you have both lossless and lossy traffic in your network you
> probably want to classify it into different ingress buffers and mark the
> buffers where the lossless traffic is stored as such, so that PFC frames
> would be emitted above a certain threshold.
> 
> This is currently configured using dcbnl, but it lacks a software model
> which means that packets that are forwarded by the kernel don't get the
> same treatment (e.g., skb priority isn't set). It also means that when
> you want to limit the number of packets that are queued *from* a certain
> port and ingress buffer you resort to tools such as devlink-sb that end
> up colliding with existing tools (tc).

Extending DCB further into the kernel on ingress does not seem
impossible.  Maybe the AVB/industrial folks will tackle that at some
point?

> I was thinking (not too much...) about modelling the above using ingress
> qdiscs. They don't do any queueing, but more of accounting. Once the
> egress qdisc dequeues the packet, you give credit back to the ingress
> qdisc from which the packet came from. I believe that modelling these
> buffers using the qdisc layer is the right abstraction.

Interesting.  My concern would be mapping the packet back to the ingress
port to free the right qdisc credit.  The MM direction, like Buffer Pools,
seems more viable to a layman like me.  But ingress qdiscs sound worth
exploring.

> Would appreciate hearing your thoughts on the above.

Thanks a lot for your response, you've certainly given me things to
think about over the weekend :)  A lot of cool 

Re: KCM - recvmsg() mangles packets?

2018-08-03 Thread Dominique Martinet
Dominique Martinet wrote on Sat, Aug 04, 2018:
> Actually, now I'm looking closer at the timing, it looks specific to the
> connection setup. This send loop works:
>     int i = 1;
>     while (i <= 1000) {
>         int len = (i++ * 1312739ULL) % 31 + 1;
>         my_msg.hdr.len = htonl(len);
>         for (int j = 0; j < len; ) {
>             j += snprintf(my_msg.data + j, len - j,
>                           "%i", i - 1);
>         }
>         my_msg.data[len-1] = '\0';
>         //printf("%d: writing %d\n", i-1, len);
>         len = write(s, &my_msg, sizeof(my_msg.hdr) + len);
>         if (error == -1)
>             err(EXIT_FAILURE, "write");
>         if (i == 2)
>             usleep(1);
>     }
> 
> But removing the usleep(1) after the first packet makes recvmsg()
> "fail": it reads the content of the second packet with the size of the
> first.

I spoke too soon, I can get this to fail on later packets, e.g.
Got 18, expected 31 on 452nd message: 453453453453453453; flags: 80

The content is 453 in a loop so this really is the 453rd packet...

But being slower, e.g. doing that usleep after every single packet, I
could let the loop run until 100k without a hitch.


There really has to be something wrong, I just can't tell what from
looking at the code with my naive eyes.
Maybe we need to lock both the tcp and the kcm sockets?


Thanks,
-- 
Dominique


Re: KCM - recvmsg() mangles packets?

2018-08-03 Thread Dominique Martinet
Tom Herbert wrote on Fri, Aug 03, 2018:
> On Fri, Aug 3, 2018 at 4:20 PM, Dominique Martinet
>  wrote:
> > Tom Herbert wrote on Fri, Aug 03, 2018:
> >> struct my_proto {
> >>struct _hdr {
> >>uint32_t len;
> >> } hdr;
> >> char data[32];
> >> } __attribute__((packed));
> >>
> >> // use htons to use LE header size, since load_half does a first conversion
> >> // from network byte order
> >> const char *bpf_prog_string = " \
> >> ssize_t bpf_prog1(struct __sk_buff *skb) \
> >> { \
> >> return bpf_htons(load_half(skb, 0)) + 4; \
> >> }";
> >
> > (Just to make sure I did fix it to htonl(load_word()) and I can confirm
> > there is no difference)
> 
> You also need to htonl for
> 
> my_msg.hdr.len = (i++ * 1312739ULL) % 31 + 1;

Thanks, but this looks correct to me - I was writing the header in
little endian order here and doing the double-swap dance in the bpf prog
because the protocol I was considering making a KCM implementation for
uses that.

Just to make sure, I rewrote it using network byte order e.g. these
three points and this makes no difference:
---8<--
diff --git a/kcm.c b/kcm.c
index cb48df1..d437226 100644
--- a/kcm.c
+++ b/kcm.c
@@ -36,7 +36,7 @@ struct my_proto {
 const char *bpf_prog_string = "\
 ssize_t bpf_prog1(struct __sk_buff *skb)   \
 {  \
-   return bpf_htons(load_half(skb, 0)) + 4;\
+   return load_word(skb, 0) + 4;   \
 }";
 
 int servsock_init(int port)
@@ -110,13 +110,15 @@ void client(int port)
 
int i = 1;
while(1) {
-   my_msg.hdr.len = (i++ * 1312739ULL) % 31 + 1;
-   for (int j = 0; j < my_msg.hdr.len; ) {
-   j += snprintf(my_msg.data + j, my_msg.hdr.len - j, "%i", i - 1);
+   int len = (i++ * 1312739ULL) % 31 + 1;
+   my_msg.hdr.len = htonl(len);
+   for (int j = 0; j < len; ) {
+   j += snprintf(my_msg.data + j, len - j,
+ "%i", i - 1);
}
-   my_msg.data[my_msg.hdr.len-1] = '\0';
-   //printf("%d: writing %d\n", i-1, my_msg.hdr.len);
-   len = write(s, &my_msg, sizeof(my_msg.hdr) + my_msg.hdr.len);
+   my_msg.data[len-1] = '\0';
+   //printf("%d: writing %d\n", i-1, len);
+   len = write(s, &my_msg, sizeof(my_msg.hdr) + len);
if (error == -1)
err(EXIT_FAILURE, "write");
//usleep(1);
@@ -171,9 +173,10 @@ void process(int kcmfd)
len = recvmsg(kcmfd, &msg, 0);
if (len == -1)
err(EXIT_FAILURE, "recvmsg");
-   if (len != my_msg.hdr.len + 4) {
+   if (len != ntohl(my_msg.hdr.len) + 4) {
printf("Got %d, expected %d on %dth message: %s; flags: %x\n",
-  len - 4, my_msg.hdr.len, i, my_msg.data, msg.msg_flags);
+  len - 4, ntohl(my_msg.hdr.len), i,
+  my_msg.data, msg.msg_flags);
exit(1);
}
i++;
8<---

Frankly I do not believe this is a rule problem, as if the length
splitting was incorrect the program would not work at all, but just
uncommenting the usleep on the sender side makes this work.
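
For reference, the framing that both ends agree on after the diff above can
be sketched in plain user-space C.  This is an illustration only:
frame_build() and frame_len() are hypothetical helper names, and
frame_len() is merely a user-space analogue of what the BPF program
computes with load_word(skb, 0) + 4, not the KCM code itself.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct my_proto {
	struct {
		uint32_t len;	/* payload length, network byte order on the wire */
	} hdr;
	char data[32];
} __attribute__((packed));

/* Fill msg; return the number of bytes to write (header + payload),
 * or 0 if the payload does not fit.
 */
size_t frame_build(struct my_proto *msg, const void *payload, uint32_t len)
{
	if (len > sizeof(msg->data))
		return 0;
	msg->hdr.len = htonl(len);
	memcpy(msg->data, payload, len);
	return sizeof(msg->hdr) + len;
}

/* Full message length as the parser should report it, or 0 if fewer
 * than 4 header bytes are available.
 */
size_t frame_len(const void *buf, size_t avail)
{
	uint32_t be_len;

	if (avail < sizeof(be_len))
		return 0;
	memcpy(&be_len, buf, sizeof(be_len));
	return (size_t)ntohl(be_len) + sizeof(be_len);
}
```

If the sender and the parser disagreed on this layout, every message would
be split incorrectly regardless of timing, which is the point made above.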

Actually, now I'm looking closer at the timing, it looks specific to the
connection setup. This send loop works:
    int i = 1;
    while (i <= 1000) {
        int len = (i++ * 1312739ULL) % 31 + 1;
        my_msg.hdr.len = htonl(len);
        for (int j = 0; j < len; ) {
            j += snprintf(my_msg.data + j, len - j,
                          "%i", i - 1);
        }
        my_msg.data[len-1] = '\0';
        //printf("%d: writing %d\n", i-1, len);
        len = write(s, &my_msg, sizeof(my_msg.hdr) + len);
        if (error == -1)
            err(EXIT_FAILURE, "write");
        if (i == 2)
            usleep(1);
    }

But removing the usleep(1) after the first packet makes recvmsg()
"fail": it reads the content of the second packet with the size of the
first.


I assume that usleep gives the server time to finish setting up the kcm
socket, because it does accept(); ioctl(SIOCKCMATTACH); recvmsg(); but
the client does not wait to send packets so there could be some sort of
race with the attach and multiple packets?


FWIW I took the time to look at older kernels and this has been happening
ever since KCM was introduced in 4.6.


Thanks,
-- 
Dominique


[stmmac][bug?] endianness of Flexible RX Parser code

2018-08-03 Thread Al Viro
The values passed in struct tc_u32_sel ->mask and ->val are
32bit net-endian.  Your tc_fill_entry() does this:

	data = sel->keys[0].val;
	mask = sel->keys[0].mask;

	...
	entry->frag_ptr = frag;
	entry->val.match_en = (mask << (rem * 8)) &
		GENMASK(31, rem * 8);
	entry->val.match_data = (data << (rem * 8)) &
		GENMASK(31, rem * 8);
	entry->val.frame_offset = real_off;
	entry->prio = prio;

	frag->val.match_en = (mask >> (rem * 8)) &
		GENMASK(rem * 8 - 1, 0);
	frag->val.match_data = (data >> (rem * 8)) &
		GENMASK(rem * 8 - 1, 0);
	frag->val.frame_offset = real_off + 1;
	frag->prio = prio;
	frag->is_frag = true;

and that looks very odd.  rem here is offset modulo 4.  Suppose offset is
equal to 5, val contains {V0, V1, V2, V3} and mask - {M0, M1, M2, M3}.
Then on little-endian host we get
	entry->val.match_en:     {0, M0, M1, M2}
	entry->val.match_data:   {0, V0, V1, V2}
	entry->val.frame_offset = 1;
	frag->val.match_en:      {M3, 0, 0, 0}
	frag->val.match_data:    {V3, 0, 0, 0}
	frag->val.frame_offset = 2;
and on big-endian
	entry->val.match_en:     {M1, M2, M3, 0}
	entry->val.match_data:   {V1, V2, V3, 0}
	entry->val.frame_offset = 1;
	frag->val.match_en:      {0, 0, 0, M0}
	frag->val.match_data:    {0, 0, 0, V0}
	frag->val.frame_offset = 2;

Little-endian variant looks like we mask octets 5, 6, 7 and 8 with
M0..M3 resp. and want V0..V3 in those.  On big-endian, though, we
look at the octets 11, 4, 5 and 6 instead.

I don't know the hardware (and it might be pulling any kind of weird
endianness-dependent stunts), but that really smells like a bug.
It looks like that code is trying to do something like

	data = ntohl(sel->keys[0].val);
	mask = ntohl(sel->keys[0].mask);
	shift = rem * 8;

	entry->val.match_en = htonl(mask >> shift);
	entry->val.match_data = htonl(data >> shift);
	entry->val.frame_offset = real_off;
	...
	frag->val.match_en = htonl(mask << (32 - shift));
	frag->val.match_data = htonl(data << (32 - shift));
	frag->val.frame_offset = real_off + 1;
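
To check that this form is independent of host endianness, here is a small
user-space sketch of the same arithmetic.  split_match() is a hypothetical
helper, not driver code, and whether the hardware really expects this octet
layout is an assumption.  It adds a guard for rem == 0, where a shift by 32
would be undefined in C.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Split a 4-octet match value that starts `rem` octets (0..3) into a
 * 32-bit word across that word ("entry") and the following one ("frag").
 * Inputs and outputs are raw octets in frame order.
 */
void split_match(const uint8_t in[4], unsigned int rem,
		 uint8_t entry[4], uint8_t frag[4])
{
	uint32_t net, host, e, f;
	unsigned int shift = (rem % 4) * 8;

	memcpy(&net, in, sizeof(net));
	host = ntohl(net);	/* octet 0 is now the most significant byte */

	e = htonl(host >> shift);
	/* Shifting a 32-bit value by 32 is undefined, so treat rem == 0 apart. */
	f = shift ? htonl(host << (32 - shift)) : 0;

	memcpy(entry, &e, sizeof(e));
	memcpy(frag, &f, sizeof(f));
}
```

For rem == 1 and input {V0, V1, V2, V3} this yields {0, V0, V1, V2} and
{V3, 0, 0, 0} on any host, matching the little-endian layout listed earlier.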

Comments?


Re: KCM - recvmsg() mangles packets?

2018-08-03 Thread Tom Herbert
On Fri, Aug 3, 2018 at 4:20 PM, Dominique Martinet
 wrote:
> Tom Herbert wrote on Fri, Aug 03, 2018:
>> struct my_proto {
>>struct _hdr {
>>uint32_t len;
>> } hdr;
>> char data[32];
>> } __attribute__((packed));
>>
>> // use htons to use LE header size, since load_half does a first conversion
>> // from network byte order
>> const char *bpf_prog_string = " \
>> ssize_t bpf_prog1(struct __sk_buff *skb) \
>> { \
>> return bpf_htons(load_half(skb, 0)) + 4; \
>> }";
>>
>> The length in hdr is uint32_t above, but this looks like it's being
>> read as a short.
>
> Err, I agree this is obviously wrong here (I can blame my lack of
> attention to this and the example I used), but this isn't the problem as
> the actual size is between 0 and 32 -- I could use any size I want here
> and the result would be the same.
>
> A "real" problem with the conversion program would mean that my example
> would not work if I slow it down, but I can send as many packets as I
> want if I uncomment the usleep() on the client side or if I just
> throttle the network stack with a loud tcpdump writing to stdout -- that
> means the algorithm is working even if it's making some badly-sized
> conversions.
>
> (Just to make sure I did fix it to htonl(load_word()) and I can confirm
> there is no difference)
>

You also need to htonl for

my_msg.hdr.len = (i++ * 1312739ULL) % 31 + 1;


>
> Thanks,
> --
> Dominique Martinet


Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-08-03 Thread Ben Pfaff
On Sat, Aug 04, 2018 at 02:43:24AM +0200, Stefano Brivio wrote:
> On Fri, 3 Aug 2018 16:01:08 -0700
> Ben Pfaff  wrote:
> > I would be very pleased if we could integrate a simple mechanism for
> > fairness, based for now on some simple criteria like the source port,
> > but thinking ahead to how we could later make it gracefully extensible
> > to consider more general and possibly customizable criteria.
> 
> We could change the patch so that instead of just using the vport for
> round-robin queue insertion, we generalise that and use "buckets"
> instead of vports, and have a set of possible functions that are called
> instead of using port_no directly in ovs_dp_upcall_queue_roundrobin(),
> making this configurable via netlink, per datapath.
> 
> We could implement selection based on source port or a hash on the
> source 5-tuple, and the relevant bits of
> ovs_dp_upcall_queue_roundrobin() would look like this:

[...]

> What do you think?

I'd support that.  Thanks.


Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-08-03 Thread Stefano Brivio
On Fri, 3 Aug 2018 16:01:08 -0700
Ben Pfaff  wrote:

> I think that a simple mechanism for fairness is fine.  The direction
> of extensibility that makes me anxious is how to decide what matters
> for fairness.  So far, we've talked about per-vport fairness.  That
> works pretty well for packets coming in from virtual interfaces where
> each vport represents a separate VM.

Yes, right, that's the case where we have significant issues currently.

> It does not work well if the traffic filling your queues all comes
> from a single physical port because some source of traffic is sending
> traffic at a high rate.  In that case, you'll do a lot better if you
> do fairness based on the source 5-tuple. But if you're doing network
> virtualization, then the outer source 5-tuples won't necessarily vary
> much and you'd be better off looking at the VNI and maybe some Geneve
> TLV options and maybe the inner 5-tuple...

Sure, I see what you mean now. That looks entirely doable if we
abstract the round-robin bucket selection out of the current patch.

> I would be very pleased if we could integrate a simple mechanism for
> fairness, based for now on some simple criteria like the source port,
> but thinking ahead to how we could later make it gracefully extensible
> to consider more general and possibly customizable criteria.

We could change the patch so that instead of just using the vport for
round-robin queue insertion, we generalise that and use "buckets"
instead of vports, and have a set of possible functions that are called
instead of using port_no directly in ovs_dp_upcall_queue_roundrobin(),
making this configurable via netlink, per datapath.

We could implement selection based on source port or a hash on the
source 5-tuple, and the relevant bits of
ovs_dp_upcall_queue_roundrobin() would look like this:

static int ovs_dp_upcall_queue_roundrobin(struct datapath *dp,
					  struct dp_upcall_info *upcall)
{

[...]

	list_for_each_entry(pos, head, list) {
		int bucket = dp->rr_select(pos);

		/* Count per-bucket upcalls. */
		if (dp->upcalls.count[bucket] == U8_MAX) {
			err = -ENOSPC;
			goto out_clear;
		}
		dp->upcalls.count[bucket]++;

		if (bucket == upcall->bucket) {
			/* Another upcall for the same bucket: move insertion
			 * point here, keep looking for insertion condition to
			 * be still met further on.
			 */
			find_next = true;
			here = pos;
			continue;
		}

		count = dp->upcalls.count[bucket];
		if (find_next && dp->upcalls.count[bucket] >= count) {
			/* Insertion condition met: no need to look further,
			 * unless another upcall for the same port occurs later.
			 */
			find_next = false;
			here = pos;
		}
	}

[...]

}

and implementations for dp->rr_select() would look like:

int rr_select_vport(struct dp_upcall_info *upcall)
{
	return upcall->port_no;
}

int rr_select_srcport(struct dp_upcall_info *upcall)
{
	/* look up source port from upcall->skb... */
}

And we could then easily extend this to use BPF with maps one day.

This is for clarity by the way, but I guess we should avoid indirect
calls in the final implementation. 
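
As a rough user-space sketch of what selection without indirect calls could
look like: everything below is hypothetical (struct upcall_key, enum
rr_mode, RR_BUCKETS and the FNV-1a style hash are illustrative choices, not
part of the patch), with the criterion chosen by a switch on a per-datapath
mode rather than through a function pointer.

```c
#include <stdint.h>

#define RR_BUCKETS 256u	/* hypothetical number of round-robin buckets */

enum rr_mode { RR_VPORT, RR_SRC_5TUPLE };

/* Hypothetical flattened view of the fields a selector might use. */
struct upcall_key {
	uint32_t port_no;
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* Fold one 32-bit value into an FNV-1a style hash, byte by byte. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* Map an upcall to a bucket according to the configured mode. */
unsigned int rr_select(enum rr_mode mode, const struct upcall_key *key)
{
	uint32_t h = 2166136261u;

	switch (mode) {
	case RR_SRC_5TUPLE:
		h = fnv1a_mix(h, key->saddr);
		h = fnv1a_mix(h, key->daddr);
		h = fnv1a_mix(h, ((uint32_t)key->sport << 16) | key->dport);
		h = fnv1a_mix(h, key->proto);
		return h % RR_BUCKETS;
	case RR_VPORT:
	default:
		return key->port_no % RR_BUCKETS;
	}
}
```

The mode would be the per-datapath setting configured over netlink; adding
a criterion means adding an enum value and a case.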

What do you think?

-- 
Stefano


Re: [PATCH net 0/5] tcp: more robust ooo handling

2018-08-03 Thread David Miller
From: David Woodhouse 
Date: Fri, 03 Aug 2018 11:55:37 +0100

> I see the first four in 4.9.116 but not the fifth (adding
> tcp_ooo_try_coalesce()).
> 
> Is that intentional? 

I don't work on the 4.9 -stable backports, so I personally have
no idea.

I submitted for 4.17 and 4.14


Re: [PATCH v2 net-next] af_unix: ensure POLLOUT on remote close() for connected dgram socket

2018-08-03 Thread David Miller
From: Jason Baron 
Date: Fri,  3 Aug 2018 17:24:53 -0400

> Applications use -ECONNREFUSED as returned from write() in order to
> determine that a socket should be closed. However, when using connected
> dgram unix sockets in a poll/write loop, a final POLLOUT event can be
> missed when the remote end closes. Thus, the poll is stuck forever:
> 
>   thread 1 (client)                    thread 2 (server)
> 
>   connect() to server
>   write() returns -EAGAIN
>   unix_dgram_poll()
>    -> unix_recvq_full() is true
>                                        close()
>                                         ->unix_release_sock()
>                                          ->wake_up_interruptible_all()
>   unix_dgram_poll() (due to the
>        wake_up_interruptible_all)
>    -> unix_recvq_full() still is true
>                                          ->free all skbs
> 
> 
> Now thread 1 is stuck and will not receive anymore wakeups. In this
> case, when thread 1 gets the -EAGAIN, it has not queued any skbs
> otherwise the 'free all skbs' step would in fact cause a wakeup and
> a POLLOUT return. So the race here is probably fairly rare because
> it means there are no skbs that thread 1 queued and that thread 1
> schedules before the 'free all skbs' step.
> 
> This issue was reported as a hang when /dev/log is closed.
> 
> The fix is to signal POLLOUT if the socket is marked as SOCK_DEAD, which
> means a subsequent write() will get -ECONNREFUSED.
> 
> Reported-by: Ian Lance Taylor 
> Cc: David Rientjes 
> Cc: Rainer Weikusat 
> Cc: Eric Dumazet 
> Signed-off-by: Jason Baron 
> ---
> v2: use check for SOCK_DEAD, since skb's can be purged in 
> unix_sock_destructor()

Applied, thanks Jason.
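
For context, the kind of poll/write loop the commit message refers to might
look like the sketch below.  send_or_detect_close() is a hypothetical name,
and the descriptor is assumed to be a connected, non-blocking socket.  The
loop relies on poll() eventually reporting POLLOUT after the peer closes, so
that the retried write() can fail with ECONNREFUSED.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to send one datagram.  Returns 0 on success or a negative errno;
 * -ECONNREFUSED tells the caller that the peer is gone and the socket
 * should be closed.
 */
int send_or_detect_close(int fd, const void *buf, size_t len, int timeout_ms)
{
	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLOUT };
		ssize_t n = write(fd, buf, len);
		int ready;

		if (n >= 0)
			return 0;
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -errno;	/* includes -ECONNREFUSED */

		/* Receive queue full: wait until the socket is writable.
		 * Without a POLLOUT wakeup on remote close, this is where
		 * the caller would get stuck.
		 */
		ready = poll(&pfd, 1, timeout_ms);
		if (ready < 0 && errno != EINTR)
			return -errno;
		if (ready == 0)
			return -ETIMEDOUT;
	}
}
```

The timeout is only a safety net in this sketch; with an infinite timeout
(-1) the loop depends entirely on the wakeup described in the patch.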


Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-08-03 Thread Ben Pfaff
On Fri, Aug 03, 2018 at 06:52:41PM +0200, Stefano Brivio wrote:
> On Tue, 31 Jul 2018 15:06:57 -0700 Ben Pfaff  wrote:
> > My current thought is that any fairness scheme we implement directly in
> > the kernel is going to need to evolve over time.  Maybe we could do
> > something flexible with BPF and maps, instead of hard-coding it.
> 
> Honestly, I fail to see what else we might want to do here, other than
> adding a simple mechanism for fairness, to solve the specific issue at
> hand. Flexibility would probably come at a higher cost. We could easily
> make limits configurable if needed. Do you have anything else in mind?

I think that a simple mechanism for fairness is fine.  The direction of
extensibility that makes me anxious is how to decide what matters for
fairness.  So far, we've talked about per-vport fairness.  That works
pretty well for packets coming in from virtual interfaces where each
vport represents a separate VM.  It does not work well if the traffic
filling your queues all comes from a single physical port because some
source of traffic is sending traffic at a high rate.  In that case,
you'll do a lot better if you do fairness based on the source 5-tuple.
But if you're doing network virtualization, then the outer source
5-tuples won't necessarily vary much and you'd be better off looking at
the VNI and maybe some Geneve TLV options and maybe the inner 5-tuple...

I would be very pleased if we could integrate a simple mechanism for
fairness, based for now on some simple criteria like the source port,
but thinking ahead to how we could later make it gracefully extensible
to consider more general and possibly customizable criteria.

Thanks,

Ben.


Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()

2018-08-03 Thread Marcelo Ricardo Leitner
On Fri, Aug 03, 2018 at 07:21:00PM +0300, Konstantin Khorenko wrote:
...
> Performance results:
> 
>   * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
>   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
>   RAM: 32 Gb
> 
>   * netperf: taken from https://github.com/HewlettPackard/netperf.git,
>compiled from sources with sctp support
>   * netperf server and client are run on the same node
>   * ip link set lo mtu 1500
> 
> The script used to run tests:
>  # cat run_tests.sh
>  #!/bin/bash
> 
> for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
>   echo "TEST: $test";
>   for i in `seq 1 3`; do
> echo "Iteration: $i";
> set -x
> netperf -t $test -H localhost -p 2 -S 20,20 -s 20,20 \
> -l 60 -- -m 1452;
> set +x
>   done
> done
> 
> 
> Results (a bit reformatted to be more readable):
...

Nice, good numbers.

I'm missing a test that actually uses more than 1 stream. All tests
in netperf use only 1 stream. They can use 1 or Many associations on
a socket, but not multiple streams. That means the numbers here show
that we shouldn't see any regression on the more traditional uses, per
Michael's reply on the other email, but it is not testing how it will
behave if we go crazy and use the 64k streams (worst case).

You'll need some other tool to test it. One idea is sctp_test, from
lksctp-tools. Something like:

Server side: 
./sctp_test -H 172.0.0.1 -P 2 -l -d 0
Client side: 
time ./sctp_test -H 172.0.0.1 -P 1 \
-h 172.0.0.1 -p 2 -s \
-c 1 -M 65535 -T -t 1 -x 10 -d 0

And then measure the difference on how long each test took. Can you
get these too?

Interesting that on my laptop just starting this test for the first
time can take some *seconds*. Seems the kernel had a hard time
defragmenting the memory here. :)

Thanks,
Marcelo


Re: KCM - recvmsg() mangles packets?

2018-08-03 Thread Dominique Martinet
Tom Herbert wrote on Fri, Aug 03, 2018:
> struct my_proto {
>struct _hdr {
>uint32_t len;
> } hdr;
> char data[32];
> } __attribute__((packed));
> 
> // use htons to use LE header size, since load_half does a first conversion
> // from network byte order
> const char *bpf_prog_string = " \
> ssize_t bpf_prog1(struct __sk_buff *skb) \
> { \
> return bpf_htons(load_half(skb, 0)) + 4; \
> }";
> 
> The length in hdr is uint32_t above, but this looks like it's being
> read as a short.

Err, I agree this is obviously wrong here (I can blame my lack of
attention to this and the example I used), but this isn't the problem as
the actual size is between 0 and 32 -- I could use any size I want here
and the result would be the same.

A "real" problem with the conversion program would mean that my example
would not work if I slow it down, but I can send as many packets as I
want if I uncomment the usleep() on the client side or if I just
throttle the network stack with a loud tcpdump writing to stdout -- that
means the algorithm is working even if it's making some badly-sized
conversions.

(Just to make sure I did fix it to htonl(load_word()) and I can confirm
there is no difference)


Thanks,
-- 
Dominique Martinet


[linux-next:master 11347/11797] include/linux/compiler.h:61:17: warning: 'ib_query_gid' is deprecated

2018-08-03 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   116b181bb646afedd770985de20a68721bdb2648
commit: 9f4c2b1ceca8021c68c23e655b275c693367a48c [11347/11797] next-20180802/net-next
config: x86_64-randconfig-s1-08040602 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
git checkout 9f4c2b1ceca8021c68c23e655b275c693367a48c
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   net/smc/smc_ib.c: In function 'smc_ib_fill_mac':
   net/smc/smc_ib.c:152:2: warning: 'ib_query_gid' is deprecated [-Wdeprecated-declarations]
 rc = ib_query_gid(smcibdev->ibdev, ibport, 0, , );
 ^~
   In file included from net/smc/smc_ib.c:19:0:
   include/rdma/ib_cache.h:139:32: note: declared here
static inline __deprecated int ib_query_gid(struct ib_device *device,
   ^~~~
   net/smc/smc_ib.c: In function 'smc_ib_determine_gid':
   net/smc/smc_ib.c:190:3: warning: 'ib_query_gid' is deprecated [-Wdeprecated-declarations]
  if (ib_query_gid(smcibdev->ibdev, ibport, i, &_gid, ))
  ^~
   In file included from net/smc/smc_ib.c:19:0:
   include/rdma/ib_cache.h:139:32: note: declared here
static inline __deprecated int ib_query_gid(struct ib_device *device,
   ^~~~
   net/smc/smc_ib.c:190:3: warning: 'ib_query_gid' is deprecated [-Wdeprecated-declarations]
  if (ib_query_gid(smcibdev->ibdev, ibport, i, &_gid, ))
  ^~
   In file included from net/smc/smc_ib.c:19:0:
   include/rdma/ib_cache.h:139:32: note: declared here
static inline __deprecated int ib_query_gid(struct ib_device *device,
   ^~~~
   In file included from include/linux/kernel.h:10:0,
from include/linux/list.h:9,
from include/linux/random.h:10,
from net/smc/smc_ib.c:15:
>> include/linux/compiler.h:61:17: warning: 'ib_query_gid' is deprecated [-Wdeprecated-declarations]
  static struct ftrace_branch_data   \
^
   include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
>> net/smc/smc_ib.c:190:3: note: in expansion of macro 'if'
  if (ib_query_gid(smcibdev->ibdev, ibport, i, &_gid, ))
  ^~
   In file included from net/smc/smc_ib.c:19:0:
   include/rdma/ib_cache.h:139:32: note: declared here
static inline __deprecated int ib_query_gid(struct ib_device *device,
   ^~~~
--
   net//smc/smc_ib.c: In function 'smc_ib_fill_mac':
   net//smc/smc_ib.c:152:2: warning: 'ib_query_gid' is deprecated 
[-Wdeprecated-declarations]
 rc = ib_query_gid(smcibdev->ibdev, ibport, 0, , );
 ^~
   In file included from net//smc/smc_ib.c:19:0:
   include/rdma/ib_cache.h:139:32: note: declared here
static inline __deprecated int ib_query_gid(struct ib_device *device,
   ^~~~
   net//smc/smc_ib.c: In function 'smc_ib_determine_gid':
   net//smc/smc_ib.c:190:3: warning: 'ib_query_gid' is deprecated 
[-Wdeprecated-declarations]
  if (ib_query_gid(smcibdev->ibdev, ibport, i, &_gid, ))
  ^~
   In file included from net//smc/smc_ib.c:19:0:
   include/rdma/ib_cache.h:139:32: note: declared here
static inline __deprecated int ib_query_gid(struct ib_device *device,
   ^~~~
   net//smc/smc_ib.c:190:3: warning: 'ib_query_gid' is deprecated 
[-Wdeprecated-declarations]
  if (ib_query_gid(smcibdev->ibdev, ibport, i, &_gid, ))
  ^~
   In file included from net//smc/smc_ib.c:19:0:
   include/rdma/ib_cache.h:139:32: note: declared here
static inline __deprecated int ib_query_gid(struct ib_device *device,
   ^~~~
   In file included from include/linux/kernel.h:10:0,
from include/linux/list.h:9,
from include/linux/random.h:10,
from net//smc/smc_ib.c:15:
>> include/linux/compiler.h:61:17: warning: 'ib_query_gid' is deprecated 
>> [-Wdeprecated-declarations]
  static struct ftrace_branch_data   \
^
   include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   net//smc/smc_ib.c:190:3: note: in expansion of macro 'if'
  if (ib_query_gid(smcibdev->ibdev, ibport, i, &_gid, ))
  ^~
   In file included from net//smc/smc_ib.c:19:0:
   include/rdma/ib_cache.h:139:32: note: declared here
static inline __deprecated int ib_query_gid(struct ib_device *device,
   ^~~~

vim +/ib_query_gid +61 include/linux/compiler.h

2bcd521a Steven Rostedt 2008-11-21  50  
2bcd521a Steven 

Re: KCM - recvmsg() mangles packets?

2018-08-03 Thread Tom Herbert
On Fri, Aug 3, 2018 at 11:28 AM, Dominique Martinet
 wrote:
> I've been playing with KCM on a 4.18.0-rc7 kernel and I'm running into a
> problem where the iovec filled by recvmsg() is mangled up: it is filled
> by the length of one packet, but contains (truncated) data from another
> packet, rendering KCM unusable.
>
> (I haven't tried old kernels to see for how long this is broken/try to
> bisect; I might if there's no progress but this might be simpler than I
> think)
>
>
> I've attached a reproducer, a simple program that forks, creates a tcp
> server/client, attach the server socket to a kcm socket, and in an
> infinite loop sends varying-length messages from the client to the
> server.
> The loop stops when the server gets a message which length is not the
> length indicated in the packet header, rather fast (I can make it run
> for a while if I slow down emission, or if I run a verbose tcpdump for
> example)
>
From the reproducer:

struct my_proto {
    struct _hdr {
        uint32_t len;
    } hdr;
    char data[32];
} __attribute__((packed));

// use htons to use LE header size, since load_half does a first conversion
// from network byte order
const char *bpf_prog_string = " \
ssize_t bpf_prog1(struct __sk_buff *skb) \
{ \
    return bpf_htons(load_half(skb, 0)) + 4; \
}";

The length in hdr is uint32_t above, but this looks like it's being
read as a short.

Tom
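
A small userspace sketch of the mismatch pointed out above, assuming the
client stores hdr.len as a little-endian uint32_t. load_half_be(),
swap16(), store_le32() and parsed_len() are hypothetical stand-ins written
for this illustration; they are not the real load_half()/bpf_htons()
helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for load_half(): big-endian 16-bit load at 'off'. */
static uint16_t load_half_be(const unsigned char *p, size_t off)
{
	return (uint16_t)((p[off] << 8) | p[off + 1]);
}

/* Unconditional 16-bit byte swap (what htons does on a little-endian host). */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Assumed sender layout: 32-bit length stored little-endian. */
static void store_le32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Message length as the quoted parser computes it: only 2 of 4 bytes read. */
static uint32_t parsed_len(const unsigned char *hdr)
{
	return (uint32_t)swap16(load_half_be(hdr, 0)) + 4;
}
```

Under these assumptions a length of 27 parses as 31 (27 plus the 4-byte
header), while 0x12345 parses as 0x2349 because the upper two header bytes
are never read. So the 16-bit read only matters once lengths exceed 65535.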

> In the quiet version on a VM on my laptop, I get this output:
> [root@f2 ~]# gcc -g -l bcc -o kcm kcm.c
> [root@f2 ~]# ./kcm
> client is starting
> server is starting
> server is receiving data
> Got 14, expected 27 on 1th message: 22; flags: 80
>
> The client sends messages deterministically: the first one is 14 bytes filled
> with 1, the second one is 27 bytes filled with 2, the third one is 9 bytes
> filled with 3, etc. (the final digit is actually a \0 instead)
>
> As we can see, the server received 14 '2', and the header size matches
> the second message header, so something went wrong™.
> Flags 0x80 is MSG_EOR meaning recvmsg copied the full message.
>
>
>
> This happens even if I reduce the VM's CPU count to 1, so I was thinking some
> irq messes with the sock between skb_peek and the actual copy of the
> data (as this does work if I send slowly!), but even disabling
> irq/preempt doesn't seem to help so I'm not sure what to try next.
>
> Any idea?
>
>
> Thanks,
> --
> Dominique Martinet


[PATCH v2 net-next] af_unix: ensure POLLOUT on remote close() for connected dgram socket

2018-08-03 Thread Jason Baron
Applications use -ECONNREFUSED as returned from write() in order to
determine that a socket should be closed. However, when using connected
dgram unix sockets in a poll/write loop, a final POLLOUT event can be
missed when the remote end closes. Thus, the poll is stuck forever:

  thread 1 (client)                 thread 2 (server)

connect() to server
write() returns -EAGAIN
unix_dgram_poll()
 -> unix_recvq_full() is true
                                    close()
                                     ->unix_release_sock()
                                      ->wake_up_interruptible_all()
unix_dgram_poll() (due to the
     wake_up_interruptible_all)
 -> unix_recvq_full() still is true
                                      ->free all skbs

Now thread 1 is stuck and will not receive anymore wakeups. In this
case, when thread 1 gets the -EAGAIN, it has not queued any skbs
otherwise the 'free all skbs' step would in fact cause a wakeup and
a POLLOUT return. So the race here is probably fairly rare because
it means there are no skbs that thread 1 queued and that thread 1
schedules before the 'free all skbs' step.

This issue was reported as a hang when /dev/log is closed.

The fix is to signal POLLOUT if the socket is marked as SOCK_DEAD, which
means a subsequent write() will get -ECONNREFUSED.

Reported-by: Ian Lance Taylor 
Cc: David Rientjes 
Cc: Rainer Weikusat 
Cc: Eric Dumazet 
Signed-off-by: Jason Baron 
---
v2: use check for SOCK_DEAD, since skb's can be purged in unix_sock_destructor()
---
 net/unix/af_unix.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 1772a0e..d1edfa3 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -430,7 +430,12 @@ static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
 
 	connected = unix_dgram_peer_wake_connect(sk, other);
 
-	if (unix_recvq_full(other))
+	/* If other is SOCK_DEAD, we want to make sure we signal
+	 * POLLOUT, such that a subsequent write() can get a
+	 * -ECONNREFUSED. Otherwise, if we haven't queued any skbs
+	 * to other and its full, we will hang waiting for POLLOUT.
+	 */
+	if (unix_recvq_full(other) && !sock_flag(other, SOCK_DEAD))
 		return 1;
 
 	if (connected)
-- 
1.9.1
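
The condition changed by this patch can be modelled as a two-input
predicate. The sketch below is a simplified userspace model, not kernel
code: waits_before_patch() and waits_after_patch() are hypothetical names,
and "waits" means the poller is left sleeping without POLLOUT.

```c
#include <stdbool.h>

/* Old check: a full peer receive queue always keeps the poller waiting. */
static bool waits_before_patch(bool recvq_full, bool peer_dead)
{
	(void)peer_dead;	/* peer state was not consulted */
	return recvq_full;
}

/* New check: a dead peer never keeps the poller waiting. */
static bool waits_after_patch(bool recvq_full, bool peer_dead)
{
	return recvq_full && !peer_dead;
}
```

The only input where the two differ is a full queue with a dead peer,
which is the state thread 1 is stuck in above.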



Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()

2018-08-03 Thread Michael Tuexen



> On 3. Aug 2018, at 22:30, Marcelo Ricardo Leitner  
> wrote:
> 
> On Fri, Aug 03, 2018 at 04:43:28PM +, David Laight wrote:
>> From: Konstantin Khorenko
>>> Sent: 03 August 2018 17:21
>>> 
>>> Each SCTP association can have up to 65535 input and output streams.
>>> For each stream type an array of sctp_stream_in or sctp_stream_out
>>> structures is allocated using kmalloc_array() function. This function
>>> allocates physically contiguous memory regions, so this can lead
>>> to allocation of memory regions of very high order, i.e.:
>> ...
>> 
>> Given how useless SCTP streams are, does anything actually use
>> more than about 4?
> 
> Maybe Michael can help us with that. I'm also curious now.
In the context of SIGTRAN I have seen 17 streams...

In the context of WebRTC I have seen more streams. In general,
the streams concept seems to be useful. QUIC has lots of streams.

So I'm wondering why they are considered useless.
David, can you elaborate on this?

Best regards
Michael
> 
>  Marcelo



Re: [PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams

2018-08-03 Thread Marcelo Ricardo Leitner
On Fri, Aug 03, 2018 at 07:21:01PM +0300, Konstantin Khorenko wrote:
> This patch introduces wrappers for accessing in/out streams indirectly.
> This will enable to replace physically contiguous memory arrays
> of streams with flexible arrays (or maybe any other appropriate
> mechanism) which do memory allocation on a per-page basis.
> 
> Signed-off-by: Oleg Babin 
> Signed-off-by: Konstantin Khorenko 
> 
> ---
> v2 changes:
>  sctp_stream_in() users are updated to provide stream as an argument,
>  sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().
> ---

...

>  
>  struct sctp_stream {
> - struct sctp_stream_out *out;
> - struct sctp_stream_in *in;
> + struct flex_array *out;
> + struct flex_array *in;

If this patch was meant to be a preparation, shouldn't this belong to
the next patch instead?

  Marcelo


Re: [PATCH net] dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()

2018-08-03 Thread David Miller
From: Alexey Kodanev 
Date: Thu,  2 Aug 2018 19:22:05 +0300

> Make sure that the value of "(now - hc->tx_lsndtime) / hc->tx_rto" is
> properly limited when shifting 'u32 cwnd' with it, otherwise we can get:
 ...
> Fixes: 113ced1f52e5 ("dccp ccid-2: Perform congestion-window validation")
> Signed-off-by: Alexey Kodanev 
 ...
> @@ -234,7 +234,7 @@ static void ccid2_cwnd_restart(struct sock *sk, const u32 
> now)
>  
>   /* don't reduce cwnd below the initial window (IW) */
>   restart_cwnd = min(cwnd, iwnd);
> - cwnd >>= (now - hc->tx_lsndtime) / hc->tx_rto;
> + cwnd >>= min((now - hc->tx_lsndtime) / hc->tx_rto, 31U);
>   hc->tx_cwnd = max(cwnd, restart_cwnd);
>  
>   hc->tx_cwnd_stamp = now;

Better to mimic the TCP cwnd validation code, something like:

	s32 delta = now - hc->tx_lsndtime;
	while ((delta -= hc->tx_rto) > 0 && cwnd > restart_cwnd)
		cwnd >>= 1;

Thanks.
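
For comparison, here is a userspace sketch of the two variants discussed
above: the clamped shift from the patch and the halving loop from the
reply. The function names and the flat parameter list are made up for this
illustration. It assumes rto > 0 and that the idle time fits in an int32_t.

```c
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Variant from the patch: shift once, with the shift count capped at 31. */
static uint32_t restart_clamped_shift(uint32_t cwnd, uint32_t iwnd,
				      uint32_t now, uint32_t lsndtime,
				      uint32_t rto)
{
	uint32_t restart_cwnd = min_u32(cwnd, iwnd);

	cwnd >>= min_u32((now - lsndtime) / rto, 31U);
	return max_u32(cwnd, restart_cwnd);
}

/* Variant from the reply: halve once per elapsed RTO, stop at the floor. */
static uint32_t restart_halving_loop(uint32_t cwnd, uint32_t iwnd,
				     uint32_t now, uint32_t lsndtime,
				     uint32_t rto)
{
	uint32_t restart_cwnd = min_u32(cwnd, iwnd);
	int32_t delta = (int32_t)(now - lsndtime);

	while ((delta -= (int32_t)rto) > 0 && cwnd > restart_cwnd)
		cwnd >>= 1;
	return max_u32(cwnd, restart_cwnd);
}
```

The loop never needs a shift count at all, so the undefined shift cannot
occur. Note that the two are not identical at exact multiples of the RTO:
with cwnd 64, iwnd 4 and rto 100, an idle time of 200 gives 16 with the
clamped shift and 32 with the loop.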


Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()

2018-08-03 Thread Marcelo Ricardo Leitner
On Fri, Aug 03, 2018 at 04:43:28PM +, David Laight wrote:
> From: Konstantin Khorenko
> > Sent: 03 August 2018 17:21
> > 
> > Each SCTP association can have up to 65535 input and output streams.
> > For each stream type an array of sctp_stream_in or sctp_stream_out
> > structures is allocated using kmalloc_array() function. This function
> > allocates physically contiguous memory regions, so this can lead
> > to allocation of memory regions of very high order, i.e.:
> ...
> 
> Given how useless SCTP streams are, does anything actually use
> more than about 4?

Maybe Michael can help us with that. I'm also curious now.

  Marcelo


Re: [PATCH ethtool] ethtool: Add support for WAKE_FILTER

2018-08-03 Thread David Miller
From: Florian Fainelli 
Date: Fri, 3 Aug 2018 12:58:12 -0700

> For instance, in the current HW, you can program 128 filters through
> the switch, but only 8 of those could be wake-up capable at the
> CPU/management (SYSTEM PORT) level.

Yes, I noticed this in the driver patches.

> Let me cook something that does just that and re-post.
> 
> Thanks for your feedback!

No problem.


Re: [PATCH ethtool] ethtool: Add support for WAKE_FILTER

2018-08-03 Thread Florian Fainelli
On 08/03/2018 12:07 PM, David Miller wrote:
> From: Florian Fainelli 
> Date: Fri, 3 Aug 2018 10:57:13 -0700
> 
>> Does the current approach of specifying a bitmask of filters looks
>> reasonable to you though?
> 
> So, in order to answer that, I need some clarification.
> 
> The mask, as I see it, is a bit map of 48 possible positions
> (SOPASS_MAX * bits_per_byte).  How do these bits map to individual
> rxnfc entries?

Correct about the size, it is 48 bits, and each bit does indeed map to a
filter location. So if you programmed a filter at location 1, you would
pass 0x2 as the wake-on filter bitmask, etc.
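
A minimal sketch of that location-to-bit mapping, assuming a 48-bit mask
(6 bytes * 8 bits, as stated earlier in the thread). wake_filter_bit() and
the DEMO_* constants are hypothetical names, not part of the ethtool API.

```c
#include <stdint.h>

#define DEMO_SOPASS_BYTES	6	/* assumed: 48 bits / 8 bits per byte */
#define DEMO_WAKE_MASK_BITS	(DEMO_SOPASS_BYTES * 8)

/* Return the mask bit for a filter location, or 0 if it does not fit. */
static uint64_t wake_filter_bit(unsigned int location)
{
	if (location >= DEMO_WAKE_MASK_BITS)
		return 0;
	return UINT64_C(1) << location;
}
```

Location 1 yields 0x2, matching the example above; locations 48 and up
cannot be represented in the mask at all.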

> 
> Are they locations?  If so, how are special locations handled?
> 
> What about "special" locations, where the driver and/or hardware
> are supposed to decide the location based upon the "special" type
> used?

I would not think they require special handling because the process is
kind of two step right now:

- first you program the desired filter (special location or not) and you
obtain a unique ID back
- second you program the desired filter mask with that ID as a bit
position that must be set

So the special location handling was kind of done by the kernel/driver
on the first filter insertion and you just pass that unique filter ID
around.

The reason why it was done as a two step process was largely because the
DSA switch driver, which is the one supporting the filter programming is
a discrete driver from the SYSTEM PORT driver which supports the
wake-on-filter thing. The two do communicate with one another through
the means of the DSA layer though.

Now that I think about it some more, see below, I prefer your approach
since it eliminates the "passing that ID around" step.

> 
> If you considered the following, and you explained why it won't
> work, I apologize.  But I'm wondering why you just don't find
> some way to specify this as a boolean of the flow spec in the
> rxnfc request or similar?
> 
> That, at least semantically, seems to avoid several issues.  And it
> is unambiguous what flow rule the wake filter boolean applies to.
> 
> Right?

Yes, it would actually remove the need for having to specify a storage
location between user-space and kernel space and we would also be able
to validate ahead of time that we are not overflowing the wake-on-LAN
filter capacity. For instance, in the current HW, you can program 128
filters through the switch, but only 8 of those could be wake-up capable
at the CPU/management (SYSTEM PORT) level.

Let me cook something that does just that and re-post.

Thanks for your feedback!
-- 
Florian


Re: [PATCH v2 net-next 0/3] ip: Use rb trees for IP frag queue

2018-08-03 Thread Peter Oskolkov
On Fri, Aug 3, 2018 at 12:33 PM Josh Hunt  wrote:
>
> On Thu, Aug 2, 2018 at 4:34 PM, Peter Oskolkov  wrote:
>>
>> This patchset
>>  * changes IPv4 defrag behavior to match that of IPv6: overlapping
>>fragments now cause the whole IP datagram to be discarded (suggested
>>by David Miller): there are no legitimate use cases for overlapping
>>fragments;
>>  * changes IPv4 defrag queue from a list to a rb tree (suggested
>>by Eric Dumazet): this change removes a potential attack vector.
>>
>> Upcoming patches will contain similar changes for IPv6 frag queue,
>> as well as a comprehensive IP defrag self-test (temporarily delayed).
>>
>> Peter Oskolkov (3):
>>   ip: discard IPv4 datagrams with overlapping segments.
>>   net: modify skb_rbtree_purge to return the truesize of all purged
>> skbs.
>>   ip: use rb trees for IP frag queue.
>>
>>  include/linux/skbuff.h  |  11 +-
>>  include/net/inet_frag.h |   3 +-
>>  include/uapi/linux/snmp.h   |   1 +
>>  net/core/skbuff.c   |   6 +-
>>  net/ipv4/inet_fragment.c|  16 +-
>>  net/ipv4/ip_fragment.c  | 239 +++-
>>  net/ipv4/proc.c |   1 +
>>  net/ipv6/netfilter/nf_conntrack_reasm.c |   1 +
>>  net/ipv6/reassembly.c   |   1 +
>>  9 files changed, 139 insertions(+), 140 deletions(-)
>>
>> --
>> 2.18.0.597.ga71716f1ad-goog
>>
>
> Peter
>
> I just tested your patches along with Florian's on top of net-next. Things 
> look much better wrt this type of attack. Thanks for doing this. I'm 
> wondering if we want to put an optional mechanism in place to limit the size 
> of the tree in terms of skbs it can hold? Otherwise an attacker can send 
> ~1400 8 byte frags and consume all frag memory (default high thresh is 4M) 
> pretty easily and I believe also evict other frags which may have been 
> pending? I am guessing this is what Florian's min MTU patches are trying to 
> help with.
>
> --
> Josh

Hi Josh,

It will be really easy to limit the size of the queue/tree (e.g. based
on a sysctl parameter). I can send a follow-up patch if there is a
consensus that this behavior is needed/useful.

Thanks,
Peter


Re: [PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams

2018-08-03 Thread David Miller
From: Konstantin Khorenko 
Date: Fri,  3 Aug 2018 19:21:01 +0300

> +struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
> + __u16 sid)
> +{
> + return ((struct sctp_stream_out *)(stream->out)) + sid;
> +}
> +
> +struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
> +   __u16 sid)
> +{
> + return ((struct sctp_stream_in *)(stream->in)) + sid;
> +}

I agree with David that these should be in a header file, and marked
inline.
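
A sketch of what a header-style inline accessor could look like, using
simplified stand-in types. struct demo_stream, struct demo_stream_out and
demo_stream_out() are hypothetical and only mirror the shape of the quoted
wrappers.

```c
#include <stdint.h>

/* Simplified stand-ins for the real SCTP structures. */
struct demo_stream_out {
	uint16_t mid;
};

struct demo_stream {
	void *out;		/* backing storage for the out streams */
	uint16_t outcnt;
};

/* static inline so each user of the header gets a local, inlinable copy. */
static inline struct demo_stream_out *
demo_stream_out(const struct demo_stream *stream, uint16_t sid)
{
	return ((struct demo_stream_out *)stream->out) + sid;
}
```

Callers keep going through the accessor, so the backing storage can later
change without touching them.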


Re: [PATCH net] l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()

2018-08-03 Thread David Miller
From: Guillaume Nault 
Date: Fri, 3 Aug 2018 17:00:11 +0200

> If 'session' is not NULL and is not a PPP pseudo-wire, then we fail to
> drop the reference taken by l2tp_session_get().
> 
> Fixes: ecd012e45ab5 ("l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()")
> Signed-off-by: Guillaume Nault 
> ---
> Sorry for the stupid mistake. I guess I got blinded by the apparent
> simplicity of the bug when I wrote the original patch.

Applied, thanks.

I'm pretty sure I backported the commit this fixes, so I'm queueing
this up for -stable as well.


Re: [Patch net] ipv6: fix double refcount of fib6_metrics

2018-08-03 Thread David Miller
From: Cong Wang 
Date: Thu,  2 Aug 2018 23:20:38 -0700

> All the callers of ip6_rt_copy_init()/rt6_set_from() hold refcnt
> of the "from" fib6_info, so there is no need to hold fib6_metrics
> refcnt again, because fib6_metrics refcnt is only released when
> fib6_info is gone, that is, they have the same life time, so the
> whole fib6_metrics refcnt can be removed actually.
> 
> This fixes a kmemleak warning reported by Sabrina.
> 
> Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
> Reported-by: Sabrina Dubroca 
> Cc: Sabrina Dubroca 
> Cc: David Ahern 
> Signed-off-by: Cong Wang 

Sabrina, please review!


Re: [PATCH net 0/4] mlxsw: Fix ACL actions error condition handling

2018-08-03 Thread David Miller
From: Ido Schimmel 
Date: Fri,  3 Aug 2018 15:57:40 +0300

> Nir says:
> 
> Two issues were lately noticed within mlxsw ACL actions error condition
> handling. The first patch deals with conflicting actions such as:
> 
>  # tc filter add dev swp49 parent : \
>protocol ip pref 10 flower skip_sw dst_ip 192.168.101.1 \
>action goto chain 100 \
>action mirred egress redirect dev swp4
> 
> The second action will never execute, however SW model allows this
> configuration, while the mlxsw driver cannot allow for it as it
> implements actions in sets of up to three actions per set with a single
> termination marking. Conflicting actions create a contradiction over
> this single marking and thus cannot be configured. The fix replaces a
> misplaced warning with an error code to be returned.
> 
> Patches 2-4 fix a condition of duplicate destruction of resources. Some
> actions require allocation of specific resource prior to setting the
> action itself. On error condition this resource was destroyed twice,
> leading to a crash when using mirror action, and to a redundant
> destruction in other cases, since for error condition rule destruction
> also takes care of resource destruction. In order to fix this state a
> symmetry in behavior is added and resource destruction also takes care
> of removing the resource from rule's resource list.

Series applied, and queued up for -stable.

And thanks especially for the merge conflict heads up.


Re: [PATCH v2 net-next 0/3] ip: Use rb trees for IP frag queue

2018-08-03 Thread Josh Hunt
On Thu, Aug 2, 2018 at 4:34 PM, Peter Oskolkov  wrote:

> This patchset
>  * changes IPv4 defrag behavior to match that of IPv6: overlapping
>fragments now cause the whole IP datagram to be discarded (suggested
>by David Miller): there are no legitimate use cases for overlapping
>fragments;
>  * changes IPv4 defrag queue from a list to a rb tree (suggested
>by Eric Dumazet): this change removes a potential attack vector.
>
> Upcoming patches will contain similar changes for IPv6 frag queue,
> as well as a comprehensive IP defrag self-test (temporarily delayed).
>
> Peter Oskolkov (3):
>   ip: discard IPv4 datagrams with overlapping segments.
>   net: modify skb_rbtree_purge to return the truesize of all purged
> skbs.
>   ip: use rb trees for IP frag queue.
>
>  include/linux/skbuff.h  |  11 +-
>  include/net/inet_frag.h |   3 +-
>  include/uapi/linux/snmp.h   |   1 +
>  net/core/skbuff.c   |   6 +-
>  net/ipv4/inet_fragment.c|  16 +-
>  net/ipv4/ip_fragment.c  | 239 +++-
>  net/ipv4/proc.c |   1 +
>  net/ipv6/netfilter/nf_conntrack_reasm.c |   1 +
>  net/ipv6/reassembly.c   |   1 +
>  9 files changed, 139 insertions(+), 140 deletions(-)
>
> --
> 2.18.0.597.ga71716f1ad-goog
>
>
Peter

I just tested your patches along with Florian's on top of net-next. Things
look much better wrt this type of attack. Thanks for doing this. I'm
wondering if we want to put an optional mechanism in place to limit the
size of the tree in terms of skbs it can hold? Otherwise an attacker can
send ~1400 8 byte frags and consume all frag memory (default high thresh is
4M) pretty easily and I believe also evict other frags which may have been
pending? I am guessing this is what Florian's min MTU patches are trying to
help with.

-- 
Josh


[PATCH net-next] net: sched: cls_flower: Fix an error code in fl_tmplt_create()

2018-08-03 Thread Dan Carpenter
We forgot to set the error code on this path, so we return NULL instead
of an error pointer.  In the current code kzalloc() won't fail for small
allocations so this doesn't really affect runtime.

Fixes: b95ec7eb3b4d ("net: sched: cls_flower: implement chain templates")
Signed-off-by: Dan Carpenter 

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e8bd08ba998a..a3b69bb6f4b0 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1250,8 +1250,10 @@ static void *fl_tmplt_create(struct net *net, struct tcf_chain *chain,
 		goto errout_tb;
 
 	tmplt = kzalloc(sizeof(*tmplt), GFP_KERNEL);
-	if (!tmplt)
+	if (!tmplt) {
+		err = -ENOMEM;
 		goto errout_tb;
+	}
 	tmplt->chain = chain;
 	err = fl_set_key(net, tb, &tmplt->dummy_key, &tmplt->mask, extack);
 	if (err)
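
The failure mode described in the commit message can be reproduced in
userspace. The sketch below uses demo_err_ptr(), demo_ptr_err() and
demo_is_err() as simplified stand-ins for the kernel's error-pointer
helpers; create_buggy() and create_fixed() are hypothetical and only mimic
the control flow of the patched function.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer encoding. */
static inline void *demo_err_ptr(intptr_t error) { return (void *)error; }
static inline intptr_t demo_ptr_err(const void *ptr) { return (intptr_t)ptr; }
static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Before the fix: err is still 0 on the allocation-failure path. */
static void *create_buggy(int alloc_fails)
{
	int err = 0;
	void *obj = alloc_fails ? NULL : malloc(16);

	if (!obj)
		goto errout;
	return obj;
errout:
	return demo_err_ptr(err);	/* encodes 0, i.e. NULL */
}

/* After the fix: the error code is set before jumping to the exit path. */
static void *create_fixed(int alloc_fails)
{
	int err = 0;
	void *obj = alloc_fails ? NULL : malloc(16);

	if (!obj) {
		err = -ENOMEM;
		goto errout;
	}
	return obj;
errout:
	return demo_err_ptr(err);
}
```

A caller that only checks demo_is_err() treats the NULL from create_buggy()
as success, which is why the missing assignment matters even if the
allocation rarely fails.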


Re: [PATCH net-next 0/4] net: dsa and systemport WoL changes

2018-08-03 Thread David Miller
From: Florian Fainelli 
Date: Fri,  3 Aug 2018 11:08:40 -0700

> This patch series extracts what was previously submitted as part of the
> "WAKE_FILTER" Wake-on-LAN patch series into patches that do not.
> 
> Changes in this series:
> 
> - properly align the dsa_is_cpu_port() check in first patch

Series applied, thanks for splitting these out into a separate series.


Re: [PATCH ethtool] ethtool: Add support for WAKE_FILTER

2018-08-03 Thread David Miller
From: Florian Fainelli 
Date: Fri, 3 Aug 2018 10:57:13 -0700

> Does the current approach of specifying a bitmask of filters looks
> reasonable to you though?

So, in order to answer that, I need some clarification.

The mask, as I see it, is a bit map of 48 possible positions
(SOPASS_MAX * bits_per_byte).  How do these bits map to individual
rxnfc entries?

Are they locations?  If so, how are special locations handled?

What about "special" locations, where the driver and/or hardware
are supposed to decide the location based upon the "special" type
used?

If you considered the following, and you explained why it won't
work, I apologize.  But I'm wondering why you just don't find
some way to specify this as a boolean of the flow spec in the
rxnfc request or similar?

That, at least semantically, seems to avoid several issues.  And it
is unambiguous what flow rule the wake filter boolean applies to.

Right?



Re: [PATCH v3 net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2

2018-08-03 Thread Florian Fainelli
On 08/03/2018 08:50 AM, Jose Abreu wrote:
> Add the MDIO related functionalities for the new IP block XGMAC2.
> 
> Signed-off-by: Jose Abreu 
> Cc: David S. Miller 
> Cc: Joao Pinto 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> Cc: Andrew Lunn 
> ---

> +static int stmmac_xgmac2_c22_format(struct stmmac_priv *priv, int phyaddr,
> + int phyreg, u32 *hw_addr)
> +{
> + unsigned int mii_data = priv->hw->mii.data;
> + u32 tmp;
> +
> + /* HW does not support C22 addr >= 4 */
> + if (phyaddr >= 4)
> + return -ENODEV;

It would be nice if this could be moved to probe time so you don't have
to wait until you connect to the PHY, read its PHY OUI and find out it
has an MDIO address >= 4. Not a blocker, but something that could be
improved further on.

In principle you could even scan the MDIO bus' device tree node, and find
that out ahead of time.

> + /* Wait until any existing MII operation is complete */
> + if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
> +!(tmp & MII_XGMAC_BUSY), 100, 1))
> + return -EBUSY;
> +
> + /* Set port as Clause 22 */
> + tmp = readl(priv->ioaddr + XGMAC_MDIO_C22P);
> + tmp |= BIT(phyaddr);

Since the registers are being Read/Modify/Write here, don't you need to
clear the previous address bits as well?

You probably did not encounter any problems in your testing if you had
only one PHY on the MDIO bus, but this is not something that is
necessarily true, e.g: if you have an Ethernet switch, several MDIO bus
addresses are going to be responding.

Your MDIO bus implementation must be able to support one transaction
with one PHY address and the next transaction with another PHY address,
etc...

That is something that should be easy to fix and be resubmitted as part
of v4.
-- 
Florian
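
A sketch of the clear-then-set pattern asked about in this review, next to
the set-only version from the patch. DEMO_C22_PORT_MASK and both function
names are hypothetical, and the mask width (one bit per address 0..3) is
an assumption; the thread does not establish what the hardware requires.

```c
#include <stdint.h>

#define DEMO_C22_PORT_MASK 0xfu	/* assumed: one bit per PHY address 0..3 */

/* Set-only, as in the quoted patch: bits from earlier accesses remain. */
static uint32_t c22_set_only(uint32_t reg, unsigned int phyaddr)
{
	return reg | (1u << phyaddr);
}

/* Clear the port field first, then select exactly one address. */
static uint32_t c22_clear_then_set(uint32_t reg, unsigned int phyaddr)
{
	reg &= ~DEMO_C22_PORT_MASK;
	reg |= 1u << phyaddr;
	return reg;
}
```

Starting from a register value of 0x1, selecting address 2 gives 0x5 with
the set-only version and 0x4 with clear-then-set.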


Re: [PATCH v3 net-next 3/9] net: stmmac: Add DMA related callbacks for XGMAC2

2018-08-03 Thread Florian Fainelli
On 08/03/2018 08:50 AM, Jose Abreu wrote:
> Add the DMA related callbacks for the new IP block XGMAC2.
> 
> Signed-off-by: Jose Abreu 
> Cc: David S. Miller 
> Cc: Joao Pinto 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> ---

> + value &= ~XGMAC_RD_OSR_LMT;
> + value |= (axi->axi_rd_osr_lmt << XGMAC_RD_OSR_LMT_SHIFT) &
> + XGMAC_RD_OSR_LMT;
> +
> + for (i = 0; i < AXI_BLEN; i++) {
> + if (axi->axi_blen[i])
> + value &= ~XGMAC_UNDEF;

Shouldn't you be clearing all XGMAC_BLEN* values since you do a
logical OR here? I am assuming this is not something that would likely
change from one open/close to the next, but still?
-- 
Florian


Re: [PATCH v3 net-next 1/9] net: stmmac: Add XGMAC 2.10 HWIF entry

2018-08-03 Thread Florian Fainelli
On 08/03/2018 08:50 AM, Jose Abreu wrote:
> Add a new entry to HWIF table for XGMAC 2.10. For now we fill it with
> empty callbacks which will be added in posterior patches.
> 
> Signed-off-by: Jose Abreu 
> Cc: David S. Miller 
> Cc: Joao Pinto 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> ---
>  drivers/net/ethernet/stmicro/stmmac/common.h | 14 +++--
>  drivers/net/ethernet/stmicro/stmmac/hwif.c   | 31 
> ++--
>  include/linux/stmmac.h   |  1 +
>  3 files changed, 38 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
> b/drivers/net/ethernet/stmicro/stmmac/common.h
> index 78fd0f8b8e81..3fb81acbd274 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/common.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/common.h
> @@ -36,12 +36,14 @@
>  #include "mmc.h"
>  
>  /* Synopsys Core versions */
> -#define  DWMAC_CORE_3_40 0x34
> -#define  DWMAC_CORE_3_50 0x35
> -#define  DWMAC_CORE_4_00 0x40
> -#define DWMAC_CORE_4_10  0x41
> -#define DWMAC_CORE_5_00 0x50
> -#define DWMAC_CORE_5_10 0x51
> +#define  DWMAC_CORE_3_40 0x34
> +#define  DWMAC_CORE_3_50 0x35
> +#define  DWMAC_CORE_4_00 0x40
> +#define DWMAC_CORE_4_10  0x41
> +#define DWMAC_CORE_5_00  0x50
> +#define DWMAC_CORE_5_10  0x51
> +#define DWXGMAC_CORE_2_10	0x21
> +
>  #define STMMAC_CHAN0 0   /* Always supported and default for all chips */
>  
>  /* These need to be power of two, and >= 4 */
> diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c 
> b/drivers/net/ethernet/stmicro/stmmac/hwif.c
> index 1f50e83cafb2..24f5ff175aa4 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
> @@ -72,6 +72,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
>  static const struct stmmac_hwif_entry {
>   bool gmac;
>   bool gmac4;
> + bool xgmac;
>   u32 min_id;
>   const struct stmmac_regs_off regs;
>   const void *desc;
> @@ -87,6 +88,7 @@ static const struct stmmac_hwif_entry {
>   {
>   .gmac = false,
>   .gmac4 = false,
> + .xgmac = false,

In a future clean-up you would likely want to remove this and replace it
with an enumeration, which is less error prone than having to define a boolean
for each of these previous generations only to say "this is not an xgmac".
-- 
Florian
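
One possible shape for that clean-up, as a sketch only. enum demo_mac_type,
struct demo_hwif_entry and demo_hwif_find() are hypothetical names, and the
table holds just enough entries to show the lookup; the 0x40 and 0x21 IDs
are the core version values quoted above.

```c
#include <stddef.h>

/* One enumerator per MAC generation instead of one boolean per generation. */
enum demo_mac_type {
	DEMO_MAC_LEGACY,
	DEMO_MAC_GMAC,
	DEMO_MAC_GMAC4,
	DEMO_MAC_XGMAC,
};

struct demo_hwif_entry {
	enum demo_mac_type type;
	unsigned int min_id;
};

static const struct demo_hwif_entry demo_hwif[] = {
	{ DEMO_MAC_LEGACY, 0 },
	{ DEMO_MAC_GMAC, 0 },
	{ DEMO_MAC_GMAC4, 0x40 },
	{ DEMO_MAC_XGMAC, 0x21 },
};

/* Return the first entry of the given type whose min_id is satisfied. */
static const struct demo_hwif_entry *
demo_hwif_find(enum demo_mac_type type, unsigned int id)
{
	size_t i;

	for (i = 0; i < sizeof(demo_hwif) / sizeof(demo_hwif[0]); i++) {
		if (demo_hwif[i].type == type && id >= demo_hwif[i].min_id)
			return &demo_hwif[i];
	}
	return NULL;
}
```

Adding a new generation then means adding one enumerator and one table
entry, without touching the existing entries.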


Re: [PATCH net-next] net/tls: Calculate nsg for zerocopy path without skb_cow_data.

2018-08-03 Thread Doron Roberts-Kedes
On Fri, Aug 03, 2018 at 01:23:33AM +, Vakul Garg wrote:
> 
> 
> > -Original Message-
> > From: Doron Roberts-Kedes [mailto:doro...@fb.com]
> > Sent: Friday, August 3, 2018 6:00 AM
> > To: David S . Miller 
> > Cc: Dave Watson ; Vakul Garg
> > ; Boris Pismenny ; Aviad
> > Yehezkel ; netdev@vger.kernel.org; Doron
> > Roberts-Kedes 
> > Subject: [PATCH net-next] net/tls: Calculate nsg for zerocopy path without
> > skb_cow_data.
> > 
> > decrypt_skb fails if the number of sg elements required to map is greater
> > than MAX_SKB_FRAGS. As noted by Vakul Garg, nsg must always be
> > calculated, but skb_cow_data adds unnecessary memcpy's for the zerocopy
> > case.
> > 
> > The new function skb_nsg calculates the number of scatterlist elements
> > required to map the skb without the extra overhead of skb_cow_data. This
> > function mimics the structure of skb_to_sgvec.
> > 
> > Fixes: c46234ebb4d1 ("tls: RX path for ktls")
> > Signed-off-by: Doron Roberts-Kedes 
> > ---
> >  net/tls/tls_sw.c | 89
> > ++--
> >  1 file changed, 86 insertions(+), 3 deletions(-)
> > 
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index
> > ff3a6904a722..c62793601cfc 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -43,6 +43,76 @@
> > 
> >  #define MAX_IV_SIZE	TLS_CIPHER_AES_GCM_128_IV_SIZE
> > 
> > +static int __skb_nsg(struct sk_buff *skb, int offset, int len,
> > +  unsigned int recursion_level)
> > +{
> > +int start = skb_headlen(skb);
> > +int i, copy = start - offset;
> > +struct sk_buff *frag_iter;
> > +int elt = 0;
> > +
> > +if (unlikely(recursion_level >= 24))
> > +return -EMSGSIZE;
> > +
> > +if (copy > 0) {
> > +if (copy > len)
> > +copy = len;
> > +elt++;
> > +if ((len -= copy) == 0)
> > +return elt;
> > +offset += copy;
> > +}
> > +
> > +for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> > +int end;
> > +
> > +WARN_ON(start > offset + len);
> > +
> > +end = start + skb_frag_size(&skb_shinfo(skb)->frags[i]);
> > +if ((copy = end - offset) > 0) {
> > +if (copy > len)
> > +copy = len;
> > +elt++;
> > +if (!(len -= copy))
> > +return elt;
> > +offset += copy;
> > +}
> > +start = end;
> > +}
> > +
> > +skb_walk_frags(skb, frag_iter) {
> > +int end, ret;
> > +
> > +WARN_ON(start > offset + len);
> > +
> > +end = start + frag_iter->len;
> > +if ((copy = end - offset) > 0) {
> > +
> > +if (copy > len)
> > +copy = len;
> > +ret = __skb_nsg(frag_iter, offset - start, copy,
> > +   recursion_level + 1);
> > +if (unlikely(ret < 0))
> > +return ret;
> > +elt += ret;
> > +if ((len -= copy) == 0)
> > +return elt;
> > +offset += copy;
> > +}
> > +start = end;
> > +}
> > +BUG_ON(len);
> > +return elt;
> > +}
> > +
> > +/* Return the number of scatterlist elements required to completely map the
> > + * skb, or -EMSGSIZE if the recursion depth is exceeded.
> > + */
> > +static int skb_nsg(struct sk_buff *skb, int offset, int len)
> > +{
> > +       return __skb_nsg(skb, offset, len, 0);
> > +}
> > +
> 
> These are generic functions, useful elsewhere too.
> Should the above two functions be exported by skbuff.c?

True. Perhaps it can move into skbuff.c if/when there is a second
use case for it.

> 
> >  static int tls_do_decryption(struct sock *sk,
> >  struct scatterlist *sgin,
> >  struct scatterlist *sgout,
> > @@ -693,7 +763,7 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
> > struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
> > struct scatterlist *sgin = &sgin_arr[0];
> > struct strp_msg *rxm = strp_msg(skb);
> > -   int ret, nsg = ARRAY_SIZE(sgin_arr);
> > +   int ret, nsg;
> > struct sk_buff *unused;
> > 
> > ret = skb_copy_bits(skb, rxm->offset + TLS_HEADER_SIZE, @@ -
> > 704,10 +774,23 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
> > 
> > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> > if (!sgout) {
> > -   nsg = skb_cow_data(skb, 0, &unused) + 1;
> > +   nsg = skb_cow_data(skb, 0, &unused);
> > +   } else {
> > +   nsg = skb_nsg(skb,
> > + rxm->offset + 

KCM - recvmsg() mangles packets?

2018-08-03 Thread Dominique Martinet
I've been playing with KCM on a 4.18.0-rc7 kernel and I'm running into a
problem where the iovec filled by recvmsg() is mangled: it is filled
up to the length of one packet, but contains (truncated) data from another
packet, rendering KCM unusable.

(I haven't tried older kernels to see how long this has been broken, nor
tried to bisect; I might if there's no progress, but this might be simpler
than I think)


I've attached a reproducer, a simple program that forks, creates a TCP
server/client, attaches the server socket to a KCM socket, and in an
infinite loop sends varying-length messages from the client to the
server.
The loop stops when the server gets a message whose length is not the
length indicated in the packet header, which happens rather fast (I can
make it run for a while if I slow down emission, or if I run a verbose
tcpdump, for example).

In the quiet version on a VM on my laptop, I get this output:
[root@f2 ~]# gcc -g -l bcc -o kcm kcm.c
[root@f2 ~]# ./kcm 
client is starting
server is starting
server is receiving data
Got 14, expected 27 on 1th message: 22; flags: 80

The client sends messages deterministically: the first one is 14 bytes
filled with 1, the second one is 27 bytes filled with 2, the third one is
9 bytes filled with 3, etc. (the final digit is actually a \0 instead).

As we can see, the server received 14 '2', and the header size matches
the second message header, so something went wrong™.
Flags 0x80 is MSG_EOR meaning recvmsg copied the full message.
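
For readers following the reproducer, the framing rule that the attached
BPF program applies can be sketched in plain C. This is an illustration
only, assuming a little-endian host; frame_len() and frame_is_consistent()
are hypothetical names, not part of KCM or of the attached program.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_SIZE 4	/* 4-byte length header, as in struct my_proto */

/*
 * Total on-wire size of one message: the low 16 bits of the
 * little-endian length field plus the 4-byte header. On a
 * little-endian host this mirrors what the BPF program computes
 * with "bpf_htons(load_half(skb, 0)) + 4".
 */
static size_t frame_len(const uint8_t *buf)
{
	uint16_t payload = (uint16_t)(buf[0] | (buf[1] << 8));

	return (size_t)payload + HDR_SIZE;
}

/* A message is consistent when the received size equals frame_len(). */
static int frame_is_consistent(const uint8_t *buf, size_t received)
{
	return received >= HDR_SIZE && frame_len(buf) == received;
}
```

With the failing output above, the header announces 27 payload bytes
(31 bytes in total) while only 14 payload bytes arrived, so a check of
this kind fails.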



This happens even if I reduce the VM's CPU count to 1, so I was thinking some
irq messes with the sock between skb_peek and the actual copy of the
data (as this does work if I send slowly!), but even disabling
irq/preempt doesn't seem to help so I'm not sure what to try next.

Any idea?


Thanks,
-- 
Dominique Martinet
/*
 * A sample program of KCM.
 * Originally https://gist.github.com/peo3/fd0e266a3852d3422c08854aba96bff5
 *
 * $ gcc -lbcc kcm-sample.c
 * $ ./a.out 1
 */
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 
#include 
#include 
#include 


struct my_proto {
	struct _hdr {
		uint32_t len;
	} hdr;
	char data[32];
} __attribute__((packed));

// use htons to get the LE header size, since load_half does a first
// conversion from network byte order
const char *bpf_prog_string = "\
ssize_t bpf_prog1(struct __sk_buff *skb)		\
{			\
	return bpf_htons(load_half(skb, 0)) + 4;	\
}";

int servsock_init(int port)
{
	int s, error;
	struct sockaddr_in addr;

	s = socket(AF_INET, SOCK_STREAM, 0);

	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	addr.sin_addr.s_addr = INADDR_ANY;
	error = bind(s, (struct sockaddr *)&addr, sizeof(addr));
	if (error == -1)
		err(EXIT_FAILURE, "bind");

	error = listen(s, 10);
	if (error == -1)
		err(EXIT_FAILURE, "listen");

	return s;
}

int bpf_init(void)
{
	int fd, map_fd;
	void *mod;
	int key;
	long long value = 0;

	mod = bpf_module_create_c_from_string(bpf_prog_string, 0, NULL, 0);
	fd = bpf_prog_load(
		BPF_PROG_TYPE_SOCKET_FILTER,
		"bpf_prog1",
		bpf_function_start(mod, "bpf_prog1"),
		bpf_function_size(mod, "bpf_prog1"),
		bpf_module_license(mod),
		bpf_module_kern_version(mod),
		0, NULL, 0);

	if (fd == -1)
		exit(1);
	return fd;	
}

void client(int port)
{
	int s, error;
	struct sockaddr_in addr;
	struct hostent *host;
	struct my_proto my_msg;
	int len;

	printf("client is starting\n");

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		err(EXIT_FAILURE, "socket");

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	host = gethostbyname("localhost");
	if (host == NULL)
		err(EXIT_FAILURE, "gethostbyname");
	memcpy(&addr.sin_addr, host->h_addr, host->h_length);

	error = connect(s, (struct sockaddr *)&addr, sizeof(addr));
	if (error == -1)
		err(EXIT_FAILURE, "connect");

	len = sprintf(my_msg.data, "1234567890123456789012345678901");
	my_msg.data[len] = '\0';
	my_msg.hdr.len = len + 1;

	int i = 1;
	while(1) {
		my_msg.hdr.len = (i++ * 1312739ULL) % 31 + 1;
		for (int j = 0; j < my_msg.hdr.len; ) {
			j += snprintf(my_msg.data + j, my_msg.hdr.len - j, "%i", i - 1);
		}
		my_msg.data[my_msg.hdr.len-1] = '\0';
		//printf("%d: writing %d\n", i-1, my_msg.hdr.len);
		len = write(s, &my_msg, sizeof(my_msg.hdr) + my_msg.hdr.len);
		if (len == -1)	/* check the result of write(), not the stale connect() status */
			err(EXIT_FAILURE, "write");
		//usleep(1);
	}

	close(s);
}

int kcm_init(void)
{
	int kcmfd;

	kcmfd = socket(AF_KCM, SOCK_DGRAM, KCMPROTO_CONNECTED);
	if (kcmfd == -1)
		err(EXIT_FAILURE, "socket(AF_KCM)");

	return kcmfd;
}

int kcm_attach(int kcmfd, int csock, int bpf_prog_fd)
{
	int error;
	struct kcm_attach attach_info = {
		.fd = csock,
		.bpf_fd = bpf_prog_fd,
	};

	error = ioctl(kcmfd, SIOCKCMATTACH, &attach_info);
	if (error == -1)
		err(EXIT_FAILURE, "ioctl(SIOCKCMATTACH)");

	return 0;
}

void process(int kcmfd)
{
	struct my_proto my_msg;
	int error, len;
	struct msghdr msg;
	struct iovec iov = {
		.iov_base = &my_msg,
		.iov_len = sizeof(my_msg),
	};

	printf("server is 

Re: [PATCH net-next 4/4] net: systemport: Create helper to set MPD

2018-08-03 Thread Andrew Lunn
On Fri, Aug 03, 2018 at 11:08:44AM -0700, Florian Fainelli wrote:
> Create a helper function to turn MPD on/off; this will be used to avoid
> duplicating code as we are going to add additional wake-up
> types.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net-next 2/4] net: dsa: bcm_sf2: Disable learning while in WoL

2018-08-03 Thread Andrew Lunn
On Fri, Aug 03, 2018 at 11:08:42AM -0700, Florian Fainelli wrote:
> When we are in Wake-on-LAN, we operate with the host software not running
> a network stack, so we want the switch to flood packets in order to
> cause a system wake-up when matching specific filters (unicast or
> multicast). This was not necessary before since we supported Magic
> Packets, which target a broadcast MAC address that the switch
> already floods.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net-next 1/4] net: dsa: bcm_sf2: Allow targeting CPU ports for CFP rules

2018-08-03 Thread Andrew Lunn
On Fri, Aug 03, 2018 at 11:08:41AM -0700, Florian Fainelli wrote:
> ds->enabled_port_mask only contains a bitmask of user-facing enabled
> ports, we also need to allow programming CFP rules that target CPU ports
> (e.g: ports 5 and 8).
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew


[PATCH net-next 0/4] net: dsa and systemport WoL changes

2018-08-03 Thread Florian Fainelli
Hi David,

This patch series extracts what was previously submitted as part of the
"WAKE_FILTER" Wake-on-LAN patch series into patches that do not
introduce WAKE_FILTER support themselves.

Changes in this series:

- properly align the dsa_is_cpu_port() check in first patch

Florian Fainelli (4):
  net: dsa: bcm_sf2: Allow targeting CPU ports for CFP rules
  net: dsa: bcm_sf2: Disable learning while in WoL
  net: systemport: Do not re-configure upon WoL interrupt
  net: systemport: Create helper to set MPD

 drivers/net/dsa/bcm_sf2.c  | 12 +++-
 drivers/net/dsa/bcm_sf2_cfp.c  |  3 ++-
 drivers/net/dsa/bcm_sf2_regs.h |  2 ++
 drivers/net/ethernet/broadcom/bcmsysport.c | 24 ++--
 4 files changed, 29 insertions(+), 12 deletions(-)

-- 
2.14.1



[PATCH net-next 3/4] net: systemport: Do not re-configure upon WoL interrupt

2018-08-03 Thread Florian Fainelli
We already properly resume from Wake-on-LAN whether such a condition
occurred or not, no need to process the WoL interrupt for functional
changes since that could race with other settings.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index 631617d95769..7faad9e1a6f9 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1102,10 +1102,8 @@ static irqreturn_t bcm_sysport_rx_isr(int irq, void 
*dev_id)
if (priv->irq0_stat & INTRL2_0_TX_RING_FULL)
bcm_sysport_tx_reclaim_all(priv);
 
-   if (priv->irq0_stat & INTRL2_0_MPD) {
+   if (priv->irq0_stat & INTRL2_0_MPD)
netdev_info(priv->netdev, "Wake-on-LAN interrupt!\n");
-   bcm_sysport_resume_from_wol(priv);
-   }
 
if (!priv->is_lite)
goto out;
-- 
2.14.1



[PATCH net-next 1/4] net: dsa: bcm_sf2: Allow targeting CPU ports for CFP rules

2018-08-03 Thread Florian Fainelli
ds->enabled_port_mask only contains a bitmask of user-facing enabled
ports, we also need to allow programming CFP rules that target CPU ports
(e.g: ports 5 and 8).

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2_cfp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index b89acaee12d4..1e37b65aab93 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -755,7 +755,8 @@ static int bcm_sf2_cfp_rule_set(struct dsa_switch *ds, int 
port,
port_num = fs->ring_cookie / SF2_NUM_EGRESS_QUEUES;
 
if (fs->ring_cookie == RX_CLS_FLOW_DISC ||
-   !dsa_is_user_port(ds, port_num) ||
+   !(dsa_is_user_port(ds, port_num) ||
+ dsa_is_cpu_port(ds, port_num)) ||
port_num >= priv->hw_params.num_ports)
return -EINVAL;
/*
-- 
2.14.1



[PATCH net-next 4/4] net: systemport: Create helper to set MPD

2018-08-03 Thread Florian Fainelli
Create a helper function to turn MPD on/off; this will be used to avoid
duplicating code as we are going to add additional wake-up
types.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index 7faad9e1a6f9..284581c9680e 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1041,17 +1041,25 @@ static int bcm_sysport_poll(struct napi_struct *napi, 
int budget)
return work_done;
 }
 
-static void bcm_sysport_resume_from_wol(struct bcm_sysport_priv *priv)
+static void mpd_enable_set(struct bcm_sysport_priv *priv, bool enable)
 {
u32 reg;
 
+   reg = umac_readl(priv, UMAC_MPD_CTRL);
+   if (enable)
+   reg |= MPD_EN;
+   else
+   reg &= ~MPD_EN;
+   umac_writel(priv, reg, UMAC_MPD_CTRL);
+}
+
+static void bcm_sysport_resume_from_wol(struct bcm_sysport_priv *priv)
+{
/* Stop monitoring MPD interrupt */
intrl2_0_mask_set(priv, INTRL2_0_MPD);
 
/* Clear the MagicPacket detection logic */
-   reg = umac_readl(priv, UMAC_MPD_CTRL);
-   reg &= ~MPD_EN;
-   umac_writel(priv, reg, UMAC_MPD_CTRL);
+   mpd_enable_set(priv, false);
 
netif_dbg(priv, wol, priv->netdev, "resumed from WOL\n");
 }
@@ -2447,9 +2455,7 @@ static int bcm_sysport_suspend_to_wol(struct 
bcm_sysport_priv *priv)
 
/* Do not leave the UniMAC RBUF matching only MPD packets */
if (!timeout) {
-   reg = umac_readl(priv, UMAC_MPD_CTRL);
-   reg &= ~MPD_EN;
-   umac_writel(priv, reg, UMAC_MPD_CTRL);
+   mpd_enable_set(priv, false);
netif_err(priv, wol, ndev, "failed to enter WOL mode\n");
return -ETIMEDOUT;
}
-- 
2.14.1



[PATCH net-next 2/4] net: dsa: bcm_sf2: Disable learning while in WoL

2018-08-03 Thread Florian Fainelli
When we are in Wake-on-LAN, we operate with the host software not running
a network stack, so we want the switch to flood packets in order to
cause a system wake-up when matching specific filters (unicast or
multicast). This was not necessary before since we supported Magic
Packets, which target a broadcast MAC address that the switch
already floods.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c  | 12 +++-
 drivers/net/dsa/bcm_sf2_regs.h |  2 ++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index ac96ff40d37e..e0066adcd2f3 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -166,6 +166,11 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int 
port,
reg &= ~P_TXQ_PSM_VDD(port);
core_writel(priv, reg, CORE_MEM_PSM_VDD_CTRL);
 
+   /* Enable learning */
+   reg = core_readl(priv, CORE_DIS_LEARN);
+   reg &= ~BIT(port);
+   core_writel(priv, reg, CORE_DIS_LEARN);
+
/* Enable Broadcom tags for that port if requested */
if (priv->brcm_tag_mask & BIT(port))
b53_brcm_hdr_setup(ds, port);
@@ -222,8 +227,13 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, 
int port,
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
u32 reg;
 
-   if (priv->wol_ports_mask & (1 << port))
+   /* Disable learning while in WoL mode */
+   if (priv->wol_ports_mask & (1 << port)) {
+   reg = core_readl(priv, CORE_DIS_LEARN);
+   reg |= BIT(port);
+   core_writel(priv, reg, CORE_DIS_LEARN);
return;
+   }
 
if (port == priv->moca_port)
bcm_sf2_port_intr_disable(priv, port);
diff --git a/drivers/net/dsa/bcm_sf2_regs.h b/drivers/net/dsa/bcm_sf2_regs.h
index 3ccd5a865dcb..0a1e530d52b7 100644
--- a/drivers/net/dsa/bcm_sf2_regs.h
+++ b/drivers/net/dsa/bcm_sf2_regs.h
@@ -168,6 +168,8 @@ enum bcm_sf2_reg_offs {
 #define CORE_SWITCH_CTRL   0x00088
 #define  MII_DUMB_FWDG_EN  (1 << 6)
 
+#define CORE_DIS_LEARN 0x000f0
+
 #define CORE_SFT_LRN_CTRL  0x000f8
 #define  SW_LEARN_CNTL(x)  (1 << (x))
 
-- 
2.14.1



Re: [PATCH iproute2] ip link: don't stop batch processing

2018-08-03 Thread Dave Taht
On Fri, Aug 3, 2018 at 10:50 AM Matteo Croce  wrote:
>
> When 'ip link show dev DEVICE' is processed in batch mode, ip exits
> and stops processing further commands.
> This is because ipaddr_list_flush_or_save() calls exit() to avoid printing
> the link information twice.
> Replace the exit with a classic goto out instruction.
>
> Signed-off-by: Matteo Croce 

one thing I noticed in iproute2-next last week is that

( echo qdisc show dev eno1; sleep 5; echo qdisc show dev eno1; ) | tc -b -

batches the whole thing up to emerge on exit, only.

It didn't use to do that; the output of every command came out as it
completed. I used to use that to timestamp and save the overhead of
invoking the tc utility on openwrt while monitoring qdisc stats in
https://github.com/tohojo/flent/blob/master/misc/tc_iterate.c

Alternatively, adding timed/timestamped output to tc like -c count or
-I interval would be useful.
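
One possible cause, stated here as an assumption rather than a diagnosis,
is that stdio fully buffers stdout when it is not a terminal, so nothing
is written until exit unless the tool flushes after each batch command.
A minimal C sketch of per-command flushing follows; emit_result() is a
hypothetical helper, not a function in tc.

```c
#include <stdio.h>

/*
 * Write one command's output followed by a newline and flush right
 * away, so a reader on the other end of a pipe sees it immediately.
 * Returns 0 on success, -1 on any write or flush error.
 */
static int emit_result(FILE *out, const char *text)
{
	if (fputs(text, out) == EOF)
		return -1;
	if (fputc('\n', out) == EOF)
		return -1;
	return fflush(out) == 0 ? 0 : -1;
}
```

Calling a helper like this once per batch line would let the output of
each command emerge as it completes, whatever the buffering mode of the
stream.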

> ---
>  ip/ipaddress.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index 6c306ab7..b7b78f6e 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -1920,7 +1920,7 @@ static int ipaddr_list_flush_or_save(int argc, char 
> **argv, int action)
> exit(1);
> }
> delete_json_obj();
> -   exit(0);
> +   goto out;
> }
>
> if (filter.family != AF_PACKET) {
> --
> 2.17.1
>


-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619


Re: [PATCH ethtool] ethtool: Add support for WAKE_FILTER

2018-08-03 Thread Florian Fainelli
On 08/01/2018 09:32 AM, David Miller wrote:
> From: Florian Fainelli 
> Date: Mon, 30 Jul 2018 15:26:24 -0700
> 
>> On 07/17/2018 08:36 AM, Florian Fainelli wrote:
>>> Allow re-purposing the wol->sopass storage area to specify a bitmask of 
>>> filters
>>> (programmed previously via ethtool::rxnfc) to be used as wake-up patterns.
>>
>> John, David, can you provide some feedback if the approach is
>> acceptable? I will address Andrew's comment about the user friendliness
>> and allow providing a comma-separated list of filter identifiers.
>>
>> One usability issue with this approach is that one cannot specify
>> wake-on-LAN using WAKE_MAGICSECURE *and* WAKE_FILTER at the same time,
>> since it uses the same location in the ioctl() structure that is being
>> passed. Do you see this as a problem?
> 
> Once again we are stuck in this weird situation, a sort of limbo.
> 
> On the one hand, I don't want to block your work on the ethtool
> netlink stuff being done.
> 
> However it is clear that by using netlink attributes, it would
> be so much cleaner.
> 
> I honestly don't know what to say at this time.  I wish I had
> a clear piece of advice and a way for everyone to move forward,
> and usually I do, but this time I really don't :-/
> 

That's fine, let me submit the first few patches that are prerequisites
but don't actually introduce the WAKE_FILTER support. Once Michal's
ethtool/netlink work gets merged I can quickly extend that in a way that
supports wake-on-LAN using configured filters.

Does the current approach of specifying a bitmask of filters look
reasonable to you though?
-- 
Florian


[PATCH iproute2] ip link: don't stop batch processing

2018-08-03 Thread Matteo Croce
When 'ip link show dev DEVICE' is processed in batch mode, ip exits
and stops processing further commands.
This is because ipaddr_list_flush_or_save() calls exit() to avoid printing
the link information twice.
Replace the exit with a classic goto out instruction.

Signed-off-by: Matteo Croce 
---
 ip/ipaddress.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 6c306ab7..b7b78f6e 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1920,7 +1920,7 @@ static int ipaddr_list_flush_or_save(int argc, char 
**argv, int action)
exit(1);
}
delete_json_obj();
-   exit(0);
+   goto out;
}
 
if (filter.family != AF_PACKET) {
-- 
2.17.1



Re: [PATCH net-next 0/3] l2tp: sanitise MTU handling on sessions

2018-08-03 Thread David Miller
From: Guillaume Nault 
Date: Fri, 3 Aug 2018 12:38:32 +0200

> Most of the code handling sessions' MTU has no effect. The ->mtu field
> in struct l2tp_session might be used at session creation time, but
> neither PPP nor Ethernet pseudo-wires take updates into account.
> 
> L2TP sessions don't have a concept of MTU, which is the reason why
> ->mtu is mostly ignored. MTU should remain a network device thing.
> Therefore this patch set does not try to propagate/update ->mtu to/from
> the device. That would complicate the code unnecessarily. Instead this
> field and the associated ioctl commands and netlink attributes are
> removed.
> 
> Patch #1 defines l2tp_tunnel_dst_mtu() in order to simplify the
> following patches. Then patches #2 and #3 remove MTU handling from PPP
> and Ethernet pseudo-wires respectively.

Series applied, thanks.


Re: [PATCH v3 net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2

2018-08-03 Thread Andrew Lunn
On Fri, Aug 03, 2018 at 04:50:23PM +0100, Jose Abreu wrote:
> Add the MDIO related functionalities for the new IP block XGMAC2.
> 
> Signed-off-by: Jose Abreu 
> Cc: David S. Miller 
> Cc: Joao Pinto 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> Cc: Andrew Lunn 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-08-03 Thread Stefano Brivio
Hi Ben,

On Tue, 31 Jul 2018 15:06:57 -0700
Ben Pfaff  wrote:

> This is an awkward problem to try to solve with sockets because of the
> nature of sockets, which are strictly first-in first-out.  What you
> really want is something closer to the algorithm that we use in
> ovs-vswitchd to send packets to an OpenFlow controller.  When the
> channel becomes congested, then for each packet to be sent to the
> controller, OVS appends it to a queue associated with its input port.
> (This could be done on a more granular basis than just port.)  If the
> maximum amount of queued packets is reached, then OVS discards a packet
> from the longest queue.  When space becomes available in the channel,
> OVS round-robins through the queues to send a packet.  This achieves
> pretty good fairness but it can't be done with sockets because you can't
> drop a packet that is already queued to one.

Thanks for your feedback. What you describe is, though, functionally
equivalent to what this patch does, minus the per-port queueing limit.

However, instead of having one explicit queue for each port, and
then fetching packets in a round-robin fashion from all the queues, we
implemented this with a single queue and choose insertion points while
queueing in such a way that the result is equivalent. This way, we
avoid the massive overhead associated with having one queue per
port (we can have up to 2^16 ports), and cycling over them.

Let's say we have two ports, A and B, and three upcalls are sent for
each port. If we implement one queue for each port as you described, we
end up with this:

.----.----.----.
| A1 | A2 | A3 |
'----'----'----'

.----.----.----.
| B1 | B2 | B3 |
'----'----'----'

and then send upcalls in this order: A1, B1, A2, B2, A3, B3.

What we are doing here with a single queue is inserting the upcalls
directly in this order:

.----.----.----.----.----.----.
| A1 | B1 | A2 | B2 | A3 | B3 |
'----'----'----'----'----'----'

and dequeueing from the head.
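
The equivalence can be illustrated with a small stand-alone C sketch.
It is not the code from the patch: rr_queue, rr_entry and rr_enqueue()
are hypothetical names, the queue is a fixed-size array, and the
insertion point is found with a linear scan for simplicity.

```c
#include <stddef.h>
#include <string.h>

#define RR_MAX_PORTS 8
#define RR_MAX_LEN   64

struct rr_entry {
	int port;	/* input port the upcall came from */
	int id;		/* per-port sequence number, for illustration */
};

struct rr_queue {
	struct rr_entry e[RR_MAX_LEN];
	size_t len;
};

/*
 * Insert an upcall so that dequeueing from the head yields the same
 * order as round-robin over one queue per port. The new entry is the
 * (mine + 1)-th entry of its port, so it goes just before the first
 * entry that is the (mine + 2)-th (or later) entry of its own port.
 * Returns 0 on success, -1 if the port is invalid or the queue is full.
 */
static int rr_enqueue(struct rr_queue *q, int port, int id)
{
	size_t seen[RR_MAX_PORTS] = { 0 };
	size_t mine = 0, pos, i;

	if (port < 0 || port >= RR_MAX_PORTS || q->len == RR_MAX_LEN)
		return -1;

	/* Count entries already queued for this port. */
	for (i = 0; i < q->len; i++)
		if (q->e[i].port == port)
			mine++;

	/* Find the end of the round the new entry belongs to. */
	pos = q->len;
	for (i = 0; i < q->len; i++) {
		if (++seen[q->e[i].port] > mine + 1) {
			pos = i;
			break;
		}
	}

	/* Shift the tail by one slot and place the new entry. */
	memmove(&q->e[pos + 1], &q->e[pos],
		(q->len - pos) * sizeof(q->e[0]));
	q->e[pos].port = port;
	q->e[pos].id = id;
	q->len++;
	return 0;
}
```

Enqueueing A1, A2, A3 and then B1, B2, B3 with this sketch leaves the
array in the order A1, B1, A2, B2, A3, B3, matching the single-queue
diagram above.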

About the per-port queueing limit: we currently have a global one
(UPCALL_QUEUE_MAX_LEN), while the per-port limit is simply given by
implementation constraints in our case:

	if (dp->upcalls.count[pos->port_no] == U8_MAX - 1) {
		err = -ENOSPC;
		goto out_clear;
	}

but we can easily swap that U8_MAX - 1 with another macro or a
configurable value, if there's any value in doing that.

> My current thought is that any fairness scheme we implement directly in
> the kernel is going to need to evolve over time.  Maybe we could do
> something flexible with BPF and maps, instead of hard-coding it.

Honestly, I fail to see what else we might want to do here, other than
adding a simple mechanism for fairness, to solve the specific issue at
hand. Flexibility would probably come at a higher cost. We could easily
make limits configurable if needed. Do you have anything else in mind?

-- 
Stefano


Re: [patch net-next] net: sched: fix flush on non-existing chain

2018-08-03 Thread David Miller
From: Jiri Pirko 
Date: Fri,  3 Aug 2018 11:08:47 +0200

> From: Jiri Pirko 
> 
> User was able to perform filter flush on chain 0 even if it didn't have
> any filters in it. With the patch that avoided implicit chain 0
> creation, this changed. So in case user wants filter flush on chain
> which does not exist, just return success. There's no reason for non-0
> chains to behave differently than chain 0, so do the same for them.
> 
> Reported-by: Ido Schimmel 
> Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation")
> Signed-off-by: Jiri Pirko 

Applied.


Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31

2018-08-03 Thread Ido Schimmel
On Thu, Aug 02, 2018 at 03:53:15PM -0700, Jakub Kicinski wrote:
> No one is requesting full RED offload here..  if someone sets the
> parameters you can't support you simply won't offload them.  And ignore
> the parameters which only make sense in software terms.  Look at the
> docs for mlxsw:
> 
> https://github.com/Mellanox/mlxsw/wiki/Queues-Management#offloading-red
> 
> It says "not offloaded" in a number of places.
> 
...
> It's generally preferable to implement a subset of existing well-defined
> API than create vendor knobs, hence hardly a misuse.

Sorry for derailing the discussion, but you mentioned some points that
have been bothering me for a while.

I think we didn't do a very good job with buffer management and this is
exactly why you see some parameters marked as "not offloaded". Take the
"limit" (queue size) for example. It's configured via devlink-sb, by
setting a quota on the number of bytes that can be queued for the port
and TC (queue) that RED manages. See:

https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service#pool-binding

It would have been much better and more user-friendly to not ignore this
parameter and have users configure the limit using existing interfaces
(tc), instead of creating a discrepancy between the software and
hardware data paths by configuring the hardware directly via devlink-sb.

I believe devlink-sb is mainly the result of Linux's shortcomings in
this area and our lack of perspective back then. While the qdisc layer
(Linux's shared buffers) works for end hosts, it requires enhancements
(mainly on ingress) for switches (physical/virtual) that forward
packets.

For example, switches (I'm familiar with Mellanox ASICs, but I assume
the concept is similar in other ASICs) have ingress buffers where
packets are stored while going through the pipeline. Once out of the
pipeline you know from which port and queue the packet should egress. In
case you have both lossless and lossy traffic in your network you
probably want to classify it into different ingress buffers and mark the
buffers where the lossless traffic is stored as such, so that PFC frames
would be emitted above a certain threshold.

This is currently configured using dcbnl, but it lacks a software model
which means that packets that are forwarded by the kernel don't get the
same treatment (e.g., skb priority isn't set). It also means that when
you want to limit the number of packets that are queued *from* a certain
port and ingress buffer you resort to tools such as devlink-sb that end
up colliding with existing tools (tc).

I was thinking (not too much...) about modelling the above using ingress
qdiscs. They don't do any queueing, only accounting. Once the
egress qdisc dequeues the packet, you give credit back to the ingress
qdisc from which the packet came. I believe that modelling these
buffers using the qdisc layer is the right abstraction.
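
The accounting idea can be sketched in a few lines of C. This is only an
illustration of the charge/credit model described above, not an existing
kernel interface; ingress_acct, ingress_charge() and ingress_credit() are
hypothetical names.

```c
#include <stdint.h>

/* Per ingress buffer accounting state (hypothetical). */
struct ingress_acct {
	uint32_t limit;	/* quota in bytes for this ingress buffer */
	uint32_t used;	/* bytes currently charged against the quota */
};

/*
 * Charge a packet when it enters through this ingress buffer.
 * Returns 0 if it fits in the quota, -1 if it would exceed it
 * (where a real implementation would drop or emit a pause frame).
 */
static int ingress_charge(struct ingress_acct *acct, uint32_t bytes)
{
	if (bytes > acct->limit - acct->used)
		return -1;
	acct->used += bytes;
	return 0;
}

/* Give credit back once the egress side has dequeued the packet. */
static void ingress_credit(struct ingress_acct *acct, uint32_t bytes)
{
	acct->used = bytes > acct->used ? 0 : acct->used - bytes;
}
```

The ingress side never holds packets in this model; it only tracks how
many bytes that entered through it are still queued somewhere on egress.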

Would appreciate hearing your thoughts on the above.


RE: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()

2018-08-03 Thread David Laight
From: Konstantin Khorenko
> Sent: 03 August 2018 17:21
> 
> Each SCTP association can have up to 65535 input and output streams.
> For each stream type an array of sctp_stream_in or sctp_stream_out
> structures is allocated using kmalloc_array() function. This function
> allocates physically contiguous memory regions, so this can lead
> to allocation of memory regions of very high order, i.e.:
...

Given how useless SCTP streams are, does anything actually use
more than about 4?
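
For scale, the page order of such an allocation can be estimated with a
little arithmetic. The sketch below assumes 4 KiB pages and takes the
allocation size as a parameter, since the actual size of the per-stream
structures is not given in this thread; alloc_order() is a hypothetical
user-space helper that approximates what the kernel's get_order() returns.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

/*
 * Smallest order such that (1 << order) pages hold "bytes" bytes.
 * Physically contiguous allocations are served in such power-of-two
 * page blocks, so a large array needs a high-order block.
 */
static unsigned int alloc_order(size_t bytes)
{
	size_t pages = (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

As a purely illustrative figure, 65535 entries of 24 bytes each come to
1572840 bytes, which needs 384 pages and therefore an order-9 block under
these assumptions.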

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



RE: [PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams

2018-08-03 Thread David Laight
From: Konstantin Khorenko
> Sent: 03 August 2018 17:21
...
> --- a/net/sctp/stream.c
> +++ b/net/sctp/stream.c
> @@ -37,6 +37,18 @@
>  #include 
>  #include 
> 
> +struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
> + __u16 sid)
> +{
> + return ((struct sctp_stream_out *)(stream->out)) + sid;
> +}
> +
> +struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
> +   __u16 sid)
> +{
> + return ((struct sctp_stream_in *)(stream->in)) + sid;
> +}
> +

Those look like they ought to be static inlines in the header file.
Otherwise you'll be making SCTP performance worse than it is already.
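
A sketch of what that could look like, using simplified stand-in types
rather than the real definitions from include/net/sctp/structs.h (all
names ending in _sketch are hypothetical):

```c
#include <stdint.h>

/* Stand-ins for the real SCTP structures, reduced to what is needed. */
struct sctp_stream_out_sketch {
	uint16_t ssn;
};

struct sctp_stream_sketch {
	struct sctp_stream_out_sketch *out;	/* contiguous array */
	uint16_t outcnt;
};

/*
 * Header-style accessor: being static inline, the pointer arithmetic
 * is expanded at each call site instead of costing a function call.
 */
static inline struct sctp_stream_out_sketch *
stream_out_sketch(const struct sctp_stream_sketch *stream, uint16_t sid)
{
	return stream->out + sid;
}
```

Whether this remains practical once the backing store becomes a
flex_array depends on what the header can include, which this sketch
does not address.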

David




[PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams

2018-08-03 Thread Konstantin Khorenko
This patch introduces wrappers for accessing in/out streams indirectly.
This will make it possible to replace physically contiguous memory arrays
of streams with flexible arrays (or maybe any other appropriate
mechanism) which do memory allocation on a per-page basis.

Signed-off-by: Oleg Babin 
Signed-off-by: Konstantin Khorenko 

---
v2 changes:
 sctp_stream_in() users are updated to provide stream as an argument,
 sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().
---
 include/net/sctp/structs.h   |  30 +++-
 net/sctp/chunk.c |   6 ++-
 net/sctp/outqueue.c  |  11 +++--
 net/sctp/socket.c|   4 +-
 net/sctp/stream.c| 107 +--
 net/sctp/stream_interleave.c |  20 
 net/sctp/stream_sched.c  |  13 +++---
 net/sctp/stream_sched_prio.c |  22 -
 net/sctp/stream_sched_rr.c   |   8 ++--
 9 files changed, 124 insertions(+), 97 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index dbe1b911a24d..dc48c8e2b293 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -394,37 +394,35 @@ void sctp_stream_update(struct sctp_stream *stream, 
struct sctp_stream *new);
 
 /* What is the current SSN number for this stream? */
 #define sctp_ssn_peek(stream, type, sid) \
-   ((stream)->type[sid].ssn)
+   (sctp_stream_##type((stream), (sid))->ssn)
 
 /* Return the next SSN number for this stream. */
 #define sctp_ssn_next(stream, type, sid) \
-   ((stream)->type[sid].ssn++)
+   (sctp_stream_##type((stream), (sid))->ssn++)
 
 /* Skip over this ssn and all below. */
 #define sctp_ssn_skip(stream, type, sid, ssn) \
-   ((stream)->type[sid].ssn = ssn + 1)
+   (sctp_stream_##type((stream), (sid))->ssn = ssn + 1)
 
 /* What is the current MID number for this stream? */
 #define sctp_mid_peek(stream, type, sid) \
-   ((stream)->type[sid].mid)
+   (sctp_stream_##type((stream), (sid))->mid)
 
 /* Return the next MID number for this stream.  */
 #define sctp_mid_next(stream, type, sid) \
-   ((stream)->type[sid].mid++)
+   (sctp_stream_##type((stream), (sid))->mid++)
 
 /* Skip over this mid and all below. */
 #define sctp_mid_skip(stream, type, sid, mid) \
-   ((stream)->type[sid].mid = mid + 1)
-
-#define sctp_stream_in(asoc, sid) (&(asoc)->stream.in[sid])
+   (sctp_stream_##type((stream), (sid))->mid = mid + 1)
 
 /* What is the current MID_uo number for this stream? */
 #define sctp_mid_uo_peek(stream, type, sid) \
-   ((stream)->type[sid].mid_uo)
+   (sctp_stream_##type((stream), (sid))->mid_uo)
 
 /* Return the next MID_uo number for this stream.  */
 #define sctp_mid_uo_next(stream, type, sid) \
-   ((stream)->type[sid].mid_uo++)
+   (sctp_stream_##type((stream), (sid))->mid_uo++)
 
 /*
  * Pointers to address related SCTP functions.
@@ -1433,8 +1431,8 @@ struct sctp_stream_in {
 };
 
 struct sctp_stream {
-   struct sctp_stream_out *out;
-   struct sctp_stream_in *in;
+   struct flex_array *out;
+   struct flex_array *in;
__u16 outcnt;
__u16 incnt;
/* Current stream being sent, if any */
@@ -1456,6 +1454,14 @@ struct sctp_stream {
struct sctp_stream_interleave *si;
 };
 
+struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
+   __u16 sid);
+struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
+ __u16 sid);
+
+#define SCTP_SO(s, i) sctp_stream_out((s), (i))
+#define SCTP_SI(s, i) sctp_stream_in((s), (i))
+
 #define SCTP_STREAM_CLOSED 0x00
 #define SCTP_STREAM_OPEN   0x01
 
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index bfb9f812e2ef..ce8087846f05 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -325,7 +325,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
if (SCTP_PR_TTL_ENABLED(chunk->sinfo.sinfo_flags) &&
time_after(jiffies, chunk->msg->expires_at)) {
struct sctp_stream_out *streamout =
-   &chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+   SCTP_SO(&chunk->asoc->stream,
+   chunk->sinfo.sinfo_stream);
 
if (chunk->sent_count) {
chunk->asoc->abandoned_sent[SCTP_PR_INDEX(TTL)]++;
@@ -339,7 +340,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
} else if (SCTP_PR_RTX_ENABLED(chunk->sinfo.sinfo_flags) &&
   chunk->sent_count > chunk->sinfo.sinfo_timetolive) {
struct sctp_stream_out *streamout =
-   &chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+   SCTP_SO(&chunk->asoc->stream,
+   chunk->sinfo.sinfo_stream);
 
chunk->asoc->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
streamout->ext->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
diff --git 

[PATCH v2 2/2] net/sctp: Replace in/out stream arrays with flex_array

2018-08-03 Thread Konstantin Khorenko
This patch replaces the physically contiguous memory arrays
allocated with kmalloc_array() with flexible arrays.
This avoids memory allocation failures on systems
under memory stress.

Signed-off-by: Oleg Babin 
---
 include/net/sctp/structs.h |  1 +
 net/sctp/stream.c  | 78 +++---
 2 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index dc48c8e2b293..884d33965e89 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -57,6 +57,7 @@
 #include   /* This gets us atomic counters.  */
 #include   /* We need sk_buff_head. */
 #include/* We need tq_struct.*/
+#include   /* We need flex_array.   */
 #include /* We need sctp* header structs.  */
 #include  /* We need auth specific structs */
 #include /* For inet_skb_parm */
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index 56fadeec7cba..3e55db1a38d0 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -40,13 +40,60 @@
 struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
__u16 sid)
 {
-   return ((struct sctp_stream_out *)(stream->out)) + sid;
+   return flex_array_get(stream->out, sid);
 }
 
 struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
  __u16 sid)
 {
-   return ((struct sctp_stream_in *)(stream->in)) + sid;
+   return flex_array_get(stream->in, sid);
+}
+
+static struct flex_array *fa_alloc(size_t elem_size, size_t elem_count,
+  gfp_t gfp)
+{
+   struct flex_array *result;
+   int err;
+
+   result = flex_array_alloc(elem_size, elem_count, gfp);
+   if (result) {
+   err = flex_array_prealloc(result, 0, elem_count, gfp);
+   if (err) {
+   flex_array_free(result);
+   result = NULL;
+   }
+   }
+
+   return result;
+}
+
+static void fa_free(struct flex_array *fa)
+{
+   if (fa)
+   flex_array_free(fa);
+}
+
+static void fa_copy(struct flex_array *fa, struct flex_array *from,
+   size_t index, size_t count)
+{
+   void *elem;
+
+   while (count--) {
+   elem = flex_array_get(from, index);
+   flex_array_put(fa, index, elem, 0);
+   index++;
+   }
+}
+
+static void fa_zero(struct flex_array *fa, size_t index, size_t count)
+{
+   void *elem;
+
+   while (count--) {
+   elem = flex_array_get(fa, index);
+   memset(elem, 0, fa->element_size);
+   index++;
+   }
 }
 
 /* Migrates chunks from stream queues to new stream queues if needed,
@@ -106,19 +153,17 @@ static int sctp_stream_alloc_out(struct sctp_stream 
*stream, __u16 outcnt,
struct flex_array *out;
size_t elem_size = sizeof(struct sctp_stream_out);
 
-   out = kmalloc_array(outcnt, elem_size, gfp);
+   out = fa_alloc(elem_size, outcnt, gfp);
if (!out)
return -ENOMEM;
 
if (stream->out) {
-   memcpy(out, stream->out, min(outcnt, stream->outcnt) *
-elem_size);
-   kfree(stream->out);
+   fa_copy(out, stream->out, 0, min(outcnt, stream->outcnt));
+   fa_free(stream->out);
}
 
if (outcnt > stream->outcnt)
-   memset(((struct sctp_stream_out *)out) + stream->outcnt, 0,
-  (outcnt - stream->outcnt) * elem_size);
+   fa_zero(out, stream->outcnt, (outcnt - stream->outcnt));
 
stream->out = out;
 
@@ -131,20 +176,17 @@ static int sctp_stream_alloc_in(struct sctp_stream 
*stream, __u16 incnt,
struct flex_array *in;
size_t elem_size = sizeof(struct sctp_stream_in);
 
-   in = kmalloc_array(incnt, elem_size, gfp);
-
+   in = fa_alloc(elem_size, incnt, gfp);
if (!in)
return -ENOMEM;
 
if (stream->in) {
-   memcpy(in, stream->in, min(incnt, stream->incnt) *
-  elem_size);
-   kfree(stream->in);
+   fa_copy(in, stream->in, 0, min(incnt, stream->incnt));
+   fa_free(stream->in);
}
 
if (incnt > stream->incnt)
-   memset(((struct sctp_stream_in *)in) + stream->incnt, 0,
-  (incnt - stream->incnt) * elem_size);
+   fa_zero(in, stream->incnt, (incnt - stream->incnt));
 
stream->in = in;
 
@@ -188,7 +230,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 
outcnt, __u16 incnt,
ret = sctp_stream_alloc_in(stream, incnt, gfp);
if (ret) {
sched->free(stream);
-   kfree(stream->out);
+   fa_free(stream->out);
stream->out = NULL;
   

[PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()

2018-08-03 Thread Konstantin Khorenko
Each SCTP association can have up to 65535 input and output streams.
For each stream type an array of sctp_stream_in or sctp_stream_out
structures is allocated using kmalloc_array() function. This function
allocates physically contiguous memory regions, so this can lead
to allocation of memory regions of very high order, i.e.:

  sizeof(struct sctp_stream_out) == 24,
  ((65535 * 24) / 4096) == 383 memory pages (4096 bytes per page),
  which means an order-9 allocation.

This can lead to memory allocation failures on systems
under memory stress.
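
The arithmetic above can be checked with a small userspace sketch. It is
not kernel code: the helper names and the SKETCH_PAGE_SIZE constant are
made up for illustration, and 4096-byte pages are assumed as in the text.
Rounding up gives 384 pages rather than 383, and the smallest power of
two that covers them is 512 pages, i.e. order 9.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as in the cover letter. */
#define SKETCH_PAGE_SIZE 4096u

/* Number of pages needed to hold `bytes`, rounded up. */
static size_t sketch_pages_for(size_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Smallest order such that (1 << order) pages cover `bytes`. */
static unsigned int sketch_order_for(size_t bytes)
{
	size_t pages = sketch_pages_for(bytes);
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```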

We do not actually need these arrays to be physically
contiguous. A simple solution would be to use kvmalloc()
instead of kmalloc(), as kvmalloc() can allocate physically scattered
pages if contiguous pages are not available. The problem
is that the allocation can happen in softirq context with
the GFP_ATOMIC flag set, and kvmalloc() cannot be used in this scenario.

So the other possible solution is to use flexible arrays instead of
contiguous arrays of memory, so that the memory is allocated
on a per-page basis.
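
As a rough illustration of per-page allocation, here is a userspace
sketch of an array split into independently allocated page-sized chunks.
It is not the kernel's flex_array implementation: every chunked_* name
is hypothetical, calloc() stands in for the page allocator, and
CHUNK_BYTES assumes 4096-byte pages.

```c
#include <stdlib.h>

#define CHUNK_BYTES 4096u  /* assumption: one chunk per 4 KiB page */

struct chunked_array {
	size_t elem_size;
	size_t elem_count;
	size_t per_chunk;   /* whole elements that fit in one chunk */
	size_t nr_chunks;
	void **chunks;      /* each chunk is allocated separately */
};

static void chunked_free(struct chunked_array *ca)
{
	size_t i;

	if (!ca)
		return;
	for (i = 0; i < ca->nr_chunks; i++)
		free(ca->chunks[i]);   /* free(NULL) is a no-op */
	free(ca->chunks);
	free(ca);
}

static struct chunked_array *chunked_alloc(size_t elem_size, size_t elem_count)
{
	struct chunked_array *ca;
	size_t i;

	if (elem_size == 0 || elem_size > CHUNK_BYTES)
		return NULL;
	ca = calloc(1, sizeof(*ca));
	if (!ca)
		return NULL;
	ca->elem_size = elem_size;
	ca->elem_count = elem_count;
	ca->per_chunk = CHUNK_BYTES / elem_size;
	ca->nr_chunks = (elem_count + ca->per_chunk - 1) / ca->per_chunk;
	ca->chunks = calloc(ca->nr_chunks ? ca->nr_chunks : 1, sizeof(void *));
	if (!ca->chunks) {
		free(ca);
		return NULL;
	}
	/* No single allocation is ever larger than one chunk. */
	for (i = 0; i < ca->nr_chunks; i++) {
		ca->chunks[i] = calloc(1, CHUNK_BYTES);
		if (!ca->chunks[i]) {
			chunked_free(ca);
			return NULL;
		}
	}
	return ca;
}

/* Map a flat index to (chunk, offset); NULL when out of range. */
static void *chunked_get(const struct chunked_array *ca, size_t index)
{
	if (index >= ca->elem_count)
		return NULL;
	return (char *)ca->chunks[index / ca->per_chunk] +
	       (index % ca->per_chunk) * ca->elem_size;
}
```

With 24-byte elements, 170 of them fit in one chunk, so 65535 elements
need 386 separate single-page allocations instead of one order-9 block.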

This patchset replaces kmalloc_array() with flex_array usage.
It consists of two parts:

  * First patch is preparatory - it mechanically wraps all direct
access to assoc->stream.out[] and assoc->stream.in[] arrays
with SCTP_SO() and SCTP_SI() wrappers so that later a direct
array access could be easily changed to an access to a
flex_array (or any other possible alternative).
  * Second patch replaces kmalloc_array() with flex_array usage.
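
The wrapper idea from the first patch can be shown with a reduced
userspace sketch. The demo_* names are hypothetical stand-ins for the
sctp_* ones; the point is only that callers go through an accessor, so
the storage behind it can change later without touching them.

```c
#include <stddef.h>

struct demo_stream_out {
	unsigned int mid;
	unsigned int mid_uo;
};

struct demo_stream {
	struct demo_stream_out *out;  /* storage detail hidden by the accessor */
	unsigned short outcnt;
};

/* The only place that knows how `out` is laid out in memory. */
static struct demo_stream_out *demo_stream_out(const struct demo_stream *stream,
					       unsigned short sid)
{
	return stream->out + sid;
}

#define DEMO_SO(s, i) demo_stream_out((s), (i))

/* Mirrors the token-pasting style of sctp_mid_next() in the patch. */
#define demo_mid_next(stream, type, sid) \
	(demo_stream_##type((stream), (sid))->mid++)
```

Switching the backing store to a flexible array would then only require
changing the body of demo_stream_out().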

Oleg Babin (2):
  net/sctp: Make wrappers for accessing in/out streams
  net/sctp: Replace in/out stream arrays with flex_array

 include/net/sctp/structs.h   |  31 
 net/sctp/chunk.c |   6 +-
 net/sctp/outqueue.c  |  11 +--
 net/sctp/socket.c|   4 +-
 net/sctp/stream.c| 165 +--
 net/sctp/stream_interleave.c |  20 +++---
 net/sctp/stream_sched.c  |  13 ++--
 net/sctp/stream_sched_prio.c |  22 +++---
 net/sctp/stream_sched_rr.c   |   8 +--
 9 files changed, 175 insertions(+), 105 deletions(-)

v2 changes:
 sctp_stream_in() users are updated to provide stream as an argument,
 sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().


Performance results:

  * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this 
thread)
  * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
  RAM: 32 Gb

  * netperf: taken from https://github.com/HewlettPackard/netperf.git,
 compiled from sources with sctp support
  * netperf server and client are run on the same node
  * ip link set lo mtu 1500

The script used to run tests:
 # cat run_tests.sh
 #!/bin/bash

for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
  echo "TEST: $test";
  for i in `seq 1 3`; do
echo "Iteration: $i";
set -x
netperf -t $test -H localhost -p 2 -S 20,20 -s 20,20 \
-l 60 -- -m 1452;
set +x
  done
done


Results (a bit reformatted to be more readable):

Recv    Send    Send
Socket  Socket  Message  Elapsed  Throughput, 10^6 bits/sec
Size    Size    Size     Time
bytes   bytes   bytes    secs.    v4.18-rc7   v4.18-rc7 + fixes

TEST: SCTP_STREAM
212992  212992  1452     60.21    1125.52     1247.04
212992  212992  1452     60.20    1376.38     1149.95
212992  212992  1452     60.20    1131.40     1163.85
TEST: SCTP_STREAM_MANY
212992  212992  1452     60.00    .00         1310.05
212992  212992  1452     60.00    1188.55     1130.50
212992  212992  1452     60.00    1108.06     1162.50

===
Local /Remote
Socket Size     Request  Resp.  Elapsed  Trans. rate, per sec
Send    Recv    Size     Size   Time
bytes   bytes   bytes    bytes  secs.    v4.18-rc7   v4.18-rc7 + fixes

TEST: SCTP_RR
212992  212992  1        1      60.00    45486.98    46089.43
212992  212992  1        1      60.00    45584.18    45994.21
212992  212992  1        1      60.00    45703.86    45720.84
TEST: SCTP_RR_MANY
212992  212992  1        1      60.00    40.75       40.77
212992  212992  1        1      60.00    40.58       40.08
212992  212992  1        1      60.00    39.98       39.97

-- 
2.15.1



[PATCH v3 net-next 9/9] dt-bindings: net: stmmac: Add the bindings documentation for XGMAC2.

2018-08-03 Thread Jose Abreu
Adds the documentation for XGMAC2 DT bindings.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
Cc: Sergei Shtylyov 
Cc: devicet...@vger.kernel.org
Cc: Rob Herring 
---
Changes from v1:
- Correct header, now we also support 2.5/10G.
- Add missing '>' (Sergei)
---
 Documentation/devicetree/bindings/net/stmmac.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt 
b/Documentation/devicetree/bindings/net/stmmac.txt
index 3a28a5d8857d..a32fd590ce8f 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -1,7 +1,8 @@
-* STMicroelectronics 10/100/1000 Ethernet driver (GMAC)
+* STMicroelectronics 10/100/1000/2500/10000 Ethernet driver (GMAC/XGMAC)
 
 Required properties:
-- compatible: Should be "snps,dwmac-", "snps,dwmac"
+- compatible: Should be "snps,dwmac-", "snps,dwmac" or
+   "snps,dwxgmac-", "snps,dwxgmac".
For backwards compatibility: "st,spear600-gmac" is also supported.
 - reg: Address and length of the register set for the device
 - interrupt-parent: Should be the phandle for the interrupt controller
-- 
2.7.4




[PATCH v3 net-next 8/9] net: stmmac: Add the bindings parsing for XGMAC2

2018-08-03 Thread Jose Abreu
Add the bindings parsing for XGMAC2 IP block.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c   | 2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
index 3304095c934c..fad503820e04 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
@@ -78,6 +78,8 @@ static const struct of_device_id dwmac_generic_match[] = {
{ .compatible = "snps,dwmac-4.00"},
{ .compatible = "snps,dwmac-4.10a"},
{ .compatible = "snps,dwmac"},
+   { .compatible = "snps,dwxgmac-2.10"},
+   { .compatible = "snps,dwxgmac"},
{ }
 };
 MODULE_DEVICE_TABLE(of, dwmac_generic_match);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 72da77b94ecd..3609c7b696c7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -486,6 +486,12 @@ stmmac_probe_config_dt(struct platform_device *pdev, const 
char **mac)
plat->force_sf_dma_mode = 1;
}
 
+   if (of_device_is_compatible(np, "snps,dwxgmac")) {
+   plat->has_xgmac = 1;
+   plat->pmt = 1;
+   plat->tso_en = of_property_read_bool(np, "snps,tso");
+   }
+
dma_cfg = devm_kzalloc(&pdev->dev, sizeof(*dma_cfg),
   GFP_KERNEL);
if (!dma_cfg) {
-- 
2.7.4




[PATCH v3 net-next 7/9] net: stmmac: Integrate XGMAC into main driver flow

2018-08-03 Thread Jose Abreu
Now that we have all the XGMAC related callbacks, let's start integrating
this IP block into the main driver.

Also, correct the initialization flow so that DMA is only started after
the descriptors length has been set.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
Cc: Andrew Lunn 
---
Changes from v1:
- Correct flow of initialization
- Remove 2.5G/10G support (Andrew)
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 67 ---
 1 file changed, 48 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 9d104a05044d..ff1ffb46198a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include "dwmac1000.h"
+#include "dwxgmac2.h"
 #include "hwif.h"
 
 #defineSTMMAC_ALIGN(x) __ALIGN_KERNEL(x, SMP_CACHE_BYTES)
@@ -262,6 +263,21 @@ static void stmmac_clk_csr_set(struct stmmac_priv *priv)
else
priv->clk_csr = 0;
}
+
+   if (priv->plat->has_xgmac) {
+   if (clk_rate > 4)
+   priv->clk_csr = 0x5;
+   else if (clk_rate > 35000)
+   priv->clk_csr = 0x4;
+   else if (clk_rate > 3)
+   priv->clk_csr = 0x3;
+   else if (clk_rate > 25000)
+   priv->clk_csr = 0x2;
+   else if (clk_rate > 15000)
+   priv->clk_csr = 0x1;
+   else
+   priv->clk_csr = 0x0;
+   }
 }
 
 static void print_pkt(unsigned char *buf, int len)
@@ -498,7 +514,7 @@ static void stmmac_get_rx_hwtstamp(struct stmmac_priv 
*priv, struct dma_desc *p,
if (!priv->hwts_rx_en)
return;
/* For GMAC4, the valid timestamp is from CTX next desc. */
-   if (priv->plat->has_gmac4)
+   if (priv->plat->has_gmac4 || priv->plat->has_xgmac)
desc = np;
 
/* Check if timestamp is available */
@@ -540,6 +556,9 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
u32 ts_event_en = 0;
u32 value = 0;
u32 sec_inc;
+   bool xmac;
+
+   xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac;
 
if (!(priv->dma_cap.time_stamp || priv->adv_ts)) {
netdev_alert(priv->dev, "No support for HW time stamping\n");
@@ -575,7 +594,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
/* PTP v1, UDP, any kind of event packet */
config.rx_filter = HWTSTAMP_FILTER_PTP_V1_L4_EVENT;
/* take time stamp for all event messages */
-   if (priv->plat->has_gmac4)
+   if (xmac)
snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
else
snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
@@ -610,7 +629,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
/* take time stamp for all event messages */
-   if (priv->plat->has_gmac4)
+   if (xmac)
snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
else
snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
@@ -647,7 +666,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
/* take time stamp for all event messages */
-   if (priv->plat->has_gmac4)
+   if (xmac)
snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
else
snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
@@ -718,7 +737,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
/* program Sub Second Increment reg */
stmmac_config_sub_second_increment(priv,
priv->ptpaddr, priv->plat->clk_ptp_rate,
-   priv->plat->has_gmac4, &sec_inc);
+   xmac, &sec_inc);
temp = div_u64(10ULL, sec_inc);
 
/* Store sub second increment and flags for later use */
@@ -755,12 +774,14 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
  */
 static int stmmac_init_ptp(struct stmmac_priv *priv)
 {
+   bool xmac = priv->plat->has_gmac4 || 

[PATCH v3 net-next 6/9] net: stmmac: Add PTP support for XGMAC2

2018-08-03 Thread Jose Abreu
XGMAC2 uses the same timestamping engine as GMAC4, so let's use the same
callbacks.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/hwif.c   | 4 ++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c | 6 --
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h | 1 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c 
b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 4b4ba1c8bad5..357309a6d6a5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -193,13 +193,13 @@ static const struct stmmac_hwif_entry {
.xgmac = true,
.min_id = DWXGMAC_CORE_2_10,
.regs = {
-   .ptp_off = 0,
+   .ptp_off = PTP_XGMAC_OFFSET,
.mmc_off = 0,
},
.desc = _desc_ops,
.dma = _dma_ops,
.mac = _ops,
-   .hwtimestamp = NULL,
+   .hwtimestamp = _ptp,
.mode = NULL,
.tc = NULL,
.setup = dwxgmac2_setup,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
index 0cb0e39a2be9..2293e21f789f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
@@ -71,6 +71,9 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 
delta)
u32 sec, nsec;
u32 quotient, reminder;
int neg_adj = 0;
+   bool xmac;
+
+   xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac;
 
if (delta < 0) {
neg_adj = 1;
@@ -82,8 +85,7 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 
delta)
nsec = reminder;
 
spin_lock_irqsave(&priv->ptp_lock, flags);
-   stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj,
-   priv->plat->has_gmac4);
+   stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj, xmac);
spin_unlock_irqrestore(&priv->ptp_lock, flags);
 
return 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
index f4b31d69f60e..ecccf895fd7e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
@@ -21,6 +21,7 @@
 #ifndef__STMMAC_PTP_H__
 #define__STMMAC_PTP_H__
 
+#define PTP_XGMAC_OFFSET   0xd00
 #definePTP_GMAC4_OFFSET0xb00
 #definePTP_GMAC3_X_OFFSET  0x700
 
-- 
2.7.4




[PATCH v3 net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2

2018-08-03 Thread Jose Abreu
Add the MDIO related functionalities for the new IP block XGMAC2.
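
For readers following the register layout, here is a userspace sketch
of how the Clause 22 address and command words in this patch are
composed. The bit positions are taken from the defines in the diff; the
mdio_sk_* names are hypothetical, the clk_csr shift and mask are passed
in as parameters because their values are not shown here, and -1 stands
in for the -ENODEV returned by the driver.

```c
#include <stdint.h>

#define MDIO_SK_BIT(n)     (UINT32_C(1) << (n))
#define MDIO_SK_SADDR      MDIO_SK_BIT(18)
#define MDIO_SK_CMD_SHIFT  16
#define MDIO_SK_WRITE      (UINT32_C(1) << MDIO_SK_CMD_SHIFT)
#define MDIO_SK_READ       (UINT32_C(3) << MDIO_SK_CMD_SHIFT)
#define MDIO_SK_BUSY       MDIO_SK_BIT(22)

/* Build the C22 address word; only PHY addresses 0..3 are accepted. */
static int mdio_sk_c22_addr(int phyaddr, int phyreg, uint32_t *hw_addr)
{
	if (phyaddr < 0 || phyaddr >= 4)
		return -1;
	*hw_addr = ((uint32_t)phyaddr << 16) | ((uint32_t)phyreg & 0x1f);
	return 0;
}

/* Command word for a read: busy bit, clock range, SADDR and READ. */
static uint32_t mdio_sk_read_cmd(uint32_t clk_csr, unsigned int csr_shift,
				 uint32_t csr_mask)
{
	uint32_t value = MDIO_SK_BUSY;

	value |= (clk_csr << csr_shift) & csr_mask;
	value |= MDIO_SK_SADDR | MDIO_SK_READ;
	return value;
}

/* Command word for a write: same as above plus the 16-bit payload. */
static uint32_t mdio_sk_write_cmd(uint16_t data, uint32_t clk_csr,
				  unsigned int csr_shift, uint32_t csr_mask)
{
	uint32_t value = MDIO_SK_BUSY;

	value |= (clk_csr << csr_shift) & csr_mask;
	value |= data | MDIO_SK_SADDR;
	value |= MDIO_SK_WRITE;
	return value;
}
```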

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
Cc: Andrew Lunn 
---
Changes from v2:
- Use helper to set C22 (Andrew)
- Wait for bus free before setting C22 reg (Andrew)
Changes from v1:
- Remove C45 support (Andrew)
- Add define for bits (Andrew)
- Remove unneeded cast (Andrew)
- Use different callbacks instead of if's (Andrew)
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 117 +-
 1 file changed, 115 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 5df1a608e566..7b0167059bd2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 
+#include "dwxgmac2.h"
 #include "stmmac.h"
 
 #define MII_BUSY 0x0001
@@ -39,6 +40,112 @@
 #define MII_GMAC4_WRITE(1 << MII_GMAC4_GOC_SHIFT)
 #define MII_GMAC4_READ (3 << MII_GMAC4_GOC_SHIFT)
 
+/* XGMAC defines */
+#define MII_XGMAC_SADDRBIT(18)
+#define MII_XGMAC_CMD_SHIFT16
+#define MII_XGMAC_WRITE(1 << MII_XGMAC_CMD_SHIFT)
+#define MII_XGMAC_READ (3 << MII_XGMAC_CMD_SHIFT)
+#define MII_XGMAC_BUSY BIT(22)
+
+static int stmmac_xgmac2_c22_format(struct stmmac_priv *priv, int phyaddr,
+   int phyreg, u32 *hw_addr)
+{
+   unsigned int mii_data = priv->hw->mii.data;
+   u32 tmp;
+
+   /* HW does not support C22 addr >= 4 */
+   if (phyaddr >= 4)
+   return -ENODEV;
+   /* Wait until any existing MII operation is complete */
+   if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+  !(tmp & MII_XGMAC_BUSY), 100, 1))
+   return -EBUSY;
+
+   /* Set port as Clause 22 */
+   tmp = readl(priv->ioaddr + XGMAC_MDIO_C22P);
+   tmp |= BIT(phyaddr);
+   writel(tmp, priv->ioaddr + XGMAC_MDIO_C22P);
+
+   *hw_addr = (phyaddr << 16) | (phyreg & 0x1f);
+   return 0;
+}
+
+static int stmmac_xgmac2_mdio_read(struct mii_bus *bus, int phyaddr, int 
phyreg)
+{
+   struct net_device *ndev = bus->priv;
+   struct stmmac_priv *priv = netdev_priv(ndev);
+   unsigned int mii_address = priv->hw->mii.addr;
+   unsigned int mii_data = priv->hw->mii.data;
+   u32 tmp, addr, value = MII_XGMAC_BUSY;
+   int ret;
+
+   if (phyreg & MII_ADDR_C45) {
+   return -EOPNOTSUPP;
+   } else {
+   ret = stmmac_xgmac2_c22_format(priv, phyaddr, phyreg, &addr);
+   if (ret)
+   return ret;
+   }
+
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & priv->hw->mii.clk_csr_mask;
+   value |= MII_XGMAC_SADDR | MII_XGMAC_READ;
+
+   /* Wait until any existing MII operation is complete */
+   if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+  !(tmp & MII_XGMAC_BUSY), 100, 1))
+   return -EBUSY;
+
+   /* Set the MII address register to read */
+   writel(addr, priv->ioaddr + mii_address);
+   writel(value, priv->ioaddr + mii_data);
+
+   /* Wait until any existing MII operation is complete */
+   if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+  !(tmp & MII_XGMAC_BUSY), 100, 1))
+   return -EBUSY;
+
+   /* Read the data from the MII data register */
+   return readl(priv->ioaddr + mii_data) & GENMASK(15, 0);
+}
+
+static int stmmac_xgmac2_mdio_write(struct mii_bus *bus, int phyaddr,
+   int phyreg, u16 phydata)
+{
+   struct net_device *ndev = bus->priv;
+   struct stmmac_priv *priv = netdev_priv(ndev);
+   unsigned int mii_address = priv->hw->mii.addr;
+   unsigned int mii_data = priv->hw->mii.data;
+   u32 addr, tmp, value = MII_XGMAC_BUSY;
+   int ret;
+
+   if (phyreg & MII_ADDR_C45) {
+   return -EOPNOTSUPP;
+   } else {
+   ret = stmmac_xgmac2_c22_format(priv, phyaddr, phyreg, &addr);
+   if (ret)
+   return ret;
+   }
+
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & priv->hw->mii.clk_csr_mask;
+   value |= phydata | MII_XGMAC_SADDR;
+   value |= MII_XGMAC_WRITE;
+
+   /* Wait until any existing MII operation is complete */
+   if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+  !(tmp & MII_XGMAC_BUSY), 100, 1))
+   return -EBUSY;
+
+   /* Set the MII address register to write */
+   writel(addr, priv->ioaddr + mii_address);
+   writel(value, priv->ioaddr + mii_data);
+
+ 

[PATCH v3 net-next 3/9] net: stmmac: Add DMA related callbacks for XGMAC2

2018-08-03 Thread Jose Abreu
Add the DMA related callbacks for the new IP block XGMAC2.
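
The per-channel DMA registers added in this patch follow a fixed layout:
a base of 0x3100 plus 0x80 per channel. A userspace sketch of that
addressing, together with a GENMASK-style field update, could look like
the following. The dma_sk_* names are hypothetical and only the offsets
and the TxPBL field position come from the defines in the diff.

```c
#include <stdint.h>

#define DMA_SK_CH_BASE    0x3100u
#define DMA_SK_CH_STRIDE  0x80u

/* Address of a per-channel register, given its offset within the block. */
static uint32_t dma_sk_ch_reg(uint32_t reg_off, unsigned int chan)
{
	return DMA_SK_CH_BASE + reg_off + DMA_SK_CH_STRIDE * chan;
}

/* Mask with bits l..h set (0 <= l <= h <= 31), like the kernel's GENMASK(). */
static uint32_t dma_sk_genmask(unsigned int h, unsigned int l)
{
	return (UINT32_C(0xffffffff) >> (31 - h)) & (UINT32_C(0xffffffff) << l);
}

/* Replace one field of a register value, leaving the other bits untouched. */
static uint32_t dma_sk_set_field(uint32_t reg, uint32_t mask,
				 unsigned int shift, uint32_t val)
{
	reg &= ~mask;
	reg |= (val << shift) & mask;
	return reg;
}
```

For example, the TX control register of channel 2 sits at 0x3204, and
programming a TxPBL of 8 sets bits 21:16 to 0x08.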

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h |  56 +++
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 410 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   2 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   1 +
 5 files changed, 469 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index a6cf632c9592..da40d3bba037 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -5,7 +5,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
  dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
- stmmac_tc.o dwxgmac2_core.o $(stmmac-y)
+ stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
index 7832571f791f..ddd23f8559df 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -138,4 +138,60 @@
 #define XGMAC_ABPSIS   BIT(1)
 #define XGMAC_TXUNFIS  BIT(0)
 
+/* DMA Registers */
+#define XGMAC_DMA_MODE 0x3000
+#define XGMAC_SWR  BIT(0)
+#define XGMAC_DMA_SYSBUS_MODE  0x3004
+#define XGMAC_WR_OSR_LMT   GENMASK(29, 24)
+#define XGMAC_WR_OSR_LMT_SHIFT 24
+#define XGMAC_RD_OSR_LMT   GENMASK(21, 16)
+#define XGMAC_RD_OSR_LMT_SHIFT 16
+#define XGMAC_EN_LPI   BIT(15)
+#define XGMAC_LPI_XIT_PKT  BIT(14)
+#define XGMAC_AAL  BIT(12)
+#define XGMAC_BLEN256  BIT(7)
+#define XGMAC_BLEN128  BIT(6)
+#define XGMAC_BLEN64   BIT(5)
+#define XGMAC_BLEN32   BIT(4)
+#define XGMAC_BLEN16   BIT(3)
+#define XGMAC_BLEN8BIT(2)
+#define XGMAC_BLEN4BIT(1)
+#define XGMAC_UNDEFBIT(0)
+#define XGMAC_DMA_CH_CONTROL(x)(0x3100 + (0x80 * (x)))
+#define XGMAC_PBLx8BIT(16)
+#define XGMAC_DMA_CH_TX_CONTROL(x) (0x3104 + (0x80 * (x)))
+#define XGMAC_TxPBLGENMASK(21, 16)
+#define XGMAC_TxPBL_SHIFT  16
+#define XGMAC_TSE  BIT(12)
+#define XGMAC_OSP  BIT(4)
+#define XGMAC_TXST BIT(0)
+#define XGMAC_DMA_CH_RX_CONTROL(x) (0x3108 + (0x80 * (x)))
+#define XGMAC_RxPBLGENMASK(21, 16)
+#define XGMAC_RxPBL_SHIFT  16
+#define XGMAC_RXST BIT(0)
+#define XGMAC_DMA_CH_TxDESC_LADDR(x)   (0x3114 + (0x80 * (x)))
+#define XGMAC_DMA_CH_RxDESC_LADDR(x)   (0x311c + (0x80 * (x)))
+#define XGMAC_DMA_CH_TxDESC_TAIL_LPTR(x)   (0x3124 + (0x80 * (x)))
+#define XGMAC_DMA_CH_RxDESC_TAIL_LPTR(x)   (0x312c + (0x80 * (x)))
+#define XGMAC_DMA_CH_TxDESC_RING_LEN(x)(0x3130 + (0x80 * 
(x)))
+#define XGMAC_DMA_CH_RxDESC_RING_LEN(x)(0x3134 + (0x80 * 
(x)))
+#define XGMAC_DMA_CH_INT_EN(x) (0x3138 + (0x80 * (x)))
+#define XGMAC_NIE  BIT(15)
+#define XGMAC_AIE  BIT(14)
+#define XGMAC_RBUE BIT(7)
+#define XGMAC_RIE  BIT(6)
+#define XGMAC_TIE  BIT(0)
+#define XGMAC_DMA_INT_DEFAULT_EN   (XGMAC_NIE | XGMAC_AIE | XGMAC_RBUE | \
+   XGMAC_RIE | XGMAC_TIE)
+#define XGMAC_DMA_CH_Rx_WATCHDOG(x)(0x313c + (0x80 * (x)))
+#define XGMAC_RWT  GENMASK(7, 0)
+#define XGMAC_DMA_CH_STATUS(x) (0x3160 + (0x80 * (x)))
+#define XGMAC_NIS  BIT(15)
+#define XGMAC_AIS  BIT(14)
+#define XGMAC_FBE  BIT(12)
+#define XGMAC_RBU  BIT(7)
+#define XGMAC_RI   BIT(6)
+#define XGMAC_TPS  BIT(1)
+#define XGMAC_TI   BIT(0)
+
 #endif /* __STMMAC_DWXGMAC2_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
new file mode 100644
index ..50d9fffc32b5
--- /dev/null
+++ 

[PATCH v3 net-next 4/9] net: stmmac: Add descriptor related callbacks for XGMAC2

2018-08-03 Thread Jose Abreu
Add the descriptor related callbacks for the new IP block XGMAC2.
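
The status callbacks in this patch boil down to a few bit tests on the
descriptor's DES3 word. A userspace sketch of that decoding, assuming
the word is already in CPU byte order, is given below. The desc_sk_*
names and the enum values are hypothetical stand-ins for the driver's
own; the bit positions are the ones defined in the diff.

```c
#include <stdint.h>

#define DESC_SK_BIT(n)  (UINT32_C(1) << (n))
#define DESC_SK_OWN     DESC_SK_BIT(31)  /* descriptor still owned by DMA */
#define DESC_SK_LD      DESC_SK_BIT(28)  /* last descriptor of the frame */
#define DESC_SK_RX_ES   DESC_SK_BIT(15)  /* RX error summary */

enum desc_sk_tx { DESC_SK_TX_DONE, DESC_SK_TX_NOT_LS, DESC_SK_TX_DMA_OWN };
enum desc_sk_rx { DESC_SK_GOOD_FRAME, DESC_SK_DISCARD_FRAME, DESC_SK_DMA_OWN };

static enum desc_sk_tx desc_sk_decode_tx(uint32_t tdes3)
{
	if (tdes3 & DESC_SK_OWN)
		return DESC_SK_TX_DMA_OWN;
	if (!(tdes3 & DESC_SK_LD))
		return DESC_SK_TX_NOT_LS;
	return DESC_SK_TX_DONE;
}

static enum desc_sk_rx desc_sk_decode_rx(uint32_t rdes3)
{
	if (rdes3 & DESC_SK_OWN)
		return DESC_SK_DMA_OWN;
	if (!(rdes3 & DESC_SK_LD))
		return DESC_SK_DISCARD_FRAME;
	if (rdes3 & DESC_SK_RX_ES)
		return DESC_SK_DISCARD_FRAME;
	return DESC_SK_GOOD_FRAME;
}
```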

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   3 +-
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h |  30 +++
 .../net/ethernet/stmicro/stmmac/dwxgmac2_descs.c   | 280 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   2 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   1 +
 5 files changed, 314 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index da40d3bba037..99967a80a8c8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -5,7 +5,8 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
  dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
- stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o $(stmmac-y)
+ stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o dwxgmac2_descs.o \
+ $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
index ddd23f8559df..d090cbb501f2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -194,4 +194,34 @@
 #define XGMAC_TPS  BIT(1)
 #define XGMAC_TI   BIT(0)
 
+/* Descriptors */
+#define XGMAC_TDES2_IOCBIT(31)
+#define XGMAC_TDES2_TTSE   BIT(30)
+#define XGMAC_TDES2_B2LGENMASK(29, 16)
+#define XGMAC_TDES2_B2L_SHIFT  16
+#define XGMAC_TDES2_B1LGENMASK(13, 0)
+#define XGMAC_TDES3_OWNBIT(31)
+#define XGMAC_TDES3_CTXT   BIT(30)
+#define XGMAC_TDES3_FD BIT(29)
+#define XGMAC_TDES3_LD BIT(28)
+#define XGMAC_TDES3_CPCGENMASK(27, 26)
+#define XGMAC_TDES3_CPC_SHIFT  26
+#define XGMAC_TDES3_TCMSSV BIT(26)
+#define XGMAC_TDES3_THLGENMASK(22, 19)
+#define XGMAC_TDES3_THL_SHIFT  19
+#define XGMAC_TDES3_TSEBIT(18)
+#define XGMAC_TDES3_CICGENMASK(17, 16)
+#define XGMAC_TDES3_CIC_SHIFT  16
+#define XGMAC_TDES3_TPLGENMASK(17, 0)
+#define XGMAC_TDES3_FL GENMASK(14, 0)
+#define XGMAC_RDES3_OWNBIT(31)
+#define XGMAC_RDES3_CTXT   BIT(30)
+#define XGMAC_RDES3_IOCBIT(30)
+#define XGMAC_RDES3_LD BIT(28)
+#define XGMAC_RDES3_CDABIT(27)
+#define XGMAC_RDES3_ES BIT(15)
+#define XGMAC_RDES3_PL GENMASK(13, 0)
+#define XGMAC_RDES3_TSDBIT(6)
+#define XGMAC_RDES3_TSABIT(4)
+
 #endif /* __STMMAC_DWXGMAC2_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c
new file mode 100644
index ..1d858fdec997
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c
@@ -0,0 +1,280 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * stmmac XGMAC support.
+ */
+
+#include 
+#include "common.h"
+#include "dwxgmac2.h"
+
+static int dwxgmac2_get_tx_status(void *data, struct stmmac_extra_stats *x,
+ struct dma_desc *p, void __iomem *ioaddr)
+{
+   unsigned int tdes3 = le32_to_cpu(p->des3);
+   int ret = tx_done;
+
+   if (unlikely(tdes3 & XGMAC_TDES3_OWN))
+   return tx_dma_own;
+   if (likely(!(tdes3 & XGMAC_TDES3_LD)))
+   return tx_not_ls;
+
+   return ret;
+}
+
+static int dwxgmac2_get_rx_status(void *data, struct stmmac_extra_stats *x,
+ struct dma_desc *p)
+{
+   unsigned int rdes3 = le32_to_cpu(p->des3);
+   int ret = good_frame;
+
+   if (unlikely(rdes3 & XGMAC_RDES3_OWN))
+   return dma_own;
+   if (likely(!(rdes3 & XGMAC_RDES3_LD)))
+   return discard_frame;
+   if (unlikely(rdes3 & XGMAC_RDES3_ES))
+   ret = discard_frame;
+
+   return ret;
+}
+
+static int dwxgmac2_get_tx_len(struct dma_desc *p)
+{
+   return (le32_to_cpu(p->des2) & XGMAC_TDES2_B1L);
+}
+
+static int dwxgmac2_get_tx_owner(struct dma_desc *p)
+{
+   return (le32_to_cpu(p->des3) & 

[PATCH v3 net-next 1/9] net: stmmac: Add XGMAC 2.10 HWIF entry

2018-08-03 Thread Jose Abreu
Add a new entry to the HWIF table for XGMAC 2.10. For now we fill it with
empty callbacks, which will be added in subsequent patches.
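
To show how such a table can be consulted, here is a userspace sketch of
a lookup keyed on the core-type flags and a minimum core ID. The lookup
loop is an assumption for illustration (the selection code itself is not
part of this excerpt) and the hwif_sk_* names are hypothetical; the
flags and min_id values mirror the entries in the diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct hwif_sk_entry {
	bool gmac;
	bool gmac4;
	bool xgmac;
	uint32_t min_id;
};

static const struct hwif_sk_entry hwif_sk_table[] = {
	{ false, false, false, 0x00 },
	{ true,  false, false, 0x00 },
	{ false, true,  false, 0x00 },
	{ false, true,  false, 0x40 },  /* DWMAC_CORE_4_00 */
	{ false, true,  false, 0x41 },  /* DWMAC_CORE_4_10 */
	{ false, true,  false, 0x51 },  /* DWMAC_CORE_5_10 */
	{ false, false, true,  0x21 },  /* DWXGMAC_CORE_2_10 */
};

/*
 * Scan from the last entry to the first so that, among entries of the
 * right core type, the one with the highest min_id not above `id` wins.
 * Returns the table index, or -1 when nothing matches.
 */
static int hwif_sk_lookup(bool gmac, bool gmac4, bool xgmac, uint32_t id)
{
	size_t i = sizeof(hwif_sk_table) / sizeof(hwif_sk_table[0]);

	while (i-- > 0) {
		const struct hwif_sk_entry *e = &hwif_sk_table[i];

		if (e->gmac != gmac || e->gmac4 != gmac4 || e->xgmac != xgmac)
			continue;
		if (id < e->min_id)
			continue;
		return (int)i;
	}
	return -1;
}
```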

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/common.h | 14 +++--
 drivers/net/ethernet/stmicro/stmmac/hwif.c   | 31 ++--
 include/linux/stmmac.h   |  1 +
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 78fd0f8b8e81..3fb81acbd274 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -36,12 +36,14 @@
 #include "mmc.h"
 
 /* Synopsys Core versions */
-#defineDWMAC_CORE_3_40 0x34
-#defineDWMAC_CORE_3_50 0x35
-#defineDWMAC_CORE_4_00 0x40
-#define DWMAC_CORE_4_100x41
-#define DWMAC_CORE_5_00 0x50
-#define DWMAC_CORE_5_10 0x51
+#defineDWMAC_CORE_3_40 0x34
+#defineDWMAC_CORE_3_50 0x35
+#defineDWMAC_CORE_4_00 0x40
+#define DWMAC_CORE_4_100x41
+#define DWMAC_CORE_5_000x50
+#define DWMAC_CORE_5_100x51
+#define DWXGMAC_CORE_2_10  0x21
+
 #define STMMAC_CHAN0   0   /* Always supported and default for all chips */
 
 /* These need to be power of two, and >= 4 */
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c 
b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 1f50e83cafb2..24f5ff175aa4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -72,6 +72,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 static const struct stmmac_hwif_entry {
bool gmac;
bool gmac4;
+   bool xgmac;
u32 min_id;
const struct stmmac_regs_off regs;
const void *desc;
@@ -87,6 +88,7 @@ static const struct stmmac_hwif_entry {
{
.gmac = false,
.gmac4 = false,
+   .xgmac = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -103,6 +105,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = true,
.gmac4 = false,
+   .xgmac = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -119,6 +122,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -135,6 +139,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = DWMAC_CORE_4_00,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -151,6 +156,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = DWMAC_CORE_4_10,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -167,6 +173,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = DWMAC_CORE_5_10,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -180,11 +187,29 @@ static const struct stmmac_hwif_entry {
.tc = _tc_ops,
.setup = dwmac4_setup,
.quirks = NULL,
-   }
+   }, {
+   .gmac = false,
+   .gmac4 = false,
+   .xgmac = true,
+   .min_id = DWXGMAC_CORE_2_10,
+   .regs = {
+   .ptp_off = 0,
+   .mmc_off = 0,
+   },
+   .desc = NULL,
+   .dma = NULL,
+   .mac = NULL,
+   .hwtimestamp = NULL,
+   .mode = NULL,
+   .tc = NULL,
+   .setup = NULL,
+   .quirks = NULL,
+   },
 };
 
 int stmmac_hwif_init(struct stmmac_priv *priv)
 {
+   bool needs_xgmac = priv->plat->has_xgmac;
bool needs_gmac4 = priv->plat->has_gmac4;
bool needs_gmac = priv->plat->has_gmac;
const struct stmmac_hwif_entry *entry;
@@ -195,7 +220,7 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 
if (needs_gmac) {
id = stmmac_get_id(priv, GMAC_VERSION);
-   } else if (needs_gmac4) {
+   } else if (needs_gmac4 || needs_xgmac) {
id = stmmac_get_id(priv, GMAC4_VERSION);
} else {
id = 0;
@@ -229,6 +254,8 @@ int stmmac_hwif_init(struct 
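
The rest of this hunk is cut off in the archive, so the lookup code itself is not shown. As a rough illustration only of how a table keyed on the gmac/gmac4/xgmac flags plus min_id can be searched, here is a user-space sketch. Everything in it (the struct, the labels, hwif_pick_sketch()) is hypothetical and trimmed down to the match keys; it is not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down entry: only the match keys seen in the patch. */
struct hwif_entry_sketch {
	bool gmac;
	bool gmac4;
	bool xgmac;
	uint32_t min_id;
	const char *label;	/* illustration only, not a driver field */
};

/* Same order and keys as the table in the diff above. */
static const struct hwif_entry_sketch hwif_table_sketch[] = {
	{ false, false, false, 0x00, "no flag" },
	{ true,  false, false, 0x00, "gmac" },
	{ false, true,  false, 0x00, "gmac4" },
	{ false, true,  false, 0x40, "gmac4 >= 4.00" },
	{ false, true,  false, 0x41, "gmac4 >= 4.10" },
	{ false, true,  false, 0x51, "gmac4 >= 5.10" },
	{ false, false, true,  0x21, "xgmac >= 2.10" },
};

/* Assumed policy: walk from the last entry so the highest min_id <= id wins. */
static const struct hwif_entry_sketch *
hwif_pick_sketch(bool gmac, bool gmac4, bool xgmac, uint32_t id)
{
	size_t i = sizeof(hwif_table_sketch) / sizeof(hwif_table_sketch[0]);

	assert(gmac + gmac4 + xgmac <= 1);	/* at most one core family */
	while (i-- > 0) {
		const struct hwif_entry_sketch *e = &hwif_table_sketch[i];

		if (e->gmac != gmac || e->gmac4 != gmac4 || e->xgmac != xgmac)
			continue;
		if (id < e->min_id)
			continue;
		return e;
	}
	return NULL;	/* no entry matches this core */
}
```

With this policy an XGMAC core reporting an id below DWXGMAC_CORE_2_10 finds no entry at all, because the new entry is the only one with .xgmac = true.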

[PATCH v3 net-next 0/9] Add support for XGMAC2 in stmmac

2018-08-03 Thread Jose Abreu
This series adds support for 10Gigabit IP in stmmac. The IP is called XGMAC2
and has many similarities with GMAC4. Due to this, it's relatively easy to
incorporate this new IP into stmmac driver by adding a new block and
filling the necessary callbacks.

The functionality added by this series is still limited, but it is only a
starting point which will later be expanded.

I split the patches by functionality to ease the review. Only
patch 8/9 really enables the XGMAC2 block by adding a new compatible string.

Version 3 addresses review comments of Andrew Lunn.

NOTE: Although the IP supports 10G, for now it was only possible to test it
at 1G speed due to 10G PHY HW shipping problems. The iperf3 results at 1G
follow:

---
# iperf3 -c 192.168.0.10
Connecting to host 192.168.0.10, port 5201
[  4] local 192.168.0.3 port 39178 connected to 192.168.0.10 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   110 MBytes   920 Mbits/sec    0    482 KBytes
[  4]   1.00-2.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   2.00-3.00   sec   112 MBytes   937 Mbits/sec    0    482 KBytes
[  4]   3.00-4.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   4.00-5.00   sec   112 MBytes   935 Mbits/sec    0    482 KBytes
[  4]   5.00-6.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   6.00-7.00   sec   112 MBytes   937 Mbits/sec    0    482 KBytes
[  4]   7.00-8.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   8.00-9.00   sec   112 MBytes   937 Mbits/sec    0    482 KBytes
[  4]   9.00-10.00  sec   113 MBytes   946 Mbits/sec    0    482 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.09 GBytes   938 Mbits/sec                  receiver
---

Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
Cc: Andrew Lunn 

Jose Abreu (9):
  net: stmmac: Add XGMAC 2.10 HWIF entry
  net: stmmac: Add MAC related callbacks for XGMAC2
  net: stmmac: Add DMA related callbacks for XGMAC2
  net: stmmac: Add descriptor related callbacks for XGMAC2
  net: stmmac: Add MDIO related functions for XGMAC2
  net: stmmac: Add PTP support for XGMAC2
  net: stmmac: Integrate XGMAC into main driver flow
  net: stmmac: Add the bindings parsing for XGMAC2
  dt-bindings: net: stmmac: Add the bindings documentation for XGMAC2.

 Documentation/devicetree/bindings/net/stmmac.txt   |   5 +-
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   3 +-
 drivers/net/ethernet/stmicro/stmmac/common.h   |  17 +-
 .../net/ethernet/stmicro/stmmac/dwmac-generic.c|   2 +
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 227 
 .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c| 371 +++
 .../net/ethernet/stmicro/stmmac/dwxgmac2_descs.c   | 280 ++
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 410 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |  31 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   3 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  67 +++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 117 +-
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   6 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c   |   6 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h   |   1 +
 include/linux/stmmac.h |   1 +
 16 files changed, 1513 insertions(+), 34 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c

-- 
2.7.4




[PATCH v3 net-next 2/9] net: stmmac: Add MAC related callbacks for XGMAC2

2018-08-03 Thread Jose Abreu
Add the MAC related callbacks for the new IP block XGMAC2.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/common.h   |   3 +
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 141 
 .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c| 371 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   4 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   1 +
 6 files changed, 519 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 68e9e2640c62..a6cf632c9592 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -5,7 +5,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
  dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
- stmmac_tc.o $(stmmac-y)
+ stmmac_tc.o dwxgmac2_core.o $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 3fb81acbd274..1854f270ad66 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -400,6 +400,8 @@ struct mac_link {
u32 speed10;
u32 speed100;
u32 speed1000;
+   u32 speed2500;
+   u32 speed10000;
u32 duplex;
 };
 
@@ -441,6 +443,7 @@ struct stmmac_rx_routing {
 int dwmac100_setup(struct stmmac_priv *priv);
 int dwmac1000_setup(struct stmmac_priv *priv);
 int dwmac4_setup(struct stmmac_priv *priv);
+int dwxgmac2_setup(struct stmmac_priv *priv);
 
 void stmmac_set_mac_addr(void __iomem *ioaddr, u8 addr[6],
 unsigned int high, unsigned int low);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
new file mode 100644
index 000000000000..7832571f791f
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * stmmac XGMAC definitions.
+ */
+
+#ifndef __STMMAC_DWXGMAC2_H__
+#define __STMMAC_DWXGMAC2_H__
+
+#include "common.h"
+
+/* Misc */
+#define XGMAC_JUMBO_LEN                 16368
+
+/* MAC Registers */
+#define XGMAC_TX_CONFIG                 0x00000000
+#define XGMAC_CONFIG_SS_OFF             29
+#define XGMAC_CONFIG_SS_MASK            GENMASK(30, 29)
+#define XGMAC_CONFIG_SS_10000           (0x0 << XGMAC_CONFIG_SS_OFF)
+#define XGMAC_CONFIG_SS_2500            (0x2 << XGMAC_CONFIG_SS_OFF)
+#define XGMAC_CONFIG_SS_1000            (0x3 << XGMAC_CONFIG_SS_OFF)
+#define XGMAC_CONFIG_SARC               GENMASK(22, 20)
+#define XGMAC_CONFIG_SARC_SHIFT         20
+#define XGMAC_CONFIG_JD                 BIT(16)
+#define XGMAC_CONFIG_TE                 BIT(0)
+#define XGMAC_CORE_INIT_TX              (XGMAC_CONFIG_JD)
+#define XGMAC_RX_CONFIG                 0x00000004
+#define XGMAC_CONFIG_ARPEN              BIT(31)
+#define XGMAC_CONFIG_GPSL               GENMASK(29, 16)
+#define XGMAC_CONFIG_GPSL_SHIFT         16
+#define XGMAC_CONFIG_S2KP               BIT(11)
+#define XGMAC_CONFIG_IPC                BIT(9)
+#define XGMAC_CONFIG_JE                 BIT(8)
+#define XGMAC_CONFIG_WD                 BIT(7)
+#define XGMAC_CONFIG_GPSLCE             BIT(6)
+#define XGMAC_CONFIG_CST                BIT(2)
+#define XGMAC_CONFIG_ACS                BIT(1)
+#define XGMAC_CONFIG_RE                 BIT(0)
+#define XGMAC_CORE_INIT_RX              0
+#define XGMAC_PACKET_FILTER             0x00000008
+#define XGMAC_FILTER_RA                 BIT(31)
+#define XGMAC_FILTER_PM                 BIT(4)
+#define XGMAC_FILTER_HMC                BIT(2)
+#define XGMAC_FILTER_PR                 BIT(0)
+#define XGMAC_HASH_TABLE(x)             (0x00000010 + (x) * 4)
+#define XGMAC_RXQ_CTRL0                 0x000000a0
+#define XGMAC_RXQEN(x)                  GENMASK((x) * 2 + 1, (x) * 2)
+#define XGMAC_RXQEN_SHIFT(x)            ((x) * 2)
+#define XGMAC_RXQ_CTRL2                 0x000000a8
+#define XGMAC_RXQ_CTRL3                 0x000000ac
+#define XGMAC_PSRQ(x)                   GENMASK((x) * 8 + 7, (x) * 8)
+#define XGMAC_PSRQ_SHIFT(x)             ((x) * 8)
+#define XGMAC_INT_STATUS                0x000000b0
+#define XGMAC_PMTIS  

Re: Security enhancement proposal for kernel TLS

2018-08-03 Thread Dave Watson
On 08/02/18 05:23 PM, Vakul Garg wrote:
> > I agree that Boris' patch does what you say it does - it sets keys 
> > immediately
> > after CCS instead of after FINISHED message.  I disagree that the kernel tls
> > implementation currently requires that specific ordering, nor do I think 
> > that it
> > should require that ordering.
> 
> The current kernel implementation assumes record sequence number to start 
> from '0'.
> If keys have to be set after FINISHED message, then record sequence number 
> need to
> be communicated from user space TLS stack to kernel. IIRC, sequence number is 
> not 
> part of the interface through which key is transferred.

The setsockopt call struct takes the key, iv, salt, and seqno:

struct tls12_crypto_info_aes_gcm_128 {
struct tls_crypto_info info;
unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
};
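
To make the rec_seq point concrete: a user-space stack that installs keys after FINISHED could hand over a non-zero starting sequence number in that field. Below is a minimal sketch, assuming rec_seq carries the 64-bit TLS record sequence number in network byte order and that TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE is 8. fill_rec_seq() and REC_SEQ_SIZE_SKETCH are hypothetical names used only for illustration; they are not part of the kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to equal TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE from <linux/tls.h>. */
#define REC_SEQ_SIZE_SKETCH 8

/* Hypothetical helper: store a 64-bit record sequence number big-endian. */
static void fill_rec_seq(unsigned char rec_seq[REC_SEQ_SIZE_SKETCH],
			 uint64_t seqno)
{
	size_t i;

	assert(rec_seq != NULL);
	for (i = 0; i < REC_SEQ_SIZE_SKETCH; i++)
		rec_seq[REC_SEQ_SIZE_SKETCH - 1 - i] =
			(unsigned char)(seqno >> (8 * i));
}
```

The caller would run this on the rec_seq member of the struct above before passing the struct to setsockopt().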


Re: [PATCH v2 net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2

2018-08-03 Thread Andrew Lunn
> > Probably you want to wait for the bus to be idle before you change the
> > mode to C22. Some PHYs can do both C22 and C45, e.g. EEE registers can
> > be in C45 space, while the rest are in C22.
> 
> Ok but I can't test C45 right now so maybe leave that change to
> when I can test it ?

I would fix this now. It probably cannot cause issues now, but it is
wrong. You are going to have to fix it some time, so why not now?

   Andrew
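
For readers following the thread, the ordering being asked for (wait for the bus to go idle, then flip the port to C22) can be sketched in user space as follows. The register struct and both functions are hypothetical stand-ins so the example runs without hardware; only the BUSY bit position is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BUSY (1u << 22)	/* mirrors MII_XGMAC_BUSY from the patch */

/* Hypothetical stand-ins for the MMIO registers. */
struct mdio_regs_sketch {
	uint32_t c22p;		/* per-port Clause 22 enable bits */
	uint32_t data;		/* command/data register, holds BUSY */
	int busy_polls;		/* simulation: polls left until BUSY clears */
};

/* Poll until BUSY clears or max_polls is used up. */
static int wait_idle_sketch(struct mdio_regs_sketch *r, int max_polls)
{
	while (max_polls-- > 0) {
		if (r->busy_polls > 0 && --r->busy_polls == 0)
			r->data &= ~SKETCH_BUSY;
		if (!(r->data & SKETCH_BUSY))
			return 0;
	}
	return -1;	/* the driver would return -EBUSY */
}

/* Only touch the C22 mode bit once no transaction is in flight. */
static int set_c22_when_idle_sketch(struct mdio_regs_sketch *r,
				    unsigned int phyaddr, int max_polls)
{
	assert(phyaddr < 4);
	if (wait_idle_sketch(r, max_polls))
		return -1;
	r->c22p |= 1u << phyaddr;
	return 0;
}
```

If the bus never goes idle the mode register is left untouched, which is the property the review comment is after.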


Re: [PATCH v2 net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2

2018-08-03 Thread Jose Abreu
Hi Andrew,

On 03-08-2018 16:20, Andrew Lunn wrote:
> On Fri, Aug 03, 2018 at 03:56:07PM +0100, Jose Abreu wrote:
>> Add the MDIO related functionality for the new IP block XGMAC2.
>>
>> Signed-off-by: Jose Abreu 
>> Cc: David S. Miller 
>> Cc: Joao Pinto 
>> Cc: Giuseppe Cavallaro 
>> Cc: Alexandre Torgue 
>> Cc: Andrew Lunn 
>> ---
>> Changes from v1:
>>  - Remove C45 support (Andrew)
>>  - Add define for bits (Andrew)
>>  - Remove unneeded cast (Andrew)
>>  - Use different callbacks instead of if's (Andrew)
>> ---
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 101 
>> +-
>>  1 file changed, 99 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
>> b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> index 5df1a608e566..9bbdb78d3315 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> @@ -29,6 +29,7 @@
>>  #include 
>>  #include 
>>  
>> +#include "dwxgmac2.h"
>>  #include "stmmac.h"
>>  
>>  #define MII_BUSY 0x0001
>> @@ -39,6 +40,96 @@
>>  #define MII_GMAC4_WRITE (1 << MII_GMAC4_GOC_SHIFT)
>>  #define MII_GMAC4_READ  (3 << MII_GMAC4_GOC_SHIFT)
>>  
>> +/* XGMAC defines */
>> +#define MII_XGMAC_SADDR BIT(18)
>> +#define MII_XGMAC_CMD_SHIFT 16
>> +#define MII_XGMAC_WRITE (1 << MII_XGMAC_CMD_SHIFT)
>> +#define MII_XGMAC_READ  (3 << MII_XGMAC_CMD_SHIFT)
>> +#define MII_XGMAC_BUSY  BIT(22)
>> +
>> +static int stmmac_xgmac2_mdio_read(struct mii_bus *bus, int phyaddr, int 
>> phyreg)
>> +{
>> +struct net_device *ndev = bus->priv;
>> +struct stmmac_priv *priv = netdev_priv(ndev);
>> +unsigned int mii_address = priv->hw->mii.addr;
>> +unsigned int mii_data = priv->hw->mii.data;
>> +u32 tmp, addr, value = MII_XGMAC_BUSY;
>> +
>> +if (phyreg & MII_ADDR_C45) {
>> +return -EOPNOTSUPP;
>> +} else {
>> +if (phyaddr >= 4)
>> +return -ENODEV;
>> +
>> +/* Set port as Clause 22 */
>> +tmp = readl(priv->ioaddr + XGMAC_MDIO_C22P);
>> +tmp |= BIT(phyaddr);
>> +writel(tmp, priv->ioaddr + XGMAC_MDIO_C22P);
> Hi Jose
>
> Maybe put this into a helper? You do repeat it twice.

Yes, makes sense.

>
>> +
>> +addr = (phyaddr << 16) | (phyreg & 0x1f);
> You could use GENMASK(4, 0) here. That was the point i was trying to
> make earlier. But i actually find 0x1f, and 0x easier to read.

Less typing :D

>
>> +}
>> +
>> +value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
>> +& priv->hw->mii.clk_csr_mask;
>> +value |= MII_XGMAC_SADDR | MII_XGMAC_READ;
>> +
>> +if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
>> +   !(tmp & MII_XGMAC_BUSY), 100, 10000))
>> +return -EBUSY;
> Probably you want to wait for the bus to be idle before you change the
> mode to C22. Some PHYs can do both C22 and C45, e.g. EEE registers can
> be in C45 space, while the rest are in C22.

Ok but I can't test C45 right now so maybe leave that change to
when I can test it ?

Thanks and Best Regards,
Jose Miguel Abreu

>
>> +
>> +writel(addr, priv->ioaddr + mii_address);
>> +writel(value, priv->ioaddr + mii_data);
>> +
>> +if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
>> +   !(tmp & MII_XGMAC_BUSY), 100, 10000))
>> +return -EBUSY;
>> +
>> +/* Read the data from the MII data register */
>> +return readl(priv->ioaddr + mii_data) & GENMASK(15, 0);
>> +}
>   Andrew



Re: [PATCH v2 net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2

2018-08-03 Thread Andrew Lunn
On Fri, Aug 03, 2018 at 03:56:07PM +0100, Jose Abreu wrote:
> Add the MDIO related functionality for the new IP block XGMAC2.
> 
> Signed-off-by: Jose Abreu 
> Cc: David S. Miller 
> Cc: Joao Pinto 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> Cc: Andrew Lunn 
> ---
> Changes from v1:
>   - Remove C45 support (Andrew)
>   - Add define for bits (Andrew)
>   - Remove unneeded cast (Andrew)
>   - Use different callbacks instead of if's (Andrew)
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 101 
> +-
>  1 file changed, 99 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> index 5df1a608e566..9bbdb78d3315 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  
> +#include "dwxgmac2.h"
>  #include "stmmac.h"
>  
>  #define MII_BUSY 0x0001
> @@ -39,6 +40,96 @@
>  #define MII_GMAC4_WRITE  (1 << MII_GMAC4_GOC_SHIFT)
>  #define MII_GMAC4_READ   (3 << MII_GMAC4_GOC_SHIFT)
>  
> +/* XGMAC defines */
> +#define MII_XGMAC_SADDR  BIT(18)
> +#define MII_XGMAC_CMD_SHIFT  16
> +#define MII_XGMAC_WRITE  (1 << MII_XGMAC_CMD_SHIFT)
> +#define MII_XGMAC_READ   (3 << MII_XGMAC_CMD_SHIFT)
> +#define MII_XGMAC_BUSY   BIT(22)
> +
> +static int stmmac_xgmac2_mdio_read(struct mii_bus *bus, int phyaddr, int 
> phyreg)
> +{
> + struct net_device *ndev = bus->priv;
> + struct stmmac_priv *priv = netdev_priv(ndev);
> + unsigned int mii_address = priv->hw->mii.addr;
> + unsigned int mii_data = priv->hw->mii.data;
> + u32 tmp, addr, value = MII_XGMAC_BUSY;
> +
> + if (phyreg & MII_ADDR_C45) {
> + return -EOPNOTSUPP;
> + } else {
> + if (phyaddr >= 4)
> + return -ENODEV;
> +
> + /* Set port as Clause 22 */
> + tmp = readl(priv->ioaddr + XGMAC_MDIO_C22P);
> + tmp |= BIT(phyaddr);
> + writel(tmp, priv->ioaddr + XGMAC_MDIO_C22P);

Hi Jose

Maybe put this into a helper? You do repeat it twice.

> +
> + addr = (phyaddr << 16) | (phyreg & 0x1f);

You could use GENMASK(4, 0) here. That was the point i was trying to
make earlier. But i actually find 0x1f, and 0x easier to read.

> + }
> +
> + value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
> + & priv->hw->mii.clk_csr_mask;
> + value |= MII_XGMAC_SADDR | MII_XGMAC_READ;
> +
> + if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
> +!(tmp & MII_XGMAC_BUSY), 100, 10000))
> + return -EBUSY;

Probably you want to wait for the bus to be idle before you change the
mode to C22. Some PHYs can do both C22 and C45, e.g. EEE registers can
be in C45 space, while the rest are in C22.

> +
> + writel(addr, priv->ioaddr + mii_address);
> + writel(value, priv->ioaddr + mii_data);
> +
> + if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
> +!(tmp & MII_XGMAC_BUSY), 100, 10000))
> + return -EBUSY;
> +
> + /* Read the data from the MII data register */
> + return readl(priv->ioaddr + mii_data) & GENMASK(15, 0);
> +}

  Andrew
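
One possible shape for the helper suggested in this review, shared by the read and write paths, is sketched below. It is a user-space illustration under stated assumptions: xgmac2_c22_format_sketch() is a hypothetical name, and *c22p stands in for the XGMAC_MDIO_C22P register instead of a real readl()/writel() pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_C22_MAX_PORTS 4	/* the patch rejects phyaddr >= 4 */

/* Hypothetical helper: mark the port as Clause 22 and build the address word. */
static int xgmac2_c22_format_sketch(uint32_t *c22p, int phyaddr, int phyreg,
				    uint32_t *addr)
{
	assert(c22p != NULL && addr != NULL);
	if (phyaddr < 0 || phyaddr >= SKETCH_C22_MAX_PORTS)
		return -1;	/* -ENODEV in the driver */

	*c22p |= 1u << phyaddr;	/* set port as Clause 22 */
	*addr = ((uint32_t)phyaddr << 16) | ((uint32_t)phyreg & 0x1f);
	return 0;
}
```

Both stmmac_xgmac2_mdio_read() and stmmac_xgmac2_mdio_write() could then call the one helper and keep only their command bits separate.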


Re: UDP packets arriving on wrong sockets

2018-08-03 Thread Andrew Cann
On Fri, Aug 03, 2018 at 10:20:06AM -0400, Willem de Bruijn wrote:
> On Fri, Aug 3, 2018 at 12:20 AM Andrew Cann  wrote:
> >
> > On Thu, Aug 02, 2018 at 11:21:41AM -0400, Willem de Bruijn wrote:
> > > You have two sockets bound to the same address and port? Is this using
> > > SO_REUSEPORT?
> >
> > Yes, this is using SO_REUSEPORT.
> 
> Then this is working as intended.
> 
> Without SO_REUSEPORT it would not be possible to bind two sockets to
> the same address and port. See documentation, e.g., at
> https://lwn.net/Articles/542629/

The man page for connect clearly states that a connected UDP socket should only
receive datagrams from the address that it is connected to. This isn't the
behaviour I'm seeing. That's the issue.





[PATCH net] l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()

2018-08-03 Thread Guillaume Nault
If 'session' is not NULL and is not a PPP pseudo-wire, then we fail to
drop the reference taken by l2tp_session_get().

Fixes: ecd012e45ab5 ("l2tp: filter out non-PPP sessions in 
pppol2tp_tunnel_ioctl()")
Signed-off-by: Guillaume Nault 
---
Sorry for the stupid mistake. I guess I got blinded by the apparent
simplicity of the bug when I wrote the original patch.

net/l2tp/l2tp_ppp.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index e398797878a9..cf6cca260e7b 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -1201,13 +1201,18 @@ static int pppol2tp_tunnel_ioctl(struct l2tp_tunnel 
*tunnel,
l2tp_session_get(sock_net(sk), tunnel,
 stats.session_id);
 
-   if (session && session->pwtype == L2TP_PWTYPE_PPP) {
-   err = pppol2tp_session_ioctl(session, cmd,
-arg);
+   if (!session) {
+   err = -EBADR;
+   break;
+   }
+   if (session->pwtype != L2TP_PWTYPE_PPP) {
l2tp_session_dec_refcount(session);
-   } else {
err = -EBADR;
+   break;
}
+
+   err = pppol2tp_session_ioctl(session, cmd, arg);
+   l2tp_session_dec_refcount(session);
break;
}
 #ifdef CONFIG_XFRM
-- 
2.18.0
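
The control flow after this fix can be modelled in a few lines of user-space C, which makes it easy to check that every successful lookup is paired with exactly one put. The types and functions below are hypothetical stand-ins for struct l2tp_session, l2tp_session_get() and l2tp_session_dec_refcount(); the return value -1 stands in for -EBADR.

```c
#include <assert.h>
#include <stddef.h>

enum pwtype_sketch { PW_PPP_SKETCH, PW_ETH_SKETCH };

/* Hypothetical stand-in for struct l2tp_session. */
struct session_sketch {
	enum pwtype_sketch pwtype;
	int refcount;
};

/* Lookup that takes a reference when it finds a session. */
static struct session_sketch *session_get_sketch(struct session_sketch *s)
{
	if (s)
		s->refcount++;
	return s;
}

static void session_put_sketch(struct session_sketch *s)
{
	assert(s->refcount > 0);
	s->refcount--;
}

/* Mirrors the fixed flow: three exits, and a put on both non-NULL ones. */
static int tunnel_ioctl_sketch(struct session_sketch *candidate)
{
	struct session_sketch *session = session_get_sketch(candidate);
	int err;

	if (!session)
		return -1;	/* nothing was taken, nothing to drop */
	if (session->pwtype != PW_PPP_SKETCH) {
		session_put_sketch(session);	/* the put that was missing */
		return -1;
	}
	err = 0;	/* stands in for pppol2tp_session_ioctl() */
	session_put_sketch(session);
	return err;
}
```

Running it with a PPP session, a non-PPP session and NULL leaves each refcount where it started.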



[PATCH v2 net-next 7/9] net: stmmac: Integrate XGMAC into main driver flow

2018-08-03 Thread Jose Abreu
Now that we have all the XGMAC related callbacks, let's start integrating
this IP block into the main driver.

Also correct the initialization flow so that DMA is only started after
the descriptor length has been set.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
Cc: Andrew Lunn 
---
Changes from v1:
- Correct flow of initialization
- Remove 2.5G/10G support (Andrew)
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 67 ---
 1 file changed, 48 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 9d104a05044d..ff1ffb46198a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include "dwmac1000.h"
+#include "dwxgmac2.h"
 #include "hwif.h"
 
 #define STMMAC_ALIGN(x) __ALIGN_KERNEL(x, SMP_CACHE_BYTES)
@@ -262,6 +263,21 @@ static void stmmac_clk_csr_set(struct stmmac_priv *priv)
else
priv->clk_csr = 0;
}
+
+   if (priv->plat->has_xgmac) {
+   if (clk_rate > 400000000)
+   priv->clk_csr = 0x5;
+   else if (clk_rate > 350000000)
+   priv->clk_csr = 0x4;
+   else if (clk_rate > 300000000)
+   priv->clk_csr = 0x3;
+   else if (clk_rate > 250000000)
+   priv->clk_csr = 0x2;
+   else if (clk_rate > 150000000)
+   priv->clk_csr = 0x1;
+   else
+   priv->clk_csr = 0x0;
+   }
 }
 
 static void print_pkt(unsigned char *buf, int len)
@@ -498,7 +514,7 @@ static void stmmac_get_rx_hwtstamp(struct stmmac_priv 
*priv, struct dma_desc *p,
if (!priv->hwts_rx_en)
return;
/* For GMAC4, the valid timestamp is from CTX next desc. */
-   if (priv->plat->has_gmac4)
+   if (priv->plat->has_gmac4 || priv->plat->has_xgmac)
desc = np;
 
/* Check if timestamp is available */
@@ -540,6 +556,9 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
u32 ts_event_en = 0;
u32 value = 0;
u32 sec_inc;
+   bool xmac;
+
+   xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac;
 
if (!(priv->dma_cap.time_stamp || priv->adv_ts)) {
netdev_alert(priv->dev, "No support for HW time stamping\n");
@@ -575,7 +594,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
/* PTP v1, UDP, any kind of event packet */
config.rx_filter = HWTSTAMP_FILTER_PTP_V1_L4_EVENT;
/* take time stamp for all event messages */
-   if (priv->plat->has_gmac4)
+   if (xmac)
snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
else
snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
@@ -610,7 +629,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
/* take time stamp for all event messages */
-   if (priv->plat->has_gmac4)
+   if (xmac)
snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
else
snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
@@ -647,7 +666,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
/* take time stamp for all event messages */
-   if (priv->plat->has_gmac4)
+   if (xmac)
snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
else
snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
@@ -718,7 +737,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
/* program Sub Second Increment reg */
stmmac_config_sub_second_increment(priv,
priv->ptpaddr, priv->plat->clk_ptp_rate,
-   priv->plat->has_gmac4, &sec_inc);
+   xmac, &sec_inc);
temp = div_u64(1000000000ULL, sec_inc);
 
/* Store sub second increment and flags for later use */
@@ -755,12 +774,14 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
  */
 static int stmmac_init_ptp(struct stmmac_priv *priv)
 {
+   bool xmac = priv->plat->has_gmac4 || 

[PATCH v2 net-next 9/9] dt-bindings: net: stmmac: Add the bindings documentation for XGMAC2.

2018-08-03 Thread Jose Abreu
Adds the documentation for XGMAC2 DT bindings.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
Cc: Sergei Shtylyov 
Cc: devicet...@vger.kernel.org
Cc: Rob Herring 
---
Changes from v1:
- Correct header, now we also support 2.5/10G.
- Add missing '>' (Sergei)
---
 Documentation/devicetree/bindings/net/stmmac.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt 
b/Documentation/devicetree/bindings/net/stmmac.txt
index 3a28a5d8857d..a32fd590ce8f 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -1,7 +1,8 @@
-* STMicroelectronics 10/100/1000 Ethernet driver (GMAC)
+* STMicroelectronics 10/100/1000/2500/10000 Ethernet driver (GMAC/XGMAC)
 
 Required properties:
-- compatible: Should be "snps,dwmac-", "snps,dwmac"
+- compatible: Should be "snps,dwmac-", "snps,dwmac" or
+   "snps,dwxgmac-", "snps,dwxgmac".
For backwards compatibility: "st,spear600-gmac" is also supported.
 - reg: Address and length of the register set for the device
 - interrupt-parent: Should be the phandle for the interrupt controller
-- 
2.7.4




[PATCH v2 net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2

2018-08-03 Thread Jose Abreu
Add the MDIO related functionality for the new IP block XGMAC2.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
Cc: Andrew Lunn 
---
Changes from v1:
- Remove C45 support (Andrew)
- Add define for bits (Andrew)
- Remove unneeded cast (Andrew)
- Use different callbacks instead of if's (Andrew)
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 101 +-
 1 file changed, 99 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 5df1a608e566..9bbdb78d3315 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 
+#include "dwxgmac2.h"
 #include "stmmac.h"
 
 #define MII_BUSY 0x0001
@@ -39,6 +40,96 @@
 #define MII_GMAC4_WRITE         (1 << MII_GMAC4_GOC_SHIFT)
 #define MII_GMAC4_READ          (3 << MII_GMAC4_GOC_SHIFT)
 
+/* XGMAC defines */
+#define MII_XGMAC_SADDR         BIT(18)
+#define MII_XGMAC_CMD_SHIFT     16
+#define MII_XGMAC_WRITE         (1 << MII_XGMAC_CMD_SHIFT)
+#define MII_XGMAC_READ          (3 << MII_XGMAC_CMD_SHIFT)
+#define MII_XGMAC_BUSY          BIT(22)
+
+static int stmmac_xgmac2_mdio_read(struct mii_bus *bus, int phyaddr, int 
phyreg)
+{
+   struct net_device *ndev = bus->priv;
+   struct stmmac_priv *priv = netdev_priv(ndev);
+   unsigned int mii_address = priv->hw->mii.addr;
+   unsigned int mii_data = priv->hw->mii.data;
+   u32 tmp, addr, value = MII_XGMAC_BUSY;
+
+   if (phyreg & MII_ADDR_C45) {
+   return -EOPNOTSUPP;
+   } else {
+   if (phyaddr >= 4)
+   return -ENODEV;
+
+   /* Set port as Clause 22 */
+   tmp = readl(priv->ioaddr + XGMAC_MDIO_C22P);
+   tmp |= BIT(phyaddr);
+   writel(tmp, priv->ioaddr + XGMAC_MDIO_C22P);
+
+   addr = (phyaddr << 16) | (phyreg & 0x1f);
+   }
+
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & priv->hw->mii.clk_csr_mask;
+   value |= MII_XGMAC_SADDR | MII_XGMAC_READ;
+
+   if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+  !(tmp & MII_XGMAC_BUSY), 100, 10000))
+   return -EBUSY;
+
+   writel(addr, priv->ioaddr + mii_address);
+   writel(value, priv->ioaddr + mii_data);
+
+   if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+  !(tmp & MII_XGMAC_BUSY), 100, 10000))
+   return -EBUSY;
+
+   /* Read the data from the MII data register */
+   return readl(priv->ioaddr + mii_data) & GENMASK(15, 0);
+}
+
+static int stmmac_xgmac2_mdio_write(struct mii_bus *bus, int phyaddr,
+   int phyreg, u16 phydata)
+{
+   struct net_device *ndev = bus->priv;
+   struct stmmac_priv *priv = netdev_priv(ndev);
+   unsigned int mii_address = priv->hw->mii.addr;
+   unsigned int mii_data = priv->hw->mii.data;
+   u32 addr, tmp, value = MII_XGMAC_BUSY;
+
+   if (phyreg & MII_ADDR_C45) {
+   return -EOPNOTSUPP;
+   } else {
+   if (phyaddr >= 4)
+   return -ENODEV;
+
+   /* Set port as Clause 22 */
+   tmp = readl(priv->ioaddr + XGMAC_MDIO_C22P);
+   tmp |= BIT(phyaddr);
+   writel(tmp, priv->ioaddr + XGMAC_MDIO_C22P);
+
+   addr = (phyaddr << 16) | (phyreg & 0x1f);
+   }
+
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & priv->hw->mii.clk_csr_mask;
+   value |= phydata | MII_XGMAC_SADDR;
+   value |= MII_XGMAC_WRITE;
+
+   /* Wait until any existing MII operation is complete */
+   if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+  !(tmp & MII_XGMAC_BUSY), 100, 10000))
+   return -EBUSY;
+
+   /* Set the MII address register to write */
+   writel(addr, priv->ioaddr + mii_address);
+   writel(value, priv->ioaddr + mii_data);
+
+   /* Wait until any existing MII operation is complete */
+   return readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+ !(tmp & MII_XGMAC_BUSY), 100, 10000);
+}
+
 /**
  * stmmac_mdio_read
  * @bus: points to the mii_bus structure
@@ -223,8 +314,14 @@ int stmmac_mdio_register(struct net_device *ndev)
 #endif
 
new_bus->name = "stmmac";
-   new_bus->read = &stmmac_mdio_read;
-   new_bus->write = &stmmac_mdio_write;
+
+   if (priv->plat->has_xgmac) {
+   new_bus->read = &stmmac_xgmac2_mdio_read;
+   new_bus->write = &stmmac_xgmac2_mdio_write;
+   } else {
+   new_bus->read = &stmmac_mdio_read;
+   new_bus->write = &stmmac_mdio_write;
+   

[PATCH v2 net-next 4/9] net: stmmac: Add descriptor related callbacks for XGMAC2

2018-08-03 Thread Jose Abreu
Add the descriptor related callbacks for the new IP block XGMAC2.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   3 +-
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h |  30 +++
 .../net/ethernet/stmicro/stmmac/dwxgmac2_descs.c   | 280 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   2 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   1 +
 5 files changed, 314 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index da40d3bba037..99967a80a8c8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -5,7 +5,8 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
  dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
- stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o $(stmmac-y)
+ stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o dwxgmac2_descs.o \
+ $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
index ddd23f8559df..d090cbb501f2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -194,4 +194,34 @@
 #define XGMAC_TPS  BIT(1)
 #define XGMAC_TI   BIT(0)
 
+/* Descriptors */
+#define XGMAC_TDES2_IOC         BIT(31)
+#define XGMAC_TDES2_TTSE        BIT(30)
+#define XGMAC_TDES2_B2L         GENMASK(29, 16)
+#define XGMAC_TDES2_B2L_SHIFT   16
+#define XGMAC_TDES2_B1L         GENMASK(13, 0)
+#define XGMAC_TDES3_OWN         BIT(31)
+#define XGMAC_TDES3_CTXT        BIT(30)
+#define XGMAC_TDES3_FD          BIT(29)
+#define XGMAC_TDES3_LD          BIT(28)
+#define XGMAC_TDES3_CPC         GENMASK(27, 26)
+#define XGMAC_TDES3_CPC_SHIFT   26
+#define XGMAC_TDES3_TCMSSV      BIT(26)
+#define XGMAC_TDES3_THL         GENMASK(22, 19)
+#define XGMAC_TDES3_THL_SHIFT   19
+#define XGMAC_TDES3_TSE         BIT(18)
+#define XGMAC_TDES3_CIC         GENMASK(17, 16)
+#define XGMAC_TDES3_CIC_SHIFT   16
+#define XGMAC_TDES3_TPL         GENMASK(17, 0)
+#define XGMAC_TDES3_FL          GENMASK(14, 0)
+#define XGMAC_RDES3_OWN         BIT(31)
+#define XGMAC_RDES3_CTXT        BIT(30)
+#define XGMAC_RDES3_IOC         BIT(30)
+#define XGMAC_RDES3_LD          BIT(28)
+#define XGMAC_RDES3_CDA         BIT(27)
+#define XGMAC_RDES3_ES          BIT(15)
+#define XGMAC_RDES3_PL          GENMASK(13, 0)
+#define XGMAC_RDES3_TSD         BIT(6)
+#define XGMAC_RDES3_TSA         BIT(4)
+
 #endif /* __STMMAC_DWXGMAC2_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c
new file mode 100644
index ..1d858fdec997
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c
@@ -0,0 +1,280 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * stmmac XGMAC support.
+ */
+
+#include 
+#include "common.h"
+#include "dwxgmac2.h"
+
+static int dwxgmac2_get_tx_status(void *data, struct stmmac_extra_stats *x,
+ struct dma_desc *p, void __iomem *ioaddr)
+{
+   unsigned int tdes3 = le32_to_cpu(p->des3);
+   int ret = tx_done;
+
+   if (unlikely(tdes3 & XGMAC_TDES3_OWN))
+   return tx_dma_own;
+   if (likely(!(tdes3 & XGMAC_TDES3_LD)))
+   return tx_not_ls;
+
+   return ret;
+}
+
+static int dwxgmac2_get_rx_status(void *data, struct stmmac_extra_stats *x,
+ struct dma_desc *p)
+{
+   unsigned int rdes3 = le32_to_cpu(p->des3);
+   int ret = good_frame;
+
+   if (unlikely(rdes3 & XGMAC_RDES3_OWN))
+   return dma_own;
+   if (likely(!(rdes3 & XGMAC_RDES3_LD)))
+   return discard_frame;
+   if (unlikely(rdes3 & XGMAC_RDES3_ES))
+   ret = discard_frame;
+
+   return ret;
+}
+
+static int dwxgmac2_get_tx_len(struct dma_desc *p)
+{
+   return (le32_to_cpu(p->des2) & XGMAC_TDES2_B1L);
+}
+
+static int dwxgmac2_get_tx_owner(struct dma_desc *p)
+{
+   return (le32_to_cpu(p->des3) & 

[PATCH v2 net-next 1/9] net: stmmac: Add XGMAC 2.10 HWIF entry

2018-08-03 Thread Jose Abreu
Add a new entry to the HWIF table for XGMAC 2.10. For now it is filled with
empty callbacks, which later patches in this series will add.
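For readers unfamiliar with the HWIF table, here is a rough userspace sketch
of how such a table can be searched. The flags and min_id values mirror the
entries visible in the hwif.c hunk below; the lookup loop itself is not shown
in this excerpt, so the matching rule used here (scan from the last entry,
require identical flags and id >= min_id) is an assumption, and all toy_*
names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down version of a HWIF table entry. */
struct toy_hwif_entry {
	bool gmac;
	bool gmac4;
	bool xgmac;
	uint32_t min_id;
};

/* Flags and minimum IDs as they appear in the hwif.c hunk of this patch. */
static const struct toy_hwif_entry toy_hw[] = {
	{ false, false, false, 0x00 },
	{ true,  false, false, 0x00 },
	{ false, true,  false, 0x00 },
	{ false, true,  false, 0x40 },	/* DWMAC_CORE_4_00 */
	{ false, true,  false, 0x41 },	/* DWMAC_CORE_4_10 */
	{ false, true,  false, 0x51 },	/* DWMAC_CORE_5_10 */
	{ false, false, true,  0x21 },	/* DWXGMAC_CORE_2_10 */
};

/*
 * Assumed matching rule: walk the table backwards so the entry with the
 * highest min_id that is still <= id wins. Returns NULL when nothing fits.
 */
const struct toy_hwif_entry *toy_hwif_lookup(bool gmac, bool gmac4,
					     bool xgmac, uint32_t id)
{
	size_t i = sizeof(toy_hw) / sizeof(toy_hw[0]);

	while (i-- > 0) {
		const struct toy_hwif_entry *e = &toy_hw[i];

		if (e->gmac != gmac || e->gmac4 != gmac4 || e->xgmac != xgmac)
			continue;
		if (id < e->min_id)
			continue;
		return e;
	}
	return NULL;
}
```

Under that assumption, a GMAC4 core reporting ID 0x50 would land on the 4.10
entry, and an XGMAC core reporting an ID below 0x21 would match nothing.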

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/common.h | 14 +++--
 drivers/net/ethernet/stmicro/stmmac/hwif.c   | 31 ++--
 include/linux/stmmac.h   |  1 +
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 78fd0f8b8e81..3fb81acbd274 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -36,12 +36,14 @@
 #include "mmc.h"
 
 /* Synopsys Core versions */
-#define DWMAC_CORE_3_40 0x34
-#define DWMAC_CORE_3_50 0x35
-#define DWMAC_CORE_4_00 0x40
-#define DWMAC_CORE_4_10 0x41
-#define DWMAC_CORE_5_00 0x50
-#define DWMAC_CORE_5_10 0x51
+#define DWMAC_CORE_3_40        0x34
+#define DWMAC_CORE_3_50        0x35
+#define DWMAC_CORE_4_00        0x40
+#define DWMAC_CORE_4_10        0x41
+#define DWMAC_CORE_5_00        0x50
+#define DWMAC_CORE_5_10        0x51
+#define DWXGMAC_CORE_2_10      0x21
+
 #define STMMAC_CHAN0   0   /* Always supported and default for all chips */
 
 /* These need to be power of two, and >= 4 */
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c 
b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 1f50e83cafb2..24f5ff175aa4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -72,6 +72,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 static const struct stmmac_hwif_entry {
bool gmac;
bool gmac4;
+   bool xgmac;
u32 min_id;
const struct stmmac_regs_off regs;
const void *desc;
@@ -87,6 +88,7 @@ static const struct stmmac_hwif_entry {
{
.gmac = false,
.gmac4 = false,
+   .xgmac = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -103,6 +105,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = true,
.gmac4 = false,
+   .xgmac = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -119,6 +122,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -135,6 +139,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = DWMAC_CORE_4_00,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -151,6 +156,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = DWMAC_CORE_4_10,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -167,6 +173,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = DWMAC_CORE_5_10,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -180,11 +187,29 @@ static const struct stmmac_hwif_entry {
.tc = _tc_ops,
.setup = dwmac4_setup,
.quirks = NULL,
-   }
+   }, {
+   .gmac = false,
+   .gmac4 = false,
+   .xgmac = true,
+   .min_id = DWXGMAC_CORE_2_10,
+   .regs = {
+   .ptp_off = 0,
+   .mmc_off = 0,
+   },
+   .desc = NULL,
+   .dma = NULL,
+   .mac = NULL,
+   .hwtimestamp = NULL,
+   .mode = NULL,
+   .tc = NULL,
+   .setup = NULL,
+   .quirks = NULL,
+   },
 };
 
 int stmmac_hwif_init(struct stmmac_priv *priv)
 {
+   bool needs_xgmac = priv->plat->has_xgmac;
bool needs_gmac4 = priv->plat->has_gmac4;
bool needs_gmac = priv->plat->has_gmac;
const struct stmmac_hwif_entry *entry;
@@ -195,7 +220,7 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 
if (needs_gmac) {
id = stmmac_get_id(priv, GMAC_VERSION);
-   } else if (needs_gmac4) {
+   } else if (needs_gmac4 || needs_xgmac) {
id = stmmac_get_id(priv, GMAC4_VERSION);
} else {
id = 0;
@@ -229,6 +254,8 @@ int stmmac_hwif_init(struct 

[PATCH v2 net-next 8/9] net: stmmac: Add the bindings parsing for XGMAC2

2018-08-03 Thread Jose Abreu
Add the bindings parsing for XGMAC2 IP block.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c   | 2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
index 3304095c934c..fad503820e04 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
@@ -78,6 +78,8 @@ static const struct of_device_id dwmac_generic_match[] = {
{ .compatible = "snps,dwmac-4.00"},
{ .compatible = "snps,dwmac-4.10a"},
{ .compatible = "snps,dwmac"},
+   { .compatible = "snps,dwxgmac-2.10"},
+   { .compatible = "snps,dwxgmac"},
{ }
 };
 MODULE_DEVICE_TABLE(of, dwmac_generic_match);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 72da77b94ecd..3609c7b696c7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -486,6 +486,12 @@ stmmac_probe_config_dt(struct platform_device *pdev, const 
char **mac)
plat->force_sf_dma_mode = 1;
}
 
+   if (of_device_is_compatible(np, "snps,dwxgmac")) {
+   plat->has_xgmac = 1;
+   plat->pmt = 1;
+   plat->tso_en = of_property_read_bool(np, "snps,tso");
+   }
+
dma_cfg = devm_kzalloc(>dev, sizeof(*dma_cfg),
   GFP_KERNEL);
if (!dma_cfg) {
-- 
2.7.4




[PATCH v2 net-next 0/9] Add support for XGMAC2 in stmmac

2018-08-03 Thread Jose Abreu
This series adds support for the 10 Gigabit IP in stmmac. The IP is called XGMAC2
and has many similarities with GMAC4. Because of this, it is relatively easy to
incorporate the new IP into the stmmac driver by adding a new block and
filling in the necessary callbacks.

The functionality added by this series is still limited, but it is only a
starting point that will be expanded later.

I split the patches by functionality to ease review. Only patch 8/9
actually enables the XGMAC2 block, by adding a new compatible string.

Version 2 addresses review comments from Andrew Lunn.

NOTE: Although the IP supports 10G, so far it has only been possible to test it
at 1G because of shipping problems with the 10G PHY hardware. The iperf3
results at 1G follow:

---
# iperf3 -c 192.168.0.10
Connecting to host 192.168.0.10, port 5201
[  4] local 192.168.0.3 port 39178 connected to 192.168.0.10 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   110 MBytes   920 Mbits/sec    0    482 KBytes
[  4]   1.00-2.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   2.00-3.00   sec   112 MBytes   937 Mbits/sec    0    482 KBytes
[  4]   3.00-4.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   4.00-5.00   sec   112 MBytes   935 Mbits/sec    0    482 KBytes
[  4]   5.00-6.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   6.00-7.00   sec   112 MBytes   937 Mbits/sec    0    482 KBytes
[  4]   7.00-8.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   8.00-9.00   sec   112 MBytes   937 Mbits/sec    0    482 KBytes
[  4]   9.00-10.00  sec   113 MBytes   946 Mbits/sec    0    482 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.09 GBytes   938 Mbits/sec                  receiver
---

Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
Cc: Andrew Lunn 

Jose Abreu (9):
  net: stmmac: Add XGMAC 2.10 HWIF entry
  net: stmmac: Add MAC related callbacks for XGMAC2
  net: stmmac: Add DMA related callbacks for XGMAC2
  net: stmmac: Add descriptor related callbacks for XGMAC2
  net: stmmac: Add MDIO related functions for XGMAC2
  net: stmmac: Add PTP support for XGMAC2
  net: stmmac: Integrate XGMAC into main driver flow
  net: stmmac: Add the bindings parsing for XGMAC2
  dt-bindings: net: stmmac: Add the bindings documentation for XGMAC2.

 Documentation/devicetree/bindings/net/stmmac.txt   |   5 +-
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   3 +-
 drivers/net/ethernet/stmicro/stmmac/common.h   |  17 +-
 .../net/ethernet/stmicro/stmmac/dwmac-generic.c|   2 +
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 227 
 .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c| 371 +++
 .../net/ethernet/stmicro/stmmac/dwxgmac2_descs.c   | 280 ++
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 410 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |  31 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   3 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  67 +++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 101 -
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   6 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c   |   6 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h   |   1 +
 include/linux/stmmac.h |   1 +
 16 files changed, 1497 insertions(+), 34 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c

-- 
2.7.4




[PATCH v2 net-next 3/9] net: stmmac: Add DMA related callbacks for XGMAC2

2018-08-03 Thread Jose Abreu
Add the DMA related callbacks for the new IP block XGMAC2.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h |  56 +++
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 410 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   2 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   1 +
 5 files changed, 469 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index a6cf632c9592..da40d3bba037 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -5,7 +5,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
  dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
- stmmac_tc.o dwxgmac2_core.o $(stmmac-y)
+ stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
index 7832571f791f..ddd23f8559df 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -138,4 +138,60 @@
 #define XGMAC_ABPSIS   BIT(1)
 #define XGMAC_TXUNFIS  BIT(0)
 
+/* DMA Registers */
+#define XGMAC_DMA_MODE 0x3000
+#define XGMAC_SWR  BIT(0)
+#define XGMAC_DMA_SYSBUS_MODE  0x3004
+#define XGMAC_WR_OSR_LMT   GENMASK(29, 24)
+#define XGMAC_WR_OSR_LMT_SHIFT 24
+#define XGMAC_RD_OSR_LMT   GENMASK(21, 16)
+#define XGMAC_RD_OSR_LMT_SHIFT 16
+#define XGMAC_EN_LPI   BIT(15)
+#define XGMAC_LPI_XIT_PKT  BIT(14)
+#define XGMAC_AAL  BIT(12)
+#define XGMAC_BLEN256  BIT(7)
+#define XGMAC_BLEN128  BIT(6)
+#define XGMAC_BLEN64   BIT(5)
+#define XGMAC_BLEN32   BIT(4)
+#define XGMAC_BLEN16   BIT(3)
+#define XGMAC_BLEN8                    BIT(2)
+#define XGMAC_BLEN4                    BIT(1)
+#define XGMAC_UNDEF                    BIT(0)
+#define XGMAC_DMA_CH_CONTROL(x)        (0x3100 + (0x80 * (x)))
+#define XGMAC_PBLx8                    BIT(16)
+#define XGMAC_DMA_CH_TX_CONTROL(x)     (0x3104 + (0x80 * (x)))
+#define XGMAC_TxPBL                    GENMASK(21, 16)
+#define XGMAC_TxPBL_SHIFT              16
+#define XGMAC_TSE                      BIT(12)
+#define XGMAC_OSP                      BIT(4)
+#define XGMAC_TXST                     BIT(0)
+#define XGMAC_DMA_CH_RX_CONTROL(x)     (0x3108 + (0x80 * (x)))
+#define XGMAC_RxPBL                    GENMASK(21, 16)
+#define XGMAC_RxPBL_SHIFT              16
+#define XGMAC_RXST                     BIT(0)
+#define XGMAC_DMA_CH_TxDESC_LADDR(x)   (0x3114 + (0x80 * (x)))
+#define XGMAC_DMA_CH_RxDESC_LADDR(x)   (0x311c + (0x80 * (x)))
+#define XGMAC_DMA_CH_TxDESC_TAIL_LPTR(x)       (0x3124 + (0x80 * (x)))
+#define XGMAC_DMA_CH_RxDESC_TAIL_LPTR(x)       (0x312c + (0x80 * (x)))
+#define XGMAC_DMA_CH_TxDESC_RING_LEN(x)        (0x3130 + (0x80 * (x)))
+#define XGMAC_DMA_CH_RxDESC_RING_LEN(x)        (0x3134 + (0x80 * (x)))
+#define XGMAC_DMA_CH_INT_EN(x)         (0x3138 + (0x80 * (x)))
+#define XGMAC_NIE                      BIT(15)
+#define XGMAC_AIE                      BIT(14)
+#define XGMAC_RBUE                     BIT(7)
+#define XGMAC_RIE                      BIT(6)
+#define XGMAC_TIE                      BIT(0)
+#define XGMAC_DMA_INT_DEFAULT_EN       (XGMAC_NIE | XGMAC_AIE | XGMAC_RBUE | \
+                                       XGMAC_RIE | XGMAC_TIE)
+#define XGMAC_DMA_CH_Rx_WATCHDOG(x)    (0x313c + (0x80 * (x)))
+#define XGMAC_RWT  GENMASK(7, 0)
+#define XGMAC_DMA_CH_STATUS(x) (0x3160 + (0x80 * (x)))
+#define XGMAC_NIS  BIT(15)
+#define XGMAC_AIS  BIT(14)
+#define XGMAC_FBE  BIT(12)
+#define XGMAC_RBU  BIT(7)
+#define XGMAC_RI   BIT(6)
+#define XGMAC_TPS  BIT(1)
+#define XGMAC_TI   BIT(0)
+
 #endif /* __STMMAC_DWXGMAC2_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
new file mode 100644
index ..50d9fffc32b5
--- /dev/null
+++ 

[PATCH v2 net-next 2/9] net: stmmac: Add MAC related callbacks for XGMAC2

2018-08-03 Thread Jose Abreu
Add the MAC related callbacks for the new IP block XGMAC2.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/common.h   |   3 +
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 141 
 .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c| 371 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   4 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   1 +
 6 files changed, 519 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 68e9e2640c62..a6cf632c9592 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -5,7 +5,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
  dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
- stmmac_tc.o $(stmmac-y)
+ stmmac_tc.o dwxgmac2_core.o $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 3fb81acbd274..1854f270ad66 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -400,6 +400,8 @@ struct mac_link {
u32 speed10;
u32 speed100;
u32 speed1000;
+   u32 speed2500;
+   u32 speed1;
u32 duplex;
 };
 
@@ -441,6 +443,7 @@ struct stmmac_rx_routing {
 int dwmac100_setup(struct stmmac_priv *priv);
 int dwmac1000_setup(struct stmmac_priv *priv);
 int dwmac4_setup(struct stmmac_priv *priv);
+int dwxgmac2_setup(struct stmmac_priv *priv);
 
 void stmmac_set_mac_addr(void __iomem *ioaddr, u8 addr[6],
 unsigned int high, unsigned int low);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
new file mode 100644
index ..7832571f791f
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * stmmac XGMAC definitions.
+ */
+
+#ifndef __STMMAC_DWXGMAC2_H__
+#define __STMMAC_DWXGMAC2_H__
+
+#include "common.h"
+
+/* Misc */
+#define XGMAC_JUMBO_LEN                16368
+
+/* MAC Registers */
+#define XGMAC_TX_CONFIG                0x
+#define XGMAC_CONFIG_SS_OFF            29
+#define XGMAC_CONFIG_SS_MASK           GENMASK(30, 29)
+#define XGMAC_CONFIG_SS_1              (0x0 << XGMAC_CONFIG_SS_OFF)
+#define XGMAC_CONFIG_SS_2500           (0x2 << XGMAC_CONFIG_SS_OFF)
+#define XGMAC_CONFIG_SS_1000           (0x3 << XGMAC_CONFIG_SS_OFF)
+#define XGMAC_CONFIG_SARC              GENMASK(22, 20)
+#define XGMAC_CONFIG_SARC_SHIFT        20
+#define XGMAC_CONFIG_JD                BIT(16)
+#define XGMAC_CONFIG_TE                BIT(0)
+#define XGMAC_CORE_INIT_TX             (XGMAC_CONFIG_JD)
+#define XGMAC_RX_CONFIG                0x0004
+#define XGMAC_CONFIG_ARPEN             BIT(31)
+#define XGMAC_CONFIG_GPSL              GENMASK(29, 16)
+#define XGMAC_CONFIG_GPSL_SHIFT        16
+#define XGMAC_CONFIG_S2KP              BIT(11)
+#define XGMAC_CONFIG_IPC               BIT(9)
+#define XGMAC_CONFIG_JE                BIT(8)
+#define XGMAC_CONFIG_WD                BIT(7)
+#define XGMAC_CONFIG_GPSLCE            BIT(6)
+#define XGMAC_CONFIG_CST               BIT(2)
+#define XGMAC_CONFIG_ACS               BIT(1)
+#define XGMAC_CONFIG_RE                BIT(0)
+#define XGMAC_CORE_INIT_RX             0
+#define XGMAC_PACKET_FILTER            0x0008
+#define XGMAC_FILTER_RA                BIT(31)
+#define XGMAC_FILTER_PM                BIT(4)
+#define XGMAC_FILTER_HMC               BIT(2)
+#define XGMAC_FILTER_PR                BIT(0)
+#define XGMAC_HASH_TABLE(x)            (0x0010 + (x) * 4)
+#define XGMAC_RXQ_CTRL0                0x00a0
+#define XGMAC_RXQEN(x)                 GENMASK((x) * 2 + 1, (x) * 2)
+#define XGMAC_RXQEN_SHIFT(x)           ((x) * 2)
+#define XGMAC_RXQ_CTRL2                0x00a8
+#define XGMAC_RXQ_CTRL3                0x00ac
+#define XGMAC_PSRQ(x)                  GENMASK((x) * 8 + 7, (x) * 8)
+#define XGMAC_PSRQ_SHIFT(x)            ((x) * 8)
+#define XGMAC_INT_STATUS               0x00b0
+#define XGMAC_PMTIS  

[PATCH v2 net-next 6/9] net: stmmac: Add PTP support for XGMAC2

2018-08-03 Thread Jose Abreu
XGMAC2 uses the same timestamping engine as GMAC4, so use the same
callbacks.
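To illustrate what the shared adjust-time path hands to the hardware, here is
a small userspace sketch. It assumes the lines elided from the
stmmac_adjust_time() hunk below split the magnitude of delta into whole
seconds and a nanosecond remainder; the struct and function names are
hypothetical, and only the variable roles (sec, nsec, neg_adj, xmac) come from
the patch.

```c
#include <stdint.h>

/* Hypothetical container for the values passed to the systime adjust call. */
struct toy_adj {
	uint32_t sec;
	uint32_t nsec;
	int neg_adj;
};

/* Split a signed nanosecond delta into sign, seconds and nanoseconds. */
struct toy_adj toy_split_delta(int64_t delta_ns)
{
	struct toy_adj adj = { 0, 0, 0 };
	uint64_t mag;

	if (delta_ns < 0) {
		adj.neg_adj = 1;
		/* Negate in unsigned arithmetic so INT64_MIN cannot overflow. */
		mag = (uint64_t)0 - (uint64_t)delta_ns;
	} else {
		mag = (uint64_t)delta_ns;
	}
	adj.sec = (uint32_t)(mag / 1000000000ULL);
	adj.nsec = (uint32_t)(mag % 1000000000ULL);
	return adj;
}

/* Mirrors "xmac = has_gmac4 || has_xgmac" from the patch. */
int toy_uses_gmac4_systime(int has_gmac4, int has_xgmac)
{
	return has_gmac4 || has_xgmac;
}
```

With this split, a delta of -1.5 s becomes sec = 1, nsec = 500000000 with
neg_adj set, and both GMAC4 and XGMAC2 take the same branch.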

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/hwif.c   | 4 ++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c | 6 --
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h | 1 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c 
b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 4b4ba1c8bad5..357309a6d6a5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -193,13 +193,13 @@ static const struct stmmac_hwif_entry {
.xgmac = true,
.min_id = DWXGMAC_CORE_2_10,
.regs = {
-   .ptp_off = 0,
+   .ptp_off = PTP_XGMAC_OFFSET,
.mmc_off = 0,
},
.desc = _desc_ops,
.dma = _dma_ops,
.mac = _ops,
-   .hwtimestamp = NULL,
+   .hwtimestamp = _ptp,
.mode = NULL,
.tc = NULL,
.setup = dwxgmac2_setup,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
index 0cb0e39a2be9..2293e21f789f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
@@ -71,6 +71,9 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 
delta)
u32 sec, nsec;
u32 quotient, reminder;
int neg_adj = 0;
+   bool xmac;
+
+   xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac;
 
if (delta < 0) {
neg_adj = 1;
@@ -82,8 +85,7 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 
delta)
nsec = reminder;
 
spin_lock_irqsave(>ptp_lock, flags);
-   stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj,
-   priv->plat->has_gmac4);
+   stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj, xmac);
spin_unlock_irqrestore(>ptp_lock, flags);
 
return 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
index f4b31d69f60e..ecccf895fd7e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
@@ -21,6 +21,7 @@
 #ifndef __STMMAC_PTP_H__
 #define __STMMAC_PTP_H__
 
+#define PTP_XGMAC_OFFSET       0xd00
 #define PTP_GMAC4_OFFSET       0xb00
 #define PTP_GMAC3_X_OFFSET     0x700
 
-- 
2.7.4




Re: UDP packets arriving on wrong sockets

2018-08-03 Thread Willem de Bruijn
On Fri, Aug 3, 2018 at 12:20 AM Andrew Cann  wrote:
>
> On Thu, Aug 02, 2018 at 11:21:41AM -0400, Willem de Bruijn wrote:
> > You have two sockets bound to the same address and port? Is this using
> > SO_REUSEPORT?
>
> Yes, this is using SO_REUSEPORT.

Then this is working as intended.

Without SO_REUSEPORT it would not be possible to bind two sockets to
the same address and port. See documentation, e.g., at
https://lwn.net/Articles/542629/


Re: [PATCH net 1/4] mlxsw: core_acl_flex_actions: Return error for conflicting actions

2018-08-03 Thread Ido Schimmel
+Stephen

On Fri, Aug 03, 2018 at 03:57:41PM +0300, Ido Schimmel wrote:
> From: Nir Dotan 
> 
> Spectrum switch ACL action set is built in groups of three actions
> which may point to additional actions. A group holds a single record
> which can be set as goto record for pointing at a following group
> or can be set to mark the termination of the lookup. This is perfectly
> adequate for handling a series of actions to be executed on a packet.
> While the SW model allows configuration of conflicting actions
> where it is clear that some actions will never execute, the mlxsw
> driver must block such configurations as it creates a conflict
> over the single terminate/goto record value.
...
> Where it is clear that the last action will never execute, the
> mlxsw driver was issuing a warning instead of returning an error.
> Therefore replace that warning with an error for this specific
> case.
> 
> Fixes: 4cda7d8d7098 ("mlxsw: core: Introduce flexible actions support")
> Signed-off-by: Nir Dotan 
> Reviewed-by: Jiri Pirko 
> Signed-off-by: Ido Schimmel 
> ---
>  .../mellanox/mlxsw/core_acl_flex_actions.c| 42 +--

Dave / Stephen, please note that this is going to conflict with recent
extack changes in net-next when you merge net into net-next.

Resolution is available here:
https://github.com/jpirko/linux_mlxsw/blob/combined_queue/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c

Thanks and sorry about the conflict


[PATCH net 3/4] mlxsw: core_acl_flex_actions: Remove redundant counter destruction

2018-08-03 Thread Ido Schimmel
From: Nir Dotan 

Each tc flower rule uses a hidden count action. As the counter resource may
not be available due to limited HW resources, update the _counter_create()
and _counter_destroy() pair to follow the previously introduced symmetric
error condition handling: add a call to mlxsw_afa_resource_del() as part
of the counter resource destruction.

Fixes: c18c1e186ba8 ("mlxsw: core: Make counter index allocated inside the 
action append")
Signed-off-by: Nir Dotan 
Reviewed-by: Petr Machata 
Reviewed-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
index d664cc0289c2..a54f23f00a5f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
@@ -584,6 +584,7 @@ static void
 mlxsw_afa_counter_destroy(struct mlxsw_afa_block *block,
  struct mlxsw_afa_counter *counter)
 {
+   mlxsw_afa_resource_del(>resource);
block->afa->ops->counter_index_put(block->afa->ops_priv,
   counter->counter_index);
kfree(counter);
-- 
2.17.1



[PATCH net 0/4] mlxsw: Fix ACL actions error condition handling

2018-08-03 Thread Ido Schimmel
Nir says:

Two issues were lately noticed within mlxsw ACL actions error condition
handling. The first patch deals with conflicting actions such as:

 # tc filter add dev swp49 parent : \
   protocol ip pref 10 flower skip_sw dst_ip 192.168.101.1 \
   action goto chain 100 \
   action mirred egress redirect dev swp4

The second action will never execute. The SW model allows this
configuration, but the mlxsw driver cannot, as it
implements actions in sets of up to three actions per set with a single
termination marking. Conflicting actions create a contradiction over
this single marking and thus cannot be configured. The fix replaces a
misplaced warning with an error code to be returned.

Patches 2-4 fix duplicate destruction of resources. Some
actions require allocation of a specific resource prior to setting the
action itself. On an error condition this resource was destroyed twice,
leading to a crash when using the mirror action, and to a redundant
destruction in other cases, since on an error condition rule destruction
also takes care of resource destruction. To fix this, symmetry
is added: resource destruction now also takes care
of removing the resource from the rule's resource list.

Nir Dotan (4):
  mlxsw: core_acl_flex_actions: Return error for conflicting actions
  mlxsw: core_acl_flex_actions: Remove redundant resource destruction
  mlxsw: core_acl_flex_actions: Remove redundant counter destruction
  mlxsw: core_acl_flex_actions: Remove redundant mirror resource
destruction

 .../mellanox/mlxsw/core_acl_flex_actions.c| 51 +++
 1 file changed, 29 insertions(+), 22 deletions(-)

-- 
2.17.1



[PATCH net 1/4] mlxsw: core_acl_flex_actions: Return error for conflicting actions

2018-08-03 Thread Ido Schimmel
From: Nir Dotan 

Spectrum switch ACL action set is built in groups of three actions
which may point to additional actions. A group holds a single record
which can be set as goto record for pointing at a following group
or can be set to mark the termination of the lookup. This is perfectly
adequate for handling a series of actions to be executed on a packet.
While the SW model allows configuration of conflicting actions
where it is clear that some actions will never execute, the mlxsw
driver must block such configurations as it creates a conflict
over the single terminate/goto record value.

For a conflicting actions configuration such as:

 # tc filter add dev swp49 parent : \
   protocol ip pref 10 \
   flower skip_sw dst_ip 192.168.101.1 \
   action goto chain 100 \
   action mirred egress mirror dev swp4

Where it is clear that the last action will never execute, the
mlxsw driver was issuing a warning instead of returning an error.
Therefore replace that warning with an error for this specific
case.
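
The change in error reporting can be modelled in userspace roughly as follows.
This is a toy sketch, not the driver code: the toy_* helpers stand in for the
kernel's ERR_PTR(), IS_ERR() and PTR_ERR(), the set count is arbitrary, and
only the "three actions per set" figure and the two error codes are taken from
this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define TOY_MAX_ERRNO 4095
static inline void *toy_err_ptr(long err) { return (void *)err; }
static inline bool toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}
static inline long toy_ptr_err(const void *p) { return (long)p; }

#define TOY_ACTS_PER_SET 3	/* groups of three, per the commit message */
#define TOY_MAX_SETS     4	/* arbitrary limit for this sketch */

struct toy_block {
	bool finished;		/* goto/terminate record already claimed */
	int cur_set;
	int cur_act;
	char acts[TOY_MAX_SETS][TOY_ACTS_PER_SET];
};

/* Returns a slot for the next action, or an encoded error pointer. */
char *toy_append_action(struct toy_block *b)
{
	if (b->finished)
		return toy_err_ptr(-EINVAL);	/* conflicting action */
	if (b->cur_act == TOY_ACTS_PER_SET) {
		if (b->cur_set + 1 == TOY_MAX_SETS)
			return toy_err_ptr(-ENOBUFS);	/* no room for a new set */
		b->cur_set++;
		b->cur_act = 0;
	}
	return &b->acts[b->cur_set][b->cur_act++];
}

/* A goto claims the single terminate/goto record of the block. */
int toy_append_goto(struct toy_block *b)
{
	if (b->finished)
		return -EINVAL;
	b->finished = true;
	return 0;
}

/* Callers propagate the specific error instead of assuming -ENOBUFS. */
int toy_append_mirror(struct toy_block *b)
{
	char *act = toy_append_action(b);

	if (toy_is_err(act))
		return (int)toy_ptr_err(act);
	*act = 'M';
	return 0;
}
```

Returning an encoded pointer lets the caller tell "conflicting action" apart
from "out of buffers", which a bare NULL could not express.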

Fixes: 4cda7d8d7098 ("mlxsw: core: Introduce flexible actions support")
Signed-off-by: Nir Dotan 
Reviewed-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 .../mellanox/mlxsw/core_acl_flex_actions.c| 42 +--
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
index 3c0d882ba183..ce280680258e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
@@ -626,8 +626,8 @@ static char *mlxsw_afa_block_append_action(struct 
mlxsw_afa_block *block,
char *oneact;
char *actions;
 
-   if (WARN_ON(block->finished))
-   return NULL;
+   if (block->finished)
+   return ERR_PTR(-EINVAL);
if (block->cur_act_index + action_size >
block->afa->max_acts_per_set) {
struct mlxsw_afa_set *set;
@@ -637,7 +637,7 @@ static char *mlxsw_afa_block_append_action(struct 
mlxsw_afa_block *block,
 */
set = mlxsw_afa_set_create(false);
if (!set)
-   return NULL;
+   return ERR_PTR(-ENOBUFS);
set->prev = block->cur_set;
block->cur_act_index = 0;
block->cur_set->next = set;
@@ -724,8 +724,8 @@ int mlxsw_afa_block_append_vlan_modify(struct 
mlxsw_afa_block *block,
  MLXSW_AFA_VLAN_CODE,
  MLXSW_AFA_VLAN_SIZE);
 
-   if (!act)
-   return -ENOBUFS;
+   if (IS_ERR(act))
+   return PTR_ERR(act);
mlxsw_afa_vlan_pack(act, MLXSW_AFA_VLAN_VLAN_TAG_CMD_NOP,
MLXSW_AFA_VLAN_CMD_SET_OUTER, vid,
MLXSW_AFA_VLAN_CMD_SET_OUTER, pcp,
@@ -806,8 +806,8 @@ int mlxsw_afa_block_append_drop(struct mlxsw_afa_block 
*block)
  MLXSW_AFA_TRAPDISC_CODE,
  MLXSW_AFA_TRAPDISC_SIZE);
 
-   if (!act)
-   return -ENOBUFS;
+   if (IS_ERR(act))
+   return PTR_ERR(act);
mlxsw_afa_trapdisc_pack(act, MLXSW_AFA_TRAPDISC_TRAP_ACTION_NOP,
MLXSW_AFA_TRAPDISC_FORWARD_ACTION_DISCARD, 0);
return 0;
@@ -820,8 +820,8 @@ int mlxsw_afa_block_append_trap(struct mlxsw_afa_block 
*block, u16 trap_id)
  MLXSW_AFA_TRAPDISC_CODE,
  MLXSW_AFA_TRAPDISC_SIZE);
 
-   if (!act)
-   return -ENOBUFS;
+   if (IS_ERR(act))
+   return PTR_ERR(act);
mlxsw_afa_trapdisc_pack(act, MLXSW_AFA_TRAPDISC_TRAP_ACTION_TRAP,
MLXSW_AFA_TRAPDISC_FORWARD_ACTION_DISCARD,
trap_id);
@@ -836,8 +836,8 @@ int mlxsw_afa_block_append_trap_and_forward(struct 
mlxsw_afa_block *block,
  MLXSW_AFA_TRAPDISC_CODE,
  MLXSW_AFA_TRAPDISC_SIZE);
 
-   if (!act)
-   return -ENOBUFS;
+   if (IS_ERR(act))
+   return PTR_ERR(act);
mlxsw_afa_trapdisc_pack(act, MLXSW_AFA_TRAPDISC_TRAP_ACTION_TRAP,
MLXSW_AFA_TRAPDISC_FORWARD_ACTION_FORWARD,
trap_id);
@@ -908,8 +908,8 @@ mlxsw_afa_block_append_allocated_mirror(struct 
mlxsw_afa_block *block,
char *act = mlxsw_afa_block_append_action(block,
  MLXSW_AFA_TRAPDISC_CODE,
  MLXSW_AFA_TRAPDISC_SIZE);
-   if (!act)
-   return -ENOBUFS;
+   if (IS_ERR(act))
+   

[PATCH net 2/4] mlxsw: core_acl_flex_actions: Remove redundant resource destruction

2018-08-03 Thread Ido Schimmel
From: Nir Dotan 

Some ACL actions require the allocation of a separate resource
prior to applying the action itself. When facing an error condition
during the setup phase of the action, the resource should be destroyed.
For such actions the destruction was done twice, which is dangerous
and could lead to a crash.
The destruction took place first upon error in the action setup phase
and then again when the rule was destroyed.

The following sequence generated a crash:

 # tc qdisc add dev swp49 ingress
 # tc filter add dev swp49 parent : \
   protocol ip chain 100 pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action drop
 # tc filter add dev swp49 parent : \
   protocol ip pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
   action mirred egress mirror dev swp4

Therefore add mlxsw_afa_resource_del() as a complement of
mlxsw_afa_resource_add() to add symmetry to resource_list membership
handling. Call this from mlxsw_afa_fwd_entry_ref_destroy() to make the
_fwd_entry_ref_create() and _fwd_entry_ref_destroy() pair of calls a
NOP.
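
The symmetry argument above can be illustrated with a small user-space
sketch. It is only a model under stated assumptions: the struct layout,
the hand-rolled doubly linked list and every function name below are
hypothetical stand-ins, not the mlxsw code. The point it shows is that
once a destructor unlinks its own resource, destroying a resource on the
error path and later tearing down the whole block runs each destructor
exactly once.

```c
#include <stdlib.h>

struct block;

/* Hypothetical stand-in for a per-block resource with a destructor. */
struct resource {
	struct resource *prev, *next;
	void (*destructor)(struct block *blk, struct resource *res);
};

struct block {
	struct resource head;	/* sentinel of a circular list */
	int destructor_calls;	/* bookkeeping for the illustration only */
};

static void block_init(struct block *blk)
{
	blk->head.prev = &blk->head;
	blk->head.next = &blk->head;
	blk->head.destructor = NULL;
	blk->destructor_calls = 0;
}

static void resource_add(struct block *blk, struct resource *res)
{
	res->next = blk->head.next;
	res->prev = &blk->head;
	blk->head.next->prev = res;
	blk->head.next = res;
}

/* Complement of resource_add(): unlink the resource from its list. */
static void resource_del(struct resource *res)
{
	res->prev->next = res->next;
	res->next->prev = res->prev;
}

/* The destructor unlinks first, so create + destroy leaves no trace. */
static void ref_destroy(struct block *blk, struct resource *res)
{
	resource_del(res);
	blk->destructor_calls++;
	free(res);
}

static struct resource *ref_create(struct block *blk)
{
	struct resource *res = malloc(sizeof(*res));

	if (!res)
		return NULL;
	res->destructor = ref_destroy;
	resource_add(blk, res);
	return res;
}

/* Block teardown: only resources still on the list are destroyed. */
static void resources_destroy(struct block *blk)
{
	struct resource *res = blk->head.next;

	while (res != &blk->head) {
		struct resource *next = res->next;

		res->destructor(blk, res);
		res = next;
	}
}

/* Error path destroys one resource, then the block is torn down. */
int demo_destructor_calls(void)
{
	struct block blk;
	struct resource *a, *b;

	block_init(&blk);
	a = ref_create(&blk);
	b = ref_create(&blk);
	if (!a || !b) {
		resources_destroy(&blk);
		return -1;
	}
	ref_destroy(&blk, b);		/* action setup failed */
	resources_destroy(&blk);	/* rule destroyed later */
	return blk.destructor_calls;
}
```

In this model the second resource is no longer on the list when the
block is torn down, so it cannot be destroyed a second time.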

Fixes: 140ce421217e ("mlxsw: core: Convert fwd_entry_ref list to be generic per-block resource list")
Signed-off-by: Nir Dotan 
Reviewed-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 .../net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c| 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
index ce280680258e..d664cc0289c2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
@@ -327,12 +327,16 @@ static void mlxsw_afa_resource_add(struct mlxsw_afa_block 
*block,
list_add(&resource->list, &block->resource_list);
 }
 
+static void mlxsw_afa_resource_del(struct mlxsw_afa_resource *resource)
+{
+   list_del(&resource->list);
+}
+
 static void mlxsw_afa_resources_destroy(struct mlxsw_afa_block *block)
 {
struct mlxsw_afa_resource *resource, *tmp;
 
list_for_each_entry_safe(resource, tmp, &block->resource_list, list) {
-   list_del(&resource->list);
resource->destructor(block, resource);
}
 }
@@ -530,6 +534,7 @@ static void
 mlxsw_afa_fwd_entry_ref_destroy(struct mlxsw_afa_block *block,
struct mlxsw_afa_fwd_entry_ref *fwd_entry_ref)
 {
+   mlxsw_afa_resource_del(&fwd_entry_ref->resource);
mlxsw_afa_fwd_entry_put(block->afa, fwd_entry_ref->fwd_entry);
kfree(fwd_entry_ref);
 }
-- 
2.17.1



[PATCH net 4/4] mlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction

2018-08-03 Thread Ido Schimmel
From: Nir Dotan 

In the previous patch mlxsw_afa_resource_del() was added to avoid a duplicate
resource destruction scenario.
For mirror actions, such duplicate destruction leads to a crash as in:

 # tc qdisc add dev swp49 ingress
 # tc filter add dev swp49 parent : \
   protocol ip chain 100 pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action drop
 # tc filter add dev swp49 parent : \
   protocol ip pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
   action mirred egress mirror dev swp4

Therefore add a call to mlxsw_afa_resource_del() in
mlxsw_afa_mirror_destroy() in order to clear that resource
from the rule's resources.

Fixes: d0d13c1858a1 ("mlxsw: spectrum_acl: Add support for mirror action")
Signed-off-by: Nir Dotan 
Reviewed-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
index a54f23f00a5f..f6f6a568d66a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
@@ -862,6 +862,7 @@ static void
 mlxsw_afa_mirror_destroy(struct mlxsw_afa_block *block,
 struct mlxsw_afa_mirror *mirror)
 {
+   mlxsw_afa_resource_del(&mirror->resource);
block->afa->ops->mirror_del(block->afa->ops_priv,
mirror->local_in_port,
mirror->span_id,
-- 
2.17.1



Re: [PATCH v8 bpf-next 00/10] veth: Driver XDP

2018-08-03 Thread Toshiaki Makita

On 18/08/03 (Fri) 18:45, Jesper Dangaard Brouer wrote:
> On Fri,  3 Aug 2018 16:58:08 +0900
> Toshiaki Makita  wrote:
>
> > This patch set introduces driver XDP for veth.
> > Basically this is used in conjunction with redirect action of another XDP
> > program.
> >
> >   NIC ---> veth===veth
> >  (XDP) (redirect)(XDP)
>
> I was playing with V7 on my testlab yesterday and I noticed one
> fundamental issue.  You are not updating the "ifconfig" stats counters,
> when in XDP mode.  This makes receive or send via XDP invisible to
> sysadm/management tools.  This for-sure is going to cause confusion...


Yes, I did not update stats on ndo_xdp_xmit. My intention was that I'm 
going to make another patch set to make stats nice after this, but did 
not state that in the cover letter. Sorry about that.



> I took a closer look at other drivers. The ixgbe driver is doing the
> right thing.  Driver i40e has a bug, where RX/TX stats are getting
> swapped (strange!).  The mlx5 driver is not updating the regular RX/TX
> counters, but A LOT of other ethtool stats counters (which are the ones
> I usually monitor when testing).
>
> So, given other drivers also didn't get this right, we need to have a
> discussion outside your/this patchset.  Thus, I don't want to
> stop/stall this patchset, but this is something we need to fixup in a
> followup patchset to other drivers as well.


One of the reasons why I did not include the stats patches in this series 
is that, as you say, stats in many drivers basically do not look correct 
and I thought correctness was not strictly required for now.
In fact I recently fixed virtio_net stats, which only updated the packets 
counter but not the bytes counter on XDP_DROP.


Another reason is that it will hurt the performance without more 
aggressive stats structure change. Drop counter is currently atomic so 
it would cause heavy cache contention on multiqueue env. The plan is to 
make this per-cpu or per-queue first. Also I want to introduce per-queue 
stats for ethtool, so the change would be relatively big and probably 
not fit in this series all together.
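
To make the contention argument concrete, here is a minimal user-space
sketch of per-queue counters that are only summed when somebody reads
them. Everything in it is an assumption made for illustration (the
struct names, four queues, a 64-byte cache line); it is not the veth
code and not the layout the followup patches will use.

```c
#include <stdint.h>

#define NUM_QUEUES 4	/* hypothetical queue count */
#define CACHELINE  64	/* assumed cache line size */

/*
 * One counter set per queue. Aligning each set to a cache line keeps a
 * queue's updates from bouncing a line shared with another queue.
 */
struct queue_stats {
	_Alignas(CACHELINE) uint64_t packets;
	uint64_t bytes;
	uint64_t drops;
};

struct dev_stats {
	struct queue_stats q[NUM_QUEUES];
};

/* Fast path: each queue touches only its own counters, no atomics. */
void stats_rx(struct dev_stats *s, unsigned int queue, uint64_t len)
{
	s->q[queue].packets++;
	s->q[queue].bytes += len;
}

void stats_drop(struct dev_stats *s, unsigned int queue)
{
	s->q[queue].drops++;
}

/* Slow path: the reader folds all per-queue counters into one total. */
struct queue_stats stats_sum(const struct dev_stats *s)
{
	struct queue_stats total = { 0 };

	for (unsigned int i = 0; i < NUM_QUEUES; i++) {
		total.packets += s->q[i].packets;
		total.bytes += s->q[i].bytes;
		total.drops += s->q[i].drops;
	}
	return total;
}
```

The sketch assumes each queue is updated from a single context; a real
driver also has to make 64-bit reads consistent on 32-bit machines,
which is left out here.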



> Thus, I'm acking the patchset, but I request that we do a joint effort
> of fixing this as followup patches.


Sure, at least for veth I'm going to make followup patches.


> Acked-by: Jesper Dangaard Brouer


Thank you for your thorough review!

Toshiaki Makita


Re: [PATCH net 0/5] tcp: more robust ooo handling

2018-08-03 Thread David Woodhouse
On Mon, 2018-07-23 at 12:03 -0700, David Miller wrote:
> From: Eric Dumazet 
> Date: Mon, 23 Jul 2018 09:28:16 -0700
> 
> > Juha-Matti Tilli reported that malicious peers could inject tiny
> > packets in out_of_order_queue, forcing very expensive calls
> > to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
> > every incoming packet.
> > 
> > With tcp_rmem[2] default of 6MB, the ooo queue could
> > contain ~7000 nodes.
> > 
> > This patch series makes sure we cut cpu cycles enough to
> > render the attack not critical.
> > 
> > We might in the future go further, like disconnecting
> > or black-holing proven malicious flows.
> 
> Sucky...
> 
> It took me a while to understand the sums_tiny logic, every
> time I read that function I forget that we reset all of the
> state and restart the loop after a coalesce inside the loop.
> 
> Series applied, and queued up for -stable.

I see the first four in 4.9.116 but not the fifth (adding
tcp_ooo_try_coalesce()).

Is that intentional? 



[PATCH net-next 3/3] l2tp: ignore L2TP_ATTR_MTU

2018-08-03 Thread Guillaume Nault
This attribute's handling is broken. It can only be used when creating
Ethernet pseudo-wires, in which case its value can be used as the
initial MTU for the l2tpeth device.
However, when handling update requests, L2TP_ATTR_MTU only modifies
session->mtu. This value is never propagated to the l2tpeth device.
Dump requests also return the value of session->mtu, which is not
synchronised anymore with the device MTU.

The same problem occurs if the device MTU is properly updated using the
generic IFLA_MTU attribute. In this case, session->mtu is not updated,
and L2TP_ATTR_MTU will report an invalid value again when dumping the
session.

It does not seem worthwhile to complexify l2tp_eth.c to synchronise
session->mtu with the device MTU. Even the ip-l2tp manpage advises to
use 'ip link' to initialise the MTU of l2tpeth devices (iproute2 does
not handle L2TP_ATTR_MTU at all anyway). So let's just ignore it
entirely.

Signed-off-by: Guillaume Nault 
---
 include/uapi/linux/l2tp.h |  2 +-
 net/l2tp/l2tp_core.c  |  1 -
 net/l2tp/l2tp_core.h  |  2 --
 net/l2tp/l2tp_debugfs.c   |  3 +--
 net/l2tp/l2tp_eth.c   | 17 +++--
 net/l2tp/l2tp_netlink.c   |  9 +
 6 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/include/uapi/linux/l2tp.h b/include/uapi/linux/l2tp.h
index 8bb8c7cfabe5..61158f5a1a5b 100644
--- a/include/uapi/linux/l2tp.h
+++ b/include/uapi/linux/l2tp.h
@@ -119,7 +119,7 @@ enum {
L2TP_ATTR_IP_DADDR, /* u32 */
L2TP_ATTR_UDP_SPORT,/* u16 */
L2TP_ATTR_UDP_DPORT,/* u16 */
-   L2TP_ATTR_MTU,  /* u16 */
+   L2TP_ATTR_MTU,  /* u16 (not used) */
L2TP_ATTR_MRU,  /* u16 (not used) */
L2TP_ATTR_STATS,/* nested */
L2TP_ATTR_IP6_SADDR,/* struct in6_addr */
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index c61a467fd9b8..ac6a00bcec71 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1674,7 +1674,6 @@ struct l2tp_session *l2tp_session_create(int priv_size, 
struct l2tp_tunnel *tunn
if (cfg) {
session->pwtype = cfg->pw_type;
session->debug = cfg->debug;
-   session->mtu = cfg->mtu;
session->send_seq = cfg->send_seq;
session->recv_seq = cfg->recv_seq;
session->lns_mode = cfg->lns_mode;
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 1ca39629031b..5804065dfbfb 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -64,7 +64,6 @@ struct l2tp_session_cfg {
int peer_cookie_len; /* 0, 4 or 8 bytes */
int reorder_timeout; /* configured reorder timeout
  * (in jiffies) */
-   int mtu;
char*ifname;
 };
 
@@ -108,7 +107,6 @@ struct l2tp_session {
int reorder_timeout; /* configured reorder timeout
  * (in jiffies) */
int reorder_skip;   /* set if skip to next nr */
-   int mtu;
enum l2tp_pwtypepwtype;
struct l2tp_stats   stats;
struct hlist_node   global_hlist;   /* Global hash list node */
diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
index aee271741f5b..9821a1458555 100644
--- a/net/l2tp/l2tp_debugfs.c
+++ b/net/l2tp/l2tp_debugfs.c
@@ -191,8 +191,7 @@ static void l2tp_dfs_seq_session_show(struct seq_file *m, 
void *v)
if (session->send_seq || session->recv_seq)
seq_printf(m, "   nr %hu, ns %hu\n", session->nr, session->ns);
seq_printf(m, "   refcnt %d\n", refcount_read(&session->ref_count));
-   seq_printf(m, "   config %d/0/%c/%c/-/%s %08x %u\n",
-  session->mtu,
+   seq_printf(m, "   config 0/0/%c/%c/-/%s %08x %u\n",
   session->recv_seq ? 'R' : '-',
   session->send_seq ? 'S' : '-',
   session->lns_mode ? "LNS" : "LAC",
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index cfca5e63ae31..3728986ec885 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -234,14 +234,11 @@ static void l2tp_eth_adjust_mtu(struct l2tp_tunnel 
*tunnel,
overhead += sizeof(struct udphdr);
dev->needed_headroom += sizeof(struct udphdr);
}
-   if (session->mtu != 0) {
-   dev->mtu = session->mtu;
-   dev->needed_headroom += session->hdr_len;
-   return;
-   }
+
lock_sock(tunnel->sock);
l3_overhead = kernel_sock_ip_overhead(tunnel->sock);
release_sock(tunnel->sock);
+
if (l3_overhead == 0) {
/* L3 Overhead couldn't be identified, this could be
 * because 

[PATCH net-next 1/3] l2tp: define l2tp_tunnel_dst_mtu()

2018-08-03 Thread Guillaume Nault
Consolidate retrieval of tunnel's socket mtu in order to simplify
l2tp_eth and l2tp_ppp a bit.

Signed-off-by: Guillaume Nault 
---
 net/l2tp/l2tp_core.h | 18 ++
 net/l2tp/l2tp_eth.c  | 14 --
 net/l2tp/l2tp_ppp.c  | 15 ---
 3 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index fa5ae9432d38..1ca39629031b 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -12,6 +12,9 @@
 #ifndef _L2TP_CORE_H_
 #define _L2TP_CORE_H_
 
+#include 
+#include 
+
 /* Just some random numbers */
 #define L2TP_TUNNEL_MAGIC  0x42114DDA
 #define L2TP_SESSION_MAGIC 0x0C04EB7D
@@ -268,6 +271,21 @@ static inline int l2tp_get_l2specific_len(struct 
l2tp_session *session)
}
 }
 
+static inline u32 l2tp_tunnel_dst_mtu(const struct l2tp_tunnel *tunnel)
+{
+   struct dst_entry *dst;
+   u32 mtu;
+
+   dst = sk_dst_get(tunnel->sock);
+   if (!dst)
+   return 0;
+
+   mtu = dst_mtu(dst);
+   dst_release(dst);
+
+   return mtu;
+}
+
 #define l2tp_printk(ptr, type, func, fmt, ...) \
 do {   \
if (((ptr)->debug) & (type))\
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 5c366ecfa1cb..cfca5e63ae31 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -226,8 +226,8 @@ static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
struct net_device *dev)
 {
unsigned int overhead = 0;
-   struct dst_entry *dst;
u32 l3_overhead = 0;
+   u32 mtu;
 
/* if the encap is UDP, account for UDP header size */
if (tunnel->encap == L2TP_ENCAPTYPE_UDP) {
@@ -256,15 +256,9 @@ static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
overhead += session->hdr_len + ETH_HLEN + l3_overhead;
 
/* If PMTU discovery was enabled, use discovered MTU on L2TP device */
-   dst = sk_dst_get(tunnel->sock);
-   if (dst) {
-   /* dst_mtu will use PMTU if found, else fallback to intf MTU */
-   u32 pmtu = dst_mtu(dst);
-
-   if (pmtu != 0)
-   dev->mtu = pmtu;
-   dst_release(dst);
-   }
+   mtu = l2tp_tunnel_dst_mtu(tunnel);
+   if (mtu)
+   dev->mtu = mtu;
session->mtu = dev->mtu - overhead;
dev->mtu = session->mtu;
dev->needed_headroom += session->hdr_len;
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 44cac66284a5..1c6da02f976a 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -93,7 +93,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -554,7 +553,7 @@ static void pppol2tp_show(struct seq_file *m, void *arg)
 static void pppol2tp_session_init(struct l2tp_session *session)
 {
struct pppol2tp_session *ps;
-   struct dst_entry *dst;
+   u32 mtu;
 
session->recv_skb = pppol2tp_recv;
 #if IS_ENABLED(CONFIG_L2TP_DEBUGFS)
@@ -566,15 +565,9 @@ static void pppol2tp_session_init(struct l2tp_session 
*session)
ps->owner = current->pid;
 
/* If PMTU discovery was enabled, use the MTU that was discovered */
-   dst = sk_dst_get(session->tunnel->sock);
-   if (dst) {
-   u32 pmtu = dst_mtu(dst);
-
-   if (pmtu)
-   session->mtu = pmtu - PPPOL2TP_HEADER_OVERHEAD;
-
-   dst_release(dst);
-   }
+   mtu = l2tp_tunnel_dst_mtu(session->tunnel);
+   if (mtu)
+   session->mtu = mtu - PPPOL2TP_HEADER_OVERHEAD;
 }
 
 struct l2tp_connect_info {
-- 
2.18.0



[PATCH net-next 2/3] l2tp: simplify MTU handling in l2tp_ppp

2018-08-03 Thread Guillaume Nault
The value of the session's .mtu field, as defined by
pppol2tp_connect() or pppol2tp_session_create(), is later overwritten
by pppol2tp_session_init() (unless getting the tunnel's socket PMTU
fails). This field is then only used when setting the PPP channel's MTU
in pppol2tp_connect().
Furthermore, the SIOC[GS]IFMTU ioctls only act on the session's .mtu
without propagating this value to the PPP channel, making them useless.

This patch initialises the PPP channel's MTU directly and ignores the
session's .mtu entirely. MTU is still computed by subtracting the
PPPOL2TP_HEADER_OVERHEAD constant. It is not optimal, but that doesn't
really matter: po->chan.mtu is only used when the channel is part of a
multilink PPP bundle. Running multilink PPP over packet switched
networks is certainly not going to be efficient, so not picking the
best MTU does no harm (in the worst case, packets will just be
fragmented by the underlay).
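
As a worked example of the computation described above, the following
user-space sketch mirrors the fallback logic: subtract the header
overhead from the tunnel's path MTU, and fall back to a 1500-byte link
when no usable MTU is known. The value 40 for
PPPOL2TP_HEADER_OVERHEAD is an assumption made for the arithmetic (the
patch text only uses the name), and tunnel_payload_mtu() is a
hypothetical helper, not a kernel function.

```c
/* Assumed value for illustration; the patch only refers to the name. */
#define PPPOL2TP_HEADER_OVERHEAD 40
#define DEFAULT_LINK_MTU 1500

/*
 * dst_mtu is the MTU reported for the tunnel's route, or 0 when it
 * could not be determined. Values too small to hold the headers are
 * treated like "unknown" and replaced by the default.
 */
int tunnel_payload_mtu(unsigned int dst_mtu)
{
	if (dst_mtu <= PPPOL2TP_HEADER_OVERHEAD)
		return DEFAULT_LINK_MTU - PPPOL2TP_HEADER_OVERHEAD;

	return (int)dst_mtu - PPPOL2TP_HEADER_OVERHEAD;
}
```

Under that assumption a 1400-byte path MTU yields a 1360-byte channel
MTU, while an unknown path MTU yields 1460.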

The SIOC[GS]IFMTU ioctls are removed entirely (as opposed to simply
ignored), because these ioctl commands are part of the requests that
should be handled generically by the socket layer. PX_PROTO_OL2TP was
the only socket type abusing these ioctls.

Signed-off-by: Guillaume Nault 
---
 net/l2tp/l2tp_ppp.c | 67 -
 1 file changed, 18 insertions(+), 49 deletions(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 1c6da02f976a..b403728e2757 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -553,7 +553,6 @@ static void pppol2tp_show(struct seq_file *m, void *arg)
 static void pppol2tp_session_init(struct l2tp_session *session)
 {
struct pppol2tp_session *ps;
-   u32 mtu;
 
session->recv_skb = pppol2tp_recv;
 #if IS_ENABLED(CONFIG_L2TP_DEBUGFS)
@@ -563,11 +562,6 @@ static void pppol2tp_session_init(struct l2tp_session 
*session)
ps = l2tp_session_priv(session);
mutex_init(&ps->sk_lock);
ps->owner = current->pid;
-
-   /* If PMTU discovery was enabled, use the MTU that was discovered */
-   mtu = l2tp_tunnel_dst_mtu(session->tunnel);
-   if (mtu)
-   session->mtu = mtu - PPPOL2TP_HEADER_OVERHEAD;
 }
 
 struct l2tp_connect_info {
@@ -654,6 +648,22 @@ static int pppol2tp_sockaddr_get_info(const void *sa, int 
sa_len,
return 0;
 }
 
+/* Rough estimation of the maximum payload size a tunnel can transmit without
+ * fragmenting at the lower IP layer. Assumes L2TPv2 with sequence
+ * numbers and no IP option. Not quite accurate, but the result is mostly
+ * unused anyway.
+ */
+static int pppol2tp_tunnel_mtu(const struct l2tp_tunnel *tunnel)
+{
+   int mtu;
+
+   mtu = l2tp_tunnel_dst_mtu(tunnel);
+   if (mtu <= PPPOL2TP_HEADER_OVERHEAD)
+   return 1500 - PPPOL2TP_HEADER_OVERHEAD;
+
+   return mtu - PPPOL2TP_HEADER_OVERHEAD;
+}
+
 /* connect() handler. Attach a PPPoX socket to a tunnel UDP socket
  */
 static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr,
@@ -771,8 +781,6 @@ static int pppol2tp_connect(struct socket *sock, struct 
sockaddr *uservaddr,
goto end;
}
} else {
-   /* Default MTU must allow space for UDP/L2TP/PPP headers */
-   cfg.mtu = 1500 - PPPOL2TP_HEADER_OVERHEAD;
cfg.pw_type = L2TP_PWTYPE_PPP;
 
session = l2tp_session_create(sizeof(struct pppol2tp_session),
@@ -817,7 +825,7 @@ static int pppol2tp_connect(struct socket *sock, struct 
sockaddr *uservaddr,
 
po->chan.private = sk;
po->chan.ops = &pppol2tp_chan_ops;
-   po->chan.mtu = session->mtu;
+   po->chan.mtu = pppol2tp_tunnel_mtu(tunnel);
 
error = ppp_register_net_channel(sock_net(sk), &po->chan);
if (error) {
@@ -873,10 +881,6 @@ static int pppol2tp_session_create(struct net *net, struct 
l2tp_tunnel *tunnel,
goto err;
}
 
-   /* Default MTU values. */
-   if (cfg->mtu == 0)
-   cfg->mtu = 1500 - PPPOL2TP_HEADER_OVERHEAD;
-
/* Allocate and initialize a new session context. */
session = l2tp_session_create(sizeof(struct pppol2tp_session),
  tunnel, session_id,
@@ -1040,7 +1044,6 @@ static void pppol2tp_copy_stats(struct pppol2tp_ioc_stats 
*dest,
 static int pppol2tp_session_ioctl(struct l2tp_session *session,
  unsigned int cmd, unsigned long arg)
 {
-   struct ifreq ifr;
int err = 0;
struct sock *sk;
int val = (int) arg;
@@ -1056,39 +1059,6 @@ static int pppol2tp_session_ioctl(struct l2tp_session 
*session,
return -EBADR;
 
switch (cmd) {
-   case SIOCGIFMTU:
-   err = -ENXIO;
-   if (!(sk->sk_state & PPPOX_CONNECTED))
-   break;
-
-   err = -EFAULT;
-   if (copy_from_user(&ifr, (void __user *) arg, sizeof(struct 
ifreq)))
-   break;
-  

[PATCH net-next 0/3] l2tp: sanitise MTU handling on sessions

2018-08-03 Thread Guillaume Nault
Most of the code handling sessions' MTU has no effect. The ->mtu field
in struct l2tp_session might be used at session creation time, but
neither PPP nor Ethernet pseudo-wires take updates into account.

L2TP sessions don't have a concept of MTU, which is the reason why
->mtu is mostly ignored. MTU should remain a network device thing.
Therefore this patch set does not try to propagate/update ->mtu to/from
the device. That would complicate the code unnecessarily. Instead this
field and the associated ioctl commands and netlink attributes are
removed.

Patch #1 defines l2tp_tunnel_dst_mtu() in order to simplify the
following patches. Then patches #2 and #3 remove MTU handling from PPP
and Ethernet pseudo-wires respectively.

Guillaume Nault (3):
  l2tp: define l2tp_tunnel_dst_mtu()
  l2tp: simplify MTU handling in l2tp_ppp
  l2tp: ignore L2TP_ATTR_MTU

 include/uapi/linux/l2tp.h |  2 +-
 net/l2tp/l2tp_core.c  |  1 -
 net/l2tp/l2tp_core.h  | 20 +--
 net/l2tp/l2tp_debugfs.c   |  3 +-
 net/l2tp/l2tp_eth.c   | 25 +
 net/l2tp/l2tp_netlink.c   |  9 +
 net/l2tp/l2tp_ppp.c   | 74 ++-
 7 files changed, 47 insertions(+), 87 deletions(-)

-- 
2.18.0



Re: [PATCH v8 bpf-next 00/10] veth: Driver XDP

2018-08-03 Thread Jesper Dangaard Brouer
On Fri,  3 Aug 2018 16:58:08 +0900
Toshiaki Makita  wrote:

> This patch set introduces driver XDP for veth.
> Basically this is used in conjunction with redirect action of another XDP
> program.
> 
>   NIC ---> veth===veth
>  (XDP) (redirect)(XDP)
> 

I was playing with V7 on my testlab yesterday and I noticed one
fundamental issue.  You are not updating the "ifconfig" stats counters,
when in XDP mode.  This makes receive or send via XDP invisible to
sysadm/management tools.  This for-sure is going to cause confusion...

I took a closer look at other drivers. The ixgbe driver is doing the
right thing.  Driver i40e has a bug, where RX/TX stats are getting
swapped (strange!).  The mlx5 driver is not updating the regular RX/TX
counters, but A LOT of other ethtool stats counters (which are the ones
I usually monitor when testing).

So, given other drivers also didn't get this right, we need to have a
discussion outside your/this patchset.  Thus, I don't want to
stop/stall this patchset, but this is something we need to fixup in a
followup patchset to other drivers as well.

Thus, I'm acking the patchset, but I request that we do a joint effort
of fixing this as followup patches.

Acked-by: Jesper Dangaard Brouer 

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


[patch net-next] net: sched: fix flush on non-existing chain

2018-08-03 Thread Jiri Pirko
From: Jiri Pirko 

The user was able to perform a filter flush on chain 0 even if it didn't
have any filters in it. With the patch that avoided implicit chain 0
creation, this changed. So in case the user requests a filter flush on a
chain which does not exist, just return success. There's no reason for non-0
chains to behave differently than chain 0, so do the same for them.

Reported-by: Ido Schimmel 
Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation")
Signed-off-by: Jiri Pirko 
---
 net/sched/cls_api.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e8b0bbd0883f..194c2e0b2737 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1389,6 +1389,13 @@ static int tc_del_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
}
chain = tcf_chain_get(block, chain_index, false);
if (!chain) {
+   /* User requested flush on non-existent chain. Nothing to do,
+* so just return success.
+*/
+   if (prio == 0) {
+   err = 0;
+   goto errout;
+   }
NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
err = -EINVAL;
goto errout;
-- 
2.14.4



Re: [PATCH v8 bpf-next 05/10] veth: Handle xdp_frames in xdp napi ring

2018-08-03 Thread Jesper Dangaard Brouer
On Fri,  3 Aug 2018 16:58:13 +0900
Toshiaki Makita  wrote:

> This is preparation for XDP TX and ndo_xdp_xmit.
> This allows napi handler to handle xdp_frames through xdp ring as well
> as sk_buff.
> 
> v8:
> - Don't use xdp_frame pointer address to calculate skb->head and
>   headroom.
> 
> v7:
> - Use xdp_scrub_frame() instead of memset().
> 
> v3:
> - Revert v2 change around rings and use a flag to differentiate skb and
>   xdp_frame, since bulk skb xmit makes little performance difference
>   for now.
> 
> v2:
> - Use another ring instead of using flag to differentiate skb and
>   xdp_frame. This approach makes bulk skb transmit possible in
>   veth_xmit later.
> - Clear xdp_frame fields in skb->head.
> - Implement adjust_tail.
> 
> Signed-off-by: Toshiaki Makita 
> Acked-by: John Fastabend 

Acked-by: Jesper Dangaard Brouer 

Thanks, this looks much better.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH v8 bpf-next 04/10] xdp: Helper function to clear kernel pointers in xdp_frame

2018-08-03 Thread Jesper Dangaard Brouer
On Fri,  3 Aug 2018 16:58:12 +0900
Toshiaki Makita  wrote:

> xdp_frame has kernel pointers which should not be readable from bpf
> programs. When we want to reuse xdp_frame region but it may be read by
> bpf programs later, we can use this helper to clear kernel pointers.
> This is more efficient than calling memset() for the entire struct.
> 
> Signed-off-by: Toshiaki Makita 

Acked-by: Jesper Dangaard Brouer 

After this patch is applied, I will take care of updating cpumap in a
similar way. Thanks.
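
For readers following along, the idea in the quoted description can be
sketched in user space as follows. The struct is a hypothetical
stand-in loosely modelled on a frame header that mixes kernel pointers
with plain metadata; the field names and layout are assumptions, not
the definition of struct xdp_frame, and frame_scrub() is not the helper
added by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame header: two pointers plus plain metadata. */
struct frame_hdr {
	void *data;		/* pointer into the packet buffer */
	uint16_t len;
	uint16_t headroom;
	uint32_t metasize;
	void *dev_rx;		/* pointer to the receiving device */
};

/*
 * Clear only the pointer fields before the region becomes readable by
 * a program. The scalar fields carry no addresses, so they are left
 * alone, which is cheaper than a memset() over the whole struct.
 */
void frame_scrub(struct frame_hdr *frame)
{
	frame->data = NULL;
	frame->dev_rx = NULL;
}
```

The saving is small per call, but the helper would sit on a per-packet
path.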

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

