[PATCH v4 net-next 0/3] net/sched: fix over mtu packet of defrag in

2020-11-24 Thread wenxu
From: wenxu 

Currently the kernel tc subsystem can do conntrack in act_ct. But when several
fragmented packets go through act_ct, the function tcf_ct_handle_fragments
will defragment them into one big packet. If the last action then redirects
(mirred) to a device, the reassembled packet may exceed the MTU of the
target device.

The first patch fixes the missing initialization of qdisc_skb_cb->mru.
The second one refactors the handling of xmit in act_mirred and prepares for
the third one.
The last one adds implicit packet fragmentation support to fix the over-MTU
problem for defragmented packets in act_ct.
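
To make the over-MTU problem concrete, here is a rough Python sketch of the
arithmetic involved in re-fragmenting a reassembled packet. It is not the
sch_frag code; the function name fragment_payload and the 20-byte header
(IPv4 without options) are assumptions made for illustration only.

```python
IPV4_HEADER_LEN = 20  # assumption: IPv4 header without options


def fragment_payload(payload_len, mtu, header_len=IPV4_HEADER_LEN):
    """Split a payload into (offset, length) fragments that fit within mtu.

    Every fragment except the last carries a multiple of 8 bytes, because
    the IPv4 fragment offset field counts 8-byte units.
    """
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("mtu too small to carry any payload")
    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        fragments.append((offset, length))
        offset += length
    return fragments


# A 3000-byte payload reassembled by conntrack, redirected to an MTU-1500 device.
frags = fragment_payload(3000, 1500)
print(frags)
```

With an MTU of 1500, each fragment carries at most 1480 bytes of payload, so
no fragment plus its header exceeds what the target device accepts.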

wenxu (3):
  net/sched: fix miss init the mru in qdisc_skb_cb
  net/sched: act_mirred: refactor the handle of xmit
  net/sched: sch_frag: add generic packet fragment support.

 include/net/act_api.h |   6 ++
 include/net/sch_generic.h |   5 +-
 net/core/dev.c|   2 +
 net/sched/Makefile|   1 +
 net/sched/act_api.c   |  16 +
 net/sched/act_ct.c|   3 +
 net/sched/act_mirred.c|  21 +--
 net/sched/sch_frag.c  | 150 ++
 8 files changed, 194 insertions(+), 10 deletions(-)
 create mode 100644 net/sched/sch_frag.c

-- 
1.8.3.1



Re: [PATCH v4 net-next 0/3] add support for sending RFC8335 PROBE

2020-11-20 Thread David Ahern
On 11/20/20 10:27 AM, Andreas Roeseler wrote:
> On Thu, 2020-11-19 at 21:01 -0700, David Ahern wrote:
>> On 11/19/20 8:51 PM, David Ahern wrote:
>>> On 11/17/20 5:46 PM, Andreas Roeseler wrote:
>>>> The popular utility ping has several severe limitations such as the
>>>> inability to query specific interfaces on a node and requiring
>>>> bidirectional connectivity between the probing and the probed
>>>> interfaces. RFC8335 attempts to solve these limitations by creating the
>>>> new utility PROBE which is a specialized ICMP message that makes use of
>>>> the ICMP Extension Structure outlined in RFC4884.
>>>>
>>>> This patchset adds definitions for the ICMP Extended Echo Request and
>>>> Reply (PROBE) types for both IPv4 and IPv6. It also expands the list of
>>>> supported ICMP messages to accommodate PROBEs.
>>>>
>>>
>>> You are updating the send, but what about the response side?
>>>
>>
>> you also are not setting 'ICMP Extension Structure'. From:
>> https://tools.ietf.org/html/rfc8335
>>
>>    o  ICMP Extension Structure: The ICMP Extension Structure identifies
>>       the probed interface.
>>
>>    Section 7 of [RFC4884] defines the ICMP Extension Structure.  As per
>>    RFC 4884, the Extension Structure contains exactly one Extension
>>    Header followed by one or more objects.  When applied to the ICMP
>>    Extended Echo Request message, the ICMP Extension Structure MUST
>>    contain exactly one instance of the Interface Identification Object
>>    (see Section 2.1).
> 
> I am currently finishing testing and polishing the response side and
> hope to be sending out v1 of the patch in the upcoming few weeks.

send the response side with the request side -- 1 set of patches for the
entire feature.

> 
> As for the 'ICMP Extension Structure', I have been working with the
> iputils package to add a command to send PROBE messages, and the
> changes included in this patchset are all that are necessary to be able
> to send PROBEs using the existing ping framework.
> 

right.


Re: [PATCH v4 net-next 0/3] add support for sending RFC8335 PROBE

2020-11-20 Thread Andreas Roeseler
On Thu, 2020-11-19 at 21:01 -0700, David Ahern wrote:
> On 11/19/20 8:51 PM, David Ahern wrote:
> > On 11/17/20 5:46 PM, Andreas Roeseler wrote:
> > > The popular utility ping has several severe limitations such as
> > > the
> > > inability to query specific  interfaces on a node and requiring
> > > bidirectional connectivity between the probing and the probed
> > > interfaces. RFC8335 attempts to solve these limitations by
> > > creating the
> > > new utility PROBE which is a specialized ICMP message that makes
> > > use of
> > > the ICMP Extension Structure outlined in RFC4884.
> > > 
> > > This patchset adds definitions for the ICMP Extended Echo Request
> > > and
> > > Reply (PROBE) types for both IPv4 and IPv6. It also expands the
> > > list of
> > > supported ICMP messages to accommodate PROBEs.
> > > 
> > 
> > You are updating the send, but what about the response side?
> > 
> 
> you also are not setting 'ICMP Extension Structure'. From:
> https://tools.ietf.org/html/rfc8335
> 
>    o  ICMP Extension Structure: The ICMP Extension Structure identifies
>       the probed interface.
> 
>    Section 7 of [RFC4884] defines the ICMP Extension Structure.  As per
>    RFC 4884, the Extension Structure contains exactly one Extension
>    Header followed by one or more objects.  When applied to the ICMP
>    Extended Echo Request message, the ICMP Extension Structure MUST
>    contain exactly one instance of the Interface Identification Object
>    (see Section 2.1).

I am currently finishing testing and polishing the response side and
hope to be sending out v1 of the patch in the upcoming few weeks.

As for the 'ICMP Extension Structure', I have been working with the
iputils package to add a command to send PROBE messages, and the
changes included in this patchset are all that are necessary to be able
to send PROBEs using the existing ping framework.



Re: [PATCH v4 net-next 0/3] add support for sending RFC8335 PROBE

2020-11-19 Thread David Ahern
On 11/19/20 8:51 PM, David Ahern wrote:
> On 11/17/20 5:46 PM, Andreas Roeseler wrote:
>> The popular utility ping has several severe limitations such as the
>> inability to query specific  interfaces on a node and requiring
>> bidirectional connectivity between the probing and the probed
>> interfaces. RFC8335 attempts to solve these limitations by creating the
>> new utility PROBE which is a specialized ICMP message that makes use of
>> the ICMP Extension Structure outlined in RFC4884.
>>
>> This patchset adds definitions for the ICMP Extended Echo Request and
>> Reply (PROBE) types for both IPv4 and IPv6. It also expands the list of
>> supported ICMP messages to accommodate PROBEs.
>>
> 
> You are updating the send, but what about the response side?
> 

you also are not setting 'ICMP Extension Structure'. From:
https://tools.ietf.org/html/rfc8335

   o  ICMP Extension Structure: The ICMP Extension Structure identifies
  the probed interface.

   Section 7 of [RFC4884] defines the ICMP Extension Structure.  As per
   RFC 4884, the Extension Structure contains exactly one Extension
   Header followed by one or more objects.  When applied to the ICMP
   Extended Echo Request message, the ICMP Extension Structure MUST
   contain exactly one instance of the Interface Identification Object
   (see Section 2.1).
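
For readers unfamiliar with the structure quoted above, the following is a
rough userspace sketch in Python of an Extended Echo Request carrying one
Interface Identification Object that names the probed interface by ifIndex.
It is not taken from the patchset or from iputils; the helper names are
made up for illustration, and the field layout reflects my reading of
RFC 4884 and RFC 8335, so please check it against the RFCs before relying
on it.

```python
import struct

ICMP_EXT_ECHO = 42          # ICMP Extended Echo Request type (RFC 8335)
ICMP_EXT_VERSION = 2        # extension header version (RFC 4884)
IFACE_ID_CLASS_NUM = 3      # Interface Identification Object class (RFC 8335)
IFACE_ID_CTYPE_INDEX = 2    # C-Type 2: identify the interface by ifIndex


def inet_checksum(data):
    """RFC 1071 one's-complement checksum over data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_extension_structure(ifindex):
    """One extension header followed by exactly one object."""
    # Object header: length (covers header + payload), class-num, c-type.
    obj = struct.pack("!HBBI", 8, IFACE_ID_CLASS_NUM, IFACE_ID_CTYPE_INDEX, ifindex)
    # Extension header: 4-bit version, 12 reserved bits, 16-bit checksum.
    header = struct.pack("!HH", ICMP_EXT_VERSION << 12, 0)
    csum = inet_checksum(header + obj)
    header = struct.pack("!HH", ICMP_EXT_VERSION << 12, csum)
    return header + obj


def build_probe_request(ident, seq, ifindex, local=True):
    """Type, code, checksum, identifier, sequence number, reserved + L-bit."""
    ext = build_extension_structure(ifindex)
    l_bit = 1 if local else 0
    head = struct.pack("!BBHHBB", ICMP_EXT_ECHO, 0, 0, ident, seq, l_bit)
    csum = inet_checksum(head + ext)
    head = struct.pack("!BBHHBB", ICMP_EXT_ECHO, 0, csum, ident, seq, l_bit)
    return head + ext


ext = build_extension_structure(ifindex=2)
pkt = build_probe_request(ident=0x1234, seq=1, ifindex=2)
print(pkt.hex())
```

The sketch only builds the message bytes; whether the kernel or the ping
utility should be the one to fill in the extension structure is the
question being discussed in this thread.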


Re: [PATCH v4 net-next 0/3] add support for sending RFC8335 PROBE

2020-11-19 Thread David Ahern
On 11/17/20 5:46 PM, Andreas Roeseler wrote:
> The popular utility ping has several severe limitations such as the
> inability to query specific  interfaces on a node and requiring
> bidirectional connectivity between the probing and the probed
> interfaces. RFC8335 attempts to solve these limitations by creating the
> new utility PROBE which is a specialized ICMP message that makes use of
> the ICMP Extension Structure outlined in RFC4884.
> 
> This patchset adds definitions for the ICMP Extended Echo Request and
> Reply (PROBE) types for both IPv4 and IPv6. It also expands the list of
> supported ICMP messages to accommodate PROBEs.
> 

You are updating the send, but what about the response side?


[PATCH v4 net-next 0/3] add support for sending RFC8335 PROBE

2020-11-17 Thread Andreas Roeseler
The popular utility ping has several severe limitations such as the
inability to query specific interfaces on a node and requiring
bidirectional connectivity between the probing and the probed
interfaces. RFC8335 attempts to solve these limitations by creating the
new utility PROBE which is a specialized ICMP message that makes use of
the ICMP Extension Structure outlined in RFC4884.

This patchset adds definitions for the ICMP Extended Echo Request and
Reply (PROBE) types for both IPv4 and IPv6. It also expands the list of
supported ICMP messages to accommodate PROBEs.

Changes since v1:
 - Switch to correct base tree

Changes since v2:
 - Switch to net-next tree 67c70b5eb2bf7d0496fcb62d308dc3096bc11553

Changes since v3:
 - Reorder patches to add defines first

Andreas Roeseler (3):
  icmp: define PROBE message types
  ICMPv6: define PROBE message types
  net: add support for sending RFC8335 PROBE

 include/uapi/linux/icmp.h   | 3 +++
 include/uapi/linux/icmpv6.h | 6 ++
 net/ipv4/ping.c | 4 +++-
 3 files changed, 12 insertions(+), 1 deletion(-)

-- 
2.29.2



Re: [PATCH v4 net-next 0/3] Add PTP support for Octeontx2

2020-07-15 Thread Richard Cochran
On Wed, Jul 15, 2020 at 06:08:06PM +0530, Subbaraya Sundeep wrote:
> Hi,
> 
> This patchset adds PTP support for Octeontx2 platform.
> PTP is an independent coprocessor block from which
> CGX block fetches timestamp and prepends it to the
> packet before sending to NIX block. Patches are as
> follows:
> 
> Patch 1: Patch to enable/disable packet timestamping
>          in CGX upon mailbox request. It also adjusts
>          the packet parser (NPC) for the 8-byte timestamp
>          appearing before the packet.
> 
> Patch 2: Patch adding PTP pci driver which configures
>          the PTP block and hooks up to RVU AF driver.
>          It also exposes a mailbox call to adjust PTP
>          hardware clock.
> 
> Patch 3: Patch adding PTP clock driver for PF netdev.
> 
> 
> Aleksey Makarov (2):
>   octeontx2-af: Add support for Marvell PTP coprocessor
>   octeontx2-pf: Add support for PTP clock
> 
> Zyta Szpak (1):
>   octeontx2-af: Support to enable/disable HW timestamping

Acked-by: Richard Cochran 


[PATCH v4 net-next 0/3] Add PTP support for Octeontx2

2020-07-15 Thread Subbaraya Sundeep
Hi,

This patchset adds PTP support for Octeontx2 platform.
PTP is an independent coprocessor block from which
CGX block fetches timestamp and prepends it to the
packet before sending to NIX block. Patches are as
follows:

Patch 1: Patch to enable/disable packet timestamping
         in CGX upon mailbox request. It also adjusts
         the packet parser (NPC) for the 8-byte timestamp
         appearing before the packet.

Patch 2: Patch adding PTP pci driver which configures
         the PTP block and hooks up to RVU AF driver.
         It also exposes a mailbox call to adjust PTP
         hardware clock.

Patch 3: Patch adding PTP clock driver for PF netdev.
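
As a rough illustration of what "timestamp appearing before the packet"
means for whoever consumes the buffer, here is a small Python sketch that
splits the 8-byte prefix from the frame. It is not driver code: the
function name is made up, and the big-endian byte order is an assumption
of this sketch, not something stated in this cover letter.

```python
import struct

TIMESTAMP_LEN = 8  # the cover letter states an 8-byte timestamp precedes the packet


def split_timestamp(buf):
    """Return (timestamp, packet); big-endian layout is assumed here."""
    if len(buf) < TIMESTAMP_LEN:
        raise ValueError("buffer shorter than the timestamp prefix")
    (timestamp,) = struct.unpack_from("!Q", buf, 0)
    return timestamp, buf[TIMESTAMP_LEN:]


# Hypothetical buffer: a timestamp value followed by two payload bytes.
ts, frame = split_timestamp(struct.pack("!Q", 123456789) + b"\xaa\xbb")
print(ts, frame)
```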


Aleksey Makarov (2):
  octeontx2-af: Add support for Marvell PTP coprocessor
  octeontx2-pf: Add support for PTP clock

Zyta Szpak (1):
  octeontx2-af: Support to enable/disable HW timestamping

 drivers/net/ethernet/marvell/octeontx2/af/Makefile |   2 +-
 drivers/net/ethernet/marvell/octeontx2/af/cgx.c|  29 +++
 drivers/net/ethernet/marvell/octeontx2/af/cgx.h|   4 +
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  21 ++
 drivers/net/ethernet/marvell/octeontx2/af/ptp.c| 244 +
 drivers/net/ethernet/marvell/octeontx2/af/ptp.h|  22 ++
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c|  29 ++-
 drivers/net/ethernet/marvell/octeontx2/af/rvu.h|   5 +
 .../net/ethernet/marvell/octeontx2/af/rvu_cgx.c|  54 +
 .../net/ethernet/marvell/octeontx2/af/rvu_nix.c|  52 +
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c|  27 +++
 .../net/ethernet/marvell/octeontx2/nic/Makefile|   3 +-
 .../ethernet/marvell/octeontx2/nic/otx2_common.c   |   7 +
 .../ethernet/marvell/octeontx2/nic/otx2_common.h   |  19 ++
 .../ethernet/marvell/octeontx2/nic/otx2_ethtool.c  |  28 +++
 .../net/ethernet/marvell/octeontx2/nic/otx2_pf.c   | 170 +-
 .../net/ethernet/marvell/octeontx2/nic/otx2_ptp.c  | 209 ++
 .../net/ethernet/marvell/octeontx2/nic/otx2_ptp.h  |  13 ++
 .../net/ethernet/marvell/octeontx2/nic/otx2_txrx.c |  87 +++-
 .../net/ethernet/marvell/octeontx2/nic/otx2_txrx.h |   1 +
 20 files changed, 1016 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/ptp.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/ptp.h
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.h

-- 
2.7.4



Re: [PATCH v4 net-next 0/3] ipv4: Move location of pcpu route cache and exceptions

2019-05-05 Thread David Miller
From: David Ahern 
Date: Tue, 30 Apr 2019 07:45:47 -0700

> This series moves IPv4 pcpu cached routes from fib_nh to fib_nh_common
> to make the caches available for IPv6 nexthops (fib6_nh) with IPv4
> routes. This allows a fib6_nh struct to be used with both IPv4 and
> IPv6 routes.
 ...

Series applied, thanks David.


[PATCH v4 net-next 0/3] ipv4: Move location of pcpu route cache and exceptions

2019-04-30 Thread David Ahern
From: David Ahern 

This series moves IPv4 pcpu cached routes from fib_nh to fib_nh_common
to make the caches available for IPv6 nexthops (fib6_nh) with IPv4
routes. This allows a fib6_nh struct to be used with both IPv4 and
IPv6 routes.

v4
- fixed memleak if encap_type is not set as noticed by Ido

v3
- dropped ipv6 patches for now. Will resubmit those once the existing
  refcnt problem is fixed

v2
- reverted patch 2 to use ifdef CONFIG_IP_ROUTE_CLASSID instead
  of IS_ENABLED(CONFIG_IP_ROUTE_CLASSID) to fix compile issues
  reported by kbuild test robot

David Ahern (3):
  ipv4: Move cached routes to fib_nh_common
  ipv4: Pass fib_nh_common to rt_cache_route
  ipv4: Move exception bucket to nh_common

 include/net/ip_fib.h |  8 --
 net/ipv4/fib_semantics.c | 48 ---
 net/ipv4/route.c | 75 ++--
 3 files changed, 64 insertions(+), 67 deletions(-)

-- 
2.11.0



[PATCH v4 net-next 0/3] rds: IPv6 support

2018-07-23 Thread Ka-Cheong Poon
This patch set adds IPv6 support to the kernel RDS and related
modules.  Existing RDS apps using IPv4 address continue to run without
any problem.  New RDS apps which want to use IPv6 address can do so by
passing the address in struct sockaddr_in6 to bind(), connect() or
sendmsg().  And those apps also need to use the new IPv6 equivalents
of some of the existing socket options as the existing options use a
32 bit integer to store IP address.

All RDS code now uses struct in6_addr to store IP addresses.  An IPv4
address is stored as an IPv4-mapped address.
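
For illustration, the IPv4-mapped form can be sketched in Python as below.
This is only a userspace model of the 16 bytes a struct in6_addr would
hold, not RDS code; the helper name to_in6_bytes is made up.

```python
import ipaddress


def to_in6_bytes(addr):
    """Return the 16-byte form an address would occupy in a struct in6_addr."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # IPv4-mapped IPv6 address: 80 zero bits, 16 one bits, then the IPv4 address.
        return b"\x00" * 10 + b"\xff\xff" + ip.packed
    return ip.packed


mapped = to_in6_bytes("192.0.2.1")
print(mapped.hex())
print(ipaddress.IPv6Address(mapped).ipv4_mapped)
```

Storing every address this way lets one code path compare and hash both
address families, with the IPv4 case recognizable from the ::ffff:0:0/96
prefix.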

Header file changes

There are many data structures (RDS socket options) used by RDS apps
which use a 32 bit integer to store IP address. To support IPv6,
struct in6_addr needs to be used. To ensure backward compatibility, a
new data structure is introduced for each of those data structures
which use a 32 bit integer to represent an IP address. And new socket
options are introduced to use those new structures. This means that
existing apps should work without a problem with the new RDS module.
For apps which want to use IPv6, those new data structures and socket
options can be used. IPv4 mapped address is used to represent IPv4
address in the new data structures.

Internally, all RDS data structures which contain an IP address are
changed to use struct in6_addr to store the address. IPv4 address is
stored as an IPv4 mapped address. All the functions which take an IP
address as argument are also changed to use struct in6_addr.

RDS/RDMA/IB uses a private data (struct rds_ib_connect_private)
exchange between endpoints at RDS connection establishment time to
support RDMA. This private data exchange uses a 32 bit integer to
represent an IP address. This needs to be changed in order to support
IPv6. A new private data struct rds6_ib_connect_private is introduced
to handle this. To ensure backward compatibility, an IPv6 capable RDS
stack uses another RDMA listener port (RDS_CM_PORT) to accept IPv6
connection. And it continues to use the original RDS_PORT for IPv4 RDS
connections. When it needs to communicate with an IPv6 peer, it uses
the RDS_CM_PORT to send the connection set up request.

RDS/TCP changes

TCP related code is changed to support IPv6.  Note that only an IPv6
TCP listener on port RDS_TCP_PORT is created as it can accept both
IPv4 and IPv6 connection requests.

IB/RDMA changes

The initial private data exchange between IB endpoints using RDMA is
changed to support IPv6 address instead, if the peer address is IPv6.
To ensure backward compatibility, another RDMA listener port
(RDS_CM_PORT) is used to accept IPv6 connection. An IPv6 capable RDS
module continues to use the original RDS_PORT for IPv4 RDS
connections. When it needs to communicate with an IPv6 peer, it uses
the RDS_CM_PORT to send the connection set up request.

Ka-Cheong Poon (3):
  rds: Changing IP address internal representation to struct in6_addr
  rds: Enable RDS IPv6 support
  rds: Extend RDS API for IPv6 support

 include/uapi/linux/rds.h |  69 ++-
 net/rds/af_rds.c | 201 --
 net/rds/bind.c   | 136 -
 net/rds/cong.c   |  23 ++--
 net/rds/connection.c | 259 ++-
 net/rds/ib.c | 114 +++--
 net/rds/ib.h |  51 ++--
 net/rds/ib_cm.c  | 309 +++
 net/rds/ib_mr.h  |   2 +
 net/rds/ib_rdma.c|  24 ++--
 net/rds/ib_recv.c|  18 +--
 net/rds/ib_send.c|  10 +-
 net/rds/loop.c   |   7 +-
 net/rds/rdma.c   |   6 +-
 net/rds/rdma_transport.c |  84 ++---
 net/rds/rdma_transport.h |   5 +
 net/rds/rds.h|  88 +-
 net/rds/recv.c   |  76 +---
 net/rds/send.c   | 114 ++---
 net/rds/tcp.c| 128 
 net/rds/tcp.h|   2 +-
 net/rds/tcp_connect.c|  68 ---
 net/rds/tcp_listen.c |  74 +---
 net/rds/tcp_recv.c   |   9 +-
 net/rds/tcp_send.c   |   4 +-
 net/rds/threads.c|  69 +--
 net/rds/transport.c  |  15 ++-
 27 files changed, 1543 insertions(+), 422 deletions(-)

-- 
1.8.3.1



[PATCH v4 net-next 0/3] lan78xx updates along with Fixed phy Support

2018-04-26 Thread Raghuram Chary J
This series of patches makes a few modifications to the driver
and adds support for fixed PHY.

Raghuram Chary J (3):
  lan78xx: Lan7801 Support for Fixed PHY
  lan78xx: Remove DRIVER_VERSION for lan78xx driver
  lan78xx: Modify error messages

 drivers/net/usb/Kconfig   |   1 +
 drivers/net/usb/lan78xx.c | 106 --
 2 files changed, 75 insertions(+), 32 deletions(-)

-- 
2.16.2



Re: [PATCH v4 net-next 0/3]

2017-11-30 Thread William Tu
On Thu, Nov 30, 2017 at 11:32 AM, David Miller  wrote:
>
> There is no actual descriptive text in your Subject line here.
>
> Please fix this.
sure. thanks. Let me resubmit.


Re: [PATCH v4 net-next 0/3]

2017-11-30 Thread David Miller

There is no actual descriptive text in your Subject line here.

Please fix this.


[PATCH v4 net-next 0/3]

2017-11-30 Thread William Tu
change in v4:
  - rebase on top of net-next
  - use log_ecn_error in ip6_tnl_rcv
  
change in v3:
  - add inline for functions in header
  - rebase on top of net-next

change in v2:
  - remove inline
  - fix some indent
  - fix errors reported by clang and scan-build

William Tu (3):
  ip_gre: Refector the erpsan tunnel code.
  ip6_gre: Refactor ip6gre xmit codes
  ip6_gre: Add ERSPAN native tunnel support

 include/net/erspan.h |  51 ++
 include/net/ip6_tunnel.h |   1 +
 net/ipv4/ip_gre.c|  54 +--
 net/ipv6/ip6_gre.c   | 393 ---
 4 files changed, 398 insertions(+), 101 deletions(-)

-- 
--
A test script is provided below:
#!/bin/bash
# In the namespace NS0, create veth0 and ip6erspan00
# Out of the namespace, create veth1 and ip6erspan11
# Ping in and out of namespace using ERSPAN protocol 

# Patch v2 for iproute2
# https://marc.info/?l=linux-netdev&m=151002165705772&w=2 

cleanup() {
set +ex
ip netns del ns0
ip link del ip6erspan11
ip link del veth1
}

main() {
trap cleanup 0 2 3 9

ip netns add ns0
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns0

# non-namespace
ip addr add dev veth1 fc00:100::2/96
ip link add dev ip6erspan11 type ip6erspan seq key 102 erspan 123 \
 local fc00:100::2 \
remote fc00:100::1

ip addr add dev ip6erspan11 fc00:200::2/96
ip addr add dev ip6erspan11 10.10.200.2/24

# namespace: ns0 
ip netns exec ns0 ip addr add fc00:100::1/96 dev veth0

# Tunnel
ip netns exec ns0 ip link add dev ip6erspan00 type ip6erspan seq key 102 erspan 12 \
 local fc00:100::1 \
remote fc00:100::2

ip netns exec ns0 ip addr add dev ip6erspan00 fc00:200::1/96
ip netns exec ns0 ip addr add dev ip6erspan00 10.10.200.1/24

ip link set dev veth1 up
ip link set dev ip6erspan11 up
ip netns exec ns0 ip link set dev ip6erspan00 up
ip netns exec ns0 ip link set dev veth0 up
}

main

# Ping underlying
ping6 -c 1 fc00:100::1 || true

# ping overlay
ping -c 3 10.10.200.1
ping6 -c 3 fc00:200::1


2.7.4



Re: [PATCH v4 net-next 0/3] bpf: Add BPF support to all perf_event

2017-06-04 Thread David Miller
From: Alexei Starovoitov 
Date: Fri, 2 Jun 2017 21:03:51 -0700

> v3->v4: one more tweak to reject unsupported events at map
> update time as Peter suggested
> 
> v2->v3: more refactoring to address Peter's feedback.
> Now all perf_events are attachable and readable
> 
> v1->v2: address Peter's feedback. Refactor patch 1 to allow attaching
> bpf programs to all event types and reading counters from all of them as well
> patch 2 - more tests
> patch 3 - address Dave's feedback and document bpf_perf_event_read()
> and bpf_perf_event_output() properly

Series applied, thanks.


[PATCH v4 net-next 0/3] bpf: Add BPF support to all perf_event

2017-06-02 Thread Alexei Starovoitov
v3->v4: one more tweak to reject unsupported events at map
update time as Peter suggested

v2->v3: more refactoring to address Peter's feedback.
Now all perf_events are attachable and readable

v1->v2: address Peter's feedback. Refactor patch 1 to allow attaching
bpf programs to all event types and reading counters from all of them as well
patch 2 - more tests
patch 3 - address Dave's feedback and document bpf_perf_event_read()
and bpf_perf_event_output() properly

Alexei Starovoitov (1):
  perf, bpf: Add BPF support to all perf_event types

Teng Qin (2):
  samples/bpf: add tests for more perf event types
  bpf: update perf event helper functions documentation

 include/linux/perf_event.h |   7 +-
 include/uapi/linux/bpf.h   |  11 ++-
 kernel/bpf/arraymap.c  |  28 ++-
 kernel/events/core.c   |  47 ++-
 kernel/trace/bpf_trace.c   |  21 ++---
 samples/bpf/bpf_helpers.h  |   3 +-
 samples/bpf/trace_event_user.c |  73 ++---
 samples/bpf/tracex6_kern.c |  28 +--
 samples/bpf/tracex6_user.c | 180 -
 tools/include/uapi/linux/bpf.h |  11 ++-
 10 files changed, 290 insertions(+), 119 deletions(-)

-- 
2.9.3



[PATCH V4 net-next 0/3] vhost_net tx batching

2017-01-05 Thread Jason Wang
Hi:

This series tries to implement tx batching support for vhost. This is
done by using MSG_MORE as a hint for the underlying socket. The backend
(e.g. tap) can then batch the packets temporarily in a list and
submit them all once the number of batched packets exceeds a limit.
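
As a rough model of the idea (not the tun.c or vhost code), the batching
rule can be sketched in Python like this. The class name, the limit
parameter and the flush-when-no-MSG_MORE behaviour are assumptions made
for illustration.

```python
class BatchingBackend:
    """Toy stand-in for a backend such as tap."""

    def __init__(self, limit):
        self.limit = limit      # hypothetical cap on queued packets
        self.pending = []       # packets held back while MSG_MORE is set
        self.submitted = []     # one entry (a list of packets) per submission

    def sendmsg(self, packet, more):
        self.pending.append(packet)
        # Flush when the sender gives no hint of more data, or when the
        # number of batched packets exceeds the limit.
        if not more or len(self.pending) > self.limit:
            self.flush()

    def flush(self):
        if self.pending:
            self.submitted.append(self.pending)
            self.pending = []


backend = BatchingBackend(limit=4)
for i in range(7):
    backend.sendmsg(i, more=(i != 6))   # only the last packet clears MSG_MORE
print(backend.submitted)
```

The point of the hint is that the backend pays its per-submission cost once
per batch instead of once per packet.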

Tests show an obvious improvement for guest pktgen over
mlx4 (noqueue) on host:

                             Mpps  -+%
rx-frames = 0                0.91  +0%
rx-frames = 4                1.00  +9.8%
rx-frames = 8                1.00  +9.8%
rx-frames = 16               1.01  +10.9%
rx-frames = 32               1.07  +17.5%
rx-frames = 48               1.07  +17.5%
rx-frames = 64               1.08  +18.6%
rx-frames = 64 (no MSG_MORE) 0.91  +0%

Changes from V3:
- use ethtool instead of module parameter to control the maximum
  number of batched packets
- avoid overhead when MSG_MORE was not set and no packet is queued

Changes from V2:
- remove useless queue limitation check (and we don't drop any packet now)

Changes from V1:
- drop NAPI handler since we don't use NAPI now
- fix the issues that may exceed the max pending of zerocopy
- more improvement on available buffer detection
- move the limitation of batched packets from vhost to tuntap

Please review.

Thanks

Jason Wang (3):
  vhost: better detection of available buffers
  vhost_net: tx batching
  tun: rx batching

 drivers/net/tun.c | 76 +++
 drivers/vhost/net.c   | 23 ++--
 drivers/vhost/vhost.c |  8 --
 3 files changed, 96 insertions(+), 11 deletions(-)

-- 
2.7.4



Re: [PATCH v4 net-next 0/3] strp: Stream parser for messages

2016-08-17 Thread David Miller
From: Tom Herbert 
Date: Mon, 15 Aug 2016 14:51:00 -0700

> This patch set introduces a utility for parsing application layer
> protocol messages in a TCP stream. This is a generalization of the
> mechanism implemented in the Kernel Connection Multiplexor.
 ...

Series applied, thanks Tom.


[PATCH v4 net-next 0/3] strp: Stream parser for messages

2016-08-15 Thread Tom Herbert
This patch set introduces a utility for parsing application layer
protocol messages in a TCP stream. This is a generalization of the
mechanism implemented in the Kernel Connection Multiplexor.

This patch set adapts KCM to use the strparser. We expect that kTLS
can use this mechanism also. RDS would probably be another candidate
to use a common stream parsing mechanism.

The API includes a context structure, a set of callbacks, utility
functions, and a data ready function. The callbacks include
a parse_msg function that is called to perform parsing (e.g.
BPF parsing in case of KCM), and a rcv_msg function that is called
when a full message has been completed.

For strparser we specify the return codes from the parser to allow
the backend to indicate that control of the socket should be
transferred back to userspace to handle some exceptions in the
stream. The return values are:

  >0        : indicates length of successfully parsed message
  0         : indicates more data must be received to parse the message
  -ESTRPIPE : current message should not be processed by the
              kernel, return control of the socket to userspace which
              can proceed to read the messages itself
  other < 0 : Error in parsing, give control back to userspace
              assuming that synchronization is lost and the stream
              is unrecoverable (application expected to close TCP socket)
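
To show how a parser might use these return codes, here is a userspace
Python sketch for a made-up length-prefixed protocol. It is not the
strparser API: the 2-byte length header, the 0xFFFF "hand over to
userspace" marker and the choice of EBADMSG for the generic error are all
assumptions for illustration.

```python
ESTRPIPE = 86   # Linux errno value, defined here so the sketch runs anywhere
EBADMSG = 74    # hypothetical choice for the "other < 0" case

HEADER_LEN = 2  # hypothetical protocol: 2-byte big-endian total message length


def parse_msg(buf):
    """Return >0, 0, -ESTRPIPE or another negative value as listed above."""
    if len(buf) < HEADER_LEN:
        return 0                      # need more data to read the header
    total_len = int.from_bytes(buf[:HEADER_LEN], "big")
    if total_len == 0xFFFF:
        return -ESTRPIPE              # hypothetical marker: let userspace read it
    if total_len < HEADER_LEN:
        return -EBADMSG               # malformed header: stream is unrecoverable
    return total_len                  # length of the message, header included


def handle(ret):
    """What the caller does with each class of return value."""
    if ret > 0:
        return "message of %d bytes" % ret
    if ret == 0:
        return "need more data"
    if ret == -ESTRPIPE:
        return "hand socket to userspace"
    return "unrecoverable, close socket"


for sample in (b"\x00", b"\x00\x05abc", b"\xff\xff", b"\x00\x01"):
    print(handle(parse_msg(sample)))
```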

There is one issue I haven't been able to fully resolve. If parse_msg
returns ESTRPIPE (wants control back to userspace) the parser may
already have consumed some bytes of the message. There is no way to
put bytes back into the TCP receive queue and tcp_read_sock does not
allow an easy way to peek messages. In lieu of a better solution, we
return ENODATA on the socket to indicate that the data stream is
unrecoverable (application needs to close socket). This condition
should only happen if an application layer message header is split
across two skbuffs and parsing just the first skbuff wasn't sufficient
to determine that transfer to userspace is needed.

This patch set contains:

  - strparser implementation
  - changes to kcm to use strparser
  - strparser.txt documentation

v2:
  - Add copyright notice to C files
  - Remove GPL module license from strparser.c
  - Add report of rxpause

v3:
  - Restore GPL module license
  - Use EXPORT_SYMBOL_GPL

v4:
  - Removed unused function, changed another to be static as suggested
by davem
  - Reworked data_ready to be called from upper layer, no longer requires
taking over socket data_ready callback as suggested by Lance Chao

Tested:
  - Ran a KCM thrash test for 24 hours. No behavioral or performance
differences observed.


Tom Herbert (3):
  strparser: Stream parser for messages
  kcm: Use stream parser
  strparser: Documentation

 Documentation/networking/strparser.txt | 137 +
 include/net/kcm.h  |  37 +--
 include/net/strparser.h| 145 ++
 net/Kconfig|   1 +
 net/Makefile   |   1 +
 net/ipv6/ila/ila_common.c  |   1 -
 net/kcm/Kconfig|   1 +
 net/kcm/kcmproc.c  |  44 ++-
 net/kcm/kcmsock.c  | 456 ++
 net/strparser/Kconfig  |   4 +
 net/strparser/Makefile |   1 +
 net/strparser/strparser.c  | 492 +
 12 files changed, 897 insertions(+), 423 deletions(-)
 create mode 100644 Documentation/networking/strparser.txt
 create mode 100644 include/net/strparser.h
 create mode 100644 net/strparser/Kconfig
 create mode 100644 net/strparser/Makefile
 create mode 100644 net/strparser/strparser.c

-- 
2.8.0.rc2



Re: [PATCH v4 net-next 0/3] tcp: Make use of MSG_EOR in tcp_sendmsg

2016-04-28 Thread David Miller
From: Martin KaFai Lau 
Date: Mon, 25 Apr 2016 14:44:47 -0700

 ...
> One potential use case is to use MSG_EOR with
> SOF_TIMESTAMPING_TX_ACK to get a more accurate
> TCP ack timestamping on application protocol with
> multiple outgoing response messages (e.g. HTTP2).
> 
> One of our use case is at the webserver.  The webserver tracks
> the HTTP2 response latency by measuring when the webserver sends
> the first byte to the socket till the TCP ACK of the last byte
> is received.  In the cases where we don't have client side
> measurement, measuring from the server side is the only option.
> In the cases we have the client side measurement, the server side
> data can also be used to justify/cross-check-with the client
> side data.

Looks good, series applied, thanks!


Re: [PATCH v4 net-next 0/3] tcp: Make use of MSG_EOR in tcp_sendmsg

2016-04-25 Thread Soheil Hassas Yeganeh
On Mon, Apr 25, 2016 at 5:44 PM, Martin KaFai Lau  wrote:
> v4:
> ~ Do not set eor bit in do_tcp_sendpages() since there is
>   no way to pass MSG_EOR from the userland now.
> ~ Avoid rmw by testing MSG_EOR first in tcp_sendmsg().
> ~ Move TCP_SKB_CB(skb)->eor test to a new helper
>   tcp_skb_can_collapse_to() (suggested by Soheil).
> ~ Add some packetdrill tests.

Thanks for the nice patches and the tests!

> v3:
> ~ Separate EOR marking from the SKBTX_ANY_TSTAMP logic.
> ~ Move the eor bit test back to the loop in tcp_sendmsg and
>   tcp_sendpage because there could be >1 threads doing
>   sendmsg.
> ~ Thanks to Eric Dumazet's suggestions on v2.
> ~ The TCP timestamp bug fixes are separated into other threads.
>
> v2:
> ~ Rework based on the recent work
>   "add TX timestamping via cmsg" by
>   Soheil Hassas Yeganeh 
> ~ This version takes the MSG_EOR bit as a signal of
>   end-of-response-message and leave the selective
>   timestamping job to the cmsg
> ~ Changes based on the v1 feedback (like avoid
>   unlikely check in a loop and adding tcp_sendpage
>   support)
> ~ The first 3 patches are bug fixes.  The fixes in this
>   series depend on the newly introduced txstamp_ack in
>   net-next.  I will make relevant patches against net after
>   getting some feedback.
> ~ The test results are based on the recently posted net fix:
>   "tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks"
>
> One potential use case is to use MSG_EOR with
> SOF_TIMESTAMPING_TX_ACK to get a more accurate
> TCP ack timestamping on application protocol with
> multiple outgoing response messages (e.g. HTTP2).
>
> One of our use case is at the webserver.  The webserver tracks
> the HTTP2 response latency by measuring when the webserver sends
> the first byte to the socket till the TCP ACK of the last byte
> is received.  In the cases where we don't have client side
> measurement, measuring from the server side is the only option.
> In the cases we have the client side measurement, the server side
> data can also be used to justify/cross-check-with the client
> side data.
>


[PATCH v4 net-next 0/3] tcp: Make use of MSG_EOR in tcp_sendmsg

2016-04-25 Thread Martin KaFai Lau
v4:
~ Do not set eor bit in do_tcp_sendpages() since there is
  no way to pass MSG_EOR from the userland now.
~ Avoid rmw by testing MSG_EOR first in tcp_sendmsg().
~ Move TCP_SKB_CB(skb)->eor test to a new helper
  tcp_skb_can_collapse_to() (suggested by Soheil).
~ Add some packetdrill tests.

v3:
~ Separate EOR marking from the SKBTX_ANY_TSTAMP logic.
~ Move the eor bit test back to the loop in tcp_sendmsg and
  tcp_sendpage because there could be >1 threads doing
  sendmsg.
~ Thanks to Eric Dumazet's suggestions on v2.
~ The TCP timestamp bug fixes are separated into other threads.

v2:
~ Rework based on the recent work
  "add TX timestamping via cmsg" by
  Soheil Hassas Yeganeh 
~ This version takes the MSG_EOR bit as a signal of
  end-of-response-message and leave the selective
  timestamping job to the cmsg
~ Changes based on the v1 feedback (like avoid
  unlikely check in a loop and adding tcp_sendpage
  support)
~ The first 3 patches are bug fixes.  The fixes in this
  series depend on the newly introduced txstamp_ack in
  net-next.  I will make relevant patches against net after
  getting some feedback.
~ The test results are based on the recently posted net fix:
  "tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks"

One potential use case is to use MSG_EOR with
SOF_TIMESTAMPING_TX_ACK to get a more accurate
TCP ack timestamping on application protocol with
multiple outgoing response messages (e.g. HTTP2).

One of our use cases is at the webserver.  The webserver tracks
the HTTP2 response latency by measuring when the webserver sends
the first byte to the socket till the TCP ACK of the last byte
is received.  In the cases where we don't have client side
measurement, measuring from the server side is the only option.
In the cases we have the client side measurement, the server side
data can also be used to justify/cross-check-with the client
side data.
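
A toy Python model of why the EOR mark matters for this measurement: once
an skb is marked as end-of-record, later data is not appended to it, so
the ACK of that skb corresponds to the last byte of the response. This is
only a sketch of the idea behind tcp_skb_can_collapse_to(), not the kernel
code; it ignores MSS limits and everything else tcp_sendmsg() does.

```python
class Skb:
    def __init__(self):
        self.data = b""
        self.eor = False   # models TCP_SKB_CB(skb)->eor


def can_collapse_to(skb):
    """Toy counterpart of tcp_skb_can_collapse_to(): never append after EOR."""
    return not skb.eor


def sendmsg(queue, payload, eor=False):
    # Start a new skb if the queue is empty or the tail was closed by MSG_EOR.
    if not queue or not can_collapse_to(queue[-1]):
        queue.append(Skb())
    queue[-1].data += payload
    if eor:
        queue[-1].eor = True


queue = []
sendmsg(queue, b"response-1 part a;")
sendmsg(queue, b"response-1 part b;", eor=True)   # last write of response 1
sendmsg(queue, b"response-2;")
print([skb.data for skb in queue])
```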



[PATCH v4 net-next 0/3] Add new capability and macb DT variant

2016-01-04 Thread Neil Armstrong
The first patch introduces a new capability bit to disable usage of the
USRIO register on platforms that do not implement it, thus avoiding some
external imprecise aborts on ARM based platforms.
The last two patches add a new macb variant compatible name using the
capability; the NPx name is temporary and must be fixed when the first patch
hits mainline.

Only the first patch should be merged right now, until the compatible name
is fixed.
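
The capability check amounts to the following pattern, sketched here in
Python rather than driver C. The bit position, the class and the method
name are hypothetical and only illustrate gating a register write on a
caps flag.

```python
CAPS_USRIO_DISABLED = 1 << 4   # hypothetical bit position


class FakeMacb:
    """Toy device model that records register writes instead of doing I/O."""

    def __init__(self, caps):
        self.caps = caps
        self.writes = []          # (register, value) pairs

    def init_usrio(self, value):
        # Skip the register entirely on variants that do not implement it.
        if self.caps & CAPS_USRIO_DISABLED:
            return False
        self.writes.append(("USRIO", value))
        return True


with_usrio = FakeMacb(caps=0)
without_usrio = FakeMacb(caps=CAPS_USRIO_DISABLED)
print(with_usrio.init_usrio(1), without_usrio.init_usrio(1))
```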

v1: http://lkml.kernel.org/r/1449485914-12883-1-git-send-email-narmstr...@baylibre.com
v2: http://lkml.kernel.org/r/1449582726-6148-1-git-send-email-narmstr...@baylibre.com
v3: http://lkml.kernel.org/r/1451898103-21868-1-git-send-email-narmstr...@baylibre.com
v4: as Nicolas suggested, use a new macb config and a new product/vendor prefix

Neil Armstrong (3):
  net: ethernet: cadence-macb: Add disabled usrio caps
  net: macb: Add NPx macb config using USRIO_DISABLED cap
  dt-bindings: net: macb: Add NPx macb variant

 Documentation/devicetree/bindings/net/macb.txt |  1 +
 drivers/net/ethernet/cadence/macb.c| 33 --
 drivers/net/ethernet/cadence/macb.h|  1 +
 3 files changed, 23 insertions(+), 12 deletions(-)

-- 
1.9.1
