Re: [RFC PATCH 0/3] acpi: Add acpi mdio support code

2018-11-12 Thread Wang, Dongsheng
Hello Rafael,

On 2018/11/13 1:19, Andrew Lunn wrote:
>>> I'm just trying to ensure whatever is defined is flexible enough that
>>> we really can later support everything which DT does. We have PHYs on
>>> MDIO busses, inside switches, which are on MDIO busses, which are
>>> inside Ethernet interfaces, etc.
>> I think it can be satisfied. See the table I'm using above.
> Hi Dongsheng
>
>
> Since i don't know anything better, i think i have to trust you have
> this correct.
>
> It would be good to document this, so that the next person who needs
> to add ACPI support for a PHY has some documentation to look at.
> Could you add something to Documentation/acpi/dsd?

How about Andrew suggestion? I agree with Andrew.

Or We need to add the documentation to Documentation/acpi?
I see that both GPIO and I2C are down here.


Cheers,
Dongsheng



Re: [PATCH net-next 0/4] sctp: add subscribe per asoc and sockopt SCTP_EVENT

2018-11-12 Thread Xin Long
On Tue, Nov 13, 2018 at 1:26 AM Xin Long  wrote:
>
> This patchset mainly adds the Event Subscription sockopt described in
> rfc6525#section-6.2:
>
> Subscribing to events as described in [RFC6458] uses a setsockopt()
> call with the SCTP_EVENT socket option.  This option takes the
> following structure, which specifies the association, the event type
> (using the same value found in the event type field), and an on/off
> boolean.
>
>   struct sctp_event {
> sctp_assoc_t se_assoc_id;
> uint16_t se_type;
> uint8_t  se_on;
>   };
>
> The user fills in the se_type field with the same value found in the
> strreset_type field, i.e., SCTP_STREAM_RESET_EVENT.  The user will
> also fill in the se_assoc_id field with either the association to set
> this event on (this field is ignored for one-to-one style sockets) or
> one of the reserved constant values defined in [RFC6458].  Finally,
> the se_on field is set with a 1 to enable the event or a 0 to disable
> the event.
>
> As for the old SCTP_EVENTS Option with struct sctp_event_subscribe,
> it's being DEPRECATED.
>
> Xin Long (4):
>   sctp: define subscribe in sctp_sock as __u16
>   sctp: add subscribe per asoc
>   sctp: rename enum sctp_event to sctp_event_type
>   sctp: add sockopt SCTP_EVENT
>
>  include/net/sctp/constants.h |   2 +-
>  include/net/sctp/sm.h|   4 +-
>  include/net/sctp/structs.h   |   4 +-
>  include/net/sctp/ulpevent.h  |  39 --
>  include/uapi/linux/sctp.h|  13 -
>  net/sctp/associola.c |   2 +
>  net/sctp/chunk.c |   8 ++-
>  net/sctp/primitive.c |   2 +-
>  net/sctp/sm_sideeffect.c |  12 ++---
>  net/sctp/sm_statetable.c |   2 +-
>  net/sctp/socket.c| 126 
> ---
>  net/sctp/stream_interleave.c |  12 +++--
>  net/sctp/ulpqueue.c  |   8 +--
>  13 files changed, 184 insertions(+), 50 deletions(-)
>
> --
> 2.1.0
>
Because some key word in changelog triggerred the filters at vger.kernel.org,
v2 has been posted.


[PATCHv2 net-next 2/4] sctp: add subscribe per asoc

2018-11-12 Thread Xin Long
The member subscribe should be per asoc, so that sockopt SCTP_EVENT
in the next patch can subscribe a event from one asoc only.

Signed-off-by: Xin Long 
---
 include/net/sctp/structs.h   | 2 ++
 net/sctp/associola.c | 2 ++
 net/sctp/chunk.c | 6 ++
 net/sctp/socket.c| 6 +-
 net/sctp/stream_interleave.c | 7 ---
 net/sctp/ulpqueue.c  | 4 ++--
 6 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index bc7808a..7eaa294 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -2077,6 +2077,8 @@ struct sctp_association {
 
int sent_cnt_removable;
 
+   __u16 subscribe;
+
__u64 abandoned_unsent[SCTP_PR_INDEX(MAX) + 1];
__u64 abandoned_sent[SCTP_PR_INDEX(MAX) + 1];
 };
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 6a28b96..685c7ef 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -135,6 +135,8 @@ static struct sctp_association *sctp_association_init(
 */
asoc->max_burst = sp->max_burst;
 
+   asoc->subscribe = sp->subscribe;
+
/* initialize association timers */
asoc->timeouts[SCTP_EVENT_TIMEOUT_T1_COOKIE] = asoc->rto_initial;
asoc->timeouts[SCTP_EVENT_TIMEOUT_T1_INIT] = asoc->rto_initial;
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 6c761af..0b203b8 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -86,11 +86,10 @@ void sctp_datamsg_free(struct sctp_datamsg *msg)
 /* Final destructruction of datamsg memory. */
 static void sctp_datamsg_destroy(struct sctp_datamsg *msg)
 {
+   struct sctp_association *asoc = NULL;
struct list_head *pos, *temp;
struct sctp_chunk *chunk;
-   struct sctp_sock *sp;
struct sctp_ulpevent *ev;
-   struct sctp_association *asoc = NULL;
int error = 0, notify;
 
/* If we failed, we may need to notify. */
@@ -108,8 +107,7 @@ static void sctp_datamsg_destroy(struct sctp_datamsg *msg)
else
error = asoc->outqueue.error;
 
-   sp = sctp_sk(asoc->base.sk);
-   notify = sctp_ulpevent_type_enabled(sp->subscribe,
+   notify = sctp_ulpevent_type_enabled(asoc->subscribe,
SCTP_SEND_FAILED);
}
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 48e0b45..789008d 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2307,6 +2307,7 @@ static int sctp_setsockopt_events(struct sock *sk, char 
__user *optval,
struct sctp_event_subscribe subscribe;
__u8 *sn_type = (__u8 *)
struct sctp_sock *sp = sctp_sk(sk);
+   struct sctp_association *asoc;
int i;
 
if (optlen > sizeof(struct sctp_event_subscribe))
@@ -2319,14 +2320,17 @@ static int sctp_setsockopt_events(struct sock *sk, char 
__user *optval,
sctp_ulpevent_type_set(>subscribe, SCTP_SN_TYPE_BASE + i,
   sn_type[i]);
 
+   list_for_each_entry(asoc, >ep->asocs, asocs)
+   asoc->subscribe = sctp_sk(sk)->subscribe;
+
/* At the time when a user app subscribes to SCTP_SENDER_DRY_EVENT,
 * if there is no data to be sent or retransmit, the stack will
 * immediately send up this notification.
 */
if (sctp_ulpevent_type_enabled(sp->subscribe, SCTP_SENDER_DRY_EVENT)) {
-   struct sctp_association *asoc = sctp_id2assoc(sk, 0);
struct sctp_ulpevent *event;
 
+   asoc = sctp_id2assoc(sk, 0);
if (asoc && sctp_outq_is_empty(>outqueue)) {
event = sctp_ulpevent_make_sender_dry_event(asoc,
GFP_USER | __GFP_NOWARN);
diff --git a/net/sctp/stream_interleave.c b/net/sctp/stream_interleave.c
index ceef5a3..a6bf215 100644
--- a/net/sctp/stream_interleave.c
+++ b/net/sctp/stream_interleave.c
@@ -503,7 +503,7 @@ static int sctp_enqueue_event(struct sctp_ulpq *ulpq,
sk_incoming_cpu_update(sk);
}
 
-   if (!sctp_ulpevent_is_enabled(event, sp->subscribe))
+   if (!sctp_ulpevent_is_enabled(event, ulpq->asoc->subscribe))
goto out_free;
 
if (skb_list)
@@ -992,16 +992,17 @@ static void sctp_intl_stream_abort_pd(struct sctp_ulpq 
*ulpq, __u16 sid,
  __u32 mid, __u16 flags, gfp_t gfp)
 {
struct sock *sk = ulpq->asoc->base.sk;
-   struct sctp_sock *sp = sctp_sk(sk);
struct sctp_ulpevent *ev = NULL;
 
-   if (!sctp_ulpevent_type_enabled(sp->subscribe,
+   if (!sctp_ulpevent_type_enabled(ulpq->asoc->subscribe,
SCTP_PARTIAL_DELIVERY_EVENT))
return;
 
ev = sctp_ulpevent_make_pdapi(ulpq->asoc, SCTP_PARTIAL_DELIVERY_ABORTED,
  

[PATCHv2 net-next 4/4] sctp: add sockopt SCTP_EVENT

2018-11-12 Thread Xin Long
This patch adds sockopt SCTP_EVENT described in rfc6525#section-6.2.
With this sockopt users can subscribe to an event from a specified
asoc.

Signed-off-by: Xin Long 
---
 include/uapi/linux/sctp.h |  7 
 net/sctp/socket.c | 89 +++
 2 files changed, 96 insertions(+)

diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index 66afa5b..d584073 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -129,6 +129,7 @@ typedef __s32 sctp_assoc_t;
 #define SCTP_STREAM_SCHEDULER_VALUE124
 #define SCTP_INTERLEAVING_SUPPORTED125
 #define SCTP_SENDMSG_CONNECT   126
+#define SCTP_EVENT 127
 
 /* PR-SCTP policies */
 #define SCTP_PR_SCTP_NONE  0x
@@ -1154,6 +1155,12 @@ struct sctp_add_streams {
uint16_t sas_outstrms;
 };
 
+struct sctp_event {
+   sctp_assoc_t se_assoc_id;
+   uint16_t se_type;
+   uint8_t se_on;
+};
+
 /* SCTP Stream schedulers */
 enum sctp_sched_type {
SCTP_SS_FCFS,
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 789008d..1451211 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4288,6 +4288,57 @@ static int sctp_setsockopt_reuse_port(struct sock *sk, 
char __user *optval,
return 0;
 }
 
+static int sctp_setsockopt_event(struct sock *sk, char __user *optval,
+unsigned int optlen)
+{
+   struct sctp_association *asoc;
+   struct sctp_ulpevent *event;
+   struct sctp_event param;
+   int retval = 0;
+
+   if (optlen < sizeof(param)) {
+   retval = -EINVAL;
+   goto out;
+   }
+
+   optlen = sizeof(param);
+   if (copy_from_user(, optval, optlen)) {
+   retval = -EFAULT;
+   goto out;
+   }
+
+   if (param.se_type < SCTP_SN_TYPE_BASE ||
+   param.se_type > SCTP_SN_TYPE_MAX) {
+   retval = -EINVAL;
+   goto out;
+   }
+
+   asoc = sctp_id2assoc(sk, param.se_assoc_id);
+   if (!asoc) {
+   sctp_ulpevent_type_set(_sk(sk)->subscribe,
+  param.se_type, param.se_on);
+   goto out;
+   }
+
+   sctp_ulpevent_type_set(>subscribe, param.se_type, param.se_on);
+
+   if (param.se_type == SCTP_SENDER_DRY_EVENT && param.se_on) {
+   if (sctp_outq_is_empty(>outqueue)) {
+   event = sctp_ulpevent_make_sender_dry_event(asoc,
+   GFP_USER | __GFP_NOWARN);
+   if (!event) {
+   retval = -ENOMEM;
+   goto out;
+   }
+
+   asoc->stream.si->enqueue_event(>ulpq, event);
+   }
+   }
+
+out:
+   return retval;
+}
+
 /* API 6.2 setsockopt(), getsockopt()
  *
  * Applications use setsockopt() and getsockopt() to set or retrieve
@@ -4485,6 +4536,9 @@ static int sctp_setsockopt(struct sock *sk, int level, 
int optname,
case SCTP_REUSE_PORT:
retval = sctp_setsockopt_reuse_port(sk, optval, optlen);
break;
+   case SCTP_EVENT:
+   retval = sctp_setsockopt_event(sk, optval, optlen);
+   break;
default:
retval = -ENOPROTOOPT;
break;
@@ -7430,6 +7484,38 @@ static int sctp_getsockopt_reuse_port(struct sock *sk, 
int len,
return 0;
 }
 
+static int sctp_getsockopt_event(struct sock *sk, int len, char __user *optval,
+int __user *optlen)
+{
+   struct sctp_association *asoc;
+   struct sctp_event param;
+   __u16 subscribe;
+
+   if (len < sizeof(param))
+   return -EINVAL;
+
+   len = sizeof(param);
+   if (copy_from_user(, optval, len))
+   return -EFAULT;
+
+   if (param.se_type < SCTP_SN_TYPE_BASE ||
+   param.se_type > SCTP_SN_TYPE_MAX)
+   return -EINVAL;
+
+   asoc = sctp_id2assoc(sk, param.se_assoc_id);
+   subscribe = asoc ? asoc->subscribe : sctp_sk(sk)->subscribe;
+   param.se_on = sctp_ulpevent_type_enabled(subscribe, param.se_type);
+
+   if (put_user(len, optlen))
+   return -EFAULT;
+
+   if (copy_to_user(optval, , len))
+   return -EFAULT;
+
+   return 0;
+}
+
+
 static int sctp_getsockopt(struct sock *sk, int level, int optname,
   char __user *optval, int __user *optlen)
 {
@@ -7628,6 +7714,9 @@ static int sctp_getsockopt(struct sock *sk, int level, 
int optname,
case SCTP_REUSE_PORT:
retval = sctp_getsockopt_reuse_port(sk, len, optval, optlen);
break;
+   case SCTP_EVENT:
+   retval = sctp_getsockopt_event(sk, len, optval, optlen);
+   break;
default:
retval = -ENOPROTOOPT;
break;
-- 
2.1.0



[PATCHv2 net-next 1/4] sctp: define subscribe in sctp_sock as __u16

2018-11-12 Thread Xin Long
The member subscribe in sctp_sock is used to indicate to which of
the events it is subscribed, more like a group of flags. So it's
better to be defined as __u16 (2 bytpes), instead of struct
sctp_event_subscribe (13 bytes).

Note that sctp_event_subscribe is an UAPI struct, used on sockopt
calls, and thus it will not be removed. This patch only changes
the internal storage of the flags.

Signed-off-by: Xin Long 
---
 include/net/sctp/structs.h   |  2 +-
 include/net/sctp/ulpevent.h  | 39 ---
 include/uapi/linux/sctp.h|  6 +-
 net/sctp/chunk.c |  4 ++--
 net/sctp/socket.c| 35 ++-
 net/sctp/stream_interleave.c | 11 ++-
 net/sctp/ulpqueue.c  |  8 
 7 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index af9d494..bc7808a 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -217,7 +217,7 @@ struct sctp_sock {
 * These two structures must be grouped together for the usercopy
 * whitelist region.
 */
-   struct sctp_event_subscribe subscribe;
+   __u16 subscribe;
struct sctp_initmsg initmsg;
 
int user_frag;
diff --git a/include/net/sctp/ulpevent.h b/include/net/sctp/ulpevent.h
index 51b4e06..bd922a0 100644
--- a/include/net/sctp/ulpevent.h
+++ b/include/net/sctp/ulpevent.h
@@ -164,30 +164,39 @@ void sctp_ulpevent_read_nxtinfo(const struct 
sctp_ulpevent *event,
 
 __u16 sctp_ulpevent_get_notification_type(const struct sctp_ulpevent *event);
 
+static inline void sctp_ulpevent_type_set(__u16 *subscribe,
+ __u16 sn_type, __u8 on)
+{
+   if (sn_type > SCTP_SN_TYPE_MAX)
+   return;
+
+   if (on)
+   *subscribe |=  (1 << (sn_type - SCTP_SN_TYPE_BASE));
+   else
+   *subscribe &= ~(1 << (sn_type - SCTP_SN_TYPE_BASE));
+}
+
 /* Is this event type enabled? */
-static inline int sctp_ulpevent_type_enabled(__u16 sn_type,
-struct sctp_event_subscribe *mask)
+static inline bool sctp_ulpevent_type_enabled(__u16 subscribe, __u16 sn_type)
 {
-   int offset = sn_type - SCTP_SN_TYPE_BASE;
-   char *amask = (char *) mask;
+   if (sn_type > SCTP_SN_TYPE_MAX)
+   return false;
 
-   if (offset >= sizeof(struct sctp_event_subscribe))
-   return 0;
-   return amask[offset];
+   return subscribe & (1 << (sn_type - SCTP_SN_TYPE_BASE));
 }
 
 /* Given an event subscription, is this event enabled? */
-static inline int sctp_ulpevent_is_enabled(const struct sctp_ulpevent *event,
-  struct sctp_event_subscribe *mask)
+static inline bool sctp_ulpevent_is_enabled(const struct sctp_ulpevent *event,
+   __u16 subscribe)
 {
__u16 sn_type;
-   int enabled = 1;
 
-   if (sctp_ulpevent_is_notification(event)) {
-   sn_type = sctp_ulpevent_get_notification_type(event);
-   enabled = sctp_ulpevent_type_enabled(sn_type, mask);
-   }
-   return enabled;
+   if (!sctp_ulpevent_is_notification(event))
+   return true;
+
+   sn_type = sctp_ulpevent_get_notification_type(event);
+
+   return sctp_ulpevent_type_enabled(subscribe, sn_type);
 }
 
 #endif /* __sctp_ulpevent_h__ */
diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index c81feb3..66afa5b 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -632,7 +632,9 @@ union sctp_notification {
  */
 
 enum sctp_sn_type {
-   SCTP_SN_TYPE_BASE = (1<<15),
+   SCTP_SN_TYPE_BASE   = (1<<15),
+   SCTP_DATA_IO_EVENT  = SCTP_SN_TYPE_BASE,
+#define SCTP_DATA_IO_EVENT SCTP_DATA_IO_EVENT
SCTP_ASSOC_CHANGE,
 #define SCTP_ASSOC_CHANGE  SCTP_ASSOC_CHANGE
SCTP_PEER_ADDR_CHANGE,
@@ -657,6 +659,8 @@ enum sctp_sn_type {
 #define SCTP_ASSOC_RESET_EVENT SCTP_ASSOC_RESET_EVENT
SCTP_STREAM_CHANGE_EVENT,
 #define SCTP_STREAM_CHANGE_EVENT   SCTP_STREAM_CHANGE_EVENT
+   SCTP_SN_TYPE_MAX= SCTP_STREAM_CHANGE_EVENT,
+#define SCTP_SN_TYPE_MAX   SCTP_SN_TYPE_MAX
 };
 
 /* Notification error codes used to fill up the error fields in some
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index ce80878..6c761af 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -109,8 +109,8 @@ static void sctp_datamsg_destroy(struct sctp_datamsg *msg)
error = asoc->outqueue.error;
 
sp = sctp_sk(asoc->base.sk);
-   notify = sctp_ulpevent_type_enabled(SCTP_SEND_FAILED,
-   >subscribe);
+   notify = sctp_ulpevent_type_enabled(sp->subscribe,
+   

[PATCHv2 net-next 3/4] sctp: rename enum sctp_event to sctp_event_type

2018-11-12 Thread Xin Long
sctp_event is a structure name defined in RFC for sockopt
SCTP_EVENT. To avoid the conflict, rename it.

Signed-off-by: Xin Long 
---
 include/net/sctp/constants.h |  2 +-
 include/net/sctp/sm.h|  4 ++--
 net/sctp/primitive.c |  2 +-
 net/sctp/sm_sideeffect.c | 12 ++--
 net/sctp/sm_statetable.c |  2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 8dadc74..4588bdc 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -71,7 +71,7 @@ enum { SCTP_DEFAULT_INSTREAMS = SCTP_MAX_STREAM };
 SCTP_NUM_AUTH_CHUNK_TYPES)
 
 /* These are the different flavours of event.  */
-enum sctp_event {
+enum sctp_event_type {
SCTP_EVENT_T_CHUNK = 1,
SCTP_EVENT_T_TIMEOUT,
SCTP_EVENT_T_OTHER,
diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 9e3d327..24825a8 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -173,7 +173,7 @@ sctp_state_fn_t sctp_sf_autoclose_timer_expire;
 __u8 sctp_get_chunk_type(struct sctp_chunk *chunk);
 const struct sctp_sm_table_entry *sctp_sm_lookup_event(
struct net *net,
-   enum sctp_event event_type,
+   enum sctp_event_type event_type,
enum sctp_state state,
union sctp_subtype event_subtype);
 int sctp_chunk_iif(const struct sctp_chunk *);
@@ -313,7 +313,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 
 /* Prototypes for statetable processing. */
 
-int sctp_do_sm(struct net *net, enum sctp_event event_type,
+int sctp_do_sm(struct net *net, enum sctp_event_type event_type,
   union sctp_subtype subtype, enum sctp_state state,
   struct sctp_endpoint *ep, struct sctp_association *asoc,
   void *event_arg, gfp_t gfp);
diff --git a/net/sctp/primitive.c b/net/sctp/primitive.c
index c0817f7a..a8c4c33 100644
--- a/net/sctp/primitive.c
+++ b/net/sctp/primitive.c
@@ -53,7 +53,7 @@
 int sctp_primitive_ ## name(struct net *net, struct sctp_association *asoc, \
void *arg) { \
int error = 0; \
-   enum sctp_event event_type; union sctp_subtype subtype; \
+   enum sctp_event_type event_type; union sctp_subtype subtype; \
enum sctp_state state; \
struct sctp_endpoint *ep; \
\
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 85d3930..1d143bc 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -52,7 +52,7 @@
 #include 
 #include 
 
-static int sctp_cmd_interpreter(enum sctp_event event_type,
+static int sctp_cmd_interpreter(enum sctp_event_type event_type,
union sctp_subtype subtype,
enum sctp_state state,
struct sctp_endpoint *ep,
@@ -61,7 +61,7 @@ static int sctp_cmd_interpreter(enum sctp_event event_type,
enum sctp_disposition status,
struct sctp_cmd_seq *commands,
gfp_t gfp);
-static int sctp_side_effects(enum sctp_event event_type,
+static int sctp_side_effects(enum sctp_event_type event_type,
 union sctp_subtype subtype,
 enum sctp_state state,
 struct sctp_endpoint *ep,
@@ -623,7 +623,7 @@ static void sctp_cmd_init_failed(struct sctp_cmd_seq 
*commands,
 /* Worker routine to handle SCTP_CMD_ASSOC_FAILED.  */
 static void sctp_cmd_assoc_failed(struct sctp_cmd_seq *commands,
  struct sctp_association *asoc,
- enum sctp_event event_type,
+ enum sctp_event_type event_type,
  union sctp_subtype subtype,
  struct sctp_chunk *chunk,
  unsigned int error)
@@ -1162,7 +1162,7 @@ static void sctp_cmd_send_asconf(struct sctp_association 
*asoc)
  * If you want to understand all of lksctp, this is a
  * good place to start.
  */
-int sctp_do_sm(struct net *net, enum sctp_event event_type,
+int sctp_do_sm(struct net *net, enum sctp_event_type event_type,
   union sctp_subtype subtype, enum sctp_state state,
   struct sctp_endpoint *ep, struct sctp_association *asoc,
   void *event_arg, gfp_t gfp)
@@ -1199,7 +1199,7 @@ int sctp_do_sm(struct net *net, enum sctp_event 
event_type,
 /*
  * This the master state function side effect processing function.
  */
-static int sctp_side_effects(enum sctp_event event_type,
+static int 

[PATCHv2 net-next 0/4] sctp: add subscribe per asoc and sockopt SCTP_EVENT

2018-11-12 Thread Xin Long
This patchset mainly adds the Event Subscription sockopt described in
rfc6525#section-6.2:

"Subscribing to events as described in [RFC6458] uses a setsockopt()
call with the SCTP_EVENT socket option.  This option takes the
following structure, which specifies the association, the event type
(using the same value found in the event type field), and an on/off
boolean.

  struct sctp_event {
sctp_assoc_t se_assoc_id;
uint16_t se_type;
uint8_t  se_on;
  };

The user fills in the se_type field with the same value found in the
strreset_type field, i.e., SCTP_STREAM_RESET_EVENT.  The user will
also fill in the se_assoc_id field with either the association to set
this event on (this field is ignored for one-to-one style sockets) or
one of the reserved constant values defined in [RFC6458].  Finally,
the se_on field is set with a 1 to enable the event or a 0 to disable
the event."

As for the old SCTP_EVENTS Option with struct sctp_event_subscribe,
it's being DEPRECATED.

Xin Long (4):
  sctp: define subscribe in sctp_sock as __u16
  sctp: add subscribe per asoc
  sctp: rename enum sctp_event to sctp_event_type
  sctp: add sockopt SCTP_EVENT

 include/net/sctp/constants.h |   2 +-
 include/net/sctp/sm.h|   4 +-
 include/net/sctp/structs.h   |   4 +-
 include/net/sctp/ulpevent.h  |  39 --
 include/uapi/linux/sctp.h|  13 -
 net/sctp/associola.c |   2 +
 net/sctp/chunk.c |   8 ++-
 net/sctp/primitive.c |   2 +-
 net/sctp/sm_sideeffect.c |  12 ++---
 net/sctp/sm_statetable.c |   2 +-
 net/sctp/socket.c| 126 ---
 net/sctp/stream_interleave.c |  12 +++--
 net/sctp/ulpqueue.c  |   8 +--
 13 files changed, 184 insertions(+), 50 deletions(-)

-- 
2.1.0



Re: [PATCH net-next 2/2] r8169: use proper constant for PCI vendor USR instead of numerical id

2018-11-12 Thread kbuild test robot
Hi Heiner,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Heiner-Kallweit/r8169-add-USR-PCI-vendor-id/2018-13
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   include/linux/slab.h:332:43: warning: dubious: x & !y
>> drivers/isdn/hardware/mISDN/w6692.c:1404:30: error: undefined identifier 
>> 'PCI_DEVICE_ID_USR_6692'
   drivers/isdn/hardware/mISDN/w6692.c:1404:23: error: 'PCI_DEVICE_ID_USR_6692' 
undeclared here (not in a function); did you mean 'PCI_DEVICE_ID_SI_6202'?
   PCI_VENDOR_ID_USR, PCI_DEVICE_ID_USR_6692, 0, 0,
  ^~
  PCI_DEVICE_ID_SI_6202

vim +/PCI_DEVICE_ID_USR_6692 +1404 drivers/isdn/hardware/mISDN/w6692.c

707b2ce6 Karsten Keil 2009-07-22  1399  
e8336ed0 Arvind Yadav 2017-07-15  1400  static const struct pci_device_id 
w6692_ids[] = {
707b2ce6 Karsten Keil 2009-07-22  1401  { PCI_VENDOR_ID_DYNALINK, 
PCI_DEVICE_ID_DYNALINK_IS64PH,
707b2ce6 Karsten Keil 2009-07-22  1402PCI_ANY_ID, PCI_ANY_ID, 0, 0, 
(ulong)_map[0]},
707b2ce6 Karsten Keil 2009-07-22  1403  { PCI_VENDOR_ID_WINBOND2, 
PCI_DEVICE_ID_WINBOND2_6692,
707b2ce6 Karsten Keil 2009-07-22 @1404PCI_VENDOR_ID_USR, 
PCI_DEVICE_ID_USR_6692, 0, 0,
707b2ce6 Karsten Keil 2009-07-22  1405(ulong)_map[2]},
707b2ce6 Karsten Keil 2009-07-22  1406  { PCI_VENDOR_ID_WINBOND2, 
PCI_DEVICE_ID_WINBOND2_6692,
707b2ce6 Karsten Keil 2009-07-22  1407PCI_ANY_ID, PCI_ANY_ID, 0, 0, 
(ulong)_map[1]},
707b2ce6 Karsten Keil 2009-07-22  1408  { }
707b2ce6 Karsten Keil 2009-07-22  1409  };
707b2ce6 Karsten Keil 2009-07-22  1410  MODULE_DEVICE_TABLE(pci, w6692_ids);
707b2ce6 Karsten Keil 2009-07-22  1411  

:: The code at line 1404 was first introduced by commit
:: 707b2ce6c1f4f1261788f2ff09ad82c35e0e6240 mISDN: Add driver for Winbond 
cards

:: TO: Karsten Keil 
:: CC: Karsten Keil 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH bpf-next v2] bpftool: make libbfd optional

2018-11-12 Thread Quentin Monnet
2018-11-12 14:02 UTC-0800 ~ Jakub Kicinski 
> On Mon, 12 Nov 2018 13:44:10 -0800, Stanislav Fomichev wrote:
>> Make it possible to build bpftool without libbfd. libbfd and libopcodes are
>> typically provided in dev/dbg packages (binutils-dev in debian) which we
>> usually don't have installed on the fleet machines and we'd like a way to 
>> have
>> bpftool version that works without installing any additional packages.
>> This excludes support for disassembling jit-ted code and prints an error if
>> the user tries to use these features.
>>
>> Tested by:
>> cat > FEATURES_DUMP.bpftool <> feature-libbfd=0
>> feature-disassembler-four-args=1
>> feature-reallocarray=0
>> feature-libelf=1
>> feature-libelf-mmap=1
>> feature-bpf=1
>> EOF
>> FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make
>> ldd bpftool | grep libbfd
>>
>> Signed-off-by: Stanislav Fomichev 
> 
> Seems reasonable, thanks!
> 
> Acked-by: Jakub Kicinski 
> 

Thanks Stanislav!

There is a problem with this patch on some distributions, Ubuntu at least.

Feature detection for libbfd has been used for perf before being also
used with bpftool. Since commit 280e7c48c3b8 the feature needs libz and
libiberty to be present on the system, otherwise the feature would not
compile (and be detected) on OpenSuse.

On Ubuntu, libiberty is not needed (libbfd might be statically linked
against it, if I remember correctly?), which means that we are able to
build bpftool as long as binutils-dev has been installed, even if
libiberty-dev has not been installed. The BFD feature, in that case,
will appear as “undetected”. It is a bug. But since the Makefile does
not stop compilation in that case (another bug), in the end we're good.

With your patch, the problem is that libbpf detection will fail on
Ubuntu if libiberty-dev is not present, even though all the necessary
libraries for using the JIT disassembler are available. And in that case
it _will_ make a difference, since the Makefile will no more compile the
libbfd-related bits.

So I'm not against the idea, but we have to fix libbfd detection first.

Thanks,
Quentin


[Patch net-next] net: remove unused skb_send_sock()

2018-11-12 Thread Cong Wang
Signed-off-by: Cong Wang 
---
 include/linux/skbuff.h |  1 -
 net/core/skbuff.c  | 13 -
 2 files changed, 14 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7dcfb5591dc3..9f5bd97a26bd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3327,7 +3327,6 @@ int skb_splice_bits(struct sk_buff *skb, struct sock *sk, 
unsigned int offset,
unsigned int flags);
 int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset,
 int len);
-int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
 unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
 int skb_zerocopy(struct sk_buff *to, struct sk_buff *from,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index fcb1155a00ec..cfe5a03fbdf3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2366,19 +2366,6 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff 
*skb, int offset,
 }
 EXPORT_SYMBOL_GPL(skb_send_sock_locked);
 
-/* Send skb data on a socket. */
-int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len)
-{
-   int ret = 0;
-
-   lock_sock(sk);
-   ret = skb_send_sock_locked(sk, skb, offset, len);
-   release_sock(sk);
-
-   return ret;
-}
-EXPORT_SYMBOL_GPL(skb_send_sock);
-
 /**
  * skb_store_bits - store bits from kernel buffer to skb
  * @skb: destination buffer
-- 
2.19.1



Re: [PATCH v2] net: Add trace events for all receive exit points

2018-11-12 Thread Steven Rostedt
On Mon, 12 Nov 2018 15:46:53 -0500 (EST)
Mathieu Desnoyers  wrote:

> I also notice that in two cases, a "gro_result_t" is implicitly cast
> to "int". I usually frown upon this kind of stuff, because it's asking
> for trouble if gro_result_t typedef to something else than "int" in the
> future.
> 
> I would recommend going for two templates, one which takes a "int"
> ret parameter, and the other a "gro_result_t" ret parameter.
> 
> Or am I being too cautious ?

That's more of a question for the netdev maintainers. If they think
casting gro_result_t to int is fine, then I'm fine. If it breaks in the
future, they need to deal with it, I don't ;-)

The downside of two templates, is that the templates are the bloated
part of the trace event (DEFINE_EVENT()s are light weight). They can
add a couple of K to the memory foot print.

-- Steve


[PATCH][net-next] net: slightly optimize eth_type_trans

2018-11-12 Thread Li RongQing
netperf udp stream shows that eth_type_trans takes certain cpu,
so adjust the mac address check order, and firstly check if it
is device address, and only check if it is multicast address
only if not the device address.

After this change:
To unicast, and skb dst mac is device mac, this is most of time
reduce a comparision
To unicast, and skb dst mac is not device mac, nothing change
To multicast, increase a comparision

Before:
1.03%  [kernel]  [k] eth_type_trans

After:
0.78%  [kernel]  [k] eth_type_trans

Signed-off-by: Zhang Yu 
Signed-off-by: Li RongQing 
---
 net/ethernet/eth.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index fd8faa0dfa61..1c88f5c5d5b1 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -165,15 +165,17 @@ __be16 eth_type_trans(struct sk_buff *skb, struct 
net_device *dev)
eth = (struct ethhdr *)skb->data;
skb_pull_inline(skb, ETH_HLEN);
 
-   if (unlikely(is_multicast_ether_addr_64bits(eth->h_dest))) {
-   if (ether_addr_equal_64bits(eth->h_dest, dev->broadcast))
-   skb->pkt_type = PACKET_BROADCAST;
-   else
-   skb->pkt_type = PACKET_MULTICAST;
+   if (unlikely(!ether_addr_equal_64bits(eth->h_dest,
+  dev->dev_addr))) {
+   if (unlikely(is_multicast_ether_addr_64bits(eth->h_dest))) {
+   if (ether_addr_equal_64bits(eth->h_dest, 
dev->broadcast))
+   skb->pkt_type = PACKET_BROADCAST;
+   else
+   skb->pkt_type = PACKET_MULTICAST;
+   } else {
+   skb->pkt_type = PACKET_OTHERHOST;
+   }
}
-   else if (unlikely(!ether_addr_equal_64bits(eth->h_dest,
-  dev->dev_addr)))
-   skb->pkt_type = PACKET_OTHERHOST;
 
/*
 * Some variants of DSA tagging don't have an ethertype field
-- 
2.16.2



[PATCH][net-next][v2] net: remove BUG_ON from __pskb_pull_tail

2018-11-12 Thread Li RongQing
if list is NULL pointer, and the following access of list
will trigger panic, which is same as BUG_ON

Signed-off-by: Li RongQing 
---
 net/core/skbuff.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 396fcb3baad0..d69503d66021 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1925,8 +1925,6 @@ void *__pskb_pull_tail(struct sk_buff *skb, int delta)
struct sk_buff *insp = NULL;
 
do {
-   BUG_ON(!list);
-
if (list->len <= eat) {
/* Eaten as whole. */
eat -= list->len;
-- 
2.16.2



Re: [iproute PATCH] man: ip-route.8: Document nexthop limit

2018-11-12 Thread David Ahern
On 11/12/18 2:21 PM, Phil Sutter wrote:
> diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
> index a33ce1f0f4006..383178c11331e 100644
> --- a/man/man8/ip-route.8.in
> +++ b/man/man8/ip-route.8.in
> @@ -589,6 +589,13 @@ argument lists:
>  route reflecting its relative bandwidth or quality.
>  .in -8
>  
> +The internal buffer used in iproute2 limits the maximum number of nexthops to
> +be specified in one go. If only a gateway address is given, the current 
> buffer
> +size allows for 144 IPv6 nexthops and 253 IPv4 ones. If more are required, 
> they
> +may be added to the existing route using
> +.B "ip route append"
> +command.
> +

That is not true for IPv4. 'ip ro append' adds a new route after the
existing route - an entry that can not be hit unless all of the nexthops
in the first route are down. 'ip ro prepend' adds a new entry before the
existing one meaning it takes precedence over the existing entries.

For IPv6, 'append' and 'prepend' both add new nexthops to the existing
entry.



[Patch net-next] net: get rid of __tcp_checksum_complete()

2018-11-12 Thread Cong Wang
__tcp_checksum_complete() is 100% same with __skb_checksum_complete()
and there is no other caller except tcp_checksum_complete().
So, just use __skb_checksum_complete() there.

Signed-off-by: Cong Wang 
---
 include/net/tcp.h | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 4743836bed2e..b84b694e8b3d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1315,15 +1315,10 @@ static inline __sum16 tcp_v4_check(int len, __be32 
saddr,
return csum_tcpudp_magic(saddr,daddr,len,IPPROTO_TCP,base);
 }
 
-static inline __sum16 __tcp_checksum_complete(struct sk_buff *skb)
-{
-   return __skb_checksum_complete(skb);
-}
-
 static inline bool tcp_checksum_complete(struct sk_buff *skb)
 {
return !skb_csum_unnecessary(skb) &&
-   __tcp_checksum_complete(skb);
+   __skb_checksum_complete(skb);
 }
 
 bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb);
-- 
2.19.1



[PATCH net] net_sched: sch_fq: ensure maxrate fq parameter applies to EDT flows

2018-11-12 Thread Eric Dumazet
When EDT conversion happened, fq lost the ability to enfore a maxrate
for all flows. It kept it for non EDT flows.

This commit restores the functionality.

Tested:

tc qd replace dev eth0 root fq maxrate 500Mbit
netperf -P0 -H host -- -O THROUGHPUT
489.75

Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time 
model")
Signed-off-by: Eric Dumazet 
---
 net/sched/sch_fq.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 
4b1af706896c07e5a0fe6d542dfcd530acdcf8f5..25a7cf6d380fd1ef5610a43a06dea488121b8206
 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -469,22 +469,29 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
goto begin;
}
prefetch(>end);
-   f->credit -= qdisc_pkt_len(skb);
+   plen = qdisc_pkt_len(skb);
+   f->credit -= plen;
 
-   if (ktime_to_ns(skb->tstamp) || !q->rate_enable)
+   if (!q->rate_enable)
goto out;
 
rate = q->flow_max_rate;
-   if (skb->sk)
-   rate = min(skb->sk->sk_pacing_rate, rate);
 
-   if (rate <= q->low_rate_threshold) {
-   f->credit = 0;
-   plen = qdisc_pkt_len(skb);
-   } else {
-   plen = max(qdisc_pkt_len(skb), q->quantum);
-   if (f->credit > 0)
-   goto out;
+   /* If EDT time was provided for this skb, we need to
+* update f->time_next_packet only if this qdisc enforces
+* a flow max rate.
+*/
+   if (!skb->tstamp) {
+   if (skb->sk)
+   rate = min(skb->sk->sk_pacing_rate, rate);
+
+   if (rate <= q->low_rate_threshold) {
+   f->credit = 0;
+   } else {
+   plen = max(plen, q->quantum);
+   if (f->credit > 0)
+   goto out;
+   }
}
if (rate != ~0UL) {
u64 len = (u64)plen * NSEC_PER_SEC;
-- 
2.19.1.930.g4563a0d9d0-goog



[net-next PATCH v4] net: sched: cls_flower: Classify packets using port ranges

2018-11-12 Thread Amritha Nambiar
Added support in tc flower for filtering based on port ranges.

Example:
1. Match on a port range:
-
$ tc filter add dev enp4s0 protocol ip parent :\
  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent :
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port range 20-30
  skip_hw
  not_in_hw
action order 1: gact action drop
 random type none pass val 0
 index 1 ref 1 bind 1 installed 85 sec used 3 sec
Action statistics:
Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

2. Match on IP address and port range:
--
$ tc filter add dev enp4s0 protocol ip parent :\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent :
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port range 100-200
  skip_hw
  not_in_hw
action order 1: gact action drop
 random type none pass val 0
 index 2 ref 1 bind 1 installed 58 sec used 2 sec
Action statistics:
Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

v4:
1. Added condition before setting port key.
2. Organized setting and dumping port range keys into functions
   and added validation of input range.

v3:
1. Moved new fields in UAPI enum to the end of enum.
2. Removed couple of empty lines.

v2:
Addressed Jiri's comments:
1. Added separate functions for dst and src comparisons.
2. Removed endpoint enum.
3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
  lookup.
4. Cleaned up fl_lookup function.

Signed-off-by: Amritha Nambiar 
---
 include/uapi/linux/pkt_cls.h |7 ++
 net/sched/cls_flower.c   |  155 --
 2 files changed, 156 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 401d0c1..95d0db2 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -485,6 +485,11 @@ enum {
 
TCA_FLOWER_IN_HW_COUNT,
 
+   TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
+   TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
+   TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
+   TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
+
__TCA_FLOWER_MAX,
 };
 
@@ -518,6 +523,8 @@ enum {
TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
 };
 
+#define TCA_FLOWER_MASK_FLAGS_RANGE(1 << 0) /* Range-based match */
+
 /* Match-all classifier */
 
 enum {
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index c6c3278..85e9f8e 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -55,6 +55,8 @@ struct fl_flow_key {
struct flow_dissector_key_ip ip;
struct flow_dissector_key_ip enc_ip;
struct flow_dissector_key_enc_opts enc_opts;
+   struct flow_dissector_key_ports tp_min;
+   struct flow_dissector_key_ports tp_max;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -65,6 +67,7 @@ struct fl_flow_mask_range {
 struct fl_flow_mask {
struct fl_flow_key key;
struct fl_flow_mask_range range;
+   u32 flags;
struct rhash_head ht_node;
struct rhashtable ht;
struct rhashtable_params filter_ht_params;
@@ -179,13 +182,89 @@ static void fl_clear_masked_range(struct fl_flow_key *key,
memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
 }
 
-static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
-  struct fl_flow_key *mkey)
+static bool fl_range_port_dst_cmp(struct cls_fl_filter *filter,
+ struct fl_flow_key *key,
+ struct fl_flow_key *mkey)
+{
+   __be16 min_mask, max_mask, min_val, max_val;
+
+   min_mask = htons(filter->mask->key.tp_min.dst);
+   max_mask = htons(filter->mask->key.tp_max.dst);
+   min_val = htons(filter->key.tp_min.dst);
+   max_val = htons(filter->key.tp_max.dst);
+
+   if (min_mask && max_mask) {
+   if (htons(key->tp.dst) < min_val ||
+   htons(key->tp.dst) > max_val)
+   return false;
+
+   /* skb does not have min and max values */
+   mkey->tp_min.dst = filter->mkey.tp_min.dst;
+   mkey->tp_max.dst = filter->mkey.tp_max.dst;
+   }
+   return true;
+}
+
+static bool fl_range_port_src_cmp(struct cls_fl_filter *filter,
+ struct fl_flow_key *key,
+ struct fl_flow_key *mkey)
+{
+   __be16 min_mask, max_mask, min_val, max_val;
+
+   min_mask = 

Re: [net-next PATCH v3] net: sched: cls_flower: Classify packets using port ranges

2018-11-12 Thread Nambiar, Amritha
On 11/9/2018 11:14 PM, Jiri Pirko wrote:
> Sat, Nov 10, 2018 at 01:11:10AM CET, amritha.namb...@intel.com wrote:
> 
> [...]
> 
>> @@ -1026,8 +1122,7 @@ static void fl_init_dissector(struct flow_dissector 
>> *dissector,
>>   FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
>>  FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>>   FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
>> -FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>> - FLOW_DISSECTOR_KEY_PORTS, tp);
>> +FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp);
> 
> You still need to set the key under a condition. Something like:
>   if (FL_KEY_IS_MASKED(mask, tp) ||
>   FL_KEY_IS_MASKED(mask, tp_min) ||
>   FL_KEY_IS_MASKED(mask, tp_max)
>   FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp);
> 

Yes, will do. Thanks!

> 
>>  FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>>   FLOW_DISSECTOR_KEY_IP, ip);
>>  FL_KEY_SET_IF_MASKED(mask, keys, cnt,
> 
> [...]
> 


Re: [PATCH bpf-next] selftests/bpf: Fix uninitialized duration warning

2018-11-12 Thread Martin Lau
On Fri, Nov 09, 2018 at 10:18:16AM -0800, Joe Stringer wrote:
> Daniel Borkmann reports:
> 
> test_progs.c: In function ‘main’:
> test_progs.c:81:3: warning: ‘duration’ may be used uninitialized in this 
> function [-Wmaybe-uninitialized]
>printf("%s:PASS:%s %d nsec\n", __func__, tag, duration);\
>^~
> test_progs.c:1706:8: note: ‘duration’ was declared here
>   __u32 duration;
> ^~~~
> 
> Signed-off-by: Joe Stringer 
I can repro with my locally compiled gcc 7.3.  It fixed the warning.

Acked-by: Martin KaFai Lau 

> ---
> 
> I'm actually not able to reproduce this with GCC 7.3 or 8.2, so I'll
> rely on review to establish that this patch works as intended.
> ---
>  tools/testing/selftests/bpf/test_progs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/test_progs.c 
> b/tools/testing/selftests/bpf/test_progs.c
> index 2d3c04f45530..c1e688f61061 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -1703,7 +1703,7 @@ static void test_reference_tracking()
>   const char *file = "./test_sk_lookup_kern.o";
>   struct bpf_object *obj;
>   struct bpf_program *prog;
> - __u32 duration;
> + __u32 duration = 0;
>   int err = 0;
>  
>   obj = bpf_object__open(file);
> -- 
> 2.17.1
> 


[PATCH bpf-next] bpf: libbpf: Fix bpf_program__next() API

2018-11-12 Thread Martin KaFai Lau
This patch restores the behavior in
commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for 
multi-function programs")
such that bpf_program__next() does not return pseudo programs in ".text".

Fixes: 0c19a9fbc9cd ("libbpf: cleanup after partial failure in bpf_object__pin")
Signed-off-by: Martin KaFai Lau 
---
 tools/lib/bpf/libbpf.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e827542ffa3a..a01eb9584e52 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2193,19 +2193,25 @@ void *bpf_object__priv(struct bpf_object *obj)
 }
 
 static struct bpf_program *
-__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i)
+__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, bool 
forward)
 {
+   size_t nr_programs = obj->nr_programs;
ssize_t idx;
 
-   if (!obj->programs)
+   if (!nr_programs)
return NULL;
 
+   if (!p)
+   /* Iter from the beginning */
+   return forward ? >programs[0] :
+   >programs[nr_programs - 1];
+
if (p->obj != obj) {
pr_warning("error: program handler doesn't match object\n");
return NULL;
}
 
-   idx = (p - obj->programs) + i;
+   idx = (p - obj->programs) + (forward ? 1 : -1);
if (idx >= obj->nr_programs || idx < 0)
return NULL;
return >programs[idx];
@@ -2216,11 +,8 @@ bpf_program__next(struct bpf_program *prev, struct 
bpf_object *obj)
 {
struct bpf_program *prog = prev;
 
-   if (prev == NULL)
-   return obj->programs;
-
do {
-   prog = __bpf_program__iter(prog, obj, 1);
+   prog = __bpf_program__iter(prog, obj, true);
} while (prog && bpf_program__is_function_storage(prog, obj));
 
return prog;
@@ -2231,14 +2234,8 @@ bpf_program__prev(struct bpf_program *next, struct 
bpf_object *obj)
 {
struct bpf_program *prog = next;
 
-   if (next == NULL) {
-   if (!obj->nr_programs)
-   return NULL;
-   return obj->programs + obj->nr_programs - 1;
-   }
-
do {
-   prog = __bpf_program__iter(prog, obj, -1);
+   prog = __bpf_program__iter(prog, obj, false);
} while (prog && bpf_program__is_function_storage(prog, obj));
 
return prog;
-- 
2.17.1



Re: [PATCH v5 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin

2018-11-12 Thread Martin Lau
On Mon, Nov 12, 2018 at 03:29:25PM -0800, Stanislav Fomichev wrote:
> On 11/12, Martin Lau wrote:
> > On Mon, Nov 12, 2018 at 02:10:11PM -0800, Stanislav Fomichev wrote:
> > > On 11/12, Martin Lau wrote:
> > > > On Fri, Nov 09, 2018 at 08:21:41AM -0800, Stanislav Fomichev wrote:
> > > > [ ... ]
> > > > > @@ -1918,23 +2160,20 @@ void *bpf_object__priv(struct bpf_object *obj)
> > > > >  }
> > > > >  
> > > > >  static struct bpf_program *
> > > > > -__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
> > > > > +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, 
> > > > > int i)
> > > > >  {
> > > > > - size_t idx;
> > > > > + ssize_t idx;
> > > > >  
> > > > >   if (!obj->programs)
> > > > >   return NULL;
> > > > > - /* First handler */
> > > > > - if (prev == NULL)
> > > > > - return >programs[0];
> > > > >  
> > > > > - if (prev->obj != obj) {
> > > > > + if (p->obj != obj) {
> > > > >   pr_warning("error: program handler doesn't match 
> > > > > object\n");
> > > > >   return NULL;
> > > > >   }
> > > > >  
> > > > > - idx = (prev - obj->programs) + 1;
> > > > > - if (idx >= obj->nr_programs)
> > > > > + idx = (p - obj->programs) + i;
> > > > > + if (idx >= obj->nr_programs || idx < 0)
> > > > >   return NULL;
> > > > >   return >programs[idx];
> > > > >  }
> > > > > @@ -1944,8 +2183,29 @@ bpf_program__next(struct bpf_program *prev, 
> > > > > struct bpf_object *obj)
> > > > >  {
> > > > >   struct bpf_program *prog = prev;
> > > > >  
> > > > > + if (prev == NULL)
> > > > > + return obj->programs;
> > > > > +
> > > > This patch breaks the behavior introduced in
> > > > commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program 
> > > > for multi-function programs"):
> > > > "Make bpf_program__next() skip over '.text' section if object file
> > > >  has pseudo calls.  The '.text' section is hardly a program in that
> > > >  case, it's more of a storage for code of functions other than main."
> > > > 
> > > > For example, the userspace could have been doing:
> > > > prog = bpf_program__next(NULL, obj);
> > > > bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
> > > > bpf_object__load(obj);
> > > > 
> > > > For the bpf_prog.o that has pseudo calls, after this patch in bpf-next,
> > > > the prog returned by bpf_program__next() could be in ".text" instead of
> > > > the main bpf program.  The next bpf_program__set_type() has
> > > > no effect to the main program.  The following bpf_object__load()
> > > > will catch user in surprise with the main bpf prog in
> > > > the wrong BPF_PROG_TYPE.
> > > 
> > > Will something like the following fix your concern? (plus, assuming the
> > > same for prev):
> > > 
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -2216,8 +2216,11 @@ bpf_program__next(struct bpf_program *prev, struct 
> > > bpf_object *obj)
> > >  {
> > > struct bpf_program *prog = prev;
> > >  
> > > -   if (prev == NULL)
> > > -   return obj->programs;
> > > +   if (prev == NULL) {
> > > +   prog = obj->programs;
> > > +   if (!prog || !bpf_program__is_function_storage(prog, obj))
> > > +   return prog;
> > > +   }
> > >  
> > > do {
> > > prog = __bpf_program__iter(prog, obj, 1);
> > > 
> > > Any suggestions for a better way to do it?
> > I think that would work.  The bpf_program__prev() will need the same
> > treatment though...
> > 
> > Here is my mostly untested fix to unblock my other dev works.  It moves
> > the very first NULL check back to __bpf_program__iter():
> I like your version and it works with my simple flow dissector test :-)
Great.  I will send out an offical patch.

> Thanks for spotting and fixing it!
> 
> 
> > From de1c89ae1768e756825a6874268b5b1686695c93 Mon Sep 17 00:00:00 2001
> > From: Martin KaFai Lau 
> > Date: Mon, 12 Nov 2018 14:52:39 -0800
> > Subject: [PATCH] bpf: libbpf: Fix bpf_program__next() API
> > 
> > This patch restores the behavior in
> > commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for 
> > multi-function programs"):
> > such that bpf_program__next() does not return pseudo programs in ".text".
> > 
> > Fixes: 0c19a9fbc9cd ("libbpf: cleanup after partial failure in 
> > bpf_object__pin")
> > Signed-off-by: Martin KaFai Lau 
> > ---
> >  tools/lib/bpf/libbpf.c | 25 +++--
> >  1 file changed, 11 insertions(+), 14 deletions(-)
> > 
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index e827542ffa3a..a01eb9584e52 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -2193,19 +2193,25 @@ void *bpf_object__priv(struct bpf_object *obj)
> >  }
> >  
> >  static struct bpf_program *
> > -__bpf_program__iter(struct bpf_program *p, struct bpf_object 

Re: [PATCH v5 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin

2018-11-12 Thread Stanislav Fomichev
On 11/12, Martin Lau wrote:
> On Mon, Nov 12, 2018 at 02:10:11PM -0800, Stanislav Fomichev wrote:
> > On 11/12, Martin Lau wrote:
> > > On Fri, Nov 09, 2018 at 08:21:41AM -0800, Stanislav Fomichev wrote:
> > > [ ... ]
> > > > @@ -1918,23 +2160,20 @@ void *bpf_object__priv(struct bpf_object *obj)
> > > >  }
> > > >  
> > > >  static struct bpf_program *
> > > > -__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
> > > > +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int 
> > > > i)
> > > >  {
> > > > -   size_t idx;
> > > > +   ssize_t idx;
> > > >  
> > > > if (!obj->programs)
> > > > return NULL;
> > > > -   /* First handler */
> > > > -   if (prev == NULL)
> > > > -   return >programs[0];
> > > >  
> > > > -   if (prev->obj != obj) {
> > > > +   if (p->obj != obj) {
> > > > pr_warning("error: program handler doesn't match 
> > > > object\n");
> > > > return NULL;
> > > > }
> > > >  
> > > > -   idx = (prev - obj->programs) + 1;
> > > > -   if (idx >= obj->nr_programs)
> > > > +   idx = (p - obj->programs) + i;
> > > > +   if (idx >= obj->nr_programs || idx < 0)
> > > > return NULL;
> > > > return >programs[idx];
> > > >  }
> > > > @@ -1944,8 +2183,29 @@ bpf_program__next(struct bpf_program *prev, 
> > > > struct bpf_object *obj)
> > > >  {
> > > > struct bpf_program *prog = prev;
> > > >  
> > > > +   if (prev == NULL)
> > > > +   return obj->programs;
> > > > +
> > > This patch breaks the behavior introduced in
> > > commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program 
> > > for multi-function programs"):
> > > "Make bpf_program__next() skip over '.text' section if object file
> > >  has pseudo calls.  The '.text' section is hardly a program in that
> > >  case, it's more of a storage for code of functions other than main."
> > > 
> > > For example, the userspace could have been doing:
> > >   prog = bpf_program__next(NULL, obj);
> > >   bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
> > >   bpf_object__load(obj);
> > > 
> > > For the bpf_prog.o that has pseudo calls, after this patch in bpf-next,
> > > the prog returned by bpf_program__next() could be in ".text" instead of
> > > the main bpf program.  The next bpf_program__set_type() has
> > > no effect to the main program.  The following bpf_object__load()
> > > will catch user in surprise with the main bpf prog in
> > > the wrong BPF_PROG_TYPE.
> > 
> > Will something like the following fix your concern? (plus, assuming the
> > same for prev):
> > 
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -2216,8 +2216,11 @@ bpf_program__next(struct bpf_program *prev, struct 
> > bpf_object *obj)
> >  {
> > struct bpf_program *prog = prev;
> >  
> > -   if (prev == NULL)
> > -   return obj->programs;
> > +   if (prev == NULL) {
> > +   prog = obj->programs;
> > +   if (!prog || !bpf_program__is_function_storage(prog, obj))
> > +   return prog;
> > +   }
> >  
> > do {
> > prog = __bpf_program__iter(prog, obj, 1);
> > 
> > Any suggestions for a better way to do it?
> I think that would work.  The bpf_program__prev() will need the same
> treatment though...
> 
> Here is my mostly untested fix to unblock my other dev works.  It moves
> the very first NULL check back to __bpf_program__iter():
I like your version and it works with my simple flow dissector test :-)
Thanks for spotting and fixing it!


> From de1c89ae1768e756825a6874268b5b1686695c93 Mon Sep 17 00:00:00 2001
> From: Martin KaFai Lau 
> Date: Mon, 12 Nov 2018 14:52:39 -0800
> Subject: [PATCH] bpf: libbpf: Fix bpf_program__next() API
> 
> This patch restores the behavior in
> commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for 
> multi-function programs"):
> such that bpf_program__next() does not return pseudo programs in ".text".
> 
> Fixes: 0c19a9fbc9cd ("libbpf: cleanup after partial failure in 
> bpf_object__pin")
> Signed-off-by: Martin KaFai Lau 
> ---
>  tools/lib/bpf/libbpf.c | 25 +++--
>  1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index e827542ffa3a..a01eb9584e52 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -2193,19 +2193,25 @@ void *bpf_object__priv(struct bpf_object *obj)
>  }
>  
>  static struct bpf_program *
> -__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i)
> +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, bool 
> forward)
>  {
> + size_t nr_programs = obj->nr_programs;
>   ssize_t idx;
>  
> - if (!obj->programs)
> + if (!nr_programs)
>   return NULL;
>  
> + if (!p)
> + /* Iter from the beginning */
> +  

Re: [PATCH v5 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin

2018-11-12 Thread Martin Lau
On Mon, Nov 12, 2018 at 02:10:11PM -0800, Stanislav Fomichev wrote:
> On 11/12, Martin Lau wrote:
> > On Fri, Nov 09, 2018 at 08:21:41AM -0800, Stanislav Fomichev wrote:
> > [ ... ]
> > > @@ -1918,23 +2160,20 @@ void *bpf_object__priv(struct bpf_object *obj)
> > >  }
> > >  
> > >  static struct bpf_program *
> > > -__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
> > > +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i)
> > >  {
> > > - size_t idx;
> > > + ssize_t idx;
> > >  
> > >   if (!obj->programs)
> > >   return NULL;
> > > - /* First handler */
> > > - if (prev == NULL)
> > > - return >programs[0];
> > >  
> > > - if (prev->obj != obj) {
> > > + if (p->obj != obj) {
> > >   pr_warning("error: program handler doesn't match object\n");
> > >   return NULL;
> > >   }
> > >  
> > > - idx = (prev - obj->programs) + 1;
> > > - if (idx >= obj->nr_programs)
> > > + idx = (p - obj->programs) + i;
> > > + if (idx >= obj->nr_programs || idx < 0)
> > >   return NULL;
> > >   return >programs[idx];
> > >  }
> > > @@ -1944,8 +2183,29 @@ bpf_program__next(struct bpf_program *prev, struct 
> > > bpf_object *obj)
> > >  {
> > >   struct bpf_program *prog = prev;
> > >  
> > > + if (prev == NULL)
> > > + return obj->programs;
> > > +
> > This patch breaks the behavior introduced in
> > commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for 
> > multi-function programs"):
> > "Make bpf_program__next() skip over '.text' section if object file
> >  has pseudo calls.  The '.text' section is hardly a program in that
> >  case, it's more of a storage for code of functions other than main."
> > 
> > For example, the userspace could have been doing:
> > prog = bpf_program__next(NULL, obj);
> > bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
> > bpf_object__load(obj);
> > 
> > For the bpf_prog.o that has pseudo calls, after this patch in bpf-next,
> > the prog returned by bpf_program__next() could be in ".text" instead of
> > the main bpf program.  The next bpf_program__set_type() has
> > no effect to the main program.  The following bpf_object__load()
> > will catch user in surprise with the main bpf prog in
> > the wrong BPF_PROG_TYPE.
> 
> Will something like the following fix your concern? (plus, assuming the
> same for prev):
> 
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -2216,8 +2216,11 @@ bpf_program__next(struct bpf_program *prev, struct 
> bpf_object *obj)
>  {
> struct bpf_program *prog = prev;
>  
> -   if (prev == NULL)
> -   return obj->programs;
> +   if (prev == NULL) {
> +   prog = obj->programs;
> +   if (!prog || !bpf_program__is_function_storage(prog, obj))
> +   return prog;
> +   }
>  
> do {
> prog = __bpf_program__iter(prog, obj, 1);
> 
> Any suggestions for a better way to do it?
I think that would work.  The bpf_program__prev() will need the same
treatment though...

Here is my mostly untested fix to unblock my other dev works.  It moves
the very first NULL check back to __bpf_program__iter():

>From de1c89ae1768e756825a6874268b5b1686695c93 Mon Sep 17 00:00:00 2001
From: Martin KaFai Lau 
Date: Mon, 12 Nov 2018 14:52:39 -0800
Subject: [PATCH] bpf: libbpf: Fix bpf_program__next() API

This patch restores the behavior in
commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for 
multi-function programs"):
such that bpf_program__next() does not return pseudo programs in ".text".

Fixes: 0c19a9fbc9cd ("libbpf: cleanup after partial failure in bpf_object__pin")
Signed-off-by: Martin KaFai Lau 
---
 tools/lib/bpf/libbpf.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e827542ffa3a..a01eb9584e52 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2193,19 +2193,25 @@ void *bpf_object__priv(struct bpf_object *obj)
 }
 
 static struct bpf_program *
-__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i)
+__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, bool 
forward)
 {
+   size_t nr_programs = obj->nr_programs;
ssize_t idx;
 
-   if (!obj->programs)
+   if (!nr_programs)
return NULL;
 
+   if (!p)
+   /* Iter from the beginning */
+   return forward ? >programs[0] :
+   >programs[nr_programs - 1];
+
if (p->obj != obj) {
pr_warning("error: program handler doesn't match object\n");
return NULL;
}
 
-   idx = (p - obj->programs) + i;
+   idx = (p - obj->programs) + (forward ? 1 : -1);
if (idx >= obj->nr_programs || idx < 0)
return NULL;
return >programs[idx];
@@ -2216,11 +,8 @@ bpf_program__next(struct bpf_program 

Hello, this is the second times am sending you this mail and you refused to reply to my email why?

2018-11-12 Thread KATIE HIGGINS




[PATCH net-next 01/13] nfp: abm: rename qdiscs -> red_qdiscs

2018-11-12 Thread Jakub Kicinski
Rename qdiscs member to red_qdiscs.  One of following patches will
use the name qdiscs for tracking all qdisc types.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 drivers/net/ethernet/netronome/nfp/abm/main.c | 11 +++---
 drivers/net/ethernet/netronome/nfp/abm/main.h |  4 +-
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 38 ++-
 3 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c 
b/drivers/net/ethernet/netronome/nfp/abm/main.c
index 3d15de0ae271..5e749602989e 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -309,9 +309,10 @@ nfp_abm_vnic_alloc(struct nfp_app *app, struct nfp_net 
*nn, unsigned int id)
alink->id = id;
alink->parent = TC_H_ROOT;
alink->total_queues = alink->vnic->max_rx_rings;
-   alink->qdiscs = kvcalloc(alink->total_queues, sizeof(*alink->qdiscs),
-GFP_KERNEL);
-   if (!alink->qdiscs) {
+   alink->red_qdiscs = kvcalloc(alink->total_queues,
+sizeof(*alink->red_qdiscs),
+GFP_KERNEL);
+   if (!alink->red_qdiscs) {
err = -ENOMEM;
goto err_free_alink;
}
@@ -331,7 +332,7 @@ nfp_abm_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, 
unsigned int id)
return 0;
 
 err_free_qdiscs:
-   kvfree(alink->qdiscs);
+   kvfree(alink->red_qdiscs);
 err_free_alink:
kfree(alink);
return err;
@@ -342,7 +343,7 @@ static void nfp_abm_vnic_free(struct nfp_app *app, struct 
nfp_net *nn)
struct nfp_abm_link *alink = nn->app_priv;
 
nfp_abm_kill_reprs(alink->abm, alink);
-   kvfree(alink->qdiscs);
+   kvfree(alink->red_qdiscs);
kfree(alink);
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h 
b/drivers/net/ethernet/netronome/nfp/abm/main.h
index 3774c063e419..a09090004f82 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.h
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.h
@@ -82,7 +82,7 @@ struct nfp_red_qdisc {
  * @total_queues:  number of PF queues
  * @parent:handle of expected parent, i.e. handle of MQ, or TC_H_ROOT
  * @num_qdiscs:number of currently used qdiscs
- * @qdiscs:array of qdiscs
+ * @red_qdiscs:array of qdiscs
  */
 struct nfp_abm_link {
struct nfp_abm *abm;
@@ -92,7 +92,7 @@ struct nfp_abm_link {
unsigned int total_queues;
u32 parent;
unsigned int num_qdiscs;
-   struct nfp_red_qdisc *qdiscs;
+   struct nfp_red_qdisc *red_qdiscs;
 };
 
 int nfp_abm_setup_tc_red(struct net_device *netdev, struct nfp_abm_link *alink,
diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c 
b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index bb05f9ee0401..abda392880e0 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -18,7 +18,8 @@ __nfp_abm_reset_root(struct net_device *netdev, struct 
nfp_abm_link *alink,
int ret;
 
ret = nfp_abm_ctrl_set_all_q_lvls(alink, init_val);
-   memset(alink->qdiscs, 0, sizeof(*alink->qdiscs) * alink->num_qdiscs);
+   memset(alink->red_qdiscs, 0,
+  sizeof(*alink->red_qdiscs) * alink->num_qdiscs);
 
alink->parent = handle;
alink->num_qdiscs = qs;
@@ -46,7 +47,8 @@ nfp_abm_red_find(struct nfp_abm_link *alink, struct 
tc_red_qopt_offload *opt)
else
return -EOPNOTSUPP;
 
-   if (i >= alink->num_qdiscs || opt->handle != alink->qdiscs[i].handle)
+   if (i >= alink->num_qdiscs ||
+   opt->handle != alink->red_qdiscs[i].handle)
return -EOPNOTSUPP;
 
return i;
@@ -59,7 +61,7 @@ nfp_abm_red_destroy(struct net_device *netdev, struct 
nfp_abm_link *alink,
unsigned int i;
 
for (i = 0; i < alink->num_qdiscs; i++)
-   if (handle == alink->qdiscs[i].handle)
+   if (handle == alink->red_qdiscs[i].handle)
break;
if (i == alink->num_qdiscs)
return;
@@ -68,7 +70,7 @@ nfp_abm_red_destroy(struct net_device *netdev, struct 
nfp_abm_link *alink,
nfp_abm_reset_root(netdev, alink, TC_H_ROOT, 0);
} else {
nfp_abm_ctrl_set_q_lvl(alink, i, NFP_ABM_LVL_INFINITY);
-   memset(>qdiscs[i], 0, sizeof(*alink->qdiscs));
+   memset(>red_qdiscs[i], 0, sizeof(*alink->red_qdiscs));
}
 }
 
@@ -139,37 +141,39 @@ nfp_abm_red_replace(struct net_device *netdev, struct 
nfp_abm_link *alink,
return -EINVAL;
}
/* Set the handle to try full clean up, in case IO failed */
-   alink->qdiscs[i].handle = opt->handle;
+   alink->red_qdiscs[i].handle = opt->handle;
if (err)
goto err_destroy;
 
if (opt->parent == TC_H_ROOT)
-   

[PATCH net-next 05/13] nfp: abm: remember which Qdisc is root

2018-11-12 Thread Jakub Kicinski
Keep track of which Qdisc is currently root.  We need to implement
TC_SETUP_ROOT_QDISC handling, and for completeness also clear the
root Qdisc pointer when it's freed.  TC_SETUP_ROOT_QDISC isn't always
sent when device is dismantled.

Remembering the root Qdisc will allow us to build the entire hierarchy
in following patches.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 drivers/net/ethernet/netronome/nfp/abm/main.c  |  2 ++
 drivers/net/ethernet/netronome/nfp/abm/main.h  |  4 
 drivers/net/ethernet/netronome/nfp/abm/qdisc.c | 13 +
 3 files changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c 
b/drivers/net/ethernet/netronome/nfp/abm/main.c
index 35f3a6054569..da5394886a78 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -37,6 +37,8 @@ nfp_abm_setup_tc(struct nfp_app *app, struct net_device 
*netdev,
return -EOPNOTSUPP;
 
switch (type) {
+   case TC_SETUP_ROOT_QDISC:
+   return nfp_abm_setup_root(netdev, repr->app_priv, type_data);
case TC_SETUP_QDISC_MQ:
return nfp_abm_setup_tc_mq(netdev, repr->app_priv, type_data);
case TC_SETUP_QDISC_RED:
diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h 
b/drivers/net/ethernet/netronome/nfp/abm/main.h
index 64cb5ebcf80e..48d519989886 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.h
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.h
@@ -116,6 +116,7 @@ struct nfp_red_qdisc {
  * @parent:handle of expected parent, i.e. handle of MQ, or TC_H_ROOT
  * @num_qdiscs:number of currently used qdiscs
  * @red_qdiscs:array of qdiscs
+ * @root_qdisc:pointer to the current root of the Qdisc hierarchy
  * @qdiscs:all qdiscs recorded by major part of the handle
  */
 struct nfp_abm_link {
@@ -127,9 +128,12 @@ struct nfp_abm_link {
u32 parent;
unsigned int num_qdiscs;
struct nfp_red_qdisc *red_qdiscs;
+   struct nfp_qdisc *root_qdisc;
struct radix_tree_root qdiscs;
 };
 
+int nfp_abm_setup_root(struct net_device *netdev, struct nfp_abm_link *alink,
+  struct tc_root_qopt_offload *opt);
 int nfp_abm_setup_tc_red(struct net_device *netdev, struct nfp_abm_link *alink,
 struct tc_red_qopt_offload *opt);
 int nfp_abm_setup_tc_mq(struct net_device *netdev, struct nfp_abm_link *alink,
diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c 
b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index a6f95924656d..ba6ce2d1eda2 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -143,6 +143,9 @@ nfp_abm_qdisc_destroy(struct net_device *netdev, struct 
nfp_abm_link *alink,
return;
 
nfp_abm_qdisc_free(netdev, alink, qdisc);
+
+   if (alink->root_qdisc == qdisc)
+   alink->root_qdisc = NULL;
 }
 
 static void
@@ -450,3 +453,13 @@ int nfp_abm_setup_tc_mq(struct net_device *netdev, struct 
nfp_abm_link *alink,
return -EOPNOTSUPP;
}
 }
+
+int nfp_abm_setup_root(struct net_device *netdev, struct nfp_abm_link *alink,
+  struct tc_root_qopt_offload *opt)
+{
+   if (opt->ingress)
+   return -EOPNOTSUPP;
+   alink->root_qdisc = nfp_abm_qdisc_find(alink, opt->handle);
+
+   return 0;
+}
-- 
2.17.1



[PATCH net-next 10/13] net: sched: red: notify drivers about RED's limit parameter

2018-11-12 Thread Jakub Kicinski
RED qdisc's limit parameter changes the behaviour of the qdisc,
for instance if it's set to 0 qdisc will drop all the packets.

When replace operation happens and parameter is set to non-0
a new fifo qdisc will be instantiated and replace the old child
qdisc which will be destroyed.

Drivers need to know the parameter, even if they don't impose
the actual limit to be able to reliably reconstruct the Qdisc
hierarchy.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 include/net/pkt_cls.h | 1 +
 net/sched/sch_red.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 5d31820b7e80..c497ada7f591 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -850,6 +850,7 @@ struct tc_red_qopt_offload_params {
u32 min;
u32 max;
u32 probability;
+   u32 limit;
bool is_ecn;
bool is_harddrop;
struct gnet_stats_queue *qstats;
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 4b5ca172ee2d..9df9942340ea 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -166,6 +166,7 @@ static int red_offload(struct Qdisc *sch, bool enable)
opt.set.min = q->parms.qth_min >> q->parms.Wlog;
opt.set.max = q->parms.qth_max >> q->parms.Wlog;
opt.set.probability = q->parms.max_P;
+   opt.set.limit = q->limit;
opt.set.is_ecn = red_use_ecn(q);
opt.set.is_harddrop = red_use_harddrop(q);
opt.set.qstats = >qstats;
-- 
2.17.1



[PATCH net-next 07/13] net: sched: red: offload a graft notification

2018-11-12 Thread Jakub Kicinski
Drivers offloading Qdiscs should have reasonable certainty
the offloaded behaviour matches the SW path.  This is impossible
if the driver does not know about all Qdiscs or when Qdiscs move
and are reused.  Send a graft notification from RED.  The drivers
are expected to simply stop offloading the Qdisc, if a non-standard
child is ever grafted onto it.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 include/net/pkt_cls.h |  2 ++
 net/sched/sch_red.c   | 17 +
 2 files changed, 19 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index fa31d034231d..01f2802b7aee 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -834,6 +834,7 @@ enum tc_red_command {
TC_RED_DESTROY,
TC_RED_STATS,
TC_RED_XSTATS,
+   TC_RED_GRAFT,
 };
 
 struct tc_red_qopt_offload_params {
@@ -853,6 +854,7 @@ struct tc_red_qopt_offload {
struct tc_red_qopt_offload_params set;
struct tc_qopt_offload_stats stats;
struct red_stats *xstats;
+   u32 child_handle;
};
 };
 
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index a1d08bdd9357..4b5ca172ee2d 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -367,6 +367,21 @@ static int red_dump_class(struct Qdisc *sch, unsigned long 
cl,
return 0;
 }
 
+static void red_graft_offload(struct Qdisc *sch,
+ struct Qdisc *new, struct Qdisc *old,
+ struct netlink_ext_ack *extack)
+{
+   struct tc_red_qopt_offload graft_offload = {
+   .handle = sch->handle,
+   .parent = sch->parent,
+   .child_handle   = new->handle,
+   .command= TC_RED_GRAFT,
+   };
+
+   qdisc_offload_graft_helper(qdisc_dev(sch), sch, new, old,
+  TC_SETUP_QDISC_RED, _offload, extack);
+}
+
 static int red_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 struct Qdisc **old, struct netlink_ext_ack *extack)
 {
@@ -376,6 +391,8 @@ static int red_graft(struct Qdisc *sch, unsigned long arg, 
struct Qdisc *new,
new = _qdisc;
 
*old = qdisc_replace(sch, new, >qdisc);
+
+   red_graft_offload(sch, new, *old, extack);
return 0;
 }
 
-- 
2.17.1



[PATCH net-next 03/13] nfp: abm: track all offload-enabled qdiscs

2018-11-12 Thread Jakub Kicinski
Allocate an object corresponding to any offloaded qdisc we are
informed about by the kernel.  Not only the qdiscs we have a
chance of offloading.

The count of created objects will be used to decide whether
the ethtool TC offload can be disabled, since otherwise we may
miss destroy commands.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 drivers/net/ethernet/netronome/nfp/abm/main.c |   2 +
 drivers/net/ethernet/netronome/nfp/abm/main.h |  23 
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 111 +-
 3 files changed, 132 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c 
b/drivers/net/ethernet/netronome/nfp/abm/main.c
index c6ae0ac9a154..35f3a6054569 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -329,6 +329,7 @@ nfp_abm_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, 
unsigned int id)
 
nfp_abm_vnic_set_mac(app->pf, abm, nn, id);
nfp_abm_ctrl_read_params(alink);
+   INIT_RADIX_TREE(>qdiscs, GFP_KERNEL);
 
return 0;
 
@@ -344,6 +345,7 @@ static void nfp_abm_vnic_free(struct nfp_app *app, struct 
nfp_net *nn)
struct nfp_abm_link *alink = nn->app_priv;
 
nfp_abm_kill_reprs(alink->abm, alink);
+   WARN(!radix_tree_empty(>qdiscs), "left over qdiscs\n");
kvfree(alink->red_qdiscs);
kfree(alink);
 }
diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h 
b/drivers/net/ethernet/netronome/nfp/abm/main.h
index 15732ad7c202..64cb5ebcf80e 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.h
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.h
@@ -5,6 +5,7 @@
 #define __NFP_ABM_H__ 1
 
 #include 
+#include 
 #include 
 #include 
 
@@ -71,6 +72,26 @@ struct nfp_alink_xstats {
u64 pdrop;
 };
 
+enum nfp_qdisc_type {
+   NFP_QDISC_NONE = 0,
+   NFP_QDISC_MQ,
+   NFP_QDISC_RED,
+};
+
+/**
+ * struct nfp_qdisc - tracked TC Qdisc
+ * @netdev:netdev on which Qdisc was created
+ * @type:  Qdisc type
+ * @handle:handle of this Qdisc
+ * @parent_handle: handle of the parent (unreliable if Qdisc was grafted)
+ */
+struct nfp_qdisc {
+   struct net_device *netdev;
+   enum nfp_qdisc_type type;
+   u32 handle;
+   u32 parent_handle;
+};
+
 /**
  * struct nfp_red_qdisc - representation of single RED Qdisc
  * @handle:handle of currently offloaded RED Qdisc
@@ -95,6 +116,7 @@ struct nfp_red_qdisc {
  * @parent:handle of expected parent, i.e. handle of MQ, or TC_H_ROOT
  * @num_qdiscs:number of currently used qdiscs
  * @red_qdiscs:array of qdiscs
+ * @qdiscs:all qdiscs recorded by major part of the handle
  */
 struct nfp_abm_link {
struct nfp_abm *abm;
@@ -105,6 +127,7 @@ struct nfp_abm_link {
u32 parent;
unsigned int num_qdiscs;
struct nfp_red_qdisc *red_qdiscs;
+   struct radix_tree_root qdiscs;
 };
 
 int nfp_abm_setup_tc_red(struct net_device *netdev, struct nfp_abm_link *alink,
diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c 
b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index abb0a24c7fac..a6f95924656d 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -63,17 +63,97 @@ static void nfp_abm_offload_update(struct nfp_abm *abm)
 }
 
 static void
-__nfp_abm_reset_root(struct net_device *netdev, struct nfp_abm_link *alink,
-u32 handle, unsigned int qs, u32 init_val)
+nfp_abm_qdisc_free(struct net_device *netdev, struct nfp_abm_link *alink,
+  struct nfp_qdisc *qdisc)
 {
struct nfp_port *port = nfp_port_from_netdev(netdev);
 
+   if (!qdisc)
+   return;
+   WARN_ON(radix_tree_delete(>qdiscs,
+ TC_H_MAJ(qdisc->handle)) != qdisc);
+   kfree(qdisc);
+
+   port->tc_offload_cnt--;
+}
+
+static struct nfp_qdisc *
+nfp_abm_qdisc_alloc(struct net_device *netdev, struct nfp_abm_link *alink,
+   enum nfp_qdisc_type type, u32 parent_handle, u32 handle)
+{
+   struct nfp_port *port = nfp_port_from_netdev(netdev);
+   struct nfp_qdisc *qdisc;
+   int err;
+
+   qdisc = kzalloc(sizeof(*qdisc), GFP_KERNEL);
+   if (!qdisc)
+   return NULL;
+
+   qdisc->netdev = netdev;
+   qdisc->type = type;
+   qdisc->parent_handle = parent_handle;
+   qdisc->handle = handle;
+
+   err = radix_tree_insert(>qdiscs, TC_H_MAJ(qdisc->handle), qdisc);
+   if (err) {
+   nfp_err(alink->abm->app->cpp,
+   "Qdisc insertion into radix tree failed: %d\n", err);
+   goto err_free_qdisc;
+   }
+
+   port->tc_offload_cnt++;
+   return qdisc;
+
+err_free_qdisc:
+   kfree(qdisc);
+   return NULL;
+}
+
+static struct nfp_qdisc *
+nfp_abm_qdisc_find(struct nfp_abm_link *alink, u32 handle)
+{
+   return 

[PATCH net-next 04/13] net: sched: provide notification for graft on root

2018-11-12 Thread Jakub Kicinski
Drivers are currently not notified when a Qdisc is grafted as root.
This requires special casing Qdiscs added with parent = TC_H_ROOT in
the driver.  Also there is no notification sent to the driver when
an existing Qdisc is grafted as root.

Add this very simple notifications, drivers should now be able to
track their Qdisc tree fully.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 include/linux/netdevice.h |  1 +
 include/net/pkt_cls.h | 10 ++
 net/sched/sch_api.c   | 17 +
 3 files changed, 28 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 487fa5e0e165..97b4233120e4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -845,6 +845,7 @@ enum tc_setup_type {
TC_SETUP_QDISC_PRIO,
TC_SETUP_QDISC_MQ,
TC_SETUP_QDISC_ETF,
+   TC_SETUP_ROOT_QDISC,
 };
 
 /* These structures hold the attributes of bpf state that are being passed
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f6c0cd29dea4..fa31d034231d 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -889,4 +889,14 @@ struct tc_prio_qopt_offload {
};
 };
 
+enum tc_root_command {
+   TC_ROOT_GRAFT,
+};
+
+struct tc_root_qopt_offload {
+   enum tc_root_command command;
+   u32 handle;
+   bool ingress;
+};
+
 #endif
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f55bc50cd0a9..9c88cec7e8a2 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -860,6 +860,21 @@ void qdisc_offload_graft_helper(struct net_device *dev, 
struct Qdisc *sch,
 }
 EXPORT_SYMBOL(qdisc_offload_graft_helper);
 
+static void qdisc_offload_graft_root(struct net_device *dev,
+struct Qdisc *new, struct Qdisc *old,
+struct netlink_ext_ack *extack)
+{
+   struct tc_root_qopt_offload graft_offload = {
+   .command= TC_ROOT_GRAFT,
+   .handle = new ? new->handle : 0,
+   .ingress= (new && new->flags & TCQ_F_INGRESS) ||
+ (old && old->flags & TCQ_F_INGRESS),
+   };
+
+   qdisc_offload_graft_helper(dev, NULL, new, old,
+  TC_SETUP_ROOT_QDISC, _offload, extack);
+}
+
 static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 u32 portid, u32 seq, u16 flags, int event)
 {
@@ -1026,6 +1041,8 @@ static int qdisc_graft(struct net_device *dev, struct 
Qdisc *parent,
if (dev->flags & IFF_UP)
dev_deactivate(dev);
 
+   qdisc_offload_graft_root(dev, new, old, extack);
+
if (new && new->ops->attach)
goto skip;
 
-- 
2.17.1



[PATCH net-next 06/13] nfp: abm: allocate Qdisc child table

2018-11-12 Thread Jakub Kicinski
To keep track of Qdisc hierarchy allocate a table for children
for each Qdisc.  RED Qdisc can only have one child.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 drivers/net/ethernet/netronome/nfp/abm/main.h |  6 +
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 25 +--
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h 
b/drivers/net/ethernet/netronome/nfp/abm/main.h
index 48d519989886..adffa36981e0 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.h
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.h
@@ -84,12 +84,18 @@ enum nfp_qdisc_type {
  * @type:  Qdisc type
  * @handle:handle of this Qdisc
  * @parent_handle: handle of the parent (unreliable if Qdisc was grafted)
+ * @use_cnt:   number of attachment points in the hierarchy
+ * @num_children:  current size of the @children array
+ * @children:  pointers to children
  */
 struct nfp_qdisc {
struct net_device *netdev;
enum nfp_qdisc_type type;
u32 handle;
u32 parent_handle;
+   unsigned int use_cnt;
+   unsigned int num_children;
+   struct nfp_qdisc **children;
 };
 
 /**
diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c 
b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index ba6ce2d1eda2..3ecb63060429 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -72,6 +72,8 @@ nfp_abm_qdisc_free(struct net_device *netdev, struct 
nfp_abm_link *alink,
return;
WARN_ON(radix_tree_delete(>qdiscs,
  TC_H_MAJ(qdisc->handle)) != qdisc);
+
+   kfree(qdisc->children);
kfree(qdisc);
 
port->tc_offload_cnt--;
@@ -79,7 +81,8 @@ nfp_abm_qdisc_free(struct net_device *netdev, struct 
nfp_abm_link *alink,
 
 static struct nfp_qdisc *
 nfp_abm_qdisc_alloc(struct net_device *netdev, struct nfp_abm_link *alink,
-   enum nfp_qdisc_type type, u32 parent_handle, u32 handle)
+   enum nfp_qdisc_type type, u32 parent_handle, u32 handle,
+   unsigned int children)
 {
struct nfp_port *port = nfp_port_from_netdev(netdev);
struct nfp_qdisc *qdisc;
@@ -89,21 +92,28 @@ nfp_abm_qdisc_alloc(struct net_device *netdev, struct 
nfp_abm_link *alink,
if (!qdisc)
return NULL;
 
+   qdisc->children = kcalloc(children, sizeof(void *), GFP_KERNEL);
+   if (!qdisc->children)
+   goto err_free_qdisc;
+
qdisc->netdev = netdev;
qdisc->type = type;
qdisc->parent_handle = parent_handle;
qdisc->handle = handle;
+   qdisc->num_children = children;
 
err = radix_tree_insert(>qdiscs, TC_H_MAJ(qdisc->handle), qdisc);
if (err) {
nfp_err(alink->abm->app->cpp,
"Qdisc insertion into radix tree failed: %d\n", err);
-   goto err_free_qdisc;
+   goto err_free_child_tbl;
}
 
port->tc_offload_cnt++;
return qdisc;
 
+err_free_child_tbl:
+   kfree(qdisc->children);
 err_free_qdisc:
kfree(qdisc);
return NULL;
@@ -118,7 +128,7 @@ nfp_abm_qdisc_find(struct nfp_abm_link *alink, u32 handle)
 static int
 nfp_abm_qdisc_replace(struct net_device *netdev, struct nfp_abm_link *alink,
  enum nfp_qdisc_type type, u32 parent_handle, u32 handle,
- struct nfp_qdisc **qdisc)
+ unsigned int children, struct nfp_qdisc **qdisc)
 {
*qdisc = nfp_abm_qdisc_find(alink, handle);
if (*qdisc) {
@@ -127,8 +137,8 @@ nfp_abm_qdisc_replace(struct net_device *netdev, struct 
nfp_abm_link *alink,
return 0;
}
 
-   *qdisc = nfp_abm_qdisc_alloc(netdev, alink, type, parent_handle,
-handle);
+   *qdisc = nfp_abm_qdisc_alloc(netdev, alink, type, parent_handle, handle,
+children);
return *qdisc ? 0 : -ENOMEM;
 }
 
@@ -248,7 +258,7 @@ nfp_abm_red_replace(struct net_device *netdev, struct 
nfp_abm_link *alink,
int ret;
 
ret = nfp_abm_qdisc_replace(netdev, alink, NFP_QDISC_RED, opt->parent,
-   opt->handle, );
+   opt->handle, 1, );
 
i = nfp_abm_red_find(alink, opt);
existing = i >= 0;
@@ -428,7 +438,8 @@ nfp_abm_mq_create(struct net_device *netdev, struct 
nfp_abm_link *alink,
struct nfp_qdisc *qdisc;
 
return nfp_abm_qdisc_replace(netdev, alink, NFP_QDISC_MQ,
-TC_H_ROOT, opt->handle, );
+TC_H_ROOT, opt->handle,
+alink->total_queues, );
 }
 
 int nfp_abm_setup_tc_mq(struct net_device *netdev, struct nfp_abm_link *alink,
-- 
2.17.1



[PATCH net-next 12/13] nfp: abm: save RED's parameters

2018-11-12 Thread Jakub Kicinski
Use the new driver Qdisc structure to keep track of parameters
of RED Qdiscs.  This way as the Qdisc moves around in the hierarchy
we will be able to configure the HW appropriately.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 drivers/net/ethernet/netronome/nfp/abm/main.h  | 14 ++
 drivers/net/ethernet/netronome/nfp/abm/qdisc.c |  5 -
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h 
b/drivers/net/ethernet/netronome/nfp/abm/main.h
index daca93e90099..d0d85f82bd1c 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.h
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.h
@@ -89,6 +89,11 @@ enum nfp_qdisc_type {
  * @use_cnt:   number of attachment points in the hierarchy
  * @num_children:  current size of the @children array
  * @children:  pointers to children
+ *
+ * @params_ok: parameters of this Qdisc are OK for offload
+ *
+ * @red:   RED Qdisc specific parameters and state
+ * @red.threshold: ECN marking threshold
  */
 struct nfp_qdisc {
struct net_device *netdev;
@@ -98,6 +103,15 @@ struct nfp_qdisc {
unsigned int use_cnt;
unsigned int num_children;
struct nfp_qdisc **children;
+
+   bool params_ok;
+
+   union {
+   /* TC_SETUP_QDISC_RED */
+   struct {
+   u32 threshold;
+   } red;
+   };
 };
 
 /**
diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c 
b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index 1b3c0b5b52bf..fb68038ec1da 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -375,7 +375,10 @@ nfp_abm_red_replace(struct net_device *netdev, struct 
nfp_abm_link *alink,
qdisc->children[0] = NFP_QDISC_UNTRACKED;
}
 
-   if (!nfp_abm_red_check_params(alink, opt)) {
+   qdisc->params_ok = nfp_abm_red_check_params(alink, opt);
+   if (qdisc->params_ok) {
+   qdisc->red.threshold = opt->set.min;
+   } else {
err = -EINVAL;
goto err_destroy;
}
-- 
2.17.1



[PATCH net-next 00/13] nfp: abm: track all Qdiscs

2018-11-12 Thread Jakub Kicinski
Hi!

Our Qdisc offload so far has been very simplistic.  We held
and array of marking thresholds and statistics sized to the
number of PF queues.  This was sufficient since the only
configuration we supported was single layer of RED Qdiscs
(on top of MQ or not, but MQ isn't really about queuing).

As we move to add more Qdiscs it's time to actually try to
track the full Qdisc hierarchy.  This allows us to make sure
our offloaded configuration reflects the SW path better.
We add graft notifications to MQ and RED (PRIO already sends
them) to allow drivers offloading those to learn how Qdiscs
are linked.  MQ graft gives us the obvious advantage of being
able to track when Qdiscs are shared or moved.  It seems
unlikely HW would offload RED's child Qdiscs but since the
behaviour would change based on linked child we should
stop offloading REDs with modified child.  RED will also
handle the child differently during reconfig when limit
parameter is set - so we have to inform the drivers about
the limit, and have them reset the child state when
appropriate.

The NFP driver will now allocate a structure to track each
Qdisc and link it to its children.  We will also maintain
a shadow copy of threshold settings - to save device writes
and make it easier to apply defaults when config is
re-evaluated.

Jakub Kicinski (13):
  nfp: abm: rename qdiscs -> red_qdiscs
  nfp: abm: keep track of all RED thresholds
  nfp: abm: track all offload-enabled qdiscs
  net: sched: provide notification for graft on root
  nfp: abm: remember which Qdisc is root
  nfp: abm: allocate Qdisc child table
  net: sched: red: offload a graft notification
  net: sched: mq: offload a graft notification
  nfp: abm: build full Qdisc hierarchy based on graft notifications
  net: sched: red: notify drivers about RED's limit parameter
  nfp: abm: reset RED's child based on limit
  nfp: abm: save RED's parameters
  nfp: abm: restructure Qdisc handling

 drivers/net/ethernet/netronome/nfp/abm/ctrl.c | 105 +--
 drivers/net/ethernet/netronome/nfp/abm/main.c |  42 +-
 drivers/net/ethernet/netronome/nfp/abm/main.h | 108 ++-
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 712 +-
 include/linux/netdevice.h |   1 +
 include/net/pkt_cls.h |  24 +-
 net/sched/sch_api.c   |  17 +
 net/sched/sch_mq.c|   9 +
 net/sched/sch_red.c   |  18 +
 9 files changed, 740 insertions(+), 296 deletions(-)

-- 
2.17.1



[PATCH net-next 02/13] nfp: abm: keep track of all RED thresholds

2018-11-12 Thread Jakub Kicinski
Instead of writing the threshold out when Qdisc is configured
and not remembering it move to a scheme where we remember all
thresholds.  When configuration changes parse the offloaded
Qdiscs and set thresholds appropriately.

This will help future extensions.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 drivers/net/ethernet/netronome/nfp/abm/ctrl.c | 32 +++
 drivers/net/ethernet/netronome/nfp/abm/main.c | 25 +-
 drivers/net/ethernet/netronome/nfp/abm/main.h | 16 +++-
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 88 ++-
 4 files changed, 120 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c 
b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
index 3c661f422688..564427e8a6e8 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
@@ -71,35 +71,37 @@ nfp_abm_ctrl_stat_all(struct nfp_abm_link *alink, const 
struct nfp_rtsym *sym,
return 0;
 }
 
-int nfp_abm_ctrl_set_q_lvl(struct nfp_abm_link *alink, unsigned int i, u32 val)
+int __nfp_abm_ctrl_set_q_lvl(struct nfp_abm *abm, unsigned int id, u32 val)
 {
-   struct nfp_cpp *cpp = alink->abm->app->cpp;
+   struct nfp_cpp *cpp = abm->app->cpp;
u64 sym_offset;
int err;
 
-   sym_offset = (alink->queue_base + i) * NFP_QLVL_STRIDE + NFP_QLVL_THRS;
-   err = __nfp_rtsym_writel(cpp, alink->abm->q_lvls, 4, 0,
-sym_offset, val);
+   __clear_bit(id, abm->threshold_undef);
+   if (abm->thresholds[id] == val)
+   return 0;
+
+   sym_offset = id * NFP_QLVL_STRIDE + NFP_QLVL_THRS;
+   err = __nfp_rtsym_writel(cpp, abm->q_lvls, 4, 0, sym_offset, val);
if (err) {
-   nfp_err(cpp, "RED offload setting level failed on vNIC %d queue 
%d\n",
-   alink->id, i);
+   nfp_err(cpp,
+   "RED offload setting level failed on subqueue %d\n",
+   id);
return err;
}
 
+   abm->thresholds[id] = val;
return 0;
 }
 
-int nfp_abm_ctrl_set_all_q_lvls(struct nfp_abm_link *alink, u32 val)
+int nfp_abm_ctrl_set_q_lvl(struct nfp_abm_link *alink, unsigned int queue,
+  u32 val)
 {
-   int i, err;
+   unsigned int threshold;
 
-   for (i = 0; i < alink->vnic->max_rx_rings; i++) {
-   err = nfp_abm_ctrl_set_q_lvl(alink, i, val);
-   if (err)
-   return err;
-   }
+   threshold = alink->queue_base + queue;
 
-   return 0;
+   return __nfp_abm_ctrl_set_q_lvl(alink->abm, threshold, val);
 }
 
 u64 nfp_abm_ctrl_stat_non_sto(struct nfp_abm_link *alink, unsigned int i)
diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c 
b/drivers/net/ethernet/netronome/nfp/abm/main.c
index 5e749602989e..c6ae0ac9a154 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -2,6 +2,7 @@
 /* Copyright (C) 2018 Netronome Systems, Inc. */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -399,6 +400,7 @@ static int nfp_abm_init(struct nfp_app *app)
struct nfp_pf *pf = app->pf;
struct nfp_reprs *reprs;
struct nfp_abm *abm;
+   unsigned int i;
int err;
 
if (!pf->eth_tbl) {
@@ -425,15 +427,28 @@ static int nfp_abm_init(struct nfp_app *app)
if (err)
goto err_free_abm;
 
+   err = -ENOMEM;
+   abm->num_thresholds = NFP_NET_MAX_RX_RINGS;
+   abm->threshold_undef = bitmap_zalloc(abm->num_thresholds, GFP_KERNEL);
+   if (!abm->threshold_undef)
+   goto err_free_abm;
+
+   abm->thresholds = kvcalloc(abm->num_thresholds,
+  sizeof(*abm->thresholds), GFP_KERNEL);
+   if (!abm->thresholds)
+   goto err_free_thresh_umap;
+   for (i = 0; i < NFP_NET_MAX_RX_RINGS; i++)
+   __nfp_abm_ctrl_set_q_lvl(abm, i, NFP_ABM_LVL_INFINITY);
+
/* We start in legacy mode, make sure advanced queuing is disabled */
err = nfp_abm_ctrl_qm_disable(abm);
if (err)
-   goto err_free_abm;
+   goto err_free_thresh;
 
err = -ENOMEM;
reprs = nfp_reprs_alloc(pf->max_data_vnics);
if (!reprs)
-   goto err_free_abm;
+   goto err_free_thresh;
RCU_INIT_POINTER(app->reprs[NFP_REPR_TYPE_PHYS_PORT], reprs);
 
reprs = nfp_reprs_alloc(pf->max_data_vnics);
@@ -445,6 +460,10 @@ static int nfp_abm_init(struct nfp_app *app)
 
 err_free_phys:
nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_PHYS_PORT);
+err_free_thresh:
+   kvfree(abm->thresholds);
+err_free_thresh_umap:
+   bitmap_free(abm->threshold_undef);
 err_free_abm:
kfree(abm);
app->priv = NULL;
@@ -458,6 +477,8 @@ static void nfp_abm_clean(struct nfp_app *app)

[PATCH net-next 13/13] nfp: abm: restructure Qdisc handling

2018-11-12 Thread Jakub Kicinski
In preparation of handling more Qdisc types switch to a different
offload strategy.  We have now recreated the Qdisc hierarchy in
the driver.  Every time the hierarchy changes parse it, and update
the configuration of the HW accordingly.

While at it drop the support of pretending that we can instantiate
a single queue on a multi-queue device in HW/FW.  MQ is now required,
and each queue will have its own instance of RED.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 drivers/net/ethernet/netronome/nfp/abm/ctrl.c |  73 ---
 drivers/net/ethernet/netronome/nfp/abm/main.c |  14 +-
 drivers/net/ethernet/netronome/nfp/abm/main.h |  57 +-
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 548 ++
 4 files changed, 340 insertions(+), 352 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c 
b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
index 564427e8a6e8..1629b07f727b 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
@@ -50,27 +50,6 @@ nfp_abm_ctrl_stat(struct nfp_abm_link *alink, const struct 
nfp_rtsym *sym,
return 0;
 }
 
-static int
-nfp_abm_ctrl_stat_all(struct nfp_abm_link *alink, const struct nfp_rtsym *sym,
- unsigned int stride, unsigned int offset, bool is_u64,
- u64 *res)
-{
-   u64 val, sum = 0;
-   unsigned int i;
-   int err;
-
-   for (i = 0; i < alink->vnic->max_rx_rings; i++) {
-   err = nfp_abm_ctrl_stat(alink, sym, stride, offset, i,
-   is_u64, );
-   if (err)
-   return err;
-   sum += val;
-   }
-
-   *res = sum;
-   return 0;
-}
-
 int __nfp_abm_ctrl_set_q_lvl(struct nfp_abm *abm, unsigned int id, u32 val)
 {
struct nfp_cpp *cpp = abm->app->cpp;
@@ -155,42 +134,6 @@ int nfp_abm_ctrl_read_q_stats(struct nfp_abm_link *alink, 
unsigned int i,
 i, true, >overlimits);
 }
 
-int nfp_abm_ctrl_read_stats(struct nfp_abm_link *alink,
-   struct nfp_alink_stats *stats)
-{
-   u64 pkts = 0, bytes = 0;
-   int i, err;
-
-   for (i = 0; i < alink->vnic->max_rx_rings; i++) {
-   pkts += nn_readq(alink->vnic, NFP_NET_CFG_RXR_STATS(i));
-   bytes += nn_readq(alink->vnic, NFP_NET_CFG_RXR_STATS(i) + 8);
-   }
-   stats->tx_pkts = pkts;
-   stats->tx_bytes = bytes;
-
-   err = nfp_abm_ctrl_stat_all(alink, alink->abm->q_lvls,
-   NFP_QLVL_STRIDE, NFP_QLVL_BLOG_BYTES,
-   false, >backlog_bytes);
-   if (err)
-   return err;
-
-   err = nfp_abm_ctrl_stat_all(alink, alink->abm->q_lvls,
-   NFP_QLVL_STRIDE, NFP_QLVL_BLOG_PKTS,
-   false, >backlog_pkts);
-   if (err)
-   return err;
-
-   err = nfp_abm_ctrl_stat_all(alink, alink->abm->qm_stats,
-   NFP_QMSTAT_STRIDE, NFP_QMSTAT_DROP,
-   true, >drops);
-   if (err)
-   return err;
-
-   return nfp_abm_ctrl_stat_all(alink, alink->abm->qm_stats,
-NFP_QMSTAT_STRIDE, NFP_QMSTAT_ECN,
-true, >overlimits);
-}
-
 int nfp_abm_ctrl_read_q_xstats(struct nfp_abm_link *alink, unsigned int i,
   struct nfp_alink_xstats *xstats)
 {
@@ -207,22 +150,6 @@ int nfp_abm_ctrl_read_q_xstats(struct nfp_abm_link *alink, 
unsigned int i,
 i, true, >ecn_marked);
 }
 
-int nfp_abm_ctrl_read_xstats(struct nfp_abm_link *alink,
-struct nfp_alink_xstats *xstats)
-{
-   int err;
-
-   err = nfp_abm_ctrl_stat_all(alink, alink->abm->qm_stats,
-   NFP_QMSTAT_STRIDE, NFP_QMSTAT_DROP,
-   true, >pdrop);
-   if (err)
-   return err;
-
-   return nfp_abm_ctrl_stat_all(alink, alink->abm->qm_stats,
-NFP_QMSTAT_STRIDE, NFP_QMSTAT_ECN,
-true, >ecn_marked);
-}
-
 int nfp_abm_ctrl_qm_enable(struct nfp_abm *abm)
 {
return nfp_mbox_cmd(abm->app->pf, NFP_MBOX_PCIE_ABM_ENABLE,
diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c 
b/drivers/net/ethernet/netronome/nfp/abm/main.c
index da5394886a78..a5732d3bd1b7 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "../nfpcore/nfp.h"
@@ -310,22 +311,14 @@ nfp_abm_vnic_alloc(struct nfp_app *app, struct nfp_net 
*nn, unsigned int id)
alink->abm = abm;
alink->vnic = nn;
alink->id = id;
-   alink->parent = TC_H_ROOT;

[PATCH net-next 11/13] nfp: abm: reset RED's child based on limit

2018-11-12 Thread Jakub Kicinski
RED qdisc will replace its child Qdisc with a new FIFO queue if
it is reconfigured and the limit parameter is not 0.

This means that when it's created with limit of 0 it will have no FIFO,
and all packets will be dropped.  If it's changed and limit is specified
it will loose its existing child (implicit graft).  Make sure we mark
RED Qdisc child as NFP_QDISC_UNTRACKED if its not the expected FIFO.

nfp_abm_qdisc_replace() will return 1 if Qdisc already existed.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 27 +++
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c 
b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index 151d2dafbc76..1b3c0b5b52bf 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -196,7 +196,7 @@ nfp_abm_qdisc_replace(struct net_device *netdev, struct 
nfp_abm_link *alink,
if (*qdisc) {
if (WARN_ON((*qdisc)->type != type))
return -EINVAL;
-   return 0;
+   return 1;
}
 
*qdisc = nfp_abm_qdisc_alloc(netdev, alink, type, parent_handle, handle,
@@ -357,11 +357,24 @@ nfp_abm_red_replace(struct net_device *netdev, struct 
nfp_abm_link *alink,
i = nfp_abm_red_find(alink, opt);
existing = i >= 0;
 
-   if (ret) {
+   if (ret < 0) {
err = ret;
goto err_destroy;
}
 
+   /* If limit != 0 child gets reset */
+   if (opt->set.limit) {
+   if (nfp_abm_qdisc_child_valid(qdisc, 0))
+   qdisc->children[0]->use_cnt--;
+   qdisc->children[0] = NULL;
+   } else {
+   /* Qdisc was just allocated without a limit will use noop_qdisc,
+* i.e. a block hole.
+*/
+   if (!ret)
+   qdisc->children[0] = NFP_QDISC_UNTRACKED;
+   }
+
if (!nfp_abm_red_check_params(alink, opt)) {
err = -EINVAL;
goto err_destroy;
@@ -533,10 +546,14 @@ nfp_abm_mq_create(struct net_device *netdev, struct 
nfp_abm_link *alink,
  struct tc_mq_qopt_offload *opt)
 {
struct nfp_qdisc *qdisc;
+   int ret;
 
-   return nfp_abm_qdisc_replace(netdev, alink, NFP_QDISC_MQ,
-TC_H_ROOT, opt->handle,
-alink->total_queues, );
+   ret = nfp_abm_qdisc_replace(netdev, alink, NFP_QDISC_MQ,
+   TC_H_ROOT, opt->handle, alink->total_queues,
+   );
+   if (ret < 0)
+   return ret;
+   return 0;
 }
 
 int nfp_abm_setup_tc_mq(struct net_device *netdev, struct nfp_abm_link *alink,
-- 
2.17.1



[PATCH net-next 08/13] net: sched: mq: offload a graft notification

2018-11-12 Thread Jakub Kicinski
Drivers offloading Qdiscs should have reasonable certainty
the offloaded behaviour matches the SW path.  This is impossible
if the driver does not know about all Qdiscs or when Qdiscs move
and are reused.  Send a graft notification from MQ.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 include/net/pkt_cls.h | 11 ++-
 net/sched/sch_mq.c|  9 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 01f2802b7aee..5d31820b7e80 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -821,12 +821,21 @@ enum tc_mq_command {
TC_MQ_CREATE,
TC_MQ_DESTROY,
TC_MQ_STATS,
+   TC_MQ_GRAFT,
+};
+
+struct tc_mq_opt_offload_graft_params {
+   unsigned long queue;
+   u32 child_handle;
 };
 
 struct tc_mq_qopt_offload {
enum tc_mq_command command;
u32 handle;
-   struct tc_qopt_offload_stats stats;
+   union {
+   struct tc_qopt_offload_stats stats;
+   struct tc_mq_opt_offload_graft_params graft_params;
+   };
 };
 
 enum tc_red_command {
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 1db5c1bf6ddd..203659bc3906 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -193,6 +193,7 @@ static int mq_graft(struct Qdisc *sch, unsigned long cl, 
struct Qdisc *new,
struct Qdisc **old, struct netlink_ext_ack *extack)
 {
struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
+   struct tc_mq_qopt_offload graft_offload;
struct net_device *dev = qdisc_dev(sch);
 
if (dev->flags & IFF_UP)
@@ -203,6 +204,14 @@ static int mq_graft(struct Qdisc *sch, unsigned long cl, 
struct Qdisc *new,
new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
if (dev->flags & IFF_UP)
dev_activate(dev);
+
+   graft_offload.handle = sch->handle;
+   graft_offload.graft_params.queue = cl - 1;
+   graft_offload.graft_params.child_handle = new ? new->handle : 0;
+   graft_offload.command = TC_MQ_GRAFT;
+
+   qdisc_offload_graft_helper(qdisc_dev(sch), sch, new, *old,
+  TC_SETUP_QDISC_MQ, _offload, extack);
return 0;
 }
 
-- 
2.17.1



[PATCH net-next 09/13] nfp: abm: build full Qdisc hierarchy based on graft notifications

2018-11-12 Thread Jakub Kicinski
Using graft notifications recreate in the driver the full Qdisc
hierarchy.  Keep track of how many times each Qdisc is attached
to the hierarchy to make sure we don't offload Qdiscs which are
attached multiple times (device queues can't be shared).  For
graft events of Qdiscs we don't know exist make the child as
invalid/untracked.

Note that MQ Qdisc doesn't send destruction events reliably when
device is dismantled, so we need to manually clean out the
children otherwise we'd think Qdiscs which are still in use
are getting freed.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 drivers/net/ethernet/netronome/nfp/abm/main.h |   2 +
 .../net/ethernet/netronome/nfp/abm/qdisc.c| 105 ++
 2 files changed, 107 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h 
b/drivers/net/ethernet/netronome/nfp/abm/main.h
index adffa36981e0..daca93e90099 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.h
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.h
@@ -78,6 +78,8 @@ enum nfp_qdisc_type {
NFP_QDISC_RED,
 };
 
+#define NFP_QDISC_UNTRACKED((struct nfp_qdisc *)1UL)
+
 /**
  * struct nfp_qdisc - tracked TC Qdisc
  * @netdev:netdev on which Qdisc was created
diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c 
b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index 3ecb63060429..151d2dafbc76 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
 /* Copyright (C) 2018 Netronome Systems, Inc. */
 
+#include 
 #include 
 #include 
 #include 
@@ -12,6 +13,66 @@
 #include "../nfp_port.h"
 #include "main.h"
 
+static bool nfp_abm_qdisc_child_valid(struct nfp_qdisc *qdisc, unsigned int id)
+{
+   return qdisc->children[id] &&
+  qdisc->children[id] != NFP_QDISC_UNTRACKED;
+}
+
+static void *nfp_abm_qdisc_tree_deref_slot(void __rcu **slot)
+{
+   return rtnl_dereference(*slot);
+}
+
+static void
+nfp_abm_qdisc_unlink_children(struct nfp_qdisc *qdisc,
+ unsigned int start, unsigned int end)
+{
+   unsigned int i;
+
+   for (i = start; i < end; i++)
+   if (nfp_abm_qdisc_child_valid(qdisc, i)) {
+   qdisc->children[i]->use_cnt--;
+   qdisc->children[i] = NULL;
+   }
+}
+
+static void
+nfp_abm_qdisc_clear_mq(struct net_device *netdev, struct nfp_abm_link *alink,
+  struct nfp_qdisc *qdisc)
+{
+   struct radix_tree_iter iter;
+   unsigned int mq_refs = 0;
+   void __rcu **slot;
+
+   if (!qdisc->use_cnt)
+   return;
+   /* MQ doesn't notify well on destruction, we need special handling of
+* MQ's children.
+*/
+   if (qdisc->type == NFP_QDISC_MQ &&
+   qdisc == alink->root_qdisc &&
+   netdev->reg_state == NETREG_UNREGISTERING)
+   return;
+
+   /* Count refs held by MQ instances and clear pointers */
+   radix_tree_for_each_slot(slot, >qdiscs, , 0) {
+   struct nfp_qdisc *mq = nfp_abm_qdisc_tree_deref_slot(slot);
+   unsigned int i;
+
+   if (mq->type != NFP_QDISC_MQ || mq->netdev != netdev)
+   continue;
+   for (i = 0; i < mq->num_children; i++)
+   if (mq->children[i] == qdisc) {
+   mq->children[i] = NULL;
+   mq_refs++;
+   }
+   }
+
+   WARN(qdisc->use_cnt != mq_refs, "non-zero qdisc use count: %d (- %d)\n",
+qdisc->use_cnt, mq_refs);
+}
+
 static void
 nfp_abm_offload_compile_red(struct nfp_abm_link *alink,
struct nfp_red_qdisc *qdisc, unsigned int queue)
@@ -70,6 +131,7 @@ nfp_abm_qdisc_free(struct net_device *netdev, struct 
nfp_abm_link *alink,
 
if (!qdisc)
return;
+   nfp_abm_qdisc_clear_mq(netdev, alink, qdisc);
WARN_ON(radix_tree_delete(>qdiscs,
  TC_H_MAJ(qdisc->handle)) != qdisc);
 
@@ -152,12 +214,44 @@ nfp_abm_qdisc_destroy(struct net_device *netdev, struct 
nfp_abm_link *alink,
if (!qdisc)
return;
 
+   /* We don't get TC_SETUP_ROOT_QDISC w/ MQ when netdev is unregistered */
+   if (alink->root_qdisc == qdisc)
+   qdisc->use_cnt--;
+
+   nfp_abm_qdisc_unlink_children(qdisc, 0, qdisc->num_children);
nfp_abm_qdisc_free(netdev, alink, qdisc);
 
if (alink->root_qdisc == qdisc)
alink->root_qdisc = NULL;
 }
 
+static int
+nfp_abm_qdisc_graft(struct nfp_abm_link *alink, u32 handle, u32 child_handle,
+   unsigned int id)
+{
+   struct nfp_qdisc *parent, *child;
+
+   parent = nfp_abm_qdisc_find(alink, handle);
+   if (!parent)
+   return 0;
+
+   if (WARN(id >= 

[Patch net-next v2] net: dump more useful information in netdev_rx_csum_fault()

2018-11-12 Thread Cong Wang
Currently netdev_rx_csum_fault() only shows a device name,
we need more information about the skb for debugging csum
failures.

Sample output:

 ens3: hw csum failure
 dev features: 0x00014b89
 skb len=84 data_len=0 pkt_type=0 gso_size=0 gso_type=0 nr_frags=0 ip_summed=0 
csum=0 csum_complete_sw=0 csum_valid=0 csum_level=0

Note, I use pr_err() just to be consistent with the existing one.

Signed-off-by: Cong Wang 
---
 include/linux/netdevice.h |  5 +++--
 net/core/datagram.c   |  2 +-
 net/core/dev.c| 11 +--
 net/core/skbuff.c |  4 ++--
 net/sunrpc/socklib.c  |  2 +-
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 487fa5e0e165..5a97ffbff932 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4342,9 +4342,10 @@ static inline bool 
can_checksum_protocol(netdev_features_t features,
 }
 
 #ifdef CONFIG_BUG
-void netdev_rx_csum_fault(struct net_device *dev);
+void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
 #else
-static inline void netdev_rx_csum_fault(struct net_device *dev)
+static inline void netdev_rx_csum_fault(struct net_device *dev,
+   struct sk_buff *skb)
 {
 }
 #endif
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 07983b90d2bd..4bf62b1afa3b 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -767,7 +767,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
-   netdev_rx_csum_fault(NULL);
+   netdev_rx_csum_fault(NULL, skb);
}
return 0;
 fault:
diff --git a/net/core/dev.c b/net/core/dev.c
index bf7e0a471186..5927f6a7c301 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3091,10 +3091,17 @@ EXPORT_SYMBOL(__skb_gso_segment);
 
 /* Take action when hardware reception checksum errors are detected. */
 #ifdef CONFIG_BUG
-void netdev_rx_csum_fault(struct net_device *dev)
+void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
 {
if (net_ratelimit()) {
pr_err("%s: hw csum failure\n", dev ? dev->name : "");
+   if (dev)
+   pr_err("dev features: %pNF\n", >features);
+   pr_err("skb len=%u data_len=%u pkt_type=%u gso_size=%u 
gso_type=%u nr_frags=%u ip_summed=%u csum=%x csum_complete_sw=%d csum_valid=%d 
csum_level=%u\n",
+  skb->len, skb->data_len, skb->pkt_type,
+  skb_shinfo(skb)->gso_size, skb_shinfo(skb)->gso_type,
+  skb_shinfo(skb)->nr_frags, skb->ip_summed, skb->csum,
+  skb->csum_complete_sw, skb->csum_valid, skb->csum_level);
dump_stack();
}
 }
@@ -5781,7 +5788,7 @@ __sum16 __skb_gro_checksum_complete(struct sk_buff *skb)
if (likely(!sum)) {
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
-   netdev_rx_csum_fault(skb->dev);
+   netdev_rx_csum_fault(skb->dev, skb);
}
 
NAPI_GRO_CB(skb)->csum = wsum;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 396fcb3baad0..fcb1155a00ec 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2653,7 +2653,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, 
int len)
if (likely(!sum)) {
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
-   netdev_rx_csum_fault(skb->dev);
+   netdev_rx_csum_fault(skb->dev, skb);
}
if (!skb_shared(skb))
skb->csum_valid = !sum;
@@ -2673,7 +2673,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
if (likely(!sum)) {
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
-   netdev_rx_csum_fault(skb->dev);
+   netdev_rx_csum_fault(skb->dev, skb);
}
 
if (!skb_shared(skb)) {
diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 9062967575c4..7e55cfc69697 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -175,7 +175,7 @@ int csum_partial_copy_to_xdr(struct xdr_buf *xdr, struct 
sk_buff *skb)
return -1;
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
-   netdev_rx_csum_fault(skb->dev);
+   netdev_rx_csum_fault(skb->dev, skb);
return 0;
 no_checksum:
if (xdr_partial_copy_from_skb(xdr, 0, , xdr_skb_read_bits) < 0)
-- 
2.19.1



Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-12 Thread Cong Wang
On Fri, Nov 9, 2018 at 8:16 PM David Miller  wrote:
>
> Didn't you move this function into net/core/skbuff.c? :-)

At the time when I created this patch, that patch didn't hit net-next yet. :)

>
> Please respin.

Sure. I will send v2, as I need to cleanup the mixed commas and
spaces too.

Thanks.


[iproute PATCH] man: ip-route.8: Document nexthop limit

2018-11-12 Thread Phil Sutter
Add a note to 'nexthop' description stating the maximum number of
nexthops per command and pointing at 'append' command as a workaround.

Signed-off-by: Phil Sutter 
---
 man/man8/ip-route.8.in | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index a33ce1f0f4006..383178c11331e 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -589,6 +589,13 @@ argument lists:
 route reflecting its relative bandwidth or quality.
 .in -8
 
+The internal buffer used in iproute2 limits the maximum number of nexthops to
+be specified in one go. If only a gateway address is given, the current buffer
+size allows for 144 IPv6 nexthops and 253 IPv4 ones. If more are required, they
+may be added to the existing route using
+.B "ip route append"
+command.
+
 .TP
 .BI scope " SCOPE_VAL"
 the scope of the destinations covered by the route prefix.
-- 
2.19.0



Re: [PATCH v5 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin

2018-11-12 Thread Stanislav Fomichev
On 11/12, Martin Lau wrote:
> On Fri, Nov 09, 2018 at 08:21:41AM -0800, Stanislav Fomichev wrote:
> [ ... ]
> > @@ -1918,23 +2160,20 @@ void *bpf_object__priv(struct bpf_object *obj)
> >  }
> >  
> >  static struct bpf_program *
> > -__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
> > +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i)
> >  {
> > -   size_t idx;
> > +   ssize_t idx;
> >  
> > if (!obj->programs)
> > return NULL;
> > -   /* First handler */
> > -   if (prev == NULL)
> > -   return >programs[0];
> >  
> > -   if (prev->obj != obj) {
> > +   if (p->obj != obj) {
> > pr_warning("error: program handler doesn't match object\n");
> > return NULL;
> > }
> >  
> > -   idx = (prev - obj->programs) + 1;
> > -   if (idx >= obj->nr_programs)
> > +   idx = (p - obj->programs) + i;
> > +   if (idx >= obj->nr_programs || idx < 0)
> > return NULL;
> > return >programs[idx];
> >  }
> > @@ -1944,8 +2183,29 @@ bpf_program__next(struct bpf_program *prev, struct 
> > bpf_object *obj)
> >  {
> > struct bpf_program *prog = prev;
> >  
> > +   if (prev == NULL)
> > +   return obj->programs;
> > +
> This patch breaks the behavior introduced in
> commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for 
> multi-function programs"):
> "Make bpf_program__next() skip over '.text' section if object file
>  has pseudo calls.  The '.text' section is hardly a program in that
>  case, it's more of a storage for code of functions other than main."
> 
> For example, the userspace could have been doing:
>   prog = bpf_program__next(NULL, obj);
>   bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
>   bpf_object__load(obj);
> 
> For the bpf_prog.o that has pseudo calls, after this patch in bpf-next,
> the prog returned by bpf_program__next() could be in ".text" instead of
> the main bpf program.  The next bpf_program__set_type() has
> no effect to the main program.  The following bpf_object__load()
> will catch user in surprise with the main bpf prog in
> the wrong BPF_PROG_TYPE.

Will something like the following fix your concern? (plus, assuming the
same for prev):

--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2216,8 +2216,11 @@ bpf_program__next(struct bpf_program *prev, struct 
bpf_object *obj)
 {
struct bpf_program *prog = prev;
 
-   if (prev == NULL)
-   return obj->programs;
+   if (prev == NULL) {
+   prog = obj->programs;
+   if (!prog || !bpf_program__is_function_storage(prog, obj))
+   return prog;
+   }
 
do {
prog = __bpf_program__iter(prog, obj, 1);

Any suggestions for a better way to do it?

> > do {
> > -   prog = __bpf_program__next(prog, obj);
> > +   prog = __bpf_program__iter(prog, obj, 1);
> > +   } while (prog && bpf_program__is_function_storage(prog, obj));
> > +
> > +   return prog;
> > +}
> > +
> > +struct bpf_program *
> > +bpf_program__prev(struct bpf_program *next, struct bpf_object *obj)
> > +{
> > +   struct bpf_program *prog = next;
> > +
> > +   if (next == NULL) {
> > +   if (!obj->nr_programs)
> > +   return NULL;
> > +   return obj->programs + obj->nr_programs - 1;
> > +   }
> > +
> > +   do {
> > +   prog = __bpf_program__iter(prog, obj, -1);
> > } while (prog && bpf_program__is_function_storage(prog, obj));
> >  
> > return prog;


Re: [PATCH bpf-next v2] bpftool: make libbfd optional

2018-11-12 Thread Jakub Kicinski
On Mon, 12 Nov 2018 13:44:10 -0800, Stanislav Fomichev wrote:
> Make it possible to build bpftool without libbfd. libbfd and libopcodes are
> typically provided in dev/dbg packages (binutils-dev in debian) which we
> usually don't have installed on the fleet machines and we'd like a way to have
> bpftool version that works without installing any additional packages.
> This excludes support for disassembling jit-ted code and prints an error if
> the user tries to use these features.
> 
> Tested by:
> cat > FEATURES_DUMP.bpftool < feature-libbfd=0
> feature-disassembler-four-args=1
> feature-reallocarray=0
> feature-libelf=1
> feature-libelf-mmap=1
> feature-bpf=1
> EOF
> FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make
> ldd bpftool | grep libbfd
> 
> Signed-off-by: Stanislav Fomichev 

Seems reasonable, thanks!

Acked-by: Jakub Kicinski 



Re: [PATCH v5 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin

2018-11-12 Thread Martin Lau
On Fri, Nov 09, 2018 at 08:21:41AM -0800, Stanislav Fomichev wrote:
[ ... ]
> @@ -1918,23 +2160,20 @@ void *bpf_object__priv(struct bpf_object *obj)
>  }
>  
>  static struct bpf_program *
> -__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
> +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i)
>  {
> - size_t idx;
> + ssize_t idx;
>  
>   if (!obj->programs)
>   return NULL;
> - /* First handler */
> - if (prev == NULL)
> - return >programs[0];
>  
> - if (prev->obj != obj) {
> + if (p->obj != obj) {
>   pr_warning("error: program handler doesn't match object\n");
>   return NULL;
>   }
>  
> - idx = (prev - obj->programs) + 1;
> - if (idx >= obj->nr_programs)
> + idx = (p - obj->programs) + i;
> + if (idx >= obj->nr_programs || idx < 0)
>   return NULL;
>   return >programs[idx];
>  }
> @@ -1944,8 +2183,29 @@ bpf_program__next(struct bpf_program *prev, struct 
> bpf_object *obj)
>  {
>   struct bpf_program *prog = prev;
>  
> + if (prev == NULL)
> + return obj->programs;
> +
This patch breaks the behavior introduced in
commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for 
multi-function programs"):
"Make bpf_program__next() skip over '.text' section if object file
 has pseudo calls.  The '.text' section is hardly a program in that
 case, it's more of a storage for code of functions other than main."

For example, the userspace could have been doing:
prog = bpf_program__next(NULL, obj);
bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
bpf_object__load(obj);

For the bpf_prog.o that has pseudo calls, after this patch in bpf-next,
the prog returned by bpf_program__next() could be in ".text" instead of
the main bpf program.  The next bpf_program__set_type() has
no effect to the main program.  The following bpf_object__load()
will catch user in surprise with the main bpf prog in
the wrong BPF_PROG_TYPE.


>   do {
> - prog = __bpf_program__next(prog, obj);
> + prog = __bpf_program__iter(prog, obj, 1);
> + } while (prog && bpf_program__is_function_storage(prog, obj));
> +
> + return prog;
> +}
> +
> +struct bpf_program *
> +bpf_program__prev(struct bpf_program *next, struct bpf_object *obj)
> +{
> + struct bpf_program *prog = next;
> +
> + if (next == NULL) {
> + if (!obj->nr_programs)
> + return NULL;
> + return obj->programs + obj->nr_programs - 1;
> + }
> +
> + do {
> + prog = __bpf_program__iter(prog, obj, -1);
>   } while (prog && bpf_program__is_function_storage(prog, obj));
>  
>   return prog;


[PATCH bpf-next v2] bpftool: make libbfd optional

2018-11-12 Thread Stanislav Fomichev
Make it possible to build bpftool without libbfd. libbfd and libopcodes are
typically provided in dev/dbg packages (binutils-dev in debian) which we
usually don't have installed on the fleet machines and we'd like a way to have
bpftool version that works without installing any additional packages.
This excludes support for disassembling jit-ted code and prints an error if
the user tries to use these features.

Tested by:
cat > FEATURES_DUMP.bpftool <
---
 tools/bpf/bpftool/Makefile | 13 +++--
 tools/bpf/bpftool/jit_disasm.c |  8 +++-
 tools/bpf/bpftool/main.c   |  3 ---
 tools/bpf/bpftool/main.h   | 14 ++
 tools/bpf/bpftool/prog.c   |  3 +++
 5 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index dac7eff4c7e5..1bea6b979082 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -53,7 +53,7 @@ ifneq ($(EXTRA_LDFLAGS),)
 LDFLAGS += $(EXTRA_LDFLAGS)
 endif
 
-LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
+LIBS = -lelf $(LIBBPF)
 
 INSTALL ?= install
 RM ?= rm -f
@@ -90,7 +90,16 @@ include $(wildcard $(OUTPUT)*.d)
 
 all: $(OUTPUT)bpftool
 
-SRCS = $(wildcard *.c)
+BFD_SRCS = jit_disasm.c
+
+SRCS = $(filter-out $(BFD_SRCS),$(wildcard *.c))
+
+ifeq ($(feature-libbfd),1)
+CFLAGS += -DHAVE_LIBBFD_SUPPORT
+SRCS += $(BFD_SRCS)
+LIBS += -lbfd -lopcodes
+endif
+
 OBJS = $(patsubst %.c,$(OUTPUT)%.o,$(SRCS)) $(OUTPUT)disasm.o
 
 $(OUTPUT)disasm.o: $(srctree)/kernel/bpf/disasm.c
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
index c75ffd9ce2bb..b2ed5ee1af5f 100644
--- a/tools/bpf/bpftool/jit_disasm.c
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -109,7 +109,7 @@ void disasm_print_insn(unsigned char *image, ssize_t len, 
int opcodes,
if (inf) {
bfdf->arch_info = inf;
} else {
-   p_err("No libfd support for %s", arch);
+   p_err("No libbfd support for %s", arch);
return;
}
}
@@ -183,3 +183,9 @@ void disasm_print_insn(unsigned char *image, ssize_t len, 
int opcodes,
 
bfd_close(bfdf);
 }
+
+int disasm_init(void)
+{
+   bfd_init();
+   return 0;
+}
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 75a3296dc0bc..5c4c1cd5a7ba 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -31,7 +31,6 @@
  * SOFTWARE.
  */
 
-#include 
 #include 
 #include 
 #include 
@@ -399,8 +398,6 @@ int main(int argc, char **argv)
if (argc < 0)
usage();
 
-   bfd_init();
-
ret = cmd_select(cmds, argc, argv, do_help);
 
if (json_output)
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 61d82020af58..10c6c16fae29 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -147,8 +147,22 @@ int prog_parse_fd(int *argc, char ***argv);
 int map_parse_fd(int *argc, char ***argv);
 int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 
*info_len);
 
+#ifdef HAVE_LIBBFD_SUPPORT
 void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
   const char *arch, const char *disassembler_options);
+int disasm_init(void);
+#else
+static inline
+void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
+  const char *arch, const char *disassembler_options)
+{
+}
+static inline int disasm_init(void)
+{
+   p_err("No libbfd support");
+   return -1;
+}
+#endif
 void print_data_json(uint8_t *data, size_t len);
 void print_hex_data_json(uint8_t *data, size_t len);
 
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 5ff5544596e7..c176e1aa66fe 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -467,6 +467,9 @@ static int do_dump(int argc, char **argv)
int fd;
 
if (is_prefix(*argv, "jited")) {
+   if (disasm_init())
+   return -1;
+
member_len = _prog_len;
member_ptr = _prog_insns;
} else if (is_prefix(*argv, "xlated")) {
-- 
2.19.1.930.g4563a0d9d0-goog



Re: [PATCH bpf-next] bpftool: make libbfd optional

2018-11-12 Thread Stanislav Fomichev
On 11/12, Jakub Kicinski wrote:
> On Mon, 12 Nov 2018 11:58:27 -0800, Stanislav Fomichev wrote:
> > On 11/12, Jakub Kicinski wrote:
> > > On Mon, 12 Nov 2018 10:53:28 -0800, Stanislav Fomichev wrote:  
> > > > diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> > > > index 61d82020af58..ec1bc2ae3c71 100644
> > > > --- a/tools/bpf/bpftool/main.h
> > > > +++ b/tools/bpf/bpftool/main.h
> > > > @@ -147,8 +147,19 @@ int prog_parse_fd(int *argc, char ***argv);
> > > >  int map_parse_fd(int *argc, char ***argv);
> > > >  int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 
> > > > *info_len);
> > > >  
> > > > +#ifdef HAVE_LIBBFD_SUPPORT
> > > >  void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
> > > >const char *arch, const char 
> > > > *disassembler_options);
> > > > +void disasm_init(void);
> > > > +#else
> > > > +static inline
> > > > +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
> > > > +  const char *arch, const char 
> > > > *disassembler_options)
> > > > +{
> > > > +   p_err("No libbfd support");
> > > > +}  
> > > 
> > > I think an error per instruction is a bit much, could we make sure we
> > > error out earlier?  
> > 
> > I can return int from disasm_print_insn as an indication to
> > stop/continue. Let me know if you have better ideas, will post v2 later
> > today.
> 
> Why not simply:
Makes sense. I think I can actually repurpose disasm_init for that (the
function I added to wrap bfd_init, I think I can move the callsite here
without any problems).

> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 5302ee282409..44edf6705ad2 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -466,6 +466,10 @@ static int do_dump(int argc, char **argv)
> int fd;
>  
> if (is_prefix(*argv, "jited")) {
> +#ifndef HAVE_LIBBFD_SUPPORT
> +   p_err("No libbfd support");
> +   return -1;
> +#endif
> member_len = _prog_len;
> member_ptr = _prog_insns;
> } else if (is_prefix(*argv, "xlated")) {
> 
> Perhaps wrapped into some helper in the header to avoid the yuckiness.


Re: [PATCH v2] net: Add trace events for all receive exit points

2018-11-12 Thread Mathieu Desnoyers
- On Nov 12, 2018, at 3:46 PM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On Nov 12, 2018, at 3:40 PM, rostedt rost...@goodmis.org wrote:
> 
>> On Mon, 12 Nov 2018 15:20:55 -0500 (EST)
>> Mathieu Desnoyers  wrote:
>> 
>>> 
>>> Hrm, looking at this again, I notice that there is a single DEFINE_EVENT
>>> using net_dev_template_simple.
>>> 
>>> We could simply turn netif_receive_skb_list_exit into a TRACE_EVENT(),
>>> remove the net_dev_template_simple, and rename the net_dev_template_return
>>> to net_dev_template ?
>> 
>> This too is only cosmetic and doesn't affect the code at all, because a
>> TRACE_EVENT() is really just:
>> 
>> #define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
>>  DECLARE_EVENT_CLASS(name,  \
>>   PARAMS(proto),\
>>   PARAMS(args), \
>>   PARAMS(tstruct),  \
>>   PARAMS(assign),   \
>>   PARAMS(print));   \
>>  DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));
>> 
>> -- Steve
>> 
> 
> Of course.
> 
> I also notice that in two cases, a "gro_result_t" is implicitly cast
> to "int". I usually frown upon this kind of stuff, because it's asking
> for trouble if gro_result_t typedef to something else than "int" in the
> future.
> 
> I would recommend going for two templates, one which takes a "int"
> ret parameter, and the other a "gro_result_t" ret parameter.
> 
> Or am I being too cautious ?

Digging further, gro_result_t maps to:

enum gro_result {
GRO_MERGED,
GRO_MERGED_FREE,
GRO_HELD,
GRO_NORMAL,
GRO_DROP,
GRO_CONSUMED,
};
typedef enum gro_result gro_result_t;

So we should add a TRACE_DEFINE_ENUM() for those.

Thanks,

Mathieu



> 
> Thanks,
> 
> Mathieu
> 
> 
>>> 
>>> It's pretty clear from the prototype that it expects a "ret" argument,
>>> so I don't see the need to also state it in the template name.
>>> 
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v2] net: Add trace events for all receive exit points

2018-11-12 Thread Mathieu Desnoyers
- On Nov 12, 2018, at 3:40 PM, rostedt rost...@goodmis.org wrote:

> On Mon, 12 Nov 2018 15:20:55 -0500 (EST)
> Mathieu Desnoyers  wrote:
> 
>> 
>> Hrm, looking at this again, I notice that there is a single DEFINE_EVENT
>> using net_dev_template_simple.
>> 
>> We could simply turn netif_receive_skb_list_exit into a TRACE_EVENT(),
>> remove the net_dev_template_simple, and rename the net_dev_template_return
>> to net_dev_template ?
> 
> This too is only cosmetic and doesn't affect the code at all, because a
> TRACE_EVENT() is really just:
> 
> #define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
>   DECLARE_EVENT_CLASS(name,  \
>PARAMS(proto),\
>PARAMS(args), \
>PARAMS(tstruct),  \
>PARAMS(assign),   \
>PARAMS(print));   \
>   DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));
> 
> -- Steve
> 

Of course.

I also notice that in two cases, a "gro_result_t" is implicitly cast
to "int". I usually frown upon this kind of stuff, because it's asking
for trouble if gro_result_t typedef to something else than "int" in the
future.

I would recommend going for two templates, one which takes a "int"
ret parameter, and the other a "gro_result_t" ret parameter.

Or am I being too cautious ?

Thanks,

Mathieu


>> 
>> It's pretty clear from the prototype that it expects a "ret" argument,
>> so I don't see the need to also state it in the template name.
>> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH mlx5-next 00/10] Collection of ODP fixes

2018-11-12 Thread Leon Romanovsky
On Thu, Nov 08, 2018 at 09:10:07PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky 
>
> Hi,
>
> This is collection of fixes to mlx5_core and mlx5_ib ODP logic.
> There are two main reasons why we decided to forward it to mlx5-next
> and not to rdma-rc or net like our other fixes.
>
> 1. We have large number of patches exactly in that area that are ready
> for submission and we don't want to create extra merge work for
> maintainers due to that. Among those future series (already ready and
> tested): internal change of mlx5_core pagefaults handling, moving ODP
> code to RDMA, change in EQ mechanism, simplification and moving SRQ QP
> to RDMA, extra fixes to mlx5_ib ODP and ODP SRQ support.
>
> 2. Most of those fixes are for old bugs and there is no rush to fix them
> right now (in -rc1/-rc2).
>
> Thanks
>
> Moni Shoua (10):
>   net/mlx5: Release resource on error flow
>   net/mlx5: Add interface to hold and release core resources
>   net/mlx5: Enumerate page fault types
>   IB/mlx5: Lock QP during page fault handling
>   net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
>   net/mlx5: Use multi threaded workqueue for page fault handling
>   IB/mlx5: Improve ODP debugging messages

Applied to mlx5-next
b02394aa75e3 IB/mlx5: Improve ODP debugging messages
90290db7669b net/mlx5: Use multi threaded workqueue for page fault handling
ef90c5e9757d net/mlx5: Return success for PAGE_FAULT_RESUME in internal error 
state
032080ab43ac IB/mlx5: Lock QP during page fault handling
c99fefea2cc9 net/mlx5: Enumerate page fault types
27e95603f4df net/mlx5: Add interface to hold and release core resources
698114968a22 net/mlx5: Release resource on error flow

>   IB/mlx5: Avoid hangs due to a missing completion
>   IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously

Dropped to address feedback.

>   net/mlx5: Remove unused function

Dropped, because it depends on dropped commit.

Thanks

>
>  drivers/infiniband/core/umem_odp.c|  14 +-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h  |   1 +
>  drivers/infiniband/hw/mlx5/mr.c   |  15 +-
>  drivers/infiniband/hw/mlx5/odp.c  | 158 ++
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  21 +--
>  drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  20 ++-
>  include/linux/mlx5/device.h   |   7 +
>  include/linux/mlx5/driver.h   |   7 +-
>  include/linux/mlx5/qp.h   |   5 +
>  10 files changed, 191 insertions(+), 59 deletions(-)
>
> --
> 2.19.1
>


signature.asc
Description: PGP signature


Re: [PATCH v2] net: Add trace events for all receive exit points

2018-11-12 Thread Steven Rostedt
On Mon, 12 Nov 2018 15:20:55 -0500 (EST)
Mathieu Desnoyers  wrote:

> 
> Hrm, looking at this again, I notice that there is a single DEFINE_EVENT
> using net_dev_template_simple.
> 
> We could simply turn netif_receive_skb_list_exit into a TRACE_EVENT(),
> remove the net_dev_template_simple, and rename the net_dev_template_return
> to net_dev_template ?

This too is only cosmetic and doesn't affect the code at all, because a
TRACE_EVENT() is really just:

#define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
DECLARE_EVENT_CLASS(name,  \
 PARAMS(proto),\
 PARAMS(args), \
 PARAMS(tstruct),  \
 PARAMS(assign),   \
 PARAMS(print));   \
DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));

-- Steve

> 
> It's pretty clear from the prototype that it expects a "ret" argument,
> so I don't see the need to also state it in the template name.
> 
>


Re: [PATCH v2] net: Add trace events for all receive exit points

2018-11-12 Thread Mathieu Desnoyers
- On Nov 12, 2018, at 3:20 PM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On Nov 12, 2018, at 3:09 PM, Mathieu Desnoyers
> mathieu.desnoy...@efficios.com wrote:
> 
>> - On Nov 12, 2018, at 2:44 PM, Geneviève Bastien gbast...@versatic.net
>> wrote:
>> 
>>> Trace events are already present for the receive entry points, to indicate
>>> how the reception entered the stack.
>>> 
>>> This patch adds the corresponding exit trace events that will bound the
>>> reception such that all events occurring between the entry and the exit
>>> can be considered as part of the reception context. This greatly helps
>>> for dependency and root cause analyses.
>>> 
>>> Without this, it is impossible to determine whether a sched_wakeup
>>> event following a netif_receive_skb event is the result of the packet
>>> reception or a simple coincidence after further processing by the
>>> thread.
>> 
>> As discussed over IRC, it is _possible_ to use kretprobes to instrument
>> the exit, but it is not practical. A v3 will be sent soon to clarify
>> this part of the commit message.
>> 
>> Also, I wonder if we should use "net_dev_template_exit" for the event
>> class rather than "net_dev_template_return" ?
> 
> Hrm, looking at this again, I notice that there is a single DEFINE_EVENT
> using net_dev_template_simple.
> 
> We could simply turn netif_receive_skb_list_exit into a TRACE_EVENT(),
> remove the net_dev_template_simple, and rename the net_dev_template_return
> to net_dev_template ?
> 
> It's pretty clear from the prototype that it expects a "ret" argument,
> so I don't see the need to also state it in the template name.

As you pointed out on IRC, net_dev_template already exists. So we can
use "net_dev_rx_exit_template" which is along the same lines as
its entry counterpart "net_dev_rx_verbose_template".

Thanks,

Mathieu


> 
> Thanks,
> 
> Mathieu
> 
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>>> 
>>> Signed-off-by: Geneviève Bastien 
>>> CC: Mathieu Desnoyers 
>>> CC: Steven Rostedt 
>>> CC: Ingo Molnar 
>>> CC: David S. Miller 
>>> ---
>>> Changes in v2:
>>>  - Add the return value to tracepoints where applicable
>>>  - Verify if tracepoint is enabled before walking list in
>>>netif_receive_skb_list
>>> ---
>>> include/trace/events/net.h | 78 ++
>>> net/core/dev.c | 38 ---
>>> 2 files changed, 110 insertions(+), 6 deletions(-)
>>> 
>>> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
>>> index 00aa72ce0e7c..cff1a7b9d0bb 100644
>>> --- a/include/trace/events/net.h
>>> +++ b/include/trace/events/net.h
>>> @@ -117,6 +117,42 @@ DECLARE_EVENT_CLASS(net_dev_template,
>>> __get_str(name), __entry->skbaddr, __entry->len)
>>> )
>>> 
>>> +DECLARE_EVENT_CLASS(net_dev_template_return,
>>> +
>>> +   TP_PROTO(struct sk_buff *skb, int ret),
>>> +
>>> +   TP_ARGS(skb, ret),
>>> +
>>> +   TP_STRUCT__entry(
>>> +   __field(void *, skbaddr)
>>> +   __field(int,ret)
>>> +   ),
>>> +
>>> +   TP_fast_assign(
>>> +   __entry->skbaddr = skb;
>>> +   __entry->ret = ret;
>>> +   ),
>>> +
>>> +   TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret)
>>> +)
>>> +
>>> +DECLARE_EVENT_CLASS(net_dev_template_simple,
>>> +
>>> +   TP_PROTO(struct sk_buff *skb),
>>> +
>>> +   TP_ARGS(skb),
>>> +
>>> +   TP_STRUCT__entry(
>>> +   __field(void *, skbaddr)
>>> +   ),
>>> +
>>> +   TP_fast_assign(
>>> +   __entry->skbaddr = skb;
>>> +   ),
>>> +
>>> +   TP_printk("skbaddr=%p", __entry->skbaddr)
>>> +)
>>> +
>>> DEFINE_EVENT(net_dev_template, net_dev_queue,
>>> 
>>> TP_PROTO(struct sk_buff *skb),
>>> @@ -244,6 +280,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template,
>>> netif_rx_ni_entry,
>>> TP_ARGS(skb)
>>> );
>>> 
>>> +DEFINE_EVENT(net_dev_template_return, napi_gro_frags_exit,
>>> +
>>> +   TP_PROTO(struct sk_buff *skb, int ret),
>>> +
>>> +   TP_ARGS(skb, ret)
>>> +);
>>> +
>>> +DEFINE_EVENT(net_dev_template_return, napi_gro_receive_exit,
>>> +
>>> +   TP_PROTO(struct sk_buff *skb, int ret),
>>> +
>>> +   TP_ARGS(skb, ret)
>>> +);
>>> +
>>> +DEFINE_EVENT(net_dev_template_return, netif_receive_skb_exit,
>>> +
>>> +   TP_PROTO(struct sk_buff *skb, int ret),
>>> +
>>> +   TP_ARGS(skb, ret)
>>> +);
>>> +
>>> +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit,
>>> +
>>> +   TP_PROTO(struct sk_buff *skb),
>>> +
>>> +   TP_ARGS(skb)
>>> +);
>>> +
>>> +DEFINE_EVENT(net_dev_template_return, netif_rx_exit,
>>> +
>>> +   TP_PROTO(struct sk_buff *skb, int ret),
>>> +
>>> +   TP_ARGS(skb, ret)
>>> +);
>>> +
>>> +DEFINE_EVENT(net_dev_template_return, netif_rx_ni_exit,
>>> +
>>> +   TP_PROTO(struct sk_buff *skb, int ret),
>>> +
>>> +   TP_ARGS(skb, ret)
>>> +);
>>> +
>>> #endif /* _TRACE_NET_H */
>>> 
>>> /* This part must be outside protection */
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 0ffcbdd55fa9..c4dc5df34abe 100644
>>> --- a/net/core/dev.c
>>> +++ 

Re: [PATCH bpf-next] bpftool: make libbfd optional

2018-11-12 Thread Jakub Kicinski
On Mon, 12 Nov 2018 11:58:27 -0800, Stanislav Fomichev wrote:
> On 11/12, Jakub Kicinski wrote:
> > On Mon, 12 Nov 2018 10:53:28 -0800, Stanislav Fomichev wrote:  
> > > diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> > > index 61d82020af58..ec1bc2ae3c71 100644
> > > --- a/tools/bpf/bpftool/main.h
> > > +++ b/tools/bpf/bpftool/main.h
> > > @@ -147,8 +147,19 @@ int prog_parse_fd(int *argc, char ***argv);
> > >  int map_parse_fd(int *argc, char ***argv);
> > >  int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 
> > > *info_len);
> > >  
> > > +#ifdef HAVE_LIBBFD_SUPPORT
> > >  void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
> > >  const char *arch, const char *disassembler_options);
> > > +void disasm_init(void);
> > > +#else
> > > +static inline
> > > +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
> > > +const char *arch, const char *disassembler_options)
> > > +{
> > > + p_err("No libbfd support");
> > > +}  
> > 
> > I think an error per instruction is a bit much, could we make sure we
> > error out earlier?  
> 
> I can return int from disasm_print_insn as an indication to
> stop/continue. Let me know if you have better ideas, will post v2 later
> today.

Why not simply:

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 5302ee282409..44edf6705ad2 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -466,6 +466,10 @@ static int do_dump(int argc, char **argv)
int fd;
 
if (is_prefix(*argv, "jited")) {
+#ifndef HAVE_LIBBFD_SUPPORT
+   p_err("No libbfd support");
+   return -1;
+#endif
member_len = _prog_len;
member_ptr = _prog_insns;
} else if (is_prefix(*argv, "xlated")) {

Perhaps wrapped into some helper in the header to avoid the yuckiness.


Re: [PATCH v2] net: Add trace events for all receive exit points

2018-11-12 Thread Steven Rostedt
On Mon, 12 Nov 2018 14:44:05 -0500
Geneviève Bastien  wrote:

> Trace events are already present for the receive entry points, to indicate
> how the reception entered the stack.
> 
> This patch adds the corresponding exit trace events that will bound the
> reception such that all events occurring between the entry and the exit
> can be considered as part of the reception context. This greatly helps
> for dependency and root cause analyses.
> 
> Without this, it is impossible to determine whether a sched_wakeup
> event following a netif_receive_skb event is the result of the packet
> reception or a simple coincidence after further processing by the
> thread.
> 
> Signed-off-by: Geneviève Bastien 
> CC: Mathieu Desnoyers 
> CC: Steven Rostedt 
> CC: Ingo Molnar 
> CC: David S. Miller 
> ---
> Changes in v2:
>   - Add the return value to tracepoints where applicable
>   - Verify if tracepoint is enabled before walking list in
> netif_receive_skb_list
> ---
>  include/trace/events/net.h | 78 ++
>  net/core/dev.c | 38 ---
>  2 files changed, 110 insertions(+), 6 deletions(-)
> 
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> index 00aa72ce0e7c..cff1a7b9d0bb 100644
> --- a/include/trace/events/net.h
> +++ b/include/trace/events/net.h
> @@ -117,6 +117,42 @@ DECLARE_EVENT_CLASS(net_dev_template,
>   __get_str(name), __entry->skbaddr, __entry->len)
>  )
>  
> +DECLARE_EVENT_CLASS(net_dev_template_return,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret),
> +
> + TP_STRUCT__entry(
> + __field(void *, skbaddr)
> + __field(int,ret)
> + ),
> +
> + TP_fast_assign(
> + __entry->skbaddr = skb;
> + __entry->ret = ret;
> + ),
> +
> + TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret)
> +)
> +
> +DECLARE_EVENT_CLASS(net_dev_template_simple,
> +
> + TP_PROTO(struct sk_buff *skb),
> +
> + TP_ARGS(skb),
> +
> + TP_STRUCT__entry(
> + __field(void *, skbaddr)
> + ),
> +
> + TP_fast_assign(
> + __entry->skbaddr = skb;
> + ),
> +
> + TP_printk("skbaddr=%p", __entry->skbaddr)
> +)
> +
>  DEFINE_EVENT(net_dev_template, net_dev_queue,
>  
>   TP_PROTO(struct sk_buff *skb),
> @@ -244,6 +280,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template, 
> netif_rx_ni_entry,
>   TP_ARGS(skb)
>  );
>  
> +DEFINE_EVENT(net_dev_template_return, napi_gro_frags_exit,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret)
> +);
> +
> +DEFINE_EVENT(net_dev_template_return, napi_gro_receive_exit,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret)
> +);
> +
> +DEFINE_EVENT(net_dev_template_return, netif_receive_skb_exit,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret)
> +);
> +
> +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit,

One small nit, and I don't care enough to ask you to fix it, but others
might care. We probably should not intermix net_dev_template_simple
events within net_dev_template_return events.

But like I said, I don't really care, it's more cosmetic than anything
else.

For the tracing side:

Reviewed-by: Steven Rostedt (VMware) 

-- Steve


> +
> + TP_PROTO(struct sk_buff *skb),
> +
> + TP_ARGS(skb)
> +);
> +
> +DEFINE_EVENT(net_dev_template_return, netif_rx_exit,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret)
> +);
> +
> +DEFINE_EVENT(net_dev_template_return, netif_rx_ni_exit,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret)
> +);
> +
>  #endif /* _TRACE_NET_H */
>  
>


Re: [PATCH net-next] net: phy: check for implementation of both callbacks in phy_drv_supports_irq

2018-11-12 Thread Florian Fainelli
On 11/12/18 12:16 PM, Heiner Kallweit wrote:
> Now that the icplus driver has been fixed all PHY drivers supporting
> interrupts have both callbacks (config_intr and ack_interrupt)
> implemented - as it should be. Therefore phy_drv_supports_irq()
> can be changed now to check for both callbacks being implemented.
> 
> Signed-off-by: Heiner Kallweit 

Acked-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v2] net: Add trace events for all receive exit points

2018-11-12 Thread Mathieu Desnoyers
- On Nov 12, 2018, at 3:09 PM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On Nov 12, 2018, at 2:44 PM, Geneviève Bastien gbast...@versatic.net
> wrote:
> 
>> Trace events are already present for the receive entry points, to indicate
>> how the reception entered the stack.
>> 
>> This patch adds the corresponding exit trace events that will bound the
>> reception such that all events occurring between the entry and the exit
>> can be considered as part of the reception context. This greatly helps
>> for dependency and root cause analyses.
>> 
>> Without this, it is impossible to determine whether a sched_wakeup
>> event following a netif_receive_skb event is the result of the packet
>> reception or a simple coincidence after further processing by the
>> thread.
> 
> As discussed over IRC, it is _possible_ to use kretprobes to instrument
> the exit, but it is not practical. A v3 will be sent soon to clarify
> this part of the commit message.
> 
> Also, I wonder if we should use "net_dev_template_exit" for the event
> class rather than "net_dev_template_return" ?

Hrm, looking at this again, I notice that there is a single DEFINE_EVENT
using net_dev_template_simple.

We could simply turn netif_receive_skb_list_exit into a TRACE_EVENT(),
remove the net_dev_template_simple, and rename the net_dev_template_return
to net_dev_template ?

It's pretty clear from the prototype that it expects a "ret" argument,
so I don't see the need to also state it in the template name.

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
>> 
>> Signed-off-by: Geneviève Bastien 
>> CC: Mathieu Desnoyers 
>> CC: Steven Rostedt 
>> CC: Ingo Molnar 
>> CC: David S. Miller 
>> ---
>> Changes in v2:
>>  - Add the return value to tracepoints where applicable
>>  - Verify if tracepoint is enabled before walking list in
>>netif_receive_skb_list
>> ---
>> include/trace/events/net.h | 78 ++
>> net/core/dev.c | 38 ---
>> 2 files changed, 110 insertions(+), 6 deletions(-)
>> 
>> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
>> index 00aa72ce0e7c..cff1a7b9d0bb 100644
>> --- a/include/trace/events/net.h
>> +++ b/include/trace/events/net.h
>> @@ -117,6 +117,42 @@ DECLARE_EVENT_CLASS(net_dev_template,
>>  __get_str(name), __entry->skbaddr, __entry->len)
>> )
>> 
>> +DECLARE_EVENT_CLASS(net_dev_template_return,
>> +
>> +TP_PROTO(struct sk_buff *skb, int ret),
>> +
>> +TP_ARGS(skb, ret),
>> +
>> +TP_STRUCT__entry(
>> +__field(void *, skbaddr)
>> +__field(int,ret)
>> +),
>> +
>> +TP_fast_assign(
>> +__entry->skbaddr = skb;
>> +__entry->ret = ret;
>> +),
>> +
>> +TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret)
>> +)
>> +
>> +DECLARE_EVENT_CLASS(net_dev_template_simple,
>> +
>> +TP_PROTO(struct sk_buff *skb),
>> +
>> +TP_ARGS(skb),
>> +
>> +TP_STRUCT__entry(
>> +__field(void *, skbaddr)
>> +),
>> +
>> +TP_fast_assign(
>> +__entry->skbaddr = skb;
>> +),
>> +
>> +TP_printk("skbaddr=%p", __entry->skbaddr)
>> +)
>> +
>> DEFINE_EVENT(net_dev_template, net_dev_queue,
>> 
>>  TP_PROTO(struct sk_buff *skb),
>> @@ -244,6 +280,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template,
>> netif_rx_ni_entry,
>>  TP_ARGS(skb)
>> );
>> 
>> +DEFINE_EVENT(net_dev_template_return, napi_gro_frags_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb, int ret),
>> +
>> +TP_ARGS(skb, ret)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template_return, napi_gro_receive_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb, int ret),
>> +
>> +TP_ARGS(skb, ret)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template_return, netif_receive_skb_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb, int ret),
>> +
>> +TP_ARGS(skb, ret)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb),
>> +
>> +TP_ARGS(skb)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template_return, netif_rx_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb, int ret),
>> +
>> +TP_ARGS(skb, ret)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template_return, netif_rx_ni_exit,
>> +
>> +TP_PROTO(struct sk_buff *skb, int ret),
>> +
>> +TP_ARGS(skb, ret)
>> +);
>> +
>> #endif /* _TRACE_NET_H */
>> 
>> /* This part must be outside protection */
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 0ffcbdd55fa9..c4dc5df34abe 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -4520,9 +4520,14 @@ static int netif_rx_internal(struct sk_buff *skb)
>> 
>> int netif_rx(struct sk_buff *skb)
>> {
>> +int ret;
>> +
>>  trace_netif_rx_entry(skb);
>> 
>> -return netif_rx_internal(skb);
>> +ret = netif_rx_internal(skb);
>> +trace_netif_rx_exit(skb, ret);
>> +
>> +return ret;
>> }
>> EXPORT_SYMBOL(netif_rx);
>> 
>> @@ -4537,6 +4542,7 @@ int netif_rx_ni(struct sk_buff *skb)
>>  if 

[PATCH net-next] net: phy: check for implementation of both callbacks in phy_drv_supports_irq

2018-11-12 Thread Heiner Kallweit
Now that the icplus driver has been fixed all PHY drivers supporting
interrupts have both callbacks (config_intr and ack_interrupt)
implemented - as it should be. Therefore phy_drv_supports_irq()
can be changed now to check for both callbacks being implemented.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/phy_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 55202a0ac..e06613f2d 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2122,7 +2122,7 @@ static void of_set_phy_eee_broken(struct phy_device 
*phydev)
 
 static bool phy_drv_supports_irq(struct phy_driver *phydrv)
 {
-   return phydrv->config_intr || phydrv->ack_interrupt;
+   return phydrv->config_intr && phydrv->ack_interrupt;
 }
 
 /**
-- 
2.19.1



Re: [PATCH v2] net: Add trace events for all receive exit points

2018-11-12 Thread Mathieu Desnoyers
- On Nov 12, 2018, at 2:44 PM, Geneviève Bastien gbast...@versatic.net 
wrote:

> Trace events are already present for the receive entry points, to indicate
> how the reception entered the stack.
> 
> This patch adds the corresponding exit trace events that will bound the
> reception such that all events occurring between the entry and the exit
> can be considered as part of the reception context. This greatly helps
> for dependency and root cause analyses.
> 
> Without this, it is impossible to determine whether a sched_wakeup
> event following a netif_receive_skb event is the result of the packet
> reception or a simple coincidence after further processing by the
> thread.

As discussed over IRC, it is _possible_ to use kretprobes to instrument
the exit, but it is not practical. A v3 will be sent soon to clarify
this part of the commit message.

Also, I wonder if we should use "net_dev_template_exit" for the event
class rather than "net_dev_template_return" ?

Thanks,

Mathieu

> 
> Signed-off-by: Geneviève Bastien 
> CC: Mathieu Desnoyers 
> CC: Steven Rostedt 
> CC: Ingo Molnar 
> CC: David S. Miller 
> ---
> Changes in v2:
>  - Add the return value to tracepoints where applicable
>  - Verify if tracepoint is enabled before walking list in
>netif_receive_skb_list
> ---
> include/trace/events/net.h | 78 ++
> net/core/dev.c | 38 ---
> 2 files changed, 110 insertions(+), 6 deletions(-)
> 
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> index 00aa72ce0e7c..cff1a7b9d0bb 100644
> --- a/include/trace/events/net.h
> +++ b/include/trace/events/net.h
> @@ -117,6 +117,42 @@ DECLARE_EVENT_CLASS(net_dev_template,
>   __get_str(name), __entry->skbaddr, __entry->len)
> )
> 
> +DECLARE_EVENT_CLASS(net_dev_template_return,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret),
> +
> + TP_STRUCT__entry(
> + __field(void *, skbaddr)
> + __field(int,ret)
> + ),
> +
> + TP_fast_assign(
> + __entry->skbaddr = skb;
> + __entry->ret = ret;
> + ),
> +
> + TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret)
> +)
> +
> +DECLARE_EVENT_CLASS(net_dev_template_simple,
> +
> + TP_PROTO(struct sk_buff *skb),
> +
> + TP_ARGS(skb),
> +
> + TP_STRUCT__entry(
> + __field(void *, skbaddr)
> + ),
> +
> + TP_fast_assign(
> + __entry->skbaddr = skb;
> + ),
> +
> + TP_printk("skbaddr=%p", __entry->skbaddr)
> +)
> +
> DEFINE_EVENT(net_dev_template, net_dev_queue,
> 
>   TP_PROTO(struct sk_buff *skb),
> @@ -244,6 +280,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template,
> netif_rx_ni_entry,
>   TP_ARGS(skb)
> );
> 
> +DEFINE_EVENT(net_dev_template_return, napi_gro_frags_exit,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret)
> +);
> +
> +DEFINE_EVENT(net_dev_template_return, napi_gro_receive_exit,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret)
> +);
> +
> +DEFINE_EVENT(net_dev_template_return, netif_receive_skb_exit,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret)
> +);
> +
> +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit,
> +
> + TP_PROTO(struct sk_buff *skb),
> +
> + TP_ARGS(skb)
> +);
> +
> +DEFINE_EVENT(net_dev_template_return, netif_rx_exit,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret)
> +);
> +
> +DEFINE_EVENT(net_dev_template_return, netif_rx_ni_exit,
> +
> + TP_PROTO(struct sk_buff *skb, int ret),
> +
> + TP_ARGS(skb, ret)
> +);
> +
> #endif /* _TRACE_NET_H */
> 
> /* This part must be outside protection */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0ffcbdd55fa9..c4dc5df34abe 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4520,9 +4520,14 @@ static int netif_rx_internal(struct sk_buff *skb)
> 
> int netif_rx(struct sk_buff *skb)
> {
> + int ret;
> +
>   trace_netif_rx_entry(skb);
> 
> - return netif_rx_internal(skb);
> + ret = netif_rx_internal(skb);
> + trace_netif_rx_exit(skb, ret);
> +
> + return ret;
> }
> EXPORT_SYMBOL(netif_rx);
> 
> @@ -4537,6 +4542,7 @@ int netif_rx_ni(struct sk_buff *skb)
>   if (local_softirq_pending())
>   do_softirq();
>   preempt_enable();
> + trace_netif_rx_ni_exit(skb, err);
> 
>   return err;
> }
> @@ -5222,9 +5228,14 @@ static void netif_receive_skb_list_internal(struct
> list_head *head)
>  */
> int netif_receive_skb(struct sk_buff *skb)
> {
> + int ret;
> +
>   trace_netif_receive_skb_entry(skb);
> 
> - return netif_receive_skb_internal(skb);
> + ret = netif_receive_skb_internal(skb);
> + trace_netif_receive_skb_exit(skb, ret);
> +
> + return ret;
> }
> EXPORT_SYMBOL(netif_receive_skb);
> 
> @@ -5244,9 +5255,15 @@ void netif_receive_skb_list(struct list_head 

Re: [PATCH bpf-next] bpftool: make libbfd optional

2018-11-12 Thread Stanislav Fomichev
On 11/12, Jakub Kicinski wrote:
> On Mon, 12 Nov 2018 10:53:28 -0800, Stanislav Fomichev wrote:
> > Make it possible to build bpftool without libbfd. This excludes support for
> > disassembling jit-ted code and prints an error if the user tries to use
> > these features.
> > 
> > Tested by:
> > cat > FEATURES_DUMP.bpftool < > feature-libbfd=0
> > feature-disassembler-four-args=1
> > feature-reallocarray=0
> > feature-libelf=1
> > feature-libelf-mmap=1
> > feature-bpf=1
> > EOF
> > FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make
> > ldd bpftool | grep libbfd
> > 
> > Signed-off-by: Stanislav Fomichev 
> 
> Would you mind spelling out the motivation?
Ack, will do in v2 commit message. tldr - some fleet machines don't
have dev/dbg stuff installed; would like an option to have bpftool that
works everywhere for introspection.

> > diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> > index 61d82020af58..ec1bc2ae3c71 100644
> > --- a/tools/bpf/bpftool/main.h
> > +++ b/tools/bpf/bpftool/main.h
> > @@ -147,8 +147,19 @@ int prog_parse_fd(int *argc, char ***argv);
> >  int map_parse_fd(int *argc, char ***argv);
> >  int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 
> > *info_len);
> >  
> > +#ifdef HAVE_LIBBFD_SUPPORT
> >  void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
> >const char *arch, const char *disassembler_options);
> > +void disasm_init(void);
> > +#else
> > +static inline
> > +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
> > +  const char *arch, const char *disassembler_options)
> > +{
> > +   p_err("No libbfd support");
> > +}
> 
> I think an error per instruction is a bit much, could we make sure we
> error out earlier?
I can return int from disasm_print_insn as an indication to
stop/continue. Let me know if you have better ideas, will post v2 later
today.

> > +static inline void disasm_init(void) {}
> > +#endif
> >  void print_data_json(uint8_t *data, size_t len);
> >  void print_hex_data_json(uint8_t *data, size_t len);
> 
> Otherwise LGTM.
Thanks!


[PATCH v2] net: Add trace events for all receive exit points

2018-11-12 Thread Geneviève Bastien
Trace events are already present for the receive entry points, to indicate
how the reception entered the stack.

This patch adds the corresponding exit trace events that will bound the
reception such that all events occurring between the entry and the exit
can be considered as part of the reception context. This greatly helps
for dependency and root cause analyses.

Without this, it is impossible to determine whether a sched_wakeup
event following a netif_receive_skb event is the result of the packet
reception or a simple coincidence after further processing by the
thread.

Signed-off-by: Geneviève Bastien 
CC: Mathieu Desnoyers 
CC: Steven Rostedt 
CC: Ingo Molnar 
CC: David S. Miller 
---
Changes in v2:
  - Add the return value to tracepoints where applicable
  - Verify if tracepoint is enabled before walking list in
netif_receive_skb_list
---
 include/trace/events/net.h | 78 ++
 net/core/dev.c | 38 ---
 2 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 00aa72ce0e7c..cff1a7b9d0bb 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -117,6 +117,42 @@ DECLARE_EVENT_CLASS(net_dev_template,
__get_str(name), __entry->skbaddr, __entry->len)
 )
 
+DECLARE_EVENT_CLASS(net_dev_template_return,
+
+   TP_PROTO(struct sk_buff *skb, int ret),
+
+   TP_ARGS(skb, ret),
+
+   TP_STRUCT__entry(
+   __field(void *, skbaddr)
+   __field(int,ret)
+   ),
+
+   TP_fast_assign(
+   __entry->skbaddr = skb;
+   __entry->ret = ret;
+   ),
+
+   TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret)
+)
+
+DECLARE_EVENT_CLASS(net_dev_template_simple,
+
+   TP_PROTO(struct sk_buff *skb),
+
+   TP_ARGS(skb),
+
+   TP_STRUCT__entry(
+   __field(void *, skbaddr)
+   ),
+
+   TP_fast_assign(
+   __entry->skbaddr = skb;
+   ),
+
+   TP_printk("skbaddr=%p", __entry->skbaddr)
+)
+
 DEFINE_EVENT(net_dev_template, net_dev_queue,
 
TP_PROTO(struct sk_buff *skb),
@@ -244,6 +280,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template, 
netif_rx_ni_entry,
TP_ARGS(skb)
 );
 
+DEFINE_EVENT(net_dev_template_return, napi_gro_frags_exit,
+
+   TP_PROTO(struct sk_buff *skb, int ret),
+
+   TP_ARGS(skb, ret)
+);
+
+DEFINE_EVENT(net_dev_template_return, napi_gro_receive_exit,
+
+   TP_PROTO(struct sk_buff *skb, int ret),
+
+   TP_ARGS(skb, ret)
+);
+
+DEFINE_EVENT(net_dev_template_return, netif_receive_skb_exit,
+
+   TP_PROTO(struct sk_buff *skb, int ret),
+
+   TP_ARGS(skb, ret)
+);
+
+DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit,
+
+   TP_PROTO(struct sk_buff *skb),
+
+   TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template_return, netif_rx_exit,
+
+   TP_PROTO(struct sk_buff *skb, int ret),
+
+   TP_ARGS(skb, ret)
+);
+
+DEFINE_EVENT(net_dev_template_return, netif_rx_ni_exit,
+
+   TP_PROTO(struct sk_buff *skb, int ret),
+
+   TP_ARGS(skb, ret)
+);
+
 #endif /* _TRACE_NET_H */
 
 /* This part must be outside protection */
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ffcbdd55fa9..c4dc5df34abe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4520,9 +4520,14 @@ static int netif_rx_internal(struct sk_buff *skb)
 
 int netif_rx(struct sk_buff *skb)
 {
+   int ret;
+
trace_netif_rx_entry(skb);
 
-   return netif_rx_internal(skb);
+   ret = netif_rx_internal(skb);
+   trace_netif_rx_exit(skb, ret);
+
+   return ret;
 }
 EXPORT_SYMBOL(netif_rx);
 
@@ -4537,6 +4542,7 @@ int netif_rx_ni(struct sk_buff *skb)
if (local_softirq_pending())
do_softirq();
preempt_enable();
+   trace_netif_rx_ni_exit(skb, err);
 
return err;
 }
@@ -5222,9 +5228,14 @@ static void netif_receive_skb_list_internal(struct 
list_head *head)
  */
 int netif_receive_skb(struct sk_buff *skb)
 {
+   int ret;
+
trace_netif_receive_skb_entry(skb);
 
-   return netif_receive_skb_internal(skb);
+   ret = netif_receive_skb_internal(skb);
+   trace_netif_receive_skb_exit(skb, ret);
+
+   return ret;
 }
 EXPORT_SYMBOL(netif_receive_skb);
 
@@ -5244,9 +5255,15 @@ void netif_receive_skb_list(struct list_head *head)
 
if (list_empty(head))
return;
-   list_for_each_entry(skb, head, list)
-   trace_netif_receive_skb_list_entry(skb);
+   if (trace_netif_receive_skb_list_entry_enabled()) {
+   list_for_each_entry(skb, head, list)
+   trace_netif_receive_skb_list_entry(skb);
+   }
netif_receive_skb_list_internal(head);
+   if (trace_netif_receive_skb_list_exit_enabled()) {
+   list_for_each_entry(skb, head, list)
+   trace_netif_receive_skb_list_exit(skb);
+   }
 }
 

Re: [PATCH bpf-next] bpftool: make libbfd optional

2018-11-12 Thread Jakub Kicinski
On Mon, 12 Nov 2018 10:53:28 -0800, Stanislav Fomichev wrote:
> Make it possible to build bpftool without libbfd. This excludes support for
> disassembling jit-ted code and prints an error if the user tries to use
> these features.
> 
> Tested by:
> cat > FEATURES_DUMP.bpftool < feature-libbfd=0
> feature-disassembler-four-args=1
> feature-reallocarray=0
> feature-libelf=1
> feature-libelf-mmap=1
> feature-bpf=1
> EOF
> FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make
> ldd bpftool | grep libbfd
> 
> Signed-off-by: Stanislav Fomichev 

Would you mind spelling out the motivation?

> diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> index 61d82020af58..ec1bc2ae3c71 100644
> --- a/tools/bpf/bpftool/main.h
> +++ b/tools/bpf/bpftool/main.h
> @@ -147,8 +147,19 @@ int prog_parse_fd(int *argc, char ***argv);
>  int map_parse_fd(int *argc, char ***argv);
>  int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 
> *info_len);
>  
> +#ifdef HAVE_LIBBFD_SUPPORT
>  void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
>  const char *arch, const char *disassembler_options);
> +void disasm_init(void);
> +#else
> +static inline
> +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
> +const char *arch, const char *disassembler_options)
> +{
> + p_err("No libbfd support");
> +}

I think an error per instruction is a bit much, could we make sure we
error out earlier?

> +static inline void disasm_init(void) {}
> +#endif
>  void print_data_json(uint8_t *data, size_t len);
>  void print_hex_data_json(uint8_t *data, size_t len);

Otherwise LGTM.


Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-12 Thread Paweł Staszewski



W dniu 11.11.2018 o 09:56, Jesper Dangaard Brouer pisze:

On Sat, 10 Nov 2018 22:53:53 +0100 Paweł Staszewski  
wrote:


Now im messing with ring configuration for connectx5 nics.
And after reading that paper:
https://netdevconf.org/2.1/slides/apr6/network-performance/04-amir-RX_and_TX_bulking_v2.pdf


Do notice that some of the ideas in that slide deck, was never
implemented. But they are still on my todo list ;-).

Notice how that it show that TX bulking is very important, but based on
your ethtool_stats.pl, I can see that not much TX bulking is happening
in your case.  This is indicated via the xmit_more counters.

  Ethtool(enp175s0) stat:2630 ( 2,630) <= tx_xmit_more /sec
  Ethtool(enp175s0) stat: 4956995 ( 4,956,995) <= tx_packets /sec

And the per queue levels are also avail:

  Ethtool(enp175s0) stat: 184845 ( 184,845) <= tx7_packets /sec
  Ethtool(enp175s0) stat:     78 (  78) <= tx7_xmit_more /sec

This means that you are doing too many doorbell's to the NIC hardware
at TX time, which I worry could be what cause the NIC and PCIe hardware
not to operate at optimal speeds.


After tunning coal/ring a little with ethtool

Reached today:

 bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
  input: /proc/net/dev type: rate
  | iface   Rx Tx    Total
==
 enp175s0:  50.68 Gb/s   21.53 Gb/s   
72.20 Gb/s
 enp216s0:  21.62 Gb/s   50.81 Gb/s   
72.42 Gb/s

--
    total:  72.30 Gb/s   72.33 Gb/s  
144.63 Gb/s




And still no packet loss (icmp side to side test every 100ms)

Below perf top


   PerfTop:  104692 irqs/sec  kernel:99.5%  exact:  0.0% [4000Hz 
cycles],  (all, 56 CPUs)

---

 9.06%  [kernel]   [k] mlx5e_skb_from_cqe_mpwrq_linear
 6.43%  [kernel]   [k] tasklet_action_common.isra.21
 5.68%  [kernel]   [k] fib_table_lookup
 4.89%  [kernel]   [k] irq_entries_start
 4.53%  [kernel]   [k] mlx5_eq_int
 4.10%  [kernel]   [k] build_skb
 3.39%  [kernel]   [k] mlx5e_poll_tx_cq
 3.38%  [kernel]   [k] mlx5e_sq_xmit
 2.73%  [kernel]   [k] mlx5e_poll_rx_cq
 2.18%  [kernel]   [k] __dev_queue_xmit
 2.13%  [kernel]   [k] vlan_do_receive
 2.12%  [kernel]   [k] mlx5e_handle_rx_cqe_mpwrq
 2.00%  [kernel]   [k] ip_finish_output2
 1.87%  [kernel]   [k] mlx5e_post_rx_mpwqes
 1.86%  [kernel]   [k] memcpy_erms
 1.85%  [kernel]   [k] ipt_do_table
 1.70%  [kernel]   [k] dev_gro_receive
 1.39%  [kernel]   [k] __netif_receive_skb_core
 1.31%  [kernel]   [k] inet_gro_receive
 1.21%  [kernel]   [k] ip_route_input_rcu
 1.21%  [kernel]   [k] tcp_gro_receive
 1.13%  [kernel]   [k] _raw_spin_lock
 1.08%  [kernel]   [k] __build_skb
 1.06%  [kernel]   [k] kmem_cache_free_bulk
 1.05%  [kernel]   [k] __softirqentry_text_start
 1.03%  [kernel]   [k] vlan_dev_hard_start_xmit
 0.98%  [kernel]   [k] pfifo_fast_dequeue
 0.95%  [kernel]   [k] mlx5e_xmit
 0.95%  [kernel]   [k] page_frag_free
 0.88%  [kernel]   [k] ip_forward
 0.81%  [kernel]   [k] dev_hard_start_xmit
 0.78%  [kernel]   [k] rcu_irq_exit
 0.77%  [kernel]   [k] netif_skb_features
 0.72%  [kernel]   [k] napi_complete_done
 0.72%  [kernel]   [k] kmem_cache_alloc
 0.68%  [kernel]   [k] validate_xmit_skb.isra.142
 0.66%  [kernel]   [k] ip_rcv_core.isra.20.constprop.25
 0.58%  [kernel]   [k] swiotlb_map_page
 0.57%  [kernel]   [k] __qdisc_run
 0.56%  [kernel]   [k] tasklet_action
 0.54%  [kernel]   [k] __get_xps_queue_idx
 0.54%  [kernel]   [k] inet_lookup_ifaddr_rcu
 0.50%  [kernel]   [k] tcp4_gro_receive
 0.49%  [kernel]   [k] skb_release_data
 0.47%  [kernel]   [k] eth_type_trans
 0.40%  [kernel]   [k] sch_direct_xmit
 0.40%  [kernel]   [k] net_rx_action
 0.39%  [kernel]   [k] __local_bh_enable_ip


And perf record/report

https://ufile.io/zguq0





So now i know what was causing cpu load for some processes like:

2913 root  20   0   0  0  0 I  10.3  0.0 6:58.29 
kworker/u112:1-
    7 root  20   0   0  0  0 I   8.6  0.0 6:17.18 
kworker/u112:0-
10289 root  20   0   0  0  0 I   6.6  0.0 6:33.90 
kworker/u112:4-
 2939 root  20   0   0  0  0 R   3.6  0.0 7:37.68 
kworker/u112:2-




After disabling adaptative tx for coalescense - all this 

Re: [PATCH net-next] net: phy: switch to lockdep_assert_held in phylib

2018-11-12 Thread Florian Fainelli
On 11/12/18 9:44 AM, Andrew Lunn wrote:
> On Sun, Nov 11, 2018 at 10:33:08PM +0100, Heiner Kallweit wrote:
>> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
>> index 2e59a8419..5cb06f021 100644
>> --- a/drivers/net/phy/mdio_bus.c
>> +++ b/drivers/net/phy/mdio_bus.c
>> @@ -541,7 +541,7 @@ int __mdiobus_read(struct mii_bus *bus, int addr, u32 
>> regnum)
>>  {
>>  int retval;
>>  
>> -WARN_ON_ONCE(!mutex_is_locked(>mdio_lock));
>> +lockdep_assert_held_once(>mdio_lock);
> 
> Hi Heiner
> 
> I don't think there is a clear right/wrong here. This is not hot path
> code. The cost for checking the lock is held is very small compared to
> the actual MDIO transaction. So i don't think we really need to
> optimise this. I do sometimes build with lockdep on, but not
> always. So it is good to know when locking is broken on normal builds.
> 
> Florian, what do you think?

lockdep_assert_held_once() also looks at debug_locks (global variable)
so it sounds like in that regard, it would be superior in that it allows
an user-configurable, general debugging facility to behave consistently
as opposed to always having opt-in debugging within the mdio_bus.c file,
but that also has a lot of value. I have to admit debugging MDIO bus
locking issues is not particularly fun, so I would probably stick with
the current code in that regard.
-- 
Florian


Re: [PATCH net-next] net: phy: icplus: add config_intr callback

2018-11-12 Thread Florian Fainelli
On 11/11/18 12:49 PM, Heiner Kallweit wrote:
> Move IRQ configuration for IP101A/G from config_init to config_intr
> callback. Reasons:
> 
> 1. This allows phylib to disable interrupts if needed.
> 2. Icplus was the only driver supporting interrupts w/o defining a
>config_intr callback. Now we can add a phylib plausibility check
>disabling interrupt mode if one of the two irq-related callbacks
>isn't defined.
> 
> I don't own hardware with this PHY, and the change is based on the
> datasheet for IP101A LF (which is supposed to be register-compatible
> with IP101A/G). Change is compile-tested only.
> 
> Signed-off-by: Heiner Kallweit 

Reviewed-by: Florian Fainelli 

A bit surprising that this is not a read/modify/write sequence.
-- 
Florian


[PATCH bpf-next] bpftool: make libbfd optional

2018-11-12 Thread Stanislav Fomichev
Make it possible to build bpftool without libbfd. This excludes support for
disassembling jit-ted code and prints an error if the user tries to use
these features.

Tested by:
cat > FEATURES_DUMP.bpftool <
---
 tools/bpf/bpftool/Makefile | 13 +++--
 tools/bpf/bpftool/jit_disasm.c |  7 ++-
 tools/bpf/bpftool/main.c   |  3 +--
 tools/bpf/bpftool/main.h   | 11 +++
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index dac7eff4c7e5..1bea6b979082 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -53,7 +53,7 @@ ifneq ($(EXTRA_LDFLAGS),)
 LDFLAGS += $(EXTRA_LDFLAGS)
 endif
 
-LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
+LIBS = -lelf $(LIBBPF)
 
 INSTALL ?= install
 RM ?= rm -f
@@ -90,7 +90,16 @@ include $(wildcard $(OUTPUT)*.d)
 
 all: $(OUTPUT)bpftool
 
-SRCS = $(wildcard *.c)
+BFD_SRCS = jit_disasm.c
+
+SRCS = $(filter-out $(BFD_SRCS),$(wildcard *.c))
+
+ifeq ($(feature-libbfd),1)
+CFLAGS += -DHAVE_LIBBFD_SUPPORT
+SRCS += $(BFD_SRCS)
+LIBS += -lbfd -lopcodes
+endif
+
 OBJS = $(patsubst %.c,$(OUTPUT)%.o,$(SRCS)) $(OUTPUT)disasm.o
 
 $(OUTPUT)disasm.o: $(srctree)/kernel/bpf/disasm.c
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
index c75ffd9ce2bb..6176bfe66f22 100644
--- a/tools/bpf/bpftool/jit_disasm.c
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -109,7 +109,7 @@ void disasm_print_insn(unsigned char *image, ssize_t len, 
int opcodes,
if (inf) {
bfdf->arch_info = inf;
} else {
-   p_err("No libfd support for %s", arch);
+   p_err("No libbfd support for %s", arch);
return;
}
}
@@ -183,3 +183,8 @@ void disasm_print_insn(unsigned char *image, ssize_t len, 
int opcodes,
 
bfd_close(bfdf);
 }
+
+void disasm_init(void)
+{
+   bfd_init();
+}
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 75a3296dc0bc..51aa71069b14 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -31,7 +31,6 @@
  * SOFTWARE.
  */
 
-#include 
 #include 
 #include 
 #include 
@@ -399,7 +398,7 @@ int main(int argc, char **argv)
if (argc < 0)
usage();
 
-   bfd_init();
+   disasm_init();
 
ret = cmd_select(cmds, argc, argv, do_help);
 
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 61d82020af58..ec1bc2ae3c71 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -147,8 +147,19 @@ int prog_parse_fd(int *argc, char ***argv);
 int map_parse_fd(int *argc, char ***argv);
 int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 
*info_len);
 
+#ifdef HAVE_LIBBFD_SUPPORT
 void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
   const char *arch, const char *disassembler_options);
+void disasm_init(void);
+#else
+static inline
+void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
+  const char *arch, const char *disassembler_options)
+{
+   p_err("No libbfd support");
+}
+static inline void disasm_init(void) {}
+#endif
 void print_data_json(uint8_t *data, size_t len);
 void print_hex_data_json(uint8_t *data, size_t len);
 
-- 
2.19.1.930.g4563a0d9d0-goog



Re: [PATCH bpf-next] filter: add BPF_ADJ_ROOM_DATA mode to bpf_skb_adjust_room()

2018-11-12 Thread Martin Lau
On Sun, Nov 11, 2018 at 12:43:27AM +0100, Nicolas Dichtel wrote:
> Le 09/11/2018 à 19:51, Martin Lau a écrit :
> > On Thu, Nov 08, 2018 at 04:11:37PM +0100, Nicolas Dichtel wrote:
> [snip]
> >> +static int bpf_skb_data_shrink(struct sk_buff *skb, u32 len)
> >> +{
> >> +  unsigned short hhlen = skb->dev->header_ops ?
> >> + skb->dev->hard_header_len : 0;
> >> +  int ret;
> >> +
> >> +  ret = skb_unclone(skb, GFP_ATOMIC);
> >> +  if (unlikely(ret < 0))
> >> +  return ret;
> >> +
> >> +  __skb_pull(skb, len);
> >> +  skb_reset_mac_header(skb);
> >> +  skb_reset_network_header(skb);
> >> +  skb->network_header += hhlen;
Nit. skb_set_network_header(skb, hhlen);

Othen than that

Acked-by: Martin KaFai Lau 

> >> +  skb_reset_transport_header(skb);
> > hmm...why transport_header does not need += hhlen here
> > while network_header does?
> 
> network_header is mandatory because bpf_redirect(BPF_F_INGRESS) can be called
> and network_header is expected to be correctly set in this case.
> For transport_header, I choose to not set it, because the stack will set it
> later (for example ip_rcv_core()).
ic. make sense.


[PATCHv2 net-net] net: dsa: mv88e6xxx: Work around mv886e6161 SERDES missing MII_PHYSID2

2018-11-12 Thread Andrew Lunn
We already have a workaround for a couple of switches whose internal
PHYs only have the Marvel OUI, but no model number. We detect such
PHYs and give them the 6390 ID as the model number. However the
mv88e6161 has two SERDES interfaces in the same address range as its
internal PHYs. These suffer from the same problem, the Marvell OUI,
but no model number. As a result, these SERDES interfaces were getting
the same PHY ID as the mv88e6390, even though they are not PHYs, and
the Marvell PHY driver was trying to drive them.

Add a special case to stop this from happen.

Reported-by: Chris Healy 
Signed-off-by: Andrew Lunn 
---
v2:
grammar fix in commit message.
PHYS->PHYs
---
 drivers/net/dsa/mv88e6xxx/chip.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index fc0f508879d4..b603f8d6ee3e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2524,11 +2524,22 @@ static int mv88e6xxx_mdio_read(struct mii_bus *bus, int 
phy, int reg)
mutex_unlock(>reg_lock);
 
if (reg == MII_PHYSID2) {
-   /* Some internal PHYS don't have a model number.  Use
-* the mv88e6390 family model number instead.
-*/
-   if (!(val & 0x3f0))
-   val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
+   /* Some internal PHYs don't have a model number. */
+   if (chip->info->family != MV88E6XXX_FAMILY_6165)
+   /* Then there is the 6165 family. It gets is
+* PHYs correct. But it can also have two
+* SERDES interfaces in the PHY address
+* space. And these don't have a model
+* number. But they are not PHYs, so we don't
+* want to give them something a PHY driver
+* will recognise.
+*
+* Use the mv88e6390 family model number
+* instead, for anything which really could be
+* a PHY,
+*/
+   if (!(val & 0x3f0))
+   val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
}
 
return err ? err : val;
-- 
2.19.1



[PATCHv2] MAINTAINERS: Replace Vince Bridgers as Altera TSE maintainer

2018-11-12 Thread thor . thayer
From: Thor Thayer 

Vince has moved to a different role. Replace him as Altera
TSE maintainer.

Signed-off-by: Thor Thayer 
Acked-by: Vince Bridgers 
Acked-by: Alan Tull 
---
v2  Include netdev and David Miller
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index e110e327bf38..827fd5fc6ebd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -717,7 +717,7 @@ F:  include/linux/mfd/altera-a10sr.h
 F: include/dt-bindings/reset/altr,rst-mgr-a10sr.h
 
 ALTERA TRIPLE SPEED ETHERNET DRIVER
-M: Vince Bridgers 
+M: Thor Thayer 
 L: netdev@vger.kernel.org
 L: nios2-...@lists.rocketboards.org (moderated for non-subscribers)
 S: Maintained
-- 
2.7.4



Re: [PATCH net-next] net: phy: switch to lockdep_assert_held in phylib

2018-11-12 Thread Andrew Lunn
On Sun, Nov 11, 2018 at 10:33:08PM +0100, Heiner Kallweit wrote:
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index 2e59a8419..5cb06f021 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -541,7 +541,7 @@ int __mdiobus_read(struct mii_bus *bus, int addr, u32 
> regnum)
>  {
>   int retval;
>  
> - WARN_ON_ONCE(!mutex_is_locked(>mdio_lock));
> + lockdep_assert_held_once(>mdio_lock);

Hi Heiner

I don't think there is a clear right/wrong here. This is not hot path
code. The cost for checking the lock is held is very small compared to
the actual MDIO transaction. So i don't think we really need to
optimise this. I do sometimes build with lockdep on, but not
always. So it is good to know when locking is broken on normal builds.

Florian, what do you think?

 Andrew


Re: [PATCH net-next] net: phy: icplus: add config_intr callback

2018-11-12 Thread Andrew Lunn
On Sun, Nov 11, 2018 at 09:49:12PM +0100, Heiner Kallweit wrote:
> Move IRQ configuration for IP101A/G from config_init to config_intr
> callback. Reasons:
> 
> 1. This allows phylib to disable interrupts if needed.
> 2. Icplus was the only driver supporting interrupts w/o defining a
>config_intr callback. Now we can add a phylib plausibility check
>disabling interrupt mode if one of the two irq-related callbacks
>isn't defined.
> 
> I don't own hardware with this PHY, and the change is based on the
> datasheet for IP101A LF (which is supposed to be register-compatible
> with IP101A/G). Change is compile-tested only.

Hi Heiner

This looks sensible. Thanks for fixing up this driver.

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net-next 17/17] net: sched: unlock rules update API

2018-11-12 Thread David Miller
From: Vlad Buslov 
Date: Mon, 12 Nov 2018 09:55:46 +0200

> Register netlink protocol handlers for message types RTM_NEWTFILTER,
> RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that
> tracks rtnl mutex state to be false by default.

This whole conditional locking mechanism is really not clean and makes
this code so much harder to understand and audit.

Please improve the code so that this kind of construct is not needed.

Thank you.


Re: [PATCH net-next 02/17] net: sched: protect block state with spinlock

2018-11-12 Thread David Miller
From: Vlad Buslov 
Date: Mon, 12 Nov 2018 09:55:31 +0200

> +#define ASSERT_BLOCK_LOCKED(block)   \
> + WARN_ONCE(!spin_is_locked(&(block)->lock),  \
> +   "BLOCK: assertion failed at %s (%d)\n", __FILE__,  __LINE__)

spin_is_locked() is not usable for assertions.

When SMP=n and DEBUG_SPINLOCK=n, it always returns zero.


Re: [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue

2018-11-12 Thread David Miller
From: Vlad Buslov 
Date: Mon, 12 Nov 2018 09:55:30 +0200

> +void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> +   struct tcf_proto *tp_head)
> +{
> + xchg(>tp_head, tp_head);

If you are not checking the return value of xchg(), then this is
simply a store with optionally a memory barrier of some sort
either before or after.


Re: [RFC PATCH 1/3] acpi: Add acpi mdio support code

2018-11-12 Thread Andrew Lunn
On Thu, Nov 08, 2018 at 03:22:16PM +0800, Wang Dongsheng wrote:
> Add support for parsing the ACPI data node for PHY devices on an MDIO bus.
> The current implementation depend on mdio bus scan.
> With _DSD device properties we can finally do this:
> 
> Device (MDIO) {
> Name (_DSD, Package () {
> ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> Package () { Package () { "ethernet-phy@0", PHY0 }, }
> })
> Name (PHY0, Package() {
> ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> Package () { Package () { "reg", 0x0 }, }
> })
> }
> 
> Device (MACO) {
> Name (_DSD, Package () {
> ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> Package () { Package () { "phy-handle", \_SB.MDIO, 
> "ethernet-phy@0" }, }
> })
> }
> 
> Documentations:
> The DT "phy-handle" binding that we reuse for ACPI is documented in
> Documentation/devicetree/bindings/phy/phy-bindings.txt
> 
> Documentation/acpi/dsd/data-node-references.txt
> Documentation/acpi/dsd/graph.txt
> 
> Signed-off-by: Wang Dongsheng 
> ---
>  drivers/acpi/Kconfig   |   6 ++
>  drivers/acpi/Makefile  |   1 +
>  drivers/acpi/acpi_mdio.c   | 167 +
>  drivers/net/phy/mdio_bus.c |   3 +
>  include/linux/acpi_mdio.h  |  82 ++
>  5 files changed, 259 insertions(+)
> 
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index 9705fc986da9..0fefa3410ce9 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -252,6 +252,12 @@ config ACPI_PROCESSOR_IDLE
>  config ACPI_MCFG
>   bool
>  
> +config ACPI_MDIO
> + def_tristate PHYLIB
> + depends on PHYLIB
> + help
> +   ACPI MDIO bus (Ethernet PHY) accessors
> +
>  config ACPI_CPPC_LIB
>   bool
>   depends on ACPI_PROCESSOR
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 6d59aa109a91..ec7461a064fc 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -41,6 +41,7 @@ acpi-y  += ec.o
>  acpi-$(CONFIG_ACPI_DOCK) += dock.o
>  acpi-y   += pci_root.o pci_link.o pci_irq.o
>  obj-$(CONFIG_ACPI_MCFG)  += pci_mcfg.o
> +acpi-$(CONFIG_ACPI_MDIO) += acpi_mdio.o
>  acpi-y   += acpi_lpss.o acpi_apd.o
>  acpi-y   += acpi_platform.o
>  acpi-y   += acpi_pnp.o
> diff --git a/drivers/acpi/acpi_mdio.c b/drivers/acpi/acpi_mdio.c
> new file mode 100644
> index ..293bf9a63197
> --- /dev/null

> +++ b/drivers/acpi/acpi_mdio.c
> @@ -0,0 +1,167 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +// Lots of code in this file is copy from drivers/of/of_mdio.c
> +// Copyright (c) 2018 Huaxintong Semiconductor Technology Co., Ltd.

I agree with Rafael here. We should try to refactor and combine this
code. And where possible, we should try to add a uniform fwnode_ API
which underneath either uses OF or ACPI, depending on the type of the
node. We already have fwnode_get_mac_address(), fwmode_irq_get(),
fwnode_get_phy_mode(), so where possible, PHYs should be no different.

  Andrew


Re: [RFC PATCH 0/3] acpi: Add acpi mdio support code

2018-11-12 Thread Andrew Lunn
> > I'm just trying to ensure whatever is defined is flexible enough that
> > we really can later support everything which DT does. We have PHYs on
> > MDIO busses, inside switches, which are on MDIO busses, which are
> > inside Ethernet interfaces, etc.
> 
> I think it can be satisfied. See the table I'm using above.

Hi Dongsheng


Since i don't know anything better, i think i have to trust you have
this correct.

It would be good to document this, so that the next person who needs
to add ACPI support for a PHY has some documentation to look at.
Could you add something to Documentation/acpi/dsd?

  Andrew


Re: [PATCHv2 net-next 0/3] sctp: add support for sk_reuseport

2018-11-12 Thread David Miller
From: Xin Long 
Date: Mon, 12 Nov 2018 18:27:14 +0800

> sctp sk_reuseport allows multiple socks to listen on the same port and
> addresses, as long as these socks have the same uid. This works pretty
> much as TCP/UDP does, the only difference is that sctp is multi-homing
> and all the bind_addrs in these socks will have to completely matched,
> otherwise listen() will return err.
> 
> The below is when 5 sockets are listening on 172.16.254.254:6400 on a
> server, 26 sockets on a client connect to 172.16.254.254:6400 and each
> may be processed by a different socket on the server which is selected
> by hash(lport, pport, paddr) in reuseport_select_sock():
> 
>  # ss --sctp -nn
 ...

Series applied, thanks.


[PATCH net] l2tp: fix a sock refcnt leak in l2tp_tunnel_register

2018-11-12 Thread Xin Long
This issue happens when trying to add an existent tunnel. It
doesn't call sock_put() before returning -EEXIST to release
the sock refcnt that was held by calling sock_hold() before
the existence check.

This patch is to fix it by holding the sock after doing the
existence check.

Fixes: f6cd651b056f ("l2tp: fix race in duplicate tunnel detection")
Reported-by: Jianlin Shi 
Signed-off-by: Xin Long 
---
 net/l2tp/l2tp_core.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 82cdf90..26f1d43 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1490,12 +1490,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, 
struct net *net,
goto err_sock;
}
 
-   sk = sock->sk;
-
-   sock_hold(sk);
-   tunnel->sock = sk;
tunnel->l2tp_net = net;
-
pn = l2tp_pernet(net);
 
spin_lock_bh(>l2tp_tunnel_list_lock);
@@ -1510,6 +1505,10 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, 
struct net *net,
list_add_rcu(>list, >l2tp_tunnel_list);
spin_unlock_bh(>l2tp_tunnel_list_lock);
 
+   sk = sock->sk;
+   sock_hold(sk);
+   tunnel->sock = sk;
+
if (tunnel->encap == L2TP_ENCAPTYPE_UDP) {
struct udp_tunnel_sock_cfg udp_cfg = {
.sk_user_data = tunnel,
-- 
2.1.0



Re: [RFC PATCH 2/6] phy: armada38x: add common phy support

2018-11-12 Thread Andrew Lunn
> +static int a38x_comphy_poll(struct a38x_comphy_lane *lane,
> + unsigned int offset, u32 mask, u32 value)
> +{
> + unsigned int timeout = 10;
> + u32 val;
> +
> + while (1) {
> + val = readl_relaxed(lane->base + offset);
> + if ((val & mask) == value)
> + return 0;
> + if (!timeout--)
> + break;
> + udelay(10);
> + }

Hi Russell

Maybe use one of the readx_poll_timeout() variants?

  Andrew


Re: [PATCH][net-next] net: remove BUG_ON from __pskb_pull_tail

2018-11-12 Thread David Miller
From: Li RongQing 
Date: Mon, 12 Nov 2018 17:26:13 +0800

>   do {
> - BUG_ON(!list);
>  

Please get rid of the empty line afterwards as well.

Thank you.


Re: [PATCH net v2 1/1] bnx2x: Assign unique DMAE channel number for FW DMAE transactions.

2018-11-12 Thread David Miller
From: Sudarsana Reddy Kalluru 
Date: Sun, 11 Nov 2018 18:27:34 -0800

> Driver assigns DMAE channel 0 for FW as part of START_RAMROD command. FW
> uses this channel for DMAE operations (e.g., TIME_SYNC implementation).
> Driver also uses the same channel 0 for DMAE operations for some of the PFs
> (e.g., PF0 on Port0). This could lead to concurrent access to the DMAE
> channel by FW and driver which is not legal. Hence need to assign unique
> DMAE id for FW.
> Currently following DMAE channels are used by the clients,
>   MFW - OCBB/OCSD functionality uses DMAE channel 14/15
>   Driver 0-3 and 8-11 (for PF dmae operations)
>  4 and 12 (for stats requests)
> Assigning unique dmae_id '13' to the FW.
> 
> Changes from previous version:
> --
> v2: Incorporated the review comments.
> 
> Signed-off-by: Sudarsana Reddy Kalluru 
> Signed-off-by: Michal Kalderon 

Applied.


Re: [PATCH net-next 3/3] tcp: get rid of tcp_tso_should_defer() dependency on HZ/jiffies

2018-11-12 Thread Yuchung Cheng
On Sun, Nov 11, 2018 at 11:06 AM, Neal Cardwell  wrote:
> On Sun, Nov 11, 2018 at 9:41 AM Eric Dumazet  wrote:
>>
>> tcp_tso_should_defer() first heuristic is to not defer
>> if last send is "old enough".
>>
>> Its current implementation uses jiffies and its low granularity.
>>
>> TSO autodefer performance should not rely on kernel HZ :/
>>
>> After EDT conversion, we have state variables in nanoseconds that
>> can allow us to properly implement the heuristic.
>>
>> This patch increases TSO chunk sizes on medium rate flows,
>> especially when receivers do not use GRO or similar aggregation.
>>
>> It also reduces bursts for HZ=100 or HZ=250 kernels, making TCP
>> behavior more uniform.
>>
>> Signed-off-by: Eric Dumazet 
>> Acked-by: Soheil Hassas Yeganeh 
>> ---
>
> Nice. Thanks!
>
> Acked-by: Neal Cardwell 
Acked-by: Yuchung Cheng 

Love it
>
> neal


Re: [PATCH iproute2] testsuite: ss: Fix spacing in expected output for ssfilter.t

2018-11-12 Thread Stephen Hemminger
On Sat, 10 Nov 2018 22:48:44 +0100
Phil Sutter  wrote:

> Hi Stefano,
> 
> On Sat, Nov 10, 2018 at 10:21:59AM +0100, Stefano Brivio wrote:
> > Since commit 00240899ec0b ("ss: Actually print left delimiter for
> > columns") changes spacing in ss output, we also need to adjust for that in
> > the ss filter test.
> > 
> > Fixes: 00240899ec0b ("ss: Actually print left delimiter for columns")
> > Signed-off-by: Stefano Brivio 

Applied, thanks.


[PATCH iproute2] testsuite: colorize test result output

2018-11-12 Thread Stephen Hemminger
When running testsuite it is easy as a human to miss failure.
Add symbol colorizing to SKIPED/PASS/FAIL output.

Signed-off-by: Stephen Hemminger 
---
 testsuite/Makefile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/testsuite/Makefile b/testsuite/Makefile
index b3aebec1517b..2dc7f166c709 100644
--- a/testsuite/Makefile
+++ b/testsuite/Makefile
@@ -85,11 +85,11 @@ endif
TC="$$i/tc/tc" IP="$$i/ip/ip" SS=$$i/misc/ss DEV="$(DEV)" 
IPVER="$@" SNAME="$$i" \
ERRF="$(RESULTS_DIR)/$@.$$o.err" $(PREFIX) tests/$@ > 
$(RESULTS_DIR)/$@.$$o.out; \
if [ "$$?" = "127" ]; then \
-   echo "SKIPPED"; \
+   echo -e "\e[1;35mSKIPPED\e[0m"; \
elif [ -e "$(RESULTS_DIR)/$@.$$o.err" ]; then \
-   echo "FAILED"; \
+   echo -e "\e[0;31mFAILED\e[0m"; \
else \
-   echo "PASS"; \
+   echo -e "\e[0;32mPASS\e[0m"; \
fi; \
rm "$$TMP_ERR" "$$TMP_OUT"; \
sudo dmesg > $(RESULTS_DIR)/$@.$$o.dmesg; \
-- 
2.19.1



[PATCH net-next 3/4] sctp: rename enum sctp_event to sctp_event_type

2018-11-12 Thread Xin Long
sctp_event is a structure name defined in RFC for sockopt
SCTP_EVENT. To avoid the conflict, rename it.

Signed-off-by: Xin Long 
---
 include/net/sctp/constants.h |  2 +-
 include/net/sctp/sm.h|  4 ++--
 net/sctp/primitive.c |  2 +-
 net/sctp/sm_sideeffect.c | 12 ++--
 net/sctp/sm_statetable.c |  2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 8dadc74..4588bdc 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -71,7 +71,7 @@ enum { SCTP_DEFAULT_INSTREAMS = SCTP_MAX_STREAM };
 SCTP_NUM_AUTH_CHUNK_TYPES)
 
 /* These are the different flavours of event.  */
-enum sctp_event {
+enum sctp_event_type {
SCTP_EVENT_T_CHUNK = 1,
SCTP_EVENT_T_TIMEOUT,
SCTP_EVENT_T_OTHER,
diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 9e3d327..24825a8 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -173,7 +173,7 @@ sctp_state_fn_t sctp_sf_autoclose_timer_expire;
 __u8 sctp_get_chunk_type(struct sctp_chunk *chunk);
 const struct sctp_sm_table_entry *sctp_sm_lookup_event(
struct net *net,
-   enum sctp_event event_type,
+   enum sctp_event_type event_type,
enum sctp_state state,
union sctp_subtype event_subtype);
 int sctp_chunk_iif(const struct sctp_chunk *);
@@ -313,7 +313,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 
 /* Prototypes for statetable processing. */
 
-int sctp_do_sm(struct net *net, enum sctp_event event_type,
+int sctp_do_sm(struct net *net, enum sctp_event_type event_type,
   union sctp_subtype subtype, enum sctp_state state,
   struct sctp_endpoint *ep, struct sctp_association *asoc,
   void *event_arg, gfp_t gfp);
diff --git a/net/sctp/primitive.c b/net/sctp/primitive.c
index c0817f7a..a8c4c33 100644
--- a/net/sctp/primitive.c
+++ b/net/sctp/primitive.c
@@ -53,7 +53,7 @@
 int sctp_primitive_ ## name(struct net *net, struct sctp_association *asoc, \
void *arg) { \
int error = 0; \
-   enum sctp_event event_type; union sctp_subtype subtype; \
+   enum sctp_event_type event_type; union sctp_subtype subtype; \
enum sctp_state state; \
struct sctp_endpoint *ep; \
\
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 85d3930..1d143bc 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -52,7 +52,7 @@
 #include 
 #include 
 
-static int sctp_cmd_interpreter(enum sctp_event event_type,
+static int sctp_cmd_interpreter(enum sctp_event_type event_type,
union sctp_subtype subtype,
enum sctp_state state,
struct sctp_endpoint *ep,
@@ -61,7 +61,7 @@ static int sctp_cmd_interpreter(enum sctp_event event_type,
enum sctp_disposition status,
struct sctp_cmd_seq *commands,
gfp_t gfp);
-static int sctp_side_effects(enum sctp_event event_type,
+static int sctp_side_effects(enum sctp_event_type event_type,
 union sctp_subtype subtype,
 enum sctp_state state,
 struct sctp_endpoint *ep,
@@ -623,7 +623,7 @@ static void sctp_cmd_init_failed(struct sctp_cmd_seq 
*commands,
 /* Worker routine to handle SCTP_CMD_ASSOC_FAILED.  */
 static void sctp_cmd_assoc_failed(struct sctp_cmd_seq *commands,
  struct sctp_association *asoc,
- enum sctp_event event_type,
+ enum sctp_event_type event_type,
  union sctp_subtype subtype,
  struct sctp_chunk *chunk,
  unsigned int error)
@@ -1162,7 +1162,7 @@ static void sctp_cmd_send_asconf(struct sctp_association 
*asoc)
  * If you want to understand all of lksctp, this is a
  * good place to start.
  */
-int sctp_do_sm(struct net *net, enum sctp_event event_type,
+int sctp_do_sm(struct net *net, enum sctp_event_type event_type,
   union sctp_subtype subtype, enum sctp_state state,
   struct sctp_endpoint *ep, struct sctp_association *asoc,
   void *event_arg, gfp_t gfp)
@@ -1199,7 +1199,7 @@ int sctp_do_sm(struct net *net, enum sctp_event 
event_type,
 /*
  * This the master state function side effect processing function.
  */
-static int sctp_side_effects(enum sctp_event event_type,
+static int 

[PATCH net-next 4/4] sctp: add sockopt SCTP_EVENT

2018-11-12 Thread Xin Long
This patch adds sockopt SCTP_EVENT described in rfc6525#section-6.2.
With this sockopt users can subscribe to an event from a specified
asoc.

Signed-off-by: Xin Long 
---
 include/uapi/linux/sctp.h |  7 
 net/sctp/socket.c | 89 +++
 2 files changed, 96 insertions(+)

diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index 66afa5b..d584073 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -129,6 +129,7 @@ typedef __s32 sctp_assoc_t;
 #define SCTP_STREAM_SCHEDULER_VALUE124
 #define SCTP_INTERLEAVING_SUPPORTED125
 #define SCTP_SENDMSG_CONNECT   126
+#define SCTP_EVENT 127
 
 /* PR-SCTP policies */
 #define SCTP_PR_SCTP_NONE  0x
@@ -1154,6 +1155,12 @@ struct sctp_add_streams {
uint16_t sas_outstrms;
 };
 
+struct sctp_event {
+   sctp_assoc_t se_assoc_id;
+   uint16_t se_type;
+   uint8_t se_on;
+};
+
 /* SCTP Stream schedulers */
 enum sctp_sched_type {
SCTP_SS_FCFS,
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 789008d..1451211 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4288,6 +4288,57 @@ static int sctp_setsockopt_reuse_port(struct sock *sk, 
char __user *optval,
return 0;
 }
 
+static int sctp_setsockopt_event(struct sock *sk, char __user *optval,
+unsigned int optlen)
+{
+   struct sctp_association *asoc;
+   struct sctp_ulpevent *event;
+   struct sctp_event param;
+   int retval = 0;
+
+   if (optlen < sizeof(param)) {
+   retval = -EINVAL;
+   goto out;
+   }
+
+   optlen = sizeof(param);
+   if (copy_from_user(, optval, optlen)) {
+   retval = -EFAULT;
+   goto out;
+   }
+
+   if (param.se_type < SCTP_SN_TYPE_BASE ||
+   param.se_type > SCTP_SN_TYPE_MAX) {
+   retval = -EINVAL;
+   goto out;
+   }
+
+   asoc = sctp_id2assoc(sk, param.se_assoc_id);
+   if (!asoc) {
+   sctp_ulpevent_type_set(_sk(sk)->subscribe,
+  param.se_type, param.se_on);
+   goto out;
+   }
+
+   sctp_ulpevent_type_set(>subscribe, param.se_type, param.se_on);
+
+   if (param.se_type == SCTP_SENDER_DRY_EVENT && param.se_on) {
+   if (sctp_outq_is_empty(>outqueue)) {
+   event = sctp_ulpevent_make_sender_dry_event(asoc,
+   GFP_USER | __GFP_NOWARN);
+   if (!event) {
+   retval = -ENOMEM;
+   goto out;
+   }
+
+   asoc->stream.si->enqueue_event(>ulpq, event);
+   }
+   }
+
+out:
+   return retval;
+}
+
 /* API 6.2 setsockopt(), getsockopt()
  *
  * Applications use setsockopt() and getsockopt() to set or retrieve
@@ -4485,6 +4536,9 @@ static int sctp_setsockopt(struct sock *sk, int level, 
int optname,
case SCTP_REUSE_PORT:
retval = sctp_setsockopt_reuse_port(sk, optval, optlen);
break;
+   case SCTP_EVENT:
+   retval = sctp_setsockopt_event(sk, optval, optlen);
+   break;
default:
retval = -ENOPROTOOPT;
break;
@@ -7430,6 +7484,38 @@ static int sctp_getsockopt_reuse_port(struct sock *sk, 
int len,
return 0;
 }
 
+static int sctp_getsockopt_event(struct sock *sk, int len, char __user *optval,
+int __user *optlen)
+{
+   struct sctp_association *asoc;
+   struct sctp_event param;
+   __u16 subscribe;
+
+   if (len < sizeof(param))
+   return -EINVAL;
+
+   len = sizeof(param);
+   if (copy_from_user(, optval, len))
+   return -EFAULT;
+
+   if (param.se_type < SCTP_SN_TYPE_BASE ||
+   param.se_type > SCTP_SN_TYPE_MAX)
+   return -EINVAL;
+
+   asoc = sctp_id2assoc(sk, param.se_assoc_id);
+   subscribe = asoc ? asoc->subscribe : sctp_sk(sk)->subscribe;
+   param.se_on = sctp_ulpevent_type_enabled(subscribe, param.se_type);
+
+   if (put_user(len, optlen))
+   return -EFAULT;
+
+   if (copy_to_user(optval, , len))
+   return -EFAULT;
+
+   return 0;
+}
+
+
 static int sctp_getsockopt(struct sock *sk, int level, int optname,
   char __user *optval, int __user *optlen)
 {
@@ -7628,6 +7714,9 @@ static int sctp_getsockopt(struct sock *sk, int level, 
int optname,
case SCTP_REUSE_PORT:
retval = sctp_getsockopt_reuse_port(sk, len, optval, optlen);
break;
+   case SCTP_EVENT:
+   retval = sctp_getsockopt_event(sk, len, optval, optlen);
+   break;
default:
retval = -ENOPROTOOPT;
break;
-- 
2.1.0



[PATCH net-next 0/4] sctp: add subscribe per asoc and sockopt SCTP_EVENT

2018-11-12 Thread Xin Long
This patchset mainly adds the Event Subscription sockopt described in
rfc6525#section-6.2:

Subscribing to events as described in [RFC6458] uses a setsockopt()
call with the SCTP_EVENT socket option.  This option takes the
following structure, which specifies the association, the event type
(using the same value found in the event type field), and an on/off
boolean.

  struct sctp_event {
sctp_assoc_t se_assoc_id;
uint16_t se_type;
uint8_t  se_on;
  };

The user fills in the se_type field with the same value found in the
strreset_type field, i.e., SCTP_STREAM_RESET_EVENT.  The user will
also fill in the se_assoc_id field with either the association to set
this event on (this field is ignored for one-to-one style sockets) or
one of the reserved constant values defined in [RFC6458].  Finally,
the se_on field is set with a 1 to enable the event or a 0 to disable
the event.

As for the old SCTP_EVENTS Option with struct sctp_event_subscribe,
it's being DEPRECATED.

Xin Long (4):
  sctp: define subscribe in sctp_sock as __u16
  sctp: add subscribe per asoc
  sctp: rename enum sctp_event to sctp_event_type
  sctp: add sockopt SCTP_EVENT

 include/net/sctp/constants.h |   2 +-
 include/net/sctp/sm.h|   4 +-
 include/net/sctp/structs.h   |   4 +-
 include/net/sctp/ulpevent.h  |  39 --
 include/uapi/linux/sctp.h|  13 -
 net/sctp/associola.c |   2 +
 net/sctp/chunk.c |   8 ++-
 net/sctp/primitive.c |   2 +-
 net/sctp/sm_sideeffect.c |  12 ++---
 net/sctp/sm_statetable.c |   2 +-
 net/sctp/socket.c| 126 ---
 net/sctp/stream_interleave.c |  12 +++--
 net/sctp/ulpqueue.c  |   8 +--
 13 files changed, 184 insertions(+), 50 deletions(-)

-- 
2.1.0



[PATCH net 1/1] s390/ism: clear dmbe_mask bit before SMC IRQ handling

2018-11-12 Thread Ursula Braun
SMC-D stress workload showed connection stalls. Since the firmware
decides to skip raising an interrupt if the SBA DMBE mask bit is
still set, this SBA DMBE mask bit should be cleared before the
IRQ handling in the SMC code runs. Otherwise there are small windows
possible with missing interrupts for incoming data.
SMC-D currently does not care about the old value of the SBA DMBE
mask.

Acked-by: Sebastian Ott 
Signed-off-by: Ursula Braun 
---
 drivers/s390/net/ism_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index f96ec68af2e5..dcbf5c857743 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -415,9 +415,9 @@ static irqreturn_t ism_handle_irq(int irq, void *data)
break;
 
clear_bit_inv(bit, bv);
+   ism->sba->dmbe_mask[bit + ISM_DMB_BIT_OFFSET] = 0;
barrier();
smcd_handle_irq(ism->smcd, bit + ISM_DMB_BIT_OFFSET);
-   ism->sba->dmbe_mask[bit + ISM_DMB_BIT_OFFSET] = 0;
}
 
if (ism->sba->e) {
-- 
2.16.4



[PATCH net 5/5] net/smc: use after free fix in smc_wr_tx_put_slot()

2018-11-12 Thread Ursula Braun
From: Ursula Braun 

In smc_wr_tx_put_slot() field pend->idx is used after being
cleared. That means always idx 0 is cleared in the wr_tx_mask.
This results in a broken administration of available WR send
payload buffers.

Signed-off-by: Ursula Braun 
---
 net/smc/smc_wr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 3c458d279855..c2694750a6a8 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -215,12 +215,14 @@ int smc_wr_tx_put_slot(struct smc_link *link,
 
pend = container_of(wr_pend_priv, struct smc_wr_tx_pend, priv);
if (pend->idx < link->wr_tx_cnt) {
+   u32 idx = pend->idx;
+
/* clear the full struct smc_wr_tx_pend including .priv */
memset(>wr_tx_pends[pend->idx], 0,
   sizeof(link->wr_tx_pends[pend->idx]));
memset(>wr_tx_bufs[pend->idx], 0,
   sizeof(link->wr_tx_bufs[pend->idx]));
-   test_and_clear_bit(pend->idx, link->wr_tx_mask);
+   test_and_clear_bit(idx, link->wr_tx_mask);
return 1;
}
 
-- 
2.16.4



[PATCH net 3/5] net/smc: add SMC-D shutdown signal

2018-11-12 Thread Ursula Braun
From: Hans Wippel 

When a SMC-D link group is freed, a shutdown signal should be sent to
the peer to indicate that the link group is invalid. This patch adds the
shutdown signal to the SMC code.

Signed-off-by: Hans Wippel 
Signed-off-by: Ursula Braun 
---
 net/smc/smc_core.c | 10 --
 net/smc/smc_core.h |  3 ++-
 net/smc/smc_ism.c  | 43 ---
 net/smc/smc_ism.h  |  1 +
 4 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 3c023de58afd..1c9fa7f0261a 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -184,6 +184,8 @@ static void smc_lgr_free_work(struct work_struct *work)
 
if (!lgr->is_smcd && lnk->state != SMC_LNK_INACTIVE)
smc_llc_link_inactive(lnk);
+   if (lgr->is_smcd)
+   smc_ism_signal_shutdown(lgr);
smc_lgr_free(lgr);
}
 }
@@ -485,7 +487,7 @@ void smc_port_terminate(struct smc_ib_device *smcibdev, u8 
ibport)
 }
 
 /* Called when SMC-D device is terminated or peer is lost */
-void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid)
+void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid, unsigned short 
vlan)
 {
struct smc_link_group *lgr, *l;
LIST_HEAD(lgr_free_list);
@@ -495,7 +497,7 @@ void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid)
list_for_each_entry_safe(lgr, l, _lgr_list.list, list) {
if (lgr->is_smcd && lgr->smcd == dev &&
(!peer_gid || lgr->peer_gid == peer_gid) &&
-   !list_empty(>list)) {
+   (vlan == VLAN_VID_MASK || lgr->vlan_id == vlan)) {
__smc_lgr_terminate(lgr);
list_move(>list, _free_list);
}
@@ -506,6 +508,8 @@ void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid)
list_for_each_entry_safe(lgr, l, _free_list, list) {
list_del_init(>list);
cancel_delayed_work_sync(>free_work);
+   if (!peer_gid && vlan == VLAN_VID_MASK) /* dev terminated? */
+   smc_ism_signal_shutdown(lgr);
smc_lgr_free(lgr);
}
 }
@@ -1026,6 +1030,8 @@ void smc_core_exit(void)
smc_llc_link_inactive(lnk);
}
cancel_delayed_work_sync(>free_work);
+   if (lgr->is_smcd)
+   smc_ism_signal_shutdown(lgr);
smc_lgr_free(lgr); /* free link group */
}
 }
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 5bc6cbaf0ed5..cf98f4d6093e 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -247,7 +247,8 @@ void smc_lgr_free(struct smc_link_group *lgr);
 void smc_lgr_forget(struct smc_link_group *lgr);
 void smc_lgr_terminate(struct smc_link_group *lgr);
 void smc_port_terminate(struct smc_ib_device *smcibdev, u8 ibport);
-void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid);
+void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid,
+   unsigned short vlan);
 int smc_buf_create(struct smc_sock *smc, bool is_smcd);
 int smc_uncompress_bufsize(u8 compressed);
 int smc_rmb_rtoken_handling(struct smc_connection *conn,
diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
index e36f21ce7252..2fff79db1a59 100644
--- a/net/smc/smc_ism.c
+++ b/net/smc/smc_ism.c
@@ -187,22 +187,28 @@ struct smc_ism_event_work {
 #define ISM_EVENT_REQUEST  0x0001
 #define ISM_EVENT_RESPONSE 0x0002
 #define ISM_EVENT_REQUEST_IR   0x0001
+#define ISM_EVENT_CODE_SHUTDOWN0x80
 #define ISM_EVENT_CODE_TESTLINK0x83
 
+union smcd_sw_event_info {
+   u64 info;
+   struct {
+   u8  uid[SMC_LGR_ID_SIZE];
+   unsigned short  vlan_id;
+   u16 code;
+   };
+};
+
 static void smcd_handle_sw_event(struct smc_ism_event_work *wrk)
 {
-   union {
-   u64 info;
-   struct {
-   u32 uid;
-   unsigned short  vlanid;
-   u16 code;
-   };
-   } ev_info;
+   union smcd_sw_event_info ev_info;
 
+   ev_info.info = wrk->event.info;
switch (wrk->event.code) {
+   case ISM_EVENT_CODE_SHUTDOWN:   /* Peer shut down DMBs */
+   smc_smcd_terminate(wrk->smcd, wrk->event.tok, ev_info.vlan_id);
+   break;
case ISM_EVENT_CODE_TESTLINK:   /* Activity timer */
-   ev_info.info = wrk->event.info;
if (ev_info.code == ISM_EVENT_REQUEST) {
ev_info.code = ISM_EVENT_RESPONSE;
wrk->smcd->ops->signal_event(wrk->smcd,
@@ -215,6 +221,21 @@ static void smcd_handle_sw_event(struct smc_ism_event_work 
*wrk)
}
 }
 
+int smc_ism_signal_shutdown(struct smc_link_group *lgr)
+{
+ 

[PATCH net 2/5] net/smc: use queue pair number when matching link group

2018-11-12 Thread Ursula Braun
From: Karsten Graul 

When searching for an existing link group the queue pair number is also
to be taken into consideration. When the SMC server sends a new number
in a CLC packet (keeping all other values equal) then a new link group
is to be created on the SMC client side.

Signed-off-by: Karsten Graul 
Signed-off-by: Ursula Braun 
---
 net/smc/af_smc.c   |  9 +
 net/smc/smc_core.c | 10 ++
 net/smc/smc_core.h |  2 +-
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 84f67f601838..5fbaf1901571 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -549,7 +549,8 @@ static int smc_connect_rdma(struct smc_sock *smc,
 
mutex_lock(_create_lgr_pending);
local_contact = smc_conn_create(smc, false, aclc->hdr.flag, ibdev,
-   ibport, >lcl, NULL, 0);
+   ibport, ntoh24(aclc->qpn), >lcl,
+   NULL, 0);
if (local_contact < 0) {
if (local_contact == -ENOMEM)
reason_code = SMC_CLC_DECL_MEM;/* insufficient memory*/
@@ -620,7 +621,7 @@ static int smc_connect_ism(struct smc_sock *smc,
int rc = 0;
 
mutex_lock(_create_lgr_pending);
-   local_contact = smc_conn_create(smc, true, aclc->hdr.flag, NULL, 0,
+   local_contact = smc_conn_create(smc, true, aclc->hdr.flag, NULL, 0, 0,
NULL, ismdev, aclc->gid);
if (local_contact < 0)
return smc_connect_abort(smc, SMC_CLC_DECL_MEM, 0);
@@ -1085,7 +1086,7 @@ static int smc_listen_rdma_init(struct smc_sock *new_smc,
int *local_contact)
 {
/* allocate connection / link group */
-   *local_contact = smc_conn_create(new_smc, false, 0, ibdev, ibport,
+   *local_contact = smc_conn_create(new_smc, false, 0, ibdev, ibport, 0,
 >lcl, NULL, 0);
if (*local_contact < 0) {
if (*local_contact == -ENOMEM)
@@ -1109,7 +1110,7 @@ static int smc_listen_ism_init(struct smc_sock *new_smc,
struct smc_clc_msg_smcd *pclc_smcd;
 
pclc_smcd = smc_get_clc_msg_smcd(pclc);
-   *local_contact = smc_conn_create(new_smc, true, 0, NULL, 0, NULL,
+   *local_contact = smc_conn_create(new_smc, true, 0, NULL, 0, 0, NULL,
 ismdev, pclc_smcd->gid);
if (*local_contact < 0) {
if (*local_contact == -ENOMEM)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 18daebcef181..3c023de58afd 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -559,7 +559,7 @@ int smc_vlan_by_tcpsk(struct socket *clcsock, unsigned 
short *vlan_id)
 
 static bool smcr_lgr_match(struct smc_link_group *lgr,
   struct smc_clc_msg_local *lcl,
-  enum smc_lgr_role role)
+  enum smc_lgr_role role, u32 clcqpn)
 {
return !memcmp(lgr->peer_systemid, lcl->id_for_peer,
   SMC_SYSTEMID_LEN) &&
@@ -567,7 +567,9 @@ static bool smcr_lgr_match(struct smc_link_group *lgr,
SMC_GID_SIZE) &&
!memcmp(lgr->lnk[SMC_SINGLE_LINK].peer_mac, lcl->mac,
sizeof(lcl->mac)) &&
-   lgr->role == role;
+   lgr->role == role &&
+   (lgr->role == SMC_SERV ||
+lgr->lnk[SMC_SINGLE_LINK].peer_qpn == clcqpn);
 }
 
 static bool smcd_lgr_match(struct smc_link_group *lgr,
@@ -578,7 +580,7 @@ static bool smcd_lgr_match(struct smc_link_group *lgr,
 
 /* create a new SMC connection (and a new link group if necessary) */
 int smc_conn_create(struct smc_sock *smc, bool is_smcd, int srv_first_contact,
-   struct smc_ib_device *smcibdev, u8 ibport,
+   struct smc_ib_device *smcibdev, u8 ibport, u32 clcqpn,
struct smc_clc_msg_local *lcl, struct smcd_dev *smcd,
u64 peer_gid)
 {
@@ -603,7 +605,7 @@ int smc_conn_create(struct smc_sock *smc, bool is_smcd, int 
srv_first_contact,
list_for_each_entry(lgr, _lgr_list.list, list) {
write_lock_bh(>conns_lock);
if ((is_smcd ? smcd_lgr_match(lgr, smcd, peer_gid) :
-smcr_lgr_match(lgr, lcl, role)) &&
+smcr_lgr_match(lgr, lcl, role, clcqpn)) &&
!lgr->sync_err &&
lgr->vlan_id == vlan_id &&
(role == SMC_CLNT ||
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index c156674733c9..5bc6cbaf0ed5 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -262,7 +262,7 @@ int smc_vlan_by_tcpsk(struct socket *clcsock, unsigned 
short *vlan_id);
 
 void smc_conn_free(struct smc_connection *conn);
 int smc_conn_create(struct smc_sock *smc, bool is_smcd, int srv_first_contact,
-   struct 

[PATCH net 0/5] net/smc: fixes 2018-11-12

2018-11-12 Thread Ursula Braun
From: Ursula Braun 

here are some fixes in different areas of the smc code for the net
tree.

Thanks, Ursula

Hans Wippel (2):
  net/smc: abort CLC connection in smc_release
  net/smc: add SMC-D shutdown signal

Karsten Graul (1):
  net/smc: use queue pair number when matching link group

Ursula Braun (2):
  net/smc: atomic SMCD cursor handling
  net/smc: use after free fix in smc_wr_tx_put_slot()

 net/smc/af_smc.c   | 11 +++
 net/smc/smc_cdc.c  | 24 +--
 net/smc/smc_cdc.h  | 56 +-
 net/smc/smc_core.c | 20 +--
 net/smc/smc_core.h |  5 +++--
 net/smc/smc_ism.c  | 43 ++---
 net/smc/smc_ism.h  |  1 +
 net/smc/smc_wr.c   |  4 +++-
 8 files changed, 117 insertions(+), 47 deletions(-)

-- 
2.16.4



[PATCH net 1/5] net/smc: abort CLC connection in smc_release

2018-11-12 Thread Ursula Braun
From: Hans Wippel 

In case of a non-blocking SMC socket, the initial CLC handshake is
performed over a blocking TCP connection in a worker. If the SMC socket
is released, smc_release has to wait for the blocking CLC socket
operations (e.g., kernel_connect) inside the worker.

This patch aborts a CLC connection when the respective non-blocking SMC
socket is released to avoid waiting on socket operations or timeouts.

Signed-off-by: Hans Wippel 
Signed-off-by: Ursula Braun 
---
 net/smc/af_smc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 80e2119f1c70..84f67f601838 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -127,6 +127,8 @@ static int smc_release(struct socket *sock)
smc = smc_sk(sk);
 
/* cleanup for a dangling non-blocking connect */
+   if (smc->connect_info && sk->sk_state == SMC_INIT)
+   tcp_abort(smc->clcsock->sk, ECONNABORTED);
flush_work(>connect_work);
kfree(smc->connect_info);
smc->connect_info = NULL;
-- 
2.16.4



[PATCH net 4/5] net/smc: atomic SMCD cursor handling

2018-11-12 Thread Ursula Braun
Running uperf tests with SMCD on LPARs results in corrupted cursors.
SMCD cursors should be treated atomically to fix cursor corruption.

Signed-off-by: Ursula Braun 
---
 net/smc/smc_cdc.c | 24 ++--
 net/smc/smc_cdc.h | 56 ++-
 2 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index ed5dcf03fe0b..18c047c155fd 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -177,23 +177,24 @@ void smc_cdc_tx_dismiss_slots(struct smc_connection *conn)
 int smcd_cdc_msg_send(struct smc_connection *conn)
 {
struct smc_sock *smc = container_of(conn, struct smc_sock, conn);
+   union smc_host_cursor curs;
struct smcd_cdc_msg cdc;
int rc, diff;
 
memset(, 0, sizeof(cdc));
cdc.common.type = SMC_CDC_MSG_TYPE;
-   cdc.prod_wrap = conn->local_tx_ctrl.prod.wrap;
-   cdc.prod_count = conn->local_tx_ctrl.prod.count;
-
-   cdc.cons_wrap = conn->local_tx_ctrl.cons.wrap;
-   cdc.cons_count = conn->local_tx_ctrl.cons.count;
-   cdc.prod_flags = conn->local_tx_ctrl.prod_flags;
-   cdc.conn_state_flags = conn->local_tx_ctrl.conn_state_flags;
+   curs.acurs.counter = atomic64_read(>local_tx_ctrl.prod.acurs);
+   cdc.prod.wrap = curs.wrap;
+   cdc.prod.count = curs.count;
+   curs.acurs.counter = atomic64_read(>local_tx_ctrl.cons.acurs);
+   cdc.cons.wrap = curs.wrap;
+   cdc.cons.count = curs.count;
+   cdc.cons.prod_flags = conn->local_tx_ctrl.prod_flags;
+   cdc.cons.conn_state_flags = conn->local_tx_ctrl.conn_state_flags;
rc = smcd_tx_ism_write(conn, , sizeof(cdc), 0, 1);
if (rc)
return rc;
-   smc_curs_copy(>rx_curs_confirmed, >local_tx_ctrl.cons,
- conn);
+   smc_curs_copy(>rx_curs_confirmed, , conn);
/* Calculate transmitted data and increment free send buffer space */
diff = smc_curs_diff(conn->sndbuf_desc->len, >tx_curs_fin,
 >tx_curs_sent);
@@ -331,13 +332,16 @@ static void smc_cdc_msg_recv(struct smc_sock *smc, struct 
smc_cdc_msg *cdc)
 static void smcd_cdc_rx_tsklet(unsigned long data)
 {
struct smc_connection *conn = (struct smc_connection *)data;
+   struct smcd_cdc_msg *data_cdc;
struct smcd_cdc_msg cdc;
struct smc_sock *smc;
 
if (!conn)
return;
 
-   memcpy(, conn->rmb_desc->cpu_addr, sizeof(cdc));
+   data_cdc = (struct smcd_cdc_msg *)conn->rmb_desc->cpu_addr;
+   smcd_curs_copy(, _cdc->prod, conn);
+   smcd_curs_copy(, _cdc->cons, conn);
smc = container_of(conn, struct smc_sock, conn);
smc_cdc_msg_recv(smc, (struct smc_cdc_msg *));
 }
diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h
index 934df4473a7c..b9772ab77ddf 100644
--- a/net/smc/smc_cdc.h
+++ b/net/smc/smc_cdc.h
@@ -50,17 +50,27 @@ struct smc_cdc_msg {
u8  reserved[18];
 } __packed;/* format defined in RFC7609 */
 
+/* SMC-D cursor format */
+union smcd_cdc_cursor {
+   struct {
+   u16 wrap;
+   u32 count;
+   struct smc_cdc_producer_flags   prod_flags;
+   struct smc_cdc_conn_state_flags conn_state_flags;
+   } __packed;
+#ifdef KERNEL_HAS_ATOMIC64
+   atomic64_t  acurs;  /* for atomic processing */
+#else
+   u64 acurs;  /* for atomic processing */
+#endif
+} __aligned(8);
+
 /* CDC message for SMC-D */
 struct smcd_cdc_msg {
struct smc_wr_rx_hdr common;/* Type = 0xFE */
u8 res1[7];
-   u16 prod_wrap;
-   u32 prod_count;
-   u8 res2[2];
-   u16 cons_wrap;
-   u32 cons_count;
-   struct smc_cdc_producer_flags   prod_flags;
-   struct smc_cdc_conn_state_flags conn_state_flags;
+   union smcd_cdc_cursor   prod;
+   union smcd_cdc_cursor   cons;
u8 res3[8];
 } __packed;
 
@@ -135,6 +145,21 @@ static inline void smc_curs_copy_net(union smc_cdc_cursor 
*tgt,
 #endif
 }
 
+static inline void smcd_curs_copy(union smcd_cdc_cursor *tgt,
+ union smcd_cdc_cursor *src,
+ struct smc_connection *conn)
+{
+#ifndef KERNEL_HAS_ATOMIC64
+   unsigned long flags;
+
+   spin_lock_irqsave(>acurs_lock, flags);
+   tgt->acurs = src->acurs;
+   spin_unlock_irqrestore(>acurs_lock, flags);
+#else
+   atomic64_set(>acurs, atomic64_read(>acurs));
+#endif
+}
+
 /* calculate cursor difference between old and new, where old <= new */
 static inline int smc_curs_diff(unsigned int size,
union smc_host_cursor *old,
@@ -222,12 +247,17 @@ static inline void smcr_cdc_msg_to_host(struct 
smc_host_cdc_msg *local,
 static inline void smcd_cdc_msg_to_host(struct smc_host_cdc_msg *local,
  

[PATCH net-next v2 1/6] net: aquantia: add rx-flow filter definitions

2018-11-12 Thread Igor Russkikh
From: Dmitry Bogdanov 

Add missing register definitions and the functions accessing them
related to rx-flow filters.

Signed-off-by: Dmitry Bogdanov 
Signed-off-by: Igor Russkikh 
---
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c | 109 +
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h |  48 
 .../aquantia/atlantic/hw_atl/hw_atl_llh_internal.h | 135 ++---
 3 files changed, 275 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c 
b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c
index 5502ec5f0f69..939f77e2e117 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c
@@ -898,6 +898,24 @@ void hw_atl_rpf_vlan_id_flr_set(struct aq_hw_s *aq_hw, u32 
vlan_id_flr,
vlan_id_flr);
 }
 
+void hw_atl_rpf_vlan_rxq_en_flr_set(struct aq_hw_s *aq_hw, u32 vlan_rxq_en,
+   u32 filter)
+{
+   aq_hw_write_reg_bit(aq_hw, HW_ATL_RPF_VL_RXQ_EN_F_ADR(filter),
+   HW_ATL_RPF_VL_RXQ_EN_F_MSK,
+   HW_ATL_RPF_VL_RXQ_EN_F_SHIFT,
+   vlan_rxq_en);
+}
+
+void hw_atl_rpf_vlan_rxq_flr_set(struct aq_hw_s *aq_hw, u32 vlan_rxq,
+u32 filter)
+{
+   aq_hw_write_reg_bit(aq_hw, HW_ATL_RPF_VL_RXQ_F_ADR(filter),
+   HW_ATL_RPF_VL_RXQ_F_MSK,
+   HW_ATL_RPF_VL_RXQ_F_SHIFT,
+   vlan_rxq);
+};
+
 void hw_atl_rpf_etht_flr_en_set(struct aq_hw_s *aq_hw, u32 etht_flr_en,
u32 filter)
 {
@@ -965,6 +983,20 @@ void hw_atl_rpf_etht_flr_set(struct aq_hw_s *aq_hw, u32 
etht_flr, u32 filter)
HW_ATL_RPF_ET_VALF_SHIFT, etht_flr);
 }
 
+void hw_atl_rpf_l4_spd_set(struct aq_hw_s *aq_hw, u32 val, u32 filter)
+{
+   aq_hw_write_reg_bit(aq_hw, HW_ATL_RPF_L4_SPD_ADR(filter),
+   HW_ATL_RPF_L4_SPD_MSK,
+   HW_ATL_RPF_L4_SPD_SHIFT, val);
+}
+
+void hw_atl_rpf_l4_dpd_set(struct aq_hw_s *aq_hw, u32 val, u32 filter)
+{
+   aq_hw_write_reg_bit(aq_hw, HW_ATL_RPF_L4_DPD_ADR(filter),
+   HW_ATL_RPF_L4_DPD_MSK,
+   HW_ATL_RPF_L4_DPD_SHIFT, val);
+}
+
 /* RPO: rx packet offload */
 void hw_atl_rpo_ipv4header_crc_offload_en_set(struct aq_hw_s *aq_hw,
  u32 ipv4header_crc_offload_en)
@@ -1476,3 +1508,80 @@ void hw_atl_mcp_up_force_intr_set(struct aq_hw_s *aq_hw, 
u32 up_force_intr)
HW_ATL_MCP_UP_FORCE_INTERRUPT_SHIFT,
up_force_intr);
 }
+
+void hw_atl_rpfl3l4_ipv4_dest_addr_clear(struct aq_hw_s *aq_hw, u8 location)
+{
+   aq_hw_write_reg(aq_hw, HW_ATL_RPF_L3_DSTA_ADR(location), 0U);
+}
+
+void hw_atl_rpfl3l4_ipv4_src_addr_clear(struct aq_hw_s *aq_hw, u8 location)
+{
+   aq_hw_write_reg(aq_hw, HW_ATL_RPF_L3_SRCA_ADR(location), 0U);
+}
+
+void hw_atl_rpfl3l4_cmd_clear(struct aq_hw_s *aq_hw, u8 location)
+{
+   aq_hw_write_reg(aq_hw, HW_ATL_RPF_L3_REG_CTRL_ADR(location), 0U);
+}
+
+void hw_atl_rpfl3l4_ipv6_dest_addr_clear(struct aq_hw_s *aq_hw, u8 location)
+{
+   int i;
+
+   for (i = 0; i < 4; ++i)
+   aq_hw_write_reg(aq_hw,
+   HW_ATL_RPF_L3_DSTA_ADR(location + i),
+   0U);
+}
+
+void hw_atl_rpfl3l4_ipv6_src_addr_clear(struct aq_hw_s *aq_hw, u8 location)
+{
+   int i;
+
+   for (i = 0; i < 4; ++i)
+   aq_hw_write_reg(aq_hw,
+   HW_ATL_RPF_L3_SRCA_ADR(location + i),
+   0U);
+}
+
+void hw_atl_rpfl3l4_ipv4_dest_addr_set(struct aq_hw_s *aq_hw, u8 location,
+  u32 ipv4_dest)
+{
+   aq_hw_write_reg(aq_hw, HW_ATL_RPF_L3_DSTA_ADR(location),
+   ipv4_dest);
+}
+
+void hw_atl_rpfl3l4_ipv4_src_addr_set(struct aq_hw_s *aq_hw, u8 location,
+ u32 ipv4_src)
+{
+   aq_hw_write_reg(aq_hw,
+   HW_ATL_RPF_L3_SRCA_ADR(location),
+   ipv4_src);
+}
+
+void hw_atl_rpfl3l4_cmd_set(struct aq_hw_s *aq_hw, u8 location, u32 cmd)
+{
+   aq_hw_write_reg(aq_hw, HW_ATL_RPF_L3_REG_CTRL_ADR(location), cmd);
+}
+
+void hw_atl_rpfl3l4_ipv6_src_addr_set(struct aq_hw_s *aq_hw, u8 location,
+ u32 *ipv6_src)
+{
+   int i;
+
+   for (i = 0; i < 4; ++i)
+   aq_hw_write_reg(aq_hw,
+   HW_ATL_RPF_L3_SRCA_ADR(location + i),
+   ipv6_src[i]);
+}
+
+void hw_atl_rpfl3l4_ipv6_dest_addr_set(struct aq_hw_s *aq_hw, u8 location,
+  u32 *ipv6_dest)
+{
+   int i;
+
+   for (i = 0; i < 4; ++i)

[PATCH net-next v2 2/6] net: aquantia: add infrastructure for ntuple rules

2018-11-12 Thread Igor Russkikh
From: Dmitry Bogdanov 

Add infrastructure to support ntuple filter configuration.
Add rule, remove rule, reapply on interface up.

Signed-off-by: Dmitry Bogdanov 
Signed-off-by: Igor Russkikh 
---
 drivers/net/ethernet/aquantia/atlantic/Makefile|   1 +
 .../net/ethernet/aquantia/atlantic/aq_ethtool.c|  31 ++
 .../net/ethernet/aquantia/atlantic/aq_filters.c| 413 +
 .../net/ethernet/aquantia/atlantic/aq_filters.h|  31 ++
 drivers/net/ethernet/aquantia/atlantic/aq_main.c   |  15 +
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h|   6 +
 .../net/ethernet/aquantia/atlantic/aq_pci_func.c   |   2 +
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c  |   3 +-
 8 files changed, 501 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/aquantia/atlantic/aq_filters.c
 create mode 100644 drivers/net/ethernet/aquantia/atlantic/aq_filters.h

diff --git a/drivers/net/ethernet/aquantia/atlantic/Makefile 
b/drivers/net/ethernet/aquantia/atlantic/Makefile
index 686f6d8c9e79..4556630ee286 100644
--- a/drivers/net/ethernet/aquantia/atlantic/Makefile
+++ b/drivers/net/ethernet/aquantia/atlantic/Makefile
@@ -36,6 +36,7 @@ atlantic-objs := aq_main.o \
aq_ring.o \
aq_hw_utils.o \
aq_ethtool.o \
+   aq_filters.o \
hw_atl/hw_atl_a0.o \
hw_atl/hw_atl_b0.o \
hw_atl/hw_atl_utils.o \
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
index 99ef1daaa4d8..a5fd71692c8b 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
@@ -12,6 +12,7 @@
 #include "aq_ethtool.h"
 #include "aq_nic.h"
 #include "aq_vec.h"
+#include "aq_filters.h"
 
 static void aq_ethtool_get_regs(struct net_device *ndev,
struct ethtool_regs *regs, void *p)
@@ -213,7 +214,36 @@ static int aq_ethtool_get_rxnfc(struct net_device *ndev,
case ETHTOOL_GRXRINGS:
cmd->data = cfg->vecs;
break;
+   case ETHTOOL_GRXCLSRLCNT:
+   cmd->rule_cnt = aq_get_rxnfc_count_all_rules(aq_nic);
+   break;
+   case ETHTOOL_GRXCLSRULE:
+   err = aq_get_rxnfc_rule(aq_nic, cmd);
+   break;
+   case ETHTOOL_GRXCLSRLALL:
+   err = aq_get_rxnfc_all_rules(aq_nic, cmd, rule_locs);
+   break;
+   default:
+   err = -EOPNOTSUPP;
+   break;
+   }
 
+   return err;
+}
+
+static int aq_ethtool_set_rxnfc(struct net_device *ndev,
+   struct ethtool_rxnfc *cmd)
+{
+   int err = 0;
+   struct aq_nic_s *aq_nic = netdev_priv(ndev);
+
+   switch (cmd->cmd) {
+   case ETHTOOL_SRXCLSRLINS:
+   err = aq_add_rxnfc_rule(aq_nic, cmd);
+   break;
+   case ETHTOOL_SRXCLSRLDEL:
+   err = aq_del_rxnfc_rule(aq_nic, cmd);
+   break;
default:
err = -EOPNOTSUPP;
break;
@@ -520,6 +550,7 @@ const struct ethtool_ops aq_ethtool_ops = {
.get_rxfh_key_size   = aq_ethtool_get_rss_key_size,
.get_rxfh= aq_ethtool_get_rss,
.get_rxnfc   = aq_ethtool_get_rxnfc,
+   .set_rxnfc   = aq_ethtool_set_rxnfc,
.get_sset_count  = aq_ethtool_get_sset_count,
.get_ethtool_stats   = aq_ethtool_stats,
.get_link_ksettings  = aq_ethtool_get_link_ksettings,
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
new file mode 100644
index ..8cb4eaefd6e1
--- /dev/null
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
@@ -0,0 +1,413 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Copyright (C) 2014-2017 aQuantia Corporation. */
+
+/* File aq_filters.c: RX filters related functions. */
+
+#include "aq_filters.h"
+
+static bool __must_check
+aq_rule_is_approve(struct ethtool_rx_flow_spec *fsp)
+{
+   if (fsp->flow_type & FLOW_MAC_EXT)
+   return false;
+
+   switch (fsp->flow_type & ~FLOW_EXT) {
+   case ETHER_FLOW:
+   case TCP_V4_FLOW:
+   case UDP_V4_FLOW:
+   case SCTP_V4_FLOW:
+   case TCP_V6_FLOW:
+   case UDP_V6_FLOW:
+   case SCTP_V6_FLOW:
+   case IPV4_FLOW:
+   case IPV6_FLOW:
+   return true;
+   case IP_USER_FLOW:
+   switch (fsp->h_u.usr_ip4_spec.proto) {
+   case IPPROTO_TCP:
+   case IPPROTO_UDP:
+   case IPPROTO_SCTP:
+   case IPPROTO_IP:
+   return true;
+   default:
+   return false;
+   }
+   case IPV6_USER_FLOW:
+   switch (fsp->h_u.usr_ip6_spec.l4_proto) {
+   case IPPROTO_TCP:
+   case IPPROTO_UDP:
+   case IPPROTO_SCTP:
+   case IPPROTO_IP:
+ 

[PATCH net-next v2 4/6] net: aquantia: add vlan id to rx flow filters

2018-11-12 Thread Igor Russkikh
From: Dmitry Bogdanov 

The VLAN filter (VLAN id) is compared against 16 filters.
VLAN id must be accompanied by mask 0xF000. That is to distinguish
VLAN filter from L2 Ethertype filter with UserPriority since both
User Priority and VLAN ID are passed in the same 'vlan' parameter.
Flow type may be any as  it is not matched for VLAN filter.
Due to fixed order of the rules in the NIC, the location 0-15 are
reserved for vlan filters.

Example:
To add a rule that directs packets from VLAN 2001 to queue 5:
ethtool -N  flow-type ip4 vlan 2001 m 0xF000 action 5 loc 0

Signed-off-by: Dmitry Bogdanov 
Signed-off-by: Igor Russkikh 
---
 drivers/net/ethernet/aquantia/atlantic/aq_common.h |  2 +-
 .../net/ethernet/aquantia/atlantic/aq_filters.c| 92 +-
 .../net/ethernet/aquantia/atlantic/aq_filters.h|  2 +
 drivers/net/ethernet/aquantia/atlantic/aq_hw.h |  9 +++
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h|  5 ++
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c  | 37 +
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h|  7 ++
 7 files changed, 151 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_common.h 
b/drivers/net/ethernet/aquantia/atlantic/aq_common.h
index becb578211ed..6b6d1724676e 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_common.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_common.h
@@ -14,7 +14,7 @@
 
 #include 
 #include 
-
+#include 
 #include "ver.h"
 #include "aq_cfg.h"
 #include "aq_utils.h"
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
index 34e2a28d344a..c5240bc487b4 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
@@ -120,6 +120,29 @@ static int aq_check_approve_fl3l4(struct aq_nic_s *aq_nic,
 }
 
 static int __must_check
+aq_check_approve_fvlan(struct aq_nic_s *aq_nic,
+  struct aq_hw_rx_fltrs_s *rx_fltrs,
+  struct ethtool_rx_flow_spec *fsp)
+{
+   if (fsp->location < AQ_RX_FIRST_LOC_FVLANID ||
+   fsp->location > AQ_RX_LAST_LOC_FVLANID) {
+   netdev_err(aq_nic->ndev,
+  "ethtool: location must be in range [%d, %d]",
+  AQ_RX_FIRST_LOC_FVLANID,
+  AQ_RX_LAST_LOC_FVLANID);
+   return -EINVAL;
+   }
+
+   if (fsp->ring_cookie > aq_nic->aq_nic_cfg.num_rss_queues) {
+   netdev_err(aq_nic->ndev,
+  "ethtool: queue number must be in range [0, %d]",
+  aq_nic->aq_nic_cfg.num_rss_queues - 1);
+   return -EINVAL;
+   }
+   return 0;
+}
+
+static int __must_check
 aq_check_filter(struct aq_nic_s *aq_nic,
struct ethtool_rx_flow_spec *fsp)
 {
@@ -127,7 +150,14 @@ aq_check_filter(struct aq_nic_s *aq_nic,
struct aq_hw_rx_fltrs_s *rx_fltrs = aq_get_hw_rx_fltrs(aq_nic);
 
if (fsp->flow_type & FLOW_EXT) {
-   err = -EOPNOTSUPP;
+   if (be16_to_cpu(fsp->m_ext.vlan_tci) == VLAN_VID_MASK) {
+   err = aq_check_approve_fvlan(aq_nic, rx_fltrs, fsp);
+   } else {
+   netdev_err(aq_nic->ndev,
+  "ethtool: invalid vlan mask 0x%x specified",
+  be16_to_cpu(fsp->m_ext.vlan_tci));
+   err = -EINVAL;
+   }
} else {
switch (fsp->flow_type & ~FLOW_EXT) {
case ETHER_FLOW:
@@ -229,6 +259,42 @@ aq_check_rule(struct aq_nic_s *aq_nic,
return err;
 }
 
+static int aq_set_data_fvlan(struct aq_nic_s *aq_nic,
+struct aq_rx_filter *aq_rx_fltr,
+struct aq_rx_filter_vlan *aq_vlans, bool add)
+{
+   const struct ethtool_rx_flow_spec *fsp = _rx_fltr->aq_fsp;
+   int location = fsp->location - AQ_RX_FIRST_LOC_FVLANID;
+
+   memset(_vlans[location], 0, sizeof(aq_vlans[location]));
+
+   if (!add)
+   return 0;
+
+   aq_vlans[location].location = location;
+   aq_vlans[location].vlan_id = be16_to_cpu(fsp->h_ext.vlan_tci)
+& VLAN_VID_MASK;
+   aq_vlans[location].queue = fsp->ring_cookie & 0x1FU;
+   aq_vlans[location].enable = 1U;
+   return 0;
+}
+
+static int aq_add_del_fvlan(struct aq_nic_s *aq_nic,
+   struct aq_rx_filter *aq_rx_fltr, bool add)
+{
+   const struct aq_hw_ops *aq_hw_ops = aq_nic->aq_hw_ops;
+
+   if (unlikely(!aq_hw_ops->hw_filter_vlan_set))
+   return -EOPNOTSUPP;
+
+   aq_set_data_fvlan(aq_nic,
+ aq_rx_fltr,
+ aq_nic->aq_hw_rx_fltrs.fl2.aq_vlans,
+ add);
+
+   return aq_filters_vlans_update(aq_nic);
+}
+
 static int aq_set_data_fl3l4(struct 

[PATCH net-next v2 5/6] net: aquantia: add ethertype and PCP to rx flow filters

2018-11-12 Thread Igor Russkikh
From: Dmitry Bogdanov 

L2 EtherType filters allows to filter packet by EtherType field or
both EtherType and User Priority (PCP) field of 802.1Q.
UserPriority (vlan) parameter must be accompanied by mask 0x1FFF. That
is to distinguish VLAN filter from L2 Ethertype filter with
UserPriority since both User Priority and VLAN ID are passed in the
same 'vlan' parameter.

Example:
To add a filter that directs IP4 packess of priority 3 to queue 3:
ethtool -N  flow-type ether proto 0x800 vlan 0x600 m 0x1FFF \
action 3 loc 16

Signed-off-by: Dmitry Bogdanov 
Signed-off-by: Igor Russkikh 
---
 .../net/ethernet/aquantia/atlantic/aq_filters.c| 83 --
 .../net/ethernet/aquantia/atlantic/aq_filters.h|  1 +
 drivers/net/ethernet/aquantia/atlantic/aq_hw.h |  8 +++
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c  | 37 ++
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h|  8 +++
 5 files changed, 133 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
index c5240bc487b4..b987c6717af6 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
@@ -120,6 +120,30 @@ static int aq_check_approve_fl3l4(struct aq_nic_s *aq_nic,
 }
 
 static int __must_check
+aq_check_approve_fl2(struct aq_nic_s *aq_nic,
+struct aq_hw_rx_fltrs_s *rx_fltrs,
+struct ethtool_rx_flow_spec *fsp)
+{
+   if (fsp->location < AQ_RX_FIRST_LOC_FETHERT ||
+   fsp->location > AQ_RX_LAST_LOC_FETHERT) {
+   netdev_err(aq_nic->ndev,
+  "ethtool: location must be in range [%d, %d]",
+  AQ_RX_FIRST_LOC_FETHERT,
+  AQ_RX_LAST_LOC_FETHERT);
+   return -EINVAL;
+   }
+
+   if (be16_to_cpu(fsp->m_ext.vlan_tci) == VLAN_PRIO_MASK &&
+   fsp->m_u.ether_spec.h_proto == 0U) {
+   netdev_err(aq_nic->ndev,
+  "ethtool: proto (ether_type) parameter must be 
specfied");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+static int __must_check
 aq_check_approve_fvlan(struct aq_nic_s *aq_nic,
   struct aq_hw_rx_fltrs_s *rx_fltrs,
   struct ethtool_rx_flow_spec *fsp)
@@ -152,6 +176,8 @@ aq_check_filter(struct aq_nic_s *aq_nic,
if (fsp->flow_type & FLOW_EXT) {
if (be16_to_cpu(fsp->m_ext.vlan_tci) == VLAN_VID_MASK) {
err = aq_check_approve_fvlan(aq_nic, rx_fltrs, fsp);
+   } else if (be16_to_cpu(fsp->m_ext.vlan_tci) == VLAN_PRIO_MASK) {
+   err = aq_check_approve_fl2(aq_nic, rx_fltrs, fsp);
} else {
netdev_err(aq_nic->ndev,
   "ethtool: invalid vlan mask 0x%x specified",
@@ -161,7 +187,7 @@ aq_check_filter(struct aq_nic_s *aq_nic,
} else {
switch (fsp->flow_type & ~FLOW_EXT) {
case ETHER_FLOW:
-   err = -EOPNOTSUPP;
+   err = aq_check_approve_fl2(aq_nic, rx_fltrs, fsp);
break;
case TCP_V4_FLOW:
case UDP_V4_FLOW:
@@ -210,6 +236,10 @@ aq_rule_is_not_support(struct aq_nic_s *aq_nic,
netdev_err(aq_nic->ndev,
   "ethtool: The specified tos tclass are not 
supported\n");
rule_is_not_support = true;
+   } else if (fsp->flow_type & FLOW_MAC_EXT) {
+   netdev_err(aq_nic->ndev,
+  "ethtool: MAC_EXT is not supported");
+   rule_is_not_support = true;
}
 
return rule_is_not_support;
@@ -259,6 +289,48 @@ aq_check_rule(struct aq_nic_s *aq_nic,
return err;
 }
 
+static void aq_set_data_fl2(struct aq_nic_s *aq_nic,
+   struct aq_rx_filter *aq_rx_fltr,
+   struct aq_rx_filter_l2 *data, bool add)
+{
+   const struct ethtool_rx_flow_spec *fsp = _rx_fltr->aq_fsp;
+
+   memset(data, 0, sizeof(*data));
+
+   data->location = fsp->location - AQ_RX_FIRST_LOC_FETHERT;
+
+   if (fsp->ring_cookie != RX_CLS_FLOW_DISC)
+   data->queue = fsp->ring_cookie;
+   else
+   data->queue = -1;
+
+   data->ethertype = be16_to_cpu(fsp->h_u.ether_spec.h_proto);
+   data->user_priority_en = be16_to_cpu(fsp->m_ext.vlan_tci)
+== VLAN_PRIO_MASK;
+   data->user_priority = (be16_to_cpu(fsp->h_ext.vlan_tci)
+  & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
+}
+
+static int aq_add_del_fether(struct aq_nic_s *aq_nic,
+struct aq_rx_filter *aq_rx_fltr, bool add)
+{
+   struct aq_rx_filter_l2 data;
+   struct aq_hw_s *aq_hw = aq_nic->aq_hw;
+   const struct aq_hw_ops 

[PATCH net-next v2 6/6] net: aquantia: add support of rx-vlan-filter offload

2018-11-12 Thread Igor Russkikh
From: Dmitry Bogdanov 

Since it uses the same NIC table as rx flow vlan filter therefore
rx-flow vlan filter accepts only vlans that present on the interface
in case of rx-vlan-filter is on.

Signed-off-by: Dmitry Bogdanov 
Signed-off-by: Igor Russkikh 
---
 .../net/ethernet/aquantia/atlantic/aq_filters.c| 138 +
 .../net/ethernet/aquantia/atlantic/aq_filters.h|   2 +
 drivers/net/ethernet/aquantia/atlantic/aq_hw.h |   1 +
 drivers/net/ethernet/aquantia/atlantic/aq_main.c   |  40 +-
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c|   2 -
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h|   3 +-
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c  |  29 ++---
 7 files changed, 197 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
index b987c6717af6..f6145e7473b1 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
@@ -157,6 +157,14 @@ aq_check_approve_fvlan(struct aq_nic_s *aq_nic,
return -EINVAL;
}
 
+   if ((aq_nic->ndev->features & NETIF_F_HW_VLAN_CTAG_FILTER) &&
+   (!test_bit(be16_to_cpu(fsp->h_ext.vlan_tci),
+  aq_nic->active_vlans))) {
+   netdev_err(aq_nic->ndev,
+  "ethtool: unknown vlan-id specified");
+   return -EINVAL;
+   }
+
if (fsp->ring_cookie > aq_nic->aq_nic_cfg.num_rss_queues) {
netdev_err(aq_nic->ndev,
   "ethtool: queue number must be in range [0, %d]",
@@ -331,26 +339,108 @@ static int aq_add_del_fether(struct aq_nic_s *aq_nic,
return aq_hw_ops->hw_filter_l2_clear(aq_hw, );
 }
 
+static bool aq_fvlan_is_busy(struct aq_rx_filter_vlan *aq_vlans, int vlan)
+{
+   int i;
+
+   for (i = 0; i < AQ_VLAN_MAX_FILTERS; ++i) {
+   if (aq_vlans[i].enable &&
+   aq_vlans[i].queue != AQ_RX_QUEUE_NOT_ASSIGNED &&
+   aq_vlans[i].vlan_id == vlan) {
+   return true;
+   }
+   }
+
+   return false;
+}
+
+/* Function rebuilds array of vlan filters so that filters with assigned
+ * queue have a precedence over just vlans on the interface.
+ */
+static void aq_fvlan_rebuild(struct aq_nic_s *aq_nic,
+unsigned long *active_vlans,
+struct aq_rx_filter_vlan *aq_vlans)
+{
+   bool vlan_busy = false;
+   int vlan = -1;
+   int i;
+
+   for (i = 0; i < AQ_VLAN_MAX_FILTERS; ++i) {
+   if (aq_vlans[i].enable &&
+   aq_vlans[i].queue != AQ_RX_QUEUE_NOT_ASSIGNED)
+   continue;
+   do {
+   vlan = find_next_bit(active_vlans,
+VLAN_N_VID,
+vlan + 1);
+   if (vlan == VLAN_N_VID) {
+   aq_vlans[i].enable = 0U;
+   aq_vlans[i].queue = AQ_RX_QUEUE_NOT_ASSIGNED;
+   aq_vlans[i].vlan_id = 0;
+   continue;
+   }
+
+   vlan_busy = aq_fvlan_is_busy(aq_vlans, vlan);
+   if (!vlan_busy) {
+   aq_vlans[i].enable = 1U;
+   aq_vlans[i].queue = AQ_RX_QUEUE_NOT_ASSIGNED;
+   aq_vlans[i].vlan_id = vlan;
+   }
+   } while (vlan_busy && vlan != VLAN_N_VID);
+   }
+}
+
 static int aq_set_data_fvlan(struct aq_nic_s *aq_nic,
 struct aq_rx_filter *aq_rx_fltr,
 struct aq_rx_filter_vlan *aq_vlans, bool add)
 {
const struct ethtool_rx_flow_spec *fsp = _rx_fltr->aq_fsp;
int location = fsp->location - AQ_RX_FIRST_LOC_FVLANID;
+   int i;
 
memset(_vlans[location], 0, sizeof(aq_vlans[location]));
 
if (!add)
return 0;
 
+   /* remove vlan if it was in table without queue assignment */
+   for (i = 0; i < AQ_VLAN_MAX_FILTERS; ++i) {
+   if (aq_vlans[i].vlan_id ==
+  (be16_to_cpu(fsp->h_ext.vlan_tci) & VLAN_VID_MASK)) {
+   aq_vlans[i].enable = false;
+   }
+   }
+
aq_vlans[location].location = location;
aq_vlans[location].vlan_id = be16_to_cpu(fsp->h_ext.vlan_tci)
 & VLAN_VID_MASK;
aq_vlans[location].queue = fsp->ring_cookie & 0x1FU;
aq_vlans[location].enable = 1U;
+
return 0;
 }
 
+int aq_del_fvlan_by_vlan(struct aq_nic_s *aq_nic, u16 vlan_id)
+{
+   struct aq_hw_rx_fltrs_s *rx_fltrs = aq_get_hw_rx_fltrs(aq_nic);
+   struct aq_rx_filter *rule = NULL;
+   struct hlist_node 

[PATCH net-next v2 3/6] net: aquantia: add support of L3/L4 ntuple filters

2018-11-12 Thread Igor Russkikh
From: Dmitry Bogdanov 

Add support of L3/L4 5-tuple {protocol, src-ip, dst-ip, src-port, dst-port}
filters. Mask is not supported. Src-port and dst-port are only compared for
TCP/UDP/SCTP packets. Both IPv4 and IPv6 are supported.
The supported actions are the drop and the queue assignment.
Due to fixed order of the rules in the NIC, the location 32-39 are
reserved for L3/L4 5-tuple filters. The locations 32 and 36 are
reserved for IPv6 filters.

Examples:
sudo ethtool -N eth0 flow-type ip6 src-ip 2001:db8:0:f101::2 \
dst-ip 2001:db8:0:f101::5 action -1 loc 36

sudo ethtool -N eth0 flow-type udp4 src-ip 10.0.0.4 \
dst-ip 10.0.0.7 src-port 2000 dst-port 2001 action 2 loc 32

Signed-off-by: Dmitry Bogdanov 
Signed-off-by: Igor Russkikh 
---
 .../net/ethernet/aquantia/atlantic/aq_filters.c| 168 -
 drivers/net/ethernet/aquantia/atlantic/aq_hw.h |  11 ++
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h|   7 +
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c  |  58 +++
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h|  43 ++
 5 files changed, 284 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
index 8cb4eaefd6e1..34e2a28d344a 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c
@@ -85,11 +85,46 @@ aq_rule_already_exists(struct aq_nic_s *aq_nic,
return false;
 }
 
+static int aq_check_approve_fl3l4(struct aq_nic_s *aq_nic,
+ struct aq_hw_rx_fltrs_s *rx_fltrs,
+ struct ethtool_rx_flow_spec *fsp)
+{
+   if (fsp->location < AQ_RX_FIRST_LOC_FL3L4 ||
+   fsp->location > AQ_RX_LAST_LOC_FL3L4) {
+   netdev_err(aq_nic->ndev,
+  "ethtool: location must be in range [%d, %d]",
+  AQ_RX_FIRST_LOC_FL3L4,
+  AQ_RX_LAST_LOC_FL3L4);
+   return -EINVAL;
+   }
+   if (rx_fltrs->fl3l4.is_ipv6 && rx_fltrs->fl3l4.active_ipv4) {
+   rx_fltrs->fl3l4.is_ipv6 = false;
+   netdev_err(aq_nic->ndev,
+  "ethtool: mixing ipv4 and ipv6 is not allowed");
+   return -EINVAL;
+   } else if (!rx_fltrs->fl3l4.is_ipv6 && rx_fltrs->fl3l4.active_ipv6) {
+   rx_fltrs->fl3l4.is_ipv6 = true;
+   netdev_err(aq_nic->ndev,
+  "ethtool: mixing ipv4 and ipv6 is not allowed");
+   return -EINVAL;
+   } else if (rx_fltrs->fl3l4.is_ipv6&&
+  fsp->location != AQ_RX_FIRST_LOC_FL3L4 + 4 &&
+  fsp->location != AQ_RX_FIRST_LOC_FL3L4) {
+   netdev_err(aq_nic->ndev,
+  "ethtool: The specified location for ipv6 must be %d 
or %d",
+  AQ_RX_FIRST_LOC_FL3L4, AQ_RX_FIRST_LOC_FL3L4 + 4);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int __must_check
 aq_check_filter(struct aq_nic_s *aq_nic,
struct ethtool_rx_flow_spec *fsp)
 {
int err = 0;
+   struct aq_hw_rx_fltrs_s *rx_fltrs = aq_get_hw_rx_fltrs(aq_nic);
 
if (fsp->flow_type & FLOW_EXT) {
err = -EOPNOTSUPP;
@@ -103,14 +138,16 @@ aq_check_filter(struct aq_nic_s *aq_nic,
case SCTP_V4_FLOW:
case IPV4_FLOW:
case IP_USER_FLOW:
-   err = -EOPNOTSUPP;
+   rx_fltrs->fl3l4.is_ipv6 = false;
+   err = aq_check_approve_fl3l4(aq_nic, rx_fltrs, fsp);
break;
case TCP_V6_FLOW:
case UDP_V6_FLOW:
case SCTP_V6_FLOW:
case IPV6_FLOW:
case IPV6_USER_FLOW:
-   err = -EOPNOTSUPP;
+   rx_fltrs->fl3l4.is_ipv6 = true;
+   err = aq_check_approve_fl3l4(aq_nic, rx_fltrs, fsp);
break;
default:
netdev_err(aq_nic->ndev,
@@ -156,6 +193,11 @@ aq_rule_is_not_correct(struct aq_nic_s *aq_nic,
 
if (!aq_nic) {
rule_is_not_correct = true;
+   } else if (fsp->location > AQ_RX_MAX_RXNFC_LOC) {
+   netdev_err(aq_nic->ndev,
+  "ethtool: The specified number %u rule is invalid\n",
+  fsp->location);
+   rule_is_not_correct = true;
} else if (aq_check_filter(aq_nic, fsp)) {
rule_is_not_correct = true;
} else if (fsp->ring_cookie != RX_CLS_FLOW_DISC) {
@@ -187,6 +229,125 @@ aq_check_rule(struct aq_nic_s *aq_nic,
return err;
 }
 
+static int aq_set_data_fl3l4(struct aq_nic_s *aq_nic,
+struct aq_rx_filter *aq_rx_fltr,
+struct 

[PATCH net-next v2 0/6] net: aquantia: add rx-flow filter support

2018-11-12 Thread Igor Russkikh
In this patchset the rx-flow filters functionality and vlan filter offloads
are implemented.

The rules in NIC hardware have fixed order and priorities.
To support this, the locations of filters from ethtool perspective are also 
fixed:

* Locations 0 - 15 for VLAN ID filters
* Locations 16 - 31 for L2 EtherType and PCP filters
* Locations 32 - 39 for L3/L4 5-tuple filters (locations 32, 36 for IPv6)

Dmitry Bogdanov (6):
  net: aquantia: add rx-flow filter definitions
  net: aquantia: add infrastructure for ntuple rules
  net: aquantia: add support of L3/L4 ntuple filters
  net: aquantia: add vlan id to rx flow filters
  net: aquantia: add ethertype and PCP to rx flow filters
  net: aquantia: add support of rx-vlan-filter offload

 drivers/net/ethernet/aquantia/atlantic/Makefile|   1 +
 drivers/net/ethernet/aquantia/atlantic/aq_common.h |   2 +-
 .../net/ethernet/aquantia/atlantic/aq_ethtool.c|  31 +
 .../net/ethernet/aquantia/atlantic/aq_filters.c| 876 +
 .../net/ethernet/aquantia/atlantic/aq_filters.h|  36 +
 drivers/net/ethernet/aquantia/atlantic/aq_hw.h |  29 +
 drivers/net/ethernet/aquantia/atlantic/aq_main.c   |  55 +-
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c|   2 -
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h|  21 +-
 .../net/ethernet/aquantia/atlantic/aq_pci_func.c   |   2 +
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c  | 162 +++-
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c | 109 +++
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h |  48 ++
 .../aquantia/atlantic/hw_atl/hw_atl_llh_internal.h | 135 +++-
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h|  58 ++
 15 files changed, 1531 insertions(+), 36 deletions(-)
 create mode 100644 drivers/net/ethernet/aquantia/atlantic/aq_filters.c
 create mode 100644 drivers/net/ethernet/aquantia/atlantic/aq_filters.h

-- 
2.7.4



Re: [PATCH net-next 0/6] net: aquantia: add rx-flow filter support

2018-11-12 Thread Igor Russkikh

On 12.11.2018 18:02, Igor Russkikh wrote:
> In this patchset the rx-flow filters functionality and vlan filter offloads
> are implemented.
> 
> The rules in NIC hardware have fixed order and priorities.
> To support this, the locations of filters from ethtool perspective are also 
> fixed:
> 

Hi David, please discard the patch, it seems it gets a conflict with just 
integrated 'net' tree.

Will resend it shortly.

Sorry for this.


  1   2   >