Re: [RFC PATCH 0/3] acpi: Add acpi mdio support code
Hello Rafael, On 2018/11/13 1:19, Andrew Lunn wrote: >>> I'm just trying to ensure whatever is defined is flexible enough that >>> we really can later support everything which DT does. We have PHYs on >>> MDIO busses, inside switches, which are on MDIO busses, which are >>> inside Ethernet interfaces, etc. >> I think it can be satisfied. See the table I'm using above. > Hi Dongsheng > > > Since i don't know anything better, i think i have to trust you have > this correct. > > It would be good to document this, so that the next person who needs > to add ACPI support for a PHY has some documentation to look at. > Could you add something to Documentation/acpi/dsd? How about Andrew suggestion? I agree with Andrew. Or We need to add the documentation to Documentation/acpi? I see that both GPIO and I2C are down here. Cheers, Dongsheng
Re: [PATCH net-next 0/4] sctp: add subscribe per asoc and sockopt SCTP_EVENT
On Tue, Nov 13, 2018 at 1:26 AM Xin Long wrote: > > This patchset mainly adds the Event Subscription sockopt described in > rfc6525#section-6.2: > > Subscribing to events as described in [RFC6458] uses a setsockopt() > call with the SCTP_EVENT socket option. This option takes the > following structure, which specifies the association, the event type > (using the same value found in the event type field), and an on/off > boolean. > > struct sctp_event { > sctp_assoc_t se_assoc_id; > uint16_t se_type; > uint8_t se_on; > }; > > The user fills in the se_type field with the same value found in the > strreset_type field, i.e., SCTP_STREAM_RESET_EVENT. The user will > also fill in the se_assoc_id field with either the association to set > this event on (this field is ignored for one-to-one style sockets) or > one of the reserved constant values defined in [RFC6458]. Finally, > the se_on field is set with a 1 to enable the event or a 0 to disable > the event. > > As for the old SCTP_EVENTS Option with struct sctp_event_subscribe, > it's being DEPRECATED. > > Xin Long (4): > sctp: define subscribe in sctp_sock as __u16 > sctp: add subscribe per asoc > sctp: rename enum sctp_event to sctp_event_type > sctp: add sockopt SCTP_EVENT > > include/net/sctp/constants.h | 2 +- > include/net/sctp/sm.h| 4 +- > include/net/sctp/structs.h | 4 +- > include/net/sctp/ulpevent.h | 39 -- > include/uapi/linux/sctp.h| 13 - > net/sctp/associola.c | 2 + > net/sctp/chunk.c | 8 ++- > net/sctp/primitive.c | 2 +- > net/sctp/sm_sideeffect.c | 12 ++--- > net/sctp/sm_statetable.c | 2 +- > net/sctp/socket.c| 126 > --- > net/sctp/stream_interleave.c | 12 +++-- > net/sctp/ulpqueue.c | 8 +-- > 13 files changed, 184 insertions(+), 50 deletions(-) > > -- > 2.1.0 > Because some key word in changelog triggerred the filters at vger.kernel.org, v2 has been posted.
[PATCHv2 net-next 2/4] sctp: add subscribe per asoc
The member subscribe should be per asoc, so that sockopt SCTP_EVENT in the next patch can subscribe a event from one asoc only. Signed-off-by: Xin Long --- include/net/sctp/structs.h | 2 ++ net/sctp/associola.c | 2 ++ net/sctp/chunk.c | 6 ++ net/sctp/socket.c| 6 +- net/sctp/stream_interleave.c | 7 --- net/sctp/ulpqueue.c | 4 ++-- 6 files changed, 17 insertions(+), 10 deletions(-) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index bc7808a..7eaa294 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -2077,6 +2077,8 @@ struct sctp_association { int sent_cnt_removable; + __u16 subscribe; + __u64 abandoned_unsent[SCTP_PR_INDEX(MAX) + 1]; __u64 abandoned_sent[SCTP_PR_INDEX(MAX) + 1]; }; diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 6a28b96..685c7ef 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -135,6 +135,8 @@ static struct sctp_association *sctp_association_init( */ asoc->max_burst = sp->max_burst; + asoc->subscribe = sp->subscribe; + /* initialize association timers */ asoc->timeouts[SCTP_EVENT_TIMEOUT_T1_COOKIE] = asoc->rto_initial; asoc->timeouts[SCTP_EVENT_TIMEOUT_T1_INIT] = asoc->rto_initial; diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c index 6c761af..0b203b8 100644 --- a/net/sctp/chunk.c +++ b/net/sctp/chunk.c @@ -86,11 +86,10 @@ void sctp_datamsg_free(struct sctp_datamsg *msg) /* Final destructruction of datamsg memory. */ static void sctp_datamsg_destroy(struct sctp_datamsg *msg) { + struct sctp_association *asoc = NULL; struct list_head *pos, *temp; struct sctp_chunk *chunk; - struct sctp_sock *sp; struct sctp_ulpevent *ev; - struct sctp_association *asoc = NULL; int error = 0, notify; /* If we failed, we may need to notify. */ @@ -108,8 +107,7 @@ static void sctp_datamsg_destroy(struct sctp_datamsg *msg) else error = asoc->outqueue.error; - sp = sctp_sk(asoc->base.sk); - notify = sctp_ulpevent_type_enabled(sp->subscribe, + notify = sctp_ulpevent_type_enabled(asoc->subscribe, SCTP_SEND_FAILED); } diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 48e0b45..789008d 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -2307,6 +2307,7 @@ static int sctp_setsockopt_events(struct sock *sk, char __user *optval, struct sctp_event_subscribe subscribe; __u8 *sn_type = (__u8 *) struct sctp_sock *sp = sctp_sk(sk); + struct sctp_association *asoc; int i; if (optlen > sizeof(struct sctp_event_subscribe)) @@ -2319,14 +2320,17 @@ static int sctp_setsockopt_events(struct sock *sk, char __user *optval, sctp_ulpevent_type_set(>subscribe, SCTP_SN_TYPE_BASE + i, sn_type[i]); + list_for_each_entry(asoc, >ep->asocs, asocs) + asoc->subscribe = sctp_sk(sk)->subscribe; + /* At the time when a user app subscribes to SCTP_SENDER_DRY_EVENT, * if there is no data to be sent or retransmit, the stack will * immediately send up this notification. */ if (sctp_ulpevent_type_enabled(sp->subscribe, SCTP_SENDER_DRY_EVENT)) { - struct sctp_association *asoc = sctp_id2assoc(sk, 0); struct sctp_ulpevent *event; + asoc = sctp_id2assoc(sk, 0); if (asoc && sctp_outq_is_empty(>outqueue)) { event = sctp_ulpevent_make_sender_dry_event(asoc, GFP_USER | __GFP_NOWARN); diff --git a/net/sctp/stream_interleave.c b/net/sctp/stream_interleave.c index ceef5a3..a6bf215 100644 --- a/net/sctp/stream_interleave.c +++ b/net/sctp/stream_interleave.c @@ -503,7 +503,7 @@ static int sctp_enqueue_event(struct sctp_ulpq *ulpq, sk_incoming_cpu_update(sk); } - if (!sctp_ulpevent_is_enabled(event, sp->subscribe)) + if (!sctp_ulpevent_is_enabled(event, ulpq->asoc->subscribe)) goto out_free; if (skb_list) @@ -992,16 +992,17 @@ static void sctp_intl_stream_abort_pd(struct sctp_ulpq *ulpq, __u16 sid, __u32 mid, __u16 flags, gfp_t gfp) { struct sock *sk = ulpq->asoc->base.sk; - struct sctp_sock *sp = sctp_sk(sk); struct sctp_ulpevent *ev = NULL; - if (!sctp_ulpevent_type_enabled(sp->subscribe, + if (!sctp_ulpevent_type_enabled(ulpq->asoc->subscribe, SCTP_PARTIAL_DELIVERY_EVENT)) return; ev = sctp_ulpevent_make_pdapi(ulpq->asoc, SCTP_PARTIAL_DELIVERY_ABORTED,
[PATCHv2 net-next 4/4] sctp: add sockopt SCTP_EVENT
This patch adds sockopt SCTP_EVENT described in rfc6525#section-6.2. With this sockopt users can subscribe to an event from a specified asoc. Signed-off-by: Xin Long --- include/uapi/linux/sctp.h | 7 net/sctp/socket.c | 89 +++ 2 files changed, 96 insertions(+) diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h index 66afa5b..d584073 100644 --- a/include/uapi/linux/sctp.h +++ b/include/uapi/linux/sctp.h @@ -129,6 +129,7 @@ typedef __s32 sctp_assoc_t; #define SCTP_STREAM_SCHEDULER_VALUE124 #define SCTP_INTERLEAVING_SUPPORTED125 #define SCTP_SENDMSG_CONNECT 126 +#define SCTP_EVENT 127 /* PR-SCTP policies */ #define SCTP_PR_SCTP_NONE 0x @@ -1154,6 +1155,12 @@ struct sctp_add_streams { uint16_t sas_outstrms; }; +struct sctp_event { + sctp_assoc_t se_assoc_id; + uint16_t se_type; + uint8_t se_on; +}; + /* SCTP Stream schedulers */ enum sctp_sched_type { SCTP_SS_FCFS, diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 789008d..1451211 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4288,6 +4288,57 @@ static int sctp_setsockopt_reuse_port(struct sock *sk, char __user *optval, return 0; } +static int sctp_setsockopt_event(struct sock *sk, char __user *optval, +unsigned int optlen) +{ + struct sctp_association *asoc; + struct sctp_ulpevent *event; + struct sctp_event param; + int retval = 0; + + if (optlen < sizeof(param)) { + retval = -EINVAL; + goto out; + } + + optlen = sizeof(param); + if (copy_from_user(, optval, optlen)) { + retval = -EFAULT; + goto out; + } + + if (param.se_type < SCTP_SN_TYPE_BASE || + param.se_type > SCTP_SN_TYPE_MAX) { + retval = -EINVAL; + goto out; + } + + asoc = sctp_id2assoc(sk, param.se_assoc_id); + if (!asoc) { + sctp_ulpevent_type_set(_sk(sk)->subscribe, + param.se_type, param.se_on); + goto out; + } + + sctp_ulpevent_type_set(>subscribe, param.se_type, param.se_on); + + if (param.se_type == SCTP_SENDER_DRY_EVENT && param.se_on) { + if (sctp_outq_is_empty(>outqueue)) { + event = sctp_ulpevent_make_sender_dry_event(asoc, + GFP_USER | __GFP_NOWARN); + if (!event) { + retval = -ENOMEM; + goto out; + } + + asoc->stream.si->enqueue_event(>ulpq, event); + } + } + +out: + return retval; +} + /* API 6.2 setsockopt(), getsockopt() * * Applications use setsockopt() and getsockopt() to set or retrieve @@ -4485,6 +4536,9 @@ static int sctp_setsockopt(struct sock *sk, int level, int optname, case SCTP_REUSE_PORT: retval = sctp_setsockopt_reuse_port(sk, optval, optlen); break; + case SCTP_EVENT: + retval = sctp_setsockopt_event(sk, optval, optlen); + break; default: retval = -ENOPROTOOPT; break; @@ -7430,6 +7484,38 @@ static int sctp_getsockopt_reuse_port(struct sock *sk, int len, return 0; } +static int sctp_getsockopt_event(struct sock *sk, int len, char __user *optval, +int __user *optlen) +{ + struct sctp_association *asoc; + struct sctp_event param; + __u16 subscribe; + + if (len < sizeof(param)) + return -EINVAL; + + len = sizeof(param); + if (copy_from_user(, optval, len)) + return -EFAULT; + + if (param.se_type < SCTP_SN_TYPE_BASE || + param.se_type > SCTP_SN_TYPE_MAX) + return -EINVAL; + + asoc = sctp_id2assoc(sk, param.se_assoc_id); + subscribe = asoc ? asoc->subscribe : sctp_sk(sk)->subscribe; + param.se_on = sctp_ulpevent_type_enabled(subscribe, param.se_type); + + if (put_user(len, optlen)) + return -EFAULT; + + if (copy_to_user(optval, , len)) + return -EFAULT; + + return 0; +} + + static int sctp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { @@ -7628,6 +7714,9 @@ static int sctp_getsockopt(struct sock *sk, int level, int optname, case SCTP_REUSE_PORT: retval = sctp_getsockopt_reuse_port(sk, len, optval, optlen); break; + case SCTP_EVENT: + retval = sctp_getsockopt_event(sk, len, optval, optlen); + break; default: retval = -ENOPROTOOPT; break; -- 2.1.0
[PATCHv2 net-next 1/4] sctp: define subscribe in sctp_sock as __u16
The member subscribe in sctp_sock is used to indicate to which of the events it is subscribed, more like a group of flags. So it's better to be defined as __u16 (2 bytpes), instead of struct sctp_event_subscribe (13 bytes). Note that sctp_event_subscribe is an UAPI struct, used on sockopt calls, and thus it will not be removed. This patch only changes the internal storage of the flags. Signed-off-by: Xin Long --- include/net/sctp/structs.h | 2 +- include/net/sctp/ulpevent.h | 39 --- include/uapi/linux/sctp.h| 6 +- net/sctp/chunk.c | 4 ++-- net/sctp/socket.c| 35 ++- net/sctp/stream_interleave.c | 11 ++- net/sctp/ulpqueue.c | 8 7 files changed, 68 insertions(+), 37 deletions(-) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index af9d494..bc7808a 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -217,7 +217,7 @@ struct sctp_sock { * These two structures must be grouped together for the usercopy * whitelist region. */ - struct sctp_event_subscribe subscribe; + __u16 subscribe; struct sctp_initmsg initmsg; int user_frag; diff --git a/include/net/sctp/ulpevent.h b/include/net/sctp/ulpevent.h index 51b4e06..bd922a0 100644 --- a/include/net/sctp/ulpevent.h +++ b/include/net/sctp/ulpevent.h @@ -164,30 +164,39 @@ void sctp_ulpevent_read_nxtinfo(const struct sctp_ulpevent *event, __u16 sctp_ulpevent_get_notification_type(const struct sctp_ulpevent *event); +static inline void sctp_ulpevent_type_set(__u16 *subscribe, + __u16 sn_type, __u8 on) +{ + if (sn_type > SCTP_SN_TYPE_MAX) + return; + + if (on) + *subscribe |= (1 << (sn_type - SCTP_SN_TYPE_BASE)); + else + *subscribe &= ~(1 << (sn_type - SCTP_SN_TYPE_BASE)); +} + /* Is this event type enabled? */ -static inline int sctp_ulpevent_type_enabled(__u16 sn_type, -struct sctp_event_subscribe *mask) +static inline bool sctp_ulpevent_type_enabled(__u16 subscribe, __u16 sn_type) { - int offset = sn_type - SCTP_SN_TYPE_BASE; - char *amask = (char *) mask; + if (sn_type > SCTP_SN_TYPE_MAX) + return false; - if (offset >= sizeof(struct sctp_event_subscribe)) - return 0; - return amask[offset]; + return subscribe & (1 << (sn_type - SCTP_SN_TYPE_BASE)); } /* Given an event subscription, is this event enabled? */ -static inline int sctp_ulpevent_is_enabled(const struct sctp_ulpevent *event, - struct sctp_event_subscribe *mask) +static inline bool sctp_ulpevent_is_enabled(const struct sctp_ulpevent *event, + __u16 subscribe) { __u16 sn_type; - int enabled = 1; - if (sctp_ulpevent_is_notification(event)) { - sn_type = sctp_ulpevent_get_notification_type(event); - enabled = sctp_ulpevent_type_enabled(sn_type, mask); - } - return enabled; + if (!sctp_ulpevent_is_notification(event)) + return true; + + sn_type = sctp_ulpevent_get_notification_type(event); + + return sctp_ulpevent_type_enabled(subscribe, sn_type); } #endif /* __sctp_ulpevent_h__ */ diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h index c81feb3..66afa5b 100644 --- a/include/uapi/linux/sctp.h +++ b/include/uapi/linux/sctp.h @@ -632,7 +632,9 @@ union sctp_notification { */ enum sctp_sn_type { - SCTP_SN_TYPE_BASE = (1<<15), + SCTP_SN_TYPE_BASE = (1<<15), + SCTP_DATA_IO_EVENT = SCTP_SN_TYPE_BASE, +#define SCTP_DATA_IO_EVENT SCTP_DATA_IO_EVENT SCTP_ASSOC_CHANGE, #define SCTP_ASSOC_CHANGE SCTP_ASSOC_CHANGE SCTP_PEER_ADDR_CHANGE, @@ -657,6 +659,8 @@ enum sctp_sn_type { #define SCTP_ASSOC_RESET_EVENT SCTP_ASSOC_RESET_EVENT SCTP_STREAM_CHANGE_EVENT, #define SCTP_STREAM_CHANGE_EVENT SCTP_STREAM_CHANGE_EVENT + SCTP_SN_TYPE_MAX= SCTP_STREAM_CHANGE_EVENT, +#define SCTP_SN_TYPE_MAX SCTP_SN_TYPE_MAX }; /* Notification error codes used to fill up the error fields in some diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c index ce80878..6c761af 100644 --- a/net/sctp/chunk.c +++ b/net/sctp/chunk.c @@ -109,8 +109,8 @@ static void sctp_datamsg_destroy(struct sctp_datamsg *msg) error = asoc->outqueue.error; sp = sctp_sk(asoc->base.sk); - notify = sctp_ulpevent_type_enabled(SCTP_SEND_FAILED, - >subscribe); + notify = sctp_ulpevent_type_enabled(sp->subscribe, +
[PATCHv2 net-next 3/4] sctp: rename enum sctp_event to sctp_event_type
sctp_event is a structure name defined in RFC for sockopt SCTP_EVENT. To avoid the conflict, rename it. Signed-off-by: Xin Long --- include/net/sctp/constants.h | 2 +- include/net/sctp/sm.h| 4 ++-- net/sctp/primitive.c | 2 +- net/sctp/sm_sideeffect.c | 12 ++-- net/sctp/sm_statetable.c | 2 +- 5 files changed, 11 insertions(+), 11 deletions(-) diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h index 8dadc74..4588bdc 100644 --- a/include/net/sctp/constants.h +++ b/include/net/sctp/constants.h @@ -71,7 +71,7 @@ enum { SCTP_DEFAULT_INSTREAMS = SCTP_MAX_STREAM }; SCTP_NUM_AUTH_CHUNK_TYPES) /* These are the different flavours of event. */ -enum sctp_event { +enum sctp_event_type { SCTP_EVENT_T_CHUNK = 1, SCTP_EVENT_T_TIMEOUT, SCTP_EVENT_T_OTHER, diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h index 9e3d327..24825a8 100644 --- a/include/net/sctp/sm.h +++ b/include/net/sctp/sm.h @@ -173,7 +173,7 @@ sctp_state_fn_t sctp_sf_autoclose_timer_expire; __u8 sctp_get_chunk_type(struct sctp_chunk *chunk); const struct sctp_sm_table_entry *sctp_sm_lookup_event( struct net *net, - enum sctp_event event_type, + enum sctp_event_type event_type, enum sctp_state state, union sctp_subtype event_subtype); int sctp_chunk_iif(const struct sctp_chunk *); @@ -313,7 +313,7 @@ struct sctp_chunk *sctp_process_strreset_resp( /* Prototypes for statetable processing. */ -int sctp_do_sm(struct net *net, enum sctp_event event_type, +int sctp_do_sm(struct net *net, enum sctp_event_type event_type, union sctp_subtype subtype, enum sctp_state state, struct sctp_endpoint *ep, struct sctp_association *asoc, void *event_arg, gfp_t gfp); diff --git a/net/sctp/primitive.c b/net/sctp/primitive.c index c0817f7a..a8c4c33 100644 --- a/net/sctp/primitive.c +++ b/net/sctp/primitive.c @@ -53,7 +53,7 @@ int sctp_primitive_ ## name(struct net *net, struct sctp_association *asoc, \ void *arg) { \ int error = 0; \ - enum sctp_event event_type; union sctp_subtype subtype; \ + enum sctp_event_type event_type; union sctp_subtype subtype; \ enum sctp_state state; \ struct sctp_endpoint *ep; \ \ diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c index 85d3930..1d143bc 100644 --- a/net/sctp/sm_sideeffect.c +++ b/net/sctp/sm_sideeffect.c @@ -52,7 +52,7 @@ #include #include -static int sctp_cmd_interpreter(enum sctp_event event_type, +static int sctp_cmd_interpreter(enum sctp_event_type event_type, union sctp_subtype subtype, enum sctp_state state, struct sctp_endpoint *ep, @@ -61,7 +61,7 @@ static int sctp_cmd_interpreter(enum sctp_event event_type, enum sctp_disposition status, struct sctp_cmd_seq *commands, gfp_t gfp); -static int sctp_side_effects(enum sctp_event event_type, +static int sctp_side_effects(enum sctp_event_type event_type, union sctp_subtype subtype, enum sctp_state state, struct sctp_endpoint *ep, @@ -623,7 +623,7 @@ static void sctp_cmd_init_failed(struct sctp_cmd_seq *commands, /* Worker routine to handle SCTP_CMD_ASSOC_FAILED. */ static void sctp_cmd_assoc_failed(struct sctp_cmd_seq *commands, struct sctp_association *asoc, - enum sctp_event event_type, + enum sctp_event_type event_type, union sctp_subtype subtype, struct sctp_chunk *chunk, unsigned int error) @@ -1162,7 +1162,7 @@ static void sctp_cmd_send_asconf(struct sctp_association *asoc) * If you want to understand all of lksctp, this is a * good place to start. */ -int sctp_do_sm(struct net *net, enum sctp_event event_type, +int sctp_do_sm(struct net *net, enum sctp_event_type event_type, union sctp_subtype subtype, enum sctp_state state, struct sctp_endpoint *ep, struct sctp_association *asoc, void *event_arg, gfp_t gfp) @@ -1199,7 +1199,7 @@ int sctp_do_sm(struct net *net, enum sctp_event event_type, /* * This the master state function side effect processing function. */ -static int sctp_side_effects(enum sctp_event event_type, +static int
[PATCHv2 net-next 0/4] sctp: add subscribe per asoc and sockopt SCTP_EVENT
This patchset mainly adds the Event Subscription sockopt described in rfc6525#section-6.2: "Subscribing to events as described in [RFC6458] uses a setsockopt() call with the SCTP_EVENT socket option. This option takes the following structure, which specifies the association, the event type (using the same value found in the event type field), and an on/off boolean. struct sctp_event { sctp_assoc_t se_assoc_id; uint16_t se_type; uint8_t se_on; }; The user fills in the se_type field with the same value found in the strreset_type field, i.e., SCTP_STREAM_RESET_EVENT. The user will also fill in the se_assoc_id field with either the association to set this event on (this field is ignored for one-to-one style sockets) or one of the reserved constant values defined in [RFC6458]. Finally, the se_on field is set with a 1 to enable the event or a 0 to disable the event." As for the old SCTP_EVENTS Option with struct sctp_event_subscribe, it's being DEPRECATED. Xin Long (4): sctp: define subscribe in sctp_sock as __u16 sctp: add subscribe per asoc sctp: rename enum sctp_event to sctp_event_type sctp: add sockopt SCTP_EVENT include/net/sctp/constants.h | 2 +- include/net/sctp/sm.h| 4 +- include/net/sctp/structs.h | 4 +- include/net/sctp/ulpevent.h | 39 -- include/uapi/linux/sctp.h| 13 - net/sctp/associola.c | 2 + net/sctp/chunk.c | 8 ++- net/sctp/primitive.c | 2 +- net/sctp/sm_sideeffect.c | 12 ++--- net/sctp/sm_statetable.c | 2 +- net/sctp/socket.c| 126 --- net/sctp/stream_interleave.c | 12 +++-- net/sctp/ulpqueue.c | 8 +-- 13 files changed, 184 insertions(+), 50 deletions(-) -- 2.1.0
Re: [PATCH net-next 2/2] r8169: use proper constant for PCI vendor USR instead of numerical id
Hi Heiner, Thank you for the patch! Yet something to improve: [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Heiner-Kallweit/r8169-add-USR-PCI-vendor-id/2018-13 config: x86_64-allmodconfig (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): include/linux/slab.h:332:43: warning: dubious: x & !y >> drivers/isdn/hardware/mISDN/w6692.c:1404:30: error: undefined identifier >> 'PCI_DEVICE_ID_USR_6692' drivers/isdn/hardware/mISDN/w6692.c:1404:23: error: 'PCI_DEVICE_ID_USR_6692' undeclared here (not in a function); did you mean 'PCI_DEVICE_ID_SI_6202'? PCI_VENDOR_ID_USR, PCI_DEVICE_ID_USR_6692, 0, 0, ^~ PCI_DEVICE_ID_SI_6202 vim +/PCI_DEVICE_ID_USR_6692 +1404 drivers/isdn/hardware/mISDN/w6692.c 707b2ce6 Karsten Keil 2009-07-22 1399 e8336ed0 Arvind Yadav 2017-07-15 1400 static const struct pci_device_id w6692_ids[] = { 707b2ce6 Karsten Keil 2009-07-22 1401 { PCI_VENDOR_ID_DYNALINK, PCI_DEVICE_ID_DYNALINK_IS64PH, 707b2ce6 Karsten Keil 2009-07-22 1402PCI_ANY_ID, PCI_ANY_ID, 0, 0, (ulong)_map[0]}, 707b2ce6 Karsten Keil 2009-07-22 1403 { PCI_VENDOR_ID_WINBOND2, PCI_DEVICE_ID_WINBOND2_6692, 707b2ce6 Karsten Keil 2009-07-22 @1404PCI_VENDOR_ID_USR, PCI_DEVICE_ID_USR_6692, 0, 0, 707b2ce6 Karsten Keil 2009-07-22 1405(ulong)_map[2]}, 707b2ce6 Karsten Keil 2009-07-22 1406 { PCI_VENDOR_ID_WINBOND2, PCI_DEVICE_ID_WINBOND2_6692, 707b2ce6 Karsten Keil 2009-07-22 1407PCI_ANY_ID, PCI_ANY_ID, 0, 0, (ulong)_map[1]}, 707b2ce6 Karsten Keil 2009-07-22 1408 { } 707b2ce6 Karsten Keil 2009-07-22 1409 }; 707b2ce6 Karsten Keil 2009-07-22 1410 MODULE_DEVICE_TABLE(pci, w6692_ids); 707b2ce6 Karsten Keil 2009-07-22 1411 :: The code at line 1404 was first introduced by commit :: 707b2ce6c1f4f1261788f2ff09ad82c35e0e6240 mISDN: Add driver for Winbond cards :: TO: Karsten Keil :: CC: Karsten Keil --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH bpf-next v2] bpftool: make libbfd optional
2018-11-12 14:02 UTC-0800 ~ Jakub Kicinski > On Mon, 12 Nov 2018 13:44:10 -0800, Stanislav Fomichev wrote: >> Make it possible to build bpftool without libbfd. libbfd and libopcodes are >> typically provided in dev/dbg packages (binutils-dev in debian) which we >> usually don't have installed on the fleet machines and we'd like a way to >> have >> bpftool version that works without installing any additional packages. >> This excludes support for disassembling jit-ted code and prints an error if >> the user tries to use these features. >> >> Tested by: >> cat > FEATURES_DUMP.bpftool <> feature-libbfd=0 >> feature-disassembler-four-args=1 >> feature-reallocarray=0 >> feature-libelf=1 >> feature-libelf-mmap=1 >> feature-bpf=1 >> EOF >> FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make >> ldd bpftool | grep libbfd >> >> Signed-off-by: Stanislav Fomichev > > Seems reasonable, thanks! > > Acked-by: Jakub Kicinski > Thanks Stanislav! There is a problem with this patch on some distributions, Ubuntu at least. Feature detection for libbfd has been used for perf before being also used with bpftool. Since commit 280e7c48c3b8 the feature needs libz and libiberty to be present on the system, otherwise the feature would not compile (and be detected) on OpenSuse. On Ubuntu, libiberty is not needed (libbfd might be statically linked against it, if I remember correctly?), which means that we are able to build bpftool as long as binutils-dev has been installed, even if libiberty-dev has not been installed. The BFD feature, in that case, will appear as “undetected”. It is a bug. But since the Makefile does not stop compilation in that case (another bug), in the end we're good. With your patch, the problem is that libbpf detection will fail on Ubuntu if libiberty-dev is not present, even though all the necessary libraries for using the JIT disassembler are available. And in that case it _will_ make a difference, since the Makefile will no more compile the libbfd-related bits. So I'm not against the idea, but we have to fix libbfd detection first. Thanks, Quentin
[Patch net-next] net: remove unused skb_send_sock()
Signed-off-by: Cong Wang --- include/linux/skbuff.h | 1 - net/core/skbuff.c | 13 - 2 files changed, 14 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 7dcfb5591dc3..9f5bd97a26bd 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3327,7 +3327,6 @@ int skb_splice_bits(struct sk_buff *skb, struct sock *sk, unsigned int offset, unsigned int flags); int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, int len); -int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len); void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to); unsigned int skb_zerocopy_headlen(const struct sk_buff *from); int skb_zerocopy(struct sk_buff *to, struct sk_buff *from, diff --git a/net/core/skbuff.c b/net/core/skbuff.c index fcb1155a00ec..cfe5a03fbdf3 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2366,19 +2366,6 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, } EXPORT_SYMBOL_GPL(skb_send_sock_locked); -/* Send skb data on a socket. */ -int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len) -{ - int ret = 0; - - lock_sock(sk); - ret = skb_send_sock_locked(sk, skb, offset, len); - release_sock(sk); - - return ret; -} -EXPORT_SYMBOL_GPL(skb_send_sock); - /** * skb_store_bits - store bits from kernel buffer to skb * @skb: destination buffer -- 2.19.1
Re: [PATCH v2] net: Add trace events for all receive exit points
On Mon, 12 Nov 2018 15:46:53 -0500 (EST) Mathieu Desnoyers wrote: > I also notice that in two cases, a "gro_result_t" is implicitly cast > to "int". I usually frown upon this kind of stuff, because it's asking > for trouble if gro_result_t typedef to something else than "int" in the > future. > > I would recommend going for two templates, one which takes a "int" > ret parameter, and the other a "gro_result_t" ret parameter. > > Or am I being too cautious ? That's more of a question for the netdev maintainers. If they think casting gro_result_t to int is fine, then I'm fine. If it breaks in the future, they need to deal with it, I don't ;-) The downside of two templates, is that the templates are the bloated part of the trace event (DEFINE_EVENT()s are light weight). They can add a couple of K to the memory foot print. -- Steve
[PATCH][net-next] net: slightly optimize eth_type_trans
netperf udp stream shows that eth_type_trans takes certain cpu, so adjust the mac address check order, and firstly check if it is device address, and only check if it is multicast address only if not the device address. After this change: To unicast, and skb dst mac is device mac, this is most of time reduce a comparision To unicast, and skb dst mac is not device mac, nothing change To multicast, increase a comparision Before: 1.03% [kernel] [k] eth_type_trans After: 0.78% [kernel] [k] eth_type_trans Signed-off-by: Zhang Yu Signed-off-by: Li RongQing --- net/ethernet/eth.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index fd8faa0dfa61..1c88f5c5d5b1 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -165,15 +165,17 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev) eth = (struct ethhdr *)skb->data; skb_pull_inline(skb, ETH_HLEN); - if (unlikely(is_multicast_ether_addr_64bits(eth->h_dest))) { - if (ether_addr_equal_64bits(eth->h_dest, dev->broadcast)) - skb->pkt_type = PACKET_BROADCAST; - else - skb->pkt_type = PACKET_MULTICAST; + if (unlikely(!ether_addr_equal_64bits(eth->h_dest, + dev->dev_addr))) { + if (unlikely(is_multicast_ether_addr_64bits(eth->h_dest))) { + if (ether_addr_equal_64bits(eth->h_dest, dev->broadcast)) + skb->pkt_type = PACKET_BROADCAST; + else + skb->pkt_type = PACKET_MULTICAST; + } else { + skb->pkt_type = PACKET_OTHERHOST; + } } - else if (unlikely(!ether_addr_equal_64bits(eth->h_dest, - dev->dev_addr))) - skb->pkt_type = PACKET_OTHERHOST; /* * Some variants of DSA tagging don't have an ethertype field -- 2.16.2
[PATCH][net-next][v2] net: remove BUG_ON from __pskb_pull_tail
if list is NULL pointer, and the following access of list will trigger panic, which is same as BUG_ON Signed-off-by: Li RongQing --- net/core/skbuff.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 396fcb3baad0..d69503d66021 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1925,8 +1925,6 @@ void *__pskb_pull_tail(struct sk_buff *skb, int delta) struct sk_buff *insp = NULL; do { - BUG_ON(!list); - if (list->len <= eat) { /* Eaten as whole. */ eat -= list->len; -- 2.16.2
Re: [iproute PATCH] man: ip-route.8: Document nexthop limit
On 11/12/18 2:21 PM, Phil Sutter wrote: > diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in > index a33ce1f0f4006..383178c11331e 100644 > --- a/man/man8/ip-route.8.in > +++ b/man/man8/ip-route.8.in > @@ -589,6 +589,13 @@ argument lists: > route reflecting its relative bandwidth or quality. > .in -8 > > +The internal buffer used in iproute2 limits the maximum number of nexthops to > +be specified in one go. If only a gateway address is given, the current > buffer > +size allows for 144 IPv6 nexthops and 253 IPv4 ones. If more are required, > they > +may be added to the existing route using > +.B "ip route append" > +command. > + That is not true for IPv4. 'ip ro append' adds a new route after the existing route - an entry that can not be hit unless all of the nexthops in the first route are down. 'ip ro prepend' adds a new entry before the existing one meaning it takes precedence over the existing entries. For IPv6, 'append' and 'prepend' both add new nexthops to the existing entry.
[Patch net-next] net: get rid of __tcp_checksum_complete()
__tcp_checksum_complete() is 100% same with __skb_checksum_complete() and there is no other caller except tcp_checksum_complete(). So, just use __skb_checksum_complete() there. Signed-off-by: Cong Wang --- include/net/tcp.h | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 4743836bed2e..b84b694e8b3d 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1315,15 +1315,10 @@ static inline __sum16 tcp_v4_check(int len, __be32 saddr, return csum_tcpudp_magic(saddr,daddr,len,IPPROTO_TCP,base); } -static inline __sum16 __tcp_checksum_complete(struct sk_buff *skb) -{ - return __skb_checksum_complete(skb); -} - static inline bool tcp_checksum_complete(struct sk_buff *skb) { return !skb_csum_unnecessary(skb) && - __tcp_checksum_complete(skb); + __skb_checksum_complete(skb); } bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb); -- 2.19.1
[PATCH net] net_sched: sch_fq: ensure maxrate fq parameter applies to EDT flows
When EDT conversion happened, fq lost the ability to enfore a maxrate for all flows. It kept it for non EDT flows. This commit restores the functionality. Tested: tc qd replace dev eth0 root fq maxrate 500Mbit netperf -P0 -H host -- -O THROUGHPUT 489.75 Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model") Signed-off-by: Eric Dumazet --- net/sched/sch_fq.c | 29 ++--- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c index 4b1af706896c07e5a0fe6d542dfcd530acdcf8f5..25a7cf6d380fd1ef5610a43a06dea488121b8206 100644 --- a/net/sched/sch_fq.c +++ b/net/sched/sch_fq.c @@ -469,22 +469,29 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch) goto begin; } prefetch(>end); - f->credit -= qdisc_pkt_len(skb); + plen = qdisc_pkt_len(skb); + f->credit -= plen; - if (ktime_to_ns(skb->tstamp) || !q->rate_enable) + if (!q->rate_enable) goto out; rate = q->flow_max_rate; - if (skb->sk) - rate = min(skb->sk->sk_pacing_rate, rate); - if (rate <= q->low_rate_threshold) { - f->credit = 0; - plen = qdisc_pkt_len(skb); - } else { - plen = max(qdisc_pkt_len(skb), q->quantum); - if (f->credit > 0) - goto out; + /* If EDT time was provided for this skb, we need to +* update f->time_next_packet only if this qdisc enforces +* a flow max rate. +*/ + if (!skb->tstamp) { + if (skb->sk) + rate = min(skb->sk->sk_pacing_rate, rate); + + if (rate <= q->low_rate_threshold) { + f->credit = 0; + } else { + plen = max(plen, q->quantum); + if (f->credit > 0) + goto out; + } } if (rate != ~0UL) { u64 len = (u64)plen * NSEC_PER_SEC; -- 2.19.1.930.g4563a0d9d0-goog
[net-next PATCH v4] net: sched: cls_flower: Classify packets using port ranges
Added support in tc flower for filtering based on port ranges. Example: 1. Match on a port range: - $ tc filter add dev enp4s0 protocol ip parent :\ prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\ action drop $ tc -s filter show dev enp4s0 parent : filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 eth_type ipv4 ip_proto tcp dst_port range 20-30 skip_hw not_in_hw action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 installed 85 sec used 3 sec Action statistics: Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0) backlog 0b 0p requeues 0 2. Match on IP address and port range: -- $ tc filter add dev enp4s0 protocol ip parent :\ prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\ skip_hw action drop $ tc -s filter show dev enp4s0 parent : filter protocol ip pref 1 flower chain 0 handle 0x2 eth_type ipv4 ip_proto tcp dst_ip 192.168.1.1 dst_port range 100-200 skip_hw not_in_hw action order 1: gact action drop random type none pass val 0 index 2 ref 1 bind 1 installed 58 sec used 2 sec Action statistics: Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0) backlog 0b 0p requeues 0 v4: 1. Added condition before setting port key. 2. Organized setting and dumping port range keys into functions and added validation of input range. v3: 1. Moved new fields in UAPI enum to the end of enum. 2. Removed couple of empty lines. v2: Addressed Jiri's comments: 1. Added separate functions for dst and src comparisons. 2. Removed endpoint enum. 3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range lookup. 4. Cleaned up fl_lookup function. Signed-off-by: Amritha Nambiar --- include/uapi/linux/pkt_cls.h |7 ++ net/sched/cls_flower.c | 155 -- 2 files changed, 156 insertions(+), 6 deletions(-) diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index 401d0c1..95d0db2 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -485,6 +485,11 @@ enum { TCA_FLOWER_IN_HW_COUNT, + TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */ + TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */ + TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */ + TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */ + __TCA_FLOWER_MAX, }; @@ -518,6 +523,8 @@ enum { TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1), }; +#define TCA_FLOWER_MASK_FLAGS_RANGE(1 << 0) /* Range-based match */ + /* Match-all classifier */ enum { diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index c6c3278..85e9f8e 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -55,6 +55,8 @@ struct fl_flow_key { struct flow_dissector_key_ip ip; struct flow_dissector_key_ip enc_ip; struct flow_dissector_key_enc_opts enc_opts; + struct flow_dissector_key_ports tp_min; + struct flow_dissector_key_ports tp_max; } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */ struct fl_flow_mask_range { @@ -65,6 +67,7 @@ struct fl_flow_mask_range { struct fl_flow_mask { struct fl_flow_key key; struct fl_flow_mask_range range; + u32 flags; struct rhash_head ht_node; struct rhashtable ht; struct rhashtable_params filter_ht_params; @@ -179,13 +182,89 @@ static void fl_clear_masked_range(struct fl_flow_key *key, memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask)); } -static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask, - struct fl_flow_key *mkey) +static bool fl_range_port_dst_cmp(struct cls_fl_filter *filter, + struct fl_flow_key *key, + struct fl_flow_key *mkey) +{ + __be16 min_mask, max_mask, min_val, max_val; + + min_mask = htons(filter->mask->key.tp_min.dst); + max_mask = htons(filter->mask->key.tp_max.dst); + min_val = htons(filter->key.tp_min.dst); + max_val = htons(filter->key.tp_max.dst); + + if (min_mask && max_mask) { + if (htons(key->tp.dst) < min_val || + htons(key->tp.dst) > max_val) + return false; + + /* skb does not have min and max values */ + mkey->tp_min.dst = filter->mkey.tp_min.dst; + mkey->tp_max.dst = filter->mkey.tp_max.dst; + } + return true; +} + +static bool fl_range_port_src_cmp(struct cls_fl_filter *filter, + struct fl_flow_key *key, + struct fl_flow_key *mkey) +{ + __be16 min_mask, max_mask, min_val, max_val; + + min_mask =
Re: [net-next PATCH v3] net: sched: cls_flower: Classify packets using port ranges
On 11/9/2018 11:14 PM, Jiri Pirko wrote: > Sat, Nov 10, 2018 at 01:11:10AM CET, amritha.namb...@intel.com wrote: > > [...] > >> @@ -1026,8 +1122,7 @@ static void fl_init_dissector(struct flow_dissector >> *dissector, >> FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4); >> FL_KEY_SET_IF_MASKED(mask, keys, cnt, >> FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6); >> -FL_KEY_SET_IF_MASKED(mask, keys, cnt, >> - FLOW_DISSECTOR_KEY_PORTS, tp); >> +FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp); > > You still need to set the key under a condition. Something like: > if (FL_KEY_IS_MASKED(mask, tp) || > FL_KEY_IS_MASKED(mask, tp_min) || > FL_KEY_IS_MASKED(mask, tp_max) > FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp); > Yes, will do. Thanks! > >> FL_KEY_SET_IF_MASKED(mask, keys, cnt, >> FLOW_DISSECTOR_KEY_IP, ip); >> FL_KEY_SET_IF_MASKED(mask, keys, cnt, > > [...] >
Re: [PATCH bpf-next] selftests/bpf: Fix uninitialized duration warning
On Fri, Nov 09, 2018 at 10:18:16AM -0800, Joe Stringer wrote: > Daniel Borkmann reports: > > test_progs.c: In function ‘main’: > test_progs.c:81:3: warning: ‘duration’ may be used uninitialized in this > function [-Wmaybe-uninitialized] >printf("%s:PASS:%s %d nsec\n", __func__, tag, duration);\ >^~ > test_progs.c:1706:8: note: ‘duration’ was declared here > __u32 duration; > ^~~~ > > Signed-off-by: Joe Stringer I can repro with my locally compiled gcc 7.3. It fixed the warning. Acked-by: Martin KaFai Lau > --- > > I'm actually not able to reproduce this with GCC 7.3 or 8.2, so I'll > rely on review to establish that this patch works as intended. > --- > tools/testing/selftests/bpf/test_progs.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/bpf/test_progs.c > b/tools/testing/selftests/bpf/test_progs.c > index 2d3c04f45530..c1e688f61061 100644 > --- a/tools/testing/selftests/bpf/test_progs.c > +++ b/tools/testing/selftests/bpf/test_progs.c > @@ -1703,7 +1703,7 @@ static void test_reference_tracking() > const char *file = "./test_sk_lookup_kern.o"; > struct bpf_object *obj; > struct bpf_program *prog; > - __u32 duration; > + __u32 duration = 0; > int err = 0; > > obj = bpf_object__open(file); > -- > 2.17.1 >
[PATCH bpf-next] bpf: libbpf: Fix bpf_program__next() API
This patch restores the behavior in commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for multi-function programs") such that bpf_program__next() does not return pseudo programs in ".text". Fixes: 0c19a9fbc9cd ("libbpf: cleanup after partial failure in bpf_object__pin") Signed-off-by: Martin KaFai Lau --- tools/lib/bpf/libbpf.c | 25 +++-- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index e827542ffa3a..a01eb9584e52 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -2193,19 +2193,25 @@ void *bpf_object__priv(struct bpf_object *obj) } static struct bpf_program * -__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i) +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, bool forward) { + size_t nr_programs = obj->nr_programs; ssize_t idx; - if (!obj->programs) + if (!nr_programs) return NULL; + if (!p) + /* Iter from the beginning */ + return forward ? >programs[0] : + >programs[nr_programs - 1]; + if (p->obj != obj) { pr_warning("error: program handler doesn't match object\n"); return NULL; } - idx = (p - obj->programs) + i; + idx = (p - obj->programs) + (forward ? 1 : -1); if (idx >= obj->nr_programs || idx < 0) return NULL; return >programs[idx]; @@ -2216,11 +,8 @@ bpf_program__next(struct bpf_program *prev, struct bpf_object *obj) { struct bpf_program *prog = prev; - if (prev == NULL) - return obj->programs; - do { - prog = __bpf_program__iter(prog, obj, 1); + prog = __bpf_program__iter(prog, obj, true); } while (prog && bpf_program__is_function_storage(prog, obj)); return prog; @@ -2231,14 +2234,8 @@ bpf_program__prev(struct bpf_program *next, struct bpf_object *obj) { struct bpf_program *prog = next; - if (next == NULL) { - if (!obj->nr_programs) - return NULL; - return obj->programs + obj->nr_programs - 1; - } - do { - prog = __bpf_program__iter(prog, obj, -1); + prog = __bpf_program__iter(prog, obj, false); } while (prog && bpf_program__is_function_storage(prog, obj)); return prog; -- 2.17.1
Re: [PATCH v5 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin
On Mon, Nov 12, 2018 at 03:29:25PM -0800, Stanislav Fomichev wrote: > On 11/12, Martin Lau wrote: > > On Mon, Nov 12, 2018 at 02:10:11PM -0800, Stanislav Fomichev wrote: > > > On 11/12, Martin Lau wrote: > > > > On Fri, Nov 09, 2018 at 08:21:41AM -0800, Stanislav Fomichev wrote: > > > > [ ... ] > > > > > @@ -1918,23 +2160,20 @@ void *bpf_object__priv(struct bpf_object *obj) > > > > > } > > > > > > > > > > static struct bpf_program * > > > > > -__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj) > > > > > +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, > > > > > int i) > > > > > { > > > > > - size_t idx; > > > > > + ssize_t idx; > > > > > > > > > > if (!obj->programs) > > > > > return NULL; > > > > > - /* First handler */ > > > > > - if (prev == NULL) > > > > > - return >programs[0]; > > > > > > > > > > - if (prev->obj != obj) { > > > > > + if (p->obj != obj) { > > > > > pr_warning("error: program handler doesn't match > > > > > object\n"); > > > > > return NULL; > > > > > } > > > > > > > > > > - idx = (prev - obj->programs) + 1; > > > > > - if (idx >= obj->nr_programs) > > > > > + idx = (p - obj->programs) + i; > > > > > + if (idx >= obj->nr_programs || idx < 0) > > > > > return NULL; > > > > > return >programs[idx]; > > > > > } > > > > > @@ -1944,8 +2183,29 @@ bpf_program__next(struct bpf_program *prev, > > > > > struct bpf_object *obj) > > > > > { > > > > > struct bpf_program *prog = prev; > > > > > > > > > > + if (prev == NULL) > > > > > + return obj->programs; > > > > > + > > > > This patch breaks the behavior introduced in > > > > commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program > > > > for multi-function programs"): > > > > "Make bpf_program__next() skip over '.text' section if object file > > > > has pseudo calls. The '.text' section is hardly a program in that > > > > case, it's more of a storage for code of functions other than main." > > > > > > > > For example, the userspace could have been doing: > > > > prog = bpf_program__next(NULL, obj); > > > > bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT); > > > > bpf_object__load(obj); > > > > > > > > For the bpf_prog.o that has pseudo calls, after this patch in bpf-next, > > > > the prog returned by bpf_program__next() could be in ".text" instead of > > > > the main bpf program. The next bpf_program__set_type() has > > > > no effect to the main program. The following bpf_object__load() > > > > will catch user in surprise with the main bpf prog in > > > > the wrong BPF_PROG_TYPE. > > > > > > Will something like the following fix your concern? (plus, assuming the > > > same for prev): > > > > > > --- a/tools/lib/bpf/libbpf.c > > > +++ b/tools/lib/bpf/libbpf.c > > > @@ -2216,8 +2216,11 @@ bpf_program__next(struct bpf_program *prev, struct > > > bpf_object *obj) > > > { > > > struct bpf_program *prog = prev; > > > > > > - if (prev == NULL) > > > - return obj->programs; > > > + if (prev == NULL) { > > > + prog = obj->programs; > > > + if (!prog || !bpf_program__is_function_storage(prog, obj)) > > > + return prog; > > > + } > > > > > > do { > > > prog = __bpf_program__iter(prog, obj, 1); > > > > > > Any suggestions for a better way to do it? > > I think that would work. The bpf_program__prev() will need the same > > treatment though... > > > > Here is my mostly untested fix to unblock my other dev works. It moves > > the very first NULL check back to __bpf_program__iter(): > I like your version and it works with my simple flow dissector test :-) Great. I will send out an offical patch. > Thanks for spotting and fixing it! > > > > From de1c89ae1768e756825a6874268b5b1686695c93 Mon Sep 17 00:00:00 2001 > > From: Martin KaFai Lau > > Date: Mon, 12 Nov 2018 14:52:39 -0800 > > Subject: [PATCH] bpf: libbpf: Fix bpf_program__next() API > > > > This patch restores the behavior in > > commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for > > multi-function programs"): > > such that bpf_program__next() does not return pseudo programs in ".text". > > > > Fixes: 0c19a9fbc9cd ("libbpf: cleanup after partial failure in > > bpf_object__pin") > > Signed-off-by: Martin KaFai Lau > > --- > > tools/lib/bpf/libbpf.c | 25 +++-- > > 1 file changed, 11 insertions(+), 14 deletions(-) > > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > > index e827542ffa3a..a01eb9584e52 100644 > > --- a/tools/lib/bpf/libbpf.c > > +++ b/tools/lib/bpf/libbpf.c > > @@ -2193,19 +2193,25 @@ void *bpf_object__priv(struct bpf_object *obj) > > } > > > > static struct bpf_program * > > -__bpf_program__iter(struct bpf_program *p, struct bpf_object
Re: [PATCH v5 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin
On 11/12, Martin Lau wrote: > On Mon, Nov 12, 2018 at 02:10:11PM -0800, Stanislav Fomichev wrote: > > On 11/12, Martin Lau wrote: > > > On Fri, Nov 09, 2018 at 08:21:41AM -0800, Stanislav Fomichev wrote: > > > [ ... ] > > > > @@ -1918,23 +2160,20 @@ void *bpf_object__priv(struct bpf_object *obj) > > > > } > > > > > > > > static struct bpf_program * > > > > -__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj) > > > > +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int > > > > i) > > > > { > > > > - size_t idx; > > > > + ssize_t idx; > > > > > > > > if (!obj->programs) > > > > return NULL; > > > > - /* First handler */ > > > > - if (prev == NULL) > > > > - return >programs[0]; > > > > > > > > - if (prev->obj != obj) { > > > > + if (p->obj != obj) { > > > > pr_warning("error: program handler doesn't match > > > > object\n"); > > > > return NULL; > > > > } > > > > > > > > - idx = (prev - obj->programs) + 1; > > > > - if (idx >= obj->nr_programs) > > > > + idx = (p - obj->programs) + i; > > > > + if (idx >= obj->nr_programs || idx < 0) > > > > return NULL; > > > > return >programs[idx]; > > > > } > > > > @@ -1944,8 +2183,29 @@ bpf_program__next(struct bpf_program *prev, > > > > struct bpf_object *obj) > > > > { > > > > struct bpf_program *prog = prev; > > > > > > > > + if (prev == NULL) > > > > + return obj->programs; > > > > + > > > This patch breaks the behavior introduced in > > > commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program > > > for multi-function programs"): > > > "Make bpf_program__next() skip over '.text' section if object file > > > has pseudo calls. The '.text' section is hardly a program in that > > > case, it's more of a storage for code of functions other than main." > > > > > > For example, the userspace could have been doing: > > > prog = bpf_program__next(NULL, obj); > > > bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT); > > > bpf_object__load(obj); > > > > > > For the bpf_prog.o that has pseudo calls, after this patch in bpf-next, > > > the prog returned by bpf_program__next() could be in ".text" instead of > > > the main bpf program. The next bpf_program__set_type() has > > > no effect to the main program. The following bpf_object__load() > > > will catch user in surprise with the main bpf prog in > > > the wrong BPF_PROG_TYPE. > > > > Will something like the following fix your concern? (plus, assuming the > > same for prev): > > > > --- a/tools/lib/bpf/libbpf.c > > +++ b/tools/lib/bpf/libbpf.c > > @@ -2216,8 +2216,11 @@ bpf_program__next(struct bpf_program *prev, struct > > bpf_object *obj) > > { > > struct bpf_program *prog = prev; > > > > - if (prev == NULL) > > - return obj->programs; > > + if (prev == NULL) { > > + prog = obj->programs; > > + if (!prog || !bpf_program__is_function_storage(prog, obj)) > > + return prog; > > + } > > > > do { > > prog = __bpf_program__iter(prog, obj, 1); > > > > Any suggestions for a better way to do it? > I think that would work. The bpf_program__prev() will need the same > treatment though... > > Here is my mostly untested fix to unblock my other dev works. It moves > the very first NULL check back to __bpf_program__iter(): I like your version and it works with my simple flow dissector test :-) Thanks for spotting and fixing it! > From de1c89ae1768e756825a6874268b5b1686695c93 Mon Sep 17 00:00:00 2001 > From: Martin KaFai Lau > Date: Mon, 12 Nov 2018 14:52:39 -0800 > Subject: [PATCH] bpf: libbpf: Fix bpf_program__next() API > > This patch restores the behavior in > commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for > multi-function programs"): > such that bpf_program__next() does not return pseudo programs in ".text". > > Fixes: 0c19a9fbc9cd ("libbpf: cleanup after partial failure in > bpf_object__pin") > Signed-off-by: Martin KaFai Lau > --- > tools/lib/bpf/libbpf.c | 25 +++-- > 1 file changed, 11 insertions(+), 14 deletions(-) > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > index e827542ffa3a..a01eb9584e52 100644 > --- a/tools/lib/bpf/libbpf.c > +++ b/tools/lib/bpf/libbpf.c > @@ -2193,19 +2193,25 @@ void *bpf_object__priv(struct bpf_object *obj) > } > > static struct bpf_program * > -__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i) > +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, bool > forward) > { > + size_t nr_programs = obj->nr_programs; > ssize_t idx; > > - if (!obj->programs) > + if (!nr_programs) > return NULL; > > + if (!p) > + /* Iter from the beginning */ > +
Re: [PATCH v5 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin
On Mon, Nov 12, 2018 at 02:10:11PM -0800, Stanislav Fomichev wrote: > On 11/12, Martin Lau wrote: > > On Fri, Nov 09, 2018 at 08:21:41AM -0800, Stanislav Fomichev wrote: > > [ ... ] > > > @@ -1918,23 +2160,20 @@ void *bpf_object__priv(struct bpf_object *obj) > > > } > > > > > > static struct bpf_program * > > > -__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj) > > > +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i) > > > { > > > - size_t idx; > > > + ssize_t idx; > > > > > > if (!obj->programs) > > > return NULL; > > > - /* First handler */ > > > - if (prev == NULL) > > > - return >programs[0]; > > > > > > - if (prev->obj != obj) { > > > + if (p->obj != obj) { > > > pr_warning("error: program handler doesn't match object\n"); > > > return NULL; > > > } > > > > > > - idx = (prev - obj->programs) + 1; > > > - if (idx >= obj->nr_programs) > > > + idx = (p - obj->programs) + i; > > > + if (idx >= obj->nr_programs || idx < 0) > > > return NULL; > > > return >programs[idx]; > > > } > > > @@ -1944,8 +2183,29 @@ bpf_program__next(struct bpf_program *prev, struct > > > bpf_object *obj) > > > { > > > struct bpf_program *prog = prev; > > > > > > + if (prev == NULL) > > > + return obj->programs; > > > + > > This patch breaks the behavior introduced in > > commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for > > multi-function programs"): > > "Make bpf_program__next() skip over '.text' section if object file > > has pseudo calls. The '.text' section is hardly a program in that > > case, it's more of a storage for code of functions other than main." > > > > For example, the userspace could have been doing: > > prog = bpf_program__next(NULL, obj); > > bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT); > > bpf_object__load(obj); > > > > For the bpf_prog.o that has pseudo calls, after this patch in bpf-next, > > the prog returned by bpf_program__next() could be in ".text" instead of > > the main bpf program. The next bpf_program__set_type() has > > no effect to the main program. The following bpf_object__load() > > will catch user in surprise with the main bpf prog in > > the wrong BPF_PROG_TYPE. > > Will something like the following fix your concern? (plus, assuming the > same for prev): > > --- a/tools/lib/bpf/libbpf.c > +++ b/tools/lib/bpf/libbpf.c > @@ -2216,8 +2216,11 @@ bpf_program__next(struct bpf_program *prev, struct > bpf_object *obj) > { > struct bpf_program *prog = prev; > > - if (prev == NULL) > - return obj->programs; > + if (prev == NULL) { > + prog = obj->programs; > + if (!prog || !bpf_program__is_function_storage(prog, obj)) > + return prog; > + } > > do { > prog = __bpf_program__iter(prog, obj, 1); > > Any suggestions for a better way to do it? I think that would work. The bpf_program__prev() will need the same treatment though... Here is my mostly untested fix to unblock my other dev works. It moves the very first NULL check back to __bpf_program__iter(): >From de1c89ae1768e756825a6874268b5b1686695c93 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Mon, 12 Nov 2018 14:52:39 -0800 Subject: [PATCH] bpf: libbpf: Fix bpf_program__next() API This patch restores the behavior in commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for multi-function programs"): such that bpf_program__next() does not return pseudo programs in ".text". Fixes: 0c19a9fbc9cd ("libbpf: cleanup after partial failure in bpf_object__pin") Signed-off-by: Martin KaFai Lau --- tools/lib/bpf/libbpf.c | 25 +++-- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index e827542ffa3a..a01eb9584e52 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -2193,19 +2193,25 @@ void *bpf_object__priv(struct bpf_object *obj) } static struct bpf_program * -__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i) +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, bool forward) { + size_t nr_programs = obj->nr_programs; ssize_t idx; - if (!obj->programs) + if (!nr_programs) return NULL; + if (!p) + /* Iter from the beginning */ + return forward ? >programs[0] : + >programs[nr_programs - 1]; + if (p->obj != obj) { pr_warning("error: program handler doesn't match object\n"); return NULL; } - idx = (p - obj->programs) + i; + idx = (p - obj->programs) + (forward ? 1 : -1); if (idx >= obj->nr_programs || idx < 0) return NULL; return >programs[idx]; @@ -2216,11 +,8 @@ bpf_program__next(struct bpf_program
Hello, this is the second times am sending you this mail and you refused to reply to my email why?
[PATCH net-next 01/13] nfp: abm: rename qdiscs -> red_qdiscs
Rename qdiscs member to red_qdiscs. One of following patches will use the name qdiscs for tracking all qdisc types. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- drivers/net/ethernet/netronome/nfp/abm/main.c | 11 +++--- drivers/net/ethernet/netronome/nfp/abm/main.h | 4 +- .../net/ethernet/netronome/nfp/abm/qdisc.c| 38 ++- 3 files changed, 29 insertions(+), 24 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c b/drivers/net/ethernet/netronome/nfp/abm/main.c index 3d15de0ae271..5e749602989e 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.c +++ b/drivers/net/ethernet/netronome/nfp/abm/main.c @@ -309,9 +309,10 @@ nfp_abm_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, unsigned int id) alink->id = id; alink->parent = TC_H_ROOT; alink->total_queues = alink->vnic->max_rx_rings; - alink->qdiscs = kvcalloc(alink->total_queues, sizeof(*alink->qdiscs), -GFP_KERNEL); - if (!alink->qdiscs) { + alink->red_qdiscs = kvcalloc(alink->total_queues, +sizeof(*alink->red_qdiscs), +GFP_KERNEL); + if (!alink->red_qdiscs) { err = -ENOMEM; goto err_free_alink; } @@ -331,7 +332,7 @@ nfp_abm_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, unsigned int id) return 0; err_free_qdiscs: - kvfree(alink->qdiscs); + kvfree(alink->red_qdiscs); err_free_alink: kfree(alink); return err; @@ -342,7 +343,7 @@ static void nfp_abm_vnic_free(struct nfp_app *app, struct nfp_net *nn) struct nfp_abm_link *alink = nn->app_priv; nfp_abm_kill_reprs(alink->abm, alink); - kvfree(alink->qdiscs); + kvfree(alink->red_qdiscs); kfree(alink); } diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h b/drivers/net/ethernet/netronome/nfp/abm/main.h index 3774c063e419..a09090004f82 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.h +++ b/drivers/net/ethernet/netronome/nfp/abm/main.h @@ -82,7 +82,7 @@ struct nfp_red_qdisc { * @total_queues: number of PF queues * @parent:handle of expected parent, i.e. handle of MQ, or TC_H_ROOT * @num_qdiscs:number of currently used qdiscs - * @qdiscs:array of qdiscs + * @red_qdiscs:array of qdiscs */ struct nfp_abm_link { struct nfp_abm *abm; @@ -92,7 +92,7 @@ struct nfp_abm_link { unsigned int total_queues; u32 parent; unsigned int num_qdiscs; - struct nfp_red_qdisc *qdiscs; + struct nfp_red_qdisc *red_qdiscs; }; int nfp_abm_setup_tc_red(struct net_device *netdev, struct nfp_abm_link *alink, diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c index bb05f9ee0401..abda392880e0 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c +++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c @@ -18,7 +18,8 @@ __nfp_abm_reset_root(struct net_device *netdev, struct nfp_abm_link *alink, int ret; ret = nfp_abm_ctrl_set_all_q_lvls(alink, init_val); - memset(alink->qdiscs, 0, sizeof(*alink->qdiscs) * alink->num_qdiscs); + memset(alink->red_qdiscs, 0, + sizeof(*alink->red_qdiscs) * alink->num_qdiscs); alink->parent = handle; alink->num_qdiscs = qs; @@ -46,7 +47,8 @@ nfp_abm_red_find(struct nfp_abm_link *alink, struct tc_red_qopt_offload *opt) else return -EOPNOTSUPP; - if (i >= alink->num_qdiscs || opt->handle != alink->qdiscs[i].handle) + if (i >= alink->num_qdiscs || + opt->handle != alink->red_qdiscs[i].handle) return -EOPNOTSUPP; return i; @@ -59,7 +61,7 @@ nfp_abm_red_destroy(struct net_device *netdev, struct nfp_abm_link *alink, unsigned int i; for (i = 0; i < alink->num_qdiscs; i++) - if (handle == alink->qdiscs[i].handle) + if (handle == alink->red_qdiscs[i].handle) break; if (i == alink->num_qdiscs) return; @@ -68,7 +70,7 @@ nfp_abm_red_destroy(struct net_device *netdev, struct nfp_abm_link *alink, nfp_abm_reset_root(netdev, alink, TC_H_ROOT, 0); } else { nfp_abm_ctrl_set_q_lvl(alink, i, NFP_ABM_LVL_INFINITY); - memset(>qdiscs[i], 0, sizeof(*alink->qdiscs)); + memset(>red_qdiscs[i], 0, sizeof(*alink->red_qdiscs)); } } @@ -139,37 +141,39 @@ nfp_abm_red_replace(struct net_device *netdev, struct nfp_abm_link *alink, return -EINVAL; } /* Set the handle to try full clean up, in case IO failed */ - alink->qdiscs[i].handle = opt->handle; + alink->red_qdiscs[i].handle = opt->handle; if (err) goto err_destroy; if (opt->parent == TC_H_ROOT) -
[PATCH net-next 05/13] nfp: abm: remember which Qdisc is root
Keep track of which Qdisc is currently root. We need to implement TC_SETUP_ROOT_QDISC handling, and for completeness also clear the root Qdisc pointer when it's freed. TC_SETUP_ROOT_QDISC isn't always sent when device is dismantled. Remembering the root Qdisc will allow us to build the entire hierarchy in following patches. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- drivers/net/ethernet/netronome/nfp/abm/main.c | 2 ++ drivers/net/ethernet/netronome/nfp/abm/main.h | 4 drivers/net/ethernet/netronome/nfp/abm/qdisc.c | 13 + 3 files changed, 19 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c b/drivers/net/ethernet/netronome/nfp/abm/main.c index 35f3a6054569..da5394886a78 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.c +++ b/drivers/net/ethernet/netronome/nfp/abm/main.c @@ -37,6 +37,8 @@ nfp_abm_setup_tc(struct nfp_app *app, struct net_device *netdev, return -EOPNOTSUPP; switch (type) { + case TC_SETUP_ROOT_QDISC: + return nfp_abm_setup_root(netdev, repr->app_priv, type_data); case TC_SETUP_QDISC_MQ: return nfp_abm_setup_tc_mq(netdev, repr->app_priv, type_data); case TC_SETUP_QDISC_RED: diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h b/drivers/net/ethernet/netronome/nfp/abm/main.h index 64cb5ebcf80e..48d519989886 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.h +++ b/drivers/net/ethernet/netronome/nfp/abm/main.h @@ -116,6 +116,7 @@ struct nfp_red_qdisc { * @parent:handle of expected parent, i.e. handle of MQ, or TC_H_ROOT * @num_qdiscs:number of currently used qdiscs * @red_qdiscs:array of qdiscs + * @root_qdisc:pointer to the current root of the Qdisc hierarchy * @qdiscs:all qdiscs recorded by major part of the handle */ struct nfp_abm_link { @@ -127,9 +128,12 @@ struct nfp_abm_link { u32 parent; unsigned int num_qdiscs; struct nfp_red_qdisc *red_qdiscs; + struct nfp_qdisc *root_qdisc; struct radix_tree_root qdiscs; }; +int nfp_abm_setup_root(struct net_device *netdev, struct nfp_abm_link *alink, + struct tc_root_qopt_offload *opt); int nfp_abm_setup_tc_red(struct net_device *netdev, struct nfp_abm_link *alink, struct tc_red_qopt_offload *opt); int nfp_abm_setup_tc_mq(struct net_device *netdev, struct nfp_abm_link *alink, diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c index a6f95924656d..ba6ce2d1eda2 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c +++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c @@ -143,6 +143,9 @@ nfp_abm_qdisc_destroy(struct net_device *netdev, struct nfp_abm_link *alink, return; nfp_abm_qdisc_free(netdev, alink, qdisc); + + if (alink->root_qdisc == qdisc) + alink->root_qdisc = NULL; } static void @@ -450,3 +453,13 @@ int nfp_abm_setup_tc_mq(struct net_device *netdev, struct nfp_abm_link *alink, return -EOPNOTSUPP; } } + +int nfp_abm_setup_root(struct net_device *netdev, struct nfp_abm_link *alink, + struct tc_root_qopt_offload *opt) +{ + if (opt->ingress) + return -EOPNOTSUPP; + alink->root_qdisc = nfp_abm_qdisc_find(alink, opt->handle); + + return 0; +} -- 2.17.1
[PATCH net-next 10/13] net: sched: red: notify drivers about RED's limit parameter
RED qdisc's limit parameter changes the behaviour of the qdisc, for instance if it's set to 0 qdisc will drop all the packets. When replace operation happens and parameter is set to non-0 a new fifo qdisc will be instantiated and replace the old child qdisc which will be destroyed. Drivers need to know the parameter, even if they don't impose the actual limit to be able to reliably reconstruct the Qdisc hierarchy. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- include/net/pkt_cls.h | 1 + net/sched/sch_red.c | 1 + 2 files changed, 2 insertions(+) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 5d31820b7e80..c497ada7f591 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -850,6 +850,7 @@ struct tc_red_qopt_offload_params { u32 min; u32 max; u32 probability; + u32 limit; bool is_ecn; bool is_harddrop; struct gnet_stats_queue *qstats; diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index 4b5ca172ee2d..9df9942340ea 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -166,6 +166,7 @@ static int red_offload(struct Qdisc *sch, bool enable) opt.set.min = q->parms.qth_min >> q->parms.Wlog; opt.set.max = q->parms.qth_max >> q->parms.Wlog; opt.set.probability = q->parms.max_P; + opt.set.limit = q->limit; opt.set.is_ecn = red_use_ecn(q); opt.set.is_harddrop = red_use_harddrop(q); opt.set.qstats = >qstats; -- 2.17.1
[PATCH net-next 07/13] net: sched: red: offload a graft notification
Drivers offloading Qdiscs should have reasonable certainty the offloaded behaviour matches the SW path. This is impossible if the driver does not know about all Qdiscs or when Qdiscs move and are reused. Send a graft notification from RED. The drivers are expected to simply stop offloading the Qdisc, if a non-standard child is ever grafted onto it. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- include/net/pkt_cls.h | 2 ++ net/sched/sch_red.c | 17 + 2 files changed, 19 insertions(+) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index fa31d034231d..01f2802b7aee 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -834,6 +834,7 @@ enum tc_red_command { TC_RED_DESTROY, TC_RED_STATS, TC_RED_XSTATS, + TC_RED_GRAFT, }; struct tc_red_qopt_offload_params { @@ -853,6 +854,7 @@ struct tc_red_qopt_offload { struct tc_red_qopt_offload_params set; struct tc_qopt_offload_stats stats; struct red_stats *xstats; + u32 child_handle; }; }; diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index a1d08bdd9357..4b5ca172ee2d 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -367,6 +367,21 @@ static int red_dump_class(struct Qdisc *sch, unsigned long cl, return 0; } +static void red_graft_offload(struct Qdisc *sch, + struct Qdisc *new, struct Qdisc *old, + struct netlink_ext_ack *extack) +{ + struct tc_red_qopt_offload graft_offload = { + .handle = sch->handle, + .parent = sch->parent, + .child_handle = new->handle, + .command= TC_RED_GRAFT, + }; + + qdisc_offload_graft_helper(qdisc_dev(sch), sch, new, old, + TC_SETUP_QDISC_RED, _offload, extack); +} + static int red_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new, struct Qdisc **old, struct netlink_ext_ack *extack) { @@ -376,6 +391,8 @@ static int red_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new, new = _qdisc; *old = qdisc_replace(sch, new, >qdisc); + + red_graft_offload(sch, new, *old, extack); return 0; } -- 2.17.1
[PATCH net-next 03/13] nfp: abm: track all offload-enabled qdiscs
Allocate an object corresponding to any offloaded qdisc we are informed about by the kernel. Not only the qdiscs we have a chance of offloading. The count of created objects will be used to decide whether the ethtool TC offload can be disabled, since otherwise we may miss destroy commands. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- drivers/net/ethernet/netronome/nfp/abm/main.c | 2 + drivers/net/ethernet/netronome/nfp/abm/main.h | 23 .../net/ethernet/netronome/nfp/abm/qdisc.c| 111 +- 3 files changed, 132 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c b/drivers/net/ethernet/netronome/nfp/abm/main.c index c6ae0ac9a154..35f3a6054569 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.c +++ b/drivers/net/ethernet/netronome/nfp/abm/main.c @@ -329,6 +329,7 @@ nfp_abm_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, unsigned int id) nfp_abm_vnic_set_mac(app->pf, abm, nn, id); nfp_abm_ctrl_read_params(alink); + INIT_RADIX_TREE(>qdiscs, GFP_KERNEL); return 0; @@ -344,6 +345,7 @@ static void nfp_abm_vnic_free(struct nfp_app *app, struct nfp_net *nn) struct nfp_abm_link *alink = nn->app_priv; nfp_abm_kill_reprs(alink->abm, alink); + WARN(!radix_tree_empty(>qdiscs), "left over qdiscs\n"); kvfree(alink->red_qdiscs); kfree(alink); } diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h b/drivers/net/ethernet/netronome/nfp/abm/main.h index 15732ad7c202..64cb5ebcf80e 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.h +++ b/drivers/net/ethernet/netronome/nfp/abm/main.h @@ -5,6 +5,7 @@ #define __NFP_ABM_H__ 1 #include +#include #include #include @@ -71,6 +72,26 @@ struct nfp_alink_xstats { u64 pdrop; }; +enum nfp_qdisc_type { + NFP_QDISC_NONE = 0, + NFP_QDISC_MQ, + NFP_QDISC_RED, +}; + +/** + * struct nfp_qdisc - tracked TC Qdisc + * @netdev:netdev on which Qdisc was created + * @type: Qdisc type + * @handle:handle of this Qdisc + * @parent_handle: handle of the parent (unreliable if Qdisc was grafted) + */ +struct nfp_qdisc { + struct net_device *netdev; + enum nfp_qdisc_type type; + u32 handle; + u32 parent_handle; +}; + /** * struct nfp_red_qdisc - representation of single RED Qdisc * @handle:handle of currently offloaded RED Qdisc @@ -95,6 +116,7 @@ struct nfp_red_qdisc { * @parent:handle of expected parent, i.e. handle of MQ, or TC_H_ROOT * @num_qdiscs:number of currently used qdiscs * @red_qdiscs:array of qdiscs + * @qdiscs:all qdiscs recorded by major part of the handle */ struct nfp_abm_link { struct nfp_abm *abm; @@ -105,6 +127,7 @@ struct nfp_abm_link { u32 parent; unsigned int num_qdiscs; struct nfp_red_qdisc *red_qdiscs; + struct radix_tree_root qdiscs; }; int nfp_abm_setup_tc_red(struct net_device *netdev, struct nfp_abm_link *alink, diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c index abb0a24c7fac..a6f95924656d 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c +++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c @@ -63,17 +63,97 @@ static void nfp_abm_offload_update(struct nfp_abm *abm) } static void -__nfp_abm_reset_root(struct net_device *netdev, struct nfp_abm_link *alink, -u32 handle, unsigned int qs, u32 init_val) +nfp_abm_qdisc_free(struct net_device *netdev, struct nfp_abm_link *alink, + struct nfp_qdisc *qdisc) { struct nfp_port *port = nfp_port_from_netdev(netdev); + if (!qdisc) + return; + WARN_ON(radix_tree_delete(>qdiscs, + TC_H_MAJ(qdisc->handle)) != qdisc); + kfree(qdisc); + + port->tc_offload_cnt--; +} + +static struct nfp_qdisc * +nfp_abm_qdisc_alloc(struct net_device *netdev, struct nfp_abm_link *alink, + enum nfp_qdisc_type type, u32 parent_handle, u32 handle) +{ + struct nfp_port *port = nfp_port_from_netdev(netdev); + struct nfp_qdisc *qdisc; + int err; + + qdisc = kzalloc(sizeof(*qdisc), GFP_KERNEL); + if (!qdisc) + return NULL; + + qdisc->netdev = netdev; + qdisc->type = type; + qdisc->parent_handle = parent_handle; + qdisc->handle = handle; + + err = radix_tree_insert(>qdiscs, TC_H_MAJ(qdisc->handle), qdisc); + if (err) { + nfp_err(alink->abm->app->cpp, + "Qdisc insertion into radix tree failed: %d\n", err); + goto err_free_qdisc; + } + + port->tc_offload_cnt++; + return qdisc; + +err_free_qdisc: + kfree(qdisc); + return NULL; +} + +static struct nfp_qdisc * +nfp_abm_qdisc_find(struct nfp_abm_link *alink, u32 handle) +{ + return
[PATCH net-next 04/13] net: sched: provide notification for graft on root
Drivers are currently not notified when a Qdisc is grafted as root. This requires special casing Qdiscs added with parent = TC_H_ROOT in the driver. Also there is no notification sent to the driver when an existing Qdisc is grafted as root. Add this very simple notifications, drivers should now be able to track their Qdisc tree fully. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- include/linux/netdevice.h | 1 + include/net/pkt_cls.h | 10 ++ net/sched/sch_api.c | 17 + 3 files changed, 28 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 487fa5e0e165..97b4233120e4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -845,6 +845,7 @@ enum tc_setup_type { TC_SETUP_QDISC_PRIO, TC_SETUP_QDISC_MQ, TC_SETUP_QDISC_ETF, + TC_SETUP_ROOT_QDISC, }; /* These structures hold the attributes of bpf state that are being passed diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index f6c0cd29dea4..fa31d034231d 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -889,4 +889,14 @@ struct tc_prio_qopt_offload { }; }; +enum tc_root_command { + TC_ROOT_GRAFT, +}; + +struct tc_root_qopt_offload { + enum tc_root_command command; + u32 handle; + bool ingress; +}; + #endif diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index f55bc50cd0a9..9c88cec7e8a2 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -860,6 +860,21 @@ void qdisc_offload_graft_helper(struct net_device *dev, struct Qdisc *sch, } EXPORT_SYMBOL(qdisc_offload_graft_helper); +static void qdisc_offload_graft_root(struct net_device *dev, +struct Qdisc *new, struct Qdisc *old, +struct netlink_ext_ack *extack) +{ + struct tc_root_qopt_offload graft_offload = { + .command= TC_ROOT_GRAFT, + .handle = new ? new->handle : 0, + .ingress= (new && new->flags & TCQ_F_INGRESS) || + (old && old->flags & TCQ_F_INGRESS), + }; + + qdisc_offload_graft_helper(dev, NULL, new, old, + TC_SETUP_ROOT_QDISC, _offload, extack); +} + static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid, u32 portid, u32 seq, u16 flags, int event) { @@ -1026,6 +1041,8 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent, if (dev->flags & IFF_UP) dev_deactivate(dev); + qdisc_offload_graft_root(dev, new, old, extack); + if (new && new->ops->attach) goto skip; -- 2.17.1
[PATCH net-next 06/13] nfp: abm: allocate Qdisc child table
To keep track of Qdisc hierarchy allocate a table for children for each Qdisc. RED Qdisc can only have one child. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- drivers/net/ethernet/netronome/nfp/abm/main.h | 6 + .../net/ethernet/netronome/nfp/abm/qdisc.c| 25 +-- 2 files changed, 24 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h b/drivers/net/ethernet/netronome/nfp/abm/main.h index 48d519989886..adffa36981e0 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.h +++ b/drivers/net/ethernet/netronome/nfp/abm/main.h @@ -84,12 +84,18 @@ enum nfp_qdisc_type { * @type: Qdisc type * @handle:handle of this Qdisc * @parent_handle: handle of the parent (unreliable if Qdisc was grafted) + * @use_cnt: number of attachment points in the hierarchy + * @num_children: current size of the @children array + * @children: pointers to children */ struct nfp_qdisc { struct net_device *netdev; enum nfp_qdisc_type type; u32 handle; u32 parent_handle; + unsigned int use_cnt; + unsigned int num_children; + struct nfp_qdisc **children; }; /** diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c index ba6ce2d1eda2..3ecb63060429 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c +++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c @@ -72,6 +72,8 @@ nfp_abm_qdisc_free(struct net_device *netdev, struct nfp_abm_link *alink, return; WARN_ON(radix_tree_delete(>qdiscs, TC_H_MAJ(qdisc->handle)) != qdisc); + + kfree(qdisc->children); kfree(qdisc); port->tc_offload_cnt--; @@ -79,7 +81,8 @@ nfp_abm_qdisc_free(struct net_device *netdev, struct nfp_abm_link *alink, static struct nfp_qdisc * nfp_abm_qdisc_alloc(struct net_device *netdev, struct nfp_abm_link *alink, - enum nfp_qdisc_type type, u32 parent_handle, u32 handle) + enum nfp_qdisc_type type, u32 parent_handle, u32 handle, + unsigned int children) { struct nfp_port *port = nfp_port_from_netdev(netdev); struct nfp_qdisc *qdisc; @@ -89,21 +92,28 @@ nfp_abm_qdisc_alloc(struct net_device *netdev, struct nfp_abm_link *alink, if (!qdisc) return NULL; + qdisc->children = kcalloc(children, sizeof(void *), GFP_KERNEL); + if (!qdisc->children) + goto err_free_qdisc; + qdisc->netdev = netdev; qdisc->type = type; qdisc->parent_handle = parent_handle; qdisc->handle = handle; + qdisc->num_children = children; err = radix_tree_insert(>qdiscs, TC_H_MAJ(qdisc->handle), qdisc); if (err) { nfp_err(alink->abm->app->cpp, "Qdisc insertion into radix tree failed: %d\n", err); - goto err_free_qdisc; + goto err_free_child_tbl; } port->tc_offload_cnt++; return qdisc; +err_free_child_tbl: + kfree(qdisc->children); err_free_qdisc: kfree(qdisc); return NULL; @@ -118,7 +128,7 @@ nfp_abm_qdisc_find(struct nfp_abm_link *alink, u32 handle) static int nfp_abm_qdisc_replace(struct net_device *netdev, struct nfp_abm_link *alink, enum nfp_qdisc_type type, u32 parent_handle, u32 handle, - struct nfp_qdisc **qdisc) + unsigned int children, struct nfp_qdisc **qdisc) { *qdisc = nfp_abm_qdisc_find(alink, handle); if (*qdisc) { @@ -127,8 +137,8 @@ nfp_abm_qdisc_replace(struct net_device *netdev, struct nfp_abm_link *alink, return 0; } - *qdisc = nfp_abm_qdisc_alloc(netdev, alink, type, parent_handle, -handle); + *qdisc = nfp_abm_qdisc_alloc(netdev, alink, type, parent_handle, handle, +children); return *qdisc ? 0 : -ENOMEM; } @@ -248,7 +258,7 @@ nfp_abm_red_replace(struct net_device *netdev, struct nfp_abm_link *alink, int ret; ret = nfp_abm_qdisc_replace(netdev, alink, NFP_QDISC_RED, opt->parent, - opt->handle, ); + opt->handle, 1, ); i = nfp_abm_red_find(alink, opt); existing = i >= 0; @@ -428,7 +438,8 @@ nfp_abm_mq_create(struct net_device *netdev, struct nfp_abm_link *alink, struct nfp_qdisc *qdisc; return nfp_abm_qdisc_replace(netdev, alink, NFP_QDISC_MQ, -TC_H_ROOT, opt->handle, ); +TC_H_ROOT, opt->handle, +alink->total_queues, ); } int nfp_abm_setup_tc_mq(struct net_device *netdev, struct nfp_abm_link *alink, -- 2.17.1
[PATCH net-next 12/13] nfp: abm: save RED's parameters
Use the new driver Qdisc structure to keep track of parameters of RED Qdiscs. This way as the Qdisc moves around in the hierarchy we will be able to configure the HW appropriately. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- drivers/net/ethernet/netronome/nfp/abm/main.h | 14 ++ drivers/net/ethernet/netronome/nfp/abm/qdisc.c | 5 - 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h b/drivers/net/ethernet/netronome/nfp/abm/main.h index daca93e90099..d0d85f82bd1c 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.h +++ b/drivers/net/ethernet/netronome/nfp/abm/main.h @@ -89,6 +89,11 @@ enum nfp_qdisc_type { * @use_cnt: number of attachment points in the hierarchy * @num_children: current size of the @children array * @children: pointers to children + * + * @params_ok: parameters of this Qdisc are OK for offload + * + * @red: RED Qdisc specific parameters and state + * @red.threshold: ECN marking threshold */ struct nfp_qdisc { struct net_device *netdev; @@ -98,6 +103,15 @@ struct nfp_qdisc { unsigned int use_cnt; unsigned int num_children; struct nfp_qdisc **children; + + bool params_ok; + + union { + /* TC_SETUP_QDISC_RED */ + struct { + u32 threshold; + } red; + }; }; /** diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c index 1b3c0b5b52bf..fb68038ec1da 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c +++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c @@ -375,7 +375,10 @@ nfp_abm_red_replace(struct net_device *netdev, struct nfp_abm_link *alink, qdisc->children[0] = NFP_QDISC_UNTRACKED; } - if (!nfp_abm_red_check_params(alink, opt)) { + qdisc->params_ok = nfp_abm_red_check_params(alink, opt); + if (qdisc->params_ok) { + qdisc->red.threshold = opt->set.min; + } else { err = -EINVAL; goto err_destroy; } -- 2.17.1
[PATCH net-next 00/13] nfp: abm: track all Qdiscs
Hi! Our Qdisc offload so far has been very simplistic. We held and array of marking thresholds and statistics sized to the number of PF queues. This was sufficient since the only configuration we supported was single layer of RED Qdiscs (on top of MQ or not, but MQ isn't really about queuing). As we move to add more Qdiscs it's time to actually try to track the full Qdisc hierarchy. This allows us to make sure our offloaded configuration reflects the SW path better. We add graft notifications to MQ and RED (PRIO already sends them) to allow drivers offloading those to learn how Qdiscs are linked. MQ graft gives us the obvious advantage of being able to track when Qdiscs are shared or moved. It seems unlikely HW would offload RED's child Qdiscs but since the behaviour would change based on linked child we should stop offloading REDs with modified child. RED will also handle the child differently during reconfig when limit parameter is set - so we have to inform the drivers about the limit, and have them reset the child state when appropriate. The NFP driver will now allocate a structure to track each Qdisc and link it to its children. We will also maintain a shadow copy of threshold settings - to save device writes and make it easier to apply defaults when config is re-evaluated. Jakub Kicinski (13): nfp: abm: rename qdiscs -> red_qdiscs nfp: abm: keep track of all RED thresholds nfp: abm: track all offload-enabled qdiscs net: sched: provide notification for graft on root nfp: abm: remember which Qdisc is root nfp: abm: allocate Qdisc child table net: sched: red: offload a graft notification net: sched: mq: offload a graft notification nfp: abm: build full Qdisc hierarchy based on graft notifications net: sched: red: notify drivers about RED's limit parameter nfp: abm: reset RED's child based on limit nfp: abm: save RED's parameters nfp: abm: restructure Qdisc handling drivers/net/ethernet/netronome/nfp/abm/ctrl.c | 105 +-- drivers/net/ethernet/netronome/nfp/abm/main.c | 42 +- drivers/net/ethernet/netronome/nfp/abm/main.h | 108 ++- .../net/ethernet/netronome/nfp/abm/qdisc.c| 712 +- include/linux/netdevice.h | 1 + include/net/pkt_cls.h | 24 +- net/sched/sch_api.c | 17 + net/sched/sch_mq.c| 9 + net/sched/sch_red.c | 18 + 9 files changed, 740 insertions(+), 296 deletions(-) -- 2.17.1
[PATCH net-next 02/13] nfp: abm: keep track of all RED thresholds
Instead of writing the threshold out when Qdisc is configured and not remembering it move to a scheme where we remember all thresholds. When configuration changes parse the offloaded Qdiscs and set thresholds appropriately. This will help future extensions. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- drivers/net/ethernet/netronome/nfp/abm/ctrl.c | 32 +++ drivers/net/ethernet/netronome/nfp/abm/main.c | 25 +- drivers/net/ethernet/netronome/nfp/abm/main.h | 16 +++- .../net/ethernet/netronome/nfp/abm/qdisc.c| 88 ++- 4 files changed, 120 insertions(+), 41 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c index 3c661f422688..564427e8a6e8 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c +++ b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c @@ -71,35 +71,37 @@ nfp_abm_ctrl_stat_all(struct nfp_abm_link *alink, const struct nfp_rtsym *sym, return 0; } -int nfp_abm_ctrl_set_q_lvl(struct nfp_abm_link *alink, unsigned int i, u32 val) +int __nfp_abm_ctrl_set_q_lvl(struct nfp_abm *abm, unsigned int id, u32 val) { - struct nfp_cpp *cpp = alink->abm->app->cpp; + struct nfp_cpp *cpp = abm->app->cpp; u64 sym_offset; int err; - sym_offset = (alink->queue_base + i) * NFP_QLVL_STRIDE + NFP_QLVL_THRS; - err = __nfp_rtsym_writel(cpp, alink->abm->q_lvls, 4, 0, -sym_offset, val); + __clear_bit(id, abm->threshold_undef); + if (abm->thresholds[id] == val) + return 0; + + sym_offset = id * NFP_QLVL_STRIDE + NFP_QLVL_THRS; + err = __nfp_rtsym_writel(cpp, abm->q_lvls, 4, 0, sym_offset, val); if (err) { - nfp_err(cpp, "RED offload setting level failed on vNIC %d queue %d\n", - alink->id, i); + nfp_err(cpp, + "RED offload setting level failed on subqueue %d\n", + id); return err; } + abm->thresholds[id] = val; return 0; } -int nfp_abm_ctrl_set_all_q_lvls(struct nfp_abm_link *alink, u32 val) +int nfp_abm_ctrl_set_q_lvl(struct nfp_abm_link *alink, unsigned int queue, + u32 val) { - int i, err; + unsigned int threshold; - for (i = 0; i < alink->vnic->max_rx_rings; i++) { - err = nfp_abm_ctrl_set_q_lvl(alink, i, val); - if (err) - return err; - } + threshold = alink->queue_base + queue; - return 0; + return __nfp_abm_ctrl_set_q_lvl(alink->abm, threshold, val); } u64 nfp_abm_ctrl_stat_non_sto(struct nfp_abm_link *alink, unsigned int i) diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c b/drivers/net/ethernet/netronome/nfp/abm/main.c index 5e749602989e..c6ae0ac9a154 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.c +++ b/drivers/net/ethernet/netronome/nfp/abm/main.c @@ -2,6 +2,7 @@ /* Copyright (C) 2018 Netronome Systems, Inc. */ #include +#include #include #include #include @@ -399,6 +400,7 @@ static int nfp_abm_init(struct nfp_app *app) struct nfp_pf *pf = app->pf; struct nfp_reprs *reprs; struct nfp_abm *abm; + unsigned int i; int err; if (!pf->eth_tbl) { @@ -425,15 +427,28 @@ static int nfp_abm_init(struct nfp_app *app) if (err) goto err_free_abm; + err = -ENOMEM; + abm->num_thresholds = NFP_NET_MAX_RX_RINGS; + abm->threshold_undef = bitmap_zalloc(abm->num_thresholds, GFP_KERNEL); + if (!abm->threshold_undef) + goto err_free_abm; + + abm->thresholds = kvcalloc(abm->num_thresholds, + sizeof(*abm->thresholds), GFP_KERNEL); + if (!abm->thresholds) + goto err_free_thresh_umap; + for (i = 0; i < NFP_NET_MAX_RX_RINGS; i++) + __nfp_abm_ctrl_set_q_lvl(abm, i, NFP_ABM_LVL_INFINITY); + /* We start in legacy mode, make sure advanced queuing is disabled */ err = nfp_abm_ctrl_qm_disable(abm); if (err) - goto err_free_abm; + goto err_free_thresh; err = -ENOMEM; reprs = nfp_reprs_alloc(pf->max_data_vnics); if (!reprs) - goto err_free_abm; + goto err_free_thresh; RCU_INIT_POINTER(app->reprs[NFP_REPR_TYPE_PHYS_PORT], reprs); reprs = nfp_reprs_alloc(pf->max_data_vnics); @@ -445,6 +460,10 @@ static int nfp_abm_init(struct nfp_app *app) err_free_phys: nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_PHYS_PORT); +err_free_thresh: + kvfree(abm->thresholds); +err_free_thresh_umap: + bitmap_free(abm->threshold_undef); err_free_abm: kfree(abm); app->priv = NULL; @@ -458,6 +477,8 @@ static void nfp_abm_clean(struct nfp_app *app)
[PATCH net-next 13/13] nfp: abm: restructure Qdisc handling
In preparation of handling more Qdisc types switch to a different offload strategy. We have now recreated the Qdisc hierarchy in the driver. Every time the hierarchy changes parse it, and update the configuration of the HW accordingly. While at it drop the support of pretending that we can instantiate a single queue on a multi-queue device in HW/FW. MQ is now required, and each queue will have its own instance of RED. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- drivers/net/ethernet/netronome/nfp/abm/ctrl.c | 73 --- drivers/net/ethernet/netronome/nfp/abm/main.c | 14 +- drivers/net/ethernet/netronome/nfp/abm/main.h | 57 +- .../net/ethernet/netronome/nfp/abm/qdisc.c| 548 ++ 4 files changed, 340 insertions(+), 352 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c index 564427e8a6e8..1629b07f727b 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c +++ b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c @@ -50,27 +50,6 @@ nfp_abm_ctrl_stat(struct nfp_abm_link *alink, const struct nfp_rtsym *sym, return 0; } -static int -nfp_abm_ctrl_stat_all(struct nfp_abm_link *alink, const struct nfp_rtsym *sym, - unsigned int stride, unsigned int offset, bool is_u64, - u64 *res) -{ - u64 val, sum = 0; - unsigned int i; - int err; - - for (i = 0; i < alink->vnic->max_rx_rings; i++) { - err = nfp_abm_ctrl_stat(alink, sym, stride, offset, i, - is_u64, ); - if (err) - return err; - sum += val; - } - - *res = sum; - return 0; -} - int __nfp_abm_ctrl_set_q_lvl(struct nfp_abm *abm, unsigned int id, u32 val) { struct nfp_cpp *cpp = abm->app->cpp; @@ -155,42 +134,6 @@ int nfp_abm_ctrl_read_q_stats(struct nfp_abm_link *alink, unsigned int i, i, true, >overlimits); } -int nfp_abm_ctrl_read_stats(struct nfp_abm_link *alink, - struct nfp_alink_stats *stats) -{ - u64 pkts = 0, bytes = 0; - int i, err; - - for (i = 0; i < alink->vnic->max_rx_rings; i++) { - pkts += nn_readq(alink->vnic, NFP_NET_CFG_RXR_STATS(i)); - bytes += nn_readq(alink->vnic, NFP_NET_CFG_RXR_STATS(i) + 8); - } - stats->tx_pkts = pkts; - stats->tx_bytes = bytes; - - err = nfp_abm_ctrl_stat_all(alink, alink->abm->q_lvls, - NFP_QLVL_STRIDE, NFP_QLVL_BLOG_BYTES, - false, >backlog_bytes); - if (err) - return err; - - err = nfp_abm_ctrl_stat_all(alink, alink->abm->q_lvls, - NFP_QLVL_STRIDE, NFP_QLVL_BLOG_PKTS, - false, >backlog_pkts); - if (err) - return err; - - err = nfp_abm_ctrl_stat_all(alink, alink->abm->qm_stats, - NFP_QMSTAT_STRIDE, NFP_QMSTAT_DROP, - true, >drops); - if (err) - return err; - - return nfp_abm_ctrl_stat_all(alink, alink->abm->qm_stats, -NFP_QMSTAT_STRIDE, NFP_QMSTAT_ECN, -true, >overlimits); -} - int nfp_abm_ctrl_read_q_xstats(struct nfp_abm_link *alink, unsigned int i, struct nfp_alink_xstats *xstats) { @@ -207,22 +150,6 @@ int nfp_abm_ctrl_read_q_xstats(struct nfp_abm_link *alink, unsigned int i, i, true, >ecn_marked); } -int nfp_abm_ctrl_read_xstats(struct nfp_abm_link *alink, -struct nfp_alink_xstats *xstats) -{ - int err; - - err = nfp_abm_ctrl_stat_all(alink, alink->abm->qm_stats, - NFP_QMSTAT_STRIDE, NFP_QMSTAT_DROP, - true, >pdrop); - if (err) - return err; - - return nfp_abm_ctrl_stat_all(alink, alink->abm->qm_stats, -NFP_QMSTAT_STRIDE, NFP_QMSTAT_ECN, -true, >ecn_marked); -} - int nfp_abm_ctrl_qm_enable(struct nfp_abm *abm) { return nfp_mbox_cmd(abm->app->pf, NFP_MBOX_PCIE_ABM_ENABLE, diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c b/drivers/net/ethernet/netronome/nfp/abm/main.c index da5394886a78..a5732d3bd1b7 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.c +++ b/drivers/net/ethernet/netronome/nfp/abm/main.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include "../nfpcore/nfp.h" @@ -310,22 +311,14 @@ nfp_abm_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, unsigned int id) alink->abm = abm; alink->vnic = nn; alink->id = id; - alink->parent = TC_H_ROOT;
[PATCH net-next 11/13] nfp: abm: reset RED's child based on limit
RED qdisc will replace its child Qdisc with a new FIFO queue if it is reconfigured and the limit parameter is not 0. This means that when it's created with limit of 0 it will have no FIFO, and all packets will be dropped. If it's changed and limit is specified it will loose its existing child (implicit graft). Make sure we mark RED Qdisc child as NFP_QDISC_UNTRACKED if its not the expected FIFO. nfp_abm_qdisc_replace() will return 1 if Qdisc already existed. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- .../net/ethernet/netronome/nfp/abm/qdisc.c| 27 +++ 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c index 151d2dafbc76..1b3c0b5b52bf 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c +++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c @@ -196,7 +196,7 @@ nfp_abm_qdisc_replace(struct net_device *netdev, struct nfp_abm_link *alink, if (*qdisc) { if (WARN_ON((*qdisc)->type != type)) return -EINVAL; - return 0; + return 1; } *qdisc = nfp_abm_qdisc_alloc(netdev, alink, type, parent_handle, handle, @@ -357,11 +357,24 @@ nfp_abm_red_replace(struct net_device *netdev, struct nfp_abm_link *alink, i = nfp_abm_red_find(alink, opt); existing = i >= 0; - if (ret) { + if (ret < 0) { err = ret; goto err_destroy; } + /* If limit != 0 child gets reset */ + if (opt->set.limit) { + if (nfp_abm_qdisc_child_valid(qdisc, 0)) + qdisc->children[0]->use_cnt--; + qdisc->children[0] = NULL; + } else { + /* Qdisc was just allocated without a limit will use noop_qdisc, +* i.e. a block hole. +*/ + if (!ret) + qdisc->children[0] = NFP_QDISC_UNTRACKED; + } + if (!nfp_abm_red_check_params(alink, opt)) { err = -EINVAL; goto err_destroy; @@ -533,10 +546,14 @@ nfp_abm_mq_create(struct net_device *netdev, struct nfp_abm_link *alink, struct tc_mq_qopt_offload *opt) { struct nfp_qdisc *qdisc; + int ret; - return nfp_abm_qdisc_replace(netdev, alink, NFP_QDISC_MQ, -TC_H_ROOT, opt->handle, -alink->total_queues, ); + ret = nfp_abm_qdisc_replace(netdev, alink, NFP_QDISC_MQ, + TC_H_ROOT, opt->handle, alink->total_queues, + ); + if (ret < 0) + return ret; + return 0; } int nfp_abm_setup_tc_mq(struct net_device *netdev, struct nfp_abm_link *alink, -- 2.17.1
[PATCH net-next 08/13] net: sched: mq: offload a graft notification
Drivers offloading Qdiscs should have reasonable certainty the offloaded behaviour matches the SW path. This is impossible if the driver does not know about all Qdiscs or when Qdiscs move and are reused. Send a graft notification from MQ. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- include/net/pkt_cls.h | 11 ++- net/sched/sch_mq.c| 9 + 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 01f2802b7aee..5d31820b7e80 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -821,12 +821,21 @@ enum tc_mq_command { TC_MQ_CREATE, TC_MQ_DESTROY, TC_MQ_STATS, + TC_MQ_GRAFT, +}; + +struct tc_mq_opt_offload_graft_params { + unsigned long queue; + u32 child_handle; }; struct tc_mq_qopt_offload { enum tc_mq_command command; u32 handle; - struct tc_qopt_offload_stats stats; + union { + struct tc_qopt_offload_stats stats; + struct tc_mq_opt_offload_graft_params graft_params; + }; }; enum tc_red_command { diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c index 1db5c1bf6ddd..203659bc3906 100644 --- a/net/sched/sch_mq.c +++ b/net/sched/sch_mq.c @@ -193,6 +193,7 @@ static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new, struct Qdisc **old, struct netlink_ext_ack *extack) { struct netdev_queue *dev_queue = mq_queue_get(sch, cl); + struct tc_mq_qopt_offload graft_offload; struct net_device *dev = qdisc_dev(sch); if (dev->flags & IFF_UP) @@ -203,6 +204,14 @@ static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new, new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT; if (dev->flags & IFF_UP) dev_activate(dev); + + graft_offload.handle = sch->handle; + graft_offload.graft_params.queue = cl - 1; + graft_offload.graft_params.child_handle = new ? new->handle : 0; + graft_offload.command = TC_MQ_GRAFT; + + qdisc_offload_graft_helper(qdisc_dev(sch), sch, new, *old, + TC_SETUP_QDISC_MQ, _offload, extack); return 0; } -- 2.17.1
[PATCH net-next 09/13] nfp: abm: build full Qdisc hierarchy based on graft notifications
Using graft notifications recreate in the driver the full Qdisc hierarchy. Keep track of how many times each Qdisc is attached to the hierarchy to make sure we don't offload Qdiscs which are attached multiple times (device queues can't be shared). For graft events of Qdiscs we don't know exist make the child as invalid/untracked. Note that MQ Qdisc doesn't send destruction events reliably when device is dismantled, so we need to manually clean out the children otherwise we'd think Qdiscs which are still in use are getting freed. Signed-off-by: Jakub Kicinski Reviewed-by: John Hurley --- drivers/net/ethernet/netronome/nfp/abm/main.h | 2 + .../net/ethernet/netronome/nfp/abm/qdisc.c| 105 ++ 2 files changed, 107 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h b/drivers/net/ethernet/netronome/nfp/abm/main.h index adffa36981e0..daca93e90099 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/main.h +++ b/drivers/net/ethernet/netronome/nfp/abm/main.h @@ -78,6 +78,8 @@ enum nfp_qdisc_type { NFP_QDISC_RED, }; +#define NFP_QDISC_UNTRACKED((struct nfp_qdisc *)1UL) + /** * struct nfp_qdisc - tracked TC Qdisc * @netdev:netdev on which Qdisc was created diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c index 3ecb63060429..151d2dafbc76 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c +++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) /* Copyright (C) 2018 Netronome Systems, Inc. */ +#include #include #include #include @@ -12,6 +13,66 @@ #include "../nfp_port.h" #include "main.h" +static bool nfp_abm_qdisc_child_valid(struct nfp_qdisc *qdisc, unsigned int id) +{ + return qdisc->children[id] && + qdisc->children[id] != NFP_QDISC_UNTRACKED; +} + +static void *nfp_abm_qdisc_tree_deref_slot(void __rcu **slot) +{ + return rtnl_dereference(*slot); +} + +static void +nfp_abm_qdisc_unlink_children(struct nfp_qdisc *qdisc, + unsigned int start, unsigned int end) +{ + unsigned int i; + + for (i = start; i < end; i++) + if (nfp_abm_qdisc_child_valid(qdisc, i)) { + qdisc->children[i]->use_cnt--; + qdisc->children[i] = NULL; + } +} + +static void +nfp_abm_qdisc_clear_mq(struct net_device *netdev, struct nfp_abm_link *alink, + struct nfp_qdisc *qdisc) +{ + struct radix_tree_iter iter; + unsigned int mq_refs = 0; + void __rcu **slot; + + if (!qdisc->use_cnt) + return; + /* MQ doesn't notify well on destruction, we need special handling of +* MQ's children. +*/ + if (qdisc->type == NFP_QDISC_MQ && + qdisc == alink->root_qdisc && + netdev->reg_state == NETREG_UNREGISTERING) + return; + + /* Count refs held by MQ instances and clear pointers */ + radix_tree_for_each_slot(slot, >qdiscs, , 0) { + struct nfp_qdisc *mq = nfp_abm_qdisc_tree_deref_slot(slot); + unsigned int i; + + if (mq->type != NFP_QDISC_MQ || mq->netdev != netdev) + continue; + for (i = 0; i < mq->num_children; i++) + if (mq->children[i] == qdisc) { + mq->children[i] = NULL; + mq_refs++; + } + } + + WARN(qdisc->use_cnt != mq_refs, "non-zero qdisc use count: %d (- %d)\n", +qdisc->use_cnt, mq_refs); +} + static void nfp_abm_offload_compile_red(struct nfp_abm_link *alink, struct nfp_red_qdisc *qdisc, unsigned int queue) @@ -70,6 +131,7 @@ nfp_abm_qdisc_free(struct net_device *netdev, struct nfp_abm_link *alink, if (!qdisc) return; + nfp_abm_qdisc_clear_mq(netdev, alink, qdisc); WARN_ON(radix_tree_delete(>qdiscs, TC_H_MAJ(qdisc->handle)) != qdisc); @@ -152,12 +214,44 @@ nfp_abm_qdisc_destroy(struct net_device *netdev, struct nfp_abm_link *alink, if (!qdisc) return; + /* We don't get TC_SETUP_ROOT_QDISC w/ MQ when netdev is unregistered */ + if (alink->root_qdisc == qdisc) + qdisc->use_cnt--; + + nfp_abm_qdisc_unlink_children(qdisc, 0, qdisc->num_children); nfp_abm_qdisc_free(netdev, alink, qdisc); if (alink->root_qdisc == qdisc) alink->root_qdisc = NULL; } +static int +nfp_abm_qdisc_graft(struct nfp_abm_link *alink, u32 handle, u32 child_handle, + unsigned int id) +{ + struct nfp_qdisc *parent, *child; + + parent = nfp_abm_qdisc_find(alink, handle); + if (!parent) + return 0; + + if (WARN(id >=
[Patch net-next v2] net: dump more useful information in netdev_rx_csum_fault()
Currently netdev_rx_csum_fault() only shows a device name, we need more information about the skb for debugging csum failures. Sample output: ens3: hw csum failure dev features: 0x00014b89 skb len=84 data_len=0 pkt_type=0 gso_size=0 gso_type=0 nr_frags=0 ip_summed=0 csum=0 csum_complete_sw=0 csum_valid=0 csum_level=0 Note, I use pr_err() just to be consistent with the existing one. Signed-off-by: Cong Wang --- include/linux/netdevice.h | 5 +++-- net/core/datagram.c | 2 +- net/core/dev.c| 11 +-- net/core/skbuff.c | 4 ++-- net/sunrpc/socklib.c | 2 +- 5 files changed, 16 insertions(+), 8 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 487fa5e0e165..5a97ffbff932 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4342,9 +4342,10 @@ static inline bool can_checksum_protocol(netdev_features_t features, } #ifdef CONFIG_BUG -void netdev_rx_csum_fault(struct net_device *dev); +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb); #else -static inline void netdev_rx_csum_fault(struct net_device *dev) +static inline void netdev_rx_csum_fault(struct net_device *dev, + struct sk_buff *skb) { } #endif diff --git a/net/core/datagram.c b/net/core/datagram.c index 07983b90d2bd..4bf62b1afa3b 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -767,7 +767,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb, if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) && !skb->csum_complete_sw) - netdev_rx_csum_fault(NULL); + netdev_rx_csum_fault(NULL, skb); } return 0; fault: diff --git a/net/core/dev.c b/net/core/dev.c index bf7e0a471186..5927f6a7c301 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3091,10 +3091,17 @@ EXPORT_SYMBOL(__skb_gso_segment); /* Take action when hardware reception checksum errors are detected. */ #ifdef CONFIG_BUG -void netdev_rx_csum_fault(struct net_device *dev) +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb) { if (net_ratelimit()) { pr_err("%s: hw csum failure\n", dev ? dev->name : ""); + if (dev) + pr_err("dev features: %pNF\n", >features); + pr_err("skb len=%u data_len=%u pkt_type=%u gso_size=%u gso_type=%u nr_frags=%u ip_summed=%u csum=%x csum_complete_sw=%d csum_valid=%d csum_level=%u\n", + skb->len, skb->data_len, skb->pkt_type, + skb_shinfo(skb)->gso_size, skb_shinfo(skb)->gso_type, + skb_shinfo(skb)->nr_frags, skb->ip_summed, skb->csum, + skb->csum_complete_sw, skb->csum_valid, skb->csum_level); dump_stack(); } } @@ -5781,7 +5788,7 @@ __sum16 __skb_gro_checksum_complete(struct sk_buff *skb) if (likely(!sum)) { if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) && !skb->csum_complete_sw) - netdev_rx_csum_fault(skb->dev); + netdev_rx_csum_fault(skb->dev, skb); } NAPI_GRO_CB(skb)->csum = wsum; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 396fcb3baad0..fcb1155a00ec 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2653,7 +2653,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len) if (likely(!sum)) { if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) && !skb->csum_complete_sw) - netdev_rx_csum_fault(skb->dev); + netdev_rx_csum_fault(skb->dev, skb); } if (!skb_shared(skb)) skb->csum_valid = !sum; @@ -2673,7 +2673,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb) if (likely(!sum)) { if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) && !skb->csum_complete_sw) - netdev_rx_csum_fault(skb->dev); + netdev_rx_csum_fault(skb->dev, skb); } if (!skb_shared(skb)) { diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c index 9062967575c4..7e55cfc69697 100644 --- a/net/sunrpc/socklib.c +++ b/net/sunrpc/socklib.c @@ -175,7 +175,7 @@ int csum_partial_copy_to_xdr(struct xdr_buf *xdr, struct sk_buff *skb) return -1; if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) && !skb->csum_complete_sw) - netdev_rx_csum_fault(skb->dev); + netdev_rx_csum_fault(skb->dev, skb); return 0; no_checksum: if (xdr_partial_copy_from_skb(xdr, 0, , xdr_skb_read_bits) < 0) -- 2.19.1
Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()
On Fri, Nov 9, 2018 at 8:16 PM David Miller wrote: > > Didn't you move this function into net/core/skbuff.c? :-) At the time when I created this patch, that patch didn't hit net-next yet. :) > > Please respin. Sure. I will send v2, as I need to cleanup the mixed commas and spaces too. Thanks.
[iproute PATCH] man: ip-route.8: Document nexthop limit
Add a note to 'nexthop' description stating the maximum number of nexthops per command and pointing at 'append' command as a workaround. Signed-off-by: Phil Sutter --- man/man8/ip-route.8.in | 7 +++ 1 file changed, 7 insertions(+) diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in index a33ce1f0f4006..383178c11331e 100644 --- a/man/man8/ip-route.8.in +++ b/man/man8/ip-route.8.in @@ -589,6 +589,13 @@ argument lists: route reflecting its relative bandwidth or quality. .in -8 +The internal buffer used in iproute2 limits the maximum number of nexthops to +be specified in one go. If only a gateway address is given, the current buffer +size allows for 144 IPv6 nexthops and 253 IPv4 ones. If more are required, they +may be added to the existing route using +.B "ip route append" +command. + .TP .BI scope " SCOPE_VAL" the scope of the destinations covered by the route prefix. -- 2.19.0
Re: [PATCH v5 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin
On 11/12, Martin Lau wrote: > On Fri, Nov 09, 2018 at 08:21:41AM -0800, Stanislav Fomichev wrote: > [ ... ] > > @@ -1918,23 +2160,20 @@ void *bpf_object__priv(struct bpf_object *obj) > > } > > > > static struct bpf_program * > > -__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj) > > +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i) > > { > > - size_t idx; > > + ssize_t idx; > > > > if (!obj->programs) > > return NULL; > > - /* First handler */ > > - if (prev == NULL) > > - return >programs[0]; > > > > - if (prev->obj != obj) { > > + if (p->obj != obj) { > > pr_warning("error: program handler doesn't match object\n"); > > return NULL; > > } > > > > - idx = (prev - obj->programs) + 1; > > - if (idx >= obj->nr_programs) > > + idx = (p - obj->programs) + i; > > + if (idx >= obj->nr_programs || idx < 0) > > return NULL; > > return >programs[idx]; > > } > > @@ -1944,8 +2183,29 @@ bpf_program__next(struct bpf_program *prev, struct > > bpf_object *obj) > > { > > struct bpf_program *prog = prev; > > > > + if (prev == NULL) > > + return obj->programs; > > + > This patch breaks the behavior introduced in > commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for > multi-function programs"): > "Make bpf_program__next() skip over '.text' section if object file > has pseudo calls. The '.text' section is hardly a program in that > case, it's more of a storage for code of functions other than main." > > For example, the userspace could have been doing: > prog = bpf_program__next(NULL, obj); > bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT); > bpf_object__load(obj); > > For the bpf_prog.o that has pseudo calls, after this patch in bpf-next, > the prog returned by bpf_program__next() could be in ".text" instead of > the main bpf program. The next bpf_program__set_type() has > no effect to the main program. The following bpf_object__load() > will catch user in surprise with the main bpf prog in > the wrong BPF_PROG_TYPE. Will something like the following fix your concern? (plus, assuming the same for prev): --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -2216,8 +2216,11 @@ bpf_program__next(struct bpf_program *prev, struct bpf_object *obj) { struct bpf_program *prog = prev; - if (prev == NULL) - return obj->programs; + if (prev == NULL) { + prog = obj->programs; + if (!prog || !bpf_program__is_function_storage(prog, obj)) + return prog; + } do { prog = __bpf_program__iter(prog, obj, 1); Any suggestions for a better way to do it? > > do { > > - prog = __bpf_program__next(prog, obj); > > + prog = __bpf_program__iter(prog, obj, 1); > > + } while (prog && bpf_program__is_function_storage(prog, obj)); > > + > > + return prog; > > +} > > + > > +struct bpf_program * > > +bpf_program__prev(struct bpf_program *next, struct bpf_object *obj) > > +{ > > + struct bpf_program *prog = next; > > + > > + if (next == NULL) { > > + if (!obj->nr_programs) > > + return NULL; > > + return obj->programs + obj->nr_programs - 1; > > + } > > + > > + do { > > + prog = __bpf_program__iter(prog, obj, -1); > > } while (prog && bpf_program__is_function_storage(prog, obj)); > > > > return prog;
Re: [PATCH bpf-next v2] bpftool: make libbfd optional
On Mon, 12 Nov 2018 13:44:10 -0800, Stanislav Fomichev wrote: > Make it possible to build bpftool without libbfd. libbfd and libopcodes are > typically provided in dev/dbg packages (binutils-dev in debian) which we > usually don't have installed on the fleet machines and we'd like a way to have > bpftool version that works without installing any additional packages. > This excludes support for disassembling jit-ted code and prints an error if > the user tries to use these features. > > Tested by: > cat > FEATURES_DUMP.bpftool < feature-libbfd=0 > feature-disassembler-four-args=1 > feature-reallocarray=0 > feature-libelf=1 > feature-libelf-mmap=1 > feature-bpf=1 > EOF > FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make > ldd bpftool | grep libbfd > > Signed-off-by: Stanislav Fomichev Seems reasonable, thanks! Acked-by: Jakub Kicinski
Re: [PATCH v5 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin
On Fri, Nov 09, 2018 at 08:21:41AM -0800, Stanislav Fomichev wrote: [ ... ] > @@ -1918,23 +2160,20 @@ void *bpf_object__priv(struct bpf_object *obj) > } > > static struct bpf_program * > -__bpf_program__next(struct bpf_program *prev, struct bpf_object *obj) > +__bpf_program__iter(struct bpf_program *p, struct bpf_object *obj, int i) > { > - size_t idx; > + ssize_t idx; > > if (!obj->programs) > return NULL; > - /* First handler */ > - if (prev == NULL) > - return >programs[0]; > > - if (prev->obj != obj) { > + if (p->obj != obj) { > pr_warning("error: program handler doesn't match object\n"); > return NULL; > } > > - idx = (prev - obj->programs) + 1; > - if (idx >= obj->nr_programs) > + idx = (p - obj->programs) + i; > + if (idx >= obj->nr_programs || idx < 0) > return NULL; > return >programs[idx]; > } > @@ -1944,8 +2183,29 @@ bpf_program__next(struct bpf_program *prev, struct > bpf_object *obj) > { > struct bpf_program *prog = prev; > > + if (prev == NULL) > + return obj->programs; > + This patch breaks the behavior introduced in commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for multi-function programs"): "Make bpf_program__next() skip over '.text' section if object file has pseudo calls. The '.text' section is hardly a program in that case, it's more of a storage for code of functions other than main." For example, the userspace could have been doing: prog = bpf_program__next(NULL, obj); bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT); bpf_object__load(obj); For the bpf_prog.o that has pseudo calls, after this patch in bpf-next, the prog returned by bpf_program__next() could be in ".text" instead of the main bpf program. The next bpf_program__set_type() has no effect to the main program. The following bpf_object__load() will catch user in surprise with the main bpf prog in the wrong BPF_PROG_TYPE. > do { > - prog = __bpf_program__next(prog, obj); > + prog = __bpf_program__iter(prog, obj, 1); > + } while (prog && bpf_program__is_function_storage(prog, obj)); > + > + return prog; > +} > + > +struct bpf_program * > +bpf_program__prev(struct bpf_program *next, struct bpf_object *obj) > +{ > + struct bpf_program *prog = next; > + > + if (next == NULL) { > + if (!obj->nr_programs) > + return NULL; > + return obj->programs + obj->nr_programs - 1; > + } > + > + do { > + prog = __bpf_program__iter(prog, obj, -1); > } while (prog && bpf_program__is_function_storage(prog, obj)); > > return prog;
[PATCH bpf-next v2] bpftool: make libbfd optional
Make it possible to build bpftool without libbfd. libbfd and libopcodes are typically provided in dev/dbg packages (binutils-dev in debian) which we usually don't have installed on the fleet machines and we'd like a way to have bpftool version that works without installing any additional packages. This excludes support for disassembling jit-ted code and prints an error if the user tries to use these features. Tested by: cat > FEATURES_DUMP.bpftool < --- tools/bpf/bpftool/Makefile | 13 +++-- tools/bpf/bpftool/jit_disasm.c | 8 +++- tools/bpf/bpftool/main.c | 3 --- tools/bpf/bpftool/main.h | 14 ++ tools/bpf/bpftool/prog.c | 3 +++ 5 files changed, 35 insertions(+), 6 deletions(-) diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile index dac7eff4c7e5..1bea6b979082 100644 --- a/tools/bpf/bpftool/Makefile +++ b/tools/bpf/bpftool/Makefile @@ -53,7 +53,7 @@ ifneq ($(EXTRA_LDFLAGS),) LDFLAGS += $(EXTRA_LDFLAGS) endif -LIBS = -lelf -lbfd -lopcodes $(LIBBPF) +LIBS = -lelf $(LIBBPF) INSTALL ?= install RM ?= rm -f @@ -90,7 +90,16 @@ include $(wildcard $(OUTPUT)*.d) all: $(OUTPUT)bpftool -SRCS = $(wildcard *.c) +BFD_SRCS = jit_disasm.c + +SRCS = $(filter-out $(BFD_SRCS),$(wildcard *.c)) + +ifeq ($(feature-libbfd),1) +CFLAGS += -DHAVE_LIBBFD_SUPPORT +SRCS += $(BFD_SRCS) +LIBS += -lbfd -lopcodes +endif + OBJS = $(patsubst %.c,$(OUTPUT)%.o,$(SRCS)) $(OUTPUT)disasm.o $(OUTPUT)disasm.o: $(srctree)/kernel/bpf/disasm.c diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c index c75ffd9ce2bb..b2ed5ee1af5f 100644 --- a/tools/bpf/bpftool/jit_disasm.c +++ b/tools/bpf/bpftool/jit_disasm.c @@ -109,7 +109,7 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, if (inf) { bfdf->arch_info = inf; } else { - p_err("No libfd support for %s", arch); + p_err("No libbfd support for %s", arch); return; } } @@ -183,3 +183,9 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, bfd_close(bfdf); } + +int disasm_init(void) +{ + bfd_init(); + return 0; +} diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c index 75a3296dc0bc..5c4c1cd5a7ba 100644 --- a/tools/bpf/bpftool/main.c +++ b/tools/bpf/bpftool/main.c @@ -31,7 +31,6 @@ * SOFTWARE. */ -#include #include #include #include @@ -399,8 +398,6 @@ int main(int argc, char **argv) if (argc < 0) usage(); - bfd_init(); - ret = cmd_select(cmds, argc, argv, do_help); if (json_output) diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 61d82020af58..10c6c16fae29 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -147,8 +147,22 @@ int prog_parse_fd(int *argc, char ***argv); int map_parse_fd(int *argc, char ***argv); int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len); +#ifdef HAVE_LIBBFD_SUPPORT void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, const char *arch, const char *disassembler_options); +int disasm_init(void); +#else +static inline +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, + const char *arch, const char *disassembler_options) +{ +} +static inline int disasm_init(void) +{ + p_err("No libbfd support"); + return -1; +} +#endif void print_data_json(uint8_t *data, size_t len); void print_hex_data_json(uint8_t *data, size_t len); diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 5ff5544596e7..c176e1aa66fe 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -467,6 +467,9 @@ static int do_dump(int argc, char **argv) int fd; if (is_prefix(*argv, "jited")) { + if (disasm_init()) + return -1; + member_len = _prog_len; member_ptr = _prog_insns; } else if (is_prefix(*argv, "xlated")) { -- 2.19.1.930.g4563a0d9d0-goog
Re: [PATCH bpf-next] bpftool: make libbfd optional
On 11/12, Jakub Kicinski wrote: > On Mon, 12 Nov 2018 11:58:27 -0800, Stanislav Fomichev wrote: > > On 11/12, Jakub Kicinski wrote: > > > On Mon, 12 Nov 2018 10:53:28 -0800, Stanislav Fomichev wrote: > > > > diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h > > > > index 61d82020af58..ec1bc2ae3c71 100644 > > > > --- a/tools/bpf/bpftool/main.h > > > > +++ b/tools/bpf/bpftool/main.h > > > > @@ -147,8 +147,19 @@ int prog_parse_fd(int *argc, char ***argv); > > > > int map_parse_fd(int *argc, char ***argv); > > > > int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 > > > > *info_len); > > > > > > > > +#ifdef HAVE_LIBBFD_SUPPORT > > > > void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, > > > >const char *arch, const char > > > > *disassembler_options); > > > > +void disasm_init(void); > > > > +#else > > > > +static inline > > > > +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, > > > > + const char *arch, const char > > > > *disassembler_options) > > > > +{ > > > > + p_err("No libbfd support"); > > > > +} > > > > > > I think an error per instruction is a bit much, could we make sure we > > > error out earlier? > > > > I can return int from disasm_print_insn as an indication to > > stop/continue. Let me know if you have better ideas, will post v2 later > > today. > > Why not simply: Makes sense. I think I can actually repurpose disasm_init for that (the function I added to wrap bfd_init, I think I can move the callsite here without any problems). > diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c > index 5302ee282409..44edf6705ad2 100644 > --- a/tools/bpf/bpftool/prog.c > +++ b/tools/bpf/bpftool/prog.c > @@ -466,6 +466,10 @@ static int do_dump(int argc, char **argv) > int fd; > > if (is_prefix(*argv, "jited")) { > +#ifndef HAVE_LIBBFD_SUPPORT > + p_err("No libbfd support"); > + return -1; > +#endif > member_len = _prog_len; > member_ptr = _prog_insns; > } else if (is_prefix(*argv, "xlated")) { > > Perhaps wrapped into some helper in the header to avoid the yuckiness.
Re: [PATCH v2] net: Add trace events for all receive exit points
- On Nov 12, 2018, at 3:46 PM, Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote: > - On Nov 12, 2018, at 3:40 PM, rostedt rost...@goodmis.org wrote: > >> On Mon, 12 Nov 2018 15:20:55 -0500 (EST) >> Mathieu Desnoyers wrote: >> >>> >>> Hrm, looking at this again, I notice that there is a single DEFINE_EVENT >>> using net_dev_template_simple. >>> >>> We could simply turn netif_receive_skb_list_exit into a TRACE_EVENT(), >>> remove the net_dev_template_simple, and rename the net_dev_template_return >>> to net_dev_template ? >> >> This too is only cosmetic and doesn't affect the code at all, because a >> TRACE_EVENT() is really just: >> >> #define TRACE_EVENT(name, proto, args, tstruct, assign, print) \ >> DECLARE_EVENT_CLASS(name, \ >> PARAMS(proto),\ >> PARAMS(args), \ >> PARAMS(tstruct), \ >> PARAMS(assign), \ >> PARAMS(print)); \ >> DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args)); >> >> -- Steve >> > > Of course. > > I also notice that in two cases, a "gro_result_t" is implicitly cast > to "int". I usually frown upon this kind of stuff, because it's asking > for trouble if gro_result_t typedef to something else than "int" in the > future. > > I would recommend going for two templates, one which takes a "int" > ret parameter, and the other a "gro_result_t" ret parameter. > > Or am I being too cautious ? Digging further, gro_result_t maps to: enum gro_result { GRO_MERGED, GRO_MERGED_FREE, GRO_HELD, GRO_NORMAL, GRO_DROP, GRO_CONSUMED, }; typedef enum gro_result gro_result_t; So we should add a TRACE_DEFINE_ENUM() for those. Thanks, Mathieu > > Thanks, > > Mathieu > > >>> >>> It's pretty clear from the prototype that it expects a "ret" argument, >>> so I don't see the need to also state it in the template name. >>> > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH v2] net: Add trace events for all receive exit points
- On Nov 12, 2018, at 3:40 PM, rostedt rost...@goodmis.org wrote: > On Mon, 12 Nov 2018 15:20:55 -0500 (EST) > Mathieu Desnoyers wrote: > >> >> Hrm, looking at this again, I notice that there is a single DEFINE_EVENT >> using net_dev_template_simple. >> >> We could simply turn netif_receive_skb_list_exit into a TRACE_EVENT(), >> remove the net_dev_template_simple, and rename the net_dev_template_return >> to net_dev_template ? > > This too is only cosmetic and doesn't affect the code at all, because a > TRACE_EVENT() is really just: > > #define TRACE_EVENT(name, proto, args, tstruct, assign, print) \ > DECLARE_EVENT_CLASS(name, \ >PARAMS(proto),\ >PARAMS(args), \ >PARAMS(tstruct), \ >PARAMS(assign), \ >PARAMS(print)); \ > DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args)); > > -- Steve > Of course. I also notice that in two cases, a "gro_result_t" is implicitly cast to "int". I usually frown upon this kind of stuff, because it's asking for trouble if gro_result_t typedef to something else than "int" in the future. I would recommend going for two templates, one which takes a "int" ret parameter, and the other a "gro_result_t" ret parameter. Or am I being too cautious ? Thanks, Mathieu >> >> It's pretty clear from the prototype that it expects a "ret" argument, >> so I don't see the need to also state it in the template name. >> -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH mlx5-next 00/10] Collection of ODP fixes
On Thu, Nov 08, 2018 at 09:10:07PM +0200, Leon Romanovsky wrote: > From: Leon Romanovsky > > Hi, > > This is collection of fixes to mlx5_core and mlx5_ib ODP logic. > There are two main reasons why we decided to forward it to mlx5-next > and not to rdma-rc or net like our other fixes. > > 1. We have large number of patches exactly in that area that are ready > for submission and we don't want to create extra merge work for > maintainers due to that. Among those future series (already ready and > tested): internal change of mlx5_core pagefaults handling, moving ODP > code to RDMA, change in EQ mechanism, simplification and moving SRQ QP > to RDMA, extra fixes to mlx5_ib ODP and ODP SRQ support. > > 2. Most of those fixes are for old bugs and there is no rush to fix them > right now (in -rc1/-rc2). > > Thanks > > Moni Shoua (10): > net/mlx5: Release resource on error flow > net/mlx5: Add interface to hold and release core resources > net/mlx5: Enumerate page fault types > IB/mlx5: Lock QP during page fault handling > net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state > net/mlx5: Use multi threaded workqueue for page fault handling > IB/mlx5: Improve ODP debugging messages Applied to mlx5-next b02394aa75e3 IB/mlx5: Improve ODP debugging messages 90290db7669b net/mlx5: Use multi threaded workqueue for page fault handling ef90c5e9757d net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state 032080ab43ac IB/mlx5: Lock QP during page fault handling c99fefea2cc9 net/mlx5: Enumerate page fault types 27e95603f4df net/mlx5: Add interface to hold and release core resources 698114968a22 net/mlx5: Release resource on error flow > IB/mlx5: Avoid hangs due to a missing completion > IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously Dropped to address feedback. > net/mlx5: Remove unused function Dropped, because it depends on dropped commit. Thanks > > drivers/infiniband/core/umem_odp.c| 14 +- > drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 + > drivers/infiniband/hw/mlx5/mr.c | 15 +- > drivers/infiniband/hw/mlx5/odp.c | 158 ++ > drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 2 +- > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 21 +-- > drivers/net/ethernet/mellanox/mlx5/core/qp.c | 20 ++- > include/linux/mlx5/device.h | 7 + > include/linux/mlx5/driver.h | 7 +- > include/linux/mlx5/qp.h | 5 + > 10 files changed, 191 insertions(+), 59 deletions(-) > > -- > 2.19.1 > signature.asc Description: PGP signature
Re: [PATCH v2] net: Add trace events for all receive exit points
On Mon, 12 Nov 2018 15:20:55 -0500 (EST) Mathieu Desnoyers wrote: > > Hrm, looking at this again, I notice that there is a single DEFINE_EVENT > using net_dev_template_simple. > > We could simply turn netif_receive_skb_list_exit into a TRACE_EVENT(), > remove the net_dev_template_simple, and rename the net_dev_template_return > to net_dev_template ? This too is only cosmetic and doesn't affect the code at all, because a TRACE_EVENT() is really just: #define TRACE_EVENT(name, proto, args, tstruct, assign, print) \ DECLARE_EVENT_CLASS(name, \ PARAMS(proto),\ PARAMS(args), \ PARAMS(tstruct), \ PARAMS(assign), \ PARAMS(print)); \ DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args)); -- Steve > > It's pretty clear from the prototype that it expects a "ret" argument, > so I don't see the need to also state it in the template name. > >
Re: [PATCH v2] net: Add trace events for all receive exit points
- On Nov 12, 2018, at 3:20 PM, Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote: > - On Nov 12, 2018, at 3:09 PM, Mathieu Desnoyers > mathieu.desnoy...@efficios.com wrote: > >> - On Nov 12, 2018, at 2:44 PM, Geneviève Bastien gbast...@versatic.net >> wrote: >> >>> Trace events are already present for the receive entry points, to indicate >>> how the reception entered the stack. >>> >>> This patch adds the corresponding exit trace events that will bound the >>> reception such that all events occurring between the entry and the exit >>> can be considered as part of the reception context. This greatly helps >>> for dependency and root cause analyses. >>> >>> Without this, it is impossible to determine whether a sched_wakeup >>> event following a netif_receive_skb event is the result of the packet >>> reception or a simple coincidence after further processing by the >>> thread. >> >> As discussed over IRC, it is _possible_ to use kretprobes to instrument >> the exit, but it is not practical. A v3 will be sent soon to clarify >> this part of the commit message. >> >> Also, I wonder if we should use "net_dev_template_exit" for the event >> class rather than "net_dev_template_return" ? > > Hrm, looking at this again, I notice that there is a single DEFINE_EVENT > using net_dev_template_simple. > > We could simply turn netif_receive_skb_list_exit into a TRACE_EVENT(), > remove the net_dev_template_simple, and rename the net_dev_template_return > to net_dev_template ? > > It's pretty clear from the prototype that it expects a "ret" argument, > so I don't see the need to also state it in the template name. As you pointed out on IRC, net_dev_template already exists. So we can use "net_dev_rx_exit_template" which is along the same lines as its entry counterpart "net_dev_rx_verbose_template". Thanks, Mathieu > > Thanks, > > Mathieu > >> >> Thanks, >> >> Mathieu >> >>> >>> Signed-off-by: Geneviève Bastien >>> CC: Mathieu Desnoyers >>> CC: Steven Rostedt >>> CC: Ingo Molnar >>> CC: David S. Miller >>> --- >>> Changes in v2: >>> - Add the return value to tracepoints where applicable >>> - Verify if tracepoint is enabled before walking list in >>>netif_receive_skb_list >>> --- >>> include/trace/events/net.h | 78 ++ >>> net/core/dev.c | 38 --- >>> 2 files changed, 110 insertions(+), 6 deletions(-) >>> >>> diff --git a/include/trace/events/net.h b/include/trace/events/net.h >>> index 00aa72ce0e7c..cff1a7b9d0bb 100644 >>> --- a/include/trace/events/net.h >>> +++ b/include/trace/events/net.h >>> @@ -117,6 +117,42 @@ DECLARE_EVENT_CLASS(net_dev_template, >>> __get_str(name), __entry->skbaddr, __entry->len) >>> ) >>> >>> +DECLARE_EVENT_CLASS(net_dev_template_return, >>> + >>> + TP_PROTO(struct sk_buff *skb, int ret), >>> + >>> + TP_ARGS(skb, ret), >>> + >>> + TP_STRUCT__entry( >>> + __field(void *, skbaddr) >>> + __field(int,ret) >>> + ), >>> + >>> + TP_fast_assign( >>> + __entry->skbaddr = skb; >>> + __entry->ret = ret; >>> + ), >>> + >>> + TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret) >>> +) >>> + >>> +DECLARE_EVENT_CLASS(net_dev_template_simple, >>> + >>> + TP_PROTO(struct sk_buff *skb), >>> + >>> + TP_ARGS(skb), >>> + >>> + TP_STRUCT__entry( >>> + __field(void *, skbaddr) >>> + ), >>> + >>> + TP_fast_assign( >>> + __entry->skbaddr = skb; >>> + ), >>> + >>> + TP_printk("skbaddr=%p", __entry->skbaddr) >>> +) >>> + >>> DEFINE_EVENT(net_dev_template, net_dev_queue, >>> >>> TP_PROTO(struct sk_buff *skb), >>> @@ -244,6 +280,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template, >>> netif_rx_ni_entry, >>> TP_ARGS(skb) >>> ); >>> >>> +DEFINE_EVENT(net_dev_template_return, napi_gro_frags_exit, >>> + >>> + TP_PROTO(struct sk_buff *skb, int ret), >>> + >>> + TP_ARGS(skb, ret) >>> +); >>> + >>> +DEFINE_EVENT(net_dev_template_return, napi_gro_receive_exit, >>> + >>> + TP_PROTO(struct sk_buff *skb, int ret), >>> + >>> + TP_ARGS(skb, ret) >>> +); >>> + >>> +DEFINE_EVENT(net_dev_template_return, netif_receive_skb_exit, >>> + >>> + TP_PROTO(struct sk_buff *skb, int ret), >>> + >>> + TP_ARGS(skb, ret) >>> +); >>> + >>> +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit, >>> + >>> + TP_PROTO(struct sk_buff *skb), >>> + >>> + TP_ARGS(skb) >>> +); >>> + >>> +DEFINE_EVENT(net_dev_template_return, netif_rx_exit, >>> + >>> + TP_PROTO(struct sk_buff *skb, int ret), >>> + >>> + TP_ARGS(skb, ret) >>> +); >>> + >>> +DEFINE_EVENT(net_dev_template_return, netif_rx_ni_exit, >>> + >>> + TP_PROTO(struct sk_buff *skb, int ret), >>> + >>> + TP_ARGS(skb, ret) >>> +); >>> + >>> #endif /* _TRACE_NET_H */ >>> >>> /* This part must be outside protection */ >>> diff --git a/net/core/dev.c b/net/core/dev.c >>> index 0ffcbdd55fa9..c4dc5df34abe 100644 >>> --- a/net/core/dev.c >>> +++
Re: [PATCH bpf-next] bpftool: make libbfd optional
On Mon, 12 Nov 2018 11:58:27 -0800, Stanislav Fomichev wrote: > On 11/12, Jakub Kicinski wrote: > > On Mon, 12 Nov 2018 10:53:28 -0800, Stanislav Fomichev wrote: > > > diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h > > > index 61d82020af58..ec1bc2ae3c71 100644 > > > --- a/tools/bpf/bpftool/main.h > > > +++ b/tools/bpf/bpftool/main.h > > > @@ -147,8 +147,19 @@ int prog_parse_fd(int *argc, char ***argv); > > > int map_parse_fd(int *argc, char ***argv); > > > int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 > > > *info_len); > > > > > > +#ifdef HAVE_LIBBFD_SUPPORT > > > void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, > > > const char *arch, const char *disassembler_options); > > > +void disasm_init(void); > > > +#else > > > +static inline > > > +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, > > > +const char *arch, const char *disassembler_options) > > > +{ > > > + p_err("No libbfd support"); > > > +} > > > > I think an error per instruction is a bit much, could we make sure we > > error out earlier? > > I can return int from disasm_print_insn as an indication to > stop/continue. Let me know if you have better ideas, will post v2 later > today. Why not simply: diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 5302ee282409..44edf6705ad2 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -466,6 +466,10 @@ static int do_dump(int argc, char **argv) int fd; if (is_prefix(*argv, "jited")) { +#ifndef HAVE_LIBBFD_SUPPORT + p_err("No libbfd support"); + return -1; +#endif member_len = _prog_len; member_ptr = _prog_insns; } else if (is_prefix(*argv, "xlated")) { Perhaps wrapped into some helper in the header to avoid the yuckiness.
Re: [PATCH v2] net: Add trace events for all receive exit points
On Mon, 12 Nov 2018 14:44:05 -0500 Geneviève Bastien wrote: > Trace events are already present for the receive entry points, to indicate > how the reception entered the stack. > > This patch adds the corresponding exit trace events that will bound the > reception such that all events occurring between the entry and the exit > can be considered as part of the reception context. This greatly helps > for dependency and root cause analyses. > > Without this, it is impossible to determine whether a sched_wakeup > event following a netif_receive_skb event is the result of the packet > reception or a simple coincidence after further processing by the > thread. > > Signed-off-by: Geneviève Bastien > CC: Mathieu Desnoyers > CC: Steven Rostedt > CC: Ingo Molnar > CC: David S. Miller > --- > Changes in v2: > - Add the return value to tracepoints where applicable > - Verify if tracepoint is enabled before walking list in > netif_receive_skb_list > --- > include/trace/events/net.h | 78 ++ > net/core/dev.c | 38 --- > 2 files changed, 110 insertions(+), 6 deletions(-) > > diff --git a/include/trace/events/net.h b/include/trace/events/net.h > index 00aa72ce0e7c..cff1a7b9d0bb 100644 > --- a/include/trace/events/net.h > +++ b/include/trace/events/net.h > @@ -117,6 +117,42 @@ DECLARE_EVENT_CLASS(net_dev_template, > __get_str(name), __entry->skbaddr, __entry->len) > ) > > +DECLARE_EVENT_CLASS(net_dev_template_return, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret), > + > + TP_STRUCT__entry( > + __field(void *, skbaddr) > + __field(int,ret) > + ), > + > + TP_fast_assign( > + __entry->skbaddr = skb; > + __entry->ret = ret; > + ), > + > + TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret) > +) > + > +DECLARE_EVENT_CLASS(net_dev_template_simple, > + > + TP_PROTO(struct sk_buff *skb), > + > + TP_ARGS(skb), > + > + TP_STRUCT__entry( > + __field(void *, skbaddr) > + ), > + > + TP_fast_assign( > + __entry->skbaddr = skb; > + ), > + > + TP_printk("skbaddr=%p", __entry->skbaddr) > +) > + > DEFINE_EVENT(net_dev_template, net_dev_queue, > > TP_PROTO(struct sk_buff *skb), > @@ -244,6 +280,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template, > netif_rx_ni_entry, > TP_ARGS(skb) > ); > > +DEFINE_EVENT(net_dev_template_return, napi_gro_frags_exit, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret) > +); > + > +DEFINE_EVENT(net_dev_template_return, napi_gro_receive_exit, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret) > +); > + > +DEFINE_EVENT(net_dev_template_return, netif_receive_skb_exit, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret) > +); > + > +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit, One small nit, and I don't care enough to ask you to fix it, but others might care. We probably should not intermix net_dev_template_simple events within net_dev_template_return events. But like I said, I don't really care, it's more cosmetic than anything else. For the tracing side: Reviewed-by: Steven Rostedt (VMware) -- Steve > + > + TP_PROTO(struct sk_buff *skb), > + > + TP_ARGS(skb) > +); > + > +DEFINE_EVENT(net_dev_template_return, netif_rx_exit, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret) > +); > + > +DEFINE_EVENT(net_dev_template_return, netif_rx_ni_exit, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret) > +); > + > #endif /* _TRACE_NET_H */ > >
Re: [PATCH net-next] net: phy: check for implementation of both callbacks in phy_drv_supports_irq
On 11/12/18 12:16 PM, Heiner Kallweit wrote: > Now that the icplus driver has been fixed all PHY drivers supporting > interrupts have both callbacks (config_intr and ack_interrupt) > implemented - as it should be. Therefore phy_drv_supports_irq() > can be changed now to check for both callbacks being implemented. > > Signed-off-by: Heiner Kallweit Acked-by: Florian Fainelli -- Florian
Re: [PATCH v2] net: Add trace events for all receive exit points
- On Nov 12, 2018, at 3:09 PM, Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote: > - On Nov 12, 2018, at 2:44 PM, Geneviève Bastien gbast...@versatic.net > wrote: > >> Trace events are already present for the receive entry points, to indicate >> how the reception entered the stack. >> >> This patch adds the corresponding exit trace events that will bound the >> reception such that all events occurring between the entry and the exit >> can be considered as part of the reception context. This greatly helps >> for dependency and root cause analyses. >> >> Without this, it is impossible to determine whether a sched_wakeup >> event following a netif_receive_skb event is the result of the packet >> reception or a simple coincidence after further processing by the >> thread. > > As discussed over IRC, it is _possible_ to use kretprobes to instrument > the exit, but it is not practical. A v3 will be sent soon to clarify > this part of the commit message. > > Also, I wonder if we should use "net_dev_template_exit" for the event > class rather than "net_dev_template_return" ? Hrm, looking at this again, I notice that there is a single DEFINE_EVENT using net_dev_template_simple. We could simply turn netif_receive_skb_list_exit into a TRACE_EVENT(), remove the net_dev_template_simple, and rename the net_dev_template_return to net_dev_template ? It's pretty clear from the prototype that it expects a "ret" argument, so I don't see the need to also state it in the template name. Thanks, Mathieu > > Thanks, > > Mathieu > >> >> Signed-off-by: Geneviève Bastien >> CC: Mathieu Desnoyers >> CC: Steven Rostedt >> CC: Ingo Molnar >> CC: David S. Miller >> --- >> Changes in v2: >> - Add the return value to tracepoints where applicable >> - Verify if tracepoint is enabled before walking list in >>netif_receive_skb_list >> --- >> include/trace/events/net.h | 78 ++ >> net/core/dev.c | 38 --- >> 2 files changed, 110 insertions(+), 6 deletions(-) >> >> diff --git a/include/trace/events/net.h b/include/trace/events/net.h >> index 00aa72ce0e7c..cff1a7b9d0bb 100644 >> --- a/include/trace/events/net.h >> +++ b/include/trace/events/net.h >> @@ -117,6 +117,42 @@ DECLARE_EVENT_CLASS(net_dev_template, >> __get_str(name), __entry->skbaddr, __entry->len) >> ) >> >> +DECLARE_EVENT_CLASS(net_dev_template_return, >> + >> +TP_PROTO(struct sk_buff *skb, int ret), >> + >> +TP_ARGS(skb, ret), >> + >> +TP_STRUCT__entry( >> +__field(void *, skbaddr) >> +__field(int,ret) >> +), >> + >> +TP_fast_assign( >> +__entry->skbaddr = skb; >> +__entry->ret = ret; >> +), >> + >> +TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret) >> +) >> + >> +DECLARE_EVENT_CLASS(net_dev_template_simple, >> + >> +TP_PROTO(struct sk_buff *skb), >> + >> +TP_ARGS(skb), >> + >> +TP_STRUCT__entry( >> +__field(void *, skbaddr) >> +), >> + >> +TP_fast_assign( >> +__entry->skbaddr = skb; >> +), >> + >> +TP_printk("skbaddr=%p", __entry->skbaddr) >> +) >> + >> DEFINE_EVENT(net_dev_template, net_dev_queue, >> >> TP_PROTO(struct sk_buff *skb), >> @@ -244,6 +280,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template, >> netif_rx_ni_entry, >> TP_ARGS(skb) >> ); >> >> +DEFINE_EVENT(net_dev_template_return, napi_gro_frags_exit, >> + >> +TP_PROTO(struct sk_buff *skb, int ret), >> + >> +TP_ARGS(skb, ret) >> +); >> + >> +DEFINE_EVENT(net_dev_template_return, napi_gro_receive_exit, >> + >> +TP_PROTO(struct sk_buff *skb, int ret), >> + >> +TP_ARGS(skb, ret) >> +); >> + >> +DEFINE_EVENT(net_dev_template_return, netif_receive_skb_exit, >> + >> +TP_PROTO(struct sk_buff *skb, int ret), >> + >> +TP_ARGS(skb, ret) >> +); >> + >> +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit, >> + >> +TP_PROTO(struct sk_buff *skb), >> + >> +TP_ARGS(skb) >> +); >> + >> +DEFINE_EVENT(net_dev_template_return, netif_rx_exit, >> + >> +TP_PROTO(struct sk_buff *skb, int ret), >> + >> +TP_ARGS(skb, ret) >> +); >> + >> +DEFINE_EVENT(net_dev_template_return, netif_rx_ni_exit, >> + >> +TP_PROTO(struct sk_buff *skb, int ret), >> + >> +TP_ARGS(skb, ret) >> +); >> + >> #endif /* _TRACE_NET_H */ >> >> /* This part must be outside protection */ >> diff --git a/net/core/dev.c b/net/core/dev.c >> index 0ffcbdd55fa9..c4dc5df34abe 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -4520,9 +4520,14 @@ static int netif_rx_internal(struct sk_buff *skb) >> >> int netif_rx(struct sk_buff *skb) >> { >> +int ret; >> + >> trace_netif_rx_entry(skb); >> >> -return netif_rx_internal(skb); >> +ret = netif_rx_internal(skb); >> +trace_netif_rx_exit(skb, ret); >> + >> +return ret; >> } >> EXPORT_SYMBOL(netif_rx); >> >> @@ -4537,6 +4542,7 @@ int netif_rx_ni(struct sk_buff *skb) >> if
[PATCH net-next] net: phy: check for implementation of both callbacks in phy_drv_supports_irq
Now that the icplus driver has been fixed all PHY drivers supporting interrupts have both callbacks (config_intr and ack_interrupt) implemented - as it should be. Therefore phy_drv_supports_irq() can be changed now to check for both callbacks being implemented. Signed-off-by: Heiner Kallweit --- drivers/net/phy/phy_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 55202a0ac..e06613f2d 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -2122,7 +2122,7 @@ static void of_set_phy_eee_broken(struct phy_device *phydev) static bool phy_drv_supports_irq(struct phy_driver *phydrv) { - return phydrv->config_intr || phydrv->ack_interrupt; + return phydrv->config_intr && phydrv->ack_interrupt; } /** -- 2.19.1
Re: [PATCH v2] net: Add trace events for all receive exit points
- On Nov 12, 2018, at 2:44 PM, Geneviève Bastien gbast...@versatic.net wrote: > Trace events are already present for the receive entry points, to indicate > how the reception entered the stack. > > This patch adds the corresponding exit trace events that will bound the > reception such that all events occurring between the entry and the exit > can be considered as part of the reception context. This greatly helps > for dependency and root cause analyses. > > Without this, it is impossible to determine whether a sched_wakeup > event following a netif_receive_skb event is the result of the packet > reception or a simple coincidence after further processing by the > thread. As discussed over IRC, it is _possible_ to use kretprobes to instrument the exit, but it is not practical. A v3 will be sent soon to clarify this part of the commit message. Also, I wonder if we should use "net_dev_template_exit" for the event class rather than "net_dev_template_return" ? Thanks, Mathieu > > Signed-off-by: Geneviève Bastien > CC: Mathieu Desnoyers > CC: Steven Rostedt > CC: Ingo Molnar > CC: David S. Miller > --- > Changes in v2: > - Add the return value to tracepoints where applicable > - Verify if tracepoint is enabled before walking list in >netif_receive_skb_list > --- > include/trace/events/net.h | 78 ++ > net/core/dev.c | 38 --- > 2 files changed, 110 insertions(+), 6 deletions(-) > > diff --git a/include/trace/events/net.h b/include/trace/events/net.h > index 00aa72ce0e7c..cff1a7b9d0bb 100644 > --- a/include/trace/events/net.h > +++ b/include/trace/events/net.h > @@ -117,6 +117,42 @@ DECLARE_EVENT_CLASS(net_dev_template, > __get_str(name), __entry->skbaddr, __entry->len) > ) > > +DECLARE_EVENT_CLASS(net_dev_template_return, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret), > + > + TP_STRUCT__entry( > + __field(void *, skbaddr) > + __field(int,ret) > + ), > + > + TP_fast_assign( > + __entry->skbaddr = skb; > + __entry->ret = ret; > + ), > + > + TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret) > +) > + > +DECLARE_EVENT_CLASS(net_dev_template_simple, > + > + TP_PROTO(struct sk_buff *skb), > + > + TP_ARGS(skb), > + > + TP_STRUCT__entry( > + __field(void *, skbaddr) > + ), > + > + TP_fast_assign( > + __entry->skbaddr = skb; > + ), > + > + TP_printk("skbaddr=%p", __entry->skbaddr) > +) > + > DEFINE_EVENT(net_dev_template, net_dev_queue, > > TP_PROTO(struct sk_buff *skb), > @@ -244,6 +280,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template, > netif_rx_ni_entry, > TP_ARGS(skb) > ); > > +DEFINE_EVENT(net_dev_template_return, napi_gro_frags_exit, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret) > +); > + > +DEFINE_EVENT(net_dev_template_return, napi_gro_receive_exit, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret) > +); > + > +DEFINE_EVENT(net_dev_template_return, netif_receive_skb_exit, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret) > +); > + > +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit, > + > + TP_PROTO(struct sk_buff *skb), > + > + TP_ARGS(skb) > +); > + > +DEFINE_EVENT(net_dev_template_return, netif_rx_exit, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret) > +); > + > +DEFINE_EVENT(net_dev_template_return, netif_rx_ni_exit, > + > + TP_PROTO(struct sk_buff *skb, int ret), > + > + TP_ARGS(skb, ret) > +); > + > #endif /* _TRACE_NET_H */ > > /* This part must be outside protection */ > diff --git a/net/core/dev.c b/net/core/dev.c > index 0ffcbdd55fa9..c4dc5df34abe 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4520,9 +4520,14 @@ static int netif_rx_internal(struct sk_buff *skb) > > int netif_rx(struct sk_buff *skb) > { > + int ret; > + > trace_netif_rx_entry(skb); > > - return netif_rx_internal(skb); > + ret = netif_rx_internal(skb); > + trace_netif_rx_exit(skb, ret); > + > + return ret; > } > EXPORT_SYMBOL(netif_rx); > > @@ -4537,6 +4542,7 @@ int netif_rx_ni(struct sk_buff *skb) > if (local_softirq_pending()) > do_softirq(); > preempt_enable(); > + trace_netif_rx_ni_exit(skb, err); > > return err; > } > @@ -5222,9 +5228,14 @@ static void netif_receive_skb_list_internal(struct > list_head *head) > */ > int netif_receive_skb(struct sk_buff *skb) > { > + int ret; > + > trace_netif_receive_skb_entry(skb); > > - return netif_receive_skb_internal(skb); > + ret = netif_receive_skb_internal(skb); > + trace_netif_receive_skb_exit(skb, ret); > + > + return ret; > } > EXPORT_SYMBOL(netif_receive_skb); > > @@ -5244,9 +5255,15 @@ void netif_receive_skb_list(struct list_head
Re: [PATCH bpf-next] bpftool: make libbfd optional
On 11/12, Jakub Kicinski wrote: > On Mon, 12 Nov 2018 10:53:28 -0800, Stanislav Fomichev wrote: > > Make it possible to build bpftool without libbfd. This excludes support for > > disassembling jit-ted code and prints an error if the user tries to use > > these features. > > > > Tested by: > > cat > FEATURES_DUMP.bpftool < > feature-libbfd=0 > > feature-disassembler-four-args=1 > > feature-reallocarray=0 > > feature-libelf=1 > > feature-libelf-mmap=1 > > feature-bpf=1 > > EOF > > FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make > > ldd bpftool | grep libbfd > > > > Signed-off-by: Stanislav Fomichev > > Would you mind spelling out the motivation? Ack, will do in v2 commit message. tldr - some fleet machines don't have dev/dbg stuff installed; would like an option to have bpftool that works everywhere for introspection. > > diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h > > index 61d82020af58..ec1bc2ae3c71 100644 > > --- a/tools/bpf/bpftool/main.h > > +++ b/tools/bpf/bpftool/main.h > > @@ -147,8 +147,19 @@ int prog_parse_fd(int *argc, char ***argv); > > int map_parse_fd(int *argc, char ***argv); > > int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 > > *info_len); > > > > +#ifdef HAVE_LIBBFD_SUPPORT > > void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, > >const char *arch, const char *disassembler_options); > > +void disasm_init(void); > > +#else > > +static inline > > +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, > > + const char *arch, const char *disassembler_options) > > +{ > > + p_err("No libbfd support"); > > +} > > I think an error per instruction is a bit much, could we make sure we > error out earlier? I can return int from disasm_print_insn as an indication to stop/continue. Let me know if you have better ideas, will post v2 later today. > > +static inline void disasm_init(void) {} > > +#endif > > void print_data_json(uint8_t *data, size_t len); > > void print_hex_data_json(uint8_t *data, size_t len); > > Otherwise LGTM. Thanks!
[PATCH v2] net: Add trace events for all receive exit points
Trace events are already present for the receive entry points, to indicate how the reception entered the stack. This patch adds the corresponding exit trace events that will bound the reception such that all events occurring between the entry and the exit can be considered as part of the reception context. This greatly helps for dependency and root cause analyses. Without this, it is impossible to determine whether a sched_wakeup event following a netif_receive_skb event is the result of the packet reception or a simple coincidence after further processing by the thread. Signed-off-by: Geneviève Bastien CC: Mathieu Desnoyers CC: Steven Rostedt CC: Ingo Molnar CC: David S. Miller --- Changes in v2: - Add the return value to tracepoints where applicable - Verify if tracepoint is enabled before walking list in netif_receive_skb_list --- include/trace/events/net.h | 78 ++ net/core/dev.c | 38 --- 2 files changed, 110 insertions(+), 6 deletions(-) diff --git a/include/trace/events/net.h b/include/trace/events/net.h index 00aa72ce0e7c..cff1a7b9d0bb 100644 --- a/include/trace/events/net.h +++ b/include/trace/events/net.h @@ -117,6 +117,42 @@ DECLARE_EVENT_CLASS(net_dev_template, __get_str(name), __entry->skbaddr, __entry->len) ) +DECLARE_EVENT_CLASS(net_dev_template_return, + + TP_PROTO(struct sk_buff *skb, int ret), + + TP_ARGS(skb, ret), + + TP_STRUCT__entry( + __field(void *, skbaddr) + __field(int,ret) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + __entry->ret = ret; + ), + + TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret) +) + +DECLARE_EVENT_CLASS(net_dev_template_simple, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb), + + TP_STRUCT__entry( + __field(void *, skbaddr) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + ), + + TP_printk("skbaddr=%p", __entry->skbaddr) +) + DEFINE_EVENT(net_dev_template, net_dev_queue, TP_PROTO(struct sk_buff *skb), @@ -244,6 +280,48 @@ DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_ni_entry, TP_ARGS(skb) ); +DEFINE_EVENT(net_dev_template_return, napi_gro_frags_exit, + + TP_PROTO(struct sk_buff *skb, int ret), + + TP_ARGS(skb, ret) +); + +DEFINE_EVENT(net_dev_template_return, napi_gro_receive_exit, + + TP_PROTO(struct sk_buff *skb, int ret), + + TP_ARGS(skb, ret) +); + +DEFINE_EVENT(net_dev_template_return, netif_receive_skb_exit, + + TP_PROTO(struct sk_buff *skb, int ret), + + TP_ARGS(skb, ret) +); + +DEFINE_EVENT(net_dev_template_simple, netif_receive_skb_list_exit, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb) +); + +DEFINE_EVENT(net_dev_template_return, netif_rx_exit, + + TP_PROTO(struct sk_buff *skb, int ret), + + TP_ARGS(skb, ret) +); + +DEFINE_EVENT(net_dev_template_return, netif_rx_ni_exit, + + TP_PROTO(struct sk_buff *skb, int ret), + + TP_ARGS(skb, ret) +); + #endif /* _TRACE_NET_H */ /* This part must be outside protection */ diff --git a/net/core/dev.c b/net/core/dev.c index 0ffcbdd55fa9..c4dc5df34abe 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4520,9 +4520,14 @@ static int netif_rx_internal(struct sk_buff *skb) int netif_rx(struct sk_buff *skb) { + int ret; + trace_netif_rx_entry(skb); - return netif_rx_internal(skb); + ret = netif_rx_internal(skb); + trace_netif_rx_exit(skb, ret); + + return ret; } EXPORT_SYMBOL(netif_rx); @@ -4537,6 +4542,7 @@ int netif_rx_ni(struct sk_buff *skb) if (local_softirq_pending()) do_softirq(); preempt_enable(); + trace_netif_rx_ni_exit(skb, err); return err; } @@ -5222,9 +5228,14 @@ static void netif_receive_skb_list_internal(struct list_head *head) */ int netif_receive_skb(struct sk_buff *skb) { + int ret; + trace_netif_receive_skb_entry(skb); - return netif_receive_skb_internal(skb); + ret = netif_receive_skb_internal(skb); + trace_netif_receive_skb_exit(skb, ret); + + return ret; } EXPORT_SYMBOL(netif_receive_skb); @@ -5244,9 +5255,15 @@ void netif_receive_skb_list(struct list_head *head) if (list_empty(head)) return; - list_for_each_entry(skb, head, list) - trace_netif_receive_skb_list_entry(skb); + if (trace_netif_receive_skb_list_entry_enabled()) { + list_for_each_entry(skb, head, list) + trace_netif_receive_skb_list_entry(skb); + } netif_receive_skb_list_internal(head); + if (trace_netif_receive_skb_list_exit_enabled()) { + list_for_each_entry(skb, head, list) + trace_netif_receive_skb_list_exit(skb); + } }
Re: [PATCH bpf-next] bpftool: make libbfd optional
On Mon, 12 Nov 2018 10:53:28 -0800, Stanislav Fomichev wrote: > Make it possible to build bpftool without libbfd. This excludes support for > disassembling jit-ted code and prints an error if the user tries to use > these features. > > Tested by: > cat > FEATURES_DUMP.bpftool < feature-libbfd=0 > feature-disassembler-four-args=1 > feature-reallocarray=0 > feature-libelf=1 > feature-libelf-mmap=1 > feature-bpf=1 > EOF > FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make > ldd bpftool | grep libbfd > > Signed-off-by: Stanislav Fomichev Would you mind spelling out the motivation? > diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h > index 61d82020af58..ec1bc2ae3c71 100644 > --- a/tools/bpf/bpftool/main.h > +++ b/tools/bpf/bpftool/main.h > @@ -147,8 +147,19 @@ int prog_parse_fd(int *argc, char ***argv); > int map_parse_fd(int *argc, char ***argv); > int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 > *info_len); > > +#ifdef HAVE_LIBBFD_SUPPORT > void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, > const char *arch, const char *disassembler_options); > +void disasm_init(void); > +#else > +static inline > +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, > +const char *arch, const char *disassembler_options) > +{ > + p_err("No libbfd support"); > +} I think an error per instruction is a bit much, could we make sure we error out earlier? > +static inline void disasm_init(void) {} > +#endif > void print_data_json(uint8_t *data, size_t len); > void print_hex_data_json(uint8_t *data, size_t len); Otherwise LGTM.
Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
W dniu 11.11.2018 o 09:56, Jesper Dangaard Brouer pisze: On Sat, 10 Nov 2018 22:53:53 +0100 Paweł Staszewski wrote: Now im messing with ring configuration for connectx5 nics. And after reading that paper: https://netdevconf.org/2.1/slides/apr6/network-performance/04-amir-RX_and_TX_bulking_v2.pdf Do notice that some of the ideas in that slide deck, was never implemented. But they are still on my todo list ;-). Notice how that it show that TX bulking is very important, but based on your ethtool_stats.pl, I can see that not much TX bulking is happening in your case. This is indicated via the xmit_more counters. Ethtool(enp175s0) stat:2630 ( 2,630) <= tx_xmit_more /sec Ethtool(enp175s0) stat: 4956995 ( 4,956,995) <= tx_packets /sec And the per queue levels are also avail: Ethtool(enp175s0) stat: 184845 ( 184,845) <= tx7_packets /sec Ethtool(enp175s0) stat: 78 ( 78) <= tx7_xmit_more /sec This means that you are doing too many doorbell's to the NIC hardware at TX time, which I worry could be what cause the NIC and PCIe hardware not to operate at optimal speeds. After tunning coal/ring a little with ethtool Reached today: bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help input: /proc/net/dev type: rate | iface Rx Tx Total == enp175s0: 50.68 Gb/s 21.53 Gb/s 72.20 Gb/s enp216s0: 21.62 Gb/s 50.81 Gb/s 72.42 Gb/s -- total: 72.30 Gb/s 72.33 Gb/s 144.63 Gb/s And still no packet loss (icmp side to side test every 100ms) Below perf top PerfTop: 104692 irqs/sec kernel:99.5% exact: 0.0% [4000Hz cycles], (all, 56 CPUs) --- 9.06% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear 6.43% [kernel] [k] tasklet_action_common.isra.21 5.68% [kernel] [k] fib_table_lookup 4.89% [kernel] [k] irq_entries_start 4.53% [kernel] [k] mlx5_eq_int 4.10% [kernel] [k] build_skb 3.39% [kernel] [k] mlx5e_poll_tx_cq 3.38% [kernel] [k] mlx5e_sq_xmit 2.73% [kernel] [k] mlx5e_poll_rx_cq 2.18% [kernel] [k] __dev_queue_xmit 2.13% [kernel] [k] vlan_do_receive 2.12% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq 2.00% [kernel] [k] ip_finish_output2 1.87% [kernel] [k] mlx5e_post_rx_mpwqes 1.86% [kernel] [k] memcpy_erms 1.85% [kernel] [k] ipt_do_table 1.70% [kernel] [k] dev_gro_receive 1.39% [kernel] [k] __netif_receive_skb_core 1.31% [kernel] [k] inet_gro_receive 1.21% [kernel] [k] ip_route_input_rcu 1.21% [kernel] [k] tcp_gro_receive 1.13% [kernel] [k] _raw_spin_lock 1.08% [kernel] [k] __build_skb 1.06% [kernel] [k] kmem_cache_free_bulk 1.05% [kernel] [k] __softirqentry_text_start 1.03% [kernel] [k] vlan_dev_hard_start_xmit 0.98% [kernel] [k] pfifo_fast_dequeue 0.95% [kernel] [k] mlx5e_xmit 0.95% [kernel] [k] page_frag_free 0.88% [kernel] [k] ip_forward 0.81% [kernel] [k] dev_hard_start_xmit 0.78% [kernel] [k] rcu_irq_exit 0.77% [kernel] [k] netif_skb_features 0.72% [kernel] [k] napi_complete_done 0.72% [kernel] [k] kmem_cache_alloc 0.68% [kernel] [k] validate_xmit_skb.isra.142 0.66% [kernel] [k] ip_rcv_core.isra.20.constprop.25 0.58% [kernel] [k] swiotlb_map_page 0.57% [kernel] [k] __qdisc_run 0.56% [kernel] [k] tasklet_action 0.54% [kernel] [k] __get_xps_queue_idx 0.54% [kernel] [k] inet_lookup_ifaddr_rcu 0.50% [kernel] [k] tcp4_gro_receive 0.49% [kernel] [k] skb_release_data 0.47% [kernel] [k] eth_type_trans 0.40% [kernel] [k] sch_direct_xmit 0.40% [kernel] [k] net_rx_action 0.39% [kernel] [k] __local_bh_enable_ip And perf record/report https://ufile.io/zguq0 So now i know what was causing cpu load for some processes like: 2913 root 20 0 0 0 0 I 10.3 0.0 6:58.29 kworker/u112:1- 7 root 20 0 0 0 0 I 8.6 0.0 6:17.18 kworker/u112:0- 10289 root 20 0 0 0 0 I 6.6 0.0 6:33.90 kworker/u112:4- 2939 root 20 0 0 0 0 R 3.6 0.0 7:37.68 kworker/u112:2- After disabling adaptative tx for coalescense - all this
Re: [PATCH net-next] net: phy: switch to lockdep_assert_held in phylib
On 11/12/18 9:44 AM, Andrew Lunn wrote: > On Sun, Nov 11, 2018 at 10:33:08PM +0100, Heiner Kallweit wrote: >> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c >> index 2e59a8419..5cb06f021 100644 >> --- a/drivers/net/phy/mdio_bus.c >> +++ b/drivers/net/phy/mdio_bus.c >> @@ -541,7 +541,7 @@ int __mdiobus_read(struct mii_bus *bus, int addr, u32 >> regnum) >> { >> int retval; >> >> -WARN_ON_ONCE(!mutex_is_locked(>mdio_lock)); >> +lockdep_assert_held_once(>mdio_lock); > > Hi Heiner > > I don't think there is a clear right/wrong here. This is not hot path > code. The cost for checking the lock is held is very small compared to > the actual MDIO transaction. So i don't think we really need to > optimise this. I do sometimes build with lockdep on, but not > always. So it is good to know when locking is broken on normal builds. > > Florian, what do you think? lockdep_assert_held_once() also looks at debug_locks (global variable) so it sounds like in that regard, it would be superior in that it allows an user-configurable, general debugging facility to behave consistently as opposed to always having opt-in debugging within the mdio_bus.c file, but that also has a lot of value. I have to admit debugging MDIO bus locking issues is not particularly fun, so I would probably stick with the current code in that regard. -- Florian
Re: [PATCH net-next] net: phy: icplus: add config_intr callback
On 11/11/18 12:49 PM, Heiner Kallweit wrote: > Move IRQ configuration for IP101A/G from config_init to config_intr > callback. Reasons: > > 1. This allows phylib to disable interrupts if needed. > 2. Icplus was the only driver supporting interrupts w/o defining a >config_intr callback. Now we can add a phylib plausibility check >disabling interrupt mode if one of the two irq-related callbacks >isn't defined. > > I don't own hardware with this PHY, and the change is based on the > datasheet for IP101A LF (which is supposed to be register-compatible > with IP101A/G). Change is compile-tested only. > > Signed-off-by: Heiner Kallweit Reviewed-by: Florian Fainelli A bit surprising that this is not a read/modify/write sequence. -- Florian
[PATCH bpf-next] bpftool: make libbfd optional
Make it possible to build bpftool without libbfd. This excludes support for disassembling jit-ted code and prints an error if the user tries to use these features. Tested by: cat > FEATURES_DUMP.bpftool < --- tools/bpf/bpftool/Makefile | 13 +++-- tools/bpf/bpftool/jit_disasm.c | 7 ++- tools/bpf/bpftool/main.c | 3 +-- tools/bpf/bpftool/main.h | 11 +++ 4 files changed, 29 insertions(+), 5 deletions(-) diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile index dac7eff4c7e5..1bea6b979082 100644 --- a/tools/bpf/bpftool/Makefile +++ b/tools/bpf/bpftool/Makefile @@ -53,7 +53,7 @@ ifneq ($(EXTRA_LDFLAGS),) LDFLAGS += $(EXTRA_LDFLAGS) endif -LIBS = -lelf -lbfd -lopcodes $(LIBBPF) +LIBS = -lelf $(LIBBPF) INSTALL ?= install RM ?= rm -f @@ -90,7 +90,16 @@ include $(wildcard $(OUTPUT)*.d) all: $(OUTPUT)bpftool -SRCS = $(wildcard *.c) +BFD_SRCS = jit_disasm.c + +SRCS = $(filter-out $(BFD_SRCS),$(wildcard *.c)) + +ifeq ($(feature-libbfd),1) +CFLAGS += -DHAVE_LIBBFD_SUPPORT +SRCS += $(BFD_SRCS) +LIBS += -lbfd -lopcodes +endif + OBJS = $(patsubst %.c,$(OUTPUT)%.o,$(SRCS)) $(OUTPUT)disasm.o $(OUTPUT)disasm.o: $(srctree)/kernel/bpf/disasm.c diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c index c75ffd9ce2bb..6176bfe66f22 100644 --- a/tools/bpf/bpftool/jit_disasm.c +++ b/tools/bpf/bpftool/jit_disasm.c @@ -109,7 +109,7 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, if (inf) { bfdf->arch_info = inf; } else { - p_err("No libfd support for %s", arch); + p_err("No libbfd support for %s", arch); return; } } @@ -183,3 +183,8 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, bfd_close(bfdf); } + +void disasm_init(void) +{ + bfd_init(); +} diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c index 75a3296dc0bc..51aa71069b14 100644 --- a/tools/bpf/bpftool/main.c +++ b/tools/bpf/bpftool/main.c @@ -31,7 +31,6 @@ * SOFTWARE. */ -#include #include #include #include @@ -399,7 +398,7 @@ int main(int argc, char **argv) if (argc < 0) usage(); - bfd_init(); + disasm_init(); ret = cmd_select(cmds, argc, argv, do_help); diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 61d82020af58..ec1bc2ae3c71 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -147,8 +147,19 @@ int prog_parse_fd(int *argc, char ***argv); int map_parse_fd(int *argc, char ***argv); int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len); +#ifdef HAVE_LIBBFD_SUPPORT void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, const char *arch, const char *disassembler_options); +void disasm_init(void); +#else +static inline +void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, + const char *arch, const char *disassembler_options) +{ + p_err("No libbfd support"); +} +static inline void disasm_init(void) {} +#endif void print_data_json(uint8_t *data, size_t len); void print_hex_data_json(uint8_t *data, size_t len); -- 2.19.1.930.g4563a0d9d0-goog
Re: [PATCH bpf-next] filter: add BPF_ADJ_ROOM_DATA mode to bpf_skb_adjust_room()
On Sun, Nov 11, 2018 at 12:43:27AM +0100, Nicolas Dichtel wrote: > Le 09/11/2018 à 19:51, Martin Lau a écrit : > > On Thu, Nov 08, 2018 at 04:11:37PM +0100, Nicolas Dichtel wrote: > [snip] > >> +static int bpf_skb_data_shrink(struct sk_buff *skb, u32 len) > >> +{ > >> + unsigned short hhlen = skb->dev->header_ops ? > >> + skb->dev->hard_header_len : 0; > >> + int ret; > >> + > >> + ret = skb_unclone(skb, GFP_ATOMIC); > >> + if (unlikely(ret < 0)) > >> + return ret; > >> + > >> + __skb_pull(skb, len); > >> + skb_reset_mac_header(skb); > >> + skb_reset_network_header(skb); > >> + skb->network_header += hhlen; Nit. skb_set_network_header(skb, hhlen); Othen than that Acked-by: Martin KaFai Lau > >> + skb_reset_transport_header(skb); > > hmm...why transport_header does not need += hhlen here > > while network_header does? > > network_header is mandatory because bpf_redirect(BPF_F_INGRESS) can be called > and network_header is expected to be correctly set in this case. > For transport_header, I choose to not set it, because the stack will set it > later (for example ip_rcv_core()). ic. make sense.
[PATCHv2 net-net] net: dsa: mv88e6xxx: Work around mv886e6161 SERDES missing MII_PHYSID2
We already have a workaround for a couple of switches whose internal PHYs only have the Marvel OUI, but no model number. We detect such PHYs and give them the 6390 ID as the model number. However the mv88e6161 has two SERDES interfaces in the same address range as its internal PHYs. These suffer from the same problem, the Marvell OUI, but no model number. As a result, these SERDES interfaces were getting the same PHY ID as the mv88e6390, even though they are not PHYs, and the Marvell PHY driver was trying to drive them. Add a special case to stop this from happen. Reported-by: Chris Healy Signed-off-by: Andrew Lunn --- v2: grammar fix in commit message. PHYS->PHYs --- drivers/net/dsa/mv88e6xxx/chip.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index fc0f508879d4..b603f8d6ee3e 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2524,11 +2524,22 @@ static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg) mutex_unlock(>reg_lock); if (reg == MII_PHYSID2) { - /* Some internal PHYS don't have a model number. Use -* the mv88e6390 family model number instead. -*/ - if (!(val & 0x3f0)) - val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4; + /* Some internal PHYs don't have a model number. */ + if (chip->info->family != MV88E6XXX_FAMILY_6165) + /* Then there is the 6165 family. It gets is +* PHYs correct. But it can also have two +* SERDES interfaces in the PHY address +* space. And these don't have a model +* number. But they are not PHYs, so we don't +* want to give them something a PHY driver +* will recognise. +* +* Use the mv88e6390 family model number +* instead, for anything which really could be +* a PHY, +*/ + if (!(val & 0x3f0)) + val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4; } return err ? err : val; -- 2.19.1
[PATCHv2] MAINTAINERS: Replace Vince Bridgers as Altera TSE maintainer
From: Thor Thayer Vince has moved to a different role. Replace him as Altera TSE maintainer. Signed-off-by: Thor Thayer Acked-by: Vince Bridgers Acked-by: Alan Tull --- v2 Include netdev and David Miller --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index e110e327bf38..827fd5fc6ebd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -717,7 +717,7 @@ F: include/linux/mfd/altera-a10sr.h F: include/dt-bindings/reset/altr,rst-mgr-a10sr.h ALTERA TRIPLE SPEED ETHERNET DRIVER -M: Vince Bridgers +M: Thor Thayer L: netdev@vger.kernel.org L: nios2-...@lists.rocketboards.org (moderated for non-subscribers) S: Maintained -- 2.7.4
Re: [PATCH net-next] net: phy: switch to lockdep_assert_held in phylib
On Sun, Nov 11, 2018 at 10:33:08PM +0100, Heiner Kallweit wrote: > diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c > index 2e59a8419..5cb06f021 100644 > --- a/drivers/net/phy/mdio_bus.c > +++ b/drivers/net/phy/mdio_bus.c > @@ -541,7 +541,7 @@ int __mdiobus_read(struct mii_bus *bus, int addr, u32 > regnum) > { > int retval; > > - WARN_ON_ONCE(!mutex_is_locked(>mdio_lock)); > + lockdep_assert_held_once(>mdio_lock); Hi Heiner I don't think there is a clear right/wrong here. This is not hot path code. The cost for checking the lock is held is very small compared to the actual MDIO transaction. So i don't think we really need to optimise this. I do sometimes build with lockdep on, but not always. So it is good to know when locking is broken on normal builds. Florian, what do you think? Andrew
Re: [PATCH net-next] net: phy: icplus: add config_intr callback
On Sun, Nov 11, 2018 at 09:49:12PM +0100, Heiner Kallweit wrote: > Move IRQ configuration for IP101A/G from config_init to config_intr > callback. Reasons: > > 1. This allows phylib to disable interrupts if needed. > 2. Icplus was the only driver supporting interrupts w/o defining a >config_intr callback. Now we can add a phylib plausibility check >disabling interrupt mode if one of the two irq-related callbacks >isn't defined. > > I don't own hardware with this PHY, and the change is based on the > datasheet for IP101A LF (which is supposed to be register-compatible > with IP101A/G). Change is compile-tested only. Hi Heiner This looks sensible. Thanks for fixing up this driver. Reviewed-by: Andrew Lunn Andrew
Re: [PATCH net-next 17/17] net: sched: unlock rules update API
From: Vlad Buslov Date: Mon, 12 Nov 2018 09:55:46 +0200 > Register netlink protocol handlers for message types RTM_NEWTFILTER, > RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that > tracks rtnl mutex state to be false by default. This whole conditional locking mechanism is really not clean and makes this code so much harder to understand and audit. Please improve the code so that this kind of construct is not needed. Thank you.
Re: [PATCH net-next 02/17] net: sched: protect block state with spinlock
From: Vlad Buslov Date: Mon, 12 Nov 2018 09:55:31 +0200 > +#define ASSERT_BLOCK_LOCKED(block) \ > + WARN_ONCE(!spin_is_locked(&(block)->lock), \ > + "BLOCK: assertion failed at %s (%d)\n", __FILE__, __LINE__) spin_is_locked() is not usable for assertions. When SMP=n and DEBUG_SPINLOCK=n, it always returns zero.
Re: [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
From: Vlad Buslov Date: Mon, 12 Nov 2018 09:55:30 +0200 > +void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp, > + struct tcf_proto *tp_head) > +{ > + xchg(>tp_head, tp_head); If you are not checking the return value of xchg(), then this is simply a store with optionally a memory barrier of some sort either before or after.
Re: [RFC PATCH 1/3] acpi: Add acpi mdio support code
On Thu, Nov 08, 2018 at 03:22:16PM +0800, Wang Dongsheng wrote: > Add support for parsing the ACPI data node for PHY devices on an MDIO bus. > The current implementation depend on mdio bus scan. > With _DSD device properties we can finally do this: > > Device (MDIO) { > Name (_DSD, Package () { > ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"), > Package () { Package () { "ethernet-phy@0", PHY0 }, } > }) > Name (PHY0, Package() { > ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), > Package () { Package () { "reg", 0x0 }, } > }) > } > > Device (MACO) { > Name (_DSD, Package () { > ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), > Package () { Package () { "phy-handle", \_SB.MDIO, > "ethernet-phy@0" }, } > }) > } > > Documentations: > The DT "phy-handle" binding that we reuse for ACPI is documented in > Documentation/devicetree/bindings/phy/phy-bindings.txt > > Documentation/acpi/dsd/data-node-references.txt > Documentation/acpi/dsd/graph.txt > > Signed-off-by: Wang Dongsheng > --- > drivers/acpi/Kconfig | 6 ++ > drivers/acpi/Makefile | 1 + > drivers/acpi/acpi_mdio.c | 167 + > drivers/net/phy/mdio_bus.c | 3 + > include/linux/acpi_mdio.h | 82 ++ > 5 files changed, 259 insertions(+) > > diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig > index 9705fc986da9..0fefa3410ce9 100644 > --- a/drivers/acpi/Kconfig > +++ b/drivers/acpi/Kconfig > @@ -252,6 +252,12 @@ config ACPI_PROCESSOR_IDLE > config ACPI_MCFG > bool > > +config ACPI_MDIO > + def_tristate PHYLIB > + depends on PHYLIB > + help > + ACPI MDIO bus (Ethernet PHY) accessors > + > config ACPI_CPPC_LIB > bool > depends on ACPI_PROCESSOR > diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile > index 6d59aa109a91..ec7461a064fc 100644 > --- a/drivers/acpi/Makefile > +++ b/drivers/acpi/Makefile > @@ -41,6 +41,7 @@ acpi-y += ec.o > acpi-$(CONFIG_ACPI_DOCK) += dock.o > acpi-y += pci_root.o pci_link.o pci_irq.o > obj-$(CONFIG_ACPI_MCFG) += pci_mcfg.o > +acpi-$(CONFIG_ACPI_MDIO) += acpi_mdio.o > acpi-y += acpi_lpss.o acpi_apd.o > acpi-y += acpi_platform.o > acpi-y += acpi_pnp.o > diff --git a/drivers/acpi/acpi_mdio.c b/drivers/acpi/acpi_mdio.c > new file mode 100644 > index ..293bf9a63197 > --- /dev/null > +++ b/drivers/acpi/acpi_mdio.c > @@ -0,0 +1,167 @@ > +// SPDX-License-Identifier: GPL-2.0+ > +// Lots of code in this file is copy from drivers/of/of_mdio.c > +// Copyright (c) 2018 Huaxintong Semiconductor Technology Co., Ltd. I agree with Rafael here. We should try to refactor and combine this code. And where possible, we should try to add a uniform fwnode_ API which underneath either uses OF or ACPI, depending on the type of the node. We already have fwnode_get_mac_address(), fwmode_irq_get(), fwnode_get_phy_mode(), so where possible, PHYs should be no different. Andrew
Re: [RFC PATCH 0/3] acpi: Add acpi mdio support code
> > I'm just trying to ensure whatever is defined is flexible enough that > > we really can later support everything which DT does. We have PHYs on > > MDIO busses, inside switches, which are on MDIO busses, which are > > inside Ethernet interfaces, etc. > > I think it can be satisfied. See the table I'm using above. Hi Dongsheng Since i don't know anything better, i think i have to trust you have this correct. It would be good to document this, so that the next person who needs to add ACPI support for a PHY has some documentation to look at. Could you add something to Documentation/acpi/dsd? Andrew
Re: [PATCHv2 net-next 0/3] sctp: add support for sk_reuseport
From: Xin Long Date: Mon, 12 Nov 2018 18:27:14 +0800 > sctp sk_reuseport allows multiple socks to listen on the same port and > addresses, as long as these socks have the same uid. This works pretty > much as TCP/UDP does, the only difference is that sctp is multi-homing > and all the bind_addrs in these socks will have to completely matched, > otherwise listen() will return err. > > The below is when 5 sockets are listening on 172.16.254.254:6400 on a > server, 26 sockets on a client connect to 172.16.254.254:6400 and each > may be processed by a different socket on the server which is selected > by hash(lport, pport, paddr) in reuseport_select_sock(): > > # ss --sctp -nn ... Series applied, thanks.
[PATCH net] l2tp: fix a sock refcnt leak in l2tp_tunnel_register
This issue happens when trying to add an existent tunnel. It doesn't call sock_put() before returning -EEXIST to release the sock refcnt that was held by calling sock_hold() before the existence check. This patch is to fix it by holding the sock after doing the existence check. Fixes: f6cd651b056f ("l2tp: fix race in duplicate tunnel detection") Reported-by: Jianlin Shi Signed-off-by: Xin Long --- net/l2tp/l2tp_core.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index 82cdf90..26f1d43 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1490,12 +1490,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, goto err_sock; } - sk = sock->sk; - - sock_hold(sk); - tunnel->sock = sk; tunnel->l2tp_net = net; - pn = l2tp_pernet(net); spin_lock_bh(>l2tp_tunnel_list_lock); @@ -1510,6 +1505,10 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, list_add_rcu(>list, >l2tp_tunnel_list); spin_unlock_bh(>l2tp_tunnel_list_lock); + sk = sock->sk; + sock_hold(sk); + tunnel->sock = sk; + if (tunnel->encap == L2TP_ENCAPTYPE_UDP) { struct udp_tunnel_sock_cfg udp_cfg = { .sk_user_data = tunnel, -- 2.1.0
Re: [RFC PATCH 2/6] phy: armada38x: add common phy support
> +static int a38x_comphy_poll(struct a38x_comphy_lane *lane, > + unsigned int offset, u32 mask, u32 value) > +{ > + unsigned int timeout = 10; > + u32 val; > + > + while (1) { > + val = readl_relaxed(lane->base + offset); > + if ((val & mask) == value) > + return 0; > + if (!timeout--) > + break; > + udelay(10); > + } Hi Russell Maybe use one of the readx_poll_timeout() variants? Andrew
Re: [PATCH][net-next] net: remove BUG_ON from __pskb_pull_tail
From: Li RongQing Date: Mon, 12 Nov 2018 17:26:13 +0800 > do { > - BUG_ON(!list); > Please get rid of the empty line afterwards as well. Thank you.
Re: [PATCH net v2 1/1] bnx2x: Assign unique DMAE channel number for FW DMAE transactions.
From: Sudarsana Reddy Kalluru Date: Sun, 11 Nov 2018 18:27:34 -0800 > Driver assigns DMAE channel 0 for FW as part of START_RAMROD command. FW > uses this channel for DMAE operations (e.g., TIME_SYNC implementation). > Driver also uses the same channel 0 for DMAE operations for some of the PFs > (e.g., PF0 on Port0). This could lead to concurrent access to the DMAE > channel by FW and driver which is not legal. Hence need to assign unique > DMAE id for FW. > Currently following DMAE channels are used by the clients, > MFW - OCBB/OCSD functionality uses DMAE channel 14/15 > Driver 0-3 and 8-11 (for PF dmae operations) > 4 and 12 (for stats requests) > Assigning unique dmae_id '13' to the FW. > > Changes from previous version: > -- > v2: Incorporated the review comments. > > Signed-off-by: Sudarsana Reddy Kalluru > Signed-off-by: Michal Kalderon Applied.
Re: [PATCH net-next 3/3] tcp: get rid of tcp_tso_should_defer() dependency on HZ/jiffies
On Sun, Nov 11, 2018 at 11:06 AM, Neal Cardwell wrote: > On Sun, Nov 11, 2018 at 9:41 AM Eric Dumazet wrote: >> >> tcp_tso_should_defer() first heuristic is to not defer >> if last send is "old enough". >> >> Its current implementation uses jiffies and its low granularity. >> >> TSO autodefer performance should not rely on kernel HZ :/ >> >> After EDT conversion, we have state variables in nanoseconds that >> can allow us to properly implement the heuristic. >> >> This patch increases TSO chunk sizes on medium rate flows, >> especially when receivers do not use GRO or similar aggregation. >> >> It also reduces bursts for HZ=100 or HZ=250 kernels, making TCP >> behavior more uniform. >> >> Signed-off-by: Eric Dumazet >> Acked-by: Soheil Hassas Yeganeh >> --- > > Nice. Thanks! > > Acked-by: Neal Cardwell Acked-by: Yuchung Cheng Love it > > neal
Re: [PATCH iproute2] testsuite: ss: Fix spacing in expected output for ssfilter.t
On Sat, 10 Nov 2018 22:48:44 +0100 Phil Sutter wrote: > Hi Stefano, > > On Sat, Nov 10, 2018 at 10:21:59AM +0100, Stefano Brivio wrote: > > Since commit 00240899ec0b ("ss: Actually print left delimiter for > > columns") changes spacing in ss output, we also need to adjust for that in > > the ss filter test. > > > > Fixes: 00240899ec0b ("ss: Actually print left delimiter for columns") > > Signed-off-by: Stefano Brivio Applied, thanks.
[PATCH iproute2] testsuite: colorize test result output
When running testsuite it is easy as a human to miss failure. Add symbol colorizing to SKIPED/PASS/FAIL output. Signed-off-by: Stephen Hemminger --- testsuite/Makefile | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/testsuite/Makefile b/testsuite/Makefile index b3aebec1517b..2dc7f166c709 100644 --- a/testsuite/Makefile +++ b/testsuite/Makefile @@ -85,11 +85,11 @@ endif TC="$$i/tc/tc" IP="$$i/ip/ip" SS=$$i/misc/ss DEV="$(DEV)" IPVER="$@" SNAME="$$i" \ ERRF="$(RESULTS_DIR)/$@.$$o.err" $(PREFIX) tests/$@ > $(RESULTS_DIR)/$@.$$o.out; \ if [ "$$?" = "127" ]; then \ - echo "SKIPPED"; \ + echo -e "\e[1;35mSKIPPED\e[0m"; \ elif [ -e "$(RESULTS_DIR)/$@.$$o.err" ]; then \ - echo "FAILED"; \ + echo -e "\e[0;31mFAILED\e[0m"; \ else \ - echo "PASS"; \ + echo -e "\e[0;32mPASS\e[0m"; \ fi; \ rm "$$TMP_ERR" "$$TMP_OUT"; \ sudo dmesg > $(RESULTS_DIR)/$@.$$o.dmesg; \ -- 2.19.1
[PATCH net-next 3/4] sctp: rename enum sctp_event to sctp_event_type
sctp_event is a structure name defined in RFC for sockopt SCTP_EVENT. To avoid the conflict, rename it. Signed-off-by: Xin Long --- include/net/sctp/constants.h | 2 +- include/net/sctp/sm.h| 4 ++-- net/sctp/primitive.c | 2 +- net/sctp/sm_sideeffect.c | 12 ++-- net/sctp/sm_statetable.c | 2 +- 5 files changed, 11 insertions(+), 11 deletions(-) diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h index 8dadc74..4588bdc 100644 --- a/include/net/sctp/constants.h +++ b/include/net/sctp/constants.h @@ -71,7 +71,7 @@ enum { SCTP_DEFAULT_INSTREAMS = SCTP_MAX_STREAM }; SCTP_NUM_AUTH_CHUNK_TYPES) /* These are the different flavours of event. */ -enum sctp_event { +enum sctp_event_type { SCTP_EVENT_T_CHUNK = 1, SCTP_EVENT_T_TIMEOUT, SCTP_EVENT_T_OTHER, diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h index 9e3d327..24825a8 100644 --- a/include/net/sctp/sm.h +++ b/include/net/sctp/sm.h @@ -173,7 +173,7 @@ sctp_state_fn_t sctp_sf_autoclose_timer_expire; __u8 sctp_get_chunk_type(struct sctp_chunk *chunk); const struct sctp_sm_table_entry *sctp_sm_lookup_event( struct net *net, - enum sctp_event event_type, + enum sctp_event_type event_type, enum sctp_state state, union sctp_subtype event_subtype); int sctp_chunk_iif(const struct sctp_chunk *); @@ -313,7 +313,7 @@ struct sctp_chunk *sctp_process_strreset_resp( /* Prototypes for statetable processing. */ -int sctp_do_sm(struct net *net, enum sctp_event event_type, +int sctp_do_sm(struct net *net, enum sctp_event_type event_type, union sctp_subtype subtype, enum sctp_state state, struct sctp_endpoint *ep, struct sctp_association *asoc, void *event_arg, gfp_t gfp); diff --git a/net/sctp/primitive.c b/net/sctp/primitive.c index c0817f7a..a8c4c33 100644 --- a/net/sctp/primitive.c +++ b/net/sctp/primitive.c @@ -53,7 +53,7 @@ int sctp_primitive_ ## name(struct net *net, struct sctp_association *asoc, \ void *arg) { \ int error = 0; \ - enum sctp_event event_type; union sctp_subtype subtype; \ + enum sctp_event_type event_type; union sctp_subtype subtype; \ enum sctp_state state; \ struct sctp_endpoint *ep; \ \ diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c index 85d3930..1d143bc 100644 --- a/net/sctp/sm_sideeffect.c +++ b/net/sctp/sm_sideeffect.c @@ -52,7 +52,7 @@ #include #include -static int sctp_cmd_interpreter(enum sctp_event event_type, +static int sctp_cmd_interpreter(enum sctp_event_type event_type, union sctp_subtype subtype, enum sctp_state state, struct sctp_endpoint *ep, @@ -61,7 +61,7 @@ static int sctp_cmd_interpreter(enum sctp_event event_type, enum sctp_disposition status, struct sctp_cmd_seq *commands, gfp_t gfp); -static int sctp_side_effects(enum sctp_event event_type, +static int sctp_side_effects(enum sctp_event_type event_type, union sctp_subtype subtype, enum sctp_state state, struct sctp_endpoint *ep, @@ -623,7 +623,7 @@ static void sctp_cmd_init_failed(struct sctp_cmd_seq *commands, /* Worker routine to handle SCTP_CMD_ASSOC_FAILED. */ static void sctp_cmd_assoc_failed(struct sctp_cmd_seq *commands, struct sctp_association *asoc, - enum sctp_event event_type, + enum sctp_event_type event_type, union sctp_subtype subtype, struct sctp_chunk *chunk, unsigned int error) @@ -1162,7 +1162,7 @@ static void sctp_cmd_send_asconf(struct sctp_association *asoc) * If you want to understand all of lksctp, this is a * good place to start. */ -int sctp_do_sm(struct net *net, enum sctp_event event_type, +int sctp_do_sm(struct net *net, enum sctp_event_type event_type, union sctp_subtype subtype, enum sctp_state state, struct sctp_endpoint *ep, struct sctp_association *asoc, void *event_arg, gfp_t gfp) @@ -1199,7 +1199,7 @@ int sctp_do_sm(struct net *net, enum sctp_event event_type, /* * This the master state function side effect processing function. */ -static int sctp_side_effects(enum sctp_event event_type, +static int
[PATCH net-next 4/4] sctp: add sockopt SCTP_EVENT
This patch adds sockopt SCTP_EVENT described in rfc6525#section-6.2. With this sockopt users can subscribe to an event from a specified asoc. Signed-off-by: Xin Long --- include/uapi/linux/sctp.h | 7 net/sctp/socket.c | 89 +++ 2 files changed, 96 insertions(+) diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h index 66afa5b..d584073 100644 --- a/include/uapi/linux/sctp.h +++ b/include/uapi/linux/sctp.h @@ -129,6 +129,7 @@ typedef __s32 sctp_assoc_t; #define SCTP_STREAM_SCHEDULER_VALUE124 #define SCTP_INTERLEAVING_SUPPORTED125 #define SCTP_SENDMSG_CONNECT 126 +#define SCTP_EVENT 127 /* PR-SCTP policies */ #define SCTP_PR_SCTP_NONE 0x @@ -1154,6 +1155,12 @@ struct sctp_add_streams { uint16_t sas_outstrms; }; +struct sctp_event { + sctp_assoc_t se_assoc_id; + uint16_t se_type; + uint8_t se_on; +}; + /* SCTP Stream schedulers */ enum sctp_sched_type { SCTP_SS_FCFS, diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 789008d..1451211 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4288,6 +4288,57 @@ static int sctp_setsockopt_reuse_port(struct sock *sk, char __user *optval, return 0; } +static int sctp_setsockopt_event(struct sock *sk, char __user *optval, +unsigned int optlen) +{ + struct sctp_association *asoc; + struct sctp_ulpevent *event; + struct sctp_event param; + int retval = 0; + + if (optlen < sizeof(param)) { + retval = -EINVAL; + goto out; + } + + optlen = sizeof(param); + if (copy_from_user(, optval, optlen)) { + retval = -EFAULT; + goto out; + } + + if (param.se_type < SCTP_SN_TYPE_BASE || + param.se_type > SCTP_SN_TYPE_MAX) { + retval = -EINVAL; + goto out; + } + + asoc = sctp_id2assoc(sk, param.se_assoc_id); + if (!asoc) { + sctp_ulpevent_type_set(_sk(sk)->subscribe, + param.se_type, param.se_on); + goto out; + } + + sctp_ulpevent_type_set(>subscribe, param.se_type, param.se_on); + + if (param.se_type == SCTP_SENDER_DRY_EVENT && param.se_on) { + if (sctp_outq_is_empty(>outqueue)) { + event = sctp_ulpevent_make_sender_dry_event(asoc, + GFP_USER | __GFP_NOWARN); + if (!event) { + retval = -ENOMEM; + goto out; + } + + asoc->stream.si->enqueue_event(>ulpq, event); + } + } + +out: + return retval; +} + /* API 6.2 setsockopt(), getsockopt() * * Applications use setsockopt() and getsockopt() to set or retrieve @@ -4485,6 +4536,9 @@ static int sctp_setsockopt(struct sock *sk, int level, int optname, case SCTP_REUSE_PORT: retval = sctp_setsockopt_reuse_port(sk, optval, optlen); break; + case SCTP_EVENT: + retval = sctp_setsockopt_event(sk, optval, optlen); + break; default: retval = -ENOPROTOOPT; break; @@ -7430,6 +7484,38 @@ static int sctp_getsockopt_reuse_port(struct sock *sk, int len, return 0; } +static int sctp_getsockopt_event(struct sock *sk, int len, char __user *optval, +int __user *optlen) +{ + struct sctp_association *asoc; + struct sctp_event param; + __u16 subscribe; + + if (len < sizeof(param)) + return -EINVAL; + + len = sizeof(param); + if (copy_from_user(, optval, len)) + return -EFAULT; + + if (param.se_type < SCTP_SN_TYPE_BASE || + param.se_type > SCTP_SN_TYPE_MAX) + return -EINVAL; + + asoc = sctp_id2assoc(sk, param.se_assoc_id); + subscribe = asoc ? asoc->subscribe : sctp_sk(sk)->subscribe; + param.se_on = sctp_ulpevent_type_enabled(subscribe, param.se_type); + + if (put_user(len, optlen)) + return -EFAULT; + + if (copy_to_user(optval, , len)) + return -EFAULT; + + return 0; +} + + static int sctp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { @@ -7628,6 +7714,9 @@ static int sctp_getsockopt(struct sock *sk, int level, int optname, case SCTP_REUSE_PORT: retval = sctp_getsockopt_reuse_port(sk, len, optval, optlen); break; + case SCTP_EVENT: + retval = sctp_getsockopt_event(sk, len, optval, optlen); + break; default: retval = -ENOPROTOOPT; break; -- 2.1.0
[PATCH net-next 0/4] sctp: add subscribe per asoc and sockopt SCTP_EVENT
This patchset mainly adds the Event Subscription sockopt described in rfc6525#section-6.2: Subscribing to events as described in [RFC6458] uses a setsockopt() call with the SCTP_EVENT socket option. This option takes the following structure, which specifies the association, the event type (using the same value found in the event type field), and an on/off boolean. struct sctp_event { sctp_assoc_t se_assoc_id; uint16_t se_type; uint8_t se_on; }; The user fills in the se_type field with the same value found in the strreset_type field, i.e., SCTP_STREAM_RESET_EVENT. The user will also fill in the se_assoc_id field with either the association to set this event on (this field is ignored for one-to-one style sockets) or one of the reserved constant values defined in [RFC6458]. Finally, the se_on field is set with a 1 to enable the event or a 0 to disable the event. As for the old SCTP_EVENTS Option with struct sctp_event_subscribe, it's being DEPRECATED. Xin Long (4): sctp: define subscribe in sctp_sock as __u16 sctp: add subscribe per asoc sctp: rename enum sctp_event to sctp_event_type sctp: add sockopt SCTP_EVENT include/net/sctp/constants.h | 2 +- include/net/sctp/sm.h| 4 +- include/net/sctp/structs.h | 4 +- include/net/sctp/ulpevent.h | 39 -- include/uapi/linux/sctp.h| 13 - net/sctp/associola.c | 2 + net/sctp/chunk.c | 8 ++- net/sctp/primitive.c | 2 +- net/sctp/sm_sideeffect.c | 12 ++--- net/sctp/sm_statetable.c | 2 +- net/sctp/socket.c| 126 --- net/sctp/stream_interleave.c | 12 +++-- net/sctp/ulpqueue.c | 8 +-- 13 files changed, 184 insertions(+), 50 deletions(-) -- 2.1.0
[PATCH net 1/1] s390/ism: clear dmbe_mask bit before SMC IRQ handling
SMC-D stress workload showed connection stalls. Since the firmware decides to skip raising an interrupt if the SBA DMBE mask bit is still set, this SBA DMBE mask bit should be cleared before the IRQ handling in the SMC code runs. Otherwise there are small windows possible with missing interrupts for incoming data. SMC-D currently does not care about the old value of the SBA DMBE mask. Acked-by: Sebastian Ott Signed-off-by: Ursula Braun --- drivers/s390/net/ism_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c index f96ec68af2e5..dcbf5c857743 100644 --- a/drivers/s390/net/ism_drv.c +++ b/drivers/s390/net/ism_drv.c @@ -415,9 +415,9 @@ static irqreturn_t ism_handle_irq(int irq, void *data) break; clear_bit_inv(bit, bv); + ism->sba->dmbe_mask[bit + ISM_DMB_BIT_OFFSET] = 0; barrier(); smcd_handle_irq(ism->smcd, bit + ISM_DMB_BIT_OFFSET); - ism->sba->dmbe_mask[bit + ISM_DMB_BIT_OFFSET] = 0; } if (ism->sba->e) { -- 2.16.4
[PATCH net 5/5] net/smc: use after free fix in smc_wr_tx_put_slot()
From: Ursula Braun In smc_wr_tx_put_slot() field pend->idx is used after being cleared. That means always idx 0 is cleared in the wr_tx_mask. This results in a broken administration of available WR send payload buffers. Signed-off-by: Ursula Braun --- net/smc/smc_wr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c index 3c458d279855..c2694750a6a8 100644 --- a/net/smc/smc_wr.c +++ b/net/smc/smc_wr.c @@ -215,12 +215,14 @@ int smc_wr_tx_put_slot(struct smc_link *link, pend = container_of(wr_pend_priv, struct smc_wr_tx_pend, priv); if (pend->idx < link->wr_tx_cnt) { + u32 idx = pend->idx; + /* clear the full struct smc_wr_tx_pend including .priv */ memset(>wr_tx_pends[pend->idx], 0, sizeof(link->wr_tx_pends[pend->idx])); memset(>wr_tx_bufs[pend->idx], 0, sizeof(link->wr_tx_bufs[pend->idx])); - test_and_clear_bit(pend->idx, link->wr_tx_mask); + test_and_clear_bit(idx, link->wr_tx_mask); return 1; } -- 2.16.4
[PATCH net 3/5] net/smc: add SMC-D shutdown signal
From: Hans Wippel When a SMC-D link group is freed, a shutdown signal should be sent to the peer to indicate that the link group is invalid. This patch adds the shutdown signal to the SMC code. Signed-off-by: Hans Wippel Signed-off-by: Ursula Braun --- net/smc/smc_core.c | 10 -- net/smc/smc_core.h | 3 ++- net/smc/smc_ism.c | 43 --- net/smc/smc_ism.h | 1 + 4 files changed, 43 insertions(+), 14 deletions(-) diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 3c023de58afd..1c9fa7f0261a 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -184,6 +184,8 @@ static void smc_lgr_free_work(struct work_struct *work) if (!lgr->is_smcd && lnk->state != SMC_LNK_INACTIVE) smc_llc_link_inactive(lnk); + if (lgr->is_smcd) + smc_ism_signal_shutdown(lgr); smc_lgr_free(lgr); } } @@ -485,7 +487,7 @@ void smc_port_terminate(struct smc_ib_device *smcibdev, u8 ibport) } /* Called when SMC-D device is terminated or peer is lost */ -void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid) +void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid, unsigned short vlan) { struct smc_link_group *lgr, *l; LIST_HEAD(lgr_free_list); @@ -495,7 +497,7 @@ void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid) list_for_each_entry_safe(lgr, l, _lgr_list.list, list) { if (lgr->is_smcd && lgr->smcd == dev && (!peer_gid || lgr->peer_gid == peer_gid) && - !list_empty(>list)) { + (vlan == VLAN_VID_MASK || lgr->vlan_id == vlan)) { __smc_lgr_terminate(lgr); list_move(>list, _free_list); } @@ -506,6 +508,8 @@ void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid) list_for_each_entry_safe(lgr, l, _free_list, list) { list_del_init(>list); cancel_delayed_work_sync(>free_work); + if (!peer_gid && vlan == VLAN_VID_MASK) /* dev terminated? */ + smc_ism_signal_shutdown(lgr); smc_lgr_free(lgr); } } @@ -1026,6 +1030,8 @@ void smc_core_exit(void) smc_llc_link_inactive(lnk); } cancel_delayed_work_sync(>free_work); + if (lgr->is_smcd) + smc_ism_signal_shutdown(lgr); smc_lgr_free(lgr); /* free link group */ } } diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index 5bc6cbaf0ed5..cf98f4d6093e 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -247,7 +247,8 @@ void smc_lgr_free(struct smc_link_group *lgr); void smc_lgr_forget(struct smc_link_group *lgr); void smc_lgr_terminate(struct smc_link_group *lgr); void smc_port_terminate(struct smc_ib_device *smcibdev, u8 ibport); -void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid); +void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid, + unsigned short vlan); int smc_buf_create(struct smc_sock *smc, bool is_smcd); int smc_uncompress_bufsize(u8 compressed); int smc_rmb_rtoken_handling(struct smc_connection *conn, diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c index e36f21ce7252..2fff79db1a59 100644 --- a/net/smc/smc_ism.c +++ b/net/smc/smc_ism.c @@ -187,22 +187,28 @@ struct smc_ism_event_work { #define ISM_EVENT_REQUEST 0x0001 #define ISM_EVENT_RESPONSE 0x0002 #define ISM_EVENT_REQUEST_IR 0x0001 +#define ISM_EVENT_CODE_SHUTDOWN0x80 #define ISM_EVENT_CODE_TESTLINK0x83 +union smcd_sw_event_info { + u64 info; + struct { + u8 uid[SMC_LGR_ID_SIZE]; + unsigned short vlan_id; + u16 code; + }; +}; + static void smcd_handle_sw_event(struct smc_ism_event_work *wrk) { - union { - u64 info; - struct { - u32 uid; - unsigned short vlanid; - u16 code; - }; - } ev_info; + union smcd_sw_event_info ev_info; + ev_info.info = wrk->event.info; switch (wrk->event.code) { + case ISM_EVENT_CODE_SHUTDOWN: /* Peer shut down DMBs */ + smc_smcd_terminate(wrk->smcd, wrk->event.tok, ev_info.vlan_id); + break; case ISM_EVENT_CODE_TESTLINK: /* Activity timer */ - ev_info.info = wrk->event.info; if (ev_info.code == ISM_EVENT_REQUEST) { ev_info.code = ISM_EVENT_RESPONSE; wrk->smcd->ops->signal_event(wrk->smcd, @@ -215,6 +221,21 @@ static void smcd_handle_sw_event(struct smc_ism_event_work *wrk) } } +int smc_ism_signal_shutdown(struct smc_link_group *lgr) +{ +
[PATCH net 2/5] net/smc: use queue pair number when matching link group
From: Karsten Graul When searching for an existing link group the queue pair number is also to be taken into consideration. When the SMC server sends a new number in a CLC packet (keeping all other values equal) then a new link group is to be created on the SMC client side. Signed-off-by: Karsten Graul Signed-off-by: Ursula Braun --- net/smc/af_smc.c | 9 + net/smc/smc_core.c | 10 ++ net/smc/smc_core.h | 2 +- 3 files changed, 12 insertions(+), 9 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 84f67f601838..5fbaf1901571 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -549,7 +549,8 @@ static int smc_connect_rdma(struct smc_sock *smc, mutex_lock(_create_lgr_pending); local_contact = smc_conn_create(smc, false, aclc->hdr.flag, ibdev, - ibport, >lcl, NULL, 0); + ibport, ntoh24(aclc->qpn), >lcl, + NULL, 0); if (local_contact < 0) { if (local_contact == -ENOMEM) reason_code = SMC_CLC_DECL_MEM;/* insufficient memory*/ @@ -620,7 +621,7 @@ static int smc_connect_ism(struct smc_sock *smc, int rc = 0; mutex_lock(_create_lgr_pending); - local_contact = smc_conn_create(smc, true, aclc->hdr.flag, NULL, 0, + local_contact = smc_conn_create(smc, true, aclc->hdr.flag, NULL, 0, 0, NULL, ismdev, aclc->gid); if (local_contact < 0) return smc_connect_abort(smc, SMC_CLC_DECL_MEM, 0); @@ -1085,7 +1086,7 @@ static int smc_listen_rdma_init(struct smc_sock *new_smc, int *local_contact) { /* allocate connection / link group */ - *local_contact = smc_conn_create(new_smc, false, 0, ibdev, ibport, + *local_contact = smc_conn_create(new_smc, false, 0, ibdev, ibport, 0, >lcl, NULL, 0); if (*local_contact < 0) { if (*local_contact == -ENOMEM) @@ -1109,7 +1110,7 @@ static int smc_listen_ism_init(struct smc_sock *new_smc, struct smc_clc_msg_smcd *pclc_smcd; pclc_smcd = smc_get_clc_msg_smcd(pclc); - *local_contact = smc_conn_create(new_smc, true, 0, NULL, 0, NULL, + *local_contact = smc_conn_create(new_smc, true, 0, NULL, 0, 0, NULL, ismdev, pclc_smcd->gid); if (*local_contact < 0) { if (*local_contact == -ENOMEM) diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 18daebcef181..3c023de58afd 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -559,7 +559,7 @@ int smc_vlan_by_tcpsk(struct socket *clcsock, unsigned short *vlan_id) static bool smcr_lgr_match(struct smc_link_group *lgr, struct smc_clc_msg_local *lcl, - enum smc_lgr_role role) + enum smc_lgr_role role, u32 clcqpn) { return !memcmp(lgr->peer_systemid, lcl->id_for_peer, SMC_SYSTEMID_LEN) && @@ -567,7 +567,9 @@ static bool smcr_lgr_match(struct smc_link_group *lgr, SMC_GID_SIZE) && !memcmp(lgr->lnk[SMC_SINGLE_LINK].peer_mac, lcl->mac, sizeof(lcl->mac)) && - lgr->role == role; + lgr->role == role && + (lgr->role == SMC_SERV || +lgr->lnk[SMC_SINGLE_LINK].peer_qpn == clcqpn); } static bool smcd_lgr_match(struct smc_link_group *lgr, @@ -578,7 +580,7 @@ static bool smcd_lgr_match(struct smc_link_group *lgr, /* create a new SMC connection (and a new link group if necessary) */ int smc_conn_create(struct smc_sock *smc, bool is_smcd, int srv_first_contact, - struct smc_ib_device *smcibdev, u8 ibport, + struct smc_ib_device *smcibdev, u8 ibport, u32 clcqpn, struct smc_clc_msg_local *lcl, struct smcd_dev *smcd, u64 peer_gid) { @@ -603,7 +605,7 @@ int smc_conn_create(struct smc_sock *smc, bool is_smcd, int srv_first_contact, list_for_each_entry(lgr, _lgr_list.list, list) { write_lock_bh(>conns_lock); if ((is_smcd ? smcd_lgr_match(lgr, smcd, peer_gid) : -smcr_lgr_match(lgr, lcl, role)) && +smcr_lgr_match(lgr, lcl, role, clcqpn)) && !lgr->sync_err && lgr->vlan_id == vlan_id && (role == SMC_CLNT || diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index c156674733c9..5bc6cbaf0ed5 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -262,7 +262,7 @@ int smc_vlan_by_tcpsk(struct socket *clcsock, unsigned short *vlan_id); void smc_conn_free(struct smc_connection *conn); int smc_conn_create(struct smc_sock *smc, bool is_smcd, int srv_first_contact, - struct
[PATCH net 0/5] net/smc: fixes 2018-11-12
From: Ursula Braun here are some fixes in different areas of the smc code for the net tree. Thanks, Ursula Hans Wippel (2): net/smc: abort CLC connection in smc_release net/smc: add SMC-D shutdown signal Karsten Graul (1): net/smc: use queue pair number when matching link group Ursula Braun (2): net/smc: atomic SMCD cursor handling net/smc: use after free fix in smc_wr_tx_put_slot() net/smc/af_smc.c | 11 +++ net/smc/smc_cdc.c | 24 +-- net/smc/smc_cdc.h | 56 +- net/smc/smc_core.c | 20 +-- net/smc/smc_core.h | 5 +++-- net/smc/smc_ism.c | 43 ++--- net/smc/smc_ism.h | 1 + net/smc/smc_wr.c | 4 +++- 8 files changed, 117 insertions(+), 47 deletions(-) -- 2.16.4
[PATCH net 1/5] net/smc: abort CLC connection in smc_release
From: Hans Wippel In case of a non-blocking SMC socket, the initial CLC handshake is performed over a blocking TCP connection in a worker. If the SMC socket is released, smc_release has to wait for the blocking CLC socket operations (e.g., kernel_connect) inside the worker. This patch aborts a CLC connection when the respective non-blocking SMC socket is released to avoid waiting on socket operations or timeouts. Signed-off-by: Hans Wippel Signed-off-by: Ursula Braun --- net/smc/af_smc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 80e2119f1c70..84f67f601838 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -127,6 +127,8 @@ static int smc_release(struct socket *sock) smc = smc_sk(sk); /* cleanup for a dangling non-blocking connect */ + if (smc->connect_info && sk->sk_state == SMC_INIT) + tcp_abort(smc->clcsock->sk, ECONNABORTED); flush_work(>connect_work); kfree(smc->connect_info); smc->connect_info = NULL; -- 2.16.4
[PATCH net 4/5] net/smc: atomic SMCD cursor handling
Running uperf tests with SMCD on LPARs results in corrupted cursors. SMCD cursors should be treated atomically to fix cursor corruption. Signed-off-by: Ursula Braun --- net/smc/smc_cdc.c | 24 ++-- net/smc/smc_cdc.h | 56 ++- 2 files changed, 57 insertions(+), 23 deletions(-) diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c index ed5dcf03fe0b..18c047c155fd 100644 --- a/net/smc/smc_cdc.c +++ b/net/smc/smc_cdc.c @@ -177,23 +177,24 @@ void smc_cdc_tx_dismiss_slots(struct smc_connection *conn) int smcd_cdc_msg_send(struct smc_connection *conn) { struct smc_sock *smc = container_of(conn, struct smc_sock, conn); + union smc_host_cursor curs; struct smcd_cdc_msg cdc; int rc, diff; memset(, 0, sizeof(cdc)); cdc.common.type = SMC_CDC_MSG_TYPE; - cdc.prod_wrap = conn->local_tx_ctrl.prod.wrap; - cdc.prod_count = conn->local_tx_ctrl.prod.count; - - cdc.cons_wrap = conn->local_tx_ctrl.cons.wrap; - cdc.cons_count = conn->local_tx_ctrl.cons.count; - cdc.prod_flags = conn->local_tx_ctrl.prod_flags; - cdc.conn_state_flags = conn->local_tx_ctrl.conn_state_flags; + curs.acurs.counter = atomic64_read(>local_tx_ctrl.prod.acurs); + cdc.prod.wrap = curs.wrap; + cdc.prod.count = curs.count; + curs.acurs.counter = atomic64_read(>local_tx_ctrl.cons.acurs); + cdc.cons.wrap = curs.wrap; + cdc.cons.count = curs.count; + cdc.cons.prod_flags = conn->local_tx_ctrl.prod_flags; + cdc.cons.conn_state_flags = conn->local_tx_ctrl.conn_state_flags; rc = smcd_tx_ism_write(conn, , sizeof(cdc), 0, 1); if (rc) return rc; - smc_curs_copy(>rx_curs_confirmed, >local_tx_ctrl.cons, - conn); + smc_curs_copy(>rx_curs_confirmed, , conn); /* Calculate transmitted data and increment free send buffer space */ diff = smc_curs_diff(conn->sndbuf_desc->len, >tx_curs_fin, >tx_curs_sent); @@ -331,13 +332,16 @@ static void smc_cdc_msg_recv(struct smc_sock *smc, struct smc_cdc_msg *cdc) static void smcd_cdc_rx_tsklet(unsigned long data) { struct smc_connection *conn = (struct smc_connection *)data; + struct smcd_cdc_msg *data_cdc; struct smcd_cdc_msg cdc; struct smc_sock *smc; if (!conn) return; - memcpy(, conn->rmb_desc->cpu_addr, sizeof(cdc)); + data_cdc = (struct smcd_cdc_msg *)conn->rmb_desc->cpu_addr; + smcd_curs_copy(, _cdc->prod, conn); + smcd_curs_copy(, _cdc->cons, conn); smc = container_of(conn, struct smc_sock, conn); smc_cdc_msg_recv(smc, (struct smc_cdc_msg *)); } diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h index 934df4473a7c..b9772ab77ddf 100644 --- a/net/smc/smc_cdc.h +++ b/net/smc/smc_cdc.h @@ -50,17 +50,27 @@ struct smc_cdc_msg { u8 reserved[18]; } __packed;/* format defined in RFC7609 */ +/* SMC-D cursor format */ +union smcd_cdc_cursor { + struct { + u16 wrap; + u32 count; + struct smc_cdc_producer_flags prod_flags; + struct smc_cdc_conn_state_flags conn_state_flags; + } __packed; +#ifdef KERNEL_HAS_ATOMIC64 + atomic64_t acurs; /* for atomic processing */ +#else + u64 acurs; /* for atomic processing */ +#endif +} __aligned(8); + /* CDC message for SMC-D */ struct smcd_cdc_msg { struct smc_wr_rx_hdr common;/* Type = 0xFE */ u8 res1[7]; - u16 prod_wrap; - u32 prod_count; - u8 res2[2]; - u16 cons_wrap; - u32 cons_count; - struct smc_cdc_producer_flags prod_flags; - struct smc_cdc_conn_state_flags conn_state_flags; + union smcd_cdc_cursor prod; + union smcd_cdc_cursor cons; u8 res3[8]; } __packed; @@ -135,6 +145,21 @@ static inline void smc_curs_copy_net(union smc_cdc_cursor *tgt, #endif } +static inline void smcd_curs_copy(union smcd_cdc_cursor *tgt, + union smcd_cdc_cursor *src, + struct smc_connection *conn) +{ +#ifndef KERNEL_HAS_ATOMIC64 + unsigned long flags; + + spin_lock_irqsave(>acurs_lock, flags); + tgt->acurs = src->acurs; + spin_unlock_irqrestore(>acurs_lock, flags); +#else + atomic64_set(>acurs, atomic64_read(>acurs)); +#endif +} + /* calculate cursor difference between old and new, where old <= new */ static inline int smc_curs_diff(unsigned int size, union smc_host_cursor *old, @@ -222,12 +247,17 @@ static inline void smcr_cdc_msg_to_host(struct smc_host_cdc_msg *local, static inline void smcd_cdc_msg_to_host(struct smc_host_cdc_msg *local,
[PATCH net-next v2 1/6] net: aquantia: add rx-flow filter definitions
From: Dmitry Bogdanov Add missing register definitions and the functions accessing them related to rx-flow filters. Signed-off-by: Dmitry Bogdanov Signed-off-by: Igor Russkikh --- .../ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c | 109 + .../ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h | 48 .../aquantia/atlantic/hw_atl/hw_atl_llh_internal.h | 135 ++--- 3 files changed, 275 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c index 5502ec5f0f69..939f77e2e117 100644 --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c @@ -898,6 +898,24 @@ void hw_atl_rpf_vlan_id_flr_set(struct aq_hw_s *aq_hw, u32 vlan_id_flr, vlan_id_flr); } +void hw_atl_rpf_vlan_rxq_en_flr_set(struct aq_hw_s *aq_hw, u32 vlan_rxq_en, + u32 filter) +{ + aq_hw_write_reg_bit(aq_hw, HW_ATL_RPF_VL_RXQ_EN_F_ADR(filter), + HW_ATL_RPF_VL_RXQ_EN_F_MSK, + HW_ATL_RPF_VL_RXQ_EN_F_SHIFT, + vlan_rxq_en); +} + +void hw_atl_rpf_vlan_rxq_flr_set(struct aq_hw_s *aq_hw, u32 vlan_rxq, +u32 filter) +{ + aq_hw_write_reg_bit(aq_hw, HW_ATL_RPF_VL_RXQ_F_ADR(filter), + HW_ATL_RPF_VL_RXQ_F_MSK, + HW_ATL_RPF_VL_RXQ_F_SHIFT, + vlan_rxq); +}; + void hw_atl_rpf_etht_flr_en_set(struct aq_hw_s *aq_hw, u32 etht_flr_en, u32 filter) { @@ -965,6 +983,20 @@ void hw_atl_rpf_etht_flr_set(struct aq_hw_s *aq_hw, u32 etht_flr, u32 filter) HW_ATL_RPF_ET_VALF_SHIFT, etht_flr); } +void hw_atl_rpf_l4_spd_set(struct aq_hw_s *aq_hw, u32 val, u32 filter) +{ + aq_hw_write_reg_bit(aq_hw, HW_ATL_RPF_L4_SPD_ADR(filter), + HW_ATL_RPF_L4_SPD_MSK, + HW_ATL_RPF_L4_SPD_SHIFT, val); +} + +void hw_atl_rpf_l4_dpd_set(struct aq_hw_s *aq_hw, u32 val, u32 filter) +{ + aq_hw_write_reg_bit(aq_hw, HW_ATL_RPF_L4_DPD_ADR(filter), + HW_ATL_RPF_L4_DPD_MSK, + HW_ATL_RPF_L4_DPD_SHIFT, val); +} + /* RPO: rx packet offload */ void hw_atl_rpo_ipv4header_crc_offload_en_set(struct aq_hw_s *aq_hw, u32 ipv4header_crc_offload_en) @@ -1476,3 +1508,80 @@ void hw_atl_mcp_up_force_intr_set(struct aq_hw_s *aq_hw, u32 up_force_intr) HW_ATL_MCP_UP_FORCE_INTERRUPT_SHIFT, up_force_intr); } + +void hw_atl_rpfl3l4_ipv4_dest_addr_clear(struct aq_hw_s *aq_hw, u8 location) +{ + aq_hw_write_reg(aq_hw, HW_ATL_RPF_L3_DSTA_ADR(location), 0U); +} + +void hw_atl_rpfl3l4_ipv4_src_addr_clear(struct aq_hw_s *aq_hw, u8 location) +{ + aq_hw_write_reg(aq_hw, HW_ATL_RPF_L3_SRCA_ADR(location), 0U); +} + +void hw_atl_rpfl3l4_cmd_clear(struct aq_hw_s *aq_hw, u8 location) +{ + aq_hw_write_reg(aq_hw, HW_ATL_RPF_L3_REG_CTRL_ADR(location), 0U); +} + +void hw_atl_rpfl3l4_ipv6_dest_addr_clear(struct aq_hw_s *aq_hw, u8 location) +{ + int i; + + for (i = 0; i < 4; ++i) + aq_hw_write_reg(aq_hw, + HW_ATL_RPF_L3_DSTA_ADR(location + i), + 0U); +} + +void hw_atl_rpfl3l4_ipv6_src_addr_clear(struct aq_hw_s *aq_hw, u8 location) +{ + int i; + + for (i = 0; i < 4; ++i) + aq_hw_write_reg(aq_hw, + HW_ATL_RPF_L3_SRCA_ADR(location + i), + 0U); +} + +void hw_atl_rpfl3l4_ipv4_dest_addr_set(struct aq_hw_s *aq_hw, u8 location, + u32 ipv4_dest) +{ + aq_hw_write_reg(aq_hw, HW_ATL_RPF_L3_DSTA_ADR(location), + ipv4_dest); +} + +void hw_atl_rpfl3l4_ipv4_src_addr_set(struct aq_hw_s *aq_hw, u8 location, + u32 ipv4_src) +{ + aq_hw_write_reg(aq_hw, + HW_ATL_RPF_L3_SRCA_ADR(location), + ipv4_src); +} + +void hw_atl_rpfl3l4_cmd_set(struct aq_hw_s *aq_hw, u8 location, u32 cmd) +{ + aq_hw_write_reg(aq_hw, HW_ATL_RPF_L3_REG_CTRL_ADR(location), cmd); +} + +void hw_atl_rpfl3l4_ipv6_src_addr_set(struct aq_hw_s *aq_hw, u8 location, + u32 *ipv6_src) +{ + int i; + + for (i = 0; i < 4; ++i) + aq_hw_write_reg(aq_hw, + HW_ATL_RPF_L3_SRCA_ADR(location + i), + ipv6_src[i]); +} + +void hw_atl_rpfl3l4_ipv6_dest_addr_set(struct aq_hw_s *aq_hw, u8 location, + u32 *ipv6_dest) +{ + int i; + + for (i = 0; i < 4; ++i)
[PATCH net-next v2 2/6] net: aquantia: add infrastructure for ntuple rules
From: Dmitry Bogdanov Add infrastructure to support ntuple filter configuration. Add rule, remove rule, reapply on interface up. Signed-off-by: Dmitry Bogdanov Signed-off-by: Igor Russkikh --- drivers/net/ethernet/aquantia/atlantic/Makefile| 1 + .../net/ethernet/aquantia/atlantic/aq_ethtool.c| 31 ++ .../net/ethernet/aquantia/atlantic/aq_filters.c| 413 + .../net/ethernet/aquantia/atlantic/aq_filters.h| 31 ++ drivers/net/ethernet/aquantia/atlantic/aq_main.c | 15 + drivers/net/ethernet/aquantia/atlantic/aq_nic.h| 6 + .../net/ethernet/aquantia/atlantic/aq_pci_func.c | 2 + .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 3 +- 8 files changed, 501 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/aquantia/atlantic/aq_filters.c create mode 100644 drivers/net/ethernet/aquantia/atlantic/aq_filters.h diff --git a/drivers/net/ethernet/aquantia/atlantic/Makefile b/drivers/net/ethernet/aquantia/atlantic/Makefile index 686f6d8c9e79..4556630ee286 100644 --- a/drivers/net/ethernet/aquantia/atlantic/Makefile +++ b/drivers/net/ethernet/aquantia/atlantic/Makefile @@ -36,6 +36,7 @@ atlantic-objs := aq_main.o \ aq_ring.o \ aq_hw_utils.o \ aq_ethtool.o \ + aq_filters.o \ hw_atl/hw_atl_a0.o \ hw_atl/hw_atl_b0.o \ hw_atl/hw_atl_utils.o \ diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c index 99ef1daaa4d8..a5fd71692c8b 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c @@ -12,6 +12,7 @@ #include "aq_ethtool.h" #include "aq_nic.h" #include "aq_vec.h" +#include "aq_filters.h" static void aq_ethtool_get_regs(struct net_device *ndev, struct ethtool_regs *regs, void *p) @@ -213,7 +214,36 @@ static int aq_ethtool_get_rxnfc(struct net_device *ndev, case ETHTOOL_GRXRINGS: cmd->data = cfg->vecs; break; + case ETHTOOL_GRXCLSRLCNT: + cmd->rule_cnt = aq_get_rxnfc_count_all_rules(aq_nic); + break; + case ETHTOOL_GRXCLSRULE: + err = aq_get_rxnfc_rule(aq_nic, cmd); + break; + case ETHTOOL_GRXCLSRLALL: + err = aq_get_rxnfc_all_rules(aq_nic, cmd, rule_locs); + break; + default: + err = -EOPNOTSUPP; + break; + } + return err; +} + +static int aq_ethtool_set_rxnfc(struct net_device *ndev, + struct ethtool_rxnfc *cmd) +{ + int err = 0; + struct aq_nic_s *aq_nic = netdev_priv(ndev); + + switch (cmd->cmd) { + case ETHTOOL_SRXCLSRLINS: + err = aq_add_rxnfc_rule(aq_nic, cmd); + break; + case ETHTOOL_SRXCLSRLDEL: + err = aq_del_rxnfc_rule(aq_nic, cmd); + break; default: err = -EOPNOTSUPP; break; @@ -520,6 +550,7 @@ const struct ethtool_ops aq_ethtool_ops = { .get_rxfh_key_size = aq_ethtool_get_rss_key_size, .get_rxfh= aq_ethtool_get_rss, .get_rxnfc = aq_ethtool_get_rxnfc, + .set_rxnfc = aq_ethtool_set_rxnfc, .get_sset_count = aq_ethtool_get_sset_count, .get_ethtool_stats = aq_ethtool_stats, .get_link_ksettings = aq_ethtool_get_link_ksettings, diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c new file mode 100644 index ..8cb4eaefd6e1 --- /dev/null +++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c @@ -0,0 +1,413 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Copyright (C) 2014-2017 aQuantia Corporation. */ + +/* File aq_filters.c: RX filters related functions. */ + +#include "aq_filters.h" + +static bool __must_check +aq_rule_is_approve(struct ethtool_rx_flow_spec *fsp) +{ + if (fsp->flow_type & FLOW_MAC_EXT) + return false; + + switch (fsp->flow_type & ~FLOW_EXT) { + case ETHER_FLOW: + case TCP_V4_FLOW: + case UDP_V4_FLOW: + case SCTP_V4_FLOW: + case TCP_V6_FLOW: + case UDP_V6_FLOW: + case SCTP_V6_FLOW: + case IPV4_FLOW: + case IPV6_FLOW: + return true; + case IP_USER_FLOW: + switch (fsp->h_u.usr_ip4_spec.proto) { + case IPPROTO_TCP: + case IPPROTO_UDP: + case IPPROTO_SCTP: + case IPPROTO_IP: + return true; + default: + return false; + } + case IPV6_USER_FLOW: + switch (fsp->h_u.usr_ip6_spec.l4_proto) { + case IPPROTO_TCP: + case IPPROTO_UDP: + case IPPROTO_SCTP: + case IPPROTO_IP: +
[PATCH net-next v2 4/6] net: aquantia: add vlan id to rx flow filters
From: Dmitry Bogdanov The VLAN filter (VLAN id) is compared against 16 filters. VLAN id must be accompanied by mask 0xF000. That is to distinguish VLAN filter from L2 Ethertype filter with UserPriority since both User Priority and VLAN ID are passed in the same 'vlan' parameter. Flow type may be any as it is not matched for VLAN filter. Due to fixed order of the rules in the NIC, the location 0-15 are reserved for vlan filters. Example: To add a rule that directs packets from VLAN 2001 to queue 5: ethtool -N flow-type ip4 vlan 2001 m 0xF000 action 5 loc 0 Signed-off-by: Dmitry Bogdanov Signed-off-by: Igor Russkikh --- drivers/net/ethernet/aquantia/atlantic/aq_common.h | 2 +- .../net/ethernet/aquantia/atlantic/aq_filters.c| 92 +- .../net/ethernet/aquantia/atlantic/aq_filters.h| 2 + drivers/net/ethernet/aquantia/atlantic/aq_hw.h | 9 +++ drivers/net/ethernet/aquantia/atlantic/aq_nic.h| 5 ++ .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 37 + .../aquantia/atlantic/hw_atl/hw_atl_utils.h| 7 ++ 7 files changed, 151 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_common.h b/drivers/net/ethernet/aquantia/atlantic/aq_common.h index becb578211ed..6b6d1724676e 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_common.h +++ b/drivers/net/ethernet/aquantia/atlantic/aq_common.h @@ -14,7 +14,7 @@ #include #include - +#include #include "ver.h" #include "aq_cfg.h" #include "aq_utils.h" diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c index 34e2a28d344a..c5240bc487b4 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c @@ -120,6 +120,29 @@ static int aq_check_approve_fl3l4(struct aq_nic_s *aq_nic, } static int __must_check +aq_check_approve_fvlan(struct aq_nic_s *aq_nic, + struct aq_hw_rx_fltrs_s *rx_fltrs, + struct ethtool_rx_flow_spec *fsp) +{ + if (fsp->location < AQ_RX_FIRST_LOC_FVLANID || + fsp->location > AQ_RX_LAST_LOC_FVLANID) { + netdev_err(aq_nic->ndev, + "ethtool: location must be in range [%d, %d]", + AQ_RX_FIRST_LOC_FVLANID, + AQ_RX_LAST_LOC_FVLANID); + return -EINVAL; + } + + if (fsp->ring_cookie > aq_nic->aq_nic_cfg.num_rss_queues) { + netdev_err(aq_nic->ndev, + "ethtool: queue number must be in range [0, %d]", + aq_nic->aq_nic_cfg.num_rss_queues - 1); + return -EINVAL; + } + return 0; +} + +static int __must_check aq_check_filter(struct aq_nic_s *aq_nic, struct ethtool_rx_flow_spec *fsp) { @@ -127,7 +150,14 @@ aq_check_filter(struct aq_nic_s *aq_nic, struct aq_hw_rx_fltrs_s *rx_fltrs = aq_get_hw_rx_fltrs(aq_nic); if (fsp->flow_type & FLOW_EXT) { - err = -EOPNOTSUPP; + if (be16_to_cpu(fsp->m_ext.vlan_tci) == VLAN_VID_MASK) { + err = aq_check_approve_fvlan(aq_nic, rx_fltrs, fsp); + } else { + netdev_err(aq_nic->ndev, + "ethtool: invalid vlan mask 0x%x specified", + be16_to_cpu(fsp->m_ext.vlan_tci)); + err = -EINVAL; + } } else { switch (fsp->flow_type & ~FLOW_EXT) { case ETHER_FLOW: @@ -229,6 +259,42 @@ aq_check_rule(struct aq_nic_s *aq_nic, return err; } +static int aq_set_data_fvlan(struct aq_nic_s *aq_nic, +struct aq_rx_filter *aq_rx_fltr, +struct aq_rx_filter_vlan *aq_vlans, bool add) +{ + const struct ethtool_rx_flow_spec *fsp = _rx_fltr->aq_fsp; + int location = fsp->location - AQ_RX_FIRST_LOC_FVLANID; + + memset(_vlans[location], 0, sizeof(aq_vlans[location])); + + if (!add) + return 0; + + aq_vlans[location].location = location; + aq_vlans[location].vlan_id = be16_to_cpu(fsp->h_ext.vlan_tci) +& VLAN_VID_MASK; + aq_vlans[location].queue = fsp->ring_cookie & 0x1FU; + aq_vlans[location].enable = 1U; + return 0; +} + +static int aq_add_del_fvlan(struct aq_nic_s *aq_nic, + struct aq_rx_filter *aq_rx_fltr, bool add) +{ + const struct aq_hw_ops *aq_hw_ops = aq_nic->aq_hw_ops; + + if (unlikely(!aq_hw_ops->hw_filter_vlan_set)) + return -EOPNOTSUPP; + + aq_set_data_fvlan(aq_nic, + aq_rx_fltr, + aq_nic->aq_hw_rx_fltrs.fl2.aq_vlans, + add); + + return aq_filters_vlans_update(aq_nic); +} + static int aq_set_data_fl3l4(struct
[PATCH net-next v2 5/6] net: aquantia: add ethertype and PCP to rx flow filters
From: Dmitry Bogdanov L2 EtherType filters allows to filter packet by EtherType field or both EtherType and User Priority (PCP) field of 802.1Q. UserPriority (vlan) parameter must be accompanied by mask 0x1FFF. That is to distinguish VLAN filter from L2 Ethertype filter with UserPriority since both User Priority and VLAN ID are passed in the same 'vlan' parameter. Example: To add a filter that directs IP4 packess of priority 3 to queue 3: ethtool -N flow-type ether proto 0x800 vlan 0x600 m 0x1FFF \ action 3 loc 16 Signed-off-by: Dmitry Bogdanov Signed-off-by: Igor Russkikh --- .../net/ethernet/aquantia/atlantic/aq_filters.c| 83 -- .../net/ethernet/aquantia/atlantic/aq_filters.h| 1 + drivers/net/ethernet/aquantia/atlantic/aq_hw.h | 8 +++ .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 37 ++ .../aquantia/atlantic/hw_atl/hw_atl_utils.h| 8 +++ 5 files changed, 133 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c index c5240bc487b4..b987c6717af6 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c @@ -120,6 +120,30 @@ static int aq_check_approve_fl3l4(struct aq_nic_s *aq_nic, } static int __must_check +aq_check_approve_fl2(struct aq_nic_s *aq_nic, +struct aq_hw_rx_fltrs_s *rx_fltrs, +struct ethtool_rx_flow_spec *fsp) +{ + if (fsp->location < AQ_RX_FIRST_LOC_FETHERT || + fsp->location > AQ_RX_LAST_LOC_FETHERT) { + netdev_err(aq_nic->ndev, + "ethtool: location must be in range [%d, %d]", + AQ_RX_FIRST_LOC_FETHERT, + AQ_RX_LAST_LOC_FETHERT); + return -EINVAL; + } + + if (be16_to_cpu(fsp->m_ext.vlan_tci) == VLAN_PRIO_MASK && + fsp->m_u.ether_spec.h_proto == 0U) { + netdev_err(aq_nic->ndev, + "ethtool: proto (ether_type) parameter must be specfied"); + return -EINVAL; + } + + return 0; +} + +static int __must_check aq_check_approve_fvlan(struct aq_nic_s *aq_nic, struct aq_hw_rx_fltrs_s *rx_fltrs, struct ethtool_rx_flow_spec *fsp) @@ -152,6 +176,8 @@ aq_check_filter(struct aq_nic_s *aq_nic, if (fsp->flow_type & FLOW_EXT) { if (be16_to_cpu(fsp->m_ext.vlan_tci) == VLAN_VID_MASK) { err = aq_check_approve_fvlan(aq_nic, rx_fltrs, fsp); + } else if (be16_to_cpu(fsp->m_ext.vlan_tci) == VLAN_PRIO_MASK) { + err = aq_check_approve_fl2(aq_nic, rx_fltrs, fsp); } else { netdev_err(aq_nic->ndev, "ethtool: invalid vlan mask 0x%x specified", @@ -161,7 +187,7 @@ aq_check_filter(struct aq_nic_s *aq_nic, } else { switch (fsp->flow_type & ~FLOW_EXT) { case ETHER_FLOW: - err = -EOPNOTSUPP; + err = aq_check_approve_fl2(aq_nic, rx_fltrs, fsp); break; case TCP_V4_FLOW: case UDP_V4_FLOW: @@ -210,6 +236,10 @@ aq_rule_is_not_support(struct aq_nic_s *aq_nic, netdev_err(aq_nic->ndev, "ethtool: The specified tos tclass are not supported\n"); rule_is_not_support = true; + } else if (fsp->flow_type & FLOW_MAC_EXT) { + netdev_err(aq_nic->ndev, + "ethtool: MAC_EXT is not supported"); + rule_is_not_support = true; } return rule_is_not_support; @@ -259,6 +289,48 @@ aq_check_rule(struct aq_nic_s *aq_nic, return err; } +static void aq_set_data_fl2(struct aq_nic_s *aq_nic, + struct aq_rx_filter *aq_rx_fltr, + struct aq_rx_filter_l2 *data, bool add) +{ + const struct ethtool_rx_flow_spec *fsp = _rx_fltr->aq_fsp; + + memset(data, 0, sizeof(*data)); + + data->location = fsp->location - AQ_RX_FIRST_LOC_FETHERT; + + if (fsp->ring_cookie != RX_CLS_FLOW_DISC) + data->queue = fsp->ring_cookie; + else + data->queue = -1; + + data->ethertype = be16_to_cpu(fsp->h_u.ether_spec.h_proto); + data->user_priority_en = be16_to_cpu(fsp->m_ext.vlan_tci) +== VLAN_PRIO_MASK; + data->user_priority = (be16_to_cpu(fsp->h_ext.vlan_tci) + & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT; +} + +static int aq_add_del_fether(struct aq_nic_s *aq_nic, +struct aq_rx_filter *aq_rx_fltr, bool add) +{ + struct aq_rx_filter_l2 data; + struct aq_hw_s *aq_hw = aq_nic->aq_hw; + const struct aq_hw_ops
[PATCH net-next v2 6/6] net: aquantia: add support of rx-vlan-filter offload
From: Dmitry Bogdanov Since it uses the same NIC table as rx flow vlan filter therefore rx-flow vlan filter accepts only vlans that present on the interface in case of rx-vlan-filter is on. Signed-off-by: Dmitry Bogdanov Signed-off-by: Igor Russkikh --- .../net/ethernet/aquantia/atlantic/aq_filters.c| 138 + .../net/ethernet/aquantia/atlantic/aq_filters.h| 2 + drivers/net/ethernet/aquantia/atlantic/aq_hw.h | 1 + drivers/net/ethernet/aquantia/atlantic/aq_main.c | 40 +- drivers/net/ethernet/aquantia/atlantic/aq_nic.c| 2 - drivers/net/ethernet/aquantia/atlantic/aq_nic.h| 3 +- .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 29 ++--- 7 files changed, 197 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c index b987c6717af6..f6145e7473b1 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c @@ -157,6 +157,14 @@ aq_check_approve_fvlan(struct aq_nic_s *aq_nic, return -EINVAL; } + if ((aq_nic->ndev->features & NETIF_F_HW_VLAN_CTAG_FILTER) && + (!test_bit(be16_to_cpu(fsp->h_ext.vlan_tci), + aq_nic->active_vlans))) { + netdev_err(aq_nic->ndev, + "ethtool: unknown vlan-id specified"); + return -EINVAL; + } + if (fsp->ring_cookie > aq_nic->aq_nic_cfg.num_rss_queues) { netdev_err(aq_nic->ndev, "ethtool: queue number must be in range [0, %d]", @@ -331,26 +339,108 @@ static int aq_add_del_fether(struct aq_nic_s *aq_nic, return aq_hw_ops->hw_filter_l2_clear(aq_hw, ); } +static bool aq_fvlan_is_busy(struct aq_rx_filter_vlan *aq_vlans, int vlan) +{ + int i; + + for (i = 0; i < AQ_VLAN_MAX_FILTERS; ++i) { + if (aq_vlans[i].enable && + aq_vlans[i].queue != AQ_RX_QUEUE_NOT_ASSIGNED && + aq_vlans[i].vlan_id == vlan) { + return true; + } + } + + return false; +} + +/* Function rebuilds array of vlan filters so that filters with assigned + * queue have a precedence over just vlans on the interface. + */ +static void aq_fvlan_rebuild(struct aq_nic_s *aq_nic, +unsigned long *active_vlans, +struct aq_rx_filter_vlan *aq_vlans) +{ + bool vlan_busy = false; + int vlan = -1; + int i; + + for (i = 0; i < AQ_VLAN_MAX_FILTERS; ++i) { + if (aq_vlans[i].enable && + aq_vlans[i].queue != AQ_RX_QUEUE_NOT_ASSIGNED) + continue; + do { + vlan = find_next_bit(active_vlans, +VLAN_N_VID, +vlan + 1); + if (vlan == VLAN_N_VID) { + aq_vlans[i].enable = 0U; + aq_vlans[i].queue = AQ_RX_QUEUE_NOT_ASSIGNED; + aq_vlans[i].vlan_id = 0; + continue; + } + + vlan_busy = aq_fvlan_is_busy(aq_vlans, vlan); + if (!vlan_busy) { + aq_vlans[i].enable = 1U; + aq_vlans[i].queue = AQ_RX_QUEUE_NOT_ASSIGNED; + aq_vlans[i].vlan_id = vlan; + } + } while (vlan_busy && vlan != VLAN_N_VID); + } +} + static int aq_set_data_fvlan(struct aq_nic_s *aq_nic, struct aq_rx_filter *aq_rx_fltr, struct aq_rx_filter_vlan *aq_vlans, bool add) { const struct ethtool_rx_flow_spec *fsp = _rx_fltr->aq_fsp; int location = fsp->location - AQ_RX_FIRST_LOC_FVLANID; + int i; memset(_vlans[location], 0, sizeof(aq_vlans[location])); if (!add) return 0; + /* remove vlan if it was in table without queue assignment */ + for (i = 0; i < AQ_VLAN_MAX_FILTERS; ++i) { + if (aq_vlans[i].vlan_id == + (be16_to_cpu(fsp->h_ext.vlan_tci) & VLAN_VID_MASK)) { + aq_vlans[i].enable = false; + } + } + aq_vlans[location].location = location; aq_vlans[location].vlan_id = be16_to_cpu(fsp->h_ext.vlan_tci) & VLAN_VID_MASK; aq_vlans[location].queue = fsp->ring_cookie & 0x1FU; aq_vlans[location].enable = 1U; + return 0; } +int aq_del_fvlan_by_vlan(struct aq_nic_s *aq_nic, u16 vlan_id) +{ + struct aq_hw_rx_fltrs_s *rx_fltrs = aq_get_hw_rx_fltrs(aq_nic); + struct aq_rx_filter *rule = NULL; + struct hlist_node
[PATCH net-next v2 3/6] net: aquantia: add support of L3/L4 ntuple filters
From: Dmitry Bogdanov Add support of L3/L4 5-tuple {protocol, src-ip, dst-ip, src-port, dst-port} filters. Mask is not supported. Src-port and dst-port are only compared for TCP/UDP/SCTP packets. Both IPv4 and IPv6 are supported. The supported actions are the drop and the queue assignment. Due to fixed order of the rules in the NIC, the location 32-39 are reserved for L3/L4 5-tuple filters. The locations 32 and 36 are reserved for IPv6 filters. Examples: sudo ethtool -N eth0 flow-type ip6 src-ip 2001:db8:0:f101::2 \ dst-ip 2001:db8:0:f101::5 action -1 loc 36 sudo ethtool -N eth0 flow-type udp4 src-ip 10.0.0.4 \ dst-ip 10.0.0.7 src-port 2000 dst-port 2001 action 2 loc 32 Signed-off-by: Dmitry Bogdanov Signed-off-by: Igor Russkikh --- .../net/ethernet/aquantia/atlantic/aq_filters.c| 168 - drivers/net/ethernet/aquantia/atlantic/aq_hw.h | 11 ++ drivers/net/ethernet/aquantia/atlantic/aq_nic.h| 7 + .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 58 +++ .../aquantia/atlantic/hw_atl/hw_atl_utils.h| 43 ++ 5 files changed, 284 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c index 8cb4eaefd6e1..34e2a28d344a 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c @@ -85,11 +85,46 @@ aq_rule_already_exists(struct aq_nic_s *aq_nic, return false; } +static int aq_check_approve_fl3l4(struct aq_nic_s *aq_nic, + struct aq_hw_rx_fltrs_s *rx_fltrs, + struct ethtool_rx_flow_spec *fsp) +{ + if (fsp->location < AQ_RX_FIRST_LOC_FL3L4 || + fsp->location > AQ_RX_LAST_LOC_FL3L4) { + netdev_err(aq_nic->ndev, + "ethtool: location must be in range [%d, %d]", + AQ_RX_FIRST_LOC_FL3L4, + AQ_RX_LAST_LOC_FL3L4); + return -EINVAL; + } + if (rx_fltrs->fl3l4.is_ipv6 && rx_fltrs->fl3l4.active_ipv4) { + rx_fltrs->fl3l4.is_ipv6 = false; + netdev_err(aq_nic->ndev, + "ethtool: mixing ipv4 and ipv6 is not allowed"); + return -EINVAL; + } else if (!rx_fltrs->fl3l4.is_ipv6 && rx_fltrs->fl3l4.active_ipv6) { + rx_fltrs->fl3l4.is_ipv6 = true; + netdev_err(aq_nic->ndev, + "ethtool: mixing ipv4 and ipv6 is not allowed"); + return -EINVAL; + } else if (rx_fltrs->fl3l4.is_ipv6&& + fsp->location != AQ_RX_FIRST_LOC_FL3L4 + 4 && + fsp->location != AQ_RX_FIRST_LOC_FL3L4) { + netdev_err(aq_nic->ndev, + "ethtool: The specified location for ipv6 must be %d or %d", + AQ_RX_FIRST_LOC_FL3L4, AQ_RX_FIRST_LOC_FL3L4 + 4); + return -EINVAL; + } + + return 0; +} + static int __must_check aq_check_filter(struct aq_nic_s *aq_nic, struct ethtool_rx_flow_spec *fsp) { int err = 0; + struct aq_hw_rx_fltrs_s *rx_fltrs = aq_get_hw_rx_fltrs(aq_nic); if (fsp->flow_type & FLOW_EXT) { err = -EOPNOTSUPP; @@ -103,14 +138,16 @@ aq_check_filter(struct aq_nic_s *aq_nic, case SCTP_V4_FLOW: case IPV4_FLOW: case IP_USER_FLOW: - err = -EOPNOTSUPP; + rx_fltrs->fl3l4.is_ipv6 = false; + err = aq_check_approve_fl3l4(aq_nic, rx_fltrs, fsp); break; case TCP_V6_FLOW: case UDP_V6_FLOW: case SCTP_V6_FLOW: case IPV6_FLOW: case IPV6_USER_FLOW: - err = -EOPNOTSUPP; + rx_fltrs->fl3l4.is_ipv6 = true; + err = aq_check_approve_fl3l4(aq_nic, rx_fltrs, fsp); break; default: netdev_err(aq_nic->ndev, @@ -156,6 +193,11 @@ aq_rule_is_not_correct(struct aq_nic_s *aq_nic, if (!aq_nic) { rule_is_not_correct = true; + } else if (fsp->location > AQ_RX_MAX_RXNFC_LOC) { + netdev_err(aq_nic->ndev, + "ethtool: The specified number %u rule is invalid\n", + fsp->location); + rule_is_not_correct = true; } else if (aq_check_filter(aq_nic, fsp)) { rule_is_not_correct = true; } else if (fsp->ring_cookie != RX_CLS_FLOW_DISC) { @@ -187,6 +229,125 @@ aq_check_rule(struct aq_nic_s *aq_nic, return err; } +static int aq_set_data_fl3l4(struct aq_nic_s *aq_nic, +struct aq_rx_filter *aq_rx_fltr, +struct
[PATCH net-next v2 0/6] net: aquantia: add rx-flow filter support
In this patchset the rx-flow filters functionality and vlan filter offloads are implemented. The rules in NIC hardware have fixed order and priorities. To support this, the locations of filters from ethtool perspective are also fixed: * Locations 0 - 15 for VLAN ID filters * Locations 16 - 31 for L2 EtherType and PCP filters * Locations 32 - 39 for L3/L4 5-tuple filters (locations 32, 36 for IPv6) Dmitry Bogdanov (6): net: aquantia: add rx-flow filter definitions net: aquantia: add infrastructure for ntuple rules net: aquantia: add support of L3/L4 ntuple filters net: aquantia: add vlan id to rx flow filters net: aquantia: add ethertype and PCP to rx flow filters net: aquantia: add support of rx-vlan-filter offload drivers/net/ethernet/aquantia/atlantic/Makefile| 1 + drivers/net/ethernet/aquantia/atlantic/aq_common.h | 2 +- .../net/ethernet/aquantia/atlantic/aq_ethtool.c| 31 + .../net/ethernet/aquantia/atlantic/aq_filters.c| 876 + .../net/ethernet/aquantia/atlantic/aq_filters.h| 36 + drivers/net/ethernet/aquantia/atlantic/aq_hw.h | 29 + drivers/net/ethernet/aquantia/atlantic/aq_main.c | 55 +- drivers/net/ethernet/aquantia/atlantic/aq_nic.c| 2 - drivers/net/ethernet/aquantia/atlantic/aq_nic.h| 21 +- .../net/ethernet/aquantia/atlantic/aq_pci_func.c | 2 + .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 162 +++- .../ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c | 109 +++ .../ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h | 48 ++ .../aquantia/atlantic/hw_atl/hw_atl_llh_internal.h | 135 +++- .../aquantia/atlantic/hw_atl/hw_atl_utils.h| 58 ++ 15 files changed, 1531 insertions(+), 36 deletions(-) create mode 100644 drivers/net/ethernet/aquantia/atlantic/aq_filters.c create mode 100644 drivers/net/ethernet/aquantia/atlantic/aq_filters.h -- 2.7.4
Re: [PATCH net-next 0/6] net: aquantia: add rx-flow filter support
On 12.11.2018 18:02, Igor Russkikh wrote: > In this patchset the rx-flow filters functionality and vlan filter offloads > are implemented. > > The rules in NIC hardware have fixed order and priorities. > To support this, the locations of filters from ethtool perspective are also > fixed: > Hi David, please discard the patch, it seems it gets a conflict with just integrated 'net' tree. Will resend it shortly. Sorry for this.