Re: [PATCH v3 1/5] netlink: extended ACK reporting
On Tue, 2017-04-11 at 13:42 -0400, David Miller wrote: > > Then, the library needs to be extended to enable this handling to > > modify the way it needs to handle errors, together with the > > setsockopt(). So I'd tend to agree, but * it was easy to solve this, with the flags I added * the flags reduce the amount of parsing dependencies, for example, it is then no longer necessary to check if error == 0 to know if the message was capped or not (when not requested), since the flag indicates it * having the flag that TLVs were set lets us not store if the setsockopt failed or not - this can be useful in the error report case (but not in the success/cookie case) since it avoids extra state that needs to be passed around * libnl, which seems pretty common, allows just passing a single pointer around as state, which is often already used for something else, meaning a bigger refactoring can be required to pass the extra state So yes, in theory we don't need this, but in practice it does make the userland parsing code quite a bit easier. johannes
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On 4/11/17 1:05 PM, David Miller wrote: > From: David Ahern> Date: Tue, 11 Apr 2017 12:57:59 -0600 > >> On 4/11/17 11:42 AM, David Miller wrote: >>> David, if you have a specific case where it's absolutely impossible >>> to resolve this when the library is converted to support extended >>> ACKs, please mention it. >> >> I don't have a specific library in mind. It is more the disjoint nature >> of a socket option and message parsing, and thinking through the API. > > Would you prefer it being passed via a control message at sendmsg() > time? That would require more changes to userland programs, and also > increase the per-request overhead in the kernel as it parses this CMSG > every single time. Johannes' v4 adds a flag in the response that the parser can use to definitely know the size of the errmsg.msg that was returned. That seems sufficient to me. Did you disagree with that suggestion?
Re: [PATCH v3 1/5] netlink: extended ACK reporting
From: David AhernDate: Tue, 11 Apr 2017 12:57:59 -0600 > On 4/11/17 11:42 AM, David Miller wrote: >> David, if you have a specific case where it's absolutely impossible >> to resolve this when the library is converted to support extended >> ACKs, please mention it. > > I don't have a specific library in mind. It is more the disjoint nature > of a socket option and message parsing, and thinking through the API. Would you prefer it being passed via a control message at sendmsg() time? That would require more changes to userland programs, and also increase the per-request overhead in the kernel as it parses this CMSG every single time. We have a setsockopt for AF_PACKET that changes the entire layout of the packet mmap ring, nobody complains that libraries are hard to implement because of that right?
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On 4/11/17 11:42 AM, David Miller wrote: > David, if you have a specific case where it's absolutely impossible > to resolve this when the library is converted to support extended > ACKs, please mention it. I don't have a specific library in mind. It is more the disjoint nature of a socket option and message parsing, and thinking through the API.
Re: [PATCH v3 1/5] netlink: extended ACK reporting
From: Pablo Neira AyusoDate: Tue, 11 Apr 2017 19:31:43 +0200 > On Tue, Apr 11, 2017 at 08:25:57AM -0600, David Ahern wrote: >> On 4/11/17 1:02 AM, Johannes Berg wrote: >> > On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote: >> >> CAP_ACK means: trim off the payload that the netlink error message >> >> is embedding, just like ICMP error does. >> >> >> >> What is exactly your concern? If the user explicitly requests this >> >> via socket option for this socket, then we're expecting they do the >> >> right handling for what they're asking for. >> > >> > I think David's concern was that when you want to parse the ACK in a >> > library (or application), you may not easily know if the application >> > (or library) requested capping. >> >> exactly. > > Then, the library needs to be extended to enable this handling to > modify the way it needs to handle errors, together with the > setsockopt(). That's my take on this. If there are libraries where there is a disconnect between the thing that controls the sending of the netlink request from the processing of the netlink response, that really is their problem to work out. If they wish to support extended ACK reports, they will need to sort those details out. If there is sharing of a newlink socket between different libraries, each wanting to operate in a different mode, that isn't all that reasonable to me. Often libraries can't even agree on whether they want to use a socket fd in non-blocking vs. blocking mode. David, if you have a specific case where it's absolutely impossible to resolve this when the library is converted to support extended ACKs, please mention it. Thanks.
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On Tue, Apr 11, 2017 at 08:25:57AM -0600, David Ahern wrote: > On 4/11/17 1:02 AM, Johannes Berg wrote: > > On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote: > >> CAP_ACK means: trim off the payload that the netlink error message > >> is embedding, just like ICMP error does. > >> > >> What is exactly your concern? If the user explicitly requests this > >> via socket option for this socket, then we're expecting they do the > >> right handling for what they're asking for. > > > > I think David's concern was that when you want to parse the ACK in a > > library (or application), you may not easily know if the application > > (or library) requested capping. > > exactly. Then, the library needs to be extended to enable this handling to modify the way it needs to handle errors, together with the setsockopt().
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On 4/11/17 1:02 AM, Johannes Berg wrote: > On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote: >> CAP_ACK means: trim off the payload that the netlink error message >> is embedding, just like ICMP error does. >> >> What is exactly your concern? If the user explicitly requests this >> via socket option for this socket, then we're expecting they do the >> right handling for what they're asking for. > > I think David's concern was that when you want to parse the ACK in a > library (or application), you may not easily know if the application > (or library) requested capping. exactly.
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote: > CAP_ACK means: trim off the payload that the netlink error message > is embedding, just like ICMP error does. > > What is exactly your concern? If the user explicitly requests this > via socket option for this socket, then we're expecting they do the > right handling for what they're asking for. I think David's concern was that when you want to parse the ACK in a library (or application), you may not easily know if the application (or library) requested capping. I've addressed this by adding two new flags now, though the CAPPED flag can only be relied on when the TLVS flag is present, but that's the only case where you care anyway. I felt that we had enough space to spend two bits rather than one, in order to not have to rely on any length calculations to see if TLVs are present - as I'd suggested in my email last night. johannes
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On Mon, Apr 10, 2017 at 09:35:27AM -0600, David Ahern wrote: > On 4/10/17 9:30 AM, Johannes Berg wrote: > > On Mon, 2017-04-10 at 09:26 -0600, David Ahern wrote: > >> On 4/8/17 2:24 PM, Johannes Berg wrote: > >>> @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb, > >>> struct nlmsghdr *nlh, int err) > >>> NLMSG_ERROR, payload, 0); > >>> errmsg = nlmsg_data(rep); > >>> errmsg->error = err; > >>> - memcpy(>msg, nlh, payload > sizeof(*errmsg) ? nlh- > nlmsg_len : sizeof(*nlh)); > >>> + memcpy(>msg, nlh, > >>> +!(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len > >>> + : sizeof(*nlh)); > >>> + > >> > >> generically this makes userspace parsing more problematic: the > >> parsing layer may not know if the socket option has been set to > >> precisely know the size of errmsg->msg and how much data needs to be > >> skipped to get to the new attributes. > > > > Yes, I know. I'd hope that userspace can remember that per socket - I > > don't see a good other way to do this. > > > > If we insert the TLVs in front of, or instead of (with a TLV containing > > it), the request message then at least libnl's debugging will need to > > be changed. > > > > As it is, I can assume that libnl will not set the CAP setting, and > > everything works fine even if I don't change libnl, which makes things > > easier. > > > > Do you have any better ideas? > > NETLINK_F_CAP_ACK and NETLINK_F_EXT_ACK should be incompatible -- if one > is set the other can not be set. CAP_ACK means abbreviate the response > and EXT_ACK means give me more data. CAP_ACK means: trim off the payload that the netlink error message is embedding, just like ICMP error does. What is exactly your concern? If the user explicitly requests this via socket option for this socket, then we're expecting they do the right handling for what they're asking for.
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On Mon, 2017-04-10 at 17:40 +0200, Johannes Berg wrote: > > Another thought: if we add a new flag that indicates "message has > been capped", which we introduce in this same patch, then we can > disentangle this more easily, right? > > Adding a new flag for "TLVs present" won't really help, but if you > know the message was capped then you know the TLVs start after the > inner nlmsghdr and you ignore that header's nlmsg_len. Actually, the flag should be set if (and only if) the message was capped *and* TLVs were requested (or present, doesn't matter.) That way it becomes completely backward compatible and stateless: * on kernels that don't have extack you can ignore the setsockopt failure * checking if TLVs are present becomes flag set || nlh->nlmsg_len > sizeof(*nlh) + sizeof(int) + sizeof(*inner_nlh) + inner_nlh->nlmsg_len * TLV start offset is tlv_start_offs = sizeof(*nlh) + sizeof(int) + sizeof(inner_nlh) if (flag set) tlv_start_offs += inner_nlh->nlmsg_len I need to resend anyway so I'll add that tomorrow. johannes
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On Mon, 2017-04-10 at 09:35 -0600, David Ahern wrote: > > Do you have any better ideas? > > NETLINK_F_CAP_ACK and NETLINK_F_EXT_ACK should be incompatible -- if > one is set the other can not be set. CAP_ACK means abbreviate the > response and EXT_ACK means give me more data. So you mean if I want EXT_ACK I cannot also cap the message? I guess that's doable, but again wouldn't it hurt applications that want to use CAP? I assume every application wants to use EXT_ACK eventually, and those that use CAP right now wouldn't be able to. Another thought: if we add a new flag that indicates "message has been capped", which we introduce in this same patch, then we can disentangle this more easily, right? Adding a new flag for "TLVs present" won't really help, but if you know the message was capped then you know the TLVs start after the inner nlmsghdr and you ignore that header's nlmsg_len. johannes
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On 4/10/17 9:30 AM, Johannes Berg wrote: > On Mon, 2017-04-10 at 09:26 -0600, David Ahern wrote: >> On 4/8/17 2:24 PM, Johannes Berg wrote: >>> @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb, >>> struct nlmsghdr *nlh, int err) >>> NLMSG_ERROR, payload, 0); >>> errmsg = nlmsg_data(rep); >>> errmsg->error = err; >>> - memcpy(>msg, nlh, payload > sizeof(*errmsg) ? nlh- nlmsg_len : sizeof(*nlh)); >>> + memcpy(>msg, nlh, >>> + !(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len >>> +: sizeof(*nlh)); >>> + >> >> generically this makes userspace parsing more problematic: the >> parsing layer may not know if the socket option has been set to >> precisely know the size of errmsg->msg and how much data needs to be >> skipped to get to the new attributes. > > Yes, I know. I'd hope that userspace can remember that per socket - I > don't see a good other way to do this. > > If we insert the TLVs in front of, or instead of (with a TLV containing > it), the request message then at least libnl's debugging will need to > be changed. > > As it is, I can assume that libnl will not set the CAP setting, and > everything works fine even if I don't change libnl, which makes things > easier. > > Do you have any better ideas? NETLINK_F_CAP_ACK and NETLINK_F_EXT_ACK should be incompatible -- if one is set the other can not be set. CAP_ACK means abbreviate the response and EXT_ACK means give me more data.
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On Mon, 2017-04-10 at 09:26 -0600, David Ahern wrote: > On 4/8/17 2:24 PM, Johannes Berg wrote: > > @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb, > > struct nlmsghdr *nlh, int err) > > NLMSG_ERROR, payload, 0); > > errmsg = nlmsg_data(rep); > > errmsg->error = err; > > - memcpy(>msg, nlh, payload > sizeof(*errmsg) ? nlh- > > >nlmsg_len : sizeof(*nlh)); > > + memcpy(>msg, nlh, > > + !(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len > > + : sizeof(*nlh)); > > + > > generically this makes userspace parsing more problematic: the > parsing layer may not know if the socket option has been set to > precisely know the size of errmsg->msg and how much data needs to be > skipped to get to the new attributes. Yes, I know. I'd hope that userspace can remember that per socket - I don't see a good other way to do this. If we insert the TLVs in front of, or instead of (with a TLV containing it), the request message then at least libnl's debugging will need to be changed. As it is, I can assume that libnl will not set the CAP setting, and everything works fine even if I don't change libnl, which makes things easier. Do you have any better ideas? johannes
Re: [PATCH v3 1/5] netlink: extended ACK reporting
On 4/8/17 2:24 PM, Johannes Berg wrote: > @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb, struct > nlmsghdr *nlh, int err) > NLMSG_ERROR, payload, 0); > errmsg = nlmsg_data(rep); > errmsg->error = err; > - memcpy(>msg, nlh, payload > sizeof(*errmsg) ? nlh->nlmsg_len : > sizeof(*nlh)); > + memcpy(>msg, nlh, > +!(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len > + : sizeof(*nlh)); > + generically this makes userspace parsing more problematic: the parsing layer may not know if the socket option has been set to precisely know the size of errmsg->msg and how much data needs to be skipped to get to the new attributes.
[PATCH v3 1/5] netlink: extended ACK reporting
From: Johannes BergAdd the base infrastructure and UAPI for netlink extended ACK reporting. All "manual" calls to netlink_ack() pass NULL for now and thus don't get extended ACK reporting. Big thanks goes to Pablo Neira Ayuso for not only bringing up the whole topic at netconf (again) but also coming up with the nlattr passing trick and various other ideads. Signed-off-by: Johannes Berg --- crypto/crypto_user.c | 3 +- drivers/infiniband/core/netlink.c | 5 +-- drivers/scsi/scsi_netlink.c | 2 +- include/linux/netlink.h | 28 ++- include/net/netlink.h | 3 +- include/uapi/linux/netlink.h | 29 kernel/audit.c| 2 +- net/core/rtnetlink.c | 3 +- net/core/sock_diag.c | 3 +- net/decnet/netfilter/dn_rtmsg.c | 2 +- net/hsr/hsr_netlink.c | 4 +-- net/netfilter/ipset/ip_set_core.c | 2 +- net/netfilter/nfnetlink.c | 22 ++-- net/netlink/af_netlink.c | 71 ++- net/netlink/af_netlink.h | 1 + net/netlink/genetlink.c | 3 +- net/xfrm/xfrm_user.c | 3 +- 17 files changed, 152 insertions(+), 34 deletions(-) diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c index a90404a0c5ff..4a44830741c1 100644 --- a/crypto/crypto_user.c +++ b/crypto/crypto_user.c @@ -483,7 +483,8 @@ static const struct crypto_link { [CRYPTO_MSG_DELRNG - CRYPTO_MSG_BASE] = { .doit = crypto_del_rng }, }; -static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) { struct nlattr *attrs[CRYPTOCFGA_MAX+1]; const struct crypto_link *link; diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 10469b0088b5..b784055423c8 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -146,7 +146,8 @@ int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh, } EXPORT_SYMBOL(ibnl_put_attr); -static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) { struct ibnl_client *client; int type = nlh->nlmsg_type; @@ -209,7 +210,7 @@ static void ibnl_rcv_reply_skb(struct sk_buff *skb) if (nlh->nlmsg_flags & NLM_F_REQUEST) return; - ibnl_rcv_msg(skb, nlh); + ibnl_rcv_msg(skb, nlh, NULL); msglen = NLMSG_ALIGN(nlh->nlmsg_len); if (msglen > skb->len) diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c index 109802f776ed..50e624fb8307 100644 --- a/drivers/scsi/scsi_netlink.c +++ b/drivers/scsi/scsi_netlink.c @@ -111,7 +111,7 @@ scsi_nl_rcv_msg(struct sk_buff *skb) next_msg: if ((err) || (nlh->nlmsg_flags & NLM_F_ACK)) - netlink_ack(skb, nlh, err); + netlink_ack(skb, nlh, err, NULL); skb_pull(skb, rlen); } diff --git a/include/linux/netlink.h b/include/linux/netlink.h index da14ab61f363..746359c27396 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -62,11 +62,37 @@ netlink_kernel_create(struct net *net, int unit, struct netlink_kernel_cfg *cfg) return __netlink_kernel_create(net, unit, THIS_MODULE, cfg); } +/** + * struct netlink_ext_ack - netlink extended ACK report struct + * @_msg: message string to report - don't access directly, use + * %NL_SET_ERR_MSG + * @bad_attr: attribute with error + * @missing_attr: number of missing attr (or 0) + */ +struct netlink_ext_ack { + const char *_msg; + const struct nlattr *bad_attr; + u16 missing_attr; +}; + +/* Always use this macro, this allows later putting the + * message into a separate section or such for things + * like translation or listing all possible messages. + * Currently string formatting is not supported (due + * to the lack of an output buffer.) + */ +#define NL_SET_ERR_MSG(extack, msg) do { \ + static const char *_msg = (msg);\ + \ + (extack)->_msg = _msg; \ +} while (0) + extern void netlink_kernel_release(struct sock *sk); extern int __netlink_change_ngroups(struct sock *sk, unsigned int groups); extern int netlink_change_ngroups(struct sock *sk, unsigned int groups); extern void __netlink_clear_multicast_users(struct sock *sk, unsigned int group); -extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err); +extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err, + const struct netlink_ext_ack