Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-11 Thread Johannes Berg
On Tue, 2017-04-11 at 13:42 -0400, David Miller wrote:

> > Then, the library needs to be extended to enable this handling to
> > modify the way it needs to handle errors, together with the
> > setsockopt().

So I'd tend to agree, but

 * it was easy to solve this, with the flags I added
 * the flags reduce the amount of parsing dependencies, for example, it
   is then no longer necessary to check if error == 0 to know if the
   message was capped or not (when not requested), since the flag
   indicates it
 * having the flag that TLVs were set lets us not store if the
   setsockopt failed or not - this can be useful in the error report
   case (but not in the success/cookie case) since it avoids extra
   state that needs to be passed around
 * libnl, which seems pretty common, allows just passing a single
   pointer around as state, which is often already used for something
   else, meaning a bigger refactoring can be required to pass the extra
   state

So yes, in theory we don't need this, but in practice it does make the
userland parsing code quite a bit easier.

johannes


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-11 Thread David Ahern
On 4/11/17 1:05 PM, David Miller wrote:
> From: David Ahern 
> Date: Tue, 11 Apr 2017 12:57:59 -0600
> 
>> On 4/11/17 11:42 AM, David Miller wrote:
>>> David, if you have a specific case where it's absolutely impossible
>>> to resolve this when the library is converted to support extended
>>> ACKs, please mention it.
>>
>> I don't have a specific library in mind. It is more the disjoint nature
>> of a socket option and message parsing, and thinking through the API.
> 
> Would you prefer it being passed via a control message at sendmsg()
> time?  That would require more changes to userland programs, and also
> increase the per-request overhead in the kernel as it parses this CMSG
> every single time.

Johannes' v4 adds a flag in the response that the parser can use to
definitely know the size of the errmsg.msg that was returned. That seems
sufficient to me. Did you disagree with that suggestion?


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-11 Thread David Miller
From: David Ahern 
Date: Tue, 11 Apr 2017 12:57:59 -0600

> On 4/11/17 11:42 AM, David Miller wrote:
>> David, if you have a specific case where it's absolutely impossible
>> to resolve this when the library is converted to support extended
>> ACKs, please mention it.
> 
> I don't have a specific library in mind. It is more the disjoint nature
> of a socket option and message parsing, and thinking through the API.

Would you prefer it being passed via a control message at sendmsg()
time?  That would require more changes to userland programs, and also
increase the per-request overhead in the kernel as it parses this CMSG
every single time.

We have a setsockopt for AF_PACKET that changes the entire layout of
the packet mmap ring, nobody complains that libraries are hard to
implement because of that right?


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-11 Thread David Ahern
On 4/11/17 11:42 AM, David Miller wrote:
> David, if you have a specific case where it's absolutely impossible
> to resolve this when the library is converted to support extended
> ACKs, please mention it.

I don't have a specific library in mind. It is more the disjoint nature
of a socket option and message parsing, and thinking through the API.


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-11 Thread David Miller
From: Pablo Neira Ayuso 
Date: Tue, 11 Apr 2017 19:31:43 +0200

> On Tue, Apr 11, 2017 at 08:25:57AM -0600, David Ahern wrote:
>> On 4/11/17 1:02 AM, Johannes Berg wrote:
>> > On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote:
>> >> CAP_ACK means: trim off the payload that the netlink error message
>> >> is embedding, just like ICMP error does.
>> >>
>> >> What is exactly your concern? If the user explicitly requests this
>> >> via socket option for this socket, then we're expecting they do the
>> >> right handling for what they're asking for.
>> > 
>> > I think David's concern was that when you want to parse the ACK in a
>> > library (or application), you may not easily know if the application
>> > (or library) requested capping.
>> 
>> exactly.
> 
> Then, the library needs to be extended to enable this handling to
> modify the way it needs to handle errors, together with the
> setsockopt().

That's my take on this.

If there are libraries where there is a disconnect between the thing
that controls the sending of the netlink request from the processing
of the netlink response, that really is their problem to work out.

If they wish to support extended ACK reports, they will need to
sort those details out.

If there is sharing of a newlink socket between different libraries,
each wanting to operate in a different mode, that isn't all that
reasonable to me.  Often libraries can't even agree on whether they
want to use a socket fd in non-blocking vs. blocking mode.

David, if you have a specific case where it's absolutely impossible
to resolve this when the library is converted to support extended
ACKs, please mention it.

Thanks.


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-11 Thread Pablo Neira Ayuso
On Tue, Apr 11, 2017 at 08:25:57AM -0600, David Ahern wrote:
> On 4/11/17 1:02 AM, Johannes Berg wrote:
> > On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote:
> >> CAP_ACK means: trim off the payload that the netlink error message
> >> is embedding, just like ICMP error does.
> >>
> >> What is exactly your concern? If the user explicitly requests this
> >> via socket option for this socket, then we're expecting they do the
> >> right handling for what they're asking for.
> > 
> > I think David's concern was that when you want to parse the ACK in a
> > library (or application), you may not easily know if the application
> > (or library) requested capping.
> 
> exactly.

Then, the library needs to be extended to enable this handling to
modify the way it needs to handle errors, together with the
setsockopt().


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-11 Thread David Ahern
On 4/11/17 1:02 AM, Johannes Berg wrote:
> On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote:
>> CAP_ACK means: trim off the payload that the netlink error message
>> is embedding, just like ICMP error does.
>>
>> What is exactly your concern? If the user explicitly requests this
>> via socket option for this socket, then we're expecting they do the
>> right handling for what they're asking for.
> 
> I think David's concern was that when you want to parse the ACK in a
> library (or application), you may not easily know if the application
> (or library) requested capping.

exactly.


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-11 Thread Johannes Berg
On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote:
> CAP_ACK means: trim off the payload that the netlink error message
> is embedding, just like ICMP error does.
> 
> What is exactly your concern? If the user explicitly requests this
> via socket option for this socket, then we're expecting they do the
> right handling for what they're asking for.

I think David's concern was that when you want to parse the ACK in a
library (or application), you may not easily know if the application
(or library) requested capping.

I've addressed this by adding two new flags now, though the CAPPED flag
can only be relied on when the TLVS flag is present, but that's the
only case where you care anyway. I felt that we had enough space to
spend two bits rather than one, in order to not have to rely on any
length calculations to see if TLVs are present - as I'd suggested in my
email last night.

johannes


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-11 Thread Pablo Neira Ayuso
On Mon, Apr 10, 2017 at 09:35:27AM -0600, David Ahern wrote:
> On 4/10/17 9:30 AM, Johannes Berg wrote:
> > On Mon, 2017-04-10 at 09:26 -0600, David Ahern wrote:
> >> On 4/8/17 2:24 PM, Johannes Berg wrote:
> >>> @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb,
> >>> struct nlmsghdr *nlh, int err)
> >>> NLMSG_ERROR, payload, 0);
> >>>   errmsg = nlmsg_data(rep);
> >>>   errmsg->error = err;
> >>> - memcpy(>msg, nlh, payload > sizeof(*errmsg) ? nlh-
>  nlmsg_len : sizeof(*nlh));
> >>> + memcpy(>msg, nlh,
> >>> +!(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len
> >>> +  : sizeof(*nlh));
> >>> +
> >>
> >> generically this makes userspace parsing more problematic: the
> >> parsing layer may not know if the socket option has been set to
> >> precisely know the size of errmsg->msg and how much data needs to be
> >> skipped to get to the new attributes.
> > 
> > Yes, I know. I'd hope that userspace can remember that per socket - I
> > don't see a good other way to do this.
> > 
> > If we insert the TLVs in front of, or instead of (with a TLV containing
> > it), the request message then at least libnl's debugging will need to
> > be changed.
> > 
> > As it is, I can assume that libnl will not set the CAP setting, and
> > everything works fine even if I don't change libnl, which makes things
> > easier.
> > 
> > Do you have any better ideas?
> 
> NETLINK_F_CAP_ACK and NETLINK_F_EXT_ACK should be incompatible -- if one
> is set the other can not be set. CAP_ACK means abbreviate the response
> and EXT_ACK means give me more data.

CAP_ACK means: trim off the payload that the netlink error message
is embedding, just like ICMP error does.

What is exactly your concern? If the user explicitly requests this via
socket option for this socket, then we're expecting they do the right
handling for what they're asking for.


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-10 Thread Johannes Berg
On Mon, 2017-04-10 at 17:40 +0200, Johannes Berg wrote:
> 
> Another thought: if we add a new flag that indicates "message has
> been capped", which we introduce in this same patch, then we can
> disentangle this more easily, right?
> 
> Adding a new flag for "TLVs present" won't really help, but if you
> know the message was capped then you know the TLVs start after the
> inner nlmsghdr and you ignore that header's nlmsg_len.

Actually, the flag should be set if (and only if) the message was
capped *and* TLVs were requested (or present, doesn't matter.)

That way it becomes completely backward compatible and stateless:
 * on kernels that don't have extack you can ignore the setsockopt
   failure
 * checking if TLVs are present becomes
   flag set ||
   nlh->nlmsg_len > sizeof(*nlh) + sizeof(int) +
sizeof(*inner_nlh) + inner_nlh->nlmsg_len
 * TLV start offset is
   tlv_start_offs = sizeof(*nlh) + sizeof(int) + sizeof(inner_nlh)
   if (flag set)
   tlv_start_offs += inner_nlh->nlmsg_len

I need to resend anyway so I'll add that tomorrow.

johannes


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-10 Thread Johannes Berg
On Mon, 2017-04-10 at 09:35 -0600, David Ahern wrote:

> > Do you have any better ideas?
> 
> NETLINK_F_CAP_ACK and NETLINK_F_EXT_ACK should be incompatible -- if
> one is set the other can not be set. CAP_ACK means abbreviate the
> response and EXT_ACK means give me more data.

So you mean if I want EXT_ACK I cannot also cap the message?

I guess that's doable, but again wouldn't it hurt applications that
want to use CAP? I assume every application wants to use EXT_ACK
eventually, and those that use CAP right now wouldn't be able to.

Another thought: if we add a new flag that indicates "message has been
capped", which we introduce in this same patch, then we can disentangle
this more easily, right?

Adding a new flag for "TLVs present" won't really help, but if you know
the message was capped then you know the TLVs start after the inner
nlmsghdr and you ignore that header's nlmsg_len.

johannes


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-10 Thread David Ahern
On 4/10/17 9:30 AM, Johannes Berg wrote:
> On Mon, 2017-04-10 at 09:26 -0600, David Ahern wrote:
>> On 4/8/17 2:24 PM, Johannes Berg wrote:
>>> @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb,
>>> struct nlmsghdr *nlh, int err)
>>>   NLMSG_ERROR, payload, 0);
>>> errmsg = nlmsg_data(rep);
>>> errmsg->error = err;
>>> -   memcpy(>msg, nlh, payload > sizeof(*errmsg) ? nlh-
 nlmsg_len : sizeof(*nlh));
>>> +   memcpy(>msg, nlh,
>>> +  !(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len
>>> +: sizeof(*nlh));
>>> +
>>
>> generically this makes userspace parsing more problematic: the
>> parsing layer may not know if the socket option has been set to
>> precisely know the size of errmsg->msg and how much data needs to be
>> skipped to get to the new attributes.
> 
> Yes, I know. I'd hope that userspace can remember that per socket - I
> don't see a good other way to do this.
> 
> If we insert the TLVs in front of, or instead of (with a TLV containing
> it), the request message then at least libnl's debugging will need to
> be changed.
> 
> As it is, I can assume that libnl will not set the CAP setting, and
> everything works fine even if I don't change libnl, which makes things
> easier.
> 
> Do you have any better ideas?

NETLINK_F_CAP_ACK and NETLINK_F_EXT_ACK should be incompatible -- if one
is set the other can not be set. CAP_ACK means abbreviate the response
and EXT_ACK means give me more data.


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-10 Thread Johannes Berg
On Mon, 2017-04-10 at 09:26 -0600, David Ahern wrote:
> On 4/8/17 2:24 PM, Johannes Berg wrote:
> > @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb,
> > struct nlmsghdr *nlh, int err)
> >       NLMSG_ERROR, payload, 0);
> >     errmsg = nlmsg_data(rep);
> >     errmsg->error = err;
> > -   memcpy(>msg, nlh, payload > sizeof(*errmsg) ? nlh-
> > >nlmsg_len : sizeof(*nlh));
> > +   memcpy(>msg, nlh,
> > +      !(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len
> > +    : sizeof(*nlh));
> > +
> 
> generically this makes userspace parsing more problematic: the
> parsing layer may not know if the socket option has been set to
> precisely know the size of errmsg->msg and how much data needs to be
> skipped to get to the new attributes.

Yes, I know. I'd hope that userspace can remember that per socket - I
don't see a good other way to do this.

If we insert the TLVs in front of, or instead of (with a TLV containing
it), the request message then at least libnl's debugging will need to
be changed.

As it is, I can assume that libnl will not set the CAP setting, and
everything works fine even if I don't change libnl, which makes things
easier.

Do you have any better ideas?

johannes


Re: [PATCH v3 1/5] netlink: extended ACK reporting

2017-04-10 Thread David Ahern
On 4/8/17 2:24 PM, Johannes Berg wrote:
> @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb, struct 
> nlmsghdr *nlh, int err)
> NLMSG_ERROR, payload, 0);
>   errmsg = nlmsg_data(rep);
>   errmsg->error = err;
> - memcpy(>msg, nlh, payload > sizeof(*errmsg) ? nlh->nlmsg_len : 
> sizeof(*nlh));
> + memcpy(>msg, nlh,
> +!(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len
> +  : sizeof(*nlh));
> +

generically this makes userspace parsing more problematic: the parsing
layer may not know if the socket option has been set to precisely know
the size of errmsg->msg and how much data needs to be skipped to get to
the new attributes.


[PATCH v3 1/5] netlink: extended ACK reporting

2017-04-08 Thread Johannes Berg
From: Johannes Berg 

Add the base infrastructure and UAPI for netlink
extended ACK reporting. All "manual" calls to
netlink_ack() pass NULL for now and thus don't
get extended ACK reporting.

Big thanks goes to Pablo Neira Ayuso for not only
bringing up the whole topic at netconf (again) but
also coming up with the nlattr passing trick and
various other ideads.

Signed-off-by: Johannes Berg 
---
 crypto/crypto_user.c  |  3 +-
 drivers/infiniband/core/netlink.c |  5 +--
 drivers/scsi/scsi_netlink.c   |  2 +-
 include/linux/netlink.h   | 28 ++-
 include/net/netlink.h |  3 +-
 include/uapi/linux/netlink.h  | 29 
 kernel/audit.c|  2 +-
 net/core/rtnetlink.c  |  3 +-
 net/core/sock_diag.c  |  3 +-
 net/decnet/netfilter/dn_rtmsg.c   |  2 +-
 net/hsr/hsr_netlink.c |  4 +--
 net/netfilter/ipset/ip_set_core.c |  2 +-
 net/netfilter/nfnetlink.c | 22 ++--
 net/netlink/af_netlink.c  | 71 ++-
 net/netlink/af_netlink.h  |  1 +
 net/netlink/genetlink.c   |  3 +-
 net/xfrm/xfrm_user.c  |  3 +-
 17 files changed, 152 insertions(+), 34 deletions(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index a90404a0c5ff..4a44830741c1 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -483,7 +483,8 @@ static const struct crypto_link {
[CRYPTO_MSG_DELRNG  - CRYPTO_MSG_BASE] = { .doit = crypto_del_rng },
 };
 
-static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
+static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
+  struct netlink_ext_ack *extack)
 {
struct nlattr *attrs[CRYPTOCFGA_MAX+1];
const struct crypto_link *link;
diff --git a/drivers/infiniband/core/netlink.c 
b/drivers/infiniband/core/netlink.c
index 10469b0088b5..b784055423c8 100644
--- a/drivers/infiniband/core/netlink.c
+++ b/drivers/infiniband/core/netlink.c
@@ -146,7 +146,8 @@ int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh,
 }
 EXPORT_SYMBOL(ibnl_put_attr);
 
-static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
+static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
+   struct netlink_ext_ack *extack)
 {
struct ibnl_client *client;
int type = nlh->nlmsg_type;
@@ -209,7 +210,7 @@ static void ibnl_rcv_reply_skb(struct sk_buff *skb)
if (nlh->nlmsg_flags & NLM_F_REQUEST)
return;
 
-   ibnl_rcv_msg(skb, nlh);
+   ibnl_rcv_msg(skb, nlh, NULL);
 
msglen = NLMSG_ALIGN(nlh->nlmsg_len);
if (msglen > skb->len)
diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 109802f776ed..50e624fb8307 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -111,7 +111,7 @@ scsi_nl_rcv_msg(struct sk_buff *skb)
 
 next_msg:
if ((err) || (nlh->nlmsg_flags & NLM_F_ACK))
-   netlink_ack(skb, nlh, err);
+   netlink_ack(skb, nlh, err, NULL);
 
skb_pull(skb, rlen);
}
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index da14ab61f363..746359c27396 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -62,11 +62,37 @@ netlink_kernel_create(struct net *net, int unit, struct 
netlink_kernel_cfg *cfg)
return __netlink_kernel_create(net, unit, THIS_MODULE, cfg);
 }
 
+/**
+ * struct netlink_ext_ack - netlink extended ACK report struct
+ * @_msg: message string to report - don't access directly, use
+ * %NL_SET_ERR_MSG
+ * @bad_attr: attribute with error
+ * @missing_attr: number of missing attr (or 0)
+ */
+struct netlink_ext_ack {
+   const char *_msg;
+   const struct nlattr *bad_attr;
+   u16 missing_attr;
+};
+
+/* Always use this macro, this allows later putting the
+ * message into a separate section or such for things
+ * like translation or listing all possible messages.
+ * Currently string formatting is not supported (due
+ * to the lack of an output buffer.)
+ */
+#define NL_SET_ERR_MSG(extack, msg) do {   \
+   static const char *_msg = (msg);\
+   \
+   (extack)->_msg = _msg;  \
+} while (0)
+
 extern void netlink_kernel_release(struct sock *sk);
 extern int __netlink_change_ngroups(struct sock *sk, unsigned int groups);
 extern int netlink_change_ngroups(struct sock *sk, unsigned int groups);
 extern void __netlink_clear_multicast_users(struct sock *sk, unsigned int 
group);
-extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err);
+extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err,
+   const struct netlink_ext_ack