Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 12:55 -0700, David Miller wrote: > > One idea. We use the macro thing to generate a "netlink.pot" file > and then some userland tree can contain the latest netlink.pot and > the translations. Right. For the record - since we just talked about it - I was thinking of putting it into a special section so that we could easily write a script to generate the pot file from the kernel binary after doing "allyesconfig" or collecting it from modules or so. Thinking of modules though - do we need to adjust the module loading code to load a new section? I'll have to look into that. johannes
Re: [RFC 0/3] netlink: extended error reporting
From: Johannes BergDate: Fri, 07 Apr 2017 21:46:46 +0200 > On Fri, 2017-04-07 at 12:43 -0700, David Miller wrote: >> I suspect that someone will eventually give us a hard time about the >> strings wrt. internationalization. :-) It's solvable at least >> partially in userspace I suppose. > > I tend to think of the strings more of a debugging aid, but that's a > good point. > > Perhaps we should have a macro that has the strings - similar to the > inline function I put into netlink - but if we make that a macro we can > put the strings into a separate section to be able to find them more > easily for later translation, and optionally even omit them entirely > (to satisfy the kernel tinification folks)? One idea. We use the macro thing to generate a "netlink.pot" file and then some userland tree can contain the latest netlink.pot and the translations. I'm suggesting it this way because whilst putting the netlink.pot file in the kernel tree is probably OK, putting all the translation there might not be.
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 21:45 +0200, Pablo Neira Ayuso wrote: > > We only need to store the pointer to the attribute in the error > container structure. We can calculate the offset from nl_err() by > pasing the skbuff as parameter there, right? Ah, that's a good point, we could store the nlattr pointer and calculate the offset later. I'll need to try this, but yeah, good idea! johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 12:43 -0700, David Miller wrote: > > I have no strong opinions about string length. > > In my opinion, I would like to believe that if someone tried to get a > networking patch applied that emitted a rediculously long string then > we would give them feedback about how that is not acceptable. Right? Agree with this, I don't really see much point in adding an extra warning. There's an implicit (multi-KB though) limit as well in the message size ;-) > I suspect that someone will eventually give us a hard time about the > strings wrt. internationalization. :-) It's solvable at least > partially in userspace I suppose. I tend to think of the strings more of a debugging aid, but that's a good point. Perhaps we should have a macro that has the strings - similar to the inline function I put into netlink - but if we make that a macro we can put the strings into a separate section to be able to find them more easily for later translation, and optionally even omit them entirely (to satisfy the kernel tinification folks)? johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 09:29:17PM +0200, Johannes Berg wrote: > On Fri, 2017-04-07 at 21:21 +0200, Pablo Neira Ayuso wrote: > > I think the most flexible way is to pass the container error > > structure to nla_parse() so it sets this for you. This would also > > save tons of "malformed attribute" strings. > > Yes, for sure. The thing is we'll probalby have to pass down the > request skb *and* the error struct so that we can get the offset, and > this seems like the generic thing that we really should try to get the > most information generated. We only need to store the pointer to the attribute in the error container structure. We can calculate the offset from nl_err() by pasing the skbuff as parameter there, right?
Re: [RFC 0/3] netlink: extended error reporting
From: Pablo Neira AyusoDate: Fri, 7 Apr 2017 21:35:50 +0200 > On Fri, Apr 07, 2017 at 12:20:53PM -0700, David Miller wrote: > [...] >> Let's just discuss the UAPI, since people complain we don't talk >> about that enough :-) For those playing at home it is three new >> attributes returned in a netlink ACK when the application asks >> for the extended response: >> >> NLMSGERR_ATTR_MSG string Extended error string >> NLMSGERR_ATTR_OFFS u32 Byte offset to netlink element causing >> error >> NLMSGERR_ATTR_CODE u32 Subsystem specific error code >> NLMSGERR_ATTR_ATTR u16 Netlink attribute triggering error or >> missing > > I think it would be good if we get a definition to cap the maximum > string length to something reasonable? This can be added in a follow > up patch BTW. Thus, we get people coming back to us and request larger > strings with a reason why they need more room for this. > > In general, my main concern with strings is that they can be used in a > more freely way than these u32 offsets and error codes, and > specifically how inconsistent these string will look like across > different netlink subsystems. > > Anyway, as long as this is optional (not every subsystem if forced to > use strings) I'm fine with it :). I have no strong opinions about string length. In my opinion, I would like to believe that if someone tried to get a networking patch applied that emitted a rediculously long string then we would give them feedback about how that is not acceptable. Right? I suspect that someone will eventually give us a hard time about the strings wrt. internationalization. :-) It's solvable at least partially in userspace I suppose.
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 12:20:53PM -0700, David Miller wrote: [...] > Let's just discuss the UAPI, since people complain we don't talk > about that enough :-) For those playing at home it is three new > attributes returned in a netlink ACK when the application asks > for the extended response: > > NLMSGERR_ATTR_MSG string Extended error string > NLMSGERR_ATTR_OFFS u32 Byte offset to netlink element causing > error > NLMSGERR_ATTR_CODE u32 Subsystem specific error code > NLMSGERR_ATTR_ATTR u16 Netlink attribute triggering error or > missing I think it would be good if we get a definition to cap the maximum string length to something reasonable? This can be added in a follow up patch BTW. Thus, we get people coming back to us and request larger strings with a reason why they need more room for this. In general, my main concern with strings is that they can be used in a more freely way than these u32 offsets and error codes, and specifically how inconsistent these string will look like across different netlink subsystems. Anyway, as long as this is optional (not every subsystem if forced to use strings) I'm fine with it :).
Re: [RFC 0/3] netlink: extended error reporting
From: Pablo Neira AyusoDate: Fri, 7 Apr 2017 21:27:14 +0200 > On Fri, Apr 07, 2017 at 12:22:23PM -0700, David Miller wrote: >> From: Johannes Berg >> Date: Fri, 07 Apr 2017 21:09:45 +0200 >> >> > On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote: >> >> On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: >> >> [...] >> >> > Heh. I think I really want to solve - at least partially - >> >> > nla_parse() >> >> > to see that it can be done this way. It'd be nice to even transform >> >> > all >> >> > the callers (I generated half of these patches with spatch anyway) >> >> > to >> >> > have at least that. >> >> >> >> We can just have a modified version of nla_parse that deals with >> >> this. >> > >> > Yes, but we need to figure out a good way to have the offset. >> > >> > We also need to see if we want to *force* having the offset. In some >> > sense that'd be useful, in another it might be very complicated to fill >> > it in at all times, if for example errors come from lower layers like >> > drivers. >> >> It has to be optional, some kinds of errors don't have an exact >> context per-se. >> >> Also another way to look at this is that we're providing a lot of >> new power and expressability. So even if only one aspect of the >> new error reporting is used it's a positive step forward. >> >> So allow offset "0" meaning "unspecified". > > Instead, we can just not send the offset attribute to userspace if > it's not specified. So missing attribute means "unspecified". Agreed, not providing the attribute to indicate this is fine.
Re: [RFC 0/3] netlink: extended error reporting
From: Pablo Neira AyusoDate: Fri, 7 Apr 2017 21:21:34 +0200 > For my usecases in netfilter, the attributes and an specific error > code should be enough to figure out what is wrong. Will not need > strings. > > BTW, we may not have an offset, eg. EINVAL because of missing > attribute. Given we have different requirements, I would leave it to > each subsystem to decide what netlink error attributes are specified. Yep, completely agreed. The use cases for offset and missing attribute I see as follows: 1) Top-level attribute is missing. Here, offset is set to zero and the missing attribute number is given as well. 2) Nested attribute is missing. Offset is set to the location of the beginning of nesting, inside of which the missing attribute was needed.
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 21:27 +0200, Pablo Neira Ayuso wrote: > > > Also another way to look at this is that we're providing a lot of > > new power and expressability. So even if only one aspect of the > > new error reporting is used it's a positive step forward. True. > > So allow offset "0" meaning "unspecified". > > Instead, we can just not send the offset attribute to userspace if > it's not specified. So missing attribute means "unspecified". > > I'm always a bit worried this "0 means something" semantics :) Yeah, I have that. If it's 0 internally the attribute will be omitted. johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 21:21 +0200, Pablo Neira Ayuso wrote: > > For my usecases in netfilter, the attributes and an specific error > code should be enough to figure out what is wrong. Will not need > strings. Fair enough. > BTW, we may not have an offset, eg. EINVAL because of missing > attribute. Given we have different requirements, I would leave it to > each subsystem to decide what netlink error attributes are specified. > > > (It's ultimately always going to be optional since we'll continue > > returning errors without *any* extended error information - likely > > indefinitely - but if we have a wrong attribute, should we always > > have > > an offset? Would be nice, but could be difficult in practice) > > > > > We can probably use struct nla_policy to place the extended error > > > code or the string (as we discussed I don't need string errors, > > > but > > > I'm fine if other people find it useful). > > > > I don't think for the error strings really would be useful for > > nla_parse() or a policy - we can return something generic like > > "malformed attribute" there as a string, and hopefully point to the > > attribute/offset from there generically. I just really want to see > > how > > to actually do that. > > I think the most flexible way is to pass the container error > structure to nla_parse() so it sets this for you. This would also > save tons of "malformed attribute" strings. Yes, for sure. The thing is we'll probalby have to pass down the request skb *and* the error struct so that we can get the offset, and this seems like the generic thing that we really should try to get the most information generated. johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 12:22:23PM -0700, David Miller wrote: > From: Johannes Berg> Date: Fri, 07 Apr 2017 21:09:45 +0200 > > > On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote: > >> On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: > >> [...] > >> > Heh. I think I really want to solve - at least partially - > >> > nla_parse() > >> > to see that it can be done this way. It'd be nice to even transform > >> > all > >> > the callers (I generated half of these patches with spatch anyway) > >> > to > >> > have at least that. > >> > >> We can just have a modified version of nla_parse that deals with > >> this. > > > > Yes, but we need to figure out a good way to have the offset. > > > > We also need to see if we want to *force* having the offset. In some > > sense that'd be useful, in another it might be very complicated to fill > > it in at all times, if for example errors come from lower layers like > > drivers. > > It has to be optional, some kinds of errors don't have an exact > context per-se. > > Also another way to look at this is that we're providing a lot of > new power and expressability. So even if only one aspect of the > new error reporting is used it's a positive step forward. > > So allow offset "0" meaning "unspecified". Instead, we can just not send the offset attribute to userspace if it's not specified. So missing attribute means "unspecified". I'm always a bit worried this "0 means something" semantics :)
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 12:20 -0700, David Miller wrote: > But what it lacks fundamentally is context. Therefore it can't be > used to provide the offset or the bad attribute number. So it can't > meet our requirements. Yes, it doesn't address the requirements here, and in a sense I suspect this will be true anywhere else too - perhaps that's the reason why it was never applied even over in perf. I've not gotten a good answer for that question yet :) > We've passed down new arguments into core method call chains for much > less useful facilitites than this. So that doesn't bother me. And > as that is a kernel internal matter, we can refine it later. Yes, both are true. > Wrt. nla_parse(), we can solve that problem as follow-on patches too. Yes. I just want to be sure we can solve it, but OTOH we're not locking down anything but the UAPI so whatever way we find to solve it, even if it requires a complete rearchitecture internally, it still doesn't matter. > So I consider this change ready as far as the implementation is > concerned. Fair enough. I still do need to test it though :) > Ok, Jiri, start reading. I will try to make you happy here. > > Let's just discuss the UAPI, since people complain we don't talk > about that enough :-) For those playing at home it is three new > attributes returned in a netlink ACK when the application asks > for the extended response: > > NLMSGERR_ATTR_MSG string Extended error string > NLMSGERR_ATTR_OFFS u32 Byte offset to netlink > element causing error > NLMSGERR_ATTR_CODE u32 Subsystem specific error code > NLMSGERR_ATTR_ATTR u16 Netlink attribute triggering > error or missing That's four ;-) Since you're carefully listing it here, you should also say that * the new behaviour is entirely optional, enabled by a new netlink socket option * the ACK message format ends up being [nlmsg header] [ack header including errno] [request message nlmsg hdr] [request message payload - optional, controlled by socket option, aligned to 4 byte length if present] [new TLVs - optional, controlled by socket option] johannes
Re: [RFC 0/3] netlink: extended error reporting
From: Johannes BergDate: Fri, 07 Apr 2017 21:09:45 +0200 > On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote: >> On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: >> [...] >> > Heh. I think I really want to solve - at least partially - >> > nla_parse() >> > to see that it can be done this way. It'd be nice to even transform >> > all >> > the callers (I generated half of these patches with spatch anyway) >> > to >> > have at least that. >> >> We can just have a modified version of nla_parse that deals with >> this. > > Yes, but we need to figure out a good way to have the offset. > > We also need to see if we want to *force* having the offset. In some > sense that'd be useful, in another it might be very complicated to fill > it in at all times, if for example errors come from lower layers like > drivers. It has to be optional, some kinds of errors don't have an exact context per-se. Also another way to look at this is that we're providing a lot of new power and expressability. So even if only one aspect of the new error reporting is used it's a positive step forward. So allow offset "0" meaning "unspecified".
Re: [RFC 0/3] netlink: extended error reporting
From: Johannes BergDate: Fri, 07 Apr 2017 20:59:12 +0200 > Alexander Shishkin's patch > https://patchwork.kernel.org/patch/7162421/ > > was nice in a way because you could get away without passing the error > structure down all the time, but like I said it doesn't deal with > dynamic errors (even the offset/attr from nla_parse) and if it's not > done *everywhere* it risks leaking those exterr numbers (-1024..-4095) > to userspace through syscalls if it's used in a non-netlink path for > some reason (e.g. a single driver call being called through both > netlink and ioctl, and trying to return an extended error) Alexander's patch is cute in that it annotates errors using the special section blobs. But what it lacks fundamentally is context. Therefore it can't be used to provide the offset or the bad attribute number. So it can't meet our requirements. We've passed down new arguments into core method call chains for much less useful facilitites than this. So that doesn't bother me. And as that is a kernel internal matter, we can refine it later. Wrt. nla_parse(), we can solve that problem as follow-on patches too. So I consider this change ready as far as the implementation is concerned. Ok, Jiri, start reading. I will try to make you happy here. Let's just discuss the UAPI, since people complain we don't talk about that enough :-) For those playing at home it is three new attributes returned in a netlink ACK when the application asks for the extended response: NLMSGERR_ATTR_MSG string Extended error string NLMSGERR_ATTR_OFFS u32 Byte offset to netlink element causing error NLMSGERR_ATTR_CODE u32 Subsystem specific error code NLMSGERR_ATTR_ATTR u16 Netlink attribute triggering error or missing So please consider this interface carefully.
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 09:09:45PM +0200, Johannes Berg wrote: > On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote: > > On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: > > [...] > > > Heh. I think I really want to solve - at least partially - > > > nla_parse() > > > to see that it can be done this way. It'd be nice to even transform > > > all > > > the callers (I generated half of these patches with spatch anyway) > > > to > > > have at least that. > > > > We can just have a modified version of nla_parse that deals with > > this. > > Yes, but we need to figure out a good way to have the offset. > > We also need to see if we want to *force* having the offset. In some > sense that'd be useful, in another it might be very complicated to fill > it in at all times, if for example errors come from lower layers like > drivers. For my usecases in netfilter, the attributes and an specific error code should be enough to figure out what is wrong. Will not need strings. BTW, we may not have an offset, eg. EINVAL because of missing attribute. Given we have different requirements, I would leave it to each subsystem to decide what netlink error attributes are specified. > (It's ultimately always going to be optional since we'll continue > returning errors without *any* extended error information - likely > indefinitely - but if we have a wrong attribute, should we always have > an offset? Would be nice, but could be difficult in practice) > > > We can probably use struct nla_policy to place the extended error > > code or the string (as we discussed I don't need string errors, but > > I'm fine if other people find it useful). > > I don't think for the error strings really would be useful for > nla_parse() or a policy - we can return something generic like > "malformed attribute" there as a string, and hopefully point to the > attribute/offset from there generically. I just really want to see how > to actually do that. I think the most flexible way is to pass the container error structure to nla_parse() so it sets this for you. This would also save tons of "malformed attribute" strings.
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 21:06 +0200, Pablo Neira Ayuso wrote: > On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: > [...] > > Heh. I think I really want to solve - at least partially - > > nla_parse() > > to see that it can be done this way. It'd be nice to even transform > > all > > the callers (I generated half of these patches with spatch anyway) > > to > > have at least that. > > We can just have a modified version of nla_parse that deals with > this. Yes, but we need to figure out a good way to have the offset. We also need to see if we want to *force* having the offset. In some sense that'd be useful, in another it might be very complicated to fill it in at all times, if for example errors come from lower layers like drivers. (It's ultimately always going to be optional since we'll continue returning errors without *any* extended error information - likely indefinitely - but if we have a wrong attribute, should we always have an offset? Would be nice, but could be difficult in practice) > We can probably use struct nla_policy to place the extended error > code or the string (as we discussed I don't need string errors, but > I'm fine if other people find it useful). I don't think for the error strings really would be useful for nla_parse() or a policy - we can return something generic like "malformed attribute" there as a string, and hopefully point to the attribute/offset from there generically. I just really want to see how to actually do that. johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 21:02 +0200, Pablo Neira Ayuso wrote: > > > I'm tempted to apply this as-is just to show that person that > > things do in fact happen eventually :-) > > We can just send follow up patches to refine, I think it's a good > start, Johannes? I guess we can. Like I just said though - I do have a plan to make nla_parse() work but the week's been exhausting enough that I don't feel entirely comfortable saying that will definitely work. OTOH, arguably this is something that seems to work - as evidenced by the nl80211 changes - for at least some error reporting, so I guess we can continue to refine it, and the only thing we're really locking in to is the ABI, which is TLVs now, so that should be safe. That said, the nl80211 patch really should be bigger, and I've not even tested this at all yet, so I really need to do that first. Also, I'd like to point to the genl_info change, does that seem like a reasonable way to pass it around in genetlink? I'm much less familiar with the "regular" netlink to say how to pass it around there beyond what I already did. johannes
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 08:59:12PM +0200, Johannes Berg wrote: [...] > Heh. I think I really want to solve - at least partially - nla_parse() > to see that it can be done this way. It'd be nice to even transform all > the callers (I generated half of these patches with spatch anyway) to > have at least that. We can just have a modified version of nla_parse that deals with this. We can probably use struct nla_policy to place the extended error code or the string (as we discussed I don't need string errors, but I'm fine if other people find it useful). Thanks!
Re: [RFC 0/3] netlink: extended error reporting
On Fri, Apr 07, 2017 at 11:53:15AM -0700, David Miller wrote: > From: Johannes Berg> Date: Fri, 7 Apr 2017 20:26:17 +0200 > > > So this is my first draft of what we'd talked about at netconf. > > I'm not super happy with the way we have to pass the extended > > error struct, but I don't see a way to implement reporting any > > dynamic information (like error offsets) in any other way. > > > > Alexander Shishkin had a nice way of reporting static extended > > error data, but that isn't really suitable for reporting the > > offset or even reporting the broken attribute from nla_parse(). > > > > Speaking of nla_parse(), that'll be somewhat complicated to do > > since we'll have to track the offsets of where we're parsing, > > but it might be possible since the nlattrs are just pointers > > into the message, so (optionally?) passing the skb as well can > > allow us to fill the offset information. > > I like it, nice work. > > I know people want dynamically generated strings and stuff, and we can > get there, but I prefer that the first thing we commit is super simple. > > Someone gave me a hard time about the fact that we've been talking > about this idea for years but nothing ever happens. > > I'm tempted to apply this as-is just to show that person that things > do in fact happen eventually :-) We can just send follow up patches to refine, I think it's a good start, Johannes? BTW, for this co-authored effort in designing this: Signed-off-by: Pablo Neira Ayuso Thanks!
Re: [RFC 0/3] netlink: extended error reporting
On Fri, 2017-04-07 at 11:53 -0700, David Miller wrote: > > Alexander Shishkin had a nice way of reporting static extended > > error data, but that isn't really suitable for reporting the > > offset or even reporting the broken attribute from nla_parse(). > > > > Speaking of nla_parse(), that'll be somewhat complicated to do > > since we'll have to track the offsets of where we're parsing, > > but it might be possible since the nlattrs are just pointers > > into the message, so (optionally?) passing the skb as well can > > allow us to fill the offset information. > > I like it, nice work. I don't like it all that much ;-) Alexander Shishkin's patch https://patchwork.kernel.org/patch/7162421/ was nice in a way because you could get away without passing the error structure down all the time, but like I said it doesn't deal with dynamic errors (even the offset/attr from nla_parse) and if it's not done *everywhere* it risks leaking those exterr numbers (-1024..-4095) to userspace through syscalls if it's used in a non-netlink path for some reason (e.g. a single driver call being called through both netlink and ioctl, and trying to return an extended error) > I know people want dynamically generated strings and stuff, and we > can get there, but I prefer that the first thing we commit is super > simple. > > Someone gave me a hard time about the fact that we've been talking > about this idea for years but nothing ever happens. > > I'm tempted to apply this as-is just to show that person that things > do in fact happen eventually :-) Heh. I think I really want to solve - at least partially - nla_parse() to see that it can be done this way. It'd be nice to even transform all the callers (I generated half of these patches with spatch anyway) to have at least that. johannes
Re: [RFC 0/3] netlink: extended error reporting
From: Johannes BergDate: Fri, 7 Apr 2017 20:26:17 +0200 > So this is my first draft of what we'd talked about at netconf. > I'm not super happy with the way we have to pass the extended > error struct, but I don't see a way to implement reporting any > dynamic information (like error offsets) in any other way. > > Alexander Shishkin had a nice way of reporting static extended > error data, but that isn't really suitable for reporting the > offset or even reporting the broken attribute from nla_parse(). > > Speaking of nla_parse(), that'll be somewhat complicated to do > since we'll have to track the offsets of where we're parsing, > but it might be possible since the nlattrs are just pointers > into the message, so (optionally?) passing the skb as well can > allow us to fill the offset information. I like it, nice work. I know people want dynamically generated strings and stuff, and we can get there, but I prefer that the first thing we commit is super simple. Someone gave me a hard time about the fact that we've been talking about this idea for years but nothing ever happens. I'm tempted to apply this as-is just to show that person that things do in fact happen eventually :-)
[RFC 0/3] netlink: extended error reporting
So this is my first draft of what we'd talked about at netconf. I'm not super happy with the way we have to pass the extended error struct, but I don't see a way to implement reporting any dynamic information (like error offsets) in any other way. Alexander Shishkin had a nice way of reporting static extended error data, but that isn't really suitable for reporting the offset or even reporting the broken attribute from nla_parse(). Speaking of nla_parse(), that'll be somewhat complicated to do since we'll have to track the offsets of where we're parsing, but it might be possible since the nlattrs are just pointers into the message, so (optionally?) passing the skb as well can allow us to fill the offset information. johannes