Re: Broken netlink ABI

2017-11-16 Thread David Ahern
On 11/14/17 1:24 PM, Jon Maloy wrote:
> 
> 
>> -----Original Message-----
>> From: netdev-ow...@vger.kernel.org [mailto:netdev-
>> ow...@vger.kernel.org] On Behalf Of David Ahern
>> Sent: Tuesday, November 14, 2017 15:18
>> To: Jon Maloy <jon.ma...@ericsson.com>; netdev@vger.kernel.org; Jiri
>> Pirko <j...@resnulli.us>
>> Cc: David Miller (da...@davemloft.net) <da...@davemloft.net>
>> Subject: Re: Broken netlink ABI
>>
>> On 11/14/17 1:15 PM, David Ahern wrote:
>>> On 11/14/17 12:19 PM, Jon Maloy wrote:
>>>> When I give the command:
>>>> ~$ tipc node set addr 1.1.2
>>>>
>>>> I get the following response:
>>>>
>>>> error: Numerical result out of range
>>>> Unable to get TIPC nl family id (module loaded?)
>>>> error, message initialisation failed
>>>
>>> tipc is sending a u32 for the family attribute when it should be a u16:
>>>
>>> diff --git a/tipc/msg.c b/tipc/msg.c
>>> index 22c6bb20..dc09d05048f3 100644
>>> --- a/tipc/msg.c
>>> +++ b/tipc/msg.c
>>> @@ -125,7 +125,7 @@ static int get_family(void)
>>> genl->cmd = CTRL_CMD_GETFAMILY;
>>> genl->version = 1;
>>>
>>> -   mnl_attr_put_u32(nlh, CTRL_ATTR_FAMILY_ID, GENL_ID_CTRL);
>>> +   mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, GENL_ID_CTRL);
>>> mnl_attr_put_strz(nlh, CTRL_ATTR_FAMILY_NAME,
>>> TIPC_GENL_V2_NAME);
>>>
>>> if ((err = msg_query(nlh, family_id_cb, &family)))
>>>
>>> With the above change the tipc command runs fine.
> 
> I can fix that, but that doesn't change the fact that binaries that have
> been around and worked flawlessly for years have now all of a sudden stopped
> working.

The command has to be broken on some platforms (big endian?); it is
sending a u32 value which is truncated to u16 by the kernel.
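
A minimal sketch of why the old binaries could only have worked by accident of byte order. It assumes the receiver reads just the first two bytes of the attribute payload as a u16; the helper names are hypothetical and the byte order is simulated explicitly, so the result does not depend on the machine it runs on.

```c
#include <stdint.h>

/* Lay out a 32-bit value byte by byte in the requested byte order. */
static void store_u32(unsigned char out[4], uint32_t value, int big_endian)
{
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		out[i] = (unsigned char)(value >> shift);
	}
}

/* Decode the first two bytes of a buffer as a 16-bit value. */
static uint16_t load_u16(const unsigned char *in, int big_endian)
{
	return big_endian ? (uint16_t)((in[0] << 8) | in[1])
			  : (uint16_t)(in[0] | (in[1] << 8));
}

/*
 * Hypothetical helper: what a reader expecting a u16 attribute sees
 * when the sender wrote the value as a u32 instead.
 */
static uint16_t u32_payload_seen_as_u16(uint32_t value, int big_endian)
{
	unsigned char payload[4];

	store_u32(payload, value, big_endian);
	return load_u16(payload, big_endian);
}
```

With a small value such as 0x10, the little-endian layout puts the low bytes first, so the truncated read still yields 0x10; the big-endian layout puts the zero high bytes first, so the same read yields 0.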

> Whether the user is doing right or wrong, that is for me the very definition
> of a broken ABI, and is unacceptable.
> 
> Either you have to remove the test in your patch, or you can try to identify 
> tipc and devlink in the code and exempt those from your test.
> 

DaveM: opinions? I expected fallout like this. Should I just log a
warning telling users they are running broken commands?


RE: Broken netlink ABI

2017-11-14 Thread Jon Maloy


> -----Original Message-----
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of David Ahern
> Sent: Tuesday, November 14, 2017 15:18
> To: Jon Maloy <jon.ma...@ericsson.com>; netdev@vger.kernel.org; Jiri
> Pirko <j...@resnulli.us>
> Cc: David Miller (da...@davemloft.net) <da...@davemloft.net>
> Subject: Re: Broken netlink ABI
> 
> On 11/14/17 1:15 PM, David Ahern wrote:
> > On 11/14/17 12:19 PM, Jon Maloy wrote:
> >> When I give the command:
> >> ~$ tipc node set addr 1.1.2
> >>
> >> I get the following response:
> >>
> >> error: Numerical result out of range
> >> Unable to get TIPC nl family id (module loaded?)
> >> error, message initialisation failed
> >
> > tipc is sending a u32 for the family attribute when it should be a u16:
> >
> > diff --git a/tipc/msg.c b/tipc/msg.c
> > index 22c6bb20..dc09d05048f3 100644
> > --- a/tipc/msg.c
> > +++ b/tipc/msg.c
> > @@ -125,7 +125,7 @@ static int get_family(void)
> > genl->cmd = CTRL_CMD_GETFAMILY;
> > genl->version = 1;
> >
> > -   mnl_attr_put_u32(nlh, CTRL_ATTR_FAMILY_ID, GENL_ID_CTRL);
> > +   mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, GENL_ID_CTRL);
> > mnl_attr_put_strz(nlh, CTRL_ATTR_FAMILY_NAME,
> > TIPC_GENL_V2_NAME);
> >
> > if ((err = msg_query(nlh, family_id_cb, &family)))
> >
> > With the above change the tipc command runs fine.

I can fix that, but that doesn't change the fact that binaries that have
been around and worked flawlessly for years have now all of a sudden stopped
working.
Whether the user is doing right or wrong, that is for me the very definition of
a broken ABI, and is unacceptable.

Either you have to remove the test in your patch, or you can try to identify 
tipc and devlink in the code and exempt those from your test.

BR
///jon

> >
> 
> devlink is similarly broken:
> 
> diff --git a/devlink/mnlg.c b/devlink/mnlg.c
> index 9e27de275518..b1e1b0ab32f6 100644
> --- a/devlink/mnlg.c
> +++ b/devlink/mnlg.c
> @@ -163,7 +163,7 @@ int mnlg_socket_group_add(struct mnlg_socket *nlg,
> const char *group_name)
> 
> nlh = __mnlg_msg_prepare(nlg, CTRL_CMD_GETFAMILY,
>  NLM_F_REQUEST | NLM_F_ACK, GENL_ID_CTRL, 1);
> -   mnl_attr_put_u32(nlh, CTRL_ATTR_FAMILY_ID, nlg->id);
> +   mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, nlg->id);
> 
> err = mnlg_socket_send(nlg, nlh);
> if (err < 0)


Re: Broken netlink ABI

2017-11-14 Thread David Ahern
On 11/14/17 1:15 PM, David Ahern wrote:
> On 11/14/17 12:19 PM, Jon Maloy wrote:
>> When I give the command:
>> ~$ tipc node set addr 1.1.2
>>
>> I get the following response:
>>
>> error: Numerical result out of range
>> Unable to get TIPC nl family id (module loaded?)
>> error, message initialisation failed
> 
> tipc is sending a u32 for the family attribute when it should be a u16:
> 
> diff --git a/tipc/msg.c b/tipc/msg.c
> index 22c6bb20..dc09d05048f3 100644
> --- a/tipc/msg.c
> +++ b/tipc/msg.c
> @@ -125,7 +125,7 @@ static int get_family(void)
> genl->cmd = CTRL_CMD_GETFAMILY;
> genl->version = 1;
> 
> -   mnl_attr_put_u32(nlh, CTRL_ATTR_FAMILY_ID, GENL_ID_CTRL);
> +   mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, GENL_ID_CTRL);
> mnl_attr_put_strz(nlh, CTRL_ATTR_FAMILY_NAME, TIPC_GENL_V2_NAME);
> 
> if ((err = msg_query(nlh, family_id_cb, &family)))
> 
> With the above change the tipc command runs fine.
> 

devlink is similarly broken:

diff --git a/devlink/mnlg.c b/devlink/mnlg.c
index 9e27de275518..b1e1b0ab32f6 100644
--- a/devlink/mnlg.c
+++ b/devlink/mnlg.c
@@ -163,7 +163,7 @@ int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name)

    nlh = __mnlg_msg_prepare(nlg, CTRL_CMD_GETFAMILY,
                             NLM_F_REQUEST | NLM_F_ACK, GENL_ID_CTRL, 1);
-   mnl_attr_put_u32(nlh, CTRL_ATTR_FAMILY_ID, nlg->id);
+   mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, nlg->id);

    err = mnlg_socket_send(nlg, nlh);
    if (err < 0)


Re: Broken netlink ABI

2017-11-14 Thread David Ahern
On 11/14/17 12:19 PM, Jon Maloy wrote:
> When I give the command:
> ~$ tipc node set addr 1.1.2
> 
> I get the following response:
> 
> error: Numerical result out of range
> Unable to get TIPC nl family id (module loaded?)
> error, message initialisation failed

tipc is sending a u32 for the family attribute when it should be a u16:

diff --git a/tipc/msg.c b/tipc/msg.c
index 22c6bb20..dc09d05048f3 100644
--- a/tipc/msg.c
+++ b/tipc/msg.c
@@ -125,7 +125,7 @@ static int get_family(void)
    genl->cmd = CTRL_CMD_GETFAMILY;
    genl->version = 1;

-   mnl_attr_put_u32(nlh, CTRL_ATTR_FAMILY_ID, GENL_ID_CTRL);
+   mnl_attr_put_u16(nlh, CTRL_ATTR_FAMILY_ID, GENL_ID_CTRL);
    mnl_attr_put_strz(nlh, CTRL_ATTR_FAMILY_NAME, TIPC_GENL_V2_NAME);

    if ((err = msg_query(nlh, family_id_cb, &family)))

With the above change the tipc command runs fine.


Re: Broken netlink ABI

2017-11-14 Thread David Ahern
On 11/14/17 12:19 PM, Jon Maloy wrote:
> commit 28033ae4e0f ("net: netlink: Update attr validation to require exact 
> length for some types") breaks the netlink ABI.

It's not breaking the ABI; it's enforcing expected attributes based on
policy.
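
To illustrate the distinction being argued here, a simplified sketch of the two validation styles follows. It is not the kernel's code: the enum, the helpers, and the assumption that the earlier check enforced only a minimum length are all illustrative. The returned error is -ERANGE, which glibc renders as "Numerical result out of range", matching the message in the report.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types, standing in for policy entries. */
enum attr_type { ATTR_U16, ATTR_U32 };

/* Payload size a policy entry of this type expects. */
static size_t expected_len(enum attr_type type)
{
	return type == ATTR_U16 ? sizeof(uint16_t) : sizeof(uint32_t);
}

/* Assumed older behaviour: accept any payload at least as long as expected. */
static int validate_min_len(enum attr_type type, size_t payload_len)
{
	return payload_len < expected_len(type) ? -ERANGE : 0;
}

/* Stricter behaviour: the payload must match the expected size exactly. */
static int validate_exact_len(enum attr_type type, size_t payload_len)
{
	return payload_len != expected_len(type) ? -ERANGE : 0;
}
```

Under these assumptions, a 4-byte payload sent for a u16 attribute passes the minimum-length check but is rejected by the exact-length check, which is the behaviour change the report describes.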

> 
> When I give the command:
> ~$ tipc node set addr 1.1.2
> 
> I get the following response:
> 
> error: Numerical result out of range
> Unable to get TIPC nl family id (module loaded?)
> error, message initialisation failed
> 

I'll take a look at tipc and get back to you.