[PATCH v2 3/3] net/ipv6/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE

2019-01-15 Thread David Herrmann
The udp-tunnel setup allows binding sockets to a network device. Prefer
the new SO_BINDTOIFINDEX to avoid temporarily resolving the device-name
just to look it up in the ioctl again.

Reviewed-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
v2:
 - Rename to SO_BINDTOIFINDEX from SO_BINDTOIF

 net/ipv6/ip6_udp_tunnel.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index ad1a9ccd4b44..25430c991cea 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -32,18 +32,9 @@ int udp_sock_create6(struct net *net, struct udp_port_cfg 
*cfg,
goto error;
}
if (cfg->bind_ifindex) {
-   struct net_device *dev;
-
-   dev = dev_get_by_index(net, cfg->bind_ifindex);
-   if (!dev) {
-   err = -ENODEV;
-   goto error;
-   }
-
-   err = kernel_setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE,
-   dev->name, strlen(dev->name) + 1);
-   dev_put(dev);
-
+   err = kernel_setsockopt(sock, SOL_SOCKET, SO_BINDTOIFINDEX,
+   (void *)&cfg->bind_ifindex,
+   sizeof(cfg->bind_ifindex));
if (err < 0)
goto error;
}
-- 
2.20.1



[PATCH v2 1/3] net: introduce SO_BINDTOIFINDEX sockopt

2019-01-15 Thread David Herrmann
This introduces a new generic SOL_SOCKET-level socket option called
SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a
network interface index as argument, rather than the network interface
name.

User-space often refers to network-interfaces via their index, but has
to temporarily resolve it to a name for a call into SO_BINDTODEVICE.
This might pose problems when the network-device is renamed
asynchronously by other parts of the system. When this happens, the
SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong
device.

In most cases user-space only ever operates on devices which they
either manage themselves, or otherwise have a guarantee that the device
name will not change (e.g., devices that are UP cannot be renamed).
However, particularly in libraries this guarantee is non-obvious and it
would be nice if that race-condition would simply not exist. It would
make it easier for those libraries to operate even in situations where
the device-name might change under the hood.

A real use-case that we recently hit is trying to start the network
stack early in the initrd but make it survive into the real system.
Existing distributions rename network-interfaces during the transition
from initrd into the real system. This, obviously, cannot affect
devices that are up and running (unless you also consider moving them
between network-namespaces). However, the network manager now has to
make sure its management engine for dormant devices will not run in
parallel to these renames. Particularly, when you offload operations
like DHCP into separate processes, these might setup their sockets
early, and thus have to resolve the device-name possibly running into
this race-condition.

By avoiding a call to resolve the device-name, we no longer depend on
the name and can run network setup of dormant devices in parallel to
the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this
race.

Reviewed-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
v2:
 - Rename to SO_BINDTOIFINDEX from SO_BINDTOIF
 - skip 0x0040 SO-value on sparc, as it is already used

 arch/alpha/include/uapi/asm/socket.h  |  2 ++
 arch/ia64/include/uapi/asm/socket.h   |  2 ++
 arch/mips/include/uapi/asm/socket.h   |  2 ++
 arch/parisc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h   |  2 ++
 arch/sparc/include/uapi/asm/socket.h  |  2 ++
 arch/xtensa/include/uapi/asm/socket.h |  2 ++
 include/uapi/asm-generic/socket.h |  2 ++
 net/core/sock.c   | 46 +--
 9 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 065fb372e355..b1c9b542c021 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -115,4 +115,6 @@
 #define SO_TXTIME  61
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIFINDEX   62
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index c872c4e6bafb..ba0d245f9576 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -117,4 +117,6 @@
 #define SO_TXTIME  61
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIFINDEX   62
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index 71370fb3ceef..73e25e35d803 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -126,4 +126,6 @@
 #define SO_TXTIME  61
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIFINDEX   62
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h 
b/arch/parisc/include/uapi/asm/socket.h
index 061b9cf2a779..52bed5976cbe 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -107,4 +107,6 @@
 #define SO_TXTIME  0x4036
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIFINDEX   0x4037
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h 
b/arch/s390/include/uapi/asm/socket.h
index 39d901476ee5..49c971587087 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -114,4 +114,6 @@
 #define SO_TXTIME  61
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIFINDEX   62
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h 
b/arch/sparc/include/uapi/asm/socket.h
index 7ea35e5601b6..bbdb81594dd4 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -104,6 +104,8 @@
 #define SO_TXTIME  0x003f
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIFINDEX   0x0041
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #d

[PATCH v2 2/3] net/ipv4/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE

2019-01-15 Thread David Herrmann
The udp-tunnel setup allows binding sockets to a network device. Prefer
the new SO_BINDTOIFINDEX to avoid temporarily resolving the device-name
just to look it up in the ioctl again.

Reviewed-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
v2:
 - Rename to SO_BINDTOIFINDEX from SO_BINDTOIF

 net/ipv4/udp_tunnel.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index be8b5b2157d8..e93cc0379201 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -21,18 +21,9 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg 
*cfg,
goto error;
 
if (cfg->bind_ifindex) {
-   struct net_device *dev;
-
-   dev = dev_get_by_index(net, cfg->bind_ifindex);
-   if (!dev) {
-   err = -ENODEV;
-   goto error;
-   }
-
-   err = kernel_setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE,
-   dev->name, strlen(dev->name) + 1);
-   dev_put(dev);
-
+   err = kernel_setsockopt(sock, SOL_SOCKET, SO_BINDTOIFINDEX,
+   (void *)&cfg->bind_ifindex,
+   sizeof(cfg->bind_ifindex));
if (err < 0)
goto error;
}
-- 
2.20.1



Re: [PATCH 1/3] net: introduce SO_BINDTOIF sockopt

2019-01-10 Thread David Herrmann
Hi

On Thu, Jan 10, 2019 at 5:38 PM David Ahern  wrote:
>
> On 1/10/19 7:25 AM, David Herrmann wrote:
> > This introduces a new generic SOL_SOCKET-level socket option called
> > SO_BINDTOIF. It behaves similar to SO_BINDTODEVICE, but takes a network
> > interface index as argument, rather than the network interface name.
>
> SO_BINDTOIF is not very descriptive related to SO_BINDTODEVICE.
>
> SO_BINDTOINDEX, SO_BINDTODEVINDEX or SO_BINDTODEVIDX would be clearer
> about this option versus SO_BINDTODEVICE.

I am open for these suggestions. I don't particularly have any
preference on the names, but I agree that BINDTOIF is not very easy to
read. For v2 I will pick SO_BINDTOIFINDEX (as suggested by Roopa),
unless there are any objections.

Thanks!
David


[PATCH 3/3] net/ipv6/udp_tunnel: prefer SO_BINDTOIF over SO_BINDTODEVICE

2019-01-10 Thread David Herrmann
The udp-tunnel setup allows binding sockets to a network device. Prefer
the new SO_BINDTOIF to avoid temporarily resolving the device-name just
to look it up in the ioctl again.

Signed-off-by: David Herrmann 
---
 net/ipv6/ip6_udp_tunnel.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index ad1a9ccd4b44..f5fd4e05ae81 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -32,18 +32,9 @@ int udp_sock_create6(struct net *net, struct udp_port_cfg 
*cfg,
goto error;
}
if (cfg->bind_ifindex) {
-   struct net_device *dev;
-
-   dev = dev_get_by_index(net, cfg->bind_ifindex);
-   if (!dev) {
-   err = -ENODEV;
-   goto error;
-   }
-
-   err = kernel_setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE,
-   dev->name, strlen(dev->name) + 1);
-   dev_put(dev);
-
+   err = kernel_setsockopt(sock, SOL_SOCKET, SO_BINDTOIF,
+   (void *)&cfg->bind_ifindex,
+   sizeof(cfg->bind_ifindex));
if (err < 0)
goto error;
}
-- 
2.20.1



[PATCH 1/3] net: introduce SO_BINDTOIF sockopt

2019-01-10 Thread David Herrmann
This introduces a new generic SOL_SOCKET-level socket option called
SO_BINDTOIF. It behaves similar to SO_BINDTODEVICE, but takes a network
interface index as argument, rather than the network interface name.

User-space often refers to network-interfaces via their index, but has
to temporarily resolve it to a name for a call into SO_BINDTODEVICE.
This might pose problems when the network-device is renamed
asynchronously by other parts of the system. When this happens, the
SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong
device.

In most cases user-space only ever operates on devices which they
either manage themselves, or otherwise have a guarantee that the device
name will not change (e.g., devices that are UP cannot be renamed).
However, particularly in libraries this guarantee is non-obvious and it
would be nice if that race-condition would simply not exist. It would
make it easier for those libraries to operate even in situations where
the device-name might change under the hood.

A real use-case that we recently hit is trying to start the network
stack early in the initrd but make it survive into the real system.
Existing distributions rename network-interfaces during the transition
from initrd into the real system. This, obviously, cannot affect
devices that are up and running (unless you also consider moving them
between network-namespaces). However, the network manager now has to
make sure its management engine for dormant devices will not run in
parallel to these renames. Particularly, when you offload operations
like DHCP into separate processes, these might setup their sockets
early, and thus have to resolve the device-name possibly running into
this race-condition.

By avoiding a call to resolve the device-name, we no longer depend on
the name and can run network setup of dormant devices in parallel to
the transition off the initrd. The SO_BINDTOIF ioctl plugs this race.

Signed-off-by: David Herrmann 
---
 arch/alpha/include/uapi/asm/socket.h  |  2 ++
 arch/ia64/include/uapi/asm/socket.h   |  2 ++
 arch/mips/include/uapi/asm/socket.h   |  2 ++
 arch/parisc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h   |  2 ++
 arch/sparc/include/uapi/asm/socket.h  |  2 ++
 arch/xtensa/include/uapi/asm/socket.h |  2 ++
 include/uapi/asm-generic/socket.h |  2 ++
 net/core/sock.c   | 46 +--
 9 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 065fb372e355..6e346e51eec7 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -115,4 +115,6 @@
 #define SO_TXTIME  61
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIF62
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index c872c4e6bafb..ece83ba17b9d 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -117,4 +117,6 @@
 #define SO_TXTIME  61
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIF62
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index 71370fb3ceef..27f7f761ace5 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -126,4 +126,6 @@
 #define SO_TXTIME  61
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIF62
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h 
b/arch/parisc/include/uapi/asm/socket.h
index 061b9cf2a779..efd3917f23e1 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -107,4 +107,6 @@
 #define SO_TXTIME  0x4036
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIF0x4037
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h 
b/arch/s390/include/uapi/asm/socket.h
index 39d901476ee5..c8ba542e69e6 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -114,4 +114,6 @@
 #define SO_TXTIME  61
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIF62
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h 
b/arch/sparc/include/uapi/asm/socket.h
index 7ea35e5601b6..50006bde7dc0 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -104,6 +104,8 @@
 #define SO_TXTIME  0x003f
 #define SCM_TXTIME SO_TXTIME
 
+#define SO_BINDTOIF0x0040
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION 0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT   0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h 
b

[PATCH 2/3] net/ipv4/udp_tunnel: prefer SO_BINDTOIF over SO_BINDTODEVICE

2019-01-10 Thread David Herrmann
The udp-tunnel setup allows binding sockets to a network device. Prefer
the new SO_BINDTOIF to avoid temporarily resolving the device-name just
to look it up in the ioctl again.

Signed-off-by: David Herrmann 
---
 net/ipv4/udp_tunnel.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index be8b5b2157d8..5d26c501da3b 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -21,18 +21,9 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg 
*cfg,
goto error;
 
if (cfg->bind_ifindex) {
-   struct net_device *dev;
-
-   dev = dev_get_by_index(net, cfg->bind_ifindex);
-   if (!dev) {
-   err = -ENODEV;
-   goto error;
-   }
-
-   err = kernel_setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE,
-   dev->name, strlen(dev->name) + 1);
-   dev_put(dev);
-
+   err = kernel_setsockopt(sock, SOL_SOCKET, SO_BINDTOIF,
+   (void *)&cfg->bind_ifindex,
+   sizeof(cfg->bind_ifindex));
if (err < 0)
goto error;
}
-- 
2.20.1



[PATCH v2 0/4] Introduce LSM-hook for socketpair(2)

2018-05-04 Thread David Herrmann
Hi

This is v2 of the socketpair(2) LSM hook introduction. Changes
since v1 are:

  - Added ACKs from previous series.

  - Moved the hook into generic socketpair(2) handling. The hook is now
called security_socket_socketpair(), just like the other hooks on
the socket layer.
There is no AF_UNIX specific code, anymore.

  - Added SMACK support.

  - Still *NO* AppArmor support, since upstream AppArmor lacks
SO_PEERSEC support and requires downstream patches (in particular,
the apparmor_unix_stream_connect() function is mostly a stub and
was never synced with Ubuntu downstream modifications).

Old cover letter follows (only trivial changes).

Thanks
David


This series adds a new LSM hook for the socketpair(2) syscall. The idea
is to allow SO_PEERSEC to be called on AF_UNIX sockets created via
socketpair(2), and return the same information as if you emulated
socketpair(2) via a temporary listener socket. Right now SO_PEERSEC
will return the unlabeled credentials for a socketpair, rather than the
actual credentials of the creating process.

A simple call to:

socketpair(AF_UNIX, SOCK_STREAM, 0, out);

can be emulated via a temporary listener socket bound to a unique,
random name in the abstract namespace. By connecting to this listener
socket, accept(2) will return the second part of the pair. If
SO_PEERSEC is queried on these, the correct credentials of the creating
process are returned. A simple comparison between the behavior of
SO_PEERSEC on socketpair(2) and an emulated socketpair is included in
the dbus-broker test-suite [1].

This patch series tries to close this gap and makes both behave the
same. A new LSM-hook is added which allows LSMs to cache the correct
peer information on newly created socket-pairs.

Apart from fixing this behavioral difference, the dbus-broker project
actually needs to query the credentials of socketpairs, and currently
must resort to racy procfs(2) queries to get the LSM credentials of its
controller socket. Several parts of the dbus-broker project allow you
to pass in a socket during execve(2), which will be used by the child
process to accept control-commands from its parent. One natural way to
create this communication channel is to use socketpair(2). However,
right now SO_PEERSEC does not return any useful information, hence, the
child-process would need other means to retrieve this information. By
avoiding socketpair(2) and using the hacky-emulated version, this is not
an issue.

There was a previous discussion on this matter [2] roughly a year ago.
Back then there was the suspicion that proper SO_PEERSEC would confuse
applications. However, we could not find any evidence backing this
suspicion. Furthermore, we now actually see the contrary. Lack of
SO_PEERSEC makes it a hassle to use socketpairs with LSM credentials.
Hence, we propose to implement full SO_PEERSEC for socketpairs.

Thanks
David

[1] https://github.com/bus1/dbus-broker/blob/master/src/util/test-peersec.c
[2] https://www.spinics.net/lists/selinux/msg22674.html

David Herrmann (3):
  security: add hook for socketpair()
  net: hook socketpair() into LSM
  selinux: provide socketpair callback

Tom Gundersen (1):
  smack: provide socketpair callback

 include/linux/lsm_hooks.h  |  7 +++
 include/linux/security.h   |  7 +++
 net/socket.c   |  7 +++
 security/security.c|  6 ++
 security/selinux/hooks.c   | 13 +
 security/smack/smack_lsm.c | 22 ++
 6 files changed, 62 insertions(+)

-- 
2.17.0



[PATCH v2 3/4] selinux: provide socketpair callback

2018-05-04 Thread David Herrmann
Make sure to implement the new socketpair callback so the SO_PEERSEC
call on socketpair(2)s will return correct information.

Acked-by: Serge Hallyn 
Acked-by: Stephen Smalley 
Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 security/selinux/hooks.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 4cafe6a19167..02ebd1585eaf 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4569,6 +4569,18 @@ static int selinux_socket_post_create(struct socket 
*sock, int family,
return err;
 }
 
+static int selinux_socket_socketpair(struct socket *socka,
+struct socket *sockb)
+{
+   struct sk_security_struct *sksec_a = socka->sk->sk_security;
+   struct sk_security_struct *sksec_b = sockb->sk->sk_security;
+
+   sksec_a->peer_sid = sksec_b->sid;
+   sksec_b->peer_sid = sksec_a->sid;
+
+   return 0;
+}
+
 /* Range of port numbers used to automatically bind.
Need to determine whether we should perform a name_bind
permission check between the socket and the port number. */
@@ -6999,6 +7011,7 @@ static struct security_hook_list selinux_hooks[] 
__lsm_ro_after_init = {
 
LSM_HOOK_INIT(socket_create, selinux_socket_create),
LSM_HOOK_INIT(socket_post_create, selinux_socket_post_create),
+   LSM_HOOK_INIT(socket_socketpair, selinux_socket_socketpair),
LSM_HOOK_INIT(socket_bind, selinux_socket_bind),
LSM_HOOK_INIT(socket_connect, selinux_socket_connect),
LSM_HOOK_INIT(socket_listen, selinux_socket_listen),
-- 
2.17.0



[PATCH v2 1/4] security: add hook for socketpair()

2018-05-04 Thread David Herrmann
Right now the LSM labels for socketpairs are always uninitialized,
since there is no security hook for the socketpair() syscall. This
patch adds the required hooks so LSMs can properly label socketpairs.
This allows SO_PEERSEC to return useful information on those sockets.

Note that the behavior of socketpair() can be emulated by creating a
listener socket, connecting to it, and then discarding the initial
listener socket. With this workaround, SO_PEERSEC would return the
caller's security context. However, with socketpair(), the uninitialized
context is returned unconditionally. This is unexpected and makes
socketpair() less useful in situations where the security context is
crucial to the application.

With the new socketpair-hook this disparity can be solved by making
socketpair() return the expected security context.

Acked-by: Serge Hallyn 
Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 include/linux/lsm_hooks.h | 7 +++
 include/linux/security.h  | 7 +++
 security/security.c   | 6 ++
 3 files changed, 20 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 9d0b286f3dba..8f1131c8dd54 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -757,6 +757,11 @@
  * @type contains the requested communications type.
  * @protocol contains the requested protocol.
  * @kern set to 1 if a kernel socket.
+ * @socket_socketpair:
+ * Check permissions before creating a fresh pair of sockets.
+ * @socka contains the first socket structure.
+ * @sockb contains the second socket structure.
+ * Return 0 if permission is granted and the connection was established.
  * @socket_bind:
  * Check permission before socket protocol layer bind operation is
  * performed and the socket @sock is bound to the address specified in the
@@ -1656,6 +1661,7 @@ union security_list_options {
int (*socket_create)(int family, int type, int protocol, int kern);
int (*socket_post_create)(struct socket *sock, int family, int type,
int protocol, int kern);
+   int (*socket_socketpair)(struct socket *socka, struct socket *sockb);
int (*socket_bind)(struct socket *sock, struct sockaddr *address,
int addrlen);
int (*socket_connect)(struct socket *sock, struct sockaddr *address,
@@ -1922,6 +1928,7 @@ struct security_hook_heads {
struct hlist_head unix_may_send;
struct hlist_head socket_create;
struct hlist_head socket_post_create;
+   struct hlist_head socket_socketpair;
struct hlist_head socket_bind;
struct hlist_head socket_connect;
struct hlist_head socket_listen;
diff --git a/include/linux/security.h b/include/linux/security.h
index 200920f521a1..4ff3ba457e56 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1191,6 +1191,7 @@ int security_unix_may_send(struct socket *sock,  struct 
socket *other);
 int security_socket_create(int family, int type, int protocol, int kern);
 int security_socket_post_create(struct socket *sock, int family,
int type, int protocol, int kern);
+int security_socket_socketpair(struct socket *socka, struct socket *sockb);
 int security_socket_bind(struct socket *sock, struct sockaddr *address, int 
addrlen);
 int security_socket_connect(struct socket *sock, struct sockaddr *address, int 
addrlen);
 int security_socket_listen(struct socket *sock, int backlog);
@@ -1262,6 +1263,12 @@ static inline int security_socket_post_create(struct 
socket *sock,
return 0;
 }
 
+static inline int security_socket_socketpair(struct socket *socka,
+struct socket *sockb)
+{
+   return 0;
+}
+
 static inline int security_socket_bind(struct socket *sock,
   struct sockaddr *address,
   int addrlen)
diff --git a/security/security.c b/security/security.c
index 7bc2fde023a7..68f46d849abe 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1358,6 +1358,12 @@ int security_socket_post_create(struct socket *sock, int 
family,
protocol, kern);
 }
 
+int security_socket_socketpair(struct socket *socka, struct socket *sockb)
+{
+   return call_int_hook(socket_socketpair, 0, socka, sockb);
+}
+EXPORT_SYMBOL(security_socket_socketpair);
+
 int security_socket_bind(struct socket *sock, struct sockaddr *address, int 
addrlen)
 {
return call_int_hook(socket_bind, 0, sock, address, addrlen);
-- 
2.17.0



[PATCH v2 4/4] smack: provide socketpair callback

2018-05-04 Thread David Herrmann
From: Tom Gundersen 

Make sure to implement the new socketpair callback so the SO_PEERSEC
call on socketpair(2)s will return correct information.

Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 security/smack/smack_lsm.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 0b414836bebd..dcb976f98df2 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -2842,6 +2842,27 @@ static int smack_socket_post_create(struct socket *sock, 
int family,
return smack_netlabel(sock->sk, SMACK_CIPSO_SOCKET);
 }
 
+/**
+ * smack_socket_socketpair - create socket pair
+ * @socka: one socket
+ * @sockb: another socket
+ *
+ * Cross reference the peer labels for SO_PEERSEC
+ *
+ * Returns 0 on success, and error code otherwise
+ */
+static int smack_socket_socketpair(struct socket *socka,
+  struct socket *sockb)
+{
+   struct socket_smack *asp = socka->sk->sk_security;
+   struct socket_smack *bsp = sockb->sk->sk_security;
+
+   asp->smk_packet = bsp->smk_out;
+   bsp->smk_packet = asp->smk_out;
+
+   return 0;
+}
+
 #ifdef SMACK_IPV6_PORT_LABELING
 /**
  * smack_socket_bind - record port binding information.
@@ -4724,6 +4745,7 @@ static struct security_hook_list smack_hooks[] 
__lsm_ro_after_init = {
LSM_HOOK_INIT(unix_may_send, smack_unix_may_send),
 
LSM_HOOK_INIT(socket_post_create, smack_socket_post_create),
+   LSM_HOOK_INIT(socket_socketpair, smack_socket_socketpair),
 #ifdef SMACK_IPV6_PORT_LABELING
LSM_HOOK_INIT(socket_bind, smack_socket_bind),
 #endif
-- 
2.17.0



[PATCH v2 2/4] net: hook socketpair() into LSM

2018-05-04 Thread David Herrmann
Use the newly created LSM-hook for socketpair(). The default hook
return-value is 0, so behavior stays the same unless LSMs start using
this hook.

Acked-by: Serge Hallyn 
Signed-off-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 net/socket.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/socket.c b/net/socket.c
index f10f1d947c78..667a7b397134 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1420,6 +1420,13 @@ int __sys_socketpair(int family, int type, int protocol, 
int __user *usockvec)
goto out;
}
 
+   err = security_socket_socketpair(sock1, sock2);
+   if (unlikely(err)) {
+   sock_release(sock2);
+   sock_release(sock1);
+   goto out;
+   }
+
err = sock1->ops->socketpair(sock1, sock2);
if (unlikely(err < 0)) {
sock_release(sock2);
-- 
2.17.0



Re: [PATCH 0/3] Introduce LSM-hook for socketpair(2)

2018-05-04 Thread David Herrmann
Hey

On Wed, Apr 25, 2018 at 9:02 PM, James Morris  wrote:
> On Wed, 25 Apr 2018, Paul Moore wrote:
>
>> On Wed, Apr 25, 2018 at 2:44 PM, James Morris  wrote:
>> > On Mon, 23 Apr 2018, David Herrmann wrote:
>> >> This patch series tries to close this gap and makes both behave the
>> >> same. A new LSM-hook is added which allows LSMs to cache the correct
>> >> peer information on newly created socket-pairs.
>> >
>> > Looks okay to me.
>> >
>> > Once it's respun with the Smack backend and maybe the hook name change,
>> > I'll merge this unless DaveM wants it to go in via his networking tree.
>>
>> Note my objection to the hook placement in patch 2/3; I think we
>> should move the hook out of the AF_UNIX layer and up into the socket
>> layer.
>
> I vote for this as it maintains the intended abstraction of the socket
> API.

Sounds good, I changed it. I will send v2 shortly.

Thanks
David


[PATCH 0/3] Introduce LSM-hook for socketpair(2)

2018-04-23 Thread David Herrmann
Hi

This series adds a new LSM hook for the socketpair(2) syscall. The idea
is to allow SO_PEERSEC to be called on AF_UNIX sockets created via
socketpair(2), and return the same information as if you emulated
socketpair(2) via a temporary listener socket. Right now SO_PEERSEC
will return the unlabeled credentials for a socketpair, rather than the
actual credentials of the creating process.

A simple call to:

socketpair(AF_UNIX, SOCK_STREAM, 0, out);

can be emulated via a temporary listener socket bound to a unique,
random name in the abstract namespace. By connecting to this listener
socket, accept(2) will return the second part of the pair. If
SO_PEERSEC is queried on these, the correct credentials of the creating
process are returned. A simple comparison between the behavior of
SO_PEERSEC on socketpair(2) and an emulated socketpair is included in
the dbus-broker test-suite [1].

This patch series tries to close this gap and makes both behave the
same. A new LSM-hook is added which allows LSMs to cache the correct
peer information on newly created socket-pairs.

Apart from fixing this behavioral difference, the dbus-broker project
actually needs to query the credentials of socketpairs, and currently
must resort to racy procfs(2) queries to get the LSM credentials of its
controller socket. Several parts of the dbus-broker project allow you
to pass in a socket during execve(2), which will be used by the child
process to accept control-commands from its parent. One natural way to
create this communication channel is to use socketpair(2). However,
right now SO_PEERSEC does not return any useful information, hence, the
child-process would need other means to retrieve this information. By
avoiding socketpair(2) and using the hacky-emulated version, this is not
an issue.

There was a previous discussion on this matter [2] roughly a year ago.
Back then there was the suspicion that proper SO_PEERSEC would confuse
applications. However, we could not find any evidence backing this
suspicion. Furthermore, we now actually see the contrary. Lack of
SO_PEERSEC makes it a hassle to use socketpairs with LSM credentials.
Hence, we propose to implement full SO_PEERSEC for socketpairs.

This series only adds SELinux backends, since that is what we need for
RHEL. I will gladly extend the other LSMs if needed.

Thanks
David

[1] https://github.com/bus1/dbus-broker/blob/master/src/util/test-peersec.c
[2] https://www.spinics.net/lists/selinux/msg22674.html

David Herrmann (3):
  security: add hook for socketpair(AF_UNIX, ...)
  net/unix: hook unix_socketpair() into LSM
  selinux: provide unix_stream_socketpair callback

 include/linux/lsm_hooks.h |  8 
 include/linux/security.h  |  7 +++
 net/unix/af_unix.c|  5 +
 security/security.c   |  6 ++
 security/selinux/hooks.c  | 14 ++
 5 files changed, 40 insertions(+)

-- 
2.17.0



[PATCH 3/3] selinux: provide unix_stream_socketpair callback

2018-04-23 Thread David Herrmann
Make sure to implement the new unix_stream_socketpair callback so the
SO_PEERSEC call on socketpair(2)s will return correct information.

Signed-off-by: David Herrmann 
---
 security/selinux/hooks.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 4cafe6a19167..828881d9a41d 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4905,6 +4905,18 @@ static int selinux_socket_unix_stream_connect(struct 
sock *sock,
return 0;
 }
 
+static int selinux_socket_unix_stream_socketpair(struct sock *socka,
+struct sock *sockb)
+{
+   struct sk_security_struct *sksec_a = socka->sk_security;
+   struct sk_security_struct *sksec_b = sockb->sk_security;
+
+   sksec_a->peer_sid = sksec_b->sid;
+   sksec_b->peer_sid = sksec_a->sid;
+
+   return 0;
+}
+
 static int selinux_socket_unix_may_send(struct socket *sock,
struct socket *other)
 {
@@ -6995,6 +7007,8 @@ static struct security_hook_list selinux_hooks[] 
__lsm_ro_after_init = {
LSM_HOOK_INIT(inode_getsecctx, selinux_inode_getsecctx),
 
LSM_HOOK_INIT(unix_stream_connect, selinux_socket_unix_stream_connect),
+   LSM_HOOK_INIT(unix_stream_socketpair,
+   selinux_socket_unix_stream_socketpair),
LSM_HOOK_INIT(unix_may_send, selinux_socket_unix_may_send),
 
LSM_HOOK_INIT(socket_create, selinux_socket_create),
-- 
2.17.0



[PATCH 1/3] security: add hook for socketpair(AF_UNIX, ...)

2018-04-23 Thread David Herrmann
Right now the LSM labels for socketpairs are always uninitialized,
since there is no security hook for the socketpair() syscall. This
patch adds the required hooks so LSMs can properly label socketpairs.
This allows SO_PEERSEC to return useful information on those sockets.

Note that the behavior of socketpair() can be emulated by creating a
listener socket, connecting to it, and then discarding the initial
listener socket. With this workaround, SO_PEERSEC would return the
caller's security context. However, with socketpair(), the uninitialized
context is returned unconditionally. This is unexpected and makes
socketpair() less useful in situations where the security context is
crucial to the application.

With the new socketpair-hook this disparity can be solved by making
socketpair() return the expected security context.

Signed-off-by: David Herrmann 
---
 include/linux/lsm_hooks.h | 8 
 include/linux/security.h  | 7 +++
 security/security.c   | 6 ++
 3 files changed, 21 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 9d0b286f3dba..2a23c75c1541 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -717,6 +717,12 @@
  * @other contains the peer sock structure.
  * @newsk contains the new sock structure.
  * Return 0 if permission is granted.
+ * @unix_stream_socketpair:
+ * Check permissions before establishing a Unix domain stream connection
+ * for a fresh pair of sockets.
+ * @socka contains the first sock structure.
+ * @sockb contains the second sock structure.
+ * Return 0 if permission is granted and the connection was established.
  * @unix_may_send:
  * Check permissions before connecting or sending datagrams from @sock to
  * @other.
@@ -1651,6 +1657,7 @@ union security_list_options {
 #ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect)(struct sock *sock, struct sock *other,
struct sock *newsk);
+   int (*unix_stream_socketpair)(struct sock *socka, struct sock *sockb);
int (*unix_may_send)(struct socket *sock, struct socket *other);
 
int (*socket_create)(int family, int type, int protocol, int kern);
@@ -1919,6 +1926,7 @@ struct security_hook_heads {
struct hlist_head inode_getsecctx;
 #ifdef CONFIG_SECURITY_NETWORK
struct hlist_head unix_stream_connect;
+   struct hlist_head unix_stream_socketpair;
struct hlist_head unix_may_send;
struct hlist_head socket_create;
struct hlist_head socket_post_create;
diff --git a/include/linux/security.h b/include/linux/security.h
index 200920f521a1..be275deeda10 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1187,6 +1187,7 @@ static inline int security_inode_getsecctx(struct inode 
*inode, void **ctx, u32
 #ifdef CONFIG_SECURITY_NETWORK
 
 int security_unix_stream_connect(struct sock *sock, struct sock *other, struct 
sock *newsk);
+int security_unix_stream_socketpair(struct sock *socka, struct sock *sockb);
 int security_unix_may_send(struct socket *sock,  struct socket *other);
 int security_socket_create(int family, int type, int protocol, int kern);
 int security_socket_post_create(struct socket *sock, int family,
@@ -1242,6 +1243,12 @@ static inline int security_unix_stream_connect(struct 
sock *sock,
return 0;
 }
 
+static inline int security_unix_stream_socketpair(struct sock *socka,
+ struct sock *sockb)
+{
+   return 0;
+}
+
 static inline int security_unix_may_send(struct socket *sock,
 struct socket *other)
 {
diff --git a/security/security.c b/security/security.c
index 7bc2fde023a7..3dfd374e84e5 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1340,6 +1340,12 @@ int security_unix_stream_connect(struct sock *sock, 
struct sock *other, struct s
 }
 EXPORT_SYMBOL(security_unix_stream_connect);
 
+int security_unix_stream_socketpair(struct sock *socka, struct sock *sockb)
+{
+   return call_int_hook(unix_stream_socketpair, 0, socka, sockb);
+}
+EXPORT_SYMBOL(security_unix_stream_socketpair);
+
 int security_unix_may_send(struct socket *sock,  struct socket *other)
 {
return call_int_hook(unix_may_send, 0, sock, other);
-- 
2.17.0



[PATCH 2/3] net/unix: hook unix_socketpair() into LSM

2018-04-23 Thread David Herrmann
Use the newly created LSM-hook for unix_socketpair(). The default hook
return-value is 0, so behavior stays the same unless LSMs start using
this hook.

Signed-off-by: David Herrmann 
---
 net/unix/af_unix.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 68bb70a62afe..bc9705ace9b1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1371,6 +1371,11 @@ static int unix_stream_connect(struct socket *sock, 
struct sockaddr *uaddr,
 static int unix_socketpair(struct socket *socka, struct socket *sockb)
 {
struct sock *ska = socka->sk, *skb = sockb->sk;
+   int err;
+
+   err = security_unix_stream_socketpair(ska, skb);
+   if (err)
+   return err;
 
/* Join our sockets back to back */
sock_hold(ska);
-- 
2.17.0



[PATCH] net/unix: drop obsolete fd-recursion limits

2017-07-17 Thread David Herrmann
All unix sockets now account inflight FDs to the respective sender.
This was introduced in:

commit 712f4aad406bb1ed67f3f98d04c044191f0ff593
Author: willy tarreau 
Date:   Sun Jan 10 07:54:56 2016 +0100

unix: properly account for FDs passed over unix sockets

and further refined in:

commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6
Author: Hannes Frederic Sowa 
Date:   Wed Feb 3 02:11:03 2016 +0100

unix: correctly track in-flight fds in sending process user_struct

Hence, regardless of the stacking depth of FDs, the total number of
inflight FDs is limited, and accounted. There is no known way for a
local user to exceed those limits or exploit the accounting.

Furthermore, the GC logic is independent of the recursion/stacking depth
as well. It solely depends on the total number of inflight FDs,
regardless of their layout.

Lastly, the current `recursion_level' suffers a TOCTOU race, since it
checks and inherits depths only at queue time. If we consider `A<-B' to
mean `queue-B-on-A', the following sequence circumvents the recursion
level easily:

A<-B
   B<-C
  C<-D
 ...
   Y<-Z

resulting in:

A<-B<-C<-...<-Z

With all of this in mind, lets drop the recursion limit. It has no
additional security value, anymore. On the contrary, it randomly
confuses message brokers that try to forward file-descriptors, since
any sendmsg(2) call can fail spuriously with ETOOMANYREFS if a client
maliciously modifies the FD while inflight.

Cc: Alban Crequy 
Cc: Simon McVittie 
Signed-off-by: David Herrmann 
---
 include/net/af_unix.h |  1 -
 net/unix/af_unix.c| 24 +---
 2 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 678e4d6fa317..3b3194b2fc65 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -58,7 +58,6 @@ struct unix_sock {
struct list_headlink;
atomic_long_t   inflight;
spinlock_t  lock;
-   unsigned char   recursion_level;
unsigned long   gc_flags;
 #define UNIX_GC_CANDIDATE  0
 #define UNIX_GC_MAYBE_CYCLE1
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7b52a380d710..5c53f22d62e8 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1528,26 +1528,13 @@ static inline bool too_many_unix_fds(struct task_struct 
*p)
return false;
 }
 
-#define MAX_RECURSION_LEVEL 4
-
 static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 {
int i;
-   unsigned char max_level = 0;
 
if (too_many_unix_fds(current))
return -ETOOMANYREFS;
 
-   for (i = scm->fp->count - 1; i >= 0; i--) {
-   struct sock *sk = unix_get_socket(scm->fp->fp[i]);
-
-   if (sk)
-   max_level = max(max_level,
-   unix_sk(sk)->recursion_level);
-   }
-   if (unlikely(max_level > MAX_RECURSION_LEVEL))
-   return -ETOOMANYREFS;
-
/*
 * Need to duplicate file references for the sake of garbage
 * collection.  Otherwise a socket in the fps might become a
@@ -1559,7 +1546,7 @@ static int unix_attach_fds(struct scm_cookie *scm, struct 
sk_buff *skb)
 
for (i = scm->fp->count - 1; i >= 0; i--)
unix_inflight(scm->fp->user, scm->fp->fp[i]);
-   return max_level;
+   return 0;
 }
 
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool 
send_fds)
@@ -1649,7 +1636,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct 
msghdr *msg,
struct sk_buff *skb;
long timeo;
struct scm_cookie scm;
-   int max_level;
int data_len = 0;
int sk_locked;
 
@@ -1701,7 +1687,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct 
msghdr *msg,
err = unix_scm_to_skb(&scm, skb, true);
if (err < 0)
goto out_free;
-   max_level = err + 1;
 
skb_put(skb, len - data_len);
skb->data_len = data_len;
@@ -1819,8 +1804,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct 
msghdr *msg,
__net_timestamp(skb);
maybe_add_creds(skb, sock, other);
skb_queue_tail(&other->sk_receive_queue, skb);
-   if (max_level > unix_sk(other)->recursion_level)
-   unix_sk(other)->recursion_level = max_level;
unix_state_unlock(other);
other->sk_data_ready(other);
sock_put(other);
@@ -1855,7 +1838,6 @@ static int unix_stream_sendmsg(struct socket *sock, 
struct msghdr *msg,
int sent = 0;
struct scm_cookie scm;
bool fds_sent = false;
-   int max_level;
int data_len;
 
wait_for_unix_gc();
@@ -1905,7 +1887,6 @@ static int unix_stream_sendmsg(struc

[PATCH v2] net: introduce SO_PEERGROUPS getsockopt

2017-06-21 Thread David Herrmann
This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
retrieve the auxiliary groups of the remote peer. It is designed to
naturally extend SO_PEERCRED. That is, the underlying data is from the
same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
is, if the provided buffer is too small, ERANGE is returned and @optlen
is updated. Otherwise, the information is copied, @optlen is set to the
actual size, and 0 is returned.

While SO_PEERCRED (and thus `struct ucred') already returns the primary
group, it lacks the auxiliary group vector. However, nearly all access
controls (including kernel side VFS and SYSVIPC, but also user-space
polkit, DBus, ...) consider the entire set of groups, rather than just
the primary group. But this is currently not possible with pure
SO_PEERCRED. Instead, user-space has to work around this and query the
system database for the auxiliary groups of a UID retrieved via
SO_PEERCRED.

Unfortunately, there is no race-free way to query the auxiliary groups
of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
solution is to use getgrouplist(3p), which itself falls back to NSS and
whatever is configured in nsswitch.conf(3). This effectively checks
which groups we *would* assign to the user if it logged in *now*. On
normal systems it is as easy as reading /etc/group, but with NSS it can
resort to quering network databases (eg., LDAP), using IPC or network
communication.

Long story short: Whenever we want to use auxiliary groups for access
checks on IPC, we need further IPC to talk to the user/group databases,
rather than just relying on SO_PEERCRED and the incoming socket. This
is unfortunate, and might even result in dead-locks if the database
query uses the same IPC as the original request.

So far, those recursions / dead-locks have been avoided by using
primitive IPC for all crucial NSS modules. However, we want to avoid
re-inventing the wheel for each NSS module that might be involved in
user/group queries. Hence, we would preferably make DBus (and other IPC
that supports access-management based on groups) work without resorting
to the user/group database. This new SO_PEERGROUPS ioctl would allow us
to make dbus-daemon work without ever calling into NSS.

Cc: Michal Sekletar 
Cc: Simon McVittie 
Reviewed-by: Tom Gundersen 
Signed-off-by: David Herrmann 
---
v2:
 - rebase on net-next
 - powerpc uapi header now uses asm-generic

 arch/alpha/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h |  2 ++
 arch/ia64/include/uapi/asm/socket.h|  2 ++
 arch/m32r/include/uapi/asm/socket.h|  2 ++
 arch/mips/include/uapi/asm/socket.h|  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/s390/include/uapi/asm/socket.h|  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/uapi/asm-generic/socket.h  |  2 ++
 net/core/sock.c| 33 +
 12 files changed, 55 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 0926de63a62b..7b285dd4fe05 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -107,4 +107,6 @@
 
 #define SCM_TIMESTAMPING_PKTINFO   58
 
+#define SO_PEERGROUPS  59
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h 
b/arch/frv/include/uapi/asm/socket.h
index e491ff08b9a9..f1e3b20dce9f 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -100,5 +100,7 @@
 
 #define SCM_TIMESTAMPING_PKTINFO   58
 
+#define SO_PEERGROUPS  59
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index 86937241..5dd5c5d0d642 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -109,4 +109,6 @@
 
 #define SCM_TIMESTAMPING_PKTINFO   58
 
+#define SO_PEERGROUPS  59
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h 
b/arch/m32r/include/uapi/asm/socket.h
index 5d97890a8704..f8f7b47e247f 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -100,4 +100,6 @@
 
 #define SCM_TIMESTAMPING_PKTINFO   58
 
+#define SO_PEERGROUPS  59
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index 365ff51f033a..882823bec153 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -118,4 +118,6 @@
 
 #define SCM_TIMESTAMPING_PKTINFO   58
 
+#define SO_PEERGROUPS  59
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h 
b/arch/mn10300/include/uapi/asm/socket.h
index d013c0da0256..c710db354ff2 100644
--- a/arch/mn10300/include

[PATCH] net: introduce SO_PEERGROUPS getsockopt

2017-06-16 Thread David Herrmann
This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
retrieve the auxiliary groups of the remote peer. It is designed to
naturally extend SO_PEERCRED. That is, the underlying data is from the
same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
is, if the provided buffer is too small, ERANGE is returned and @optlen
is updated. Otherwise, the information is copied, @optlen is set to the
actual size, and 0 is returned.

While SO_PEERCRED (and thus `struct ucred') already returns the primary
group, it lacks the auxiliary group vector. However, nearly all access
controls (including kernel side VFS and SYSVIPC, but also user-space
polkit, DBus, ...) consider the entire set of groups, rather than just
the primary group. But this is currently not possible with pure
SO_PEERCRED. Instead, user-space has to work around this and query the
system database for the auxiliary groups of a UID retrieved via
SO_PEERCRED.

Unfortunately, there is no race-free way to query the auxiliary groups
of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
solution is to use getgrouplist(3p), which itself falls back to NSS and
whatever is configured in nsswitch.conf(3). This effectively checks
which groups we *would* assign to the user if it logged in *now*. On
normal systems it is as easy as reading /etc/group, but with NSS it can
resort to quering network databases (eg., LDAP), using IPC or network
communication.

Long story short: Whenever we want to use auxiliary groups for access
checks on IPC, we need further IPC to talk to the user/group databases,
rather than just relying on SO_PEERCRED and the incoming socket. This
is unfortunate, and might even result in dead-locks if the database
query uses the same IPC as the original request.

So far, those recursions / dead-locks have been avoided by using
primitive IPC for all crucial NSS modules. However, we want to avoid
re-inventing the wheel for each NSS module that might be involved in
user/group queries. Hence, we would preferably make DBus (and other IPC
that supports access-management based on groups) work without resorting
to the user/group database. This new SO_PEERGROUPS ioctl would allow us
to make dbus-daemon work without ever calling into NSS.

Cc: Michal Sekletar 
Cc: Simon McVittie 
Cc: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 arch/alpha/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h |  2 ++
 arch/ia64/include/uapi/asm/socket.h|  2 ++
 arch/m32r/include/uapi/asm/socket.h|  2 ++
 arch/mips/include/uapi/asm/socket.h|  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/powerpc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h|  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/uapi/asm-generic/socket.h  |  2 ++
 net/core/sock.c| 33 +
 13 files changed, 57 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 148d7a32754e..975c5cbf9a86 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -105,4 +105,6 @@
 
 #define SO_COOKIE  57
 
+#define SO_PEERGROUPS  58
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h 
b/arch/frv/include/uapi/asm/socket.h
index 1ccf45657472..8e53a149b216 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -98,5 +98,7 @@
 
 #define SO_COOKIE  57
 
+#define SO_PEERGROUPS  58
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index 2c3f4b48042a..d122c30429ae 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -107,4 +107,6 @@
 
 #define SO_COOKIE  57
 
+#define SO_PEERGROUPS  58
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h 
b/arch/m32r/include/uapi/asm/socket.h
index ae6548d29a18..7e689cc14668 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -98,4 +98,6 @@
 
 #define SO_COOKIE  57
 
+#define SO_PEERGROUPS  58
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index 3418ec9c1c50..5c0947d063cc 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -116,4 +116,6 @@
 
 #define SO_COOKIE  57
 
+#define SO_PEERGROUPS  58
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h 
b/arch/mn10300/include/uapi/asm/socket.h
index 4526e92301a6..219f516eb6ad 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -98,4 

Re: [PATCH v3 1/2] bpf: add a longest prefix match trie map implementation

2017-01-18 Thread David Herrmann
Hi

On Sat, Jan 14, 2017 at 5:55 PM, Alexei Starovoitov  wrote:
> Another alternative is to extend samples/bpf/map_perf_test
> It has perf tests for most map types today (including lru)
> and trie would be natural addition there.
> I would prefer this latter option.

I hooked into gettid() and installed a simple kprobe bpf program that
searches for an entry in an lpm trie. The bpf program does either 0,
1, 8, or 32 lookups in a row (always the same element). The trie has
size 0, 1, or 8192. The data is below. The results vary by roughly 5%
on every run.

A single gettid() syscall with an empty bpf program takes roughly
6.5us on my system. Lookups in empty tries take ~1.8us on first try,
~0.9us on retries. Lookups in tries with 8192 entries take ~7.1us (on
the first _and_ any subsequent try).

https://gist.github.com/dvdhrm/4c90e61a1c39746d5c55ab9e0e29315e

Thanks
David


Trie-size: 0
#Lookups: 0
0:lpm_perf kmalloc 9,230,321 events per sec
-> 6.5us / syscall

Trie-size: 1
#Lookups: 1
0:lpm_perf kmalloc 7,224,508 events per sec
-> 8.3us / syscall

Trie-size: 1
#Lookups: 8
0:lpm_perf kmalloc 4,152,740 events per sec
-> 14.4us / syscall

Trie-size: 1
#Lookups: 32
0:lpm_perf kmalloc 1,713,415 events per sec
-> 35.0us / syscall

Trie-size: 8192
#Lookups: 1
0:lpm_perf kmalloc 4,369,138 events per sec
-> 13.7us / syscall

Trie-size: 8192
#Lookups: 8
0:lpm_perf kmalloc 943,849 events per sec
-> 63.6us / syscall

Trie-size: 8192
#Lookups: 32
0:lpm_perf kmalloc 271,737 events per sec
-> 220.8us / syscall


Re: [PATCH v2] unix: properly account for FDs passed over unix sockets

2016-02-03 Thread David Herrmann
Hi

On Wed, Feb 3, 2016 at 12:36 PM, Simon McVittie
 wrote:
> Am I right in saying that the advice I give to D-Bus users should be
> something like this?
>
> * system services should not send fds at all, unless they trust the
>   dbus-daemon
> * system services should not send fds via D-Bus that will be delivered
>   to recipients that they do not trust
> * sending fds to an untrusted recipient would enable that recipient to
>   carry out a denial-of-service attack (on what? the sender? the
>   dbus-daemon?)

With the revised patch from Hannes, this should no longer be needed.
My original concern was only about accounting inflight-fds on the
file-owner, rather than the sender.

However, with Hannes' revised patch, a different DoS attack against
dbus-daemon is possible. Imagine a peer that receives batches of FDs,
but never dequeues them. They will be accounted on the inflight-limit
of dbus-daemon, as such causing messages of independent peers to be
rejected in case they carry FDs.
Preferably, dbus-daemon would avoid queuing more than 16 FDs on a
single destination (total). But that would require POLLOUT to be
capped by the number of queued fds. A possible workaround is to add
CAP_SYS_RESOURCE to dbus-daemon.

Thanks
David


Re: [PATCH v2] unix: properly account for FDs passed over unix sockets

2016-02-02 Thread David Herrmann
Hi

On Sun, Jan 10, 2016 at 7:54 AM, Willy Tarreau  wrote:
> It is possible for a process to allocate and accumulate far more FDs than
> the process' limit by sending them over a unix socket then closing them
> to keep the process' fd count low.
>
> This change addresses this problem by keeping track of the number of FDs
> in flight per user and preventing non-privileged processes from having
> more FDs in flight than their configured FD limit.

This is not really what this patch does. This patch rather prevents
processes from having more of their owned *files* in flight than their
configured FD limit. See below for details..

> Reported-by: socketp...@gmail.com
> Reported-by: Tetsuo Handa 
> Mitigates: CVE-2013-4312 (Linux 2.0+)
> Suggested-by: Linus Torvalds 
> Acked-by: Hannes Frederic Sowa 
> Signed-off-by: Willy Tarreau 
> ---
> v2: add reported-by, mitigates and acked-by.
>
> It would be nice if (if accepted) it would be backported to -stable as the
> issue is currently exploitable.
> ---
>  include/linux/sched.h |  1 +
>  net/unix/af_unix.c| 24 
>  net/unix/garbage.c| 13 -
>  3 files changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index edad7a4..fbf25f1 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -830,6 +830,7 @@ struct user_struct {
> unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? 
> */
>  #endif
> unsigned long locked_shm; /* How many pages of mlocked shm ? */
> +   unsigned long unix_inflight;/* How many files in flight in unix 
> sockets */
>
>  #ifdef CONFIG_KEYS
> struct key *uid_keyring;/* UID specific keyring */
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 45aebd9..d6d7b43 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1499,6 +1499,21 @@ static void unix_destruct_scm(struct sk_buff *skb)
> sock_wfree(skb);
>  }
>
> +/*
> + * The "user->unix_inflight" variable is protected by the garbage
> + * collection lock, and we just read it locklessly here. If you go
> + * over the limit, there might be a tiny race in actually noticing
> + * it across threads. Tough.

This limit is checked once per transaction. IIRC, a transaction can
carry 255 files. Thus, it is easy to exceed this limit without any
race involved.

> + */
> +static inline bool too_many_unix_fds(struct task_struct *p)
> +{
> +   struct user_struct *user = current_user();
> +
> +   if (unlikely(user->unix_inflight > task_rlimit(p, RLIMIT_NOFILE)))
> +   return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN);
> +   return false;
> +}
> +
>  #define MAX_RECURSION_LEVEL 4
>
>  static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
> @@ -1507,6 +1522,9 @@ static int unix_attach_fds(struct scm_cookie *scm, 
> struct sk_buff *skb)
> unsigned char max_level = 0;
> int unix_sock_count = 0;
>
> +   if (too_many_unix_fds(current))
> +   return -ETOOMANYREFS;
> +

Ignoring the capabilities, this effectively resolves to:

if (current_cred()->user->unix_inflight > rlimit(RLIMIT_NOFILE))
return -ETOOMANYREFS;

It limits the number of inflight FDs of the *current* user to its *own* limit.

But..

> for (i = scm->fp->count - 1; i >= 0; i--) {
> struct sock *sk = unix_get_socket(scm->fp->fp[i]);
>
> @@ -1528,10 +1546,8 @@ static int unix_attach_fds(struct scm_cookie *scm, 
> struct sk_buff *skb)
> if (!UNIXCB(skb).fp)
> return -ENOMEM;
>
> -   if (unix_sock_count) {
> -   for (i = scm->fp->count - 1; i >= 0; i--)
> -   unix_inflight(scm->fp->fp[i]);
> -   }
> +   for (i = scm->fp->count - 1; i >= 0; i--)
> +   unix_inflight(scm->fp->fp[i]);
> return max_level;
>  }
>
> diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> index a73a226..8fcdc22 100644
> --- a/net/unix/garbage.c
> +++ b/net/unix/garbage.c
> @@ -120,11 +120,11 @@ void unix_inflight(struct file *fp)
>  {
> struct sock *s = unix_get_socket(fp);
>
> +   spin_lock(&unix_gc_lock);
> +
> if (s) {
> struct unix_sock *u = unix_sk(s);
>
> -   spin_lock(&unix_gc_lock);
> -
> if (atomic_long_inc_return(&u->inflight) == 1) {
> BUG_ON(!list_empty(&u->link));
> list_add_tail(&u->link, &gc_inflight_list);
> @@ -132,25 +132,28 @@ void unix_inflight(struct file *fp)
> BUG_ON(list_empty(&u->link));
> }
> unix_tot_inflight++;
> -   spin_unlock(&unix_gc_lock);
> }
> +   fp->f_cred->user->unix_inflight++;

..this modifies the limit of the owner of the file you send. As such,
if you only send files that you don't own, you will continuously bump
the limit of the file-owner, but never 

[PATCH] net: drop write-only stack variable

2016-02-02 Thread David Herrmann
Remove a write-only stack variable from unix_attach_fds(). This is a
left-over from the security fix in:

commit 712f4aad406bb1ed67f3f98d04c044191f0ff593
Author: willy tarreau 
Date:   Sun Jan 10 07:54:56 2016 +0100

unix: properly account for FDs passed over unix sockets

Signed-off-by: David Herrmann 
---
 net/unix/af_unix.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index c5bf5ef..9f88193 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1534,7 +1534,6 @@ static int unix_attach_fds(struct scm_cookie *scm, struct 
sk_buff *skb)
 {
int i;
unsigned char max_level = 0;
-   int unix_sock_count = 0;
 
if (too_many_unix_fds(current))
return -ETOOMANYREFS;
@@ -1542,11 +1541,9 @@ static int unix_attach_fds(struct scm_cookie *scm, 
struct sk_buff *skb)
for (i = scm->fp->count - 1; i >= 0; i--) {
struct sock *sk = unix_get_socket(scm->fp->fp[i]);
 
-   if (sk) {
-   unix_sock_count++;
+   if (sk)
max_level = max(max_level,
unix_sk(sk)->recursion_level);
-   }
}
if (unlikely(max_level > MAX_RECURSION_LEVEL))
return -ETOOMANYREFS;
-- 
2.7.0



Re: [patch] netlink: fix a limit in NETLINK_LIST_MEMBERSHIPS

2015-11-13 Thread David Herrmann
Hi

On Fri, Nov 13, 2015 at 3:20 PM, Dan Carpenter  wrote:
> This condition doesn't work when len is smaller than expected and not a
> multiple of 4.  In that situation "len - pos" is negative and type
> promoted to a high unsigned value and we do not break out of the loop.
> It causes the program calling it to crash.

Could you give an example how this can happen? The loop-invariant
should be "len >= pos", as such, this shouldn't happen. "pos" starts
out as 0, "len" is guaranteed to be >=0. "pos" is only incremented by
4, if "len - pos >= 4".

What am I missing?

Thanks
David

> Fixes: b42be38b2778 ('netlink: add API to retrieve all group memberships')
> Signed-off-by: Dan Carpenter 
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 59651af..76a8466 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2373,7 +2373,7 @@ static int netlink_getsockopt(struct socket *sock, int 
> level, int optname,
> err = 0;
> netlink_lock_table();
> for (pos = 0; pos * 8 < nlk->ngroups; pos += sizeof(u32)) {
> -   if (len - pos < sizeof(u32))
> +   if (len < pos + sizeof(u32))
> break;
>
> idx = pos / sizeof(unsigned long);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] netlink: fix locking around NETLINK_LIST_MEMBERSHIPS

2015-10-21 Thread David Herrmann
Currently, NETLINK_LIST_MEMBERSHIPS grabs the netlink table while copying
the membership state to user-space. However, grabing the netlink table is
effectively a write_lock_irq(), and as such we should not be triggering
page-faults in the critical section.

This can be easily reproduced by the following snippet:
int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
void *p = mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
int r = getsockopt(s, 0x10e, 9, p, (void*)((char*)p + 4092));

This should work just fine, but currently triggers EFAULT and a possible
WARN_ON below handle_mm_fault().

Fix this by reducing locking of NETLINK_LIST_MEMBERSHIPS to a read-side
lock. The write-lock was overkill in the first place, and the read-lock
allows page-faults just fine.

Cc:  # 4.2+
Reported-by: Dmitry Vyukov 
Signed-off-by: David Herrmann 
---
 net/netlink/af_netlink.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 8f060d7..2389602 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2371,7 +2371,7 @@ static int netlink_getsockopt(struct socket *sock, int 
level, int optname,
int pos, idx, shift;
 
err = 0;
-   netlink_table_grab();
+   netlink_lock_table();
for (pos = 0; pos * 8 < nlk->ngroups; pos += sizeof(u32)) {
if (len - pos < sizeof(u32))
break;
@@ -2386,7 +2386,7 @@ static int netlink_getsockopt(struct socket *sock, int 
level, int optname,
}
if (put_user(ALIGN(nlk->ngroups / 8, sizeof(u32)), optlen))
err = -EFAULT;
-   netlink_table_ungrab();
+   netlink_unlock_table();
break;
}
case NETLINK_CAP_ACK:
-- 
2.6.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] netlink: add API to retrieve all group memberships

2015-06-17 Thread David Herrmann
This patch adds getsockopt(SOL_NETLINK, NETLINK_LIST_MEMBERSHIPS) to
retrieve all groups a socket is a member of. Currently, we have to use
getsockname() and look at the nl.nl_groups bitmask. However, this mask is
limited to 32 groups. Hence, similar to NETLINK_ADD_MEMBERSHIP and
NETLINK_DROP_MEMBERSHIP, this adds a separate sockopt to manager higher
groups IDs than 32.

This new NETLINK_LIST_MEMBERSHIPS option takes a pointer to __u32 and the
size of the array. The array is filled with the full membership-set of the
socket, and the required array size is returned in optlen. Hence,
user-space can retry with a properly sized array in case it was too small.

Signed-off-by: David Herrmann 
---
v2:
  - rebase on top of net-next
conflict due to 59324cf35ab ('netlink: allow to listen "all" netns')

 include/uapi/linux/netlink.h | 17 +
 net/netlink/af_netlink.c | 22 ++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 3e34b7d..cf6a65c 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -101,14 +101,15 @@ struct nlmsgerr {
struct nlmsghdr msg;
 };
 
-#define NETLINK_ADD_MEMBERSHIP 1
-#define NETLINK_DROP_MEMBERSHIP2
-#define NETLINK_PKTINFO3
-#define NETLINK_BROADCAST_ERROR4
-#define NETLINK_NO_ENOBUFS 5
-#define NETLINK_RX_RING6
-#define NETLINK_TX_RING7
-#define NETLINK_LISTEN_ALL_NSID8
+#define NETLINK_ADD_MEMBERSHIP 1
+#define NETLINK_DROP_MEMBERSHIP2
+#define NETLINK_PKTINFO3
+#define NETLINK_BROADCAST_ERROR4
+#define NETLINK_NO_ENOBUFS 5
+#define NETLINK_RX_RING6
+#define NETLINK_TX_RING7
+#define NETLINK_LISTEN_ALL_NSID8
+#define NETLINK_LIST_MEMBERSHIPS   9
 
 struct nl_pktinfo {
__u32   group;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 69d67c3..dea9253 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2290,6 +2290,28 @@ static int netlink_getsockopt(struct socket *sock, int 
level, int optname,
return -EFAULT;
err = 0;
break;
+   case NETLINK_LIST_MEMBERSHIPS: {
+   int pos, idx, shift;
+
+   err = 0;
+   netlink_table_grab();
+   for (pos = 0; pos * 8 < nlk->ngroups; pos += sizeof(u32)) {
+   if (len - pos < sizeof(u32))
+   break;
+
+   idx = pos / sizeof(unsigned long);
+   shift = (pos % sizeof(unsigned long)) * 8;
+   if (put_user((u32)(nlk->groups[idx] >> shift),
+(u32 __user *)(optval + pos))) {
+   err = -EFAULT;
+   break;
+   }
+   }
+   if (put_user(ALIGN(nlk->ngroups / 8, sizeof(u32)), optlen))
+   err = -EFAULT;
+   netlink_table_ungrab();
+   break;
+   }
default:
err = -ENOPROTOOPT;
}
-- 
2.4.2

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] netlink: add API to retrieve all group memberships

2015-06-12 Thread David Herrmann
This patch adds getsockopt(SOL_NETLINK, NETLINK_LIST_MEMBERSHIPS) to
retrieve all groups a socket is a member of. Currently, we have to use
getsockname() and look at the nl.nl_groups bitmask. However, this mask is
limited to 32 groups. Hence, similar to NETLINK_ADD_MEMBERSHIP and
NETLINK_DROP_MEMBERSHIP, this adds a separate sockopt to manager higher
groups IDs than 32.

This new NETLINK_LIST_MEMBERSHIPS option takes a pointer to __u32 and the
size of the array. The array is filled with the full membership-set of the
socket, and the required array size is returned in optlen. Hence,
user-space can retry with a properly sized array in case it was too small.

Signed-off-by: David Herrmann 
---
 include/uapi/linux/netlink.h | 15 ---
 net/netlink/af_netlink.c | 22 ++
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 1a85940..e38094f 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -101,13 +101,14 @@ struct nlmsgerr {
struct nlmsghdr msg;
 };
 
-#define NETLINK_ADD_MEMBERSHIP 1
-#define NETLINK_DROP_MEMBERSHIP2
-#define NETLINK_PKTINFO3
-#define NETLINK_BROADCAST_ERROR4
-#define NETLINK_NO_ENOBUFS 5
-#define NETLINK_RX_RING6
-#define NETLINK_TX_RING7
+#define NETLINK_ADD_MEMBERSHIP 1
+#define NETLINK_DROP_MEMBERSHIP2
+#define NETLINK_PKTINFO3
+#define NETLINK_BROADCAST_ERROR4
+#define NETLINK_NO_ENOBUFS 5
+#define NETLINK_RX_RING6
+#define NETLINK_TX_RING7
+#define NETLINK_LIST_MEMBERSHIPS   8
 
 struct nl_pktinfo {
__u32   group;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index bf6e766..b84dbe7 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2254,6 +2254,28 @@ static int netlink_getsockopt(struct socket *sock, int 
level, int optname,
return -EFAULT;
err = 0;
break;
+   case NETLINK_LIST_MEMBERSHIPS: {
+   int pos, idx, shift;
+
+   err = 0;
+   netlink_table_grab();
+   for (pos = 0; pos * 8 < nlk->ngroups; pos += sizeof(u32)) {
+   if (len - pos < sizeof(u32))
+   break;
+
+   idx = pos / sizeof(unsigned long);
+   shift = (pos % sizeof(unsigned long)) * 8;
+   if (put_user((u32)(nlk->groups[idx] >> shift),
+(u32 __user *)(optval + pos))) {
+   err = -EFAULT;
+   break;
+   }
+   }
+   if (put_user(ALIGN(nlk->ngroups / 8, sizeof(u32)), optlen))
+   err = -EFAULT;
+   netlink_table_ungrab();
+   break;
+   }
default:
err = -ENOPROTOOPT;
}
-- 
2.4.2

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html