Re: [RFC v1 00/14] Bus1 Kernel Message Bus
Hi! I'm a bit late to the party...

> Example:
> Imagine a receiver with a limit of 1024 handles. A sender transmits a
> message to that receiver. It gets access to half the limit not used by
> anyone else, hence 512 handles. It does not matter how many senders
> there are, nor how many messages are sent, it will reach its quota at
> 512. As long as they all belong to the same user, they will share the
> quota and can queue at most 512 handles. If a second sending user
> comes into play, it gets half the remaining not used by anyone else,
> which ends up being 256. And so on... If the peer dequeues messages in
> between, the numbers get higher again. But if you do the math, the
> most you can get is 50% of the target's resources, if you're the only
> sender. In all other cases you get less (like intertwined transfers,
> etc).
>
> We did look into sender-based inflight accounting, but the same set of
> issues arises. Sure, a Request+Reply model would make this easier to
> handle, but we want to explicitly support a Subscribe+Event{n} model.
> In this case there is more than one Reply to a message.
>
> Long story short: We have uid<->uid quotas so far, which prevent DoS
> attacks, unless you get access to a ridiculous amount of local UIDs.
> Details on which resources are accounted can be found in the wiki
> [1].

So if there's a limit of 1024 handles, all I need is 10 UIDs, right?

That might be a problem on a multiuser unix machine, but on Android
phones, each application gets its own UID. So all you need is 10
applications to bring the system down...

									Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
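For illustration, a quick back-of-the-envelope check of that arithmetic
in plain C (not kernel or bus1 code). It assumes the halving rule quoted
above and shows that ten distinct sending UIDs can pin 1023 of the
receiver's 1024 handles:

    #include <stdio.h>

    int main(void)
    {
            unsigned int limit = 1024, pinned = 0, uids;

            /* each new sending uid may pin half of whatever is still unused */
            for (uids = 1; uids <= 10; uids++) {
                    pinned += (limit - pinned) / 2;
                    printf("%2u uids -> %4u of %u handles pinned\n",
                           uids, pinned, limit);
            }
            return 0;
    }

The series 512 + 256 + ... + 1 approaches limit - 1, which is what makes
the ten-UID (or ten-app) attack plausible.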
Re: [RFC v1 00/14] Bus1 Kernel Message Bus
Hi

On Thu, Oct 27, 2016 at 2:45 AM, Kirill A. Shutemov wrote:
> On Wed, Oct 26, 2016 at 10:34:30PM +0200, David Herrmann wrote:
>> Long story short: We have uid<->uid quotas so far, which prevent DoS
>> attacks, unless you get access to a ridiculous amount of local UIDs.
>> Details on which resources are accounted can be found in the wiki [1].
>
> Does only a root user_ns uid count as separate, or do per-ns uids count
> too?
>
> In the first case we will have virtually unbounded access to UIDs.
>
> The second case can cap the number of user namespaces a user can create
> while using bus1 inside.
>
> Or am I missing something?

We use the exact same mechanism as "struct user_struct" (as defined in
linux/sched.h). One instance corresponds to each kuid_t currently in
use. This is analogous to task, epoll, inotify, fanotify, mqueue, pipes,
keys, ... resource accounting.

Could you elaborate on what problem you see?

Thanks
David
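For readers unfamiliar with that pattern, here is a minimal sketch of
per-kuid accounting in the style of "struct user_struct"; the names are
hypothetical and do not come from the bus1 patches:

    #include <linux/uidgid.h>
    #include <linux/kref.h>
    #include <linux/atomic.h>
    #include <linux/errno.h>

    /* one instance per kuid_t currently in use, looked up on peer creation */
    struct example_user {
            struct kref ref;
            kuid_t uid;
            atomic_t n_handles;     /* global per-uid handle count */
    };

    /* charge one handle against the user's global limit */
    static int example_charge_handle(struct example_user *u, int max_handles)
    {
            if (atomic_inc_return(&u->n_handles) > max_handles) {
                    atomic_dec(&u->n_handles);
                    return -EDQUOT;
            }
            return 0;
    }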
Re: [RFC v1 00/14] Bus1 Kernel Message Bus
On Thu, Oct 27, 2016 at 03:45:24AM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 26, 2016 at 10:34:30PM +0200, David Herrmann wrote:
> > Long story short: We have uid<->uid quotas so far, which prevent DoS
> > attacks, unless you get access to a ridiculous amount of local UIDs.
> > Details on which resources are accounted can be found in the wiki [1].
>
> Does only a root user_ns uid count as separate, or do per-ns uids count
> too?
>
> In the first case we will have virtually unbounded access to UIDs.
>
> The second case can cap the number of user namespaces a user can create
> while using bus1 inside.

That seems easy enough to solve: make the uid<->uid quota use uids in
the namespace of the side whose resources the operation uses. That way,
if both sender and recipient live in a user namespace, you get quota per
user in the namespace, but you can't use a user namespace to cheat and
manufacture more users to get more quota when talking to something
*outside* that namespace.

- Josh Triplett
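A rough sketch of that idea, with hypothetical helper and field names
(this is not code from the bus1 series): key the uid<->uid quota by the
uids as seen in the user namespace of the peer whose resources are being
consumed, so extra namespaces cannot manufacture extra quota against
peers outside the namespace.

    #include <linux/cred.h>
    #include <linux/uidgid.h>
    #include <linux/user_namespace.h>

    struct example_quota_key {
            uid_t sender;       /* sender's uid, in the receiver's user_ns */
            uid_t receiver;     /* receiver's uid, in the receiver's user_ns */
    };

    static void example_quota_key(const struct cred *sender_cred,
                                  const struct cred *receiver_cred,
                                  struct example_quota_key *key)
    {
            /* resolve both uids in the namespace of the resource owner */
            struct user_namespace *ns = receiver_cred->user_ns;

            key->sender = from_kuid_munged(ns, sender_cred->uid);
            key->receiver = from_kuid_munged(ns, receiver_cred->uid);
    }

Senders that are unmapped in the receiver's namespace would collapse
onto the overflow uid, which is arguably the right bucket for them.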
Re: [RFC v1 00/14] Bus1 Kernel Message Bus
On Wed, Oct 26, 2016 at 10:34:30PM +0200, David Herrmann wrote:
> Long story short: We have uid<->uid quotas so far, which prevent DoS
> attacks, unless you get access to a ridiculous amount of local UIDs.
> Details on which resources are accounted can be found in the wiki [1].

Does only a root user_ns uid count as separate, or do per-ns uids count
too?

In the first case we will have virtually unbounded access to UIDs.

The second case can cap the number of user namespaces a user can create
while using bus1 inside.

Or am I missing something?

--
 Kirill A. Shutemov
Re: [RFC v1 00/14] Bus1 Kernel Message Bus
On Fri, Oct 28, 2016 at 3:11 PM, Richard Weinberger wrote:
> On Wed, Oct 26, 2016 at 9:17 PM, David Herrmann wrote:
>> Hi
>>
>> This proposal introduces bus1.ko, a kernel messaging bus. This is not a
>> request for inclusion, yet. It is rather an initial draft and a Request
>> For Comments.
>>
>> While bus1 emerged out of the kdbus project, bus1 was started from
>> scratch and the concepts have little in common. In a nutshell, bus1
>> provides a capability-based IPC system, similar in nature to Android
>> Binder, Cap'n Proto, and seL4. The module is completely generic and
>> neither requires nor mandates a user-space counterpart.
>
> One thing which is not so clear to me is the role of bus1 wrt. containers.
> Can a container A exchange messages with a container B?
> If not, where is the boundary? I guess it is the pid namespace.

There is no restriction with respect to containers. The metadata is
translated between namespaces, obviously, but you can send messages to
anyone you have a handle to.

Cheers,

Tom
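To make "metadata is translated" concrete, here is a hypothetical sketch
(not taken from the bus1 patches) of how a sender's pid and uid could be
expressed in the receiver's namespaces at receive time:

    #include <linux/sched.h>
    #include <linux/cred.h>
    #include <linux/pid_namespace.h>
    #include <linux/user_namespace.h>
    #include <linux/uidgid.h>

    struct example_metadata {
            pid_t pid;      /* sender's tgid as seen by the receiver */
            uid_t uid;      /* sender's uid as seen by the receiver */
    };

    static void example_translate(struct task_struct *sender,
                                  struct pid_namespace *recv_pid_ns,
                                  struct user_namespace *recv_user_ns,
                                  struct example_metadata *out)
    {
            /* number the sender's thread-group id in the receiver's pid ns */
            out->pid = task_tgid_nr_ns(sender, recv_pid_ns);
            /* map the sender's uid; unmapped uids become the overflow uid */
            out->uid = from_kuid_munged(recv_user_ns, task_uid(sender));
    }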
Re: [RFC v1 00/14] Bus1 Kernel Message Bus
On Wed, Oct 26, 2016 at 9:17 PM, David Herrmann wrote:
> Hi
>
> This proposal introduces bus1.ko, a kernel messaging bus. This is not a
> request for inclusion, yet. It is rather an initial draft and a Request
> For Comments.
>
> While bus1 emerged out of the kdbus project, bus1 was started from
> scratch and the concepts have little in common. In a nutshell, bus1
> provides a capability-based IPC system, similar in nature to Android
> Binder, Cap'n Proto, and seL4. The module is completely generic and
> neither requires nor mandates a user-space counterpart.

One thing which is not so clear to me is the role of bus1 wrt. containers.
Can a container A exchange messages with a container B?
If not, where is the boundary? I guess it is the pid namespace.

--
Thanks,
//richard
Re: [RFC v1 00/14] Bus1 Kernel Message Bus
[CC += linux-api@vger.kernel.org]

Hi David,

Could you please CC linux-api@ on all future iterations of this patch
series!

Cheers,

Michael

On Wed, Oct 26, 2016 at 9:17 PM, David Herrmann wrote:
> Hi
>
> This proposal introduces bus1.ko, a kernel messaging bus. This is not a
> request for inclusion, yet. It is rather an initial draft and a Request
> For Comments.
>
> While bus1 emerged out of the kdbus project, bus1 was started from
> scratch and the concepts have little in common. In a nutshell, bus1
> provides a capability-based IPC system, similar in nature to Android
> Binder, Cap'n Proto, and seL4. The module is completely generic and
> neither requires nor mandates a user-space counterpart.
>
> o Description
>
>   Bus1 is a local IPC system, which provides a decentralized
>   infrastructure to share objects between local peers. The main
>   building blocks are nodes and handles. Nodes represent objects of a
>   local peer, while handles represent descriptors that point to a node.
>   Nodes can be created and destroyed by any peer, and they will always
>   remain owned by their respective creator. Handles, on the other hand,
>   are used to refer to nodes and can be passed around with messages as
>   auxiliary data. Whenever a handle is transferred, the receiver will
>   get its own handle allocated, pointing to the same node as the
>   original handle.
>
>   Any peer can send messages directed at one of their handles. This
>   will transfer the message to the owner of the node the handle points
>   to. If a peer does not possess a handle to a given node, it will not
>   be able to send a message to that node. That is, handles provide
>   exclusive access management. Anyone that somehow acquired a handle to
>   a node is privileged to further send this handle to other peers. As
>   such, access management is transitive. Once a peer acquired a handle,
>   it cannot be revoked again. However, a node owner can, at any time,
>   destroy a node. This will effectively unbind all existing handles to
>   that node on any peer, notifying each one of the destruction.
>
>   Unlike nodes and handles, peers cannot be addressed directly. In
>   fact, peers are completely disconnected entities. A peer is merely an
>   anchor of a set of nodes and handles, including an incoming message
>   queue for any of those. Whether multiple nodes are all part of the
>   same peer, or part of different peers, does not affect the remote
>   view of those. Peers solely exist as a management entity and command
>   dispatcher for local processes.
>
>   The set of actors on a system is completely decentralized. There is
>   no global component involved that provides a central registry or
>   discovery mechanism. Furthermore, communication between peers only
>   involves those peers, and does not affect any other peer in any way.
>   No global communication lock is taken. However, any communication is
>   still globally ordered, including unicasts, multicasts, and
>   notifications.
>
> o Prior Art
>
>   The concepts behind bus1 are almost identical to capability systems
>   like Android Binder, Google Mojo, Cap'n Proto, seL4, and more. Bus1
>   differs from them by supporting global ordering, multicasts, and
>   resource accounting, while requiring no global locking and no global
>   context.
>
>   While the bus1 UAPI does not expose all features (like
>   soft-references as supported by Binder), the in-kernel code includes
>   support for them. Multiple UAPIs can be supported on top of the
>   in-kernel bus1 code, including support for the Binder UAPI. Efforts
>   on this are still on-going.
>
> o Documentation
>
>   The first patch in this series provides the bus1(7) man-page. It
>   explains all concepts in bus1 in more detail. Furthermore, it
>   describes the API that is available on bus1 file descriptors. The
>   pre-compiled man-page is available at:
>
>       http://www.bus1.org/bus1.html
>
>   There is also a great deal of in-source documentation available. All
>   cross-source-file APIs have KernelDoc annotations. Furthermore, we
>   have an introduction for each subsystem, to be found in the header
>   files. The total number of lines of code for bus1 is roughly 4.5k.
>   The remaining ~5k LOC are comments and documentation.
>
> o Upstream
>
>   The upstream development repository is available on github:
>
>       http://github.com/bus1/bus1
>
>   It is an out-of-tree repository that allows easy and fast development
>   of new bus1 features. The in-tree integration repository is available
>   at:
>
>       http://github.com/bus1/linux
>
> o Conferences
>
>   Tom and I will be attending Linux Plumbers Conf next week. Please do
>   not hesitate to contact us there in person. There will also be a
>   presentation [1] of bus1 on the last day of the conference.
>
> Thanks
> Tom & David
>
> [1] https://www.linuxplumbersconf.org/2016/ocw/proposals/3819
Re: [RFC v1 00/14] Bus1 Kernel Message Bus
Hi

On Wed, Oct 26, 2016 at 9:39 PM, Linus Torvalds wrote:
> So the thing that tends to worry me about these is resource management.
>
> If I understood the documentation correctly, this has per-user resource
> management, which guarantees that at least the system won't run out of
> memory. Good. The act of sending a message transfers the resource to
> the receiver end. Fine.
>
> However, the usual problem ends up being that a bad user can basically
> DoS a system agent, especially since for obvious performance reasons
> the send/receive has to be asynchronous.
>
> So the usual DoS model is that some user just sends a lot of messages
> to a system agent, filling up the system agent resource quota, and
> basically killing the system. No, it didn't run out of memory, but the
> system agent may not be able to do anything more, since it is now out
> of resources.
>
> Keeping the resource management with the sender doesn't solve the
> problem, it just reverses it: now the attack will be to send a lot of
> queries to the system agent, but then just refuse to listen to the
> replies - again causing the system agent to run out of resources.
>
> Usually the way this is resolved is by forcing a "request-and-reply"
> resource management model, where the person who sends out a request is
> not only the one who is accounted for the request, but also accounted
> for the reply buffer. That way the system agent never runs out of
> resources, because it's always the requesting party that has its
> resources accounted, never the system agent.
>
> You may well have solved this, but can you describe what the solution
> is without forcing people to read the code and try to analyze it?

All accounting on bus1 is done on a per-UID basis. This is the initial
model that tries to match POSIX semantics. More advanced accounting is
left as a future extension (like cgroup-based, etc.). So whenever we
talk about "user accounting", we talk about the user abstraction in
bus1 that right now is based on UIDs, but could be extended to other
schemes.

All bus1 resources are owned by a peer, and each peer has a user
assigned (which right now corresponds to file->f_cred->uid). Whenever a
peer allocates resources, they are accounted on its user. There are
global limits per user which cannot be exceeded. Additionally, each
peer can set its own per-peer limits, to further partition the per-user
limits. Of course, per-user limits override per-peer limits, if
necessary.

Now this is all trivial and obvious. It works like any resource
accounting in the kernel. It becomes tricky when we try to transfer
resources. Before SEND, a resource is always accounted on the sender.
After RECV, a resource is accounted on the receiver. That is, resource
ownership is transferred. In most cases this is obvious: memory is
copied from one address-space into another, or file-descriptors are
added into the file-table of another process, etc.

Lastly, for queued resources we decided to go with receiver-accounting.
This means resource ownership is transferred at the time of SEND
(unlike sender-accounting, which would transfer it at the time of
RECV). The reasons are manifold, but mainly we want RECV to not fail
due to accounting, resource exhaustion, etc. We wanted SEND to do the
heavy-lifting, and RECV to just dequeue. By avoiding sender-based
accounting, we avoid attacks where a receiver does not dequeue messages
and thus exhausts the sender's limits.

The issue left is senders DoS'ing a target user. To mitigate this, we
implemented a quota system. Whenever a sender wants to transfer
resources to a receiver, it only gets access to a subset of the
receiver's resource limits. The inflight resources are accounted on a
uid<->uid basis, and the current algorithm allows a sender access to at
most half the limit of the destination not currently used by anyone
else.

Example:
Imagine a receiver with a limit of 1024 handles. A sender transmits a
message to that receiver. It gets access to half the limit not used by
anyone else, hence 512 handles. It does not matter how many senders
there are, nor how many messages are sent, it will reach its quota at
512. As long as they all belong to the same user, they will share the
quota and can queue at most 512 handles. If a second sending user comes
into play, it gets half the remaining not used by anyone else, which
ends up being 256. And so on... If the peer dequeues messages in
between, the numbers get higher again. But if you do the math, the most
you can get is 50% of the target's resources, if you're the only
sender. In all other cases you get less (like intertwined transfers,
etc).

We did look into sender-based inflight accounting, but the same set of
issues arises. Sure, a Request+Reply model would make this easier to
handle, but we want to explicitly support a Subscribe+Event{n} model.
In this case there is more than one Reply to a message.

Long story short: We have uid<->uid quotas so far, which prevent DoS
attacks, unless you get access to a ridiculous amount of local UIDs.
Details on which resources are accounted can be found in the wiki [1].
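A condensed sketch of that quota rule, with made-up names (this is not
the actual bus1 code; the real accounting covers several resource types
and has to charge them atomically):

    #include <linux/errno.h>

    struct example_sender_quota {
            unsigned int in_use;            /* handles this user has in flight */
    };

    static int example_charge(struct example_sender_quota *q,
                              unsigned int limit,           /* receiver's limit */
                              unsigned int total_in_use,    /* in flight, all users */
                              unsigned int n)               /* handles to queue */
    {
            /* half of the share not used by anyone else, as described above */
            unsigned int others = total_in_use - q->in_use;
            unsigned int allowed = (limit - others) / 2;

            if (q->in_use + n > allowed)
                    return -EDQUOT;

            q->in_use += n;
            return 0;
    }

With one sending user this caps out at limit/2 (512 in the example); a
second user then sees at most (1024 - 512) / 2 = 256, and so on.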
Re: [RFC v1 00/14] Bus1 Kernel Message Bus
So the thing that tends to worry me about these is resource management.

If I understood the documentation correctly, this has per-user resource
management, which guarantees that at least the system won't run out of
memory. Good. The act of sending a message transfers the resource to
the receiver end. Fine.

However, the usual problem ends up being that a bad user can basically
DoS a system agent, especially since for obvious performance reasons
the send/receive has to be asynchronous.

So the usual DoS model is that some user just sends a lot of messages
to a system agent, filling up the system agent resource quota, and
basically killing the system. No, it didn't run out of memory, but the
system agent may not be able to do anything more, since it is now out
of resources.

Keeping the resource management with the sender doesn't solve the
problem, it just reverses it: now the attack will be to send a lot of
queries to the system agent, but then just refuse to listen to the
replies - again causing the system agent to run out of resources.

Usually the way this is resolved is by forcing a "request-and-reply"
resource management model, where the person who sends out a request is
not only the one who is accounted for the request, but also accounted
for the reply buffer. That way the system agent never runs out of
resources, because it's always the requesting party that has its
resources accounted, never the system agent.

You may well have solved this, but can you describe what the solution
is without forcing people to read the code and try to analyze it?

              Linus
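For illustration only, a minimal sketch of the request-and-reply
accounting model described above, using hypothetical names (this is not
how bus1 works, as the reply later in the thread explains): the
requester is charged up front for both the request and the worst-case
reply buffer, so the system agent never pays for either.

    #include <linux/atomic.h>
    #include <linux/errno.h>

    struct example_account {
            atomic_t bytes;         /* bytes charged to this (requesting) user */
    };

    static int example_send_request(struct example_account *requester,
                                    int req_size,
                                    int max_reply_size,
                                    int limit)
    {
            int charge = req_size + max_reply_size;

            if (atomic_add_return(charge, &requester->bytes) > limit) {
                    atomic_sub(charge, &requester->bytes);
                    return -EDQUOT;
            }
            /* the charge is released when the reply is consumed or discarded */
            return 0;
    }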
[RFC v1 00/14] Bus1 Kernel Message Bus
Hi

This proposal introduces bus1.ko, a kernel messaging bus. This is not a
request for inclusion, yet. It is rather an initial draft and a Request
For Comments.

While bus1 emerged out of the kdbus project, bus1 was started from
scratch and the concepts have little in common. In a nutshell, bus1
provides a capability-based IPC system, similar in nature to Android
Binder, Cap'n Proto, and seL4. The module is completely generic and
neither requires nor mandates a user-space counterpart.

o Description

  Bus1 is a local IPC system, which provides a decentralized
  infrastructure to share objects between local peers. The main building
  blocks are nodes and handles. Nodes represent objects of a local peer,
  while handles represent descriptors that point to a node. Nodes can be
  created and destroyed by any peer, and they will always remain owned
  by their respective creator. Handles, on the other hand, are used to
  refer to nodes and can be passed around with messages as auxiliary
  data. Whenever a handle is transferred, the receiver will get its own
  handle allocated, pointing to the same node as the original handle.

  Any peer can send messages directed at one of their handles. This will
  transfer the message to the owner of the node the handle points to. If
  a peer does not possess a handle to a given node, it will not be able
  to send a message to that node. That is, handles provide exclusive
  access management. Anyone that somehow acquired a handle to a node is
  privileged to further send this handle to other peers. As such, access
  management is transitive. Once a peer acquired a handle, it cannot be
  revoked again. However, a node owner can, at any time, destroy a node.
  This will effectively unbind all existing handles to that node on any
  peer, notifying each one of the destruction.

  Unlike nodes and handles, peers cannot be addressed directly. In fact,
  peers are completely disconnected entities. A peer is merely an anchor
  of a set of nodes and handles, including an incoming message queue for
  any of those. Whether multiple nodes are all part of the same peer, or
  part of different peers, does not affect the remote view of those.
  Peers solely exist as a management entity and command dispatcher for
  local processes.

  The set of actors on a system is completely decentralized. There is no
  global component involved that provides a central registry or
  discovery mechanism. Furthermore, communication between peers only
  involves those peers, and does not affect any other peer in any way.
  No global communication lock is taken. However, any communication is
  still globally ordered, including unicasts, multicasts, and
  notifications.

o Prior Art

  The concepts behind bus1 are almost identical to capability systems
  like Android Binder, Google Mojo, Cap'n Proto, seL4, and more. Bus1
  differs from them by supporting global ordering, multicasts, and
  resource accounting, while requiring no global locking and no global
  context.

  While the bus1 UAPI does not expose all features (like soft-references
  as supported by Binder), the in-kernel code includes support for them.
  Multiple UAPIs can be supported on top of the in-kernel bus1 code,
  including support for the Binder UAPI. Efforts on this are still
  on-going.

o Documentation

  The first patch in this series provides the bus1(7) man-page. It
  explains all concepts in bus1 in more detail. Furthermore, it
  describes the API that is available on bus1 file descriptors. The
  pre-compiled man-page is available at:

      http://www.bus1.org/bus1.html

  There is also a great deal of in-source documentation available. All
  cross-source-file APIs have KernelDoc annotations. Furthermore, we
  have an introduction for each subsystem, to be found in the header
  files. The total number of lines of code for bus1 is roughly 4.5k.
  The remaining ~5k LOC are comments and documentation.

o Upstream

  The upstream development repository is available on github:

      http://github.com/bus1/bus1

  It is an out-of-tree repository that allows easy and fast development
  of new bus1 features. The in-tree integration repository is available
  at:

      http://github.com/bus1/linux

o Conferences

  Tom and I will be attending Linux Plumbers Conf next week. Please do
  not hesitate to contact us there in person. There will also be a
  presentation [1] of bus1 on the last day of the conference.

Thanks
Tom & David

[1] https://www.linuxplumbersconf.org/2016/ocw/proposals/3819

Tom Gundersen (14):
  bus1: add bus1(7) man-page
  bus1: provide stub cdev /dev/bus1
  bus1: util - active reference utility library
  bus1: util - fixed list utility library
  bus1: util - pool utility library
  bus1: util - queue utility library
  bus1: tracking user contexts
  bus1: implement peer management context
  bus1: provide transaction context for multicasts
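As a purely conceptual sketch of the object model in the Description
section above (invented names, not the bus1 UAPI or in-kernel layout):
a peer anchors nodes and handles, a node stays owned by its creating
peer, and a handle is a peer-local reference to some node; sending to a
handle queues the message on the node owner's peer.

    struct example_peer;

    struct example_node {
            struct example_peer *owner;     /* fixed at creation time */
    };

    struct example_handle {
            struct example_peer *holder;    /* peer that holds this handle */
            struct example_node *node;      /* node the handle refers to */
    };

    /* a message sent to @h is queued on the peer owning the underlying node */
    static struct example_peer *example_destination(struct example_handle *h)
    {
            return h->node->owner;
    }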