Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2017-01-30 Thread Pavel Machek
Hi!

I'm a bit late to the party...

> Example:
> Imagine a receiver with a limit of 1024 handles. A sender transmits a
> message to that receiver. It gets access to half the limit not used by
> anyone else, hence 512 handles. It does not matter how many senders
> there are, nor how many messages are sent, it will reach its quota at
> 512. As long as they all belong to the same user, they will share the
> quota and can queue at most 512 handles. If a second sending user
> comes into play, it gets half the remaining not used by anyone else,
> which ends up being 256. And so on... If the peer dequeues messages in
> between, the numbers get higher again. But if you do the math, the
> most you can get is 50% of the target's resources, if you're the only
> sender. In all other cases you get less (like intertwined transfers,
> etc).
> 
> We did look into sender-based inflight accounting, but the same set of
> issues arises. Sure, a Request+Reply model would make this easier to
> handle, but we want to explicitly support a Subscribe+Event{n} model.
> In this case there is more than one Reply to a message.
> 
> Long story short: We have uid<->uid quotas so far, which prevent DoS
> attacks, unless you get access to a ridiculous amount of local UIDs.
> Details on which resources are accounted can be found in the wiki
> [1].

So if there's a limit of 1024 handles, all I need is 10 UIDs, right?

That might be a problem on a multiuser Unix machine, but on Android
phones, each application gets its own UID. So all you need is 10
applications to bring the system down...
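Spelled out, and assuming every sending UID fills its share completely: under the halving rule quoted above, the k-th sending UID can queue 1024/2^k handles, so ten UIDs together pin all but one of the receiver's handles.

```latex
\sum_{k=1}^{10} \frac{1024}{2^{k}} = 1024\left(1 - 2^{-10}\right) = 1023
```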
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-11-02 Thread David Herrmann
Hi

On Thu, Oct 27, 2016 at 2:45 AM, Kirill A. Shutemov wrote:
> On Wed, Oct 26, 2016 at 10:34:30PM +0200, David Herrmann wrote:
>> Long story short: We have uid<->uid quotas so far, which prevent DoS
>> attacks, unless you get access to a ridiculous amount of local UIDs.
>> Details on which resources are accounted can be found in the wiki [1].
>
> Do only root user_ns UIDs count as separate, or per-ns UIDs too?
>
> In the first case we will have virtually unbounded access to UIDs.
>
> The second case can cap the number of user namespaces a user can create
> while using bus1 inside.
>
> Or am I missing something?

We use the exact same mechanism as "struct user_struct" (as defined in
linux/sched.h). One instance corresponds to each kuid_t currently in
use. This is analogous to task, epoll, inotify, fanotify, mqueue,
pipes, keys, ... resource accounting.

Could you elaborate on what problem you see?

Thanks
David


Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-29 Thread Josh Triplett
On Thu, Oct 27, 2016 at 03:45:24AM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 26, 2016 at 10:34:30PM +0200, David Herrmann wrote:
> > Long story short: We have uid<->uid quotas so far, which prevent DoS
> > attacks, unless you get access to a ridiculous amount of local UIDs.
> > Details on which resources are accounted can be found in the wiki [1].
> 
> Do only root user_ns UIDs count as separate, or per-ns UIDs too?
> 
> In the first case we will have virtually unbounded access to UIDs.
> 
> The second case can cap the number of user namespaces a user can create
> while using bus1 inside.

That seems easy enough to solve.  Make the uid<->uid quota use uids in
the namespace of the side whose resources the operation uses.  That way,
if both sender and recipient live in a user namespace then you get quota
per user in the namespace, but you can't use a user namespace to cheat
and manufacture more users to get more quota when talking to something
*outside* that namespace.

- Josh Triplett


Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-29 Thread Kirill A. Shutemov
On Wed, Oct 26, 2016 at 10:34:30PM +0200, David Herrmann wrote:
> Long story short: We have uid<->uid quotas so far, which prevent DoS
> attacks, unless you get access to a ridiculous amount of local UIDs.
> Details on which resources are accounted can be found in the wiki [1].

Do only root user_ns UIDs count as separate, or per-ns UIDs too?

In the first case we will have virtually unbounded access to UIDs.

The second case can cap the number of user namespaces a user can create
while using bus1 inside.

Or am I missing something?

-- 
 Kirill A. Shutemov


Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-28 Thread Tom Gundersen
On Fri, Oct 28, 2016 at 3:11 PM, Richard Weinberger wrote:
> On Wed, Oct 26, 2016 at 9:17 PM, David Herrmann wrote:
>> Hi
>>
>> This proposal introduces bus1.ko, a kernel messaging bus. This is not a
>> request for inclusion, yet. It is rather an initial draft and a Request
>> For Comments.
>>
>> While bus1 emerged out of the kdbus project, bus1 was started from scratch
>> and the concepts have little in common. In a nutshell, bus1 provides a
>> capability-based IPC system, similar in nature to Android Binder, Cap'n
>> Proto, and seL4. The module is completely generic and neither requires nor
>> mandates a user-space counterpart.
>
> One thing which is not so clear to me is the role of bus1 wrt. containers.
> Can a container A exchange messages with a container B?
> If not, where is the boundary? I guess it is the pid namespace.

There is no restriction with respect to containers. The metadata is
translated between namespaces, obviously, but you can send messages to
anyone you have a handle to.

Cheers,

Tom


Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-28 Thread Richard Weinberger
On Wed, Oct 26, 2016 at 9:17 PM, David Herrmann wrote:
> Hi
>
> This proposal introduces bus1.ko, a kernel messaging bus. This is not a
> request for inclusion, yet. It is rather an initial draft and a Request For
> Comments.
>
> While bus1 emerged out of the kdbus project, bus1 was started from scratch and
> the concepts have little in common. In a nutshell, bus1 provides a
> capability-based IPC system, similar in nature to Android Binder, Cap'n Proto,
> and seL4. The module is completely generic and neither requires nor mandates
> a user-space counterpart.

One thing which is not so clear to me is the role of bus1 wrt. containers.
Can a container A exchange messages with a container B?
If not, where is the boundary? I guess it is the pid namespace.

-- 
Thanks,
//richard


Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-27 Thread Michael Kerrisk
[CC += linux-api@vger.kernel.org]

Hi David,

Could you please CC linux-api@ on all future iterations of this patch!

Cheers,

Michael



On Wed, Oct 26, 2016 at 9:17 PM, David Herrmann wrote:
> Hi
>
> This proposal introduces bus1.ko, a kernel messaging bus. This is not a
> request for inclusion, yet. It is rather an initial draft and a Request For
> Comments.
>
> While bus1 emerged out of the kdbus project, bus1 was started from scratch and
> the concepts have little in common. In a nutshell, bus1 provides a
> capability-based IPC system, similar in nature to Android Binder, Cap'n Proto,
> and seL4. The module is completely generic and neither requires nor mandates
> a user-space counterpart.
>
>  o Description
>
> Bus1 is a local IPC system, which provides a decentralized infrastructure to
> share objects between local peers. The main building blocks are nodes and
> handles. Nodes represent objects of a local peer, while handles represent
> descriptors that point to a node. Nodes can be created and destroyed by any
> peer, and they will always remain owned by their respective creator. Handles,
> on the other hand, are used to refer to nodes and can be passed around with
> messages as auxiliary data. Whenever a handle is transferred, the receiver
> will get its own handle allocated, pointing to the same node as the original
> handle.
>
> Any peer can send messages directed at one of their handles. This will
> transfer the message to the owner of the node the handle points to. If a
> peer does not possess a handle to a given node, it will not be able to send a
> message to that node. That is, handles provide exclusive access management.
> Anyone who has somehow acquired a handle to a node is privileged to further
> send this handle to other peers. As such, access management is transitive.
> Once a peer has acquired a handle, it cannot be revoked again. However, a
> node owner can, at any time, destroy a node. This will effectively unbind all
> existing handles to that node on any peer, notifying each one of the
> destruction.
>
> Unlike nodes and handles, peers cannot be addressed directly. In fact, peers
> are completely disconnected entities. A peer is merely an anchor of a set of
> nodes and handles, including an incoming message queue for any of those.
> Whether multiple nodes are all part of the same peer or part of different
> peers does not affect the remote view of those. Peers solely exist as a
> management entity and command dispatcher to local processes.
>
> The set of actors on a system is completely decentralized. There is no
> global component involved that provides a central registry or discovery
> mechanism. Furthermore, communication between peers only involves those
> peers, and does not affect any other peer in any way. No global
> communication lock is taken. However, any communication is still globally
> ordered, including unicasts, multicasts, and notifications.
>
>  o Prior Art
>
> The concepts behind bus1 are almost identical to capability systems like
> Android Binder, Google Mojo, Cap'n Proto, seL4, and more. Bus1 differs from
> them by supporting Global Ordering, Multicasts, Resource Accounting, No
> Global Locking, and No Global Context.
>
> While the bus1 UAPI does not expose all features (like soft-references as
> supported by Binder), the in-kernel code includes support for them. Multiple
> UAPIs can be supported on top of the in-kernel bus1 code, including support
> for the Binder UAPI. Efforts on this are still ongoing.
>
>  o Documentation
>
> The first patch in this series provides the bus1(7) man-page. It explains
> all concepts in bus1 in more detail. Furthermore, it describes the API that
> is available on bus1 file descriptors. The pre-compiled man-page is
> available at:
>
> http://www.bus1.org/bus1.html
>
> There is also a great deal of in-source documentation available. All
> cross-source-file APIs have KernelDoc annotations. Furthermore, we have an
> introduction for each subsystem, to be found in the header files. The total
> number of lines of code for bus1 is roughly 4.5k. The remaining ~5k LOC
> are comments and documentation.
>
>  o Upstream
>
> The upstream development repository is available on GitHub:
>
> http://github.com/bus1/bus1
>
> It is an out-of-tree repository that allows easy and fast development of
> new bus1 features. The in-tree integration repository is available at:
>
> http://github.com/bus1/linux
>
>  o Conferences
>
> Tom and I will be attending Linux Plumbers Conf next week. Please do not
> hesitate to contact us there in person. There will also be a presentation
> [1] of bus1 on the last day of the conference.
>
> Thanks
> Tom & David
>
> [1] 
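The node/handle rules in the quoted Description section (a transferred handle becomes a fresh handle to the same node, messages go to the node's owner, only the owner can destroy a node, and destruction unbinds every handle) can be illustrated with a small userspace model. This is a hypothetical sketch, not bus1 code or its UAPI: all names are invented for illustration, storage is fixed-size, and message queues, locking, and destruction notifications are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLES 16

struct node;

/* A handle is a per-peer reference to a node (NULL once unbound). */
struct handle {
	struct node *node;
};

/* A peer anchors a set of handles (and, in bus1, a message queue). */
struct peer {
	struct handle handles[MAX_HANDLES];
	size_t nr_handles;
};

/* A node is owned by its creator and tracks every handle pointing to it. */
struct node {
	struct peer *owner;
	struct handle *refs[MAX_HANDLES];
	size_t nr_refs;
};

/* Allocate a new handle in `holder` that points to `node`. */
static struct handle *handle_new(struct peer *holder, struct node *node)
{
	assert(holder->nr_handles < MAX_HANDLES && node->nr_refs < MAX_HANDLES);
	struct handle *h = &holder->handles[holder->nr_handles++];

	h->node = node;
	node->refs[node->nr_refs++] = h;
	return h;
}

/* In this model, creating a node also gives its owner a handle to it. */
static struct handle *node_create(struct peer *owner, struct node *node)
{
	node->owner = owner;
	node->nr_refs = 0;
	return handle_new(owner, node);
}

/* Passing a handle along: the receiver gets its own, newly allocated
 * handle to the same node. Only holders of a live handle can do this. */
static struct handle *handle_transfer(const struct handle *h, struct peer *to)
{
	if (!h->node)
		return NULL;
	return handle_new(to, h->node);
}

/* A message sent via a handle is delivered to the owner of its node. */
static struct peer *message_destination(const struct handle *h)
{
	return h->node ? h->node->owner : NULL;
}

/* Only the owner may destroy a node; doing so unbinds every handle. */
static bool node_destroy(struct peer *caller, struct node *node)
{
	if (caller != node->owner)
		return false;
	for (size_t i = 0; i < node->nr_refs; i++)
		node->refs[i]->node = NULL;
	node->nr_refs = 0;
	return true;
}
```

Passing a handle from peer A to B and on to C gives each peer its own handle, all of which route messages to A; once A destroys the node, every one of those handles stops working.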

Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-26 Thread David Herrmann
Hi

On Wed, Oct 26, 2016 at 9:39 PM, Linus Torvalds wrote:
> So the thing that tends to worry me about these is resource management.
>
> If I understood the documentation correctly, this has per-user
> resource management, which guarantees that at least the system won't
> run out of memory. Good. The act of sending a message transfers the
> resource to the receiver end. Fine.
>
> However, the usual problem ends up being that a bad user can basically
> DoS a system agent, especially since for obvious performance reasons
> the send/receive has to be asynchronous.
>
> So the usual DoS model is that some user just sends a lot of messages
> to a system agent, filling up the system agent resource quota, and
> basically killing the system. No, it didn't run out of memory, but the
> system agent may not be able to do anything more, since it is now out
> of resources.
>
> Keeping the resource management with the sender doesn't solve the
> problem, it just reverses it: now the attack will be to send a lot of
> queries to the system agent, but then just refuse to listen to the
> replies - again causing the system agent to run out of resources.
>
> Usually the way this is resolved this is by forcing a
> "request-and-reply" resource management model, where the person who
> sends out a request is not only the one who is accounted for the
> request, but also accounted for the reply buffer. That way the system
> agent never runs out of resources, because it's always the requesting
> party that has its resources accounted, never the system agent.
>
> You may well have solved this, but can you describe what the solution
> is without forcing people to read the code and try to analyze it?

All accounting in bus1 is done on a per-UID basis. This is the initial
model that tries to match POSIX semantics. More advanced accounting is
left as a future extension (like cgroup-based, etc.). So whenever we
talk about "user accounting", we talk about the user-abstraction in
bus1 that right now is based on UIDs, but could be extended for other
schemes.

All bus1 resources are owned by a peer, and each peer has a user
assigned (which right now corresponds to file->f_cred->uid). Whenever
a peer allocates resources, it is accounted on its user. There are
global limits per user which cannot be exceeded. Additionally, each
peer can set its own per-peer limits, to further partition the
per-user limits. Of course, per-user limits override per-peer limits,
if necessary.
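The split between per-user and per-peer limits described above can be sketched as a two-level charge. This is a hypothetical model, not bus1 code: `struct user_acct`, `struct peer_acct`, `peer_charge()` and `peer_uncharge()` are illustrative names, and all resource types are reduced to a single counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One accounting object per UID, shared by all peers of that user
 * (in the spirit of struct user_struct). */
struct user_acct {
	uint32_t uid;
	unsigned int used;	/* units currently charged to this user */
	unsigned int limit;	/* global per-user limit */
};

/* Per-peer view: a peer may further partition its user's budget. */
struct peer_acct {
	struct user_acct *user;
	unsigned int used;	/* units charged through this peer */
	unsigned int limit;	/* per-peer limit chosen by the peer */
};

/* Charge n units to a peer. Both limits must hold, so the stricter one
 * wins: a peer can never use more than its user has left. */
static bool peer_charge(struct peer_acct *p, unsigned int n)
{
	if (n > p->limit - p->used || n > p->user->limit - p->user->used)
		return false;
	p->used += n;
	p->user->used += n;
	return true;
}

/* Release n units previously charged through this peer. */
static void peer_uncharge(struct peer_acct *p, unsigned int n)
{
	assert(n <= p->used);
	p->used -= n;
	p->user->used -= n;
}
```

With a per-user limit of 100 and two peers of the same user each limited to 80, the first peer can take 60 units, but the second is then capped at 40 by the shared per-user budget even though its own limit is 80.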

Now this is all trivial and obvious. It works like any resource
accounting in the kernel. It becomes tricky when we try to transfer
resources. Before SEND, a resource is always accounted on the sender.
After RECV, a resource is accounted on the receiver. That is, resource
ownership is transferred. In most cases this is obvious: memory is
copied from one address space into another, or file descriptors are
added to the file table of another process, etc.

Lastly, when a resource is queued, we decided to go with
receiver-accounting. This means that resource ownership is transferred
at the time of SEND (unlike sender-accounting, which would transfer it
at the time of RECV). The reasons are manifold, but mainly we want RECV
not to fail due to accounting, resource exhaustion, etc. We wanted SEND
to do the heavy lifting, and RECV to just dequeue. By avoiding
sender-based accounting, we avoid attacks where a receiver does not
dequeue messages and thus exhausts the sender's limits. The remaining
issue is senders DoS'ing a target user. To mitigate this, we
implemented a quota system. Whenever a sender wants to transfer
resources to a receiver, it only gets access to a subset of the
receiver's resource limits. The inflight resources are accounted on a
uid<->uid basis, and the current algorithm allows a sender access to
at most half the limit of the destination not currently used by anyone
else.

Example:
Imagine a receiver with a limit of 1024 handles. A sender transmits a
message to that receiver. It gets access to half the limit not used by
anyone else, hence 512 handles. It does not matter how many senders
there are, nor how many messages are sent, it will reach its quota at
512. As long as they all belong to the same user, they will share the
quota and can queue at most 512 handles. If a second sending user
comes into play, it gets half the remaining not used by anyone else,
which ends up being 256. And so on... If the peer dequeues messages in
between, the numbers get higher again. But if you do the math, the
most you can get is 50% of the target's resources, if you're the only
sender. In all other cases you get less (like intertwined transfers,
etc).
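The halving rule in the example can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the bus1 implementation: `struct receiver`, `quota()`, `send_handles()` and `fill_greedily()` are hypothetical names, senders are identified by a small slot index instead of a real UID, only handle counts are tracked, and dequeuing is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SENDERS 16

/* One receiver's handle limit, with a separate inflight counter per
 * sending user (the uid<->uid accounting). */
struct receiver {
	unsigned int limit;
	unsigned int inflight[MAX_SENDERS];
};

static unsigned int used_total(const struct receiver *r)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < MAX_SENDERS; i++)
		sum += r->inflight[i];
	return sum;
}

/* A sending user may hold at most half of what nobody else is using. */
static unsigned int quota(const struct receiver *r, size_t sender)
{
	unsigned int others = used_total(r) - r->inflight[sender];

	return (r->limit - others) / 2;
}

/* SEND: queue n handles on behalf of `sender`, charged against the
 * receiver at send time. Fails once the sender's quota is reached. */
static bool send_handles(struct receiver *r, size_t sender, unsigned int n)
{
	assert(sender < MAX_SENDERS);
	if (r->inflight[sender] + n > quota(r, sender))
		return false;
	r->inflight[sender] += n;
	return true;
}

/* Let the first nr_senders users queue as much as they can, one after
 * another, recording how many handles each one ended up with. */
static void fill_greedily(struct receiver *r, size_t nr_senders,
			  unsigned int *got)
{
	for (size_t s = 0; s < nr_senders; s++) {
		while (send_handles(r, s, 1))
			;
		got[s] = r->inflight[s];
	}
}
```

Filling a 1024-handle receiver this way with 11 sending users yields 512, 256, 128, and so on down to 1 handle for the first ten, and 0 for the eleventh: 1023 handles in total. A single sender never gets more than 50%, but a handful of distinct users together can still pin almost the entire limit.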

We did look into sender-based inflight accounting, but the same set of
issues arises. Sure, a Request+Reply model would make this easier to
handle, but we want to explicitly support a Subscribe+Event{n} model.
In this case there is more than one Reply to a message.

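Long story short: We have uid<->uid quotas so far, which prevent DoS
attacks, unless you get access to a ridiculous amount of local UIDs.
Details on which resources are accounted can be found in the wiki [1].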

Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-26 Thread David Herrmann
Hi

On Wed, Oct 26, 2016 at 9:39 PM, Linus Torvalds
 wrote:
> So the thing that tends to worry me about these is resource management.
>
> If I understood the documentation correctly, this has per-user
> resource management, which guarantees that at least the system won't
> run out of memory. Good. The act of sending a message transfers the
> resource to the receiver end. Fine.
>
> However, the usual problem ends up being that a bad user can basically
> DoS a system agent, especially since for obvious performance reasons
> the send/receive has to be asynchronous.
>
> So the usual DoS model is that some user just sends a lot of messages
> to a system agent, filling up the system agent resource quota, and
> basically killing the system. No, it didn't run out of memory, but the
> system agent may not be able to do anything more, since it is now out
> of resources.
>
> Keeping the resource management with the sender doesn't solve the
> problem, it just reverses it: now the attack will be to send a lot of
> queries to the system agent, but then just refuse to listen to the
> replies - again causing the system agent to run out of resources.
>
> Usually the way this is resolved this is by forcing a
> "request-and-reply" resource management model, where the person who
> sends out a request is not only the one who is accounted for the
> request, but also accounted for the reply buffer. That way the system
> agent never runs out of resources, because it's always the requesting
> party that has its resources accounted, never the system agent.
>
> You may well have solved this, but can you describe what the solution
> is without forcing people to read the code and try to analyze it?

All accounting on bus1 is done on a UID-basis. This is the initial
model that tries to match POSIX semantics. More advanced accounting is
left as a future extension (like cgroup-based, etc.). So whenever we
talk about "user accounting", we talk about the user-abstraction in
bus1 that right now is based on UIDs, but could be extended for other
schemes.

All bus1 resources are owned by a peer, and each peer has a user
assigned (which right now corresponds to file->f_cred->uid). Whenever
a peer allocates resources, it is accounted on its user. There are
global limits per user which cannot be exceeded. Additionally, each
peer can set its own per-peer limits, to further partition the
per-user limits. Of course, per-user limits override per-peer limits,
if necessary.

Now this is all trivial and obvious. It works like any resource
accounting in the kernel. It becomes tricky when we try to transfer
resources. Before SEND, a resource is always accounted on the sender.
After RECV, a resource is accounted on the receiver. That is, resource
ownership is transferred. In most cases this is obvious: memory is
copied from one address-space into another, or file-descriptors are
added into the file-table of another process, etc.

Lastly, when a resource is queued, we decided to go with
receiver-accounting. This means, at the time of SEND resource
ownership is transferred (unlike sender-accounting, which would
transfer it at time of RECV). The reasons are manifold, but mainly we
want RECV to not fail due to accounting, resource exhaustion, etc. We
wanted SEND to do the heavy-lifting, and RECV to just dequeue. By
avoiding sender-based accounting, we avoid attacks where a receiver
does not dequeue messages and thus exhausts the sender's limits. The
issue left is senders DoS'ing a target user. To mitigate this, we
implemented a quota system. Whenever a sender wants to transfer
resources to a receiver, it only gets access to a subset of the
receivers resource limits. The inflight resources are accounted on a
uid<->uid basis, and the current algorithm allows a receiver access to
at most half the limit of the destination not currently used by anyone
else.

Example:
Imagine a receiver with a limit of 1024 handles. A sender transmits a
message to that receiver. It gets access to half the limit not used by
anyone else, hence 512 handles. It does not matter how many senders
there are, nor how many messages are sent, it will reach its quota at
512. As long as they all belong to the same user, they will share the
quota and can queue at most 512 handles. If a second sending user
comes into play, it gets half the remaining not used by anyone else,
which ends up being 256. And so on... If the peer dequeues messages in
between, the numbers get higher again. But if you do the math, the
most you can get is 50% of the target's resources, if you're the only
sender. In all other cases you get less (like intertwined transfers,
etc).
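The quota rule in the example can be sketched as follows. This follows the rule as stated in the text and is not bus1's actual implementation. It assumes the receiver tracks its total in-flight count, and that each sending user has its own in-flight counter. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-flight state of one receiver, e.g. its handle limit. */
struct quota {
	unsigned int limit;	/* receiver's total limit */
	unsigned int used;	/* in-flight units charged by all sending users */
};

/*
 * Try to charge @n units for one sending user whose in-flight count is
 * @mine. The user may hold at most half of what is not used by anyone else.
 */
static bool quota_charge(struct quota *q, unsigned int *mine, unsigned int n)
{
	unsigned int others = q->used - *mine;
	unsigned int share = (q->limit - others) / 2;

	/* Overflow-safe form of: *mine + n > share */
	if (n > share || *mine > share - n)
		return false;
	*mine += n;
	q->used += n;
	return true;
}

/* Dequeuing on the receiver side releases the charge again. */
static void quota_discharge(struct quota *q, unsigned int *mine, unsigned int n)
{
	assert(*mine >= n && q->used >= n);
	*mine -= n;
	q->used -= n;
}
```

Suppose handles are charged one at a time against a limit of 1024. A first sending user stops at 512, a second at 256 and a third at 128, as in the example. Discharging the first user's 512 handles raises the remaining users' shares again.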

We did look into sender-based inflight accounting, but the same set of
issues arises. Sure, a Request+Reply model would make this easier to
handle, but we want to explicitly support a Subscribe+Event{n} model.
In this case there is more than one Reply to a message.

Long story short: We have uid<->uid quotas so far, which prevent DoS
attacks, unless you get access to a ridiculous amount of local UIDs.

Re: [RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-26 Thread Linus Torvalds
So the thing that tends to worry me about these is resource management.

If I understood the documentation correctly, this has per-user
resource management, which guarantees that at least the system won't
run out of memory. Good. The act of sending a message transfers the
resource to the receiver end. Fine.

However, the usual problem ends up being that a bad user can basically
DoS a system agent, especially since for obvious performance reasons
the send/receive has to be asynchronous.

So the usual DoS model is that some user just sends a lot of messages
to a system agent, filling up the system agent resource quota, and
basically killing the system. No, it didn't run out of memory, but the
system agent may not be able to do anything more, since it is now out
of resources.

Keeping the resource management with the sender doesn't solve the
problem, it just reverses it: now the attack will be to send a lot of
queries to the system agent, but then just refuse to listen to the
replies - again causing the system agent to run out of resources.

Usually the way this is resolved is by forcing a
"request-and-reply" resource management model, where the person who
sends out a request is not only the one who is accounted for the
request, but also accounted for the reply buffer. That way the system
agent never runs out of resources, because it's always the requesting
party that has its resources accounted, never the system agent.
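The request-and-reply model described above can be sketched like this. It assumes a simple used/max counter per party. All names are hypothetical, and this is neither bus1's API nor an existing kernel interface. At send time the requester pays for both the request and the reply slot. The agent only fills the pre-paid slot and is never charged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-party accounting; names are illustrative only. */
struct account {
	unsigned int used;
	unsigned int max;
};

/* A request carries a reply slot that the requester has already paid for. */
struct request {
	struct account *payer;
	unsigned int req_size;
	unsigned int reply_size;
	bool replied;
};

/* Charge the requester for both the request and its future reply. */
static bool request_send(struct request *r, struct account *requester,
			 unsigned int req_size, unsigned int reply_size)
{
	unsigned int avail;

	assert(requester->used <= requester->max);
	avail = requester->max - requester->used;
	if (req_size > avail || reply_size > avail - req_size)
		return false;
	requester->used += req_size + reply_size;
	r->payer = requester;
	r->req_size = req_size;
	r->reply_size = reply_size;
	r->replied = false;
	return true;
}

/* The agent writes into the pre-paid slot; no agent account is touched. */
static bool agent_reply(struct request *r, unsigned int len)
{
	if (r->replied || len > r->reply_size)
		return false;
	r->replied = true;
	return true;
}

/* Releasing the request, whether or not the reply was read, refunds the payer. */
static void request_release(struct request *r)
{
	r->payer->used -= r->req_size + r->reply_size;
}
```

A requester that floods the agent and never reads replies therefore exhausts only its own account.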

You may well have solved this, but can you describe what the solution
is without forcing people to read the code and try to analyze it?

   Linus


[RFC v1 00/14] Bus1 Kernel Message Bus

2016-10-26 Thread David Herrmann
Hi

This proposal introduces bus1.ko, a kernel messaging bus. This is not a request
for inclusion, yet. It is rather an initial draft and a Request For Comments.

While bus1 emerged out of the kdbus project, bus1 was started from scratch and
the concepts have little in common. In a nutshell, bus1 provides a
capability-based IPC system, similar in nature to Android Binder, Cap'n Proto,
and seL4. The module is completely generic and neither requires nor mandates
a user-space counterpart.

 o Description

Bus1 is a local IPC system, which provides a decentralized infrastructure to
share objects between local peers. The main building blocks are nodes and
handles. Nodes represent objects of a local peer, while handles represent
descriptors that point to a node. Nodes can be created and destroyed by any
peer, and they will always remain owned by their respective creator. Handles
on the other hand, are used to refer to nodes and can be passed around with
messages as auxiliary data. Whenever a handle is transferred, the receiver
will get its own handle allocated, pointing to the same node as the original
handle.

Any peer can send messages directed at one of their handles. This will
transfer the message to the owner of the node the handle points to. If a
peer does not possess a handle to a given node, it will not be able to send a
message to that node. That is, handles provide exclusive access management.
Anyone that has acquired a handle to a node is privileged to further
send this handle to other peers. As such, access management is transitive.
Once a peer has acquired a handle, it cannot be revoked. However, a node
owner can, at any time, destroy a node. This will effectively unbind all
existing handles to that node on any peer, notifying each one of the
destruction.
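A rough model of the node/handle rules above follows. Every type and function in it is hypothetical and is not the bus1 UAPI. Transferring a handle gives the receiver a new handle to the same node. Sending through a handle targets the node's owner. Only the owner can destroy the node, after which every handle to it is unbound. To keep the sketch short, unbinding is checked lazily at send time rather than by queuing a destruction notification to each holder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct peer {
	int id;
};

/* Hypothetical model: a node stays owned by its creator for its lifetime. */
struct node {
	struct peer *owner;
	bool destroyed;
};

/* A handle is a per-peer reference to a node. */
struct handle {
	struct peer *holder;
	struct node *node;
};

/* Transfer allocates a new handle for the receiver, pointing at the same node. */
static struct handle handle_transfer(const struct handle *h, struct peer *to)
{
	struct handle copy = { .holder = to, .node = h->node };

	assert(h->node != NULL);
	return copy;
}

/* Sending requires a handle; the message is delivered to the node's owner. */
static struct peer *handle_send_target(const struct handle *h)
{
	if (h->node == NULL || h->node->destroyed)
		return NULL;	/* unbound: the owner destroyed the node */
	return h->node->owner;
}

/* Only the owner may destroy a node; all handles to it become unbound. */
static bool node_destroy(struct node *n, const struct peer *caller)
{
	if (n->owner != caller)
		return false;
	n->destroyed = true;
	return true;
}
```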

Unlike nodes and handles, peers cannot be addressed directly. In fact, peers
are completely disconnected entities. A peer is merely an anchor of a set of
nodes and handles, including an incoming message queue for any of those.
Whether multiple nodes are all part of the same peer, or part of different
peers does not affect the remote view of those. Peers exist solely as
management entities and command dispatchers for local processes.

The set of actors on a system is completely decentralized. There is no
global component involved that provides a central registry or discovery
mechanism. Furthermore, communication between peers only involves those
peers, and does not affect any other peer in any way. No global
communication lock is taken. However, any communication is still globally
ordered, including unicasts, multicasts, and notifications.

 o Prior Art

The concepts behind bus1 are almost identical to capability systems like
Android Binder, Google Mojo, Cap'n Proto, seL4, and more. Bus1 differs from
them by supporting Global Ordering, Multicasts, Resource Accounting, No
Global Locking, No Global Context.

While the bus1 UAPI does not expose all features (like soft-references as
supported by Binder), the in-kernel code includes support for them. Multiple
UAPIs can be supported on top of the in-kernel bus1 code, including support
for the Binder UAPI. Work on this is still ongoing.

 o Documentation

The first patch in this series provides the bus1(7) man-page. It explains
all concepts in bus1 in more detail. Furthermore, it describes the API that
is available on bus1 file descriptors. The pre-compiled man-page is
available at:

http://www.bus1.org/bus1.html

There is also a great deal of in-source documentation available. All
cross-source-file APIs have KernelDoc annotations. Furthermore, each
subsystem has an introduction in its header file. The total number of
lines of code for bus1 is roughly 4.5k. The remaining ~5k lines are
comments and documentation.

 o Upstream

The upstream development repository is available on GitHub:

http://github.com/bus1/bus1

It is an out-of-tree repository that allows easy and fast development of
new bus1 features. The in-tree integration repository is available at:

http://github.com/bus1/linux

 o Conferences

Tom and I will be attending Linux Plumbers Conf next week. Please do not
hesitate to contact us there in person. There will also be a presentation
[1] of bus1 on the last day of the conference.

Thanks
Tom & David

[1] https://www.linuxplumbersconf.org/2016/ocw/proposals/3819

Tom Gundersen (14):
  bus1: add bus1(7) man-page
  bus1: provide stub cdev /dev/bus1
  bus1: util - active reference utility library
  bus1: util - fixed list utility library
  bus1: util - pool utility library
  bus1: util - queue utility library
  bus1: tracking user contexts
  bus1: implement peer management context
  bus1: provide transaction context for multicasts
  
