Re: kdbus: to merge or not to merge?
On Sun, Aug 9, 2015 at 3:11 PM, Daniel Mack wrote:
>
> The kdbus implementation is actually comparable to two tasks X and Y
> which both have their own buffer file open and mmap()ed, and they both
> pass their FD to the other side. If X now writes to Y's file, and that
> is causing a page fault, X is accounted for it, correct?

No.

With shared memory, there are no particularly obvious accounting rules. In particular, when somebody maps an already allocated page, it's basically a no-op from a memory allocation standpoint.

The whole "this is equivalent to the user space daemon" argument is bogus. Shared memory is very very different from just sending messages (copying the buffers) and is generally much harder to get a handle on. And that's what you should be comparing to. The old "communicate over a unix domain socket" had pretty clear accounting rules, and while unix domain sockets have some horribly nasty issues (most are about passing fd's around) that isn't one of them.

Anyway, the real issue for me here is that Andy is reporting all these actual real problems that happen in practice, and the developer replies are dismissing them on totally irrelevant grounds ("this should be equivalent to something entirely different that nobody ever does" or "well, people could opt out, even if they didn't" yadda yadda yadda).

For example, the whole "tasks X and Y communicate over shmem" is irrelevant. Normally, when people write those kinds of applications, they are just regular applications. If they have issues, nobody else cares. Andy's concern is about one of X/Y being a system daemon and tricking it into doing bad things ends up effectively killing the system - whether the *kernel* is alive or not and did the right thing is almost entirely immaterial.

So please. When Andy sends a bug report with an exploit that kills his system, just stop responding with irrelevant theoretical arguments. It is not appropriate. Instead, acknowledge the problem and work on fixing it, none of this "but but but it's all the same" crap.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kdbus: to merge or not to merge?
On Sun, 9 Aug 2015, Greg Kroah-Hartman wrote:
> The issue is with userspace clients opting in to receive all
> NameOwnerChanged messages on the bus, which is not a good idea as they
> constantly get woken up and process them, which is why the CPU was
> pegged. This issue should now be fixed in Rawhide for some of the
> packages we found that were doing this. Maintainers of other packages
> have been informed.
>
> End result, no one has ever really tested sending "bad" messages to the
> current system as all existing dbus users try to be "good actors",
> thanks to Andy's testing, these apps should all now become much more
> robust.

Does it require elevated privileges to opt to receive all NameOwnerChanged messages on the bus? Is it the default unless the apps opt for something more restrictive? Or is it somewhere in between?

I was under the impression that the days of writing system-level stuff that assumes that all userspace apps are going to 'play nice' went out a decade or more ago. It's fine if the userspace app can kill itself, or possibly even the user it's running as, but being able to kill apps running as other users, let alone the whole system, is a problem nowadays.

It may be able to happen in a default system, but this is why cgroups and namespaces have been created, to give the system admin the ability to limit the resources that any one app can consume.

Introducing a new mechanism that allows one user to consume resources allocated to another and kill the system without providing a kernel level mechanism to limit the damage (as opposed to fixing individual apps) seems rather short-sighted at best.

David Lang
Re: kdbus: to merge or not to merge?
On Sun, Aug 9, 2015 at 3:11 PM, Daniel Mack wrote:
>
> Internally, the connection pool is simply a shmem backed file. From the
> context of the HELLO ioctl, we are calling into shmem_file_setup(), so
> the file is eventually owned by the task which created the bus task
> connecting to the bus. One reason why we do the shmem file allocation in
> the kernel and on behalf of the userspace task is that we clear the
> VM_MAYWRITE bit to prevent the task from writing to the pool through its
> mapped buffer. We also do not set VM_NORESERVE, so the entire buffer is
> pre-accounted for the task that created the connection.

I don't have access to the system I've been using for testing right now, but I wonder how the kdbus pool stacks up against the entire rest of memory allocations for the average desktop process.

>
> The pool implementation uses an r/b tree to organize the buffer into
> slices. Those slices can be kept by userspace as long as the parsing
> implementation needs to have access to them. When finished, the slices
> are freed. A simple ring buffer cannot cope with the gaps that emerge by
> that.
>
> When a connection buffer is written to, it is done from the context of
> another task which calls into the kdbus code through one of the ioctls.
> The memcg implementation should hence charge the task that acts as
> writer, which is maybe not ideal but can be changed easily with some
> addition to the internal APIs. We omitted it for the current version,
> which is non-intrusive with regards to other kernel subsystems.
>

This has at least the following weakness. I can very easily get systemd to write to my shmem-backed pool: simply subscribe to one of its broadcasts. If I cause such a write to be very slow (intentionally or otherwise), then PID 1 blocks.

If you change the memcg code to charge me instead of PID 1 (as it should IMO), then the problem gets worse.

> The kdbus implementation is actually comparable to two tasks X and Y
> which both have their own buffer file open and mmap()ed, and they both
> pass their FD to the other side. If X now writes to Y's file, and that
> is causing a page fault, X is accounted for it, correct?

If PID 1 accepted a memfd from me (even a properly sealed one) and wrote to it, I would wonder whether it were actually a good idea.

Does this scheme have any actual measurable advantage over the traditional model of a small non-paged buffer in the kernel (i.e. the way sockets work) with explicit userspace memfd use as appropriate?

--Andy
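To make the socket-style alternative concrete, here is a minimal sketch of a broadcaster that never blocks on a slow subscriber. It is not kdbus or systemd code: the `broadcast` and `demo` helpers, the 4096-byte payload and the round limit are all assumptions chosen for illustration. Each subscriber is a non-blocking AF_UNIX stream socket, so the kernel buffer per subscriber is small and bounded, and a full buffer surfaces as `BlockingIOError` instead of stalling the sender.

```python
import socket

PAYLOAD = b"x" * 4096      # hypothetical broadcast message
MAX_ROUNDS = 100_000       # safety bound for the demo loop


def broadcast(senders, payload):
    """Send payload to every subscriber without ever blocking.

    Returns the names of subscribers whose kernel socket buffer could not
    take the whole payload; the broadcaster simply moves on to the next one.
    A real protocol would also have to deal with the partial write.
    """
    stalled = []
    for name, sock in senders.items():
        try:
            sent = sock.send(payload)
        except BlockingIOError:
            sent = 0
        if sent < len(payload):
            stalled.append(name)
    return stalled


def demo():
    """One fast reader, one reader that never reads."""
    fast_tx, fast_rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    slow_tx, slow_rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    for tx in (fast_tx, slow_tx):
        tx.setblocking(False)
    senders = {"fast": fast_tx, "slow": slow_tx}

    slow_stalled_round = None
    fast_stalls = 0
    try:
        for round_no in range(MAX_ROUNDS):
            stalled = broadcast(senders, PAYLOAD)
            if "fast" in stalled:
                fast_stalls += 1
            else:
                # The fast subscriber drains its message right away.
                remaining = len(PAYLOAD)
                while remaining:
                    remaining -= len(fast_rx.recv(remaining))
            if "slow" in stalled:
                slow_stalled_round = round_no
                break
    finally:
        for sock in (fast_tx, fast_rx, slow_tx, slow_rx):
            sock.close()
    return slow_stalled_round, fast_stalls


if __name__ == "__main__":
    round_no, fast_stalls = demo()
    print(f"slow subscriber stalled in round {round_no}; "
          f"fast subscriber stalled {fast_stalls} times")
```

The round in which the slow subscriber stalls depends on the socket buffer sizes of the machine, so no particular number should be expected. The point is only that the sender learns about the stall immediately and can keep serving the fast subscriber.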
Re: kdbus: to merge or not to merge?
On 08/09/2015 09:00 PM, Greg Kroah-Hartman wrote:
> In chatting with Daniel on IRC, he is writing up a summary of how the
> kdbus memory pools work in more detail, and he said he would send that
> out in a day or so, so that everyone can review.

Yes, let me quickly describe again how the kdbus pool logic works.

Every bus connection (peer) owns a buffer which is used in order to receive payloads. Such payloads are either messages sent from other connections, notifications, or answer structures returned by query commands (name lists, etc).

In order to avoid the kernel having to maintain an internal buffer that the connections then read from with an extra command, we decided to let the connections own their buffer directly, so they can mmap() the memory into their task. Allocating a local buffer to collect asynchronous messages is what they would need to do anyway, so we implemented a short-cut that allows the kernel to directly access the memory and write to it. The size of this buffer pool is configured by each connection individually, during the HELLO call, so the kernel interface is as flexible as any other memory allocation scheme the kernel provides and is subject to the same limits.

Internally, the connection pool is simply a shmem backed file. From the context of the HELLO ioctl, we are calling into shmem_file_setup(), so the file is eventually owned by the task which created the bus task connecting to the bus. One reason why we do the shmem file allocation in the kernel and on behalf of the userspace task is that we clear the VM_MAYWRITE bit to prevent the task from writing to the pool through its mapped buffer. We also do not set VM_NORESERVE, so the entire buffer is pre-accounted for the task that created the connection.

The pool implementation uses an r/b tree to organize the buffer into slices. Those slices can be kept by userspace as long as the parsing implementation needs to have access to them. When finished, the slices are freed. A simple ring buffer cannot cope with the gaps that emerge by that.

When a connection buffer is written to, it is done from the context of another task which calls into the kdbus code through one of the ioctls. The memcg implementation should hence charge the task that acts as writer, which is maybe not ideal but can be changed easily with some addition to the internal APIs. We omitted it for the current version, which is non-intrusive with regards to other kernel subsystems.

The kdbus implementation is actually comparable to two tasks X and Y which both have their own buffer file open and mmap()ed, and they both pass their FD to the other side. If X now writes to Y's file, and that is causing a page fault, X is accounted for it, correct?

The kernel does *not* do any memory allocation to buffer payload, and all other allocations (for instance, to keep around the internal state of a connection, names etc) are subject to conservatively chosen limitations. There is no unbounded memory allocation in kdbus that I am aware of. If there was, it would clearly be a bug.

Addressing the point Andy made earlier: yes, due to memory overcommitment, OOM situations may happen with certain patterns, but the kernel should have the same measures to deal with them that it already has with other types of shared userspace memory. Right?

Hope that all makes sense, we're open to discussions around the desired accounting details. I've copied linux-mm to let more people have a look into this again.

Thanks,
Daniel
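The ring buffer versus slices point can be illustrated with a toy model. This is a sketch under stated assumptions, not the kdbus pool code: `SlicePool` keeps its free ranges in a sorted Python list where the description above mentions an r/b tree, `RingPool` only reclaims space in FIFO order, and both class names and the 4-unit pool size are made up for the example.

```python
class SlicePool:
    """Toy first-fit allocator that tracks free ranges of a fixed pool."""

    def __init__(self, size):
        self.free = [(0, size)]   # sorted list of (offset, length)
        self.used = {}            # offset -> length

    def alloc(self, length):
        for i, (off, free_len) in enumerate(self.free):
            if free_len >= length:
                if free_len == length:
                    del self.free[i]
                else:
                    self.free[i] = (off + length, free_len - length)
                self.used[off] = length
                return off
        return None               # no gap is large enough

    def release(self, off):
        self.free.append((off, self.used.pop(off)))
        self.free.sort()
        merged = []
        for start, length in self.free:
            # Coalesce with the previous range when they are adjacent.
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((start, length))
        self.free = merged


class RingPool:
    """Toy ring buffer: space is reclaimed only from the oldest entry on.

    Positions are logical byte counters; wrap-around is not modelled.
    """

    def __init__(self, size):
        self.size = size
        self.head = 0             # bytes ever handed out
        self.tail = 0             # bytes reclaimed in FIFO order
        self.order = []           # [start, length, freed], oldest first

    def alloc(self, length):
        if self.head - self.tail + length > self.size:
            return None
        start = self.head
        self.order.append([start, length, False])
        self.head += length
        return start

    def release(self, start):
        for entry in self.order:
            if entry[0] == start:
                entry[2] = True
        while self.order and self.order[0][2]:
            self.tail += self.order.pop(0)[1]


def demo():
    """Fill a 4-unit pool, free the two middle slices, then ask for 2 units."""
    results = []
    for pool in (SlicePool(4), RingPool(4)):
        slices = [pool.alloc(1) for _ in range(4)]
        pool.release(slices[1])   # userspace is done with the 2nd message
        pool.release(slices[2])   # ... and the 3rd, but still holds the 1st
        results.append(pool.alloc(2))
    return tuple(results)


if __name__ == "__main__":
    print(demo())
```

In this model the slice pool reuses the gap left by the two freed middle slices, while the ring cannot hand out anything until the oldest slice is released.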
Re: kdbus: to merge or not to merge?
On Fri, Aug 07, 2015 at 06:26:31PM +0300, Linus Torvalds wrote:
> User space memory allocation is not AT ALL the same thing as kdbus.
> Kernel allocations are very very different from user allocations. We
> have reasonable, fairly tested, and generic models for handling user
> space memory allocation issues - limiting, debugging, failing, and
> handling catastrophes (ie oom). And no, even that doesn't always work
> perfectly, but at least there is a *lot* of support for it, and this
> is not some special case.

The memory in this case is a shmem file that is created by the kernel, but on behalf of the bus client task, which will eventually own it. As discussed with the mm developers, the same logic for accounting, OOM handling, etc. applies to the kdbus shmem buffers, as they are written to from the context of another task. If this is mistaken, then yes, you are right, and the code will have to be changed.

> This discussion has been full of kdbus people ignoring Andy saying "it
> worked with the user space version, it killed the machine with kdbus".
> And now people trying to claim the issues are the same. HELL NO.

Andy found some great bugs with regards to flooding the bus with requests, which has not been ignored at all. The same issue is present in dbus today, but the kdbus code runs faster and more messages were being sent than with the current userspace dbus daemon, so the machine becomes unresponsive more easily.

The issue is with userspace clients opting in to receive all NameOwnerChanged messages on the bus, which is not a good idea as they constantly get woken up and process them, which is why the CPU was pegged. This issue should now be fixed in Rawhide for some of the packages we found that were doing this. Maintainers of other packages have been informed.

End result, no one has ever really tested sending "bad" messages to the current system as all existing dbus users try to be "good actors". Thanks to Andy's testing, these apps should all now become much more robust.

In chatting with Daniel on IRC, he is writing up a summary of how the kdbus memory pools work in more detail, and he said he would send that out in a day or so, so that everyone can review.

thanks,

greg k-h
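The difference between subscribing to every NameOwnerChanged signal and watching a single name can be sketched with D-Bus match rules. The rule keys used here (type, sender, interface, member, arg0) come from the D-Bus match rule syntax; the `match_rule` and `wakeups` helpers and the `com.example.*` names are hypothetical, and the wakeup count is a plain model, not a measurement of any real bus.

```python
def match_rule(**fields):
    """Build a D-Bus match rule string from key/value pairs.

    Values are wrapped in single quotes; escaping of apostrophes inside
    values is not handled in this sketch.
    """
    return ",".join(f"{key}='{value}'" for key, value in fields.items())


# Broad subscription: every name change on the bus wakes the client.
BROAD = match_rule(
    type="signal",
    sender="org.freedesktop.DBus",
    interface="org.freedesktop.DBus",
    member="NameOwnerChanged",
)

# Narrow subscription: arg0 restricts delivery to one well-known name.
NARROW = match_rule(
    type="signal",
    sender="org.freedesktop.DBus",
    interface="org.freedesktop.DBus",
    member="NameOwnerChanged",
    arg0="com.example.Service",   # hypothetical service name
)


def wakeups(watched_name, changed_names):
    """Count deliveries; watched_name=None models the broad subscription."""
    return sum(
        1 for name in changed_names
        if watched_name is None or name == watched_name
    )


def demo():
    # A hypothetical client that acquires 1000 throwaway names, plus one
    # change to the name the subscriber actually cares about.
    changes = [f"com.example.Churn{i}" for i in range(1000)]
    changes.append("com.example.Service")
    return wakeups(None, changes), wakeups("com.example.Service", changes)


if __name__ == "__main__":
    print(BROAD)
    print(NARROW)
    print(demo())
```

Under this model the broad subscriber is woken 1001 times and the narrow one once, which is the kind of difference the per-package fixes mentioned above are after.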
Re: kdbus: to merge or not to merge?
2015-08-07 2:43 GMT+08:00 Andy Lutomirski:
> On Thu, Aug 6, 2015 at 11:14 AM, Daniel Mack wrote:
>> On 08/06/2015 05:21 PM, Andy Lutomirski wrote:
>>> Maybe gdbus really does use kdbus already, but on
>>> very brief inspection it looked like it didn't at least on my test VM.
>>
>> No, it's not in any released version yet. The patches for that are being
>> worked on though and look promising.
>>
>>> If the client buffers on !EPOLLOUT and has a monster buffer, then
>>> that's the client's problem.
>>>
>>> If every single program has a monster buffer, then it's everyone's
>>> problem, and the size of the problem gets multiplied by the number of
>>> programs.
>>
>> The size of the memory pool of a bus client is chosen by the client
>> itself individually during the HELLO call. It's pretty much the same as
>> if the client allocated the buffer itself, except that the kernel does
>> it on their behalf.
>>
>> Also note that kdbus features a peer-to-peer based quota accounting
>> logic, so a single bus connection can not DOS another one by filling its
>> buffer.
>
> I haven't looked at the quota code at all.
>
> Nonetheless, it looks like the slice logic (aside: it looks *way* more
> complicated than necessary -- what's wrong with circular buffers)
> will, under most (but not all!) workloads, concentrate access to a
> smallish fraction of the pool. This is IMO bad, since it means that
> most of the time most of the pool will remain uncommitted. If, at
> some point, something causes the access pattern to change and hit all
> the pages (even just once), suddenly all of the pools get committed,
> and your memory usage blows up.
>
> Again, please stop blaming the clients. In practice, kdbus is a
> system involving the kernel, systemd, sd-bus, and other stuff, mostly
> written by the same people. If kdbus gets merged and it survives but
> half the clients blow up and peoples' systems fall over, that's not
> okay.

Any comments about the questions mentioned by Andy?

In kdbus, when a sender writes a page of the receiver's tmpfs space, does that either help the receiver escape its memcg limit, or count against the receiver's limit?

Also, I'm curious about similar problems in these cases:

1. A UNIX domain server (SOCK_STREAM or SOCK_DGRAM) replies to its clients, but some clients consume the messages __too slowly__. Will the server block? Or can it serve other clients instead of blocking?

2. Processes open netlink sockets of NETLINK_KOBJECT_UEVENT, but some of them consume uevents __too slowly__ while uevents are continually triggered. Will the system block? Or will those processes eventually lose some uevents?

3. Processes watch a directory via inotify, but some of them consume events __too slowly__ while file operations are continually performed against the directory. Will the system block? Or will those processes eventually lose some events?

--
Regards,

- cee1
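For cases 2 and 3 the documented behaviour is that events are dropped, not that the producer blocks: inotify(7) describes an IN_Q_OVERFLOW event once the per-instance queue limit is exceeded, and netlink(7) says messages are dropped when the receiver's socket buffer is full, with ENOBUFS reported to the reader. The following is a minimal model of that policy, not kernel code; the `BoundedEventQueue` class and the queue limit of 4 are assumptions for illustration.

```python
from collections import deque


class BoundedEventQueue:
    """Toy model of a per-consumer event queue with a hard limit."""

    def __init__(self, max_events):
        self.max_events = max_events
        self.events = deque()
        self.overflowed = False   # plays the role of IN_Q_OVERFLOW / ENOBUFS
        self.dropped = 0

    def post(self, event):
        """Producer side: never blocks, drops the event when the queue is full."""
        if len(self.events) >= self.max_events:
            self.overflowed = True
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def read(self):
        """Consumer side: return (events, overflowed) and reset the flag."""
        batch = list(self.events)
        self.events.clear()
        overflowed, self.overflowed = self.overflowed, False
        return batch, overflowed


def demo():
    queue = BoundedEventQueue(max_events=4)
    # The producer fires 10 events before the slow consumer reads once.
    accepted = sum(queue.post(f"event-{i}") for i in range(10))
    batch, overflowed = queue.read()
    return accepted, len(batch), overflowed, queue.dropped


if __name__ == "__main__":
    print(demo())
```

The consumer sees only the events that fit plus an overflow indication, and has to resynchronise its view of the state by other means.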
Re: kdbus: to merge or not to merge?
On Fri, Aug 7, 2015 at 7:40 AM, Daniel Mack wrote:
> On 08/06/2015 08:43 PM, Andy Lutomirski wrote:
>> Nonetheless, it looks like the slice logic (aside: it looks *way* more
>> complicated than necessary -- what's wrong with circular buffers)
>> will, under most (but not all!) workloads, concentrate access to a
>> smallish fraction of the pool. This is IMO bad, since it means that
>> most of the time most of the pool will remain uncommitted. If, at
>> some point, something causes the access pattern to change and hit all
>> the pages (even just once), suddenly all of the pools get committed,
>> and your memory usage blows up.
>
> That's a general problem with memory overcommitment, and not specific to
> kdbus. IOW: You'd have the same problem with a similar logic implemented
> in userspace, right?
>

Sure, except that, if it's in userspace and it starts causing problems, then userspace can fix it without running into kernel ABI stability issues.

--Andy
Re: kdbus: to merge or not to merge?
On 08/06/2015 08:43 PM, Andy Lutomirski wrote:
> Nonetheless, it looks like the slice logic (aside: it looks *way* more
> complicated than necessary -- what's wrong with circular buffers)
> will, under most (but not all!) workloads, concentrate access to a
> smallish fraction of the pool. This is IMO bad, since it means that
> most of the time most of the pool will remain uncommitted. If, at
> some point, something causes the access pattern to change and hit all
> the pages (even just once), suddenly all of the pools get committed,
> and your memory usage blows up.

That's a general problem with memory overcommitment, and not specific to kdbus. IOW: You'd have the same problem with a similar logic implemented in userspace, right?

Daniel
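The scale of the overcommit concern is easy to put into numbers. All figures below are hypothetical (200 connections, 16 MiB pools, 4 KiB pages, 5% of each pool touched in the common case) and are not taken from any measurement in this thread; `committed_pages` is a helper made up for the sketch.

```python
PAGE_SIZE = 4096                  # bytes, assumed page size
MIB = 1024 * 1024


def committed_pages(num_pools, pages_per_pool, touched_percent):
    """Pages actually backed by memory if each pool touches that share."""
    # Integer ceiling division avoids floating point rounding.
    touched_per_pool = -(-pages_per_pool * touched_percent // 100)
    return num_pools * touched_per_pool


def demo():
    num_pools = 200                           # hypothetical bus connections
    pages_per_pool = 16 * MIB // PAGE_SIZE    # 16 MiB pool per connection
    typical = committed_pages(num_pools, pages_per_pool, 5)
    worst = committed_pages(num_pools, pages_per_pool, 100)
    return typical, worst


if __name__ == "__main__":
    typical, worst = demo()
    print(f"typical: {typical} pages, {typical * PAGE_SIZE // MIB} MiB")
    print(f"all pages touched once: {worst} pages, "
          f"{worst * PAGE_SIZE // MIB} MiB")
```

With these assumed numbers the committed memory goes from about 160 MiB to 3200 MiB once every page has been touched, without any connection asking for a larger pool.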
Re: kdbus: to merge or not to merge?
2015-08-07 2:43 GMT+08:00 Andy Lutomirski l...@amacapital.net:
> On Thu, Aug 6, 2015 at 11:14 AM, Daniel Mack dan...@zonque.org wrote:
>> On 08/06/2015 05:21 PM, Andy Lutomirski wrote:
>>> Maybe gdbus really does use kdbus already, but on
>>> very brief inspection it looked like it didn't at least on my test VM.
>>
>> No, it's not in any released version yet. The patches for that are being
>> worked on though and look promising.
>>
>>> If the client buffers on !EPOLLOUT and has a monster buffer, then
>>> that's the client's problem.
>>>
>>> If every single program has a monster buffer, then it's everyone's
>>> problem, and the size of the problem gets multiplied by the number of
>>> programs.
>>
>> The size of the memory pool of a bus client is chosen by the client
>> itself individually during the HELLO call. It's pretty much the same as
>> if the client allocated the buffer itself, except that the kernel does
>> it on their behalf.
>>
>> Also note that kdbus features a peer-to-peer based quota accounting
>> logic, so a single bus connection can not DOS another one by filling its
>> buffer.
>
> I haven't looked at the quota code at all.
>
> Nonetheless, it looks like the slice logic (aside: it looks *way* more
> complicated than necessary -- what's wrong with circular buffers)
> will, under most (but not all!) workloads, concentrate access to a
> smallish fraction of the pool. This is IMO bad, since it means that
> most of the time most of the pool will remain uncommitted. If, at
> some point, something causes the access pattern to change and hit all
> the pages (even just once), suddenly all of the pools get committed,
> and your memory usage blows up.
>
> Again, please stop blaming the clients. In practice, kdbus is a
> system involving the kernel, systemd, sd-bus, and other stuff, mostly
> written by the same people. If kdbus gets merged and it survives but
> half the clients blow up and people's systems fall over, that's not
> okay.

Any comments about the questions mentioned by Andy?

In kdbus, the sender writes to a page of the receiver's tmpfs space.
Does that let the receiver escape its memcg limit, or is it charged
against the receiver's limit?

Also, I'm curious about similar problems in these cases:

1. A UNIX domain server (SOCK_STREAM or SOCK_DGRAM) replies to its
   clients, but some clients consume the messages __too slowly__. Will
   the server block? Or can it serve other clients instead of blocking?

2. Processes open netlink sockets of NETLINK_KOBJECT_UEVENT, but some of
   them consume uevents __too slowly__ while uevents are continually
   triggered. Will the system block? Or will those processes eventually
   lose some uevents?

3. Processes watch a directory via inotify, but some of them consume
   events __too slowly__ while file operations are continually performed
   against the directory. Will the system block? Or will those processes
   eventually lose some events?

--
Regards,

- cee1
Re: kdbus: to merge or not to merge?
On Fri, Aug 7, 2015 at 7:40 AM, Daniel Mack dan...@zonque.org wrote:
> On 08/06/2015 08:43 PM, Andy Lutomirski wrote:
>> Nonetheless, it looks like the slice logic (aside: it looks *way* more
>> complicated than necessary -- what's wrong with circular buffers)
>> will, under most (but not all!) workloads, concentrate access to a
>> smallish fraction of the pool. This is IMO bad, since it means that
>> most of the time most of the pool will remain uncommitted. If, at
>> some point, something causes the access pattern to change and hit all
>> the pages (even just once), suddenly all of the pools get committed,
>> and your memory usage blows up.
>
> That's a general problem with memory overcommitment, and not specific to
> kdbus. IOW: You'd have the same problem with a similar logic implemented
> in userspace, right?

Sure, except that, if it's in userspace and it starts causing
problems, then userspace can fix it without running into kernel ABI
stability issues.

--Andy
Re: kdbus: to merge or not to merge?
On Thu, Aug 6, 2015 at 11:14 AM, Daniel Mack wrote:
> On 08/06/2015 05:21 PM, Andy Lutomirski wrote:
>> Maybe gdbus really does use kdbus already, but on
>> very brief inspection it looked like it didn't at least on my test VM.
>
> No, it's not in any released version yet. The patches for that are being
> worked on though and look promising.
>
>> If the client buffers on !EPOLLOUT and has a monster buffer, then
>> that's the client's problem.
>>
>> If every single program has a monster buffer, then it's everyone's
>> problem, and the size of the problem gets multiplied by the number of
>> programs.
>
> The size of the memory pool of a bus client is chosen by the client
> itself individually during the HELLO call. It's pretty much the same as
> if the client allocated the buffer itself, except that the kernel does
> it on their behalf.
>
> Also note that kdbus features a peer-to-peer based quota accounting
> logic, so a single bus connection can not DOS another one by filling its
> buffer.

I haven't looked at the quota code at all.

Nonetheless, it looks like the slice logic (aside: it looks *way* more
complicated than necessary -- what's wrong with circular buffers)
will, under most (but not all!) workloads, concentrate access to a
smallish fraction of the pool. This is IMO bad, since it means that
most of the time most of the pool will remain uncommitted. If, at
some point, something causes the access pattern to change and hit all
the pages (even just once), suddenly all of the pools get committed,
and your memory usage blows up.

Again, please stop blaming the clients. In practice, kdbus is a
system involving the kernel, systemd, sd-bus, and other stuff, mostly
written by the same people. If kdbus gets merged and it survives but
half the clients blow up and people's systems fall over, that's not
okay.

--Andy
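
The pool-commit concern above can be illustrated with a toy model. This is only a sketch and not kdbus code: the `Pool` class, the `PAGE` constant, the first-fit slice policy, the 16 MiB pool size, and the rule that freed slices never give their pages back are all assumptions made for illustration. It counts how many pages of a lazily committed pool have ever been written, first under a workload of small short-lived messages, then after a single burst that walks the whole pool.

```python
PAGE = 4096  # assumed page size in bytes


class Pool:
    """Toy model of a lazily committed receive pool (hypothetical, not kdbus)."""

    def __init__(self, size):
        self.size = size
        self.slices = []        # (offset, length) of live slices, sorted by offset
        self.touched = set()    # page indices that have been written at least once

    def alloc(self, length):
        """First-fit allocation: reuse the lowest free offset that fits."""
        offset = 0
        for start, used in self.slices:
            if start - offset >= length:
                break
            offset = start + used
        if offset + length > self.size:
            raise MemoryError("pool full")
        self.slices.append((offset, length))
        self.slices.sort()
        # Writing the message touches (commits) every page it spans.
        for page in range(offset // PAGE, (offset + length - 1) // PAGE + 1):
            self.touched.add(page)
        return offset

    def free(self, offset):
        self.slices = [s for s in self.slices if s[0] != offset]

    def committed_bytes(self):
        # Model assumption: freed slices do not release their pages.
        return len(self.touched) * PAGE


pool = Pool(16 * 1024 * 1024)  # 16 MiB pool, an arbitrary size

# Typical workload: small messages, each consumed before the next arrives.
for _ in range(10_000):
    pool.free(pool.alloc(512))
steady = pool.committed_bytes()

# One burst: the receiver stalls while messages pile up across the pool.
pending = [pool.alloc(64 * 1024) for _ in range(256)]
burst = pool.committed_bytes()
for off in pending:
    pool.free(off)
after = pool.committed_bytes()

print(steady, burst, after)
```

In this model the steady workload only ever commits one page, while a single burst commits the full pool and the usage stays there afterwards. Multiply that by the number of connections on a system to get the scenario described in the message.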
Re: kdbus: to merge or not to merge?
On 08/06/2015 05:21 PM, Andy Lutomirski wrote:
> Maybe gdbus really does use kdbus already, but on
> very brief inspection it looked like it didn't at least on my test VM.

No, it's not in any released version yet. The patches for that are being
worked on though and look promising.

> If the client buffers on !EPOLLOUT and has a monster buffer, then
> that's the client's problem.
>
> If every single program has a monster buffer, then it's everyone's
> problem, and the size of the problem gets multiplied by the number of
> programs.

The size of the memory pool of a bus client is chosen by the client
itself individually during the HELLO call. It's pretty much the same as
if the client allocated the buffer itself, except that the kernel does
it on their behalf.

Also note that kdbus features a peer-to-peer based quota accounting
logic, so a single bus connection can not DOS another one by filling its
buffer.

Thanks,
Daniel
Re: kdbus: to merge or not to merge?
On 08/06/2015 05:27 PM, Andy Lutomirski wrote:
>> In DBus (both kdbus and DBus1), such matches are installed on the
>> NameOwnerChanged signal, and they can be either specific to a single ID,
>> or broad, which will make them match on any ID. There's actually no
>> reason for applications to install unspecific matches, but if they do,
>> they will of course get what they asked for, and are woken up on every
>> ID that is added to or removed from the bus. What you're seeing in your
>> system profile is that some applications misbehave and install
>> unspecific matches when they shouldn't. That's a userspace bug that
>> needs fixing. Two candidates were actually in the systemd code base
>> (logind and PID1), and both are now patched.
>
> Can you point me at the patch?

https://github.com/systemd/systemd/pull/876
https://github.com/systemd/systemd/pull/887

firewalld and possibly some other applications in the Fedora default
install use python-slip, a convenience library that currently
unconditionally installs the broad matches. I filed a bug with patches
here:

https://fedorahosted.org/python-slip/ticket/2

And I filed more bugs for some GNOME components.

Thanks,
Daniel
Re: kdbus: to merge or not to merge?
On Thu, Aug 6, 2015 at 12:06 AM, Daniel Mack wrote:
> Hi Andy,
>
> On 08/05/2015 02:18 AM, Andy Lutomirski wrote:
>> I added the missing sd_bus_unref call.
>>
>> With userspace dbus, my program takes 95% CPU and dbus-daemon takes
>> 88% CPU or so.
>>
>> With kdbus, I see abuse-bus (my test), systemd-journald,
>> systemd-bus-proxy, auditd, gnome-shell, mission-control, sedispatch,
>> firewalld, polkitd, NetworkManager, systemd, avahi-daemon, audisp,
>> abrt-dump-jour* (whatever it's called -- it truncated), upowerd, and
>> systemd-logind all taking tons of CPU. I've listed them in decreasing
>> order of amount of CPU burned -- the top several are taking about as
>> much as is possible. Load average is over 13. That's if I run it
>> from a text console while I'm logged in to gnome in a different VT.
>
> That's right, I can reproduce this here. To explain what's going on, let
> me provide some background.
>
> Every time a client connects to kdbus, a new ID is assigned to the
> connection, and other connections which have previously subscribed to
> notifications of type KDBUS_ITEM_ID_ADD or _REMOVE get a notification
> and are woken up so they can dispatch it. By default, no such matches
> exist; applications have to explicitly opt in if they are interested in
> these events.
>
> In DBus (both kdbus and DBus1), such matches are installed on the
> NameOwnerChanged signal, and they can be either specific to a single ID,
> or broad, which will make them match on any ID. There's actually no
> reason for applications to install unspecific matches, but if they do,
> they will of course get what they asked for, and are woken up on every
> ID that is added to or removed from the bus. What you're seeing in your
> system profile is that some applications misbehave and install
> unspecific matches when they shouldn't. That's a userspace bug that
> needs fixing. Two candidates were actually in the systemd code base
> (logind and PID1), and both are now patched.

Can you point me at the patch?

It sounds like that will reduce the scalability issue with this
particular test from (whatever userspace overhead exists) * (number of
clients) to just the overhead of looping over all clients and their
matches in the kernel.

--Andy
Re: kdbus: to merge or not to merge?
On Aug 6, 2015 1:04 AM, "David Herrmann" wrote:
> > Given that all existing prototype userspace that I'm aware of
> > (systemd and its consumers) apparently opts in, I don't really care
> > that the feature is opt-in.
>
> This is just plain wrong. Out of the dozens of dbus applications, you
> found like 9 which are buggy? Two of them are already fixed, the
> maintainers of the other ones notified.
> I'd be interested where you got this notion that "all existing
> prototype userspace [...] opts in".
>

I would say instead that, out of one in-use kdbus library, I found one
that was buggy. Maybe gdbus really does use kdbus already, but on
very brief inspection it looked like it didn't at least on my test VM.

> > Also, you haven't addressed the memory usage issues --
>
> ..because it doesn't change anything. If your IPC is message based and
> async, _someone_ needs to buffer. I don't see the difference between
> buffering locally on !EPOLLOUT or buffering in a shmem pool. In both
> cases, clients have control over the buffer size. If you disagree,
> please _elaborate_.

If the client buffers on !EPOLLOUT and has a monster buffer, then
that's the client's problem.

If every single program has a monster buffer, then it's everyone's
problem, and the size of the problem gets multiplied by the number of
programs.

Also, sensible clients that produce bulk data will throttle on
!EPOLLOUT rather than blindly buffering, but that's not an option when
the huge buffer is on the receiver's end. Read up on "bufferbloat".

--Andy
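
The "throttle on !EPOLLOUT" behaviour mentioned above can be sketched with a plain UNIX socket pair. This is a minimal illustration under stated assumptions, not code from kdbus or sd-bus: the 4096-byte chunk size and the loop cap are arbitrary, and the consumer simply never reads. The point is that a non-blocking sender learns from the kernel when the path is full and can stop producing, which is the signal a sender does not get when the large buffer lives on the receiver's side.

```python
import socket

# A connected pair of UNIX stream sockets standing in for a bulk
# producer and a consumer that never reads.
producer, consumer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
producer.setblocking(False)

chunk = b"x" * 4096
sent = 0
throttled = False
for _ in range(100_000):  # hard cap so the sketch always terminates
    try:
        sent += producer.send(chunk)
    except BlockingIOError:
        # The kernel socket buffer is full (the !EPOLLOUT condition).
        # A well-behaved producer stops generating data here and waits
        # for writability instead of queueing more data on its own.
        throttled = True
        break

print(f"kernel accepted {sent} bytes before pushing back")

producer.close()
consumer.close()
```

The amount accepted before the push-back depends on the system's socket buffer settings, so no particular figure should be expected.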
Re: kdbus: to merge or not to merge?
On Thursday, 6 August 2015, 10:04:57, David Herrmann wrote:
> > Given that all existing prototype userspace that I'm aware of
> > (systemd and its consumers) apparently opts in, I don't really care
> > that the feature is opt-in.
>
> This is just plain wrong. Out of the dozens of dbus applications, you
> found like 9 which are buggy? Two of them are already fixed, the
> maintainers of the other ones notified.
> I'd be interested where you got this notion that "all existing
> prototype userspace [...] opts in".

But these few can create the issues Andy described?

Sure, one can argue that I can set up a stress or stress-ng command line
invocation as root that will basically grind a Linux system to a halt,
and in a way I consider this to be a bug in the kernel as well, but one
that has existed for a long time.

But a GUI application running as a user?

How about some robustness regarding what you see as bugs in userspace here?

I think "The bug is not mine" is exactly the same language we have seen here
before. If the kernel relies on bug-free userspace applications in order to do
its job properly I think it has robustness issues. One certainly wouldn't
want this with any mission critical realtime OS.

I think it is the kernel that should be in control.

Thanks,
--
Martin
Re: kdbus: to merge or not to merge?
Hi

On Wed, Aug 5, 2015 at 10:11 PM, Andy Lutomirski wrote:
> On Wed, Aug 5, 2015 at 12:10 AM, David Herrmann wrote:
>> Hi
>>
>> On Tue, Aug 4, 2015 at 4:47 PM, Andy Lutomirski wrote:
>>> On Tue, Aug 4, 2015 at 7:09 AM, David Herrmann wrote:
>>>> This is a bug in the proxy (which is already fixed).
>>>
>>> Should I expect to see it in Rawhide soon?
>>
>> Use this workaround until it does:
>>
>> $ DBUS_SYSTEM_BUS_ADDRESS="kernel:path=/sys/fs/kdbus/0-system/bus" ./your-binary
>>
>
> Which binary is supposed to be run like that?

Your test.

>>> Anyway, the broadcasts that I intended to exercise were
>>> KDBUS_ITEM_ID_REMOVE. Those appear to be broadcast to everyone,
>>> irrespective of "policy", so long as the "match" thingy allows it.
>>
>> Matches are opt-in, not opt-out. Nobody will get this message unless
>> they opt in.
>>
>
> And what opts in? Either something's broken, or there's a different
> scalability problem, or a whole pile of kdbus-using programs in Fedora
> Rawhide do, in fact, opt in.

See Daniel's explanation. If applications subscribe to all
notifications, they get what they asked for. I recommend filing bug
reports for the applications in question.

> Given that all existing prototype userspace that I'm aware of
> (systemd and its consumers) apparently opts in, I don't really care
> that the feature is opt-in.

This is just plain wrong. Out of the dozens of dbus applications, you
found like 9 which are buggy? Two of them are already fixed, the
maintainers of the other ones notified.
I'd be interested where you got this notion that "all existing
prototype userspace [...] opts in".

> Also, given things like this:
>
> commit d27c8057699d164648b7d8c1559fa6529998f89d
> Author: David Herrmann
> Date: Tue May 26 09:30:14 2015 +0200
>
>     kdbus: forward ID notifications to everyone
>
> it really does seem to me that the point of these ID notifications is
> for everyone to get them.

It's not. This patch just opens the policy so everyone can see those
notifications. By default, it's not delivered to anyone.

> Also, you haven't addressed the memory usage issues --

..because it doesn't change anything. If your IPC is message based and
async, _someone_ needs to buffer. I don't see the difference between
buffering locally on !EPOLLOUT or buffering in a shmem pool. In both
cases, clients have control over the buffer size. If you disagree,
please _elaborate_.

Thanks
David
Re: kdbus: to merge or not to merge?
Hi Andy,

On 08/05/2015 02:18 AM, Andy Lutomirski wrote:
> I added the missing sd_bus_unref call.
>
> With userspace dbus, my program takes 95% CPU and dbus-daemon takes
> 88% CPU or so.
>
> With kdbus, I see abuse-bus (my test), systemd-journald,
> systemd-bus-proxy, auditd, gnome-shell, mission-control, sedispatch,
> firewalld, polkitd, NetworkManager, systemd, avahi-daemon, audisp,
> abrt-dump-jour* (whatever it's called -- it truncated), upowerd, and
> systemd-logind all taking tons of CPU. I've listed them in decreasing
> order of amount of CPU burned -- the top several are taking about as
> much as is possible. Load average is over 13. That's if I run it
> from a text console while I'm logged in to gnome in a different VT.

That's right, I can reproduce this here. To explain what's going on, let
me provide some background.

Every time a client connects to kdbus, a new ID is assigned to the
connection, and other connections which have previously subscribed to
notifications of type KDBUS_ITEM_ID_ADD or _REMOVE get a notification
and are woken up so they can dispatch it. By default, no such matches
exist; applications have to explicitly opt in if they are interested in
these events.

In DBus (both kdbus and DBus1), such matches are installed on the
NameOwnerChanged signal, and they can be either specific to a single ID,
or broad, which will make them match on any ID. There's actually no
reason for applications to install unspecific matches, but if they do,
they will of course get what they asked for, and are woken up on every
ID that is added to or removed from the bus. What you're seeing in your
system profile is that some applications misbehave and install
unspecific matches when they shouldn't. That's a userspace bug that
needs fixing. Two candidates were actually in the systemd code base
(logind and PID1), and both are now patched.

Note that these applications are actually affected on both DBus1 and
kdbus. The reason you didn't see them trip up in your test is that
sd_bus_open() behaves differently in the two worlds. In kdbus, it will
immediately call into the kernel and register a new connection, hence
triggering the behavior described above. On DBus1, however, the HELLO
message will not be transmitted to the daemon until the first message is
sent, so no ID is assigned, and no notifications are sent. When
augmenting the test program a little so it reads its own ID on the bus,
you'll see similar behavior on DBus1 as well, but the bottleneck in this
case is the daemon, which significantly mitigates the load caused by
other tasks.

So, to wrap it up: you've triggered an existing userspace bug. The
userspace components under our control have now been fixed, and we'll
talk to other people to make them aware of the issue too. However, these
issues are not directly related to kdbus, but rather show more impact as
a side-effect now. You've raised a valid point here.

Thanks a lot for providing this test, much appreciated!

Daniel
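
The difference between broad and specific matches described above can be made concrete with a toy model. This sketch is hypothetical throughout: `ToyBus`, its methods, and the subscriber names are invented for illustration and are not the kdbus or sd-bus API. It only counts how often each subscriber would be woken up while one client connects and disconnects in a loop, which is roughly what the abuse-bus test is described as doing.

```python
from collections import Counter


class ToyBus:
    """Hypothetical model of ID add/remove notifications; not the kdbus API."""

    def __init__(self):
        self.next_id = 1
        self.matches = {}         # subscriber name -> watched ID, or None for any ID
        self.wakeups = Counter()  # subscriber name -> notifications delivered

    def add_match(self, subscriber, watched_id=None):
        self.matches[subscriber] = watched_id

    def _notify(self, conn_id):
        for subscriber, watched in self.matches.items():
            if watched is None or watched == conn_id:
                self.wakeups[subscriber] += 1

    def connect(self):
        conn_id = self.next_id
        self.next_id += 1
        self._notify(conn_id)     # models an ID_ADD notification
        return conn_id

    def disconnect(self, conn_id):
        self._notify(conn_id)     # models an ID_REMOVE notification


bus = ToyBus()
peer = bus.connect()
bus.add_match("broad-daemon")            # unspecific match: any ID
bus.add_match("specific-daemon", peer)   # only interested in one peer
# "silent-daemon" installs no match at all, which is the default.

# A client that connects and disconnects in a tight loop.
for _ in range(1000):
    bus.disconnect(bus.connect())

bus.disconnect(peer)
print(dict(bus.wakeups))  # {'broad-daemon': 2001, 'specific-daemon': 1}
```

In this model the cost of the loop lands on every subscriber with a broad match, while the specific and silent subscribers stay idle. The per-notification loop over all matches is still paid by the bus itself.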
Re: kdbus: to merge or not to merge?
On Thursday, 6 August 2015, 10:04:57, David Herrmann wrote:
>> Given that all existing prototype userspace that I'm aware of (systemd and its consumers) apparently opts in, I don't really care that the feature is opt-in.
>
> This is just plain wrong. Out of the dozens of dbus applications, you found like 9 which are buggy? Two of them are already fixed, the maintainers of the other ones notified. I'd be interested where you got this notion that all existing prototype userspace [...] opts in.

But these few can create the issues Andy described?

Sure, one can argue I can set up a stress or stress-ng command line invocation as root user that will basically grind a Linux system to a halt – and in a way I consider this to be a bug in the kernel as well, but one that has existed for a long time.

But a GUI application running as a user?

How about some robustness regarding what you see as bugs in userspace here?

I think "The bug is not mine" is exactly the same language we have seen here before. If the kernel relies on bug-free userspace applications in order to do its job properly, I think it has robustness issues. One certainly wouldn't want this with any mission critical realtime OS.

I think it is the kernel that should be in control.

Thanks,
--
Martin
Re: kdbus: to merge or not to merge?
On Thu, Aug 6, 2015 at 12:06 AM, Daniel Mack dan...@zonque.org wrote:
> Hi Andy,
>
> On 08/05/2015 02:18 AM, Andy Lutomirski wrote:
>> I added the missing sd_bus_unref call.
>>
>> With userspace dbus, my program takes 95% CPU and dbus-daemon takes 88% CPU or so.
>>
>> With kdbus, I see abuse-bus (my test), systemd-journald, systemd-bus-proxy, auditd, gnome-shell, mission-control, sedispatch, firewalld, polkitd, NetworkManager, systemd, avahi-daemon, audisp, abrt-dump-jour* (whatever it's called -- it truncated), upowerd, and systemd-logind all taking tons of CPU. I've listed them in decreasing order of amount of CPU burned -- the top several are taking about as much as is possible. Load average is over 13. That's if I run it from a text console while I'm logged in to gnome in a different VT.
>
> That's right, I can reproduce this here. To explain what's going on, let me provide some background.
>
> Every time a client connects to kdbus, a new ID is assigned to the connection, and other connections which have previously subscribed to notifications of type KDBUS_ITEM_ID_ADD or _REMOVE get a notification and are woken up so they can dispatch it. By default, no such matches exist; applications have to explicitly opt in if they are interested in these events.
>
> In DBus (both kdbus and DBus1), such matches are installed on the NameOwnerChanged signal, and they can be either specific to a single ID, or broad, which will make them match on any ID. There's actually no reason for applications to install unspecific matches, but if they do, they will of course get what they asked for, and are woken up on every ID that is added to or removed from the bus.
>
> What you're seeing in your system profile is that some applications misbehave and install unspecific matches when they shouldn't. That's a userspace bug that needs fixing. Two candidates were actually in the systemd code base (logind and PID1), and both are now patched.

Can you point me at the patch?

It sounds like that will reduce the scalability issue with this particular test from (whatever userspace overhead exists) * (number of clients) to just the overhead of looping over all clients and their matches in the kernel.

--Andy
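The difference between specific and broad matches described above can be modeled in a few lines of C. This is a toy model and not kdbus code: `struct id_match`, `MATCH_ANY_ID` and `wakeups_for_id` are hypothetical names, and each subscriber is assumed to hold exactly one match. It only shows that every broad match adds one wakeup to every connect or disconnect on the bus, and that the kernel still has to walk all matches either way.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wildcard value standing in for a "broad" match on any ID. */
#define MATCH_ANY_ID UINT64_MAX

/* One subscription to ID add/remove notifications (toy model). */
struct id_match {
    uint64_t id; /* a specific connection ID, or MATCH_ANY_ID */
};

/* Count the subscribers that would be woken when `id` appears or vanishes. */
static size_t wakeups_for_id(const struct id_match *matches, size_t n,
                             uint64_t id)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        if (matches[i].id == MATCH_ANY_ID || matches[i].id == id)
            woken++;
    }
    return woken;
}
```

With two broad matches and two specific ones, a connection that nobody asked about still wakes both broad subscribers, which is the per-connection cost the test loop multiplies.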
Re: kdbus: to merge or not to merge?
On Wed, Aug 5, 2015 at 12:10 AM, David Herrmann wrote:
> Hi
>
> On Tue, Aug 4, 2015 at 4:47 PM, Andy Lutomirski wrote:
>> On Tue, Aug 4, 2015 at 7:09 AM, David Herrmann wrote:
>>> This is a bug in the proxy (which is already fixed).
>>
>> Should I expect to see it in Rawhide soon?
>
> Use this workaround until it does:
>
> $ DBUS_SYSTEM_BUS_ADDRESS="kernel:path=/sys/fs/kdbus/0-system/bus" ./your-binary

Which binary is supposed to be run like that?

>> Anyway, the broadcasts that I intended to exercise were KDBUS_ITEM_ID_REMOVE. Those appear to be broadcast to everyone, irrespective of "policy", so long as the "match" thingy allows it.
>
> Matches are opt-in, not opt-out. Nobody will get this message unless they opt in.

And what opts in? Either something's broken, or there's a different scalability problem, or a whole pile of kdbus-using programs in Fedora Rawhide do, in fact, opt in.

My interest in instrumenting kdbus and systemd to figure out the exact mechanism by which my tiny test case causes my system to freeze is near zero. I bet I'm actually right about the mechanism, but that's sort of beside the point. It freezes, so /something's/ wrong. The only real relevance of my suspicion about the failure mode is that I think it's a design issue that isn't going to be easy to fix.

>> So yes, as far as I can tell, kdbus really does track object lifetime by broadcasting every single destruction event to every single receiver (subject to caveats above) and pokes the data into every receiver's tmpfs space.
>
> Broadcast reception is opt-in.

I've pointed out several times that there's a feature in kdbus that doesn't work well and I get told that the problematic feature is opt-in. Given that all existing prototype userspace that I'm aware of (systemd and its consumers) apparently opts in, I don't really care that the feature is opt-in.

Also, given things like this:

commit d27c8057699d164648b7d8c1559fa6529998f89d
Author: David Herrmann
Date: Tue May 26 09:30:14 2015 +0200

    kdbus: forward ID notifications to everyone

it really does seem to me that the point of these ID notifications is for everyone to get them.

Also, you haven't addressed the memory usage issues -- I don't see how a full kdbus-using desktop system can be expected to fit into RAM on anything short of the biggest and beefiest laptops. I also don't see how a kdbus-using, xdg-app-happy system (with correspondingly many pools) will fit into RAM on even the biggest laptops.

--Andy
Re: kdbus: to merge or not to merge?
Hi

On Tue, Aug 4, 2015 at 4:47 PM, Andy Lutomirski wrote:
> On Tue, Aug 4, 2015 at 7:09 AM, David Herrmann wrote:
>> This is a bug in the proxy (which is already fixed).
>
> Should I expect to see it in Rawhide soon?

Use this workaround until it does:

    $ DBUS_SYSTEM_BUS_ADDRESS="kernel:path=/sys/fs/kdbus/0-system/bus" ./your-binary

> Anyway, the broadcasts that I intended to exercise were KDBUS_ITEM_ID_REMOVE. Those appear to be broadcast to everyone, irrespective of "policy", so long as the "match" thingy allows it.

Matches are opt-in, not opt-out. Nobody will get this message unless they opt in.

> The bloom filter thing won't help at all according to the docs: bloom filters don't apply to kernel-generated notifications.

Bloom filters apply to message payloads. Kernel notifications do not carry a message payload. Message metadata can be filtered for explicitly (without false-positives).

> So yes, as far as I can tell, kdbus really does track object lifetime by broadcasting every single destruction event to every single receiver (subject to caveats above) and pokes the data into every receiver's tmpfs space.

Broadcast reception is opt-in.

Thanks
David
Re: kdbus: to merge or not to merge?
On Tue, Aug 4, 2015 at 7:47 AM, Andy Lutomirski wrote:
> On Tue, Aug 4, 2015 at 7:09 AM, David Herrmann wrote:
>> Hi
>>
>> On Tue, Aug 4, 2015 at 3:46 PM, Linus Torvalds wrote:
>>> On Tue, Aug 4, 2015 at 1:58 AM, David Herrmann wrote:
>>>> You lack a call to sd_bus_unref() here.
>>>
>>> I assume it was intentional. Why would Andy talk about "scaling" otherwise?
>
> It was actually an error. I assumed that, since the user version worked fine (at least for as long as I ran it) and the kernel version didn't (killed X and left a blinking cursor, no visible log messages even when run from a text console, and no obvious OOM recovery after a long wait) that it was a kdbus issue or issue with other kdbus clients.
>
> I'll play with it more today.

I added the missing sd_bus_unref call.

With userspace dbus, my program takes 95% CPU and dbus-daemon takes 88% CPU or so.

With kdbus, I see abuse-bus (my test), systemd-journald, systemd-bus-proxy, auditd, gnome-shell, mission-control, sedispatch, firewalld, polkitd, NetworkManager, systemd, avahi-daemon, audisp, abrt-dump-jour* (whatever it's called -- it truncated), upowerd, and systemd-logind all taking tons of CPU. I've listed them in decreasing order of amount of CPU burned -- the top several are taking about as much as is possible. Load average is over 13. That's if I run it from a text console while I'm logged in to gnome in a different VT.

If I run the program from a graphical terminal, everything freezes so hard that the cursor doesn't even make it to the next line when I hit enter.

So I still claim that kdbus doesn't scale. I'm not even just saying that it doesn't scale to large systems -- somewhat to my surprise, it doesn't even seem to scale well enough for a mostly empty Rawhide workstation system running just a graphical terminal. And I didn't even try to find stress tests more interesting than connecting and disconnecting in a loop.

FWIW, the old test (without the unref) appeared to be allocating 16M of mapped kdbus pool every iteration, which seems unlikely to have helped matters.

--Andy
Re: kdbus: to merge or not to merge?
On Tue, Aug 4, 2015 at 7:09 AM, David Herrmann wrote:
> Hi
>
> On Tue, Aug 4, 2015 at 3:46 PM, Linus Torvalds wrote:
>> On Tue, Aug 4, 2015 at 1:58 AM, David Herrmann wrote:
>>>
>>> You lack a call to sd_bus_unref() here.
>>
>> I assume it was intentional. Why would Andy talk about "scaling" otherwise?

It was actually an error. I assumed that, since the user version worked fine (at least for as long as I ran it) and the kernel version didn't (killed X and left a blinking cursor, no visible log messages even when run from a text console, and no obvious OOM recovery after a long wait) that it was a kdbus issue or issue with other kdbus clients.

I'll play with it more today.

>> And the worry was why the kdbus version killed the machine, but the userspace version did not. That's a rather big difference, and not a good one.
>
> Neither test 'kills' the machine:
>
> * The userspace version will be killed by the OOM killer after about 20s running (depending how much memory you have).

Not on my system. Maybe too much memory?

> * The kernel version runs for 1024 iterations (maximum kdbus connections per user) and then produces errors.
>
> In fact, the kernel version is even more stable than the user-space version, and bails out much earlier. Run it on a VT and everything works just fine.

On my system, everything died as described above.

> The only issue you get with kdbus is the compat-bus-daemon, which assert()s as a side-effect of accept4() failing. In other words, the compat bus-daemon gets ENFILE if you open that many connections, then assert()s and thus kills all other proxy connections. This has the side effect, that Xorg loses access to your graphics device and thus your screen 'freezes'. Also networkmanager bails out and stops network connections.

Ah, interesting.

> This is a bug in the proxy (which is already fixed).

Should I expect to see it in Rawhide soon?

Anyway, the broadcasts that I intended to exercise were KDBUS_ITEM_ID_REMOVE. Those appear to be broadcast to everyone, irrespective of "policy", so long as the "match" thingy allows it. As far as I can tell, that's the default behavior (i.e. receivers accept KDBUS_DST_ID_BROADCAST), but even if it's not default, we'll still fail to scale as long as the number of receivers accepting KDBUS_DST_ID_BROADCAST grows as systems become more kdbus-integrated.

The bloom filter thing won't help at all according to the docs: bloom filters don't apply to kernel-generated notifications.

So yes, as far as I can tell, kdbus really does track object lifetime by broadcasting every single destruction event to every single receiver (subject to caveats above) and pokes the data into every receiver's tmpfs space (via kdbus_bus_broadcast -> kdbus_conn_entry_insert -> lots of other stuff -> vfs_iter_write).

At that point, there's well over a gigabyte of tmpfs space that can be scribbled on (and thus committed and thus needs to be read) by rogue broadcasters even on Rawhide, and Rawhide seems to have barely started converting all the kdbus clients from using the proxy to using kdbus directly. IIUC, once gdbus switches over to using kdbus directly, with current buffer sizing, the average laptop will have more kdbus pool tmpfs space mapped than total RAM. I still don't see how this will work well.

I guess my test didn't exercise what I meant it to. I wrote it, userspace survived (on my system) and kdbus didn't. Apparently I blew up the bus proxy, not the pool mechanism. Next time I'll try to better characterize exactly what it is I'm doing to my poor VM...

--Andy
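The memory concern reduces to simple arithmetic, sketched here in C. The 16 MiB pool size, the 53 mapped pools and the limit of 1024 connections per user are figures quoted elsewhere in this thread; `worst_case_pool_bytes` is a hypothetical helper. The result is an upper bound on what could be written into pools, since tmpfs pages are only committed once they are touched.

```c
#include <stdint.h>

/* Pool size per connection as reported in the thread (16 MiB). */
#define POOL_BYTES_PER_CONN (UINT64_C(16) << 20)

/* Upper bound on pool memory that `connections` receivers expose to writers. */
static uint64_t worst_case_pool_bytes(uint64_t connections)
{
    return connections * POOL_BYTES_PER_CONN;
}
```

For the 53 mapped pools mentioned in the thread this gives 848 MiB, and for a single user at the 1024-connection limit it gives 16 GiB, which is where the comparison with total RAM on a laptop comes from.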
Re: kdbus: to merge or not to merge?
Hi

On Tue, Aug 4, 2015 at 3:46 PM, Linus Torvalds wrote:
> On Tue, Aug 4, 2015 at 1:58 AM, David Herrmann wrote:
>>
>> You lack a call to sd_bus_unref() here.
>
> I assume it was intentional. Why would Andy talk about "scaling" otherwise?
>
> And the worry was why the kdbus version killed the machine, but the userspace version did not. That's a rather big difference, and not a good one.

Neither test 'kills' the machine:

* The userspace version will be killed by the OOM killer after about 20s running (depending how much memory you have).

* The kernel version runs for 1024 iterations (maximum kdbus connections per user) and then produces errors.

In fact, the kernel version is even more stable than the user-space version, and bails out much earlier. Run it on a VT and everything works just fine.

The only issue you get with kdbus is the compat-bus-daemon, which assert()s as a side-effect of accept4() failing. In other words, the compat bus-daemon gets ENFILE if you open that many connections, then assert()s and thus kills all other proxy connections. This has the side effect, that Xorg loses access to your graphics device and thus your screen 'freezes'. Also networkmanager bails out and stops network connections.

This is a bug in the proxy (which is already fixed).

Thanks
David
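The failure mode described above, a daemon asserting when accept4() fails, is commonly avoided by treating resource exhaustion as a recoverable condition. The following C sketch shows one way to classify such errors. It is not the fix that went into the proxy, which this thread does not show; `accept_error_is_transient` is a hypothetical helper, and the list of errno values is an assumption based on the error codes documented for accept().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether a failed accept()/accept4() should be retried later
 * instead of being treated as a fatal programming error.
 */
static bool accept_error_is_transient(int err)
{
    /* EAGAIN and EWOULDBLOCK may share a value, so test them outside the switch. */
    if (err == EAGAIN || err == EWOULDBLOCK)
        return true;

    switch (err) {
    case EMFILE:       /* per-process descriptor limit reached */
    case ENFILE:       /* system-wide file table full */
    case ENOBUFS:
    case ENOMEM:
    case EINTR:
    case ECONNABORTED:
        return true;
    default:
        return false;
    }
}
```

A daemon using such a check could log the failure, stop polling the listening socket for a short time, and keep serving existing connections, so one misbehaving client does not take every other proxy connection down with it.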
Re: kdbus: to merge or not to merge?
On Tue, Aug 4, 2015 at 1:58 AM, David Herrmann wrote:
>
> You lack a call to sd_bus_unref() here.

I assume it was intentional. Why would Andy talk about "scaling" otherwise?

And the worry was why the kdbus version killed the machine, but the userspace version did not. That's a rather big difference, and not a good one.

Possibly the kdbus version ends up not just allocating user space memory (which we should handle gracefully), but kernel allocations too (which absolutely have to be explicitly resource-managed).

Linus
Re: kdbus: to merge or not to merge?
Hi

On Tue, Aug 4, 2015 at 1:02 AM, Andy Lutomirski wrote:
> I got Fedora Rawhide working under kdbus (thanks, everyone!), and I ran this little program:
>
> #include <systemd/sd-bus.h>
> #include <err.h>
>
> int main(int argc, char *argv[])
> {
>         while (1) {
>                 sd_bus *bus;
>                 if (sd_bus_open_system(&bus) < 0) {
>                         /* warn("sd_bus_open_system"); */
>                         continue;
>                 }
>                 sd_bus_close(bus);

You lack a call to sd_bus_unref() here. Without it, your loop contains:

    while (1)
            malloc(1024);

This simple malloc-loop already hogs your system. If I add the required call to _unref(), your tool runs smoothly on my machine.

>         }
> }
>
> under both userspace dbus and under kdbus. Userspace dbus burns some CPU -- no big deal. I expected kdbus to fail to scale and burn a disproportionate amount of CPU (because I don't see how it /can/ scale). Instead it fell over completely. I didn't bother debugging it, but offhand I'd guess that the system OOMed and didn't come back.

I cannot see the relation to kdbus.

> On very brief inspection, Rawhide seems to have a lot of kdbus connections with 16MiB of mapped tmpfs stuff each. (53 of them mapped, and I don't know how many exist with tmpfs backing but aren't mapped. Presumably the number only goes up as the degree of reliance on the userspace proxy goes down.

What does this have to do with the proxy? Why would resource consumption go *up* as the proxy users decline? Please elaborate.

> I don't know of any deployed systems that solve it by broadcasting the lifetime of everything to everyone and relying on those broadcasts going through, though.

Luckily, kdbus does not do this.

Thanks
David
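The close-versus-unref distinction raised above can be shown with a self-contained toy in C. None of this is the sd-bus API: `conn_open`, `conn_close`, `conn_unref` and `failed_opens` are hypothetical names, and the fixed table of 1024 slots merely stands in for any bounded resource (the number is borrowed from the per-user connection limit mentioned in this thread). Closing a connection marks it disconnected, while only dropping the last reference returns the slot.

```c
#include <stddef.h>

#define MAX_CONNS 1024 /* stand-in for a bounded resource */

struct conn {
    int refs; /* reference count; the slot is free when this is 0 */
    int open; /* 1 while the connection is "connected" */
};

static struct conn table[MAX_CONNS];

/* Take a free slot, or return NULL when the table is exhausted. */
static struct conn *conn_open(void)
{
    for (size_t i = 0; i < MAX_CONNS; i++) {
        if (table[i].refs == 0) {
            table[i].refs = 1;
            table[i].open = 1;
            return &table[i];
        }
    }
    return NULL;
}

/* Disconnect, but keep the object (and its slot) alive. */
static void conn_close(struct conn *c)
{
    c->open = 0;
}

/* Drop one reference; the slot becomes reusable at zero. Returns NULL. */
static struct conn *conn_unref(struct conn *c)
{
    if (c && c->refs > 0)
        c->refs--;
    return NULL;
}

/* Run the open/close loop and count how many opens fail. */
static size_t failed_opens(size_t iterations, int do_unref)
{
    size_t failures = 0;

    for (size_t i = 0; i < MAX_CONNS; i++)
        table[i].refs = table[i].open = 0; /* start from an empty table */

    for (size_t i = 0; i < iterations; i++) {
        struct conn *c = conn_open();

        if (!c) {
            failures++;
            continue;
        }
        conn_close(c);
        if (do_unref)
            c = conn_unref(c);
    }
    return failures;
}
```

Without the unref, the loop succeeds exactly 1024 times and then fails on every later iteration; with it, the same slot is reused and no open ever fails.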
Re: kdbus: to merge or not to merge?
Hi On Tue, Aug 4, 2015 at 1:02 AM, Andy Lutomirski l...@kernel.org wrote: I got Fedora Rawhide working under kdbus (thanks, everyone!), and I ran this little program: #include systemd/sd-bus.h #include err.h int main(int argc, char *argv[]) { while (1) { sd_bus *bus; if (sd_bus_open_system(bus) 0) { /* warn(sd_bus_open_system); */ continue; } sd_bus_close(bus); You lack a call to sd_bus_unref() here. Without it, your loop contains: while (1) malloc(1024); This simple malloc-loop already hogs your system. If I add the required call to _unref(), your tool runs smoothly on my machine. } } under both userspace dbus and under kdbus. Userspace dbus burns some CPU -- no big deal. I expected kdbus to fail to scale and burn a disproportionate amount of CPU (because I don't see how it /can/ scale). Instead it fell over completely. I didn't bother debugging it, but offhand I'd guess that the system OOMed and didn't come back. I cannot see the relation to kdbus. On very brief inspection, Rawhide seems to have a lot of kdbus connections with 16MiB of mapped tmpfs stuff each. (53 of them mapped, and I don't know how many exist with tmpfs backing but aren't mapped. Presumably the number only goes up as the degree of reliance on the userspace proxy goes down. What does this have to do with the proxy? Why would resource consumption go *up* as the proxy users decline? Please elaborate. I don't know of any deployed systems that solve it by broadcasting the lifetime of everything to everyone and relying on those broadcasts going through, though. Luckily, kdbus does not do this. Thanks David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kdbus: to merge or not to merge?
Hi On Tue, Aug 4, 2015 at 3:46 PM, Linus Torvalds torva...@linux-foundation.org wrote: On Tue, Aug 4, 2015 at 1:58 AM, David Herrmann dh.herrm...@gmail.com wrote: You lack a call to sd_bus_unref() here. I assume it was intentional. Why would Andy talk about scaling otherwise? And the worry was why the kdbus version killed the machine, but the userspace version did not. That's a rather big difference, and not a good one. Neither test 'kills' the machine: * The userspace version will be killed by the OOM killer after about 20s running (depending how much memory you have). * The kernel version runs for 1024 iterations (maximum kdbus connections per user) and then produces errors. In fact, the kernel version is even more stable than the user-space version, and bails out much earlier. Run it on a VT and everything works just fine. The only issue you get with kdbus is the compat-bus-daemon, which assert()s as a side-effect of accept4() failing. In other words, the compat bus-daemon gets ENFILE if you open that many connections, then assert()s and thus kills all other proxy connections. This has the side effect, that Xorg loses access to your graphics device and thus your screen 'freezes'. Also networkmanager bails out and stops network connections. This is a bug in the proxy (which is already fixed). Thanks David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kdbus: to merge or not to merge?
On Tue, Aug 4, 2015 at 1:58 AM, David Herrmann dh.herrm...@gmail.com wrote: You lack a call to sd_bus_unref() here. I assume it was intentional. Why would Andy talk about scaling otherwise? And the worry was why the kdbus version killed the machine, but the userspace version did not. That's a rather big difference, and not a good one. Possibly the kdbus version ends up not just allocating user space memory (which we should handle gracefully), but kernel allocations too (which absolutely have to be explicitly resource-managed). Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kdbus: to merge or not to merge?
On Tue, Aug 4, 2015 at 7:09 AM, David Herrmann dh.herrm...@gmail.com wrote: Hi On Tue, Aug 4, 2015 at 3:46 PM, Linus Torvalds torva...@linux-foundation.org wrote: On Tue, Aug 4, 2015 at 1:58 AM, David Herrmann dh.herrm...@gmail.com wrote: You lack a call to sd_bus_unref() here. I assume it was intentional. Why would Andy talk about scaling otherwise? It was actually an error. I assumed that, since the user version worked fine (at least for as long as I ran it) and the kernel version didn't (killed X and left a blinking cursor, no visible log messages even when run from a text console, and no obvious OOM recovery after a long wait) that it was a kdbus issue or issue with other kdbus clients. I'll play with it more today. And the worry was why the kdbus version killed the machine, but the userspace version did not. That's a rather big difference, and not a good one. Neither test 'kills' the machine: * The userspace version will be killed by the OOM killer after about 20s running (depending how much memory you have). Not on my system. Maybe too much memory? * The kernel version runs for 1024 iterations (maximum kdbus connections per user) and then produces errors. In fact, the kernel version is even more stable than the user-space version, and bails out much earlier. Run it on a VT and everything works just fine. On my system, everything died as described above. The only issue you get with kdbus is the compat-bus-daemon, which assert()s as a side-effect of accept4() failing. In other words, the compat bus-daemon gets ENFILE if you open that many connections, then assert()s and thus kills all other proxy connections. This has the side effect, that Xorg loses access to your graphics device and thus your screen 'freezes'. Also networkmanager bails out and stops network connections. Ah, interesting. This is a bug in the proxy (which is already fixed). Should I expect to see it in Rawhide soon? Anyway, the broadcasts that I intended to exercise were KDBUS_ITEM_ID_REMOVE. 
Those appear to be broadcast to everyone, irrespective of policy, so long as the match thingy allows it. As far as I can tell, that's the default behavior (i.e. receivers accept KDBUS_DST_ID_BROADCAST), but even if it's not default, we'll still fail to scale as long as the number of receivers accepting KDBUS_DST_ID_BROADCAST grows as systems become more kdbus-integrated. The bloom filter thing won't help at all according to the docs: bloom filters don't apply to kernel-generated notifications. So yes, as far as I can tell, kdbus really does track object lifetime by broadcasting every single destruction event to every single receiver (subject to caveats above) and pokes the data into every receiver's tmpfs space (via kdbus_bus_broadcast - kdbus_conn_entry_insert - lots of other stuff - vfs_iter_write). At that point, there's well over a gigabyte of tmpfs space that can be scribbled on (and thus committed and thus needs to be read) by rogue broadcasters even on Rawhide, and Rawhide seems to have barely started converting all the kdbus clients from using the proxy to using kdbus directly. IIUC, once gdbus switches over to using kdbus directly, with current buffer sizing, the average laptop will have more kdbus pool tmpfs space mapped than total RAM. I still don't see how this will work well. I guess my test didn't exercise what I meant it to. I wrote it, userspace survived (on my system) and kdbus didn't. Apparently I blew up the bus proxy, not the pool mechanism. Next time I'll try to better characterize exactly what it is I'm doing to my poor VM... --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kdbus: to merge or not to merge?
On Tue, Aug 4, 2015 at 7:47 AM, Andy Lutomirski l...@amacapital.net wrote: On Tue, Aug 4, 2015 at 7:09 AM, David Herrmann dh.herrm...@gmail.com wrote: Hi On Tue, Aug 4, 2015 at 3:46 PM, Linus Torvalds torva...@linux-foundation.org wrote: On Tue, Aug 4, 2015 at 1:58 AM, David Herrmann dh.herrm...@gmail.com wrote: You lack a call to sd_bus_unref() here. I assume it was intentional. Why would Andy talk about scaling otherwise? It was actually an error. I assumed that, since the user version worked fine (at least for as long as I ran it) and the kernel version didn't (killed X and left a blinking cursor, no visible log messages even when run from a text console, and no obvious OOM recovery after a long wait) that it was a kdbus issue or issue with other kdbus clients. I'll play with it more today. I added the missing sd_bus_unref call. With userspace dbus, my program takes 95% CPU and dbus-daemon takes 88% CPU or so. With kdbus, I see abuse-bus (my test), systemd-journald, systemd-bus-proxy, auditd, gnome-shell, mission-control, sedispatch, firewalld, polkitd, NetworkManager, systemd, avahi-daemon, audisp, abrt-dump-jour* (whatever it's called -- it truncated), upowerd, and systemd-logind all taking tons of CPU. I've listed them in decreasing order of amount of CPU burned -- the top several are taking about as much as is possible. Load average is over 13. That's if I run it from a text console while I'm logged in to gnome in a different VT. If I run the program from a graphical terminal, everything freezes so hard that the cursor doesn't even make it to the next line when I hit enter. So I still claim that kdbus doesn't scale. I'm not even just saying that it doesn't scale to large systems -- somewhat to my surprise, it doesn't even seem to scale well enough for a mostly empty Rawhide workstation system running just a graphical terminal. And I didn't even try to find stress tests more interesting than connecting and disconnecting in a loop. 
FWIW, the old test (without the unref) appeared to be allocating 16M of mapped kdbus pool every iteration, which seems unlikely to have helped matters.

--Andy
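For a sense of scale, here is a back-of-the-envelope sketch of how fast a leak of that size adds up. It assumes the 16M-per-iteration figure above, that nothing is reclaimed between iterations, and that every mapping counts fully against the budget; `leaks_until_exhausted` and `MIB` are names made up for the illustration, not anything from sd-bus or kdbus.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/*
 * Number of leaked connections needed before `budget_bytes` of pool
 * has been mapped, rounded up. Both arguments are in bytes.
 */
uint64_t leaks_until_exhausted(uint64_t budget_bytes,
                               uint64_t pool_bytes_per_conn)
{
    assert(pool_bytes_per_conn > 0);
    return (budget_bytes + pool_bytes_per_conn - 1) / pool_bytes_per_conn;
}
```

For example, `leaks_until_exhausted(8192 * MIB, 16 * MIB)` evaluates to 512, so under these assumptions a tight connect loop would cover a hypothetical 8 GiB budget after 512 iterations.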
Re: kdbus: to merge or not to merge?
On Mon, Jun 22, 2015 at 11:06 PM, Andy Lutomirski wrote:
> 2. Kdbus introduces a novel buffering model. Receivers allocate a big
> chunk of what's essentially tmpfs space. Assuming that space is
> available (in a virtual memory sense), senders synchronously write to
> the receivers' tmpfs space. Broadcast senders synchronously write to
> *all* receivers' tmpfs space. I think that, regardless of
> implementation, this is problematic if the sender and the receiver are
> in different memcgs. Suppose that the message is to be written to a
> page in the receivers' tmpfs space that is not currently resident. If
> the write happens in the sender's memcg context, then a receiver can
> effectively allocate an unlimited number of pages in the sender's
> memcg, which will, in practice, be the init memcg if the sender is
> systemd. This breaks the memcg model. If, on the other hand, the
> sender writes to the receiver's tmpfs space in the receiver's memcg
> context, then the sender will block (or fail? presumably
> unpredictable failures are a bad thing) if the receiver's memcg is at
> capacity.

I realize that everyone is sick of this thread. Nonetheless, I should emphasize that I'm actually serious about this issue. I got Fedora Rawhide working under kdbus (thanks, everyone!), and I ran this little program:

#include <systemd/sd-bus.h>
#include <err.h>

int main(int argc, char *argv[])
{
        while (1) {
                sd_bus *bus;
                if (sd_bus_open_system(&bus) < 0) {
                        /* warn("sd_bus_open_system"); */
                        continue;
                }
                sd_bus_close(bus);
        }
}

under both userspace dbus and under kdbus. Userspace dbus burns some CPU -- no big deal. I expected kdbus to fail to scale and burn a disproportionate amount of CPU (because I don't see how it /can/ scale). Instead it fell over completely. I didn't bother debugging it, but offhand I'd guess that the system OOMed and didn't come back.

On very brief inspection, Rawhide seems to have a lot of kdbus connections with 16MiB of mapped tmpfs stuff each. (53 of them mapped, and I don't know how many exist with tmpfs backing but aren't mapped. Presumably the number only goes up as the degree of reliance on the userspace proxy goes down. As it stands, that's over 3GB of uncommitted backing store that my test is likely to forcibly commit very quickly.)

Frankly, I don't understand how it's possible to cleanly implement kdbus' broadcast or lifetime semantics* in an environment with bounded CPU or bounded memory. (And unbounded memory just changes the problem, since the message backlog can just get worse and worse.)

I work in an industry in which lots of parties broadcast lots of data to lots of people. If you try to drink from the firehose and you can't swallow fast enough, either you need to throw something out (and test your recovery code!) or you fail. At least in finance, no one pretends that a global order of events in different cities is practical.

* Detecting when your peer goes away is, of course, a widely encountered and widely solved problem. I don't know of any deployed systems that solve it by broadcasting the lifetime of everything to everyone and relying on those broadcasts going through, though.

--Andy
Re: kdbus: to merge or not to merge?
On Thu 2015-07-09 10:39:58, Geert Uytterhoeven wrote:
> On Wed, Jul 8, 2015 at 3:54 PM, Pavel Machek wrote:
> > Apparently, new tools are needed in the community, as normal review
> > comments did not stop drivers/android/binder.c merge.
> >
> > For example binder_transaction does not exactly look like a kernel
> > code, "TODO: fput" does not really invoke confidence, and amount of
> > BUG_ON()s is quite amazing...
>
> Amazingly, checkpatch (without --strict) only complains about long lines.

Well, checkpatch only tells half of the story. Anyway, the worst problem is that there's no documentation of the kernel<->user interface binder provides, making understanding it hard/impossible. The closest thing to a documentation pointer is:

 * Based on, but no longer compatible with, the original
 * OpenBinder.org binder driver interface, which is:

                                                        Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: kdbus: to merge or not to merge?
On Thu, Jul 9, 2015 at 12:29 PM, Joe Perches wrote:
> On Thu, 2015-07-09 at 10:39 +0200, Geert Uytterhoeven wrote:
>> On Wed, Jul 8, 2015 at 3:54 PM, Pavel Machek wrote:
>> > Apparently, new tools are needed in the community, as normal review
>> > comments did not stop drivers/android/binder.c merge.
>> >
>> > For example binder_transaction does not exactly look like a kernel
>> > code, "TODO: fput" does not really invoke confidence, and amount of
>> > BUG_ON()s is quite amazing...
>>
>> Amazingly, checkpatch (without --strict) only complains about long lines.
>>
>> Seems like the test for "BUG" is (and always has been) commented out...
>
> Maybe (requires --strict when scanning files)
> ---
>  scripts/checkpatch.pl | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)

Thanks! Detected 31 occurrences (+ 1 commented out), shudder...

Tested-by: Geert Uytterhoeven <ge...@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: kdbus: to merge or not to merge?
On Thu, 2015-07-09 at 10:39 +0200, Geert Uytterhoeven wrote:
> On Wed, Jul 8, 2015 at 3:54 PM, Pavel Machek wrote:
> > Apparently, new tools are needed in the community, as normal review
> > comments did not stop drivers/android/binder.c merge.
> >
> > For example binder_transaction does not exactly look like a kernel
> > code, "TODO: fput" does not really invoke confidence, and amount of
> > BUG_ON()s is quite amazing...
>
> Amazingly, checkpatch (without --strict) only complains about long lines.
>
> Seems like the test for "BUG" is (and always has been) commented out...

Maybe (requires --strict when scanning files)
---
 scripts/checkpatch.pl | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 90e1edc..11c8186 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3439,13 +3439,15 @@ sub process {
 			}
 		}
 
-# # no BUG() or BUG_ON()
-# 		if ($line =~ /\b(BUG|BUG_ON)\b/) {
-# 			print "Try to use WARN_ON & Recovery code rather than BUG() or BUG_ON()\n";
-# 			print "$herecurr";
-# 			$clean = 0;
-# 		}
+# avoid BUG() or BUG_ON()
+		if ($line =~ /\b(?:BUG|BUG_ON)\b/) {
+			my $msg_type = \&WARN;
+			$msg_type = \&CHK if ($file);
+			&{$msg_type}("AVOID_BUG",
+				     "Avoid crashing the kernel - Try using WARN_ON & Recovery code rather than BUG() or BUG_ON()\n" . $herecurr);
+		}
 
+# avoid LINUX_VERSION_CODE
 		if ($line =~ /\bLINUX_VERSION_CODE\b/) {
 			WARN("LINUX_VERSION_CODE",
 			     "LINUX_VERSION_CODE should be avoided, code should be for the version to which it is merged\n" . $herecurr);
Re: kdbus: to merge or not to merge?
On Wed, Jul 8, 2015 at 3:54 PM, Pavel Machek wrote:
> Apparently, new tools are needed in the community, as normal review
> comments did not stop drivers/android/binder.c merge.
>
> For example binder_transaction does not exactly look like a kernel
> code, "TODO: fput" does not really invoke confidence, and amount of
> BUG_ON()s is quite amazing...

Amazingly, checkpatch (without --strict) only complains about long lines.

Seems like the test for "BUG" is (and always has been) commented out...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: kdbus: to merge or not to merge?
On Mon 2015-06-22 23:41:40, Greg KH wrote:
> On Mon, Jun 22, 2015 at 11:06:09PM -0700, Andy Lutomirski wrote:
> > Hi Linus,
> >
> > Can you opine as to whether you think that kdbus should be merged?
>
> Ah, a preemptive pull request denial, how nice.
> I don't think I've ever seen such a thing before, congratulations for
> creating something so must have previously been lacking in our
> development model in how to work together in a community in a productive
> manner.

Apparently, new tools are needed in the community, as normal review comments did not stop the drivers/android/binder.c merge.

For example, binder_transaction does not exactly look like kernel code, "TODO: fput" does not really inspire confidence, and the amount of BUG_ON()s is quite amazing...

                                                        Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: kdbus: to merge or not to merge?
On Wed, Jul 01, 2015 at 06:51:41PM +0200, David Herrmann wrote:
> Hi
>

Thanks for the answers; in response I've got some further questions. Again, apologies for length -- I apparently don't know how to discuss IPC tersely.

> On Wed, Jul 1, 2015 at 2:03 AM, Kalle A. Sandstrom wrote:
> > For the first, compare unix domain sockets (i.e. point-to-point mode, access
> > control through filesystem [or fork() parentage], read/write/select) to the
> > kdbus message-sending ioctl. In the main data-exchanging portion, the former
> > requires only a connection identifier, a pointer to a buffer, and the length
> > of data in that buffer. To contrast, kdbus takes a complex message-sending
> > command structure with 0..n items of m kinds that the ioctl must parse in a
> > m-way switching loop, and then another complex message-describing structure
> > which has its own 1..n items of another m kinds describing its contents,
> > destination-lookup options, negotiation of supported options, and so forth.
>
> sendmsg(2) uses a very similar payload to kdbus. send(2) is a shortcut
> to simplify the most common use-case. I'd be more than glad to accept
> patches adding such shortcuts to kdbus, if accompanied by benchmark
> numbers and reasoning why this is a common path for dbus/etc. clients.
>

A shortcut special case for e.g. only iovec-like payload items, only to a numerically designated peer, and only RPC forms, should be an immediate gain given that reduced functionality would lower the number of instructions executed, the number of unpredictable branches met, and the number of possibly-cold cache lines accessed. The difference in raw cycles should be significant in comparison to the number of kernel exits avoided during a client's RPC to a service and the associated reply.
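To make the interface comparison concrete, here is a minimal sketch of the unix domain socket side: the sender passes a descriptor, a buffer pointer, and a length, and nothing else. It uses a connected `socketpair(2)` of `SOCK_DGRAM` sockets so that both ends live in one process; `uds_roundtrip` is a hypothetical helper written for this illustration, and it says nothing about what an equivalent kdbus fast-path would look like.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send `len` bytes from `msg` over one end of a connected unix domain
 * socket pair and receive them on the other end into `out` (capacity
 * `cap`). Returns the number of bytes received, or -1 on error.
 */
ssize_t uds_roundtrip(const void *msg, size_t len, void *out, size_t cap)
{
    int fds[2];
    ssize_t sent, got;

    assert(msg != NULL && out != NULL);

    /* SOCK_DGRAM preserves message boundaries on the connected pair. */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0)
        return -1;

    /* The whole send interface: connection id, buffer pointer, length. */
    sent = send(fds[0], msg, len, 0);

    /* The message is already queued, so this recv() does not block. */
    got = (sent < 0) ? -1 : recv(fds[1], out, cap, 0);

    close(fds[0]);
    close(fds[1]);
    return got;
}
```

Calling `uds_roundtrip("ping", 4, buf, sizeof buf)` with a 16-byte `buf` should return 4 and leave the four payload bytes in `buf`; there is no item list to build or parse on either side.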
Assuming that such RPCs are the bulk of what kdbus will do, and that context-switch avoidance is crucial to the performance argument in its design, it seems silly not to have such a fast-path -- even if it is initially implemented as a simple wrapper of the full send ioctl. It would also put the basic send operation on par with sendmsg(2) over a connected socket in terms of interface complexity, and simplify any future "exit into peer without scheduler latency" shenanigans.

However, these gains would go unobserved in code written to the current kdbus ABI. Bridging to such a fast-path from the full interface would eliminate most of its benefits while hurting its legit callers.

That being said, considering that the eventual de-facto user API to kdbus is a library with explicit deserialization, endianness conversion, and suchlike, I could see how the difference would go unobserved.

> The kdbus API is kept generic and extendable, while trying to keep
> runtime overhead minimal. If this overhead turns out to be a
> significant runtime slowdown (which none of my benchmarks showed), we
> should consider adding shortcuts. Until then, I prefer an API that is
> consistent, easy to extend and flexible.
>

Out of curiosity, what payload item types do you see being added in the near future, e.g. the next year? UDS knows only of simple buffers, scatter/gather iovecs, and inter-process dup(2); and recent Linux adds sourcing from a file descriptor. Perhaps a "pass this previously-received message on" item?

> > Consequently, a carefully optimized implementation of unix domain sockets
> > (and by extension all the data-carrying SysV etc. IPC primitives, optimized
> > similarly) will always be superior to kdbus for both message throughput and
> > latency, [...]
>
> Yes, that's due to the point-to-point nature of UDS.
>

Does this change for broadcast, unassociated, or doubly-addressed[0] operation?
For the first, kdbus must already cause allocation of cache lines in proportion to msg_length * n_recvrs, which mutes the broker's single-copy advantage as the number of receivers grows. For the second, name lookup from (say) a hash table only adds to required processing, though the resulting identifier could be re-used immediately afterward; and the third mode would prohibit that optimization altogether.

Relatedly, is there publicly-available data concerning the distribution of various dbus IPC modalities? Such as a desktop booting under systemd, running for a decent bit, and shutting down; or the automotive industry's (presumably signaling-heavy) use cases which I've heard quoted for a figure of 600k transactions before steady state.

> > [...] For long messages (> L1 cache size per Stetson-Harrison[0]) the
> > only performance benefit from kdbus is its claimed single-copy mode of
> > operation-- an equivalent to which could be had with ye olde sockets by
> > copying data from the writer directly into the reader while one of them
> > blocks[1] in the appropriate syscall. That the current Linux pipes, SysV
> > queues, unix domain sockets, etc. don't do this doesn't really factor in.
>
> Parts of the network subsystem have supported single-copy (mmap'ed
> IO) for quite some time. kdbus mandates it, but otherwise is not
> special in that regard.
Re: kdbus: to merge or not to merge?
Hi

On Wed, Jul 1, 2015 at 2:03 AM, Kalle A. Sandstrom wrote:
> For the first, compare unix domain sockets (i.e. point-to-point mode, access
> control through filesystem [or fork() parentage], read/write/select) to the
> kdbus message-sending ioctl. In the main data-exchanging portion, the former
> requires only a connection identifier, a pointer to a buffer, and the length
> of data in that buffer. To contrast, kdbus takes a complex message-sending
> command structure with 0..n items of m kinds that the ioctl must parse in a
> m-way switching loop, and then another complex message-describing structure
> which has its own 1..n items of another m kinds describing its contents,
> destination-lookup options, negotiation of supported options, and so forth.

sendmsg(2) uses a very similar payload to kdbus. send(2) is a shortcut to simplify the most common use-case. I'd be more than glad to accept patches adding such shortcuts to kdbus, if accompanied by benchmark numbers and reasoning why this is a common path for dbus/etc. clients.

The kdbus API is kept generic and extendable, while trying to keep runtime overhead minimal. If this overhead turns out to be a significant runtime slowdown (which none of my benchmarks showed), we should consider adding shortcuts. Until then, I prefer an API that is consistent, easy to extend and flexible.

> Consequently, a carefully optimized implementation of unix domain sockets (and
> by extension all the data-carrying SysV etc. IPC primitives, optimized
> similarly) will always be superior to kdbus for both message throughput and
> latency, [...]

Yes, that's due to the point-to-point nature of UDS.

> [...] For long messages (> L1 cache size per Stetson-Harrison[0]) the
> only performance benefit from kdbus is its claimed single-copy mode of
> operation-- an equivalent to which could be had with ye olde sockets by
> copying data from the writer directly into the reader while one of them
> blocks[1] in the appropriate syscall.
> That the current Linux pipes, SysV queues, unix domain sockets, etc.
> don't do this doesn't really factor in.

Parts of the network subsystem have supported single-copy (mmap'ed IO)
for quite some time. kdbus mandates it, but otherwise is not special
in that regard.

> A consequence of this buffering is that whenever a client sends a message with
> kdbus, it must be prepared to handle an out-of-space non-delivery status.
> [...] There's no option to e.g. overwrite a previous
> message, or to discard queued messages in an oldest-first order, instead of
> rebuffing the sender.

Correct.

> For broadcast messaging, a recipient may observe that messages were dropped by
> looking at a `dropped_msgs' field delivered (and then reset) as part of the
> message reception ioctl. Its value is the number of messages dropped since
> last read, so arguably a client could achieve the equivalent of the condition's
> absence by resynchronizing explicitly with all signal-senders on its current
> bus wrt which it knows the protocol, when the value is >0. This method could
> in principle apply to 1-to-1 unidirectional messaging as well[2].

Correct.

> Looking at the kdbus "send message, wait for tagged reply" feature in
> conjunction with these details appears to reveal two holes in its state graph.
> The first is that if replies are delivered through the requestor's buffer,
> concurrent sends into that same buffer may cause it to become full (or the
> queue to grow too long, w/e) before the service gets a chance to reply. If
> this condition causes a reply to fall out of the IPC flow, the requestor will
> hang until either its specified timeout happens or it gets interrupted by a
> signal.

If sending a reply fails, the kdbus_reply state is destructed and the
caller must be woken up. We do that for sync-calls just fine, but the
async case does indeed lack a wake-up in the error path. I noted this
down and will fix it.
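The two overflow behaviours agreed on above (a unicast sender is rebuffed when the receiver's buffer is full, while a dropped broadcast is only counted and reported on the next receive) can be pictured with a toy model. This is a sketch under stated assumptions, not the kdbus interface: `ToyReceiver`, its method names, and the capacity counted in messages rather than bytes are all hypothetical; only the `dropped_msgs` name comes from the discussion.

```python
from collections import deque


class ToyReceiver:
    """Hypothetical stand-in for a connection with a fixed-size receive buffer."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumption: counted in messages, not bytes
        self.queue = deque()
        self.dropped_msgs = 0         # broadcasts dropped since the last recv()

    def send_unicast(self, msg):
        """Return False (the sender is rebuffed) when the buffer is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(msg)
        return True

    def send_broadcast(self, msg):
        """A broadcast is never rejected; an overflow is only counted."""
        if len(self.queue) >= self.capacity:
            self.dropped_msgs += 1
            return
        self.queue.append(msg)

    def recv(self):
        """Return (message, dropped count) and reset the counter."""
        msg = self.queue.popleft() if self.queue else None
        dropped, self.dropped_msgs = self.dropped_msgs, 0
        return msg, dropped


rx = ToyReceiver(capacity=2)
rx.send_unicast("a")
rx.send_broadcast("sig1")
unicast_ok = rx.send_unicast("b")   # buffer full: the sender has to handle this
rx.send_broadcast("sig2")           # buffer full: silently counted
first = rx.recv()                   # carries the drop count, then resets it
second = rx.recv()
print(unicast_ok, first, second)
```

In this model a receiver that sees a non-zero drop count would be the one to resynchronize with its signal senders, which is the workaround described in the quoted text.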
> If replies are delivered outside the shm pool, the requestor must be prepared
> to pick them up using a different means from the "in your pool w/ offset X,
> length Y" format the main-line kdbus interface provides. [...]

Replies are never delivered outside the shm pool.

> The second problem is that given how there can be a timeout or interrupt on
> the receive side of a "method call" transaction, it's possible for the
> requestor to bow out of the IPC flow _while the service is processing its
> request_. This results either in the reply message being lost, or its ending
> up in the requestor's buffer to appear in a loop where it may not be expected.
> Either

(for completeness: we properly support resuming interrupted sync-calls)

> way, the client must at that point resynchronize wrt all objects related to
> the request's side effects, or abandon the IPC flow entirely and start over.
> (services need only confirm their replies before effecting e.g. a chardev-like
> "destructively read N bytes from buffer" operation's outcome, which is
> slightly less ugly.)

Correct. If you
Re: kdbus: to merge or not to merge?
[delurk; apparently kdbus is not receiving the architectural review it
should. i've got quite a bit of knowledge on message-passing mechanisms in
general, and kernel IPC in particular, so i'll weigh in uninvited. apologies
for length.

as my "proper" review on this topic is still under construction, i'll try
(and fail) to be brief here. i started down that road only to realize that
kdbus is quite the ball of mud even if the only thing under the scope is its
interface, and that if i held off until properly ready i'd risk kdbus having
already been merged, making review moot.]

Ingo Molnar wrote:
> - I've been closely monitoring Linux kernel changes for over 20 years, and for
>   the last 10 years the linux/ipc/* code has been dormant: it works and was
>   kept good for existing usecases, but no-one was maintaining and enhancing it
>   with the future in mind.

It's my understanding that linux/ipc/* contains only SysV IPC, i.e. shm, sem,
SysV message queues, and POSIX message queues. There are other
IPC-implementing things in the kernel also, such as unix domain sockets,
pipes, shared memory via mmap(), signals, mappings that appear shared across
fork(), and whatever else provides either kernel-mediated multi-client buffer
access or some combination of shared memory and synchronization that lets
userspace exchange hot data across the address space boundary.

It's also my understanding that no-one in their right mind would call SysV
IPC state-of-the-art even at the level of interface; indeed its presence in
the hoariest of vendor unixes suggests it's not supposed to be even close.

However, the suggested replacement in kdbus replicates the worst[-1] of all
known user-to-user IPC mechanisms, i.e. Mach. I'm not suggesting that Linux
adopt e.g. a different microkernel IPC mechanism-- those are by and large
inapplicable to a monolithic kernel for reasons of ABI (and, well, why would
you do IPC when function calls are zomgfast already?)-- but rather, that the
existing ones either are good enough at this time or can be reworked to
become near-equivalent to the state of the art in terms of performance.

> So there exists a technical vacuum: the kernel does not have any good, modern
> IPC ABI at the moment that distros can rely on as a 'golden standard'. This
> is partly technical, partly political. The technical reason is that SysV IPC
> is ancient and cumbersome. The political reason is that SystemD could be using
> and extending Android's existing kernel accelerated IPC subsystem (Binder)
> that is already upstream - but does not.

I'll contend that the reason for this vacuum is that the existing kernel IPC
interfaces are fine to the point that other mechanisms may be derived from
them solely in user-space without significant performance demerit, and
without pushing ca. 10k SLOC of IPC broker and policy engine into kernel
space.

Furthermore, it's my well-ruminated opinion that implementations of the
userspace ABI specified in the kdbus 4.1-rc1 version (of April this year)
will always be necessarily slower than existing IPC primitives in terms of
both throughput and latency; and that the latter are directly applicable to
constructing a more convenient user-space IPC broker that implements what
kdbus seeks to provide: naming, broadcast, unidirectional signaling,
bidirectional "method calls", and a policy mechanism.

In addition I'll argue that as currently specified, the kdbus interface--
even if tuned to its utmost-- is not only necessarily inferior to e.g. a
well-tuned version of unix domain sockets, but also fundamentally flawed in
ways that prohibit construction of robust in-system distributed programs by
kdbus' mechanisms alone (i.e. byzantine call-site workarounds
notwithstanding).

For the first, compare unix domain sockets (i.e. point-to-point mode, access
control through filesystem [or fork() parentage], read/write/select) to the
kdbus message-sending ioctl. In the main data-exchanging portion, the former
requires only a connection identifier, a pointer to a buffer, and the length
of data in that buffer. To contrast, kdbus takes a complex message-sending
command structure with 0..n items of m kinds that the ioctl must parse in a
m-way switching loop, and then another complex message-describing structure
which has its own 1..n items of another m kinds describing its contents,
destination-lookup options, negotiation of supported options, and so forth.

Consequently, a carefully optimized implementation of unix domain sockets
(and by extension all the data-carrying SysV etc. IPC primitives, optimized
similarly) will always be superior to kdbus for both message throughput and
latency, for the reason of kdbus' comparatively great interface complexity
alone. There's an obvious caveat here, i.e. "well where is it, then?". Given
the overhead dictated by its interface, kdbus' performance is already
inferior for short messages. For long messages (> L1 cache size per
Stetson-Harrison[0]) the only performance benefit from
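For reference, the "connection identifier, pointer to a buffer, length" shape of the unix domain socket data path described above looks roughly like this from user space. This is a minimal sketch and says nothing about relative performance. It assumes a datagram socket pair created with `socketpair()` so that message boundaries are preserved; the payload and the 4096-byte receive size are arbitrary choices for illustration.

```python
import socket

# A connected pair of unix domain sockets: point-to-point, no names or
# lookup involved, since both ends are handed out by the kernel directly.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
try:
    payload = b"hello over UDS"
    # The whole data-exchange call: a descriptor plus a buffer (the length
    # travels with the bytes object in Python).
    sent = a.send(payload)
    # The receiver likewise names only its descriptor and a buffer size.
    received = b.recv(4096)
finally:
    a.close()
    b.close()

print(sent, received)
```

Anything beyond this (naming, broadcast, policy) would have to be layered on top, which is the trade-off the paragraph above is arguing about.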
Re: kdbus: to merge or not to merge?
On Thu, Jun 25, 2015 at 09:57:45AM +0200, Geert Uytterhoeven wrote:
> >
> > in-kernel webserver
>
> Which was cool, and small, and _faster_ than anything else...
> Until it was integrated, and people working on (userspace) webservers
> started considering its performance as a target, and soon it was
> out-performed by userspace webservers...
>
> So it did teach us a lesson...
>
> (Perhaps the above paragraph is actually good advocacy for integrating
> kdbus, and for seeding a better userspace implementation? ;-)

Except back then, the userspace web servers were created by the
competition and there was a strong incentive to beat tux. But today,
kdbus is written by the same folks that write dbus, and there's no other
competition. There's no incentive to fix dbus once kdbus is merged, and
in fact, it gives incentive to just drop it completely.

-- Steve
Re: kdbus: to merge or not to merge?
On Thursday, 25 June 2015, 09:34:56, Theodore Ts'o wrote:
> On Thu, Jun 25, 2015 at 08:05:58AM +0200, Martin Steigerwald wrote:
> > Or, do you think, that there is a different option to handle this than
> > the both I outlined above?
>
> Hmm... distros could have their engineers **fix** the busted userspace
> code, instead of fixing the problem by jamming a different
> implementation into the kernel?

Hmm, I read on the Devuan mailing list that Qt engineers are working on
doing dbus directly inside Qt instead of using the existing libdbus. I have
not verified this claim yet. But considering what I read here about
performance issues with libdbus, I think it would make quite some sense.

Also, I wonder who among the desktop environment people, besides xdg-app,
will use the sdbus stuff from systemd / libsystemd. I sure hope sdbus will
work without systemd running as PID 1, but I am not clear on this either. I
doubt that Qt will depend on it, being available for more than the Linux
platform. And if GNOME wants to be portable to the BSD variants at least,
they can't depend on it either.

So who will use the non-portable sdbus anyway, except specialized apps?

In case I missed this in the discussion so far, sorry, but from what I have
read in the various threads I am really not clear on this.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: kdbus: to merge or not to merge?
On Thu, Jun 25, 2015 at 08:05:58AM +0200, Martin Steigerwald wrote:
>
> Or, do you think, that there is a different option to handle this than the
> both I outlined above?

Hmm... distros could have their engineers **fix** the busted userspace
code, instead of fixing the problem by jamming a different
implementation into the kernel?

- Ted
Re: kdbus: to merge or not to merge?
On Wed, Jun 24, 2015 at 9:12 PM, David Lang wrote:
> On Wed, 24 Jun 2015, Martin Steigerwald wrote:
>> On Wednesday, 24 June 2015, 10:39:52, David Lang wrote:
>>> On Wed, 24 Jun 2015, Ingo Molnar wrote:
>>>> And the thing is, in hindsight, after such huge flamewars, years down
>>>> the line, almost never do I see the following question asked: 'what
>>>> were we thinking merging that crap??'. If any question arises it's
>>>> usually along the lines of: 'what was the big fuss about?'. So I think
>>>> by and large the process works.
>>>
>>> counterexamples, devfs, tux
>>
>> What was tux?
>
> in-kernel webserver

Which was cool, and small, and _faster_ than anything else...
Until it was integrated, and people working on (userspace) webservers
started considering its performance as a target, and soon it was
out-performed by userspace webservers...

So it did teach us a lesson...

(Perhaps the above paragraph is actually good advocacy for integrating
kdbus, and for seeding a better userspace implementation? ;-)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
    -- Linus Torvalds
Re: kdbus: to merge or not to merge?
* Ingo Molnar wrote:
>
> * David Lang wrote:
>
> > On Wed, 24 Jun 2015, Ingo Molnar wrote:
> >
> > > And the thing is, in hindsight, after such huge flamewars, years down the
> > > line, almost never do I see the following question asked: 'what were we
> > > thinking merging that crap??'. If any question arises it's usually along
> > > the lines of: 'what was the big fuss about?'. So I think by and large the
> > > process works.
> >
> > counterexamples, devfs, tux
>
> Actually, we never merged the Tux web server upstream, and the devfs concept
> has kind of made a comeback via devtmpfs.

Bits of devfs also live on in sysfs. So devfs wasn't a bad initial idea
IMHO, but we had to do one more (incompatible ...) iteration to figure out
why we didn't like it.

Furthermore, I'm pretty sure there's a snowball's chance in hell that we'd
have ended up with the current pretty cleaned up hardware/system ABI
_without_ devfs. So it was a necessary pain.

Thanks,

	Ingo
Re: kdbus: to merge or not to merge?
* David Lang wrote:

> On Wed, 24 Jun 2015, Ingo Molnar wrote:
>
> > And the thing is, in hindsight, after such huge flamewars, years down the
> > line, almost never do I see the following question asked: 'what were we
> > thinking merging that crap??'. If any question arises it's usually along
> > the lines of: 'what was the big fuss about?'. So I think by and large the
> > process works.
>
> counterexamples, devfs, tux

Actually, we never merged the Tux web server upstream, and the devfs concept
has kind of made a comeback via devtmpfs.

And there are examples of bits we _should_ have merged:

 - GGI (General Graphics Interface)

 - [ and we should probably also have merged kgdb a decade earlier to avoid
     wasting all that energy on flaming about it unnecessarily ;-) ]

And the thing is, I specifically talked about 'near zero cost' kernel
patches that don't appreciably impact the 'core kernel'.

There's plenty of examples of features with non-trivial 'core kernel' costs
that weren't merged, and rightfully IMHO:

 - the STREAMS ABI

 - various forms of a generic kABI that were proposed

 - moving the kernel to C++ :-)

... and devfs arguably belongs into that category as well.

Thanks,

	Ingo
Re: kdbus: to merge or not to merge?
On Wed, 24 Jun 2015, Greg KH wrote:

> On Wed, Jun 24, 2015 at 10:39:52AM -0700, David Lang wrote:
>> On Wed, 24 Jun 2015, Ingo Molnar wrote:
>>
>>> And the thing is, in hindsight, after such huge flamewars, years down the
>>> line, almost never do I see the following question asked: 'what were we
>>> thinking merging that crap??'. If any question arises it's usually along
>>> the lines of: 'what was the big fuss about?'. So I think by and large the
>>> process works.
>>
>> counterexamples, devfs, tux
>
> Don't knock devfs. It created a lot of things that we take for granted
> now with our development model. Off the top of my head, here's a short
> list:
>  - it showed that we can't arbitrarily make user/kernel api changes
>    without working with people outside of the kernel developer
>    community, and expect people to follow them
>  - the idea was sound, but the implementation was not, it had
>    unfixable problems, so to fix those problems, we came up with
>    better, kernel-wide solutions, forcing us to unify all
>    device/driver subsystems.
>  - we were forced to try to document our user/kernel apis better,
>    hence Documentation/ABI/ was created
>  - to remove devfs, we had to create a structure of _how_ to remove
>    features. It took me 2-3 years to be able to finally delete the
>    devfs code, as the infrastructure and feedback loops were just not
>    in place before then to allow that to happen.
>
> So I would strongly argue that merging devfs was a good thing, it
> spurred a lot of us to get the job done correctly. Without it, we would
> have never seen the need, or had the knowledge of what needed to be
> done.

I don't disagree with you, but it was definitely a case of adding something
that was later regretted and removed. A lot was learned in the process, but
that wasn't the issue I was referring to.

I don't want kdbus to end up the same way. The more I think back to those
discussions, the more parallels I see between the two.

David Lang
Re: kdbus: to merge or not to merge?
On Wed, Jun 24, 2015 at 10:39:52AM -0700, David Lang wrote:
> On Wed, 24 Jun 2015, Ingo Molnar wrote:
>
> >And the thing is, in hindsight, after such huge flamewars, years down the
> >line, almost never do I see the following question asked: 'what were we
> >thinking merging that crap??'. If any question arises it's usually along
> >the lines of: 'what was the big fuss about?'. So I think by and large the
> >process works.
>
> counterexamples, devfs, tux

Don't knock devfs. It created a lot of things that we take for granted
now with our development model. Off the top of my head, here's a short
list:
  - it showed that we can't arbitrarily make user/kernel api changes
    without working with people outside of the kernel developer
    community, and expect people to follow them
  - the idea was sound, but the implementation was not, it had
    unfixable problems, so to fix those problems, we came up with
    better, kernel-wide solutions, forcing us to unify all
    device/driver subsystems.
  - we were forced to try to document our user/kernel apis better,
    hence Documentation/ABI/ was created
  - to remove devfs, we had to create a structure of _how_ to remove
    features. It took me 2-3 years to be able to finally delete the
    devfs code, as the infrastructure and feedback loops were just not
    in place before then to allow that to happen.

So I would strongly argue that merging devfs was a good thing, it
spurred a lot of us to get the job done correctly. Without it, we would
have never seen the need, or had the knowledge of what needed to be
done.

thanks,

greg k-h
Re: kdbus: to merge or not to merge?
On Thursday, 25 June 2015, 08:01:35, Martin Steigerwald wrote:
> On Wednesday, 24 June 2015, 19:20:27, Linus Torvalds wrote:
> > On Wed, Jun 24, 2015 at 7:14 PM, Steven Rostedt wrote:
> > > I don't think it will complicate things even if the API changes. The
> > > distros will have to deal with that fall out. Mainline only cares
> > > about its own regressions. But any API changes would only be done for
> > > good reasons, and give the distros an excuse to fix whatever was done
> > > wrong in the first place.
> >
> > I don't think that's true.
> >
> > Realistically, every single kernel developer tends to work on a
> > machine with some random distro. If that developer cannot compile his
> > own kernel because his distro stops working, or has to use some
> > "kdbus=0" switch to turn off the kernel kdbus and (hopefully) the
> > distro just switches to the legacy user mode bus, then for that
> > developer, merging and enabling incompatible kdbus implementation is
> > basically a regression.
> >
> > We've seen this before. We end up stuck with the ABI of whatever user
> > land applications. It doesn't matter where that ABI came from.
> >
> > I do agree that distros that want to enable kdbus before any agreed
> > version has been merged would get to also act as guinea pigs and do
> > their own QA, and handle fallout from whatever problems they encounter
> > etc. That part might be good. But I don't think we really end up
> > having the option to make up some incompatible kdbus ABI
> > after-the-fact.
>
> Linus, so is that a recommendation to the distros to be careful about
> putting kdbus into the distro kernel right now, and probably better to
> defer it? Or are you thinking that the ABI of kdbus is already suitable
> for merging, and you see no issues with merging a kdbus with the ABI it
> currently has, but probably otherwise improved?

Or do you think that there is a different option to handle this than the
two I outlined above?

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: kdbus: to merge or not to merge?
On Wednesday, 24 June 2015, 19:20:27, Linus Torvalds wrote:
> On Wed, Jun 24, 2015 at 7:14 PM, Steven Rostedt wrote:
> > I don't think it will complicate things even if the API changes. The distros will have to deal with that fall out. Mainline only cares about its own regressions. But any API changes would only be done for good reasons, and give the distros an excuse to fix whatever was done wrong in the first place.
>
> I don't think that's true.
>
> Realistically, every single kernel developer tends to work on a machine with some random distro. If that developer cannot compile his own kernel because his distro stops working, or has to use some "kdbus=0" switch to turn off the kernel kdbus and (hopefully) the distro just switches to the legacy user mode bus, then for that developer, merging and enabling incompatible kdbus implementation is basically a regression.
>
> We've seen this before. We end up stuck with the ABI of whatever user land applications. It doesn't matter where that ABI came from.
>
> I do agree that distros that want to enable kdbus before any agreed version has been merged would get to also act as guinea pigs and do their own QA, and handle fallout from whatever problems they encounter etc. That part might be good. But I don't think we really end up having the option to make up some incompatible kdbus ABI after-the-fact.

Linus, so is that a recommendation to the distros to be careful about putting kdbus into the distro kernel right now, and probably better to defer it? Or are you thinking that the ABI of kdbus is already suitable for merging, and you see no issues with merging kdbus with the ABI it currently has, though probably otherwise improved?

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: kdbus: to merge or not to merge?
On Wed, 24 Jun 2015, Greg KH wrote:
> On Wed, Jun 24, 2015 at 10:39:52AM -0700, David Lang wrote:
> > On Wed, 24 Jun 2015, Ingo Molnar wrote:
> > > And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works.
> >
> > counterexamples, devfs, tux
>
> Don't knock devfs. It created a lot of things that we take for granted now with our development model. Off the top of my head, here's a short list:
>
> - it showed that we can't arbitrarily make user/kernel api changes without working with people outside of the kernel developer community, and expect people to follow them
> - the idea was sound, but the implementation was not, it had unfixable problems, so to fix those problems, we came up with better, kernel-wide solutions, forcing us to unify all device/driver subsystems.
> - we were forced to try to document our user/kernel apis better, hence Documentation/ABI/ was created
> - to remove devfs, we had to create a structure of _how_ to remove features. It took me 2-3 years to be able to finally delete the devfs code, as the infrastructure and feedback loops were just not in place before then to allow that to happen.
>
> So I would strongly argue that merging devfs was a good thing, it spurred a lot of us to get the job done correctly. Without it, we would have never seen the need, or had the knowledge of what needed to be done.

I don't disagree with you, but it was definitely a case of adding something that was later regretted and removed. A lot was learned in the process, but that wasn't the issue I was referring to.

I don't want kdbus to end up the same way. The more I think back to those discussions, the more parallels I see between the two.

David Lang
Re: kdbus: to merge or not to merge?
* David Lang da...@lang.hm wrote:
> On Wed, 24 Jun 2015, Ingo Molnar wrote:
> > And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works.
>
> counterexamples, devfs, tux

Actually, we never merged the Tux web server upstream, and the devfs concept has kind of made a comeback via devtmpfs.

And there are examples of bits we _should_ have merged:

- GGI (General Graphics Interface)
- [ and we should probably also have merged kgdb a decade earlier to avoid wasting all that energy on flaming about it unnecessarily ;-) ]

And the thing is, I specifically talked about 'near zero cost' kernel patches that don't appreciably impact the 'core kernel'.

There's plenty of examples of features with non-trivial 'core kernel' costs that weren't merged, and rightfully IMHO:

- the STREAMS ABI
- various forms of a generic kABI that were proposed
- moving the kernel to C++ :-)

... and devfs arguably belongs into that category as well.

Thanks,

Ingo
Re: kdbus: to merge or not to merge?
* Ingo Molnar mi...@kernel.org wrote:
> * David Lang da...@lang.hm wrote:
> > On Wed, 24 Jun 2015, Ingo Molnar wrote:
> > > And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works.
> >
> > counterexamples, devfs, tux
>
> Actually, we never merged the Tux web server upstream, and the devfs concept has kind of made a comeback via devtmpfs.

Bits of devfs also live on in sysfs. So devfs wasn't a bad initial idea IMHO, but we had to do one more (incompatible ...) iteration to figure out why we didn't like it.

Furthermore, I'm pretty sure there's a snowball's chance in hell that we'd have ended up with the current pretty cleaned up hardware/system ABI _without_ devfs. So it was a necessary pain.

Thanks,

Ingo
Re: kdbus: to merge or not to merge?
On Wed, Jun 24, 2015 at 9:12 PM, David Lang da...@lang.hm wrote:
> On Wed, 24 Jun 2015, Martin Steigerwald wrote:
> > On Wednesday, 24 June 2015, 10:39:52, David Lang wrote:
> > > On Wed, 24 Jun 2015, Ingo Molnar wrote:
> > > > And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works.
> > >
> > > counterexamples, devfs, tux
> >
> > What was tux?
>
> in-kernel webserver

Which was cool, and small, and _faster_ than anything else...

Until it was integrated, and people working on (userspace) webservers started considering its performance as a target, and soon it was out-performed by userspace webservers...

So it did teach us a lesson...

(Perhaps the above paragraph is actually good advocacy for integrating kdbus, and for seeding a better userspace implementation? ;-)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds
Re: kdbus: to merge or not to merge?
On Thu, Jun 25, 2015 at 08:05:58AM +0200, Martin Steigerwald wrote:
> Or do you think that there is a different option to handle this than the two I outlined above?

Hmm... distros could have their engineers **fix** the busted userspace code, instead of fixing the problem by jamming a different implementation into the kernel?

- Ted
Re: kdbus: to merge or not to merge?
On Thu, Jun 25, 2015 at 09:57:45AM +0200, Geert Uytterhoeven wrote:
> > in-kernel webserver
>
> Which was cool, and small, and _faster_ than anything else...
>
> Until it was integrated, and people working on (userspace) webservers started considering its performance as a target, and soon it was out-performed by userspace webservers...
>
> So it did teach us a lesson...
>
> (Perhaps the above paragraph is actually good advocacy for integrating kdbus, and for seeding a better userspace implementation? ;-)

Except back then, the userspace web servers were created by the competition and there was a strong incentive to beat tux. But today, kdbus is written by the same folks that write dbus, and there's no other competition. There's no incentive to fix dbus once kdbus is merged, and in fact, it gives incentive to just drop it completely.

-- Steve
Re: kdbus: to merge or not to merge?
On Thursday, 25 June 2015, 09:34:56, Theodore Ts'o wrote:
> On Thu, Jun 25, 2015 at 08:05:58AM +0200, Martin Steigerwald wrote:
> > Or do you think that there is a different option to handle this than the two I outlined above?
>
> Hmm... distros could have their engineers **fix** the busted userspace code, instead of fixing the problem by jamming a different implementation into the kernel?

Hmm, I read on the Devuan mailing list that Qt engineers are working on doing dbus directly inside Qt instead of using the existing libdbus. I did not verify this claim yet. But considering what I read here about performance issues with libdbus, I think it would make quite some sense.

Also, I wonder who among the desktop environment people, besides xdg-app, will use the sdbus stuff from systemd / libsystemd. (I sure hope sdbus will work without systemd running as PID 1, but I am not clear on this either.) I doubt that Qt will depend on it, being available for more than the Linux platform. And if GNOME wants to be portable to the BSD variants at least, they can't depend on it either.

So who will use the non-portable sdbus anyway, except specialized apps?

In case I missed this in the discussion so far, sorry, but from what I read in the various threads I am really not clear on this.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: kdbus: to merge or not to merge?
On Wed, Jun 24, 2015 at 10:39:52AM -0700, David Lang wrote:
> On Wed, 24 Jun 2015, Ingo Molnar wrote:
> > And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works.
>
> counterexamples, devfs, tux

Don't knock devfs. It created a lot of things that we take for granted now with our development model. Off the top of my head, here's a short list:

- it showed that we can't arbitrarily make user/kernel api changes without working with people outside of the kernel developer community, and expect people to follow them
- the idea was sound, but the implementation was not, it had unfixable problems, so to fix those problems, we came up with better, kernel-wide solutions, forcing us to unify all device/driver subsystems.
- we were forced to try to document our user/kernel apis better, hence Documentation/ABI/ was created
- to remove devfs, we had to create a structure of _how_ to remove features. It took me 2-3 years to be able to finally delete the devfs code, as the infrastructure and feedback loops were just not in place before then to allow that to happen.

So I would strongly argue that merging devfs was a good thing, it spurred a lot of us to get the job done correctly. Without it, we would have never seen the need, or had the knowledge of what needed to be done.

thanks,

greg k-h
Re: kdbus: to merge or not to merge?
On Wed, Jun 24, 2015 at 7:14 PM, Steven Rostedt wrote:
>
> I don't think it will complicate things even if the API changes. The distros will have to deal with that fall out. Mainline only cares about its own regressions. But any API changes would only be done for good reasons, and give the distros an excuse to fix whatever was done wrong in the first place.

I don't think that's true.

Realistically, every single kernel developer tends to work on a machine with some random distro. If that developer cannot compile his own kernel because his distro stops working, or has to use some "kdbus=0" switch to turn off the kernel kdbus and (hopefully) the distro just switches to the legacy user mode bus, then for that developer, merging and enabling incompatible kdbus implementation is basically a regression.

We've seen this before. We end up stuck with the ABI of whatever user land applications. It doesn't matter where that ABI came from.

I do agree that distros that want to enable kdbus before any agreed version has been merged would get to also act as guinea pigs and do their own QA, and handle fallout from whatever problems they encounter etc. That part might be good. But I don't think we really end up having the option to make up some incompatible kdbus ABI after-the-fact.

Linus
Re: kdbus: to merge or not to merge?
On Tue, Jun 23, 2015 at 08:07:41AM -0700, Andy Lutomirski wrote:
>
> FWIW, once there are real distros with kdbus userspace enabled, reviewing kdbus gets more complicated -- we'll be in the position where merging kdbus in a different form from that which was proposed will break existing users.

Actually, I think distros having it in their kernel before it's in mainline is actually a good thing. Let them straighten out the issues that may come up (not to mention possible CVEs). If the distros have it in their kernels and out in the public for 6 months or more, that may give enough information as to whether or not it should be merged.

I don't think it will complicate things even if the API changes. The distros will have to deal with that fall out. Mainline only cares about its own regressions. But any API changes would only be done for good reasons, and give the distros an excuse to fix whatever was done wrong in the first place.

-- Steve
Re: kdbus: to merge or not to merge?
On Wed, Jun 24, 2015 at 9:43 PM, Andy Lutomirski wrote:
> On Wed, Jun 24, 2015 at 10:11 AM, Alexander Larsson wrote:
>> My name is on the dbus specification, and I am (and was then) well aware of systems with object references. In fact, both previous ipc systems (Corba ORBs) that Gnome used before we designed dbus used object references, and they had a lot of problems which dbus solved for us. I'm not saying dbus is perfect, but it has its reasons for the way it works.
>>
>> So, dbus-the-system has some kind of notion of an object reference (peer + object path), but the *bus* is fundamentally a way to communicate between peers, and the object path is merely some uninterpreted metadata.
>
> I'm talking about the reference part, not the object part. Peer + object path is a name, not a reference.

True, it's not a reference in the "refcount" style.

>> You wish that the kernel controlled access to a particular object in a peer, but once the message is dispatched into the target process all bets are off anyway. It will be running some code parsing your message in the process with no real separation from the other objects. Any bug there could give you wider access. I don't see how this fundamentally makes the whole system much more secure. On the other hand, I *do* remember having to track down cross-process leaks from circular references between processes using Corba...
>
> If you have peer ids keeping things alive on dbus, surely you can also have circular references, no?

Technically you could set up a situation where this happens, but in practice it doesn't really. Because object paths don't keep other processes alive you don't accidentally get circular references, whereas this happened a lot on corba because references were the only thing you had.

>> You can run three instances of an app, but only one of them can own the bus name. This works out fine if your app does not use dbus, but it may be a problem if it uses dbus activation.
>
> I'd really like to be able to xdg-app --stateless oowriter some_untrusted_file.docx and have it fully functional, including printing, even if I have another instance running.

If that were to work then you'd have to have a way to make all the session services that are needed for it to work also listen to the new custom bus for only that app.

>> Well, the service providers are not the same as the portals. Say you have a twitter client that you want to register to be able to share some selected text. The twitter client can be fully sandboxed. The portal is just the link between the requesting app and the list of registered share providers.
>
> Ah. I clearly am misunderstanding something. What's a portal?

Well, portal is a general name for "service needed for making sandboxed apps work". So, they can be a bit different, but in essence they are small dbus services that facilitate communication between different apps and between the app and the host session, in a safe way. Think of them sort of like filtering proxies, but with a gui.

>> Well, that is essentially what a portal like the share one does. Although it shows a user controlled UI in between to avoid the app being able to start any other app it wants.
>
> Hmm. So shouldn't xdg-app be explicitly choosing which portals are allowed for which sandboxed apps rather than allowing org.freedesktop.portal.*?

Right now there is no default policy for this, as we don't really have the portal system fully formed yet. But, yeah, using portal.* was an example of a policy, another would be to list the allowed portals explicitly.

>> You're free to design such a system and write a desktop to use it. However, in Gnome (and in the other desktops as well), dbus is already used for all ipc like this and all the freedesktop specs, infrastructure, type systems, interfaces, code and frameworks are built around that. There has to be a *massive* advantage for us to use something else, and I'm not at all convinced by the issues you bring up.
>
> The custom endpoint policy thing is brand new, whereas using a userspace proxy for xdg-app actually sounds easier than using a separate kdbus bus. Sticking dbus in the kernel would also be new.

Yeah, some code in the middle is new, but the entire infrastructure and semantics are the same. I got the feeling you were proposing something completely different to dbus.
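The name-versus-reference distinction argued over in this exchange can be sketched in a few lines. The following is a toy model only, not the dbus or kdbus API: `ToyBus`, `Service`, the `/subscribe` path and the peer ids are all hypothetical names made up for illustration. It assumes a bus that routes on the peer id alone, treats the object path as opaque metadata, and tells interested parties when a peer goes away so that a "Subscribe"-style service can clean up.

```python
from collections import defaultdict


class ToyBus:
    """Hypothetical bus that tracks peers only, never individual objects."""

    def __init__(self):
        self.peers = {}                    # peer id -> handler(path, payload)
        self.watchers = defaultdict(list)  # peer id -> callbacks run when it exits

    def connect(self, peer_id, handler):
        self.peers[peer_id] = handler

    def send(self, dest, path, payload):
        # Routing uses the peer id; the object path is passed through untouched.
        handler = self.peers.get(dest)
        if handler is None:
            raise LookupError(f"no such peer: {dest}")
        return handler(path, payload)

    def watch_peer(self, peer_id, callback):
        self.watchers[peer_id].append(callback)

    def disconnect(self, peer_id):
        # Holding the name (peer, path) never kept the peer alive;
        # watchers are simply told that it is gone.
        self.peers.pop(peer_id, None)
        for callback in self.watchers.pop(peer_id, []):
            callback(peer_id)


class Service:
    """Hypothetical service that interprets object paths however it likes."""

    def __init__(self, bus):
        self.bus = bus
        self.subscribers = set()

    def handle(self, path, payload):
        if path == "/subscribe":
            sender = payload  # toy convention: payload carries the sender id
            self.subscribers.add(sender)
            # Cleanup is tied to the lifetime of the subscribing peer.
            self.bus.watch_peer(sender, self.subscribers.discard)
            return "subscribed"
        # Objects can be created lazily or derived from the path itself.
        return f"object at {path}"


bus = ToyBus()
svc = Service(bus)
bus.connect(":1.1", svc.handle)
bus.connect(":1.2", lambda path, payload: None)

bus.send(":1.1", "/subscribe", ":1.2")
bus.disconnect(":1.2")  # the service drops the subscription automatically
```

In this model nothing stops a client from keeping the string pair (":1.1", "/some/object") forever, which is the sense in which it is a name rather than a reference: it neither keeps the target alive nor can be handed to a third party as a capability.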
Re: kdbus: to merge or not to merge?
On Wed, Jun 24, 2015 at 10:11 AM, Alexander Larsson wrote:
> On Wed, Jun 24, 2015 at 5:38 PM, Andy Lutomirski wrote:
>> Was this intentionally off-list?
>
> Nah, that was a mistake, adding back the list.
>
>> On Wed, Jun 24, 2015 at 8:10 AM, Alexander Larsson wrote:
>>> The way i did it in the userspace proxy is to allow peer exited messages from services that talked to you at some point, as this is the core requirement (you must be able to limit things to the lifetime of clients). However, i can see how tracking that in the kernel is a bit painful, so just allowing all is probably a reasonable choice.
>>
>> Hmm. I guess this is an ugliness of dbus in general. Since dbus doesn't really have a concept of objects (AIUI) you can't really get a notification that a particular object that you have a reference to is gone, so you have to ask for notification that the peer providing the object is gone, but there was never any concept of having a reference to a peer, so here we are :(
>
> You keep using words like ugly and stupid, which isn't super impressive.

Fair enough. On the other hand, I've called my own code ugly plenty of times.

> My name is on the dbus specification, and I am (and was then) well aware of systems with object references. In fact, both previous ipc systems (Corba ORBs) that Gnome used before we designed dbus used object references, and they had a lot of problems which dbus solved for us. I'm not saying dbus is perfect, but it has its reasons for the way it works.
>
> So, dbus-the-system has some kind of notion of an object reference (peer + object path), but the *bus* is fundamentally a way to communicate between peers, and the object path is merely some uninterpreted metadata.

I'm talking about the reference part, not the object part. Peer + object path is a name, not a reference.

> Once the message reaches the destination process it is essentially free to interpret the object path however it wants. If something needs a long lasting "reference" to an object you can implement that by e.g. using a Subscribe method, and you can guarantee cleanup because the bus will tell you if the peer died.

Except you can't pass them around. So it's still reference-by-name instead of reference-by-actual-reference.

> This also means that the bus itself is vastly simplified. It only has to track peers, not every object in every peer. And clients are more flexible with how objects are handled. They can be instantiated lazily, or even created algorithmically from the object path if needed.

True. Nonetheless, things like Cap'n Proto and seL4 are quite simple and have real references.

> You wish that the kernel controlled access to a particular object in a peer, but once the message is dispatched into the target process all bets are off anyway. It will be running some code parsing your message in the process with no real separation from the other objects. Any bug there could give you wider access. I don't see how this fundamentally makes the whole system much more secure. On the other hand, I *do* remember having to track down cross-process leaks from circular references between processes using Corba...

If you have peer ids keeping things alive on dbus, surely you can also have circular references, no?

>>> The desktop file lists the icon, name and whatnot which is displayed by the desktop environment. If DBusActivatable is true, then the app is started by sending dbus messages to the same name as the desktop file, to the org.freedesktop.Application interface, this way we can ensure a singleton app and you can do more things than just spawning it.
>>
>> How do I install apps as an unprivileged user? What about running sandboxed apps that aren't installed at all? What about downloading one app and running three instances of it that are all isolated from each other?
>
> Users install desktop files in a file in their home directory (~/.local/share/applications/ typically).
>
> xdg-app apps require some form of installation before running.

IMO that's unfortunate. If nothing else, it prevents programs from easily starting one-off sandboxed apps that weren't separately installed.

> You can run three instances of an app, but only one of them can own the bus name. This works out fine if your app does not use dbus, but it may be a problem if it uses dbus activation.

I'd really like to be able to xdg-app --stateless oowriter some_untrusted_file.docx and have it fully functional, including printing, even if I have another instance running.

>>> Well, your "other than" part kinda breaks things like launching the application. So, we need to be on the real bus.
>>> Could you then *also* have a bus per app for talking to the portal? I guess so, but I don't quite see the point. Just having the portals trying to find all new buses that come and go will be all kinds of
Re: kdbus: to merge or not to merge?
On Wed, 24 Jun 2015, Martin Steigerwald wrote:
> On Wednesday, 24 June 2015, 10:39:52, David Lang wrote:
> > On Wed, 24 Jun 2015, Ingo Molnar wrote:
> > > And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works.
> >
> > counterexamples, devfs, tux
>
> What was tux?

in-kernel webserver

David Lang
Re: kdbus: to merge or not to merge?
On Wednesday, 24 June 2015, 10:39:52, David Lang wrote:
> On Wed, 24 Jun 2015, Ingo Molnar wrote:
> > And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works.
>
> counterexamples, devfs, tux

What was tux? The filesystem tux3 is not merged as far as I am aware.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: kdbus: to merge or not to merge?
David Lang writes: > On Wed, 24 Jun 2015, Ingo Molnar wrote: > >> And the thing is, in hindsight, after such huge flamewars, years down the line, >> almost never do I see the following question asked: 'what were we thinking merging >> that crap??'. If any question arises it's usually along the lines of: 'what was >> the big fuss about?'. So I think by and large the process works. > > counterexamples, devfs, tux The biggest I can think of is cgroups: the way cgroups connect to processes instead of resources (semantically), and the fact that controllers are different from fundamental entities like schedulers. Of course I don't think "What were we thinking"; I remember it all too well in that case. I think "What do we do now that we have made this mess". Eric
Re: kdbus: to merge or not to merge?
On Wed, 24 Jun 2015, Ingo Molnar wrote: > And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works. counterexamples, devfs, tux David Lang
Re: kdbus: to merge or not to merge?
On Wed, Jun 24, 2015 at 5:38 PM, Andy Lutomirski wrote: > Was this intentionally off-list? Nah, that was a mistake, adding back the list. > On Wed, Jun 24, 2015 at 8:10 AM, Alexander Larsson wrote: >> The way I did it in the userspace proxy is to allow peer exited >> messages from services that talked to you at some point, as this is >> the core requirement (you must be able to limit things to the lifetime >> of clients). However, I can see how tracking that in the kernel is a >> bit painful, so just allowing all is probably a reasonable choice. >> > > Hmm. I guess this is an ugliness of dbus in general. Since dbus > doesn't really have a concept of objects (AIUI) you can't really get a > notification that a particular object that you have a reference to is > gone, so you have to ask for notification that the peer providing the > object is gone, but there was never any concept of having a reference > to a peer, so here we are :( You keep using words like ugly and stupid, which isn't super impressive. My name is on the dbus specification, and I am (and was then) well aware of systems with object references. In fact, both previous ipc systems (Corba ORBs) that Gnome used before we designed dbus used object references, and they had a lot of problems which dbus solved for us. I'm not saying dbus is perfect, but it has its reasons for the way it works. So, dbus-the-system has some kind of notion of an object reference (peer + object path), but the *bus* is fundamentally a way to communicate between peers, and the object path is merely some uninterpreted metadata. Once the message reaches the destination process, that process is essentially free to interpret the object path however it wants. If something needs a long lasting "reference" to an object you can implement that by e.g. using a Subscribe method, and you can guarantee cleanup because the bus will tell you if the peer died. This also means that the bus itself is vastly simplified. 
It only has to track peers, not every object in every peer. And clients are more flexible with how objects are handled. They can be instantiated lazily, or even created algorithmically from the object path if needed. You wish that the kernel controlled access to a particular object in a peer, but once the message is dispatched into the target process all bets are off anyway. It will be running some code parsing your message in the process with no real separation from the other objects. Any bug there could give you wider access. I don't see how this fundamentally makes the whole system much more secure. On the other hand, I *do* remember having to track down cross-process leaks from circular references between processes using Corba... >> The desktop file lists the icon, name and whatnot which is displayed >> by the desktop environment. If DBusActivatable is true, then the app >> is started by sending dbus messages to the same name as the desktop >> file, to the org.freedesktop.Application interface, this way we can >> ensure a singleton app and you can do more things than just spawning >> it. > > How do I install apps as an unprivileged user? What about running > sandboxed apps that aren't installed at all? What about downloading > one app and running three instances of it that are all isolated from > each other? Users install desktop files in a file in their home directory (~/.local/share/applications/ typically). xdg-app apps require some form of installation before running. You can run three instances of an app, but only one of them can own the bus name. This works out fine if your app does not use dbus, but it may be a problem if it uses dbus activation. >> Well, your "other than" part kinda breaks things like launching the >> application. So, we need to be on the real bus. >> Could you then *also* have a bus per app for talking to the portal? I >> guess so, but I don't quite see the point. 
Just having the portals >> trying to find all new buses that come and go will be all kinds of >> painful. > > How many portals will there be? It seems like, if you want multiple > portals programs (in the org.freedesktop.portal.* sense), then you'd > have some awkwardness if each app were on its own bus and you didn't > want a proxy, but I think you'll also have prevented yourself from > meaningfully sandboxing the portals themselves. You can sandbox the portals to some extent, but fundamentally they are meant to run in some kind of "higher privileges" mode, so they have to have access to things normal apps would not. For instance, they have to be able to activate other dbus names. > Android, on the other hand, sandboxes most of its service providers, > and Binder provides a nice way to selectively grant capabilities > between sandboxes. (The privacy and security disaster that's built on > top of Binder is another story, but that's not Binder's fault.) Well, the service providers are not the same as the portals. Say you have a twitter client that you want to register to be able to
Re: kdbus: to merge or not to merge?
On Wed, Jun 24, 2015 at 2:55 AM, Alexander Larsson wrote: > > I don't really understand this objection. I'm working on an > application sandboxing model for desktop applications (xdg-app), and > the kdbus model matches my needs well. In fact, I'm currently using a > userspace filtering proxy that implements exactly the kdbus policy > model. Of course, this adds *yet* another context switch per message. > The only problem I found is that kdbus filtering broke the ability to > track the lifetime of clients[1]. However, this has now been fixed > with exactly the change you complain about above. I find myself wondering whether the change I complain about will be a problem down the road. It's certainly an information leak of some sort. Whether the information that it leaks is valuable to anyone is an interesting question. > > I definitely don't want to do low level request interception with UI. > We learned long ago that it is a very poor fit for desktop use. At the > interception point you have no context at all about the larger scope, > such as what window caused the operation and how you would make it > modal or even just get the window parenting right. Also, if you do > this you will keep popping up windows all the time as apps do calls in > the background to be able to e.g. gray out unavailable menu items, > update folder counts, etc. Any operation that may cause user > interaction must be carefully designed to handle this. > > The way I expect to use kdbus policy, for an app called say > "org.gnome.gedit" is to have the following policy: > TALK org.freedesktop.DBus > OWN org.gnome.gedit > OWN org.gnome.gedit.* > TALK org.freedesktop.portal.* Aha! You're not doing what I assumed you were doing at all. > > This allows the app to conntect to and talk to the bus, own its own > name and broadcast signals. It also lets anyone else (that are not > sandboxed) talk to the app and it will be able to reply. 
This is > enough to have regular dbus activation of desktop files[2], as well > as allowing app-related custom services. Do I understand correctly that you're committing to an iOS-like model in which activations go to a particular named app as opposed to a more Android-like model in which multiple providers can offer the same service? > > It also allows the app to talk to a set of "portals" which are > sandbox-specific APIs that supply the necessary services for sandboxed > apps to interact with each other and the host. [snip description of what the portal does] This seems generally sensible. Here are my concerns. Feel free to tell me I'm nuts or ask me more. 1. Other than allowing non-sandboxed code to contact sandboxed apps directly (as opposed to via the portal), I still don't see how this is better than having a completely separate kdbusfs instance (or userspace socket or whatever) per app. The only things on the outside the app talks to are org.freedesktop.portal.*, and whatever service provides them could be taught to provide them to more than one running sandboxed app. By doing it with a policy rule like this, you're at risk of random non-sandboxed programs having a bright idea to offer some completely insecure service with a name like "org.freedesktop.portal.badidea" that destroys security. See, for example, the tons of reports of exploitable Android system services that shouldn't have been there in the first place. By using this type of policy rule, you're also preventing meaningful use of two different portal implementations -- their names will collide. That's fine when there's exactly one implementation that you're developing, but it might be nice to be able to run some apps under a super-locked-down portal, some under a standard portal, and some under some other fork of the portal, all at once. 2. Without seeing more details, I don't see how you will defend against name collisions. 
By allowing a sandboxed application to claim a well-known name with global significance (e.g. org.freedesktop.gedit), you're vulnerable to apps that maliciously claim some other app's name (e.g. by sticking it in their manifest or whatever). Search for the iOS "XARA" attacks, which mostly work like this and almost completely break iOS security (currently unfixed AFAIK). 3. Due to the IMO absurd way that kdbus policy works, you think you're limiting sandboxed apps to talking to names that match entries in your policy table. Instead, you're limiting sandboxed apps to talking to peer ids that advertise names that match entries in your policy table. As I understand it, you are completely and utterly hosed if your portal implements org.freedesktop.portal.secure_printing and org.freedesktop.admin.something_else. This issue is a large part of the reason that I consider kdbus' policy framework to be an unacceptable design. > Now, there will likely be a few cases where we need a more > fine-grained access limit. For instance you may have a service that > dynamically grants access to particular objects in
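Concern 3 above can be illustrated with a toy model. This is only a sketch of the behavior as described in the message ("As I understand it"), not kdbus code: the `Peer` class, the `allowed_by_peer` and `allowed_by_name` helpers, and the rule that a trailing `.*` is a prefix wildcard are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical TALK policy for a sandboxed app (prefix wildcard is an assumption).
TALK_POLICY = ["org.freedesktop.DBus", "org.freedesktop.portal.*"]


def name_matches(pattern: str, name: str) -> bool:
    """Match a well-known name against one policy entry.

    A trailing '.*' is treated as a prefix wildcard (assumed semantics).
    """
    if pattern.endswith(".*"):
        return name.startswith(pattern[:-1])  # keep the trailing dot
    return name == pattern


@dataclass
class Peer:
    """A bus connection together with the well-known names it owns."""
    peer_id: int
    names: set = field(default_factory=set)


def allowed_by_peer(policy, peer: Peer) -> bool:
    """Peer-granular check: allowed if ANY name owned by the peer matches."""
    return any(name_matches(p, n) for p in policy for n in peer.names)


def allowed_by_name(policy, destination_name: str) -> bool:
    """Name-granular check: allowed only if the addressed name itself matches."""
    return any(name_matches(p, destination_name) for p in policy)


# One process owning both a portal name and an unrelated admin name.
portal = Peer(1, {
    "org.freedesktop.portal.secure_printing",
    "org.freedesktop.admin.something_else",
})

# Peer-granular filtering lets the sandboxed app reach the whole peer,
# including the admin interface it was never meant to see.
print(allowed_by_peer(TALK_POLICY, portal))
# Name-granular filtering would have refused the admin name.
print(allowed_by_name(TALK_POLICY, "org.freedesktop.admin.something_else"))
```

Under these assumptions the first check succeeds and the second fails, which is the gap the message complains about: the policy table reads as if it restricts names, while the enforcement unit is the peer.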
Re: kdbus: to merge or not to merge?
* Martin Steigerwald wrote: > On Wednesday, 24 June 2015, 10:05:02, Ingo Molnar wrote: > > Not because I like it so much, but because I think the merge process > > should be stripped of politics and emotion as much as possible: if an > > initial submission is good and addresses all technical review properly, > > and if the cost to the core kernel is low, then barring alternative, > > fully equivalent and superior patch submissions, rejecting it does more > > harm than good. > > Now that is an interesting challenge. > > As I realize more and more we are all feeling beings. > > Linus himself according to his own words as I received them wants to make > perfectly sure that the developer who receives a message from him exactly > knows how he feels, especially when he disagrees with a pull request and > does not want to take it. So that twists what I said: how 'I feel about a pull request' is a technical term for: 'what is my subjective but rational technological opinion' about it. That's not an invitation to be irrationally emotional. (I'm reasonably sure that's what Linus meant there too, but I don't speak for him.) Thanks, Ingo
Re: kdbus: to merge or not to merge?
* Martin Steigerwald wrote: > On Wednesday, 24 June 2015, 10:05:02, Ingo Molnar wrote: > > > - Once one (or two) major distros go with kdbus, it becomes a de-facto ABI. > > If the ABI is bad then that distro will hurt from it regardless of whether we > > merge it upstream or not - so technical pressure is there to improve it. But > > if the kernel refuses to merge it, Linux users will get hurt > > disproportionately badly. The kernel not being the first mover with a new ABI > > is absolutely sensible. But once Linux distros have taken the initial > > (non-trivial) plunge, not merging a zero-cost ABI upstream becomes more like > > revenge and obstruction, which is not productive. The kernel has very little > > value without full user-space, after all, so within reason the kernel project > > has to own up to distro ABI mistakes as well. > > So, in order to merge something that is not accepted upstream yet, is it an > accepted way to encourage distros to use it nonetheless, to get it upstream then > anyway as in "as, look, now this and this distro uses it"? > > When I read > > > Not because I like it so much, but because I think the merge process should be > > stripped of politics and emotion as much as possible: if an initial submission > > is good and addresses all technical review properly, and if the cost to the > > core kernel is low, then barring alternative, fully equivalent and superior > > patch submissions, rejecting it does more harm than good. > > I think you didn't mean it that way, as you state proper technical review as a > requirement. > > Can you clarify? There's no conflict: when merging something upstream, technical feedback has to be addressed. AFAICS that is what happened when we merged controversial bits in the past where Linux distros jumped the gun: such as AppArmor or Binder. 
The main question that gets eliminated by a major distro using something is the (important) question of: 'does the Linux kernel need an ABI like that?'. Distros still run a considerable risk when forking new ABIs, obviously - as 'pre release' ABIs rarely survive upstreaming, and there's no guarantee that it will be accepted upstream. > Still as far as I got it, Andy raised technical concerns which Greg outrightly > rejected as invalid without any further explanation. That does not seem like > technical concerns have been properly addressed to me. I haven't seen such responses but maybe I haven't managed to dig deep enough into the rather sizable discussion. Not addressing valid technical feedback would be a first for Greg in my book, so he definitely deserves the benefit of doubt from me. And the thing is, in hindsight, after such huge flamewars, years down the line, almost never do I see the following question asked: 'what were we thinking merging that crap??'. If any question arises it's usually along the lines of: 'what was the big fuss about?'. So I think by and large the process works. Thanks, Ingo
Re: kdbus: to merge or not to merge?
On Wednesday, 24 June 2015, 10:05:02, Ingo Molnar wrote: > Not because I like it so much, but because I think the merge process > should be stripped of politics and emotion as much as possible: if an > initial submission is good and addresses all technical review properly, > and if the cost to the core kernel is low, then barring alternative, > fully equivalent and superior patch submissions, rejecting it does more > harm than good. Now that is an interesting challenge. As I realize more and more we are all feeling beings. Linus himself according to his own words as I received them wants to make perfectly sure that the developer who receives a message from him exactly knows how he feels, especially when he disagrees with a pull request and does not want to take it. To my perception the whole kernel development process is quite full of emotion, including your message I reply to. And now you want to get rid of it. I bet you can. If you remove Linus… and every other kernel developer from the development process, including yourself. But then, who will develop the kernel? I think a different way to handle emotions can help, and I intend to handle them this way to see what results I create. I am aiming to feel my feelings as they are, instead of immediately judging them or attaching a thought to them, basically making them emotions and distorting them that way, blocking my energy in them [1]. So I will attempt to feel my feelings before I answer again. I didn't do so in the last answer to you, and I think it shows. [1] Arnold M. Patent, "You can have it all" Thanks, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: kdbus: to merge or not to merge?
On Wednesday, 24 June 2015, 10:05:02, Ingo Molnar wrote: > - Once one (or two) major distros go with kdbus, it becomes a de-facto > ABI. If the ABI is bad then that distro will hurt from it regardless of > whether we merge it upstream or not - so technical pressure is there to > improve it. But if the kernel refuses to merge it, Linux users will get > hurt disproportionately badly. The kernel not being the first mover with > a new ABI is absolutely sensible. But once Linux distros have taken the > initial (non-trivial) plunge, not merging a zero-cost ABI upstream > becomes more like revenge and obstruction, which is not productive. The > kernel has very little value without full user-space, after all, so > within reason the kernel project has to own up to distro ABI mistakes as > well. So, in order to merge something that is not accepted upstream yet, is it an accepted way to encourage distros to use it nonetheless, to get it upstream then anyway as in "as, look, now this and this distro uses it"? When I read > Not because I like it so much, but because I think the merge process > should be stripped of politics and emotion as much as possible: if an > initial submission is good and addresses all technical review properly, > and if the cost to the core kernel is low, then barring alternative, > fully equivalent and superior patch submissions, rejecting it does more > harm than good. I think you didn't mean it that way, as you state proper technical review as a requirement. Can you clarify? Still, as far as I got it, Andy raised technical concerns which Greg outrightly rejected as invalid without any further explanation. To me, that does not look like technical concerns having been properly addressed. 
-- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: kdbus: to merge or not to merge?
Ingo Molnar writes: > Not because I like it so much, but because I think the merge process should be > stripped of politics and emotion as much as possible: if an initial submission is > good and addresses all technical review properly, and if the cost to the core > kernel is low, then barring alternative, fully equivalent and superior patch > submissions, rejecting it does more harm than good. This is largely not what happened with kdbus. The initial submission was problematic. Many pieces of technical review were not addressed at the time a pull request was sent to Linus. Even now there are remaining outstanding technical items such as performance that have not been addressed. The cost to the rest of the core is potentially quite high as parts of kdbus double down on the worst mistakes in user interface of the kernel. Politics and emotion are involved because the discussions around kdbus have not been honest: - Lennart Poettering, who has been hugely involved in the creation and the design of kdbus, has not shown his face on lkml during the review, and he seems to be the only one who can actually answer many of the technical questions about kdbus. - Many times it was said some feature of kdbus is not important because using it was not required, and yet in practice using that feature is required in the common case. - Performance has been said to be a large benefit of kdbus and yet in the common case there will be a number of shared cache lines modified for every message sent, for reference counts. At a quick glance it appears that communication with every system daemon will be serialized because they all have init as their parent process, so every reply will modify the reference count of init's struct pid. At this point I honestly do not know how to have a technical dialogue about the code in kdbus. Pointing out that bumping several reference counts per message is a bad idea has gotten nowhere so far. 
Crazy things like using the process's command line (copied from userspace when a message is sent) for message authentication are still present in the code. I don't think any of these things are particularly subtle, hard to understand, or hard to fix, yet months after they have been pointed out the code persists. For subtle issues who knows. Every review I have seen seems to get to a couple of simple things, point them out, and then stop. I am actually very strongly surprised at how many of these little issues remain in the code. There were enough changes added to the kdbus tree to fix small issues since the last merge window that I would have thought I would have had to look a little harder for problems. So whatever else the case may be I think the current kdbus code base is a long way from being ready to be merged. Eric
Re: kdbus: to merge or not to merge?
On Tue, Jun 23, 2015 at 8:06 AM, Andy Lutomirski wrote: > 3. The sandbox model is, in my opinion, an experiment that isn't going > to succeed. It's a poor model: a "restricted endpoint" (i.e. a > sandboxed kdbus client) sees a view of the world defined by a limited > policy language implemented by the kernel. This completely fails to > express what I think should be common use cases. If a sandboxed app > is given permission to access, say, > /org/gnome/evolution/dataserver/CalendarView/3125/12, then it knows > that it's looking at CalendarView/3125/12 (whatever that means) and > there's no way to hide the name. If someone subsequently deletes that > CalendarView and creates a new one with that name, racelessly blocking > access to the new one for the app may be complicated. If a sandbox > wants to prompt the user before allowing access to some resource, it > has a problem: the policy language doesn't seem to be able to express > request interception. > > The sandbox model is also already starting to accumulate kludges. > Apparently it was recently discovered that the kdbus connection > lifetime model was incompatible with sandbox policy, so as of a recent > change [2] connection lifetime messages completely bypass sandbox > policy. Maybe this isn't obviously insecure, but it seems like a bad > sign that "it's probably okay to poke this hole" is already happening > before the thing is even merged. > > I'll point out that a pure userspace implementation of sandboxed dbus > connections would be straightforward to implement today, would have > none of these problems, and would allow arbitrarily complex policy and > the flexibility to redesign it in the future if the initial design > turned out to be inappropriate for the sandbox being written. (You > could even have two different implementations to go with two different > sandboxes. Let a thousand sandboxes bloom, which is easy in userspace > but not so great in the kernel.) I don't really understand this objection. 
I'm working on an application sandboxing model for desktop applications (xdg-app), and the kdbus model matches my needs well. In fact, I'm currently using a userspace filtering proxy that implements exactly the kdbus policy model. Of course, this adds *yet* another context switch per message. The only problem I found is that kdbus filtering broke the ability to track the lifetime of clients[1]. However, this has now been fixed with exactly the change you complain about above. I definitely don't want to do low level request interception with UI. We learned long ago that it is a very poor fit for desktop use. At the interception point you have no context at all about the larger scope, such as what window caused the operation and how you would make it modal or even just get the window parenting right. Also, if you do this you will keep popping up windows all the time as apps do calls in the background to be able to e.g. gray out unavailable menu items, update folder counts, etc. Any operation that may cause user interaction must be carefully designed to handle this. The way I expect to use kdbus policy, for an app called say "org.gnome.gedit" is to have the following policy: TALK org.freedesktop.DBus OWN org.gnome.gedit OWN org.gnome.gedit.* TALK org.freedesktop.portal.* This allows the app to connect to and talk to the bus, own its own name and broadcast signals. It also lets anyone else (that is not sandboxed) talk to the app and it will be able to reply. This is enough to have regular dbus activation of desktop files[2], as well as allowing app-related custom services. It also allows the app to talk to a set of "portals" which are sandbox-specific APIs that supply the necessary services for sandboxed apps to interact with each other and the host. For instance, it would have APIs for file choosing, where all user interaction will happen on the host side and the app just gets back the file data. 
Another example is sharing with intents-like semantics, where you'd say "I want to share text " and we open a dialog on the host side allowing you to choose how to share the text (tweet it, open in other app, etc) without the app knowing anything about it other than supplying the data. Operations like these are safe because they are interactive. An app can't use them to silently read the user's files, and the user can always interactively abort the operation if it was unexpected. Now, there will likely be a few cases where we need a more fine-grained access limit. For instance you may have a service that dynamically grants access to particular objects in a portal service to an app. These things can be implemented fine in userspace in the actual service itself. The way I do that currently is by looking at the peer cgroup name, which encodes the xdg-app id. I don't see how making up policy dynamically and uploading it to the bus is better than just doing the filtering in the portal. [1]
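The "filter in the portal itself" approach described above can be sketched roughly as follows. This is a hypothetical illustration, not xdg-app code: the message does not say how the app id is encoded in the cgroup name, so the `xdg-app-<id>.scope` pattern, the `Portal` class, and the choice to leave unsandboxed peers unrestricted are all assumptions.

```python
import re
from typing import Optional

# ASSUMPTION: placeholder naming scheme; the real cgroup format is not given
# in the message.
_APP_SCOPE = re.compile(r"xdg-app-(?P<app_id>[A-Za-z0-9_.]+)\.scope$")


def app_id_from_cgroup(cgroup_path: str) -> Optional[str]:
    """Return the app id encoded in the last cgroup path component, or None
    if the peer does not look like a sandboxed app."""
    match = _APP_SCOPE.search(cgroup_path.rsplit("/", 1)[-1])
    return match.group("app_id") if match else None


class Portal:
    """Hypothetical portal service that grants per-object access to apps."""

    def __init__(self) -> None:
        self._grants = {}  # app id -> set of object paths it may access

    def grant(self, app_id: str, object_path: str) -> None:
        """Record that app_id may access object_path (e.g. after the user
        picked a file in a host-side dialog)."""
        self._grants.setdefault(app_id, set()).add(object_path)

    def may_access(self, peer_cgroup: str, object_path: str) -> bool:
        """Decide inside the service whether the calling peer may touch
        object_path; no policy is uploaded to the bus."""
        app_id = app_id_from_cgroup(peer_cgroup)
        if app_id is None:
            return True  # assumption: unsandboxed peers are not restricted
        return object_path in self._grants.get(app_id, set())


portal = Portal()
portal.grant("org.gnome.gedit", "/example/doc/1")  # illustrative object path
gedit_cgroup = "/user.slice/xdg-app-org.gnome.gedit.scope"
print(portal.may_access(gedit_cgroup, "/example/doc/1"))
print(portal.may_access(gedit_cgroup, "/example/doc/2"))
```

The point of the design is that the grant table lives in the service that owns the objects, so it can change at any time without the bus policy being rewritten.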