Re: [PATCH 01/13] kdbus: add documentation
On Feb 4, 2015 4:16 PM, "David Herrmann" wrote: > > Hi > > On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski wrote: > > I see "latencies" of around 20 microseconds with lockdep and context > > tracking off. For example: > > Without metadata nor memfd transmission, I get 2.5us for kdbus, 1.5us > for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I > suspect you enabled metadata transmission, which I think is not a fair > comparison. I tried to disable metadata. I may have failed. Regardless, if metadata is very slow, then that's more reason not to use it on send. And if you shouldn't use it, then maybe the kernel shouldn't provide it. I assumed there was a context switch in there. I can try to test differently. If UDS is twice as fast *with* a contest switch, then a userspace solution should be faster. Also, UDS can use memfds, too. > > A few notes on that: > > * kdbus is a bus layer. We don't intend to replace UDS, but improve > dbus. Comparing roundtrip times with UDS is tempting, but in no way > fair. To the very least, a bus layer has to perform peer-lookup, which > UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice. > Compare this to ~77us for dbus1 without marshaling. This makes me wonder what dbus1 is doing wrong. > > * We have not optimized kdbus code-paths for speed, yet. Our main > concerns are algorithmic challenges, and we believe they've been > improved considerably with kdbus. I have constantly measured kdbus > performance with 'perf' and flame-graphs, and there're a lot of > possible optimizations (especially on locking). However, I think this > can be done afterwards just fine. Neither API nor ioctl overhead has > shown up in my measurements. If anyone has counter evidence, please > let us know. But I'm a bit reluctant to change our API solely based on > performance guesses. But removal of send-time metadata can't be done after the fact. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Feb 4, 2015 4:16 PM, David Herrmann dh.herrm...@gmail.com wrote: Hi On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski l...@amacapital.net wrote: I see latencies of around 20 microseconds with lockdep and context tracking off. For example: Without metadata nor memfd transmission, I get 2.5us for kdbus, 1.5us for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I suspect you enabled metadata transmission, which I think is not a fair comparison. I tried to disable metadata. I may have failed. Regardless, if metadata is very slow, then that's more reason not to use it on send. And if you shouldn't use it, then maybe the kernel shouldn't provide it. I assumed there was a context switch in there. I can try to test differently. If UDS is twice as fast *with* a contest switch, then a userspace solution should be faster. Also, UDS can use memfds, too. A few notes on that: * kdbus is a bus layer. We don't intend to replace UDS, but improve dbus. Comparing roundtrip times with UDS is tempting, but in no way fair. To the very least, a bus layer has to perform peer-lookup, which UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice. Compare this to ~77us for dbus1 without marshaling. This makes me wonder what dbus1 is doing wrong. * We have not optimized kdbus code-paths for speed, yet. Our main concerns are algorithmic challenges, and we believe they've been improved considerably with kdbus. I have constantly measured kdbus performance with 'perf' and flame-graphs, and there're a lot of possible optimizations (especially on locking). However, I think this can be done afterwards just fine. Neither API nor ioctl overhead has shown up in my measurements. If anyone has counter evidence, please let us know. But I'm a bit reluctant to change our API solely based on performance guesses. But removal of send-time metadata can't be done after the fact. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski wrote: > I see "latencies" of around 20 microseconds with lockdep and context > tracking off. For example: Without metadata nor memfd transmission, I get 2.5us for kdbus, 1.5us for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I suspect you enabled metadata transmission, which I think is not a fair comparison. A few notes on that: * kdbus is a bus layer. We don't intend to replace UDS, but improve dbus. Comparing roundtrip times with UDS is tempting, but in no way fair. To the very least, a bus layer has to perform peer-lookup, which UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice. Compare this to ~77us for dbus1 without marshaling. * We have not optimized kdbus code-paths for speed, yet. Our main concerns are algorithmic challenges, and we believe they've been improved considerably with kdbus. I have constantly measured kdbus performance with 'perf' and flame-graphs, and there're a lot of possible optimizations (especially on locking). However, I think this can be done afterwards just fine. Neither API nor ioctl overhead has shown up in my measurements. If anyone has counter evidence, please let us know. But I'm a bit reluctant to change our API solely based on performance guesses. * We're about 50% slower than UDS on 1-byte transmissions. With 32k we're on-par. How can a lightweight user-space daemon even get close to that? * Broadcast performance is a completely different story. SEND gets around 30% faster compared to kdbus unicasts (as most of the control-paths are only taken once per message, instead of once per destination). * test-benchmark.c does performance tests in a single process. If the bus-layer is implemented in user-space, you need to account for context-switches and task wakeups. My UDS and pipe round-trip latency tests got around 3x slower if done cross processes (3.7us instead of 1.2us). With a user-space daemon, those slow-downs are taken two times more often for each roundtrip. * Process time is accounted on the sender, instead of a shared process (dbus-daemon). Broadcasts will thus no longer consume time-slices of dbus-daemon, but only the sender's. With kdbus, we implement a bus-layer. This is our only target! If your target environment does not require a bus, then don't use kdbus. We don't intend to replace UDS. On a bus-layer, we need peer-discovery, policy-handling, destination-lookups, broadcast-management and more. Pipes/UDS do not provide any of this. I cannot see how any other existing bus-implementation comes even close to kdbus, performance-wise. If someone does, please let us know! Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack wrote: > Hi Andy, > > On 02/02/2015 09:12 PM, Andy Lutomirski wrote: >> On Feb 2, 2015 1:34 AM, "Daniel Mack" wrote: > >>> That's right, but again - if an application wants to gather this kind of >>> information about tasks it interacts with, it can do so today by looking >>> at /proc or similar sources. Desktop machines do exactly that already, >>> and the kernel code executed in such cases very much resembles that in >>> metadata.c, and is certainly not cheaper. kdbus just makes such >>> information more accessible when requested. Which information is >>> collected is defined by bit-masks on both the sender and the receiver >>> connection, and most applications will effectively only use a very >>> limited set by default if they go through one of the more high-level >>> libraries. >> >> I should rephrase a bit. Kdbus doesn't require use of send-time >> metadata. It does, however, strongly encourage it, and it sounds like > > On the kernel level, kdbus just *offers* that, just like sockets offer > SO_PASSCRED. On the userland level, kdbus helps applications get that > information race-free, easier and faster than they would otherwise. > >> systemd and other major users will use send-time metadata. Once that >> happens, it's ABI (even if it's purely in userspace), and changing it >> is asking for security holes to pop up. So you'll be mostly stuck >> with it. > > We know we can't break the ABI. At most, we could deprecate item types > and introduce new ones, but we want to avoid that by all means of > course. However, I fail to see how that is related to send time > metadata, or even to kdbus in general, as all ABIs have to be kept stable. > >> Do you have some simple benchmark code you can share? I'd like to >> play with it a bit. > > Sure, it's part of the self-test suite. Call it with "-t benchmark" to > run the benchmark as isolated test with verbose output. The code for > that lives in test-benchmark.c. I see "latencies" of around 20 microseconds with lockdep and context tracking off. For example: stats (UNIX): 226730 packets processed, latency (nsecs) min/max/avg 3845 // 34828 //4069 stats (KDBUS): 37103 packets processed, latency (nsecs) min/max/avg 19123 // 99660 // 20696 This is IMO not very good. With memfds off: stats (UNIX): 226061 packets processed, latency (nsecs) min/max/avg 3885 // 32019 //4079 stats (KDBUS): 83284 packets processed, latency (nsecs) min/max/avg 10525 // 42578 // 10932 With memfds off and the payload set to 8 bytes: stats (KDBUS): 77669 packets processed, latency (nsecs) min/max/avg 9963 // 64325 // 11645 stats (UNIX): 253695 packets processed, latency (nsecs) min/max/avg 2986 // 56094 //3565 Am I missing something here? This is slow enough that a lightweight userspace dbus daemon should be able to outperform kdbus, or at least come very close. It would be kind of nice to know how long just the send call takes, too. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack dan...@zonque.org wrote: Hi Andy, On 02/02/2015 09:12 PM, Andy Lutomirski wrote: On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote: That's right, but again - if an application wants to gather this kind of information about tasks it interacts with, it can do so today by looking at /proc or similar sources. Desktop machines do exactly that already, and the kernel code executed in such cases very much resembles that in metadata.c, and is certainly not cheaper. kdbus just makes such information more accessible when requested. Which information is collected is defined by bit-masks on both the sender and the receiver connection, and most applications will effectively only use a very limited set by default if they go through one of the more high-level libraries. I should rephrase a bit. Kdbus doesn't require use of send-time metadata. It does, however, strongly encourage it, and it sounds like On the kernel level, kdbus just *offers* that, just like sockets offer SO_PASSCRED. On the userland level, kdbus helps applications get that information race-free, easier and faster than they would otherwise. systemd and other major users will use send-time metadata. Once that happens, it's ABI (even if it's purely in userspace), and changing it is asking for security holes to pop up. So you'll be mostly stuck with it. We know we can't break the ABI. At most, we could deprecate item types and introduce new ones, but we want to avoid that by all means of course. However, I fail to see how that is related to send time metadata, or even to kdbus in general, as all ABIs have to be kept stable. Do you have some simple benchmark code you can share? I'd like to play with it a bit. Sure, it's part of the self-test suite. Call it with -t benchmark to run the benchmark as isolated test with verbose output. The code for that lives in test-benchmark.c. I see latencies of around 20 microseconds with lockdep and context tracking off. For example: stats (UNIX): 226730 packets processed, latency (nsecs) min/max/avg 3845 // 34828 //4069 stats (KDBUS): 37103 packets processed, latency (nsecs) min/max/avg 19123 // 99660 // 20696 This is IMO not very good. With memfds off: stats (UNIX): 226061 packets processed, latency (nsecs) min/max/avg 3885 // 32019 //4079 stats (KDBUS): 83284 packets processed, latency (nsecs) min/max/avg 10525 // 42578 // 10932 With memfds off and the payload set to 8 bytes: stats (KDBUS): 77669 packets processed, latency (nsecs) min/max/avg 9963 // 64325 // 11645 stats (UNIX): 253695 packets processed, latency (nsecs) min/max/avg 2986 // 56094 //3565 Am I missing something here? This is slow enough that a lightweight userspace dbus daemon should be able to outperform kdbus, or at least come very close. It would be kind of nice to know how long just the send call takes, too. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski l...@amacapital.net wrote: I see latencies of around 20 microseconds with lockdep and context tracking off. For example: Without metadata nor memfd transmission, I get 2.5us for kdbus, 1.5us for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I suspect you enabled metadata transmission, which I think is not a fair comparison. A few notes on that: * kdbus is a bus layer. We don't intend to replace UDS, but improve dbus. Comparing roundtrip times with UDS is tempting, but in no way fair. To the very least, a bus layer has to perform peer-lookup, which UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice. Compare this to ~77us for dbus1 without marshaling. * We have not optimized kdbus code-paths for speed, yet. Our main concerns are algorithmic challenges, and we believe they've been improved considerably with kdbus. I have constantly measured kdbus performance with 'perf' and flame-graphs, and there're a lot of possible optimizations (especially on locking). However, I think this can be done afterwards just fine. Neither API nor ioctl overhead has shown up in my measurements. If anyone has counter evidence, please let us know. But I'm a bit reluctant to change our API solely based on performance guesses. * We're about 50% slower than UDS on 1-byte transmissions. With 32k we're on-par. How can a lightweight user-space daemon even get close to that? * Broadcast performance is a completely different story. SEND gets around 30% faster compared to kdbus unicasts (as most of the control-paths are only taken once per message, instead of once per destination). * test-benchmark.c does performance tests in a single process. If the bus-layer is implemented in user-space, you need to account for context-switches and task wakeups. My UDS and pipe round-trip latency tests got around 3x slower if done cross processes (3.7us instead of 1.2us). With a user-space daemon, those slow-downs are taken two times more often for each roundtrip. * Process time is accounted on the sender, instead of a shared process (dbus-daemon). Broadcasts will thus no longer consume time-slices of dbus-daemon, but only the sender's. With kdbus, we implement a bus-layer. This is our only target! If your target environment does not require a bus, then don't use kdbus. We don't intend to replace UDS. On a bus-layer, we need peer-discovery, policy-handling, destination-lookups, broadcast-management and more. Pipes/UDS do not provide any of this. I cannot see how any other existing bus-implementation comes even close to kdbus, performance-wise. If someone does, please let us know! Thanks David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Greg Kroah-Hartman writes: > On Tue, Feb 03, 2015 at 08:47:51PM -0600, Eric W. Biederman wrote: >> Andy Lutomirski writes: >> >> > On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack wrote: >> >> Hi Andy, >> >> >> >> On 02/02/2015 09:12 PM, Andy Lutomirski wrote: >> >>> On Feb 2, 2015 1:34 AM, "Daniel Mack" wrote: >> >> >> That's right, but again - if an application wants to gather this kind of >> information about tasks it interacts with, it can do so today by looking >> at /proc or similar sources. Desktop machines do exactly that already, >> and the kernel code executed in such cases very much resembles that in >> metadata.c, and is certainly not cheaper. kdbus just makes such >> information more accessible when requested. Which information is >> collected is defined by bit-masks on both the sender and the receiver >> connection, and most applications will effectively only use a very >> limited set by default if they go through one of the more high-level >> libraries. >> >>> >> >>> I should rephrase a bit. Kdbus doesn't require use of send-time >> >>> metadata. It does, however, strongly encourage it, and it sounds like >> >> >> >> On the kernel level, kdbus just *offers* that, just like sockets offer >> >> SO_PASSCRED. On the userland level, kdbus helps applications get that >> >> information race-free, easier and faster than they would otherwise. >> >> >> >>> systemd and other major users will use send-time metadata. Once that >> >>> happens, it's ABI (even if it's purely in userspace), and changing it >> >>> is asking for security holes to pop up. So you'll be mostly stuck >> >>> with it. >> >> >> >> We know we can't break the ABI. At most, we could deprecate item types >> >> and introduce new ones, but we want to avoid that by all means of >> >> course. However, I fail to see how that is related to send time >> >> metadata, or even to kdbus in general, as all ABIs have to be kept stable. >> > >> > I should have said it differently. ABI is the wrong term -- it's more >> > of a protocol issue. >> > >> > It looks like, with the current code, the kernel will provide >> > (optional) send-time metadata, and the sd-bus library will use it. >> > The result will be that the communication protocol between clients and >> > udev, systemd, systemd-logind, g-s-d, etc, will likely involve >> > send-time metadata. This may end up being a bottleneck. >> >> A quick note on a couple of things I have seen in this conversation. >> >> - The reason for kdbus is performance. > > No, that's not the only reason for kdbus, don't focus only on this. I > set out a long list of things for why we created kdbus, speed was only > one of the things. Security is also one, and the ability to gather > these attributes in an atomic and secure way is very important as > userspace wants this. Perhaps I should have said the predominant reason. Certainly that seems to be most of what I have seen talked about. Regardless looking at the performance in the design and removing any substantial obstacle to making things go fast. Further. I had this conversation earlier in an earlier round of the review and I was told that in fact existing dbus applications do not want or need these attributes. I think I heard journald wants them for pretty printing things. If security is your concern I really think per message attributes collected and sent when a message is sent is a bad idea. It has been a nasty anti-pattern in the kernel code. Lots and lots of meta-data copyed from a task and sent to someone else has significant performance, maintenance, and security impacts. Code written in that pattern is complex and hard to analyze, and hard to think about. Consider debugging why a message does not get the expected treatment from your suid application because someone changed the euid over that particular call and had not thought about it's consequences. Frankly I have been there and done that and it is a mess. So no I do not think breaking encapsulation and having weird side effects affecting your new primitive will have any security benefits whatsover. It will just result in brittle complex code. If you want to avoid the races causing sends through a file descriptor to fail that don't have the expected attributes (my constructive suggestion earlier) is a very different thing from a performance and mainteance standpoint. That does not increase the code complexity nearly as much in the implementation or in use, and unexpected failures happen right away. >> - pipes rather than unix domain sockets are likely the standard to meet. >> If you can't equal unix domain sockets for simple things you are >> likely leaving a lot of stops in. Last I looked pipes in general were >> notiably faster than unix domain sockets. >> >> The performance numbers I saw posted up-thread were horrible. I have >> seen faster numbers across a network of machines. If your ping-pong >> latency
Re: [PATCH 01/13] kdbus: add documentation
On Tue, Feb 03, 2015 at 08:47:51PM -0600, Eric W. Biederman wrote: > Andy Lutomirski writes: > > > On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack wrote: > >> Hi Andy, > >> > >> On 02/02/2015 09:12 PM, Andy Lutomirski wrote: > >>> On Feb 2, 2015 1:34 AM, "Daniel Mack" wrote: > >> > That's right, but again - if an application wants to gather this kind of > information about tasks it interacts with, it can do so today by looking > at /proc or similar sources. Desktop machines do exactly that already, > and the kernel code executed in such cases very much resembles that in > metadata.c, and is certainly not cheaper. kdbus just makes such > information more accessible when requested. Which information is > collected is defined by bit-masks on both the sender and the receiver > connection, and most applications will effectively only use a very > limited set by default if they go through one of the more high-level > libraries. > >>> > >>> I should rephrase a bit. Kdbus doesn't require use of send-time > >>> metadata. It does, however, strongly encourage it, and it sounds like > >> > >> On the kernel level, kdbus just *offers* that, just like sockets offer > >> SO_PASSCRED. On the userland level, kdbus helps applications get that > >> information race-free, easier and faster than they would otherwise. > >> > >>> systemd and other major users will use send-time metadata. Once that > >>> happens, it's ABI (even if it's purely in userspace), and changing it > >>> is asking for security holes to pop up. So you'll be mostly stuck > >>> with it. > >> > >> We know we can't break the ABI. At most, we could deprecate item types > >> and introduce new ones, but we want to avoid that by all means of > >> course. However, I fail to see how that is related to send time > >> metadata, or even to kdbus in general, as all ABIs have to be kept stable. > > > > I should have said it differently. ABI is the wrong term -- it's more > > of a protocol issue. > > > > It looks like, with the current code, the kernel will provide > > (optional) send-time metadata, and the sd-bus library will use it. > > The result will be that the communication protocol between clients and > > udev, systemd, systemd-logind, g-s-d, etc, will likely involve > > send-time metadata. This may end up being a bottleneck. > > A quick note on a couple of things I have seen in this conversation. > > - The reason for kdbus is performance. No, that's not the only reason for kdbus, don't focus only on this. I set out a long list of things for why we created kdbus, speed was only one of the things. Security is also one, and the ability to gather these attributes in an atomic and secure way is very important as userspace wants this. > - pipes rather than unix domain sockets are likely the standard to meet. > If you can't equal unix domain sockets for simple things you are > likely leaving a lot of stops in. Last I looked pipes in general were > notiably faster than unix domain sockets. > > The performance numbers I saw posted up-thread were horrible. I have > seen faster numbers across a network of machines. If your ping-pong > latency isn't measured in nano-seconds you are probably doing > something wrong. It all depends on what you are passing on that "ping-pong", a real D-Bus connection has real data and meta data that has to be sent. Trying to make a fake benchmark number isn't going to show anything. > - syscalls remove overhead. So since performance is kdbus's reason for > existence > let's remove some ridiculous stops, and get a fast path into the kernel. Again, not the only reason, see my first post in this thread for details. thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Andy Lutomirski writes: > On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack wrote: >> Hi Andy, >> >> On 02/02/2015 09:12 PM, Andy Lutomirski wrote: >>> On Feb 2, 2015 1:34 AM, "Daniel Mack" wrote: >> That's right, but again - if an application wants to gather this kind of information about tasks it interacts with, it can do so today by looking at /proc or similar sources. Desktop machines do exactly that already, and the kernel code executed in such cases very much resembles that in metadata.c, and is certainly not cheaper. kdbus just makes such information more accessible when requested. Which information is collected is defined by bit-masks on both the sender and the receiver connection, and most applications will effectively only use a very limited set by default if they go through one of the more high-level libraries. >>> >>> I should rephrase a bit. Kdbus doesn't require use of send-time >>> metadata. It does, however, strongly encourage it, and it sounds like >> >> On the kernel level, kdbus just *offers* that, just like sockets offer >> SO_PASSCRED. On the userland level, kdbus helps applications get that >> information race-free, easier and faster than they would otherwise. >> >>> systemd and other major users will use send-time metadata. Once that >>> happens, it's ABI (even if it's purely in userspace), and changing it >>> is asking for security holes to pop up. So you'll be mostly stuck >>> with it. >> >> We know we can't break the ABI. At most, we could deprecate item types >> and introduce new ones, but we want to avoid that by all means of >> course. However, I fail to see how that is related to send time >> metadata, or even to kdbus in general, as all ABIs have to be kept stable. > > I should have said it differently. ABI is the wrong term -- it's more > of a protocol issue. > > It looks like, with the current code, the kernel will provide > (optional) send-time metadata, and the sd-bus library will use it. > The result will be that the communication protocol between clients and > udev, systemd, systemd-logind, g-s-d, etc, will likely involve > send-time metadata. This may end up being a bottleneck. A quick note on a couple of things I have seen in this conversation. - The reason for kdbus is performance. - pipes rather than unix domain sockets are likely the standard to meet. If you can't equal unix domain sockets for simple things you are likely leaving a lot of stops in. Last I looked pipes in general were notiably faster than unix domain sockets. The performance numbers I saw posted up-thread were horrible. I have seen faster numbers across a network of machines. If your ping-pong latency isn't measured in nano-seconds you are probably doing something wrong. - syscalls remove overhead. So since performance is kdbus's reason for existence let's remove some ridiculous stops, and get a fast path into the kernel. - send-time metadata is a performance nightmare. SO_PASSCRED is hard to implement in a fast performant way, especially when namespaces get involved. Over the long term if you use send-time metadata you will grow the kind of compatibility hacks that the user namespace and the pid namespace have on SO_PASSCRED and things will slow down. A similar effect that is more performant in general is to enforce that the sender has the expected attributes. > Once this happens, changing the protocol will be very hard without > introducing security bugs. If people start switching to > connection-time metadata to gain performance, then they'll break both > the communication protocol and the expectations of client code. (In > fact, it'll break twice, sort of, since I think that the current > protocols are connect-time.) > > To me, this seems like a down-side of using send-time metadata, albeit > possibly not a huge downside at least in the near term. I don't see a > corresponding benefit, though. I think send-time metadata verification is less bad in this regard. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack wrote: > Hi Andy, > > On 02/02/2015 09:12 PM, Andy Lutomirski wrote: >> On Feb 2, 2015 1:34 AM, "Daniel Mack" wrote: > >>> That's right, but again - if an application wants to gather this kind of >>> information about tasks it interacts with, it can do so today by looking >>> at /proc or similar sources. Desktop machines do exactly that already, >>> and the kernel code executed in such cases very much resembles that in >>> metadata.c, and is certainly not cheaper. kdbus just makes such >>> information more accessible when requested. Which information is >>> collected is defined by bit-masks on both the sender and the receiver >>> connection, and most applications will effectively only use a very >>> limited set by default if they go through one of the more high-level >>> libraries. >> >> I should rephrase a bit. Kdbus doesn't require use of send-time >> metadata. It does, however, strongly encourage it, and it sounds like > > On the kernel level, kdbus just *offers* that, just like sockets offer > SO_PASSCRED. On the userland level, kdbus helps applications get that > information race-free, easier and faster than they would otherwise. > >> systemd and other major users will use send-time metadata. Once that >> happens, it's ABI (even if it's purely in userspace), and changing it >> is asking for security holes to pop up. So you'll be mostly stuck >> with it. > > We know we can't break the ABI. At most, we could deprecate item types > and introduce new ones, but we want to avoid that by all means of > course. However, I fail to see how that is related to send time > metadata, or even to kdbus in general, as all ABIs have to be kept stable. I should have said it differently. ABI is the wrong term -- it's more of a protocol issue. It looks like, with the current code, the kernel will provide (optional) send-time metadata, and the sd-bus library will use it. The result will be that the communication protocol between clients and udev, systemd, systemd-logind, g-s-d, etc, will likely involve send-time metadata. This may end up being a bottleneck. Once this happens, changing the protocol will be very hard without introducing security bugs. If people start switching to connection-time metadata to gain performance, then they'll break both the communication protocol and the expectations of client code. (In fact, it'll break twice, sort of, since I think that the current protocols are connect-time.) To me, this seems like a down-side of using send-time metadata, albeit possibly not a huge downside at least in the near term. I don't see a corresponding benefit, though. > >> Do you have some simple benchmark code you can share? I'd like to >> play with it a bit. > > Sure, it's part of the self-test suite. Call it with "-t benchmark" to > run the benchmark as isolated test with verbose output. The code for > that lives in test-benchmark.c. > I'll try to play with this soon. Thanks. --Andy > > Thanks, > Daniel > > -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Andy, On 02/02/2015 09:12 PM, Andy Lutomirski wrote: > On Feb 2, 2015 1:34 AM, "Daniel Mack" wrote: >> That's right, but again - if an application wants to gather this kind of >> information about tasks it interacts with, it can do so today by looking >> at /proc or similar sources. Desktop machines do exactly that already, >> and the kernel code executed in such cases very much resembles that in >> metadata.c, and is certainly not cheaper. kdbus just makes such >> information more accessible when requested. Which information is >> collected is defined by bit-masks on both the sender and the receiver >> connection, and most applications will effectively only use a very >> limited set by default if they go through one of the more high-level >> libraries. > > I should rephrase a bit. Kdbus doesn't require use of send-time > metadata. It does, however, strongly encourage it, and it sounds like On the kernel level, kdbus just *offers* that, just like sockets offer SO_PASSCRED. On the userland level, kdbus helps applications get that information race-free, easier and faster than they would otherwise. > systemd and other major users will use send-time metadata. Once that > happens, it's ABI (even if it's purely in userspace), and changing it > is asking for security holes to pop up. So you'll be mostly stuck > with it. We know we can't break the ABI. At most, we could deprecate item types and introduce new ones, but we want to avoid that by all means of course. However, I fail to see how that is related to send time metadata, or even to kdbus in general, as all ABIs have to be kept stable. > Do you have some simple benchmark code you can share? I'd like to > play with it a bit. Sure, it's part of the self-test suite. Call it with "-t benchmark" to run the benchmark as isolated test with verbose output. The code for that lives in test-benchmark.c. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Greg Kroah-Hartman gre...@linuxfoundation.org writes: On Tue, Feb 03, 2015 at 08:47:51PM -0600, Eric W. Biederman wrote: Andy Lutomirski l...@amacapital.net writes: On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack dan...@zonque.org wrote: Hi Andy, On 02/02/2015 09:12 PM, Andy Lutomirski wrote: On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote: That's right, but again - if an application wants to gather this kind of information about tasks it interacts with, it can do so today by looking at /proc or similar sources. Desktop machines do exactly that already, and the kernel code executed in such cases very much resembles that in metadata.c, and is certainly not cheaper. kdbus just makes such information more accessible when requested. Which information is collected is defined by bit-masks on both the sender and the receiver connection, and most applications will effectively only use a very limited set by default if they go through one of the more high-level libraries. I should rephrase a bit. Kdbus doesn't require use of send-time metadata. It does, however, strongly encourage it, and it sounds like On the kernel level, kdbus just *offers* that, just like sockets offer SO_PASSCRED. On the userland level, kdbus helps applications get that information race-free, easier and faster than they would otherwise. systemd and other major users will use send-time metadata. Once that happens, it's ABI (even if it's purely in userspace), and changing it is asking for security holes to pop up. So you'll be mostly stuck with it. We know we can't break the ABI. At most, we could deprecate item types and introduce new ones, but we want to avoid that by all means of course. However, I fail to see how that is related to send time metadata, or even to kdbus in general, as all ABIs have to be kept stable. I should have said it differently. ABI is the wrong term -- it's more of a protocol issue. It looks like, with the current code, the kernel will provide (optional) send-time metadata, and the sd-bus library will use it. The result will be that the communication protocol between clients and udev, systemd, systemd-logind, g-s-d, etc, will likely involve send-time metadata. This may end up being a bottleneck. A quick note on a couple of things I have seen in this conversation. - The reason for kdbus is performance. No, that's not the only reason for kdbus, don't focus only on this. I set out a long list of things for why we created kdbus, speed was only one of the things. Security is also one, and the ability to gather these attributes in an atomic and secure way is very important as userspace wants this. Perhaps I should have said the predominant reason. Certainly that seems to be most of what I have seen talked about. Regardless looking at the performance in the design and removing any substantial obstacle to making things go fast. Further. I had this conversation earlier in an earlier round of the review and I was told that in fact existing dbus applications do not want or need these attributes. I think I heard journald wants them for pretty printing things. If security is your concern I really think per message attributes collected and sent when a message is sent is a bad idea. It has been a nasty anti-pattern in the kernel code. Lots and lots of meta-data copyed from a task and sent to someone else has significant performance, maintenance, and security impacts. Code written in that pattern is complex and hard to analyze, and hard to think about. Consider debugging why a message does not get the expected treatment from your suid application because someone changed the euid over that particular call and had not thought about it's consequences. Frankly I have been there and done that and it is a mess. So no I do not think breaking encapsulation and having weird side effects affecting your new primitive will have any security benefits whatsover. It will just result in brittle complex code. If you want to avoid the races causing sends through a file descriptor to fail that don't have the expected attributes (my constructive suggestion earlier) is a very different thing from a performance and mainteance standpoint. That does not increase the code complexity nearly as much in the implementation or in use, and unexpected failures happen right away. - pipes rather than unix domain sockets are likely the standard to meet. If you can't equal unix domain sockets for simple things you are likely leaving a lot of stops in. Last I looked pipes in general were notiably faster than unix domain sockets. The performance numbers I saw posted up-thread were horrible. I have seen faster numbers across a network of machines. If your ping-pong latency isn't measured in nano-seconds you are probably doing something wrong. It all depends on what you are passing on that ping-pong, a real
Re: [PATCH 01/13] kdbus: add documentation
Andy Lutomirski l...@amacapital.net writes: On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack dan...@zonque.org wrote: Hi Andy, On 02/02/2015 09:12 PM, Andy Lutomirski wrote: On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote: That's right, but again - if an application wants to gather this kind of information about tasks it interacts with, it can do so today by looking at /proc or similar sources. Desktop machines do exactly that already, and the kernel code executed in such cases very much resembles that in metadata.c, and is certainly not cheaper. kdbus just makes such information more accessible when requested. Which information is collected is defined by bit-masks on both the sender and the receiver connection, and most applications will effectively only use a very limited set by default if they go through one of the more high-level libraries. I should rephrase a bit. Kdbus doesn't require use of send-time metadata. It does, however, strongly encourage it, and it sounds like On the kernel level, kdbus just *offers* that, just like sockets offer SO_PASSCRED. On the userland level, kdbus helps applications get that information race-free, easier and faster than they would otherwise. systemd and other major users will use send-time metadata. Once that happens, it's ABI (even if it's purely in userspace), and changing it is asking for security holes to pop up. So you'll be mostly stuck with it. We know we can't break the ABI. At most, we could deprecate item types and introduce new ones, but we want to avoid that by all means of course. However, I fail to see how that is related to send time metadata, or even to kdbus in general, as all ABIs have to be kept stable. I should have said it differently. ABI is the wrong term -- it's more of a protocol issue. It looks like, with the current code, the kernel will provide (optional) send-time metadata, and the sd-bus library will use it. The result will be that the communication protocol between clients and udev, systemd, systemd-logind, g-s-d, etc, will likely involve send-time metadata. This may end up being a bottleneck. A quick note on a couple of things I have seen in this conversation. - The reason for kdbus is performance. - pipes rather than unix domain sockets are likely the standard to meet. If you can't equal unix domain sockets for simple things you are likely leaving a lot of stops in. Last I looked pipes in general were notiably faster than unix domain sockets. The performance numbers I saw posted up-thread were horrible. I have seen faster numbers across a network of machines. If your ping-pong latency isn't measured in nano-seconds you are probably doing something wrong. - syscalls remove overhead. So since performance is kdbus's reason for existence let's remove some ridiculous stops, and get a fast path into the kernel. - send-time metadata is a performance nightmare. SO_PASSCRED is hard to implement in a fast performant way, especially when namespaces get involved. Over the long term if you use send-time metadata you will grow the kind of compatibility hacks that the user namespace and the pid namespace have on SO_PASSCRED and things will slow down. A similar effect that is more performant in general is to enforce that the sender has the expected attributes. Once this happens, changing the protocol will be very hard without introducing security bugs. If people start switching to connection-time metadata to gain performance, then they'll break both the communication protocol and the expectations of client code. (In fact, it'll break twice, sort of, since I think that the current protocols are connect-time.) To me, this seems like a down-side of using send-time metadata, albeit possibly not a huge downside at least in the near term. I don't see a corresponding benefit, though. I think send-time metadata verification is less bad in this regard. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Tue, Feb 03, 2015 at 08:47:51PM -0600, Eric W. Biederman wrote: Andy Lutomirski l...@amacapital.net writes: On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack dan...@zonque.org wrote: Hi Andy, On 02/02/2015 09:12 PM, Andy Lutomirski wrote: On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote: That's right, but again - if an application wants to gather this kind of information about tasks it interacts with, it can do so today by looking at /proc or similar sources. Desktop machines do exactly that already, and the kernel code executed in such cases very much resembles that in metadata.c, and is certainly not cheaper. kdbus just makes such information more accessible when requested. Which information is collected is defined by bit-masks on both the sender and the receiver connection, and most applications will effectively only use a very limited set by default if they go through one of the more high-level libraries. I should rephrase a bit. Kdbus doesn't require use of send-time metadata. It does, however, strongly encourage it, and it sounds like On the kernel level, kdbus just *offers* that, just like sockets offer SO_PASSCRED. On the userland level, kdbus helps applications get that information race-free, easier and faster than they would otherwise. systemd and other major users will use send-time metadata. Once that happens, it's ABI (even if it's purely in userspace), and changing it is asking for security holes to pop up. So you'll be mostly stuck with it. We know we can't break the ABI. At most, we could deprecate item types and introduce new ones, but we want to avoid that by all means of course. However, I fail to see how that is related to send time metadata, or even to kdbus in general, as all ABIs have to be kept stable. I should have said it differently. ABI is the wrong term -- it's more of a protocol issue. It looks like, with the current code, the kernel will provide (optional) send-time metadata, and the sd-bus library will use it. The result will be that the communication protocol between clients and udev, systemd, systemd-logind, g-s-d, etc, will likely involve send-time metadata. This may end up being a bottleneck. A quick note on a couple of things I have seen in this conversation. - The reason for kdbus is performance. No, that's not the only reason for kdbus, don't focus only on this. I set out a long list of things for why we created kdbus, speed was only one of the things. Security is also one, and the ability to gather these attributes in an atomic and secure way is very important as userspace wants this. - pipes rather than unix domain sockets are likely the standard to meet. If you can't equal unix domain sockets for simple things you are likely leaving a lot of stops in. Last I looked pipes in general were notiably faster than unix domain sockets. The performance numbers I saw posted up-thread were horrible. I have seen faster numbers across a network of machines. If your ping-pong latency isn't measured in nano-seconds you are probably doing something wrong. It all depends on what you are passing on that ping-pong, a real D-Bus connection has real data and meta data that has to be sent. Trying to make a fake benchmark number isn't going to show anything. - syscalls remove overhead. So since performance is kdbus's reason for existence let's remove some ridiculous stops, and get a fast path into the kernel. Again, not the only reason, see my first post in this thread for details. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack dan...@zonque.org wrote: Hi Andy, On 02/02/2015 09:12 PM, Andy Lutomirski wrote: On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote: That's right, but again - if an application wants to gather this kind of information about tasks it interacts with, it can do so today by looking at /proc or similar sources. Desktop machines do exactly that already, and the kernel code executed in such cases very much resembles that in metadata.c, and is certainly not cheaper. kdbus just makes such information more accessible when requested. Which information is collected is defined by bit-masks on both the sender and the receiver connection, and most applications will effectively only use a very limited set by default if they go through one of the more high-level libraries. I should rephrase a bit. Kdbus doesn't require use of send-time metadata. It does, however, strongly encourage it, and it sounds like On the kernel level, kdbus just *offers* that, just like sockets offer SO_PASSCRED. On the userland level, kdbus helps applications get that information race-free, easier and faster than they would otherwise. systemd and other major users will use send-time metadata. Once that happens, it's ABI (even if it's purely in userspace), and changing it is asking for security holes to pop up. So you'll be mostly stuck with it. We know we can't break the ABI. At most, we could deprecate item types and introduce new ones, but we want to avoid that by all means of course. However, I fail to see how that is related to send time metadata, or even to kdbus in general, as all ABIs have to be kept stable. I should have said it differently. ABI is the wrong term -- it's more of a protocol issue. It looks like, with the current code, the kernel will provide (optional) send-time metadata, and the sd-bus library will use it. The result will be that the communication protocol between clients and udev, systemd, systemd-logind, g-s-d, etc, will likely involve send-time metadata. This may end up being a bottleneck. Once this happens, changing the protocol will be very hard without introducing security bugs. If people start switching to connection-time metadata to gain performance, then they'll break both the communication protocol and the expectations of client code. (In fact, it'll break twice, sort of, since I think that the current protocols are connect-time.) To me, this seems like a down-side of using send-time metadata, albeit possibly not a huge downside at least in the near term. I don't see a corresponding benefit, though. Do you have some simple benchmark code you can share? I'd like to play with it a bit. Sure, it's part of the self-test suite. Call it with -t benchmark to run the benchmark as isolated test with verbose output. The code for that lives in test-benchmark.c. I'll try to play with this soon. Thanks. --Andy Thanks, Daniel -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Andy, On 02/02/2015 09:12 PM, Andy Lutomirski wrote: On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote: That's right, but again - if an application wants to gather this kind of information about tasks it interacts with, it can do so today by looking at /proc or similar sources. Desktop machines do exactly that already, and the kernel code executed in such cases very much resembles that in metadata.c, and is certainly not cheaper. kdbus just makes such information more accessible when requested. Which information is collected is defined by bit-masks on both the sender and the receiver connection, and most applications will effectively only use a very limited set by default if they go through one of the more high-level libraries. I should rephrase a bit. Kdbus doesn't require use of send-time metadata. It does, however, strongly encourage it, and it sounds like On the kernel level, kdbus just *offers* that, just like sockets offer SO_PASSCRED. On the userland level, kdbus helps applications get that information race-free, easier and faster than they would otherwise. systemd and other major users will use send-time metadata. Once that happens, it's ABI (even if it's purely in userspace), and changing it is asking for security holes to pop up. So you'll be mostly stuck with it. We know we can't break the ABI. At most, we could deprecate item types and introduce new ones, but we want to avoid that by all means of course. However, I fail to see how that is related to send time metadata, or even to kdbus in general, as all ABIs have to be kept stable. Do you have some simple benchmark code you can share? I'd like to play with it a bit. Sure, it's part of the self-test suite. Call it with -t benchmark to run the benchmark as isolated test with verbose output. The code for that lives in test-benchmark.c. Thanks, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Feb 2, 2015 1:34 AM, "Daniel Mack" wrote: > > Hi Andy, > > On 01/29/2015 01:09 PM, Andy Lutomirski wrote: > > On Jan 29, 2015 6:42 AM, "Daniel Mack" wrote: > > >> As we explained before, currently, D-Bus peers do collect the same > >> information already if they need to have them, but they have to do deal > >> with the inherit races in such cases. kdbus is closing the gap by > >> optionally providing the same information along with each message, if > >> requested. > > > > In all these discussions, no one ever gave a decent example use case. > > If a process drops some privilege, it must close all fds it has that > > captured its old privilege. This has nothing to do with kdbus. > > kdbus does not implement any new concept here but sticks to what > SCM_CREDENTIALS does on SOL_SEQPACKET. An application can get a > file-descriptor from socket() or socketpair() and freely pass it around > between different tasks or threads, but messages will always have the > credentials attached that are valid at *send* time. SO_PEERCREDS, > however, still reports the connect-time credentials, and kdbus provides > exactly the same semantics and both ways of retrieving information. > > > I agree that the design seems to have improved to a state of being at > > least decent, > > One reason for that is your feedback. Thanks for that again! > > > It's an optional feature that will get used, non-optionally, thousands > > of times on each boot, apparently. Keep in mind that it's also a > > scalability problem because it takes locks. If it ever gets used > > thousands of times per CPU on a big thousand-core machine, it's going > > to suck, and you'll have backed yourself into a corner. > > That's right, but again - if an application wants to gather this kind of > information about tasks it interacts with, it can do so today by looking > at /proc or similar sources. Desktop machines do exactly that already, > and the kernel code executed in such cases very much resembles that in > metadata.c, and is certainly not cheaper. kdbus just makes such > information more accessible when requested. Which information is > collected is defined by bit-masks on both the sender and the receiver > connection, and most applications will effectively only use a very > limited set by default if they go through one of the more high-level > libraries. I should rephrase a bit. Kdbus doesn't require use of send-time metadata. It does, however, strongly encourage it, and it sounds like systemd and other major users will use send-time metadata. Once that happens, it's ABI (even if it's purely in userspace), and changing it is asking for security holes to pop up. So you'll be mostly stuck with it. > > Also, when metadata is collected, the code mostly takes temporary > references on objects like PIDs, namespaces etc. Which operation would > you consider particularly expensive? The refcounting, copies of some of the data, and counting bytes and allocating space. The refcounting is the part that will scale particularly badly on many CPUs. Do you have some simple benchmark code you can share? I'd like to play with it a bit. --Andy > > > Thanks again, > Daniel > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Andy, On 01/29/2015 01:09 PM, Andy Lutomirski wrote: > On Jan 29, 2015 6:42 AM, "Daniel Mack" wrote: >> As we explained before, currently, D-Bus peers do collect the same >> information already if they need to have them, but they have to do deal >> with the inherit races in such cases. kdbus is closing the gap by >> optionally providing the same information along with each message, if >> requested. > > In all these discussions, no one ever gave a decent example use case. > If a process drops some privilege, it must close all fds it has that > captured its old privilege. This has nothing to do with kdbus. kdbus does not implement any new concept here but sticks to what SCM_CREDENTIALS does on SOL_SEQPACKET. An application can get a file-descriptor from socket() or socketpair() and freely pass it around between different tasks or threads, but messages will always have the credentials attached that are valid at *send* time. SO_PEERCREDS, however, still reports the connect-time credentials, and kdbus provides exactly the same semantics and both ways of retrieving information. > I agree that the design seems to have improved to a state of being at > least decent, One reason for that is your feedback. Thanks for that again! > It's an optional feature that will get used, non-optionally, thousands > of times on each boot, apparently. Keep in mind that it's also a > scalability problem because it takes locks. If it ever gets used > thousands of times per CPU on a big thousand-core machine, it's going > to suck, and you'll have backed yourself into a corner. That's right, but again - if an application wants to gather this kind of information about tasks it interacts with, it can do so today by looking at /proc or similar sources. Desktop machines do exactly that already, and the kernel code executed in such cases very much resembles that in metadata.c, and is certainly not cheaper. kdbus just makes such information more accessible when requested. Which information is collected is defined by bit-masks on both the sender and the receiver connection, and most applications will effectively only use a very limited set by default if they go through one of the more high-level libraries. Also, when metadata is collected, the code mostly takes temporary references on objects like PIDs, namespaces etc. Which operation would you consider particularly expensive? Thanks again, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote: Hi Andy, On 01/29/2015 01:09 PM, Andy Lutomirski wrote: On Jan 29, 2015 6:42 AM, Daniel Mack dan...@zonque.org wrote: As we explained before, currently, D-Bus peers do collect the same information already if they need to have them, but they have to do deal with the inherit races in such cases. kdbus is closing the gap by optionally providing the same information along with each message, if requested. In all these discussions, no one ever gave a decent example use case. If a process drops some privilege, it must close all fds it has that captured its old privilege. This has nothing to do with kdbus. kdbus does not implement any new concept here but sticks to what SCM_CREDENTIALS does on SOL_SEQPACKET. An application can get a file-descriptor from socket() or socketpair() and freely pass it around between different tasks or threads, but messages will always have the credentials attached that are valid at *send* time. SO_PEERCREDS, however, still reports the connect-time credentials, and kdbus provides exactly the same semantics and both ways of retrieving information. I agree that the design seems to have improved to a state of being at least decent, One reason for that is your feedback. Thanks for that again! It's an optional feature that will get used, non-optionally, thousands of times on each boot, apparently. Keep in mind that it's also a scalability problem because it takes locks. If it ever gets used thousands of times per CPU on a big thousand-core machine, it's going to suck, and you'll have backed yourself into a corner. That's right, but again - if an application wants to gather this kind of information about tasks it interacts with, it can do so today by looking at /proc or similar sources. Desktop machines do exactly that already, and the kernel code executed in such cases very much resembles that in metadata.c, and is certainly not cheaper. kdbus just makes such information more accessible when requested. Which information is collected is defined by bit-masks on both the sender and the receiver connection, and most applications will effectively only use a very limited set by default if they go through one of the more high-level libraries. I should rephrase a bit. Kdbus doesn't require use of send-time metadata. It does, however, strongly encourage it, and it sounds like systemd and other major users will use send-time metadata. Once that happens, it's ABI (even if it's purely in userspace), and changing it is asking for security holes to pop up. So you'll be mostly stuck with it. Also, when metadata is collected, the code mostly takes temporary references on objects like PIDs, namespaces etc. Which operation would you consider particularly expensive? The refcounting, copies of some of the data, and counting bytes and allocating space. The refcounting is the part that will scale particularly badly on many CPUs. Do you have some simple benchmark code you can share? I'd like to play with it a bit. --Andy Thanks again, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Andy, On 01/29/2015 01:09 PM, Andy Lutomirski wrote: On Jan 29, 2015 6:42 AM, Daniel Mack dan...@zonque.org wrote: As we explained before, currently, D-Bus peers do collect the same information already if they need to have them, but they have to do deal with the inherit races in such cases. kdbus is closing the gap by optionally providing the same information along with each message, if requested. In all these discussions, no one ever gave a decent example use case. If a process drops some privilege, it must close all fds it has that captured its old privilege. This has nothing to do with kdbus. kdbus does not implement any new concept here but sticks to what SCM_CREDENTIALS does on SOL_SEQPACKET. An application can get a file-descriptor from socket() or socketpair() and freely pass it around between different tasks or threads, but messages will always have the credentials attached that are valid at *send* time. SO_PEERCREDS, however, still reports the connect-time credentials, and kdbus provides exactly the same semantics and both ways of retrieving information. I agree that the design seems to have improved to a state of being at least decent, One reason for that is your feedback. Thanks for that again! It's an optional feature that will get used, non-optionally, thousands of times on each boot, apparently. Keep in mind that it's also a scalability problem because it takes locks. If it ever gets used thousands of times per CPU on a big thousand-core machine, it's going to suck, and you'll have backed yourself into a corner. That's right, but again - if an application wants to gather this kind of information about tasks it interacts with, it can do so today by looking at /proc or similar sources. Desktop machines do exactly that already, and the kernel code executed in such cases very much resembles that in metadata.c, and is certainly not cheaper. kdbus just makes such information more accessible when requested. Which information is collected is defined by bit-masks on both the sender and the receiver connection, and most applications will effectively only use a very limited set by default if they go through one of the more high-level libraries. Also, when metadata is collected, the code mostly takes temporary references on objects like PIDs, namespaces etc. Which operation would you consider particularly expensive? Thanks again, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Jan 29, 2015 6:42 AM, "Daniel Mack" wrote: > > On 01/29/2015 12:25 PM, Andy Lutomirski wrote: > > On Jan 29, 2015 3:53 AM, "Daniel Mack" wrote: > > >> Also note that if a receiving peer opts in for a certain piece of > >> metadata, it should do that that for a good reason, because it needs > >> that data to process a request. Letting kdbus do the work of providing > >> such information is still a lot faster than having the receiving peer > >> gather it itself, as that would involve more syscalls and more context > >> switches (let alone the fact that doing so is inherently racy, as > >> explained in earlier threads). > > > Given that I see almost no advantage to send-time metadata, and I see > > three disadvantages (slower, inconsistent with the basic POSIX model, > > and inconsistent with existing user-space dbus), I still don't see why > > you designed it this way. > > Because effective information about tasks may change over time, and > D-Bus is a connection-less protocol that has no notion of peer-to-peer > connections. > > As we explained before, currently, D-Bus peers do collect the same > information already if they need to have them, but they have to do deal > with the inherit races in such cases. kdbus is closing the gap by > optionally providing the same information along with each message, if > requested. In all these discussions, no one ever gave a decent example use case. If a process drops some privilege, it must close all fds it has that captured its old privilege. This has nothing to do with kdbus. With kdbus, you still need to close and reopen your kdbus fd, unless you've disabled that bit of metadata, so using send-time metadata hasn't bought you benefit that I can see. I agree that the design seems to have improved to a state of being at least decent, but that doesn't mean that using send-time metadata is a good idea for systemd or for anything else. > > > There's an added disadvantage of the current design: if a kdbus user > > is communicating with a traditional d-bus user using the proxy, then > > IIUC the credentials at the time of connection get used. > > That's not quite true any more. After our discussion in v2, we agreed on > dropping this detail. If you're using the proxy, no metadata is attached > to messages any more. Userspace has to gather this information in the > traditional, racy way in such cases. You are right - metadata about the > proxy task is of no interest here, and hence dropping the information > altogether is the most consistent thing we can do. > > But again - that metadata thing just an optional feature. People > developing with the bare kernel-level API are free to ignore all that > and just just kdbus as low-level protocol for reliable multicast. Note > that in such cases, you would still be able to retrieve the connect-time > metadata if that's needed. > It's an optional feature that will get used, non-optionally, thousands of times on each boot, apparently. Keep in mind that it's also a scalability problem because it takes locks. If it ever gets used thousands of times per CPU on a big thousand-core machine, it's going to suck, and you'll have backed yourself into a corner. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On 01/29/2015 12:25 PM, Andy Lutomirski wrote: > On Jan 29, 2015 3:53 AM, "Daniel Mack" wrote: >> Also note that if a receiving peer opts in for a certain piece of >> metadata, it should do that that for a good reason, because it needs >> that data to process a request. Letting kdbus do the work of providing >> such information is still a lot faster than having the receiving peer >> gather it itself, as that would involve more syscalls and more context >> switches (let alone the fact that doing so is inherently racy, as >> explained in earlier threads). > Given that I see almost no advantage to send-time metadata, and I see > three disadvantages (slower, inconsistent with the basic POSIX model, > and inconsistent with existing user-space dbus), I still don't see why > you designed it this way. Because effective information about tasks may change over time, and D-Bus is a connection-less protocol that has no notion of peer-to-peer connections. As we explained before, currently, D-Bus peers do collect the same information already if they need to have them, but they have to do deal with the inherit races in such cases. kdbus is closing the gap by optionally providing the same information along with each message, if requested. > There's an added disadvantage of the current design: if a kdbus user > is communicating with a traditional d-bus user using the proxy, then > IIUC the credentials at the time of connection get used. That's not quite true any more. After our discussion in v2, we agreed on dropping this detail. If you're using the proxy, no metadata is attached to messages any more. Userspace has to gather this information in the traditional, racy way in such cases. You are right - metadata about the proxy task is of no interest here, and hence dropping the information altogether is the most consistent thing we can do. But again - that metadata thing just an optional feature. People developing with the bare kernel-level API are free to ignore all that and just just kdbus as low-level protocol for reliable multicast. Note that in such cases, you would still be able to retrieve the connect-time metadata if that's needed. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Jan 29, 2015 3:53 AM, "Daniel Mack" wrote: > > Hi Andy, > > On 01/27/2015 05:03 PM, Andy Lutomirski wrote: > > On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann > > wrote: > > >> A 16byte copy does not affect the performance of kdbus message > >> transactions in any way that matters. > > > What are the performance goals of kdbus? How fast is it ever intended > > to be? > > One of the design goals of kdbus is to speed up a typical D-Bus message > turnaround. That is, to minimize the number of context switches it > currently takes to get a message across the bus, and to avoid > unnecessary extra payload copies. > > Even though I'm sure there's still room for future improvement, the > benchmark test we provided in the kernel self-tests shows that basic > data transmission performance is roughly comparable to that of UDS for > smaller payloads. For payloads of bigger sizes (>128kb), kdbus is > actually faster due to its zero-copy mechanism. > > > The reason I ask is that, in the current design, kdbus > > collects "metadata" (credentials and other identifying information, > > collected in kdbus_meta_proc_collect) from the sender of every message > > *at send time*. [1] This is slow, and it will always be slow. The > > slowness of this operation will, in my personal system performance > > crystal ball, overshadow the cost of a 16 byte copy by several orders > > of magnitude. > > That's certainly true, but that's not a contradiction to the performance > argument. Please keep in mind that if a receiving peer does not request > any metadata, the kernel doesn't collect and attach any. We know that > gathering of some of the metadata comes at a price, which is why we > split up the information is such fine-grained pieces. > > Also note that if a receiving peer opts in for a certain piece of > metadata, it should do that that for a good reason, because it needs > that data to process a request. Letting kdbus do the work of providing > such information is still a lot faster than having the receiving peer > gather it itself, as that would involve more syscalls and more context > switches (let alone the fact that doing so is inherently racy, as > explained in earlier threads). All this is true, but if you used connect-time metadata, this would be a non-issue. Given that I see almost no advantage to send-time metadata, and I see three disadvantages (slower, inconsistent with the basic POSIX model, and inconsistent with existing user-space dbus), I still don't see why you designed it this way. There's an added disadvantage of the current design: if a kdbus user is communicating with a traditional d-bus user using the proxy, then IIUC the credentials at the time of connection get used. In summary, the current design is (a) unlike almost everything else that uses file descriptors, (b) much slower, (c) different from traditional d-bus, and (d) gives inconsistent behavior to new clients depending on what server they're connecting to. --Andy > > So, yes, collecting metadata can slow down message exchange, but after > all, that's an optional feature that has to be used with sense. I'll add > some words on that to the man-pages. > > > HTH, > Daniel > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Andy, On 01/27/2015 05:03 PM, Andy Lutomirski wrote: > On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann wrote: >> A 16byte copy does not affect the performance of kdbus message >> transactions in any way that matters. > What are the performance goals of kdbus? How fast is it ever intended > to be? One of the design goals of kdbus is to speed up a typical D-Bus message turnaround. That is, to minimize the number of context switches it currently takes to get a message across the bus, and to avoid unnecessary extra payload copies. Even though I'm sure there's still room for future improvement, the benchmark test we provided in the kernel self-tests shows that basic data transmission performance is roughly comparable to that of UDS for smaller payloads. For payloads of bigger sizes (>128kb), kdbus is actually faster due to its zero-copy mechanism. > The reason I ask is that, in the current design, kdbus > collects "metadata" (credentials and other identifying information, > collected in kdbus_meta_proc_collect) from the sender of every message > *at send time*. [1] This is slow, and it will always be slow. The > slowness of this operation will, in my personal system performance > crystal ball, overshadow the cost of a 16 byte copy by several orders > of magnitude. That's certainly true, but that's not a contradiction to the performance argument. Please keep in mind that if a receiving peer does not request any metadata, the kernel doesn't collect and attach any. We know that gathering of some of the metadata comes at a price, which is why we split up the information is such fine-grained pieces. Also note that if a receiving peer opts in for a certain piece of metadata, it should do that that for a good reason, because it needs that data to process a request. Letting kdbus do the work of providing such information is still a lot faster than having the receiving peer gather it itself, as that would involve more syscalls and more context switches (let alone the fact that doing so is inherently racy, as explained in earlier threads). So, yes, collecting metadata can slow down message exchange, but after all, that's an optional feature that has to be used with sense. I'll add some words on that to the man-pages. HTH, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Andy, On 01/27/2015 05:03 PM, Andy Lutomirski wrote: On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann dh.herrm...@gmail.com wrote: A 16byte copy does not affect the performance of kdbus message transactions in any way that matters. What are the performance goals of kdbus? How fast is it ever intended to be? One of the design goals of kdbus is to speed up a typical D-Bus message turnaround. That is, to minimize the number of context switches it currently takes to get a message across the bus, and to avoid unnecessary extra payload copies. Even though I'm sure there's still room for future improvement, the benchmark test we provided in the kernel self-tests shows that basic data transmission performance is roughly comparable to that of UDS for smaller payloads. For payloads of bigger sizes (128kb), kdbus is actually faster due to its zero-copy mechanism. The reason I ask is that, in the current design, kdbus collects metadata (credentials and other identifying information, collected in kdbus_meta_proc_collect) from the sender of every message *at send time*. [1] This is slow, and it will always be slow. The slowness of this operation will, in my personal system performance crystal ball, overshadow the cost of a 16 byte copy by several orders of magnitude. That's certainly true, but that's not a contradiction to the performance argument. Please keep in mind that if a receiving peer does not request any metadata, the kernel doesn't collect and attach any. We know that gathering of some of the metadata comes at a price, which is why we split up the information is such fine-grained pieces. Also note that if a receiving peer opts in for a certain piece of metadata, it should do that that for a good reason, because it needs that data to process a request. Letting kdbus do the work of providing such information is still a lot faster than having the receiving peer gather it itself, as that would involve more syscalls and more context switches (let alone the fact that doing so is inherently racy, as explained in earlier threads). So, yes, collecting metadata can slow down message exchange, but after all, that's an optional feature that has to be used with sense. I'll add some words on that to the man-pages. HTH, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On 01/29/2015 12:25 PM, Andy Lutomirski wrote: On Jan 29, 2015 3:53 AM, Daniel Mack dan...@zonque.org wrote: Also note that if a receiving peer opts in for a certain piece of metadata, it should do that that for a good reason, because it needs that data to process a request. Letting kdbus do the work of providing such information is still a lot faster than having the receiving peer gather it itself, as that would involve more syscalls and more context switches (let alone the fact that doing so is inherently racy, as explained in earlier threads). Given that I see almost no advantage to send-time metadata, and I see three disadvantages (slower, inconsistent with the basic POSIX model, and inconsistent with existing user-space dbus), I still don't see why you designed it this way. Because effective information about tasks may change over time, and D-Bus is a connection-less protocol that has no notion of peer-to-peer connections. As we explained before, currently, D-Bus peers do collect the same information already if they need to have them, but they have to do deal with the inherit races in such cases. kdbus is closing the gap by optionally providing the same information along with each message, if requested. There's an added disadvantage of the current design: if a kdbus user is communicating with a traditional d-bus user using the proxy, then IIUC the credentials at the time of connection get used. That's not quite true any more. After our discussion in v2, we agreed on dropping this detail. If you're using the proxy, no metadata is attached to messages any more. Userspace has to gather this information in the traditional, racy way in such cases. You are right - metadata about the proxy task is of no interest here, and hence dropping the information altogether is the most consistent thing we can do. But again - that metadata thing just an optional feature. People developing with the bare kernel-level API are free to ignore all that and just just kdbus as low-level protocol for reliable multicast. Note that in such cases, you would still be able to retrieve the connect-time metadata if that's needed. Thanks, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Jan 29, 2015 6:42 AM, Daniel Mack dan...@zonque.org wrote: On 01/29/2015 12:25 PM, Andy Lutomirski wrote: On Jan 29, 2015 3:53 AM, Daniel Mack dan...@zonque.org wrote: Also note that if a receiving peer opts in for a certain piece of metadata, it should do that that for a good reason, because it needs that data to process a request. Letting kdbus do the work of providing such information is still a lot faster than having the receiving peer gather it itself, as that would involve more syscalls and more context switches (let alone the fact that doing so is inherently racy, as explained in earlier threads). Given that I see almost no advantage to send-time metadata, and I see three disadvantages (slower, inconsistent with the basic POSIX model, and inconsistent with existing user-space dbus), I still don't see why you designed it this way. Because effective information about tasks may change over time, and D-Bus is a connection-less protocol that has no notion of peer-to-peer connections. As we explained before, currently, D-Bus peers do collect the same information already if they need to have them, but they have to do deal with the inherit races in such cases. kdbus is closing the gap by optionally providing the same information along with each message, if requested. In all these discussions, no one ever gave a decent example use case. If a process drops some privilege, it must close all fds it has that captured its old privilege. This has nothing to do with kdbus. With kdbus, you still need to close and reopen your kdbus fd, unless you've disabled that bit of metadata, so using send-time metadata hasn't bought you benefit that I can see. I agree that the design seems to have improved to a state of being at least decent, but that doesn't mean that using send-time metadata is a good idea for systemd or for anything else. There's an added disadvantage of the current design: if a kdbus user is communicating with a traditional d-bus user using the proxy, then IIUC the credentials at the time of connection get used. That's not quite true any more. After our discussion in v2, we agreed on dropping this detail. If you're using the proxy, no metadata is attached to messages any more. Userspace has to gather this information in the traditional, racy way in such cases. You are right - metadata about the proxy task is of no interest here, and hence dropping the information altogether is the most consistent thing we can do. But again - that metadata thing just an optional feature. People developing with the bare kernel-level API are free to ignore all that and just just kdbus as low-level protocol for reliable multicast. Note that in such cases, you would still be able to retrieve the connect-time metadata if that's needed. It's an optional feature that will get used, non-optionally, thousands of times on each boot, apparently. Keep in mind that it's also a scalability problem because it takes locks. If it ever gets used thousands of times per CPU on a big thousand-core machine, it's going to suck, and you'll have backed yourself into a corner. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Jan 29, 2015 3:53 AM, Daniel Mack dan...@zonque.org wrote: Hi Andy, On 01/27/2015 05:03 PM, Andy Lutomirski wrote: On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann dh.herrm...@gmail.com wrote: A 16byte copy does not affect the performance of kdbus message transactions in any way that matters. What are the performance goals of kdbus? How fast is it ever intended to be? One of the design goals of kdbus is to speed up a typical D-Bus message turnaround. That is, to minimize the number of context switches it currently takes to get a message across the bus, and to avoid unnecessary extra payload copies. Even though I'm sure there's still room for future improvement, the benchmark test we provided in the kernel self-tests shows that basic data transmission performance is roughly comparable to that of UDS for smaller payloads. For payloads of bigger sizes (128kb), kdbus is actually faster due to its zero-copy mechanism. The reason I ask is that, in the current design, kdbus collects metadata (credentials and other identifying information, collected in kdbus_meta_proc_collect) from the sender of every message *at send time*. [1] This is slow, and it will always be slow. The slowness of this operation will, in my personal system performance crystal ball, overshadow the cost of a 16 byte copy by several orders of magnitude. That's certainly true, but that's not a contradiction to the performance argument. Please keep in mind that if a receiving peer does not request any metadata, the kernel doesn't collect and attach any. We know that gathering of some of the metadata comes at a price, which is why we split up the information is such fine-grained pieces. Also note that if a receiving peer opts in for a certain piece of metadata, it should do that that for a good reason, because it needs that data to process a request. Letting kdbus do the work of providing such information is still a lot faster than having the receiving peer gather it itself, as that would involve more syscalls and more context switches (let alone the fact that doing so is inherently racy, as explained in earlier threads). All this is true, but if you used connect-time metadata, this would be a non-issue. Given that I see almost no advantage to send-time metadata, and I see three disadvantages (slower, inconsistent with the basic POSIX model, and inconsistent with existing user-space dbus), I still don't see why you designed it this way. There's an added disadvantage of the current design: if a kdbus user is communicating with a traditional d-bus user using the proxy, then IIUC the credentials at the time of connection get used. In summary, the current design is (a) unlike almost everything else that uses file descriptors, (b) much slower, (c) different from traditional d-bus, and (d) gives inconsistent behavior to new clients depending on what server they're connecting to. --Andy So, yes, collecting metadata can slow down message exchange, but after all, that's an optional feature that has to be used with sense. I'll add some words on that to the man-pages. HTH, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hello Daniel, On 01/27/2015 07:14 PM, Daniel Mack wrote: > Hi Michael, > > On 01/27/2015 06:53 PM, Michael Kerrisk (man-pages) wrote: >> On 01/27/2015 04:23 PM, David Herrmann wrote: > >>> I only expect a handful of users to call the ioctls directly. The >>> libraries that implement the payload-marshaling, in particular. It's a >>> similar situation with netlink. >> >> Thanks, David, for the clarification. I think it would have been helpful >> to have that more clearly stated up front, especially as some comments >> in this thread, such as the above, could be interpreted to mean quite >> the opposite. Can I suggest that some text on this point be added to >> kdbus.txt? > > We're currently working on an a set of comprehensive man pages to > document all the commands in the API, along with every struct, enum etc. > We do that so that developers are able to actually understand every > detail of the API, even though most people - as David explained - will > not use that interface directly in the first place but let one of the > high-level libraries help them integrate D-Bus functionality into their > applications. (I suggest that some text about this text go into the kdbus(7) page.) > If you want, have a look at the upstream repository for a preliminary > version of the new docs. That's at https://code.google.com/p/d-bus/ , right? This looks like a good direction to go in. Thanks for tackling that. I hope to take a longer look sometime soon, but a few general conventions for man-pages that you might want to consider following: * When listing errors, I think you should change your language/formatting somewhat. Examples here from kdbus.endpoint.7: (1) The man page says RETURN VALUE On success, all mentioned ioctl commands return 0. Better to write this from a user-space point of view: RETURN VALUE On success, all mentioned ioctl commands return 0; on error, -1 is returned, and errno is set to indicate the error. (2) I would change the wording in the ERRORS sections from "may return the following errors" to "may fail with the following errors" (3) When listing the errors, drop the minus signs; that's not what user-space sees. They see a positive value in errno. (4) The usual formatting convention for constants, including error constants in man pages is boldface, rather than underline/emphasis. (5) Insofar as it's possible, it would be good to make all pages format nicely within 80 columns. Some of the literal text and ASCII art could, I think, be narrowed. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hello Daniel, On 01/27/2015 07:14 PM, Daniel Mack wrote: Hi Michael, On 01/27/2015 06:53 PM, Michael Kerrisk (man-pages) wrote: On 01/27/2015 04:23 PM, David Herrmann wrote: I only expect a handful of users to call the ioctls directly. The libraries that implement the payload-marshaling, in particular. It's a similar situation with netlink. Thanks, David, for the clarification. I think it would have been helpful to have that more clearly stated up front, especially as some comments in this thread, such as the above, could be interpreted to mean quite the opposite. Can I suggest that some text on this point be added to kdbus.txt? We're currently working on an a set of comprehensive man pages to document all the commands in the API, along with every struct, enum etc. We do that so that developers are able to actually understand every detail of the API, even though most people - as David explained - will not use that interface directly in the first place but let one of the high-level libraries help them integrate D-Bus functionality into their applications. (I suggest that some text about this text go into the kdbus(7) page.) If you want, have a look at the upstream repository for a preliminary version of the new docs. That's at https://code.google.com/p/d-bus/ , right? This looks like a good direction to go in. Thanks for tackling that. I hope to take a longer look sometime soon, but a few general conventions for man-pages that you might want to consider following: * When listing errors, I think you should change your language/formatting somewhat. Examples here from kdbus.endpoint.7: (1) The man page says RETURN VALUE On success, all mentioned ioctl commands return 0. Better to write this from a user-space point of view: RETURN VALUE On success, all mentioned ioctl commands return 0; on error, -1 is returned, and errno is set to indicate the error. (2) I would change the wording in the ERRORS sections from may return the following errors to may fail with the following errors (3) When listing the errors, drop the minus signs; that's not what user-space sees. They see a positive value in errno. (4) The usual formatting convention for constants, including error constants in man pages is boldface, rather than underline/emphasis. (5) Insofar as it's possible, it would be good to make all pages format nicely within 80 columns. Some of the literal text and ASCII art could, I think, be narrowed. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael, On 01/27/2015 06:53 PM, Michael Kerrisk (man-pages) wrote: > On 01/27/2015 04:23 PM, David Herrmann wrote: >> I only expect a handful of users to call the ioctls directly. The >> libraries that implement the payload-marshaling, in particular. It's a >> similar situation with netlink. > > Thanks, David, for the clarification. I think it would have been helpful > to have that more clearly stated up front, especially as some comments > in this thread, such as the above, could be interpreted to mean quite > the opposite. Can I suggest that some text on this point be added to > kdbus.txt? We're currently working on an a set of comprehensive man pages to document all the commands in the API, along with every struct, enum etc. We do that so that developers are able to actually understand every detail of the API, even though most people - as David explained - will not use that interface directly in the first place but let one of the high-level libraries help them integrate D-Bus functionality into their applications. If you want, have a look at the upstream repository for a preliminary version of the new docs. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi David, On 01/27/2015 04:05 PM, David Herrmann wrote: > Hi > > On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages) > wrote: >> Hello Greg, >> >> On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote: >>> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote: While I agree that there should be a way for userspace to get the list of supported operations, userspace apps will only actually care about that once, when they begin talking to kdbus, because (ignoring the live kernel patching that people have been working on recently) the list of supported operations isn't going to change while the system is running. While a u64 copy has relatively low overhead, it does have overhead, and that is very significant when you consider part of the reason some people want kdbus is for the performance gain. Especially for those automotive applications that have been mentioned which fire off thousands of messages during start-up, every little bit of performance is significant. >>> >>> A single u64 in a structure is not going to be measurable at all, >>> processors just copy memory too fast these days for 4 extra bytes to be >>> noticable. >> >> It depends on the definition of measurable, I suppose, but this statement >> appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking >> about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back >> sets of valid flags. That's 16 bytes, and it definitely makes a difference. >> Simply running a loop that does a naive memcpy() in a tight user-space >> loop (code below), I see the following for the execution of 1e9 loops: >> >> Including the two extra u64 fields: 3.2 sec >> Without the two extra u64 fields: 2.6 sec >> >> On the same box, doing 1e9 calls to getppid() (i.e., pretty much the >> simplest syscall, giving us a rough measure of the context switch) takes >> 68 seconds. In other words, the cost of copying those 16 bytes is about 1% >> of the base context switch/syscall cost. I assume the costs of copying >> those 16 bytes across the kernel-user-space boundary would not be cheaper, >> but have not tested that. If my assumption is correct, then 1% seems a >> significant figure to me in an API whose raison d'ĂŞtre is speed. > > I have no idea how this is related to any kdbus ioctl? > > A 16byte copy does not affect the performance of kdbus message > transactions in any way that matters. I'm not sure if it's related/significant or not, since I'm ignorant of the performance figures for kdbus. I just got curious when Greg stated that the cost of copying would not be noticeable. (I got curious also about my assumption, and did an experiment with a dummy system call that throws bytes across the fence into user space. The cost of an extra 16 bytes (56 to 72 bytes) is about 3% of the base syscall/context switch cost.) >>> So let's make this as easy as possible for userspace, making >>> it simpler logic there, which is much more important than saving >>> theoretical time in the kernel. >> >> But this also missed the other part of the point. Copying these fields on >> every operation, when in fact they are only needed once, clutters the API, >> in my opinion. Good APIs are as simple as they can be to do their job. >> Redundancy is an enemy of simplicity. Simplest would have been a one time >> API that returns a structure containing all of the supported flags across >> the API. Alternatively, the traditional EINVAL approach is well understood, >> and suffices. > > We're going to drop "kernel_flags" in favor of a new > KDBUS_FLAG_NEGOTIATE flag which asks the kernel to do feature > negotiation for this ioctl and return the supported flags/items inline > (overwriting the passed data). The ioctl will not be executed and will > not affect the state of the FD. > I hope this keeps the API simple. Not sure I quite understand the details from your description, but I assume the it'll end up in the doc, and I'll try to take a look later. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On 01/27/2015 04:23 PM, David Herrmann wrote: > Hi > > On Mon, Jan 26, 2015 at 5:45 PM, Michael Kerrisk (man-pages) > wrote: >> On 01/26/2015 04:26 PM, Tom Gundersen wrote: >>> On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages) >>> wrote: 2. Is the API to be invoked directly by applications or is intended to be used only behind specific libraries? You seem to be saying that the latter is the case (here, I'm referring to your comment above about sd-bus). However, when I asked David Herrmann a similar question I got this responser: "kdbus is in no way bound to systemd. There are ongoing efforts to port glib and qt to kdbus natively. The API is pretty simple and I don't see how a libkdbus would simplify things. In fact, even our tests only have slim wrappers around the ioctls to simplify error-handling in test-scenarios." To me, that implies that users will employ the raw kernel API. >>> >>> The way I read this is that there will (probably) be a handful of >>> users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt, >>> ell, and maybe a few others. However, third-party developers will not >>> know/care about the details of kdbus, they'll just be coding against >>> the dbus libraries as before (might be minor changes, but they >>> certainly won't need to know anything about the kernel API). Similarly >>> to how userspace developers now code against their libc of choice, >>> rather than use kernel syscalls directly. >> >> Thanks, Tom, for the input. I'm still confused though, since elsewhere >> in this thread David Herrmann said in response to a question of mine: >> >> I think we can agree that we want it to be generically useful, >> like other ipc mechanisms, including UDS and netlink. >> >> Again, that sounds to me like the vision is not "a handful of users". >> Hopefully Greg and David can clarify. > > I only expect a handful of users to call the ioctls directly. The > libraries that implement the payload-marshaling, in particular. It's a > similar situation with netlink. Thanks, David, for the clarification. I think it would have been helpful to have that more clearly stated up front, especially as some comments in this thread, such as the above, could be interpreted to mean quite the opposite. Can I suggest that some text on this point be added to kdbus.txt? Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann wrote: > Hi > > On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages) > wrote: >> Hello Greg, >> >> On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote: >>> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote: While I agree that there should be a way for userspace to get the list of supported operations, userspace apps will only actually care about that once, when they begin talking to kdbus, because (ignoring the live kernel patching that people have been working on recently) the list of supported operations isn't going to change while the system is running. While a u64 copy has relatively low overhead, it does have overhead, and that is very significant when you consider part of the reason some people want kdbus is for the performance gain. Especially for those automotive applications that have been mentioned which fire off thousands of messages during start-up, every little bit of performance is significant. >>> >>> A single u64 in a structure is not going to be measurable at all, >>> processors just copy memory too fast these days for 4 extra bytes to be >>> noticable. >> >> It depends on the definition of measurable, I suppose, but this statement >> appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking >> about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back >> sets of valid flags. That's 16 bytes, and it definitely makes a difference. >> Simply running a loop that does a naive memcpy() in a tight user-space >> loop (code below), I see the following for the execution of 1e9 loops: >> >> Including the two extra u64 fields: 3.2 sec >> Without the two extra u64 fields: 2.6 sec >> >> On the same box, doing 1e9 calls to getppid() (i.e., pretty much the >> simplest syscall, giving us a rough measure of the context switch) takes >> 68 seconds. In other words, the cost of copying those 16 bytes is about 1% >> of the base context switch/syscall cost. I assume the costs of copying >> those 16 bytes across the kernel-user-space boundary would not be cheaper, >> but have not tested that. If my assumption is correct, then 1% seems a >> significant figure to me in an API whose raison d'ĂŞtre is speed. > > I have no idea how this is related to any kdbus ioctl? > > A 16byte copy does not affect the performance of kdbus message > transactions in any way that matters. > Sorry for jumping in so late. Since this version of kdbus was sent, I've been on vacation for part of the time and I had the flu for the rest of the time. What are the performance goals of kdbus? How fast is it ever intended to be? The reason I ask is that, in the current design, kdbus collects "metadata" (credentials and other identifying information, collected in kdbus_meta_proc_collect) from the sender of every message *at send time*. [1] This is slow, and it will always be slow. The slowness of this operation will, in my personal system performance crystal ball, overshadow the cost of a 16 byte copy by several orders of magnitude. [1] After much discussion last time around, I'm at least convinced that the kdbus people have reasons to like the idea of capturing metadata for each message. I still think the design is wrong even from a security standpoint, but right now I'm talking about performance. If you want the data plane to be fast, it should be separated from the control plane as much as possible, and this design is the opposite. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi On Mon, Jan 26, 2015 at 5:45 PM, Michael Kerrisk (man-pages) wrote: > On 01/26/2015 04:26 PM, Tom Gundersen wrote: >> On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages) >> wrote: >>> 2. Is the API to be invoked directly by applications or is intended to >>>be used only behind specific libraries? You seem to be saying that >>>the latter is the case (here, I'm referring to your comment above >>>about sd-bus). However, when I asked David Herrmann a similar >>>question I got this responser: >>> >>> "kdbus is in no way bound to systemd. There are ongoing efforts >>>to port glib and qt to kdbus natively. The API is pretty simple >>>and I don't see how a libkdbus would simplify things. In fact, >>>even our tests only have slim wrappers around the ioctls to >>>simplify error-handling in test-scenarios." >>> >>>To me, that implies that users will employ the raw kernel API. >> >> The way I read this is that there will (probably) be a handful of >> users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt, >> ell, and maybe a few others. However, third-party developers will not >> know/care about the details of kdbus, they'll just be coding against >> the dbus libraries as before (might be minor changes, but they >> certainly won't need to know anything about the kernel API). Similarly >> to how userspace developers now code against their libc of choice, >> rather than use kernel syscalls directly. > > Thanks, Tom, for the input. I'm still confused though, since elsewhere > in this thread David Herrmann said in response to a question of mine: > > I think we can agree that we want it to be generically useful, > like other ipc mechanisms, including UDS and netlink. > > Again, that sounds to me like the vision is not "a handful of users". > Hopefully Greg and David can clarify. I only expect a handful of users to call the ioctls directly. The libraries that implement the payload-marshaling, in particular. It's a similar situation with netlink. Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages) wrote: > Hello Greg, > > On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote: >> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote: >>> While I agree that there should be a way for userspace to get the list of >>> supported operations, userspace apps will only actually care about that >>> once, when they begin talking to kdbus, because (ignoring the live kernel >>> patching that people have been working on recently) the list of supported >>> operations isn't going to change while the system is running. While a u64 >>> copy has relatively low overhead, it does have overhead, and that is very >>> significant when you consider part of the reason some people want kdbus is >>> for the performance gain. Especially for those automotive applications that >>> have been mentioned which fire off thousands of messages during start-up, >>> every little bit of performance is significant. >> >> A single u64 in a structure is not going to be measurable at all, >> processors just copy memory too fast these days for 4 extra bytes to be >> noticable. > > It depends on the definition of measurable, I suppose, but this statement > appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking > about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back > sets of valid flags. That's 16 bytes, and it definitely makes a difference. > Simply running a loop that does a naive memcpy() in a tight user-space > loop (code below), I see the following for the execution of 1e9 loops: > > Including the two extra u64 fields: 3.2 sec > Without the two extra u64 fields: 2.6 sec > > On the same box, doing 1e9 calls to getppid() (i.e., pretty much the > simplest syscall, giving us a rough measure of the context switch) takes > 68 seconds. In other words, the cost of copying those 16 bytes is about 1% > of the base context switch/syscall cost. I assume the costs of copying > those 16 bytes across the kernel-user-space boundary would not be cheaper, > but have not tested that. If my assumption is correct, then 1% seems a > significant figure to me in an API whose raison d'ĂŞtre is speed. I have no idea how this is related to any kdbus ioctl? A 16byte copy does not affect the performance of kdbus message transactions in any way that matters. >> So let's make this as easy as possible for userspace, making >> it simpler logic there, which is much more important than saving >> theoretical time in the kernel. > > But this also missed the other part of the point. Copying these fields on > every operation, when in fact they are only needed once, clutters the API, > in my opinion. Good APIs are as simple as they can be to do their job. > Redundancy is an enemy of simplicity. Simplest would have been a one time > API that returns a structure containing all of the supported flags across > the API. Alternatively, the traditional EINVAL approach is well understood, > and suffices. We're going to drop "kernel_flags" in favor of a new KDBUS_FLAG_NEGOTIATE flag which asks the kernel to do feature negotiation for this ioctl and return the supported flags/items inline (overwriting the passed data). The ioctl will not be executed and will not affect the state of the FD. I hope this keeps the API simple. Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael, On 01/27/2015 06:53 PM, Michael Kerrisk (man-pages) wrote: On 01/27/2015 04:23 PM, David Herrmann wrote: I only expect a handful of users to call the ioctls directly. The libraries that implement the payload-marshaling, in particular. It's a similar situation with netlink. Thanks, David, for the clarification. I think it would have been helpful to have that more clearly stated up front, especially as some comments in this thread, such as the above, could be interpreted to mean quite the opposite. Can I suggest that some text on this point be added to kdbus.txt? We're currently working on an a set of comprehensive man pages to document all the commands in the API, along with every struct, enum etc. We do that so that developers are able to actually understand every detail of the API, even though most people - as David explained - will not use that interface directly in the first place but let one of the high-level libraries help them integrate D-Bus functionality into their applications. If you want, have a look at the upstream repository for a preliminary version of the new docs. Thanks, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: Hello Greg, On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote: On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote: While I agree that there should be a way for userspace to get the list of supported operations, userspace apps will only actually care about that once, when they begin talking to kdbus, because (ignoring the live kernel patching that people have been working on recently) the list of supported operations isn't going to change while the system is running. While a u64 copy has relatively low overhead, it does have overhead, and that is very significant when you consider part of the reason some people want kdbus is for the performance gain. Especially for those automotive applications that have been mentioned which fire off thousands of messages during start-up, every little bit of performance is significant. A single u64 in a structure is not going to be measurable at all, processors just copy memory too fast these days for 4 extra bytes to be noticable. It depends on the definition of measurable, I suppose, but this statement appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back sets of valid flags. That's 16 bytes, and it definitely makes a difference. Simply running a loop that does a naive memcpy() in a tight user-space loop (code below), I see the following for the execution of 1e9 loops: Including the two extra u64 fields: 3.2 sec Without the two extra u64 fields: 2.6 sec On the same box, doing 1e9 calls to getppid() (i.e., pretty much the simplest syscall, giving us a rough measure of the context switch) takes 68 seconds. In other words, the cost of copying those 16 bytes is about 1% of the base context switch/syscall cost. I assume the costs of copying those 16 bytes across the kernel-user-space boundary would not be cheaper, but have not tested that. If my assumption is correct, then 1% seems a significant figure to me in an API whose raison d'ĂŞtre is speed. I have no idea how this is related to any kdbus ioctl? A 16byte copy does not affect the performance of kdbus message transactions in any way that matters. So let's make this as easy as possible for userspace, making it simpler logic there, which is much more important than saving theoretical time in the kernel. But this also missed the other part of the point. Copying these fields on every operation, when in fact they are only needed once, clutters the API, in my opinion. Good APIs are as simple as they can be to do their job. Redundancy is an enemy of simplicity. Simplest would have been a one time API that returns a structure containing all of the supported flags across the API. Alternatively, the traditional EINVAL approach is well understood, and suffices. We're going to drop kernel_flags in favor of a new KDBUS_FLAG_NEGOTIATE flag which asks the kernel to do feature negotiation for this ioctl and return the supported flags/items inline (overwriting the passed data). The ioctl will not be executed and will not affect the state of the FD. I hope this keeps the API simple. Thanks David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi On Mon, Jan 26, 2015 at 5:45 PM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: On 01/26/2015 04:26 PM, Tom Gundersen wrote: On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: 2. Is the API to be invoked directly by applications or is intended to be used only behind specific libraries? You seem to be saying that the latter is the case (here, I'm referring to your comment above about sd-bus). However, when I asked David Herrmann a similar question I got this responser: kdbus is in no way bound to systemd. There are ongoing efforts to port glib and qt to kdbus natively. The API is pretty simple and I don't see how a libkdbus would simplify things. In fact, even our tests only have slim wrappers around the ioctls to simplify error-handling in test-scenarios. To me, that implies that users will employ the raw kernel API. The way I read this is that there will (probably) be a handful of users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt, ell, and maybe a few others. However, third-party developers will not know/care about the details of kdbus, they'll just be coding against the dbus libraries as before (might be minor changes, but they certainly won't need to know anything about the kernel API). Similarly to how userspace developers now code against their libc of choice, rather than use kernel syscalls directly. Thanks, Tom, for the input. I'm still confused though, since elsewhere in this thread David Herrmann said in response to a question of mine: I think we can agree that we want it to be generically useful, like other ipc mechanisms, including UDS and netlink. Again, that sounds to me like the vision is not a handful of users. Hopefully Greg and David can clarify. I only expect a handful of users to call the ioctls directly. The libraries that implement the payload-marshaling, in particular. It's a similar situation with netlink. Thanks David -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann dh.herrm...@gmail.com wrote: Hi On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: Hello Greg, On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote: On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote: While I agree that there should be a way for userspace to get the list of supported operations, userspace apps will only actually care about that once, when they begin talking to kdbus, because (ignoring the live kernel patching that people have been working on recently) the list of supported operations isn't going to change while the system is running. While a u64 copy has relatively low overhead, it does have overhead, and that is very significant when you consider part of the reason some people want kdbus is for the performance gain. Especially for those automotive applications that have been mentioned which fire off thousands of messages during start-up, every little bit of performance is significant. A single u64 in a structure is not going to be measurable at all, processors just copy memory too fast these days for 4 extra bytes to be noticable. It depends on the definition of measurable, I suppose, but this statement appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back sets of valid flags. That's 16 bytes, and it definitely makes a difference. Simply running a loop that does a naive memcpy() in a tight user-space loop (code below), I see the following for the execution of 1e9 loops: Including the two extra u64 fields: 3.2 sec Without the two extra u64 fields: 2.6 sec On the same box, doing 1e9 calls to getppid() (i.e., pretty much the simplest syscall, giving us a rough measure of the context switch) takes 68 seconds. In other words, the cost of copying those 16 bytes is about 1% of the base context switch/syscall cost. I assume the costs of copying those 16 bytes across the kernel-user-space boundary would not be cheaper, but have not tested that. If my assumption is correct, then 1% seems a significant figure to me in an API whose raison d'ĂŞtre is speed. I have no idea how this is related to any kdbus ioctl? A 16byte copy does not affect the performance of kdbus message transactions in any way that matters. Sorry for jumping in so late. Since this version of kdbus was sent, I've been on vacation for part of the time and I had the flu for the rest of the time. What are the performance goals of kdbus? How fast is it ever intended to be? The reason I ask is that, in the current design, kdbus collects metadata (credentials and other identifying information, collected in kdbus_meta_proc_collect) from the sender of every message *at send time*. [1] This is slow, and it will always be slow. The slowness of this operation will, in my personal system performance crystal ball, overshadow the cost of a 16 byte copy by several orders of magnitude. [1] After much discussion last time around, I'm at least convinced that the kdbus people have reasons to like the idea of capturing metadata for each message. I still think the design is wrong even from a security standpoint, but right now I'm talking about performance. If you want the data plane to be fast, it should be separated from the control plane as much as possible, and this design is the opposite. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On 01/27/2015 04:23 PM, David Herrmann wrote: Hi On Mon, Jan 26, 2015 at 5:45 PM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: On 01/26/2015 04:26 PM, Tom Gundersen wrote: On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: 2. Is the API to be invoked directly by applications or is intended to be used only behind specific libraries? You seem to be saying that the latter is the case (here, I'm referring to your comment above about sd-bus). However, when I asked David Herrmann a similar question I got this responser: kdbus is in no way bound to systemd. There are ongoing efforts to port glib and qt to kdbus natively. The API is pretty simple and I don't see how a libkdbus would simplify things. In fact, even our tests only have slim wrappers around the ioctls to simplify error-handling in test-scenarios. To me, that implies that users will employ the raw kernel API. The way I read this is that there will (probably) be a handful of users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt, ell, and maybe a few others. However, third-party developers will not know/care about the details of kdbus, they'll just be coding against the dbus libraries as before (might be minor changes, but they certainly won't need to know anything about the kernel API). Similarly to how userspace developers now code against their libc of choice, rather than use kernel syscalls directly. Thanks, Tom, for the input. I'm still confused though, since elsewhere in this thread David Herrmann said in response to a question of mine: I think we can agree that we want it to be generically useful, like other ipc mechanisms, including UDS and netlink. Again, that sounds to me like the vision is not a handful of users. Hopefully Greg and David can clarify. I only expect a handful of users to call the ioctls directly. The libraries that implement the payload-marshaling, in particular. It's a similar situation with netlink. Thanks, David, for the clarification. I think it would have been helpful to have that more clearly stated up front, especially as some comments in this thread, such as the above, could be interpreted to mean quite the opposite. Can I suggest that some text on this point be added to kdbus.txt? Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi David, On 01/27/2015 04:05 PM, David Herrmann wrote: Hi On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: Hello Greg, On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote: On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote: While I agree that there should be a way for userspace to get the list of supported operations, userspace apps will only actually care about that once, when they begin talking to kdbus, because (ignoring the live kernel patching that people have been working on recently) the list of supported operations isn't going to change while the system is running. While a u64 copy has relatively low overhead, it does have overhead, and that is very significant when you consider part of the reason some people want kdbus is for the performance gain. Especially for those automotive applications that have been mentioned which fire off thousands of messages during start-up, every little bit of performance is significant. A single u64 in a structure is not going to be measurable at all, processors just copy memory too fast these days for 4 extra bytes to be noticable. It depends on the definition of measurable, I suppose, but this statement appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back sets of valid flags. That's 16 bytes, and it definitely makes a difference. Simply running a loop that does a naive memcpy() in a tight user-space loop (code below), I see the following for the execution of 1e9 loops: Including the two extra u64 fields: 3.2 sec Without the two extra u64 fields: 2.6 sec On the same box, doing 1e9 calls to getppid() (i.e., pretty much the simplest syscall, giving us a rough measure of the context switch) takes 68 seconds. In other words, the cost of copying those 16 bytes is about 1% of the base context switch/syscall cost. I assume the costs of copying those 16 bytes across the kernel-user-space boundary would not be cheaper, but have not tested that. If my assumption is correct, then 1% seems a significant figure to me in an API whose raison d'ĂŞtre is speed. I have no idea how this is related to any kdbus ioctl? A 16byte copy does not affect the performance of kdbus message transactions in any way that matters. I'm not sure if it's related/significant or not, since I'm ignorant of the performance figures for kdbus. I just got curious when Greg stated that the cost of copying would not be noticeable. (I got curious also about my assumption, and did an experiment with a dummy system call that throws bytes across the fence into user space. The cost of an extra 16 bytes (56 to 72 bytes) is about 3% of the base syscall/context switch cost.) So let's make this as easy as possible for userspace, making it simpler logic there, which is much more important than saving theoretical time in the kernel. But this also missed the other part of the point. Copying these fields on every operation, when in fact they are only needed once, clutters the API, in my opinion. Good APIs are as simple as they can be to do their job. Redundancy is an enemy of simplicity. Simplest would have been a one time API that returns a structure containing all of the supported flags across the API. Alternatively, the traditional EINVAL approach is well understood, and suffices. We're going to drop kernel_flags in favor of a new KDBUS_FLAG_NEGOTIATE flag which asks the kernel to do feature negotiation for this ioctl and return the supported flags/items inline (overwriting the passed data). The ioctl will not be executed and will not affect the state of the FD. I hope this keeps the API simple. Not sure I quite understand the details from your description, but I assume the it'll end up in the doc, and I'll try to take a look later. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On 01/26/2015 04:26 PM, Tom Gundersen wrote: > Hi Michael, > > On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages) > wrote: >> 2. Is the API to be invoked directly by applications or is intended to >>be used only behind specific libraries? You seem to be saying that >>the latter is the case (here, I'm referring to your comment above >>about sd-bus). However, when I asked David Herrmann a similar >>question I got this responser: >> >> "kdbus is in no way bound to systemd. There are ongoing efforts >>to port glib and qt to kdbus natively. The API is pretty simple >>and I don't see how a libkdbus would simplify things. In fact, >>even our tests only have slim wrappers around the ioctls to >>simplify error-handling in test-scenarios." >> >>To me, that implies that users will employ the raw kernel API. > > The way I read this is that there will (probably) be a handful of > users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt, > ell, and maybe a few others. However, third-party developers will not > know/care about the details of kdbus, they'll just be coding against > the dbus libraries as before (might be minor changes, but they > certainly won't need to know anything about the kernel API). Similarly > to how userspace developers now code against their libc of choice, > rather than use kernel syscalls directly. Thanks, Tom, for the input. I'm still confused though, since elsewhere in this thread David Herrmann said in response to a question of mine: I think we can agree that we want it to be generically useful, like other ipc mechanisms, including UDS and netlink. Again, that sounds to me like the vision is not "a handful of users". Hopefully Greg and David can clarify. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Mon, Jan 26, 2015 at 04:26:53PM +0100, Tom Gundersen wrote: > The way I read this is that there will (probably) be a handful of > users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt, > ell, and maybe a few others. However, third-party developers will not > know/care about the details of kdbus, they'll just be coding against > the dbus libraries as before (might be minor changes, but they > certainly won't need to know anything about the kernel API). Similarly > to how userspace developers now code against their libc of choice, > rather than use kernel syscalls directly. Which means we do need proper man pages and detailed documentation for it, just like syscalls for syscalls which just happened to be used by a few libcs. I suspect it really should be implemented as syscalls anyway, but we can leave that argument aside from now. Good documentation certainly helps with making that decision in an educated way. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael, On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages) wrote: > 2. Is the API to be invoked directly by applications or is intended to >be used only behind specific libraries? You seem to be saying that >the latter is the case (here, I'm referring to your comment above >about sd-bus). However, when I asked David Herrmann a similar >question I got this responser: > > "kdbus is in no way bound to systemd. There are ongoing efforts >to port glib and qt to kdbus natively. The API is pretty simple >and I don't see how a libkdbus would simplify things. In fact, >even our tests only have slim wrappers around the ioctls to >simplify error-handling in test-scenarios." > >To me, that implies that users will employ the raw kernel API. The way I read this is that there will (probably) be a handful of users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt, ell, and maybe a few others. However, third-party developers will not know/care about the details of kdbus, they'll just be coding against the dbus libraries as before (might be minor changes, but they certainly won't need to know anything about the kernel API). Similarly to how userspace developers now code against their libc of choice, rather than use kernel syscalls directly. HTH, Tom -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hello Greg, On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote: > On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote: >> While I agree that there should be a way for userspace to get the list of >> supported operations, userspace apps will only actually care about that >> once, when they begin talking to kdbus, because (ignoring the live kernel >> patching that people have been working on recently) the list of supported >> operations isn't going to change while the system is running. While a u64 >> copy has relatively low overhead, it does have overhead, and that is very >> significant when you consider part of the reason some people want kdbus is >> for the performance gain. Especially for those automotive applications that >> have been mentioned which fire off thousands of messages during start-up, >> every little bit of performance is significant. > > A single u64 in a structure is not going to be measurable at all, > processors just copy memory too fast these days for 4 extra bytes to be > noticable. It depends on the definition of measurable, I suppose, but this statement appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back sets of valid flags. That's 16 bytes, and it definitely makes a difference. Simply running a loop that does a naive memcpy() in a tight user-space loop (code below), I see the following for the execution of 1e9 loops: Including the two extra u64 fields: 3.2 sec Without the two extra u64 fields: 2.6 sec On the same box, doing 1e9 calls to getppid() (i.e., pretty much the simplest syscall, giving us a rough measure of the context switch) takes 68 seconds. In other words, the cost of copying those 16 bytes is about 1% of the base context switch/syscall cost. I assume the costs of copying those 16 bytes across the kernel-user-space boundary would not be cheaper, but have not tested that. If my assumption is correct, then 1% seems a significant figure to me in an API whose raison d'ĂŞtre is speed. > So let's make this as easy as possible for userspace, making > it simpler logic there, which is much more important than saving > theoretical time in the kernel. But this also missed the other part of the point. Copying these fields on every operation, when in fact they are only needed once, clutters the API, in my opinion. Good APIs are as simple as they can be to do their job. Redundancy is an enemy of simplicity. Simplest would have been a one time API that returns a structure containing all of the supported flags across the API. Alternatively, the traditional EINVAL approach is well understood, and suffices. Thanks, Michael = #include #include #include #include #include struct kdbus_msg_info { uint64_t offset; uint64_t msg_size; uint64_t return_flags; }; struct kdbus_cmd_send { uint64_t size; uint64_t flags; #if FIELDS >= 1 uint64_t kernel_flags; #endif #if FIELDS >= 2 uint64_t kernel_msg_flags; #endif uint64_t return_flags; uint64_t msg_address; struct kdbus_msg_info reply; //struct kdbus_item items[0]; } __attribute__((aligned(8))); int main(int argc, char *argv[]) { long nloops, j; struct kdbus_cmd_send src, dst; memset(, 0, sizeof(struct kdbus_cmd_send)); printf("struct size: %zd\n", sizeof(struct kdbus_cmd_send)); nloops = (argc > 1) ? atol(argv[1]) : 10; for (j = 0; j < nloops; j++) { memcpy(, , sizeof(struct kdbus_cmd_send)); } exit(EXIT_SUCCESS); } -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Greg, First of all, I seem to have offended you. That was not my intention. It's certainly not my intent to disparage you or your work (or for that matter, the other kdbus developers). Insofar as any of the wordings I've used suggested otherwise, I do apologize. I'll comment on various points below, keeping it as technical as I can. Then I have a couple of general questions at the end with the goal of ensuring that my comments are not coming from a broken world view. On 01/23/2015 04:54 PM, Greg Kroah-Hartman wrote: > On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote: And that process seems to be frequent and ongoing even now. (And it's to your great credit that the API/ABI breaks are clearly and honestly marked in the kdbus.h changelog.) All of this lightens the burden of API design for kernel developers, but I'm concerned that the long-term pain for user-space developers who use an API which (in my estimation) may come to be widely used will be enormous. >>> >>> Yes, we've jointly reviewed the API details again until just recently to >>> unify structs and enums etc, and added fields to make the ioctls structs >>> more versatile for possible future additions. By that, we effectively >>> broke the ABI, but we did that because we know we can't do such things >>> again in the future. >>> >>> But again - I don't see how this would be different when using syscalls >>> rather than ioctls to transport information between the driver and >>> userspace. Could you elaborate? >> >> My suspicion is that not nearly enough thinking has yet been done about >> the design of the API. That's based on these observations: >> >> * Documentation that is, considering the size of the API, *way* too thin. >> * Some parts of the API not documented at all (various kdbus_item blobs) >> * ABI changes happening even quite recently >> * API oddities such as the 'kernel_flags' fields. Why do I need to >> be told what flags the kernel supports on *every* operation? >> >> The above is just after a day of looking hard at kdbus.txt. I strongly >> suspect I'd find a lot of other issues if I spent more time on kdbus. > > "not enough thinking"? > > We started working on kdbus 2 years ago this FOSDEM (in a few weeks.) > Before that we have been thinking about this for many years, learning > from the previous attempts to get this type of feature merged into the > kernel, talking with users about what they need for this, and soliciting > kernel developer's opinions on what type of API would be best for this > type of feature. > > Since then we have done nothing but constantly revise the API. My first > mock ups were way too simple, and in discussing things with people much > more knowledgeable about D-Bus, they pointed out the problems, and we > iterated. And iterated. And iterated some more. We have worked with > just about every userspace libdbus developer group, including QtDbus > developers as well as glib developers. Now not all of them agreed with > some of our decisions in the implementation, which is fair enough, you > can't please everyone, but they _all_ agree that what we have now is the > proper way to implement this type of functionality and have reviewed the > features as being correct and compatible with their needs and users. > > Those discussions have happened in emails, presentations, meetings, and > hackfests pretty much continuously for the past 2 years all around the > world. > > We have stress-tested the api with both unit tests (which are included > here in the patch set) as well as a real-world implementation (sd-bus in > the systemd source repo.) That real-world implementation successfully > has been booting many of our daily machines for many months now. Notwithstanding that I don't see how a unit test stress tests an API *design*, I've no reason to doubt that kdbus works. But that's not the point of my concern. I worry how usable this API is going to be for the world at large. > Yes, the documentation can always be better, but please don't confuse > the lack of understanding how D-Bus works and its model with the lack of > understanding this kdbus implementation, the two are not comparable. > For some good primers on what D-Bus is, and the terminology it expects > see: > http://dbus.freedesktop.org/doc/dbus-tutorial.html > and also: > http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc > > We are not going to put a basic "here is what D-Bus is and how to use > it" into the kernel tree, that is totally outside the scope here. I didn't expect that you should do that. But it does touch on a general question that I'll leave to the end of this mail. > I suggest reading the tutorial above, and then going back and reading > the kdbus documentation provided. If you think we are lacking stuff on > the kdbus side, we will be glad to flush out any needed areas. > > Also, Daniel has said he will work on a basic userspace
Re: [PATCH 01/13] kdbus: add documentation
Hi Greg, First of all, I seem to have offended you. That was not my intention. It's certainly not my intent to disparage you or your work (or for that matter, the other kdbus developers). Insofar as any of the wordings I've used suggested otherwise, I do apologize. I'll comment on various points below, keeping it as technical as I can. Then I have a couple of general questions at the end with the goal of ensuring that my comments are not coming from a broken world view. On 01/23/2015 04:54 PM, Greg Kroah-Hartman wrote: On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote: And that process seems to be frequent and ongoing even now. (And it's to your great credit that the API/ABI breaks are clearly and honestly marked in the kdbus.h changelog.) All of this lightens the burden of API design for kernel developers, but I'm concerned that the long-term pain for user-space developers who use an API which (in my estimation) may come to be widely used will be enormous. Yes, we've jointly reviewed the API details again until just recently to unify structs and enums etc, and added fields to make the ioctls structs more versatile for possible future additions. By that, we effectively broke the ABI, but we did that because we know we can't do such things again in the future. But again - I don't see how this would be different when using syscalls rather than ioctls to transport information between the driver and userspace. Could you elaborate? My suspicion is that not nearly enough thinking has yet been done about the design of the API. That's based on these observations: * Documentation that is, considering the size of the API, *way* too thin. * Some parts of the API not documented at all (various kdbus_item blobs) * ABI changes happening even quite recently * API oddities such as the 'kernel_flags' fields. Why do I need to be told what flags the kernel supports on *every* operation? The above is just after a day of looking hard at kdbus.txt. I strongly suspect I'd find a lot of other issues if I spent more time on kdbus. not enough thinking? We started working on kdbus 2 years ago this FOSDEM (in a few weeks.) Before that we have been thinking about this for many years, learning from the previous attempts to get this type of feature merged into the kernel, talking with users about what they need for this, and soliciting kernel developer's opinions on what type of API would be best for this type of feature. Since then we have done nothing but constantly revise the API. My first mock ups were way too simple, and in discussing things with people much more knowledgeable about D-Bus, they pointed out the problems, and we iterated. And iterated. And iterated some more. We have worked with just about every userspace libdbus developer group, including QtDbus developers as well as glib developers. Now not all of them agreed with some of our decisions in the implementation, which is fair enough, you can't please everyone, but they _all_ agree that what we have now is the proper way to implement this type of functionality and have reviewed the features as being correct and compatible with their needs and users. Those discussions have happened in emails, presentations, meetings, and hackfests pretty much continuously for the past 2 years all around the world. We have stress-tested the api with both unit tests (which are included here in the patch set) as well as a real-world implementation (sd-bus in the systemd source repo.) That real-world implementation successfully has been booting many of our daily machines for many months now. Notwithstanding that I don't see how a unit test stress tests an API *design*, I've no reason to doubt that kdbus works. But that's not the point of my concern. I worry how usable this API is going to be for the world at large. Yes, the documentation can always be better, but please don't confuse the lack of understanding how D-Bus works and its model with the lack of understanding this kdbus implementation, the two are not comparable. For some good primers on what D-Bus is, and the terminology it expects see: http://dbus.freedesktop.org/doc/dbus-tutorial.html and also: http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc We are not going to put a basic here is what D-Bus is and how to use it into the kernel tree, that is totally outside the scope here. I didn't expect that you should do that. But it does touch on a general question that I'll leave to the end of this mail. I suggest reading the tutorial above, and then going back and reading the kdbus documentation provided. If you think we are lacking stuff on the kdbus side, we will be glad to flush out any needed areas. Also, Daniel has said he will work on a basic userspace example library to show how to use this api, which might make the api a bit easier to understand. However, I personally don't
Re: [PATCH 01/13] kdbus: add documentation
On 01/26/2015 04:26 PM, Tom Gundersen wrote: Hi Michael, On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: 2. Is the API to be invoked directly by applications or is intended to be used only behind specific libraries? You seem to be saying that the latter is the case (here, I'm referring to your comment above about sd-bus). However, when I asked David Herrmann a similar question I got this responser: kdbus is in no way bound to systemd. There are ongoing efforts to port glib and qt to kdbus natively. The API is pretty simple and I don't see how a libkdbus would simplify things. In fact, even our tests only have slim wrappers around the ioctls to simplify error-handling in test-scenarios. To me, that implies that users will employ the raw kernel API. The way I read this is that there will (probably) be a handful of users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt, ell, and maybe a few others. However, third-party developers will not know/care about the details of kdbus, they'll just be coding against the dbus libraries as before (might be minor changes, but they certainly won't need to know anything about the kernel API). Similarly to how userspace developers now code against their libc of choice, rather than use kernel syscalls directly. Thanks, Tom, for the input. I'm still confused though, since elsewhere in this thread David Herrmann said in response to a question of mine: I think we can agree that we want it to be generically useful, like other ipc mechanisms, including UDS and netlink. Again, that sounds to me like the vision is not a handful of users. Hopefully Greg and David can clarify. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael, On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: 2. Is the API to be invoked directly by applications or is intended to be used only behind specific libraries? You seem to be saying that the latter is the case (here, I'm referring to your comment above about sd-bus). However, when I asked David Herrmann a similar question I got this responser: kdbus is in no way bound to systemd. There are ongoing efforts to port glib and qt to kdbus natively. The API is pretty simple and I don't see how a libkdbus would simplify things. In fact, even our tests only have slim wrappers around the ioctls to simplify error-handling in test-scenarios. To me, that implies that users will employ the raw kernel API. The way I read this is that there will (probably) be a handful of users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt, ell, and maybe a few others. However, third-party developers will not know/care about the details of kdbus, they'll just be coding against the dbus libraries as before (might be minor changes, but they certainly won't need to know anything about the kernel API). Similarly to how userspace developers now code against their libc of choice, rather than use kernel syscalls directly. HTH, Tom -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Mon, Jan 26, 2015 at 04:26:53PM +0100, Tom Gundersen wrote: The way I read this is that there will (probably) be a handful of users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt, ell, and maybe a few others. However, third-party developers will not know/care about the details of kdbus, they'll just be coding against the dbus libraries as before (might be minor changes, but they certainly won't need to know anything about the kernel API). Similarly to how userspace developers now code against their libc of choice, rather than use kernel syscalls directly. Which means we do need proper man pages and detailed documentation for it, just like syscalls for syscalls which just happened to be used by a few libcs. I suspect it really should be implemented as syscalls anyway, but we can leave that argument aside from now. Good documentation certainly helps with making that decision in an educated way. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hello Greg, On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote: On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote: While I agree that there should be a way for userspace to get the list of supported operations, userspace apps will only actually care about that once, when they begin talking to kdbus, because (ignoring the live kernel patching that people have been working on recently) the list of supported operations isn't going to change while the system is running. While a u64 copy has relatively low overhead, it does have overhead, and that is very significant when you consider part of the reason some people want kdbus is for the performance gain. Especially for those automotive applications that have been mentioned which fire off thousands of messages during start-up, every little bit of performance is significant. A single u64 in a structure is not going to be measurable at all, processors just copy memory too fast these days for 4 extra bytes to be noticable. It depends on the definition of measurable, I suppose, but this statement appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back sets of valid flags. That's 16 bytes, and it definitely makes a difference. Simply running a loop that does a naive memcpy() in a tight user-space loop (code below), I see the following for the execution of 1e9 loops: Including the two extra u64 fields: 3.2 sec Without the two extra u64 fields: 2.6 sec On the same box, doing 1e9 calls to getppid() (i.e., pretty much the simplest syscall, giving us a rough measure of the context switch) takes 68 seconds. In other words, the cost of copying those 16 bytes is about 1% of the base context switch/syscall cost. I assume the costs of copying those 16 bytes across the kernel-user-space boundary would not be cheaper, but have not tested that. If my assumption is correct, then 1% seems a significant figure to me in an API whose raison d'ĂŞtre is speed. So let's make this as easy as possible for userspace, making it simpler logic there, which is much more important than saving theoretical time in the kernel. But this also missed the other part of the point. Copying these fields on every operation, when in fact they are only needed once, clutters the API, in my opinion. Good APIs are as simple as they can be to do their job. Redundancy is an enemy of simplicity. Simplest would have been a one time API that returns a structure containing all of the supported flags across the API. Alternatively, the traditional EINVAL approach is well understood, and suffices. Thanks, Michael = #include stdint.h #include unistd.h #include stdio.h #include stdlib.h #include string.h struct kdbus_msg_info { uint64_t offset; uint64_t msg_size; uint64_t return_flags; }; struct kdbus_cmd_send { uint64_t size; uint64_t flags; #if FIELDS = 1 uint64_t kernel_flags; #endif #if FIELDS = 2 uint64_t kernel_msg_flags; #endif uint64_t return_flags; uint64_t msg_address; struct kdbus_msg_info reply; //struct kdbus_item items[0]; } __attribute__((aligned(8))); int main(int argc, char *argv[]) { long nloops, j; struct kdbus_cmd_send src, dst; memset(dst, 0, sizeof(struct kdbus_cmd_send)); printf(struct size: %zd\n, sizeof(struct kdbus_cmd_send)); nloops = (argc 1) ? atol(argv[1]) : 10; for (j = 0; j nloops; j++) { memcpy(dst, src, sizeof(struct kdbus_cmd_send)); } exit(EXIT_SUCCESS); } -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote: > On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote: > > On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote: > > > From: Daniel Mack > > > > > > kdbus is a system for low-latency, low-overhead, easy to use > > > interprocess communication (IPC). > > > > > > The interface to all functions in this driver is implemented via ioctls > > > on files exposed through a filesystem called 'kdbusfs'. The default > > > mount point of kdbusfs is /sys/fs/kdbus. > > > > Pardon my ignorance, but we've always been told that adding > > new ioctl()s to the kernel is a very big no-no. But given > > the seniority of the folks stewarding this kdbus effort, > > there must be a good rationale ;-) > > > > So, can the rationale behind introducing new ioctl()s be > > further explained? It would be even better if it's included > > in the documentation patch itself. > > The main reason to use an ioctl is that you want to atomically set > and/or get something "complex" through the user/kernel boundary. For > simple device attributes, sysfs works great, for configuring devices, > configfs works great, but for data streams / structures / etc. an ioctl > is the correct thing to use. > > Examples of new ioctls being added to the kernel are all over the > place, look at all of the special-purpose ioctls the filesystems keep > creating (they aren't adding new syscalls), look at the monstrosity that > is the DRM layer, look at other complex things like openvswitch, or > "simpler" device-specific interfaces like the MEI one, or even more > complex ones like the MMC interface. These are all valid uses of ioctls > as they are device/filesystem specific ways to interact with the kernel. > > The thing is, almost no one pays attention to these new ioctls as they > are domain-specific interfaces, with open userspace programs talking to > them, and they work well. ioctl is a powerful and useful interface, and > if we were to suddenly require no new ioctls, and require everything to > be a syscall, we would do nothing except make apis more complex (hint, > you now have to do extra validation on your file descriptor passed to > you to determine if it really is what you can properly operate your > ioctl on), and cause no real benefit at all. > > Yes, people abuse ioctls at times, but really, they are needed. > > And remember, I was one of the people who long ago thought we should not > be adding more ioctls, but I have seen the folly of my ways, and chalk > it up to youthful ignorance :) > Exactly, and that's why I wondered about the sudden change of heart ;-) Thanks for taking the time to write this. Regards, Darwish -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote: On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote: On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote: From: Daniel Mack dan...@zonque.org kdbus is a system for low-latency, low-overhead, easy to use interprocess communication (IPC). The interface to all functions in this driver is implemented via ioctls on files exposed through a filesystem called 'kdbusfs'. The default mount point of kdbusfs is /sys/fs/kdbus. Pardon my ignorance, but we've always been told that adding new ioctl()s to the kernel is a very big no-no. But given the seniority of the folks stewarding this kdbus effort, there must be a good rationale ;-) So, can the rationale behind introducing new ioctl()s be further explained? It would be even better if it's included in the documentation patch itself. The main reason to use an ioctl is that you want to atomically set and/or get something complex through the user/kernel boundary. For simple device attributes, sysfs works great, for configuring devices, configfs works great, but for data streams / structures / etc. an ioctl is the correct thing to use. Examples of new ioctls being added to the kernel are all over the place, look at all of the special-purpose ioctls the filesystems keep creating (they aren't adding new syscalls), look at the monstrosity that is the DRM layer, look at other complex things like openvswitch, or simpler device-specific interfaces like the MEI one, or even more complex ones like the MMC interface. These are all valid uses of ioctls as they are device/filesystem specific ways to interact with the kernel. The thing is, almost no one pays attention to these new ioctls as they are domain-specific interfaces, with open userspace programs talking to them, and they work well. ioctl is a powerful and useful interface, and if we were to suddenly require no new ioctls, and require everything to be a syscall, we would do nothing except make apis more complex (hint, you now have to do extra validation on your file descriptor passed to you to determine if it really is what you can properly operate your ioctl on), and cause no real benefit at all. Yes, people abuse ioctls at times, but really, they are needed. And remember, I was one of the people who long ago thought we should not be adding more ioctls, but I have seen the folly of my ways, and chalk it up to youthful ignorance :) Exactly, and that's why I wondered about the sudden change of heart ;-) Thanks for taking the time to write this. Regards, Darwish -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote: > While I agree that there should be a way for userspace to get the list of > supported operations, userspace apps will only actually care about that > once, when they begin talking to kdbus, because (ignoring the live kernel > patching that people have been working on recently) the list of supported > operations isn't going to change while the system is running. While a u64 > copy has relatively low overhead, it does have overhead, and that is very > significant when you consider part of the reason some people want kdbus is > for the performance gain. Especially for those automotive applications that > have been mentioned which fire off thousands of messages during start-up, > every little bit of performance is significant. A single u64 in a structure is not going to be measurable at all, processors just copy memory too fast these days for 4 extra bytes to be noticable. So let's make this as easy as possible for userspace, making it simpler logic there, which is much more important than saving theoretical time in the kernel. thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote: > >> And that process seems to be frequent and ongoing even now. (And > >> it's to your great credit that the API/ABI breaks are clearly and honestly > >> marked in the kdbus.h changelog.) All of this lightens the burden of API > >> design for kernel developers, but I'm concerned that the long-term pain > >> for user-space developers who use an API which (in my estimation) may > >> come to be widely used will be enormous. > > > > Yes, we've jointly reviewed the API details again until just recently to > > unify structs and enums etc, and added fields to make the ioctls structs > > more versatile for possible future additions. By that, we effectively > > broke the ABI, but we did that because we know we can't do such things > > again in the future. > > > > But again - I don't see how this would be different when using syscalls > > rather than ioctls to transport information between the driver and > > userspace. Could you elaborate? > > My suspicion is that not nearly enough thinking has yet been done about > the design of the API. That's based on these observations: > > * Documentation that is, considering the size of the API, *way* too thin. > * Some parts of the API not documented at all (various kdbus_item blobs) > * ABI changes happening even quite recently > * API oddities such as the 'kernel_flags' fields. Why do I need to > be told what flags the kernel supports on *every* operation? > > The above is just after a day of looking hard at kdbus.txt. I strongly > suspect I'd find a lot of other issues if I spent more time on kdbus. "not enough thinking"? We started working on kdbus 2 years ago this FOSDEM (in a few weeks.) Before that we have been thinking about this for many years, learning from the previous attempts to get this type of feature merged into the kernel, talking with users about what they need for this, and soliciting kernel developer's opinions on what type of API would be best for this type of feature. Since then we have done nothing but constantly revise the API. My first mock ups were way too simple, and in discussing things with people much more knowledgeable about D-Bus, they pointed out the problems, and we iterated. And iterated. And iterated some more. We have worked with just about every userspace libdbus developer group, including QtDbus developers as well as glib developers. Now not all of them agreed with some of our decisions in the implementation, which is fair enough, you can't please everyone, but they _all_ agree that what we have now is the proper way to implement this type of functionality and have reviewed the features as being correct and compatible with their needs and users. Those discussions have happened in emails, presentations, meetings, and hackfests pretty much continuously for the past 2 years all around the world. We have stress-tested the api with both unit tests (which are included here in the patch set) as well as a real-world implementation (sd-bus in the systemd source repo.) That real-world implementation successfully has been booting many of our daily machines for many months now. Yes, the documentation can always be better, but please don't confuse the lack of understanding how D-Bus works and its model with the lack of understanding this kdbus implementation, the two are not comparable. For some good primers on what D-Bus is, and the terminology it expects see: http://dbus.freedesktop.org/doc/dbus-tutorial.html and also: http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc We are not going to put a basic "here is what D-Bus is and how to use it" into the kernel tree, that is totally outside the scope here. I suggest reading the tutorial above, and then going back and reading the kdbus documentation provided. If you think we are lacking stuff on the kdbus side, we will be glad to flush out any needed areas. Also, Daniel has said he will work on a basic userspace "example" library to show how to use this api, which might make the api a bit easier to understand. However, I personally don't think this "example code" is necessary at all. We don't ask for this type of "simple examples" from other new kernel apis we create and add to the kernel all the time. We require there to be a user of the api, but not one that is so well documented that others can write a from-scratch raw userspace replacement. Specific examples of this are my previously mentioned ioctl users (btrfs, mei, mic, openvlan, etc.), and the grand-daddy of all horrid apis, DRM. Users aren't going to be writing their own "raw kdbus" libraries. Or if they are, they are going to start with one of the existing implementations we have (the test examples and sd-bus, and I think there is a native Go implementation somewhere as well.) Users are going to be using those libraries to write their code, and to be honest, the user api for sd-bus is a delight to use compared to
Re: [PATCH 01/13] kdbus: add documentation
On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote: > On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote: > > On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote: > > > From: Daniel Mack > > > > > > kdbus is a system for low-latency, low-overhead, easy to use > > > interprocess communication (IPC). > > > > > > The interface to all functions in this driver is implemented via ioctls > > > on files exposed through a filesystem called 'kdbusfs'. The default > > > mount point of kdbusfs is /sys/fs/kdbus. > > > > Pardon my ignorance, but we've always been told that adding > > new ioctl()s to the kernel is a very big no-no. But given > > the seniority of the folks stewarding this kdbus effort, > > there must be a good rationale ;-) > > > > So, can the rationale behind introducing new ioctl()s be > > further explained? It would be even better if it's included > > in the documentation patch itself. > > The main reason to use an ioctl is that you want to atomically set > and/or get something "complex" through the user/kernel boundary. For > simple device attributes, sysfs works great, for configuring devices, > configfs works great, but for data streams / structures / etc. an ioctl > is the correct thing to use. > > Examples of new ioctls being added to the kernel are all over the > place, look at all of the special-purpose ioctls the filesystems keep > creating (they aren't adding new syscalls), look at the monstrosity that > is the DRM layer, look at other complex things like openvswitch, or > "simpler" device-specific interfaces like the MEI one, or even more > complex ones like the MMC interface. Oops, I meant, MIC, not MMC, sorry about that. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote: > On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote: > > From: Daniel Mack > > > > kdbus is a system for low-latency, low-overhead, easy to use > > interprocess communication (IPC). > > > > The interface to all functions in this driver is implemented via ioctls > > on files exposed through a filesystem called 'kdbusfs'. The default > > mount point of kdbusfs is /sys/fs/kdbus. > > Pardon my ignorance, but we've always been told that adding > new ioctl()s to the kernel is a very big no-no. But given > the seniority of the folks stewarding this kdbus effort, > there must be a good rationale ;-) > > So, can the rationale behind introducing new ioctl()s be > further explained? It would be even better if it's included > in the documentation patch itself. The main reason to use an ioctl is that you want to atomically set and/or get something "complex" through the user/kernel boundary. For simple device attributes, sysfs works great, for configuring devices, configfs works great, but for data streams / structures / etc. an ioctl is the correct thing to use. Examples of new ioctls being added to the kernel are all over the place, look at all of the special-purpose ioctls the filesystems keep creating (they aren't adding new syscalls), look at the monstrosity that is the DRM layer, look at other complex things like openvswitch, or "simpler" device-specific interfaces like the MEI one, or even more complex ones like the MMC interface. These are all valid uses of ioctls as they are device/filesystem specific ways to interact with the kernel. The thing is, almost no one pays attention to these new ioctls as they are domain-specific interfaces, with open userspace programs talking to them, and they work well. ioctl is a powerful and useful interface, and if we were to suddenly require no new ioctls, and require everything to be a syscall, we would do nothing except make apis more complex (hint, you now have to do extra validation on your file descriptor passed to you to determine if it really is what you can properly operate your ioctl on), and cause no real benefit at all. Yes, people abuse ioctls at times, but really, they are needed. And remember, I was one of the people who long ago thought we should not be adding more ioctls, but I have seen the folly of my ways, and chalk it up to youthful ignorance :) Hope this helps explain things better, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi David, On 01/22/2015 02:46 PM, David Herrmann wrote: > Hi Michael > > On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages) > wrote: >> On 01/21/2015 05:58 PM, Daniel Mack wrote: > Also, the context the kdbus commands operate on originate from a > mountable special-purpose file system. It's not clear to me how this point implies any particular API design choice. >>> >>> It emphasizes the fact that our ioctl API can only be used with the >>> nodes exposed by kdbusfs, and vice versa. I think operations on >>> driver-specific files do not justify a new 'generic' syscall API. >> >> I see your (repeated) use of words like "driver" as just a distraction, >> I'm sorry. "Driver-specific files" is an implementation detail. The >> point is that kdbus provides a complex, general-purpose feature for all >> of userland. It is core infrastructure that will be used by key pieces >> of the plumbing layer, and likely by many other applications as well. >> *Please* stop suggesting that it is not core infrastructure: kdbus >> has the potential to be a great IPC that will be very useful to many, >> many user-space developers. > > We called it an 'ipc driver' so far. It is in no way meant as > distraction. Feel free to call it 'core infrastructure'. I think we > can agree that we want it to be generically useful, like other ipc > mechanisms, including UDS and netlink. Yes. >> (By the way, we have precedents for device/filesystem-specific system >> calls. Even a recent one, in memfd_create().) > > memfd_create() is in no way file-system specific. Internally, it uses > shmem, but that's an implementation detail. The API does not expose > this in any way. If you were referring to sealing, it's implemented as > fcntl(), not as a separate syscall. Furthermore, sealing is only > limited to shmem as it's the only volatile storage. It's not an API > restriction. Other volatile file-systems are free to implement > sealing. My bad. I mispoke there. ioctl() is a get-out-of-jail free card when it comes to API design. >>> >>> And how are syscalls different in that regard when they would both >>> transport the same data structures? >> >> My general impression is that when it comes to system calls, >> there's usually a lot more up front thought about API design. > > This is no technical reason why to use syscalls over ioctls. Imho, > it's rather a reason to improve the kernel's ioctl-review process. Agreed it's not a technical reason. But the distinction I make does capture how things usually work. [...] Rather than thinking carefully and long about a set of coherent, stable APIs that provide some degree of type-safety to user-space, one just adds/changes/removes an ioctl. >>> >>> Adding another ioctl in the future for cases we didn't think about >>> earlier would of course be considered a workaround; and even though such >>> things happen all the time, it's certainly something we'd like to avoid. >>> >>> However, we would also need to add another syscall in such cases, which >>> is even worse IMO. So that's really not an argument against ioctls after >>> all. What fundamental difference between a raw syscall and a ioctl, >>> especially with regards to type safety, is there which would help us here? >> >> Type safety helps user-space application developers. System calls have >> it, ioctl() does not. > > This is simply not true. There is no type-safety in: > syscall(__NR_xyz, some, random, arguments) > > The only way a syscall gets 'type-safe', is to provide a wrapper > function. Same applies to ioctls. But people tend to not do that for > ioctls, which is, again, not a technical argument against ioctls. It's > a matter of psychology, though. Yes, I see I wasn't quite clear enough. I should have said: Type safety helps user-space application developers. *As typically provided to user-space via libc wrappers*, system calls have it, ioctl() does not. And system call wrappers are generally provided pretty much automatically by libcs (at least, mostly, there's some to-ing and fro-ing about this in glibc-land these days as you are aware from the memfd_create() story; http://thread.gmane.org/gmane.comp.lib.glibc.alpha/45884/focus=46937). So, in practice user-space programmers typically automatically get type safety with system calls, but not with ioctl(). > I still don't see a technical reason to use syscalls. API proposals welcome! > We're now working on a small kdbus helper library, which provides > type-safe ioctl wrappers, item-iterators and documented examples. But, > like syscalls, nobody is forced to use the wrappers. The API design is > not affected by this. > And that process seems to be frequent and ongoing even now. (And it's to your great credit that the API/ABI breaks are clearly and honestly marked in the kdbus.h changelog.) All of this lightens the burden of API design for kernel developers, but I'm concerned
Re: [PATCH 01/13] kdbus: add documentation
On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote: On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote: On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote: From: Daniel Mack dan...@zonque.org kdbus is a system for low-latency, low-overhead, easy to use interprocess communication (IPC). The interface to all functions in this driver is implemented via ioctls on files exposed through a filesystem called 'kdbusfs'. The default mount point of kdbusfs is /sys/fs/kdbus. Pardon my ignorance, but we've always been told that adding new ioctl()s to the kernel is a very big no-no. But given the seniority of the folks stewarding this kdbus effort, there must be a good rationale ;-) So, can the rationale behind introducing new ioctl()s be further explained? It would be even better if it's included in the documentation patch itself. The main reason to use an ioctl is that you want to atomically set and/or get something complex through the user/kernel boundary. For simple device attributes, sysfs works great, for configuring devices, configfs works great, but for data streams / structures / etc. an ioctl is the correct thing to use. Examples of new ioctls being added to the kernel are all over the place, look at all of the special-purpose ioctls the filesystems keep creating (they aren't adding new syscalls), look at the monstrosity that is the DRM layer, look at other complex things like openvswitch, or simpler device-specific interfaces like the MEI one, or even more complex ones like the MMC interface. Oops, I meant, MIC, not MMC, sorry about that. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi David, On 01/22/2015 02:46 PM, David Herrmann wrote: Hi Michael On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: On 01/21/2015 05:58 PM, Daniel Mack wrote: Also, the context the kdbus commands operate on originate from a mountable special-purpose file system. It's not clear to me how this point implies any particular API design choice. It emphasizes the fact that our ioctl API can only be used with the nodes exposed by kdbusfs, and vice versa. I think operations on driver-specific files do not justify a new 'generic' syscall API. I see your (repeated) use of words like driver as just a distraction, I'm sorry. Driver-specific files is an implementation detail. The point is that kdbus provides a complex, general-purpose feature for all of userland. It is core infrastructure that will be used by key pieces of the plumbing layer, and likely by many other applications as well. *Please* stop suggesting that it is not core infrastructure: kdbus has the potential to be a great IPC that will be very useful to many, many user-space developers. We called it an 'ipc driver' so far. It is in no way meant as distraction. Feel free to call it 'core infrastructure'. I think we can agree that we want it to be generically useful, like other ipc mechanisms, including UDS and netlink. Yes. (By the way, we have precedents for device/filesystem-specific system calls. Even a recent one, in memfd_create().) memfd_create() is in no way file-system specific. Internally, it uses shmem, but that's an implementation detail. The API does not expose this in any way. If you were referring to sealing, it's implemented as fcntl(), not as a separate syscall. Furthermore, sealing is only limited to shmem as it's the only volatile storage. It's not an API restriction. Other volatile file-systems are free to implement sealing. My bad. I mispoke there. ioctl() is a get-out-of-jail free card when it comes to API design. And how are syscalls different in that regard when they would both transport the same data structures? My general impression is that when it comes to system calls, there's usually a lot more up front thought about API design. This is no technical reason why to use syscalls over ioctls. Imho, it's rather a reason to improve the kernel's ioctl-review process. Agreed it's not a technical reason. But the distinction I make does capture how things usually work. [...] Rather than thinking carefully and long about a set of coherent, stable APIs that provide some degree of type-safety to user-space, one just adds/changes/removes an ioctl. Adding another ioctl in the future for cases we didn't think about earlier would of course be considered a workaround; and even though such things happen all the time, it's certainly something we'd like to avoid. However, we would also need to add another syscall in such cases, which is even worse IMO. So that's really not an argument against ioctls after all. What fundamental difference between a raw syscall and a ioctl, especially with regards to type safety, is there which would help us here? Type safety helps user-space application developers. System calls have it, ioctl() does not. This is simply not true. There is no type-safety in: syscall(__NR_xyz, some, random, arguments) The only way a syscall gets 'type-safe', is to provide a wrapper function. Same applies to ioctls. But people tend to not do that for ioctls, which is, again, not a technical argument against ioctls. It's a matter of psychology, though. Yes, I see I wasn't quite clear enough. I should have said: Type safety helps user-space application developers. *As typically provided to user-space via libc wrappers*, system calls have it, ioctl() does not. And system call wrappers are generally provided pretty much automatically by libcs (at least, mostly, there's some to-ing and fro-ing about this in glibc-land these days as you are aware from the memfd_create() story; http://thread.gmane.org/gmane.comp.lib.glibc.alpha/45884/focus=46937). So, in practice user-space programmers typically automatically get type safety with system calls, but not with ioctl(). I still don't see a technical reason to use syscalls. API proposals welcome! We're now working on a small kdbus helper library, which provides type-safe ioctl wrappers, item-iterators and documented examples. But, like syscalls, nobody is forced to use the wrappers. The API design is not affected by this. And that process seems to be frequent and ongoing even now. (And it's to your great credit that the API/ABI breaks are clearly and honestly marked in the kdbus.h changelog.) All of this lightens the burden of API design for kernel developers, but I'm concerned that the long-term pain for user-space developers who use an API which (in my estimation) may come to be widely used will be enormous. Yes, we've jointly
Re: [PATCH 01/13] kdbus: add documentation
On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote: On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote: From: Daniel Mack dan...@zonque.org kdbus is a system for low-latency, low-overhead, easy to use interprocess communication (IPC). The interface to all functions in this driver is implemented via ioctls on files exposed through a filesystem called 'kdbusfs'. The default mount point of kdbusfs is /sys/fs/kdbus. Pardon my ignorance, but we've always been told that adding new ioctl()s to the kernel is a very big no-no. But given the seniority of the folks stewarding this kdbus effort, there must be a good rationale ;-) So, can the rationale behind introducing new ioctl()s be further explained? It would be even better if it's included in the documentation patch itself. The main reason to use an ioctl is that you want to atomically set and/or get something complex through the user/kernel boundary. For simple device attributes, sysfs works great, for configuring devices, configfs works great, but for data streams / structures / etc. an ioctl is the correct thing to use. Examples of new ioctls being added to the kernel are all over the place, look at all of the special-purpose ioctls the filesystems keep creating (they aren't adding new syscalls), look at the monstrosity that is the DRM layer, look at other complex things like openvswitch, or simpler device-specific interfaces like the MEI one, or even more complex ones like the MMC interface. These are all valid uses of ioctls as they are device/filesystem specific ways to interact with the kernel. The thing is, almost no one pays attention to these new ioctls as they are domain-specific interfaces, with open userspace programs talking to them, and they work well. ioctl is a powerful and useful interface, and if we were to suddenly require no new ioctls, and require everything to be a syscall, we would do nothing except make apis more complex (hint, you now have to do extra validation on your file descriptor passed to you to determine if it really is what you can properly operate your ioctl on), and cause no real benefit at all. Yes, people abuse ioctls at times, but really, they are needed. And remember, I was one of the people who long ago thought we should not be adding more ioctls, but I have seen the folly of my ways, and chalk it up to youthful ignorance :) Hope this helps explain things better, greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote: While I agree that there should be a way for userspace to get the list of supported operations, userspace apps will only actually care about that once, when they begin talking to kdbus, because (ignoring the live kernel patching that people have been working on recently) the list of supported operations isn't going to change while the system is running. While a u64 copy has relatively low overhead, it does have overhead, and that is very significant when you consider part of the reason some people want kdbus is for the performance gain. Especially for those automotive applications that have been mentioned which fire off thousands of messages during start-up, every little bit of performance is significant. A single u64 in a structure is not going to be measurable at all, processors just copy memory too fast these days for 4 extra bytes to be noticable. So let's make this as easy as possible for userspace, making it simpler logic there, which is much more important than saving theoretical time in the kernel. thanks, greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote: And that process seems to be frequent and ongoing even now. (And it's to your great credit that the API/ABI breaks are clearly and honestly marked in the kdbus.h changelog.) All of this lightens the burden of API design for kernel developers, but I'm concerned that the long-term pain for user-space developers who use an API which (in my estimation) may come to be widely used will be enormous. Yes, we've jointly reviewed the API details again until just recently to unify structs and enums etc, and added fields to make the ioctls structs more versatile for possible future additions. By that, we effectively broke the ABI, but we did that because we know we can't do such things again in the future. But again - I don't see how this would be different when using syscalls rather than ioctls to transport information between the driver and userspace. Could you elaborate? My suspicion is that not nearly enough thinking has yet been done about the design of the API. That's based on these observations: * Documentation that is, considering the size of the API, *way* too thin. * Some parts of the API not documented at all (various kdbus_item blobs) * ABI changes happening even quite recently * API oddities such as the 'kernel_flags' fields. Why do I need to be told what flags the kernel supports on *every* operation? The above is just after a day of looking hard at kdbus.txt. I strongly suspect I'd find a lot of other issues if I spent more time on kdbus. not enough thinking? We started working on kdbus 2 years ago this FOSDEM (in a few weeks.) Before that we have been thinking about this for many years, learning from the previous attempts to get this type of feature merged into the kernel, talking with users about what they need for this, and soliciting kernel developer's opinions on what type of API would be best for this type of feature. Since then we have done nothing but constantly revise the API. My first mock ups were way too simple, and in discussing things with people much more knowledgeable about D-Bus, they pointed out the problems, and we iterated. And iterated. And iterated some more. We have worked with just about every userspace libdbus developer group, including QtDbus developers as well as glib developers. Now not all of them agreed with some of our decisions in the implementation, which is fair enough, you can't please everyone, but they _all_ agree that what we have now is the proper way to implement this type of functionality and have reviewed the features as being correct and compatible with their needs and users. Those discussions have happened in emails, presentations, meetings, and hackfests pretty much continuously for the past 2 years all around the world. We have stress-tested the api with both unit tests (which are included here in the patch set) as well as a real-world implementation (sd-bus in the systemd source repo.) That real-world implementation successfully has been booting many of our daily machines for many months now. Yes, the documentation can always be better, but please don't confuse the lack of understanding how D-Bus works and its model with the lack of understanding this kdbus implementation, the two are not comparable. For some good primers on what D-Bus is, and the terminology it expects see: http://dbus.freedesktop.org/doc/dbus-tutorial.html and also: http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc We are not going to put a basic here is what D-Bus is and how to use it into the kernel tree, that is totally outside the scope here. I suggest reading the tutorial above, and then going back and reading the kdbus documentation provided. If you think we are lacking stuff on the kdbus side, we will be glad to flush out any needed areas. Also, Daniel has said he will work on a basic userspace example library to show how to use this api, which might make the api a bit easier to understand. However, I personally don't think this example code is necessary at all. We don't ask for this type of simple examples from other new kernel apis we create and add to the kernel all the time. We require there to be a user of the api, but not one that is so well documented that others can write a from-scratch raw userspace replacement. Specific examples of this are my previously mentioned ioctl users (btrfs, mei, mic, openvlan, etc.), and the grand-daddy of all horrid apis, DRM. Users aren't going to be writing their own raw kdbus libraries. Or if they are, they are going to start with one of the existing implementations we have (the test examples and sd-bus, and I think there is a native Go implementation somewhere as well.) Users are going to be using those libraries to write their code, and to be honest, the user api for sd-bus is a delight to use compared to the old style libdbus interface as we have the benefit of 10
Re: [PATCH 01/13] kdbus: add documentation
On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote: > From: Daniel Mack > > kdbus is a system for low-latency, low-overhead, easy to use > interprocess communication (IPC). > > The interface to all functions in this driver is implemented via ioctls > on files exposed through a filesystem called 'kdbusfs'. The default > mount point of kdbusfs is /sys/fs/kdbus. Pardon my ignorance, but we've always been told that adding new ioctl()s to the kernel is a very big no-no. But given the seniority of the folks stewarding this kdbus effort, there must be a good rationale ;-) So, can the rationale behind introducing new ioctl()s be further explained? It would be even better if it's included in the documentation patch itself. Thanks, Darwish -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On 2015-01-22 08:46, David Herrmann wrote: Hi Michael On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages) wrote: * API oddities such as the 'kernel_flags' fields. Why do I need to be told what flags the kernel supports on *every* operation? If we only returned EINVAL on invalid arguments, user-space had to probe for each flag to see whether it's supported. By returning the set of supported flags, user-space can cache those and _reliably_ know which flags are supported. We decided the overhead of a single u64 copy on each ioctl is preferred over a separate syscall/ioctl to query kernel flags. If you disagree, please elaborate (preferably with a suggestion how to do it better). While I agree that there should be a way for userspace to get the list of supported operations, userspace apps will only actually care about that once, when they begin talking to kdbus, because (ignoring the live kernel patching that people have been working on recently) the list of supported operations isn't going to change while the system is running. While a u64 copy has relatively low overhead, it does have overhead, and that is very significant when you consider part of the reason some people want kdbus is for the performance gain. Especially for those automotive applications that have been mentioned which fire off thousands of messages during start-up, every little bit of performance is significant. smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages) wrote: > On 01/21/2015 05:58 PM, Daniel Mack wrote: Also, the context the kdbus commands operate on originate from a mountable special-purpose file system. >>> >>> It's not clear to me how this point implies any particular API design >>> choice. >> >> It emphasizes the fact that our ioctl API can only be used with the >> nodes exposed by kdbusfs, and vice versa. I think operations on >> driver-specific files do not justify a new 'generic' syscall API. > > I see your (repeated) use of words like "driver" as just a distraction, > I'm sorry. "Driver-specific files" is an implementation detail. The > point is that kdbus provides a complex, general-purpose feature for all > of userland. It is core infrastructure that will be used by key pieces > of the plumbing layer, and likely by many other applications as well. > *Please* stop suggesting that it is not core infrastructure: kdbus > has the potential to be a great IPC that will be very useful to many, > many user-space developers. We called it an 'ipc driver' so far. It is in no way meant as distraction. Feel free to call it 'core infrastructure'. I think we can agree that we want it to be generically useful, like other ipc mechanisms, including UDS and netlink. > (By the way, we have precedents for device/filesystem-specific system > calls. Even a recent one, in memfd_create().) memfd_create() is in no way file-system specific. Internally, it uses shmem, but that's an implementation detail. The API does not expose this in any way. If you were referring to sealing, it's implemented as fcntl(), not as a separate syscall. Furthermore, sealing is only limited to shmem as it's the only volatile storage. It's not an API restriction. Other volatile file-systems are free to implement sealing. >>> ioctl() is a get-out-of-jail free card when it comes to API design. >> >> And how are syscalls different in that regard when they would both >> transport the same data structures? > > My general impression is that when it comes to system calls, > there's usually a lot more up front thought about API design. This is no technical reason why to use syscalls over ioctls. Imho, it's rather a reason to improve the kernel's ioctl-review process. >> Also note that all kdbus ioctls >> necessarily operate on a file descriptor context, which an ioctl passes >> along by default. > > I fail to see your point here. The same statement applies to multiple > special-purpose system calls (epoll, inotify, various shared memory APIs, > and so on). epoll and inotify don't have a 'parent' object living in the file-system. They *need* an entry-point. We can use open() for that. You're right, from a technical viewpoint, there's no restriction. There're examples for both (eg., see Solaris /dev/poll, as ioctl()-based 'epoll'). >>> Rather >>> than thinking carefully and long about a set of coherent, stable APIs that >>> provide some degree of type-safety to user-space, one just >>> adds/changes/removes >>> an ioctl. >> >> Adding another ioctl in the future for cases we didn't think about >> earlier would of course be considered a workaround; and even though such >> things happen all the time, it's certainly something we'd like to avoid. >> >> However, we would also need to add another syscall in such cases, which >> is even worse IMO. So that's really not an argument against ioctls after >> all. What fundamental difference between a raw syscall and a ioctl, >> especially with regards to type safety, is there which would help us here? > > Type safety helps user-space application developers. System calls have > it, ioctl() does not. This is simply not true. There is no type-safety in: syscall(__NR_xyz, some, random, arguments) The only way a syscall gets 'type-safe', is to provide a wrapper function. Same applies to ioctls. But people tend to not do that for ioctls, which is, again, not a technical argument against ioctls. It's a matter of psychology, though. I still don't see a technical reason to use syscalls. API proposals welcome! We're now working on a small kdbus helper library, which provides type-safe ioctl wrappers, item-iterators and documented examples. But, like syscalls, nobody is forced to use the wrappers. The API design is not affected by this. >>> And that process seems to be frequent and ongoing even now. (And >>> it's to your great credit that the API/ABI breaks are clearly and honestly >>> marked in the kdbus.h changelog.) All of this lightens the burden of API >>> design for kernel developers, but I'm concerned that the long-term pain >>> for user-space developers who use an API which (in my estimation) may >>> come to be widely used will be enormous. >> >> Yes, we've jointly reviewed the API details again until just recently to >> unify structs and enums etc, and added fields to make the ioctls structs >> more versatile for possible future additions. By that, we effectively >>
Re: [PATCH 01/13] kdbus: add documentation
Hi Daniel, On 01/21/2015 05:58 PM, Daniel Mack wrote: > Hi Michael, > > On 01/21/2015 11:32 AM, Michael Kerrisk (man-pages) wrote: >> On 01/20/2015 07:23 PM, Daniel Mack wrote: > >>> It's rather an optional driver than a core kernel feature. >> >> Given the various things that I've seen said about kdbus, the >> preceding sentence makes little sense to me: >> >> * kdbus will be the framework supporting user-space D-Bus in the >> future, and also used by systemd, and so on pretty much every >> desktop system. >> * kdbus solves much of the bandwidth problem of D-Bus1. That, >> along with a host of other features mean that there will be >> a lot of user-space developers interested in using this API. >> * Various parties in user space are already expressing strong >> interest in kdbus. >> >> My guess from the above? This will NOT be an "optional driver". >> It *will be* a core kernel feature. > > There will be userlands that will depend on kdbus, but that will still > not turn the "driver" into a core Linux kernel feature. We really want > it to be losely coupled from the rest of the kernel and optional after > all. The kernel people are working toward making more and more things > optional these days, and there will still be lots of systems that won't > be using kdbus. Make it optional and configurable if you want. But that misses my point. kdbus is very likely to become an essential, widely used piece of *general-purpose* piece of ABI infrastructure that will be configured into virtually every type of system. As such, the same standards should apply as for a "core kernel feature", whether or not you want to cal it that. >>> Also, the context the kdbus commands operate on originate from a >>> mountable special-purpose file system. >> >> It's not clear to me how this point implies any particular API design >> choice. > > It emphasizes the fact that our ioctl API can only be used with the > nodes exposed by kdbusfs, and vice versa. I think operations on > driver-specific files do not justify a new 'generic' syscall API. I see your (repeated) use of words like "driver" as just a distraction, I'm sorry. "Driver-specific files" is an implementation detail. The point is that kdbus provides a complex, general-purpose feature for all of userland. It is core infrastructure that will be used by key pieces of the plumbing layer, and likely by many other applications as well. *Please* stop suggesting that it is not core infrastructure: kdbus has the potential to be a great IPC that will be very useful to many, many user-space developers. (By the way, we have precedents for device/filesystem-specific system calls. Even a recent one, in memfd_create().) >> Notwithstanding the fact that there's a lot of (good) information in >> kdbus.txt, there's not nearly enough for someone to create useful, >> robust applications that use that API (and not enough that I as a >> reviewer feel comfortable about reviewing the API). As things stand, >> user-space developers will be forced to decipher large amounts of kernel >> code and existing applications in order to actually build things. And >> when they do, they'll be using one of the worst APIs known to man: ioctl(), >> an API that provides no type safety at all. > > I don't see how ioctls are any worse than syscalls with pointers to > structures. One can screw up compatibility either way. How is an ioctl > wrapper/prototype any less type-safe than a syscall wrapper? Taking that argument to the extreme, we would have no system calls at all, just one gigantic ioctl() ;-). >> ioctl() is a get-out-of-jail free card when it comes to API design. > > And how are syscalls different in that regard when they would both > transport the same data structures? My general impression is that when it comes to system calls, there's usually a lot more up front thought about API design. > Also note that all kdbus ioctls > necessarily operate on a file descriptor context, which an ioctl passes > along by default. I fail to see your point here. The same statement applies to multiple special-purpose system calls (epoll, inotify, various shared memory APIs, and so on). >> Rather >> than thinking carefully and long about a set of coherent, stable APIs that >> provide some degree of type-safety to user-space, one just >> adds/changes/removes >> an ioctl. > > Adding another ioctl in the future for cases we didn't think about > earlier would of course be considered a workaround; and even though such > things happen all the time, it's certainly something we'd like to avoid. > > However, we would also need to add another syscall in such cases, which > is even worse IMO. So that's really not an argument against ioctls after > all. What fundamental difference between a raw syscall and a ioctl, > especially with regards to type safety, is there which would help us here? Type safety helps user-space application developers. System calls have it, ioctl() does not. >> And that
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: On 01/21/2015 05:58 PM, Daniel Mack wrote: Also, the context the kdbus commands operate on originate from a mountable special-purpose file system. It's not clear to me how this point implies any particular API design choice. It emphasizes the fact that our ioctl API can only be used with the nodes exposed by kdbusfs, and vice versa. I think operations on driver-specific files do not justify a new 'generic' syscall API. I see your (repeated) use of words like driver as just a distraction, I'm sorry. Driver-specific files is an implementation detail. The point is that kdbus provides a complex, general-purpose feature for all of userland. It is core infrastructure that will be used by key pieces of the plumbing layer, and likely by many other applications as well. *Please* stop suggesting that it is not core infrastructure: kdbus has the potential to be a great IPC that will be very useful to many, many user-space developers. We called it an 'ipc driver' so far. It is in no way meant as distraction. Feel free to call it 'core infrastructure'. I think we can agree that we want it to be generically useful, like other ipc mechanisms, including UDS and netlink. (By the way, we have precedents for device/filesystem-specific system calls. Even a recent one, in memfd_create().) memfd_create() is in no way file-system specific. Internally, it uses shmem, but that's an implementation detail. The API does not expose this in any way. If you were referring to sealing, it's implemented as fcntl(), not as a separate syscall. Furthermore, sealing is only limited to shmem as it's the only volatile storage. It's not an API restriction. Other volatile file-systems are free to implement sealing. ioctl() is a get-out-of-jail free card when it comes to API design. And how are syscalls different in that regard when they would both transport the same data structures? My general impression is that when it comes to system calls, there's usually a lot more up front thought about API design. This is no technical reason why to use syscalls over ioctls. Imho, it's rather a reason to improve the kernel's ioctl-review process. Also note that all kdbus ioctls necessarily operate on a file descriptor context, which an ioctl passes along by default. I fail to see your point here. The same statement applies to multiple special-purpose system calls (epoll, inotify, various shared memory APIs, and so on). epoll and inotify don't have a 'parent' object living in the file-system. They *need* an entry-point. We can use open() for that. You're right, from a technical viewpoint, there's no restriction. There're examples for both (eg., see Solaris /dev/poll, as ioctl()-based 'epoll'). Rather than thinking carefully and long about a set of coherent, stable APIs that provide some degree of type-safety to user-space, one just adds/changes/removes an ioctl. Adding another ioctl in the future for cases we didn't think about earlier would of course be considered a workaround; and even though such things happen all the time, it's certainly something we'd like to avoid. However, we would also need to add another syscall in such cases, which is even worse IMO. So that's really not an argument against ioctls after all. What fundamental difference between a raw syscall and a ioctl, especially with regards to type safety, is there which would help us here? Type safety helps user-space application developers. System calls have it, ioctl() does not. This is simply not true. There is no type-safety in: syscall(__NR_xyz, some, random, arguments) The only way a syscall gets 'type-safe', is to provide a wrapper function. Same applies to ioctls. But people tend to not do that for ioctls, which is, again, not a technical argument against ioctls. It's a matter of psychology, though. I still don't see a technical reason to use syscalls. API proposals welcome! We're now working on a small kdbus helper library, which provides type-safe ioctl wrappers, item-iterators and documented examples. But, like syscalls, nobody is forced to use the wrappers. The API design is not affected by this. And that process seems to be frequent and ongoing even now. (And it's to your great credit that the API/ABI breaks are clearly and honestly marked in the kdbus.h changelog.) All of this lightens the burden of API design for kernel developers, but I'm concerned that the long-term pain for user-space developers who use an API which (in my estimation) may come to be widely used will be enormous. Yes, we've jointly reviewed the API details again until just recently to unify structs and enums etc, and added fields to make the ioctls structs more versatile for possible future additions. By that, we effectively broke the ABI, but we did that because we know we can't do such things again in the future. But again -
Re: [PATCH 01/13] kdbus: add documentation
Hi Daniel, On 01/21/2015 05:58 PM, Daniel Mack wrote: Hi Michael, On 01/21/2015 11:32 AM, Michael Kerrisk (man-pages) wrote: On 01/20/2015 07:23 PM, Daniel Mack wrote: It's rather an optional driver than a core kernel feature. Given the various things that I've seen said about kdbus, the preceding sentence makes little sense to me: * kdbus will be the framework supporting user-space D-Bus in the future, and also used by systemd, and so on pretty much every desktop system. * kdbus solves much of the bandwidth problem of D-Bus1. That, along with a host of other features mean that there will be a lot of user-space developers interested in using this API. * Various parties in user space are already expressing strong interest in kdbus. My guess from the above? This will NOT be an optional driver. It *will be* a core kernel feature. There will be userlands that will depend on kdbus, but that will still not turn the driver into a core Linux kernel feature. We really want it to be losely coupled from the rest of the kernel and optional after all. The kernel people are working toward making more and more things optional these days, and there will still be lots of systems that won't be using kdbus. Make it optional and configurable if you want. But that misses my point. kdbus is very likely to become an essential, widely used piece of *general-purpose* piece of ABI infrastructure that will be configured into virtually every type of system. As such, the same standards should apply as for a core kernel feature, whether or not you want to cal it that. Also, the context the kdbus commands operate on originate from a mountable special-purpose file system. It's not clear to me how this point implies any particular API design choice. It emphasizes the fact that our ioctl API can only be used with the nodes exposed by kdbusfs, and vice versa. I think operations on driver-specific files do not justify a new 'generic' syscall API. I see your (repeated) use of words like driver as just a distraction, I'm sorry. Driver-specific files is an implementation detail. The point is that kdbus provides a complex, general-purpose feature for all of userland. It is core infrastructure that will be used by key pieces of the plumbing layer, and likely by many other applications as well. *Please* stop suggesting that it is not core infrastructure: kdbus has the potential to be a great IPC that will be very useful to many, many user-space developers. (By the way, we have precedents for device/filesystem-specific system calls. Even a recent one, in memfd_create().) Notwithstanding the fact that there's a lot of (good) information in kdbus.txt, there's not nearly enough for someone to create useful, robust applications that use that API (and not enough that I as a reviewer feel comfortable about reviewing the API). As things stand, user-space developers will be forced to decipher large amounts of kernel code and existing applications in order to actually build things. And when they do, they'll be using one of the worst APIs known to man: ioctl(), an API that provides no type safety at all. I don't see how ioctls are any worse than syscalls with pointers to structures. One can screw up compatibility either way. How is an ioctl wrapper/prototype any less type-safe than a syscall wrapper? Taking that argument to the extreme, we would have no system calls at all, just one gigantic ioctl() ;-). ioctl() is a get-out-of-jail free card when it comes to API design. And how are syscalls different in that regard when they would both transport the same data structures? My general impression is that when it comes to system calls, there's usually a lot more up front thought about API design. Also note that all kdbus ioctls necessarily operate on a file descriptor context, which an ioctl passes along by default. I fail to see your point here. The same statement applies to multiple special-purpose system calls (epoll, inotify, various shared memory APIs, and so on). Rather than thinking carefully and long about a set of coherent, stable APIs that provide some degree of type-safety to user-space, one just adds/changes/removes an ioctl. Adding another ioctl in the future for cases we didn't think about earlier would of course be considered a workaround; and even though such things happen all the time, it's certainly something we'd like to avoid. However, we would also need to add another syscall in such cases, which is even worse IMO. So that's really not an argument against ioctls after all. What fundamental difference between a raw syscall and a ioctl, especially with regards to type safety, is there which would help us here? Type safety helps user-space application developers. System calls have it, ioctl() does not. And that process seems to be frequent and ongoing even now. (And it's to your great credit that the API/ABI breaks are clearly
Re: [PATCH 01/13] kdbus: add documentation
On 2015-01-22 08:46, David Herrmann wrote: Hi Michael On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: * API oddities such as the 'kernel_flags' fields. Why do I need to be told what flags the kernel supports on *every* operation? If we only returned EINVAL on invalid arguments, user-space had to probe for each flag to see whether it's supported. By returning the set of supported flags, user-space can cache those and _reliably_ know which flags are supported. We decided the overhead of a single u64 copy on each ioctl is preferred over a separate syscall/ioctl to query kernel flags. If you disagree, please elaborate (preferably with a suggestion how to do it better). While I agree that there should be a way for userspace to get the list of supported operations, userspace apps will only actually care about that once, when they begin talking to kdbus, because (ignoring the live kernel patching that people have been working on recently) the list of supported operations isn't going to change while the system is running. While a u64 copy has relatively low overhead, it does have overhead, and that is very significant when you consider part of the reason some people want kdbus is for the performance gain. Especially for those automotive applications that have been mentioned which fire off thousands of messages during start-up, every little bit of performance is significant. smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH 01/13] kdbus: add documentation
On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote: From: Daniel Mack dan...@zonque.org kdbus is a system for low-latency, low-overhead, easy to use interprocess communication (IPC). The interface to all functions in this driver is implemented via ioctls on files exposed through a filesystem called 'kdbusfs'. The default mount point of kdbusfs is /sys/fs/kdbus. Pardon my ignorance, but we've always been told that adding new ioctl()s to the kernel is a very big no-no. But given the seniority of the folks stewarding this kdbus effort, there must be a good rationale ;-) So, can the rationale behind introducing new ioctl()s be further explained? It would be even better if it's included in the documentation patch itself. Thanks, Darwish -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael, On 01/21/2015 11:32 AM, Michael Kerrisk (man-pages) wrote: > On 01/20/2015 07:23 PM, Daniel Mack wrote: >> It's rather an optional driver than a core kernel feature. > > Given the various things that I've seen said about kdbus, the > preceding sentence makes little sense to me: > > * kdbus will be the framework supporting user-space D-Bus in the > future, and also used by systemd, and so on pretty much every > desktop system. > * kdbus solves much of the bandwidth problem of D-Bus1. That, > along with a host of other features mean that there will be > a lot of user-space developers interested in using this API. > * Various parties in user space are already expressing strong > interest in kdbus. > > My guess from the above? This will NOT be an "optional driver". > It *will be* a core kernel feature. There will be userlands that will depend on kdbus, but that will still not turn the "driver" into a core Linux kernel feature. We really want it to be losely coupled from the rest of the kernel and optional after all. The kernel people are working toward making more and more things optional these days, and there will still be lots of systems that won't be using kdbus. >> Also, the context the kdbus commands operate on originate from a >> mountable special-purpose file system. > > It's not clear to me how this point implies any particular API design > choice. It emphasizes the fact that our ioctl API can only be used with the nodes exposed by kdbusfs, and vice versa. I think operations on driver-specific files do not justify a new 'generic' syscall API. > Notwithstanding the fact that there's a lot of (good) information in > kdbus.txt, there's not nearly enough for someone to create useful, > robust applications that use that API (and not enough that I as a > reviewer feel comfortable about reviewing the API). As things stand, > user-space developers will be forced to decipher large amounts of kernel > code and existing applications in order to actually build things. And > when they do, they'll be using one of the worst APIs known to man: ioctl(), > an API that provides no type safety at all. I don't see how ioctls are any worse than syscalls with pointers to structures. One can screw up compatibility either way. How is an ioctl wrapper/prototype any less type-safe than a syscall wrapper? > ioctl() is a get-out-of-jail free card when it comes to API design. And how are syscalls different in that regard when they would both transport the same data structures? Also note that all kdbus ioctls necessarily operate on a file descriptor context, which an ioctl passes along by default. > Rather > than thinking carefully and long about a set of coherent, stable APIs that > provide some degree of type-safety to user-space, one just > adds/changes/removes > an ioctl. Adding another ioctl in the future for cases we didn't think about earlier would of course be considered a workaround; and even though such things happen all the time, it's certainly something we'd like to avoid. However, we would also need to add another syscall in such cases, which is even worse IMO. So that's really not an argument against ioctls after all. What fundamental difference between a raw syscall and a ioctl, especially with regards to type safety, is there which would help us here? > And that process seems to be frequent and ongoing even now. (And > it's to your great credit that the API/ABI breaks are clearly and honestly > marked in the kdbus.h changelog.) All of this lightens the burden of API > design for kernel developers, but I'm concerned that the long-term pain > for user-space developers who use an API which (in my estimation) may > come to be widely used will be enormous. Yes, we've jointly reviewed the API details again until just recently to unify structs and enums etc, and added fields to make the ioctls structs more versatile for possible future additions. By that, we effectively broke the ABI, but we did that because we know we can't do such things again in the future. But again - I don't see how this would be different when using syscalls rather than ioctls to transport information between the driver and userspace. Could you elaborate? > Concretely, I'd like to see the following in kdbus.txt: > * A lot more detail, adding the various pieces that are currently missing. > I've mentioned already the absence of detail on the item blob structures, > but there's probably several other pieces as well. My problem is that the > API is so big and hard to grok that it's hard to even begin to work out > what's missing from the documentation. > * Fleshing out the API summaries with code snippets that illustrate the > use of the APIs. > * At least one simple working example application, complete with a walk > through of how it's built and run. > > Yes, all of this is a big demand. But this is a big API that is being added > to the kernel, and one that may become widely used and
Re: [PATCH 01/13] kdbus: add documentation
On Wed, Jan 21, 2015 at 11:32:59AM +0100, Michael Kerrisk (man-pages) wrote: > It's rather an optional driver than a core kernel feature. > > Given the various things that I've seen said about kdbus, the > preceding sentence makes little sense to me: > > * kdbus will be the framework supporting user-space D-Bus in the > future, and also used by systemd, and so on pretty much every > desktop system. I have to agree with Michael here; it's really, **really** disengenuous to say that that if you don't want kdbus, you can just #ifconfig it out. The fact that it systemd will be using it means that it will very shortly become a core kernel feature which is absolutely mandatory. Sure, maybe it can be configured out for "tiny kernels", just as in theory we can configure out the VM system for really tiny embedded systems. But we should be treating this as something that is not optional, because the reality is that's the way it's going to be in very short order. So if that means to use proper system calls instead of ioctls, we should do that. - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hello Daniel, On 01/20/2015 07:23 PM, Daniel Mack wrote: > On 01/20/2015 02:53 PM, Michael Kerrisk (man-pages) wrote: >> This is an enormous and complex API. Why is the API ioctl() based, >> rather than system-call-based? Have we learned nothing from the hydra >> that the futex() multiplexing syscall became? (And kdbus is an order >> of magnitude more complex, by the look of things.) At the very least, >> a *good* justification of why the API is ioctl()-based should be part >> of this documentation file. > > I think the simplest reason is because we want to be able to build kdbus > as a module. This isn't any _good_ justification... > It's rather an optional driver than a core kernel feature. Given the various things that I've seen said about kdbus, the preceding sentence makes little sense to me: * kdbus will be the framework supporting user-space D-Bus in the future, and also used by systemd, and so on pretty much every desktop system. * kdbus solves much of the bandwidth problem of D-Bus1. That, along with a host of other features mean that there will be a lot of user-space developers interested in using this API. * Various parties in user space are already expressing strong interest in kdbus. My guess from the above? This will NOT be an "optional driver". It *will be* a core kernel feature. > IMO, kernel primitives should be syscalls, but kdbus is not a primitive > but an elaborate subsystem. Agreed. It's an elaborate subsystem. But that fact doesn't in itself dictate any particular API design choice. > Also, the context the kdbus commands operate on originate from a > mountable special-purpose file system. It's not clear to me how this point implies any particular API design choice. > Hence, we decided not to use a > global kernel interface but specific ioctls on the nodes exposed by kdbusfs. I don't follow the reasoning here at all. Here's what we have, if I have grasped it roughly correctly: * 16 ioctls exposed to user space. * some 20 different structures exchanged between kernel and user space * about 14k lines of kernel code implement the above * some rather thin documentation of the whole lot Sorry if that last point seems rather harsh. I know that you personally have done a lot of work on the kdbus.txt file. David Herrmann asserts that this is a simple API. It is not. He also suggests that there is no need for a libkdbus. I don't know whether that's right or not, but the point is then that there's an expectation that the raw kernel API is what user space will need to work with. Notwithstanding the fact that there's a lot of (good) information in kdbus.txt, there's not nearly enough for someone to create useful, robust applications that use that API (and not enough that I as a reviewer feel comfortable about reviewing the API). As things stand, user-space developers will be forced to decipher large amounts of kernel code and existing applications in order to actually build things. And when they do, they'll be using one of the worst APIs known to man: ioctl(), an API that provides no type safety at all. ioctl() is a get-out-of-jail free card when it comes to API design. Rather than thinking carefully and long about a set of coherent, stable APIs that provide some degree of type-safety to user-space, one just adds/changes/removes an ioctl. And that process seems to be frequent and ongoing even now. (And it's to your great credit that the API/ABI breaks are clearly and honestly marked in the kdbus.h changelog.) All of this lightens the burden of API design for kernel developers, but I'm concerned that the long-term pain for user-space developers who use an API which (in my estimation) may come to be widely used will be enormous. Concretely, I'd like to see the following in kdbus.txt: * A lot more detail, adding the various pieces that are currently missing. I've mentioned already the absence of detail on the item blob structures, but there's probably several other pieces as well. My problem is that the API is so big and hard to grok that it's hard to even begin to work out what's missing from the documentation. * Fleshing out the API summaries with code snippets that illustrate the use of the APIs. * At least one simple working example application, complete with a walk through of how it's built and run. Yes, all of this is a big demand. But this is a big API that is being added to the kernel, and one that may become widely used and for a long time. It's imperative that the API is well documented, and as well designed as possible. Furthermore, with such improved documentation I feel we'd be in a better position to evaluate the merits of an ioctl()-based API versus some other approach. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a
Re: [PATCH 01/13] kdbus: add documentation
Hi David, On 01/20/2015 03:31 PM, David Herrmann wrote: > Hi Michael > > On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages) > wrote: >> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote: >>> From: Daniel Mack >>> >>> kdbus is a system for low-latency, low-overhead, easy to use >>> interprocess communication (IPC). >>> >>> The interface to all functions in this driver is implemented via ioctls >>> on files exposed through a filesystem called 'kdbusfs'. The default >>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed >>> documentation about the kernel level API design. >> >> I have some details feedback on the contents of this file, and some >> bigger questions. I'll split them out into separate mails. >> >> So here, the bigger, general questions to start with. I've arrived late >> to this, so sorry if they've already been discussed, but the answers to >> some of the questions should actually be in this file, I would have >> expected. >> >> This is an enormous and complex API. Why is the API ioctl() based, >> rather than system-call-based? Have we learned nothing from the hydra >> that the futex() multiplexing syscall became? (And kdbus is an order >> of magnitude more complex, by the look of things.) At the very least, >> a *good* justification of why the API is ioctl()-based should be part >> of this documentation file. >> >> An observation: The documentation below is substantial, but this API is >> enormous, so the documentation still feels rather thin. What would >> really help would be some example code in the doc. >> >> And on the subject of code examples... Are there any (prototype) >> working user-space applications that exercise the current kdbus >> implementation? That is, can I install these kdbus patches, and >> then find a simple example application somewhere that does >> something to exercise kdbus? > > If you run a 3.18 kernel, you can install kdbus.ko from our repository > and boot a full Fedora system running Gnome3 with kdbus, given that > you compiled systemd with --enable-kdbus (which is still > experimental). No legacy dbus1 daemon is running. Instead, we have a > bus-proxy that converts classic dbus1 to kdbus, so all > bus-communication runs on kdbus. Good to hear. I think that some info like this should go out in the "00/" covering mails for future patch revisions, so that people can get some sense of the testing that has been done. >> And then: is there any substantial real-world application (e.g., a >> full D-Bus port) that is being developed in tandem with this kernel >> side patch? (I don't mean a user-space library; I mean a seriously >> large application.) This is an incredibly complex API whose >> failings are only going to become evident through real-world use. >> Solidifying an API in the kernel and then discovering the API >> problems later when writing real-world applications would make for >> a sad story. A story something like that of inotify, an API which >> is an order of magnitude less complex than kdbus. (I can't help but >> feel that many of inotify problems that I discuss at >> https://lwn.net/Articles/605128/ might have been fixed or mitigated >> if a few real-world applications had been implemented before the >> API was set in stone.) > > I think running a whole Gnome3 stack counts as "substantial real-world > application", right? Yes, I'll give you that ;-). > Sure, it's a dbus1-to-kdbus layer, but all the > systemd tools use kdbus natively and it works just fine. In fact, we > all run kdbus on our main-systems every day. > > We've spent over a year fixing races and API misdesigns, we've talked > to other toolkit developers (glib, qt, ..) and made sure we're > backwards compatible to dbus1. I don't think the API is perfect, > everyone makes mistakes. But with bus-proxy and systemd we have two > huge users of kdbus that put a lot of pressure on API design. I'll say more about that in another mail in a moment. I'm not enthusiastic about the API. >>> +For a kdbus specific userspace library implementation please refer to: >>> + http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h >> >> Is this library intended just for systemd? More generally, is there an >> intention to provide a general purpose library API for kdbus? Or is the >> intention that each application will roll a library suitable to its >> needs? I think an answer to that question would be useful in this >> Documentation file. > > kdbus is in no way bound to systemd. There are ongoing efforts to port > glib and qt to kdbus natively. The API is pretty simple I think you and I must have quite different definitions of "simple"... (For more on this point, see my reply to Daniel in a moment.) > and I don't > see how a libkdbus would simplify things. In fact, even our tests only > have slim wrappers around the ioctls to simplify error-handling in > test-scenarios. Again, the above info
Re: [PATCH 01/13] kdbus: add documentation
On 01/21/2015 10:07 AM, Michael Kerrisk (man-pages) wrote: > On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote: >>> +Also, if the connection allowed for file descriptor to be passed >>> +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be >>> +installed into the receiving process after the KDBUS_CMD_RECV ioctl >>> returns. >> >> ### >> "after"??? When exactly? > > By the way, what was the answer to this question? I've corrected the wording on this. The file descriptors are installed when the RECV ioctl is called, so they are ready to use when the call returns. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Daniel, On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote: > On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote: >> From: Daniel Mack >> [...] >> +offset field contains the location of the new message inside the receiver's >> +pool. The message is stored as struct kdbus_msg at this offset, and can be >> +interpreted with the semantics described above. >> + >> +Also, if the connection allowed for file descriptor to be passed >> +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be >> +installed into the receiving process after the KDBUS_CMD_RECV ioctl returns. > > ### > "after"??? When exactly? By the way, what was the answer to this question? Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael, On 01/21/2015 09:57 AM, Michael Kerrisk (man-pages) wrote: > On 01/20/2015 06:50 PM, Daniel Mack wrote: >> I've addressed all but the below issues, following your suggestions. > > Are your changes already visible somewhere? Yes, in the upstream repo for the standalone module, which we also use to build the patch set from: https://code.google.com/p/d-bus/source/browse/kdbus.txt >> Hmm, you're quoting text from section 5, and section 4 actually >> describes the concept of items quite well I believe? > > Well, Section 4 is pretty short. My point is that most of the various > blob formats (e.g., kdbus_pids, kdbus_caps, kdbus_memfd) are not > documented in kdbus.txt. They all should be, IMO. Okay, I'll add some text about them. Best regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Daniel, On 01/20/2015 06:50 PM, Daniel Mack wrote: > Hi Michael, > > Thanks a lot for for intense review of the documentation. Much appreciated. > > I've addressed all but the below issues, following your suggestions. Are your changes already visible somewhere? > On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote: >>> +and KDBUS_CMD_ENDPOINT_MAKE (see above). >>> + >>> +Following items are expected for KDBUS_CMD_BUS_MAKE: >>> +KDBUS_ITEM_MAKE_NAME >>> + Contains a string to identify the bus name. >> >> So, up to here, I've seen no definition of 'kdbus_item', which leaves me >> asking questions like: what subfield is KDBUS_ITEM_MAKE_NAME stored in? >> which subfield holds the pointer to the string? >> >> Somewhere earlier, kdbus_item needs to be exaplained in more detail, >> I think. > > Hmm, you're quoting text from section 5, and section 4 actually > describes the concept of items quite well I believe? Well, Section 4 is pretty short. My point is that most of the various blob formats (e.g., kdbus_pids, kdbus_caps, kdbus_memfd) are not documented in kdbus.txt. They all should be, IMO. >>> + __s64 priority; >>> +With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in >>> +the queue with at least the given priority. If no such message is >>> waiting >>> +in the queue, -ENOMSG is returned. >> >> ### >> How do I simply select the highest priority message, without knowing what >> its priority is? > > The wording is indeed unclear here. KDBUS_RECV_USE_PRIORITY causes the > messages to be dequeued by their priority. The 'priority' field is > simply a filter that request a minimum priority. By setting this field > to the highest possible value, you effectively bypass the filter. I've > added a better description now. Thanks for the clarification. >>> + -ENOMEM The kernel memory is exhausted >>> + -ENOTTY Illegal ioctl command issued for the file descriptor >> >> Why ENOTTY here, rather than EINVAL? The latter is, I beleive, the usual >> ioctl() error for invalid commands, I believe (If you keep ENOTTY, add an >> explanation in this document.) > > Hmm, no. -ENOTTY is commonly used as return code when calling ioctls > that can't be handled by the FDs they're called on. 'man errno(3)' even > states: "ENOTTY Inappropriate I/O control operation (POSIX.1)". Okay. >>> + -EINVAL Unsupported item attached to command >>> + >>> +For all ioctls that carry a struct as payload: >>> + >>> + -EFAULT The supplied data pointer was not 64-bit aligned, or was >>> + inaccessible from the kernel side. >>> + -EINVAL The size inside the supplied struct was smaller than expected >>> + -EMSGSIZEThe size inside the supplied struct was bigger than >>> expected >> >> Why two different errors for smaller and larger than expected? (If you keep >> things this >> way, pelase explain the reason in this document.) > > Providing a struct that is smaller than the minimum doesn't give the > ioctl a valid set of information to process the request. Hence, I think > -EINVAL is appropriate. However, -EMSGSIZE is something that users might > hit when they make message payloads too big, and I think it's good to > have a change to distinguish the two cases in error handling. I added > something in the document now. Thanks. > Again, thanks a lot for reading the documentation so accurately! You're welcome. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael, On 01/21/2015 11:32 AM, Michael Kerrisk (man-pages) wrote: On 01/20/2015 07:23 PM, Daniel Mack wrote: It's rather an optional driver than a core kernel feature. Given the various things that I've seen said about kdbus, the preceding sentence makes little sense to me: * kdbus will be the framework supporting user-space D-Bus in the future, and also used by systemd, and so on pretty much every desktop system. * kdbus solves much of the bandwidth problem of D-Bus1. That, along with a host of other features mean that there will be a lot of user-space developers interested in using this API. * Various parties in user space are already expressing strong interest in kdbus. My guess from the above? This will NOT be an optional driver. It *will be* a core kernel feature. There will be userlands that will depend on kdbus, but that will still not turn the driver into a core Linux kernel feature. We really want it to be losely coupled from the rest of the kernel and optional after all. The kernel people are working toward making more and more things optional these days, and there will still be lots of systems that won't be using kdbus. Also, the context the kdbus commands operate on originate from a mountable special-purpose file system. It's not clear to me how this point implies any particular API design choice. It emphasizes the fact that our ioctl API can only be used with the nodes exposed by kdbusfs, and vice versa. I think operations on driver-specific files do not justify a new 'generic' syscall API. Notwithstanding the fact that there's a lot of (good) information in kdbus.txt, there's not nearly enough for someone to create useful, robust applications that use that API (and not enough that I as a reviewer feel comfortable about reviewing the API). As things stand, user-space developers will be forced to decipher large amounts of kernel code and existing applications in order to actually build things. And when they do, they'll be using one of the worst APIs known to man: ioctl(), an API that provides no type safety at all. I don't see how ioctls are any worse than syscalls with pointers to structures. One can screw up compatibility either way. How is an ioctl wrapper/prototype any less type-safe than a syscall wrapper? ioctl() is a get-out-of-jail free card when it comes to API design. And how are syscalls different in that regard when they would both transport the same data structures? Also note that all kdbus ioctls necessarily operate on a file descriptor context, which an ioctl passes along by default. Rather than thinking carefully and long about a set of coherent, stable APIs that provide some degree of type-safety to user-space, one just adds/changes/removes an ioctl. Adding another ioctl in the future for cases we didn't think about earlier would of course be considered a workaround; and even though such things happen all the time, it's certainly something we'd like to avoid. However, we would also need to add another syscall in such cases, which is even worse IMO. So that's really not an argument against ioctls after all. What fundamental difference between a raw syscall and a ioctl, especially with regards to type safety, is there which would help us here? And that process seems to be frequent and ongoing even now. (And it's to your great credit that the API/ABI breaks are clearly and honestly marked in the kdbus.h changelog.) All of this lightens the burden of API design for kernel developers, but I'm concerned that the long-term pain for user-space developers who use an API which (in my estimation) may come to be widely used will be enormous. Yes, we've jointly reviewed the API details again until just recently to unify structs and enums etc, and added fields to make the ioctls structs more versatile for possible future additions. By that, we effectively broke the ABI, but we did that because we know we can't do such things again in the future. But again - I don't see how this would be different when using syscalls rather than ioctls to transport information between the driver and userspace. Could you elaborate? Concretely, I'd like to see the following in kdbus.txt: * A lot more detail, adding the various pieces that are currently missing. I've mentioned already the absence of detail on the item blob structures, but there's probably several other pieces as well. My problem is that the API is so big and hard to grok that it's hard to even begin to work out what's missing from the documentation. * Fleshing out the API summaries with code snippets that illustrate the use of the APIs. * At least one simple working example application, complete with a walk through of how it's built and run. Yes, all of this is a big demand. But this is a big API that is being added to the kernel, and one that may become widely used and for a long time. Fair enough. Everything that helps people
Re: [PATCH 01/13] kdbus: add documentation
Hi David, On 01/20/2015 03:31 PM, David Herrmann wrote: Hi Michael On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote: From: Daniel Mack dan...@zonque.org kdbus is a system for low-latency, low-overhead, easy to use interprocess communication (IPC). The interface to all functions in this driver is implemented via ioctls on files exposed through a filesystem called 'kdbusfs'. The default mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed documentation about the kernel level API design. I have some details feedback on the contents of this file, and some bigger questions. I'll split them out into separate mails. So here, the bigger, general questions to start with. I've arrived late to this, so sorry if they've already been discussed, but the answers to some of the questions should actually be in this file, I would have expected. This is an enormous and complex API. Why is the API ioctl() based, rather than system-call-based? Have we learned nothing from the hydra that the futex() multiplexing syscall became? (And kdbus is an order of magnitude more complex, by the look of things.) At the very least, a *good* justification of why the API is ioctl()-based should be part of this documentation file. An observation: The documentation below is substantial, but this API is enormous, so the documentation still feels rather thin. What would really help would be some example code in the doc. And on the subject of code examples... Are there any (prototype) working user-space applications that exercise the current kdbus implementation? That is, can I install these kdbus patches, and then find a simple example application somewhere that does something to exercise kdbus? If you run a 3.18 kernel, you can install kdbus.ko from our repository and boot a full Fedora system running Gnome3 with kdbus, given that you compiled systemd with --enable-kdbus (which is still experimental). No legacy dbus1 daemon is running. Instead, we have a bus-proxy that converts classic dbus1 to kdbus, so all bus-communication runs on kdbus. Good to hear. I think that some info like this should go out in the 00/ covering mails for future patch revisions, so that people can get some sense of the testing that has been done. And then: is there any substantial real-world application (e.g., a full D-Bus port) that is being developed in tandem with this kernel side patch? (I don't mean a user-space library; I mean a seriously large application.) This is an incredibly complex API whose failings are only going to become evident through real-world use. Solidifying an API in the kernel and then discovering the API problems later when writing real-world applications would make for a sad story. A story something like that of inotify, an API which is an order of magnitude less complex than kdbus. (I can't help but feel that many of inotify problems that I discuss at https://lwn.net/Articles/605128/ might have been fixed or mitigated if a few real-world applications had been implemented before the API was set in stone.) I think running a whole Gnome3 stack counts as substantial real-world application, right? Yes, I'll give you that ;-). Sure, it's a dbus1-to-kdbus layer, but all the systemd tools use kdbus natively and it works just fine. In fact, we all run kdbus on our main-systems every day. We've spent over a year fixing races and API misdesigns, we've talked to other toolkit developers (glib, qt, ..) and made sure we're backwards compatible to dbus1. I don't think the API is perfect, everyone makes mistakes. But with bus-proxy and systemd we have two huge users of kdbus that put a lot of pressure on API design. I'll say more about that in another mail in a moment. I'm not enthusiastic about the API. +For a kdbus specific userspace library implementation please refer to: + http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h Is this library intended just for systemd? More generally, is there an intention to provide a general purpose library API for kdbus? Or is the intention that each application will roll a library suitable to its needs? I think an answer to that question would be useful in this Documentation file. kdbus is in no way bound to systemd. There are ongoing efforts to port glib and qt to kdbus natively. The API is pretty simple I think you and I must have quite different definitions of simple... (For more on this point, see my reply to Daniel in a moment.) and I don't see how a libkdbus would simplify things. In fact, even our tests only have slim wrappers around the ioctls to simplify error-handling in test-scenarios. Again, the above info would be useful in the Documentation file. Note that most of the toolkit work is on the marshaling level, which is
Re: [PATCH 01/13] kdbus: add documentation
Hello Daniel, On 01/20/2015 07:23 PM, Daniel Mack wrote: On 01/20/2015 02:53 PM, Michael Kerrisk (man-pages) wrote: This is an enormous and complex API. Why is the API ioctl() based, rather than system-call-based? Have we learned nothing from the hydra that the futex() multiplexing syscall became? (And kdbus is an order of magnitude more complex, by the look of things.) At the very least, a *good* justification of why the API is ioctl()-based should be part of this documentation file. I think the simplest reason is because we want to be able to build kdbus as a module. This isn't any _good_ justification... It's rather an optional driver than a core kernel feature. Given the various things that I've seen said about kdbus, the preceding sentence makes little sense to me: * kdbus will be the framework supporting user-space D-Bus in the future, and also used by systemd, and so on pretty much every desktop system. * kdbus solves much of the bandwidth problem of D-Bus1. That, along with a host of other features mean that there will be a lot of user-space developers interested in using this API. * Various parties in user space are already expressing strong interest in kdbus. My guess from the above? This will NOT be an optional driver. It *will be* a core kernel feature. IMO, kernel primitives should be syscalls, but kdbus is not a primitive but an elaborate subsystem. Agreed. It's an elaborate subsystem. But that fact doesn't in itself dictate any particular API design choice. Also, the context the kdbus commands operate on originate from a mountable special-purpose file system. It's not clear to me how this point implies any particular API design choice. Hence, we decided not to use a global kernel interface but specific ioctls on the nodes exposed by kdbusfs. I don't follow the reasoning here at all. Here's what we have, if I have grasped it roughly correctly: * 16 ioctls exposed to user space. * some 20 different structures exchanged between kernel and user space * about 14k lines of kernel code implement the above * some rather thin documentation of the whole lot Sorry if that last point seems rather harsh. I know that you personally have done a lot of work on the kdbus.txt file. David Herrmann asserts that this is a simple API. It is not. He also suggests that there is no need for a libkdbus. I don't know whether that's right or not, but the point is then that there's an expectation that the raw kernel API is what user space will need to work with. Notwithstanding the fact that there's a lot of (good) information in kdbus.txt, there's not nearly enough for someone to create useful, robust applications that use that API (and not enough that I as a reviewer feel comfortable about reviewing the API). As things stand, user-space developers will be forced to decipher large amounts of kernel code and existing applications in order to actually build things. And when they do, they'll be using one of the worst APIs known to man: ioctl(), an API that provides no type safety at all. ioctl() is a get-out-of-jail free card when it comes to API design. Rather than thinking carefully and long about a set of coherent, stable APIs that provide some degree of type-safety to user-space, one just adds/changes/removes an ioctl. And that process seems to be frequent and ongoing even now. (And it's to your great credit that the API/ABI breaks are clearly and honestly marked in the kdbus.h changelog.) All of this lightens the burden of API design for kernel developers, but I'm concerned that the long-term pain for user-space developers who use an API which (in my estimation) may come to be widely used will be enormous. Concretely, I'd like to see the following in kdbus.txt: * A lot more detail, adding the various pieces that are currently missing. I've mentioned already the absence of detail on the item blob structures, but there's probably several other pieces as well. My problem is that the API is so big and hard to grok that it's hard to even begin to work out what's missing from the documentation. * Fleshing out the API summaries with code snippets that illustrate the use of the APIs. * At least one simple working example application, complete with a walk through of how it's built and run. Yes, all of this is a big demand. But this is a big API that is being added to the kernel, and one that may become widely used and for a long time. It's imperative that the API is well documented, and as well designed as possible. Furthermore, with such improved documentation I feel we'd be in a better position to evaluate the merits of an ioctl()-based API versus some other approach. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael, On 01/21/2015 09:57 AM, Michael Kerrisk (man-pages) wrote: On 01/20/2015 06:50 PM, Daniel Mack wrote: I've addressed all but the below issues, following your suggestions. Are your changes already visible somewhere? Yes, in the upstream repo for the standalone module, which we also use to build the patch set from: https://code.google.com/p/d-bus/source/browse/kdbus.txt Hmm, you're quoting text from section 5, and section 4 actually describes the concept of items quite well I believe? Well, Section 4 is pretty short. My point is that most of the various blob formats (e.g., kdbus_pids, kdbus_caps, kdbus_memfd) are not documented in kdbus.txt. They all should be, IMO. Okay, I'll add some text about them. Best regards, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On 01/21/2015 10:07 AM, Michael Kerrisk (man-pages) wrote: On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote: +Also, if the connection allowed for file descriptor to be passed +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be +installed into the receiving process after the KDBUS_CMD_RECV ioctl returns. ### after??? When exactly? By the way, what was the answer to this question? I've corrected the wording on this. The file descriptors are installed when the RECV ioctl is called, so they are ready to use when the call returns. Thanks, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Daniel, On 01/20/2015 06:50 PM, Daniel Mack wrote: Hi Michael, Thanks a lot for for intense review of the documentation. Much appreciated. I've addressed all but the below issues, following your suggestions. Are your changes already visible somewhere? On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote: +and KDBUS_CMD_ENDPOINT_MAKE (see above). + +Following items are expected for KDBUS_CMD_BUS_MAKE: +KDBUS_ITEM_MAKE_NAME + Contains a string to identify the bus name. So, up to here, I've seen no definition of 'kdbus_item', which leaves me asking questions like: what subfield is KDBUS_ITEM_MAKE_NAME stored in? which subfield holds the pointer to the string? Somewhere earlier, kdbus_item needs to be exaplained in more detail, I think. Hmm, you're quoting text from section 5, and section 4 actually describes the concept of items quite well I believe? Well, Section 4 is pretty short. My point is that most of the various blob formats (e.g., kdbus_pids, kdbus_caps, kdbus_memfd) are not documented in kdbus.txt. They all should be, IMO. + __s64 priority; +With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in +the queue with at least the given priority. If no such message is waiting +in the queue, -ENOMSG is returned. ### How do I simply select the highest priority message, without knowing what its priority is? The wording is indeed unclear here. KDBUS_RECV_USE_PRIORITY causes the messages to be dequeued by their priority. The 'priority' field is simply a filter that request a minimum priority. By setting this field to the highest possible value, you effectively bypass the filter. I've added a better description now. Thanks for the clarification. + -ENOMEM The kernel memory is exhausted + -ENOTTY Illegal ioctl command issued for the file descriptor Why ENOTTY here, rather than EINVAL? The latter is, I beleive, the usual ioctl() error for invalid commands, I believe (If you keep ENOTTY, add an explanation in this document.) Hmm, no. -ENOTTY is commonly used as return code when calling ioctls that can't be handled by the FDs they're called on. 'man errno(3)' even states: ENOTTY Inappropriate I/O control operation (POSIX.1). Okay. + -EINVAL Unsupported item attached to command + +For all ioctls that carry a struct as payload: + + -EFAULT The supplied data pointer was not 64-bit aligned, or was + inaccessible from the kernel side. + -EINVAL The size inside the supplied struct was smaller than expected + -EMSGSIZEThe size inside the supplied struct was bigger than expected Why two different errors for smaller and larger than expected? (If you keep things this way, pelase explain the reason in this document.) Providing a struct that is smaller than the minimum doesn't give the ioctl a valid set of information to process the request. Hence, I think -EINVAL is appropriate. However, -EMSGSIZE is something that users might hit when they make message payloads too big, and I think it's good to have a change to distinguish the two cases in error handling. I added something in the document now. Thanks. Again, thanks a lot for reading the documentation so accurately! You're welcome. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Daniel, On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote: On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote: From: Daniel Mack dan...@zonque.org [...] +offset field contains the location of the new message inside the receiver's +pool. The message is stored as struct kdbus_msg at this offset, and can be +interpreted with the semantics described above. + +Also, if the connection allowed for file descriptor to be passed +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be +installed into the receiving process after the KDBUS_CMD_RECV ioctl returns. ### after??? When exactly? By the way, what was the answer to this question? Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Wed, Jan 21, 2015 at 11:32:59AM +0100, Michael Kerrisk (man-pages) wrote: It's rather an optional driver than a core kernel feature. Given the various things that I've seen said about kdbus, the preceding sentence makes little sense to me: * kdbus will be the framework supporting user-space D-Bus in the future, and also used by systemd, and so on pretty much every desktop system. I have to agree with Michael here; it's really, **really** disengenuous to say that that if you don't want kdbus, you can just #ifconfig it out. The fact that it systemd will be using it means that it will very shortly become a core kernel feature which is absolutely mandatory. Sure, maybe it can be configured out for tiny kernels, just as in theory we can configure out the VM system for really tiny embedded systems. But we should be treating this as something that is not optional, because the reality is that's the way it's going to be in very short order. So if that means to use proper system calls instead of ioctls, we should do that. - Ted -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi David, On Tue, Jan 20, 2015 at 06:00:28PM +0100, David Herrmann wrote: [big snip] > These are just examples off the top of my head, but I think they're > already pretty convincing. Thank you for writing this up. This is the information I was looking for which puts kdbus into context and explains the motivation for its development. Naturally I don't agree with all of it, but I'm content with what I learned so far. Daniel informed me off-list that he (and probably others) does not understand what my questions were aiming at. I'm sorry about that, I thought it was clear I was just lacking the background information to understand what kdbus is and what it is not, and why it exists -- information I couldn't find in some hours of googling. Thanks, Johannes -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On 01/20/2015 02:53 PM, Michael Kerrisk (man-pages) wrote: > This is an enormous and complex API. Why is the API ioctl() based, > rather than system-call-based? Have we learned nothing from the hydra > that the futex() multiplexing syscall became? (And kdbus is an order > of magnitude more complex, by the look of things.) At the very least, > a *good* justification of why the API is ioctl()-based should be part > of this documentation file. I think the simplest reason is because we want to be able to build kdbus as a module. It's rather an optional driver than a core kernel feature. IMO, kernel primitives should be syscalls, but kdbus is not a primitive but an elaborate subsystem. Also, the context the kdbus commands operate on originate from a mountable special-purpose file system. Hence, we decided not to use a global kernel interface but specific ioctls on the nodes exposed by kdbusfs. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael, Thanks a lot for for intense review of the documentation. Much appreciated. I've addressed all but the below issues, following your suggestions. On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote: >> +and KDBUS_CMD_ENDPOINT_MAKE (see above). >> + >> +Following items are expected for KDBUS_CMD_BUS_MAKE: >> +KDBUS_ITEM_MAKE_NAME >> + Contains a string to identify the bus name. > > So, up to here, I've seen no definition of 'kdbus_item', which leaves me > asking questions like: what subfield is KDBUS_ITEM_MAKE_NAME stored in? > which subfield holds the pointer to the string? > > Somewhere earlier, kdbus_item needs to be exaplained in more detail, > I think. Hmm, you're quoting text from section 5, and section 4 actually describes the concept of items quite well I believe? >> + __s64 priority; >> +With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in >> +the queue with at least the given priority. If no such message is >> waiting >> +in the queue, -ENOMSG is returned. > > ### > How do I simply select the highest priority message, without knowing what > its priority is? The wording is indeed unclear here. KDBUS_RECV_USE_PRIORITY causes the messages to be dequeued by their priority. The 'priority' field is simply a filter that request a minimum priority. By setting this field to the highest possible value, you effectively bypass the filter. I've added a better description now. >> + -ENOMEM The kernel memory is exhausted >> + -ENOTTY Illegal ioctl command issued for the file descriptor > > Why ENOTTY here, rather than EINVAL? The latter is, I beleive, the usual > ioctl() error for invalid commands, I believe (If you keep ENOTTY, add an > explanation in this document.) Hmm, no. -ENOTTY is commonly used as return code when calling ioctls that can't be handled by the FDs they're called on. 'man errno(3)' even states: "ENOTTY Inappropriate I/O control operation (POSIX.1)". >> + -EINVAL Unsupported item attached to command >> + >> +For all ioctls that carry a struct as payload: >> + >> + -EFAULT The supplied data pointer was not 64-bit aligned, or was >> +inaccessible from the kernel side. >> + -EINVAL The size inside the supplied struct was smaller than expected >> + -EMSGSIZE The size inside the supplied struct was bigger than expected > > Why two different errors for smaller and larger than expected? (If you keep > things this > way, pelase explain the reason in this document.) Providing a struct that is smaller than the minimum doesn't give the ioctl a valid set of information to process the request. Hence, I think -EINVAL is appropriate. However, -EMSGSIZE is something that users might hit when they make message payloads too big, and I think it's good to have a change to distinguish the two cases in error handling. I added something in the document now. Again, thanks a lot for reading the documentation so accurately! Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi On Tue, Jan 20, 2015 at 5:08 PM, Johannes Stezenbach wrote: > However, let me repeat and rephrase my previous questions: > Is there a noticable or measurable improvement from using kdbus? > IOW, is the added complexity of kdbus worth the result? > > I have stated my believe that current usage of D-Bus is not > performance sensitive and the number of messages exchanged > is low. I would love it if you would prove me wrong. > Or if you could show that any D-Bus related bug in Gnome3 > is fixed by kdbus. DBus is not used for performance sensitive applications because DBus is slow. We want to make it fast so we can finally use it for low-latency, high-throughput applications. A simple DBus method-call+reply takes 200us here, with kdbus it takes 8us (with UDS about 2us). If I increase the packet size from 8k to 128k, kdbus even tops UDS thanks to single-copy transfers. The fact that there is no performance-critical application using DBus is, imho, an argument *pro* kdbus. People haven't been capable of making classic dbus1 fast enough for low-latency audio, thus, we present kdbus. Starting up 'gdm' sends ~5k dbus messages on my machine. It takes >1s to transmit the messages alone. Each dbus1 message has to be copied 4 times for each direction. With kdbus, each message is copied only once for each transmission (or not at all, if you use memfds, though that doesn't mean it's necessarily faster). No intermediate context-switch is needed. This makes kdbus capable to transmit low-latency audio data *inline*. DBus marshaling is the de-facto standard in all major(!) linux desktop systems. It is well established and accepted by many DEs. It also solves many other problems, including: policy, authentication/authorization, well-known name registry, efficient broadcasts/multicasts, peer discovery, bus discovery, metadata transmission, and more. It is a shame that we cannot use this well-established protocol for low-latency applications. We, effectively, have to duplicate all this code on custom UDS and other transports just because DBus is too slow. kdbus tries to unify those efforts, so that we don't need multiple policy implementations, name registries and peer discovery mechanisms. Furthermore, kdbus implements comprehensive, yet optional, metadata transmission that allows to identify and authenticate peers in a race-free manner (which is *not* possible with UDS). Also, kdbus provides a single transport bus with sequential message numbering. If you use multiple channels, you cannot give any ordering guarantees across peers (for instance, regarding parallel name-registry changes). Given these theoretical advantages, here're some examples: *) The Tizen developers have been complaining about the high latency of DBus for polkit'ish policy queries. That's why their authentication framework uses custom UDS sockets (called 'Cynara'). If a UI-interaction needs multiple authentication-queries, you don't want it to take multiple milliseconds, given that you usually want to render the result in the same frame. *) PulseAudio doesn't use DBus for data transmission. They had to implement their own marshaling code, transport layer and so on, just because DBus1-latency is horrible. With kdbus, we can basically drop this code-duplication and unify the IPC layer. Same is true for Wayland, btw. *) By moving broadcast-transmission into the kernel, we can use the time-slices of the sender to perform heavy operations. This is also true for policy decisions, etc. With a userspace daemon, we cannot perform operations in a time-slice of the caller. This makes DoS attacks much harder. *) With priority-inheritance, we can do synchronous calls into trusted peers and let them optionally use our time-slice to perform the action. This allows syscall-like/binder-like method-calls into other processes. Without priority-inheritance, this is not possible in a secure manner (see 'priority-inheritance'). *) Logging-daemons often want to attach metadata to log-messages so debugging/filtering gets easier. If short-lived programs send log-messages, the destination peer might not be able to read such metadata from /proc, as the process might no longer be available at that time. Same is true for policy-decisions like polkit does. You cannot send off method-calls and exit. You have to wait for a reply, even though you might not even care for it. If you don't wait, the other side might not be able to verify your identity and as such reject the request. *) Even though the dbus traffic on idle-systems might be low, this doesn't mean it's not significant at boot-times or under high-load. If you run a dbus-monitor of your choice, you will see there is an significant number of messages exchanged during VT-switches, startup, shutdown, suspend, wakeup, hotplugging and similar situations where lots of control-messages are exchanged. We don't want to spend hundreds of ms just to transmit those messages. *) dbus-daemon is not available during
Re: [PATCH 01/13] kdbus: add documentation
Hi all, On Tue, Jan 20, 2015 at 03:53:53PM +0100, Djalal Harouni wrote: > On Tue, Jan 20, 2015 at 09:42:59AM -0500, Josh Boyer wrote: > > On Tue, Jan 20, 2015 at 9:31 AM, David Herrmann > > wrote: > > > > > > If you run a 3.18 kernel, you can install kdbus.ko from our repository > > > and boot a full Fedora system running Gnome3 with kdbus, given that > > > you compiled systemd with --enable-kdbus (which is still > > > experimental). No legacy dbus1 daemon is running. Instead, we have a > > > bus-proxy that converts classic dbus1 to kdbus, so all > > > bus-communication runs on kdbus. > > > > FWIW, we've been building a "playground" repository for the kernel > > that contains this already for Fedora. If you have a stock Fedora 21 > > or rawhide install, you can use: > > > > https://copr.fedoraproject.org/coprs/jwboyer/kernel-playground/ > > > > which has the kernel+kdbus and systemd built with --enable-kdbus > > already. Easy enough to throw in a VM for testing. > > Another addition, if kdbus is installed and loaded, you could also use > systemd-nspawn to boot a full system (systemd compiled with > --enable-kdbus) in a container [1], kdbusfs will be mounted in the > container. > > There is also the busctl tool to query kdbus... > > http://www.freedesktop.org/wiki/Software/systemd/VirtualizedTesting/ It is reassuring that kdbus actually works :) However, let me repeat and rephrase my previous questions: Is there a noticable or measurable improvement from using kdbus? IOW, is the added complexity of kdbus worth the result? I have stated my believe that current usage of D-Bus is not performance sensitive and the number of messages exchanged is low. I would love it if you would prove me wrong. Or if you could show that any D-Bus related bug in Gnome3 is fixed by kdbus. I would sooo love it if someone would finally post some data that proves kdbus is useful beyond systemd. Thanks, Johannes -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi, On Tue, Jan 20, 2015 at 09:42:59AM -0500, Josh Boyer wrote: > On Tue, Jan 20, 2015 at 9:31 AM, David Herrmann wrote: > > Hi Michael > > > > On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages) > > wrote: > >> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote: > >>> From: Daniel Mack > >>> > >>> kdbus is a system for low-latency, low-overhead, easy to use > >>> interprocess communication (IPC). > >>> > >>> The interface to all functions in this driver is implemented via ioctls > >>> on files exposed through a filesystem called 'kdbusfs'. The default > >>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed > >>> documentation about the kernel level API design. > >> > >> I have some details feedback on the contents of this file, and some > >> bigger questions. I'll split them out into separate mails. > >> > >> So here, the bigger, general questions to start with. I've arrived late > >> to this, so sorry if they've already been discussed, but the answers to > >> some of the questions should actually be in this file, I would have > >> expected. > >> > >> This is an enormous and complex API. Why is the API ioctl() based, > >> rather than system-call-based? Have we learned nothing from the hydra > >> that the futex() multiplexing syscall became? (And kdbus is an order > >> of magnitude more complex, by the look of things.) At the very least, > >> a *good* justification of why the API is ioctl()-based should be part > >> of this documentation file. > >> > >> An observation: The documentation below is substantial, but this API is > >> enormous, so the documentation still feels rather thin. What would > >> really help would be some example code in the doc. > >> > >> And on the subject of code examples... Are there any (prototype) > >> working user-space applications that exercise the current kdbus > >> implementation? That is, can I install these kdbus patches, and > >> then find a simple example application somewhere that does > >> something to exercise kdbus? > > > > If you run a 3.18 kernel, you can install kdbus.ko from our repository > > and boot a full Fedora system running Gnome3 with kdbus, given that > > you compiled systemd with --enable-kdbus (which is still > > experimental). No legacy dbus1 daemon is running. Instead, we have a > > bus-proxy that converts classic dbus1 to kdbus, so all > > bus-communication runs on kdbus. > > FWIW, we've been building a "playground" repository for the kernel > that contains this already for Fedora. If you have a stock Fedora 21 > or rawhide install, you can use: > > https://copr.fedoraproject.org/coprs/jwboyer/kernel-playground/ > > which has the kernel+kdbus and systemd built with --enable-kdbus > already. Easy enough to throw in a VM for testing. > > josh Yes thanks josh! Another addition, if kdbus is installed and loaded, you could also use systemd-nspawn to boot a full system (systemd compiled with --enable-kdbus) in a container [1], kdbusfs will be mounted in the container. There is also the busctl tool to query kdbus... http://www.freedesktop.org/wiki/Software/systemd/VirtualizedTesting/ -- Djalal Harouni http://opendz.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On Tue, Jan 20, 2015 at 9:31 AM, David Herrmann wrote: > Hi Michael > > On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages) > wrote: >> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote: >>> From: Daniel Mack >>> >>> kdbus is a system for low-latency, low-overhead, easy to use >>> interprocess communication (IPC). >>> >>> The interface to all functions in this driver is implemented via ioctls >>> on files exposed through a filesystem called 'kdbusfs'. The default >>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed >>> documentation about the kernel level API design. >> >> I have some details feedback on the contents of this file, and some >> bigger questions. I'll split them out into separate mails. >> >> So here, the bigger, general questions to start with. I've arrived late >> to this, so sorry if they've already been discussed, but the answers to >> some of the questions should actually be in this file, I would have >> expected. >> >> This is an enormous and complex API. Why is the API ioctl() based, >> rather than system-call-based? Have we learned nothing from the hydra >> that the futex() multiplexing syscall became? (And kdbus is an order >> of magnitude more complex, by the look of things.) At the very least, >> a *good* justification of why the API is ioctl()-based should be part >> of this documentation file. >> >> An observation: The documentation below is substantial, but this API is >> enormous, so the documentation still feels rather thin. What would >> really help would be some example code in the doc. >> >> And on the subject of code examples... Are there any (prototype) >> working user-space applications that exercise the current kdbus >> implementation? That is, can I install these kdbus patches, and >> then find a simple example application somewhere that does >> something to exercise kdbus? > > If you run a 3.18 kernel, you can install kdbus.ko from our repository > and boot a full Fedora system running Gnome3 with kdbus, given that > you compiled systemd with --enable-kdbus (which is still > experimental). No legacy dbus1 daemon is running. Instead, we have a > bus-proxy that converts classic dbus1 to kdbus, so all > bus-communication runs on kdbus. FWIW, we've been building a "playground" repository for the kernel that contains this already for Fedora. If you have a stock Fedora 21 or rawhide install, you can use: https://copr.fedoraproject.org/coprs/jwboyer/kernel-playground/ which has the kernel+kdbus and systemd built with --enable-kdbus already. Easy enough to throw in a VM for testing. josh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
Hi Michael On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages) wrote: > On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote: >> From: Daniel Mack >> >> kdbus is a system for low-latency, low-overhead, easy to use >> interprocess communication (IPC). >> >> The interface to all functions in this driver is implemented via ioctls >> on files exposed through a filesystem called 'kdbusfs'. The default >> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed >> documentation about the kernel level API design. > > I have some details feedback on the contents of this file, and some > bigger questions. I'll split them out into separate mails. > > So here, the bigger, general questions to start with. I've arrived late > to this, so sorry if they've already been discussed, but the answers to > some of the questions should actually be in this file, I would have > expected. > > This is an enormous and complex API. Why is the API ioctl() based, > rather than system-call-based? Have we learned nothing from the hydra > that the futex() multiplexing syscall became? (And kdbus is an order > of magnitude more complex, by the look of things.) At the very least, > a *good* justification of why the API is ioctl()-based should be part > of this documentation file. > > An observation: The documentation below is substantial, but this API is > enormous, so the documentation still feels rather thin. What would > really help would be some example code in the doc. > > And on the subject of code examples... Are there any (prototype) > working user-space applications that exercise the current kdbus > implementation? That is, can I install these kdbus patches, and > then find a simple example application somewhere that does > something to exercise kdbus? If you run a 3.18 kernel, you can install kdbus.ko from our repository and boot a full Fedora system running Gnome3 with kdbus, given that you compiled systemd with --enable-kdbus (which is still experimental). No legacy dbus1 daemon is running. Instead, we have a bus-proxy that converts classic dbus1 to kdbus, so all bus-communication runs on kdbus. > And then: is there any substantial real-world application (e.g., a > full D-Bus port) that is being developed in tandem with this kernel > side patch? (I don't mean a user-space library; I mean a seriously > large application.) This is an incredibly complex API whose > failings are only going to become evident through real-world use. > Solidifying an API in the kernel and then discovering the API > problems later when writing real-world applications would make for > a sad story. A story something like that of inotify, an API which > is an order of magnitude less complex than kdbus. (I can't help but > feel that many of inotify problems that I discuss at > https://lwn.net/Articles/605128/ might have been fixed or mitigated > if a few real-world applications had been implemented before the > API was set in stone.) I think running a whole Gnome3 stack counts as "substantial real-world application", right? Sure, it's a dbus1-to-kdbus layer, but all the systemd tools use kdbus natively and it works just fine. In fact, we all run kdbus on our main-systems every day. We've spent over a year fixing races and API misdesigns, we've talked to other toolkit developers (glib, qt, ..) and made sure we're backwards compatible to dbus1. I don't think the API is perfect, everyone makes mistakes. But with bus-proxy and systemd we have two huge users of kdbus that put a lot of pressure on API design. >> +For a kdbus specific userspace library implementation please refer to: >> + http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h > > Is this library intended just for systemd? More generally, is there an > intention to provide a general purpose library API for kdbus? Or is the > intention that each application will roll a library suitable to its > needs? I think an answer to that question would be useful in this > Documentation file. kdbus is in no way bound to systemd. There are ongoing efforts to port glib and qt to kdbus natively. The API is pretty simple and I don't see how a libkdbus would simplify things. In fact, even our tests only have slim wrappers around the ioctls to simplify error-handling in test-scenarios. Note that most of the toolkit work is on the marshaling level, which is independent of kdbus. kdbus just provides the transport level. DBus is just one, yet significant, application-layer on top of kdbus. Our test-cases use kdbus exclusively to transport raw byte streams. Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/13] kdbus: add documentation
On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote: > From: Daniel Mack > > kdbus is a system for low-latency, low-overhead, easy to use > interprocess communication (IPC). > > The interface to all functions in this driver is implemented via ioctls > on files exposed through a filesystem called 'kdbusfs'. The default > mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed > documentation about the kernel level API design. And now the details feedback. Please note that for the various points I raise below, even in cases where I don't suggest/request a fix, the fact that I've needed to answer a question probably suggests a deficiency in the documentation that probably needs to be remedied. Many of my comments below are wording and typo fixes. While these may sem trivial, the existence of various wording problems and typos is a significant distraction, especially while trying to grok a document of this size. I've marked one or two notable questions about the API with "###". > Signed-off-by: Daniel Mack > Signed-off-by: David Herrmann > Signed-off-by: Djalal Harouni > Signed-off-by: Greg Kroah-Hartman > --- > Documentation/kdbus.txt | 2107 > +++ > 1 file changed, 2107 insertions(+) > create mode 100644 Documentation/kdbus.txt > > diff --git a/Documentation/kdbus.txt b/Documentation/kdbus.txt > new file mode 100644 > index ..2592a7e37079 > --- /dev/null > +++ b/Documentation/kdbus.txt > @@ -0,0 +1,2107 @@ > +D-Bus is a system for powerful, easy to use interprocess communication (IPC). > + > +The focus of this document is an overview of the low-level, native kernel > D-Bus > +transport called kdbus. Kdbus exposes its functionality via files in a > +filesystem called 'kdbusfs'. All communication between processes takes place > +via ioctls on files exposed through the mount point of a kdbusfs. The default > +mount point of kdbusfs is /sys/fs/kdbus. > + > +Please note that kdbus was designed as transport layer for D-Bus, but is in > no > +way limited, nor controlled by the D-Bus protocol specification. The D-Bus > +protocol is one possible application layer on top of kdbus. > + > +For the general D-Bus protocol specification, the payload format, the > +marshaling, and the communication semantics, please refer to: > + http://dbus.freedesktop.org/doc/dbus-specification.html > + > +For a kdbus specific userspace library implementation please refer to: > + http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h > + > +Articles about D-Bus and kdbus: > + http://lwn.net/Articles/580194/ > + > + > +1. Terminology > +=== > + > + Domain: > +A domain is created each time a kdbusfs is mounted. Each process that is > +capable to mount a new instance of a kdbusfs will have its own kdbus s/is capable to mount/mounts/ > +hierarchy. Each domain (ie, each mount point) offers its own "control" > +file to create new buses. Domains have no connection to each other and > +cannot see nor talk to each other. See section 5 for more details. Smoother would be: s/cannot see nor talk/can neither see nor talk/ > + > + Bus: > +A bus is a named object inside a domain. Clients exchange messages > +over a bus. Multiple buses themselves have no connection to each other; > +messages can only be exchanged on the same bus. The default endpoint of > +a bus, where clients establish the connection to, is the "bus" file Maybe: where clients establish the connection to ==> to which clients establish connections ? > +/sys/fs/kdbus//bus. > +Common operating system setups create one "system bus" per system, and > one > +"user bus" for every logged-in user. Applications or services may create At the kdbus level is there any difference between such system and user buses? If not, it would perhaps be good to insert a parenthetical aside to say that. > +their own private buses. See section 5 for more details. > + > + Endpoint: > +An endpoint provides a file to talk to a bus. Opening an endpoint > +creates a new connection to the bus to which the endpoint belongs. All > +endpoints have unique names and are accessible as files underneath the > +directory of a bus, e.g., /sys/fs/kdbus// > +Every bus has a default endpoint called "bus". A bus can optionally offer > +additional endpoints with custom names to provide restricted access to > the > +bus. Custom endpoints carry additional policy which can be used to create > +sandboxes with locked-down, limited, filtered access to a bus. See > +section 5 for more details. > + > + Connection: > +A connection to a bus is created by opening an endpoint file of a bus and > +becoming an active client with the HELLO exchange. Every ordinary client > +connection has a unique identifier on the bus and can address messages to > +
Re: [PATCH 01/13] kdbus: add documentation
On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote: > From: Daniel Mack > > kdbus is a system for low-latency, low-overhead, easy to use > interprocess communication (IPC). > > The interface to all functions in this driver is implemented via ioctls > on files exposed through a filesystem called 'kdbusfs'. The default > mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed > documentation about the kernel level API design. I have some details feedback on the contents of this file, and some bigger questions. I'll split them out into separate mails. So here, the bigger, general questions to start with. I've arrived late to this, so sorry if they've already been discussed, but the answers to some of the questions should actually be in this file, I would have expected. This is an enormous and complex API. Why is the API ioctl() based, rather than system-call-based? Have we learned nothing from the hydra that the futex() multiplexing syscall became? (And kdbus is an order of magnitude more complex, by the look of things.) At the very least, a *good* justification of why the API is ioctl()-based should be part of this documentation file. An observation: The documentation below is substantial, but this API is enormous, so the documentation still feels rather thin. What would really help would be some example code in the doc. And on the subject of code examples... Are there any (prototype) working user-space applications that exercise the current kdbus implementation? That is, can I install these kdbus patches, and then find a simple example application somewhere that does something to exercise kdbus? And then: is there any substantial real-world application (e.g., a full D-Bus port) that is being developed in tandem with this kernel side patch? (I don't mean a user-space library; I mean a seriously large application.) This is an incredibly complex API whose failings are only going to become evident through real-world use. Solidifying an API in the kernel and then discovering the API problems later when writing real-world applications would make for a sad story. A story something like that of inotify, an API which is an order of magnitude less complex than kdbus. (I can't help but feel that many of inotify problems that I discuss at https://lwn.net/Articles/605128/ might have been fixed or mitigated if a few real-world applications had been implemented before the API was set in stone.) > +For a kdbus specific userspace library implementation please refer to: > + http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h Is this library intended just for systemd? More generally, is there an intention to provide a general purpose library API for kdbus? Or is the intention that each application will roll a library suitable to its needs? I think an answer to that question would be useful in this Documentation file. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kdbus: add documentation
On 01/20/2015 09:25 AM, Daniel Mack wrote: > Hi Michael, > > On 01/20/2015 09:09 AM, Michael Kerrisk (man-pages) wrote: >> On 11/30/2014 06:23 PM, Florian Weimer wrote: >>> * David Herrmann: >>> On Sun, Nov 30, 2014 at 10:02 AM, Florian Weimer wrote: > * Greg Kroah-Hartman: > >> +7.4 Receiving messages >>> > What happens if this is not possible because the file descriptor limit > of the processes would be exceeded? EMFILE, and the message will not > be received? The message is returned without installing the FDs. This is signaled by EMFILE, but a valid pool offset. >>> >>> Oh. This is really surprising, so it needs documentation. But it's >>> probably better than the alternative (return EMFILE and leave the >>> message stuck, so that you receive it immediately again—this behavior >>> makes non-blocking accept rather difficult to use correctly). >> >> So, was this point in the end explicitly documented? I not >> obvious that it is documented in the revised kdbus.txt that >> Greg K-H sent out 4 days ago. > > No, we've revisited this point and changed the kernel behavior again in > v3. We're no longer returning -EMFILE in this case, but rather set > KDBUS_RECV_RETURN_INCOMPLETE_FDS in a new field in the receive ioctl > struct called 'return_flags'. We believe that's a nicer way of signaling > specific errors. The message will carry -1 for all FDs that failed to > get installed, so the user can actually see which one is missing. > > That's also documented in kdbus.txt, but we missed putting it into the > Changelog - sorry for that. Thanks for the info, Daniel. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kdbus: add documentation
Hi Michael, On 01/20/2015 09:09 AM, Michael Kerrisk (man-pages) wrote: > On 11/30/2014 06:23 PM, Florian Weimer wrote: >> * David Herrmann: >> >>> On Sun, Nov 30, 2014 at 10:02 AM, Florian Weimer wrote: * Greg Kroah-Hartman: > +7.4 Receiving messages >> What happens if this is not possible because the file descriptor limit of the processes would be exceeded? EMFILE, and the message will not be received? >>> >>> The message is returned without installing the FDs. This is signaled >>> by EMFILE, but a valid pool offset. >> >> Oh. This is really surprising, so it needs documentation. But it's >> probably better than the alternative (return EMFILE and leave the >> message stuck, so that you receive it immediately again—this behavior >> makes non-blocking accept rather difficult to use correctly). > > So, was this point in the end explicitly documented? I not > obvious that it is documented in the revised kdbus.txt that > Greg K-H sent out 4 days ago. No, we've revisited this point and changed the kernel behavior again in v3. We're no longer returning -EMFILE in this case, but rather set KDBUS_RECV_RETURN_INCOMPLETE_FDS in a new field in the receive ioctl struct called 'return_flags'. We believe that's a nicer way of signaling specific errors. The message will carry -1 for all FDs that failed to get installed, so the user can actually see which one is missing. That's also documented in kdbus.txt, but we missed putting it into the Changelog - sorry for that. Hope this helps, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/