Re: [PATCH 01/13] kdbus: add documentation

2015-02-08 Thread Andy Lutomirski
On Feb 4, 2015 4:16 PM, "David Herrmann"  wrote:
>
> Hi
>
> On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski  wrote:
> > I see "latencies" of around 20 microseconds with lockdep and context
> > tracking off.  For example:
>
> Without metadata nor memfd transmission, I get 2.5us for kdbus, 1.5us
> for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I
> suspect you enabled metadata transmission, which I think is not a fair
> comparison.

I tried to disable metadata.  I may have failed.

Regardless, if metadata is very slow, then that's more reason not to
use it on send.  And if you shouldn't use it, then maybe the kernel
shouldn't provide it.

I assumed there was a context switch in there.  I can try to test
differently.  If UDS is twice as fast *with* a contest switch, then a
userspace solution should be faster.

Also, UDS can use memfds, too.

>
> A few notes on that:
>
> * kdbus is a bus layer. We don't intend to replace UDS, but improve
> dbus. Comparing roundtrip times with UDS is tempting, but in no way
> fair. To the very least, a bus layer has to perform peer-lookup, which
> UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice.
> Compare this to ~77us for dbus1 without marshaling.

This makes me wonder what dbus1 is doing wrong.

>
> * We have not optimized kdbus code-paths for speed, yet. Our main
> concerns are algorithmic challenges, and we believe they've been
> improved considerably with kdbus. I have constantly measured kdbus
> performance with 'perf' and flame-graphs, and there're a lot of
> possible optimizations (especially on locking). However, I think this
> can be done afterwards just fine. Neither API nor ioctl overhead has
> shown up in my measurements. If anyone has counter evidence, please
> let us know. But I'm a bit reluctant to change our API solely based on
> performance guesses.

But removal of send-time metadata can't be done after the fact.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-08 Thread Andy Lutomirski
On Feb 4, 2015 4:16 PM, David Herrmann dh.herrm...@gmail.com wrote:

 Hi

 On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski l...@amacapital.net wrote:
  I see latencies of around 20 microseconds with lockdep and context
  tracking off.  For example:

 Without metadata nor memfd transmission, I get 2.5us for kdbus, 1.5us
 for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I
 suspect you enabled metadata transmission, which I think is not a fair
 comparison.

I tried to disable metadata.  I may have failed.

Regardless, if metadata is very slow, then that's more reason not to
use it on send.  And if you shouldn't use it, then maybe the kernel
shouldn't provide it.

I assumed there was a context switch in there.  I can try to test
differently.  If UDS is twice as fast *with* a contest switch, then a
userspace solution should be faster.

Also, UDS can use memfds, too.


 A few notes on that:

 * kdbus is a bus layer. We don't intend to replace UDS, but improve
 dbus. Comparing roundtrip times with UDS is tempting, but in no way
 fair. To the very least, a bus layer has to perform peer-lookup, which
 UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice.
 Compare this to ~77us for dbus1 without marshaling.

This makes me wonder what dbus1 is doing wrong.


 * We have not optimized kdbus code-paths for speed, yet. Our main
 concerns are algorithmic challenges, and we believe they've been
 improved considerably with kdbus. I have constantly measured kdbus
 performance with 'perf' and flame-graphs, and there're a lot of
 possible optimizations (especially on locking). However, I think this
 can be done afterwards just fine. Neither API nor ioctl overhead has
 shown up in my measurements. If anyone has counter evidence, please
 let us know. But I'm a bit reluctant to change our API solely based on
 performance guesses.

But removal of send-time metadata can't be done after the fact.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-04 Thread David Herrmann
Hi

On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski  wrote:
> I see "latencies" of around 20 microseconds with lockdep and context
> tracking off.  For example:

Without metadata nor memfd transmission, I get 2.5us for kdbus, 1.5us
for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I
suspect you enabled metadata transmission, which I think is not a fair
comparison.

A few notes on that:

* kdbus is a bus layer. We don't intend to replace UDS, but improve
dbus. Comparing roundtrip times with UDS is tempting, but in no way
fair. To the very least, a bus layer has to perform peer-lookup, which
UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice.
Compare this to ~77us for dbus1 without marshaling.

* We have not optimized kdbus code-paths for speed, yet. Our main
concerns are algorithmic challenges, and we believe they've been
improved considerably with kdbus. I have constantly measured kdbus
performance with 'perf' and flame-graphs, and there're a lot of
possible optimizations (especially on locking). However, I think this
can be done afterwards just fine. Neither API nor ioctl overhead has
shown up in my measurements. If anyone has counter evidence, please
let us know. But I'm a bit reluctant to change our API solely based on
performance guesses.

* We're about 50% slower than UDS on 1-byte transmissions. With 32k
we're on-par. How can a lightweight user-space daemon even get close
to that?

* Broadcast performance is a completely different story. SEND gets
around 30% faster compared to kdbus unicasts (as most of the
control-paths are only taken once per message, instead of once per
destination).

* test-benchmark.c does performance tests in a single process. If the
bus-layer is implemented in user-space, you need to account for
context-switches and task wakeups. My UDS and pipe round-trip latency
tests got around 3x slower if done cross processes (3.7us instead of
1.2us). With a user-space daemon, those slow-downs are taken two times
more often for each roundtrip.

* Process time is accounted on the sender, instead of a shared process
(dbus-daemon). Broadcasts will thus no longer consume time-slices of
dbus-daemon, but only the sender's.


With kdbus, we implement a bus-layer. This is our only target! If your
target environment does not require a bus, then don't use kdbus. We
don't intend to replace UDS. On a bus-layer, we need peer-discovery,
policy-handling, destination-lookups, broadcast-management and more.
Pipes/UDS do not provide any of this.
I cannot see how any other existing bus-implementation comes even
close to kdbus, performance-wise. If someone does, please let us know!

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-04 Thread Andy Lutomirski
On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack  wrote:
> Hi Andy,
>
> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
>> On Feb 2, 2015 1:34 AM, "Daniel Mack"  wrote:
>
>>> That's right, but again - if an application wants to gather this kind of
>>> information about tasks it interacts with, it can do so today by looking
>>> at /proc or similar sources. Desktop machines do exactly that already,
>>> and the kernel code executed in such cases very much resembles that in
>>> metadata.c, and is certainly not cheaper. kdbus just makes such
>>> information more accessible when requested. Which information is
>>> collected is defined by bit-masks on both the sender and the receiver
>>> connection, and most applications will effectively only use a very
>>> limited set by default if they go through one of the more high-level
>>> libraries.
>>
>> I should rephrase a bit.  Kdbus doesn't require use of send-time
>> metadata.  It does, however, strongly encourage it, and it sounds like
>
> On the kernel level, kdbus just *offers* that, just like sockets offer
> SO_PASSCRED. On the userland level, kdbus helps applications get that
> information race-free, easier and faster than they would otherwise.
>
>> systemd and other major users will use send-time metadata.  Once that
>> happens, it's ABI (even if it's purely in userspace), and changing it
>> is asking for security holes to pop up.  So you'll be mostly stuck
>> with it.
>
> We know we can't break the ABI. At most, we could deprecate item types
> and introduce new ones, but we want to avoid that by all means of
> course. However, I fail to see how that is related to send time
> metadata, or even to kdbus in general, as all ABIs have to be kept stable.
>
>> Do you have some simple benchmark code you can share?  I'd like to
>> play with it a bit.
>
> Sure, it's part of the self-test suite. Call it with "-t benchmark" to
> run the benchmark as isolated test with verbose output. The code for
> that lives in test-benchmark.c.

I see "latencies" of around 20 microseconds with lockdep and context
tracking off.  For example:

stats  (UNIX): 226730 packets processed, latency (nsecs) min/max/avg
 3845 //   34828 //4069
stats (KDBUS): 37103 packets processed, latency (nsecs) min/max/avg
19123 //   99660 //   20696

This is IMO not very good.  With memfds off:

stats  (UNIX): 226061 packets processed, latency (nsecs) min/max/avg
 3885 //   32019 //4079
stats (KDBUS): 83284 packets processed, latency (nsecs) min/max/avg
10525 //   42578 //   10932

With memfds off and the payload set to 8 bytes:

stats (KDBUS): 77669 packets processed, latency (nsecs) min/max/avg
9963 //   64325 //   11645
stats  (UNIX): 253695 packets processed, latency (nsecs) min/max/avg
 2986 //   56094 //3565

Am I missing something here?  This is slow enough that a lightweight
userspace dbus daemon should be able to outperform kdbus, or at least
come very close.

It would be kind of nice to know how long just the send call takes, too.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-04 Thread Andy Lutomirski
On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack dan...@zonque.org wrote:
 Hi Andy,

 On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
 On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote:

 That's right, but again - if an application wants to gather this kind of
 information about tasks it interacts with, it can do so today by looking
 at /proc or similar sources. Desktop machines do exactly that already,
 and the kernel code executed in such cases very much resembles that in
 metadata.c, and is certainly not cheaper. kdbus just makes such
 information more accessible when requested. Which information is
 collected is defined by bit-masks on both the sender and the receiver
 connection, and most applications will effectively only use a very
 limited set by default if they go through one of the more high-level
 libraries.

 I should rephrase a bit.  Kdbus doesn't require use of send-time
 metadata.  It does, however, strongly encourage it, and it sounds like

 On the kernel level, kdbus just *offers* that, just like sockets offer
 SO_PASSCRED. On the userland level, kdbus helps applications get that
 information race-free, easier and faster than they would otherwise.

 systemd and other major users will use send-time metadata.  Once that
 happens, it's ABI (even if it's purely in userspace), and changing it
 is asking for security holes to pop up.  So you'll be mostly stuck
 with it.

 We know we can't break the ABI. At most, we could deprecate item types
 and introduce new ones, but we want to avoid that by all means of
 course. However, I fail to see how that is related to send time
 metadata, or even to kdbus in general, as all ABIs have to be kept stable.

 Do you have some simple benchmark code you can share?  I'd like to
 play with it a bit.

 Sure, it's part of the self-test suite. Call it with -t benchmark to
 run the benchmark as isolated test with verbose output. The code for
 that lives in test-benchmark.c.

I see latencies of around 20 microseconds with lockdep and context
tracking off.  For example:

stats  (UNIX): 226730 packets processed, latency (nsecs) min/max/avg
 3845 //   34828 //4069
stats (KDBUS): 37103 packets processed, latency (nsecs) min/max/avg
19123 //   99660 //   20696

This is IMO not very good.  With memfds off:

stats  (UNIX): 226061 packets processed, latency (nsecs) min/max/avg
 3885 //   32019 //4079
stats (KDBUS): 83284 packets processed, latency (nsecs) min/max/avg
10525 //   42578 //   10932

With memfds off and the payload set to 8 bytes:

stats (KDBUS): 77669 packets processed, latency (nsecs) min/max/avg
9963 //   64325 //   11645
stats  (UNIX): 253695 packets processed, latency (nsecs) min/max/avg
 2986 //   56094 //3565

Am I missing something here?  This is slow enough that a lightweight
userspace dbus daemon should be able to outperform kdbus, or at least
come very close.

It would be kind of nice to know how long just the send call takes, too.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-04 Thread David Herrmann
Hi

On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski l...@amacapital.net wrote:
 I see latencies of around 20 microseconds with lockdep and context
 tracking off.  For example:

Without metadata nor memfd transmission, I get 2.5us for kdbus, 1.5us
for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I
suspect you enabled metadata transmission, which I think is not a fair
comparison.

A few notes on that:

* kdbus is a bus layer. We don't intend to replace UDS, but improve
dbus. Comparing roundtrip times with UDS is tempting, but in no way
fair. To the very least, a bus layer has to perform peer-lookup, which
UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice.
Compare this to ~77us for dbus1 without marshaling.

* We have not optimized kdbus code-paths for speed, yet. Our main
concerns are algorithmic challenges, and we believe they've been
improved considerably with kdbus. I have constantly measured kdbus
performance with 'perf' and flame-graphs, and there're a lot of
possible optimizations (especially on locking). However, I think this
can be done afterwards just fine. Neither API nor ioctl overhead has
shown up in my measurements. If anyone has counter evidence, please
let us know. But I'm a bit reluctant to change our API solely based on
performance guesses.

* We're about 50% slower than UDS on 1-byte transmissions. With 32k
we're on-par. How can a lightweight user-space daemon even get close
to that?

* Broadcast performance is a completely different story. SEND gets
around 30% faster compared to kdbus unicasts (as most of the
control-paths are only taken once per message, instead of once per
destination).

* test-benchmark.c does performance tests in a single process. If the
bus-layer is implemented in user-space, you need to account for
context-switches and task wakeups. My UDS and pipe round-trip latency
tests got around 3x slower if done cross processes (3.7us instead of
1.2us). With a user-space daemon, those slow-downs are taken two times
more often for each roundtrip.

* Process time is accounted on the sender, instead of a shared process
(dbus-daemon). Broadcasts will thus no longer consume time-slices of
dbus-daemon, but only the sender's.


With kdbus, we implement a bus-layer. This is our only target! If your
target environment does not require a bus, then don't use kdbus. We
don't intend to replace UDS. On a bus-layer, we need peer-discovery,
policy-handling, destination-lookups, broadcast-management and more.
Pipes/UDS do not provide any of this.
I cannot see how any other existing bus-implementation comes even
close to kdbus, performance-wise. If someone does, please let us know!

Thanks
David
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-03 Thread Eric W. Biederman
Greg Kroah-Hartman  writes:

> On Tue, Feb 03, 2015 at 08:47:51PM -0600, Eric W. Biederman wrote:
>> Andy Lutomirski  writes:
>> 
>> > On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack  wrote:
>> >> Hi Andy,
>> >>
>> >> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
>> >>> On Feb 2, 2015 1:34 AM, "Daniel Mack"  wrote:
>> >>
>>  That's right, but again - if an application wants to gather this kind of
>>  information about tasks it interacts with, it can do so today by looking
>>  at /proc or similar sources. Desktop machines do exactly that already,
>>  and the kernel code executed in such cases very much resembles that in
>>  metadata.c, and is certainly not cheaper. kdbus just makes such
>>  information more accessible when requested. Which information is
>>  collected is defined by bit-masks on both the sender and the receiver
>>  connection, and most applications will effectively only use a very
>>  limited set by default if they go through one of the more high-level
>>  libraries.
>> >>>
>> >>> I should rephrase a bit.  Kdbus doesn't require use of send-time
>> >>> metadata.  It does, however, strongly encourage it, and it sounds like
>> >>
>> >> On the kernel level, kdbus just *offers* that, just like sockets offer
>> >> SO_PASSCRED. On the userland level, kdbus helps applications get that
>> >> information race-free, easier and faster than they would otherwise.
>> >>
>> >>> systemd and other major users will use send-time metadata.  Once that
>> >>> happens, it's ABI (even if it's purely in userspace), and changing it
>> >>> is asking for security holes to pop up.  So you'll be mostly stuck
>> >>> with it.
>> >>
>> >> We know we can't break the ABI. At most, we could deprecate item types
>> >> and introduce new ones, but we want to avoid that by all means of
>> >> course. However, I fail to see how that is related to send time
>> >> metadata, or even to kdbus in general, as all ABIs have to be kept stable.
>> >
>> > I should have said it differently.  ABI is the wrong term -- it's more
>> > of a protocol issue.
>> >
>> > It looks like, with the current code, the kernel will provide
>> > (optional) send-time metadata, and the sd-bus library will use it.
>> > The result will be that the communication protocol between clients and
>> > udev, systemd, systemd-logind, g-s-d, etc, will likely involve
>> > send-time metadata.  This may end up being a bottleneck.
>> 
>> A quick note on a couple of things I have seen in this conversation.
>> 
>> - The reason for kdbus is performance.
>
> No, that's not the only reason for kdbus, don't focus only on this.  I
> set out a long list of things for why we created kdbus, speed was only
> one of the things.  Security is also one, and the ability to gather
> these attributes in an atomic and secure way is very important as
> userspace wants this.

Perhaps I should have said the predominant reason.  Certainly that seems
to be most of what I have seen talked about.

Regardless looking at the performance in the design and removing any
substantial obstacle to making things go fast.

Further.  I had this conversation earlier in an earlier round of the
review and I was told that in fact existing dbus applications do not
want or need these attributes.   I think I heard journald wants them for
pretty printing things.

If security is your concern I really think per message attributes
collected and sent when a message is sent is a bad idea.  It has been a
nasty anti-pattern in the kernel code.  Lots and lots of meta-data
copyed from a task and sent to someone else has significant performance,
maintenance, and security impacts.

Code written in that pattern is complex and hard to analyze, and hard to
think about.  Consider debugging why a message does not get the expected
treatment from your suid application because someone changed the euid
over that particular call and had not thought about it's consequences.
Frankly I have been there and done that and it is a mess. 

So no I do not think breaking encapsulation and having weird side
effects affecting your new primitive will have any security benefits
whatsover.   It will just result in brittle complex code.

If you want to avoid the races causing sends through a file descriptor
to fail that don't have the expected attributes (my constructive
suggestion earlier) is a very different thing from a performance and
mainteance standpoint.  That does not increase the code complexity
nearly as much in the implementation or in use, and unexpected failures
happen right away.

>> - pipes rather than unix domain sockets are likely the standard to meet.
>>   If you can't equal unix domain sockets for simple things you are
>>   likely leaving a lot of stops in.  Last I looked pipes in general were
>>   notiably faster than unix domain sockets.
>> 
>>   The performance numbers I saw posted up-thread were horrible.  I have
>>   seen faster numbers across a network of machines.  If your ping-pong
>>   latency 

Re: [PATCH 01/13] kdbus: add documentation

2015-02-03 Thread Greg Kroah-Hartman
On Tue, Feb 03, 2015 at 08:47:51PM -0600, Eric W. Biederman wrote:
> Andy Lutomirski  writes:
> 
> > On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack  wrote:
> >> Hi Andy,
> >>
> >> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
> >>> On Feb 2, 2015 1:34 AM, "Daniel Mack"  wrote:
> >>
>  That's right, but again - if an application wants to gather this kind of
>  information about tasks it interacts with, it can do so today by looking
>  at /proc or similar sources. Desktop machines do exactly that already,
>  and the kernel code executed in such cases very much resembles that in
>  metadata.c, and is certainly not cheaper. kdbus just makes such
>  information more accessible when requested. Which information is
>  collected is defined by bit-masks on both the sender and the receiver
>  connection, and most applications will effectively only use a very
>  limited set by default if they go through one of the more high-level
>  libraries.
> >>>
> >>> I should rephrase a bit.  Kdbus doesn't require use of send-time
> >>> metadata.  It does, however, strongly encourage it, and it sounds like
> >>
> >> On the kernel level, kdbus just *offers* that, just like sockets offer
> >> SO_PASSCRED. On the userland level, kdbus helps applications get that
> >> information race-free, easier and faster than they would otherwise.
> >>
> >>> systemd and other major users will use send-time metadata.  Once that
> >>> happens, it's ABI (even if it's purely in userspace), and changing it
> >>> is asking for security holes to pop up.  So you'll be mostly stuck
> >>> with it.
> >>
> >> We know we can't break the ABI. At most, we could deprecate item types
> >> and introduce new ones, but we want to avoid that by all means of
> >> course. However, I fail to see how that is related to send time
> >> metadata, or even to kdbus in general, as all ABIs have to be kept stable.
> >
> > I should have said it differently.  ABI is the wrong term -- it's more
> > of a protocol issue.
> >
> > It looks like, with the current code, the kernel will provide
> > (optional) send-time metadata, and the sd-bus library will use it.
> > The result will be that the communication protocol between clients and
> > udev, systemd, systemd-logind, g-s-d, etc, will likely involve
> > send-time metadata.  This may end up being a bottleneck.
> 
> A quick note on a couple of things I have seen in this conversation.
> 
> - The reason for kdbus is performance.

No, that's not the only reason for kdbus, don't focus only on this.  I
set out a long list of things for why we created kdbus, speed was only
one of the things.  Security is also one, and the ability to gather
these attributes in an atomic and secure way is very important as
userspace wants this.

> - pipes rather than unix domain sockets are likely the standard to meet.
>   If you can't equal unix domain sockets for simple things you are
>   likely leaving a lot of stops in.  Last I looked pipes in general were
>   notiably faster than unix domain sockets.
> 
>   The performance numbers I saw posted up-thread were horrible.  I have
>   seen faster numbers across a network of machines.  If your ping-pong
>   latency isn't measured in nano-seconds you are probably doing
>   something wrong.

It all depends on what you are passing on that "ping-pong", a real
D-Bus connection has real data and meta data that has to be sent.
Trying to make a fake benchmark number isn't going to show anything.

> - syscalls remove overhead.  So since performance is kdbus's reason for 
> existence
>   let's remove some ridiculous stops, and get a fast path into the kernel.

Again, not the only reason, see my first post in this thread for
details.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-03 Thread Eric W. Biederman
Andy Lutomirski  writes:

> On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack  wrote:
>> Hi Andy,
>>
>> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
>>> On Feb 2, 2015 1:34 AM, "Daniel Mack"  wrote:
>>
 That's right, but again - if an application wants to gather this kind of
 information about tasks it interacts with, it can do so today by looking
 at /proc or similar sources. Desktop machines do exactly that already,
 and the kernel code executed in such cases very much resembles that in
 metadata.c, and is certainly not cheaper. kdbus just makes such
 information more accessible when requested. Which information is
 collected is defined by bit-masks on both the sender and the receiver
 connection, and most applications will effectively only use a very
 limited set by default if they go through one of the more high-level
 libraries.
>>>
>>> I should rephrase a bit.  Kdbus doesn't require use of send-time
>>> metadata.  It does, however, strongly encourage it, and it sounds like
>>
>> On the kernel level, kdbus just *offers* that, just like sockets offer
>> SO_PASSCRED. On the userland level, kdbus helps applications get that
>> information race-free, easier and faster than they would otherwise.
>>
>>> systemd and other major users will use send-time metadata.  Once that
>>> happens, it's ABI (even if it's purely in userspace), and changing it
>>> is asking for security holes to pop up.  So you'll be mostly stuck
>>> with it.
>>
>> We know we can't break the ABI. At most, we could deprecate item types
>> and introduce new ones, but we want to avoid that by all means of
>> course. However, I fail to see how that is related to send time
>> metadata, or even to kdbus in general, as all ABIs have to be kept stable.
>
> I should have said it differently.  ABI is the wrong term -- it's more
> of a protocol issue.
>
> It looks like, with the current code, the kernel will provide
> (optional) send-time metadata, and the sd-bus library will use it.
> The result will be that the communication protocol between clients and
> udev, systemd, systemd-logind, g-s-d, etc, will likely involve
> send-time metadata.  This may end up being a bottleneck.

A quick note on a couple of things I have seen in this conversation.

- The reason for kdbus is performance.

- pipes rather than unix domain sockets are likely the standard to meet.
  If you can't equal unix domain sockets for simple things you are
  likely leaving a lot of stops in.  Last I looked pipes in general were
  notiably faster than unix domain sockets.

  The performance numbers I saw posted up-thread were horrible.  I have
  seen faster numbers across a network of machines.  If your ping-pong
  latency isn't measured in nano-seconds you are probably doing
  something wrong.

- syscalls remove overhead.  So since performance is kdbus's reason for 
existence
  let's remove some ridiculous stops, and get a fast path into the kernel.

- send-time metadata is a performance nightmare.  SO_PASSCRED is hard
  to implement in a fast performant way, especially when namespaces
  get involved.  Over the long term if you use send-time metadata
  you will grow the kind of compatibility hacks that the user
  namespace and the pid namespace have on SO_PASSCRED and things will
  slow down.

  A similar effect that is more performant in general is to enforce that
  the sender has the expected attributes.

> Once this happens, changing the protocol will be very hard without
> introducing security bugs.  If people start switching to
> connection-time metadata to gain performance, then they'll break both
> the communication protocol and the expectations of client code.  (In
> fact, it'll break twice, sort of, since I think that the current
> protocols are connect-time.)
>
> To me, this seems like a down-side of using send-time metadata, albeit
> possibly not a huge downside at least in the near term.  I don't see a
> corresponding benefit, though.

I think send-time metadata verification is less bad in this regard.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-03 Thread Andy Lutomirski
On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack  wrote:
> Hi Andy,
>
> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
>> On Feb 2, 2015 1:34 AM, "Daniel Mack"  wrote:
>
>>> That's right, but again - if an application wants to gather this kind of
>>> information about tasks it interacts with, it can do so today by looking
>>> at /proc or similar sources. Desktop machines do exactly that already,
>>> and the kernel code executed in such cases very much resembles that in
>>> metadata.c, and is certainly not cheaper. kdbus just makes such
>>> information more accessible when requested. Which information is
>>> collected is defined by bit-masks on both the sender and the receiver
>>> connection, and most applications will effectively only use a very
>>> limited set by default if they go through one of the more high-level
>>> libraries.
>>
>> I should rephrase a bit.  Kdbus doesn't require use of send-time
>> metadata.  It does, however, strongly encourage it, and it sounds like
>
> On the kernel level, kdbus just *offers* that, just like sockets offer
> SO_PASSCRED. On the userland level, kdbus helps applications get that
> information race-free, easier and faster than they would otherwise.
>
>> systemd and other major users will use send-time metadata.  Once that
>> happens, it's ABI (even if it's purely in userspace), and changing it
>> is asking for security holes to pop up.  So you'll be mostly stuck
>> with it.
>
> We know we can't break the ABI. At most, we could deprecate item types
> and introduce new ones, but we want to avoid that by all means of
> course. However, I fail to see how that is related to send time
> metadata, or even to kdbus in general, as all ABIs have to be kept stable.

I should have said it differently.  ABI is the wrong term -- it's more
of a protocol issue.

It looks like, with the current code, the kernel will provide
(optional) send-time metadata, and the sd-bus library will use it.
The result will be that the communication protocol between clients and
udev, systemd, systemd-logind, g-s-d, etc, will likely involve
send-time metadata.  This may end up being a bottleneck.

Once this happens, changing the protocol will be very hard without
introducing security bugs.  If people start switching to
connection-time metadata to gain performance, then they'll break both
the communication protocol and the expectations of client code.  (In
fact, it'll break twice, sort of, since I think that the current
protocols are connect-time.)

To me, this seems like a down-side of using send-time metadata, albeit
possibly not a huge downside at least in the near term.  I don't see a
corresponding benefit, though.

>
>> Do you have some simple benchmark code you can share?  I'd like to
>> play with it a bit.
>
> Sure, it's part of the self-test suite. Call it with "-t benchmark" to
> run the benchmark as isolated test with verbose output. The code for
> that lives in test-benchmark.c.
>

I'll try to play with this soon.  Thanks.

--Andy

>
> Thanks,
> Daniel
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-03 Thread Daniel Mack
Hi Andy,

On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
> On Feb 2, 2015 1:34 AM, "Daniel Mack"  wrote:

>> That's right, but again - if an application wants to gather this kind of
>> information about tasks it interacts with, it can do so today by looking
>> at /proc or similar sources. Desktop machines do exactly that already,
>> and the kernel code executed in such cases very much resembles that in
>> metadata.c, and is certainly not cheaper. kdbus just makes such
>> information more accessible when requested. Which information is
>> collected is defined by bit-masks on both the sender and the receiver
>> connection, and most applications will effectively only use a very
>> limited set by default if they go through one of the more high-level
>> libraries.
> 
> I should rephrase a bit.  Kdbus doesn't require use of send-time
> metadata.  It does, however, strongly encourage it, and it sounds like

On the kernel level, kdbus just *offers* that, just like sockets offer
SO_PASSCRED. On the userland level, kdbus helps applications get that
information race-free, easier and faster than they would otherwise.

> systemd and other major users will use send-time metadata.  Once that
> happens, it's ABI (even if it's purely in userspace), and changing it
> is asking for security holes to pop up.  So you'll be mostly stuck
> with it.

We know we can't break the ABI. At most, we could deprecate item types
and introduce new ones, but we want to avoid that by all means of
course. However, I fail to see how that is related to send time
metadata, or even to kdbus in general, as all ABIs have to be kept stable.

> Do you have some simple benchmark code you can share?  I'd like to
> play with it a bit.

Sure, it's part of the self-test suite. Call it with "-t benchmark" to
run the benchmark as isolated test with verbose output. The code for
that lives in test-benchmark.c.


Thanks,
Daniel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-03 Thread Eric W. Biederman
Greg Kroah-Hartman gre...@linuxfoundation.org writes:

 On Tue, Feb 03, 2015 at 08:47:51PM -0600, Eric W. Biederman wrote:
 Andy Lutomirski l...@amacapital.net writes:
 
  On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack dan...@zonque.org wrote:
  Hi Andy,
 
  On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
  On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote:
 
  That's right, but again - if an application wants to gather this kind of
  information about tasks it interacts with, it can do so today by looking
  at /proc or similar sources. Desktop machines do exactly that already,
  and the kernel code executed in such cases very much resembles that in
  metadata.c, and is certainly not cheaper. kdbus just makes such
  information more accessible when requested. Which information is
  collected is defined by bit-masks on both the sender and the receiver
  connection, and most applications will effectively only use a very
  limited set by default if they go through one of the more high-level
  libraries.
 
  I should rephrase a bit.  Kdbus doesn't require use of send-time
  metadata.  It does, however, strongly encourage it, and it sounds like
 
  On the kernel level, kdbus just *offers* that, just like sockets offer
  SO_PASSCRED. On the userland level, kdbus helps applications get that
  information race-free, easier and faster than they would otherwise.
 
  systemd and other major users will use send-time metadata.  Once that
  happens, it's ABI (even if it's purely in userspace), and changing it
  is asking for security holes to pop up.  So you'll be mostly stuck
  with it.
 
  We know we can't break the ABI. At most, we could deprecate item types
  and introduce new ones, but we want to avoid that by all means of
  course. However, I fail to see how that is related to send time
  metadata, or even to kdbus in general, as all ABIs have to be kept stable.
 
  I should have said it differently.  ABI is the wrong term -- it's more
  of a protocol issue.
 
  It looks like, with the current code, the kernel will provide
  (optional) send-time metadata, and the sd-bus library will use it.
  The result will be that the communication protocol between clients and
  udev, systemd, systemd-logind, g-s-d, etc, will likely involve
  send-time metadata.  This may end up being a bottleneck.
 
 A quick note on a couple of things I have seen in this conversation.
 
 - The reason for kdbus is performance.

 No, that's not the only reason for kdbus, don't focus only on this.  I
 set out a long list of things for why we created kdbus, speed was only
 one of the things.  Security is also one, and the ability to gather
 these attributes in an atomic and secure way is very important as
 userspace wants this.

Perhaps I should have said the predominant reason.  Certainly that seems
to be most of what I have seen talked about.

Regardless looking at the performance in the design and removing any
substantial obstacle to making things go fast.

Further.  I had this conversation earlier in an earlier round of the
review and I was told that in fact existing dbus applications do not
want or need these attributes.   I think I heard journald wants them for
pretty printing things.

If security is your concern I really think per message attributes
collected and sent when a message is sent is a bad idea.  It has been a
nasty anti-pattern in the kernel code.  Lots and lots of meta-data
copyed from a task and sent to someone else has significant performance,
maintenance, and security impacts.

Code written in that pattern is complex and hard to analyze, and hard to
think about.  Consider debugging why a message does not get the expected
treatment from your suid application because someone changed the euid
over that particular call and had not thought about it's consequences.
Frankly I have been there and done that and it is a mess. 

So no I do not think breaking encapsulation and having weird side
effects affecting your new primitive will have any security benefits
whatsover.   It will just result in brittle complex code.

If you want to avoid the races causing sends through a file descriptor
to fail that don't have the expected attributes (my constructive
suggestion earlier) is a very different thing from a performance and
mainteance standpoint.  That does not increase the code complexity
nearly as much in the implementation or in use, and unexpected failures
happen right away.

 - pipes rather than unix domain sockets are likely the standard to meet.
   If you can't equal unix domain sockets for simple things you are
   likely leaving a lot of stops in.  Last I looked pipes in general were
   notiably faster than unix domain sockets.
 
   The performance numbers I saw posted up-thread were horrible.  I have
   seen faster numbers across a network of machines.  If your ping-pong
   latency isn't measured in nano-seconds you are probably doing
   something wrong.

 It all depends on what you are passing on that ping-pong, a real
 

Re: [PATCH 01/13] kdbus: add documentation

2015-02-03 Thread Eric W. Biederman
Andy Lutomirski l...@amacapital.net writes:

 On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack dan...@zonque.org wrote:
 Hi Andy,

 On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
 On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote:

 That's right, but again - if an application wants to gather this kind of
 information about tasks it interacts with, it can do so today by looking
 at /proc or similar sources. Desktop machines do exactly that already,
 and the kernel code executed in such cases very much resembles that in
 metadata.c, and is certainly not cheaper. kdbus just makes such
 information more accessible when requested. Which information is
 collected is defined by bit-masks on both the sender and the receiver
 connection, and most applications will effectively only use a very
 limited set by default if they go through one of the more high-level
 libraries.

 I should rephrase a bit.  Kdbus doesn't require use of send-time
 metadata.  It does, however, strongly encourage it, and it sounds like

 On the kernel level, kdbus just *offers* that, just like sockets offer
 SO_PASSCRED. On the userland level, kdbus helps applications get that
 information race-free, easier and faster than they would otherwise.

 systemd and other major users will use send-time metadata.  Once that
 happens, it's ABI (even if it's purely in userspace), and changing it
 is asking for security holes to pop up.  So you'll be mostly stuck
 with it.

 We know we can't break the ABI. At most, we could deprecate item types
 and introduce new ones, but we want to avoid that by all means of
 course. However, I fail to see how that is related to send time
 metadata, or even to kdbus in general, as all ABIs have to be kept stable.

 I should have said it differently.  ABI is the wrong term -- it's more
 of a protocol issue.

 It looks like, with the current code, the kernel will provide
 (optional) send-time metadata, and the sd-bus library will use it.
 The result will be that the communication protocol between clients and
 udev, systemd, systemd-logind, g-s-d, etc, will likely involve
 send-time metadata.  This may end up being a bottleneck.

A quick note on a couple of things I have seen in this conversation.

- The reason for kdbus is performance.

- pipes rather than unix domain sockets are likely the standard to meet.
  If you can't equal unix domain sockets for simple things you are
  likely leaving a lot of stops in.  Last I looked pipes in general were
  notiably faster than unix domain sockets.

  The performance numbers I saw posted up-thread were horrible.  I have
  seen faster numbers across a network of machines.  If your ping-pong
  latency isn't measured in nano-seconds you are probably doing
  something wrong.

- syscalls remove overhead.  So since performance is kdbus's reason for 
existence
  let's remove some ridiculous stops, and get a fast path into the kernel.

- send-time metadata is a performance nightmare.  SO_PASSCRED is hard
  to implement in a fast performant way, especially when namespaces
  get involved.  Over the long term if you use send-time metadata
  you will grow the kind of compatibility hacks that the user
  namespace and the pid namespace have on SO_PASSCRED and things will
  slow down.

  A similar effect that is more performant in general is to enforce that
  the sender has the expected attributes.

 Once this happens, changing the protocol will be very hard without
 introducing security bugs.  If people start switching to
 connection-time metadata to gain performance, then they'll break both
 the communication protocol and the expectations of client code.  (In
 fact, it'll break twice, sort of, since I think that the current
 protocols are connect-time.)

 To me, this seems like a down-side of using send-time metadata, albeit
 possibly not a huge downside at least in the near term.  I don't see a
 corresponding benefit, though.

I think send-time metadata verification is less bad in this regard.

Eric
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-03 Thread Greg Kroah-Hartman
On Tue, Feb 03, 2015 at 08:47:51PM -0600, Eric W. Biederman wrote:
 Andy Lutomirski l...@amacapital.net writes:
 
  On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack dan...@zonque.org wrote:
  Hi Andy,
 
  On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
  On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote:
 
  That's right, but again - if an application wants to gather this kind of
  information about tasks it interacts with, it can do so today by looking
  at /proc or similar sources. Desktop machines do exactly that already,
  and the kernel code executed in such cases very much resembles that in
  metadata.c, and is certainly not cheaper. kdbus just makes such
  information more accessible when requested. Which information is
  collected is defined by bit-masks on both the sender and the receiver
  connection, and most applications will effectively only use a very
  limited set by default if they go through one of the more high-level
  libraries.
 
  I should rephrase a bit.  Kdbus doesn't require use of send-time
  metadata.  It does, however, strongly encourage it, and it sounds like
 
  On the kernel level, kdbus just *offers* that, just like sockets offer
  SO_PASSCRED. On the userland level, kdbus helps applications get that
  information race-free, easier and faster than they would otherwise.
 
  systemd and other major users will use send-time metadata.  Once that
  happens, it's ABI (even if it's purely in userspace), and changing it
  is asking for security holes to pop up.  So you'll be mostly stuck
  with it.
 
  We know we can't break the ABI. At most, we could deprecate item types
  and introduce new ones, but we want to avoid that by all means of
  course. However, I fail to see how that is related to send time
  metadata, or even to kdbus in general, as all ABIs have to be kept stable.
 
  I should have said it differently.  ABI is the wrong term -- it's more
  of a protocol issue.
 
  It looks like, with the current code, the kernel will provide
  (optional) send-time metadata, and the sd-bus library will use it.
  The result will be that the communication protocol between clients and
  udev, systemd, systemd-logind, g-s-d, etc, will likely involve
  send-time metadata.  This may end up being a bottleneck.
 
 A quick note on a couple of things I have seen in this conversation.
 
 - The reason for kdbus is performance.

No, that's not the only reason for kdbus, don't focus only on this.  I
set out a long list of things for why we created kdbus, speed was only
one of the things.  Security is also one, and the ability to gather
these attributes in an atomic and secure way is very important as
userspace wants this.

 - pipes rather than unix domain sockets are likely the standard to meet.
   If you can't equal unix domain sockets for simple things you are
   likely leaving a lot of stops in.  Last I looked pipes in general were
   notiably faster than unix domain sockets.
 
   The performance numbers I saw posted up-thread were horrible.  I have
   seen faster numbers across a network of machines.  If your ping-pong
   latency isn't measured in nano-seconds you are probably doing
   something wrong.

It all depends on what you are passing on that ping-pong, a real
D-Bus connection has real data and meta data that has to be sent.
Trying to make a fake benchmark number isn't going to show anything.

 - syscalls remove overhead.  So since performance is kdbus's reason for 
 existence
   let's remove some ridiculous stops, and get a fast path into the kernel.

Again, not the only reason, see my first post in this thread for
details.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-03 Thread Andy Lutomirski
On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack dan...@zonque.org wrote:
 Hi Andy,

 On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
 On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote:

 That's right, but again - if an application wants to gather this kind of
 information about tasks it interacts with, it can do so today by looking
 at /proc or similar sources. Desktop machines do exactly that already,
 and the kernel code executed in such cases very much resembles that in
 metadata.c, and is certainly not cheaper. kdbus just makes such
 information more accessible when requested. Which information is
 collected is defined by bit-masks on both the sender and the receiver
 connection, and most applications will effectively only use a very
 limited set by default if they go through one of the more high-level
 libraries.

 I should rephrase a bit.  Kdbus doesn't require use of send-time
 metadata.  It does, however, strongly encourage it, and it sounds like

 On the kernel level, kdbus just *offers* that, just like sockets offer
 SO_PASSCRED. On the userland level, kdbus helps applications get that
 information race-free, easier and faster than they would otherwise.

 systemd and other major users will use send-time metadata.  Once that
 happens, it's ABI (even if it's purely in userspace), and changing it
 is asking for security holes to pop up.  So you'll be mostly stuck
 with it.

 We know we can't break the ABI. At most, we could deprecate item types
 and introduce new ones, but we want to avoid that by all means of
 course. However, I fail to see how that is related to send time
 metadata, or even to kdbus in general, as all ABIs have to be kept stable.

I should have said it differently.  ABI is the wrong term -- it's more
of a protocol issue.

It looks like, with the current code, the kernel will provide
(optional) send-time metadata, and the sd-bus library will use it.
The result will be that the communication protocol between clients and
udev, systemd, systemd-logind, g-s-d, etc, will likely involve
send-time metadata.  This may end up being a bottleneck.

Once this happens, changing the protocol will be very hard without
introducing security bugs.  If people start switching to
connection-time metadata to gain performance, then they'll break both
the communication protocol and the expectations of client code.  (In
fact, it'll break twice, sort of, since I think that the current
protocols are connect-time.)

To me, this seems like a down-side of using send-time metadata, albeit
possibly not a huge downside at least in the near term.  I don't see a
corresponding benefit, though.


 Do you have some simple benchmark code you can share?  I'd like to
 play with it a bit.

 Sure, it's part of the self-test suite. Call it with -t benchmark to
 run the benchmark as isolated test with verbose output. The code for
 that lives in test-benchmark.c.


I'll try to play with this soon.  Thanks.

--Andy


 Thanks,
 Daniel





-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-03 Thread Daniel Mack
Hi Andy,

On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
 On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote:

 That's right, but again - if an application wants to gather this kind of
 information about tasks it interacts with, it can do so today by looking
 at /proc or similar sources. Desktop machines do exactly that already,
 and the kernel code executed in such cases very much resembles that in
 metadata.c, and is certainly not cheaper. kdbus just makes such
 information more accessible when requested. Which information is
 collected is defined by bit-masks on both the sender and the receiver
 connection, and most applications will effectively only use a very
 limited set by default if they go through one of the more high-level
 libraries.
 
 I should rephrase a bit.  Kdbus doesn't require use of send-time
 metadata.  It does, however, strongly encourage it, and it sounds like

On the kernel level, kdbus just *offers* that, just like sockets offer
SO_PASSCRED. On the userland level, kdbus helps applications get that
information race-free, easier and faster than they would otherwise.

 systemd and other major users will use send-time metadata.  Once that
 happens, it's ABI (even if it's purely in userspace), and changing it
 is asking for security holes to pop up.  So you'll be mostly stuck
 with it.

We know we can't break the ABI. At most, we could deprecate item types
and introduce new ones, but we want to avoid that by all means of
course. However, I fail to see how that is related to send time
metadata, or even to kdbus in general, as all ABIs have to be kept stable.

 Do you have some simple benchmark code you can share?  I'd like to
 play with it a bit.

Sure, it's part of the self-test suite. Call it with -t benchmark to
run the benchmark as isolated test with verbose output. The code for
that lives in test-benchmark.c.


Thanks,
Daniel


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-02 Thread Andy Lutomirski
On Feb 2, 2015 1:34 AM, "Daniel Mack"  wrote:
>
> Hi Andy,
>
> On 01/29/2015 01:09 PM, Andy Lutomirski wrote:
> > On Jan 29, 2015 6:42 AM, "Daniel Mack"  wrote:
>
> >> As we explained before, currently, D-Bus peers do collect the same
> >> information already if they need to have them, but they have to do deal
> >> with the inherit races in such cases. kdbus is closing the gap by
> >> optionally providing the same information along with each message, if
> >> requested.
> >
> > In all these discussions, no one ever gave a decent example use case.
> > If a process drops some privilege, it must close all fds it has that
> > captured its old privilege.  This has nothing to do with kdbus.
>
> kdbus does not implement any new concept here but sticks to what
> SCM_CREDENTIALS does on SOL_SEQPACKET. An application can get a
> file-descriptor from socket() or socketpair() and freely pass it around
> between different tasks or threads, but messages will always have the
> credentials attached that are valid at *send* time. SO_PEERCREDS,
> however, still reports the connect-time credentials, and kdbus provides
> exactly the same semantics and both ways of retrieving information.
>
> > I agree that the design seems to have improved to a state of being at
> > least decent,
>
> One reason for that is your feedback. Thanks for that again!
>
> > It's an optional feature that will get used, non-optionally, thousands
> > of times on each boot, apparently.  Keep in mind that it's also a
> > scalability problem because it takes locks.  If it ever gets used
> > thousands of times per CPU on a big thousand-core machine, it's going
> > to suck, and you'll have backed yourself into a corner.
>
> That's right, but again - if an application wants to gather this kind of
> information about tasks it interacts with, it can do so today by looking
> at /proc or similar sources. Desktop machines do exactly that already,
> and the kernel code executed in such cases very much resembles that in
> metadata.c, and is certainly not cheaper. kdbus just makes such
> information more accessible when requested. Which information is
> collected is defined by bit-masks on both the sender and the receiver
> connection, and most applications will effectively only use a very
> limited set by default if they go through one of the more high-level
> libraries.

I should rephrase a bit.  Kdbus doesn't require use of send-time
metadata.  It does, however, strongly encourage it, and it sounds like
systemd and other major users will use send-time metadata.  Once that
happens, it's ABI (even if it's purely in userspace), and changing it
is asking for security holes to pop up.  So you'll be mostly stuck
with it.

>
> Also, when metadata is collected, the code mostly takes temporary
> references on objects like PIDs, namespaces etc. Which operation would
> you consider particularly expensive?

The refcounting, copies of some of the data, and counting bytes and
allocating space.  The refcounting is the part that will scale
particularly badly on many CPUs.

Do you have some simple benchmark code you can share?  I'd like to
play with it a bit.

--Andy

>
>
> Thanks again,
> Daniel
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-02 Thread Daniel Mack
Hi Andy,

On 01/29/2015 01:09 PM, Andy Lutomirski wrote:
> On Jan 29, 2015 6:42 AM, "Daniel Mack"  wrote:

>> As we explained before, currently, D-Bus peers do collect the same
>> information already if they need to have them, but they have to do deal
>> with the inherit races in such cases. kdbus is closing the gap by
>> optionally providing the same information along with each message, if
>> requested.
> 
> In all these discussions, no one ever gave a decent example use case.
> If a process drops some privilege, it must close all fds it has that
> captured its old privilege.  This has nothing to do with kdbus.

kdbus does not implement any new concept here but sticks to what
SCM_CREDENTIALS does on SOL_SEQPACKET. An application can get a
file-descriptor from socket() or socketpair() and freely pass it around
between different tasks or threads, but messages will always have the
credentials attached that are valid at *send* time. SO_PEERCREDS,
however, still reports the connect-time credentials, and kdbus provides
exactly the same semantics and both ways of retrieving information.

> I agree that the design seems to have improved to a state of being at
> least decent,

One reason for that is your feedback. Thanks for that again!

> It's an optional feature that will get used, non-optionally, thousands
> of times on each boot, apparently.  Keep in mind that it's also a
> scalability problem because it takes locks.  If it ever gets used
> thousands of times per CPU on a big thousand-core machine, it's going
> to suck, and you'll have backed yourself into a corner.

That's right, but again - if an application wants to gather this kind of
information about tasks it interacts with, it can do so today by looking
at /proc or similar sources. Desktop machines do exactly that already,
and the kernel code executed in such cases very much resembles that in
metadata.c, and is certainly not cheaper. kdbus just makes such
information more accessible when requested. Which information is
collected is defined by bit-masks on both the sender and the receiver
connection, and most applications will effectively only use a very
limited set by default if they go through one of the more high-level
libraries.

Also, when metadata is collected, the code mostly takes temporary
references on objects like PIDs, namespaces etc. Which operation would
you consider particularly expensive?


Thanks again,
Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-02 Thread Andy Lutomirski
On Feb 2, 2015 1:34 AM, Daniel Mack dan...@zonque.org wrote:

 Hi Andy,

 On 01/29/2015 01:09 PM, Andy Lutomirski wrote:
  On Jan 29, 2015 6:42 AM, Daniel Mack dan...@zonque.org wrote:

  As we explained before, currently, D-Bus peers do collect the same
  information already if they need to have them, but they have to do deal
  with the inherit races in such cases. kdbus is closing the gap by
  optionally providing the same information along with each message, if
  requested.
 
  In all these discussions, no one ever gave a decent example use case.
  If a process drops some privilege, it must close all fds it has that
  captured its old privilege.  This has nothing to do with kdbus.

 kdbus does not implement any new concept here but sticks to what
 SCM_CREDENTIALS does on SOL_SEQPACKET. An application can get a
 file-descriptor from socket() or socketpair() and freely pass it around
 between different tasks or threads, but messages will always have the
 credentials attached that are valid at *send* time. SO_PEERCREDS,
 however, still reports the connect-time credentials, and kdbus provides
 exactly the same semantics and both ways of retrieving information.

  I agree that the design seems to have improved to a state of being at
  least decent,

 One reason for that is your feedback. Thanks for that again!

  It's an optional feature that will get used, non-optionally, thousands
  of times on each boot, apparently.  Keep in mind that it's also a
  scalability problem because it takes locks.  If it ever gets used
  thousands of times per CPU on a big thousand-core machine, it's going
  to suck, and you'll have backed yourself into a corner.

 That's right, but again - if an application wants to gather this kind of
 information about tasks it interacts with, it can do so today by looking
 at /proc or similar sources. Desktop machines do exactly that already,
 and the kernel code executed in such cases very much resembles that in
 metadata.c, and is certainly not cheaper. kdbus just makes such
 information more accessible when requested. Which information is
 collected is defined by bit-masks on both the sender and the receiver
 connection, and most applications will effectively only use a very
 limited set by default if they go through one of the more high-level
 libraries.

I should rephrase a bit.  Kdbus doesn't require use of send-time
metadata.  It does, however, strongly encourage it, and it sounds like
systemd and other major users will use send-time metadata.  Once that
happens, it's ABI (even if it's purely in userspace), and changing it
is asking for security holes to pop up.  So you'll be mostly stuck
with it.


 Also, when metadata is collected, the code mostly takes temporary
 references on objects like PIDs, namespaces etc. Which operation would
 you consider particularly expensive?

The refcounting, copies of some of the data, and counting bytes and
allocating space.  The refcounting is the part that will scale
particularly badly on many CPUs.

Do you have some simple benchmark code you can share?  I'd like to
play with it a bit.

--Andy



 Thanks again,
 Daniel

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-02-02 Thread Daniel Mack
Hi Andy,

On 01/29/2015 01:09 PM, Andy Lutomirski wrote:
 On Jan 29, 2015 6:42 AM, Daniel Mack dan...@zonque.org wrote:

 As we explained before, currently, D-Bus peers do collect the same
 information already if they need to have them, but they have to do deal
 with the inherit races in such cases. kdbus is closing the gap by
 optionally providing the same information along with each message, if
 requested.
 
 In all these discussions, no one ever gave a decent example use case.
 If a process drops some privilege, it must close all fds it has that
 captured its old privilege.  This has nothing to do with kdbus.

kdbus does not implement any new concept here but sticks to what
SCM_CREDENTIALS does on SOL_SEQPACKET. An application can get a
file-descriptor from socket() or socketpair() and freely pass it around
between different tasks or threads, but messages will always have the
credentials attached that are valid at *send* time. SO_PEERCREDS,
however, still reports the connect-time credentials, and kdbus provides
exactly the same semantics and both ways of retrieving information.

 I agree that the design seems to have improved to a state of being at
 least decent,

One reason for that is your feedback. Thanks for that again!

 It's an optional feature that will get used, non-optionally, thousands
 of times on each boot, apparently.  Keep in mind that it's also a
 scalability problem because it takes locks.  If it ever gets used
 thousands of times per CPU on a big thousand-core machine, it's going
 to suck, and you'll have backed yourself into a corner.

That's right, but again - if an application wants to gather this kind of
information about tasks it interacts with, it can do so today by looking
at /proc or similar sources. Desktop machines do exactly that already,
and the kernel code executed in such cases very much resembles that in
metadata.c, and is certainly not cheaper. kdbus just makes such
information more accessible when requested. Which information is
collected is defined by bit-masks on both the sender and the receiver
connection, and most applications will effectively only use a very
limited set by default if they go through one of the more high-level
libraries.

Also, when metadata is collected, the code mostly takes temporary
references on objects like PIDs, namespaces etc. Which operation would
you consider particularly expensive?


Thanks again,
Daniel

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-29 Thread Andy Lutomirski
On Jan 29, 2015 6:42 AM, "Daniel Mack"  wrote:
>
> On 01/29/2015 12:25 PM, Andy Lutomirski wrote:
> > On Jan 29, 2015 3:53 AM, "Daniel Mack"  wrote:
>
> >> Also note that if a receiving peer opts in for a certain piece of
> >> metadata, it should do that that for a good reason, because it needs
> >> that data to process a request. Letting kdbus do the work of providing
> >> such information is still a lot faster than having the receiving peer
> >> gather it itself, as that would involve more syscalls and more context
> >> switches (let alone the fact that doing so is inherently racy, as
> >> explained in earlier threads).
>
> > Given that I see almost no advantage to send-time metadata, and I see
> > three disadvantages (slower, inconsistent with the basic POSIX model,
> > and inconsistent with existing user-space dbus), I still don't see why
> > you designed it this way.
>
> Because effective information about tasks may change over time, and
> D-Bus is a connection-less protocol that has no notion of peer-to-peer
> connections.
>
> As we explained before, currently, D-Bus peers do collect the same
> information already if they need to have them, but they have to do deal
> with the inherit races in such cases. kdbus is closing the gap by
> optionally providing the same information along with each message, if
> requested.

In all these discussions, no one ever gave a decent example use case.
If a process drops some privilege, it must close all fds it has that
captured its old privilege.  This has nothing to do with kdbus.  With
kdbus, you still need to close and reopen your kdbus fd, unless you've
disabled that bit of metadata, so using send-time metadata hasn't
bought you benefit that I can see.

I agree that the design seems to have improved to a state of being at
least decent, but that doesn't mean that using send-time metadata is a
good idea for systemd or for anything else.

>
> > There's an added disadvantage of the current design: if a kdbus user
> > is communicating with a traditional d-bus user using the proxy, then
> > IIUC the credentials at the time of connection get used.
>
> That's not quite true any more. After our discussion in v2, we agreed on
> dropping this detail. If you're using the proxy, no metadata is attached
> to messages any more. Userspace has to gather this information in the
> traditional, racy way in such cases. You are right - metadata about the
> proxy task is of no interest here, and hence dropping the information
> altogether is the most consistent thing we can do.
>
> But again - that metadata thing just an optional feature. People
> developing with the bare kernel-level API are free to ignore all that
> and just just kdbus as low-level protocol for reliable multicast. Note
> that in such cases, you would still be able to retrieve the connect-time
> metadata if that's needed.
>

It's an optional feature that will get used, non-optionally, thousands
of times on each boot, apparently.  Keep in mind that it's also a
scalability problem because it takes locks.  If it ever gets used
thousands of times per CPU on a big thousand-core machine, it's going
to suck, and you'll have backed yourself into a corner.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-29 Thread Daniel Mack
On 01/29/2015 12:25 PM, Andy Lutomirski wrote:
> On Jan 29, 2015 3:53 AM, "Daniel Mack"  wrote:

>> Also note that if a receiving peer opts in for a certain piece of
>> metadata, it should do that that for a good reason, because it needs
>> that data to process a request. Letting kdbus do the work of providing
>> such information is still a lot faster than having the receiving peer
>> gather it itself, as that would involve more syscalls and more context
>> switches (let alone the fact that doing so is inherently racy, as
>> explained in earlier threads).

> Given that I see almost no advantage to send-time metadata, and I see
> three disadvantages (slower, inconsistent with the basic POSIX model,
> and inconsistent with existing user-space dbus), I still don't see why
> you designed it this way.

Because effective information about tasks may change over time, and
D-Bus is a connection-less protocol that has no notion of peer-to-peer
connections.

As we explained before, currently, D-Bus peers do collect the same
information already if they need to have them, but they have to do deal
with the inherit races in such cases. kdbus is closing the gap by
optionally providing the same information along with each message, if
requested.

> There's an added disadvantage of the current design: if a kdbus user
> is communicating with a traditional d-bus user using the proxy, then
> IIUC the credentials at the time of connection get used.

That's not quite true any more. After our discussion in v2, we agreed on
dropping this detail. If you're using the proxy, no metadata is attached
to messages any more. Userspace has to gather this information in the
traditional, racy way in such cases. You are right - metadata about the
proxy task is of no interest here, and hence dropping the information
altogether is the most consistent thing we can do.

But again - that metadata thing just an optional feature. People
developing with the bare kernel-level API are free to ignore all that
and just just kdbus as low-level protocol for reliable multicast. Note
that in such cases, you would still be able to retrieve the connect-time
metadata if that's needed.


Thanks,
Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-29 Thread Andy Lutomirski
On Jan 29, 2015 3:53 AM, "Daniel Mack"  wrote:
>
> Hi Andy,
>
> On 01/27/2015 05:03 PM, Andy Lutomirski wrote:
> > On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann  
> > wrote:
>
> >> A 16byte copy does not affect the performance of kdbus message
> >> transactions in any way that matters.
>
> > What are the performance goals of kdbus?  How fast is it ever intended
> > to be?
>
> One of the design goals of kdbus is to speed up a typical D-Bus message
> turnaround. That is, to minimize the number of context switches it
> currently takes to get a message across the bus, and to avoid
> unnecessary extra payload copies.
>
> Even though I'm sure there's still room for future improvement, the
> benchmark test we provided in the kernel self-tests shows that basic
> data transmission performance is roughly comparable to that of UDS for
> smaller payloads. For payloads of bigger sizes (>128kb), kdbus is
> actually faster due to its zero-copy mechanism.
>
> > The reason I ask is that, in the current design, kdbus
> > collects "metadata" (credentials and other identifying information,
> > collected in kdbus_meta_proc_collect) from the sender of every message
> > *at send time*. [1]  This is slow, and it will always be slow.  The
> > slowness of this operation will, in my personal system performance
> > crystal ball, overshadow the cost of a 16 byte copy by several orders
> > of magnitude.
>
> That's certainly true, but that's not a contradiction to the performance
> argument. Please keep in mind that if a receiving peer does not request
> any metadata, the kernel doesn't collect and attach any. We know that
> gathering of some of the metadata comes at a price, which is why we
> split up the information is such fine-grained pieces.
>
> Also note that if a receiving peer opts in for a certain piece of
> metadata, it should do that that for a good reason, because it needs
> that data to process a request. Letting kdbus do the work of providing
> such information is still a lot faster than having the receiving peer
> gather it itself, as that would involve more syscalls and more context
> switches (let alone the fact that doing so is inherently racy, as
> explained in earlier threads).

All this is true, but if you used connect-time metadata, this would be
a non-issue.

Given that I see almost no advantage to send-time metadata, and I see
three disadvantages (slower, inconsistent with the basic POSIX model,
and inconsistent with existing user-space dbus), I still don't see why
you designed it this way.

There's an added disadvantage of the current design: if a kdbus user
is communicating with a traditional d-bus user using the proxy, then
IIUC the credentials at the time of connection get used.

In summary, the current design is (a) unlike almost everything else
that uses file descriptors, (b) much slower, (c) different from
traditional d-bus, and (d) gives inconsistent behavior to new clients
depending on what server they're connecting to.

--Andy

>
> So, yes, collecting metadata can slow down message exchange, but after
> all, that's an optional feature that has to be used with sense. I'll add
> some words on that to the man-pages.
>
>
> HTH,
> Daniel
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-29 Thread Daniel Mack
Hi Andy,

On 01/27/2015 05:03 PM, Andy Lutomirski wrote:
> On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann  wrote:

>> A 16byte copy does not affect the performance of kdbus message
>> transactions in any way that matters.

> What are the performance goals of kdbus?  How fast is it ever intended
> to be?

One of the design goals of kdbus is to speed up a typical D-Bus message
turnaround. That is, to minimize the number of context switches it
currently takes to get a message across the bus, and to avoid
unnecessary extra payload copies.

Even though I'm sure there's still room for future improvement, the
benchmark test we provided in the kernel self-tests shows that basic
data transmission performance is roughly comparable to that of UDS for
smaller payloads. For payloads of bigger sizes (>128kb), kdbus is
actually faster due to its zero-copy mechanism.

> The reason I ask is that, in the current design, kdbus
> collects "metadata" (credentials and other identifying information,
> collected in kdbus_meta_proc_collect) from the sender of every message
> *at send time*. [1]  This is slow, and it will always be slow.  The
> slowness of this operation will, in my personal system performance
> crystal ball, overshadow the cost of a 16 byte copy by several orders
> of magnitude.

That's certainly true, but that's not a contradiction to the performance
argument. Please keep in mind that if a receiving peer does not request
any metadata, the kernel doesn't collect and attach any. We know that
gathering of some of the metadata comes at a price, which is why we
split up the information is such fine-grained pieces.

Also note that if a receiving peer opts in for a certain piece of
metadata, it should do that that for a good reason, because it needs
that data to process a request. Letting kdbus do the work of providing
such information is still a lot faster than having the receiving peer
gather it itself, as that would involve more syscalls and more context
switches (let alone the fact that doing so is inherently racy, as
explained in earlier threads).

So, yes, collecting metadata can slow down message exchange, but after
all, that's an optional feature that has to be used with sense. I'll add
some words on that to the man-pages.


HTH,
Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-29 Thread Daniel Mack
Hi Andy,

On 01/27/2015 05:03 PM, Andy Lutomirski wrote:
 On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann dh.herrm...@gmail.com wrote:

 A 16byte copy does not affect the performance of kdbus message
 transactions in any way that matters.

 What are the performance goals of kdbus?  How fast is it ever intended
 to be?

One of the design goals of kdbus is to speed up a typical D-Bus message
turnaround. That is, to minimize the number of context switches it
currently takes to get a message across the bus, and to avoid
unnecessary extra payload copies.

Even though I'm sure there's still room for future improvement, the
benchmark test we provided in the kernel self-tests shows that basic
data transmission performance is roughly comparable to that of UDS for
smaller payloads. For payloads of bigger sizes (128kb), kdbus is
actually faster due to its zero-copy mechanism.

 The reason I ask is that, in the current design, kdbus
 collects metadata (credentials and other identifying information,
 collected in kdbus_meta_proc_collect) from the sender of every message
 *at send time*. [1]  This is slow, and it will always be slow.  The
 slowness of this operation will, in my personal system performance
 crystal ball, overshadow the cost of a 16 byte copy by several orders
 of magnitude.

That's certainly true, but that's not a contradiction to the performance
argument. Please keep in mind that if a receiving peer does not request
any metadata, the kernel doesn't collect and attach any. We know that
gathering of some of the metadata comes at a price, which is why we
split up the information is such fine-grained pieces.

Also note that if a receiving peer opts in for a certain piece of
metadata, it should do that that for a good reason, because it needs
that data to process a request. Letting kdbus do the work of providing
such information is still a lot faster than having the receiving peer
gather it itself, as that would involve more syscalls and more context
switches (let alone the fact that doing so is inherently racy, as
explained in earlier threads).

So, yes, collecting metadata can slow down message exchange, but after
all, that's an optional feature that has to be used with sense. I'll add
some words on that to the man-pages.


HTH,
Daniel

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-29 Thread Daniel Mack
On 01/29/2015 12:25 PM, Andy Lutomirski wrote:
 On Jan 29, 2015 3:53 AM, Daniel Mack dan...@zonque.org wrote:

 Also note that if a receiving peer opts in for a certain piece of
 metadata, it should do that that for a good reason, because it needs
 that data to process a request. Letting kdbus do the work of providing
 such information is still a lot faster than having the receiving peer
 gather it itself, as that would involve more syscalls and more context
 switches (let alone the fact that doing so is inherently racy, as
 explained in earlier threads).

 Given that I see almost no advantage to send-time metadata, and I see
 three disadvantages (slower, inconsistent with the basic POSIX model,
 and inconsistent with existing user-space dbus), I still don't see why
 you designed it this way.

Because effective information about tasks may change over time, and
D-Bus is a connection-less protocol that has no notion of peer-to-peer
connections.

As we explained before, currently, D-Bus peers do collect the same
information already if they need to have them, but they have to do deal
with the inherit races in such cases. kdbus is closing the gap by
optionally providing the same information along with each message, if
requested.

 There's an added disadvantage of the current design: if a kdbus user
 is communicating with a traditional d-bus user using the proxy, then
 IIUC the credentials at the time of connection get used.

That's not quite true any more. After our discussion in v2, we agreed on
dropping this detail. If you're using the proxy, no metadata is attached
to messages any more. Userspace has to gather this information in the
traditional, racy way in such cases. You are right - metadata about the
proxy task is of no interest here, and hence dropping the information
altogether is the most consistent thing we can do.

But again - that metadata thing just an optional feature. People
developing with the bare kernel-level API are free to ignore all that
and just just kdbus as low-level protocol for reliable multicast. Note
that in such cases, you would still be able to retrieve the connect-time
metadata if that's needed.


Thanks,
Daniel

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-29 Thread Andy Lutomirski
On Jan 29, 2015 6:42 AM, Daniel Mack dan...@zonque.org wrote:

 On 01/29/2015 12:25 PM, Andy Lutomirski wrote:
  On Jan 29, 2015 3:53 AM, Daniel Mack dan...@zonque.org wrote:

  Also note that if a receiving peer opts in for a certain piece of
  metadata, it should do that that for a good reason, because it needs
  that data to process a request. Letting kdbus do the work of providing
  such information is still a lot faster than having the receiving peer
  gather it itself, as that would involve more syscalls and more context
  switches (let alone the fact that doing so is inherently racy, as
  explained in earlier threads).

  Given that I see almost no advantage to send-time metadata, and I see
  three disadvantages (slower, inconsistent with the basic POSIX model,
  and inconsistent with existing user-space dbus), I still don't see why
  you designed it this way.

 Because effective information about tasks may change over time, and
 D-Bus is a connection-less protocol that has no notion of peer-to-peer
 connections.

 As we explained before, currently, D-Bus peers do collect the same
 information already if they need to have them, but they have to do deal
 with the inherit races in such cases. kdbus is closing the gap by
 optionally providing the same information along with each message, if
 requested.

In all these discussions, no one ever gave a decent example use case.
If a process drops some privilege, it must close all fds it has that
captured its old privilege.  This has nothing to do with kdbus.  With
kdbus, you still need to close and reopen your kdbus fd, unless you've
disabled that bit of metadata, so using send-time metadata hasn't
bought you benefit that I can see.

I agree that the design seems to have improved to a state of being at
least decent, but that doesn't mean that using send-time metadata is a
good idea for systemd or for anything else.


  There's an added disadvantage of the current design: if a kdbus user
  is communicating with a traditional d-bus user using the proxy, then
  IIUC the credentials at the time of connection get used.

 That's not quite true any more. After our discussion in v2, we agreed on
 dropping this detail. If you're using the proxy, no metadata is attached
 to messages any more. Userspace has to gather this information in the
 traditional, racy way in such cases. You are right - metadata about the
 proxy task is of no interest here, and hence dropping the information
 altogether is the most consistent thing we can do.

 But again - that metadata thing just an optional feature. People
 developing with the bare kernel-level API are free to ignore all that
 and just just kdbus as low-level protocol for reliable multicast. Note
 that in such cases, you would still be able to retrieve the connect-time
 metadata if that's needed.


It's an optional feature that will get used, non-optionally, thousands
of times on each boot, apparently.  Keep in mind that it's also a
scalability problem because it takes locks.  If it ever gets used
thousands of times per CPU on a big thousand-core machine, it's going
to suck, and you'll have backed yourself into a corner.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-29 Thread Andy Lutomirski
On Jan 29, 2015 3:53 AM, Daniel Mack dan...@zonque.org wrote:

 Hi Andy,

 On 01/27/2015 05:03 PM, Andy Lutomirski wrote:
  On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann dh.herrm...@gmail.com 
  wrote:

  A 16byte copy does not affect the performance of kdbus message
  transactions in any way that matters.

  What are the performance goals of kdbus?  How fast is it ever intended
  to be?

 One of the design goals of kdbus is to speed up a typical D-Bus message
 turnaround. That is, to minimize the number of context switches it
 currently takes to get a message across the bus, and to avoid
 unnecessary extra payload copies.

 Even though I'm sure there's still room for future improvement, the
 benchmark test we provided in the kernel self-tests shows that basic
 data transmission performance is roughly comparable to that of UDS for
 smaller payloads. For payloads of bigger sizes (128kb), kdbus is
 actually faster due to its zero-copy mechanism.

  The reason I ask is that, in the current design, kdbus
  collects metadata (credentials and other identifying information,
  collected in kdbus_meta_proc_collect) from the sender of every message
  *at send time*. [1]  This is slow, and it will always be slow.  The
  slowness of this operation will, in my personal system performance
  crystal ball, overshadow the cost of a 16 byte copy by several orders
  of magnitude.

 That's certainly true, but that's not a contradiction to the performance
 argument. Please keep in mind that if a receiving peer does not request
 any metadata, the kernel doesn't collect and attach any. We know that
 gathering of some of the metadata comes at a price, which is why we
 split up the information is such fine-grained pieces.

 Also note that if a receiving peer opts in for a certain piece of
 metadata, it should do that that for a good reason, because it needs
 that data to process a request. Letting kdbus do the work of providing
 such information is still a lot faster than having the receiving peer
 gather it itself, as that would involve more syscalls and more context
 switches (let alone the fact that doing so is inherently racy, as
 explained in earlier threads).

All this is true, but if you used connect-time metadata, this would be
a non-issue.

Given that I see almost no advantage to send-time metadata, and I see
three disadvantages (slower, inconsistent with the basic POSIX model,
and inconsistent with existing user-space dbus), I still don't see why
you designed it this way.

There's an added disadvantage of the current design: if a kdbus user
is communicating with a traditional d-bus user using the proxy, then
IIUC the credentials at the time of connection get used.

In summary, the current design is (a) unlike almost everything else
that uses file descriptors, (b) much slower, (c) different from
traditional d-bus, and (d) gives inconsistent behavior to new clients
depending on what server they're connecting to.

--Andy


 So, yes, collecting metadata can slow down message exchange, but after
 all, that's an optional feature that has to be used with sense. I'll add
 some words on that to the man-pages.


 HTH,
 Daniel

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-28 Thread Michael Kerrisk (man-pages)
Hello Daniel,

On 01/27/2015 07:14 PM, Daniel Mack wrote:
> Hi Michael,
> 
> On 01/27/2015 06:53 PM, Michael Kerrisk (man-pages) wrote:
>> On 01/27/2015 04:23 PM, David Herrmann wrote:
> 
>>> I only expect a handful of users to call the ioctls directly. The
>>> libraries that implement the payload-marshaling, in particular. It's a
>>> similar situation with netlink.
>>
>> Thanks, David, for the clarification. I think it would have been helpful
>> to have that more clearly stated up front, especially as some comments 
>> in this thread, such as the above, could be interpreted to mean quite 
>> the opposite. Can I suggest that some text on this point be added to 
>> kdbus.txt?
> 
> We're currently working on an a set of comprehensive man pages to
> document all the commands in the API, along with every struct, enum etc.
> We do that so that developers are able to actually understand every
> detail of the API, even though most people - as David explained - will
> not use that interface directly in the first place but let one of the
> high-level libraries help them integrate D-Bus functionality into their
> applications.

(I suggest that some text about this text go into the kdbus(7) page.)

> If you want, have a look at the upstream repository for a preliminary
> version of the new docs.

That's at https://code.google.com/p/d-bus/ , right? This looks like a 
good direction to go in. Thanks for tackling that.

I hope to take a longer look sometime soon, but a few general conventions
for man-pages that you might want to consider following:

* When listing errors, I think you should change your 
  language/formatting somewhat. Examples here from kdbus.endpoint.7:
   
  (1) The man page says

  RETURN VALUE
   On success, all mentioned ioctl commands return 0.

   Better to write this from a user-space point of view:

  RETURN VALUE
   On success, all mentioned ioctl commands return 0; on 
   error, -1 is returned, and errno is set to indicate 
   the error.

  (2) I would change the wording in the ERRORS sections from
   "may return the following errors"
   to
   "may fail with the following errors"

  (3) When listing the errors, drop the minus signs; that's not 
  what user-space sees. They see a positive value in errno.

  (4) The usual formatting convention for constants, including 
  error constants in man pages is boldface, rather than 
  underline/emphasis.

  (5) Insofar as it's possible, it would be good to make all
  pages format nicely within 80 columns. Some of the literal
  text and ASCII art could, I think, be narrowed.

Thanks,

Michael
-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-28 Thread Michael Kerrisk (man-pages)
Hello Daniel,

On 01/27/2015 07:14 PM, Daniel Mack wrote:
 Hi Michael,
 
 On 01/27/2015 06:53 PM, Michael Kerrisk (man-pages) wrote:
 On 01/27/2015 04:23 PM, David Herrmann wrote:
 
 I only expect a handful of users to call the ioctls directly. The
 libraries that implement the payload-marshaling, in particular. It's a
 similar situation with netlink.

 Thanks, David, for the clarification. I think it would have been helpful
 to have that more clearly stated up front, especially as some comments 
 in this thread, such as the above, could be interpreted to mean quite 
 the opposite. Can I suggest that some text on this point be added to 
 kdbus.txt?
 
 We're currently working on an a set of comprehensive man pages to
 document all the commands in the API, along with every struct, enum etc.
 We do that so that developers are able to actually understand every
 detail of the API, even though most people - as David explained - will
 not use that interface directly in the first place but let one of the
 high-level libraries help them integrate D-Bus functionality into their
 applications.

(I suggest that some text about this text go into the kdbus(7) page.)

 If you want, have a look at the upstream repository for a preliminary
 version of the new docs.

That's at https://code.google.com/p/d-bus/ , right? This looks like a 
good direction to go in. Thanks for tackling that.

I hope to take a longer look sometime soon, but a few general conventions
for man-pages that you might want to consider following:

* When listing errors, I think you should change your 
  language/formatting somewhat. Examples here from kdbus.endpoint.7:
   
  (1) The man page says

  RETURN VALUE
   On success, all mentioned ioctl commands return 0.

   Better to write this from a user-space point of view:

  RETURN VALUE
   On success, all mentioned ioctl commands return 0; on 
   error, -1 is returned, and errno is set to indicate 
   the error.

  (2) I would change the wording in the ERRORS sections from
   may return the following errors
   to
   may fail with the following errors

  (3) When listing the errors, drop the minus signs; that's not 
  what user-space sees. They see a positive value in errno.

  (4) The usual formatting convention for constants, including 
  error constants in man pages is boldface, rather than 
  underline/emphasis.

  (5) Insofar as it's possible, it would be good to make all
  pages format nicely within 80 columns. Some of the literal
  text and ASCII art could, I think, be narrowed.

Thanks,

Michael
-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread Daniel Mack
Hi Michael,

On 01/27/2015 06:53 PM, Michael Kerrisk (man-pages) wrote:
> On 01/27/2015 04:23 PM, David Herrmann wrote:

>> I only expect a handful of users to call the ioctls directly. The
>> libraries that implement the payload-marshaling, in particular. It's a
>> similar situation with netlink.
> 
> Thanks, David, for the clarification. I think it would have been helpful
> to have that more clearly stated up front, especially as some comments 
> in this thread, such as the above, could be interpreted to mean quite 
> the opposite. Can I suggest that some text on this point be added to 
> kdbus.txt?

We're currently working on an a set of comprehensive man pages to
document all the commands in the API, along with every struct, enum etc.
We do that so that developers are able to actually understand every
detail of the API, even though most people - as David explained - will
not use that interface directly in the first place but let one of the
high-level libraries help them integrate D-Bus functionality into their
applications.

If you want, have a look at the upstream repository for a   preliminary
version of the new docs.


Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread Michael Kerrisk (man-pages)
Hi David,

On 01/27/2015 04:05 PM, David Herrmann wrote:
> Hi
> 
> On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages)
>  wrote:
>> Hello Greg,
>>
>> On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
>>> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
 While I agree that there should be a way for userspace to get the list of
 supported operations, userspace apps will only actually care about that
 once, when they begin talking to kdbus, because (ignoring the live kernel
 patching that people have been working on recently) the list of supported
 operations isn't going to change while the system is running.  While a u64
 copy has relatively low overhead, it does have overhead, and that is very
 significant when you consider part of the reason some people want kdbus is
 for the performance gain.  Especially for those automotive applications 
 that
 have been mentioned which fire off thousands of messages during start-up,
 every little bit of performance is significant.
>>>
>>> A single u64 in a structure is not going to be measurable at all,
>>> processors just copy memory too fast these days for 4 extra bytes to be
>>> noticable.
>>
>> It depends on the definition of measurable, I suppose, but this statement
>> appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
>> about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back
>> sets of valid flags. That's 16 bytes, and it definitely makes a difference.
>> Simply running a loop that does a naive memcpy() in a tight user-space
>> loop (code below), I see the following for the execution of 1e9 loops:
>>
>> Including the two extra u64 fields: 3.2 sec
>> Without the two extra u64 fields:   2.6 sec
>>
>> On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
>> simplest syscall, giving us a rough measure of the context switch) takes
>> 68 seconds. In other words, the cost of copying those 16 bytes is about 1%
>> of the base context switch/syscall cost. I assume the costs of copying
>> those 16 bytes across the kernel-user-space boundary would not be cheaper,
>> but have not tested that. If my assumption is correct, then 1% seems a
>> significant figure to me in an API whose raison d'ĂŞtre is speed.
> 
> I have no idea how this is related to any kdbus ioctl?
> 
> A 16byte copy does not affect the performance of kdbus message
> transactions in any way that matters.

I'm not sure if it's related/significant or not, since I'm ignorant
of the performance figures for kdbus. I just got curious when Greg
stated that the cost of copying would not be noticeable. (I got curious 
also about my assumption, and did an experiment with a dummy system call
that throws bytes across the fence into user space. The cost of an
extra 16 bytes (56 to 72 bytes) is about 3% of the base syscall/context 
switch cost.)

>>> So let's make this as easy as possible for userspace, making
>>> it simpler logic there, which is much more important than saving
>>> theoretical time in the kernel.
>>
>> But this also missed the other part of the point. Copying these fields on
>> every operation, when in fact they are only needed once, clutters the API,
>> in my opinion. Good APIs are as simple as they can be to do their job.
>> Redundancy is an enemy of simplicity. Simplest would have been a one time
>> API that returns a structure containing all of the supported flags across
>> the API. Alternatively, the traditional EINVAL approach is well understood,
>> and suffices.
> 
> We're going to drop "kernel_flags" in favor of a new
> KDBUS_FLAG_NEGOTIATE flag which asks the kernel to do feature
> negotiation for this ioctl and return the supported flags/items inline
> (overwriting the passed data). The ioctl will not be executed and will
> not affect the state of the FD.
> I hope this keeps the API simple.

Not sure I quite understand the details from your description, but I assume 
the it'll end up in the doc, and I'll try to take a look later.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread Michael Kerrisk (man-pages)
On 01/27/2015 04:23 PM, David Herrmann wrote:
> Hi
> 
> On Mon, Jan 26, 2015 at 5:45 PM, Michael Kerrisk (man-pages)
>  wrote:
>> On 01/26/2015 04:26 PM, Tom Gundersen wrote:
>>> On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
>>>  wrote:
 2. Is the API to be invoked directly by applications or is intended to
be used only behind specific libraries? You seem to be saying that
the latter is the case (here, I'm referring to your comment above
about sd-bus). However, when I asked David Herrmann a similar
question I got this responser:

   "kdbus is in no way bound to systemd. There are ongoing efforts
to port glib and qt to kdbus natively. The API is pretty simple
and I don't see how a libkdbus would simplify things. In fact,
even our tests only have slim wrappers around the ioctls to
simplify error-handling in test-scenarios."

To me, that implies that users will employ the raw kernel API.
>>>
>>> The way I read this is that there will (probably) be a handful of
>>> users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt,
>>> ell, and maybe a few others. However, third-party developers will not
>>> know/care about the details of kdbus, they'll just be coding against
>>> the dbus libraries as before (might be minor changes, but they
>>> certainly won't need to know anything about the kernel API). Similarly
>>> to how userspace developers now code against their libc of choice,
>>> rather than use kernel syscalls directly.
>>
>> Thanks, Tom, for the input. I'm still confused though, since elsewhere
>> in this thread David Herrmann said in response to a question of mine:
>>
>> I think we can agree that we want it to be generically useful,
>> like other ipc mechanisms, including UDS and netlink.
>>
>> Again, that sounds to me like the vision is not "a handful of users".
>> Hopefully Greg and David can clarify.
> 
> I only expect a handful of users to call the ioctls directly. The
> libraries that implement the payload-marshaling, in particular. It's a
> similar situation with netlink.

Thanks, David, for the clarification. I think it would have been helpful
to have that more clearly stated up front, especially as some comments 
in this thread, such as the above, could be interpreted to mean quite 
the opposite. Can I suggest that some text on this point be added to 
kdbus.txt?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread Andy Lutomirski
On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann  wrote:
> Hi
>
> On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages)
>  wrote:
>> Hello Greg,
>>
>> On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
>>> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
 While I agree that there should be a way for userspace to get the list of
 supported operations, userspace apps will only actually care about that
 once, when they begin talking to kdbus, because (ignoring the live kernel
 patching that people have been working on recently) the list of supported
 operations isn't going to change while the system is running.  While a u64
 copy has relatively low overhead, it does have overhead, and that is very
 significant when you consider part of the reason some people want kdbus is
 for the performance gain.  Especially for those automotive applications 
 that
 have been mentioned which fire off thousands of messages during start-up,
 every little bit of performance is significant.
>>>
>>> A single u64 in a structure is not going to be measurable at all,
>>> processors just copy memory too fast these days for 4 extra bytes to be
>>> noticable.
>>
>> It depends on the definition of measurable, I suppose, but this statement
>> appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
>> about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back
>> sets of valid flags. That's 16 bytes, and it definitely makes a difference.
>> Simply running a loop that does a naive memcpy() in a tight user-space
>> loop (code below), I see the following for the execution of 1e9 loops:
>>
>> Including the two extra u64 fields: 3.2 sec
>> Without the two extra u64 fields:   2.6 sec
>>
>> On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
>> simplest syscall, giving us a rough measure of the context switch) takes
>> 68 seconds. In other words, the cost of copying those 16 bytes is about 1%
>> of the base context switch/syscall cost. I assume the costs of copying
>> those 16 bytes across the kernel-user-space boundary would not be cheaper,
>> but have not tested that. If my assumption is correct, then 1% seems a
>> significant figure to me in an API whose raison d'ĂŞtre is speed.
>
> I have no idea how this is related to any kdbus ioctl?
>
> A 16byte copy does not affect the performance of kdbus message
> transactions in any way that matters.
>

Sorry for jumping in so late.  Since this version of kdbus was sent,
I've been on vacation for part of the time and I had the flu for the
rest of the time.

What are the performance goals of kdbus?  How fast is it ever intended
to be?  The reason I ask is that, in the current design, kdbus
collects "metadata" (credentials and other identifying information,
collected in kdbus_meta_proc_collect) from the sender of every message
*at send time*. [1]  This is slow, and it will always be slow.  The
slowness of this operation will, in my personal system performance
crystal ball, overshadow the cost of a 16 byte copy by several orders
of magnitude.

[1] After much discussion last time around, I'm at least convinced
that the kdbus people have reasons to like the idea of capturing
metadata for each message.  I still think the design is wrong even
from a security standpoint, but right now I'm talking about
performance.  If you want the data plane to be fast, it should be
separated from the control plane as much as possible, and this design
is the opposite.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread David Herrmann
Hi

On Mon, Jan 26, 2015 at 5:45 PM, Michael Kerrisk (man-pages)
 wrote:
> On 01/26/2015 04:26 PM, Tom Gundersen wrote:
>> On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
>>  wrote:
>>> 2. Is the API to be invoked directly by applications or is intended to
>>>be used only behind specific libraries? You seem to be saying that
>>>the latter is the case (here, I'm referring to your comment above
>>>about sd-bus). However, when I asked David Herrmann a similar
>>>question I got this responser:
>>>
>>>   "kdbus is in no way bound to systemd. There are ongoing efforts
>>>to port glib and qt to kdbus natively. The API is pretty simple
>>>and I don't see how a libkdbus would simplify things. In fact,
>>>even our tests only have slim wrappers around the ioctls to
>>>simplify error-handling in test-scenarios."
>>>
>>>To me, that implies that users will employ the raw kernel API.
>>
>> The way I read this is that there will (probably) be a handful of
>> users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt,
>> ell, and maybe a few others. However, third-party developers will not
>> know/care about the details of kdbus, they'll just be coding against
>> the dbus libraries as before (might be minor changes, but they
>> certainly won't need to know anything about the kernel API). Similarly
>> to how userspace developers now code against their libc of choice,
>> rather than use kernel syscalls directly.
>
> Thanks, Tom, for the input. I'm still confused though, since elsewhere
> in this thread David Herrmann said in response to a question of mine:
>
> I think we can agree that we want it to be generically useful,
> like other ipc mechanisms, including UDS and netlink.
>
> Again, that sounds to me like the vision is not "a handful of users".
> Hopefully Greg and David can clarify.

I only expect a handful of users to call the ioctls directly. The
libraries that implement the payload-marshaling, in particular. It's a
similar situation with netlink.

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread David Herrmann
Hi

On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages)
 wrote:
> Hello Greg,
>
> On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
>> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
>>> While I agree that there should be a way for userspace to get the list of
>>> supported operations, userspace apps will only actually care about that
>>> once, when they begin talking to kdbus, because (ignoring the live kernel
>>> patching that people have been working on recently) the list of supported
>>> operations isn't going to change while the system is running.  While a u64
>>> copy has relatively low overhead, it does have overhead, and that is very
>>> significant when you consider part of the reason some people want kdbus is
>>> for the performance gain.  Especially for those automotive applications that
>>> have been mentioned which fire off thousands of messages during start-up,
>>> every little bit of performance is significant.
>>
>> A single u64 in a structure is not going to be measurable at all,
>> processors just copy memory too fast these days for 4 extra bytes to be
>> noticable.
>
> It depends on the definition of measurable, I suppose, but this statement
> appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
> about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back
> sets of valid flags. That's 16 bytes, and it definitely makes a difference.
> Simply running a loop that does a naive memcpy() in a tight user-space
> loop (code below), I see the following for the execution of 1e9 loops:
>
> Including the two extra u64 fields: 3.2 sec
> Without the two extra u64 fields:   2.6 sec
>
> On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
> simplest syscall, giving us a rough measure of the context switch) takes
> 68 seconds. In other words, the cost of copying those 16 bytes is about 1%
> of the base context switch/syscall cost. I assume the costs of copying
> those 16 bytes across the kernel-user-space boundary would not be cheaper,
> but have not tested that. If my assumption is correct, then 1% seems a
> significant figure to me in an API whose raison d'ĂŞtre is speed.

I have no idea how this is related to any kdbus ioctl?

A 16byte copy does not affect the performance of kdbus message
transactions in any way that matters.

>> So let's make this as easy as possible for userspace, making
>> it simpler logic there, which is much more important than saving
>> theoretical time in the kernel.
>
> But this also missed the other part of the point. Copying these fields on
> every operation, when in fact they are only needed once, clutters the API,
> in my opinion. Good APIs are as simple as they can be to do their job.
> Redundancy is an enemy of simplicity. Simplest would have been a one time
> API that returns a structure containing all of the supported flags across
> the API. Alternatively, the traditional EINVAL approach is well understood,
> and suffices.

We're going to drop "kernel_flags" in favor of a new
KDBUS_FLAG_NEGOTIATE flag which asks the kernel to do feature
negotiation for this ioctl and return the supported flags/items inline
(overwriting the passed data). The ioctl will not be executed and will
not affect the state of the FD.
I hope this keeps the API simple.

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread Daniel Mack
Hi Michael,

On 01/27/2015 06:53 PM, Michael Kerrisk (man-pages) wrote:
 On 01/27/2015 04:23 PM, David Herrmann wrote:

 I only expect a handful of users to call the ioctls directly. The
 libraries that implement the payload-marshaling, in particular. It's a
 similar situation with netlink.
 
 Thanks, David, for the clarification. I think it would have been helpful
 to have that more clearly stated up front, especially as some comments 
 in this thread, such as the above, could be interpreted to mean quite 
 the opposite. Can I suggest that some text on this point be added to 
 kdbus.txt?

We're currently working on an a set of comprehensive man pages to
document all the commands in the API, along with every struct, enum etc.
We do that so that developers are able to actually understand every
detail of the API, even though most people - as David explained - will
not use that interface directly in the first place but let one of the
high-level libraries help them integrate D-Bus functionality into their
applications.

If you want, have a look at the upstream repository for a   preliminary
version of the new docs.


Thanks,
Daniel
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread David Herrmann
Hi

On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages)
mtk.manpa...@gmail.com wrote:
 Hello Greg,

 On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
 On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
 While I agree that there should be a way for userspace to get the list of
 supported operations, userspace apps will only actually care about that
 once, when they begin talking to kdbus, because (ignoring the live kernel
 patching that people have been working on recently) the list of supported
 operations isn't going to change while the system is running.  While a u64
 copy has relatively low overhead, it does have overhead, and that is very
 significant when you consider part of the reason some people want kdbus is
 for the performance gain.  Especially for those automotive applications that
 have been mentioned which fire off thousands of messages during start-up,
 every little bit of performance is significant.

 A single u64 in a structure is not going to be measurable at all,
 processors just copy memory too fast these days for 4 extra bytes to be
 noticable.

 It depends on the definition of measurable, I suppose, but this statement
 appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
 about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back
 sets of valid flags. That's 16 bytes, and it definitely makes a difference.
 Simply running a loop that does a naive memcpy() in a tight user-space
 loop (code below), I see the following for the execution of 1e9 loops:

 Including the two extra u64 fields: 3.2 sec
 Without the two extra u64 fields:   2.6 sec

 On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
 simplest syscall, giving us a rough measure of the context switch) takes
 68 seconds. In other words, the cost of copying those 16 bytes is about 1%
 of the base context switch/syscall cost. I assume the costs of copying
 those 16 bytes across the kernel-user-space boundary would not be cheaper,
 but have not tested that. If my assumption is correct, then 1% seems a
 significant figure to me in an API whose raison d'ĂŞtre is speed.

I have no idea how this is related to any kdbus ioctl?

A 16byte copy does not affect the performance of kdbus message
transactions in any way that matters.

 So let's make this as easy as possible for userspace, making
 it simpler logic there, which is much more important than saving
 theoretical time in the kernel.

 But this also missed the other part of the point. Copying these fields on
 every operation, when in fact they are only needed once, clutters the API,
 in my opinion. Good APIs are as simple as they can be to do their job.
 Redundancy is an enemy of simplicity. Simplest would have been a one time
 API that returns a structure containing all of the supported flags across
 the API. Alternatively, the traditional EINVAL approach is well understood,
 and suffices.

We're going to drop kernel_flags in favor of a new
KDBUS_FLAG_NEGOTIATE flag which asks the kernel to do feature
negotiation for this ioctl and return the supported flags/items inline
(overwriting the passed data). The ioctl will not be executed and will
not affect the state of the FD.
I hope this keeps the API simple.

Thanks
David
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread David Herrmann
Hi

On Mon, Jan 26, 2015 at 5:45 PM, Michael Kerrisk (man-pages)
mtk.manpa...@gmail.com wrote:
 On 01/26/2015 04:26 PM, Tom Gundersen wrote:
 On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
 mtk.manpa...@gmail.com wrote:
 2. Is the API to be invoked directly by applications or is intended to
be used only behind specific libraries? You seem to be saying that
the latter is the case (here, I'm referring to your comment above
about sd-bus). However, when I asked David Herrmann a similar
question I got this responser:

   kdbus is in no way bound to systemd. There are ongoing efforts
to port glib and qt to kdbus natively. The API is pretty simple
and I don't see how a libkdbus would simplify things. In fact,
even our tests only have slim wrappers around the ioctls to
simplify error-handling in test-scenarios.

To me, that implies that users will employ the raw kernel API.

 The way I read this is that there will (probably) be a handful of
 users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt,
 ell, and maybe a few others. However, third-party developers will not
 know/care about the details of kdbus, they'll just be coding against
 the dbus libraries as before (might be minor changes, but they
 certainly won't need to know anything about the kernel API). Similarly
 to how userspace developers now code against their libc of choice,
 rather than use kernel syscalls directly.

 Thanks, Tom, for the input. I'm still confused though, since elsewhere
 in this thread David Herrmann said in response to a question of mine:

 I think we can agree that we want it to be generically useful,
 like other ipc mechanisms, including UDS and netlink.

 Again, that sounds to me like the vision is not a handful of users.
 Hopefully Greg and David can clarify.

I only expect a handful of users to call the ioctls directly. The
libraries that implement the payload-marshaling, in particular. It's a
similar situation with netlink.

Thanks
David
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread Andy Lutomirski
On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann dh.herrm...@gmail.com wrote:
 Hi

 On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages)
 mtk.manpa...@gmail.com wrote:
 Hello Greg,

 On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
 On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
 While I agree that there should be a way for userspace to get the list of
 supported operations, userspace apps will only actually care about that
 once, when they begin talking to kdbus, because (ignoring the live kernel
 patching that people have been working on recently) the list of supported
 operations isn't going to change while the system is running.  While a u64
 copy has relatively low overhead, it does have overhead, and that is very
 significant when you consider part of the reason some people want kdbus is
 for the performance gain.  Especially for those automotive applications 
 that
 have been mentioned which fire off thousands of messages during start-up,
 every little bit of performance is significant.

 A single u64 in a structure is not going to be measurable at all,
 processors just copy memory too fast these days for 4 extra bytes to be
 noticable.

 It depends on the definition of measurable, I suppose, but this statement
 appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
 about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back
 sets of valid flags. That's 16 bytes, and it definitely makes a difference.
 Simply running a loop that does a naive memcpy() in a tight user-space
 loop (code below), I see the following for the execution of 1e9 loops:

 Including the two extra u64 fields: 3.2 sec
 Without the two extra u64 fields:   2.6 sec

 On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
 simplest syscall, giving us a rough measure of the context switch) takes
 68 seconds. In other words, the cost of copying those 16 bytes is about 1%
 of the base context switch/syscall cost. I assume the costs of copying
 those 16 bytes across the kernel-user-space boundary would not be cheaper,
 but have not tested that. If my assumption is correct, then 1% seems a
 significant figure to me in an API whose raison d'ĂŞtre is speed.

 I have no idea how this is related to any kdbus ioctl?

 A 16byte copy does not affect the performance of kdbus message
 transactions in any way that matters.


Sorry for jumping in so late.  Since this version of kdbus was sent,
I've been on vacation for part of the time and I had the flu for the
rest of the time.

What are the performance goals of kdbus?  How fast is it ever intended
to be?  The reason I ask is that, in the current design, kdbus
collects metadata (credentials and other identifying information,
collected in kdbus_meta_proc_collect) from the sender of every message
*at send time*. [1]  This is slow, and it will always be slow.  The
slowness of this operation will, in my personal system performance
crystal ball, overshadow the cost of a 16 byte copy by several orders
of magnitude.

[1] After much discussion last time around, I'm at least convinced
that the kdbus people have reasons to like the idea of capturing
metadata for each message.  I still think the design is wrong even
from a security standpoint, but right now I'm talking about
performance.  If you want the data plane to be fast, it should be
separated from the control plane as much as possible, and this design
is the opposite.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread Michael Kerrisk (man-pages)
On 01/27/2015 04:23 PM, David Herrmann wrote:
 Hi
 
 On Mon, Jan 26, 2015 at 5:45 PM, Michael Kerrisk (man-pages)
 mtk.manpa...@gmail.com wrote:
 On 01/26/2015 04:26 PM, Tom Gundersen wrote:
 On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
 mtk.manpa...@gmail.com wrote:
 2. Is the API to be invoked directly by applications or is intended to
be used only behind specific libraries? You seem to be saying that
the latter is the case (here, I'm referring to your comment above
about sd-bus). However, when I asked David Herrmann a similar
question I got this responser:

   kdbus is in no way bound to systemd. There are ongoing efforts
to port glib and qt to kdbus natively. The API is pretty simple
and I don't see how a libkdbus would simplify things. In fact,
even our tests only have slim wrappers around the ioctls to
simplify error-handling in test-scenarios.

To me, that implies that users will employ the raw kernel API.

 The way I read this is that there will (probably) be a handful of
 users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt,
 ell, and maybe a few others. However, third-party developers will not
 know/care about the details of kdbus, they'll just be coding against
 the dbus libraries as before (might be minor changes, but they
 certainly won't need to know anything about the kernel API). Similarly
 to how userspace developers now code against their libc of choice,
 rather than use kernel syscalls directly.

 Thanks, Tom, for the input. I'm still confused though, since elsewhere
 in this thread David Herrmann said in response to a question of mine:

 I think we can agree that we want it to be generically useful,
 like other ipc mechanisms, including UDS and netlink.

 Again, that sounds to me like the vision is not a handful of users.
 Hopefully Greg and David can clarify.
 
 I only expect a handful of users to call the ioctls directly. The
 libraries that implement the payload-marshaling, in particular. It's a
 similar situation with netlink.

Thanks, David, for the clarification. I think it would have been helpful
to have that more clearly stated up front, especially as some comments 
in this thread, such as the above, could be interpreted to mean quite 
the opposite. Can I suggest that some text on this point be added to 
kdbus.txt?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-27 Thread Michael Kerrisk (man-pages)
Hi David,

On 01/27/2015 04:05 PM, David Herrmann wrote:
 Hi
 
 On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages)
 mtk.manpa...@gmail.com wrote:
 Hello Greg,

 On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
 On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
 While I agree that there should be a way for userspace to get the list of
 supported operations, userspace apps will only actually care about that
 once, when they begin talking to kdbus, because (ignoring the live kernel
 patching that people have been working on recently) the list of supported
 operations isn't going to change while the system is running.  While a u64
 copy has relatively low overhead, it does have overhead, and that is very
 significant when you consider part of the reason some people want kdbus is
 for the performance gain.  Especially for those automotive applications 
 that
 have been mentioned which fire off thousands of messages during start-up,
 every little bit of performance is significant.

 A single u64 in a structure is not going to be measurable at all,
 processors just copy memory too fast these days for 4 extra bytes to be
 noticable.

 It depends on the definition of measurable, I suppose, but this statement
 appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
 about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back
 sets of valid flags. That's 16 bytes, and it definitely makes a difference.
 Simply running a loop that does a naive memcpy() in a tight user-space
 loop (code below), I see the following for the execution of 1e9 loops:

 Including the two extra u64 fields: 3.2 sec
 Without the two extra u64 fields:   2.6 sec

 On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
 simplest syscall, giving us a rough measure of the context switch) takes
 68 seconds. In other words, the cost of copying those 16 bytes is about 1%
 of the base context switch/syscall cost. I assume the costs of copying
 those 16 bytes across the kernel-user-space boundary would not be cheaper,
 but have not tested that. If my assumption is correct, then 1% seems a
 significant figure to me in an API whose raison d'ĂŞtre is speed.
 
 I have no idea how this is related to any kdbus ioctl?
 
 A 16byte copy does not affect the performance of kdbus message
 transactions in any way that matters.

I'm not sure if it's related/significant or not, since I'm ignorant
of the performance figures for kdbus. I just got curious when Greg
stated that the cost of copying would not be noticeable. (I got curious 
also about my assumption, and did an experiment with a dummy system call
that throws bytes across the fence into user space. The cost of an
extra 16 bytes (56 to 72 bytes) is about 3% of the base syscall/context 
switch cost.)

 So let's make this as easy as possible for userspace, making
 it simpler logic there, which is much more important than saving
 theoretical time in the kernel.

 But this also missed the other part of the point. Copying these fields on
 every operation, when in fact they are only needed once, clutters the API,
 in my opinion. Good APIs are as simple as they can be to do their job.
 Redundancy is an enemy of simplicity. Simplest would have been a one time
 API that returns a structure containing all of the supported flags across
 the API. Alternatively, the traditional EINVAL approach is well understood,
 and suffices.
 
 We're going to drop kernel_flags in favor of a new
 KDBUS_FLAG_NEGOTIATE flag which asks the kernel to do feature
 negotiation for this ioctl and return the supported flags/items inline
 (overwriting the passed data). The ioctl will not be executed and will
 not affect the state of the FD.
 I hope this keeps the API simple.

Not sure I quite understand the details from your description, but I assume 
the it'll end up in the doc, and I'll try to take a look later.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-26 Thread Michael Kerrisk (man-pages)
On 01/26/2015 04:26 PM, Tom Gundersen wrote:
> Hi Michael,
> 
> On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
>  wrote:
>> 2. Is the API to be invoked directly by applications or is intended to
>>be used only behind specific libraries? You seem to be saying that
>>the latter is the case (here, I'm referring to your comment above
>>about sd-bus). However, when I asked David Herrmann a similar
>>question I got this responser:
>>
>>   "kdbus is in no way bound to systemd. There are ongoing efforts
>>to port glib and qt to kdbus natively. The API is pretty simple
>>and I don't see how a libkdbus would simplify things. In fact,
>>even our tests only have slim wrappers around the ioctls to
>>simplify error-handling in test-scenarios."
>>
>>To me, that implies that users will employ the raw kernel API.
> 
> The way I read this is that there will (probably) be a handful of
> users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt,
> ell, and maybe a few others. However, third-party developers will not
> know/care about the details of kdbus, they'll just be coding against
> the dbus libraries as before (might be minor changes, but they
> certainly won't need to know anything about the kernel API). Similarly
> to how userspace developers now code against their libc of choice,
> rather than use kernel syscalls directly.

Thanks, Tom, for the input. I'm still confused though, since elsewhere
in this thread David Herrmann said in response to a question of mine:

I think we can agree that we want it to be generically useful, 
like other ipc mechanisms, including UDS and netlink.

Again, that sounds to me like the vision is not "a handful of users".
Hopefully Greg and David can clarify.

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-26 Thread christoph Hellwig
On Mon, Jan 26, 2015 at 04:26:53PM +0100, Tom Gundersen wrote:
> The way I read this is that there will (probably) be a handful of
> users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt,
> ell, and maybe a few others. However, third-party developers will not
> know/care about the details of kdbus, they'll just be coding against
> the dbus libraries as before (might be minor changes, but they
> certainly won't need to know anything about the kernel API). Similarly
> to how userspace developers now code against their libc of choice,
> rather than use kernel syscalls directly.

Which means we do need proper man pages and detailed documentation for
it, just like syscalls for syscalls which just happened to be used by
a few libcs.  I suspect it really should be implemented as
syscalls anyway, but we can leave that argument aside from now.  Good
documentation certainly helps with making that decision in an educated
way.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-26 Thread Tom Gundersen
Hi Michael,

On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
 wrote:
> 2. Is the API to be invoked directly by applications or is intended to
>be used only behind specific libraries? You seem to be saying that
>the latter is the case (here, I'm referring to your comment above
>about sd-bus). However, when I asked David Herrmann a similar
>question I got this responser:
>
>   "kdbus is in no way bound to systemd. There are ongoing efforts
>to port glib and qt to kdbus natively. The API is pretty simple
>and I don't see how a libkdbus would simplify things. In fact,
>even our tests only have slim wrappers around the ioctls to
>simplify error-handling in test-scenarios."
>
>To me, that implies that users will employ the raw kernel API.

The way I read this is that there will (probably) be a handful of
users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt,
ell, and maybe a few others. However, third-party developers will not
know/care about the details of kdbus, they'll just be coding against
the dbus libraries as before (might be minor changes, but they
certainly won't need to know anything about the kernel API). Similarly
to how userspace developers now code against their libc of choice,
rather than use kernel syscalls directly.

HTH,

Tom
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-26 Thread Michael Kerrisk (man-pages)
Hello Greg,

On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
>> While I agree that there should be a way for userspace to get the list of
>> supported operations, userspace apps will only actually care about that
>> once, when they begin talking to kdbus, because (ignoring the live kernel
>> patching that people have been working on recently) the list of supported
>> operations isn't going to change while the system is running.  While a u64
>> copy has relatively low overhead, it does have overhead, and that is very
>> significant when you consider part of the reason some people want kdbus is
>> for the performance gain.  Especially for those automotive applications that
>> have been mentioned which fire off thousands of messages during start-up,
>> every little bit of performance is significant.
> 
> A single u64 in a structure is not going to be measurable at all,
> processors just copy memory too fast these days for 4 extra bytes to be
> noticable.  

It depends on the definition of measurable, I suppose, but this statement
appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back
sets of valid flags. That's 16 bytes, and it definitely makes a difference.
Simply running a loop that does a naive memcpy() in a tight user-space
loop (code below), I see the following for the execution of 1e9 loops:

Including the two extra u64 fields: 3.2 sec
Without the two extra u64 fields:   2.6 sec

On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
simplest syscall, giving us a rough measure of the context switch) takes
68 seconds. In other words, the cost of copying those 16 bytes is about 1%
of the base context switch/syscall cost. I assume the costs of copying 
those 16 bytes across the kernel-user-space boundary would not be cheaper, 
but have not tested that. If my assumption is correct, then 1% seems a
significant figure to me in an API whose raison d'ĂŞtre is speed.

> So let's make this as easy as possible for userspace, making
> it simpler logic there, which is much more important than saving
> theoretical time in the kernel.

But this also missed the other part of the point. Copying these fields on
every operation, when in fact they are only needed once, clutters the API,
in my opinion. Good APIs are as simple as they can be to do their job. 
Redundancy is an enemy of simplicity. Simplest would have been a one time 
API that returns a structure containing all of the supported flags across 
the API. Alternatively, the traditional EINVAL approach is well understood,
and suffices.

Thanks,

Michael

=

#include 
#include 
#include 
#include 
#include 

struct kdbus_msg_info {
uint64_t offset;
uint64_t msg_size;
uint64_t return_flags;
};

struct kdbus_cmd_send {
uint64_t size;
uint64_t flags;
#if FIELDS >= 1
uint64_t kernel_flags;
#endif
#if FIELDS >= 2
uint64_t kernel_msg_flags;
#endif
uint64_t return_flags;
uint64_t msg_address;
struct kdbus_msg_info reply;
//struct kdbus_item items[0];
} __attribute__((aligned(8)));

int
main(int argc, char *argv[])
{
long nloops, j;
struct kdbus_cmd_send src, dst;
memset(, 0, sizeof(struct kdbus_cmd_send));

printf("struct size: %zd\n", sizeof(struct kdbus_cmd_send));
nloops = (argc > 1) ? atol(argv[1]) : 10;

for (j = 0; j < nloops; j++) {
memcpy(, , sizeof(struct kdbus_cmd_send));
}

exit(EXIT_SUCCESS);
}


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-26 Thread Michael Kerrisk (man-pages)
Hi Greg,

First of all, I seem to have offended you. That was not my intention.
It's certainly not my intent to disparage you or your work (or for 
that matter, the other kdbus developers). Insofar as any of the wordings 
I've used suggested otherwise, I do apologize.

I'll comment on various points below, keeping it as technical as I can.
Then I have a couple of general questions at the end with the goal
of ensuring that my comments are not coming from a broken world view.

On 01/23/2015 04:54 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote:
 And that process seems to be frequent and ongoing even now. (And 
 it's to your great credit that the API/ABI breaks are clearly and honestly 
 marked in the kdbus.h changelog.) All of this lightens the burden of API
 design for kernel developers, but I'm concerned that the long-term pain
 for user-space developers who use an API which (in my estimation) may
 come to be widely used will be enormous.
>>>
>>> Yes, we've jointly reviewed the API details again until just recently to
>>> unify structs and enums etc, and added fields to make the ioctls structs
>>> more versatile for possible future additions. By that, we effectively
>>> broke the ABI, but we did that because we know we can't do such things
>>> again in the future.
>>>
>>> But again - I don't see how this would be different when using syscalls
>>> rather than ioctls to transport information between the driver and
>>> userspace. Could you elaborate?
>>
>> My suspicion is that not nearly enough thinking has yet been done about
>> the design of the API. That's based on these observations:
>>
>> * Documentation that is, considering the size of the API, *way* too thin.
>> * Some parts of the API not documented at all (various kdbus_item blobs)
>> * ABI changes happening even quite recently
>> * API oddities such as the 'kernel_flags' fields. Why do I need to
>>   be told what flags the kernel supports on *every* operation?
>>
>> The above is just after a day of looking hard at kdbus.txt. I strongly
>> suspect I'd find a lot of other issues if I spent more time on kdbus.
> 
> "not enough thinking"?
> 
> We started working on kdbus 2 years ago this FOSDEM (in a few weeks.)
> Before that we have been thinking about this for many years, learning
> from the previous attempts to get this type of feature merged into the
> kernel, talking with users about what they need for this, and soliciting
> kernel developer's opinions on what type of API would be best for this
> type of feature.
> 
> Since then we have done nothing but constantly revise the API.  My first
> mock ups were way too simple, and in discussing things with people much
> more knowledgeable about D-Bus, they pointed out the problems, and we
> iterated.  And iterated.  And iterated some more.  We have worked with
> just about every userspace libdbus developer group, including QtDbus
> developers as well as glib developers.  Now not all of them agreed with
> some of our decisions in the implementation, which is fair enough, you
> can't please everyone, but they _all_ agree that what we have now is the
> proper way to implement this type of functionality and have reviewed the
> features as being correct and compatible with their needs and users.
> 
> Those discussions have happened in emails, presentations, meetings, and
> hackfests pretty much continuously for the past 2 years all around the
> world.
> 
> We have stress-tested the api with both unit tests (which are included
> here in the patch set) as well as a real-world implementation (sd-bus in
> the systemd source repo.)  That real-world implementation successfully
> has been booting many of our daily machines for many months now.

Notwithstanding that I don't see how a unit test stress tests an API 
*design*, I've no reason to doubt that kdbus works. But that's not the 
point of my concern. I worry how usable this API is going to be for the 
world at large.

> Yes, the documentation can always be better, but please don't confuse
> the lack of understanding how D-Bus works and its model with the lack of
> understanding this kdbus implementation, the two are not comparable.
> For some good primers on what D-Bus is, and the terminology it expects
> see:
>   http://dbus.freedesktop.org/doc/dbus-tutorial.html
> and also:
>   http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc
> 
> We are not going to put a basic "here is what D-Bus is and how to use
> it" into the kernel tree, that is totally outside the scope here.

I didn't expect that you should do that. But it does touch on a general 
question that I'll leave to the end of this mail.

> I suggest reading the tutorial above, and then going back and reading
> the kdbus documentation provided.  If you think we are lacking stuff on
> the kdbus side, we will be glad to flush out any needed areas.
> 
> Also, Daniel has said he will work on a basic userspace 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-26 Thread Michael Kerrisk (man-pages)
Hi Greg,

First of all, I seem to have offended you. That was not my intention.
It's certainly not my intent to disparage you or your work (or for 
that matter, the other kdbus developers). Insofar as any of the wordings 
I've used suggested otherwise, I do apologize.

I'll comment on various points below, keeping it as technical as I can.
Then I have a couple of general questions at the end with the goal
of ensuring that my comments are not coming from a broken world view.

On 01/23/2015 04:54 PM, Greg Kroah-Hartman wrote:
 On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote:
 And that process seems to be frequent and ongoing even now. (And 
 it's to your great credit that the API/ABI breaks are clearly and honestly 
 marked in the kdbus.h changelog.) All of this lightens the burden of API
 design for kernel developers, but I'm concerned that the long-term pain
 for user-space developers who use an API which (in my estimation) may
 come to be widely used will be enormous.

 Yes, we've jointly reviewed the API details again until just recently to
 unify structs and enums etc, and added fields to make the ioctls structs
 more versatile for possible future additions. By that, we effectively
 broke the ABI, but we did that because we know we can't do such things
 again in the future.

 But again - I don't see how this would be different when using syscalls
 rather than ioctls to transport information between the driver and
 userspace. Could you elaborate?

 My suspicion is that not nearly enough thinking has yet been done about
 the design of the API. That's based on these observations:

 * Documentation that is, considering the size of the API, *way* too thin.
 * Some parts of the API not documented at all (various kdbus_item blobs)
 * ABI changes happening even quite recently
 * API oddities such as the 'kernel_flags' fields. Why do I need to
   be told what flags the kernel supports on *every* operation?

 The above is just after a day of looking hard at kdbus.txt. I strongly
 suspect I'd find a lot of other issues if I spent more time on kdbus.
 
 not enough thinking?
 
 We started working on kdbus 2 years ago this FOSDEM (in a few weeks.)
 Before that we have been thinking about this for many years, learning
 from the previous attempts to get this type of feature merged into the
 kernel, talking with users about what they need for this, and soliciting
 kernel developer's opinions on what type of API would be best for this
 type of feature.
 
 Since then we have done nothing but constantly revise the API.  My first
 mock ups were way too simple, and in discussing things with people much
 more knowledgeable about D-Bus, they pointed out the problems, and we
 iterated.  And iterated.  And iterated some more.  We have worked with
 just about every userspace libdbus developer group, including QtDbus
 developers as well as glib developers.  Now not all of them agreed with
 some of our decisions in the implementation, which is fair enough, you
 can't please everyone, but they _all_ agree that what we have now is the
 proper way to implement this type of functionality and have reviewed the
 features as being correct and compatible with their needs and users.
 
 Those discussions have happened in emails, presentations, meetings, and
 hackfests pretty much continuously for the past 2 years all around the
 world.
 
 We have stress-tested the api with both unit tests (which are included
 here in the patch set) as well as a real-world implementation (sd-bus in
 the systemd source repo.)  That real-world implementation successfully
 has been booting many of our daily machines for many months now.

Notwithstanding that I don't see how a unit test stress tests an API 
*design*, I've no reason to doubt that kdbus works. But that's not the 
point of my concern. I worry how usable this API is going to be for the 
world at large.

 Yes, the documentation can always be better, but please don't confuse
 the lack of understanding how D-Bus works and its model with the lack of
 understanding this kdbus implementation, the two are not comparable.
 For some good primers on what D-Bus is, and the terminology it expects
 see:
   http://dbus.freedesktop.org/doc/dbus-tutorial.html
 and also:
   http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc
 
 We are not going to put a basic here is what D-Bus is and how to use
 it into the kernel tree, that is totally outside the scope here.

I didn't expect that you should do that. But it does touch on a general 
question that I'll leave to the end of this mail.

 I suggest reading the tutorial above, and then going back and reading
 the kdbus documentation provided.  If you think we are lacking stuff on
 the kdbus side, we will be glad to flush out any needed areas.
 
 Also, Daniel has said he will work on a basic userspace example
 library to show how to use this api, which might make the api a bit
 easier to understand.
 
 However, I personally don't 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-26 Thread Michael Kerrisk (man-pages)
On 01/26/2015 04:26 PM, Tom Gundersen wrote:
 Hi Michael,
 
 On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
 mtk.manpa...@gmail.com wrote:
 2. Is the API to be invoked directly by applications or is intended to
be used only behind specific libraries? You seem to be saying that
the latter is the case (here, I'm referring to your comment above
about sd-bus). However, when I asked David Herrmann a similar
question I got this responser:

   kdbus is in no way bound to systemd. There are ongoing efforts
to port glib and qt to kdbus natively. The API is pretty simple
and I don't see how a libkdbus would simplify things. In fact,
even our tests only have slim wrappers around the ioctls to
simplify error-handling in test-scenarios.

To me, that implies that users will employ the raw kernel API.
 
 The way I read this is that there will (probably) be a handful of
 users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt,
 ell, and maybe a few others. However, third-party developers will not
 know/care about the details of kdbus, they'll just be coding against
 the dbus libraries as before (might be minor changes, but they
 certainly won't need to know anything about the kernel API). Similarly
 to how userspace developers now code against their libc of choice,
 rather than use kernel syscalls directly.

Thanks, Tom, for the input. I'm still confused though, since elsewhere
in this thread David Herrmann said in response to a question of mine:

I think we can agree that we want it to be generically useful, 
like other ipc mechanisms, including UDS and netlink.

Again, that sounds to me like the vision is not a handful of users.
Hopefully Greg and David can clarify.

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-26 Thread Tom Gundersen
Hi Michael,

On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
mtk.manpa...@gmail.com wrote:
 2. Is the API to be invoked directly by applications or is intended to
be used only behind specific libraries? You seem to be saying that
the latter is the case (here, I'm referring to your comment above
about sd-bus). However, when I asked David Herrmann a similar
question I got this responser:

   kdbus is in no way bound to systemd. There are ongoing efforts
to port glib and qt to kdbus natively. The API is pretty simple
and I don't see how a libkdbus would simplify things. In fact,
even our tests only have slim wrappers around the ioctls to
simplify error-handling in test-scenarios.

To me, that implies that users will employ the raw kernel API.

The way I read this is that there will (probably) be a handful of
users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt,
ell, and maybe a few others. However, third-party developers will not
know/care about the details of kdbus, they'll just be coding against
the dbus libraries as before (might be minor changes, but they
certainly won't need to know anything about the kernel API). Similarly
to how userspace developers now code against their libc of choice,
rather than use kernel syscalls directly.

HTH,

Tom
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-26 Thread christoph Hellwig
On Mon, Jan 26, 2015 at 04:26:53PM +0100, Tom Gundersen wrote:
 The way I read this is that there will (probably) be a handful of
 users, namely the existing dbus libraries: libdus, sd-bus, glib, Qt,
 ell, and maybe a few others. However, third-party developers will not
 know/care about the details of kdbus, they'll just be coding against
 the dbus libraries as before (might be minor changes, but they
 certainly won't need to know anything about the kernel API). Similarly
 to how userspace developers now code against their libc of choice,
 rather than use kernel syscalls directly.

Which means we do need proper man pages and detailed documentation for
it, just like syscalls for syscalls which just happened to be used by
a few libcs.  I suspect it really should be implemented as
syscalls anyway, but we can leave that argument aside from now.  Good
documentation certainly helps with making that decision in an educated
way.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-26 Thread Michael Kerrisk (man-pages)
Hello Greg,

On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
 On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
 While I agree that there should be a way for userspace to get the list of
 supported operations, userspace apps will only actually care about that
 once, when they begin talking to kdbus, because (ignoring the live kernel
 patching that people have been working on recently) the list of supported
 operations isn't going to change while the system is running.  While a u64
 copy has relatively low overhead, it does have overhead, and that is very
 significant when you consider part of the reason some people want kdbus is
 for the performance gain.  Especially for those automotive applications that
 have been mentioned which fire off thousands of messages during start-up,
 every little bit of performance is significant.
 
 A single u64 in a structure is not going to be measurable at all,
 processors just copy memory too fast these days for 4 extra bytes to be
 noticable.  

It depends on the definition of measurable, I suppose, but this statement
appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
about *two* u64 fields (kernel_gs, kernel_msg_flags) being used to pass back
sets of valid flags. That's 16 bytes, and it definitely makes a difference.
Simply running a loop that does a naive memcpy() in a tight user-space
loop (code below), I see the following for the execution of 1e9 loops:

Including the two extra u64 fields: 3.2 sec
Without the two extra u64 fields:   2.6 sec

On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
simplest syscall, giving us a rough measure of the context switch) takes
68 seconds. In other words, the cost of copying those 16 bytes is about 1%
of the base context switch/syscall cost. I assume the costs of copying 
those 16 bytes across the kernel-user-space boundary would not be cheaper, 
but have not tested that. If my assumption is correct, then 1% seems a
significant figure to me in an API whose raison d'ĂŞtre is speed.

 So let's make this as easy as possible for userspace, making
 it simpler logic there, which is much more important than saving
 theoretical time in the kernel.

But this also missed the other part of the point. Copying these fields on
every operation, when in fact they are only needed once, clutters the API,
in my opinion. Good APIs are as simple as they can be to do their job. 
Redundancy is an enemy of simplicity. Simplest would have been a one time 
API that returns a structure containing all of the supported flags across 
the API. Alternatively, the traditional EINVAL approach is well understood,
and suffices.

Thanks,

Michael

=

#include stdint.h
#include unistd.h
#include stdio.h
#include stdlib.h
#include string.h

struct kdbus_msg_info {
uint64_t offset;
uint64_t msg_size;
uint64_t return_flags;
};

struct kdbus_cmd_send {
uint64_t size;
uint64_t flags;
#if FIELDS = 1
uint64_t kernel_flags;
#endif
#if FIELDS = 2
uint64_t kernel_msg_flags;
#endif
uint64_t return_flags;
uint64_t msg_address;
struct kdbus_msg_info reply;
//struct kdbus_item items[0];
} __attribute__((aligned(8)));

int
main(int argc, char *argv[])
{
long nloops, j;
struct kdbus_cmd_send src, dst;
memset(dst, 0, sizeof(struct kdbus_cmd_send));

printf(struct size: %zd\n, sizeof(struct kdbus_cmd_send));
nloops = (argc  1) ? atol(argv[1]) : 10;

for (j = 0; j  nloops; j++) {
memcpy(dst, src, sizeof(struct kdbus_cmd_send));
}

exit(EXIT_SUCCESS);
}


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-24 Thread Ahmed S. Darwish
On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote:
> On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
> > On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
> > > From: Daniel Mack 
> > > 
> > > kdbus is a system for low-latency, low-overhead, easy to use
> > > interprocess communication (IPC).
> > > 
> > > The interface to all functions in this driver is implemented via ioctls
> > > on files exposed through a filesystem called 'kdbusfs'. The default
> > > mount point of kdbusfs is /sys/fs/kdbus.
> > 
> > Pardon my ignorance, but we've always been told that adding
> > new ioctl()s to the kernel is a very big no-no.  But given
> > the seniority of the folks stewarding this kdbus effort,
> > there must be a good rationale ;-)
> > 
> > So, can the rationale behind introducing new ioctl()s be
> > further explained? It would be even better if it's included
> > in the documentation patch itself.
> 
> The main reason to use an ioctl is that you want to atomically set
> and/or get something "complex" through the user/kernel boundary.  For
> simple device attributes, sysfs works great, for configuring devices,
> configfs works great, but for data streams / structures / etc. an ioctl
> is the correct thing to use.
> 
> Examples of new ioctls being added to the kernel are all over the
> place, look at all of the special-purpose ioctls the filesystems keep
> creating (they aren't adding new syscalls), look at the monstrosity that
> is the DRM layer, look at other complex things like openvswitch, or
> "simpler" device-specific interfaces like the MEI one, or even more
> complex ones like the MMC interface.  These are all valid uses of ioctls
> as they are device/filesystem specific ways to interact with the kernel.
> 
> The thing is, almost no one pays attention to these new ioctls as they
> are domain-specific interfaces, with open userspace programs talking to
> them, and they work well.  ioctl is a powerful and useful interface, and
> if we were to suddenly require no new ioctls, and require everything to
> be a syscall, we would do nothing except make apis more complex (hint,
> you now have to do extra validation on your file descriptor passed to
> you to determine if it really is what you can properly operate your
> ioctl on), and cause no real benefit at all.
> 
> Yes, people abuse ioctls at times, but really, they are needed.
> 
> And remember, I was one of the people who long ago thought we should not
> be adding more ioctls, but I have seen the folly of my ways, and chalk
> it up to youthful ignorance :)
> 

Exactly, and that's why I wondered about the sudden change
of heart ;-)

Thanks for taking the time to write this.

Regards,
Darwish
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-24 Thread Ahmed S. Darwish
On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote:
 On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
  On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
   From: Daniel Mack dan...@zonque.org
   
   kdbus is a system for low-latency, low-overhead, easy to use
   interprocess communication (IPC).
   
   The interface to all functions in this driver is implemented via ioctls
   on files exposed through a filesystem called 'kdbusfs'. The default
   mount point of kdbusfs is /sys/fs/kdbus.
  
  Pardon my ignorance, but we've always been told that adding
  new ioctl()s to the kernel is a very big no-no.  But given
  the seniority of the folks stewarding this kdbus effort,
  there must be a good rationale ;-)
  
  So, can the rationale behind introducing new ioctl()s be
  further explained? It would be even better if it's included
  in the documentation patch itself.
 
 The main reason to use an ioctl is that you want to atomically set
 and/or get something complex through the user/kernel boundary.  For
 simple device attributes, sysfs works great, for configuring devices,
 configfs works great, but for data streams / structures / etc. an ioctl
 is the correct thing to use.
 
 Examples of new ioctls being added to the kernel are all over the
 place, look at all of the special-purpose ioctls the filesystems keep
 creating (they aren't adding new syscalls), look at the monstrosity that
 is the DRM layer, look at other complex things like openvswitch, or
 simpler device-specific interfaces like the MEI one, or even more
 complex ones like the MMC interface.  These are all valid uses of ioctls
 as they are device/filesystem specific ways to interact with the kernel.
 
 The thing is, almost no one pays attention to these new ioctls as they
 are domain-specific interfaces, with open userspace programs talking to
 them, and they work well.  ioctl is a powerful and useful interface, and
 if we were to suddenly require no new ioctls, and require everything to
 be a syscall, we would do nothing except make apis more complex (hint,
 you now have to do extra validation on your file descriptor passed to
 you to determine if it really is what you can properly operate your
 ioctl on), and cause no real benefit at all.
 
 Yes, people abuse ioctls at times, but really, they are needed.
 
 And remember, I was one of the people who long ago thought we should not
 be adding more ioctls, but I have seen the folly of my ways, and chalk
 it up to youthful ignorance :)
 

Exactly, and that's why I wondered about the sudden change
of heart ;-)

Thanks for taking the time to write this.

Regards,
Darwish
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-23 Thread Greg Kroah-Hartman
On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
> While I agree that there should be a way for userspace to get the list of
> supported operations, userspace apps will only actually care about that
> once, when they begin talking to kdbus, because (ignoring the live kernel
> patching that people have been working on recently) the list of supported
> operations isn't going to change while the system is running.  While a u64
> copy has relatively low overhead, it does have overhead, and that is very
> significant when you consider part of the reason some people want kdbus is
> for the performance gain.  Especially for those automotive applications that
> have been mentioned which fire off thousands of messages during start-up,
> every little bit of performance is significant.

A single u64 in a structure is not going to be measurable at all,
processors just copy memory too fast these days for 4 extra bytes to be
noticable.  So let's make this as easy as possible for userspace, making
it simpler logic there, which is much more important than saving
theoretical time in the kernel.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-23 Thread Greg Kroah-Hartman
On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote:
> >> And that process seems to be frequent and ongoing even now. (And 
> >> it's to your great credit that the API/ABI breaks are clearly and honestly 
> >> marked in the kdbus.h changelog.) All of this lightens the burden of API
> >> design for kernel developers, but I'm concerned that the long-term pain
> >> for user-space developers who use an API which (in my estimation) may
> >> come to be widely used will be enormous.
> > 
> > Yes, we've jointly reviewed the API details again until just recently to
> > unify structs and enums etc, and added fields to make the ioctls structs
> > more versatile for possible future additions. By that, we effectively
> > broke the ABI, but we did that because we know we can't do such things
> > again in the future.
> > 
> > But again - I don't see how this would be different when using syscalls
> > rather than ioctls to transport information between the driver and
> > userspace. Could you elaborate?
> 
> My suspicion is that not nearly enough thinking has yet been done about
> the design of the API. That's based on these observations:
> 
> * Documentation that is, considering the size of the API, *way* too thin.
> * Some parts of the API not documented at all (various kdbus_item blobs)
> * ABI changes happening even quite recently
> * API oddities such as the 'kernel_flags' fields. Why do I need to
>   be told what flags the kernel supports on *every* operation?
> 
> The above is just after a day of looking hard at kdbus.txt. I strongly
> suspect I'd find a lot of other issues if I spent more time on kdbus.

"not enough thinking"?

We started working on kdbus 2 years ago this FOSDEM (in a few weeks.)
Before that we have been thinking about this for many years, learning
from the previous attempts to get this type of feature merged into the
kernel, talking with users about what they need for this, and soliciting
kernel developer's opinions on what type of API would be best for this
type of feature.

Since then we have done nothing but constantly revise the API.  My first
mock ups were way too simple, and in discussing things with people much
more knowledgeable about D-Bus, they pointed out the problems, and we
iterated.  And iterated.  And iterated some more.  We have worked with
just about every userspace libdbus developer group, including QtDbus
developers as well as glib developers.  Now not all of them agreed with
some of our decisions in the implementation, which is fair enough, you
can't please everyone, but they _all_ agree that what we have now is the
proper way to implement this type of functionality and have reviewed the
features as being correct and compatible with their needs and users.

Those discussions have happened in emails, presentations, meetings, and
hackfests pretty much continuously for the past 2 years all around the
world.

We have stress-tested the api with both unit tests (which are included
here in the patch set) as well as a real-world implementation (sd-bus in
the systemd source repo.)  That real-world implementation successfully
has been booting many of our daily machines for many months now.

Yes, the documentation can always be better, but please don't confuse
the lack of understanding how D-Bus works and its model with the lack of
understanding this kdbus implementation, the two are not comparable.
For some good primers on what D-Bus is, and the terminology it expects
see:
http://dbus.freedesktop.org/doc/dbus-tutorial.html
and also:
http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc

We are not going to put a basic "here is what D-Bus is and how to use
it" into the kernel tree, that is totally outside the scope here.

I suggest reading the tutorial above, and then going back and reading
the kdbus documentation provided.  If you think we are lacking stuff on
the kdbus side, we will be glad to flush out any needed areas.

Also, Daniel has said he will work on a basic userspace "example"
library to show how to use this api, which might make the api a bit
easier to understand.

However, I personally don't think this "example code" is necessary at
all.  We don't ask for this type of "simple examples" from other new
kernel apis we create and add to the kernel all the time.  We require
there to be a user of the api, but not one that is so well documented
that others can write a from-scratch raw userspace replacement.
Specific examples of this are my previously mentioned ioctl users
(btrfs, mei, mic, openvlan, etc.), and the grand-daddy of all horrid
apis, DRM.

Users aren't going to be writing their own "raw kdbus" libraries.  Or if
they are, they are going to start with one of the existing
implementations we have (the test examples and sd-bus, and I think there
is a native Go implementation somewhere as well.)  Users are going to be
using those libraries to write their code, and to be honest, the user
api for sd-bus is a delight to use compared to 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-23 Thread Greg Kroah-Hartman
On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote:
> On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
> > On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
> > > From: Daniel Mack 
> > > 
> > > kdbus is a system for low-latency, low-overhead, easy to use
> > > interprocess communication (IPC).
> > > 
> > > The interface to all functions in this driver is implemented via ioctls
> > > on files exposed through a filesystem called 'kdbusfs'. The default
> > > mount point of kdbusfs is /sys/fs/kdbus.
> > 
> > Pardon my ignorance, but we've always been told that adding
> > new ioctl()s to the kernel is a very big no-no.  But given
> > the seniority of the folks stewarding this kdbus effort,
> > there must be a good rationale ;-)
> > 
> > So, can the rationale behind introducing new ioctl()s be
> > further explained? It would be even better if it's included
> > in the documentation patch itself.
> 
> The main reason to use an ioctl is that you want to atomically set
> and/or get something "complex" through the user/kernel boundary.  For
> simple device attributes, sysfs works great, for configuring devices,
> configfs works great, but for data streams / structures / etc. an ioctl
> is the correct thing to use.
> 
> Examples of new ioctls being added to the kernel are all over the
> place, look at all of the special-purpose ioctls the filesystems keep
> creating (they aren't adding new syscalls), look at the monstrosity that
> is the DRM layer, look at other complex things like openvswitch, or
> "simpler" device-specific interfaces like the MEI one, or even more
> complex ones like the MMC interface.

Oops, I meant, MIC, not MMC, sorry about that.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-23 Thread Greg Kroah-Hartman
On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
> On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
> > From: Daniel Mack 
> > 
> > kdbus is a system for low-latency, low-overhead, easy to use
> > interprocess communication (IPC).
> > 
> > The interface to all functions in this driver is implemented via ioctls
> > on files exposed through a filesystem called 'kdbusfs'. The default
> > mount point of kdbusfs is /sys/fs/kdbus.
> 
> Pardon my ignorance, but we've always been told that adding
> new ioctl()s to the kernel is a very big no-no.  But given
> the seniority of the folks stewarding this kdbus effort,
> there must be a good rationale ;-)
> 
> So, can the rationale behind introducing new ioctl()s be
> further explained? It would be even better if it's included
> in the documentation patch itself.

The main reason to use an ioctl is that you want to atomically set
and/or get something "complex" through the user/kernel boundary.  For
simple device attributes, sysfs works great, for configuring devices,
configfs works great, but for data streams / structures / etc. an ioctl
is the correct thing to use.

Examples of new ioctls being added to the kernel are all over the
place, look at all of the special-purpose ioctls the filesystems keep
creating (they aren't adding new syscalls), look at the monstrosity that
is the DRM layer, look at other complex things like openvswitch, or
"simpler" device-specific interfaces like the MEI one, or even more
complex ones like the MMC interface.  These are all valid uses of ioctls
as they are device/filesystem specific ways to interact with the kernel.

The thing is, almost no one pays attention to these new ioctls as they
are domain-specific interfaces, with open userspace programs talking to
them, and they work well.  ioctl is a powerful and useful interface, and
if we were to suddenly require no new ioctls, and require everything to
be a syscall, we would do nothing except make apis more complex (hint,
you now have to do extra validation on your file descriptor passed to
you to determine if it really is what you can properly operate your
ioctl on), and cause no real benefit at all.

Yes, people abuse ioctls at times, but really, they are needed.

And remember, I was one of the people who long ago thought we should not
be adding more ioctls, but I have seen the folly of my ways, and chalk
it up to youthful ignorance :)

Hope this helps explain things better,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-23 Thread Michael Kerrisk (man-pages)
Hi David,

On 01/22/2015 02:46 PM, David Herrmann wrote:
> Hi Michael
> 
> On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages)
>  wrote:
>> On 01/21/2015 05:58 PM, Daniel Mack wrote:
> Also, the context the kdbus commands operate on originate from a
> mountable special-purpose file system.

 It's not clear to me how this point implies any particular API design
 choice.
>>>
>>> It emphasizes the fact that our ioctl API can only be used with the
>>> nodes exposed by kdbusfs, and vice versa. I think operations on
>>> driver-specific files do not justify a new 'generic' syscall API.
>>
>> I see your (repeated) use of words like "driver" as just a distraction,
>> I'm sorry. "Driver-specific files" is an implementation detail. The
>> point is that kdbus provides a complex, general-purpose feature for all
>> of userland. It is core infrastructure that will be used by key pieces
>> of the plumbing layer, and likely by many other applications as well.
>> *Please* stop suggesting that it is not core infrastructure: kdbus
>> has the potential to be a great IPC that will be very useful to many,
>> many user-space developers.
> 
> We called it an 'ipc driver' so far. It is in no way meant as
> distraction. Feel free to call it 'core infrastructure'. I think we
> can agree that we want it to be generically useful, like other ipc
> mechanisms, including UDS and netlink.

Yes.

>> (By the way, we have precedents for device/filesystem-specific system
>> calls. Even a recent one, in memfd_create().)
> 
> memfd_create() is in no way file-system specific. Internally, it uses
> shmem, but that's an implementation detail. The API does not expose
> this in any way. If you were referring to sealing, it's implemented as
> fcntl(), not as a separate syscall. Furthermore, sealing is only
> limited to shmem as it's the only volatile storage. It's not an API
> restriction. Other volatile file-systems are free to implement
> sealing.

My bad. I mispoke there.

 ioctl() is a get-out-of-jail free card when it comes to API design.
>>>
>>> And how are syscalls different in that regard when they would both
>>> transport the same data structures?
>>
>> My general impression is that when it comes to system calls,
>> there's usually a lot more up front thought about API design.
> 
> This is no technical reason why to use syscalls over ioctls. Imho,
> it's rather a reason to improve the kernel's ioctl-review process.

Agreed it's not a technical reason. But the distinction I make 
does capture how things usually work.

[...]

 Rather
 than thinking carefully and long about a set of coherent, stable APIs that
 provide some degree of type-safety to user-space, one just 
 adds/changes/removes
 an ioctl.
>>>
>>> Adding another ioctl in the future for cases we didn't think about
>>> earlier would of course be considered a workaround; and even though such
>>> things happen all the time, it's certainly something we'd like to avoid.
>>>
>>> However, we would also need to add another syscall in such cases, which
>>> is even worse IMO. So that's really not an argument against ioctls after
>>> all. What fundamental difference between a raw syscall and a ioctl,
>>> especially with regards to type safety, is there which would help us here?
>>
>> Type safety helps user-space application developers. System calls have
>> it, ioctl() does not.
> 
> This is simply not true. There is no type-safety in:
> syscall(__NR_xyz, some, random, arguments)
> 
> The only way a syscall gets 'type-safe', is to provide a wrapper
> function. Same applies to ioctls. But people tend to not do that for
> ioctls, which is, again, not a technical argument against ioctls. It's
> a matter of psychology, though.

Yes, I see I wasn't quite clear enough. I should have said:

Type safety helps user-space application developers. 
*As typically provided to user-space via libc wrappers*,
system calls have it, ioctl() does not.

And system call wrappers are generally provided pretty much automatically
by libcs (at least, mostly, there's some to-ing and fro-ing about this in
glibc-land these days as you are aware from the memfd_create() story;
http://thread.gmane.org/gmane.comp.lib.glibc.alpha/45884/focus=46937).

So, in practice user-space programmers typically automatically get type
safety with system calls, but not with ioctl().

> I still don't see a technical reason to use syscalls. API proposals welcome!
> We're now working on a small kdbus helper library, which provides
> type-safe ioctl wrappers, item-iterators and documented examples. But,
> like syscalls, nobody is forced to use the wrappers. The API design is
> not affected by this.
> 
 And that process seems to be frequent and ongoing even now. (And
 it's to your great credit that the API/ABI breaks are clearly and honestly
 marked in the kdbus.h changelog.) All of this lightens the burden of API
 design for kernel developers, but I'm concerned 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-23 Thread Greg Kroah-Hartman
On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote:
 On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
  On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
   From: Daniel Mack dan...@zonque.org
   
   kdbus is a system for low-latency, low-overhead, easy to use
   interprocess communication (IPC).
   
   The interface to all functions in this driver is implemented via ioctls
   on files exposed through a filesystem called 'kdbusfs'. The default
   mount point of kdbusfs is /sys/fs/kdbus.
  
  Pardon my ignorance, but we've always been told that adding
  new ioctl()s to the kernel is a very big no-no.  But given
  the seniority of the folks stewarding this kdbus effort,
  there must be a good rationale ;-)
  
  So, can the rationale behind introducing new ioctl()s be
  further explained? It would be even better if it's included
  in the documentation patch itself.
 
 The main reason to use an ioctl is that you want to atomically set
 and/or get something complex through the user/kernel boundary.  For
 simple device attributes, sysfs works great, for configuring devices,
 configfs works great, but for data streams / structures / etc. an ioctl
 is the correct thing to use.
 
 Examples of new ioctls being added to the kernel are all over the
 place, look at all of the special-purpose ioctls the filesystems keep
 creating (they aren't adding new syscalls), look at the monstrosity that
 is the DRM layer, look at other complex things like openvswitch, or
 simpler device-specific interfaces like the MEI one, or even more
 complex ones like the MMC interface.

Oops, I meant, MIC, not MMC, sorry about that.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-23 Thread Michael Kerrisk (man-pages)
Hi David,

On 01/22/2015 02:46 PM, David Herrmann wrote:
 Hi Michael
 
 On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages)
 mtk.manpa...@gmail.com wrote:
 On 01/21/2015 05:58 PM, Daniel Mack wrote:
 Also, the context the kdbus commands operate on originate from a
 mountable special-purpose file system.

 It's not clear to me how this point implies any particular API design
 choice.

 It emphasizes the fact that our ioctl API can only be used with the
 nodes exposed by kdbusfs, and vice versa. I think operations on
 driver-specific files do not justify a new 'generic' syscall API.

 I see your (repeated) use of words like driver as just a distraction,
 I'm sorry. Driver-specific files is an implementation detail. The
 point is that kdbus provides a complex, general-purpose feature for all
 of userland. It is core infrastructure that will be used by key pieces
 of the plumbing layer, and likely by many other applications as well.
 *Please* stop suggesting that it is not core infrastructure: kdbus
 has the potential to be a great IPC that will be very useful to many,
 many user-space developers.
 
 We called it an 'ipc driver' so far. It is in no way meant as
 distraction. Feel free to call it 'core infrastructure'. I think we
 can agree that we want it to be generically useful, like other ipc
 mechanisms, including UDS and netlink.

Yes.

 (By the way, we have precedents for device/filesystem-specific system
 calls. Even a recent one, in memfd_create().)
 
 memfd_create() is in no way file-system specific. Internally, it uses
 shmem, but that's an implementation detail. The API does not expose
 this in any way. If you were referring to sealing, it's implemented as
 fcntl(), not as a separate syscall. Furthermore, sealing is only
 limited to shmem as it's the only volatile storage. It's not an API
 restriction. Other volatile file-systems are free to implement
 sealing.

My bad. I mispoke there.

 ioctl() is a get-out-of-jail free card when it comes to API design.

 And how are syscalls different in that regard when they would both
 transport the same data structures?

 My general impression is that when it comes to system calls,
 there's usually a lot more up front thought about API design.
 
 This is no technical reason why to use syscalls over ioctls. Imho,
 it's rather a reason to improve the kernel's ioctl-review process.

Agreed it's not a technical reason. But the distinction I make 
does capture how things usually work.

[...]

 Rather
 than thinking carefully and long about a set of coherent, stable APIs that
 provide some degree of type-safety to user-space, one just 
 adds/changes/removes
 an ioctl.

 Adding another ioctl in the future for cases we didn't think about
 earlier would of course be considered a workaround; and even though such
 things happen all the time, it's certainly something we'd like to avoid.

 However, we would also need to add another syscall in such cases, which
 is even worse IMO. So that's really not an argument against ioctls after
 all. What fundamental difference between a raw syscall and a ioctl,
 especially with regards to type safety, is there which would help us here?

 Type safety helps user-space application developers. System calls have
 it, ioctl() does not.
 
 This is simply not true. There is no type-safety in:
 syscall(__NR_xyz, some, random, arguments)
 
 The only way a syscall gets 'type-safe', is to provide a wrapper
 function. Same applies to ioctls. But people tend to not do that for
 ioctls, which is, again, not a technical argument against ioctls. It's
 a matter of psychology, though.

Yes, I see I wasn't quite clear enough. I should have said:

Type safety helps user-space application developers. 
*As typically provided to user-space via libc wrappers*,
system calls have it, ioctl() does not.

And system call wrappers are generally provided pretty much automatically
by libcs (at least, mostly, there's some to-ing and fro-ing about this in
glibc-land these days as you are aware from the memfd_create() story;
http://thread.gmane.org/gmane.comp.lib.glibc.alpha/45884/focus=46937).

So, in practice user-space programmers typically automatically get type
safety with system calls, but not with ioctl().

 I still don't see a technical reason to use syscalls. API proposals welcome!
 We're now working on a small kdbus helper library, which provides
 type-safe ioctl wrappers, item-iterators and documented examples. But,
 like syscalls, nobody is forced to use the wrappers. The API design is
 not affected by this.
 
 And that process seems to be frequent and ongoing even now. (And
 it's to your great credit that the API/ABI breaks are clearly and honestly
 marked in the kdbus.h changelog.) All of this lightens the burden of API
 design for kernel developers, but I'm concerned that the long-term pain
 for user-space developers who use an API which (in my estimation) may
 come to be widely used will be enormous.

 Yes, we've jointly 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-23 Thread Greg Kroah-Hartman
On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
 On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
  From: Daniel Mack dan...@zonque.org
  
  kdbus is a system for low-latency, low-overhead, easy to use
  interprocess communication (IPC).
  
  The interface to all functions in this driver is implemented via ioctls
  on files exposed through a filesystem called 'kdbusfs'. The default
  mount point of kdbusfs is /sys/fs/kdbus.
 
 Pardon my ignorance, but we've always been told that adding
 new ioctl()s to the kernel is a very big no-no.  But given
 the seniority of the folks stewarding this kdbus effort,
 there must be a good rationale ;-)
 
 So, can the rationale behind introducing new ioctl()s be
 further explained? It would be even better if it's included
 in the documentation patch itself.

The main reason to use an ioctl is that you want to atomically set
and/or get something complex through the user/kernel boundary.  For
simple device attributes, sysfs works great, for configuring devices,
configfs works great, but for data streams / structures / etc. an ioctl
is the correct thing to use.

Examples of new ioctls being added to the kernel are all over the
place, look at all of the special-purpose ioctls the filesystems keep
creating (they aren't adding new syscalls), look at the monstrosity that
is the DRM layer, look at other complex things like openvswitch, or
simpler device-specific interfaces like the MEI one, or even more
complex ones like the MMC interface.  These are all valid uses of ioctls
as they are device/filesystem specific ways to interact with the kernel.

The thing is, almost no one pays attention to these new ioctls as they
are domain-specific interfaces, with open userspace programs talking to
them, and they work well.  ioctl is a powerful and useful interface, and
if we were to suddenly require no new ioctls, and require everything to
be a syscall, we would do nothing except make apis more complex (hint,
you now have to do extra validation on your file descriptor passed to
you to determine if it really is what you can properly operate your
ioctl on), and cause no real benefit at all.

Yes, people abuse ioctls at times, but really, they are needed.

And remember, I was one of the people who long ago thought we should not
be adding more ioctls, but I have seen the folly of my ways, and chalk
it up to youthful ignorance :)

Hope this helps explain things better,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-23 Thread Greg Kroah-Hartman
On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
 While I agree that there should be a way for userspace to get the list of
 supported operations, userspace apps will only actually care about that
 once, when they begin talking to kdbus, because (ignoring the live kernel
 patching that people have been working on recently) the list of supported
 operations isn't going to change while the system is running.  While a u64
 copy has relatively low overhead, it does have overhead, and that is very
 significant when you consider part of the reason some people want kdbus is
 for the performance gain.  Especially for those automotive applications that
 have been mentioned which fire off thousands of messages during start-up,
 every little bit of performance is significant.

A single u64 in a structure is not going to be measurable at all,
processors just copy memory too fast these days for 4 extra bytes to be
noticable.  So let's make this as easy as possible for userspace, making
it simpler logic there, which is much more important than saving
theoretical time in the kernel.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-23 Thread Greg Kroah-Hartman
On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote:
  And that process seems to be frequent and ongoing even now. (And 
  it's to your great credit that the API/ABI breaks are clearly and honestly 
  marked in the kdbus.h changelog.) All of this lightens the burden of API
  design for kernel developers, but I'm concerned that the long-term pain
  for user-space developers who use an API which (in my estimation) may
  come to be widely used will be enormous.
  
  Yes, we've jointly reviewed the API details again until just recently to
  unify structs and enums etc, and added fields to make the ioctls structs
  more versatile for possible future additions. By that, we effectively
  broke the ABI, but we did that because we know we can't do such things
  again in the future.
  
  But again - I don't see how this would be different when using syscalls
  rather than ioctls to transport information between the driver and
  userspace. Could you elaborate?
 
 My suspicion is that not nearly enough thinking has yet been done about
 the design of the API. That's based on these observations:
 
 * Documentation that is, considering the size of the API, *way* too thin.
 * Some parts of the API not documented at all (various kdbus_item blobs)
 * ABI changes happening even quite recently
 * API oddities such as the 'kernel_flags' fields. Why do I need to
   be told what flags the kernel supports on *every* operation?
 
 The above is just after a day of looking hard at kdbus.txt. I strongly
 suspect I'd find a lot of other issues if I spent more time on kdbus.

not enough thinking?

We started working on kdbus 2 years ago this FOSDEM (in a few weeks.)
Before that we have been thinking about this for many years, learning
from the previous attempts to get this type of feature merged into the
kernel, talking with users about what they need for this, and soliciting
kernel developer's opinions on what type of API would be best for this
type of feature.

Since then we have done nothing but constantly revise the API.  My first
mock ups were way too simple, and in discussing things with people much
more knowledgeable about D-Bus, they pointed out the problems, and we
iterated.  And iterated.  And iterated some more.  We have worked with
just about every userspace libdbus developer group, including QtDbus
developers as well as glib developers.  Now not all of them agreed with
some of our decisions in the implementation, which is fair enough, you
can't please everyone, but they _all_ agree that what we have now is the
proper way to implement this type of functionality and have reviewed the
features as being correct and compatible with their needs and users.

Those discussions have happened in emails, presentations, meetings, and
hackfests pretty much continuously for the past 2 years all around the
world.

We have stress-tested the api with both unit tests (which are included
here in the patch set) as well as a real-world implementation (sd-bus in
the systemd source repo.)  That real-world implementation successfully
has been booting many of our daily machines for many months now.

Yes, the documentation can always be better, but please don't confuse
the lack of understanding how D-Bus works and its model with the lack of
understanding this kdbus implementation, the two are not comparable.
For some good primers on what D-Bus is, and the terminology it expects
see:
http://dbus.freedesktop.org/doc/dbus-tutorial.html
and also:
http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc

We are not going to put a basic here is what D-Bus is and how to use
it into the kernel tree, that is totally outside the scope here.

I suggest reading the tutorial above, and then going back and reading
the kdbus documentation provided.  If you think we are lacking stuff on
the kdbus side, we will be glad to flush out any needed areas.

Also, Daniel has said he will work on a basic userspace example
library to show how to use this api, which might make the api a bit
easier to understand.

However, I personally don't think this example code is necessary at
all.  We don't ask for this type of simple examples from other new
kernel apis we create and add to the kernel all the time.  We require
there to be a user of the api, but not one that is so well documented
that others can write a from-scratch raw userspace replacement.
Specific examples of this are my previously mentioned ioctl users
(btrfs, mei, mic, openvlan, etc.), and the grand-daddy of all horrid
apis, DRM.

Users aren't going to be writing their own raw kdbus libraries.  Or if
they are, they are going to start with one of the existing
implementations we have (the test examples and sd-bus, and I think there
is a native Go implementation somewhere as well.)  Users are going to be
using those libraries to write their code, and to be honest, the user
api for sd-bus is a delight to use compared to the old style libdbus
interface as we have the benefit of 10 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-22 Thread Ahmed S. Darwish
On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
> From: Daniel Mack 
> 
> kdbus is a system for low-latency, low-overhead, easy to use
> interprocess communication (IPC).
> 
> The interface to all functions in this driver is implemented via ioctls
> on files exposed through a filesystem called 'kdbusfs'. The default
> mount point of kdbusfs is /sys/fs/kdbus.

Pardon my ignorance, but we've always been told that adding
new ioctl()s to the kernel is a very big no-no.  But given
the seniority of the folks stewarding this kdbus effort,
there must be a good rationale ;-)

So, can the rationale behind introducing new ioctl()s be
further explained? It would be even better if it's included
in the documentation patch itself.

Thanks,
Darwish
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-22 Thread Austin S Hemmelgarn

On 2015-01-22 08:46, David Herrmann wrote:

Hi Michael

On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages)
 wrote:

* API oddities such as the 'kernel_flags' fields. Why do I need to
   be told what flags the kernel supports on *every* operation?


If we only returned EINVAL on invalid arguments, user-space had to
probe for each flag to see whether it's supported. By returning the
set of supported flags, user-space can cache those and _reliably_ know
which flags are supported.
We decided the overhead of a single u64 copy on each ioctl is
preferred over a separate syscall/ioctl to query kernel flags. If you
disagree, please elaborate (preferably with a suggestion how to do it
better).

While I agree that there should be a way for userspace to get the list 
of supported operations, userspace apps will only actually care about 
that once, when they begin talking to kdbus, because (ignoring the live 
kernel patching that people have been working on recently) the list of 
supported operations isn't going to change while the system is running. 
 While a u64 copy has relatively low overhead, it does have overhead, 
and that is very significant when you consider part of the reason some 
people want kdbus is for the performance gain.  Especially for those 
automotive applications that have been mentioned which fire off 
thousands of messages during start-up, every little bit of performance 
is significant.




smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH 01/13] kdbus: add documentation

2015-01-22 Thread David Herrmann
Hi Michael

On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages)
 wrote:
> On 01/21/2015 05:58 PM, Daniel Mack wrote:
 Also, the context the kdbus commands operate on originate from a
 mountable special-purpose file system.
>>>
>>> It's not clear to me how this point implies any particular API design
>>> choice.
>>
>> It emphasizes the fact that our ioctl API can only be used with the
>> nodes exposed by kdbusfs, and vice versa. I think operations on
>> driver-specific files do not justify a new 'generic' syscall API.
>
> I see your (repeated) use of words like "driver" as just a distraction,
> I'm sorry. "Driver-specific files" is an implementation detail. The
> point is that kdbus provides a complex, general-purpose feature for all
> of userland. It is core infrastructure that will be used by key pieces
> of the plumbing layer, and likely by many other applications as well.
> *Please* stop suggesting that it is not core infrastructure: kdbus
> has the potential to be a great IPC that will be very useful to many,
> many user-space developers.

We called it an 'ipc driver' so far. It is in no way meant as
distraction. Feel free to call it 'core infrastructure'. I think we
can agree that we want it to be generically useful, like other ipc
mechanisms, including UDS and netlink.

> (By the way, we have precedents for device/filesystem-specific system
> calls. Even a recent one, in memfd_create().)

memfd_create() is in no way file-system specific. Internally, it uses
shmem, but that's an implementation detail. The API does not expose
this in any way. If you were referring to sealing, it's implemented as
fcntl(), not as a separate syscall. Furthermore, sealing is only
limited to shmem as it's the only volatile storage. It's not an API
restriction. Other volatile file-systems are free to implement
sealing.

>>> ioctl() is a get-out-of-jail free card when it comes to API design.
>>
>> And how are syscalls different in that regard when they would both
>> transport the same data structures?
>
> My general impression is that when it comes to system calls,
> there's usually a lot more up front thought about API design.

This is no technical reason why to use syscalls over ioctls. Imho,
it's rather a reason to improve the kernel's ioctl-review process.

>> Also note that all kdbus ioctls
>> necessarily operate on a file descriptor context, which an ioctl passes
>> along by default.
>
> I fail to see your point here. The same statement applies to multiple
> special-purpose system calls (epoll, inotify, various shared memory APIs,
> and so on).

epoll and inotify don't have a 'parent' object living in the
file-system. They *need* an entry-point. We can use open() for that.

You're right, from a technical viewpoint, there's no restriction.
There're examples for both (eg., see Solaris /dev/poll, as
ioctl()-based 'epoll').

>>> Rather
>>> than thinking carefully and long about a set of coherent, stable APIs that
>>> provide some degree of type-safety to user-space, one just 
>>> adds/changes/removes
>>> an ioctl.
>>
>> Adding another ioctl in the future for cases we didn't think about
>> earlier would of course be considered a workaround; and even though such
>> things happen all the time, it's certainly something we'd like to avoid.
>>
>> However, we would also need to add another syscall in such cases, which
>> is even worse IMO. So that's really not an argument against ioctls after
>> all. What fundamental difference between a raw syscall and a ioctl,
>> especially with regards to type safety, is there which would help us here?
>
> Type safety helps user-space application developers. System calls have
> it, ioctl() does not.

This is simply not true. There is no type-safety in:
syscall(__NR_xyz, some, random, arguments)

The only way a syscall gets 'type-safe', is to provide a wrapper
function. Same applies to ioctls. But people tend to not do that for
ioctls, which is, again, not a technical argument against ioctls. It's
a matter of psychology, though.

I still don't see a technical reason to use syscalls. API proposals welcome!

We're now working on a small kdbus helper library, which provides
type-safe ioctl wrappers, item-iterators and documented examples. But,
like syscalls, nobody is forced to use the wrappers. The API design is
not affected by this.

>>> And that process seems to be frequent and ongoing even now. (And
>>> it's to your great credit that the API/ABI breaks are clearly and honestly
>>> marked in the kdbus.h changelog.) All of this lightens the burden of API
>>> design for kernel developers, but I'm concerned that the long-term pain
>>> for user-space developers who use an API which (in my estimation) may
>>> come to be widely used will be enormous.
>>
>> Yes, we've jointly reviewed the API details again until just recently to
>> unify structs and enums etc, and added fields to make the ioctls structs
>> more versatile for possible future additions. By that, we effectively
>> 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-22 Thread Michael Kerrisk (man-pages)
Hi Daniel,

On 01/21/2015 05:58 PM, Daniel Mack wrote:
> Hi Michael,
> 
> On 01/21/2015 11:32 AM, Michael Kerrisk (man-pages) wrote:
>> On 01/20/2015 07:23 PM, Daniel Mack wrote:
> 
>>> It's rather an optional driver than a core kernel feature.
>>
>> Given the various things that I've seen said about kdbus, the
>> preceding sentence makes little sense to me:
>>
>> * kdbus will be the framework supporting user-space D-Bus in the
>>   future, and also used by systemd, and so on pretty much every 
>>   desktop system.
>> * kdbus solves much of the bandwidth problem of D-Bus1. That,
>>   along with a host of other features mean that there will be
>>   a lot of user-space developers interested in using this API.
>> * Various parties in user space are already expressing strong 
>>   interest in kdbus.
>>
>> My guess from the above? This will NOT be an "optional driver". 
>> It *will be* a core kernel feature.
> 
> There will be userlands that will depend on kdbus, but that will still
> not turn the "driver" into a core Linux kernel feature. We really want
> it to be losely coupled from the rest of the kernel and optional after
> all. The kernel people are working toward making more and more things
> optional these days, and there will still be lots of systems that won't
> be using kdbus.

Make it optional and configurable if you want. But that misses my
point. kdbus is very likely to become an essential, widely used
piece of *general-purpose* piece of ABI infrastructure that will
be configured into virtually every type of system. As such, the same
standards should apply as for a "core kernel feature", whether or
not you want to cal it that.

>>> Also, the context the kdbus commands operate on originate from a
>>> mountable special-purpose file system.
>>
>> It's not clear to me how this point implies any particular API design
>> choice.
> 
> It emphasizes the fact that our ioctl API can only be used with the
> nodes exposed by kdbusfs, and vice versa. I think operations on
> driver-specific files do not justify a new 'generic' syscall API.

I see your (repeated) use of words like "driver" as just a distraction, 
I'm sorry. "Driver-specific files" is an implementation detail. The
point is that kdbus provides a complex, general-purpose feature for all
of userland. It is core infrastructure that will be used by key pieces 
of the plumbing layer, and likely by many other applications as well.
*Please* stop suggesting that it is not core infrastructure: kdbus
has the potential to be a great IPC that will be very useful to many,
many user-space developers.

(By the way, we have precedents for device/filesystem-specific system
calls. Even a recent one, in memfd_create().)

>> Notwithstanding the fact that there's a lot of (good) information in
>> kdbus.txt, there's not nearly enough for someone to create useful, 
>> robust applications that use that API (and not enough that I as a
>> reviewer feel comfortable about reviewing the API). As things stand,
>> user-space developers will be forced to decipher large amounts of kernel
>> code and existing applications in order to actually build things. And
>> when they do, they'll be using one of the worst APIs known to man: ioctl(),
>> an API that provides no type safety at all.
> 
> I don't see how ioctls are any worse than syscalls with pointers to
> structures. One can screw up compatibility either way. How is an ioctl
> wrapper/prototype any less type-safe than a syscall wrapper?

Taking that argument to the extreme, we would have no system calls 
at all, just one gigantic ioctl() ;-).

>> ioctl() is a get-out-of-jail free card when it comes to API design.
> 
> And how are syscalls different in that regard when they would both
> transport the same data structures? 

My general impression is that when it comes to system calls,
there's usually a lot more up front thought about API design.

> Also note that all kdbus ioctls
> necessarily operate on a file descriptor context, which an ioctl passes
> along by default.

I fail to see your point here. The same statement applies to multiple
special-purpose system calls (epoll, inotify, various shared memory APIs, 
and so on).

>> Rather
>> than thinking carefully and long about a set of coherent, stable APIs that 
>> provide some degree of type-safety to user-space, one just 
>> adds/changes/removes
>> an ioctl.
> 
> Adding another ioctl in the future for cases we didn't think about
> earlier would of course be considered a workaround; and even though such
> things happen all the time, it's certainly something we'd like to avoid.
> 
> However, we would also need to add another syscall in such cases, which
> is even worse IMO. So that's really not an argument against ioctls after
> all. What fundamental difference between a raw syscall and a ioctl,
> especially with regards to type safety, is there which would help us here?

Type safety helps user-space application developers. System calls have 
it, ioctl() does not.

>> And that 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-22 Thread David Herrmann
Hi Michael

On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages)
mtk.manpa...@gmail.com wrote:
 On 01/21/2015 05:58 PM, Daniel Mack wrote:
 Also, the context the kdbus commands operate on originate from a
 mountable special-purpose file system.

 It's not clear to me how this point implies any particular API design
 choice.

 It emphasizes the fact that our ioctl API can only be used with the
 nodes exposed by kdbusfs, and vice versa. I think operations on
 driver-specific files do not justify a new 'generic' syscall API.

 I see your (repeated) use of words like driver as just a distraction,
 I'm sorry. Driver-specific files is an implementation detail. The
 point is that kdbus provides a complex, general-purpose feature for all
 of userland. It is core infrastructure that will be used by key pieces
 of the plumbing layer, and likely by many other applications as well.
 *Please* stop suggesting that it is not core infrastructure: kdbus
 has the potential to be a great IPC that will be very useful to many,
 many user-space developers.

We called it an 'ipc driver' so far. It is in no way meant as
distraction. Feel free to call it 'core infrastructure'. I think we
can agree that we want it to be generically useful, like other ipc
mechanisms, including UDS and netlink.

 (By the way, we have precedents for device/filesystem-specific system
 calls. Even a recent one, in memfd_create().)

memfd_create() is in no way file-system specific. Internally, it uses
shmem, but that's an implementation detail. The API does not expose
this in any way. If you were referring to sealing, it's implemented as
fcntl(), not as a separate syscall. Furthermore, sealing is only
limited to shmem as it's the only volatile storage. It's not an API
restriction. Other volatile file-systems are free to implement
sealing.

 ioctl() is a get-out-of-jail free card when it comes to API design.

 And how are syscalls different in that regard when they would both
 transport the same data structures?

 My general impression is that when it comes to system calls,
 there's usually a lot more up front thought about API design.

This is no technical reason why to use syscalls over ioctls. Imho,
it's rather a reason to improve the kernel's ioctl-review process.

 Also note that all kdbus ioctls
 necessarily operate on a file descriptor context, which an ioctl passes
 along by default.

 I fail to see your point here. The same statement applies to multiple
 special-purpose system calls (epoll, inotify, various shared memory APIs,
 and so on).

epoll and inotify don't have a 'parent' object living in the
file-system. They *need* an entry-point. We can use open() for that.

You're right, from a technical viewpoint, there's no restriction.
There're examples for both (eg., see Solaris /dev/poll, as
ioctl()-based 'epoll').

 Rather
 than thinking carefully and long about a set of coherent, stable APIs that
 provide some degree of type-safety to user-space, one just 
 adds/changes/removes
 an ioctl.

 Adding another ioctl in the future for cases we didn't think about
 earlier would of course be considered a workaround; and even though such
 things happen all the time, it's certainly something we'd like to avoid.

 However, we would also need to add another syscall in such cases, which
 is even worse IMO. So that's really not an argument against ioctls after
 all. What fundamental difference between a raw syscall and a ioctl,
 especially with regards to type safety, is there which would help us here?

 Type safety helps user-space application developers. System calls have
 it, ioctl() does not.

This is simply not true. There is no type-safety in:
syscall(__NR_xyz, some, random, arguments)

The only way a syscall gets 'type-safe', is to provide a wrapper
function. Same applies to ioctls. But people tend to not do that for
ioctls, which is, again, not a technical argument against ioctls. It's
a matter of psychology, though.

I still don't see a technical reason to use syscalls. API proposals welcome!

We're now working on a small kdbus helper library, which provides
type-safe ioctl wrappers, item-iterators and documented examples. But,
like syscalls, nobody is forced to use the wrappers. The API design is
not affected by this.

 And that process seems to be frequent and ongoing even now. (And
 it's to your great credit that the API/ABI breaks are clearly and honestly
 marked in the kdbus.h changelog.) All of this lightens the burden of API
 design for kernel developers, but I'm concerned that the long-term pain
 for user-space developers who use an API which (in my estimation) may
 come to be widely used will be enormous.

 Yes, we've jointly reviewed the API details again until just recently to
 unify structs and enums etc, and added fields to make the ioctls structs
 more versatile for possible future additions. By that, we effectively
 broke the ABI, but we did that because we know we can't do such things
 again in the future.

 But again - 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-22 Thread Michael Kerrisk (man-pages)
Hi Daniel,

On 01/21/2015 05:58 PM, Daniel Mack wrote:
 Hi Michael,
 
 On 01/21/2015 11:32 AM, Michael Kerrisk (man-pages) wrote:
 On 01/20/2015 07:23 PM, Daniel Mack wrote:
 
 It's rather an optional driver than a core kernel feature.

 Given the various things that I've seen said about kdbus, the
 preceding sentence makes little sense to me:

 * kdbus will be the framework supporting user-space D-Bus in the
   future, and also used by systemd, and so on pretty much every 
   desktop system.
 * kdbus solves much of the bandwidth problem of D-Bus1. That,
   along with a host of other features mean that there will be
   a lot of user-space developers interested in using this API.
 * Various parties in user space are already expressing strong 
   interest in kdbus.

 My guess from the above? This will NOT be an optional driver. 
 It *will be* a core kernel feature.
 
 There will be userlands that will depend on kdbus, but that will still
 not turn the driver into a core Linux kernel feature. We really want
 it to be losely coupled from the rest of the kernel and optional after
 all. The kernel people are working toward making more and more things
 optional these days, and there will still be lots of systems that won't
 be using kdbus.

Make it optional and configurable if you want. But that misses my
point. kdbus is very likely to become an essential, widely used
piece of *general-purpose* piece of ABI infrastructure that will
be configured into virtually every type of system. As such, the same
standards should apply as for a core kernel feature, whether or
not you want to cal it that.

 Also, the context the kdbus commands operate on originate from a
 mountable special-purpose file system.

 It's not clear to me how this point implies any particular API design
 choice.
 
 It emphasizes the fact that our ioctl API can only be used with the
 nodes exposed by kdbusfs, and vice versa. I think operations on
 driver-specific files do not justify a new 'generic' syscall API.

I see your (repeated) use of words like driver as just a distraction, 
I'm sorry. Driver-specific files is an implementation detail. The
point is that kdbus provides a complex, general-purpose feature for all
of userland. It is core infrastructure that will be used by key pieces 
of the plumbing layer, and likely by many other applications as well.
*Please* stop suggesting that it is not core infrastructure: kdbus
has the potential to be a great IPC that will be very useful to many,
many user-space developers.

(By the way, we have precedents for device/filesystem-specific system
calls. Even a recent one, in memfd_create().)

 Notwithstanding the fact that there's a lot of (good) information in
 kdbus.txt, there's not nearly enough for someone to create useful, 
 robust applications that use that API (and not enough that I as a
 reviewer feel comfortable about reviewing the API). As things stand,
 user-space developers will be forced to decipher large amounts of kernel
 code and existing applications in order to actually build things. And
 when they do, they'll be using one of the worst APIs known to man: ioctl(),
 an API that provides no type safety at all.
 
 I don't see how ioctls are any worse than syscalls with pointers to
 structures. One can screw up compatibility either way. How is an ioctl
 wrapper/prototype any less type-safe than a syscall wrapper?

Taking that argument to the extreme, we would have no system calls 
at all, just one gigantic ioctl() ;-).

 ioctl() is a get-out-of-jail free card when it comes to API design.
 
 And how are syscalls different in that regard when they would both
 transport the same data structures? 

My general impression is that when it comes to system calls,
there's usually a lot more up front thought about API design.

 Also note that all kdbus ioctls
 necessarily operate on a file descriptor context, which an ioctl passes
 along by default.

I fail to see your point here. The same statement applies to multiple
special-purpose system calls (epoll, inotify, various shared memory APIs, 
and so on).

 Rather
 than thinking carefully and long about a set of coherent, stable APIs that 
 provide some degree of type-safety to user-space, one just 
 adds/changes/removes
 an ioctl.
 
 Adding another ioctl in the future for cases we didn't think about
 earlier would of course be considered a workaround; and even though such
 things happen all the time, it's certainly something we'd like to avoid.
 
 However, we would also need to add another syscall in such cases, which
 is even worse IMO. So that's really not an argument against ioctls after
 all. What fundamental difference between a raw syscall and a ioctl,
 especially with regards to type safety, is there which would help us here?

Type safety helps user-space application developers. System calls have 
it, ioctl() does not.

 And that process seems to be frequent and ongoing even now. (And 
 it's to your great credit that the API/ABI breaks are clearly 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-22 Thread Austin S Hemmelgarn

On 2015-01-22 08:46, David Herrmann wrote:

Hi Michael

On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages)
mtk.manpa...@gmail.com wrote:

* API oddities such as the 'kernel_flags' fields. Why do I need to
   be told what flags the kernel supports on *every* operation?


If we only returned EINVAL on invalid arguments, user-space had to
probe for each flag to see whether it's supported. By returning the
set of supported flags, user-space can cache those and _reliably_ know
which flags are supported.
We decided the overhead of a single u64 copy on each ioctl is
preferred over a separate syscall/ioctl to query kernel flags. If you
disagree, please elaborate (preferably with a suggestion how to do it
better).

While I agree that there should be a way for userspace to get the list 
of supported operations, userspace apps will only actually care about 
that once, when they begin talking to kdbus, because (ignoring the live 
kernel patching that people have been working on recently) the list of 
supported operations isn't going to change while the system is running. 
 While a u64 copy has relatively low overhead, it does have overhead, 
and that is very significant when you consider part of the reason some 
people want kdbus is for the performance gain.  Especially for those 
automotive applications that have been mentioned which fire off 
thousands of messages during start-up, every little bit of performance 
is significant.




smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH 01/13] kdbus: add documentation

2015-01-22 Thread Ahmed S. Darwish
On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
 From: Daniel Mack dan...@zonque.org
 
 kdbus is a system for low-latency, low-overhead, easy to use
 interprocess communication (IPC).
 
 The interface to all functions in this driver is implemented via ioctls
 on files exposed through a filesystem called 'kdbusfs'. The default
 mount point of kdbusfs is /sys/fs/kdbus.

Pardon my ignorance, but we've always been told that adding
new ioctl()s to the kernel is a very big no-no.  But given
the seniority of the folks stewarding this kdbus effort,
there must be a good rationale ;-)

So, can the rationale behind introducing new ioctl()s be
further explained? It would be even better if it's included
in the documentation patch itself.

Thanks,
Darwish
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Daniel Mack
Hi Michael,

On 01/21/2015 11:32 AM, Michael Kerrisk (man-pages) wrote:
> On 01/20/2015 07:23 PM, Daniel Mack wrote:

>> It's rather an optional driver than a core kernel feature.
> 
> Given the various things that I've seen said about kdbus, the
> preceding sentence makes little sense to me:
> 
> * kdbus will be the framework supporting user-space D-Bus in the
>   future, and also used by systemd, and so on pretty much every 
>   desktop system.
> * kdbus solves much of the bandwidth problem of D-Bus1. That,
>   along with a host of other features mean that there will be
>   a lot of user-space developers interested in using this API.
> * Various parties in user space are already expressing strong 
>   interest in kdbus.
> 
> My guess from the above? This will NOT be an "optional driver". 
> It *will be* a core kernel feature.

There will be userlands that will depend on kdbus, but that will still
not turn the "driver" into a core Linux kernel feature. We really want
it to be losely coupled from the rest of the kernel and optional after
all. The kernel people are working toward making more and more things
optional these days, and there will still be lots of systems that won't
be using kdbus.

>> Also, the context the kdbus commands operate on originate from a
>> mountable special-purpose file system.
> 
> It's not clear to me how this point implies any particular API design
> choice.

It emphasizes the fact that our ioctl API can only be used with the
nodes exposed by kdbusfs, and vice versa. I think operations on
driver-specific files do not justify a new 'generic' syscall API.

> Notwithstanding the fact that there's a lot of (good) information in
> kdbus.txt, there's not nearly enough for someone to create useful, 
> robust applications that use that API (and not enough that I as a
> reviewer feel comfortable about reviewing the API). As things stand,
> user-space developers will be forced to decipher large amounts of kernel
> code and existing applications in order to actually build things. And
> when they do, they'll be using one of the worst APIs known to man: ioctl(),
> an API that provides no type safety at all.

I don't see how ioctls are any worse than syscalls with pointers to
structures. One can screw up compatibility either way. How is an ioctl
wrapper/prototype any less type-safe than a syscall wrapper?

> ioctl() is a get-out-of-jail free card when it comes to API design.

And how are syscalls different in that regard when they would both
transport the same data structures? Also note that all kdbus ioctls
necessarily operate on a file descriptor context, which an ioctl passes
along by default.

> Rather
> than thinking carefully and long about a set of coherent, stable APIs that 
> provide some degree of type-safety to user-space, one just 
> adds/changes/removes
> an ioctl.

Adding another ioctl in the future for cases we didn't think about
earlier would of course be considered a workaround; and even though such
things happen all the time, it's certainly something we'd like to avoid.

However, we would also need to add another syscall in such cases, which
is even worse IMO. So that's really not an argument against ioctls after
all. What fundamental difference between a raw syscall and a ioctl,
especially with regards to type safety, is there which would help us here?

> And that process seems to be frequent and ongoing even now. (And 
> it's to your great credit that the API/ABI breaks are clearly and honestly 
> marked in the kdbus.h changelog.) All of this lightens the burden of API
> design for kernel developers, but I'm concerned that the long-term pain
> for user-space developers who use an API which (in my estimation) may
> come to be widely used will be enormous.

Yes, we've jointly reviewed the API details again until just recently to
unify structs and enums etc, and added fields to make the ioctls structs
more versatile for possible future additions. By that, we effectively
broke the ABI, but we did that because we know we can't do such things
again in the future.

But again - I don't see how this would be different when using syscalls
rather than ioctls to transport information between the driver and
userspace. Could you elaborate?

> Concretely, I'd like to see the following in kdbus.txt:
> * A lot more detail, adding the various pieces that are currently missing.
>   I've mentioned already the absence of detail on the item blob structures, 
>   but there's probably several other pieces as well. My problem is that the
>   API is so big and hard to grok that it's hard to even begin to work out
>   what's missing from the documentation.
> * Fleshing out the API summaries with code snippets that illustrate the
>   use of the APIs.
> * At least one simple working example application, complete with a walk
>   through of how it's built and run.
> 
> Yes, all of this is a big demand. But this is a big API that is being added 
> to the kernel, and one that may become widely used and 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Theodore Ts'o
On Wed, Jan 21, 2015 at 11:32:59AM +0100, Michael Kerrisk (man-pages) wrote:
> It's rather an optional driver than a core kernel feature.
> 
> Given the various things that I've seen said about kdbus, the
> preceding sentence makes little sense to me:
> 
> * kdbus will be the framework supporting user-space D-Bus in the
>   future, and also used by systemd, and so on pretty much every 
>   desktop system.

I have to agree with Michael here; it's really, **really**
disengenuous to say that that if you don't want kdbus, you can just
#ifconfig it out.  The fact that it systemd will be using it means
that it will very shortly become a core kernel feature which is
absolutely mandatory.  Sure, maybe it can be configured out for "tiny
kernels", just as in theory we can configure out the VM system for
really tiny embedded systems.  But we should be treating this as
something that is not optional, because the reality is that's the way
it's going to be in very short order.  So if that means to use proper
system calls instead of ioctls, we should do that.

   - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Michael Kerrisk (man-pages)
Hello Daniel,

On 01/20/2015 07:23 PM, Daniel Mack wrote:
> On 01/20/2015 02:53 PM, Michael Kerrisk (man-pages) wrote:
>> This is an enormous and complex API. Why is the API ioctl() based,
>> rather than system-call-based? Have we learned nothing from the hydra
>> that the futex() multiplexing syscall became? (And kdbus is an order
>> of magnitude more complex, by the look of things.) At the very least,
>> a *good* justification of why the API is ioctl()-based should be part
>> of this documentation file.
> 
> I think the simplest reason is because we want to be able to build kdbus
> as a module. 

This isn't any _good_ justification...

> It's rather an optional driver than a core kernel feature.

Given the various things that I've seen said about kdbus, the
preceding sentence makes little sense to me:

* kdbus will be the framework supporting user-space D-Bus in the
  future, and also used by systemd, and so on pretty much every 
  desktop system.
* kdbus solves much of the bandwidth problem of D-Bus1. That,
  along with a host of other features mean that there will be
  a lot of user-space developers interested in using this API.
* Various parties in user space are already expressing strong 
  interest in kdbus.

My guess from the above? This will NOT be an "optional driver". 
It *will be* a core kernel feature.

> IMO, kernel primitives should be syscalls, but kdbus is not a primitive
> but an elaborate subsystem.

Agreed. It's an elaborate subsystem. But that fact doesn't in itself
dictate any particular API design choice.

> Also, the context the kdbus commands operate on originate from a
> mountable special-purpose file system.

It's not clear to me how this point implies any particular API design
choice.

> Hence, we decided not to use a
> global kernel interface but specific ioctls on the nodes exposed by kdbusfs.

I don't follow the reasoning here at all. Here's what we have, if I
have grasped it roughly correctly:

* 16 ioctls exposed to user space.
* some 20 different structures exchanged between kernel and user space
* about 14k lines of kernel code implement the above
* some rather thin documentation of the whole lot

Sorry if that last point seems rather harsh. I know that you personally 
have done a lot of work on the kdbus.txt file. David Herrmann asserts
that this is a simple API. It is not. He also suggests that there is
no need for a libkdbus. I don't know whether that's right or not, but the
point is then that there's an expectation that the raw kernel API is what
user space will need to work with. 

Notwithstanding the fact that there's a lot of (good) information in
kdbus.txt, there's not nearly enough for someone to create useful, 
robust applications that use that API (and not enough that I as a
reviewer feel comfortable about reviewing the API). As things stand,
user-space developers will be forced to decipher large amounts of kernel
code and existing applications in order to actually build things. And
when they do, they'll be using one of the worst APIs known to man: ioctl(),
an API that provides no type safety at all.

ioctl() is a get-out-of-jail free card when it comes to API design. Rather
than thinking carefully and long about a set of coherent, stable APIs that 
provide some degree of type-safety to user-space, one just adds/changes/removes
an ioctl. And that process seems to be frequent and ongoing even now. (And 
it's to your great credit that the API/ABI breaks are clearly and honestly 
marked in the kdbus.h changelog.) All of this lightens the burden of API
design for kernel developers, but I'm concerned that the long-term pain
for user-space developers who use an API which (in my estimation) may
come to be widely used will be enormous.

Concretely, I'd like to see the following in kdbus.txt:
* A lot more detail, adding the various pieces that are currently missing.
  I've mentioned already the absence of detail on the item blob structures, 
  but there's probably several other pieces as well. My problem is that the
  API is so big and hard to grok that it's hard to even begin to work out
  what's missing from the documentation.
* Fleshing out the API summaries with code snippets that illustrate the
  use of the APIs.
* At least one simple working example application, complete with a walk
  through of how it's built and run.

Yes, all of this is a big demand. But this is a big API that is being added 
to the kernel, and one that may become widely used and for a long time.
It's imperative that the API is well documented, and as well designed as
possible. Furthermore, with such improved documentation I feel we'd be in 
a better position to evaluate the merits of an ioctl()-based API versus
some other approach.

Thanks,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Michael Kerrisk (man-pages)
Hi David,

On 01/20/2015 03:31 PM, David Herrmann wrote:
> Hi Michael
> 
> On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages)
>  wrote:
>> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
>>> From: Daniel Mack 
>>>
>>> kdbus is a system for low-latency, low-overhead, easy to use
>>> interprocess communication (IPC).
>>>
>>> The interface to all functions in this driver is implemented via ioctls
>>> on files exposed through a filesystem called 'kdbusfs'. The default
>>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
>>> documentation about the kernel level API design.
>>
>> I have some details feedback on the contents of this file, and some
>> bigger questions. I'll split them out into separate mails.
>>
>> So here, the bigger, general questions to start with. I've arrived late
>> to this, so sorry if they've already been discussed, but the answers to
>> some of the questions should actually be in this file, I would have
>> expected.
>>
>> This is an enormous and complex API. Why is the API ioctl() based,
>> rather than system-call-based? Have we learned nothing from the hydra
>> that the futex() multiplexing syscall became? (And kdbus is an order
>> of magnitude more complex, by the look of things.) At the very least,
>> a *good* justification of why the API is ioctl()-based should be part
>> of this documentation file.
>>
>> An observation: The documentation below is substantial, but this API is
>> enormous, so the documentation still feels rather thin. What would
>> really help would be some example code in the doc.
>>
>> And on the subject of code examples... Are there any (prototype)
>> working user-space applications that exercise the current kdbus
>> implementation? That is, can I install these kdbus patches, and
>> then find a simple example application somewhere that does
>> something to exercise kdbus?
> 
> If you run a 3.18 kernel, you can install kdbus.ko from our repository
> and boot a full Fedora system running Gnome3 with kdbus, given that
> you compiled systemd with --enable-kdbus (which is still
> experimental). No legacy dbus1 daemon is running. Instead, we have a
> bus-proxy that converts classic dbus1 to kdbus, so all
> bus-communication runs on kdbus.

Good to hear.  I think that some info like this should go out in 
the "00/" covering mails for future patch revisions, so that people
can get some sense of the testing that has been done.

>> And then: is there any substantial real-world application (e.g., a
>> full D-Bus port) that is being developed in tandem with this kernel
>> side patch? (I don't mean a user-space library; I mean a seriously
>> large application.) This is an incredibly complex API whose
>> failings are only going to become evident through real-world use.
>> Solidifying an API in the kernel and then discovering the API
>> problems later when writing real-world applications would make for
>> a sad story. A story something like that of inotify, an API which
>> is an order of magnitude less complex than kdbus. (I can't help but
>> feel that many of inotify problems that I discuss at
>> https://lwn.net/Articles/605128/ might have been fixed or mitigated
>> if a few real-world applications had been implemented before the
>> API  was set in stone.)
> 
> I think running a whole Gnome3 stack counts as "substantial real-world
> application", right? 

Yes, I'll give you that ;-).

>  Sure, it's a dbus1-to-kdbus layer, but all the
> systemd tools use kdbus natively and it works just fine. In fact, we
> all run kdbus on our main-systems every day.
> 
> We've spent over a year fixing races and API misdesigns, we've talked
> to other toolkit developers (glib, qt, ..) and made sure we're
> backwards compatible to dbus1. I don't think the API is perfect,
> everyone makes mistakes. But with bus-proxy and systemd we have two
> huge users of kdbus that put a lot of pressure on API design.

I'll say more about that in another mail in a moment. I'm not enthusiastic
about the API.

>>> +For a kdbus specific userspace library implementation please refer to:
>>> +  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
>>
>> Is this library intended just for systemd? More generally, is there an
>> intention to provide a general purpose library API for kdbus? Or is the
>> intention that each application will roll a library suitable to its
>> needs? I think an answer to that question would be useful in this
>> Documentation file.
> 
> kdbus is in no way bound to systemd. There are ongoing efforts to port
> glib and qt to kdbus natively. The API is pretty simple 
 
I think you and I must have quite different definitions of "simple"...
(For more on this point, see my reply to Daniel in a moment.)

> and I don't
> see how a libkdbus would simplify things. In fact, even our tests only
> have slim wrappers around the ioctls to simplify error-handling in
> test-scenarios.

Again, the above info 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Daniel Mack
On 01/21/2015 10:07 AM, Michael Kerrisk (man-pages) wrote:
> On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:

>>> +Also, if the connection allowed for file descriptor to be passed
>>> +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
>>> +installed into the receiving process after the KDBUS_CMD_RECV ioctl 
>>> returns.
>>
>> ###
>> "after"??? When exactly?
> 
> By the way, what was the answer to this question?

I've corrected the wording on this. The file descriptors are installed
when the RECV ioctl is called, so they are ready to use when the call
returns.


Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Michael Kerrisk (man-pages)
Daniel,

On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:
> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
>> From: Daniel Mack 
>>

[...]

>> +offset field contains the location of the new message inside the receiver's
>> +pool. The message is stored as struct kdbus_msg at this offset, and can be
>> +interpreted with the semantics described above.
>> +
>> +Also, if the connection allowed for file descriptor to be passed
>> +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
>> +installed into the receiving process after the KDBUS_CMD_RECV ioctl returns.
> 
> ###
> "after"??? When exactly?

By the way, what was the answer to this question?

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Daniel Mack
Hi Michael,

On 01/21/2015 09:57 AM, Michael Kerrisk (man-pages) wrote:
> On 01/20/2015 06:50 PM, Daniel Mack wrote:

>> I've addressed all but the below issues, following your suggestions.
> 
> Are your changes already visible somewhere?

Yes, in the upstream repo for the standalone module, which we also use
to build the patch set from:

  https://code.google.com/p/d-bus/source/browse/kdbus.txt

>> Hmm, you're quoting text from section 5, and section 4 actually
>> describes the concept of items quite well I believe?
> 
> Well, Section 4 is pretty short. My point is that most of the various 
> blob formats (e.g., kdbus_pids, kdbus_caps, kdbus_memfd) are not
> documented in kdbus.txt. They all should be, IMO.

Okay, I'll add some text about them.


Best regards,
Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Michael Kerrisk (man-pages)
Hi Daniel,

On 01/20/2015 06:50 PM, Daniel Mack wrote:
> Hi Michael,
> 
> Thanks a lot for for intense review of the documentation. Much appreciated.
> 
> I've addressed all but the below issues, following your suggestions.

Are your changes already visible somewhere?

> On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:
>>> +and KDBUS_CMD_ENDPOINT_MAKE (see above).
>>> +
>>> +Following items are expected for KDBUS_CMD_BUS_MAKE:
>>> +KDBUS_ITEM_MAKE_NAME
>>> +  Contains a string to identify the bus name.
>>
>> So, up to here, I've seen no definition of 'kdbus_item', which leaves me 
>> asking questions like: what subfield is KDBUS_ITEM_MAKE_NAME stored in?
>> which subfield holds the pointer to the string?
>>
>> Somewhere earlier,  kdbus_item needs to be exaplained in more detail, 
>> I think.
> 
> Hmm, you're quoting text from section 5, and section 4 actually
> describes the concept of items quite well I believe?

Well, Section 4 is pretty short. My point is that most of the various 
blob formats (e.g., kdbus_pids, kdbus_caps, kdbus_memfd) are not
documented in kdbus.txt. They all should be, IMO.

>>> +  __s64 priority;
>>> +With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in
>>> +the queue with at least the given priority. If no such message is 
>>> waiting
>>> +in the queue, -ENOMSG is returned.
>>
>> ###
>> How do I simply select the highest priority message, without knowing what 
>> its priority is?
> 
> The wording is indeed unclear here. KDBUS_RECV_USE_PRIORITY causes the
> messages to be dequeued by their priority. The 'priority' field is
> simply a filter that request a minimum priority. By setting this field
> to the highest possible value, you effectively bypass the filter. I've
> added a better description now.

Thanks for the clarification.

>>> +  -ENOMEM  The kernel memory is exhausted
>>> +  -ENOTTY  Illegal ioctl command issued for the file descriptor
>>
>> Why ENOTTY here, rather than EINVAL? The latter is, I beleive, the usual 
>> ioctl() error for invalid commands, I believe (If you keep ENOTTY, add an
>> explanation in this document.)
> 
> Hmm, no. -ENOTTY is commonly used as return code when calling ioctls
> that can't be handled by the FDs they're called on. 'man errno(3)' even
> states: "ENOTTY   Inappropriate I/O control operation (POSIX.1)".

Okay.

>>> +  -EINVAL  Unsupported item attached to command
>>> +
>>> +For all ioctls that carry a struct as payload:
>>> +
>>> +  -EFAULT  The supplied data pointer was not 64-bit aligned, or was
>>> +   inaccessible from the kernel side.
>>> +  -EINVAL  The size inside the supplied struct was smaller than expected
>>> +  -EMSGSIZEThe size inside the supplied struct was bigger than 
>>> expected
>>
>> Why two different errors for smaller and larger than expected? (If you keep 
>> things this
>> way, pelase explain the reason in this document.)
> 
> Providing a struct that is smaller than the minimum doesn't give the
> ioctl a valid set of information to process the request. Hence, I think
> -EINVAL is appropriate. However, -EMSGSIZE is something that users might
> hit when they make message payloads too big, and I think it's good to
> have a change to distinguish the two cases in error handling. I added
> something in the document now.

Thanks.

> Again, thanks a lot for reading the documentation so accurately!

You're welcome.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Daniel Mack
Hi Michael,

On 01/21/2015 11:32 AM, Michael Kerrisk (man-pages) wrote:
 On 01/20/2015 07:23 PM, Daniel Mack wrote:

 It's rather an optional driver than a core kernel feature.
 
 Given the various things that I've seen said about kdbus, the
 preceding sentence makes little sense to me:
 
 * kdbus will be the framework supporting user-space D-Bus in the
   future, and also used by systemd, and so on pretty much every 
   desktop system.
 * kdbus solves much of the bandwidth problem of D-Bus1. That,
   along with a host of other features mean that there will be
   a lot of user-space developers interested in using this API.
 * Various parties in user space are already expressing strong 
   interest in kdbus.
 
 My guess from the above? This will NOT be an optional driver. 
 It *will be* a core kernel feature.

There will be userlands that will depend on kdbus, but that will still
not turn the driver into a core Linux kernel feature. We really want
it to be losely coupled from the rest of the kernel and optional after
all. The kernel people are working toward making more and more things
optional these days, and there will still be lots of systems that won't
be using kdbus.

 Also, the context the kdbus commands operate on originate from a
 mountable special-purpose file system.
 
 It's not clear to me how this point implies any particular API design
 choice.

It emphasizes the fact that our ioctl API can only be used with the
nodes exposed by kdbusfs, and vice versa. I think operations on
driver-specific files do not justify a new 'generic' syscall API.

 Notwithstanding the fact that there's a lot of (good) information in
 kdbus.txt, there's not nearly enough for someone to create useful, 
 robust applications that use that API (and not enough that I as a
 reviewer feel comfortable about reviewing the API). As things stand,
 user-space developers will be forced to decipher large amounts of kernel
 code and existing applications in order to actually build things. And
 when they do, they'll be using one of the worst APIs known to man: ioctl(),
 an API that provides no type safety at all.

I don't see how ioctls are any worse than syscalls with pointers to
structures. One can screw up compatibility either way. How is an ioctl
wrapper/prototype any less type-safe than a syscall wrapper?

 ioctl() is a get-out-of-jail free card when it comes to API design.

And how are syscalls different in that regard when they would both
transport the same data structures? Also note that all kdbus ioctls
necessarily operate on a file descriptor context, which an ioctl passes
along by default.

 Rather
 than thinking carefully and long about a set of coherent, stable APIs that 
 provide some degree of type-safety to user-space, one just 
 adds/changes/removes
 an ioctl.

Adding another ioctl in the future for cases we didn't think about
earlier would of course be considered a workaround; and even though such
things happen all the time, it's certainly something we'd like to avoid.

However, we would also need to add another syscall in such cases, which
is even worse IMO. So that's really not an argument against ioctls after
all. What fundamental difference between a raw syscall and a ioctl,
especially with regards to type safety, is there which would help us here?

 And that process seems to be frequent and ongoing even now. (And 
 it's to your great credit that the API/ABI breaks are clearly and honestly 
 marked in the kdbus.h changelog.) All of this lightens the burden of API
 design for kernel developers, but I'm concerned that the long-term pain
 for user-space developers who use an API which (in my estimation) may
 come to be widely used will be enormous.

Yes, we've jointly reviewed the API details again until just recently to
unify structs and enums etc, and added fields to make the ioctls structs
more versatile for possible future additions. By that, we effectively
broke the ABI, but we did that because we know we can't do such things
again in the future.

But again - I don't see how this would be different when using syscalls
rather than ioctls to transport information between the driver and
userspace. Could you elaborate?

 Concretely, I'd like to see the following in kdbus.txt:
 * A lot more detail, adding the various pieces that are currently missing.
   I've mentioned already the absence of detail on the item blob structures, 
   but there's probably several other pieces as well. My problem is that the
   API is so big and hard to grok that it's hard to even begin to work out
   what's missing from the documentation.
 * Fleshing out the API summaries with code snippets that illustrate the
   use of the APIs.
 * At least one simple working example application, complete with a walk
   through of how it's built and run.
 
 Yes, all of this is a big demand. But this is a big API that is being added 
 to the kernel, and one that may become widely used and for a long time.

Fair enough. Everything that helps people 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Michael Kerrisk (man-pages)
Hi David,

On 01/20/2015 03:31 PM, David Herrmann wrote:
 Hi Michael
 
 On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages)
 mtk.manpa...@gmail.com wrote:
 On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
 From: Daniel Mack dan...@zonque.org

 kdbus is a system for low-latency, low-overhead, easy to use
 interprocess communication (IPC).

 The interface to all functions in this driver is implemented via ioctls
 on files exposed through a filesystem called 'kdbusfs'. The default
 mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
 documentation about the kernel level API design.

 I have some details feedback on the contents of this file, and some
 bigger questions. I'll split them out into separate mails.

 So here, the bigger, general questions to start with. I've arrived late
 to this, so sorry if they've already been discussed, but the answers to
 some of the questions should actually be in this file, I would have
 expected.

 This is an enormous and complex API. Why is the API ioctl() based,
 rather than system-call-based? Have we learned nothing from the hydra
 that the futex() multiplexing syscall became? (And kdbus is an order
 of magnitude more complex, by the look of things.) At the very least,
 a *good* justification of why the API is ioctl()-based should be part
 of this documentation file.

 An observation: The documentation below is substantial, but this API is
 enormous, so the documentation still feels rather thin. What would
 really help would be some example code in the doc.

 And on the subject of code examples... Are there any (prototype)
 working user-space applications that exercise the current kdbus
 implementation? That is, can I install these kdbus patches, and
 then find a simple example application somewhere that does
 something to exercise kdbus?
 
 If you run a 3.18 kernel, you can install kdbus.ko from our repository
 and boot a full Fedora system running Gnome3 with kdbus, given that
 you compiled systemd with --enable-kdbus (which is still
 experimental). No legacy dbus1 daemon is running. Instead, we have a
 bus-proxy that converts classic dbus1 to kdbus, so all
 bus-communication runs on kdbus.

Good to hear.  I think that some info like this should go out in 
the 00/ covering mails for future patch revisions, so that people
can get some sense of the testing that has been done.

 And then: is there any substantial real-world application (e.g., a
 full D-Bus port) that is being developed in tandem with this kernel
 side patch? (I don't mean a user-space library; I mean a seriously
 large application.) This is an incredibly complex API whose
 failings are only going to become evident through real-world use.
 Solidifying an API in the kernel and then discovering the API
 problems later when writing real-world applications would make for
 a sad story. A story something like that of inotify, an API which
 is an order of magnitude less complex than kdbus. (I can't help but
 feel that many of inotify problems that I discuss at
 https://lwn.net/Articles/605128/ might have been fixed or mitigated
 if a few real-world applications had been implemented before the
 API  was set in stone.)
 
 I think running a whole Gnome3 stack counts as substantial real-world
 application, right? 

Yes, I'll give you that ;-).

  Sure, it's a dbus1-to-kdbus layer, but all the
 systemd tools use kdbus natively and it works just fine. In fact, we
 all run kdbus on our main-systems every day.
 
 We've spent over a year fixing races and API misdesigns, we've talked
 to other toolkit developers (glib, qt, ..) and made sure we're
 backwards compatible to dbus1. I don't think the API is perfect,
 everyone makes mistakes. But with bus-proxy and systemd we have two
 huge users of kdbus that put a lot of pressure on API design.

I'll say more about that in another mail in a moment. I'm not enthusiastic
about the API.

 +For a kdbus specific userspace library implementation please refer to:
 +  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h

 Is this library intended just for systemd? More generally, is there an
 intention to provide a general purpose library API for kdbus? Or is the
 intention that each application will roll a library suitable to its
 needs? I think an answer to that question would be useful in this
 Documentation file.
 
 kdbus is in no way bound to systemd. There are ongoing efforts to port
 glib and qt to kdbus natively. The API is pretty simple 
 
I think you and I must have quite different definitions of simple...
(For more on this point, see my reply to Daniel in a moment.)

 and I don't
 see how a libkdbus would simplify things. In fact, even our tests only
 have slim wrappers around the ioctls to simplify error-handling in
 test-scenarios.

Again, the above info would be useful in the Documentation file.

 Note that most of the toolkit work is on the marshaling level, which
 is 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Michael Kerrisk (man-pages)
Hello Daniel,

On 01/20/2015 07:23 PM, Daniel Mack wrote:
 On 01/20/2015 02:53 PM, Michael Kerrisk (man-pages) wrote:
 This is an enormous and complex API. Why is the API ioctl() based,
 rather than system-call-based? Have we learned nothing from the hydra
 that the futex() multiplexing syscall became? (And kdbus is an order
 of magnitude more complex, by the look of things.) At the very least,
 a *good* justification of why the API is ioctl()-based should be part
 of this documentation file.
 
 I think the simplest reason is because we want to be able to build kdbus
 as a module. 

This isn't any _good_ justification...

 It's rather an optional driver than a core kernel feature.

Given the various things that I've seen said about kdbus, the
preceding sentence makes little sense to me:

* kdbus will be the framework supporting user-space D-Bus in the
  future, and also used by systemd, and so on pretty much every 
  desktop system.
* kdbus solves much of the bandwidth problem of D-Bus1. That,
  along with a host of other features mean that there will be
  a lot of user-space developers interested in using this API.
* Various parties in user space are already expressing strong 
  interest in kdbus.

My guess from the above? This will NOT be an optional driver. 
It *will be* a core kernel feature.

 IMO, kernel primitives should be syscalls, but kdbus is not a primitive
 but an elaborate subsystem.

Agreed. It's an elaborate subsystem. But that fact doesn't in itself
dictate any particular API design choice.

 Also, the context the kdbus commands operate on originate from a
 mountable special-purpose file system.

It's not clear to me how this point implies any particular API design
choice.

 Hence, we decided not to use a
 global kernel interface but specific ioctls on the nodes exposed by kdbusfs.

I don't follow the reasoning here at all. Here's what we have, if I
have grasped it roughly correctly:

* 16 ioctls exposed to user space.
* some 20 different structures exchanged between kernel and user space
* about 14k lines of kernel code implement the above
* some rather thin documentation of the whole lot

Sorry if that last point seems rather harsh. I know that you personally 
have done a lot of work on the kdbus.txt file. David Herrmann asserts
that this is a simple API. It is not. He also suggests that there is
no need for a libkdbus. I don't know whether that's right or not, but the
point is then that there's an expectation that the raw kernel API is what
user space will need to work with. 

Notwithstanding the fact that there's a lot of (good) information in
kdbus.txt, there's not nearly enough for someone to create useful, 
robust applications that use that API (and not enough that I as a
reviewer feel comfortable about reviewing the API). As things stand,
user-space developers will be forced to decipher large amounts of kernel
code and existing applications in order to actually build things. And
when they do, they'll be using one of the worst APIs known to man: ioctl(),
an API that provides no type safety at all.

ioctl() is a get-out-of-jail free card when it comes to API design. Rather
than thinking carefully and long about a set of coherent, stable APIs that 
provide some degree of type-safety to user-space, one just adds/changes/removes
an ioctl. And that process seems to be frequent and ongoing even now. (And 
it's to your great credit that the API/ABI breaks are clearly and honestly 
marked in the kdbus.h changelog.) All of this lightens the burden of API
design for kernel developers, but I'm concerned that the long-term pain
for user-space developers who use an API which (in my estimation) may
come to be widely used will be enormous.

Concretely, I'd like to see the following in kdbus.txt:
* A lot more detail, adding the various pieces that are currently missing.
  I've mentioned already the absence of detail on the item blob structures, 
  but there's probably several other pieces as well. My problem is that the
  API is so big and hard to grok that it's hard to even begin to work out
  what's missing from the documentation.
* Fleshing out the API summaries with code snippets that illustrate the
  use of the APIs.
* At least one simple working example application, complete with a walk
  through of how it's built and run.

Yes, all of this is a big demand. But this is a big API that is being added 
to the kernel, and one that may become widely used and for a long time.
It's imperative that the API is well documented, and as well designed as
possible. Furthermore, with such improved documentation I feel we'd be in 
a better position to evaluate the merits of an ioctl()-based API versus
some other approach.

Thanks,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Daniel Mack
Hi Michael,

On 01/21/2015 09:57 AM, Michael Kerrisk (man-pages) wrote:
 On 01/20/2015 06:50 PM, Daniel Mack wrote:

 I've addressed all but the below issues, following your suggestions.
 
 Are your changes already visible somewhere?

Yes, in the upstream repo for the standalone module, which we also use
to build the patch set from:

  https://code.google.com/p/d-bus/source/browse/kdbus.txt

 Hmm, you're quoting text from section 5, and section 4 actually
 describes the concept of items quite well I believe?
 
 Well, Section 4 is pretty short. My point is that most of the various 
 blob formats (e.g., kdbus_pids, kdbus_caps, kdbus_memfd) are not
 documented in kdbus.txt. They all should be, IMO.

Okay, I'll add some text about them.


Best regards,
Daniel

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Daniel Mack
On 01/21/2015 10:07 AM, Michael Kerrisk (man-pages) wrote:
 On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:

 +Also, if the connection allowed for file descriptor to be passed
 +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
 +installed into the receiving process after the KDBUS_CMD_RECV ioctl 
 returns.

 ###
 after??? When exactly?
 
 By the way, what was the answer to this question?

I've corrected the wording on this. The file descriptors are installed
when the RECV ioctl is called, so they are ready to use when the call
returns.


Thanks,
Daniel
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Michael Kerrisk (man-pages)
Hi Daniel,

On 01/20/2015 06:50 PM, Daniel Mack wrote:
 Hi Michael,
 
 Thanks a lot for for intense review of the documentation. Much appreciated.
 
 I've addressed all but the below issues, following your suggestions.

Are your changes already visible somewhere?

 On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:
 +and KDBUS_CMD_ENDPOINT_MAKE (see above).
 +
 +Following items are expected for KDBUS_CMD_BUS_MAKE:
 +KDBUS_ITEM_MAKE_NAME
 +  Contains a string to identify the bus name.

 So, up to here, I've seen no definition of 'kdbus_item', which leaves me 
 asking questions like: what subfield is KDBUS_ITEM_MAKE_NAME stored in?
 which subfield holds the pointer to the string?

 Somewhere earlier,  kdbus_item needs to be exaplained in more detail, 
 I think.
 
 Hmm, you're quoting text from section 5, and section 4 actually
 describes the concept of items quite well I believe?

Well, Section 4 is pretty short. My point is that most of the various 
blob formats (e.g., kdbus_pids, kdbus_caps, kdbus_memfd) are not
documented in kdbus.txt. They all should be, IMO.

 +  __s64 priority;
 +With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in
 +the queue with at least the given priority. If no such message is 
 waiting
 +in the queue, -ENOMSG is returned.

 ###
 How do I simply select the highest priority message, without knowing what 
 its priority is?
 
 The wording is indeed unclear here. KDBUS_RECV_USE_PRIORITY causes the
 messages to be dequeued by their priority. The 'priority' field is
 simply a filter that request a minimum priority. By setting this field
 to the highest possible value, you effectively bypass the filter. I've
 added a better description now.

Thanks for the clarification.

 +  -ENOMEM  The kernel memory is exhausted
 +  -ENOTTY  Illegal ioctl command issued for the file descriptor

 Why ENOTTY here, rather than EINVAL? The latter is, I beleive, the usual 
 ioctl() error for invalid commands, I believe (If you keep ENOTTY, add an
 explanation in this document.)
 
 Hmm, no. -ENOTTY is commonly used as return code when calling ioctls
 that can't be handled by the FDs they're called on. 'man errno(3)' even
 states: ENOTTY   Inappropriate I/O control operation (POSIX.1).

Okay.

 +  -EINVAL  Unsupported item attached to command
 +
 +For all ioctls that carry a struct as payload:
 +
 +  -EFAULT  The supplied data pointer was not 64-bit aligned, or was
 +   inaccessible from the kernel side.
 +  -EINVAL  The size inside the supplied struct was smaller than expected
 +  -EMSGSIZEThe size inside the supplied struct was bigger than 
 expected

 Why two different errors for smaller and larger than expected? (If you keep 
 things this
 way, pelase explain the reason in this document.)
 
 Providing a struct that is smaller than the minimum doesn't give the
 ioctl a valid set of information to process the request. Hence, I think
 -EINVAL is appropriate. However, -EMSGSIZE is something that users might
 hit when they make message payloads too big, and I think it's good to
 have a change to distinguish the two cases in error handling. I added
 something in the document now.

Thanks.

 Again, thanks a lot for reading the documentation so accurately!

You're welcome.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Michael Kerrisk (man-pages)
Daniel,

On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:
 On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
 From: Daniel Mack dan...@zonque.org


[...]

 +offset field contains the location of the new message inside the receiver's
 +pool. The message is stored as struct kdbus_msg at this offset, and can be
 +interpreted with the semantics described above.
 +
 +Also, if the connection allowed for file descriptor to be passed
 +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
 +installed into the receiving process after the KDBUS_CMD_RECV ioctl returns.
 
 ###
 after??? When exactly?

By the way, what was the answer to this question?

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-21 Thread Theodore Ts'o
On Wed, Jan 21, 2015 at 11:32:59AM +0100, Michael Kerrisk (man-pages) wrote:
 It's rather an optional driver than a core kernel feature.
 
 Given the various things that I've seen said about kdbus, the
 preceding sentence makes little sense to me:
 
 * kdbus will be the framework supporting user-space D-Bus in the
   future, and also used by systemd, and so on pretty much every 
   desktop system.

I have to agree with Michael here; it's really, **really**
disengenuous to say that that if you don't want kdbus, you can just
#ifconfig it out.  The fact that it systemd will be using it means
that it will very shortly become a core kernel feature which is
absolutely mandatory.  Sure, maybe it can be configured out for tiny
kernels, just as in theory we can configure out the VM system for
really tiny embedded systems.  But we should be treating this as
something that is not optional, because the reality is that's the way
it's going to be in very short order.  So if that means to use proper
system calls instead of ioctls, we should do that.

   - Ted
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-20 Thread Johannes Stezenbach
Hi David,

On Tue, Jan 20, 2015 at 06:00:28PM +0100, David Herrmann wrote:
[big snip]
> These are just examples off the top of my head, but I think they're
> already pretty convincing.

Thank you for writing this up.  This is the information I was
looking for which puts kdbus into context and explains
the motivation for its development.  Naturally I don't agree
with all of it, but I'm content with what I learned so far.

Daniel informed me off-list that he (and probably others) does
not understand what my questions were aiming at.  I'm sorry
about that, I thought it was clear I was just lacking
the background information to understand what kdbus is and
what it is not, and why it exists -- information I couldn't find
in some hours of googling.


Thanks,
Johannes
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-20 Thread Daniel Mack
On 01/20/2015 02:53 PM, Michael Kerrisk (man-pages) wrote:
> This is an enormous and complex API. Why is the API ioctl() based,
> rather than system-call-based? Have we learned nothing from the hydra
> that the futex() multiplexing syscall became? (And kdbus is an order
> of magnitude more complex, by the look of things.) At the very least,
> a *good* justification of why the API is ioctl()-based should be part
> of this documentation file.

I think the simplest reason is because we want to be able to build kdbus
as a module. It's rather an optional driver than a core kernel feature.
IMO, kernel primitives should be syscalls, but kdbus is not a primitive
but an elaborate subsystem.

Also, the context the kdbus commands operate on originate from a
mountable special-purpose file system. Hence, we decided not to use a
global kernel interface but specific ioctls on the nodes exposed by kdbusfs.


Thanks,
Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-20 Thread Daniel Mack
Hi Michael,

Thanks a lot for for intense review of the documentation. Much appreciated.

I've addressed all but the below issues, following your suggestions.


On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:
>> +and KDBUS_CMD_ENDPOINT_MAKE (see above).
>> +
>> +Following items are expected for KDBUS_CMD_BUS_MAKE:
>> +KDBUS_ITEM_MAKE_NAME
>> +  Contains a string to identify the bus name.
> 
> So, up to here, I've seen no definition of 'kdbus_item', which leaves me 
> asking questions like: what subfield is KDBUS_ITEM_MAKE_NAME stored in?
> which subfield holds the pointer to the string?
> 
> Somewhere earlier,  kdbus_item needs to be exaplained in more detail, 
> I think.

Hmm, you're quoting text from section 5, and section 4 actually
describes the concept of items quite well I believe?

>> +  __s64 priority;
>> +With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in
>> +the queue with at least the given priority. If no such message is 
>> waiting
>> +in the queue, -ENOMSG is returned.
> 
> ###
> How do I simply select the highest priority message, without knowing what 
> its priority is?

The wording is indeed unclear here. KDBUS_RECV_USE_PRIORITY causes the
messages to be dequeued by their priority. The 'priority' field is
simply a filter that request a minimum priority. By setting this field
to the highest possible value, you effectively bypass the filter. I've
added a better description now.

>> +  -ENOMEM   The kernel memory is exhausted
>> +  -ENOTTY   Illegal ioctl command issued for the file descriptor
> 
> Why ENOTTY here, rather than EINVAL? The latter is, I beleive, the usual 
> ioctl() error for invalid commands, I believe (If you keep ENOTTY, add an
> explanation in this document.)

Hmm, no. -ENOTTY is commonly used as return code when calling ioctls
that can't be handled by the FDs they're called on. 'man errno(3)' even
states: "ENOTTY   Inappropriate I/O control operation (POSIX.1)".

>> +  -EINVAL   Unsupported item attached to command
>> +
>> +For all ioctls that carry a struct as payload:
>> +
>> +  -EFAULT   The supplied data pointer was not 64-bit aligned, or was
>> +inaccessible from the kernel side.
>> +  -EINVAL   The size inside the supplied struct was smaller than expected
>> +  -EMSGSIZE The size inside the supplied struct was bigger than expected
> 
> Why two different errors for smaller and larger than expected? (If you keep 
> things this
> way, pelase explain the reason in this document.)

Providing a struct that is smaller than the minimum doesn't give the
ioctl a valid set of information to process the request. Hence, I think
-EINVAL is appropriate. However, -EMSGSIZE is something that users might
hit when they make message payloads too big, and I think it's good to
have a change to distinguish the two cases in error handling. I added
something in the document now.


Again, thanks a lot for reading the documentation so accurately!

Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-20 Thread David Herrmann
Hi

On Tue, Jan 20, 2015 at 5:08 PM, Johannes Stezenbach  wrote:
> However, let me repeat and rephrase my previous questions:
> Is there a noticable or measurable improvement from using kdbus?
> IOW, is the added complexity of kdbus worth the result?
>
> I have stated my believe that current usage of D-Bus is not
> performance sensitive and the number of messages exchanged
> is low.  I would love it if you would prove me wrong.
> Or if you could show that any D-Bus related bug in Gnome3
> is fixed by kdbus.

DBus is not used for performance sensitive applications because DBus
is slow. We want to make it fast so we can finally use it for
low-latency, high-throughput applications. A simple DBus
method-call+reply takes 200us here, with kdbus it takes 8us (with UDS
about 2us). If I increase the packet size from 8k to 128k, kdbus even
tops UDS thanks to single-copy transfers.
The fact that there is no performance-critical application using DBus
is, imho, an argument *pro* kdbus. People haven't been capable of
making classic dbus1 fast enough for low-latency audio, thus, we
present kdbus.

Starting up 'gdm' sends ~5k dbus messages on my machine. It takes >1s
to transmit the messages alone. Each dbus1 message has to be copied 4
times for each direction. With kdbus, each message is copied only once
for each transmission (or not at all, if you use memfds, though that
doesn't mean it's necessarily faster). No intermediate context-switch
is needed. This makes kdbus capable to transmit low-latency audio data
*inline*.

DBus marshaling is the de-facto standard in all major(!) linux desktop
systems. It is well established and accepted by many DEs. It also
solves many other problems, including: policy,
authentication/authorization, well-known name registry, efficient
broadcasts/multicasts, peer discovery, bus discovery, metadata
transmission, and more.
It is a shame that we cannot use this well-established protocol for
low-latency applications. We, effectively, have to duplicate all this
code on custom UDS and other transports just because DBus is too slow.

kdbus tries to unify those efforts, so that we don't need multiple
policy implementations, name registries and peer discovery mechanisms.
Furthermore, kdbus implements comprehensive, yet optional, metadata
transmission that allows to identify and authenticate peers in a
race-free manner (which is *not* possible with UDS).
Also, kdbus provides a single transport bus with sequential message
numbering. If you use multiple channels, you cannot give any ordering
guarantees across peers (for instance, regarding parallel
name-registry changes).


Given these theoretical advantages, here're some examples:

*) The Tizen developers have been complaining about the high latency
of DBus for polkit'ish policy queries. That's why their authentication
framework uses custom UDS sockets (called 'Cynara'). If a
UI-interaction needs multiple authentication-queries, you don't want
it to take multiple milliseconds, given that you usually want to
render the result in the same frame.

*) PulseAudio doesn't use DBus for data transmission. They had to
implement their own marshaling code, transport layer and so on, just
because DBus1-latency is horrible. With kdbus, we can basically drop
this code-duplication and unify the IPC layer. Same is true for
Wayland, btw.

*) By moving broadcast-transmission into the kernel, we can use the
time-slices of the sender to perform heavy operations. This is also
true for policy decisions, etc. With a userspace daemon, we cannot
perform operations in a time-slice of the caller. This makes DoS
attacks much harder.

*) With priority-inheritance, we can do synchronous calls into trusted
peers and let them optionally use our time-slice to perform the
action. This allows syscall-like/binder-like method-calls into other
processes. Without priority-inheritance, this is not possible in a
secure manner (see 'priority-inheritance').

*) Logging-daemons often want to attach metadata to log-messages so
debugging/filtering gets easier. If short-lived programs send
log-messages, the destination peer might not be able to read such
metadata from /proc, as the process might no longer be available at
that time. Same is true for policy-decisions like polkit does. You
cannot send off method-calls and exit. You have to wait for a reply,
even though you might not even care for it. If you don't wait, the
other side might not be able to verify your identity and as such
reject the request.

*) Even though the dbus traffic on idle-systems might be low, this
doesn't mean it's not significant at boot-times or under high-load. If
you run a dbus-monitor of your choice, you will see there is an
significant number of messages exchanged during VT-switches, startup,
shutdown, suspend, wakeup, hotplugging and similar situations where
lots of control-messages are exchanged. We don't want to spend
hundreds of ms just to transmit those messages.

*) dbus-daemon is not available during 

Re: [PATCH 01/13] kdbus: add documentation

2015-01-20 Thread Johannes Stezenbach
Hi all,

On Tue, Jan 20, 2015 at 03:53:53PM +0100, Djalal Harouni wrote:
> On Tue, Jan 20, 2015 at 09:42:59AM -0500, Josh Boyer wrote:
> > On Tue, Jan 20, 2015 at 9:31 AM, David Herrmann  
> > wrote:
> > >
> > > If you run a 3.18 kernel, you can install kdbus.ko from our repository
> > > and boot a full Fedora system running Gnome3 with kdbus, given that
> > > you compiled systemd with --enable-kdbus (which is still
> > > experimental). No legacy dbus1 daemon is running. Instead, we have a
> > > bus-proxy that converts classic dbus1 to kdbus, so all
> > > bus-communication runs on kdbus.
> > 
> > FWIW, we've been building a "playground" repository for the kernel
> > that contains this already for Fedora.  If you have a stock Fedora 21
> > or rawhide install, you can use:
> > 
> > https://copr.fedoraproject.org/coprs/jwboyer/kernel-playground/
> > 
> > which has the kernel+kdbus and systemd built with --enable-kdbus
> > already.  Easy enough to throw in a VM for testing.
> 
> Another addition, if kdbus is installed and loaded, you could also use
> systemd-nspawn to boot a full system (systemd compiled with
> --enable-kdbus) in a container [1], kdbusfs will be mounted in the
> container.
> 
> There is also the busctl tool to query kdbus...
> 
> http://www.freedesktop.org/wiki/Software/systemd/VirtualizedTesting/

It is reassuring that kdbus actually works :)

However, let me repeat and rephrase my previous questions:
Is there a noticable or measurable improvement from using kdbus?
IOW, is the added complexity of kdbus worth the result?

I have stated my believe that current usage of D-Bus is not
performance sensitive and the number of messages exchanged
is low.  I would love it if you would prove me wrong.
Or if you could show that any D-Bus related bug in Gnome3
is fixed by kdbus.

I would sooo love it if someone would finally post some
data that proves kdbus is useful beyond systemd.


Thanks,
Johannes
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-20 Thread Djalal Harouni
Hi,

On Tue, Jan 20, 2015 at 09:42:59AM -0500, Josh Boyer wrote:
> On Tue, Jan 20, 2015 at 9:31 AM, David Herrmann  wrote:
> > Hi Michael
> >
> > On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages)
> >  wrote:
> >> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
> >>> From: Daniel Mack 
> >>>
> >>> kdbus is a system for low-latency, low-overhead, easy to use
> >>> interprocess communication (IPC).
> >>>
> >>> The interface to all functions in this driver is implemented via ioctls
> >>> on files exposed through a filesystem called 'kdbusfs'. The default
> >>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
> >>> documentation about the kernel level API design.
> >>
> >> I have some details feedback on the contents of this file, and some
> >> bigger questions. I'll split them out into separate mails.
> >>
> >> So here, the bigger, general questions to start with. I've arrived late
> >> to this, so sorry if they've already been discussed, but the answers to
> >> some of the questions should actually be in this file, I would have
> >> expected.
> >>
> >> This is an enormous and complex API. Why is the API ioctl() based,
> >> rather than system-call-based? Have we learned nothing from the hydra
> >> that the futex() multiplexing syscall became? (And kdbus is an order
> >> of magnitude more complex, by the look of things.) At the very least,
> >> a *good* justification of why the API is ioctl()-based should be part
> >> of this documentation file.
> >>
> >> An observation: The documentation below is substantial, but this API is
> >> enormous, so the documentation still feels rather thin. What would
> >> really help would be some example code in the doc.
> >>
> >> And on the subject of code examples... Are there any (prototype)
> >> working user-space applications that exercise the current kdbus
> >> implementation? That is, can I install these kdbus patches, and
> >> then find a simple example application somewhere that does
> >> something to exercise kdbus?
> >
> > If you run a 3.18 kernel, you can install kdbus.ko from our repository
> > and boot a full Fedora system running Gnome3 with kdbus, given that
> > you compiled systemd with --enable-kdbus (which is still
> > experimental). No legacy dbus1 daemon is running. Instead, we have a
> > bus-proxy that converts classic dbus1 to kdbus, so all
> > bus-communication runs on kdbus.
> 
> FWIW, we've been building a "playground" repository for the kernel
> that contains this already for Fedora.  If you have a stock Fedora 21
> or rawhide install, you can use:
> 
> https://copr.fedoraproject.org/coprs/jwboyer/kernel-playground/
> 
> which has the kernel+kdbus and systemd built with --enable-kdbus
> already.  Easy enough to throw in a VM for testing.
> 
> josh
Yes thanks josh!

Another addition, if kdbus is installed and loaded, you could also use
systemd-nspawn to boot a full system (systemd compiled with
--enable-kdbus) in a container [1], kdbusfs will be mounted in the
container.

There is also the busctl tool to query kdbus...

http://www.freedesktop.org/wiki/Software/systemd/VirtualizedTesting/

-- 
Djalal Harouni
http://opendz.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-20 Thread Josh Boyer
On Tue, Jan 20, 2015 at 9:31 AM, David Herrmann  wrote:
> Hi Michael
>
> On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages)
>  wrote:
>> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
>>> From: Daniel Mack 
>>>
>>> kdbus is a system for low-latency, low-overhead, easy to use
>>> interprocess communication (IPC).
>>>
>>> The interface to all functions in this driver is implemented via ioctls
>>> on files exposed through a filesystem called 'kdbusfs'. The default
>>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
>>> documentation about the kernel level API design.
>>
>> I have some details feedback on the contents of this file, and some
>> bigger questions. I'll split them out into separate mails.
>>
>> So here, the bigger, general questions to start with. I've arrived late
>> to this, so sorry if they've already been discussed, but the answers to
>> some of the questions should actually be in this file, I would have
>> expected.
>>
>> This is an enormous and complex API. Why is the API ioctl() based,
>> rather than system-call-based? Have we learned nothing from the hydra
>> that the futex() multiplexing syscall became? (And kdbus is an order
>> of magnitude more complex, by the look of things.) At the very least,
>> a *good* justification of why the API is ioctl()-based should be part
>> of this documentation file.
>>
>> An observation: The documentation below is substantial, but this API is
>> enormous, so the documentation still feels rather thin. What would
>> really help would be some example code in the doc.
>>
>> And on the subject of code examples... Are there any (prototype)
>> working user-space applications that exercise the current kdbus
>> implementation? That is, can I install these kdbus patches, and
>> then find a simple example application somewhere that does
>> something to exercise kdbus?
>
> If you run a 3.18 kernel, you can install kdbus.ko from our repository
> and boot a full Fedora system running Gnome3 with kdbus, given that
> you compiled systemd with --enable-kdbus (which is still
> experimental). No legacy dbus1 daemon is running. Instead, we have a
> bus-proxy that converts classic dbus1 to kdbus, so all
> bus-communication runs on kdbus.

FWIW, we've been building a "playground" repository for the kernel
that contains this already for Fedora.  If you have a stock Fedora 21
or rawhide install, you can use:

https://copr.fedoraproject.org/coprs/jwboyer/kernel-playground/

which has the kernel+kdbus and systemd built with --enable-kdbus
already.  Easy enough to throw in a VM for testing.

josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-20 Thread David Herrmann
Hi Michael

On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages)
 wrote:
> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
>> From: Daniel Mack 
>>
>> kdbus is a system for low-latency, low-overhead, easy to use
>> interprocess communication (IPC).
>>
>> The interface to all functions in this driver is implemented via ioctls
>> on files exposed through a filesystem called 'kdbusfs'. The default
>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
>> documentation about the kernel level API design.
>
> I have some details feedback on the contents of this file, and some
> bigger questions. I'll split them out into separate mails.
>
> So here, the bigger, general questions to start with. I've arrived late
> to this, so sorry if they've already been discussed, but the answers to
> some of the questions should actually be in this file, I would have
> expected.
>
> This is an enormous and complex API. Why is the API ioctl() based,
> rather than system-call-based? Have we learned nothing from the hydra
> that the futex() multiplexing syscall became? (And kdbus is an order
> of magnitude more complex, by the look of things.) At the very least,
> a *good* justification of why the API is ioctl()-based should be part
> of this documentation file.
>
> An observation: The documentation below is substantial, but this API is
> enormous, so the documentation still feels rather thin. What would
> really help would be some example code in the doc.
>
> And on the subject of code examples... Are there any (prototype)
> working user-space applications that exercise the current kdbus
> implementation? That is, can I install these kdbus patches, and
> then find a simple example application somewhere that does
> something to exercise kdbus?

If you run a 3.18 kernel, you can install kdbus.ko from our repository
and boot a full Fedora system running Gnome3 with kdbus, given that
you compiled systemd with --enable-kdbus (which is still
experimental). No legacy dbus1 daemon is running. Instead, we have a
bus-proxy that converts classic dbus1 to kdbus, so all
bus-communication runs on kdbus.

> And then: is there any substantial real-world application (e.g., a
> full D-Bus port) that is being developed in tandem with this kernel
> side patch? (I don't mean a user-space library; I mean a seriously
> large application.) This is an incredibly complex API whose
> failings are only going to become evident through real-world use.
> Solidifying an API in the kernel and then discovering the API
> problems later when writing real-world applications would make for
> a sad story. A story something like that of inotify, an API which
> is an order of magnitude less complex than kdbus. (I can't help but
> feel that many of inotify problems that I discuss at
> https://lwn.net/Articles/605128/ might have been fixed or mitigated
> if a few real-world applications had been implemented before the
> API  was set in stone.)

I think running a whole Gnome3 stack counts as "substantial real-world
application", right? Sure, it's a dbus1-to-kdbus layer, but all the
systemd tools use kdbus natively and it works just fine. In fact, we
all run kdbus on our main-systems every day.

We've spent over a year fixing races and API misdesigns, we've talked
to other toolkit developers (glib, qt, ..) and made sure we're
backwards compatible to dbus1. I don't think the API is perfect,
everyone makes mistakes. But with bus-proxy and systemd we have two
huge users of kdbus that put a lot of pressure on API design.

>> +For a kdbus specific userspace library implementation please refer to:
>> +  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
>
> Is this library intended just for systemd? More generally, is there an
> intention to provide a general purpose library API for kdbus? Or is the
> intention that each application will roll a library suitable to its
> needs? I think an answer to that question would be useful in this
> Documentation file.

kdbus is in no way bound to systemd. There are ongoing efforts to port
glib and qt to kdbus natively. The API is pretty simple and I don't
see how a libkdbus would simplify things. In fact, even our tests only
have slim wrappers around the ioctls to simplify error-handling in
test-scenarios.

Note that most of the toolkit work is on the marshaling level, which
is independent of kdbus. kdbus just provides the transport level. DBus
is just one, yet significant, application-layer on top of kdbus. Our
test-cases use kdbus exclusively to transport raw byte streams.

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/13] kdbus: add documentation

2015-01-20 Thread Michael Kerrisk (man-pages)
On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
> From: Daniel Mack 
> 
> kdbus is a system for low-latency, low-overhead, easy to use
> interprocess communication (IPC).
> 
> The interface to all functions in this driver is implemented via ioctls
> on files exposed through a filesystem called 'kdbusfs'. The default
> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
> documentation about the kernel level API design.

And now the details feedback.

Please note that for the various points I raise below, even in
cases where I don't suggest/request a fix, the fact that I've
needed to answer a question probably suggests a deficiency in 
the documentation that probably needs to be remedied.

Many of my comments below are wording and typo fixes. While
these may sem trivial, the existence of various wording problems
and typos is a significant distraction, especially while trying
to grok a document of this size.

I've marked one or two notable questions about the API with "###".

> Signed-off-by: Daniel Mack 
> Signed-off-by: David Herrmann 
> Signed-off-by: Djalal Harouni 
> Signed-off-by: Greg Kroah-Hartman 
> ---
>  Documentation/kdbus.txt | 2107 
> +++
>  1 file changed, 2107 insertions(+)
>  create mode 100644 Documentation/kdbus.txt
> 
> diff --git a/Documentation/kdbus.txt b/Documentation/kdbus.txt
> new file mode 100644
> index ..2592a7e37079
> --- /dev/null
> +++ b/Documentation/kdbus.txt
> @@ -0,0 +1,2107 @@
> +D-Bus is a system for powerful, easy to use interprocess communication (IPC).
> +
> +The focus of this document is an overview of the low-level, native kernel 
> D-Bus
> +transport called kdbus. Kdbus exposes its functionality via files in a
> +filesystem called 'kdbusfs'. All communication between processes takes place
> +via ioctls on files exposed through the mount point of a kdbusfs. The default
> +mount point of kdbusfs is /sys/fs/kdbus.
> +
> +Please note that kdbus was designed as transport layer for D-Bus, but is in 
> no
> +way limited, nor controlled by the D-Bus protocol specification. The D-Bus
> +protocol is one possible application layer on top of kdbus.
> +
> +For the general D-Bus protocol specification, the payload format, the
> +marshaling, and the communication semantics, please refer to:
> +  http://dbus.freedesktop.org/doc/dbus-specification.html
> +
> +For a kdbus specific userspace library implementation please refer to:
> +  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
> +
> +Articles about D-Bus and kdbus:
> +  http://lwn.net/Articles/580194/
> +
> +
> +1. Terminology
> +===
> +
> +  Domain:
> +A domain is created each time a kdbusfs is mounted. Each process that is
> +capable to mount a new instance of a kdbusfs will have its own kdbus

s/is capable to mount/mounts/

> +hierarchy. Each domain (ie, each mount point) offers its own "control"
> +file to create new buses. Domains have no connection to each other and
> +cannot see nor talk to each other. See section 5 for more details.

Smoother would be:

s/cannot see nor talk/can neither see nor talk/

> +
> +  Bus:
> +A bus is a named object inside a domain. Clients exchange messages
> +over a bus. Multiple buses themselves have no connection to each other;
> +messages can only be exchanged on the same bus. The default endpoint of
> +a bus, where clients establish the connection to, is the "bus" file

Maybe:

where clients establish the connection to
==>
to which clients establish connections
?

> +/sys/fs/kdbus//bus.
> +Common operating system setups create one "system bus" per system, and 
> one
> +"user bus" for every logged-in user. Applications or services may create

At the kdbus level is there any difference between such system and user buses?
If not, it would perhaps be good to insert a parenthetical aside to say 
that.

> +their own private buses.  See section 5 for more details.
> +
> +  Endpoint:
> +An endpoint provides a file to talk to a bus. Opening an endpoint
> +creates a new connection to the bus to which the endpoint belongs. All
> +endpoints have unique names and are accessible as files underneath the
> +directory of a bus, e.g., /sys/fs/kdbus//
> +Every bus has a default endpoint called "bus". A bus can optionally offer
> +additional endpoints with custom names to provide restricted access to 
> the
> +bus. Custom endpoints carry additional policy which can be used to create
> +sandboxes with locked-down, limited, filtered access to a bus.  See
> +section 5 for more details.
> +
> +  Connection:
> +A connection to a bus is created by opening an endpoint file of a bus and
> +becoming an active client with the HELLO exchange. Every ordinary client
> +connection has a unique identifier on the bus and can address messages to
> +

Re: [PATCH 01/13] kdbus: add documentation

2015-01-20 Thread Michael Kerrisk (man-pages)
On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
> From: Daniel Mack 
> 
> kdbus is a system for low-latency, low-overhead, easy to use
> interprocess communication (IPC).
> 
> The interface to all functions in this driver is implemented via ioctls
> on files exposed through a filesystem called 'kdbusfs'. The default
> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
> documentation about the kernel level API design.

I have some details feedback on the contents of this file, and some 
bigger questions. I'll split them out into separate mails.

So here, the bigger, general questions to start with. I've arrived late 
to this, so sorry if they've already been discussed, but the answers to 
some of the questions should actually be in this file, I would have 
expected.

This is an enormous and complex API. Why is the API ioctl() based,
rather than system-call-based? Have we learned nothing from the hydra
that the futex() multiplexing syscall became? (And kdbus is an order
of magnitude more complex, by the look of things.) At the very least,
a *good* justification of why the API is ioctl()-based should be part
of this documentation file.

An observation: The documentation below is substantial, but this API is 
enormous, so the documentation still feels rather thin. What would 
really help would be some example code in the doc. 

And on the subject of code examples... Are there any (prototype) 
working user-space applications that exercise the current kdbus 
implementation? That is, can I install these kdbus patches, and
then find a simple example application somewhere that does
something to exercise kdbus?

And then: is there any substantial real-world application (e.g., a 
full D-Bus port) that is being developed in tandem with this kernel
side patch? (I don't mean a user-space library; I mean a seriously
large application.) This is an incredibly complex API whose
failings are only going to become evident through real-world use.
Solidifying an API in the kernel and then discovering the API
problems later when writing real-world applications would make for
a sad story. A story something like that of inotify, an API which 
is an order of magnitude less complex than kdbus. (I can't help but
feel that many of inotify problems that I discuss at 
https://lwn.net/Articles/605128/ might have been fixed or mitigated 
if a few real-world applications had been implemented before the
API  was set in stone.)

> +For a kdbus specific userspace library implementation please refer to:
> +  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h

Is this library intended just for systemd? More generally, is there an 
intention to provide a general purpose library API for kdbus? Or is the
intention that each application will roll a library suitable to its
needs? I think an answer to that question would be useful in this 
Documentation file.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kdbus: add documentation

2015-01-20 Thread Michael Kerrisk (man-pages)
On 01/20/2015 09:25 AM, Daniel Mack wrote:
> Hi Michael,
> 
> On 01/20/2015 09:09 AM, Michael Kerrisk (man-pages) wrote:
>> On 11/30/2014 06:23 PM, Florian Weimer wrote:
>>> * David Herrmann:
>>>
 On Sun, Nov 30, 2014 at 10:02 AM, Florian Weimer  
 wrote:
> * Greg Kroah-Hartman:
>
>> +7.4 Receiving messages
>>>
> What happens if this is not possible because the file descriptor limit
> of the processes would be exceeded?  EMFILE, and the message will not
> be received?

 The message is returned without installing the FDs. This is signaled
 by EMFILE, but a valid pool offset.
>>>
>>> Oh.  This is really surprising, so it needs documentation.  But it's
>>> probably better than the alternative (return EMFILE and leave the
>>> message stuck, so that you receive it immediately again—this behavior
>>> makes non-blocking accept rather difficult to use correctly).
>>
>> So, was this point in the end explicitly documented? I not
>> obvious that it is documented in the revised kdbus.txt that
>> Greg K-H sent out 4 days ago.
> 
> No, we've revisited this point and changed the kernel behavior again in
> v3. We're no longer returning -EMFILE in this case, but rather set
> KDBUS_RECV_RETURN_INCOMPLETE_FDS in a new field in the receive ioctl
> struct called 'return_flags'. We believe that's a nicer way of signaling
> specific errors. The message will carry -1 for all FDs that failed to
> get installed, so the user can actually see which one is missing.
> 
> That's also documented in kdbus.txt, but we missed putting it into the
> Changelog - sorry for that.

Thanks for the info, Daniel.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kdbus: add documentation

2015-01-20 Thread Daniel Mack
Hi Michael,

On 01/20/2015 09:09 AM, Michael Kerrisk (man-pages) wrote:
> On 11/30/2014 06:23 PM, Florian Weimer wrote:
>> * David Herrmann:
>>
>>> On Sun, Nov 30, 2014 at 10:02 AM, Florian Weimer  wrote:
 * Greg Kroah-Hartman:

> +7.4 Receiving messages
>>
 What happens if this is not possible because the file descriptor limit
 of the processes would be exceeded?  EMFILE, and the message will not
 be received?
>>>
>>> The message is returned without installing the FDs. This is signaled
>>> by EMFILE, but a valid pool offset.
>>
>> Oh.  This is really surprising, so it needs documentation.  But it's
>> probably better than the alternative (return EMFILE and leave the
>> message stuck, so that you receive it immediately again—this behavior
>> makes non-blocking accept rather difficult to use correctly).
> 
> So, was this point in the end explicitly documented? I not
> obvious that it is documented in the revised kdbus.txt that
> Greg K-H sent out 4 days ago.

No, we've revisited this point and changed the kernel behavior again in
v3. We're no longer returning -EMFILE in this case, but rather set
KDBUS_RECV_RETURN_INCOMPLETE_FDS in a new field in the receive ioctl
struct called 'return_flags'. We believe that's a nicer way of signaling
specific errors. The message will carry -1 for all FDs that failed to
get installed, so the user can actually see which one is missing.

That's also documented in kdbus.txt, but we missed putting it into the
Changelog - sorry for that.


Hope this helps,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >