Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-09 Thread Andy Lutomirski
On Wed, Dec 9, 2020 at 11:22 AM Topi Miettinen  wrote:
>
> On 9.12.2020 17.14, Andy Lutomirski wrote:
> >

> Maybe also malware which can escape all means of detection, enforced by
> the CPU? Though I don't know if any malware scanners for Linux work can
> check for fileless, memory only malware.

I don't think this is really relevant to malware detection.  You can't
do syscalls from SGX code, for example, and, even if you could,
malware behavior analysis would be unaffected.  The concern seems to
be more that, once someone has discovered some malware, if it's
protected by SGX then it's plausible that you can't disassemble it.

>
> >
> > In Intel’s original vision, only specially licensed vendors could create 
> > SGX software, but Linux pushed back against this quite hard, and new CPUs 
> > allow unlicensed enclaves. So your Skylake CPUs support SGX, but not on 
> > Linux.
>
> Kudos to Linux for the push.

:)

I don't know if Linux gets full credit for this, but I think we at
least had some impact.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-09 Thread Andy Lutomirski

> On Dec 9, 2020, at 12:58 AM, Topi Miettinen  wrote:
> 
> On 9.12.2020 2.42, Jarkko Sakkinen wrote:
>>> On Wed, Dec 09, 2020 at 02:15:28AM +0200, Jarkko Sakkinen wrote:
>>> On Wed, Dec 09, 2020 at 01:15:27AM +0200, Topi Miettinen wrote:
>>> As  a further argument, I just did this on a Fedora system:
>>> $ find /dev -perm /ugo+x -a \! -type d -a \! -type l
>>> No results.  So making /dev noexec doesn't seem to have any benefit.
>> 
>> It's no surprise that there aren't any executables in /dev since
>> removing MAKEDEV ages ago. That's not the issue, which is that
>> /dev is a writable directory (for UID=0 but no capabilities are
>> needed) and thus a potential location for constructing unapproved
>> executables if it is also mounted exec (W^X).
> 
> UID 0 can just change mount options, though, unless SELinux or similar is 
> used. And SELinux can protect /dev just fine without noexec.
 
 Well, mounting would need CAP_SYS_ADMIN in addition to UID 0. Also SELinux
 is not universal and the policies might not contain all users or services.
 
 -Topi
>>> 
>>> What's the data that supports having noexec /dev anyway? With root
>>> access I can then just use something else like /dev/shm mount.
>>> 
>>> Has there been out in the wild real world cases that noexec mount
>>> of would have prevented?
>> Typo: "of" = "of /dev"
>>> For me this sounds a lot just something that "feels more secure"
>>> without any measurable benefit. Can you prove me wrong?
>> The debate is circled around something not well defined. Of course you
>> get theoretically more safe system when you decrease priviliges anywhere
>> in the system. Like you could start do grazy things with stuff that
>> unprivilged user has access, in order to prevent malware to elevate to
>> UID 0 in the first place.
>> I think where this go intellectually wrong is that we are talking about
>> *default installation* of a distribution. That should have somewhat sane
>> common sense access control settings. For like a normal desktop user
>> noexec /dev will not do any possible favor.
>> Then there is the case when you want to harden installation for an
>> application, let's' say some server. In that case you will anyway
>> fine-tune the security settings and go grazy enough with hardening.
>> When you tailor a server, it's a standard practice to enumerate and
>> adjust the mount points if needed.
> 
> I think we agree that there's a need for either way to allow SGX (if default 
> is hardened) or a hardening option (in the opposite case). For systemd I see 
> two approaches:
> 
> 1. Default noexec /dev, override with something like
> - ExecPaths=/dev
> - MountOptions=/dev:rw,exec,dev,nosuid
> - or even MountOptions=/dev/sgx:rw,exec,dev,nosuid
> - ProtectDev=no
> - AllowSGX=yes
> 
> 2. Default exec /dev, override with
> - NoExecPaths=/dev
> - MountOptions=/dev:rw,noexec,dev,nosuid
> - ProtectDev=yes
> - DenySGX=yes
> 
> I'd prefer 1. but of course 2. would be reasonable.

I would argue for 2, for the following reason.  I absolutely agree that 
hardening a system by making it impossible to create executable code 
dynamically is valuable, but I don’t think it’s a good default. By default, 
programs like gcc and clang should work, but so should JITs, and JITs are 
getting more popular and powerful all the time.  In a default setting that 
allows JITs, etc, I see no benefit at all to making /dev noexec. To the 
contrary, making /dev noexec seems like plugging a little restricted corner of 
code creation (because it requires UID=0) while allowing the easy ways (/tmp, 
/home, /dev/shm, unshare(2), mmap(), etc).  By all means let admins harden 
this, but I see no reason to apply some of the hardening when the rest is 
disabled.

> 
>> To summarize, I neither understand the intended target audience.
> 
> We have something in common: me neither. What's the target audience for SGX? 
> What's the use case? What are the users: browsers, system services? How would 
> applications use SGX? Should udev rules for /dev/sgx make it available to any 
> logged in users with uaccess tags?
> 
> 

I would certainly like it to be available to all software, with the possible 
exception of extra-hardened systems. Using SGX is not really an interesting 
attack surface. The main threat is that malware might use SGX to make itself 
hard to reverse engineer.

In Intel’s original vision, only specially licensed vendors could create SGX 
software, but Linux pushed back against this quite hard, and new CPUs allow 
unlicensed enclaves. So your Skylake CPUs support SGX, but not on Linux.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-08 Thread Andy Lutomirski

> On Dec 8, 2020, at 12:45 PM, Topi Miettinen  wrote:
> 
> On 8.12.2020 20.07, Andy Lutomirski wrote:
>>> On Thu, Nov 19, 2020 at 10:05 AM Topi Miettinen  wrote:
>>> 
>>> On 19.11.2020 18.32, Zbigniew Jędrzejewski-Szmek wrote:
>>>> On Thu, Nov 19, 2020 at 08:17:08AM -0800, Andy Lutomirski wrote:
>>>>> Hi udev people-
>>>>> 
>>>>> The upcoming Linux SGX driver has a device node /dev/sgx.  User code
>>>>> opens it, does various setup things, mmaps it, and needs to be able to
>>>>> create PROT_EXEC mappings.  This gets quite awkward if /dev is mounted
>>>>> noexec.
>>>>> 
>>>>> Can udev arrange to make a device node executable on distros that make
>>>>> /dev noexec?  This could be done by bind-mounting from an exec tmpfs.
>>>>> Alternatively, the kernel could probably learn to ignore noexec on
>>>>> /dev/sgx, but that seems a little bit evil.
>>>> 
>>>> I'd be inclined to simply drop noexec from /dev by default.
>>>> We don't do noexec on either /tmp or /dev/shm (because that causes 
>>>> immediate
>>>> problems with stuff like Java and cffi). And if you have those two at your
>>>> disposal anyway, having noexec on /dev doesn't seem important.
>>> 
>>> I'd propose to not enable exec globally, but if a service needs SGX, it
>>> could use something like MountOptions=/dev:exec only in those cases
>>> where it's needed. That way it's possible to disallow writable and
>>> executable file systems for most services (which typically don't need
>>> /tmp or /dev/shm either). Of course the opposite
>>> (MountOptions=/dev:noexec) would be also possible, but I'd expect that
>>> this would be needed to be used more often.
>>> 
>> I imagine the opposite would be more sensible.  It seems odd to me
>> that we would want any SGX-using service to require both special mount
>> options and regular ACL permissions.
> 
> How common are thes SGX-using services? Will every service start using it 
> without any special measures taken on it's behalf, or perhaps only a special 
> SGX control tool needs access? What about unprivileged user applications, do 
> they ever want to access SGX? Could something like Widevine deep in a browser 
> need to talk to SGX in a DRM scheme?

I honestly don’t know. Widevine is probably some unholy mess of SGX and ME 
crud. But regular user programs may well end up using SGX for little non-evil 
enclaves, e.g. storing their keys securely.  It would be nice if unprivileged 
enclaves just work as long as the use has appropriate permissions on the device 
nodes.

SGX adoption has been severely hampered by the massive series of recent 
vulnerabilities and by Intel’s silly licensing scheme. The latter won’t be 
supported upstream.

> 
>> As  a further argument, I just did this on a Fedora system:
>> $ find /dev -perm /ugo+x -a \! -type d -a \! -type l
>> No results.  So making /dev noexec doesn't seem to have any benefit.
> 
> It's no surprise that there aren't any executables in /dev since removing 
> MAKEDEV ages ago. That's not the issue, which is that /dev is a writable 
> directory (for UID=0 but no capabilities are needed) and thus a potential 
> location for constructing unapproved executables if it is also mounted exec 
> (W^X).

UID 0 can just change mount options, though, unless SELinux or similar is used. 
And SELinux can protect /dev just fine without noexec.

> 
> -Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-08 Thread Andy Lutomirski
On Thu, Nov 19, 2020 at 10:05 AM Topi Miettinen  wrote:
>
> On 19.11.2020 18.32, Zbigniew Jędrzejewski-Szmek wrote:
> > On Thu, Nov 19, 2020 at 08:17:08AM -0800, Andy Lutomirski wrote:
> >> Hi udev people-
> >>
> >> The upcoming Linux SGX driver has a device node /dev/sgx.  User code
> >> opens it, does various setup things, mmaps it, and needs to be able to
> >> create PROT_EXEC mappings.  This gets quite awkward if /dev is mounted
> >> noexec.
> >>
> >> Can udev arrange to make a device node executable on distros that make
> >> /dev noexec?  This could be done by bind-mounting from an exec tmpfs.
> >> Alternatively, the kernel could probably learn to ignore noexec on
> >> /dev/sgx, but that seems a little bit evil.
> >
> > I'd be inclined to simply drop noexec from /dev by default.
> > We don't do noexec on either /tmp or /dev/shm (because that causes immediate
> > problems with stuff like Java and cffi). And if you have those two at your
> > disposal anyway, having noexec on /dev doesn't seem important.
>
> I'd propose to not enable exec globally, but if a service needs SGX, it
> could use something like MountOptions=/dev:exec only in those cases
> where it's needed. That way it's possible to disallow writable and
> executable file systems for most services (which typically don't need
> /tmp or /dev/shm either). Of course the opposite
> (MountOptions=/dev:noexec) would be also possible, but I'd expect that
> this would be needed to be used more often.
>

I imagine the opposite would be more sensible.  It seems odd to me
that we would want any SGX-using service to require both special mount
options and regular ACL permissions.

As  a further argument, I just did this on a Fedora system:

$ find /dev -perm /ugo+x -a \! -type d -a \! -type l

No results.  So making /dev noexec doesn't seem to have any benefit.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Creating executable device nodes in /dev?

2020-11-19 Thread Andy Lutomirski
Hi udev people-

The upcoming Linux SGX driver has a device node /dev/sgx.  User code
opens it, does various setup things, mmaps it, and needs to be able to
create PROT_EXEC mappings.  This gets quite awkward if /dev is mounted
noexec.

Can udev arrange to make a device node executable on distros that make
/dev noexec?  This could be done by bind-mounting from an exec tmpfs.
Alternatively, the kernel could probably learn to ignore noexec on
/dev/sgx, but that seems a little bit evil.

Thanks,
Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] kdbus refactoring?

2015-11-09 Thread Andy Lutomirski
On Mon, Nov 9, 2015 at 9:07 AM, Greg KH  wrote:
> On Mon, Nov 09, 2015 at 05:02:45PM +, Måns Rullgård wrote:
>> Andy Lutomirski  writes:
>>
>> > On Sun, Nov 8, 2015 at 3:30 PM, Greg KH  wrote:
>> >> On Sun, Nov 08, 2015 at 10:39:43PM +0100, Richard Weinberger wrote:
>> >>> On Sun, Nov 8, 2015 at 10:35 PM, Greg KH  
>> >>> wrote:
>> >>> > On Sun, Nov 08, 2015 at 10:06:31PM +0100, Richard Weinberger wrote:
>> >>> >> Hi all,
>> >>> >>
>> >>> >> after reading on the removal of kdbus from Rawhide[1] I've searched
>> >>> >> the mailinglist archives for more details but didn't find anything.
>> >>> >> So, what are your plans?
>> >>> >>
>> >>> >> [1] 
>> >>> >> https://lists.fedoraproject.org/pipermail/kernel/2015-October/006011.html
>> >>> >
>> >>> > As that link said, based on the result of the code being in Rawhide, it
>> >>> > is now being reworked / redesigned.  The result will be posted for
>> >>> > review "when it's ready".
>> >>>
>> >>> If you rework/redesign something you have to know what you want to 
>> >>> change.
>> >>> That's why I was asking for the plan...
>> >>
>> >> Since when do people post "plans" or "design documents" on lkml without
>> >> real code?  Again, code will be posted when it's ready, like any other
>> >> kernel submission.
>> >
>> > I ask for feedback on ideas and designs on a fairly regular basis.  I
>> > even frequently get valuable feedback :)
>> >
>> > I would like to think that the kernel community would have something
>> > of value to add to the process of designing and implementing a major
>> > new IPC mechanism.
>>
>> The "trust us, we'll show it when it's ready" attitude reminds me of the
>> controversial TPP and TTIP negotiations.
>
> Ok, that's just trolling, cut it out.
>
> When something is even in the "hey look, it works, here's the big
> changes from last time", we will of course post it, but right now,
> things are being totally revisited based on the feedback we have
> received so far.  Give people a chance to recover from conferences and
> then get back to work...
>

I hate to say this, but this approach to receiving feedback makes me
really dislike the process.

I read a fairly large fraction of the kdbus code.  I found what I
perceived to be issues, and I spoke up.  I was told for quite a while
that the authors disagreed that the issues I found were issues and
that my assessment of the security aspects of the code was correct.

Now the submission has been withdrawn (because of feedback received so
far?  from me?) and there will apparently be a new submission out of
the blue, allegedly based on feedback.

As a developer, I'm willing to ask for feedback on ideas and to ask
for feedback on code.  In many cases, I've gotten (correct!) feedback
telling me that my design is wrong or needs major changes.  I *always*
try to answer such feedback respectfully and, if the reviewers are
right (which they usually are), I won't keep shoving code they don't
like in their face.  In fact, IIRC I got my start as a serious x86
developer when I wrote some code and tglx told me that the way I
designed it was unacceptable for upstream.  In response, I thought
about what the issues were, asked some questions, and rewrite the
majority of the code.  I think I learned a lot from the process, and
the code was vastly improved as a result.  If I'd sent substantially
the same patch series three or four more times and then declared that
I was withdrawing it without commenting directly on what I'd changed,
I really doubt that anyone would have taken my next submission
seriously.

Please understand that kdbus' approach to receiving feedback is very
off-putting.  Fortunately the vast majority of kernel developers
receive feedback for graciously and transparently, because otherwise
I'd probably just never review anything.  Frankly, if I were in the
chain of people through whom the kdbus code would flow to an eventual
home in Linus' tree, I would just say that the developers have used up
my patience as a reviewer and the onus would be on the developers to
demonstrate that it's worth my time to continue thinking about the
code.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] kdbus refactoring?

2015-11-09 Thread Andy Lutomirski
On Sun, Nov 8, 2015 at 3:30 PM, Greg KH  wrote:
> On Sun, Nov 08, 2015 at 10:39:43PM +0100, Richard Weinberger wrote:
>> On Sun, Nov 8, 2015 at 10:35 PM, Greg KH  wrote:
>> > On Sun, Nov 08, 2015 at 10:06:31PM +0100, Richard Weinberger wrote:
>> >> Hi all,
>> >>
>> >> after reading on the removal of kdbus from Rawhide[1] I've searched
>> >> the mailinglist archives for more details but didn't find anything.
>> >> So, what are your plans?
>> >>
>> >> [1] 
>> >> https://lists.fedoraproject.org/pipermail/kernel/2015-October/006011.html
>> >
>> > As that link said, based on the result of the code being in Rawhide, it
>> > is now being reworked / redesigned.  The result will be posted for
>> > review "when it's ready".
>>
>> If you rework/redesign something you have to know what you want to change.
>> That's why I was asking for the plan...
>
> Since when do people post "plans" or "design documents" on lkml without
> real code?  Again, code will be posted when it's ready, like any other
> kernel submission.

I ask for feedback on ideas and designs on a fairly regular basis.  I
even frequently get valuable feedback :)

I would like to think that the kernel community would have something
of value to add to the process of designing and implementing a major
new IPC mechanism.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Add ambient capability support to execution environment config?

2015-10-08 Thread Andy Lutomirski
For non-root services, getting Capabilities= and CapabilityBoundingSet= to
do anything useful is rather tricky.  Would it make sense to add
AmbientCapabilities= to set ambient (and, implicitly, inheritable)
capabilities, which will be available in Linux 4.3?

Alternatively, there could be a boolean option to change the meaning of
Capabilities so that it uses ambient capabilities instead of whatever it
currently does.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-20 Thread Andy Lutomirski
On Apr 20, 2015 9:07 AM, "Lennart Poettering" 
wrote:
>
> On Mon, 20.04.15 08:51, Andy Lutomirski (l...@amacapital.net) wrote:
>
> > > > > I will grant you that they aren't particularly expressive, and I
will
> > > > > grant you that one day there might be better concepts. But that's
not
> > > > > a strong reason not to support them really, that's just a reason
to
> > > > > later add support for something better.
> > > >
> > > > Technical reasons:
> > > >
> > > > 1. They can't be usefully delegated to a namespace.
> > >
> > > Not sure I can parse that. If you use the bus to communicate across
> > > namespace boundaries then each side lacks caps for the other. Great,
> > > that's how it should be. And same as for uid checks btw... if a uid
> > > cannot be translated, then it will not be passed!
> >
> > No.   You're right for nspawn-style namespaces, but not for namespaces
more
> > broadly.  Namespaces are a great way to drop various privileges, but
your
> > use of caps prevents certain uses (e.g. confining your hypothetical
> > time-setting daemon in a namespace).
>
> I have seen no use of userns for sandboxing normal daemons so far. I
> have seen tons of daemons using caps for such sandboxing.

Sandstorm.io

I bet Chromium will follow suit, and don't forget Docker and similar tools.

>
> maybe if userns in its current iteration doesn't mix as well as you'd
> like it with caps this is indication that userns might need some more
> polish, and not that caps are necessarily the bad guy here?
>

Nope, userns works just fine.

> I mean, as the one who added most of the sandboxing features we expose
> in systemd .service files currently (including things like
> ReadOnlyDirectories=, PrviateTmp=, PrivateNetwork= which are all based
> on kernel namespacing), I completely fail to see how we could even
> expose user namespace like this in a good way that would hold water?
> Capability-based sandboxing on the other hand is much easier to
> handle, and in many cases a highly efficient way to sandbox stuff. Two
> examples:

There's a world outside systemd and, in that world, programs would like to
be able to sandbox themselves.  Userns is wonderful for that purpose.

>
> systemd-networkd drops privileges, becomes the "systemd-network"
> user, only retains CAP_NET_ADMIN. That's actually already a really
> good sandbox!
>
> systemd-timesyncd drops privileges, becomes the "systemd-timesync"
> user, only retains CAP_SYS_TIME. And that's actually a really good
> sandbox too!
>
> Both daemons are network facing, hence sandboxing things like this is
> actually of quite some importance, and it *works*! Today! And it is
> easy! easy enough for most administrators to grok it easily! And
> because of that it is actually *good* security!

Except that if it's too coarse-gained, it fails to actually separate
privileges.

>
> And please don't discount these two daemons as irrelevant
> examples. They are highly relevant, since they run on a good chunk of
> Linux systems, as one of the few daemons that run really everywhere.
>
> Caps *do* have good uses, please don't claim otherwise. They are a
> *lot* more relevant on today's system than userns at least!
>
> (Note that I am not saying that userns are really a bad idea or so, I
> just would like to ask you to keep things in perspective: caps are
> *good* -- even though they could be much better. And they are a ton
> more useful and used than userns right now)
>
> > > OK, they are not very expressive, I granted you that already. But they
> > > are still more expressive than "uid == 0".
> > >
> > > That they are not expressive is something I can agree with, as
> > > mentioned above, but I don't consider this a real issue not to using
> > > them. I mean, it would be great if we had something better in the
> > > kernel, like capsicum or whatever, but we don't. And since caps are
> > > pretty well supported otherwise on Linux, and they are better then
> > > simple uid == 0 checks, I think they should be supported by kdbus too.
> >
> > This is a bad excuse.  Given that you're designing something new, you
have
> > plenty of room to do better.
>
> We are not really in the business in designing comprehensive new
> access control systems that can be used for in-kernel and in-userspace
> subsystems.
>
> Sure, we also have bus policy, but that given it's non-interactive
> logic it's not really suitable for the uses where we need uid or caps
> checks, i.e. dynamic access control within called methods that need to
> check different privileges dynamically.
>
> Lennart
>
> --
> Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-20 Thread Andy Lutomirski
On Apr 20, 2015 8:22 AM, "Lennart Poettering" 
wrote:
>
> On Mon, 20.04.15 08:08, Andy Lutomirski (l...@amacapital.net) wrote:
>
> > On Apr 20, 2015 7:57 AM, "Lennart Poettering" 
> > wrote:
> > >
> > > On Fri, 17.04.15 09:14, Andy Lutomirski (l...@amacapital.net) wrote:
> > >
> > > > My point here is that there's no real shortage of downsides to this
> > > > scheme, and there still appears to be little to no benefit.
> > >
> > > Well, let's turn this around. You seem to really dislike caps. And you
> > > vaguely claim security holes, which we have shown know don't
> > > exist. So, now, can you clearly explain why precisely you dislike them
> > > so much still?  And something more technical then "systemd shouldn't
> > > use them" or "i don't like them", or "they should only be used in the
> > > kernel", because these are not technical reasons, they are just claims
> > > of personal taste.
> > >
> > > I will grant you that they aren't particularly expressive, and I will
> > > grant you that one day there might be better concepts. But that's not
> > > a strong reason not to support them really, that's just a reason to
> > > later add support for something better.
> >
> > Technical reasons:
> >
> > 1. They can't be usefully delegated to a namespace.
>
> Not sure I can parse that. If you use the bus to communicate across
> namespace boundaries then each side lacks caps for the other. Great,
> that's how it should be. And same as for uid checks btw... if a uid
> cannot be translated, then it will not be passed!

No.   You're right for nspawn-style namespaces, but not for namespaces more
broadly.  Namespaces are a great way to drop various privileges, but your
use of caps prevents certain uses (e.g. confining your hypothetical
time-setting daemon in a namespace).

>
> > 2. The set of caps that exist is controlled by the kernel, whereas the
set
> > of dbus methods is large and controlled by userspace.  Caps can't scale
to
> > accommodate flexble userspace policies.
>
> OK, they are not very expressive, I granted you that already. But they
> are still more expressive than "uid == 0".
>
> That they are not expressive is something I can agree with, as
> mentioned above, but I don't consider this a real issue not to using
> them. I mean, it would be great if we had something better in the
> kernel, like capsicum or whatever, but we don't. And since caps are
> pretty well supported otherwise on Linux, and they are better then
> simple uid == 0 checks, I think they should be supported by kdbus too.

This is a bad excuse.  Given that you're designing something new, you have
plenty of room to do better.

>
> > 3. They don't appear to add value, and avoiding unnecessary complexity
is
> > good.
>
> Well, I disagree on this. I think they are better because more
> fine-grained than "uid == 0" checks.
>
> > 4. They suck.  This is a technical issue -- the kernel doesn't allow
> > flexible assignments of caps to processes.  This is a problem for kernel
> > API users and it will be a problem for you.
>
> Not a technical reason, unlike you claim. Just a personal taste issue.

Sure it is.  Caps are defined in the kernel sources and, as seems to have
been covered pretty well in this thread, the list of caps is very far from
what a userspace dbus service would want to check.

--Andy

>
> Honestly, I don't think the issues you raise are very convincing
>
> Lennart
>
> --
> Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-20 Thread Andy Lutomirski
On Apr 20, 2015 7:57 AM, "Lennart Poettering" 
wrote:
>
> On Fri, 17.04.15 09:14, Andy Lutomirski (l...@amacapital.net) wrote:
>
> > My point here is that there's no real shortage of downsides to this
> > scheme, and there still appears to be little to no benefit.
>
> Well, let's turn this around. You seem to really dislike caps. And you
> vaguely claim security holes, which we have shown know don't
> exist. So, now, can you clearly explain why precisely you dislike them
> so much still?  And something more technical then "systemd shouldn't
> use them" or "i don't like them", or "they should only be used in the
> kernel", because these are not technical reasons, they are just claims
> of personal taste.
>
> I will grant you that they aren't particularly expressive, and I will
> grant you that one day there might be better concepts. But that's not
> a strong reason not to support them really, that's just a reason to
> later add support for something better.

Technical reasons:

1. They can't be usefully delegated to a namespace.

2. The set of caps that exist is controlled by the kernel, whereas the set
of dbus methods is large and controlled by userspace.  Caps can't scale to
accommodate flexble userspace policies.

3. They don't appear to add value, and avoiding unnecessary complexity is
good.

4. They suck.  This is a technical issue -- the kernel doesn't allow
flexible assignments of caps to processes.  This is a problem for kernel
API users and it will be a problem for you.

--Andy

>
> Lennart
>
> --
> Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-17 Thread Andy Lutomirski
On Apr 17, 2015 4:53 AM, "Djalal Harouni"  wrote:
>
> Hi Andy,
>
> On Thu, Apr 16, 2015 at 12:30:28PM -0700, Andy Lutomirski wrote:
> > On Thu, Apr 16, 2015 at 11:23 AM, Lennart Poettering
> >  wrote:
> [...]
> > AFAICT this piece of kdbus code serves to enable a rather odd way to
> > write privilege-separated services to change the time and kill
> > processes.  The cost is complex security code that, at best, fails
> > secure in the presence of different user namespaces, and the cost also
> > involves touching a global refcount for each message sent (this might
> > be the *only* thing that would reference init_user_ns's refcount when
> > sending).  Oh yeah, the cost is also ABI crap -- if, say, my
> The global ref-counts on metadata is just a workaround due to userns and
> caps. I actually thought we already sorted that out?
>
>  https://lkml.org/lkml/2015/3/25/702
>
> Hmm there are other paths that refs user_ns, the mqueue notification
> perhaps ?
>
> Please note that we also have _per_ user quota accounting, the trade off
> of accouting prevents further performance penalties on other bus
> operations. Referring to performance critical code, this code path can
> just be ignored by to not opt-in for KDBUS_ATTACH_CAPS which is the
> default behaviour.

Quoting that link:

> It's conditional on KDBUS_ATTACH_CAPS, anyway.

Fair enough.

[end quote]

I don't believe it'll be usefully conditional.  Systemd is pretty
clearly planning on using it, so you get a silly, if small,
performance hit.

My point here is that there's no real shortage of downsides to this
scheme, and there still appears to be little to no benefit.

--Andy

>
> Thanks!
>
> --
> Djalal Harouni
> http://opendz.org
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-17 Thread Andy Lutomirski
On Apr 17, 2015 5:42 AM, "Simon McVittie"
 wrote:
>
> On 16/04/15 15:52, Andy Lutomirski wrote:
> > (I really think this dichotomy
> > needs to be removed, *especially* since it looks like code already
> > exists to try to use both metadata sources.  This seems like it's just
> > asking for security screw-ups.)
>
> Would it address this concern if there was an explicit API separation
> into "things that can't be faked, suitable for authorization" and
> "things that could have been faked, only suitable for debugging" - such
> that the uid would be in the first set only, capabilities would be in
> the first set on kdbus but absent or in the second set on traditional
> D-Bus, and /proc/*/cmdline would always be in the second set?

It would certainly improve the sd-bus code, I think.  I'm not a
systemd developer, though.

From the kernel side, I don't even see the point of reporting caps for
debugging IPC things.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-17 Thread Andy Lutomirski
On Apr 17, 2015 6:05 AM, "Cristian Rodríguez"  wrote:
>
> On Fri, Apr 17, 2015 at 7:51 AM, Lennart Poettering
>  wrote:
>
> > Groups *suck* as authentication scheme. If you add one group for each
> > privilege you want, then you'll have a huge number of groups, and
> > that's hardly desirable. It's pretty close to being unmanagable with
> > user/group editors. Also, you can never take group membership away,
> > since users who once where members of group can create sgid binaries
> > which allows them to always return into that group forever.
>
> Not to mention, we are running out of system users and groups in
> distributions (if we didn't already) and some people want us to
> provide fixed UID/GID system users
> across distributions for clustering applications...this is a totally
> unworkable way forward.

I believe you're arguing that you think gids are a scarse resource, so
you want to save ~2 gids (certainly fewer than 64) by inventing a
whole new userspace authorization scheme using *caps* that doesn't
even solve the problem that you want to solve.

I'm not sure how this is supposed to justify anything.  Caps are
probably the single least scalable authorization mechanism you could
come up with.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 11:23 AM, Lennart Poettering
 wrote:
> On Thu, 16.04.15 10:52, Andy Lutomirski (l...@amacapital.net) wrote:
>
>> >
>> > It would be very helpful if you could go into details on why you think
>> > more care is needed here than for other things. Is there anything
>> > non-trivial going on here that I'm missing? The way capabilites are
>> > exposed through kdbus seems perfectly straight-forward to me, so I'm
>> > really trying to get my head around your concerns here.
>> >
>> > So, let me ask explicitly again:
>> >
>> > * Are there any actual exploitable vulnerabilities you are aware
>> > of in the kdbus kernel code?
>> >
>> > * Are there any actual exploitable vulnerabilities on the sd-bus
>> > userspace code you are aware of, regardless if in the kdbus or dbus1
>> > support in it?
>> >
>> > If you are, please provide us with good explanations, so that we can
>> > do something about them. We'd love to fix this!
>>
>> I don't see a vulnerability in the code as such.  But you still
>> haven't given a real use case, and vulnerabilities in security
>> mechanisms are often found in the interactions between components.
>> There doesn't seem to be a big picture here.
>>
>> Can you give a plausible real world example along the lines of
>> "process A, which has capability B for some decent reason, will send
>> message C to daemon D, and D will permit the request because of
>> capability B, but the request would not otherwise have been
>> permitted."
>>
>> So far it seems like there's a bunch of code that may not be
>> exploitable but doesn't really do anything either.
>>
>> And, honestly, if this is only about rebooting and setting the time,
>> then, even if correct, this is seriously not worth the complexity even
>> if it's completely sensible and correct.
>
> Well, you asked for examples, I gave you two simple, existing
> ones.
>
> Note that the caps-based method authorization is not much used
> currently, especially not outside of systemd, because it's not safely
> doable on dbus1, but only on kdbus, and kdbus is not merged into the
> kernel yet, hence not many projects use it yet.
>
> But anyway, here are two more example in systemd itself:
>
> systemd itself checks CAP_SYS_KILL for clients asking to kill
> arbitrary services (which means invoking kill() to all PIDs in the
> service's cgroup).
>
> Similar to this, logind checks CAP_SYS_KILL for clients asking to kill
> sessions (which means invoking kill() for all PIDS in the session).
>

These aren't the kinds of examples I'm asking about.  They're what
systemd does, not what use case makes doing that serve a purpose..

> Now, to put together a more complex scenario for you: consider a small
> web UI that can be used to set the system time. It should realy run at
> minimal privileges, after all it has a surface to the web. Hence you
> write it as daemon, make it run as a user id of its own, set up a
> chroot() or a file system namespace, but you do keep CAP_SYS_TIME and
> a bus connection open.

Aha, a real example!

But this is a bad example -- systemd should provide the web server
with a way to retain the right to issue the one method call that it
needs.  CAP_SYS_TIME is overbroad.  For example, CAP_SYS_TIME also
grants the service to bypass whatever auditing and logging systemd
would do.

And this is also a worthless design pattern, as it seems to be
applicable to sending signals and changing the time, and probably very
little else.

> With that setup the web daemon can pretty much
> only set the system clock, and if it exploited cannot be used for much
> else. It will not have access to /dev/rtc, due to the file system
> namespace, but it can nicely set the system clock via timedated still,
> and pretty much only that, and the clock will be synced to the RTC by
> it.

It can set the time with settimeofday(2), which is undesirable.

>
> To map this back to your earlier request for an example. In this case
> process A is the web UI daemon. Capability B is CAP_SYS_TIME. Message
> C is the SetTime() bus call. Daemon D is timedated.
>
> If the web UI daemon would not have CAP_SYS_TIME it couldn't change
> the time like this -- unless of course that web UI daemon would run as
> UID 0 all the time, which is certainly much much less desirable, given
> that it is a network facing daemon.
>
> And if you are not happy with this example, there are numerous others
> like this of course...

Honestly, I doubt that.  There are a smallish number of capabilities

Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 10:30 AM, Lennart Poettering
 wrote:
> On Thu, 16.04.15 09:53, Andy Lutomirski (l...@amacapital.net) wrote:
>
>> > It's a noop, unless people OR in SD_BUS_CREDS_AUGMENT into the flags
>> > of creds they want. Doing this basically voids your warranty: it means
>> > that the creds data shall be augmented with data from /proc, which are
>> > good enough for logging, debugging things, but not for security
>> > relevant things, such as authorizing message calls.
>>
>> OK, I finally found the relevant AUGMENT check.
>>
>> >
>> > Our own message authorization call does not make use of this flag.
>> >
>>
>> mac_selinux_generic_access_check does.  It also uses sender metadata,
>> which I find interesting given that you seem to strongly prefer using
>> message metadata.
>
> It certainly does. But this is not actually used for authentication
> either, ly, because the auth code in there only cares for the
> selinux label, which is available as native connection metadata on
> dbus1 already. The augmentation is used because the selinux guys want to
> generate a nice log message when the permission is not granted.
>
> The selinux code in systemd was actually contributed by the selinux
> people a long time ago. It contained code to read data from /proc
> manually for showing in the messages. I recently reworked this to just
> use SD_BUS_CREDS_AUGMENT instead, so that reading data from /proc is
> at least unified at one place. This is safe only because we know that
> the selinux data is not augmented, but available anyway and safely,
> and we rely on that.
>
> You can take the selinux code in system as a hint, that getting
> metadata about bus requests that one can log about is nothign we made
> up, it's very real, and our selinux code added that years ago, and was
> written by the selinux guys, not us...
>
> I will grant you though that it is confusing that we use
> SD_BUS_CREDS_AUGMENT here like this, and implicitly rely on that the
> selinux label is not a field that is being augmented. We should make
> this explicit, absolutely. I'll now add some code that will make this
> assumption explicit and fails early if the selinux label happens to be
> augmented. Of course in real-life this is impossible to trigger, but
> it's certainly helps understanding the code.
>
>> >> Agreed, and that's not the only vulnerability.  But code to do it
>> >> certainly exists in systemd.
>> >
>> > "Not the only vulnerability"? I am not aware of any vulnerability, can
>> > you explain?
>>
>> See below and attached.
>>
>> >
>> >> > $ cp `which busctl` .
>> >> > $ sudo setcap all=eip busctl
>> >> > $ ./busctl call org.freedesktop.timedate1 /org/freedesktop/timedate1
>> >> > org.freedesktop.timedate1 SetTimezone "sb" Europe/London 0
>> >> >
>> >>
>> >> I still don't see why, even if this were bug-free, it's even remotely
>> >> useful.  Keep in mind that basically no one has a capability if
>> >> they're not root, and dbus users should really use *dbus* mechanisms
>> >> to retain selected privileges when they drop other privileges.
>> >
>> > Well, some of systemd's own deamons run with an uid != 0, but some
>> > caps left.
>> >
>> > And avahi (as a daemon I know very well, that is not part of systemd)
>> > does that too, since 10 years actually... It's not that uncommon among
>> > system daemons really.
>>
>> And does it use those caps for the purpose of authenticating to dbus
>> services?
>
> No, it does not. Because it's unsafe to do that on dbus1. As mentioned
> before, kdbus makes this safe.
>
>> If so, perhaps you should reconsider allowing open dbus connections to
>> act as capabilities (the real kind, not the Linux capability crap)
>> themselves.
>
> No, they do not use capabilities for bus authentication, since they
> predate kdbus, and acquiring them on dbus1/af_unix is not safe.
>
>> >> Sure.  Unshare your user namespace, set things up right, and systemd
>> >> or any other server will see you as having all capabilities.  You've
>> >> fixed that in kdbus, but you haven't (and probably can't!) fix it in
>> >> the legacy code, and that legacy code is still there (!).  (Actually,
>> >> that code seems to have been actively developed as recently as
>> >> November 2014.)
>> >
>> > Hmm? I do not understand what

Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 10:43 AM, Tom Gundersen  wrote:
> On Thu, Apr 16, 2015 at 5:57 PM, Andy Lutomirski  wrote:
>>> We have several uses of this, see my mail to Jiri regarding
>>> CAP_SYS_BOOT for instance:
>>>   https://lkml.org/lkml/2015/4/16/219
>>
>> I read that, but I disagree with you.
>>
>> CAP_SYS_BOOT is the privilege to directly hard-reboot the system, not
>> the privilege to initiate a clean reboot.
>
> My main point that being allowed to do a hard-reboot, but not to do a
> clean reboot, makes no sense.
>
>>> However, what we are trying to get to the bottom of is if you see any
>>> technical problems with the current kdbus capability handling code. My
>>> understanding is that you don't.
>>>
>>
>> I have a technical problem with it: it's a design that has
>> insufficient justification.  It also seems like it will be quite
>> limiting in the future, *especially* wrt user namespaces.
>
> What I was really asking was if you see any actual vulnerabilities.
> That we have a different technical opinion on the design of this is a
> different matter.
>
>> I agree that it's probably not exploitable *if used carefully* in the
>> latest kdbus code.
>
> It would be very helpful if you could go into details on why you think
> more care is needed here than for other things. Is there anything
> non-trivial going on here that I'm missing? The way capabilites are
> exposed through kdbus seems perfectly straight-forward to me, so I'm
> really trying to get my head around your concerns here.
>
> So, let me ask explicitly again:
>
> * Are there any actual exploitable vulnerabilities you are aware
> of in the kdbus kernel code?
>
> * Are there any actual exploitable vulnerabilities on the sd-bus
> userspace code you are aware of, regardless if in the kdbus or dbus1
> support in it?
>
> If you are, please provide us with good explanations, so that we can
> do something about them. We'd love to fix this!

I don't see a vulnerability in the code as such.  But you still
haven't given a real use case, and vulnerabilities in security
mechanisms are often found in the interactions between components.
There doesn't seem to be a big picture here.

Can you give a plausible real world example along the lines of
"process A, which has capability B for some decent reason, will send
message C to daemon D, and D will permit the request because of
capability B, but the request would not otherwise have been
permitted."

So far it seems like there's a bunch of code that may not be
exploitable but doesn't really do anything either.

And, honestly, if this is only about rebooting and setting the time,
then, even if correct, this is seriously not worth the complexity even
if it's completely sensible and correct.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 9:43 AM, Tom Gundersen  wrote:
> On Thu, Apr 16, 2015 at 4:52 PM, Andy Lutomirski  wrote:
>> Unshare your user namespace, set things up right, and systemd
>> or any other server will see you as having all capabilities.  You've
>> fixed that in kdbus, but you haven't (and probably can't!) fix it in
>> the legacy code, and that legacy code is still there (!).
>
> The dbus1 code (which I assume you mean when you say "legacy code")
> does not make use of capabilities, and it should not (see Lennart's
> answer for all the details). If anything, this should be an argument
> to move to kdbus with native, race-free capability-passing support.
>
> Do I understand correctly, that any concerns you had are about systemd
> and its dbus1 compat code (which of course we should take seriously
> too), and that you no longer see any security vulnerabilities in the
> capability related code of kdbus?
>
>> The ratio of complexity of capability code the kdbus folks have
>> already written (hundreds of lines across multiple files) to its
>> utility (very near zero AFAICT) is, in my book, not a good sign at
>> all.
>
> We have several uses of this, see my mail to Jiri regarding
> CAP_SYS_BOOT for instance:
>   https://lkml.org/lkml/2015/4/16/219

I read that, but I disagree with you.

CAP_SYS_BOOT is the privilege to directly hard-reboot the system, not
the privilege to initiate a clean reboot.

Keep in mind that, on some recent Windows versions, for the most part
you *can't* directly hard-reboot the system; instead you have to give
the OS a reason so the OS can log it.  IOW, the high-level Windows
reboot permission doesn't confer the privilege of directly
hard-rebooting.

Maybe systemd or GNOME will want to do that some day.

>
> However, what we are trying to get to the bottom of is if you see any
> technical problems with the current kdbus capability handling code. My
> understanding is that you don't.
>

I have a technical problem with it: it's a design that has
insufficient justification.  It also seems like it will be quite
limiting in the future, *especially* wrt user namespaces.

I agree that it's probably not exploitable *if used carefully* in the
latest kdbus code.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 8:59 AM, Lennart Poettering
 wrote:
> On Thu, 16.04.15 07:52, Andy Lutomirski (l...@amacapital.net) wrote:
>
>> I'm looking at sd_bus_query_sender_privilege, which does:
>>
>> r = sd_bus_query_sender_creds(call,
>> SD_BUS_CREDS_UID|SD_BUS_CREDS_EUID|SD_BUS_CREDS_EFFECTIVE_CAPS,
>> &creds);
>>
>> That, in turn, does:
>>
>> if (!c || !(c->mask & SD_BUS_CREDS_PID)) {
>> /* We couldn't read anything from the call, let's try
>>  * to get it from the sender or peer */
>>
>> if (call->sender)
>> return sd_bus_get_name_creds(call->bus,
>> call->sender, mask, creds);
>> else
>> return sd_bus_get_owner_creds(call->bus, mask, 
>> creds);
>> }
>>
>> return bus_creds_extend_by_pid(c, mask, creds);
>>
>> The sd_bus_get_name_creds call seems questionable to me -- isn't that
>> trying to augment missing information from the time of call with
>> metadata from the sender's connection?  (I really think this dichotomy
>> needs to be removed, *especially* since it looks like code already
>> exists to try to use both metadata sources.  This seems like it's just
>> asking for security screw-ups.)
>
> This information is missing only for dbus1 clients connecting via the
> dbus1 compat proxy. In that case we need to fall back to connection
> credentials, because that's how authentication works on dbus1.
>
>> The sd_bus_get_owner_creds call looks totally bogus.  I don't know
>> when it would be called, but I hope the answer is "provably never in
>> response to a message from an unprivileged user", in which case I'd
>> wonder why it's there.
>
> Messages with empty senders are only used for dbus1 *direct*
> connections. This is a special setup dbus1 supports where
> communication is not done via the bus daemon, but via a direct AF_UNIX
> connection between two peers. It's very seldom used, and mostly a hack
> to avoid the performance penalty that the dbus-daemon means. These
> direct connections are entirely redundant in a kdbus world, and you
> better shouldn't think about them.
>
> Ultimately, this code exists in sd-bus for one reason only: in a
> non-kdbus world systemd must be accessible even if dbus-daemon is not
> up. Since dbus-daemon is only started relatively late during boot
> there's a window in early boot where we need to be accessible by some
> other means: for that we allow privileged clients to connect to us
> directly. We then still speak the dbus protocol, and offer similar
> APIs, but we do not use the bus topology. In kdbus all of that goes
> away, since kdbus is available all the time, since earliest time of
> boot.
>
> For such direct dbus1 connections it's the creds of the peer of the
> dbus connection that we need to authenticate, that is exposed as
> "owner creds".
>
> Or in other words, the if check above you can read like this:
>
> if (we_get_a_full_kdbus_message)
> use_the_attached_creds();
> else {
> if (we_get_a_compat_dbus1_message_via_kdbus_or_native_dbus1)
> use_the_senders_creds()
> else if (we_get_a_message_via_dbus_direct_connection())
> use_the_peer_creds();
> }
>
> To make this easier to understand I'll add some comments to this code:
>
> http://cgit.freedesktop.org/systemd/systemd/commit/?id=2d0c1561340efff3265fe89b05eae4ee8f4037a7
>
>> The bus_creds_extend_by_pid is what I'm asking about.  When is it
>> used?
>
> It's a noop, unless people OR in SD_BUS_CREDS_AUGMENT into the flags
> of creds they want. Doing this basically voids your warranty: it means
> that the creds data shall be augmented with data from /proc, which are
> good enough for logging, debugging things, but not for security
> relevant things, such as authorizing message calls.

OK, I finally found the relevant AUGMENT check.

>
> Our own message authorization call does not make use of this flag.
>

mac_selinux_generic_access_check does.  It also uses sender metadata,
which I find interesting given that you seem to strongly prefer using
message metadata.

> Also see the comment about this in sd-bus.h:
>
> http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h?id=2d0c1561340efff3265fe89b05eae4ee8f4037a7#n89
>
> Or check the man page for busctl that explains this too:
>
> http://www.freedesktop.org/software/systemd/man/busctl.html#--augment-creds=BOOL
>
>> In any event, under certain (st

Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 3:23 AM, Tom Gundersen  wrote:
> Hi Andy,
>
> On Thu, Apr 16, 2015 at 2:55 AM, Andy Lutomirski  wrote:
>> Yesterday, I discovered SD_BUS_VTABLE_CAPABILITY.  Are there any
>> examples in which it does anything?
>
> Please note that you need to be using kdbus to get any capabilities
> transported, so in dbus1 this does nothing (as on dbus1 using
> capabilities would be racy). Are you testing this on kdbus?

No.

I'm looking at sd_bus_query_sender_privilege, which does:

r = sd_bus_query_sender_creds(call,
SD_BUS_CREDS_UID|SD_BUS_CREDS_EUID|SD_BUS_CREDS_EFFECTIVE_CAPS,
&creds);

That, in turn, does:

if (!c || !(c->mask & SD_BUS_CREDS_PID)) {
/* We couldn't read anything from the call, let's try
 * to get it from the sender or peer */

if (call->sender)
return sd_bus_get_name_creds(call->bus,
call->sender, mask, creds);
else
return sd_bus_get_owner_creds(call->bus, mask, creds);
}

return bus_creds_extend_by_pid(c, mask, creds);

The sd_bus_get_name_creds call seems questionable to me -- isn't that
trying to augment missing information from the time of call with
metadata from the sender's connection?  (I really think this dichotomy
needs to be removed, *especially* since it looks like code already
exists to try to use both metadata sources.  This seems like it's just
asking for security screw-ups.)

The sd_bus_get_owner_creds call looks totally bogus.  I don't know
when it would be called, but I hope the answer is "provably never in
response to a message from an unprivileged user", in which case I'd
wonder why it's there.

The bus_creds_extend_by_pid is what I'm asking about.  When is it used?

In any event, under certain (strewn-across-multiple-files with
duplicated code in various places) conditions, all three paths above
seem to land in bus_creds_add_more.

It looks like some of this is gated on whether the sending PID is
known (IMO this is odd -- it should be gated by whether you think it's
a good idea, not whether a piece of information is known), but maybe
mac_selinux_generic_access_check can circumvent that.

So why exactly are capabilities only used with kdbus?  I don't see it,
and I do see code that tries to extract them from /proc.

>
>> If so, I don't suppose any of you
>> could give me an example of:
>>
>> $ cp `which dbus-send` .
>> $ sudo setcap all=eip dbus-send
>> $ dbus-send [not sure what goes here]
>>
>> that passes an authentication test that would have failed without the setcap?
>
> Note that dbus-send is the old dbus1 command, and will always go via
> the bus proxy that connects dbus1 clients to kdbus. As such, it will
> only carry the credentials that are available safely on dbus1, and
> hence not capabilities. So even if you do use kdbus on your machine,
> this example would not work.
>
> To be clear, the reason we do not use the capability logic on dbus1,
> but only on kdbus is as follows: On kdbus, the capabilities are
> attached to the method call itself, hence we safely know that at the
> time the method call was issued by the client side, that client
> actually really had those capabilities. On dbus1 however, which uses
> AF_UNIX/SOCK_STREAM, this is not possible: we only get the
> UID/GID/PID, hence we could only derive the caps from
> /proc/$PID/status. But this opens this up for a vulnerability: if a
> client quickly issues a method call, and then exec()s a suid binary,
> it could happen that the service side would read the capabilities from
> that SUID process, and not the original process, and the exploit was
> successful. We cannot allow that, hence no capabilities on dbus1,
> instead we open up things to a much, much broader uid == 0. kdbus
> allows us to be much stricter there, by allowing us to check only one
> specific capability.

Agreed, and that's not the only vulnerability.  But code to do it
certainly exists in systemd.

>
> An equivalent example to the one you gave using native kdbus would be:
>
> $ cp `which busctl` .
> $ sudo setcap all=eip busctl
> $ ./busctl call org.freedesktop.timedate1 /org/freedesktop/timedate1
> org.freedesktop.timedate1 SetTimezone "sb" Europe/London 0
>

I still don't see why, even if this were bug-free, it's even remotely
useful.  Keep in mind that basically no one has a capability if
they're not root, and dbus users should really use *dbus* mechanisms
to retain selected privileges when they drop other privileges.

--Andy

>> In the interest of full disclosure, I'm asking because I think that
>> one of two things is true:
>>
>> 1. The SD_BUS_VTABLE_C

[systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-15 Thread Andy Lutomirski
Hi all-

Yesterday, I discovered SD_BUS_VTABLE_CAPABILITY.  Are there any
examples in which it does anything?  If so, I don't suppose any of you
could give me an example of:

$ cp `which dbus-send` .
$ sudo setcap all=eip dbus-send
$ dbus-send [not sure what goes here]

that passes an authentication test that would have failed without the setcap?

In the interest of full disclosure, I'm asking because I think that
one of two things is true:

1. The SD_BUS_VTABLE_CAPABILITY code is useless and should therefore be deleted.

2. The SD_BUS_VTABLE_CAPABILITY code is exploitably buggy and should
therefore be deleted.

I can't tell which one, since I haven't figured out how to test it
realistically in the first place.  Most of the protected calls seem to
be heavily restricted by dbus policy.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Wed, Apr 1, 2015 at 2:47 PM, Kay Sievers  wrote:
> On Wed, Apr 1, 2015 at 11:38 PM, Andy Lutomirski  wrote:
>> On Wed, Apr 1, 2015 at 2:36 PM, Kay Sievers  wrote:
>
>>> They should only get created when something accesses the corresponding
>>> tty. deallocvt(1) can kill unused ones and the device nodes should
>>> disappear.
>>>
>>
>> deallocvt doesn't seem to kill those device nodes for me.
>
> Seems to work here:
>
> # ls -l /dev/vcs[6789]
> crw-rw 1 root tty 7, 6 Apr  1 22:21 /dev/vcs6
> # cat /dev/tty7
> ^C
> # cat /dev/tty9
> ^C
> # ls -l /dev/vcs[6789]
> crw-rw 1 root tty 7, 6 Apr  1 22:21 /dev/vcs6
> crw-rw 1 root tty 7, 7 Apr  1 23:42 /dev/vcs7
> crw-rw 1 root tty 7, 9 Apr  1 23:42 /dev/vcs9
> # deallocvt 7
> # ls -l /dev/vcs[6789]
> crw-rw 1 root tty 7, 6 Apr  1 22:21 /dev/vcs6
> crw-rw 1 root tty 7, 9 Apr  1 23:42 /dev/vcs9
> # deallocvt 9
> # ls -l /dev/vcs[6789]
> crw-rw 1 root tty 7, 6 Apr  1 22:21 /dev/vcs6

Aha.  It seems that I have something holding tty1-tty4 open.  I'll fix
that on my end.  Will that make vconsole-setup stop calling setfont?
If so, that will indirectly solve my problem.  (Although... I don't
see why the presence or absence of in-use vts should affect font
loading.  Also, it seems like vcs1 shows up no matter what I do.)

>
>>>> The offending qemu command line args appear to be -vga none -display
>>>> none.  I assume I have "CGA" because it's the fallback case in
>>>> vgacon.c if nothing matches.
>>>
>>> Hehe, blast from the past. :) If you give kvm a VGA device, it all works 
>>> fine?
>>
>> I just tried it.  setfont succeeds, and the VGA device matches
>> /dev/vcs's contents.
>
> Ah, nice.
>
> If we figure out some dummy font-related call to check if the kernel
> supports font handling at all, we could just add that to
> vconsole-setup, I guess.
>
> Kay



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Wed, Apr 1, 2015 at 2:36 PM, Kay Sievers  wrote:
> On Wed, Apr 1, 2015 at 11:19 PM, Andy Lutomirski  wrote:
>> On Wed, Apr 1, 2015 at 1:53 PM, Kay Sievers  wrote:
>>> On Wed, Apr 1, 2015 at 10:45 PM, Andy Lutomirski  
>>> wrote:
>>>> On Apr 1, 2015 12:56 PM, "Kay Sievers"  wrote:
>>>
>>>>> Do you have an idea why the VM does not accept the custom font? If
>>>>> that is something obvious, and we can detect it, we could make
>>>>> vconsole-setup check for it. But then again, fixing setfont seems like
>>>>> the obvious fix here.
>>>>
>>>> I assume it's because the VM has no graphical console at all.
>>>
>>> We check the existence of the corresponding /dev/vcs%i, to check if
>>> the tty is allocated where we want to apply the font to. Do these
>>> devices exist on the running machine?
>>
>> Yes:
>>
>> # ls /dev/vcs*
>> /dev/vcs   /dev/vcs2  /dev/vcs4  /dev/vcsa1  /dev/vcsa3
>> /dev/vcs1  /dev/vcs3  /dev/vcsa  /dev/vcsa2  /dev/vcsa4
>>
>> Looking at the code, the vc_screen.c code seems to create those
>> devices unconditionally.
>
> They should only get created when something accesses the corresponding
> tty. deallocvt(1) can kill unused ones and the device nodes should
> disappear.
>

deallocvt doesn't seem to kill those device nodes for me.

>>> And what does this say?
>>>   grep . /sys/class/tty/tty0/active /sys/class/tty/console/active
>>
>> # grep . /sys/class/tty/tty0/active /sys/class/tty/console/active
>> /sys/class/tty/tty0/active:tty1
>> /sys/class/tty/console/active:ttyS0
>>
>> vcs1 has, roughly:
>>
>> early console in decompress_kernel
>> Decompressing Linux... Parsing ELF... done.
>> Booting the kernel.
>>
>> Now I'm wondering how that buffer came to be.
>>
>> In any event, some tracing of the code suggests that I have
>> vga_video_type == VIDEO_TYPE_CGA, and that fails "if (vga_video_type <
>> VIDEO_TYPE_EGAM)" in vgacon_font_set.
>>
>> Indeed, /proc/ioports has:
>>
>>   03d4-03d5 : cga
>>
>> and dmesg says:
>>
>> [0.00] Console: colour *CGA 80x25
>>
>> I don't see this information in sysfs anywhere.  Perhaps checking for
>> an active console and detecting -EINVAL from vgacon_font_get would
>> work.
>
> Hmm, yeah, maybe we could try one of the font-related ioctls to find
> out if the driver supports that before we spawn setfont.
>
>> /proc/fb is empty on this VM, so maybe that would help.  Grr, this
>> stuff is really old and crufty.
>>
>> The offending qemu command line args appear to be -vga none -display
>> none.  I assume I have "CGA" because it's the fallback case in
>> vgacon.c if nothing matches.
>
> Hehe, blast from the past. :) If you give kvm a VGA device, it all works fine?

I just tried it.  setfont succeeds, and the VGA device matches
/dev/vcs's contents.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Wed, Apr 1, 2015 at 1:53 PM, Kay Sievers  wrote:
> On Wed, Apr 1, 2015 at 10:45 PM, Andy Lutomirski  wrote:
>> On Apr 1, 2015 12:56 PM, "Kay Sievers"  wrote:
>
>>> Do you have an idea why the VM does not accept the custom font? If
>>> that is something obvious, and we can detect it, we could make
>>> vconsole-setup check for it. But then again, fixing setfont seems like
>>> the obvious fix here.
>>
>> I assume it's because the VM has no graphical console at all.
>
> We check the existence of the corresponding /dev/vcs%i, to check if
> the tty is allocated where we want to apply the font to. Do these
> devices exist on the running machine?

Yes:

# ls /dev/vcs*
/dev/vcs   /dev/vcs2  /dev/vcs4  /dev/vcsa1  /dev/vcsa3
/dev/vcs1  /dev/vcs3  /dev/vcsa  /dev/vcsa2  /dev/vcsa4

Looking at the code, the vc_screen.c code seems to create those
devices unconditionally.

>
> And what does this say?
>   grep . /sys/class/tty/tty0/active /sys/class/tty/console/active

# grep . /sys/class/tty/tty0/active /sys/class/tty/console/active
/sys/class/tty/tty0/active:tty1
/sys/class/tty/console/active:ttyS0

vcs1 has, roughly:

early console in decompress_kernel
Decompressing Linux... Parsing ELF... done.
Booting the kernel.

Now I'm wondering how that buffer came to be.

In any event, some tracing of the code suggests that I have
vga_video_type == VIDEO_TYPE_CGA, and that fails "if (vga_video_type <
VIDEO_TYPE_EGAM)" in vgacon_font_set.

Indeed, /proc/ioports has:

  03d4-03d5 : cga

and dmesg says:

[0.00] Console: colour *CGA 80x25

I don't see this information in sysfs anywhere.  Perhaps checking for
an active console and detecting -EINVAL from vgacon_font_get would
work.

/proc/fb is empty on this VM, so maybe that would help.  Grr, this
stuff is really old and crufty.

The offending qemu command line args appear to be -vga none -display
none.  I assume I have "CGA" because it's the fallback case in
vgacon.c if nothing matches.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Apr 1, 2015 12:56 PM, "Kay Sievers"  wrote:
>
> On Wed, Apr 1, 2015 at 9:36 PM, Andy Lutomirski  wrote:
> > On Wed, Apr 1, 2015 at 12:32 PM, Kay Sievers  wrote:
> >> On Wed, Apr 1, 2015 at 8:56 PM, Andy Lutomirski  
> >> wrote:
> >>> On Thu, Jan 22, 2015 at 6:29 PM, Andy Lutomirski  
> >>> wrote:
> >>>> On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering
> >>>>  wrote:
> >>>>> On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote:
> >>>>>
> >>>>>> Hi all-
> >>>>>>
> >>>>>> When running virtme (a simple vm gadget) on Fedora 21, the slowest
> >>>>>> part of bootup by far appears to be systemd-vconsole-setup:
> >>>>>>
> >>>>>> # time /usr/lib/systemd/systemd-vconsole-setup
> >>>>>> putfont: PIO_FONT trying ...
> >>>>>> ...
> >>>>>> setfont: putfont: 512,8x16:  failed: -1
> >>>>>> putfont: PIO_FONT: Invalid argument
> >>>>>> /usr/bin/setfont failed with error code 71.
> >>>>>
> >>>>> setfont is not part of systemd, we just invoke it. If that fails, this
> >>>>> is a problem somewhere between the VM, the kernel and console-tools.
> >>>>>
> >>>>
> >>>> Aha -- I missed that systemd-vconsole-setup calls setfont.  I can
> >>>> trigger the same problem by just typing setfont.  For whatever reason,
> >>>> my other Fedora 21 computer only has this problem if I type setfont
> >>>> and not if I run systemd-vconcole-setup.
> >>>>
> >>>>> My uneducated guess is that your virtual machine boots up with a
> >>>>> non-graphical console, and the tool thus tries to upload the fonts
> >>>>> into the good old VGA hw text mode glyph tables, and qemu is very slow
> >>>>> at that... Or something like that.
> >>>>
> >>>> setfont is doing this:
> >>>>
> >>>> nanosleep({0, 25000}, NULL) = 0
> >>>> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
> >>>> write(2, ".", 1.)= 1
> >>>> nanosleep({0, 25000}, NULL) = 0
> >>>> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
> >>>> write(2, ".", 1.)= 1
> >>>> nanosleep({0, 25000}, NULL) = 0
> >>>> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
> >>>> write(2, ".", 1.)= 1
> >>>> nanosleep({0, 25000}, NULL) = 0
> >>>> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
> >>>> write(2, ".", 1.)= 1
> >>>> nanosleep({0, 25000}, NULL) = 0
> >>>> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
> >>>> write(2, ".", 1.)= 1
> >>>>
> >>>> This thing has only a serial console:
> >>>>
> >>>> # cat /proc/consoles
> >>>> ttyS0-W- (EC   a)4:64
> >>>>
> >>>> setfont does this:
> >>>>
> >>>> /* we allow ourselves to hang here for ca 5 seconds, xdm may
> >>>> be playing tricks on us. */
> >>>> while ((loop++ < 20) && (i = ioctl(fd, PIO_FONT, buf)))
> >>>>   {
> >>>> if (loop <= 1)
> >>>>   fprintf(stderr, "putfont: PIO_FONT trying ...\n");
> >>>> else
> >>>>   fprintf(stderr, ".");
> >>>> usleep(25);
> >>>>   }
> >>>> fprintf(stderr, "\n");
> >>>>
> >>>> Alexey, would it make sense to remove this loop or to add a way to turn 
> >>>> it off?
> >>>
> >>> Ping, everyone?
> >>>
> >>> This issue still exists.  AFAICT systemd is relying on a really old
> >>> tool, that that really old tool (setfont) is sometimes delaying boot
> >>> by a very large amount.  Can we either fix the tool (Alexey) or stop
> >>> using it (systemd people)?

Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Wed, Apr 1, 2015 at 12:32 PM, Kay Sievers  wrote:
> On Wed, Apr 1, 2015 at 8:56 PM, Andy Lutomirski  wrote:
>> On Thu, Jan 22, 2015 at 6:29 PM, Andy Lutomirski  wrote:
>>> On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering
>>>  wrote:
>>>> On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote:
>>>>
>>>>> Hi all-
>>>>>
>>>>> When running virtme (a simple vm gadget) on Fedora 21, the slowest
>>>>> part of bootup by far appears to be systemd-vconsole-setup:
>>>>>
>>>>> # time /usr/lib/systemd/systemd-vconsole-setup
>>>>> putfont: PIO_FONT trying ...
>>>>> ...
>>>>> setfont: putfont: 512,8x16:  failed: -1
>>>>> putfont: PIO_FONT: Invalid argument
>>>>> /usr/bin/setfont failed with error code 71.
>>>>
>>>> setfont is not part of systemd, we just invoke it. If that fails, this
>>>> is a problem somewhere between the VM, the kernel and console-tools.
>>>>
>>>
>>> Aha -- I missed that systemd-vconsole-setup calls setfont.  I can
>>> trigger the same problem by just typing setfont.  For whatever reason,
>>> my other Fedora 21 computer only has this problem if I type setfont
>>> and not if I run systemd-vconcole-setup.
>>>
>>>> My uneducated guess is that your virtual machine boots up with a
>>>> non-graphical console, and the tool thus tries to upload the fonts
>>>> into the good old VGA hw text mode glyph tables, and qemu is very slow
>>>> at that... Or something like that.
>>>
>>> setfont is doing this:
>>>
>>> nanosleep({0, 25000}, NULL) = 0
>>> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
>>> write(2, ".", 1.)= 1
>>> nanosleep({0, 25000}, NULL) = 0
>>> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
>>> write(2, ".", 1.)= 1
>>> nanosleep({0, 25000}, NULL) = 0
>>> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
>>> write(2, ".", 1.)= 1
>>> nanosleep({0, 25000}, NULL) = 0
>>> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
>>> write(2, ".", 1.)= 1
>>> nanosleep({0, 25000}, NULL) = 0
>>> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
>>> write(2, ".", 1.)= 1
>>>
>>> This thing has only a serial console:
>>>
>>> # cat /proc/consoles
>>> ttyS0-W- (EC   a)4:64
>>>
>>> setfont does this:
>>>
>>> /* we allow ourselves to hang here for ca 5 seconds, xdm may
>>> be playing tricks on us. */
>>> while ((loop++ < 20) && (i = ioctl(fd, PIO_FONT, buf)))
>>>   {
>>> if (loop <= 1)
>>>   fprintf(stderr, "putfont: PIO_FONT trying ...\n");
>>> else
>>>   fprintf(stderr, ".");
>>> usleep(25);
>>>   }
>>> fprintf(stderr, "\n");
>>>
>>> Alexey, would it make sense to remove this loop or to add a way to turn it 
>>> off?
>>
>> Ping, everyone?
>>
>> This issue still exists.  AFAICT systemd is relying on a really old
>> tool, that that really old tool (setfont) is sometimes delaying boot
>> by a very large amount.  Can we either fix the tool (Alexey) or stop
>> using it (systemd people)?
>
> Hmm, why is the "vm gadget" you run configuring a custom console font
> at all? If there is no custom font specified in t he config, systemd
> will not run setfont.

It's not intentionally configuring a custom font, but it might be
inheriting Fedora's settings.

>
> Or did you mean to have vconsole-setup detect that it should not even
> try to run setfont? Not sure how to find that out.
>
> I don't really see how vconsole-setup could get rid of calling setfont
> from systemd, it is needed in many setups.

vconsole-setup could set the font itself instead of using setfont if
setfont can't be configured or fixed not to keep retrying for five
seconds (!).

Ideally, I think that setfont would just stop retrying on failure.  Or
perhaps all of this could go through udev or some other mechanism that
doesn't try to set the font until the device actually exists.  But the
console system is weird and may be that's hard.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Thu, Jan 22, 2015 at 6:29 PM, Andy Lutomirski  wrote:
> On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering
>  wrote:
>> On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote:
>>
>>> Hi all-
>>>
>>> When running virtme (a simple vm gadget) on Fedora 21, the slowest
>>> part of bootup by far appears to be systemd-vconsole-setup:
>>>
>>> # time /usr/lib/systemd/systemd-vconsole-setup
>>> putfont: PIO_FONT trying ...
>>> ...
>>> setfont: putfont: 512,8x16:  failed: -1
>>> putfont: PIO_FONT: Invalid argument
>>> /usr/bin/setfont failed with error code 71.
>>
>> setfont is not part of systemd, we just invoke it. If that fails, this
>> is a problem somewhere between the VM, the kernel and console-tools.
>>
>
> Aha -- I missed that systemd-vconsole-setup calls setfont.  I can
> trigger the same problem by just typing setfont.  For whatever reason,
> my other Fedora 21 computer only has this problem if I type setfont
> and not if I run systemd-vconcole-setup.
>
>> My uneducated guess is that your virtual machine boots up with a
>> non-graphical console, and the tool thus tries to upload the fonts
>> into the good old VGA hw text mode glyph tables, and qemu is very slow
>> at that... Or something like that.
>
> setfont is doing this:
>
> nanosleep({0, 25000}, NULL) = 0
> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
> write(2, ".", 1.)= 1
> nanosleep({0, 25000}, NULL) = 0
> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
> write(2, ".", 1.)= 1
> nanosleep({0, 25000}, NULL) = 0
> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
> write(2, ".", 1.)= 1
> nanosleep({0, 25000}, NULL) = 0
> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
> write(2, ".", 1.)= 1
> nanosleep({0, 25000}, NULL) = 0
> ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
> write(2, ".", 1.)= 1
>
> This thing has only a serial console:
>
> # cat /proc/consoles
> ttyS0-W- (EC   a)4:64
>
> setfont does this:
>
> /* we allow ourselves to hang here for ca 5 seconds, xdm may
> be playing tricks on us. */
> while ((loop++ < 20) && (i = ioctl(fd, PIO_FONT, buf)))
>   {
> if (loop <= 1)
>   fprintf(stderr, "putfont: PIO_FONT trying ...\n");
> else
>   fprintf(stderr, ".");
> usleep(25);
>   }
> fprintf(stderr, "\n");
>
> Alexey, would it make sense to remove this loop or to add a way to turn it 
> off?

Ping, everyone?

This issue still exists.  AFAICT systemd is relying on a really old
tool, that that really old tool (setfont) is sometimes delaying boot
by a very large amount.  Can we either fix the tool (Alexey) or stop
using it (systemd people)?

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-01-22 Thread Andy Lutomirski
On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering
 wrote:
> On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote:
>
>> Hi all-
>>
>> When running virtme (a simple vm gadget) on Fedora 21, the slowest
>> part of bootup by far appears to be systemd-vconsole-setup:
>>
>> # time /usr/lib/systemd/systemd-vconsole-setup
>> putfont: PIO_FONT trying ...
>> ...
>> setfont: putfont: 512,8x16:  failed: -1
>> putfont: PIO_FONT: Invalid argument
>> /usr/bin/setfont failed with error code 71.
>
> setfont is not part of systemd, we just invoke it. If that fails, this
> is a problem somewhere between the VM, the kernel and console-tools.
>

Aha -- I missed that systemd-vconsole-setup calls setfont.  I can
trigger the same problem by just typing setfont.  For whatever reason,
my other Fedora 21 computer only has this problem if I type setfont
and not if I run systemd-vconcole-setup.

> My uneducated guess is that your virtual machine boots up with a
> non-graphical console, and the tool thus tries to upload the fonts
> into the good old VGA hw text mode glyph tables, and qemu is very slow
> at that... Or something like that.

setfont is doing this:

nanosleep({0, 25000}, NULL) = 0
ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
write(2, ".", 1.)= 1
nanosleep({0, 25000}, NULL) = 0
ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
write(2, ".", 1.)= 1
nanosleep({0, 25000}, NULL) = 0
ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
write(2, ".", 1.)= 1
nanosleep({0, 25000}, NULL) = 0
ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
write(2, ".", 1.)= 1
nanosleep({0, 25000}, NULL) = 0
ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
write(2, ".", 1.)= 1

This thing has only a serial console:

# cat /proc/consoles
ttyS0-W- (EC   a)4:64

setfont does this:

/* we allow ourselves to hang here for ca 5 seconds, xdm may
be playing tricks on us. */
while ((loop++ < 20) && (i = ioctl(fd, PIO_FONT, buf)))
  {
if (loop <= 1)
  fprintf(stderr, "putfont: PIO_FONT trying ...\n");
else
  fprintf(stderr, ".");
usleep(25);
  }
fprintf(stderr, "\n");

Alexey, would it make sense to remove this loop or to add a way to turn it off?

Thanks,
Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] systemd-vconsole-setup fails very slowly

2015-01-21 Thread Andy Lutomirski
Hi all-

When running virtme (a simple vm gadget) on Fedora 21, the slowest
part of bootup by far appears to be systemd-vconsole-setup:

# time /usr/lib/systemd/systemd-vconsole-setup
putfont: PIO_FONT trying ...
...
setfont: putfont: 512,8x16:  failed: -1
putfont: PIO_FONT: Invalid argument
/usr/bin/setfont failed with error code 71.

real0m5.361s
user0m0.079s
sys0m0.093s

I have no idea what it's trying to do, why it thinks it should work,
or why it doesn't work, but can it be updated to fail quickly.  The VM
in question has no graphics:

# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 Unclassified device [0002]: Red Hat, Inc Virtio filesystem
00:03.0 System peripheral: Intel Corporation 6300ESB Watchdog Timer

To reproduce on Fedora 21:

$ sudo dnf install virtme
$ virtme-run --installed-kernel

Then twiddle your thumbs for ~6.3 seconds instead of the ~1.3 seconds
of thumb twiddling that would be required AFAICT without this issue.

Thanks,
Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-12-10 Thread Andy Lutomirski
On Tue, Dec 9, 2014 at 12:46 PM, Andy Lutomirski  wrote:
> On Mon, Nov 3, 2014 at 12:41 PM, Andy Lutomirski  wrote:
>> On Mon, Nov 3, 2014 at 12:21 PM, Jiri Kosina  wrote:
>>> On Mon, 3 Nov 2014, David Herrmann wrote:
>>>
>>>> > Agreed, mostly.  My only real concern is that this could be annoying
>>>> > for the userspace developers who will need to target Linux and HIDAPI
>>>> > separately.  Admittedly the Linux support will be trivial.
>>>>
>>>> I see. I'll not stop you from using hidraw, I'd just not recommend it.
>>>> Especially for security stuff I dislike exposing the whole HID device
>>>> writable. But yeah, I guess you got my point.
>>>
>>> Alright, I am basically thinking loudly now, but how about we allow HID
>>> drivers that would override default hidraw interface?
>>>
>>> I am very well aware of the fact that this could be opening a can of
>>> worms, so we'll have to make it very restrictive. How about, let's say, we
>>> allow HID drivers to provide custom hidraw interface (completely
>>> overriding the one that HID core would normally create) only for cases
>>> such as:
>>>
>>> - the intent is to expose only certain parts of a combined device
>>> - the intent is to introduce some level of access control
>>>
>>> I.e. still no interference of kernel with data parsing allowed.
>>
>> Hmm.  Would this be like a filter on hidraw actions?
>>
>> How would udev distinguish these special hidraw devices from normal
>> hidraw devices?
>>
>> Also, for U2F, this could be a little awkward.  There's some crazy
>> fragmentation stuff to allow a U2F request to be split across HID
>> requests, and I think a kernel driver would much rather get the
>> original unfragmented application request.
>>
>
> I started writing a driver for this.  I got enumeration working.  I
> assume I should create a "u2f" device class, and then... ?
>
> Where am I supposed to get my character device from?  Do I register my
> own chrdev major?  Do I use misc?  Is there some input thing I'm
> supposed to use?

Another question:

I'm hitting this in hid_input_report:

if (down_trylock(&hid->driver_input_lock)) {
pr_err("HID: trylock failed\n");  // I added this
return -EBUSY;
}

This is a problem for u2f: u2f reports are actual protocol messages,
and there isn't a retransmit mechanism.  Losing messages randomly
causes the handshake to fail, and then nothing works.

I *think* that the only way I can hit that failure is during probe (or
if the USB stack somehow completes two transfers at once). This still
makes it quite awkward to start IO from the probe routine.

I could start a work item to take driver_input_lock, release it again,
and then start the handshake, but that sucks.

Ideas?

--Andy

>
> Thanks,
> Andy
>
>> --Andy
>>
>>>
>>>> > I can give the driver a try.  It'll actually be simpler than the spec
>>>> > makes it out, since a real kernel driver will have no need to
>>>> > arbitrate with itself.
>>>>
>>>> I'll gladly review any custom HID drivers. Feel free to put me on CC
>>>> when sending to linux-input. There's also a lot of examples in
>>>> drivers/hid/ in case you need a head-start (especially if you need the
>>>> raw_event callback).
>>>
>>> Same here, of course.
>>>
>>> Please always CC me in parallel to sending to linux-input@ to make sure
>>> that the patch doesn't fall in between cracks.
>>>
>>> Thanks,
>>>
>>> --
>>> Jiri Kosina
>>> SUSE Labs
>>
>>
>>
>> --
>> Andy Lutomirski
>> AMA Capital Management, LLC
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-12-09 Thread Andy Lutomirski
On Mon, Nov 3, 2014 at 12:41 PM, Andy Lutomirski  wrote:
> On Mon, Nov 3, 2014 at 12:21 PM, Jiri Kosina  wrote:
>> On Mon, 3 Nov 2014, David Herrmann wrote:
>>
>>> > Agreed, mostly.  My only real concern is that this could be annoying
>>> > for the userspace developers who will need to target Linux and HIDAPI
>>> > separately.  Admittedly the Linux support will be trivial.
>>>
>>> I see. I'll not stop you from using hidraw, I'd just not recommend it.
>>> Especially for security stuff I dislike exposing the whole HID device
>>> writable. But yeah, I guess you got my point.
>>
>> Alright, I am basically thinking loudly now, but how about we allow HID
>> drivers that would override default hidraw interface?
>>
>> I am very well aware of the fact that this could be opening a can of
>> worms, so we'll have to make it very restrictive. How about, let's say, we
>> allow HID drivers to provide custom hidraw interface (completely
>> overriding the one that HID core would normally create) only for cases
>> such as:
>>
>> - the intent is to expose only certain parts of a combined device
>> - the intent is to introduce some level of access control
>>
>> I.e. still no interference of kernel with data parsing allowed.
>
> Hmm.  Would this be like a filter on hidraw actions?
>
> How would udev distinguish these special hidraw devices from normal
> hidraw devices?
>
> Also, for U2F, this could be a little awkward.  There's some crazy
> fragmentation stuff to allow a U2F request to be split across HID
> requests, and I think a kernel driver would much rather get the
> original unfragmented application request.
>

I started writing a driver for this.  I got enumeration working.  I
assume I should create a "u2f" device class, and then... ?

Where am I supposed to get my character device from?  Do I register my
own chrdev major?  Do I use misc?  Is there some input thing I'm
supposed to use?

Thanks,
Andy

> --Andy
>
>>
>>> > I can give the driver a try.  It'll actually be simpler than the spec
>>> > makes it out, since a real kernel driver will have no need to
>>> > arbitrate with itself.
>>>
>>> I'll gladly review any custom HID drivers. Feel free to put me on CC
>>> when sending to linux-input. There's also a lot of examples in
>>> drivers/hid/ in case you need a head-start (especially if you need the
>>> raw_event callback).
>>
>> Same here, of course.
>>
>> Please always CC me in parallel to sending to linux-input@ to make sure
>> that the patch doesn't fall in between cracks.
>>
>> Thanks,
>>
>> --
>> Jiri Kosina
>> SUSE Labs
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-12-01 Thread Andy Lutomirski
On Mon, Dec 1, 2014 at 8:39 AM, Konstantin Khlebnikov  wrote:
> On Mon, Dec 1, 2014 at 7:21 PM, Andy Lutomirski  wrote:
>> On Sun, Nov 30, 2014 at 11:03 PM, Konstantin Khlebnikov
>>  wrote:
>>> Hmm. What about per-task/thread UUID? exported via separate file: 
>>> /proc/PID/uuid
>>> It could be created at the first access, thus this wouldn't shlowdown 
>>> clone().
>>> Also it could be droped at execve(), so it'll describe execution
>>> context more precisely than pid.
>>>
>>
>> I'd be all for this if it weren't for two issues:
>>
>> 1. This thing needs to work as soon as init is started, and we can't
>> reliably generate random numbers that early.
>>
>> 2. Migration / CRIU is rather fundamentally at odds with making
>> anything universally unique, as opposed to just per-user-ns.
>>
>> So, given that I don't think we can actually provide a universally
>> unique pid-like thing, I'd rather not even try.
>
> Well... another idea: what about pid generation counter? Of course it
> should be per-pid-ns.
> As I see struct upid has nice 32-bit hole next to 'nr' field. (on
> 64-bit systems)
>

I thought about that, but it has two issues:

1. Implementing it will be pain.  The pid allocation algorithm is
already complicated, and it wasn't obvious to me how to accurately
detect wraparound without racing against other wrapping users.

2. I don't think it will be reliable.  Suppose that there are pid_max
- 1 processes.  One of them can repeatedly clone and exit,
incrementing the generation counter each time.  After 2^32 iterations,
which won't take all that long, the pid generation will wrap.

--Andy

>>
>> That being said, I want to rework this a little bit.  I think that
>> highpid should be restricted to the range 2^32 through 2^64 - 4096.
>> The former prevents anyone from confusing highpid with regular pid,
>> and the latter means that we don't need to worry about confusion
>> between errors and valid highpids (e.g. -1 will never be a highpid).
>>
>> Implementing that will be only mildly annoying.
>>
>> --Andy
>>
>>> On Sat, Nov 29, 2014 at 2:05 AM, Andy Lutomirski  
>>> wrote:
>>>> Pid reuse is common, which means that it's difficult or impossible
>>>> to read information about a pid from /proc without races.
>>>>
>>>> This introduces a second number associated with each (task, pidns)
>>>> pair called highpid.  Highpid is a 64-bit number, and, barring
>>>> extremely unlikely circumstances or outright error, a (highpid, pid)
>>>> will never be reused.
>>>>
>>>> With just this change, a program can open /proc/PID/status, read the
>>>> "Highpid" field, and confirm that it has the expected value.  If the
>>>> pid has been reused, then highpid will be different.
>>>>
>>>> The initial implementation is straightforward: highpid is simply a
>>>> 64-bit counter. If a high-end system can fork every 3 ns (which
>>>> would be amazing, given that just allocating a pid requires at
>>>> atomic operation), it would take well over 1000 years for highpid to
>>>> wrap.
>>>>
>>>> For CRIU's benefit, the next highpid can be set by a privileged
>>>> user.
>>>>
>>>> NB: The sysctl stuff only works on 64-bit systems.  If the approach
>>>> looks good, I'll fix that somehow.
>>>>
>>>> Signed-off-by: Andy Lutomirski 
>>>> ---
>>>>
>>>> If this goes in, there's plenty of room to add new interfaces to
>>>> make this more useful.  For example, we could add a fancier tgkill
>>>> that adds and validates hightgid and highpid, and we might want to
>>>> add a syscall to read one's own hightgid and highpid.  These would
>>>> be quite useful for pidfiles.
>>>>
>>>> David, would this be useful for kdbus?
>>>>
>>>> CRIU people: will this be unduly difficult to support in CRIU?
>>>>
>>>> If you all think this is a good idea, I'll fix the sysctl stuff and
>>>> document it.  It might also be worth adding "Hightgid" to status.
>>>>
>>>>  fs/proc/array.c   |  5 -
>>>>  include/linux/pid.h   |  2 ++
>>>>  include/linux/pid_namespace.h |  1 +
>>>>  kernel/pid.c  | 19 +++
>>>>  kernel/pid_namespace.c| 22 ++

Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-12-01 Thread Andy Lutomirski
On Sun, Nov 30, 2014 at 11:03 PM, Konstantin Khlebnikov
 wrote:
> Hmm. What about per-task/thread UUID? exported via separate file: 
> /proc/PID/uuid
> It could be created at the first access, thus this wouldn't shlowdown clone().
> Also it could be droped at execve(), so it'll describe execution
> context more precisely than pid.
>

I'd be all for this if it weren't for two issues:

1. This thing needs to work as soon as init is started, and we can't
reliably generate random numbers that early.

2. Migration / CRIU is rather fundamentally at odds with making
anything universally unique, as opposed to just per-user-ns.

So, given that I don't think we can actually provide a universally
unique pid-like thing, I'd rather not even try.

That being said, I want to rework this a little bit.  I think that
highpid should be restricted to the range 2^32 through 2^64 - 4096.
The former prevents anyone from confusing highpid with regular pid,
and the latter means that we don't need to worry about confusion
between errors and valid highpids (e.g. -1 will never be a highpid).

Implementing that will be only mildly annoying.

--Andy

> On Sat, Nov 29, 2014 at 2:05 AM, Andy Lutomirski  wrote:
>> Pid reuse is common, which means that it's difficult or impossible
>> to read information about a pid from /proc without races.
>>
>> This introduces a second number associated with each (task, pidns)
>> pair called highpid.  Highpid is a 64-bit number, and, barring
>> extremely unlikely circumstances or outright error, a (highpid, pid)
>> will never be reused.
>>
>> With just this change, a program can open /proc/PID/status, read the
>> "Highpid" field, and confirm that it has the expected value.  If the
>> pid has been reused, then highpid will be different.
>>
>> The initial implementation is straightforward: highpid is simply a
>> 64-bit counter. If a high-end system can fork every 3 ns (which
>> would be amazing, given that just allocating a pid requires at
>> atomic operation), it would take well over 1000 years for highpid to
>> wrap.
>>
>> For CRIU's benefit, the next highpid can be set by a privileged
>> user.
>>
>> NB: The sysctl stuff only works on 64-bit systems.  If the approach
>> looks good, I'll fix that somehow.
>>
>> Signed-off-by: Andy Lutomirski 
>> ---
>>
>> If this goes in, there's plenty of room to add new interfaces to
>> make this more useful.  For example, we could add a fancier tgkill
>> that adds and validates hightgid and highpid, and we might want to
>> add a syscall to read one's own hightgid and highpid.  These would
>> be quite useful for pidfiles.
>>
>> David, would this be useful for kdbus?
>>
>> CRIU people: will this be unduly difficult to support in CRIU?
>>
>> If you all think this is a good idea, I'll fix the sysctl stuff and
>> document it.  It might also be worth adding "Hightgid" to status.
>>
>>  fs/proc/array.c   |  5 -
>>  include/linux/pid.h   |  2 ++
>>  include/linux/pid_namespace.h |  1 +
>>  kernel/pid.c  | 19 +++
>>  kernel/pid_namespace.c| 22 ++
>>  5 files changed, 44 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/proc/array.c b/fs/proc/array.c
>> index cd3653e4f35c..f1e0e69d19f9 100644
>> --- a/fs/proc/array.c
>> +++ b/fs/proc/array.c
>> @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct 
>> pid_namespace *ns,
>> int g;
>> struct fdtable *fdt = NULL;
>> const struct cred *cred;
>> +   const struct upid *upid;
>> pid_t ppid, tpid;
>>
>> rcu_read_lock();
>> @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, 
>> struct pid_namespace *ns,
>> if (tracer)
>> tpid = task_pid_nr_ns(tracer, ns);
>> }
>> +   upid = pid_upid_ns(pid, ns);
>> cred = get_task_cred(p);
>> seq_printf(m,
>> "State:\t%s\n"
>> "Tgid:\t%d\n"
>> "Ngid:\t%d\n"
>> "Pid:\t%d\n"
>> +   "Highpid:\t%llu\n"
>> "PPid:\t%d\n"
>> "TracerPid:\t%d\n"
>> "Uid:\t%d\t%d\t%d\t%d\n"
>> @@ -183,7 +186,7 @@ static inline void task_state(struct seq_file *m, struct 
>> pid_namespace *ns,
>> 

Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-30 Thread Andy Lutomirski
On Nov 30, 2014 1:47 AM, "Florian Weimer"  wrote:
>
> * Andy Lutomirski:
>
> > The initial implementation is straightforward: highpid is simply a
> > 64-bit counter. If a high-end system can fork every 3 ns (which
> > would be amazing, given that just allocating a pid requires at
> > atomic operation), it would take well over 1000 years for highpid to
> > wrap.
>
> I'm not sure if I'm reading the patch correctly, but is the counter
> namespaced?  If yes, why?

It's namespaced so that CRIU can migrate/restore a whole pid namespace.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-30 Thread Andy Lutomirski
On Nov 30, 2014 9:45 AM, "David Herrmann"  wrote:
>
> Hi Andy
>
> On Sat, Nov 29, 2014 at 12:05 AM, Andy Lutomirski  wrote:
> > Pid reuse is common, which means that it's difficult or impossible
> > to read information about a pid from /proc without races.
> >
> > This introduces a second number associated with each (task, pidns)
> > pair called highpid.  Highpid is a 64-bit number, and, barring
> > extremely unlikely circumstances or outright error, a (highpid, pid)
> > will never be reused.
> >
> > With just this change, a program can open /proc/PID/status, read the
> > "Highpid" field, and confirm that it has the expected value.  If the
> > pid has been reused, then highpid will be different.
> >
> > The initial implementation is straightforward: highpid is simply a
> > 64-bit counter. If a high-end system can fork every 3 ns (which
> > would be amazing, given that just allocating a pid requires at
> > atomic operation), it would take well over 1000 years for highpid to
> > wrap.
> >
> > For CRIU's benefit, the next highpid can be set by a privileged
> > user.
> >
> > NB: The sysctl stuff only works on 64-bit systems.  If the approach
> > looks good, I'll fix that somehow.
> >
> > Signed-off-by: Andy Lutomirski 
> > ---
> >
> > If this goes in, there's plenty of room to add new interfaces to
> > make this more useful.  For example, we could add a fancier tgkill
> > that adds and validates hightgid and highpid, and we might want to
> > add a syscall to read one's own hightgid and highpid.  These would
> > be quite useful for pidfiles.
> >
> > David, would this be useful for kdbus?
>
> Much appreciated! This would serve well as replacement for
> 'starttime'. I'd prefer if pid_t was 64bit, but I guess that ship
> sailed long ago. Though, your patch might in the end just introduce a
> new pid64, which replaces the old pid and lives in parallel.
>
> Anyway, considering that we actually want the same pid-reuse
> protection for tid, tgid, ppid and so on, we'd have to add a
> 'starttime' for all of them. Sounds ugly.. so we might just end up
> dropping 'starttime' and introduce KDBUS_ITEM_PIDS2 whenever a 'fix'
> is merged upstream.

I don't understand "we want".  Doesn't kdbus currently only think
about tid and tgid?

In any event, I just looked at the kdbus code
(https://github.com/gregkh/kdbus/blob/master/metadata.c), and I see:

meta->pid = get_pid(task_pid(current));
meta->tgid = get_pid(task_tgid(current));
meta->starttime = current->start_time,

I don't understand how that concept of starttime is particularly
useful.  Isn't that the start time associated with tid?  How can that
be useful when looking up tgid?

Regardless, my patch here associates a highpid with each struct pid,
and both tid and tgid are just struct pids, so you get both.  I agree
that adding Highppid to /proc might be useful, but that seems like a
future extension that has nothing to do with kdbus.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-29 Thread Andy Lutomirski
On Fri, Nov 28, 2014 at 7:34 PM, Eric W. Biederman
 wrote:
> Andy Lutomirski  writes:
>
>> Pid reuse is common, which means that it's difficult or impossible
>> to read information about a pid from /proc without races.
>
> Sigh.
>
> What we need are not race free pids, but a file descriptor based process
> management api.  Possibly one that starts by handing you a proc
> directory.

I agree completely, and the Capsicum stuff from FreeBSD would probably
be a very good start here.

>
> Which probably means that we need a proc file we can write to and send
> signals to a process, and another proc file we can select on and wait
> for the process to exit.

That too.  In fact, I have an old patch that went nowhere that makes
the proc directory fd itself pollable.

>
> Making pids bigger just looks like bandaid.
>
> Remember evovle things in the direction of an object capability system
> things wind up being more maintainable.

Yes, but this really is intended to be a much better bandaid than what
people currently use.

I'm aiming this at two main use cases that aren't going to switch to
using fds any time soon.  One is shell stuff and PID files.  If we can
put something like "12345@81726162" in /run/lock/foo.pid and safely
kill `cat /run/lock/foo.pid`, that would be nice (even though sensible
users don't use pid files any more, there are *tons* of things that
still use them).  This could also be used for diagnostics.  Suppose
the kernel log indicates a misbehaving pid.  There's currently no
race-free way to poke at those pids.

The much more pressing issue is that there are apparently programs
that think that checking the process's starttime is a good idea for
avoiding PID reuse races.  Both kdbus v1 and v2 do this, hopefully
only for diagnostics.  This is, IMO, completely unacceptable, but I
really don't expect kdbus to start passing even more fds around when
they'll be ignored most of the time.  So, if kdbus is going to be
merged at some point, I'd like to give it the opportunity to name the
sender of a message in a way that is only mildly dangerous (in non
race-related ways) as opposed to being totally broken.

Now if someone wants to implement real light-weight capability-ish fds
in Linux, that would be neat.  IIRC someone tried several years ago
using fds with some high bits set, but it never went anywhere.

FWIW, given that I seem to be the most vocal reviewer of the kdbus
patches, I feel like it's nice to offer some assistance in a piece of
the kdbus stuff that I think really would benefit from a new kernel
feature.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-29 Thread Andy Lutomirski
On Nov 28, 2014 9:24 PM, "Greg KH"  wrote:
>
> On Fri, Nov 28, 2014 at 03:05:01PM -0800, Andy Lutomirski wrote:
> > Pid reuse is common, which means that it's difficult or impossible
> > to read information about a pid from /proc without races.
> >
> > This introduces a second number associated with each (task, pidns)
> > pair called highpid.  Highpid is a 64-bit number, and, barring
> > extremely unlikely circumstances or outright error, a (highpid, pid)
> > will never be reused.
> >
> > With just this change, a program can open /proc/PID/status, read the
> > "Highpid" field, and confirm that it has the expected value.  If the
> > pid has been reused, then highpid will be different.
> >
> > The initial implementation is straightforward: highpid is simply a
> > 64-bit counter. If a high-end system can fork every 3 ns (which
> > would be amazing, given that just allocating a pid requires at
> > atomic operation), it would take well over 1000 years for highpid to
> > wrap.
> >
> > For CRIU's benefit, the next highpid can be set by a privileged
> > user.
> >
> > NB: The sysctl stuff only works on 64-bit systems.  If the approach
> > looks good, I'll fix that somehow.
> >
> > Signed-off-by: Andy Lutomirski 
> > ---
> >
> > If this goes in, there's plenty of room to add new interfaces to
> > make this more useful.  For example, we could add a fancier tgkill
> > that adds and validates hightgid and highpid, and we might want to
> > add a syscall to read one's own hightgid and highpid.  These would
> > be quite useful for pidfiles.
> >
> > David, would this be useful for kdbus?
> >
> > CRIU people: will this be unduly difficult to support in CRIU?
> >
> > If you all think this is a good idea, I'll fix the sysctl stuff and
> > document it.  It might also be worth adding "Hightgid" to status.
> >
> >  fs/proc/array.c   |  5 -
> >  include/linux/pid.h   |  2 ++
> >  include/linux/pid_namespace.h |  1 +
> >  kernel/pid.c  | 19 +++
> >  kernel/pid_namespace.c| 22 ++
> >  5 files changed, 44 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/proc/array.c b/fs/proc/array.c
> > index cd3653e4f35c..f1e0e69d19f9 100644
> > --- a/fs/proc/array.c
> > +++ b/fs/proc/array.c
> > @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, 
> > struct pid_namespace *ns,
> >   int g;
> >   struct fdtable *fdt = NULL;
> >   const struct cred *cred;
> > + const struct upid *upid;
> >   pid_t ppid, tpid;
> >
> >   rcu_read_lock();
> > @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, 
> > struct pid_namespace *ns,
> >   if (tracer)
> >   tpid = task_pid_nr_ns(tracer, ns);
> >   }
> > + upid = pid_upid_ns(pid, ns);
> >   cred = get_task_cred(p);
> >   seq_printf(m,
> >   "State:\t%s\n"
> >   "Tgid:\t%d\n"
> >   "Ngid:\t%d\n"
> >   "Pid:\t%d\n"
> > + "Highpid:\t%llu\n"
> >   "PPid:\t%d\n"
> >   "TracerPid:\t%d\n"
> >   "Uid:\t%d\t%d\t%d\t%d\n"
>
> Changing existing proc files like this is dangerous and can cause lots
> of breakage in userspace programs if you are not careful.  Usually
> adding fields to the end of the file is best, but sometimes even then
> things can break :(

Point taken.  I had hoped that everything would parse /proc/PID/status
by looking for the lamed headers.  I'll see what the history of new
fields in there is, but I can change this easily enough.

--Andy

> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-28 Thread Andy Lutomirski
[Adding CRIU people.  Whoops.]

On Fri, Nov 28, 2014 at 3:05 PM, Andy Lutomirski  wrote:
> Pid reuse is common, which means that it's difficult or impossible
> to read information about a pid from /proc without races.
>
> This introduces a second number associated with each (task, pidns)
> pair called highpid.  Highpid is a 64-bit number, and, barring
> extremely unlikely circumstances or outright error, a (highpid, pid)
> will never be reused.
>
> With just this change, a program can open /proc/PID/status, read the
> "Highpid" field, and confirm that it has the expected value.  If the
> pid has been reused, then highpid will be different.
>
> The initial implementation is straightforward: highpid is simply a
> 64-bit counter. If a high-end system can fork every 3 ns (which
> would be amazing, given that just allocating a pid requires at
> atomic operation), it would take well over 1000 years for highpid to
> wrap.
>
> For CRIU's benefit, the next highpid can be set by a privileged
> user.
>
> NB: The sysctl stuff only works on 64-bit systems.  If the approach
> looks good, I'll fix that somehow.
>
> Signed-off-by: Andy Lutomirski 
> ---
>
> If this goes in, there's plenty of room to add new interfaces to
> make this more useful.  For example, we could add a fancier tgkill
> that adds and validates hightgid and highpid, and we might want to
> add a syscall to read one's own hightgid and highpid.  These would
> be quite useful for pidfiles.
>
> David, would this be useful for kdbus?
>
> CRIU people: will this be unduly difficult to support in CRIU?
>
> If you all think this is a good idea, I'll fix the sysctl stuff and
> document it.  It might also be worth adding "Hightgid" to status.
>
>  fs/proc/array.c   |  5 -
>  include/linux/pid.h   |  2 ++
>  include/linux/pid_namespace.h |  1 +
>  kernel/pid.c  | 19 +++
>  kernel/pid_namespace.c| 22 ++
>  5 files changed, 44 insertions(+), 5 deletions(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index cd3653e4f35c..f1e0e69d19f9 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct 
> pid_namespace *ns,
> int g;
> struct fdtable *fdt = NULL;
> const struct cred *cred;
> +   const struct upid *upid;
> pid_t ppid, tpid;
>
> rcu_read_lock();
> @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, 
> struct pid_namespace *ns,
> if (tracer)
> tpid = task_pid_nr_ns(tracer, ns);
> }
> +   upid = pid_upid_ns(pid, ns);
> cred = get_task_cred(p);
> seq_printf(m,
> "State:\t%s\n"
> "Tgid:\t%d\n"
> "Ngid:\t%d\n"
> "Pid:\t%d\n"
> +   "Highpid:\t%llu\n"
> "PPid:\t%d\n"
> "TracerPid:\t%d\n"
> "Uid:\t%d\t%d\t%d\t%d\n"
> @@ -183,7 +186,7 @@ static inline void task_state(struct seq_file *m, struct 
> pid_namespace *ns,
> get_task_state(p),
> task_tgid_nr_ns(p, ns),
> task_numa_group_id(p),
> -   pid_nr_ns(pid, ns),
> +   upid ? upid->nr : 0, upid ? upid->highnr : 0,
> ppid, tpid,
> from_kuid_munged(user_ns, cred->uid),
> from_kuid_munged(user_ns, cred->euid),
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index 23705a53abba..ece70b64d04c 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -51,6 +51,7 @@ struct upid {
> /* Try to keep pid_chain in the same cacheline as nr for find_vpid */
> int nr;
> struct pid_namespace *ns;
> +   u64 highnr;
> struct hlist_node pid_chain;
>  };
>
> @@ -170,6 +171,7 @@ static inline pid_t pid_nr(struct pid *pid)
>  }
>
>  pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns);
> +const struct upid *pid_upid_ns(struct pid *pid, struct pid_namespace *ns);
>  pid_t pid_vnr(struct pid *pid);
>
>  #define do_each_pid_task(pid, type, task)  \
> diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
> index 1997ffc295a7..1f9f0455d7ef 100644
> --- a/include/linux/pid_namespace.h
> +++ b/include/linux/pid_namespace.h
> @@ -26,6 +26,7 @@ struct pid_namespace {
> struct rcu_head rcu;
> int last_pid;
> unsigned

[systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-28 Thread Andy Lutomirski
Pid reuse is common, which means that it's difficult or impossible
to read information about a pid from /proc without races.

This introduces a second number associated with each (task, pidns)
pair called highpid.  Highpid is a 64-bit number, and, barring
extremely unlikely circumstances or outright error, a (highpid, pid)
will never be reused.

With just this change, a program can open /proc/PID/status, read the
"Highpid" field, and confirm that it has the expected value.  If the
pid has been reused, then highpid will be different.

The initial implementation is straightforward: highpid is simply a
64-bit counter. If a high-end system can fork every 3 ns (which
would be amazing, given that just allocating a pid requires at
atomic operation), it would take well over 1000 years for highpid to
wrap.

For CRIU's benefit, the next highpid can be set by a privileged
user.

NB: The sysctl stuff only works on 64-bit systems.  If the approach
looks good, I'll fix that somehow.

Signed-off-by: Andy Lutomirski 
---

If this goes in, there's plenty of room to add new interfaces to
make this more useful.  For example, we could add a fancier tgkill
that adds and validates hightgid and highpid, and we might want to
add a syscall to read one's own hightgid and highpid.  These would
be quite useful for pidfiles.

David, would this be useful for kdbus?

CRIU people: will this be unduly difficult to support in CRIU?

If you all think this is a good idea, I'll fix the sysctl stuff and
document it.  It might also be worth adding "Hightgid" to status.

 fs/proc/array.c   |  5 -
 include/linux/pid.h   |  2 ++
 include/linux/pid_namespace.h |  1 +
 kernel/pid.c  | 19 +++
 kernel/pid_namespace.c| 22 ++
 5 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index cd3653e4f35c..f1e0e69d19f9 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct 
pid_namespace *ns,
int g;
struct fdtable *fdt = NULL;
const struct cred *cred;
+   const struct upid *upid;
pid_t ppid, tpid;
 
rcu_read_lock();
@@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, struct 
pid_namespace *ns,
if (tracer)
tpid = task_pid_nr_ns(tracer, ns);
}
+   upid = pid_upid_ns(pid, ns);
cred = get_task_cred(p);
seq_printf(m,
"State:\t%s\n"
"Tgid:\t%d\n"
"Ngid:\t%d\n"
"Pid:\t%d\n"
+   "Highpid:\t%llu\n"
"PPid:\t%d\n"
"TracerPid:\t%d\n"
"Uid:\t%d\t%d\t%d\t%d\n"
@@ -183,7 +186,7 @@ static inline void task_state(struct seq_file *m, struct 
pid_namespace *ns,
get_task_state(p),
task_tgid_nr_ns(p, ns),
task_numa_group_id(p),
-   pid_nr_ns(pid, ns),
+   upid ? upid->nr : 0, upid ? upid->highnr : 0,
ppid, tpid,
from_kuid_munged(user_ns, cred->uid),
from_kuid_munged(user_ns, cred->euid),
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 23705a53abba..ece70b64d04c 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -51,6 +51,7 @@ struct upid {
/* Try to keep pid_chain in the same cacheline as nr for find_vpid */
int nr;
struct pid_namespace *ns;
+   u64 highnr;
struct hlist_node pid_chain;
 };
 
@@ -170,6 +171,7 @@ static inline pid_t pid_nr(struct pid *pid)
 }
 
 pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns);
+const struct upid *pid_upid_ns(struct pid *pid, struct pid_namespace *ns);
 pid_t pid_vnr(struct pid *pid);
 
 #define do_each_pid_task(pid, type, task)  \
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 1997ffc295a7..1f9f0455d7ef 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -26,6 +26,7 @@ struct pid_namespace {
struct rcu_head rcu;
int last_pid;
unsigned int nr_hashed;
+   atomic64_t next_highpid;
struct task_struct *child_reaper;
struct kmem_cache *pid_cachep;
unsigned int level;
diff --git a/kernel/pid.c b/kernel/pid.c
index 9b9a26698144..863d11a9bfbf 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -312,6 +312,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 
pid->numbers[i].nr = nr;
pid->numbers[i].ns = tmp;
+   pid->numbers[i].highnr =
+   atomic64_inc_return(&tmp->next_highpid) - 1;
tmp = tmp->parent;
}
 
@@ -492,17 +494,26 @@ struct pid 

Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-03 Thread Andy Lutomirski
On Mon, Nov 3, 2014 at 12:21 PM, Jiri Kosina  wrote:
> On Mon, 3 Nov 2014, David Herrmann wrote:
>
>> > Agreed, mostly.  My only real concern is that this could be annoying
>> > for the userspace developers who will need to target Linux and HIDAPI
>> > separately.  Admittedly the Linux support will be trivial.
>>
>> I see. I'll not stop you from using hidraw, I'd just not recommend it.
>> Especially for security stuff I dislike exposing the whole HID device
>> writable. But yeah, I guess you got my point.
>
> Alright, I am basically thinking loudly now, but how about we allow HID
> drivers that would override default hidraw interface?
>
> I am very well aware of the fact that this could be opening a can of
> worms, so we'll have to make it very restrictive. How about, let's say, we
> allow HID drivers to provide custom hidraw interface (completely
> overriding the one that HID core would normally create) only for cases
> such as:
>
> - the intent is to expose only certain parts of a combined device
> - the intent is to introduce some level of access control
>
> I.e. still no interference of kernel with data parsing allowed.

Hmm.  Would this be like a filter on hidraw actions?

How would udev distinguish these special hidraw devices from normal
hidraw devices?

Also, for U2F, this could be a little awkward.  There's some crazy
fragmentation stuff to allow a U2F request to be split across HID
requests, and I think a kernel driver would much rather get the
original unfragmented application request.

--Andy

>
>> > I can give the driver a try.  It'll actually be simpler than the spec
>> > makes it out, since a real kernel driver will have no need to
>> > arbitrate with itself.
>>
>> I'll gladly review any custom HID drivers. Feel free to put me on CC
>> when sending to linux-input. There's also a lot of examples in
>> drivers/hid/ in case you need a head-start (especially if you need the
>> raw_event callback).
>
> Same here, of course.
>
> Please always CC me in parallel to sending to linux-input@ to make sure
> that the patch doesn't fall in between cracks.
>
> Thanks,
>
> --
> Jiri Kosina
> SUSE Labs



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-03 Thread Andy Lutomirski
On Mon, Nov 3, 2014 at 11:03 AM, David Herrmann  wrote:
> Hi
>
> On Sun, Nov 2, 2014 at 7:57 PM, Andy Lutomirski  wrote:
>> I want to get U2F (universal second factor, sometimes called "security
>> key" or even "gnubby") working on Linux.  U2F tokens are HID devices
>> that speak a custom protocol.  The intent is that user code will speak
>> to then using something like HIDAPI.
>>
>> The trick is that, for HIDAPI to work, something needs to recognize
>> these devices and get udev to set appropriate device permissions.
>
> [snip]
>
>>  - An actual kernel driver for U2F devices using the hid group
>> mechanism for enumeration.  This seems overcomplicated.
>
> Imho, this is the way to go. Create a proper char-dev for U2F, create
> an API and make it work.
>
> We had this discussion earlier about vendor-extensions that should be
> writable via hidraw from user-space. This turned out to be really
> messy.. and was discussed for several weeks straight. hidraw just
> wasn't designed as unprivileged user-space API. For instance, what
> happens if a device provides U2F plus something else? Both will be on
> the same hidraw device.
> We could split hidraw per usage, but I don't see how that is superior
> to a proper U2F API. And once one usage can affect a device as a whole
> (like power-off), you're screwed.

Agreed, mostly.  My only real concern is that this could be annoying
for the userspace developers who will need to target Linux and HIDAPI
separately.  Admittedly the Linux support will be trivial.

I can give the driver a try.  It'll actually be simpler than the spec
makes it out, since a real kernel driver will have no need to
arbitrate with itself.

--Andy

>
> Just look at the libusb mess where some devices are handled in the
> kernel and some in user-space (eg., see Gnome cheese, media devices,
> ...). I don't think we should repeat that with HID.
>
> Thanks
> David



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] udev: Add hidraw_id and a rule file to invoke it

2014-11-03 Thread Andy Lutomirski
On Mon, Nov 3, 2014 at 5:32 AM, Tom Gundersen  wrote:
> Hi Andy,
>
> On Tue, Oct 28, 2014 at 11:46 PM, Andy Lutomirski  wrote:
>> So far, hidraw_id detects U2F tokens and sets:
>> ID_U2F_TOKEN=1
>> ID_SECURITY_TOKEN=1
>>
>> This causes the uaccess rules to apply to U2F devices.
>> ---
>>
>> I've never written any udev code before.  Feedback welcome.
>>
>> If you think this doesn't belong in udev, I can try to find it another home.
>
> As mentioned, I don't think we should do this in core udev code (udev
> rule would be ok), but making it generic and exposing all the usages
> sounds useful. I don't have a strong opinion on whether this should be
> in udev or in the kernel, but if you end up going with udev, here are
> some comments.

Thanks.  I'll wait to see what Jiri thinks before sending an updated version.

>
> Also, we don't want to add any more external helpers to the udev
> codebase, so we should just make this a builtin (analogous to
> udev-builtin-usb_id.c).
>
> Minor comments inline.
>
> [...]
>
>> diff --git a/src/udev/hidraw_id/hidraw_id.c b/src/udev/hidraw_id/hidraw_id.c
>> new file mode 100644
>> index ..e32f222f22f9
>> --- /dev/null
>> +++ b/src/udev/hidraw_id/hidraw_id.c
>> @@ -0,0 +1,144 @@
>> +/*
>> + * Copyright (c) Andrew Lutomirski, 2014
>> + *
>> + * This program is free software: you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>
> New udev code must be LGPL2+.

OK.

[snipped things I'll change if I resubmit]

>
>> +/* Long item; skip it. */
>> +if (i + 1 >= desclen) {
>> +log_error("bad report_descriptor");
>> +goto out;
>> +}
>> +i += (desc[i+1] + 3);  /* Can't overflow. */
>> +continue;
>> +}
>> +
>> +size = (sizecode < 3 ? sizecode : 4);
>> +
>> +if (i + 1 + size > desclen) {
>> +log_error("bad report_descriptor");
>> +goto out;
>> +}
>> +
>> +for (j = 0; j < size; j++)
>> +value |= (desc[i + 1 + j] << 8*j);
>> +
>> +if (type == 1 && tag == 0)
>> +usage_page = value;
>
> Should we verify that the data has the right length (2 or 4 bytes)? I
> guess we also need to handle 2-byte usage pages specially?

I think we're not supposed to check the length in general, but I
should re-read the part about two-byte usage pages.  (Although, off
the top of my head, I thought that the special case was for very long
usages, not usage pages.)

[...]

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
On Sun, Nov 2, 2014 at 4:40 PM, Benjamin Tissoires
 wrote:
> On Sun, Nov 2, 2014 at 6:34 PM, Andy Lutomirski  wrote:
>> On Sun, Nov 2, 2014 at 3:01 PM, Benjamin Tissoires
>>  wrote:
>>> On Sun, Nov 2, 2014 at 5:49 PM, Andy Lutomirski  wrote:
>>>> On Sun, Nov 2, 2014 at 2:45 PM, Benjamin Tissoires
>>>>  wrote:
>>>>> On Sun, Nov 2, 2014 at 4:40 PM, Jiri Kosina  wrote:
>>>>>> On Sun, 2 Nov 2014, Jiri Kosina wrote:
>>>>>>
>>>>>>> Alternatively, you can just write udev rule which triggers on HID 
>>>>>>> devices,
>>>>>>> issues HIDIOCGRDESC ioctl() on the just-created hidraw node, and decides
>>>>>>> afterwards whether node permissions need to be altered ... right?
>>>>>>
>>>>>> Just to make myself clear here -- this is basically alternative to your
>>>>>> 1st option, but for cases where you want to make yourself independent on
>>>>>> sysfs existence (I don't what is the craziness level of setups you are
>>>>>> considering).
>>>>>>
>>>>>
>>>>> I am a little bit concerned with the user space retrieving the
>>>>> descriptor, parsing it and then deciding how to set the permissions.
>>>>> It's not that it is super complex, but it will duplicate the parsing
>>>>> we already do in the kernel. Wacom tried to do that in their usb
>>>>> driver, and they never managed to fully implement one.
>>>>>
>>>>> I think the HID_GROUP solution is super trivial:
>>>>> - add a match on the U2F input report and set the group to
>>>>> HID_GROUP_U2F (2 lines in hid_scan_input_usage() in hid-core.c)
>>>>> - allow hid-generic to also match HID_GROUP_U2F (1 line in hid-generic.c).
>>>>>
>>>>> Then, the udev rule would be super trivial.
>>>>
>>>> I'm confused.  What would the user-visible effect of this be?  Would
>>>> the hidraw node still show up?  What is user code supposed to match to
>>>> detect a U2F device or to otherwise set permissions?
>>>>
>>>
>>> The only change with what we currently have is that the modalias of
>>> the device will be something like
>>> MODALIAS=hid:b0003gHID_GROUP_U2Fvp. (replace
>>> HID_GROUP_U2F with the proper hex value). So matching against this
>>> modalias is trivial, and you can just put the hidraw node rw for
>>> users.
>>
>> Do we really want udev matching against MODALIAS, and do we really
>> want udev rules to hardcode hid group constants?
>
> That's a good question. I don't think the hid group will change in the
> future and we can guarantee that in the kernel I think.
>
>>
>> I'd like this idea better if we added a HID_GROUP uevent property with
>> a textual description, or perhaps something a little more specific.
>
> This refers back to Jiri's first remark. Adding such a thing is
> doable, but do we really want/need it :/

I would tend to think that designing a HID interface for userspace
consumption might be a hint that it's the right time to give it a
better name than "MODALIAS" :)

--Andy

>
>> The hid group field seems to be used for different types of things.
>
> yes, my proposal is definitively a (ugly) hack around the groups which
> are used to select which hid subdriver we use.
>
>
> An other question which comes to my mind is don't we want logind to
> assign the hidraw node to the proper client? We may still need to tag
> the device somehow, but if logind prevents untrusted application to
> run arbitrary code on the u2f token, that might be a little bit safer
> than allowing anybody to read/write.

We do want it to be assigned to the proper client.  My udev patch
(which will almost certainly not be accepted as-is, but it could be
fixed up) uses the report_descriptor to set ID_SECURITY_TOKEN=1, which
in turn causes an existing rule to set TAG+="uaccess", which causes
logind to do its thing.

>
> Sorry to raise more questions than providing answers.

No problem.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
On Sun, Nov 2, 2014 at 3:01 PM, Benjamin Tissoires
 wrote:
> On Sun, Nov 2, 2014 at 5:49 PM, Andy Lutomirski  wrote:
>> On Sun, Nov 2, 2014 at 2:45 PM, Benjamin Tissoires
>>  wrote:
>>> On Sun, Nov 2, 2014 at 4:40 PM, Jiri Kosina  wrote:
>>>> On Sun, 2 Nov 2014, Jiri Kosina wrote:
>>>>
>>>>> Alternatively, you can just write udev rule which triggers on HID devices,
>>>>> issues HIDIOCGRDESC ioctl() on the just-created hidraw node, and decides
>>>>> afterwards whether node permissions need to be altered ... right?
>>>>
>>>> Just to make myself clear here -- this is basically alternative to your
>>>> 1st option, but for cases where you want to make yourself independent on
>>>> sysfs existence (I don't what is the craziness level of setups you are
>>>> considering).
>>>>
>>>
>>> I am a little bit concerned with the user space retrieving the
>>> descriptor, parsing it and then deciding how to set the permissions.
>>> It's not that it is super complex, but it will duplicate the parsing
>>> we already do in the kernel. Wacom tried to do that in their usb
>>> driver, and they never managed to fully implement one.
>>>
>>> I think the HID_GROUP solution is super trivial:
>>> - add a match on the U2F input report and set the group to
>>> HID_GROUP_U2F (2 lines in hid_scan_input_usage() in hid-core.c)
>>> - allow hid-generic to also match HID_GROUP_U2F (1 line in hid-generic.c).
>>>
>>> Then, the udev rule would be super trivial.
>>
>> I'm confused.  What would the user-visible effect of this be?  Would
>> the hidraw node still show up?  What is user code supposed to match to
>> detect a U2F device or to otherwise set permissions?
>>
>
> The only change with what we currently have is that the modalias of
> the device will be something like
> MODALIAS=hid:b0003gHID_GROUP_U2Fvp. (replace
> HID_GROUP_U2F with the proper hex value). So matching against this
> modalias is trivial, and you can just put the hidraw node rw for
> users.

Do we really want udev matching against MODALIAS, and do we really
want udev rules to hardcode hid group constants?

I'd like this idea better if we added a HID_GROUP uevent property with
a textual description, or perhaps something a little more specific.
The hid group field seems to be used for different types of things.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
On Sun, Nov 2, 2014 at 2:45 PM, Benjamin Tissoires
 wrote:
> On Sun, Nov 2, 2014 at 4:40 PM, Jiri Kosina  wrote:
>> On Sun, 2 Nov 2014, Jiri Kosina wrote:
>>
>>> Alternatively, you can just write udev rule which triggers on HID devices,
>>> issues HIDIOCGRDESC ioctl() on the just-created hidraw node, and decides
>>> afterwards whether node permissions need to be altered ... right?
>>
>> Just to make myself clear here -- this is basically alternative to your
>> 1st option, but for cases where you want to make yourself independent on
>> sysfs existence (I don't what is the craziness level of setups you are
>> considering).
>>
>
> I am a little bit concerned with the user space retrieving the
> descriptor, parsing it and then deciding how to set the permissions.
> It's not that it is super complex, but it will duplicate the parsing
> we already do in the kernel. Wacom tried to do that in their usb
> driver, and they never managed to fully implement one.
>
> I think the HID_GROUP solution is super trivial:
> - add a match on the U2F input report and set the group to
> HID_GROUP_U2F (2 lines in hid_scan_input_usage() in hid-core.c)
> - allow hid-generic to also match HID_GROUP_U2F (1 line in hid-generic.c).
>
> Then, the udev rule would be super trivial.

I'm confused.  What would the user-visible effect of this be?  Would
the hidraw node still show up?  What is user code supposed to match to
detect a U2F device or to otherwise set permissions?

--Andy

>
> Also, if we want to further extend the kernel API for U2F, the group
> will already be in place.
>
> Cheers,
> Benjamin



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
On Sun, Nov 2, 2014 at 12:47 PM, Tom Gundersen  wrote:
> Hi Andy,
>
> On Sun, Nov 2, 2014 at 7:57 PM, Andy Lutomirski  wrote:
>> I want to get U2F (universal second factor, sometimes called "security
>> key" or even "gnubby") working on Linux.  U2F tokens are HID devices
>> that speak a custom protocol.  The intent is that user code will speak
>> to then using something like HIDAPI.
>>
>> The trick is that, for HIDAPI to work, something needs to recognize
>> these devices and get udev to set appropriate device permissions.
>>
>> My question is: how should this be done?  The official way to
>> enumerate U2F devices is to look for a HID usage page 0xf1d0
>> containing usage 0x1.
>>
>> Options include:
>>
>>  - A builtin udev helper that reads the sysfs report_descriptor for
>> hid or hidraw devices and sets attributes accordingly (either
>> ID_SECURITY_TOKEN or something more general).
>
> I don't think we should have such special-purpose logic in the udev core.
>
> [...]
>
>> - HID core code in the kernel to add
>> HID_USAGES=f1d1:lots:of:other:things to the uevent (or udev code
>> to do the same).  This might end up producing a rather long string or
>> some devices.
>
> This makes the most sense to me. We could put this logic (adapting the
> patch you posted) in src/udev/udev-builtin-usb_id.c.

How would that work?  Can't a USB device have more than one HID class
device under it?  I could imagine a future U2F keyboard that has a HID
class device for the keyboard part and another one for U2F.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
On Sun, Nov 2, 2014 at 12:42 PM, Jiri Kosina  wrote:
> On Sun, 2 Nov 2014, Andy Lutomirski wrote:
>
>> I want to get U2F (universal second factor, sometimes called "security
>> key" or even "gnubby") working on Linux.  U2F tokens are HID devices
>> that speak a custom protocol.  The intent is that user code will speak
>> to then using something like HIDAPI.
>>
>> The trick is that, for HIDAPI to work, something needs to recognize
>> these devices and get udev to set appropriate device permissions.
>
> Just to make sure we are on the same page -- this is really only about
> setting proper device node permissions and nothing else, right?

Yes.

>
>> My question is: how should this be done?  The official way to
>> enumerate U2F devices is to look for a HID usage page 0xf1d0
>> containing usage 0x1.
>>
>> Options include:
>>
>>  - A builtin udev helper that reads the sysfs report_descriptor for
>> hid or hidraw devices and sets attributes accordingly (either
>> ID_SECURITY_TOKEN or something more general).
>
> Hmmm ... please keep in mind that report_descriptor is actually in
> debugfs, so it's a bit questionable whether you can rely on it being
> present on well-defined location on all systems.
>

Huh?  I have 
/sys/devices/pci:00/:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:1050:0120.0013/report_descriptor
on my system.

> You can get it from different other places though, such as libusb (but
> then you are limited only to USB HID devices ... which might be enough
> in your particular case).

I have no idea whether there will ever be a bluetooth U2F device.

>
>> - A udev helper that does this and doesn't live in the systemd tree.
>> I don't love this option -- I'd prefer for this to be as plug-and-play
>> as possible.
>
> Agreed.
>
>> - HID core code in the kernel to add
>> HID_USAGES=f1d1:lots:of:other:things to the uevent (or udev code
>> to do the same).  This might end up producing a rather long string or
>> some devices.
>
> We have been thinking about this quite a lot in the past, and decided to
> postpone this until there is a very good reason for it to happen.

I'm not sure that U2F counts as a very good reason.

>
>>  - An actual kernel driver for U2F devices using the hid group
>> mechanism for enumeration.  This seems overcomplicated.
>
> Hmmm ... why do you think so? I believe it actually should be really
> super-trivial.

The HID group part is trivial, but what interface would the driver
expose?  I don't think that a udev rule for hidraw matching the
modalias makes any sense, and, if it were a real driver, what
interface would it expose?  Most cross-platform userspace code for U2F
is likely to use HIDAPI.

Admittedly, a U2F driver in the kernel would be slightly nicer from a
multiplexing standpoint than hidraw.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
I want to get U2F (universal second factor, sometimes called "security
key" or even "gnubby") working on Linux.  U2F tokens are HID devices
that speak a custom protocol.  The intent is that user code will speak
to then using something like HIDAPI.

The trick is that, for HIDAPI to work, something needs to recognize
these devices and get udev to set appropriate device permissions.

My question is: how should this be done?  The official way to
enumerate U2F devices is to look for a HID usage page 0xf1d0
containing usage 0x1.

Options include:

 - A builtin udev helper that reads the sysfs report_descriptor for
hid or hidraw devices and sets attributes accordingly (either
ID_SECURITY_TOKEN or something more general).

- A udev helper that does this and doesn't live in the systemd tree.
I don't love this option -- I'd prefer for this to be as plug-and-play
as possible.

- HID core code in the kernel to add
HID_USAGES=f1d1:lots:of:other:things to the uevent (or udev code
to do the same).  This might end up producing a rather long string or
some devices.

 - An actual kernel driver for U2F devices using the hid group
mechanism for enumeration.  This seems overcomplicated.

Concretely, my U2F device is:

/devices/pci:00/:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:1050:0120.0011/hidraw/hidraw0

The 0xf1d0 thing is enumerable on
/devices/pci:00/:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:1050:0120.0011.

Thoughts?

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] udev: Add hidraw_id and a rule file to invoke it

2014-10-29 Thread Andy Lutomirski
On Tue, Oct 28, 2014 at 3:46 PM, Andy Lutomirski  wrote:
> So far, hidraw_id detects U2F tokens and sets:
> ID_U2F_TOKEN=1
> ID_SECURITY_TOKEN=1
>
> This causes the uaccess rules to apply to U2F devices.

This works for the Plug-up security key, too.

--Andy

> ---
>
> I've never written any udev code before.  Feedback welcome.
>
> If you think this doesn't belong in udev, I can try to find it another home.
>
>  .gitignore |   1 +
>  Makefile.am|  11 
>  rules/60-hidraw.rules  |   7 ++
>  src/udev/hidraw_id/Makefile|   1 +
>  src/udev/hidraw_id/hidraw_id.c | 144 
> +
>  5 files changed, 164 insertions(+)
>  create mode 100644 rules/60-hidraw.rules
>  create mode 12 src/udev/hidraw_id/Makefile
>  create mode 100644 src/udev/hidraw_id/hidraw_id.c
>
> diff --git a/.gitignore b/.gitignore
> index f119b574c777..4bd3cdf08f0d 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -34,6 +34,7 @@
>  /exported
>  /exported-*
>  /gtk-doc.make
> +/hidraw_id
>  /hostnamectl
>  /install-tree
>  /journalctl
> diff --git a/Makefile.am b/Makefile.am
> index fae946a388af..9f64687d32b1 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -3542,6 +3542,17 @@ udevlibexec_PROGRAMS += \
> ata_id
>
>  # 
> --
> +hidraw_id_SOURCES = \
> +   src/udev/hidraw_id/hidraw_id.c
> +
> +hidraw_id_LDADD = \
> +   libudev-internal.la \
> +   libsystemd-shared.la
> +
> +udevlibexec_PROGRAMS += \
> +   hidraw_id
> +
> +# 
> --
>  cdrom_id_SOURCES = \
> src/udev/cdrom_id/cdrom_id.c
>
> diff --git a/rules/60-hidraw.rules b/rules/60-hidraw.rules
> new file mode 100644
> index ..1ee9c812f711
> --- /dev/null
> +++ b/rules/60-hidraw.rules
> @@ -0,0 +1,7 @@
> +# do not edit this file, it will be overwritten on update
> +
> +ACTION=="remove", GOTO="hidraw_end"
> +
> +SUBSYSTEM=="hidraw", IMPORT{program}="hidraw_id --udev"
> +
> +LABEL="keyboard_end"
> diff --git a/src/udev/hidraw_id/Makefile b/src/udev/hidraw_id/Makefile
> new file mode 12
> index ..d0b0e8e0086f
> --- /dev/null
> +++ b/src/udev/hidraw_id/Makefile
> @@ -0,0 +1 @@
> +../Makefile
> \ No newline at end of file
> diff --git a/src/udev/hidraw_id/hidraw_id.c b/src/udev/hidraw_id/hidraw_id.c
> new file mode 100644
> index ..e32f222f22f9
> --- /dev/null
> +++ b/src/udev/hidraw_id/hidraw_id.c
> @@ -0,0 +1,144 @@
> +/*
> + * Copyright (c) Andrew Lutomirski, 2014
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "libudev.h"
> +#include "libudev-private.h"
> +
> +_printf_(6,0)
> +static void log_fn(struct udev *udev, int priority,
> +   const char *file, int line, const char *fn,
> +   const char *format, va_list args)
> +{
> +log_metav(priority, file, line, fn, format, args);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +struct udev *udev;
> +struct udev_device *dev, *hiddev;
> +char path[4096];
> +unsigned char desc[4096];
> +int desclen;
> +int fd = -1;
> +int i;
> +int ret = 1;
> +unsigned int usage_page = 0;
> +int is_u2f_token = 0;
> +
> +if (argc != 2) {
> +fprintf(stderr, "Usage: hidraw_id SYSFS_PATH|--udev\n");
> +return 1;
> +}
> +
> +log_parse_environment();
> +log_open();
> +
> +udev = udev_new();
> +
> +udev_set_log_fn(udev, log_fn);
> +
> +if (!strcmp(argv[1], "--udev"))
> +de

[systemd-devel] [PATCH] udev: Add hidraw_id and a rule file to invoke it

2014-10-28 Thread Andy Lutomirski
So far, hidraw_id detects U2F tokens and sets:
ID_U2F_TOKEN=1
ID_SECURITY_TOKEN=1

This causes the uaccess rules to apply to U2F devices.
---

I've never written any udev code before.  Feedback welcome.

If you think this doesn't belong in udev, I can try to find it another home.

 .gitignore |   1 +
 Makefile.am|  11 
 rules/60-hidraw.rules  |   7 ++
 src/udev/hidraw_id/Makefile|   1 +
 src/udev/hidraw_id/hidraw_id.c | 144 +
 5 files changed, 164 insertions(+)
 create mode 100644 rules/60-hidraw.rules
 create mode 12 src/udev/hidraw_id/Makefile
 create mode 100644 src/udev/hidraw_id/hidraw_id.c

diff --git a/.gitignore b/.gitignore
index f119b574c777..4bd3cdf08f0d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@
 /exported
 /exported-*
 /gtk-doc.make
+/hidraw_id
 /hostnamectl
 /install-tree
 /journalctl
diff --git a/Makefile.am b/Makefile.am
index fae946a388af..9f64687d32b1 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -3542,6 +3542,17 @@ udevlibexec_PROGRAMS += \
ata_id
 
 # 
--
+hidraw_id_SOURCES = \
+   src/udev/hidraw_id/hidraw_id.c
+
+hidraw_id_LDADD = \
+   libudev-internal.la \
+   libsystemd-shared.la
+
+udevlibexec_PROGRAMS += \
+   hidraw_id
+
+# 
--
 cdrom_id_SOURCES = \
src/udev/cdrom_id/cdrom_id.c
 
diff --git a/rules/60-hidraw.rules b/rules/60-hidraw.rules
new file mode 100644
index ..1ee9c812f711
--- /dev/null
+++ b/rules/60-hidraw.rules
@@ -0,0 +1,7 @@
+# do not edit this file, it will be overwritten on update
+
+ACTION=="remove", GOTO="hidraw_end"
+
+SUBSYSTEM=="hidraw", IMPORT{program}="hidraw_id --udev"
+
+LABEL="keyboard_end"
diff --git a/src/udev/hidraw_id/Makefile b/src/udev/hidraw_id/Makefile
new file mode 12
index ..d0b0e8e0086f
--- /dev/null
+++ b/src/udev/hidraw_id/Makefile
@@ -0,0 +1 @@
+../Makefile
\ No newline at end of file
diff --git a/src/udev/hidraw_id/hidraw_id.c b/src/udev/hidraw_id/hidraw_id.c
new file mode 100644
index ..e32f222f22f9
--- /dev/null
+++ b/src/udev/hidraw_id/hidraw_id.c
@@ -0,0 +1,144 @@
+/*
+ * Copyright (c) Andrew Lutomirski, 2014
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "libudev.h"
+#include "libudev-private.h"
+
+_printf_(6,0)
+static void log_fn(struct udev *udev, int priority,
+   const char *file, int line, const char *fn,
+   const char *format, va_list args)
+{
+log_metav(priority, file, line, fn, format, args);
+}
+
+int main(int argc, char **argv)
+{
+struct udev *udev;
+struct udev_device *dev, *hiddev;
+char path[4096];
+unsigned char desc[4096];
+int desclen;
+int fd = -1;
+int i;
+int ret = 1;
+unsigned int usage_page = 0;
+int is_u2f_token = 0;
+
+if (argc != 2) {
+fprintf(stderr, "Usage: hidraw_id SYSFS_PATH|--udev\n");
+return 1;
+}
+
+log_parse_environment();
+log_open();
+
+udev = udev_new();
+
+udev_set_log_fn(udev, log_fn);
+
+if (!strcmp(argv[1], "--udev"))
+dev = udev_device_new_from_environment(udev);
+else
+dev = udev_device_new_from_syspath(udev, argv[1]);
+
+if (!dev)
+goto out;
+
+hiddev = udev_device_get_parent(dev);
+if (!hiddev)
+goto out;
+
+if (snprintf(path, sizeof(path), "%s/report_descriptor",
+ udev_device_get_syspath(hiddev)) > (int)sizeof(path))
+return 1;
+
+fd = open(path, O_RDONLY | O_NOFOLLOW);
+if (fd == -1)
+goto out;
+
+desclen = read(fd, desc, sizeof(desc));
+if (desclen <= 0)
+goto out;
+
+/* Parse the report descriptor. */
+for (i = 0; i < desclen; ) {
+unsigned char tag = desc[i] >> 4;
+unsigned char type = (desc[i] >> 2) & 0x3;
+unsigned char sizecode = desc[i] & 0x3;
+int size, j;
+uns

Re: [systemd-devel] Writing a udev rule for U2F security tokens?

2014-10-28 Thread Andy Lutomirski
On Tue, Oct 28, 2014 at 1:40 AM, Greg KH  wrote:
> On Mon, Oct 27, 2014 at 04:37:14PM -0700, Andy Lutomirski wrote:
>> On Mon, Oct 27, 2014 at 4:32 PM, Greg KH  wrote:
>> > On Mon, Oct 27, 2014 at 04:12:30PM -0700, Andy Lutomirski wrote:
>> >> Hi-
>> >>
>> >> I'd like to write a generic udev rule for U2F security tokens and to
>> >> possibly get it integrated into systemd / udev, but I'm not sure how
>> >> to write it in the first place.
>> >>
>> >> U2F tokens are USB HID devices that have a usage page 0xF1D0 that
>> >> contains usage 0x01.  The rule should match any hidraw device with
>> >> that property.  Can this be done without a user helper?  Is there an
>> >> existing helper in which it would make sense to add such a check?
>> >>
>> >> Here's the draft USB forum allocation:
>> >>
>> >> http://www.usb.org/developers/hidpage/HUTRR48.pdf
>> >>
>> >> Here's the draft spec from the FIDO Alliance:
>> >>
>> >> https://fidoalliance.org/specs/fido-u2f-HID-protocol-v1.0-rd-20141008.pdf
>> >>
>> >> In practice, I expect little change between the draft and final specs,
>> >> since these devices are already for sale and Chromium supports them.
>> >
>> > I don't understand, what would a udev rule do with these devices?
>> > Shouldn't they be exported automatically using the hid "raw" interface
>> > so that userspace can talk to them?  What else needs to be done?
>>
>> Wow, I clearly failed to transfer my thoughts into email...
>>
>> I want to set ID_SECURITY_TOKEN=1 or, more generally, cause the
>> uaccess tag to be set so that users have permission to use the token.
>>
>> This rule works in Fedora for the existing tokens by Yubico:
>>
>> KERNEL=="hidraw*", SUBSYSTEM=="hidraw", ATTRS{idVendor}=="1050",
>> ATTRS{idProduct}=="0113|0114|0115|0116|0120",
>> ENV{ID_SECURITY_TOKEN}="1"
>>
>> but it won't work for other brands of U2F token.
>
> If there's no sysfs attribute that you can read directly to determine
> that it is a a U2F token, then it's not easy to write a udev rule.
>
> You can write a "simple" program to read the hid pages from the hidraw
> interface, and then set an environment variable from there if the "FIDO
> Alliance Page" is present.  You can use a udev rule for that, but it
> will have to be an external tool.

Would a tool like that be considered appropriate to distribute with
udev?  It would have somewhat unpleasant overhead for what is
currently a niche use case.

I suppose the kernel could also be modified to expose this, but doing
that cleanly will involve exposing all the usage pages in sysfs, which
is more complexity than I really want to add.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Writing a udev rule for U2F security tokens?

2014-10-27 Thread Andy Lutomirski
On Mon, Oct 27, 2014 at 4:32 PM, Greg KH  wrote:
> On Mon, Oct 27, 2014 at 04:12:30PM -0700, Andy Lutomirski wrote:
>> Hi-
>>
>> I'd like to write a generic udev rule for U2F security tokens and to
>> possibly get it integrated into systemd / udev, but I'm not sure how
>> to write it in the first place.
>>
>> U2F tokens are USB HID devices that have a usage page 0xF1D0 that
>> contains usage 0x01.  The rule should match any hidraw device with
>> that property.  Can this be done without a user helper?  Is there an
>> existing helper in which it would make sense to add such a check?
>>
>> Here's the draft USB forum allocation:
>>
>> http://www.usb.org/developers/hidpage/HUTRR48.pdf
>>
>> Here's the draft spec from the FIDO Alliance:
>>
>> https://fidoalliance.org/specs/fido-u2f-HID-protocol-v1.0-rd-20141008.pdf
>>
>> In practice, I expect little change between the draft and final specs,
>> since these devices are already for sale and Chromium supports them.
>
> I don't understand, what would a udev rule do with these devices?
> Shouldn't they be exported automatically using the hid "raw" interface
> so that userspace can talk to them?  What else needs to be done?

Wow, I clearly failed to transfer my thoughts into email...

I want to set ID_SECURITY_TOKEN=1 or, more generally, cause the
uaccess tag to be set so that users have permission to use the token.

This rule works in Fedora for the existing tokens by Yubico:

KERNEL=="hidraw*", SUBSYSTEM=="hidraw", ATTRS{idVendor}=="1050",
ATTRS{idProduct}=="0113|0114|0115|0116|0120",
ENV{ID_SECURITY_TOKEN}="1"

but it won't work for other brands of U2F token.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Writing a udev rule for U2F security tokens?

2014-10-27 Thread Andy Lutomirski
Hi-

I'd like to write a generic udev rule for U2F security tokens and to
possibly get it integrated into systemd / udev, but I'm not sure how
to write it in the first place.

U2F tokens are USB HID devices that have a usage page 0xF1D0 that
contains usage 0x01.  The rule should match any hidraw device with
that property.  Can this be done without a user helper?  Is there an
existing helper in which it would make sense to add such a check?

Here's the draft USB forum allocation:

http://www.usb.org/developers/hidpage/HUTRR48.pdf

Here's the draft spec from the FIDO Alliance:

https://fidoalliance.org/specs/fido-u2f-HID-protocol-v1.0-rd-20141008.pdf

In practice, I expect little change between the draft and final specs,
since these devices are already for sale and Chromium supports them.

Thanks,
Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] udev: fail firmware loading immediately if no search path is defined

2013-08-07 Thread Andy Lutomirski
On Wed, Aug 7, 2013 at 12:52 AM, Maarten Lankhorst
 wrote:
> Op 07-08-13 02:26, Andy Lutomirski schreef:
>> On Tue, Aug 6, 2013 at 5:24 PM, Tom Gundersen  wrote:
>>> On 6 Aug 2013 18:32, "Bryan Kadzban"  wrote:
>>>> On Tue, Aug 06, 2013 at 11:17:17AM +0200, Tom Gundersen wrote:
>>>>> On Tue, Aug 6, 2013 at 11:11 AM, Tom Gundersen  wrote:
>>>>>> On Tue, Aug 6, 2013 at 10:20 AM, Maarten Lankhorst
>>>>>>  wrote:
>>>>>>> Op 05-08-13 18:29, Andy Lutomirski schreef:
>>>>>>>> The systemd commit below can delay firmware loading by multiple
>>>>>>>> minutes if CONFIG_FW_LOADER_USER_HELPER=y.  Unfortunately no one
>>>>>>>> noticed that the systemd-udev change would break new kernels as well
>>>>>>>> as old kernels.
>>>>>>>>
>>>>>>>> Since the kernel apparently can't count on reasonable userspace
>>>>>>>> support, turn this thing off by default.
>>>>>>>>
>>>>>>>> commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a
>>>>>>>> Author: Tom Gundersen 
>>>>>>>> Date:   Mon Mar 18 15:12:18 2013 +0100
>>>>>>>>
>>>>>>>> udev: make firmware loading optional and disable by default
>>>>>>>>
>>>>>>>> Distros that whish to support old kernels should set
>>>>>>>>
>>>>>>>> --with-firmware-dirs="/usr/lib/firmware/updates:/usr/lib/firmware"
>>>>>>>> to retain the old behaviour.
>>>>>>>>
>>>>>>> methinks this patch should be reverted then,
>>>>>> Well, all the code is still there, so it can be enabled if anyone
>>>>>> wants it.
>>>>>>
>>>>>>> or a stub should be added to udev to always fail firmware loading so
>>>>>>> timeouts don't occur.
>>>>>> I think the only use (if any) of a userspace firmware loader would be
>>>>>> for anyone who wants a custom one (i.e., not udev), so we shouldn't
>>>>>> just fail the loading from udev unconditionally.
>>>>>>
>>>>>> How about we just improve the udev documentation a bit, similar to
>>>>>> Andy's kernel patch?
>>>>> Sorry, I should first have checked. We already document this in the
>>>>> README:
>>>>>
>>>>>>Userspace firmware loading is deprecated, will go away, and
>>>>>>sometimes causes problems:
>>>>>>  CONFIG_FW_LOADER_USER_HELPER=n
>>>> ...And this patch is making the kernel default to the correct behavior,
>>>> instead of the now-broken-by-udev behavior.
>>>>
>>>> I'm not sure I see the issue with it?  :-)
>>> Oh yeah this patch is totally the right thing to do, I was just arguing that
>>> there is nothing to be done on the udev side.
>>>
>>>> (Add me to the list of people that think udev is broken too, fwiw.  But
>>>> let's at least not leave *both* sides in a broken-by-default state.)
>>> Well I don't think it is too much to ask that the kernel and udev should be
>>> configured in a consistent way. Especially as thing still work even if you
>>> get it wrong, albeit with a delay.
>> Except that the current defaults are inconsistent and there is no
>> explanation anywhere in the logs when this is screwed up.
>>
> So what is wrong with my 'fail in udev immediately if not configured' idea? 
> In that case it
> doesn't matter whether CONFIG_FW_LOADER_USER_HELPER is set or not.
>
> You could even print a useful message for the user in udev to the log, so 
> they have an idea of what
> happened. Breaking udev on older still supported kernels by default without 
> printing any debug info
> is silly, and the only cost is a small increase in disk space when unused. I 
> did so in below patch.

Seems reasonable to me.  It might be worth adding something to the
message to suggest turning off CONFIG_FW_LOADER_USER_HELPER.

>
> ~Maarten
>
> I converted < ELEMENTSOF to != ELEMENTSOF in the loop to work correctly when 
> the array is empty.
> Most code in udev-builtin-firmware is eliminated at -O2 optimization level 
> when FIRMWARE_PATH
> is not explicitly set.
>
> 8<
> diff --git a/Makefile.am b/Makefile.am
> index b8b8

Re: [systemd-devel] [PATCH] Change CONFIG_FW_LOADER_USER_HELPER to default n and don't select it

2013-08-06 Thread Andy Lutomirski
On Tue, Aug 6, 2013 at 5:24 PM, Tom Gundersen  wrote:
>
> On 6 Aug 2013 18:32, "Bryan Kadzban"  wrote:
>>
>> On Tue, Aug 06, 2013 at 11:17:17AM +0200, Tom Gundersen wrote:
>> > On Tue, Aug 6, 2013 at 11:11 AM, Tom Gundersen  wrote:
>> > > On Tue, Aug 6, 2013 at 10:20 AM, Maarten Lankhorst
>> > >  wrote:
>> > >> Op 05-08-13 18:29, Andy Lutomirski schreef:
>> > >>> The systemd commit below can delay firmware loading by multiple
>> > >>> minutes if CONFIG_FW_LOADER_USER_HELPER=y.  Unfortunately no one
>> > >>> noticed that the systemd-udev change would break new kernels as well
>> > >>> as old kernels.
>> > >>>
>> > >>> Since the kernel apparently can't count on reasonable userspace
>> > >>> support, turn this thing off by default.
>> > >>>
>> > >>> commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a
>> > >>> Author: Tom Gundersen 
>> > >>> Date:   Mon Mar 18 15:12:18 2013 +0100
>> > >>>
>> > >>> udev: make firmware loading optional and disable by default
>> > >>>
>> > >>> Distros that whish to support old kernels should set
>> > >>>
>> > >>> --with-firmware-dirs="/usr/lib/firmware/updates:/usr/lib/firmware"
>> > >>> to retain the old behaviour.
>> > >>>
>> > >> methinks this patch should be reverted then,
>> > >
>> > > Well, all the code is still there, so it can be enabled if anyone
>> > > wants it.
>> > >
>> > >> or a stub should be added to udev to always fail firmware loading so
>> > >> timeouts don't occur.
>> > >
>> > > I think the only use (if any) of a userspace firmware loader would be
>> > > for anyone who wants a custom one (i.e., not udev), so we shouldn't
>> > > just fail the loading from udev unconditionally.
>> > >
>> > > How about we just improve the udev documentation a bit, similar to
>> > > Andy's kernel patch?
>> >
>> > Sorry, I should first have checked. We already document this in the
>> > README:
>> >
>> > >Userspace firmware loading is deprecated, will go away, and
>> > >sometimes causes problems:
>> > >  CONFIG_FW_LOADER_USER_HELPER=n
>>
>> ...And this patch is making the kernel default to the correct behavior,
>> instead of the now-broken-by-udev behavior.
>>
>> I'm not sure I see the issue with it?  :-)
>
> Oh yeah this patch is totally the right thing to do, I was just arguing that
> there is nothing to be done on the udev side.
>
>> (Add me to the list of people that think udev is broken too, fwiw.  But
>> let's at least not leave *both* sides in a broken-by-default state.)
>
> Well I don't think it is too much to ask that the kernel and udev should be
> configured in a consistent way. Especially as thing still work even if you
> get it wrong, albeit with a delay.

Except that the current defaults are inconsistent and there is no
explanation anywhere in the logs when this is screwed up.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] Change CONFIG_FW_LOADER_USER_HELPER to default n and don't select it

2013-08-06 Thread Andy Lutomirski
On Tue, Aug 6, 2013 at 2:17 AM, Tom Gundersen  wrote:
> On Tue, Aug 6, 2013 at 11:11 AM, Tom Gundersen  wrote:
>> On Tue, Aug 6, 2013 at 10:20 AM, Maarten Lankhorst
>>  wrote:
>>> Op 05-08-13 18:29, Andy Lutomirski schreef:
>>>> The systemd commit below can delay firmware loading by multiple
>>>> minutes if CONFIG_FW_LOADER_USER_HELPER=y.  Unfortunately no one
>>>> noticed that the systemd-udev change would break new kernels as well
>>>> as old kernels.
>>>>
>>>> Since the kernel apparently can't count on reasonable userspace
>>>> support, turn this thing off by default.
>>>>
>>>> commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a
>>>> Author: Tom Gundersen 
>>>> Date:   Mon Mar 18 15:12:18 2013 +0100
>>>>
>>>> udev: make firmware loading optional and disable by default
>>>>
>>>> Distros that whish to support old kernels should set
>>>>   --with-firmware-dirs="/usr/lib/firmware/updates:/usr/lib/firmware"
>>>> to retain the old behaviour.
>>>>
>>> methinks this patch should be reverted then,
>>
>> Well, all the code is still there, so it can be enabled if anyone wants it.
>>
>>> or a stub should be added to udev to always fail firmware loading so 
>>> timeouts don't occur.
>>
>> I think the only use (if any) of a userspace firmware loader would be
>> for anyone who wants a custom one (i.e., not udev), so we shouldn't
>> just fail the loading from udev unconditionally.
>>
>> How about we just improve the udev documentation a bit, similar to
>> Andy's kernel patch?
>
> Sorry, I should first have checked. We already document this in the README:
>
>>Userspace firmware loading is deprecated, will go away, and
>>sometimes causes problems:
>>  CONFIG_FW_LOADER_USER_HELPER=n

TBH, the udev README is the last thing I'm going to check to figure
out why I don't have wifi for a couple minutes after boot.  Also, the
message is missing the point.  It's not that it's deprecated and
sometimes causes problems -- it's that udev *changed behavior* and
breaks your system if you have CONFIG_FW_LOADER_USER_HELPER=y.

If udev logged something (for a couple of years), that would be a
different story.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] Change CONFIG_FW_LOADER_USER_HELPER to default n and don't select it

2013-08-05 Thread Andy Lutomirski
The systemd commit below can delay firmware loading by multiple
minutes if CONFIG_FW_LOADER_USER_HELPER=y.  Unfortunately no one
noticed that the systemd-udev change would break new kernels as well
as old kernels.

Since the kernel apparently can't count on reasonable userspace
support, turn this thing off by default.

commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a
Author: Tom Gundersen 
Date:   Mon Mar 18 15:12:18 2013 +0100

udev: make firmware loading optional and disable by default

Distros that whish to support old kernels should set
  --with-firmware-dirs="/usr/lib/firmware/updates:/usr/lib/firmware"
to retain the old behaviour.
---
 drivers/base/Kconfig | 15 +++
 drivers/firmware/Kconfig |  1 -
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 5daa259..de3903e 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -146,13 +146,20 @@ config EXTRA_FIRMWARE_DIR
 config FW_LOADER_USER_HELPER
bool "Fallback user-helper invocation for firmware loading"
depends on FW_LOADER
-   default y
+   default n
help
  This option enables / disables the invocation of user-helper
  (e.g. udev) for loading firmware files as a fallback after the
- direct file loading in kernel fails.  The user-mode helper is
- no longer required unless you have a special firmware file that
- resides in a non-standard path.
+ direct file loading in kernel fails.
+
+ Since March 2013, a default udev build does not understand
+ firmware loading requests.  These udev versions will not
+ even indicate failure; instead they cause long timeouts.
+ This can dramatically slow down the boot process.
+
+ Say Y only if you have special firmware-loading requirements
+ and if you have a non-standard helper that will handle these
+ requests.
 
 config DEBUG_DRIVER
bool "Driver Core verbose debug messages"
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 07478728..9387630 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -64,7 +64,6 @@ config DELL_RBU
tristate "BIOS update support for DELL systems via sysfs"
depends on X86
select FW_LOADER
-   select FW_LOADER_USER_HELPER
help
 Say m if you want to have the option of updating the BIOS for your
 DELL system. Note you need a Dell OpenManage or Dell Update package 
(DUP)
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)

2013-08-05 Thread Andy Lutomirski
On Mon, Aug 5, 2013 at 4:18 AM, Kay Sievers  wrote:
> On Fri, Aug 2, 2013 at 6:28 PM, Zbigniew Jędrzejewski-Szmek
>  wrote:
>> On Fri, Aug 02, 2013 at 09:04:44AM -0700, Andy Lutomirski wrote:
>>> CONFIG_FW_LOADER_USER_HELPER=y
>> Do you need this? Unsetting this should help.
>>
>> """This option enables / disables the invocation of user-helper
>> (e.g. udev) for loading firmware files as a fallback after the
>> direct file loading in kernel fails. The user-mode helper is
>> no longer required unless you have a special firmware file that
>> resides in a non-standard path."""
>
> On recent systems, if the kernel configures
> CONFIG_FW_LOADER_USER_HELPER=y and a firmware is not found by the
> kernel, the kernel will issue a request which is ignored by userspace
> and will block in that for 60 seconds.
>
> Udev is no longer in the game of firmware loading, not even as a
> fallback, it will just completely ignore all kernel firmware class
> events.
>
> The source code in udev to handle firmware requests is disabled by
> default, currently still kept around for old kernels without the
> in-kernel firmware loader, but it will be deleted in the near future.

Any chance you'd consider a less regression-inducing path to getting
rid of this feature?  For example, have udev warn and immediate fail
firmware loading requests for a few releases, then just warn, then
drop support?

Meanwhile, CONFIG_FW_LOADER_USER_HELPER is still default y (!), so
udev has introduced massive bootup delays into the default
configuration with no warning.  It might be nice to change it to
default n, get rid of everything that selects it, and possible even
rename it to something with LEGACY or OBSOLETE in the name so that
make oldconfig will prompt.

--Andy

>
> Any issues with firmware timeouts should be addressed in the kernel by
> disabling CONFIG_FW_LOADER_USER_HELPER or by removing the fallback
> code from the in-kernel loader.
>
> Kay



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)

2013-08-02 Thread Andy Lutomirski
On Fri, Aug 2, 2013 at 9:21 AM, Johannes Berg  wrote:
> On Fri, 2013-08-02 at 09:04 -0700, Andy Lutomirski wrote:
>
>> > It wasn't exactly fixed and it's really more of a userspace problem - we
>> > probably request firmware version 8, and then it takes 30 seconds to
>> > time out for each of 8,7,6,5, after which the next request for 4 is
>> > successful.
>>
>> Why's it requesting those firmwares?  They don't seem to exist on
>> intellinuxwireless.org.
>
> Well for one you've never even mentioned what device you have, and then
> also it's not requesting 8/7 only 6,5,4 -- I guess the timeout was
> increased to 60 seconds (or I'm remembering wrong and it always was? I
> thought it was 30s)

I have an "Ultimate-N 6300 (rev 35)".  It's requesting at least
versions 6 and 5 (I saw them in udevadm monitor).  It looks like the
g2a and g2b variants have -5 and -6 versions, but 6000-4 appears to be
the only relevant version for my hardware.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)

2013-08-02 Thread Andy Lutomirski
[cc: linux-kernel, linux-hotplug, and systemd-devel.  This is 3.11-rc3+]

On Fri, Aug 2, 2013 at 12:38 AM, Johannes Berg
 wrote:
> On Thu, 2013-08-01 at 21:38 -0700, Andy Lutomirski wrote:
>> At boot, I get:
>> [   12.537108] iwlwifi :03:00.0: irq 51 for MSI/MSI-X
>> ...
>> [  132.676781] iwlwifi :03:00.0: loaded firmware version 9.221.4.1
>> build 25532 op_mode iwldvm
>>
>> This sounds familiar, but wasn't it fixed awhile ago?
>
> It wasn't exactly fixed and it's really more of a userspace problem - we
> probably request firmware version 8, and then it takes 30 seconds to
> time out for each of 8,7,6,5, after which the next request for 4 is
> successful.

Why's it requesting those firmwares?  They don't seem to exist on
intellinuxwireless.org.

I have:

CONFIG_FW_LOADER=y
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y


>
> I don't know why your userspace isn't behaving differently though.
>
> johannes
>



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-25 Thread Andy Lutomirski
On Jun 25, 2013 2:43 AM, "Lennart Poettering" 
wrote:
>
> On Mon, 24.06.13 17:09, Andy Lutomirski (l...@amacapital.net) wrote:
>
> >
> > On Mon, Jun 24, 2013 at 4:57 PM, Lennart Poettering
> >  wrote:
> > > On Mon, 24.06.13 16:01, Andy Lutomirski (l...@amacapital.net) wrote:
> > >
> > >> AFAICT the main reason that systemd uses cgroup is to efficiently
> > >> track which service various processes came from and to send signals,
> > >> and it seems like that use case could be handled without cgroups at
> > >> all by creative use of subreapers and a syscall to broadcast a signal
> > >> to everything that has a given subreaper as an ancestor.  In that
> > >> case, systemd could be asked to stay away from cgroups even in the
> > >> single-hierarchy case.
> > >
> > > systemd uses cgroups to manage services. Managing services means many
> > > things. Among them: keeping track of processes, listing processes of a
> > > service, killing processes of a service, doing per-service logging
> > > (which means reliably, immediately, and race-freely tracing back
> > > messages to the service which logged them), about 55 other things, and
> > > also resource management.
> > >
> > > I don't see how I can do anything of this without something like
> > > cgroups, i.e. hierarchial, resource management involved systemd which
> > > allows me to securely put labels on processes.
> >
> > Boneheaded straw-man proposal: two new syscalls and a few spare
processes.
> >
> > int sys_task_reaper(int tid): Returns the reaper for the task tid
> > (which is 1 if there's no subreaper).  (This could just as easily be a
> > file in /proc.)
> >
> > int sys_killall_under_subreaper(int subreaper, int sig): Broadcasts
> > sig to all tasks under subreaper (excluding subreaper).  Guarantees
> > that, even if those tasks are forking, they all get the signal.
> >
> > Then, when starting a service, systemd forks, sets the child to be a
> > subreaper, then forks that child again to exec the service.
> >
> > Does this do everything that's needed?
>
> No. It doesn't do anything that's needed. How do I list all PIDs in a
> service with this?

Walk /proc//children recursively.  A kernel patch to make that
field show up unconditionally instead of hiding under EXPERT would help.

> How do I determine the service of a PID?

Call sys_task_reaper, then look up what service that subreaper comes from.

> How do i do
> resource manage with this?

With cgroups, unless the admin has configured systemd not to use cgroups,
in which case you don't.  (The whole point would be to keep
DefaultControllers= without using the one and only cgroup hierarchy.)

--Andy

>
> > sys_task_reaper is trivial to
> > implement (that functionality is already there in the reparenting
> > code), and sys_killall_under_subreaper is probably not so bad.
> >
> > This has one main downside I can think of: it wastes a decent number
> > of processes (one subreaper per service).
>
> Yeah, also the downside that it doesn't do what we need.
>
> Lennart
>
> --
> Lennart Poettering - Red Hat, Inc.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 4:57 PM, Lennart Poettering
 wrote:
> On Mon, 24.06.13 16:01, Andy Lutomirski (l...@amacapital.net) wrote:
>
>> AFAICT the main reason that systemd uses cgroup is to efficiently
>> track which service various processes came from and to send signals,
>> and it seems like that use case could be handled without cgroups at
>> all by creative use of subreapers and a syscall to broadcast a signal
>> to everything that has a given subreaper as an ancestor.  In that
>> case, systemd could be asked to stay away from cgroups even in the
>> single-hierarchy case.
>
> systemd uses cgroups to manage services. Managing services means many
> things. Among them: keeping track of processes, listing processes of a
> service, killing processes of a service, doing per-service logging
> (which means reliably, immediately, and race-freely tracing back
> messages to the service which logged them), about 55 other things, and
> also resource management.
>
> I don't see how I can do anything of this without something like
> cgroups, i.e. hierarchial, resource management involved systemd which
> allows me to securely put labels on processes.

Boneheaded straw-man proposal: two new syscalls and a few spare processes.

int sys_task_reaper(int tid): Returns the reaper for the task tid
(which is 1 if there's no subreaper).  (This could just as easily be a
file in /proc.)

int sys_killall_under_subreaper(int subreaper, int sig): Broadcasts
sig to all tasks under subreaper (excluding subreaper).  Guarantees
that, even if those tasks are forking, they all get the signal.

Then, when starting a service, systemd forks, sets the child to be a
subreaper, then forks that child again to exec the service.

Does this do everything that's needed?  sys_task_reaper is trivial to
implement (that functionality is already there in the reparenting
code), and sys_killall_under_subreaper is probably not so bad.


This has one main downside I can think of: it wastes a decent number
of processes (one subreaper per service).

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 4:40 PM, Tejun Heo  wrote:
> Hello,
>
> On Mon, Jun 24, 2013 at 4:38 PM, Andy Lutomirski  wrote:
>> Now I'm confused.  I thought that support for multiple hierarchies was
>> going away.  Is it here to stay after all?
>
> It is going to be deprecated but also stay around for quite a while.
> That said, I didn' t mean to use multiple hierarchies. I was saying
> that if you build a sub-hierarchy in the unified hierarchy, you're
> likely to get away with it in most cases.

Isn't that exactly what I was originally asking for?  Quoting from
earlier in the thread:

On Mon, Jun 24, 2013 at 6:27 AM, Lennart Poettering
 wrote:
> On Sat, 22.06.13 15:19, Andy Lutomirski (l...@amacapital.net) wrote:
>>
>> 2. I manage services and tasks outside systemd (for one thing, I
>> currently use Ubuntu, but even if I were on Fedora, I have a bunch
>> of fine-grained things that figure out how they're supposed to
>> allocate resources, and porting them to systemd just to keep working
>> in the new world order would be a PITA [1]).
>>

[...]

>
>> I think that what I want are something like sub-unit cgroups -- I
>> want to be able to ask systemd to further subdivide the group for my
>> unit, login session, or whatever.  Would this be reasonable?
>> (Another way of thinking of this is that a unit would have a whole
>> cgroup hierarchy instead of just one cgroup.)
>
> The idea is not even to allow this. Basically, if you want to partitions
> your daemon into different cgroups you need to do that through systemd's
> abstractions: slices and services. To make this more palatable we'll
> introduce "throw-away" units though, so that you can dynamically run
> something as a workload and don't need to be concerned about naming
> this, or cleaning it up.
>

If I can subdivide my service in the hierarchy, then I'm happy.  If
this gets lost *and* systemd insists on controlling the one and only
cgroup hierarchy, then I think I have serious problems with the new
regime.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 4:37 PM, Tejun Heo  wrote:
> Hello, Andy.
>
> On Mon, Jun 24, 2013 at 04:27:17PM -0700, Andy Lutomirski wrote:
>> I guess what I'm trying to say here is that many systems will rather
>> fundamentally use systemd.  Admins of those systems should still have
>> access to a reasonably large subset of cgroup functionality.  If the
>> single-hierarchy model is going to prevent going around systemd and if
>> systemd isn't going to expose all of the useful cgroup functionality,
>> then perhaps there should be a way to separate systemd's hierarchy
>> from the cgroup hierarchy.
>
> I don't think systemd will prevent you from buildling your own
> hierarchy on the side.  It sure won't be properly supported and things
> might break in corener cases / over time but if you're willing to take
> such risks anyway...  In the long term tho, what should happen
> probably is examining use cases like yours and then incorporating
> sensible mechanisms to support that into the base system
> infrastructure.  It might not be completely identical but I'm sure
> over time we'll be able to find what are the fundamental pieces and
> proper abstractions.  Right now, we're exposing way too much without
> even clearly understanding what are being enabled.  It is
> unsustainable.

Now I'm confused.  I thought that support for multiple hierarchies was
going away.  Is it here to stay after all?

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 4:19 PM, Tejun Heo  wrote:
> Hello,
>
> On Mon, Jun 24, 2013 at 04:01:07PM -0700, Andy Lutomirski wrote:
>> So what is cgroup for?  That is, what's the goal for what the new API
>> should be able to do?
>
> It is a for controlling and distributing resources.  That part doesn't
> change.  It's just not built to be used directly by individual
> applications.  It's an admin tool just like sysctl - be that admin be
> a human or userland base system.
>
> There's a huge chasm between something which can be generally used by
> normal applications and something which is restricted to admins and
> base systems in terms of interface generality and stability, security,
> how the abstractions fit together with the existing APIs and so on.
> cgroup firmly belongs to the former.  It still serves the same purpose
> but isn't, in a way, developed enough to be used directly by
> individual applications and I'm not even sure we want or need to
> develop it to such a level.

My application is running on a single-purpose system I administer.

I guess what I'm trying to say here is that many systems will rather
fundamentally use systemd.  Admins of those systems should still have
access to a reasonably large subset of cgroup functionality.  If the
single-hierarchy model is going to prevent going around systemd and if
systemd isn't going to expose all of the useful cgroup functionality,
then perhaps there should be a way to separate systemd's hierarchy
from the cgroup hierarchy.

Looking at http://0pointer.de/blog/projects/cgroups-vs-cgroups.html,
it looks like systemd doesn't actually need the cgroup resource
control functionality.  Maybe there's a way to disentangle this stuff.
 The /proc//children feature that CRIU added seems like a decent
start.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 12:37 PM, Tejun Heo  wrote:
> Hello,
>
> On Mon, Jun 24, 2013 at 12:24:38PM -0700, Andy Lutomirski wrote:
>> Because more things are becoming per cpu without the option of moving
>> of per-cpu things on behalf of one cpu to another cpu.  RCU is a nice
>> exception.
>
> Hmm... but in most cases it's per-cpu on the same cpu that initiated
> the task.  If a given CPU is just crunching numbers and IRQ affinity
> is properly configured, the CPU shouldn't be bothered too much by
> per-cpu work items.  If there are, please let us know.  We can hunt
> them down.

I'm not just crunching numbers -- I do (nonblocking) I/O as well.

>
>> The functionality I care about is that a program can reliably and
>> hierarchically subdivide system resources -- think rlimits but
>> actually useful.  I, and probably many other things, want this
>> functionality.  Yes, the current cgroup interface is awful, but it
>> gets one thing right: it's a hierarchy.
>
> And the hierarchy support was completely broken for many resource
> controllers up until only several releases ago.
>
>> I would argue that designing a kernel interface that requires exactly
>> one userspace component to manage it and ties that one userspace
>> component to something that can't easily be deployed everywhere (the
>> init system) is as big a cheat as the old approach of sneaking bad
>> APIs in through a filesystem was.
>
> In terms of API, it is firmly at the level of sysctl.  That's it.
>
> While I agree that having a proper kernel API for hierarchical
> resource management could be nice.  That currently is out of scope.
> We're already knee-deep in shit with the limited capabilities we're
> trying to implement.  Also, I really don't think cgroup is the right
> interface for such thing even if we get to that.  It should be part of
> the usual process/thread model, not this completely separate thing on
> the side.
>
>> IOW, please, when designing this, please specify an API that programs
>> are permitted to use, and let that API be reviewed.
>
> cgroup is not that API and it's never gonna be in all likelihood.  As
> for systemd vs. non-systemd compatibility, I'm afraid I don't have a
> good answer.  This is still all in a pretty earlly phase and the
> proper abstractions and APIs are being figured out.  Hopefully, we'll
> converge on a mostly compatible high-level abstraction which can be
> presented regardless of the actual base system implementation.
>

So what is cgroup for?  That is, what's the goal for what the new API
should be able to do?

AFAICT the main reason that systemd uses cgroup is to efficiently
track which service various processes came from and to send signals,
and it seems like that use case could be handled without cgroups at
all by creative use of subreapers and a syscall to broadcast a signal
to everything that has a given subreaper as an ancestor.  In that
case, systemd could be asked to stay away from cgroups even in the
single-hierarchy case.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 12:10 PM, Tejun Heo  wrote:
> Hello, Andy.
>
> On Mon, Jun 24, 2013 at 11:49:05AM -0700, Andy Lutomirski wrote:
>> > I have an idea where it should be headed in the long term but am not
>> > sure about short-term solution.  Given that the only sort wide-spread
>> > use case is virt kthreads, maybe it just needs to be special cased for
>> > now.  Not sure.
>>
>> I'll be okay (I think) if I can reliably set affinities of these
>> threads.  I'm currently doing it with cgroups.
>>
>> That being said, I don't like the direction that kernel thread magic
>> affinity is going.  It may be great for cache performance and reducing
>> random bounding, but I have a scheduling-jitter-sensitive workload and
>> I don't care about overall system throughput.  I need the kernel to
>> stay the f!&k off my important cpus, and arranging for this to happen
>> is becoming increasingly complicated.
>
> Why is it becoming increasingly complicated?  The biggest change
> probably was the shared workqueue pool implementation but that was
> years ago and workqueue has grown pool attributes recently adding more
> properly designed flexibility and, for example, adding default
> affinity for !per-cpu workqueues should be pretty easy now.  But
> anyways, if it's an issue, it should be examined and properly solved
> rather than hacking up hacky solution with cgroup.

Because more things are becoming per cpu without the option of moving
of per-cpu things on behalf of one cpu to another cpu.  RCU is a nice
exception.

>
>> cgroups are most certainly something that a binary can be aware of.
>> It's not like a sysctl knob at all -- it's per process.  I have lots
>
> No, it definitely is not.  Sure it is more granular than sysctl but
> that's it.  It exposes control knobs which are directly tied into
> kernel implementation details.  It is not a properly designed
> programming API by any stretch of imagination.  It is an extreme
> failure on the kernel side that that part hasn't been made crystal
> clear from the beginning.  I don't know how intentional it was but the
> whole thing is completely botched.
>
> cgroup *never* was held to the standard necessary for any widely
> available API and many of the controls it exposes are exactly at the
> level of sysctls.  As the interface was filesystem, it could evade
> scrutiny and with the hierarchical organization also gave the
> impression that it's something which can be used directly by
> individual applications.  It found a loophole in the way we implement
> and police kernel APIs and then exploited it like there's no tomorrow.
>
> We are firmly bound to maintain what already has been exposed from the
> kernel side and I'm not gonna break any of them but the free-for-all
> cgroup is broken and deprecated.  It's gonna wither and fade away and
> any attempt to reverse that will be met with extreme prejudice.

The functionality I care about is that a program can reliably and
hierarchically subdivide system resources -- think rlimits but
actually useful.  I, and probably many other things, want this
functionality.  Yes, the current cgroup interface is awful, but it
gets one thing right: it's a hierarchy.

Back when my software ran on Windows, I used the awful "job" interface
to allocate resources among different parts of my software.  When I
switched to Linux, I lost some of that functionality and replaced
other bits with cgroups.  It's hackish, but it works.

Now we're apparently moving toward having a unified hierarchy
(great!), a more sane API (great!), and a nasty userspace situation
where systemd-using systems control the hierarchy through a highly
limiting systemd-specific interface and non-systemd systems do
something else which will presumably look nothing like what systemd
does.

I would argue that designing a kernel interface that requires exactly
one userspace component to manage it and ties that one userspace
component to something that can't easily be deployed everywhere (the
init system) is as big a cheat as the old approach of sneaking bad
APIs in through a filesystem was.

IOW, please, when designing this, please specify an API that programs
are permitted to use, and let that API be reviewed.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 11:38 AM, Tejun Heo  wrote:
> Hello,
>
> On Mon, Jun 24, 2013 at 03:27:15PM +0200, Lennart Poettering wrote:
>> On Sat, 22.06.13 15:19, Andy Lutomirski (l...@amacapital.net) wrote:
>>
>> > 1. I put all the entire world into a separate, highly constrained
>> > cgroup.  My real-time code runs outside that cgroup.  This seems to
>> > exactly what slices are for, but I need kernel threads to go in to
>> > the constrained cgroup.  Will systemd support this?
>>
>> I am not sure whether the ability to move kernel threads into cgroups
>> will stay around at all, from the kernel side. Tejun, can you comment on 
>> this?
>
> Any kernel threads with PF_NO_SETAFFINITY set already can't be removed
> from the root cgroup.  In general, I don't think moving kernel threads
> into !root cgroups is a good idea.  They're in most cases shared
> resources and userland doesn't really have much idea what they're
> actually doing, which is the fundmental issue.
>
> Which kthreads are running on the kernel side and what they're doing
> is strict implementation detail from the kernel side.  There's no
> effort from kernel side in keeping them stable and userland is likely
> to get things completely wrong - e.g. many kernel threads named after
> workqueues in any recent kernels don't actually do anything until the
> system is under heavy memory pressure.  Userland can't tell and has no
> control over what's being executed where at all and that's the way it
> should be.
>
> That said, there are cases where certain async executions are
> concretely bound to userland processes - say, (planned) aio updates,
> virt drivers and so on.  Right now, virt implements something pretty
> hacky but I think they'll have to be tied closer to the usual process
> mechanism - ie. they should be saying that these kthreads are serving
> this process and should be treated as such in terms of resource
> control rather than the current "move this kthread to this set of
> cgroups, don't ask why" thing.  Another not-well-thought-out aspect of
> the current cgroup.  :(
>
> I have an idea where it should be headed in the long term but am not
> sure about short-term solution.  Given that the only sort wide-spread
> use case is virt kthreads, maybe it just needs to be special cased for
> now.  Not sure.

I'll be okay (I think) if I can reliably set affinities of these
threads.  I'm currently doing it with cgroups.

That being said, I don't like the direction that kernel thread magic
affinity is going.  It may be great for cache performance and reducing
random bounding, but I have a scheduling-jitter-sensitive workload and
I don't care about overall system throughput.  I need the kernel to
stay the f!&k off my important cpus, and arranging for this to happen
is becoming increasingly complicated.

>
>> > 2. I manage services and tasks outside systemd (for one thing, I
>> > currently use Ubuntu, but even if I were on Fedora, I have a bunch
>> > of fine-grained things that figure out how they're supposed to
>> > allocate resources, and porting them to systemd just to keep working
>> > in the new world order would be a PITA [1]).
>> >
>> > (cgroups have the odd feature that they are per-task, not per thread
>> > group, and the systemd proposal seems likely to break anything that
>> > actually wants task granularity.  I may actually want to use this,
>> > even though it's a bit evil -- my real-time thread groups have
>> > non-real-time threads.)
>>
>> Here too, Tejun is pretty keen on removing the ability of splitting up
>> threads into cgroups from the kernel, and will only allow this
>> per-process. Tejun, please comment!
>
> Yes, again, the biggest issue is how much of low-level cgroup details
> become known to individual programs.  Splitting threads into different
> cgroup would in most cases mean that the binary itself would become
> aware of cgroup and it's akin to burying sysctl knob tunings into
> individual binaries.  cgroup is not an interface for each individual
> program to fiddle with.  If certain thread-granular control is
> absolutely necessary and justifiable, it's something to be added to
> the existing thread API, not something to be bolted on using cgroups.
>

cgroups are most certainly something that a binary can be aware of.
It's not like a sysctl knob at all -- it's per process.  I have lots
of binaries that have worked quite well for a couple years that move
themselves into different cgroups.  I have no problem with a unified
hierarchy, but I need control of my little piec

Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 6:27 AM, Lennart Poettering
 wrote:
> On Sat, 22.06.13 15:19, Andy Lutomirski (l...@amacapital.net) wrote:

>>
>> 2. I manage services and tasks outside systemd (for one thing, I
>> currently use Ubuntu, but even if I were on Fedora, I have a bunch
>> of fine-grained things that figure out how they're supposed to
>> allocate resources, and porting them to systemd just to keep working
>> in the new world order would be a PITA [1]).
>>

[...]

>
>> I think that what I want are something like sub-unit cgroups -- I
>> want to be able to ask systemd to further subdivide the group for my
>> unit, login session, or whatever.  Would this be reasonable?
>> (Another way of thinking of this is that a unit would have a whole
>> cgroup hierarchy instead of just one cgroup.)
>
> The idea is not even to allow this. Basically, if you want to partitions
> your daemon into different cgroups you need to do that through systemd's
> abstractions: slices and services. To make this more palatable we'll
> introduce "throw-away" units though, so that you can dynamically run
> something as a workload and don't need to be concerned about naming
> this, or cleaning it up.
>

Hmm.  My particular software can maybe live with this with unpleasant
modifications, but this will break anything that, say, accepts a
connection from a client, forks into a (possibly new) cgroup based on
the identity of that client, and then does something.

How can this support containers or the use of cgroups in a
non-systemwide systemd instance?  Containers may no longer be allowed
to escape from the cgroup they start in, but there should (IMO) still
be a way for things to subdivide their cgroup-controlled resources.

If I want to have a hierarchy more than two levels deep, I suspect I'm
SOL under this model.  If I'm understanding correctly, there will be
slices, then units, and that's it.

>
>> 4. As mentioned, I'm on Ubuntu some of the time.  I'd like to keep
>> the same code working on systemd and non-systemd systems.
>>
>> How hard would it be to run systemd as just a cgroup controller?
>> That is, have systemd create its slices, run exactly one unit that
>> represents the whole system, and let other things use the cgroup
>> API.
>
> I have no idea, I don't develop Ubuntu. They will have to come up with
> some cgroup maintenance daemon of their own. As I know them they'll
> either do a "port" of the systemd counter part (but that's going to be
> tough!), or they'll stick something half-baked into Upstart...
>
> Sorry, if this all sounds a bit disappointing. But yeah, this all is not
> a trivial change...
>

I'm worried that the impedance mismatch between systemd and any other
possible API is going to be enormous.  On systemd, I'll have to:

 - Create a throwaway unit
 - Figure out how to wire up stdout and stderr correctly (I use them
for communication between processes)
 - Translate the current directory, the environment, etc. into systemd
configuration
 - Translate my desired resource controls into systemd's "let's
pretend that there aren't really cgroups underlying it" configuration
 - Start the throwaway unit
 - Figure out how to get notified when it finishes

Without systemd, I'll have to:

 - fork()
 - Ask whatever is managing cgroups to switch me to a different cgroup
 - exec()

This is going to suck, I think.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski

On 06/21/2013 10:36 AM, Lennart Poettering wrote:


2) This hierarchy becomes private property of systemd. systemd will set
it up. Systemd will maintain it. Systemd will rearrange it. Other
software that wants to make use of cgroups can do so only through
systemd's APIs. This single-writer logic is absolutely necessary, since
interdependencies between the various controllers, the various
attributes, the various cgroups are non-obvious and we simply cannot
allow that cgroup users alter the tree independently of each other
forever. Due to all this: The "Pax Cgroup" document is a thing of the
past, it is dead.




If you are using non-trivial cgroup setups with systemd right now, then
things will change for you. We will provide you with similar
functionality as before, but things will be different and less
low-level. As long as you only used the high-level options such as
CPUShares, MemoryLimit and so on you should be on the safe side.



Hmm.  This may be tricky for my use case.  Here are a few issues.  For 
all I know, they may already be supported (or planned), but I don't want 
to get caught.


1. I put all the entire world into a separate, highly constrained 
cgroup.  My real-time code runs outside that cgroup.  This seems to 
exactly what slices are for, but I need kernel threads to go in to the 
constrained cgroup.  Will systemd support this?


2. I manage services and tasks outside systemd (for one thing, I 
currently use Ubuntu, but even if I were on Fedora, I have a bunch of 
fine-grained things that figure out how they're supposed to allocate 
resources, and porting them to systemd just to keep working in the new 
world order would be a PITA [1]).


(cgroups have the odd feature that they are per-task, not per thread 
group, and the systemd proposal seems likely to break anything that 
actually wants task granularity.  I may actually want to use this, even 
though it's a bit evil -- my real-time thread groups have non-real-time 
threads.)


I think that what I want are something like sub-unit cgroups -- I want 
to be able to ask systemd to further subdivide the group for my unit, 
login session, or whatever.  Would this be reasonable?  (Another way of 
thinking of this is that a unit would have a whole cgroup hierarchy 
instead of just one cgroup.)


I think that the single-hierarchy model will require that I subdivide my 
user session so that the default sub-unit cgroup is constrained 
similarly to the default slice.  I'll lose functionality, but I don't 
think this is a showstopper.


A different approach would be to allow units to (with systemd's 
cooperation) escape into their own, dynamically created unit.  This 
seems kind of awful.


3. My code runs unprivileged, but it still wants to configure itself. 
If needed, I can write a little privileged daemon to handle the systemd 
calls.


I think I can get away without anything fancy if a unit (login session?) 
grant the right to manipulate sub-unit cgroups to a non-root user.


4. As mentioned, I'm on Ubuntu some of the time.  I'd like to keep the 
same code working on systemd and non-systemd systems.


How hard would it be to run systemd as just a cgroup controller?  That 
is, have systemd create its slices, run exactly one unit that represents 
the whole system, and let other things use the cgroup API.



[1] Some day, I might convert my code to use a session systemd instance. 
 I'm not holding my breath, but it could be nice.


--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel