Re: cgroup: status-quo and userland efforts

2015-03-04 Thread David Lang

On Wed, 4 Mar 2015, Luke Kenneth Casson Leighton wrote:




and why he concludes that having a single hierarchy for all resource types.


correcting to add "is not always a good idea"



i think having a single hierarchy is fine *if* and only if it is
possible to overlay something similar to SE/Linux policy files -
enforced by the kernel *not* by userspace (sorry serge!) - such that
through those policy files any type of hierarchy be it single or multi
layer, recursive or in fact absolutely anything, may be emulated and
properly enforced.


The fundamental problem is that sometimes you have types of controls that are 
orthoginal to each other, and you either manage the two types of things in 
separate hierarchies, or you end up with one hierarchy that is a permutation of 
all the combinations of what would have been separate hierarchies.


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-04 Thread Luke Kenneth Casson Leighton
On Wed, Mar 4, 2015 at 5:08 AM, David Lang  wrote:
> On Tue, 3 Mar 2015, Luke Leighton wrote:

>> whilst the majority of people view management to be "hierarchical"
>> (so there is a top dog or God process and everything trickles down
>>  from that), this is viewed as such an anathema in the security
>> industry that someone came up with a formal specification for the
>> real-world way in which permissions are managed,

 sorry i should have said "managed in the security esp. defense industry"

>> and it's called the FLASK model.
>
>
> On this topic it's also worth reading Neil Brown's series of articles on
> this over at http://lwn.net/Articles/604609/

 oo good background, thank you david.  happily reading now :)

> and why he concludes that having a single hierarchy for all resource types.

 i think having a single hierarchy is fine *if* and only if it is
possible to overlay something similar to SE/Linux policy files -
enforced by the kernel *not* by userspace (sorry serge!) - such that
through those policy files any type of hierarchy be it single or multi
layer, recursive or in fact absolutely anything, may be emulated and
properly enforced.

l.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-04 Thread David Lang

On Wed, 4 Mar 2015, Luke Kenneth Casson Leighton wrote:




and why he concludes that having a single hierarchy for all resource types.


correcting to add is not always a good idea



i think having a single hierarchy is fine *if* and only if it is
possible to overlay something similar to SE/Linux policy files -
enforced by the kernel *not* by userspace (sorry serge!) - such that
through those policy files any type of hierarchy be it single or multi
layer, recursive or in fact absolutely anything, may be emulated and
properly enforced.


The fundamental problem is that sometimes you have types of controls that are 
orthoginal to each other, and you either manage the two types of things in 
separate hierarchies, or you end up with one hierarchy that is a permutation of 
all the combinations of what would have been separate hierarchies.


David Lang
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-04 Thread Luke Kenneth Casson Leighton
On Wed, Mar 4, 2015 at 5:08 AM, David Lang da...@lang.hm wrote:
 On Tue, 3 Mar 2015, Luke Leighton wrote:

 whilst the majority of people view management to be hierarchical
 (so there is a top dog or God process and everything trickles down
  from that), this is viewed as such an anathema in the security
 industry that someone came up with a formal specification for the
 real-world way in which permissions are managed,

 sorry i should have said managed in the security esp. defense industry

 and it's called the FLASK model.


 On this topic it's also worth reading Neil Brown's series of articles on
 this over at http://lwn.net/Articles/604609/

 oo good background, thank you david.  happily reading now :)

 and why he concludes that having a single hierarchy for all resource types.

 i think having a single hierarchy is fine *if* and only if it is
possible to overlay something similar to SE/Linux policy files -
enforced by the kernel *not* by userspace (sorry serge!) - such that
through those policy files any type of hierarchy be it single or multi
layer, recursive or in fact absolutely anything, may be emulated and
properly enforced.

l.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread David Lang

On Tue, 3 Mar 2015, Luke Leighton wrote:


I wrote about that many times, but here are two of the problems.

* There's no way to designate a cgroup to a resource, because cgroup
  is only defined by the combination of who's looking at it for which
  controller.  That's how you end up with tagging the same resource
  multiple times for different controllers and even then it's broken
  as when you move resources from one cgroup to another, you can't
  tell what to do with other tags.

  While allowing obscene level of flexibility, multiple hierarchies
  destroy a very fundamental concept that it *should* provide - that
  of a resource container.  It can't because a "cgroup" is undefined
  under multiple hierarchies.


ok, there is an alternative to hierarchies, which has precedent
(and, importantly, a set of userspace management tools as well as
 existing code in the linux kernel), and it's the FLASK model which
 you know as SE/Linux.

whilst the majority of people view management to be "hierarchical"
(so there is a top dog or God process and everything trickles down
 from that), this is viewed as such an anathema in the security
industry that someone came up with a formal specification for the
real-world way in which permissions are managed, and it's called the
FLASK model.


On this topic it's also worth reading Neil Brown's series of articles on this 
over at http://lwn.net/Articles/604609/ and why he concludes that having a 
single hierarchy for all resource types.


David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Serge Hallyn  writes:

> 
> Quoting Daniel P. Berrange (berrange@...):

> > Are you also planning to actually write a new cgroup parent manager
> > daemon too ? Currently my plan for libvirt is to just talk directly
> 
> I'm toying with the idea, yes.  (Right now my toy runs in either native
> mode, using cgroupfs, or child mode, talking to a parent manager)  I'd
> love if someone else does it, but it needs to be done.
> 
> As I've said elsewhere in the thread, I see 2 problems to be addressed:
> 
> 1. The ability to nest the cgroup manager daemons, so that a daemon
> running in a container can talk to a daemon running on the host.  This
> is the problem my current toy is aiming to address.  But the API it
> exports is just a thin layer over cgroupfs.

 cool!  that's funny, that sounds exactly like what i asked if you
 could provide, and it turns out that you already did :)

 so, in theoorryy. you could have this:

 * run the service on top of /dev/cgroups, republishing [a subset?] as
   /run/cgroups and some other parts as /run/cgroups2

 * have PID1, instead of going directly to /dev/cgroups, to go to
   /run/cgroups *instead*.

 * have lxc, instead of going directly to /dev/cgroups, to go to
   /run/cgroups2 *instead*.

 the problem: as lennart mentions, PID1s such as systemd may be expecting
 to manage the setup of cgroups - entirely - for security or other
 initialisation reasons - *before* even the service that you've created,
 serge, is allowed to run.

 and *that's* why i suggested the idea of following what SE/Linux has
 done, which is to have policy files that compile down to a set of
 permissions that the (various) managers can and cannot do.  bits of
 cgroup that they are and are not permitted to manage.
 
 flat at the kernel implementation level; hierarchical (or other)
 at the "compile-the-policy-file" level.

 l.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Tejun Heo  writes:

> 
> Hello, Serge.
> 
> On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
> > At some point (probably soon) we might want to talk about a standard API
> > for these things.  However I think it will have to come in the form of
> > a standard library, which knows to either send requests over dbus to
> > systemd, or over /dev/cgroup sock to the manager.
> 
> Yeah, eventually, I think we'll have a standardized way to configure
> resource distribution in the system.  Maybe we'll agree on a
> standardized dbus protocol or there will be library, I don't know;
> however, whatever form it may be in, it abstraction level should be
> way higher than that of direct cgroupfs access.  It's way too low
> level and very easy to end up in a complete nonsense configuration.

 just because it sounds easy to end up in a complete nonsense
 configuration does not mean that the entire API should be abandoned.

 instead, it sounds to me like there should be explicit policies
 (taking a leaf out of SE/Linux's book) on what is and is not
 permitted.

 i think you'll find that that is much more acceptable [to have
 explicit policy files which define what can and can't be done].

 it then becomes possible to define "sensible and sane" default
 policies for the average situation, whilst also allowing for more
 complex cases to be created by those people who really really
 know what they're doing.

 the "ridiculous counterexample" to what you are suggesting is that
 just because "rm -fr /*" does such a lot of damage, rm should have
 its "-r" option removed.  perhaps a better example would involve
 rsync, which even as far back as 1999 had already run out of
 lowercase _and_ uppercase letters to use as options... but i can't
 think of one because rsync is awesome :)

 l.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Serge Hallyn  writes:

> 
> Quoting Tim Hockin (thockin@...):

> > > FWIW, the code is too embarassing yet to see daylight, but I'm playing
> > > with a very lowlevel cgroup manager which supports nesting itself.
> > > Access in this POC is low-level ("set freezer.state to THAWED for cgroup
> > > /c1/c2", "Create /c3"), but the key feature is that it can run in two
> > > modes - native mode in which it uses cgroupfs, and child mode where it
> > > talks to a parent manager to make the changes.
> > 
> > In this world, are users able to read cgroup files, or do they have to
> > go through a central agent, too?
> 
> The agent won't itself do anything to stop access through cgroupfs, but
> the idea would be that cgroupfs would only be mounted in the agent's
> mntns.  My hope would be that the libcgroup commands (like cgexec,
> cgcreate, etc) would know to talk to the agent when possible, and users
> would use those.

 serge, i realise this is a year on, so you probably have something
 at least working by now... but i have a possibly crazy idea..

 would it be possible or convenient for the agent that you are writing
 to emulate - in userspace - the *exact* same interface as /dev/cgroups,
 providing a controlled hierarchy yet presenting itself to other
 processes in such a way that its hierarchical management would be
 completely transparent to anything that used it?

 including of course a new instance of the agent itself, in a recursive
 fashion :)

 the important question on top of this would be: is there anything that
 needs to be atomic which emulation of the /dev/cgroups kernel API in
 userspace could not handle?

l.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Tejun Heo  writes:

> 
> Hello, Tim.
> 
> On Fri, Jun 28, 2013 at 11:44:23AM -0700, Tim Hockin wrote:


> The goal is to reach sane and widely useable / useful state with
> minimum amount of complexity.  Maintaining backward compatibility for
> some period - likely quite a few years - while still allowing future
> development is a pretty important consideration.  Another factor is
> that the general situation has been more or less atrocious and cgroup
> as a whole has been failing in the very basic places, which also
> reinforces the drive for simplicity.

 was it einstein who said that something should be made as simple as
 it needs to be... but no simpler?
 
> That said, I stil don't know very well the scope and severity of the
> problems you guys might face from the loss of multiple orthogonal
> hierarchies.

 i think he made it very clear that it would be utterly catastrophic,
 with the cost being millions of dollars or more.

 the thing is, if you compare a "normal" company or individual user(s)
 needs, the numbers of such users may be large but each one has only
 one or a few machines.  but in this case, it's just "one person"
 (tim) saying "i represent hundreds of thousands of machines, here,
 being adversely affected by these discussions".

 so he feels that you *should* be lending far more weight to what he's
 saying *but*... see below...

> So, can you please explain the issues that you've experienced and are
> foreseeing in detail with their contexts?  ie. if you have certain
> requirement, please give at least brief explanation on where such
> requirement is coming from and how important the requirement is.

 well... that's the problem, tejun: he's not permitted to.  he's under
 NDA.  thus the "weighting" gets multiplied by... a number significantly
 less than 1e-5...  oops :)

 l.
 l.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Tejun Heo  writes:

> 
> Hello, Tim.
> 
> On Wed, Jun 26, 2013 at 08:42:21PM -0700, Tim Hockin wrote:
> > OK, then what I don't know is what is the new interface?  A new cgroupfs?
> 
> It's gonna be a new mount option for cgroupfs.
> 
> > DTF and CPU and cpuset all have "default" groups for some tasks (and
> > not others) in our world today.  DTF actually has default, prio, and
> > "normal".  I was simplifying before.  I really wish it were as simple
> > as you think it is.  But if it were, do you think I'd still be
> > arguing?
> 
> How am I supposed to know when you don't communicate it but just wave
> your hands saying it's all very complicated? 

 i'd say that tejun's got you there, tim.  how is anyone supposed to
 understand or help you to support what your team is doing if the entire
 work - no matter how good it is - is kept secret and proprietary?

 we *know* that secret and proprietary is risky, so why is the company
 that you work for indulging itself in such dangerous practices,
 especially when there appears to be so much at risk here if the only
 mindshare for the work you're doing exists solely and exclusively
 in some "secret lair"?

 my suggestion to you would be to urgently, *urgently* get the *entire*
 set of tools and documentation surrounding what is clearly mission
 critical infrastructure released *immediately* as a software libre
 project.

 and the second suggestion would - if they are amenable - to hire
 tejun and any of his associates - to come over for as long as possible
 and necessary to review what you've been doing, on site, giving
 them carte blanche (or even a remit) to update and refine the online
 documentation.

 without that happening - without there being publicly-available
 documentation - i really don't see how you can be expected to ask
 tejun to understand the complexity of what the team needs, when the
 majority of what you want - and need! - to say you *can't*... because
 you're under some bloody stupid NDA!  that's... insane!

 you *need mindshare*: that means releasing the tools and documentation
 as a software libre project so that, if nothing else, there's other
 people whom the company you work for can poach when they get proficient
 at working with it :)

 l.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Tejun Heo  writes:


> I don't really understand your example anyway because you can classify
> by DTF / non-DTF first and then just propagate cpuset settings along.
> You won't lose anything that way, right?

 without spoiling the fun by reading ahead, based on the extreme
 complexity of what tim's team have spent probably man-decades
 possibly even getting on for a man-century getting right, i'm guessing
 two things: (a) that he will have said "we lose everything we
 worked to achieve over the past few years" and (b) "what we have
 now, whilst extremely complex, works really really well: why would
 we even remotely contemplate changing / losing it / replacing it
 with something that, from our deep level of expertise which we
 seem unable to get across to you quite how complex it is, we *know*
 will simply not possibly be adequate".

 tim: the only thing i can suggest here which may help is that
 you discuss seriously amongst the team as to whether to fork the
 functionality present in the linux kernel re hierarchical cgroups,
 and to maintain it indefinitely.
 

> I wrote about that many times, but here are two of the problems.
> 
> * There's no way to designate a cgroup to a resource, because cgroup
>   is only defined by the combination of who's looking at it for which
>   controller.  That's how you end up with tagging the same resource
>   multiple times for different controllers and even then it's broken
>   as when you move resources from one cgroup to another, you can't
>   tell what to do with other tags.
> 
>   While allowing obscene level of flexibility, multiple hierarchies
>   destroy a very fundamental concept that it *should* provide - that
>   of a resource container.  It can't because a "cgroup" is undefined
>   under multiple hierarchies.

 ok, there is an alternative to hierarchies, which has precedent
 (and, importantly, a set of userspace management tools as well as
  existing code in the linux kernel), and it's the FLASK model which
  you know as SE/Linux.

 whilst the majority of people view management to be "hierarchical"
 (so there is a top dog or God process and everything trickles down
  from that), this is viewed as such an anathema in the security
 industry that someone came up with a formal specification for the
 real-world way in which permissions are managed, and it's called the
 FLASK model.

 basically you have a security policy which may, in its extreme limits,
 either contain absolutely all and any permissions (in the case of
 SE/Linux that's quite literally every single system call), or it may
 contain absolutely none.

 *but* - and this is the key bit: when a process exec's a new one,
 there is *no correlation* between the amount of permissions that the
 new child process has and its parent.  in other words, the security
 policy *may* say that a parent may exec a process which has *more*
 permissions (or even an entirely different set) than the parent.

 in other words there *is* no hierarchy.  it's all "flat", with
 inter-relationships.

 now, the way in which the security policy is expressed is in an m4
 macro language that may contain wildcards and includes and macros and
 functions and so on, meaning that its expression can be kept really
 quite simple if properly managed (and the SE/Linux team do an
 extraordinarily good job of doing exactly that).

 basically the reason why i mention this, tejun, is because it has
 distinct advantages.  intuitively i am guessing that the reason why
 you are freaking out about hierarchies is because it is effectively
 potentially infinite depth.  the reason why i mention SE/Linux is
 because it is effectively completely flat, and the responsibility
 for creating hierarchies (or not) is down to the userspace tools
 that compile the m4 macros into the binary files that the kernel
 reads and acts upon.

 so i think you'll find that if you investigate this approach and
 copy it, you should be able to keep the inherent simplicity of
 a "unified" underlying approach, but not have tim's team freaking
 out because they would be able to create policy files based on
 a hierarchical arrangement.

 it would also mean that policies could be written that ensure lxc
 doesn't need to get rewritten; PID1 could be allocated specific
 permissions that it can manage, and so on.

 does that make any sense?

 l.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Tejun Heo tj@... writes:

 
 Hello, Serge.
 
 On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
  At some point (probably soon) we might want to talk about a standard API
  for these things.  However I think it will have to come in the form of
  a standard library, which knows to either send requests over dbus to
  systemd, or over /dev/cgroup sock to the manager.
 
 Yeah, eventually, I think we'll have a standardized way to configure
 resource distribution in the system.  Maybe we'll agree on a
 standardized dbus protocol or there will be library, I don't know;
 however, whatever form it may be in, it abstraction level should be
 way higher than that of direct cgroupfs access.  It's way too low
 level and very easy to end up in a complete nonsense configuration.

 just because it sounds easy to end up in a complete nonsense
 configuration does not mean that the entire API should be abandoned.

 instead, it sounds to me like there should be explicit policies
 (taking a leaf out of SE/Linux's book) on what is and is not
 permitted.

 i think you'll find that that is much more acceptable [to have
 explicit policy files which define what can and can't be done].

 it then becomes possible to define sensible and sane default
 policies for the average situation, whilst also allowing for more
 complex cases to be created by those people who really really
 know what they're doing.

 the ridiculous counterexample to what you are suggesting is that
 just because rm -fr /* does such a lot of damage, rm should have
 its -r option removed.  perhaps a better example would involve
 rsync, which even as far back as 1999 had already run out of
 lowercase _and_ uppercase letters to use as options... but i can't
 think of one because rsync is awesome :)

 l.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Serge Hallyn serge.hallyn@... writes:

 
 Quoting Tim Hockin (thockin@...):

   FWIW, the code is too embarassing yet to see daylight, but I'm playing
   with a very lowlevel cgroup manager which supports nesting itself.
   Access in this POC is low-level (set freezer.state to THAWED for cgroup
   /c1/c2, Create /c3), but the key feature is that it can run in two
   modes - native mode in which it uses cgroupfs, and child mode where it
   talks to a parent manager to make the changes.
  
  In this world, are users able to read cgroup files, or do they have to
  go through a central agent, too?
 
 The agent won't itself do anything to stop access through cgroupfs, but
 the idea would be that cgroupfs would only be mounted in the agent's
 mntns.  My hope would be that the libcgroup commands (like cgexec,
 cgcreate, etc) would know to talk to the agent when possible, and users
 would use those.

 serge, i realise this is a year on, so you probably have something
 at least working by now... but i have a possibly crazy idea..

 would it be possible or convenient for the agent that you are writing
 to emulate - in userspace - the *exact* same interface as /dev/cgroups,
 providing a controlled hierarchy yet presenting itself to other
 processes in such a way that its hierarchical management would be
 completely transparent to anything that used it?

 including of course a new instance of the agent itself, in a recursive
 fashion :)

 the important question on top of this would be: is there anything that
 needs to be atomic which emulation of the /dev/cgroups kernel API in
 userspace could not handle?

l.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Tejun Heo tj@... writes:


 I don't really understand your example anyway because you can classify
 by DTF / non-DTF first and then just propagate cpuset settings along.
 You won't lose anything that way, right?

 without spoiling the fun by reading ahead, based on the extreme
 complexity of what tim's team have spent probably man-decades
 possibly even getting on for a man-century getting right, i'm guessing
 two things: (a) that he will have said we lose everything we
 worked to achieve over the past few years and (b) what we have
 now, whilst extremely complex, works really really well: why would
 we even remotely contemplate changing / losing it / replacing it
 with something that, from our deep level of expertise which we
 seem unable to get across to you quite how complex it is, we *know*
 will simply not possibly be adequate.

 tim: the only thing i can suggest here which may help is that
 you discuss seriously amongst the team as to whether to fork the
 functionality present in the linux kernel re hierarchical cgroups,
 and to maintain it indefinitely.
 

 I wrote about that many times, but here are two of the problems.
 
 * There's no way to designate a cgroup to a resource, because cgroup
   is only defined by the combination of who's looking at it for which
   controller.  That's how you end up with tagging the same resource
   multiple times for different controllers and even then it's broken
   as when you move resources from one cgroup to another, you can't
   tell what to do with other tags.
 
   While allowing obscene level of flexibility, multiple hierarchies
   destroy a very fundamental concept that it *should* provide - that
   of a resource container.  It can't because a cgroup is undefined
   under multiple hierarchies.

 ok, there is an alternative to hierarchies, which has precedent
 (and, importantly, a set of userspace management tools as well as
  existing code in the linux kernel), and it's the FLASK model which
  you know as SE/Linux.

 whilst the majority of people view management to be hierarchical
 (so there is a top dog or God process and everything trickles down
  from that), this is viewed as such an anathema in the security
 industry that someone came up with a formal specification for the
 real-world way in which permissions are managed, and it's called the
 FLASK model.

 basically you have a security policy which may, in its extreme limits,
 either contain absolutely all and any permissions (in the case of
 SE/Linux that's quite literally every single system call), or it may
 contain absolutely none.

 *but* - and this is the key bit: when a process exec's a new one,
 there is *no correlation* between the amount of permissions that the
 new child process has and its parent.  in other words, the security
 policy *may* say that a parent may exec a process which has *more*
 permissions (or even an entirely different set) than the parent.

 in other words there *is* no hierarchy.  it's all flat, with
 inter-relationships.

 now, the way in which the security policy is expressed is in an m4
 macro language that may contain wildcards and includes and macros and
 functions and so on, meaning that its expression can be kept really
 quite simple if properly managed (and the SE/Linux team do an
 extraordinarily good job of doing exactly that).

 basically the reason why i mention this, tejun, is because it has
 distinct advantages.  intuitively i am guessing that the reason why
 you are freaking out about hierarchies is because it is effectively
 potentially infinite depth.  the reason why i mention SE/Linux is
 because it is effectively completely flat, and the responsibility
 for creating hierarchies (or not) is down to the userspace tools
 that compile the m4 macros into the binary files that the kernel
 reads and acts upon.

 so i think you'll find that if you investigate this approach and
 copy it, you should be able to keep the inherent simplicity of
 a unified underlying approach, but not have tim's team freaking
 out because they would be able to create policy files based on
 a hierarchical arrangement.

 it would also mean that policies could be written that ensure lxc
 doesn't need to get rewritten; PID1 could be allocated specific
 permissions that it can manage, and so on.

 does that make any sense?

 l.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Tejun Heo tj@... writes:

 
 Hello, Tim.
 
 On Wed, Jun 26, 2013 at 08:42:21PM -0700, Tim Hockin wrote:
  OK, then what I don't know is what is the new interface?  A new cgroupfs?
 
 It's gonna be a new mount option for cgroupfs.
 
  DTF and CPU and cpuset all have default groups for some tasks (and
  not others) in our world today.  DTF actually has default, prio, and
  normal.  I was simplifying before.  I really wish it were as simple
  as you think it is.  But if it were, do you think I'd still be
  arguing?
 
 How am I supposed to know when you don't communicate it but just wave
 your hands saying it's all very complicated? 

 i'd say that tejun's got you there, tim.  how is anyone supposed to
 understand or help you to support what your team is doing if the entire
 work - no matter how good it is - is kept secret and proprietary?

 we *know* that secret and proprietary is risky, so why is the company
 that you work for indulging itself in such dangerous practices,
 especially when there appears to be so much at risk here if the only
 mindshare for the work you're doing exists solely and exclusively
 in some secret lair?

 my suggestion to you would be to urgently, *urgently* get the *entire*
 set of tools and documentation surrounding what is clearly mission
 critical infrastructure released *immediately* as a software libre
 project.

 and the second suggestion would - if they are amenable - to hire
 tejun and any of his associates - to come over for as long as possible
 and necessary to review what you've been doing, on site, giving
 them carte blanche (or even a remit) to update and refine the online
 documentation.

 without that happening - without there being publicly-available
 documentation - i really don't see how you can be expected to ask
 tejun to understand the complexity of what the team needs, when the
 majority of what you want - and need! - to say you *can't*... because
 you're under some bloody stupid NDA!  that's... insane!

 you *need mindshare*: that means releasing the tools and documentation
 as a software libre project so that, if nothing else, there's other
 people whom the company you work for can poach when they get proficient
 at working with it :)

 l.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Tejun Heo tj@... writes:

 
 Hello, Tim.
 
 On Fri, Jun 28, 2013 at 11:44:23AM -0700, Tim Hockin wrote:


 The goal is to reach sane and widely useable / useful state with
 minimum amount of complexity.  Maintaining backward compatibility for
 some period - likely quite a few years - while still allowing future
 development is a pretty important consideration.  Another factor is
 that the general situation has been more or less atrocious and cgroup
 as a whole has been failing in the very basic places, which also
 reinforces the drive for simplicity.

 was it einstein who said that something should be made as simple as
 it needs to be... but no simpler?
 
 That said, I stil don't know very well the scope and severity of the
 problems you guys might face from the loss of multiple orthogonal
 hierarchies.

 i think he made it very clear that it would be utterly catastrophic,
 with the cost being millions of dollars or more.

 the thing is, if you compare a normal company or individual user(s)
 needs, the numbers of such users may be large but each one has only
 one or a few machines.  but in this case, it's just one person
 (tim) saying i represent hundreds of thousands of machines, here,
 being adversely affected by these discussions.

 so he feels that you *should* be lending far more weight to what he's
 saying *but*... see below...

 So, can you please explain the issues that you've experienced and are
 foreseeing in detail with their contexts?  ie. if you have certain
 requirement, please give at least brief explanation on where such
 requirement is coming from and how important the requirement is.

 well... that's the problem, tejun: he's not permitted to.  he's under
 NDA.  thus the weighting gets multiplied by... a number significantly
 less than 1e-5...  oops :)

 l.
 l.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Serge Hallyn serge.hallyn@... writes:

 
 Quoting Daniel P. Berrange (berrange@...):

  Are you also planning to actually write a new cgroup parent manager
  daemon too ? Currently my plan for libvirt is to just talk directly
 
 I'm toying with the idea, yes.  (Right now my toy runs in either native
 mode, using cgroupfs, or child mode, talking to a parent manager)  I'd
 love if someone else does it, but it needs to be done.
 
 As I've said elsewhere in the thread, I see 2 problems to be addressed:
 
 1. The ability to nest the cgroup manager daemons, so that a daemon
 running in a container can talk to a daemon running on the host.  This
 is the problem my current toy is aiming to address.  But the API it
 exports is just a thin layer over cgroupfs.

 cool!  that's funny, that sounds exactly like what i asked if you
 could provide, and it turns out that you already did :)

 so, in theoorryy. you could have this:

 * run the service on top of /dev/cgroups, republishing [a subset?] as
   /run/cgroups and some other parts as /run/cgroups2

 * have PID1, instead of going directly to /dev/cgroups, to go to
   /run/cgroups *instead*.

 * have lxc, instead of going directly to /dev/cgroups, to go to
   /run/cgroups2 *instead*.

 the problem: as lennart mentions, PID1s such as systemd may be expecting
 to manage the setup of cgroups - entirely - for security or other
 initialisation reasons - *before* even the service that you've created,
 serge, is allowed to run.

 and *that's* why i suggested the idea of following what SE/Linux has
 done, which is to have policy files that compile down to a set of
 permissions that the (various) managers can and cannot do.  bits of
 cgroup that they are and are not permitted to manage.
 
 flat at the kernel implementation level; hierarchical (or other)
 at the compile-the-policy-file level.

 l.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2015-03-03 Thread David Lang

On Tue, 3 Mar 2015, Luke Leighton wrote:


I wrote about that many times, but here are two of the problems.

* There's no way to designate a cgroup to a resource, because cgroup
  is only defined by the combination of who's looking at it for which
  controller.  That's how you end up with tagging the same resource
  multiple times for different controllers and even then it's broken
  as when you move resources from one cgroup to another, you can't
  tell what to do with other tags.

  While allowing obscene level of flexibility, multiple hierarchies
  destroy a very fundamental concept that it *should* provide - that
  of a resource container.  It can't because a cgroup is undefined
  under multiple hierarchies.


ok, there is an alternative to hierarchies, which has precedent
(and, importantly, a set of userspace management tools as well as
 existing code in the linux kernel), and it's the FLASK model which
 you know as SE/Linux.

whilst the majority of people view management to be hierarchical
(so there is a top dog or God process and everything trickles down
 from that), this is viewed as such an anathema in the security
industry that someone came up with a formal specification for the
real-world way in which permissions are managed, and it's called the
FLASK model.


On this topic it's also worth reading Neil Brown's series of articles on this 
over at http://lwn.net/Articles/604609/ and why he concludes that having a 
single hierarchy for all resource types.


David Lang
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-07-23 Thread Michal Hocko
On Mon 15-07-13 14:49:40, Vivek Goyal wrote:
> On Sun, Jun 30, 2013 at 08:38:38PM +0200, Michal Hocko wrote:
> > On Fri 28-06-13 14:01:55, Vivek Goyal wrote:
> > > On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> > [...]
> > > > OK, so libcgroup's rules daemon will still work and place my tasks in
> > > > appropriate cgroups?
> > > 
> > > Do you use that daemon in practice?
> > 
> > I am not but my users do. And that is why I care.
> 
> Michael, 
> 
> would you have more details of how those users are exactly using
> rules engine daemon.

The most common usage is uid and exec names.

> To me rulesengined processed 3 kinds of rules.
> 
> - uid based
> - gid based
> - exec file path based
> 
> uid/gid based rule exection can be taken care by pam_cgroup module too.
> So I think one should not need cgrulesengined for that.

I am not familiar with pam_cgroup much but it is a part of libcgroup
package, right?

> I am curious what kind of exec rules are useful. Any placement of
> services one can do using systemd. So only executables we are left
> to manage are which are not services. 

Yes, those are usually backup processes which should not disrupt the
regular server workload.

uid ones are used to keep a leash on local users of the machine but i do
not have many details as I usually do not have access to those machines.
All I see are complains when something explodes ;)
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-07-23 Thread Michal Hocko
On Mon 15-07-13 14:49:40, Vivek Goyal wrote:
 On Sun, Jun 30, 2013 at 08:38:38PM +0200, Michal Hocko wrote:
  On Fri 28-06-13 14:01:55, Vivek Goyal wrote:
   On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
  [...]
OK, so libcgroup's rules daemon will still work and place my tasks in
appropriate cgroups?
   
   Do you use that daemon in practice?
  
  I am not but my users do. And that is why I care.
 
 Michael, 
 
 would you have more details of how those users are exactly using
 rules engine daemon.

The most common usage is uid and exec names.

 To me rulesengined processed 3 kinds of rules.
 
 - uid based
 - gid based
 - exec file path based
 
 uid/gid based rule exection can be taken care by pam_cgroup module too.
 So I think one should not need cgrulesengined for that.

I am not familiar with pam_cgroup much but it is a part of libcgroup
package, right?

 I am curious what kind of exec rules are useful. Any placement of
 services one can do using systemd. So only executables we are left
 to manage are which are not services. 

Yes, those are usually backup processes which should not disrupt the
regular server workload.

uid ones are used to keep a leash on local users of the machine but i do
not have many details as I usually do not have access to those machines.
All I see are complains when something explodes ;)
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-07-15 Thread Vivek Goyal
On Sun, Jun 30, 2013 at 08:38:38PM +0200, Michal Hocko wrote:
> On Fri 28-06-13 14:01:55, Vivek Goyal wrote:
> > On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> [...]
> > > OK, so libcgroup's rules daemon will still work and place my tasks in
> > > appropriate cgroups?
> > 
> > Do you use that daemon in practice?
> 
> I am not but my users do. And that is why I care.

Michael, 

would you have more details of how those users are exactly using
rules engine daemon.

To me rulesengined processed 3 kinds of rules.

- uid based
- gid based
- exec file path based

uid/gid based rule exection can be taken care by pam_cgroup module too.
So I think one should not need cgrulesengined for that.

I am curious what kind of exec rules are useful. Any placement of
services one can do using systemd. So only executables we are left
to manage are which are not services. 

In practice is it very useful for an admin to say if "firefox" is launched
by a user then it should run in xyz cgroup. And if user cares about
firefox running in a sub cgroup, then it can always use cgexec to do
that.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-07-15 Thread Vivek Goyal
On Sun, Jun 30, 2013 at 08:38:38PM +0200, Michal Hocko wrote:
 On Fri 28-06-13 14:01:55, Vivek Goyal wrote:
  On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
 [...]
   OK, so libcgroup's rules daemon will still work and place my tasks in
   appropriate cgroups?
  
  Do you use that daemon in practice?
 
 I am not but my users do. And that is why I care.

Michael, 

would you have more details of how those users are exactly using
rules engine daemon.

To me rulesengined processed 3 kinds of rules.

- uid based
- gid based
- exec file path based

uid/gid based rule exection can be taken care by pam_cgroup module too.
So I think one should not need cgrulesengined for that.

I am curious what kind of exec rules are useful. Any placement of
services one can do using systemd. So only executables we are left
to manage are which are not services. 

In practice is it very useful for an admin to say if firefox is launched
by a user then it should run in xyz cgroup. And if user cares about
firefox running in a sub cgroup, then it can always use cgexec to do
that.

Thanks
Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-09 Thread Jiri Kosina
On Wed, 3 Jul 2013, Kay Sievers wrote:

> >> > But that's not my point.  It seems pretty easy to make this cgroup
> >> > management (in "native mode") a library that can have either a thin
> >> > veneer of a main() function, while also being usable by systemd.  The
> >> > point is to solve all of the problems ONCE.  I'm trying to make the
> >> > case that systemd itself should be focusing on features and policies
> >> > and awesome APIs.
> >>
> >> You know, getting this all right isn't easy. If you want to do things
> >> properly, then you need to propagate attribute changes between the units 
> >> you
> >> manage. You also need something like a scheduler, since a number of
> >> controllers can only be configured under certain external conditions (for
> >> example: the blkio or devices controller use major/minor parameters for
> >> configuring per-device limits. Since major/minor assignments are pretty 
> >> much
> >> unpredictable these days -- and users probably want to configure things 
> >> with
> >> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us 
> >> to
> >> wait for devices to show up before we can configure the parameters.) Soo...
> >> you need a graph of units, where you can propagate things, and schedule 
> >> things
> >> based on some execution/event queue. And the propagation and scheduling are
> >> closely intermingled.
> >
> > you are confusing policy and mechanisms.
> >
> > The access to cgroupfs is mechanism.
> >
> > The propagation of changes, the scheduling of cgroupfs access and
> > the correlation to external conditions are policy.
> >
> > What Tim is asking for is to have a common interface, i.e. a library
> > which implements the low level access to the cgroupfs mechanism
> > without imposing systemd defined policies to it (It might implement a
> > set of common useful policies, but that's a different discussion).
> >
> > That's definitely not an unreasonable request, because he wants to
> > implement his own set of policies which are not necessarily the same
> > as those which are implemented by systemd.
> >
> > You are simply ignoring the fact, that Linux is used in other ways
> > than those which you are focussed on. That's true for Google's way to
> > manage its gazillion machines and that's equally true for the other
> > end of the spectrum which is deep embedded or any other specialized
> > use case. Just face it: running Linux on your laptop and on some RHT
> > lab machines is covering about 1% of the use cases.
> >
> > Nevertheless you repeatedly claim, that systemd is the only way to
> > deal with system startup and system management, is covering _ALL_ use
> > cases and the interfaces you expose are sufficient.
> >
> > Did you ever work on specialized embedded or big data use cases? I
> > really doubt that, but I might be wrong as usual.
> >
> > So I invite you to prove that you can beat an existing setup for an
> > automotive use case with your magic systemd foo. I refund you fully,
> > if you can beat the mark of a functional system less than 800ms after
> > reset release on a 200MHz ARM machine. Functional is defined by the
> > use case requirements and means:
> >
> > - Basic cgroups management working
> > - GUI up and running
> > - Main communication interface (CAN bus) up and running
> >
> > The rest of the system is starting up after that including a more
> > complex cgroup management.
> >
> > According to your claim that systemd is covering everything and some
> > more, this should take you a few hours. I grant you a full week to
> > work on that.
> >
> > The use case Tim is talking about is different, but has similar
> > constraints which are completely driven by his particular use case
> > scenario. I'm sure, that Tim can persuade his management to setup a
> > similar contest to prove your expertise on the other extreme of the
> > Linux world.
> >
> > Before answering please think about the relevance of your statements
> > "getting this all right isn't easy", "something like a scheduler",
> > "users probably want ..."  and "stable /dev/disk/by-id/* symlinks" in
> > those contexts.
> 
> I don't think anybody needs your money.
> 
> But it's sure an improvement over last time when you wanted to use a
> "Kantholz" to make your statement.

Now how about the policy vs. mechanisms part of Thomas' e-mail?

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-09 Thread Jiri Kosina
On Wed, 3 Jul 2013, Kay Sievers wrote:

   But that's not my point.  It seems pretty easy to make this cgroup
   management (in native mode) a library that can have either a thin
   veneer of a main() function, while also being usable by systemd.  The
   point is to solve all of the problems ONCE.  I'm trying to make the
   case that systemd itself should be focusing on features and policies
   and awesome APIs.
 
  You know, getting this all right isn't easy. If you want to do things
  properly, then you need to propagate attribute changes between the units 
  you
  manage. You also need something like a scheduler, since a number of
  controllers can only be configured under certain external conditions (for
  example: the blkio or devices controller use major/minor parameters for
  configuring per-device limits. Since major/minor assignments are pretty 
  much
  unpredictable these days -- and users probably want to configure things 
  with
  friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us 
  to
  wait for devices to show up before we can configure the parameters.) Soo...
  you need a graph of units, where you can propagate things, and schedule 
  things
  based on some execution/event queue. And the propagation and scheduling are
  closely intermingled.
 
  you are confusing policy and mechanisms.
 
  The access to cgroupfs is mechanism.
 
  The propagation of changes, the scheduling of cgroupfs access and
  the correlation to external conditions are policy.
 
  What Tim is asking for is to have a common interface, i.e. a library
  which implements the low level access to the cgroupfs mechanism
  without imposing systemd defined policies to it (It might implement a
  set of common useful policies, but that's a different discussion).
 
  That's definitely not an unreasonable request, because he wants to
  implement his own set of policies which are not necessarily the same
  as those which are implemented by systemd.
 
  You are simply ignoring the fact, that Linux is used in other ways
  than those which you are focussed on. That's true for Google's way to
  manage its gazillion machines and that's equally true for the other
  end of the spectrum which is deep embedded or any other specialized
  use case. Just face it: running Linux on your laptop and on some RHT
  lab machines is covering about 1% of the use cases.
 
  Nevertheless you repeatedly claim, that systemd is the only way to
  deal with system startup and system management, is covering _ALL_ use
  cases and the interfaces you expose are sufficient.
 
  Did you ever work on specialized embedded or big data use cases? I
  really doubt that, but I might be wrong as usual.
 
  So I invite you to prove that you can beat an existing setup for an
  automotive use case with your magic systemd foo. I refund you fully,
  if you can beat the mark of a functional system less than 800ms after
  reset release on a 200MHz ARM machine. Functional is defined by the
  use case requirements and means:
 
  - Basic cgroups management working
  - GUI up and running
  - Main communication interface (CAN bus) up and running
 
  The rest of the system is starting up after that including a more
  complex cgroup management.
 
  According to your claim that systemd is covering everything and some
  more, this should take you a few hours. I grant you a full week to
  work on that.
 
  The use case Tim is talking about is different, but has similar
  constraints which are completely driven by his particular use case
  scenario. I'm sure, that Tim can persuade his management to setup a
  similar contest to prove your expertise on the other extreme of the
  Linux world.
 
  Before answering please think about the relevance of your statements
  getting this all right isn't easy, something like a scheduler,
  users probably want ...  and stable /dev/disk/by-id/* symlinks in
  those contexts.
 
 I don't think anybody needs your money.
 
 But it's sure an improvement over last time when you wanted to use a
 Kantholz to make your statement.

Now how about the policy vs. mechanisms part of Thomas' e-mail?

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-03 Thread James Bottomley
On Wed, 2013-07-03 at 01:57 +0200, Thomas Gleixner wrote:
> Lennart,
> 
> On Sun, 30 Jun 2013, Lennart Poettering wrote:
> > On 29.06.2013 05:05, Tim Hockin wrote:
> > > But that's not my point.  It seems pretty easy to make this cgroup
> > > management (in "native mode") a library that can have either a thin
> > > veneer of a main() function, while also being usable by systemd.  The
> > > point is to solve all of the problems ONCE.  I'm trying to make the
> > > case that systemd itself should be focusing on features and policies
> > > and awesome APIs.
> > 
> > You know, getting this all right isn't easy. If you want to do things
> > properly, then you need to propagate attribute changes between the units you
> > manage. You also need something like a scheduler, since a number of
> > controllers can only be configured under certain external conditions (for
> > example: the blkio or devices controller use major/minor parameters for
> > configuring per-device limits. Since major/minor assignments are pretty much
> > unpredictable these days -- and users probably want to configure things with
> > friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
> > wait for devices to show up before we can configure the parameters.) Soo...
> > you need a graph of units, where you can propagate things, and schedule 
> > things
> > based on some execution/event queue. And the propagation and scheduling are
> > closely intermingled.
> 
> you are confusing policy and mechanisms.
> 
> The access to cgroupfs is mechanism.
> 
> The propagation of changes, the scheduling of cgroupfs access and
> the correlation to external conditions are policy.
> 
> What Tim is asking for is to have a common interface, i.e. a library
> which implements the low level access to the cgroupfs mechanism
> without imposing systemd defined policies to it (It might implement a
> set of common useful policies, but that's a different discussion).
> 
> That's definitely not an unreasonable request, because he wants to
> implement his own set of policies which are not necessarily the same
> as those which are implemented by systemd.

Could I just add a me too to this from Parallels.  We need the ability
to impose our own container policy on the kernel mechanisms.

Perhaps I should step back a bit and say first of all that we all use
the word "container" a lot, but if you analyse what we mean, you'll find
that a Google container is different from a Parallels/OpenVZ container
which is different from an LXC container and so on.  How we all build
our containers is a policy we impose on the various cgroup and namespace
mechanisms within the kernel.  We've spent a lot of discussion time over
the years making sure that the kernel mechanisms support all of our
different use cases, so I really don't want to see that change in the
name of simplifying the API.

I also don't think any quest for the one true container will be
successful for the simple reason that containers are best when tuned for
the job they're doing. For instance at Parallels we do IaaS containers.
That means we can take a container, boot up any old Linux OS inside it
and give you root on it in exactly the same way as you could for a
virtual machine.  Google does something more like application containers
for job control and some network companies do pure namespace containers
without any cgroup controllers at all.  There's no one container
description that would fit all use cases.

So where we are is that the current APIs may be messy, but they support
all use cases and all container structure policies.  If anyone, systemd
included, wants to do a new API, it must support all use cases as well.
Ideally, it should be agreed to and in the kernel as well rather than
having some userspace filter.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-03 Thread Thomas Gleixner
On Wed, 3 Jul 2013, Kay Sievers wrote:
> On Wed, Jul 3, 2013 at 1:57 AM, Thomas Gleixner  wrote:
> > Before answering please think about the relevance of your statements
> > "getting this all right isn't easy", "something like a scheduler",
> > "users probably want ..."  and "stable /dev/disk/by-id/* symlinks" in
> > those contexts.
> 
> I don't think anybody needs your money.

Thanks for your well thought out technical argument.
 
> But it's sure an improvement over last time when you wanted to use a
> "Kantholz" to make your statement.

Using an out of context snippet from a private conversation at the bar
to answer a technical argument is definitely proving your point.

Thanks,

tglx



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-03 Thread Borislav Petkov
On Wed, Jul 03, 2013 at 02:44:31AM +0200, Kay Sievers wrote:
> I don't think anybody needs your money.
> 
> But it's sure an improvement over last time when you wanted to use a
> "Kantholz" to make your statement.

Kantholz, frozen sharks, whatever helps get the real point across. Hint:
this is not at all about the money.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-03 Thread Borislav Petkov
On Wed, Jul 03, 2013 at 02:44:31AM +0200, Kay Sievers wrote:
 I don't think anybody needs your money.
 
 But it's sure an improvement over last time when you wanted to use a
 Kantholz to make your statement.

Kantholz, frozen sharks, whatever helps get the real point across. Hint:
this is not at all about the money.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-03 Thread Thomas Gleixner
On Wed, 3 Jul 2013, Kay Sievers wrote:
 On Wed, Jul 3, 2013 at 1:57 AM, Thomas Gleixner t...@linutronix.de wrote:
  Before answering please think about the relevance of your statements
  getting this all right isn't easy, something like a scheduler,
  users probably want ...  and stable /dev/disk/by-id/* symlinks in
  those contexts.
 
 I don't think anybody needs your money.

Thanks for your well thought out technical argument.
 
 But it's sure an improvement over last time when you wanted to use a
 Kantholz to make your statement.

Using an out of context snippet from a private conversation at the bar
to answer a technical argument is definitely proving your point.

Thanks,

tglx



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-03 Thread James Bottomley
On Wed, 2013-07-03 at 01:57 +0200, Thomas Gleixner wrote:
 Lennart,
 
 On Sun, 30 Jun 2013, Lennart Poettering wrote:
  On 29.06.2013 05:05, Tim Hockin wrote:
   But that's not my point.  It seems pretty easy to make this cgroup
   management (in native mode) a library that can have either a thin
   veneer of a main() function, while also being usable by systemd.  The
   point is to solve all of the problems ONCE.  I'm trying to make the
   case that systemd itself should be focusing on features and policies
   and awesome APIs.
  
  You know, getting this all right isn't easy. If you want to do things
  properly, then you need to propagate attribute changes between the units you
  manage. You also need something like a scheduler, since a number of
  controllers can only be configured under certain external conditions (for
  example: the blkio or devices controller use major/minor parameters for
  configuring per-device limits. Since major/minor assignments are pretty much
  unpredictable these days -- and users probably want to configure things with
  friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
  wait for devices to show up before we can configure the parameters.) Soo...
  you need a graph of units, where you can propagate things, and schedule 
  things
  based on some execution/event queue. And the propagation and scheduling are
  closely intermingled.
 
 you are confusing policy and mechanisms.
 
 The access to cgroupfs is mechanism.
 
 The propagation of changes, the scheduling of cgroupfs access and
 the correlation to external conditions are policy.
 
 What Tim is asking for is to have a common interface, i.e. a library
 which implements the low level access to the cgroupfs mechanism
 without imposing systemd defined policies to it (It might implement a
 set of common useful policies, but that's a different discussion).
 
 That's definitely not an unreasonable request, because he wants to
 implement his own set of policies which are not necessarily the same
 as those which are implemented by systemd.

Could I just add a me too to this from Parallels.  We need the ability
to impose our own container policy on the kernel mechanisms.

Perhaps I should step back a bit and say first of all that we all use
the word container a lot, but if you analyse what we mean, you'll find
that a Google container is different from a Parallels/OpenVZ container
which is different from an LXC container and so on.  How we all build
our containers is a policy we impose on the various cgroup and namespace
mechanisms within the kernel.  We've spent a lot of discussion time over
the years making sure that the kernel mechanisms support all of our
different use cases, so I really don't want to see that change in the
name of simplifying the API.

I also don't think any quest for the one true container will be
successful for the simple reason that containers are best when tuned for
the job they're doing. For instance at Parallels we do IaaS containers.
That means we can take a container, boot up any old Linux OS inside it
and give you root on it in exactly the same way as you could for a
virtual machine.  Google does something more like application containers
for job control and some network companies do pure namespace containers
without any cgroup controllers at all.  There's no one container
description that would fit all use cases.

So where we are is that the current APIs may be messy, but they support
all use cases and all container structure policies.  If anyone, systemd
included, wants to do a new API, it must support all use cases as well.
Ideally, it should be agreed to and in the kernel as well rather than
having some userspace filter.

James


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-02 Thread Kay Sievers
On Wed, Jul 3, 2013 at 1:57 AM, Thomas Gleixner  wrote:
> On Sun, 30 Jun 2013, Lennart Poettering wrote:
>> On 29.06.2013 05:05, Tim Hockin wrote:
>> > But that's not my point.  It seems pretty easy to make this cgroup
>> > management (in "native mode") a library that can have either a thin
>> > veneer of a main() function, while also being usable by systemd.  The
>> > point is to solve all of the problems ONCE.  I'm trying to make the
>> > case that systemd itself should be focusing on features and policies
>> > and awesome APIs.
>>
>> You know, getting this all right isn't easy. If you want to do things
>> properly, then you need to propagate attribute changes between the units you
>> manage. You also need something like a scheduler, since a number of
>> controllers can only be configured under certain external conditions (for
>> example: the blkio or devices controller use major/minor parameters for
>> configuring per-device limits. Since major/minor assignments are pretty much
>> unpredictable these days -- and users probably want to configure things with
>> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
>> wait for devices to show up before we can configure the parameters.) Soo...
>> you need a graph of units, where you can propagate things, and schedule 
>> things
>> based on some execution/event queue. And the propagation and scheduling are
>> closely intermingled.
>
> you are confusing policy and mechanisms.
>
> The access to cgroupfs is mechanism.
>
> The propagation of changes, the scheduling of cgroupfs access and
> the correlation to external conditions are policy.
>
> What Tim is asking for is to have a common interface, i.e. a library
> which implements the low level access to the cgroupfs mechanism
> without imposing systemd defined policies to it (It might implement a
> set of common useful policies, but that's a different discussion).
>
> That's definitely not an unreasonable request, because he wants to
> implement his own set of policies which are not necessarily the same
> as those which are implemented by systemd.
>
> You are simply ignoring the fact, that Linux is used in other ways
> than those which you are focussed on. That's true for Google's way to
> manage its gazillion machines and that's equally true for the other
> end of the spectrum which is deep embedded or any other specialized
> use case. Just face it: running Linux on your laptop and on some RHT
> lab machines is covering about 1% of the use cases.
>
> Nevertheless you repeatedly claim, that systemd is the only way to
> deal with system startup and system management, is covering _ALL_ use
> cases and the interfaces you expose are sufficient.
>
> Did you ever work on specialized embedded or big data use cases? I
> really doubt that, but I might be wrong as usual.
>
> So I invite you to prove that you can beat an existing setup for an
> automotive use case with your magic systemd foo. I refund you fully,
> if you can beat the mark of a functional system less than 800ms after
> reset release on a 200MHz ARM machine. Functional is defined by the
> use case requirements and means:
>
> - Basic cgroups management working
> - GUI up and running
> - Main communication interface (CAN bus) up and running
>
> The rest of the system is starting up after that including a more
> complex cgroup management.
>
> According to your claim that systemd is covering everything and some
> more, this should take you a few hours. I grant you a full week to
> work on that.
>
> The use case Tim is talking about is different, but has similar
> constraints which are completely driven by his particular use case
> scenario. I'm sure, that Tim can persuade his management to setup a
> similar contest to prove your expertise on the other extreme of the
> Linux world.
>
> Before answering please think about the relevance of your statements
> "getting this all right isn't easy", "something like a scheduler",
> "users probably want ..."  and "stable /dev/disk/by-id/* symlinks" in
> those contexts.

I don't think anybody needs your money.

But it's sure an improvement over last time when you wanted to use a
"Kantholz" to make your statement.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-02 Thread Thomas Gleixner
Lennart,

On Sun, 30 Jun 2013, Lennart Poettering wrote:
> On 29.06.2013 05:05, Tim Hockin wrote:
> > But that's not my point.  It seems pretty easy to make this cgroup
> > management (in "native mode") a library that can have either a thin
> > veneer of a main() function, while also being usable by systemd.  The
> > point is to solve all of the problems ONCE.  I'm trying to make the
> > case that systemd itself should be focusing on features and policies
> > and awesome APIs.
> 
> You know, getting this all right isn't easy. If you want to do things
> properly, then you need to propagate attribute changes between the units you
> manage. You also need something like a scheduler, since a number of
> controllers can only be configured under certain external conditions (for
> example: the blkio or devices controller use major/minor parameters for
> configuring per-device limits. Since major/minor assignments are pretty much
> unpredictable these days -- and users probably want to configure things with
> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
> wait for devices to show up before we can configure the parameters.) Soo...
> you need a graph of units, where you can propagate things, and schedule things
> based on some execution/event queue. And the propagation and scheduling are
> closely intermingled.

you are confusing policy and mechanisms.

The access to cgroupfs is mechanism.

The propagation of changes, the scheduling of cgroupfs access and
the correlation to external conditions are policy.

What Tim is asking for is to have a common interface, i.e. a library
which implements the low level access to the cgroupfs mechanism
without imposing systemd defined policies to it (It might implement a
set of common useful policies, but that's a different discussion).

That's definitely not an unreasonable request, because he wants to
implement his own set of policies which are not necessarily the same
as those which are implemented by systemd.

You are simply ignoring the fact, that Linux is used in other ways
than those which you are focussed on. That's true for Google's way to
manage its gazillion machines and that's equally true for the other
end of the spectrum which is deep embedded or any other specialized
use case. Just face it: running Linux on your laptop and on some RHT
lab machines is covering about 1% of the use cases.

Nevertheless you repeatedly claim, that systemd is the only way to
deal with system startup and system management, is covering _ALL_ use
cases and the interfaces you expose are sufficient.

Did you ever work on specialized embedded or big data use cases? I
really doubt that, but I might be wrong as usual.

So I invite you to prove that you can beat an existing setup for an
automotive use case with your magic systemd foo. I refund you fully,
if you can beat the mark of a functional system less than 800ms after
reset release on a 200MHz ARM machine. Functional is defined by the
use case requirements and means:

- Basic cgroups management working
- GUI up and running
- Main communication interface (CAN bus) up and running

The rest of the system is starting up after that including a more
complex cgroup management.

According to your claim that systemd is covering everything and some
more, this should take you a few hours. I grant you a full week to
work on that.

The use case Tim is talking about is different, but has similar
constraints which are completely driven by his particular use case
scenario. I'm sure, that Tim can persuade his management to setup a
similar contest to prove your expertise on the other extreme of the
Linux world.

Before answering please think about the relevance of your statements
"getting this all right isn't easy", "something like a scheduler",
"users probably want ..."  and "stable /dev/disk/by-id/* symlinks" in
those contexts.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-02 Thread Thomas Gleixner
Lennart,

On Sun, 30 Jun 2013, Lennart Poettering wrote:
 On 29.06.2013 05:05, Tim Hockin wrote:
  But that's not my point.  It seems pretty easy to make this cgroup
  management (in native mode) a library that can have either a thin
  veneer of a main() function, while also being usable by systemd.  The
  point is to solve all of the problems ONCE.  I'm trying to make the
  case that systemd itself should be focusing on features and policies
  and awesome APIs.
 
 You know, getting this all right isn't easy. If you want to do things
 properly, then you need to propagate attribute changes between the units you
 manage. You also need something like a scheduler, since a number of
 controllers can only be configured under certain external conditions (for
 example: the blkio or devices controller use major/minor parameters for
 configuring per-device limits. Since major/minor assignments are pretty much
 unpredictable these days -- and users probably want to configure things with
 friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
 wait for devices to show up before we can configure the parameters.) Soo...
 you need a graph of units, where you can propagate things, and schedule things
 based on some execution/event queue. And the propagation and scheduling are
 closely intermingled.

you are confusing policy and mechanisms.

The access to cgroupfs is mechanism.

The propagation of changes, the scheduling of cgroupfs access and
the correlation to external conditions are policy.

What Tim is asking for is to have a common interface, i.e. a library
which implements the low level access to the cgroupfs mechanism
without imposing systemd defined policies to it (It might implement a
set of common useful policies, but that's a different discussion).

That's definitely not an unreasonable request, because he wants to
implement his own set of policies which are not necessarily the same
as those which are implemented by systemd.

You are simply ignoring the fact, that Linux is used in other ways
than those which you are focussed on. That's true for Google's way to
manage its gazillion machines and that's equally true for the other
end of the spectrum which is deep embedded or any other specialized
use case. Just face it: running Linux on your laptop and on some RHT
lab machines is covering about 1% of the use cases.

Nevertheless you repeatedly claim, that systemd is the only way to
deal with system startup and system management, is covering _ALL_ use
cases and the interfaces you expose are sufficient.

Did you ever work on specialized embedded or big data use cases? I
really doubt that, but I might be wrong as usual.

So I invite you to prove that you can beat an existing setup for an
automotive use case with your magic systemd foo. I refund you fully,
if you can beat the mark of a functional system less than 800ms after
reset release on a 200MHz ARM machine. Functional is defined by the
use case requirements and means:

- Basic cgroups management working
- GUI up and running
- Main communication interface (CAN bus) up and running

The rest of the system is starting up after that including a more
complex cgroup management.

According to your claim that systemd is covering everything and some
more, this should take you a few hours. I grant you a full week to
work on that.

The use case Tim is talking about is different, but has similar
constraints which are completely driven by his particular use case
scenario. I'm sure, that Tim can persuade his management to setup a
similar contest to prove your expertise on the other extreme of the
Linux world.

Before answering please think about the relevance of your statements
getting this all right isn't easy, something like a scheduler,
users probably want ...  and stable /dev/disk/by-id/* symlinks in
those contexts.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-02 Thread Kay Sievers
On Wed, Jul 3, 2013 at 1:57 AM, Thomas Gleixner t...@linutronix.de wrote:
 On Sun, 30 Jun 2013, Lennart Poettering wrote:
 On 29.06.2013 05:05, Tim Hockin wrote:
  But that's not my point.  It seems pretty easy to make this cgroup
  management (in native mode) a library that can have either a thin
  veneer of a main() function, while also being usable by systemd.  The
  point is to solve all of the problems ONCE.  I'm trying to make the
  case that systemd itself should be focusing on features and policies
  and awesome APIs.

 You know, getting this all right isn't easy. If you want to do things
 properly, then you need to propagate attribute changes between the units you
 manage. You also need something like a scheduler, since a number of
 controllers can only be configured under certain external conditions (for
 example: the blkio or devices controller use major/minor parameters for
 configuring per-device limits. Since major/minor assignments are pretty much
 unpredictable these days -- and users probably want to configure things with
 friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
 wait for devices to show up before we can configure the parameters.) Soo...
 you need a graph of units, where you can propagate things, and schedule 
 things
 based on some execution/event queue. And the propagation and scheduling are
 closely intermingled.

 you are confusing policy and mechanisms.

 The access to cgroupfs is mechanism.

 The propagation of changes, the scheduling of cgroupfs access and
 the correlation to external conditions are policy.

 What Tim is asking for is to have a common interface, i.e. a library
 which implements the low level access to the cgroupfs mechanism
 without imposing systemd defined policies to it (It might implement a
 set of common useful policies, but that's a different discussion).

 That's definitely not an unreasonable request, because he wants to
 implement his own set of policies which are not necessarily the same
 as those which are implemented by systemd.

 You are simply ignoring the fact, that Linux is used in other ways
 than those which you are focussed on. That's true for Google's way to
 manage its gazillion machines and that's equally true for the other
 end of the spectrum which is deep embedded or any other specialized
 use case. Just face it: running Linux on your laptop and on some RHT
 lab machines is covering about 1% of the use cases.

 Nevertheless you repeatedly claim, that systemd is the only way to
 deal with system startup and system management, is covering _ALL_ use
 cases and the interfaces you expose are sufficient.

 Did you ever work on specialized embedded or big data use cases? I
 really doubt that, but I might be wrong as usual.

 So I invite you to prove that you can beat an existing setup for an
 automotive use case with your magic systemd foo. I refund you fully,
 if you can beat the mark of a functional system less than 800ms after
 reset release on a 200MHz ARM machine. Functional is defined by the
 use case requirements and means:

 - Basic cgroups management working
 - GUI up and running
 - Main communication interface (CAN bus) up and running

 The rest of the system is starting up after that including a more
 complex cgroup management.

 According to your claim that systemd is covering everything and some
 more, this should take you a few hours. I grant you a full week to
 work on that.

 The use case Tim is talking about is different, but has similar
 constraints which are completely driven by his particular use case
 scenario. I'm sure, that Tim can persuade his management to setup a
 similar contest to prove your expertise on the other extreme of the
 Linux world.

 Before answering please think about the relevance of your statements
 getting this all right isn't easy, something like a scheduler,
 users probably want ...  and stable /dev/disk/by-id/* symlinks in
 those contexts.

I don't think anybody needs your money.

But it's sure an improvement over last time when you wanted to use a
Kantholz to make your statement.

Thanks,
Kay
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-07-01 Thread Tim Hockin
On Sun, Jun 30, 2013 at 12:39 PM, Lennart Poettering
 wrote:
> Heya,
>
>
> On 29.06.2013 05:05, Tim Hockin wrote:
>>
>> Come on, now, Lennart.  You put a lot of words in my mouth.
>
>
>>> I for sure am not going to make the PID 1 a client of another daemon.
>>> That's
>>> just wrong. If you have a daemon that is both conceptually the manager of
>>> another service and the client of that other service, then that's bad
>>> design
>>> and you will easily run into deadlocks and such. Just think about it: if
>>> you
>>> have some external daemon for managing cgroups, and you need cgroups for
>>> running external daemons, how are you going to start the external daemon
>>> for
>>> managing cgroups? Sure, you can hack around this, make that daemon
>>> special,
>>> and magic, and stuff -- or you can just not do such nonsense. There's no
>>> reason to repeat the fuckup that cgroup became in kernelspace a second
>>> time,
>>> but this time in userspace, with multiple manager daemons all with
>>> different
>>> and slightly incompatible definitions what a unit to manage actualy is...
>>
>>
>> I forgot about the tautology of systemd.  systemd is monolithic.
>
>
> systemd is certainly not monolithic for almost any definition of that term.
> I am not sure where you are taking that from, and I am not sure I want to
> discuss on that level. This just sounds like FUD you picked up somewhere and
> are repeating carelessly...

It does a number of sort-of-related things.  Maybe it does them better
by doing them together.  I can't say, really.  We don't use it at
work, and I am on Ubuntu elsewhere, for now.

>> But that's not my point.  It seems pretty easy to make this cgroup
>> management (in "native mode") a library that can have either a thin
>> veneer of a main() function, while also being usable by systemd.  The
>> point is to solve all of the problems ONCE.  I'm trying to make the
>> case that systemd itself should be focusing on features and policies
>> and awesome APIs.
>
> You know, getting this all right isn't easy. If you want to do things
> properly, then you need to propagate attribute changes between the units you
> manage. You also need something like a scheduler, since a number of
> controllers can only be configured under certain external conditions (for
> example: the blkio or devices controller use major/minor parameters for
> configuring per-device limits. Since major/minor assignments are pretty much
> unpredictable these days -- and users probably want to configure things with
> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
> wait for devices to show up before we can configure the parameters.) Soo...
> you need a graph of units, where you can propagate things, and schedule
> things based on some execution/event queue. And the propagation and
> scheduling are closely intermingled.

I'm really just talking about the most basic low-level substrate of
writing to cgroupfs.  Again, we don't use udev (yet?) so we don't have
these problems.  It seems to me that it's possible to formulate a
bottom layer that is usable by both systemd and non-systemd systems.
But, you know, maybe I am wrong and our internal universe is so much
simpler (and behind the times) than the rest of the world that
layering can work for us and not you.

> Now, that's pretty much exactly what systemd actually *is*. It implements a
> graph of units with a scheduler. And if you rip that part out of systemd to
> make this an "easy cgroup management library", then you simply turn what
> systemd is into a library without leaving anything. Which is just bogus.
>
> So no, if you say "seems pretty easy to make this cgroup management a
> library" then well, I have to disagree with you.
>
>
>>> We want to run fewer, simpler things on our systems, we want to reuse as
>>
>>
>> Fewer and simpler are not compatible, unless you are losing
>> functionality.  Systemd is fewer, but NOT simpler.
>
>
> Oh, certainly it is. If we'd split up the cgroup fs access into separate
> daemon of some kind, then we'd need some kind of IPC for that, and so you
> have more daemons and you have some complex IPC between the processes. So
> yeah, the systemd approach is certainly both simpler and uses fewer daemons
> then your hypothetical one.

Well, it SOUNDS like Serge is trying to develop this to demonstrate
that a standalone daemon works.  That's what I am keen to help with
(or else we have to invent ourselves).  I am not really afraid of IPC
or of "more daemons".  I much prefer simple agents doing one thing and
interacting with each other in simple ways.  But that's me.

>>> much of the code as we can. You don't achieve that by running yet another
>>> daemon that does worse what systemd can anyway do simpler, easier and
>>> better.
>>
>>
>> Considering this is all hypothetical, I find this to be a funny
>> debate.  My hypothetical idea is better than your hypothetical idea.
>
>
> Well, systemd is pretty real, and the code to do the unified 

Re: cgroup: status-quo and userland efforts

2013-07-01 Thread Tim Hockin
On Sun, Jun 30, 2013 at 12:39 PM, Lennart Poettering
lpoet...@redhat.com wrote:
 Heya,


 On 29.06.2013 05:05, Tim Hockin wrote:

 Come on, now, Lennart.  You put a lot of words in my mouth.


 I for sure am not going to make the PID 1 a client of another daemon.
 That's
 just wrong. If you have a daemon that is both conceptually the manager of
 another service and the client of that other service, then that's bad
 design
 and you will easily run into deadlocks and such. Just think about it: if
 you
 have some external daemon for managing cgroups, and you need cgroups for
 running external daemons, how are you going to start the external daemon
 for
 managing cgroups? Sure, you can hack around this, make that daemon
 special,
 and magic, and stuff -- or you can just not do such nonsense. There's no
 reason to repeat the fuckup that cgroup became in kernelspace a second
 time,
 but this time in userspace, with multiple manager daemons all with
 different
 and slightly incompatible definitions what a unit to manage actualy is...


 I forgot about the tautology of systemd.  systemd is monolithic.


 systemd is certainly not monolithic for almost any definition of that term.
 I am not sure where you are taking that from, and I am not sure I want to
 discuss on that level. This just sounds like FUD you picked up somewhere and
 are repeating carelessly...

It does a number of sort-of-related things.  Maybe it does them better
by doing them together.  I can't say, really.  We don't use it at
work, and I am on Ubuntu elsewhere, for now.

 But that's not my point.  It seems pretty easy to make this cgroup
 management (in native mode) a library that can have either a thin
 veneer of a main() function, while also being usable by systemd.  The
 point is to solve all of the problems ONCE.  I'm trying to make the
 case that systemd itself should be focusing on features and policies
 and awesome APIs.

 You know, getting this all right isn't easy. If you want to do things
 properly, then you need to propagate attribute changes between the units you
 manage. You also need something like a scheduler, since a number of
 controllers can only be configured under certain external conditions (for
 example: the blkio or devices controller use major/minor parameters for
 configuring per-device limits. Since major/minor assignments are pretty much
 unpredictable these days -- and users probably want to configure things with
 friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
 wait for devices to show up before we can configure the parameters.) Soo...
 you need a graph of units, where you can propagate things, and schedule
 things based on some execution/event queue. And the propagation and
 scheduling are closely intermingled.

I'm really just talking about the most basic low-level substrate of
writing to cgroupfs.  Again, we don't use udev (yet?) so we don't have
these problems.  It seems to me that it's possible to formulate a
bottom layer that is usable by both systemd and non-systemd systems.
But, you know, maybe I am wrong and our internal universe is so much
simpler (and behind the times) than the rest of the world that
layering can work for us and not you.

 Now, that's pretty much exactly what systemd actually *is*. It implements a
 graph of units with a scheduler. And if you rip that part out of systemd to
 make this an easy cgroup management library, then you simply turn what
 systemd is into a library without leaving anything. Which is just bogus.

 So no, if you say seems pretty easy to make this cgroup management a
 library then well, I have to disagree with you.


 We want to run fewer, simpler things on our systems, we want to reuse as


 Fewer and simpler are not compatible, unless you are losing
 functionality.  Systemd is fewer, but NOT simpler.


 Oh, certainly it is. If we'd split up the cgroup fs access into separate
 daemon of some kind, then we'd need some kind of IPC for that, and so you
 have more daemons and you have some complex IPC between the processes. So
 yeah, the systemd approach is certainly both simpler and uses fewer daemons
 then your hypothetical one.

Well, it SOUNDS like Serge is trying to develop this to demonstrate
that a standalone daemon works.  That's what I am keen to help with
(or else we have to invent ourselves).  I am not really afraid of IPC
or of more daemons.  I much prefer simple agents doing one thing and
interacting with each other in simple ways.  But that's me.

 much of the code as we can. You don't achieve that by running yet another
 daemon that does worse what systemd can anyway do simpler, easier and
 better.


 Considering this is all hypothetical, I find this to be a funny
 debate.  My hypothetical idea is better than your hypothetical idea.


 Well, systemd is pretty real, and the code to do the unified cgroup
 management within systemd is pretty complete. systemd is certainly not
 hypothetical.

Fair enough - I did not realize you had 

Re: cgroup: status-quo and userland efforts

2013-06-30 Thread Lennart Poettering

Heya,

On 29.06.2013 05:05, Tim Hockin wrote:

Come on, now, Lennart.  You put a lot of words in my mouth.



I for sure am not going to make the PID 1 a client of another daemon. That's
just wrong. If you have a daemon that is both conceptually the manager of
another service and the client of that other service, then that's bad design
and you will easily run into deadlocks and such. Just think about it: if you
have some external daemon for managing cgroups, and you need cgroups for
running external daemons, how are you going to start the external daemon for
managing cgroups? Sure, you can hack around this, make that daemon special,
and magic, and stuff -- or you can just not do such nonsense. There's no
reason to repeat the fuckup that cgroup became in kernelspace a second time,
but this time in userspace, with multiple manager daemons all with different
and slightly incompatible definitions what a unit to manage actualy is...


I forgot about the tautology of systemd.  systemd is monolithic.


systemd is certainly not monolithic for almost any definition of that 
term. I am not sure where you are taking that from, and I am not sure I 
want to discuss on that level. This just sounds like FUD you picked up 
somewhere and are repeating carelessly...



But that's not my point.  It seems pretty easy to make this cgroup
management (in "native mode") a library that can have either a thin
veneer of a main() function, while also being usable by systemd.  The
point is to solve all of the problems ONCE.  I'm trying to make the
case that systemd itself should be focusing on features and policies
and awesome APIs.


You know, getting this all right isn't easy. If you want to do things 
properly, then you need to propagate attribute changes between the units 
you manage. You also need something like a scheduler, since a number of 
controllers can only be configured under certain external conditions 
(for example: the blkio or devices controller use major/minor parameters 
for configuring per-device limits. Since major/minor assignments are 
pretty much unpredictable these days -- and users probably want to 
configure things with friendly and stable /dev/disk/by-id/* symlinks 
anyway -- this requires us to wait for devices to show up before we can 
configure the parameters.) Soo... you need a graph of units, where you 
can propagate things, and schedule things based on some execution/event 
queue. And the propagation and scheduling are closely intermingled.


Now, that's pretty much exactly what systemd actually *is*. It 
implements a graph of units with a scheduler. And if you rip that part 
out of systemd to make this an "easy cgroup management library", then 
you simply turn what systemd is into a library without leaving anything. 
Which is just bogus.


So no, if you say "seems pretty easy to make this cgroup management a 
library" then well, I have to disagree with you.



We want to run fewer, simpler things on our systems, we want to reuse as


Fewer and simpler are not compatible, unless you are losing
functionality.  Systemd is fewer, but NOT simpler.


Oh, certainly it is. If we'd split up the cgroup fs access into 
separate daemon of some kind, then we'd need some kind of IPC for that, 
and so you have more daemons and you have some complex IPC between the 
processes. So yeah, the systemd approach is certainly both simpler and 
uses fewer daemons then your hypothetical one.



much of the code as we can. You don't achieve that by running yet another
daemon that does worse what systemd can anyway do simpler, easier and
better.


Considering this is all hypothetical, I find this to be a funny
debate.  My hypothetical idea is better than your hypothetical idea.


Well, systemd is pretty real, and the code to do the unified cgroup 
management within systemd is pretty complete. systemd is certainly not 
hypothetical.



The least you could grant us is to have a look at the final APIs we will
have to offer before you already imply that systemd cannot be a valid
implementation of any API people could ever agree on.


Whoah, don't get defensive.  I said nothing of the sort.  The fact of
the matter is that we do not run systemd, at least in part because of
the monolithic nature.  That's unlikely to change in this timescale.


Oh, my. I am not sure what makes you think it is monolithic.


What I said was that it would be a shame if we had to invent our own
low-level cgroup daemon just because the "upstream" daemons was too
tightly coupled with systemd.


I have no interest to reimplement systemd as a library, just to make you 
happy... I am quite happy with what we already have



This is supposed to be collaborative, not combative.


It certainly sounds *very* differently in what you are writing.

Lennart
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read 

Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-30 Thread Michal Hocko
On Fri 28-06-13 14:01:55, Vivek Goyal wrote:
> On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
[...]
> > OK, so libcgroup's rules daemon will still work and place my tasks in
> > appropriate cgroups?
> 
> Do you use that daemon in practice?

I am not but my users do. And that is why I care.

> For user session logins, I think systemd has plans to put user
> sessions in a cgroup (kind of making pam_cgroup redundant).
> 
> Other functionality rulesengined was providing moving tasks automatically
> in a cgroup based on executable name. I think that was racy and not
> many people had liked it.

It doesn't make sense for short lived processes, all right, but it can
be useful for those that live for a long time.
 
> IIUC, systemd can't disable access to cgroupfs from other utilities.

The previous messages read otherwise. And that is why this rised the red
flag at many fronts.

> So most likely rulesengined should contine to work. But having both
> systemd and libcgroup might not make much sense though.
> 
> Thanks
> Vivek

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-30 Thread Michal Hocko
On Fri 28-06-13 14:01:55, Vivek Goyal wrote:
 On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
[...]
  OK, so libcgroup's rules daemon will still work and place my tasks in
  appropriate cgroups?
 
 Do you use that daemon in practice?

I am not but my users do. And that is why I care.

 For user session logins, I think systemd has plans to put user
 sessions in a cgroup (kind of making pam_cgroup redundant).
 
 Other functionality rulesengined was providing moving tasks automatically
 in a cgroup based on executable name. I think that was racy and not
 many people had liked it.

It doesn't make sense for short lived processes, all right, but it can
be useful for those that live for a long time.
 
 IIUC, systemd can't disable access to cgroupfs from other utilities.

The previous messages read otherwise. And that is why this rised the red
flag at many fronts.

 So most likely rulesengined should contine to work. But having both
 systemd and libcgroup might not make much sense though.
 
 Thanks
 Vivek

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-30 Thread Lennart Poettering

Heya,

On 29.06.2013 05:05, Tim Hockin wrote:

Come on, now, Lennart.  You put a lot of words in my mouth.



I for sure am not going to make the PID 1 a client of another daemon. That's
just wrong. If you have a daemon that is both conceptually the manager of
another service and the client of that other service, then that's bad design
and you will easily run into deadlocks and such. Just think about it: if you
have some external daemon for managing cgroups, and you need cgroups for
running external daemons, how are you going to start the external daemon for
managing cgroups? Sure, you can hack around this, make that daemon special,
and magic, and stuff -- or you can just not do such nonsense. There's no
reason to repeat the fuckup that cgroup became in kernelspace a second time,
but this time in userspace, with multiple manager daemons all with different
and slightly incompatible definitions what a unit to manage actualy is...


I forgot about the tautology of systemd.  systemd is monolithic.


systemd is certainly not monolithic for almost any definition of that 
term. I am not sure where you are taking that from, and I am not sure I 
want to discuss on that level. This just sounds like FUD you picked up 
somewhere and are repeating carelessly...



But that's not my point.  It seems pretty easy to make this cgroup
management (in native mode) a library that can have either a thin
veneer of a main() function, while also being usable by systemd.  The
point is to solve all of the problems ONCE.  I'm trying to make the
case that systemd itself should be focusing on features and policies
and awesome APIs.


You know, getting this all right isn't easy. If you want to do things 
properly, then you need to propagate attribute changes between the units 
you manage. You also need something like a scheduler, since a number of 
controllers can only be configured under certain external conditions 
(for example: the blkio or devices controller use major/minor parameters 
for configuring per-device limits. Since major/minor assignments are 
pretty much unpredictable these days -- and users probably want to 
configure things with friendly and stable /dev/disk/by-id/* symlinks 
anyway -- this requires us to wait for devices to show up before we can 
configure the parameters.) Soo... you need a graph of units, where you 
can propagate things, and schedule things based on some execution/event 
queue. And the propagation and scheduling are closely intermingled.


Now, that's pretty much exactly what systemd actually *is*. It 
implements a graph of units with a scheduler. And if you rip that part 
out of systemd to make this an easy cgroup management library, then 
you simply turn what systemd is into a library without leaving anything. 
Which is just bogus.


So no, if you say seems pretty easy to make this cgroup management a 
library then well, I have to disagree with you.



We want to run fewer, simpler things on our systems, we want to reuse as


Fewer and simpler are not compatible, unless you are losing
functionality.  Systemd is fewer, but NOT simpler.


Oh, certainly it is. If we'd split up the cgroup fs access into 
separate daemon of some kind, then we'd need some kind of IPC for that, 
and so you have more daemons and you have some complex IPC between the 
processes. So yeah, the systemd approach is certainly both simpler and 
uses fewer daemons then your hypothetical one.



much of the code as we can. You don't achieve that by running yet another
daemon that does worse what systemd can anyway do simpler, easier and
better.


Considering this is all hypothetical, I find this to be a funny
debate.  My hypothetical idea is better than your hypothetical idea.


Well, systemd is pretty real, and the code to do the unified cgroup 
management within systemd is pretty complete. systemd is certainly not 
hypothetical.



The least you could grant us is to have a look at the final APIs we will
have to offer before you already imply that systemd cannot be a valid
implementation of any API people could ever agree on.


Whoah, don't get defensive.  I said nothing of the sort.  The fact of
the matter is that we do not run systemd, at least in part because of
the monolithic nature.  That's unlikely to change in this timescale.


Oh, my. I am not sure what makes you think it is monolithic.


What I said was that it would be a shame if we had to invent our own
low-level cgroup daemon just because the upstream daemons was too
tightly coupled with systemd.


I have no interest to reimplement systemd as a library, just to make you 
happy... I am quite happy with what we already have



This is supposed to be collaborative, not combative.


It certainly sounds *very* differently in what you are writing.

Lennart
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at 

Re: cgroup: status-quo and userland efforts

2013-06-29 Thread Tejun Heo
Hello, Tim.

On Fri, Jun 28, 2013 at 11:44:23AM -0700, Tim Hockin wrote:
> I totally understand where you're coming from - trying to get back to
> a stable feature set.  But it sucks to be on the losing end of that

Oh, it has been sucking and will continue to suck like hell for me too
for the foreseeable future.  Trust me, this side ain't any greener.

> battle - you're cutting things that REALLY matter to us, and without a
> really viable alternative.  So we'll keep fighting.

Yeah, that's understandable.  More on this later.

> Splitting threads is sort of important for some cgroups, like CPU.  I
> wonder if pjt is paying attention to this thread.

Paul?

> I think this is wrong.  Take the opportunity to define the RIGHT
> interface that you WANT - a container.  Implement it in terms of
> cgroups (and maybe other stuff!).  Make that API so compelling that
> people want to use it, and your war of attrition on direct cgroup
> madness will be won, but with net progress rather than regress.

The goal is to reach sane and widely useable / useful state with
minimum amount of complexity.  Maintaining backward compatibility for
some period - likely quite a few years - while still allowing future
development is a pretty important consideration.  Another factor is
that the general situation has been more or less atrocious and cgroup
as a whole has been failing in the very basic places, which also
reinforces the drive for simplicity.

I probably am forgetting some, but anyways, from my POV, there are
fairly strong by-default factors which push for simplicity even if
that means some loss of functionalities as long as those aren't
something catastrophic.  I've been going over the decisions past few
days and unified hierarchy still seems the best, or rather, most
acceptable solution.

That said, I stil don't know very well the scope and severity of the
problems you guys might face from the loss of multiple orthogonal
hierarchies.  The cpuset one wasn't very convincing especially given
that most of expressibility problems can be mitigated if you presume
the central managing facility which can adapt the configurations as
the workload changes.  Dynamic execution of configuration of course is
the job of cgroup proper but larger cadence changes doesn't have to be
statically encoded in the hierarchy itself and as I wrote before some
just can't be whether multiple hierarchy or not.

While the bar to overcome is pretty high, I do want to learn about the
problems you guys are foreseeing, so that I can at least evaulate the
graveness properly and hopefully compromises which can mitigate the
most sore ones can be made wherever necessary.

So, can you please explain the issues that you've experienced and are
foreseeing in detail with their contexts?  ie. if you have certain
requirement, please give at least brief explanation on where such
requirement is coming from and how important the requirement is.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-29 Thread Tejun Heo
Hello, Tim.

On Fri, Jun 28, 2013 at 11:44:23AM -0700, Tim Hockin wrote:
 I totally understand where you're coming from - trying to get back to
 a stable feature set.  But it sucks to be on the losing end of that

Oh, it has been sucking and will continue to suck like hell for me too
for the foreseeable future.  Trust me, this side ain't any greener.

 battle - you're cutting things that REALLY matter to us, and without a
 really viable alternative.  So we'll keep fighting.

Yeah, that's understandable.  More on this later.

 Splitting threads is sort of important for some cgroups, like CPU.  I
 wonder if pjt is paying attention to this thread.

Paul?

 I think this is wrong.  Take the opportunity to define the RIGHT
 interface that you WANT - a container.  Implement it in terms of
 cgroups (and maybe other stuff!).  Make that API so compelling that
 people want to use it, and your war of attrition on direct cgroup
 madness will be won, but with net progress rather than regress.

The goal is to reach sane and widely useable / useful state with
minimum amount of complexity.  Maintaining backward compatibility for
some period - likely quite a few years - while still allowing future
development is a pretty important consideration.  Another factor is
that the general situation has been more or less atrocious and cgroup
as a whole has been failing in the very basic places, which also
reinforces the drive for simplicity.

I probably am forgetting some, but anyways, from my POV, there are
fairly strong by-default factors which push for simplicity even if
that means some loss of functionalities as long as those aren't
something catastrophic.  I've been going over the decisions past few
days and unified hierarchy still seems the best, or rather, most
acceptable solution.

That said, I stil don't know very well the scope and severity of the
problems you guys might face from the loss of multiple orthogonal
hierarchies.  The cpuset one wasn't very convincing especially given
that most of expressibility problems can be mitigated if you presume
the central managing facility which can adapt the configurations as
the workload changes.  Dynamic execution of configuration of course is
the job of cgroup proper but larger cadence changes doesn't have to be
statically encoded in the hierarchy itself and as I wrote before some
just can't be whether multiple hierarchy or not.

While the bar to overcome is pretty high, I do want to learn about the
problems you guys are foreseeing, so that I can at least evaulate the
graveness properly and hopefully compromises which can mitigate the
most sore ones can be made wherever necessary.

So, can you please explain the issues that you've experienced and are
foreseeing in detail with their contexts?  ie. if you have certain
requirement, please give at least brief explanation on where such
requirement is coming from and how important the requirement is.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Tim Hockin
Come on, now, Lennart.  You put a lot of words in my mouth.

On Fri, Jun 28, 2013 at 6:48 PM, Lennart Poettering  wrote:
> On 28.06.2013 20:53, Tim Hockin wrote:
>
>> a single-agent, we should make a kick-ass implementation that is
>> flexible and scalable, and full-featured enough to not require
>> divergence at the lowest layer of the stack.  Then build systemd on
>> top of that. Let systemd offer more features and policies and
>> "semantic" APIs.
>
>
> Well, what if systemd is already kick-ass? I mean, if you have a problem
> with systemd, then that's your own problem, but I really don't think why I
> should bother?

I didn't say it wasn't.  I said that we can build a common substrate
that systemd can build on *and* non-systemd systems can use *and*
Google can participate in.

> I for sure am not going to make the PID 1 a client of another daemon. That's
> just wrong. If you have a daemon that is both conceptually the manager of
> another service and the client of that other service, then that's bad design
> and you will easily run into deadlocks and such. Just think about it: if you
> have some external daemon for managing cgroups, and you need cgroups for
> running external daemons, how are you going to start the external daemon for
> managing cgroups? Sure, you can hack around this, make that daemon special,
> and magic, and stuff -- or you can just not do such nonsense. There's no
> reason to repeat the fuckup that cgroup became in kernelspace a second time,
> but this time in userspace, with multiple manager daemons all with different
> and slightly incompatible definitions what a unit to manage actualy is...

I forgot about the tautology of systemd.  systemd is monolithic.
Therefore it can not have any external dependencies.  Therefore it
must absorb anything it depends on.  Therefore systemd continues to
grow in size and scope.  Up next: systemd manages your X sessions!

But that's not my point.  It seems pretty easy to make this cgroup
management (in "native mode") a library that can have either a thin
veneer of a main() function, while also being usable by systemd.  The
point is to solve all of the problems ONCE.  I'm trying to make the
case that systemd itself should be focusing on features and policies
and awesome APIs.

> We want to run fewer, simpler things on our systems, we want to reuse as

Fewer and simpler are not compatible, unless you are losing
functionality.  Systemd is fewer, but NOT simpler.

> much of the code as we can. You don't achieve that by running yet another
> daemon that does worse what systemd can anyway do simpler, easier and
> better.

Considering this is all hypothetical, I find this to be a funny
debate.  My hypothetical idea is better than your hypothetical idea.

> The least you could grant us is to have a look at the final APIs we will
> have to offer before you already imply that systemd cannot be a valid
> implementation of any API people could ever agree on.

Whoah, don't get defensive.  I said nothing of the sort.  The fact of
the matter is that we do not run systemd, at least in part because of
the monolithic nature.  That's unlikely to change in this timescale.
What I said was that it would be a shame if we had to invent our own
low-level cgroup daemon just because the "upstream" daemons was too
tightly coupled with systemd.

I think we have a lot of experience to offer to this project, and a
vested interest in seeing it done well.  But if it is purely
targetting systemd, we have little incentive to devote resources to
it.

Please note that I am strictly talking about the lowest layer of the
API.  Just the thing that guards cgroupfs against mere mortals.  The
higher layers - where abstractions exist, that are actually USEFUL to
end users - are not really in scope right now.  We already have our
own higher level APIs.

This is supposed to be collaborative, not combative.

Tim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Lennart Poettering

On 28.06.2013 20:53, Tim Hockin wrote:


a single-agent, we should make a kick-ass implementation that is
flexible and scalable, and full-featured enough to not require
divergence at the lowest layer of the stack.  Then build systemd on
top of that. Let systemd offer more features and policies and
"semantic" APIs.


Well, what if systemd is already kick-ass? I mean, if you have a problem 
with systemd, then that's your own problem, but I really don't think why 
I should bother?


I for sure am not going to make the PID 1 a client of another daemon. 
That's just wrong. If you have a daemon that is both conceptually the 
manager of another service and the client of that other service, then 
that's bad design and you will easily run into deadlocks and such. Just 
think about it: if you have some external daemon for managing cgroups, 
and you need cgroups for running external daemons, how are you going to 
start the external daemon for managing cgroups? Sure, you can hack 
around this, make that daemon special, and magic, and stuff -- or you 
can just not do such nonsense. There's no reason to repeat the fuckup 
that cgroup became in kernelspace a second time, but this time in 
userspace, with multiple manager daemons all with different and slightly 
incompatible definitions what a unit to manage actualy is...


We want to run fewer, simpler things on our systems, we want to reuse as 
much of the code as we can. You don't achieve that by running yet 
another daemon that does worse what systemd can anyway do simpler, 
easier and better.


The least you could grant us is to have a look at the final APIs we will 
have to offer before you already imply that systemd cannot be a valid 
implementation of any API people could ever agree on.


Lennart
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Tejun Heo
On Fri, Jun 28, 2013 at 05:40:53PM -0500, Serge Hallyn wrote:
> > The kernel can exposed a knob that would allow systemd to lock that
> > down
> 
> Gah - why would you give him that idea?  :)

That's one of the ideas I had from the beginning.

> But yes, I'd sort of assume that was coming, eventually.

But I think we'll probably settle with a mechanism to find out whether
someone else is touching the hierarchy, which will be generally useful
for other consumers of cgroup too.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Serge Hallyn
Quoting Daniel P. Berrange (berra...@redhat.com):
> On Fri, Jun 28, 2013 at 02:01:55PM -0400, Vivek Goyal wrote:
> > On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> > > On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> > > > Hello, Mike.
> > > > 
> > > > On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > > > > I always thought that was a very cool feature, mkdir+echo, poof done.
> > > > > Now maybe that interface is suboptimal for serious usage, but it makes
> > > > > the things usable via dirt simple scripts, very flexible, nice.
> > > > 
> > > > Oh, that in itself is not bad.  I mean, if you're root, it's pretty
> > > > easy to play with and that part is fine.  But combined with the
> > > > hierarchical nature of cgroup and file permissions, it encourages
> > > > people to "deligate" subdirectories to less previledged domains,
> > > 
> > > OK, this really depends on what you expose to non-root users. I have
> > > seen use cases where admin prepares top-level which is root-only but
> > > it allows creating sub-groups which are under _full_ control of the
> > > subdomain. This worked nicely for memcg for example because hard limit,
> > > oom handling and other knobs are hierarchical so the subdomain cannot
> > > overwrite what admin has said.
> > > 
> > > > which
> > > > in turn leads to normal binaries to manipulate them directly, which is
> > > > where the horror begins.  We end up exposing control knobs which are
> > > > tightly coupled to kernel implementation details right into lay
> > > > binaries and scripts directly used by end users.
> > > >
> > > > I think this is the first time this happened, which is probably why
> > > > nobody really noticed the mess earlier.
> > > > 
> > > > Anyways, if you're root, you can keep doing whatever you want.
> > > 
> > > OK, so libcgroup's rules daemon will still work and place my tasks in
> > > appropriate cgroups?
> > 
> > Do you use that daemon in practice? For user session logins, I think
> > systemd has plans to put user sessions in a cgroup (kind of making
> > pam_cgroup redundant). 
> > 
> > Other functionality rulesengined was providing moving tasks automatically
> > in a cgroup based on executable name. I think that was racy and not
> > many people had liked it.
> 
> Regardless of the changes being proposed, IMHO, the cgrulesd should
> never be used. It is just outright dangerous for a daemon to be
> arbitrarily re-arranging what cgroups a process is placed in without
> the applications being aware of it. It can only be safely used in a
> scenario where cgroups are exclusively used by the administrator,
> and never used by applications for their own needs.

Even then it's not safe, since if the program quickly forks or clones a
few times, you can end up with some of the tasks being reclassified
and some not.

> > IIUC, systemd can't disable access to cgroupfs from other utilities.
> 
> The kernel can exposed a knob that would allow systemd to lock that
> down

Gah - why would you give him that idea?  :)

But yes, I'd sort of assume that was coming, eventually.

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Daniel P. Berrange
On Fri, Jun 28, 2013 at 02:01:55PM -0400, Vivek Goyal wrote:
> On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> > On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> > > Hello, Mike.
> > > 
> > > On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > > > I always thought that was a very cool feature, mkdir+echo, poof done.
> > > > Now maybe that interface is suboptimal for serious usage, but it makes
> > > > the things usable via dirt simple scripts, very flexible, nice.
> > > 
> > > Oh, that in itself is not bad.  I mean, if you're root, it's pretty
> > > easy to play with and that part is fine.  But combined with the
> > > hierarchical nature of cgroup and file permissions, it encourages
> > > people to "deligate" subdirectories to less previledged domains,
> > 
> > OK, this really depends on what you expose to non-root users. I have
> > seen use cases where admin prepares top-level which is root-only but
> > it allows creating sub-groups which are under _full_ control of the
> > subdomain. This worked nicely for memcg for example because hard limit,
> > oom handling and other knobs are hierarchical so the subdomain cannot
> > overwrite what admin has said.
> > 
> > > which
> > > in turn leads to normal binaries to manipulate them directly, which is
> > > where the horror begins.  We end up exposing control knobs which are
> > > tightly coupled to kernel implementation details right into lay
> > > binaries and scripts directly used by end users.
> > >
> > > I think this is the first time this happened, which is probably why
> > > nobody really noticed the mess earlier.
> > > 
> > > Anyways, if you're root, you can keep doing whatever you want.
> > 
> > OK, so libcgroup's rules daemon will still work and place my tasks in
> > appropriate cgroups?
> 
> Do you use that daemon in practice? For user session logins, I think
> systemd has plans to put user sessions in a cgroup (kind of making
> pam_cgroup redundant). 
> 
> Other functionality rulesengined was providing moving tasks automatically
> in a cgroup based on executable name. I think that was racy and not
> many people had liked it.

Regardless of the changes being proposed, IMHO, the cgrulesd should
never be used. It is just outright dangerous for a daemon to be
arbitrarily re-arranging what cgroups a process is placed in without
the applications being aware of it. It can only be safely used in a
scenario where cgroups are exclusively used by the administrator,
and never used by applications for their own needs.

> IIUC, systemd can't disable access to cgroupfs from other utilities.

The kernel can exposed a knob that would allow systemd to lock that
down

> So most likely rulesengined should contine to work. But having both
> systemd and libcgroup might not make much sense though.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Serge Hallyn
Quoting Andy Lutomirski (l...@amacapital.net):
> On 06/27/2013 11:01 AM, Tejun Heo wrote:
> > AFAICS, having a userland agent which has overall knowledge of the
> > hierarchy and enforcesf structure and limiations is a requirement to
> > make cgroup generally useable and useful.  For systemd based systems,
> > systemd serving that role isn't too crazy.  It's sure gonna have
> > teeting issues at the beginning but it has all the necessary
> > information to manage workloads on the system.
> > 
> > A valid issue is interoperability between systemd and non-systemd
> > systems.  I don't have an immediately good answer for that.  I wrote
> > in another reply but making cgroup generally available is a pretty new
> > effort and we're still in the process of figuring out what the right
> > constructs and abstractions are.  Hopefully, we'll be able to reach a
> > common set of abstractions to base things on top in itme.
> > 
> 
> The systemd stuff will break my code, too (although the single hierarchy
> by itself won't, I think).  I think that the kernel should make whatever
> simple changes are needed so that systemd can function without using
> cgroups at all.  That way users of a different cgroup scheme can turn
> off systemd's.
> 
> Here was my proposal, which hasn't gotten a clear reply:
> 
> http://article.gmane.org/gmane.comp.sysutils.systemd.devel/11424

Neat.  I like that proposal.

> I've already sent a patch to make /proc//task//children
> available regardless of configuration.

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Andy Lutomirski
On 06/27/2013 11:01 AM, Tejun Heo wrote:
> AFAICS, having a userland agent which has overall knowledge of the
> hierarchy and enforcesf structure and limiations is a requirement to
> make cgroup generally useable and useful.  For systemd based systems,
> systemd serving that role isn't too crazy.  It's sure gonna have
> teeting issues at the beginning but it has all the necessary
> information to manage workloads on the system.
> 
> A valid issue is interoperability between systemd and non-systemd
> systems.  I don't have an immediately good answer for that.  I wrote
> in another reply but making cgroup generally available is a pretty new
> effort and we're still in the process of figuring out what the right
> constructs and abstractions are.  Hopefully, we'll be able to reach a
> common set of abstractions to base things on top in itme.
> 

The systemd stuff will break my code, too (although the single hierarchy
by itself won't, I think).  I think that the kernel should make whatever
simple changes are needed so that systemd can function without using
cgroups at all.  That way users of a different cgroup scheme can turn
off systemd's.

Here was my proposal, which hasn't gotten a clear reply:

http://article.gmane.org/gmane.comp.sysutils.systemd.devel/11424

I've already sent a patch to make /proc//task//children
available regardless of configuration.

--Andy


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Tim Hockin
On Fri, Jun 28, 2013 at 8:53 AM, Serge Hallyn  wrote:
> Quoting Daniel P. Berrange (berra...@redhat.com):

>> Are you also planning to actually write a new cgroup parent manager
>> daemon too ? Currently my plan for libvirt is to just talk directly
>
> I'm toying with the idea, yes.  (Right now my toy runs in either native
> mode, using cgroupfs, or child mode, talking to a parent manager)  I'd
> love if someone else does it, but it needs to be done.
>
> As I've said elsewhere in the thread, I see 2 problems to be addressed:
>
> 1. The ability to nest the cgroup manager daemons, so that a daemon
> running in a container can talk to a daemon running on the host.  This
> is the problem my current toy is aiming to address.  But the API it
> exports is just a thin layer over cgroupfs.
>
> 2. Abstract away the kernel/cgroupfs details so that userspace can
> explain its cgroup needs generically.  This is IIUC what systemd is
> addressing with slices and scopes.
>
> (2) is where I'd really like to have a well thought out, community
> designed API that everyone can agree on, and it might be worth getting
> together (with Tejun) at plumbers or something to lay something out.

We're also working on (2) (well, we HAVE it, but we're dis-integrating
it so we can hopefully publish more widely).  But our (2) depends on
direct cgroupfs access.  If that is to change, we need a really robust
(1).  It's OK (desireable, in fact) that (1) be a very thin layer of
abstraction.

> In the end, something like libvirt or lxc should not need to care
> what is running underneat it.  It should be able to make its requests
> the same way regardless of whether it running in fedora or ubuntu,
> and whether it is running on the host or in a tightly bound container.
> That's my goal anyway :)
>
>> to systemd's new DBus APIs for all management of cgroups, and then
>> fall back to writing to cgroupfs directly for cases where systemd
>> is not around.  Having a library to abstract these two possible
>> alternatives isn't all that compelling unless we think there will
>> be multiple cgroups manager daemons. I've been somewhat assuming that
>> even Ubuntu will eventually see the benefits & switch to systemd,
>
> So far I've seen no indication of that :)
>
> If the systemd code to manage slices could be made separately
> compileable as a standalone library or daemon, then I'd advocate
> using that.  But I don't see a lot of incentive for systemd to do
> that, so I'd feel like a heel even asking.

I want to say "let the best API win", but I know that systemd is a
giant katamari ball, and it's absorbing subsystems so it may win by
default.  That isn't going to stop us from trying to do what we do,
and share that with the world.

>> then the issue of multiple manager daemons wouldn't really exist.
>
> True.  But I'm running under the assumption that Ubuntu will stick with
> upstart, and therefore yes I'll need a separate (perhaps pair of)
> management daemons.
>
> Even if we were to switch to systemd, I'd like the API for userspace
> programs to configure and use cgroups to be as generic as possible,
> so that anyone who wanted to write their own daemon could do so.
>
> -serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Tim Hockin
On Fri, Jun 28, 2013 at 8:05 AM, Michal Hocko  wrote:
> On Thu 27-06-13 22:01:38, Tejun Heo wrote:

>> Oh, that in itself is not bad.  I mean, if you're root, it's pretty
>> easy to play with and that part is fine.  But combined with the
>> hierarchical nature of cgroup and file permissions, it encourages
>> people to "deligate" subdirectories to less previledged domains,
>
> OK, this really depends on what you expose to non-root users. I have
> seen use cases where admin prepares top-level which is root-only but
> it allows creating sub-groups which are under _full_ control of the
> subdomain. This worked nicely for memcg for example because hard limit,
> oom handling and other knobs are hierarchical so the subdomain cannot
> overwrite what admin has said.

bingo

> And the systemd, with its history of eating projects and not caring much
> about their previous users who are not willing to jump in to the systemd
> car, doesn't sound like a good place where to place the new interface to
> me.

+1

If systemd is the only upstream implementation of this single-agent
idea, we will have to invent our own, and continue to diverge rather
than converge.  I think that, if we are going to pursue this model of
a single-agent, we should make a kick-ass implementation that is
flexible and scalable, and full-featured enough to not require
divergence at the lowest layer of the stack.  Then build systemd on
top of that. Let systemd offer more features and policies and
"semantic" APIs.

We will build our own semantic APIs that are, necessarily, different
from systemd.  But we can all use the same low-level mechanism.

Tim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Tim Hockin
On Thu, Jun 27, 2013 at 2:04 PM, Tejun Heo  wrote:
> Hello,
>
> On Thu, Jun 27, 2013 at 01:46:18PM -0700, Tim Hockin wrote:
>> So what you're saying is that you don't care that this new thing is
>> less capable than the old thing, despite it having real impact.
>
> Sort of.  I'm saying, at least up until now, moving away from
> orthogonal hierarchy support seems to be the right trade-off.  It all
> depends on how you measure how much things are simplified and how
> heavy the "real impacts" are.  It's not like these things can be
> determined white and black.  Given the current situation, I think it's
> the right call.

I totally understand where you're coming from - trying to get back to
a stable feature set.  But it sucks to be on the losing end of that
battle - you're cutting things that REALLY matter to us, and without a
really viable alternative.  So we'll keep fighting.

>> If controller C is enabled at level X but disabled at level X/Y, does
>> that mean that X/Y uses the limits set in X?  How about X/Y/Z?
>
> Y and Y/Z wouldn't make any difference.  Tasks belonging to them would
> behave as if they belong to X as far as C is concerened.

OK, that *sounds* sane.  It doesn't solve all our problems, but it
alleviates some of them.

>> So take away some of the flexibility that has minimal impact and
>> maximum return.  Splitting threads across cgroups - we use it, but we
>> could get off that.  Force all-or-nothing joining of an aggregate
>
> Please do so.

Splitting threads is sort of important for some cgroups, like CPU.  I
wonder if pjt is paying attention to this thread.

>> construct (a container vs N cgroups).
>>
>> But perform surgery with a scalpel, not a hatchet.
>
> As anything else, it's drawing a line in a continuous spectrum of
> grey.  Right now, given that maintaining multiple orthogonal
> hierarchies while introducing a proper concept of resource container
> involves addition of completely new constructs and complexity, I don't
> think that's a good option.  If there are problems which can't be
> resolved / worked around in a reasonable manner, please bring them up
> along with their contexts.  Let's examine them and see whether there
> are other ways to accomodate them.

You're arguing that the abstraction you want is that of a "container"
but that it's easier to remove options than to actually build a better
API.

I think this is wrong.  Take the opportunity to define the RIGHT
interface that you WANT - a container.  Implement it in terms of
cgroups (and maybe other stuff!).  Make that API so compelling that
people want to use it, and your war of attrition on direct cgroup
madness will be won, but with net progress rather than regress.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Tejun Heo
Hello, Michal.

On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> OK, this really depends on what you expose to non-root users. I have
> seen use cases where admin prepares top-level which is root-only but
> it allows creating sub-groups which are under _full_ control of the
> subdomain. This worked nicely for memcg for example because hard limit,
> oom handling and other knobs are hierarchical so the subdomain cannot
> overwrite what admin has said.

Some knobs are safer than others and memcg probably has it easy as it
doesn't implement proportional control.  But, even then, there's a
huge chasm between cgroup knobs and proper kernel API visible to
normal programs.  Just imagine exposing memcg features by extending
rlimits.  It'll take months if not a couple years ironing out the API
details and going through review process, and rightfully so, these
things, once published and made widely available, can't be taken back.
Now compare that to how we decide what knobs to expose in cgroup.  I
mean, you even recently suggested flipping the default polarity of
soft limit knob.

cgroup's interface standard is very low.  It's probably a notch higher
than boot params but about at the same level as sysctl knobs.  It
isn't necessarily a bad thing as it allows us to rapidly explore
various options and expose useable things in a very agile manner, but
we should be very aware of how widely the interface is exposed;
otherwise, we'd be exposing features and leaking kernel implementation
details directly into userland programs without going through proper
review process or buliding consensus, which, in the long term, is
gonna be much worse than not having the feature exposed at all.

"It works for special cases XXX and YYY" is a very poor and extremely
short-sighted argument when the whole approach is breaching the very
fundamentals of kernel API conventions.

In addition, I really don't think cgroup is the right interface to
directly expose to individual programs.  As a management thing, it
does make some sense but kernel API already has its, at times ancient
but, generally working hierarchy and inheritance rules and conventions
and primitive resource control contructs - nice, ionice, rlimits and
so on.  If exposing cgroup-level resource control directly to
individual applications proves to be beneficial enough, what we should
do is extending those things.  The backend sure can be supported by
cgroups but this mkdiring and echoing things with separate hierarchy
from the usual process hierarchy isn't something which should be
visible to individual applications.

Currently, I'm not convinced that this is something which should be
exposed to individual applications, but I sure can be wrong.  But,
right now, let's first get the existing part settled.  We can worry
about the rest later.

Also, in light of the rather sneaky subversion happened with cgroup
filesystem interface, I wonder whether we need to add some sort of
generic warning mechanism which warns when permissions of pseudo file
systems like cgroupfs are delegated to lesser security domains.  In
itself, it could be harmless but it can serves as a useful beacon.
Not sure to what extent or how tho.

> OK, so libcgroup's rules daemon will still work and place my tasks in
> appropriate cgroups?

You have two competing managers of the same hierarchy.  There are ways
to make them not interfere with each other too much but ultimately
it's gonna be something clunky.  That said, libcgroup itself is pretty
clunky, so maybe you'll be okay with it.  I don't know.

> This is not quite in par with "libcgroup is dead and others have to
> migrate to systemd as well" statements from the link posted earlier.
> I really do not think that _any_ central agent will understand my
> requirements and needs so I need a way to talk to cgroupfs somehow - I
> have used libcgroups so far but touching cgroupfs is quite convinient
> as well.

As a developer who knows what's going on, I don't think it'd be too
difficult to meddle with things manually with or without the central
manager.  It'll complain that someone else is meddling with the cgroup
hierarchy and some functionalities might not work as expected, but I
don't think it'll lock you out.

At the same time, while us, the developers, having the level of
latitude required to do our work is necessary, that shouldn't be the
overruling focal point of the design of the whole system.  It's
something to be used and supporting the actual use cases should be the
priority.  I'm not saying developer convenience is not important but
that it's not the only thing which matters.  The way I see it, cgroup
has basically been a playground for devs going wild without too much,
if any, thought on how it'll actually be useable and useful to wider
audience, so let's please adjust our priorities a bit.

And, no, I don't believe that the use cases are so wildly different
that we can't have a capable enough central manager.  That's usually a
symptom of 

Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Vivek Goyal
On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> > Hello, Mike.
> > 
> > On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > > I always thought that was a very cool feature, mkdir+echo, poof done.
> > > Now maybe that interface is suboptimal for serious usage, but it makes
> > > the things usable via dirt simple scripts, very flexible, nice.
> > 
> > Oh, that in itself is not bad.  I mean, if you're root, it's pretty
> > easy to play with and that part is fine.  But combined with the
> > hierarchical nature of cgroup and file permissions, it encourages
> > people to "deligate" subdirectories to less previledged domains,
> 
> OK, this really depends on what you expose to non-root users. I have
> seen use cases where admin prepares top-level which is root-only but
> it allows creating sub-groups which are under _full_ control of the
> subdomain. This worked nicely for memcg for example because hard limit,
> oom handling and other knobs are hierarchical so the subdomain cannot
> overwrite what admin has said.
> 
> > which
> > in turn leads to normal binaries to manipulate them directly, which is
> > where the horror begins.  We end up exposing control knobs which are
> > tightly coupled to kernel implementation details right into lay
> > binaries and scripts directly used by end users.
> >
> > I think this is the first time this happened, which is probably why
> > nobody really noticed the mess earlier.
> > 
> > Anyways, if you're root, you can keep doing whatever you want.
> 
> OK, so libcgroup's rules daemon will still work and place my tasks in
> appropriate cgroups?

Do you use that daemon in practice? For user session logins, I think
systemd has plans to put user sessions in a cgroup (kind of making
pam_cgroup redundant). 

Other functionality rulesengined was providing moving tasks automatically
in a cgroup based on executable name. I think that was racy and not
many people had liked it.

IIUC, systemd can't disable access to cgroupfs from other utilities.
So most likely rulesengined should contine to work. But having both
systemd and libcgroup might not make much sense though.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Serge Hallyn
Quoting Daniel P. Berrange (berra...@redhat.com):
> On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
> > FWIW, the code is too embarassing yet to see daylight, but I'm playing
> > with a very lowlevel cgroup manager which supports nesting itself.
> > Access in this POC is low-level ("set freezer.state to THAWED for cgroup
> > /c1/c2", "Create /c3"), but the key feature is that it can run in two
> > modes - native mode in which it uses cgroupfs, and child mode where it
> > talks to a parent manager to make the changes.
> > 
> > So then the idea would be that userspace (like libvirt and lxc) would
> > talk over /dev/cgroup to its manager.  Userspace inside a container
> > (which can't actually mount cgroups itself) would talk to its own
> > manager which is talking over a passed-in socket to the host manager,
> > which in turn runs natively (uses cgroupfs, and nests "create /c1" under
> > the requestor's cgroup).
> > 
> > At some point (probably soon) we might want to talk about a standard API
> > for these things.  However I think it will have to come in the form of
> > a standard library, which knows to either send requests over dbus to
> > systemd, or over /dev/cgroup sock to the manager.
> 
> Are you also planning to actually write a new cgroup parent manager
> daemon too ? Currently my plan for libvirt is to just talk directly

I'm toying with the idea, yes.  (Right now my toy runs in either native
mode, using cgroupfs, or child mode, talking to a parent manager)  I'd
love if someone else does it, but it needs to be done.

As I've said elsewhere in the thread, I see 2 problems to be addressed:

1. The ability to nest the cgroup manager daemons, so that a daemon
running in a container can talk to a daemon running on the host.  This
is the problem my current toy is aiming to address.  But the API it
exports is just a thin layer over cgroupfs.

2. Abstract away the kernel/cgroupfs details so that userspace can
explain its cgroup needs generically.  This is IIUC what systemd is
addressing with slices and scopes.

(2) is where I'd really like to have a well thought out, community
designed API that everyone can agree on, and it might be worth getting
together (with Tejun) at plumbers or something to lay something out.

In the end, something like libvirt or lxc should not need to care
what is running underneat it.  It should be able to make its requests
the same way regardless of whether it running in fedora or ubuntu,
and whether it is running on the host or in a tightly bound container.
That's my goal anyway :)

> to systemd's new DBus APIs for all management of cgroups, and then
> fall back to writing to cgroupfs directly for cases where systemd
> is not around.  Having a library to abstract these two possible
> alternatives isn't all that compelling unless we think there will
> be multiple cgroups manager daemons. I've been somewhat assuming that
> even Ubuntu will eventually see the benefits & switch to systemd,

So far I've seen no indication of that :)

If the systemd code to manage slices could be made separately
compileable as a standalone library or daemon, then I'd advocate
using that.  But I don't see a lot of incentive for systemd to do
that, so I'd feel like a heel even asking.

> then the issue of multiple manager daemons wouldn't really exist.

True.  But I'm running under the assumption that Ubuntu will stick with
upstart, and therefore yes I'll need a separate (perhaps pair of)
management daemons.

Even if we were to switch to systemd, I'd like the API for userspace
programs to configure and use cgroups to be as generic as possible,
so that anyone who wanted to write their own daemon could do so.

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Michal Hocko
On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> Hello, Mike.
> 
> On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > I always thought that was a very cool feature, mkdir+echo, poof done.
> > Now maybe that interface is suboptimal for serious usage, but it makes
> > the things usable via dirt simple scripts, very flexible, nice.
> 
> Oh, that in itself is not bad.  I mean, if you're root, it's pretty
> easy to play with and that part is fine.  But combined with the
> hierarchical nature of cgroup and file permissions, it encourages
> people to "deligate" subdirectories to less previledged domains,

OK, this really depends on what you expose to non-root users. I have
seen use cases where admin prepares top-level which is root-only but
it allows creating sub-groups which are under _full_ control of the
subdomain. This worked nicely for memcg for example because hard limit,
oom handling and other knobs are hierarchical so the subdomain cannot
overwrite what admin has said.

> which
> in turn leads to normal binaries to manipulate them directly, which is
> where the horror begins.  We end up exposing control knobs which are
> tightly coupled to kernel implementation details right into lay
> binaries and scripts directly used by end users.
>
> I think this is the first time this happened, which is probably why
> nobody really noticed the mess earlier.
> 
> Anyways, if you're root, you can keep doing whatever you want.

OK, so libcgroup's rules daemon will still work and place my tasks in
appropriate cgroups?

This is not quite in par with "libcgroup is dead and others have to
migrate to systemd as well" statements from the link posted earlier.
I really do not think that _any_ central agent will understand my
requirements and needs so I need a way to talk to cgroupfs somehow - I
have used libcgroups so far but touching cgroupfs is quite convinient
as well.

And the systemd, with its history of eating projects and not caring much
about their previous users who are not willing to jump in to the systemd
car, doesn't sound like a good place where to place the new interface to
me.

[...]
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Daniel P. Berrange
On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
> FWIW, the code is too embarassing yet to see daylight, but I'm playing
> with a very lowlevel cgroup manager which supports nesting itself.
> Access in this POC is low-level ("set freezer.state to THAWED for cgroup
> /c1/c2", "Create /c3"), but the key feature is that it can run in two
> modes - native mode in which it uses cgroupfs, and child mode where it
> talks to a parent manager to make the changes.
> 
> So then the idea would be that userspace (like libvirt and lxc) would
> talk over /dev/cgroup to its manager.  Userspace inside a container
> (which can't actually mount cgroups itself) would talk to its own
> manager which is talking over a passed-in socket to the host manager,
> which in turn runs natively (uses cgroupfs, and nests "create /c1" under
> the requestor's cgroup).
> 
> At some point (probably soon) we might want to talk about a standard API
> for these things.  However I think it will have to come in the form of
> a standard library, which knows to either send requests over dbus to
> systemd, or over /dev/cgroup sock to the manager.

Are you also planning to actually write a new cgroup parent manager
daemon too ? Currently my plan for libvirt is to just talk directly
to systemd's new DBus APIs for all management of cgroups, and then
fall back to writing to cgroupfs directly for cases where systemd
is not around.  Having a library to abstract these two possible
alternatives isn't all that compelling unless we think there will
be multiple cgroups manager daemons. I've been somewhat assuming that
even Ubuntu will eventually see the benefits & switch to systemd,
then the issue of multiple manager daemons wouldn't really exist.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Mike Galbraith
On Thu, 2013-06-27 at 22:01 -0700, Tejun Heo wrote:

> Anyways, if you're root, you can keep doing whatever you want.  You
> could be stepping on the centralized agent's toes a bit and vice-versa

Keep on truckn' sounds good, that vice-versa toe stomping not so good,
but yeah, until systemd or ilk grows the ability to shut me down, I
shouldn't feel any burning need to introduce it to my machete.

> but I don't think that's gonna be disastrous.  What I'm trying to
> stamp out is direct usages from !root domains and !system-management
> binaries / scripts.  They absolutely have to go.  There's no question
> about it and I'll take totalitarian userland agent anyday over the
> current mess.

I get some of the why.. and yeah, it's the dirt simple usage that I care
about most, not the big hairy problem cases you're trying to address. 

> Eventually, I think we'll be able to reach an equilibrium where most
> things are reasonable and we'll be exploring the acceptable limits of
> flexibility again, but right now, please bear with the brutality.
> We're way over the line and I can't see a way back which isn't gonna
> sting a bit.  I'm and will keep trying to make it as painless as
> possible.

Keep on driving, and thanks for listening.  Aaao ;-)

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Mike Galbraith
On Thu, 2013-06-27 at 22:01 -0700, Tejun Heo wrote:

 Anyways, if you're root, you can keep doing whatever you want.  You
 could be stepping on the centralized agent's toes a bit and vice-versa

Keep on truckn' sounds good, that vice-versa toe stomping not so good,
but yeah, until systemd or ilk grows the ability to shut me down, I
shouldn't feel any burning need to introduce it to my machete.

 but I don't think that's gonna be disastrous.  What I'm trying to
 stamp out is direct usages from !root domains and !system-management
 binaries / scripts.  They absolutely have to go.  There's no question
 about it and I'll take totalitarian userland agent anyday over the
 current mess.

I get some of the why.. and yeah, it's the dirt simple usage that I care
about most, not the big hairy problem cases you're trying to address. 

 Eventually, I think we'll be able to reach an equilibrium where most
 things are reasonable and we'll be exploring the acceptable limits of
 flexibility again, but right now, please bear with the brutality.
 We're way over the line and I can't see a way back which isn't gonna
 sting a bit.  I'm and will keep trying to make it as painless as
 possible.

Keep on driving, and thanks for listening.  Aaao ;-)

-Mike

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Daniel P. Berrange
On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
 FWIW, the code is too embarassing yet to see daylight, but I'm playing
 with a very lowlevel cgroup manager which supports nesting itself.
 Access in this POC is low-level (set freezer.state to THAWED for cgroup
 /c1/c2, Create /c3), but the key feature is that it can run in two
 modes - native mode in which it uses cgroupfs, and child mode where it
 talks to a parent manager to make the changes.
 
 So then the idea would be that userspace (like libvirt and lxc) would
 talk over /dev/cgroup to its manager.  Userspace inside a container
 (which can't actually mount cgroups itself) would talk to its own
 manager which is talking over a passed-in socket to the host manager,
 which in turn runs natively (uses cgroupfs, and nests create /c1 under
 the requestor's cgroup).
 
 At some point (probably soon) we might want to talk about a standard API
 for these things.  However I think it will have to come in the form of
 a standard library, which knows to either send requests over dbus to
 systemd, or over /dev/cgroup sock to the manager.

Are you also planning to actually write a new cgroup parent manager
daemon too ? Currently my plan for libvirt is to just talk directly
to systemd's new DBus APIs for all management of cgroups, and then
fall back to writing to cgroupfs directly for cases where systemd
is not around.  Having a library to abstract these two possible
alternatives isn't all that compelling unless we think there will
be multiple cgroups manager daemons. I've been somewhat assuming that
even Ubuntu will eventually see the benefits  switch to systemd,
then the issue of multiple manager daemons wouldn't really exist.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Michal Hocko
On Thu 27-06-13 22:01:38, Tejun Heo wrote:
 Hello, Mike.
 
 On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
  I always thought that was a very cool feature, mkdir+echo, poof done.
  Now maybe that interface is suboptimal for serious usage, but it makes
  the things usable via dirt simple scripts, very flexible, nice.
 
 Oh, that in itself is not bad.  I mean, if you're root, it's pretty
 easy to play with and that part is fine.  But combined with the
 hierarchical nature of cgroup and file permissions, it encourages
 people to deligate subdirectories to less previledged domains,

OK, this really depends on what you expose to non-root users. I have
seen use cases where admin prepares top-level which is root-only but
it allows creating sub-groups which are under _full_ control of the
subdomain. This worked nicely for memcg for example because hard limit,
oom handling and other knobs are hierarchical so the subdomain cannot
overwrite what admin has said.

 which
 in turn leads to normal binaries to manipulate them directly, which is
 where the horror begins.  We end up exposing control knobs which are
 tightly coupled to kernel implementation details right into lay
 binaries and scripts directly used by end users.

 I think this is the first time this happened, which is probably why
 nobody really noticed the mess earlier.
 
 Anyways, if you're root, you can keep doing whatever you want.

OK, so libcgroup's rules daemon will still work and place my tasks in
appropriate cgroups?

This is not quite in par with libcgroup is dead and others have to
migrate to systemd as well statements from the link posted earlier.
I really do not think that _any_ central agent will understand my
requirements and needs so I need a way to talk to cgroupfs somehow - I
have used libcgroups so far but touching cgroupfs is quite convinient
as well.

And the systemd, with its history of eating projects and not caring much
about their previous users who are not willing to jump in to the systemd
car, doesn't sound like a good place where to place the new interface to
me.

[...]
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Serge Hallyn
Quoting Daniel P. Berrange (berra...@redhat.com):
 On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
  FWIW, the code is too embarassing yet to see daylight, but I'm playing
  with a very lowlevel cgroup manager which supports nesting itself.
  Access in this POC is low-level (set freezer.state to THAWED for cgroup
  /c1/c2, Create /c3), but the key feature is that it can run in two
  modes - native mode in which it uses cgroupfs, and child mode where it
  talks to a parent manager to make the changes.
  
  So then the idea would be that userspace (like libvirt and lxc) would
  talk over /dev/cgroup to its manager.  Userspace inside a container
  (which can't actually mount cgroups itself) would talk to its own
  manager which is talking over a passed-in socket to the host manager,
  which in turn runs natively (uses cgroupfs, and nests create /c1 under
  the requestor's cgroup).
  
  At some point (probably soon) we might want to talk about a standard API
  for these things.  However I think it will have to come in the form of
  a standard library, which knows to either send requests over dbus to
  systemd, or over /dev/cgroup sock to the manager.
 
 Are you also planning to actually write a new cgroup parent manager
 daemon too ? Currently my plan for libvirt is to just talk directly

I'm toying with the idea, yes.  (Right now my toy runs in either native
mode, using cgroupfs, or child mode, talking to a parent manager)  I'd
love if someone else does it, but it needs to be done.

As I've said elsewhere in the thread, I see 2 problems to be addressed:

1. The ability to nest the cgroup manager daemons, so that a daemon
running in a container can talk to a daemon running on the host.  This
is the problem my current toy is aiming to address.  But the API it
exports is just a thin layer over cgroupfs.

2. Abstract away the kernel/cgroupfs details so that userspace can
explain its cgroup needs generically.  This is IIUC what systemd is
addressing with slices and scopes.

(2) is where I'd really like to have a well thought out, community
designed API that everyone can agree on, and it might be worth getting
together (with Tejun) at plumbers or something to lay something out.

In the end, something like libvirt or lxc should not need to care
what is running underneat it.  It should be able to make its requests
the same way regardless of whether it running in fedora or ubuntu,
and whether it is running on the host or in a tightly bound container.
That's my goal anyway :)

 to systemd's new DBus APIs for all management of cgroups, and then
 fall back to writing to cgroupfs directly for cases where systemd
 is not around.  Having a library to abstract these two possible
 alternatives isn't all that compelling unless we think there will
 be multiple cgroups manager daemons. I've been somewhat assuming that
 even Ubuntu will eventually see the benefits  switch to systemd,

So far I've seen no indication of that :)

If the systemd code to manage slices could be made separately
compileable as a standalone library or daemon, then I'd advocate
using that.  But I don't see a lot of incentive for systemd to do
that, so I'd feel like a heel even asking.

 then the issue of multiple manager daemons wouldn't really exist.

True.  But I'm running under the assumption that Ubuntu will stick with
upstart, and therefore yes I'll need a separate (perhaps pair of)
management daemons.

Even if we were to switch to systemd, I'd like the API for userspace
programs to configure and use cgroups to be as generic as possible,
so that anyone who wanted to write their own daemon could do so.

-serge
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Vivek Goyal
On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
 On Thu 27-06-13 22:01:38, Tejun Heo wrote:
  Hello, Mike.
  
  On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
   I always thought that was a very cool feature, mkdir+echo, poof done.
   Now maybe that interface is suboptimal for serious usage, but it makes
   the things usable via dirt simple scripts, very flexible, nice.
  
  Oh, that in itself is not bad.  I mean, if you're root, it's pretty
  easy to play with and that part is fine.  But combined with the
  hierarchical nature of cgroup and file permissions, it encourages
  people to deligate subdirectories to less previledged domains,
 
 OK, this really depends on what you expose to non-root users. I have
 seen use cases where admin prepares top-level which is root-only but
 it allows creating sub-groups which are under _full_ control of the
 subdomain. This worked nicely for memcg for example because hard limit,
 oom handling and other knobs are hierarchical so the subdomain cannot
 overwrite what admin has said.
 
  which
  in turn leads to normal binaries to manipulate them directly, which is
  where the horror begins.  We end up exposing control knobs which are
  tightly coupled to kernel implementation details right into lay
  binaries and scripts directly used by end users.
 
  I think this is the first time this happened, which is probably why
  nobody really noticed the mess earlier.
  
  Anyways, if you're root, you can keep doing whatever you want.
 
 OK, so libcgroup's rules daemon will still work and place my tasks in
 appropriate cgroups?

Do you use that daemon in practice? For user session logins, I think
systemd has plans to put user sessions in a cgroup (kind of making
pam_cgroup redundant). 

Other functionality rulesengined was providing moving tasks automatically
in a cgroup based on executable name. I think that was racy and not
many people had liked it.

IIUC, systemd can't disable access to cgroupfs from other utilities.
So most likely rulesengined should contine to work. But having both
systemd and libcgroup might not make much sense though.

Thanks
Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Tejun Heo
Hello, Michal.

On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
 OK, this really depends on what you expose to non-root users. I have
 seen use cases where admin prepares top-level which is root-only but
 it allows creating sub-groups which are under _full_ control of the
 subdomain. This worked nicely for memcg for example because hard limit,
 oom handling and other knobs are hierarchical so the subdomain cannot
 overwrite what admin has said.

Some knobs are safer than others and memcg probably has it easy as it
doesn't implement proportional control.  But, even then, there's a
huge chasm between cgroup knobs and proper kernel API visible to
normal programs.  Just imagine exposing memcg features by extending
rlimits.  It'll take months if not a couple years ironing out the API
details and going through review process, and rightfully so, these
things, once published and made widely available, can't be taken back.
Now compare that to how we decide what knobs to expose in cgroup.  I
mean, you even recently suggested flipping the default polarity of
soft limit knob.

cgroup's interface standard is very low.  It's probably a notch higher
than boot params but about at the same level as sysctl knobs.  It
isn't necessarily a bad thing as it allows us to rapidly explore
various options and expose useable things in a very agile manner, but
we should be very aware of how widely the interface is exposed;
otherwise, we'd be exposing features and leaking kernel implementation
details directly into userland programs without going through proper
review process or buliding consensus, which, in the long term, is
gonna be much worse than not having the feature exposed at all.

It works for special cases XXX and YYY is a very poor and extremely
short-sighted argument when the whole approach is breaching the very
fundamentals of kernel API conventions.

In addition, I really don't think cgroup is the right interface to
directly expose to individual programs.  As a management thing, it
does make some sense but kernel API already has its, at times ancient
but, generally working hierarchy and inheritance rules and conventions
and primitive resource control contructs - nice, ionice, rlimits and
so on.  If exposing cgroup-level resource control directly to
individual applications proves to be beneficial enough, what we should
do is extending those things.  The backend sure can be supported by
cgroups but this mkdiring and echoing things with separate hierarchy
from the usual process hierarchy isn't something which should be
visible to individual applications.

Currently, I'm not convinced that this is something which should be
exposed to individual applications, but I sure can be wrong.  But,
right now, let's first get the existing part settled.  We can worry
about the rest later.

Also, in light of the rather sneaky subversion happened with cgroup
filesystem interface, I wonder whether we need to add some sort of
generic warning mechanism which warns when permissions of pseudo file
systems like cgroupfs are delegated to lesser security domains.  In
itself, it could be harmless but it can serves as a useful beacon.
Not sure to what extent or how tho.

 OK, so libcgroup's rules daemon will still work and place my tasks in
 appropriate cgroups?

You have two competing managers of the same hierarchy.  There are ways
to make them not interfere with each other too much but ultimately
it's gonna be something clunky.  That said, libcgroup itself is pretty
clunky, so maybe you'll be okay with it.  I don't know.

 This is not quite in par with libcgroup is dead and others have to
 migrate to systemd as well statements from the link posted earlier.
 I really do not think that _any_ central agent will understand my
 requirements and needs so I need a way to talk to cgroupfs somehow - I
 have used libcgroups so far but touching cgroupfs is quite convinient
 as well.

As a developer who knows what's going on, I don't think it'd be too
difficult to meddle with things manually with or without the central
manager.  It'll complain that someone else is meddling with the cgroup
hierarchy and some functionalities might not work as expected, but I
don't think it'll lock you out.

At the same time, while us, the developers, having the level of
latitude required to do our work is necessary, that shouldn't be the
overruling focal point of the design of the whole system.  It's
something to be used and supporting the actual use cases should be the
priority.  I'm not saying developer convenience is not important but
that it's not the only thing which matters.  The way I see it, cgroup
has basically been a playground for devs going wild without too much,
if any, thought on how it'll actually be useable and useful to wider
audience, so let's please adjust our priorities a bit.

And, no, I don't believe that the use cases are so wildly different
that we can't have a capable enough central manager.  That's usually a
symptom of not understanding 

Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Tim Hockin
On Thu, Jun 27, 2013 at 2:04 PM, Tejun Heo t...@kernel.org wrote:
 Hello,

 On Thu, Jun 27, 2013 at 01:46:18PM -0700, Tim Hockin wrote:
 So what you're saying is that you don't care that this new thing is
 less capable than the old thing, despite it having real impact.

 Sort of.  I'm saying, at least up until now, moving away from
 orthogonal hierarchy support seems to be the right trade-off.  It all
 depends on how you measure how much things are simplified and how
 heavy the real impacts are.  It's not like these things can be
 determined white and black.  Given the current situation, I think it's
 the right call.

I totally understand where you're coming from - trying to get back to
a stable feature set.  But it sucks to be on the losing end of that
battle - you're cutting things that REALLY matter to us, and without a
really viable alternative.  So we'll keep fighting.

 If controller C is enabled at level X but disabled at level X/Y, does
 that mean that X/Y uses the limits set in X?  How about X/Y/Z?

 Y and Y/Z wouldn't make any difference.  Tasks belonging to them would
 behave as if they belong to X as far as C is concerened.

OK, that *sounds* sane.  It doesn't solve all our problems, but it
alleviates some of them.

 So take away some of the flexibility that has minimal impact and
 maximum return.  Splitting threads across cgroups - we use it, but we
 could get off that.  Force all-or-nothing joining of an aggregate

 Please do so.

Splitting threads is sort of important for some cgroups, like CPU.  I
wonder if pjt is paying attention to this thread.

 construct (a container vs N cgroups).

 But perform surgery with a scalpel, not a hatchet.

 As anything else, it's drawing a line in a continuous spectrum of
 grey.  Right now, given that maintaining multiple orthogonal
 hierarchies while introducing a proper concept of resource container
 involves addition of completely new constructs and complexity, I don't
 think that's a good option.  If there are problems which can't be
 resolved / worked around in a reasonable manner, please bring them up
 along with their contexts.  Let's examine them and see whether there
 are other ways to accomodate them.

You're arguing that the abstraction you want is that of a container
but that it's easier to remove options than to actually build a better
API.

I think this is wrong.  Take the opportunity to define the RIGHT
interface that you WANT - a container.  Implement it in terms of
cgroups (and maybe other stuff!).  Make that API so compelling that
people want to use it, and your war of attrition on direct cgroup
madness will be won, but with net progress rather than regress.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Tim Hockin
On Fri, Jun 28, 2013 at 8:53 AM, Serge Hallyn serge.hal...@ubuntu.com wrote:
 Quoting Daniel P. Berrange (berra...@redhat.com):

 Are you also planning to actually write a new cgroup parent manager
 daemon too ? Currently my plan for libvirt is to just talk directly

 I'm toying with the idea, yes.  (Right now my toy runs in either native
 mode, using cgroupfs, or child mode, talking to a parent manager)  I'd
 love if someone else does it, but it needs to be done.

 As I've said elsewhere in the thread, I see 2 problems to be addressed:

 1. The ability to nest the cgroup manager daemons, so that a daemon
 running in a container can talk to a daemon running on the host.  This
 is the problem my current toy is aiming to address.  But the API it
 exports is just a thin layer over cgroupfs.

 2. Abstract away the kernel/cgroupfs details so that userspace can
 explain its cgroup needs generically.  This is IIUC what systemd is
 addressing with slices and scopes.

 (2) is where I'd really like to have a well thought out, community
 designed API that everyone can agree on, and it might be worth getting
 together (with Tejun) at plumbers or something to lay something out.

We're also working on (2) (well, we HAVE it, but we're dis-integrating
it so we can hopefully publish more widely).  But our (2) depends on
direct cgroupfs access.  If that is to change, we need a really robust
(1).  It's OK (desireable, in fact) that (1) be a very thin layer of
abstraction.

 In the end, something like libvirt or lxc should not need to care
 what is running underneat it.  It should be able to make its requests
 the same way regardless of whether it running in fedora or ubuntu,
 and whether it is running on the host or in a tightly bound container.
 That's my goal anyway :)

 to systemd's new DBus APIs for all management of cgroups, and then
 fall back to writing to cgroupfs directly for cases where systemd
 is not around.  Having a library to abstract these two possible
 alternatives isn't all that compelling unless we think there will
 be multiple cgroups manager daemons. I've been somewhat assuming that
 even Ubuntu will eventually see the benefits  switch to systemd,

 So far I've seen no indication of that :)

 If the systemd code to manage slices could be made separately
 compileable as a standalone library or daemon, then I'd advocate
 using that.  But I don't see a lot of incentive for systemd to do
 that, so I'd feel like a heel even asking.

I want to say let the best API win, but I know that systemd is a
giant katamari ball, and it's absorbing subsystems so it may win by
default.  That isn't going to stop us from trying to do what we do,
and share that with the world.

 then the issue of multiple manager daemons wouldn't really exist.

 True.  But I'm running under the assumption that Ubuntu will stick with
 upstart, and therefore yes I'll need a separate (perhaps pair of)
 management daemons.

 Even if we were to switch to systemd, I'd like the API for userspace
 programs to configure and use cgroups to be as generic as possible,
 so that anyone who wanted to write their own daemon could do so.

 -serge
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Tim Hockin
On Fri, Jun 28, 2013 at 8:05 AM, Michal Hocko mho...@suse.cz wrote:
 On Thu 27-06-13 22:01:38, Tejun Heo wrote:

 Oh, that in itself is not bad.  I mean, if you're root, it's pretty
 easy to play with and that part is fine.  But combined with the
 hierarchical nature of cgroup and file permissions, it encourages
 people to deligate subdirectories to less previledged domains,

 OK, this really depends on what you expose to non-root users. I have
 seen use cases where admin prepares top-level which is root-only but
 it allows creating sub-groups which are under _full_ control of the
 subdomain. This worked nicely for memcg for example because hard limit,
 oom handling and other knobs are hierarchical so the subdomain cannot
 overwrite what admin has said.

bingo

 And the systemd, with its history of eating projects and not caring much
 about their previous users who are not willing to jump in to the systemd
 car, doesn't sound like a good place where to place the new interface to
 me.

+1

If systemd is the only upstream implementation of this single-agent
idea, we will have to invent our own, and continue to diverge rather
than converge.  I think that, if we are going to pursue this model of
a single-agent, we should make a kick-ass implementation that is
flexible and scalable, and full-featured enough to not require
divergence at the lowest layer of the stack.  Then build systemd on
top of that. Let systemd offer more features and policies and
semantic APIs.

We will build our own semantic APIs that are, necessarily, different
from systemd.  But we can all use the same low-level mechanism.

Tim
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Andy Lutomirski
On 06/27/2013 11:01 AM, Tejun Heo wrote:
 AFAICS, having a userland agent which has overall knowledge of the
 hierarchy and enforcesf structure and limiations is a requirement to
 make cgroup generally useable and useful.  For systemd based systems,
 systemd serving that role isn't too crazy.  It's sure gonna have
 teeting issues at the beginning but it has all the necessary
 information to manage workloads on the system.
 
 A valid issue is interoperability between systemd and non-systemd
 systems.  I don't have an immediately good answer for that.  I wrote
 in another reply but making cgroup generally available is a pretty new
 effort and we're still in the process of figuring out what the right
 constructs and abstractions are.  Hopefully, we'll be able to reach a
 common set of abstractions to base things on top in itme.
 

The systemd stuff will break my code, too (although the single hierarchy
by itself won't, I think).  I think that the kernel should make whatever
simple changes are needed so that systemd can function without using
cgroups at all.  That way users of a different cgroup scheme can turn
off systemd's.

Here was my proposal, which hasn't gotten a clear reply:

http://article.gmane.org/gmane.comp.sysutils.systemd.devel/11424

I've already sent a patch to make /proc/pid/task/tid/children
available regardless of configuration.

--Andy


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Serge Hallyn
Quoting Andy Lutomirski (l...@amacapital.net):
 On 06/27/2013 11:01 AM, Tejun Heo wrote:
  AFAICS, having a userland agent which has overall knowledge of the
  hierarchy and enforcesf structure and limiations is a requirement to
  make cgroup generally useable and useful.  For systemd based systems,
  systemd serving that role isn't too crazy.  It's sure gonna have
  teeting issues at the beginning but it has all the necessary
  information to manage workloads on the system.
  
  A valid issue is interoperability between systemd and non-systemd
  systems.  I don't have an immediately good answer for that.  I wrote
  in another reply but making cgroup generally available is a pretty new
  effort and we're still in the process of figuring out what the right
  constructs and abstractions are.  Hopefully, we'll be able to reach a
  common set of abstractions to base things on top in itme.
  
 
 The systemd stuff will break my code, too (although the single hierarchy
 by itself won't, I think).  I think that the kernel should make whatever
 simple changes are needed so that systemd can function without using
 cgroups at all.  That way users of a different cgroup scheme can turn
 off systemd's.
 
 Here was my proposal, which hasn't gotten a clear reply:
 
 http://article.gmane.org/gmane.comp.sysutils.systemd.devel/11424

Neat.  I like that proposal.

 I've already sent a patch to make /proc/pid/task/tid/children
 available regardless of configuration.

-serge
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Daniel P. Berrange
On Fri, Jun 28, 2013 at 02:01:55PM -0400, Vivek Goyal wrote:
 On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
  On Thu 27-06-13 22:01:38, Tejun Heo wrote:
   Hello, Mike.
   
   On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
I always thought that was a very cool feature, mkdir+echo, poof done.
Now maybe that interface is suboptimal for serious usage, but it makes
the things usable via dirt simple scripts, very flexible, nice.
   
   Oh, that in itself is not bad.  I mean, if you're root, it's pretty
   easy to play with and that part is fine.  But combined with the
   hierarchical nature of cgroup and file permissions, it encourages
   people to deligate subdirectories to less previledged domains,
  
  OK, this really depends on what you expose to non-root users. I have
  seen use cases where admin prepares top-level which is root-only but
  it allows creating sub-groups which are under _full_ control of the
  subdomain. This worked nicely for memcg for example because hard limit,
  oom handling and other knobs are hierarchical so the subdomain cannot
  overwrite what admin has said.
  
   which
   in turn leads to normal binaries to manipulate them directly, which is
   where the horror begins.  We end up exposing control knobs which are
   tightly coupled to kernel implementation details right into lay
   binaries and scripts directly used by end users.
  
   I think this is the first time this happened, which is probably why
   nobody really noticed the mess earlier.
   
   Anyways, if you're root, you can keep doing whatever you want.
  
  OK, so libcgroup's rules daemon will still work and place my tasks in
  appropriate cgroups?
 
 Do you use that daemon in practice? For user session logins, I think
 systemd has plans to put user sessions in a cgroup (kind of making
 pam_cgroup redundant). 
 
 Other functionality rulesengined was providing moving tasks automatically
 in a cgroup based on executable name. I think that was racy and not
 many people had liked it.

Regardless of the changes being proposed, IMHO, the cgrulesd should
never be used. It is just outright dangerous for a daemon to be
arbitrarily re-arranging what cgroups a process is placed in without
the applications being aware of it. It can only be safely used in a
scenario where cgroups are exclusively used by the administrator,
and never used by applications for their own needs.

 IIUC, systemd can't disable access to cgroupfs from other utilities.

The kernel can exposed a knob that would allow systemd to lock that
down

 So most likely rulesengined should contine to work. But having both
 systemd and libcgroup might not make much sense though.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Serge Hallyn
Quoting Daniel P. Berrange (berra...@redhat.com):
 On Fri, Jun 28, 2013 at 02:01:55PM -0400, Vivek Goyal wrote:
  On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
   On Thu 27-06-13 22:01:38, Tejun Heo wrote:
Hello, Mike.

On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
 I always thought that was a very cool feature, mkdir+echo, poof done.
 Now maybe that interface is suboptimal for serious usage, but it makes
 the things usable via dirt simple scripts, very flexible, nice.

Oh, that in itself is not bad.  I mean, if you're root, it's pretty
easy to play with and that part is fine.  But combined with the
hierarchical nature of cgroup and file permissions, it encourages
people to deligate subdirectories to less previledged domains,
   
   OK, this really depends on what you expose to non-root users. I have
   seen use cases where admin prepares top-level which is root-only but
   it allows creating sub-groups which are under _full_ control of the
   subdomain. This worked nicely for memcg for example because hard limit,
   oom handling and other knobs are hierarchical so the subdomain cannot
   overwrite what admin has said.
   
which
in turn leads to normal binaries to manipulate them directly, which is
where the horror begins.  We end up exposing control knobs which are
tightly coupled to kernel implementation details right into lay
binaries and scripts directly used by end users.
   
I think this is the first time this happened, which is probably why
nobody really noticed the mess earlier.

Anyways, if you're root, you can keep doing whatever you want.
   
   OK, so libcgroup's rules daemon will still work and place my tasks in
   appropriate cgroups?
  
  Do you use that daemon in practice? For user session logins, I think
  systemd has plans to put user sessions in a cgroup (kind of making
  pam_cgroup redundant). 
  
  Other functionality rulesengined was providing moving tasks automatically
  in a cgroup based on executable name. I think that was racy and not
  many people had liked it.
 
 Regardless of the changes being proposed, IMHO, the cgrulesd should
 never be used. It is just outright dangerous for a daemon to be
 arbitrarily re-arranging what cgroups a process is placed in without
 the applications being aware of it. It can only be safely used in a
 scenario where cgroups are exclusively used by the administrator,
 and never used by applications for their own needs.

Even then it's not safe, since if the program quickly forks or clones a
few times, you can end up with some of the tasks being reclassified
and some not.

  IIUC, systemd can't disable access to cgroupfs from other utilities.
 
 The kernel can exposed a knob that would allow systemd to lock that
 down

Gah - why would you give him that idea?  :)

But yes, I'd sort of assume that was coming, eventually.

-serge
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Tejun Heo
On Fri, Jun 28, 2013 at 05:40:53PM -0500, Serge Hallyn wrote:
  The kernel can exposed a knob that would allow systemd to lock that
  down
 
 Gah - why would you give him that idea?  :)

That's one of the ideas I had from the beginning.

 But yes, I'd sort of assume that was coming, eventually.

But I think we'll probably settle with a mechanism to find out whether
someone else is touching the hierarchy, which will be generally useful
for other consumers of cgroup too.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Lennart Poettering

On 28.06.2013 20:53, Tim Hockin wrote:


a single-agent, we should make a kick-ass implementation that is
flexible and scalable, and full-featured enough to not require
divergence at the lowest layer of the stack.  Then build systemd on
top of that. Let systemd offer more features and policies and
semantic APIs.


Well, what if systemd is already kick-ass? I mean, if you have a problem 
with systemd, then that's your own problem, but I really don't think why 
I should bother?


I for sure am not going to make the PID 1 a client of another daemon. 
That's just wrong. If you have a daemon that is both conceptually the 
manager of another service and the client of that other service, then 
that's bad design and you will easily run into deadlocks and such. Just 
think about it: if you have some external daemon for managing cgroups, 
and you need cgroups for running external daemons, how are you going to 
start the external daemon for managing cgroups? Sure, you can hack 
around this, make that daemon special, and magic, and stuff -- or you 
can just not do such nonsense. There's no reason to repeat the fuckup 
that cgroup became in kernelspace a second time, but this time in 
userspace, with multiple manager daemons all with different and slightly 
incompatible definitions what a unit to manage actualy is...


We want to run fewer, simpler things on our systems, we want to reuse as 
much of the code as we can. You don't achieve that by running yet 
another daemon that does worse what systemd can anyway do simpler, 
easier and better.


The least you could grant us is to have a look at the final APIs we will 
have to offer before you already imply that systemd cannot be a valid 
implementation of any API people could ever agree on.


Lennart
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-28 Thread Tim Hockin
Come on, now, Lennart.  You put a lot of words in my mouth.

On Fri, Jun 28, 2013 at 6:48 PM, Lennart Poettering lpoet...@redhat.com wrote:
 On 28.06.2013 20:53, Tim Hockin wrote:

 a single-agent, we should make a kick-ass implementation that is
 flexible and scalable, and full-featured enough to not require
 divergence at the lowest layer of the stack.  Then build systemd on
 top of that. Let systemd offer more features and policies and
 semantic APIs.


 Well, what if systemd is already kick-ass? I mean, if you have a problem
 with systemd, then that's your own problem, but I really don't think why I
 should bother?

I didn't say it wasn't.  I said that we can build a common substrate
that systemd can build on *and* non-systemd systems can use *and*
Google can participate in.

 I for sure am not going to make the PID 1 a client of another daemon. That's
 just wrong. If you have a daemon that is both conceptually the manager of
 another service and the client of that other service, then that's bad design
 and you will easily run into deadlocks and such. Just think about it: if you
 have some external daemon for managing cgroups, and you need cgroups for
 running external daemons, how are you going to start the external daemon for
 managing cgroups? Sure, you can hack around this, make that daemon special,
 and magic, and stuff -- or you can just not do such nonsense. There's no
 reason to repeat the fuckup that cgroup became in kernelspace a second time,
 but this time in userspace, with multiple manager daemons all with different
 and slightly incompatible definitions what a unit to manage actualy is...

I forgot about the tautology of systemd.  systemd is monolithic.
Therefore it can not have any external dependencies.  Therefore it
must absorb anything it depends on.  Therefore systemd continues to
grow in size and scope.  Up next: systemd manages your X sessions!

But that's not my point.  It seems pretty easy to make this cgroup
management (in native mode) a library that can have either a thin
veneer of a main() function, while also being usable by systemd.  The
point is to solve all of the problems ONCE.  I'm trying to make the
case that systemd itself should be focusing on features and policies
and awesome APIs.

 We want to run fewer, simpler things on our systems, we want to reuse as

Fewer and simpler are not compatible, unless you are losing
functionality.  Systemd is fewer, but NOT simpler.

 much of the code as we can. You don't achieve that by running yet another
 daemon that does worse what systemd can anyway do simpler, easier and
 better.

Considering this is all hypothetical, I find this to be a funny
debate.  My hypothetical idea is better than your hypothetical idea.

 The least you could grant us is to have a look at the final APIs we will
 have to offer before you already imply that systemd cannot be a valid
 implementation of any API people could ever agree on.

Whoah, don't get defensive.  I said nothing of the sort.  The fact of
the matter is that we do not run systemd, at least in part because of
the monolithic nature.  That's unlikely to change in this timescale.
What I said was that it would be a shame if we had to invent our own
low-level cgroup daemon just because the upstream daemons was too
tightly coupled with systemd.

I think we have a lot of experience to offer to this project, and a
vested interest in seeing it done well.  But if it is purely
targetting systemd, we have little incentive to devote resources to
it.

Please note that I am strictly talking about the lowest layer of the
API.  Just the thing that guards cgroupfs against mere mortals.  The
higher layers - where abstractions exist, that are actually USEFUL to
end users - are not really in scope right now.  We already have our
own higher level APIs.

This is supposed to be collaborative, not combative.

Tim
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello, Mike.

On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> I always thought that was a very cool feature, mkdir+echo, poof done.
> Now maybe that interface is suboptimal for serious usage, but it makes
> the things usable via dirt simple scripts, very flexible, nice.

Oh, that in itself is not bad.  I mean, if you're root, it's pretty
easy to play with and that part is fine.  But combined with the
hierarchical nature of cgroup and file permissions, it encourages
people to "deligate" subdirectories to less previledged domains, which
in turn leads to normal binaries to manipulate them directly, which is
where the horror begins.  We end up exposing control knobs which are
tightly coupled to kernel implementation details right into lay
binaries and scripts directly used by end users.

I think this is the first time this happened, which is probably why
nobody really noticed the mess earlier.

Anyways, if you're root, you can keep doing whatever you want.  You
could be stepping on the centralized agent's toes a bit and vice-versa
but I don't think that's gonna be disastrous.  What I'm trying to
stamp out is direct usages from !root domains and !system-management
binaries / scripts.  They absolutely have to go.  There's no question
about it and I'll take totalitarian userland agent anyday over the
current mess.

Eventually, I think we'll be able to reach an equilibrium where most
things are reasonable and we'll be exploring the acceptable limits of
flexibility again, but right now, please bear with the brutality.
We're way over the line and I can't see a way back which isn't gonna
sting a bit.  I'm and will keep trying to make it as painless as
possible.

Thanks!

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Mike Galbraith
On Thu, 2013-06-27 at 21:09 -0700, Tejun Heo wrote:

> No, it's completely messed up.  We're now starting to see users trying
> to embed low level cgroup details into their binaries and cgroup is
> exposing sysctl-level konbs which are directly tied to internal
> implementation of core subsystems.  cgroup successfully bypassed the
> usual kernel API policing with the help of hierarchical filesystem
> interface which allows delegation on the surface.  We completely
> fucked up.  This is a full scale disaster unrolling.

I always thought that was a very cool feature, mkdir+echo, poof done.
Now maybe that interface is suboptimal for serious usage, but it makes
the things usable via dirt simple scripts, very flexible, nice.

But whatever, not my call, you know your business better than I.  If
mandatory agent happens, fine, but imho that will be sad day.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello, Mike.

On Fri, Jun 28, 2013 at 05:46:38AM +0200, Mike Galbraith wrote:
> Sure, because in private property and I mandatory agent, I see "gimme
> yer wallet bitch", an incredibly arrogant and brutal mugging.  That's
> not the way it's meant, I know that, but that's how it comes across.
> You asked, so you get the straight up answer.

I don't know.  It reads more like tungue-in-cheek thing to me rather
than being actually arrogant, and some part of the brutality is
necessary at this point.

> Offering to manage cgroups is one thing, very generous, forcefully
> placing itself between user and kernel quite another.  Perhaps I
> misread, but my interpretation was that the intent is to make systemd a
> mandatory agent, even saw reference to it taking up residence in the
> kernel tree (that bit made me chuckle, pull request would have to be
> very cleverly worded methinks).  I'm sure it will be quite capable, its
> authors are.  However, when I want to talk to my kernel, I expect to be
> able to tell anyone else using the phone to hang up.. now.

I don't know how to respond to this.  It feels more emotional than
technical.

> It's useful now, usable to the point that enterprise users exist who
> have integrated cgroups into their business model.  But then you know
> that.  Sure, there are problems, things could and no doubt will get a
> lot better.

No, it's completely messed up.  We're now starting to see users trying
to embed low level cgroup details into their binaries and cgroup is
exposing sysctl-level konbs which are directly tied to internal
implementation of core subsystems.  cgroup successfully bypassed the
usual kernel API policing with the help of hierarchical filesystem
interface which allows delegation on the surface.  We completely
fucked up.  This is a full scale disaster unrolling.

> However, wrt userspace agent, no agent is going to be the right answer
> for all, so that agent needs to have a step aside button so another
> agent can be tasked with the managerial duties, whether that be little
> ole /me or Aunt Tilly piddling with this and that because we damn well
> feel like it, or BigFoot company X going massively wild and crazy doing
> their business thing.

*ANY* agent is better than now.  We need to back the hell out of
direct usages as soon as possible.  cgroup is leaking kernel
implementation details into individual binaries.  The current
situation is dangerous and putting an agent inbetween is a good way of
gradually backing out of it.

> No, it's not at all crazy, _offering_ the user a managerial service is
> great, generous, way to go guys, pass out the white hats.  Use force,
> and those pretty white hats turn black as night, hero to villain.

No, it's completely crazy.  Full psycho crazy.  You just don't realize
it yet.

> systemd and no systemd is also a valid issue.  I'm sure it'll all get
> worked out, but that link, and others like it make me see bright red.

That red is nothing compared to the kernel implementation detail leak
going on right now.  The alarm for that has been blinking
psychedelically for some time now.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Mike Galbraith
On Thu, 2013-06-27 at 11:01 -0700, Tejun Heo wrote: 
> Hello, Mike.
> 
> On Thu, Jun 27, 2013 at 07:45:07AM +0200, Mike Galbraith wrote:
> > I can understand some alarm.  When I saw the below I started frothing at
> > the face and howling at the moon, and I don't even use the things much.
> 
> Can I ask why?  The reasons are not apparent to me.

Sure, because in private property and I mandatory agent, I see "gimme
yer wallet bitch", an incredibly arrogant and brutal mugging.  That's
not the way it's meant, I know that, but that's how it comes across.
You asked, so you get the straight up answer.

Offering to manage cgroups is one thing, very generous, forcefully
placing itself between user and kernel quite another.  Perhaps I
misread, but my interpretation was that the intent is to make systemd a
mandatory agent, even saw reference to it taking up residence in the
kernel tree (that bit made me chuckle, pull request would have to be
very cleverly worded methinks).  I'm sure it will be quite capable, its
authors are.  However, when I want to talk to my kernel, I expect to be
able to tell anyone else using the phone to hang up.. now.

> > http://lists.freedesktop.org/archives/systemd-devel/2013-June/011521.html
> > 
> > Hierarchy layout aside, that "private property" bit says that the folks
> > who currently own and use the cgroups interface will lose direct access
> > to it.  I can imagine folks who have become dependent upon an on the fly
> > management agents of their own design becoming a tad alarmed.
> 
> They're gonna be able to do what they've been doing for the
> foreseeable future if they choose not to use systemd's unified

Those are the comforting words I wanted to hear, that the user chooses,
that the user will not find that this that or any other userspace agent
gains the right to insert itself between user and kernel.

> AFAICS, having a userland agent which has overall knowledge of the
> hierarchy and enforcesf structure and limiations is a requirement to
> make cgroup generally useable and useful.

It's useful now, usable to the point that enterprise users exist who
have integrated cgroups into their business model.  But then you know
that.  Sure, there are problems, things could and no doubt will get a
lot better.

However, wrt userspace agent, no agent is going to be the right answer
for all, so that agent needs to have a step aside button so another
agent can be tasked with the managerial duties, whether that be little
ole /me or Aunt Tilly piddling with this and that because we damn well
feel like it, or BigFoot company X going massively wild and crazy doing
their business thing.

>   For systemd based systems,
> systemd serving that role isn't too crazy.  It's sure gonna have
> teeting issues at the beginning but it has all the necessary
> information to manage workloads on the system.

No, it's not at all crazy, _offering_ the user a managerial service is
great, generous, way to go guys, pass out the white hats.  Use force,
and those pretty white hats turn black as night, hero to villain.

> A valid issue is interoperability between systemd and non-systemd
> systems.  I don't have an immediately good answer for that.  I wrote
> in another reply but making cgroup generally available is a pretty new
> effort and we're still in the process of figuring out what the right
> constructs and abstractions are.  Hopefully, we'll be able to reach a
> common set of abstractions to base things on top in itme.

systemd and no systemd is also a valid issue.  I'm sure it'll all get
worked out, but that link, and others like it make me see bright red.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello,

On Thu, Jun 27, 2013 at 01:46:18PM -0700, Tim Hockin wrote:
> So what you're saying is that you don't care that this new thing is
> less capable than the old thing, despite it having real impact.

Sort of.  I'm saying, at least up until now, moving away from
orthogonal hierarchy support seems to be the right trade-off.  It all
depends on how you measure how much things are simplified and how
heavy the "real impacts" are.  It's not like these things can be
determined white and black.  Given the current situation, I think it's
the right call.

> If controller C is enabled at level X but disabled at level X/Y, does
> that mean that X/Y uses the limits set in X?  How about X/Y/Z?

Y and Y/Z wouldn't make any difference.  Tasks belonging to them would
behave as if they belong to X as far as C is concerened.

> So take away some of the flexibility that has minimal impact and
> maximum return.  Splitting threads across cgroups - we use it, but we
> could get off that.  Force all-or-nothing joining of an aggregate

Please do so.

> construct (a container vs N cgroups).
> 
> But perform surgery with a scalpel, not a hatchet.

As anything else, it's drawing a line in a continuous spectrum of
grey.  Right now, given that maintaining multiple orthogonal
hierarchies while introducing a proper concept of resource container
involves addition of completely new constructs and complexity, I don't
think that's a good option.  If there are problems which can't be
resolved / worked around in a reasonable manner, please bring them up
along with their contexts.  Let's examine them and see whether there
are other ways to accomodate them.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tim Hockin
On Thu, Jun 27, 2013 at 11:14 AM, Serge Hallyn  wrote:
> Quoting Tejun Heo (t...@kernel.org):
>> Hello, Serge.
>>
>> On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
>> > At some point (probably soon) we might want to talk about a standard API
>> > for these things.  However I think it will have to come in the form of
>> > a standard library, which knows to either send requests over dbus to
>> > systemd, or over /dev/cgroup sock to the manager.
>>
>> Yeah, eventually, I think we'll have a standardized way to configure
>> resource distribution in the system.  Maybe we'll agree on a
>> standardized dbus protocol or there will be library, I don't know;
>> however, whatever form it may be in, it abstraction level should be
>> way higher than that of direct cgroupfs access.  It's way too low
>> level and very easy to end up in a complete nonsense configuration.
>>
>> e.g. enabling "cpu" on a cgroup whlie leaving other cgroups alone
>> wouldn't enable fair scheduling on that cgroup but drastically reduce
>> the amount of cpu share it gets as it now gets treated as single
>> entity competing with all tasks at the parent level.
>
> Right.  I *think* this can be offered as a daemon which sits as the
> sole consumer of my agent's API, and offers a higher level "do what I
> want" API.  But designing that API is going to be interesting.

This is something we have, partially, and are working to be able to
open-source.  We have a LOT of experience feeding into the semantics
that actually make users happy.

Today it leverages split-hierarchies, but that is not required in the
generic case (only if you want to offer the semantics we do).  It
explicitly delegates some aspects of sub-cgroup control to users, but
that could go away if your lowest-level agency can handle it.

> I should find a good, up-to-date summary of the current behaviors of
> each controller so I can talk more intelligently about it.  (I'll
> start by looking at the kernel Documentation/cgroups, but don't
> feel too confident that they'll be uptodate :)
>
>> At the moment, I'm not sure what the eventual abstraction would look
>> like.  systemd is extending its basic constructs by adding slices and
>> scopes and it does make sense to integrate the general organization of
>> the system (services, user sessions, VMs and so on) with resource
>> management.  Given some time, I'm hoping we'll be able to come up with
>> and agree on some common constructs so that each workload can indicate
>> its resource requirements in a unified way.
>>
>> That said, I really think we should experiment for a while before
>> trying to settle down on things.  We've now just started exploring how
>> system-wide resource managment can be made widely available to systems
>> without requiring extremely specialized hand-crafted configurations
>> and I'm pretty sure we're getting and gonna get quite a few details
>> wrong, so I don't think it'd be a good idea to try to agree on things
>> right now.  As far as such integration goes, I think it's time to play
>> with things and observe the results.
>
> Right,  I'm not attached to my toy implementation at all - except for
> the ability, in some fashion, to have nested agents which don't have
> cgroupfs access but talk to another agent to get the job done.
>
> -serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tim Hockin
On Thu, Jun 27, 2013 at 10:38 AM, Tejun Heo  wrote:
> Hello, Tim.
>
> On Wed, Jun 26, 2013 at 08:42:21PM -0700, Tim Hockin wrote:
>> OK, then what I don't know is what is the new interface?  A new cgroupfs?
>
> It's gonna be a new mount option for cgroupfs.
>
>> DTF and CPU and cpuset all have "default" groups for some tasks (and
>> not others) in our world today.  DTF actually has default, prio, and
>> "normal".  I was simplifying before.  I really wish it were as simple
>> as you think it is.  But if it were, do you think I'd still be
>> arguing?
>
> How am I supposed to know when you don't communicate it but just wave
> your hands saying it's all very complicated?  The cpuset / blkcg
> example is pretty bad because you can enforce any cpuset rules at the
> leaves.

Modifying hundreds of cgroups is really painful, and yes, we do it
often enough to be able to see it.

>> This really doesn't scale when I have thousands of jobs running.
>> Being able to disable at some levels on some controllers probably
>> helps some, but I can't say for sure without knowing the new interface
>
> How does the number of jobs affect it?  Does each job create a new
> cgroup?

Well, in your model it does...

>> We tried it in unified hierarchy.  We had our Top People on the
>> problem.  The best we could get was bad enough that we embarked on a
>> LITERAL 2 year transition to make it better.
>
> What didn't work?  What part was so bad?  I find it pretty difficult
> to believe that multiple orthogonal hierarchies is the only possible
> solution, so please elaborate the issues that you guys have
> experienced.

I'm looping in more Google people.

> The hierarchy is for organization and enforcement of dynamic
> hierarchical resource distribution and that's it.  If its expressive
> power is lacking, take compromise or tune the configuration according
> to the workloads.  The latter is necessary in workloads which have
> clear distinction of foreground and background anyway - anything which
> interacts with human beings including androids.

So what you're saying is that you don't care that this new thing is
less capable than the old thing, despite it having real impact.

>> In other words, define a container as a set of cgroups, one under each
>> each active controller type.  A TID enters the container atomically,
>> joining all of the cgroups or none of the cgroups.
>>
>> container C1 = { /cgroup/cpu/foo, /cgroup/memory/bar,
>> /cgroup/io/default/foo/bar, /cgroup/cpuset/
>>
>> This is an abstraction that we maintain in userspace (more or less)
>> and we do actually have headaches from split hierarchies here
>> (handling partial failures, non-atomic joins, etc)
>
> That'd separate out task organization from controllre config
> hierarchies.  Kay had a similar idea some time ago.  I think it makes
> things even more complex than it is right now.  I'll continue on this
> below.
>
>> I'm still a bit fuzzy - is all of this written somewhere?
>
> If you dig through cgroup ML, most are there.  There'll be
> "cgroup.controllers" file with which you can enable / disable
> controllers.  Enabling a controller in a cgroup implies that the
> controller is enabled in all ancestors.

Implies or requires?  Cause or predicate?

If controller C is enabled at level X but disabled at level X/Y, does
that mean that X/Y uses the limits set in X?  How about X/Y/Z?

This will get rid of the bulk of the cpuset scaling problem, but not
all of it.  I think we still have the same problems with cpu as we do
with io.  Perhaps that should have been the example.

>> It sounds like you're missing a layer of abstraction.  Why not add the
>> abstraction you want to expose on top of powerful primitives, instead
>> of dumbing down the primitives?
>
> It sure would be possible build more and try to address the issues
> we're seeing now; however, after looking at cgroups for some time now,
> the underlying theme is failure to take reasonable trade-offs and
> going for maximum flexibility in making each choice - the choice of
> interface, multiple hierarchies, no restriction on hierarchical
> behavior, splitting threads of the same process into separate cgroups,
> semi-encouraging delegation through file permission without actually
> pondering the consequences and so on.  And each choice probably made
> sense trying to serve each immediate requirement at the time but added
> up it's a giant pile of mess which developed without direction.

I am very sympathetic to this problem.  You could have just described
some of our internal problems too.  The difference is that we are
trying to make changes that provide more structure and boundaries in
ways that retain the fundamental power, without tossing out the baby
with the bathwater.

> So, at this point, I'm very skeptical about adding more flexibility.
> Once the basics are settled, we sure can look into the missing pieces
> but I don't think that's what we should be doing right now.  Another
> thing is that the unified 

Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello,

On Thu, Jun 27, 2013 at 11:51 AM, Serge Hallyn  wrote:
>> I think it probably would be better to allow organization and RO
>
> What do you mean by "organization"?  Creating cgroups and moving tasks
> between them, without setting other cgroup values?

Yeap, I also think that's how user sessions are gonna be handled.
We're gonna have limited amount of delegation for organization and
read accesses.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Serge Hallyn
Quoting Tejun Heo (t...@kernel.org):
> Hello, Serge.
> 
> On Thu, Jun 27, 2013 at 01:14:57PM -0500, Serge Hallyn wrote:
> > I should find a good, up-to-date summary of the current behaviors of
> > each controller so I can talk more intelligently about it.  (I'll
> > start by looking at the kernel Documentation/cgroups, but don't
> > feel too confident that they'll be uptodate :)
> 
> Heh, it's hopelessly outdated.  Sorry about that.  I'll get around to
> updating it eventually.  Right now everything is in flux.
> 
> > Right,  I'm not attached to my toy implementation at all - except for
> > the ability, in some fashion, to have nested agents which don't have
> > cgroupfs access but talk to another agent to get the job done.
> 
> I think it probably would be better to allow organization and RO

What do you mean by "organization"?  Creating cgroups and moving tasks
between them, without setting other cgroup values?

> access to knobs and stat files inside containers, for lower overhead,
> if nothing else, and have comm channel for operations which need
> supervision at a wider level.
> 
> Thanks.
> 
> -- 
> tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello, Serge.

On Thu, Jun 27, 2013 at 01:14:57PM -0500, Serge Hallyn wrote:
> I should find a good, up-to-date summary of the current behaviors of
> each controller so I can talk more intelligently about it.  (I'll
> start by looking at the kernel Documentation/cgroups, but don't
> feel too confident that they'll be uptodate :)

Heh, it's hopelessly outdated.  Sorry about that.  I'll get around to
updating it eventually.  Right now everything is in flux.

> Right,  I'm not attached to my toy implementation at all - except for
> the ability, in some fashion, to have nested agents which don't have
> cgroupfs access but talk to another agent to get the job done.

I think it probably would be better to allow organization and RO
access to knobs and stat files inside containers, for lower overhead,
if nothing else, and have comm channel for operations which need
supervision at a wider level.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Serge Hallyn
Quoting Tejun Heo (t...@kernel.org):
> Hello, Serge.
> 
> On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
> > At some point (probably soon) we might want to talk about a standard API
> > for these things.  However I think it will have to come in the form of
> > a standard library, which knows to either send requests over dbus to
> > systemd, or over /dev/cgroup sock to the manager.
> 
> Yeah, eventually, I think we'll have a standardized way to configure
> resource distribution in the system.  Maybe we'll agree on a
> standardized dbus protocol or there will be library, I don't know;
> however, whatever form it may be in, it abstraction level should be
> way higher than that of direct cgroupfs access.  It's way too low
> level and very easy to end up in a complete nonsense configuration.
> 
> e.g. enabling "cpu" on a cgroup whlie leaving other cgroups alone
> wouldn't enable fair scheduling on that cgroup but drastically reduce
> the amount of cpu share it gets as it now gets treated as single
> entity competing with all tasks at the parent level.

Right.  I *think* this can be offered as a daemon which sits as the
sole consumer of my agent's API, and offers a higher level "do what I
want" API.  But designing that API is going to be interesting.

I should find a good, up-to-date summary of the current behaviors of
each controller so I can talk more intelligently about it.  (I'll
start by looking at the kernel Documentation/cgroups, but don't
feel too confident that they'll be uptodate :)

> At the moment, I'm not sure what the eventual abstraction would look
> like.  systemd is extending its basic constructs by adding slices and
> scopes and it does make sense to integrate the general organization of
> the system (services, user sessions, VMs and so on) with resource
> management.  Given some time, I'm hoping we'll be able to come up with
> and agree on some common constructs so that each workload can indicate
> its resource requirements in a unified way.
> 
> That said, I really think we should experiment for a while before
> trying to settle down on things.  We've now just started exploring how
> system-wide resource managment can be made widely available to systems
> without requiring extremely specialized hand-crafted configurations
> and I'm pretty sure we're getting and gonna get quite a few details
> wrong, so I don't think it'd be a good idea to try to agree on things
> right now.  As far as such integration goes, I think it's time to play
> with things and observe the results.

Right,  I'm not attached to my toy implementation at all - except for
the ability, in some fashion, to have nested agents which don't have
cgroupfs access but talk to another agent to get the job done.

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello, Mike.

On Thu, Jun 27, 2013 at 07:45:07AM +0200, Mike Galbraith wrote:
> I can understand some alarm.  When I saw the below I started frothing at
> the face and howling at the moon, and I don't even use the things much.

Can I ask why?  The reasons are not apparent to me.

> http://lists.freedesktop.org/archives/systemd-devel/2013-June/011521.html
> 
> Hierarchy layout aside, that "private property" bit says that the folks
> who currently own and use the cgroups interface will lose direct access
> to it.  I can imagine folks who have become dependent upon an on the fly
> management agents of their own design becoming a tad alarmed.

They're gonna be able to do what they've been doing for the
foreseeable future if they choose not to use systemd's unified
resource management.  That said, what we have today is pretty lousy
and a lot of hierarchical stuff were completely broken until some
releases ago and things *must* have been broken on the userland side
too.  It could have worked for their specific setup but I strongly
doubt there are anything generic working well out in the wild.  cgroup
hasn't been capable of supporting something like that.

AFAICS, having a userland agent which has overall knowledge of the
hierarchy and enforcesf structure and limiations is a requirement to
make cgroup generally useable and useful.  For systemd based systems,
systemd serving that role isn't too crazy.  It's sure gonna have
teeting issues at the beginning but it has all the necessary
information to manage workloads on the system.

A valid issue is interoperability between systemd and non-systemd
systems.  I don't have an immediately good answer for that.  I wrote
in another reply but making cgroup generally available is a pretty new
effort and we're still in the process of figuring out what the right
constructs and abstractions are.  Hopefully, we'll be able to reach a
common set of abstractions to base things on top in itme.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello, Serge.

On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
> At some point (probably soon) we might want to talk about a standard API
> for these things.  However I think it will have to come in the form of
> a standard library, which knows to either send requests over dbus to
> systemd, or over /dev/cgroup sock to the manager.

Yeah, eventually, I think we'll have a standardized way to configure
resource distribution in the system.  Maybe we'll agree on a
standardized dbus protocol or there will be library, I don't know;
however, whatever form it may be in, it abstraction level should be
way higher than that of direct cgroupfs access.  It's way too low
level and very easy to end up in a complete nonsense configuration.

e.g. enabling "cpu" on a cgroup whlie leaving other cgroups alone
wouldn't enable fair scheduling on that cgroup but drastically reduce
the amount of cpu share it gets as it now gets treated as single
entity competing with all tasks at the parent level.

At the moment, I'm not sure what the eventual abstraction would look
like.  systemd is extending its basic constructs by adding slices and
scopes and it does make sense to integrate the general organization of
the system (services, user sessions, VMs and so on) with resource
management.  Given some time, I'm hoping we'll be able to come up with
and agree on some common constructs so that each workload can indicate
its resource requirements in a unified way.

That said, I really think we should experiment for a while before
trying to settle down on things.  We've now just started exploring how
system-wide resource managment can be made widely available to systems
without requiring extremely specialized hand-crafted configurations
and I'm pretty sure we're getting and gonna get quite a few details
wrong, so I don't think it'd be a good idea to try to agree on things
right now.  As far as such integration goes, I think it's time to play
with things and observe the results.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello, Tim.

On Wed, Jun 26, 2013 at 08:42:21PM -0700, Tim Hockin wrote:
> OK, then what I don't know is what is the new interface?  A new cgroupfs?

It's gonna be a new mount option for cgroupfs.

> DTF and CPU and cpuset all have "default" groups for some tasks (and
> not others) in our world today.  DTF actually has default, prio, and
> "normal".  I was simplifying before.  I really wish it were as simple
> as you think it is.  But if it were, do you think I'd still be
> arguing?

How am I supposed to know when you don't communicate it but just wave
your hands saying it's all very complicated?  The cpuset / blkcg
example is pretty bad because you can enforce any cpuset rules at the
leaves.

> This really doesn't scale when I have thousands of jobs running.
> Being able to disable at some levels on some controllers probably
> helps some, but I can't say for sure without knowing the new interface

How does the number of jobs affect it?  Does each job create a new
cgroup?

> We tried it in unified hierarchy.  We had our Top People on the
> problem.  The best we could get was bad enough that we embarked on a
> LITERAL 2 year transition to make it better.

What didn't work?  What part was so bad?  I find it pretty difficult
to believe that multiple orthogonal hierarchies is the only possible
solution, so please elaborate the issues that you guys have
experienced.

The hierarchy is for organization and enforcement of dynamic
hierarchical resource distribution and that's it.  If its expressive
power is lacking, take compromise or tune the configuration according
to the workloads.  The latter is necessary in workloads which have
clear distinction of foreground and background anyway - anything which
interacts with human beings including androids.

> In other words, define a container as a set of cgroups, one under each
> each active controller type.  A TID enters the container atomically,
> joining all of the cgroups or none of the cgroups.
> 
> container C1 = { /cgroup/cpu/foo, /cgroup/memory/bar,
> /cgroup/io/default/foo/bar, /cgroup/cpuset/
> 
> This is an abstraction that we maintain in userspace (more or less)
> and we do actually have headaches from split hierarchies here
> (handling partial failures, non-atomic joins, etc)

That'd separate out task organization from controllre config
hierarchies.  Kay had a similar idea some time ago.  I think it makes
things even more complex than it is right now.  I'll continue on this
below.

> I'm still a bit fuzzy - is all of this written somewhere?

If you dig through cgroup ML, most are there.  There'll be
"cgroup.controllers" file with which you can enable / disable
controllers.  Enabling a controller in a cgroup implies that the
controller is enabled in all ancestors.

> It sounds like you're missing a layer of abstraction.  Why not add the
> abstraction you want to expose on top of powerful primitives, instead
> of dumbing down the primitives?

It sure would be possible build more and try to address the issues
we're seeing now; however, after looking at cgroups for some time now,
the underlying theme is failure to take reasonable trade-offs and
going for maximum flexibility in making each choice - the choice of
interface, multiple hierarchies, no restriction on hierarchical
behavior, splitting threads of the same process into separate cgroups,
semi-encouraging delegation through file permission without actually
pondering the consequences and so on.  And each choice probably made
sense trying to serve each immediate requirement at the time but added
up it's a giant pile of mess which developed without direction.

So, at this point, I'm very skeptical about adding more flexibility.
Once the basics are settled, we sure can look into the missing pieces
but I don't think that's what we should be doing right now.  Another
thing is that the unified hierarchy can be implemented by using most
of the constructs cgroup core already has in more controller way.
Given that we're gonna have to maintain both interfaces for quite some
time, the deviation should be kept as minimal as possible.

> But it seems vastly better to define a next-gen API that retains the
> important flexibility but adds structure where it was lacking
> previously.

I suppose that's where we disagree.  I think a lot of cgroup's
problems stem from too much flexibility.  The problem with such level
of flexibility is that, in addition to breaking fundamental constructs
and adding significantly to maintenance overhead, it blocks reasonable
trade-offs to be made at the right places, in turn requiring more
"flexibility" to address the introduced deficiencies.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Serge Hallyn
Quoting Tim Hockin (thoc...@hockin.org):
> On Thu, Jun 27, 2013 at 6:22 AM, Serge Hallyn  wrote:
> > Quoting Mike Galbraith (bitbuc...@online.de):
> >> On Wed, 2013-06-26 at 14:20 -0700, Tejun Heo wrote:
> >> > Hello, Tim.
> >> >
> >> > On Mon, Jun 24, 2013 at 09:07:47PM -0700, Tim Hockin wrote:
> >> > > I really want to understand why this is SO IMPORTANT that you have to
> >> > > break userspace compatibility?  I mean, isn't Linux supposed to be the
> >> > > OS with the stable kernel interface?  I've seen Linus rant time and
> >> > > time again about this - why is it OK now?
> >> >
> >> > What the hell are you talking about?  Nobody is breaking userland
> >> > interface.  A new version of interface is being phased in and the old
> >> > one will stay there for the foreseeable future.  It will be phased out
> >> > eventually but that's gonna take a long time and it will have to be
> >> > something hardly noticeable.  Of course new features will only be
> >> > available with the new interface and there will be efforts to nudge
> >> > people away from the old one but the existing interface will keep
> >> > working it does.
> >>
> >> I can understand some alarm.  When I saw the below I started frothing at
> >> the face and howling at the moon, and I don't even use the things much.
> >>
> >> http://lists.freedesktop.org/archives/systemd-devel/2013-June/011521.html
> >>
> >> Hierarchy layout aside, that "private property" bit says that the folks
> >> who currently own and use the cgroups interface will lose direct access
> >> to it.  I can imagine folks who have become dependent upon an on the fly
> >> management agents of their own design becoming a tad alarmed.
> >
> > FWIW, the code is too embarassing yet to see daylight, but I'm playing
> > with a very lowlevel cgroup manager which supports nesting itself.
> > Access in this POC is low-level ("set freezer.state to THAWED for cgroup
> > /c1/c2", "Create /c3"), but the key feature is that it can run in two
> > modes - native mode in which it uses cgroupfs, and child mode where it
> > talks to a parent manager to make the changes.
> 
> In this world, are users able to read cgroup files, or do they have to
> go through a central agent, too?

The agent won't itself do anything to stop access through cgroupfs, but
the idea would be that cgroupfs would only be mounted in the agent's
mntns.  My hope would be that the libcgroup commands (like cgexec,
cgcreate, etc) would know to talk to the agent when possible, and users
would use those.

> > So then the idea would be that userspace (like libvirt and lxc) would
> > talk over /dev/cgroup to its manager.  Userspace inside a container
> > (which can't actually mount cgroups itself) would talk to its own
> > manager which is talking over a passed-in socket to the host manager,
> > which in turn runs natively (uses cgroupfs, and nests "create /c1" under
> > the requestor's cgroup).
> 
> How do you handle updates of this agent?  Suppose I have hundreds of
> running containers, and I want to release a new version of the cgroupd
> ?

This may change (which is part of what I want to investigate with some
POC), but right now I'm building any controller-aware smarts into it.  I
think that's what you're asking about?  The agent doesn't do "slices"
etc.  This may turn out to be insufficient, we'll see.

So the only state which the agent stores is a list of cgroup mounts (if
in native mode) or an open socket to the parent (if in child mode), and a
list of connected children sockets.

HUPping the agent will cause it to reload the cgroupfs mounts (in case
you've mounted a new controller, living in "the old world" :).  If you
just kill it and start a new one, it shouldn't matter.

> (note: inquiries about the implementation do not denote acceptance of
> the model :)

To put it another way, the problem I'm solving (for now) is not the "I
want a daemon to ensure that requested guarantees are correctly
implemented." In that sense I'm maintaining the status quo, i.e. the
admin needs to architect the layout correctly.

The problem I'm solving is really that I want containers to be able to
handle cgroups even if they can't mount cgroupfs, and I want all
userspace to be able to behave the same whether they are in a container
or not.

This isn't meant as a poke in the eye of anyone who wants to address the
other problem.  If it turns out that we (meaning "the community of
cgroup users") really want such an agent, then we can add that.  I'm not
convinced.

What would probably be a better design, then, would be that the agent
I'm working on can plug into a resource allocation agent.  Or, I
suppose, the other way around.

> > At some point (probably soon) we might want to talk about a standard API
> > for these things.  However I think it will have to come in the form of
> > a standard library, which knows to either send requests over dbus to
> > systemd, or over /dev/cgroup sock to the manager.
> >
> > -serge
--
To unsubscribe from this 

Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tim Hockin
On Thu, Jun 27, 2013 at 6:22 AM, Serge Hallyn  wrote:
> Quoting Mike Galbraith (bitbuc...@online.de):
>> On Wed, 2013-06-26 at 14:20 -0700, Tejun Heo wrote:
>> > Hello, Tim.
>> >
>> > On Mon, Jun 24, 2013 at 09:07:47PM -0700, Tim Hockin wrote:
>> > > I really want to understand why this is SO IMPORTANT that you have to
>> > > break userspace compatibility?  I mean, isn't Linux supposed to be the
>> > > OS with the stable kernel interface?  I've seen Linus rant time and
>> > > time again about this - why is it OK now?
>> >
>> > What the hell are you talking about?  Nobody is breaking userland
>> > interface.  A new version of interface is being phased in and the old
>> > one will stay there for the foreseeable future.  It will be phased out
>> > eventually but that's gonna take a long time and it will have to be
>> > something hardly noticeable.  Of course new features will only be
>> > available with the new interface and there will be efforts to nudge
>> > people away from the old one but the existing interface will keep
>> > working it does.
>>
>> I can understand some alarm.  When I saw the below I started frothing at
>> the face and howling at the moon, and I don't even use the things much.
>>
>> http://lists.freedesktop.org/archives/systemd-devel/2013-June/011521.html
>>
>> Hierarchy layout aside, that "private property" bit says that the folks
>> who currently own and use the cgroups interface will lose direct access
>> to it.  I can imagine folks who have become dependent upon an on the fly
>> management agents of their own design becoming a tad alarmed.
>
> FWIW, the code is too embarassing yet to see daylight, but I'm playing
> with a very lowlevel cgroup manager which supports nesting itself.
> Access in this POC is low-level ("set freezer.state to THAWED for cgroup
> /c1/c2", "Create /c3"), but the key feature is that it can run in two
> modes - native mode in which it uses cgroupfs, and child mode where it
> talks to a parent manager to make the changes.

In this world, are users able to read cgroup files, or do they have to
go through a central agent, too?

> So then the idea would be that userspace (like libvirt and lxc) would
> talk over /dev/cgroup to its manager.  Userspace inside a container
> (which can't actually mount cgroups itself) would talk to its own
> manager which is talking over a passed-in socket to the host manager,
> which in turn runs natively (uses cgroupfs, and nests "create /c1" under
> the requestor's cgroup).

How do you handle updates of this agent?  Suppose I have hundreds of
running containers, and I want to release a new version of the cgroupd
?

(note: inquiries about the implementation do not denote acceptance of
the model :)

> At some point (probably soon) we might want to talk about a standard API
> for these things.  However I think it will have to come in the form of
> a standard library, which knows to either send requests over dbus to
> systemd, or over /dev/cgroup sock to the manager.
>
> -serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Serge Hallyn
Quoting Mike Galbraith (bitbuc...@online.de):
> On Wed, 2013-06-26 at 14:20 -0700, Tejun Heo wrote: 
> > Hello, Tim.
> > 
> > On Mon, Jun 24, 2013 at 09:07:47PM -0700, Tim Hockin wrote:
> > > I really want to understand why this is SO IMPORTANT that you have to
> > > break userspace compatibility?  I mean, isn't Linux supposed to be the
> > > OS with the stable kernel interface?  I've seen Linus rant time and
> > > time again about this - why is it OK now?
> > 
> > What the hell are you talking about?  Nobody is breaking userland
> > interface.  A new version of interface is being phased in and the old
> > one will stay there for the foreseeable future.  It will be phased out
> > eventually but that's gonna take a long time and it will have to be
> > something hardly noticeable.  Of course new features will only be
> > available with the new interface and there will be efforts to nudge
> > people away from the old one but the existing interface will keep
> > working it does.
> 
> I can understand some alarm.  When I saw the below I started frothing at
> the face and howling at the moon, and I don't even use the things much.
> 
> http://lists.freedesktop.org/archives/systemd-devel/2013-June/011521.html
> 
> Hierarchy layout aside, that "private property" bit says that the folks
> who currently own and use the cgroups interface will lose direct access
> to it.  I can imagine folks who have become dependent upon an on the fly
> management agents of their own design becoming a tad alarmed.

FWIW, the code is too embarassing yet to see daylight, but I'm playing
with a very lowlevel cgroup manager which supports nesting itself.
Access in this POC is low-level ("set freezer.state to THAWED for cgroup
/c1/c2", "Create /c3"), but the key feature is that it can run in two
modes - native mode in which it uses cgroupfs, and child mode where it
talks to a parent manager to make the changes.

So then the idea would be that userspace (like libvirt and lxc) would
talk over /dev/cgroup to its manager.  Userspace inside a container
(which can't actually mount cgroups itself) would talk to its own
manager which is talking over a passed-in socket to the host manager,
which in turn runs natively (uses cgroupfs, and nests "create /c1" under
the requestor's cgroup).

At some point (probably soon) we might want to talk about a standard API
for these things.  However I think it will have to come in the form of
a standard library, which knows to either send requests over dbus to
systemd, or over /dev/cgroup sock to the manager.

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Serge Hallyn
Quoting Mike Galbraith (bitbuc...@online.de):
 On Wed, 2013-06-26 at 14:20 -0700, Tejun Heo wrote: 
  Hello, Tim.
  
  On Mon, Jun 24, 2013 at 09:07:47PM -0700, Tim Hockin wrote:
   I really want to understand why this is SO IMPORTANT that you have to
   break userspace compatibility?  I mean, isn't Linux supposed to be the
   OS with the stable kernel interface?  I've seen Linus rant time and
   time again about this - why is it OK now?
  
  What the hell are you talking about?  Nobody is breaking userland
  interface.  A new version of interface is being phased in and the old
  one will stay there for the foreseeable future.  It will be phased out
  eventually but that's gonna take a long time and it will have to be
  something hardly noticeable.  Of course new features will only be
  available with the new interface and there will be efforts to nudge
  people away from the old one but the existing interface will keep
  working it does.
 
 I can understand some alarm.  When I saw the below I started frothing at
 the face and howling at the moon, and I don't even use the things much.
 
 http://lists.freedesktop.org/archives/systemd-devel/2013-June/011521.html
 
 Hierarchy layout aside, that private property bit says that the folks
 who currently own and use the cgroups interface will lose direct access
 to it.  I can imagine folks who have become dependent upon an on the fly
 management agents of their own design becoming a tad alarmed.

FWIW, the code is too embarassing yet to see daylight, but I'm playing
with a very lowlevel cgroup manager which supports nesting itself.
Access in this POC is low-level (set freezer.state to THAWED for cgroup
/c1/c2, Create /c3), but the key feature is that it can run in two
modes - native mode in which it uses cgroupfs, and child mode where it
talks to a parent manager to make the changes.

So then the idea would be that userspace (like libvirt and lxc) would
talk over /dev/cgroup to its manager.  Userspace inside a container
(which can't actually mount cgroups itself) would talk to its own
manager which is talking over a passed-in socket to the host manager,
which in turn runs natively (uses cgroupfs, and nests create /c1 under
the requestor's cgroup).

At some point (probably soon) we might want to talk about a standard API
for these things.  However I think it will have to come in the form of
a standard library, which knows to either send requests over dbus to
systemd, or over /dev/cgroup sock to the manager.

-serge
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tim Hockin
On Thu, Jun 27, 2013 at 6:22 AM, Serge Hallyn serge.hal...@ubuntu.com wrote:
 Quoting Mike Galbraith (bitbuc...@online.de):
 On Wed, 2013-06-26 at 14:20 -0700, Tejun Heo wrote:
  Hello, Tim.
 
  On Mon, Jun 24, 2013 at 09:07:47PM -0700, Tim Hockin wrote:
   I really want to understand why this is SO IMPORTANT that you have to
   break userspace compatibility?  I mean, isn't Linux supposed to be the
   OS with the stable kernel interface?  I've seen Linus rant time and
   time again about this - why is it OK now?
 
  What the hell are you talking about?  Nobody is breaking userland
  interface.  A new version of interface is being phased in and the old
  one will stay there for the foreseeable future.  It will be phased out
  eventually but that's gonna take a long time and it will have to be
  something hardly noticeable.  Of course new features will only be
  available with the new interface and there will be efforts to nudge
  people away from the old one but the existing interface will keep
  working it does.

 I can understand some alarm.  When I saw the below I started frothing at
 the face and howling at the moon, and I don't even use the things much.

 http://lists.freedesktop.org/archives/systemd-devel/2013-June/011521.html

 Hierarchy layout aside, that private property bit says that the folks
 who currently own and use the cgroups interface will lose direct access
 to it.  I can imagine folks who have become dependent upon an on the fly
 management agents of their own design becoming a tad alarmed.

 FWIW, the code is too embarassing yet to see daylight, but I'm playing
 with a very lowlevel cgroup manager which supports nesting itself.
 Access in this POC is low-level (set freezer.state to THAWED for cgroup
 /c1/c2, Create /c3), but the key feature is that it can run in two
 modes - native mode in which it uses cgroupfs, and child mode where it
 talks to a parent manager to make the changes.

In this world, are users able to read cgroup files, or do they have to
go through a central agent, too?

 So then the idea would be that userspace (like libvirt and lxc) would
 talk over /dev/cgroup to its manager.  Userspace inside a container
 (which can't actually mount cgroups itself) would talk to its own
 manager which is talking over a passed-in socket to the host manager,
 which in turn runs natively (uses cgroupfs, and nests create /c1 under
 the requestor's cgroup).

How do you handle updates of this agent?  Suppose I have hundreds of
running containers, and I want to release a new version of the cgroupd
?

(note: inquiries about the implementation do not denote acceptance of
the model :)

 At some point (probably soon) we might want to talk about a standard API
 for these things.  However I think it will have to come in the form of
 a standard library, which knows to either send requests over dbus to
 systemd, or over /dev/cgroup sock to the manager.

 -serge
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Serge Hallyn
Quoting Tim Hockin (thoc...@hockin.org):
 On Thu, Jun 27, 2013 at 6:22 AM, Serge Hallyn serge.hal...@ubuntu.com wrote:
  Quoting Mike Galbraith (bitbuc...@online.de):
  On Wed, 2013-06-26 at 14:20 -0700, Tejun Heo wrote:
   Hello, Tim.
  
   On Mon, Jun 24, 2013 at 09:07:47PM -0700, Tim Hockin wrote:
I really want to understand why this is SO IMPORTANT that you have to
break userspace compatibility?  I mean, isn't Linux supposed to be the
OS with the stable kernel interface?  I've seen Linus rant time and
time again about this - why is it OK now?
  
   What the hell are you talking about?  Nobody is breaking userland
   interface.  A new version of interface is being phased in and the old
   one will stay there for the foreseeable future.  It will be phased out
   eventually but that's gonna take a long time and it will have to be
   something hardly noticeable.  Of course new features will only be
   available with the new interface and there will be efforts to nudge
   people away from the old one but the existing interface will keep
   working it does.
 
  I can understand some alarm.  When I saw the below I started frothing at
  the face and howling at the moon, and I don't even use the things much.
 
  http://lists.freedesktop.org/archives/systemd-devel/2013-June/011521.html
 
  Hierarchy layout aside, that private property bit says that the folks
  who currently own and use the cgroups interface will lose direct access
  to it.  I can imagine folks who have become dependent upon an on the fly
  management agents of their own design becoming a tad alarmed.
 
  FWIW, the code is too embarassing yet to see daylight, but I'm playing
  with a very lowlevel cgroup manager which supports nesting itself.
  Access in this POC is low-level (set freezer.state to THAWED for cgroup
  /c1/c2, Create /c3), but the key feature is that it can run in two
  modes - native mode in which it uses cgroupfs, and child mode where it
  talks to a parent manager to make the changes.
 
 In this world, are users able to read cgroup files, or do they have to
 go through a central agent, too?

The agent won't itself do anything to stop access through cgroupfs, but
the idea would be that cgroupfs would only be mounted in the agent's
mntns.  My hope would be that the libcgroup commands (like cgexec,
cgcreate, etc) would know to talk to the agent when possible, and users
would use those.

  So then the idea would be that userspace (like libvirt and lxc) would
  talk over /dev/cgroup to its manager.  Userspace inside a container
  (which can't actually mount cgroups itself) would talk to its own
  manager which is talking over a passed-in socket to the host manager,
  which in turn runs natively (uses cgroupfs, and nests create /c1 under
  the requestor's cgroup).
 
 How do you handle updates of this agent?  Suppose I have hundreds of
 running containers, and I want to release a new version of the cgroupd
 ?

This may change (which is part of what I want to investigate with some
POC), but right now I'm building any controller-aware smarts into it.  I
think that's what you're asking about?  The agent doesn't do slices
etc.  This may turn out to be insufficient, we'll see.

So the only state which the agent stores is a list of cgroup mounts (if
in native mode) or an open socket to the parent (if in child mode), and a
list of connected children sockets.

HUPping the agent will cause it to reload the cgroupfs mounts (in case
you've mounted a new controller, living in the old world :).  If you
just kill it and start a new one, it shouldn't matter.

 (note: inquiries about the implementation do not denote acceptance of
 the model :)

To put it another way, the problem I'm solving (for now) is not the I
want a daemon to ensure that requested guarantees are correctly
implemented. In that sense I'm maintaining the status quo, i.e. the
admin needs to architect the layout correctly.

The problem I'm solving is really that I want containers to be able to
handle cgroups even if they can't mount cgroupfs, and I want all
userspace to be able to behave the same whether they are in a container
or not.

This isn't meant as a poke in the eye of anyone who wants to address the
other problem.  If it turns out that we (meaning the community of
cgroup users) really want such an agent, then we can add that.  I'm not
convinced.

What would probably be a better design, then, would be that the agent
I'm working on can plug into a resource allocation agent.  Or, I
suppose, the other way around.

  At some point (probably soon) we might want to talk about a standard API
  for these things.  However I think it will have to come in the form of
  a standard library, which knows to either send requests over dbus to
  systemd, or over /dev/cgroup sock to the manager.
 
  -serge
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello, Tim.

On Wed, Jun 26, 2013 at 08:42:21PM -0700, Tim Hockin wrote:
 OK, then what I don't know is what is the new interface?  A new cgroupfs?

It's gonna be a new mount option for cgroupfs.

 DTF and CPU and cpuset all have default groups for some tasks (and
 not others) in our world today.  DTF actually has default, prio, and
 normal.  I was simplifying before.  I really wish it were as simple
 as you think it is.  But if it were, do you think I'd still be
 arguing?

How am I supposed to know when you don't communicate it but just wave
your hands saying it's all very complicated?  The cpuset / blkcg
example is pretty bad because you can enforce any cpuset rules at the
leaves.

 This really doesn't scale when I have thousands of jobs running.
 Being able to disable at some levels on some controllers probably
 helps some, but I can't say for sure without knowing the new interface

How does the number of jobs affect it?  Does each job create a new
cgroup?

 We tried it in unified hierarchy.  We had our Top People on the
 problem.  The best we could get was bad enough that we embarked on a
 LITERAL 2 year transition to make it better.

What didn't work?  What part was so bad?  I find it pretty difficult
to believe that multiple orthogonal hierarchies is the only possible
solution, so please elaborate the issues that you guys have
experienced.

The hierarchy is for organization and enforcement of dynamic
hierarchical resource distribution and that's it.  If its expressive
power is lacking, take compromise or tune the configuration according
to the workloads.  The latter is necessary in workloads which have
clear distinction of foreground and background anyway - anything which
interacts with human beings including androids.

 In other words, define a container as a set of cgroups, one under each
 each active controller type.  A TID enters the container atomically,
 joining all of the cgroups or none of the cgroups.
 
 container C1 = { /cgroup/cpu/foo, /cgroup/memory/bar,
 /cgroup/io/default/foo/bar, /cgroup/cpuset/
 
 This is an abstraction that we maintain in userspace (more or less)
 and we do actually have headaches from split hierarchies here
 (handling partial failures, non-atomic joins, etc)

That'd separate out task organization from controllre config
hierarchies.  Kay had a similar idea some time ago.  I think it makes
things even more complex than it is right now.  I'll continue on this
below.

 I'm still a bit fuzzy - is all of this written somewhere?

If you dig through cgroup ML, most are there.  There'll be
cgroup.controllers file with which you can enable / disable
controllers.  Enabling a controller in a cgroup implies that the
controller is enabled in all ancestors.

 It sounds like you're missing a layer of abstraction.  Why not add the
 abstraction you want to expose on top of powerful primitives, instead
 of dumbing down the primitives?

It sure would be possible build more and try to address the issues
we're seeing now; however, after looking at cgroups for some time now,
the underlying theme is failure to take reasonable trade-offs and
going for maximum flexibility in making each choice - the choice of
interface, multiple hierarchies, no restriction on hierarchical
behavior, splitting threads of the same process into separate cgroups,
semi-encouraging delegation through file permission without actually
pondering the consequences and so on.  And each choice probably made
sense trying to serve each immediate requirement at the time but added
up it's a giant pile of mess which developed without direction.

So, at this point, I'm very skeptical about adding more flexibility.
Once the basics are settled, we sure can look into the missing pieces
but I don't think that's what we should be doing right now.  Another
thing is that the unified hierarchy can be implemented by using most
of the constructs cgroup core already has in more controller way.
Given that we're gonna have to maintain both interfaces for quite some
time, the deviation should be kept as minimal as possible.

 But it seems vastly better to define a next-gen API that retains the
 important flexibility but adds structure where it was lacking
 previously.

I suppose that's where we disagree.  I think a lot of cgroup's
problems stem from too much flexibility.  The problem with such level
of flexibility is that, in addition to breaking fundamental constructs
and adding significantly to maintenance overhead, it blocks reasonable
trade-offs to be made at the right places, in turn requiring more
flexibility to address the introduced deficiencies.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello, Serge.

On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
 At some point (probably soon) we might want to talk about a standard API
 for these things.  However I think it will have to come in the form of
 a standard library, which knows to either send requests over dbus to
 systemd, or over /dev/cgroup sock to the manager.

Yeah, eventually, I think we'll have a standardized way to configure
resource distribution in the system.  Maybe we'll agree on a
standardized dbus protocol or there will be library, I don't know;
however, whatever form it may be in, it abstraction level should be
way higher than that of direct cgroupfs access.  It's way too low
level and very easy to end up in a complete nonsense configuration.

e.g. enabling cpu on a cgroup whlie leaving other cgroups alone
wouldn't enable fair scheduling on that cgroup but drastically reduce
the amount of cpu share it gets as it now gets treated as single
entity competing with all tasks at the parent level.

At the moment, I'm not sure what the eventual abstraction would look
like.  systemd is extending its basic constructs by adding slices and
scopes and it does make sense to integrate the general organization of
the system (services, user sessions, VMs and so on) with resource
management.  Given some time, I'm hoping we'll be able to come up with
and agree on some common constructs so that each workload can indicate
its resource requirements in a unified way.

That said, I really think we should experiment for a while before
trying to settle down on things.  We've now just started exploring how
system-wide resource managment can be made widely available to systems
without requiring extremely specialized hand-crafted configurations
and I'm pretty sure we're getting and gonna get quite a few details
wrong, so I don't think it'd be a good idea to try to agree on things
right now.  As far as such integration goes, I think it's time to play
with things and observe the results.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello, Mike.

On Thu, Jun 27, 2013 at 07:45:07AM +0200, Mike Galbraith wrote:
 I can understand some alarm.  When I saw the below I started frothing at
 the face and howling at the moon, and I don't even use the things much.

Can I ask why?  The reasons are not apparent to me.

 http://lists.freedesktop.org/archives/systemd-devel/2013-June/011521.html
 
 Hierarchy layout aside, that private property bit says that the folks
 who currently own and use the cgroups interface will lose direct access
 to it.  I can imagine folks who have become dependent upon an on the fly
 management agents of their own design becoming a tad alarmed.

They're gonna be able to do what they've been doing for the
foreseeable future if they choose not to use systemd's unified
resource management.  That said, what we have today is pretty lousy
and a lot of hierarchical stuff were completely broken until some
releases ago and things *must* have been broken on the userland side
too.  It could have worked for their specific setup but I strongly
doubt there are anything generic working well out in the wild.  cgroup
hasn't been capable of supporting something like that.

AFAICS, having a userland agent which has overall knowledge of the
hierarchy and enforcesf structure and limiations is a requirement to
make cgroup generally useable and useful.  For systemd based systems,
systemd serving that role isn't too crazy.  It's sure gonna have
teeting issues at the beginning but it has all the necessary
information to manage workloads on the system.

A valid issue is interoperability between systemd and non-systemd
systems.  I don't have an immediately good answer for that.  I wrote
in another reply but making cgroup generally available is a pretty new
effort and we're still in the process of figuring out what the right
constructs and abstractions are.  Hopefully, we'll be able to reach a
common set of abstractions to base things on top in itme.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Serge Hallyn
Quoting Tejun Heo (t...@kernel.org):
 Hello, Serge.
 
 On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
  At some point (probably soon) we might want to talk about a standard API
  for these things.  However I think it will have to come in the form of
  a standard library, which knows to either send requests over dbus to
  systemd, or over /dev/cgroup sock to the manager.
 
 Yeah, eventually, I think we'll have a standardized way to configure
 resource distribution in the system.  Maybe we'll agree on a
 standardized dbus protocol or there will be library, I don't know;
 however, whatever form it may be in, it abstraction level should be
 way higher than that of direct cgroupfs access.  It's way too low
 level and very easy to end up in a complete nonsense configuration.
 
 e.g. enabling cpu on a cgroup whlie leaving other cgroups alone
 wouldn't enable fair scheduling on that cgroup but drastically reduce
 the amount of cpu share it gets as it now gets treated as single
 entity competing with all tasks at the parent level.

Right.  I *think* this can be offered as a daemon which sits as the
sole consumer of my agent's API, and offers a higher level do what I
want API.  But designing that API is going to be interesting.

I should find a good, up-to-date summary of the current behaviors of
each controller so I can talk more intelligently about it.  (I'll
start by looking at the kernel Documentation/cgroups, but don't
feel too confident that they'll be uptodate :)

 At the moment, I'm not sure what the eventual abstraction would look
 like.  systemd is extending its basic constructs by adding slices and
 scopes and it does make sense to integrate the general organization of
 the system (services, user sessions, VMs and so on) with resource
 management.  Given some time, I'm hoping we'll be able to come up with
 and agree on some common constructs so that each workload can indicate
 its resource requirements in a unified way.
 
 That said, I really think we should experiment for a while before
 trying to settle down on things.  We've now just started exploring how
 system-wide resource managment can be made widely available to systems
 without requiring extremely specialized hand-crafted configurations
 and I'm pretty sure we're getting and gonna get quite a few details
 wrong, so I don't think it'd be a good idea to try to agree on things
 right now.  As far as such integration goes, I think it's time to play
 with things and observe the results.

Right,  I'm not attached to my toy implementation at all - except for
the ability, in some fashion, to have nested agents which don't have
cgroupfs access but talk to another agent to get the job done.

-serge
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Tejun Heo
Hello, Serge.

On Thu, Jun 27, 2013 at 01:14:57PM -0500, Serge Hallyn wrote:
 I should find a good, up-to-date summary of the current behaviors of
 each controller so I can talk more intelligently about it.  (I'll
 start by looking at the kernel Documentation/cgroups, but don't
 feel too confident that they'll be uptodate :)

Heh, it's hopelessly outdated.  Sorry about that.  I'll get around to
updating it eventually.  Right now everything is in flux.

 Right,  I'm not attached to my toy implementation at all - except for
 the ability, in some fashion, to have nested agents which don't have
 cgroupfs access but talk to another agent to get the job done.

I think it probably would be better to allow organization and RO
access to knobs and stat files inside containers, for lower overhead,
if nothing else, and have comm channel for operations which need
supervision at a wider level.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: cgroup: status-quo and userland efforts

2013-06-27 Thread Serge Hallyn
Quoting Tejun Heo (t...@kernel.org):
 Hello, Serge.
 
 On Thu, Jun 27, 2013 at 01:14:57PM -0500, Serge Hallyn wrote:
  I should find a good, up-to-date summary of the current behaviors of
  each controller so I can talk more intelligently about it.  (I'll
  start by looking at the kernel Documentation/cgroups, but don't
  feel too confident that they'll be uptodate :)
 
 Heh, it's hopelessly outdated.  Sorry about that.  I'll get around to
 updating it eventually.  Right now everything is in flux.
 
  Right,  I'm not attached to my toy implementation at all - except for
  the ability, in some fashion, to have nested agents which don't have
  cgroupfs access but talk to another agent to get the job done.
 
 I think it probably would be better to allow organization and RO

What do you mean by organization?  Creating cgroups and moving tasks
between them, without setting other cgroup values?

 access to knobs and stat files inside containers, for lower overhead,
 if nothing else, and have comm channel for operations which need
 supervision at a wider level.
 
 Thanks.
 
 -- 
 tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >