Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-03-02 Thread Tejun Heo

On Tue, Mar 03, 2015 at 12:31:19AM +1100, Aleksa Sarai wrote:
> > If 16-bit PID's aren't a concern anymore, then why do we still default to
> > treating it like a 16-bit signed int (the default for
> > /proc/sys/kernel/pid_max is 32768)?
> 
> I just want to emphasise that *even if* we changed to another default
> limit, the mere existence of a system-wide pid_max makes PIDs a
> resource.

We seem to fail to communicate.  The primary reason why pid promotes
itself to a global resource status is because it's globally capped way
below its backing resource's (kernel memory) limit and it is very
difficult to make it not so due to direct userland dependencies on it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
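
To put rough numbers on the point above, that pid_max caps PIDs well below what kernel memory alone would allow, here is a minimal arithmetic sketch in Python. The per-task kernel memory cost of 32 KiB is purely an assumption for illustration, not a measured figure, and the function name is hypothetical.

```python
KIB = 1024
GIB = 1024 ** 3

# ASSUMPTION: rough kernel memory pinned per task (task_struct, kernel
# stack, and so on). Illustrative only; the real cost varies by kernel
# version and configuration.
ASSUMED_PER_TASK_BYTES = 32 * KIB


def kmem_to_exhaust_pids(pid_max, per_task_bytes=ASSUMED_PER_TASK_BYTES):
    """Kernel memory consumed by the time every PID below pid_max is in use."""
    return pid_max * per_task_bytes


if __name__ == "__main__":
    for pid_max in (32768, 4194304):
        gib = kmem_to_exhaust_pids(pid_max) / GIB
        print(f"pid_max={pid_max}: about {gib:g} GiB of kernel memory")
```

Under that assumption, the default pid_max of 32768 is exhausted after about 1 GiB of kernel memory, which a large machine can spare long before any sensibly sized kmem limit would trigger.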

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-03-02 Thread Tejun Heo

On Mon, Mar 02, 2015 at 08:13:23AM -0500, Austin S Hemmelgarn wrote:
> If 16-bit PID's aren't a concern anymore, then why do we still default to
> treating it like a 16-bit signed int (the default for
> /proc/sys/kernel/pid_max is 32768)?

Inertia.  It has to start there for backward compatibility.  Now it's
trivial to adjust dynamically and the majority of users don't need to
worry about it, so there's no pressing reason to bump it up by
default.

16bit pid_t was already a dying breed on 32bit config and it never was
an option on 64bit.  Any remotely modern distros in the past decade,
whether 32 or 64bit, wouldn't have any problem with it.  The only
possibly problematic case would be legacy code which for some reason
explicitly used 16bit integer types instead of pid_t, but at this
point, we shouldn't be basing any design decisions on that.  If
anybody is still depending on that, there are different ways to deal
with the issue on their end including namespacing its pid space.

Thanks.

-- 
tejun
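
As a small illustration of the pid_max default discussed above, the following Python sketch reads the limit and checks whether every PID would still fit in a signed 16-bit integer. The helper names are hypothetical; the path is the one quoted in the thread, and the check relies on the value in that file being one greater than the largest PID that can be allocated.

```python
from pathlib import Path

INT16_MAX = 2 ** 15 - 1  # largest value a signed 16-bit integer can hold


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the system-wide PID ceiling as an int (Linux only)."""
    return int(Path(path).read_text().strip())


def fits_in_short(pid_max):
    """True if the largest allocatable PID (pid_max - 1) fits in 16 bits."""
    return pid_max - 1 <= INT16_MAX


if __name__ == "__main__":
    limit = read_pid_max()
    print(f"pid_max={limit}, fits in a short: {fits_in_short(limit)}")
```

Raising the limit is a matter of writing a larger value to the same file as root, at which point code that stores PIDs in a short would no longer be safe.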

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-03-02 Thread Aleksa Sarai

> If 16-bit PID's aren't a concern anymore, then why do we still default to
> treating it like a 16-bit signed int (the default for
> /proc/sys/kernel/pid_max is 32768)?

I just want to emphasise that *even if* we changed to another default
limit, the mere existence of a system-wide pid_max makes PIDs a
resource.

--
Aleksa Sarai (cyphar)
www.cyphar.com

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-03-02 Thread Austin S Hemmelgarn


On 2015-02-28 11:43, Tejun Heo wrote:
> Hello, Tim.
>
> On Sat, Feb 28, 2015 at 08:38:07AM -0800, Tim Hockin wrote:
> > I know there is not much concern for legacy-system problems, but it is
> > worth adding this case - there are systems that limit PIDs for other
> > reasons, eg broken infrastructure that assumes PIDs fit in a short int,
> > hypothetically.  Given such a system, PIDs become precious and limiting
> > them per job is important.
> >
> > My main point being that there are less obvious considerations in play than
> > just memory usage.
>
> Sure, there are those cases but it'd be unwise to hinge long term
> decisions on them.  It's hard to even argue 16bit pid in legacy code
> as a significant contributing factor at this point.  At any rate, it
> seems that pid is a global resource which needs to be provisioned for
> reasonable isolation which is a good reason to consider controlling it
> via cgroups.

If 16-bit PID's aren't a concern anymore, then why do we still default
to treating it like a 16-bit signed int (the default for
/proc/sys/kernel/pid_max is 32768)?



Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tim Hockin

On Feb 28, 2015 2:50 PM, "Tejun Heo" wrote:
>
> On Sat, Feb 28, 2015 at 02:26:58PM -0800, Tim Hockin wrote:
> > Wow, so much anger.  I'm not even sure how to respond, so I'll just
> > say this and sign off.  All I want is a better, friendlier, more
> > useful system overall.  We clearly have different ways of looking at
> > the problem.
>
> Can you communicate anything w/o passive aggression?  If you have a
> technical point, just state that.  Can you at least agree that we
> shouldn't be making design decisions based on 16bit pid_t?

Hmm, I have screwed this thread up, I think.  I've made some remarks
that did not come through with the proper tongue-in-cheek slant.  I'm
not being passive aggressive - we DO look at this problem differently.
OF COURSE we should not make decisions based on ancient artifacts of
history.  My point was that there are secondary considerations here -
PIDs are more than just the memory that backs them.  They _ARE_ a
constrained resource, and you shouldn't assume the constraint is just
physical memory.  It is a piece of policy that is outside the control
of the kernel proper - we handed those keys to userspace a long time
ago.

Given that, I believe and have believed that the solution should model
the problem as the user perceives it - limiting PIDs - rather than
attaching to a solution-by-proxy.

Yes a solution here partially overlaps with kmemcg, but I don't think
that is a significant problem.  They are different policies governing
behavior that may result in the same condition, but for very different
reasons.  I do not think that is particularly bad for overall
comprehension, and I think the fact that this popped up yet again
indicates the existence of some nugget of user experience that is
worth paying consideration to.

I appreciate your promised consideration through a slightly refocused
lens.  I will go back to my cave and do something I hope is more
productive and less antagonistic.  I did not mean to bring out so much
vitriol.

Tim

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Johannes Weiner

On Sat, Feb 28, 2015 at 02:26:58PM -0800, Tim Hockin wrote:
> On Sat, Feb 28, 2015 at 8:57 AM, Tejun Heo wrote:
> >
> > On Sat, Feb 28, 2015 at 08:48:12AM -0800, Tim Hockin wrote:
> > > I am sorry that real-user problems are not perceived as substantial.  This
> > > was/is a real issue for us.  Being in limbo for years on end might not be a
> > > technical point, but I do think it matters, and that was my point.
> >
> > It's a problem which is localized to you and caused by the specific
> > problems of your setup.  This isn't a wide-spread problem at all and
> > the world doesn't revolve around you.  If your setup is so messed up
> > as to require sticking to 16bit pids, handle that locally.  If
> > something at larger scale eases that handling, you get lucky.  If not,
> > it's *your* predicament to deal with.  The rest of the world doesn't
> > exist to wipe your ass.
> 
> Wow, so much anger.

Yeah, quite surprising after such an intellectually honest discussion:

: On Fri, Feb 27, 2015 at 01:45:09PM -0800, Tim Hockin wrote:
: > At least 3 or 4 people have INDEPENDENTLY decided this is what is
: > causing them pain and tried to fix it and invested the time to send a
: > patch says that it is actually a thing.  There exists a problem that
: > you are disallowing to be fixed.  Do you recognize that users are
: > experiencing pain?  Why do you hate your users? :)

[...]

: > Are you willing to put a drop-dead date on it?  If we don't have
: > kmemcg working well enough to _actually_ bound PID usage and FD usage
: > by, say, June 1st, will you then accept a patch to this effect?  If
: > the answer is no, then I have zero faith that it's coming any time
: > soon - I heard this 2 years ago.  I believed you then.

> I'm not even sure how to respond, so I'll just say this and sign
> off.  All I want is a better, friendlier, more useful system
> overall.  We clearly have different ways of looking at the problem.

Overlapping features and inconsistent userspace interfaces are only
better for the people that pick the hacks.  They are the opposite of
friendly and useful.  They are also horrible to maintain, which could
be a reason why you constantly disagree with the people that cleaned
up this unholy mess and are now trying to keep a balance between your
short term interests and the long-term health of the Linux kernel.

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tejun Heo

On Sat, Feb 28, 2015 at 02:26:58PM -0800, Tim Hockin wrote:
> Wow, so much anger.  I'm not even sure how to respond, so I'll just
> say this and sign off.  All I want is a better, friendlier, more
> useful system overall.  We clearly have different ways of looking at
> the problem.

Can you communicate anything w/o passive aggression?  If you have a
technical point, just state that.  Can you at least agree that we
shouldn't be making design decisions based on 16bit pid_t?

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tim Hockin

On Sat, Feb 28, 2015 at 8:57 AM, Tejun Heo wrote:
>
> On Sat, Feb 28, 2015 at 08:48:12AM -0800, Tim Hockin wrote:
> > I am sorry that real-user problems are not perceived as substantial.  This
> > was/is a real issue for us.  Being in limbo for years on end might not be a
> > technical point, but I do think it matters, and that was my point.
>
> It's a problem which is localized to you and caused by the specific
> problems of your setup.  This isn't a wide-spread problem at all and
> the world doesn't revolve around you.  If your setup is so messed up
> as to require sticking to 16bit pids, handle that locally.  If
> something at larger scale eases that handling, you get lucky.  If not,
> it's *your* predicament to deal with.  The rest of the world doesn't
> exist to wipe your ass.

Wow, so much anger.  I'm not even sure how to respond, so I'll just
say this and sign off.  All I want is a better, friendlier, more
useful system overall.  We clearly have different ways of looking at
the problem.

No antagonism intended

Tim

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tejun Heo

On Sat, Feb 28, 2015 at 08:48:12AM -0800, Tim Hockin wrote:
> I am sorry that real-user problems are not perceived as substantial.  This
> was/is a real issue for us.  Being in limbo for years on end might not be a
> technical point, but I do think it matters, and that was my point.

It's a problem which is localized to you and caused by the specific
problems of your setup.  This isn't a wide-spread problem at all and
the world doesn't revolve around you.  If your setup is so messed up
as to require sticking to 16bit pids, handle that locally.  If
something at larger scale eases that handling, you get lucky.  If not,
it's *your* predicament to deal with.  The rest of the world doesn't
exist to wipe your ass.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tejun Heo

Hello, Tim.

On Sat, Feb 28, 2015 at 08:38:07AM -0800, Tim Hockin wrote:
> I know there is not much concern for legacy-system problems, but it is
> worth adding this case - there are systems that limit PIDs for other
> reasons, eg broken infrastructure that assumes PIDs fit in a short int,
> hypothetically.  Given such a system, PIDs become precious and limiting
> them per job is important.
>
> My main point being that there are less obvious considerations in play than
> just memory usage.

Sure, there are those cases but it'd be unwise to hinge long term
decisions on them.  It's hard to even argue 16bit pid in legacy code
as a significant contributing factor at this point.  At any rate, it
seems that pid is a global resource which needs to be provisioned for
reasonable isolation which is a good reason to consider controlling it
via cgroups.

Thanks.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tejun Heo

Hello, Aleksa.

On Sat, Feb 28, 2015 at 08:26:34PM +1100, Aleksa Sarai wrote:
> I just want to quickly echo my support for this statement. Process IDs
> aren't limited by kernel memory, they're a hard-set limit. Thus they are

Process IDs become a hard global resource because we didn't switch to
long during 64bit transition and put an artificial global limit on it,
which allows it to affect system-wide operation while its memory
consumption is staying within practical range.

> a resource like other global resources (open files, etc). Now, while you

Unlike open files.

> can argue that it is possible to limit the amount of *effective*
> processes you can use in a cgroup through kmemcg (by limiting the amount
> of memory spent in storing task_struct data) -- that isn't limiting the
> usage of the *actual* resource (the fact you're limiting the number of
> PIDs is little more than a by-product).

No, the problem is not that.  The problem is that pid_t, as a
resource, is decoupled from its backing resource - memory - by the
extra artificial and difficult-to-overcome limit put on it.  You are
saying something which is completely different from what Austin was
arguing.

> Also, if it wasn't an actual resource then why is RLIMIT_NPROC a thing?

One strong reason would be because we didn't have a way to account for
and limit the fundamental resources.  If you can fully contain and
control the consumption via rationing the underlying resource, there
isn't much point in controlling the upper layer constructs.

> To me, that indicates that PID limiting is not an esoteric use case and it
> should be possible to use the Linux kernel's home-grown accounting
> system to limit the number of PIDs in a cgroup. Otherwise you're stuck

Again, I think it's a lot more indicative of the fact that we didn't
have any way to control kernel side memory consumption, and pids and
open files were among the things which were relatively easy to
implement policy-wise.

> in a weird world where you *can* limit the number of processes in a
> process tree but *not* the number of processes in a cgroup.

I'm not sold on the idea of replicating the features of ulimit in
cgroups.  ulimit is a mixed bag of relatively easily implementable
resource limits and their behaviors are a combination of resource
limits, per-user usage policies, and per-process behavior safety nets.
The only part translatable to cgroups is the actual resource-related
part, and even among those we should identify which are actual resources
that can't be mapped to consumption of other fundamental resources.

> >> In general, I'm pretty strongly against adding controllers for things
> >> which aren't fundamental resources in the system.  What's next?  Open
> >> files?  Pipe buffer?  Number of flocks?  Number of session leaders or
> >> program groups?
> >>
> > PID's are a fundamental resource, you run out and it's an only marginally
> > better situation than OOM, namely, if you don't already have a shell open
> > which has kill builtin (because you can't fork), or have some other reliable
> > way to terminate processes without forking, you are stuck either waiting for
> > the problem to resolve itself, or have to reset the system.
> 
> I couldn't agree more. PIDs are a fundamental resource because there is
> a hard limit on the amount of PIDs you can have in any one system. Once
> you've exhausted that limit, there's not much you can do apart from
> doing the SYSRQ dance.

The reason why this holds is because we can hit the global limit way
earlier than practically sized kmem consumption limits can kick in.

Thanks.

-- 
tejun
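
For readers unfamiliar with the RLIMIT_NPROC mechanism debated above, here is a minimal Python sketch that queries it through the standard `resource` module. The helper names are hypothetical. Note that on Linux this limit is accounted per real user ID rather than per cgroup, which is part of why the thread treats it as a different tool from a cgroup controller.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) RLIMIT_NPROC values for the calling process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"RLIMIT_NPROC soft={describe(soft)} hard={describe(hard)}")
```

A process can lower its own soft limit with `resource.setrlimit`, but there is no way with this interface to express "at most N tasks in this group of processes", which is the gap the proposed controller targets.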

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Aleksa Sarai

> I wouldn't think that preventing PID exhaustion would be all that much of a
> niche case, it's fully possible for it to happen without using excessive
> amounts of kernel memory (think about BIG server systems with terabytes of
> memory running (arguably poorly written) forking servers that handle tens of
> thousands of client requests per second, each lasting multiple tens of
> seconds), and not necessarily as trivial as you might think to handle sanely
> (especially if you want callbacks when the limits get hit).
> As far as being trivial to achieve, I'm assuming you are referring to rlimit
> and PAM's limits module, both of which have their own issues. Using
> pam_limits.so to limit processes isn't trivial because it requires calling
> through PAM to begin with, which almost no software that isn't login related
> does, and rlimits are tricky to set up properly with the granularity that
> having a cgroup would provide.

I just want to quickly echo my support for this statement. Process IDs
aren't limited by kernel memory, they're a hard-set limit. Thus they are
a resource like other global resources (open files, etc). Now, while you
can argue that it is possible to limit the amount of *effective*
processes you can use in a cgroup through kmemcg (by limiting the amount
of memory spent in storing task_struct data) -- that isn't limiting the
usage of the *actual* resource (the fact you're limiting the number of
PIDs is little more than a by-product).

Also, if it wasn't an actual resource then why is RLIMIT_NPROC a thing?
To me, that indicates that PID limiting is not an esoteric use case and it
should be possible to use the Linux kernel's home-grown accounting
system to limit the number of PIDs in a cgroup. Otherwise you're stuck
in a weird world where you *can* limit the number of processes in a
process tree but *not* the number of processes in a cgroup.

>> In general, I'm pretty strongly against adding controllers for things
>> which aren't fundamental resources in the system.  What's next?  Open
>> files?  Pipe buffer?  Number of flocks?  Number of session leaders or
>> program groups?
>>
> PID's are a fundamental resource, you run out and it's an only marginally
> better situation than OOM, namely, if you don't already have a shell open
> which has kill builtin (because you can't fork), or have some other reliable
> way to terminate processes without forking, you are stuck either waiting for
> the problem to resolve itself, or have to reset the system.

I couldn't agree more. PIDs are a fundamental resource because there is
a hard limit on the amount of PIDs you can have in any one system. Once
you've exhausted that limit, there's not much you can do apart from
doing the SYSRQ dance.

-- 
Aleksa Sarai (cyphar)
www.cyphar.com

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tejun Heo

Hello, Aleksa.

On Sat, Feb 28, 2015 at 08:26:34PM +1100, Aleksa Sarai wrote:
 I just want to quickly echo my support for this statement. Process IDs
 aren't limited by kernel memory, they're a hard-set limit. Thus they are

Process IDs become a hard global resource because we didn't switch to
long during 64bit transition and put an artifical global limit on it,
which allows it to affect system-wide operation while its memory
consumption is staying within practical range.

 a resource like other global resources (open files, etc). Now, while you

Unlike open files.

 can argue that it is possible to limit the amount of *effective*
 processes you can use in a cgroup through kmemcg (by limiting the amount
 of memory spent in storing task_struct data) -- that isn't limiting the
 usage of the *actual* resource (the fact you're limiting the number of
 PIDs is little more than a by-product).

No, the problem is not that.  The problem is that pid_t is, as a
resource, is decoupled from its backing resource - memory - by the
extra artificial and difficult-to-overcome limit put on it.  You are
saying something which is completely different from what Austin was
arguing.

 Also, If it wasn't an actual resource then why is RLIMIT_NPROC a thing?

One strong reason would be because we didn't have a way to account for
and limit the fundamental resources.  If you can fully contain and
control the consumption via rationing the underlying resource, there
isn't much point in controlling the upper layer constructs.

 To me, that indicates that PID limiting not an esoteric usecase and it
 should be possible to use the Linux kernel's home-grown accounting
 system to limit the number of PIDs in a cgroup. Otherwise you're stuck

Again, I think it's a lot more indicative of the fact that we didn't
have any way to control kernel side memory consumption and pids and
open files were one of the things which are relatively easy to
implement policy-wise.

 in a weird world where you *can* limit the number of processes in a
 process tree but *not* the number of processes in a cgroup.

I'm not sold on the idea of replicating the features of ulimit in
cgroups.  ulimit is a mixed bag of relatively easily implementable
resource limits and their behaviors are a combination of resource
limits, per-user usage policies, and per-process behavior safetynets.
The only part translatable to cgroups is actual resource related part
and even among those we should identify what are actual resources
which can't be mapped to consumption of other fundamental resources.

  In general, I'm pretty strongly against adding controllers for things
  which aren't fundamental resources in the system.  What's next?  Open
  files?  Pipe buffer?  Number of flocks?  Number of session leaders or
  program groups?
 
  PID's are a fundamental resource, you run out and it's an only marginally
  better situation than OOM, namely, if you don't already have a shell open
  which has kill builtin (because you can't fork), or have some other reliable
  way to terminate processes without forking, you are stuck either waiting for
  the problem to resolve itself, or have to reset the system.
 
 I couldn't agree more. PIDs are a fundamental resource because there is
 a hard limit on the amount of PIDs you can have in any one system. Once
 you've exhausted that limit, there's not much you can do apart from
 doing the SYSRQ dance.

The reason why this holds is because we can hit the global limit way
earlier than a practically sized kmem consumption limits can kick in.

Thanks.

-- 
tejun
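
For concreteness, the global cap being argued about is the value exposed
in /proc/sys/kernel/pid_max.  The following is a minimal sketch, not
anything taken from the patch set, of how remaining PID headroom could be
estimated from userspace.  The helper names are made up for illustration,
and counting numeric /proc entries is an assumption that undercounts real
usage, because threads draw their IDs from the same space.

```python
import os


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the system-wide PID cap as an int, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def count_processes(proc_dir="/proc"):
    """Count numeric entries under /proc (one per process, threads excluded)."""
    try:
        return sum(1 for name in os.listdir(proc_dir) if name.isdigit())
    except OSError:
        return None


def pid_headroom(pid_max, in_use):
    """IDs left before the global cap is hit; never negative."""
    return max(pid_max - in_use, 0)


if __name__ == "__main__":
    pid_max = read_pid_max()
    in_use = count_processes()
    if pid_max is not None and in_use is not None:
        print(f"pid_max={pid_max} processes={in_use} "
              f"headroom={pid_headroom(pid_max, in_use)}")
```

With the default pid_max of 32768 quoted earlier in the thread, a single
runaway forking job can consume the whole headroom long before kernel
memory becomes the binding limit, which is the point being conceded above.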

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tim Hockin

On Sat, Feb 28, 2015 at 8:57 AM, Tejun Heo <t...@kernel.org> wrote:

> On Sat, Feb 28, 2015 at 08:48:12AM -0800, Tim Hockin wrote:
> > I am sorry that real-user problems are not perceived as substantial.  This
> > was/is a real issue for us.  Being in limbo for years on end might not be a
> > technical point, but I do think it matters, and that was my point.
>
> It's a problem which is localized to you and caused by the specific
> problems of your setup.  This isn't a wide-spread problem at all and
> the world doesn't revolve around you.  If your setup is so messed up
> as to require sticking to 16bit pids, handle that locally.  If
> something at larger scale eases that handling, you get lucky.  If not,
> it's *your* predicament to deal with.  The rest of the world doesn't
> exist to wipe your ass.

Wow, so much anger.  I'm not even sure how to respond, so I'll just
say this and sign off.  All I want is a better, friendlier, more
useful system overall.  We clearly have different ways of looking at
the problem.

No antagonism intended

Tim

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tejun Heo

On Sat, Feb 28, 2015 at 02:26:58PM -0800, Tim Hockin wrote:
> Wow, so much anger.  I'm not even sure how to respond, so I'll just
> say this and sign off.  All I want is a better, friendlier, more
> useful system overall.  We clearly have different ways of looking at
> the problem.

Can you communicate anything w/o passive aggression?  If you have a
technical point, just state that.  Can you at least agree that we
shouldn't be making design decisions based on 16bit pid_t?

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Johannes Weiner

On Sat, Feb 28, 2015 at 02:26:58PM -0800, Tim Hockin wrote:
> On Sat, Feb 28, 2015 at 8:57 AM, Tejun Heo <t...@kernel.org> wrote:
>
> > On Sat, Feb 28, 2015 at 08:48:12AM -0800, Tim Hockin wrote:
> > > I am sorry that real-user problems are not perceived as substantial.  This
> > > was/is a real issue for us.  Being in limbo for years on end might not be a
> > > technical point, but I do think it matters, and that was my point.
> >
> > It's a problem which is localized to you and caused by the specific
> > problems of your setup.  This isn't a wide-spread problem at all and
> > the world doesn't revolve around you.  If your setup is so messed up
> > as to require sticking to 16bit pids, handle that locally.  If
> > something at larger scale eases that handling, you get lucky.  If not,
> > it's *your* predicament to deal with.  The rest of the world doesn't
> > exist to wipe your ass.
>
> Wow, so much anger.

Yeah, quite surprising after such an intellectually honest discussion:

: On Fri, Feb 27, 2015 at 01:45:09PM -0800, Tim Hockin wrote:
: > At least 3 or 4 people have INDEPENDENTLY decided this is what is
: > causing them pain and tried to fix it and invested the time to send a
: > patch says that it is actually a thing.  There exists a problem that
: > you are disallowing to be fixed.  Do you recognize that users are
: > experiencing pain?  Why do you hate your users? :)

[...]

: > Are you willing to put a drop-dead date on it?  If we don't have
: > kmemcg working well enough to _actually_ bound PID usage and FD usage
: > by, say, June 1st, will you then accept a patch to this effect?  If
: > the answer is no, then I have zero faith that it's coming any time
: > soon - I heard this 2 years ago.  I believed you then.

> I'm not even sure how to respond, so I'll just say this and sign
> off.  All I want is a better, friendlier, more useful system
> overall.  We clearly have different ways of looking at the problem.

Overlapping features and inconsistent userspace interfaces are only
better for the people that pick the hacks.  They are the opposite of
friendly and useful.  They are also horrible to maintain, which could
be a reason why you constantly disagree with the people that cleaned
up this unholy mess and are now trying to keep a balance between your
short term interests and the long-term health of the Linux kernel.

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tejun Heo

On Sat, Feb 28, 2015 at 08:48:12AM -0800, Tim Hockin wrote:
> I am sorry that real-user problems are not perceived as substantial.  This
> was/is a real issue for us.  Being in limbo for years on end might not be a
> technical point, but I do think it matters, and that was my point.

It's a problem which is localized to you and caused by the specific
problems of your setup.  This isn't a wide-spread problem at all and
the world doesn't revolve around you.  If your setup is so messed up
as to require sticking to 16bit pids, handle that locally.  If
something at larger scale eases that handling, you get lucky.  If not,
it's *your* predicament to deal with.  The rest of the world doesn't
exist to wipe your ass.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tejun Heo

Hello, Tim.

On Sat, Feb 28, 2015 at 08:38:07AM -0800, Tim Hockin wrote:
> I know there is not much concern for legacy-system problems, but it is
> worth adding this case - there are systems that limit PIDs for other
> reasons, eg broken infrastructure that assumes PIDs fit in a short int,
> hypothetically.  Given such a system, PIDs become precious and limiting
> them per job is important.
>
> My main point being that there are less obvious considerations in play than
> just memory usage.

Sure, there are those cases but it'd be unwise to hinge long term
decisions on them.  It's hard to even argue 16bit pid in legacy code
as a significant contributing factor at this point.  At any rate, it
seems that pid is a global resource which needs to be provisioned for
reasonable isolation which is a good reason to consider controlling it
via cgroups.

Thanks.

-- 
tejun
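
The "PIDs fit in a short int" failure mode mentioned above is easy to
demonstrate.  This is only an illustrative sketch: as_int16 is a
hypothetical helper that mimics what legacy code would see if it stored a
pid in a signed 16-bit field, and the sample values are chosen for the
example rather than taken from any real system.

```python
import ctypes


def as_int16(pid):
    """Reinterpret the low 16 bits of pid as a signed 16-bit integer.

    ctypes integer types do no overflow checking, so this matches the
    silent truncation a C `short` assignment would perform.
    """
    return ctypes.c_int16(pid).value


if __name__ == "__main__":
    # 32767 is the largest pid under the default pid_max of 32768.
    for pid in (32767, 32768, 40000, 4194303):
        print(f"pid {pid:>7} stored in a short int reads back as {as_int16(pid)}")
```

Any pid at or above 32768 no longer round-trips, which is why such a
setup has to keep pid_max at its default and therefore treats PIDs as
scarce.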

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-28 Thread Tim Hockin

On Feb 28, 2015 2:50 PM, Tejun Heo <t...@kernel.org> wrote:

> On Sat, Feb 28, 2015 at 02:26:58PM -0800, Tim Hockin wrote:
> > Wow, so much anger.  I'm not even sure how to respond, so I'll just
> > say this and sign off.  All I want is a better, friendlier, more
> > useful system overall.  We clearly have different ways of looking at
> > the problem.
>
> Can you communicate anything w/o passive aggression?  If you have a
> technical point, just state that.  Can you at least agree that we
> shouldn't be making design decisions based on 16bit pid_t?

Hmm, I have screwed this thread up, I think.  I've made some remarks
that did not come through with the proper tongue-in-cheek slant.  I'm
not being passive aggressive - we DO look at this problem differently.
OF COURSE we should not make decisions based on ancient artifacts of
history.  My point was that there are secondary considerations here -
PIDs are more than just the memory that backs them.  They _ARE_ a
constrained resource, and you shouldn't assume the constraint is just
physical memory.  It is a piece of policy that is outside the control
of the kernel proper - we handed those keys to userspace a long time
ago.

Given that, I believe and have believed that the solution should model
the problem as the user perceives it - limiting PIDs - rather than
attaching to a solution-by-proxy.

Yes a solution here partially overlaps with kmemcg, but I don't think
that is a significant problem.  They are different policies governing
behavior that may result in the same condition, but for very different
reasons.  I do not think that is particularly bad for overall
comprehension, and I think the fact that this popped up yet again
indicates the existence of some nugget of user experience that is
worth paying consideration to.

I appreciate your promised consideration through a slightly refocused
lens.  I will go back to my cave and do something I hope is more
productive and less antagonistic.  I did not mean to bring out so much
vitriol.

Tim

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

On Fri, Feb 27, 2015 at 01:45:09PM -0800, Tim Hockin wrote:
> Are you willing to put a drop-dead date on it?  If we don't have
> kmemcg working well enough to _actually_ bound PID usage and FD usage
> by, say, June 1st, will you then accept a patch to this effect?  If
> the answer is no, then I have zero faith that it's coming any time
> soon - I heard this 2 years ago.  I believed you then.

Tim, cut this bullshit.  That's not how kernel development works.
Contribute to technical discussion or shut it.  I'm really getting
tired of your whining without any useful substance.

> I see further downthread that you said you'll think about it.  Thank
> you.  Just because our use cases are not normal does not mean we're
> not valid :)

And can you even see why that made progress?

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tim Hockin

On Fri, Feb 27, 2015 at 9:45 AM, Tejun Heo wrote:
> On Fri, Feb 27, 2015 at 09:25:10AM -0800, Tim Hockin wrote:
>> > In general, I'm pretty strongly against adding controllers for things
>> > which aren't fundamental resources in the system.  What's next?  Open
>> > files?  Pipe buffer?  Number of flocks?  Number of session leaders or
>> > program groups?
>>
>> Yes to some or all of those.  We do exactly this internally and it has
>> greatly added to the stability of our overall container management
>> system.  and while you have been telling everyone to wait for kmemcg,
>> we have had an extra 3+ years of stability.
>
> Yeah, good job.  I totally get why kernel part of memory consumption
> needs protection.  I'm not arguing against that at all.

You keep shifting the focus to be about memory, but that's not what
people are asking for.  You're letting the desire for a perfect
solution (which is years late) block good solutions that exist NOW.

>> > If you want to prevent a certain class of jobs from exhausting a given
>> > resource, protecting that resource is the obvious thing to do.
>>
>> I don't follow your argument - isn't this exactly what this patch set
>> is doing - protecting resources?
>
> If you have proper protection over kernel memory consumption, this is
> completely covered because memory is the fundamental resource here.
> Controlling distribution of those fundamental resources is what
> cgroups are primarily about.

You say that's what cgroups are about, but it's not at all obvious
that you are right.  What users, admins, systems people want is
building blocks that are usable and make sense.  Limiting kernel
memory is NOT the logical building block, here.  It's not something
people can reason about or quantify easily.  If you need to implement
the interfaces in terms of memory, go nuts, but making users think
like that is just not right.

>> > Wasn't it like a year ago?  Yeah, it's taking longer than everybody
>> > hoped but seriously kmemcg reclaimer just got merged and also did the
>> > new memcg interface which will tie kmemcg and memcg together.
>>
>> By my email it was almost 2 years ago, and that was the second or
>> third incarnation of this patch.
>
> Again, I agree this is taking a while.  Memory people had to retool
> the whole reclamation path to make this work, which is the pattern
> being repeated across the different controllers - we're refactoring a
> lot of infrastructure code so that resource control can integrate with
> the regular operation of the kernel, which BTW is what we should have
> been doing from the beginning.
>
> If your complaint is that this is taking too long, I hear you, and
> there's a certain amount of validity in arguing that upstreaming a
> temporary measure is the better trade-off, but the rationale for nproc
> (or nfds, or virtual memory, whatever) has been pretty weak otherwise.

At least 3 or 4 people have INDEPENDENTLY decided this is what is
causing them pain and tried to fix it and invested the time to send a
patch says that it is actually a thing.  There exists a problem that
you are disallowing to be fixed.  Do you recognize that users are
experiencing pain?  Why do you hate your users? :)

> And as for the different incarnations of this patchset.  Reposting the
> same stuff repeatedly doesn't really change anything.  Why would it?

Because reasonable people might survey the ecosystem and say "humm,
things have changed over the years - isolation has become a pretty
serious topic".  Or maybe they hope that you'll finally agree that
fixing the problem NOW is worthwhile, even if the solution is
imperfect, and that a more perfect solution will arrive.

>> >> Something like this is long overdue, IMO, and is still more
>> >> appropriate and obvious than kmemcg anyway.
>> >
>> > Thanks for chiming in again but if you aren't bringing out anything
>> > new to the table (I don't remember you doing that last time either),
>> > I'm not sure why the decision would be different this time.
>>
>> I'm just vocalizing my support for this idea in defense of practical
>> solutions that work NOW instead of "engineering ideals" that never
>> actually arrive.
>>
>> As containers take the server world by storm, stuff like this gets
>> more and more important.
>
> Again, protection of kernel side memory consumption is important.
> There's no question about that.  As for the never-arriving part, well,
> it is arriving.  If you still can't believe, just take a look at the
> code.

Are you willing to put a drop-dead date on it?  If we don't have
kmemcg working well enough to _actually_ bound PID usage and FD usage
by, say, June 1st, will you then accept a patch to this effect?  If
the answer is no, then I have zero faith that it's coming any time
soon - I heard this 2 years ago.  I believed you then.

I see further downthread that you said you'll think about it.  Thank
you.  Just because our use cases are not normal does not mean we're
not valid :)

Tim

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

Hello, Austin.

On Fri, Feb 27, 2015 at 01:49:53PM -0500, Austin S Hemmelgarn wrote:
> As far as being trivial to achieve, I'm assuming you are referring to rlimit
> and PAM's limits module, both of which have their own issues. Using
> pam_limits.so to limit processes isn't trivial because it requires calling
> through PAM to begin with, which almost no software that isn't login related
> does, and rlimits are tricky to set up properly with the granularity that
> having a cgroup would provide.
...
> PID's are a fundamental resource, you run out and it's an only marginally
> better situation than OOM, namely, if you don't already have a shell open
> which has kill builtin (because you can't fork), or have some other reliable
> way to terminate processes without forking, you are stuck either waiting for
> the problem to resolve itself, or have to reset the system.

Right, this is a much more valid argument.  Currently, we're capping
max pid at 4M which translates to some tens of gigs of memory which
isn't a crazy amount on modern machines.  The hard(er) barrier would
be around 2^30 (2^29 from futex side, apparently) which would also be
reachable on configurations w/ terabytes of memory.

I'll think more about it and get back.

Thanks a lot.

-- 
tejun
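
The "tens of gigs" and "terabytes" figures above can be checked with
back-of-the-envelope arithmetic.  In the sketch below the per-task
kernel memory cost (8 to 16 KiB) is an assumed, illustrative range and
not a measured number; only the 4M and 2^30 task counts come from the
message.

```python
PID_CAP_4M = 4 * 1024 * 1024   # the "4M" max pid cap mentioned above
PID_CAP_2_30 = 2 ** 30         # the "hard(er) barrier" mentioned above


def kernel_memory_gib(num_tasks, bytes_per_task):
    """Kernel memory, in GiB, needed to back num_tasks tasks."""
    return num_tasks * bytes_per_task / 2 ** 30


if __name__ == "__main__":
    # Assumed per-task cost; real figures depend on kernel and config.
    for kib in (8, 16):
        cost = kib * 1024
        print(f"{kib:>2} KiB/task: "
              f"4M tasks -> {kernel_memory_gib(PID_CAP_4M, cost):.0f} GiB, "
              f"2^30 tasks -> {kernel_memory_gib(PID_CAP_2_30, cost):.0f} GiB")
```

Under those assumptions 4M tasks need roughly 32 to 64 GiB and 2^30
tasks need 8 to 16 TiB, which is consistent with the claim that the PID
cap can bind before memory does on large machines.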

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Austin S Hemmelgarn


On 2015-02-27 12:06, Tejun Heo wrote:

> Hello,
>
> On Fri, Feb 27, 2015 at 11:42:10AM -0500, Austin S Hemmelgarn wrote:
>> Kernel memory consumption isn't the only valid reason to want to limit the
>> number of processes in a cgroup.  Limiting the number of processes is very
>> useful to ensure that a program is working correctly (for example, the NTP
>> daemon should (usually) have an _exact_ number of children if it is
>> functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_
>> children), to prevent PID number exhaustion, to head off DoS attacks against
>> forking network servers before they get to the point of causing kmem
>> exhaustion, and to limit the number of processes in a cgroup that uses lots
>> of kernel memory very infrequently.
>
> All the use cases you're listing are extremely niche and can be
> trivially achieved without introducing another cgroup controller.  Not
> only that, they're actually pretty silly.  Let's say NTP daemon is
> misbehaving (or its code changed w/o you knowing or there are corner
> cases which trigger extremely infrequently).  What do you exactly
> achieve by rejecting its fork call?  It's just adding another
> variation to the misbehavior.  It was misbehaving before and would now
> be continuing to misbehave after a failed fork.

I wouldn't think that preventing PID exhaustion would be all that much
of a niche case.  It's fully possible for it to happen without using
excessive amounts of kernel memory (think about BIG server systems with
terabytes of memory running (arguably poorly written) forking servers
that handle tens of thousands of client requests per second, each
lasting multiple tens of seconds), and it's not necessarily as trivial
as you might think to handle sanely (especially if you want callbacks
when the limits get hit).

As far as being trivial to achieve, I'm assuming you are referring to
rlimit and PAM's limits module, both of which have their own issues.
Using pam_limits.so to limit processes isn't trivial because it requires
calling through PAM to begin with, which almost no software that isn't
login related does, and rlimits are tricky to set up properly with the
granularity that having a cgroup would provide.

> In general, I'm pretty strongly against adding controllers for things
> which aren't fundamental resources in the system.  What's next?  Open
> files?  Pipe buffer?  Number of flocks?  Number of session leaders or
> program groups?

PID's are a fundamental resource, you run out and it's an only
marginally better situation than OOM, namely, if you don't already have
a shell open which has kill builtin (because you can't fork), or have
some other reliable way to terminate processes without forking, you are
stuck either waiting for the problem to resolve itself, or have to reset
the system.

> If you want to prevent a certain class of jobs from exhausting a given
> resource, protecting that resource is the obvious thing to do.

Which is why I'm advocating something that provides a more robust method
of preventing the system from exhausting PID numbers.

Thanks.
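
To make the requested semantics concrete, here is a toy model of a
per-group process cap.  It is a sketch under stated assumptions and not
the implementation in the patch set: the Group class, its method names,
and the rule that a fork is charged to the group and every ancestor are
all illustrative choices.

```python
class Group:
    """Toy stand-in for a cgroup with an optional process limit."""

    def __init__(self, name, limit=None, parent=None):
        self.name = name
        self.limit = limit      # None means "no limit at this level"
        self.parent = parent
        self.count = 0          # processes charged to this subtree

    def _chain(self):
        """Yield this group followed by each of its ancestors."""
        group = self
        while group is not None:
            yield group
            group = group.parent

    def try_fork(self):
        """Charge one process to the whole chain, or refuse the fork."""
        chain = list(self._chain())
        for group in chain:
            if group.limit is not None and group.count + 1 > group.limit:
                return False    # would exceed a limit somewhere up the tree
        for group in chain:
            group.count += 1
        return True

    def task_exit(self):
        """Uncharge one process from the whole chain."""
        for group in self._chain():
            group.count -= 1


if __name__ == "__main__":
    root = Group("root", limit=3)
    ntp = Group("ntp", limit=2, parent=root)
    web = Group("web", parent=root)
    print([ntp.try_fork() for _ in range(3)])   # third fork is refused
    print(web.try_fork(), web.try_fork())       # root limit stops the second
```

The point of the model is that one misbehaving group runs into its own
limit without being able to consume the identifiers available to its
siblings.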




Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

On Fri, Feb 27, 2015 at 12:45:03PM -0500, Tejun Heo wrote:
> If your complaint is that this is taking too long, I hear you, and
> there's a certain amount of validity in arguing that upstreaming a
> temporary measure is the better trade-off, but the rationale for nproc
> (or nfds, or virtual memory, whatever) has been pretty weak otherwise.

Also, note that this is a subset of a larger problem.  e.g. there's a
patchset trying to implement writeback IO control from the filesystem
layer.  cgroup control of writeback has been a thorny issue for over
three years now and the rationale for implementing this reversed
controlling scheme is about the same - doing it properly is too
difficult, let's bolt something on the top as a practical measure.

I think it'd be seriously short-sighted to give in and merge all
those.  These sorts of shortcuts are crippling in the long term.
Again, similarly, proper cgroup writeback support is literally right
around the corner.

The situation sure can be frustrating if you need something now but we
can't make decisions solely on that.  This is a much longer-term
project and we had better, for once, get things right.

Thanks.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

On Fri, Feb 27, 2015 at 09:25:10AM -0800, Tim Hockin wrote:
> > In general, I'm pretty strongly against adding controllers for things
> > which aren't fundamental resources in the system.  What's next?  Open
> > files?  Pipe buffer?  Number of flocks?  Number of session leaders or
> > program groups?
> 
> Yes to some or all of those.  We do exactly this internally and it has
> greatly added to the stability of our overall container management
> system.  and while you have been telling everyone to wait for kmemcg,
> we have had an extra 3+ years of stability.

Yeah, good job.  I totally get why kernel part of memory consumption
needs protection.  I'm not arguing against that at all.

> > If you want to prevent a certain class of jobs from exhausting a given
> > resource, protecting that resource is the obvious thing to do.
> 
> I don't follow your argument - isn't this exactly what this patch set
> is doing - protecting resources?

If you have proper protection over kernel memory consumption, this is
completely covered because memory is the fundamental resource here.
Controlling distribution of those fundamental resources is what
cgroups are primarily about.

> > Wasn't it like a year ago?  Yeah, it's taking longer than everybody
> > hoped but seriously kmemcg reclaimer just got merged and also did the
> > new memcg interface which will tie kmemcg and memcg together.
> 
> By my email it was almost 2 years ago, and that was the second or
> third incarnation of this patch.

Again, I agree this is taking a while.  Memory people had to retool
the whole reclamation path to make this work, which is the pattern
being repeated across the different controllers - we're refactoring a
lot of infrastructure code so that resource control can integrate with
the regular operation of the kernel, which BTW is what we should have
been doing from the beginning.

If your complaint is that this is taking too long, I hear you, and
there's a certain amount of validity in arguing that upstreaming a
temporary measure is the better trade-off, but the rationale for nproc
(or nfds, or virtual memory, whatever) has been pretty weak otherwise.

And as for the different incarnations of this patchset.  Reposting the
same stuff repeatedly doesn't really change anything.  Why would it?

> >> Something like this is long overdue, IMO, and is still more
> >> appropriate and obvious than kmemcg anyway.
> >
> > Thanks for chiming in again but if you aren't bringing out anything
> > new to the table (I don't remember you doing that last time either),
> > I'm not sure why the decision would be different this time.
> 
> I'm just vocalizing my support for this idea in defense of practical
> solutions that work NOW instead of "engineering ideals" that never
> actually arrive.
> 
> As containers take the server world by storm, stuff like this gets
> more and more important.

Again, protection of kernel side memory consumption is important.
There's no question about that.  As for the never-arriving part, well,
it is arriving.  If you still can't believe, just take a look at the
code.

Thanks.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tim Hockin

On Fri, Feb 27, 2015 at 9:06 AM, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 27, 2015 at 11:42:10AM -0500, Austin S Hemmelgarn wrote:
>> Kernel memory consumption isn't the only valid reason to want to limit the
>> number of processes in a cgroup.  Limiting the number of processes is very
>> useful to ensure that a program is working correctly (for example, the NTP
>> daemon should (usually) have an _exact_ number of children if it is
>> functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_
>> children), to prevent PID number exhaustion, to head off DoS attacks against
>> forking network servers before they get to the point of causing kmem
>> exhaustion, and to limit the number of processes in a cgroup that uses lots
>> of kernel memory very infrequently.
>
> All the use cases you're listing are extremely niche and can be
> trivially achieved without introducing another cgroup controller.  Not
> only that, they're actually pretty silly.  Let's say NTP daemon is
> misbehaving (or its code changed w/o you knowing or there are corner
> cases which trigger extremely infrequently).  What do you exactly
> achieve by rejecting its fork call?  It's just adding another
> variation to the misbehavior.  It was misbehaving before and would now
> be continuing to misbehave after a failed fork.
>
> In general, I'm pretty strongly against adding controllers for things
> which aren't fundamental resources in the system.  What's next?  Open
> files?  Pipe buffer?  Number of flocks?  Number of session leaders or
> program groups?

Yes to some or all of those.  We do exactly this internally and it has
greatly added to the stability of our overall container management
system.  and while you have been telling everyone to wait for kmemcg,
we have had an extra 3+ years of stability.

> If you want to prevent a certain class of jobs from exhausting a given
> resource, protecting that resource is the obvious thing to do.

I don't follow your argument - isn't this exactly what this patch set
is doing - protecting resources?

> Wasn't it like a year ago?  Yeah, it's taking longer than everybody
> hoped but seriously kmemcg reclaimer just got merged and also did the
> new memcg interface which will tie kmemcg and memcg together.

By my email it was almost 2 years ago, and that was the second or
third incarnation of this patch.

>> Something like this is long overdue, IMO, and is still more
>> appropriate and obvious than kmemcg anyway.
>
> Thanks for chiming in again but if you aren't bringing out anything
> new to the table (I don't remember you doing that last time either),
> I'm not sure why the decision would be different this time.

I'm just vocalizing my support for this idea in defense of practical
solutions that work NOW instead of "engineering ideals" that never
actually arrive.

As containers take the server world by storm, stuff like this gets
more and more important.

Tim

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

On Fri, Feb 27, 2015 at 09:12:45AM -0800, Tim Hockin wrote:
> I was told that the plan was to use kmemcg - but I was told that YEARS
> AGO.  In the mean time we all either do our own thing or we do nothing
> and suffer.

Wasn't it like a year ago?  Yeah, it's taking longer than everybody
hoped but seriously kmemcg reclaimer just got merged and also did the
new memcg interface which will tie kmemcg and memcg together.

> Something like this is long overdue, IMO, and is still more
> appropriate and obvious than kmemcg anyway.

Thanks for chiming in again but if you aren't bringing out anything
new to the table (I don't remember you doing that last time either),
I'm not sure why the decision would be different this time.

Thanks.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tim Hockin

On Fri, Feb 27, 2015 at 8:42 AM, Austin S Hemmelgarn wrote:
> On 2015-02-27 06:49, Tejun Heo wrote:
>>
>> Hello,
>>
>> On Mon, Feb 23, 2015 at 02:08:09PM +1100, Aleksa Sarai wrote:
>>>
>>> The current state of resource limitation for the number of open
>>> processes (as well as the number of open file descriptors) requires you
>>> to use setrlimit(2), which means that you are limited to resource
>>> limiting process trees rather than resource limiting cgroups (which is
>>> the point of cgroups).
>>>
>>> There was a patch to implement this in 2011[1], but that was rejected
>>> because it implemented a general-purpose rlimit subsystem -- which meant
>>> that you couldn't control distinct resource limits in different
>>> hierarchies. This patch implements a resource controller *specifically*
>>> for the number of processes in a cgroup, overcoming this issue.
>>>
>>> There has been a similar attempt to implement a resource controller for
>>> the number of open file descriptors[2], which has not been merged
>>> because the reasons were dubious. Merely from a "sane interface"
>>> perspective, it should be possible to utilise cgroups to do such
>>> rudimentary resource management (which currently only exists for process
>>> trees).
>>
>>
>> This isn't a proper resource to control.  kmemcg just grew proper
>> reclaim support and will be useable to control kernel side of memory
>> consumption.

I was told that the plan was to use kmemcg - but I was told that YEARS
AGO.  In the mean time we all either do our own thing or we do nothing
and suffer.

Something like this is long overdue, IMO, and is still more
appropriate and obvious than kmemcg anyway.


>> Thanks.
>>
> Kernel memory consumption isn't the only valid reason to want to limit the
> number of processes in a cgroup.  Limiting the number of processes is very
> useful to ensure that a program is working correctly (for example, the NTP
> daemon should (usually) have an _exact_ number of children if it is
> functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_
> children), to prevent PID number exhaustion, to head off DoS attacks against
> forking network servers before they get to the point of causing kmem
> exhaustion, and to limit the number of processes in a cgroup that uses lots
> of kernel memory very infrequently.

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

Hello,

On Fri, Feb 27, 2015 at 11:42:10AM -0500, Austin S Hemmelgarn wrote:
> Kernel memory consumption isn't the only valid reason to want to limit the
> number of processes in a cgroup.  Limiting the number of processes is very
> useful to ensure that a program is working correctly (for example, the NTP
> daemon should (usually) have an _exact_ number of children if it is
> functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_
> children), to prevent PID number exhaustion, to head off DoS attacks against
> forking network servers before they get to the point of causing kmem
> exhaustion, and to limit the number of processes in a cgroup that uses lots
> of kernel memory very infrequently.

All the use cases you're listing are extremely niche and can be
trivially achieved without introducing another cgroup controller.  Not
only that, they're actually pretty silly.  Let's say NTP daemon is
misbehaving (or its code changed w/o you knowing or there are corner
cases which trigger extremely infrequently).  What do you exactly
achieve by rejecting its fork call?  It's just adding another
variation to the misbehavior.  It was misbehaving before and would now
be continuing to misbehave after a failed fork.

In general, I'm pretty strongly against adding controllers for things
which aren't fundamental resources in the system.  What's next?  Open
files?  Pipe buffer?  Number of flocks?  Number of session leaders or
program groups?

If you want to prevent a certain class of jobs from exhausting a given
resource, protecting that resource is the obvious thing to do.

Thanks.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Austin S Hemmelgarn


On 2015-02-27 06:49, Tejun Heo wrote:

> Hello,
>
> On Mon, Feb 23, 2015 at 02:08:09PM +1100, Aleksa Sarai wrote:
>>
>> The current state of resource limitation for the number of open
>> processes (as well as the number of open file descriptors) requires you
>> to use setrlimit(2), which means that you are limited to resource
>> limiting process trees rather than resource limiting cgroups (which is
>> the point of cgroups).
>>
>> There was a patch to implement this in 2011[1], but that was rejected
>> because it implemented a general-purpose rlimit subsystem -- which meant
>> that you couldn't control distinct resource limits in different
>> hierarchies. This patch implements a resource controller *specifically*
>> for the number of processes in a cgroup, overcoming this issue.
>>
>> There has been a similar attempt to implement a resource controller for
>> the number of open file descriptors[2], which has not been merged
>> because the reasons were dubious. Merely from a "sane interface"
>> perspective, it should be possible to utilise cgroups to do such
>> rudimentary resource management (which currently only exists for process
>> trees).
>
>
> This isn't a proper resource to control.  kmemcg just grew proper
> reclaim support and will be useable to control kernel side of memory
> consumption.
>
> Thanks.

Kernel memory consumption isn't the only valid reason to want to limit 
the number of processes in a cgroup.  Limiting the number of processes 
is very useful to ensure that a program is working correctly (for 
example, the NTP daemon should (usually) have an _exact_ number of 
children if it is functioning correctly, and rpcbind shouldn't (AFAIK) 
ever have _any_ children), to prevent PID number exhaustion, to head off 
DoS attacks against forking network servers before they get to the point 
of causing kmem exhaustion, and to limit the number of processes in a 
cgroup that uses lots of kernel memory very infrequently.


Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

Hello,

On Fri, Feb 27, 2015 at 02:46:13PM +0100, Richard Weinberger wrote:
> just to make sure that I understand the big picture.
> The plan is to limit kernel memory per cgroup such that fork bombs and
> stuff cannot harm other groups of processes?

Yes, the kmem part of memcg hasn't really been functional because the
reclaim part was broken and (partially consequently) kmem config being
siloed from the rest but we're very close to solving that at this
point.

Thanks.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Richard Weinberger

Tejun,

Am 27.02.2015 um 12:49 schrieb Tejun Heo:
> This isn't a proper resource to control.  kmemcg just grew proper
> reclaim support and will be useable to control kernel side of memory
> consumption.

just to make sure that I understand the big picture.
The plan is to limit kernel memory per cgroup such that fork bombs and
stuff cannot harm other groups of processes?

Thanks,
//richard

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

Hello,

On Mon, Feb 23, 2015 at 02:08:09PM +1100, Aleksa Sarai wrote:
> The current state of resource limitation for the number of open
> processes (as well as the number of open file descriptors) requires you
> to use setrlimit(2), which means that you are limited to resource
> limiting process trees rather than resource limiting cgroups (which is
> the point of cgroups).
> 
> There was a patch to implement this in 2011[1], but that was rejected
> because it implemented a general-purpose rlimit subsystem -- which meant
> that you couldn't control distinct resource limits in different
> hierarchies. This patch implements a resource controller *specifically*
> for the number of processes in a cgroup, overcoming this issue.
> 
> There has been a similar attempt to implement a resource controller for
> the number of open file descriptors[2], which has not been merged
> because the reasons were dubious. Merely from a "sane interface"
> perspective, it should be possible to utilise cgroups to do such
> rudimentary resource management (which currently only exists for process
> trees).

This isn't a proper resource to control.  kmemcg just grew proper
reclaim support and will be useable to control kernel side of memory
consumption.

Thanks.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

On Fri, Feb 27, 2015 at 09:12:45AM -0800, Tim Hockin wrote:
> I was told that the plan was to use kmemcg - but I was told that YEARS
> AGO.  In the mean time we all either do our own thing or we do nothing
> and suffer.

Wasn't it like a year ago?  Yeah, it's taking longer than everybody
hoped but seriously kmemcg reclaimer just got merged and also did the
new memcg interface which will tie kmemcg and memcg together.

> Something like this is long overdue, IMO, and is still more
> appropriate and obvious than kmemcg anyway.

Thanks for chiming in again but if you aren't bringing out anything
new to the table (I don't remember you doing that last time either),
I'm not sure why the decision would be different this time.

Thanks.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tim Hockin

On Fri, Feb 27, 2015 at 9:06 AM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On Fri, Feb 27, 2015 at 11:42:10AM -0500, Austin S Hemmelgarn wrote:
>> Kernel memory consumption isn't the only valid reason to want to limit the
>> number of processes in a cgroup.  Limiting the number of processes is very
>> useful to ensure that a program is working correctly (for example, the NTP
>> daemon should (usually) have an _exact_ number of children if it is
>> functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_
>> children), to prevent PID number exhaustion, to head off DoS attacks against
>> forking network servers before they get to the point of causing kmem
>> exhaustion, and to limit the number of processes in a cgroup that uses lots
>> of kernel memory very infrequently.
>
> All the use cases you're listing are extremely niche and can be
> trivially achieved without introducing another cgroup controller.  Not
> only that, they're actually pretty silly.  Let's say NTP daemon is
> misbehaving (or its code changed w/o you knowing or there are corner
> cases which trigger extremely infrequently).  What do you exactly
> achieve by rejecting its fork call?  It's just adding another
> variation to the misbehavior.  It was misbehaving before and would now
> be continuing to misbehave after a failed fork.
>
> In general, I'm pretty strongly against adding controllers for things
> which aren't fundamental resources in the system.  What's next?  Open
> files?  Pipe buffer?  Number of flocks?  Number of session leaders or
> program groups?

Yes to some or all of those.  We do exactly this internally and it has
greatly added to the stability of our overall container management
system.  and while you have been telling everyone to wait for kmemcg,
we have had an extra 3+ years of stability.

> If you want to prevent a certain class of jobs from exhausting a given
> resource, protecting that resource is the obvious thing to do.

I don't follow your argument - isn't this exactly what this patch set
is doing - protecting resources?

> Wasn't it like a year ago?  Yeah, it's taking longer than everybody
> hoped but seriously kmemcg reclaimer just got merged and also did the
> new memcg interface which will tie kmemcg and memcg together.

By my email it was almost 2 years ago, and that was the second or
third incarnation of this patch.

>> Something like this is long overdue, IMO, and is still more
>> appropriate and obvious than kmemcg anyway.
>
> Thanks for chiming in again but if you aren't bringing out anything
> new to the table (I don't remember you doing that last time either),
> I'm not sure why the decision would be different this time.

I'm just vocalizing my support for this idea in defense of practical
solutions that work NOW instead of engineering ideals that never
actually arrive.

As containers take the server world by storm, stuff like this gets
more and more important.

Tim

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

On Fri, Feb 27, 2015 at 09:25:10AM -0800, Tim Hockin wrote:
> > In general, I'm pretty strongly against adding controllers for things
> > which aren't fundamental resources in the system.  What's next?  Open
> > files?  Pipe buffer?  Number of flocks?  Number of session leaders or
> > program groups?
>
> Yes to some or all of those.  We do exactly this internally and it has
> greatly added to the stability of our overall container management
> system.  and while you have been telling everyone to wait for kmemcg,
> we have had an extra 3+ years of stability.

Yeah, good job.  I totally get why kernel part of memory consumption
needs protection.  I'm not arguing against that at all.

> > If you want to prevent a certain class of jobs from exhausting a given
> > resource, protecting that resource is the obvious thing to do.
>
> I don't follow your argument - isn't this exactly what this patch set
> is doing - protecting resources?

If you have proper protection over kernel memory consumption, this is
completely covered because memory is the fundamental resource here.
Controlling distribution of those fundamental resources is what
cgroups are primarily about.

> > Wasn't it like a year ago?  Yeah, it's taking longer than everybody
> > hoped but seriously kmemcg reclaimer just got merged and also did the
> > new memcg interface which will tie kmemcg and memcg together.
>
> By my email it was almost 2 years ago, and that was the second or
> third incarnation of this patch.

Again, I agree this is taking a while.  Memory people had to retool
the whole reclamation path to make this work, which is the pattern
being repeated across the different controllers - we're refactoring a
lot of infrastructure code so that resource control can integrate with
the regular operation of the kernel, which BTW is what we should have
been doing from the beginning.

If your complaint is that this is taking too long, I hear you, and
there's a certain amount of validity in arguing that upstreaming a
temporary measure is the better trade-off, but the rationale for nproc
(or nfds, or virtual memory, whatever) has been pretty weak otherwise.

And as for the different incarnations of this patchset.  Reposting the
same stuff repeatedly doesn't really change anything.  Why would it?

> >> Something like this is long overdue, IMO, and is still more
> >> appropriate and obvious than kmemcg anyway.
> >
> > Thanks for chiming in again but if you aren't bringing out anything
> > new to the table (I don't remember you doing that last time either),
> > I'm not sure why the decision would be different this time.
>
> I'm just vocalizing my support for this idea in defense of practical
> solutions that work NOW instead of engineering ideals that never
> actually arrive.
>
> As containers take the server world by storm, stuff like this gets
> more and more important.

Again, protection of kernel side memory consumption is important.
There's no question about that.  As for the never-arriving part, well,
it is arriving.  If you still can't believe, just take a look at the
code.

Thanks.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Austin S Hemmelgarn


On 2015-02-27 12:06, Tejun Heo wrote:

> Hello,
>
> On Fri, Feb 27, 2015 at 11:42:10AM -0500, Austin S Hemmelgarn wrote:
>>
>> Kernel memory consumption isn't the only valid reason to want to limit the
>> number of processes in a cgroup.  Limiting the number of processes is very
>> useful to ensure that a program is working correctly (for example, the NTP
>> daemon should (usually) have an _exact_ number of children if it is
>> functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_
>> children), to prevent PID number exhaustion, to head off DoS attacks against
>> forking network servers before they get to the point of causing kmem
>> exhaustion, and to limit the number of processes in a cgroup that uses lots
>> of kernel memory very infrequently.
>
>
> All the use cases you're listing are extremely niche and can be
> trivially achieved without introducing another cgroup controller.  Not
> only that, they're actually pretty silly.  Let's say NTP daemon is
> misbehaving (or its code changed w/o you knowing or there are corner
> cases which trigger extremely infrequently).  What do you exactly
> achieve by rejecting its fork call?  It's just adding another
> variation to the misbehavior.  It was misbehaving before and would now
> be continuing to misbehave after a failed fork.

I wouldn't think that preventing PID exhaustion would be all that much 
of a niche case, it's fully possible for it to happen without using 
excessive amounts of kernel memory (think about BIG server systems with 
terabytes of memory running (arguably poorly written) forking servers 
that handle tens of thousands of client requests per second, each 
lasting multiple tens of seconds), and not necessarily as trivial as you 
might think to handle sanely (especially if you want callbacks when the 
limits get hit).
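
As a rough illustration of the scenario described above: the steady-state number of live processes in a fork-per-request server is approximately the request rate multiplied by the request lifetime (Little's law). The sketch below plugs in example figures in the range mentioned ("tens of thousands" of requests per second, "multiple tens of seconds" each). The specific numbers (10,000 req/s, 20 s) and the function name are assumptions for illustration only; 32768 is the default pid_max cited earlier in the thread.

```python
# Default value of /proc/sys/kernel/pid_max, as cited earlier in the thread.
DEFAULT_PID_MAX = 32768


def concurrent_processes(requests_per_second, seconds_per_request):
    """Steady-state live process count for a fork-per-request server.

    Little's law: items in the system = arrival rate * time in system.
    """
    return requests_per_second * seconds_per_request


# Assumed example workload: 10,000 requests/s, each handled for 20 s.
live = concurrent_processes(10_000, 20)
print(f"live processes: {live}, exceeds default pid_max: {live > DEFAULT_PID_MAX}")
```

With these assumed figures the server would need about 200,000 PIDs at once, several times the default pid_max, regardless of how much memory the machine has.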
As far as being trivial to achieve, I'm assuming you are referring to 
rlimit and PAM's limits module, both of which have their own issues. 
Using pam_limits.so to limit processes isn't trivial because it requires 
calling through PAM to begin with, which almost no software that isn't 
login related does, and rlimits are tricky to set up properly with the 
granularity that having a cgroup would provide.
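
For reference, a minimal sketch of what the rlimit approach looks like from userspace, using Python's standard `resource` module as a stand-in for setrlimit(2). The helper name `cap_nproc` is hypothetical. RLIMIT_NPROC is accounted against the real user ID of the caller and is inherited across fork, so it cannot directly express "at most N processes in this group of tasks", which is the granularity problem described above.

```python
import resource


def cap_nproc(limit):
    """Set the soft RLIMIT_NPROC for the calling process (hypothetical helper).

    The requested value is clamped to the current hard limit, because an
    unprivileged process cannot raise its soft limit above the hard one.
    Returns the (soft, hard) pair in effect afterwards.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if hard != resource.RLIM_INFINITY:
        if limit == resource.RLIM_INFINITY or limit > hard:
            limit = hard
    resource.setrlimit(resource.RLIMIT_NPROC, (limit, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    # Only report the current limits; nothing is changed here.
    print(resource.getrlimit(resource.RLIMIT_NPROC))
```

A launcher would call something like `cap_nproc(64)` before exec'ing the daemon. The limit then applies to every process of that user, not just to the daemon's children, and it is not enforced for processes holding CAP_SYS_ADMIN or CAP_SYS_RESOURCE.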

> In general, I'm pretty strongly against adding controllers for things
> which aren't fundamental resources in the system.  What's next?  Open
> files?  Pipe buffer?  Number of flocks?  Number of session leaders or
> program groups?

PID's are a fundamental resource, you run out and it's an only 
marginally better situation than OOM, namely, if you don't already have 
a shell open which has kill builtin (because you can't fork), or have 
some other reliable way to terminate processes without forking, you are 
stuck either waiting for the problem to resolve itself, or have to reset 
the system.

> If you want to prevent a certain class of jobs from exhausting a given
> resource, protecting that resource is the obvious thing to do.

Which is why I'm advocating something that provides a more robust method 
of preventing the system from exhausting PID numbers.

> Thanks.




Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

On Fri, Feb 27, 2015 at 12:45:03PM -0500, Tejun Heo wrote:
> If your complaint is that this is taking too long, I hear you, and
> there's a certain amount of validity in arguing that upstreaming a
> temporary measure is the better trade-off, but the rationale for nproc
> (or nfds, or virtual memory, whatever) has been pretty weak otherwise.

Also, note that this is a subset of a larger problem.  e.g. there's a
patchset trying to implement writeback IO control from the filesystem
layer.  cgroup control of writeback has been a thorny issue for over
three years now and the rationale for implementing this reversed
controlling scheme is about the same - doing it properly is too
difficult, let's bolt something on the top as a practical measure.

I think it'd be seriously short-sighted to give in and merge all
those.  These sorts of shortcuts are crippling in the long term.
Again, similarly, proper cgroup writeback support is literally right
around the corner.

The situation sure can be frustrating if you need something now but we
can't make decisions solely on that.  This is a much longer term
project and we better, for once, get things right.

Thanks.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

Hello, Austin.

On Fri, Feb 27, 2015 at 01:49:53PM -0500, Austin S Hemmelgarn wrote:
> As far as being trivial to achieve, I'm assuming you are referring to rlimit
> and PAM's limits module, both of which have their own issues. Using
> pam_limits.so to limit processes isn't trivial because it requires calling
> through PAM to begin with, which almost no software that isn't login related
> does, and rlimits are tricky to set up properly with the granularity that
> having a cgroup would provide.
...
> PID's are a fundamental resource, you run out and it's an only marginally
> better situation than OOM, namely, if you don't already have a shell open
> which has kill builtin (because you can't fork), or have some other reliable
> way to terminate processes without forking, you are stuck either waiting for
> the problem to resolve itself, or have to reset the system.

Right, this is a much more valid argument.  Currently, we're capping
max pid at 4M which translates to some tens of gigs of memory which
isn't a crazy amount on modern machines.  The hard(er) barrier would
be around 2^30 (2^29 from futex side, apparently) which would also be
reachable on configurations w/ terabytes of memory.
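
To put rough numbers on that estimate, the sketch below multiplies a PID count by a per-task kernel memory cost. The 8 KiB per-task figure is purely an assumption for illustration (the real cost depends on kernel version and configuration and is not stated in this thread); the 4M and 2^30 values are the ones quoted above.

```python
GIB = 1024 ** 3

PID_CAP_4M = 4 * 1024 * 1024        # the 4M max pid cap mentioned above
PID_BARRIER_2_30 = 2 ** 30          # the "hard(er)" barrier mentioned above

# Assumption for illustration only: ~8 KiB of kernel memory pinned per task.
ASSUMED_BYTES_PER_TASK = 8 * 1024


def kmem_for_pids(n_pids, bytes_per_task=ASSUMED_BYTES_PER_TASK):
    """Kernel memory (bytes) pinned if n_pids tasks exist at once."""
    return n_pids * bytes_per_task


for label, n in (("4M pids", PID_CAP_4M), ("2^30 pids", PID_BARRIER_2_30)):
    print(f"{label}: {kmem_for_pids(n) / GIB:.0f} GiB")
```

Under this assumption, filling the 4M cap pins 32 GiB ("some tens of gigs"), while 2^30 tasks would need 8192 GiB (8 TiB), which is only within reach of machines with terabytes of memory.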

I'll think more about it and get back.

Thanks a lot.

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tim Hockin

On Fri, Feb 27, 2015 at 9:45 AM, Tejun Heo t...@kernel.org wrote:
 On Fri, Feb 27, 2015 at 09:25:10AM -0800, Tim Hockin wrote:
  In general, I'm pretty strongly against adding controllers for things
  which aren't fundamental resources in the system.  What's next?  Open
  files?  Pipe buffer?  Number of flocks?  Number of session leaders or
  program groups?

 Yes to some or all of those.  We do exactly this internally and it has
 greatly added to the stability of our overall container management
 system.  and while you have been telling everyone to wait for kmemcg,
 we have had an extra 3+ years of stability.

 Yeah, good job.  I totally get why kernel part of memory consumption
 needs protection.  I'm not arguing against that at all.

You keep shifting the focus to be about memory, but that's not what
people are asking for.  You're letting the desire for a perfect
solution (which is years late) block good solutions that exist NOW.

  If you want to prevent a certain class of jobs from exhausting a given
  resource, protecting that resource is the obvious thing to do.

 I don't follow your argument - isn't this exactly what this patch set
 is doing - protecting resources?

 If you have proper protection over kernel memory consumption, this is
 completely covered because memory is the fundamental resource here.
 Controlling distribution of those fundamental resources is what
 cgroups are primarily about.

You say that's what cgroups are about, but it's not at all obvious
that you are right.  What users, admins, systems people want is
building blocks that are usable and make sense.  Limiting kernel
memory is NOT the logical building block, here.  It's not something
people can reason about or quantify easily.  If you need to implement
the interfaces in terms of memory, go nuts, but making users think
like that is just not right.

  Wasn't it like a year ago?  Yeah, it's taking longer than everybody
  hoped but seriously kmemcg reclaimer just got merged and also did the
  new memcg interface which will tie kmemcg and memcg together.

 By my email it was almost 2 years ago, and that was the second or
 third incarnation of this patch.

 Again, I agree this is taking a while.  Memory people had to retool
 the whole reclamation path to make this work, which is the pattern
 being repeated across the different controllers - we're refactoring a
 lot of infrastructure code so that resource control can integrate with
 the regular operation of the kernel, which BTW is what we should have
 been doing from the beginning.

 If your complaint is that this is taking too long, I hear you, and
 there's a certain amount of validity in arguing that upstreaming a
 temporary measure is the better trade-off, but the rationale for nproc
 (or nfds, or virtual memory, whatever) has been pretty weak otherwise.

The fact that at least 3 or 4 people have INDEPENDENTLY decided this is
what is causing them pain, tried to fix it, and invested the time to
send a patch says that it is actually a thing.  There exists a problem
that you are not allowing to be fixed.  Do you recognize that users are
experiencing pain?  Why do you hate your users? :)

 And as for the different incarnations of this patchset.  Reposting the
 same stuff repeatedly doesn't really change anything.  Why would it?

Because reasonable people might survey the ecosystem and say "hmm,
things have changed over the years - isolation has become a pretty
serious topic".  Or maybe they hope that you'll finally agree that
fixing the problem NOW is worthwhile, even if the solution is
imperfect, and that a more perfect solution will arrive.

  Something like this is long overdue, IMO, and is still more
  appropriate and obvious than kmemcg anyway.
 
  Thanks for chiming in again but if you aren't bringing out anything
  new to the table (I don't remember you doing that last time either),
  I'm not sure why the decision would be different this time.

 I'm just vocalizing my support for this idea in defense of practical
 solutions that work NOW instead of engineering ideals that never
 actually arrive.

 As containers take the server world by storm, stuff like this gets
 more and more important.

 Again, protection of kernel side memory consumption is important.
 There's no question about that.  As for the never-arriving part, well,
 it is arriving.  If you still can't believe, just take a look at the
 code.

Are you willing to put a drop-dead date on it?  If we don't have
kmemcg working well enough to _actually_ bound PID usage and FD usage
by, say, June 1st, will you then accept a patch to this effect?  If
the answer is no, then I have zero faith that it's coming any time
soon - I heard this 2 years ago.  I believed you then.

I see further downthread that you said you'll think about it.  Thank
you.  Just because our use cases are not normal does not mean we're
not valid :)

Tim

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

On Fri, Feb 27, 2015 at 01:45:09PM -0800, Tim Hockin wrote:
 Are you willing to put a drop-dead date on it?  If we don't have
 kmemcg working well enough to _actually_ bound PID usage and FD usage
 by, say, June 1st, will you then accept a patch to this effect?  If
 the answer is no, then I have zero faith that it's coming any time
 soon - I heard this 2 years ago.  I believed you then.

Tim, cut this bullshit.  That's not how kernel development works.
Contribute to technical discussion or shut it.  I'm really getting
tired of your whining without any useful substance.

 I see further downthread that you said you'll think about it.  Thank
 you.  Just because our use cases are not normal does not mean we're
 not valid :)

And can you even see why that made progress?

-- 
tejun

Re: [PATCH RFC 0/2] add nproc cgroup subsystem

2015-02-27 Thread Tejun Heo

Hello,

On Mon, Feb 23, 2015 at 02:08:09PM +1100, Aleksa Sarai wrote:
 The current state of resource limitation for the number of open
 processes (as well as the number of open file descriptors) requires you
 to use setrlimit(2), which means that you are limited to resource
 limiting process trees rather than resource limiting cgroups (which is
 the point of cgroups).
 
 There was a patch to implement this in 2011[1], but that was rejected
 because it implemented a general-purpose rlimit subsystem -- which meant
 that you couldn't control distinct resource limits in different
 hierarchies. This patch implements a resource controller *specifically*
 for the number of processes in a cgroup, overcoming this issue.
 
 There has been a similar attempt to implement a resource controller for
 the number of open file descriptors[2], which has not been merged
 because the reasons were dubious. Merely from a sane interface
 perspective, it should be possible to utilise cgroups to do such
 rudimentary resource management (which currently only exists for process
 trees).
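
For context on the setrlimit(2) mechanism mentioned above, here is a
minimal sketch that reads the process-count limit of the calling
process.  It uses Python's standard resource module on Linux; the helper
names nproc_limits and describe are hypothetical.  The limit is
inherited across fork and counted against the real user ID, so it
cannot be scoped to an arbitrary cgroup.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) RLIMIT_NPROC pair of the calling process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    # Children created by fork() inherit these values; nothing here ties
    # the limit to a cgroup, which is the gap the patch set targets.
    print(f"RLIMIT_NPROC soft={describe(soft)} hard={describe(hard)}")
```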

This isn't a proper resource to control.  kmemcg just grew proper
reclaim support and will be useable to control kernel side of memory
consumption.

Thanks.

-- 
tejun
