Re: User space out of memory approach

2005-01-28 Thread Andrea Arcangeli
On Fri, Jan 28, 2005 at 11:21:11AM -0400, Mauricio Lin wrote:
> As you know, Andrew generated the patch. Here are some test results
> for your OOM Killer and the original OOM Killer. We ran 10
> experiments for each OOM Killer; the averages are below.
> 
> "Invocations" is the number of times that out_of_memory function is
> called. "Selections" is the number of times that select_bad_process
> function is called, and "Killed" is the number of processes killed.
> 
> Original OOM Killer
> Invocations average = 51620/10 = 5162
> Selections average = 30/10 = 3
> Killed average = 38/10 = 3.8
> 
> Andrea OOM Killer
> Invocations average = 213/10 = 21.3
> Selections average = 213/10 = 21.3
> Killed average = 52/10 = 5.2
> 
> As you can see the number of invocations reduced significantly using
> your OOM Killer.

Yep, thanks for testing!

> I did not know about this problem when I was moving the original
> ranking algorithm to userland. As Thomaz mentioned: invocation
> madness, reentrancy problems, and those strange timers and counters
> such as now, since, last, lastkill and count. I guess that now I can
> put some OOM Killer stuff in userland in a safer manner with those
> problems solved, right?

Yep ;)

> BTW, will your OOM Killer be included in the kernel tree?

Yes, Andrew said it should go in within the next few days, which is great
news. Thanks, everyone!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: User space out of memory approach

2005-01-28 Thread Mauricio Lin
Hi Andrea,


On Fri, 28 Jan 2005 09:58:24 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi Andrea,
> 
> On Thu, 27 Jan 2005 23:11:29 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> 
> wrote:
> > On Thu, Jan 27, 2005 at 02:54:13PM -0400, Mauricio Lin wrote:
> > > Hi Andrea,
> > >
> > > On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> 
> > > wrote:
> > > > On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > > > > Sometimes the first application to be killed is XFree. AFAIK the
> > > >
> > > > This makes more sense now. You need somebody trapping sigterm in order
> > > > to lockup and X sure traps it to recover the text console.
> > > >
> > > > Can you replace this:
> > > >
> > > > if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> > > >     force_sig(SIGTERM, p);
> > > > } else {
> > > >     force_sig(SIGKILL, p);
> > > > }
> > > >
> > > > with this?
> > > >
> > > > force_sig(SIGKILL, p);
> > > >
> > > > in mm/oom_kill.c.
> > >
> > > Nice. Your suggestion made the error go away.
> > >
> > > We are still testing in order to compare between your OOM Killer and
> > > Original OOM Killer.
> >
> > Ok, thanks for the confirmation. So my theory was right.
> >
> > Basically we have to make this patch. Now that you have already edited
> > the code, can you diff and send a patch that will be the 6/5 in the series?
> 
> OK. I will send the patch.

As you know, Andrew generated the patch. Here are some test results
for your OOM Killer and the original OOM Killer. We ran 10
experiments for each OOM Killer; the averages are below.

"Invocations" is the number of times that out_of_memory function is
called. "Selections" is the number of times that select_bad_process
function is called, and "Killed" is the number of processes killed.

Original OOM Killer
Invocations average = 51620/10 = 5162
Selections average = 30/10 = 3
Killed average = 38/10 = 3.8

Andrea OOM Killer
Invocations average = 213/10 = 21.3
Selections average = 213/10 = 21.3
Killed average = 52/10 = 5.2

As you can see the number of invocations reduced significantly using
your OOM Killer.
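
For illustration, the arithmetic behind these averages can be reproduced
with a small user-space C sketch. The totals are the ones reported above;
the struct and function names are made up for this example and do not come
from the kernel or from the setup used for the experiments.

```c
#include <stdio.h>

#define RUNS 10 /* experiments per OOM Killer, as stated above */

/* Totals summed over all runs, copied from the reported figures. */
struct oom_totals {
    const char *name;
    long invocations; /* calls to out_of_memory() */
    long selections;  /* calls to select_bad_process() */
    long killed;      /* processes killed */
};

const struct oom_totals original = { "Original OOM Killer", 51620, 30, 38 };
const struct oom_totals andrea   = { "Andrea OOM Killer",     213, 213, 52 };

/* Average of a summed counter over the fixed number of runs. */
double per_run(long total)
{
    return (double)total / RUNS;
}

/* How many out_of_memory() calls happen per select_bad_process() call. */
double invocations_per_selection(const struct oom_totals *t)
{
    return (double)t->invocations / (double)t->selections;
}

/* Print the three averages and the invocation/selection ratio. */
void print_report(const struct oom_totals *t)
{
    printf("%s\n", t->name);
    printf("  invocations/run: %.1f\n", per_run(t->invocations));
    printf("  selections/run:  %.1f\n", per_run(t->selections));
    printf("  killed/run:      %.1f\n", per_run(t->killed));
    printf("  invocations per selection: %.1f\n",
           invocations_per_selection(t));
}
```

Calling print_report(&original) and print_report(&andrea) shows the
difference directly: about 1720 calls to out_of_memory per selection
with the original code, against exactly one per selection with the
patched code.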

I did not know about this problem when I was moving the original
ranking algorithm to userland. As Thomaz mentioned: invocation
madness, reentrancy problems, and those strange timers and counters
such as now, since, last, lastkill and count. I guess that now I can
put some OOM Killer stuff in userland in a safer manner with those
problems solved, right?

BTW, will your OOM Killer be included in the kernel tree?

BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-28 Thread Mauricio Lin
Hi Andrea,

On Thu, 27 Jan 2005 23:11:29 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Thu, Jan 27, 2005 at 02:54:13PM -0400, Mauricio Lin wrote:
> > Hi Andrea,
> >
> > On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> 
> > wrote:
> > > On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > > > Sometimes the first application to be killed is XFree. AFAIK the
> > >
> > > This makes more sense now. You need somebody trapping sigterm in order
> > > to lockup and X sure traps it to recover the text console.
> > >
> > > Can you replace this:
> > >
> > > if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> > >     force_sig(SIGTERM, p);
> > > } else {
> > >     force_sig(SIGKILL, p);
> > > }
> > >
> > > with this?
> > >
> > > force_sig(SIGKILL, p);
> > >
> > > in mm/oom_kill.c.
> >
> > Nice. Your suggestion made the error go away.
> >
> > We are still testing in order to compare between your OOM Killer and
> > Original OOM Killer.
> 
> Ok, thanks for the confirmation. So my theory was right.
> 
> Basically we have to make this patch. Now that you have already edited
> the code, can you diff and send a patch that will be the 6/5 in the series?

OK. I will send the patch.

> (then after fixing this last very longstanding [now deadlock prone too]
> bug, we can think about how to make a 7/5 that will wait a few seconds
> after sending a sigterm before falling back to a sigkill; that shouldn't be
> difficult, but the above 6/5 will already make the code correct)
> 
> Note, if you add swap it'll work around it too, since then the memhog will
> be allowed to grow to a larger rss than X. With 128m of ram and no swap,
> X is one of the biggest with xshm involved from some client app
> allocating lots of pictures. I could never notice since I always tested
> it either with swap or on higher mem systems and my test box runs
> with an idle X too which isn't that big ;).

Well, we like to keep memory resources small, because we are also thinking
about the OOM Killer on small devices with few resources.

BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-27 Thread Andrew Morton
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> And they did not necessarily have hardware access. They "might" have hardware
> access.

On x86 we could perhaps test for non-nullness of tsk->thread->io_bitmap_ptr?

> I thought I could wait for the other patches
> to be merged to avoid confusion before making more changes (since it'd
> be a pretty self contained feature), but I can do that now if you
> prefer.

I'll send your current stuff off to Linus in the next few days - we can let
that sit for a while, use that as a base for further work.



Re: User space out of memory approach

2005-01-27 Thread Andrea Arcangeli
On Thu, Jan 27, 2005 at 03:35:35PM -0800, Andrew Morton wrote:
> On x86 we could perhaps test for non-nullness of tsk->thread->io_bitmap_ptr?

yes for ioports. But I'm afraid I was too optimistic about eflags for
iopl, that's not in the per-task tss, it's only stored at the very top
of the kernel stack and inherited during fork/clone. So we probably need
to check esp0 and read the top of the stack to see if a task has eflags
set. esp0 is definitely stored in the thread struct when the task is
rescheduled, and it cannot change for each given task, so we can access
it even while the task is runnable and it shouldn't be corrupted by
iret. But the problem is sysenter is optimized not to save eflags on the
kernel stack, so the top of the stack - 12 bytes would not contain eflags
if sysenter is in use.

So basically we'd need to change iopl to propagate the info to the task
struct synchronously somehow, because we can't read it reliably from the
kernel stack.
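
A minimal sketch of the idea in the last paragraph, written as ordinary
user-space C rather than kernel code. IOPL sits in bits 12-13 of the x86
EFLAGS register; everything else here (struct iopl_task, its cached_iopl
field, mock_set_iopl) is a hypothetical stand-in, not an existing kernel
interface. The point is only that the privilege level gets recorded in
the task structure at the moment it changes, so nobody has to dig it out
of the kernel stack later.

```c
#include <stddef.h>

/* IOPL occupies bits 12-13 of the x86 EFLAGS register. */
#define SKETCH_IOPL_SHIFT 12
#define SKETCH_IOPL_MASK  (3UL << SKETCH_IOPL_SHIFT)

/* Hypothetical, heavily reduced stand-in for a task structure. */
struct iopl_task {
    unsigned long eflags;     /* stands in for the saved user eflags */
    unsigned int cached_iopl; /* hypothetical copy kept with the task */
};

/* Extract the I/O privilege level (0..3) from an eflags value. */
unsigned int eflags_to_iopl(unsigned long eflags)
{
    return (unsigned int)((eflags & SKETCH_IOPL_MASK) >> SKETCH_IOPL_SHIFT);
}

/*
 * Mock of an iopl()-style call: update eflags and, in the same step,
 * mirror the new level into the task structure.
 * Returns 0 on success, -1 if the level is out of range.
 */
int mock_set_iopl(struct iopl_task *t, unsigned int level)
{
    if (level > 3)
        return -1;

    t->eflags = (t->eflags & ~SKETCH_IOPL_MASK) |
                ((unsigned long)level << SKETCH_IOPL_SHIFT);
    t->cached_iopl = level; /* the synchronous propagation step */
    return 0;
}
```

An OOM killer modelled this way would only need to read cached_iopl,
regardless of whether the task entered the kernel via sysenter.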


Re: User space out of memory approach

2005-01-27 Thread Andrea Arcangeli
On Thu, Jan 27, 2005 at 02:29:43PM -0800, Andrew Morton wrote:
> I've already queued a patch for this:
> 
> --- 25/mm/oom_kill.c~mm-fix-several-oom-killer-bugs-fix   Thu Jan 27 13:56:58 2005
> +++ 25-akpm/mm/oom_kill.c Thu Jan 27 13:57:19 2005
> @@ -198,12 +198,7 @@ static void __oom_kill_task(task_t *p)
>      p->time_slice = HZ;
>      p->memdie = 1;
>  
> -    /* This process has hardware access, be more careful. */
> -    if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> -        force_sig(SIGTERM, p);
> -    } else {
> -        force_sig(SIGKILL, p);
> -    }
> +    force_sig(SIGKILL, p);
>  }
>  
>  static struct mm_struct *oom_kill_task(task_t *p)

Thanks.

> However.  This means that we'll now kill off tasks which had hardware
> access.  What are the implications of this?

The implication of the above is basically that the X server won't be
able to restore the text mode, but that avoids the deadlock ;).

And they did not necessarily have hardware access. They "might" have hardware
access. Note that an app may have hardware access even if it has no
rawio capabilities. One can run iopl and then change uid just fine. So
the above check is quite weak since it leaves the kernel susceptible to
bugs and memleaks in any app started by root. Kernel shouldn't trust
root apps, all apps are buggy, root apps too (I even once fixed a signal
race in /sbin/init that showed up with the schedule child first sched
optimization ;).

iopl and ioperm are the only two things we care about.  We can get a
synchronous, reliable eflags/ioperm value only from the "regs" in the
task context. Problem is that since we can pick a task to kill that
isn't necessarily the current task, we should start to approximate, and
assume the process is sleeping. The regs must be saved during
reschedule, so it should cache the old contents. So perhaps we can get a
practically reliable eflags dump from the tss_struct. But this will not
be common code and it'll require a specialized arch API. Like
has_hw_access(). Only then we can make a stronger assumption and be
truly careful about sending SIGKILL.
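
What a has_hw_access() helper might look like, as a user-space mock-up.
has_hw_access() is only a proposed name in this mail, and the struct
layout below is invented for the example. It also assumes a trustworthy
saved eflags value is available, which is exactly the part that is hard
to obtain in practice.

```c
#include <stdbool.h>
#include <stddef.h>

/* IOPL lives in bits 12-13 of x86 EFLAGS, i.e. mask 0x3000. */
#define SKETCH_EFLAGS_IOPL 0x3000UL

/* Invented, minimal stand-ins for the per-thread and per-task state. */
struct hw_thread {
    unsigned long *io_bitmap_ptr; /* non-NULL once ioperm() granted ports */
    unsigned long saved_eflags;   /* assumed to be a reliable snapshot */
};

struct hw_task {
    struct hw_thread thread;
};

/*
 * Sketch of an arch-specific check: does this task have direct
 * hardware (I/O port) access via either ioperm() or iopl()?
 */
bool has_hw_access(const struct hw_task *t)
{
    if (t->thread.io_bitmap_ptr != NULL)
        return true; /* ioperm(): per-task I/O bitmap allocated */

    /* iopl(): any non-zero I/O privilege level in eflags */
    return (t->thread.saved_eflags & SKETCH_EFLAGS_IOPL) != 0;
}
```

Unlike the CAP_SYS_RAWIO test, a check of this shape looks at what the
task actually did rather than at which capabilities it currently holds.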

The right way to do this is probably to wait a few seconds before
sending the sigkill. I'm not currently sure if it is worth adding the
has_hw_access(). But certainly I would prefer to do nothing special with
only the sys_rawio capability. I thought I could wait for the other patches
to be merged to avoid confusion before making more changes (since it'd
be a pretty self contained feature), but I can do that now if you
prefer.
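
A user-space analogue of the "wait a few seconds before sending the
sigkill" idea, using plain POSIX calls (kill, waitpid, nanosleep). It is
a sketch only: the function name term_then_kill and the 10 ms polling
step are assumptions for this example, the target must be a child of the
caller so that waitpid works, and kernel code would need a different
mechanism than polling.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Ask a child process to exit with SIGTERM, give it up to grace_ms
 * milliseconds, then fall back to SIGKILL.
 *
 * Returns SIGTERM if the child exited within the grace period,
 * SIGKILL if escalation was needed, or -1 on error.
 */
int term_then_kill(pid_t pid, int grace_ms)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int waited_ms = 0;
    int status;

    if (pid <= 0) /* never signal process groups or "everyone" */
        return -1;
    if (kill(pid, SIGTERM) != 0)
        return -1;

    /* Poll until the child is reaped or the grace period runs out. */
    while (waited_ms < grace_ms) {
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return SIGTERM; /* exited on its own */
        if (r < 0)
            return -1;
        nanosleep(&step, NULL);
        waited_ms += 10;
    }

    /* The process trapped or ignored SIGTERM: force the issue. */
    if (kill(pid, SIGKILL) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return SIGKILL;
}
```

A process that traps SIGTERM, as X does, gets a chance to clean up, but
can no longer hold things up indefinitely.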


Re: User space out of memory approach

2005-01-27 Thread Andrew Morton
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> > > Can you replace this:
> > > 
> > > > if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> > > >     force_sig(SIGTERM, p);
> > > > } else {
> > > >     force_sig(SIGKILL, p);
> > > > }
> > > 
> > > with this?
> > > 
> > > force_sig(SIGKILL, p);
> > > 
> > > in mm/oom_kill.c.
> > 
> > > Nice. Your suggestion made the error go away.
> > 
> > We are still testing in order to compare between your OOM Killer and
> > Original OOM Killer.
> 
> Ok, thanks for the confirmation. So my theory was right.
> 
> Basically we have to make this patch. Now that you have already edited
> the code, can you diff and send a patch that will be the 6/5 in the series?
> 

I've already queued a patch for this:

--- 25/mm/oom_kill.c~mm-fix-several-oom-killer-bugs-fix Thu Jan 27 13:56:58 2005
+++ 25-akpm/mm/oom_kill.c   Thu Jan 27 13:57:19 2005
@@ -198,12 +198,7 @@ static void __oom_kill_task(task_t *p)
     p->time_slice = HZ;
     p->memdie = 1;
 
-    /* This process has hardware access, be more careful. */
-    if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
-        force_sig(SIGTERM, p);
-    } else {
-        force_sig(SIGKILL, p);
-    }
+    force_sig(SIGKILL, p);
 }
 
 static struct mm_struct *oom_kill_task(task_t *p)

However.  This means that we'll now kill off tasks which had hardware
access.  What are the implications of this?
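
To make the behavioural change in this patch explicit, here is a small
user-space model of the signal choice before and after. It is not kernel
code: the MOCK_ macros and function names are invented for the example,
and only the value 17 for CAP_SYS_RAWIO is taken from
<linux/capability.h>.

```c
#include <signal.h>

/* CAP_SYS_RAWIO is capability number 17 in <linux/capability.h>. */
#define MOCK_CAP_SYS_RAWIO  17
#define MOCK_CAP_TO_MASK(x) (1U << (x))

/* Old behaviour: tasks holding CAP_SYS_RAWIO only get a SIGTERM. */
int oom_signal_before(unsigned int cap_effective)
{
    if (cap_effective & MOCK_CAP_TO_MASK(MOCK_CAP_SYS_RAWIO))
        return SIGTERM;
    return SIGKILL;
}

/* New behaviour: every selected task gets SIGKILL, whatever it holds. */
int oom_signal_after(unsigned int cap_effective)
{
    (void)cap_effective; /* capabilities no longer influence the choice */
    return SIGKILL;
}
```

The only inputs whose outcome changes are those with the CAP_SYS_RAWIO
bit set, which is the case the question above is about.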


Re: User space out of memory approach

2005-01-27 Thread Andrea Arcangeli
On Thu, Jan 27, 2005 at 02:54:13PM -0400, Mauricio Lin wrote:
> Hi Andrea,
> 
> On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> 
> wrote:
> > On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > > Sometimes the first application to be killed is XFree. AFAIK the
> > 
> > This makes more sense now. You need somebody trapping sigterm in order
> > to lockup and X sure traps it to recover the text console.
> > 
> > Can you replace this:
> > 
> > if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> >     force_sig(SIGTERM, p);
> > } else {
> >     force_sig(SIGKILL, p);
> > }
> > 
> > with this?
> > 
> > force_sig(SIGKILL, p);
> > 
> > in mm/oom_kill.c.
> 
> Nice. Your suggestion made the error go away.
> 
> We are still testing in order to compare between your OOM Killer and
> Original OOM Killer.

Ok, thanks for the confirmation. So my theory was right.

Basically we have to make this patch. Now that you have already edited
the code, can you diff and send a patch that will be the 6/5 in the series?

(then after fixing this last very longstanding [now deadlock prone too]
bug, we can think about how to make a 7/5 that will wait a few seconds
after sending a sigterm before falling back to a sigkill; that shouldn't be
difficult, but the above 6/5 will already make the code correct)

Note, if you add swap it'll work around it too, since then the memhog will
be allowed to grow to a larger rss than X. With 128m of ram and no swap,
X is one of the biggest with xshm involved from some client app
allocating lots of pictures. I could never notice since I always tested
it either with swap or on higher mem systems and my test box runs
with an idle X too which isn't that big ;).
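
The effect of swap on victim selection can be pictured with a
deliberately simplified model that picks the task with the largest RSS.
The kernel's real ranking takes more than RSS into account, and the
struct and function names below are invented for illustration.

```c
#include <stddef.h>

/* Invented, minimal description of a candidate process. */
struct mock_proc {
    const char *name;
    unsigned long rss_kb; /* resident set size in KiB */
};

/*
 * Simplified stand-in for victim selection: return the process with
 * the largest RSS, or NULL if the list is empty. On ties the earliest
 * entry wins.
 */
const struct mock_proc *pick_largest_rss(const struct mock_proc *procs,
                                         size_t n)
{
    const struct mock_proc *worst = NULL;

    for (size_t i = 0; i < n; i++) {
        if (worst == NULL || procs[i].rss_kb > worst->rss_kb)
            worst = &procs[i];
    }
    return worst;
}
```

With little RAM and no swap, the memhog is stopped before it outgrows a
busy X server, so X can come out on top; with swap, the memhog keeps
growing past X and becomes the one selected.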


Re: User space out of memory approach

2005-01-27 Thread Mauricio Lin
Hi Andrea,

On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > Sometimes the first application to be killed is XFree. AFAIK the
> 
> This makes more sense now. You need somebody trapping sigterm in order
> to lockup and X sure traps it to recover the text console.
> 
> Can you replace this:
> 
> if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
>     force_sig(SIGTERM, p);
> } else {
>     force_sig(SIGKILL, p);
> }
> 
> with this?
> 
> force_sig(SIGKILL, p);
> 
> in mm/oom_kill.c.

Nice. Your suggestion made the error go away.

We are still testing in order to compare between your OOM Killer and
Original OOM Killer.

BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-27 Thread Mauricio Lin
Hi Andrea,

On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli [EMAIL PROTECTED] wrote:
 On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
  Sometimes the first application to be killed is XFree. AFAIK the
 
 This makes more sense now. You need somebody trapping sigterm in order
 to lockup and X sure traps it to recover the text console.
 
 Can you replace this:
 
 if (cap_t(p-cap_effective)  CAP_TO_MASK(CAP_SYS_RAWIO)) {
 force_sig(SIGTERM, p);
 } else {
 force_sig(SIGKILL, p);
 }
 
 with this?
 
 force_sig(SIGKILL, p);
 
 in mm/oom_kill.c.

Nice. Your suggestion made the error goes away.

We are still testing in order to compare between your OOM Killer and
Original OOM Killer.

BR,

Mauricio Lin.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: User space out of memory approach

2005-01-27 Thread Andrea Arcangeli
On Thu, Jan 27, 2005 at 02:54:13PM -0400, Mauricio Lin wrote:
 Hi Andrea,
 
 On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli [EMAIL PROTECTED] 
 wrote:
  On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
   Sometimes the first application to be killed is XFree. AFAIK the
  
  This makes more sense now. You need somebody trapping sigterm in order
  to lockup and X sure traps it to recover the text console.
  
  Can you replace this:
  
  if (cap_t(p-cap_effective)  CAP_TO_MASK(CAP_SYS_RAWIO)) {
  force_sig(SIGTERM, p);
  } else {
  force_sig(SIGKILL, p);
  }
  
  with this?
  
  force_sig(SIGKILL, p);
  
  in mm/oom_kill.c.
 
 Nice. Your suggestion made the error goes away.
 
 We are still testing in order to compare between your OOM Killer and
 Original OOM Killer.

Ok, thanks for the confirmation. So my theory was right.

Basically we've to make this patch, now that you already edited the
code, can you diff and send a patch that will be the 6/5 in the serie?

(then after fixing this last very longstanding [now deadlock prone too]
bug, we can think how to make at a 7/5 that will wait a few seconds
after sending a sigterm, to fallback into a sigkill, that shouldn't be
difficult, but the above 6/5 will already make the code correct)

Note, if you add swap it'll workaround it too since then the memhog will
be allowed to grow to a larger rss than X. With 128m of ram and no swap,
X is one of the biggest with xshm involved from some client app
allocating lots of pictures. I could never notice since I always tested
it either with swap or on higher mem systems and my test box runs
with an idle X too which isn't that big ;).
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: User space out of memory approach

2005-01-27 Thread Andrew Morton
Andrea Arcangeli [EMAIL PROTECTED] wrote:

   Can you replace this:
   
   if (cap_t(p-cap_effective)  CAP_TO_MASK(CAP_SYS_RAWIO)) {
   force_sig(SIGTERM, p);
   } else {
   force_sig(SIGKILL, p);
   }
   
   with this?
   
   force_sig(SIGKILL, p);
   
   in mm/oom_kill.c.
  
  Nice. Your suggestion made the error goes away.
  
  We are still testing in order to compare between your OOM Killer and
  Original OOM Killer.
 
 Ok, thanks for the confirmation. So my theory was right.
 
 Basically we've to make this patch, now that you already edited the
 code, can you diff and send a patch that will be the 6/5 in the serie?
 

I've already queued a patch for this:

--- 25/mm/oom_kill.c~mm-fix-several-oom-killer-bugs-fix Thu Jan 27 13:56:58 2005
+++ 25-akpm/mm/oom_kill.c   Thu Jan 27 13:57:19 2005
@@ -198,12 +198,7 @@ static void __oom_kill_task(task_t *p)
p-time_slice = HZ;
p-memdie = 1;
 
-   /* This process has hardware access, be more careful. */
-   if (cap_t(p-cap_effective)  CAP_TO_MASK(CAP_SYS_RAWIO)) {
-   force_sig(SIGTERM, p);
-   } else {
-   force_sig(SIGKILL, p);
-   }
+   force_sig(SIGKILL, p);
 }
 
 static struct mm_struct *oom_kill_task(task_t *p)

However.  This means that we'll now kill off tasks which had hardware
access.  What are the implications of this?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: User space out of memory approach

2005-01-27 Thread Andrea Arcangeli
On Thu, Jan 27, 2005 at 02:29:43PM -0800, Andrew Morton wrote:
 I've already queued a patch for this:
 
 --- 25/mm/oom_kill.c~mm-fix-several-oom-killer-bugs-fix   Thu Jan 27 
 13:56:58 2005
 +++ 25-akpm/mm/oom_kill.c Thu Jan 27 13:57:19 2005
 @@ -198,12 +198,7 @@ static void __oom_kill_task(task_t *p)
   p-time_slice = HZ;
   p-memdie = 1;
  
 - /* This process has hardware access, be more careful. */
 - if (cap_t(p-cap_effective)  CAP_TO_MASK(CAP_SYS_RAWIO)) {
 - force_sig(SIGTERM, p);
 - } else {
 - force_sig(SIGKILL, p);
 - }
 + force_sig(SIGKILL, p);
  }
  
  static struct mm_struct *oom_kill_task(task_t *p)

Thanks.

 However.  This means that we'll now kill off tasks which had hardware
 access.  What are the implications of this?

The implication of the above is basically that the X server won't be
able to restore the text mode, but that avoids the deadlock ;).

And they had not necessairly hardware access. They might have hardware
access. Note that an app may have hardware access even if it has no
rawio capabilities. One can run iopl and then change uid just fine. So
the above check is quite weak since it leaves the kernel susceptible to
bugs and memleaks in any app started by root. Kernel shouldn't trust
root apps, all apps are buggy, root apps too (I even once fixed a signal
race in /sbin/init that showed up with the schedule child first sched
optimization ;).

iopl and ioperm are the only two things we care about.  We can a
synchronous reliable eflags/ioperm value only from the regs in the
task context. Problem is that since we can pick a task to kill that
isn't necessairly the current task, we should start to approximate, and
assume the process is sleeping. The regs must be saved during
reschedule, so it should cache the old contents. So perhaps we can get a
pratically reliable eflags dump from the tss_struct. But this will not
be common code and it'll require a specialized arch API. Like
has_hw_access(). Only then we can make a stronger assumption and be
truly careful about sending SIGKILL.

The right way to do this is probably to wait a few seconds before
sending the sigkill. I'm not currently sure if it worth adding the
has_hw_access(). But certainly I would prefer to do nothing special with
only the sys_rawio capability. I thought I could wait the other patches
to be merged to avoid confusion before making more changes (since it'd
be a pretty self contained feature), but I can do that now if you
prefer.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: User space out of memory approach

2005-01-27 Thread Andrea Arcangeli
On Thu, Jan 27, 2005 at 03:35:35PM -0800, Andrew Morton wrote:
 On x86 we could perhaps test for non-nullness of tsk-thread-io_bitmap_ptr?

yes for ioports. But I'm afraid I was too optimistic about eflags for
iopl, that's not in the per-task tss, it's only stored at the very top
of the kernel stack and inherit during fork/clone. So we probably need
to check esp0 and read the top of the stack to see if a task has eflags
set. esp0 is definitely stored in the thread struct when the task is
rescheduled, and it cannot change for each given task, so we can access
it even while the task is runnable and it shouldn't be corrupted by
iret. But the problem is sysenter is optimized not to save eflags on the
kernel stack, so the top of the stack - 12bytes would not contain eflags
if sysenter is in use.

So basically we'd need to change iopl to propagate the info to the task
struct synchronously somehow, because we can't read it reliably from the
kernel stack.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: User space out of memory approach

2005-01-27 Thread Andrew Morton
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

> And they had not necessarily hardware access. They might have hardware
> access.

On x86 we could perhaps test for non-nullness of tsk->thread->io_bitmap_ptr?

> I thought I could wait the other patches
> to be merged to avoid confusion before making more changes (since it'd
> be a pretty self contained feature), but I can do that now if you
> prefer.

I'll send your current stuff off to Linus in the next few days - we can let
that sit for a while, use that as a base for further work.



Re: User space out of memory approach

2005-01-26 Thread Mauricio Lin
Hi Andrea,

On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > Sometimes the first application to be killed is XFree. AFAIK the
> 
> This makes more sense now. You need somebody trapping sigterm in order
> to lockup and X sure traps it to recover the text console.
> 
> Can you replace this:
> 
>     if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
>         force_sig(SIGTERM, p);
>     } else {
>         force_sig(SIGKILL, p);
>     }
> 
> with this?

OK, let me test it. If I get some news, I will let you know.

> 
> force_sig(SIGKILL, p);
> 
> in mm/oom_kill.c.

BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-25 Thread Thomas Gleixner
On Tue, 2005-01-25 at 20:11 -0400, Mauricio Lin wrote:
> > Can you please show the kernel messages ?
> 
> OK. We will try to reach a situation where the printk messages are
> written entirely to the log file and show you the kernel messages. But
> as I said: usually the printk messages are not written to the log
> file using Andrea's patch, while using the original OOM Killer we can
> see the messages in the log file. The syslog.conf file is the same for
> both OOM Killers (Andrea's and the original). Do you have any idea what
> is happening to the log file?

Add "console=ttyS0,115200" to your kernel command line so you get all
the messages on the serial console.

tglx




Re: User space out of memory approach

2005-01-25 Thread Andrea Arcangeli
On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> Sometimes the first application to be killed is XFree. AFAIK the

This makes more sense now. You need somebody trapping sigterm in order
to lockup and X sure traps it to recover the text console.

Can you replace this:

    if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
        force_sig(SIGTERM, p);
    } else {
        force_sig(SIGKILL, p);
    }

with this?

force_sig(SIGKILL, p);

in mm/oom_kill.c.

This should fix it. The problem is that SIGTERM is unsafe even if the app
is not malicious: there's not enough ram to page in the userland
sighandler, so the system locks up.

We need a sort of timeout where we fall back to SIGKILL if SIGTERM
didn't help.
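
To make the fallback idea concrete, here is a minimal userspace sketch of
it. This is not the kernel change itself, only an analogy built on
kill() and waitpid(). The helper names term_then_kill() and
spawn_sleeper() and the 10 ms polling interval are assumptions made for
the example.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Ask child `pid` to exit with SIGTERM; if it has not exited after
 * roughly `timeout_ms` milliseconds, fall back to SIGKILL.
 * Returns the signal that was needed (SIGTERM or SIGKILL), -1 on error.
 */
static int term_then_kill(pid_t pid, int timeout_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int waited_ms, status;

    if (kill(pid, SIGTERM) < 0)
        return -1;

    for (waited_ms = 0; waited_ms < timeout_ms; waited_ms += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return SIGTERM;     /* the child went away in time */
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* SIGTERM did not help: SIGKILL cannot be trapped or ignored. */
    if (kill(pid, SIGKILL) < 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return SIGKILL;
}

/* Demo helper: fork a child that sleeps forever, optionally ignoring
 * SIGTERM. The disposition is set before fork() so it is inherited. */
static pid_t spawn_sleeper(int ignore_term)
{
    void (*old)(int) = SIG_DFL;
    pid_t pid;

    if (ignore_term)
        old = signal(SIGTERM, SIG_IGN);
    pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    if (ignore_term)
        signal(SIGTERM, old);   /* restore the parent's disposition */
    return pid;
}
```

A child spawned with spawn_sleeper(0) goes away on the SIGTERM, while
one spawned with spawn_sleeper(1) survives it and is only removed by the
SIGKILL fallback.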

Anyway this is not a new bug, I didn't touch a single bit in that code.
I'd really like to see the current fixes merged, then we can take care of
root apps getting killed reliably. In all my tests I always run the
malicious app as non-root, and anyway I never trap sigterm (X is tiny in
my setup, so it never gets killed). Probably the GUI stuff you opened
has increased the size of X significantly, enough for X to be killed.


Re: User space out of memory approach

2005-01-25 Thread Mauricio Lin
Hi Thomas,

On Tue, 25 Jan 2005 22:39:39 +0100, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> On Tue, 2005-01-25 at 17:13 -0400, Mauricio Lin wrote:
> > Hi Andrea,
> >
> > Your OOM Killer patch was tested and a strange behaviour was found.
> > Basically, as a normal user we started some applications such as
> > openoffice, mozilla and emacs.
> > And as root (in another tty) we started a simple program that calls
> > malloc in an endless loop, as below:
> >
> > int main (void)
> > {
> >   int * mem;
> >   for (;;)
> >     mem = (int *) malloc(sizeof(int));
> >   return 0;
> > }
> >
> >
> > Using the original OOM Killer, the malloc program is the first
> > application killed and the system is restored to a usable state. After
> > applying your patch and repeating the same experiment, the OOM Killer
> > does not kill the malloc program and enters a kind of endless loop, as
> > below:
> >
> > 1) out_of_memory is invoked;
> > 2) select_bad_process is invoked;
> 
> Which process is selected ?
Sometimes the first application to be killed is XFree. AFAIK the
malloc program is never killed, because the OOM Killer never stops
running. Usually we are not able to check the kernel log file after
rebooting the system, because nothing was written there (perhaps
syslogd or klogd were killed during OOM). But I can see the printk
messages on the screen while the OOM Killer is active. This does not
happen with the original OOM Killer.

I added some printks in order to trace the OOM Killer and IMHO what is
going on is this:

The out_of_memory function is invoked and after that
select_bad_process is also invoked, so it starts to assign points to
each task. But during the do_each_thread / while_each_thread loop the
condition:

if ((unlikely(test_tsk_thread_flag(p, TIF_MEMDIE)) ||
     (p->flags & PF_EXITING)) &&
    !(p->flags & PF_DEAD))
        return ERR_PTR(-1UL);

is true and it leaves the select_bad_process function because of the
return statement.

So execution returns to the point where select_bad_process was called,
i.e., in the out_of_memory function. The condition in the
out_of_memory function:

if (PTR_ERR(p) == -1UL)
        goto out;

is also true, so it goes to the "out" label and leaves the
out_of_memory function. But because of the OOM state the out_of_memory
function is invoked again, and after that select_bad_process is
also invoked again. During the do_each_thread / while_each_thread
loop the same condition as mentioned above is true again. So it leaves
the select_bad_process function because of the return statement, goes
to the "out" label and leaves the out_of_memory function again. This
behaviour repeats continuously for a long time until I stop waiting and
reboot the system by hand.
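
The behaviour described above can be reproduced with a small userspace
model. This is only a sketch under simplifying assumptions: struct
model_task, its points field and the flag values are made up for the
example, TIF_MEMDIE is folded into the same flags word as the PF_ bits,
and setting TIF_MEMDIE stands in for actually sending the signal.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR() helpers. */
#define ERR_PTR(err)  ((void *)(long)(err))
#define PTR_ERR(ptr)  ((long)(ptr))

/* Illustrative flag values, not the kernel's. */
#define TIF_MEMDIE  0x1u  /* already chosen by the OOM killer */
#define PF_EXITING  0x2u  /* on its way out */
#define PF_DEAD     0x4u  /* completely gone */

struct model_task {
    unsigned int flags;
    unsigned long points;  /* badness score */
};

/* Return the worst task, NULL if there is none, or ERR_PTR(-1) if some
 * task is already dying and we should simply wait for it. */
static struct model_task *select_bad_process(struct model_task *t, size_t n)
{
    struct model_task *chosen = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (t[i].flags & PF_DEAD)
            continue;
        if (t[i].flags & (TIF_MEMDIE | PF_EXITING))
            return ERR_PTR(-1L);
        if (!chosen || t[i].points > chosen->points)
            chosen = &t[i];
    }
    return chosen;
}

/* Returns 1 if a task was marked for killing, 0 if the call bailed out. */
static int out_of_memory(struct model_task *t, size_t n)
{
    struct model_task *p = select_bad_process(t, n);

    if (PTR_ERR(p) == -1L)
        return 0;           /* the "goto out" path */
    if (!p)
        return 0;
    p->flags |= TIF_MEMDIE; /* stands in for force_sig(..., p) */
    return 1;
}
```

In this model, once one task carries TIF_MEMDIE every later call to
out_of_memory() takes the "goto out" path and marks nothing else until
that task becomes PF_DEAD. If the chosen task never manages to exit, the
calls repeat forever, which matches what the printks show.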

> Can you please show the kernel messages ?

OK. We will try to reach a situation where the printk messages are
written entirely to the log file and show you the kernel messages. But
as I said: usually the printk messages are not written to the log
file using Andrea's patch, while using the original OOM Killer we can
see the messages in the log file. The syslog.conf file is the same for
both OOM Killers (Andrea's and the original). Do you have any idea what
is happening to the log file?

If you do not mind, could you run the same test case that I
mentioned in my last email? I would like to know if this problem
happens to other people as well.

We tested on laptop and desktop machines with 128MB of RAM and
swap space disabled.


BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-25 Thread Thomas Gleixner
On Tue, 2005-01-25 at 17:13 -0400, Mauricio Lin wrote:
> Hi Andrea,
> 
> Your OOM Killer patch was tested and a strange behaviour was found.
> Basically, as a normal user we started some applications such as
> openoffice, mozilla and emacs.
> And as root (in another tty) we started a simple program that calls
> malloc in an endless loop, as below:
> 
> int main (void)
> {
>   int * mem;
>   for (;;)
>     mem = (int *) malloc(sizeof(int));
>   return 0;
> }
> 
> 
> Using the original OOM Killer, the malloc program is the first
> application killed and the system is restored to a usable state. After
> applying your patch and repeating the same experiment, the OOM Killer
> does not kill the malloc program and enters a kind of endless loop, as
> below:
> 
> 1) out_of_memory is invoked;
> 2) select_bad_process is invoked;

Which process is selected ?

> 3) the following condition is fulfilled:
> if ((unlikely(test_tsk_thread_flag(p, TIF_MEMDIE)) ||
>      (p->flags & PF_EXITING)) &&
>     !(p->flags & PF_DEAD))
>         return ERR_PTR(-1UL);

???

Can you please show the kernel messages ?

tglx




Re: User space out of memory approach

2005-01-25 Thread Mauricio Lin
Hi Andrea,

Your OOM Killer patch was tested and a strange behaviour was found.
Basically, as a normal user we started some applications such as
openoffice, mozilla and emacs.
And as root (in another tty) we started a simple program that calls
malloc in an endless loop, as below:

#include <stdlib.h>

int main (void)
{
  int * mem;
  /* allocate forever and never free, so memory leaks until OOM */
  for (;;)
    mem = (int *) malloc(sizeof(int));
  return 0;
}


Using the original OOM Killer, the malloc program is the first
application killed and the system is restored to a usable state. After
applying your patch and repeating the same experiment, the OOM Killer
does not kill the malloc program and enters a kind of endless loop, as
below:

1) out_of_memory is invoked;
2) select_bad_process is invoked;
3) the following condition is fulfilled:
   if ((unlikely(test_tsk_thread_flag(p, TIF_MEMDIE)) ||
        (p->flags & PF_EXITING)) &&
       !(p->flags & PF_DEAD))
           return ERR_PTR(-1UL);
4) steps 1, 2 and 3 above are executed again.

This loop (steps 1 to 4) goes on for a long time (and nothing
is killed) until I give up and reboot the system after waiting for
some minutes.

Any comments? What do you think about our test case? Could you
run the same test case, with the malloc program as root and other
graphical applications as a normal user?

Let me know what you think.

BR,

Mauricio Lin.

On Sat, 22 Jan 2005 04:32:19 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Fri, Jan 21, 2005 at 05:45:13PM -0400, Mauricio Lin wrote:
> > Hi Andrew,
> >
> > I have another question. You included an oom_adj entry in /proc for
> > each process. This was the approach you used in order to allow someone
> > or something to interfere with the ranking algorithm from userland,
> > right? So if I have another ranking algorithm in user space, I can use
> > it to complement the kernel decision as necessary. Was that your idea?
> 
> Yes, you should use your userspace algorithm to tune the oom killer via
> the oom_adj and you can check the effect of your changes with oom_score.
> I posted a one liner ugly script to do that a few days ago on l-k.
> 
> The oom_adj has this effect on the badness() code:
> 
> /*
>  * Adjust the score by oomkilladj.
>  */
> if (p->oomkilladj) {
>         if (p->oomkilladj > 0)
>                 points <<= p->oomkilladj;
>         else
>                 points >>= -(p->oomkilladj);
> }
> 
> The bigger the points become, the more likely the task will be chosen
> by the oom killer.
>


Re: User space out of memory approach

2005-01-21 Thread Andrea Arcangeli
On Fri, Jan 21, 2005 at 05:45:13PM -0400, Mauricio Lin wrote:
> Hi Andrew,
> 
> I have another question. You included an oom_adj entry in /proc for
> each process. This was the approach you used in order to allow someone
> or something to interfere with the ranking algorithm from userland,
> right? So if I have another ranking algorithm in user space, I can use
> it to complement the kernel decision as necessary. Was that your idea?

Yes, you should use your userspace algorithm to tune the oom killer via
the oom_adj and you can check the effect of your changes with oom_score.
I posted a one liner ugly script to do that a few days ago on l-k.

The oom_adj has this effect on the badness() code:

        /*
         * Adjust the score by oomkilladj.
         */
        if (p->oomkilladj) {
                if (p->oomkilladj > 0)
                        points <<= p->oomkilladj;
                else
                        points >>= -(p->oomkilladj);
        }

The bigger the points become, the more likely the task will be chosen
by the oom killer.
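
As a worked example of that shift, here is a standalone sketch. The
helper name adjust_points() is made up for the illustration, and it
assumes the adjustment stays well below the width of unsigned long so
that the shift is defined.

```c
/* Apply an oom_adj-style adjustment to a badness score: each positive
 * step doubles the score, each negative step halves it. */
static unsigned long adjust_points(unsigned long points, int oomkilladj)
{
    if (oomkilladj > 0)
        points <<= oomkilladj;
    else if (oomkilladj < 0)
        points >>= -oomkilladj;
    return points;
}
```

So a task with 100 points and an adjustment of 3 ends up with 800
points, while an adjustment of -3 brings it down to 12.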


Re: User space out of memory approach

2005-01-21 Thread Andrea Arcangeli
On Fri, Jan 21, 2005 at 05:27:11PM -0400, Mauricio Lin wrote:
> Hi Andrea,
> 
> I applied your patch and I am checking your code. It is really very
> interesting work. I have a question about the
> __set_current_state(TASK_INTERRUPTIBLE) call you put in the
> out_of_memory function. Don't you think it would be better to use
> set_current_state instead of __set_current_state? AFAIK the
> set_current_state function is more suitable for SMP systems, right?

set_current_state is needed only when you need to place a memory barrier
after __set_current_state. So it's needed in the usual wait_event loop,
right after registering in the waitqueue. Example:

        unsigned long flags;

        wait->flags &= ~WQ_FLAG_EXCLUSIVE;
        spin_lock_irqsave(&q->lock, flags);
        if (list_empty(&wait->task_list))
                __add_wait_queue(q, wait);
        /*
         * don't alter the task state if this is just going to
         * queue an async wait queue callback
         */
        if (is_sync_wait(wait))
                set_current_state(state);
        spin_unlock_irqrestore(&q->lock, flags);

and even in the above it is needed only because spin_unlock has inclusive
semantics on ia64. In 2.4 there was no unlock at all after
set_current_state and it was like this:

        set_current_state(TASK_UNINTERRUPTIBLE);        \
        if (condition)                                  \
                break;                                  \
        schedule();                                     \

The rule of thumb is that if there's nothing between set_current_state
and schedule() then __set_current_state is more efficient and equally
safe to use. And the oom killer path I posted falls in this category:
nothing in between set_current_state and schedule, so no reason to place
memory barriers in there.
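
The difference between the two helpers can be sketched in userspace with
C11 atomics. This is only an analogy, not the kernel implementation: the
task_state variable and the names plain_set_state() and
barrier_set_state() are made up for the example.

```c
#include <stdatomic.h>

enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };

/* Hypothetical stand-in for current->state. */
static _Atomic long task_state = TASK_RUNNING;

/* Analogue of __set_current_state(): a plain store, no ordering implied. */
static void plain_set_state(long s)
{
    atomic_store_explicit(&task_state, s, memory_order_relaxed);
}

/* Analogue of set_current_state(): the same store followed by a full
 * barrier, so a later load of the wakeup condition cannot be reordered
 * before the state change. */
static void barrier_set_state(long s)
{
    atomic_store_explicit(&task_state, s, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}
```

The barrier variant only pays off when a condition is tested between the
state change and schedule(), which is why the plain variant is enough
when nothing sits in between.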

Hope this helps ;)


Re: User space out of memory approach

2005-01-21 Thread Mauricio Lin
Hi Andrew,

I have another question. You included an oom_adj entry in /proc for
each process. This was the approach you used in order to allow someone
or something to interfere with the ranking algorithm from userland,
right? So if I have another ranking algorithm in user space, I can use
it to complement the kernel decision as necessary. Was that your idea?

BR,

Mauricio Lin.


On Fri, 21 Jan 2005 17:27:11 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi Andrea,
> 
> I applied your patch and I am checking your code. It is really very
> interesting work. I have a question about the
> __set_current_state(TASK_INTERRUPTIBLE) call you put in the
> out_of_memory function. Don't you think it would be better to use
> set_current_state instead of __set_current_state? AFAIK the
> set_current_state function is more suitable for SMP systems, right?
> 
> BR,
> 
> Mauricio Lin.
> 
> 
> On Tue, 11 Jan 2005 09:38:37 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> 
> wrote:
> > On Tue, Jan 11, 2005 at 01:35:47AM +0100, Thomas Gleixner wrote:
> > > confirmed fix for this available. It was posted more than once.
> >
> > I posted 6 patches (1/4,2/4,3/4,4/4,5/4,6/4), they should be all
> > applied to mainline, they're self contained. They add the userspace
> > ratings too.
> >
> > Those patches fix a longstanding PF_MEMDIE race too and they optimize
> > used_math as well.
> >
> > I'm running with all 6 patches applied with an uptime of 6 days on SMP
> > and no problems at all. They're all 6 patches applied to the kotd too
> > (plus the other bits posted on l-k as well for the write throttling,
> > just one bit is still missing but I'll add it soon):
> >
> > ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD
> >
> >
>


Re: User space out of memory approach

2005-01-21 Thread Mauricio Lin
Hi Andrea,

I applied your patch and I am checking your code. It is really very
interesting work. I have a question about the
__set_current_state(TASK_INTERRUPTIBLE) call you put in the
out_of_memory function. Don't you think it would be better to use
set_current_state instead of __set_current_state? AFAIK the
set_current_state function is more suitable for SMP systems, right?

BR,

Mauricio Lin.


On Tue, 11 Jan 2005 09:38:37 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 11, 2005 at 01:35:47AM +0100, Thomas Gleixner wrote:
> > confirmed fix for this available. It was posted more than once.
> 
> I posted 6 patches (1/4,2/4,3/4,4/4,5/4,6/4), they should be all
> applied to mainline, they're self contained. They add the userspace
> ratings too.
> 
> Those patches fix a longstanding PF_MEMDIE race too and they optimize
> used_math as well.
> 
> I'm running with all 6 patches applied with an uptime of 6 days on SMP
> and no problems at all. They're all 6 patches applied to the kotd too
> (plus the other bits posted on l-k as well for the write throttling,
> just one bit is still missing but I'll add it soon):
> 
> ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD
> 
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: User space out of memory approach

2005-01-21 Thread Mauricio Lin
Hi Andrea,

I applied your patch and I am checking your code. It is really a very
interesting work. I have a question about the function
__set_current_state(TASK_INTERRUPTIBLE) you put in out_of_memory
function. Do not you think it would be better put set_current_state
instead of __set_current_state function? AFAIK the set_current_state
function is more feasible for SMP systems, right?

BR,

Mauricio Lin.


On Tue, 11 Jan 2005 09:38:37 +0100, Andrea Arcangeli [EMAIL PROTECTED] wrote:
 On Tue, Jan 11, 2005 at 01:35:47AM +0100, Thomas Gleixner wrote:
  confirmed fix for this available. It was posted more than once.
 
 I posted 6 patches (1/4,2/4,3/4,4/4,5/4,6/4), they should be all
 applied to mainline, they're self contained. They add the userspace
 ratings too.
 
 Those patches fixes a longstanding PF_MEMDIE race too and they optimize
 used_math as well.
 
 I'm running with all 6 patches applied with an uptime of 6 days on SMP
 and no problems at all. They're all 6 patches applied to the kotd too
 (plus the other bits posted on l-k as well for the write throttling,
 just one bit is still missing but I'll add it soon):
 
 ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: User space out of memory approach

2005-01-21 Thread Mauricio Lin
Hi Andrew,

I have another question. You included an oom_adj entry in /proc for
each process. This was the approach you used in order to allow someone
or something to interfere the ranking algorithm from userland, right?
So if i have an another ranking algorithm in user space, I can use it
to complement the kernel decision as necessary. Was it your idea?

BR,

Mauricio Lin.


On Fri, 21 Jan 2005 17:27:11 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi Andrea,
>
> I applied your patch and I am checking your code. It is really very
> interesting work. I have a question about the
> __set_current_state(TASK_INTERRUPTIBLE) call you put in the out_of_memory
> function. Don't you think it would be better to use set_current_state
> instead of __set_current_state? AFAIK set_current_state is more
> suitable for SMP systems, right?
>
> BR,
>
> Mauricio Lin.
>
>
> On Tue, 11 Jan 2005 09:38:37 +0100, Andrea Arcangeli <[EMAIL PROTECTED]>
> wrote:
> > On Tue, Jan 11, 2005 at 01:35:47AM +0100, Thomas Gleixner wrote:
> > > confirmed fix for this available. It was posted more than once.
> >
> > I posted 6 patches (1/4,2/4,3/4,4/4,5/4,6/4), they should be all
> > applied to mainline, they're self contained. They add the userspace
> > ratings too.
> >
> > Those patches fixes a longstanding PF_MEMDIE race too and they optimize
> > used_math as well.
> >
> > I'm running with all 6 patches applied with an uptime of 6 days on SMP
> > and no problems at all. They're all 6 patches applied to the kotd too
> > (plus the other bits posted on l-k as well for the write throttling,
> > just one bit is still missing but I'll add it soon):
> >
> > ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD
> >
>



Re: User space out of memory approach

2005-01-21 Thread Andrea Arcangeli
On Fri, Jan 21, 2005 at 05:27:11PM -0400, Mauricio Lin wrote:
> Hi Andrea,
>
> I applied your patch and I am checking your code. It is really very
> interesting work. I have a question about the
> __set_current_state(TASK_INTERRUPTIBLE) call you put in the out_of_memory
> function. Don't you think it would be better to use set_current_state
> instead of __set_current_state? AFAIK set_current_state is more
> suitable for SMP systems, right?

set_current_state is needed only when you need to place a memory barrier
after __set_current_state. So it's needed in the usual wait_event loop,
right after registering in the waitqueue. Example:

	unsigned long flags;

	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
	spin_lock_irqsave(&q->lock, flags);
	if (list_empty(&wait->task_list))
		__add_wait_queue(q, wait);
	/*
	 * don't alter the task state if this is just going to
	 * queue an async wait queue callback
	 */
	if (is_sync_wait(wait))
		set_current_state(state);
	spin_unlock_irqrestore(&q->lock, flags);

and even in the above it is needed only because spin_unlock has inclusive
semantics on ia64. In 2.4 there was no unlock at all after
set_current_state and it was like this:

		set_current_state(TASK_UNINTERRUPTIBLE);		\
		if (condition)						\
			break;						\
		schedule();						\

The rule of thumb is that if there's nothing between set_current_state
and schedule() then __set_current_state is more efficient and equally
safe to use. And the oom killer path I posted falls in this category,
nothing in between set_current_state and schedule, so no reason to place
memory barriers in there.
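
A rough userspace analogy of that rule, written with C11 atomics rather than
kernel primitives. Everything here is an assumption made for illustration:
fake_task, plain_set_state, barrier_set_state, sleeper_should_sleep and
fake_wake_up are hypothetical names, and the fences only approximate what the
kernel's barriers do on a given architecture.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };

/* Hypothetical stand-in for a task and the event it waits for. */
struct fake_task {
	atomic_int state;       /* plays the role of current->state */
	atomic_bool condition;  /* the event the sleeper waits for */
};

static void fake_task_init(struct fake_task *t)
{
	atomic_init(&t->state, TASK_RUNNING);
	atomic_init(&t->condition, false);
}

/* Analogue of __set_current_state(): a plain store, no ordering implied. */
static void plain_set_state(struct fake_task *t, int state)
{
	atomic_store_explicit(&t->state, state, memory_order_relaxed);
}

/* Analogue of set_current_state(): the same store plus a full barrier, so
 * the store is ordered before any later load of the wakeup condition. */
static void barrier_set_state(struct fake_task *t, int state)
{
	atomic_store_explicit(&t->state, state, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Sleeper side of a wait loop: the condition check sits between setting the
 * state and sleeping, so the barrier variant is the one to use here.
 * Returns true if the caller should now go to sleep. */
static bool sleeper_should_sleep(struct fake_task *t)
{
	assert(t != NULL);
	barrier_set_state(t, TASK_INTERRUPTIBLE);
	if (atomic_load_explicit(&t->condition, memory_order_relaxed)) {
		plain_set_state(t, TASK_RUNNING);  /* event already happened */
		return false;
	}
	return true;
}

/* Waker side: publish the condition, then look at the task state.
 * Returns true if a sleeping task was made runnable. */
static bool fake_wake_up(struct fake_task *t)
{
	atomic_store_explicit(&t->condition, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&t->state, memory_order_relaxed) != TASK_RUNNING) {
		plain_set_state(t, TASK_RUNNING);
		return true;
	}
	return false;
}
```

When nothing is read between the state store and the sleep, as in the oom
killer path discussed here, the plain variant is the cheaper choice.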

Hope this helps ;)


Re: User space out of memory approach

2005-01-21 Thread Andrea Arcangeli
On Fri, Jan 21, 2005 at 05:45:13PM -0400, Mauricio Lin wrote:
> Hi Andrea,
>
> I have another question. You included an oom_adj entry in /proc for
> each process. Was this the approach you used to allow someone or
> something to influence the ranking algorithm from userland? If so,
> and I have another ranking algorithm in user space, I can use it to
> complement the kernel's decision as necessary. Was that your idea?

Yes, you should use your userspace algorithm to tune the oom killer via
oom_adj, and you can check the effect of your changes with oom_score.
I posted an ugly one-liner script to do that a few days ago on l-k.
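
A minimal userspace sketch of that tuning loop, assuming the entries are
exposed as /proc/<pid>/oom_adj (writable) and /proc/<pid>/oom_score
(readable), each holding a single decimal number. The helper names
proc_entry_path, write_oom_adj and read_oom_score are hypothetical, and the
exact file formats may differ between kernel versions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build "/proc/<pid>/<entry>" in buf. Returns 0, or -1 if the entry name
 * contains a slash or the result does not fit. */
static int proc_entry_path(char *buf, size_t len, pid_t pid, const char *entry)
{
	int n;

	assert(buf != NULL && entry != NULL);
	if (strchr(entry, '/') != NULL)
		return -1;
	n = snprintf(buf, len, "/proc/%ld/%s", (long)pid, entry);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write an adjustment to /proc/<pid>/oom_adj. Returns 0 on success. */
static int write_oom_adj(pid_t pid, int adj)
{
	char path[64];
	FILE *f;
	int ok;

	if (proc_entry_path(path, sizeof(path), pid, "oom_adj") != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fprintf(f, "%d\n", adj) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read /proc/<pid>/oom_score. Returns the score, or -1 on failure. */
static long read_oom_score(pid_t pid)
{
	char path[64];
	FILE *f;
	long score;

	if (proc_entry_path(path, sizeof(path), pid, "oom_score") != 0)
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fscanf(f, "%ld", &score) != 1)
		score = -1;
	fclose(f);
	return score;
}
```

A userspace ranking daemon would call write_oom_adj() for each task it wants
to protect or expose, then read_oom_score() to see what the kernel makes of it.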

The oom_adj has this effect on the badness() code:

	/*
	 * Adjust the score by oomkilladj.
	 */
	if (p->oomkilladj) {
		if (p->oomkilladj > 0)
			points <<= p->oomkilladj;
		else
			points >>= -(p->oomkilladj);
	}

The bigger the points become, the more likely the task is to be chosen
by the oom killer.
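
A small standalone sketch of that adjustment, assuming a positive oomkilladj
shifts the score left and a negative one shifts it right by the absolute
value. adjust_points is a hypothetical helper name, not a kernel function,
and the range check is an addition for the sketch only.

```c
#include <assert.h>
#include <limits.h>

/* Scale a badness score by a signed power-of-two adjustment. */
static unsigned long adjust_points(unsigned long points, int oomkilladj)
{
	const int bits = (int)(sizeof(points) * CHAR_BIT);

	/* Shifting by the full width or more is undefined in C. */
	assert(oomkilladj > -bits && oomkilladj < bits);

	if (oomkilladj > 0)
		points <<= oomkilladj;      /* each step doubles the score */
	else if (oomkilladj < 0)
		points >>= -oomkilladj;     /* each step halves the score */
	return points;
}
```

With a starting score of 1000, an adjustment of 3 gives 8000 and an
adjustment of -3 gives 125, so every step changes the ranking weight by a
factor of two.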


Re: User space out of memory approach

2005-01-19 Thread Bodo Eggert
On Thu, 20 Jan 2005, Edjard Souza Mota wrote:

> > > > What about creating a linked list of (stackable) algorithms which can
> > > > be extended by loading modules and re-sorted using {proc,sys}fs? It
> > > > will avoid the extra process, the extra CPU time (and task switches)
> > > > to frequently update the list and I think it will decrease the
> > > > typical amount of used memory, too.
> > >
> > > Wouldn't this bring the (set of) ranking algorithm(s) back to the
> > > kernel? This is exactly what we're trying to avoid.
> >
> > You're trying to avoid it in order to let admins try other ranking
> > algorithms (at least that's what I read). The module approach seems to be
> > flexible enough to do that, and it avoids the mentioned issues. If you
> > really want a userspace daemon, it can be controlled by a module.-)
>
> Yes, your reading is correct, but this choice should take into account
> the "patterns" of how memory is allocated for the user's most-used
> applications. Why? The closer the ranking gets to "The Best choice", the
> longer it will take to invoke the oom killer again.

ACK.

> I am wondering how a module could control a user space daemon if it
> hasn't started yet? I mean, processes in user space are supposed to start
> only after all modules are loaded (those loadable at boot time). So, this
> user space daemon would break this standard. But if we manage to have a
> special module that takes care of loading this stack of OOM Killer ranking
> algorithms, then the daemon would not need to break the default order of
> loading modules.

I don't think there needs to be a special order while loading the
modules, since each module will provide a defined interface which can be
registered in a linked list and sorted on demand. Just init all
compiled-in modules and sort using a kernel parameter (remembering
modprobe might be fubar), then modprobe (if compiled-in) all missing
decision modules from the list (appending them) and re-sort again.

If the admin wants to add a module later, he can also change the order
again, possibly after configuring the module. Disabling may be done
either by moving a decision past one without fall-through or by using a
separate list.

There will be a need for a controlling instance which will build a list of
candidates and pass it to each decision module in turn until the victim
is found. Maybe the list will need a field for a ranking offset and a
scaling factor if a module is not supposed to make the final decision but
to modify the ranking for some blessed processes.
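
A minimal userspace sketch of such a stack, under stated assumptions: all
names (oom_candidate, oom_decision_module, select_victim, blessing_filter,
default_ranking) are hypothetical, the "ranking offset and scaling factor"
is simplified to dividing a blessed task's score by 4, and a module signals
fall-through by returning -1.

```c
#include <assert.h>
#include <stddef.h>

struct oom_candidate {
	int pid;
	long score;     /* higher means more likely to be killed */
	int blessed;    /* non-zero: the admin asked to spare this task */
};

struct oom_decision_module {
	const char *name;
	/* Returns the index of the chosen victim, or -1 to fall through. */
	int (*decide)(struct oom_candidate *c, size_t n);
	struct oom_decision_module *next;
};

/* Controlling instance: hand the candidate list to each module in list
 * order until one of them names a victim. Returns its pid, or -1. */
static int select_victim(struct oom_decision_module *head,
			 struct oom_candidate *c, size_t n)
{
	struct oom_decision_module *m;

	assert(c != NULL || n == 0);
	for (m = head; m != NULL; m = m->next) {
		int idx = m->decide(c, n);

		if (idx >= 0 && (size_t)idx < n)
			return c[idx].pid;
	}
	return -1;
}

/* Filter module: lowers the score of blessed tasks, never decides. */
static int blessing_filter(struct oom_candidate *c, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i].blessed)
			c[i].score /= 4;   /* illustrative scaling factor */
	return -1;
}

/* Final module: picks the candidate with the highest score. */
static int default_ranking(struct oom_candidate *c, size_t n)
{
	size_t i, best = 0;

	if (n == 0)
		return -1;
	for (i = 1; i < n; i++)
		if (c[i].score > c[best].score)
			best = i;
	return (int)best;
}
```

Reordering the stack is then just relinking the next pointers, which is the
part that could be driven from {proc,sys}fs.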

> The init could be changed to start the daemon, and then the module would
> start controlling it. Am I right?

It can, but it should be run from the (possibly autogenerated)  
initr{d,amfs} if it's used.

> So that's why people are complaining that every distro would have to
> update the init and load this new module. Correct?

ACK. (It's just me - for now)

Upgrading kernels used to be a drop-in replacement, except for ISDN and 
(for 2.4 -> 2.6) v4l. I like it that way.
-- 
Top 100 things you don't want the sysadmin to say:
66. What do you mean you needed that directory?

Friß, Spammer: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]


Re: User space out of memory approach

2005-01-19 Thread Edjard Souza Mota
> > > If my system needs the OOM killer, it's usually unresponsive to most
> > > userspace applications. A normal daemon would be swapped out before the
> > > runaway dhcpd grows larger than the web cache. It would have to be an
> > > mlocked RT task started from early userspace. It would be difficult to
> > > set up (unless you upgrade your distro), and almost nobody will feel
> > > like tweaking it to take the benefit (OOM == -ECANNOTHAPPEN).
> >
> > Please correct me if I got it wrong: as the daemon in this case is not a
> > normal one, since it never gets rated for its own safety,
>
> That's its own task, it must make sure not to commit suicide. I forgot
> about that.

Ok.

> > then it needs an RT lock whenever
> > the system boots.
> 
> It may not be blocked by a random RT task iff the RT task is supposed to
> be OOM-killed. Therefore it *MUST* run at the highest priority and be
> locked into the RAM.
> 
> It *SHOULD* be run at boot time, too, just in case it's needed early.

Yes. That's the idea of the application we posted to test the oom
killer ranking in user space. At least, we are working to put it at boot
time and these suggestions are very helpful.

> > > What about creating a linked list of (stackable) algorithms which can be
> > > extended by loading modules and re-sorted using {proc,sys}fs? It will avoid
> > > the extra process, the extra CPU time (and task switches) to frequently
> > > update the list and I think it will decrease the typical amount of used
> > > memory, too.
> >
> > Wouldn't this bring the (set of) ranking algorithm(s) back to the kernel?
> > This is exactly what we're trying to avoid.
>
> You're trying to avoid it in order to let admins try other ranking
> algorithms (at least that's what I read). The module approach seems to be
> flexible enough to do that, and it avoids the mentioned issues. If you
> really want a userspace daemon, it can be controlled by a module.-)

Yes, your reading is correct, but this choice should take into account
the "patterns" of how memory is allocated for the user's most-used
applications. Why? The closer the ranking gets to "The Best choice", the
longer it will take to invoke the oom killer again.

I am wondering how a module could control a user space daemon if it
hasn't started yet? I mean, processes in user space are supposed to start
only after all modules are loaded (those loadable at boot time). So, this
user space daemon would break this standard. But if we manage to have a
special module that takes care of loading this stack of OOM Killer ranking
algorithms, then the daemon would not need to break the default order of
loading modules. The init could be changed to start the daemon, and then
the module would start controlling it. Am I right?

So that's why people are complaining that every distro would have to
update the init and load this new module. Correct?

> 
> I'm thinking of something like this:
>
> [X] support stacking of OOM killer ranking algorithms
> [X] Task blessing OOM filter
> [X] Userspace OOM ranking daemon
> [X] Default OOM killer ranking
>
> -vs-
>
> [ ] support stacking of OOM killer ranking algorithms
> ( ) Userspace OOM ranking daemon
> (o) Default OOM killer ranking
> 

Very interesting idea. Will take that into account. Thanks a lot.

-- 
"In a world without fences ... who needs Gates?"


Re: User space out of memory approach

2005-01-18 Thread Bodo Eggert
On Tue, 18 Jan 2005, Edjard Souza Mota wrote:

> > If my system needs the OOM killer, it's usually unresponsive to most
> > userspace applications. A normal daemon would be swapped out before the
> > runaway dhcpd grows larger than the web cache. It would have to be an
> > mlocked RT task started from early userspace. It would be difficult to
> > set up (unless you upgrade your distro), and almost nobody will feel like
> > tweaking it to take the benefit (OOM == -ECANNOTHAPPEN).
>
> Please correct me if I got it wrong: as the daemon in this case is not a
> normal one, since it never gets rated for its own safety,

That's its own task, it must make sure not to commit suicide. I forgot
about that.

> then it needs an RT lock whenever
> the system boots.

It may not be blocked by a random RT task iff the RT task is supposed to
be OOM-killed. Therefore it *MUST* run at the highest priority and be
locked into the RAM.

It *SHOULD* be run at boot time, too, just in case it's needed early.
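
A minimal sketch of how a userspace daemon could meet those two requirements
with standard POSIX calls, assuming it is started with enough privilege to
lock memory and use a real-time policy. highest_rt_priority and
pin_and_prioritize are hypothetical helper names; SCHED_FIFO is one possible
choice of real-time policy, not something the thread prescribes.

```c
#include <assert.h>
#include <sched.h>
#include <sys/mman.h>

/* Highest SCHED_FIFO priority the system offers, or -1 on error. */
static int highest_rt_priority(void)
{
	return sched_get_priority_max(SCHED_FIFO);
}

/* Lock all current and future pages into RAM and switch the calling
 * process to the highest real-time priority.
 * Returns 0 on success, -1 on failure (errno is set by the failing call,
 * typically EPERM or ENOMEM when privileges or limits are insufficient). */
static int pin_and_prioritize(void)
{
	struct sched_param sp = { 0 };
	int prio = highest_rt_priority();

	if (prio < 0)
		return -1;
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		return -1;
	sp.sched_priority = prio;
	if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
		return -1;
	return 0;
}
```

The daemon would call pin_and_prioritize() once, early in its startup, and
refuse to run (or at least warn) if it fails, since an unpinned daemon can be
swapped out exactly when it is needed.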

> > What about creating a linked list of (stackable) algorithms which can be
> > extended by loading modules and re-sorted using {proc,sys}fs? It will avoid
> > the extra process, the extra CPU time (and task switches) to frequently
> > update the list and I think it will decrease the typical amount of used
> > memory, too.
>
> Wouldn't this bring the (set of) ranking algorithm(s) back to the kernel?
> This is exactly what we're trying to avoid.

You're trying to avoid it in order to let admins try other ranking
algorithms (at least that's what I read). The module approach seems to be
flexible enough to do that, and it avoids the mentioned issues. If you
really want a userspace daemon, it can be controlled by a module.-)

I'm thinking of something like this:

[X] support stacking of OOM killer ranking algorithms
[X] Task blessing OOM filter
[X] Userspace OOM ranking daemon
[X] Default OOM killer ranking

-vs-

[ ] support stacking of OOM killer ranking algorithms
( ) Userspace OOM ranking daemon
(o) Default OOM killer ranking

-- 
Exceptions prove the rule, and destroy the battle plan. 


Re: User space out of memory approach

2005-01-18 Thread Edjard Souza Mota
Hi,
 
> If my system needs the OOM killer, it's usually unresponsive to most
> userspace applications. A normal daemon would be swapped out before the
> runaway dhcpd grows larger than the web cache. It would have to be an mlocked
> RT task started from early userspace. It would be difficult to set up (unless
> you upgrade your distro), and almost nobody will feel like tweaking it to
> take the benefit (OOM == -ECANNOTHAPPEN).

Please correct me if I got it wrong: as the daemon in this case is not a
normal one, since it never gets rated for its own safety, then it needs an
RT lock whenever the system boots.

> What about creating a linked list of (stackable) algorithms which can be
> extended by loading modules and re-sorted using {proc,sys}fs? It will avoid
> the extra process, the extra CPU time (and task switches) to frequently
> update the list and I think it will decrease the typical amount of used
> memory, too.

Wouldn't this bring the (set of) ranking algorithm(s) back to the kernel? This
is exactly what we're trying to avoid. The way we see it, the kernel
shouldn't have to work out the user's decision on which process to kill, but
rather take the user's choice into account. The computation of such a
decision could be done in user space (protected as you suggested above).

We'll think about it, although I'm not sure there would be such a decrease
in memory consumption.

br

Edjard


-- 
"In a world without fences ... who needs Gates?"


Re: User space out of memory approach

2005-01-17 Thread Thomas Gleixner
On Sun, 2005-01-16 at 21:10 +, Alan Cox wrote:
> On Sul, 2005-01-16 at 10:06, Edjard Souza Mota wrote:
> > What do you think about the point we are trying to make, i.e., moving the
> > ranking of PIDs to be killed to user space? Or, letting the user have some
> > influence on it? We were misunderstood because the patch we sent was to
> > make "a slight" organization in the way the OOM killer computes rates for
> > PIDs, not to change its
>
> I'm sceptical there is an answer but moving it to user space (or at least
> implementing /proc tunables in user space to experiment) certainly seems
> to be the right way to find out.

No objections against a userspace tuning mechanism, but I still doubt
that completely replacing the always imperfect in-kernel selection is
feasible.

tglx




Re: User space out of memory approach

2005-01-16 Thread Alan Cox
On Sul, 2005-01-16 at 10:06, Edjard Souza Mota wrote:
> What do you think about the point we are trying to make, i.e., moving the
> ranking of PIDs to be killed to user space? Or, letting the user have some
> influence on it? We were misunderstood because the patch we sent was to
> make "a slight" organization in the way the OOM killer computes rates for
> PIDs, not to change its

I'm sceptical there is an answer but moving it to user space (or at least
implementing /proc tunables in user space to experiment) certainly seems
to be the right way to find out.

> Well, while the AF_TELEPATHY socket is not on its way :) ... we may at
> least experiment with different ranking policies.

agreed



Re: User space out of memory approach

2005-01-16 Thread Bodo Eggert
Edjard Souza Mota wrote:

> What do you think about the point we are trying to make, i.e., moving the
> ranking of PIDs to be killed to user space?

If my system needs the OOM killer, it's usually unresponsive to most
userspace applications. A normal daemon would be swapped out before the
runaway dhcpd grows larger than the web cache. It would have to be an mlocked
RT task started from early userspace. It would be difficult to set up (unless
you upgrade your distro), and almost nobody will feel like tweaking it to
take the benefit (OOM == -ECANNOTHAPPEN).

What about creating a linked list of (stackable) algorithms which can be
extended by loading modules and re-sorted using {proc,sys}fs? It will avoid
the extra process, the extra CPU time (and task switches) to frequently
update the list and I think it will decrease the typical amount of used
memory, too.


Re: User space out of memory approach

2005-01-16 Thread Edjard Souza Mota
Hi,

Thanks Alan...

> > well looking into Alan's email again I think I answered thinking on
> > the wrong side :-) that the suggestion was to switch off OOM
> > altogether and be done with all the discussion... tsk tsk tsk too
> > defensive and hasty I guess :-)
> 
> Thats what mode 2 is all about. There are some problems with over-early
> triggering of OOM that Andrea fixed that are still relevant (or stick
> "never OOM if mode == 2" into your kernel)
> 
> > Did I get it right this time Alan?
> 
> Basically yes - the real problem with the OOM situation is there is no
> correct answer. People have spent years screwing around with the OOM
> killer selection logic and while you can make it pick large tasks or old
> tasks or growing tasks easily nobody has a good heuristic about what to
> die because it depends on the users wishes. OOM requires AF_TELEPATHY
> sockets and we don't have them.
>
> 
> For most users simply not allowing the mess to occur solves the problem
> - not all but most.
> 

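For reference, the "mode 2" in the quoted text most likely refers to strict
overcommit accounting, selected through the `vm.overcommit_memory` sysctl. A
minimal sketch for checking the current mode from user space; the helper
names are hypothetical and the mode descriptions are paraphrased, not kernel
strings.

```python
from pathlib import Path

OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")

# Paraphrased descriptions of the three documented modes.
MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit beyond the commit limit",
}


def describe_overcommit(raw: str) -> str:
    """Map the raw sysctl text (for example "2\\n") to a description."""
    mode = int(raw.strip())
    return MODES.get(mode, f"unknown mode {mode}")


def current_overcommit(path: Path = OVERCOMMIT_PATH) -> str:
    """Describe the running kernel's mode, or report that it is unavailable."""
    try:
        return describe_overcommit(path.read_text())
    except (OSError, ValueError):
        return "unavailable"
```

Calling `current_overcommit()` on a Linux box returns one of the descriptions
above; elsewhere it returns "unavailable".
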
What do you think about the point we are trying to make, i.e., moving the
ranking of PIDs to be killed to user space? Or letting the user have some
influence on it? We were misunderstood because the patch we sent was meant to
make "a slight" reorganization of the way the OOM killer computes ratings for
PIDs, not to change its selection logic. But now we can discuss (I mean
implement) alternative selection logics without touching the kernel-space
code. The parameters, and the criteria for combining them, can be opened up
so that more people can test them according to platform and, if not per user,
at least according to the application's memory consumption pattern.
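
A minimal sketch of what user-space ranking could look like, assuming the
ranking process can still read /proc when memory is tight. It reads resident
pages from /proc/<pid>/statm and ranks candidates with an interchangeable
policy function. The names `read_candidates`, `by_rss` and `rank` are
hypothetical, and this is not the patch discussed in this thread.

```python
import os
from typing import Callable, Dict, List, Tuple


def parse_statm(text: str) -> int:
    """Return resident pages, the second field of /proc/<pid>/statm."""
    return int(text.split()[1])


def read_candidates(proc_root: str = "/proc") -> Dict[int, int]:
    """Map each visible PID to its resident page count."""
    candidates: Dict[int, int] = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return candidates
    for entry in entries:
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "statm")) as f:
                candidates[int(entry)] = parse_statm(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or is not readable
    return candidates


# A policy maps (pid, rss_pages) to a badness value; higher means kill first.
Policy = Callable[[int, int], float]


def by_rss(pid: int, rss_pages: int) -> float:
    return float(rss_pages)


def rank(candidates: Dict[int, int], policy: Policy = by_rss) -> List[Tuple[int, float]]:
    """Return (pid, badness) pairs, worst candidate first."""
    scored = [(pid, policy(pid, rss)) for pid, rss in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Trying a different policy is then just `rank(read_candidates(), policy=...)`
with another function, which is the kind of experiment that needs no kernel
rebuild.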

Well, while the AF_TELEPATHY socket is not on its way :) ... we may at
least experiment with different ranking policies.

br

Edjard
 

-- 
"In a world without fences ... who needs Gates?"


Re: User space out of memory approach

2005-01-16 Thread Alan Cox
On Sul, 2005-01-16 at 10:06, Edjard Souza Mota wrote:
> What do you think about the point we are trying to make, i.e., moving the
> ranking of PIDs to be killed to user space? Or, making user have some
> influence on it? We were misunderstood because the patch we sent was to
> make "a slight" organization in the way OOM killer compute rates to PIDs,
> not to change its

I'm sceptical there is an answer but moving it to user space (or at least
implementing /proc tunables in user space to experiment) certainly seems
to be the right way to find out.

> Well, while AF_TELEPATHY socket is not on its way :) ... we may at
> least experiment different ranking policies.

agreed
