Re: User space out of memory approach
On Fri, Jan 28, 2005 at 11:21:11AM -0400, Mauricio Lin wrote:
> As you know, Andrew generated the patch. Here are some test results
> for your OOM Killer and the original OOM Killer. We ran 10
> experiments for each OOM Killer and below are the average values.
>
> "Invocations" is the number of times that the out_of_memory function is
> called. "Selections" is the number of times that the select_bad_process
> function is called and "Killed" is the number of killed processes.
>
> Original OOM Killer
> Invocations average = 51620/10 = 5162
> Selections average = 30/10 = 3
> Killed average = 38/10 = 3.8
>
> Andrea's OOM Killer
> Invocations average = 213/10 = 21.3
> Selections average = 213/10 = 21.3
> Killed average = 52/10 = 5.2
>
> As you can see, the number of invocations is reduced significantly with
> your OOM Killer.

Yep, thanks for testing!

> I did not know about this problem when I was moving the original
> ranking algorithm to userland. As Thomaz mentioned: invocation
> madness, reentrancy problems and those strange timers and counters such
> as now, since, last, lastkill and count. I guess that now I can put some
> OOM Killer stuff in userland in a safer manner with those problems
> solved, right?

Yep ;)

> BTW, will your OOM Killer be included in the kernel tree?

Yes, Andrew said it should go in in the next few days, which is great
news, thanks everyone!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: User space out of memory approach
Hi Andrea,

On Fri, 28 Jan 2005 09:58:24 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi Andrea,
>
> On Thu, 27 Jan 2005 23:11:29 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> > On Thu, Jan 27, 2005 at 02:54:13PM -0400, Mauricio Lin wrote:
> > > Hi Andrea,
> > >
> > > On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> > > > On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > > > > Sometimes the first application to be killed is XFree. AFAIK the
> > > >
> > > > This makes more sense now. You need somebody trapping sigterm in order
> > > > to lock up and X sure traps it to recover the text console.
> > > >
> > > > Can you replace this:
> > > >
> > > > 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> > > > 		force_sig(SIGTERM, p);
> > > > 	} else {
> > > > 		force_sig(SIGKILL, p);
> > > > 	}
> > > >
> > > > with this?
> > > >
> > > > 	force_sig(SIGKILL, p);
> > > >
> > > > in mm/oom_kill.c.
> > >
> > > Nice. Your suggestion made the error go away.
> > >
> > > We are still testing in order to compare your OOM Killer with the
> > > original OOM Killer.
> >
> > Ok, thanks for the confirmation. So my theory was right.
> >
> > Basically we have to make this patch. Now that you have already edited
> > the code, can you diff and send a patch that will be the 6/5 in the series?
>
> OK. I will send the patch.

As you know, Andrew generated the patch. Here are some test results
for your OOM Killer and the original OOM Killer. We ran 10 experiments
for each OOM Killer and below are the average values.

"Invocations" is the number of times that the out_of_memory function is
called. "Selections" is the number of times that the select_bad_process
function is called and "Killed" is the number of killed processes.

Original OOM Killer
Invocations average = 51620/10 = 5162
Selections average = 30/10 = 3
Killed average = 38/10 = 3.8

Andrea's OOM Killer
Invocations average = 213/10 = 21.3
Selections average = 213/10 = 21.3
Killed average = 52/10 = 5.2

As you can see, the number of invocations is reduced significantly with
your OOM Killer.

I did not know about this problem when I was moving the original ranking
algorithm to userland. As Thomaz mentioned: invocation madness,
reentrancy problems and those strange timers and counters such as now,
since, last, lastkill and count. I guess that now I can put some OOM
Killer stuff in userland in a safer manner with those problems solved,
right?

BTW, will your OOM Killer be included in the kernel tree?

BR,

Mauricio Lin.
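As a quick cross-check of the figures above, the per-run averages can be recomputed from the totals over the 10 runs. The sketch below is only arithmetic on the numbers reported in the mail. The helper names are illustrative. The invocation ratio is derived here and is not stated in the thread.

```c
#include <assert.h>

/* Average of a counter summed over `runs` experiments (10 in the mail). */
double per_run_average(long total, int runs)
{
    assert(runs > 0);
    return (double)total / runs;
}

/* How many times fewer events the second configuration produced.
 * The run count cancels out, so the raw totals can be compared directly. */
double reduction_factor(long before_total, long after_total)
{
    assert(after_total > 0);
    return (double)before_total / after_total;
}
```

For example, `per_run_average(51620, 10)` gives 5162.0. `reduction_factor(51620, 213)` is about 242, so the reported totals correspond to roughly 240 times fewer out_of_memory invocations.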
Re: User space out of memory approach
Hi Andrea,

On Thu, 27 Jan 2005 23:11:29 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Thu, Jan 27, 2005 at 02:54:13PM -0400, Mauricio Lin wrote:
> > Hi Andrea,
> >
> > On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> > > On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > > > Sometimes the first application to be killed is XFree. AFAIK the
> > >
> > > This makes more sense now. You need somebody trapping sigterm in order
> > > to lock up and X sure traps it to recover the text console.
> > >
> > > Can you replace this:
> > >
> > > 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> > > 		force_sig(SIGTERM, p);
> > > 	} else {
> > > 		force_sig(SIGKILL, p);
> > > 	}
> > >
> > > with this?
> > >
> > > 	force_sig(SIGKILL, p);
> > >
> > > in mm/oom_kill.c.
> >
> > Nice. Your suggestion made the error go away.
> >
> > We are still testing in order to compare your OOM Killer with the
> > original OOM Killer.
>
> Ok, thanks for the confirmation. So my theory was right.
>
> Basically we have to make this patch. Now that you have already edited
> the code, can you diff and send a patch that will be the 6/5 in the series?

OK. I will send the patch.

> (Then, after fixing this last very longstanding [now deadlock-prone too]
> bug, we can think about how to make a 7/5 that will wait a few seconds
> after sending a sigterm and then fall back to a sigkill. That shouldn't be
> difficult, but the above 6/5 will already make the code correct.)
>
> Note, if you add swap it'll work around it too, since then the memhog will
> be allowed to grow to a larger rss than X. With 128m of ram and no swap,
> X is one of the biggest, with xshm involved from some client app
> allocating lots of pictures. I never noticed since I always tested
> it either with swap or on higher-mem systems, and my test box runs
> with an idle X too, which isn't that big ;).

Well, we want to keep the memory resources small, because we are also
thinking about the OOM Killer on small devices with few resources.

BR,

Mauricio Lin.
Re: User space out of memory approach
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> And they did not necessarily have hardware access. They "might" have
> hardware access.

On x86 we could perhaps test for non-nullness of tsk->thread.io_bitmap_ptr?

> I thought I could wait for the other patches
> to be merged to avoid confusion before making more changes (since it'd
> be a pretty self-contained feature), but I can do that now if you
> prefer.

I'll send your current stuff off to Linus in the next few days - we can
let that sit for a while and use it as a base for further work.
Re: User space out of memory approach
On Thu, Jan 27, 2005 at 03:35:35PM -0800, Andrew Morton wrote:
> On x86 we could perhaps test for non-nullness of tsk->thread.io_bitmap_ptr?

Yes, for ioports. But I'm afraid I was too optimistic about eflags for
iopl: that's not in the per-task tss, it's only stored at the very top of
the kernel stack and inherited during fork/clone. So we probably need to
check esp0 and read the top of the stack to see if a task has the IOPL
bits set in eflags. esp0 is definitely stored in the thread struct when
the task is rescheduled, and it cannot change for each given task, so we
can access it even while the task is runnable and it shouldn't be
corrupted by iret.

But the problem is that sysenter is optimized not to save eflags on the
kernel stack, so the top of the stack minus 12 bytes would not contain
eflags if sysenter is in use.

So basically we'd need to change iopl to propagate the info to the task
struct synchronously somehow, because we can't read it reliably from the
kernel stack.
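The check under discussion could look roughly like the sketch below, which models the `has_hw_access()` idea in user space. `struct fake_task` is a hypothetical stand-in and not the kernel's task or thread structure. Its `saved_eflags` field assumes exactly what the mail says is still missing: that iopl() records the level synchronously in per-task state, rather than the OOM killer reading it off the kernel stack. The IOPL field occupies bits 12-13 of x86 EFLAGS (mask 0x3000). A non-NULL I/O bitmap pointer stands in for a task that has used ioperm().

```c
#include <assert.h>
#include <stddef.h>

/* IOPL field of the x86 EFLAGS register: bits 12-13. */
#define EFLAGS_IOPL_MASK  0x3000UL
#define EFLAGS_IOPL_SHIFT 12

/* Hypothetical, simplified per-task state for this sketch only; it is
 * NOT the layout of the kernel's task_struct or thread_struct. */
struct fake_task {
    /* User EFLAGS as recorded by iopl(); assumed to be kept up to date
     * synchronously, since it cannot be read reliably from the stack. */
    unsigned long saved_eflags;
    /* Non-NULL once the task has been given an I/O bitmap via ioperm(). */
    const unsigned long *io_bitmap_ptr;
};

/* I/O privilege level (0-3) encoded in an EFLAGS value. */
unsigned int eflags_iopl(unsigned long eflags)
{
    return (unsigned int)((eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT);
}

/* Hypothetical has_hw_access(): nonzero if the task may touch I/O ports,
 * either through an ioperm() bitmap or a raised IOPL from iopl(). */
int has_hw_access(const struct fake_task *t)
{
    assert(t != NULL);
    if (t->io_bitmap_ptr != NULL)
        return 1;
    return eflags_iopl(t->saved_eflags) != 0;
}
```

A task with an ordinary EFLAGS value such as 0x202 and no bitmap is reported as having no hardware access. Setting the IOPL bits (for example after iopl(3)) or attaching a bitmap flips the result, regardless of which capabilities the task still holds.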
Re: User space out of memory approach
On Thu, Jan 27, 2005 at 02:29:43PM -0800, Andrew Morton wrote:
> I've already queued a patch for this:
>
> --- 25/mm/oom_kill.c~mm-fix-several-oom-killer-bugs-fix	Thu Jan 27 13:56:58 2005
> +++ 25-akpm/mm/oom_kill.c	Thu Jan 27 13:57:19 2005
> @@ -198,12 +198,7 @@ static void __oom_kill_task(task_t *p)
>  	p->time_slice = HZ;
>  	p->memdie = 1;
>  
> -	/* This process has hardware access, be more careful. */
> -	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> -		force_sig(SIGTERM, p);
> -	} else {
> -		force_sig(SIGKILL, p);
> -	}
> +	force_sig(SIGKILL, p);
>  }
>  
>  static struct mm_struct *oom_kill_task(task_t *p)

Thanks.

> However. This means that we'll now kill off tasks which had hardware
> access. What are the implications of this?

The implication of the above is basically that the X server won't be able
to restore the text mode, but that avoids the deadlock ;).

And they did not necessarily have hardware access. They "might" have
hardware access.

Note that an app may have hardware access even if it has no rawio
capabilities. One can run iopl and then change uid just fine. So the
above check is quite weak, since it leaves the kernel susceptible to bugs
and memleaks in any app started by root. The kernel shouldn't trust root
apps; all apps are buggy, root apps too (I even once fixed a signal race
in /sbin/init that showed up with the schedule-child-first sched
optimization ;).

iopl and ioperm are the only two things we care about. We can get a
synchronous, reliable eflags/ioperm value only from the "regs" in the
task context. The problem is that, since we can pick a task to kill that
isn't necessarily the current task, we should start to approximate and
assume the process is sleeping. The regs must be saved during
reschedule, so it should cache the old contents. So perhaps we can get a
practically reliable eflags dump from the tss_struct. But this will not
be common code and it'll require a specialized arch API, like
has_hw_access(). Only then can we make a stronger assumption and be truly
careful about sending SIGKILL. The right way to do this is probably to
wait a few seconds before sending the sigkill.

I'm not currently sure if it's worth adding the has_hw_access(). But
certainly I would prefer to do nothing special with only the sys_rawio
capability.

I thought I could wait for the other patches to be merged to avoid
confusion before making more changes (since it'd be a pretty
self-contained feature), but I can do that now if you prefer.
Re: User space out of memory approach
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> > > Can you replace this:
> > >
> > > 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> > > 		force_sig(SIGTERM, p);
> > > 	} else {
> > > 		force_sig(SIGKILL, p);
> > > 	}
> > >
> > > with this?
> > >
> > > 	force_sig(SIGKILL, p);
> > >
> > > in mm/oom_kill.c.
> >
> > Nice. Your suggestion made the error go away.
> >
> > We are still testing in order to compare your OOM Killer with the
> > original OOM Killer.
>
> Ok, thanks for the confirmation. So my theory was right.
>
> Basically we have to make this patch. Now that you have already edited
> the code, can you diff and send a patch that will be the 6/5 in the series?
>

I've already queued a patch for this:

--- 25/mm/oom_kill.c~mm-fix-several-oom-killer-bugs-fix	Thu Jan 27 13:56:58 2005
+++ 25-akpm/mm/oom_kill.c	Thu Jan 27 13:57:19 2005
@@ -198,12 +198,7 @@ static void __oom_kill_task(task_t *p)
 	p->time_slice = HZ;
 	p->memdie = 1;
 
-	/* This process has hardware access, be more careful. */
-	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
-		force_sig(SIGTERM, p);
-	} else {
-		force_sig(SIGKILL, p);
-	}
+	force_sig(SIGKILL, p);
 }
 
 static struct mm_struct *oom_kill_task(task_t *p)

However. This means that we'll now kill off tasks which had hardware
access. What are the implications of this?
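To make the behavioural change in the patch concrete, here is a small user-space model of the signal choice before and after it. The model re-declares `CAP_SYS_RAWIO` (17) and a 2.6-style `CAP_TO_MASK()` locally. It treats the effective capability set as a plain 32-bit mask. The function names are hypothetical. The model also shows the case Andrea raises: a task that gained port access through iopl() and then dropped its capabilities already got SIGKILL under the old rule. So the capability test never reliably identified "hardware access".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>

/* Local re-declarations for a user-space model, mirroring 2.6-era
 * <linux/capability.h>; not included from the kernel. */
#define CAP_SYS_RAWIO  17
#define CAP_TO_MASK(x) (UINT32_C(1) << (x))

/* Signal chosen by __oom_kill_task() before the patch. */
int oom_signal_before(uint32_t cap_effective)
{
    assert(CAP_SYS_RAWIO < 32);   /* the model uses a 32-bit capability set */
    /* "This process has hardware access, be more careful." */
    if (cap_effective & CAP_TO_MASK(CAP_SYS_RAWIO))
        return SIGTERM;           /* can be trapped, e.g. by the X server */
    return SIGKILL;
}

/* Signal chosen after the patch: SIGKILL unconditionally. */
int oom_signal_after(uint32_t cap_effective)
{
    (void)cap_effective;          /* capabilities no longer matter */
    return SIGKILL;
}
```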
Re: User space out of memory approach
On Thu, Jan 27, 2005 at 02:54:13PM -0400, Mauricio Lin wrote:
> Hi Andrea,
>
> On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> > On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > > Sometimes the first application to be killed is XFree. AFAIK the
> >
> > This makes more sense now. You need somebody trapping sigterm in order
> > to lock up and X sure traps it to recover the text console.
> >
> > Can you replace this:
> >
> > 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> > 		force_sig(SIGTERM, p);
> > 	} else {
> > 		force_sig(SIGKILL, p);
> > 	}
> >
> > with this?
> >
> > 	force_sig(SIGKILL, p);
> >
> > in mm/oom_kill.c.
>
> Nice. Your suggestion made the error go away.
>
> We are still testing in order to compare your OOM Killer with the
> original OOM Killer.

Ok, thanks for the confirmation. So my theory was right.

Basically we have to make this patch. Now that you have already edited
the code, can you diff and send a patch that will be the 6/5 in the series?

(Then, after fixing this last very longstanding [now deadlock-prone too]
bug, we can think about how to make a 7/5 that will wait a few seconds
after sending a sigterm and then fall back to a sigkill. That shouldn't be
difficult, but the above 6/5 will already make the code correct.)

Note, if you add swap it'll work around it too, since then the memhog will
be allowed to grow to a larger rss than X. With 128m of ram and no swap,
X is one of the biggest, with xshm involved from some client app
allocating lots of pictures. I never noticed since I always tested
it either with swap or on higher-mem systems, and my test box runs
with an idle X too, which isn't that big ;).
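The proposed 7/5 policy is to send SIGTERM and fall back to SIGKILL if the task is still around after a grace period. That escalation can be illustrated in user space with kill() and waitpid() on a child process. This is a sketch of the policy only, not kernel code. The function names and the 10 ms polling step are arbitrary choices. `spawn_sleeper()` is a hypothetical test helper: it creates a child that does not die on SIGTERM (it ignores the signal, standing in for a handler that never completes).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for about `ms` milliseconds, resuming if interrupted by a signal. */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Hypothetical test helper: fork a child that idles in pause() forever.
 * With ignore_term set, the child ignores SIGTERM.  The disposition is
 * changed before fork() so SIGTERM cannot arrive before it takes effect;
 * the parent ignores SIGTERM only for that short window. */
pid_t spawn_sleeper(int ignore_term)
{
    void (*old)(int) = signal(SIGTERM, ignore_term ? SIG_IGN : SIG_DFL);
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    if (old != SIG_ERR)
        signal(SIGTERM, old);   /* restore the parent's disposition */
    return pid;
}

/* Send SIGTERM, give the child `grace_ms` to exit, then send SIGKILL.
 * Reaps the child and returns the signal that terminated it (0 if it
 * exited normally), or -1 on error. */
int term_then_kill(pid_t pid, long grace_ms)
{
    const long step_ms = 10;
    long waited = 0;
    int status;

    assert(grace_ms >= 0);
    if (pid <= 0)               /* never signal a process group or "all" */
        return -1;
    if (kill(pid, SIGTERM) == -1)
        return -1;

    while (waited < grace_ms) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
        if (r == -1)
            return -1;
        sleep_ms(step_ms);
        waited += step_ms;
    }

    /* Grace period over: SIGKILL cannot be caught, blocked or ignored. */
    if (kill(pid, SIGKILL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With a 100 ms grace period, a child from `spawn_sleeper(1)` should come back as SIGKILL. A child that keeps the default disposition should be reaped as SIGTERM well before the deadline. Polling with WNOHANG is used instead of SIGCHLD or a timer only to keep the example short.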
Re: User space out of memory approach
Hi Andrea,

On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > Sometimes the first application to be killed is XFree. AFAIK the
>
> This makes more sense now. You need somebody trapping sigterm in order
> to lock up and X sure traps it to recover the text console.
>
> Can you replace this:
>
> 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> 		force_sig(SIGTERM, p);
> 	} else {
> 		force_sig(SIGKILL, p);
> 	}
>
> with this?
>
> 	force_sig(SIGKILL, p);
>
> in mm/oom_kill.c.

Nice. Your suggestion made the error go away.

We are still testing in order to compare your OOM Killer with the
original OOM Killer.

BR,

Mauricio Lin.
Re: User space out of memory approach
Hi Andrea,

On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > Sometimes the first application to be killed is XFree. AFAIK the
>
> This makes more sense now. You need somebody trapping sigterm in order
> to lock up and X sure traps it to recover the text console.
>
> Can you replace this:
>
> 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> 		force_sig(SIGTERM, p);
> 	} else {
> 		force_sig(SIGKILL, p);
> 	}
>
> with this?

OK, let me test it. If I get some news, I will let you know.

>
> 	force_sig(SIGKILL, p);
>
> in mm/oom_kill.c.

BR,

Mauricio Lin.
Re: User space out of memory approach
On Tue, 2005-01-25 at 20:11 -0400, Mauricio Lin wrote:
> > Can you please show the kernel messages ?
>
> OK. We will try to reach a situation that the printk messages can be
> written entirely in the log file and show you the kernel messages. But
> as I said: usually the printks messages are not written in the log
> file using Andrea's patch. But using the original OOM Killer we can
> see the messages in the log file. The syslog.conf file is the same for
> both OOM Killer(Andrea and Original). Do you have any idea what is
> happening to log file?

Add "console=ttyS0,115200" to your kernel command line so you get all the messages on the serial console.

tglx
Re: User space out of memory approach
On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> Sometimes the first application to be killed is XFree. AFAIK the

This makes more sense now. You need somebody trapping SIGTERM in order to lock up, and X certainly traps it to recover the text console.

Can you replace this:

	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
		force_sig(SIGTERM, p);
	} else {
		force_sig(SIGKILL, p);
	}

with this?

	force_sig(SIGKILL, p);

in mm/oom_kill.c.

This should fix it. The problem is that SIGTERM is unsafe even if the app is not malicious: there's not enough RAM to page in the userland signal handler, so the system locks up. We need a sort of timeout where we fall back to SIGKILL if SIGTERM didn't help.

Anyway, this is not a new bug; I didn't touch a single bit in that code. I'd really like to see the current fixes merged, then we can take care of root apps getting killed reliably. In all my tests I always run the malicious app as non-root, and anyway I never trap SIGTERM (X is tiny in my setup, so it never gets killed). Probably the GUI stuff you opened increased X's size significantly, enough for X to be the one killed.
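The timeout idea (SIGTERM first, SIGKILL if SIGTERM didn't help) can be illustrated with a user-space analogue. This is a sketch under assumptions, not kernel code: it only works on the caller's own children (it reaps them with waitpid()), the 10 ms polling step is arbitrary, and term_then_kill() is a hypothetical name. An in-kernel version could not simply block and poll like this.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Send SIGTERM to our child `pid`, poll for up to `grace_ms` milliseconds,
 * then escalate to SIGKILL, which cannot be caught or ignored.
 * Returns true if SIGKILL was needed; the waitpid() status goes to *status.
 */
static bool term_then_kill(pid_t pid, int grace_ms, int *status)
{
	const struct timespec tick = { 0, 10 * 1000 * 1000 };	/* 10 ms */

	kill(pid, SIGTERM);
	for (int waited = 0; waited < grace_ms; waited += 10) {
		if (waitpid(pid, status, WNOHANG) == pid)
			return false;	/* the child went away after SIGTERM */
		nanosleep(&tick, NULL);
	}
	kill(pid, SIGKILL);		/* SIGTERM didn't help: fall back */
	waitpid(pid, status, 0);	/* reap the killed child */
	return true;
}
```

A child that ignores or traps SIGTERM, as X does in Mauricio's report, is only removed by the SIGKILL escalation; a child with the default disposition exits on SIGTERM and the function returns false.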
Re: User space out of memory approach
Hi Thomaz,

On Tue, 25 Jan 2005 22:39:39 +0100, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> On Tue, 2005-01-25 at 17:13 -0400, Mauricio Lin wrote:
> > Hi Andrea,
> >
> > Your OOM Killer patch was tested and a strange behaviour was found.
> > Basically as normal user we started some applications as openoffice,
> > mozilla and emacs.
> > And as a root (in another tty) we started a simple program that uses
> > malloc in a forever loop as below:
> >
> > int main (void)
> > {
> >	int * mem;
> >	for (;;)
> >		mem = (int *) malloc(sizeof(int));
> >	return 0;
> > }
> >
> >
> > Using the original OOM Killer, malloc is the first killed application
> > and the sytem is restored in a useful state. After applying your patch
> > and accomplish the same experiment, the OOM Killer it does not kill
> > malloc program and it enters in a kind of forever loop as below:
> >
> > 1) out_of_memory is invoked;
> > 2) select_bad_process is invoked;
>
> Which process is selected ?

Sometimes the first application to be killed is XFree. AFAIK the malloc program is never killed, because the OOM Killer never finishes its work.

Usually we are not able to check the kernel log file after rebooting the system, because nothing was written there (perhaps syslogd or klogd were killed during the OOM). But I can see the printk messages on the screen while the OOM Killer is running. This does not happen with the original OOM Killer.

I added some printks to trace the OOM Killer, and IMHO this is what is going on: out_of_memory is invoked, and it calls select_bad_process, which starts walking each task. But during the do_each_thread / while_each_thread loop the condition

	if ((unlikely(test_tsk_thread_flag(p, TIF_MEMDIE)) || (p->flags & PF_EXITING)) &&
	    !(p->flags & PF_DEAD))
		return ERR_PTR(-1UL);

is true, so select_bad_process returns at that statement, back to its caller, out_of_memory. There the condition

	if (PTR_ERR(p) == -1UL)
		goto out;

is also true, so it jumps to the "out" label and leaves out_of_memory. But because the system is still out of memory, out_of_memory is invoked again and calls select_bad_process again, and during the do_each_thread / while_each_thread loop the same condition is true again. So select_bad_process returns early, out_of_memory jumps to "out" and returns again. This repeats for a long time, until I stop waiting and reboot the system by hand.

> Can you please show the kernel messages ?

OK. We will try to reach a situation where the printk messages are written entirely to the log file, and then show you the kernel messages. But as I said, with Andrea's patch the printk messages usually are not written to the log file, whereas with the original OOM Killer we can see them there. The syslog.conf file is the same for both OOM Killers (Andrea's and the original). Do you have any idea what is happening to the log file?

If you do not mind, you could run the same test case I described in my last email. I would like to know whether this problem happens to other people as well. We tested on laptop and desktop machines with 128MB of RAM and swap disabled.

BR,

Mauricio Lin.
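The loop traced above can be reproduced with a toy model. It assumes, as the diagnosis in this thread suggests, that the first victim (X, which has CAP_SYS_RAWIO) was sent SIGTERM, traps it and never reaches PF_DEAD. struct sim_task, SIM_BACK_OFF and both functions are hypothetical simplifications of select_bad_process() and out_of_memory(), not the patched kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified task model (not struct task_struct). */
struct sim_task {
	const char *name;
	unsigned long points;	/* badness score */
	bool memdie;		/* already picked by the OOM killer (TIF_MEMDIE) */
	bool exiting;		/* PF_EXITING */
	bool dead;		/* PF_DEAD: memory has been released */
};

/* Stand-in for ERR_PTR(-1UL): "someone is already dying, back off". */
#define SIM_BACK_OFF ((struct sim_task *)-1)

/* Same shape as the quoted select_bad_process() logic: return early if any
 * task is dying but not yet dead, otherwise pick the highest score. */
static struct sim_task *sim_select_bad_process(struct sim_task *t, size_t n)
{
	struct sim_task *chosen = NULL;

	for (size_t i = 0; i < n; i++) {
		if ((t[i].memdie || t[i].exiting) && !t[i].dead)
			return SIM_BACK_OFF;
		if (!t[i].dead && (chosen == NULL || t[i].points > chosen->points))
			chosen = &t[i];
	}
	return chosen;
}

/* One out_of_memory() call. A victim that traps SIGTERM never gets as far
 * as PF_DEAD; with SIGKILL it is assumed to die before the next call.
 * Returns true if a new victim was signalled. */
static bool sim_out_of_memory(struct sim_task *t, size_t n, bool victim_traps_sigterm)
{
	struct sim_task *p = sim_select_bad_process(t, n);

	if (p == SIM_BACK_OFF || p == NULL)
		return false;		/* the "goto out" path */
	p->memdie = true;
	if (!victim_traps_sigterm)
		p->dead = true;
	return true;
}
```

With a trapping victim, every call after the first takes the "goto out" path and nothing else is ever killed; when the victim dies promptly (SIGKILL), successive calls pick the next-largest task.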
Re: User space out of memory approach
On Tue, 2005-01-25 at 17:13 -0400, Mauricio Lin wrote:
> Hi Andrea,
>
> Your OOM Killer patch was tested and a strange behaviour was found.
> Basically as normal user we started some applications as openoffice,
> mozilla and emacs.
> And as a root (in another tty) we started a simple program that uses
> malloc in a forever loop as below:
>
> int main (void)
> {
>	int * mem;
>	for (;;)
>		mem = (int *) malloc(sizeof(int));
>	return 0;
> }
>
>
> Using the original OOM Killer, malloc is the first killed application
> and the sytem is restored in a useful state. After applying your patch
> and accomplish the same experiment, the OOM Killer it does not kill
> malloc program and it enters in a kind of forever loop as below:
>
> 1) out_of_memory is invoked;
> 2) select_bad_process is invoked;

Which process is selected?

> 3) the following condition is fullfied;
>	if ((unlikely(test_tsk_thread_flag(p, TIF_MEMDIE)) || (p->flags &
>	    PF_EXITING)) && !(p->flags & PF_DEAD))
>		return ERR_PTR(-1UL);

???

Can you please show the kernel messages?

tglx
Re: User space out of memory approach
Hi Andrea,

Your OOM Killer patch was tested and some strange behaviour was found. As a normal user we started some applications such as openoffice, mozilla and emacs. Then, as root (in another tty), we started a simple program that calls malloc in an endless loop:

#include <stdlib.h>

int main (void)
{
	int * mem;
	for (;;)
		mem = (int *) malloc(sizeof(int));
	return 0;
}

Using the original OOM Killer, the malloc program is the first application killed and the system is restored to a usable state. After applying your patch and running the same experiment, the OOM Killer does not kill the malloc program and enters a kind of endless loop:

1) out_of_memory is invoked;
2) select_bad_process is invoked;
3) the following condition is fulfilled:
	if ((unlikely(test_tsk_thread_flag(p, TIF_MEMDIE)) || (p->flags & PF_EXITING)) &&
	    !(p->flags & PF_DEAD))
		return ERR_PTR(-1UL);
4) steps 1, 2 and 3 above are executed again.

This loop (steps 1 to 4) lasts a long time (and nothing is killed) until I give up and reboot the system after waiting several minutes.

Any comments? What do you think about our test case? Could you run the same test case, with the malloc program running as root and other graphical applications as a normal user?

Let me know your ideas.

BR,

Mauricio Lin.

On Sat, 22 Jan 2005 04:32:19 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Fri, Jan 21, 2005 at 05:45:13PM -0400, Mauricio Lin wrote:
> > Hi Andrew,
> >
> > I have another question. You included an oom_adj entry in /proc for
> > each process. This was the approach you used in order to allow someone
> > or something to interfere the ranking algorithm from userland, right?
> > So if i have an another ranking algorithm in user space, I can use it
> > to complement the kernel decision as necessary. Was it your idea?
>
> Yes, you should use your userspace algorithm to tune the oom killer via
> the oom_adj and you can check the effect of your changes with oom_score.
> I posted a one liner ugly script to do that a few days ago on l-k.
>
> The oom_adj has this effect on the badness() code:
>
>	/*
>	 * Adjust the score by oomkilladj.
>	 */
>	if (p->oomkilladj) {
>		if (p->oomkilladj > 0)
>			points <<= p->oomkilladj;
>		else
>			points >>= -(p->oomkilladj);
>	}
>
> The biggest the points become, the more likely the task will be choosen
> by the oom killer.
Re: User space out of memory approach
On Fri, Jan 21, 2005 at 05:45:13PM -0400, Mauricio Lin wrote:
> Hi Andrew,
>
> I have another question. You included an oom_adj entry in /proc for
> each process. This was the approach you used in order to allow someone
> or something to interfere the ranking algorithm from userland, right?
> So if i have an another ranking algorithm in user space, I can use it
> to complement the kernel decision as necessary. Was it your idea?

Yes, you should use your userspace algorithm to tune the oom killer via oom_adj, and you can check the effect of your changes with oom_score. I posted an ugly one-liner script to do that a few days ago on l-k.

oom_adj has this effect on the badness() code:

	/*
	 * Adjust the score by oomkilladj.
	 */
	if (p->oomkilladj) {
		if (p->oomkilladj > 0)
			points <<= p->oomkilladj;
		else
			points >>= -(p->oomkilladj);
	}

The bigger the points become, the more likely the task is to be chosen by the oom killer.
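The quoted adjustment is plain power-of-two scaling. A standalone copy of the arithmetic, under a hypothetical function name, makes the effect of a given oom_adj value easy to check; it assumes the magnitude of oomkilladj stays well below the bit width of unsigned long, since the valid range is not stated here.

```c
/*
 * Same arithmetic as the quoted badness() fragment: a positive oomkilladj
 * multiplies the score by 2^adj, a negative one divides it by 2^-adj
 * (an integer shift, so the result rounds down).
 */
static unsigned long apply_oomkilladj(unsigned long points, int oomkilladj)
{
	if (oomkilladj) {
		if (oomkilladj > 0)
			points <<= oomkilladj;
		else
			points >>= -oomkilladj;
	}
	return points;
}
```

For example, a task whose base score is 100 ends up at 800 with oom_adj 3 and at 25 with oom_adj -2, so a userspace ranking program can push a task up or down by factors of two.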
Re: User space out of memory approach
On Fri, Jan 21, 2005 at 05:27:11PM -0400, Mauricio Lin wrote:
> Hi Andrea,
>
> I applied your patch and I am checking your code. It is really a very
> interesting work. I have a question about the function
> __set_current_state(TASK_INTERRUPTIBLE) you put in out_of_memory
> function. Do not you think it would be better put set_current_state
> instead of __set_current_state function? AFAIK the set_current_state
> function is more feasible for SMP systems, right?

set_current_state is needed only when you need a memory barrier after setting the task state (it is __set_current_state followed by a barrier). So it's needed in the usual wait_event loop, right after registering in the waitqueue. Example:

	unsigned long flags;

	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
	spin_lock_irqsave(&q->lock, flags);
	if (list_empty(&wait->task_list))
		__add_wait_queue(q, wait);
	/*
	 * don't alter the task state if this is just going to
	 * queue an async wait queue callback
	 */
	if (is_sync_wait(wait))
		set_current_state(state);
	spin_unlock_irqrestore(&q->lock, flags);

and even in the above it is needed only because spin_unlock has inclusive semantics on ia64. In 2.4 there was no unlock at all after set_current_state, and it looked like this:

	set_current_state(TASK_UNINTERRUPTIBLE);	\
	if (condition)					\
		break;					\
	schedule();					\

The rule of thumb is that if there's nothing between set_current_state and schedule(), then __set_current_state is more efficient and equally safe to use. The oom killer path I posted falls into this category: there is nothing between setting the state and schedule(), so there's no reason to place memory barriers in there.

Hope this helps ;)
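The distinction can be illustrated with C11 atomics. This is an analogy, not the kernel implementation (which is built on the kernel's own barrier primitives and real task states): sim_set_state_nobarrier() plays the role of __set_current_state(), sim_set_state_mb() that of set_current_state(), and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { SIM_TASK_RUNNING = 0, SIM_TASK_UNINTERRUPTIBLE = 2 };

struct sim_task_state {
	_Atomic int state;
};

/* Like __set_current_state(): a plain store with no ordering guarantee. */
static void sim_set_state_nobarrier(struct sim_task_state *t, int s)
{
	atomic_store_explicit(&t->state, s, memory_order_relaxed);
}

/* Like set_current_state(): the same store followed by a full fence, so the
 * store is ordered before any later load, e.g. of the wait condition. */
static void sim_set_state_mb(struct sim_task_state *t, int s)
{
	atomic_store_explicit(&t->state, s, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Sleeper side of a wait loop. The condition is loaded AFTER the state is
 * set, so the barrier version is needed: otherwise the load could complete
 * before the store is visible, a waker could set the condition, see the
 * task still RUNNING and skip the wakeup, and the task would then sleep
 * forever. Returns true if the caller should go on to schedule().
 */
static bool sim_prepare_to_sleep(struct sim_task_state *t, atomic_bool *cond)
{
	sim_set_state_mb(t, SIM_TASK_UNINTERRUPTIBLE);
	if (atomic_load_explicit(cond, memory_order_relaxed)) {
		sim_set_state_nobarrier(t, SIM_TASK_RUNNING);	/* nothing to wait for */
		return false;
	}
	return true;
}
```

When, as in the OOM killer path, nothing is loaded between setting the state and calling schedule(), there is nothing for the barrier to order, which is why __set_current_state() is sufficient there.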
Re: User space out of memory approach
Hi Andrew,

I have another question. You included an oom_adj entry in /proc for each process. This was the approach you used to allow someone or something to interfere with the ranking algorithm from userland, right? So if I have another ranking algorithm in user space, I can use it to complement the kernel's decision as necessary. Was that your idea?

BR,

Mauricio Lin.

On Fri, 21 Jan 2005 17:27:11 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi Andrea,
>
> I applied your patch and I am checking your code. It is really a very
> interesting work. I have a question about the function
> __set_current_state(TASK_INTERRUPTIBLE) you put in out_of_memory
> function. Do not you think it would be better put set_current_state
> instead of __set_current_state function? AFAIK the set_current_state
> function is more feasible for SMP systems, right?
>
> BR,
>
> Mauricio Lin.
>
>
> On Tue, 11 Jan 2005 09:38:37 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> > On Tue, Jan 11, 2005 at 01:35:47AM +0100, Thomas Gleixner wrote:
> > > confirmed fix for this available. It was posted more than once.
> >
> > I posted 6 patches (1/4,2/4,3/4,4/4,5/4,6/4), they should be all
> > applied to mainline, they're self contained. They add the userspace
> > ratings too.
> >
> > Those patches fixes a longstanding PF_MEMDIE race too and they optimize
> > used_math as well.
> >
> > I'm running with all 6 patches applied with an uptime of 6 days on SMP
> > and no problems at all. They're all 6 patches applied to the kotd too
> > (plus the other bits posted on l-k as well for the write throttling,
> > just one bit is still missing but I'll add it soon):
> >
> > ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD
Re: User space out of memory approach
Hi Andrea,

I applied your patch and I am checking your code. It is really very interesting work. I have a question about the __set_current_state(TASK_INTERRUPTIBLE) call you put in the out_of_memory function. Don't you think it would be better to use set_current_state instead of __set_current_state? AFAIK set_current_state is more suitable for SMP systems, right?

BR,

Mauricio Lin.

On Tue, 11 Jan 2005 09:38:37 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 11, 2005 at 01:35:47AM +0100, Thomas Gleixner wrote:
> > confirmed fix for this available. It was posted more than once.
>
> I posted 6 patches (1/4,2/4,3/4,4/4,5/4,6/4), they should be all
> applied to mainline, they're self contained. They add the userspace
> ratings too.
>
> Those patches fixes a longstanding PF_MEMDIE race too and they optimize
> used_math as well.
>
> I'm running with all 6 patches applied with an uptime of 6 days on SMP
> and no problems at all. They're all 6 patches applied to the kotd too
> (plus the other bits posted on l-k as well for the write throttling,
> just one bit is still missing but I'll add it soon):
>
> ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD
Re: User space out of memory approach
Hi Andrea, I applied your patch and I am checking your code. It is really a very interesting work. I have a question about the function __set_current_state(TASK_INTERRUPTIBLE) you put in out_of_memory function. Do not you think it would be better put set_current_state instead of __set_current_state function? AFAIK the set_current_state function is more feasible for SMP systems, right? BR, Mauricio Lin. On Tue, 11 Jan 2005 09:38:37 +0100, Andrea Arcangeli [EMAIL PROTECTED] wrote: On Tue, Jan 11, 2005 at 01:35:47AM +0100, Thomas Gleixner wrote: confirmed fix for this available. It was posted more than once. I posted 6 patches (1/4,2/4,3/4,4/4,5/4,6/4), they should be all applied to mainline, they're self contained. They add the userspace ratings too. Those patches fixes a longstanding PF_MEMDIE race too and they optimize used_math as well. I'm running with all 6 patches applied with an uptime of 6 days on SMP and no problems at all. They're all 6 patches applied to the kotd too (plus the other bits posted on l-k as well for the write throttling, just one bit is still missing but I'll add it soon): ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: User space out of memory approach
Hi Andrew, I have another question. You included an oom_adj entry in /proc for each process. This was the approach you used in order to allow someone or something to interfere the ranking algorithm from userland, right? So if i have an another ranking algorithm in user space, I can use it to complement the kernel decision as necessary. Was it your idea? BR, Mauricio Lin. On Fri, 21 Jan 2005 17:27:11 -0400, Mauricio Lin [EMAIL PROTECTED] wrote: Hi Andrea, I applied your patch and I am checking your code. It is really a very interesting work. I have a question about the function __set_current_state(TASK_INTERRUPTIBLE) you put in out_of_memory function. Do not you think it would be better put set_current_state instead of __set_current_state function? AFAIK the set_current_state function is more feasible for SMP systems, right? BR, Mauricio Lin. On Tue, 11 Jan 2005 09:38:37 +0100, Andrea Arcangeli [EMAIL PROTECTED] wrote: On Tue, Jan 11, 2005 at 01:35:47AM +0100, Thomas Gleixner wrote: confirmed fix for this available. It was posted more than once. I posted 6 patches (1/4,2/4,3/4,4/4,5/4,6/4), they should be all applied to mainline, they're self contained. They add the userspace ratings too. Those patches fixes a longstanding PF_MEMDIE race too and they optimize used_math as well. I'm running with all 6 patches applied with an uptime of 6 days on SMP and no problems at all. They're all 6 patches applied to the kotd too (plus the other bits posted on l-k as well for the write throttling, just one bit is still missing but I'll add it soon): ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: User space out of memory approach
On Fri, Jan 21, 2005 at 05:27:11PM -0400, Mauricio Lin wrote: Hi Andrea, I applied your patch and I am checking your code. It is really a very interesting work. I have a question about the function __set_current_state(TASK_INTERRUPTIBLE) you put in out_of_memory function. Do not you think it would be better put set_current_state instead of __set_current_state function? AFAIK the set_current_state function is more feasible for SMP systems, right? set_current_state is needed only when you need to place a memory barrier after __set_current_state. So it's needed in the usual wait_event loop, right after registering in the waitqueue. Example: unsigned long flags; wait-flags = ~WQ_FLAG_EXCLUSIVE; spin_lock_irqsave(q-lock, flags); if (list_empty(wait-task_list)) __add_wait_queue(q, wait); /* * don't alter the task state if this is just going to * queue an async wait queue callback */ if (is_sync_wait(wait)) set_current_state(state); spin_unlock_irqrestore(q-lock, flags); and even in the above is needed only because spin_unlock has inclusive semantics in ia64. In 2.4 there was no unlock at all after set_current_state and it was like this: set_current_state(TASK_UNINTERRUPTIBLE); \ if (condition) \ break; \ schedule(); \ The rule of thumb is that if there's nothing between set_current_state and schedule() then __set_current_state is more efficient and equally safe to use. And the oom killer path I posted falls in this category, nothing in between set_current_state and schedule, so no reason to place memory barries in there. Hope this helps ;) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: User space out of memory approach
On Fri, Jan 21, 2005 at 05:45:13PM -0400, Mauricio Lin wrote:
> Hi Andrew,
>
> I have another question. You included an oom_adj entry in /proc for each process. This was the approach you used in order to allow someone or something to interfere with the ranking algorithm from userland, right? So if I have another ranking algorithm in user space, I can use it to complement the kernel decision as necessary. Was it your idea?

Yes, you should use your userspace algorithm to tune the oom killer via oom_adj, and you can check the effect of your changes with oom_score. I posted an ugly one-liner script to do that a few days ago on l-k. The oom_adj has this effect on the badness() code:

    /*
     * Adjust the score by oomkilladj.
     */
    if (p->oomkilladj) {
        if (p->oomkilladj > 0)
            points <<= p->oomkilladj;
        else
            points >>= -(p->oomkilladj);
    }

The bigger the points become, the more likely the task will be chosen by the oom killer.
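The effect of the quoted badness() fragment can be checked with a small user-space model. This sketch applies the same shifts to a raw score and then picks the highest adjusted score, a simplification of the selection loop (the real code also skips various tasks). apply_oomkilladj() and pick_victim() are hypothetical names, and the sketch assumes |oomkilladj| stays well below the bit width of unsigned long, since larger shifts are undefined in C.

```c
/*
 * Model of the oomkilladj step in badness(): a positive adjustment
 * shifts the score left (more likely to be killed), a negative one
 * shifts it right (less likely). Assumes |oomkilladj| is small.
 */
unsigned long apply_oomkilladj(unsigned long points, int oomkilladj)
{
    if (oomkilladj > 0)
        points <<= oomkilladj;
    else if (oomkilladj < 0)
        points >>= -oomkilladj;
    return points;
}

/*
 * Simplified selection: index of the task with the highest adjusted
 * score (first one wins on ties), or -1 if n == 0.
 */
int pick_victim(const unsigned long *points, const int *adj, int n)
{
    int best = -1;
    unsigned long best_score = 0;

    for (int i = 0; i < n; i++) {
        unsigned long s = apply_oomkilladj(points[i], adj[i]);
        if (best < 0 || s > best_score) {
            best = i;
            best_score = s;
        }
    }
    return best;
}
```

For example, a task with 1000 points and oom_adj 3 scores 8000, while with oom_adj -3 it scores 125, so each step of oom_adj doubles or halves the score.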
Re: User space out of memory approach
On Thu, 20 Jan 2005, Edjard Souza Mota wrote:
> > > > What about creating a linked list of (stackable) algorithms which can be
> > > > extended by loading modules and resorted using {proc,sys}fs? It will avoid
> > > > the extra process, the extra CPU time (and task switches) to frequently
> > > > update the list and I think it will decrease the typical amount of used
> > > > memory, too.
> > >
> > > Wouldn't this bring the (set of) ranking algorithm(s) back to the kernel? This
> > > is exactly what we're trying to avoid.
> >
> > You're trying to avoid it in order to let admins try other ranking
> > algorithms (at least that's what I read). The module approach seems to be
> > flexible enough to do that, and it avoids the mentioned issues. If you
> > really want a userspace daemon, it can be controlled by a module.-)
>
> Yes, your reading is correct, but this choice should take into account
> the "patterns" of how memory is allocated for the user's most used
> applications. Why? The closer the ranking gets to "The Best choice",
> the longer it will take to invoke the oom killer again.

ACK.

> I am wondering how a module could control a user space daemon if it
> hasn't started yet? I mean, processes in user space are supposed to start
> only after all modules are loaded (those loadable at boot time). So, this
> user space daemon would break this standard. But if we manage to have a
> special module that takes care of loading this stack of OOM Killer ranking
> algorithms, then the daemon would not need to break the default order of
> loading modules.

I don't think there needs to be a special order while loading the modules, since each module will provide a defined interface which can be registered in a linked list and sorted on demand.

Just init all compiled-in modules and sort using a kernel parameter (remembering modprobe might be fubar), then modprobe (if compiled-in) all missing decision modules from the list (appending them) and re-sort. If the admin wants to add a module later, he can also change the order again, possibly after configuring the module. Disabling may be done either by moving a decision past one without fall-through or by using a separate list.

There will be a need for a controlling instance which will build a list of candidates and pass it to each decision module in turn until the victim is found. Maybe the list will need a field for a ranking offset and a scaling factor if a module is not supposed to make the final decision but to modify the ranking for some blessed processes.

> The init could be changed to start the daemon, and then the module would
> start controlling it. Am I right?

It can, but it should be run from the (possibly autogenerated) initr{d,amfs} if it's used.

> So that's why people are complaining that every distro would have to update
> the init and load this new module. Correct?

ACK. (It's just me - for now)

Upgrading kernels used to be a drop-in replacement, except for ISDN and (for 2.4 -> 2.6) v4l. I like it that way.
-- 
Top 100 things you don't want the sysadmin to say:
66. What do you mean you needed that directory?
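The stacking proposal above (a sorted list of decision modules, a controlling instance that hands the candidate list to each module in turn, and offset/scale adjustments for blessed processes) can be sketched in plain C. This is a user-space model under stated assumptions: struct oom_decider, oom_register(), oom_choose_victim() and the two example modules are hypothetical names, not an existing kernel API, and a real in-kernel version would also need locking and module reference counting.

```c
#include <stddef.h>

struct oom_candidate {
    int pid;
    long score; /* ranking as modified by earlier modules */
};

/* Hypothetical decision module: returns the victim's index, or -1 to fall through. */
struct oom_decider {
    const char *name;
    int order; /* sort key, e.g. set from a kernel parameter or sysfs */
    int (*decide)(struct oom_candidate *c, size_t n, void *priv);
    void *priv;
    struct oom_decider *next;
};

static struct oom_decider *deciders; /* kept sorted by 'order' */

/* Insert a module, keeping the list sorted (equal orders keep insertion order). */
void oom_register(struct oom_decider *d)
{
    struct oom_decider **pp = &deciders;

    while (*pp && (*pp)->order <= d->order)
        pp = &(*pp)->next;
    d->next = *pp;
    *pp = d;
}

/* Controlling instance: pass the candidates to each module until one decides. */
int oom_choose_victim(struct oom_candidate *c, size_t n)
{
    for (struct oom_decider *d = deciders; d; d = d->next) {
        int v = d->decide(c, n, d->priv);
        if (v >= 0)
            return c[v].pid;
    }
    return -1; /* no module made a final decision */
}

/* Example filter: scale and offset one "blessed" pid's score, then fall through. */
struct blessing { int pid; int scale_pct; long offset; };

static int bless_decide(struct oom_candidate *c, size_t n, void *priv)
{
    const struct blessing *b = priv;

    for (size_t i = 0; i < n; i++)
        if (c[i].pid == b->pid)
            c[i].score = c[i].score * b->scale_pct / 100 + b->offset;
    return -1;
}

/* Example final module: pick the highest score, like a default ranking. */
static int max_score_decide(struct oom_candidate *c, size_t n, void *priv)
{
    int best = -1;

    (void)priv;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || c[i].score > c[best].score)
            best = (int)i;
    return best;
}
```

Registering the blessing filter at a lower order than the default module lets it shrink a protected task's score before the final pick, matching the fall-through behaviour described above; changing the order later would mean unlinking a module and inserting it again (not shown).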
Re: User space out of memory approach
> > > If my system needs the OOM killer, it's usually unresponsive to most
> > > userspace applications. A normal daemon would be swapped out before the
> > > runaway dhcpd grows larger than the web cache. It would have to be a
> > > mlocked RT task started from early userspace. It would be difficult to set
> > > up (unless you upgrade your distro), and almost nobody will feel like
> > > tweaking it to take the benefit (OOM == -ECANNOTHAPPEN).
> >
> > Please correct me if I got it wrong: as the daemon in this case is not a
> > normal one, since it never gets rated for its own safety,
>
> That's its own task, it must make sure not to commit suicide. I forgot
> about that.

Ok.

> > then it needs an RT lock whenever the system boots.
>
> It may not be blocked by a random RT task iff the RT task is supposed to
> be OOM-killed. Therefore it *MUST* run at the highest priority and be
> locked into the RAM.
>
> It *SHOULD* be run at boot time, too, just in case it's needed early.

Yes. That's the idea of the application we posted to test the oom killer ranking in user space. At least, we are working to put it at boot time, and these suggestions are very helpful.

> > > What about creating a linked list of (stackable) algorithms which can be
> > > extended by loading modules and resorted using {proc,sys}fs? It will avoid
> > > the extra process, the extra CPU time (and task switches) to frequently
> > > update the list and I think it will decrease the typical amount of used
> > > memory, too.
> >
> > Wouldn't this bring the (set of) ranking algorithm(s) back to the kernel?
> > This is exactly what we're trying to avoid.
>
> You're trying to avoid it in order to let admins try other ranking
> algorithms (at least that's what I read). The module approach seems to be
> flexible enough to do that, and it avoids the mentioned issues. If you
> really want a userspace daemon, it can be controlled by a module.-)

Yes, your reading is correct, but this choice should take into account the "patterns" of how memory is allocated for the user's most used applications. Why? The closer the ranking gets to "The Best choice", the longer it will take to invoke the oom killer again.

I am wondering how a module could control a user space daemon if it hasn't started yet? I mean, processes in user space are supposed to start only after all modules are loaded (those loadable at boot time). So, this user space daemon would break this standard. But if we manage to have a special module that takes care of loading this stack of OOM Killer ranking algorithms, then the daemon would not need to break the default order of loading modules. The init could be changed to start the daemon, and then the module would start controlling it. Am I right? So that's why people are complaining that every distro would have to update the init and load this new module. Correct?

> I'm thinking of something like that:
>
> [X] support stacking of OOM killer ranking algorithms
> [X] Task blessing OOM filter
> [X] Userspace OOM ranking daemon
> [X] Default OOM killer ranking
>
> -vs-
>
> [ ] support stacking of OOM killer ranking algorithms
> ( ) Userspace OOM ranking daemon
> (o) Default OOM killer ranking

Very interesting idea. Will take that into account. Thanks a lot.
-- 
"In a world without fences ... who needs Gates?"
Re: User space out of memory approach
On Tue, 18 Jan 2005, Edjard Souza Mota wrote:
> > If my system needs the OOM killer, it's usually unresponsive to most
> > userspace applications. A normal daemon would be swapped out before the
> > runaway dhcpd grows larger than the web cache. It would have to be a mlocked
> > RT task started from early userspace. It would be difficult to set up
> > (unless you upgrade your distro), and almost nobody will feel like tweaking
> > it to take the benefit (OOM == -ECANNOTHAPPEN).
>
> Please correct me if I got it wrong: as the daemon in this case is not a
> normal one, since it never gets rated for its own safety,

That's its own task, it must make sure not to commit suicide. I forgot about that.

> then it needs an RT lock whenever the
> system boots.

It may not be blocked by a random RT task iff the RT task is supposed to be OOM-killed. Therefore it *MUST* run at the highest priority and be locked into the RAM.

It *SHOULD* be run at boot time, too, just in case it's needed early.

> > What about creating a linked list of (stackable) algorithms which can be
> > extended by loading modules and resorted using {proc,sys}fs? It will avoid
> > the extra process, the extra CPU time (and task switches) to frequently
> > update the list and I think it will decrease the typical amount of used
> > memory, too.
>
> Wouldn't this bring the (set of) ranking algorithm(s) back to the kernel?
> This is exactly what we're trying to avoid.

You're trying to avoid it in order to let admins try other ranking algorithms (at least that's what I read). The module approach seems to be flexible enough to do that, and it avoids the mentioned issues. If you really want a userspace daemon, it can be controlled by a module.-)

I'm thinking of something like that:

[X] support stacking of OOM killer ranking algorithms
[X] Task blessing OOM filter
[X] Userspace OOM ranking daemon
[X] Default OOM killer ranking

-vs-

[ ] support stacking of OOM killer ranking algorithms
( ) Userspace OOM ranking daemon
(o) Default OOM killer ranking
-- 
Exceptions prove the rule, and destroy the battle plan.
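The requirements listed above for a user-space ranking daemon (locked into RAM, highest real-time priority) map onto standard POSIX calls. A minimal sketch, assuming the daemon calls this once at startup; harden_oom_daemon() and the bitmask constants are hypothetical names. Both calls normally need privileges (CAP_IPC_LOCK or a large enough RLIMIT_MEMLOCK for mlockall(), CAP_SYS_NICE or RLIMIT_RTPRIO for SCHED_FIFO), so the sketch reports failures instead of aborting.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

enum { LOCKED_IN_RAM = 1, RT_PRIORITY = 2 };

/*
 * Lock all current and future pages so the daemon cannot be swapped
 * out, then switch to SCHED_FIFO at the maximum priority so ordinary
 * RT tasks cannot starve it. Returns a bitmask of what succeeded.
 */
int harden_oom_daemon(void)
{
    int ok = 0;
    struct sched_param sp;

    if (mlockall(MCL_CURRENT | MCL_FUTURE) == 0)
        ok |= LOCKED_IN_RAM;
    else
        fprintf(stderr, "mlockall: %s\n", strerror(errno));

    memset(&sp, 0, sizeof sp);
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sp.sched_priority >= 0 && sched_setscheduler(0, SCHED_FIFO, &sp) == 0)
        ok |= RT_PRIORITY;
    else
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));

    return ok;
}
```

Running at SCHED_FIFO's maximum priority means a bug in the daemon (a busy loop, say) can starve the rest of the system, so such a daemon should spend nearly all of its time blocked.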
Re: User space out of memory approach
Hi,

> If my system needs the OOM killer, it's usually unresponsive to most
> userspace applications. A normal daemon would be swapped out before the
> runaway dhcpd grows larger than the web cache. It would have to be a mlocked
> RT task started from early userspace. It would be difficult to set up (unless
> you upgrade your distro), and almost nobody will feel like tweaking it to
> take the benefit (OOM == -ECANNOTHAPPEN).

Please correct me if I got it wrong: as the daemon in this case is not a normal one, since it never gets rated for its own safety, then it needs an RT lock whenever the system boots.

> What about creating a linked list of (stackable) algorithms which can be
> extended by loading modules and resorted using {proc,sys}fs? It will avoid
> the extra process, the extra CPU time (and task switches) to frequently
> update the list and I think it will decrease the typical amount of used
> memory, too.

Wouldn't this bring the (set of) ranking algorithm(s) back to the kernel? This is exactly what we're trying to avoid. The way we see the potential for doing this is that the kernel shouldn't worry about the user's decision on which process to kill, but rather take her/his option into account. The computation of such a decision could be done in user space (protected as you suggested above). We'll think about it, although I'm not sure there would be such a decrease in memory consumption.

br

Edjard
-- 
"In a world without fences ... who needs Gates?"
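Following Andrea's suggestion earlier in the thread to tune via oom_adj and verify with oom_score, the /proc side of such a user-space ranker could look like this sketch. read_oom_score() and write_oom_adj() are hypothetical helper names, and the ranking policy that decides which adjustment to write is left out. On kernels since 2.6.36, oom_adj is deprecated in favour of /proc/<pid>/oom_score_adj, which uses a different scale, so a modern daemon would target that file instead. Lowering a value normally requires privileges.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read /proc/<pid>/oom_score (the kernel's badness value); -1 on error. */
long read_oom_score(pid_t pid)
{
    char path[64];
    long score = -1;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%d/oom_score", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &score) != 1)
        score = -1;
    fclose(f);
    return score;
}

/* Write an adjustment to /proc/<pid>/oom_adj; returns 0 on success. */
int write_oom_adj(pid_t pid, int adj)
{
    char path[64];
    FILE *f;
    int rc;

    snprintf(path, sizeof path, "/proc/%d/oom_adj", (int)pid);
    f = fopen(path, "w");
    if (!f)
        return -1;
    rc = fprintf(f, "%d\n", adj) < 0 ? -1 : 0;
    if (fclose(f) != 0) /* write errors surface when the buffer is flushed */
        rc = -1;
    return rc;
}
```

A daemon would typically read the scores of candidate processes, apply its own policy, write adjustments, and re-read oom_score to confirm the kernel's ranking moved as intended.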
Re: User space out of memory approach
On Sun, 2005-01-16 at 21:10 +, Alan Cox wrote:
> On Sul, 2005-01-16 at 10:06, Edjard Souza Mota wrote:
> > What do you think about the point we are trying to make, i.e., moving the
> > ranking of PIDs to be killed to user space? Or, giving the user some
> > influence on it? We were misunderstood because the patch we sent was to make
> > "a slight" reorganization in the way the OOM killer computes rates for PIDs,
> > not to change its
>
> I'm sceptical there is an answer but moving it to user space (or at least
> implementing /proc tunables in user space to experiment) certainly seems
> to be the right way to find out.

No objections against a userspace tuning mechanism, but I still doubt that completely replacing the always imperfect in-kernel selection is feasible.

tglx
Re: User space out of memory approach
On Sul, 2005-01-16 at 10:06, Edjard Souza Mota wrote:
> What do you think about the point we are trying to make, i.e., moving the
> ranking of PIDs to be killed to user space? Or, giving the user some
> influence on it? We were misunderstood because the patch we sent was to make
> "a slight" reorganization in the way the OOM killer computes rates for PIDs,
> not to change its

I'm sceptical there is an answer but moving it to user space (or at least implementing /proc tunables in user space to experiment) certainly seems to be the right way to find out.

> Well, while an AF_TELEPATHY socket is not on its way :) ... we may at
> least experiment with different ranking policies.

agreed
Re: User space out of memory approach
Edjard Souza Mota wrote:
> What do you think about the point we are trying to make, i.e., moving the
> ranking of PIDs to be killed to user space?

If my system needs the OOM killer, it's usually unresponsive to most userspace applications. A normal daemon would be swapped out before the runaway dhcpd grows larger than the web cache. It would have to be a mlocked RT task started from early userspace. It would be difficult to set up (unless you upgrade your distro), and almost nobody will feel like tweaking it to take the benefit (OOM == -ECANNOTHAPPEN).

What about creating a linked list of (stackable) algorithms which can be extended by loading modules and resorted using {proc,sys}fs? It would avoid the extra process and the extra CPU time (and task switches) needed to frequently update the list, and I think it would decrease the typical amount of used memory, too.
Re: User space out of memory approach
Hi,

Thanks Alan...

> > well looking into Alan's email again I think I answered thinking on
> > the wrong side :-) that the suggestion was to switch off OOM
> > altogether and be done with all the discussion... tsk tsk tsk too
> > defensive and hasty I guess :-)
>
> That's what mode 2 is all about. There are some problems with over-early
> triggering of OOM that Andrea fixed that are still relevant (or stick
> "never OOM if mode == 2" into your kernel)
>
> > Did I get it right this time Alan?
>
> Basically yes - the real problem with the OOM situation is there is no
> correct answer. People have spent years screwing around with the OOM
> killer selection logic and while you can make it pick large tasks or old
> tasks or growing tasks easily, nobody has a good heuristic about what to
> kill because it depends on the user's wishes. OOM requires AF_TELEPATHY
> sockets and we don't have them.
>
> For most users simply not allowing the mess to occur solves the problem
> - not all but most.

What do you think about the point we are trying to make, i.e., moving the ranking of PIDs to be killed to user space? Or, giving the user some influence on it? We were misunderstood because the patch we sent was to make "a slight" reorganization in the way the OOM killer computes rates for PIDs, not to change its selection logic. But now we can discuss (I mean implement) alternative selection logics without messing with the code in kernel space. The parameters, and the criteria for combining them, can be opened up so that more people can test them according to platform and, if not per user, at least according to each application's memory consumption pattern.

Well, while an AF_TELEPATHY socket is not on its way :) ... we may at least experiment with different ranking policies.

br

Edjard
-- 
"In a world without fences ... who needs Gates?"
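Alan's "mode 2" refers to strict overcommit accounting (vm.overcommit_memory = 2). Roughly, as described in the 2.6 kernel's overcommit-accounting documentation, the system refuses new commitments once the total would exceed swap plus overcommit_ratio percent of physical RAM (50% by default), so allocations fail up front instead of the OOM killer running later. The sketch below shows only that arithmetic; commit_limit_kb() and commit_allowed() are hypothetical names, and adjustments that newer kernels make (such as excluding hugepage reservations) are ignored.

```c
/*
 * Strict-overcommit limit in kB (the unit /proc/meminfo uses):
 * swap plus overcommit_ratio percent of RAM. Simplified model.
 */
unsigned long long commit_limit_kb(unsigned long long ram_kb,
                                   unsigned long long swap_kb,
                                   unsigned int overcommit_ratio)
{
    return swap_kb + ram_kb * overcommit_ratio / 100;
}

/* Would committing request_kb more keep the total within the limit? */
int commit_allowed(unsigned long long committed_kb,
                   unsigned long long request_kb,
                   unsigned long long limit_kb)
{
    return committed_kb + request_kb <= limit_kb;
}
```

For example, with 524288 kB of RAM, 1048576 kB of swap and the default ratio of 50, the limit is 1048576 + 262144 = 1310720 kB; a request that would push the committed total past that fails immediately.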