Re: [RFC -v2] panic_on_oom_timeout

2015-06-19 Thread Tetsuo Handa
Michal Hocko wrote: On Wed 17-06-15 22:59:54, Tetsuo Handa wrote: Michal Hocko wrote: [...] But you have a point that we could have - constrained OOM which elevates oom_victims - global OOM killer strikes but wouldn't start the timer This is certainly possible and timer_pending

Re: [RFC -v2] panic_on_oom_timeout

2015-06-19 Thread Tetsuo Handa
Michal Hocko wrote: Yes I was thinking about this as well because the primary assumption of the OOM killer is that the victim will release some memory. And it doesn't matter whether the OOM killer was constrained or the global one. So the above looks good at first sight, I am just afraid it is

Re: [RFC] panic_on_oom_timeout

2015-06-11 Thread Tetsuo Handa
Michal Hocko wrote: On Thu 11-06-15 22:12:40, Tetsuo Handa wrote: Michal Hocko wrote: [...] The moom_work used by SysRq-f sometimes cannot be executed because some work queued before moom_work stalls for an unbounded amount of time due to looping

Re: [RFC] panic_on_oom_timeout

2015-06-11 Thread Tetsuo Handa
Michal Hocko wrote: The feature is implemented as a delayed work which is scheduled when the OOM condition is declared for the first time (oom_victims is still zero) in out_of_memory and it is canceled in exit_oom_victim after the oom_victims count drops down to zero. For this time
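
A minimal sketch of the delayed-work lifecycle described above, with illustrative names (this is not the actual patch; sysctl_panic_on_oom_timeout and oom_victims are the knobs from the discussion):

static void panic_on_oom_fn(struct work_struct *unused)
{
        panic("Out of memory: OOM situation not resolved within the timeout");
}
static DECLARE_DELAYED_WORK(panic_on_oom_work, panic_on_oom_fn);

/* in out_of_memory(): arm the timer on the first OOM invocation */
if (sysctl_panic_on_oom_timeout && !atomic_read(&oom_victims))
        schedule_delayed_work(&panic_on_oom_work,
                              sysctl_panic_on_oom_timeout * HZ);

/* in exit_oom_victim(): disarm once the last victim has exited */
if (atomic_dec_return(&oom_victims) == 0)
        cancel_delayed_work(&panic_on_oom_work);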

Re: [RFC] panic_on_oom_timeout

2015-06-16 Thread Tetsuo Handa
Michal Hocko wrote: This patch implements system_memdie_panic_secs sysctl which configures a maximum timeout for the OOM killer to resolve the OOM situation. If the system is still under OOM (i.e. the OOM victim cannot release memory) after the timeout expires, it will panic the system. A
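
How such a knob is typically exposed — a hedged sketch of a ctl_table entry (field values illustrative, not taken from the patch):

static int sysctl_system_memdie_panic_secs;

static struct ctl_table oom_timeout_table[] = {
        {
                .procname       = "system_memdie_panic_secs",
                .data           = &sysctl_system_memdie_panic_secs,
                .maxlen         = sizeof(int),
                .mode           = 0644,
                .proc_handler   = proc_dointvec,
        },
        { }
};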

Re: [RFC -v2] panic_on_oom_timeout

2015-06-17 Thread Tetsuo Handa
Michal Hocko wrote: Hi, I was thinking about this and I am more and more convinced that we shouldn't care about panic_on_oom=2 configuration for now and go with the simplest solution first. I have revisited my original patch and replaced delayed work by a timer based on the feedback from

Re: [RFC] panic_on_oom_timeout

2015-06-17 Thread Tetsuo Handa
Michal Hocko wrote a few minutes ago: Subject: [RFC -v2] panic_on_oom_timeout Oops, we raced... Michal Hocko wrote: On Tue 16-06-15 22:14:28, Tetsuo Handa wrote: Michal Hocko wrote: This patch implements system_memdie_panic_secs sysctl which configures a maximum timeout for the OOM

Re: [RFC -v2] panic_on_oom_timeout

2015-06-17 Thread Tetsuo Handa
Michal Hocko wrote: + if (sysctl_panic_on_oom_timeout) { + if (sysctl_panic_on_oom > 1) { + pr_warn("panic_on_oom_timeout is ignored for panic_on_oom=2\n"); + } else { + /* + * Only schedule the delayed

Re: [RFC] panic_on_oom_timeout

2015-06-12 Thread Tetsuo Handa
that most administrators will no longer need to use panic_on_oom > 0 by setting adequate values to these timeouts. From e59b64683827151a35257384352c70bce61babdd Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp>

Re: [PATCH 0/9] mm: improve OOM mechanism v2

2015-05-23 Thread Tetsuo Handa
Michal Hocko wrote: On Thu 30-04-15 18:44:25, Tetsuo Handa wrote: Michal Hocko wrote: I mean we should eventually fail all the allocation types but GFP_NOFS is coming from _carefully_ handled code paths which is an easier starting point than a random code path in the kernel/drivers. So

[PATCH] mm: Introduce timeout based OOM killing

2015-05-23 Thread Tetsuo Handa
From 5999a1ebee5e611eaa4fa7be37abbf1fbdc8ef93 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp Date: Sat, 23 May 2015 22:42:20 +0900 Subject: [PATCH] mm: Introduce timeout based OOM killing This proposal is an interim amendment, which focused on possibility

Re: [patch -mm] mm, oom: add global access to memory reserves on livelock

2015-08-21 Thread Tetsuo Handa
Michal Hocko wrote: [CCing Tetsuo - he was really concerned about the oom deadlocks and he was proposing a timeout based solution as well] Thank you for CCing me. My proposal is http://lkml.kernel.org/r/201505232339.dab00557.vfflhmsojfo...@i-love.sakura.ne.jp . On Thu 20-08-15 14:00:36,

Re: [RFC -v2] panic_on_oom_timeout

2015-07-29 Thread Tetsuo Handa
Michal Hocko wrote: On Wed 17-06-15 15:24:27, Michal Hocko wrote: On Wed 17-06-15 14:51:27, Michal Hocko wrote: [...] The important thing is to decide what is the reasonable way forward. We have two implementations of panic based timeout. So we should decide And the most

Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks

2015-10-23 Thread Tetsuo Handa
Michal Hocko wrote: > On Fri 23-10-15 19:36:30, Tejun Heo wrote: > > Hello, Michal. > > > > On Fri, Oct 23, 2015 at 10:33:16AM +0200, Michal Hocko wrote: > > > Ohh, OK I can see wq_worker_sleeping now. I've missed your point in > > > other email, sorry about that. But now I am wondering whether

Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks

2015-10-25 Thread Tetsuo Handa
ator does (3). But kernel code invoked via workqueue is expected to do (4) rather than (3). This means that any kernel code which invokes a __GFP_WAIT allocation might fail to do (4) when invoked via workqueue, regardless of flags passed to alloc_workqueue()? Michal Hocko wrote: > On Fri 23-10-15 06:42:4

Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks

2015-10-21 Thread Tetsuo Handa
Michal Hocko wrote: > On Wed 21-10-15 09:49:07, Christoph Lameter wrote: > > On Wed, 21 Oct 2015, Michal Hocko wrote: > > > > > Because all the WQ workers are stuck somewhere, maybe in the memory > > > allocation which cannot make any progress and the vmstat update work is > > > queued behind

Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks

2015-10-22 Thread Tetsuo Handa
Christoph Lameter wrote: > On Wed, 21 Oct 2015, Michal Hocko wrote: > > > I am not sure how to achieve that. Requiring non-sleeping worker would > > work out but do we have enough users to add such an API? > > > > I would rather see vmstat using dedicated kernel thread(s) for this this > >

Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks

2015-10-22 Thread Tetsuo Handa
Tejun Heo wrote: > On Thu, Oct 22, 2015 at 05:49:22PM +0200, Michal Hocko wrote: > > I am confused. What makes rescuer to not run? Nothing seems to be > > hogging CPUs, we are just out of workers which are loopin in the > > allocator but that is preemptible context. > > It's concurrency

Newbie's question: memory allocation when reclaiming memory

2015-10-26 Thread Tetsuo Handa
May I ask a newbie question? Say, there is some amount of memory pages which can be reclaimed if they are flushed to storage. And the lower layer might issue memory allocation requests in a way which won't cause a reclaim deadlock (e.g. using GFP_NOFS or GFP_NOIO) when flushing to storage, won't it?
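
A sketch of the constraint being asked about: an allocation issued while flushing dirty pages must mask out the reclaim modes that could re-enter the layer doing the flushing (the helper name is made up for illustration):

/* Reclaim triggered by this allocation will not recurse into the
 * block layer that is performing the flush. */
static void *alloc_while_flushing(size_t len)
{
        return kmalloc(len, GFP_NOIO);  /* GFP_NOFS for FS-level paths */
}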

Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks

2015-10-27 Thread Tetsuo Handa
Michal Hocko wrote: > > On Fri, Oct 23, 2015 at 01:11:45PM +0200, Michal Hocko wrote: > > > > The problem here is not lack > > > > of execution resource but concurrency management misunderstanding the > > > > situation. > > > > > > And this sounds like a bug to me. > > > > I don't know. I can

Re: [patch 3/3] vmstat: Create our own workqueue

2015-10-28 Thread Tetsuo Handa
Christoph Lameter wrote: > On Wed, 28 Oct 2015, Tejun Heo wrote: > > > The only thing necessary here is WQ_MEM_RECLAIM. I don't see how > > WQ_SYSFS and WQ_FREEZABLE make sense here. > I can still trigger silent livelock with this patchset applied. -- [ 272.283217] MemAlloc-Info: 9
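
What WQ_MEM_RECLAIM buys, per the discussion: the workqueue gets a dedicated rescuer thread, so its work items can still run when every pool worker is blocked in reclaim. A hedged sketch (queue name illustrative):

static struct workqueue_struct *vmstat_wq;

static int __init vmstat_wq_init(void)
{
        vmstat_wq = alloc_workqueue("vmstat", WQ_MEM_RECLAIM, 0);
        return vmstat_wq ? 0 : -ENOMEM;
}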

Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim

2015-11-17 Thread Tetsuo Handa
Michal Hocko wrote: > __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if > __GFP_NOFAIL is requested. This is fragile because we are basically > relying on somebody else to make the reclaim (be it the direct reclaim > or OOM killer) for us. The caller might be holding resources

[PATCH] tree wide: Use kvfree() than conditional kfree()/vfree()

2015-11-09 Thread Tetsuo Handa
check and reply if you found problems. Signed-off-by: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Acked-by: Michal Hocko <mho...@suse.com> Cc: Russell King <li...@arm.linux.org.uk> # arm Cc: <linux-a...@vger.kernel.org> # apei Cc: <drbd-u...@lists.linbit.co

Re: [patch 3/3] vmstat: Create our own workqueue

2015-10-30 Thread Tetsuo Handa
Christoph Lameter wrote: > On Thu, 29 Oct 2015, Tejun Heo wrote: > > > Wait, this series doesn't include Tetsuo's change. Of course it won't > > fix the deadlock problem. What's necessary is Tetsuo's patch + > > WQ_MEM_RECLAIM. > > This series is only dealing with vmstat changes. Do I get an

Re: [RFC 1/3] mm, oom: refactor oom detection

2015-10-30 Thread Tetsuo Handa
Michal Hocko wrote: > + target -= (stall_backoff * target + MAX_STALL_BACKOFF - 1) / MAX_STALL_BACKOFF; target -= DIV_ROUND_UP(stall_backoff * target, MAX_STALL_BACKOFF); Michal Hocko wrote: > This alone wouldn't be sufficient, though, because the writeback might > get stuck and
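
The two lines compute the same thing; DIV_ROUND_UP() in <linux/kernel.h> is defined as

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

so the suggested form simply names the round-up division that the original open-codes.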

Re: [PATCH] jbd2: get rid of superfluous __GFP_REPEAT

2015-11-06 Thread Tetsuo Handa
On 2015/11/07 1:17, mho...@kernel.org wrote: > From: Michal Hocko > > jbd2_alloc is explicit about its allocation preferences wrt. the > allocation size. Sub page allocations go to the slab allocator > and larger are using either the page allocator or vmalloc. This > is all good

[PATCH] tree wide: Use kvfree() than conditional kfree()/vfree()

2015-11-07 Thread Tetsuo Handa
-off-by: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> --- arch/arm/mm/dma-mapping.c | 11 ++-- drivers/acpi/apei/erst.c | 6 ++-- drivers/block/drbd/drbd_bitmap.c | 26 + drivers/block/drbd/drbd
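
The shape of the tree-wide conversion: kvfree() checks the address itself, so call sites no longer need to remember how the buffer was allocated. A minimal before/after sketch:

/* before */
if (is_vmalloc_addr(ptr))
        vfree(ptr);
else
        kfree(ptr);

/* after */
kvfree(ptr);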

Re: [PATCH] tree wide: Use kvfree() than conditional kfree()/vfree()

2015-11-07 Thread Tetsuo Handa
Andy Shevchenko wrote: > Like Joe noticed you have left few places like > void my_func_kvfree(arg) > { > kvfree(arg); > } > > Might make sense to remove them completely, especially in case when > you have changed the callers. I think we should stop at #define my_func_kvfree(arg) kvfree(arg) in

Re: [patch 3/3] vmstat: Create our own workqueue

2015-11-02 Thread Tetsuo Handa
Christoph Lameter wrote: > On Sat, 31 Oct 2015, Tetsuo Handa wrote: > > > Then, you need to update below description (or drop it) because > > patch 3/3 alone will not guarantee that the counters are up to date. > > The vmstat system does not guarantee that the counters

Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks

2015-11-05 Thread Tetsuo Handa
Michal Hocko wrote: > As already pointed out I really detest a short sleep and would prefer > a way to tell WQ what we really need. vmstat is not the only user. OOM > sysrq will need this special treatment as well. While the > zone_reclaimable can be fixed in an easy patch >

Re: [patch 3/3] vmstat: Create our own workqueue

2015-11-06 Thread Tetsuo Handa
Christoph Lameter wrote: > On Sat, 31 Oct 2015, Tetsuo Handa wrote: > > > Then, you need to update below description (or drop it) because > > patch 3/3 alone will not guarantee that the counters are up to date. > > The vmstat system does not guarantee that the counters

Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks

2015-11-02 Thread Tetsuo Handa
Tejun Heo wrote: > If > the possibility of sysrq getting stuck behind concurrency management > is an issue, queueing them on an unbound or highpri workqueue should > be good enough. Regarding SysRq-f, we could do something like below. Though
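
A hedged sketch of that suggestion (names follow drivers/tty/sysrq.c; the flags are one plausible choice, not the final patch): queue moom_work on an unbound, reclaim-safe workqueue so a looping allocation on the regular worker pools cannot delay SysRq-f.

static struct workqueue_struct *moom_wq;

static void sysrq_handle_moom(int key)
{
        queue_work(moom_wq, &moom_work);
}

/* at init time */
moom_wq = alloc_workqueue("moom", WQ_UNBOUND | WQ_MEM_RECLAIM, 1);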

Re: Silent hang up caused by pages being not scanned?

2015-10-14 Thread Tetsuo Handa
Michal Hocko wrote: > The OOM report is really interesting: > > > [ 69.039152] Node 0 DMA32 free:74224kB min:44652kB low:55812kB > > high:66976kB active_anon:1334792kB inactive_anon:8240kB active_file:48364kB > > inactive_file:230752kB unevictable:0kB isolated(anon):92kB > >

Re: Silent hang up caused by pages being not scanned?

2015-10-14 Thread Tetsuo Handa
Michal Hocko wrote: > On Wed 14-10-15 23:38:00, Tetsuo Handa wrote: > > Michal Hocko wrote: > [...] > > > Why hasn't balance_dirty_pages throttled writers and allowed them to > > > make the whole LRU dirty? What is your dirty{_background}_{ratio,bytes} >

Re: Silent hang up caused by pages being not scanned?

2015-10-16 Thread Tetsuo Handa
Linus Torvalds wrote: > Tetsuo, mind trying it out and maybe tweaking it a bit for the load > you have? Does it seem to improve on your situation? Yes, I already tried it and just replied to Michal. I tested for one hour using various memory stressing programs. As far as I tested, I did not hit

Re: Can't we use timeout based OOM warning/killing?

2015-10-08 Thread Tetsuo Handa
Linus Torvalds wrote: > Because another thing that tends to affect this is that oom without swap is > very different from oom with lots of swap, so different people will see > very different issues. If you have some particular case you want to check, > and could make a VM image for it, maybe that

[PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks

2015-10-21 Thread Tetsuo Handa
From 0c50792dfa6396453c89c71351a7458b94d3e881 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Date: Wed, 21 Oct 2015 21:15:30 +0900 Subject: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks Since "struct zone"-

[RFC][PATCH] Memory allocation watchdog kernel thread.

2015-10-18 Thread Tetsuo Handa
From e07c200277cdb8e46aa754d3b980b02ab727cb80 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Date: Sun, 18 Oct 2015 20:28:45 +0900 Subject: [PATCH] Memory allocation watchdog kernel thread. This patch adds a kernel thread which periodically repor
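
The core of such a watchdog, in hedged sketch form (check_for_stalls() is an assumed helper standing in for the patch's reporting logic):

static int memalloc_watchdog(void *unused)
{
        while (!kthread_should_stop()) {
                schedule_timeout_interruptible(10 * HZ);
                check_for_stalls();     /* report tasks stuck in allocation */
        }
        return 0;
}

/* started at boot */
kthread_run(memalloc_watchdog, NULL, "MemAllocWatchdog");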

Re: Silent hang up caused by pages being not scanned?

2015-10-14 Thread Tetsuo Handa
Linus Torvalds wrote: > On Tue, Oct 13, 2015 at 5:21 AM, Tetsuo Handa > <penguin-ker...@i-love.sakura.ne.jp> wrote: > > > > If I remove > > > > /* Any of the zones still reclaimable? Don't OOM. */ > > if (zones_reclaimable) > >

Re: Can't we use timeout based OOM warning/killing?

2015-10-12 Thread Tetsuo Handa
Tetsuo Handa wrote: > So, zapping the first OOM victim's mm might fail by chance. I retested with a slightly different version. -- Reproducer start -- #define _GNU_SOURCE #include #include #include #include #include #include #include #include static int writer(v

Silent hang up caused by pages being not scanned?

2015-10-12 Thread Tetsuo Handa
Tetsuo Handa wrote: > Uptime between 101 and 300 is a silent hang up (i.e. no OOM killer messages, > no SIGKILL pending tasks, no TIF_MEMDIE tasks) which I solved using SysRq-f > at uptime = 289. I don't know the reason of this silent hang up, but the > memory unzapping kernel thread w

Re: can't oom-kill zap the victim's memory?

2015-10-07 Thread Tetsuo Handa
Oleg Nesterov wrote: > > > Hmm. If we already have mmap_sem and started zap_page_range() then > > > I do not think it makes sense to stop until we free everything we can. > > > > Zapping a huge address space can take quite some time > > Yes, and this is another reason we should do this

Re: can't oom-kill zap the victim's memory?

2015-10-07 Thread Tetsuo Handa
Vlastimil Babka wrote: > On 5.10.2015 16:44, Michal Hocko wrote: > > So I can see basically only few ways out of this deadlock situation. > > Either we face the reality and allow small allocations (without > > __GFP_NOFAIL) to fail after all attempts to reclaim memory have failed > > (so after

Re: Can't we use timeout based OOM warning/killing?

2015-10-06 Thread Tetsuo Handa
Tetsuo Handa wrote: > Sorry. This was my misunderstanding. But I still think that we need to be > prepared for cases where zapping OOM victim's mm approach fails. > ( > http://lkml.kernel.org/r/201509242050.ehe95837.fvfootmqhlj...@i-love.sakura.ne.jp > ) I tested whether it is

Re: Silent hang up caused by pages being not scanned?

2015-10-13 Thread Tetsuo Handa
Linus Torvalds wrote: > On Mon, Oct 12, 2015 at 8:25 AM, Tetsuo Handa > <penguin-ker...@i-love.sakura.ne.jp> wrote: > > > > I examined this hang up using additional debug printk() patch. And it was > > observed that when this silent hang up occurs, zone_

Re: Can't we use timeout based OOM warning/killing?

2015-10-10 Thread Tetsuo Handa
Tetsuo Handa wrote: > Without means to find out what was happening, we will "overlook real bugs" > before "paper over real bugs". The means are expected to work without > knowledge to use trace points functionality, are expected to run without > memory allocat

Re: Silent hang up caused by pages being not scanned?

2015-10-13 Thread Tetsuo Handa
Michal Hocko wrote: > I can see two options here. Either we teach zone_reclaimable to be less > fragile or remove zone_reclaimable from shrink_zones altogether. Both of > them are risky because we have a long history of changes in this areas > which made other subtle behavior changes but I guess

Re: [RFC 0/8] Allow GFP_NOFS allocation to fail

2015-09-07 Thread Tetsuo Handa
Michal Hocko wrote: > As the VM cannot do much about these requests we should face the reality > and allow those allocations to fail. Johannes has already posted the > patch which does that (http://marc.info/?l=linux-mm=142726428514236=2) > but the discussion died pretty quickly. Addition of

Re: [RFC 0/8] Allow GFP_NOFS allocation to fail

2015-09-15 Thread Tetsuo Handa
Tetsuo Handa wrote: > > Thoughts? Opinions? > > To me, fixing callers (adding __GFP_NORETRY to callers) in a step-by-step > fashion after adding proactive countermeasure sounds better than changing > the default behavior (implicitly applying __GFP_NORETRY inside). > Ping?

Re: can't oom-kill zap the victim's memory?

2015-09-29 Thread Tetsuo Handa
David Rientjes wrote: > On Fri, 25 Sep 2015, Michal Hocko wrote: > > > > I am still not sure how you want to implement that kernel thread but I > > > > am quite skeptical it would be very much useful because all the current > > > > allocations which end up in the OOM killer path cannot simply back

Re: [PATCH -mm 3/3] mm/oom_kill: fix the wrong task->mm == mm checks in

2015-09-29 Thread Tetsuo Handa
Oleg Nesterov wrote: > Both "child->mm == mm" and "p->mm != mm" checks in oom_kill_process() > are wrong. ->mm can be if task is the exited group leader. This means can be [missing word here?] if task > +static bool process_has_mm(struct task_struct *p, struct mm_struct *mm) > +{ > +

Re: can't oom-kill zap the victim's memory?

2015-09-29 Thread Tetsuo Handa
David Rientjes wrote: > I think both of your illustrations show why it is not helpful to kill > additional processes after a time period has elapsed and a victim has > failed to exit. In both of your scenarios, it would require that KT1 be > killed to allow forward progress and we know that's

Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()

2015-09-29 Thread Tetsuo Handa
David Rientjes wrote: > On Tue, 29 Sep 2015, Oleg Nesterov wrote: > > > The fatal_signal_pending() was added to suppress unnecessary "sharing > > same memory" message, but it can't 100% help anyway because it can be > > false-negative; SIGKILL can be already dequeued. > > > > And worse, it can

Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer

2015-09-30 Thread Tetsuo Handa
Oleg Nesterov wrote: > Just in case, this doesn't depend on the previous series I sent. > > Tetsuo, iirc we already discussed the change in 1/2 some time ago, > could you review? > > Oleg. I tested patch 1/2 and 2/2 on next-20150929 using reproducer at

Re: can't oom-kill zap the victim's memory?

2015-09-30 Thread Tetsuo Handa
Tetsuo Handa wrote: > (Well, do we need to change __alloc_pages_slowpath() that OOM victims do not > enter direct reclaim paths in order to avoid being blocked by unkillable fs > locks?) I'm not familiar with how fs writeback manages memory. I feel I'm missing something. Can somebody

Re: can't oom-kill zap the victim's memory?

2015-09-28 Thread Tetsuo Handa
ecs will be acceptable for those who want to retry a bit more rather than panic on accidental livelock if this approach is used as opt-in. Tetsuo Handa wrote: > Excuse me, but thinking about CLONE_VM without CLONE_THREAD case... > Isn't there possibility of hitting livelocks at > >

Re: [PATCH -mm v2 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()check in oom_kill_process()

2015-10-01 Thread Tetsuo Handa
skip this process. > > > > We could probably add the additional ->group_exit_task check but this > > pach just removes the wrong check along with pr_info(). > > > > Signed-off-by: Oleg Nesterov <o...@redhat.com> > > Acked-by: David Rientjes <rient...

Re: can't oom-kill zap the victim's memory?

2015-10-01 Thread Tetsuo Handa
David Rientjes wrote: > On Wed, 30 Sep 2015, Tetsuo Handa wrote: > > > If we choose only 1 OOM victim, the possibility of hitting this memory > > unmapping livelock is (say) 1%. But if we choose multiple OOM victims, the > > possibility becomes (almost) 0%. And if we stil

Re: [PATCH -mm v2 1/3] mm/oom_kill: remove the wrong fatal_signal_pending() check in oom_kill_process()

2015-10-02 Thread Tetsuo Handa
Michal Hocko wrote: > > --- a/fs/coredump.c > > +++ b/fs/coredump.c > > @@ -295,6 +295,8 @@ static int zap_process(struct task_struct *start, int > > exit_code, int flags) > > for_each_thread(start, t) { > > task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); > >

Re: [PATCH -mm v2 1/3] mm/oom_kill: remove the wrong fatal_signal_pending() check in oom_kill_process()

2015-10-02 Thread Tetsuo Handa
Oleg Nesterov wrote: > Tetsuo, sorry, I don't understand your question... > > > because it is possible that T starts the coredump, T sends SIGKILL to P, > > P calls out_of_memory() on GFP_FS allocation, > > yes, and since fatal_signal_pending() == T we do not even check > task_will_free_mem(). >

Re: can't oom-kill zap the victim's memory?

2015-10-02 Thread Tetsuo Handa
Michal Hocko wrote: > On Mon 28-09-15 15:24:06, David Rientjes wrote: > > I agree that i_mutex seems to be one of the most common offenders. > > However, I'm not sure I understand why holding it while trying to allocate > > infinitely for an order-0 allocation is problematic wrt the proposed >

Re: [PATCH -mm v2 1/3] mm/oom_kill: remove the wrong fatal_signal_pending() check in oom_kill_process()

2015-10-02 Thread Tetsuo Handa
Michal Hocko wrote: > > Since T sends SIGKILL to all clone(CLONE_VM) tasks upon coredump, P needs > > to do > > It does that only to all threads in the _same_ thread group AFAIU. I'm confused. What the _same_ thread group? I can observe that SIGKILL is sent to all clone(CLONE_THREAD |

Re: [PATCH -mm v2 1/3] mm/oom_kill: remove the wrong fatal_signal_pending() check in oom_kill_process()

2015-10-02 Thread Tetsuo Handa
Oleg Nesterov wrote: > On 10/01, Michal Hocko wrote: > > > > zap_process will add SIGKILL to all threads but the > > current which will go on without being killed and if this is not a > > thread group leader then we would miss it. > > Yes. And note that de_thread() does the same. Speaking of

Can't we use timeout based OOM warning/killing?

2015-10-03 Thread Tetsuo Handa
Michal Hocko wrote: > On Tue 29-09-15 01:18:00, Tetsuo Handa wrote: > > Michal Hocko wrote: > > > The point I've tried to made is that oom unmapper running in a detached > > > context (e.g. kernel thread) vs. directly in the oom context doesn't > > > m

Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer

2015-09-30 Thread Tetsuo Handa
Oleg Nesterov wrote: > This patch just makes the SIGNAL_GROUP_COREDUMP check in task_will_free_mem() > a bit more correct wrt CLONE_VM tasks, nothing more. OK. Then, that's out of what I can understand. But I wish for some description to PATCH 2/2 about why to change from "do { }

Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrongfatal_signal_pending()

2015-09-30 Thread Tetsuo Handa
Oleg Nesterov wrote: > > This fatal_signal_pending() check is about to be added by me because the OOM > > killer spams the kernel log when the mm struct which the OOM victim is using > > is shared by many threads. ( http://marc.info/?l=linux-mm=143256441501204 > > ) > > OK, I see, but it is

Re: can't oom-kill zap the victim's memory?

2015-09-20 Thread Tetsuo Handa
Oleg Nesterov wrote: > On 09/17, Kyle Walker wrote: > > > > Currently, the oom killer will attempt to kill a process that is in > > TASK_UNINTERRUPTIBLE state. For tasks in this state for an exceptional > > period of time, such as processes writing to a frozen filesystem during > > a lengthy

Re: [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks

2015-09-24 Thread Tetsuo Handa
Kyle Walker wrote: > I agree, in lieu of treating TASK_UNINTERRUPTIBLE tasks as unkillable, > and omitting them from the oom selection process, continuing the > carnage is likely to result in more unpredictable results. At this > time, I believe Oleg's solution of zapping the process memory use >

Re: [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks

2015-09-21 Thread Tetsuo Handa
David Rientjes wrote: > Your proposal, which I mostly agree with, tries to kill additional > processes so that they allocate and drop the lock that the original victim > depends on. My approach, from > http://marc.info/?l=linux-kernel=144010444913702, is the same, but > without the killing.

Re: can't oom-kill zap the victim's memory?

2015-09-25 Thread Tetsuo Handa
Michal Hocko wrote: > On Thu 24-09-15 14:15:34, David Rientjes wrote: > > > > Finally. Whatever we do, we need to change oom_kill_process() first, > > > > and I think we should do this regardless. The "Kill all user processes > > > > sharing victim->mm" logic looks wrong and

Re: can't oom-kill zap the victim's memory?

2015-09-22 Thread Tetsuo Handa
Oleg Nesterov wrote: > On 09/22, Tetsuo Handa wrote: > > > > I imagined a dedicated kernel thread doing something like shown below. > > (I don't know about mm->mmap management.) > > mm->mmap_zapped corresponds to MMF_MEMDIE. > > No, it doesn't, please

Re: [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks

2015-09-19 Thread Tetsuo Handa
Michal Hocko wrote: > This has been posted in various forms many times over past years. I > still do not think this is a right approach of dealing with the problem. I do not think "GFP_NOFS can fail" patch is a right approach because that patch easily causes messages like below. Buffer I/O

Re: [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks

2015-09-18 Thread Tetsuo Handa
Oleg Nesterov wrote: > To simplify the discussion lets ignore PF_FROZEN, this is another issue. > > I am not sure this change is enough, we need to ensure that > select_bad_process() won't pick the same task (or its sub-thread) again. SysRq-f is sometimes unusable because it continues choosing

Re: can't oom-kill zap the victim's memory?

2015-09-21 Thread Tetsuo Handa
Oleg Nesterov wrote: > Yes, yes, and I already tried to comment this part. We probably need a > dedicated kernel thread, but I still think (although I am not sure) that > initial change can use workqueue. In the likely case system_unbound_wq pool > should have an idle thread, if not - OK, this

Re: [RFC PATCH -v2] mm, oom: introduce oom reaper

2015-12-05 Thread Tetsuo Handa
Michal Hocko wrote: > On Sun 29-11-15 01:10:10, Tetsuo Handa wrote: > > Tetsuo Handa wrote: > > > > Users of mmap_sem which need it for write should be carefully reviewed > > > > to use _killable waiting as much as possible and reduce allocations

Re: [RFC PATCH -v2] mm, oom: introduce oom reaper

2015-12-07 Thread Tetsuo Handa
Michal Hocko wrote: > Yes you are right! The reference count should be incremented before > publishing the new mm_to_reap. I thought that an elevated ref. count by > the caller would be enough but this was clearly wrong. Does the update > below looks better? I think that moving mmdrop() from

Re: [PATCH v4] mm,oom: Add memory allocation watchdog kernel thread.

2015-12-13 Thread Tetsuo Handa
Johannes Weiner wrote: > On Sun, Dec 13, 2015 at 12:33:04AM +0900, Tetsuo Handa wrote: > > +Currently, when something went wrong inside memory allocation request, > > +the system will stall with either 100% CPU usage (if memory allocating > > +tasks are doing busy loop) or 0%

[PATCH v4] mm,oom: Add memory allocation watchdog kernel thread.

2015-12-12 Thread Tetsuo Handa
From 2804913f4d21a20a154b93d5437c21e52bf761a1 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Date: Sun, 13 Dec 2015 00:02:29 +0900 Subject: [PATCH v4] mm/oom: Add memory allocation watchdog kernel thread. This patch adds a kernel thread which per

Re: [PATCH 1/2] mm, oom: introduce oom reaper

2015-12-18 Thread Tetsuo Handa
Michal Hocko wrote: > On Wed 16-12-15 16:50:35, Andrew Morton wrote: > > On Tue, 15 Dec 2015 19:36:15 +0100 Michal Hocko wrote: > [...] > > > +static void oom_reap_vmas(struct mm_struct *mm) > > > +{ > > > + int attempts = 0; > > > + > > > + while (attempts++ < 10 &&
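
The quoted hunk is cut off; the retry loop it introduces looks roughly like this (hedged reconstruction — __oom_reap_vmas() is the patch's helper that tries to take mmap_sem for read and unmap the victim's anonymous memory):

static void oom_reap_vmas(struct mm_struct *mm)
{
        int attempts = 0;

        /* back off and retry while the victim's mmap_sem is contended */
        while (attempts++ < 10 && !__oom_reap_vmas(mm))
                msleep(100);

        mmdrop(mm);     /* drop the reference taken when queuing the reap */
}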

Re: [PATCH v4] mm,oom: Add memory allocation watchdog kernel thread.

2015-12-16 Thread Tetsuo Handa
Here is a different version. Is this version better than creating a dedicated kernel thread? -- From c9f61902977b04d24d809d2b853a5682fc3c41e8 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Date: Thu, 17 Dec 2015 00:02:37 +0900 Subject: [PATCH dra

Re: __vmalloc() vs. GFP_NOIO/GFP_NOFS

2016-01-04 Thread Tetsuo Handa
On 2016/01/03 16:12, Al Viro wrote: > Those, AFAICS, are such callers with GFP_NOIO; however, there's a shitload > of GFP_NOFS ones. XFS uses memalloc_noio_save(), but a _lot_ of other > callers do not. For example, all call chains leading to ceph_kvmalloc() > pass GFP_NOFS and none of them is
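
The XFS-style workaround mentioned above, sketched with the 2015-era __vmalloc() signature (illustrative, not the actual ceph fix):

unsigned int noio_flag;
void *p;

noio_flag = memalloc_noio_save();       /* force GFP_NOIO process-wide */
p = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL);
memalloc_noio_restore(noio_flag);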

Re: [PATCH] unix: properly account for FDs passed over unix sockets

2015-12-30 Thread Tetsuo Handa
before the culprit process is killed (CVE-2013-4312)". Reported-by: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Mitigates: CVE-2013-4312 (Linux 2.0+)

Re: [PATCH 0/3] OOM detection rework v4

2016-01-02 Thread Tetsuo Handa
Tetsuo Handa wrote: > Michal Hocko wrote: > > On Mon 28-12-15 21:08:56, Tetsuo Handa wrote: > > > Tetsuo Handa wrote: > > > > I got OOM killers while running heavy disk I/O (extracting kernel > > > > source, > > > > running lxr's gen

Re: [RFC][PATCH] sysrq: ensure manual invocation of the OOM killerunder OOM livelock

2016-01-06 Thread Tetsuo Handa
Michal Hocko wrote: > On Tue 05-01-16 17:22:46, Michal Hocko wrote: > > On Wed 30-12-15 15:33:47, Tetsuo Handa wrote: > [...] > > > I wish for a kernel thread that does OOM-kill operation. > > > Maybe we can change the OOM reaper kernel thread to do it. > >

[PATCH] mm,oom: Re-enable OOM killer using timers.

2016-01-07 Thread Tetsuo Handa
From 2f73abcec47535062d41c04bd7d9068cd71214b0 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Date: Thu, 7 Jan 2016 11:34:41 +0900 Subject: [PATCH] mm,oom: Re-enable OOM killer using timers. This patch introduces two timers ( holdoff timer and victim w

Re: [PATCH] mm,oom: Exclude TIF_MEMDIE processes from candidates.

2016-01-07 Thread Tetsuo Handa
Michal Hocko wrote: > I do not think the placement in find_lock_task_mm is desirable nor > correct. This function is used in multiple contexts outside of the oom > proper. It only returns a locked task_struct for a thread that belongs > to the process. OK. Andrew, please drop from -mm tree for

[PATCH] mm,oom: Always sleep before retrying.

2015-12-29 Thread Tetsuo Handa
From c0b5820c594343e06239f15afb35d23b4b8ac0d0 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Date: Wed, 30 Dec 2015 10:55:59 +0900 Subject: [PATCH] mm,oom: Always sleep before retrying. When we entered into "Reclaim has failed us, start killing

Re: [PATCH] mm,oom: Always sleep before retrying.

2015-12-31 Thread Tetsuo Handa
Tetsuo Handa wrote: > When we entered into "Reclaim has failed us, start killing things" > state, sleep function is called only when mutex_trylock(&oom_lock) > in __alloc_pages_may_oom() failed or immediately after returning from > oom_kill_process() in out_of_memory(). Th
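
The proposed guarantee, sketched against a simplified 4.3-era allocator slow path (oc is the oom_control describing the failing allocation):

if (!mutex_trylock(&oom_lock)) {
        /* someone else is handling the OOM: give them CPU time */
        schedule_timeout_uninterruptible(1);
        return NULL;
}
out_of_memory(&oc);
mutex_unlock(&oom_lock);
/* the change: always sleep before retrying the allocation */
schedule_timeout_uninterruptible(1);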

Re: [PATCH 1/2] mm, oom: introduce oom reaper

2015-12-19 Thread Tetsuo Handa
Tetsuo Handa wrote: > Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20151218.txt.xz . > -- > [ 438.304082] Killed process 12680 (oom_reaper-test) total-vm:4324kB, > anon-rss:120kB, file-rss:0kB, shmem-rss:0kB > [ 439.318951] oom_reaper: attempts=11 > [ 44

Re: [PATCH 0/3] OOM detection rework v4

2015-12-28 Thread Tetsuo Handa
Tetsuo Handa wrote: > I got OOM killers while running heavy disk I/O (extracting kernel source, > running lxr's genxref command). (Environ: 4 CPUs / 2048MB RAM / no swap / XFS) > Do you think these OOM killers reasonable? Too weak against fragmentation? Well, current patch invokes OO

Re: [PATCH 0/3] OOM detection rework v4

2015-12-28 Thread Tetsuo Handa
Tetsuo Handa wrote: > Tetsuo Handa wrote: > > I got OOM killers while running heavy disk I/O (extracting kernel source, > > running lxr's genxref command). (Environ: 4 CPUs / 2048MB RAM / no swap / > > XFS) > > Do you think these OOM killers reasonable? Too weak against

Re: [PATCH] kernel/hung_task.c: use timeout diff when timeout is updated

2015-12-21 Thread Tetsuo Handa
Andrew Morton wrote: > On Thu, 17 Dec 2015 21:23:03 +0900 Tetsuo Handa > <penguin-ker...@i-love.sakura.ne.jp> wrote: > > > >From 529ff00b556e110c6e801c39e94b06f559307136 Mon Sep 17 00:00:00 2001 > > From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> >

Re: [PATCH 1/2] mm, oom: introduce oom reaper

2015-12-24 Thread Tetsuo Handa
Michal Hocko wrote: > This is VM_BUG_ON_PAGE(page_mapped(page), page), right? Could you attach > the full kernel log? It all smells like a race when OOM reaper tears > down the mapping and there is a truncate still in progress. But hitting > the BUG_ON just because of that doesn't make much sense

Re: [PATCH 0/3] OOM detection rework v4

2015-12-24 Thread Tetsuo Handa
I got OOM killers while running heavy disk I/O (extracting kernel source, running lxr's genxref command). (Environ: 4 CPUs / 2048MB RAM / no swap / XFS) Do you think these OOM killers reasonable? Too weak against fragmentation? [ 3902.430630] kthreadd invoked oom-killer: order=2, oom_score_adj=0,

[RFC][PATCH] sysrq: ensure manual invocation of the OOM killer under OOM livelock

2015-12-29 Thread Tetsuo Handa
From 7fcac2054b33dc3df6c5915a58f232b9b80bb1e6 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Date: Wed, 30 Dec 2015 15:24:40 +0900 Subject: [RFC][PATCH] sysrq: ensure manual invocation of the OOM killer under OOM livelock This patch is similar to wh

[PATCH] mm,oom: Use hold off timer after invoking the OOM killer.

2015-12-28 Thread Tetsuo Handa
From 749b861430cca1cb5a1cd7df9bd79a475b2515eb Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Date: Tue, 29 Dec 2015 15:52:41 +0900 Subject: [PATCH] mm,oom: Use hold off timer after invoking the OOM killer. When many hundreds of tasks running o

Re: [PATCH 0/3] OOM detection rework v4

2015-12-30 Thread Tetsuo Handa
Michal Hocko wrote: > On Mon 28-12-15 21:08:56, Tetsuo Handa wrote: > > Tetsuo Handa wrote: > > > I got OOM killers while running heavy disk I/O (extracting kernel source, > > > running lxr's genxref command). (Environ: 4 CPUs / 2048MB RAM / no swap / > > > XFS)

[PATCH] kernel/hung_task.c: use timeout diff when timeout is updated

2015-12-17 Thread Tetsuo Handa
From 529ff00b556e110c6e801c39e94b06f559307136 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Date: Thu, 17 Dec 2015 16:27:08 +0900 Subject: [PATCH] kernel/hung_task.c: use timeout diff when timeout is updated When new timeout is written to /proc/s

[PATCH] mm,oom: Exclude TIF_MEMDIE processes from candidates.

2015-12-29 Thread Tetsuo Handa
From 8bb9e36891a803e82c589ef78077838026ce0f7d Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Date: Tue, 29 Dec 2015 22:20:58 +0900 Subject: [PATCH] mm,oom: Exclude TIF_MEMDIE processes from candidates. The OOM reaper kernel thread can reclaim OOM victim

Re: [RFC PATCH] mm, oom: introduce oom reaper

2015-11-27 Thread Tetsuo Handa
Michal Hocko wrote: > > > + for (vma = mm->mmap ; vma; vma = vma->vm_next) { > > > + if (is_vm_hugetlb_page(vma)) > > > + continue; > > > + > > > + /* > > > + * Only anonymous pages have a good chance to be dropped > > > + * without additional
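
A hedged reconstruction of where the quoted loop is heading — only private anonymous memory is safe to unmap behind the victim's back:

struct mmu_gather tlb;
struct vm_area_struct *vma;

tlb_gather_mmu(&tlb, mm, 0, -1);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
        if (is_vm_hugetlb_page(vma))
                continue;
        /* skip file-backed and shared mappings */
        if (vma->vm_file || (vma->vm_flags & VM_SHARED))
                continue;
        unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end, NULL);
}
tlb_finish_mmu(&tlb, 0, -1);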
