On Thu 07-05-20 10:00:07, Shakeel Butt wrote:
> On Thu, May 7, 2020 at 9:47 AM Michal Hocko wrote:
> >
> > On Thu 07-05-20 09:33:01, Shakeel Butt wrote:
> > [...]
> > > @@ -2600,8 +2596,23 @@ static int try_charge(struct mem_cgroup
schedule_work(&memcg->high_work);
> break;
> }
> } while ((memcg = parent_mem_cgroup(memcg)));
> --
> 2.26.2.526.g744177e7f7-goog
>
--
Michal Hocko
SUSE Labs
.org/linux-mm/20200504070304.127361-1-sandi...@linux.ibm.com/T/#u
>
> Signed-off-by: Vlastimil Babka
Acked-by: Michal Hocko
Thanks!
> ---
> Documentation/admin-guide/numastat.rst | 31 +++---
> 1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a
On Tue 05-05-20 08:35:45, Shakeel Butt wrote:
> On Tue, May 5, 2020 at 8:27 AM Johannes Weiner wrote:
> >
> > On Mon, May 04, 2020 at 12:23:51PM -0700, Shakeel Butt wrote:
> > > On Mon, May 4, 2020 at 9:06 AM Michal Hocko wrote:
> > > > I really hate to re
l information can we focus on
the remote charging side of the problem and deal with it in a sensible
way? That would make memory.high usable for your usecase and I still
believe that this is what you should be using in the first place.
--
Michal Hocko
SUSE Labs
On Mon 04-05-20 08:35:57, Shakeel Butt wrote:
> On Mon, May 4, 2020 at 8:00 AM Michal Hocko wrote:
> >
> > On Mon 04-05-20 07:53:01, Shakeel Butt wrote:
[...]
> > > I am trying to see if "no eligible task" is really an issue and should
> > > be warned f
On Mon 04-05-20 07:53:01, Shakeel Butt wrote:
> On Mon, May 4, 2020 at 7:11 AM Michal Hocko wrote:
> >
> > On Mon 04-05-20 06:54:40, Shakeel Butt wrote:
> > > On Sun, May 3, 2020 at 11:56 PM Michal Hocko wrote:
> > > >
> > > > On Thu 30-04-20
On Mon 04-05-20 06:54:40, Shakeel Butt wrote:
> On Sun, May 3, 2020 at 11:56 PM Michal Hocko wrote:
> >
> > On Thu 30-04-20 11:27:12, Shakeel Butt wrote:
> > > Lowering memory.max can trigger an oom-kill if the reclaim does not
> > > succeed. However if o
On Thu 30-04-20 12:48:20, Srikar Dronamraju wrote:
> * Michal Hocko [2020-04-29 14:22:11]:
>
> > On Wed 29-04-20 07:11:45, Srikar Dronamraju wrote:
> > > > >
> > > > > By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that No
On Mon 04-05-20 15:40:18, Yafang Shao wrote:
> On Mon, May 4, 2020 at 3:35 PM Michal Hocko wrote:
> >
> > On Mon 04-05-20 15:26:52, Yafang Shao wrote:
[...]
> > > As explained above, no eligible task is different from no task.
> > > If there are some candidates b
On Mon 04-05-20 15:26:52, Yafang Shao wrote:
> On Mon, May 4, 2020 at 3:03 PM Michal Hocko wrote:
> >
> > On Fri 01-05-20 09:39:24, Yafang Shao wrote:
> > > On Fri, May 1, 2020 at 2:27 AM Shakeel Butt wrote:
> > > >
> > > > Lowering memory
On Fri 01-05-20 07:59:57, Yafang Shao wrote:
> On Thu, Apr 30, 2020 at 10:57 PM Michal Hocko wrote:
> >
> > On Wed 29-04-20 12:56:27, Johannes Weiner wrote:
> > [...]
> > > I think to address this, we need a more comprehensive solution and
> > > introduce s
> + break;
> +
> memcg_memory_event(memcg, MEMCG_OOM);
> if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> break;
I am not a great fan to be honest. The warning might be useful for other
usecases when it is not clear that the memcg is empty.
--
Michal Hocko
SUSE Labs
ter the memcg is offlined and at the moment, high
> reclaim does not work for remote memcg and the usage can go till max
> or global pressure. This is most probably a misconfiguration and we
> might not receive the warnings in the log ever. Setting memory.max to
> 0 will definitely give such warnings.
Can we add a warning for the remote charging on dead memcgs?
--
Michal Hocko
SUSE Labs
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 463b3d74a64a..5ace39f6fe1e 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1098,7 +1098,7 @@ bool out_of_memory(struct oom_control *oc)
>
> select_bad_process(oc);
> /* Found nothing?!?! */
> - if (!oc->chosen) {
> + if (!oc->chosen && !oc->no_warn) {
> dump_header(oc, NULL);
> pr_warn("Out of memory and no killable processes...\n");
> /*
> --
> 2.26.2.526.g744177e7f7-goog
--
Michal Hocko
SUSE Labs
l that much because limit reclaim
> and global reclaim tend to occur in complementary
> containerization/isolation strategies, not heavily simultaneously.
I would expect that as well but this is always hard to tell.
--
Michal Hocko
SUSE Labs
On Wed 29-04-20 10:03:30, Johannes Weiner wrote:
> On Wed, Apr 29, 2020 at 12:15:10PM +0200, Michal Hocko wrote:
> > On Tue 28-04-20 19:26:47, Chris Down wrote:
> > > From: Yafang Shao
> > >
> > > A cgroup can have both memory protection and a memory limit t
by: Vaneet Narang
> Signed-off-by: Maninder Singh
You could have kept my ack from v1
Acked-by: Michal Hocko
Thanks!
> ---
> v1 -> v2: position of variable changed mistakenly, thus reverted.
> v2 -> v3: Don't change position of any variable, thus reverted.
> if required then
On Wed 29-04-20 01:23:15, Tetsuo Handa wrote:
> On 2020/04/29 0:45, Michal Hocko wrote:
> > On Tue 28-04-20 22:11:19, Tetsuo Handa wrote:
> >> Existing KERN_$LEVEL allows a user to determine whether he/she wants that
> >> message
> >> to be printed on consoles
On Wed 29-04-20 18:59:40, Vaneet Narang wrote:
> Hi Michal,
>
> >> >
> >> >Acked-by: Michal Hocko
> >> >
> >> >Is there any reason to move declarations here?
> >> >
> >>
> >> "unsigned int ret" was c
On Wed 29-04-20 18:23:23, Maninder Singh wrote:
>
> Hi,
>
> >
> >Acked-by: Michal Hocko
> >
> >Is there any reason to move declarations here?
> >
>
> "unsigned int ret" was changed mistakenly, sending V2.
> and "unsigned int nr_re
MA Multi node but with no CPUs and memory from node 0.
Have you tested on anything other than ppc? Each arch does the NUMA
setup separately and this is a big mess. E.g. x86 marks even memoryless
nodes (see init_memory_less_node) as online.
Honestly, I have a hard time evaluating the effect of this
by: Vaneet Narang
> Signed-off-by: Maninder Singh
Acked-by: Michal Hocko
Is there any reason to move declarations here?
> -unsigned long reclaim_clean_pages_from_list(struct zone *zone,
> +unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>
On Wed 29-04-20 19:45:07, Tetsuo Handa wrote:
> On 2020/04/29 18:04, Michal Hocko wrote:
> > Completely agreed! The in kernel OOM killer is to deal with situations
> > when memory is desperately depleted without any sign of a forward
> > progress. If there is a recla
e more robust against
races on top of that because this is likely a more tricky thing to do.
> Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
> Signed-off-by: Yafang Shao
> Signed-off-by: Chris Down
> Cc: Johannes Weiner
> Cc: Michal Hocko
simply checking and
> don't need to worry about that.
>
> Signed-off-by: Chris Down
> Suggested-by: Johannes Weiner
> Cc: Michal Hocko
> Cc: Roman Gushchin
> Cc: Yafang Shao
Acked-by: Michal Hocko
> ---
> include/linux/memcontrol.h | 48 +++
is desperately depleted without any sign of a forward
progress. If there is a reclaimable memory then we are not there yet.
If a workload can benefit from early oom killing based on response time
then we have facilities to achieve that (e.g. PSI).
--
Michal Hocko
SUSE Labs
On Wed 29-04-20 10:31:41, peter enderborg wrote:
> On 4/28/20 9:43 AM, Michal Hocko wrote:
> > On Mon 27-04-20 16:35:58, Andrew Morton wrote:
> > [...]
> >> No consumer of GFP_ATOMIC memory should consume an unbounded amount of
> >> it.
> >> Subsystems
On Tue 28-04-20 22:11:19, Tetsuo Handa wrote:
> On 2020/04/28 21:18, Michal Hocko wrote:
> > On Tue 28-04-20 20:33:21, Tetsuo Handa wrote:
> >> On 2020/04/27 15:21, Sergey Senozhatsky wrote:
> >>>> KERN_NO_CONSOLES is for type of messages where
hard-coded policy.
>
> But given that whether to use KERN_NO_CONSOLES is configurable via e.g.
> sysctl,
> KERN_NO_CONSOLES will become a user configurable parameter. What's still
> wrong?
How do I as a kernel developer know that KERN_NO_CONSOLES should be
used? In other words, how can I assume what a user will consider
important on the console?
--
Michal Hocko
SUSE Labs
g . form,
> wonder why it doesn't work, then read the doc and realize it's not
> supported?
Yes, I do agree. I have only recently learned that sysctl supports / as
well. Most people are simply used to the . notation. Copying the arch
and doing the . -> / substitution is a trivial operation, and I do not
think it is a real reason to introduce an unnecessarily harder-to-use
interface.
--
Michal Hocko
SUSE Labs
cing with the reclaim and betting on luck. The last problem was the
most annoying because it is really hard to tune for.
--
Michal Hocko
SUSE Labs
printf(m, "%6lu ", area->nr_free);
> + }
> + seq_putc(m, '\n');
This is essentially duplicating /proc/buddyinfo. Do we really need that?
--
Michal Hocko
SUSE Labs
seq_printf(m, "%s%6lu ", overflow ? ">" : "",
freecount);
+ spin_unlock_irq(&zone->lock);
+ cond_resched();
+ spin_lock_irq(&zone->lock);
}
seq_putc(m, '\n');
}
I do not have a strong opinion here but I can fold this into my patch 2.
--
Michal Hocko
SUSE Labs
On Wed 23-10-19 12:27:29, Mike Christie wrote:
> On 10/23/2019 02:11 AM, Michal Hocko wrote:
> > On Wed 23-10-19 07:43:44, Dave Chinner wrote:
> >> On Tue, Oct 22, 2019 at 06:33:10PM +0200, Michal Hocko wrote:
> >
> > Thanks for more clarifiat
With a brown paper bag bug fixed. I have also added a note about low
number of pages being more important as per Vlastimil's feedback
>From 0282f604144a5c06fdf3cf0bb2df532411e7f8c9 Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Wed, 23 Oct 2019 12:13:02 +0200
Subject: [PATCH] mm, vms
On Wed 23-10-19 10:56:30, Waiman Long wrote:
> On 10/23/19 6:27 AM, Michal Hocko wrote:
> > From: Michal Hocko
> >
> > pagetypeinfo_showfree_print is called with zone->lock held in irq mode.
> > This is not really nice because it blocks both interrupts on that
* will be artificially small.
>*/
> +#ifdef CONFIG_MEMORY_HOTPLUG
> for_each_populated_zone(zone)
> zone_pcp_update(zone);
> +#endif
>
> /*
> * We initialized the rest of the deferred pages. Permanently disable
> --
> 2.7.4
--
Michal Hocko
SUSE Labs
On Wed 23-10-19 15:48:36, Vlastimil Babka wrote:
> On 10/23/19 3:37 PM, Michal Hocko wrote:
> > On Wed 23-10-19 15:32:05, Vlastimil Babka wrote:
> >> On 10/23/19 12:27 PM, Michal Hocko wrote:
> >>> From: Michal Hocko
> >>>
> >>> pagetypeinfo_s
up vmpressure notifications
>
> Signed-off-by: Johannes Weiner
Acked-by: Michal Hocko
> ---
> mm/vmscan.c | 28 ++--
> 1 file changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index db073b40c432..65baa89740
to access node or cgroup properties can look
> them them up if necessary, but there are only a few cases.
>
> Signed-off-by: Johannes Weiner
Acked-by: Michal Hocko
> ---
> mm/vmscan.c | 21 ++---
> 1 file changed, 10 insertions(+), 11 deletions(-)
>
>
y memcg will stall in page writeback so avoid forcibly
> + * stalling in wait_iff_congested().
> + */
> + if (cgroup_reclaim(sc) && writeback_throttling_sane(sc) &&
> + sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
> + set_memcg_congestion(pgdat, root, true);
> +
> + /*
> + * Stall direct reclaim for IO completions if underlying BDIs
> + * and node is congested. Allow kswapd to continue until it
> + * starts encountering unqueued dirty pages or cycling through
> + * the LRU too quickly.
> + */
> + if (!sc->hibernation_mode && !current_is_kswapd() &&
> + current_may_throttle() && pgdat_memcg_congested(pgdat, root))
> + wait_iff_congested(BLK_RW_ASYNC, HZ/10);
>
> - } while (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
> - sc));
> + if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
> + sc))
> + goto again;
>
> /*
>* Kswapd gives up on balancing particular nodes after too
> --
> 2.23.0
--
Michal Hocko
SUSE Labs
f you insist on having sane in the name then I won't object but it just
raises a question whether we have some levels of throttling with a
different level of sanity.
> Signed-off-by: Johannes Weiner
Acked-by: Michal Hocko
> ---
> mm/vmscan.c | 38 ++
>
ck. Add it there.
>
> Then delete the swap check from inactive_list_is_low().
>
> Signed-off-by: Johannes Weiner
OK, makes sense to me.
Acked-by: Michal Hocko
> ---
> mm/vmscan.c | 9 +
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/
le in this area, swap the mem_cgroup_lruvec() argument order. The
> name suggests a memcg operation, yet it takes a pgdat first and a
> memcg second. I have to double take every time I call this. Fix that.
>
> Signed-off-by: Johannes Weiner
I do agree that node_lruvec() adds confusion and i
.
The original intention was to optimize this for GFP_KERNEL like
allocations by reducing the number of zones to reduce. But considering
this is not called from hot paths I do agree that a simpler code is more
preferable.
> Signed-off-by: Johannes Weiner
Acked-by: Michal Hocko
> --
On Wed 23-10-19 15:32:05, Vlastimil Babka wrote:
> On 10/23/19 12:27 PM, Michal Hocko wrote:
> > From: Michal Hocko
> >
> > pagetypeinfo_showfree_print is called with zone->lock held in irq mode.
> > This is not really nice because it blocks both interrupts
On Wed 23-10-19 19:53:50, Hillf Danton wrote:
>
> On Wed, 23 Oct 2019 10:17:29 +0200 Michal Hocko wrote:
[...]
> > This doesn't really answer my question.
> > Why cannot you use memcgs as they are now.
>
> No prio provided.
>
> > Why exactly do you need a fix
On Wed 23-10-19 10:56:08, Mel Gorman wrote:
> On Wed, Oct 23, 2019 at 11:04:22AM +0200, Michal Hocko wrote:
> > So can we go with this to address the security aspect of this and have
> > something trivial to backport.
> >
>
> Yes.
Ok, pat
From: Michal Hocko
pagetypeinfo_showfree_print is called with zone->lock held in irq mode.
This is not really nice because it blocks both interrupts on that
cpu and the page allocator. On large machines this might even trigger
the hard lockup detector.
Considering the pagetypei
From: Michal Hocko
/proc/pagetypeinfo is a debugging tool to examine internal page
allocator state wrt to fragmentation. It is not very useful for
any other use so normal users really do not need to read this file.
Waiman Long has noticed that reading this file can have negative side
effects
(PageOffline() + refcount == 0)?
Simply skip over PageOffline pages. Reference count should never be != 0
at this stage.
> In summary, is what you suggest simply delaying setting the reference count
> to 0
> in MEM_GOING_OFFLINE instead of right away when the driver unpluggs the pages?
Yes
> What's the big benefit you see and I fail to see?
Apart from avoiding hooks into __put_page, it also gives explicit control
over the page via reference counting. Do you see any downsides?
--
Michal Hocko
SUSE Labs
On Wed 23-10-19 09:31:43, Mel Gorman wrote:
> On Tue, Oct 22, 2019 at 06:57:45PM +0200, Michal Hocko wrote:
> > [Cc Mel]
> >
> > On Tue 22-10-19 12:21:56, Waiman Long wrote:
> > > The pagetypeinfo_showfree_print() function prints out the number of
> > >
On Tue 22-10-19 22:28:02, Hillf Danton wrote:
>
> On Tue, 22 Oct 2019 14:42:41 +0200 Michal Hocko wrote:
> >
> > On Tue 22-10-19 20:14:39, Hillf Danton wrote:
> > >
> > > On Mon, 21 Oct 2019 14:27:28 +0200 Michal Hocko wrote:
> > [...]
> > >
On Wed 23-10-19 12:44:48, Hillf Danton wrote:
>
> On Tue, 22 Oct 2019 15:58:32 +0200 Michal Hocko wrote:
> >
> > On Tue 22-10-19 21:30:50, Hillf Danton wrote:
[...]
> > > in this RFC after ripping pages off
> > > the first victim, the work finishes w
+ reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
> + GFP_KERNEL, true);
> +
> + if (!reclaimed && !nr_retries--)
> + break;
> + }
>
> - memcg_wb_domain_size_changed(memcg);
> return nbytes;
> }
>
> --
> 2.23.0
--
Michal Hocko
SUSE Labs
Weiner
Acked-by: Michal Hocko
> ---
> mm/memcontrol.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 055975b0b3a3..ff90d4e7df37 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6122,10 +6122,8
burden of reclaim on regular allocation requests
> + * and let these go through as privileged allocations.
> + */
> + if (gfp_mask & __GFP_ATOMIC)
> + goto force;
> +
> /*
>* Unlike in global OOM situations, memcg is not in a physical
>* memory shortage. Allow dying and OOM-killed tasks to
> --
> 2.23.0
>
--
Michal Hocko
SUSE Labs
from Mel and Vlastimil
how would they feel about making free_list fully migrate type aware
(including nr_free).
> Why are we actually holding zone->lock so much? Can we get away with
> holding it across the list_for_each() loop and nothing else? If so,
> this still isn't a bulletproof fix. Maybe just terminate the list
> walk if freecount reaches 1024. Would anyone really care?
>
> Sigh. I wonder if anyone really uses this thing for anything
> important. Can we just remove it all?
Vlastimil would know much better but I have seen this being used for
fragmentation related debugging. That should imply that 0400 should be
sufficient and a quick and easily backportable fix for the most pressing
immediate problem.
--
Michal Hocko
SUSE Labs
On Tue 22-10-19 15:06:18, Hao Lee wrote:
> These comments should be updated as memcg limit enforcement has been moved
> from zones to nodes.
>
> Signed-off-by: Hao Lee
Acked-by: Michal Hocko
> ---
> include/linux/memcontrol.h | 5 ++---
> 1 file changed, 2 inser
< MIGRATE_TYPES; mtype++) {
> + seq_printf(m, "Node %4d, zone %8s, type %12s ",
> + pgdat->node_id,
> + zone->name,
> + migratetype_names[mtype]);
> + for (order = 0; order < MAX_ORDER; ++order)
> + seq_printf(m, "%6lu ", nfree[order][mtype]);
> seq_putc(m, '\n');
> }
> }
> --
> 2.18.1
--
Michal Hocko
SUSE Labs
On Tue 22-10-19 21:30:50, Hillf Danton wrote:
>
> On Mon, 21 Oct 2019 14:14:53 +0200 Michal Hocko wrote:
> >
> > On Mon 21-10-19 19:56:54, Hillf Danton wrote:
> > >
> > > Currently soft limit reclaim is frozen, see
> > > Documentation/admin-guide/cg
On Fri 18-10-19 07:54:20, Dave Hansen wrote:
> On 10/18/19 12:44 AM, Michal Hocko wrote:
> > How does this compare to
> > http://lkml.kernel.org/r/1560468577-101178-1-git-send-email-yang@linux.alibaba.com
>
> It's a _bit_ more tied to persistent memory and it appears a b
> things generally simpler.
What is the performance impact? Also what is the effect on the memory
reclaim side and the isolation. I would expect that mixing objects from
different cgroups would have a negative/unpredictable impact on the
memcg slab shrinking.
--
Michal Hocko
SUSE Labs
On Tue 22-10-19 15:22:06, Michal Hocko wrote:
> On Thu 17-10-19 17:28:04, Roman Gushchin wrote:
> [...]
> > Using a drgn* script I've got an estimation of slab utilization on
> > a number of machines running different production workloads. In most
> > cases it was between 4
ific caches that tend to utilize much worse than others?
--
Michal Hocko
SUSE Labs
On Tue 22-10-19 20:14:39, Hillf Danton wrote:
>
> On Mon, 21 Oct 2019 14:27:28 +0200 Michal Hocko wrote:
[...]
> > Why do we care and which workloads would benefit and how much.
>
> Page preemption, disabled by default, should be turned on by those
> who wish
On Fri 18-10-19 14:35:06, David Hildenbrand wrote:
> On 18.10.19 13:20, Michal Hocko wrote:
> > On Fri 18-10-19 10:50:24, David Hildenbrand wrote:
> > > On 18.10.19 10:15, Michal Hocko wrote:
[...]
> > > > for that - MEM_GOING_OFFLINE notification.
On Tue 22-10-19 11:58:52, Oscar Salvador wrote:
> On Tue, Oct 22, 2019 at 11:22:56AM +0200, Michal Hocko wrote:
> > Hmm, that might be a misunderstanding on my end. I thought that it is
> > the MCE handler to say whether the failure is recoverable or not. If yes
> > then we
On Tue 22-10-19 11:17:24, David Hildenbrand wrote:
> On 22.10.19 11:14, Michal Hocko wrote:
> > On Tue 22-10-19 10:32:11, David Hildenbrand wrote:
> > [...]
> > > E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn()
> >
> > Thanks for these references. I am not real
On Tue 22-10-19 10:35:17, Oscar Salvador wrote:
> On Tue, Oct 22, 2019 at 10:26:11AM +0200, Michal Hocko wrote:
> > On Tue 22-10-19 09:46:20, Oscar Salvador wrote:
> > [...]
> > > So, opposite to hard-offline, in soft-offline we do not fiddle with pages
> >
we do care
about holes in RAM (from the early boot), those should be reserved
already AFAIR. So we are left with hotplugged memory with holes and
I am not really sure we should bother with this until there is a clear
usecase in sight.
--
Michal Hocko
SUSE Labs
On Tue 22-10-19 10:23:37, David Hildenbrand wrote:
> On 22.10.19 10:20, Michal Hocko wrote:
> > On Mon 21-10-19 17:54:35, David Hildenbrand wrote:
> > > On 21.10.19 17:47, Michal Hocko wrote:
> > > > On Mon 21-10-19 17:39:36, David Hildenbrand wrote:
> > > &g
On Tue 22-10-19 09:56:27, Oscar Salvador wrote:
> On Mon, Oct 21, 2019 at 04:06:19PM +0200, Michal Hocko wrote:
> > On Mon 21-10-19 15:48:48, Oscar Salvador wrote:
> > > We can only perform actions on LRU/Movable pages or hugetlb pages.
> >
> > What would preve
enttly from MCE (hard-offline)?
--
Michal Hocko
SUSE Labs
On Tue 22-10-19 10:15:07, David Hildenbrand wrote:
> On 22.10.19 10:08, Michal Hocko wrote:
> > On Tue 22-10-19 08:52:28, David Hildenbrand wrote:
> > > On 21.10.19 19:23, David Hildenbrand wrote:
> > > > Two cleanups that popped up while working on (and d
On Mon 21-10-19 17:54:35, David Hildenbrand wrote:
> On 21.10.19 17:47, Michal Hocko wrote:
> > On Mon 21-10-19 17:39:36, David Hildenbrand wrote:
> > > On 21.10.19 16:43, Michal Hocko wrote:
> > [...]
> > > > We still set PageReserved before onlining pages
to_online_page() check already). But of course, there might be special
> cases
I remember Alexander didn't want to change the PageReserved handling
because he was worried about unforeseeable side effects. I have a vague
recollection that he (or maybe Dan) promised some follow-up cleanups
which didn't seem to materialize.
--
Michal Hocko
SUSE Labs
On Mon 21-10-19 17:39:36, David Hildenbrand wrote:
> On 21.10.19 16:43, Michal Hocko wrote:
[...]
> > We still set PageReserved before onlining pages and that one should be
> > good to go as well (memmap_init_zone).
> > Thanks!
>
> memmap_init_zone() is called when onli
On Mon 21-10-19 14:58:49, Oscar Salvador wrote:
> On Fri, Oct 18, 2019 at 02:06:15PM +0200, Michal Hocko wrote:
> > On Thu 17-10-19 16:21:17, Oscar Salvador wrote:
> > [...]
> > > +bool take_page_off_buddy(struct page *page)
> > > + {
> > > + struct zone
e such
> memory.
>
> Let's generalize the approach so we can special case other types of
> pages we want to skip over in case we offline memory. While at it, also
> pass the same flags to test_pages_isolated().
>
> Cc: Michal Hocko
> Cc: Oscar Salvador
> Cc: Andrew Morton
ges were set
> PageReserved so re-onlining would work as expected).
>
> Cc: Andrew Morton
> Cc: Michal Hocko
> Cc: Vlastimil Babka
> Cc: Oscar Salvador
> Cc: Mel Gorman
> Cc: Mike Rapoport
> Cc: Dan Williams
> Cc: Wei Yang
> Cc: Alexander Duyck
> Cc: Anshuman Kha
t_sleep+0x334/0x370
> [ 15.590588][ T658] [c0003d8cfbb0] [c094a784]
> __mutex_lock+0x84/0xb20
> [ 15.590643][ T658] [c0003d8cfcc0] [c0954038]
> zone_pcp_update+0x34/0x64
> [ 15.590689][ T658] [c0003d8cfcf0] [c0b9e6bc]
> deferred_init_memmap+0x1b8/0x26c
> [ 15.590739][ T658] [c0003d8cfdb0] [c0149528]
> kthread+0x1a8/0x1b0
> [ 15.590790][ T658] [c0003d8cfe20] [c000b748]
> ret_from_kernel_thread+0x5c/0x74
--
Michal Hocko
SUSE Labs
On Mon 21-10-19 15:48:48, Oscar Salvador wrote:
> On Fri, Oct 18, 2019 at 02:39:01PM +0200, Michal Hocko wrote:
> >
> > I am sorry but I got lost in the above description and I cannot really
> > make much sense from the code either. Let me try to outline the way
re compared when deactivating lru
> pages, and skip page if it is higher on prio.
>
> V1 is based on next-20191018.
>
> Changes since v0
> - s/page->nice/page->prio/
> - drop the role of kswapd's reclaiming priority in prio comparison
> - add pgdat->kswapd_prio
&g
On Mon 21-10-19 07:02:55, Naoya Horiguchi wrote:
> On Fri, Oct 18, 2019 at 01:52:27PM +0200, Michal Hocko wrote:
> > On Thu 17-10-19 16:21:09, Oscar Salvador wrote:
> > > From: Naoya Horiguchi
> > >
> > > The call to get_user_pages_fast is only to ge
On Mon 21-10-19 07:00:46, Naoya Horiguchi wrote:
> On Fri, Oct 18, 2019 at 01:48:32PM +0200, Michal Hocko wrote:
> > On Thu 17-10-19 16:21:08, Oscar Salvador wrote:
> > > From: Naoya Horiguchi
> > >
> > > Drop the PageHuge check since memory_failure fork
it/Kconfig
> - drop changes in mm/vmscan.c
> - make memcg lru work in parallel to slr
>
> Cc: Chris Down
> Cc: Tejun Heo
> Cc: Roman Gushchin
> Cc: Michal Hocko
> Cc: Johannes Weiner
> Cc: Shakeel Butt
> Cc: Matthew Wilcox
> Cc: Minchan Kim
> Cc: Mel Go
batch: 1
> 768 batch: 63
> 256 high: 0
> 768 high: 378
>
> Cc: sta...@vger.kernel.org # v4.1+
> Signed-off-by: Mel Gorman
Acked-by: Michal Hocko
> ---
> mm/page_alloc.c | 8
> 1 file changed,
On Mon 21-10-19 10:28:16, David Hildenbrand wrote:
> On 21.10.19 10:26, Michal Hocko wrote:
> > Has this been properly reviewed? I do not see any Acks nor Reviewed-bys.
> >
>
> As I modified this patch while carrying it along, it at least has my
> implicit Ack/RB.
OK,
_start_pfn)
> + node_start_pfn = zone->zone_start_pfn;
> }
>
> - /* The pgdat has no valid section */
> - pgdat->node_start_pfn = 0;
> - pgdat->node_spanned_pages = 0;
> + pgdat->node_start_pfn = node_start_pfn;
> + pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
> }
>
> static void __remove_zone(struct zone *zone, unsigned long start_pfn,
> @@ -507,7 +465,7 @@ static void __remove_zone(struct zone *z
>
> pgdat_resize_lock(zone->zone_pgdat, &flags);
> shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
> - shrink_pgdat_span(pgdat, start_pfn, start_pfn + nr_pages);
> + update_pgdat_span(pgdat);
> pgdat_resize_unlock(zone->zone_pgdat, &flags);
> }
>
> _
--
Michal Hocko
SUSE Labs
ogan Gunthorpe
> Cc: Ira Weiny
> Cc: Damian Tometzki
> Cc: Alexander Duyck
> Cc: Alexander Potapenko
> Cc: Andy Lutomirski
> Cc: Anshuman Khandual
> Cc: Benjamin Herrenschmidt
> Cc: Borislav Petkov
> Cc: Catalin Marinas
> Cc: Christian Borntraeger
> Cc: Ch
ensed form
On Tue 01-10-19 10:37:43, Michal Hocko wrote:
> I have split out my kvm machine into two nodes to get at least some
> idea how these patches behave
> $ numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 475 MB
> node 0 free: 432 MB
> node 1 cpus
On Fri 18-10-19 11:56:06, Mel Gorman wrote:
> Memory hotplug needs to be able to reset and reinit the pcpu allocator
> batch and high limits but this action is internal to the VM. Move
> the declaration to internal.h
>
> Signed-off-by: Mel Gorman
Acked-by: Michal Hocko
> --
sta...@vger.kernel.org # v4.15+
Hmm, are you sure about 4.15? Doesn't this go all the way down to
deferred initialization? I do not see any recent changes on when
setup_per_cpu_pageset is called.
> Signed-off-by: Mel Gorman
Acked-by: Michal Hocko
> ---
> mm/page_alloc.c |
>
> Signed-off-by: Mel Gorman
Acked-by: Michal Hocko
> ---
> mm/page_alloc.c | 23 ---
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c0b2e0306720..cafe568d36f6 100644
> --- a/mm/page_al
This
> - * test is performed under the zone lock to prevent a race against page
> - * allocation.
> - */
> -bool set_hwpoison_free_buddy_page(struct page *page)
> -{
> - struct zone *zone = page_zone(page);
> - unsigned long pfn = page_to_pfn(page);
> - unsigned long flags;
> - unsigned int order;
> - bool hwpoisoned = false;
> -
> - spin_lock_irqsave(&zone->lock, flags);
> - for (order = 0; order < MAX_ORDER; order++) {
> - struct page *page_head = page - (pfn & ((1 << order) - 1));
> -
> - if (PageBuddy(page_head) && page_order(page_head) >= order) {
> - if (!TestSetPageHWPoison(page))
> - hwpoisoned = true;
> - break;
> - }
> - }
> - spin_unlock_irqrestore(&zone->lock, flags);
> -
> - return hwpoisoned;
> -}
> #endif
> --
> 2.12.3
--
Michal Hocko
SUSE Labs
spin_unlock_irqrestore(&zone->lock, flags);
> + return ret;
> + }
> +
> +/*
> * Set PG_hwpoison flag if a given page is confirmed to be a free page. This
> * test is performed under the zone lock to prevent a race against page
> * allocation.
> --
> 2.12.3
--
Michal Hocko
SUSE Labs
reference taken by get_user_pages_fast(). In
> - * the absence of MF_COUNT_INCREASED the memory_failure()
> - * routine is responsible for pinning the page to prevent it
> - * from being released back to the page allocator.
> - */
> - put_page(page);
> ret = memory_failure(pfn, 0);
> if (ret)
> return ret;
> --
> 2.12.3
>
--
Michal Hocko
SUSE Labs
> + page_flags = p->flags;
>
> /*
>* unpoison always clear PG_hwpoison inside page lock
> --
> 2.12.3
--
Michal Hocko
SUSE Labs