hange or
> memory online/offline) are actually unchanged from the previous ones.
>
> Signed-off-by: Vlastimil Babka
Acked-by: Michal Hocko
I would find the check flipped with an early return more pleasing to my
eyes, but nothing to lose sleep over.
> ---
> include/linux/
t pagesets then subsequently update them to the proper
> values.
>
> No functional change.
>
> Signed-off-by: Vlastimil Babka
> Reviewed-by: David Hildenbrand
Acked-by: Michal Hocko
Btw. where do we initialize pcp->count? I thought that pcp allocator
zeroes out the
On Wed 07-10-20 14:21:44, Peter Zijlstra wrote:
> On Wed, Oct 07, 2020 at 02:04:01PM +0200, Michal Hocko wrote:
> > I wanted to make sure that the idea is sound for maintainers first. The
> > next step would be extending the command line to support full preemption
> > as w
On Wed 07-10-20 14:19:39, Peter Zijlstra wrote:
> On Wed, Oct 07, 2020 at 02:04:01PM +0200, Michal Hocko wrote:
> > From: Michal Hocko
> >
> > Many people are still relying on pre built distribution kernels and so
> > distributions have to provide multiple kernel
From: Michal Hocko
Many people are still relying on pre built distribution kernels and so
distributions have to provide multiple kernel flavors to offer different
preemption models. Most of them are providing PREEMPT_NONE for typical
server deployments and PREEMPT_VOLUNTARY for desktop users
On Wed 07-10-20 00:25:29, Uladzislau Rezki wrote:
> On Mon, Oct 05, 2020 at 05:41:00PM +0200, Michal Hocko wrote:
> > On Mon 05-10-20 17:08:01, Uladzislau Rezki wrote:
> > > On Fri, Oct 02, 2020 at 11:05:07AM +0200, Michal Hocko wrote:
> > > > On Fri 02-10
Yang Shi
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
--
Michal Hocko
SUSE Labs
eviewed-by: Oscar Salvador
> Acked-by: Pankaj Gupta
> Reviewed-by: Wei Yang
> Cc: Andrew Morton
> Cc: Alexander Duyck
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: Dave Hansen
> Cc: Vlastimil Babka
> Cc: Wei Yang
> Cc: Oscar Salvador
> Cc: Mike Rapopo
e
functionality users are getting.
> I just do not get why hugetlbfs is so special that it can have pagesize
> fine control when normal pages cannot get. The “it should be invisible
> to userspace” argument suddenly does not hold for hugetlbfs.
In short it provides a guarantee.
Does the above clarify it a bit?
[1] this is not entirely true though because there is a non-trivial
admin interface around THP. Mostly because they turned out to be too
transparent and many people do care about internal fragmentation,
allocation latency, locality (small page on a local node or a large on a
slightly further one?) or simply follow a cargo cult - just have a look
how many admin guides recommend disabling THPs. We got seriously burned
by 2MB THP because of the way they were enforced on users.
--
Michal Hocko
SUSE Labs
On Tue 06-10-20 10:40:23, David Hildenbrand wrote:
> On 06.10.20 10:34, Michal Hocko wrote:
> > On Tue 22-09-20 16:37:12, Vlastimil Babka wrote:
> >> Page isolation can race with process freeing pages to pcplists in a way
> >> that
> >> a page from isolated p
line_pages().
>
> [1]
> https://lore.kernel.org/linux-mm/20200903140032.380431-1-pasha.tatas...@soleen.com/
>
> Suggested-by: David Hildenbrand
> Suggested-by: Michal Hocko
> Signed-off-by: Vlastimil Babka
> ---
> include/linux/mmzone.h | 2 ++
> include/linux
On Tue 06-10-20 08:26:35, Anshuman Khandual wrote:
>
>
> On 10/05/2020 11:35 AM, Michal Hocko wrote:
> > On Mon 05-10-20 07:59:12, Anshuman Khandual wrote:
> >>
> >>
> >> On 10/02/2020 05:34 PM, Michal Hocko wrote:
> >>> On Wed 30-09-20 11:30
On Mon 05-10-20 16:22:46, Vlastimil Babka wrote:
> On 10/5/20 4:05 PM, Michal Hocko wrote:
> > On Fri 25-09-20 13:10:05, Vlastimil Babka wrote:
> >> On 9/25/20 12:54 PM, David Hildenbrand wrote:
> >>
> >> Hmm that temporary write lock would still block new cal
On Mon 05-10-20 17:08:01, Uladzislau Rezki wrote:
> On Fri, Oct 02, 2020 at 11:05:07AM +0200, Michal Hocko wrote:
> > On Fri 02-10-20 09:50:14, Mel Gorman wrote:
> > > On Fri, Oct 02, 2020 at 09:11:23AM +0200, Michal Hocko wrote:
> > > > On Thu 01-10-20 21
tomic_inc(...);
> else if (atomic_inc_return == 1)
> // atomic_cmpxchg from 0 to 1; if that fails, goto retry
>
> Tricky, but races could only lead to unnecessary duplicated updates + flushing
> but nothing worse?
>
> Or add another spinlock to cover this part instead of the temp write lock...
Do you plan to post a new version or should I review this one?
--
Michal Hocko
SUSE Labs
HIBERNATION
> >
> > /*
> >
>
> Interesting race. Instead of this ugly __drain_all_pages() with a
> boolean parameter, can we have two properly named functions to be used
> in !page_alloc.c code without scratching your head what the difference is?
I tend to ag
the
> current imperfect draining to the callers also as a preparation step.
>
> Suggested-by: Pavel Tatashin
> Signed-off-by: Vlastimil Babka
Acked-by: Michal Hocko
> ---
> mm/memory_hotplug.c | 11 ++-
> mm/page_alloc.c | 2 ++
> mm/page_isolation.c |
(static) per cpu variable into the per cpu area.
>*/
> zone->pageset = &boot_pageset;
> + zone->pageset_high = BOOT_PAGESET_HIGH;
> + zone->pageset_batch = BOOT_PAGESET_BATCH;
>
> if (populated_zone(zone))
> printk(KERN_DEBUG " %s zone: %lu pages, LIFO batch:%u\n",
> --
> 2.28.0
--
Michal Hocko
SUSE Labs
rcpu(struct per_cpu_pageset);
> + new_pageset = alloc_percpu(struct per_cpu_pageset);
> for_each_possible_cpu(cpu) {
> - p = per_cpu_ptr(zone->pageset, cpu);
> + p = per_cpu_ptr(new_pageset, cpu);
> pageset_init(p);
> }
>
> + smp_store_release(&zone->pageset, new_pageset);
> zone_set_pageset_high_and_batch(zone);
> }
>
> --
> 2.28.0
--
Michal Hocko
SUSE Labs
bka
Yes, this should be safe AFAICS. I believe the original intention was
well minded but didn't go all the way to do the thing properly.
I have to admit I have stumbled over this weirdness a few times and never
found enough motivation to think it through.
Acked-by: Michal Hocko
> --
to all per-cpu pagesets of the zone.
>
> This also allows removing the zone_pageset_init() and __zone_pcp_update()
> wrappers.
>
> No functional change.
>
> Signed-off-by: Vlastimil Babka
> Reviewed-by: Oscar Salvador
> Reviewed-by: David Hildenbrand
I like this.
called from the memory hotplug as well. Isn't this more about
early zone initialization rather than boot pagesets? Or am I misreading
the patch?
> + */
> + pcp->high = 0;
> + pcp->batch = 1;
> }
>
> /*
> --
> 2.28.0
--
Michal Hocko
SUSE Labs
ers instead.
>
> No functional change.
>
> Signed-off-by: Vlastimil Babka
> Reviewed-by: Oscar Salvador
yes this looks better, the original code was really hard to follow.
Acked-by: Michal Hocko
> ---
> mm/page_alloc.c | 49 -
>
On Mon 05-10-20 11:13:48, David Hildenbrand wrote:
> On 05.10.20 08:12, Michal Hocko wrote:
> > On Sat 03-10-20 00:44:09, Topi Miettinen wrote:
> >> On 2.10.2020 20.52, David Hildenbrand wrote:
> >>> On 02.10.20 19:19, Topi Miettinen wrote:
> >>>>
A similar thing has been proposed recently by Shakeel
http://lkml.kernel.org/r/20200909215752.1725525-1-shake...@google.com
Please have a look at the follow up discussion.
--
Michal Hocko
SUSE Labs
gt; compatibility with legacy software is more important than any hardening.
I believe we already have means to filter syscalls from userspace for
security-hardened environments. Or is there any reason to duplicate
that and control it at configuration time?
--
Michal Hocko
SUSE Labs
On Fri 02-10-20 21:53:37, pi...@codeaurora.org wrote:
> On 2020-10-02 17:47, Michal Hocko wrote:
>
> > > __vm_enough_memory: commitment overflow: ppid:150, pid:164,
> > > pages:62451
> > > fork failed[count:0]: Cannot allocate memory
> >
> > While I u
On Fri 02-10-20 17:20:09, David Hildenbrand wrote:
> On 02.10.20 15:24, Michal Hocko wrote:
> > On Mon 28-09-20 20:21:08, David Hildenbrand wrote:
> >> Page isolation doesn't actually touch the pages, it simply isolates
> >> pageblocks and moves all free pages
for users to configure? How
do I know that something won't break? brk() is one of those syscalls
that has been here for ever and a lot of userspace might depend on it.
I haven't checked but the code size is very unlikely to be shrunk much
as this is mostly a tiny wrapper around mmap code. We are not going to
get rid of any complexity.
So what is the point?
--
Michal Hocko
SUSE Labs
On Mon 05-10-20 07:59:12, Anshuman Khandual wrote:
>
>
> On 10/02/2020 05:34 PM, Michal Hocko wrote:
> > On Wed 30-09-20 11:30:49, Anshuman Khandual wrote:
> >> Add following new vmstat events which will track HugeTLB page migration.
> >>
> &g
On Fri 02-10-20 09:53:05, Rik van Riel wrote:
> On Fri, 2020-10-02 at 09:03 +0200, Michal Hocko wrote:
> > On Thu 01-10-20 18:18:10, Sebastiaan Meijer wrote:
> > > (Apologies for messing up the mailing list thread, Gmail had fooled
> > > me into
> > > beli
c: Alexander Duyck
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: Dave Hansen
> Cc: Vlastimil Babka
> Cc: Wei Yang
> Cc: Oscar Salvador
> Cc: Mike Rapoport
> Signed-off-by: David Hildenbrand
Acked-by: Michal Hocko
> ---
> mm/memory_hotplug.c | 11 ---
>
briefly. I do not expect this to make a huge difference but who knows.
It makes some sense to add pages in the order they appear in the
physical address space.
> Reviewed-by: Vlastimil Babka
> Reviewed-by: Oscar Salvador
> Cc: Andrew Morton
> Cc: Alexander Duyck
> Cc: Mel Gorman
by: Oscar Salvador
> Cc: Andrew Morton
> Cc: Alexander Duyck
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: Dave Hansen
> Cc: Vlastimil Babka
> Cc: Wei Yang
> Cc: Oscar Salvador
> Cc: Mike Rapoport
> Cc: Scott Cheloha
> Cc: Michael Ellerman
> Sign
whole
> zone when undoing isolation of larger ranges, and after
> free_contig_range().
>
> Reviewed-by: Alexander Duyck
> Reviewed-by: Oscar Salvador
> Cc: Andrew Morton
> Cc: Alexander Duyck
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: Dave Hansen
> Cc: Vlas
ld
> be good enough for internal purposes.
>
> Reviewed-by: Alexander Duyck
> Reviewed-by: Vlastimil Babka
> Reviewed-by: Oscar Salvador
> Cc: Andrew Morton
> Cc: Alexander Duyck
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: Dave Hansen
> Cc: Vlastimil Bab
es);
>
> + pr_err_once("%s: commitment overflow: ppid:%d, pid:%d, pages:%ld\n",
> + __func__, current->parent->pid, current->pid, pages);
> +
> return -ENOMEM;
> }
>
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
> is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
--
Michal Hocko
SUSE Labs
_split);
> + count_vm_events(HUGETLB_MIGRATION_SUCCESS, nr_hugetlb_succeeded);
> + count_vm_events(HUGETLB_MIGRATION_FAIL, nr_hugetlb_failed);
> trace_mm_migrate_pages(nr_succeeded, nr_failed, nr_thp_succeeded,
> -nr_thp_failed, nr_thp_split, mode, reason);
> +nr_thp_failed, nr_thp_split,
> nr_hugetlb_succeeded,
> +nr_hugetlb_failed, mode, reason);
>
> if (!swapwrite)
> current->flags &= ~PF_SWAPWRITE;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 79e5cd0abd0e..12fd35ba135f 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1286,6 +1286,8 @@ const char * const vmstat_text[] = {
> "thp_migration_success",
> "thp_migration_fail",
> "thp_migration_split",
> + "hugetlb_migration_success",
> + "hugetlb_migration_fail",
> #endif
> #ifdef CONFIG_COMPACTION
> "compact_migrate_scanned",
> --
> 2.20.1
>
--
Michal Hocko
SUSE Labs
racking it may help with the
> next message.
Auto-tuning combined with a user-provided override is quite tricky to get
right, especially in a case like this one. The admin has provided an
override, but has potential memory hotplug been considered? Or to make it even more
complicated, consider that the hotplug happens without admin involvement
- e.g. memory gets hotremoved due to HW problems. Is the admin provided
value still meaningful? To be honest I do not have a good answer and I
am not sure we should care all that much until we see practical
problems.
--
Michal Hocko
SUSE Labs
On Fri 02-10-20 09:50:14, Mel Gorman wrote:
> On Fri, Oct 02, 2020 at 09:11:23AM +0200, Michal Hocko wrote:
> > On Thu 01-10-20 21:26:26, Uladzislau Rezki wrote:
> > > >
> > > > No, I meant going back to idea of new gfp flag, but adjust the
> > >
y do not want to be very explicit about that.
E.g. an interface for address space defragmentation without any more
specifics sounds like a useful feature to me. It will be up to the
kernel to decide which huge pages to use.
--
Michal Hocko
SUSE Labs
On Thu 01-10-20 11:14:14, Zi Yan wrote:
> On 30 Sep 2020, at 7:55, Michal Hocko wrote:
>
> > On Mon 28-09-20 13:53:58, Zi Yan wrote:
> >> From: Zi Yan
> >>
> >> Hi all,
> >>
> >> This patchset adds support for 1GB PUD THP on x86_64. It
if a new gfp flag gains a sufficient traction and support I am
_strongly_ opposed against consuming another flag for that. Bit space is
limited. Besides that we certainly do not want to allow craziness like
__GFP_NO_LOCK | __GFP_RECLAIM (and similar), do we?
--
Michal Hocko
SUSE Labs
On Thu 01-10-20 18:18:10, Sebastiaan Meijer wrote:
> (Apologies for messing up the mailing list thread, Gmail had fooled me into
> believing that it properly picked up the thread)
>
> On Thu, 1 Oct 2020 at 14:30, Michal Hocko wrote:
> >
> > On Wed 30-09-20 21:27:12,
is commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations
> carried out by the single-argument variant of kvfree_rcu(), thus avoiding
> this can-sleep code path from dipping into the emergency reserves.
>
> Suggested-by: Michal Hocko
> Signed-off-by: Paul E. McKe
normal swap out in a
context outside of the reclaim?
My recollection of the particular patch is dim but I do remember it
tried to add more kswapd threads, which would just paper over the problem
you are seeing rather than solve it.
--
Michal Hocko
SUSE Labs
On Wed 30-09-20 16:21:54, Paul E. McKenney wrote:
> On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote:
> > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
[...]
> > > No argument on it being confusing, and I hope that the added header
> > > comment helps.
On Wed 30-09-20 13:03:29, Joel Fernandes wrote:
> On Wed, Sep 30, 2020 at 12:48 PM Michal Hocko wrote:
> >
> > On Wed 30-09-20 11:25:17, Joel Fernandes wrote:
> > > On Fri, Sep 25, 2020 at 05:47:41PM +0200, Michal Hocko wrote:
> > > > On Fri 25-09
On Wed 30-09-20 11:25:17, Joel Fernandes wrote:
> On Fri, Sep 25, 2020 at 05:47:41PM +0200, Michal Hocko wrote:
> > On Fri 25-09-20 17:31:29, Uladzislau Rezki wrote:
> > > > > > >
> > > > > > > All good points!
> > > > > > >
On Wed 30-09-20 15:39:54, Uladzislau Rezki wrote:
> On Wed, Sep 30, 2020 at 02:44:13PM +0200, Michal Hocko wrote:
> > On Wed 30-09-20 14:35:35, Uladzislau Rezki wrote:
> > > On Wed, Sep 30, 2020 at 11:27:32AM +0200, Michal Hocko wrote:
> > > > On Tue 29-09-20 18
On Wed 30-09-20 14:35:35, Uladzislau Rezki wrote:
> On Wed, Sep 30, 2020 at 11:27:32AM +0200, Michal Hocko wrote:
> > On Tue 29-09-20 18:25:14, Uladzislau Rezki wrote:
> > > > > I look at it in scope of GFP_ATOMIC/GFP_NOWAIT issues, i.e. inability
> > > > > t
we need some sort of access control or privilege check as some THPs
would be a really scarce (like those that require pre-reservation).
--
Michal Hocko
SUSE Labs
ense
> here because mem_cgroup_oom_lock() does not operate on under_oom field. So
> we reword the comment as this would be helpful.
> [Thanks Michal Hocko for rewording this comment.]
>
> Signed-off-by: Miaohe Lin
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Vladimir Davydov
Ac
s. I do not think we want users to be
aware of internal implementation details like pcp caches, migrate types
or others. While pcp caches are here for years and unlikely to change in
a foreseeable future many details are changing on regular basis.
--
Michal Hocko
SUSE Labs
> %GFP_ATOMIC users can not sleep and need the allocation to succeed. A %lower
>
>
> should be rephrased, IMHO.
Any suggestions? Or more specifics about which part is conflicting? It
tries to say that there is a higher demand for the allocation to succeed
even though the context cannot sleep to take active measures to achieve
that. So the only way left is to break the watermarks to a certain degree,
which makes these allocations a "higher class" than others.
--
Michal Hocko
SUSE Labs
On Wed 30-09-20 01:34:25, linmiaohe wrote:
> Michal Hocko wrote:
> > On Thu 17-09-20 06:59:00, Miaohe Lin wrote:
> >> Since commit 79dfdaccd1d5 ("memcg: make oom_lock 0 and 1 based rather
> >> than counter"), the mem_cgroup_unmark_under_oom()
On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
> On Tue, Sep 29, 2020 at 02:07:56PM +0200, Michal Hocko wrote:
> > On Mon 28-09-20 16:31:01, paul...@kernel.org wrote:
> > [...]
>
> Apologies for the delay, but today has not been boring.
>
> > > This commi
to implement this because
that tends to be tricky from the configuration POV as you mentioned
above. But a new limit (memory.middle, for lack of a better name) to
define the background reclaim sounds like a good fit with above points.
--
Michal Hocko
SUSE Labs
ult set when THP enabled is lost. This change restores min_free_kbytes
> as expected for THP consumers.
>
> Fixes: f000565adb77 ("thp: set recommended min free kbytes")
>
> Signed-off-by: Vijay Balakrishna
> Cc: sta...@vger.kernel.org
> Reviewed-by: Pavel Tatashin
On Tue 29-09-20 11:00:03, Daniel Vetter wrote:
> On Tue, Sep 29, 2020 at 10:19:38AM +0200, Michal Hocko wrote:
> > On Wed 16-09-20 23:43:02, Daniel Vetter wrote:
> > > I can
> > > then figure out whether it's better to risk not spotting issues with
> > >
is under oom,
> - * mem_cgroup_oom_lock() may not be called. Watch for underflow.
> - */
> spin_lock(&memcg_oom_lock);
> for_each_mem_cgroup_tree(iter, memcg)
> if (iter->under_oom > 0)
> --
> 2.19.1
--
Michal Hocko
SUSE Labs
From: Michal Hocko
There is a general understanding that GFP_ATOMIC/GFP_NOWAIT are
to be used from atomic contexts. E.g. from within a spin lock or from
the IRQ context. This is correct but there are some atomic contexts
where the above doesn't hold. One of them would be an NMI context.
!(*krcp)->bkvhead[idx] ||
> + (*krcp)->bkvhead[idx]->nr_records ==
> KVFREE_BULK_MAX_ENTR) {
> + bnode = get_cached_bnode(*krcp);
> + if (!bnode && can_alloc_page) {
> + krc_this_cpu_unlock(*krcp, *flags);
> + bnode = kmalloc(PAGE_SIZE, gfp);
What is the point of calling kmalloc for a PAGE_SIZE object? Wouldn't
using the page allocator directly be better?
--
Michal Hocko
SUSE Labs
On Tue 29-09-20 17:38:43, Joonsoo Kim wrote:
> 2020년 9월 29일 (화) 오후 5:08, Michal Hocko 님이 작성:
> >
> > On Mon 28-09-20 17:50:46, Joonsoo Kim wrote:
> > > From: Joonsoo Kim
> > >
> > > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
o carefully consider failure.
This is not a random allocation mode.
--
Michal Hocko
SUSE Labs
page = __rmqueue_smallest(zone, order,
> MIGRATE_HIGHATOMIC);
> if (page)
> trace_mm_page_alloc_zone_locked(page, order,
> migratetype);
But this condition is not clear to me. __rmqueue_smallest doesn't access
pcp lists. Maybe I have missed the point in the original discussion but
this deserves a comment at least.
> --
> 2.7.4
--
Michal Hocko
SUSE Labs
s applied to allow access to "atomic reserves"
+ * watermark is applied to allow access to "atomic reserves".
+ * The current implementation doesn't support NMI and other non-preemptive
context
+ * (e.g. raw_spin_lock).
*
* %GFP_KERNEL is typical for kernel-internal allocations. The caller requires
* %ZONE_NORMAL or a lower zone for direct access but can direct reclaim.
[...]
--
Michal Hocko
SUSE Labs
at much. They
simply use whatever they can find or somebody will show them. Really,
deprecation has never really worked. The only thing that worked was to
remove the functionality and then wait for somebody to complain and
revert or somehow allow the functionality without necessity to alter the
userspace.
As much as I would like to remove as much crud as possible I strongly
suspect that the existing hotplug interface is just a lost cause and it
is not the best use of time to put lipstick on a pig. Even if
we remove this particular interface we are not going to get rid of a lot
of code or we won't gain any more sensible semantic, right?
--
Michal Hocko
SUSE Labs
nsumers.
Anyway, I am afraid that we are going in circles here. We do not have
any meaningful numbers to claim memory footprint problems. There is a
clear opposition to hook into page allocator for reasons already
mentioned. You are looking for a dedicated memory pool and it should be
quite trivial to develop one and fine tune it for your specific usecase.
All that on top of page allocator. Unless this is seen as completely
unfeasible based on some solid arguments then we can start talking about
the page allocator itself.
--
Michal Hocko
SUSE Labs
200922070726.dlw24lf3wd3p2...@black.fi.intel.com
--
Michal Hocko
SUSE Labs
ith this patch. I am not sure this is worth backporting to
stable trees because this is not a functional bug. Surprising behavior,
yes, but not much more than that.
Acked-by: Michal Hocko
One minor comment below
[...]
> @@ -857,6 +858,7 @@ int __ref online_pages(unsigned long pfn, unsigned
page *page = area->pages[i];
> -
> - BUG_ON(!page);
> - __free_pages(page, 0);
> - }
> + release_pages(area->pages, area->nr_pages);
> atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
> -
> kvfree(area->pages);
> }
>
> --
> 2.28.0
--
Michal Hocko
SUSE Labs
On Mon 21-09-20 18:06:44, Michal Hocko wrote:
[...]
> Thanks a lot for this clarification! So I believe the only existing bug
> is in documentation which should be explicit that the cgroup fd read
> access is not sufficient because it also requires to have a write access
> for cgroup.
On Tue 22-09-20 11:10:17, Shakeel Butt wrote:
> On Tue, Sep 22, 2020 at 9:55 AM Michal Hocko wrote:
[...]
> > Last but not least the memcg
> > background reclaim is something that should be possible without a new
> > interface.
>
> So, it comes down to adding more
On Tue 22-09-20 11:10:17, Shakeel Butt wrote:
> On Tue, Sep 22, 2020 at 9:55 AM Michal Hocko wrote:
[...]
> > So far I have learned that you are primarily working around an
> > implementation detail in the zswap which is doing the swapout path
> > directly in the pageout pa
On Tue 22-09-20 09:51:30, Shakeel Butt wrote:
> On Tue, Sep 22, 2020 at 9:34 AM Michal Hocko wrote:
> >
> > On Tue 22-09-20 09:29:48, Shakeel Butt wrote:
[...]
> > > Anyways, what do you think of the in-kernel PSI based
> > > oom-kill trigger. I think Johannes ha
On Tue 22-09-20 08:54:25, Shakeel Butt wrote:
> On Tue, Sep 22, 2020 at 4:49 AM Michal Hocko wrote:
> >
> > On Mon 21-09-20 10:50:14, Shakeel Butt wrote:
[...]
> > > Let me add one more point. Even if the high limit reclaim is swift, it
> > > can still take 10
On Tue 22-09-20 09:29:48, Shakeel Butt wrote:
> On Tue, Sep 22, 2020 at 8:16 AM Michal Hocko wrote:
> >
> > On Tue 22-09-20 06:37:02, Shakeel Butt wrote:
[...]
> > > I talked about this problem with Johannes at LPC 2019 and I think we
> > > talked about two
>
> Could you please elaborate? Do not want to speculate :)
It threw a 501 at me. lkml.org is quite unreliable. It works now. I will
read through that. Please use lore or lkml.kernel.org/r/$msg in the future.
--
Michal Hocko
SUSE Labs
but the second one
> might help.
Why does your oomd depend on memory allocation?
--
Michal Hocko
SUSE Labs
On Mon 21-09-20 10:50:14, Shakeel Butt wrote:
> On Mon, Sep 21, 2020 at 9:30 AM Michal Hocko wrote:
> >
> > On Wed 09-09-20 14:57:52, Shakeel Butt wrote:
> > > Introduce an memcg interface to trigger memory reclaim on a memory cgroup.
> > >
> > > Use
got stuck under this much memory
> pressure.
>
> I am wondering if anyone else has seen a similar situation in production
> and if there is a recommended way to resolve this situation.
I would recommend focusing on tracking down who is blocking the
further progress.
--
Michal Hocko
SUSE Labs
to tuned value is to be expected. The primary problem is that
hotadding memory after boot (without any user-configured value) will
effectively decrease the value because khugepaged tuning
(set_recommended_min_free_kbytes) is not called.
--
Michal Hocko
SUSE Labs
On Tue 22-09-20 16:06:31, Yafang Shao wrote:
> On Tue, Sep 22, 2020 at 3:27 PM Michal Hocko wrote:
[...]
> > What is the latency triggered by the memory reclaim? It should be mostly
> > a clean page cache right as drop_caches only drops clean pages. Or is
> > this more ab
On Mon 21-09-20 20:35:53, Paul E. McKenney wrote:
> On Mon, Sep 21, 2020 at 06:03:18PM +0200, Michal Hocko wrote:
> > On Mon 21-09-20 08:45:58, Paul E. McKenney wrote:
> > > On Mon, Sep 21, 2020 at 09:47:16AM +0200, Michal Hocko wrote:
> > > > On Fri 18-09-20 21
mory and need
to reclaim. Otherwise they are constantly refilled/rebalanced on demand.
The fact that you are refilling them from outside just suggest that you
are operating on a wrong layer. Really, create your own pool of pages
and rebalance them based on the workload.
> Could you please specify a real test case or workload you are talking about?
I am not a performance expert, but essentially any memory-allocator-heavy
workload might notice. I am pretty sure Mel would tell you more.
--
Michal Hocko
SUSE Labs
On Tue 22-09-20 12:20:52, Yafang Shao wrote:
> On Mon, Sep 21, 2020 at 7:36 PM Michal Hocko wrote:
> >
> > On Mon 21-09-20 19:23:01, Yafang Shao wrote:
> > > On Mon, Sep 21, 2020 at 7:05 PM Michal Hocko wrote:
> > > >
> > > > On Mon 21-09-20 18:55
like something too
easy to use incorrectly (remember drop_caches). I am also a bit worried
about corner cases which would be easier to hit - e.g. fill up the swap
limit and turn anonymous memory into unreclaimable and who knows what
else.
--
Michal Hocko
SUSE Labs
On Mon 21-09-20 17:04:50, Christian Brauner wrote:
> On Mon, Sep 21, 2020 at 04:55:37PM +0200, Michal Hocko wrote:
> > On Mon 21-09-20 16:43:55, Christian Brauner wrote:
> > > On Mon, Sep 21, 2020 at 10:38:47AM -0400, Tejun Heo wrote:
> > > > Hello,
> > > &
On Mon 21-09-20 08:45:58, Paul E. McKenney wrote:
> On Mon, Sep 21, 2020 at 09:47:16AM +0200, Michal Hocko wrote:
> > On Fri 18-09-20 21:48:15, Uladzislau Rezki (Sony) wrote:
> > [...]
> > > Proposal
> > >
> > > Introduce a lock-free function
On Mon 21-09-20 16:41:34, Christian Brauner wrote:
> On Mon, Sep 21, 2020 at 03:42:00PM +0200, Michal Hocko wrote:
> > [Cc Tejun and Christian - this is a part of a larger discussion which is
> > not directly related to this particular question so let me trim the
> > origi
On Mon 21-09-20 16:43:55, Christian Brauner wrote:
> On Mon, Sep 21, 2020 at 10:38:47AM -0400, Tejun Heo wrote:
> > Hello,
> >
> > On Mon, Sep 21, 2020 at 04:28:34PM +0200, Michal Hocko wrote:
> > > Fundamentaly CLONE_INTO_CGROUP is similar to regular fork + move to
On Mon 21-09-20 10:18:30, Peter Xu wrote:
> Hi, Michal,
>
> On Mon, Sep 21, 2020 at 03:42:00PM +0200, Michal Hocko wrote:
[...]
> > I have only now
> > learned about this feature so I am not deeply familiar with all the
> > details and I might be easily wrong. No
we might have quite a lot of resources bound to
child's lifetime but accounted to the parent's memcg which can lead to
all sorts of interesting problems (e.g. unreclaimable memory - even by
the oom killer).
Christian, Tejun is this the expected semantic or I am just misreading
the code?
--
Michal Hocko
SUSE Labs
On Mon 21-09-20 19:23:01, Yafang Shao wrote:
> On Mon, Sep 21, 2020 at 7:05 PM Michal Hocko wrote:
> >
> > On Mon 21-09-20 18:55:40, Yafang Shao wrote:
> > > On Mon, Sep 21, 2020 at 4:12 PM Michal Hocko wrote:
> > > >
> > > > On Mon 21
On Fri 18-09-20 12:53:58, Yu Zhao wrote:
> On Fri, Sep 18, 2020 at 01:09:14PM +0200, Michal Hocko wrote:
> > On Fri 18-09-20 04:27:13, Yu Zhao wrote:
> > > On Fri, Sep 18, 2020 at 09:37:00AM +0200, Michal Hocko wrote:
> > > > On Thu 17-09-20 21:00:40, Yu Zhao wrote:
On Mon 21-09-20 18:55:40, Yafang Shao wrote:
> On Mon, Sep 21, 2020 at 4:12 PM Michal Hocko wrote:
> >
> > On Mon 21-09-20 16:02:55, zangchun...@bytedance.com wrote:
> > > From: Chunxin Zang
> > >
> > > In the cgroup v1, we have 'force_empty' i
ory_max_write,
> },
> {
> + .name = "drop_cache",
> + .flags = CFTYPE_NOT_ON_ROOT,
> + .write = mem_cgroup_force_empty_write,
> + },
> + {
> .name = "events",
> .flags = CFTYPE_NOT_ON_ROOT,
> .file_offset = offsetof(struct mem_cgroup, events_file),
> --
> 2.11.0
--
Michal Hocko
SUSE Labs
going to do any
good for long term maintainability.
--
Michal Hocko
SUSE Labs
t people are
asking for a long time.
This functionality shouldn't be much different from the standard memory
reclaim. It has some limitations (e.g. it can only handle mapped memory)
but allows proactively swapping out or reclaiming disk-based memory based
on specific knowledge of the workload. The kernel is not able to do the
same.
[1] http://lkml.kernel.org/r/20200117115225.gv19...@dhcp22.suse.cz
--
Michal Hocko
SUSE Labs