> x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
> check")
>
> Reviewed-by: Christophe Leroy
> Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
Thank you for this work!
Pasha
On Tue, Apr 2, 2024 at 1:13 AM Rohan McLure wrote:
>
> In the new set_ptes() API, set_pte_at() (a special case of set_ptes())
> is intended to be instrumented by the page table check facility. There
> are however several other routines that constitute the API for setting
> page table entries,
> Since commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
> pte_user() is no longer required to be present on all platforms as it
> may be equivalent to or implied by pte_read(). Hence implementations of
> pte_user_accessible_page() are specialised.
>
> Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
gated by a call to pud_user_accessible_page(), which will return zero,
> include this stub as a BUILD_BUG().
>
> Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
l address argument to each of these routines in order to
> provide support for page table check on powerpc.
>
> Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
, as many do not encode user/kernel ownership of the
> page in the pte, but instead in the address of the access.
>
> Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
All patches in this series should be also CC'd to linux-ker...@vger.kernel.org.
, as many do not encode user/kernel ownership of the
> page in the pte, but instead in the address of the access.
>
> Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
rnel in the pte. On such platforms, this can be inferred from the
> addr parameter.
>
> Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
riscv. Keep that change rather than reverting it, as the signature of
> __set_pte_at() is changed in a different commit.
>
> Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
For some reason this one was not delivered to the linux-mm mailing list.
Pasha
On 9/6/18 5:15 AM, Michal Hocko wrote:
> On Wed 05-09-18 18:59:15, Mike Rapoport wrote:
> [...]
>> 325 files changed, 846 insertions(+), 2478 deletions(-)
>> delete mode 100644 include/linux/bootmem.h
>> delete mode 100644 mm/bootmem.c
>> delete mode 100644 mm/nobootmem.c
>
> This is really
[    0.090992] Kernel panic - not syncing: Attempted to kill the idle task!
Pavel
On 8/31/18 7:29 AM, Jiri Slaby wrote:
> On 08/31/2018, 01:26 PM, Jiri Slaby wrote:
>> On 08/30/2018, 05:45 PM, Pasha Tatashin wrote:
>>> Hi Jiri,
>>>
>>>
Reviewed-by: Pavel Tatashin
On 8/21/18 6:44 AM, David Hildenbrand wrote:
> Let's document the magic a bit, especially why device_hotplug_lock is
> required when adding/removing memory and how it all play together with
> requests to online/offline memory from user space.
>
Reviewed-by: Pavel Tatashin
On 8/21/18 6:44 AM, David Hildenbrand wrote:
> Let's perform all checking + offlining + removing under
> device_hotplug_lock, so nobody can mess with these devices via
> sysfs concurrently.
>
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Cc: Michael Ellerman
Reviewed-by: Pavel Tatashin
On 8/21/18 6:44 AM, David Hildenbrand wrote:
> device_online() should be called with device_hotplug_lock() held.
>
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Cc: Michael Ellerman
> Cc: Rashmica Gupta
> Cc: Balbir Singh
> Cc: Michael Neuling
>
On 8/21/18 6:44 AM, David Hildenbrand wrote:
> There seem to be some problems as a result of 30467e0b3be ("mm, hotplug:
> fix concurrent memory hot-add deadlock"), which tried to fix a possible
> lock inversion reported and discussed in [1] due to the two locks
> a) device_lock()
> b)
On 8/21/18 6:44 AM, David Hildenbrand wrote:
> add_memory() currently does not take the device_hotplug_lock, however
> is already called under the lock from
> arch/powerpc/platforms/pseries/hotplug-memory.c
> drivers/acpi/acpi_memhotplug.c
> to synchronize against CPU hot-remove and
> +
> +void __ref remove_memory(int nid, u64 start, u64 size)
Remove __ref, otherwise looks good:
Reviewed-by: Pavel Tatashin
> +{
> + lock_device_hotplug();
> + __remove_memory(nid, start, size);
> + unlock_device_hotplug();
> +}
> EXPORT_SYMBOL_GPL(remove_memory);
> #endif /*
On 8/30/18 8:31 AM, David Hildenbrand wrote:
> On 21.08.2018 12:44, David Hildenbrand wrote:
>> This is the same approach as in the first RFC, but this time without
>> exporting device_hotplug_lock (requested by Greg) and with some more
>> details and documentation regarding locking. Tested only
Hi Jiri,
I believe this bug is fixed with this change:
d39f8fb4b7776dcb09ec3bf7a321547083078ee3
mm: make DEFERRED_STRUCT_PAGE_INIT explicitly depend on SPARSEMEM
I am not able to reproduce this problem on x86-32.
Pavel
On 8/30/18 10:35 AM, Pavel Tatashin wrote:
> Thank you Jiri, I am studying
Thank you Jiri, I am studying it.
Pavel
On 8/24/18 3:44 AM, Jiri Slaby wrote:
> pasha.tatas...@oracle.com -> pavel.tatas...@microsoft.com
>
> due to
> 550 5.1.1 Unknown recipient address.
>
>
> On 08/24/2018, 09:32 AM, Jiri Slaby wrote:
>> On 06/19/2018, 09:56 PM, Pavel Tatashin wrote:
>>>
Hi Michal,
As I've said in another reply, this should go in only if the scenario you
describe is real. I am somewhat suspicious, to be honest. I simply do not
see how those weird struct pages would be in a valid pfn range of any
zone.
There are examples of both when unavailable memory is not
Hi Anshuman,
Thank you very much for looking at this. My reply below:
On 10/06/2017 02:48 AM, Anshuman Khandual wrote:
On 10/04/2017 08:59 PM, Pavel Tatashin wrote:
This patch fixes another existing issue on systems that have holes in
zones i.e CONFIG_HOLES_IN_ZONE is defined.
In
On 10/04/2017 10:04 AM, Michal Hocko wrote:
On Wed 04-10-17 09:28:55, Pasha Tatashin wrote:
I am not really familiar with the trim_low_memory_range code path. I am
not even sure we have to care about it because nobody should be walking
pfns outside of any zone.
According to commit
I am not really familiar with the trim_low_memory_range code path. I am
not even sure we have to care about it because nobody should be walking
pfns outside of any zone.
According to commit comments first 4K belongs to BIOS, so I think the
memory exists but BIOS may or may not report it to
Could you be more specific where is such a memory reserved?
I know of one example: trim_low_memory_range() unconditionally reserves from
pfn 0, but e820__memblock_setup() might provide the existing memory from pfn
1 (i.e. KVM).
Then just initialize struct pages for that mapping right there
On 10/04/2017 04:45 AM, Michal Hocko wrote:
On Tue 03-10-17 11:22:35, Pasha Tatashin wrote:
On 10/03/2017 09:08 AM, Michal Hocko wrote:
On Wed 20-09-17 16:17:08, Pavel Tatashin wrote:
Add struct page zeroing as a part of initialization of other fields in
__init_single_page().
This single
that are not zeroed properly.
However, the first patch depends on
mm: zero struct pages during initialization
As it uses mm_zero_struct_page().
Pasha
On 10/03/2017 11:34 AM, Pasha Tatashin wrote:
On 10/03/2017 09:19 AM, Michal Hocko wrote:
On Wed 20-09-17 16:17:14, Pavel Tatashin wrote
-volatile counters, the compiler
will be smart enough to remove all the unnecessary dereferences. As a
plus, we won't be adding any new branches, and the code is still going
to stay compact.
Pasha
On 10/03/2017 11:15 AM, Pasha Tatashin wrote:
Hi Michal,
Please be explicit that this is possible
On 10/03/2017 09:19 AM, Michal Hocko wrote:
On Wed 20-09-17 16:17:14, Pavel Tatashin wrote:
vmemmap_alloc_block() will no longer zero the block, so zero memory
at its call sites for everything except struct pages. Struct page memory
is zero'd by struct page initialization.
Replace allocators
On 10/03/2017 09:18 AM, Michal Hocko wrote:
On Wed 20-09-17 16:17:10, Pavel Tatashin wrote:
Some memory is reserved but unavailable: not present in memblock.memory
(because not backed by physical pages), but present in memblock.reserved.
Such memory has backing struct pages, but they are not
On 10/03/2017 09:08 AM, Michal Hocko wrote:
On Wed 20-09-17 16:17:08, Pavel Tatashin wrote:
Add struct page zeroing as a part of initialization of other fields in
__init_single_page().
This is single-thread performance collected on: Intel(R) Xeon(R) CPU E7-8895
v3 @ 2.60GHz with 1T of memory
Acked-by: Michal Hocko
Thank you,
Pasha
Hi Michal,
Please be explicit that this is possible only because we discard
memblock data later after 3010f876500f ("mm: discard memblock data
later"). Also be more explicit how the new code works.
OK
I like how the resulting code is more compact and smaller.
That was the goal :)
As you separated x86 and sparc patches doing essentially the same I
assume David is going to take this patch?
Correct, I noticed that usually platform specific changes are done in
separate patches even if they are small. Dave already Acked this patch.
So, I do not think it should be
Hi Mark,
I considered using a new *populate() function for shadow without using
vmemmap_populate(), but that makes things unnecessary complicated:
vmemmap_populate() has builtin:
1. large page support
2. device memory support
3. node locality support
4. several config based variants on
Hi Michal,
I hope I haven't missed anything but it looks good to me.
Acked-by: Michal Hocko
Thank you for your review.
one nit below
---
arch/x86/mm/init_64.c | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/init_64.c
Hi Dave,
Thank you for acking.
The reason I am not doing initializing stores is because they require a
membar, even if only regular stores are following (I hoped to do a
membar before first load). This is something I was thinking was not
true, but after consulting with colleagues and
Hi Michal,
While working on a bug that was reported to me by "kernel test robot".
unable to handle kernel NULL pointer dereference at (null)
The issue was that page_to_pfn() on that configuration was looking for a
section inside flags fields in "struct page". So, reserved but
h to use
this iterator, which will simplify it.
Pasha
On 08/14/2017 09:51 AM, Pasha Tatashin wrote:
mem_init()
  free_all_bootmem()
    free_low_memory_core_early()
      for_each_reserved_mem_region()
        reserve_bootmem_region()
          init_reserved_page() <- if this is deferred reserved page
However, now thinking about it, I will change it to CONFIG_MEMBLOCK_DEBUG,
and let users decide what other debugging configs need to be enabled, as
this is also OK.
Actually the more I think about it the more I am convinced that a kernel
boot parameter would be better because it doesn't need
mem_init()
  free_all_bootmem()
    free_low_memory_core_early()
      for_each_reserved_mem_region()
        reserve_bootmem_region()
          init_reserved_page() <- if this is deferred reserved page
            __init_single_pfn()
              __init_single_page()
So, currently, we are using the value of
#ifdef CONFIG_MEMBLOCK in page_alloc, or define memblock_discard() stubs in
the nobootmem header file.
This is the standard way to do this. And it is usually preferred to
proliferating ifdefs in the code.
Hi Michal,
As you suggested, I sent-out this patch separately. If you feel
strongly, that this
OK, I will post it separately. No it does not depend on the rest, but the
rest depends on this. So, I am not sure how to enforce that this comes
before the rest.
Andrew will take care of that. Just make it explicit that some of the
patch depends on an earlier work when reposting.
Ok.
Yes,
On 08/14/2017 07:43 AM, Michal Hocko wrote:
register_page_bootmem_info
register_page_bootmem_info_node
get_page_bootmem
.. setting fields here ..
such as: page->freelist = (void *)type;
free_all_bootmem()
free_low_memory_core_early()
for_each_reserved_mem_region()
Correct, the pgflags asserts were triggered when we were setting reserved
flags to struct page for PFN 0 in which was never initialized through
__init_single_page(). The reason they were triggered is because we set all
uninitialized memory to ones in one of the debug patches.
And why don't we
Hi Michal,
This suggestion won't work, because there are arches without memblock
support: tile, sh...
So, I would still need to have:
#ifdef CONFIG_MEMBLOCK in page_alloc, or define memblock_discard() stubs
in the nobootmem header file. In either case it would become messier than what
it is right
Sure, I could do this, but as I understood from earlier Dave Miller's
comments, we should do one logical change at a time. Hence, introduce API in
one patch use it in another. So, this is how I tried to organize this patch
set. Is this assumption incorrect?
Well, it really depends. If the patch
I will address your comment, and send out a new patch. Should I send it out
separately from the series or should I keep it inside?
I would post it separatelly. It doesn't depend on the rest.
OK, I will post it separately. No it does not depend on the rest, but
the rest depends on this. So,
When CONFIG_DEBUG_VM is enabled, this patch sets all the memory that is
returned by memblock_virt_alloc_try_nid_raw() to ones to ensure that no
places expect zeroed memory.
Please fold this into the patch which introduces
memblock_virt_alloc_try_nid_raw.
OK
I am not sure CONFIG_DEBUG_VM is
Clients can call alloc_large_system_hash() with flag: HASH_ZERO to specify
that memory that was allocated for system hash needs to be zeroed,
otherwise the memory does not need to be zeroed, and client will initialize
it.
If memory does not need to be zero'd, call the new
On 08/11/2017 09:04 AM, Michal Hocko wrote:
On Mon 07-08-17 16:38:47, Pavel Tatashin wrote:
Replace allocators in sparse-vmemmap to use the non-zeroing version. So,
we will get the performance improvement by zeroing the memory in parallel
when struct pages are zeroed.
First of all this should
Add an optimized mm_zero_struct_page(), so struct pages are zeroed without
calling memset(). We do eight to ten regular stores based on the size of
struct page. The compiler optimizes out the conditions of the switch() statement.
Again, this doesn't explain why we need this. You have mentioned those
I believe this deserves much more detailed explanation why this is safe.
What actually prevents any pfn walker from seeing an uninitialized
struct page? Please make your assumptions explicit in the commit log so
that we can check them independently.
There is nothing that prevents pfn walkers from
On 08/11/2017 08:39 AM, Michal Hocko wrote:
On Mon 07-08-17 16:38:41, Pavel Tatashin wrote:
A new variant of memblock_virt_alloc_* allocations:
memblock_virt_alloc_try_nid_raw()
- Does not zero the allocated memory
- Does not panic if request cannot be satisfied
OK, this looks good
On 08/11/2017 05:37 AM, Michal Hocko wrote:
On Mon 07-08-17 16:38:39, Pavel Tatashin wrote:
In deferred_init_memmap() where all deferred struct pages are initialized
we have a check like this:
if (page->flags) {
VM_BUG_ON(page_zone(page) != zone);
goto
I guess this goes all the way down to
Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel
with kswapd")
I will add this to the patch.
Signed-off-by: Pavel Tatashin
Reviewed-by: Steven Sistare
Reviewed-by:
AFAIU register_page_bootmem_info_node is only about struct pages backing
pgdat, usemap and memmap. Those should be in reserved memblocks and we
do not initialize those at later times, they are not relevant to the
deferred initialization as your changelog suggests so the ordering with
Struct pages are initialized by going through __init_single_page(). Since
the existing physical memory in memblock is represented in memblock.memory
list, struct page for every page from this list goes through
__init_single_page().
By a page _from_ this list you mean struct pages backing the
On 08/11/2017 03:58 AM, Michal Hocko wrote:
[I am sorry I didn't get to your previous versions]
Thank you for reviewing this work. I will address your comments, and
send-out a new patches.
In this work we do the following:
- Never read access struct page until it was initialized
How is
On 2017-08-08 09:15, David Laight wrote:
From: Pasha Tatashin
Sent: 08 August 2017 12:49
Thank you for looking at this change. What you described was in my
previous iterations of this project.
See for example here: https://lkml.org/lkml/2017/5/5/369
I was asked to remove that flag, and only
Hi Will,
> Damn, I actually prefer the flag :)
>
> But actually, if you look at our implementation of vmemmap_populate, then we
> have our own version of vmemmap_populate_basepages that terminates at the
> pmd level anyway if ARM64_SWAPPER_USES_SECTION_MAPS. If there's resistance
> to do
Hi Will,
Thank you for looking at this change. What you described was in my
previous iterations of this project.
See for example here: https://lkml.org/lkml/2017/5/5/369
I was asked to remove that flag, and only zero memory in place when
needed. Overall the current approach is better
Hi Ard,
Thank you very much for reviewing this. I will fix the bug you found in
the next iteration.
+zero_vemmap_populated_memory(void)
Typo here: vemmap -> vmemmap
Yeap, will rename here, and in Intel variant.
+{
+ struct memblock_region *reg;
+ u64 start, end;
+
+
Hi Sam,
Thank you for looking at this. I will update patch description, and as
you suggested replace memset() via static assert in next iteration.
Pasha
On 08/04/2017 01:37 AM, Sam Ravnborg wrote:
Hi Pavel.
On Thu, Aug 03, 2017 at 05:23:47PM -0400, Pavel Tatashin wrote:
Add an optimized
OK, so why cannot we make zero_struct_page 8x 8B stores, other arches
would do memset. You said it would be slower but would that be
measurable? I am sorry to be so persistent here but I would be really
happier if this didn't depend on the deferred initialization. If this is
absolutely a no-go
Could you be more specific? E.g. how are other stores done in
__init_single_page safe then? I am sorry to be dense here but how does
the full 64B store differ from other stores done in the same function.
Hi Michal,
It is safe to do regular 8-byte and smaller stores (stx, st, sth, stb)
without
Hi Michal,
I have considered your proposals:
1. Making memset(0) unconditional inside __init_single_page() is not
going to work because it slows down SPARC, and ppc64. On SPARC even the
BSTI optimization that I have proposed earlier won't work, because after
consulting with other engineers I
Ah OK, I will include the change.
Thank you,
Pasha
On 05/15/2017 07:17 PM, Heiko Carstens wrote:
Hello Pasha,
Thank you for looking at this patch. I am worried to make the proposed
change, because, as I understand in this case we allocate memory not for
"struct page"s but for the table that holds
On 05/15/2017 03:38 PM, Michal Hocko wrote:
On Mon 15-05-17 14:12:10, Pasha Tatashin wrote:
Hi Michal,
After looking at your suggested memblock_virt_alloc_core() change again, I
decided to keep what I have. I do not want to inline
memblock_virt_alloc_internal(), because
Hi Heiko,
Thank you for looking at this patch. I am worried to make the proposed
change, because, as I understand in this case we allocate memory not for
"struct page"s but for the table that holds them. So, we will change the
behavior from the current one, where this table is allocated zeroed,
Hi Michal,
After looking at your suggested memblock_virt_alloc_core() change again,
I decided to keep what I have. I do not want to inline
memblock_virt_alloc_internal(), because it is not a performance critical
path, and by inlining it we will unnecessarily increase the text size on
all
On 05/12/2017 12:57 PM, David Miller wrote:
From: Pasha Tatashin <pasha.tatas...@oracle.com>
Date: Thu, 11 May 2017 16:59:33 -0400
We should either keep memset() only for deferred struct pages as what
I have in my patches.
Another option is to add a new function struct_page_clear()
do one membar call after all
"struct pages" are initialized.
I think what I sent out already is cleaner and better solution, because
I am not sure what kind of performance we would see on other chips.
On 05/11/2017 04:47 PM, Pasha Tatashin wrote:
Have you measured that? I do
Have you measured that? I do not think it would be super hard to
measure. I would be quite surprised if this added much if anything at
all as the whole struct page should be in the cache line already. We do
set reference count and other struct members. Almost nobody should be
looking at our page
On 05/10/2017 10:57 AM, Michal Hocko wrote:
On Wed 10-05-17 09:42:22, Pasha Tatashin wrote:
Well, I didn't object to this particular part. I was mostly concerned
about
http://lkml.kernel.org/r/1494003796-748672-4-git-send-email-pasha.tatas...@oracle.com
and the "zero" argument
Well, I didn't object to this particular part. I was mostly concerned
about
http://lkml.kernel.org/r/1494003796-748672-4-git-send-email-pasha.tatas...@oracle.com
and the "zero" argument for other functions. I guess we can do without
that. I _think_ that we should simply _always_ initialize the
Hi Michal,
I like the idea of postponing the zeroing from the allocation to the
init time. To be honest the improvement looks much larger than I would
expect (Btw. this should be a part of the changelog rather than a
outside link).
The improvements are larger, because this time was never
Hi Dave,
Thank you for the review. I will address your comment and update the patchset.
Pasha
On 05/03/2017 10:34 AM, David Miller wrote:
From: Pavel Tatashin
Date: Fri, 24 Mar 2017 15:19:50 -0400
Allow clients to request non-zeroed memory from vmemmap allocator.
On 03/23/2017 07:47 PM, Pasha Tatashin wrote:
How long does it take if we just don't zero this memory at all? We seem
to be initialising most of struct page in __init_single_page(), so it
seems like a lot of additional complexity to conditionally zero the rest
of struct page.
Alternatively
On 03/23/2017 07:35 PM, David Miller wrote:
From: Matthew Wilcox
Date: Thu, 23 Mar 2017 16:26:38 -0700
On Thu, Mar 23, 2017 at 07:01:48PM -0400, Pavel Tatashin wrote:
When deferred struct page initialization feature is enabled, we get a
performance gain of initializing
Hi Matthew,
Thank you for your comment. If you look at the data, having memset()
actually benefits initializing data.
With base it takes:
[ 66.148867] node 0 initialised, 128312523 pages in 7200ms
With fix:
[ 15.260634] node 0 initialised, 128312523 pages in 4190ms
So 4.19s vs 7.2s for