Re: [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO

2021-04-21 Thread Gavin Shan

Hi Keqian,

On 4/21/21 4:36 PM, Keqian Zhu wrote:

On 2021/4/21 15:52, Gavin Shan wrote:

On 4/16/21 12:03 AM, Keqian Zhu wrote:

The MMIO region of a device may be huge (GB level), so try to use
block mapping in stage2 to speed up both map and unmap.

Compared to normal memory mapping, we should consider two more
points when trying block mapping for an MMIO region:

1. For normal memory mapping, the PA (host physical address) and
HVA have the same alignment within PUD_SIZE or PMD_SIZE when we use
the HVA to request a hugepage, so we don't need to consider PA
alignment when verifying block mapping. But for device memory
mapping, the PA and HVA may have different alignment.

2. For normal memory mapping, we are sure the hugepage size properly
fits into the vma, so we don't check whether the mapping size exceeds
the boundary of the vma. But for device memory mapping, we should pay
attention to this.

This adds get_vma_page_shift() to get the page shift for both normal
memory and device MMIO regions, and checks these two points when
selecting the block mapping size for an MMIO region.
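To make point 1 concrete, here is a small, purely illustrative userspace sketch
of the alignment rule; all addresses are made up and PMD_SIZE is assumed to be
2M (4K pages):

/*
 * For a stage 2 PMD (2M) block mapping, the HVA and the PA it maps to
 * must share the same offset within a 2M block.
 */
#include <stdio.h>

#define PAGE_SHIFT	12
#define PMD_SIZE	(1UL << 21)	/* 2M with 4K pages, assumption */

int main(void)
{
	unsigned long vm_start = 0x7f40200000UL;  /* hypothetical vma start */
	unsigned long vm_pgoff = 0x8001100UL;     /* hypothetical base PFN  */
	unsigned long hva      = 0x7f40300000UL;  /* faulting address       */

	/* Same formula as get_vma_page_shift(): PA backing this HVA. */
	unsigned long pa = (vm_pgoff << PAGE_SHIFT) + (hva - vm_start);

	if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)))
		printf("hva=%#lx pa=%#lx: PMD block mapping possible\n", hva, pa);
	else
		printf("hva=%#lx pa=%#lx: PA misaligned, fall back to pages\n",
		       hva, pa);
	return 0;
}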

Signed-off-by: Keqian Zhu 
---
   arch/arm64/kvm/mmu.c | 61 
   1 file changed, 51 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c59af5ca01b0..5a1cc7751e6d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -738,6 +738,35 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
 	return PAGE_SIZE;
 }
 
+static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
+{
+	unsigned long pa;
+
+	if (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_PFNMAP))
+		return huge_page_shift(hstate_vma(vma));
+
+	if (!(vma->vm_flags & VM_PFNMAP))
+		return PAGE_SHIFT;
+
+	VM_BUG_ON(is_vm_hugetlb_page(vma));
+


I don't understand how VM_PFNMAP is set for hugetlbfs related vma.
I think they are exclusive, meaning the flag is never set for
hugetlbfs vma. If it's true, VM_PFNMAP needn't be checked on hugetlbfs
vma and the VM_BUG_ON() becomes unnecessary.

Yes, but we're not sure all drivers follow this rule. Adding a BUG_ON()
is a way to catch the issue.



I think I didn't make things clear. What I meant is VM_PFNMAP can't
be set for hugetlbfs VMAs. So the checks here can be simplified as
below if you agree:

if (is_vm_hugetlb_page(vma))
	return huge_page_shift(hstate_vma(vma));

if (!(vma->vm_flags & VM_PFNMAP))
	return PAGE_SHIFT;

VM_BUG_ON(is_vm_hugetlb_page(vma));   /* Can be dropped */




+	pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
+
+#ifndef __PAGETABLE_PMD_FOLDED
+	if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
+	    ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
+	    ALIGN(hva, PUD_SIZE) <= vma->vm_end)
+		return PUD_SHIFT;
+#endif
+
+	if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
+	    ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
+	    ALIGN(hva, PMD_SIZE) <= vma->vm_end)
+		return PMD_SHIFT;
+
+	return PAGE_SHIFT;
+}
+


There is "switch(...)" fallback mechanism in user_mem_abort(). PUD_SIZE/PMD_SIZE
can be downgraded accordingly if the addresses fails in the alignment check
by fault_supports_stage2_huge_mapping(). I think it would make user_mem_abort()
simplified if the logic can be moved to get_vma_page_shift().

Another question is whether we still need the check from fault_supports_stage2_huge_mapping()
if a VM_PFNMAP area is going to be covered by block mapping. If so, the "switch(...)"
fallback mechanism needs to be part of get_vma_page_shift().

Yes, good suggestion. My idea is that we keep this series simpler and do further
optimization in another patch series. Do you mind sending a patch?



Yeah, it's fine to keep this series as it is. There are 3 checks applied
here for the MMIO region (hva/hpa/ipa) and they're distributed across two
functions, making the code a bit hard to follow. I can post a patch to
simplify it after your series gets merged :)
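For reference, the fallback being discussed looks roughly like this in the
user_mem_abort() of that era (paraphrased, not the proposed cleanup):

	switch (vma_shift) {
#ifndef __PAGETABLE_PMD_FOLDED
	case PUD_SHIFT:
		if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
			break;
		fallthrough;
#endif
	case CONT_PMD_SHIFT:
		vma_shift = PMD_SHIFT;
		fallthrough;
	case PMD_SHIFT:
		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
			break;
		fallthrough;
	case CONT_PTE_SHIFT:
		vma_shift = PAGE_SHIFT;
		force_pte = true;
		fallthrough;
	case PAGE_SHIFT:
		break;
	default:
		WARN_ONCE(1, "Unknown vma_shift %lu", vma_shift);
	}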

Thanks,
Gavin



Re: [PATCH v4 1/2] kvm/arm64: Remove the creation time's mapping of MMIO regions

2021-04-21 Thread Gavin Shan

Hi Keqian,

On 4/21/21 4:28 PM, Keqian Zhu wrote:

On 2021/4/21 14:38, Gavin Shan wrote:

On 4/16/21 12:03 AM, Keqian Zhu wrote:

The MMIO regions may be unmapped for many reasons and can be remapped
by the stage2 fault path. Mapping MMIO regions at creation time is thus a
minor optimization and makes these two mapping paths hard to keep in sync.

Remove the mapping code while keeping the useful sanity check.

Signed-off-by: Keqian Zhu 
---
   arch/arm64/kvm/mmu.c | 38 +++---
   1 file changed, 3 insertions(+), 35 deletions(-)



After removing the logic to create stage2 mapping for VM_PFNMAP region,
I think the "do { } while" loop becomes unnecessary and can be dropped
completely. It means the only sanity check is to see if the memory slot
overflows IPA space or not. In that case, KVM_MR_FLAGS_ONLY can be
ignored because the memory slot's base address and length aren't changed
when we have KVM_MR_FLAGS_ONLY.

Maybe not exactly. Here we do an important sanity check: we shouldn't
allow dirty logging for memslots with VM_PFNMAP.



Yeah, sorry that I missed that part. It's related to Santosh's
patch: the flag may not exist until a page fault has happened on
the vma, and in that case the check doesn't work properly.

  [PATCH] KVM: arm64: Correctly handle the mmio faulting

[...]

Thanks,
Gavin




diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8711894db8c2..c59af5ca01b0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1301,7 +1301,6 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
   {
   hva_t hva = mem->userspace_addr;
   hva_t reg_end = hva + mem->memory_size;
-bool writable = !(mem->flags & KVM_MEM_READONLY);
   int ret = 0;
 if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
@@ -1318,8 +1317,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
   mmap_read_lock(current->mm);
 	/*
 	 * A memory region could potentially cover multiple VMAs, and any holes
-	 * between them, so iterate over all of them to find out if we can map
-	 * any of them right now.
+	 * between them, so iterate over all of them.
 	 *
 	 * (ASCII diagram of a memory region spanning several VMAs elided)
@@ -1330,50 +1328,20 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
*/
 	do {
 		struct vm_area_struct *vma = find_vma(current->mm, hva);
-		hva_t vm_start, vm_end;
 
 		if (!vma || vma->vm_start >= reg_end)
 			break;
 
-		/*
-		 * Take the intersection of this VMA with the memory region
-		 */
-		vm_start = max(hva, vma->vm_start);
-		vm_end = min(reg_end, vma->vm_end);
-
 		if (vma->vm_flags & VM_PFNMAP) {
-			gpa_t gpa = mem->guest_phys_addr +
-				    (vm_start - mem->userspace_addr);
-			phys_addr_t pa;
-
-			pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
-			pa += vm_start - vma->vm_start;
-
 			/* IO region dirty page logging not allowed */
 			if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) {
 				ret = -EINVAL;
-				goto out;
-			}
-
-			ret = kvm_phys_addr_ioremap(kvm, gpa, pa,
-						    vm_end - vm_start,
-						    writable);
-			if (ret)
 				break;
+			}
 		}
-		hva = vm_end;
+		hva = min(reg_end, vma->vm_end);
 	} while (hva < reg_end);
 
-	if (change == KVM_MR_FLAGS_ONLY)
-		goto out;
-
-	spin_lock(&kvm->mmu_lock);
-	if (ret)
-		unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr,
-				   mem->memory_size);
-	else if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
-		stage2_flush_memslot(kvm, memslot);
-	spin_unlock(&kvm->mmu_lock);
-out:
 	mmap_read_unlock(current->mm);
 	return ret;
 }










Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting

2021-04-21 Thread Gavin Shan

Hi Marc,

On 4/21/21 9:59 PM, Marc Zyngier wrote:

On Wed, 21 Apr 2021 07:17:44 +0100,
Keqian Zhu  wrote:

On 2021/4/21 14:20, Gavin Shan wrote:

On 4/21/21 12:59 PM, Keqian Zhu wrote:

On 2020/10/22 0:16, Santosh Shukla wrote:

Commit 6d674e28 introduces a notion to detect and handle
device mappings. The commit checks whether the VM_PFNMAP flag is set
in vma->flags, and if so marks force_pte as true, so that
the THP adjustment (transparent_hugepage_adjust()) is skipped.

There could be an issue with the VM_PFNMAP flag setting and checking.
For example, consider a case where the mdev vendor driver registers
a vma fault handler named vma_mmio_fault(), which maps the
host MMIO region and in turn calls remap_pfn_range() to map
the MMIO's vma space. remap_pfn_range() implicitly sets
the VM_PFNMAP flag in vma->flags.
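A minimal sketch of the driver pattern described above; vma_mmio_fault() is the
name from the description, while the mmap callback, its exact signature and the
pfn value are made-up assumptions:

#include <linux/mdev.h>
#include <linux/mm.h>

static vm_fault_t vma_mmio_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	/*
	 * Populate the whole vma on the first CPU access; remap_pfn_range()
	 * implicitly sets VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP.
	 * vm_pgoff is assumed to already hold the MMIO base pfn.
	 */
	if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
			    vma->vm_end - vma->vm_start, vma->vm_page_prot))
		return VM_FAULT_SIGBUS;

	return VM_FAULT_NOPAGE;
}

static const struct vm_operations_struct my_mdev_vm_ops = {
	.fault = vma_mmio_fault,
};

static int my_mdev_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
{
	/* Unlike vfio-pci's mmap() callback, VM_PFNMAP is *not* set here. */
	vma->vm_pgoff = 0x80000;	/* hypothetical MMIO base pfn */
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
	vma->vm_ops = &my_mdev_vm_ops;
	return 0;
}

So the flag only appears on the vma after the first fault, which is the window
the commit message is worried about.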

Could you give the name of the mdev vendor driver that triggers this issue?
I failed to find one according to your description. Thanks.



I think it would be fixed on the driver side by setting VM_PFNMAP in
its mmap() callback (call_mmap()), like the vfio PCI driver does.
That way it isn't delayed until a page fault is issued and
remap_pfn_range() is called. It's determined from the beginning
that the vma associated with the mdev vendor driver serves a
PFN remapping purpose, so the vma should be populated completely,
including the VM_PFNMAP flag, before it becomes visible to user
space.


Why should that be a requirement? Lazy populating of the VMA should be
perfectly acceptable if the fault can only happen on the CPU side.



It isn't a requirement and drivers needn't follow it strictly. I checked
several drivers before looking into the patch and found almost all of
them have VM_PFNMAP set at mmap() time. In drivers/vfio/pci/vfio_pci.c
there is a comment, shown below, but it doesn't reveal much about why
we can't set VM_PFNMAP at fault time.

static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
{
	:
	/*
	 * See remap_pfn_range(), called from vfio_pci_fault() but we can't
	 * change vm_flags within the fault handler.  Set them now.
	 */
	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_ops = &vfio_pci_mmap_ops;

	return 0;
}

Setting these flags in advance does have advantages. For example, VM_DONTEXPAND
prevents the vma from being merged with another one, and VM_DONTDUMP makes the
vma ineligible for coredump; otherwise the address space associated with the
vma would be accessed on coredump, triggering unnecessary page faults. VM_IO
and VM_PFNMAP avoid walking the page frames associated with the vma, since
we don't have valid PFNs in the mapping.



The example can be found from vfio driver in drivers/vfio/pci/vfio_pci.c:
 vfio_pci_mmap:   VM_PFNMAP is set for the vma
 vfio_pci_mmap_fault: remap_pfn_range() is called

Right. I have discussed the above with Marc. I want to find the driver
to fix it. However, AFAICS, there is no driver that matches the description...

I have the feeling this is an out-of-tree driver (and Santosh's email
address is bouncing, so I guess we won't have much information from
him).

However, the simple fact that any odd driver can provide a fault
handler and populate the VMA on demand makes me think that we need
to support this case.



Yeah, Santosh's email isn't reachable. I think VM_PFNMAP needs to be
rechecked after gup if we're going to support this case.
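A rough sketch of what such a recheck in user_mem_abort() could look like
(purely illustrative, not a tested fix):

	/*
	 * Once the stage-2 fault path has resolved the pfn, the driver's
	 * fault handler has run, so a fault-time remap_pfn_range() would
	 * have set VM_PFNMAP by now; look the vma up again and re-evaluate.
	 */
	mmap_read_lock(current->mm);
	vma = find_vma(current->mm, hva);
	if (vma && (vma->vm_flags & VM_PFNMAP)) {
		device = true;
		force_pte = true;
	}
	mmap_read_unlock(current->mm);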

Thanks,
Gavin



Re: [PATCH] KVM: arm64: build perf support only when enabled

2021-04-21 Thread Arnd Bergmann
On Wed, Apr 21, 2021 at 3:56 PM Marc Zyngier  wrote:
> On Wed, 21 Apr 2021 14:49:01 +0100, Arnd Bergmann  wrote:
>
> I think a better way is to get rid of perf_num_counters() entirely,
> see [1]. If someone acks the second and last patches, I'll even take
> the whole series in (nudge nudge...).

Makes sense. I like your diffstat, too.

   Arnd


[PATCH] KVM: arm64: build perf support only when enabled

2021-04-21 Thread Arnd Bergmann
From: Arnd Bergmann 

The perf_num_counters() function is only defined when CONFIG_PERF_EVENTS
is enabled:

arch/arm64/kvm/perf.c: In function 'kvm_perf_init':
arch/arm64/kvm/perf.c:58:43: error: implicit declaration of function 
'perf_num_counters'; did you mean 'dec_mm_counter'? 
[-Werror=implicit-function-declaration]
   58 | if (IS_ENABLED(CONFIG_ARM_PMU) && perf_num_counters() > 0
  |   ^

Use conditional compilation to disable this feature entirely when
CONFIG_PERF_EVENTS is disabled in the host.

Fixes: 6b5b368fccd7 ("KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static 
key")
Signed-off-by: Arnd Bergmann 
---
Not sure if this is the correct symbol to check for, but this is
one way to avoid the build failure.
---
 arch/arm64/kvm/Makefile | 4 +++-
 arch/arm64/kvm/arm.c| 8 +---
 include/kvm/arm_pmu.h   | 7 +++
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 589921392cb1..9adf12ba5047 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -12,7 +12,7 @@ obj-$(CONFIG_KVM) += hyp/
 
 kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 $(KVM)/vfio.o $(KVM)/irqchip.o \
-arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
+arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 inject_fault.o va_layout.o handle_exit.o \
 guest.o debug.o reset.o sys_regs.o \
 vgic-sys-reg-v3.o fpsimd.o pmu.o \
@@ -24,4 +24,6 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
$(KVM)/eventfd.o \
 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
 vgic/vgic-its.o vgic/vgic-debug.o
 
+kvm-$(CONFIG_PERF_EVENTS) += perf.o
+
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 4808aca8c87c..720e075c70f9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1694,7 +1694,8 @@ static int init_subsystems(void)
if (err)
goto out;
 
-   kvm_perf_init();
+   if (IS_ENABLED(CONFIG_PERF_EVENTS))
+   kvm_perf_init();
kvm_sys_reg_table_init();
 
 out:
@@ -1899,7 +1900,8 @@ static int init_hyp_mode(void)
return 0;
 
 out_err:
-   teardown_hyp_mode();
+   if (IS_ENABLED(CONFIG_PERF_EVENTS))
+   teardown_hyp_mode();
kvm_err("error initializing Hyp mode: %d\n", err);
return err;
 }
@@ -2101,7 +2103,7 @@ int kvm_arch_init(void *opaque)
 
 out_hyp:
hyp_cpu_pm_exit();
-   if (!in_hyp_mode)
+   if (!IS_ENABLED(CONFIG_PERF_EVENTS) && in_hyp_mode)
teardown_hyp_mode();
 out_err:
return err;
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 6fd3cda608e4..d84307a1ebd5 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -13,12 +13,19 @@
 #define ARMV8_PMU_CYCLE_IDX		(ARMV8_PMU_MAX_COUNTERS - 1)
 #define ARMV8_PMU_MAX_COUNTER_PAIRS	((ARMV8_PMU_MAX_COUNTERS + 1) >> 1)
 
+#ifdef CONFIG_PERF_EVENTS
 DECLARE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
 
 static __always_inline bool kvm_arm_support_pmu_v3(void)
 {
 	return static_branch_likely(&kvm_arm_pmu_available);
 }
+#else
+static __always_inline bool kvm_arm_support_pmu_v3(void)
+{
+   return 0;
+}
+#endif
 
 #ifdef CONFIG_HW_PERF_EVENTS
 
-- 
2.29.2



Re: [PATCH] KVM: arm64: build perf support only when enabled

2021-04-21 Thread Marc Zyngier
On Wed, 21 Apr 2021 14:49:01 +0100,
Arnd Bergmann  wrote:
> 
> From: Arnd Bergmann 
> 
> The perf_num_counters() function is only defined when CONFIG_PERF_EVENTS
> is enabled:
> 
> arch/arm64/kvm/perf.c: In function 'kvm_perf_init':
> arch/arm64/kvm/perf.c:58:43: error: implicit declaration of function 
> 'perf_num_counters'; did you mean 'dec_mm_counter'? 
> [-Werror=implicit-function-declaration]
>58 | if (IS_ENABLED(CONFIG_ARM_PMU) && perf_num_counters() > 0
>   |   ^
> 
> Use conditional compilation to disable this feature entirely when
> CONFIG_PERF_EVENTS is disabled in the host.
> 
> Fixes: 6b5b368fccd7 ("KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static 
> key")
> Signed-off-by: Arnd Bergmann 
> ---
> Not sure if this is the correct symbol to check for, but this is
> one way to avoid the build failure.

I think a better way is to get rid of perf_num_counters() entirely,
see [1]. If someone acks the second and last patches, I'll even take
the whole series in (nudge nudge...).

Thanks,

M.

[1] https://lore.kernel.org/r/20210414134409.1266357-1-...@kernel.org

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-04-21 Thread Anshuman Khandual



On 4/21/21 5:54 PM, Mike Rapoport wrote:
> On Wed, Apr 21, 2021 at 04:36:46PM +0530, Anshuman Khandual wrote:
>>
>> On 4/21/21 12:21 PM, Mike Rapoport wrote:
>>> From: Mike Rapoport 
>>>
>>> The arm64's version of pfn_valid() differs from the generic because of two
>>> reasons:
>>>
>>> * Parts of the memory map are freed during boot. This makes it necessary to
>>>   verify that there is actual physical memory that corresponds to a pfn
>>>   which is done by querying memblock.
>>>
>>> * There are NOMAP memory regions. These regions are not mapped in the
>>>   linear map and until the previous commit the struct pages representing
>>>   these areas had default values.
>>>
>>> As the consequence of absence of the special treatment of NOMAP regions in
>>> the memory map it was necessary to use memblock_is_map_memory() in
>>> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
>>> generic mm functionality would not treat a NOMAP page as a normal page.
>>>
>>> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
>>> the rest of core mm will treat them as unusable memory and thus
>>> pfn_valid_within() is no longer required at all and can be disabled by
>>> removing CONFIG_HOLES_IN_ZONE on arm64.
>>
>> This makes sense.
>>
>>>
>>> pfn_valid() can be slightly simplified by replacing
>>> memblock_is_map_memory() with memblock_is_memory().
>>>
>>> Signed-off-by: Mike Rapoport 
>>> ---
>>>  arch/arm64/Kconfig   | 3 ---
>>>  arch/arm64/mm/init.c | 4 ++--
>>>  2 files changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index e4e1b6550115..58e439046d05 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>> def_bool y
>>> depends on NUMA
>>>  
>>> -config HOLES_IN_ZONE
>>> -   def_bool y
>>> -
>>
>> Right.
>>
>>>  source "kernel/Kconfig.hz"
>>>  
>>>  config ARCH_SPARSEMEM_ENABLE
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index dc03bdc12c0f..eb3f56fb8c7c 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
>>>  
>>> /*
>>>  * ZONE_DEVICE memory does not have the memblock entries.
>>> -* memblock_is_map_memory() check for ZONE_DEVICE based
>>> +* memblock_is_memory() check for ZONE_DEVICE based
>>>  * addresses will always fail. Even the normal hotplugged
>>>  * memory will never have MEMBLOCK_NOMAP flag set in their
>>>  * memblock entries. Skip memblock search for all non early
>>> @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
>>> return pfn_section_valid(ms, pfn);
>>>  }
>>>  #endif
>>> -   return memblock_is_map_memory(addr);
>>> +   return memblock_is_memory(addr);
>>
>> Wondering if MEMBLOCK_NOMAP is now being treated similarly to other
>> memory pfns for page table walking purpose but with PageReserved(),
>> why memblock_is_memory() is still required ? At this point, should
>> not we just return valid for early_section() memory. As pfn_valid()
>> now just implies that pfn has a struct page backing which has been
>> already verified with valid_section() etc.
> 
> memblock_is_memory() is required because arm64 frees unused parts of the
> memory map. So, for instance, if we have 64M out of 128M populated in a
> section the section based calculation would return 1 for a pfn in the
> second half of the section, but there would be no memory map there.

Understood.


Re: [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()

2021-04-21 Thread Anshuman Khandual
On 4/21/21 5:49 PM, Mike Rapoport wrote:
> On Wed, Apr 21, 2021 at 04:29:48PM +0530, Anshuman Khandual wrote:
>>
>> On 4/21/21 12:21 PM, Mike Rapoport wrote:
>>> From: Mike Rapoport 
>>>
>>> The intended semantics of pfn_valid() is to verify whether there is a
>>> struct page for the pfn in question and nothing else.
>>>
>>> Yet, on arm64 it is used to distinguish memory areas that are mapped in the
>>> linear map vs those that require ioremap() to access them.
>>>
>>> Introduce a dedicated pfn_is_map_memory() wrapper for
>>> memblock_is_map_memory() to perform such check and use it where
>>> appropriate.
>>>
>>> Using a wrapper allows to avoid cyclic include dependencies.
>>>
>>> Signed-off-by: Mike Rapoport 
>>> ---
>>>  arch/arm64/include/asm/memory.h |  2 +-
>>>  arch/arm64/include/asm/page.h   |  1 +
>>>  arch/arm64/kvm/mmu.c|  2 +-
>>>  arch/arm64/mm/init.c| 11 +++
>>>  arch/arm64/mm/ioremap.c |  4 ++--
>>>  arch/arm64/mm/mmu.c |  2 +-
>>>  6 files changed, 17 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/memory.h 
>>> b/arch/arm64/include/asm/memory.h
>>> index 0aabc3be9a75..194f9f993d30 100644
>>> --- a/arch/arm64/include/asm/memory.h
>>> +++ b/arch/arm64/include/asm/memory.h
>>> @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>>>  
>>>  #define virt_addr_valid(addr)  ({  
>>> \
>>> __typeof__(addr) __addr = __tag_reset(addr);\
>>> -   __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));  \
>>> +   __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr));  
>>> \
>>>  })
>>>  
>>>  void dump_mem_limit(void);
>>> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
>>> index 012cffc574e8..99a6da91f870 100644
>>> --- a/arch/arm64/include/asm/page.h
>>> +++ b/arch/arm64/include/asm/page.h
>>> @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
>>>  typedef struct page *pgtable_t;
>>>  
>>>  extern int pfn_valid(unsigned long);
>>> +extern int pfn_is_map_memory(unsigned long);
>>
>> Check patch is complaining about this.
>>
>> WARNING: function definition argument 'unsigned long' should also have an 
>> identifier name
>> #50: FILE: arch/arm64/include/asm/page.h:41:
>> +extern int pfn_is_map_memory(unsigned long);
>>
>>
>>>  
>>>  #include 
>>>  
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 8711894db8c2..23dd99e29b23 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>>>  
>>>  static bool kvm_is_device_pfn(unsigned long pfn)
>>>  {
>>> -   return !pfn_valid(pfn);
>>> +   return !pfn_is_map_memory(pfn);
>>>  }
>>>  
>>>  /*
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 3685e12aba9b..dc03bdc12c0f 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn)
>>>  }
>>>  EXPORT_SYMBOL(pfn_valid);
>>>  
>>> +int pfn_is_map_memory(unsigned long pfn)
>>> +{
>>> +   phys_addr_t addr = PFN_PHYS(pfn);
>>> +
>>
>> Should also bring with it, the comment regarding upper bits in
>> the pfn from arm64 pfn_valid().
> 
> I think a reference to the comment in pfn_valid() will suffice.

Okay.

> 
> BTW, I wonder how is that other architectures do not need this check?

Trying to move that into generic pfn_valid() in mmzone.h, will resend
the RFC patch after this series.

https://patchwork.kernel.org/project/linux-mm/patch/1615174073-10520-1-git-send-email-anshuman.khand...@arm.com/

>  
>>> +   if (PHYS_PFN(addr) != pfn)
>>> +   return 0;
>>> +   
>>
>>  ^ trailing spaces here.
>>
>> ERROR: trailing whitespace
>> #81: FILE: arch/arm64/mm/init.c:263:
>> +^I$
> 
> Oops :)
>  
>>> +   return memblock_is_map_memory(addr);
>>> +}
>>> +EXPORT_SYMBOL(pfn_is_map_memory);
>>> +
>>
>> Is the EXPORT_SYMBOL() required to build drivers which will use
>> pfn_is_map_memory() but currently use pfn_valid() ?
> 
> Yes, this is required for virt_addr_valid() that is used by modules.
> 

There will be two adjacent EXPORT_SYMBOL()s, one for pfn_valid() and
one for pfn_is_map_memory(). But it's okay I guess, can't help it.


Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-04-21 Thread Mike Rapoport
On Wed, Apr 21, 2021 at 04:36:46PM +0530, Anshuman Khandual wrote:
> 
> On 4/21/21 12:21 PM, Mike Rapoport wrote:
> > From: Mike Rapoport 
> > 
> > The arm64's version of pfn_valid() differs from the generic because of two
> > reasons:
> > 
> > * Parts of the memory map are freed during boot. This makes it necessary to
> >   verify that there is actual physical memory that corresponds to a pfn
> >   which is done by querying memblock.
> > 
> > * There are NOMAP memory regions. These regions are not mapped in the
> >   linear map and until the previous commit the struct pages representing
> >   these areas had default values.
> > 
> > As the consequence of absence of the special treatment of NOMAP regions in
> > the memory map it was necessary to use memblock_is_map_memory() in
> > pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> > generic mm functionality would not treat a NOMAP page as a normal page.
> > 
> > Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> > the rest of core mm will treat them as unusable memory and thus
> > pfn_valid_within() is no longer required at all and can be disabled by
> > removing CONFIG_HOLES_IN_ZONE on arm64.
> 
> This makes sense.
> 
> > 
> > pfn_valid() can be slightly simplified by replacing
> > memblock_is_map_memory() with memblock_is_memory().
> > 
> > Signed-off-by: Mike Rapoport 
> > ---
> >  arch/arm64/Kconfig   | 3 ---
> >  arch/arm64/mm/init.c | 4 ++--
> >  2 files changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index e4e1b6550115..58e439046d05 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> > def_bool y
> > depends on NUMA
> >  
> > -config HOLES_IN_ZONE
> > -   def_bool y
> > -
> 
> Right.
> 
> >  source "kernel/Kconfig.hz"
> >  
> >  config ARCH_SPARSEMEM_ENABLE
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index dc03bdc12c0f..eb3f56fb8c7c 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
> >  
> > /*
> >  * ZONE_DEVICE memory does not have the memblock entries.
> > -* memblock_is_map_memory() check for ZONE_DEVICE based
> > +* memblock_is_memory() check for ZONE_DEVICE based
> >  * addresses will always fail. Even the normal hotplugged
> >  * memory will never have MEMBLOCK_NOMAP flag set in their
> >  * memblock entries. Skip memblock search for all non early
> > @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
> > return pfn_section_valid(ms, pfn);
> >  }
> >  #endif
> > -   return memblock_is_map_memory(addr);
> > +   return memblock_is_memory(addr);
> 
> Wondering if MEMBLOCK_NOMAP is now being treated similarly to other
> memory pfns for page table walking purpose but with PageReserved(),
> why memblock_is_memory() is still required ? At this point, should
> not we just return valid for early_section() memory. As pfn_valid()
> now just implies that pfn has a struct page backing which has been
> already verified with valid_section() etc.

memblock_is_memory() is required because arm64 frees unused parts of the
memory map. So, for instance, if we have 64M out of 128M populated in a
section the section based calculation would return 1 for a pfn in the
second half of the section, but there would be no memory map there.
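
A made-up layout that spells this out, assuming 4K pages and 128M sections as
in the example:

/*
 * section spans:        0x80000000 - 0x88000000  -> valid_section() is true
 * memblock.memory has:  0x80000000 - 0x84000000  -> memmap above 64M freed
 *
 * For pfn = PHYS_PFN(0x86000000):
 *   the section-based check says "valid" (the section metadata exists),
 *   but memblock_is_memory(0x86000000) returns 0 and there is no usable
 *   struct page behind that pfn, because its memory map was freed at boot.
 */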


> >  }
> >  EXPORT_SYMBOL(pfn_valid);
> >  
> > 

-- 
Sincerely yours,
Mike.


Re: [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()

2021-04-21 Thread Mike Rapoport
On Wed, Apr 21, 2021 at 04:29:48PM +0530, Anshuman Khandual wrote:
> 
> On 4/21/21 12:21 PM, Mike Rapoport wrote:
> > From: Mike Rapoport 
> > 
> > The intended semantics of pfn_valid() is to verify whether there is a
> > struct page for the pfn in question and nothing else.
> > 
> > Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> > linear map vs those that require ioremap() to access them.
> > 
> > Introduce a dedicated pfn_is_map_memory() wrapper for
> > memblock_is_map_memory() to perform such check and use it where
> > appropriate.
> > 
> > Using a wrapper allows to avoid cyclic include dependencies.
> > 
> > Signed-off-by: Mike Rapoport 
> > ---
> >  arch/arm64/include/asm/memory.h |  2 +-
> >  arch/arm64/include/asm/page.h   |  1 +
> >  arch/arm64/kvm/mmu.c|  2 +-
> >  arch/arm64/mm/init.c| 11 +++
> >  arch/arm64/mm/ioremap.c |  4 ++--
> >  arch/arm64/mm/mmu.c |  2 +-
> >  6 files changed, 17 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/memory.h 
> > b/arch/arm64/include/asm/memory.h
> > index 0aabc3be9a75..194f9f993d30 100644
> > --- a/arch/arm64/include/asm/memory.h
> > +++ b/arch/arm64/include/asm/memory.h
> > @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
> >  
> >  #define virt_addr_valid(addr)  ({  
> > \
> > __typeof__(addr) __addr = __tag_reset(addr);\
> > -   __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));  \
> > +   __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr));  
> > \
> >  })
> >  
> >  void dump_mem_limit(void);
> > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> > index 012cffc574e8..99a6da91f870 100644
> > --- a/arch/arm64/include/asm/page.h
> > +++ b/arch/arm64/include/asm/page.h
> > @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
> >  typedef struct page *pgtable_t;
> >  
> >  extern int pfn_valid(unsigned long);
> > +extern int pfn_is_map_memory(unsigned long);
> 
> Check patch is complaining about this.
> 
> WARNING: function definition argument 'unsigned long' should also have an 
> identifier name
> #50: FILE: arch/arm64/include/asm/page.h:41:
> +extern int pfn_is_map_memory(unsigned long);
> 
> 
> >  
> >  #include 
> >  
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 8711894db8c2..23dd99e29b23 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> >  
> >  static bool kvm_is_device_pfn(unsigned long pfn)
> >  {
> > -   return !pfn_valid(pfn);
> > +   return !pfn_is_map_memory(pfn);
> >  }
> >  
> >  /*
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 3685e12aba9b..dc03bdc12c0f 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn)
> >  }
> >  EXPORT_SYMBOL(pfn_valid);
> >  
> > +int pfn_is_map_memory(unsigned long pfn)
> > +{
> > +   phys_addr_t addr = PFN_PHYS(pfn);
> > +
> 
> Should also bring with it, the comment regarding upper bits in
> the pfn from arm64 pfn_valid().

I think a reference to the comment in pfn_valid() will suffice.

BTW, I wonder how it is that other architectures do not need this check?
 
> > +   if (PHYS_PFN(addr) != pfn)
> > +   return 0;
> > +   
> 
>  ^ trailing spaces here.
> 
> ERROR: trailing whitespace
> #81: FILE: arch/arm64/mm/init.c:263:
> +^I$

Oops :)
 
> > +   return memblock_is_map_memory(addr);
> > +}
> > +EXPORT_SYMBOL(pfn_is_map_memory);
> > +
> 
> Is the EXPORT_SYMBOL() required to build drivers which will use
> pfn_is_map_memory() but currently use pfn_valid() ?

Yes, this is required for virt_addr_valid(), which is used by modules.
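
As a concrete illustration, a hypothetical out-of-tree module (illustrative
only) that uses virt_addr_valid() ends up referencing pfn_is_map_memory()
after this patch, hence the export:

// SPDX-License-Identifier: GPL-2.0
/* Hypothetical module: virt_addr_valid() now expands to
 * __is_lm_address(...) && pfn_is_map_memory(...), so this module fails to
 * load unless pfn_is_map_memory() is exported.
 */
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/slab.h>

static int __init vav_demo_init(void)
{
	void *p = kmalloc(64, GFP_KERNEL);

	pr_info("kmalloc() address is %sin the linear map\n",
		virt_addr_valid(p) ? "" : "not ");
	kfree(p);
	return 0;
}

static void __exit vav_demo_exit(void)
{
}

module_init(vav_demo_init);
module_exit(vav_demo_exit);
MODULE_LICENSE("GPL");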

-- 
Sincerely yours,
Mike.


Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)

2021-04-21 Thread Vivek Kumar Gautam

Hi Eric,

On 4/11/21 4:42 PM, Eric Auger wrote:

SMMUv3 Nested Stage Setup (IOMMU part)



[snip]



Eric Auger (12):
   iommu: Introduce attach/detach_pasid_table API
   iommu: Introduce bind/unbind_guest_msi
   iommu/smmuv3: Allow s1 and s2 configs to coexist
   iommu/smmuv3: Get prepared for nested stage support
   iommu/smmuv3: Implement attach/detach_pasid_table
   iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
   iommu/smmuv3: Implement cache_invalidate
   dma-iommu: Implement NESTED_MSI cookie
   iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
   iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
 regions
   iommu/smmuv3: Implement bind/unbind_guest_msi
   iommu/smmuv3: report additional recoverable faults


[snip]

I noticed that the patch [1],
[PATCH v13 15/15] iommu/smmuv3: Add PASID cache invalidation per PASID,
has been dropped in v14 and v15 of this series.

Is this planned to be part of any future series, or did I miss a
discussion about dropping the patch? :-)


[1]
https://patchwork.kernel.org/project/kvm/patch/20201118112151.25412-16-eric.au...@redhat.com/


Best regards
Vivek


Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting

2021-04-21 Thread Marc Zyngier
On Wed, 21 Apr 2021 07:17:44 +0100,
Keqian Zhu  wrote:
> 
> Hi Gavin,
> 
> On 2021/4/21 14:20, Gavin Shan wrote:
> > Hi Keqian and Santosh,
> > 
> > On 4/21/21 12:59 PM, Keqian Zhu wrote:
> >> On 2020/10/22 0:16, Santosh Shukla wrote:
> >>> The Commit:6d674e28 introduces a notion to detect and handle the
> >>> device mapping. The commit checks for the VM_PFNMAP flag is set
> >>> in vma->flags and if set then marks force_pte to true such that
> >>> if force_pte is true then ignore the THP function check
> >>> (/transparent_hugepage_adjust()).
> >>>
> >>> There could be an issue with the VM_PFNMAP flag setting and checking.
> >>> For example consider a case where the mdev vendor driver register's
> >>> the vma_fault handler named vma_mmio_fault(), which maps the
> >>> host MMIO region in-turn calls remap_pfn_range() and maps
> >>> the MMIO's vma space. Where, remap_pfn_range implicitly sets
> >>> the VM_PFNMAP flag into vma->flags.
> >> Could you give the name of the mdev vendor driver that triggers this issue?
> >> I failed to find one according to your description. Thanks.
> >>
> > 
> > I think it would be fixed in driver side to set VM_PFNMAP in
> > its mmap() callback (call_mmap()), like vfio PCI driver does.
> > It means it won't be delayed until page fault is issued and
> > remap_pfn_range() is called. It's determined from the beginning
> > that the vma associated the mdev vendor driver is serving as
> > PFN remapping purpose. So the vma should be populated completely,
> > including the VM_PFNMAP flag before it becomes visible to user
> > space.

Why should that be a requirement? Lazy populating of the VMA should be
perfectly acceptable if the fault can only happen on the CPU side.

> > 
> > The example can be found from vfio driver in drivers/vfio/pci/vfio_pci.c:
> > vfio_pci_mmap:   VM_PFNMAP is set for the vma
> > vfio_pci_mmap_fault: remap_pfn_range() is called
> Right. I have discussed the above with Marc. I want to find the driver
> to fix it. However, AFAICS, there is no driver matches the description...

I have the feeling this is an out-of-tree driver (and Santosh's email
address is bouncing, so I guess we won't have much information from
him).

However, the simple fact that any odd driver can provide a fault
handler and populate the VMA on demand makes me think that we need
to support this case.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-04-21 Thread Anshuman Khandual


On 4/21/21 12:21 PM, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> The arm64's version of pfn_valid() differs from the generic because of two
> reasons:
> 
> * Parts of the memory map are freed during boot. This makes it necessary to
>   verify that there is actual physical memory that corresponds to a pfn
>   which is done by querying memblock.
> 
> * There are NOMAP memory regions. These regions are not mapped in the
>   linear map and until the previous commit the struct pages representing
>   these areas had default values.
> 
> As the consequence of absence of the special treatment of NOMAP regions in
> the memory map it was necessary to use memblock_is_map_memory() in
> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> generic mm functionality would not treat a NOMAP page as a normal page.
> 
> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> the rest of core mm will treat them as unusable memory and thus
> pfn_valid_within() is no longer required at all and can be disabled by
> removing CONFIG_HOLES_IN_ZONE on arm64.

This makes sense.

> 
> pfn_valid() can be slightly simplified by replacing
> memblock_is_map_memory() with memblock_is_memory().
> 
> Signed-off-by: Mike Rapoport 
> ---
>  arch/arm64/Kconfig   | 3 ---
>  arch/arm64/mm/init.c | 4 ++--
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e4e1b6550115..58e439046d05 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>   def_bool y
>   depends on NUMA
>  
> -config HOLES_IN_ZONE
> - def_bool y
> -

Right.

>  source "kernel/Kconfig.hz"
>  
>  config ARCH_SPARSEMEM_ENABLE
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index dc03bdc12c0f..eb3f56fb8c7c 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
>  
>   /*
>* ZONE_DEVICE memory does not have the memblock entries.
> -  * memblock_is_map_memory() check for ZONE_DEVICE based
> +  * memblock_is_memory() check for ZONE_DEVICE based
>* addresses will always fail. Even the normal hotplugged
>* memory will never have MEMBLOCK_NOMAP flag set in their
>* memblock entries. Skip memblock search for all non early
> @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
>   return pfn_section_valid(ms, pfn);
>  }
>  #endif
> - return memblock_is_map_memory(addr);
> + return memblock_is_memory(addr);

Wondering: if MEMBLOCK_NOMAP is now being treated like other memory pfns
for page table walking purposes, just with PageReserved(), why is
memblock_is_memory() still required? At this point, should we not just
return valid for early_section() memory? pfn_valid() now just implies that
the pfn has a struct page backing, which has already been verified with
valid_section() etc.

>  }
>  EXPORT_SYMBOL(pfn_valid);
>  
> 


Re: [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()

2021-04-21 Thread Anshuman Khandual


On 4/21/21 12:21 PM, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> The intended semantics of pfn_valid() is to verify whether there is a
> struct page for the pfn in question and nothing else.
> 
> Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> linear map vs those that require ioremap() to access them.
> 
> Introduce a dedicated pfn_is_map_memory() wrapper for
> memblock_is_map_memory() to perform such check and use it where
> appropriate.
> 
> Using a wrapper allows to avoid cyclic include dependencies.
> 
> Signed-off-by: Mike Rapoport 
> ---
>  arch/arm64/include/asm/memory.h |  2 +-
>  arch/arm64/include/asm/page.h   |  1 +
>  arch/arm64/kvm/mmu.c|  2 +-
>  arch/arm64/mm/init.c| 11 +++
>  arch/arm64/mm/ioremap.c |  4 ++--
>  arch/arm64/mm/mmu.c |  2 +-
>  6 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 0aabc3be9a75..194f9f993d30 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>  
>  #define virt_addr_valid(addr)({  
> \
>   __typeof__(addr) __addr = __tag_reset(addr);\
> - __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));  \
> + __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr));  
> \
>  })
>  
>  void dump_mem_limit(void);
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 012cffc574e8..99a6da91f870 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
>  typedef struct page *pgtable_t;
>  
>  extern int pfn_valid(unsigned long);
> +extern int pfn_is_map_memory(unsigned long);

checkpatch is complaining about this.

WARNING: function definition argument 'unsigned long' should also have an 
identifier name
#50: FILE: arch/arm64/include/asm/page.h:41:
+extern int pfn_is_map_memory(unsigned long);


>  
>  #include 
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 8711894db8c2..23dd99e29b23 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>  
>  static bool kvm_is_device_pfn(unsigned long pfn)
>  {
> - return !pfn_valid(pfn);
> + return !pfn_is_map_memory(pfn);
>  }
>  
>  /*
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 3685e12aba9b..dc03bdc12c0f 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn)
>  }
>  EXPORT_SYMBOL(pfn_valid);
>  
> +int pfn_is_map_memory(unsigned long pfn)
> +{
> + phys_addr_t addr = PFN_PHYS(pfn);
> +

This should also bring with it the comment regarding the upper bits in
the pfn from the arm64 pfn_valid().
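
For reference, the check being referred to looks roughly like this in the
current arm64 pfn_valid() (comment paraphrased):

	/*
	 * Reject pfns whose upper bits are lost in the pfn <-> phys
	 * round trip; otherwise such a pfn could alias a valid one.
	 */
	if (PHYS_PFN(addr) != pfn)
		return 0;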

> + if (PHYS_PFN(addr) != pfn)
> + return 0;
> + 

 ^ trailing spaces here.

ERROR: trailing whitespace
#81: FILE: arch/arm64/mm/init.c:263:
+^I$

> + return memblock_is_map_memory(addr);
> +}
> +EXPORT_SYMBOL(pfn_is_map_memory);
> +

Is the EXPORT_SYMBOL() required to build drivers which will use
pfn_is_map_memory() but currently use pfn_valid() ?


Re: [PATCH v2 2/4] memblock: update initialization of reserved pages

2021-04-21 Thread Anshuman Khandual


On 4/21/21 12:21 PM, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_range() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
> 
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
> 
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.
> 
> Signed-off-by: Mike Rapoport 
> ---
>  include/linux/memblock.h |  4 +++-
>  mm/memblock.c| 28 ++--
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 5984fff3f175..634c1a578db8 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
>   * @MEMBLOCK_NONE: no special request
>   * @MEMBLOCK_HOTPLUG: hotpluggable region
>   * @MEMBLOCK_MIRROR: mirrored region
> - * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
> + * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
> + * reserved in the memory map; refer to memblock_mark_nomap() description
> + * for futher details

Small nit - s/futher/further

>   */
>  enum memblock_flags {
>   MEMBLOCK_NONE   = 0x0,  /* No special request */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..3abf2c3fea7f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t 
> base, phys_addr_t size)
>   * @base: the base phys addr of the region
>   * @size: the size of the region
>   *
> + * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
> + * direct mapping of the physical memory. These regions will still be
> + * covered by the memory map. The struct page representing NOMAP memory
> + * frames in the memory map will be PageReserved()
> + *
>   * Return: 0 on success, -errno on failure.
>   */
>  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
> @@ -2002,6 +2007,26 @@ static unsigned long __init 
> __free_memory_core(phys_addr_t start,
>   return end_pfn - start_pfn;
>  }
>  
> +static void __init memmap_init_reserved_pages(void)
> +{
> + struct memblock_region *region;
> + phys_addr_t start, end;
> + u64 i;
> +
> + /* initialize struct pages for the reserved regions */
> +	for_each_reserved_mem_range(i, &start, &end)
> + reserve_bootmem_region(start, end);
> +
> + /* and also treat struct pages for the NOMAP regions as PageReserved */
> + for_each_mem_region(region) {
> + if (memblock_is_nomap(region)) {
> + start = region->base;
> + end = start + region->size;
> + reserve_bootmem_region(start, end);
> + }
> + }

I guess there is no way to unify these two loops.

> +}
> +
>  static unsigned long __init free_low_memory_core_early(void)
>  {
>   unsigned long count = 0;
> @@ -2010,8 +2035,7 @@ static unsigned long __init 
> free_low_memory_core_early(void)
>  
>   memblock_clear_hotplug(0, -1);
>  
> -	for_each_reserved_mem_range(i, &start, &end)
> - reserve_bootmem_region(start, end);
> + memmap_init_reserved_pages();
>  
>   /*
>* We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> 


Reviewed-by: Anshuman Khandual 


Re: [PATCH v2 1/4] include/linux/mmzone.h: add documentation for pfn_valid()

2021-04-21 Thread Anshuman Khandual
On 4/21/21 12:21 PM, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> Add comment describing the semantics of pfn_valid() that clarifies that
> pfn_valid() only checks for availability of a memory map entry (i.e. struct
> page) for a PFN rather than availability of usable memory backing that PFN.
> 
> The most "generic" version of pfn_valid() used by the configurations with
> SPARSEMEM enabled resides in include/linux/mmzone.h so this is the most
> suitable place for documentation about semantics of pfn_valid().
> 
> Suggested-by: Anshuman Khandual 
> Signed-off-by: Mike Rapoport 

Reviewed-by: Anshuman Khandual 

> ---
>  include/linux/mmzone.h | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 47946cec7584..961f0eeefb62 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1410,6 +1410,17 @@ static inline int pfn_section_valid(struct mem_section 
> *ms, unsigned long pfn)
>  #endif
>  
>  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
> +/**
> + * pfn_valid - check if there is a valid memory map entry for a PFN
> + * @pfn: the page frame number to check
> + *
> + * Check if there is a valid memory map entry aka struct page for the @pfn.
> + * Note, that availability of the memory map entry does not imply that
> + * there is actual usable memory at that @pfn. The struct page may
> + * represent a hole or an unusable page frame.
> + *
> + * Return: 1 for PFNs that have memory map entries and 0 otherwise
> + */
>  static inline int pfn_valid(unsigned long pfn)
>  {
>   struct mem_section *ms;
> 


Re: [PATCH v2 2/4] memblock: update initialization of reserved pages

2021-04-21 Thread David Hildenbrand

On 21.04.21 08:51, Mike Rapoport wrote:

From: Mike Rapoport 

The struct pages representing a reserved memory region are initialized
using reserve_bootmem_range() function. This function is called for each
reserved region just before the memory is freed from memblock to the buddy
page allocator.

The struct pages for MEMBLOCK_NOMAP regions are kept with the default
values set by the memory map initialization which makes it necessary to
have a special treatment for such pages in pfn_valid() and
pfn_valid_within().

Split out initialization of the reserved pages to a function with a
meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
reserved regions and mark struct pages for the NOMAP regions as
PageReserved.

Signed-off-by: Mike Rapoport 
---
  include/linux/memblock.h |  4 +++-
  mm/memblock.c| 28 ++--
  2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5984fff3f175..634c1a578db8 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
   * @MEMBLOCK_NONE: no special request
   * @MEMBLOCK_HOTPLUG: hotpluggable region
   * @MEMBLOCK_MIRROR: mirrored region
- * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
+ * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
+ * reserved in the memory map; refer to memblock_mark_nomap() description
+ * for futher details
   */
  enum memblock_flags {
MEMBLOCK_NONE   = 0x0,  /* No special request */
diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..3abf2c3fea7f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, 
phys_addr_t size)
   * @base: the base phys addr of the region
   * @size: the size of the region
   *
+ * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
+ * direct mapping of the physical memory. These regions will still be
+ * covered by the memory map. The struct page representing NOMAP memory
+ * frames in the memory map will be PageReserved()
+ *
   * Return: 0 on success, -errno on failure.
   */
  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
@@ -2002,6 +2007,26 @@ static unsigned long __init 
__free_memory_core(phys_addr_t start,
return end_pfn - start_pfn;
  }
  
+static void __init memmap_init_reserved_pages(void)

+{
+   struct memblock_region *region;
+   phys_addr_t start, end;
+   u64 i;
+
+   /* initialize struct pages for the reserved regions */
+	for_each_reserved_mem_range(i, &start, &end)
+   reserve_bootmem_region(start, end);
+
+   /* and also treat struct pages for the NOMAP regions as PageReserved */
+   for_each_mem_region(region) {
+   if (memblock_is_nomap(region)) {
+   start = region->base;
+   end = start + region->size;
+   reserve_bootmem_region(start, end);
+   }
+   }
+}
+
  static unsigned long __init free_low_memory_core_early(void)
  {
unsigned long count = 0;
@@ -2010,8 +2035,7 @@ static unsigned long __init 
free_low_memory_core_early(void)
  
  	memblock_clear_hotplug(0, -1);
  
-	for_each_reserved_mem_range(i, &start, &end)

-   reserve_bootmem_region(start, end);
+   memmap_init_reserved_pages();
  
  	/*

 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id



Reviewed-by: David Hildenbrand 

--
Thanks,

David / dhildenb



Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-04-21 Thread David Hildenbrand

On 21.04.21 08:51, Mike Rapoport wrote:

From: Mike Rapoport 

The arm64's version of pfn_valid() differs from the generic because of two
reasons:

* Parts of the memory map are freed during boot. This makes it necessary to
   verify that there is actual physical memory that corresponds to a pfn
   which is done by querying memblock.

* There are NOMAP memory regions. These regions are not mapped in the
   linear map and until the previous commit the struct pages representing
   these areas had default values.

As the consequence of absence of the special treatment of NOMAP regions in
the memory map it was necessary to use memblock_is_map_memory() in
pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
generic mm functionality would not treat a NOMAP page as a normal page.

Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
the rest of core mm will treat them as unusable memory and thus
pfn_valid_within() is no longer required at all and can be disabled by
removing CONFIG_HOLES_IN_ZONE on arm64.

pfn_valid() can be slightly simplified by replacing
memblock_is_map_memory() with memblock_is_memory().

Signed-off-by: Mike Rapoport 
---
  arch/arm64/Kconfig   | 3 ---
  arch/arm64/mm/init.c | 4 ++--
  2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e4e1b6550115..58e439046d05 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
def_bool y
depends on NUMA
  
-config HOLES_IN_ZONE

-   def_bool y
-
  source "kernel/Kconfig.hz"
  
  config ARCH_SPARSEMEM_ENABLE

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index dc03bdc12c0f..eb3f56fb8c7c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
  
  	/*

 * ZONE_DEVICE memory does not have the memblock entries.
-* memblock_is_map_memory() check for ZONE_DEVICE based
+* memblock_is_memory() check for ZONE_DEVICE based
 * addresses will always fail. Even the normal hotplugged
 * memory will never have MEMBLOCK_NOMAP flag set in their
 * memblock entries. Skip memblock search for all non early
@@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
return pfn_section_valid(ms, pfn);
  }
  #endif
-   return memblock_is_map_memory(addr);
+   return memblock_is_memory(addr);
  }
  EXPORT_SYMBOL(pfn_valid);
  



Acked-by: David Hildenbrand 

--
Thanks,

David / dhildenb



[PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-04-21 Thread Mike Rapoport
From: Mike Rapoport 

The arm64 version of pfn_valid() differs from the generic one for two
reasons:

* Parts of the memory map are freed during boot. This makes it necessary to
  verify that there is actual physical memory that corresponds to a pfn
  which is done by querying memblock.

* There are NOMAP memory regions. These regions are not mapped in the
  linear map and until the previous commit the struct pages representing
  these areas had default values.

As the consequence of absence of the special treatment of NOMAP regions in
the memory map it was necessary to use memblock_is_map_memory() in
pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
generic mm functionality would not treat a NOMAP page as a normal page.

Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
the rest of core mm will treat them as unusable memory and thus
pfn_valid_within() is no longer required at all and can be disabled by
removing CONFIG_HOLES_IN_ZONE on arm64.
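
For context, pfn_valid_within() is defined in include/linux/mmzone.h roughly
as follows, so dropping CONFIG_HOLES_IN_ZONE turns it into a constant 1:

#ifdef CONFIG_HOLES_IN_ZONE
#define pfn_valid_within(pfn) pfn_valid(pfn)
#else
#define pfn_valid_within(pfn) (1)
#endif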

pfn_valid() can be slightly simplified by replacing
memblock_is_map_memory() with memblock_is_memory().

Signed-off-by: Mike Rapoport 
---
 arch/arm64/Kconfig   | 3 ---
 arch/arm64/mm/init.c | 4 ++--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e4e1b6550115..58e439046d05 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
def_bool y
depends on NUMA
 
-config HOLES_IN_ZONE
-   def_bool y
-
 source "kernel/Kconfig.hz"
 
 config ARCH_SPARSEMEM_ENABLE
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index dc03bdc12c0f..eb3f56fb8c7c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
 
/*
 * ZONE_DEVICE memory does not have the memblock entries.
-* memblock_is_map_memory() check for ZONE_DEVICE based
+* memblock_is_memory() check for ZONE_DEVICE based
 * addresses will always fail. Even the normal hotplugged
 * memory will never have MEMBLOCK_NOMAP flag set in their
 * memblock entries. Skip memblock search for all non early
@@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
return pfn_section_valid(ms, pfn);
 }
 #endif
-   return memblock_is_map_memory(addr);
+   return memblock_is_memory(addr);
 }
 EXPORT_SYMBOL(pfn_valid);
 
-- 
2.28.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()

2021-04-21 Thread Mike Rapoport
From: Mike Rapoport 

The intended semantics of pfn_valid() is to verify whether there is a
struct page for the pfn in question and nothing else.

Yet, on arm64 it is used to distinguish memory areas that are mapped in the
linear map vs those that require ioremap() to access them.

Introduce a dedicated pfn_is_map_memory() wrapper for
memblock_is_map_memory() to perform such check and use it where
appropriate.

Using a wrapper allows us to avoid cyclic include dependencies.

Signed-off-by: Mike Rapoport 
---
 arch/arm64/include/asm/memory.h |  2 +-
 arch/arm64/include/asm/page.h   |  1 +
 arch/arm64/kvm/mmu.c|  2 +-
 arch/arm64/mm/init.c| 11 +++
 arch/arm64/mm/ioremap.c |  4 ++--
 arch/arm64/mm/mmu.c |  2 +-
 6 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0aabc3be9a75..194f9f993d30 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
 
 #define virt_addr_valid(addr)  ({  \
__typeof__(addr) __addr = __tag_reset(addr);\
-   __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));  \
+   __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr)); \
 })
 
 void dump_mem_limit(void);
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 012cffc574e8..99a6da91f870 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
 typedef struct page *pgtable_t;
 
 extern int pfn_valid(unsigned long);
+extern int pfn_is_map_memory(unsigned long);
 
 #include 
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8711894db8c2..23dd99e29b23 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 
 static bool kvm_is_device_pfn(unsigned long pfn)
 {
-   return !pfn_valid(pfn);
+   return !pfn_is_map_memory(pfn);
 }
 
 /*
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 3685e12aba9b..dc03bdc12c0f 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn)
 }
 EXPORT_SYMBOL(pfn_valid);
 
+int pfn_is_map_memory(unsigned long pfn)
+{
+   phys_addr_t addr = PFN_PHYS(pfn);
+
+   if (PHYS_PFN(addr) != pfn)
+   return 0;
+   
+   return memblock_is_map_memory(addr);
+}
+EXPORT_SYMBOL(pfn_is_map_memory);
+
 static phys_addr_t memory_limit = PHYS_ADDR_MAX;
 
 /*
diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index b5e83c46b23e..b7c81dacabf0 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
/*
 * Don't allow RAM to be mapped.
 */
-   if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr
+   if (WARN_ON(pfn_is_map_memory(__phys_to_pfn(phys_addr
return NULL;
 
area = get_vm_area_caller(size, VM_IOREMAP, caller);
@@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
 void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
 {
/* For normal memory we already have a cacheable mapping. */
-   if (pfn_valid(__phys_to_pfn(phys_addr)))
+   if (pfn_is_map_memory(__phys_to_pfn(phys_addr)))
return (void __iomem *)__phys_to_virt(phys_addr);
 
return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5d9550fdb9cf..26045e9adbd7 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -81,7 +81,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
  unsigned long size, pgprot_t vma_prot)
 {
-   if (!pfn_valid(pfn))
+   if (!pfn_is_map_memory(pfn))
return pgprot_noncached(vma_prot);
else if (file->f_flags & O_SYNC)
return pgprot_writecombine(vma_prot);
-- 
2.28.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 2/4] memblock: update initialization of reserved pages

2021-04-21 Thread Mike Rapoport
From: Mike Rapoport 

The struct pages representing a reserved memory region are initialized
using reserve_bootmem_range() function. This function is called for each
reserved region just before the memory is freed from memblock to the buddy
page allocator.

The struct pages for MEMBLOCK_NOMAP regions are kept with the default
values set by the memory map initialization which makes it necessary to
have a special treatment for such pages in pfn_valid() and
pfn_valid_within().

Split out initialization of the reserved pages to a function with a
meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
reserved regions and mark struct pages for the NOMAP regions as
PageReserved.

Signed-off-by: Mike Rapoport 
---
 include/linux/memblock.h |  4 +++-
 mm/memblock.c| 28 ++--
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5984fff3f175..634c1a578db8 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
  * @MEMBLOCK_NONE: no special request
  * @MEMBLOCK_HOTPLUG: hotpluggable region
  * @MEMBLOCK_MIRROR: mirrored region
- * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
+ * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
+ * reserved in the memory map; refer to memblock_mark_nomap() description
+ * for further details
  */
 enum memblock_flags {
MEMBLOCK_NONE   = 0x0,  /* No special request */
diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..3abf2c3fea7f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
  * @base: the base phys addr of the region
  * @size: the size of the region
  *
+ * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
+ * direct mapping of the physical memory. These regions will still be
+ * covered by the memory map. The struct page representing NOMAP memory
+ * frames in the memory map will be PageReserved()
+ *
  * Return: 0 on success, -errno on failure.
  */
 int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
@@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
return end_pfn - start_pfn;
 }
 
+static void __init memmap_init_reserved_pages(void)
+{
+   struct memblock_region *region;
+   phys_addr_t start, end;
+   u64 i;
+
+   /* initialize struct pages for the reserved regions */
+   for_each_reserved_mem_range(i, &start, &end)
+   reserve_bootmem_region(start, end);
+
+   /* and also treat struct pages for the NOMAP regions as PageReserved */
+   for_each_mem_region(region) {
+   if (memblock_is_nomap(region)) {
+   start = region->base;
+   end = start + region->size;
+   reserve_bootmem_region(start, end);
+   }
+   }
+}
+
 static unsigned long __init free_low_memory_core_early(void)
 {
unsigned long count = 0;
@@ -2010,8 +2035,7 @@ static unsigned long __init free_low_memory_core_early(void)
 
memblock_clear_hotplug(0, -1);
 
-   for_each_reserved_mem_range(i, &start, &end)
-   reserve_bootmem_region(start, end);
+   memmap_init_reserved_pages();
 
/*
 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
-- 
2.28.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 1/4] include/linux/mmzone.h: add documentation for pfn_valid()

2021-04-21 Thread Mike Rapoport
From: Mike Rapoport 

Add comment describing the semantics of pfn_valid() that clarifies that
pfn_valid() only checks for availability of a memory map entry (i.e. struct
page) for a PFN rather than availability of usable memory backing that PFN.

The most "generic" version of pfn_valid() used by the configurations with
SPARSEMEM enabled resides in include/linux/mmzone.h so this is the most
suitable place for documentation about semantics of pfn_valid().

Suggested-by: Anshuman Khandual 
Signed-off-by: Mike Rapoport 
---
 include/linux/mmzone.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 47946cec7584..961f0eeefb62 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1410,6 +1410,17 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
 #endif
 
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
+/**
+ * pfn_valid - check if there is a valid memory map entry for a PFN
+ * @pfn: the page frame number to check
+ *
+ * Check if there is a valid memory map entry aka struct page for the @pfn.
+ * Note, that availability of the memory map entry does not imply that
+ * there is actual usable memory at that @pfn. The struct page may
+ * represent a hole or an unusable page frame.
+ *
+ * Return: 1 for PFNs that have memory map entries and 0 otherwise
+ */
 static inline int pfn_valid(unsigned long pfn)
 {
struct mem_section *ms;
-- 
2.28.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-04-21 Thread Mike Rapoport
From: Mike Rapoport 

Hi,

These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
pfn_valid_within() to 1. 

The idea is to mark NOMAP pages as reserved in the memory map and restore
the intended semantics of pfn_valid() to designate availability of struct
page for a pfn.

With this the core mm will be able to cope with the fact that it cannot use
NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
will be treated correctly even without the need for pfn_valid_within.
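
As an illustration of those semantics (a minimal sketch, not part of the
series, assuming a kernel context where <linux/mmzone.h> and
<linux/page-flags.h> are available), a generic pfn walker would rely on
pfn_valid() only for the existence of a struct page and decide usability
from the page flags, with no pfn_valid_within() involved:

/* Count page frames that both have a memory map entry and are usable,
 * skipping NOMAP frames via PageReserved(). */
static unsigned long count_usable_pages(unsigned long start_pfn,
					unsigned long end_pfn)
{
	unsigned long pfn, usable = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		/* pfn_valid() only guarantees a struct page exists. */
		if (!pfn_valid(pfn))
			continue;
		/* NOMAP (and other reserved) frames are not usable memory. */
		if (PageReserved(pfn_to_page(pfn)))
			continue;
		usable++;
	}
	return usable;
}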

The patches are only boot tested on qemu-system-aarch64 so I'd really
appreciate memory stress tests on real hardware.

If this actually works we'll be one step closer to drop custom pfn_valid()
on arm64 altogether.

v2:
* Add check for PFN overflow in pfn_is_map_memory()
* Add Acked-by and Reviewed-by tags, thanks David.

v1: Link: https://lore.kernel.org/lkml/20210420090925.7457-1-r...@kernel.org
* Add comment about the semantics of pfn_valid() as Anshuman suggested
* Extend comments about MEMBLOCK_NOMAP, per Anshuman
* Use pfn_is_map_memory() name for the exported wrapper for
  memblock_is_map_memory(). It is still local to arch/arm64 in the end
  because of header dependency issues.

rfc: Link: https://lore.kernel.org/lkml/20210407172607.8812-1-r...@kernel.org

Mike Rapoport (4):
  include/linux/mmzone.h: add documentation for pfn_valid()
  memblock: update initialization of reserved pages
  arm64: decouple check whether pfn is in linear map from pfn_valid()
  arm64: drop pfn_valid_within() and simplify pfn_valid()

 arch/arm64/Kconfig  |  3 ---
 arch/arm64/include/asm/memory.h |  2 +-
 arch/arm64/include/asm/page.h   |  1 +
 arch/arm64/kvm/mmu.c|  2 +-
 arch/arm64/mm/init.c| 10 --
 arch/arm64/mm/ioremap.c |  4 ++--
 arch/arm64/mm/mmu.c |  2 +-
 include/linux/memblock.h|  4 +++-
 include/linux/mmzone.h  | 11 +++
 mm/memblock.c   | 28 ++--
 10 files changed, 54 insertions(+), 13 deletions(-)

base-commit: e49d033bddf5b565044e2abe4241353959bc9120
-- 
2.28.0


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO

2021-04-21 Thread Keqian Zhu



On 2021/4/21 15:52, Gavin Shan wrote:
> Hi Keqian,
> 
> On 4/16/21 12:03 AM, Keqian Zhu wrote:
>> The MMIO region of a device maybe huge (GB level), try to use
>> block mapping in stage2 to speedup both map and unmap.
>>
>> Compared to normal memory mapping, we should consider two more
>> points when try block mapping for MMIO region:
>>
>> 1. For normal memory mapping, the PA(host physical address) and
>> HVA have same alignment within PUD_SIZE or PMD_SIZE when we use
>> the HVA to request hugepage, so we don't need to consider PA
>> alignment when verifing block mapping. But for device memory
>> mapping, the PA and HVA may have different alignment.
>>
>> 2. For normal memory mapping, we are sure hugepage size properly
>> fit into vma, so we don't check whether the mapping size exceeds
>> the boundary of vma. But for device memory mapping, we should pay
>> attention to this.
>>
>> This adds get_vma_page_shift() to get page shift for both normal
>> memory and device MMIO region, and check these two points when
>> selecting block mapping size for MMIO region.
>>
>> Signed-off-by: Keqian Zhu 
>> ---
>>   arch/arm64/kvm/mmu.c | 61 
>>   1 file changed, 51 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c59af5ca01b0..5a1cc7751e6d 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -738,6 +738,35 @@ transparent_hugepage_adjust(struct kvm_memory_slot 
>> *memslot,
>>   return PAGE_SIZE;
>>   }
>>   +static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long 
>> hva)
>> +{
>> +unsigned long pa;
>> +
>> +if (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_PFNMAP))
>> +return huge_page_shift(hstate_vma(vma));
>> +
>> +if (!(vma->vm_flags & VM_PFNMAP))
>> +return PAGE_SHIFT;
>> +
>> +VM_BUG_ON(is_vm_hugetlb_page(vma));
>> +
> 
> I don't understand how VM_PFNMAP is set for hugetlbfs related vma.
> I think they are exclusive, meaning the flag is never set for
> hugetlbfs vma. If it's true, VM_PFNMAP needn't be checked on hugetlbfs
> vma and the VM_BUG_ON() becomes unnecessary.
Yes, but we're not sure all drivers follow this rule. Adding a BUG_ON() is
a way to catch such issues.

> 
>> +pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
>> +
>> +#ifndef __PAGETABLE_PMD_FOLDED
>> +if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
>> +ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
>> +ALIGN(hva, PUD_SIZE) <= vma->vm_end)
>> +return PUD_SHIFT;
>> +#endif
>> +
>> +if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
>> +ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
>> +ALIGN(hva, PMD_SIZE) <= vma->vm_end)
>> +return PMD_SHIFT;
>> +
>> +return PAGE_SHIFT;
>> +}
>> +
> 
> There is "switch(...)" fallback mechanism in user_mem_abort(). 
> PUD_SIZE/PMD_SIZE
> can be downgraded accordingly if the addresses fails in the alignment check
> by fault_supports_stage2_huge_mapping(). I think it would make 
> user_mem_abort()
> simplified if the logic can be moved to get_vma_page_shift().
> 
> Another question if we need the check from 
> fault_supports_stage2_huge_mapping()
> if VM_PFNMAP area is going to be covered by block mapping. If so, the 
> "switch(...)"
> fallback mechanism needs to be part of get_vma_page_shift().
Yes, good suggestion. My idea is that we can keep this series simpler and do
further optimization in another patch series. Do you mind sending a patch?
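
To make that follow-up concrete, here is a hypothetical sketch (the name
get_vma_page_shift_checked() is invented and this is not code from either
series) of how the existing downgrade switch in user_mem_abort() could be
folded into the shift selection, so the caller receives an already-validated
page shift; force_pte handling is deliberately left out:

static int get_vma_page_shift_checked(struct kvm_memory_slot *memslot,
				      struct vm_area_struct *vma,
				      unsigned long hva)
{
	int shift = get_vma_page_shift(vma, hva);

	/* Mirror the fallback currently open-coded in user_mem_abort(). */
	switch (shift) {
#ifndef __PAGETABLE_PMD_FOLDED
	case PUD_SHIFT:
		if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
			break;
		fallthrough;
#endif
	case CONT_PMD_SHIFT:
		shift = PMD_SHIFT;
		fallthrough;
	case PMD_SHIFT:
		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
			break;
		fallthrough;
	case CONT_PTE_SHIFT:
		shift = PAGE_SHIFT;
	}

	return shift;
}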

Thanks,
Keqian
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 1/2] kvm/arm64: Remove the creation time's mapping of MMIO regions

2021-04-21 Thread Keqian Zhu
Hi Gavin,

On 2021/4/21 14:38, Gavin Shan wrote:
> Hi Keqian,
> 
> On 4/16/21 12:03 AM, Keqian Zhu wrote:
>> The MMIO regions may be unmapped for many reasons and can be remapped
>> by the stage2 fault path. Mapping MMIO regions at creation time becomes a
>> minor optimization and makes these two mapping paths hard to sync.
>>
>> Remove the mapping code while keeping the useful sanity check.
>>
>> Signed-off-by: Keqian Zhu 
>> ---
>>   arch/arm64/kvm/mmu.c | 38 +++---
>>   1 file changed, 3 insertions(+), 35 deletions(-)
>>
> 
> After removing the logic to create stage2 mapping for VM_PFNMAP region,
> I think the "do { } while" loop becomes unnecessary and can be dropped
> completely. It means the only sanity check is to see if the memory slot
> overflows IPA space or not. In that case, KVM_MR_FLAGS_ONLY can be
> ignored because the memory slot's base address and length aren't changed
> when we have KVM_MR_FLAGS_ONLY.
Maybe not exactly. Here we keep an important sanity check: dirty page logging
must not be enabled for memslots backed by VM_PFNMAP VMAs.
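
Spelled out (the quoted hunks below are slightly mangled by the list
archive), the VMA walk that remains after this patch is roughly the
following sketch; it reuses the locals of kvm_arch_prepare_memory_region()
and is only meant to show that the sanity check survives:

	do {
		struct vm_area_struct *vma = find_vma(current->mm, hva);

		if (!vma || vma->vm_start >= reg_end)
			break;

		if (vma->vm_flags & VM_PFNMAP) {
			/* IO region dirty page logging not allowed */
			if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) {
				ret = -EINVAL;
				break;
			}
		}

		hva = min(reg_end, vma->vm_end);
	} while (hva < reg_end);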


> 
> It seems the patch isn't based on "next" branch because find_vma() was
> replaced by find_vma_intersection() by one of my patches :)
Yep, I remember it. I will replace it in the next merge window...

Thanks,
Keqian

> 
> Thanks,
> Gavin
> 
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 8711894db8c2..c59af5ca01b0 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1301,7 +1301,6 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>   {
>>   hva_t hva = mem->userspace_addr;
>>   hva_t reg_end = hva + mem->memory_size;
>> -bool writable = !(mem->flags & KVM_MEM_READONLY);
>>   int ret = 0;
>> if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
>> @@ -1318,8 +1317,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>   mmap_read_lock(current->mm);
>>   /*
>>* A memory region could potentially cover multiple VMAs, and any holes
>> - * between them, so iterate over all of them to find out if we can map
>> - * any of them right now.
>> + * between them, so iterate over all of them.
>>*
>>* ++
>>* +---++   ++
>> @@ -1330,50 +1328,20 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>*/
>>   do {
>>   struct vm_area_struct *vma = find_vma(current->mm, hva);
>> -hva_t vm_start, vm_end;
>> if (!vma || vma->vm_start >= reg_end)
>>   break;
>>   -/*
>> - * Take the intersection of this VMA with the memory region
>> - */
>> -vm_start = max(hva, vma->vm_start);
>> -vm_end = min(reg_end, vma->vm_end);
>> -
>>   if (vma->vm_flags & VM_PFNMAP) {
>> -gpa_t gpa = mem->guest_phys_addr +
>> -(vm_start - mem->userspace_addr);
>> -phys_addr_t pa;
>> -
>> -pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
>> -pa += vm_start - vma->vm_start;
>> -
>>   /* IO region dirty page logging not allowed */
>>   if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) {
>>   ret = -EINVAL;
>> -goto out;
>> -}
>> -
>> -ret = kvm_phys_addr_ioremap(kvm, gpa, pa,
>> -vm_end - vm_start,
>> -writable);
>> -if (ret)
>>   break;
>> +}
>>   }
>> -hva = vm_end;
>> +hva = min(reg_end, vma->vm_end);
>>   } while (hva < reg_end);
>>   -if (change == KVM_MR_FLAGS_ONLY)
>> -goto out;
>> -
>> -spin_lock(&kvm->mmu_lock);
>> -if (ret)
>> -unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, 
>> mem->memory_size);
>> -else if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
>> -stage2_flush_memslot(kvm, memslot);
>> -spin_unlock(&kvm->mmu_lock);
>> -out:
>>   mmap_read_unlock(current->mm);
>>   return ret;
>>   }
>>
> 
> .
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting

2021-04-21 Thread Keqian Zhu
Hi Gavin,

On 2021/4/21 14:20, Gavin Shan wrote:
> Hi Keqian and Santosh,
> 
> On 4/21/21 12:59 PM, Keqian Zhu wrote:
>> On 2020/10/22 0:16, Santosh Shukla wrote:
>>> The Commit:6d674e28 introduces a notion to detect and handle the
>>> device mapping. The commit checks for the VM_PFNMAP flag is set
>>> in vma->flags and if set then marks force_pte to true such that
>>> if force_pte is true then ignore the THP function check
>>> (/transparent_hugepage_adjust()).
>>>
>>> There could be an issue with the VM_PFNMAP flag setting and checking.
>>> For example, consider a case where the mdev vendor driver registers
>>> the vma_fault handler named vma_mmio_fault(), which maps the
>>> host MMIO region in-turn calls remap_pfn_range() and maps
>>> the MMIO's vma space. Where, remap_pfn_range implicitly sets
>>> the VM_PFNMAP flag into vma->flags.
>> Could you give the name of the mdev vendor driver that triggers this issue?
>> I failed to find one according to your description. Thanks.
>>
> 
> I think it would be fixed in driver side to set VM_PFNMAP in
> its mmap() callback (call_mmap()), like vfio PCI driver does.
> It means it won't be delayed until page fault is issued and
> remap_pfn_range() is called. It's determined from the beginning
> that the vma associated the mdev vendor driver is serving as
> PFN remapping purpose. So the vma should be populated completely,
> including the VM_PFNMAP flag before it becomes visible to user
> space.
> 
> The example can be found from vfio driver in drivers/vfio/pci/vfio_pci.c:
> vfio_pci_mmap:   VM_PFNMAP is set for the vma
> vfio_pci_mmap_fault: remap_pfn_range() is called
Right. I have discussed the above with Marc. I want to find the driver and
fix it. However, AFAICS, there is no driver that matches the description...
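
For reference, a hypothetical mdev-style pair following the pattern Gavin
describes above: VM_PFNMAP (and friends) is declared up front in the mmap()
callback, and only the PTE population is deferred to the fault handler. All
names here (my_mdev_fault, my_mdev_vm_ops, my_mdev_mmap) are invented, and
using vm_pgoff as the target pfn is an assumption, not code from any real
driver:

#include <linux/mm.h>
#include <linux/mdev.h>

static vm_fault_t my_mdev_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	/* vm_flags are already final here; only the PTEs are filled lazily.
	 * Assumes vm_pgoff encodes the physical pfn of the MMIO region. */
	if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
			    vma->vm_end - vma->vm_start, vma->vm_page_prot))
		return VM_FAULT_SIGBUS;

	return VM_FAULT_NOPAGE;
}

static const struct vm_operations_struct my_mdev_vm_ops = {
	.fault = my_mdev_fault,
};

static int my_mdev_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
{
	/* Declare the mapping type before KVM can ever observe the VMA. */
	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
	vma->vm_ops = &my_mdev_vm_ops;

	return 0;
}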

Thanks,
Keqian

> 
> Thanks,
> Gavin
> 
>>>
>>> Now lets assume a mmio fault handing flow where guest first access
>>> the MMIO region whose 2nd stage translation is not present.
>>> So that results to arm64-kvm hypervisor executing guest abort handler,
>>> like below:
>>>
>>> kvm_handle_guest_abort() -->
>>>   user_mem_abort()--> {
>>>
>>>  ...
>>>  0. checks the vma->flags for the VM_PFNMAP.
>>>  1. Since VM_PFNMAP flag is not yet set so force_pte _is_ false;
>>>  2. gfn_to_pfn_prot() -->
>>>  __gfn_to_pfn_memslot() -->
>>>  fixup_user_fault() -->
>>>  handle_mm_fault()-->
>>>  __do_fault() -->
>>> vma_mmio_fault() --> // vendor's mdev fault handler
>>>  remap_pfn_range()--> // Here sets the VM_PFNMAP
>>> flag into vma->flags.
>>>  3. Now that force_pte is set to false in step-2),
>>> will execute transparent_hugepage_adjust() func and
>>> that lead to Oops [4].
>>>   }
>>>
>>> The proposition is to check is_iomap flag before executing the THP
>>> function transparent_hugepage_adjust().
>>>
>>> [4] THP Oops:
 pc: kvm_is_transparent_hugepage+0x18/0xb0
 ...
 ...
 user_mem_abort+0x340/0x9b8
 kvm_handle_guest_abort+0x248/0x468
 handle_exit+0x150/0x1b0
 kvm_arch_vcpu_ioctl_run+0x4d4/0x778
 kvm_vcpu_ioctl+0x3c0/0x858
 ksys_ioctl+0x84/0xb8
 __arm64_sys_ioctl+0x28/0x38
>>>
>>> Tested on Huawei Kunpeng Taishan-200 arm64 server, Using VFIO-mdev device.
>>> Linux tip: 583090b1
>>>
>>> Fixes: 6d674e28 ("KVM: arm/arm64: Properly handle faulting of device 
>>> mappings")
>>> Signed-off-by: Santosh Shukla 
>>> ---
>>>   arch/arm64/kvm/mmu.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 3d26b47..ff15357 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -1947,7 +1947,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>>> phys_addr_t fault_ipa,
>>>* If we are not forced to use page mapping, check if we are
>>>* backed by a THP and thus use block mapping if possible.
>>>*/
>>> -if (vma_pagesize == PAGE_SIZE && !force_pte)
>>> +if (vma_pagesize == PAGE_SIZE && !force_pte && !is_iomap(flags))
>>>   vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>  &pfn, &fault_ipa);
>>>   if (writable)
>>>
>>
> 
> .
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm