Re: [RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings

2021-03-22 Thread wangyanan (Y)

Hi Alex,

On 2021/3/19 23:07, Alexandru Elisei wrote:

Hi Yanan,

Sorry for taking so long to reply, been busy with other things unfortunately.

Still appreciate your patient reply! :)

I did notice that you sent a new version of this series, but I would like to
continue our discussion on this patch, since it's easier to get the full
context.

On 3/4/21 7:07 AM, wangyanan (Y) wrote:

Hi Alex,

On 2021/3/4 1:27, Alexandru Elisei wrote:

Hi Yanan,

On 3/3/21 11:04 AM, wangyanan (Y) wrote:

Hi Alex,

On 2021/3/3 1:13, Alexandru Elisei wrote:

Hello,

On 2/8/21 11:22 AM, Yanan Wang wrote:

When KVM needs to coalesce the normal page mappings into a block mapping,
we currently invalidate the old table entry first followed by invalidation
of TLB, then unmap the page mappings, and install the block entry at last.

It will cost a long time to unmap the numerous page mappings, which means
there will be a long period when the table entry can be found invalid.
If other vCPUs access any guest page within the block range and find the
table entry invalid, they will all exit from guest with a translation fault
which is not necessary. And KVM will make efforts to handle these faults,
especially when performing CMOs by block range.

So let's quickly install the block entry at first to ensure uninterrupted
memory access of the other vCPUs, and then unmap the page mappings after
installation. This will reduce most of the time when the table entry is
invalid, and avoid most of the unnecessary translation faults.

I'm not convinced I've fully understood what is going on yet, but it seems to me
that the idea is sound. Some questions and comments below.

What I am trying to do in this patch is to adjust the order of rebuilding block
mappings from page mappings.
Take the rebuilding of 1G block mappings as an example.
Before this patch, the order is like:
1) invalidate the table entry of the 1st level(PUD)
2) flush TLB by VMID
3) unmap the old PMD/PTE tables
4) install the new block entry to the 1st level(PUD)

So entry in the 1st level can be found invalid by other vcpus in 1), 2), and 3),
and it's a long time in 3) to unmap
the numerous old PMD/PTE tables, which means the total time of the entry being
invalid is long enough to
affect the performance.

After this patch, the order is like:
1) invalidate the table entry of the 1st level(PUD)
2) flush TLB by VMID
3) install the new block entry to the 1st level(PUD)
4) unmap the old PMD/PTE tables

The change ensures that period of entry in the 1st level(PUD) being invalid is
only in 1) and 2),
so if other vcpus access memory within 1G, there will be less chance to find the
entry invalid
and as a result trigger an unnecessary translation fault.
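
To map these steps onto the code: steps 1)-3) all happen in
stage2_coalesce_tables_into_block() (quoted from the previous patch of this
series further down the thread; the step numbers are added here as comments),
and step 4) is done later from the walker's table-post callback:

static void stage2_coalesce_tables_into_block(u64 addr, u32 level,
					      kvm_pte_t *ptep,
					      struct stage2_map_data *data)
{
	u64 granule = kvm_granule_size(level), phys = data->phys;
	kvm_pte_t new = kvm_init_valid_leaf_pte(phys, data->attr, level);

	/* 1) invalidate the table entry of this level */
	kvm_set_invalid_pte(ptep);

	/*
	 * 2) flush TLB by VMID: invalidate the whole stage-2, as we may have
	 *    numerous leaf entries below us which would otherwise need
	 *    invalidating individually.
	 */
	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);

	/* 3) install the new block entry */
	smp_store_release(ptep, new);
	data->phys += granule;
}

/*
 * 4) the old PMD/PTE tables are unmapped afterwards, from the walker's
 *    table-post callback, using the sub-table pointer saved in data->follow.
 */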

Thank you for the explanation, that was my understanding of it also, and I believe
your idea is correct. I was more concerned that I got some of the details wrong,
and you have kindly corrected me below.


Signed-off-by: Yanan Wang 
---
    arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
    1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 78a560446f80..308c36b9cd21 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -434,6 +434,7 @@ struct stage2_map_data {
 	kvm_pte_t			attr;
 
 	kvm_pte_t			*anchor;
+	kvm_pte_t			*follow;
 
 	struct kvm_s2_mmu		*mmu;
 	struct kvm_mmu_memory_cache	*memcache;
@@ -553,15 +554,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	if (!kvm_block_mapping_supported(addr, end, data->phys, level))
 		return 0;
 
-	kvm_set_invalid_pte(ptep);
-
 	/*
-	 * Invalidate the whole stage-2, as we may have numerous leaf
-	 * entries below us which would otherwise need invalidating
-	 * individually.
+	 * If we need to coalesce existing table entries into a block here,
+	 * then install the block entry first and the sub-level page mappings
+	 * will be unmapped later.
 	 */
-	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
 	data->anchor = ptep;
+	data->follow = kvm_pte_follow(*ptep);
+	stage2_coalesce_tables_into_block(addr, level, ptep, data);

Here's how stage2_coalesce_tables_into_block() is implemented from the previous
patch (it might be worth merging it with this patch, I found it impossible to
judge if the function is correct without seeing how it is used and what it is
replacing):

Ok, will do this if a v2 is going to be posted.

static void stage2_coalesce_tables_into_block(u64 addr, u32 level,
                     kvm_pte_t *ptep,
                     struct stage2_map_data *data)
{
   u64 granule = kvm_granule_size(level), phys = data->phys;
   kvm_pte_t new = kvm_init_valid_leaf_pte(phys, data->attr, level);

   kvm_set_invalid_pte(ptep);

   /*
    * Invalidate the whole stage-2, as we may have numerous leaf entries
    

Re: [RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings

2021-03-19 Thread Alexandru Elisei
Hi Yanan,

Sorry for taking so long to reply, been busy with other things unfortunately. I
did notice that you sent a new version of this series, but I would like to
continue our discussion on this patch, since it's easier to get the full 
context.

On 3/4/21 7:07 AM, wangyanan (Y) wrote:
> Hi Alex,
>
> On 2021/3/4 1:27, Alexandru Elisei wrote:
>> Hi Yanan,
>>
>> On 3/3/21 11:04 AM, wangyanan (Y) wrote:
>>> Hi Alex,
>>>
>>> On 2021/3/3 1:13, Alexandru Elisei wrote:
 Hello,

 On 2/8/21 11:22 AM, Yanan Wang wrote:
> When KVM needs to coalesce the normal page mappings into a block mapping,
> we currently invalidate the old table entry first followed by invalidation
> of TLB, then unmap the page mappings, and install the block entry at last.
>
> It will cost a long time to unmap the numerous page mappings, which means
> there will be a long period when the table entry can be found invalid.
> If other vCPUs access any guest page within the block range and find the
> table entry invalid, they will all exit from guest with a translation 
> fault
> which is not necessary. And KVM will make efforts to handle these faults,
> especially when performing CMOs by block range.
>
> So let's quickly install the block entry at first to ensure uninterrupted
> memory access of the other vCPUs, and then unmap the page mappings after
> installation. This will reduce most of the time when the table entry is
> invalid, and avoid most of the unnecessary translation faults.
 I'm not convinced I've fully understood what is going on yet, but it seems 
 to me
 that the idea is sound. Some questions and comments below.
>>> What I am trying to do in this patch is to adjust the order of rebuilding 
>>> block
>>> mappings from page mappings.
>>> Take the rebuilding of 1G block mappings as an example.
>>> Before this patch, the order is like:
>>> 1) invalidate the table entry of the 1st level(PUD)
>>> 2) flush TLB by VMID
>>> 3) unmap the old PMD/PTE tables
>>> 4) install the new block entry to the 1st level(PUD)
>>>
>>> So entry in the 1st level can be found invalid by other vcpus in 1), 2), 
>>> and 3),
>>> and it's a long time in 3) to unmap
>>> the numerous old PMD/PTE tables, which means the total time of the entry 
>>> being
>>> invalid is long enough to
>>> affect the performance.
>>>
>>> After this patch, the order is like:
>>> 1) invalidate the table entry of the 1st level(PUD)
>>> 2) flush TLB by VMID
>>> 3) install the new block entry to the 1st level(PUD)
>>> 4) unmap the old PMD/PTE tables
>>>
>>> The change ensures that period of entry in the 1st level(PUD) being invalid 
>>> is
>>> only in 1) and 2),
>>> so if other vcpus access memory within 1G, there will be less chance to 
>>> find the
>>> entry invalid
>>> and as a result trigger an unnecessary translation fault.
>> Thank you for the explanation, that was my understanding of it also, and I 
>> believe
>> your idea is correct. I was more concerned that I got some of the details 
>> wrong,
>> and you have kindly corrected me below.
>>
> Signed-off-by: Yanan Wang 
> ---
>    arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
>    1 file changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 78a560446f80..308c36b9cd21 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -434,6 +434,7 @@ struct stage2_map_data {
>    kvm_pte_t    attr;
>      kvm_pte_t    *anchor;
> +    kvm_pte_t    *follow;
>      struct kvm_s2_mmu    *mmu;
>    struct kvm_mmu_memory_cache    *memcache;
> @@ -553,15 +554,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 
> end,
> u32 level,
>    if (!kvm_block_mapping_supported(addr, end, data->phys, level))
>    return 0;
>    -    kvm_set_invalid_pte(ptep);
> -
>    /*
> - * Invalidate the whole stage-2, as we may have numerous leaf
> - * entries below us which would otherwise need invalidating
> - * individually.
> + * If we need to coalesce existing table entries into a block here,
> + * then install the block entry first and the sub-level page mappings
> + * will be unmapped later.
>     */
> -    kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
>    data->anchor = ptep;
> +    data->follow = kvm_pte_follow(*ptep);
> +    stage2_coalesce_tables_into_block(addr, level, ptep, data);
 Here's how stage2_coalesce_tables_into_block() is implemented from the 
 previous
 patch (it might be worth merging it with this patch, I found it impossible 
 to
>> judge if the function is correct without seeing how it is used and what it is
 replacing):
>>> Ok, will do this if a v2 is going to be posted.
 static void 

Re: [RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings

2021-03-03 Thread wangyanan (Y)



On 2021/3/4 15:07, wangyanan (Y) wrote:

Hi Alex,

On 2021/3/4 1:27, Alexandru Elisei wrote:

Hi Yanan,

On 3/3/21 11:04 AM, wangyanan (Y) wrote:

Hi Alex,

On 2021/3/3 1:13, Alexandru Elisei wrote:

Hello,

On 2/8/21 11:22 AM, Yanan Wang wrote:
When KVM needs to coalesce the normal page mappings into a block mapping,
we currently invalidate the old table entry first followed by invalidation
of TLB, then unmap the page mappings, and install the block entry at last.

It will cost a long time to unmap the numerous page mappings, which means
there will be a long period when the table entry can be found invalid.
If other vCPUs access any guest page within the block range and find the
table entry invalid, they will all exit from guest with a translation fault
which is not necessary. And KVM will make efforts to handle these faults,
especially when performing CMOs by block range.

So let's quickly install the block entry at first to ensure uninterrupted
memory access of the other vCPUs, and then unmap the page mappings after
installation. This will reduce most of the time when the table entry is
invalid, and avoid most of the unnecessary translation faults.

I'm not convinced I've fully understood what is going on yet, but it seems to me
that the idea is sound. Some questions and comments below.
What I am trying to do in this patch is to adjust the order of rebuilding block
mappings from page mappings.
Take the rebuilding of 1G block mappings as an example.
Before this patch, the order is like:
1) invalidate the table entry of the 1st level(PUD)
2) flush TLB by VMID
3) unmap the old PMD/PTE tables
4) install the new block entry to the 1st level(PUD)

So entry in the 1st level can be found invalid by other vcpus in 1), 2), and 3),
and it's a long time in 3) to unmap the numerous old PMD/PTE tables, which means
the total time of the entry being invalid is long enough to affect the performance.

After this patch, the order is like:
1) invalidate the table entry of the 1st level(PUD)
2) flush TLB by VMID
3) install the new block entry to the 1st level(PUD)
4) unmap the old PMD/PTE tables

The change ensures that the period of the entry in the 1st level(PUD) being invalid
is only in 1) and 2), so if other vcpus access memory within 1G, there will be less
chance to find the entry invalid and as a result trigger an unnecessary translation
fault.
Thank you for the explanation, that was my understanding of it also, and I believe
your idea is correct. I was more concerned that I got some of the details wrong,
and you have kindly corrected me below.


Signed-off-by: Yanan Wang 
---
   arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
   1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 78a560446f80..308c36b9cd21 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -434,6 +434,7 @@ struct stage2_map_data {
 	kvm_pte_t			attr;
 
 	kvm_pte_t			*anchor;
+	kvm_pte_t			*follow;
 
 	struct kvm_s2_mmu		*mmu;
 	struct kvm_mmu_memory_cache	*memcache;
@@ -553,15 +554,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	if (!kvm_block_mapping_supported(addr, end, data->phys, level))
 		return 0;
 
-	kvm_set_invalid_pte(ptep);
-
 	/*
-	 * Invalidate the whole stage-2, as we may have numerous leaf
-	 * entries below us which would otherwise need invalidating
-	 * individually.
+	 * If we need to coalesce existing table entries into a block here,
+	 * then install the block entry first and the sub-level page mappings
+	 * will be unmapped later.
 	 */
-	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
 	data->anchor = ptep;
+	data->follow = kvm_pte_follow(*ptep);
+	stage2_coalesce_tables_into_block(addr, level, ptep, data);
Here's how stage2_coalesce_tables_into_block() is implemented from the previous
patch (it might be worth merging it with this patch, I found it impossible to
judge if the function is correct without seeing how it is used and what it is
replacing):

Ok, will do this if a v2 is going to be posted.

static void stage2_coalesce_tables_into_block(u64 addr, u32 level,
					      kvm_pte_t *ptep,
					      struct stage2_map_data *data)
{
	u64 granule = kvm_granule_size(level), phys = data->phys;
	kvm_pte_t new = kvm_init_valid_leaf_pte(phys, data->attr, level);

	kvm_set_invalid_pte(ptep);

	/*
	 * Invalidate the whole stage-2, as we may have numerous leaf entries
	 * below us which would otherwise need invalidating individually.
	 */
	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
	smp_store_release(ptep, new);
	data->phys += granule;
}

This works because __kvm_pgtable_visit() saves the *ptep value 
before calling the
pre callback, and it visits the next 

Re: [RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings

2021-03-03 Thread wangyanan (Y)

Hi Alex,

On 2021/3/4 1:27, Alexandru Elisei wrote:

Hi Yanan,

On 3/3/21 11:04 AM, wangyanan (Y) wrote:

Hi Alex,

On 2021/3/3 1:13, Alexandru Elisei wrote:

Hello,

On 2/8/21 11:22 AM, Yanan Wang wrote:

When KVM needs to coalesce the normal page mappings into a block mapping,
we currently invalidate the old table entry first followed by invalidation
of TLB, then unmap the page mappings, and install the block entry at last.

It will cost a long time to unmap the numerous page mappings, which means
there will be a long period when the table entry can be found invalid.
If other vCPUs access any guest page within the block range and find the
table entry invalid, they will all exit from guest with a translation fault
which is not necessary. And KVM will make efforts to handle these faults,
especially when performing CMOs by block range.

So let's quickly install the block entry at first to ensure uninterrupted
memory access of the other vCPUs, and then unmap the page mappings after
installation. This will reduce most of the time when the table entry is
invalid, and avoid most of the unnecessary translation faults.

I'm not convinced I've fully understood what is going on yet, but it seems to me
that the idea is sound. Some questions and comments below.

What I am trying to do in this patch is to adjust the order of rebuilding block
mappings from page mappings.
Take the rebuilding of 1G block mappings as an example.
Before this patch, the order is like:
1) invalidate the table entry of the 1st level(PUD)
2) flush TLB by VMID
3) unmap the old PMD/PTE tables
4) install the new block entry to the 1st level(PUD)

So entry in the 1st level can be found invalid by other vcpus in 1), 2), and 3),
and it's a long time in 3) to unmap
the numerous old PMD/PTE tables, which means the total time of the entry being
invalid is long enough to
affect the performance.

After this patch, the order is like:
1) invalidate the table entry of the 1st level(PUD)
2) flush TLB by VMID
3) install the new block entry to the 1st level(PUD)
4) unmap the old PMD/PTE tables

The change ensures that period of entry in the 1st level(PUD) being invalid is
only in 1) and 2),
so if other vcpus access memory within 1G, there will be less chance to find the
entry invalid
and as a result trigger an unnecessary translation fault.

Thank you for the explanation, that was my understanding of it also, and I believe
your idea is correct. I was more concerned that I got some of the details wrong,
and you have kindly corrected me below.


Signed-off-by: Yanan Wang 
---
   arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
   1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 78a560446f80..308c36b9cd21 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -434,6 +434,7 @@ struct stage2_map_data {
 	kvm_pte_t			attr;
 
 	kvm_pte_t			*anchor;
+	kvm_pte_t			*follow;
 
 	struct kvm_s2_mmu		*mmu;
 	struct kvm_mmu_memory_cache	*memcache;
@@ -553,15 +554,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	if (!kvm_block_mapping_supported(addr, end, data->phys, level))
 		return 0;
 
-	kvm_set_invalid_pte(ptep);
-
 	/*
-	 * Invalidate the whole stage-2, as we may have numerous leaf
-	 * entries below us which would otherwise need invalidating
-	 * individually.
+	 * If we need to coalesce existing table entries into a block here,
+	 * then install the block entry first and the sub-level page mappings
+	 * will be unmapped later.
 	 */
-	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
 	data->anchor = ptep;
+	data->follow = kvm_pte_follow(*ptep);
+	stage2_coalesce_tables_into_block(addr, level, ptep, data);

Here's how stage2_coalesce_tables_into_block() is implemented from the previous
patch (it might be worth merging it with this patch, I found it impossible to
judge if the function is correct without seeing how it is used and what it is
replacing):

Ok, will do this if a v2 is going to be posted.

static void stage2_coalesce_tables_into_block(u64 addr, u32 level,
                    kvm_pte_t *ptep,
                    struct stage2_map_data *data)
{
  u64 granule = kvm_granule_size(level), phys = data->phys;
  kvm_pte_t new = kvm_init_valid_leaf_pte(phys, data->attr, level);

  kvm_set_invalid_pte(ptep);

  /*
   * Invalidate the whole stage-2, as we may have numerous leaf entries
   * below us which would otherwise need invalidating individually.
   */
  kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
  smp_store_release(ptep, new);
  data->phys += granule;
}

This works because __kvm_pgtable_visit() saves the *ptep value before calling 
the
pre callback, and it visits the next level table based on the initial pte value,
not the new value written by 

Re: [RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings

2021-03-03 Thread Alexandru Elisei
Hi Yanan,

On 3/3/21 11:04 AM, wangyanan (Y) wrote:
> Hi Alex,
>
> On 2021/3/3 1:13, Alexandru Elisei wrote:
>> Hello,
>>
>> On 2/8/21 11:22 AM, Yanan Wang wrote:
>>> When KVM needs to coalesce the normal page mappings into a block mapping,
>>> we currently invalidate the old table entry first followed by invalidation
>>> of TLB, then unmap the page mappings, and install the block entry at last.
>>>
>>> It will cost a long time to unmap the numerous page mappings, which means
>>> there will be a long period when the table entry can be found invalid.
>>> If other vCPUs access any guest page within the block range and find the
>>> table entry invalid, they will all exit from guest with a translation fault
>>> which is not necessary. And KVM will make efforts to handle these faults,
>>> especially when performing CMOs by block range.
>>>
>>> So let's quickly install the block entry at first to ensure uninterrupted
>>> memory access of the other vCPUs, and then unmap the page mappings after
>>> installation. This will reduce most of the time when the table entry is
>>> invalid, and avoid most of the unnecessary translation faults.
>> I'm not convinced I've fully understood what is going on yet, but it seems 
>> to me
>> that the idea is sound. Some questions and comments below.
> What I am trying to do in this patch is to adjust the order of rebuilding 
> block
> mappings from page mappings.
> Take the rebuilding of 1G block mappings as an example.
> Before this patch, the order is like:
> 1) invalidate the table entry of the 1st level(PUD)
> 2) flush TLB by VMID
> 3) unmap the old PMD/PTE tables
> 4) install the new block entry to the 1st level(PUD)
>
> So entry in the 1st level can be found invalid by other vcpus in 1), 2), and 
> 3),
> and it's a long time in 3) to unmap
> the numerous old PMD/PTE tables, which means the total time of the entry being
> invalid is long enough to
> affect the performance.
>
> After this patch, the order is like:
> 1) invalidate the table entry of the 1st level(PUD)
> 2) flush TLB by VMID
> 3) install the new block entry to the 1st level(PUD)
> 4) unmap the old PMD/PTE tables
>
> The change ensures that period of entry in the 1st level(PUD) being invalid is
> only in 1) and 2),
> so if other vcpus access memory within 1G, there will be less chance to find 
> the
> entry invalid
> and as a result trigger an unnecessary translation fault.

Thank you for the explanation, that was my understanding of it also, and I believe
your idea is correct. I was more concerned that I got some of the details wrong,
and you have kindly corrected me below.

>>> Signed-off-by: Yanan Wang 
>>> ---
>>>   arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
>>>   1 file changed, 12 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>> index 78a560446f80..308c36b9cd21 100644
>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>> @@ -434,6 +434,7 @@ struct stage2_map_data {
>>>   kvm_pte_t    attr;
>>>     kvm_pte_t    *anchor;
>>> +    kvm_pte_t    *follow;
>>>     struct kvm_s2_mmu    *mmu;
>>>   struct kvm_mmu_memory_cache    *memcache;
>>> @@ -553,15 +554,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 
>>> end,
>>> u32 level,
>>>   if (!kvm_block_mapping_supported(addr, end, data->phys, level))
>>>   return 0;
>>>   -    kvm_set_invalid_pte(ptep);
>>> -
>>>   /*
>>> - * Invalidate the whole stage-2, as we may have numerous leaf
>>> - * entries below us which would otherwise need invalidating
>>> - * individually.
>>> + * If we need to coalesce existing table entries into a block here,
>>> + * then install the block entry first and the sub-level page mappings
>>> + * will be unmapped later.
>>>    */
>>> -    kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
>>>   data->anchor = ptep;
>>> +    data->follow = kvm_pte_follow(*ptep);
>>> +    stage2_coalesce_tables_into_block(addr, level, ptep, data);
>> Here's how stage2_coalesce_tables_into_block() is implemented from the 
>> previous
>> patch (it might be worth merging it with this patch, I found it impossible to
>> judge if the function is correct without seeing how it is used and what it is
>> replacing):
> Ok, will do this if a v2 is going to be posted.
>> static void stage2_coalesce_tables_into_block(u64 addr, u32 level,
>>                    kvm_pte_t *ptep,
>>                    struct stage2_map_data *data)
>> {
>>  u64 granule = kvm_granule_size(level), phys = data->phys;
>>  kvm_pte_t new = kvm_init_valid_leaf_pte(phys, data->attr, level);
>>
>>  kvm_set_invalid_pte(ptep);
>>
>>  /*
>>   * Invalidate the whole stage-2, as we may have numerous leaf entries
>>   * below us which would otherwise need invalidating individually.
>>   */
>>  kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
>>  

Re: [RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings

2021-03-03 Thread wangyanan (Y)

Hi Alex,

On 2021/3/3 1:13, Alexandru Elisei wrote:

Hello,

On 2/8/21 11:22 AM, Yanan Wang wrote:

When KVM needs to coalesce the normal page mappings into a block mapping,
we currently invalidate the old table entry first followed by invalidation
of TLB, then unmap the page mappings, and install the block entry at last.

It will cost a long time to unmap the numerous page mappings, which means
there will be a long period when the table entry can be found invalid.
If other vCPUs access any guest page within the block range and find the
table entry invalid, they will all exit from guest with a translation fault
which is not necessary. And KVM will make efforts to handle these faults,
especially when performing CMOs by block range.

So let's quickly install the block entry at first to ensure uninterrupted
memory access of the other vCPUs, and then unmap the page mappings after
installation. This will reduce most of the time when the table entry is
invalid, and avoid most of the unnecessary translation faults.

I'm not convinced I've fully understood what is going on yet, but it seems to me
that the idea is sound. Some questions and comments below.

What I am trying to do in this patch is to adjust the order of rebuilding block
mappings from page mappings.

Take the rebuilding of 1G block mappings as an example.
Before this patch, the order is like:
1) invalidate the table entry of the 1st level(PUD)
2) flush TLB by VMID
3) unmap the old PMD/PTE tables
4) install the new block entry to the 1st level(PUD)

So entry in the 1st level can be found invalid by other vcpus in 1), 2), and 3),
and it's a long time in 3) to unmap the numerous old PMD/PTE tables, which means
the total time of the entry being invalid is long enough to affect the performance.

After this patch, the order is like:
1) invalidate the table entry of the 1st level(PUD)
2) flush TLB by VMID
3) install the new block entry to the 1st level(PUD)
4) unmap the old PMD/PTE tables

The change ensures that the period of the entry in the 1st level(PUD) being invalid
is only in 1) and 2), so if other vcpus access memory within 1G, there will be less
chance to find the entry invalid and as a result trigger an unnecessary translation
fault.

Signed-off-by: Yanan Wang 
---
  arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
  1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 78a560446f80..308c36b9cd21 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -434,6 +434,7 @@ struct stage2_map_data {
kvm_pte_t   attr;
  
  	kvm_pte_t			*anchor;

+   kvm_pte_t   *follow;
  
  	struct kvm_s2_mmu		*mmu;

struct kvm_mmu_memory_cache *memcache;
@@ -553,15 +554,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
if (!kvm_block_mapping_supported(addr, end, data->phys, level))
return 0;
  
-	kvm_set_invalid_pte(ptep);

-
/*
-* Invalidate the whole stage-2, as we may have numerous leaf
-* entries below us which would otherwise need invalidating
-* individually.
+* If we need to coalesce existing table entries into a block here,
+* then install the block entry first and the sub-level page mappings
+* will be unmapped later.
 */
-   kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
data->anchor = ptep;
+   data->follow = kvm_pte_follow(*ptep);
+   stage2_coalesce_tables_into_block(addr, level, ptep, data);

Here's how stage2_coalesce_tables_into_block() is implemented from the previous
patch (it might be worth merging it with this patch, I found it impossible to
judge if the function is correct without seeing how it is used and what it is
replacing):

Ok, will do this if a v2 is going to be posted.

static void stage2_coalesce_tables_into_block(u64 addr, u32 level,
                       kvm_pte_t *ptep,
                       struct stage2_map_data *data)
{
     u64 granule = kvm_granule_size(level), phys = data->phys;
     kvm_pte_t new = kvm_init_valid_leaf_pte(phys, data->attr, level);

     kvm_set_invalid_pte(ptep);

     /*
      * Invalidate the whole stage-2, as we may have numerous leaf entries
      * below us which would otherwise need invalidating individually.
      */
     kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
     smp_store_release(ptep, new);
     data->phys += granule;
}

This works because __kvm_pgtable_visit() saves the *ptep value before calling the
pre callback, and it visits the next level table based on the initial pte value,
not the new value written by stage2_coalesce_tables_into_block().

Right. So before replacing the initial pte value with the new value, we have to use
*data->follow = kvm_pte_follow(*ptep)* in stage2_map_walk_table_pre() to save
the initial pte value in advance. And data->follow will be used when we start to


Re: [RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings

2021-03-02 Thread Alexandru Elisei
Hello,

On 2/8/21 11:22 AM, Yanan Wang wrote:
> When KVM needs to coalesce the normal page mappings into a block mapping,
> we currently invalidate the old table entry first followed by invalidation
> of TLB, then unmap the page mappings, and install the block entry at last.
>
> It will cost a long time to unmap the numerous page mappings, which means
> there will be a long period when the table entry can be found invalid.
> If other vCPUs access any guest page within the block range and find the
> table entry invalid, they will all exit from guest with a translation fault
> which is not necessary. And KVM will make efforts to handle these faults,
> especially when performing CMOs by block range.
>
> So let's quickly install the block entry at first to ensure uninterrupted
> memory access of the other vCPUs, and then unmap the page mappings after
> installation. This will reduce most of the time when the table entry is
> invalid, and avoid most of the unnecessary translation faults.

I'm not convinced I've fully understood what is going on yet, but it seems to me
that the idea is sound. Some questions and comments below.

>
> Signed-off-by: Yanan Wang 
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 78a560446f80..308c36b9cd21 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -434,6 +434,7 @@ struct stage2_map_data {
>   kvm_pte_t   attr;
>  
>   kvm_pte_t   *anchor;
> + kvm_pte_t   *follow;
>  
>   struct kvm_s2_mmu   *mmu;
>   struct kvm_mmu_memory_cache *memcache;
> @@ -553,15 +554,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, 
> u32 level,
>   if (!kvm_block_mapping_supported(addr, end, data->phys, level))
>   return 0;
>  
> - kvm_set_invalid_pte(ptep);
> -
>   /*
> -  * Invalidate the whole stage-2, as we may have numerous leaf
> -  * entries below us which would otherwise need invalidating
> -  * individually.
> +  * If we need to coalesce existing table entries into a block here,
> +  * then install the block entry first and the sub-level page mappings
> +  * will be unmapped later.
>*/
> - kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
>   data->anchor = ptep;
> + data->follow = kvm_pte_follow(*ptep);
> + stage2_coalesce_tables_into_block(addr, level, ptep, data);

Here's how stage2_coalesce_tables_into_block() is implemented from the previous
patch (it might be worth merging it with this patch, I found it impossible to
judge if the function is correct without seeing how it is used and what it is 
replacing):

static void stage2_coalesce_tables_into_block(u64 addr, u32 level,
                      kvm_pte_t *ptep,
                      struct stage2_map_data *data)
{
    u64 granule = kvm_granule_size(level), phys = data->phys;
    kvm_pte_t new = kvm_init_valid_leaf_pte(phys, data->attr, level);

    kvm_set_invalid_pte(ptep);

    /*
     * Invalidate the whole stage-2, as we may have numerous leaf entries
     * below us which would otherwise need invalidating individually.
     */
    kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
    smp_store_release(ptep, new);
    data->phys += granule;
}

This works because __kvm_pgtable_visit() saves the *ptep value before calling the
pre callback, and it visits the next level table based on the initial pte value,
not the new value written by stage2_coalesce_tables_into_block().
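
As a simplified illustration of the property being relied on here (this is not
the real walker code; visit_table_entry() and walk_next_level() are made-up
placeholder names for the sketch):

/* Simplified illustration only -- not the actual __kvm_pgtable_visit(). */
static void visit_table_entry(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
			      struct stage2_map_data *data)
{
	kvm_pte_t pte = *ptep;	/* the entry is read once, before any callback */

	/* The pre callback may rewrite *ptep, e.g. install the block entry... */
	stage2_map_walk_table_pre(addr, end, level, ptep, data);

	/*
	 * ...but the descent uses the value saved above, so the old sub-level
	 * tables can still be visited (and later freed) even though the table
	 * entry now holds a block mapping.
	 */
	walk_next_level(kvm_pte_follow(pte), level + 1, data);	/* placeholder */
}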

Assuming the first patch in the series is merged ("KVM: arm64: Move the clean of
dcache to the map handler"), this function is missing the CMOs from
stage2_map_walker_try_leaf(). I can think of the following situation where they
are needed:

1. The 2nd level (PMD) table that will be turned into a block is mapped at stage 2
because one of the pages in the 3rd level (PTE) table it points to is accessed by
the guest.

2. The kernel decides to turn the userspace mapping into a transparent huge page
and calls the mmu notifier to remove the mapping from stage 2. The 2nd level table
is still valid.

3. Guest accesses a page which is not the page it accessed at step 1, which causes
a translation fault. KVM decides we can use a PMD block mapping to map the address
and we end up in stage2_coalesce_tables_into_block(). We need CMOs in this case
because the guest accesses memory it didn't access before.
because the guest accesses memory it didn't access before.

What do you think, is that a valid situation?
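
If it does turn out to be needed, I would imagine something along these lines
in stage2_coalesce_tables_into_block(), between the TLB flush and the
smp_store_release(). This is only a rough sketch: stage2_pte_cacheable() and
stage2_flush_dcache() are stand-in names for whatever helpers patch 1 of the
series actually provides, not necessarily the real function names.

	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);

	/*
	 * Hypothetical addition (helper names are placeholders): clean the
	 * dcache for the whole block range before publishing the new entry,
	 * so that guest accesses to pages never touched under the old page
	 * tables do not see stale data.
	 */
	if (stage2_pte_cacheable(new))
		stage2_flush_dcache(kvm_pte_follow(new), granule);

	smp_store_release(ptep, new);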

>   return 0;
>  }
>  
> @@ -614,20 +614,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 
> end, u32 level,
> kvm_pte_t *ptep,
> struct stage2_map_data *data)
>  {
> - int ret = 0;
> -
>   if (!data->anchor)
>   return 0;
>  
> - free_page((unsigned 

Re: [RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings

2021-02-28 Thread wangyanan (Y)



On 2021/2/8 19:22, Yanan Wang wrote:

When KVM needs to coalesce the normal page mappings into a block mapping,
we currently invalidate the old table entry first followed by invalidation
of TLB, then unmap the page mappings, and install the block entry at last.

It will cost a long time to unmap the numerous page mappings, which means
there will be a long period when the table entry can be found invalid.
If other vCPUs access any guest page within the block range and find the
table entry invalid, they will all exit from guest with a translation fault
which is not necessary. And KVM will make efforts to handle these faults,
especially when performing CMOs by block range.

So let's quickly install the block entry at first to ensure uninterrupted
memory access of the other vCPUs, and then unmap the page mappings after
installation. This will reduce most of the time when the table entry is
invalid, and avoid most of the unnecessary translation faults.
BTW: here is the benefit of this patch alone for reference (testing based on patch 1).
This patch aims to speed up the reconstruction of block mappings (especially for
1G blocks) after they have been split, and the following test results show the
significant change.
Selftest:
https://lore.kernel.org/lkml/20210208090841.333724-1-wangyana...@huawei.com/



---

hardware platform: HiSilicon Kunpeng920 Server (FWB not supported)
host kernel: Linux mainline v5.11-rc6 (with the series at
https://lore.kernel.org/r/20210114121350.123684-4-wangyana...@huawei.com applied)

Test: multiple vcpus concurrently access 20G of memory.
Measured: execution time of KVM reconstituting the block mappings after dirty logging.


cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
   (20 vcpus, 20G memory, block mappings(HUGETLB 1G))
Before patch: KVM_ADJUST_MAPPINGS: 2.881s 2.883s 2.885s 2.879s 2.882s
After  patch: KVM_ADJUST_MAPPINGS: 0.310s 0.301s 0.312s 0.299s 0.306s  
*average 89% improvement*


cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 40
   (40 vcpus, 20G memory, block mappings(HUGETLB 1G))
Before patch: KVM_ADJUST_MAPPINGS: 2.954s 2.955s 2.949s 2.951s 2.953s
After  patch: KVM_ADJUST_MAPPINGS: 0.381s 0.366s 0.381s 0.380s 0.378s  
*average 87% improvement*


cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 60
   (60 vcpus, 20G memory, block mappings(HUGETLB 1G))
Before patch: KVM_ADJUST_MAPPINGS: 3.118s 3.112s 3.130s 3.128s 3.119s
After  patch: KVM_ADJUST_MAPPINGS: 0.524s 0.534s 0.536s 0.525s 0.539s  
*average 83% improvement*


---

Thanks,

Yanan


Signed-off-by: Yanan Wang 
---
  arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
  1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 78a560446f80..308c36b9cd21 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -434,6 +434,7 @@ struct stage2_map_data {
kvm_pte_t   attr;
  
  	kvm_pte_t			*anchor;

+   kvm_pte_t   *follow;
  
  	struct kvm_s2_mmu		*mmu;

struct kvm_mmu_memory_cache *memcache;
@@ -553,15 +554,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
if (!kvm_block_mapping_supported(addr, end, data->phys, level))
return 0;
  
-	kvm_set_invalid_pte(ptep);

-
/*
-* Invalidate the whole stage-2, as we may have numerous leaf
-* entries below us which would otherwise need invalidating
-* individually.
+* If we need to coalesce existing table entries into a block here,
+* then install the block entry first and the sub-level page mappings
+* will be unmapped later.
 */
-   kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
data->anchor = ptep;
+   data->follow = kvm_pte_follow(*ptep);
+   stage2_coalesce_tables_into_block(addr, level, ptep, data);
return 0;
  }
  
@@ -614,20 +614,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,

  kvm_pte_t *ptep,
  struct stage2_map_data *data)
  {
-   int ret = 0;
-
if (!data->anchor)
return 0;
  
-	free_page((unsigned long)kvm_pte_follow(*ptep));

-   put_page(virt_to_page(ptep));
-
-   if (data->anchor == ptep) {
+   if (data->anchor != ptep) {
+   free_page((unsigned long)kvm_pte_follow(*ptep));
+   put_page(virt_to_page(ptep));
+   } else {
+   free_page((unsigned long)data->follow);
data->anchor = NULL;
-   ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
}
  
-	return ret;

+   return 0;
  }
  
  /*


[RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings

2021-02-08 Thread Yanan Wang
When KVM needs to coalesce the normal page mappings into a block mapping,
we currently invalidate the old table entry first followed by invalidation
of TLB, then unmap the page mappings, and install the block entry at last.

It will cost a long time to unmap the numerous page mappings, which means
there will be a long period when the table entry can be found invalid.
If other vCPUs access any guest page within the block range and find the
table entry invalid, they will all exit from guest with a translation fault
which is not necessary. And KVM will make efforts to handle these faults,
especially when performing CMOs by block range.

So let's quickly install the block entry at first to ensure uninterrupted
memory access of the other vCPUs, and then unmap the page mappings after
installation. This will reduce most of the time when the table entry is
invalid, and avoid most of the unnecessary translation faults.

Signed-off-by: Yanan Wang 
---
 arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 78a560446f80..308c36b9cd21 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -434,6 +434,7 @@ struct stage2_map_data {
kvm_pte_t   attr;
 
kvm_pte_t   *anchor;
+   kvm_pte_t   *follow;
 
struct kvm_s2_mmu   *mmu;
struct kvm_mmu_memory_cache *memcache;
@@ -553,15 +554,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
if (!kvm_block_mapping_supported(addr, end, data->phys, level))
return 0;
 
-   kvm_set_invalid_pte(ptep);
-
/*
-* Invalidate the whole stage-2, as we may have numerous leaf
-* entries below us which would otherwise need invalidating
-* individually.
+* If we need to coalesce existing table entries into a block here,
+* then install the block entry first and the sub-level page mappings
+* will be unmapped later.
 */
-   kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
data->anchor = ptep;
+   data->follow = kvm_pte_follow(*ptep);
+   stage2_coalesce_tables_into_block(addr, level, ptep, data);
return 0;
 }
 
@@ -614,20 +614,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
  kvm_pte_t *ptep,
  struct stage2_map_data *data)
 {
-   int ret = 0;
-
if (!data->anchor)
return 0;
 
-   free_page((unsigned long)kvm_pte_follow(*ptep));
-   put_page(virt_to_page(ptep));
-
-   if (data->anchor == ptep) {
+   if (data->anchor != ptep) {
+   free_page((unsigned long)kvm_pte_follow(*ptep));
+   put_page(virt_to_page(ptep));
+   } else {
+   free_page((unsigned long)data->follow);
data->anchor = NULL;
-   ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
}
 
-   return ret;
+   return 0;
 }
 
 /*
-- 
2.23.0
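
For reference, here is how the two modified walker callbacks read with this
patch applied, reconstructed from the hunks above (only the regions covered by
the hunks are shown; any code above the quoted context is omitted):

static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
				     kvm_pte_t *ptep,
				     struct stage2_map_data *data)
{
	/* (any checks before this point fall outside the quoted hunk) */
	if (!kvm_block_mapping_supported(addr, end, data->phys, level))
		return 0;

	/*
	 * If we need to coalesce existing table entries into a block here,
	 * then install the block entry first and the sub-level page mappings
	 * will be unmapped later.
	 */
	data->anchor = ptep;
	data->follow = kvm_pte_follow(*ptep);
	stage2_coalesce_tables_into_block(addr, level, ptep, data);
	return 0;
}

static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
				      kvm_pte_t *ptep,
				      struct stage2_map_data *data)
{
	if (!data->anchor)
		return 0;

	if (data->anchor != ptep) {
		/* A sub-level table below the anchor has just been walked: free it. */
		free_page((unsigned long)kvm_pte_follow(*ptep));
		put_page(virt_to_page(ptep));
	} else {
		/* Back at the anchor: free the old sub-table saved in data->follow. */
		free_page((unsigned long)data->follow);
		data->anchor = NULL;
	}

	return 0;
}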