[PATCH] x86/mm/KASLR: Account for minimum padding when calculating entropy

2020-10-22 Thread Junaid Shahid
Subtract the minimum padding between regions from the initial
remain_entropy. Without this, the last region could potentially
overflow past vaddr_end if we happen to get a specific sequence
of random numbers (although extremely unlikely in practice).
The bug can be demonstrated by replacing the prandom_bytes_state()
call with "rand = entropy;", which forces every random draw to its
maximum value.

Signed-off-by: Junaid Shahid 
---
 arch/x86/mm/kaslr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 6e6b39710e5f..fe3eec30f736 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -109,7 +109,8 @@ void __init kernel_randomize_memory(void)
 	kaslr_regions[2].size_tb = DIV_ROUND_UP(vmemmap_size, 1UL << TB_SHIFT);
 
 	/* Calculate entropy available between regions */
-	remain_entropy = vaddr_end - vaddr_start;
+	remain_entropy = vaddr_end - vaddr_start -
+			 (ARRAY_SIZE(kaslr_regions) - 1) * PUD_SIZE;
 	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
 		remain_entropy -= get_padding(&kaslr_regions[i]);
 
-- 
2.29.0.rc2.309.g374f81d7ae-goog
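
For illustration, here is a stand-alone toy model of the placement loop
(the region sizes, the worst-case draw, and the fixed PUD_SIZE step in
place of the kernel's round_up() are simplifying assumptions, not the
actual kaslr.c logic). With remain_entropy initialized the old way,
maximal draws walk the cursor past vaddr_end:

/* toy_kaslr.c - illustrative only; constants are made up */
#include <stdio.h>

#define PUD_SIZE (1UL << 30)	/* padding granularity with 4-level paging */
#define NR_REGIONS 3

int main(void)
{
	unsigned long size[NR_REGIONS] = { 4 * PUD_SIZE, 2 * PUD_SIZE, 3 * PUD_SIZE };
	unsigned long vaddr_start = 0, vaddr_end = 16 * PUD_SIZE;
	unsigned long vaddr = vaddr_start;
	unsigned long remain_entropy;
	int i;

	/* buggy initialization: ignores the mandatory inter-region padding
	 * that the loop below inserts */
	remain_entropy = vaddr_end - vaddr_start;
	for (i = 0; i < NR_REGIONS; i++)
		remain_entropy -= size[i];

	for (i = 0; i < NR_REGIONS; i++) {
		unsigned long entropy = remain_entropy / (NR_REGIONS - i);
		unsigned long rand = entropy;	/* worst case: maximum draw */

		vaddr += rand;			/* random offset */
		vaddr += size[i];		/* the region itself */
		if (i < NR_REGIONS - 1)
			vaddr += PUD_SIZE;	/* minimum padding to next region */
		remain_entropy -= entropy;
	}

	printf("cursor overshoots vaddr_end by %lu bytes\n", vaddr - vaddr_end);
	return 0;
}

Initializing remain_entropy to (NR_REGIONS - 1) * PUD_SIZE less, mirroring
what the patch does for kaslr_regions, makes the overshoot vanish.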



Re: [PATCH] KVM: x86: drop erroneous mmu_check_root() from fast_pgd_switch()

2020-06-30 Thread Junaid Shahid

On 6/30/20 3:07 AM, Vitaly Kuznetsov wrote:

Undesired triple fault gets injected to L1 guest on SVM when L2 is
launched with certain CR3 values. It seems the mmu_check_root()
check in fast_pgd_switch() is wrong: first of all we don't know
if 'new_pgd' is a GPA or a nested GPA and, in case it is a nested
GPA, we can't check it with kvm_is_visible_gfn().

The problematic code path is:
nested_svm_vmrun()
   ...
   nested_prepare_vmcb_save()
 kvm_set_cr3(..., nested_vmcb->save.cr3)
   kvm_mmu_new_pgd()
 ...
 mmu_check_root() -> TRIPLE FAULT

The mmu_check_root() check in fast_pgd_switch() seems to be
superfluous even for non-nested case: when GPA is outside of the
visible range cached_root_available() will fail for non-direct
roots (as we can't have a matching one on the list) and we don't
seem to care for direct ones.

Also, raising #TF immediately when a non-existent GFN is written to CR3
doesn't seem to match architectural behavior.

Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path")
Signed-off-by: Vitaly Kuznetsov 
---
- The patch fixes the immediate issue and doesn't seem to break any
   tests even with shadow PT but I'm not sure I properly understood
   why the check was there in the first place. Please review!
---
  arch/x86/kvm/mmu/mmu.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 76817d13c86e..286c74d2ae8d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4277,8 +4277,7 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 	 */
 	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
 	    mmu->root_level >= PT64_ROOT_4LEVEL)
-		return !mmu_check_root(vcpu, new_pgd >> PAGE_SHIFT) &&
-		       cached_root_available(vcpu, new_pgd, new_role);
+		return cached_root_available(vcpu, new_pgd, new_role);
 
 	return false;
 }



The check does seem superfluous, so should be ok to remove. Though I think that 
fast_pgd_switch() really should be getting only L1 GPAs. Otherwise, there could 
be confusion between the same GPAs from two different L2s.

IIUC, at least on Intel, only L1 CR3s (including shadow L1 CR3s for L2) or L1 EPTPs should 
get to fast_pgd_switch(). But I am not familiar enough with SVM to see why an L2 GPA would 
end up there. From a cursory look, it seems that until commit 978ce5837c7e
("KVM: SVM: always update CR3 in VMCB"), enter_svm_guest_mode() was calling
kvm_set_cr3() only when using
shadow paging, in which case I assume that nested_vmcb->save.cr3 would have been an L1 
CR3 shadowing the L2 CR3, correct? But now kvm_set_cr3() is called even when not using 
shadow paging, which I suppose is how we are getting the L2 CR3. Should we skip calling 
fast_pgd_switch() in that particular case?

Thanks,
Junaid


Re: [PATCH v2 2/3] KVM: x86: fix nested guest live migration with PML

2019-09-27 Thread Junaid Shahid
On 9/27/19 4:15 AM, Paolo Bonzini wrote:
> Shadow paging is fundamentally incompatible with the page-modification
> log, because the GPAs in the log come from the wrong memory map.
> In particular, for the EPT page-modification log, the GPAs in the log come
> from L2 rather than L1.  (If there was a non-EPT page-modification log,
> we couldn't use it for shadow paging because it would log GVAs rather
> than GPAs).
> 
> Therefore, we need to rely on write protection to record dirty pages.
> This has the side effect of bypassing PML, since writes now result in an
> EPT violation vmexit.
> 
> This is relatively easy to add to KVM, because pretty much the only place
> that needs changing is spte_clear_dirty.  The first access to the page
> already goes through the page fault path and records the correct GPA;
> it's only subsequent accesses that are wrong.  Therefore, we can equip
> set_spte (where the first access happens) to record that the SPTE will
> have to be write protected, and then spte_clear_dirty will use this
> information to do the right thing.
> 
> Signed-off-by: Paolo Bonzini 
> ---

Reviewed-by: Junaid Shahid 


Re: [PATCH 2/3] KVM: x86: fix nested guest live migration with PML

2019-09-26 Thread Junaid Shahid
On 9/26/19 10:18 AM, Paolo Bonzini wrote:
> @@ -1597,8 +1615,11 @@ static bool spte_clear_dirty(u64 *sptep)
>  
>   rmap_printk("rmap_clear_dirty: spte %p %llx\n", sptep, *sptep);
>  
> - spte &= ~shadow_dirty_mask;
> + WARN_ON(!spte_ad_enabled(spte));
> + if (spte_ad_need_write_protect(spte))
> + return spte_write_protect(sptep, false);
>  
> + spte &= ~shadow_dirty_mask;
>   return mmu_spte_update(sptep, spte);
>  }
>  

I think that it would be a bit cleaner to move the spte_ad_need_write_protect() 
check to the if statement inside __rmap_clear_dirty() instead, since it already 
checks for spte_ad_enabled() to decide between write-protection and 
dirty-clearing.
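
Concretely, the loop in __rmap_clear_dirty() could pick the behavior itself,
along these lines (a sketch only; the helper names are taken from this series
and may not match the final code exactly):

	for_each_rmap_spte(rmap_head, &iter, sptep)
		/* decide between dirty-clearing and write protection here,
		 * instead of inside spte_clear_dirty() */
		if (spte_ad_enabled(*sptep) &&
		    !spte_ad_need_write_protect(*sptep))
			flush |= spte_clear_dirty(sptep);
		else
			flush |= spte_write_protect(sptep, false);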


Reviewed-by: Junaid Shahid 


Re: [PATCH 1/3] KVM: x86: assign two bits to track SPTE kinds

2019-09-26 Thread Junaid Shahid


On 9/26/19 10:18 AM, Paolo Bonzini wrote:
> Currently, we are overloading SPTE_SPECIAL_MASK to mean both
> "A/D bits unavailable" and MMIO, where the difference between the
> two is determined by mmio_mask and mmio_value.
> 
> However, the next patch will need two bits to distinguish
> availability of A/D bits from write protection.  So, while at
> it give MMIO its own bit pattern, and move the two bits from
> bit 62 to bits 52..53 since Intel is allocating EPT page table
> bits from the top.
> 
> Signed-off-by: Paolo Bonzini 
> ---

Reviewed-by: Junaid Shahid 


Re: [PATCH 3/3] kvm: introduce manual dirty log reprotect

2018-11-26 Thread Junaid Shahid
On 11/26/2018 08:54 AM, Paolo Bonzini wrote:
> There are two problems with KVM_GET_DIRTY_LOG.  First, and less important,
> it can take kvm->mmu_lock for an extended period of time.  Second, its user
> can actually see many false positives in some cases.  The latter is due
> to a benign race like this:
> 
>   1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
>  them.
>   2. The guest modifies the pages, causing them to be marked dirty.
>   3. Userspace actually copies the pages.
>   4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
>  they were not written to since (3).
> 
> This is especially a problem for large guests, where the time between
> (1) and (3) can be substantial.  This patch introduces a new
> capability which, when enabled, makes KVM_GET_DIRTY_LOG not
> write-protect the pages it returns.  Instead, userspace has to
> explicitly clear the dirty log bits just before using the content
> of the page.  The new KVM_CLEAR_DIRTY_LOG ioctl can operate on a
> 64-page granularity rather than requiring to sync a full memslot.
> This way the mmu_lock is taken for small amounts of time, and
> only a small amount of time will pass between write protection
> of pages and the sending of their content.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  Documentation/virtual/kvm/api.txt  |  65 +++
>  arch/mips/kvm/mips.c   |  23 
>  arch/x86/kvm/x86.c |  27 +
>  include/linux/kvm_host.h   |   5 +
>  include/uapi/linux/kvm.h   |  15 +++
>  tools/testing/selftests/kvm/Makefile   |   2 +
>  tools/testing/selftests/kvm/clear_dirty_log_test.c |   2 +
>  tools/testing/selftests/kvm/dirty_log_test.c   |  19 
>  tools/testing/selftests/kvm/include/kvm_util.h |   2 +
>  tools/testing/selftests/kvm/lib/kvm_util.c |  13 +++
>  virt/kvm/arm/arm.c |  16 +++
>  virt/kvm/kvm_main.c| 120 
> ++---
>  12 files changed, 293 insertions(+), 16 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/clear_dirty_log_test.c
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 1071c10cf1c7..859927fb0b9c 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -305,6 +305,9 @@ the address space for which you want to return the dirty 
> bitmap.
>  They must be less than the value that KVM_CHECK_EXTENSION returns for
>  the KVM_CAP_MULTI_ADDRESS_SPACE capability.
>  
> +The bits in the dirty bitmap are cleared before the ioctl returns, unless
> +KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is enabled.  For more information,
> +see the description of the capability.
>  
>  4.9 KVM_SET_MEMORY_ALIAS
>  
> @@ -3758,6 +3761,44 @@ Coalesced pio is based on coalesced mmio. There is 
> little difference
>  between coalesced mmio and pio except that coalesced pio records accesses
>  to I/O ports.
>  
> +4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
> +
> +Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_dirty_log (in/out)

Shouldn't this be just (in) rather than (in/out)?

> +Returns: 0 on success, -1 on error
> +
> +/* for KVM_CLEAR_DIRTY_LOG */
> +struct kvm_clear_dirty_log {
> + __u32 slot;
> + __u32 num_pages;
> + __u64 first_page;
> + union {
> + void __user *dirty_bitmap; /* one bit per page */
> + __u64 padding;
> + };
> +};
> +
> +The ioctl write-protects pages in a memory slot according to the
> +bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap
> +field.  Bit 0 of the bitmap corresponds to page "first_page" in the
> +memory slot, and num_pages is the size in bits of the input bitmap.
> +Both first_page and num_pages must be a multiple of 64.  For each
> +bit that is set in the input bitmap, the corresponding page is write
> +protected and marked "clean" in KVM's dirty bitmap.

There will not be any write protection when PML is enabled. To make it more 
accurate, perhaps we could say something like "The ioctl clears the dirty 
status of pages ..." and later "For each bit that is set in the input bitmap, 
the corresponding page is marked clean in KVM's dirty bitmap and dirty tracking 
is re-enabled for that page either via write-protection or via clearing the 
Dirty bit in the PTE, depending on the dirty logging mode in use."

> +
> +If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 specifies
> +the address space for which you want to return the dirty bitmap.
> +They must be less than the value that KVM_CHECK_EXTENSION returns for
> +the KVM_CAP_MULTI_ADDRESS_SPACE capability.
> +
> +This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
> +is enabled; for more information, see the description of the capability.
> +However, it can always be used 


Re: [PATCH 2/3] kvm: rename last argument to kvm_get_dirty_log_protect

2018-11-26 Thread Junaid Shahid


On 11/26/2018 08:54 AM, Paolo Bonzini wrote:
> When manual dirty log reprotect is enabled, kvm_get_dirty_log_protect's
> pointer argument will always be false on exit, because no TLB flush is needed
> until the manual re-protection operation.  Rename it from "is_dirty" to
> "flush", which more accurately tells the caller what they have to do with it.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/mips/kvm/mips.c | 6 +++---
>  arch/x86/kvm/x86.c   | 6 +++---
>  include/linux/kvm_host.h | 2 +-
>  virt/kvm/arm/arm.c   | 6 +++---
>  virt/kvm/kvm_main.c  | 6 +++---
>  5 files changed, 13 insertions(+), 13 deletions(-)
> 

Reviewed-by: Junaid Shahid 



Re: [PATCH 1/3] kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic

2018-11-26 Thread Junaid Shahid


On 11/26/2018 08:54 AM, Paolo Bonzini wrote:
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index cd209f7730af..1071c10cf1c7 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1129,10 +1129,15 @@ documentation when it pops into existence).
>  
>  4.37 KVM_ENABLE_CAP
>  
> -Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
> -Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM),
> -mips (only KVM_CAP_ENABLE_CAP), ppc, s390
> -Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
> +Capability: KVM_CAP_ENABLE_CAP
> +Architectures: mips, ppc, s390
> +Type: vcpu ioctl
> +Parameters: struct kvm_enable_cap (in)
> +Returns: 0 on success; -1 on error
> +
> +Capability: KVM_CAP_ENABLE_CAP_VM
> +Architectures: all
> +Type: vcpu ioctl

I suppose that this should be "vm ioctl".

>  Parameters: struct kvm_enable_cap (in)
>  Returns: 0 on success; -1 on error
...

Reviewed-by: Junaid Shahid 


Re: [PATCH] KVM/MMU: Combine flushing remote tlb in mmu_set_spte()

2018-07-24 Thread Junaid Shahid
On 07/24/2018 07:35 AM, Paolo Bonzini wrote:
> On 24/07/2018 10:17, Tianyu Lan wrote:
>> mmu_set_spte() flushes remote tlbs for drop_parent_pte/drop_spte()
>> and set_spte() separately. This may introduce redundant flush. This
>> patch is to combine these flushes and check flush request after
>> calling set_spte().
>>
>> Signed-off-by: Lan Tianyu 
> 
> Looks good, but I'd like a second opinion.  Guangrong, Junaid, can you
> review this?
> 
> Thanks,
> 
> Paolo
> 
>> ---
>>  arch/x86/kvm/mmu.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 22a7984..8f21632 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2901,6 +2901,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 
>> *sptep, unsigned pte_access,
>>  int rmap_count;
>>  int set_spte_ret;
>>  int ret = RET_PF_RETRY;
>> +bool flush = false;
>>  
>>  pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
>>   *sptep, write_fault, gfn);
>> @@ -2917,12 +2918,12 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 
>> *sptep, unsigned pte_access,
>>  
>>  child = page_header(pte & PT64_BASE_ADDR_MASK);
>>  drop_parent_pte(child, sptep);
>> -kvm_flush_remote_tlbs(vcpu->kvm);
>> +flush = true;
>>  } else if (pfn != spte_to_pfn(*sptep)) {
>>  pgprintk("hfn old %llx new %llx\n",
>>   spte_to_pfn(*sptep), pfn);
>>  drop_spte(vcpu->kvm, sptep);
>> -kvm_flush_remote_tlbs(vcpu->kvm);
>> +flush = true;
>>  } else
>>  was_rmapped = 1;
>>  }
>> @@ -2934,7 +2935,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 
>> *sptep, unsigned pte_access,
>>  ret = RET_PF_EMULATE;
>>      kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>>  }
>> -if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
>> +if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
>>  kvm_flush_remote_tlbs(vcpu->kvm);
>>  
>>  if (unlikely(is_mmio_spte(*sptep)))
>>
> 

Reviewed-by: Junaid Shahid 


Re: [PATCH v2] KVM: X86: Fix CR3 reserve bits

2018-05-13 Thread Junaid Shahid
On 05/13/2018 02:24 AM, Wanpeng Li wrote:
> From: Wanpeng Li <wanpen...@tencent.com>
> 
> MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4. 
> It should be checked when PCIDE bit is not set, however commit 
> 'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based on 
> its physical address width")' removes the bit 63 checking 
> unconditionally. This patch fixes it by checking bit 63 of CR3 
> when PCIDE bit is not set in CR4.
> 
> Fixes: d1cd3ce900441 (KVM: MMU: check guest CR3 reserved bits based on its 
> physical address width)
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Radim Krčmář <rkrc...@redhat.com>
> Cc: Junaid Shahid <juna...@google.com>
> Cc: Liran Alon <liran.a...@oracle.com>
> Signed-off-by: Wanpeng Li <wanpen...@tencent.com>
> ---
> v1 -> v2:
>  * remove CR3_PCID_INVD in rsvd when PCIDE is 1 instead of 
>removing CR3_PCID_INVD in new_value
> 
>  arch/x86/kvm/emulate.c | 4 +++-
>  arch/x86/kvm/x86.c | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index b3705ae..143b7ae 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4189,7 +4189,9 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
>   maxphyaddr = eax & 0xff;
>   else
>   maxphyaddr = 36;
> - rsvd = rsvd_bits(maxphyaddr, 62);
> + rsvd = rsvd_bits(maxphyaddr, 63);
> + if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
> + rsvd &= ~CR3_PCID_INVD;
>   }
>  
>   if (new_val & rsvd)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 87e4805..9a90668 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -863,7 +863,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   }
>  
>   if (is_long_mode(vcpu) &&
> - (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 62)))
> + (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
>   return 1;
>   else if (is_pae(vcpu) && is_paging(vcpu) &&
>  !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
> 

Reviewed-by: Junaid Shahid <juna...@google.com>


Re: [PATCH 14/14] x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather

2018-02-13 Thread Junaid Shahid
[Resending after delivery failure]

Hi Dave,

On 02/13/2018 10:22 AM, Dave Watson wrote:
>
> Yes, these both sound reasonable.  I will send a V2.
>
> Thanks!

Another minor suggestion for v2:

It might be a good idea to check if the first assoclen bytes are already 
contiguous and only do the kmalloc if that isn't the case.
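
As a rough sketch of the idea (a hypothetical fragment; the surrounding
variables and error handling are assumptions, only sg_virt() and
scatterwalk_map_and_copy() are standard kernel API):

	u8 *assoc;

	/* use the associated data in place if its first assoclen bytes
	 * already sit in one contiguous scatterlist entry */
	if (req->src->length >= assoclen) {
		assoc = sg_virt(req->src);
	} else {
		assoc = kmalloc(assoclen, GFP_ATOMIC);
		if (!assoc)
			return -ENOMEM;
		scatterwalk_map_and_copy(assoc, req->src, 0, assoclen, 0);
	}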

Thanks,
Junaid


Re: [PATCH 14/14] x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather

2018-02-12 Thread Junaid Shahid
Hi Dave,


On 02/12/2018 11:51 AM, Dave Watson wrote:

> +static int gcmaes_encrypt_sg(struct aead_request *req, unsigned int assoclen,
> + u8 *hash_subkey, u8 *iv, void *aes_ctx)
>  
> +static int gcmaes_decrypt_sg(struct aead_request *req, unsigned int assoclen,
> + u8 *hash_subkey, u8 *iv, void *aes_ctx)

These two functions are almost identical. Wouldn't it be better to combine them 
into a single encrypt/decrypt function, similar to what you have done for the 
assembly macros?
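
For instance, roughly (a hypothetical shape, assuming the two bodies differ
only in which tfm they invoke and in tag verification; not code from this
series):

	/* worker containing the formerly duplicated body */
	static int gcmaes_crypt_sg(struct aead_request *req, unsigned int assoclen,
				   u8 *hash_subkey, u8 *iv, void *aes_ctx,
				   bool decrypt);

	static int gcmaes_encrypt_sg(struct aead_request *req, unsigned int assoclen,
				     u8 *hash_subkey, u8 *iv, void *aes_ctx)
	{
		return gcmaes_crypt_sg(req, assoclen, hash_subkey, iv, aes_ctx, false);
	}

	static int gcmaes_decrypt_sg(struct aead_request *req, unsigned int assoclen,
				     u8 *hash_subkey, u8 *iv, void *aes_ctx)
	{
		return gcmaes_crypt_sg(req, assoclen, hash_subkey, iv, aes_ctx, true);
	}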

> + if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
> + aesni_gcm_enc_tfm == aesni_gcm_enc) {

Shouldn't we also include a check for the buffer length being less than 
AVX_GEN2_OPTSIZE? AVX will not be used in that case either.


Thanks,
Junaid



Re: [PATCH -V3 -mm] mm, swap: Fix race between swapoff and some swap operations

2017-12-18 Thread Junaid Shahid
On Monday, December 18, 2017 3:41:41 PM PST Huang, Ying wrote:
> 
> A version implemented via stop_machine() could be gotten via a small
> patch as below.  If you still prefer stop_machine(), I can resend a
> version implemented with stop_machine().
> 

For the stop_machine() version, would it work to just put 
preempt_disable/enable at the start and end of lock_cluster() rather than 
introducing get/put_swap_device? Something like that might be simpler and would 
also disable preemption for less duration.
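
A sketch of that alternative (loosely based on mm/swapfile.c of that era, with
the enable presumably going into unlock_cluster(); treat the field names as
approximations):

	static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,
						      unsigned long offset)
	{
		struct swap_cluster_info *ci;

		preempt_disable();	/* added: holds off stop_machine() */
		ci = si->cluster_info;
		if (ci) {
			ci += offset / SWAPFILE_CLUSTER;
			spin_lock(&ci->lock);
		}
		return ci;
	}

	static void unlock_cluster(struct swap_cluster_info *ci)
	{
		if (ci)
			spin_unlock(&ci->lock);
		preempt_enable();	/* added */
	}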

Thanks,
Junaid


Re: [PATCH] kthread: Fix race condition between kthread_parkme() and kthread_unpark()

2017-09-29 Thread Junaid Shahid
Thanks for the clarification. But in that case, shouldn’t the patch check 
whether IS_PARKED was already set before calling complete(&self->parked)? 
Otherwise, the completion count for self->parked could be more than 1 as a 
result of spurious wakeups, which could make a future call to kthread_park 
complete prematurely.
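
In other words, something like this hypothetical fragment, where old_flags
stands for the flag word captured before the cmpxchg in the proposed version:

	/* complete() only on the 0 -> 1 transition of KTHREAD_IS_PARKED, so
	 * a spurious wakeup re-running this path cannot raise the count */
	if (!(old_flags & (1 << KTHREAD_IS_PARKED)))
		complete(&self->parked);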

Thanks,
Junaid

On Friday, September 29, 2017 10:28:38 AM Peter Zijlstra wrote:
> On Fri, Sep 29, 2017 at 09:59:55AM +0200, Thomas Gleixner wrote:
> > On Thu, 28 Sep 2017, Junaid Shahid wrote:
> > 
> > > Hi Peter,
> > > 
> > > It looks like try_cmpxchg is not available on non-x86 archs, but other 
> > > than
> > > that the version that you proposed looks good.
> > > 
> > > One thing that I am a bit curious about is that the original code, before
> > > either patch, had a test_and_set_bit for KTHREAD_IS_PARKED rather than 
> > > just
> > > a set_bit. I can't think of any reason why that was needed, since it
> > > doesn't look like TASK_PARKED tasks are susceptible to spurious wakeups. 
> > > Do
> > > you by any chance happen to know if there was any specific reason for it?
> > 
> > Everything is susceptible to spurious wakeups and has to deal with it.
> 
> Right, we should code as if they are at all times possible. Currently,
> for TASK_PARKED, I don't think they can happen, but I've had patches
> that introduce them on purpose (regardless the state) just to stress the
> code.
> 
> IIRC only TASK_STOPPED and/or TASK_TRACED hard rely on not getting any.



Re: [PATCH] kthread: Fix race condition between kthread_parkme() and kthread_unpark()

2017-07-17 Thread Junaid Shahid
Hi,

Has anyone been able to take a look at this?

Thanks,
Junaid

On Tuesday, May 30, 2017 03:40:02 PM Andrew Morton wrote:
> (cc tglx & tj)
> 
> On Tue, 30 May 2017 15:37:23 -0700 Junaid Shahid <juna...@google.com> wrote:
> 
> > (Resending)
> > 
> > On Friday, April 28, 2017 07:32:36 PM Junaid Shahid wrote:
> > > In general, if kthread_unpark() and kthread_parkme() execute together,
> > > the kthread is supposed to be in an unparked state. This is because
> > > kthread_unpark() either wakes up the thread if it already got parked,
> > > or it cancels a prior kthread_park() call and hence renders the
> > > kthread_parkme() moot.
> > > 
> > > However, if kthread_unpark() happens to execute between the time that
> > > kthread_parkme() checks the KTHREAD_SHOULD_PARK flag and sets the
> > > KTHREAD_IS_PARKED flag, then kthread_unpark() will not wake up the
> > > thread and it will remain in a parked state.
> > > 
> > > This is fixed by making the checking of KTHREAD_SHOULD_PARK and the
> > > setting of KTHREAD_IS_PARKED atomic via a cmpxchg inside kthread_parkme.
> > > 
> > > Signed-off-by: Junaid Shahid <juna...@google.com>
> > > ---
> > >  kernel/kthread.c | 16 
> > >  1 file changed, 12 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > index 26db528c1d88..651f03baf62f 100644
> > > --- a/kernel/kthread.c
> > > +++ b/kernel/kthread.c
> > > @@ -169,12 +169,20 @@ void *kthread_probe_data(struct task_struct *task)
> > >  
> > >  static void __kthread_parkme(struct kthread *self)
> > >  {
> > > + ulong flags;
> > > +
> > >   __set_current_state(TASK_PARKED);
> > > - while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
> > > - if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
> > > - complete(&self->parked);
> > > - schedule();
> > > + flags = self->flags;
> > > +
> > > + while (test_bit(KTHREAD_SHOULD_PARK, &flags)) {
> > > + if (cmpxchg(&self->flags, flags,
> > > + flags | (1 << KTHREAD_IS_PARKED)) == flags) {
> > > + if (!test_bit(KTHREAD_IS_PARKED, &flags))
> > > + complete(&self->parked);
> > > + schedule();
> > > + }
> > >   __set_current_state(TASK_PARKED);
> > > + flags = self->flags;
> > >   }
> > >   clear_bit(KTHREAD_IS_PARKED, &self->flags);
> > >   __set_current_state(TASK_RUNNING);
> > > 


Re: [PATCH] kthread: Fix race condition between kthread_parkme() and kthread_unpark()

2017-05-30 Thread Junaid Shahid
(Resending)

On Friday, April 28, 2017 07:32:36 PM Junaid Shahid wrote:
> In general, if kthread_unpark() and kthread_parkme() execute together,
> the kthread is supposed to be in an unparked state. This is because
> kthread_unpark() either wakes up the thread if it already got parked,
> or it cancels a prior kthread_park() call and hence renders the
> kthread_parkme() moot.
> 
> However, if kthread_unpark() happens to execute between the time that
> kthread_parkme() checks the KTHREAD_SHOULD_PARK flag and sets the
> KTHREAD_IS_PARKED flag, then kthread_unpark() will not wake up the
> thread and it will remain in a parked state.
> 
> This is fixed by making the checking of KTHREAD_SHOULD_PARK and the
> setting of KTHREAD_IS_PARKED atomic via a cmpxchg inside kthread_parkme.
> 
> Signed-off-by: Junaid Shahid <juna...@google.com>
> ---
>  kernel/kthread.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 26db528c1d88..651f03baf62f 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -169,12 +169,20 @@ void *kthread_probe_data(struct task_struct *task)
>  
>  static void __kthread_parkme(struct kthread *self)
>  {
> + ulong flags;
> +
>   __set_current_state(TASK_PARKED);
> - while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
> - if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
> - complete(&self->parked);
> - schedule();
> + flags = self->flags;
> +
> + while (test_bit(KTHREAD_SHOULD_PARK, &flags)) {
> + if (cmpxchg(&self->flags, flags,
> + flags | (1 << KTHREAD_IS_PARKED)) == flags) {
> + if (!test_bit(KTHREAD_IS_PARKED, &flags))
> + complete(&self->parked);
> + schedule();
> + }
>   __set_current_state(TASK_PARKED);
> + flags = self->flags;
>   }
>   clear_bit(KTHREAD_IS_PARKED, &self->flags);
>   __set_current_state(TASK_RUNNING);
> 


Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()

2017-05-18 Thread Junaid Shahid
(Adding back the correct linux-mm email address and also adding linux-kernel.)

On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote:
> On Thu, 18 May 2017, Michal Hocko wrote:
> 
> > On Thu 18-05-17 11:50:40, Junaid Shahid wrote:
> > > d224e9381897 (drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded
> > > variant) left out the __GFP_HIGH flag when converting from __vmalloc to
> > > kvmalloc. This can cause the IOCTL to fail in some low memory situations
> > > where it wouldn't have failed earlier. This patch adds it back to avoid
> > > any potential regression.
> > 
> > The code previously used __GFP_HIGH only for the vmalloc fallback and
> > that doesn't make that much sense with the current implementation
> > because vmalloc does order-0 pages and those do not really fail and the
> > oom killer is invoked to free memory.
> > 
> 
> Order-0 pages certainly do fail, there is not an infinite amount of memory 
nor is there a specific exemption to allow order-0 memory to be allocatable 
> below watermarks without this gfp flag.  OOM kill is the last thing we 
> want for these allocations since they are very temporary.
> 
> > There is no reason to access memory reserves from this context.
> > 
> 
> Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, 
> assuming there was a reason to do it in the first place in two different 
> ways.
> 
> This decision is up to the device mapper maintainers.
> 
> > > Signed-off-by: Junaid Shahid <juna...@google.com>
> > > ---
> > >  drivers/md/dm-ioctl.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> > > index 0555b4410e05..bacad7637a56 100644
> > > --- a/drivers/md/dm-ioctl.c
> > > +++ b/drivers/md/dm-ioctl.c
> > > @@ -1715,7 +1715,7 @@ static int copy_params(struct dm_ioctl __user 
> > > *user, struct dm_ioctl *param_kern
> > >*/
> > >   dmi = NULL;
> > >   noio_flag = memalloc_noio_save();
> > > - dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL);
> > > + dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH);
> > >   memalloc_noio_restore(noio_flag);
> > >  
> > >   if (!dmi) {



[PATCH] kthread: Fix race condition between kthread_parkme() and kthread_unpark()

2017-04-28 Thread Junaid Shahid
In general, if kthread_unpark() and kthread_parkme() execute together,
the kthread is supposed to be in an unparked state. This is because
kthread_unpark() either wakes up the thread if it already got parked,
or it cancels a prior kthread_park() call and hence renders the
kthread_parkme() moot.

However, if kthread_unpark() happens to execute between the time that
kthread_parkme() checks the KTHREAD_SHOULD_PARK flag and sets the
KTHREAD_IS_PARKED flag, then kthread_unpark() will not wake up the
thread and it will remain in a parked state.

This is fixed by making the checking of KTHREAD_SHOULD_PARK and the
setting of KTHREAD_IS_PARKED atomic via a cmpxchg inside kthread_parkme.

Signed-off-by: Junaid Shahid <juna...@google.com>
---
 kernel/kthread.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 26db528c1d88..651f03baf62f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -169,12 +169,20 @@ void *kthread_probe_data(struct task_struct *task)
 
 static void __kthread_parkme(struct kthread *self)
 {
+   ulong flags;
+
__set_current_state(TASK_PARKED);
-   while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
-   if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
-   complete(&self->parked);
-   schedule();
+   flags = self->flags;
+
+   while (test_bit(KTHREAD_SHOULD_PARK, &flags)) {
+   if (cmpxchg(&self->flags, flags,
+   flags | (1 << KTHREAD_IS_PARKED)) == flags) {
+   if (!test_bit(KTHREAD_IS_PARKED, &flags))
+   complete(&self->parked);
+   schedule();
+   }
__set_current_state(TASK_PARKED);
+   flags = self->flags;
}
clear_bit(KTHREAD_IS_PARKED, &self->flags);
__set_current_state(TASK_RUNNING);
-- 
2.13.0.rc0.306.g87b477812d-goog


