[PATCH] x86/mm/KASLR: Account for minimum padding when calculating entropy
Subtract the minimum padding between regions from the initial
remain_entropy. Without this, the last region could potentially overflow
past vaddr_end if we happen to get a specific sequence of random numbers
(although extremely unlikely in practice). The bug can be demonstrated by
replacing the prandom_bytes_state call with "rand = entropy;"

Signed-off-by: Junaid Shahid
---
 arch/x86/mm/kaslr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 6e6b39710e5f..fe3eec30f736 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -109,7 +109,8 @@ void __init kernel_randomize_memory(void)
 	kaslr_regions[2].size_tb = DIV_ROUND_UP(vmemmap_size, 1UL << TB_SHIFT);
 
 	/* Calculate entropy available between regions */
-	remain_entropy = vaddr_end - vaddr_start;
+	remain_entropy = vaddr_end - vaddr_start -
+			 (ARRAY_SIZE(kaslr_regions) - 1) * PUD_SIZE;
 
 	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
 		remain_entropy -= get_padding(&kaslr_regions[i]);
-- 
2.29.0.rc2.309.g374f81d7ae-goog
Re: [PATCH] KVM: x86: drop erroneous mmu_check_root() from fast_pgd_switch()
On 6/30/20 3:07 AM, Vitaly Kuznetsov wrote:
> Undesired triple fault gets injected to L1 guest on SVM when L2 is
> launched with certain CR3 values. It seems the mmu_check_root() check
> in fast_pgd_switch() is wrong: first of all we don't know if 'new_pgd'
> is a GPA or a nested GPA and, in case it is a nested GPA, we can't
> check it with kvm_is_visible_gfn(). The problematic code path is:
>
> nested_svm_vmrun()
>   ...
>   nested_prepare_vmcb_save()
>     kvm_set_cr3(..., nested_vmcb->save.cr3)
>       kvm_mmu_new_pgd()
>         ...
>         mmu_check_root() -> TRIPLE FAULT
>
> The mmu_check_root() check in fast_pgd_switch() seems to be superfluous
> even for the non-nested case: when the GPA is outside of the visible
> range, cached_root_available() will fail for non-direct roots (as we
> can't have a matching one on the list) and we don't seem to care for
> direct ones. Also, raising #TF immediately when a non-existent GFN is
> written to CR3 doesn't seem to match architectural behavior.
>
> Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path")
> Signed-off-by: Vitaly Kuznetsov
> ---
> - The patch fixes the immediate issue and doesn't seem to break any
>   tests even with shadow PT but I'm not sure I properly understood why
>   the check was there in the first place. Please review!
> ---
>  arch/x86/kvm/mmu/mmu.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 76817d13c86e..286c74d2ae8d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4277,8 +4277,7 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>  	 */
>  	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
>  	    mmu->root_level >= PT64_ROOT_4LEVEL)
> -		return !mmu_check_root(vcpu, new_pgd >> PAGE_SHIFT) &&
> -		       cached_root_available(vcpu, new_pgd, new_role);
> +		return cached_root_available(vcpu, new_pgd, new_role);
>
>  	return false;
>  }

The check does seem superfluous, so it should be ok to remove. Though I think
that fast_pgd_switch() really should be getting only L1 GPAs.
Otherwise, there could be confusion between the same GPAs from two different
L2s.

IIUC, at least on Intel, only L1 CR3s (including shadow L1 CR3s for L2) or L1
EPTPs should get to fast_pgd_switch(). But I am not familiar enough with SVM
to see why an L2 GPA would end up there.

From a cursory look, it seems that until commit 978ce5837c7e ("KVM: SVM:
always update CR3 in VMCB"), enter_svm_guest_mode() was calling kvm_set_cr3()
only when using shadow paging, in which case I assume that
nested_vmcb->save.cr3 would have been an L1 CR3 shadowing the L2 CR3,
correct? But now kvm_set_cr3() is called even when not using shadow paging,
which I suppose is how we are getting the L2 CR3. Should we skip calling
fast_pgd_switch() in that particular case?

Thanks,
Junaid
Re: [PATCH v2 2/3] KVM: x86: fix nested guest live migration with PML
On 9/27/19 4:15 AM, Paolo Bonzini wrote:
> Shadow paging is fundamentally incompatible with the page-modification
> log, because the GPAs in the log come from the wrong memory map.
> In particular, for the EPT page-modification log, the GPAs in the log come
> from L2 rather than L1.  (If there was a non-EPT page-modification log,
> we couldn't use it for shadow paging because it would log GVAs rather
> than GPAs).
>
> Therefore, we need to rely on write protection to record dirty pages.
> This has the side effect of bypassing PML, since writes now result in an
> EPT violation vmexit.
>
> This is relatively easy to add to KVM, because pretty much the only place
> that needs changing is spte_clear_dirty.  The first access to the page
> already goes through the page fault path and records the correct GPA;
> it's only subsequent accesses that are wrong.  Therefore, we can equip
> set_spte (where the first access happens) to record that the SPTE will
> have to be write protected, and then spte_clear_dirty will use this
> information to do the right thing.
>
> Signed-off-by: Paolo Bonzini
> ---

Reviewed-by: Junaid Shahid
Re: [PATCH 2/3] KVM: x86: fix nested guest live migration with PML
On 9/26/19 10:18 AM, Paolo Bonzini wrote:
> @@ -1597,8 +1615,11 @@ static bool spte_clear_dirty(u64 *sptep)
>
>  	rmap_printk("rmap_clear_dirty: spte %p %llx\n", sptep, *sptep);
>
> -	spte &= ~shadow_dirty_mask;
> +	WARN_ON(!spte_ad_enabled(spte));
> +	if (spte_ad_need_write_protect(spte))
> +		return spte_write_protect(sptep, false);
>
> +	spte &= ~shadow_dirty_mask;
>  	return mmu_spte_update(sptep, spte);
>  }

I think that it would be a bit cleaner to move the
spte_ad_need_write_protect() check to the if statement inside
__rmap_clear_dirty() instead, since it already checks for spte_ad_enabled()
to decide between write-protection and dirty-clearing.

Reviewed-by: Junaid Shahid
Re: [PATCH 1/3] KVM: x86: assign two bits to track SPTE kinds
On 9/26/19 10:18 AM, Paolo Bonzini wrote:
> Currently, we are overloading SPTE_SPECIAL_MASK to mean both
> "A/D bits unavailable" and MMIO, where the difference between the
> two is determined by mmio_mask and mmio_value.
>
> However, the next patch will need two bits to distinguish
> availability of A/D bits from write protection.  So, while at
> it give MMIO its own bit pattern, and move the two bits from
> bit 62 to bits 52..53 since Intel is allocating EPT page table
> bits from the top.
>
> Signed-off-by: Paolo Bonzini
> ---

Reviewed-by: Junaid Shahid
Re: [PATCH 3/3] kvm: introduce manual dirty log reprotect
On 11/26/2018 08:54 AM, Paolo Bonzini wrote:
> There are two problems with KVM_GET_DIRTY_LOG.  First, and less important,
> it can take kvm->mmu_lock for an extended period of time.  Second, its user
> can actually see many false positives in some cases.  The latter is due
> to a benign race like this:
>
>   1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
>      them.
>   2. The guest modifies the pages, causing them to be marked dirty.
>   3. Userspace actually copies the pages.
>   4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
>      they were not written to since (3).
>
> This is especially a problem for large guests, where the time between
> (1) and (3) can be substantial.  This patch introduces a new
> capability which, when enabled, makes KVM_GET_DIRTY_LOG not
> write-protect the pages it returns.  Instead, userspace has to
> explicitly clear the dirty log bits just before using the content
> of the page.  The new KVM_CLEAR_DIRTY_LOG ioctl can operate on a
> 64-page granularity rather than requiring to sync a full memslot.
> This way the mmu_lock is taken for small amounts of time, and
> only a small amount of time will pass between write protection
> of pages and the sending of their content.
>
> Signed-off-by: Paolo Bonzini
> ---
>  Documentation/virtual/kvm/api.txt                  |  65 +++
>  arch/mips/kvm/mips.c                               |  23
>  arch/x86/kvm/x86.c                                 |  27 +
>  include/linux/kvm_host.h                           |   5 +
>  include/uapi/linux/kvm.h                           |  15 +++
>  tools/testing/selftests/kvm/Makefile               |   2 +
>  tools/testing/selftests/kvm/clear_dirty_log_test.c |   2 +
>  tools/testing/selftests/kvm/dirty_log_test.c       |  19
>  tools/testing/selftests/kvm/include/kvm_util.h     |   2 +
>  tools/testing/selftests/kvm/lib/kvm_util.c         |  13 +++
>  virt/kvm/arm/arm.c                                 |  16 +++
>  virt/kvm/kvm_main.c                                | 120 ++---
>  12 files changed, 293 insertions(+), 16 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/clear_dirty_log_test.c
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 1071c10cf1c7..859927fb0b9c 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -305,6 +305,9 @@ the address space for which you want to return the dirty bitmap.
>  They must be less than the value that KVM_CHECK_EXTENSION returns for
>  the KVM_CAP_MULTI_ADDRESS_SPACE capability.
>
> +The bits in the dirty bitmap are cleared before the ioctl returns, unless
> +KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is enabled.  For more information,
> +see the description of the capability.
>
>  4.9 KVM_SET_MEMORY_ALIAS
>
> @@ -3758,6 +3761,44 @@ Coalesced pio is based on coalesced mmio.  There is little difference
>  between coalesced mmio and pio except that coalesced pio records accesses
>  to I/O ports.
>
> +4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
> +
> +Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_dirty_log (in/out)

Shouldn't this be just (in) rather than (in/out)?
> +Returns: 0 on success, -1 on error
> +
> +/* for KVM_CLEAR_DIRTY_LOG */
> +struct kvm_clear_dirty_log {
> +	__u32 slot;
> +	__u32 num_pages;
> +	__u64 first_page;
> +	union {
> +		void __user *dirty_bitmap; /* one bit per page */
> +		__u64 padding;
> +	};
> +};
> +
> +The ioctl write-protects pages in a memory slot according to the
> +bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap
> +field.  Bit 0 of the bitmap corresponds to page "first_page" in the
> +memory slot, and num_pages is the size in bits of the input bitmap.
> +Both first_page and num_pages must be a multiple of 64.  For each
> +bit that is set in the input bitmap, the corresponding page is write
> +protected and marked "clean" in KVM's dirty bitmap.

There will not be any write protection when PML is enabled. To make it more
accurate, perhaps we could say something like "The ioctl clears the dirty
status of pages ..." and later "For each bit that is set in the input bitmap,
the corresponding page is marked clean in KVM's dirty bitmap and dirty
tracking is re-enabled for that page either via write-protection or via
clearing the Dirty bit in the PTE, depending on the dirty logging mode in
use."

> +
> +If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 specifies
> +the address space for which you want to return the dirty bitmap.
> +They must be less than the value that KVM_CHECK_EXTENSION returns for
> +the KVM_CAP_MULTI_ADDRESS_SPACE capability.
> +
> +This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
> +is enabled; for more information, see the description of the capability.
> +However, it can always be used
Re: [PATCH 2/3] kvm: rename last argument to kvm_get_dirty_log_protect
On 11/26/2018 08:54 AM, Paolo Bonzini wrote:
> When manual dirty log reprotect will be enabled, kvm_get_dirty_log_protect's
> pointer argument will always be false on exit, because no TLB flush is needed
> until the manual re-protection operation.  Rename it from "is_dirty" to
> "flush", which more accurately tells the caller what they have to do with it.
>
> Signed-off-by: Paolo Bonzini
> ---
>  arch/mips/kvm/mips.c     | 6 +++---
>  arch/x86/kvm/x86.c       | 6 +++---
>  include/linux/kvm_host.h | 2 +-
>  virt/kvm/arm/arm.c       | 6 +++---
>  virt/kvm/kvm_main.c      | 6 +++---
>  5 files changed, 13 insertions(+), 13 deletions(-)

Reviewed-by: Junaid Shahid
Re: [PATCH 1/3] kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic
On 11/26/2018 08:54 AM, Paolo Bonzini wrote:
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index cd209f7730af..1071c10cf1c7 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1129,10 +1129,15 @@ documentation when it pops into existence).
>
>  4.37 KVM_ENABLE_CAP
>
> -Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
> -Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM),
> -               mips (only KVM_CAP_ENABLE_CAP), ppc, s390
> -Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
> +Capability: KVM_CAP_ENABLE_CAP
> +Architectures: mips, ppc, s390
> +Type: vcpu ioctl
> +Parameters: struct kvm_enable_cap (in)
> +Returns: 0 on success; -1 on error
> +
> +Capability: KVM_CAP_ENABLE_CAP_VM
> +Architectures: all
> +Type: vcpu ioctl

I suppose that this should be "vm ioctl".

> +Parameters: struct kvm_enable_cap (in)
> +Returns: 0 on success; -1 on error

...

Reviewed-by: Junaid Shahid
Re: [PATCH] KVM/MMU: Combine flushing remote tlb in mmu_set_spte()
On 07/24/2018 07:35 AM, Paolo Bonzini wrote:
> On 24/07/2018 10:17, Tianyu Lan wrote:
>> mmu_set_spte() flushes remote tlbs for drop_parent_pte/drop_spte()
>> and set_spte() separately. This may introduce redundant flush. This
>> patch is to combine these flushes and check flush request after
>> calling set_spte().
>>
>> Signed-off-by: Lan Tianyu
>
> Looks good, but I'd like a second opinion.  Guangrong, Junaid, can you
> review this?
>
> Thanks,
>
> Paolo
>
>> ---
>>  arch/x86/kvm/mmu.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 22a7984..8f21632 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2901,6 +2901,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
>>  	int rmap_count;
>>  	int set_spte_ret;
>>  	int ret = RET_PF_RETRY;
>> +	bool flush = false;
>>
>>  	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
>>  		 *sptep, write_fault, gfn);
>> @@ -2917,12 +2918,12 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
>>
>>  			child = page_header(pte & PT64_BASE_ADDR_MASK);
>>  			drop_parent_pte(child, sptep);
>> -			kvm_flush_remote_tlbs(vcpu->kvm);
>> +			flush = true;
>>  		} else if (pfn != spte_to_pfn(*sptep)) {
>>  			pgprintk("hfn old %llx new %llx\n",
>>  				 spte_to_pfn(*sptep), pfn);
>>  			drop_spte(vcpu->kvm, sptep);
>> -			kvm_flush_remote_tlbs(vcpu->kvm);
>> +			flush = true;
>>  		} else
>>  			was_rmapped = 1;
>>  	}
>> @@ -2934,7 +2935,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
>>  		ret = RET_PF_EMULATE;
>>  		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>>  	}
>> -	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
>> +	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
>>  		kvm_flush_remote_tlbs(vcpu->kvm);
>>
>>  	if (unlikely(is_mmio_spte(*sptep)))
>>

Reviewed-by: Junaid Shahid
Re: [PATCH v2] KVM: X86: Fix CR3 reserve bits
On 05/13/2018 02:24 AM, Wanpeng Li wrote:
> From: Wanpeng Li <wanpen...@tencent.com>
>
> MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4.
> It should be checked when PCIDE bit is not set, however commit
> 'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based on
> its physical address width")' removes the bit 63 checking
> unconditionally. This patch fixes it by checking bit 63 of CR3
> when PCIDE bit is not set in CR4.
>
> Fixes: d1cd3ce900441 (KVM: MMU: check guest CR3 reserved bits based on its physical address width)
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Radim Krčmář <rkrc...@redhat.com>
> Cc: Junaid Shahid <juna...@google.com>
> Cc: Liran Alon <liran.a...@oracle.com>
> Signed-off-by: Wanpeng Li <wanpen...@tencent.com>
> ---
> v1 -> v2:
>  * remove CR3_PCID_INVD in rsvd when PCIDE is 1 instead of
>    removing CR3_PCID_INVD in new_value
>
>  arch/x86/kvm/emulate.c | 4 +++-
>  arch/x86/kvm/x86.c     | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index b3705ae..143b7ae 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4189,7 +4189,9 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
>  			maxphyaddr = eax & 0xff;
>  		else
>  			maxphyaddr = 36;
> -		rsvd = rsvd_bits(maxphyaddr, 62);
> +		rsvd = rsvd_bits(maxphyaddr, 63);
> +		if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
> +			rsvd &= ~CR3_PCID_INVD;
>  	}
>
>  	if (new_val & rsvd)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 87e4805..9a90668 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -863,7 +863,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	}
>
>  	if (is_long_mode(vcpu) &&
> -	    (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 62)))
> +	    (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
>  		return 1;
>  	else if (is_pae(vcpu) && is_paging(vcpu) &&
>  		   !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))

Reviewed-by: Junaid Shahid <juna...@google.com>
Re: [PATCH 14/14] x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather
[Resending after delivery failure]

Hi Dave,

On 02/13/2018 10:22 AM, Dave Watson wrote:
>
> Yes, these both sound reasonable. I will send a V2.
>
> Thanks!

Another minor suggestion for v2: It might be a good idea to check if the
first assoclen bytes are already contiguous and only do the kmalloc if that
isn't the case.

Thanks,
Junaid
Re: [PATCH 14/14] x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather
Hi Dave,

On 02/12/2018 11:51 AM, Dave Watson wrote:
> +static int gcmaes_encrypt_sg(struct aead_request *req, unsigned int assoclen,
> +			u8 *hash_subkey, u8 *iv, void *aes_ctx)
>
> +static int gcmaes_decrypt_sg(struct aead_request *req, unsigned int assoclen,
> +			u8 *hash_subkey, u8 *iv, void *aes_ctx)

These two functions are almost identical. Wouldn't it be better to combine
them into a single encrypt/decrypt function, similar to what you have done
for the assembly macros?

> +	if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
> +	    aesni_gcm_enc_tfm == aesni_gcm_enc) {

Shouldn't we also include a check for the buffer length being less than
AVX_GEN2_OPTSIZE? AVX will not be used in that case either.

Thanks,
Junaid
Re: [PATCH -V3 -mm] mm, swap: Fix race between swapoff and some swap operations
On Monday, December 18, 2017 3:41:41 PM PST Huang, Ying wrote:
>
> A version implemented via stop_machine() could be gotten via a small
> patch as below.  If you still prefer stop_machine(), I can resend a
> version implemented with stop_machine().
>

For the stop_machine() version, would it work to just put
preempt_disable/enable at the start and end of lock_cluster() rather than
introducing get/put_swap_device? Something like that might be simpler and
would also disable preemption for less duration.

Thanks,
Junaid
Re: [PATCH] kthread: Fix race condition between kthread_parkme() and kthread_unpark()
Thanks for the clarification. But in that case, shouldn't the patch check
whether IS_PARKED was already set before calling complete(&self->parked)?
Otherwise, the completion count for self->parked could be more than 1 as a
result of spurious wakeups, which could make a future call to kthread_park()
complete prematurely.

Thanks,
Junaid

On Friday, September 29, 2017 10:28:38 AM Peter Zijlstra wrote:
> On Fri, Sep 29, 2017 at 09:59:55AM +0200, Thomas Gleixner wrote:
> > On Thu, 28 Sep 2017, Junaid Shahid wrote:
> >
> > > Hi Peter,
> > >
> > > It looks like try_cmpxchg is not available on non-x86 archs, but other
> > > than that the version that you proposed looks good.
> > >
> > > One thing that I am a bit curious about is that the original code,
> > > before either patch, had a test_and_set_bit for KTHREAD_IS_PARKED
> > > rather than just a set_bit. I can't think of any reason why that was
> > > needed, since it doesn't look like TASK_PARKED tasks are susceptible
> > > to spurious wakeups. Do you by any chance happen to know if there was
> > > any specific reason for it?
> >
> > Everything is susceptible to spurious wakeups and has to deal with it.
>
> Right, we should code as if they are at all times possible. Currently,
> for TASK_PARKED, I don't think they can happen, but I've had patches
> that introduce them on purpose (regardless the state) just to stress the
> code.
>
> IIRC only TASK_STOPPED and/or TASK_TRACED hard rely on not getting any.
Re: [PATCH] kthread: Fix race condition between kthread_parkme() and kthread_unpark()
Hi, Has anyone been able to take a look at this? Thanks, Junaid On Tuesday, May 30, 2017 03:40:02 PM Andrew Morton wrote: > (cc tglx & tj) > > On Tue, 30 May 2017 15:37:23 -0700 Junaid Shahid <juna...@google.com> wrote: > > > (Resending) > > > > On Friday, April 28, 2017 07:32:36 PM Junaid Shahid wrote: > > > In general, if kthread_unpark() and kthread_parkme() execute together, > > > the kthread is supposed to be in an unparked state. This is because > > > kthread_unpark() either wakes up the thread if it already got parked, > > > or it cancels a prior kthread_park() call and hence renders the > > > kthread_parkme() moot. > > > > > > However, if kthread_unpark() happens to execute between the time that > > > kthread_parkme() checks the KTHREAD_SHOULD_PARK flag and sets the > > > KTHREAD_IS_PARKED flag, then kthread_unpark() will not wake up the > > > thread and it will remain in a parked state. > > > > > > This is fixed by making the checking of KTHREAD_SHOULD_PARK and the > > > setting of KTHREAD_IS_PARKED atomic via a cmpxchg inside kthread_parkme. 
> > >
> > > Signed-off-by: Junaid Shahid <juna...@google.com>
> > > ---
> > >  kernel/kthread.c | 16 ++++++++++++----
> > >  1 file changed, 12 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > index 26db528c1d88..651f03baf62f 100644
> > > --- a/kernel/kthread.c
> > > +++ b/kernel/kthread.c
> > > @@ -169,12 +169,20 @@ void *kthread_probe_data(struct task_struct *task)
> > >
> > >  static void __kthread_parkme(struct kthread *self)
> > >  {
> > > +	ulong flags;
> > > +
> > >  	__set_current_state(TASK_PARKED);
> > > -	while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
> > > -		if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
> > > -			complete(&self->parked);
> > > -		schedule();
> > > +	flags = self->flags;
> > > +
> > > +	while (test_bit(KTHREAD_SHOULD_PARK, &flags)) {
> > > +		if (cmpxchg(&self->flags, flags,
> > > +			    flags | (1 << KTHREAD_IS_PARKED)) == flags) {
> > > +			if (!test_bit(KTHREAD_IS_PARKED, &flags))
> > > +				complete(&self->parked);
> > > +			schedule();
> > > +		}
> > >  		__set_current_state(TASK_PARKED);
> > > +		flags = self->flags;
> > >  	}
> > >  	clear_bit(KTHREAD_IS_PARKED, &self->flags);
> > >  	__set_current_state(TASK_RUNNING);
> > >
Re: [PATCH] kthread: Fix race condition between kthread_parkme() and kthread_unpark()
(Resending)

On Friday, April 28, 2017 07:32:36 PM Junaid Shahid wrote:
> In general, if kthread_unpark() and kthread_parkme() execute together,
> the kthread is supposed to be in an unparked state. This is because
> kthread_unpark() either wakes up the thread if it already got parked,
> or it cancels a prior kthread_park() call and hence renders the
> kthread_parkme() moot.
>
> However, if kthread_unpark() happens to execute between the time that
> kthread_parkme() checks the KTHREAD_SHOULD_PARK flag and sets the
> KTHREAD_IS_PARKED flag, then kthread_unpark() will not wake up the
> thread and it will remain in a parked state.
>
> This is fixed by making the checking of KTHREAD_SHOULD_PARK and the
> setting of KTHREAD_IS_PARKED atomic via a cmpxchg inside kthread_parkme.
>
> Signed-off-by: Junaid Shahid <juna...@google.com>
> ---
>  kernel/kthread.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 26db528c1d88..651f03baf62f 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -169,12 +169,20 @@ void *kthread_probe_data(struct task_struct *task)
>
>  static void __kthread_parkme(struct kthread *self)
>  {
> +	ulong flags;
> +
>  	__set_current_state(TASK_PARKED);
> -	while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
> -		if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
> -			complete(&self->parked);
> -		schedule();
> +	flags = self->flags;
> +
> +	while (test_bit(KTHREAD_SHOULD_PARK, &flags)) {
> +		if (cmpxchg(&self->flags, flags,
> +			    flags | (1 << KTHREAD_IS_PARKED)) == flags) {
> +			if (!test_bit(KTHREAD_IS_PARKED, &flags))
> +				complete(&self->parked);
> +			schedule();
> +		}
>  		__set_current_state(TASK_PARKED);
> +		flags = self->flags;
>  	}
>  	clear_bit(KTHREAD_IS_PARKED, &self->flags);
>  	__set_current_state(TASK_RUNNING);
>
Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()
(Adding back the correct linux-mm email address and also adding linux-kernel.) On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote: > On Thu, 18 May 2017, Michal Hocko wrote: > > > On Thu 18-05-17 11:50:40, Junaid Shahid wrote: > > > d224e9381897 (drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded > > > variant) left out the __GFP_HIGH flag when converting from __vmalloc to > > > kvmalloc. This can cause the IOCTL to fail in some low memory situations > > > where it wouldn't have failed earlier. This patch adds it back to avoid > > > any potential regression. > > > > The code previously used __GFP_HIGH only for the vmalloc fallback and > > that doesn't make that much sense with the current implementation > > because vmalloc does order-0 pages and those do not really fail and the > > oom killer is invoked to free memory. > > > > Order-0 pages certainly do fail, there is not an infinite amount of memory > nor is there a specific exemption to allow order-0 memory to be allocatable > below watermarks without this gfp flag. OOM kill is the last thing we > want for these allocations since they are very temporary. > > > There is no reason to access memory reserves from this context. > > > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, > assuming there was a reason to do it in the first place in two different > ways. > > This decision is up to the device mapper maintainers. 
> > >
> > > Signed-off-by: Junaid Shahid <juna...@google.com>
> > > ---
> > >  drivers/md/dm-ioctl.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> > > index 0555b4410e05..bacad7637a56 100644
> > > --- a/drivers/md/dm-ioctl.c
> > > +++ b/drivers/md/dm-ioctl.c
> > > @@ -1715,7 +1715,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
> > >  	 */
> > >  	dmi = NULL;
> > >  	noio_flag = memalloc_noio_save();
> > > -	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL);
> > > +	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH);
> > >  	memalloc_noio_restore(noio_flag);
> > >
> > >  	if (!dmi) {
[PATCH] kthread: Fix race condition between kthread_parkme() and kthread_unpark()
In general, if kthread_unpark() and kthread_parkme() execute together,
the kthread is supposed to be in an unparked state. This is because
kthread_unpark() either wakes up the thread if it already got parked,
or it cancels a prior kthread_park() call and hence renders the
kthread_parkme() moot.

However, if kthread_unpark() happens to execute between the time that
kthread_parkme() checks the KTHREAD_SHOULD_PARK flag and sets the
KTHREAD_IS_PARKED flag, then kthread_unpark() will not wake up the
thread and it will remain in a parked state.

This is fixed by making the checking of KTHREAD_SHOULD_PARK and the
setting of KTHREAD_IS_PARKED atomic via a cmpxchg inside kthread_parkme.

Signed-off-by: Junaid Shahid <juna...@google.com>
---
 kernel/kthread.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 26db528c1d88..651f03baf62f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -169,12 +169,20 @@ void *kthread_probe_data(struct task_struct *task)
 
 static void __kthread_parkme(struct kthread *self)
 {
+	ulong flags;
+
 	__set_current_state(TASK_PARKED);
-	while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
-		if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
-			complete(&self->parked);
-		schedule();
+	flags = self->flags;
+
+	while (test_bit(KTHREAD_SHOULD_PARK, &flags)) {
+		if (cmpxchg(&self->flags, flags,
+			    flags | (1 << KTHREAD_IS_PARKED)) == flags) {
+			if (!test_bit(KTHREAD_IS_PARKED, &flags))
+				complete(&self->parked);
+			schedule();
+		}
 		__set_current_state(TASK_PARKED);
+		flags = self->flags;
 	}
 	clear_bit(KTHREAD_IS_PARKED, &self->flags);
 	__set_current_state(TASK_RUNNING);
-- 
2.13.0.rc0.306.g87b477812d-goog