Re: [PATCH 0/4] Invalidate secondary IOMMU TLB on permission upgrade

2023-07-18 Thread Alistair Popple


"Tian, Kevin"  writes:

>> From: Anshuman Khandual 
>> Sent: Wednesday, July 19, 2023 11:04 AM
>> 
>> On 7/18/23 13:26, Alistair Popple wrote:
>> > The main change is to move secondary TLB invalidation mmu notifier
>> > callbacks into the architecture specific TLB flushing functions. This
>> > makes secondary TLB invalidation mostly match CPU invalidation while
>> > still allowing efficient range based invalidations based on the
>> > existing TLB batching code.
>> >
>> > ==
>> > Background
>> > ==
>> >
>> > The arm64 architecture specifies TLB permission bits may be cached and
>> > therefore the TLB must be invalidated during permission upgrades. For
>> > the CPU this currently occurs in the architecture specific
>> > ptep_set_access_flags() routine.
>> >
>> > Secondary TLBs such as implemented by the SMMU IOMMU match the CPU
>> > architecture specification and may also cache permission bits and
>> > require the same TLB invalidations. This may be achieved in one of two
>> > ways.
>> >
>> > Some SMMU implementations implement broadcast TLB maintenance
>> > (BTM). This snoops CPU TLB invalidates and will invalidate any
>> > secondary TLB at the same time as the CPU. However implementations are
>> > not required to implement BTM.
>> 
>> So, the implementations with BTM do not even need a MMU notifier callback
>> for secondary TLB invalidation purpose ? Perhaps mmu_notifier_register()
>> could also be skipped for such cases i.e with ARM_SMMU_FEAT_BTM
>> enabled ?
>> 

A notifier callback is still required to send the PCIe ATC request to
devices. As I understand it, BTM only means that explicit SMMU TLB
maintenance isn't required. In other words, an SMMU with BTM will snoop
CPU TLB invalidates to maintain the SMMU TLB but still won't generate
ATC requests based on snooping.

> Out of curiosity. How does BTM work with device tlb? Can SMMU translate
> a TLB broadcast request (based on ASID) into a set of PCI ATS invalidation
> requests (based on PCI requestor ID and PASID) in hardware?

See above but I don't think so.

> If software intervention is required then it might be the reason why mmu
> notifier cannot be skipped. With BTM enabled it just means the notifier
> callback can skip iotlb invalidation...

Right. If you look at the implementation for
arm_smmu_mm_arch_invalidate_secondary_tlbs() you can see
arm_smmu_tlb_inv_range_asid() is only called if BTM is not supported to
invalidate SMMU TLB vs. arm_smmu_atc_inv_domain() which is always called
to send the invalidations down to the devices.

>> Based on feedback from Jason [2] the proposed solution to the bug is
>> to move the calls to mmu_notifier_arch_invalidate_secondary_tlbs()
>> closer to the architecture specific TLB invalidation code. This
>> ensures the secondary TLB won't miss invalidations, including the
>> existing invalidation in the ARM64 code to deal with permission
>> upgrade.
>
> ptep_set_access_flags() is the only problematic place where this issue
> is being reported ? If yes, why don't we fix that instead of moving these
> into platform specific callbacks ? Or are there other problematic areas
> I might be missing.

See the previous feedback, and in particular this thread -
https://lore.kernel.org/all/5d8e1f752051173d2d1b5c3e14b54eb3506ed3ef.1684892404.git-series.apop...@nvidia.com/.

TLDR - I don't think there are any other problematic areas, but it's
hard to reason about when TLB notifiers should be called when it all
happens out of band and it's easy to miss. For example this bug would
not have been possible had they been called from the TLB flushing code.

Ideally I think most kernel code should call some generic TLB flushing
function that could call this. However, at the moment no such intermediate
functions exist - the kernel calls the architecture specific implementations
directly. Adding a layer of indirection seems like it would be a lot of
churn with possible performance implications as well.


Re: [PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

2023-07-18 Thread Aneesh Kumar K V
On 7/19/23 10:34 AM, Hugh Dickins wrote:
> On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote:
>> Hugh Dickins  writes:
>>
>>> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
>>> in assert_pte_locked().  BUG if pte_offset_map_nolock() fails: this is
>>> stricter than the previous implementation, which skipped when pmd_none()
>>> (with a comment on khugepaged collapse transitions): but wouldn't we want
>>> to know, if an assert_pte_locked() caller can be racing such transitions?
>>>
>>
>> The reason we had that pmd_none check there was to handle khugepaged. In
>> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
>> ppc64 had the assert_pte_locked check inside that ptep_clear.
>>
>> _pmd = pmdp_collapse_flush(vma, address, pmd);
>> ..
>> ptep_clear()
>> -> assert_pte_locked()
>> ---> pmd_none
>> -> BUG
>>
>>
>> The problem is how assert_pte_locked() verifies whether we are holding
>> ptl. It does that by walking the page table again and in this specific
>> case by the time we call the function we have already cleared the pmd.
> 
> Aneesh, please clarify, I've spent hours on this.
> 
> From all your use of past tense ("had"), I thought you were Acking my
> patch; but now, after looking again at v3.11 source and today's,
> I think you are NAKing my patch in its present form.
> 

Sorry for the confusion my reply created. 

> You are pointing out that anon THP's __collapse_huge_page_copy_succeeded()
> uses ptep_clear() at a point after pmdp_collapse_flush() already cleared
> *pmd, so my patch now leads that one use of assert_pte_locked() to BUG.
> Is that your point?
> 

Yes. I haven't tested this yet to verify that it is indeed hitting that BUG.
But a code inspection tells me we will hit that BUG on powerpc because of
the above details.

> I can easily restore that khugepaged comment (which had appeared to me
> out of date at the time, but now looks still relevant) and pmd_none(*pmd)
> check: but please clarify.
> 

That is correct. If we add that pmd_none check back we should be good here.


-aneesh


Re: [PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

2023-07-18 Thread Hugh Dickins
On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote:
> Hugh Dickins  writes:
> 
> > Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
> > in assert_pte_locked().  BUG if pte_offset_map_nolock() fails: this is
> > stricter than the previous implementation, which skipped when pmd_none()
> > (with a comment on khugepaged collapse transitions): but wouldn't we want
> > to know, if an assert_pte_locked() caller can be racing such transitions?
> >
> 
> The reason we had that pmd_none check there was to handle khugepaged. In
> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
> ppc64 had the assert_pte_locked check inside that ptep_clear.
> 
> _pmd = pmdp_collapse_flush(vma, address, pmd);
> ..
> ptep_clear()
> -> assert_pte_locked()
> ---> pmd_none
> -> BUG
> 
> 
> The problem is how assert_pte_locked() verifies whether we are holding
> ptl. It does that by walking the page table again and in this specific
> case by the time we call the function we have already cleared the pmd.

Aneesh, please clarify, I've spent hours on this.

From all your use of past tense ("had"), I thought you were Acking my
patch; but now, after looking again at v3.11 source and today's,
I think you are NAKing my patch in its present form.

You are pointing out that anon THP's __collapse_huge_page_copy_succeeded()
uses ptep_clear() at a point after pmdp_collapse_flush() already cleared
*pmd, so my patch now leads that one use of assert_pte_locked() to BUG.
Is that your point?

I can easily restore that khugepaged comment (which had appeared to me
out of date at the time, but now looks still relevant) and pmd_none(*pmd)
check: but please clarify.

Thanks,
Hugh

> >
> > This mod might cause new crashes: which either expose my ignorance, or
> > indicate issues to be fixed, or limit the usage of assert_pte_locked().
> >
> > Signed-off-by: Hugh Dickins 
> > ---
> >  arch/powerpc/mm/pgtable.c | 16 ++--
> >  1 file changed, 6 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> > index cb2dcdb18f8e..16b061af86d7 100644
> > --- a/arch/powerpc/mm/pgtable.c
> > +++ b/arch/powerpc/mm/pgtable.c
> > @@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
> > p4d_t *p4d;
> > pud_t *pud;
> > pmd_t *pmd;
> > +   pte_t *pte;
> > +   spinlock_t *ptl;
> >  
> > if (mm == &init_mm)
> > return;
> > @@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
> > pud = pud_offset(p4d, addr);
> > BUG_ON(pud_none(*pud));
> > pmd = pmd_offset(pud, addr);
> > -   /*
> > -* khugepaged to collapse normal pages to hugepage, first set
> > -* pmd to none to force page fault/gup to take mmap_lock. After
> > -* pmd is set to none, we do a pte_clear which does this assertion
> > -* so if we find pmd none, return.
> > -*/
> > -   if (pmd_none(*pmd))
> > -   return;
> > -   BUG_ON(!pmd_present(*pmd));
> > -   assert_spin_locked(pte_lockptr(mm, pmd));
> > +   pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
> > +   BUG_ON(!pte);
> > +   assert_spin_locked(ptl);
> > +   pte_unmap(pte);
> >  }
> >  #endif /* CONFIG_DEBUG_VM */
> >  
> > -- 
> > 2.35.3


Re: [PATCH v2 2/2] PCI: layerscape: Add the workaround for lost link capablities during reset

2023-07-18 Thread Manivannan Sadhasivam
On Tue, Jul 18, 2023 at 02:21:42PM -0400, Frank Li wrote:
> From: Xiaowei Bao 
> 
> A workaround for the issue where the PCI Express Endpoint (EP) controller
> loses the values of the Maximum Link Width and Supported Link Speed from
> the Link Capabilities Register, which were initially configured by the Reset
> Configuration Word (RCW), during a link-down or hot reset event.
> 
> Fixes: a805770d8a22 ("PCI: layerscape: Add EP mode support")
> Signed-off-by: Xiaowei Bao 
> Signed-off-by: Hou Zhiqiang 
> Signed-off-by: Frank Li 

Acked-by: Manivannan Sadhasivam 

- Mani

> ---
> change from v1 to v2:
>  - add comments at restore register
>  - add fixes tag
> 
>  .../pci/controller/dwc/pci-layerscape-ep.c| 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> index e0969ff2ddf7..b1faf41a2fae 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -45,6 +45,7 @@ struct ls_pcie_ep {
>   struct pci_epc_features *ls_epc;
>   const struct ls_pcie_ep_drvdata *drvdata;
>   int irq;
> + u32 lnkcap;
>   bool big_endian;
>  };
>  
> @@ -73,6 +74,7 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void *dev_id)
>   struct ls_pcie_ep *pcie = dev_id;
>   struct dw_pcie *pci = pcie->pci;
>   u32 val, cfg;
> + u8 offset;
>  
>   val = ls_lut_readl(pcie, PEX_PF0_PME_MES_DR);
>   ls_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
> @@ -81,6 +83,19 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void *dev_id)
>   return IRQ_NONE;
>  
>   if (val & PEX_PF0_PME_MES_DR_LUD) {
> +
> + offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
> +
> + /*
> +  * The values of the Maximum Link Width and Supported Link
> +  * Speed from the Link Capabilities Register will be lost
> +  * during link down or hot reset. Restore the initial value
> +  * that was configured by the Reset Configuration Word (RCW).
> +  */
> + dw_pcie_dbi_ro_wr_en(pci);
> + dw_pcie_writel_dbi(pci, offset + PCI_EXP_LNKCAP, pcie->lnkcap);
> + dw_pcie_dbi_ro_wr_dis(pci);
> +
>   cfg = ls_lut_readl(pcie, PEX_PF0_CONFIG);
>   cfg |= PEX_PF0_CFG_READY;
>   ls_lut_writel(pcie, PEX_PF0_CONFIG, cfg);
> @@ -216,6 +231,7 @@ static int __init ls_pcie_ep_probe(struct platform_device *pdev)
>   struct ls_pcie_ep *pcie;
>   struct pci_epc_features *ls_epc;
>   struct resource *dbi_base;
> + u8 offset;
>   int ret;
>  
>   pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
> @@ -252,6 +268,9 @@ static int __init ls_pcie_ep_probe(struct platform_device *pdev)
>  
>   platform_set_drvdata(pdev, pcie);
>  
> + offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
> + pcie->lnkcap = dw_pcie_readl_dbi(pci, offset + PCI_EXP_LNKCAP);
> +
>   ret = dw_pcie_ep_init(&pcie->ep);
>   if (ret)
>   return ret;
> -- 
> 2.34.1
> 

-- 
மணிவண்ணன் சதாசிவம்


RE: [PATCH 0/4] Invalidate secondary IOMMU TLB on permission upgrade

2023-07-18 Thread Tian, Kevin
> From: Anshuman Khandual 
> Sent: Wednesday, July 19, 2023 11:04 AM
> 
> On 7/18/23 13:26, Alistair Popple wrote:
> > The main change is to move secondary TLB invalidation mmu notifier
> > callbacks into the architecture specific TLB flushing functions. This
> > makes secondary TLB invalidation mostly match CPU invalidation while
> > still allowing efficient range based invalidations based on the
> > existing TLB batching code.
> >
> > ==
> > Background
> > ==
> >
> > The arm64 architecture specifies TLB permission bits may be cached and
> > therefore the TLB must be invalidated during permission upgrades. For
> > the CPU this currently occurs in the architecture specific
> > ptep_set_access_flags() routine.
> >
> > Secondary TLBs such as implemented by the SMMU IOMMU match the CPU
> > architecture specification and may also cache permission bits and
> > require the same TLB invalidations. This may be achieved in one of two
> > ways.
> >
> > Some SMMU implementations implement broadcast TLB maintenance
> > (BTM). This snoops CPU TLB invalidates and will invalidate any
> > secondary TLB at the same time as the CPU. However implementations are
> > not required to implement BTM.
> 
> So, the implementations with BTM do not even need a MMU notifier callback
> for secondary TLB invalidation purpose ? Perhaps mmu_notifier_register()
> could also be skipped for such cases i.e with ARM_SMMU_FEAT_BTM
> enabled ?
> 

Out of curiosity. How does BTM work with device tlb? Can SMMU translate
a TLB broadcast request (based on ASID) into a set of PCI ATS invalidation
requests (based on PCI requestor ID and PASID) in hardware?

If software intervention is required then it might be the reason why mmu
notifier cannot be skipped. With BTM enabled it just means the notifier
callback can skip iotlb invalidation...


Re: [PATCH 0/4] Invalidate secondary IOMMU TLB on permission upgrade

2023-07-18 Thread Anshuman Khandual



On 7/18/23 13:26, Alistair Popple wrote:
> The main change is to move secondary TLB invalidation mmu notifier
> callbacks into the architecture specific TLB flushing functions. This
> makes secondary TLB invalidation mostly match CPU invalidation while
> still allowing efficient range based invalidations based on the
> existing TLB batching code.
> 
> ==
> Background
> ==
> 
> The arm64 architecture specifies TLB permission bits may be cached and
> therefore the TLB must be invalidated during permission upgrades. For
> the CPU this currently occurs in the architecture specific
> ptep_set_access_flags() routine.
> 
> Secondary TLBs such as implemented by the SMMU IOMMU match the CPU
> architecture specification and may also cache permission bits and
> require the same TLB invalidations. This may be achieved in one of two
> ways.
> 
> Some SMMU implementations implement broadcast TLB maintenance
> (BTM). This snoops CPU TLB invalidates and will invalidate any
> secondary TLB at the same time as the CPU. However implementations are
> not required to implement BTM.

So, the implementations with BTM do not even need a MMU notifier callback
for secondary TLB invalidation purpose ? Perhaps mmu_notifier_register()
could also be skipped for such cases i.e with ARM_SMMU_FEAT_BTM enabled ?

BTW, I don't see ARM_SMMU_FEAT_BTM being added as a feature anywhere during
the probe i.e. arm_smmu_device_hw_probe().

> 
> Implementations without BTM rely on mmu notifier callbacks to send
> explicit TLB invalidation commands to invalidate SMMU TLB. Therefore
> either generic kernel code or architecture specific code needs to call
> the mmu notifier on permission upgrade.
> 
> Currently that doesn't happen so devices will fault indefinitely when
> writing to a PTE that was previously read-only as nothing invalidates
> the SMMU TLB.

Why doesn't the current SMMU MMU notifier intercept all invalidations from
generic MM code and do the required secondary TLB invalidation ? Is there
a timing issue involved here ? Secondary TLB invalidation does happen but
after the damage has been done ? Could you please point us to a real world
bug report taking such indefinite faults as mentioned above ?

> 
> 
> Solution
> 
> 
> To fix this the series first renames the .invalidate_range() callback
> to .arch_invalidate_secondary_tlbs() as suggested by Jason and Sean to
> make it clear this callback is only used for secondary TLBs. That was
> made possible thanks to Sean's series [1] to remove KVM's incorrect
> usage.
> 
> Based on feedback from Jason [2] the proposed solution to the bug is
> to move the calls to mmu_notifier_arch_invalidate_secondary_tlbs()
> closer to the architecture specific TLB invalidation code. This
> ensures the secondary TLB won't miss invalidations, including the
> existing invalidation in the ARM64 code to deal with permission
> upgrade.

ptep_set_access_flags() is the only problematic place where this issue
is being reported ? If yes, why don't we fix that instead of moving these
into platform specific callbacks ? Or are there other problematic areas
I might be missing.

> 
> Currently only ARM64, PowerPC and x86 have IOMMU with secondary TLBs
> requiring SW invalidation so the notifier is only called for those
> architectures. It is also not called for invalidation of kernel
> mappings as no secondary IOMMU implementations can access those and
> hence it is not required.
> 
> [1] - https://lore.kernel.org/all/20230602011518.787006-1-sea...@google.com/
> [2] - https://lore.kernel.org/linux-mm/zjmr5bw8l+bbz...@ziepe.ca/
> 
> Alistair Popple (4):
>   mm_notifiers: Rename invalidate_range notifier
>   arm64/smmu: Use TLBI ASID when invalidating entire range
>   mmu_notifiers: Call arch_invalidate_secondary_tlbs() when invalidating TLBs
>   mmu_notifiers: Don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end()
> 
>  arch/arm64/include/asm/tlbflush.h   |   5 +-
>  arch/powerpc/include/asm/book3s/64/tlbflush.h   |   1 +-
>  arch/powerpc/mm/book3s64/radix_hugetlbpage.c|   1 +-
>  arch/powerpc/mm/book3s64/radix_tlb.c|   6 +-
>  arch/x86/mm/tlb.c   |   3 +-
>  drivers/iommu/amd/iommu_v2.c|  10 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  29 +++--
>  drivers/iommu/intel/svm.c   |   8 +-
>  drivers/misc/ocxl/link.c|   8 +-
>  include/asm-generic/tlb.h   |   1 +-
>  include/linux/mmu_notifier.h| 104 -
>  kernel/events/uprobes.c |   2 +-
>  mm/huge_memory.c|  29 +
>  mm/hugetlb.c|   8 +-
>  mm/memory.c |   8 +-
>  mm/migrate_device.c |   9 +-
>  mm/mmu_notifier.c   |  47 +++-
>  mm/rmap.c 

Re: [RFC PATCH v11 11/29] security: Export security_inode_init_security_anon() for use by KVM

2023-07-18 Thread Paul Moore
On Tue, Jul 18, 2023 at 7:48 PM Sean Christopherson  wrote:
>
> Signed-off-by: Sean Christopherson 
> ---
>  security/security.c | 1 +
>  1 file changed, 1 insertion(+)

Acked-by: Paul Moore 

> diff --git a/security/security.c b/security/security.c
> index b720424ca37d..7fc78f0f3622 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1654,6 +1654,7 @@ int security_inode_init_security_anon(struct inode *inode,
> return call_int_hook(inode_init_security_anon, 0, inode, name,
>  context_inode);
>  }
> +EXPORT_SYMBOL_GPL(security_inode_init_security_anon);
>
>  #ifdef CONFIG_SECURITY_PATH
>  /**
> --
> 2.41.0.255.g8b1d071c50-goog

--
paul-moore.com


Re: [PATCH 1/4] mm_notifiers: Rename invalidate_range notifier

2023-07-18 Thread Alistair Popple


Andrew Morton  writes:

> On Tue, 18 Jul 2023 14:57:12 -0300 Jason Gunthorpe  wrote:
>
>> On Tue, Jul 18, 2023 at 05:56:15PM +1000, Alistair Popple wrote:
>> > diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
>> > index b466172..48c81b9 100644
>> > --- a/include/asm-generic/tlb.h
>> > +++ b/include/asm-generic/tlb.h
>> > @@ -456,7 +456,7 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
>> >return;
>> >  
>> >tlb_flush(tlb);
>> > -  mmu_notifier_invalidate_range(tlb->mm, tlb->start, tlb->end);
>> > +  mmu_notifier_invalidate_secondary_tlbs(tlb->mm, tlb->start, tlb->end);
>> >__tlb_reset_range(tlb);
>> 
>> Does this compile? I don't see
>> "mmu_notifier_invalidate_secondary_tlbs" ?

Dang, sorry. The original rename was to that but then we added *_arch_*
and I obviously missed some of the already renamed calls.

> Seems this call gets deleted later in the series.
>
>> But I think the approach in this series looks fine, it is so much
>> cleaner after we remove all the cruft in patch 4, just look at the
>> diffstat..
>
> I'll push this into -next if it compiles OK for me, but yes, a redo is
> desirable please.

Yep, will respin.


[RFC PATCH v11 29/29] KVM: selftests: Test KVM exit behavior for private memory/access

2023-07-18 Thread Sean Christopherson
From: Ackerley Tng 

"Testing private access when memslot gets deleted" tests the behavior
of KVM when a private memslot gets deleted while the VM is using the
private memslot. When KVM looks up the deleted (slot = NULL) memslot,
KVM should exit to userspace with KVM_EXIT_MEMORY_FAULT.

In the second test, upon a private access to non-private memslot, KVM
should also exit to userspace with KVM_EXIT_MEMORY_FAULT.

sean: These testcases belong in set_memory_region_test.c, they're private
variants on existing testcases and aren't as robust, e.g. don't ensure
the vCPU is actually running and accessing memory when converting and
deleting.

Signed-off-by: Ackerley Tng 
Signed-off-by: Sean Christopherson 
---
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../kvm/x86_64/private_mem_kvm_exits_test.c   | 115 ++
 2 files changed, 116 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_kvm_exits_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 18c43336ede3..cb9450022302 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -81,6 +81,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
 TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
 TEST_GEN_PROGS_x86_64 += x86_64/private_mem_conversions_test
+TEST_GEN_PROGS_x86_64 += x86_64/private_mem_kvm_exits_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
 TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
 TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test
diff --git a/tools/testing/selftests/kvm/x86_64/private_mem_kvm_exits_test.c b/tools/testing/selftests/kvm/x86_64/private_mem_kvm_exits_test.c
new file mode 100644
index ..8daaa08c0d90
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/private_mem_kvm_exits_test.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022, Google LLC.
+ */
+#include 
+#include 
+#include 
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+/* Arbitrarily selected to avoid overlaps with anything else */
+#define EXITS_TEST_GVA 0xc000
+#define EXITS_TEST_GPA EXITS_TEST_GVA
+#define EXITS_TEST_NPAGES 1
+#define EXITS_TEST_SIZE (EXITS_TEST_NPAGES * PAGE_SIZE)
+#define EXITS_TEST_SLOT 10
+
+static uint64_t guest_repeatedly_read(void)
+{
+   volatile uint64_t value;
+
+   while (true)
+   value = *((uint64_t *) EXITS_TEST_GVA);
+
+   return value;
+}
+
+static uint32_t run_vcpu_get_exit_reason(struct kvm_vcpu *vcpu)
+{
+   vcpu_run(vcpu);
+
+   return vcpu->run->exit_reason;
+}
+
+const struct vm_shape protected_vm_shape = {
+   .mode = VM_MODE_DEFAULT,
+   .type = KVM_X86_SW_PROTECTED_VM,
+};
+
+static void test_private_access_memslot_deleted(void)
+{
+   struct kvm_vm *vm;
+   struct kvm_vcpu *vcpu;
+   pthread_t vm_thread;
+   void *thread_return;
+   uint32_t exit_reason;
+
+   vm = vm_create_shape_with_one_vcpu(protected_vm_shape, &vcpu,
+  guest_repeatedly_read);
+
+   vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+   EXITS_TEST_GPA, EXITS_TEST_SLOT,
+   EXITS_TEST_NPAGES,
+   KVM_MEM_PRIVATE);
+
+   virt_map(vm, EXITS_TEST_GVA, EXITS_TEST_GPA, EXITS_TEST_NPAGES);
+
+   /* Request to access page privately */
+   vm_mem_set_private(vm, EXITS_TEST_GPA, EXITS_TEST_SIZE);
+
+   pthread_create(&vm_thread, NULL,
+  (void *(*)(void *))run_vcpu_get_exit_reason,
+  (void *)vcpu);
+
+   vm_mem_region_delete(vm, EXITS_TEST_SLOT);
+
+   pthread_join(vm_thread, &thread_return);
+   exit_reason = (uint32_t)(uint64_t)thread_return;
+
+   ASSERT_EQ(exit_reason, KVM_EXIT_MEMORY_FAULT);
+   ASSERT_EQ(vcpu->run->memory.flags, KVM_MEMORY_EXIT_FLAG_PRIVATE);
+   ASSERT_EQ(vcpu->run->memory.gpa, EXITS_TEST_GPA);
+   ASSERT_EQ(vcpu->run->memory.size, EXITS_TEST_SIZE);
+
+   kvm_vm_free(vm);
+}
+
+static void test_private_access_memslot_not_private(void)
+{
+   struct kvm_vm *vm;
+   struct kvm_vcpu *vcpu;
+   uint32_t exit_reason;
+
+   vm = vm_create_shape_with_one_vcpu(protected_vm_shape, &vcpu,
+  guest_repeatedly_read);
+
+   /* Add a non-private memslot (flags = 0) */
+   vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+   EXITS_TEST_GPA, EXITS_TEST_SLOT,
+   EXITS_TEST_NPAGES, 0);
+
+   virt_map(vm, EXITS_TEST_GVA, EXITS_TEST_GPA, EXITS_TEST_NPAGES);
+
+   /* Request to access page privately */
+   vm_mem_set_private(vm, EXITS_TEST_GPA, EXITS_TEST_SIZE);
+
+   exit_reason = run_vcpu_get_exit_reason(vcpu);
+
+   

[RFC PATCH v11 28/29] KVM: selftests: Add basic selftest for guest_memfd()

2023-07-18 Thread Sean Christopherson
Add a selftest to verify the basic functionality of guest_memfd():

+ file descriptor created with the guest_memfd() ioctl does not allow
  read/write/mmap operations
+ file size and block size as returned from fstat are as expected
+ fallocate on the fd checks that offset/length on
  fallocate(FALLOC_FL_PUNCH_HOLE) should be page aligned

Signed-off-by: Chao Peng 
Co-developed-by: Ackerley Tng 
Signed-off-by: Ackerley Tng 
Signed-off-by: Sean Christopherson 
---
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  | 114 ++
 2 files changed, 115 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/guest_memfd_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index fdc7dff8d6ae..18c43336ede3 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -123,6 +123,7 @@ TEST_GEN_PROGS_x86_64 += access_tracking_perf_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
 TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
+TEST_GEN_PROGS_x86_64 += guest_memfd_test
 TEST_GEN_PROGS_x86_64 += hardware_disable_test
 TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_x86_64 += kvm_page_table_test
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
new file mode 100644
index ..d698f9fde987
--- /dev/null
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright Intel Corporation, 2023
+ *
+ * Author: Chao Peng 
+ */
+
+#define _GNU_SOURCE
+#include "test_util.h"
+#include "kvm_util_base.h"
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static void test_file_read_write(int fd)
+{
+   char buf[64];
+
+   TEST_ASSERT(read(fd, buf, sizeof(buf)) < 0,
+   "read on a guest_mem fd should fail");
+   TEST_ASSERT(write(fd, buf, sizeof(buf)) < 0,
+   "write on a guest_mem fd should fail");
+   TEST_ASSERT(pread(fd, buf, sizeof(buf), 0) < 0,
+   "pread on a guest_mem fd should fail");
+   TEST_ASSERT(pwrite(fd, buf, sizeof(buf), 0) < 0,
+   "pwrite on a guest_mem fd should fail");
+}
+
+static void test_mmap(int fd, size_t page_size)
+{
+   char *mem;
+
+   mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+   ASSERT_EQ(mem, MAP_FAILED);
+}
+
+static void test_file_size(int fd, size_t page_size, size_t total_size)
+{
+   struct stat sb;
+   int ret;
+
+   ret = fstat(fd, &sb);
+   TEST_ASSERT(!ret, "fstat should succeed");
+   ASSERT_EQ(sb.st_size, total_size);
+   ASSERT_EQ(sb.st_blksize, page_size);
+}
+
+static void test_fallocate(int fd, size_t page_size, size_t total_size)
+{
+   int ret;
+
+   ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, total_size);
+   TEST_ASSERT(!ret, "fallocate with aligned offset and size should succeed");
+
+   ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+   page_size - 1, page_size);
+   TEST_ASSERT(ret, "fallocate with unaligned offset should fail");
+
+   ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, total_size, page_size);
+   TEST_ASSERT(ret, "fallocate beginning at total_size should fail");
+
+   ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, total_size + page_size, page_size);
+   TEST_ASSERT(ret, "fallocate beginning at total_size should fail");
+
+   ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+   total_size, page_size);
+   TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) at total_size should succeed");
+
+   ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+   total_size + page_size, page_size);
+   TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) after total_size should succeed");
+
+   ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+   page_size, page_size - 1);
+   TEST_ASSERT(ret, "fallocate with unaligned size should fail");
+
+   ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+   page_size, page_size);
+   TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) with aligned offset and size should succeed");
+
+   ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, page_size, page_size);
+   TEST_ASSERT(!ret, "fallocate to restore punched hole should succeed");
+}
+
+
+int main(int argc, char *argv[])
+{
+   size_t page_size;
+   size_t total_size;
+   int fd;
+   struct kvm_vm *vm;
+
+   page_size = getpagesize();
+   total_size = page_size * 4;
+
+   vm = vm_create_barebones();
+
+   fd = vm_create_guest_memfd(vm, total_size, 0);
+
+   test_file_read_write(fd);
+   test_mmap(fd, page_size);
+   

[RFC PATCH v11 27/29] KVM: selftests: Expand set_memory_region_test to validate guest_memfd()

2023-07-18 Thread Sean Christopherson
From: Chao Peng 

Expand set_memory_region_test to exercise various positive and negative
testcases for private memory.

 - Non-guest_memfd() file descriptor for private memory
 - guest_memfd() from different VM
 - Overlapping bindings
 - Unaligned bindings

Signed-off-by: Chao Peng 
Co-developed-by: Ackerley Tng 
Signed-off-by: Ackerley Tng 
[sean: trim the testcases to remove duplicate coverage]
Signed-off-by: Sean Christopherson 
---
 .../selftests/kvm/include/kvm_util_base.h | 10 ++
 .../selftests/kvm/set_memory_region_test.c| 99 +++
 2 files changed, 109 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 334df27a6f43..39b38c75b99c 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -789,6 +789,16 @@ static inline struct kvm_vm *vm_create_barebones(void)
return vm_create(VM_SHAPE_DEFAULT);
 }
 
+static inline struct kvm_vm *vm_create_barebones_protected_vm(void)
+{
+   const struct vm_shape shape = {
+   .mode = VM_MODE_DEFAULT,
+   .type = KVM_X86_SW_PROTECTED_VM,
+   };
+
+   return vm_create(shape);
+}
+
 static inline struct kvm_vm *vm_create(uint32_t nr_runnable_vcpus)
 {
return __vm_create(VM_SHAPE_DEFAULT, nr_runnable_vcpus, 0);
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index a849ce23ca97..ca2ca6947376 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -382,6 +382,98 @@ static void test_add_max_memory_regions(void)
kvm_vm_free(vm);
 }
 
+
+static void test_invalid_guest_memfd(struct kvm_vm *vm, int memfd,
+size_t offset, const char *msg)
+{
+   int r = __vm_set_user_memory_region2(vm, MEM_REGION_SLOT, KVM_MEM_PRIVATE,
+MEM_REGION_GPA, MEM_REGION_SIZE,
+0, memfd, offset);
+   TEST_ASSERT(r == -1 && errno == EINVAL, "%s", msg);
+}
+
+static void test_add_private_memory_region(void)
+{
+   struct kvm_vm *vm, *vm2;
+   int memfd, i;
+
+   pr_info("Testing ADD of KVM_MEM_PRIVATE memory regions\n");
+
+   vm = vm_create_barebones_protected_vm();
+
+   test_invalid_guest_memfd(vm, vm->kvm_fd, 0, "KVM fd should fail");
+   test_invalid_guest_memfd(vm, vm->fd, 0, "VM's fd should fail");
+
+   memfd = kvm_memfd_alloc(MEM_REGION_SIZE, false);
+   test_invalid_guest_memfd(vm, vm->fd, 0, "Regular memfd() should fail");
+   close(memfd);
+
+   vm2 = vm_create_barebones_protected_vm();
+   memfd = vm_create_guest_memfd(vm2, MEM_REGION_SIZE, 0);
+   test_invalid_guest_memfd(vm, memfd, 0, "Other VM's guest_memfd() should fail");
+
+   vm_set_user_memory_region2(vm2, MEM_REGION_SLOT, KVM_MEM_PRIVATE,
+  MEM_REGION_GPA, MEM_REGION_SIZE, 0, memfd, 0);
+   close(memfd);
+   kvm_vm_free(vm2);
+
+   memfd = vm_create_guest_memfd(vm, MEM_REGION_SIZE, 0);
+   for (i = 1; i < PAGE_SIZE; i++)
+   test_invalid_guest_memfd(vm, memfd, i, "Unaligned offset should fail");
+
+   vm_set_user_memory_region2(vm, MEM_REGION_SLOT, KVM_MEM_PRIVATE,
+  MEM_REGION_GPA, MEM_REGION_SIZE, 0, memfd, 0);
+   close(memfd);
+
+   kvm_vm_free(vm);
+}
+
+static void test_add_overlapping_private_memory_regions(void)
+{
+   struct kvm_vm *vm;
+   int memfd;
+   int r;
+
+   pr_info("Testing ADD of overlapping KVM_MEM_PRIVATE memory regions\n");
+
+   vm = vm_create_barebones_protected_vm();
+
+   memfd = vm_create_guest_memfd(vm, MEM_REGION_SIZE * 4, 0);
+
+   vm_set_user_memory_region2(vm, MEM_REGION_SLOT, KVM_MEM_PRIVATE,
+  MEM_REGION_GPA, MEM_REGION_SIZE * 2, 0, memfd, 0);
+
+   vm_set_user_memory_region2(vm, MEM_REGION_SLOT + 1, KVM_MEM_PRIVATE,
+  MEM_REGION_GPA * 2, MEM_REGION_SIZE * 2,
+  0, memfd, MEM_REGION_SIZE * 2);
+
+   /*
+* Delete the first memslot, and then attempt to recreate it except
+* with a "bad" offset that results in overlap in the guest_memfd().
+*/
+   vm_set_user_memory_region2(vm, MEM_REGION_SLOT, KVM_MEM_PRIVATE,
+  MEM_REGION_GPA, 0, NULL, -1, 0);
+
+   /* Overlap the front half of the other slot. */
+   r = __vm_set_user_memory_region2(vm, MEM_REGION_SLOT, KVM_MEM_PRIVATE,
+MEM_REGION_GPA * 2 - MEM_REGION_SIZE,
+MEM_REGION_SIZE * 2,
+0, memfd, 0);
+   TEST_ASSERT(r == -1 && errno == EEXIST, "%s",
+  

[RFC PATCH v11 26/29] KVM: selftests: Add KVM_SET_USER_MEMORY_REGION2 helper

2023-07-18 Thread Sean Christopherson
From: Chao Peng 

Provide a raw version as well as an assert-success version to reduce
the amount of boilerplate code needed for basic usage.

Signed-off-by: Chao Peng 
Signed-off-by: Ackerley Tng 
---
 .../selftests/kvm/include/kvm_util_base.h |  7 +
 tools/testing/selftests/kvm/lib/kvm_util.c| 29 +++
 2 files changed, 36 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 856440294013..334df27a6f43 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -492,6 +492,13 @@ void vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t flags,
   uint64_t gpa, uint64_t size, void *hva);
 int __vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t flags,
uint64_t gpa, uint64_t size, void *hva);
+void vm_set_user_memory_region2(struct kvm_vm *vm, uint32_t slot,
+   uint32_t flags, uint64_t gpa, uint64_t size,
+   void *hva, uint32_t gmem_fd, uint64_t gmem_offset);
+int __vm_set_user_memory_region2(struct kvm_vm *vm, uint32_t slot,
+uint32_t flags, uint64_t gpa, uint64_t size,
+void *hva, uint32_t gmem_fd, uint64_t gmem_offset);
+
 void vm_userspace_mem_region_add(struct kvm_vm *vm,
enum vm_mem_backing_src_type src_type,
uint64_t guest_paddr, uint32_t slot, uint64_t npages,
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 64221c320389..f7b8b5eb3e8f 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -868,6 +868,35 @@ void vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t flags,
errno, strerror(errno));
 }
 
+int __vm_set_user_memory_region2(struct kvm_vm *vm, uint32_t slot,
+uint32_t flags, uint64_t gpa, uint64_t size,
+void *hva, uint32_t gmem_fd, uint64_t gmem_offset)
+{
+   struct kvm_userspace_memory_region2 region = {
+   .slot = slot,
+   .flags = flags,
+   .guest_phys_addr = gpa,
+   .memory_size = size,
+   .userspace_addr = (uintptr_t)hva,
+   .gmem_fd = gmem_fd,
+   .gmem_offset = gmem_offset,
+   };
+
+   return ioctl(vm->fd, KVM_SET_USER_MEMORY_REGION2, &region);
+}
+
+void vm_set_user_memory_region2(struct kvm_vm *vm, uint32_t slot,
+   uint32_t flags, uint64_t gpa, uint64_t size,
+   void *hva, uint32_t gmem_fd, uint64_t gmem_offset)
+{
+   int ret = __vm_set_user_memory_region2(vm, slot, flags, gpa, size, hva,
+  gmem_fd, gmem_offset);
+
+   TEST_ASSERT(!ret, "KVM_SET_USER_MEMORY_REGION2 failed, errno = %d (%s)",
+   errno, strerror(errno));
+}
+
+
 /* FIXME: This thing needs to be ripped apart and rewritten. */
 void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
uint64_t guest_paddr, uint32_t slot, uint64_t npages,
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 25/29] KVM: selftests: Add x86-only selftest for private memory conversions

2023-07-18 Thread Sean Christopherson
From: Vishal Annapurve 

Add a selftest to exercise implicit/explicit conversion functionality
within KVM and verify:

 - Shared memory is visible to host userspace
 - Private memory is not visible to host userspace
 - Host userspace and guest can communicate over shared memory
 - Data in shared backing is preserved across conversions (test's
   host userspace doesn't free the data)
 - Private memory is bound to the lifetime of the VM

TODO: rewrite this to allow backing a single region of guest memory with
multiple memslots for _all_ backing types and shapes, i.e. make the code
for using a single backing fd across multiple memslots apply to regular
memory as well.

Signed-off-by: Vishal Annapurve 
Co-developed-by: Ackerley Tng 
Signed-off-by: Ackerley Tng 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
---
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../kvm/x86_64/private_mem_conversions_test.c | 408 ++
 2 files changed, 409 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index c692cc86e7da..fdc7dff8d6ae 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -80,6 +80,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/monitor_mwait_test
 TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
 TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
+TEST_GEN_PROGS_x86_64 += x86_64/private_mem_conversions_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
 TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
 TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test
diff --git a/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c
new file mode 100644
index ..40ec5f9cc256
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c
@@ -0,0 +1,408 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022, Google LLC.
+ */
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define BASE_DATA_SLOT 10
+#define BASE_DATA_GPA  ((uint64_t)(1ull << 32))
+#define PER_CPU_DATA_SIZE  ((uint64_t)(SZ_2M + PAGE_SIZE))
+
+/* Horrific macro so that the line info is captured accurately :-( */
+#define memcmp_g(gpa, pattern,  size)  \
+do {   \
+   uint8_t *mem = (uint8_t *)gpa;  \
+   size_t i;   \
+   \
+   for (i = 0; i < size; i++)  \
+   GUEST_ASSERT_4(mem[i] == pattern,   \
+  gpa, i, mem[i], pattern);\
+} while (0)
+
+static void memcmp_h(uint8_t *mem, uint8_t pattern, size_t size)
+{
+   size_t i;
+
+   for (i = 0; i < size; i++)
+   TEST_ASSERT(mem[i] == pattern,
+   "Expected 0x%x at offset %lu, got 0x%x",
+   pattern, i, mem[i]);
+}
+
+/*
+ * Run memory conversion tests with explicit conversion:
+ * Execute KVM hypercall to map/unmap gpa range which will cause userspace exit
+ * to back/unback private memory. Subsequent accesses by guest to the gpa range
+ * will not cause exit to userspace.
+ *
+ * Test memory conversion scenarios with following steps:
+ * 1) Access private memory using private access and verify that memory contents
+ *   are not visible to userspace.
+ * 2) Convert memory to shared using explicit conversions and ensure that
+ *   userspace is able to access the shared regions.
+ * 3) Convert memory back to private using explicit conversions and ensure that
+ *   userspace is again not able to access converted private regions.
+ */
+
+#define GUEST_STAGE(o, s) { .offset = o, .size = s }
+
+enum ucall_syncs {
+   SYNC_SHARED,
+   SYNC_PRIVATE,
+};
+
+static void guest_sync_shared(uint64_t gpa, uint64_t size,
+ uint8_t current_pattern, uint8_t new_pattern)
+{
+   GUEST_SYNC5(SYNC_SHARED, gpa, size, current_pattern, new_pattern);
+}
+
+static void guest_sync_private(uint64_t gpa, uint64_t size, uint8_t pattern)
+{
+   GUEST_SYNC4(SYNC_PRIVATE, gpa, size, pattern);
+}
+
+/* Arbitrary values, KVM doesn't care about the attribute flags. */
+#define MAP_GPA_SHARED BIT(0)
+#define MAP_GPA_DO_FALLOCATE   BIT(1)
+
+static void guest_map_mem(uint64_t gpa, uint64_t size, bool map_shared,
+ bool do_fallocate)
+{
+   uint64_t flags = 0;
+
+  

[RFC PATCH v11 24/29] KVM: selftests: Add GUEST_SYNC[1-6] macros for synchronizing more data

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 tools/testing/selftests/kvm/include/ucall_common.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/ucall_common.h b/tools/testing/selftests/kvm/include/ucall_common.h
index 1a6aaef5ccae..8087c877fd58 100644
--- a/tools/testing/selftests/kvm/include/ucall_common.h
+++ b/tools/testing/selftests/kvm/include/ucall_common.h
@@ -46,6 +46,18 @@ void ucall_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa);
 #define GUEST_SYNC_ARGS(stage, arg1, arg2, arg3, arg4) \
ucall(UCALL_SYNC, 6, "hello", stage, arg1, arg2, arg3, arg4)
 #define GUEST_SYNC(stage)  ucall(UCALL_SYNC, 2, "hello", stage)
+
+#define GUEST_SYNC1(arg0)  ucall(UCALL_SYNC, 1, arg0)
+#define GUEST_SYNC2(arg0, arg1)ucall(UCALL_SYNC, 2, arg0, arg1)
+#define GUEST_SYNC3(arg0, arg1, arg2) \
+   ucall(UCALL_SYNC, 3, arg0, arg1, arg2)
+#define GUEST_SYNC4(arg0, arg1, arg2, arg3) \
+   ucall(UCALL_SYNC, 4, arg0, arg1, arg2, arg3)
+#define GUEST_SYNC5(arg0, arg1, arg2, arg3, arg4) \
ucall(UCALL_SYNC, 5, arg0, arg1, arg2, arg3, arg4)
+#define GUEST_SYNC6(arg0, arg1, arg2, arg3, arg4, arg5) \
ucall(UCALL_SYNC, 6, arg0, arg1, arg2, arg3, arg4, arg5)
+
 #define GUEST_DONE()   ucall(UCALL_DONE, 0)
 
 enum guest_assert_builtin_args {
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 23/29] KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 tools/testing/selftests/kvm/dirty_log_test.c  |  2 +-
 .../selftests/kvm/include/kvm_util_base.h | 54 +++
 .../selftests/kvm/kvm_page_table_test.c   |  2 +-
 tools/testing/selftests/kvm/lib/kvm_util.c| 43 +++
 tools/testing/selftests/kvm/lib/memstress.c   |  3 +-
 .../kvm/x86_64/ucna_injection_test.c  |  2 +-
 6 files changed, 72 insertions(+), 34 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 936f3a8d1b83..6cbecf499767 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -699,7 +699,7 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, struct kvm_vcpu **vcpu,
 
pr_info("Testing guest mode: %s\n", vm_guest_mode_string(mode));
 
-   vm = __vm_create(mode, 1, extra_mem_pages);
+   vm = __vm_create(VM_SHAPE(mode), 1, extra_mem_pages);
 
log_mode_create_vm_done(vm);
*vcpu = vm_vcpu_add(vm, 0, guest_code);
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 1819787b773b..856440294013 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -167,6 +167,23 @@ enum vm_guest_mode {
NUM_VM_MODES,
 };
 
+struct vm_shape {
+   enum vm_guest_mode mode;
+   unsigned int type;
+};
+
+#define VM_TYPE_DEFAULT0
+
+#define VM_SHAPE(__mode)   \
+({ \
+   struct vm_shape shape = {   \
+   .mode = (__mode),   \
+   .type = VM_TYPE_DEFAULT \
+   };  \
+   \
+   shape;  \
+})
+
 #if defined(__aarch64__)
 
 extern enum vm_guest_mode vm_mode_default;
@@ -199,6 +216,8 @@ extern enum vm_guest_mode vm_mode_default;
 
 #endif
 
+#define VM_SHAPE_DEFAULT   VM_SHAPE(VM_MODE_DEFAULT)
+
 #define MIN_PAGE_SIZE  (1U << MIN_PAGE_SHIFT)
 #define PTES_PER_MIN_PAGE  ptes_per_page(MIN_PAGE_SIZE)
 
@@ -754,21 +773,21 @@ vm_paddr_t vm_alloc_page_table(struct kvm_vm *vm);
  * __vm_create() does NOT create vCPUs, @nr_runnable_vcpus is used purely to
  * calculate the amount of memory needed for per-vCPU data, e.g. stacks.
  */
-struct kvm_vm *vm_create(enum vm_guest_mode mode);
-struct kvm_vm *__vm_create(enum vm_guest_mode mode, uint32_t nr_runnable_vcpus,
+struct kvm_vm *vm_create(struct vm_shape shape);
+struct kvm_vm *__vm_create(struct vm_shape shape, uint32_t nr_runnable_vcpus,
   uint64_t nr_extra_pages);
 
 static inline struct kvm_vm *vm_create_barebones(void)
 {
-   return vm_create(VM_MODE_DEFAULT);
+   return vm_create(VM_SHAPE_DEFAULT);
 }
 
 static inline struct kvm_vm *vm_create(uint32_t nr_runnable_vcpus)
 {
-   return __vm_create(VM_MODE_DEFAULT, nr_runnable_vcpus, 0);
+   return __vm_create(VM_SHAPE_DEFAULT, nr_runnable_vcpus, 0);
 }
 
-struct kvm_vm *__vm_create_with_vcpus(enum vm_guest_mode mode, uint32_t nr_vcpus,
+struct kvm_vm *__vm_create_with_vcpus(struct vm_shape shape, uint32_t nr_vcpus,
  uint64_t extra_mem_pages,
  void *guest_code, struct kvm_vcpu *vcpus[]);
 
@@ -776,17 +795,27 @@ static inline struct kvm_vm *vm_create_with_vcpus(uint32_t nr_vcpus,
  void *guest_code,
  struct kvm_vcpu *vcpus[])
 {
-   return __vm_create_with_vcpus(VM_MODE_DEFAULT, nr_vcpus, 0,
+   return __vm_create_with_vcpus(VM_SHAPE_DEFAULT, nr_vcpus, 0,
  guest_code, vcpus);
 }
 
+
+struct kvm_vm *__vm_create_shape_with_one_vcpu(struct vm_shape shape,
+  struct kvm_vcpu **vcpu,
+  uint64_t extra_mem_pages,
+  void *guest_code);
+
 /*
  * Create a VM with a single vCPU with reasonable defaults and @extra_mem_pages
  * additional pages of guest memory.  Returns the VM and vCPU (via out param).
  */
-struct kvm_vm *__vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
-uint64_t extra_mem_pages,
-void *guest_code);
+static inline struct kvm_vm *__vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
+  uint64_t extra_mem_pages,
+  void *guest_code)
+{
+   return __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, vcpu,
+  extra_mem_pages, 

[RFC PATCH v11 22/29] KVM: selftests: Add helpers to do KVM_HC_MAP_GPA_RANGE hypercalls (x86)

2023-07-18 Thread Sean Christopherson
From: Vishal Annapurve 

Signed-off-by: Vishal Annapurve 
[sean: drop shared/private helpers (let tests specify flags)]
Signed-off-by: Sean Christopherson 
---
 .../selftests/kvm/include/x86_64/processor.h  | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index aa434c8f19c5..8857143d400a 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -15,6 +15,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 #include "../kvm_util.h"
@@ -1166,6 +1167,20 @@ uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
 uint64_t __xen_hypercall(uint64_t nr, uint64_t a0, void *a1);
 void xen_hypercall(uint64_t nr, uint64_t a0, void *a1);
 
+static inline uint64_t __kvm_hypercall_map_gpa_range(uint64_t gpa,
+uint64_t size, uint64_t flags)
+{
+   return kvm_hypercall(KVM_HC_MAP_GPA_RANGE, gpa, size >> PAGE_SHIFT, flags, 0);
+}
+
+static inline void kvm_hypercall_map_gpa_range(uint64_t gpa, uint64_t size,
+  uint64_t flags)
+{
+   uint64_t ret = __kvm_hypercall_map_gpa_range(gpa, size, flags);
+
+   GUEST_ASSERT_1(!ret, ret);
+}
+
 void __vm_xsave_require_permission(uint64_t xfeature, const char *name);
 
 #define vm_xsave_require_permission(xfeature)  \
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 21/29] KVM: selftests: Add helpers to convert guest memory b/w private and shared

2023-07-18 Thread Sean Christopherson
From: Vishal Annapurve 

Signed-off-by: Vishal Annapurve 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
---
 .../selftests/kvm/include/kvm_util_base.h | 48 +++
 tools/testing/selftests/kvm/lib/kvm_util.c| 26 ++
 2 files changed, 74 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index f1de6a279561..1819787b773b 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -312,6 +312,54 @@ static inline void vm_enable_cap(struct kvm_vm *vm, uint32_t cap, uint64_t arg0)
vm_ioctl(vm, KVM_ENABLE_CAP, &enable_cap);
 }
 
+static inline void vm_set_memory_attributes(struct kvm_vm *vm, uint64_t gpa,
+   uint64_t size, uint64_t attributes)
+{
+   struct kvm_memory_attributes attr = {
+   .attributes = attributes,
+   .address = gpa,
+   .size = size,
+   .flags = 0,
+   };
+
+   /*
+* KVM_SET_MEMORY_ATTRIBUTES overwrites _all_ attributes.  These flows
+* need significant enhancements to support multiple attributes.
+*/
+   TEST_ASSERT(!attributes || attributes == KVM_MEMORY_ATTRIBUTE_PRIVATE,
+   "Update me to support multiple attributes!");
+
+   vm_ioctl(vm, KVM_SET_MEMORY_ATTRIBUTES, &attr);
+}
+
+
+static inline void vm_mem_set_private(struct kvm_vm *vm, uint64_t gpa,
+ uint64_t size)
+{
+   vm_set_memory_attributes(vm, gpa, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+}
+
+static inline void vm_mem_set_shared(struct kvm_vm *vm, uint64_t gpa,
+uint64_t size)
+{
+   vm_set_memory_attributes(vm, gpa, size, 0);
+}
+
+void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t gpa, uint64_t size,
+   bool punch_hole);
+
+static inline void vm_guest_mem_punch_hole(struct kvm_vm *vm, uint64_t gpa,
+  uint64_t size)
+{
+   vm_guest_mem_fallocate(vm, gpa, size, true);
+}
+
+static inline void vm_guest_mem_allocate(struct kvm_vm *vm, uint64_t gpa,
+uint64_t size)
+{
+   vm_guest_mem_fallocate(vm, gpa, size, false);
+}
+
 void vm_enable_dirty_ring(struct kvm_vm *vm, uint32_t ring_size);
 const char *vm_guest_mode_string(uint32_t i);
 
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index b93717e62325..1283e24b76f1 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1171,6 +1171,32 @@ void vm_mem_region_delete(struct kvm_vm *vm, uint32_t slot)
__vm_mem_region_delete(vm, memslot2region(vm, slot), true);
 }
 
+void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t gpa, uint64_t size,
+   bool punch_hole)
+{
+   struct userspace_mem_region *region;
+   uint64_t end = gpa + size - 1;
+   off_t fd_offset;
+   int mode, ret;
+
+   region = userspace_mem_region_find(vm, gpa, gpa);
+   TEST_ASSERT(region && region->region.flags & KVM_MEM_PRIVATE,
+   "Private memory region not found for GPA 0x%lx", gpa);
+
+   TEST_ASSERT(region == userspace_mem_region_find(vm, end, end),
+   "fallocate() for guest_memfd must act on a single memslot");
+
+   fd_offset = region->region.gmem_offset +
+   (gpa - region->region.guest_phys_addr);
+
+   mode = FALLOC_FL_KEEP_SIZE | (punch_hole ? FALLOC_FL_PUNCH_HOLE : 0);
+
+   ret = fallocate(region->region.gmem_fd, mode, fd_offset, size);
+   TEST_ASSERT(!ret, "fallocate() failed to %s at %lx[%lu], fd = %d, mode = %x, offset = %lx\n",
+punch_hole ? "punch hole" : "allocate", gpa, size,
+region->region.gmem_fd, mode, fd_offset);
+}
+
 /* Returns the size of a vCPU's kvm_run structure. */
 static int vcpu_mmap_sz(void)
 {
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 20/29] KVM: selftests: Add support for creating private memslots

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 .../selftests/kvm/include/kvm_util_base.h | 16 
 .../testing/selftests/kvm/include/test_util.h |  5 ++
 tools/testing/selftests/kvm/lib/kvm_util.c| 85 ---
 3 files changed, 75 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index d4a9925d6815..f1de6a279561 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -407,6 +407,19 @@ static inline uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name)
 }
 
 void vm_create_irqchip(struct kvm_vm *vm);
+static inline int vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
+   uint64_t flags)
+{
+   struct kvm_create_guest_memfd gmem = {
+   .size = size,
+   .flags = flags,
+   };
+
+   int fd = __vm_ioctl(vm, KVM_CREATE_GUEST_MEMFD, &gmem);
+
+   TEST_ASSERT(fd >= 0, KVM_IOCTL_ERROR(KVM_CREATE_GUEST_MEMFD, fd));
+   return fd;
+}
 
 void vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t flags,
   uint64_t gpa, uint64_t size, void *hva);
@@ -416,6 +429,9 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
enum vm_mem_backing_src_type src_type,
uint64_t guest_paddr, uint32_t slot, uint64_t npages,
uint32_t flags);
+void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
+   uint64_t guest_paddr, uint32_t slot, uint64_t npages,
+   uint32_t flags, int gmem_fd, uint64_t gmem_offset);
 
 void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags);
 void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa);
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index a6e9f215ce70..f3088d27f3ce 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -143,6 +143,11 @@ static inline bool backing_src_is_shared(enum vm_mem_backing_src_type t)
return vm_mem_backing_src_alias(t)->flag & MAP_SHARED;
 }
 
+static inline bool backing_src_can_be_huge(enum vm_mem_backing_src_type t)
+{
+   return t != VM_MEM_SRC_ANONYMOUS && t != VM_MEM_SRC_SHMEM;
+}
+
 /* Aligns x up to the next multiple of size. Size must be a power of 2. */
 static inline uint64_t align_up(uint64_t x, uint64_t size)
 {
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index c1e4de53d082..b93717e62325 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -664,6 +664,8 @@ static void __vm_mem_region_delete(struct kvm_vm *vm,
TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
close(region->fd);
}
+   if (region->region.gmem_fd >= 0)
+   close(region->region.gmem_fd);
 
free(region);
 }
@@ -865,36 +867,15 @@ void vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t flags,
errno, strerror(errno));
 }
 
-/*
- * VM Userspace Memory Region Add
- *
- * Input Args:
- *   vm - Virtual Machine
- *   src_type - Storage source for this region.
- *  NULL to use anonymous memory.
- *   guest_paddr - Starting guest physical address
- *   slot - KVM region slot
- *   npages - Number of physical pages
- *   flags - KVM memory region flags (e.g. KVM_MEM_LOG_DIRTY_PAGES)
- *
- * Output Args: None
- *
- * Return: None
- *
- * Allocates a memory area of the number of pages specified by npages
- * and maps it to the VM specified by vm, at a starting physical address
- * given by guest_paddr.  The region is created with a KVM region slot
- * given by slot, which must be unique and < KVM_MEM_SLOTS_NUM.  The
- * region is created with the flags given by flags.
- */
-void vm_userspace_mem_region_add(struct kvm_vm *vm,
-   enum vm_mem_backing_src_type src_type,
-   uint64_t guest_paddr, uint32_t slot, uint64_t npages,
-   uint32_t flags)
+/* FIXME: This thing needs to be ripped apart and rewritten. */
+void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
+   uint64_t guest_paddr, uint32_t slot, uint64_t npages,
+   uint32_t flags, int gmem_fd, uint64_t gmem_offset)
 {
int ret;
struct userspace_mem_region *region;
size_t backing_src_pagesz = get_backing_src_pagesz(src_type);
+   size_t mem_size = npages * vm->page_size;
size_t alignment;
 
TEST_ASSERT(vm_adjust_num_guest_pages(vm->mode, npages) == npages,
@@ -947,7 +928,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
/* Allocate and initialize new mem region structure. */
region = calloc(1, sizeof(*region));
TEST_ASSERT(region != 

[RFC PATCH v11 19/29] KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 .../selftests/kvm/include/kvm_util_base.h  |  2 +-
 tools/testing/selftests/kvm/lib/kvm_util.c | 18 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 6aeb008dd668..d4a9925d6815 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -43,7 +43,7 @@ typedef uint64_t vm_paddr_t; /* Virtual Machine (Guest) physical address */
 typedef uint64_t vm_vaddr_t; /* Virtual Machine (Guest) virtual address */
 
 struct userspace_mem_region {
-   struct kvm_userspace_memory_region region;
+   struct kvm_userspace_memory_region2 region;
struct sparsebit *unused_phy_pages;
int fd;
off_t offset;
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 45d21e052db0..c1e4de53d082 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -449,8 +449,8 @@ void kvm_vm_restart(struct kvm_vm *vmp)
vm_create_irqchip(vmp);
 
hash_for_each(vmp->regions.slot_hash, ctr, region, slot_node) {
-   int ret = ioctl(vmp->fd, KVM_SET_USER_MEMORY_REGION, &region->region);
-   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION IOCTL failed,\n"
+   int ret = ioctl(vmp->fd, KVM_SET_USER_MEMORY_REGION2, &region->region);
+   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
"  rc: %i errno: %i\n"
"  slot: %u flags: 0x%x\n"
"  guest_phys_addr: 0x%llx size: 0x%llx",
@@ -653,7 +653,7 @@ static void __vm_mem_region_delete(struct kvm_vm *vm,
}
 
region->region.memory_size = 0;
-   vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region);
+   vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, &region->region);
 
sparsebit_free(&region->unused_phy_pages);
ret = munmap(region->mmap_start, region->mmap_size);
@@ -1010,8 +1010,8 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
region->region.guest_phys_addr = guest_paddr;
region->region.memory_size = npages * vm->page_size;
region->region.userspace_addr = (uintptr_t) region->host_mem;
-   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region);
-   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION IOCTL failed,\n"
+   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, &region->region);
+   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
"  rc: %i errno: %i\n"
"  slot: %u flags: 0x%x\n"
"  guest_phys_addr: 0x%lx size: 0x%lx",
@@ -1093,9 +1093,9 @@ void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags)
 
region->region.flags = flags;
 
-   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region);
+   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, &region->region);
 
-   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION IOCTL failed,\n"
+   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
"  rc: %i errno: %i slot: %u flags: 0x%x",
ret, errno, slot, flags);
 }
@@ -1123,9 +1123,9 @@ void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa)
 
region->region.guest_phys_addr = new_gpa;
 
-   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region);
+   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, &region->region);
 
-   TEST_ASSERT(!ret, "KVM_SET_USER_MEMORY_REGION failed\n"
+   TEST_ASSERT(!ret, "KVM_SET_USER_MEMORY_REGION2 failed\n"
"ret: %i errno: %i slot: %u new_gpa: 0x%lx",
ret, errno, slot, new_gpa);
 }
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 18/29] KVM: selftests: Drop unused kvm_userspace_memory_region_find() helper

2023-07-18 Thread Sean Christopherson
Drop kvm_userspace_memory_region_find(); it's unused and a terrible API
(probably why it's unused).  If anything outside of kvm_util.c needs to
get at the memslot, userspace_mem_region_find() can be exposed to give
others full access to all memory region/slot information.

Signed-off-by: Sean Christopherson 
---
 .../selftests/kvm/include/kvm_util_base.h |  4 ---
 tools/testing/selftests/kvm/lib/kvm_util.c| 29 ---
 2 files changed, 33 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 07732a157ccd..6aeb008dd668 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -753,10 +753,6 @@ vm_adjust_num_guest_pages(enum vm_guest_mode mode, unsigned int num_guest_pages)
return n;
 }
 
-struct kvm_userspace_memory_region *
-kvm_userspace_memory_region_find(struct kvm_vm *vm, uint64_t start,
-uint64_t end);
-
 #define sync_global_to_guest(vm, g) ({ \
typeof(g) *_p = addr_gva2hva(vm, (vm_vaddr_t)&(g)); \
memcpy(_p, &(g), sizeof(g));\
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 9741a7ff6380..45d21e052db0 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -586,35 +586,6 @@ userspace_mem_region_find(struct kvm_vm *vm, uint64_t start, uint64_t end)
return NULL;
 }
 
-/*
- * KVM Userspace Memory Region Find
- *
- * Input Args:
- *   vm - Virtual Machine
- *   start - Starting VM physical address
- *   end - Ending VM physical address, inclusive.
- *
- * Output Args: None
- *
- * Return:
- *   Pointer to overlapping region, NULL if no such region.
- *
- * Public interface to userspace_mem_region_find. Allows tests to look up
- * the memslot datastructure for a given range of guest physical memory.
- */
-struct kvm_userspace_memory_region *
-kvm_userspace_memory_region_find(struct kvm_vm *vm, uint64_t start,
-uint64_t end)
-{
-   struct userspace_mem_region *region;
-
-   region = userspace_mem_region_find(vm, start, end);
-   if (!region)
-   return NULL;
-
-   return &region->region;
-}
-
 __weak void vcpu_arch_free(struct kvm_vcpu *vcpu)
 {
 
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 17/29] KVM: x86: Add support for "protected VMs" that can utilize private memory

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 Documentation/virt/kvm/api.rst  | 32 
 arch/x86/include/asm/kvm_host.h | 15 +--
 arch/x86/include/uapi/asm/kvm.h |  3 +++
 arch/x86/kvm/Kconfig| 12 
 arch/x86/kvm/mmu/mmu_internal.h |  1 +
 arch/x86/kvm/x86.c  | 16 +++-
 include/uapi/linux/kvm.h|  1 +
 virt/kvm/Kconfig|  5 +
 8 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0ca8561775ac..9f7b95327c2a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -147,10 +147,29 @@ described as 'basic' will be available.
 The new VM has no virtual cpus and no memory.
 You probably want to use 0 as machine type.
 
+X86:
+
+
+Supported X86 VM types can be queried via KVM_CAP_VM_TYPES.
+
+S390:
+^
+
 In order to create user controlled virtual machines on S390, check
 KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
 privileged user (CAP_SYS_ADMIN).
 
+MIPS:
+^
+
+To use hardware assisted virtualization on MIPS (VZ ASE) rather than
+the default trap & emulate implementation (which changes the virtual
+memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
+flag KVM_VM_MIPS_VZ.
+
+ARM64:
+^^
+
 On arm64, the physical address size for a VM (IPA Size limit) is limited
 to 40bits by default. The limit can be configured if the host supports the
 extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
@@ -8554,6 +8573,19 @@ block sizes is exposed in 
KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES as a
 This capability indicates KVM supports per-page memory attributes and ioctls
 KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES/KVM_SET_MEMORY_ATTRIBUTES are available.
 
+8.41 KVM_CAP_VM_TYPES
+-
+
+:Capability: KVM_CAP_VM_TYPES
+:Architectures: x86
+:Type: system ioctl
+
+This capability returns a bitmap of supported VM types.  The 1-setting of bit @n
+means the VM type with value @n is supported.  Possible values of @n are::
+
+  #define KVM_X86_DEFAULT_VM   0
+  #define KVM_X86_SW_PROTECTED_VM  1
+
 9. Known KVM API problems
 =
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 08b44544a330..bbefd79b7950 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1227,6 +1227,7 @@ enum kvm_apicv_inhibit {
 };
 
 struct kvm_arch {
+   unsigned long vm_type;
unsigned long n_used_mmu_pages;
unsigned long n_requested_mmu_pages;
unsigned long n_max_mmu_pages;
@@ -2058,6 +2059,12 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t 
new_pgd);
 void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
   int tdp_max_root_level, int tdp_huge_page_level);
 
+#ifdef CONFIG_KVM_PRIVATE_MEM
+#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.vm_type != 
KVM_X86_DEFAULT_VM)
+#else
+#define kvm_arch_has_private_mem(kvm) false
+#endif
+
 static inline u16 kvm_read_ldt(void)
 {
u16 ldt;
@@ -2106,14 +2113,10 @@ enum {
 #define HF_SMM_INSIDE_NMI_MASK (1 << 2)
 
 # define KVM_MAX_NR_ADDRESS_SPACES 2
+/* SMM is currently unsupported for guests with private memory. */
+# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 
2)
 # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 
1 : 0)
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
-
-static inline int kvm_arch_nr_memslot_as_ids(struct kvm *kvm)
-{
-   return KVM_MAX_NR_ADDRESS_SPACES;
-}
-
 #else
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
 #endif
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 1a6a1f987949..a448d0964fc0 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -562,4 +562,7 @@ struct kvm_pmu_event_filter {
 /* x86-specific KVM_EXIT_HYPERCALL flags. */
 #define KVM_EXIT_HYPERCALL_LONG_MODE   BIT(0)
 
+#define KVM_X86_DEFAULT_VM 0
+#define KVM_X86_SW_PROTECTED_VM1
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index a7eb2bdbfb18..029c76bcd1a5 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -77,6 +77,18 @@ config KVM_WERROR
 
  If in doubt, say "N".
 
+config KVM_SW_PROTECTED_VM
+   bool "Enable support for KVM software-protected VMs"
+   depends on EXPERT
+   depends on X86_64
+   select KVM_GENERIC_PRIVATE_MEM
+   help
+ Enable support for KVM software-protected VMs.  Currently "protected"
+ means the VM can be backed with memory provided by
+ KVM_CREATE_GUEST_MEMFD.
+
+ If unsure, say "N".
+
 config KVM_INTEL
tristate "KVM for Intel (and compatible) processors support"
depends on KVM && IA32_FEAT_CTL
diff --git 

[RFC PATCH v11 16/29] KVM: Allow arch code to track number of memslot address spaces per VM

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 arch/powerpc/kvm/book3s_hv.c|  2 +-
 arch/x86/include/asm/kvm_host.h |  8 +++-
 arch/x86/kvm/debugfs.c  |  2 +-
 arch/x86/kvm/mmu/mmu.c  |  8 
 arch/x86/kvm/mmu/tdp_mmu.c  |  2 +-
 arch/x86/kvm/x86.c  |  2 +-
 include/linux/kvm_host.h| 17 +++--
 virt/kvm/dirty_ring.c   |  2 +-
 virt/kvm/kvm_main.c | 24 
 9 files changed, 39 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 130bafdb1430..9b0eaa17275a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6084,7 +6084,7 @@ static int kvmhv_svm_off(struct kvm *kvm)
}
 
	srcu_idx = srcu_read_lock(&kvm->srcu);
-   for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+   for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
struct kvm_memory_slot *memslot;
struct kvm_memslots *slots = __kvm_memslots(kvm, i);
int bkt;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7a905e033932..08b44544a330 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2105,9 +2105,15 @@ enum {
 #define HF_SMM_MASK(1 << 1)
 #define HF_SMM_INSIDE_NMI_MASK (1 << 2)
 
-# define KVM_ADDRESS_SPACE_NUM 2
+# define KVM_MAX_NR_ADDRESS_SPACES 2
 # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 
1 : 0)
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
+
+static inline int kvm_arch_nr_memslot_as_ids(struct kvm *kvm)
+{
+   return KVM_MAX_NR_ADDRESS_SPACES;
+}
+
 #else
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
 #endif
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index ee8c4c3496ed..42026b3f3ff3 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -111,7 +111,7 @@ static int kvm_mmu_rmaps_stat_show(struct seq_file *m, void 
*v)
	mutex_lock(&kvm->slots_lock);
	write_lock(&kvm->mmu_lock);
 
-   for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+   for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
int bkt;
 
slots = __kvm_memslots(kvm, i);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4cf73a579ee1..05943ccb55a4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3801,7 +3801,7 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
kvm_page_track_write_tracking_enabled(kvm))
goto out_success;
 
-   for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+   for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
slots = __kvm_memslots(kvm, i);
kvm_for_each_memslot(slot, bkt, slots) {
/*
@@ -6351,7 +6351,7 @@ static bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t 
gfn_start, gfn_t gfn_e
if (!kvm_memslots_have_rmaps(kvm))
return flush;
 
-   for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+   for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
slots = __kvm_memslots(kvm, i);
 
		kvm_for_each_memslot_in_gfn_range(&iter, slots, gfn_start, 
gfn_end) {
@@ -6391,7 +6391,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, 
gfn_t gfn_end)
flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
 
if (tdp_mmu_enabled) {
-   for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
+   for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++)
flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start,
  gfn_end, true, flush);
}
@@ -6855,7 +6855,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 
gen)
 * modifier prior to checking for a wrap of the MMIO generation so
 * that a wrap in any address space is detected.
 */
-   gen &= ~((u64)KVM_ADDRESS_SPACE_NUM - 1);
+   gen &= ~((u64)kvm_arch_nr_memslot_as_ids(kvm) - 1);
 
/*
 * The very rare case: if the MMIO generation number has wrapped,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6250bd3d20c1..70052f59cfdf 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -905,7 +905,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
 * is being destroyed or the userspace VMM has exited.  In both cases,
 * KVM_RUN is unreachable, i.e. no vCPUs will ever service the request.
 */
-   for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+   for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
for_each_tdp_mmu_root_yield_safe(kvm, root, i)
tdp_mmu_zap_root(kvm, root, false);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dd7cefe78815..463ecf70cec0 100644

[RFC PATCH v11 15/29] KVM: Drop superfluous __KVM_VCPU_MULTIPLE_ADDRESS_SPACE macro

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h | 1 -
 include/linux/kvm_host.h| 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b87ff7b601fa..7a905e033932 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2105,7 +2105,6 @@ enum {
 #define HF_SMM_MASK(1 << 1)
 #define HF_SMM_INSIDE_NMI_MASK (1 << 2)
 
-# define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
 # define KVM_ADDRESS_SPACE_NUM 2
 # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 
1 : 0)
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0d1e2ee8ae7a..5839ef44e145 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -693,7 +693,7 @@ bool kvm_arch_irqchip_in_kernel(struct kvm *kvm);
 #define KVM_MEM_SLOTS_NUM SHRT_MAX
 #define KVM_USER_MEM_SLOTS (KVM_MEM_SLOTS_NUM - KVM_INTERNAL_MEM_SLOTS)
 
-#ifndef __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
+#if KVM_ADDRESS_SPACE_NUM == 1
 static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
 {
return 0;
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 10/29] mm: Add AS_UNMOVABLE to mark mapping as completely unmovable

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 include/linux/pagemap.h | 11 +++
 mm/compaction.c |  4 
 mm/migrate.c|  2 ++
 3 files changed, 17 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 716953ee1ebd..931d2f1da7d5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -203,6 +203,7 @@ enum mapping_flags {
/* writeback related tags are not used */
AS_NO_WRITEBACK_TAGS = 5,
AS_LARGE_FOLIO_SUPPORT = 6,
+   AS_UNMOVABLE= 7,/* The mapping cannot be moved, ever */
 };
 
 /**
@@ -273,6 +274,16 @@ static inline int mapping_use_writeback_tags(struct 
address_space *mapping)
	return !test_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags);
 }
 
+static inline void mapping_set_unmovable(struct address_space *mapping)
+{
+   set_bit(AS_UNMOVABLE, &mapping->flags);
+}
+
+static inline bool mapping_unmovable(struct address_space *mapping)
+{
+   return test_bit(AS_UNMOVABLE, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
return mapping->gfp_mask;
diff --git a/mm/compaction.c b/mm/compaction.c
index dbc9f86b1934..a3d2b132df52 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1047,6 +1047,10 @@ isolate_migratepages_block(struct compact_control *cc, 
unsigned long low_pfn,
if (!mapping && (folio_ref_count(folio) - 1) > 
folio_mapcount(folio))
goto isolate_fail_put;
 
+   /* The mapping truly isn't movable. */
+   if (mapping && mapping_unmovable(mapping))
+   goto isolate_fail_put;
+
/*
 * Only allow to migrate anonymous pages in GFP_NOFS context
 * because those do not depend on fs locks.
diff --git a/mm/migrate.c b/mm/migrate.c
index 24baad2571e3..c00a4ca86698 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -954,6 +954,8 @@ static int move_to_new_folio(struct folio *dst, struct 
folio *src,
 
if (!mapping)
rc = migrate_folio(mapping, dst, src, mode);
+   else if (mapping_unmovable(mapping))
+   rc = -EOPNOTSUPP;
else if (mapping->a_ops->migrate_folio)
/*
 * Most folios have a mapping and most filesystems
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 14/29] KVM: x86/mmu: Handle page fault for private memory

2023-07-18 Thread Sean Christopherson
From: Chao Peng 

A KVM_MEM_PRIVATE memslot can include both fd-based private memory and
hva-based shared memory. Architecture code (like TDX code) can tell
whether the ongoing fault is private or not. This patch adds an
'is_private' field to kvm_page_fault to indicate this, and architecture
code is expected to set it.

To handle page faults for such memslots, the handling logic differs
depending on whether the fault is private or shared. KVM checks if
'is_private' matches the host's view of the page (maintained in
mem_attr_array).
  - For a successful match, private pfn is obtained with
restrictedmem_get_page() and shared pfn is obtained with existing
get_user_pages().
  - For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to
userspace. Userspace then can convert memory between private/shared
in host's view and retry the fault.

Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c  | 82 +++--
 arch/x86/kvm/mmu/mmu_internal.h |  3 ++
 arch/x86/kvm/mmu/mmutrace.h |  1 +
 3 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index aefe67185637..4cf73a579ee1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3179,9 +3179,9 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t 
gfn,
return level;
 }
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm,
- const struct kvm_memory_slot *slot, gfn_t gfn,
- int max_level)
+static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
+  const struct kvm_memory_slot *slot,
+  gfn_t gfn, int max_level, bool 
is_private)
 {
struct kvm_lpage_info *linfo;
int host_level;
@@ -3193,6 +3193,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
break;
}
 
+   if (is_private)
+   return max_level;
+
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
 
@@ -3200,6 +3203,16 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
return min(host_level, max_level);
 }
 
+int kvm_mmu_max_mapping_level(struct kvm *kvm,
+ const struct kvm_memory_slot *slot, gfn_t gfn,
+ int max_level)
+{
+   bool is_private = kvm_slot_can_be_private(slot) &&
+ kvm_mem_is_private(kvm, gfn);
+
+   return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level, 
is_private);
+}
+
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault 
*fault)
 {
struct kvm_memory_slot *slot = fault->slot;
@@ -3220,8 +3233,9 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, 
struct kvm_page_fault *fault
 * Enforce the iTLB multihit workaround after capturing the requested
 * level, which will be used to do precise, accurate accounting.
 */
-   fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-fault->gfn, 
fault->max_level);
+   fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
+  fault->gfn, 
fault->max_level,
+  fault->is_private);
if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
return;
 
@@ -4304,6 +4318,55 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, 
struct kvm_async_pf *work)
kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
 }
 
+static inline u8 kvm_max_level_for_order(int order)
+{
+   BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+   MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+   order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+   order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+   if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+   return PG_LEVEL_1G;
+
+   if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+   return PG_LEVEL_2M;
+
+   return PG_LEVEL_4K;
+}
+
+static int kvm_do_memory_fault_exit(struct kvm_vcpu *vcpu,
+   struct kvm_page_fault *fault)
+{
+   vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+   if (fault->is_private)
+   vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
+   else
+   vcpu->run->memory.flags = 0;
+   vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+   vcpu->run->memory.size = PAGE_SIZE;
+   return RET_PF_USER;
+}
+
+static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
+  struct kvm_page_fault *fault)
+{
+   int max_order, r;
+
+   if 

[RFC PATCH v11 13/29] KVM: Add transparent hugepage support for dedicated guest memory

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 include/uapi/linux/kvm.h |  2 ++
 virt/kvm/guest_mem.c | 52 
 2 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9b344fc98598..17b12ee8b70e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -2290,6 +2290,8 @@ struct kvm_memory_attributes {
 
 #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct 
kvm_create_guest_memfd)
 
+#define KVM_GUEST_MEMFD_ALLOW_HUGEPAGE (1ULL << 0)
+
 struct kvm_create_guest_memfd {
__u64 size;
__u64 flags;
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 1b705fd63fa8..384671a55b41 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -17,15 +17,48 @@ struct kvm_gmem {
struct list_head entry;
 };
 
-static struct folio *kvm_gmem_get_folio(struct file *file, pgoff_t index)
+static struct folio *kvm_gmem_get_huge_folio(struct inode *inode, pgoff_t 
index)
 {
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+   unsigned long huge_index = round_down(index, HPAGE_PMD_NR);
+   unsigned long flags = (unsigned long)inode->i_private;
+   struct address_space *mapping  = inode->i_mapping;
+   gfp_t gfp = mapping_gfp_mask(mapping);
struct folio *folio;
 
-   /* TODO: Support huge pages. */
-   folio = filemap_grab_folio(file->f_mapping, index);
+   if (!(flags & KVM_GUEST_MEMFD_ALLOW_HUGEPAGE))
+   return NULL;
+
+   if (filemap_range_has_page(mapping, huge_index << PAGE_SHIFT,
+  (huge_index + HPAGE_PMD_NR - 1) << 
PAGE_SHIFT))
+   return NULL;
+
+   folio = filemap_alloc_folio(gfp, HPAGE_PMD_ORDER);
if (!folio)
return NULL;
 
+   if (filemap_add_folio(mapping, folio, huge_index, gfp)) {
+   folio_put(folio);
+   return NULL;
+   }
+
+   return folio;
+#else
+   return NULL;
+#endif
+}
+
+static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+{
+   struct folio *folio;
+
+   folio = kvm_gmem_get_huge_folio(inode, index);
+   if (!folio) {
+   folio = filemap_grab_folio(inode->i_mapping, index);
+   if (!folio)
+   return NULL;
+   }
+
/*
 * Use the up-to-date flag to track whether or not the memory has been
 * zeroed before being handed off to the guest.  There is no backing
@@ -332,7 +365,8 @@ static const struct inode_operations kvm_gmem_iops = {
.setattr= kvm_gmem_setattr,
 };
 
-static int __kvm_gmem_create(struct kvm *kvm, loff_t size, struct vfsmount 
*mnt)
+static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags,
+struct vfsmount *mnt)
 {
const char *anon_name = "[kvm-gmem]";
const struct qstr qname = QSTR_INIT(anon_name, strlen(anon_name));
@@ -355,6 +389,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, 
struct vfsmount *mnt)
inode->i_mode |= S_IFREG;
inode->i_size = size;
mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+   mapping_set_large_folios(inode->i_mapping);
mapping_set_unevictable(inode->i_mapping);
mapping_set_unmovable(inode->i_mapping);
 
@@ -404,6 +439,12 @@ static bool kvm_gmem_is_valid_size(loff_t size, u64 flags)
if (size < 0 || !PAGE_ALIGNED(size))
return false;
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+   if ((flags & KVM_GUEST_MEMFD_ALLOW_HUGEPAGE) &&
+   !IS_ALIGNED(size, HPAGE_PMD_SIZE))
+   return false;
+#endif
+
return true;
 }
 
@@ -413,6 +454,9 @@ int kvm_gmem_create(struct kvm *kvm, struct 
kvm_create_guest_memfd *args)
u64 flags = args->flags;
u64 valid_flags = 0;
 
+   if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+   valid_flags |= KVM_GUEST_MEMFD_ALLOW_HUGEPAGE;
+
if (flags & ~valid_flags)
return -EINVAL;
 
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory

2023-07-18 Thread Sean Christopherson
TODO

Cc: Fuad Tabba 
Cc: Vishal Annapurve 
Cc: Ackerley Tng 
Cc: Jarkko Sakkinen 
Cc: Maciej Szmigiero 
Cc: Vlastimil Babka 
Cc: David Hildenbrand 
Cc: Quentin Perret 
Cc: Michael Roth 
Cc: Wang 
Cc: Liam Merwick 
Cc: Isaku Yamahata 
Co-developed-by: Kirill A. Shutemov 
Signed-off-by: Kirill A. Shutemov 
Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Co-developed-by: Chao Peng 
Signed-off-by: Chao Peng 
Co-developed-by: Ackerley Tng 
Signed-off-by: Ackerley Tng 
Signed-off-by: Sean Christopherson 
---
 include/linux/kvm_host.h   |  48 +++
 include/uapi/linux/kvm.h   |  14 +-
 include/uapi/linux/magic.h |   1 +
 virt/kvm/Kconfig   |   4 +
 virt/kvm/Makefile.kvm  |   1 +
 virt/kvm/guest_mem.c   | 591 +
 virt/kvm/kvm_main.c|  58 +++-
 virt/kvm/kvm_mm.h  |  38 +++
 8 files changed, 750 insertions(+), 5 deletions(-)
 create mode 100644 virt/kvm/guest_mem.c

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 97db63da6227..0d1e2ee8ae7a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -592,8 +592,20 @@ struct kvm_memory_slot {
u32 flags;
short id;
u16 as_id;
+
+#ifdef CONFIG_KVM_PRIVATE_MEM
+   struct {
+   struct file __rcu *file;
+   pgoff_t pgoff;
+   } gmem;
+#endif
 };
 
+static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
+{
+   return slot && (slot->flags & KVM_MEM_PRIVATE);
+}
+
 static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot 
*slot)
 {
return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
@@ -688,6 +700,17 @@ static inline int kvm_arch_vcpu_memslots_id(struct 
kvm_vcpu *vcpu)
 }
 #endif
 
+/*
+ * Arch code must define kvm_arch_has_private_mem if support for private memory
+ * is enabled.
+ */
+#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
+static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
+{
+   return false;
+}
+#endif
+
 struct kvm_memslots {
u64 generation;
atomic_long_t last_used_slot;
@@ -1380,6 +1403,7 @@ void *kvm_mmu_memory_cache_alloc(struct 
kvm_mmu_memory_cache *mc);
 void kvm_mmu_invalidate_begin(struct kvm *kvm);
 void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
 void kvm_mmu_invalidate_end(struct kvm *kvm);
+bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
@@ -2313,6 +2337,30 @@ static inline unsigned long 
kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn
 
 bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 struct kvm_gfn_range *range);
+
+static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
+{
+   return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
+  kvm_get_memory_attributes(kvm, gfn) & 
KVM_MEMORY_ATTRIBUTE_PRIVATE;
+}
+#else
+static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
+{
+   return false;
+}
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
+#ifdef CONFIG_KVM_PRIVATE_MEM
+int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+#else
+static inline int kvm_gmem_get_pfn(struct kvm *kvm,
+  struct kvm_memory_slot *slot, gfn_t gfn,
+  kvm_pfn_t *pfn, int *max_order)
+{
+   KVM_BUG_ON(1, kvm);
+   return -EIO;
+}
+#endif /* CONFIG_KVM_PRIVATE_MEM */
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f065c57db327..9b344fc98598 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -102,7 +102,10 @@ struct kvm_userspace_memory_region2 {
__u64 guest_phys_addr;
__u64 memory_size;
__u64 userspace_addr;
-   __u64 pad[16];
+   __u64 gmem_offset;
+   __u32 gmem_fd;
+   __u32 pad1;
+   __u64 pad2[14];
 };
 
 /*
@@ -112,6 +115,7 @@ struct kvm_userspace_memory_region2 {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES(1UL << 0)
 #define KVM_MEM_READONLY   (1UL << 1)
+#define KVM_MEM_PRIVATE(1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -2284,4 +2288,12 @@ struct kvm_memory_attributes {
 
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE   (1ULL << 3)
 
+#define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct 
kvm_create_guest_memfd)
+
+struct kvm_create_guest_memfd {
+   __u64 size;
+   __u64 flags;
+   __u64 reserved[6];
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 6325d1d0e90f..15041aa7d9ae 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -101,5 +101,6 @@
 #define DMA_BUF_MAGIC  0x444d4142  /* "DMAB" */
 #define DEVMEM_MAGIC   0x454d444d  /* 

[RFC PATCH v11 11/29] security: Export security_inode_init_security_anon() for use by KVM

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 security/security.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/security.c b/security/security.c
index b720424ca37d..7fc78f0f3622 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1654,6 +1654,7 @@ int security_inode_init_security_anon(struct inode *inode,
return call_int_hook(inode_init_security_anon, 0, inode, name,
 context_inode);
 }
+EXPORT_SYMBOL_GPL(security_inode_init_security_anon);
 
 #ifdef CONFIG_SECURITY_PATH
 /**
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 09/29] KVM: x86: Disallow hugepages when memory attributes are mixed

2023-07-18 Thread Sean Christopherson
From: Chao Peng 

Disallow creating hugepages with mixed memory attributes, e.g. shared
versus private, as mapping a hugepage in this case would allow the guest
to access memory with the wrong attributes, e.g. overlaying private memory
with a shared hugepage.

Track whether or not attributes are mixed via the existing
disallow_lpage field, but use the most significant bit in 'disallow_lpage'
to indicate a hugepage has mixed attributes instead of using the normal
refcounting.  Whether or not attributes are mixed is binary; either they
are or they aren't.  Attempting to squeeze that info into the refcount is
unnecessarily complex as it would require knowing the previous state of
the mixed count when updating attributes.  Using a flag means KVM just
needs to ensure the current status is reflected in the memslots.

Signed-off-by: Chao Peng 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h |   3 +
 arch/x86/kvm/mmu/mmu.c  | 185 +++-
 arch/x86/kvm/x86.c  |   4 +
 3 files changed, 190 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f9a927296d85..b87ff7b601fa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1816,6 +1816,9 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu);
 int kvm_mmu_init_vm(struct kvm *kvm);
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 
+void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
+   struct kvm_memory_slot *slot);
+
 void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b034727c4cf9..aefe67185637 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -803,16 +803,27 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
	return &slot->arch.lpage_info[level - 2][idx];
 }
 
+/*
+ * The most significant bit in disallow_lpage tracks whether or not memory
+ * attributes are mixed, i.e. not identical for all gfns at the current level.
+ * The lower order bits are used to refcount other cases where a hugepage is
+ * disallowed, e.g. if KVM shadows a page table at the gfn.
+ */
+#define KVM_LPAGE_MIXED_FLAG   BIT(31)
+
 static void update_gfn_disallow_lpage_count(const struct kvm_memory_slot *slot,
gfn_t gfn, int count)
 {
struct kvm_lpage_info *linfo;
-   int i;
+   int old, i;
 
for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
linfo = lpage_info_slot(gfn, slot, i);
+
+   old = linfo->disallow_lpage;
linfo->disallow_lpage += count;
-   WARN_ON(linfo->disallow_lpage < 0);
+
+   WARN_ON_ONCE((old ^ linfo->disallow_lpage) & 
KVM_LPAGE_MIXED_FLAG);
}
 }
 
@@ -7223,3 +7234,173 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
if (kvm->arch.nx_huge_page_recovery_thread)
kthread_stop(kvm->arch.nx_huge_page_recovery_thread);
 }
+
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+static bool hugepage_test_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
+   int level)
+{
+   return lpage_info_slot(gfn, slot, level)->disallow_lpage & 
KVM_LPAGE_MIXED_FLAG;
+}
+
+static void hugepage_clear_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
+int level)
+{
+   lpage_info_slot(gfn, slot, level)->disallow_lpage &= 
~KVM_LPAGE_MIXED_FLAG;
+}
+
+static void hugepage_set_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
+  int level)
+{
+   lpage_info_slot(gfn, slot, level)->disallow_lpage |= 
KVM_LPAGE_MIXED_FLAG;
+}
+
+static bool range_has_attrs(struct kvm *kvm, gfn_t start, gfn_t end,
+   unsigned long attrs)
+{
+   XA_STATE(xas, &kvm->mem_attr_array, start);
+   unsigned long index;
+   bool has_attrs;
+   void *entry;
+
+   rcu_read_lock();
+
+   if (!attrs) {
+   has_attrs = !xas_find(&xas, end);
+   goto out;
+   }
+
+   has_attrs = true;
+   for (index = start; index < end; index++) {
+   do {
+   entry = xas_next(&xas);
+   } while (xas_retry(&xas, entry));
+
+   if (xas.xa_index != index || xa_to_value(entry) != attrs) {
+   has_attrs = false;
+   break;
+   }
+   }
+
+out:
+   rcu_read_unlock();
+   return has_attrs;
+}
+
+static bool hugepage_has_attrs(struct kvm *kvm, struct kvm_memory_slot *slot,
+  gfn_t gfn, int level, unsigned long attrs)
+{
+   const unsigned long start = gfn;
+   const unsigned long end = start + KVM_PAGES_PER_HPAGE(level);
+
+   if (level == 

[RFC PATCH v11 08/29] KVM: Introduce per-page memory attributes

2023-07-18 Thread Sean Christopherson
From: Chao Peng 

In confidential computing usages, whether a page is private or shared is
necessary information for KVM to perform operations like page fault
handling, page zapping etc. There are other potential use cases for
per-page memory attributes, e.g. to make memory read-only (or no-exec,
or exec-only, etc.) without having to modify memslots.

Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow
userspace to operate on the per-page memory attributes.
  - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to
a guest memory range.
  - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
memory attributes.

Use an xarray to store the per-page attributes internally, with a naive,
not fully optimized implementation, i.e. prioritize correctness over
performance for the initial implementation.

Because setting memory attributes is roughly analogous to mprotect() on
memory that is mapped into the guest, zap existing mappings prior to
updating the memory attributes.  Opportunistically provide an arch hook
for the post-set path (needed to complete invalidation anyway) in
anticipation of x86 needing the hook to update metadata related to
determining whether or not a given gfn can be backed with various sizes
of hugepages.

It's possible that future usages may not require an invalidation, e.g.
if KVM ends up supporting RWX protections and userspace grants _more_
protections, but again opt for simplicity and punt optimizations to
if/when they are needed.

Suggested-by: Sean Christopherson 
Link: https://lore.kernel.org/all/y2wb48kd0j4vg...@google.com
Cc: Fuad Tabba 
Signed-off-by: Chao Peng 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
---
 Documentation/virt/kvm/api.rst |  60 
 include/linux/kvm_host.h   |  14 +++
 include/uapi/linux/kvm.h   |  14 +++
 virt/kvm/Kconfig   |   4 +
 virt/kvm/kvm_main.c| 170 +
 5 files changed, 262 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 34d4ce66e0c8..0ca8561775ac 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6068,6 +6068,56 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG
 interface. No error will be returned, but the resulting offset will not be
 applied.
 
+4.139 KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES
+-----------------------------------------
+
+:Capability: KVM_CAP_MEMORY_ATTRIBUTES
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: u64 memory attributes bitmask(out)
+:Returns: 0 on success, <0 on error
+
+Returns the supported memory attributes bitmask. Each supported memory
+attribute has its corresponding bit set in the u64 bitmask.
+
+The following memory attributes are defined::
+
+  #define KVM_MEMORY_ATTRIBUTE_PRIVATE   (1ULL << 3)
+
+4.140 KVM_SET_MEMORY_ATTRIBUTES
+-------------------------------
+
+:Capability: KVM_CAP_MEMORY_ATTRIBUTES
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_memory_attributes(in/out)
+:Returns: 0 on success, <0 on error
+
+Sets memory attributes for pages in a guest memory range. Parameters are
+specified via the following structure::
+
+  struct kvm_memory_attributes {
+   __u64 address;
+   __u64 size;
+   __u64 attributes;
+   __u64 flags;
+  };
+
+The user sets the per-page memory attributes for a guest memory range indicated
+by address/size, and in return KVM adjusts address and size to reflect which
+pages of the range were successfully set to the attributes.
+If the call returns 0, "address" is updated to the last successful address + 1
+and "size" is updated to the remaining size that has not yet been set
+successfully. The user should check the return value as well as the size to
+decide whether the operation succeeded for the whole range, and may retry the
+operation with the returned address/size if the previous call was only
+partially successful.
+
+Both address and size should be page aligned and the supported attributes can be
+retrieved with KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES.
+
+The "flags" field may be used for future extensions and should be set to 0s.
+
 5. The kvm_run structure
 
 
@@ -8494,6 +8544,16 @@ block sizes is exposed in KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES as a
 64-bit bitmap (each bit describing a block size). The default value is
 0, to disable the eager page splitting.
 
+8.41 KVM_CAP_MEMORY_ATTRIBUTES
+------------------------------
+
+:Capability: KVM_CAP_MEMORY_ATTRIBUTES
+:Architectures: x86
+:Type: vm
+
+This capability indicates KVM supports per-page memory attributes and ioctls
+KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES/KVM_SET_MEMORY_ATTRIBUTES are available.
+
 9. Known KVM API problems
 =
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e9ca49d451f3..97db63da6227 

[RFC PATCH v11 07/29] KVM: Add KVM_EXIT_MEMORY_FAULT exit

2023-07-18 Thread Sean Christopherson
From: Chao Peng 

This new KVM exit allows userspace to handle memory-related errors. It
indicates that an error occurred in KVM at guest memory range [gpa, gpa+size).
The flags field includes additional information for userspace to handle the
error. Currently bit 3 is defined as 'private memory', where '1'
indicates the error happened due to a private memory access and '0' indicates
it happened due to a shared memory access.

When private memory is enabled, this new exit will be used for KVM to
exit to userspace for shared <-> private memory conversion in memory
encryption usage. In such usage, there are typically two kinds of memory
conversions:
  - explicit conversion: happens when guest explicitly calls into KVM
to map a range (as private or shared), KVM then exits to userspace
to perform the map/unmap operations.
  - implicit conversion: happens in KVM page fault handler where KVM
exits to userspace for an implicit conversion when the page is in a
different state than requested (private or shared).

Suggested-by: Sean Christopherson 
Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 
Signed-off-by: Sean Christopherson 
---
 Documentation/virt/kvm/api.rst | 22 ++
 include/uapi/linux/kvm.h   |  8 
 2 files changed, 30 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c0ddd3035462..34d4ce66e0c8 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6700,6 +6700,28 @@ array field represents return values. The userspace should update the return
 values of SBI call before resuming the VCPU. For more details on RISC-V SBI
 spec refer, https://github.com/riscv/riscv-sbi-doc.
 
+::
+
+   /* KVM_EXIT_MEMORY_FAULT */
+   struct {
+  #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)
+   __u64 flags;
+   __u64 gpa;
+   __u64 size;
+   } memory;
+
+If the exit reason is KVM_EXIT_MEMORY_FAULT, it indicates that the VCPU has
+encountered a memory error which is not handled by the KVM kernel module and
+which userspace may choose to handle. The 'flags' field indicates the memory
+properties of the exit.
+
+ - KVM_MEMORY_EXIT_FLAG_PRIVATE - when set, indicates the memory error was
+   caused by a private memory access; when clear, the error was caused by a
+   shared memory access.
+
+'gpa' and 'size' indicate the memory range at which the error occurred.
+Userspace may handle the error and return to KVM to retry the previous memory
+access.
+
 ::
 
 /* KVM_EXIT_NOTIFY */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4d4b3de8ac55..6c6ed214b6ac 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -274,6 +274,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI35
 #define KVM_EXIT_RISCV_CSR36
 #define KVM_EXIT_NOTIFY   37
+#define KVM_EXIT_MEMORY_FAULT 38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -520,6 +521,13 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
__u32 flags;
} notify;
+   /* KVM_EXIT_MEMORY_FAULT */
+   struct {
+#define KVM_MEMORY_EXIT_FLAG_PRIVATE   (1ULL << 3)
+   __u64 flags;
+   __u64 gpa;
+   __u64 size;
+   } memory;
/* Fix the size of the union. */
char padding[256];
};
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 06/29] KVM: Introduce KVM_SET_USER_MEMORY_REGION2

2023-07-18 Thread Sean Christopherson
Cc: Jarkko Sakkinen 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/x86.c   |  2 +-
 include/linux/kvm_host.h |  4 ++--
 include/uapi/linux/kvm.h | 13 +
 virt/kvm/kvm_main.c  | 38 ++
 4 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6b9bea62fb8..92e77afd3ffd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12420,7 +12420,7 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
}
 
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
-   struct kvm_userspace_memory_region m;
+   struct kvm_userspace_memory_region2 m;
 
m.slot = id | (i << 16);
m.flags = 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d2d3e083ec7f..e9ca49d451f3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1130,9 +1130,9 @@ enum kvm_mr_change {
 };
 
 int kvm_set_memory_region(struct kvm *kvm,
- const struct kvm_userspace_memory_region *mem);
+ const struct kvm_userspace_memory_region2 *mem);
 int __kvm_set_memory_region(struct kvm *kvm,
-   const struct kvm_userspace_memory_region *mem);
+   const struct kvm_userspace_memory_region2 *mem);
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen);
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f089ab290978..4d4b3de8ac55 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -95,6 +95,16 @@ struct kvm_userspace_memory_region {
__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
+/* for KVM_SET_USER_MEMORY_REGION2 */
+struct kvm_userspace_memory_region2 {
+   __u32 slot;
+   __u32 flags;
+   __u64 guest_phys_addr;
+   __u64 memory_size;
+   __u64 userspace_addr;
+   __u64 pad[16];
+};
+
 /*
  * The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for
  * userspace, other bits are reserved for kvm internal use which are defined
@@ -1192,6 +1202,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_COUNTER_OFFSET 227
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
+#define KVM_CAP_USER_MEMORY2 230
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1466,6 +1477,8 @@ struct kvm_vfio_spapr_tce {
struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR  _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \
+struct kvm_userspace_memory_region2)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 53346bc2902a..c14adf93daec 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1549,7 +1549,7 @@ static void kvm_replace_memslot(struct kvm *kvm,
}
 }
 
-static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem)
+static int check_memory_region_flags(const struct kvm_userspace_memory_region2 *mem)
 {
u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
 
@@ -1951,7 +1951,7 @@ static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
  * Must be called holding kvm->slots_lock for write.
  */
 int __kvm_set_memory_region(struct kvm *kvm,
-   const struct kvm_userspace_memory_region *mem)
+   const struct kvm_userspace_memory_region2 *mem)
 {
struct kvm_memory_slot *old, *new;
struct kvm_memslots *slots;
@@ -2055,7 +2055,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 EXPORT_SYMBOL_GPL(__kvm_set_memory_region);
 
 int kvm_set_memory_region(struct kvm *kvm,
- const struct kvm_userspace_memory_region *mem)
+ const struct kvm_userspace_memory_region2 *mem)
 {
int r;
 
@@ -2067,7 +2067,7 @@ int kvm_set_memory_region(struct kvm *kvm,
 EXPORT_SYMBOL_GPL(kvm_set_memory_region);
 
 static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
-					  struct kvm_userspace_memory_region *mem)
+					  struct kvm_userspace_memory_region2 *mem)
 {
if ((u16)mem->slot >= KVM_USER_MEM_SLOTS)
return -EINVAL;
@@ -4514,6 +4514,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 {
switch (arg) {
case KVM_CAP_USER_MEMORY:
+   case KVM_CAP_USER_MEMORY2:
case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS:
case KVM_CAP_INTERNAL_ERROR_DATA:
@@ -4757,6 +4758,14 @@ static int 

[RFC PATCH v11 05/29] KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 arch/arm64/include/asm/kvm_host.h   |  2 --
 arch/arm64/kvm/Kconfig  |  2 +-
 arch/mips/include/asm/kvm_host.h|  2 --
 arch/mips/kvm/Kconfig   |  2 +-
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kvm/Kconfig|  8 
 arch/powerpc/kvm/powerpc.c  |  4 +---
 arch/riscv/include/asm/kvm_host.h   |  2 --
 arch/riscv/kvm/Kconfig  |  2 +-
 arch/x86/include/asm/kvm_host.h |  2 --
 arch/x86/kvm/Kconfig|  2 +-
 include/linux/kvm_host.h|  8 +---
 virt/kvm/Kconfig|  4 
 virt/kvm/kvm_main.c | 10 +-
 14 files changed, 23 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8b6096753740..50d89d400bf1 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -912,8 +912,6 @@ int __kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
 int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
  struct kvm_vcpu_events *events);
 
-#define KVM_ARCH_WANT_MMU_NOTIFIER
-
 void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);
 
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index f531da6b362e..a650b46f4f2f 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -22,7 +22,7 @@ menuconfig KVM
bool "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM
select KVM_GENERIC_HARDWARE_ENABLING
-   select MMU_NOTIFIER
+   select KVM_GENERIC_MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select HAVE_KVM_CPU_RELAX_INTERCEPT
select HAVE_KVM_ARCH_TLB_FLUSH_ALL
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 04cedf9f8811..22a41d941bf3 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -810,8 +810,6 @@ int kvm_mips_mkclean_gpa_pt(struct kvm *kvm, gfn_t start_gfn, gfn_t end_gfn);
 pgd_t *kvm_pgd_alloc(void);
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 
-#define KVM_ARCH_WANT_MMU_NOTIFIER
-
 /* Emulation */
 enum emulation_result update_pc(struct kvm_vcpu *vcpu, u32 cause);
 int kvm_get_badinstr(u32 *opc, struct kvm_vcpu *vcpu, u32 *out);
diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index a8cdba75f98d..c04987d2ed2e 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -25,7 +25,7 @@ config KVM
select HAVE_KVM_EVENTFD
select HAVE_KVM_VCPU_ASYNC_IOCTL
select KVM_MMIO
-   select MMU_NOTIFIER
+   select KVM_GENERIC_MMU_NOTIFIER
select INTERVAL_TREE
select KVM_GENERIC_HARDWARE_ENABLING
help
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 14ee0dece853..4b5c3f2acf78 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -62,8 +62,6 @@
 
 #include 
 
-#define KVM_ARCH_WANT_MMU_NOTIFIER
-
 #define HPTEG_CACHE_NUM(1 << 15)
 #define HPTEG_HASH_BITS_PTE13
 #define HPTEG_HASH_BITS_PTE_LONG   12
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 902611954200..b33358ee6424 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -42,7 +42,7 @@ config KVM_BOOK3S_64_HANDLER
 config KVM_BOOK3S_PR_POSSIBLE
bool
select KVM_MMIO
-   select MMU_NOTIFIER
+   select KVM_GENERIC_MMU_NOTIFIER
 
 config KVM_BOOK3S_HV_POSSIBLE
bool
@@ -85,7 +85,7 @@ config KVM_BOOK3S_64_HV
tristate "KVM for POWER7 and later using hypervisor mode in host"
depends on KVM_BOOK3S_64 && PPC_POWERNV
select KVM_BOOK3S_HV_POSSIBLE
-   select MMU_NOTIFIER
+   select KVM_GENERIC_MMU_NOTIFIER
select CMA
help
  Support running unmodified book3s_64 guest kernels in
@@ -194,7 +194,7 @@ config KVM_E500V2
depends on !CONTEXT_TRACKING_USER
select KVM
select KVM_MMIO
-   select MMU_NOTIFIER
+   select KVM_GENERIC_MMU_NOTIFIER
help
  Support running unmodified E500 guest kernels in virtual machines on
  E500v2 host processors.
@@ -211,7 +211,7 @@ config KVM_E500MC
select KVM
select KVM_MMIO
select KVM_BOOKE_HV
-   select MMU_NOTIFIER
+   select KVM_GENERIC_MMU_NOTIFIER
help
  Support running unmodified E500MC/E5500/E6500 guest kernels in
  virtual machines on E500MC/E5500/E6500 host processors.
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 5cf9e5e3112a..f97fbac7eac9 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -635,9 +635,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
r = hv_enabled;
 #else
-#ifndef KVM_ARCH_WANT_MMU_NOTIFIER
- 

[RFC PATCH v11 04/29] KVM: PPC: Drop dead code related to KVM_ARCH_WANT_MMU_NOTIFIER

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 arch/powerpc/kvm/powerpc.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 7197c8256668..5cf9e5e3112a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -634,10 +634,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SYNC_MMU:
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
r = hv_enabled;
-#elif defined(KVM_ARCH_WANT_MMU_NOTIFIER)
-   r = 1;
 #else
-   r = 0;
+#ifndef KVM_ARCH_WANT_MMU_NOTIFIER
+   BUILD_BUG();
+#endif
+   r = 1;
 #endif
break;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-- 
2.41.0.255.g8b1d071c50-goog



[RFC PATCH v11 03/29] KVM: Use gfn instead of hva for mmu_notifier_retry

2023-07-18 Thread Sean Christopherson
From: Chao Peng 

Currently in mmu_notifier invalidate path, hva range is recorded and
then checked against by mmu_notifier_retry_hva() in the page fault
handling path. However, for the to be introduced private memory, a page
fault may not have a hva associated, checking gfn(gpa) makes more sense.

For existing hva based shared memory, gfn is expected to also work. The
only downside is when aliasing multiple gfns to a single hva, the
current algorithm of checking multiple ranges could result in a much
larger range being rejected. Such aliasing should be uncommon, so the
impact is expected to be small.

Suggested-by: Sean Christopherson 
Signed-off-by: Chao Peng 
Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 
[sean: convert vmx_set_apic_access_page_addr() to gfn-based API]
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c   | 10 ++
 arch/x86/kvm/vmx/vmx.c   | 11 +--
 include/linux/kvm_host.h | 33 +
 virt/kvm/kvm_main.c  | 40 +++-
 4 files changed, 63 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d72f2b20f430..b034727c4cf9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3087,7 +3087,7 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
  *
  * There are several ways to safely use this helper:
  *
- * - Check mmu_invalidate_retry_hva() after grabbing the mapping level, before
+ * - Check mmu_invalidate_retry_gfn() after grabbing the mapping level, before
  *   consuming it.  In this case, mmu_lock doesn't need to be held during the
  *   lookup, but it does need to be held while checking the MMU notifier.
  *
@@ -4400,7 +4400,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
return true;
 
return fault->slot &&
-  mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva);
+  mmu_invalidate_retry_gfn(vcpu->kvm, fault->mmu_seq, fault->gfn);
 }
 
 static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -6301,7 +6301,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 
 	write_lock(&kvm->mmu_lock);
 
-   kvm_mmu_invalidate_begin(kvm, 0, -1ul);
+   kvm_mmu_invalidate_begin(kvm);
+
+   kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end);
 
flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
 
@@ -6314,7 +6316,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
if (flush)
 		kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start);
 
-   kvm_mmu_invalidate_end(kvm, 0, -1ul);
+   kvm_mmu_invalidate_end(kvm);
 
 	write_unlock(&kvm->mmu_lock);
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..946380b53cf5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6729,10 +6729,10 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
return;
 
/*
-* Grab the memslot so that the hva lookup for the mmu_notifier retry
-* is guaranteed to use the same memslot as the pfn lookup, i.e. rely
-* on the pfn lookup's validation of the memslot to ensure a valid hva
-* is used for the retry check.
+* Explicitly grab the memslot using KVM's internal slot ID to ensure
+* KVM doesn't unintentionally grab a userspace memslot.  It _should_
+* be impossible for userspace to create a memslot for the APIC when
+* APICv is enabled, but paranoia won't hurt in this case.
 */
slot = id_to_memslot(slots, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT);
if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
@@ -6757,8 +6757,7 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
return;
 
 	read_lock(&vcpu->kvm->mmu_lock);
-   if (mmu_invalidate_retry_hva(kvm, mmu_seq,
-gfn_to_hva_memslot(slot, gfn))) {
+   if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn)) {
kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
 		read_unlock(&vcpu->kvm->mmu_lock);
goto out;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b901571ab61e..90a0be261a5c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -788,8 +788,8 @@ struct kvm {
struct mmu_notifier mmu_notifier;
unsigned long mmu_invalidate_seq;
long mmu_invalidate_in_progress;
-   unsigned long mmu_invalidate_range_start;
-   unsigned long mmu_invalidate_range_end;
+   gfn_t mmu_invalidate_range_start;
+   gfn_t mmu_invalidate_range_end;
 #endif
struct list_head devices;
u64 manual_dirty_log_protect;
@@ -1371,10 +1371,9 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
 void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 #endif
 

[RFC PATCH v11 02/29] KVM: Tweak kvm_hva_range and hva_handler_t to allow reusing for gfn ranges

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 virt/kvm/kvm_main.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d58b7a506d27..50aea855eeae 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -516,21 +516,25 @@ static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
return container_of(mn, struct kvm, mmu_notifier);
 }
 
-typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
+typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
 
 typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
 unsigned long end);
 
 typedef void (*on_unlock_fn_t)(struct kvm *kvm);
 
-struct kvm_hva_range {
-   unsigned long start;
-   unsigned long end;
+struct kvm_mmu_notifier_range {
+   /*
+* 64-bit addresses, as KVM notifiers can operate on host virtual
+* addresses (unsigned long) and guest physical addresses (64-bit).
+*/
+   u64 start;
+   u64 end;
union {
pte_t pte;
u64 raw;
} arg;
-   hva_handler_t handler;
+   gfn_handler_t handler;
on_lock_fn_t on_lock;
on_unlock_fn_t on_unlock;
bool flush_on_ret;
@@ -557,7 +561,7 @@ static void kvm_null_fn(void)
 node = interval_tree_iter_next(node, start, last))  \
 
 static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
-					  const struct kvm_hva_range *range)
+					  const struct kvm_mmu_notifier_range *range)
 {
bool ret = false, locked = false;
struct kvm_gfn_range gfn_range;
@@ -588,9 +592,9 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
unsigned long hva_start, hva_end;
 
 		slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
-   hva_start = max(range->start, slot->userspace_addr);
-   hva_end = min(range->end, slot->userspace_addr +
- (slot->npages << PAGE_SHIFT));
+		hva_start = max_t(unsigned long, range->start, slot->userspace_addr);
+		hva_end = min_t(unsigned long, range->end,
+				slot->userspace_addr + (slot->npages << PAGE_SHIFT));
 
/*
 * To optimize for the likely case where the address
@@ -640,10 +644,10 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
unsigned long start,
unsigned long end,
pte_t pte,
-   hva_handler_t handler)
+   gfn_handler_t handler)
 {
struct kvm *kvm = mmu_notifier_to_kvm(mn);
-   const struct kvm_hva_range range = {
+   const struct kvm_mmu_notifier_range range = {
.start  = start,
.end= end,
.arg.pte= pte,
@@ -660,10 +664,10 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn,
 unsigned long start,
 unsigned long end,
-hva_handler_t handler)
+gfn_handler_t handler)
 {
struct kvm *kvm = mmu_notifier_to_kvm(mn);
-   const struct kvm_hva_range range = {
+   const struct kvm_mmu_notifier_range range = {
.start  = start,
.end= end,
.handler= handler,
@@ -750,7 +754,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
const struct mmu_notifier_range *range)
 {
struct kvm *kvm = mmu_notifier_to_kvm(mn);
-   const struct kvm_hva_range hva_range = {
+   const struct kvm_mmu_notifier_range hva_range = {
.start  = range->start,
.end= range->end,
.handler= kvm_unmap_gfn_range,
@@ -814,7 +818,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
const struct mmu_notifier_range *range)
 {
struct kvm *kvm = mmu_notifier_to_kvm(mn);
-   const struct kvm_hva_range hva_range = {
+   const struct kvm_mmu_notifier_range hva_range = {
.start  = range->start,
 

[RFC PATCH v11 01/29] KVM: Wrap kvm_gfn_range.pte in a per-action union

2023-07-18 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 arch/arm64/kvm/mmu.c   |  2 +-
 arch/mips/kvm/mmu.c|  2 +-
 arch/riscv/kvm/mmu.c   |  2 +-
 arch/x86/kvm/mmu/mmu.c |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c |  6 +++---
 include/linux/kvm_host.h   |  5 -
 virt/kvm/kvm_main.c| 16 ++--
 7 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..55f03a68f1cd 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1721,7 +1721,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-   kvm_pfn_t pfn = pte_pfn(range->pte);
+   kvm_pfn_t pfn = pte_pfn(range->arg.pte);
 
if (!kvm->arch.mmu.pgt)
return false;
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index e8c08988ed37..7b2ac1319d70 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -447,7 +447,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
gpa_t gpa = range->start << PAGE_SHIFT;
-   pte_t hva_pte = range->pte;
+   pte_t hva_pte = range->arg.pte;
pte_t *gpa_pte = kvm_mips_pte_for_gpa(kvm, NULL, gpa);
pte_t old_pte;
 
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index f2eb47925806..857f4312b0f8 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -559,7 +559,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
int ret;
-   kvm_pfn_t pfn = pte_pfn(range->pte);
+   kvm_pfn_t pfn = pte_pfn(range->arg.pte);
 
if (!kvm->arch.pgd)
return false;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ec169f5c7dce..d72f2b20f430 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1588,7 +1588,7 @@ static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
 	for_each_slot_rmap_range(range->slot, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
 				 range->start, range->end - 1, &iterator)
ret |= handler(kvm, iterator.rmap, range->slot, iterator.gfn,
-  iterator.level, range->pte);
+  iterator.level, range->arg.pte);
 
return ret;
 }
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 512163d52194..6250bd3d20c1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1241,7 +1241,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
u64 new_spte;
 
 	/* Huge pages aren't expected to be modified without first being zapped. */
-   WARN_ON(pte_huge(range->pte) || range->start + 1 != range->end);
+   WARN_ON(pte_huge(range->arg.pte) || range->start + 1 != range->end);
 
if (iter->level != PG_LEVEL_4K ||
!is_shadow_present_pte(iter->old_spte))
@@ -1255,9 +1255,9 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 */
tdp_mmu_iter_set_spte(kvm, iter, 0);
 
-   if (!pte_write(range->pte)) {
+   if (!pte_write(range->arg.pte)) {
new_spte = 
kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
- 
pte_pfn(range->pte));
+ 
pte_pfn(range->arg.pte));
 
tdp_mmu_iter_set_spte(kvm, iter, new_spte);
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9d3ac7720da9..b901571ab61e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -260,7 +260,10 @@ struct kvm_gfn_range {
struct kvm_memory_slot *slot;
gfn_t start;
gfn_t end;
-   pte_t pte;
+   union {
+   pte_t pte;
+   u64 raw;
+   } arg;
bool may_block;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dfbaafbe3a00..d58b7a506d27 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -526,7 +526,10 @@ typedef void (*on_unlock_fn_t)(struct kvm *kvm);
 struct kvm_hva_range {
unsigned long start;
unsigned long end;
-   pte_t pte;
+   union {
+   pte_t pte;
+   u64 raw;
+   } arg;
hva_handler_t handler;
on_lock_fn_t on_lock;
on_unlock_fn_t on_unlock;
@@ -562,6 +565,10 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
struct kvm_memslots *slots;
int i, idx;
 
+   BUILD_BUG_ON(sizeof(gfn_range.arg) != sizeof(gfn_range.arg.raw));
+   BUILD_BUG_ON(sizeof(range->arg) != sizeof(range->arg.raw));
+   BUILD_BUG_ON(sizeof(gfn_range.arg) != 

[RFC PATCH v11 00/29] KVM: guest_memfd() and per-page attributes

2023-07-18 Thread Sean Christopherson
This is the next iteration of implementing fd-based (instead of vma-based)
memory for KVM guests.  If you want the full background of why we are doing
this, please go read the v10 cover letter[1].

The biggest change from v10 is to implement the backing storage in KVM
itself, and expose it via a KVM ioctl() instead of a "generic" syscall.
See link[2] for details on why we pivoted to a KVM-specific approach.

Key word is "biggest".  Relative to v10, there are many big changes.
Highlights below (I can't remember everything that got changed at
this point).

Tagged RFC as there are a lot of empty changelogs, and a lot of missing
documentation.  And ideally, we'll have even more tests before merging.
There are also several gaps/opens (to be discussed in tomorrow's PUCK).

v11:
 - Test private<=>shared conversions *without* doing fallocate()
 - PUNCH_HOLE all memory between iterations of the conversion test so that
   KVM doesn't retain pages in the guest_memfd
 - Rename hugepage control to be a very generic ALLOW_HUGEPAGE, instead of
   giving it a THP or PMD specific name.
 - Fold in fixes from a lot of people (thank you!)
 - Zap SPTEs *before* updating attributes to ensure no weirdness, e.g. if
   KVM handles a page fault and looks at inconsistent attributes
 - Refactor MMU interaction with attributes updates to reuse much of KVM's
   framework for mmu_notifiers.

[1] https://lore.kernel.org/all/20221202061347.1070246-1-chao.p.p...@linux.intel.com
[2] https://lore.kernel.org/all/zem5zq8oo+xna...@google.com

Ackerley Tng (1):
  KVM: selftests: Test KVM exit behavior for private memory/access

Chao Peng (7):
  KVM: Use gfn instead of hva for mmu_notifier_retry
  KVM: Add KVM_EXIT_MEMORY_FAULT exit
  KVM: Introduce per-page memory attributes
  KVM: x86: Disallow hugepages when memory attributes are mixed
  KVM: x86/mmu: Handle page fault for private memory
  KVM: selftests: Add KVM_SET_USER_MEMORY_REGION2 helper
  KVM: selftests: Expand set_memory_region_test to validate
guest_memfd()

Sean Christopherson (18):
  KVM: Wrap kvm_gfn_range.pte in a per-action union
  KVM: Tweak kvm_hva_range and hva_handler_t to allow reusing for gfn
ranges
  KVM: PPC: Drop dead code related to KVM_ARCH_WANT_MMU_NOTIFIER
  KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to
CONFIG_KVM_GENERIC_MMU_NOTIFIER
  KVM: Introduce KVM_SET_USER_MEMORY_REGION2
  mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
  security: Export security_inode_init_security_anon() for use by KVM
  KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing
memory
  KVM: Add transparent hugepage support for dedicated guest memory
  KVM: Drop superfluous __KVM_VCPU_MULTIPLE_ADDRESS_SPACE macro
  KVM: Allow arch code to track number of memslot address spaces per VM
  KVM: x86: Add support for "protected VMs" that can utilize private
memory
  KVM: selftests: Drop unused kvm_userspace_memory_region_find() helper
  KVM: selftests: Convert lib's mem regions to
KVM_SET_USER_MEMORY_REGION2
  KVM: selftests: Add support for creating private memslots
  KVM: selftests: Introduce VM "shape" to allow tests to specify the VM
type
  KVM: selftests: Add GUEST_SYNC[1-6] macros for synchronizing more data
  KVM: selftests: Add basic selftest for guest_memfd()

Vishal Annapurve (3):
  KVM: selftests: Add helpers to convert guest memory b/w private and
shared
  KVM: selftests: Add helpers to do KVM_HC_MAP_GPA_RANGE hypercalls
(x86)
  KVM: selftests: Add x86-only selftest for private memory conversions

 Documentation/virt/kvm/api.rst| 114 
 arch/arm64/include/asm/kvm_host.h |   2 -
 arch/arm64/kvm/Kconfig|   2 +-
 arch/arm64/kvm/mmu.c  |   2 +-
 arch/mips/include/asm/kvm_host.h  |   2 -
 arch/mips/kvm/Kconfig |   2 +-
 arch/mips/kvm/mmu.c   |   2 +-
 arch/powerpc/include/asm/kvm_host.h   |   2 -
 arch/powerpc/kvm/Kconfig  |   8 +-
 arch/powerpc/kvm/book3s_hv.c  |   2 +-
 arch/powerpc/kvm/powerpc.c|   5 +-
 arch/riscv/include/asm/kvm_host.h |   2 -
 arch/riscv/kvm/Kconfig|   2 +-
 arch/riscv/kvm/mmu.c  |   2 +-
 arch/x86/include/asm/kvm_host.h   |  17 +-
 arch/x86/include/uapi/asm/kvm.h   |   3 +
 arch/x86/kvm/Kconfig  |  14 +-
 arch/x86/kvm/debugfs.c|   2 +-
 arch/x86/kvm/mmu/mmu.c| 287 +++-
 arch/x86/kvm/mmu/mmu_internal.h   |   4 +
 arch/x86/kvm/mmu/mmutrace.h   |   1 +
 arch/x86/kvm/mmu/tdp_mmu.c|   8 +-
 arch/x86/kvm/vmx/vmx.c|  11 +-
 arch/x86/kvm/x86.c|  24 +-
 include/linux/kvm_host.h  | 129 +++-
 include/linux/pagemap.h   |  11 +
 

Re: [RFC PATCH 01/21] crypto: scomp - Revert "add support for deflate rfc1950 (zlib)"

2023-07-18 Thread Ard Biesheuvel
On Wed, 19 Jul 2023 at 00:54, Eric Biggers  wrote:
>
> On Tue, Jul 18, 2023 at 03:32:39PM -0700, Eric Biggers wrote:
> > On Tue, Jul 18, 2023 at 02:58:27PM +0200, Ard Biesheuvel wrote:
> > > This reverts commit a368f43d6e3a001e684e9191a27df384fbff12f5.
> > >
> > > "zlib-deflate" was introduced 6 years ago, but it does not have any
> > > users. So let's remove the generic implementation and the test vectors,
> > > but retain the "zlib-deflate" entry in the testmgr code to avoid
> > > introducing warning messages on systems that implement zlib-deflate in
> > > hardware.
> > >
> > > Note that RFC 1950 which forms the basis of this algorithm dates back to
> > > 1996, and predates RFC 1951, on which the existing IPcomp is based and
> > > which we have supported in the kernel since 2003. So it seems rather
> > > unlikely that we will ever grow the need to support zlib-deflate.
> > >
> > > Signed-off-by: Ard Biesheuvel 
> > > ---
> > >  crypto/deflate.c | 61 +---
> > >  crypto/testmgr.c |  8 +--
> > >  crypto/testmgr.h | 75 
> > >  3 files changed, 18 insertions(+), 126 deletions(-)
> >
> > So if this is really unused, it's probably fair to remove it on that basis.
> > However, it's not correct to claim that DEFLATE is obsoleted by zlib (the 
> > data
> > format).  zlib is just DEFLATE plus a checksum, as is gzip.
> >
> > Many users of zlib or gzip use an external checksum and therefore would be
> > better served by DEFLATE, avoiding a redundant builtin checksum.  Typically,
> > people have chosen zlib or gzip simply because their compression library
> > defaulted to it, they didn't understand the difference, and they overlooked 
> > that
> > they're paying the price for a redundant builtin checksum.
> >
> > An example of someone doing it right is EROFS, which is working on adding
> > DEFLATE support (not zlib or gzip!):
> > https://lore.kernel.org/r/20230713001441.30462-1-hsiang...@linux.alibaba.com
> >
> > Of course, they are using the library API instead of the clumsy crypto API.
> >
>
> Ah, I misread this patch, sorry.  It's actually removing support for zlib (the
> data format) from the scomp API, leaving just DEFLATE.  That's fine too; 
> again,
> it ultimately just depends on what is actually being used via the scomp API.
> But similarly you can't really claim that zlib is obsoleted by DEFLATE just
> because of the RFC dates.  As I mentioned, many people do use zlib (the data
> format), often just because it's the default of zlib (the library) and they
> didn't know any better.  For example, btrfs compression supports zlib.
>

I am not suggesting either is obsolete. I am merely pointing out that
zlib-deflate is as old as plain deflate, and so we could have
implemented both at the same time when IPcomp support was added, but
we never bothered.


Re: [RFC PATCH 01/21] crypto: scomp - Revert "add support for deflate rfc1950 (zlib)"

2023-07-18 Thread Eric Biggers
On Tue, Jul 18, 2023 at 03:32:39PM -0700, Eric Biggers wrote:
> On Tue, Jul 18, 2023 at 02:58:27PM +0200, Ard Biesheuvel wrote:
> > This reverts commit a368f43d6e3a001e684e9191a27df384fbff12f5.
> > 
> > "zlib-deflate" was introduced 6 years ago, but it does not have any
> > users. So let's remove the generic implementation and the test vectors,
> > but retain the "zlib-deflate" entry in the testmgr code to avoid
> > introducing warning messages on systems that implement zlib-deflate in
> > hardware.
> > 
> > Note that RFC 1950 which forms the basis of this algorithm dates back to
> > 1996, and predates RFC 1951, on which the existing IPcomp is based and
> > which we have supported in the kernel since 2003. So it seems rather
> > unlikely that we will ever grow the need to support zlib-deflate.
> > 
> > Signed-off-by: Ard Biesheuvel 
> > ---
> >  crypto/deflate.c | 61 +---
> >  crypto/testmgr.c |  8 +--
> >  crypto/testmgr.h | 75 
> >  3 files changed, 18 insertions(+), 126 deletions(-)
> 
> So if this is really unused, it's probably fair to remove it on that basis.
> However, it's not correct to claim that DEFLATE is obsoleted by zlib (the data
> format).  zlib is just DEFLATE plus a checksum, as is gzip.
> 
> Many users of zlib or gzip use an external checksum and therefore would be
> better served by DEFLATE, avoiding a redundant builtin checksum.  Typically,
> people have chosen zlib or gzip simply because their compression library
> defaulted to it, they didn't understand the difference, and they overlooked 
> that
> they're paying the price for a redundant builtin checksum.
> 
> An example of someone doing it right is EROFS, which is working on adding
> DEFLATE support (not zlib or gzip!):
> https://lore.kernel.org/r/20230713001441.30462-1-hsiang...@linux.alibaba.com
> 
> Of course, they are using the library API instead of the clumsy crypto API.
> 

Ah, I misread this patch, sorry.  It's actually removing support for zlib (the
data format) from the scomp API, leaving just DEFLATE.  That's fine too; again,
it ultimately just depends on what is actually being used via the scomp API.
But similarly you can't really claim that zlib is obsoleted by DEFLATE just
because of the RFC dates.  As I mentioned, many people do use zlib (the data
format), often just because it's the default of zlib (the library) and they
didn't know any better.  For example, btrfs compression supports zlib.

- Eric


Re: [RFC PATCH 05/21] ubifs: Pass worst-case buffer size to compression routines

2023-07-18 Thread Eric Biggers
On Tue, Jul 18, 2023 at 02:58:31PM +0200, Ard Biesheuvel wrote:
> Currently, the ubifs code allocates a worst case buffer size to
> recompress a data node, but does not pass the size of that buffer to the
> compression code. This means that the compression code will never use
> the additional space, and might fail spuriously due to lack of space.
> 
> So let's multiply out_len by WORST_COMPR_FACTOR after allocating the
> buffer. Doing so is guaranteed not to overflow, given that the preceding
> kmalloc_array() call would have failed otherwise.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  fs/ubifs/journal.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
> index dc52ac0f4a345f30..4e5961878f336033 100644
> --- a/fs/ubifs/journal.c
> +++ b/fs/ubifs/journal.c
> @@ -1493,6 +1493,8 @@ static int truncate_data_node(const struct ubifs_info 
> *c, const struct inode *in
>   if (!buf)
>   return -ENOMEM;
>  
> + out_len *= WORST_COMPR_FACTOR;
> +
>   dlen = le32_to_cpu(dn->ch.len) - UBIFS_DATA_NODE_SZ;
>   data_size = dn_size - UBIFS_DATA_NODE_SZ;
>   compr_type = le16_to_cpu(dn->compr_type);

This looks like another case where data that would be expanded by compression
should just be stored uncompressed instead.

In fact, it seems that UBIFS does that already.  ubifs_compress() has this:

/*
 * If the data compressed only slightly, it is better to leave it
 * uncompressed to improve read speed.
 */
if (in_len - *out_len < UBIFS_MIN_COMPRESS_DIFF)
goto no_compr;

So it's unclear why the WORST_COMPR_FACTOR thing is needed at all.

- Eric


Re: [RFC PATCH 01/21] crypto: scomp - Revert "add support for deflate rfc1950 (zlib)"

2023-07-18 Thread Eric Biggers
On Tue, Jul 18, 2023 at 02:58:27PM +0200, Ard Biesheuvel wrote:
> This reverts commit a368f43d6e3a001e684e9191a27df384fbff12f5.
> 
> "zlib-deflate" was introduced 6 years ago, but it does not have any
> users. So let's remove the generic implementation and the test vectors,
> but retain the "zlib-deflate" entry in the testmgr code to avoid
> introducing warning messages on systems that implement zlib-deflate in
> hardware.
> 
> Note that RFC 1950 which forms the basis of this algorithm dates back to
> 1996, and predates RFC 1951, on which the existing IPcomp is based and
> which we have supported in the kernel since 2003. So it seems rather
> unlikely that we will ever grow the need to support zlib-deflate.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  crypto/deflate.c | 61 +---
>  crypto/testmgr.c |  8 +--
>  crypto/testmgr.h | 75 
>  3 files changed, 18 insertions(+), 126 deletions(-)

So if this is really unused, it's probably fair to remove it on that basis.
However, it's not correct to claim that DEFLATE is obsoleted by zlib (the data
format).  zlib is just DEFLATE plus a checksum, as is gzip.

Many users of zlib or gzip use an external checksum and therefore would be
better served by DEFLATE, avoiding a redundant builtin checksum.  Typically,
people have chosen zlib or gzip simply because their compression library
defaulted to it, they didn't understand the difference, and they overlooked that
they're paying the price for a redundant builtin checksum.

An example of someone doing it right is EROFS, which is working on adding
DEFLATE support (not zlib or gzip!):
https://lore.kernel.org/r/20230713001441.30462-1-hsiang...@linux.alibaba.com

Of course, they are using the library API instead of the clumsy crypto API.

- Eric


Re: [PATCH 0/2] eventfd: simplify signal helpers

2023-07-18 Thread Jason Gunthorpe
On Mon, Jul 17, 2023 at 04:52:03PM -0600, Alex Williamson wrote:
> On Mon, 17 Jul 2023 19:12:16 -0300
> Jason Gunthorpe  wrote:
> 
> > On Mon, Jul 17, 2023 at 01:08:31PM -0600, Alex Williamson wrote:
> > 
> > > What would that mechanism be?  We've been iterating on getting the
> > > serialization and buffering correct, but I don't know of another means
> > > that combines the notification with a value, so we'd likely end up with
> > > an eventfd only for notification and a separate ring buffer for
> > > notification values.  
> > 
> > All FDs do this. You just have to make a FD with custom
> > file_operations that does what this wants. The uAPI shouldn't be able
> > to tell if the FD is backing it with an eventfd or otherwise. Have the
> > kernel return the FD instead of accepting it. Follow the basic design
> > of eg mlx5vf_save_fops
> 
> Sure, userspace could poll on any fd and read a value from it, but at
> that point we're essentially duplicating a lot of what eventfd provides
> for a minor(?) semantic difference over how the counter value is
> interpreted.  Using an actual eventfd allows the ACPI notification to
> work as just another interrupt index within the existing vfio IRQ
> uAPI.

Yes, duplicated, sort of, whatever the "ack" is to allow pushing a new
value can be revised to run as part of the read.

But I don't really view it as a minor difference. eventfd is a
counter. It should not be abused otherwise, even if it can be made to
work.

It really isn't an IRQ if it is pushing an async message w/data.

Jason


Re: [PATCH 1/4] mm_notifiers: Rename invalidate_range notifier

2023-07-18 Thread Andrew Morton
On Tue, 18 Jul 2023 14:57:12 -0300 Jason Gunthorpe  wrote:

> On Tue, Jul 18, 2023 at 05:56:15PM +1000, Alistair Popple wrote:
> > diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> > index b466172..48c81b9 100644
> > --- a/include/asm-generic/tlb.h
> > +++ b/include/asm-generic/tlb.h
> > @@ -456,7 +456,7 @@ static inline void tlb_flush_mmu_tlbonly(struct 
> > mmu_gather *tlb)
> > return;
> >  
> > tlb_flush(tlb);
> > -   mmu_notifier_invalidate_range(tlb->mm, tlb->start, tlb->end);
> > +   mmu_notifier_invalidate_secondary_tlbs(tlb->mm, tlb->start, tlb->end);
> > __tlb_reset_range(tlb);
> 
> Does this compile? I don't see
> "mmu_notifier_invalidate_secondary_tlbs" ?

Seems this call gets deleted later in the series.

> But I think the approach in this series looks fine, it is so much
> cleaner after we remove all the cruft in patch 4, just look at the
> diffstat..

I'll push this into -next if it compiles OK for me, but yes, a redo is
desirable please.



Re: [PATCH 3/4] mmu_notifiers: Call arch_invalidate_secondary_tlbs() when invalidating TLBs

2023-07-18 Thread Jason Gunthorpe
On Tue, Jul 18, 2023 at 11:17:59AM -0700, Andrew Morton wrote:
> On Tue, 18 Jul 2023 17:56:17 +1000 Alistair Popple  wrote:
> 
> > The arch_invalidate_secondary_tlbs() is an architecture specific mmu
> > notifier used to keep the TLB of secondary MMUs such as an IOMMU in
> > sync with the CPU page tables. Currently it is called from separate
> > code paths to the main CPU TLB invalidations. This can lead to a
> > secondary TLB not getting invalidated when required and makes it hard
> > to reason about when exactly the secondary TLB is invalidated.
> > 
> > To fix this move the notifier call to the architecture specific TLB
> > maintenance functions for architectures that have secondary MMUs
> > requiring explicit software invalidations.
> > 
> > This fixes a SMMU bug on ARM64. On ARM64 PTE permission upgrades
> > require a TLB invalidation. This invalidation is done by the
> architecture specific ptep_set_access_flags() which calls
> > flush_tlb_page() if required. However this doesn't call the notifier
> > resulting in infinite faults being generated by devices using the SMMU
> if it has previously cached a read-only PTE in its TLB.
> 
> This sounds like a pretty serious bug.  Can it happen in current
> released kernels?  If so, is a -stable backport needed?

There are currently no in-kernel drivers using the IOMMU SVA API, so
the impact for -stable is sort of muted. But it is serious if you are
unlucky to hit it.

Jason


[PATCH v2 2/2] PCI: layerscape: Add the workaround for lost link capablities during reset

2023-07-18 Thread Frank Li
From: Xiaowei Bao 

Add a workaround for the issue where the PCI Express Endpoint (EP)
controller loses the values of the Maximum Link Width and Supported Link
Speed fields in the Link Capabilities Register, which were initially
configured by the Reset Configuration Word (RCW), during a link-down or
hot reset event.

Fixes: a805770d8a22 ("PCI: layerscape: Add EP mode support")
Signed-off-by: Xiaowei Bao 
Signed-off-by: Hou Zhiqiang 
Signed-off-by: Frank Li 
---
change from v1 to v2:
 - add comments at restore register
 - add fixes tag

 .../pci/controller/dwc/pci-layerscape-ep.c| 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index e0969ff2ddf7..b1faf41a2fae 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -45,6 +45,7 @@ struct ls_pcie_ep {
struct pci_epc_features *ls_epc;
const struct ls_pcie_ep_drvdata *drvdata;
int irq;
+   u32 lnkcap;
bool   big_endian;
 };
 
@@ -73,6 +74,7 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void 
*dev_id)
struct ls_pcie_ep *pcie = dev_id;
struct dw_pcie *pci = pcie->pci;
u32 val, cfg;
+   u8 offset;
 
val = ls_lut_readl(pcie, PEX_PF0_PME_MES_DR);
ls_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
@@ -81,6 +83,19 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void 
*dev_id)
return IRQ_NONE;
 
if (val & PEX_PF0_PME_MES_DR_LUD) {
+
+   offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
+
+   /*
+* The values of the Maximum Link Width and Supported Link
+* Speed from the Link Capabilities Register will be lost
+* during link down or hot reset. Restore the initial value
+* configured by the Reset Configuration Word (RCW).
+*/
+   dw_pcie_dbi_ro_wr_en(pci);
+   dw_pcie_writel_dbi(pci, offset + PCI_EXP_LNKCAP, pcie->lnkcap);
+   dw_pcie_dbi_ro_wr_dis(pci);
+
cfg = ls_lut_readl(pcie, PEX_PF0_CONFIG);
cfg |= PEX_PF0_CFG_READY;
ls_lut_writel(pcie, PEX_PF0_CONFIG, cfg);
@@ -216,6 +231,7 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
struct ls_pcie_ep *pcie;
struct pci_epc_features *ls_epc;
struct resource *dbi_base;
+   u8 offset;
int ret;
 
pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
@@ -252,6 +268,9 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
 
platform_set_drvdata(pdev, pcie);
 
+   offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
+   pcie->lnkcap = dw_pcie_readl_dbi(pci, offset + PCI_EXP_LNKCAP);
+
ret = dw_pcie_ep_init(&pcie->ep);
if (ret)
return ret;
-- 
2.34.1



[PATCH v2 1/2] PCI: layerscape: Add support for Link down notification

2023-07-18 Thread Frank Li
Add support to pass Link down notification to Endpoint function driver
so that the LINK_DOWN event can be processed by the function.

Acked-by: Manivannan Sadhasivam 
Signed-off-by: Frank Li 
---
Change from v1 to v2
 - move pci_epc_linkdown() after dev_dbg()

 drivers/pci/controller/dwc/pci-layerscape-ep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index de4c1758a6c3..e0969ff2ddf7 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -89,6 +89,7 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void 
*dev_id)
dev_dbg(pci->dev, "Link up\n");
} else if (val & PEX_PF0_PME_MES_DR_LDD) {
dev_dbg(pci->dev, "Link down\n");
+   pci_epc_linkdown(pci->ep.epc);
} else if (val & PEX_PF0_PME_MES_DR_HRD) {
dev_dbg(pci->dev, "Hot reset\n");
}
-- 
2.34.1



Re: [PATCH 3/4] mmu_notifiers: Call arch_invalidate_secondary_tlbs() when invalidating TLBs

2023-07-18 Thread Andrew Morton
On Tue, 18 Jul 2023 17:56:17 +1000 Alistair Popple  wrote:

> The arch_invalidate_secondary_tlbs() is an architecture specific mmu
> notifier used to keep the TLB of secondary MMUs such as an IOMMU in
> sync with the CPU page tables. Currently it is called from separate
> code paths to the main CPU TLB invalidations. This can lead to a
> secondary TLB not getting invalidated when required and makes it hard
> to reason about when exactly the secondary TLB is invalidated.
> 
> To fix this move the notifier call to the architecture specific TLB
> maintenance functions for architectures that have secondary MMUs
> requiring explicit software invalidations.
> 
> This fixes a SMMU bug on ARM64. On ARM64 PTE permission upgrades
> require a TLB invalidation. This invalidation is done by the
> architecture specific ptep_set_access_flags() which calls
> flush_tlb_page() if required. However this doesn't call the notifier
> resulting in infinite faults being generated by devices using the SMMU
> if it has previously cached a read-only PTE in its TLB.

This sounds like a pretty serious bug.  Can it happen in current
released kernels?  If so, is a -stable backport needed?

> Moving the invalidations into the TLB invalidation functions ensures
> all invalidations happen at the same time as the CPU invalidation. The
> architecture specific flush_tlb_all() routines do not call the
> notifier as none of the IOMMUs require this.



Re: [PATCH 1/4] mm_notifiers: Rename invalidate_range notifier

2023-07-18 Thread Jason Gunthorpe
On Tue, Jul 18, 2023 at 05:56:15PM +1000, Alistair Popple wrote:
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index b466172..48c81b9 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -456,7 +456,7 @@ static inline void tlb_flush_mmu_tlbonly(struct 
> mmu_gather *tlb)
>   return;
>  
>   tlb_flush(tlb);
> - mmu_notifier_invalidate_range(tlb->mm, tlb->start, tlb->end);
> + mmu_notifier_invalidate_secondary_tlbs(tlb->mm, tlb->start, tlb->end);
>   __tlb_reset_range(tlb);

Does this compile? I don't see
"mmu_notifier_invalidate_secondary_tlbs" ?

Maybe we don't need to rename this function since you pretty much
remove it in the next patches?

> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 50c0dde..34c5a84 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -207,7 +207,7 @@ mmu_interval_read_begin(struct mmu_interval_notifier 
> *interval_sub)
>*spin_lock
>* seq = ++subscriptions->invalidate_seq
>*spin_unlock
> -  * op->invalidate_range():
> +  * op->invalidate_secondary_tlbs():

The later patch should delete this stuff from the comment too, we
no longer guarantee this relationship?

> @@ -560,23 +560,23 @@ mn_hlist_invalidate_end(struct 
> mmu_notifier_subscriptions *subscriptions,
>   hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist,
>srcu_read_lock_held()) {
>   /*
> -  * Call invalidate_range here too to avoid the need for the
> -  * subsystem of having to register an invalidate_range_end
> -  * call-back when there is invalidate_range already. Usually a
> -  * subsystem registers either invalidate_range_start()/end() or
> -  * invalidate_range(), so this will be no additional overhead
> -  * (besides the pointer check).
> +  * Subsystems should register either invalidate_secondary_tlbs()
> +  * or invalidate_range_start()/end() callbacks.
>*
> -  * We skip call to invalidate_range() if we know it is safe ie
> -  * call site use mmu_notifier_invalidate_range_only_end() which
> -  * is safe to do when we know that a call to invalidate_range()
> -  * already happen under page table lock.
> +  * We call invalidate_secondary_tlbs() here so that subsystems
> +  * can use larger range based invalidations. In some cases
> +  * though invalidate_secondary_tlbs() needs to be called while
> +  * holding the page table lock. In that case call sites use
> +  * mmu_notifier_invalidate_range_only_end() and we know it is
> +  * safe to skip secondary TLB invalidation as it will have
> +  * already been done.
>*/
> - if (!only_end && subscription->ops->invalidate_range)
> - subscription->ops->invalidate_range(subscription,
> - range->mm,
> - range->start,
> - range->end);
> + if (!only_end && subscription->ops->invalidate_secondary_tlbs)
> + subscription->ops->invalidate_secondary_tlbs(

More doesn't compile, and the comment has the same issue..

But I think the approach in this series looks fine, it is so much
cleaner after we remove all the cruft in patch 4, just look at the
diffstat..

Jason


Re: linux-next: Tree for Jul 13 (drivers/video/fbdev/ps3fb.c)

2023-07-18 Thread Randy Dunlap
On 7/18/23 04:48, Michael Ellerman wrote:
> Bagas Sanjaya  writes:
>> On Thu, Jul 13, 2023 at 09:11:10AM -0700, Randy Dunlap wrote:
>>> on ppc64:
>>>
>>> In file included from ../include/linux/device.h:15,
>>>  from ../arch/powerpc/include/asm/io.h:22,
>>>  from ../include/linux/io.h:13,
>>>  from ../include/linux/irq.h:20,
>>>  from ../arch/powerpc/include/asm/hardirq.h:6,
>>>  from ../include/linux/hardirq.h:11,
>>>  from ../include/linux/interrupt.h:11,
>>>  from ../drivers/video/fbdev/ps3fb.c:25:
>>> ../drivers/video/fbdev/ps3fb.c: In function 'ps3fb_probe':
>>> ../drivers/video/fbdev/ps3fb.c:1172:40: error: 'struct fb_info' has no 
>>> member named 'dev'
>>>  1172 |  dev_driver_string(info->dev), dev_name(info->dev),
>>>   |^~
>>> ../include/linux/dev_printk.h:110:37: note: in definition of macro 
>>> 'dev_printk_index_wrap'
>>>   110 | _p_func(dev, fmt, ##__VA_ARGS__);   
>>> \
>>>   | ^~~
>>> ../drivers/video/fbdev/ps3fb.c:1171:9: note: in expansion of macro 
>>> 'dev_info'
>>>  1171 | dev_info(info->device, "%s %s, using %u KiB of video 
>>> memory\n",
>>>   | ^~~~
>>> ../drivers/video/fbdev/ps3fb.c:1172:61: error: 'struct fb_info' has no 
>>> member named 'dev'
>>>  1172 |  dev_driver_string(info->dev), dev_name(info->dev),
>>>   | ^~
>>> ../include/linux/dev_printk.h:110:37: note: in definition of macro 
>>> 'dev_printk_index_wrap'
>>>   110 | _p_func(dev, fmt, ##__VA_ARGS__);   
>>> \
>>>   | ^~~
>>> ../drivers/video/fbdev/ps3fb.c:1171:9: note: in expansion of macro 
>>> 'dev_info'
>>>  1171 | dev_info(info->device, "%s %s, using %u KiB of video 
>>> memory\n",
>>>   | ^~~~
>>>
>>>
>>
>> Hmm, there is no response from Thomas yet. I guess we should go with
>> reverting bdb616479eff419, right? Regardless, I'm adding this build 
>> regression
>> to regzbot so that parties involved are aware of it:
>>
>> #regzbot ^introduced: bdb616479eff419
>> #regzbot title: build regression in PS3 framebuffer
> 
> Does regzbot track issues in linux-next?
> 
> They're not really regressions because they're not in a release yet.
> 
> Anyway I don't see where bdb616479eff419 comes from.
> 
> The issue was introduced by:
> 
>   701d2054fa31 fbdev: Make support for userspace interfaces configurable
> 
> The driver seems to only use info->dev in that one dev_info() line,
> which seems purely cosmetic, so I think it could just be removed, eg:
> 
> diff --git a/drivers/video/fbdev/ps3fb.c b/drivers/video/fbdev/ps3fb.c
> index d4abcf8aff75..a304a39d712b 100644
> --- a/drivers/video/fbdev/ps3fb.c
> +++ b/drivers/video/fbdev/ps3fb.c
> @@ -1168,8 +1168,7 @@ static int ps3fb_probe(struct ps3_system_bus_device 
> *dev)
>  
>   ps3_system_bus_set_drvdata(dev, info);
>  
> - dev_info(info->device, "%s %s, using %u KiB of video memory\n",
> -  dev_driver_string(info->dev), dev_name(info->dev),
> + dev_info(info->device, "using %u KiB of video memory\n",
>info->fix.smem_len >> 10);
>  
>   task = kthread_run(ps3fbd, info, DEVICE_NAME);


Tested-by: Randy Dunlap  # build-tested

Thanks.

-- 
~Randy


Re: [PATCH 07/12] arch/x86: Declare edid_info in

2023-07-18 Thread Arnd Bergmann
On Wed, Jul 5, 2023, at 10:18, Thomas Zimmermann wrote:
> Am 30.06.23 um 13:53 schrieb Arnd Bergmann:
>> On Fri, Jun 30, 2023, at 09:46, Thomas Zimmermann wrote:
>>> Am 29.06.23 um 15:21 schrieb Arnd Bergmann:
>> 
>> I definitely get it for the screen_info, which needs the complexity.
>> For ARCH_HAS_EDID_INFO I would hope that it's never selected by
>> anything other than x86, so I would still go with just a dependency
>> on x86 for simplicity, but I don't mind having the extra symbol if that
>> keeps it more consistent with how the screen_info is handled.
>
> Well, I'd like to add edid_info to platforms with EFI. That would be 
> arm/arm64 and loongarch, I guess. See below for the future plans.

To be clear: I don't mind using a 'struct edid_info' being passed
around between subsystems, that is clearly an improvement over
'struct screen_info'. It's the global variable that seems like
an artifact of linux-2.4 days, and I think we can do better than that.

>>>> I suppose you could use FIRMWARE_EDID on EFI or OF systems without
>>>> the need for a global edid_info structure, but that would not
>>>> share any code with the current fb_firmware_edid() function.
>>>
>>> The current code is built on top of screen_info and edid_info. I'd
>>> preferably not replace that, if possible.
>> 
>> One way I could imagine this looking in the end would be
>> something like
>> 
>> struct screen_info *fb_screen_info(struct device *dev)
>> {
>>struct screen_info *si = NULL;
>> 
>>if (IS_ENABLED(CONFIG_EFI))
>>  si = efi_get_screen_info(dev);
>> 
>>if (IS_ENABLED(CONFIG_ARCH_HAS_SCREEN_INFO) && !si)
>>  si = screen_info;
>> 
>>return si;
>> }
>> 
>> corresponding to fb_firmware_edid(). With this, any driver
>> that wants to access screen_info would call this function
>> instead of using the global pointer, plus either NULL pointer
>> check or a CONFIG_ARCH_HAS_SCREEN_INFO dependency.
>> 
>> This way we could completely eliminate the global screen_info
>> on arm64, riscv, and loongarch but still use the efi and
>> hyperv framebuffer/drm drivers.
>
> If possible, I'd like to remove global screen_info and edid_info 
> entirely from fbdev and the various consoles.

ok

> We currently use screen_info to set up the generic framebuffer device in 
> drivers/firmware/sysfb.c. I'd like to use edid_info here as well, so 
> that the generic graphics drivers can get EDID information.
>
> For the few fbdev drivers and consoles that require the global 
> screen_info/edid_info, I'd rather provide lookup functions in sysfb 
> (e.g., sysfb_get_screen_info(), sysfb_get_edid_info()). The global 
> screen_info/edid_info state would then become an internal artifact of 
> the sysfb code.
>
> Hopefully that explains some of the decisions made in this patchset.

I spent some more time looking at the screen_info side, after my
first set of patches to refine the #ifdefs, and I think we don't
even need to make screen_info available to non-x86 drivers at all:

- All the vgacon users except for x86 can just register a static
  screen_info (or simplified into a simpler structure) with the
  driver itself. This even includes ia64, which does not support
  EFI framebuffers.

- The VESA, vga16, SIS, Intel and HyperV framebuffer drivers only
  need access to screen_info on x86. HyperV is the only driver that
  can currently access the data from EFI firmware on arm64, but
  that is only used for 'gen 1' guests, which I'm pretty sure
  only exist on x86.

- All the other references to screen_info are specific to EFI
  firmware, so we can move the global definition from arm,
  arm64, loongarch, riscv and ia64 into the EFI firmware
  code itself. It is still accessed by efifb and efi-earlycon
  at this point.

I have uploaded version 2 of my series to
https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=screen-info-v2
and will send it out after I get the green light from build
bots. 

   Arnd


[PATCH v2] powerpc: Explicitly include correct DT includes

2023-07-18 Thread Rob Herring
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.

Signed-off-by: Rob Herring 
---
v2:
- Fix double include of of.h
---
 arch/powerpc/include/asm/ibmebus.h  | 2 ++
 arch/powerpc/include/asm/macio.h| 3 ++-
 arch/powerpc/kernel/legacy_serial.c | 2 +-
 arch/powerpc/kernel/of_platform.c   | 4 +---
 arch/powerpc/kernel/setup-common.c  | 4 ++--
 arch/powerpc/kexec/file_load_64.c   | 2 +-
 arch/powerpc/kexec/ranges.c | 2 +-
 arch/powerpc/platforms/4xx/cpm.c| 2 +-
 arch/powerpc/platforms/4xx/hsta_msi.c   | 2 +-
 arch/powerpc/platforms/4xx/soc.c| 2 +-
 arch/powerpc/platforms/512x/mpc5121_ads.c   | 2 +-
 arch/powerpc/platforms/512x/mpc512x_generic.c   | 2 +-
 arch/powerpc/platforms/512x/mpc512x_lpbfifo.c   | 2 +-
 arch/powerpc/platforms/512x/pdm360ng.c  | 3 ++-
 arch/powerpc/platforms/52xx/mpc52xx_gpt.c   | 3 +--
 arch/powerpc/platforms/82xx/ep8248e.c   | 1 +
 arch/powerpc/platforms/83xx/km83xx.c| 4 ++--
 arch/powerpc/platforms/83xx/suspend.c   | 2 +-
 arch/powerpc/platforms/85xx/bsc913x_qds.c   | 2 +-
 arch/powerpc/platforms/85xx/bsc913x_rdb.c   | 2 +-
 arch/powerpc/platforms/85xx/c293pcie.c  | 3 +--
 arch/powerpc/platforms/85xx/ge_imp3a.c  | 2 +-
 arch/powerpc/platforms/85xx/ksi8560.c   | 3 ++-
 arch/powerpc/platforms/85xx/mpc8536_ds.c| 2 +-
 arch/powerpc/platforms/85xx/mpc85xx_ds.c| 2 +-
 arch/powerpc/platforms/85xx/mpc85xx_mds.c   | 4 ++--
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c   | 3 ++-
 arch/powerpc/platforms/85xx/p1010rdb.c  | 2 +-
 arch/powerpc/platforms/85xx/p1022_ds.c  | 2 +-
 arch/powerpc/platforms/85xx/p1022_rdk.c | 2 +-
 arch/powerpc/platforms/85xx/p1023_rdb.c | 3 +--
 arch/powerpc/platforms/85xx/socrates.c  | 2 +-
 arch/powerpc/platforms/85xx/socrates_fpga_pic.c | 1 -
 arch/powerpc/platforms/85xx/stx_gp3.c   | 2 +-
 arch/powerpc/platforms/85xx/tqm85xx.c   | 2 +-
 arch/powerpc/platforms/85xx/twr_p102x.c | 3 ++-
 arch/powerpc/platforms/85xx/xes_mpc85xx.c   | 2 +-
 arch/powerpc/platforms/86xx/gef_ppc9a.c | 2 +-
 arch/powerpc/platforms/86xx/gef_sbc310.c| 2 +-
 arch/powerpc/platforms/86xx/gef_sbc610.c| 2 +-
 arch/powerpc/platforms/86xx/mvme7100.c  | 1 -
 arch/powerpc/platforms/86xx/pic.c   | 2 +-
 arch/powerpc/platforms/cell/axon_msi.c  | 3 ++-
 arch/powerpc/platforms/cell/cbe_regs.c  | 3 +--
 arch/powerpc/platforms/cell/iommu.c | 2 +-
 arch/powerpc/platforms/cell/setup.c | 1 +
 arch/powerpc/platforms/cell/spider-pci.c| 1 -
 arch/powerpc/platforms/embedded6xx/holly.c  | 2 +-
 arch/powerpc/platforms/maple/setup.c| 4 ++--
 arch/powerpc/platforms/pasemi/gpio_mdio.c   | 2 +-
 arch/powerpc/platforms/pasemi/setup.c   | 2 ++
 arch/powerpc/platforms/powermac/setup.c | 2 +-
 arch/powerpc/platforms/powernv/opal-imc.c   | 1 -
 arch/powerpc/platforms/powernv/opal-rtc.c   | 3 ++-
 arch/powerpc/platforms/powernv/opal-secvar.c| 2 +-
 arch/powerpc/platforms/powernv/opal-sensor.c| 2 ++
 arch/powerpc/platforms/pseries/ibmebus.c| 1 +
 arch/powerpc/sysdev/cpm_common.c| 2 --
 arch/powerpc/sysdev/cpm_gpio.c  | 3 ++-
 arch/powerpc/sysdev/fsl_pmc.c   | 4 ++--
 arch/powerpc/sysdev/fsl_rio.c   | 4 ++--
 arch/powerpc/sysdev/fsl_rmu.c   | 1 -
 arch/powerpc/sysdev/fsl_soc.c   | 1 -
 arch/powerpc/sysdev/mpic_msgr.c | 3 ++-
 arch/powerpc/sysdev/mpic_timer.c| 1 -
 arch/powerpc/sysdev/of_rtc.c| 4 ++--
 arch/powerpc/sysdev/pmi.c   | 4 ++--
 67 files changed, 79 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/include/asm/ibmebus.h 
b/arch/powerpc/include/asm/ibmebus.h
index 088f95b2e14f..6f33253a364a 100644
--- a/arch/powerpc/include/asm/ibmebus.h
+++ b/arch/powerpc/include/asm/ibmebus.h
@@ -46,6 +46,8 @@
 #include 
 #include 
 
+struct platform_driver;
+
 extern struct bus_type ibmebus_bus_type;
 
 int ibmebus_register_driver(struct platform_driver *drv);
diff --git a/arch/powerpc/include/asm/macio.h b/arch/powerpc/include/asm/macio.h
index ff5fd82d9ff0..3a07c62973aa 100644
--- a/arch/powerpc/include/asm/macio.h
+++ 

[PATCH v2] dmaengine: Explicitly include correct DT includes

2023-07-18 Thread Rob Herring
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.

Signed-off-by: Rob Herring 
---
v2:
- Fix build error on bestcomm
---
 drivers/dma/apple-admac.c  | 3 ++-
 drivers/dma/at_hdmac.c | 2 +-
 drivers/dma/bcm-sba-raid.c | 4 +++-
 drivers/dma/bestcomm/bestcomm.c| 3 +--
 drivers/dma/dma-jz4780.c   | 1 -
 drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c | 1 -
 drivers/dma/dw/rzn1-dmamux.c   | 4 +++-
 drivers/dma/fsl-qdma.c | 4 ++--
 drivers/dma/fsl_raid.c | 3 ++-
 drivers/dma/fsldma.c   | 3 ++-
 drivers/dma/img-mdc-dma.c  | 1 -
 drivers/dma/imx-dma.c  | 2 +-
 drivers/dma/imx-sdma.c | 1 -
 drivers/dma/lpc18xx-dmamux.c   | 4 +++-
 drivers/dma/mediatek/mtk-cqdma.c   | 1 -
 drivers/dma/mediatek/mtk-hsdma.c   | 1 -
 drivers/dma/mediatek/mtk-uart-apdma.c  | 1 -
 drivers/dma/mpc512x_dma.c  | 4 ++--
 drivers/dma/mxs-dma.c  | 1 -
 drivers/dma/nbpfaxi.c  | 1 -
 drivers/dma/owl-dma.c  | 3 ++-
 drivers/dma/ppc4xx/adma.c  | 2 +-
 drivers/dma/qcom/hidma.c   | 2 +-
 drivers/dma/sh/shdmac.c| 1 -
 drivers/dma/sprd-dma.c | 2 +-
 drivers/dma/stm32-dmamux.c | 4 +++-
 drivers/dma/stm32-mdma.c   | 1 -
 drivers/dma/sun6i-dma.c| 2 +-
 drivers/dma/tegra186-gpc-dma.c | 2 +-
 drivers/dma/tegra20-apb-dma.c  | 1 -
 drivers/dma/tegra210-adma.c| 3 ++-
 drivers/dma/ti/dma-crossbar.c  | 5 +++--
 drivers/dma/ti/edma.c  | 1 -
 drivers/dma/ti/k3-udma-private.c   | 2 ++
 drivers/dma/ti/k3-udma.c   | 1 -
 drivers/dma/ti/omap-dma.c  | 2 +-
 drivers/dma/xgene-dma.c| 3 ++-
 drivers/dma/xilinx/xilinx_dma.c| 4 ++--
 drivers/dma/xilinx/zynqmp_dma.c| 3 ++-
 39 files changed, 46 insertions(+), 43 deletions(-)

diff --git a/drivers/dma/apple-admac.c b/drivers/dma/apple-admac.c
index 4cf8da77bdd9..3af795635c5c 100644
--- a/drivers/dma/apple-admac.c
+++ b/drivers/dma/apple-admac.c
@@ -10,8 +10,9 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index ee3a219e3a89..b2876f67471f 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -20,7 +20,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
index 064761289a73..94ea35330eb5 100644
--- a/drivers/dma/bcm-sba-raid.c
+++ b/drivers/dma/bcm-sba-raid.c
@@ -35,7 +35,9 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 
diff --git a/drivers/dma/bestcomm/bestcomm.c b/drivers/dma/bestcomm/bestcomm.c
index eabbcfcaa7cb..80096f94032d 100644
--- a/drivers/dma/bestcomm/bestcomm.c
+++ b/drivers/dma/bestcomm/bestcomm.c
@@ -14,9 +14,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/dma/dma-jz4780.c b/drivers/dma/dma-jz4780.c
index 9c1a6e9a9c03..adbd47bd6adf 100644
--- a/drivers/dma/dma-jz4780.c
+++ b/drivers/dma/dma-jz4780.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c 
b/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c
index 796b6caf0bab..dd02f84e404d 100644
--- a/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c
+++ b/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c
@@ -21,7 +21,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/dma/dw/rzn1-dmamux.c b/drivers/dma/dw/rzn1-dmamux.c
index f9912c3dd4d7..4fb8508419db 100644
--- a/drivers/dma/dw/rzn1-dmamux.c
+++ b/drivers/dma/dw/rzn1-dmamux.c
@@ -5,8 +5,10 @@
  * Based on TI crossbar driver written by Peter Ujfalusi 
  */
 #include 
-#include 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
diff --git 

[PATCH v2] misc: Explicitly include correct DT includes

2023-07-18 Thread Rob Herring
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.

Acked-by: Andrew Donnellan  # cxl
Signed-off-by: Rob Herring 
---
v2:
- Fix double include of of.h
---
 drivers/misc/cxl/base.c| 1 +
 drivers/misc/fastrpc.c | 1 +
 drivers/misc/lis3lv02d/lis3lv02d.c | 2 +-
 drivers/misc/qcom-coincell.c   | 1 -
 drivers/misc/sram.c| 2 +-
 drivers/misc/vcpu_stall_detector.c | 1 -
 drivers/misc/xilinx_sdfec.c| 3 ++-
 drivers/misc/xilinx_tmr_inject.c   | 3 ++-
 drivers/misc/xilinx_tmr_manager.c  | 3 ++-
 9 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index cc0caf9192dc..b054562c046e 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "cxl.h"
 
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 9666d28037e1..1c7c0532da6f 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/misc/lis3lv02d/lis3lv02d.c 
b/drivers/misc/lis3lv02d/lis3lv02d.c
index 299d316f1bda..49868a45c0ad 100644
--- a/drivers/misc/lis3lv02d/lis3lv02d.c
+++ b/drivers/misc/lis3lv02d/lis3lv02d.c
@@ -26,7 +26,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include "lis3lv02d.h"
 
 #define DRIVER_NAME "lis3lv02d"
diff --git a/drivers/misc/qcom-coincell.c b/drivers/misc/qcom-coincell.c
index 54d4f6ee..3c57f7429147 100644
--- a/drivers/misc/qcom-coincell.c
+++ b/drivers/misc/qcom-coincell.c
@@ -8,7 +8,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 struct qcom_coincell {
diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
index 5757adf418b1..a88f92cf35be 100644
--- a/drivers/misc/sram.c
+++ b/drivers/misc/sram.c
@@ -10,8 +10,8 @@
 #include 
 #include 
 #include 
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/misc/vcpu_stall_detector.c 
b/drivers/misc/vcpu_stall_detector.c
index 53b5506080e1..6479c962da1a 100644
--- a/drivers/misc/vcpu_stall_detector.c
+++ b/drivers/misc/vcpu_stall_detector.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/misc/xilinx_sdfec.c b/drivers/misc/xilinx_sdfec.c
index 270ff4c5971a..29e9c380b643 100644
--- a/drivers/misc/xilinx_sdfec.c
+++ b/drivers/misc/xilinx_sdfec.c
@@ -15,7 +15,8 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/misc/xilinx_tmr_inject.c b/drivers/misc/xilinx_tmr_inject.c
index d96f6d7cd109..9fc5835bfebc 100644
--- a/drivers/misc/xilinx_tmr_inject.c
+++ b/drivers/misc/xilinx_tmr_inject.c
@@ -11,7 +11,8 @@
 
 #include 
 #include 
-#include 
+#include 
+#include 
 #include 
 
 /* TMR Inject Register offsets */
diff --git a/drivers/misc/xilinx_tmr_manager.c 
b/drivers/misc/xilinx_tmr_manager.c
index 0ef55e06d3a0..3e4e40c3766f 100644
--- a/drivers/misc/xilinx_tmr_manager.c
+++ b/drivers/misc/xilinx_tmr_manager.c
@@ -15,7 +15,8 @@
 
 #include 
 #include 
-#include 
+#include 
+#include 
 
 /* TMR Manager Register offsets */
 #define XTMR_MANAGER_CR_OFFSET 0x0
-- 
2.40.1



[PATCH v2] usb: Explicitly include correct DT includes

2023-07-18 Thread Rob Herring
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.

Acked-by: Herve Codina 
Signed-off-by: Rob Herring 
---
v2:
- Fix double include of of.h
---
 drivers/usb/cdns3/cdns3-gadget.c| 1 +
 drivers/usb/cdns3/cdns3-plat.c  | 1 +
 drivers/usb/cdns3/cdns3-ti.c| 1 +
 drivers/usb/cdns3/core.c| 1 +
 drivers/usb/chipidea/ci_hdrc_imx.c  | 1 +
 drivers/usb/chipidea/ci_hdrc_tegra.c| 3 ++-
 drivers/usb/chipidea/usbmisc_imx.c  | 3 ++-
 drivers/usb/common/common.c | 1 +
 drivers/usb/core/message.c  | 1 +
 drivers/usb/core/of.c   | 1 -
 drivers/usb/core/usb.c  | 1 +
 drivers/usb/dwc2/gadget.c   | 1 -
 drivers/usb/dwc2/platform.c | 2 +-
 drivers/usb/dwc3/dwc3-imx8mp.c  | 1 +
 drivers/usb/dwc3/dwc3-keystone.c| 1 +
 drivers/usb/gadget/udc/fsl_udc_core.c   | 1 -
 drivers/usb/gadget/udc/gr_udc.c | 5 ++---
 drivers/usb/gadget/udc/max3420_udc.c| 4 +---
 drivers/usb/gadget/udc/pxa27x_udc.c | 2 +-
 drivers/usb/gadget/udc/renesas_usb3.c   | 2 +-
 drivers/usb/gadget/udc/renesas_usbf.c   | 5 ++---
 drivers/usb/gadget/udc/tegra-xudc.c | 1 -
 drivers/usb/gadget/udc/udc-xilinx.c | 6 ++
 drivers/usb/host/ehci-fsl.c | 2 +-
 drivers/usb/host/ehci-orion.c   | 2 --
 drivers/usb/host/fhci-hcd.c | 3 ++-
 drivers/usb/host/fsl-mph-dr-of.c| 3 ++-
 drivers/usb/host/ohci-at91.c| 2 +-
 drivers/usb/host/ohci-da8xx.c   | 1 +
 drivers/usb/host/ohci-ppc-of.c  | 3 ++-
 drivers/usb/host/xhci-plat.c| 1 -
 drivers/usb/host/xhci-rcar.c| 1 -
 drivers/usb/host/xhci-tegra.c   | 2 +-
 drivers/usb/misc/usb251xb.c | 2 +-
 drivers/usb/mtu3/mtu3.h | 1 +
 drivers/usb/mtu3/mtu3_host.c| 1 +
 drivers/usb/musb/jz4740.c   | 2 +-
 drivers/usb/musb/mediatek.c | 1 +
 drivers/usb/musb/mpfs.c | 1 +
 drivers/usb/musb/musb_dsps.c| 2 --
 drivers/usb/musb/sunxi.c| 1 -
 drivers/usb/phy/phy-mxs-usb.c   | 2 +-
 drivers/usb/phy/phy-tegra-usb.c | 2 +-
 drivers/usb/renesas_usbhs/common.c  | 2 +-
 drivers/usb/renesas_usbhs/rza.c | 2 +-
 drivers/usb/renesas_usbhs/rza2.c| 1 -
 drivers/usb/typec/tcpm/fusb302.c| 2 +-
 drivers/usb/typec/tcpm/qcom/qcom_pmic_typec.c   | 2 +-
 drivers/usb/typec/tcpm/qcom/qcom_pmic_typec_pdphy.c | 2 --
 drivers/usb/typec/tcpm/qcom/qcom_pmic_typec_port.c  | 1 -
 drivers/usb/typec/ucsi/ucsi_glink.c | 1 -
 51 files changed, 46 insertions(+), 48 deletions(-)

diff --git a/drivers/usb/cdns3/cdns3-gadget.c b/drivers/usb/cdns3/cdns3-gadget.c
index ea19253fd2d0..e6f6aeb7b5bb 100644
--- a/drivers/usb/cdns3/cdns3-gadget.c
+++ b/drivers/usb/cdns3/cdns3-gadget.c
@@ -61,6 +61,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "core.h"
 #include "gadget-export.h"
diff --git a/drivers/usb/cdns3/cdns3-plat.c b/drivers/usb/cdns3/cdns3-plat.c
index 884e2301237f..b15ff5bd91c2 100644
--- a/drivers/usb/cdns3/cdns3-plat.c
+++ b/drivers/usb/cdns3/cdns3-plat.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
diff --git a/drivers/usb/cdns3/cdns3-ti.c b/drivers/usb/cdns3/cdns3-ti.c
index 81b9132e3aaa..5945c4b1e11f 100644
--- a/drivers/usb/cdns3/cdns3-ti.c
+++ b/drivers/usb/cdns3/cdns3-ti.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* USB Wrapper register offsets */
 #define USBSS_PID  0x0
diff --git a/drivers/usb/cdns3/core.c b/drivers/usb/cdns3/core.c
index dbcdf3b24b47..baa154cee352 100644
--- a/drivers/usb/cdns3/core.c
+++ b/drivers/usb/cdns3/core.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c 
b/drivers/usb/chipidea/ci_hdrc_imx.c
index 

Re: [PATCH] powerpc/build: vdso linker warning for orphan sections

2023-07-18 Thread John Ogness
On 2023-07-18, Michael Ellerman  wrote:
>> ld: warning: discarding dynamic section .rela.opd
>>
>> and bisects to:
>>
>> 8ad57add77d3 ("powerpc/build: vdso linker warning for orphan sections")
>
> Can you test with a newer compiler/binutils?

Testing the Debian release cross compilers/binutils:

Debian 10 / gcc 8.3.0  / ld 2.31.1: generates the warning

Debian 11 / gcc 10.2.1 / ld 2.35.2: generates the warning

Debian 12 / gcc 12.2.0 / ld 2.40:   does _not_ generate the warning

I suppose moving to the newer toolchain is the workaround. Although it
is a bit unusual to require such a modern toolchain in order to build a
kernel without warnings.

John Ogness


Re: [PATCH] platforms: 52xx: Remove space after '(' and before ')'

2023-07-18 Thread Andy Shevchenko
On Tue, Jul 18, 2023 at 05:02:39PM +0800, hanyu...@208suo.com wrote:
> The patch fixes the following errors detected by checkpatch:
> 
> platforms/52xx/mpc52xx_pci.c:346:ERROR: space prohibited after that open
> parenthesis '('
> platforms/52xx/mpc52xx_pci.c:347:ERROR: space prohibited after that open
> parenthesis '('
> platforms/52xx/mpc52xx_pci.c:348:ERROR: space prohibited before that close
> parenthesis ')'

First of all, your patch is mangled and may not be applied.
Second, we usually don't do this kind of patches at all.
Besides the fact that we don't run checkpatch on the files
which are already in upstream (esp. so-o-o old as this one).

NAK.

...

> +if ((dev->vendor == PCI_VENDOR_ID_MOTOROLA) &&
> + (dev->device == PCI_DEVICE_ID_MOTOROLA_MPC5200
> +  || dev->device == PCI_DEVICE_ID_MOTOROLA_MPC5200B)) {

Also note, you can move this to use pci_match_id().
That kind of patch might be approved.

-- 
With Best Regards,
Andy Shevchenko




Re: Kernel Crash Dump (kdump) broken with 6.5

2023-07-18 Thread Michael Ellerman
Mahesh J Salgaonkar  writes:
> On 2023-07-17 20:15:53 Mon, Sachin Sant wrote:
>> Kdump seems to be broken with 6.5 for ppc64le.
>> 
>> [ 14.200412] systemd[1]: Starting dracut pre-pivot and cleanup hook...
>> [[0;32m OK [0m] Started dracut pre-pivot and cleanup hook.
>> Starting Kdump Vmcore Save Service...
>> [ 14.231669] systemd[1]: Started dracut pre-pivot and cleanup hook.
>> [ 14.231801] systemd[1]: Starting Kdump Vmcore Save Service...
>> [ 14.341035] kdump.sh[297]: kdump: saving to 
>> /sysroot//var/crash//127.0.0.1-2023-07-14-13:32:34/
>> [ 14.350053] EXT4-fs (sda2): re-mounted e971a335-1ef8-4295-ab4e-3940f28e53fc 
>> r/w. Quota mode: none.
>> [ 14.345979] kdump.sh[297]: kdump: saving vmcore-dmesg.txt to 
>> /sysroot//var/crash//127.0.0.1-2023-07-14-13:32:34/
>> [ 14.348742] kdump.sh[331]: Cannot open /proc/vmcore: No such file or 
>> directory
>> [ 14.348845] kdump.sh[297]: kdump: saving vmcore-dmesg.txt failed
>> [ 14.349014] kdump.sh[297]: kdump: saving vmcore
>> [ 14.443422] kdump.sh[332]: open_dump_memory: Can't open the dump 
>> memory(/proc/vmcore). No such file or directory
>> [ 14.456413] kdump.sh[332]: makedumpfile Failed.
>> [ 14.456662] kdump.sh[297]: kdump: saving vmcore failed, _exitcode:1
>> [ 14.456822] kdump.sh[297]: kdump: saving the /run/initramfs/kexec-dmesg.log 
>> to /sysroot//var/crash//127.0.0.1-2023-07-14-13:32:34/
>> [ 14.487002] kdump.sh[297]: kdump: saving vmcore failed
>> [[0;1;31mFAILED[0m] Failed to start Kdump Vmcore Save Service.
>
> Thanks Sachin for catching this.
>
>> 
>> 6.4 was good. Git bisect points to following patch
>> 
>> commit 606787fed7268feb256957872586370b56af697a
>> powerpc/64s: Remove support for ELFv1 little endian userspace
>> 
>> Reverting this patch allows a successful capture of vmcore.
>> 
>> Does this change require any corresponding change to kdump
>> and/or kexec tools?
>
> Need to investigate that. It looks like vmcore_elf64_check_arch()
> check from fs/proc/vmcore.c is failing after above commit.
>
> static int __init parse_crash_elf64_headers(void)
> {
> [...]
>
> /* Do some basic Verification. */
> if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
> (ehdr.e_type != ET_CORE) ||
> !vmcore_elf64_check_arch() ||
> [...]

Where vmcore_elf64_check_arch() calls elf_check_arch(), which was
modified by the commit, so that makes sense.

> It looks like ehdr->e_flags are not set properly while generating vmcore
> ELF header. I see that in kexec_file_load, ehdr->e_flags left set to 0
> irrespective of IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) is true or false.

Does initialising it in crash_prepare_elf64_headers() fix the issue?

cheers


[RFC PATCH 21/21] crypto: scompress - Drop the use of per-cpu scratch buffers

2023-07-18 Thread Ard Biesheuvel
The scomp to acomp adaptation layer allocates 256k of scratch buffers
per CPU in order to be able to present the input provided by the caller
via scatterlists as linear byte arrays to the underlying synchronous
compression drivers, most of which are thin wrappers around the various
compression algorithm library implementations we have in the kernel.

This sucks. With high core counts and SMT, this easily adds up to
multiple megabytes that are permanently tied up for this purpose, and
given that all acomp users pass either single pages or contiguous
buffers in lowmem, we can optimize for this pattern and just pass the
buffer directly if we can. This removes the need for scratch buffers,
and along with it, the arbitrary 128k upper bound on the input and
output size of the acomp API when the implementation happens to be scomp
based.

So add a scomp_map_sg() helper to try and obtain the virtual addresses
associated with the scatterlists, which is guaranteed to be successful
100% of the time given the existing users, which all fit the prerequisite
pattern. And as a fallback for other cases, use kvmalloc with GFP_KERNEL
to allocate buffers on the fly and free them again right after.

This puts the burden on future callers to either use a contiguous
buffer, or deal with the potentially blocking nature of GFP_KERNEL.
For IPcomp in particular, the only relevant compression algorithm is
'deflate' which is no longer implemented as an scomp, and so this change
will not affect it even if we decide to convert it to take advantage of
the ability to pass discontiguous scatterlists.

Signed-off-by: Ard Biesheuvel 
---
 crypto/scompress.c  | 159 ++--
 include/crypto/internal/scompress.h |   2 -
 2 files changed, 76 insertions(+), 85 deletions(-)

diff --git a/crypto/scompress.c b/crypto/scompress.c
index 3155cdce9116e092..1c050aa864bd604d 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -18,24 +18,11 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "compress.h"
 
-struct scomp_scratch {
-   spinlock_t  lock;
-   void*src;
-   void*dst;
-};
-
-static DEFINE_PER_CPU(struct scomp_scratch, scomp_scratch) = {
-   .lock = __SPIN_LOCK_UNLOCKED(scomp_scratch.lock),
-};
-
 static const struct crypto_type crypto_scomp_type;
-static int scomp_scratch_users;
-static DEFINE_MUTEX(scomp_lock);
 
 static int __maybe_unused crypto_scomp_report(
struct sk_buff *skb, struct crypto_alg *alg)
@@ -58,56 +45,45 @@ static void crypto_scomp_show(struct seq_file *m, struct 
crypto_alg *alg)
seq_puts(m, "type : scomp\n");
 }
 
-static void crypto_scomp_free_scratches(void)
-{
-   struct scomp_scratch *scratch;
-   int i;
-
-   for_each_possible_cpu(i) {
-   scratch = per_cpu_ptr(&scomp_scratch, i);
-
-   vfree(scratch->src);
-   vfree(scratch->dst);
-   scratch->src = NULL;
-   scratch->dst = NULL;
-   }
-}
-
-static int crypto_scomp_alloc_scratches(void)
-{
-   struct scomp_scratch *scratch;
-   int i;
-
-   for_each_possible_cpu(i) {
-   void *mem;
-
-   scratch = per_cpu_ptr(&scomp_scratch, i);
-
-   mem = vmalloc_node(SCOMP_SCRATCH_SIZE, cpu_to_node(i));
-   if (!mem)
-   goto error;
-   scratch->src = mem;
-   mem = vmalloc_node(SCOMP_SCRATCH_SIZE, cpu_to_node(i));
-   if (!mem)
-   goto error;
-   scratch->dst = mem;
-   }
-   return 0;
-error:
-   crypto_scomp_free_scratches();
-   return -ENOMEM;
-}
-
 static int crypto_scomp_init_tfm(struct crypto_tfm *tfm)
 {
-   int ret = 0;
+   return 0;
+}
 
-   mutex_lock(&scomp_lock);
-   if (!scomp_scratch_users++)
-   ret = crypto_scomp_alloc_scratches();
-   mutex_unlock(&scomp_lock);
+/**
+ * scomp_map_sg - Return virtual address of memory described by a scatterlist
+ *
+ * @sg:The address of the scatterlist in memory
+ * @len:   The length of the buffer described by the scatterlist
+ *
+ * If the memory region described by scatterlist @sg consists of @len
+ * contiguous bytes in memory and is accessible via the linear mapping or via a
+ * single kmap(), return its virtual address.  Otherwise, return NULL.
+ */
+static void *scomp_map_sg(struct scatterlist *sg, unsigned int len)
+{
+   struct page *page;
+   unsigned int offset;
 
-   return ret;
+   while (sg_is_chain(sg))
+   sg = sg_next(sg);
+
+   if (!sg || sg_nents_for_len(sg, len) != 1)
+   return NULL;
+
+   page   = sg_page(sg) + (sg->offset >> PAGE_SHIFT);
+   offset = offset_in_page(sg->offset);
+
+   if (PageHighMem(page) && (offset + sg->length) > PAGE_SIZE)
+   return NULL;
+
+   return kmap_local_page(page) + offset;
+}
+
+static void scomp_unmap_sg(const void *addr)

[RFC PATCH 20/21] crypto: deflate - implement acomp API directly

2023-07-18 Thread Ard Biesheuvel
Drop the scomp implementation of deflate, which can only operate on
contiguous in- and output buffer, and replace it with an implementation
of acomp directly. This implementation walks the scatterlists, removing
the need for the caller to use scratch buffers to present the input and
output in a contiguous manner.

This is intended for use by the IPcomp code, which currently needs to
'linearize' SKBs in order for the compression to be able to consume the
input in a single chunk.

Signed-off-by: Ard Biesheuvel 
---
 crypto/deflate.c | 315 +++-
 include/crypto/scatterwalk.h |   2 +-
 2 files changed, 113 insertions(+), 204 deletions(-)

diff --git a/crypto/deflate.c b/crypto/deflate.c
index 0955040ca9e64146..112683473df2b588 100644
--- a/crypto/deflate.c
+++ b/crypto/deflate.c
@@ -6,246 +6,154 @@
  * by IPCOMP (RFC 3173 & RFC 2394).
  *
  * Copyright (c) 2003 James Morris 
- *
- * FIXME: deflate transforms will require up to a total of about 436k of kernel
- * memory on i386 (390k for compression, the rest for decompression), as the
- * current zlib kernel code uses a worst case pre-allocation system by default.
- * This needs to be fixed so that the amount of memory required is properly
- * related to the  winbits and memlevel parameters.
- *
- * The default winbits of 11 should suit most packets, and it may be something
- * to configure on a per-tfm basis in the future.
- *
- * Currently, compression history is not maintained between tfm calls, as
- * it is not needed for IPCOMP and keeps the code simpler.  It can be
- * implemented if someone wants it.
+ * Copyright (c) 2023 Google, LLC. 
  */
 #include 
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
 #include 
-#include 
+#include 
+#include 
+#include 
 
 #define DEFLATE_DEF_LEVEL  Z_DEFAULT_COMPRESSION
 #define DEFLATE_DEF_WINBITS11
 #define DEFLATE_DEF_MEMLEVEL   MAX_MEM_LEVEL
 
-struct deflate_ctx {
-   struct z_stream_s comp_stream;
-   struct z_stream_s decomp_stream;
+struct deflate_req_ctx {
+   struct z_stream_s stream;
+   u8 workspace[];
 };
 
-static int deflate_comp_init(struct deflate_ctx *ctx)
+static int deflate_process(struct acomp_req *req, struct z_stream_s *stream,
+  int (*process)(struct z_stream_s *, int))
 {
-   int ret = 0;
-   struct z_stream_s *stream = &ctx->comp_stream;
+   unsigned int slen = req->slen;
+   unsigned int dlen = req->dlen;
+   struct scatter_walk src, dst;
+   unsigned int scur, dcur;
+   int ret;
 
-   stream->workspace = vzalloc(zlib_deflate_workspacesize(
-   -DEFLATE_DEF_WINBITS, DEFLATE_DEF_MEMLEVEL));
-   if (!stream->workspace) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   stream->avail_in = stream->avail_out = 0;
+
+   scatterwalk_start(&src, req->src);
+   scatterwalk_start(&dst, req->dst);
+
+   scur = dcur = 0;
+
+   do {
+   if (stream->avail_in == 0) {
+   if (scur) {
+   slen -= scur;
+
+   scatterwalk_unmap(stream->next_in - scur);
+   scatterwalk_advance(&src, scur);
+   scatterwalk_done(&src, 0, slen);
+   }
+
+   scur = scatterwalk_clamp(&src, slen);
+   if (scur) {
+   stream->next_in = scatterwalk_map(&src);
+   stream->avail_in = scur;
+   }
+   }
+
+   if (stream->avail_out == 0) {
+   if (dcur) {
+   dlen -= dcur;
+
+   scatterwalk_unmap(stream->next_out - dcur);
+   scatterwalk_advance(&dst, dcur);
+   scatterwalk_done(&dst, 1, dlen);
+   }
+
+   dcur = scatterwalk_clamp(&dst, dlen);
+   if (!dcur)
+   break;
+
+   stream->next_out = scatterwalk_map(&dst);
+   stream->avail_out = dcur;
+   }
+
+   ret = process(stream, (slen == scur) ? Z_FINISH : Z_SYNC_FLUSH);
+   } while (ret == Z_OK);
+
+   if (scur)
+   scatterwalk_unmap(stream->next_in - scur);
+   if (dcur)
+   scatterwalk_unmap(stream->next_out - dcur);
+
+   if (ret != Z_STREAM_END)
+   return -EINVAL;
+
+   req->dlen = stream->total_out;
+   return 0;
+}
+
+static int deflate_compress(struct acomp_req *req)
+{
+   struct deflate_req_ctx *ctx = acomp_request_ctx(req);
+   struct z_stream_s *stream = &ctx->stream;
+   int ret;
+
+   if (!req->src || !req->slen || !req->dst || !req->dlen)
+   return -EINVAL;
+
+   stream->workspace = ctx->workspace;
ret = 

[RFC PATCH 19/21] crypto: remove obsolete 'comp' compression API

2023-07-18 Thread Ard Biesheuvel
The 'comp' compression API has been superseded by the acomp API, which
is a bit more cumbersome to use, but ultimately more flexible when it
comes to hardware implementations.

Now that all the users and implementations have been removed, let's
remove the core plumbing of the 'comp' API as well.

Signed-off-by: Ard Biesheuvel 
---
 Documentation/crypto/architecture.rst |   2 -
 crypto/Makefile   |   2 +-
 crypto/api.c  |   4 -
 crypto/compress.c |  32 -
 crypto/crypto_user_base.c |  16 ---
 crypto/crypto_user_stat.c |   4 -
 crypto/proc.c |   3 -
 crypto/testmgr.c  | 144 ++--
 include/linux/crypto.h|  49 +--
 9 files changed, 12 insertions(+), 244 deletions(-)

diff --git a/Documentation/crypto/architecture.rst 
b/Documentation/crypto/architecture.rst
index 646c3380a7edc4c6..ec7436aade15c2e6 100644
--- a/Documentation/crypto/architecture.rst
+++ b/Documentation/crypto/architecture.rst
@@ -196,8 +196,6 @@ the aforementioned cipher types:
 
 -  CRYPTO_ALG_TYPE_CIPHER Single block cipher
 
--  CRYPTO_ALG_TYPE_COMPRESS Compression
-
 -  CRYPTO_ALG_TYPE_AEAD Authenticated Encryption with Associated Data
(MAC)
 
diff --git a/crypto/Makefile b/crypto/Makefile
index 953a7e105e58c837..5775440c62e09eac 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -4,7 +4,7 @@
 #
 
 obj-$(CONFIG_CRYPTO) += crypto.o
-crypto-y := api.o cipher.o compress.o
+crypto-y := api.o cipher.o
 
 obj-$(CONFIG_CRYPTO_ENGINE) += crypto_engine.o
 obj-$(CONFIG_CRYPTO_FIPS) += fips.o
diff --git a/crypto/api.c b/crypto/api.c
index b9cc0c906efe0706..23d691a70bc3fb00 100644
--- a/crypto/api.c
+++ b/crypto/api.c
@@ -369,10 +369,6 @@ static unsigned int crypto_ctxsize(struct crypto_alg *alg, 
u32 type, u32 mask)
case CRYPTO_ALG_TYPE_CIPHER:
len += crypto_cipher_ctxsize(alg);
break;
-
-   case CRYPTO_ALG_TYPE_COMPRESS:
-   len += crypto_compress_ctxsize(alg);
-   break;
}
 
return len;
diff --git a/crypto/compress.c b/crypto/compress.c
deleted file mode 100644
index 9048fe390c463069..
--- a/crypto/compress.c
+++ /dev/null
@@ -1,32 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * Cryptographic API.
- *
- * Compression operations.
- *
- * Copyright (c) 2002 James Morris 
- */
-#include 
-#include "internal.h"
-
-int crypto_comp_compress(struct crypto_comp *comp,
-const u8 *src, unsigned int slen,
-u8 *dst, unsigned int *dlen)
-{
-   struct crypto_tfm *tfm = crypto_comp_tfm(comp);
-
-   return tfm->__crt_alg->cra_compress.coa_compress(tfm, src, slen, dst,
-dlen);
-}
-EXPORT_SYMBOL_GPL(crypto_comp_compress);
-
-int crypto_comp_decompress(struct crypto_comp *comp,
-  const u8 *src, unsigned int slen,
-  u8 *dst, unsigned int *dlen)
-{
-   struct crypto_tfm *tfm = crypto_comp_tfm(comp);
-
-   return tfm->__crt_alg->cra_compress.coa_decompress(tfm, src, slen, dst,
-  dlen);
-}
-EXPORT_SYMBOL_GPL(crypto_comp_decompress);
diff --git a/crypto/crypto_user_base.c b/crypto/crypto_user_base.c
index 3fa20f12989f7ef2..c27484b0042e6bd8 100644
--- a/crypto/crypto_user_base.c
+++ b/crypto/crypto_user_base.c
@@ -85,17 +85,6 @@ static int crypto_report_cipher(struct sk_buff *skb, struct 
crypto_alg *alg)
   sizeof(rcipher), &rcipher);
 }
 
-static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg)
-{
-   struct crypto_report_comp rcomp;
-
-   memset(&rcomp, 0, sizeof(rcomp));
-
-   strscpy(rcomp.type, "compression", sizeof(rcomp.type));
-
-   return nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS, sizeof(rcomp), &rcomp);
-}
-
 static int crypto_report_one(struct crypto_alg *alg,
 struct crypto_user_alg *ualg, struct sk_buff *skb)
 {
@@ -136,11 +125,6 @@ static int crypto_report_one(struct crypto_alg *alg,
if (crypto_report_cipher(skb, alg))
goto nla_put_failure;
 
-   break;
-   case CRYPTO_ALG_TYPE_COMPRESS:
-   if (crypto_report_comp(skb, alg))
-   goto nla_put_failure;
-
break;
}
 
diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c
index d4f3d39b51376973..d3133eda2f528d17 100644
--- a/crypto/crypto_user_stat.c
+++ b/crypto/crypto_user_stat.c
@@ -86,10 +86,6 @@ static int crypto_reportstat_one(struct crypto_alg *alg,
if (crypto_report_cipher(skb, alg))
goto nla_put_failure;
break;
-   case CRYPTO_ALG_TYPE_COMPRESS:
-   if (crypto_report_comp(skb, alg))
-   goto nla_put_failure;
-   

[RFC PATCH 18/21] crypto: compress_null - drop obsolete 'comp' implementation

2023-07-18 Thread Ard Biesheuvel
The 'comp' API is obsolete and will be removed, so remove this comp
implementation.

Signed-off-by: Ard Biesheuvel 
---
 crypto/crypto_null.c | 31 
 crypto/testmgr.c |  3 --
 2 files changed, 5 insertions(+), 29 deletions(-)

diff --git a/crypto/crypto_null.c b/crypto/crypto_null.c
index 5b84b0f7cc178fcd..75e73b1d6df01cc6 100644
--- a/crypto/crypto_null.c
+++ b/crypto/crypto_null.c
@@ -24,16 +24,6 @@ static DEFINE_MUTEX(crypto_default_null_skcipher_lock);
 static struct crypto_sync_skcipher *crypto_default_null_skcipher;
 static int crypto_default_null_skcipher_refcnt;
 
-static int null_compress(struct crypto_tfm *tfm, const u8 *src,
-unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-   if (slen > *dlen)
-   return -EINVAL;
-   memcpy(dst, src, slen);
-   *dlen = slen;
-   return 0;
-}
-
 static int null_init(struct shash_desc *desc)
 {
return 0;
@@ -121,7 +111,7 @@ static struct skcipher_alg skcipher_null = {
.decrypt=   null_skcipher_crypt,
 };
 
-static struct crypto_alg null_algs[] = { {
+static struct crypto_alg cipher_null = {
.cra_name   =   "cipher_null",
.cra_driver_name=   "cipher_null-generic",
.cra_flags  =   CRYPTO_ALG_TYPE_CIPHER,
@@ -134,19 +124,8 @@ static struct crypto_alg null_algs[] = { {
.cia_setkey =   null_setkey,
.cia_encrypt=   null_crypt,
.cia_decrypt=   null_crypt } }
-}, {
-   .cra_name   =   "compress_null",
-   .cra_driver_name=   "compress_null-generic",
-   .cra_flags  =   CRYPTO_ALG_TYPE_COMPRESS,
-   .cra_blocksize  =   NULL_BLOCK_SIZE,
-   .cra_ctxsize=   0,
-   .cra_module =   THIS_MODULE,
-   .cra_u  =   { .compress = {
-   .coa_compress   =   null_compress,
-   .coa_decompress =   null_compress } }
-} };
+};
 
-MODULE_ALIAS_CRYPTO("compress_null");
 MODULE_ALIAS_CRYPTO("digest_null");
 MODULE_ALIAS_CRYPTO("cipher_null");
 
@@ -189,7 +168,7 @@ static int __init crypto_null_mod_init(void)
 {
int ret = 0;
 
-   ret = crypto_register_algs(null_algs, ARRAY_SIZE(null_algs));
+   ret = crypto_register_alg(&cipher_null);
if (ret < 0)
goto out;
 
@@ -206,14 +185,14 @@ static int __init crypto_null_mod_init(void)
 out_unregister_shash:
crypto_unregister_shash(&digest_null);
 out_unregister_algs:
-   crypto_unregister_algs(null_algs, ARRAY_SIZE(null_algs));
+   crypto_unregister_alg(&cipher_null);
 out:
return ret;
 }
 
 static void __exit crypto_null_mod_fini(void)
 {
-   crypto_unregister_algs(null_algs, ARRAY_SIZE(null_algs));
+   crypto_unregister_alg(&cipher_null);
crypto_unregister_shash(&digest_null);
crypto_unregister_skcipher(&skcipher_null);
 }
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 4971351f55dbabb9..e4b6d67233763193 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -4633,9 +4633,6 @@ static const struct alg_test_desc alg_test_descs[] = {
.suite = {
.hash = __VECS(sm4_cmac128_tv_template)
}
-   }, {
-   .alg = "compress_null",
-   .test = alg_test_null,
}, {
.alg = "crc32",
.test = alg_test_hash,
-- 
2.39.2



[RFC PATCH 17/21] crypto: cavium/zip - drop obsolete 'comp' implementation

2023-07-18 Thread Ard Biesheuvel
The 'comp' API is obsolete and will be removed, so remove this comp
implementation.

Signed-off-by: Ard Biesheuvel 
---
 drivers/crypto/cavium/zip/zip_crypto.c | 40 
 drivers/crypto/cavium/zip/zip_crypto.h | 10 
 drivers/crypto/cavium/zip/zip_main.c   | 50 +---
 3 files changed, 1 insertion(+), 99 deletions(-)

diff --git a/drivers/crypto/cavium/zip/zip_crypto.c 
b/drivers/crypto/cavium/zip/zip_crypto.c
index 1046a746d36f551c..5edad3b1d1dc8398 100644
--- a/drivers/crypto/cavium/zip/zip_crypto.c
+++ b/drivers/crypto/cavium/zip/zip_crypto.c
@@ -195,46 +195,6 @@ static int zip_decompress(const u8 *src, unsigned int slen,
return ret;
 }
 
-/* Legacy Compress framework start */
-int zip_alloc_comp_ctx_deflate(struct crypto_tfm *tfm)
-{
-   struct zip_kernel_ctx *zip_ctx = crypto_tfm_ctx(tfm);
-
-   return zip_ctx_init(zip_ctx, 0);
-}
-
-int zip_alloc_comp_ctx_lzs(struct crypto_tfm *tfm)
-{
-   struct zip_kernel_ctx *zip_ctx = crypto_tfm_ctx(tfm);
-
-   return zip_ctx_init(zip_ctx, 1);
-}
-
-void zip_free_comp_ctx(struct crypto_tfm *tfm)
-{
-   struct zip_kernel_ctx *zip_ctx = crypto_tfm_ctx(tfm);
-
-   zip_ctx_exit(zip_ctx);
-}
-
-int  zip_comp_compress(struct crypto_tfm *tfm,
-  const u8 *src, unsigned int slen,
-  u8 *dst, unsigned int *dlen)
-{
-   struct zip_kernel_ctx *zip_ctx = crypto_tfm_ctx(tfm);
-
-   return zip_compress(src, slen, dst, dlen, zip_ctx);
-}
-
-int  zip_comp_decompress(struct crypto_tfm *tfm,
-const u8 *src, unsigned int slen,
-u8 *dst, unsigned int *dlen)
-{
-   struct zip_kernel_ctx *zip_ctx = crypto_tfm_ctx(tfm);
-
-   return zip_decompress(src, slen, dst, dlen, zip_ctx);
-} /* Legacy compress framework end */
-
 /* SCOMP framework start */
 void *zip_alloc_scomp_ctx_deflate(struct crypto_scomp *tfm)
 {
diff --git a/drivers/crypto/cavium/zip/zip_crypto.h 
b/drivers/crypto/cavium/zip/zip_crypto.h
index b59ddfcacd34447e..a1ae3825fb65c3b6 100644
--- a/drivers/crypto/cavium/zip/zip_crypto.h
+++ b/drivers/crypto/cavium/zip/zip_crypto.h
@@ -57,16 +57,6 @@ struct zip_kernel_ctx {
struct zip_operation zip_decomp;
 };
 
-int  zip_alloc_comp_ctx_deflate(struct crypto_tfm *tfm);
-int  zip_alloc_comp_ctx_lzs(struct crypto_tfm *tfm);
-void zip_free_comp_ctx(struct crypto_tfm *tfm);
-int  zip_comp_compress(struct crypto_tfm *tfm,
-  const u8 *src, unsigned int slen,
-  u8 *dst, unsigned int *dlen);
-int  zip_comp_decompress(struct crypto_tfm *tfm,
-const u8 *src, unsigned int slen,
-u8 *dst, unsigned int *dlen);
-
 void *zip_alloc_scomp_ctx_deflate(struct crypto_scomp *tfm);
 void *zip_alloc_scomp_ctx_lzs(struct crypto_scomp *tfm);
 void  zip_free_scomp_ctx(struct crypto_scomp *tfm, void *zip_ctx);
diff --git a/drivers/crypto/cavium/zip/zip_main.c 
b/drivers/crypto/cavium/zip/zip_main.c
index dc5b7bf7e1fd9867..abd58de4343ddd8e 100644
--- a/drivers/crypto/cavium/zip/zip_main.c
+++ b/drivers/crypto/cavium/zip/zip_main.c
@@ -371,36 +371,6 @@ static struct pci_driver zip_driver = {
 
 /* Kernel Crypto Subsystem Interface */
 
-static struct crypto_alg zip_comp_deflate = {
-   .cra_name   = "deflate",
-   .cra_driver_name= "deflate-cavium",
-   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
-   .cra_ctxsize= sizeof(struct zip_kernel_ctx),
-   .cra_priority   = 300,
-   .cra_module = THIS_MODULE,
-   .cra_init   = zip_alloc_comp_ctx_deflate,
-   .cra_exit   = zip_free_comp_ctx,
-   .cra_u  = { .compress = {
-   .coa_compress   = zip_comp_compress,
-   .coa_decompress = zip_comp_decompress
-} }
-};
-
-static struct crypto_alg zip_comp_lzs = {
-   .cra_name   = "lzs",
-   .cra_driver_name= "lzs-cavium",
-   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
-   .cra_ctxsize= sizeof(struct zip_kernel_ctx),
-   .cra_priority   = 300,
-   .cra_module = THIS_MODULE,
-   .cra_init   = zip_alloc_comp_ctx_lzs,
-   .cra_exit   = zip_free_comp_ctx,
-   .cra_u  = { .compress = {
-   .coa_compress   = zip_comp_compress,
-   .coa_decompress = zip_comp_decompress
-} }
-};
-
 static struct scomp_alg zip_scomp_deflate = {
.alloc_ctx  = zip_alloc_scomp_ctx_deflate,
.free_ctx   = zip_free_scomp_ctx,
@@ -431,22 +401,10 @@ static int zip_register_compression_device(void)
 {
int ret;
 
-   ret = crypto_register_alg(&zip_comp_deflate);
-   if (ret < 0) {
-   zip_err("Deflate algorithm registration failed\n");
-   return ret;
-   }
-
-   ret = 

[RFC PATCH 16/21] crypto: zstd - drop obsolete 'comp' implementation

2023-07-18 Thread Ard Biesheuvel
The 'comp' API is obsolete and will be removed, so remove this comp
implementation.

Signed-off-by: Ard Biesheuvel 
---
 crypto/zstd.c | 56 +---
 1 file changed, 1 insertion(+), 55 deletions(-)

diff --git a/crypto/zstd.c b/crypto/zstd.c
index 154a969c83a82277..c6e6f135c5812c9c 100644
--- a/crypto/zstd.c
+++ b/crypto/zstd.c
@@ -121,13 +121,6 @@ static void *zstd_alloc_ctx(struct crypto_scomp *tfm)
return ctx;
 }
 
-static int zstd_init(struct crypto_tfm *tfm)
-{
-   struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   return __zstd_init(ctx);
-}
-
 static void __zstd_exit(void *ctx)
 {
zstd_comp_exit(ctx);
@@ -140,13 +133,6 @@ static void zstd_free_ctx(struct crypto_scomp *tfm, void 
*ctx)
kfree_sensitive(ctx);
 }
 
-static void zstd_exit(struct crypto_tfm *tfm)
-{
-   struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   __zstd_exit(ctx);
-}
-
 static int __zstd_compress(const u8 *src, unsigned int slen,
   u8 *dst, unsigned int *dlen, void *ctx)
 {
@@ -161,14 +147,6 @@ static int __zstd_compress(const u8 *src, unsigned int 
slen,
return 0;
 }
 
-static int zstd_compress(struct crypto_tfm *tfm, const u8 *src,
-unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-   struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   return __zstd_compress(src, slen, dst, dlen, ctx);
-}
-
 static int zstd_scompress(struct crypto_scomp *tfm, const u8 *src,
  unsigned int slen, u8 *dst, unsigned int *dlen,
  void *ctx)
@@ -189,14 +167,6 @@ static int __zstd_decompress(const u8 *src, unsigned int 
slen,
return 0;
 }
 
-static int zstd_decompress(struct crypto_tfm *tfm, const u8 *src,
-  unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-   struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   return __zstd_decompress(src, slen, dst, dlen, ctx);
-}
-
 static int zstd_sdecompress(struct crypto_scomp *tfm, const u8 *src,
unsigned int slen, u8 *dst, unsigned int *dlen,
void *ctx)
@@ -204,19 +174,6 @@ static int zstd_sdecompress(struct crypto_scomp *tfm, 
const u8 *src,
return __zstd_decompress(src, slen, dst, dlen, ctx);
 }
 
-static struct crypto_alg alg = {
-   .cra_name   = "zstd",
-   .cra_driver_name= "zstd-generic",
-   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
-   .cra_ctxsize= sizeof(struct zstd_ctx),
-   .cra_module = THIS_MODULE,
-   .cra_init   = zstd_init,
-   .cra_exit   = zstd_exit,
-   .cra_u  = { .compress = {
-   .coa_compress   = zstd_compress,
-   .coa_decompress = zstd_decompress } }
-};
-
 static struct scomp_alg scomp = {
.alloc_ctx  = zstd_alloc_ctx,
.free_ctx   = zstd_free_ctx,
@@ -231,22 +188,11 @@ static struct scomp_alg scomp = {
 
 static int __init zstd_mod_init(void)
 {
-   int ret;
-
-   ret = crypto_register_alg(&alg);
-   if (ret)
-   return ret;
-
-   ret = crypto_register_scomp(&scomp);
-   if (ret)
-   crypto_unregister_alg(&alg);
-
-   return ret;
+   return crypto_register_scomp(&scomp);
 }
 
 static void __exit zstd_mod_fini(void)
 {
-   crypto_unregister_alg(&alg);
crypto_unregister_scomp(&scomp);
 }
 
-- 
2.39.2



[RFC PATCH 15/21] crypto: lzo - drop obsolete 'comp' implementation

2023-07-18 Thread Ard Biesheuvel
The 'comp' API is obsolete and will be removed, so remove this comp
implementation.

Signed-off-by: Ard Biesheuvel 
---
 crypto/lzo.c | 60 +---
 1 file changed, 1 insertion(+), 59 deletions(-)

diff --git a/crypto/lzo.c b/crypto/lzo.c
index ebda132dd22bf543..52558f9d41f3dcea 100644
--- a/crypto/lzo.c
+++ b/crypto/lzo.c
@@ -26,29 +26,11 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
return ctx;
 }
 
-static int lzo_init(struct crypto_tfm *tfm)
-{
-   struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   ctx->lzo_comp_mem = lzo_alloc_ctx(NULL);
-   if (IS_ERR(ctx->lzo_comp_mem))
-   return -ENOMEM;
-
-   return 0;
-}
-
 static void lzo_free_ctx(struct crypto_scomp *tfm, void *ctx)
 {
kvfree(ctx);
 }
 
-static void lzo_exit(struct crypto_tfm *tfm)
-{
-   struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   lzo_free_ctx(NULL, ctx->lzo_comp_mem);
-}
-
 static int __lzo_compress(const u8 *src, unsigned int slen,
  u8 *dst, unsigned int *dlen, void *ctx)
 {
@@ -64,14 +46,6 @@ static int __lzo_compress(const u8 *src, unsigned int slen,
return 0;
 }
 
-static int lzo_compress(struct crypto_tfm *tfm, const u8 *src,
-   unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-   struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   return __lzo_compress(src, slen, dst, dlen, ctx->lzo_comp_mem);
-}
-
 static int lzo_scompress(struct crypto_scomp *tfm, const u8 *src,
 unsigned int slen, u8 *dst, unsigned int *dlen,
 void *ctx)
@@ -94,12 +68,6 @@ static int __lzo_decompress(const u8 *src, unsigned int slen,
return 0;
 }
 
-static int lzo_decompress(struct crypto_tfm *tfm, const u8 *src,
- unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-   return __lzo_decompress(src, slen, dst, dlen);
-}
-
 static int lzo_sdecompress(struct crypto_scomp *tfm, const u8 *src,
   unsigned int slen, u8 *dst, unsigned int *dlen,
   void *ctx)
@@ -107,19 +75,6 @@ static int lzo_sdecompress(struct crypto_scomp *tfm, const 
u8 *src,
return __lzo_decompress(src, slen, dst, dlen);
 }
 
-static struct crypto_alg alg = {
-   .cra_name   = "lzo",
-   .cra_driver_name= "lzo-generic",
-   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
-   .cra_ctxsize= sizeof(struct lzo_ctx),
-   .cra_module = THIS_MODULE,
-   .cra_init   = lzo_init,
-   .cra_exit   = lzo_exit,
-   .cra_u  = { .compress = {
-   .coa_compress   = lzo_compress,
-   .coa_decompress = lzo_decompress } }
-};
-
 static struct scomp_alg scomp = {
.alloc_ctx  = lzo_alloc_ctx,
.free_ctx   = lzo_free_ctx,
@@ -134,24 +89,11 @@ static struct scomp_alg scomp = {
 
 static int __init lzo_mod_init(void)
 {
-   int ret;
-
-   ret = crypto_register_alg(&alg);
-   if (ret)
-   return ret;
-
-   ret = crypto_register_scomp(&scomp);
-   if (ret) {
-   crypto_unregister_alg(&alg);
-   return ret;
-   }
-
-   return ret;
+   return crypto_register_scomp(&scomp);
 }
 
 static void __exit lzo_mod_fini(void)
 {
-   crypto_unregister_alg(&alg);
crypto_unregister_scomp(&scomp);
 }
 
-- 
2.39.2



[RFC PATCH 14/21] crypto: lzo-rle - drop obsolete 'comp' implementation

2023-07-18 Thread Ard Biesheuvel
The 'comp' API is obsolete and will be removed, so remove this comp
implementation.

Signed-off-by: Ard Biesheuvel 
---
 crypto/lzo-rle.c | 60 +---
 1 file changed, 1 insertion(+), 59 deletions(-)

diff --git a/crypto/lzo-rle.c b/crypto/lzo-rle.c
index 0631d975bfac1129..658d6aa46fe21e19 100644
--- a/crypto/lzo-rle.c
+++ b/crypto/lzo-rle.c
@@ -26,29 +26,11 @@ static void *lzorle_alloc_ctx(struct crypto_scomp *tfm)
return ctx;
 }
 
-static int lzorle_init(struct crypto_tfm *tfm)
-{
-   struct lzorle_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   ctx->lzorle_comp_mem = lzorle_alloc_ctx(NULL);
-   if (IS_ERR(ctx->lzorle_comp_mem))
-   return -ENOMEM;
-
-   return 0;
-}
-
 static void lzorle_free_ctx(struct crypto_scomp *tfm, void *ctx)
 {
kvfree(ctx);
 }
 
-static void lzorle_exit(struct crypto_tfm *tfm)
-{
-   struct lzorle_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   lzorle_free_ctx(NULL, ctx->lzorle_comp_mem);
-}
-
 static int __lzorle_compress(const u8 *src, unsigned int slen,
  u8 *dst, unsigned int *dlen, void *ctx)
 {
@@ -64,14 +46,6 @@ static int __lzorle_compress(const u8 *src, unsigned int 
slen,
return 0;
 }
 
-static int lzorle_compress(struct crypto_tfm *tfm, const u8 *src,
-   unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-   struct lzorle_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   return __lzorle_compress(src, slen, dst, dlen, ctx->lzorle_comp_mem);
-}
-
 static int lzorle_scompress(struct crypto_scomp *tfm, const u8 *src,
 unsigned int slen, u8 *dst, unsigned int *dlen,
 void *ctx)
@@ -94,12 +68,6 @@ static int __lzorle_decompress(const u8 *src, unsigned int 
slen,
return 0;
 }
 
-static int lzorle_decompress(struct crypto_tfm *tfm, const u8 *src,
- unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-   return __lzorle_decompress(src, slen, dst, dlen);
-}
-
 static int lzorle_sdecompress(struct crypto_scomp *tfm, const u8 *src,
   unsigned int slen, u8 *dst, unsigned int *dlen,
   void *ctx)
@@ -107,19 +75,6 @@ static int lzorle_sdecompress(struct crypto_scomp *tfm, 
const u8 *src,
return __lzorle_decompress(src, slen, dst, dlen);
 }
 
-static struct crypto_alg alg = {
-   .cra_name   = "lzo-rle",
-   .cra_driver_name= "lzo-rle-generic",
-   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
-   .cra_ctxsize= sizeof(struct lzorle_ctx),
-   .cra_module = THIS_MODULE,
-   .cra_init   = lzorle_init,
-   .cra_exit   = lzorle_exit,
-   .cra_u  = { .compress = {
-   .coa_compress   = lzorle_compress,
-   .coa_decompress = lzorle_decompress } }
-};
-
 static struct scomp_alg scomp = {
.alloc_ctx  = lzorle_alloc_ctx,
.free_ctx   = lzorle_free_ctx,
@@ -134,24 +89,11 @@ static struct scomp_alg scomp = {
 
 static int __init lzorle_mod_init(void)
 {
-   int ret;
-
-   ret = crypto_register_alg(&alg);
-   if (ret)
-   return ret;
-
-   ret = crypto_register_scomp(&scomp);
-   if (ret) {
-   crypto_unregister_alg(&alg);
-   return ret;
-   }
-
-   return ret;
+   return crypto_register_scomp(&scomp);
 }
 
 static void __exit lzorle_mod_fini(void)
 {
-   crypto_unregister_alg(&alg);
crypto_unregister_scomp(&scomp);
 }
 
-- 
2.39.2



[RFC PATCH 13/21] crypto: lz4hc - drop obsolete 'comp' implementation

2023-07-18 Thread Ard Biesheuvel
The 'comp' API is obsolete and will be removed, so remove this comp
implementation.

Signed-off-by: Ard Biesheuvel 
---
 crypto/lz4hc.c | 63 +---
 1 file changed, 1 insertion(+), 62 deletions(-)

diff --git a/crypto/lz4hc.c b/crypto/lz4hc.c
index d7cc94aa2fcf42fa..5d6b13319f5e7683 100644
--- a/crypto/lz4hc.c
+++ b/crypto/lz4hc.c
@@ -26,29 +26,11 @@ static void *lz4hc_alloc_ctx(struct crypto_scomp *tfm)
return ctx;
 }
 
-static int lz4hc_init(struct crypto_tfm *tfm)
-{
-   struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   ctx->lz4hc_comp_mem = lz4hc_alloc_ctx(NULL);
-   if (IS_ERR(ctx->lz4hc_comp_mem))
-   return -ENOMEM;
-
-   return 0;
-}
-
 static void lz4hc_free_ctx(struct crypto_scomp *tfm, void *ctx)
 {
vfree(ctx);
 }
 
-static void lz4hc_exit(struct crypto_tfm *tfm)
-{
-   struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   lz4hc_free_ctx(NULL, ctx->lz4hc_comp_mem);
-}
-
 static int __lz4hc_compress_crypto(const u8 *src, unsigned int slen,
   u8 *dst, unsigned int *dlen, void *ctx)
 {
@@ -69,16 +51,6 @@ static int lz4hc_scompress(struct crypto_scomp *tfm, const 
u8 *src,
return __lz4hc_compress_crypto(src, slen, dst, dlen, ctx);
 }
 
-static int lz4hc_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
-unsigned int slen, u8 *dst,
-unsigned int *dlen)
-{
-   struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   return __lz4hc_compress_crypto(src, slen, dst, dlen,
-   ctx->lz4hc_comp_mem);
-}
-
 static int __lz4hc_decompress_crypto(const u8 *src, unsigned int slen,
 u8 *dst, unsigned int *dlen, void *ctx)
 {
@@ -98,26 +70,6 @@ static int lz4hc_sdecompress(struct crypto_scomp *tfm, const 
u8 *src,
return __lz4hc_decompress_crypto(src, slen, dst, dlen, NULL);
 }
 
-static int lz4hc_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
-  unsigned int slen, u8 *dst,
-  unsigned int *dlen)
-{
-   return __lz4hc_decompress_crypto(src, slen, dst, dlen, NULL);
-}
-
-static struct crypto_alg alg_lz4hc = {
-   .cra_name   = "lz4hc",
-   .cra_driver_name= "lz4hc-generic",
-   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
-   .cra_ctxsize= sizeof(struct lz4hc_ctx),
-   .cra_module = THIS_MODULE,
-   .cra_init   = lz4hc_init,
-   .cra_exit   = lz4hc_exit,
-   .cra_u  = { .compress = {
-   .coa_compress   = lz4hc_compress_crypto,
-   .coa_decompress = lz4hc_decompress_crypto } }
-};
-
 static struct scomp_alg scomp = {
.alloc_ctx  = lz4hc_alloc_ctx,
.free_ctx   = lz4hc_free_ctx,
@@ -132,24 +84,11 @@ static struct scomp_alg scomp = {
 
 static int __init lz4hc_mod_init(void)
 {
-   int ret;
-
-   ret = crypto_register_alg(&alg_lz4hc);
-   if (ret)
-   return ret;
-
-   ret = crypto_register_scomp(&scomp);
-   if (ret) {
-   crypto_unregister_alg(&alg_lz4hc);
-   return ret;
-   }
-
-   return ret;
+   return crypto_register_scomp(&scomp);
 }
 
 static void __exit lz4hc_mod_fini(void)
 {
-   crypto_unregister_alg(&alg_lz4hc);
crypto_unregister_scomp(&scomp);
 }
 
-- 
2.39.2



[RFC PATCH 12/21] crypto: lz4 - drop obsolete 'comp' implementation

2023-07-18 Thread Ard Biesheuvel
The 'comp' API is obsolete and will be removed, so remove this comp
implementation.

Signed-off-by: Ard Biesheuvel 
---
 crypto/lz4.c | 61 +---
 1 file changed, 1 insertion(+), 60 deletions(-)

diff --git a/crypto/lz4.c b/crypto/lz4.c
index 0606f8862e7872ad..c46b6cbd91ce10c0 100644
--- a/crypto/lz4.c
+++ b/crypto/lz4.c
@@ -27,29 +27,11 @@ static void *lz4_alloc_ctx(struct crypto_scomp *tfm)
return ctx;
 }
 
-static int lz4_init(struct crypto_tfm *tfm)
-{
-   struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   ctx->lz4_comp_mem = lz4_alloc_ctx(NULL);
-   if (IS_ERR(ctx->lz4_comp_mem))
-   return -ENOMEM;
-
-   return 0;
-}
-
 static void lz4_free_ctx(struct crypto_scomp *tfm, void *ctx)
 {
vfree(ctx);
 }
 
-static void lz4_exit(struct crypto_tfm *tfm)
-{
-   struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   lz4_free_ctx(NULL, ctx->lz4_comp_mem);
-}
-
 static int __lz4_compress_crypto(const u8 *src, unsigned int slen,
 u8 *dst, unsigned int *dlen, void *ctx)
 {
@@ -70,14 +52,6 @@ static int lz4_scompress(struct crypto_scomp *tfm, const u8 
*src,
return __lz4_compress_crypto(src, slen, dst, dlen, ctx);
 }
 
-static int lz4_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
-  unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-   struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   return __lz4_compress_crypto(src, slen, dst, dlen, ctx->lz4_comp_mem);
-}
-
 static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
   u8 *dst, unsigned int *dlen, void *ctx)
 {
@@ -97,26 +71,6 @@ static int lz4_sdecompress(struct crypto_scomp *tfm, const 
u8 *src,
return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
 }
 
-static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
-unsigned int slen, u8 *dst,
-unsigned int *dlen)
-{
-   return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
-}
-
-static struct crypto_alg alg_lz4 = {
-   .cra_name   = "lz4",
-   .cra_driver_name= "lz4-generic",
-   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
-   .cra_ctxsize= sizeof(struct lz4_ctx),
-   .cra_module = THIS_MODULE,
-   .cra_init   = lz4_init,
-   .cra_exit   = lz4_exit,
-   .cra_u  = { .compress = {
-   .coa_compress   = lz4_compress_crypto,
-   .coa_decompress = lz4_decompress_crypto } }
-};
-
 static struct scomp_alg scomp = {
.alloc_ctx  = lz4_alloc_ctx,
.free_ctx   = lz4_free_ctx,
@@ -131,24 +85,11 @@ static struct scomp_alg scomp = {
 
 static int __init lz4_mod_init(void)
 {
-   int ret;
-
-   ret = crypto_register_alg(&alg_lz4);
-   if (ret)
-   return ret;
-
-   ret = crypto_register_scomp(&scomp);
-   if (ret) {
-   crypto_unregister_alg(&alg_lz4);
-   return ret;
-   }
-
-   return ret;
+   return crypto_register_scomp(&scomp);
 }
 
 static void __exit lz4_mod_fini(void)
 {
-   crypto_unregister_alg(&alg_lz4);
crypto_unregister_scomp(&scomp);
 }
 
-- 
2.39.2



[RFC PATCH 11/21] crypto: deflate - drop obsolete 'comp' implementation

2023-07-18 Thread Ard Biesheuvel
No users of the obsolete 'comp' crypto compression API remain, so let's
drop the software deflate version of it.

Signed-off-by: Ard Biesheuvel 
---
 crypto/deflate.c | 58 +---
 1 file changed, 1 insertion(+), 57 deletions(-)

diff --git a/crypto/deflate.c b/crypto/deflate.c
index f4f127078fe2a5aa..0955040ca9e64146 100644
--- a/crypto/deflate.c
+++ b/crypto/deflate.c
@@ -130,13 +130,6 @@ static void *deflate_alloc_ctx(struct crypto_scomp *tfm)
return ctx;
 }
 
-static int deflate_init(struct crypto_tfm *tfm)
-{
-   struct deflate_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   return __deflate_init(ctx);
-}
-
 static void __deflate_exit(void *ctx)
 {
deflate_comp_exit(ctx);
@@ -149,13 +142,6 @@ static void deflate_free_ctx(struct crypto_scomp *tfm, 
void *ctx)
kfree_sensitive(ctx);
 }
 
-static void deflate_exit(struct crypto_tfm *tfm)
-{
-   struct deflate_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   __deflate_exit(ctx);
-}
-
 static int __deflate_compress(const u8 *src, unsigned int slen,
  u8 *dst, unsigned int *dlen, void *ctx)
 {
@@ -185,14 +171,6 @@ static int __deflate_compress(const u8 *src, unsigned int 
slen,
return ret;
 }
 
-static int deflate_compress(struct crypto_tfm *tfm, const u8 *src,
-   unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-   struct deflate_ctx *dctx = crypto_tfm_ctx(tfm);
-
-   return __deflate_compress(src, slen, dst, dlen, dctx);
-}
-
 static int deflate_scompress(struct crypto_scomp *tfm, const u8 *src,
 unsigned int slen, u8 *dst, unsigned int *dlen,
 void *ctx)
@@ -241,14 +219,6 @@ static int __deflate_decompress(const u8 *src, unsigned 
int slen,
return ret;
 }
 
-static int deflate_decompress(struct crypto_tfm *tfm, const u8 *src,
- unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-   struct deflate_ctx *dctx = crypto_tfm_ctx(tfm);
-
-   return __deflate_decompress(src, slen, dst, dlen, dctx);
-}
-
 static int deflate_sdecompress(struct crypto_scomp *tfm, const u8 *src,
   unsigned int slen, u8 *dst, unsigned int *dlen,
   void *ctx)
@@ -256,19 +226,6 @@ static int deflate_sdecompress(struct crypto_scomp *tfm, 
const u8 *src,
return __deflate_decompress(src, slen, dst, dlen, ctx);
 }
 
-static struct crypto_alg alg = {
-   .cra_name   = "deflate",
-   .cra_driver_name= "deflate-generic",
-   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
-   .cra_ctxsize= sizeof(struct deflate_ctx),
-   .cra_module = THIS_MODULE,
-   .cra_init   = deflate_init,
-   .cra_exit   = deflate_exit,
-   .cra_u  = { .compress = {
-   .coa_compress   = deflate_compress,
-   .coa_decompress = deflate_decompress } }
-};
-
 static struct scomp_alg scomp = {
.alloc_ctx  = deflate_alloc_ctx,
.free_ctx   = deflate_free_ctx,
@@ -283,24 +240,11 @@ static struct scomp_alg scomp = {
 
 static int __init deflate_mod_init(void)
 {
-   int ret;
-
-   ret = crypto_register_alg(&alg);
-   if (ret)
-   return ret;
-
-   ret = crypto_register_scomp(&scomp);
-   if (ret) {
-   crypto_unregister_alg(&alg);
-   return ret;
-   }
-
-   return ret;
+   return crypto_register_scomp(&scomp);
 }
 
 static void __exit deflate_mod_fini(void)
 {
-   crypto_unregister_alg(&alg);
crypto_unregister_scomp(&scomp);
 }
 
-- 
2.39.2



[RFC PATCH 10/21] crypto: 842 - drop obsolete 'comp' implementation

2023-07-18 Thread Ard Biesheuvel
The 'comp' API is obsolete and will be removed, so remove this comp
implementation.

Signed-off-by: Ard Biesheuvel 
---
 crypto/842.c | 63 +---
 1 file changed, 1 insertion(+), 62 deletions(-)

diff --git a/crypto/842.c b/crypto/842.c
index e59e54d769609ba6..5001d88cf727f74e 100644
--- a/crypto/842.c
+++ b/crypto/842.c
@@ -39,38 +39,11 @@ static void *crypto842_alloc_ctx(struct crypto_scomp *tfm)
return ctx;
 }
 
-static int crypto842_init(struct crypto_tfm *tfm)
-{
-   struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   ctx->wmem = crypto842_alloc_ctx(NULL);
-   if (IS_ERR(ctx->wmem))
-   return -ENOMEM;
-
-   return 0;
-}
-
 static void crypto842_free_ctx(struct crypto_scomp *tfm, void *ctx)
 {
kfree(ctx);
 }
 
-static void crypto842_exit(struct crypto_tfm *tfm)
-{
-   struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   crypto842_free_ctx(NULL, ctx->wmem);
-}
-
-static int crypto842_compress(struct crypto_tfm *tfm,
- const u8 *src, unsigned int slen,
- u8 *dst, unsigned int *dlen)
-{
-   struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
-
-   return sw842_compress(src, slen, dst, dlen, ctx->wmem);
-}
-
 static int crypto842_scompress(struct crypto_scomp *tfm,
   const u8 *src, unsigned int slen,
   u8 *dst, unsigned int *dlen, void *ctx)
@@ -78,13 +51,6 @@ static int crypto842_scompress(struct crypto_scomp *tfm,
return sw842_compress(src, slen, dst, dlen, ctx);
 }
 
-static int crypto842_decompress(struct crypto_tfm *tfm,
-   const u8 *src, unsigned int slen,
-   u8 *dst, unsigned int *dlen)
-{
-   return sw842_decompress(src, slen, dst, dlen);
-}
-
 static int crypto842_sdecompress(struct crypto_scomp *tfm,
 const u8 *src, unsigned int slen,
 u8 *dst, unsigned int *dlen, void *ctx)
@@ -92,20 +58,6 @@ static int crypto842_sdecompress(struct crypto_scomp *tfm,
return sw842_decompress(src, slen, dst, dlen);
 }
 
-static struct crypto_alg alg = {
-   .cra_name   = "842",
-   .cra_driver_name= "842-generic",
-   .cra_priority   = 100,
-   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
-   .cra_ctxsize= sizeof(struct crypto842_ctx),
-   .cra_module = THIS_MODULE,
-   .cra_init   = crypto842_init,
-   .cra_exit   = crypto842_exit,
-   .cra_u  = { .compress = {
-   .coa_compress   = crypto842_compress,
-   .coa_decompress = crypto842_decompress } }
-};
-
 static struct scomp_alg scomp = {
.alloc_ctx  = crypto842_alloc_ctx,
.free_ctx   = crypto842_free_ctx,
@@ -121,25 +73,12 @@ static struct scomp_alg scomp = {
 
 static int __init crypto842_mod_init(void)
 {
-   int ret;
-
-   ret = crypto_register_alg(&alg);
-   if (ret)
-   return ret;
-
-   ret = crypto_register_scomp(&scomp);
-   if (ret) {
-   crypto_unregister_alg(&alg);
-   return ret;
-   }
-
-   return ret;
+   return crypto_register_scomp(&scomp);
 }
 subsys_initcall(crypto842_mod_init);
 
 static void __exit crypto842_mod_exit(void)
 {
-   crypto_unregister_alg(&alg);
crypto_unregister_scomp(&scomp);
 }
 module_exit(crypto842_mod_exit);
-- 
2.39.2



[RFC PATCH 09/21] crypto: nx - Migrate to scomp API

2023-07-18 Thread Ard Biesheuvel
The only remaining user of 842 compression has been migrated to the
acomp compression API, and so the NX hardware driver has to follow suit,
given that no users of the obsolete 'comp' API remain, and it is going
to be removed.

So migrate the NX driver code to scomp. These will be wrapped and
exposed as an acomp implementation via the crypto subsystem's
acomp-to-scomp adaptation layer.

Signed-off-by: Ard Biesheuvel 
---
 drivers/crypto/nx/nx-842.c| 34 
 drivers/crypto/nx/nx-842.h| 14 
 drivers/crypto/nx/nx-common-powernv.c | 30 -
 drivers/crypto/nx/nx-common-pseries.c | 32 +-
 4 files changed, 57 insertions(+), 53 deletions(-)

diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
index 2ab90ec10e61ebe8..331b9cdf85e27044 100644
--- a/drivers/crypto/nx/nx-842.c
+++ b/drivers/crypto/nx/nx-842.c
@@ -101,9 +101,14 @@ static int update_param(struct nx842_crypto_param *p,
return 0;
 }
 
-int nx842_crypto_init(struct crypto_tfm *tfm, struct nx842_driver *driver)
+void *nx842_crypto_alloc_ctx(struct crypto_scomp *tfm,
+struct nx842_driver *driver)
 {
-   struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+   struct nx842_crypto_ctx *ctx;
+
+   ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+   if (!ctx)
+   return ERR_PTR(-ENOMEM);
 
spin_lock_init(&ctx->lock);
ctx->driver = driver;
@@ -114,22 +119,23 @@ int nx842_crypto_init(struct crypto_tfm *tfm, struct 
nx842_driver *driver)
kfree(ctx->wmem);
free_page((unsigned long)ctx->sbounce);
free_page((unsigned long)ctx->dbounce);
-   return -ENOMEM;
+   kfree(ctx);
+   return ERR_PTR(-ENOMEM);
}
 
-   return 0;
+   return ctx;
 }
-EXPORT_SYMBOL_GPL(nx842_crypto_init);
+EXPORT_SYMBOL_GPL(nx842_crypto_alloc_ctx);
 
-void nx842_crypto_exit(struct crypto_tfm *tfm)
+void nx842_crypto_free_ctx(struct crypto_scomp *tfm, void *p)
 {
-   struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+   struct nx842_crypto_ctx *ctx = p;
 
kfree(ctx->wmem);
free_page((unsigned long)ctx->sbounce);
free_page((unsigned long)ctx->dbounce);
 }
-EXPORT_SYMBOL_GPL(nx842_crypto_exit);
+EXPORT_SYMBOL_GPL(nx842_crypto_free_ctx);
 
 static void check_constraints(struct nx842_constraints *c)
 {
@@ -246,11 +252,11 @@ static int compress(struct nx842_crypto_ctx *ctx,
return update_param(p, slen, dskip + dlen);
 }
 
-int nx842_crypto_compress(struct crypto_tfm *tfm,
+int nx842_crypto_compress(struct crypto_scomp *tfm,
  const u8 *src, unsigned int slen,
- u8 *dst, unsigned int *dlen)
+ u8 *dst, unsigned int *dlen, void *pctx)
 {
-   struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+   struct nx842_crypto_ctx *ctx = pctx;
struct nx842_crypto_header *hdr = &ctx->header;
struct nx842_crypto_param p;
struct nx842_constraints c = *ctx->driver->constraints;
@@ -429,11 +435,11 @@ static int decompress(struct nx842_crypto_ctx *ctx,
return update_param(p, slen + padding, dlen);
 }
 
-int nx842_crypto_decompress(struct crypto_tfm *tfm,
+int nx842_crypto_decompress(struct crypto_scomp *tfm,
const u8 *src, unsigned int slen,
-   u8 *dst, unsigned int *dlen)
+   u8 *dst, unsigned int *dlen, void *pctx)
 {
-   struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+   struct nx842_crypto_ctx *ctx = pctx;
struct nx842_crypto_header *hdr;
struct nx842_crypto_param p;
struct nx842_constraints c = *ctx->driver->constraints;
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
index 7590bfb24d79bf42..de9dc8df62ed9dcb 100644
--- a/drivers/crypto/nx/nx-842.h
+++ b/drivers/crypto/nx/nx-842.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Restrictions on Data Descriptor List (DDL) and Entry (DDE) buffers
  *
@@ -177,13 +178,14 @@ struct nx842_crypto_ctx {
struct nx842_driver *driver;
 };
 
-int nx842_crypto_init(struct crypto_tfm *tfm, struct nx842_driver *driver);
-void nx842_crypto_exit(struct crypto_tfm *tfm);
-int nx842_crypto_compress(struct crypto_tfm *tfm,
+void *nx842_crypto_alloc_ctx(struct crypto_scomp *tfm,
+struct nx842_driver *driver);
+void nx842_crypto_free_ctx(struct crypto_scomp *tfm, void *ctx);
+int nx842_crypto_compress(struct crypto_scomp *tfm,
  const u8 *src, unsigned int slen,
- u8 *dst, unsigned int *dlen);
-int nx842_crypto_decompress(struct crypto_tfm *tfm,
+ u8 *dst, unsigned int *dlen, void *ctx);
+int nx842_crypto_decompress(struct crypto_scomp *tfm,
const u8 *src, unsigned int slen,
-   u8 

[RFC PATCH 08/21] zram: Migrate to acomp compression API

2023-07-18 Thread Ard Biesheuvel
Switch from the deprecated 'comp' to the more recent 'acomp' API.

This involves using scatterlists and request objects to describe the in-
and output buffers, all of which happen to be contiguous in memory, and
reside either entirely in lowmem, or inside a single highmem page. This
makes the conversion quite straightforward, and easy to back by either
a software or a hardware implementation.

Signed-off-by: Ard Biesheuvel 
---
 drivers/block/zram/zcomp.c| 67 +++-
 drivers/block/zram/zcomp.h|  7 +-
 drivers/block/zram/zram_drv.c | 12 +---
 3 files changed, 57 insertions(+), 29 deletions(-)

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 55af4efd79835666..12bdd288a153c455 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -11,6 +11,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "zcomp.h"
 
@@ -35,26 +38,32 @@ static const char * const backends[] = {
 
 static void zcomp_strm_free(struct zcomp_strm *zstrm)
 {
+   if (zstrm->req)
+   acomp_request_free(zstrm->req);
if (!IS_ERR_OR_NULL(zstrm->tfm))
-   crypto_free_comp(zstrm->tfm);
+   crypto_free_acomp(zstrm->tfm);
free_pages((unsigned long)zstrm->buffer, 1);
+   zstrm->req = NULL;
zstrm->tfm = NULL;
zstrm->buffer = NULL;
 }
 
 /*
- * Initialize zcomp_strm structure with ->tfm initialized by backend, and
- * ->buffer. Return a negative value on error.
+ * Initialize zcomp_strm structure with ->tfm and ->req initialized by
+ * backend, and ->buffer. Return a negative value on error.
  */
 static int zcomp_strm_init(struct zcomp_strm *zstrm, struct zcomp *comp)
 {
-   zstrm->tfm = crypto_alloc_comp(comp->name, 0, 0);
+   zstrm->tfm = crypto_alloc_acomp(comp->name, 0, CRYPTO_ALG_ASYNC);
+   if (!IS_ERR_OR_NULL(zstrm->tfm))
+   zstrm->req = acomp_request_alloc(zstrm->tfm);
+
/*
 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
 * case when compressed size is larger than the original one
 */
zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
-   if (IS_ERR_OR_NULL(zstrm->tfm) || !zstrm->buffer) {
+   if (IS_ERR_OR_NULL(zstrm->tfm) || !zstrm->req || !zstrm->buffer) {
zcomp_strm_free(zstrm);
return -ENOMEM;
}
@@ -70,7 +79,7 @@ bool zcomp_available_algorithm(const char *comp)
 * This also means that we permit zcomp initialisation
 * with any compressing algorithm known to crypto api.
 */
-   return crypto_has_comp(comp, 0, 0) == 1;
+   return crypto_has_acomp(comp, 0, CRYPTO_ALG_ASYNC);
 }
 
 /* show available compressors */
@@ -95,7 +104,7 @@ ssize_t zcomp_available_show(const char *comp, char *buf)
 * Out-of-tree module known to crypto api or a missing
 * entry in `backends'.
 */
-   if (!known_algorithm && crypto_has_comp(comp, 0, 0) == 1)
+   if (!known_algorithm && crypto_has_acomp(comp, 0, CRYPTO_ALG_ASYNC))
sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
"[%s] ", comp);
 
@@ -115,8 +124,14 @@ void zcomp_stream_put(struct zcomp *comp)
 }
 
 int zcomp_compress(struct zcomp_strm *zstrm,
-   const void *src, unsigned int *dst_len)
+  struct page *src, unsigned int *dst_len)
 {
+   struct scatterlist sg_src, sg_dst;
+   int ret;
+
+   sg_init_table(&sg_src, 1);
+   sg_set_page(&sg_src, src, PAGE_SIZE, 0);
+
/*
 * Our dst memory (zstrm->buffer) is always `2 * PAGE_SIZE' sized
 * because sometimes we can endup having a bigger compressed data
@@ -131,21 +146,39 @@ int zcomp_compress(struct zcomp_strm *zstrm,
 * the dst buffer, zram_drv will take care of the fact that
 * compressed buffer is too big.
 */
-   *dst_len = PAGE_SIZE * 2;
+   sg_init_one(&sg_dst, zstrm->buffer, PAGE_SIZE * 2);
 
-   return crypto_comp_compress(zstrm->tfm,
-   src, PAGE_SIZE,
-   zstrm->buffer, dst_len);
+   acomp_request_set_params(zstrm->req, &sg_src, &sg_dst, PAGE_SIZE,
+PAGE_SIZE * 2);
+
+   ret = crypto_acomp_compress(zstrm->req);
+   if (ret)
+   return ret;
+
+   *dst_len = zstrm->req->dlen;
+   return 0;
 }
 
 int zcomp_decompress(struct zcomp_strm *zstrm,
-   const void *src, unsigned int src_len, void *dst)
+const void *src, unsigned int src_len, struct page *dst)
 {
-   unsigned int dst_len = PAGE_SIZE;
+   struct scatterlist sg_src, sg_dst;
 
-   return crypto_comp_decompress(zstrm->tfm,
-   src, src_len,
-   dst, &dst_len);
+   if (is_kmap_addr(src)) {
+   sg_init_table(&sg_src, 1);
+   sg_set_page(&sg_src, kmap_to_page((void *)src), src_len,
+

[RFC PATCH 07/21] ubifs: Migrate to acomp compression API

2023-07-18 Thread Ard Biesheuvel
UBIFS is one of the remaining users of the obsolete 'comp' compression
API exposed by the crypto subsystem. Given that it operates strictly on
contiguous buffers that are either entirely in lowmem or covered by a
single page, the conversion to the acomp API is quite straightforward.

Only synchronous acomp implementations are considered at the moment, and
whether or not a future conversion to permit asynchronous ones too will
be worth the effort remains to be seen.

Signed-off-by: Ard Biesheuvel 
---
 fs/ubifs/compress.c | 61 ++--
 fs/ubifs/file.c | 46 ---
 fs/ubifs/journal.c  | 19 --
 fs/ubifs/ubifs.h| 15 +++--
 4 files changed, 90 insertions(+), 51 deletions(-)

diff --git a/fs/ubifs/compress.c b/fs/ubifs/compress.c
index 75461777c466b1c9..570919b218a0a8cc 100644
--- a/fs/ubifs/compress.c
+++ b/fs/ubifs/compress.c
@@ -82,15 +82,15 @@ struct ubifs_compressor *ubifs_compressors[UBIFS_COMPR_TYPES_CNT];
 
 /**
  * ubifs_compress - compress data.
- * @in_buf: data to compress
+ * @in_sg: data to compress
  * @in_len: length of the data to compress
  * @out_buf: output buffer where compressed data should be stored
  * @out_len: output buffer length is returned here
  * @compr_type: type of compression to use on enter, actually used compression
  *  type on exit
  *
- * This function compresses input buffer @in_buf of length @in_len and stores
- * the result in the output buffer @out_buf and the resulting length in
+ * This function compresses input scatterlist @in_sg of length @in_len and
+ * stores the result in the output buffer @out_buf and the resulting length in
  * @out_len. If the input buffer does not compress, it is just copied to the
  * @out_buf. The same happens if @compr_type is %UBIFS_COMPR_NONE or if
  * compression error occurred.
@@ -98,11 +98,12 @@ struct ubifs_compressor *ubifs_compressors[UBIFS_COMPR_TYPES_CNT];
  * Note, if the input buffer was not compressed, it is copied to the output
  * buffer and %UBIFS_COMPR_NONE is returned in @compr_type.
  */
-void ubifs_compress(const struct ubifs_info *c, const void *in_buf,
+void ubifs_compress(const struct ubifs_info *c, struct scatterlist *in_sg,
int in_len, void *out_buf, int *out_len, int *compr_type)
 {
int err;
struct ubifs_compressor *compr = ubifs_compressors[*compr_type];
+   struct scatterlist out_sg;
 
if (*compr_type == UBIFS_COMPR_NONE)
goto no_compr;
@@ -111,10 +112,13 @@ void ubifs_compress(const struct ubifs_info *c, const void *in_buf,
if (in_len < UBIFS_MIN_COMPR_LEN)
goto no_compr;
 
+   sg_init_one(&out_sg, out_buf, *out_len);
+
if (compr->comp_mutex)
mutex_lock(compr->comp_mutex);
-   err = crypto_comp_compress(compr->cc, in_buf, in_len, out_buf,
-  (unsigned int *)out_len);
+   acomp_request_set_params(compr->req, in_sg, &out_sg, in_len, *out_len);
+   err = crypto_acomp_compress(compr->req);
+   *out_len = compr->req->dlen;
if (compr->comp_mutex)
mutex_unlock(compr->comp_mutex);
if (unlikely(err)) {
@@ -133,7 +137,7 @@ void ubifs_compress(const struct ubifs_info *c, const void *in_buf,
return;
 
 no_compr:
-   memcpy(out_buf, in_buf, in_len);
+   sg_copy_to_buffer(in_sg, 1, out_buf, in_len);
*out_len = in_len;
*compr_type = UBIFS_COMPR_NONE;
 }
@@ -142,19 +146,20 @@ void ubifs_compress(const struct ubifs_info *c, const void *in_buf,
  * ubifs_decompress - decompress data.
  * @in_buf: data to decompress
  * @in_len: length of the data to decompress
- * @out_buf: output buffer where decompressed data should
+ * @out_sg: output buffer where decompressed data should be stored
  * @out_len: output length is returned here
  * @compr_type: type of compression
  *
- * This function decompresses data from buffer @in_buf into buffer @out_buf.
+ * This function decompresses data from buffer @in_buf into scatterlist @out_sg.
  * The length of the uncompressed data is returned in @out_len. This functions
  * returns %0 on success or a negative error code on failure.
  */
-int ubifs_decompress(const struct ubifs_info *c, const void *in_buf,
-int in_len, void *out_buf, int *out_len, int compr_type)
+int ubifs_decompress(const struct ubifs_info *c, const void *in_buf, int in_len,
+struct scatterlist *out_sg, int *out_len, int compr_type)
 {
int err;
struct ubifs_compressor *compr;
+   struct scatterlist in_sg;
 
if (unlikely(compr_type < 0 || compr_type >= UBIFS_COMPR_TYPES_CNT)) {
ubifs_err(c, "invalid compression type %d", compr_type);
@@ -169,15 +174,18 @@ int ubifs_decompress(const struct ubifs_info *c, const void *in_buf,
}
 
if (compr_type == UBIFS_COMPR_NONE) {
-   memcpy(out_buf, in_buf, in_len);
+   sg_copy_from_buffer(out_sg, 1, 

[RFC PATCH 06/21] ubifs: Avoid allocating buffer space unnecessarily

2023-07-18 Thread Ard Biesheuvel
The recompression scratch buffer is only used when the data node is
compressed, and there is no need to allocate it otherwise. So move the
allocation into the branch of the if() that actually makes use of it.

Signed-off-by: Ard Biesheuvel 
---
 fs/ubifs/journal.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
index 4e5961878f336033..5ce618f82aed201b 100644
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -1485,16 +1485,9 @@ static int truncate_data_node(const struct ubifs_info *c, const struct inode *in
  unsigned int block, struct ubifs_data_node *dn,
  int *new_len, int dn_size)
 {
-   void *buf;
+   void *buf = NULL;
int err, dlen, compr_type, out_len, data_size;
 
-   out_len = le32_to_cpu(dn->size);
-   buf = kmalloc_array(out_len, WORST_COMPR_FACTOR, GFP_NOFS);
-   if (!buf)
-   return -ENOMEM;
-
-   out_len *= WORST_COMPR_FACTOR;
-
dlen = le32_to_cpu(dn->ch.len) - UBIFS_DATA_NODE_SZ;
data_size = dn_size - UBIFS_DATA_NODE_SZ;
compr_type = le16_to_cpu(dn->compr_type);
@@ -1508,6 +1501,13 @@ static int truncate_data_node(const struct ubifs_info *c, const struct inode *in
if (compr_type == UBIFS_COMPR_NONE) {
out_len = *new_len;
} else {
+   out_len = le32_to_cpu(dn->size);
+   buf = kmalloc_array(out_len, WORST_COMPR_FACTOR, GFP_NOFS);
+   if (!buf)
+   return -ENOMEM;
+
+   out_len *= WORST_COMPR_FACTOR;
+
err = ubifs_decompress(c, &dn->data, dlen, buf, &out_len, compr_type);
if (err)
goto out;
-- 
2.39.2



[RFC PATCH 05/21] ubifs: Pass worst-case buffer size to compression routines

2023-07-18 Thread Ard Biesheuvel
Currently, the ubifs code allocates a worst case buffer size to
recompress a data node, but does not pass the size of that buffer to the
compression code. This means that the compression code will never use
the additional space, and might fail spuriously due to lack of space.

So let's multiply out_len by WORST_COMPR_FACTOR after allocating the
buffer. Doing so is guaranteed not to overflow, given that the preceding
kmalloc_array() call would have failed otherwise.

Signed-off-by: Ard Biesheuvel 
---
 fs/ubifs/journal.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
index dc52ac0f4a345f30..4e5961878f336033 100644
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -1493,6 +1493,8 @@ static int truncate_data_node(const struct ubifs_info *c, const struct inode *in
if (!buf)
return -ENOMEM;
 
+   out_len *= WORST_COMPR_FACTOR;
+
dlen = le32_to_cpu(dn->ch.len) - UBIFS_DATA_NODE_SZ;
data_size = dn_size - UBIFS_DATA_NODE_SZ;
compr_type = le16_to_cpu(dn->compr_type);
-- 
2.39.2



[RFC PATCH 04/21] net: ipcomp: Migrate to acomp API from deprecated comp API

2023-07-18 Thread Ard Biesheuvel
Migrate the IPcomp network compression code to the acomp API, in order
to drop the dependency on the obsolete 'comp' API which is going away.

For the time being, this is a rather mechanical conversion replacing
each comp TFM object with an acomp TFM/request object pair - this is
necessary because, at this point, there is still a 1:1 relation between
acomp transforms and requests in the acomp-to-scomp adaptation layer, and
this deviates from the model used by AEADs and skciphers where the TFM
is fully reentrant, and operations using the same encryption keys can be
issued in parallel using individual request objects but the same TFM.

Also, this minimal conversion does not yet take advantage of the fact
that the acomp API takes scatterlists as input and output descriptors,
which in principle removes the need to linearize the SKBs. However,
given that compression code generally requires in- and output buffers to
be non-overlapping, scratch buffers will always be needed, and so
whether this conversion is worthwhile is TBD.

Signed-off-by: Ard Biesheuvel 
---
 include/crypto/acompress.h |   5 +
 include/net/ipcomp.h   |   4 +-
 net/xfrm/xfrm_algo.c   |   7 +-
 net/xfrm/xfrm_ipcomp.c | 107 +---
 4 files changed, 79 insertions(+), 44 deletions(-)

diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index ccb6f3279bc8b32e..3f54e3d8815a9d0d 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -318,4 +318,9 @@ static inline int crypto_acomp_decompress(struct acomp_req *req)
return crypto_comp_errstat(alg, tfm->decompress(req));
 }
 
+static inline const char *crypto_acomp_name(struct crypto_acomp *acomp)
+{
+   return crypto_tfm_alg_name(crypto_acomp_tfm(acomp));
+}
+
 #endif
diff --git a/include/net/ipcomp.h b/include/net/ipcomp.h
index 8660a2a6d1fc76a7..bf27ac7e3ca952e2 100644
--- a/include/net/ipcomp.h
+++ b/include/net/ipcomp.h
@@ -7,12 +7,12 @@
 
 #define IPCOMP_SCRATCH_SIZE 65400
 
-struct crypto_comp;
+struct acomp_req;
 struct ip_comp_hdr;
 
 struct ipcomp_data {
u16 threshold;
-   struct crypto_comp * __percpu *tfms;
+   struct acomp_req * __percpu *reqs;
 };
 
 struct ip_comp_hdr;
diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 094734fbec967505..ca411bcebc53ad4f 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2002 James Morris 
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -674,7 +675,7 @@ static const struct xfrm_algo_list xfrm_ealg_list = {
 static const struct xfrm_algo_list xfrm_calg_list = {
.algs = calg_list,
.entries = ARRAY_SIZE(calg_list),
-   .type = CRYPTO_ALG_TYPE_COMPRESS,
+   .type = CRYPTO_ALG_TYPE_ACOMPRESS,
.mask = CRYPTO_ALG_TYPE_MASK,
 };
 
@@ -833,8 +834,8 @@ void xfrm_probe_algs(void)
}
 
for (i = 0; i < calg_entries(); i++) {
-   status = crypto_has_comp(calg_list[i].name, 0,
-CRYPTO_ALG_ASYNC);
+   status = crypto_has_acomp(calg_list[i].name, 0,
+ CRYPTO_ALG_ASYNC);
if (calg_list[i].available != status)
calg_list[i].available = status;
}
diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index 9c0fa0e1786a2d42..e29ef55e0f01d144 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -20,20 +20,21 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
-struct ipcomp_tfms {
+struct ipcomp_reqs {
struct list_head list;
-   struct crypto_comp * __percpu *tfms;
+   struct acomp_req * __percpu *reqs;
int users;
 };
 
 static DEFINE_MUTEX(ipcomp_resource_mutex);
 static void * __percpu *ipcomp_scratches;
 static int ipcomp_scratch_users;
-static LIST_HEAD(ipcomp_tfms_list);
+static LIST_HEAD(ipcomp_reqs_list);
 
 static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
 {
@@ -42,13 +43,19 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
int dlen = IPCOMP_SCRATCH_SIZE;
const u8 *start = skb->data;
u8 *scratch = *this_cpu_ptr(ipcomp_scratches);
-   struct crypto_comp *tfm = *this_cpu_ptr(ipcd->tfms);
-   int err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
-   int len;
+   struct acomp_req *req = *this_cpu_ptr(ipcd->reqs);
+   struct scatterlist sg_in, sg_out;
+   int err, len;
 
+   sg_init_one(&sg_in, start, plen);
+   sg_init_one(&sg_out, scratch, dlen);
+   acomp_request_set_params(req, &sg_in, &sg_out, plen, dlen);
+
+   err = crypto_acomp_decompress(req);
if (err)
return err;
 
+   dlen = req->dlen;
if (dlen < (plen + sizeof(struct ip_comp_hdr)))
return -EINVAL;
 
@@ -125,17 +132,24 @@ static int ipcomp_compress(struct xfrm_state *x, struct sk_buff *skb)
const int plen = skb->len;
   

[RFC PATCH 03/21] crypto: acompress - Drop destination scatterlist allocation feature

2023-07-18 Thread Ard Biesheuvel
The acomp crypto code will allocate a destination scatterlist and its
backing pages on the fly if no destination is passed. This feature is
not used, and given that the caller should own this memory, it is far
better if the caller allocates it. This is especially true for
decompression, where the output size is essentially unbounded, and so
the caller already needs to provide the size for this feature to work
reliably.

Signed-off-by: Ard Biesheuvel 
---
 crypto/acompress.c |  6 
 crypto/scompress.c | 14 +-
 crypto/testmgr.c   | 29 
 include/crypto/acompress.h | 16 ++-
 4 files changed, 4 insertions(+), 61 deletions(-)

diff --git a/crypto/acompress.c b/crypto/acompress.c
index 1c682810a484dcdf..431876b0ee2096fd 100644
--- a/crypto/acompress.c
+++ b/crypto/acompress.c
@@ -71,7 +71,6 @@ static int crypto_acomp_init_tfm(struct crypto_tfm *tfm)
 
acomp->compress = alg->compress;
acomp->decompress = alg->decompress;
-   acomp->dst_free = alg->dst_free;
acomp->reqsize = alg->reqsize;
 
if (alg->exit)
@@ -173,11 +172,6 @@ void acomp_request_free(struct acomp_req *req)
if (tfm->__crt_alg->cra_type != &crypto_acomp_type)
crypto_acomp_scomp_free_ctx(req);
 
-   if (req->flags & CRYPTO_ACOMP_ALLOC_OUTPUT) {
-   acomp->dst_free(req->dst);
-   req->dst = NULL;
-   }
-
__acomp_request_free(req);
 }
 EXPORT_SYMBOL_GPL(acomp_request_free);
diff --git a/crypto/scompress.c b/crypto/scompress.c
index 442a82c9de7def1f..3155cdce9116e092 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -122,12 +122,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
return -EINVAL;
 
-   if (req->dst && !req->dlen)
+   if (!req->dst || !req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
return -EINVAL;
 
-   if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
-   req->dlen = SCOMP_SCRATCH_SIZE;
-
scratch = raw_cpu_ptr(&scomp_scratch);
spin_lock(&scratch->lock);
 
@@ -139,17 +136,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
ret = crypto_scomp_decompress(scomp, scratch->src, req->slen,
scratch->dst, &req->dlen, *ctx);
if (!ret) {
-   if (!req->dst) {
-   req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
-   if (!req->dst) {
-   ret = -ENOMEM;
-   goto out;
-   }
-   }
scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
 1);
}
-out:
spin_unlock(&scratch->lock);
return ret;
 }
@@ -197,7 +186,6 @@ int crypto_init_scomp_ops_async(struct crypto_tfm *tfm)
 
crt->compress = scomp_acomp_compress;
crt->decompress = scomp_acomp_decompress;
-   crt->dst_free = sgl_free;
crt->reqsize = sizeof(void *);
 
return 0;
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index b41a8e8c1d1a1987..4971351f55dbabb9 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3497,21 +3497,6 @@ static int test_acomp(struct crypto_acomp *tfm,
goto out;
}
 
-#ifdef CONFIG_CRYPTO_MANAGER_EXTRA_TESTS
-   crypto_init_wait(&wait);
-   sg_init_one(&src, input_vec, ilen);
-   acomp_request_set_params(req, &src, NULL, ilen, 0);
-
-   ret = crypto_wait_req(crypto_acomp_compress(req), &wait);
-   if (ret) {
-   pr_err("alg: acomp: compression failed on NULL dst buffer test %d for %s: ret=%d\n",
-  i + 1, algo, -ret);
-   kfree(input_vec);
-   acomp_request_free(req);
-   goto out;
-   }
-#endif
-
kfree(input_vec);
acomp_request_free(req);
}
@@ -3573,20 +3558,6 @@ static int test_acomp(struct crypto_acomp *tfm,
goto out;
}
 
-#ifdef CONFIG_CRYPTO_MANAGER_EXTRA_TESTS
-   crypto_init_wait(&wait);
-   acomp_request_set_params(req, &src, NULL, ilen, 0);
-
-   ret = crypto_wait_req(crypto_acomp_decompress(req), &wait);
-   if (ret) {
-   pr_err("alg: acomp: decompression failed on NULL dst buffer test %d for %s: ret=%d\n",
-  i + 1, algo, -ret);
-   kfree(input_vec);
-   acomp_request_free(req);
-   goto out;
-   }
-#endif
-
kfree(input_vec);
acomp_request_free(req);
}
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index 

[RFC PATCH 02/21] crypto: qat - Drop support for allocating destination buffers

2023-07-18 Thread Ard Biesheuvel
Remove the logic that allocates the destination scatterlist and backing
pages on the fly when no destination is provided: this is a rather
dubious proposition, given that the caller is in a far better position
to estimate the size of such a buffer, or how it should be allocated.

This feature has no current users, so let's remove it while we still
can.

Signed-off-by: Ard Biesheuvel 
---
 drivers/crypto/intel/qat/qat_common/qat_bl.c| 159 
 drivers/crypto/intel/qat/qat_common/qat_bl.h|   6 -
 drivers/crypto/intel/qat/qat_common/qat_comp_algs.c |  86 +--
 drivers/crypto/intel/qat/qat_common/qat_comp_req.h  |  10 --
 4 files changed, 1 insertion(+), 260 deletions(-)

diff --git a/drivers/crypto/intel/qat/qat_common/qat_bl.c b/drivers/crypto/intel/qat/qat_common/qat_bl.c
index 76baed0a76c0ee93..94f6a5fe0f3dea75 100644
--- a/drivers/crypto/intel/qat/qat_common/qat_bl.c
+++ b/drivers/crypto/intel/qat/qat_common/qat_bl.c
@@ -249,162 +249,3 @@ int qat_bl_sgl_to_bufl(struct adf_accel_dev *accel_dev,
extra_dst_buff, sz_extra_dst_buff,
sskip, dskip, flags);
 }
-
-static void qat_bl_sgl_unmap(struct adf_accel_dev *accel_dev,
-struct qat_alg_buf_list *bl)
-{
-   struct device *dev = &GET_DEV(accel_dev);
-   int n = bl->num_bufs;
-   int i;
-
-   for (i = 0; i < n; i++)
-   if (!dma_mapping_error(dev, bl->buffers[i].addr))
-   dma_unmap_single(dev, bl->buffers[i].addr,
-bl->buffers[i].len, DMA_FROM_DEVICE);
-}
-
-static int qat_bl_sgl_map(struct adf_accel_dev *accel_dev,
- struct scatterlist *sgl,
- struct qat_alg_buf_list **bl)
-{
-   struct device *dev = &GET_DEV(accel_dev);
-   struct qat_alg_buf_list *bufl;
-   int node = dev_to_node(dev);
-   struct scatterlist *sg;
-   int n, i, sg_nctr;
-   size_t sz;
-
-   n = sg_nents(sgl);
-   sz = struct_size(bufl, buffers, n);
-   bufl = kzalloc_node(sz, GFP_KERNEL, node);
-   if (unlikely(!bufl))
-   return -ENOMEM;
-
-   for (i = 0; i < n; i++)
-   bufl->buffers[i].addr = DMA_MAPPING_ERROR;
-
-   sg_nctr = 0;
-   for_each_sg(sgl, sg, n, i) {
-   int y = sg_nctr;
-
-   if (!sg->length)
-   continue;
-
-   bufl->buffers[y].addr = dma_map_single(dev, sg_virt(sg),
-  sg->length,
-  DMA_FROM_DEVICE);
-   bufl->buffers[y].len = sg->length;
-   if (unlikely(dma_mapping_error(dev, bufl->buffers[y].addr)))
-   goto err_map;
-   sg_nctr++;
-   }
-   bufl->num_bufs = sg_nctr;
-   bufl->num_mapped_bufs = sg_nctr;
-
-   *bl = bufl;
-
-   return 0;
-
-err_map:
-   for (i = 0; i < n; i++)
-   if (!dma_mapping_error(dev, bufl->buffers[i].addr))
-   dma_unmap_single(dev, bufl->buffers[i].addr,
-bufl->buffers[i].len,
-DMA_FROM_DEVICE);
-   kfree(bufl);
-   *bl = NULL;
-
-   return -ENOMEM;
-}
-
-static void qat_bl_sgl_free_unmap(struct adf_accel_dev *accel_dev,
- struct scatterlist *sgl,
- struct qat_alg_buf_list *bl,
- bool free_bl)
-{
-   if (bl) {
-   qat_bl_sgl_unmap(accel_dev, bl);
-
-   if (free_bl)
-   kfree(bl);
-   }
-   if (sgl)
-   sgl_free(sgl);
-}
-
-static int qat_bl_sgl_alloc_map(struct adf_accel_dev *accel_dev,
-   struct scatterlist **sgl,
-   struct qat_alg_buf_list **bl,
-   unsigned int dlen,
-   gfp_t gfp)
-{
-   struct scatterlist *dst;
-   int ret;
-
-   dst = sgl_alloc(dlen, gfp, NULL);
-   if (!dst) {
-   dev_err(&GET_DEV(accel_dev), "sg_alloc failed\n");
-   return -ENOMEM;
-   }
-
-   ret = qat_bl_sgl_map(accel_dev, dst, bl);
-   if (ret)
-   goto err;
-
-   *sgl = dst;
-
-   return 0;
-
-err:
-   sgl_free(dst);
-   *sgl = NULL;
-   return ret;
-}
-
-int qat_bl_realloc_map_new_dst(struct adf_accel_dev *accel_dev,
-  struct scatterlist **sg,
-  unsigned int dlen,
-  struct qat_request_buffs *qat_bufs,
-  gfp_t gfp)
-{
-   struct device *dev = &GET_DEV(accel_dev);
-   dma_addr_t new_blp = DMA_MAPPING_ERROR;
-   struct qat_alg_buf_list *new_bl;
-   struct scatterlist *new_sg;
-   size_t 

[RFC PATCH 01/21] crypto: scomp - Revert "add support for deflate rfc1950 (zlib)"

2023-07-18 Thread Ard Biesheuvel
This reverts commit a368f43d6e3a001e684e9191a27df384fbff12f5.

"zlib-deflate" was introduced 6 years ago, but it does not have any
users. So let's remove the generic implementation and the test vectors,
but retain the "zlib-deflate" entry in the testmgr code to avoid
introducing warning messages on systems that implement zlib-deflate in
hardware.

Note that RFC 1950 which forms the basis of this algorithm dates back to
1996, and predates RFC 1951, on which the existing IPcomp is based and
which we have supported in the kernel since 2003. So it seems rather
unlikely that we will ever grow the need to support zlib-deflate.

Signed-off-by: Ard Biesheuvel 
---
 crypto/deflate.c | 61 +---
 crypto/testmgr.c |  8 +--
 crypto/testmgr.h | 75 
 3 files changed, 18 insertions(+), 126 deletions(-)

diff --git a/crypto/deflate.c b/crypto/deflate.c
index b2a46f6dc961e71d..f4f127078fe2a5aa 100644
--- a/crypto/deflate.c
+++ b/crypto/deflate.c
@@ -39,24 +39,20 @@ struct deflate_ctx {
struct z_stream_s decomp_stream;
 };
 
-static int deflate_comp_init(struct deflate_ctx *ctx, int format)
+static int deflate_comp_init(struct deflate_ctx *ctx)
 {
int ret = 0;
struct z_stream_s *stream = &ctx->comp_stream;
 
stream->workspace = vzalloc(zlib_deflate_workspacesize(
-   MAX_WBITS, MAX_MEM_LEVEL));
+   -DEFLATE_DEF_WINBITS, DEFLATE_DEF_MEMLEVEL));
if (!stream->workspace) {
ret = -ENOMEM;
goto out;
}
-   if (format)
-   ret = zlib_deflateInit(stream, 3);
-   else
-   ret = zlib_deflateInit2(stream, DEFLATE_DEF_LEVEL, Z_DEFLATED,
-   -DEFLATE_DEF_WINBITS,
-   DEFLATE_DEF_MEMLEVEL,
-   Z_DEFAULT_STRATEGY);
+   ret = zlib_deflateInit2(stream, DEFLATE_DEF_LEVEL, Z_DEFLATED,
+   -DEFLATE_DEF_WINBITS, DEFLATE_DEF_MEMLEVEL,
+   Z_DEFAULT_STRATEGY);
if (ret != Z_OK) {
ret = -EINVAL;
goto out_free;
@@ -68,7 +64,7 @@ static int deflate_comp_init(struct deflate_ctx *ctx, int format)
goto out;
 }
 
-static int deflate_decomp_init(struct deflate_ctx *ctx, int format)
+static int deflate_decomp_init(struct deflate_ctx *ctx)
 {
int ret = 0;
struct z_stream_s *stream = &ctx->decomp_stream;
@@ -78,10 +74,7 @@ static int deflate_decomp_init(struct deflate_ctx *ctx, int format)
ret = -ENOMEM;
goto out;
}
-   if (format)
-   ret = zlib_inflateInit(stream);
-   else
-   ret = zlib_inflateInit2(stream, -DEFLATE_DEF_WINBITS);
+   ret = zlib_inflateInit2(stream, -DEFLATE_DEF_WINBITS);
if (ret != Z_OK) {
ret = -EINVAL;
goto out_free;
@@ -105,21 +98,21 @@ static void deflate_decomp_exit(struct deflate_ctx *ctx)
vfree(ctx->decomp_stream.workspace);
 }
 
-static int __deflate_init(void *ctx, int format)
+static int __deflate_init(void *ctx)
 {
int ret;
 
-   ret = deflate_comp_init(ctx, format);
+   ret = deflate_comp_init(ctx);
if (ret)
goto out;
-   ret = deflate_decomp_init(ctx, format);
+   ret = deflate_decomp_init(ctx);
if (ret)
deflate_comp_exit(ctx);
 out:
return ret;
 }
 
-static void *gen_deflate_alloc_ctx(struct crypto_scomp *tfm, int format)
+static void *deflate_alloc_ctx(struct crypto_scomp *tfm)
 {
struct deflate_ctx *ctx;
int ret;
@@ -128,7 +121,7 @@ static void *gen_deflate_alloc_ctx(struct crypto_scomp *tfm, int format)
if (!ctx)
return ERR_PTR(-ENOMEM);
 
-   ret = __deflate_init(ctx, format);
+   ret = __deflate_init(ctx);
if (ret) {
kfree(ctx);
return ERR_PTR(ret);
@@ -137,21 +130,11 @@ static void *gen_deflate_alloc_ctx(struct crypto_scomp *tfm, int format)
return ctx;
 }
 
-static void *deflate_alloc_ctx(struct crypto_scomp *tfm)
-{
-   return gen_deflate_alloc_ctx(tfm, 0);
-}
-
-static void *zlib_deflate_alloc_ctx(struct crypto_scomp *tfm)
-{
-   return gen_deflate_alloc_ctx(tfm, 1);
-}
-
 static int deflate_init(struct crypto_tfm *tfm)
 {
struct deflate_ctx *ctx = crypto_tfm_ctx(tfm);
 
-   return __deflate_init(ctx, 0);
+   return __deflate_init(ctx);
 }
 
 static void __deflate_exit(void *ctx)
@@ -286,7 +269,7 @@ static struct crypto_alg alg = {
.coa_decompress = deflate_decompress } }
 };
 
-static struct scomp_alg scomp[] = { {
+static struct scomp_alg scomp = {
.alloc_ctx  = deflate_alloc_ctx,
.free_ctx   = deflate_free_ctx,
.compress   = deflate_scompress,
@@ -296,17 +279,7 @@ static struct scomp_alg scomp[] = { 

[RFC PATCH 00/21] crypto: consolidate and clean up compression APIs

2023-07-18 Thread Ard Biesheuvel
This series is presented as an RFC, because I haven't quite convinced
myself that the acomp API really needs both scatterlists and request
objects to encapsulate the in- and output buffers, and perhaps there are
more drastic simplifications that we might consider.

However, the current situation with comp, scomp and acomp APIs is
definitely something that needs cleaning up, and so I implemented this
series under the working assumption that we will keep the current acomp
semantics wrt scatterlists and request objects.

Patch #1 drops zlib-deflate support in software, along with the test
cases we have for it. This has no users and should have never been
added.

Patch #2 removes the support for on-the-fly allocation of destination
buffers and scatterlists from the Intel QAT driver. This is never used,
and not even implemented by all drivers (the HiSilicon ZIP driver does
not support it). The diffstat of this patch makes a good case why the
caller should be in charge of allocating the memory, not the driver.

Patch #3 removes this on-the-fly allocation from the core acomp API.

Patch #4 does a minimal conversion of IPcomp to the acomp API.

Patch #5 and #6 are independent UBIFS fixes for things I ran into while
working on patch #7.

Patch #7 converts UBIFS to the acomp API.

Patch #8 converts the zram block driver to the acomp API.

Patches #9 to #19 remove the existing 'comp' API implementations as well
as the core plumbing, now that all clients of the API have been
converted. (Note that pstore stopped using the 'comp' API as well, but
these changes are already queued elsewhere)

Patch #20 converts the generic deflate compression driver to the acomp
API, so that it can natively operate on discontiguous buffers, rather
than requiring scratch buffers. This is the only IPcomp compression
algorithm we actually implement in software in the kernel, and this
conversion could help IPcomp if we decide to convert it further, and
remove the code that 'linearizes' SKBs in order to present them to the
compression API as a contiguous range.

Patch #21 converts the acomp-to-scomp adaptation layer so it no longer
requires per-CPU scratch buffers. This takes advantage of the fact that
all existing users of the acomp API pass contiguous memory regions, and
so scratch buffers are only needed in exceptional cases, and can be
allocated and deallocated on the fly. This removes the need for
preallocated per-CPU scratch buffers that can easily add up to tens of
megabytes on modern systems with high core counts and SMT.

These changes have been build tested and only lightly runtime tested. In
particular, I haven't performed any thorough testing on the acomp
conversions of IPcomp, UBIFS and ZRAM. Any hints on which respective
methods and test cases to use here are highly appreciated.

Cc: Herbert Xu 
Cc: Eric Biggers 
Cc: Kees Cook 
Cc: Haren Myneni 
Cc: Nick Terrell 
Cc: Minchan Kim 
Cc: Sergey Senozhatsky 
Cc: Jens Axboe 
Cc: Giovanni Cabiddu 
Cc: Richard Weinberger 
Cc: David Ahern 
Cc: Eric Dumazet 
Cc: Jakub Kicinski 
Cc: Paolo Abeni 
Cc: Steffen Klassert 
Cc: linux-cry...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-bl...@vger.kernel.org
Cc: qat-li...@intel.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-...@lists.infradead.org
Cc: net...@vger.kernel.org

Ard Biesheuvel (21):
  crypto: scomp - Revert "add support for deflate rfc1950 (zlib)"
  crypto: qat - Drop support for allocating destination buffers
  crypto: acompress - Drop destination scatterlist allocation feature
  net: ipcomp: Migrate to acomp API from deprecated comp API
  ubifs: Pass worst-case buffer size to compression routines
  ubifs: Avoid allocating buffer space unnecessarily
  ubifs: Migrate to acomp compression API
  zram: Migrate to acomp compression API
  crypto: nx - Migrate to scomp API
  crypto: 842 - drop obsolete 'comp' implementation
  crypto: deflate - drop obsolete 'comp' implementation
  crypto: lz4 - drop obsolete 'comp' implementation
  crypto: lz4hc - drop obsolete 'comp' implementation
  crypto: lzo-rle - drop obsolete 'comp' implementation
  crypto: lzo - drop obsolete 'comp' implementation
  crypto: zstd - drop obsolete 'comp' implementation
  crypto: cavium/zip - drop obsolete 'comp' implementation
  crypto: compress_null - drop obsolete 'comp' implementation
  crypto: remove obsolete 'comp' compression API
  crypto: deflate - implement acomp API directly
  crypto: scompress - Drop the use of per-cpu scratch buffers

 Documentation/crypto/architecture.rst   |   2 -
 crypto/842.c|  63 +---
 crypto/Makefile |   2 +-
 crypto/acompress.c  |   6 -
 crypto/api.c|   4 -
 crypto/compress.c   |  32 --
 crypto/crypto_null.c|  31 +-
 crypto/crypto_user_base.c   |  16 -
 

Re: [PATCH] powerpc/build: vdso linker warning for orphan sections

2023-07-18 Thread Michael Ellerman
John Ogness  writes:
> Hi Nicholas,
>
> On 2023-06-09, Nicholas Piggin  wrote:
>> Add --orphan-handling for vdsos, and adjust vdso linker scripts to deal
>> with orphan sections.
>
> I'm reporting that I am getting a linker warning with 6.5-rc2. The
> warning message is:
>
> ld: warning: discarding dynamic section .rela.opd
>
> and bisects to:
>
> 8ad57add77d3 ("powerpc/build: vdso linker warning for orphan sections")
>
> Despite the warning, my ppc64 system seems to run fine. Let me know if
> you need any other information from me.

We already discard .opd and .rela*, so I guess we should also be
discarding .rela.opd.

Can you test with a newer compiler/binutils?

cheers


Re: linux-next: Tree for Jul 13 (drivers/video/fbdev/ps3fb.c)

2023-07-18 Thread Linux regression tracking (Thorsten Leemhuis)
Michael, thx for looking into this!

On 18.07.23 13:48, Michael Ellerman wrote:
> Bagas Sanjaya  writes:
>> On Thu, Jul 13, 2023 at 09:11:10AM -0700, Randy Dunlap wrote:
>>> on ppc64:
>>>
>>> In file included from ../include/linux/device.h:15,
>>>  from ../arch/powerpc/include/asm/io.h:22,
>>>  from ../include/linux/io.h:13,
>>>  from ../include/linux/irq.h:20,
>>>  from ../arch/powerpc/include/asm/hardirq.h:6,
>>>  from ../include/linux/hardirq.h:11,
>>>  from ../include/linux/interrupt.h:11,
>>>  from ../drivers/video/fbdev/ps3fb.c:25:
>>> ../drivers/video/fbdev/ps3fb.c: In function 'ps3fb_probe':
>>> ../drivers/video/fbdev/ps3fb.c:1172:40: error: 'struct fb_info' has no 
>>> member named 'dev'
>>>  1172 |  dev_driver_string(info->dev), dev_name(info->dev),
>>>   |^~
>>> ../include/linux/dev_printk.h:110:37: note: in definition of macro 
>>> 'dev_printk_index_wrap'
>>>   110 | _p_func(dev, fmt, ##__VA_ARGS__);   
>>> \
>>>   | ^~~
>>> ../drivers/video/fbdev/ps3fb.c:1171:9: note: in expansion of macro 
>>> 'dev_info'
>>>  1171 | dev_info(info->device, "%s %s, using %u KiB of video 
>>> memory\n",
>>>   | ^~~~
>>> ../drivers/video/fbdev/ps3fb.c:1172:61: error: 'struct fb_info' has no 
>>> member named 'dev'
>>>  1172 |  dev_driver_string(info->dev), dev_name(info->dev),
>>>   | ^~
>>> ../include/linux/dev_printk.h:110:37: note: in definition of macro 
>>> 'dev_printk_index_wrap'
>>>   110 | _p_func(dev, fmt, ##__VA_ARGS__);   
>>> \
>>>   | ^~~
>>> ../drivers/video/fbdev/ps3fb.c:1171:9: note: in expansion of macro 
>>> 'dev_info'
>>>  1171 | dev_info(info->device, "%s %s, using %u KiB of video 
>>> memory\n",
>>>   | ^~~~
>>>
>>>
>>
>> Hmm, there is no response from Thomas yet. I guess we should go with
>> reverting bdb616479eff419, right? Regardless, I'm adding this build 
>> regression
>> to regzbot so that parties involved are aware of it:
>>
>> #regzbot ^introduced: bdb616479eff419
>> #regzbot title: build regression in PS3 framebuffer
> 
> Does regzbot track issues in linux-next?

It can, I made sure of that in case somebody wants to use this sooner or
later (and it wasn't much work), but I don't actively use this
functionality right now and do not plan to do so; there are more
important issues to spend time on.

> They're not really regressions because they're not in a release yet.
> 
> Anyway I don't see where bdb616479eff419 comes from.

That makes two of us :-D

> The issue was introduced by:
> 
>   701d2054fa31 fbdev: Make support for userspace interfaces configurable

Ahh, that makes a lot more sense. While at it, let me tell regzbot:

#regzbot introduced: 701d2054fa31

Ciao, Thorsten


Re: linux-next: Tree for Jul 13 (drivers/video/fbdev/ps3fb.c)

2023-07-18 Thread Michael Ellerman
Bagas Sanjaya  writes:
> On Thu, Jul 13, 2023 at 09:11:10AM -0700, Randy Dunlap wrote:
>> on ppc64:
>> 
>> In file included from ../include/linux/device.h:15,
>>  from ../arch/powerpc/include/asm/io.h:22,
>>  from ../include/linux/io.h:13,
>>  from ../include/linux/irq.h:20,
>>  from ../arch/powerpc/include/asm/hardirq.h:6,
>>  from ../include/linux/hardirq.h:11,
>>  from ../include/linux/interrupt.h:11,
>>  from ../drivers/video/fbdev/ps3fb.c:25:
>> ../drivers/video/fbdev/ps3fb.c: In function 'ps3fb_probe':
>> ../drivers/video/fbdev/ps3fb.c:1172:40: error: 'struct fb_info' has no 
>> member named 'dev'
>>  1172 |  dev_driver_string(info->dev), dev_name(info->dev),
>>   |^~
>> ../include/linux/dev_printk.h:110:37: note: in definition of macro 
>> 'dev_printk_index_wrap'
>>   110 | _p_func(dev, fmt, ##__VA_ARGS__);
>>\
>>   | ^~~
>> ../drivers/video/fbdev/ps3fb.c:1171:9: note: in expansion of macro 'dev_info'
>>  1171 | dev_info(info->device, "%s %s, using %u KiB of video 
>> memory\n",
>>   | ^~~~
>> ../drivers/video/fbdev/ps3fb.c:1172:61: error: 'struct fb_info' has no 
>> member named 'dev'
>>  1172 |  dev_driver_string(info->dev), dev_name(info->dev),
>>   | ^~
>> ../include/linux/dev_printk.h:110:37: note: in definition of macro 
>> 'dev_printk_index_wrap'
>>   110 | _p_func(dev, fmt, ##__VA_ARGS__);
>>\
>>   | ^~~
>> ../drivers/video/fbdev/ps3fb.c:1171:9: note: in expansion of macro 'dev_info'
>>  1171 | dev_info(info->device, "%s %s, using %u KiB of video 
>> memory\n",
>>   | ^~~~
>> 
>> 
>
> Hmm, there is no response from Thomas yet. I guess we should go with
> reverting bdb616479eff419, right? Regardless, I'm adding this build regression
> to regzbot so that parties involved are aware of it:
>
> #regzbot ^introduced: bdb616479eff419
> #regzbot title: build regression in PS3 framebuffer

Does regzbot track issues in linux-next?

They're not really regressions because they're not in a release yet.

Anyway I don't see where bdb616479eff419 comes from.

The issue was introduced by:

  701d2054fa31 fbdev: Make support for userspace interfaces configurable

The driver seems to only use info->dev in that one dev_info() line,
which seems purely cosmetic, so I think it could just be removed, eg:

diff --git a/drivers/video/fbdev/ps3fb.c b/drivers/video/fbdev/ps3fb.c
index d4abcf8aff75..a304a39d712b 100644
--- a/drivers/video/fbdev/ps3fb.c
+++ b/drivers/video/fbdev/ps3fb.c
@@ -1168,8 +1168,7 @@ static int ps3fb_probe(struct ps3_system_bus_device *dev)
 
ps3_system_bus_set_drvdata(dev, info);
 
-   dev_info(info->device, "%s %s, using %u KiB of video memory\n",
-dev_driver_string(info->dev), dev_name(info->dev),
+   dev_info(info->device, "using %u KiB of video memory\n",
 info->fix.smem_len >> 10);
 
task = kthread_run(ps3fbd, info, DEVICE_NAME);


cheers


Re: [PATCH] ASoC: fsl_sai: Disable bit clock with transmitter

2023-07-18 Thread Mark Brown
On Wed, 12 Jul 2023 14:49:33 +0200, Matus Gajdos wrote:
> Otherwise bit clock remains running writing invalid data to the DAC.
> 
> 

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl_sai: Disable bit clock with transmitter
  commit: 269f399dc19f0e5c51711c3ba3bd06e0ef6ef403

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark



Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend

2023-07-18 Thread Bjorn Helgaas
[+cc Rafael]

On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote:
> PCIe services that share an IRQ with PME, such as AER or DPC, may cause a
> spurious wakeup on system suspend. To prevent this, disable the AER interrupt
> notification during the system suspend process.

I see that in this particular BZ dmesg log, PME, AER, and DPC do share
the same IRQ, but I don't think this is true in general.

Root Ports usually use MSI or MSI-X.  PME and hotplug events use the
Interrupt Message Number in the PCIe Capability, but AER uses the one
in the AER Root Error Status register, and DPC uses the one in the DPC
Capability register.  Those potentially correspond to three distinct
MSI/MSI-X vectors.

I think this probably has nothing to do with the IRQ being *shared*,
but just that putting the downstream component into D3cold, where the
link state is L3, may cause the upstream component to log and signal a
link-related error as the link goes completely down.

I don't think D0-D3hot should be relevant here because in all those
states, the link should be active because the downstream config space
remains accessible.  So I'm not sure if it's possible, but I wonder if
there's a more targeted place we could do this, e.g., in the path that
puts downstream devices in D3cold.

> As per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management",
> TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2
> (D3cold with aux power) and L3 (D3cold) states. So disabling the AER
> notification during suspend and re-enabling them during the resume process
> should not affect the basic functionality.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> Reviewed-by: Mika Westerberg 
> Signed-off-by: Kai-Heng Feng 
> ---
> v6:
> v5:
>  - Wording.
> 
> v4:
> v3:
>  - No change.
> 
> v2:
>  - Only disable AER IRQ.
>  - No more check on PME IRQ#.
>  - Use helper.
> 
>  drivers/pci/pcie/aer.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1420e1f27105..9c07fdbeb52d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev)
>   return 0;
>  }
>  
> +static int aer_suspend(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + aer_disable_irq(pdev);
> +
> + return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> + struct pci_dev *pdev = rpc->rpd;
> +
> + aer_enable_irq(pdev);
> +
> + return 0;
> +}
> +
>  /**
>   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
>   * @dev: pointer to Root Port, RCEC, or RCiEP
> @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = {
>   .service= PCIE_PORT_SERVICE_AER,
>  
>   .probe  = aer_probe,
> + .suspend= aer_suspend,
> + .resume = aer_resume,
>   .remove = aer_remove,
>  };
>  
> -- 
> 2.34.1
> 


Re: [PATCH v4 11/18] media: Remove flag FBINFO_FLAG_DEFAULT from fbdev drivers

2023-07-18 Thread Hans Verkuil
Hi Thomas,

On 15/07/2023 20:51, Thomas Zimmermann wrote:
> The flag FBINFO_FLAG_DEFAULT is 0 and has no effect, as struct
> fbinfo.flags has been allocated to zero by kzalloc(). So do not
> set it.
> 
> Flags should signal differences from the default values. After cleaning
> up all occurrences of FBINFO_DEFAULT, the token will be removed.
> 
> v2:
>   * fix commit message (Miguel)
> 
> Signed-off-by: Thomas Zimmermann 
> Acked-by: Sam Ravnborg 
> Cc: Andy Walls 
> Cc: Mauro Carvalho Chehab 
> Cc: Hans Verkuil 
> ---
>  drivers/media/pci/ivtv/ivtvfb.c  | 1 -
>  drivers/media/test-drivers/vivid/vivid-osd.c | 1 -
>  2 files changed, 2 deletions(-)

I can take these patches for 6.6, unless you prefer to have the whole series
merged in one go?

In that case you can use my:

Reviewed-by: Hans Verkuil 

Regards,

Hans

> 
> diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
> index 0aeb9daaee4c..23c8c094e791 100644
> --- a/drivers/media/pci/ivtv/ivtvfb.c
> +++ b/drivers/media/pci/ivtv/ivtvfb.c
> @@ -1048,7 +1048,6 @@ static int ivtvfb_init_vidmode(struct ivtv *itv)
>   /* Generate valid fb_info */
>  
>   oi->ivtvfb_info.node = -1;
> - oi->ivtvfb_info.flags = FBINFO_FLAG_DEFAULT;
>   oi->ivtvfb_info.par = itv;
>   oi->ivtvfb_info.var = oi->ivtvfb_defined;
>   oi->ivtvfb_info.fix = oi->ivtvfb_fix;
> diff --git a/drivers/media/test-drivers/vivid/vivid-osd.c b/drivers/media/test-drivers/vivid/vivid-osd.c
> index ec25edc679b3..051f1805a16d 100644
> --- a/drivers/media/test-drivers/vivid/vivid-osd.c
> +++ b/drivers/media/test-drivers/vivid/vivid-osd.c
> @@ -310,7 +310,6 @@ static int vivid_fb_init_vidmode(struct vivid_dev *dev)
>   /* Generate valid fb_info */
>  
>   dev->fb_info.node = -1;
> - dev->fb_info.flags = FBINFO_FLAG_DEFAULT;
>   dev->fb_info.par = dev;
>   dev->fb_info.var = dev->fb_defined;
>   dev->fb_info.fix = dev->fb_fix;



Re: [PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

2023-07-18 Thread Aneesh Kumar K.V
Hugh Dickins  writes:

> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
> in assert_pte_locked().  BUG if pte_offset_map_nolock() fails: this is
> stricter than the previous implementation, which skipped when pmd_none()
> (with a comment on khugepaged collapse transitions): but wouldn't we want
> to know, if an assert_pte_locked() caller can be racing such transitions?
>

The reason we had that pmd_none check there was to handle khugepaged. In
the case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
ppc64 had the assert_pte_locked check inside that ptep_clear.

_pmd = pmdp_collapse_flush(vma, address, pmd);
..
ptep_clear()
-> assert_pte_locked()
---> pmd_none
-> BUG


The problem is how assert_pte_locked() verifies that we are holding the
ptl. It does that by walking the page table again, and in this specific
case, by the time we call the function we have already cleared the pmd.
>
> This mod might cause new crashes: which either expose my ignorance, or
> indicate issues to be fixed, or limit the usage of assert_pte_locked().
>
> Signed-off-by: Hugh Dickins 
> ---
>  arch/powerpc/mm/pgtable.c | 16 ++--
>  1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index cb2dcdb18f8e..16b061af86d7 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned 
> long addr)
>   p4d_t *p4d;
>   pud_t *pud;
>   pmd_t *pmd;
> + pte_t *pte;
> + spinlock_t *ptl;
>  
if (mm == &init_mm)
>   return;
> @@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned 
> long addr)
>   pud = pud_offset(p4d, addr);
>   BUG_ON(pud_none(*pud));
>   pmd = pmd_offset(pud, addr);
> - /*
> -  * khugepaged to collapse normal pages to hugepage, first set
> -  * pmd to none to force page fault/gup to take mmap_lock. After
> -  * pmd is set to none, we do a pte_clear which does this assertion
> -  * so if we find pmd none, return.
> -  */
> - if (pmd_none(*pmd))
> - return;
> - BUG_ON(!pmd_present(*pmd));
> - assert_spin_locked(pte_lockptr(mm, pmd));
> + pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
> + BUG_ON(!pte);
> + assert_spin_locked(ptl);
> + pte_unmap(pte);
>  }
>  #endif /* CONFIG_DEBUG_VM */
>  
> -- 
> 2.35.3


Re: linux-next: Tree for Jul 13 (drivers/video/fbdev/ps3fb.c)

2023-07-18 Thread Thorsten Leemhuis
On 18.07.23 05:32, Bagas Sanjaya wrote:
> On Thu, Jul 13, 2023 at 09:11:10AM -0700, Randy Dunlap wrote:
>> On 7/12/23 19:37, Stephen Rothwell wrote:
>>> Changes since 20230712:
>>
>> on ppc64:
>>
>> In file included from ../include/linux/device.h:15,
>>  from ../arch/powerpc/include/asm/io.h:22,
>>  from ../include/linux/io.h:13,
>>  from ../include/linux/irq.h:20,
>>  from ../arch/powerpc/include/asm/hardirq.h:6,
>>  from ../include/linux/hardirq.h:11,
>>  from ../include/linux/interrupt.h:11,
>>  from ../drivers/video/fbdev/ps3fb.c:25:
>> ../drivers/video/fbdev/ps3fb.c: In function 'ps3fb_probe':
>> ../drivers/video/fbdev/ps3fb.c:1172:40: error: 'struct fb_info' has no 
>> member named 'dev'
>>  1172 |  dev_driver_string(info->dev), dev_name(info->dev),
>>   |^~
>> ../include/linux/dev_printk.h:110:37: note: in definition of macro 
>> 'dev_printk_index_wrap'
>>   110 | _p_func(dev, fmt, ##__VA_ARGS__);
>>\
>>   | ^~~
>> ../drivers/video/fbdev/ps3fb.c:1171:9: note: in expansion of macro 'dev_info'
>>  1171 | dev_info(info->device, "%s %s, using %u KiB of video 
>> memory\n",
>>   | ^~~~
>> ../drivers/video/fbdev/ps3fb.c:1172:61: error: 'struct fb_info' has no 
>> member named 'dev'
>>  1172 |  dev_driver_string(info->dev), dev_name(info->dev),
>>   | ^~
>> ../include/linux/dev_printk.h:110:37: note: in definition of macro 
>> 'dev_printk_index_wrap'
>>   110 | _p_func(dev, fmt, ##__VA_ARGS__);
>>\
>>   | ^~~
>> ../drivers/video/fbdev/ps3fb.c:1171:9: note: in expansion of macro 'dev_info'
>>  1171 | dev_info(info->device, "%s %s, using %u KiB of video 
>> memory\n",
>>   | ^~~~
> 
> Hmm, there is no response from Thomas yet. I guess we should go with
> reverting bdb616479eff419, right?

I'm missing something here:

* What makes you think this is caused by bdb616479eff419? I didn't see
anything in the thread that claims this, but I might be missing something.
* Related: if I understand Randy right, this is only happening in -next;
so why would bdb616479eff419 be the culprit, when it has been in mainline
since the end of June?

And asking for a revert already is a bit jumping the gun; sure, it would
be good to get this fixed, but remember: developers have a lot on their
plate and thus sometimes are forced to set priorities; they also
sometimes go on vacation or are afk for other reasons; and sometimes
they just miss a mail or two. Those are just a few good reasons why
Thomas might not have looked into this yet, hence please first ask
really kindly before requesting a revert.

Ciao, Thorsten


[PATCH] platforms: powermac: insert space before the open parenthesis '('

2023-07-18 Thread hanyu001

Fixes checkpatch errors:

/platforms/powermac/low_i2c.c:55:ERROR: space required before the open 
parenthesis '('
/platforms/powermac/low_i2c.c:63:ERROR: space required before the open 
parenthesis '('


Signed-off-by: Yu Han 
---
 arch/powerpc/platforms/powermac/low_i2c.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/low_i2c.c b/arch/powerpc/platforms/powermac/low_i2c.c
index 40f3aa432fba..25cc6eec962f 100644
--- a/arch/powerpc/platforms/powermac/low_i2c.c
+++ b/arch/powerpc/platforms/powermac/low_i2c.c
@@ -52,7 +52,7 @@
 #ifdef DEBUG
 #define DBG(x...) do {\
 printk(KERN_DEBUG "low_i2c:" x);\
-} while(0)
+} while (0)
 #else
 #define DBG(x...)
 #endif
@@ -60,7 +60,7 @@
 #ifdef DEBUG_LOW
 #define DBG_LOW(x...) do {\
 printk(KERN_DEBUG "low_i2c:" x);\
-} while(0)
+} while (0)
 #else
 #define DBG_LOW(x...)
 #endif


[PATCH] platforms: powermac: insert space before the open parenthesis '('

2023-07-18 Thread hanyu001

Fixes checkpatch error:

/powerpc/platforms/powermac/setup.c:222:ERROR: space required before the 
open parenthesis '('


Signed-off-by: Yu Han 
---
 arch/powerpc/platforms/powermac/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c
index 0c41f4b005bc..a89f3022f3a8 100644
--- a/arch/powerpc/platforms/powermac/setup.c
+++ b/arch/powerpc/platforms/powermac/setup.c
@@ -219,7 +219,7 @@ static void __init ohare_init(void)
 sysctrl_regs[4] |= 0x0420;
 else
 sysctrl_regs[4] |= 0x0400;
-if(has_l2cache)
+if (has_l2cache)
 printk(KERN_INFO "Level 2 cache enabled\n");
 }
 }


[PATCH] powerpc: platforms: chrp: Add require space after that ','

2023-07-18 Thread hanyu001

Fixes checkpatch errors:

./arch/powerpc/platforms/chrp/time.c:109: ERROR: space required after 
that ',' (ctx:VxV)
./arch/powerpc/platforms/chrp/time.c:110: ERROR: space required after 
that ',' (ctx:VxV)
./arch/powerpc/platforms/chrp/time.c:111: ERROR: space required after 
that ',' (ctx:VxV)
./arch/powerpc/platforms/chrp/time.c:112: ERROR: space required after 
that ',' (ctx:VxV)
./arch/powerpc/platforms/chrp/time.c:113: ERROR: space required after 
that ',' (ctx:VxV)
./arch/powerpc/platforms/chrp/time.c:114: ERROR: space required after 
that ',' (ctx:VxV)


Signed-off-by: Yu Han 
---
 arch/powerpc/platforms/chrp/time.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/chrp/time.c b/arch/powerpc/platforms/chrp/time.c
index d46417e3d8e0..6bd40be22c33 100644
--- a/arch/powerpc/platforms/chrp/time.c
+++ b/arch/powerpc/platforms/chrp/time.c
@@ -106,12 +106,12 @@ int chrp_set_rtc_time(struct rtc_time *tmarg)
 tm.tm_mday = bin2bcd(tm.tm_mday);
 tm.tm_year = bin2bcd(tm.tm_year);
 }
-chrp_cmos_clock_write(tm.tm_sec,RTC_SECONDS);
-chrp_cmos_clock_write(tm.tm_min,RTC_MINUTES);
-chrp_cmos_clock_write(tm.tm_hour,RTC_HOURS);
-chrp_cmos_clock_write(tm.tm_mon,RTC_MONTH);
-chrp_cmos_clock_write(tm.tm_mday,RTC_DAY_OF_MONTH);
-chrp_cmos_clock_write(tm.tm_year,RTC_YEAR);
+chrp_cmos_clock_write(tm.tm_sec, RTC_SECONDS);
+chrp_cmos_clock_write(tm.tm_min, RTC_MINUTES);
+chrp_cmos_clock_write(tm.tm_hour, RTC_HOURS);
+chrp_cmos_clock_write(tm.tm_mon, RTC_MONTH);
+chrp_cmos_clock_write(tm.tm_mday, RTC_DAY_OF_MONTH);
+chrp_cmos_clock_write(tm.tm_year, RTC_YEAR);

 /* The following flags have to be released exactly in this order,
  * otherwise the DS12887 (popular MC146818A clone with integrated


Re: [PATCH] net: Explicitly include correct DT includes

2023-07-18 Thread Paolo Abeni
Hi,

On Sat, 2023-07-15 at 10:11 -0500, Alex Elder wrote:
> On 7/14/23 12:48 PM, Rob Herring wrote:
> > The DT of_device.h and of_platform.h date back to the separate
> > of_platform_bus_type before it as merged into the regular platform bus.
> > As part of that merge prepping Arm DT support 13 years ago, they
> > "temporarily" include each other. They also include platform_device.h
> > and of.h. As a result, there's a pretty much random mix of those include
> > files used throughout the tree. In order to detangle these headers and
> > replace the implicit includes with struct declarations, users need to
> > explicitly include the correct includes.
> > 
> > Signed-off-by: Rob Herring 
> 
> (I significantly reduced the addressee list to permit the message
> to be sent.)
> 
> For "drivers/net/ipa/ipa_main.c":
> 
> Acked-by: Alex Elder 

The patch does not apply cleanly to net-next. Rob, could you please re-
spin it? While at that, have you considered splitting it in a few
smaller patches (e.g. can, dsa, freescale, ibm, marvel, mediatek,
stmmicro,  sun, ti, xilinx, wireless, remaining)?

Thanks!

Paolo



[PATCH] platforms: chrp: Add require space after that ','

2023-07-18 Thread hanyu001

Fixes checkpatch errors:

./arch/powerpc/platforms/chrp/setup.c:91: ERROR: space required after 
that ',' (ctx:VxV)
./arch/powerpc/platforms/chrp/setup.c:91: ERROR: space required after 
that ',' (ctx:VxV)


Signed-off-by: Yu Han 
---
 arch/powerpc/platforms/chrp/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/chrp/setup.c b/arch/powerpc/platforms/chrp/setup.c
index 36ee3a5056a1..f8f06413bf23 100644
--- a/arch/powerpc/platforms/chrp/setup.c
+++ b/arch/powerpc/platforms/chrp/setup.c
@@ -88,7 +88,7 @@ static const char *gg2_cachemodes[4] = {

 static const char *chrp_names[] = {
 "Unknown",
-"","","",
+"", "", "",
 "Motorola",
 "IBM or Longtrail",
 "Genesi Pegasos",


[PATCH] platforms: 52xx: Remove space after '(' and before ')'

2023-07-18 Thread hanyu001

The patch fixes the following errors detected by checkpatch:

platforms/52xx/mpc52xx_pci.c:346:ERROR: space prohibited after that open 
parenthesis '('
platforms/52xx/mpc52xx_pci.c:347:ERROR: space prohibited after that open 
parenthesis '('
platforms/52xx/mpc52xx_pci.c:348:ERROR: space prohibited before that 
close parenthesis ')'


Signed-off-by: Yu Han 
---
 arch/powerpc/platforms/52xx/mpc52xx_pci.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/mpc52xx_pci.c b/arch/powerpc/platforms/52xx/mpc52xx_pci.c
index 0ca4401ba781..452723f8ba53 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_pci.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_pci.c
@@ -343,9 +343,9 @@ mpc52xx_pci_fixup_resources(struct pci_dev *dev)

 /* The PCI Host bridge of MPC52xx has a prefetch memory resource
fixed to 1Gb. Doesn't fit in the resource system so we remove it */

-if ( (dev->vendor == PCI_VENDOR_ID_MOTOROLA) &&
- (   dev->device == PCI_DEVICE_ID_MOTOROLA_MPC5200
-  || dev->device == PCI_DEVICE_ID_MOTOROLA_MPC5200B) ) {
+if ((dev->vendor == PCI_VENDOR_ID_MOTOROLA) &&
+ (dev->device == PCI_DEVICE_ID_MOTOROLA_MPC5200
+  || dev->device == PCI_DEVICE_ID_MOTOROLA_MPC5200B)) {
struct resource *res = &dev->resource[1];
 res->start = res->end = res->flags = 0;
 }


[PATCH] powerpc: platforms: ps3: Add require space after that ';'

2023-07-18 Thread hanyu001

Fixes checkpatch errors:

./arch/powerpc/platforms/ps3/platform.h:198: ERROR: space required after 
that ';' (ctx:VxV)
./arch/powerpc/platforms/ps3/platform.h:200: ERROR: space required after 
that ';' (ctx:VxV)
./arch/powerpc/platforms/ps3/platform.h:202: ERROR: space required after 
that ';' (ctx:VxV)
./arch/powerpc/platforms/ps3/platform.h:204: ERROR: space required after 
that ';' (ctx:VxV)
./arch/powerpc/platforms/ps3/platform.h:206: ERROR: space required after 
that ';' (ctx:VxV)


Signed-off-by: Yu Han 
---
 arch/powerpc/platforms/ps3/platform.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/platform.h b/arch/powerpc/platforms/ps3/platform.h
index 6beecdb0d51f..715167ab7348 100644
--- a/arch/powerpc/platforms/ps3/platform.h
+++ b/arch/powerpc/platforms/ps3/platform.h
@@ -195,15 +195,15 @@ int ps3_repository_write_highmem_info(unsigned int region_index,
 
 int ps3_repository_delete_highmem_info(unsigned int region_index);
 #else
 static inline int ps3_repository_write_highmem_region_count(
-unsigned int region_count) {return 0;}
+unsigned int region_count) {return 0; }
 static inline int ps3_repository_write_highmem_base(unsigned int region_index,
-u64 highmem_base) {return 0;}
+u64 highmem_base) {return 0; }
 static inline int ps3_repository_write_highmem_size(unsigned int region_index,
-u64 highmem_size) {return 0;}
+u64 highmem_size) {return 0; }
 static inline int ps3_repository_write_highmem_info(unsigned int region_index,
-u64 highmem_base, u64 highmem_size) {return 0;}
+u64 highmem_base, u64 highmem_size) {return 0; }
 static inline int ps3_repository_delete_highmem_info(unsigned int region_index)
-{return 0;}
+{return 0; }
 #endif

 /* repository pme info */


[PATCH] powerpc: platforms: ps3: insert space before the open parenthesis '('

2023-07-18 Thread hanyu001

Fixes checkpatch error:

powerpc/platforms/ps3/os-area.c:782:ERROR: space required before the 
open parenthesis '('


Signed-off-by: Yu Han 
---
 arch/powerpc/platforms/ps3/os-area.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/ps3/os-area.c b/arch/powerpc/platforms/ps3/os-area.c
index b384cd2d6b99..237a76d1c92f 100644
--- a/arch/powerpc/platforms/ps3/os-area.c
+++ b/arch/powerpc/platforms/ps3/os-area.c
@@ -779,7 +779,7 @@ void __init ps3_os_area_init(void)
os_area_get_property(node, &prop_av_multi_out);
 }

-if(!saved_params.rtc_diff)
+if (!saved_params.rtc_diff)
 saved_params.rtc_diff = SECONDS_FROM_1970_TO_2000;

 if (node) {


[PATCH] powerpc: platforms: insert space before the open parenthesis '('

2023-07-18 Thread hanyu001

Fixes checkpatch error:

arch/powerpc/platforms/ps3/setup.c:107:ERROR:space required before the 
open parenthesis '('


Signed-off-by: Yu Han 
---
 arch/powerpc/platforms/ps3/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/ps3/setup.c b/arch/powerpc/platforms/ps3/setup.c
index 5144f11359f7..3a22a26912db 100644
--- a/arch/powerpc/platforms/ps3/setup.c
+++ b/arch/powerpc/platforms/ps3/setup.c
@@ -104,7 +104,7 @@ static void ps3_panic(char *str)
 printk("\n");
 panic_flush_kmsg_end();

-while(1)
+while (1)
 lv1_pause(1);
 }


[PATCH 4/4] mmu_notifiers: Don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end()

2023-07-18 Thread Alistair Popple
Secondary TLBs are now invalidated from the architecture specific TLB
invalidation functions. Therefore there is no need to explicitly
notify or invalidate as part of the range end functions. This means we
can remove mmu_notifier_invalidate_range_only_end() and some of the
ptep_*_notify() functions.

Signed-off-by: Alistair Popple 
---
 include/linux/mmu_notifier.h | 56 +
 kernel/events/uprobes.c  |  2 +-
 mm/huge_memory.c | 25 ++---
 mm/hugetlb.c |  2 +-
 mm/memory.c  |  8 +
 mm/migrate_device.c  |  9 +-
 mm/mmu_notifier.c| 25 ++---
 mm/rmap.c| 42 +
 8 files changed, 14 insertions(+), 155 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index a4bc818..6e3c857 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -395,8 +395,7 @@ extern int __mmu_notifier_test_young(struct mm_struct *mm,
 extern void __mmu_notifier_change_pte(struct mm_struct *mm,
  unsigned long address, pte_t pte);
 extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r);
-extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r,
- bool only_end);
+extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r);
 extern void __mmu_notifier_arch_invalidate_secondary_tlbs(struct mm_struct *mm,
unsigned long start, unsigned long end);
 extern bool
@@ -481,14 +480,7 @@ mmu_notifier_invalidate_range_end(struct mmu_notifier_range *range)
might_sleep();
 
if (mm_has_notifiers(range->mm))
-   __mmu_notifier_invalidate_range_end(range, false);
-}
-
-static inline void
-mmu_notifier_invalidate_range_only_end(struct mmu_notifier_range *range)
-{
-   if (mm_has_notifiers(range->mm))
-   __mmu_notifier_invalidate_range_end(range, true);
+   __mmu_notifier_invalidate_range_end(range);
 }
 
 static inline void mmu_notifier_arch_invalidate_secondary_tlbs(struct mm_struct *mm,
@@ -582,45 +574,6 @@ static inline void mmu_notifier_range_init_owner(
__young;\
 })
 
-#defineptep_clear_flush_notify(__vma, __address, __ptep)   
\
-({ \
-   unsigned long ___addr = __address & PAGE_MASK;  \
-   struct mm_struct *___mm = (__vma)->vm_mm;   \
-   pte_t ___pte;   \
-   \
-   ___pte = ptep_clear_flush(__vma, __address, __ptep);\
-   mmu_notifier_arch_invalidate_secondary_tlbs(___mm, ___addr,	\
-   ___addr + PAGE_SIZE);   \
-   \
-   ___pte; \
-})
-
-#define pmdp_huge_clear_flush_notify(__vma, __haddr, __pmd)\
-({ \
-   unsigned long ___haddr = __haddr & HPAGE_PMD_MASK;  \
-   struct mm_struct *___mm = (__vma)->vm_mm;   \
-   pmd_t ___pmd;   \
-   \
-   ___pmd = pmdp_huge_clear_flush(__vma, __haddr, __pmd);  \
-   mmu_notifier_arch_invalidate_secondary_tlbs(___mm, ___haddr,	\
- ___haddr + HPAGE_PMD_SIZE);   \
-   \
-   ___pmd; \
-})
-
-#define pudp_huge_clear_flush_notify(__vma, __haddr, __pud)\
-({ \
-   unsigned long ___haddr = __haddr & HPAGE_PUD_MASK;  \
-   struct mm_struct *___mm = (__vma)->vm_mm;   \
-   pud_t ___pud;   \
-   \
-   ___pud = pudp_huge_clear_flush(__vma, __haddr, __pud);  \
-   mmu_notifier_arch_invalidate_secondary_tlbs(___mm, ___haddr,	\
- ___haddr + HPAGE_PUD_SIZE);   \
-   \
-   ___pud; \
-})
-
 /*
  * set_pte_at_notify() sets the pte _after_ running the notifier.
  * This is safe to start by updating the 

[PATCH 3/4] mmu_notifiers: Call arch_invalidate_secondary_tlbs() when invalidating TLBs

2023-07-18 Thread Alistair Popple
The arch_invalidate_secondary_tlbs() is an architecture specific mmu
notifier used to keep the TLB of secondary MMUs such as an IOMMU in
sync with the CPU page tables. Currently it is called from code paths
separate from the main CPU TLB invalidations. This can lead to a
secondary TLB not being invalidated when required and makes it hard
to reason about when exactly the secondary TLB is invalidated.

To fix this move the notifier call to the architecture specific TLB
maintenance functions for architectures that have secondary MMUs
requiring explicit software invalidations.

This fixes an SMMU bug on ARM64. On ARM64, PTE permission upgrades
require a TLB invalidation. This invalidation is done by the
architecture specific ptep_set_access_flags(), which calls
flush_tlb_page() if required. However this doesn't call the notifier,
resulting in infinite faults being generated by devices using the SMMU
if it has previously cached a read-only PTE in its TLB.

Moving the invalidations into the TLB invalidation functions ensures
all invalidations happen at the same time as the CPU invalidation. The
architecture specific flush_tlb_all() routines do not call the
notifier as none of the IOMMUs require this.

Signed-off-by: Alistair Popple 
Suggested-by: Jason Gunthorpe 
---
 arch/arm64/include/asm/tlbflush.h | 5 +
 arch/powerpc/include/asm/book3s/64/tlbflush.h | 1 +
 arch/powerpc/mm/book3s64/radix_hugetlbpage.c  | 1 +
 arch/powerpc/mm/book3s64/radix_tlb.c  | 6 ++
 arch/x86/mm/tlb.c | 2 ++
 include/asm-generic/tlb.h | 1 -
 6 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 412a3b9..386f0f7 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -252,6 +253,7 @@ static inline void flush_tlb_mm(struct mm_struct *mm)
__tlbi(aside1is, asid);
__tlbi_user(aside1is, asid);
dsb(ish);
+   mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
 }
 
 static inline void flush_tlb_page_nosync(struct vm_area_struct *vma,
@@ -263,6 +265,8 @@ static inline void flush_tlb_page_nosync(struct vm_area_struct *vma,
addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm));
__tlbi(vale1is, addr);
__tlbi_user(vale1is, addr);
+   mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, uaddr & PAGE_MASK,
+   (uaddr & PAGE_MASK) + PAGE_SIZE);
 }
 
 static inline void flush_tlb_page(struct vm_area_struct *vma,
@@ -358,6 +362,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
scale++;
}
dsb(ish);
+   mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
 }
 
 static inline void flush_tlb_range(struct vm_area_struct *vma,
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index 0d0c144..dca0477 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -5,6 +5,7 @@
 #define MMU_NO_CONTEXT ~0UL
 
 #include 
+#include 
 #include 
 #include 
 
diff --git a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c b/arch/powerpc/mm/book3s64/radix_hugetlbpage.c
index 5e31955..17075c7 100644
--- a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c
+++ b/arch/powerpc/mm/book3s64/radix_hugetlbpage.c
@@ -39,6 +39,7 @@ void radix__flush_hugetlb_tlb_range(struct vm_area_struct *vma, unsigned long st
radix__flush_tlb_pwc_range_psize(vma->vm_mm, start, end, psize);
else
radix__flush_tlb_range_psize(vma->vm_mm, start, end, psize);
+   mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
 }
 
 void radix__huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 0bd4866..64c11a4 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -752,6 +752,8 @@ void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmadd
return radix__local_flush_hugetlb_page(vma, vmaddr);
 #endif
 	radix__local_flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
+   mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, vmaddr,
+   vmaddr + mmu_virtual_psize);
 }
 EXPORT_SYMBOL(radix__local_flush_tlb_page);
 
@@ -987,6 +989,7 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
}
}
preempt_enable();
+   mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
 }
 EXPORT_SYMBOL(radix__flush_tlb_mm);
 
@@ -1020,6 +1023,7 @@ static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
_tlbiel_pid_multicast(mm, 

[PATCH 2/4] arm64/smmu: Use TLBI ASID when invalidating entire range

2023-07-18 Thread Alistair Popple
The ARM SMMU has a specific command for invalidating the TLB for an
entire ASID. Currently this is used for the IO_PGTABLE API but not for
ATS when called from the MMU notifier.

The current implementation of notifiers does not attempt to invalidate
such a large address range; instead each VMA is walked and each range
invalidated individually during mmap removal. However, in future, SMMU
TLB invalidations are going to be sent as part of the normal
flush_tlb_*() kernel calls. To better deal with that, add handling to
use TLBI ASID when invalidating the entire address space.

Signed-off-by: Alistair Popple 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index aa63cff..dbc812a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -201,10 +201,20 @@ static void arm_smmu_mm_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
 * range. So do a simple translation here by calculating size correctly.
 */
size = end - start;
+   if (size == ULONG_MAX)
+   size = 0;
+
+   if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) {
+   if (!size)
+   arm_smmu_tlb_inv_asid(smmu_domain->smmu,
+ smmu_mn->cd->asid);
+   else
+   arm_smmu_tlb_inv_range_asid(start, size,
+   smmu_mn->cd->asid,
+   PAGE_SIZE, false,
+   smmu_domain);
+   }
 
-   if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
-   arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
-   PAGE_SIZE, false, smmu_domain);
arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
 }
 
-- 
git-series 0.9.1

