Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-13 Thread Davidlohr Bueso

On Fri, 09 Sep 2022, Jonathan Cameron wrote:


> On Thu, 8 Sep 2022 16:22:26 -0700
> Dan Williams wrote:
>
> > Andrew Morton wrote:
> > > On Thu, 8 Sep 2022 15:51:50 -0700 Dan Williams wrote:
> > >
> > > > Jonathan Cameron wrote:
> > > > > On Wed, 7 Sep 2022 18:07:31 -0700
> > > > > Dan Williams wrote:
> > > > >
> > > > > > Andrew Morton wrote:
> > > > > > > I really dislike the term "flush".  Sometimes it means writeback,
> > > > > > > sometimes it means invalidate.  Perhaps at other times it means
> > > > > > > both.
> > > > > > >
> > > > > > > Can we please be very clear in comments and changelogs about exactly
> > > > > > > what this "flush" does.  With bonus points for being more specific in the
> > > > > > > function naming?
> > > > > > >
> > > > > >
> > > > > > That's a good point, "flush" has been cargo-culted along in Linux's
> > > > > > cache management APIs to mean write-back-and-invalidate. In this case I
> > > > > > think this API is purely about invalidate. It just so happens that x86
> > > > > > has not historically had a global invalidate instruction readily
> > > > > > available which leads to the overuse of wbinvd.
> > > > > >
> > > > > > It would be nice to make clear that this API is purely about
> > > > > > invalidating any data cached for a physical address impacted by address
> > > > > > space management event (secure erase / new region provision). Write-back
> > > > > > is an unnecessary side-effect.
> > > > > >
> > > > > > So how about:
> > > > > >
> > > > > > s/arch_flush_memregion/cpu_cache_invalidate_memregion/?
> > > > >
> > > > > Want to indicate it 'might' write back perhaps?
> > > > > So could be invalidate or clean and invalidate (using arm ARM terms just to add
> > > > > to the confusion ;)
> > > > >
> > > > > Feels like there will be potential race conditions where that matters as we might
> > > > > force stale data to be written back.
> > > > >
> > > > > Perhaps a comment is enough for that. Anyone have the "famous last words" feeling?
> > > >
> > > > Is "invalidate" not clear that write-back is optional? Maybe not.
> > >
> > > Yes, I'd say that "invalidate" means "dirty stuff may or may not have
> > > been written back".  Ditto for invalidate_inode_pages2().
> > >
> > > > Also, I realized that we tried to include the address range to allow for
> > > > the possibility of flushing by virtual address range, but that
> > > > overcomplicates the use. I.e. if someone issues secure erase and the
> > > > region association is not established does that mean that the
> > > > cache invalidation is not needed? It could be the case that someone
> > > > disables a device, does the secure erase, and then reattaches to the
> > > > same region. The cache invalidation is needed, but at the time of the
> > > > secure erase the HPA was unknown.
> > > >
> > > > All this to say that I feel the bikeshedding will need to continue until
> > > > morale improves.
> > > >
> > > > I notice that the DMA API uses 'sync' to indicate, "make this memory
> > > > consistent/coherent for the CPU or the device", so how about an API like
> > > >
> > > > memregion_sync_for_cpu(int res_desc)
> > > >
> > > > ...where the @res_desc would be IORES_DESC_CXL for all CXL and
> > > > IORES_DESC_PERSISTENT_MEMORY for the current nvdimm use case.
> > >
> > > "sync" is another of my pet peeves ;) In filesystem land, at least.
> > > Does it mean "start writeback and return" or does it mean "start
> > > writeback, wait for its completion then return".
> >
> > Ok, no "sync" :).
> >
> > /**
> >  * cpu_cache_invalidate_memregion - drop any CPU cached data for
> >  * memregions described by @res_desc
> >  * @res_desc: one of the IORES_DESC_* types
> >  *
> >  * Perform cache maintenance after a memory event / operation that
> >  * changes the contents of physical memory in a cache-incoherent manner.
> >  * For example, memory-device secure erase, or provisioning new CXL
> >  * regions. This routine may or may not write back any dirty contents
> >  * while performing the invalidation.
> >  *
> >  * Returns 0 on success or negative error code on a failure to perform
> >  * the cache maintenance.
> >  */
> > int cpu_cache_invalidate_memregion(int res_desc)
>
>
> lgtm


Likewise, and I don't see anyone else objecting so I'll go ahead and send
a new iteration.

Thanks,
Davidlohr



Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-08 Thread Dan Williams
Andrew Morton wrote:
> On Thu, 8 Sep 2022 15:51:50 -0700 Dan Williams wrote:
> 
> > Jonathan Cameron wrote:
> > > On Wed, 7 Sep 2022 18:07:31 -0700
> > > Dan Williams  wrote:
> > > 
> > > > Andrew Morton wrote:
> > > > > I really dislike the term "flush".  Sometimes it means writeback,
> > > > > sometimes it means invalidate.  Perhaps at other times it means
> > > > > both.
> > > > > 
> > > > > Can we please be very clear in comments and changelogs about exactly
> > > > > > what this "flush" does.  With bonus points for being more specific in the
> > > > > > function naming?
> > > > >   
> > > > 
> > > > That's a good point, "flush" has been cargo-culted along in Linux's
> > > > cache management APIs to mean write-back-and-invalidate. In this case I
> > > > think this API is purely about invalidate. It just so happens that x86
> > > > has not historically had a global invalidate instruction readily
> > > > available which leads to the overuse of wbinvd.
> > > > 
> > > > It would be nice to make clear that this API is purely about
> > > > invalidating any data cached for a physical address impacted by address
> > > > space management event (secure erase / new region provision). Write-back
> > > > is an unnecessary side-effect.
> > > > 
> > > > So how about:
> > > > 
> > > > s/arch_flush_memregion/cpu_cache_invalidate_memregion/?
> > > 
> > > Want to indicate it 'might' write back perhaps?
> > > > So could be invalidate or clean and invalidate (using arm ARM terms just to add
> > > > to the confusion ;)
> > > 
> > > > Feels like there will be potential race conditions where that matters as we might
> > > > force stale data to be written back.
> > > 
> > > > Perhaps a comment is enough for that. Anyone have the "famous last words" feeling?
> > 
> > Is "invalidate" not clear that write-back is optional? Maybe not.
> 
> Yes, I'd say that "invalidate" means "dirty stuff may or may not have
> been written back".  Ditto for invalidate_inode_pages2().
> 
> > Also, I realized that we tried to include the address range to allow for
> > the possibility of flushing by virtual address range, but that
> > overcomplicates the use. I.e. if someone issues secure erase and the
> > region association is not established does that mean that the
> > cache invalidation is not needed? It could be the case that someone
> > disables a device, does the secure erase, and then reattaches to the
> > same region. The cache invalidation is needed, but at the time of the
> > secure erase the HPA was unknown.
> > 
> > All this to say that I feel the bikeshedding will need to continue until
> > morale improves.
> > 
> > I notice that the DMA API uses 'sync' to indicate, "make this memory
> > consistent/coherent for the CPU or the device", so how about an API like
> > 
> > memregion_sync_for_cpu(int res_desc)
> > 
> > ...where the @res_desc would be IORES_DESC_CXL for all CXL and
> > IORES_DESC_PERSISTENT_MEMORY for the current nvdimm use case.
> 
> "sync" is another of my pet peeves ;) In filesystem land, at least. 
> Does it mean "start writeback and return" or does it mean "start
> writeback, wait for its completion then return".  

Ok, no "sync" :).

/**
 * cpu_cache_invalidate_memregion - drop any CPU cached data for
 * memregions described by @res_desc
 * @res_desc: one of the IORES_DESC_* types
 *
 * Perform cache maintenance after a memory event / operation that
 * changes the contents of physical memory in a cache-incoherent manner.
 * For example, memory-device secure erase, or provisioning new CXL
 * regions. This routine may or may not write back any dirty contents
 * while performing the invalidation.
 *
 * Returns 0 on success or negative error code on a failure to perform
 * the cache maintenance.
 */
int cpu_cache_invalidate_memregion(int res_desc)

??
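
A minimal userspace sketch of the contract in the kernel-doc above, in case
it helps the naming discussion. It only models the return-value semantics
(0 on success, negative error code on failure). Everything prefixed mock_ or
MOCK_ is hypothetical: the numeric values are not the kernel's IORES_DESC_*
constants, and -ENXIO for "cannot invalidate on this platform" is an
assumption, since the comment only promises "a negative error code".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's IORES_DESC_* values. */
enum mock_res_desc {
	MOCK_IORES_DESC_PERSISTENT_MEMORY = 1,
	MOCK_IORES_DESC_CXL = 2,
};

/* Mock platform state: can this "CPU" invalidate its caches globally? */
bool mock_platform_can_invalidate = true;

/* Counts how many times the mock performed cache maintenance. */
int mock_invalidate_count;

/*
 * Userspace mock of the proposed contract: return 0 on success or a
 * negative error code when the maintenance cannot be performed.
 * Whether dirty lines get written back first is deliberately left
 * unspecified, as in the kernel-doc.
 */
int mock_cpu_cache_invalidate_memregion(int res_desc)
{
	if (res_desc != MOCK_IORES_DESC_PERSISTENT_MEMORY &&
	    res_desc != MOCK_IORES_DESC_CXL)
		return -EINVAL;

	if (!mock_platform_can_invalidate)
		return -ENXIO;	/* assumed errno, not from the thread */

	mock_invalidate_count++;
	return 0;
}

/* Hypothetical caller: invalidate once a CXL secure erase has completed. */
int mock_after_secure_erase(void)
{
	return mock_cpu_cache_invalidate_memregion(MOCK_IORES_DESC_CXL);
}
```

Note that the caller never passes an address range, only the resource
descriptor, which matches the point made earlier that the HPA may not be
known at secure-erase time.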



Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-08 Thread Andrew Morton
On Thu, 8 Sep 2022 15:51:50 -0700 Dan Williams  wrote:

> Jonathan Cameron wrote:
> > On Wed, 7 Sep 2022 18:07:31 -0700
> > Dan Williams  wrote:
> > 
> > > Andrew Morton wrote:
> > > > I really dislike the term "flush".  Sometimes it means writeback,
> > > > sometimes it means invalidate.  Perhaps at other times it means
> > > > both.
> > > > 
> > > > Can we please be very clear in comments and changelogs about exactly
> > > > > what this "flush" does.  With bonus points for being more specific in the
> > > > > function naming?
> > > >   
> > > 
> > > That's a good point, "flush" has been cargo-culted along in Linux's
> > > cache management APIs to mean write-back-and-invalidate. In this case I
> > > think this API is purely about invalidate. It just so happens that x86
> > > has not historically had a global invalidate instruction readily
> > > available which leads to the overuse of wbinvd.
> > > 
> > > It would be nice to make clear that this API is purely about
> > > invalidating any data cached for a physical address impacted by address
> > > space management event (secure erase / new region provision). Write-back
> > > is an unnecessary side-effect.
> > > 
> > > So how about:
> > > 
> > > s/arch_flush_memregion/cpu_cache_invalidate_memregion/?
> > 
> > Want to indicate it 'might' write back perhaps?
> > > So could be invalidate or clean and invalidate (using arm ARM terms just to add
> > > to the confusion ;)
> > 
> > > Feels like there will be potential race conditions where that matters as we might
> > > force stale data to be written back.
> > 
> > > Perhaps a comment is enough for that. Anyone have the "famous last words" feeling?
> 
> Is "invalidate" not clear that write-back is optional? Maybe not.

Yes, I'd say that "invalidate" means "dirty stuff may or may not have
been written back".  Ditto for invalidate_inode_pages2().

> Also, I realized that we tried to include the address range to allow for
> the possibility of flushing by virtual address range, but that
> overcomplicates the use. I.e. if someone issues secure erase and the
> region association is not established does that mean that the
> cache invalidation is not needed? It could be the case that someone
> disables a device, does the secure erase, and then reattaches to the
> same region. The cache invalidation is needed, but at the time of the
> secure erase the HPA was unknown.
> 
> All this to say that I feel the bikeshedding will need to continue until
> morale improves.
> 
> I notice that the DMA API uses 'sync' to indicate, "make this memory
> consistent/coherent for the CPU or the device", so how about an API like
> 
> memregion_sync_for_cpu(int res_desc)
> 
> ...where the @res_desc would be IORES_DESC_CXL for all CXL and
> IORES_DESC_PERSISTENT_MEMORY for the current nvdimm use case.

"sync" is another of my pet peeves ;) In filesystem land, at least. 
Does it mean "start writeback and return" or does it mean "start
writeback, wait for its completion then return".  




Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-08 Thread Dan Williams
Jonathan Cameron wrote:
> On Wed, 7 Sep 2022 18:07:31 -0700
> Dan Williams  wrote:
> 
> > Andrew Morton wrote:
> > > I really dislike the term "flush".  Sometimes it means writeback,
> > > sometimes it means invalidate.  Perhaps at other times it means
> > > both.
> > > 
> > > Can we please be very clear in comments and changelogs about exactly
> > > > what this "flush" does.  With bonus points for being more specific in the
> > > > function naming?
> > >   
> > 
> > That's a good point, "flush" has been cargo-culted along in Linux's
> > cache management APIs to mean write-back-and-invalidate. In this case I
> > think this API is purely about invalidate. It just so happens that x86
> > has not historically had a global invalidate instruction readily
> > available which leads to the overuse of wbinvd.
> > 
> > It would be nice to make clear that this API is purely about
> > invalidating any data cached for a physical address impacted by address
> > space management event (secure erase / new region provision). Write-back
> > is an unnecessary side-effect.
> > 
> > So how about:
> > 
> > s/arch_flush_memregion/cpu_cache_invalidate_memregion/?
> 
> Want to indicate it 'might' write back perhaps?
> So could be invalidate or clean and invalidate (using arm ARM terms just to add
> to the confusion ;)
> 
> Feels like there will be potential race conditions where that matters as we might
> force stale data to be written back.
> 
> Perhaps a comment is enough for that. Anyone have the "famous last words" feeling?

Is "invalidate" not clear that write-back is optional? Maybe not.

Also, I realized that we tried to include the address range to allow for
the possibility of flushing by virtual address range, but that
overcomplicates the use. I.e. if someone issues secure erase and the
region association is not established does that mean that the
cache invalidation is not needed? It could be the case that someone
disables a device, does the secure erase, and then reattaches to the
same region. The cache invalidation is needed, but at the time of the
secure erase the HPA was unknown.

All this to say that I feel the bikeshedding will need to continue until
morale improves.

I notice that the DMA API uses 'sync' to indicate, "make this memory
consistent/coherent for the CPU or the device", so how about an API like

memregion_sync_for_cpu(int res_desc)

...where the @res_desc would be IORES_DESC_CXL for all CXL and
IORES_DESC_PERSISTENT_MEMORY for the current nvdimm use case.



Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-08 Thread Dan Williams
Borislav Petkov wrote:
> On Wed, Sep 07, 2022 at 09:52:17AM -0700, Dan Williams wrote:
> > To be clear, nfit stuff and CXL do run in guests, but they do not
> > support secure-erase in a guest.
> > 
> > However, the QEMU CXL enabling is building the ability to do *guest
> > physical* address space management, but in that case the driver can be
> > paravirtualized to realize that it is not managing host-physical address
> > space and does not need to flush caches. That will need some indicator
> > to differentiate virtual CXL memory expanders from assigned devices.
> 
> Sounds to me like that check should be improved later to ask
> whether the kernel is managing host-physical address space. Maybe
> arch_flush_memregion() should check whether the address it is supposed
> to flush is host-physical and exit early if not...

Even though I raised the possibility of guest passthrough of a CXL
memory expander, I do not think it could work in practice without it
being a gigantic security nightmare. So it is probably safe to just do
the hypervisor check and assume that there's no such thing as guest
management of host physical address space.
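
The hypervisor check discussed above can be modelled in userspace roughly
as follows. This is a sketch under stated assumptions, not the patch
itself: mock_running_under_hypervisor stands in for
cpu_feature_enabled(X86_FEATURE_HYPERVISOR), the counter stands in for the
wbinvd hammer, and having the flush return -EOPNOTSUPP in a guest is an
illustrative choice that does not come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for cpu_feature_enabled(X86_FEATURE_HYPERVISOR). */
bool mock_running_under_hypervisor;

/* Counts simulated global write-back-and-invalidate operations. */
int mock_wbinvd_calls;

/* Mirrors the quoted check: the facility is unavailable inside a guest. */
bool mock_arch_has_flush_memregion(void)
{
	return !mock_running_under_hypervisor;
}

/*
 * Guarded flush: refuse to run when the availability check fails, on
 * the assumption that a guest never manages host-physical address space.
 */
int mock_arch_flush_memregion(void)
{
	if (!mock_arch_has_flush_memregion())
		return -EOPNOTSUPP;	/* hypothetical error code */

	mock_wbinvd_calls++;		/* stand-in for wbinvd on all CPUs */
	return 0;
}
```

On "bare metal" (flag false) the flush runs and returns 0. With the flag
set, the check fails and no cache maintenance is attempted.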



Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-07 Thread Dan Williams
Andrew Morton wrote:
> I really dislike the term "flush".  Sometimes it means writeback,
> sometimes it means invalidate.  Perhaps at other times it means
> both.
> 
> Can we please be very clear in comments and changelogs about exactly
> what this "flush" does.   With bonus points for being more specific in the 
> function naming?
> 

That's a good point, "flush" has been cargo-culted along in Linux's
cache management APIs to mean write-back-and-invalidate. In this case I
think this API is purely about invalidate. It just so happens that x86
has not historically had a global invalidate instruction readily
available which leads to the overuse of wbinvd.

It would be nice to make clear that this API is purely about
invalidating any data cached for a physical address impacted by address
space management event (secure erase / new region provision). Write-back
is an unnecessary side-effect.

So how about:

s/arch_flush_memregion/cpu_cache_invalidate_memregion/?



Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-07 Thread Andrew Morton
I really dislike the term "flush".  Sometimes it means writeback,
sometimes it means invalidate.  Perhaps at other times it means
both.

Can we please be very clear in comments and changelogs about exactly
what this "flush" does.   With bonus points for being more specific in the 
function naming?



Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-07 Thread Davidlohr Bueso

Not sure of the proper way to route this (akpm?). But unless there are
any remaining objections, could this be picked up?

Thanks,
Davidlohr



Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-07 Thread Davidlohr Bueso

On Wed, 07 Sep 2022, Borislav Petkov wrote:


>On Mon, Aug 29, 2022 at 02:29:18PM -0700, Davidlohr Bueso wrote:
>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>> index 1abd5438f126..18463cb704fb 100644
>> --- a/arch/x86/mm/pat/set_memory.c
>> +++ b/arch/x86/mm/pat/set_memory.c
>> @@ -330,6 +330,20 @@ void arch_invalidate_pmem(void *addr, size_t size)
>>  EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
>>  #endif
>>
>> +#ifdef CONFIG_ARCH_HAS_MEMREGION_INVALIDATE
>> +bool arch_has_flush_memregion(void)
>> +{
>> +   return !cpu_feature_enabled(X86_FEATURE_HYPERVISOR);
>
>This looks really weird. Why does this need to care about HV at all?

So the context here is:

e2efb6359e62 ("ACPICA: Avoid cache flush inside virtual machines")

>
>Does that nfit stuff even run in guests?

No, nor does cxl. This was mostly a general precautionary check so
that the API is unavailable in VMs.

>> +EXPORT_SYMBOL(arch_has_flush_memregion);
>
>...
>
>> +EXPORT_SYMBOL(arch_flush_memregion);
>
>Why aren't those exports _GPL?

Fine by me.

Thanks,
Davidlohr



Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-07 Thread Davidlohr Bueso

On Wed, 07 Sep 2022, Dan Williams wrote:


>Davidlohr Bueso wrote:
>> On Wed, 07 Sep 2022, Borislav Petkov wrote:
>>
>> >On Mon, Aug 29, 2022 at 02:29:18PM -0700, Davidlohr Bueso wrote:
>> >> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>> >> index 1abd5438f126..18463cb704fb 100644
>> >> --- a/arch/x86/mm/pat/set_memory.c
>> >> +++ b/arch/x86/mm/pat/set_memory.c
>> >> @@ -330,6 +330,20 @@ void arch_invalidate_pmem(void *addr, size_t size)
>> >>  EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
>> >>  #endif
>> >>
>> >> +#ifdef CONFIG_ARCH_HAS_MEMREGION_INVALIDATE
>> >> +bool arch_has_flush_memregion(void)
>> >> +{
>> >> +  return !cpu_feature_enabled(X86_FEATURE_HYPERVISOR);
>> >
>> >This looks really weird. Why does this need to care about HV at all?
>>
>> So the context here is:
>>
>> e2efb6359e62 ("ACPICA: Avoid cache flush inside virtual machines")
>>
>> >
>> >Does that nfit stuff even run in guests?
>>
>> No, nor does cxl. This was mostly a general precautionary check so
>> that the API is unavailable in VMs.
>
>To be clear, nfit stuff and CXL do run in guests, but they do not
>support secure-erase in a guest.

Yes, I meant the features this API enables.

>However, the QEMU CXL enabling is building the ability to do *guest
>physical* address space management, but in that case the driver can be
>paravirtualized to realize that it is not managing host-physical address
>space and does not need to flush caches. That will need some indicator
>to differentiate virtual CXL memory expanders from assigned devices. Is
>there such a thing as a PCIe-virtio extended capability to differentiate
>physical vs emulated devices?

In any case, such a check would be specific to each user (cxl in this case),
and outside the scope of _this_ particular API. Here we just really want to
avoid the broken TDX guest bits.

Thanks,
Davidlohr



Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-07 Thread Dan Williams
Davidlohr Bueso wrote:
> On Wed, 07 Sep 2022, Borislav Petkov wrote:
> 
> >On Mon, Aug 29, 2022 at 02:29:18PM -0700, Davidlohr Bueso wrote:
> >> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> >> index 1abd5438f126..18463cb704fb 100644
> >> --- a/arch/x86/mm/pat/set_memory.c
> >> +++ b/arch/x86/mm/pat/set_memory.c
> >> @@ -330,6 +330,20 @@ void arch_invalidate_pmem(void *addr, size_t size)
> >>  EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
> >>  #endif
> >>
> >> +#ifdef CONFIG_ARCH_HAS_MEMREGION_INVALIDATE
> >> +bool arch_has_flush_memregion(void)
> >> +{
> >> +  return !cpu_feature_enabled(X86_FEATURE_HYPERVISOR);
> >
> >This looks really weird. Why does this need to care about HV at all?
> 
> So the context here is:
> 
> e2efb6359e62 ("ACPICA: Avoid cache flush inside virtual machines")
> 
> >
> >Does that nfit stuff even run in guests?
> 
> No, nor does cxl. This was mostly a general precautionary check so
> that the API is unavailable in VMs.

To be clear, nfit stuff and CXL do run in guests, but they do not
support secure-erase in a guest.

However, the QEMU CXL enabling is building the ability to do *guest
physical* address space management, but in that case the driver can be
paravirtualized to realize that it is not managing host-physical address
space and does not need to flush caches. That will need some indicator
to differentiate virtual CXL memory expanders from assigned devices. Is
there such a thing as a PCIe-virtio extended capability to differentiate
physical vs emulated devices?



Re: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-09-07 Thread Dan Williams
Davidlohr Bueso wrote:
> Not sure of the proper way to route this (akpm?). But unless there are
> any remaining objections, could this be picked up?

My plan was, barring objections, to take it through the CXL tree with
its first user, the CXL security commands.



RE: [PATCH -next] memregion: Add arch_flush_memregion() interface

2022-08-29 Thread Dan Williams
[ add Christoph ]

Davidlohr Bueso wrote:
> With CXL security features, global CPU cache flushing nvdimm requirements
> are no longer specific to that subsystem, even beyond the scope of
> security_ops. CXL will need such semantics for features not necessarily
> limited to persistent memory.
> 
> The functionality this is enabling is to be able to instantaneously
> secure erase potentially terabytes of memory at once and the kernel
> needs to be sure that none of the data from before the secure erase is still
> present in the cache. It is also used when unlocking a memory device
> where speculative reads and firmware accesses could have cached poison
> from before the device was unlocked.
> 
> This capability is typically only used once per-boot (for unlock), or
> once per bare metal provisioning event (secure erase), like when handing
> off the system to another tenant or decommissioning a device.
> 
> Users must first call arch_has_flush_memregion() to know whether this
> functionality is available on the architecture. Only enable it on x86-64
> via the wbinvd() hammer.
> 
> Signed-off-by: Davidlohr Bueso 
> ---
> 
> Changes from v2 
> (https://lore.kernel.org/all/20220819171024.1766857-1-d...@stgolabs.net/):
> - Redid to use memregion based interfaces + VMM check on x86 (Dan)
> - Restricted the flushing to x86-64.
> 
> Note: Since we still are dealing with a physical "range" at this level,
> added the spa range for nfit even though this is unused.

Looks reasonable to me.

Reviewed-by: Dan Williams
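
For readers following the changelog's "users must first call
arch_has_flush_memregion()" rule, here is a small userspace sketch of one
plausible caller ordering: check availability before the destructive
operation, so that an erase is never issued when the caches cannot be
cleaned up afterwards. All demo_ names are hypothetical, the erase and
flush are simple counters, and the -EOPNOTSUPP return is an assumption
made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mock state: is the flush facility available on this "platform"? */
bool demo_flush_available = true;

/* Counters recording which simulated operations actually ran. */
int demo_erase_count;
int demo_flush_count;

/* Stand-in for the availability check named in the changelog. */
bool demo_has_flush_memregion(void)
{
	return demo_flush_available;
}

/* Stand-in for the global cache flush (wbinvd on x86-64 in the patch). */
void demo_flush_memregion(void)
{
	demo_flush_count++;
}

/* Stand-in for the device's secure erase command. */
void demo_device_secure_erase(void)
{
	demo_erase_count++;
}

/*
 * Hypothetical caller: refuse up front if the caches cannot be flushed,
 * then erase, then flush so no pre-erase data lingers in the CPU caches.
 */
int demo_secure_erase(void)
{
	if (!demo_has_flush_memregion())
		return -EOPNOTSUPP;	/* assumed error code */

	demo_device_secure_erase();
	demo_flush_memregion();
	return 0;
}
```

With the facility available the erase and the flush each run once. With it
unavailable the call fails before the device is touched.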