Re: [PATCH v5] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms

2023-10-05 Thread Kris Van Hees
On Thu, Oct 05, 2023 at 12:24:03PM -0400, Kris Van Hees wrote:
> On Wed, Sep 27, 2023 at 05:35:16PM +, Alessandro Carminati (Red Hat) 
> wrote:
> > It is not uncommon for drivers or modules related to similar peripherals
> > to have symbols with the exact same name.
> > While this is not a problem for the kernel's binary itself, it becomes an
> > issue when attempting to trace or probe specific functions using
> > infrastructure like ftrace or kprobe.
> > 
> > The tracing subsystem relies on the `nm -n vmlinux` output, which provides
> > symbol information from the kernel's ELF binary. However, when multiple
> > symbols share the same name, the standard nm output does not differentiate
> > between them. This can lead to confusion and difficulty when trying to
> > probe the intended symbol.
> > 
> >  ~ # cat /proc/kallsyms | grep " name_show"
> >  8c4f76d0 t name_show
> >  8c9cccb0 t name_show
> >  8cb0ac20 t name_show
> >  8cc728c0 t name_show
> >  8ce0efd0 t name_show
> >  8ce126c0 t name_show
> >  8ce1dd20 t name_show
> >  8ce24e70 t name_show
> >  8d1104c0 t name_show
> >  8d1fe480 t name_show
> 
> One problem that remains as far as I can see is that this approach does not
> take into consideration that there can be duplicate symbols in the core
> kernel, but also between core kernel and loadable modules, and even between
> loadable modules.  So, I think more is needed to also ensure that this
> approach of adding alias symbols is also done for loadable modules.

Re-reading my email a bit more, I realize that I did not accurate demonstrate
the issue I arefer to.  Obviously, the current patch provides aliases for
duplicate symbols in the core kernel.  And loadable modules are annotated with
the module name.  So, all is well unless modules have duplicate symbols
themselves.  And although it is rare, it does happen:

# grep ' metadata_show' /proc/kallsyms 
c05659c0 t metadata_show[md_mod]
c05739f0 t metadata_show[md_mod]

This again shows a case where there are duplicate symbols that cannot be
distinguished for tracing purposes without additional information.  So, the
proposed patch unfortunately does not cover all cases because of loadable
modules.

> Earlier work that cover all symbols (core kernel and loadable modules) was
> posted quite a while ago by Nick Alcock:
> 
> https://lore.kernel.org/all/20221205163157.269335-1-nick.alc...@oracle.com/
> 
> It takes a different approach and adds in other info that is very useful for
> tracing, but unfortunately it has been dormant for a long time now.
> 
> While module symbols are handled quite differently (for kallsyms) from the
> core kernel symbols, I think that a similar approach tied in with modpost
> ought to be quite possible.  It will add to the size of modules because the
> data needs to be stored in the .ko but that is unavoidable.  But not doing it
> unfortunately would mean that the duplicate symbol issue remains unresolved
> in the presence of loadable modules.
> 
> > kas_alias addresses this challenge by enhancing symbol names with
> > meaningful suffixes generated from the source file and line number
> > during the kernel build process.
> > These newly generated aliases provide tracers with the ability to
> > comprehend the symbols they are interacting with when utilizing the
> > ftracefs interface.
> > This approach may also allow for the probing by name of previously
> > inaccessible symbols.
> > 
> >  ~ # cat /proc/kallsyms | grep gic_mask_irq
> >  d15671e505ac t gic_mask_irq
> >  d15671e505ac t gic_mask_irq@drivers_irqchip_irq_gic_c_167
> >  d15671e532a4 t gic_mask_irq
> >  d15671e532a4 t gic_mask_irq@drivers_irqchip_irq_gic_v3_c_407
> >  ~ #
> 
> In the same context as mentioned above (module symbols), I am hoping that the
> alias you generate might also be able to contain a module identifier name,
> much like the aforementioned patch series by Nick Alcock added.  We have it
> for loadable modules already of course, but as has been discussed in relation
> to the earlier work, being able to associate a module name with a symbol
> regardless of whether that module is configured to be built into the kernel
> or whether it is configured to be a loadable module is helpful for tracing
> purposes.  Especially for tracers that use tracing scripts that might get
> deployed on diverse systems where it cannot always be known at the time of
> developing the tracer scripts whether a kernel module is configured to be
> loadable or built-in.
> 
> I'd be happy to work on something like this as a contribution to your work.
> I would envision the alias entry not needing to have the typical [module] 
> added
> to it because that will already be annotated on the actual symbol entry.  So,
> the alias could be extended to be something like:
> 
> c0533720 t floppy_open  [floppy]
> c0533720 t 

Re: [PATCH] bus: mhi: host: Add tracing support

2023-10-05 Thread Bjorn Andersson
On Thu, Oct 05, 2023 at 03:55:20PM +0530, Krishna chaitanya chundru wrote:
> This change adds ftrace support for following:
> 1. mhi_intvec_threaded_handler
> 2. mhi_process_data_event_ring
> 3. mhi_process_ctrl_ev_ring
> 4. mhi_gen_tre
> 5. mhi_update_channel_state
> 6. mhi_tryset_pm_state
> 7. mhi_pm_st_worker

This is not the best "problem description".

> 
> Usage:
>   echo 1 > /sys/kernel/debug/tracing/events/mhi_host/enable
>   cat /sys/kernel/debug/tracing/trace

This does not need to be included in the commit message, how to use the
tracing framework is documented elsewhere.

[..]
> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
> index dcf627b36e82..499590437e9b 100644
> --- a/drivers/bus/mhi/host/main.c
> +++ b/drivers/bus/mhi/host/main.c
> @@ -491,11 +491,10 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, 
> void *priv)
>  
>   state = mhi_get_mhi_state(mhi_cntrl);
>   ee = mhi_get_exec_env(mhi_cntrl);
> - dev_dbg(dev, "local ee: %s state: %s device ee: %s state: %s\n",
> - TO_MHI_EXEC_STR(mhi_cntrl->ee),
> - mhi_state_str(mhi_cntrl->dev_state),
> - TO_MHI_EXEC_STR(ee), mhi_state_str(state));
>  
> + trace_mhi_intvec_threaded_handler(mhi_cntrl->mhi_dev->name, 
> TO_MHI_EXEC_STR(mhi_cntrl->ee),
> +   mhi_state_str(mhi_cntrl->dev_state),
> +   TO_MHI_EXEC_STR(ee), 
> mhi_state_str(state));

All these helper functions that translates a state to a string, pass the
raw state into the trace event and use __print_symbolic() in your
TP_printk() instead.

This will allow you to read the state, but you can have tools act of the
numerical value.


(This comment applies to all the trace events)

>   if (state == MHI_STATE_SYS_ERR) {
>   dev_dbg(dev, "System error detected\n");
>   pm_state = mhi_tryset_pm_state(mhi_cntrl,
[..]
> diff --git a/include/trace/events/mhi_host.h b/include/trace/events/mhi_host.h
[..]
> +
> +TRACE_EVENT(mhi_pm_st_worker,

Why is this trace event called "worker", isn't the event a
"mhi_pm_state_transition"?

Don't just name your trace event based on the function that triggers
them, but what they represent and make sure they carry useful
information to understand the system.

If you want to trace the flow through your functions, you can use e.g.
ftrace.

Regards,
Bjorn



Re: [PATCH v7 00/13] selftests/sgx: Fix compilation errors

2023-10-05 Thread Huang, Kai
Hi Jo,

Just FYI I won't review the rest patches in this series.  One of the reasons is
I am not that familiar with the rest.  Jarkko has reviewed anyway :-).

On Thu, 2023-10-05 at 17:38 +0200, Jo Van Bulck wrote:
> Hi,
> 
> This patch series ensures that all SGX selftests succeed when compiling with
> optimizations (as tested with -O{0,1,2,3,s} for both gcc 11.3.0 and clang
> 14.0.0). The aim of the patches is to avoid reliance on undefined,
> compiler-specific behavior that can make the test results fragile.
> 
> As far as I see, all commits in this series now have an explicit reviewed-by
> tag, so hopefully this can get merged upstream? Please let me know if any
> concerns remain and I'd happily address them.
> 
> Reference output below:
> 
> .. Testing   gcc   -O0[OK]
> .. Testing   gcc   -O1[OK]
> .. Testing   gcc   -O2[OK]
> .. Testing   gcc   -O3[OK]
> .. Testing   gcc   -Os[OK]
> .. Testing   gcc   -Ofast [OK]
> .. Testing   gcc   -Og[OK]
> .. Testing   clang -O0[OK]
> .. Testing   clang -O1[OK]
> .. Testing   clang -O2[OK]
> .. Testing   clang -O3[OK]
> .. Testing   clang -Os[OK]
> .. Testing   clang -Ofast [OK]
> .. Testing   clang -Og[OK]
> 
> Changelog
> -
> 
> v7
>   - Add reviewed-by tag (Jarkko)
> 
> v6
>   - Collect final ack/reviewed-by tags (Jarkko, Kai)
> 
> v5
>   - Reorder patches (Jarkko, Kai)
>   - Include fixes tag for inline asm memory clobber patch (Kai)
>   - Include linker error in static-pie commit message (Kai)
>   - Include generated assembly in relocations commit (Kai)
> 
> v4
>   - Remove redundant -nostartfiles compiler flag (Jarkko)
>   - Split dynamic symbol table removal in separate commit (Kai)
>   - Split redundant push/pop elimination in separate commit (Kai)
>   - Remove (incomplete) register cleansing on enclave exit
>   - Fix possibly uninitialized pointer dereferences in load.c
> 
> v3
>   - Refactor encl_op_array declaration and indexing (Jarkko)
>   - Annotate encl_buffer with "used" attribute (Kai)
>   - Split encl_buffer size and placement commits (Kai)
> 
> v2
>   - Add additional check for NULL pointer (Kai)
>   - Refine to produce proper static-pie executable
>   - Fix linker script assertions
>   - Specify memory clobber for inline asm instead of volatile (Kai)
>   - Clarify why encl_buffer non-static (Jarkko, Kai)
>   - Clarify -ffreestanding (Jarkko)
> 
> Best,
> Jo
> 
> Jo Van Bulck (13):
>   selftests/sgx: Fix uninitialized pointer dereference in error path
>   selftests/sgx: Fix uninitialized pointer dereferences in
> encl_get_entry
>   selftests/sgx: Include memory clobber for inline asm in test enclave
>   selftests/sgx: Separate linker options
>   selftests/sgx: Specify freestanding environment for enclave
> compilation
>   selftests/sgx: Remove redundant enclave base address save/restore
>   selftests/sgx: Produce static-pie executable for test enclave
>   selftests/sgx: Handle relocations in test enclave
>   selftests/sgx: Fix linker script asserts
>   selftests/sgx: Ensure test enclave buffer is entirely preserved
>   selftests/sgx: Ensure expected location of test enclave buffer
>   selftests/sgx: Discard unsupported ELF sections
>   selftests/sgx: Remove incomplete ABI sanitization code in test enclave
> 
>  tools/testing/selftests/sgx/Makefile  | 12 ++--
>  tools/testing/selftests/sgx/defines.h |  2 +
>  tools/testing/selftests/sgx/load.c|  9 ++-
>  tools/testing/selftests/sgx/sigstruct.c   |  5 +-
>  tools/testing/selftests/sgx/test_encl.c   | 67 +--
>  tools/testing/selftests/sgx/test_encl.lds | 10 +--
>  .../selftests/sgx/test_encl_bootstrap.S   | 28 +++-
>  7 files changed, 77 insertions(+), 56 deletions(-)
> 



RE: [PATCH v5 1/2] mm/memory_hotplug: split memmap_on_memory requests across memblocks

2023-10-05 Thread Dan Williams
Vishal Verma wrote:
> The MHP_MEMMAP_ON_MEMORY flag for hotplugged memory is restricted to
> 'memblock_size' chunks of memory being added. Adding a larger span of
> memory precludes memmap_on_memory semantics.
> 
> For users of hotplug such as kmem, large amounts of memory might get
> added from the CXL subsystem. In some cases, this amount may exceed the
> available 'main memory' to store the memmap for the memory being added.
> In this case, it is useful to have a way to place the memmap on the
> memory being added, even if it means splitting the addition into
> memblock-sized chunks.
> 
> Change add_memory_resource() to loop over memblock-sized chunks of
> memory if caller requested memmap_on_memory, and if other conditions for
> it are met. Teach try_remove_memory() to also expect that a memory
> range being removed might have been split up into memblock sized chunks,
> and to loop through those as needed.
> 
> Cc: Andrew Morton 
> Cc: David Hildenbrand 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: Dan Williams 
> Cc: Dave Jiang 
> Cc: Dave Hansen 
> Cc: Huang Ying 
> Suggested-by: David Hildenbrand 
> Signed-off-by: Vishal Verma 
> ---
>  mm/memory_hotplug.c | 162 
> 
>  1 file changed, 99 insertions(+), 63 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index f8d3e7427e32..77ec6f15f943 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1380,6 +1380,44 @@ static bool mhp_supports_memmap_on_memory(unsigned 
> long size)
>   return arch_supports_memmap_on_memory(vmemmap_size);
>  }
>  
> +static int add_memory_create_devices(int nid, struct memory_group *group,
> +  u64 start, u64 size, mhp_t mhp_flags)
> +{
> + struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
> + struct vmem_altmap mhp_altmap = {
> + .base_pfn =  PHYS_PFN(start),
> + .end_pfn  =  PHYS_PFN(start + size - 1),
> + };
> + int ret;
> +
> + if ((mhp_flags & MHP_MEMMAP_ON_MEMORY)) {
> + mhp_altmap.free = memory_block_memmap_on_memory_pages();
> + params.altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
> + if (!params.altmap)
> + return -ENOMEM;
> +
> + memcpy(params.altmap, _altmap, sizeof(mhp_altmap));

Isn't this just open coded kmemdup()?

Other than that, I am not seeing anything else to comment on, you can add:

Reviewed-by: Dan Williams 



RE: [PATCH v5 2/2] dax/kmem: allow kmem to add memory with memmap_on_memory

2023-10-05 Thread Dan Williams
Vishal Verma wrote:
> Large amounts of memory managed by the kmem driver may come in via CXL,
> and it is often desirable to have the memmap for this memory on the new
> memory itself.
> 
> Enroll kmem-managed memory for memmap_on_memory semantics if the dax
> region originates via CXL. For non-CXL dax regions, retain the existing
> default behavior of hot adding without memmap_on_memory semantics.
> 
> Add a sysfs override under the dax device to control this behavior and
> override either default.
> 
> Cc: Andrew Morton 
> Cc: David Hildenbrand 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: Dan Williams 
> Cc: Dave Jiang 
> Cc: Dave Hansen 
> Cc: Huang Ying 
> Reviewed-by: Jonathan Cameron 
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Vishal Verma 
> ---
>  drivers/dax/bus.h |  1 +
>  drivers/dax/dax-private.h |  1 +
>  drivers/dax/bus.c | 38 ++
>  drivers/dax/cxl.c |  1 +
>  drivers/dax/hmem/hmem.c   |  1 +
>  drivers/dax/kmem.c|  8 +++-
>  drivers/dax/pmem.c|  1 +
>  7 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index 1ccd23360124..cbbf64443098 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -23,6 +23,7 @@ struct dev_dax_data {
>   struct dev_pagemap *pgmap;
>   resource_size_t size;
>   int id;
> + bool memmap_on_memory;
>  };
>  
>  struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index 27cf2d79..446617b73aea 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -70,6 +70,7 @@ struct dev_dax {
>   struct ida ida;
>   struct device dev;
>   struct dev_pagemap *pgmap;
> + bool memmap_on_memory;
>   int nr_range;
>   struct dev_dax_range {
>   unsigned long pgoff;
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 0ee96e6fc426..43be95a231c9 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -367,6 +367,7 @@ static ssize_t create_store(struct device *dev, struct 
> device_attribute *attr,
>   .dax_region = dax_region,
>   .size = 0,
>   .id = -1,
> + .memmap_on_memory = false,
>   };
>   struct dev_dax *dev_dax = devm_create_dev_dax();
>  
> @@ -1269,6 +1270,40 @@ static ssize_t numa_node_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(numa_node);
>  
> +static ssize_t memmap_on_memory_show(struct device *dev,
> +  struct device_attribute *attr, char *buf)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> +
> + return sprintf(buf, "%d\n", dev_dax->memmap_on_memory);
> +}
> +
> +static ssize_t memmap_on_memory_store(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t len)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> + struct dax_region *dax_region = dev_dax->region;
> + ssize_t rc;
> + bool val;
> +
> + rc = kstrtobool(buf, );
> + if (rc)
> + return rc;

Perhaps:

if (dev_dax->memmap_on_memory == val)
return len;

...and skip the check below when it is going to be a nop

> +
> + device_lock(dax_region->dev);
> + if (!dax_region->dev->driver) {

Is the polarity backwards here? I.e. if the device is already attached to
the kmem driver it is too late to modify memmap_on_memory policy.

> + device_unlock(dax_region->dev);
> + return -ENXIO;

I would expect -EBUSY since disabling the device allows the property to be
set and -ENXIO implies a more permanent error.

> + }
> +
> + dev_dax->memmap_on_memory = val;
> +
> + device_unlock(dax_region->dev);
> + return len;
> +}
> +static DEVICE_ATTR_RW(memmap_on_memory);

This new attribute needs a new Documentation/ABI/ entry... in fact all of
these attributes need Documentation/ entries. I can help with that base
document to get things started.

Perhaps split this sysfs ABI into its own patch and, depending on how fast
we can pull the Documentation together, start with the
region-driver-conveyed approach in the meantime.



Re: [PATCH v5 16/18] x86/sgx: Limit process EPC usage with misc cgroup controller

2023-10-05 Thread Huang, Kai

> ---
>  arch/x86/Kconfig |  13 +
>  arch/x86/kernel/cpu/sgx/Makefile |   1 +
>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 415 +++
>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  59 
>  arch/x86/kernel/cpu/sgx/main.c   |  68 -
>  arch/x86/kernel/cpu/sgx/sgx.h|  17 +-
>  6 files changed, 556 insertions(+), 17 deletions(-)
>  create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h

Given how large this patch is, it's better to split if we can.

It seems we can at least split ...

[...]

> 
> @@ -970,6 +1005,7 @@ static void __init arch_update_sysfs_visibility(int nid) 
> {}
>  static bool __init sgx_page_cache_init(void)
>  {
>   u32 eax, ebx, ecx, edx, type;
> + u64 capacity = 0;
>   u64 pa, size;
>   int nid;
>   int i;
> @@ -1020,6 +1056,7 @@ static bool __init sgx_page_cache_init(void)
>  
>   sgx_epc_sections[i].node =  _numa_nodes[nid];
>   sgx_numa_nodes[nid].size += size;
> + capacity += size;
>  
>   sgx_nr_epc_sections++;
>   }
> @@ -1029,6 +1066,9 @@ static bool __init sgx_page_cache_init(void)
>   return false;
>   }
>  
> + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, capacity);
> + sgx_epc_total_pages = capacity >> PAGE_SHIFT;
> +
>   return true;
>  }
> 

... setting up capacity out as a separate patch, as it is a top-level only file
showing the maximum instances of the resource.

I'll review rest later.


Re: [PATCH v5 15/18] x86/sgx: Prepare for multiple LRUs

2023-10-05 Thread Huang, Kai
On Thu, 2023-10-05 at 14:33 -0500, Haitao Huang wrote:
> On Thu, 05 Oct 2023 07:30:46 -0500, Huang, Kai  wrote:
> 
> > On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> > > +static inline struct sgx_epc_lru_lists *sgx_lru_lists(struct  
> > > sgx_epc_page *epc_page)
> > > +{
> > > + return _global_lru;
> > > +}
> > > +
> > > +static inline bool sgx_can_reclaim(void)
> > > +{
> > > + return !list_empty(_global_lru.reclaimable);
> > > +}
> > > +
> > 
> > Shouldn't sgx_can_reclaim() also take a 'struct sgx_epc_lru_lists *'?
> > 
> > I thought we also need to check whether a cgroup's LRU lists can be  
> > reclaimed?
> 
> This is only used to check if any pages reclaimable at the top level in  
> this file. Later sgx_epc_cgroup_lru_empty(NULL) is used in this function  
> to recursively check all cgroups starting from the root.
> 
> 

This again falls to the "impossible to review unless review a later patch first"
category.  This patch says nothing about sgx_can_reclaim() will only be used at
the top level.  Even if it does, why cannot it take LRU lists as input?

All this patch says is we need to prepare these functions to suit multiple LRU
lists.

Btw, why sgx_reclaim_epc_pages() doesn't take LRU lists as input either?  Is it
possible that it can be called across multiple LRU lists, or across different
lists in one LRU?

Why do we need to find some particular LRU lists by given EPC page?

+static inline struct sgx_epc_lru_lists *sgx_lru_lists(struct sgx_epc_page
*epc_page)
+{
+   return _global_lru;
+}
+

Maybe it's clear for other people, but to me it sounds some necessary design
background is missing at least.

Please try best to make the patch self-reviewable by justifying all of those.


Re: [PATCH v5 13/18] x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup

2023-10-05 Thread Huang, Kai

> > > 
> > > -/*
> > > +/**
> > > + * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
> > > + * @nr_to_scan:   Number of EPC pages to scan for reclaim
> > > + * @ignore_age:   Reclaim a page even if it is young
> > > + *
> > >   * Take a fixed number of pages from the head of the active page pool  
> > > and
> > >   * reclaim them to the enclave's private shmem files. Skip the pages,  
> > > which have
> > >   * been accessed since the last scan. Move those pages to the tail of  
> > > active
> > > @@ -292,15 +301,14 @@ static void sgx_reclaimer_write(struct  
> > > sgx_epc_page *epc_page,
> > >   * problematic as it would increase the lock contention too much,  
> > > which would
> > >   * halt forward progress.
> > >   */
> > > -static void sgx_reclaim_pages(void)
> > > +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
> > 
> > 'size_t' looks odd.  Any reason to use it?
> > 
> > Given you only scan 32 at maximum, seems 'int' is good enough?
> > 
> 
> Initially was int.
> Jarkko was suggesting ssize_t. I changed to size_t as this function will  
> never return negative.

Then 'unsigned int'.  We are talking about 32 at max here.

size_t is more suitable for bytes, but we are dealing with number of pages.

Maybe Jarkko could comment why size_t is better.

[...]

> > 
> > >   i = 0;
> > >   list_for_each_entry_safe(epc_page, tmp, , list) {
> > >   encl_page = epc_page->encl_page;
> > > 
> > > - if (!sgx_reclaimer_age(epc_page))
> > > + if (i == SGX_NR_TO_SCAN_MAX ||
> > 
> > i == nr_to_scan?
> > 
> Not needed if above for statement fixed for nr_to_scan.
> Anything above MAX will be skipped and put back to LRU.

I believe using nr_to_scan is more logically correct.

[...]


> > 
> > I found this function a little bit odd, given the mixing of 'nr_to_scan',
> > SGX_NR_TO_SCAN and SGX_NR_TO_SCAN_MAX.
> > 
> > From the changelog:
> > 
> > 1) To take a parameter that specifies the number of pages to scan for
> > reclaiming. Define a max value of 32, but scan 16 in the case for the
> > global reclaimer (ksgxd).
> > 
> > It appears we want to make this function to scan @nr_to_scan for cgroup,  
> > but
> > still want to scan a fixed value for ksgxd, which is SGX_NR_TO_SCAN.  And
> > @nr_to_scan can be larger than SGX_NR_TO_SCAN but smaller than
> > SGX_NR_TO_SCAN_MAX.
> > 
> > Putting behind the mystery of why above is needed, to achieve it, is it  
> > more
> > clear if we do below?
> > 
> > int __sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age)
> > {
> > struct sgx_backing backing[SGX_NR_TO_SCAN_MAX];
> > ...
> > 
> > if (nr_to_scan > SGX_NR_TO_SCAN_MAX)
> > return 0;
> 
> We could set nr_to_scan to MAX but since this is code internal to driver,  
> maybe just make sure callers don't call with bigger numbers.

Please add this check, using WARN_ON_ONCE() if it's better.

Then the code is much easier to review.

> 
> > 
> > for (i = 0; i < nr_to_scan; i++) {
> > ...
> > }
> > 
> 
> yes

please fix this up, then ...

> 
> > return reclaimed;
> > }
> > 
> > /* This is for ksgxd() */
> > int sgx_reclaim_epc_page(void)
> > {
> > return __sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
> > }
> 
> Some maintainers may prefer no wrapping.
> 

... OK.


Re: [PATCH v5 15/18] x86/sgx: Prepare for multiple LRUs

2023-10-05 Thread Haitao Huang

On Thu, 05 Oct 2023 07:30:46 -0500, Huang, Kai  wrote:


On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
+static inline struct sgx_epc_lru_lists *sgx_lru_lists(struct  
sgx_epc_page *epc_page)

+{
+   return _global_lru;
+}
+
+static inline bool sgx_can_reclaim(void)
+{
+   return !list_empty(_global_lru.reclaimable);
+}
+


Shouldn't sgx_can_reclaim() also take a 'struct sgx_epc_lru_lists *'?

I thought we also need to check whether a cgroup's LRU lists can be  
reclaimed?


This is only used to check if any pages reclaimable at the top level in  
this file. Later sgx_epc_cgroup_lru_empty(NULL) is used in this function  
to recursively check all cgroups starting from the root.


Haitao


Re: [PATCH v5 13/18] x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup

2023-10-05 Thread Haitao Huang

On Thu, 05 Oct 2023 07:24:12 -0500, Huang, Kai  wrote:


On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:

From: Sean Christopherson 

Adjust and expose the top-level reclaim function as
sgx_reclaim_epc_pages() for use by the upcoming EPC cgroup, which will
initiate reclaim to enforce the max limit.

Make these adjustments to the function signature.

1) To take a parameter that specifies the number of pages to scan for
reclaiming. Define a max value of 32, but scan 16 in the case for the
global reclaimer (ksgxd). The EPC cgroup will use it to specify a
desired number of pages to be reclaimed up to the max value of 32.

2) To take a flag to force reclaiming a page regardless of its age.  The
EPC cgroup will use the flag to enforce its limits by draining the
reclaimable lists before resorting to other measures, e.g. forcefully
kill enclaves.

3) Return the number of reclaimed pages. The EPC cgroup will use the
result to track reclaiming progress and escalate to a more forceful
reclaiming mode, e.g., calling this function with the flag to ignore age
of pages.

Signed-off-by: Sean Christopherson 
Co-developed-by: Kristen Carlson Accardi 
Signed-off-by: Kristen Carlson Accardi 
Co-developed-by: Haitao Huang 
Signed-off-by: Haitao Huang 
Cc: Sean Christopherson 
---
V4:
- Combined the 3 patches that made the individual changes to the
function signature.
- Removed 'high' limit in commit message.
---
 arch/x86/kernel/cpu/sgx/main.c | 31 +--
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c  
b/arch/x86/kernel/cpu/sgx/main.c

index 3b875ab4dcd0..4e1a3e038db5 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -18,6 +18,11 @@
 #include "encl.h"
 #include "encls.h"

+/*
+ * Maximum number of pages to scan for reclaiming.
+ */
+#define SGX_NR_TO_SCAN_MAX 32
+
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
@@ -279,7 +284,11 @@ static void sgx_reclaimer_write(struct  
sgx_epc_page *epc_page,

mutex_unlock(>lock);
 }

-/*
+/**
+ * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
+ * @nr_to_scan: Number of EPC pages to scan for reclaim
+ * @ignore_age: Reclaim a page even if it is young
+ *
  * Take a fixed number of pages from the head of the active page pool  
and
  * reclaim them to the enclave's private shmem files. Skip the pages,  
which have
  * been accessed since the last scan. Move those pages to the tail of  
active
@@ -292,15 +301,14 @@ static void sgx_reclaimer_write(struct  
sgx_epc_page *epc_page,
  * problematic as it would increase the lock contention too much,  
which would

  * halt forward progress.
  */
-static void sgx_reclaim_pages(void)
+size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)


'size_t' looks odd.  Any reason to use it?

Given you only scan 32 at maximum, seems 'int' is good enough?



Initially was int.
Jarkko was suggesting ssize_t. I changed to size_t as this function will  
never return negative.



 {
-   struct sgx_backing backing[SGX_NR_TO_SCAN];
+   struct sgx_backing backing[SGX_NR_TO_SCAN_MAX];
struct sgx_epc_page *epc_page, *tmp;
struct sgx_encl_page *encl_page;
pgoff_t page_index;
LIST_HEAD(iso);
-   int ret;
-   int i;
+   size_t ret, i;

spin_lock(_global_lru.lock);
for (i = 0; i < SGX_NR_TO_SCAN; i++) {



This should be nr_to_scan
It was missed during some rebase and reordering operations.


The function comment says

* @nr_to_scan:   Number of EPC pages to scan for reclaim

But I don't see it is even used, if my eye isn't deceiving me?


@@ -326,13 +334,14 @@ static void sgx_reclaim_pages(void)
spin_unlock(_global_lru.lock);

if (list_empty())
-   return;
+   return 0;

i = 0;
list_for_each_entry_safe(epc_page, tmp, , list) {
encl_page = epc_page->encl_page;

-   if (!sgx_reclaimer_age(epc_page))
+   if (i == SGX_NR_TO_SCAN_MAX ||


i == nr_to_scan?


Not needed if above for statement fixed for nr_to_scan.
Anything above MAX will be skipped and put back to LRU.


And should we have a

if (nr_to_scan < SGX_NR_TO_SCAN_MAX)
return 0;

at the very beginning of this function?



 In final version caller to make sure not call with nr_to_scan not larger  
than SGX_NR_TO_SCAN_MAX



+   (!ignore_age && !sgx_reclaimer_age(epc_page)))
goto skip;

page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
@@ -371,6 +380,8 @@ static void sgx_reclaim_pages(void)

sgx_free_epc_page(epc_page);
}
+
+   return i;
 }



I found this function a little bit odd, given the mixing of 

Re: [PATCH v1 2/9] docs: firmware-guide: ACPI: Clarify ACPI bus concepts

2023-10-05 Thread Wilczynski, Michal



On 10/5/2023 8:28 PM, Wilczynski, Michal wrote:
>
> On 10/5/2023 7:57 PM, Rafael J. Wysocki wrote:
>> On Monday, September 25, 2023 4:48:35 PM CEST Michal Wilczynski wrote:
>>> Some devices implement ACPI driver as a way to manage devices
>>> enumerated by the ACPI. This might be confusing as a preferred way to
>>> implement a driver for devices not connected to any bus is a platform
>>> driver, as stated in the documentation. Clarify relationships between
>>> ACPI device, platform device and ACPI entries.
>>>
>>> Suggested-by: Elena Reshetova 
>>> Signed-off-by: Michal Wilczynski 
>>> ---
>>>  Documentation/firmware-guide/acpi/enumeration.rst | 13 +
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/Documentation/firmware-guide/acpi/enumeration.rst 
>>> b/Documentation/firmware-guide/acpi/enumeration.rst
>>> index 56d9913a3370..f56cc79a9e83 100644
>>> --- a/Documentation/firmware-guide/acpi/enumeration.rst
>>> +++ b/Documentation/firmware-guide/acpi/enumeration.rst
>>> @@ -64,6 +64,19 @@ If the driver needs to perform more complex 
>>> initialization like getting and
>>>  configuring GPIOs it can get its ACPI handle and extract this information
>>>  from ACPI tables.
>>>  
>>> +ACPI bus
>>> +
>>> +
>>> +Historically some devices not connected to any bus were represented as ACPI
>>> +devices, and had to implement ACPI driver. This is not a preferred way for 
>>> new
>>> +drivers. As explained above devices not connected to any bus should 
>>> implement
>>> +platform driver. ACPI device would be created during enumeration 
>>> nonetheless,
>>> +and would be accessible through ACPI_COMPANION() macro, and the ACPI 
>>> handle would
>>> +be accessible through ACPI_HANDLE() macro. ACPI device is meant to describe
>>> +information related to ACPI entry e.g. handle of the ACPI entry. Think -
>>> +ACPI device interfaces with the FW, and the platform device with the rest 
>>> of
>>> +the system.
>>> +
>>>  DMA support
>>>  ===
>> I rewrote the above entirely, so here's a new patch to replace this one:
>>
>> ---
>> From: Rafael J. Wysocki 
>> Subject: [PATCH v2 2/9] ACPI: docs: enumeration: Clarify ACPI bus concepts
>>
>> In some cases, ACPI drivers are implemented as a way to manage devices
>> enumerated with the help of the platform firmware through ACPI.
>>
>> This might be confusing, since the preferred way to implement a driver
>> for a device that cannot be enumerated natively, is a platform
>> driver, as stated in the documentation.
>>
>> Clarify relationships between ACPI device objects, platform devices and
>> ACPI Namespace entries.
>>
>> Suggested-by: Elena Reshetova 
>> Co-developed-by: Michal Wilczynski 
>> Signed-off-by: Michal Wilczynski 
>> Signed-off-by: Rafael J. Wysocki 
>> ---
>>  Documentation/firmware-guide/acpi/enumeration.rst |   43 
>> ++
>>  1 file changed, 43 insertions(+)
>>
>> Index: linux-pm/Documentation/firmware-guide/acpi/enumeration.rst
>> ===
>> --- linux-pm.orig/Documentation/firmware-guide/acpi/enumeration.rst
>> +++ linux-pm/Documentation/firmware-guide/acpi/enumeration.rst
>> @@ -64,6 +64,49 @@ If the driver needs to perform more comp
>>  configuring GPIOs it can get its ACPI handle and extract this information
>>  from ACPI tables.
>>  
>> +ACPI device objects
>> +===
>> +
>> +Generally speaking, there are two categories of devices in a system in which
>> +ACPI is used as an interface between the platform firmware and the OS: 
>> Devices
>> +that can be discovered and enumerated natively, through a protocol defined 
>> for
>> +the specific bus that they are on (for example, configuration space in PCI),
>> +without the platform firmware assistance, and devices that need to be 
>> described
>> +by the platform firmware so that they can be discovered.  Still, for any 
>> device
>> +known to the platform firmware, regardless of which category it falls into,
>> +there can be a corresponding ACPI device object in the ACPI Namespace in 
>> which
>> +case the Linux kernel will create a struct acpi_device object based on it 
>> for
>> +that device.
>> +
>> +Those struct acpi_device objects are never used for binding drivers to 
>> natively
>> +discoverable devices, because they are represented by other types of device
>> +objects (for example, struct pci_dev for PCI devices) that are bound to by
>> +device drivers (the corresponding struct acpi_device object is then used as
>> +an additional source of information on the configuration of the given 
>> device).
>> +Moreover, the core ACPI device enumeration code creates struct 
>> platform_device
>> +objects for the majority of devices that are discovered and enumerated with 
>> the
>> +help of the platform firmware and those platform device objects can be 
>> bound to
>> +by platform drivers in direct analogy with the natively enumerable devices
>> +case.  Therefore it is logically 

[PATCH v5 1/2] mm/memory_hotplug: split memmap_on_memory requests across memblocks

2023-10-05 Thread Vishal Verma
The MHP_MEMMAP_ON_MEMORY flag for hotplugged memory is restricted to
'memblock_size' chunks of memory being added. Adding a larger span of
memory precludes memmap_on_memory semantics.

For users of hotplug such as kmem, large amounts of memory might get
added from the CXL subsystem. In some cases, this amount may exceed the
available 'main memory' to store the memmap for the memory being added.
In this case, it is useful to have a way to place the memmap on the
memory being added, even if it means splitting the addition into
memblock-sized chunks.

Change add_memory_resource() to loop over memblock-sized chunks of
memory if caller requested memmap_on_memory, and if other conditions for
it are met. Teach try_remove_memory() to also expect that a memory
range being removed might have been split up into memblock sized chunks,
and to loop through those as needed.

Cc: Andrew Morton 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Dave Hansen 
Cc: Huang Ying 
Suggested-by: David Hildenbrand 
Signed-off-by: Vishal Verma 
---
 mm/memory_hotplug.c | 162 
 1 file changed, 99 insertions(+), 63 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f8d3e7427e32..77ec6f15f943 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1380,6 +1380,44 @@ static bool mhp_supports_memmap_on_memory(unsigned long 
size)
return arch_supports_memmap_on_memory(vmemmap_size);
 }
 
+static int add_memory_create_devices(int nid, struct memory_group *group,
+u64 start, u64 size, mhp_t mhp_flags)
+{
+   struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+   struct vmem_altmap mhp_altmap = {
+   .base_pfn =  PHYS_PFN(start),
+   .end_pfn  =  PHYS_PFN(start + size - 1),
+   };
+   int ret;
+
+   if ((mhp_flags & MHP_MEMMAP_ON_MEMORY)) {
+   mhp_altmap.free = memory_block_memmap_on_memory_pages();
+   params.altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+   if (!params.altmap)
+   return -ENOMEM;
+
+   memcpy(params.altmap, _altmap, sizeof(mhp_altmap));
+   }
+
+   /* call arch's memory hotadd */
+   ret = arch_add_memory(nid, start, size, );
+   if (ret < 0)
+   goto error;
+
+   /* create memory block devices after memory was added */
+   ret = create_memory_block_devices(start, size, params.altmap, group);
+   if (ret)
+   goto err_bdev;
+
+   return 0;
+
+err_bdev:
+   arch_remove_memory(start, size, NULL);
+error:
+   kfree(params.altmap);
+   return ret;
+}
+
 /*
  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
  * and online/offline operations (triggered e.g. by sysfs).
@@ -1388,14 +1426,10 @@ static bool mhp_supports_memmap_on_memory(unsigned long 
size)
  */
 int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 {
-   struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+   unsigned long memblock_size = memory_block_size_bytes();
enum memblock_flags memblock_flags = MEMBLOCK_NONE;
-   struct vmem_altmap mhp_altmap = {
-   .base_pfn =  PHYS_PFN(res->start),
-   .end_pfn  =  PHYS_PFN(res->end),
-   };
struct memory_group *group = NULL;
-   u64 start, size;
+   u64 start, size, cur_start;
bool new_node = false;
int ret;
 
@@ -1436,28 +1470,21 @@ int __ref add_memory_resource(int nid, struct resource 
*res, mhp_t mhp_flags)
/*
 * Self hosted memmap array
 */
-   if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
-   if (mhp_supports_memmap_on_memory(size)) {
-   mhp_altmap.free = memory_block_memmap_on_memory_pages();
-   params.altmap = kmalloc(sizeof(struct vmem_altmap), 
GFP_KERNEL);
-   if (!params.altmap)
+   if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
+   mhp_supports_memmap_on_memory(memblock_size)) {
+   for (cur_start = start; cur_start < start + size;
+cur_start += memblock_size) {
+   ret = add_memory_create_devices(nid, group, cur_start,
+   memblock_size,
+   mhp_flags);
+   if (ret)
goto error;
-
-   memcpy(params.altmap, _altmap, sizeof(mhp_altmap));
}
-   /* fallback to not using altmap  */
-   }
-
-   /* call arch's memory hotadd */
-   ret = arch_add_memory(nid, start, size, );
-   if (ret < 0)
-   goto error_free;
-
-   /* create memory block devices after memory was added */
-   ret = 

[PATCH v5 0/2] mm: use memmap_on_memory semantics for dax/kmem

2023-10-05 Thread Vishal Verma
The dax/kmem driver can potentially hot-add large amounts of memory
originating from CXL memory expanders, or NVDIMMs, or other 'device
memories'. There is a chance there isn't enough regular system memory
available to fit the memmap for this new memory. It's therefore
desirable, if all other conditions are met, for the kmem managed memory
to place its memmap on the newly added memory itself.

The main hurdle for accomplishing this for kmem is that memmap_on_memory
can only be done if the memory being added is equal to the size of one
memblock. To overcome this, allow the hotplug code to split an add_memory()
request into memblock-sized chunks, and try_remove_memory() to also
expect and handle such a scenario.

Patch 1 teaches the memory_hotplug code to allow for splitting
add_memory() and remove_memory() requests over memblock sized chunks.

Patch 2 adds a sysfs control for the kmem driver that would
allow an opt-out of using memmap_on_memory for the memory being added.

Signed-off-by: Vishal Verma 
---
Changes in v5:
- Separate out per-memblock operations from per memory block operations
  in try_remove_memory(), and rename the inner function appropriately.
  This does expand the scope of the memory hotplug lock to include
  remove_memory_block_devices(), but the alternative was to drop the
  lock in the inner function separately for each iteration, and then
  re-acquire it in try_remove_memory() creating a small window where
  the lock isn't held. (David Hildenbrand)
- Remove unnecessary rc check from the memmap_on_memory_store sysfs
  helper in patch 2 (Dan Carpenter)
- Link to v4: 
https://lore.kernel.org/r/20230928-vv-kmem_memmap-v4-0-6ff73fec5...@intel.com

Changes in v4:
- Rebase to Aneesh's PPC64 memmap_on_memory series v8 [2].
- Tweak a goto / error path in add_memory_create_devices() (Jonathan)
- Retain the old behavior for dax devices, only default to
  memmap_on_memory for CXL (Jonathan)
- Link to v3: 
https://lore.kernel.org/r/20230801-vv-kmem_memmap-v3-0-406e9aaf5...@intel.com

[2]: 
https://lore.kernel.org/linux-mm/20230808091501.287660-1-aneesh.ku...@linux.ibm.com

Changes in v3:
- Rebase on Aneesh's patches [1]
- Drop Patch 1 - it is not needed since [1] allows for dynamic setting
  of the memmap_on_memory param (David)
- Link to v2: 
https://lore.kernel.org/r/20230720-vv-kmem_memmap-v2-0-88bdaab34...@intel.com

[1]: https://lore.kernel.org/r/20230801044116.10674-1-aneesh.ku...@linux.ibm.com

Changes in v2:
- Drop the patch to create an override path for the memmap_on_memory
  module param (David)
- Move the chunking into memory_hotplug.c so that any caller of
  add_memory() can request this behavior. (David)
- Handle remove_memory() too. (David, Ying)
- Add a sysfs control in the kmem driver for memmap_on_memory semantics
  (David, Jonathan)
- Add a #else case to define mhp_supports_memmap_on_memory() if
  CONFIG_MEMORY_HOTPLUG is unset. (0day report)
- Link to v1: 
https://lore.kernel.org/r/20230613-vv-kmem_memmap-v1-0-f6de9c6af...@intel.com

---
Vishal Verma (2):
  mm/memory_hotplug: split memmap_on_memory requests across memblocks
  dax/kmem: allow kmem to add memory with memmap_on_memory

 drivers/dax/bus.h |   1 +
 drivers/dax/dax-private.h |   1 +
 drivers/dax/bus.c |  38 +++
 drivers/dax/cxl.c |   1 +
 drivers/dax/hmem/hmem.c   |   1 +
 drivers/dax/kmem.c|   8 ++-
 drivers/dax/pmem.c|   1 +
 mm/memory_hotplug.c   | 162 --
 8 files changed, 149 insertions(+), 64 deletions(-)
---
base-commit: 25b5b1a0646c3d39e1d885e27c10be1c9e202bf2
change-id: 20230613-vv-kmem_memmap-5483c8d04279

Best regards,
-- 
Vishal Verma 




[PATCH v5 2/2] dax/kmem: allow kmem to add memory with memmap_on_memory

2023-10-05 Thread Vishal Verma
Large amounts of memory managed by the kmem driver may come in via CXL,
and it is often desirable to have the memmap for this memory on the new
memory itself.

Enroll kmem-managed memory for memmap_on_memory semantics if the dax
region originates via CXL. For non-CXL dax regions, retain the existing
default behavior of hot adding without memmap_on_memory semantics.

Add a sysfs override under the dax device to control this behavior and
override either default.

Cc: Andrew Morton 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Dave Hansen 
Cc: Huang Ying 
Reviewed-by: Jonathan Cameron 
Reviewed-by: David Hildenbrand 
Signed-off-by: Vishal Verma 
---
 drivers/dax/bus.h |  1 +
 drivers/dax/dax-private.h |  1 +
 drivers/dax/bus.c | 38 ++
 drivers/dax/cxl.c |  1 +
 drivers/dax/hmem/hmem.c   |  1 +
 drivers/dax/kmem.c|  8 +++-
 drivers/dax/pmem.c|  1 +
 7 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 1ccd23360124..cbbf64443098 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -23,6 +23,7 @@ struct dev_dax_data {
struct dev_pagemap *pgmap;
resource_size_t size;
int id;
+   bool memmap_on_memory;
 };
 
 struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 27cf2d79..446617b73aea 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -70,6 +70,7 @@ struct dev_dax {
struct ida ida;
struct device dev;
struct dev_pagemap *pgmap;
+   bool memmap_on_memory;
int nr_range;
struct dev_dax_range {
unsigned long pgoff;
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 0ee96e6fc426..43be95a231c9 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -367,6 +367,7 @@ static ssize_t create_store(struct device *dev, struct 
device_attribute *attr,
.dax_region = dax_region,
.size = 0,
.id = -1,
+   .memmap_on_memory = false,
};
struct dev_dax *dev_dax = devm_create_dev_dax();
 
@@ -1269,6 +1270,40 @@ static ssize_t numa_node_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(numa_node);
 
+static ssize_t memmap_on_memory_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   struct dev_dax *dev_dax = to_dev_dax(dev);
+
+   return sprintf(buf, "%d\n", dev_dax->memmap_on_memory);
+}
+
+static ssize_t memmap_on_memory_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+   struct dev_dax *dev_dax = to_dev_dax(dev);
+   struct dax_region *dax_region = dev_dax->region;
+   ssize_t rc;
+   bool val;
+
+   rc = kstrtobool(buf, );
+   if (rc)
+   return rc;
+
+   device_lock(dax_region->dev);
+   if (!dax_region->dev->driver) {
+   device_unlock(dax_region->dev);
+   return -ENXIO;
+   }
+
+   dev_dax->memmap_on_memory = val;
+
+   device_unlock(dax_region->dev);
+   return len;
+}
+static DEVICE_ATTR_RW(memmap_on_memory);
+
 static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int 
n)
 {
struct device *dev = container_of(kobj, struct device, kobj);
@@ -1295,6 +1330,7 @@ static struct attribute *dev_dax_attributes[] = {
_attr_align.attr,
_attr_resource.attr,
_attr_numa_node.attr,
+   _attr_memmap_on_memory.attr,
NULL,
 };
 
@@ -1400,6 +1436,8 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data 
*data)
dev_dax->align = dax_region->align;
ida_init(_dax->ida);
 
+   dev_dax->memmap_on_memory = data->memmap_on_memory;
+
inode = dax_inode(dax_dev);
dev->devt = inode->i_rdev;
dev->bus = _bus_type;
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 8bc9d04034d6..c696837ab23c 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -26,6 +26,7 @@ static int cxl_dax_region_probe(struct device *dev)
.dax_region = dax_region,
.id = -1,
.size = range_len(_dax->hpa_range),
+   .memmap_on_memory = true,
};
 
return PTR_ERR_OR_ZERO(devm_create_dev_dax());
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 5d2ddef0f8f5..b9da69f92697 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -36,6 +36,7 @@ static int dax_hmem_probe(struct platform_device *pdev)
.dax_region = dax_region,
.id = -1,
.size = region_idle ? 0 : range_len(>range),
+   .memmap_on_memory = false,
};
 
return 

Re: [PATCH v1 1/9] ACPI: bus: Make notify wrappers more generic

2023-10-05 Thread Rafael J. Wysocki
On Thu, Oct 5, 2023 at 8:27 PM Wilczynski, Michal
 wrote:
>
>
>
> On 10/5/2023 7:03 PM, Rafael J. Wysocki wrote:
> > On Thursday, October 5, 2023 5:30:59 PM CEST Rafael J. Wysocki wrote:
> >> On Thu, Oct 5, 2023 at 2:05 PM Wilczynski, Michal
> >>  wrote:
> >>> On 10/5/2023 12:57 PM, Rafael J. Wysocki wrote:
>  On Thu, Oct 5, 2023 at 10:10 AM Wilczynski, Michal
>   wrote:
> >> [cut]
> >>
> >> That said, why exactly is it better to use acpi_handle instead of a
> >> struct acpi_device pointer?
> > I wanted to make the wrapper as close as possible to the wrapped 
> > function.
> > This way it would be easier to remove it in the future i.e if we ever 
> > deem
> > extra synchronization not worth it etc. What the ACPICA function need to
> > install a wrapper is a handle not a pointer to a device.
> > So there is no need for a middle man.
>  Taking a struct acpi_device pointer as the first argument is part of
>  duplication reduction, however, because in the most common case it
>  saves the users of it the need to dereference the struct acpi_device
>  they get from ACPI_COMPANION() in order to obtain the handle.
> >>> User don't even have to use acpi device anywhere, as he can choose
> >>> to use ACPI_HANDLE() instead on 'struct device*' and never interact
> >>> with acpi device directly.
> >> Have you actually looked at this macro?  It is a wrapper around
> >> ACPI_COMPANION().
> >>
> >> So they may think that they don't use struct acpi_device pointers, but
> >> in fact they do.
> >>
>  Arguably, acpi_handle is an ACPICA concept and it is better to reduce
>  its usage outside ACPICA.
> >>> Use of acpi_handle is deeply entrenched in the kernel. There is even
> >>> a macro ACPI_HANDLE() that returns acpi_handle. I would say it's
> >>> way too late to limit it to ACPICA internal code.
> >> So there is a difference between "limiting to ACPICA" and "reducing".
> >> It cannot be limited to ACPICA, because the code outside ACPICA needs
> >> to evaluate ACPI objects sometimes and ACPI handles are needed for
> >> that.
> >>
> >> And this observation doesn't invalidate the point.
> >>
> >> Realistically, in a platform driver you'll need the latter to obtain
> >> the former anyway.
> > I don't want to introduce arbitrary limitations where they are not 
> > necessary.
>  I'm not sure what you mean.  This patch is changing existing functions.
> >>> That's true, but those functions aren't yet deeply entrenched in the
> >>> kernel yet, so in my view how they should look like should still be
> >>> a subject for discussion, as for now they're only used locally in
> >>> drivers/acpi, and my next patchset, that would remove .notify in
> >>> platform directory would spread them more, and would
> >>> make them harder to change. For now we can change how they
> >>> work pretty painlessly.
> >> I see no particular reason to do that, though.
> >>
> >> What specifically is a problem with passing struct acpi_device
> >> pointers to the wrappers?  I don't see any TBH.
> >>
> > It is often the case that driver allocates it's own private struct 
> > using kmalloc
> > family of functions, and that structure already contains everything 
> > that is
> > needed to remove the handler, so why force ? There are already examples
> > in the drivers that do that i.e in acpi_video the function
> > acpi_video_dev_add_notify_handler() uses raw ACPICA handler to install
> > a notify handler and it passes private structure there.
> > So there is value in leaving the choice of an actual type to the user 
> > of the
> > API.
>  No, if the user has a pointer to struct acpi_device already, there is
>  no difference between passing this and passing the acpi_handle from it
>  except for the extra dereference in the latter case.
> >>> Dereference would happen anyway in the wrapper, and it doesn't cause
> >>> any harm anyway for readability in my opinion. And of course you don't
> >>> have to use acpi device at all, you can use ACPI_HANDLE() macro.
> >> So one can use ACPI_COMPANION() just as well and it is slightly less 
> >> overhead.
> >>
>  If the user doesn't have a struct acpi_device pointer, let them use
>  the raw ACPICA handler directly and worry about the synchronization
>  themselves.
> >>> As mentioned acpi_device pointer is not really required to use the 
> >>> wrapper.
> >>> Instead we can use ACPI_HANDLE() macro directly. Look at the usage of
> >>> the wrapper in the AC driver [1].
> >> You don't really have to repeat the same argument  several times and I
> >> know how ACPI_HANDLE() works.  Also I don't like some of the things
> >> done by this patch.
> >>
> >> Whoever uses ACPI_HANDLE(), they also use ACPI_COMPANION() which is
> >> hidden in the former.
> >>
> >> If they don't need to store either the acpi_handle or the struct
> >> acpi_device pointer, there is no reason at all to use the 

Re: [PATCH v1 2/9] docs: firmware-guide: ACPI: Clarify ACPI bus concepts

2023-10-05 Thread Wilczynski, Michal



On 10/5/2023 7:57 PM, Rafael J. Wysocki wrote:
> On Monday, September 25, 2023 4:48:35 PM CEST Michal Wilczynski wrote:
>> Some devices implement ACPI driver as a way to manage devices
>> enumerated by the ACPI. This might be confusing as a preferred way to
>> implement a driver for devices not connected to any bus is a platform
>> driver, as stated in the documentation. Clarify relationships between
>> ACPI device, platform device and ACPI entries.
>>
>> Suggested-by: Elena Reshetova 
>> Signed-off-by: Michal Wilczynski 
>> ---
>>  Documentation/firmware-guide/acpi/enumeration.rst | 13 +
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/Documentation/firmware-guide/acpi/enumeration.rst 
>> b/Documentation/firmware-guide/acpi/enumeration.rst
>> index 56d9913a3370..f56cc79a9e83 100644
>> --- a/Documentation/firmware-guide/acpi/enumeration.rst
>> +++ b/Documentation/firmware-guide/acpi/enumeration.rst
>> @@ -64,6 +64,19 @@ If the driver needs to perform more complex 
>> initialization like getting and
>>  configuring GPIOs it can get its ACPI handle and extract this information
>>  from ACPI tables.
>>  
>> +ACPI bus
>> +
>> +
>> +Historically some devices not connected to any bus were represented as ACPI
>> +devices, and had to implement ACPI driver. This is not a preferred way for 
>> new
>> +drivers. As explained above devices not connected to any bus should 
>> implement
>> +platform driver. ACPI device would be created during enumeration 
>> nonetheless,
>> +and would be accessible through ACPI_COMPANION() macro, and the ACPI handle 
>> would
>> +be accessible through ACPI_HANDLE() macro. ACPI device is meant to describe
>> +information related to ACPI entry e.g. handle of the ACPI entry. Think -
>> +ACPI device interfaces with the FW, and the platform device with the rest of
>> +the system.
>> +
>>  DMA support
>>  ===
> I rewrote the above entirely, so here's a new patch to replace this one:
>
> ---
> From: Rafael J. Wysocki 
> Subject: [PATCH v2 2/9] ACPI: docs: enumeration: Clarify ACPI bus concepts
>
> In some cases, ACPI drivers are implemented as a way to manage devices
> enumerated with the help of the platform firmware through ACPI.
>
> This might be confusing, since the preferred way to implement a driver
> for a device that cannot be enumerated natively, is a platform
> driver, as stated in the documentation.
>
> Clarify relationships between ACPI device objects, platform devices and
> ACPI Namespace entries.
>
> Suggested-by: Elena Reshetova 
> Co-developed-by: Michal Wilczynski 
> Signed-off-by: Michal Wilczynski 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  Documentation/firmware-guide/acpi/enumeration.rst |   43 
> ++
>  1 file changed, 43 insertions(+)
>
> Index: linux-pm/Documentation/firmware-guide/acpi/enumeration.rst
> ===
> --- linux-pm.orig/Documentation/firmware-guide/acpi/enumeration.rst
> +++ linux-pm/Documentation/firmware-guide/acpi/enumeration.rst
> @@ -64,6 +64,49 @@ If the driver needs to perform more comp
>  configuring GPIOs it can get its ACPI handle and extract this information
>  from ACPI tables.
>  
> +ACPI device objects
> +===
> +
> +Generally speaking, there are two categories of devices in a system in which
> +ACPI is used as an interface between the platform firmware and the OS: 
> Devices
> +that can be discovered and enumerated natively, through a protocol defined 
> for
> +the specific bus that they are on (for example, configuration space in PCI),
> +without the platform firmware assistance, and devices that need to be 
> described
> +by the platform firmware so that they can be discovered.  Still, for any 
> device
> +known to the platform firmware, regardless of which category it falls into,
> +there can be a corresponding ACPI device object in the ACPI Namespace in 
> which
> +case the Linux kernel will create a struct acpi_device object based on it for
> +that device.
> +
> +Those struct acpi_device objects are never used for binding drivers to 
> natively
> +discoverable devices, because they are represented by other types of device
> +objects (for example, struct pci_dev for PCI devices) that are bound to by
> +device drivers (the corresponding struct acpi_device object is then used as
> +an additional source of information on the configuration of the given 
> device).
> +Moreover, the core ACPI device enumeration code creates struct 
> platform_device
> +objects for the majority of devices that are discovered and enumerated with 
> the
> +help of the platform firmware and those platform device objects can be bound 
> to
> +by platform drivers in direct analogy with the natively enumerable devices
> +case.  Therefore it is logically inconsistent and so generally invalid to 
> bind
> +drivers to struct acpi_device objects, including drivers for devices that are
> +discovered with the help of the platform 

Re: [PATCH v1 1/9] ACPI: bus: Make notify wrappers more generic

2023-10-05 Thread Wilczynski, Michal



On 10/5/2023 7:03 PM, Rafael J. Wysocki wrote:
> On Thursday, October 5, 2023 5:30:59 PM CEST Rafael J. Wysocki wrote:
>> On Thu, Oct 5, 2023 at 2:05 PM Wilczynski, Michal
>>  wrote:
>>> On 10/5/2023 12:57 PM, Rafael J. Wysocki wrote:
 On Thu, Oct 5, 2023 at 10:10 AM Wilczynski, Michal
  wrote:
>> [cut]
>>
>> That said, why exactly is it better to use acpi_handle instead of a
>> struct acpi_device pointer?
> I wanted to make the wrapper as close as possible to the wrapped function.
> This way it would be easier to remove it in the future i.e if we ever deem
> extra synchronization not worth it etc. What the ACPICA function need to
> install a wrapper is a handle not a pointer to a device.
> So there is no need for a middle man.
 Taking a struct acpi_device pointer as the first argument is part of
 duplication reduction, however, because in the most common case it
 saves the users of it the need to dereference the struct acpi_device
 they get from ACPI_COMPANION() in order to obtain the handle.
>>> User don't even have to use acpi device anywhere, as he can choose
>>> to use ACPI_HANDLE() instead on 'struct device*' and never interact
>>> with acpi device directly.
>> Have you actually looked at this macro?  It is a wrapper around
>> ACPI_COMPANION().
>>
>> So they may think that they don't use struct acpi_device pointers, but
>> in fact they do.
>>
 Arguably, acpi_handle is an ACPICA concept and it is better to reduce
 its usage outside ACPICA.
>>> Use of acpi_handle is deeply entrenched in the kernel. There is even
>>> a macro ACPI_HANDLE() that returns acpi_handle. I would say it's
>>> way too late to limit it to ACPICA internal code.
>> So there is a difference between "limiting to ACPICA" and "reducing".
>> It cannot be limited to ACPICA, because the code outside ACPICA needs
>> to evaluate ACPI objects sometimes and ACPI handles are needed for
>> that.
>>
>> And this observation doesn't invalidate the point.
>>
>> Realistically, in a platform driver you'll need the latter to obtain
>> the former anyway.
> I don't want to introduce arbitrary limitations where they are not 
> necessary.
 I'm not sure what you mean.  This patch is changing existing functions.
>>> That's true, but those functions aren't yet deeply entrenched in the
>>> kernel yet, so in my view how they should look like should still be
>>> a subject for discussion, as for now they're only used locally in
>>> drivers/acpi, and my next patchset, that would remove .notify in
>>> platform directory would spread them more, and would
>>> make them harder to change. For now we can change how they
>>> work pretty painlessly.
>> I see no particular reason to do that, though.
>>
>> What specifically is a problem with passing struct acpi_device
>> pointers to the wrappers?  I don't see any TBH.
>>
> It is often the case that driver allocates it's own private struct using 
> kmalloc
> family of functions, and that structure already contains everything that 
> is
> needed to remove the handler, so why force ? There are already examples
> in the drivers that do that i.e in acpi_video the function
> acpi_video_dev_add_notify_handler() uses raw ACPICA handler to install
> a notify handler and it passes private structure there.
> So there is value in leaving the choice of an actual type to the user of 
> the
> API.
 No, if the user has a pointer to struct acpi_device already, there is
 no difference between passing this and passing the acpi_handle from it
 except for the extra dereference in the latter case.
>>> Dereference would happen anyway in the wrapper, and it doesn't cause
>>> any harm anyway for readability in my opinion. And of course you don't
>>> have to use acpi device at all, you can use ACPI_HANDLE() macro.
>> So one can use ACPI_COMPANION() just as well and it is slightly less 
>> overhead.
>>
 If the user doesn't have a struct acpi_device pointer, let them use
 the raw ACPICA handler directly and worry about the synchronization
 themselves.
>>> As mentioned acpi_device pointer is not really required to use the wrapper.
>>> Instead we can use ACPI_HANDLE() macro directly. Look at the usage of
>>> the wrapper in the AC driver [1].
>> You don't really have to repeat the same argument  several times and I
>> know how ACPI_HANDLE() works.  Also I don't like some of the things
>> done by this patch.
>>
>> Whoever uses ACPI_HANDLE(), they also use ACPI_COMPANION() which is
>> hidden in the former.
>>
>> If they don't need to store either the acpi_handle or the struct
>> acpi_device pointer, there is no reason at all to use the former
>> instead of the latter.
>>
>> If they get an acpi_handle from somewhere else than ACPI_HANDLE(),
>> then yes, they would need to get the ACPI devices from there (which is
>> possible still), but they may be better off by using the raw ACPICA
>> interface for events 

Re: [PATCH] [v3] ieee802154: ca8210: Fix a potential UAF in ca8210_probe

2023-10-05 Thread Stefan Schmidt

Hello Dinghao,


On 01.10.23 09:19, kernel test robot wrote:

Hi Dinghao,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.6-rc3 next-20230929]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Dinghao-Liu/ieee802154-ca8210-Fix-a-potential-UAF-in-ca8210_probe/20231001-135130
base:   linus/master
patch link:
https://lore.kernel.org/r/20231001054949.14624-1-dinghao.liu%40zju.edu.cn
patch subject: [PATCH] [v3] ieee802154: ca8210: Fix a potential UAF in 
ca8210_probe
config: m68k-allyesconfig 
(https://download.01.org/0day-ci/archive/20231001/202310011548.qyqmuodi-...@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20231001/202310011548.qyqmuodi-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202310011548.qyqmuodi-...@intel.com/

All warnings (new ones prefixed by >>):

drivers/net/ieee802154/ca8210.c: In function 'ca8210_register_ext_clock':

drivers/net/ieee802154/ca8210.c:2743:13: warning: unused variable 'ret' 
[-Wunused-variable]

 2743 | int ret = 0;
  | ^~~



Please take care of this now unused variable after your re-factor.
With this fixed and send out as v4 I am happy to get this applied to the 
wpan tree.


regards
Stefan Schmidt


Re: [PATCH v3 03/13] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2023-10-05 Thread Edgecombe, Rick P
On Mon, 2023-09-18 at 10:29 +0300, Mike Rapoport wrote:
> +/**
> + * struct execmem_range - definition of a memory range suitable for
> code and
> + *   related data allocations
> + * @start: address space start
> + * @end:   address space end (inclusive)
> + * @pgprot:permissions for memory in this address space
> + * @alignment: alignment required for text allocations
> + */
> +struct execmem_range {
> +   unsigned long   start;
> +   unsigned long   end;
> +   pgprot_t    pgprot;
> +   unsigned intalignment;
> +};

Not a strong opinion, but range doesn't seem an appropriate name. It
*has* a range, but also other allocation configuration. It gets
especially confusing when multiple "ranges" have the same range. Maybe
execmem_alloc_params?


Re: [PATCH v3 03/13] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2023-10-05 Thread Edgecombe, Rick P
On Thu, 2023-10-05 at 08:26 +0300, Mike Rapoport wrote:
> On Wed, Oct 04, 2023 at 03:39:26PM +, Edgecombe, Rick P wrote:
> > On Tue, 2023-10-03 at 17:29 -0700, Rick Edgecombe wrote:
> > > It seems a bit weird to copy all of this. Is it trying to be
> > > faster
> > > or
> > > something?
> > > 
> > > Couldn't it just check r->start in execmem_text/data_alloc() path
> > > and
> > > switch to EXECMEM_DEFAULT if needed then? The
> > > execmem_range_is_data()
> > > part that comes later could be added to the logic there too. So
> > > this
> > > seems like unnecessary complexity to me or I don't see the
> > > reason.
> > 
> > I guess this is a bad idea because if you have the full size array
> > sitting around anyway you might as well use it and reduce the
> > exec_mem_alloc() logic.
> 
> That's was the idea, indeed. :)
> 
> > Just looking at it from the x86 side (and
> > similar) though, where there is actually only one execmem_range and
> > it
> > building this whole array with identical data and it seems weird.
> 
> Right, most architectures have only one range, but to support all
> variants
> that we have, execmem has to maintain the whole array.

What about just having an index into a smaller set of ranges. The
module area and the extra JIT area. So ->ranges can be size 3
(statically allocated in the arch code) for three areas and then the
index array can be size EXECMEM_TYPE_MAX. The default 0 value of the
indexing array will point to the default area and any special areas can
be set in the index point to the desired range.

Looking at how it would do for x86 and arm64, it looks maybe a bit
better to me. A little bit less code and memory usage, and a bit easier
to trace the configuration through to the final state (IMO). What do
you think? Very rough, on top of this series, below.

As I was playing around with this, I was also wondering why it needs
two copies of struct execmem_params: one returned from the arch code
and one in exec mem. And why the temporary arch copy is ro_after_init,
but the final execmem.c copy is not ro_after_init?

 arch/arm64/mm/init.c| 67 ++---
--
 arch/x86/mm/init.c  | 24 +---
 include/linux/execmem.h |  5 +++--
 mm/execmem.c| 61 -

 4 files changed, 70 insertions(+), 87 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 9b7716b4d84c..7df119101f20 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -633,49 +633,58 @@ static int __init module_init_limits(void)
return 0;
 }
 
-static struct execmem_params execmem_params __ro_after_init = {
-   .ranges = {
-   [EXECMEM_DEFAULT] = {
-   .flags = EXECMEM_KASAN_SHADOW,
-   .alignment = MODULE_ALIGN,
-   },
-   [EXECMEM_KPROBES] = {
-   .start = VMALLOC_START,
-   .end = VMALLOC_END,
-   .alignment = 1,
-   },
-   [EXECMEM_BPF] = {
-   .start = VMALLOC_START,
-   .end = VMALLOC_END,
-   .alignment = 1,
-   },
+static struct execmem_range[2] ranges __ro_after_init = {
+   /* Module area */
+   [0] = {
+   .flags = EXECMEM_KASAN_SHADOW,
+   .alignment = MODULE_ALIGN,
+   },
+   /* Kprobes area */
+   [1] = {
+   .start = VMALLOC_START,
+   .end = VMALLOC_END,
+   .alignment = 1,
+   },
+   /* BPF area */
+   [2] = {
+   .start = VMALLOC_START,
+   .end = VMALLOC_END,
+   .alignment = 1,
},
 };
 
-struct execmem_params __init *execmem_arch_params(void)
+void __init execmem_arch_params(struct execmem_params *p)
 {
-   struct execmem_range *r =
_params.ranges[EXECMEM_DEFAULT];
+   struct execmem_range *default;
+   struct execmem_range *jit;
+
+   p->ranges = 
 
module_init_limits();
 
-   r->pgprot = PAGE_KERNEL;
-
+   /* Default area */
+   default = [0];
+   default->pgprot = PAGE_KERNEL;
if (module_direct_base) {
-   r->start = module_direct_base;
-   r->end = module_direct_base + SZ_128M;
+   default->start = module_direct_base;
+   default->end = module_direct_base + SZ_128M;
 
if (module_plt_base) {
-   r->fallback_start = module_plt_base;
-   r->fallback_end = module_plt_base + SZ_2G;
+   default->fallback_start = module_plt_base;
+   default->fallback_end = module_plt_base +
SZ_2G;
}
} else if (module_plt_base) {
-   r->start = module_plt_base;
-   r->end = module_plt_base + SZ_2G;
+   default->start = 

Re: [PATCH v1 2/9] docs: firmware-guide: ACPI: Clarify ACPI bus concepts

2023-10-05 Thread Rafael J. Wysocki
On Monday, September 25, 2023 4:48:35 PM CEST Michal Wilczynski wrote:
> Some devices implement ACPI driver as a way to manage devices
> enumerated by the ACPI. This might be confusing as a preferred way to
> implement a driver for devices not connected to any bus is a platform
> driver, as stated in the documentation. Clarify relationships between
> ACPI device, platform device and ACPI entries.
> 
> Suggested-by: Elena Reshetova 
> Signed-off-by: Michal Wilczynski 
> ---
>  Documentation/firmware-guide/acpi/enumeration.rst | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/Documentation/firmware-guide/acpi/enumeration.rst 
> b/Documentation/firmware-guide/acpi/enumeration.rst
> index 56d9913a3370..f56cc79a9e83 100644
> --- a/Documentation/firmware-guide/acpi/enumeration.rst
> +++ b/Documentation/firmware-guide/acpi/enumeration.rst
> @@ -64,6 +64,19 @@ If the driver needs to perform more complex initialization 
> like getting and
>  configuring GPIOs it can get its ACPI handle and extract this information
>  from ACPI tables.
>  
> +ACPI bus
> +
> +
> +Historically some devices not connected to any bus were represented as ACPI
> +devices, and had to implement ACPI driver. This is not a preferred way for 
> new
> +drivers. As explained above devices not connected to any bus should implement
> +platform driver. ACPI device would be created during enumeration nonetheless,
> +and would be accessible through ACPI_COMPANION() macro, and the ACPI handle 
> would
> +be accessible through ACPI_HANDLE() macro. ACPI device is meant to describe
> +information related to ACPI entry e.g. handle of the ACPI entry. Think -
> +ACPI device interfaces with the FW, and the platform device with the rest of
> +the system.
> +
>  DMA support
>  ===

I rewrote the above entirely, so here's a new patch to replace this one:

---
From: Rafael J. Wysocki 
Subject: [PATCH v2 2/9] ACPI: docs: enumeration: Clarify ACPI bus concepts

In some cases, ACPI drivers are implemented as a way to manage devices
enumerated with the help of the platform firmware through ACPI.

This might be confusing, since the preferred way to implement a driver
for a device that cannot be enumerated natively, is a platform
driver, as stated in the documentation.

Clarify relationships between ACPI device objects, platform devices and
ACPI Namespace entries.

Suggested-by: Elena Reshetova 
Co-developed-by: Michal Wilczynski 
Signed-off-by: Michal Wilczynski 
Signed-off-by: Rafael J. Wysocki 
---
 Documentation/firmware-guide/acpi/enumeration.rst |   43 ++
 1 file changed, 43 insertions(+)

Index: linux-pm/Documentation/firmware-guide/acpi/enumeration.rst
===
--- linux-pm.orig/Documentation/firmware-guide/acpi/enumeration.rst
+++ linux-pm/Documentation/firmware-guide/acpi/enumeration.rst
@@ -64,6 +64,49 @@ If the driver needs to perform more comp
 configuring GPIOs it can get its ACPI handle and extract this information
 from ACPI tables.
 
+ACPI device objects
+===
+
+Generally speaking, there are two categories of devices in a system in which
+ACPI is used as an interface between the platform firmware and the OS: Devices
+that can be discovered and enumerated natively, through a protocol defined for
+the specific bus that they are on (for example, configuration space in PCI),
+without the platform firmware assistance, and devices that need to be described
+by the platform firmware so that they can be discovered.  Still, for any device
+known to the platform firmware, regardless of which category it falls into,
+there can be a corresponding ACPI device object in the ACPI Namespace in which
+case the Linux kernel will create a struct acpi_device object based on it for
+that device.
+
+Those struct acpi_device objects are never used for binding drivers to natively
+discoverable devices, because they are represented by other types of device
+objects (for example, struct pci_dev for PCI devices) that are bound to by
+device drivers (the corresponding struct acpi_device object is then used as
+an additional source of information on the configuration of the given device).
+Moreover, the core ACPI device enumeration code creates struct platform_device
+objects for the majority of devices that are discovered and enumerated with the
+help of the platform firmware and those platform device objects can be bound to
+by platform drivers in direct analogy with the natively enumerable devices
+case.  Therefore it is logically inconsistent and so generally invalid to bind
+drivers to struct acpi_device objects, including drivers for devices that are
+discovered with the help of the platform firmware.
+
+Historically, ACPI drivers that bound directly to struct acpi_device objects
+were implemented for some devices enumerated with the help of the platform
+firmware, but this is not recommended for any new drivers.  As explained 

Re: [PATCH v1 1/9] ACPI: bus: Make notify wrappers more generic

2023-10-05 Thread Rafael J. Wysocki
On Thursday, October 5, 2023 5:30:59 PM CEST Rafael J. Wysocki wrote:
> On Thu, Oct 5, 2023 at 2:05 PM Wilczynski, Michal
>  wrote:
> > On 10/5/2023 12:57 PM, Rafael J. Wysocki wrote:
> > > On Thu, Oct 5, 2023 at 10:10 AM Wilczynski, Michal
> > >  wrote:
> 
> [cut]
> 
> > >>>
> > >>> That said, why exactly is it better to use acpi_handle instead of a
> > >>> struct acpi_device pointer?
> > >> I wanted to make the wrapper as close as possible to the wrapped 
> > >> function.
> > >> This way it would be easier to remove it in the future i.e if we ever 
> > >> deem
> > >> extra synchronization not worth it etc. What the ACPICA function need to
> > >> install a wrapper is a handle not a pointer to a device.
> > >> So there is no need for a middle man.
> > > Taking a struct acpi_device pointer as the first argument is part of
> > > duplication reduction, however, because in the most common case it
> > > saves the users of it the need to dereference the struct acpi_device
> > > they get from ACPI_COMPANION() in order to obtain the handle.
> >
> > User don't even have to use acpi device anywhere, as he can choose
> > to use ACPI_HANDLE() instead on 'struct device*' and never interact
> > with acpi device directly.
> 
> Have you actually looked at this macro?  It is a wrapper around
> ACPI_COMPANION().
> 
> So they may think that they don't use struct acpi_device pointers, but
> in fact they do.
> 
> > >
> > > Arguably, acpi_handle is an ACPICA concept and it is better to reduce
> > > its usage outside ACPICA.
> >
> > Use of acpi_handle is deeply entrenched in the kernel. There is even
> > a macro ACPI_HANDLE() that returns acpi_handle. I would say it's
> > way too late to limit it to ACPICA internal code.
> 
> So there is a difference between "limiting to ACPICA" and "reducing".
> It cannot be limited to ACPICA, because the code outside ACPICA needs
> to evaluate ACPI objects sometimes and ACPI handles are needed for
> that.
> 
> And this observation doesn't invalidate the point.
> 
> > >
> > >>> Realistically, in a platform driver you'll need the latter to obtain
> > >>> the former anyway.
> > >> I don't want to introduce arbitrary limitations where they are not 
> > >> necessary.
> > > I'm not sure what you mean.  This patch is changing existing functions.
> >
> > That's true, but those functions aren't yet deeply entrenched in the
> > kernel yet, so in my view how they should look like should still be
> > a subject for discussion, as for now they're only used locally in
> > drivers/acpi, and my next patchset, that would remove .notify in
> > platform directory would spread them more, and would
> > make them harder to change. For now we can change how they
> > work pretty painlessly.
> 
> I see no particular reason to do that, though.
> 
> What specifically is a problem with passing struct acpi_device
> pointers to the wrappers?  I don't see any TBH.
> 
> > >
> > >> It is often the case that driver allocates it's own private struct using 
> > >> kmalloc
> > >> family of functions, and that structure already contains everything that 
> > >> is
> > >> needed to remove the handler, so why force ? There are already examples
> > >> in the drivers that do that i.e in acpi_video the function
> > >> acpi_video_dev_add_notify_handler() uses raw ACPICA handler to install
> > >> a notify handler and it passes private structure there.
> > >> So there is value in leaving the choice of an actual type to the user of 
> > >> the
> > >> API.
> > > No, if the user has a pointer to struct acpi_device already, there is
> > > no difference between passing this and passing the acpi_handle from it
> > > except for the extra dereference in the latter case.
> >
> > Dereference would happen anyway in the wrapper, and it doesn't cause
> > any harm anyway for readability in my opinion. And of course you don't
> > have to use acpi device at all, you can use ACPI_HANDLE() macro.
> 
> So one can use ACPI_COMPANION() just as well and it is slightly less overhead.
> 
> > >
> > > If the user doesn't have a struct acpi_device pointer, let them use
> > > the raw ACPICA handler directly and worry about the synchronization
> > > themselves.
> >
> > As mentioned acpi_device pointer is not really required to use the wrapper.
> > Instead we can use ACPI_HANDLE() macro directly. Look at the usage of
> > the wrapper in the AC driver [1].
> 
> You don't really have to repeat the same argument  several times and I
> know how ACPI_HANDLE() works.  Also I don't like some of the things
> done by this patch.
> 
> Whoever uses ACPI_HANDLE(), they also use ACPI_COMPANION() which is
> hidden in the former.
> 
> If they don't need to store either the acpi_handle or the struct
> acpi_device pointer, there is no reason at all to use the former
> instead of the latter.
> 
> If they get an acpi_handle from somewhere else than ACPI_HANDLE(),
> then yes, they would need to get the ACPI devices from there (which is
> possible still), but they may be better off by using 

Re: [PATCH] bus: mhi: host: Add tracing support

2023-10-05 Thread kernel test robot
Hi Krishna,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 3006adf3be79cde4d14b1800b963b82b6e5572e0]

url:
https://github.com/intel-lab-lkp/linux/commits/Krishna-chaitanya-chundru/bus-mhi-host-Add-tracing-support/20231005-231430
base:   3006adf3be79cde4d14b1800b963b82b6e5572e0
patch link:
https://lore.kernel.org/r/20231005-ftrace_support-v1-1-23a2f394fa49%40quicinc.com
patch subject: [PATCH] bus: mhi: host: Add tracing support
config: m68k-allyesconfig 
(https://download.01.org/0day-ci/archive/20231006/202310060033.z0ojejxe-...@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20231006/202310060033.z0ojejxe-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202310060033.z0ojejxe-...@intel.com/

All warnings (new ones prefixed by >>):

   drivers/bus/mhi/host/main.c: In function 'mhi_process_ctrl_ev_ring':
>> drivers/bus/mhi/host/main.c:834:74: warning: cast from pointer to integer of 
>> different size [-Wpointer-to-int-cast]
 834 | 
trace_mhi_process_ctrl_ev_ring(mhi_cntrl->mhi_dev->name, (u64)(local_rp),
 |  
^
   drivers/bus/mhi/host/main.c: In function 'mhi_gen_tre':
   drivers/bus/mhi/host/main.c:1245:69: warning: cast from pointer to integer 
of different size [-Wpointer-to-int-cast]
1245 | trace_mhi_gen_tre(mhi_cntrl->mhi_dev->name, mhi_chan->chan, 
(u64)(mhi_tre),
 | ^
--
   drivers/bus/mhi/host/pm.c: In function 'mhi_pm_st_worker':
>> drivers/bus/mhi/host/pm.c:758:24: warning: unused variable 'dev' 
>> [-Wunused-variable]
 758 | struct device *dev = _cntrl->mhi_dev->dev;
 |^~~


vim +834 drivers/bus/mhi/host/main.c

   799  
   800  int mhi_process_ctrl_ev_ring(struct mhi_controller *mhi_cntrl,
   801   struct mhi_event *mhi_event,
   802   u32 event_quota)
   803  {
   804  struct mhi_ring_element *dev_rp, *local_rp;
   805  struct mhi_ring *ev_ring = _event->ring;
   806  struct mhi_event_ctxt *er_ctxt =
   807  _cntrl->mhi_ctxt->er_ctxt[mhi_event->er_index];
   808  struct mhi_chan *mhi_chan;
   809  struct device *dev = _cntrl->mhi_dev->dev;
   810  u32 chan;
   811  int count = 0;
   812  dma_addr_t ptr = le64_to_cpu(er_ctxt->rp);
   813  
   814  /*
   815   * This is a quick check to avoid unnecessary event processing
   816   * in case MHI is already in error state, but it's still 
possible
   817   * to transition to error state while processing events
   818   */
   819  if (unlikely(MHI_EVENT_ACCESS_INVALID(mhi_cntrl->pm_state)))
   820  return -EIO;
   821  
   822  if (!is_valid_ring_ptr(ev_ring, ptr)) {
   823  dev_err(_cntrl->mhi_dev->dev,
   824  "Event ring rp points outside of the event 
ring\n");
   825  return -EIO;
   826  }
   827  
   828  dev_rp = mhi_to_virtual(ev_ring, ptr);
   829  local_rp = ev_ring->rp;
   830  
   831  while (dev_rp != local_rp) {
   832  enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp);
   833  
 > 834  
 > trace_mhi_process_ctrl_ev_ring(mhi_cntrl->mhi_dev->name, (u64)(local_rp),
   835 local_rp->ptr, 
local_rp->dword[0],
   836 local_rp->dword[1],
   837 
mhi_state_str(MHI_TRE_GET_EV_STATE(local_rp)));
   838  
   839  switch (type) {
   840  case MHI_PKT_TYPE_BW_REQ_EVENT:
   841  {
   842  struct mhi_link_info *link_info;
   843  
   844  link_info = _cntrl->mhi_link_info;
   845  write_lock_irq(_cntrl->pm_lock);
   846  link_info->target_link_speed =
   847  MHI_TRE_GET_EV_LINKSPEED(local_rp);
   848  link_info->target_link_width =
   849  MHI_TRE_GET_EV_LINKWIDTH(local_rp);
   850  write_unlock_irq(_cntrl->pm_lock);
   851  dev_dbg(dev, "Received BW_REQ event\n");
   852  m

[PATCH v7 13/13] selftests/sgx: Remove incomplete ABI sanitization code in test enclave

2023-10-05 Thread Jo Van Bulck
As the selftest enclave is *not* intended for production, simplify the
code by not initializing CPU configuration registers as expected by the
ABI on enclave entry or cleansing caller-save registers on enclave exit.

Link: 
https://lore.kernel.org/all/da0cfb1e-e347-f7f2-ac72-aec0ee0d8...@intel.com/
Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
---
 .../testing/selftests/sgx/test_encl_bootstrap.S  | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/sgx/test_encl_bootstrap.S 
b/tools/testing/selftests/sgx/test_encl_bootstrap.S
index 28fe5d2ac0af..d8c4ac94e032 100644
--- a/tools/testing/selftests/sgx/test_encl_bootstrap.S
+++ b/tools/testing/selftests/sgx/test_encl_bootstrap.S
@@ -59,21 +59,11 @@ encl_entry_core:
 
push%rcx # push the address after EENTER
 
+   # NOTE: as the selftest enclave is *not* intended for production,
+   # simplify the code by not initializing ABI registers on entry or
+   # cleansing caller-save registers on exit.
callencl_body
 
-   /* Clear volatile GPRs, except RAX (EEXIT function). */
-   xor %rcx, %rcx
-   xor %rdx, %rdx
-   xor %rdi, %rdi
-   xor %rsi, %rsi
-   xor %r8, %r8
-   xor %r9, %r9
-   xor %r10, %r10
-   xor %r11, %r11
-
-   # Reset status flags.
-   add %rdx, %rdx # OF = SF = AF = CF = 0; ZF = PF = 1
-
# Prepare EEXIT target by popping the address of the instruction after
# EENTER to RBX.
pop %rbx
-- 
2.25.1



[PATCH v7 10/13] selftests/sgx: Ensure test enclave buffer is entirely preserved

2023-10-05 Thread Jo Van Bulck
Attach the "used" attribute to instruct the compiler to preserve the static
encl_buffer, even if it appears it is not entirely referenced in the enclave
code, as expected by the external tests manipulating page permissions.

Link: 
https://lore.kernel.org/all/a2732938-f3db-a0af-3d68-a18060f66...@cs.kuleuven.be/
Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
Acked-by: Kai Huang 
---
 tools/testing/selftests/sgx/defines.h   | 1 +
 tools/testing/selftests/sgx/test_encl.c | 9 +
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/sgx/defines.h 
b/tools/testing/selftests/sgx/defines.h
index d8587c971941..b8f482667ce1 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -13,6 +13,7 @@
 
 #define __aligned(x) __attribute__((__aligned__(x)))
 #define __packed __attribute__((packed))
+#define __used __attribute__((used))
 
 #include "../../../../arch/x86/include/asm/sgx.h"
 #include "../../../../arch/x86/include/asm/enclu.h"
diff --git a/tools/testing/selftests/sgx/test_encl.c 
b/tools/testing/selftests/sgx/test_encl.c
index 649604c526e7..7465f121fb74 100644
--- a/tools/testing/selftests/sgx/test_encl.c
+++ b/tools/testing/selftests/sgx/test_encl.c
@@ -5,11 +5,12 @@
 #include "defines.h"
 
 /*
- * Data buffer spanning two pages that will be placed first in .data
- * segment. Even if not used internally the second page is needed by
- * external test manipulating page permissions.
+ * Data buffer spanning two pages that will be placed first in the .data
+ * segment. Even if not used internally the second page is needed by external
+ * test manipulating page permissions, so mark encl_buffer as "used" to make
+ * sure it is entirely preserved by the compiler.
  */
-static uint8_t encl_buffer[8192] = { 1 };
+static uint8_t __used encl_buffer[8192] = { 1 };
 
 enum sgx_enclu_function {
EACCEPT = 0x5,
-- 
2.25.1



Re: [PATCH v5] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms

2023-10-05 Thread Kris Van Hees
On Wed, Sep 27, 2023 at 05:35:16PM +, Alessandro Carminati (Red Hat) wrote:
> It is not uncommon for drivers or modules related to similar peripherals
> to have symbols with the exact same name.
> While this is not a problem for the kernel's binary itself, it becomes an
> issue when attempting to trace or probe specific functions using
> infrastructure like ftrace or kprobe.
> 
> The tracing subsystem relies on the `nm -n vmlinux` output, which provides
> symbol information from the kernel's ELF binary. However, when multiple
> symbols share the same name, the standard nm output does not differentiate
> between them. This can lead to confusion and difficulty when trying to
> probe the intended symbol.
> 
>  ~ # cat /proc/kallsyms | grep " name_show"
>  8c4f76d0 t name_show
>  8c9cccb0 t name_show
>  8cb0ac20 t name_show
>  8cc728c0 t name_show
>  8ce0efd0 t name_show
>  8ce126c0 t name_show
>  8ce1dd20 t name_show
>  8ce24e70 t name_show
>  8d1104c0 t name_show
>  8d1fe480 t name_show

One problem that remains as far as I can see is that this approach does not
take into consideration that there can be duplicate symbols in the core
kernel, but also between core kernel and loadable modules, and even between
loadable modules.  So, I think more is needed to also ensure that this
approach of adding alias symbols is also done for loadable modules.

Earlier work that cover all symbols (core kernel and loadable modules) was
posted quite a while ago by Nick Alcock:

https://lore.kernel.org/all/20221205163157.269335-1-nick.alc...@oracle.com/

It takes a different approach and adds in other info that is very useful for
tracing, but unfortunately it has been dormant for a long time now.

While module symbols are handled quite differently (for kallsyms) from the
core kernel symbols, I think that a similar approach tied in with modpost
ought to be quite possible.  It will add to the size of modules because the
data needs to be stored in the .ko but that is unavoidable.  But not doing it
unfortunately would mean that the duplicate symbol issue remains unresolved
in the presence of loadable modules.

> kas_alias addresses this challenge by enhancing symbol names with
> meaningful suffixes generated from the source file and line number
> during the kernel build process.
> These newly generated aliases provide tracers with the ability to
> comprehend the symbols they are interacting with when utilizing the
> ftracefs interface.
> This approach may also allow for the probing by name of previously
> inaccessible symbols.
> 
>  ~ # cat /proc/kallsyms | grep gic_mask_irq
>  d15671e505ac t gic_mask_irq
>  d15671e505ac t gic_mask_irq@drivers_irqchip_irq_gic_c_167
>  d15671e532a4 t gic_mask_irq
>  d15671e532a4 t gic_mask_irq@drivers_irqchip_irq_gic_v3_c_407
>  ~ #

In the same context as mentioned above (module symbols), I am hoping that the
alias you generate might also be able to contain a module identifier name,
much like the aforementioned patch series by Nick Alcock added.  We have it
for loadable modules already of course, but as has been discussed in relation
to the earlier work, being able to associate a module name with a symbol
regardless of whether that module is configured to be built into the kernel
or whether it is configured to be a loadable module is helpful for tracing
purposes.  Especially for tracers that use tracing scripts that might get
deployed on diverse systems where it cannot always be known at the time of
developing the tracer scripts whether a kernel module is configured to be
loadable or built-in.

I'd be happy to work on something like this as a contribution to your work.
I would envision the alias entry not needing to have the typical [module] added
to it because that will already be annotated on the actual symbol entry.  So,
the alias could be extended to be something like:

c0533720 t floppy_open  [floppy]
c0533720 t floppy_open@floppy:drivers_block_floppy_c_3988

(absence of a name: prefix to the path would indicate the symbol is not
 associated with any module)

Doing this is more realistic now as a result of the clean-up patches that
Nick introduced, e.g.

https://lore.kernel.org/lkml/20230302211759.30135-1-nick.alc...@oracle.com/

> Changes from v1:
> - Integrated changes requested by Masami to exclude symbols with prefixes
>   "_cfi" and "_pfx".
> - Introduced a small framework to handle patterns that need to be excluded
>   from the alias production.
> - Excluded other symbols using the framework.
> - Introduced the ability to discriminate between text and data symbols.
> - Added two new config symbols in this version: CONFIG_KALLSYMS_ALIAS_DATA,
>   which allows data for data, and CONFIG_KALLSYMS_ALIAS_DATA_ALL, which
>   excludes all filters and provides an alias for each duplicated symbol.
> 
> 

[PATCH v7 09/13] selftests/sgx: Fix linker script asserts

2023-10-05 Thread Jo Van Bulck
DEFINED only considers symbols, not section names. Hence, replace the
check for .got.plt with the _GLOBAL_OFFSET_TABLE_ symbol and remove other
(non-essential) asserts.

Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
---
 tools/testing/selftests/sgx/test_encl.lds | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/tools/testing/selftests/sgx/test_encl.lds 
b/tools/testing/selftests/sgx/test_encl.lds
index 62d37160f59b..6ffdfc9fb4cf 100644
--- a/tools/testing/selftests/sgx/test_encl.lds
+++ b/tools/testing/selftests/sgx/test_encl.lds
@@ -35,8 +35,4 @@ SECTIONS
}
 }
 
-ASSERT(!DEFINED(.altinstructions), "ALTERNATIVES are not supported in 
enclaves")
-ASSERT(!DEFINED(.altinstr_replacement), "ALTERNATIVES are not supported in 
enclaves")
-ASSERT(!DEFINED(.discard.retpoline_safe), "RETPOLINE ALTERNATIVES are not 
supported in enclaves")
-ASSERT(!DEFINED(.discard.nospec), "RETPOLINE ALTERNATIVES are not supported in 
enclaves")
-ASSERT(!DEFINED(.got.plt), "Libcalls are not supported in enclaves")
+ASSERT(!DEFINED(_GLOBAL_OFFSET_TABLE_), "Libcalls through GOT are not 
supported in enclaves")
-- 
2.25.1



[PATCH v7 07/13] selftests/sgx: Produce static-pie executable for test enclave

2023-10-05 Thread Jo Van Bulck
The current combination of -static and -fPIC creates a static executable
with position-dependent addresses for global variables. Use -static-pie
and -fPIE to create a proper static position independent executable that
can be loaded at any address without a dynamic linker.

When building the original "lea (encl_stack)(%rbx), %rax" assembly code
with -static-pie -fPIE, the linker complains about a relocation it cannot
resolve:

/usr/local/bin/ld: /tmp/cchIWyfG.o: relocation R_X86_64_32S against
`.data' can not be used when making a PIE object; recompile with -fPIE
collect2: error: ld returned 1 exit status

Thus, since only RIP-relative addressing is legit for local symbols, use
"encl_stack(%rip)" and declare an explicit "__encl_base" symbol at the
start of the linker script to be able to calculate the stack address
relative to the current TCS in the enclave assembly entry code.

Link: 
https://lore.kernel.org/all/f9c24d89-ed72-7d9e-c650-050d722c6...@cs.kuleuven.be/
Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
Acked-by: Kai Huang 
---
 tools/testing/selftests/sgx/Makefile  | 2 +-
 tools/testing/selftests/sgx/test_encl.lds | 1 +
 tools/testing/selftests/sgx/test_encl_bootstrap.S | 9 ++---
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/sgx/Makefile 
b/tools/testing/selftests/sgx/Makefile
index 7eb890bdd3f0..8d2ba6adc92b 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -14,7 +14,7 @@ endif
 INCLUDES := -I$(top_srcdir)/tools/include
 HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC
 HOST_LDFLAGS := -z noexecstack -lcrypto
-ENCL_CFLAGS += -Wall -Werror -static -nostdlib -ffreestanding -fPIC \
+ENCL_CFLAGS += -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE \
   -fno-stack-protector -mrdrnd $(INCLUDES)
 ENCL_LDFLAGS := -Wl,-T,test_encl.lds,--build-id=none
 
diff --git a/tools/testing/selftests/sgx/test_encl.lds 
b/tools/testing/selftests/sgx/test_encl.lds
index a1ec64f7d91f..62d37160f59b 100644
--- a/tools/testing/selftests/sgx/test_encl.lds
+++ b/tools/testing/selftests/sgx/test_encl.lds
@@ -10,6 +10,7 @@ PHDRS
 SECTIONS
 {
. = 0;
+__encl_base = .;
.tcs : {
*(.tcs*)
} : tcs
diff --git a/tools/testing/selftests/sgx/test_encl_bootstrap.S 
b/tools/testing/selftests/sgx/test_encl_bootstrap.S
index e0ce993d3f2c..28fe5d2ac0af 100644
--- a/tools/testing/selftests/sgx/test_encl_bootstrap.S
+++ b/tools/testing/selftests/sgx/test_encl_bootstrap.S
@@ -42,9 +42,12 @@
 encl_entry:
# RBX contains the base address for TCS, which is the first address
# inside the enclave for TCS #1 and one page into the enclave for
-   # TCS #2. By adding the value of encl_stack to it, we get
-   # the absolute address for the stack.
-   lea (encl_stack)(%rbx), %rax
+   # TCS #2. First make it relative by substracting __encl_base and
+   # then add the address of encl_stack to get the address for the stack.
+   lea __encl_base(%rip), %rax
+   sub %rax, %rbx
+   lea encl_stack(%rip), %rax
+   add %rbx, %rax
jmp encl_entry_core
 encl_dyn_entry:
# Entry point for dynamically created TCS page expected to follow
-- 
2.25.1



[PATCH v7 06/13] selftests/sgx: Remove redundant enclave base address save/restore

2023-10-05 Thread Jo Van Bulck
Remove redundant push/pop pair that stores and restores the enclave base
address in the test enclave, as it is never used after the pop and can
anyway be easily retrieved via the __encl_base symbol.

Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
Acked-by: Kai Huang 
---
 tools/testing/selftests/sgx/test_encl_bootstrap.S | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/testing/selftests/sgx/test_encl_bootstrap.S 
b/tools/testing/selftests/sgx/test_encl_bootstrap.S
index 03ae0f57e29d..e0ce993d3f2c 100644
--- a/tools/testing/selftests/sgx/test_encl_bootstrap.S
+++ b/tools/testing/selftests/sgx/test_encl_bootstrap.S
@@ -55,12 +55,9 @@ encl_entry_core:
push%rax
 
push%rcx # push the address after EENTER
-   push%rbx # push the enclave base address
 
callencl_body
 
-   pop %rbx # pop the enclave base address
-
/* Clear volatile GPRs, except RAX (EEXIT function). */
xor %rcx, %rcx
xor %rdx, %rdx
-- 
2.25.1



[PATCH v7 04/13] selftests/sgx: Separate linker options

2023-10-05 Thread Jo Van Bulck
Fixes "'linker' input unused [-Wunused-command-line-argument]" errors when
compiling with clang.

Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
---
 tools/testing/selftests/sgx/Makefile | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/sgx/Makefile 
b/tools/testing/selftests/sgx/Makefile
index 50aab6b57da3..dcdd04b322f8 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -12,9 +12,11 @@ OBJCOPY := $(CROSS_COMPILE)objcopy
 endif
 
 INCLUDES := -I$(top_srcdir)/tools/include
-HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC -z noexecstack
-ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
+HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC
+HOST_LDFLAGS := -z noexecstack -lcrypto
+ENCL_CFLAGS += -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
   -fno-stack-protector -mrdrnd $(INCLUDES)
+ENCL_LDFLAGS := -Wl,-T,test_encl.lds,--build-id=none
 
 TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
 TEST_FILES := $(OUTPUT)/test_encl.elf
@@ -28,7 +30,7 @@ $(OUTPUT)/test_sgx: $(OUTPUT)/main.o \
$(OUTPUT)/sigstruct.o \
$(OUTPUT)/call.o \
$(OUTPUT)/sign_key.o
-   $(CC) $(HOST_CFLAGS) -o $@ $^ -lcrypto
+   $(CC) $(HOST_CFLAGS) -o $@ $^ $(HOST_LDFLAGS)
 
 $(OUTPUT)/main.o: main.c
$(CC) $(HOST_CFLAGS) -c $< -o $@
@@ -45,8 +47,8 @@ $(OUTPUT)/call.o: call.S
 $(OUTPUT)/sign_key.o: sign_key.S
$(CC) $(HOST_CFLAGS) -c $< -o $@
 
-$(OUTPUT)/test_encl.elf: test_encl.lds test_encl.c test_encl_bootstrap.S
-   $(CC) $(ENCL_CFLAGS) -T $^ -o $@ -Wl,--build-id=none
+$(OUTPUT)/test_encl.elf: test_encl.c test_encl_bootstrap.S
+   $(CC) $(ENCL_CFLAGS) $^ -o $@ $(ENCL_LDFLAGS)
 
 EXTRA_CLEAN := \
$(OUTPUT)/test_encl.elf \
-- 
2.25.1



[PATCH v7 02/13] selftests/sgx: Fix uninitialized pointer dereferences in encl_get_entry

2023-10-05 Thread Jo Van Bulck
Ensure sym_tab and sym_names are zero-initialized and add an early-out
condition in the unlikely (erroneous) case that the enclave ELF file would
not contain a symbol table.

This addresses -Werror=maybe-uninitialized compiler warnings for gcc -O2.

Fixes: 33c5aac3bf32 ("selftests/sgx: Test complete changing of page type flow")
Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
---
 tools/testing/selftests/sgx/load.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/sgx/load.c 
b/tools/testing/selftests/sgx/load.c
index 94bdeac1cf04..c9f658e44de6 100644
--- a/tools/testing/selftests/sgx/load.c
+++ b/tools/testing/selftests/sgx/load.c
@@ -136,11 +136,11 @@ static bool encl_ioc_add_pages(struct encl *encl, struct 
encl_segment *seg)
  */
 uint64_t encl_get_entry(struct encl *encl, const char *symbol)
 {
+   Elf64_Sym *symtab = NULL;
+   char *sym_names = NULL;
Elf64_Shdr *sections;
-   Elf64_Sym *symtab;
Elf64_Ehdr *ehdr;
-   char *sym_names;
-   int num_sym;
+   int num_sym = 0;
int i;
 
ehdr = encl->bin;
@@ -161,6 +161,9 @@ uint64_t encl_get_entry(struct encl *encl, const char 
*symbol)
}
}
 
+   if (!symtab || !sym_names)
+   return 0;
+
for (i = 0; i < num_sym; i++) {
Elf64_Sym *sym = [i];
 
-- 
2.25.1



[PATCH v7 08/13] selftests/sgx: Handle relocations in test enclave

2023-10-05 Thread Jo Van Bulck
Static-pie binaries normally include a startup routine to perform any ELF
relocations from .rela.dyn. Since the enclave loading process is different
and glibc is not included, do the necessary relocation for encl_op_array
entries manually at runtime relative to the enclave base to ensure correct
function pointers.

When keeping encl_op_array as a local variable on the stack, gcc without
optimizations generates code that explicitly gets the right function
addresses and stores them to create the array on the stack:

encl_body:
/* snipped */
leado_encl_op_put_to_buf(%rip), %rax
mov%rax, -0x50(%rbp)
leado_encl_op_get_from_buf(%rip), %rax
mov%rax,-0x48(%rbp)
leado_encl_op_put_to_addr(%rip), %rax
/* snipped */

However, gcc -Os or clang generate more efficient code that initializes
encl_op_array by copying a "prepared copy" containing the absolute
addresses of the functions (i.e., relative to the image base starting from
0) generated by the compiler/linker:

encl_body:
/* snipped */
leaprepared_copy(%rip), %rsi
lea-0x48(%rsp), %rdi
mov$0x10,%ecx
rep movsl %ds:(%rsi),%es:(%rdi)
/* snipped */

When building the enclave with -static-pie, the compiler/linker includes
relocation entries for the function symbols in the "prepared copy":

Relocation section '.rela.dyn' at offset 0x4000 contains 12 entries:
  Offset  Info   Type Symbol
/* snipped; "prepared_copy" starts at 0x6000 */
6000  0008 R_X86_64_RELATIVE  
6008  0008 R_X86_64_RELATIVE  
6010  0008 R_X86_64_RELATIVE  
6018  0008 R_X86_64_RELATIVE  
6020  0008 R_X86_64_RELATIVE  
6028  0008 R_X86_64_RELATIVE  
6030  0008 R_X86_64_RELATIVE  
6038  0008 R_X86_64_RELATIVE  

Static-pie binaries normally include a glibc "_dl_relocate_static_pie"
routine that will perform these relocations as part of the startup.
However, since the enclave loading process is different and glibc is not
included, we cannot rely on these relocations to be performed. Without
relocations, the code would erroneously jump to the _absolute_ function
address loaded from the local copy.

Thus, declare "encl_op_array" as global and manually relocate the loaded
function-pointer entries relative to the enclave base at runtime. This
generates the following code:

encl_body:
/* snipped */
leaencl_op_array(%rip), %rcx
lea__encl_base(%rip), %rax
add(%rcx,%rdx,8),%rax
jmp*%rax

Link: 
https://lore.kernel.org/all/150d8ca8-2c66-60d1-f9fc-8e6279824...@cs.kuleuven.be/
Link: 
https://lore.kernel.org/all/5c22de5a-4b3b-1f38-9771-409b4ec7f...@cs.kuleuven.be/#r
Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
Acked-by: Kai Huang 
---
 tools/testing/selftests/sgx/test_encl.c | 50 +
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/sgx/test_encl.c 
b/tools/testing/selftests/sgx/test_encl.c
index ae791df3e5a5..649604c526e7 100644
--- a/tools/testing/selftests/sgx/test_encl.c
+++ b/tools/testing/selftests/sgx/test_encl.c
@@ -121,21 +121,41 @@ static void do_encl_op_nop(void *_op)
 
 }
 
+/*
+ * Symbol placed at the start of the enclave image by the linker script.
+ * Declare this extern symbol with visibility "hidden" to ensure the compiler
+ * does not access it through the GOT and generates position-independent
+ * addressing as __encl_base(%rip), so we can get the actual enclave base
+ * during runtime.
+ */
+extern const uint8_t __attribute__((visibility("hidden"))) __encl_base;
+
+typedef void (*encl_op_t)(void *);
+static const encl_op_t encl_op_array[ENCL_OP_MAX] = {
+   do_encl_op_put_to_buf,
+   do_encl_op_get_from_buf,
+   do_encl_op_put_to_addr,
+   do_encl_op_get_from_addr,
+   do_encl_op_nop,
+   do_encl_eaccept,
+   do_encl_emodpe,
+   do_encl_init_tcs_page,
+};
+
 void encl_body(void *rdi,  void *rsi)
 {
-   const void (*encl_op_array[ENCL_OP_MAX])(void *) = {
-   do_encl_op_put_to_buf,
-   do_encl_op_get_from_buf,
-   do_encl_op_put_to_addr,
-   do_encl_op_get_from_addr,
-   do_encl_op_nop,
-   do_encl_eaccept,
-   do_encl_emodpe,
-   do_encl_init_tcs_page,
-   };
-
-   struct encl_op_header *op = (struct encl_op_header *)rdi;
-
-   if (op->type < ENCL_OP_MAX)
-   (*encl_op_array[op->type])(op);
+   struct encl_op_header *header = (struct encl_op_header *)rdi;
+   encl_op_t op;
+
+   if (header->type >= ENCL_OP_MAX)
+   return;
+
+   /*
+* The enclave base address needs to be added, as this call site
+* *cannot be* made rip-relative by the compiler, or fixed up by
+* any other possible means.
+*/
+   op = ((uint64_t)&__encl_base) + 

[PATCH v7 03/13] selftests/sgx: Include memory clobber for inline asm in test enclave

2023-10-05 Thread Jo Van Bulck
Add the "memory" clobber to the EMODPE and EACCEPT asm blocks to tell the
compiler the assembly code accesses to the secinfo struct. This ensures
the compiler treats the asm block as a memory barrier and the write to
secinfo will be visible to ENCLU.

Fixes: 20404a808593 ("selftests/sgx: Add test for EPCM permission changes")
Signed-off-by: Jo Van Bulck 
Reviewed-by: Kai Huang 
Reviewed-by: Jarkko Sakkinen 
---
 tools/testing/selftests/sgx/test_encl.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/sgx/test_encl.c 
b/tools/testing/selftests/sgx/test_encl.c
index c0d6397295e3..ae791df3e5a5 100644
--- a/tools/testing/selftests/sgx/test_encl.c
+++ b/tools/testing/selftests/sgx/test_encl.c
@@ -24,10 +24,11 @@ static void do_encl_emodpe(void *_op)
secinfo.flags = op->flags;
 
asm volatile(".byte 0x0f, 0x01, 0xd7"
-   :
+   : /* no outputs */
: "a" (EMODPE),
  "b" (),
- "c" (op->epc_addr));
+ "c" (op->epc_addr)
+   : "memory" /* read from secinfo pointer */);
 }
 
 static void do_encl_eaccept(void *_op)
@@ -42,7 +43,8 @@ static void do_encl_eaccept(void *_op)
: "=a" (rax)
: "a" (EACCEPT),
  "b" (),
- "c" (op->epc_addr));
+ "c" (op->epc_addr)
+   : "memory" /* read from secinfo pointer */);
 
op->ret = rax;
 }
-- 
2.25.1



[PATCH v7 12/13] selftests/sgx: Discard unsupported ELF sections

2023-10-05 Thread Jo Van Bulck
Building the test enclave with -static-pie may produce a dynamic symbol
table, but this is not supported for enclaves and any relocations need to
happen manually (e.g., as for "encl_op_array"). Thus, opportunistically
discard ".dyn*" and ".gnu.hash" which the enclave loader cannot handle.

Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
---
 tools/testing/selftests/sgx/test_encl.lds | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/sgx/test_encl.lds 
b/tools/testing/selftests/sgx/test_encl.lds
index 333a3e78fdc9..ffe851a1cac4 100644
--- a/tools/testing/selftests/sgx/test_encl.lds
+++ b/tools/testing/selftests/sgx/test_encl.lds
@@ -33,6 +33,8 @@ SECTIONS
*(.note*)
*(.debug*)
*(.eh_frame*)
+   *(.dyn*)
+   *(.gnu.hash)
}
 }
 
-- 
2.25.1



[PATCH v7 00/13] selftests/sgx: Fix compilation errors

2023-10-05 Thread Jo Van Bulck
Hi,

This patch series ensures that all SGX selftests succeed when compiling with
optimizations (as tested with -O{0,1,2,3,s} for both gcc 11.3.0 and clang
14.0.0). The aim of the patches is to avoid reliance on undefined,
compiler-specific behavior that can make the test results fragile.

As far as I see, all commits in this series now have an explicit reviewed-by
tag, so hopefully this can get merged upstream? Please let me know if any
concerns remain and I'd happily address them.

Reference output below:

.. Testing   gcc   -O0[OK]
.. Testing   gcc   -O1[OK]
.. Testing   gcc   -O2[OK]
.. Testing   gcc   -O3[OK]
.. Testing   gcc   -Os[OK]
.. Testing   gcc   -Ofast [OK]
.. Testing   gcc   -Og[OK]
.. Testing   clang -O0[OK]
.. Testing   clang -O1[OK]
.. Testing   clang -O2[OK]
.. Testing   clang -O3[OK]
.. Testing   clang -Os[OK]
.. Testing   clang -Ofast [OK]
.. Testing   clang -Og[OK]

Changelog
-

v7
  - Add reviewed-by tag (Jarkko)

v6
  - Collect final ack/reviewed-by tags (Jarkko, Kai)

v5
  - Reorder patches (Jarkko, Kai)
  - Include fixes tag for inline asm memory clobber patch (Kai)
  - Include linker error in static-pie commit message (Kai)
  - Include generated assembly in relocations commit (Kai)

v4
  - Remove redundant -nostartfiles compiler flag (Jarkko)
  - Split dynamic symbol table removal in separate commit (Kai)
  - Split redundant push/pop elimination in separate commit (Kai)
  - Remove (incomplete) register cleansing on enclave exit
  - Fix possibly uninitialized pointer dereferences in load.c

v3
  - Refactor encl_op_array declaration and indexing (Jarkko)
  - Annotate encl_buffer with "used" attribute (Kai)
  - Split encl_buffer size and placement commits (Kai)

v2
  - Add additional check for NULL pointer (Kai)
  - Refine to produce proper static-pie executable
  - Fix linker script assertions
  - Specify memory clobber for inline asm instead of volatile (Kai)
  - Clarify why encl_buffer non-static (Jarkko, Kai)
  - Clarify -ffreestanding (Jarkko)

Best,
Jo

Jo Van Bulck (13):
  selftests/sgx: Fix uninitialized pointer dereference in error path
  selftests/sgx: Fix uninitialized pointer dereferences in
encl_get_entry
  selftests/sgx: Include memory clobber for inline asm in test enclave
  selftests/sgx: Separate linker options
  selftests/sgx: Specify freestanding environment for enclave
compilation
  selftests/sgx: Remove redundant enclave base address save/restore
  selftests/sgx: Produce static-pie executable for test enclave
  selftests/sgx: Handle relocations in test enclave
  selftests/sgx: Fix linker script asserts
  selftests/sgx: Ensure test enclave buffer is entirely preserved
  selftests/sgx: Ensure expected location of test enclave buffer
  selftests/sgx: Discard unsupported ELF sections
  selftests/sgx: Remove incomplete ABI sanitization code in test enclave

 tools/testing/selftests/sgx/Makefile  | 12 ++--
 tools/testing/selftests/sgx/defines.h |  2 +
 tools/testing/selftests/sgx/load.c|  9 ++-
 tools/testing/selftests/sgx/sigstruct.c   |  5 +-
 tools/testing/selftests/sgx/test_encl.c   | 67 +--
 tools/testing/selftests/sgx/test_encl.lds | 10 +--
 .../selftests/sgx/test_encl_bootstrap.S   | 28 +++-
 7 files changed, 77 insertions(+), 56 deletions(-)

-- 
2.25.1



[PATCH v7 01/13] selftests/sgx: Fix uninitialized pointer dereference in error path

2023-10-05 Thread Jo Van Bulck
Ensure ctx is zero-initialized, such that the encl_measure function will
not call EVP_MD_CTX_destroy with an uninitialized ctx pointer in case of an
early error during key generation.

Fixes: 2adcba79e69d ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
Acked-by: Kai Huang 
---
 tools/testing/selftests/sgx/sigstruct.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/sgx/sigstruct.c 
b/tools/testing/selftests/sgx/sigstruct.c
index a07896a46364..d73b29becf5b 100644
--- a/tools/testing/selftests/sgx/sigstruct.c
+++ b/tools/testing/selftests/sgx/sigstruct.c
@@ -318,9 +318,9 @@ bool encl_measure(struct encl *encl)
struct sgx_sigstruct *sigstruct = >sigstruct;
struct sgx_sigstruct_payload payload;
uint8_t digest[SHA256_DIGEST_LENGTH];
+   EVP_MD_CTX *ctx = NULL;
unsigned int siglen;
RSA *key = NULL;
-   EVP_MD_CTX *ctx;
int i;
 
memset(sigstruct, 0, sizeof(*sigstruct));
@@ -384,7 +384,8 @@ bool encl_measure(struct encl *encl)
return true;
 
 err:
-   EVP_MD_CTX_destroy(ctx);
+   if (ctx)
+   EVP_MD_CTX_destroy(ctx);
RSA_free(key);
return false;
 }
-- 
2.25.1



[PATCH v7 05/13] selftests/sgx: Specify freestanding environment for enclave compilation

2023-10-05 Thread Jo Van Bulck
Use -ffreestanding to assert the enclave compilation targets a
freestanding environment (i.e., without "main" or standard libraries).
This fixes clang reporting "undefined reference to `memset'" after
erroneously optimizing away the provided memset/memcpy implementations.

Still need to instruct the linker from using standard system startup
functions, but drop -nostartfiles as it is implied by -nostdlib.

Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
Acked-by: Kai Huang 
---
 tools/testing/selftests/sgx/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/sgx/Makefile 
b/tools/testing/selftests/sgx/Makefile
index dcdd04b322f8..7eb890bdd3f0 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -14,7 +14,7 @@ endif
 INCLUDES := -I$(top_srcdir)/tools/include
 HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC
 HOST_LDFLAGS := -z noexecstack -lcrypto
-ENCL_CFLAGS += -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
+ENCL_CFLAGS += -Wall -Werror -static -nostdlib -ffreestanding -fPIC \
   -fno-stack-protector -mrdrnd $(INCLUDES)
 ENCL_LDFLAGS := -Wl,-T,test_encl.lds,--build-id=none
 
-- 
2.25.1



[PATCH v7 11/13] selftests/sgx: Ensure expected location of test enclave buffer

2023-10-05 Thread Jo Van Bulck
The external tests manipulating page permissions expect encl_buffer to be
placed at the start of the test enclave's .data section. As this is not
guaranteed per the C standard, explicitly place encl_buffer in a separate
section that is explicitly placed at the start of the .data segment in the
linker script to avoid the compiler placing it somewhere else in .data.

Signed-off-by: Jo Van Bulck 
Reviewed-by: Jarkko Sakkinen 
Acked-by: Kai Huang 
---
 tools/testing/selftests/sgx/defines.h | 1 +
 tools/testing/selftests/sgx/test_encl.c   | 8 
 tools/testing/selftests/sgx/test_encl.lds | 1 +
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/sgx/defines.h 
b/tools/testing/selftests/sgx/defines.h
index b8f482667ce1..402f8787a71c 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -14,6 +14,7 @@
 #define __aligned(x) __attribute__((__aligned__(x)))
 #define __packed __attribute__((packed))
 #define __used __attribute__((used))
+#define __section(x)__attribute__((__section__(x)))
 
 #include "../../../../arch/x86/include/asm/sgx.h"
 #include "../../../../arch/x86/include/asm/enclu.h"
diff --git a/tools/testing/selftests/sgx/test_encl.c 
b/tools/testing/selftests/sgx/test_encl.c
index 7465f121fb74..2c4d709cce2d 100644
--- a/tools/testing/selftests/sgx/test_encl.c
+++ b/tools/testing/selftests/sgx/test_encl.c
@@ -6,11 +6,11 @@
 
 /*
  * Data buffer spanning two pages that will be placed first in the .data
- * segment. Even if not used internally the second page is needed by external
- * test manipulating page permissions, so mark encl_buffer as "used" to make
- * sure it is entirely preserved by the compiler.
+ * segment via the linker script. Even if not used internally the second page
+ * is needed by external test manipulating page permissions, so mark
+ * encl_buffer as "used" to make sure it is entirely preserved by the compiler.
  */
-static uint8_t __used encl_buffer[8192] = { 1 };
+static uint8_t __used __section(".data.encl_buffer") encl_buffer[8192] = { 1 };
 
 enum sgx_enclu_function {
EACCEPT = 0x5,
diff --git a/tools/testing/selftests/sgx/test_encl.lds 
b/tools/testing/selftests/sgx/test_encl.lds
index 6ffdfc9fb4cf..333a3e78fdc9 100644
--- a/tools/testing/selftests/sgx/test_encl.lds
+++ b/tools/testing/selftests/sgx/test_encl.lds
@@ -24,6 +24,7 @@ SECTIONS
} : text
 
.data : {
+   *(.data.encl_buffer)
*(.data*)
} : data
 
-- 
2.25.1



Re: (subset) [PATCH v2 1/2] dt-bindings: mfd: qcom,spmi-pmic: Drop unused labels from examples

2023-10-05 Thread Lee Jones
On Mon, 02 Oct 2023 09:00:11 +0200, Luca Weiss wrote:
> There's not much point in having unused labels in the binding example,
> so drop them.
> 
> This patch was originally motivated by ea25d61b448a ("arm64: dts: qcom:
> Use plural _gpios node label for PMIC gpios") updating all dts files to
> use the plural _gpios label instead of the singular _gpio as label but
> this example wasn't updated. But since we should just drop the label
> alltogether, do that.
> 
> [...]

Applied, thanks!

[1/2] dt-bindings: mfd: qcom,spmi-pmic: Drop unused labels from examples
  commit: cac94656ff2b16827d7cd455f0d3746280cf3138

--
Lee Jones [李琼斯]



Re: [PATCH v1 1/9] ACPI: bus: Make notify wrappers more generic

2023-10-05 Thread Rafael J. Wysocki
On Thu, Oct 5, 2023 at 2:05 PM Wilczynski, Michal
 wrote:
> On 10/5/2023 12:57 PM, Rafael J. Wysocki wrote:
> > On Thu, Oct 5, 2023 at 10:10 AM Wilczynski, Michal
> >  wrote:

[cut]

> >>>
> >>> That said, why exactly is it better to use acpi_handle instead of a
> >>> struct acpi_device pointer?
> >> I wanted to make the wrapper as close as possible to the wrapped function.
> >> This way it would be easier to remove it in the future i.e if we ever deem
> >> extra synchronization not worth it etc. What the ACPICA function need to
> >> install a wrapper is a handle not a pointer to a device.
> >> So there is no need for a middle man.
> > Taking a struct acpi_device pointer as the first argument is part of
> > duplication reduction, however, because in the most common case it
> > saves the users of it the need to dereference the struct acpi_device
> > they get from ACPI_COMPANION() in order to obtain the handle.
>
> User don't even have to use acpi device anywhere, as he can choose
> to use ACPI_HANDLE() instead on 'struct device*' and never interact
> with acpi device directly.

Have you actually looked at this macro?  It is a wrapper around
ACPI_COMPANION().

So they may think that they don't use struct acpi_device pointers, but
in fact they do.

> >
> > Arguably, acpi_handle is an ACPICA concept and it is better to reduce
> > its usage outside ACPICA.
>
> Use of acpi_handle is deeply entrenched in the kernel. There is even
> a macro ACPI_HANDLE() that returns acpi_handle. I would say it's
> way too late to limit it to ACPICA internal code.

So there is a difference between "limiting to ACPICA" and "reducing".
It cannot be limited to ACPICA, because the code outside ACPICA needs
to evaluate ACPI objects sometimes and ACPI handles are needed for
that.

And this observation doesn't invalidate the point.

> >
> >>> Realistically, in a platform driver you'll need the latter to obtain
> >>> the former anyway.
> >> I don't want to introduce arbitrary limitations where they are not 
> >> necessary.
> > I'm not sure what you mean.  This patch is changing existing functions.
>
> That's true, but those functions aren't yet deeply entrenched in the
> kernel yet, so in my view how they should look like should still be
> a subject for discussion, as for now they're only used locally in
> drivers/acpi, and my next patchset, that would remove .notify in
> platform directory would spread them more, and would
> make them harder to change. For now we can change how they
> work pretty painlessly.

I see no particular reason to do that, though.

What specifically is a problem with passing struct acpi_device
pointers to the wrappers?  I don't see any TBH.

> >
> >> It is often the case that driver allocates it's own private struct using 
> >> kmalloc
> >> family of functions, and that structure already contains everything that is
> >> needed to remove the handler, so why force ? There are already examples
> >> in the drivers that do that i.e in acpi_video the function
> >> acpi_video_dev_add_notify_handler() uses raw ACPICA handler to install
> >> a notify handler and it passes private structure there.
> >> So there is value in leaving the choice of an actual type to the user of 
> >> the
> >> API.
> > No, if the user has a pointer to struct acpi_device already, there is
> > no difference between passing this and passing the acpi_handle from it
> > except for the extra dereference in the latter case.
>
> Dereference would happen anyway in the wrapper, and it doesn't cause
> any harm anyway for readability in my opinion. And of course you don't
> have to use acpi device at all, you can use ACPI_HANDLE() macro.

So one can use ACPI_COMPANION() just as well and it is slightly less overhead.

> >
> > If the user doesn't have a struct acpi_device pointer, let them use
> > the raw ACPICA handler directly and worry about the synchronization
> > themselves.
>
> As mentioned acpi_device pointer is not really required to use the wrapper.
> Instead we can use ACPI_HANDLE() macro directly. Look at the usage of
> the wrapper in the AC driver [1].

You don't really have to repeat the same argument  several times and I
know how ACPI_HANDLE() works.  Also I don't like some of the things
done by this patch.

Whoever uses ACPI_HANDLE(), they also use ACPI_COMPANION() which is
hidden in the former.

If they don't need to store either the acpi_handle or the struct
acpi_device pointer, there is no reason at all to use the former
instead of the latter.

If they get an acpi_handle from somewhere else than ACPI_HANDLE(),
then yes, they would need to get the ACPI devices from there (which is
possible still), but they may be better off by using the raw ACPICA
interface for events in that case.

> -static void acpi_ac_remove(struct acpi_device *device)
> +static void acpi_ac_remove(struct platform_device *pdev)
>  {
> -   struct acpi_ac *ac = acpi_driver_data(device);
> +  struct acpi_ac *ac = platform_get_drvdata(pdev);
>
> -   

[PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-05 Thread Michal Suchanek
The default MODLIB value is composed of two variables and the hardcoded
string '/lib/modules/'.

MODLIB = $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)

Defining this middle part as a variable was rejected on the basis that
users can pass the whole MODLIB to make, such as

make 'MODLIB=$(INSTALL_MOD_PATH)/usr/lib/modules/$(KERNELRELEASE)'

However, this middle part of MODLIB is independently hardcoded by
rpm-pkg, and when the user alters MODLIB this is not reflected when
building the package.

Given that $(INSTALL_MOD_PATH) is overridden during the rpm package build
it is likely going to be empty. Then MODLIB can be passed to the rpm
package, and used in place of the whole
/usr/lib/modules/$(KERNELRELEASE) part.

Signed-off-by: Michal Suchanek 
---
 scripts/package/kernel.spec | 8 
 scripts/package/mkspec  | 1 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/scripts/package/kernel.spec b/scripts/package/kernel.spec
index 3eee0143e0c5..15f49c5077db 100644
--- a/scripts/package/kernel.spec
+++ b/scripts/package/kernel.spec
@@ -67,7 +67,7 @@ cp $(%{make} %{makeflags} -s image_name) 
%{buildroot}/boot/vmlinuz-%{KERNELRELEA
 %{make} %{makeflags} INSTALL_HDR_PATH=%{buildroot}/usr headers_install
 cp System.map %{buildroot}/boot/System.map-%{KERNELRELEASE}
 cp .config %{buildroot}/boot/config-%{KERNELRELEASE}
-ln -fns /usr/src/kernels/%{KERNELRELEASE} 
%{buildroot}/lib/modules/%{KERNELRELEASE}/build
+ln -fns /usr/src/kernels/%{KERNELRELEASE} %{buildroot}%{MODLIB}/build
 %if %{with_devel}
 %{make} %{makeflags} run-command 
KBUILD_RUN_COMMAND='${srctree}/scripts/package/install-extmod-build 
%{buildroot}/usr/src/kernels/%{KERNELRELEASE}'
 %endif
@@ -98,8 +98,8 @@ fi
 
 %files
 %defattr (-, root, root)
-/lib/modules/%{KERNELRELEASE}
-%exclude /lib/modules/%{KERNELRELEASE}/build
+%{MODLIB}
+%exclude %{MODLIB}/build
 /boot/*
 
 %files headers
@@ -110,5 +110,5 @@ fi
 %files devel
 %defattr (-, root, root)
 /usr/src/kernels/%{KERNELRELEASE}
-/lib/modules/%{KERNELRELEASE}/build
+%{MODLIB}/build
 %endif
diff --git a/scripts/package/mkspec b/scripts/package/mkspec
index d41608efb747..d41b2e5304ac 100755
--- a/scripts/package/mkspec
+++ b/scripts/package/mkspec
@@ -18,6 +18,7 @@ fi
 cat<

Re: [PATCH v5 09/18] x86/sgx: Store struct sgx_encl when allocating new VA pages

2023-10-05 Thread Huang, Kai
On Wed, 2023-10-04 at 23:22 -0500, Haitao Huang wrote:
> On Wed, 04 Oct 2023 16:13:41 -0500, Huang, Kai  wrote:
> 
> > On Wed, 2023-10-04 at 10:03 -0500, Haitao Huang wrote:
> > > On Tue, 03 Oct 2023 15:07:42 -0500, Huang, Kai   
> > > wrote:
> > > 
> > > > On Tue, 2023-10-03 at 01:45 -0500, Haitao Huang wrote:
> > > > > > 
> > > > > > Btw, probably a dumb question:
> > > > > > 
> > > > > > Theoretically if you only need to find a victim enclave you don't  
> > > need
> > > > > > to put VA
> > > > > > pages to the unreclaimable list, because those VA pages will be  
> > > freed
> > > > > > anyway
> > > > > > when enclave is killed.  So keeping VA pages in the list is for>
> > > > > accounting all
> > > > > > the pages that the cgroup is having?
> > > > > 
> > > > > Yes basically tracking them in cgroups as they are allocated.
> > > > > 
> > > > > VAs and SECS may also come and go as swapping/unswapping happens.  
> > > But
> > > > > if acgroup is OOM, and all reclaimables are gone (swapped out), it'd
> > > > > have toreclaim VAs/SECs in the same cgroup starting from the front  
> > > of
> > > > > the LRUlist. To reclaim a VA/SECS, it identifies the enclave from  
> > > the
> > > > > owner ofthe VA/SECS page and kills it, as killing enclave is the  
> > > only
> > > > > way toreclaim VA/SECS pages.
> > > > 
> > > > To kill enclave you just need to track SECS in  the unreclaimable  
> > > list.
> > > > Only when you want to account the total EPC pages via some list you
> > > > _probably_
> > > > need to track VA as well.  But I am not quite sure about this either.
> > > 
> > > There is a case where even SECS is paged out for an enclave with all
> > > reclaimables out.
> > 
> > Yes.  But this essentially means these enclaves are not active, thus  
> > shouldn't
> > be the victim of OOM?
> > 
> 
> But there are VA pages for the enclave at that moment. So it can be  
> candidate for OOM victim.

Yes.  I am not familiar with how does OOM choose victim, but it seems choosing
inactive enclaves seems more reasonable.


[...]

> > > There were some discussion on paging out VAs without killing enclaves  
> > > but
> > > it'd be complicated and not implemented yet.
> > 
> > No we don't involve swapping VA pages now.  It's a separate topic.
> > 
> Only mentioned it as a kind of constraints impacting current design.
> 
> Another potential alternative: we don't reclaim SECS either until OOM and  
> only track SECS pages for cgroups. But that would change current behavior.  
> And I'm not sure about other consequences, e.g., enclaves theoretically  
> can allocate pages (including VA pages) in different cgroups/processes, so  
> we may still end up tracking all VA pages for cgroups or we track SECS  
> page in all cgroups in which enclave allocated any pages. Let me know your  
> thoughts.

Let's not change current behaviour.  I seriously doubt that is needed.

So it seems to me that what we need is just some way to let the OOM find some
victim enclave.  I am not sure whether "tracking EPC pages in some lists" has
anything to do with cgroup accounting EPC pages, so will take a look the rest of
the patches.




Re: [PATCH 2/2] arm64: dts: qcom: msm8939-huawei-kiwi: Add initial device tree

2023-10-05 Thread lukas walter
>> Is this fine?
>> Should I send a V2 with the signoff and reserved-memory changes?
>I don't quite get it, what signoff?

Krzysztof noticed that the Signed-off-by are in the wrong order.
But I was told this alone is not worth a V2.


Re: [PATCH v5 13/18] x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup

2023-10-05 Thread Huang, Kai
On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> From: Sean Christopherson 
> 
> Adjust and expose the top-level reclaim function as
> sgx_reclaim_epc_pages() for use by the upcoming EPC cgroup, which will
> initiate reclaim to enforce the max limit.
> 
> Make these adjustments to the function signature.
> 
> 1) To take a parameter that specifies the number of pages to scan for
> reclaiming. Define a max value of 32, but scan 16 in the case for the
> global reclaimer (ksgxd). The EPC cgroup will use it to specify a
> desired number of pages to be reclaimed up to the max value of 32.
> 
> 2) To take a flag to force reclaiming a page regardless of its age.  The
> EPC cgroup will use the flag to enforce its limits by draining the
> reclaimable lists before resorting to other measures, e.g. forcefully
> kill enclaves.
> 
> 3) Return the number of reclaimed pages. The EPC cgroup will use the
> result to track reclaiming progress and escalate to a more forceful
> reclaiming mode, e.g., calling this function with the flag to ignore age
> of pages.
> 
> Signed-off-by: Sean Christopherson 
> Co-developed-by: Kristen Carlson Accardi 
> Signed-off-by: Kristen Carlson Accardi 
> Co-developed-by: Haitao Huang 
> Signed-off-by: Haitao Huang 
> Cc: Sean Christopherson 
> ---
> V4:
> - Combined the 3 patches that made the individual changes to the
> function signature.
> - Removed 'high' limit in commit message.
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 31 +--
>  arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
>  2 files changed, 22 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 3b875ab4dcd0..4e1a3e038db5 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -18,6 +18,11 @@
>  #include "encl.h"
>  #include "encls.h"
>  
> +/*
> + * Maximum number of pages to scan for reclaiming.
> + */
> +#define SGX_NR_TO_SCAN_MAX   32
> +
>  struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
>  static int sgx_nr_epc_sections;
>  static struct task_struct *ksgxd_tsk;
> @@ -279,7 +284,11 @@ static void sgx_reclaimer_write(struct sgx_epc_page 
> *epc_page,
>   mutex_unlock(>lock);
>  }
>  
> -/*
> +/**
> + * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
> + * @nr_to_scan:   Number of EPC pages to scan for reclaim
> + * @ignore_age:   Reclaim a page even if it is young
> + *
>   * Take a fixed number of pages from the head of the active page pool and
>   * reclaim them to the enclave's private shmem files. Skip the pages, which 
> have
>   * been accessed since the last scan. Move those pages to the tail of active
> @@ -292,15 +301,14 @@ static void sgx_reclaimer_write(struct sgx_epc_page 
> *epc_page,
>   * problematic as it would increase the lock contention too much, which would
>   * halt forward progress.
>   */
> -static void sgx_reclaim_pages(void)
> +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)

'size_t' looks odd.  Any reason to use it?

Given you only scan 32 at maximum, seems 'int' is good enough?

>  {
> - struct sgx_backing backing[SGX_NR_TO_SCAN];
> + struct sgx_backing backing[SGX_NR_TO_SCAN_MAX];
>   struct sgx_epc_page *epc_page, *tmp;
>   struct sgx_encl_page *encl_page;
>   pgoff_t page_index;
>   LIST_HEAD(iso);
> - int ret;
> - int i;
> + size_t ret, i;
>  
>   spin_lock(_global_lru.lock);
>   for (i = 0; i < SGX_NR_TO_SCAN; i++) {

The function comment says 

* @nr_to_scan:   Number of EPC pages to scan for reclaim

But I don't see it is even used, if my eye isn't deceiving me?

> @@ -326,13 +334,14 @@ static void sgx_reclaim_pages(void)
>   spin_unlock(_global_lru.lock);
>  
>   if (list_empty())
> - return;
> + return 0;
>  
>   i = 0;
>   list_for_each_entry_safe(epc_page, tmp, , list) {
>   encl_page = epc_page->encl_page;
>  
> - if (!sgx_reclaimer_age(epc_page))
> + if (i == SGX_NR_TO_SCAN_MAX ||

i == nr_to_scan?

And should we have a

if (nr_to_scan < SGX_NR_TO_SCAN_MAX)
return 0;

at the very beginning of this function?

> + (!ignore_age && !sgx_reclaimer_age(epc_page)))
>   goto skip;
>  
>   page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
> @@ -371,6 +380,8 @@ static void sgx_reclaim_pages(void)
>  
>   sgx_free_epc_page(epc_page);
>   }
> +
> + return i;
>  }
>  

I found this function a little bit odd, given the mixing of 'nr_to_scan',
SGX_NR_TO_SCAN and SGX_NR_TO_SCAN_MAX.

From the changelog:

1) To take a parameter that specifies the number of pages to scan for
reclaiming. Define a max value of 32, but scan 16 in the case for the
global reclaimer (ksgxd). 

It appears we want to make this function to scan @nr_to_scan for 

Re: [PATCH v5 09/18] x86/sgx: Store struct sgx_encl when allocating new VA pages

2023-10-05 Thread Haitao Huang

On Wed, 04 Oct 2023 16:13:41 -0500, Huang, Kai  wrote:


On Wed, 2023-10-04 at 10:03 -0500, Haitao Huang wrote:
On Tue, 03 Oct 2023 15:07:42 -0500, Huang, Kai   
wrote:


> On Tue, 2023-10-03 at 01:45 -0500, Haitao Huang wrote:
> > >
> > > Btw, probably a dumb question:
> > >
> > > Theoretically if you only need to find a victim enclave you don't  
need

> > > to put VA
> > > pages to the unreclaimable list, because those VA pages will be  
freed

> > > anyway
> > > when enclave is killed.  So keeping VA pages in the list is for>
> > accounting all
> > > the pages that the cgroup is having?
> >
> > Yes basically tracking them in cgroups as they are allocated.
> >
> > VAs and SECS may also come and go as swapping/unswapping happens.  
But

> > if acgroup is OOM, and all reclaimables are gone (swapped out), it'd
> > have toreclaim VAs/SECs in the same cgroup starting from the front  
of
> > the LRUlist. To reclaim a VA/SECS, it identifies the enclave from  
the
> > owner ofthe VA/SECS page and kills it, as killing enclave is the  
only

> > way toreclaim VA/SECS pages.
>
> To kill enclave you just need to track SECS in  the unreclaimable  
list.

> Only when you want to account the total EPC pages via some list you
> _probably_
> need to track VA as well.  But I am not quite sure about this either.

There is a case where even SECS is paged out for an enclave with all
reclaimables out.


Yes.  But this essentially means these enclaves are not active, thus  
shouldn't

be the victim of OOM?



But there are VA pages for the enclave at that moment. So it can be  
candidate for OOM victim.



So cgroup needs to track each page used by an enclave
and kill enclave when cgroup needs to lower usage by evicting an VA or
SECS page.


Let's discuss more on tracking SECS on unreclaimable list only.

Could we assume that when the OOM wants to pick up a victim to serve the  
new
enclave, there must be at least another one *active* enclave which still  
has the

SECS page in EPC?

No, at a given instant when OOM happens, "active" enclave's SECS may not  
be in EPC, but lots of VAs.


OOM := "no reclaimable pages left in the cgroup to reclaim and total usage  
is still at/near limit".





If yes, that enclave will be selected as victim.

If not, then no other enclave will be selected as victim.  Instead, only  
the new
enclave which is requesting more EPC will be selected, because it's SECS  
is on

the unreclaimable list.



You can't assume the requesting enclave's SECS is in unreclaimable list  
either. Think the request is from #PF in the scenario we fixed the NULL  
pointer of SECS by reloading it.


Somehow this is unacceptable, thus we need to track VA pages too in  
order to

kill other inactive enclave?



If we know for sure SECS will always be in EPC, thus tracked in  
unreclaimables, then we probably can do it (see below).

I hope the reason given above is clear.

There were some discussion on paging out VAs without killing enclaves  
but

it'd be complicated and not implemented yet.


No we don't involve swapping VA pages now.  It's a separate topic.


Only mentioned it as a kind of constraints impacting current design.

Another potential alternative: we don't reclaim SECS either until OOM and  
only track SECS pages for cgroups. But that would change current behavior.  
And I'm not sure about other consequences, e.g., enclaves theoretically  
can allocate pages (including VA pages) in different cgroups/processes, so  
we may still end up tracking all VA pages for cgroups or we track SECS  
page in all cgroups in which enclave allocated any pages. Let me know your  
thoughts.




BTW, I need clarify tracking pages which is done by LRUs vs usage
accounting which is done by charge/uncharge to misc. To me tracking is  
for

reclaiming not accounting. Also vEPCs not tracked at all but they are
accounted for.


I'll review the rest patches.  Thanks.



Thank you!
Haitao


Re: [PATCH v5 15/18] x86/sgx: Prepare for multiple LRUs

2023-10-05 Thread Huang, Kai
On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> +static inline struct sgx_epc_lru_lists *sgx_lru_lists(struct sgx_epc_page 
> *epc_page)
> +{
> + return _global_lru;
> +}
> +
> +static inline bool sgx_can_reclaim(void)
> +{
> + return !list_empty(_global_lru.reclaimable);
> +}
> +

Shouldn't sgx_can_reclaim() also take a 'struct sgx_epc_lru_lists *'?

I thought we also need to check whether a cgroup's LRU lists can be reclaimed?


Re: [RFC PATCH] tracing: change syscall number type in struct syscall_trace_*

2023-10-05 Thread Artem Savkov
On Wed, Oct 04, 2023 at 02:55:47PM +0200, Artem Savkov wrote:
> On Tue, Oct 03, 2023 at 09:38:44PM -0400, Steven Rostedt wrote:
> > On Mon,  2 Oct 2023 15:52:42 +0200
> > Artem Savkov  wrote:
> > 
> > > linux-rt-devel tree contains a patch that adds an extra member to struct
> > > trace_entry. This causes the offset of args field in struct
> > > trace_event_raw_sys_enter be different from the one in struct
> > > syscall_trace_enter:
> > 
> > This patch looks like it's fixing the symptom and not the issue. No code
> > should rely on the two event structures to be related. That's an unwanted
> > coupling, that will likely cause issues down the road (like the RT patch
> > you mentioned).
> 
> I agree, but I didn't see a better solution and that was my way of
> starting conversation, thus the RFC.
> 
> > > 
> > > struct trace_event_raw_sys_enter {
> > > struct trace_entry ent;  /* 012 */
> > > 
> > > /* XXX last struct has 3 bytes of padding */
> > > /* XXX 4 bytes hole, try to pack */
> > > 
> > > long int   id;   /*16 8 */
> > > long unsigned int  args[6];  /*2448 */
> > > /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> > > char   __data[]; /*72 0 */
> > > 
> > > /* size: 72, cachelines: 2, members: 4 */
> > > /* sum members: 68, holes: 1, sum holes: 4 */
> > > /* paddings: 1, sum paddings: 3 */
> > > /* last cacheline: 8 bytes */
> > > };
> > > 
> > > struct syscall_trace_enter {
> > > struct trace_entry ent;  /* 012 */
> > > 
> > > /* XXX last struct has 3 bytes of padding */
> > > 
> > > intnr;   /*12 4 */
> > > long unsigned int  args[];   /*16 0 */
> > > 
> > > /* size: 16, cachelines: 1, members: 3 */
> > > /* paddings: 1, sum paddings: 3 */
> > > /* last cacheline: 16 bytes */
> > > };
> > > 
> > > This, in turn, causes perf_event_set_bpf_prog() fail while running bpf
> > > test_profiler testcase because max_ctx_offset is calculated based on the
> > > former struct, while off on the latter:
> > 
> > The above appears to be pointing to the real bug. The "is calculated based
> > on the former struct while off on the latter" Why are the two being used
> > together? They are supposed to be *unrelated*!
> > 
> > 
> > > 
> > >   10488 if (is_tracepoint || is_syscall_tp) {
> > >   10489 int off = 
> > > trace_event_get_offsets(event->tp_event);
> > 
> > So basically this is clumping together the raw_syscalls with the syscalls
> > events as if they are the same. But the are not. They are created
> > differently. It's basically like using one structure to get the offsets of
> > another structure. That would be a bug anyplace else in the kernel. Sounds
> > like it's a bug here too.
> > 
> > I think the issue is with this code, not the tracing code.
> > 
> > We could expose the struct syscall_trace_enter and syscall_trace_exit if
> > the offsets to those are needed.
> 
> I don't think we need syscall_trace_* offsets, looks like
> trace_event_get_offsets() should return offset trace_event_raw_sys_enter
> instead. I am still trying to figure out how all of this works together.
> Maybe Alexei or Andrii have more context here.

Turns out it is even more confusing. The tests dereference the context as
struct trace_event_raw_sys_enter so bpf verifier sets max_ctx_offset
based on that, then perf_event_set_bpf_prog() checks this offset against
the one in struct syscall_trace_enter, but what bpf prog really gets is
a pointer to struct syscall_tp_t from kernel/trace/trace_syscalls.c.

I don't know the history behind these decisions, but should the tests
dereference context as struct syscall_trace_enter instead and struct
syscall_tp_t be changed to have syscall_nr as int?

-- 
 Artem




[PATCH] bus: mhi: host: Add tracing support

2023-10-05 Thread Krishna chaitanya chundru
ord0, dword1),
+
+   TP_STRUCT__entry(
+   __string(name, name)
+   __field(u64, ptr)
+   __field(int, dword0)
+   __field(int, dword1)
+   ),
+
+   TP_fast_assign(
+   __assign_str(name, name);
+   __entry->ptr = ptr;
+   __entry->dword0 = dword0;
+   __entry->dword1 = dword1;
+   ),
+
+   TP_printk("%s: Processing Event:0x%llx 0x%08x 0x%08x\n",
+ __get_str(name), __entry->ptr, __entry->dword0, 
__entry->dword1)
+);
+
+TRACE_EVENT(mhi_process_ctrl_ev_ring,
+
+   TP_PROTO(const char *name, u64 rp, u64 ptr, int dword0, int dword1, 
const char *state),
+
+   TP_ARGS(name, rp, ptr, dword0, dword1, state),
+
+   TP_STRUCT__entry(
+   __string(name, name)
+   __field(u64, rp)
+   __field(u64, ptr)
+   __field(int, dword0)
+   __field(int, dword1)
+   __string(state, state)
+   ),
+
+   TP_fast_assign(
+   __assign_str(name, name);
+   __entry->rp = rp;
+   __entry->ptr = ptr;
+   __entry->dword0 = dword0;
+   __entry->dword1 = dword1;
+   __assign_str(state, state);
+   ),
+
+   TP_printk("%s: RP:0x%llx Processing Event:0x%llx 0x%08x 0x%08x 
state:%s\n",
+ __get_str(name), __entry->rp, __entry->ptr, __entry->dword0,
+ __entry->dword1, __get_str(state))
+);
+
+TRACE_EVENT(mhi_update_channel_state_start,
+
+   TP_PROTO(const char *name, int ch_num, const char *state),
+
+   TP_ARGS(name, ch_num, state),
+
+   TP_STRUCT__entry(
+   __string(name, name)
+   __field(int, ch_num)
+   __string(state, state)
+   ),
+
+   TP_fast_assign(
+   __assign_str(name, name);
+   __entry->ch_num = ch_num;
+   __assign_str(state, state);
+   ),
+
+   TP_printk("%s: ch%d: Updating state to: %s\n",
+ __get_str(name), __entry->ch_num,  __get_str(state))
+);
+
+TRACE_EVENT(mhi_update_channel_state_end,
+
+   TP_PROTO(const char *name, int ch_num, const char *state),
+
+   TP_ARGS(name, ch_num, state),
+
+   TP_STRUCT__entry(
+   __string(name, name)
+   __field(int, ch_num)
+   __string(state, state)
+   ),
+
+   TP_fast_assign(
+   __assign_str(name, name);
+   __entry->ch_num = ch_num;
+   __assign_str(state, state);
+   ),
+
+   TP_printk("%s: ch%d: Updated state to: %s\n",
+ __get_str(name), __entry->ch_num,  __get_str(state))
+);
+
+TRACE_EVENT(mhi_pm_st_worker,
+
+   TP_PROTO(const char *name, const char *state),
+
+   TP_ARGS(name, state),
+
+   TP_STRUCT__entry(
+   __string(name, name)
+   __string(state, state)
+   ),
+
+   TP_fast_assign(
+   __assign_str(name, name);
+   __assign_str(state, state);
+   ),
+
+   TP_printk("%s: Handling state transition: %s\n", __get_str(name), 
__get_str(state))
+);
+
+#endif
+#include 

---
base-commit: 3006adf3be79cde4d14b1800b963b82b6e5572e0
change-id: 20231005-ftrace_support-6869d4156139

Best regards,
-- 
Krishna chaitanya chundru 




Re: [PATCH v1 1/9] ACPI: bus: Make notify wrappers more generic

2023-10-05 Thread Wilczynski, Michal



On 10/5/2023 12:57 PM, Rafael J. Wysocki wrote:
> On Thu, Oct 5, 2023 at 10:10 AM Wilczynski, Michal
>  wrote:
>> Hi,
>>
>> Thanks for your review !
>>
>> On 10/4/2023 9:09 PM, Rafael J. Wysocki wrote:
>>> On Mon, Sep 25, 2023 at 6:31 PM Michal Wilczynski
>>>  wrote:
 acpi_dev_install_notify_handler() and acpi_dev_remove_notify_handler()
 are wrappers around ACPICA installers. They are meant to save some
 duplicated code from drivers. However as we're moving towards drivers
 operating on platform_device they become a bit inconvenient to use as
 inside the driver code we mostly want to use driver data of platform
 device instead of ACPI device.
>>> That's fair enough, but ->
>>>
 Make notify handlers installer wrappers more generic, while still
 saving some code that would be duplicated otherwise.

 Reviewed-by: Andy Shevchenko 
 Signed-off-by: Michal Wilczynski 
 ---

 Notes:
 So one solution could be to just replace acpi_device with
 platform_device as an argument in those functions. However I don't
 believe this is a correct solution, as it is very often the case that
 drivers declare their own private structures which gets allocated 
 during
 the .probe() callback, and become the heart of the driver. When drivers
 do that it makes much more sense to just pass the private structure
 to the notify handler instead of forcing user to dance with the
 platform_device or acpi_device.

  drivers/acpi/ac.c |  6 +++---
  drivers/acpi/acpi_video.c |  6 +++---
  drivers/acpi/battery.c|  6 +++---
  drivers/acpi/bus.c| 14 ++
  drivers/acpi/hed.c|  6 +++---
  drivers/acpi/nfit/core.c  |  6 +++---
  drivers/acpi/thermal.c|  6 +++---
  include/acpi/acpi_bus.h   |  9 -
  8 files changed, 28 insertions(+), 31 deletions(-)

 diff --git a/drivers/acpi/ac.c b/drivers/acpi/ac.c
 index 225dc6818751..0b245f9f7ec8 100644
 --- a/drivers/acpi/ac.c
 +++ b/drivers/acpi/ac.c
 @@ -256,8 +256,8 @@ static int acpi_ac_add(struct acpi_device *device)
 ac->battery_nb.notifier_call = acpi_ac_battery_notify;
 register_acpi_notifier(>battery_nb);

 -   result = acpi_dev_install_notify_handler(device, ACPI_ALL_NOTIFY,
 -acpi_ac_notify);
 +   result = acpi_dev_install_notify_handler(device->handle, 
 ACPI_ALL_NOTIFY,
 +acpi_ac_notify, device);
 if (result)
 goto err_unregister;

 @@ -306,7 +306,7 @@ static void acpi_ac_remove(struct acpi_device *device)

 ac = acpi_driver_data(device);

 -   acpi_dev_remove_notify_handler(device, ACPI_ALL_NOTIFY,
 +   acpi_dev_remove_notify_handler(device->handle, ACPI_ALL_NOTIFY,
acpi_ac_notify);
 power_supply_unregister(ac->charger);
 unregister_acpi_notifier(>battery_nb);
 diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
 index 948e31f7ce6e..025c17890127 100644
 --- a/drivers/acpi/acpi_video.c
 +++ b/drivers/acpi/acpi_video.c
 @@ -2059,8 +2059,8 @@ static int acpi_video_bus_add(struct acpi_device 
 *device)

 acpi_video_bus_add_notify_handler(video);

 -   error = acpi_dev_install_notify_handler(device, ACPI_DEVICE_NOTIFY,
 -   acpi_video_bus_notify);
 +   error = acpi_dev_install_notify_handler(device->handle, 
 ACPI_DEVICE_NOTIFY,
 +   acpi_video_bus_notify, 
 device);
 if (error)
 goto err_remove;

 @@ -2092,7 +2092,7 @@ static void acpi_video_bus_remove(struct acpi_device 
 *device)

 video = acpi_driver_data(device);

 -   acpi_dev_remove_notify_handler(device, ACPI_DEVICE_NOTIFY,
 +   acpi_dev_remove_notify_handler(device->handle, ACPI_DEVICE_NOTIFY,
acpi_video_bus_notify);

 mutex_lock(_list_lock);
 diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
 index 969bf81e8d54..45dae32a8646 100644
 --- a/drivers/acpi/battery.c
 +++ b/drivers/acpi/battery.c
 @@ -1213,8 +1213,8 @@ static int acpi_battery_add(struct acpi_device 
 *device)

 device_init_wakeup(>dev, 1);

 -   result = acpi_dev_install_notify_handler(device, ACPI_ALL_NOTIFY,
 -acpi_battery_notify);
 +   result = acpi_dev_install_notify_handler(device->handle, 
 ACPI_ALL_NOTIFY,
 +acpi_battery_notify, 
 device);
 

Re: [PATCH v1 1/9] ACPI: bus: Make notify wrappers more generic

2023-10-05 Thread Rafael J. Wysocki
On Thu, Oct 5, 2023 at 10:10 AM Wilczynski, Michal
 wrote:
>
> Hi,
>
> Thanks for your review !
>
> On 10/4/2023 9:09 PM, Rafael J. Wysocki wrote:
> > On Mon, Sep 25, 2023 at 6:31 PM Michal Wilczynski
> >  wrote:
> >> acpi_dev_install_notify_handler() and acpi_dev_remove_notify_handler()
> >> are wrappers around ACPICA installers. They are meant to save some
> >> duplicated code from drivers. However as we're moving towards drivers
> >> operating on platform_device they become a bit inconvenient to use as
> >> inside the driver code we mostly want to use driver data of platform
> >> device instead of ACPI device.
> > That's fair enough, but ->
> >
> >> Make notify handlers installer wrappers more generic, while still
> >> saving some code that would be duplicated otherwise.
> >>
> >> Reviewed-by: Andy Shevchenko 
> >> Signed-off-by: Michal Wilczynski 
> >> ---
> >>
> >> Notes:
> >> So one solution could be to just replace acpi_device with
> >> platform_device as an argument in those functions. However I don't
> >> believe this is a correct solution, as it is very often the case that
> >> drivers declare their own private structures which gets allocated 
> >> during
> >> the .probe() callback, and become the heart of the driver. When drivers
> >> do that it makes much more sense to just pass the private structure
> >> to the notify handler instead of forcing user to dance with the
> >> platform_device or acpi_device.
> >>
> >>  drivers/acpi/ac.c |  6 +++---
> >>  drivers/acpi/acpi_video.c |  6 +++---
> >>  drivers/acpi/battery.c|  6 +++---
> >>  drivers/acpi/bus.c| 14 ++
> >>  drivers/acpi/hed.c|  6 +++---
> >>  drivers/acpi/nfit/core.c  |  6 +++---
> >>  drivers/acpi/thermal.c|  6 +++---
> >>  include/acpi/acpi_bus.h   |  9 -
> >>  8 files changed, 28 insertions(+), 31 deletions(-)
> >>
> >> diff --git a/drivers/acpi/ac.c b/drivers/acpi/ac.c
> >> index 225dc6818751..0b245f9f7ec8 100644
> >> --- a/drivers/acpi/ac.c
> >> +++ b/drivers/acpi/ac.c
> >> @@ -256,8 +256,8 @@ static int acpi_ac_add(struct acpi_device *device)
> >> ac->battery_nb.notifier_call = acpi_ac_battery_notify;
> >> register_acpi_notifier(>battery_nb);
> >>
> >> -   result = acpi_dev_install_notify_handler(device, ACPI_ALL_NOTIFY,
> >> -acpi_ac_notify);
> >> +   result = acpi_dev_install_notify_handler(device->handle, 
> >> ACPI_ALL_NOTIFY,
> >> +acpi_ac_notify, device);
> >> if (result)
> >> goto err_unregister;
> >>
> >> @@ -306,7 +306,7 @@ static void acpi_ac_remove(struct acpi_device *device)
> >>
> >> ac = acpi_driver_data(device);
> >>
> >> -   acpi_dev_remove_notify_handler(device, ACPI_ALL_NOTIFY,
> >> +   acpi_dev_remove_notify_handler(device->handle, ACPI_ALL_NOTIFY,
> >>acpi_ac_notify);
> >> power_supply_unregister(ac->charger);
> >> unregister_acpi_notifier(>battery_nb);
> >> diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
> >> index 948e31f7ce6e..025c17890127 100644
> >> --- a/drivers/acpi/acpi_video.c
> >> +++ b/drivers/acpi/acpi_video.c
> >> @@ -2059,8 +2059,8 @@ static int acpi_video_bus_add(struct acpi_device 
> >> *device)
> >>
> >> acpi_video_bus_add_notify_handler(video);
> >>
> >> -   error = acpi_dev_install_notify_handler(device, ACPI_DEVICE_NOTIFY,
> >> -   acpi_video_bus_notify);
> >> +   error = acpi_dev_install_notify_handler(device->handle, 
> >> ACPI_DEVICE_NOTIFY,
> >> +   acpi_video_bus_notify, 
> >> device);
> >> if (error)
> >> goto err_remove;
> >>
> >> @@ -2092,7 +2092,7 @@ static void acpi_video_bus_remove(struct acpi_device 
> >> *device)
> >>
> >> video = acpi_driver_data(device);
> >>
> >> -   acpi_dev_remove_notify_handler(device, ACPI_DEVICE_NOTIFY,
> >> +   acpi_dev_remove_notify_handler(device->handle, ACPI_DEVICE_NOTIFY,
> >>acpi_video_bus_notify);
> >>
> >> mutex_lock(_list_lock);
> >> diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
> >> index 969bf81e8d54..45dae32a8646 100644
> >> --- a/drivers/acpi/battery.c
> >> +++ b/drivers/acpi/battery.c
> >> @@ -1213,8 +1213,8 @@ static int acpi_battery_add(struct acpi_device 
> >> *device)
> >>
> >> device_init_wakeup(>dev, 1);
> >>
> >> -   result = acpi_dev_install_notify_handler(device, ACPI_ALL_NOTIFY,
> >> -acpi_battery_notify);
> >> +   result = acpi_dev_install_notify_handler(device->handle, 
> >> ACPI_ALL_NOTIFY,
> >> +acpi_battery_notify, 
> >> device);
> >> if (result)
> >> goto fail_pm;
> >>
> >> 

Re: [PATCH v1 1/9] ACPI: bus: Make notify wrappers more generic

2023-10-05 Thread Wilczynski, Michal
Hi,

Thanks for your review !

On 10/4/2023 9:09 PM, Rafael J. Wysocki wrote:
> On Mon, Sep 25, 2023 at 6:31 PM Michal Wilczynski
>  wrote:
>> acpi_dev_install_notify_handler() and acpi_dev_remove_notify_handler()
>> are wrappers around ACPICA installers. They are meant to save some
>> duplicated code from drivers. However as we're moving towards drivers
>> operating on platform_device they become a bit inconvenient to use as
>> inside the driver code we mostly want to use driver data of platform
>> device instead of ACPI device.
> That's fair enough, but ->
>
>> Make notify handlers installer wrappers more generic, while still
>> saving some code that would be duplicated otherwise.
>>
>> Reviewed-by: Andy Shevchenko 
>> Signed-off-by: Michal Wilczynski 
>> ---
>>
>> Notes:
>> So one solution could be to just replace acpi_device with
>> platform_device as an argument in those functions. However I don't
>> believe this is a correct solution, as it is very often the case that
>> drivers declare their own private structures which gets allocated during
>> the .probe() callback, and become the heart of the driver. When drivers
>> do that it makes much more sense to just pass the private structure
>> to the notify handler instead of forcing user to dance with the
>> platform_device or acpi_device.
>>
>>  drivers/acpi/ac.c |  6 +++---
>>  drivers/acpi/acpi_video.c |  6 +++---
>>  drivers/acpi/battery.c|  6 +++---
>>  drivers/acpi/bus.c| 14 ++
>>  drivers/acpi/hed.c|  6 +++---
>>  drivers/acpi/nfit/core.c  |  6 +++---
>>  drivers/acpi/thermal.c|  6 +++---
>>  include/acpi/acpi_bus.h   |  9 -
>>  8 files changed, 28 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/acpi/ac.c b/drivers/acpi/ac.c
>> index 225dc6818751..0b245f9f7ec8 100644
>> --- a/drivers/acpi/ac.c
>> +++ b/drivers/acpi/ac.c
>> @@ -256,8 +256,8 @@ static int acpi_ac_add(struct acpi_device *device)
>> ac->battery_nb.notifier_call = acpi_ac_battery_notify;
>> register_acpi_notifier(>battery_nb);
>>
>> -   result = acpi_dev_install_notify_handler(device, ACPI_ALL_NOTIFY,
>> -acpi_ac_notify);
>> +   result = acpi_dev_install_notify_handler(device->handle, 
>> ACPI_ALL_NOTIFY,
>> +acpi_ac_notify, device);
>> if (result)
>> goto err_unregister;
>>
>> @@ -306,7 +306,7 @@ static void acpi_ac_remove(struct acpi_device *device)
>>
>> ac = acpi_driver_data(device);
>>
>> -   acpi_dev_remove_notify_handler(device, ACPI_ALL_NOTIFY,
>> +   acpi_dev_remove_notify_handler(device->handle, ACPI_ALL_NOTIFY,
>>acpi_ac_notify);
>> power_supply_unregister(ac->charger);
>> unregister_acpi_notifier(>battery_nb);
>> diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
>> index 948e31f7ce6e..025c17890127 100644
>> --- a/drivers/acpi/acpi_video.c
>> +++ b/drivers/acpi/acpi_video.c
>> @@ -2059,8 +2059,8 @@ static int acpi_video_bus_add(struct acpi_device 
>> *device)
>>
>> acpi_video_bus_add_notify_handler(video);
>>
>> -   error = acpi_dev_install_notify_handler(device, ACPI_DEVICE_NOTIFY,
>> -   acpi_video_bus_notify);
>> +   error = acpi_dev_install_notify_handler(device->handle, 
>> ACPI_DEVICE_NOTIFY,
>> +   acpi_video_bus_notify, 
>> device);
>> if (error)
>> goto err_remove;
>>
>> @@ -2092,7 +2092,7 @@ static void acpi_video_bus_remove(struct acpi_device 
>> *device)
>>
>> video = acpi_driver_data(device);
>>
>> -   acpi_dev_remove_notify_handler(device, ACPI_DEVICE_NOTIFY,
>> +   acpi_dev_remove_notify_handler(device->handle, ACPI_DEVICE_NOTIFY,
>>acpi_video_bus_notify);
>>
>> mutex_lock(_list_lock);
>> diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
>> index 969bf81e8d54..45dae32a8646 100644
>> --- a/drivers/acpi/battery.c
>> +++ b/drivers/acpi/battery.c
>> @@ -1213,8 +1213,8 @@ static int acpi_battery_add(struct acpi_device *device)
>>
>> device_init_wakeup(>dev, 1);
>>
>> -   result = acpi_dev_install_notify_handler(device, ACPI_ALL_NOTIFY,
>> -acpi_battery_notify);
>> +   result = acpi_dev_install_notify_handler(device->handle, 
>> ACPI_ALL_NOTIFY,
>> +acpi_battery_notify, 
>> device);
>> if (result)
>> goto fail_pm;
>>
>> @@ -1241,7 +1241,7 @@ static void acpi_battery_remove(struct acpi_device 
>> *device)
>>
>> battery = acpi_driver_data(device);
>>
>> -   acpi_dev_remove_notify_handler(device, ACPI_ALL_NOTIFY,
>> +   acpi_dev_remove_notify_handler(device->handle, ACPI_ALL_NOTIFY,
>>