Re: [PATCH 5/5] modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules

2023-10-17 Thread Christoph Hellwig
On Wed, Oct 18, 2023 at 01:30:18AM +0100, David Woodhouse wrote:
> 
> But if we're going to tolerate the core kernel still exporting some
> stuff with EXPORT_SYMBOL, why isn't OK for a GPL-licensed module do to
> the same? Even an *in-tree* GPL-licensed module now can't export
> functionality with EXPORT_SYMBOL and have it used with symbol_get().

Anything using symbol_get is by intent very deeply internal for tightly
coupled modules working together, and thus not a non-GPL export.

In fact the current series is just a stepping stone.  Once some mess
in the kvm/vfio integration is fixed up we'll require a new explicit
EXPORT_SYMBOL variant as symbol_get wasn't ever intended to be used
on totally random symbols not exported for use by symbol_get.



Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-17 Thread Haitao Huang

On Tue, 17 Oct 2023 14:13:22 -0500, Michal Koutný  wrote:

On Tue, Oct 17, 2023 at 08:54:48PM +0200, Michal Koutný  
 wrote:

Is this distinction between preemptability of EPC pages mandated by the
HW implementation? (host/"process" enclaves vs VM enclaves) Or do have
users an option to lock certain pages in memory that yields this
difference?


(After skimming Documentation/arch/x86/sgx.rst, Section "Virtual EPC")

Or would these two types warrant also two types of miscresource? (To
deal with each in own way.)


They are from the same bucket of HW resource so I think it's more suitable  
to be one resource type. Otherwise need to policy to dividing the  
capacity, etc. And it is still possible in future vEPC become reclaimable.


My current thinking is we probably can get away with non-preemptive  
max_write for enclaves too. See my other reply.


Thanks
Haitao


Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-17 Thread Haitao Huang

Hi Michal,

On Tue, 17 Oct 2023 13:54:46 -0500, Michal Koutný  wrote:


Hello Haitao.

On Tue, Oct 17, 2023 at 07:58:02AM -0500, Haitao Huang  
 wrote:
AFAIK, before we introducing max_write() callback in this series, no  
misc
controller would possibly enforce the limit when misc.max is reduced.  
e.g. I

don't think CVMs be killed when ASID limit is reduced and the cgroup was
full before limit is reduced.


Yes, misccontroller was meant to be simple, current >= max serves to
prevent new allocations.


Thanks for confirming. Maybe another alternative we just keep max_write
non-preemptive. No need to add max_write() callback.

The EPC controller only triggers reclaiming on new allocations or return
NOMEM if no more to reclaim. Reclaiming here includes normal EPC page  
reclaiming and killing enclaves in out of EPC cases. vEPCs assigned to  
guests are basically carved out and never reclaimable by the host.


As we no longer enforce limits on max_write a lower value, user should not  
expect cgroup to force reclaim pages from enclave or kill VMs/enclaves as  
a result of reducing limits 'in-place'. User should always create cgroups,  
set limits, launch enclave/VM into the groups created.



FTR, at some point in time memory.max was considered for reclaim control
of regular pages but it turned out to be too coarse (and OOM killing
processes if amount was not sensed correctly) and this eventually
evolved into specific mechanism of memory.reclaim.
So I'm mentioning this should that be an interface with better semantic
for your use case (and misc.max writes can remain non-preemptive).



Yes we can introduce misc.reclaim to give user a knob to forcefully  
reducing usage if
that is really needed in real usage. The semantics would make force-kill  
VMs explicit to user.



One more note -- I was quite confused when I read in the rest of the
series about OOM and _kill_ing but then I found no such measure in the
code implementation. So I would suggest two terminological changes:

- the basic premise of the series (00/18) is that EPC pages are a
  different resource than memory, hence choose a better suiting name
  than OOM (out of memory) condition,


I couldn't come up a good name. Out of EPC (OOEPC) maybe? I feel OOEPC  
would be hard to read in code though. OOM was relatable as it is similar  
to normal OOM but special kind of memory :-) I'm open to any better  
suggestions.



- killing -- (unless you have an intention to implement process
  termination later) My current interpretation that it is rather some
  aggressive unmapping within address space, so less confusing name for
  that would be "reclaim".



yes. Killing here refers to killing enclave, analogous to killing process,
not just 'reclaim' though. I can change to always use 'killing enclave'  
explicitly.




I think EPC pages to VMs could have the same behavior, once they are  
given
to a guest, never taken back by the host. For enclaves on host side,  
pages

are reclaimable, that allows us to enforce in a similar way to memcg.


Is this distinction between preemptability of EPC pages mandated by the
HW implementation? (host/"process" enclaves vs VM enclaves) Or do have
users an option to lock certain pages in memory that yields this
difference?



The difference is really a result of current vEPC implementation. Because
enclave pages once in use contains confidential content, they need special
process to reclaim. So it's complex to implement host reclaiming guest EPCs
gracefully.

Thanks
Haitao


Re: [PATCH v3] ACPI: NFIT: Use cleanup.h helpers instead of devm_*()

2023-10-17 Thread Dan Williams
Michal Wilczynski wrote:
> The new cleanup.h facilities that arrived in v6.5-rc1 can replace the
> the usage of devm semantics in acpi_nfit_init_interleave_set(). That
> routine appears to only be using devm to avoid goto statements. The
> new __free() annotation at variable declaration time can achieve the same
> effect more efficiently.
> 
> There is no end user visible side effects of this patch, I was
> motivated to send this cleanup to practice using the new helpers.
> 
> Suggested-by: Dave Jiang 
> Suggested-by: Andy Shevchenko 
> Reviewed-by: Dave Jiang 
> Reviewed-by: Andy Shevchenko 
> Signed-off-by: Michal Wilczynski 
> ---
> 
> Dan, would you like me to give you credit for the changelog changes
> with Co-developed-by tag ?

Nope, this looks good to me, thanks for fixing it up.

Reviewed-by: Dan Williams 



Re: [PATCH v11 1/5] lib: objpool added: ring-array based lockless MPMC

2023-10-17 Thread wuqiang.matt

On 2023/10/18 10:18, Chengming Zhou wrote:

On 2023/10/17 21:56, wuqiang.matt wrote:

objpool is a scalable implementation of high performance queue for
object allocation and reclamation, such as kretprobe instances.

With leveraging percpu ring-array to mitigate hot spots of memory
contention, it delivers near-linear scalability for high parallel
scenarios. The objpool is best suited for the following cases:
1) Memory allocation or reclamation are prohibited or too expensive
2) Consumers are of different priorities, such as irqs and threads

Limitations:
1) Maximum objects (capacity) is fixed after objpool creation
2) All pre-allocated objects are managed in percpu ring array,
which consumes more memory than linked lists


I'm curious why not just extend the existing lockless freelist to
percpu lockless freelists? And the percpu freelist is more flexible
to use than this percpu ring-array? The latter has to be fixed size
when creation.


I did that in first 2 versions, and abandoned it from the 3rd version.
The core reason is there are data races in freelist node:

After pop() from freelist, the freelist_node zone of the object could
be still in busy spinning by other nodes, so even the owner of this
object couldn't know when the races would go. freelist_zone is defined
as a union in the use cases (kretprobe), which brings potential issues.
If the object owner touches freelist_node:refs and then tries to push
to freelist to reclaim the object, freelist_add might just quit if
atomic_fetch_add_release(REFS_ON_FREELIST, >refs) returns true.

Keeping freelist_node as private could be fine, which is imposing an
extra rule to the users. Current ring array likes something moving
"freelist_node" to the ring array, which minimizes memory footprints.

Flexibility is not a strong requirement since all the use cases have
objects pre-allocated.


Thanks.


Thank you.




Signed-off-by: wuqiang.matt 
---
  include/linux/objpool.h | 176 +
  lib/Makefile|   2 +-
  lib/objpool.c   | 286 
  3 files changed, 463 insertions(+), 1 deletion(-)
  create mode 100644 include/linux/objpool.h
  create mode 100644 lib/objpool.c

diff --git a/include/linux/objpool.h b/include/linux/objpool.h
new file mode 100644
index ..4df18405420a
--- /dev/null
+++ b/include/linux/objpool.h
@@ -0,0 +1,181 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_OBJPOOL_H
+#define _LINUX_OBJPOOL_H
+
+#include 
+#include 
+
+/*
+ * objpool: ring-array based lockless MPMC queue
+ *
+ * Copyright: wuqiang.m...@bytedance.com,mhira...@kernel.org
+ *
+ * objpool is a scalable implementation of high performance queue for
+ * object allocation and reclamation, such as kretprobe instances.
+ *
+ * With leveraging percpu ring-array to mitigate hot spots of memory
+ * contention, it delivers near-linear scalability for high parallel
+ * scenarios. The objpool is best suited for the following cases:
+ * 1) Memory allocation or reclamation are prohibited or too expensive
+ * 2) Consumers are of different priorities, such as irqs and threads
+ *
+ * Limitations:
+ * 1) Maximum objects (capacity) is fixed after objpool creation
+ * 2) All pre-allocated objects are managed in percpu ring array,
+ *which consumes more memory than linked lists
+ */
+
+/**
+ * struct objpool_slot - percpu ring array of objpool
+ * @head: head sequence of the local ring array (to retrieve at)
+ * @tail: tail sequence of the local ring array (to append at)
+ * @last: the last sequence number marked as ready for retrieve
+ * @mask: bits mask for modulo capacity to compute array indexes
+ * @entries: object entries on this slot
+ *
+ * Represents a cpu-local array-based ring buffer, its size is specialized
+ * during initialization of object pool. The percpu objpool node is to be
+ * allocated from local memory for NUMA system, and to be kept compact in
+ * continuous memory: CPU assigned number of objects are stored just after
+ * the body of objpool_node.
+ *
+ * Real size of the ring array is far too smaller than the value range of
+ * head and tail, typed as uint32_t: [0, 2^32), so only lower bits (mask)
+ * of head and tail are used as the actual position in the ring array. In
+ * general the ring array is acting like a small sliding window, which is
+ * always moving forward in the loop of [0, 2^32).
+ */
+struct objpool_slot {
+   uint32_thead;
+   uint32_ttail;
+   uint32_tlast;
+   uint32_tmask;
+   void   *entries[];
+} __packed;
+
+struct objpool_head;
+
+/*
+ * caller-specified callback for object initial setup, it's only called
+ * once for each object (just after the memory allocation of the object)
+ */
+typedef int (*objpool_init_obj_cb)(void *obj, void *context);
+
+/* caller-specified cleanup callback for objpool destruction */
+typedef int (*objpool_fini_cb)(struct objpool_head 

Re: [PATCH v11 1/5] lib: objpool added: ring-array based lockless MPMC

2023-10-17 Thread Chengming Zhou
On 2023/10/17 21:56, wuqiang.matt wrote:
> objpool is a scalable implementation of high performance queue for
> object allocation and reclamation, such as kretprobe instances.
> 
> With leveraging percpu ring-array to mitigate hot spots of memory
> contention, it delivers near-linear scalability for high parallel
> scenarios. The objpool is best suited for the following cases:
> 1) Memory allocation or reclamation are prohibited or too expensive
> 2) Consumers are of different priorities, such as irqs and threads
> 
> Limitations:
> 1) Maximum objects (capacity) is fixed after objpool creation
> 2) All pre-allocated objects are managed in percpu ring array,
>which consumes more memory than linked lists
> 
I'm curious why not just extend the existing lockless freelist to
percpu lockless freelists? And the percpu freelist is more flexible
to use than this percpu ring-array? The latter has to be fixed size
when creation.

Thanks.

> Signed-off-by: wuqiang.matt 
> ---
>  include/linux/objpool.h | 176 +
>  lib/Makefile|   2 +-
>  lib/objpool.c   | 286 
>  3 files changed, 463 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/objpool.h
>  create mode 100644 lib/objpool.c
> 
> diff --git a/include/linux/objpool.h b/include/linux/objpool.h
> new file mode 100644
> index ..4df18405420a
> --- /dev/null
> +++ b/include/linux/objpool.h
> @@ -0,0 +1,181 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_OBJPOOL_H
> +#define _LINUX_OBJPOOL_H
> +
> +#include 
> +#include 
> +
> +/*
> + * objpool: ring-array based lockless MPMC queue
> + *
> + * Copyright: wuqiang.m...@bytedance.com,mhira...@kernel.org
> + *
> + * objpool is a scalable implementation of high performance queue for
> + * object allocation and reclamation, such as kretprobe instances.
> + *
> + * With leveraging percpu ring-array to mitigate hot spots of memory
> + * contention, it delivers near-linear scalability for high parallel
> + * scenarios. The objpool is best suited for the following cases:
> + * 1) Memory allocation or reclamation are prohibited or too expensive
> + * 2) Consumers are of different priorities, such as irqs and threads
> + *
> + * Limitations:
> + * 1) Maximum objects (capacity) is fixed after objpool creation
> + * 2) All pre-allocated objects are managed in percpu ring array,
> + *which consumes more memory than linked lists
> + */
> +
> +/**
> + * struct objpool_slot - percpu ring array of objpool
> + * @head: head sequence of the local ring array (to retrieve at)
> + * @tail: tail sequence of the local ring array (to append at)
> + * @last: the last sequence number marked as ready for retrieve
> + * @mask: bits mask for modulo capacity to compute array indexes
> + * @entries: object entries on this slot
> + *
> + * Represents a cpu-local array-based ring buffer, its size is specialized
> + * during initialization of object pool. The percpu objpool node is to be
> + * allocated from local memory for NUMA system, and to be kept compact in
> + * continuous memory: CPU assigned number of objects are stored just after
> + * the body of objpool_node.
> + *
> + * Real size of the ring array is far too smaller than the value range of
> + * head and tail, typed as uint32_t: [0, 2^32), so only lower bits (mask)
> + * of head and tail are used as the actual position in the ring array. In
> + * general the ring array is acting like a small sliding window, which is
> + * always moving forward in the loop of [0, 2^32).
> + */
> +struct objpool_slot {
> + uint32_thead;
> + uint32_ttail;
> + uint32_tlast;
> + uint32_tmask;
> + void   *entries[];
> +} __packed;
> +
> +struct objpool_head;
> +
> +/*
> + * caller-specified callback for object initial setup, it's only called
> + * once for each object (just after the memory allocation of the object)
> + */
> +typedef int (*objpool_init_obj_cb)(void *obj, void *context);
> +
> +/* caller-specified cleanup callback for objpool destruction */
> +typedef int (*objpool_fini_cb)(struct objpool_head *head, void *context);
> +
> +/**
> + * struct objpool_head - object pooling metadata
> + * @obj_size:   object size, aligned to sizeof(void *)
> + * @nr_objs:total objs (to be pre-allocated with objpool)
> + * @nr_cpus:local copy of nr_cpu_ids
> + * @capacity:   max objs can be managed by one objpool_slot
> + * @gfp:gfp flags for kmalloc & vmalloc
> + * @ref:refcount of objpool
> + * @flags:  flags for objpool management
> + * @cpu_slots:  pointer to the array of objpool_slot
> + * @release:resource cleanup callback
> + * @context:caller-provided context
> + */
> +struct objpool_head {
> + int obj_size;
> + int nr_objs;
> + int nr_cpus;
> + int capacity;
> + gfp_t  

Re: [PATCH kmod v5 5/5] libkmod, depmod, modprobe: Make directory for kernel modules configurable

2023-10-17 Thread Jan Engelhardt
On Tuesday 2023-10-17 19:50, Lucas De Marchi wrote:
>> +AC_ARG_WITH([module_directory],
>> +AS_HELP_STRING([--with-module-directory=DIR], [directory in which to
>> look for kernel modules - typically '/lib/modules' or
>> '${prefix}/lib/modules']),
>> +[], [with_module_directory=/lib/modules])
>> +AC_SUBST([module_directory], [$with_module_directory])
>
> we will probably have "fun" results if we accept a relative path here.

$ ./configure --prefix=/usr --bindir=../bin
configure: error: expected an absolute directory name for --bindir: ../bin

While such check does not exist for --with-module-directory, everyone
has likely been well-trained not to use relative paths. Even if, say,
cmake/meson *could* do it, it just would have never occurred to me to
actually *utilize* it, because it is just too ambiguous and
potentially dangerous.

Just think of all the fun you could have with LD_LIBRARY_PATH=".."




Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-17 Thread Jan Engelhardt
On Tuesday 2023-10-17 17:10, Michal Suchánek wrote:
>
>> In my system (Ubuntu), I see the directory paths
>> 
>> /usr/aarch64-linux-gnu/lib/
>> /usr/i686-linux-gnu/lib/
>> /usr/x86_64-linux-gnu/lib/
>> 
>> If there were such a crazy distro that supports multiple kernel arches
>> within a single image, modules might be installed:
>> /usr/x86_64-linux-gnu/lib/module//
>
>For me it's /usr/lib/i386-linux-gnu/.
>
>Did they change the scheme at some point?

It's a complicated mumble-jumble. Prior art exists as in:

 /opt/vendorThing/bin/...
 /usr/X11R6/lib/libXi.so.6 [host binary]
 /usr/x86_64-w64-mingw32/bin/as [host binary]
 /usr/x86_64-w64-mingw32/sys-root/mingw/bin/as.exe [foreign binary]
 /usr/platform/SUNW,Ultra-2/lib/libprtdiag_psr.so.1 [looks foreign]

The use of suffix-based naming must have been established sometime
near the end of the 90s or the start of 2000s as the first biarch
Linux distros emerged. Probably in gcc or glibc sources one will find
the root of where the use of suffix identifiers like /usr/lib64
started. Leaves the question open "why".



Re: [PATCH 5/5] modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules

2023-10-17 Thread David Woodhouse
On Tue, 2023-08-01 at 19:35 +0200, Christoph Hellwig wrote:
> It has recently come to my attention that nvidia is circumventing the
> protection added in 262e6ae7081d ("modules: inherit
> TAINT_PROPRIETARY_MODULE") by importing exports from their proprietary
> modules into an allegedly GPL licensed module and then rexporting them.
> 
> Given that symbol_get was only ever intended for tightly cooperating
> modules using very internal symbols it is logical to restrict it to
> being used on EXPORT_SYMBOL_GPL and prevent nvidia from costly DMCA
> Circumvention of Access Controls law suites.

I'm all for insisting that everything be exported with
EXPORT_SYMBOL_GPL and nothing at all ever be exported with just
EXPORT_SYMBOL.

But if we're going to tolerate the core kernel still exporting some
stuff with EXPORT_SYMBOL, why isn't OK for a GPL-licensed module do to
the same? Even an *in-tree* GPL-licensed module now can't export
functionality with EXPORT_SYMBOL and have it used with symbol_get().

We're forced to *either* allow direct linking by non-GPL modules, or
allow symbol_get(), but not both?

> Fixes: 262e6ae7081d ("modules: inherit TAINT_PROPRIETARY_MODULE")

Hm, the condition we really need to fix *that* is "symbol_get() will
only import symbols from GPL-licensed modules", isn't it?

As long as that property is correctly transitive, why does the symbol
itself have to be EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL? Am I
missing another potential loophole?

I suppose there's now scope for a different type of shim which
*directly* imports an EXPORT_SYMBOL function in order to export it
again as EXPORT_SYMBOL_GPL and thus allow the GPL export to be found
with symbol_get()?

That's the *converse* of the problematic shim that was being used
before, and from a licensing point of view it seems fine... it's just
working around the unintended side-effects of this patch?


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] arm64: dts: qcom: msm8939-longcheer-l9100: Add proximity-near-level

2023-10-17 Thread André Apitzsch
Am Dienstag, dem 17.10.2023 um 18:25 +0200 schrieb Konrad Dybcio:
> 
> 
> On 10/16/23 22:18, André Apitzsch wrote:
> > Consider an object near to the sensor when their distance is about
> > 4 cm
> > or below.
> > 
> > Signed-off-by: André Apitzsch 
> > ---
> Reviewed-by: Konrad Dybcio 
> 
> Out of interest, what is it set to by default?
> 
> Konrad

The default value is 0.

André


Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-17 Thread Michal Koutný
On Tue, Oct 17, 2023 at 08:54:48PM +0200, Michal Koutný  
wrote:
> Is this distinction between preemptability of EPC pages mandated by the
> HW implementation? (host/"process" enclaves vs VM enclaves) Or do have
> users an option to lock certain pages in memory that yields this
> difference?

(After skimming Documentation/arch/x86/sgx.rst, Section "Virtual EPC")

Or would these two types warrant also two types of miscresource? (To
deal with each in own way.)

Thanks,
Michal


signature.asc
Description: PGP signature


Re: [PATCH 1/3] dt-bindings: usb: fsa4480: Add data-lanes property to endpoint

2023-10-17 Thread Rob Herring
On Mon, Oct 16, 2023 at 04:32:55PM +0200, Neil Armstrong wrote:
> On 16/10/2023 16:22, Rob Herring wrote:
> > On Fri, Oct 13, 2023 at 01:38:05PM +0200, Luca Weiss wrote:
> > > Allow specifying data-lanes to reverse the SBU muxing orientation where
> > > necessary by the hardware design.
> > 
> > What situation in the hardware design makes this necessary. Please
> > describe the problem.
> > 
> > > 
> > > Signed-off-by: Luca Weiss 
> > > ---
> > >   .../devicetree/bindings/usb/fcs,fsa4480.yaml   | 29 
> > > +-
> > >   1 file changed, 28 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/Documentation/devicetree/bindings/usb/fcs,fsa4480.yaml 
> > > b/Documentation/devicetree/bindings/usb/fcs,fsa4480.yaml
> > > index f6e7a5c1ff0b..86f6d633c2fb 100644
> > > --- a/Documentation/devicetree/bindings/usb/fcs,fsa4480.yaml
> > > +++ b/Documentation/devicetree/bindings/usb/fcs,fsa4480.yaml
> > > @@ -32,10 +32,37 @@ properties:
> > >   type: boolean
> > > port:
> > > -$ref: /schemas/graph.yaml#/properties/port
> > > +$ref: /schemas/graph.yaml#/$defs/port-base
> > >   description:
> > > A port node to link the FSA4480 to a TypeC controller for the 
> > > purpose of
> > > handling altmode muxing and orientation switching.
> > > +unevaluatedProperties: false
> > > +
> > > +properties:
> > > +  endpoint:
> > > +$ref: /schemas/graph.yaml#/$defs/endpoint-base
> > > +unevaluatedProperties: false
> > > +
> > > +properties:
> > > +  data-lanes:
> > > +$ref: /schemas/types.yaml#/definitions/uint32-array
> > > +description:
> > > +  Specifies how the AUX+/- lines are connected to SBU1/2.
> > 
> > Doesn't this depend on the connector orientation? Or it is both that and
> > the lines can be swapped on the PCB?
> > 
> > Seems like an abuse of data-lanes which already has a definition which
> > is not about swapping + and - differential lanes.
> 
> The FSA acts as a mux between DP AUX, Audio lanes on one side and
> the USB-C SBU lanes on the other side.
> ___  __
>   |  | |
>   |-- HP   --| |
>   |-- MIC  --| |or
> SoC   |  | MUX |-- SBU1 --->  To the USB-C
> Codec |-- AUX+ --| |-- SBU2 --->  connected
>   |-- AUX- --| |
> __|  | |
> 
> The SBU1 & SBU2 are connected to the USB-C connector, and the actual 
> orientation
> to the connected devices/cable/whatever is determined by the TPCM and the MUX 
> in
> the FSA4480 with be dynamically changed according to the CC1/CC2 detection 
> and PD alt mode.
> 
> But on the other side the orientation of the AUX+/AUX- connected to the SoC
> is not tied to the USB-C orientation but how it's routed on the PCB.
> 
> This describes how the AUX+/AUX- are physically routed to the FSA4480 chip.

I'd hate for this ASCII art to go to waste. Please add this detail to 
the commit message.

Rob



Re: [PATCH v5 01/18] cgroup/misc: Add per resource callbacks for CSS events

2023-10-17 Thread Michal Koutný
On Fri, Sep 22, 2023 at 08:06:40PM -0700, Haitao Huang 
 wrote:
> @@ -276,10 +276,13 @@ static ssize_t misc_cg_max_write(struct 
> kernfs_open_file *of, char *buf,
>  
>   cg = css_misc(of_css(of));
>  
> - if (READ_ONCE(misc_res_capacity[type]))
> + if (READ_ONCE(misc_res_capacity[type])) {
>   WRITE_ONCE(cg->res[type].max, max);
> - else
> + if (cg->res[type].max_write)
> + cg->res[type].max_write(cg);
> + } else {
>   ret = -EINVAL;
>
> + }

Is it time for a misc_cg_mutex? This given no synchronization guarantees
to implementors of max_write. (Alternatively, document it that the
callback must implement own synchronization.)


Thanks,
Michal


signature.asc
Description: PGP signature


Re: [PATCH v5 16/18] x86/sgx: Limit process EPC usage with misc cgroup controller

2023-10-17 Thread Michal Koutný
On Fri, Sep 22, 2023 at 08:06:55PM -0700, Haitao Huang 
 wrote:
> +static void sgx_epc_cgroup_free(struct misc_cg *cg)
> +{
> + struct sgx_epc_cgroup *epc_cg;
> +
> + epc_cg = sgx_epc_cgroup_from_misc_cg(cg);

It should check for !epc_cg since the misc controller implementation
in misc_cg_alloc() would roll back even on non-allocated resources.

> + cancel_work_sync(_cg->reclaim_work);
> + kfree(epc_cg);
> +}
> +
> +static void sgx_epc_cgroup_max_write(struct misc_cg *cg)
> +{
> + struct sgx_epc_reclaim_control rc;
> + struct sgx_epc_cgroup *epc_cg;
> +
> + epc_cg = sgx_epc_cgroup_from_misc_cg(cg);
> +
> + sgx_epc_reclaim_control_init(, epc_cg);
> + /* Let the reclaimer to do the work so user is not blocked */
> + queue_work(sgx_epc_cg_wq, _cg->reclaim_work);

This is weird. The writer will never learn about the result of the
operation.

Thanks,
Michal


signature.asc
Description: PGP signature


Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-17 Thread Michal Koutný
Hello Haitao.

On Tue, Oct 17, 2023 at 07:58:02AM -0500, Haitao Huang 
 wrote:
> AFAIK, before we introducing max_write() callback in this series, no misc
> controller would possibly enforce the limit when misc.max is reduced. e.g. I
> don't think CVMs be killed when ASID limit is reduced and the cgroup was
> full before limit is reduced.

Yes, misccontroller was meant to be simple, current >= max serves to
prevent new allocations.

FTR, at some point in time memory.max was considered for reclaim control
of regular pages but it turned out to be too coarse (and OOM killing
processes if amount was not sensed correctly) and this eventually
evolved into specific mechanism of memory.reclaim.
So I'm mentioning this should that be an interface with better semantic
for your use case (and misc.max writes can remain non-preemptive).

One more note -- I was quite confused when I read in the rest of the
series about OOM and _kill_ing but then I found no such measure in the
code implementation. So I would suggest two terminological changes:

- the basic premise of the series (00/18) is that EPC pages are a
  different resource than memory, hence choose a better suiting name
  than OOM (out of memory) condition,
- killing -- (unless you have an intention to implement process
  termination later) My current interpretation that it is rather some
  aggressive unmapping within address space, so less confusing name for
  that would be "reclaim".


> I think EPC pages to VMs could have the same behavior, once they are given
> to a guest, never taken back by the host. For enclaves on host side, pages
> are reclaimable, that allows us to enforce in a similar way to memcg.

Is this distinction between preemptability of EPC pages mandated by the
HW implementation? (host/"process" enclaves vs VM enclaves) Or do have
users an option to lock certain pages in memory that yields this
difference?

Regards,
Michal


signature.asc
Description: PGP signature


RE: [PATCH v2 5/6] ACPI: NFIT: Replace acpi_driver with platform_driver

2023-10-17 Thread Dan Williams
Michal Wilczynski wrote:
> NFIT driver uses struct acpi_driver incorrectly to register itself.
> This is wrong as the instances of the ACPI devices are not meant
> to be literal devices, they're supposed to describe ACPI entry of a
> particular device.
> 
> Use platform_driver instead of acpi_driver. In relevant places call
> platform devices instances pdev to make a distinction with ACPI
> devices instances.
> 
> NFIT driver uses devm_*() family of functions extensively. This change
> has no impact on correct functioning of the whole devm_*() family of
> functions, since the lifecycle of the device stays the same. It is still
> being created during the enumeration, and destroyed on platform device
> removal.

I notice this verbiage has the same fundamental misunderstanding of devm
allocation lifetime as the acpi_nfit_init_interleave_set() discussion.
The devm allocation lifetime typically starts in driver->probe() and
ends either with driver->probe() failure, or the driver->remove() call.
Note that the driver->remove() call is invoked not only for
platform-device removal, but also driver "unbind" events. So the
"destroyed on platform device removal" is the least likely way that
these allocations are torn down given ACPI0012 devices are never
removed.

Outside of that, my main concern about this patch is that I expect it
breaks unit tests. The infrastructure in
tools/testing/nvdimm/test/nfit.c emulates an ACPI0012 device that allows
for deeper regression testing given hardware is difficult to come by,
and because QEMU does not implement some of the tricky corner cases that
the unit tests cover.

This needs to pass tests, but fair warning, 
tools/testing/nvdimm/test/nfit.c does some non-idiomatic + "strange"
things to achieve deeper test coverage. So I expect that if this breaks
tests as I expect the effort needed to fix the emulation could be
significant.

If you want to give running the tests a try the easiest would be to use
"run_qemu.sh" with --nfit-test option [1], or you can try to setup an
environment manually using the ndctl instructions [2].

[1]: https://github.com/pmem/run_qemu
[2]: https://github.com/pmem/ndctl#readme



Re: [PATCH kmod v5 5/5] libkmod, depmod, modprobe: Make directory for kernel modules configurable

2023-10-17 Thread Lucas De Marchi

On Tue, Jul 18, 2023 at 02:01:56PM +0200, Michal Suchanek wrote:

Now that modprobe.d is searched under ${prefix}/lib, allow a complete
transition to files only under ${prefix} by adding a ${module_directory}
configuration. This specifies the directory where to search for kernel
modules and should match the location where the kernel/distro installs
them.

With this distributions that do not want to ship files in /lib can also
move kernel modules to /usr while others can keep them in /lib.

Signed-off-by: Michal Suchanek 
---
v4: Make the whole path configurable
v5: More verbose commit message
---
Makefile.am  |   3 +-
configure.ac |   7 ++
libkmod/libkmod.c|   4 +-
man/Makefile.am  |   1 +
man/depmod.d.xml |   6 +-
man/depmod.xml   |   4 +-
man/modinfo.xml  |   2 +-
man/modprobe.xml |   2 +-
man/modules.dep.xml  |   6 +-
testsuite/module-playground/Makefile |   2 +-
testsuite/setup-rootfs.sh| 109 +++
testsuite/test-depmod.c  |  16 ++--
testsuite/test-testsuite.c   |   8 +-
tools/depmod.c   |   6 +-
tools/kmod.pc.in |   1 +
tools/modinfo.c  |   4 +-
tools/modprobe.c |   4 +-
tools/static-nodes.c |   6 +-
18 files changed, 107 insertions(+), 84 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 6d0b2decfef3..019aa749fdf1 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -20,6 +20,7 @@ AM_CPPFLAGS = \
-I$(top_srcdir) \
-DSYSCONFDIR=\""$(sysconfdir)"\" \
-DDISTCONFDIR=\""$(distconfdir)"\" \
+   -DMODULE_DIRECTORY=\""$(module_directory)"\" \
${zlib_CFLAGS}

AM_CFLAGS = $(OUR_CFLAGS)
@@ -220,7 +221,7 @@ EXTRA_DIST += testsuite/setup-rootfs.sh
MODULE_PLAYGROUND = testsuite/module-playground
ROOTFS = testsuite/rootfs
ROOTFS_PRISTINE = $(top_srcdir)/testsuite/rootfs-pristine
-CREATE_ROOTFS = $(AM_V_GEN) $(top_srcdir)/testsuite/setup-rootfs.sh 
$(ROOTFS_PRISTINE) $(ROOTFS) $(MODULE_PLAYGROUND) $(top_builddir)/config.h 
$(sysconfdir)
+CREATE_ROOTFS = $(AM_V_GEN) MODULE_DIRECTORY=$(module_directory) 
$(top_srcdir)/testsuite/setup-rootfs.sh $(ROOTFS_PRISTINE) $(ROOTFS) 
$(MODULE_PLAYGROUND) $(top_builddir)/config.h $(sysconfdir)

build-module-playground:
$(AM_V_GEN)if test "$(top_srcdir)" != "$(top_builddir)"; then \
diff --git a/configure.ac b/configure.ac
index b4584d6cdc67..4051dc9249e2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -91,6 +91,12 @@ AC_ARG_WITH([rootlibdir],
[], [with_rootlibdir=$libdir])
AC_SUBST([rootlibdir], [$with_rootlibdir])

+# Ideally this would be $prefix/lib/modules but default to /lib/modules for 
compatibility with earlier versions
+AC_ARG_WITH([module_directory],
+AS_HELP_STRING([--with-module-directory=DIR], [directory in which to 
look for kernel modules - typically '/lib/modules' or '${prefix}/lib/modules']),
+[], [with_module_directory=/lib/modules])
+AC_SUBST([module_directory], [$with_module_directory])


we will probably have "fun" results if we accept a relative path here.


+
AC_ARG_WITH([zstd],
AS_HELP_STRING([--with-zstd], [handle Zstandard-compressed modules 
@<:@default=disabled@:>@]),
[], [with_zstd=no])
@@ -326,6 +332,7 @@ AC_MSG_RESULT([
$PACKAGE $VERSION
===

+   module_directory:   ${module_directory}
prefix: ${prefix}
sysconfdir: ${sysconfdir}
distconfdir:${distconfdir}
diff --git a/libkmod/libkmod.c b/libkmod/libkmod.c
index 09e6041461b0..63719e886de8 100644
--- a/libkmod/libkmod.c
+++ b/libkmod/libkmod.c
@@ -209,7 +209,7 @@ static int log_priority(const char *priority)
return 0;
}

-static const char *dirname_default_prefix = "/lib/modules";
+static const char *dirname_default_prefix = MODULE_DIRECTORY;

static char *get_kernel_release(const char *dirname)
{
@@ -231,7 +231,7 @@ static char *get_kernel_release(const char *dirname)
/**
 * kmod_new:
 * @dirname: what to consider as linux module's directory, if NULL
- *   defaults to /lib/modules/`uname -r`. If it's relative,
+ *   defaults to ${module_prefix}/lib/modules/`uname -r`. If it's 
relative,


module_prefix?  did you mean to use $MODULE_DIRECTORY/`uname -r`?


 *   it's treated as relative to the current working directory.
 *   Otherwise, give an absolute dirname.
 * @config_paths: ordered array of paths (directories or files) where
diff --git a/man/Makefile.am b/man/Makefile.am
index 2fea8e46bf2f..f550091a216a 100644
--- a/man/Makefile.am
+++ b/man/Makefile.am
@@ -22,6 +22,7 @@ CLEANFILES = $(dist_man_MANS)
else \
sed -e '/@DISTCONFDIR@/d' $< ; \
fi | \
+   sed -e 's|@MODULE_DIRECTORY@|$(module_directory)|g' | \

Re: [PATCH kmod v5 0/5] kmod /usr support

2023-10-17 Thread Michal Suchánek
On Tue, Oct 17, 2023 at 11:18:23AM -0500, Lucas De Marchi wrote:
> On Tue, Oct 17, 2023 at 05:45:39PM +0200, Michal Suchánek wrote:
> > Hello,
> > 
> > it has been a few months since these kmod patches have been posted, and
> > a new kmod versio has been released since.
> > 
> > Is there any interest in adding this to kmod?
> 
> yes, but I think the main drag is deciding with the kernel build system
> maintainers what they are willing to accept as an interface. There isn't
> much point in exporting a json config if from the kernel side they would
> rather use something else. Or to use pkg-config.
> 
> I confess I lost track of that discussion. Did that settle with
> pkg-config being the preferred solution?

The current discussion about the kernel side can be found here:

https://lore.kernel.org/linux-kbuild/20231017151050.gj6...@kitsune.suse.cz/T/#t

My impression is that pkg-config is accepted as an interface on the
basis that it's already required for building the kernel while jq is
currently required only for some additional optional tools.

Thanks

Michal



Re: [PATCH 1/2] arm64: dts: qcom: msm8953: Set initial address for memory

2023-10-17 Thread Konrad Dybcio




On 10/16/23 09:37, Stephan Gerhold wrote:

On Sun, Oct 15, 2023 at 10:26:01PM +0200, Luca Weiss wrote:

The dtbs_check really doesn't like having memory without reg set.
Address this by setting it to 0x1000 which seems to be the value
filled in by the bootloader.



Looks like MSM8953 has the same RAM setup as MSM8916, where the base
address depends on the amount of RAM you have:

   <= 2.00 GiB RAM: 0x8000
= 3.00 GiB RAM: 0x4000
= 3.75 GiB RAM: 0x1000

What a royal mess

Konrad


Re: [PATCH] arm64: dts: qcom: msm8939-longcheer-l9100: Add proximity-near-level

2023-10-17 Thread Konrad Dybcio




On 10/16/23 22:18, André Apitzsch wrote:

Consider an object near to the sensor when their distance is about 4 cm
or below.

Signed-off-by: André Apitzsch 
---

Reviewed-by: Konrad Dybcio 

Out of interest, what is it set to by default?

Konrad


Re: [PATCH kmod v5 0/5] kmod /usr support

2023-10-17 Thread Lucas De Marchi

On Tue, Oct 17, 2023 at 05:45:39PM +0200, Michal Suchánek wrote:

Hello,

it has been a few months since these kmod patches have been posted, and
a new kmod versio has been released since.

Is there any interest in adding this to kmod?


yes, but I think the main drag is deciding with the kernel build system
maintainers what they are willing to accept as an interface. There isn't
much point in exporting a json config if from the kernel side they would
rather use something else. Or to use pkg-config.

I confess I lost track of that discussion. Did that settle with
pkg-config being the preferred solution?

Lucas De Marchi



Re: [PATCH kmod v5 0/5] kmod /usr support

2023-10-17 Thread Michal Suchánek
Hello,

it has been a few months since these kmod patches have been posted, and
a new kmod versio has been released since.

Is there any interest in adding this to kmod?

Thanks

Michal

On Tue, Jul 18, 2023 at 02:01:51PM +0200, Michal Suchanek wrote:
> Hello,
> 
> with these patches it is possible to install kernel modules in an arbitrary
> directory - eg. moving the /lib/modules to /usr/lib/modules or /opt/linux.
> 
> While the modprobe.d and depmod.d search which already includes multiple
> paths is expanded to also include $(prefix) the module directory still
> supports only one location, only a different one under $(module_directory).
> 
> Having kmod search multiple module locations while only one is supported now
> might break some assumption about relative module path corresponding to a
> specific file, would require more invasive changes to implement, and is not
> supportive of the goal of moving the modules away from /lib.
> 
> Both kmod and the kernel need to be patched to make use of this feature.
> Patched kernel is backwards compatible with older kmod.  Patched kmod
> with $(module_directory) set to /lib/modules is equivalent to unpatched kmod.
> 
> Thanks
> 
> Michal
> 
> Link: 
> https://lore.kernel.org/linux-modules/20210112160211.5614-1-msucha...@suse.de/
> 
> v4: set whole path to module directory instead of adding prefix
> v5: use pkg-config instead of jq, fix build on openssl without sm3 support
> 
> 
> Michal Suchanek (5):
>   configure: Detect openssl sm3 support
>   man/depmod.d: Fix incorrect /usr/lib search path
>   libkmod, depmod: Load modprobe.d, depmod.d from ${prefix}/lib.
>   kmod: Add pkgconfig file with kmod compile time configuration
>   libkmod, depmod, modprobe: Make directory for kernel modules
> configurable
> 
>  Makefile.am  |   6 +-
>  configure.ac |  30 
>  libkmod/libkmod.c|  11 +--
>  man/Makefile.am  |  10 ++-
>  man/depmod.d.xml |   9 ++-
>  man/depmod.xml   |   4 +-
>  man/modinfo.xml  |   2 +-
>  man/modprobe.d.xml   |   1 +
>  man/modprobe.xml |   2 +-
>  man/modules.dep.xml  |   6 +-
>  testsuite/module-playground/Makefile |   2 +-
>  testsuite/setup-rootfs.sh| 109 +++
>  testsuite/test-depmod.c  |  16 ++--
>  testsuite/test-testsuite.c   |   8 +-
>  tools/depmod.c   |   7 +-
>  tools/kmod.pc.in |  10 +++
>  tools/modinfo.c  |   4 +-
>  tools/modprobe.c |   4 +-
>  tools/static-nodes.c |   6 +-
>  19 files changed, 156 insertions(+), 91 deletions(-)
>  create mode 100644 tools/kmod.pc.in
> 
> -- 
> 2.41.0
> 



Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-17 Thread Sean Christopherson
On Mon, Oct 16, 2023, Haitao Huang wrote:
> Hi Sean
> 
> On Mon, 16 Oct 2023 16:32:31 -0500, Sean Christopherson 
> wrote:
> 
> > On Mon, Oct 16, 2023, Haitao Huang wrote:
> > > From this perspective, I think the current implementation is
> > > "well-defined":
> > > EPC cgroup limits for VMs are only enforced at VM launch time, not
> > > runtime.  In practice,  SGX VM can be launched only with fixed EPC size
> > > and all those EPCs are fully committed to the VM once launched.
> > 
> > Fully committed doesn't mean those numbers are reflected in the cgroup.  A
> > VM scheduler can easily "commit" EPC to a guest, but allocate EPC on
> > demand, i.e.  when the guest attempts to actually access a page.
> > Preallocating memory isn't free, e.g. it can slow down guest boot, so it's
> > entirely reasonable to have virtual EPC be allocated on-demand.  Enforcing
> > at launch time doesn't work for such setups, because from the cgroup's
> > perspective, the VM is using 0 pages of EPC at launch.
> > 
> Maybe I understood the current implementation wrong. From what I see, vEPC
> is impossible not fully commit at launch time. The guest would EREMOVE all
> pages during initialization resulting #PF and all pages allocated. This
> essentially makes "prealloc=off" the same as "prealloc=on".
> Unless you are talking about some custom OS or kernel other than upstream
> Linux here?

Yes, a customer could be running an older kernel, something other than Linux, a
custom kernel, an out-of-tree SGX driver, etc.  The host should never assume
anything about the guest kernel when it comes to correctness (unless the guest
kernel is controlled by the host).

Doing EREMOVE on all pages is definitely not mandatory, especially if the kernel
detects a hypervisor, i.e. knows its running as a guest.


Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-17 Thread Michal Suchánek
On Tue, Oct 17, 2023 at 11:46:45PM +0900, Masahiro Yamada wrote:
> On Tue, Oct 17, 2023 at 9:27 PM Michal Suchánek  wrote:
> >
> > On Tue, Oct 17, 2023 at 09:05:29PM +0900, Masahiro Yamada wrote:
> > > On Tue, Oct 17, 2023 at 7:44 PM Michal Suchánek  wrote:
> > > >
> > > > On Tue, Oct 17, 2023 at 07:15:50PM +0900, Masahiro Yamada wrote:

> > > > > If MOD_PREFIX is given from an env variable or from the command line,
> > > > > it is respected.
> > > > >
> > > > > If "pkg-config --variable=module_prefix libkmod" works,
> > > > > that configuration is applied.
> > > > >
> > > > > Otherwise, MOD_PREFIX is empty, i.e. fall back to the current 
> > > > > behavior.
> > > > >
> > > > >
> > > > > I prefer 'MOD_PREFIX' to 'KERNEL_MODULE_DIRECTORY' in your patch [1]
> > > > > because "|| echo /lib/modules" can be omitted.
> > > > >
> > > > > I do not think we will have such a crazy distro that
> > > > > installs modules under /opt/ directory.
> > > >
> > > > However, I can easily imagine a distribution that would want to put
> > > > modules in /usr/lib-amd64-linux/modules.
> > >
> > >
> > > Sorry, it is not easy for me.
> > >
> > > What is the background of your thought?
> >
> > That's where every other library and module would go on distributions
> > that care about ability to install packages for multiple architectures
> > at the same time. AFAIK the workaround is to inclclude the CPU
> > architecture in extraversion for the kernel to fit.
> 
> 
> In my system (Ubuntu), I see the directory paths
> 
> /usr/aarch64-linux-gnu/lib/
> /usr/i686-linux-gnu/lib/
> /usr/x86_64-linux-gnu/lib/
> 
> If there were such a crazy distro that supports multiple kernel arches
> within a single image, modules might be installed:
> /usr/x86_64-linux-gnu/lib/module//

For me it's /usr/lib/i386-linux-gnu/.

Did they change the scheme at some point?

> > > >
> > > > > I could not understand why you inserted
> > > > > "--print-variables kmod 2>/dev/null | grep '^module_directory$$' 
> > > > > >/dev/null"
> > > > > but I guess the reason is the same.
> > > > > "pkg-config --variable=module_directory kmod" always succeeds,
> > > > > so "|| echo /lib/modules" is never processed.
> > > >
> > > > Yes, that's the semantics of the tool. The jq version was slightly less
> > > > convoluted but required additional tool for building the kernel.
> > >
> > >
> > > It IS convoluted.
> >
> > That's unfortunate result of how the pkgconfig tool works. By now it is
> > even too late to complain to the tool author because it's been like that
> > forever, best bet is to to use it as is or pick a different tool for
> > configuration.
> 
> "pkg-config --variable=" returns its value.
> It is pretty simple, and I do not think it is a big problem.
> 
> Your code is long, but the reason is that you implemented
> it in that way.
> 
> 
> If you go with KERNEL_MODULE_DIRECTORY for max flexibility,
> 
>   KERNEL_MODULE_DIRECTORY := $(or $(shell pkg-config
> --variable=module_directory kmod 2>/dev/null),/lib/modules)
> 
> should work with less characters and less process forks.

And assumes that the module_directory cannot be empty.

Which may or may not be a reasonable assumption, the script as proposed
in the patch does not rely on it.

> But, now I started to prefer confining the long code
> into the shell script, "scripts/modinst-dir",
> and calling it where needed.

That's also an option.

Thanks

Michal

> > > > > [1] 
> > > > > https://lore.kernel.org/linux-kbuild/20230718120348.383-1-msucha...@suse.de/
> > > > > [2] https://github.com/kmod-project/kmod/blob/v31/configure.ac#L295



Re: [PATCH] neighbor: tracing: Move pin6 inside CONFIG_IPV6=y section

2023-10-17 Thread David Ahern
On 10/16/23 6:49 AM, Geert Uytterhoeven wrote:
> When CONFIG_IPV6=n, and building with W=1:
> 
> In file included from include/trace/define_trace.h:102,
>from include/trace/events/neigh.h:255,
>from net/core/net-traces.c:51:
> include/trace/events/neigh.h: In function 
> ‘trace_event_raw_event_neigh_create’:
> include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not 
> used [-Werror=unused-but-set-variable]
>42 | struct in6_addr *pin6;
> |  ^~~~
> include/trace/trace_events.h:402:11: note: in definition of macro 
> ‘DECLARE_EVENT_CLASS’
>   402 | { assign; }   
>   \
> |   ^~
> include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
>44 |  PARAMS(assign),   \
> |  ^~
> include/trace/events/neigh.h:23:1: note: in expansion of macro 
> ‘TRACE_EVENT’
>23 | TRACE_EVENT(neigh_create,
> | ^~~
> include/trace/events/neigh.h:41:9: note: in expansion of macro 
> ‘TP_fast_assign’
>41 | TP_fast_assign(
> | ^~
> In file included from include/trace/define_trace.h:103,
>from include/trace/events/neigh.h:255,
>from net/core/net-traces.c:51:
> include/trace/events/neigh.h: In function ‘perf_trace_neigh_create’:
> include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not 
> used [-Werror=unused-but-set-variable]
>42 | struct in6_addr *pin6;
> |  ^~~~
> include/trace/perf.h:51:11: note: in definition of macro 
> ‘DECLARE_EVENT_CLASS’
>51 | { assign; }   
>   \
> |   ^~
> include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
>44 |  PARAMS(assign),   \
> |  ^~
> include/trace/events/neigh.h:23:1: note: in expansion of macro 
> ‘TRACE_EVENT’
>23 | TRACE_EVENT(neigh_create,
> | ^~~
> include/trace/events/neigh.h:41:9: note: in expansion of macro 
> ‘TP_fast_assign’
>41 | TP_fast_assign(
> | ^~
> 
> Indeed, the variable pin6 is declared and initialized unconditionally,
> while it is only used and needlessly re-initialized when support for
> IPv6 is enabled.
> 
> Fix this by dropping the unused variable initialization, and moving the
> variable declaration inside the existing section protected by a check
> for CONFIG_IPV6.
> 
> Fixes: fc651001d2c5ca4f ("neighbor: Add tracepoint to __neigh_create")
> Signed-off-by: Geert Uytterhoeven 
> ---
> No changes in generated code.
> 
>  include/trace/events/neigh.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Reviewed-by: David Ahern 

Google put your email in the spam folder, BTW.



Re: [PATCH v6 2/3] mm/memory_hotplug: split memmap_on_memory requests across memblocks

2023-10-17 Thread David Hildenbrand

On 17.10.23 07:44, Vishal Verma wrote:

The MHP_MEMMAP_ON_MEMORY flag for hotplugged memory is restricted to
'memblock_size' chunks of memory being added. Adding a larger span of
memory precludes memmap_on_memory semantics.

For users of hotplug such as kmem, large amounts of memory might get
added from the CXL subsystem. In some cases, this amount may exceed the
available 'main memory' to store the memmap for the memory being added.
In this case, it is useful to have a way to place the memmap on the
memory being added, even if it means splitting the addition into
memblock-sized chunks.

Change add_memory_resource() to loop over memblock-sized chunks of
memory if caller requested memmap_on_memory, and if other conditions for
it are met. Teach try_remove_memory() to also expect that a memory
range being removed might have been split up into memblock sized chunks,
and to loop through those as needed.

This does preclude being able to use PUD mappings in the direct map; a
proposal to how this could be optimized in the future is laid out
here[1].

[1]: 
https://lore.kernel.org/linux-mm/b6753402-2de9-25b2-36e9-eacd49752...@redhat.com/

Cc: Andrew Morton 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Dave Hansen 
Cc: Huang Ying 
Suggested-by: David Hildenbrand 
Reviewed-by: Dan Williams 
Signed-off-by: Vishal Verma 
---
  mm/memory_hotplug.c | 214 
  1 file changed, 148 insertions(+), 66 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6be7de9efa55..83e5ec377aad 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1380,6 +1380,43 @@ static bool mhp_supports_memmap_on_memory(unsigned long 
size)
return arch_supports_memmap_on_memory(vmemmap_size);
  }
  
+static int add_memory_create_devices(int nid, struct memory_group *group,

+u64 start, u64 size, mhp_t mhp_flags)
+{
+   struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+   struct vmem_altmap mhp_altmap = {
+   .base_pfn =  PHYS_PFN(start),
+   .end_pfn  =  PHYS_PFN(start + size - 1),
+   };
+   int ret;
+
+   if ((mhp_flags & MHP_MEMMAP_ON_MEMORY)) {
+   mhp_altmap.free = memory_block_memmap_on_memory_pages();
+   params.altmap = kmemdup(_altmap, sizeof(struct vmem_altmap),
+   GFP_KERNEL);
+   if (!params.altmap)
+   return -ENOMEM;
+   }
+
+   /* call arch's memory hotadd */
+   ret = arch_add_memory(nid, start, size, );
+   if (ret < 0)
+   goto error;
+
+   /* create memory block devices after memory was added */
+   ret = create_memory_block_devices(start, size, params.altmap, group);
+   if (ret)
+   goto err_bdev;
+
+   return 0;
+
+err_bdev:
+   arch_remove_memory(start, size, NULL);
+error:
+   kfree(params.altmap);
+   return ret;
+}
+
  /*
   * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
   * and online/offline operations (triggered e.g. by sysfs).
@@ -1388,14 +1425,10 @@ static bool mhp_supports_memmap_on_memory(unsigned long 
size)
   */
  int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
  {
-   struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+   unsigned long memblock_size = memory_block_size_bytes();
enum memblock_flags memblock_flags = MEMBLOCK_NONE;
-   struct vmem_altmap mhp_altmap = {
-   .base_pfn =  PHYS_PFN(res->start),
-   .end_pfn  =  PHYS_PFN(res->end),
-   };
struct memory_group *group = NULL;
-   u64 start, size;
+   u64 start, size, cur_start;
bool new_node = false;
int ret;
  
@@ -1436,28 +1469,21 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)

/*
 * Self hosted memmap array
 */
-   if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
-   if (mhp_supports_memmap_on_memory(size)) {
-   mhp_altmap.free = memory_block_memmap_on_memory_pages();
-   params.altmap = kmemdup(_altmap,
-   sizeof(struct vmem_altmap),
-   GFP_KERNEL);
-   if (!params.altmap)
+   if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
+   mhp_supports_memmap_on_memory(memblock_size)) {
+   for (cur_start = start; cur_start < start + size;
+cur_start += memblock_size) {
+   ret = add_memory_create_devices(nid, group, cur_start,
+   memblock_size,
+   mhp_flags);


Just like on the removal part, that function doesn't make any sense to 
be called with 

Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-17 Thread Masahiro Yamada
On Tue, Oct 17, 2023 at 9:27 PM Michal Suchánek  wrote:
>
> On Tue, Oct 17, 2023 at 09:05:29PM +0900, Masahiro Yamada wrote:
> > On Tue, Oct 17, 2023 at 7:44 PM Michal Suchánek  wrote:
> > >
> > > On Tue, Oct 17, 2023 at 07:15:50PM +0900, Masahiro Yamada wrote:
> > > > > >
> > > > > > Let me add more context to my question.
> > > > > >
> > > > > >
> > > > > > I am interested in the timing when
> > > > > > 'pkg-config --print-variables kmod | grep module_directory'
> > > > > > is executed.
> > > > > >
> > > > > >
> > > > > >
> > > > > > 1.  Build a SRPM on machine A
> > > > > >
> > > > > > 2.  Copy the SRPM from machine A to machine B
> > > > > >
> > > > > > 3.  Run rpmbuild on machine B to build the SRPM into a RPM
> > > > > >
> > > > > > 4.  Copy the RPM from machine B to machine C
> > > > > >
> > > > > > 5.  Install the RPM to machine C
> > > > >
> > > > > As far as I am aware the typical use case is two step:
> > > > >
> > > > > 1. run make rpm-pkg on machine A
> > > > > 2. install the binary rpm on machine C that might not have build tools
> > > > >or powerful enough CPU
> > > > >
> > > > > While it's theoretically possible to use the srpm to rebuild the 
> > > > > binary
> > > > > rpm independently of the kernel git tree I am not aware of people
> > > > > commonly doing this.
> > > >
> > > >
> > > >
> > > > If I correctly understand commit
> > > > 8818039f959b2efc0d6f2cb101f8061332f0c77e,
> > > > those Redhat guys pack a SRPM on a local machine,
> > > > then send it to their build server called 'koji'.
> > > >
> > > > Otherwise, there is no reason
> > > > to have 'make srcrpm-pkg'.
> > > >
> > > >
> > > >
> > > > I believe "A == B" is not always true,
> > > > but we can assume "distro(A) == distro(B)" is always met
> > > > for simplicity.
> > > >
> > > > So, I am OK with configuration at the SRPM time.
> > >
> > > Even if the distro does not match it will likely work to configure SRPM
> > > for non-matching distro and then build it on the target distro but I have
> > > not tested it.
> >
> >
> >
> > Your approach specifies %{MODLIB} as a fixed string
> > when generating kernel.spec, i.e. at the SRPM time.
> >
> >
> >  %files
> >  %defattr (-, root, root)
> > -/lib/modules/%{KERNELRELEASE}
> > -%exclude /lib/modules/%{KERNELRELEASE}/build
> > +%{MODLIB}
> > +%exclude %{MODLIB}/build
> >  /boot/*
> >
> >
> > Then, how to change the path later?
>
> Why would you need to change the path later?
>
> The SRPM has sources, it does not need to build on the system on which
> it is authored if it is intended for another distribution.
>
> Of course, you would need to know for what distribution and where it
> wants its modules so that you can specify the location when creating the
> SRPM.



Simply I wrongly understood your description.

If you manage to correctly configure for the target distro
when building SRPM, that's fine.





>
> > > > > If rebuilding the source rpm on a different machine from where the git
> > > > > tree is located, and possibly on a different distribution is desirable
> > > > > then the detection of the KERNEL_MODULE_DIRECTORY should be added in 
> > > > > the
> > > > > rpm spec file as well.
> > > > >
> > > > > > Of course, we are most interested in the module path
> > > > > > of machine C, but it is difficult/impossible to
> > > > > > guess it at the time of building.
> > > > > >
> > > > > > We can assume machine B == machine C.
> > > > > >
> > > > > > We are the second most interested in the module
> > > > > > path on machine B.
> > > > > >
> > > > > > The module path of machine A is not important.
> > > > > >
> > > > > > So, I am asking where you would inject
> > > > > > 'pkg-config --print-variables kmod | grep module_directory'.
> > > > >
> > > > > I don't. I don't think there will be a separate machine B.
> > > > >
> > > > > And I can't really either - so far any attempt at adding support for
> > > > > this has been rejected.
> > > > >
> > > > > Technically the KERNEL_MODULE_DIRECTORY could be set in two steps - 
> > > > > one
> > > > > giving the script to run, and one running it, and then it could be run
> > > > > independently in the SRPM as well.
> > > >
> > > >
> > > > At first, I thought your patch [1] was very ugly,
> > > > but I do not think it is so ugly if cleanly implemented.
> > > >
> > > > It won't hurt to allow users to specify the middle part of MODLIB.
> > > >
> > > >
> > > > There are two options.
> > > >
> > > >
> > > > [A]  Add 'MOD_PREFIX' to specify the middle part of MODLIB
> > > >
> > > >
> > > > The top Makefile will look as follows:
> > > >
> > > >
> > > > MODLIB = $(INSTALL_MOD_PATH)$(MOD_PREFIX)/lib/modules/$(KERNELRELEASE)
> > > > export MODLIB
> > > >
> > > >
> > > > It is easier than specifying the entire MODLIB, but you still need
> > > > to manually pass "MOD_PREFIX=/usr" from an env variable or
> > > > the command line.
> > > >
> > > > If MOD_PREFIX is not given, MODLIB is the same as the current one.
> > > >
> > > > [B] Support a dynamic 

Re: [PATCH v6 1/3] mm/memory_hotplug: replace an open-coded kmemdup() in add_memory_resource()

2023-10-17 Thread David Hildenbrand

On 17.10.23 07:44, Vishal Verma wrote:

A review of the memmap_on_memory modifications to add_memory_resource()
revealed an instance of an open-coded kmemdup(). Replace it with
kmemdup().

Cc: Andrew Morton 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Dan Williams 
Reported-by: Dan Williams 
Signed-off-by: Vishal Verma 
---
  mm/memory_hotplug.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f8d3e7427e32..6be7de9efa55 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1439,11 +1439,11 @@ int __ref add_memory_resource(int nid, struct resource 
*res, mhp_t mhp_flags)
if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
if (mhp_supports_memmap_on_memory(size)) {
mhp_altmap.free = memory_block_memmap_on_memory_pages();
-   params.altmap = kmalloc(sizeof(struct vmem_altmap), 
GFP_KERNEL);
+   params.altmap = kmemdup(_altmap,
+   sizeof(struct vmem_altmap),
+   GFP_KERNEL);
if (!params.altmap)
goto error;
-
-   memcpy(params.altmap, _altmap, sizeof(mhp_altmap));
}
/* fallback to not using altmap  */
}



Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb




Re: [PATCH v2 5/6] ACPI: NFIT: Replace acpi_driver with platform_driver

2023-10-17 Thread Rafael J. Wysocki
On Fri, Oct 6, 2023 at 8:33 PM Michal Wilczynski
 wrote:
>
> NFIT driver uses struct acpi_driver incorrectly to register itself.
> This is wrong as the instances of the ACPI devices are not meant
> to be literal devices, they're supposed to describe ACPI entry of a
> particular device.
>
> Use platform_driver instead of acpi_driver. In relevant places call
> platform devices instances pdev to make a distinction with ACPI
> devices instances.
>
> NFIT driver uses devm_*() family of functions extensively. This change
> has no impact on correct functioning of the whole devm_*() family of
> functions, since the lifecycle of the device stays the same. It is still
> being created during the enumeration, and destroyed on platform device
> removal.
>
> Suggested-by: Rafael J. Wysocki 
> Signed-off-by: Michal Wilczynski 
> ---
>  drivers/acpi/nfit/core.c | 34 ++
>  1 file changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 942b84d94078..fb0bc16fa186 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include "intel.h"
> @@ -98,7 +99,7 @@ static struct acpi_device *to_acpi_dev(struct 
> acpi_nfit_desc *acpi_desc)
> || strcmp(nd_desc->provider_name, "ACPI.NFIT") != 0)
> return NULL;
>
> -   return to_acpi_device(acpi_desc->dev);
> +   return ACPI_COMPANION(acpi_desc->dev);
>  }
>
>  static int xlat_bus_status(void *buf, unsigned int cmd, u32 status)
> @@ -3284,11 +3285,11 @@ static void acpi_nfit_put_table(void *table)
>
>  static void acpi_nfit_notify(acpi_handle handle, u32 event, void *data)
>  {
> -   struct acpi_device *adev = data;
> +   struct device *dev = data;
>
> -   device_lock(>dev);
> -   __acpi_nfit_notify(>dev, handle, event);
> -   device_unlock(>dev);
> +   device_lock(dev);
> +   __acpi_nfit_notify(dev, handle, event);
> +   device_unlock(dev);

Careful here.

The ACPI device locking is changed to platform device locking without
a word of explanation in the changelog.

Do you actually know what the role of the locking around
__acpi_nfit_notify() is and whether or not it can be replaced with
platform device locking safely?

>  }
>
>  static void acpi_nfit_remove_notify_handler(void *data)
> @@ -3329,11 +3330,12 @@ void acpi_nfit_shutdown(void *data)
>  }
>  EXPORT_SYMBOL_GPL(acpi_nfit_shutdown);
>
> -static int acpi_nfit_add(struct acpi_device *adev)
> +static int acpi_nfit_probe(struct platform_device *pdev)
>  {
> struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
> struct acpi_nfit_desc *acpi_desc;
> -   struct device *dev = >dev;
> +   struct device *dev = >dev;
> +   struct acpi_device *adev = ACPI_COMPANION(dev);
> struct acpi_table_header *tbl;
> acpi_status status = AE_OK;
> acpi_size sz;
> @@ -3360,7 +3362,7 @@ static int acpi_nfit_add(struct acpi_device *adev)
> acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
> if (!acpi_desc)
> return -ENOMEM;
> -   acpi_nfit_desc_init(acpi_desc, >dev);
> +   acpi_nfit_desc_init(acpi_desc, dev);

You seem to think that replacing adev->dev with pdev->dev everywhere
in this driver will work,

Have you verified that in any way?  If so, then how?

>
> /* Save the acpi header for exporting the revision via sysfs */
> acpi_desc->acpi_header = *tbl;
> @@ -3391,7 +3393,7 @@ static int acpi_nfit_add(struct acpi_device *adev)
> return rc;
>
> rc = acpi_dev_install_notify_handler(adev, ACPI_DEVICE_NOTIFY,
> -acpi_nfit_notify, adev);
> +acpi_nfit_notify, dev);
> if (rc)
> return rc;
>
> @@ -3475,11 +3477,11 @@ static const struct acpi_device_id acpi_nfit_ids[] = {
>  };
>  MODULE_DEVICE_TABLE(acpi, acpi_nfit_ids);
>
> -static struct acpi_driver acpi_nfit_driver = {
> -   .name = KBUILD_MODNAME,
> -   .ids = acpi_nfit_ids,
> -   .ops = {
> -   .add = acpi_nfit_add,
> +static struct platform_driver acpi_nfit_driver = {
> +   .probe = acpi_nfit_probe,
> +   .driver = {
> +   .name = KBUILD_MODNAME,
> +   .acpi_match_table = acpi_nfit_ids,
> },
>  };
>
> @@ -3517,7 +3519,7 @@ static __init int nfit_init(void)
> return -ENOMEM;
>
> nfit_mce_register();
> -   ret = acpi_bus_register_driver(_nfit_driver);
> +   ret = platform_driver_register(_nfit_driver);
> if (ret) {
> nfit_mce_unregister();
> destroy_workqueue(nfit_wq);
> @@ -3530,7 +3532,7 @@ static __init int nfit_init(void)
>  static __exit void nfit_exit(void)
>  {
> nfit_mce_unregister();
> -   

[PATCH v11 4/5] kprobes: freelist.h removed

2023-10-17 Thread wuqiang.matt
This patch will remove freelist.h from kernel source tree, since the
only use cases (kretprobe and rethook) are converted to objpool.

Signed-off-by: wuqiang.matt 
---
 include/linux/freelist.h | 129 ---
 1 file changed, 129 deletions(-)
 delete mode 100644 include/linux/freelist.h

diff --git a/include/linux/freelist.h b/include/linux/freelist.h
deleted file mode 100644
index fc1842b96469..
--- a/include/linux/freelist.h
+++ /dev/null
@@ -1,129 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */
-#ifndef FREELIST_H
-#define FREELIST_H
-
-#include 
-
-/*
- * Copyright: came...@moodycamel.com
- *
- * A simple CAS-based lock-free free list. Not the fastest thing in the world
- * under heavy contention, but simple and correct (assuming nodes are never
- * freed until after the free list is destroyed), and fairly speedy under low
- * contention.
- *
- * Adapted from: 
https://moodycamel.com/blog/2014/solving-the-aba-problem-for-lock-free-free-lists
- */
-
-struct freelist_node {
-   atomic_trefs;
-   struct freelist_node*next;
-};
-
-struct freelist_head {
-   struct freelist_node*head;
-};
-
-#define REFS_ON_FREELIST 0x8000
-#define REFS_MASK   0x7FFF
-
-static inline void __freelist_add(struct freelist_node *node, struct 
freelist_head *list)
-{
-   /*
-* Since the refcount is zero, and nobody can increase it once it's
-* zero (except us, and we run only one copy of this method per node at
-* a time, i.e. the single thread case), then we know we can safely
-* change the next pointer of the node; however, once the refcount is
-* back above zero, then other threads could increase it (happens under
-* heavy contention, when the refcount goes to zero in between a load
-* and a refcount increment of a node in try_get, then back up to
-* something non-zero, then the refcount increment is done by the other
-* thread) -- so if the CAS to add the node to the actual list fails,
-* decrese the refcount and leave the add operation to the next thread
-* who puts the refcount back to zero (which could be us, hence the
-* loop).
-*/
-   struct freelist_node *head = READ_ONCE(list->head);
-
-   for (;;) {
-   WRITE_ONCE(node->next, head);
-   atomic_set_release(>refs, 1);
-
-   if (!try_cmpxchg_release(>head, , node)) {
-   /*
-* Hmm, the add failed, but we can only try again when
-* the refcount goes back to zero.
-*/
-   if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, 
>refs) == 1)
-   continue;
-   }
-   return;
-   }
-}
-
-static inline void freelist_add(struct freelist_node *node, struct 
freelist_head *list)
-{
-   /*
-* We know that the should-be-on-freelist bit is 0 at this point, so
-* it's safe to set it using a fetch_add.
-*/
-   if (!atomic_fetch_add_release(REFS_ON_FREELIST, >refs)) {
-   /*
-* Oh look! We were the last ones referencing this node, and we
-* know we want to add it to the free list, so let's do it!
-*/
-   __freelist_add(node, list);
-   }
-}
-
-static inline struct freelist_node *freelist_try_get(struct freelist_head 
*list)
-{
-   struct freelist_node *prev, *next, *head = 
smp_load_acquire(>head);
-   unsigned int refs;
-
-   while (head) {
-   prev = head;
-   refs = atomic_read(>refs);
-   if ((refs & REFS_MASK) == 0 ||
-   !atomic_try_cmpxchg_acquire(>refs, , refs+1)) {
-   head = smp_load_acquire(>head);
-   continue;
-   }
-
-   /*
-* Good, reference count has been incremented (it wasn't at
-* zero), which means we can read the next and not worry about
-* it changing between now and the time we do the CAS.
-*/
-   next = READ_ONCE(head->next);
-   if (try_cmpxchg_acquire(>head, , next)) {
-   /*
-* Yay, got the node. This means it was on the list,
-* which means should-be-on-freelist must be false no
-* matter the refcount (because nobody else knows it's
-* been taken off yet, it can't have been put back on).
-*/
-   WARN_ON_ONCE(atomic_read(>refs) & 
REFS_ON_FREELIST);
-
-   /*
-* Decrease refcount twice, once for our ref, and once
-* for the list's ref.
-*/
-   

[PATCH v11 5/5] MAINTAINERS: objpool added

2023-10-17 Thread wuqiang.matt
objpool, a scalable and lockless ring-array based object pool, was
introduced to replace the original freelist (a LIFO queue based on
singly linked list) to improve kretprobe scalability.

Signed-off-by: wuqiang.matt 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 48abe1a281f2..1c0a38d763a2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15328,6 +15328,13 @@ F: include/linux/objagg.h
 F: lib/objagg.c
 F: lib/test_objagg.c
 
+OBJPOOL
+M: Matt Wu 
+S: Supported
+F: include/linux/objpool.h
+F: lib/objpool.c
+F: lib/test_objpool.c
+
 OBJTOOL
 M: Josh Poimboeuf 
 M: Peter Zijlstra 
-- 
2.40.1




[PATCH v11 3/5] kprobes: kretprobe scalability improvement

2023-10-17 Thread wuqiang.matt
kretprobe is using freelist to manage return-instances, but freelist,
as LIFO queue based on singly linked list, scales badly and reduces
the overall throughput of kretprobed routines, especially for high
contention scenarios.

Here's a typical throughput test of sys_prctl (counts in 10 seconds,
measured with perf stat -a -I 1 -e syscalls:sys_enter_prctl):

OS: Debian 10 X86_64, Linux 6.5rc7 with freelist
HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s

 1T   2T   4T   8T  16T  24T
   24150045 29317964 15446741 12494489 18287272 17708768
32T  48T  64T  72T  96T 128T
   16200682 13737658 11645677 11269858 10470118  9931051

This patch introduces objpool to replace freelist. objpool is a
high performance queue, which can bring near-linear scalability
to kretprobed routines. Tests of kretprobe throughput show the
biggest ratio as 159x of original freelist. Here's the result:

  1T 2T 4T 8T16T
native: 41186213   82336866  164250978  328662645  658810299
freelist:   24150045   29317964   15446741   12494489   18287272
objpool:23926730   48010314   96125218  191782984  385091769
 32T48T64T96T   128T
native:   1330338351 1969957941 2512291791 2615754135 2671040914
freelist:   16200682   13737658   11645677   104701189931051
objpool:   764481096 1147149781 1456220214 1502109662 1579015050

Testings on 96-core ARM64 output similarly, but with the biggest
ratio up to 448x:

OS: Debian 10 AARCH64, Linux 6.5rc7
HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s

  1T 2T 4T 8T16T
native: .   30066096   63569843  126194076  257447289  505800181
freelist:   16152090   11064397   1112406872157685663013
objpool:13997541   28032100   55726624  110099926  221498787
 24T32T48T64T96T
native:763305277 1015925192 1521075123 2033009392 3021013752
freelist:50158104602893376679233824782945292
objpool:   328192025  439439564  668534502  887401381 1319972072

Signed-off-by: wuqiang.matt 
---
 include/linux/kprobes.h | 11 ++---
 include/linux/rethook.h | 16 ++-
 kernel/kprobes.c| 93 +
 kernel/trace/fprobe.c   | 32 ++
 kernel/trace/rethook.c  | 90 ++-
 5 files changed, 98 insertions(+), 144 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 85a64cb95d75..365eb092e9c4 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -26,8 +26,7 @@
 #include 
 #include 
 #include 
-#include 
-#include 
+#include 
 #include 
 #include 
 
@@ -141,7 +140,7 @@ static inline bool kprobe_ftrace(struct kprobe *p)
  */
 struct kretprobe_holder {
struct kretprobe*rp;
-   refcount_t  ref;
+   struct objpool_head pool;
 };
 
 struct kretprobe {
@@ -154,7 +153,6 @@ struct kretprobe {
 #ifdef CONFIG_KRETPROBE_ON_RETHOOK
struct rethook *rh;
 #else
-   struct freelist_head freelist;
struct kretprobe_holder *rph;
 #endif
 };
@@ -165,10 +163,7 @@ struct kretprobe_instance {
 #ifdef CONFIG_KRETPROBE_ON_RETHOOK
struct rethook_node node;
 #else
-   union {
-   struct freelist_node freelist;
-   struct rcu_head rcu;
-   };
+   struct rcu_head rcu;
struct llist_node llist;
struct kretprobe_holder *rph;
kprobe_opcode_t *ret_addr;
diff --git a/include/linux/rethook.h b/include/linux/rethook.h
index 26b6f3c81a76..ce69b2b7bc35 100644
--- a/include/linux/rethook.h
+++ b/include/linux/rethook.h
@@ -6,11 +6,10 @@
 #define _LINUX_RETHOOK_H
 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
-#include 
 
 struct rethook_node;
 
@@ -30,14 +29,12 @@ typedef void (*rethook_handler_t) (struct rethook_node *, 
void *, unsigned long,
 struct rethook {
void*data;
rethook_handler_t   handler;
-   struct freelist_headpool;
-   refcount_t  ref;
+   struct objpool_head pool;
struct rcu_head rcu;
 };
 
 /**
  * struct rethook_node - The rethook shadow-stack entry node.
- * @freelist: The freelist, linked to struct rethook::pool.
  * @rcu: The rcu_head for deferred freeing.
  * @llist: The llist, linked to a struct task_struct::rethooks.
  * @rethook: The pointer to the struct rethook.
@@ -48,20 +45,16 @@ struct rethook {
  * on each entry of the shadow stack.
  */
 struct rethook_node {
-   union {
-   struct freelist_node freelist;
-   struct rcu_head  rcu;
-   };
+   struct rcu_head rcu;
struct llist_node   llist;
struct rethook  *rethook;
unsigned long   ret_addr;
unsigned long   frame;
 };
 

[PATCH v11 2/5] lib: objpool test module added

2023-10-17 Thread wuqiang.matt
The test_objpool module (test_objpool) will run several testcases
for objpool stress and performance evaluation. Each testcase will
have all available cpu cores involved to create a situation of high
parallel and high contention.

As of now there are 5 groups and 5 * 2 testcases in total:

1) group 1: synchronous mode
   objpool is managed synchronously, that is, all objects are to be
   reclaimed before objpool finalization and the objpool owner makes
   sure of it. All threads on different cores run in the same pace
2) group 2: synchronous mode + hrtimer
   this case have 2 customers: normal threads and hrtimer softirqs
3) group 3: synchronous + overrun mode
   This test group is mainly for performance evaluation of missing
   cases when pre-allocated objects are less than the requested
4) group 4: asynchronous mode
   This case is just an emulation of kretprobe, with refcount used
   to control the objpool lifecycle
5) group 5: asynchronous mode with hrtimer
   hrtimer softirq is introduced to stress async objpool operations

Signed-off-by: wuqiang.matt 
---
 lib/Kconfig.debug  |  11 +
 lib/Makefile   |   2 +
 lib/test_objpool.c | 689 +
 3 files changed, 702 insertions(+)
 create mode 100644 lib/test_objpool.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d6798513a8c2..6598604cf6c8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2931,6 +2931,17 @@ config TEST_CLOCKSOURCE_WATCHDOG
 
  If unsure, say N.
 
+config TEST_OBJPOOL
+   tristate "Test module for correctness and stress of objpool"
+   default n
+   depends on m && DEBUG_KERNEL
+   help
+ This builds the "test_objpool" module that should be used for
+ correctness verification and concurrent testings of objects
+ allocation and reclamation.
+
+ If unsure, say N.
+
 endif # RUNTIME_TESTING_MENU
 
 config ARCH_USE_MEMTEST
diff --git a/lib/Makefile b/lib/Makefile
index 7a84c922d9ff..19b936f2af1c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
 obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o
 CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
 obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
+obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o
+
 #
 # CFLAGS for compiling floating point code inside the kernel. x86/Makefile 
turns
 # off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
diff --git a/lib/test_objpool.c b/lib/test_objpool.c
new file mode 100644
index ..d329472f8ab6
--- /dev/null
+++ b/lib/test_objpool.c
@@ -0,0 +1,696 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Test module for lockless object pool
+ *
+ * Copyright: wuqiang.m...@bytedance.com
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define OT_NR_MAX_BULK (16)
+
+/* memory usage */
+struct ot_mem_stat {
+   atomic_long_t alloc;
+   atomic_long_t free;
+};
+
+/* object allocation results */
+struct ot_obj_stat {
+   unsigned long nhits;
+   unsigned long nmiss;
+};
+
+/* control & results per testcase */
+struct ot_data {
+   struct rw_semaphore start;
+   struct completion wait;
+   struct completion rcu;
+   atomic_t nthreads cacheline_aligned_in_smp;
+   atomic_t stop cacheline_aligned_in_smp;
+   struct ot_mem_stat kmalloc;
+   struct ot_mem_stat vmalloc;
+   struct ot_obj_stat objects;
+   u64duration;
+};
+
+/* testcase */
+struct ot_test {
+   int async; /* synchronous or asynchronous */
+   int mode; /* only mode 0 supported */
+   int objsz; /* object size */
+   int duration; /* ms */
+   int delay; /* ms */
+   int bulk_normal;
+   int bulk_irq;
+   unsigned long hrtimer; /* ms */
+   const char *name;
+   struct ot_data data;
+};
+
+/* per-cpu worker */
+struct ot_item {
+   struct objpool_head *pool; /* pool head */
+   struct ot_test *test; /* test parameters */
+
+   void (*worker)(struct ot_item *item, int irq);
+
+   /* hrtimer control */
+   ktime_t hrtcycle;
+   struct hrtimer hrtimer;
+
+   int bulk[2]; /* for thread and irq */
+   int delay;
+   u32 niters;
+
+   /* summary per thread */
+   struct ot_obj_stat stat[2]; /* thread and irq */
+   u64 duration;
+};
+
+/*
+ * memory leakage checking
+ */
+
+static void *ot_kzalloc(struct ot_test *test, long size)
+{
+   void *ptr = kzalloc(size, GFP_KERNEL);
+
+   if (ptr)
+   atomic_long_add(size, >data.kmalloc.alloc);
+   return ptr;
+}
+
+static void ot_kfree(struct ot_test *test, void *ptr, long size)
+{
+   if (!ptr)
+   return;
+   atomic_long_add(size, >data.kmalloc.free);
+   kfree(ptr);
+}
+
+static void ot_mem_report(struct ot_test *test)
+{
+   

[PATCH v11 1/5] lib: objpool added: ring-array based lockless MPMC

2023-10-17 Thread wuqiang.matt
objpool is a scalable implementation of high performance queue for
object allocation and reclamation, such as kretprobe instances.

With leveraging percpu ring-array to mitigate hot spots of memory
contention, it delivers near-linear scalability for high parallel
scenarios. The objpool is best suited for the following cases:
1) Memory allocation or reclamation are prohibited or too expensive
2) Consumers are of different priorities, such as irqs and threads

Limitations:
1) Maximum objects (capacity) is fixed after objpool creation
2) All pre-allocated objects are managed in percpu ring array,
   which consumes more memory than linked lists

Signed-off-by: wuqiang.matt 
---
 include/linux/objpool.h | 176 +
 lib/Makefile|   2 +-
 lib/objpool.c   | 286 
 3 files changed, 463 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/objpool.h
 create mode 100644 lib/objpool.c

diff --git a/include/linux/objpool.h b/include/linux/objpool.h
new file mode 100644
index ..4df18405420a
--- /dev/null
+++ b/include/linux/objpool.h
@@ -0,0 +1,181 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_OBJPOOL_H
+#define _LINUX_OBJPOOL_H
+
+#include 
+#include 
+
+/*
+ * objpool: ring-array based lockless MPMC queue
+ *
+ * Copyright: wuqiang.m...@bytedance.com,mhira...@kernel.org
+ *
+ * objpool is a scalable implementation of high performance queue for
+ * object allocation and reclamation, such as kretprobe instances.
+ *
+ * With leveraging percpu ring-array to mitigate hot spots of memory
+ * contention, it delivers near-linear scalability for high parallel
+ * scenarios. The objpool is best suited for the following cases:
+ * 1) Memory allocation or reclamation are prohibited or too expensive
+ * 2) Consumers are of different priorities, such as irqs and threads
+ *
+ * Limitations:
+ * 1) Maximum objects (capacity) is fixed after objpool creation
+ * 2) All pre-allocated objects are managed in percpu ring array,
+ *which consumes more memory than linked lists
+ */
+
+/**
+ * struct objpool_slot - percpu ring array of objpool
+ * @head: head sequence of the local ring array (to retrieve at)
+ * @tail: tail sequence of the local ring array (to append at)
+ * @last: the last sequence number marked as ready for retrieve
+ * @mask: bits mask for modulo capacity to compute array indexes
+ * @entries: object entries on this slot
+ *
+ * Represents a cpu-local array-based ring buffer, its size is specialized
+ * during initialization of object pool. The percpu objpool node is to be
+ * allocated from local memory for NUMA system, and to be kept compact in
+ * continuous memory: CPU assigned number of objects are stored just after
+ * the body of objpool_node.
+ *
+ * Real size of the ring array is far too smaller than the value range of
+ * head and tail, typed as uint32_t: [0, 2^32), so only lower bits (mask)
+ * of head and tail are used as the actual position in the ring array. In
+ * general the ring array is acting like a small sliding window, which is
+ * always moving forward in the loop of [0, 2^32).
+ */
+struct objpool_slot {
+   uint32_thead;
+   uint32_ttail;
+   uint32_tlast;
+   uint32_tmask;
+   void   *entries[];
+} __packed;
+
+struct objpool_head;
+
+/*
+ * caller-specified callback for object initial setup, it's only called
+ * once for each object (just after the memory allocation of the object)
+ */
+typedef int (*objpool_init_obj_cb)(void *obj, void *context);
+
+/* caller-specified cleanup callback for objpool destruction */
+typedef int (*objpool_fini_cb)(struct objpool_head *head, void *context);
+
+/**
+ * struct objpool_head - object pooling metadata
+ * @obj_size:   object size, aligned to sizeof(void *)
+ * @nr_objs:total objs (to be pre-allocated with objpool)
+ * @nr_cpus:local copy of nr_cpu_ids
+ * @capacity:   max objs can be managed by one objpool_slot
+ * @gfp:gfp flags for kmalloc & vmalloc
+ * @ref:refcount of objpool
+ * @flags:  flags for objpool management
+ * @cpu_slots:  pointer to the array of objpool_slot
+ * @release:resource cleanup callback
+ * @context:caller-provided context
+ */
+struct objpool_head {
+   int obj_size;
+   int nr_objs;
+   int nr_cpus;
+   int capacity;
+   gfp_t   gfp;
+   refcount_t  ref;
+   unsigned long   flags;
+   struct objpool_slot   **cpu_slots;
+   objpool_fini_cb release;
+   void   *context;
+};
+
+#define OBJPOOL_NR_OBJECT_MAX  (1UL << 24) /* maximum numbers of total objects 
*/
+#define OBJPOOL_OBJECT_SIZE_MAX(1UL << 16) /* maximum size of an 
object */
+
+/**
+ * objpool_init() - initialize objpool and pre-allocated objects
+ * 

[PATCH v11 0/5] lib,kprobes: kretprobe scalability improvement

2023-10-17 Thread wuqiang.matt
This patch series introduces a scalable and lockless ring-array based
object pool to improve scalability of kretprobed routines.

v11:
  *) patchset rebased to branch probes/core of linux-trace.git
  *) objpool: objpool_fini optimized for better code readability
  *) test_objpool: asynchronous releasing of objpool now covered

wuqiang.matt (5):
  lib: objpool added: ring-array based lockless MPMC
  lib: objpool test module added
  kprobes: kretprobe scalability improvement with objpool
  kprobes: freelist.h removed
  MAINTAINERS: objpool added

 MAINTAINERS  |   7 +
 include/linux/freelist.h | 129 
 include/linux/kprobes.h  |  11 +-
 include/linux/objpool.h  | 176 ++
 include/linux/rethook.h  |  16 +-
 kernel/kprobes.c |  93 +++---
 kernel/trace/fprobe.c|  32 +-
 kernel/trace/rethook.c   |  90 +++--
 lib/Kconfig.debug|  11 +
 lib/Makefile |   4 +-
 lib/objpool.c| 286 
 lib/test_objpool.c   | 689 +++
 12 files changed, 1270 insertions(+), 274 deletions(-)
 delete mode 100644 include/linux/freelist.h
 create mode 100644 include/linux/objpool.h
 create mode 100644 lib/objpool.c
 create mode 100644 lib/test_objpool.c

-- 
2.40.1




Re: [PATCH v3 4/6] ACPI: AC: Rename ACPI device from device to adev

2023-10-17 Thread Rafael J. Wysocki
On Wed, Oct 11, 2023 at 10:34 AM Michal Wilczynski
 wrote:
>
> Since transformation from ACPI driver to platform driver there are two
> devices on which the driver operates - ACPI device and platform device.
> For the sake of reader this calls for the distinction in their naming,
> to avoid confusion. Rename device to adev, as corresponding
> platform device is called pdev.
>
> Signed-off-by: Michal Wilczynski 
> ---
>  drivers/acpi/ac.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/acpi/ac.c b/drivers/acpi/ac.c
> index 1dd4919be7ac..2618a7ccc11c 100644
> --- a/drivers/acpi/ac.c
> +++ b/drivers/acpi/ac.c
> @@ -121,11 +121,11 @@ static void acpi_ac_notify(acpi_handle handle, u32 
> event, void *data)
>  {
> struct device *dev = data;
> struct acpi_ac *ac = dev_get_drvdata(dev);
> -   struct acpi_device *device = ACPI_COMPANION(dev);
> +   struct acpi_device *adev = ACPI_COMPANION(dev);
>
> switch (event) {
> default:
> -   acpi_handle_debug(device->handle, "Unsupported event 
> [0x%x]\n",
> +   acpi_handle_debug(adev->handle, "Unsupported event [0x%x]\n",
>   event);
> fallthrough;
> case ACPI_AC_NOTIFY_STATUS:
> @@ -142,11 +142,11 @@ static void acpi_ac_notify(acpi_handle handle, u32 
> event, void *data)
> msleep(ac_sleep_before_get_state_ms);
>
> acpi_ac_get_state(ac);
> -   acpi_bus_generate_netlink_event(device->pnp.device_class,
> +   acpi_bus_generate_netlink_event(adev->pnp.device_class,
> dev_name(dev),
> event,
> ac->state);
> -   acpi_notifier_call_chain(device, event, ac->state);
> +   acpi_notifier_call_chain(adev, event, ac->state);
> kobject_uevent(>charger->dev.kobj, KOBJ_CHANGE);
> }
>  }
> @@ -205,7 +205,7 @@ static const struct dmi_system_id ac_dmi_table[]  
> __initconst = {
>
>  static int acpi_ac_probe(struct platform_device *pdev)
>  {
> -   struct acpi_device *device = ACPI_COMPANION(>dev);
> +   struct acpi_device *adev = ACPI_COMPANION(>dev);
> struct power_supply_config psy_cfg = {};
> struct acpi_ac *ac;
> int result;
> @@ -214,9 +214,9 @@ static int acpi_ac_probe(struct platform_device *pdev)
> if (!ac)
> return -ENOMEM;
>
> -   ac->device = device;
> -   strcpy(acpi_device_name(device), ACPI_AC_DEVICE_NAME);
> -   strcpy(acpi_device_class(device), ACPI_AC_CLASS);
> +   ac->device = adev;
> +   strcpy(acpi_device_name(adev), ACPI_AC_DEVICE_NAME);
> +   strcpy(acpi_device_class(adev), ACPI_AC_CLASS);
>
> platform_set_drvdata(pdev, ac);
>
> @@ -226,7 +226,7 @@ static int acpi_ac_probe(struct platform_device *pdev)
>
> psy_cfg.drv_data = ac;
>
> -   ac->charger_desc.name = acpi_device_bid(device);
> +   ac->charger_desc.name = acpi_device_bid(adev);
> ac->charger_desc.type = POWER_SUPPLY_TYPE_MAINS;
> ac->charger_desc.properties = ac_props;
> ac->charger_desc.num_properties = ARRAY_SIZE(ac_props);
> @@ -238,13 +238,13 @@ static int acpi_ac_probe(struct platform_device *pdev)
> goto err_release_ac;
> }
>
> -   pr_info("%s [%s] (%s-line)\n", acpi_device_name(device),
> -   acpi_device_bid(device), str_on_off(ac->state));
> +   pr_info("%s [%s] (%s-line)\n", acpi_device_name(adev),
> +   acpi_device_bid(adev), str_on_off(ac->state));
>
> ac->battery_nb.notifier_call = acpi_ac_battery_notify;
> register_acpi_notifier(>battery_nb);
>
> -   result = acpi_dev_install_notify_handler(device, ACPI_ALL_NOTIFY,
> +   result = acpi_dev_install_notify_handler(adev, ACPI_ALL_NOTIFY,
>  acpi_ac_notify, >dev);
> if (result)
> goto err_unregister;
> --

Rebased on top of current linux-next and applied as 6.7 material, thanks!



Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-17 Thread Haitao Huang

On Mon, 16 Oct 2023 20:34:57 -0500, Huang, Kai  wrote:


On Mon, 2023-10-16 at 19:10 -0500, Haitao Huang wrote:
On Mon, 16 Oct 2023 16:09:52 -0500, Huang, Kai   
wrote:

[...]

> still need to fix the bug mentioned above here.
>
> I really think you should just go this simple way:
>
> When you want to take EPC back from VM, kill the VM.
>

My only concern is that this is a compromise due to current limitation  
(no

other sane way to take EPC from VMs). If we define this behavior and it
becomes a contract to user space, then we can't change in future.


Why do we need to "define such behaviour"?

This isn't some kinda of kernel/userspace ABI IMHO, but only kernel  
internal
implementation.  Here VM is similar to normal host enclaves.  You limit  
the
resource, some host enclaves could be killed.  Similarly, VM could also  
be

killed too.

And supporting VMM EPC oversubscription doesn't mean VM won't be  
killed.  The VM
can still be a target to kill after VM's all EPC pages have been swapped  
out.




On the other hand, my understanding the reason you want this behavior is
to enforce EPC limit at runtime.


No I just thought this is a bug/issue needs to be fixed.  If anyone  
believes

this is not a bug/issue then it's a separate discussion.



AFAIK, before we introducing max_write() callback in this series, no misc  
controller would possibly enforce the limit when misc.max is reduced. e.g.  
I don't think CVMs be killed when ASID limit is reduced and the cgroup was  
full before limit is reduced.


I think EPC pages to VMs could have the same behavior, once they are given  
to a guest, never taken back by the host. For enclaves on host side, pages  
are reclaimable, that allows us to enforce in a similar way to memcg.


Thanks
Haitao


Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-17 Thread Michal Suchánek
On Tue, Oct 17, 2023 at 09:05:29PM +0900, Masahiro Yamada wrote:
> On Tue, Oct 17, 2023 at 7:44 PM Michal Suchánek  wrote:
> >
> > On Tue, Oct 17, 2023 at 07:15:50PM +0900, Masahiro Yamada wrote:
> > > > >
> > > > > Let me add more context to my question.
> > > > >
> > > > >
> > > > > I am interested in the timing when
> > > > > 'pkg-config --print-variables kmod | grep module_directory'
> > > > > is executed.
> > > > >
> > > > >
> > > > >
> > > > > 1.  Build a SRPM on machine A
> > > > >
> > > > > 2.  Copy the SRPM from machine A to machine B
> > > > >
> > > > > 3.  Run rpmbuild on machine B to build the SRPM into a RPM
> > > > >
> > > > > 4.  Copy the RPM from machine B to machine C
> > > > >
> > > > > 5.  Install the RPM to machine C
> > > >
> > > > As far as I am aware the typical use case is two step:
> > > >
> > > > 1. run make rpm-pkg on machine A
> > > > 2. install the binary rpm on machine C that might not have build tools
> > > >or powerful enough CPU
> > > >
> > > > While it's theoretically possible to use the srpm to rebuild the binary
> > > > rpm independently of the kernel git tree I am not aware of people
> > > > commonly doing this.
> > >
> > >
> > >
> > > If I correctly understand commit
> > > 8818039f959b2efc0d6f2cb101f8061332f0c77e,
> > > those Redhat guys pack a SRPM on a local machine,
> > > then send it to their build server called 'koji'.
> > >
> > > Otherwise, there is no reason
> > > to have 'make srcrpm-pkg'.
> > >
> > >
> > >
> > > I believe "A == B" is not always true,
> > > but we can assume "distro(A) == distro(B)" is always met
> > > for simplicity.
> > >
> > > So, I am OK with configuration at the SRPM time.
> >
> > Even if the distro does not match it will likely work to configure SRPM
> > for non-matching distro and then build it on the target distro but I have
> > not tested it.
> 
> 
> 
> Your approach specifies %{MODLIB} as a fixed string
> when generating kernel.spec, i.e. at the SRPM time.
> 
> 
>  %files
>  %defattr (-, root, root)
> -/lib/modules/%{KERNELRELEASE}
> -%exclude /lib/modules/%{KERNELRELEASE}/build
> +%{MODLIB}
> +%exclude %{MODLIB}/build
>  /boot/*
> 
> 
> Then, how to change the path later?

Why would you need to change the path later?

The SRPM has sources, it does not need to build on the system on which
it is authored if it is intended for another distribution.

Of course, you would need to know for what distribution and where it
wants its modules so that you can specify the location when creating the
SRPM.

> > > > If rebuilding the source rpm on a different machine from where the git
> > > > tree is located, and possibly on a different distribution is desirable
> > > > then the detection of the KERNEL_MODULE_DIRECTORY should be added in the
> > > > rpm spec file as well.
> > > >
> > > > > Of course, we are most interested in the module path
> > > > > of machine C, but it is difficult/impossible to
> > > > > guess it at the time of building.
> > > > >
> > > > > We can assume machine B == machine C.
> > > > >
> > > > > We are the second most interested in the module
> > > > > path on machine B.
> > > > >
> > > > > The module path of machine A is not important.
> > > > >
> > > > > So, I am asking where you would inject
> > > > > 'pkg-config --print-variables kmod | grep module_directory'.
> > > >
> > > > I don't. I don't think there will be a separate machine B.
> > > >
> > > > And I can't really either - so far any attempt at adding support for
> > > > this has been rejected.
> > > >
> > > > Technically the KERNEL_MODULE_DIRECTORY could be set in two steps - one
> > > > giving the script to run, and one running it, and then it could be run
> > > > independently in the SRPM as well.
> > >
> > >
> > > At first, I thought your patch [1] was very ugly,
> > > but I do not think it is so ugly if cleanly implemented.
> > >
> > > It won't hurt to allow users to specify the middle part of MODLIB.
> > >
> > >
> > > There are two options.
> > >
> > >
> > > [A]  Add 'MOD_PREFIX' to specify the middle part of MODLIB
> > >
> > >
> > > The top Makefile will look as follows:
> > >
> > >
> > > MODLIB = $(INSTALL_MOD_PATH)$(MOD_PREFIX)/lib/modules/$(KERNELRELEASE)
> > > export MODLIB
> > >
> > >
> > > It is easier than specifying the entire MODLIB, but you still need
> > > to manually pass "MOD_PREFIX=/usr" from an env variable or
> > > the command line.
> > >
> > > If MOD_PREFIX is not given, MODLIB is the same as the current one.
> > >
> > > [B] Support a dynamic configuration as well
> > >
> > >
> > > MOD_PREFIX ?= $(shell pkg-config --variable=module_prefix libkmod 
> > > 2>/dev/null)
> > > export MOD_PREFIX
> > >
> > > MODLIB = $(INSTALL_MOD_PATH)$(MOD_PREFIX)/lib/modules/$(KERNELRELEASE)
> > > export MODLIB
> >
> > That's basically the same thing as the patch that has been rejected.
> >
> > I used := to prevent calling pkg-config every time MODLIB is used but it
> > might not be the most flexible wrt overrides.
> 
> That's good you care about the cost of 

Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-17 Thread Masahiro Yamada
On Tue, Oct 17, 2023 at 7:44 PM Michal Suchánek  wrote:
>
> On Tue, Oct 17, 2023 at 07:15:50PM +0900, Masahiro Yamada wrote:
> > > >
> > > > Let me add more context to my question.
> > > >
> > > >
> > > > I am interested in the timing when
> > > > 'pkg-config --print-variables kmod | grep module_directory'
> > > > is executed.
> > > >
> > > >
> > > >
> > > > 1.  Build a SRPM on machine A
> > > >
> > > > 2.  Copy the SRPM from machine A to machine B
> > > >
> > > > 3.  Run rpmbuild on machine B to build the SRPM into a RPM
> > > >
> > > > 4.  Copy the RPM from machine B to machine C
> > > >
> > > > 5.  Install the RPM to machine C
> > >
> > > As far as I am aware the typical use case is two step:
> > >
> > > 1. run make rpm-pkg on machine A
> > > 2. install the binary rpm on machine C that might not have build tools
> > >or powerful enough CPU
> > >
> > > While it's theoretically possible to use the srpm to rebuild the binary
> > > rpm independently of the kernel git tree I am not aware of people
> > > commonly doing this.
> >
> >
> >
> > If I correctly understand commit
> > 8818039f959b2efc0d6f2cb101f8061332f0c77e,
> > those Redhat guys pack a SRPM on a local machine,
> > then send it to their build server called 'koji'.
> >
> > Otherwise, there is no reason
> > to have 'make srcrpm-pkg'.
> >
> >
> >
> > I believe "A == B" is not always true,
> > but we can assume "distro(A) == distro(B)" is always met
> > for simplicity.
> >
> > So, I am OK with configuration at the SRPM time.
>
> Even if the distro does not match it will likely work to configure SRPM
> for non-matching distro and then build it on the target distro but I have
> not tested it.



Your approach specifies %{MODLIB} as a fixed string
when generating kernel.spec, i.e. at the SRPM time.


 %files
 %defattr (-, root, root)
-/lib/modules/%{KERNELRELEASE}
-%exclude /lib/modules/%{KERNELRELEASE}/build
+%{MODLIB}
+%exclude %{MODLIB}/build
 /boot/*


Then, how to change the path later?






I do not know if the relocatable package
is a sensible solution because the kernel package has /boot/

http://ftp.rpm.org/api/4.4.2.2/relocatable.html


We might be able to tweak installation paths in %post section.

Or perhaps, %{shell } can defer the module path detection
until building RPM.

%define MOD_PREFIX%{shell pkg-config --variable=module_prefix
libkmod 2>/dev/null}


Overall, I did not find a cool solution.



>
> > > If rebuilding the source rpm on a different machine from where the git
> > > tree is located, and possibly on a different distribution is desirable
> > > then the detection of the KERNEL_MODULE_DIRECTORY should be added in the
> > > rpm spec file as well.
> > >
> > > > Of course, we are most interested in the module path
> > > > of machine C, but it is difficult/impossible to
> > > > guess it at the time of building.
> > > >
> > > > We can assume machine B == machine C.
> > > >
> > > > We are the second most interested in the module
> > > > path on machine B.
> > > >
> > > > The module path of machine A is not important.
> > > >
> > > > So, I am asking where you would inject
> > > > 'pkg-config --print-variables kmod | grep module_directory'.
> > >
> > > I don't. I don't think there will be a separate machine B.
> > >
> > > And I can't really either - so far any attempt at adding support for
> > > this has been rejected.
> > >
> > > Technically the KERNEL_MODULE_DIRECTORY could be set in two steps - one
> > > giving the script to run, and one running it, and then it could be run
> > > independently in the SRPM as well.
> >
> >
> > At first, I thought your patch [1] was very ugly,
> > but I do not think it is so ugly if cleanly implemented.
> >
> > It won't hurt to allow users to specify the middle part of MODLIB.
> >
> >
> > There are two options.
> >
> >
> > [A]  Add 'MOD_PREFIX' to specify the middle part of MODLIB
> >
> >
> > The top Makefile will look as follows:
> >
> >
> > MODLIB = $(INSTALL_MOD_PATH)$(MOD_PREFIX)/lib/modules/$(KERNELRELEASE)
> > export MODLIB
> >
> >
> > It is easier than specifying the entire MODLIB, but you still need
> > to manually pass "MOD_PREFIX=/usr" from an env variable or
> > the command line.
> >
> > If MOD_PREFIX is not given, MODLIB is the same as the current one.
> >
> > [B] Support a dynamic configuration as well
> >
> >
> > MOD_PREFIX ?= $(shell pkg-config --variable=module_prefix libkmod 
> > 2>/dev/null)
> > export MOD_PREFIX
> >
> > MODLIB = $(INSTALL_MOD_PATH)$(MOD_PREFIX)/lib/modules/$(KERNELRELEASE)
> > export MODLIB
>
> That's basically the same thing as the patch that has been rejected.
>
> I used := to prevent calling pkg-config every time MODLIB is used but it
> might not be the most flexible wrt overrides.




That's good you care about the cost of $(shell ) invocations.

:= is evaluated one time at maximum, but one time at minimum.

$(shell ) is always invoked for non-build targets as
"make clean", "make help", etc.
That is what I care about.


?= is a recursive variable.

The 

Re: [PATCH v2 2/2] kbuild: unify no-compiler-targets and no-sync-config-targets

2023-10-17 Thread Nicolas Schier
On Sat, Oct 14, 2023 at 07:54:36PM +0900, Masahiro Yamada wrote:
> Now that vdso_install does not depend on any in-tree build artifact,
> it no longer needs a compiler, making no-compiler-targets the same
> as no-sync-config-targets.
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
> Changes in v2:
>   - Revive need-compiler flag
> 

Reviewed-by: Nicolas Schier 


Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-17 Thread Mikko Ylinen
On Mon, Oct 16, 2023 at 02:32:31PM -0700, Sean Christopherson wrote:
> Genuinely curious, who is asking for EPC cgroup support that *isn't* running 
> VMs?

People who work with containers: [1], [2]. 

> AFAIK, these days, SGX is primarily targeted at cloud.  I assume virtual EPC 
> is
> the primary use case for an EPC cgroup.

The common setup is that a cloud VM instance with vEPC is created and then
several SGX enclave containers are run simultaneously on that instance. EPC
cgroups is used to ensure that each container gets their own share of EPC
(and any attempts to go beyond the limit is reclaimed and charged from
the container's memcg). The same containers w/ enclaves use case is
applicable to baremetal also, though.

As far as Kubernetes orchestrated containers are concerned, "in-place" resource
scaling is still in very early stages which means that the cgroups values are
adjusted by *re-creating* the container. The hierarchies are also built
such that there's no mix of VMs w/ vEPC and enclaves in the same tree.

Mikko

[1] 
https://lore.kernel.org/linux-sgx/20221202183655.3767674-1-kris...@linux.intel.com/T/#m6d1c895534b4c0636f47c2d1620016b4c362bb9b
[2] 
https://lore.kernel.org/linux-sgx/20221202183655.3767674-1-kris...@linux.intel.com/T/#m37600e457b832feee6e8346aa74dcff8f21965f8


Re: [PATCH v2 5/6] ACPI: NFIT: Replace acpi_driver with platform_driver

2023-10-17 Thread Rafael J. Wysocki
On Tue, Oct 17, 2023 at 12:45 PM Wilczynski, Michal
 wrote:
>
>
> On 10/6/2023 7:30 PM, Michal Wilczynski wrote:
> > NFIT driver uses struct acpi_driver incorrectly to register itself.
> > This is wrong as the instances of the ACPI devices are not meant
> > to be literal devices, they're supposed to describe ACPI entry of a
> > particular device.
> >
> > Use platform_driver instead of acpi_driver. In relevant places call
> > platform devices instances pdev to make a distinction with ACPI
> > devices instances.
> >
> > NFIT driver uses devm_*() family of functions extensively. This change
> > has no impact on correct functioning of the whole devm_*() family of
> > functions, since the lifecycle of the device stays the same. It is still
> > being created during the enumeration, and destroyed on platform device
> > removal.
> >
> > Suggested-by: Rafael J. Wysocki 
> > Signed-off-by: Michal Wilczynski 
>
> Hi Dan,
> Rafael already reviewed this patch series and merged patches
> that he likes. We're waiting on your input regarding two NFIT
> commits in this series.

I actually haven't looked at the NFIT patches in this series myself
and this is not urgent at all IMV.

Thanks!



Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-17 Thread Michal Suchánek
On Tue, Oct 17, 2023 at 07:15:50PM +0900, Masahiro Yamada wrote:
> > >
> > > Let me add more context to my question.
> > >
> > >
> > > I am interested in the timing when
> > > 'pkg-config --print-variables kmod | grep module_directory'
> > > is executed.
> > >
> > >
> > >
> > > 1.  Build a SRPM on machine A
> > >
> > > 2.  Copy the SRPM from machine A to machine B
> > >
> > > 3.  Run rpmbuild on machine B to build the SRPM into a RPM
> > >
> > > 4.  Copy the RPM from machine B to machine C
> > >
> > > 5.  Install the RPM to machine C
> >
> > As far as I am aware the typical use case is two step:
> >
> > 1. run make rpm-pkg on machine A
> > 2. install the binary rpm on machine C that might not have build tools
> >or powerful enough CPU
> >
> > While it's theoretically possible to use the srpm to rebuild the binary
> > rpm independently of the kernel git tree I am not aware of people
> > commonly doing this.
> 
> 
> 
> If I correctly understand commit
> 8818039f959b2efc0d6f2cb101f8061332f0c77e,
> those Redhat guys pack a SRPM on a local machine,
> then send it to their build server called 'koji'.
> 
> Otherwise, there is no reason
> to have 'make srcrpm-pkg'.
> 
> 
> 
> I believe "A == B" is not always true,
> but we can assume "distro(A) == distro(B)" is always met
> for simplicity.
> 
> So, I am OK with configuration at the SRPM time.

Even if the distro does not match it will likely work to configure SRPM
for non-matching distro and then build it on the target distro but I have
not tested it.

> > If rebuilding the source rpm on a different machine from where the git
> > tree is located, and possibly on a different distribution is desirable
> > then the detection of the KERNEL_MODULE_DIRECTORY should be added in the
> > rpm spec file as well.
> >
> > > Of course, we are most interested in the module path
> > > of machine C, but it is difficult/impossible to
> > > guess it at the time of building.
> > >
> > > We can assume machine B == machine C.
> > >
> > > We are the second most interested in the module
> > > path on machine B.
> > >
> > > The module path of machine A is not important.
> > >
> > > So, I am asking where you would inject
> > > 'pkg-config --print-variables kmod | grep module_directory'.
> >
> > I don't. I don't think there will be a separate machine B.
> >
> > And I can't really either - so far any attempt at adding support for
> > this has been rejected.
> >
> > Technically the KERNEL_MODULE_DIRECTORY could be set in two steps - one
> > giving the script to run, and one running it, and then it could be run
> > independently in the SRPM as well.
> 
> 
> At first, I thought your patch [1] was very ugly,
> but I do not think it is so ugly if cleanly implemented.
> 
> It won't hurt to allow users to specify the middle part of MODLIB.
> 
> 
> There are two options.
> 
> 
> [A]  Add 'MOD_PREFIX' to specify the middle part of MODLIB
> 
> 
> The top Makefile will look as follows:
> 
> 
> MODLIB = $(INSTALL_MOD_PATH)$(MOD_PREFIX)/lib/modules/$(KERNELRELEASE)
> export MODLIB
> 
> 
> It is easier than specifying the entire MODLIB, but you still need
> to manually pass "MOD_PREFIX=/usr" from an env variable or
> the command line.
> 
> If MOD_PREFIX is not given, MODLIB is the same as the current one.
> 
> [B] Support a dynamic configuration as well
> 
> 
> MOD_PREFIX ?= $(shell pkg-config --variable=module_prefix libkmod 2>/dev/null)
> export MOD_PREFIX
> 
> MODLIB = $(INSTALL_MOD_PATH)$(MOD_PREFIX)/lib/modules/$(KERNELRELEASE)
> export MODLIB

That's basically the same thing as the patch that has been rejected.

I used := to prevent calling pkg-config every time MODLIB is used but it
might not be the most flexible wrt overrides.

> If MOD_PREFIX is given from an env variable or from the command line,
> it is respected.
> 
> If "pkg-config --variable=module_prefix libkmod" works,
> that configuration is applied.
> 
> Otherwise, MOD_PREFIX is empty, i.e. fall back to the current behavior.
> 
> 
> I prefer 'MOD_PREFIX' to 'KERNEL_MODULE_DIRECTORY' in your patch [1]
> because "|| echo /lib/modules" can be omitted.
> 
> I do not think we will have such a crazy distro that
> installs modules under /opt/ directory.

However, I can easily imagine a distribution that would want to put
modules in /usr/lib-amd64-linux/modules.

> I could not understand why you inserted
> "--print-variables kmod 2>/dev/null | grep '^module_directory$$' >/dev/null"
> but I guess the reason is the same.
> "pkg-config --variable=module_directory kmod" always succeeds,
> so "|| echo /lib/modules" is never processed.

Yes, that's the semantics of the tool. The jq version was slightly less
convoluted but required additional tool for building the kernel.

> I do not know why you parsed kmod.pc instead of libkmod.pc [2]

Because it's kmod property, not libkmod property.

Distributions would install libkmod.pc only with development files
whereas the kmod.pc should be installed with the binaries.

Thanks

Michal

> 
> 
> [1] 
> 

Re: [PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA

2023-10-17 Thread Catalin Marinas
On Mon, Oct 16, 2023 at 01:41:15PM +0100, Alexandru Elisei wrote:
> On Thu, Oct 12, 2023 at 10:25:11AM +0900, Hyesoo Yu wrote:
> > I don't think it would be effcient when the majority of movable pages
> > do not use GFP_TAGGED.
> > 
> > Metadata pages have a low probability of being in the pcp list
> > because metadata pages is bypassed when freeing pages.
> > 
> > The allocation performance of most movable pages is likely to decrease
> > if only the request with ALLOC_FROM_METADATA could be allocated.
> 
> You're right, I hadn't considered that.
> 
> > 
> > How about not including metadata pages in the pcp list at all ?
> 
> Sounds reasonable, I will keep it in mind for the next iteration of the
> series.

BTW, I suggest for the next iteration we drop MIGRATE_METADATA, only use
CMA and assume that the tag storage itself supports tagging. Hopefully
it makes the patches a bit simpler.

-- 
Catalin



Re: [PATCH 2/3] usb: typec: fsa4480: Add support to swap SBU orientation

2023-10-17 Thread Luca Weiss
On Tue Oct 17, 2023 at 11:01 AM CEST, Heikki Krogerus wrote:
> Hi Luca,
>
> On Fri, Oct 13, 2023 at 01:38:06PM +0200, Luca Weiss wrote:
> > On some hardware designs the AUX+/- lanes are connected reversed to
> > SBU1/2 compared to the expected design by FSA4480.
> > 
> > Made more complicated, the otherwise compatible Orient-Chip OCP96011
> > expects the lanes to be connected reversed compared to FSA4480.
> > 
> > * FSA4480 block diagram shows AUX+ connected to SBU2 and AUX- to SBU1.
> > * OCP96011 block diagram shows AUX+ connected to SBU1 and AUX- to SBU2.
> > 
> > So if OCP96011 is used as drop-in for FSA4480 then the orientation
> > handling in the driver needs to be reversed to match the expectation of
> > the OCP96011 hardware.
> > 
> > Support parsing the data-lanes parameter in the endpoint node to swap
> > this in the driver.
> > 
> > The parse_data_lanes_mapping function is mostly taken from nb7vpq904m.c.
> > 
> > Signed-off-by: Luca Weiss 
> > ---
> >  drivers/usb/typec/mux/fsa4480.c | 81 
> > +
> >  1 file changed, 81 insertions(+)
> > 
> > diff --git a/drivers/usb/typec/mux/fsa4480.c 
> > b/drivers/usb/typec/mux/fsa4480.c
> > index e0ee1f621abb..6ee467c96fb6 100644
> > --- a/drivers/usb/typec/mux/fsa4480.c
> > +++ b/drivers/usb/typec/mux/fsa4480.c
> > @@ -9,6 +9,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
>
> If you don't mind, let's keep this driver ready for ACPI, just in
> case...

I'm quite clueless about any details about ACPI but makes sense to use
the generic APIs.

>
> >  #include 
> >  #include 
> >  #include 
> > @@ -60,6 +61,7 @@ struct fsa4480 {
> > unsigned int svid;
> >  
> > u8 cur_enable;
> > +   bool swap_sbu_lanes;
> >  };
> >  
> >  static const struct regmap_config fsa4480_regmap_config = {
> > @@ -76,6 +78,9 @@ static int fsa4480_set(struct fsa4480 *fsa)
> > u8 enable = FSA4480_ENABLE_DEVICE;
> > u8 sel = 0;
> >  
> > +   if (fsa->swap_sbu_lanes)
> > +   reverse = !reverse;
> > +
> > /* USB Mode */
> > if (fsa->mode < TYPEC_STATE_MODAL ||
> > (!fsa->svid && (fsa->mode == TYPEC_MODE_USB2 ||
> > @@ -179,12 +184,84 @@ static int fsa4480_mux_set(struct typec_mux_dev *mux, 
> > struct typec_mux_state *st
> > return ret;
> >  }
> >  
> > +enum {
> > +   NORMAL_LANE_MAPPING,
> > +   INVERT_LANE_MAPPING,
> > +};
> > +
> > +#define DATA_LANES_COUNT   2
> > +
> > +static const int supported_data_lane_mapping[][DATA_LANES_COUNT] = {
> > +   [NORMAL_LANE_MAPPING] = { 0, 1 },
> > +   [INVERT_LANE_MAPPING] = { 1, 0 },
> > +};
> > +
> > +static int fsa4480_parse_data_lanes_mapping(struct fsa4480 *fsa)
> > +{
> > +   struct device_node *ep;
>
> struct fwnode_handle *ep;
>
> > +   u32 data_lanes[DATA_LANES_COUNT];
> > +   int ret, i, j;
> > +
> > +   ep = of_graph_get_next_endpoint(fsa->client->dev.of_node, NULL);
>
> Shouldn't you loop through the endpoints? In any case:
>
> ep = fwnode_graph_get_next_endpoint(dev_fwnode(>client->dev, 
> NULL));

The docs only mention one endpoint so I'm assuming just next_endpoint is
fine?

>
> > +   if (!ep)
> > +   return 0;
> > +
> > +   ret = of_property_count_u32_elems(ep, "data-lanes");
>
> ret = fwnode_property_count_u32(ep, "data-lanes");
>
> But is this necessary at all in this case - why not just read the
> array since you expect a fixed size for it (if the read fails it fails)?

Hm yeah that should be okay.. Will check the docs
of_property_read_u32_array (or well fwnode_property_read_u32_array) to
see if there's any gotchas if there's less or more elements provided.

Regards
Luca

>
> > +   if (ret == -EINVAL)
> > +   /* Property isn't here, consider default mapping */
> > +   goto out_done;
> > +   if (ret < 0)
> > +   goto out_error;
> > +
> > +   if (ret != DATA_LANES_COUNT) {
> > +   dev_err(>client->dev, "expected 2 data lanes\n");
> > +   ret = -EINVAL;
> > +   goto out_error;
> > +   }
> > +
> > +   ret = of_property_read_u32_array(ep, "data-lanes", data_lanes, 
> > DATA_LANES_COUNT);
>
> ret = fwnode_property_read_u32_array(ep, "data-lanes", data_lanes, 
> DATA_LANES_COUNT);
>
> > +   if (ret)
> > +   goto out_error;
> > +
> > +   for (i = 0; i < ARRAY_SIZE(supported_data_lane_mapping); i++) {
> > +   for (j = 0; j < DATA_LANES_COUNT; j++) {
> > +   if (data_lanes[j] != supported_data_lane_mapping[i][j])
> > +   break;
> > +   }
> > +
> > +   if (j == DATA_LANES_COUNT)
> > +   break;
> > +   }
> > +
> > +   switch (i) {
> > +   case NORMAL_LANE_MAPPING:
> > +   break;
> > +   case INVERT_LANE_MAPPING:
> > +   fsa->swap_sbu_lanes = true;
> > +   dev_info(>client->dev, "using inverted data lanes 
> > mapping\n");
>
> That is just noise. Please drop it.
>
> > +   break;
> > +   default:
> > +   dev_err(>client->dev, 

Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-17 Thread Masahiro Yamada
> >
> > Let me add more context to my question.
> >
> >
> > I am interested in the timing when
> > 'pkg-config --print-variables kmod | grep module_directory'
> > is executed.
> >
> >
> >
> > 1.  Build a SRPM on machine A
> >
> > 2.  Copy the SRPM from machine A to machine B
> >
> > 3.  Run rpmbuild on machine B to build the SRPM into a RPM
> >
> > 4.  Copy the RPM from machine B to machine C
> >
> > 5.  Install the RPM to machine C
>
> As far as I am aware the typical use case is two step:
>
> 1. run make rpm-pkg on machine A
> 2. install the binary rpm on machine C that might not have build tools
>or powerful enough CPU
>
> While it's theoretically possible to use the srpm to rebuild the binary
> rpm independently of the kernel git tree I am not aware of people
> commonly doing this.



If I correctly understand commit
8818039f959b2efc0d6f2cb101f8061332f0c77e,
those Redhat guys pack a SRPM on a local machine,
then send it to their build server called 'koji'.

Otherwise, there is no reason
to have 'make srcrpm-pkg'.



I believe "A == B" is not always true,
but we can assume "distro(A) == distro(B)" is always met
for simplicity.

So, I am OK with configuration at the SRPM time.





> If rebuilding the source rpm on a different machine from where the git
> tree is located, and possibly on a different distribution is desirable
> then the detection of the KERNEL_MODULE_DIRECTORY should be added in the
> rpm spec file as well.
>
> > Of course, we are most interested in the module path
> > of machine C, but it is difficult/impossible to
> > guess it at the time of building.
> >
> > We can assume machine B == machine C.
> >
> > We are the second most interested in the module
> > path on machine B.
> >
> > The module path of machine A is not important.
> >
> > So, I am asking where you would inject
> > 'pkg-config --print-variables kmod | grep module_directory'.
>
> I don't. I don't think there will be a separate machine B.
>
> And I can't really either - so far any attempt at adding support for
> this has been rejected.
>
> Technically the KERNEL_MODULE_DIRECTORY could be set in two steps - one
> giving the script to run, and one running it, and then it could be run
> independently in the SRPM as well.


At first, I thought your patch [1] was very ugly,
but I do not think it is so ugly if cleanly implemented.

It won't hurt to allow users to specify the middle part of MODLIB.


There are two options.




[A]  Add 'MOD_PREFIX' to specify the middle part of MODLIB


The top Makefile will look as follows:


MODLIB = $(INSTALL_MOD_PATH)$(MOD_PREFIX)/lib/modules/$(KERNELRELEASE)
export MODLIB


It is easier than specifying the entire MODLIB, but you still need
to manually pass "MOD_PREFIX=/usr" from an env variable or
the command line.

If MOD_PREFIX is not given, MODLIB is the same as the current one.




[B] Support a dynamic configuration as well



MOD_PREFIX ?= $(shell pkg-config --variable=module_prefix libkmod 2>/dev/null)
export MOD_PREFIX

MODLIB = $(INSTALL_MOD_PATH)$(MOD_PREFIX)/lib/modules/$(KERNELRELEASE)
export MODLIB




If MOD_PREFIX is given from an env variable or from the command line,
it is respected.

If "pkg-config --variable=module_prefix libkmod" works,
that configuration is applied.

Otherwise, MOD_PREFIX is empty, i.e. fall back to the current behavior.






I prefer 'MOD_PREFIX' to 'KERNEL_MODULE_DIRECTORY' in your patch [1]
because "|| echo /lib/modules" can be omitted.

I do not think we will have such a crazy distro that
installs modules under /opt/ directory.




I could not understand why you inserted
"--print-variables kmod 2>/dev/null | grep '^module_directory$$' >/dev/null"
but I guess the reason is the same.
"pkg-config --variable=module_directory kmod" always succeeds,
so "|| echo /lib/modules" is never processed.


I do not know why you parsed kmod.pc instead of libkmod.pc [2]



[1] https://lore.kernel.org/linux-kbuild/20230718120348.383-1-msucha...@suse.de/
[2] https://github.com/kmod-project/kmod/blob/v31/configure.ac#L295



--
Best Regards
Masahiro Yamada



Re: [PATCH] neighbor: tracing: Move pin6 inside CONFIG_IPV6=y section

2023-10-17 Thread Simon Horman
On Mon, Oct 16, 2023 at 02:49:04PM +0200, Geert Uytterhoeven wrote:
> When CONFIG_IPV6=n, and building with W=1:
> 
> In file included from include/trace/define_trace.h:102,
>from include/trace/events/neigh.h:255,
>from net/core/net-traces.c:51:
> include/trace/events/neigh.h: In function 
> ‘trace_event_raw_event_neigh_create’:
> include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not 
> used [-Werror=unused-but-set-variable]
>42 | struct in6_addr *pin6;
> |  ^~~~
> include/trace/trace_events.h:402:11: note: in definition of macro 
> ‘DECLARE_EVENT_CLASS’
>   402 | { assign; }   
>   \
> |   ^~
> include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
>44 |  PARAMS(assign),   \
> |  ^~
> include/trace/events/neigh.h:23:1: note: in expansion of macro 
> ‘TRACE_EVENT’
>23 | TRACE_EVENT(neigh_create,
> | ^~~
> include/trace/events/neigh.h:41:9: note: in expansion of macro 
> ‘TP_fast_assign’
>41 | TP_fast_assign(
> | ^~
> In file included from include/trace/define_trace.h:103,
>from include/trace/events/neigh.h:255,
>from net/core/net-traces.c:51:
> include/trace/events/neigh.h: In function ‘perf_trace_neigh_create’:
> include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not 
> used [-Werror=unused-but-set-variable]
>42 | struct in6_addr *pin6;
> |  ^~~~
> include/trace/perf.h:51:11: note: in definition of macro 
> ‘DECLARE_EVENT_CLASS’
>51 | { assign; }   
>   \
> |   ^~
> include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
>44 |  PARAMS(assign),   \
> |  ^~
> include/trace/events/neigh.h:23:1: note: in expansion of macro 
> ‘TRACE_EVENT’
>23 | TRACE_EVENT(neigh_create,
> | ^~~
> include/trace/events/neigh.h:41:9: note: in expansion of macro 
> ‘TP_fast_assign’
>41 | TP_fast_assign(
> | ^~
> 
> Indeed, the variable pin6 is declared and initialized unconditionally,
> while it is only used and needlessly re-initialized when support for
> IPv6 is enabled.
> 
> Fix this by dropping the unused variable initialization, and moving the
> variable declaration inside the existing section protected by a check
> for CONFIG_IPV6.
> 
> Fixes: fc651001d2c5ca4f ("neighbor: Add tracepoint to __neigh_create")
> Signed-off-by: Geert Uytterhoeven 

Reviewed-by: Simon Horman 
Tested-by: Simon Horman  # build-tested




Re: [PATCH v2] bus: mhi: host: Add tracing support

2023-10-17 Thread Krishna Chaitanya Chundru



On 10/16/2023 8:43 PM, Steven Rostedt wrote:

On Fri, 13 Oct 2023 15:22:19 +0530
Krishna chaitanya chundru  wrote:


+++ b/include/trace/events/mhi_host.h
@@ -0,0 +1,287 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mhi_host
+
+#if !defined(_TRACE_EVENT_MHI_HOST_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_EVENT_MHI_HOST_H
+
+#include 
+#define MHI_STATE  \
+   EM(MHI_STATE_RESET, "RESET")  \
+   EM(MHI_STATE_READY, "READY")  \
+   EM(MHI_STATE_M0,"M0") \
+   EM(MHI_STATE_M1,"M1") \
+   EM(MHI_STATE_M2,"M2") \
+   EM(MHI_STATE_M3,"M3") \
+   EM(MHI_STATE_M3_FAST,   "M3 FAST")\
+   EM(MHI_STATE_BHI,   "BHI")\
+   EMe(MHI_STATE_SYS_ERR,  "SYS ERROR")
+
+#define MHI_EE \
+   EM(MHI_EE_PBL,  "PRIMARY BOOTLOADER") \
+   EM(MHI_EE_SBL,  "SECONDARY BOOTLOADER")   \
+   EM(MHI_EE_AMSS, "MISSION MODE")   \
+   EM(MHI_EE_RDDM, "RAMDUMP DOWNLOAD MODE")  \
+   EM(MHI_EE_WFW,  "WLAN FIRMWARE")  \
+   EM(MHI_EE_PTHRU,"PASS THROUGH")   \
+   EM(MHI_EE_EDL,  "EMERGENCY DOWNLOAD") \
+   EM(MHI_EE_FP,   "FLASH PROGRAMMER")   \
+   EM(MHI_EE_DISABLE_TRANSITION,   "DISABLE")\
+   EMe(MHI_EE_NOT_SUPPORTED,   "NOT SUPPORTED")
+
+#define MHI_PM_STATE   \
+   EM(MHI_PM_STATE_DISABLE,"DISABLE")\
+   EM(MHI_PM_STATE_POR,"POWER ON RESET") \
+   EM(MHI_PM_STATE_M0, "M0") \
+   EM(MHI_PM_STATE_M2, "M2") \
+   EM(MHI_PM_STATE_M3_ENTER,   "M?->M3")  \
+   EM(MHI_PM_STATE_M3, "M3") \
+   EM(MHI_PM_STATE_M3_EXIT,"M3->M0")  \
+   EM(MHI_PM_STATE_FW_DL_ERR,  "Firmware Download Error")\
+   EM(MHI_PM_STATE_SYS_ERR_DETECT, "SYS ERROR Detect")   \
+   EM(MHI_PM_STATE_SYS_ERR_PROCESS,"SYS ERROR Process")  \
+   EM(MHI_PM_STATE_SHUTDOWN_PROCESS,   "SHUTDOWN Process")   \
+   EMe(MHI_PM_STATE_LD_ERR_FATAL_DETECT,   "Linkdown or Error Fatal 
Detect")
+
+#define MHI_CH_STATE   \
+   EM(MHI_CH_STATE_TYPE_RESET, "RESET")  \
+   EM(MHI_CH_STATE_TYPE_STOP, "STOP")\
+   EMe(MHI_CH_STATE_TYPE_START, "START")
+
+#define MHI_DEV_ST_TRANSITION  \
+   EM(DEV_ST_TRANSITION_PBL,   "PBL")\
+   EM(DEV_ST_TRANSITION_READY, "READY")  \
+   EM(DEV_ST_TRANSITION_SBL,   "SBL")\
+   EM(DEV_ST_TRANSITION_MISSION_MODE,  "MISSION MODE")   \
+   EM(DEV_ST_TRANSITION_FP,"FLASH PROGRAMMER")   \
+   EM(DEV_ST_TRANSITION_SYS_ERR,   "SYS ERROR")  \
+   EMe(DEV_ST_TRANSITION_DISABLE,  "DISABLE")
+
+/* Enums require being exported to userspace, for user tool parsing */
+#undef EM
+#undef EMe
+#defineEM(a, b)TRACE_DEFINE_ENUM(a);
+#defineEMe(a, b)   TRACE_DEFINE_ENUM(a);
+
+MHI_STATE
+MHI_EE
+MHI_PM_STATE
+MHI_CH_STATE
+MHI_DEV_ST_TRANSITION
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b)   {a, b},
+#define EMe(a, b)  {a, b}
+
+TRACE_EVENT(mhi_gen_tre,
+
+   TP_PROTO(const char *name, int ch_num, u64 wp, __le64 tre_ptr,
+__le32 dword0, __le32 dword1),
+
+   TP_ARGS(name, ch_num, wp, tre_ptr, dword0, dword1),
+
+   TP_STRUCT__entry(
+   __string(name, name)
+   __field(int, ch_num)

This is fine as __string() is four bytes in the event (2 bytes for offset
where the string exists, and 2 bytes for its length).


+   __field(u64, wp)
+   __field(__le64, tre_ptr)

Which makes the two 8 byte fields aligned. Good!


+   __field(__le32, dword0)
+   __field(__le32, dword1)
+   ),
+
+   TP_fast_assign(
+   __assign_str(name, name);
+   __entry->ch_num = ch_num;
+   __entry->wp = wp;
+   __entry->tre_ptr = tre_ptr;
+   __entry->dword0 = dword0;

Re: [PATCH 2/3] usb: typec: fsa4480: Add support to swap SBU orientation

2023-10-17 Thread Heikki Krogerus
Hi Luca,

On Fri, Oct 13, 2023 at 01:38:06PM +0200, Luca Weiss wrote:
> On some hardware designs the AUX+/- lanes are connected reversed to
> SBU1/2 compared to the expected design by FSA4480.
> 
> Made more complicated, the otherwise compatible Orient-Chip OCP96011
> expects the lanes to be connected reversed compared to FSA4480.
> 
> * FSA4480 block diagram shows AUX+ connected to SBU2 and AUX- to SBU1.
> * OCP96011 block diagram shows AUX+ connected to SBU1 and AUX- to SBU2.
> 
> So if OCP96011 is used as drop-in for FSA4480 then the orientation
> handling in the driver needs to be reversed to match the expectation of
> the OCP96011 hardware.
> 
> Support parsing the data-lanes parameter in the endpoint node to swap
> this in the driver.
> 
> The parse_data_lanes_mapping function is mostly taken from nb7vpq904m.c.
> 
> Signed-off-by: Luca Weiss 
> ---
>  drivers/usb/typec/mux/fsa4480.c | 81 
> +
>  1 file changed, 81 insertions(+)
> 
> diff --git a/drivers/usb/typec/mux/fsa4480.c b/drivers/usb/typec/mux/fsa4480.c
> index e0ee1f621abb..6ee467c96fb6 100644
> --- a/drivers/usb/typec/mux/fsa4480.c
> +++ b/drivers/usb/typec/mux/fsa4480.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

If you don't mind, let's keep this driver ready for ACPI, just in
case...

>  #include 
>  #include 
>  #include 
> @@ -60,6 +61,7 @@ struct fsa4480 {
>   unsigned int svid;
>  
>   u8 cur_enable;
> + bool swap_sbu_lanes;
>  };
>  
>  static const struct regmap_config fsa4480_regmap_config = {
> @@ -76,6 +78,9 @@ static int fsa4480_set(struct fsa4480 *fsa)
>   u8 enable = FSA4480_ENABLE_DEVICE;
>   u8 sel = 0;
>  
> + if (fsa->swap_sbu_lanes)
> + reverse = !reverse;
> +
>   /* USB Mode */
>   if (fsa->mode < TYPEC_STATE_MODAL ||
>   (!fsa->svid && (fsa->mode == TYPEC_MODE_USB2 ||
> @@ -179,12 +184,84 @@ static int fsa4480_mux_set(struct typec_mux_dev *mux, 
> struct typec_mux_state *st
>   return ret;
>  }
>  
> +enum {
> + NORMAL_LANE_MAPPING,
> + INVERT_LANE_MAPPING,
> +};
> +
> +#define DATA_LANES_COUNT 2
> +
> +static const int supported_data_lane_mapping[][DATA_LANES_COUNT] = {
> + [NORMAL_LANE_MAPPING] = { 0, 1 },
> + [INVERT_LANE_MAPPING] = { 1, 0 },
> +};
> +
> +static int fsa4480_parse_data_lanes_mapping(struct fsa4480 *fsa)
> +{
> + struct device_node *ep;

struct fwnode_handle *ep;

> + u32 data_lanes[DATA_LANES_COUNT];
> + int ret, i, j;
> +
> + ep = of_graph_get_next_endpoint(fsa->client->dev.of_node, NULL);

Shouldn't you loop through the endpoints? In any case:

ep = fwnode_graph_get_next_endpoint(dev_fwnode(>client->dev, 
NULL));

> + if (!ep)
> + return 0;
> +
> + ret = of_property_count_u32_elems(ep, "data-lanes");

ret = fwnode_property_count_u32(ep, "data-lanes");

But is this necessary at all in this case - why not just read the
array since you expect a fixed size for it (if the read fails it fails)?

> + if (ret == -EINVAL)
> + /* Property isn't here, consider default mapping */
> + goto out_done;
> + if (ret < 0)
> + goto out_error;
> +
> + if (ret != DATA_LANES_COUNT) {
> + dev_err(>client->dev, "expected 2 data lanes\n");
> + ret = -EINVAL;
> + goto out_error;
> + }
> +
> + ret = of_property_read_u32_array(ep, "data-lanes", data_lanes, 
> DATA_LANES_COUNT);

ret = fwnode_property_read_u32_array(ep, "data-lanes", data_lanes, 
DATA_LANES_COUNT);

> + if (ret)
> + goto out_error;
> +
> + for (i = 0; i < ARRAY_SIZE(supported_data_lane_mapping); i++) {
> + for (j = 0; j < DATA_LANES_COUNT; j++) {
> + if (data_lanes[j] != supported_data_lane_mapping[i][j])
> + break;
> + }
> +
> + if (j == DATA_LANES_COUNT)
> + break;
> + }
> +
> + switch (i) {
> + case NORMAL_LANE_MAPPING:
> + break;
> + case INVERT_LANE_MAPPING:
> + fsa->swap_sbu_lanes = true;
> + dev_info(>client->dev, "using inverted data lanes 
> mapping\n");

That is just noise. Please drop it.

> + break;
> + default:
> + dev_err(>client->dev, "invalid data lanes mapping\n");
> + ret = -EINVAL;
> + goto out_error;
> + }
> +
> +out_done:
> + ret = 0;
> +
> +out_error:
> + of_node_put(ep);
> +
> + return ret;
> +}
> +
>  static int fsa4480_probe(struct i2c_client *client)
>  {
>   struct device *dev = >dev;
>   struct typec_switch_desc sw_desc = { };
>   struct typec_mux_desc mux_desc = { };
>   struct fsa4480 *fsa;
> + int ret;
>  
>   fsa = devm_kzalloc(dev, sizeof(*fsa), GFP_KERNEL);
>   if (!fsa)
> @@ -193,6 +270,10 @@ static int fsa4480_probe(struct i2c_client *client)
>   fsa->client 

Re: [PATCH v2 5/6] ACPI: NFIT: Replace acpi_driver with platform_driver

2023-10-17 Thread Wilczynski, Michal


On 10/6/2023 7:30 PM, Michal Wilczynski wrote:
> NFIT driver uses struct acpi_driver incorrectly to register itself.
> This is wrong as the instances of the ACPI devices are not meant
> to be literal devices, they're supposed to describe ACPI entry of a
> particular device.
>
> Use platform_driver instead of acpi_driver. In relevant places call
> platform devices instances pdev to make a distinction with ACPI
> devices instances.
>
> NFIT driver uses devm_*() family of functions extensively. This change
> has no impact on correct functioning of the whole devm_*() family of
> functions, since the lifecycle of the device stays the same. It is still
> being created during the enumeration, and destroyed on platform device
> removal.
>
> Suggested-by: Rafael J. Wysocki 
> Signed-off-by: Michal Wilczynski 

Hi Dan,
Rafael already reviewed this patch series and merged patches
that he likes. We're waiting on your input regarding two NFIT
commits in this series.

Thanks a lot !
Michał

> ---
>  drivers/acpi/nfit/core.c | 34 ++
>  1 file changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 942b84d94078..fb0bc16fa186 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include "intel.h"
> @@ -98,7 +99,7 @@ static struct acpi_device *to_acpi_dev(struct 
> acpi_nfit_desc *acpi_desc)
>   || strcmp(nd_desc->provider_name, "ACPI.NFIT") != 0)
>   return NULL;
>  
> - return to_acpi_device(acpi_desc->dev);
> + return ACPI_COMPANION(acpi_desc->dev);
>  }
>  
>  static int xlat_bus_status(void *buf, unsigned int cmd, u32 status)
> @@ -3284,11 +3285,11 @@ static void acpi_nfit_put_table(void *table)
>  
>  static void acpi_nfit_notify(acpi_handle handle, u32 event, void *data)
>  {
> - struct acpi_device *adev = data;
> + struct device *dev = data;
>  
> - device_lock(>dev);
> - __acpi_nfit_notify(>dev, handle, event);
> - device_unlock(>dev);
> + device_lock(dev);
> + __acpi_nfit_notify(dev, handle, event);
> + device_unlock(dev);
>  }
>  
>  static void acpi_nfit_remove_notify_handler(void *data)
> @@ -3329,11 +3330,12 @@ void acpi_nfit_shutdown(void *data)
>  }
>  EXPORT_SYMBOL_GPL(acpi_nfit_shutdown);
>  
> -static int acpi_nfit_add(struct acpi_device *adev)
> +static int acpi_nfit_probe(struct platform_device *pdev)
>  {
>   struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
>   struct acpi_nfit_desc *acpi_desc;
> - struct device *dev = >dev;
> + struct device *dev = >dev;
> + struct acpi_device *adev = ACPI_COMPANION(dev);
>   struct acpi_table_header *tbl;
>   acpi_status status = AE_OK;
>   acpi_size sz;
> @@ -3360,7 +3362,7 @@ static int acpi_nfit_add(struct acpi_device *adev)
>   acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
>   if (!acpi_desc)
>   return -ENOMEM;
> - acpi_nfit_desc_init(acpi_desc, >dev);
> + acpi_nfit_desc_init(acpi_desc, dev);
>  
>   /* Save the acpi header for exporting the revision via sysfs */
>   acpi_desc->acpi_header = *tbl;
> @@ -3391,7 +3393,7 @@ static int acpi_nfit_add(struct acpi_device *adev)
>   return rc;
>  
>   rc = acpi_dev_install_notify_handler(adev, ACPI_DEVICE_NOTIFY,
> -  acpi_nfit_notify, adev);
> +  acpi_nfit_notify, dev);
>   if (rc)
>   return rc;
>  
> @@ -3475,11 +3477,11 @@ static const struct acpi_device_id acpi_nfit_ids[] = {
>  };
>  MODULE_DEVICE_TABLE(acpi, acpi_nfit_ids);
>  
> -static struct acpi_driver acpi_nfit_driver = {
> - .name = KBUILD_MODNAME,
> - .ids = acpi_nfit_ids,
> - .ops = {
> - .add = acpi_nfit_add,
> +static struct platform_driver acpi_nfit_driver = {
> + .probe = acpi_nfit_probe,
> + .driver = {
> + .name = KBUILD_MODNAME,
> + .acpi_match_table = acpi_nfit_ids,
>   },
>  };
>  
> @@ -3517,7 +3519,7 @@ static __init int nfit_init(void)
>   return -ENOMEM;
>  
>   nfit_mce_register();
> - ret = acpi_bus_register_driver(_nfit_driver);
> + ret = platform_driver_register(_nfit_driver);
>   if (ret) {
>   nfit_mce_unregister();
>   destroy_workqueue(nfit_wq);
> @@ -3530,7 +3532,7 @@ static __init int nfit_init(void)
>  static __exit void nfit_exit(void)
>  {
>   nfit_mce_unregister();
> - acpi_bus_unregister_driver(_nfit_driver);
> + platform_driver_unregister(_nfit_driver);
>   destroy_workqueue(nfit_wq);
>   WARN_ON(!list_empty(_descs));
>  }




Re: [PATCH 2/2] usb: typec: add support for PTN36502 redriver

2023-10-17 Thread Heikki Krogerus
Hi,

On Fri, Oct 13, 2023 at 04:24:48PM +0200, Luca Weiss wrote:
> Add a driver for the NXP PTN36502 Type-C USB 3.1 Gen 1 and DisplayPort
> v1.2 combo redriver.
> 
> Signed-off-by: Luca Weiss 

Looks OK to me, but couple of nitpicks below. With those fixed:

Reviewed-by: Heikki Krogerus 

> ---
>  drivers/usb/typec/mux/Kconfig|  10 +
>  drivers/usb/typec/mux/Makefile   |   1 +
>  drivers/usb/typec/mux/ptn36502.c | 421 
> +++
>  3 files changed, 432 insertions(+)
> 
> diff --git a/drivers/usb/typec/mux/Kconfig b/drivers/usb/typec/mux/Kconfig
> index 65da61150ba7..816b9bd08355 100644
> --- a/drivers/usb/typec/mux/Kconfig
> +++ b/drivers/usb/typec/mux/Kconfig
> @@ -46,4 +46,14 @@ config TYPEC_MUX_NB7VPQ904M
> Say Y or M if your system has a On Semiconductor NB7VPQ904M Type-C
> redriver chip found on some devices with a Type-C port.
>  
> +config TYPEC_MUX_PTN36502
> + tristate "NXP PTN36502 Type-C redriver driver"
> + depends on I2C
> + depends on DRM || DRM=n
> + select DRM_PANEL_BRIDGE if DRM
> + select REGMAP_I2C
> + help
> +   Say Y or M if your system has a NXP PTN36502 Type-C redriver chip
> +   found on some devices with a Type-C port.
> +
>  endmenu
> diff --git a/drivers/usb/typec/mux/Makefile b/drivers/usb/typec/mux/Makefile
> index 76196096ef41..9d6a5557b0bd 100644
> --- a/drivers/usb/typec/mux/Makefile
> +++ b/drivers/usb/typec/mux/Makefile
> @@ -5,3 +5,4 @@ obj-$(CONFIG_TYPEC_MUX_GPIO_SBU)  += gpio-sbu-mux.o
>  obj-$(CONFIG_TYPEC_MUX_PI3USB30532)  += pi3usb30532.o
>  obj-$(CONFIG_TYPEC_MUX_INTEL_PMC)+= intel_pmc_mux.o
>  obj-$(CONFIG_TYPEC_MUX_NB7VPQ904M)   += nb7vpq904m.o
> +obj-$(CONFIG_TYPEC_MUX_PTN36502) += ptn36502.o
> diff --git a/drivers/usb/typec/mux/ptn36502.c 
> b/drivers/usb/typec/mux/ptn36502.c
> new file mode 100644
> index ..91684a856f3a
> --- /dev/null
> +++ b/drivers/usb/typec/mux/ptn36502.c
> @@ -0,0 +1,421 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * NXP PTN36502 Type-C driver
> + *
> + * Copyright (C) 2023 Luca Weiss 
> + *
> + * Based on NB7VPQ904M driver:
> + * Copyright (C) 2023 Dmitry Baryshkov 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define PTN36502_CHIP_ID_REG 0x00
> +#define PTN36502_CHIP_ID 0x02
> +
> +#define PTN36502_CHIP_REVISION_REG   0x01
> +#define PTN36502_CHIP_REVISION_BASE(val) FIELD_GET(GENMASK(7, 
> 4), (val))
> +#define PTN36502_CHIP_REVISION_METAL(val)FIELD_GET(GENMASK(3, 
> 0), (val))
> +
> +#define PTN36502_DP_LINK_CTRL_REG0x06
> +#define PTN36502_DP_LINK_CTRL_LANES_2(2 << 2)
> +#define PTN36502_DP_LINK_CTRL_LANES_4(3 << 2)
> +#define PTN36502_DP_LINK_CTRL_LINK_RATE_5_4GBPS  (2 << 0)
> +
> +/* Registers for lane 0 (0x07) to lane 3 (0x0a) have the same layout */
> +#define PTN36502_DP_LANE_CTRL_REG(n) (0x07 + (n))
> +#define PTN36502_DP_LANE_CTRL_RX_GAIN_3DB(2<<4)
> +#define PTN36502_DP_LANE_CTRL_TX_SWING_800MVPPD  (2<<2)
> +#define PTN36502_DP_LANE_CTRL_PRE_EMPHASIS_3_5DB (1<<0)
> +
> +#define PTN36502_MODE_CTRL1_REG  0x0b
> +#define PTN36502_MODE_CTRL1_PLUG_ORIENT_REVERSE  (1<<5)
> +#define PTN36502_MODE_CTRL1_AUX_CROSSBAR_SW_ON   (1<<3)
> +#define PTN36502_MODE_CTRL1_MODE_OFF (0<<0)
> +#define PTN36502_MODE_CTRL1_MODE_USB_ONLY(1<<0)
> +#define PTN36502_MODE_CTRL1_MODE_USB_DP  (2<<0)
> +#define PTN36502_MODE_CTRL1_MODE_DP  (3<<0)
> +
> +#define PTN36502_DEVICE_CTRL_REG 0x0d
> +#define PTN36502_DEVICE_CTRL_AUX_MONITORING_EN   (1<<7)

You have couple of different styles here. Please try to always use
BIT() and GENMASK() macros when possible. At the very least put spaces
around << and >>.

> +struct ptn36502 {
> + struct i2c_client *client;
> + struct regulator *vdd18_supply;
> + struct regmap *regmap;
> + struct typec_switch_dev *sw;
> + struct typec_retimer *retimer;
> +
> + struct typec_switch *typec_switch;
> +
> + struct drm_bridge bridge;
> +
> + struct mutex lock; /* protect non-concurrent retimer & switch */
> +
> + enum typec_orientation orientation;
> + unsigned long mode;
> + unsigned int svid;
> +};
> +
> +static int ptn36502_set(struct ptn36502 *ptn)
> +{
> + bool reverse = (ptn->orientation == TYPEC_ORIENTATION_REVERSE);
> + unsigned int ctrl1_val = 0;
> + unsigned int lane_ctrl_val = 0;
> + unsigned int link_ctrl_val = 0;
> +
> + switch (ptn->mode) {
> + case TYPEC_STATE_SAFE:
> + /* Deep power saving state */
> + regmap_write(ptn->regmap, 

[PATCH v3] ACPI: NFIT: Use cleanup.h helpers instead of devm_*()

2023-10-17 Thread Michal Wilczynski
The new cleanup.h facilities that arrived in v6.5-rc1 can replace the
the usage of devm semantics in acpi_nfit_init_interleave_set(). That
routine appears to only be using devm to avoid goto statements. The
new __free() annotation at variable declaration time can achieve the same
effect more efficiently.

There is no end user visible side effects of this patch, I was
motivated to send this cleanup to practice using the new helpers.

Suggested-by: Dave Jiang 
Suggested-by: Andy Shevchenko 
Reviewed-by: Dave Jiang 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Michal Wilczynski 
---

Dan, would you like me to give you credit for the changelog changes
with Co-developed-by tag ?

v3:
 - changed changelog
v2:
 - removed first commit from the patchset, as the commit couldn't
   be marked as a fix
 - squashed those commits together, since the second one were
   mostly overwriting the previous one

 drivers/acpi/nfit/core.c | 21 -
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 3826f49d481b..67a844a705c4 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2257,26 +2257,23 @@ static int acpi_nfit_init_interleave_set(struct 
acpi_nfit_desc *acpi_desc,
struct nd_region_desc *ndr_desc,
struct acpi_nfit_system_address *spa)
 {
+   u16 nr = ndr_desc->num_mappings;
+   struct nfit_set_info2 *info2 __free(kfree) =
+   kcalloc(nr, sizeof(*info2), GFP_KERNEL);
+   struct nfit_set_info *info __free(kfree) =
+   kcalloc(nr, sizeof(*info), GFP_KERNEL);
struct device *dev = acpi_desc->dev;
struct nd_interleave_set *nd_set;
-   u16 nr = ndr_desc->num_mappings;
-   struct nfit_set_info2 *info2;
-   struct nfit_set_info *info;
int i;
 
+   if (!info || !info2)
+   return -ENOMEM;
+
nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL);
if (!nd_set)
return -ENOMEM;
import_guid(_set->type_guid, spa->range_guid);
 
-   info = devm_kcalloc(dev, nr, sizeof(*info), GFP_KERNEL);
-   if (!info)
-   return -ENOMEM;
-
-   info2 = devm_kcalloc(dev, nr, sizeof(*info2), GFP_KERNEL);
-   if (!info2)
-   return -ENOMEM;
-
for (i = 0; i < nr; i++) {
struct nd_mapping_desc *mapping = _desc->mapping[i];
struct nvdimm *nvdimm = mapping->nvdimm;
@@ -2337,8 +2334,6 @@ static int acpi_nfit_init_interleave_set(struct 
acpi_nfit_desc *acpi_desc,
}
 
ndr_desc->nd_set = nd_set;
-   devm_kfree(dev, info);
-   devm_kfree(dev, info2);
 
return 0;
 }
-- 
2.41.0




Re: [PATCH v2] bus: mhi: host: Add tracing support

2023-10-17 Thread kernel test robot
Hi Krishna,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 3006adf3be79cde4d14b1800b963b82b6e5572e0]

url:
https://github.com/intel-lab-lkp/linux/commits/Krishna-chaitanya-chundru/bus-mhi-host-Add-tracing-support/20231017-120241
base:   3006adf3be79cde4d14b1800b963b82b6e5572e0
patch link:
https://lore.kernel.org/r/20231013-ftrace_support-v2-1-6e893ce010b5%40quicinc.com
patch subject: [PATCH v2] bus: mhi: host: Add tracing support
config: m68k-allyesconfig 
(https://download.01.org/0day-ci/archive/20231017/202310171556.hmpbsvrx-...@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20231017/202310171556.hmpbsvrx-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202310171556.hmpbsvrx-...@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/bus/mhi/host/main.c:249:12: warning: no previous prototype for 
>> 'mhi_to_physical' [-Wmissing-prototypes]
 249 | dma_addr_t mhi_to_physical(struct mhi_ring *ring, void *addr)
 |^~~


vim +/mhi_to_physical +249 drivers/bus/mhi/host/main.c

   248  
 > 249  dma_addr_t mhi_to_physical(struct mhi_ring *ring, void *addr)
   250  {
   251  return (addr - ring->base) + ring->iommu_base;
   252  }
   253  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



[PATCH v2] soc: qcom: pmic_glink_altmode: Print return value on error

2023-10-17 Thread Luca Weiss
It can be useful to know with which return value for example the
typec_retimer_set call failed, so include this info in the dev_err
prints.

Signed-off-by: Luca Weiss 
---
Changes in v2:
- Add ret to print in more prints, not just typec_retimer_set (Bjorn)
- Link to v1: 
https://lore.kernel.org/r/20231013-glink-altmode-ret-v1-1-77941537a...@fairphone.com
---
 drivers/soc/qcom/pmic_glink_altmode.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/soc/qcom/pmic_glink_altmode.c 
b/drivers/soc/qcom/pmic_glink_altmode.c
index 9569d999391d..0a75bfac5e02 100644
--- a/drivers/soc/qcom/pmic_glink_altmode.c
+++ b/drivers/soc/qcom/pmic_glink_altmode.c
@@ -160,7 +160,7 @@ static void pmic_glink_altmode_enable_dp(struct 
pmic_glink_altmode *altmode,
 
ret = typec_mux_set(port->typec_mux, >state);
if (ret)
-   dev_err(altmode->dev, "failed to switch mux to DP\n");
+   dev_err(altmode->dev, "failed to switch mux to DP: %d\n", ret);
 
port->retimer_state.alt = >dp_alt;
port->retimer_state.data = _data;
@@ -168,7 +168,7 @@ static void pmic_glink_altmode_enable_dp(struct 
pmic_glink_altmode *altmode,
 
ret = typec_retimer_set(port->typec_retimer, >retimer_state);
if (ret)
-   dev_err(altmode->dev, "failed to setup retimer to DP\n");
+   dev_err(altmode->dev, "failed to setup retimer to DP: %d\n", 
ret);
 }
 
 static void pmic_glink_altmode_enable_usb(struct pmic_glink_altmode *altmode,
@@ -182,7 +182,7 @@ static void pmic_glink_altmode_enable_usb(struct 
pmic_glink_altmode *altmode,
 
ret = typec_mux_set(port->typec_mux, >state);
if (ret)
-   dev_err(altmode->dev, "failed to switch mux to USB\n");
+   dev_err(altmode->dev, "failed to switch mux to USB: %d\n", ret);
 
port->retimer_state.alt = NULL;
port->retimer_state.data = NULL;
@@ -190,7 +190,7 @@ static void pmic_glink_altmode_enable_usb(struct 
pmic_glink_altmode *altmode,
 
ret = typec_retimer_set(port->typec_retimer, >retimer_state);
if (ret)
-   dev_err(altmode->dev, "failed to setup retimer to USB\n");
+   dev_err(altmode->dev, "failed to setup retimer to USB: %d\n", 
ret);
 }
 
 static void pmic_glink_altmode_safe(struct pmic_glink_altmode *altmode,
@@ -204,7 +204,7 @@ static void pmic_glink_altmode_safe(struct 
pmic_glink_altmode *altmode,
 
ret = typec_mux_set(port->typec_mux, >state);
if (ret)
-   dev_err(altmode->dev, "failed to switch mux to safe mode\n");
+   dev_err(altmode->dev, "failed to switch mux to safe mode: 
%d\n", ret);
 
port->retimer_state.alt = NULL;
port->retimer_state.data = NULL;
@@ -212,7 +212,7 @@ static void pmic_glink_altmode_safe(struct 
pmic_glink_altmode *altmode,
 
ret = typec_retimer_set(port->typec_retimer, >retimer_state);
if (ret)
-   dev_err(altmode->dev, "failed to setup retimer to USB\n");
+   dev_err(altmode->dev, "failed to setup retimer to USB: %d\n", 
ret);
 }
 
 static void pmic_glink_altmode_worker(struct work_struct *work)
@@ -397,7 +397,7 @@ static void pmic_glink_altmode_enable_worker(struct 
work_struct *work)
 
ret = pmic_glink_altmode_request(altmode, ALTMODE_PAN_EN, 0);
if (ret)
-   dev_err(altmode->dev, "failed to request altmode 
notifications\n");
+   dev_err(altmode->dev, "failed to request altmode notifications: 
%d\n", ret);
 }
 
 static void pmic_glink_altmode_pdr_notify(void *priv, int state)

---
base-commit: e3b18f7200f45d66f7141136c25554ac1e82009b
change-id: 20231013-glink-altmode-ret-3911e6c1eab5

Best regards,
-- 
Luca Weiss 



Re: [PATCH v6 2/3] mm/memory_hotplug: split memmap_on_memory requests across memblocks

2023-10-17 Thread Huang, Ying
Vishal Verma  writes:

> The MHP_MEMMAP_ON_MEMORY flag for hotplugged memory is restricted to
> 'memblock_size' chunks of memory being added. Adding a larger span of
> memory precludes memmap_on_memory semantics.
>
> For users of hotplug such as kmem, large amounts of memory might get
> added from the CXL subsystem. In some cases, this amount may exceed the
> available 'main memory' to store the memmap for the memory being added.
> In this case, it is useful to have a way to place the memmap on the
> memory being added, even if it means splitting the addition into
> memblock-sized chunks.
>
> Change add_memory_resource() to loop over memblock-sized chunks of
> memory if caller requested memmap_on_memory, and if other conditions for
> it are met. Teach try_remove_memory() to also expect that a memory
> range being removed might have been split up into memblock sized chunks,
> and to loop through those as needed.
>
> This does preclude being able to use PUD mappings in the direct map; a
> proposal to how this could be optimized in the future is laid out
> here[1].
>
> [1]: 
> https://lore.kernel.org/linux-mm/b6753402-2de9-25b2-36e9-eacd49752...@redhat.com/
>
> Cc: Andrew Morton 
> Cc: David Hildenbrand 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: Dan Williams 
> Cc: Dave Jiang 
> Cc: Dave Hansen 
> Cc: Huang Ying 
> Suggested-by: David Hildenbrand 
> Reviewed-by: Dan Williams 
> Signed-off-by: Vishal Verma 
> ---
>  mm/memory_hotplug.c | 214 
> 
>  1 file changed, 148 insertions(+), 66 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6be7de9efa55..83e5ec377aad 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1380,6 +1380,43 @@ static bool mhp_supports_memmap_on_memory(unsigned 
> long size)
>   return arch_supports_memmap_on_memory(vmemmap_size);
>  }
>  
> +static int add_memory_create_devices(int nid, struct memory_group *group,
> +  u64 start, u64 size, mhp_t mhp_flags)
> +{
> + struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
> + struct vmem_altmap mhp_altmap = {
> + .base_pfn =  PHYS_PFN(start),
> + .end_pfn  =  PHYS_PFN(start + size - 1),
> + };
> + int ret;
> +
> + if ((mhp_flags & MHP_MEMMAP_ON_MEMORY)) {
> + mhp_altmap.free = memory_block_memmap_on_memory_pages();
> + params.altmap = kmemdup(_altmap, sizeof(struct vmem_altmap),
> + GFP_KERNEL);
> + if (!params.altmap)
> + return -ENOMEM;
> + }
> +
> + /* call arch's memory hotadd */
> + ret = arch_add_memory(nid, start, size, );
> + if (ret < 0)
> + goto error;
> +
> + /* create memory block devices after memory was added */
> + ret = create_memory_block_devices(start, size, params.altmap, group);
> + if (ret)
> + goto err_bdev;
> +
> + return 0;
> +
> +err_bdev:
> + arch_remove_memory(start, size, NULL);
> +error:
> + kfree(params.altmap);
> + return ret;
> +}
> +
>  /*
>   * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
>   * and online/offline operations (triggered e.g. by sysfs).
> @@ -1388,14 +1425,10 @@ static bool mhp_supports_memmap_on_memory(unsigned 
> long size)
>   */
>  int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  {
> - struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
> + unsigned long memblock_size = memory_block_size_bytes();
>   enum memblock_flags memblock_flags = MEMBLOCK_NONE;
> - struct vmem_altmap mhp_altmap = {
> - .base_pfn =  PHYS_PFN(res->start),
> - .end_pfn  =  PHYS_PFN(res->end),
> - };
>   struct memory_group *group = NULL;
> - u64 start, size;
> + u64 start, size, cur_start;
>   bool new_node = false;
>   int ret;
>  
> @@ -1436,28 +1469,21 @@ int __ref add_memory_resource(int nid, struct 
> resource *res, mhp_t mhp_flags)
>   /*
>* Self hosted memmap array
>*/
> - if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
> - if (mhp_supports_memmap_on_memory(size)) {
> - mhp_altmap.free = memory_block_memmap_on_memory_pages();
> - params.altmap = kmemdup(_altmap,
> - sizeof(struct vmem_altmap),
> - GFP_KERNEL);
> - if (!params.altmap)
> + if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
> + mhp_supports_memmap_on_memory(memblock_size)) {
> + for (cur_start = start; cur_start < start + size;
> +  cur_start += memblock_size) {
> + ret = add_memory_create_devices(nid, group, cur_start,
> + memblock_size,
> +