Re: [PATCH] powerpc/mm: Update set_ptes to call pte_filter for all the ptes

2023-10-17 Thread Christophe Leroy


Le 18/10/2023 à 06:55, Aneesh Kumar K.V a écrit :
> With commit 9fee28baa601 ("powerpc: implement the new page table range
> API") we added set_ptes to powerpc architecture but the implementation
> missed calling the pte filter for all the ptes we are setting in the
> range. set_pte_filter can be used for filter pte values and on some
> platforms which don't support coherent icache it clears the exec bit so
> that we can flush the icache on exec fault
> 
> The patch also removes the usage of arch_enter/leave_lazy_mmu() because
> set_pte is not supposed to be used when updating a pte entry. Powerpc
> architecture uses this rule to skip the expensive tlb invalidate which
> is not needed when you are setting up the pte for the first time. See
> commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating
> _PAGE_NUMA bit") for more details
> 
> Fixes: 9fee28baa601 ("powerpc: implement the new page table range API")
> Signed-off-by: Aneesh Kumar K.V 
> ---
>   arch/powerpc/mm/pgtable.c | 33 -
>   1 file changed, 20 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 3ba9fe411604..95ab20cca2da 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, 
> pte_t *ptep,
>   pte_t pte, unsigned int nr)
>   {
>   /*
> -  * Make sure hardware valid bit is not set. We don't do
> -  * tlb flush for this update.
> +  * We don't need to call arch_enter/leave_lazy_mmu_mode()
> +  * because we expect set_ptes to be only be used on not present
> +  * and not hw_valid ptes. Hence there is not translation cache flush
> +  * involved that need to be batched.
>*/
> - VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
> + for (;;) {
>   
> - /* Note: mm->context.id might not yet have been assigned as
> -  * this context might not have been activated yet when this
> -  * is called.
> -  */
> - pte = set_pte_filter(pte);
> + /*
> +  * Make sure hardware valid bit is not set. We don't do
> +  * tlb flush for this update.
> +  */
> + VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>   
> - /* Perform the setting of the PTE */
> - arch_enter_lazy_mmu_mode();
> - for (;;) {
> + /* Note: mm->context.id might not yet have been assigned as
> +  * this context might not have been activated yet when this
> +  * is called.
> +  */
> + pte = set_pte_filter(pte);

Why do you need to call set_pte_filter() inside the loop ?
The only difference between previous pte and next pte is the RPN, other 
flags remain untouched so I can't see why you need to call 
set_pte_filter() again.

> +
> + /* Perform the setting of the PTE */
>   __set_pte_at(mm, addr, ptep, pte, 0);
>   if (--nr == 0)
>   break;
>   ptep++;
> - pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
>   addr += PAGE_SIZE;
> + /* increment the pfn */
> + pte = __pte(pte_val(pte) + PAGE_SIZE);

PAGE_SIZE doesn't work on all platforms, see for instance e500.

see comment at 
https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/32/pgtable.h#L147

And then you see 
https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/pte-e500.h#L63

> +
>   }
> - arch_leave_lazy_mmu_mode();
>   }
>   
>   void unmap_kernel_page(unsigned long va)

Christophe


Re: [Bisected] PowerMac G5 fails booting kernel 6.6-rc3 (BUG: Unable to handle kernel data access at 0xfeffbb62ffec65fe)

2023-10-17 Thread Michael Ellerman
Erhard Furtner  writes:
> On Tue, 17 Oct 2023 14:40:49 +1100
> Michael Ellerman  wrote:
>
>> I think I've reproduced the crash on my Quad G5 by using your config
>> with some things tweaked, but I don't get any output on the screen :/
>
> You could try PPC_EARLY_DEBUG=y with PPC_EARLY_DEBUG_BOOTX or 
> PPC_EARLY_DEBUG_G5.

I have tried PPC_EARLY_DEBUG_BOOTX but it didn't help :/

>> Do you mind booting the commit above and taking a photo of the oops and
>> attach it here. The oops you transcribed didn't entirely make sense,
>> probably due to a typo here or there, so a photo would be best.
>> 
>> cheers
>
> No problem. Just thought transcribing the photo would make more sense
> for a mailing list. ;) But maybe some subtle errors slept crept in.
> Attached are 2 photos from the issue on current v6.6-rc6.

Thanks. Yeah text is generally better, it archives better and can be
grepped etc. but in this case I was going a bit mad trying to make sense
of the oops :)

In hindsight the bug is an obvious boot time ordering problem, can you
confirm this fixes it. That should apply on top of Linus' current
master.

cheers

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 2f1026fba00d..71f16fb32ceb 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -948,6 +948,7 @@ void __init setup_arch(char **cmdline_p)
 
/* Parse memory topology */
mem_topology_setup();
+   set_max_mapnr(max_pfn);
 
/*
 * Release secondary cpus out of their spinloops at 0x60 now that
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8b121df7b08f..07e8f4f1e07f 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -288,7 +288,6 @@ void __init mem_init(void)
 #endif
 
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
-   set_max_mapnr(max_pfn);
 
kasan_late_init();
 


[PATCH] powerpc/mm: Update set_ptes to call pte_filter for all the ptes

2023-10-17 Thread Aneesh Kumar K.V
With commit 9fee28baa601 ("powerpc: implement the new page table range
API") we added set_ptes to powerpc architecture but the implementation
missed calling the pte filter for all the ptes we are setting in the
range. set_pte_filter can be used for filter pte values and on some
platforms which don't support coherent icache it clears the exec bit so
that we can flush the icache on exec fault

The patch also removes the usage of arch_enter/leave_lazy_mmu() because
set_pte is not supposed to be used when updating a pte entry. Powerpc
architecture uses this rule to skip the expensive tlb invalidate which
is not needed when you are setting up the pte for the first time. See
commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating
_PAGE_NUMA bit") for more details

Fixes: 9fee28baa601 ("powerpc: implement the new page table range API")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable.c | 33 -
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 3ba9fe411604..95ab20cca2da 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, 
pte_t *ptep,
pte_t pte, unsigned int nr)
 {
/*
-* Make sure hardware valid bit is not set. We don't do
-* tlb flush for this update.
+* We don't need to call arch_enter/leave_lazy_mmu_mode()
+* because we expect set_ptes to be only be used on not present
+* and not hw_valid ptes. Hence there is not translation cache flush
+* involved that need to be batched.
 */
-   VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
+   for (;;) {
 
-   /* Note: mm->context.id might not yet have been assigned as
-* this context might not have been activated yet when this
-* is called.
-*/
-   pte = set_pte_filter(pte);
+   /*
+* Make sure hardware valid bit is not set. We don't do
+* tlb flush for this update.
+*/
+   VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-   /* Perform the setting of the PTE */
-   arch_enter_lazy_mmu_mode();
-   for (;;) {
+   /* Note: mm->context.id might not yet have been assigned as
+* this context might not have been activated yet when this
+* is called.
+*/
+   pte = set_pte_filter(pte);
+
+   /* Perform the setting of the PTE */
__set_pte_at(mm, addr, ptep, pte, 0);
if (--nr == 0)
break;
ptep++;
-   pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
addr += PAGE_SIZE;
+   /* increment the pfn */
+   pte = __pte(pte_val(pte) + PAGE_SIZE);
+
}
-   arch_leave_lazy_mmu_mode();
 }
 
 void unmap_kernel_page(unsigned long va)
-- 
2.41.0



[PATCH] powerpc/vas: Limit open window failure messages in log bufffer

2023-10-17 Thread Haren Myneni
The VAS open window call prints error message and returns -EBUSY
after the migration suspend event initiated and until the resume
event completed on the destination system. It can cause the log
buffer filled with these error messages if the user space issues
continuous open window calls.  Similar case even for DLPAR CPU
remove event when no credits are available until the credits are
freed or with the other DLPAR CPU add event.

So changes in the patch to use pr_err_ratelimited() instead of
pr_err() to display open window failure and not-available credits
error messages.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/book3s/vas-api.c | 4 ++--
 arch/powerpc/platforms/pseries/vas.c| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/book3s/vas-api.c 
b/arch/powerpc/platforms/book3s/vas-api.c
index 77ea9335fd04..203cfc2fb8ff 100644
--- a/arch/powerpc/platforms/book3s/vas-api.c
+++ b/arch/powerpc/platforms/book3s/vas-api.c
@@ -311,8 +311,8 @@ static int coproc_ioc_tx_win_open(struct file *fp, unsigned 
long arg)
txwin = cp_inst->coproc->vops->open_win(uattr.vas_id, uattr.flags,
cp_inst->coproc->cop_type);
if (IS_ERR(txwin)) {
-   pr_err("%s() VAS window open failed, %ld\n", __func__,
-   PTR_ERR(txwin));
+   pr_err_ratelimited("%s() VAS window open failed, %ld\n",
+   __func__, PTR_ERR(txwin));
return PTR_ERR(txwin);
}
 
diff --git a/arch/powerpc/platforms/pseries/vas.c 
b/arch/powerpc/platforms/pseries/vas.c
index b86f0db08e98..7259e6676503 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -341,7 +341,7 @@ static struct vas_window *vas_allocate_window(int vas_id, 
u64 flags,
 
if (atomic_inc_return(_feat_caps->nr_used_credits) >
atomic_read(_feat_caps->nr_total_credits)) {
-   pr_err("Credits are not available to allocate window\n");
+   pr_err_ratelimited("Credits are not available to allocate 
window\n");
rc = -EINVAL;
goto out;
}
@@ -439,7 +439,7 @@ static struct vas_window *vas_allocate_window(int vas_id, 
u64 flags,
 
put_vas_user_win_ref(>vas_win.task_ref);
rc = -EBUSY;
-   pr_err("No credit is available to allocate window\n");
+   pr_err_ratelimited("No credit is available to allocate window\n");
 
 out_free:
/*
-- 
2.26.3



[PATCH v2] powerpc/pseries/vas: Migration suspend waits for no in-progress open windows

2023-10-17 Thread Haren Myneni
The hypervisor returns migration failure if all VAS windows are not
closed. During pre-migration stage, vas_migration_handler() sets
migration_in_progress flag and closes all windows from the list.
The allocate VAS window routine checks the migration flag, setup
the window and then add it to the list. So there is possibility of
the migration handler missing the window that is still in the
process of setup.

t1: Allocate and open VAS   t2: Migration event
window

lock vas_pseries_mutex
If migration_in_progress set
  unlock vas_pseries_mutex
  return
open window HCALL
unlock vas_pseries_mutex
Modify window HCALL lock vas_pseries_mutex
setup windowmigration_in_progress=true
Closes all windows from
the list
unlock vas_pseries_mutex
lock vas_pseries_mutex  return
if nr_closed_windows == 0
  // No DLPAR CPU or migration
  add to the list
  unlock vas_pseries_mutex
  return
unlock vas_pseries_mutex
Close VAS window
// due to DLPAR CPU or migration
return -EBUSY

This patch resolves the issue with the following steps:
- Define migration_in_progress as atomic so that the migration
  handler sets this flag without holding mutex.
- Introduce nr_open_wins_progress counter in VAS capabilities
  struct
- This counter tracks the number of open windows are still in
  progress
- The allocate setup window thread closes windows if the migration
  is set and decrements nr_open_window_progress counter
- The migration handler waits for no in-progress open windows.

Fixes: 37e6764895ef ("powerpc/pseries/vas: Add VAS migration handler")
Signed-off-by: Haren Myneni 

---
Changes from v1:
- Do not define the migration_in_progress flag as atomic as
  suggested by Nathan
---
 arch/powerpc/platforms/pseries/vas.c | 45 +++-
 arch/powerpc/platforms/pseries/vas.h |  2 ++
 2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/vas.c 
b/arch/powerpc/platforms/pseries/vas.c
index 15d958e38eca..b86f0db08e98 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -32,6 +32,7 @@ static struct hv_vas_cop_feat_caps hv_cop_caps;
 static struct vas_caps vascaps[VAS_MAX_FEAT_TYPE];
 static DEFINE_MUTEX(vas_pseries_mutex);
 static bool migration_in_progress;
+static DECLARE_WAIT_QUEUE_HEAD(open_win_progress_wq);
 
 static long hcall_return_busy_check(long rc)
 {
@@ -384,11 +385,15 @@ static struct vas_window *vas_allocate_window(int vas_id, 
u64 flags,
 * same fault IRQ is not freed by the OS before.
 */
mutex_lock(_pseries_mutex);
-   if (migration_in_progress)
+   if (migration_in_progress) {
rc = -EBUSY;
-   else
+   } else {
rc = allocate_setup_window(txwin, (u64 *)[0],
   cop_feat_caps->win_type);
+   if (!rc)
+   atomic_inc(>nr_open_wins_progress);
+   }
+
mutex_unlock(_pseries_mutex);
if (rc)
goto out;
@@ -403,8 +408,17 @@ static struct vas_window *vas_allocate_window(int vas_id, 
u64 flags,
goto out_free;
 
txwin->win_type = cop_feat_caps->win_type;
-   mutex_lock(_pseries_mutex);
+
/*
+* The migration SUSPEND thread sets migration_in_progress and
+* closes all open windows from the list. But the window is
+* added to the list after open and modify HCALLs. So possible
+* that migration_in_progress is set before modify HCALL which
+* may cause some windows are still open when the hypervisor
+* initiates the migration.
+* So checks the migration_in_progress flag again and close all
+* open windows.
+*
 * Possible to lose the acquired credit with DLPAR core
 * removal after the window is opened. So if there are any
 * closed windows (means with lost credits), do not give new
@@ -412,9 +426,11 @@ static struct vas_window *vas_allocate_window(int vas_id, 
u64 flags,
 * after the existing windows are reopened when credits are
 * available.
 */
-   if (!caps->nr_close_wins) {
+   mutex_lock(_pseries_mutex);
+   if (!caps->nr_close_wins && !migration_in_progress) {
list_add(>win_list, >list);
caps->nr_open_windows++;
+   atomic_dec(>nr_open_wins_progress);
mutex_unlock(_pseries_mutex);
vas_user_win_add_mm_context(>vas_win.task_ref);
return >vas_win;
@@ -432,6 +448,8 @@ static struct vas_window *vas_allocate_window(int vas_id, 
u64 flags,
 */
free_irq_setup(txwin);
h_deallocate_vas_window(txwin->vas_win.winid);
+   atomic_dec(>nr_open_wins_progress);
+   wake_up(_win_progress_wq);
 out:
atomic_dec(_feat_caps->nr_used_credits);

Re: [RFC PATCH v6 09/11] media: uapi: Add audio rate controls support

2023-10-17 Thread Shengjiu Wang
On Tue, Oct 17, 2023 at 9:37 PM Hans Verkuil  wrote:
>
> On 17/10/2023 15:11, Shengjiu Wang wrote:
> > On Mon, Oct 16, 2023 at 9:16 PM Hans Verkuil  wrote:
> >>
> >> Hi Shengjiu,
> >>
> >> On 13/10/2023 10:31, Shengjiu Wang wrote:
> >>> Fixed point controls are used by the user to configure
> >>> the audio sample rate to driver.
> >>>
> >>> Add V4L2_CID_ASRC_SOURCE_RATE and V4L2_CID_ASRC_DEST_RATE
> >>> new IDs for ASRC rate control.
> >>>
> >>> Signed-off-by: Shengjiu Wang 
> >>> ---
> >>>  .../userspace-api/media/v4l/common.rst|  1 +
> >>>  .../media/v4l/ext-ctrls-fixed-point.rst   | 36 +++
> >>>  .../media/v4l/vidioc-g-ext-ctrls.rst  |  4 +++
> >>>  .../media/v4l/vidioc-queryctrl.rst|  7 
> >>>  .../media/videodev2.h.rst.exceptions  |  1 +
> >>>  drivers/media/v4l2-core/v4l2-ctrls-core.c |  5 +++
> >>>  drivers/media/v4l2-core/v4l2-ctrls-defs.c |  4 +++
> >>>  include/media/v4l2-ctrls.h|  2 ++
> >>>  include/uapi/linux/v4l2-controls.h| 13 +++
> >>>  include/uapi/linux/videodev2.h|  3 ++
> >>>  10 files changed, 76 insertions(+)
> >>>  create mode 100644 
> >>> Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst
> >>>
> >>> diff --git a/Documentation/userspace-api/media/v4l/common.rst 
> >>> b/Documentation/userspace-api/media/v4l/common.rst
> >>> index ea0435182e44..35707edffb13 100644
> >>> --- a/Documentation/userspace-api/media/v4l/common.rst
> >>> +++ b/Documentation/userspace-api/media/v4l/common.rst
> >>> @@ -52,6 +52,7 @@ applicable to all devices.
> >>>  ext-ctrls-fm-rx
> >>>  ext-ctrls-detect
> >>>  ext-ctrls-colorimetry
> >>> +ext-ctrls-fixed-point
> >>
> >> Rename this to ext-ctrls-audio-m2m.
> >>
> >>>  fourcc
> >>>  format
> >>>  planar-apis
> >>> diff --git 
> >>> a/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst 
> >>> b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst
> >>> new file mode 100644
> >>> index ..2ef6e250580c
> >>> --- /dev/null
> >>> +++ b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst
> >>> @@ -0,0 +1,36 @@
> >>> +.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later
> >>> +
> >>> +.. _fixed-point-controls:
> >>> +
> >>> +***
> >>> +Fixed Point Control Reference
> >>
> >> This is for audio controls. "Fixed Point" is just the type, and it doesn't 
> >> make
> >> sense to group fixed point controls. But it does make sense to group the 
> >> audio
> >> controls.
> >>
> >> V4L2 controls can be grouped into classes. Basically it is a way to put 
> >> controls
> >> into categories, and for each category there is also a control that gives a
> >> description of the class (see 2.15.15 in
> >> https://linuxtv.org/downloads/v4l-dvb-apis-new/driver-api/v4l2-controls.html#introduction)
> >>
> >> If you use e.g. 'v4l2-ctl -l' to list all the controls, then you will see 
> >> that
> >> they are grouped based on what class of control they are.
> >>
> >> So I think it would be a good idea to create a new control class for M2M 
> >> audio controls,
> >> instead of just adding them to the catch-all 'User Controls' class.
> >>
> >> Search e.g. for V4L2_CTRL_CLASS_COLORIMETRY and V4L2_CID_COLORIMETRY_CLASS 
> >> to see how
> >> it is done.
> >>
> >> M2M_AUDIO would probably be a good name for the class.
> >>
> >>> +***
> >>> +
> >>> +These controls are intended to support an asynchronous sample
> >>> +rate converter.
> >>
> >> Add ' (ASRC).' at the end to indicate the common abbreviation for
> >> that.
> >>
> >>> +
> >>> +.. _v4l2-audio-asrc:
> >>> +
> >>> +``V4L2_CID_ASRC_SOURCE_RATE``
> >>> +sets the resampler source rate.
> >>> +
> >>> +``V4L2_CID_ASRC_DEST_RATE``
> >>> +sets the resampler destination rate.
> >>
> >> Document the unit (Hz) for these two controls.
> >>
> >>> +
> >>> +.. c:type:: v4l2_ctrl_fixed_point
> >>> +
> >>> +.. cssclass:: longtable
> >>> +
> >>> +.. tabularcolumns:: |p{1.5cm}|p{5.8cm}|p{10.0cm}|
> >>> +
> >>> +.. flat-table:: struct v4l2_ctrl_fixed_point
> >>> +:header-rows:  0
> >>> +:stub-columns: 0
> >>> +:widths:   1 1 2
> >>> +
> >>> +* - __u32
> >>
> >> Hmm, shouldn't this be __s32?
> >>
> >>> +  - ``integer``
> >>> +  - integer part of fixed point value.
> >>> +* - __s32
> >>
> >> and this __u32?
> >>
> >> You want to be able to use this generic type as a signed value.
> >>
> >>> +  - ``fractional``
> >>> +  - fractional part of fixed point value, which is Q31.
> >>> diff --git a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst 
> >>> b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
> >>> index f9f73530a6be..1811dabf5c74 100644
> >>> --- a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
> >>> +++ b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
> >>> @@ -295,6 +295,10 @@ still cause this 

[PATCH] hvc/xen: fix event channel handling for secondary consoles

2023-10-17 Thread David Woodhouse
From: David Woodhouse 

The xencons_connect_backend() function allocates a local interdomain
event channel with xenbus_alloc_evtchn(), then calls
bind_interdomain_evtchn_to_irq_lateeoi() to bind to that port# on the
*remote* domain.

That doesn't work very well:

(qemu) device_add xen-console,id=con1,chardev=pty0
[   44.323872] xenconsole console-1: 2 xenbus_dev_probe on device/console/1
[   44.323995] xenconsole: probe of console-1 failed with error -2

Fix it to use bind_evtchn_to_irq_lateeoi(), which does the right thing
by just binding that *local* event channel to an irq. The backend will
do the interdomain binding.

This didn't affect the primary console because the setup for that is
special — the toolstack allocates the guest event channel and the guest
discovers it with HVMOP_get_param.

Once that's fixed, there's also a warning on hot-unplug because
xencons_disconnect_backend() unconditionally calls free_irq() via
unbind_from_irqhandler():

(qemu) device_del con1
[   32.050919] [ cut here ]
[   32.050942] Trying to free already-free IRQ 33
[   32.050990] WARNING: CPU: 0 PID: 51 at kernel/irq/manage.c:1895 
__free_irq+0x1d4/0x330

Fix that by calling notifier_del_irq() first, which only calls
free_irq() if the irq was requested in the first place. Then use
evtchn_put() to release the irq and event channel. Avoid calling
xenbus_free_evtchn() in the normal case, as evtchn_put() will do that
too. The only time xenbus_free_evtchn() needs to be called is for the
cleanup when bind_evtchn_to_irq_lateeoi() fails.

Finally, fix the error path in xen_hvc_init() when there's no primary
console. It should still register the frontend driver, as there may be
secondary consoles. (Qemu can always add secondary consoles, but only
the toolstack can add the primary because it's special.)

Fixes: fe415186b4 ("xen/console: harden hvc_xen against event channel storms")
Signed-off-by: David Woodhouse 
Cc: sta...@vger.kernel.org
---

Untested on actual Xen.

 drivers/tty/hvc/hvc_xen.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 98764e740c07..f0376612b267 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -377,9 +377,13 @@ void xen_console_resume(void)
 #ifdef CONFIG_HVC_XEN_FRONTEND
 static void xencons_disconnect_backend(struct xencons_info *info)
 {
-   if (info->irq > 0)
-   unbind_from_irqhandler(info->irq, NULL);
-   info->irq = 0;
+   if (info->irq > 0) {
+   notifier_del_irq(info->hvc, info->irq);
+   evtchn_put(info->evtchn);
+   info->irq = 0;
+   info->evtchn = 0;
+   }
+   /* evtchn_put() will also close it so this is only an error path */
if (info->evtchn > 0)
xenbus_free_evtchn(info->xbdev, info->evtchn);
info->evtchn = 0;
@@ -433,7 +437,7 @@ static int xencons_connect_backend(struct xenbus_device 
*dev,
if (ret)
return ret;
info->evtchn = evtchn;
-   irq = bind_interdomain_evtchn_to_irq_lateeoi(dev, evtchn);
+   irq = bind_evtchn_to_irq_lateeoi(evtchn);
if (irq < 0)
return irq;
info->irq = irq;
@@ -597,7 +601,7 @@ static int __init xen_hvc_init(void)
else
r = xen_pv_console_init();
if (r < 0)
-   return r;
+   goto register_fe;
 
info = vtermno_to_xencons(HVC_COOKIE);
info->irq = bind_evtchn_to_irq_lateeoi(info->evtchn);
@@ -616,12 +620,13 @@ static int __init xen_hvc_init(void)
list_del(>list);
spin_unlock_irqrestore(_lock, flags);
if (info->irq)
-   unbind_from_irqhandler(info->irq, NULL);
+   evtchn_put(info->evtchn);
kfree(info);
return r;
}
 
r = 0;
+ register_fe:
 #ifdef CONFIG_HVC_XEN_FRONTEND
r = xenbus_register_frontend(_driver);
 #endif
-- 
2.40.1




smime.p7s
Description: S/MIME cryptographic signature


[PING][PATCH] uapi/auxvec: Define AT_HWCAP3 and AT_HWCAP4 aux vector, entries

2023-10-17 Thread Peter Bergner
CCing linux-kernel for more exposure.

PING.  I'm waiting on a reply from anyone on the kernel side of things
to see whether they have an issue with reserving values for AT_HWCAP3
and AT_HWCAP4.  

I'll note reviews from the GLIBC camp did not have an issue with the below 
patch.

Thanks.

Peter


On 9/26/23 5:02 PM, Peter Bergner wrote:
> The powerpc toolchain keeps a copy of the HWCAP bit masks in our TCB for fast
> access by our __builtin_cpu_supports built-in function.  The TCB space for
> the HWCAP entries - which are created in pairs - is an ABI extension, so
> waiting to create the space for HWCAP3 and HWCAP4 until we need them is
> problematical, given distro unwillingness to apply ABI modifying patches
> to distro point releases.  Define AT_HWCAP3 and AT_HWCAP4 in the generic
> uapi header so they can be used in GLIBC to reserve space in the powerpc
> TCB for their future use.
> 
> I scanned both the Linux and GLIBC source codes looking for unused AT_*
> values and 29 and 30 did not seem to be used, so they are what I went
> with.  If anyone sees a problem with using those specific values, I'm
> amenable to using other values, just let me know what would be better.
> 
> Peter
> 
> 
> Signed-off-by: Peter Bergner 
> ---
>  include/uapi/linux/auxvec.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/uapi/linux/auxvec.h b/include/uapi/linux/auxvec.h
> index 6991c4b8ab18..cc61cb9b3e9a 100644
> --- a/include/uapi/linux/auxvec.h
> +++ b/include/uapi/linux/auxvec.h
> @@ -32,6 +32,8 @@
>  #define AT_HWCAP2 26 /* extension of AT_HWCAP */
>  #define AT_RSEQ_FEATURE_SIZE 27  /* rseq supported feature size */
>  #define AT_RSEQ_ALIGN28  /* rseq allocation alignment */
> +#define AT_HWCAP3 29 /* extension of AT_HWCAP */
> +#define AT_HWCAP4 30 /* extension of AT_HWCAP */
>  
>  #define AT_EXECFN  31/* filename of program */
>  



Re: [PATCH 0/2] Allow nesting of lazy MMU mode

2023-10-17 Thread Erhard Furtner
On Tue, 17 Oct 2023 11:34:23 +0530
"Aneesh Kumar K.V"  wrote:

> ie, we can do something like below. The change also make sure we call
> set_pte_filter on all the ptes we are setting via set_ptes(). I haven't
> sent this as a proper patch because we still are not able to fix the
> issue Erhard reported. 
> 
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 3ba9fe411604..95ab20cca2da 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, 
> pte_t *ptep,
>   pte_t pte, unsigned int nr)
>  {
>   /*
> -  * Make sure hardware valid bit is not set. We don't do
> -  * tlb flush for this update.
> +  * We don't need to call arch_enter/leave_lazy_mmu_mode()
> +  * because we expect set_ptes to be only be used on not present
> +  * and not hw_valid ptes. Hence there is not translation cache flush
> +  * involved that need to be batched.
>*/
> - VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
> + for (;;) {
>  
> - /* Note: mm->context.id might not yet have been assigned as
> -  * this context might not have been activated yet when this
> -  * is called.
> -  */
> - pte = set_pte_filter(pte);
> + /*
> +  * Make sure hardware valid bit is not set. We don't do
> +  * tlb flush for this update.
> +  */
> + VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>  
> - /* Perform the setting of the PTE */
> - arch_enter_lazy_mmu_mode();
> - for (;;) {
> + /* Note: mm->context.id might not yet have been assigned as
> +  * this context might not have been activated yet when this
> +  * is called.
> +  */
> + pte = set_pte_filter(pte);
> +
> + /* Perform the setting of the PTE */
>   __set_pte_at(mm, addr, ptep, pte, 0);
>   if (--nr == 0)
>   break;
>   ptep++;
> - pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
>   addr += PAGE_SIZE;
> + /* increment the pfn */
> + pte = __pte(pte_val(pte) + PAGE_SIZE);
> +
>   }
> - arch_leave_lazy_mmu_mode();
>  }
>  
>  void unmap_kernel_page(unsigned long va)

Was this a new version of the patch for me to test? Sorry for asking but this 
was a bit unclear to me.

In any case I tried it on top of v6.6-rc6 and it did not help with the issue I 
reported.

Regards,
Erhard


Re: [PATCH 1/4] kbuild: remove ARCH_POSTLINK from module builds

2023-10-17 Thread Nicolas Schier
On Tue, Oct 17, 2023 at 07:37:39PM +0900, Masahiro Yamada wrote:
> The '%.ko' rule in arch/*/Makefile.postlink does nothing but call the
> 'true' command.
> 
> Remove the meaningless code.
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
>  arch/mips/Makefile.postlink| 3 ---
>  arch/powerpc/Makefile.postlink | 3 ---
>  arch/riscv/Makefile.postlink   | 3 ---
>  arch/x86/Makefile.postlink | 3 ---
>  scripts/Makefile.modfinal  | 5 +
>  5 files changed, 1 insertion(+), 16 deletions(-)

Reviewed-by: Nicolas Schier 


Re: [PATCH 1/4] add generic builtin command line

2023-10-17 Thread Pratyush Brahma

Hi Daniel

We have a usecase which requires this patch necessarily. For android 
usecases, we have two different build variants
differentiated by defconfigs - production and debug. However, we only 
have a single dts for both these variants.



We want to enable certain features like page owner and slub debug which 
require cmdline params in addition to
their respective configs to be enabled. Enabling page_owner and 
slub_debug options in dts file enables it for both
production and debug variants. These features have significant memory 
overhead which are undesirable for
our production environment. However, these are necessary for debug 
environment to enable internal testing and debug.
Currently, android uses out-of-tree configs like CONFIG_CMDLINE_EXTEND 
to do so in gki_defconfig [1].
One option is to use CMDLINE_FORCE option which would enable these 
cmdline params but this disables the bootloader to add

any additional cmdline params which may be necessary.


For such a usecase, the CONFIG_CMDLINE_PREPEND seems to be quite useful 
as it would help to stitch bootloader
and the desired build variant's configs together. Can you please help to 
merge this patch?



[1] 
https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1-lts/arch/arm64/configs/gki_defconfig#62




[PATCH 2/6] usb: gadget: fsl-udc: Convert to use module_platform_driver()

2023-10-17 Thread Uwe Kleine-König
module_platform_driver_probe() has the advantage that the .probe() and
.remove() calls can live in .init.text and .exit.text respectively and
so some memory is saved. The downside is that dynamic bind and unbind
are impossible. As the driver doesn't benefit from the advantages (both
.probe and .remove are defined in plain .text), stop suffering from the
downsides and use module_platform_driver() instead of
module_platform_driver_probe().

Signed-off-by: Uwe Kleine-König 
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c 
b/drivers/usb/gadget/udc/fsl_udc_core.c
index ee5705d336e3..2693a10eb0c7 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -2666,6 +2666,7 @@ static const struct platform_device_id fsl_udc_devtype[] 
= {
 };
 MODULE_DEVICE_TABLE(platform, fsl_udc_devtype);
 static struct platform_driver udc_driver = {
+   .probe  = fsl_udc_probe,
.remove = fsl_udc_remove,
.id_table   = fsl_udc_devtype,
/* these suspend and resume are not usb suspend and resume */
@@ -2679,7 +2680,7 @@ static struct platform_driver udc_driver = {
},
 };
 
-module_platform_driver_probe(udc_driver, fsl_udc_probe);
+module_platform_driver(udc_driver);
 
 MODULE_DESCRIPTION(DRIVER_DESC);
 MODULE_AUTHOR(DRIVER_AUTHOR);
-- 
2.42.0



[PATCH 0/6] usb: gadget: Convert to use module_platform_driver()

2023-10-17 Thread Uwe Kleine-König
Hello,

module_platform_driver_probe() is an alternative to
module_platform_driver(). Comparing the two the former has the advantage
that the probe and remove callbacks can live in .init.text and
.exit.text respectively. The latter has the advantage that it's a bit
easier to use correctly and you can bind/unbind via sysfs and/or
hotplug.

There are considerations about deprecating
module_platform_driver_probe()[1] as very few drivers use it, still less
make actually use of the advantages and saving a few bytes isn't as
important any more as it was (say) 10 years ago.

Given that the drivers below drivers/usb/gadget making use of
module_platform_driver_probe() doesn't benefit from the advantages
at all (probe and remove are all defined in .text), convert these
drivers to module_platform_driver().

Best regards
Uwe

[1] 
https://lore.kernel.org/linux-kbuild/20231017132045.afswdgcv4axjf...@pengutronix.de

Uwe Kleine-König (6):
  usb: gadget: at91-udc: Convert to use module_platform_driver()
  usb: gadget: fsl-udc: Convert to use module_platform_driver()
  usb: gadget: fusb300-udc: Convert to use module_platform_driver()
  usb: gadget: lpc32xx-udc: Convert to use module_platform_driver()
  usb: gadget: m66592-udc: Convert to use module_platform_driver()
  usb: gadget: r8a66597-udc: Convert to use module_platform_driver()

 drivers/usb/gadget/udc/at91_udc.c | 3 ++-
 drivers/usb/gadget/udc/fsl_udc_core.c | 3 ++-
 drivers/usb/gadget/udc/fusb300_udc.c  | 7 ---
 drivers/usb/gadget/udc/lpc32xx_udc.c  | 3 ++-
 drivers/usb/gadget/udc/m66592-udc.c   | 3 ++-
 drivers/usb/gadget/udc/r8a66597-udc.c | 3 ++-
 6 files changed, 14 insertions(+), 8 deletions(-)

base-commit: 4d5ab2376ec576af173e5eac3887ed0b51bd8566
-- 
2.42.0



Re: [PATCH] tools/perf/arch/powerpc: Fix the CPU ID const char* value by adding 0x prefix

2023-10-17 Thread Namhyung Kim
On Mon, 9 Oct 2023 10:30:52 +0530, Athira Rajeev wrote:
> Simple expression parser test fails in powerpc as below:
> 
> 4: Simple expression parser
> test child forked, pid 170385
> Using CPUID 004e2102
> division by zero
> syntax error
> syntax error
> FAILED tests/expr.c:65 parse test failed
> test child finished with -1
> Simple expression parser: FAILED!
> 
> [...]

Applied to perf-tools-next, thanks!



Re: [PATCH V2 0/3] Fix for shellcheck issues with latest scripts in tests/shell

2023-10-17 Thread Namhyung Kim
On Fri, 13 Oct 2023 13:00:18 +0530, Athira Rajeev wrote:
> shellcheck was run on perf tool shell scripts as a pre-requisite
> to include a build option for shellcheck discussed here:
> https://www.spinics.net/lists/linux-perf-users/msg25553.html
> 
> And fixes were added for the coding/formatting issues in
> two patchsets:
> https://lore.kernel.org/linux-perf-users/20230613164145.50488-1-atraj...@linux.vnet.ibm.com/
> https://lore.kernel.org/linux-perf-users/20230709182800.53002-1-atraj...@linux.vnet.ibm.com/
> 
> [...]

Applied to perf-tools-next, thanks!



[PATCH v3 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-10-17 Thread Frank Li
ls1043a add suspend/resume support.
Implement ls1043a_pcie_send_turnoff_msg() to send PME_Turn_Off message.
Implement ls1043a_pcie_exit_from_l2() to exit from L2 state.

Signed-off-by: Frank Li 
---

Notes:
Change from v2 to v3
- Remove ls_pcie_lut_readl(writel) function

Change from v1 to v2
- Update subject 'a' to 'A'

 drivers/pci/controller/dwc/pci-layerscape.c | 86 -
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index 4b663b20d8612..9656224960b0c 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -41,6 +41,15 @@
 #define SCFG_PEXSFTRSTCR   0x190
 #define PEXSR(idx) BIT(idx)
 
+/* LS1043A PEX PME control register */
+#define SCFG_PEXPMECR  0x144
+#define PEXPME(idx)BIT(31 - (idx) * 4)
+
+/* LS1043A PEX LUT debug register */
+#define LS_PCIE_LDBG   0x7fc
+#define LDBG_SRBIT(30)
+#define LDBG_WEBIT(31)
+
 #define PCIE_IATU_NUM  6
 
 #define LS_PCIE_DRV_SCFG   BIT(0)
@@ -227,6 +236,68 @@ static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
return 0;
 }
 
+static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+   u32 val;
+
+   if (!pcie->scfg) {
+   dev_dbg(pcie->pci->dev, "SYSCFG is NULL\n");
+   return;
+   }
+
+   /* Send Turn_off message */
+   regmap_read(pcie->scfg, SCFG_PEXPMECR, );
+   val |= PEXPME(pcie->index);
+   regmap_write(pcie->scfg, SCFG_PEXPMECR, val);
+
+   /*
+* There is no specific register to check for PME_To_Ack from endpoint.
+* So on the safe side, wait for PCIE_PME_TO_L2_TIMEOUT_US.
+*/
+   mdelay(PCIE_PME_TO_L2_TIMEOUT_US/1000);
+
+   /*
+* Layerscape hardware reference manual recommends clearing the 
PMXMTTURNOFF bit
+* to complete the PME_Turn_Off handshake.
+*/
+   regmap_read(pcie->scfg, SCFG_PEXPMECR, );
+   val &= ~PEXPME(pcie->index);
+   regmap_write(pcie->scfg, SCFG_PEXPMECR, val);
+}
+
+static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+   u32 val;
+
+   /*
+* Only way let PEX module exit L2 is do a software reset.
+* LDBG_WE: allows the user to have write access to the PEXDBG[SR] for 
both setting and
+*  clearing the soft reset on the PEX module.
+* LDBG_SR: When SR is set to 1, the PEX module enters soft reset.
+*/
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val |= LDBG_WE;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val |= LDBG_SR;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val &= ~LDBG_SR;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val &= ~LDBG_WE;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   return 0;
+}
+
 static const struct dw_pcie_host_ops ls_pcie_host_ops = {
.host_init = ls_pcie_host_init,
.pme_turn_off = ls_pcie_send_turnoff_msg,
@@ -244,6 +315,19 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
.flags = LS_PCIE_DRV_SCFG,
 };
 
+static const struct dw_pcie_host_ops ls1043a_pcie_host_ops = {
+   .host_init = ls_pcie_host_init,
+   .pme_turn_off = ls1043a_pcie_send_turnoff_msg,
+};
+
+static const struct ls_pcie_drvdata ls1043a_drvdata = {
+   .pf_lut_off = 0x1,
+   .pm_support = true,
+   .ops = _pcie_host_ops,
+   .exit_from_l2 = ls1043a_pcie_exit_from_l2,
+   .flags = LS_PCIE_DRV_SCFG,
+};
+
 static const struct ls_pcie_drvdata layerscape_drvdata = {
.pf_lut_off = 0xc,
.pm_support = true,
@@ -254,7 +338,7 @@ static const struct of_device_id ls_pcie_of_match[] = {
{ .compatible = "fsl,ls1012a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls1021a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls1028a-pcie", .data = _drvdata },
-   { .compatible = "fsl,ls1043a-pcie", .data = _drvdata },
+   { .compatible = "fsl,ls1043a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls1046a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls2080a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls2085a-pcie", .data = _drvdata },
-- 
2.34.1



[PATCH v3 3/4] PCI: layerscape: Rename pf_* as pf_lut_*

2023-10-17 Thread Frank Li
'pf' and 'lut' is just difference name in difference chips, but basic it is
a MMIO base address plus an offset.

Rename it to avoid duplicate pf_* and lut_* in driver.

Signed-off-by: Frank Li 
---

Notes:
change from v1 to v3
- new patch at v3

 drivers/pci/controller/dwc/pci-layerscape.c | 34 ++---
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index 6f47cfe146c44..4b663b20d8612 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -46,7 +46,7 @@
 #define LS_PCIE_DRV_SCFG   BIT(0)
 
 struct ls_pcie_drvdata {
-   const u32 pf_off;
+   const u32 pf_lut_off;
const struct dw_pcie_host_ops *ops;
int (*exit_from_l2)(struct dw_pcie_rp *pp);
int flags;
@@ -56,13 +56,13 @@ struct ls_pcie_drvdata {
 struct ls_pcie {
struct dw_pcie *pci;
const struct ls_pcie_drvdata *drvdata;
-   void __iomem *pf_base;
+   void __iomem *pf_lut_base;
struct regmap *scfg;
int index;
bool big_endian;
 };
 
-#define ls_pcie_pf_readl_addr(addr)ls_pcie_pf_readl(pcie, addr)
+#define ls_pcie_pf_lut_readl_addr(addr)ls_pcie_pf_lut_readl(pcie, addr)
 #define to_ls_pcie(x)  dev_get_drvdata((x)->dev)
 
 static bool ls_pcie_is_bridge(struct ls_pcie *pcie)
@@ -103,20 +103,20 @@ static void ls_pcie_fix_error_response(struct ls_pcie 
*pcie)
iowrite32(PCIE_ABSERR_SETTING, pci->dbi_base + PCIE_ABSERR);
 }
 
-static u32 ls_pcie_pf_readl(struct ls_pcie *pcie, u32 off)
+static u32 ls_pcie_pf_lut_readl(struct ls_pcie *pcie, u32 off)
 {
if (pcie->big_endian)
-   return ioread32be(pcie->pf_base + off);
+   return ioread32be(pcie->pf_lut_base + off);
 
-   return ioread32(pcie->pf_base + off);
+   return ioread32(pcie->pf_lut_base + off);
 }
 
-static void ls_pcie_pf_writel(struct ls_pcie *pcie, u32 off, u32 val)
+static void ls_pcie_pf_lut_writel(struct ls_pcie *pcie, u32 off, u32 val)
 {
if (pcie->big_endian)
-   iowrite32be(val, pcie->pf_base + off);
+   iowrite32be(val, pcie->pf_lut_base + off);
else
-   iowrite32(val, pcie->pf_base + off);
+   iowrite32(val, pcie->pf_lut_base + off);
 }
 
 static void ls_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
@@ -126,11 +126,11 @@ static void ls_pcie_send_turnoff_msg(struct dw_pcie_rp 
*pp)
u32 val;
int ret;
 
-   val = ls_pcie_pf_readl(pcie, LS_PCIE_PF_MCR);
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_PF_MCR);
val |= PF_MCR_PTOMR;
-   ls_pcie_pf_writel(pcie, LS_PCIE_PF_MCR, val);
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_PF_MCR, val);
 
-   ret = readx_poll_timeout(ls_pcie_pf_readl_addr, LS_PCIE_PF_MCR,
+   ret = readx_poll_timeout(ls_pcie_pf_lut_readl_addr, LS_PCIE_PF_MCR,
 val, !(val & PF_MCR_PTOMR),
 PCIE_PME_TO_L2_TIMEOUT_US/10,
 PCIE_PME_TO_L2_TIMEOUT_US);
@@ -149,15 +149,15 @@ static int ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
 * Set PF_MCR_EXL2S bit in LS_PCIE_PF_MCR register for the link
 * to exit L2 state.
 */
-   val = ls_pcie_pf_readl(pcie, LS_PCIE_PF_MCR);
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_PF_MCR);
val |= PF_MCR_EXL2S;
-   ls_pcie_pf_writel(pcie, LS_PCIE_PF_MCR, val);
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_PF_MCR, val);
 
/*
 * L2 exit timeout of 10ms is not defined in the specifications,
 * it was chosen based on empirical observations.
 */
-   ret = readx_poll_timeout(ls_pcie_pf_readl_addr, LS_PCIE_PF_MCR,
+   ret = readx_poll_timeout(ls_pcie_pf_lut_readl_addr, LS_PCIE_PF_MCR,
 val, !(val & PF_MCR_EXL2S),
 1000,
 1);
@@ -245,7 +245,7 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
 };
 
 static const struct ls_pcie_drvdata layerscape_drvdata = {
-   .pf_off = 0xc,
+   .pf_lut_off = 0xc,
.pm_support = true,
.exit_from_l2 = ls_pcie_exit_from_l2,
 };
@@ -295,7 +295,7 @@ static int ls_pcie_probe(struct platform_device *pdev)
 
pcie->big_endian = of_property_read_bool(dev->of_node, "big-endian");
 
-   pcie->pf_base = pci->dbi_base + pcie->drvdata->pf_off;
+   pcie->pf_lut_base = pci->dbi_base + pcie->drvdata->pf_lut_off;
 
if (pcie->drvdata->flags & LS_PCIE_DRV_SCFG) {
 
-- 
2.34.1



[PATCH v3 2/4] PCI: layerscape: Add suspend/resume for ls1021a

2023-10-17 Thread Frank Li
ls1021a add suspend/resume support.

Implement callback ls1021a_pcie_send_turnoff_msg(), which write scfg's
SCFG_PEXPMWRCR to issue PME_Turn_off message.

Implement ls1021a_pcie_exit_from_l2() to let controller exit L2 state.

Signed-off-by: Frank Li 
---

Notes:
Change from v2 to v3
- update according to mani's feedback
change from v1 to v2
- change subject 'a' to 'A'

 drivers/pci/controller/dwc/pci-layerscape.c | 86 -
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index aea89926bcc4f..6f47cfe146c44 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -35,11 +35,21 @@
 #define PF_MCR_PTOMR   BIT(0)
 #define PF_MCR_EXL2S   BIT(1)
 
+/* LS1021A PEXn PM Write Control Register */
+#define SCFG_PEXPMWRCR(idx)(0x5c + (idx) * 0x64)
+#define PMXMTTURNOFF   BIT(31)
+#define SCFG_PEXSFTRSTCR   0x190
+#define PEXSR(idx) BIT(idx)
+
 #define PCIE_IATU_NUM  6
 
+#define LS_PCIE_DRV_SCFG   BIT(0)
+
 struct ls_pcie_drvdata {
const u32 pf_off;
+   const struct dw_pcie_host_ops *ops;
int (*exit_from_l2)(struct dw_pcie_rp *pp);
+   int flags;
bool pm_support;
 };
 
@@ -47,6 +57,8 @@ struct ls_pcie {
struct dw_pcie *pci;
const struct ls_pcie_drvdata *drvdata;
void __iomem *pf_base;
+   struct regmap *scfg;
+   int index;
bool big_endian;
 };
 
@@ -171,13 +183,65 @@ static int ls_pcie_host_init(struct dw_pcie_rp *pp)
return 0;
 }
 
+static void ls1021a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+   u32 val;
+
+   /* Send PME_Turn_Off message */
+   regmap_read(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), );
+   val |= PMXMTTURNOFF;
+   regmap_write(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), val);
+
+   /*
+* There is no specific register to check for PME_To_Ack from endpoint.
+* So on the safe side, wait for PCIE_PME_TO_L2_TIMEOUT_US.
+*/
+   mdelay(PCIE_PME_TO_L2_TIMEOUT_US/1000);
+
+   /*
+* Layerscape hardware reference manual recommends clearing the 
PMXMTTURNOFF bit
+* to complete the PME_Turn_Off handshake.
+*/
+   regmap_read(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), );
+   val &= ~PMXMTTURNOFF;
+   regmap_write(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), val);
+}
+
+static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+   u32 val;
+
+   /* Only way exit from l2 is that do software reset */
+   regmap_read(pcie->scfg, SCFG_PEXSFTRSTCR, );
+   val |= PEXSR(pcie->index);
+   regmap_write(pcie->scfg, SCFG_PEXSFTRSTCR, val);
+
+   regmap_read(pcie->scfg, SCFG_PEXSFTRSTCR, );
+   val &= ~PEXSR(pcie->index);
+   regmap_write(pcie->scfg, SCFG_PEXSFTRSTCR, val);
+
+   return 0;
+}
+
 static const struct dw_pcie_host_ops ls_pcie_host_ops = {
.host_init = ls_pcie_host_init,
.pme_turn_off = ls_pcie_send_turnoff_msg,
 };
 
+static const struct dw_pcie_host_ops ls1021a_pcie_host_ops = {
+   .host_init = ls_pcie_host_init,
+   .pme_turn_off = ls1021a_pcie_send_turnoff_msg,
+};
+
 static const struct ls_pcie_drvdata ls1021a_drvdata = {
-   .pm_support = false,
+   .pm_support = true,
+   .ops = _pcie_host_ops,
+   .exit_from_l2 = ls1021a_pcie_exit_from_l2,
+   .flags = LS_PCIE_DRV_SCFG,
 };
 
 static const struct ls_pcie_drvdata layerscape_drvdata = {
@@ -205,6 +269,8 @@ static int ls_pcie_probe(struct platform_device *pdev)
struct dw_pcie *pci;
struct ls_pcie *pcie;
struct resource *dbi_base;
+   u32 index[2];
+   int ret;
 
pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
if (!pcie)
@@ -220,6 +286,7 @@ static int ls_pcie_probe(struct platform_device *pdev)
pci->pp.ops = _pcie_host_ops;
 
pcie->pci = pci;
+   pci->pp.ops = pcie->drvdata->ops ? pcie->drvdata->ops : 
_pcie_host_ops;
 
dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
@@ -230,6 +297,23 @@ static int ls_pcie_probe(struct platform_device *pdev)
 
pcie->pf_base = pci->dbi_base + pcie->drvdata->pf_off;
 
+   if (pcie->drvdata->flags & LS_PCIE_DRV_SCFG) {
+
+   pcie->scfg = syscon_regmap_lookup_by_phandle(dev->of_node, 
"fsl,pcie-scfg");
+   if (IS_ERR(pcie->scfg)) {
+   dev_err(dev, "No syscfg phandle specified\n");
+   return PTR_ERR(pcie->scfg);
+   }
+
+   ret = 

[PATCH v3 1/4] PCI: layerscape: Add function pointer for exit_from_l2()

2023-10-17 Thread Frank Li
Since difference SoCs require different sequence for exiting L2, let's add
a separate "exit_from_l2()" callback. This callback can be used to execute
SoC specific sequence.

Signed-off-by: Frank Li 
---

Notes:
Change from v2 to v3
- fixed according to mani's feedback
  1. update commit message
  2. move dw_pcie_host_ops to next patch
  3. check return value from exit_from_l2()
Change from v1 to v2
- change subject 'a' to 'A'

Change from v1 to v2
- change subject 'a' to 'A'

 drivers/pci/controller/dwc/pci-layerscape.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index 37956e09c65bd..aea89926bcc4f 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -39,6 +39,7 @@
 
 struct ls_pcie_drvdata {
const u32 pf_off;
+   int (*exit_from_l2)(struct dw_pcie_rp *pp);
bool pm_support;
 };
 
@@ -125,7 +126,7 @@ static void ls_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
dev_err(pcie->pci->dev, "PME_Turn_off timeout\n");
 }
 
-static void ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
+static int ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
 {
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
struct ls_pcie *pcie = to_ls_pcie(pci);
@@ -150,6 +151,8 @@ static void ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
 1);
if (ret)
dev_err(pcie->pci->dev, "L2 exit timeout\n");
+
+   return ret;
 }
 
 static int ls_pcie_host_init(struct dw_pcie_rp *pp)
@@ -180,6 +183,7 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
 static const struct ls_pcie_drvdata layerscape_drvdata = {
.pf_off = 0xc,
.pm_support = true,
+   .exit_from_l2 = ls_pcie_exit_from_l2,
 };
 
 static const struct of_device_id ls_pcie_of_match[] = {
@@ -247,11 +251,14 @@ static int ls_pcie_suspend_noirq(struct device *dev)
 static int ls_pcie_resume_noirq(struct device *dev)
 {
struct ls_pcie *pcie = dev_get_drvdata(dev);
+   int ret;
 
if (!pcie->drvdata->pm_support)
return 0;
 
-   ls_pcie_exit_from_l2(>pci->pp);
+   ret = pcie->drvdata->exit_from_l2(>pci->pp);
+   if (ret)
+   return ret;
 
return dw_pcie_resume_noirq(pcie->pci);
 }
-- 
2.34.1



[PATCH v3 0/4] PCI: layerscape: Add suspend/resume support for ls1043 and ls1021

2023-10-17 Thread Frank Li
Add suspend/resume support for ls1043 and ls1021.
Change log see each patch

Frank Li (4):
  PCI: layerscape: Add function pointer for exit_from_l2()
  PCI: layerscape: Add suspend/resume for ls1021a
  PCI: layerscape: Rename pf_* as pf_lut_*
  PCI: layerscape: Add suspend/resume for ls1043a

 drivers/pci/controller/dwc/pci-layerscape.c | 217 ++--
 1 file changed, 196 insertions(+), 21 deletions(-)

-- 
2.34.1



Re: [powerpc] kernel BUG fs/xfs/xfs_message.c:102! [4k block]

2023-10-17 Thread Sachin Sant



> On 13-Oct-2023, at 2:49 AM, Dave Chinner  wrote:
> 
> On Thu, Oct 12, 2023 at 03:51:04PM +0530, Sachin Sant wrote:
>> While running xfstests on a IBM Power10 server having xfs file system with
>> 4k block size following crash was seen
>> 
>> [ 1609.771548] run fstests xfs/238 at 2023-10-11 17:00:40
>> [ 1609.971214] XFS (sdb1): Mounting V5 Filesystem 
>> 1105d60c-9514-4e7a-af6a-632d99bf06ea
>> [ 1609.980693] XFS (sdb1): Ending clean mount
>> [ 1609.982166] xfs filesystem being mounted at /mnt/test supports timestamps 
>> until 2038-01-19 (0x7fff)
>> [ 1610.024793] XFS (sdb2): Mounting V5 Filesystem 
>> 60de8f0c-c80e-4a2a-b60a-f41a9cc0feca
>> [ 1610.030295] XFS (sdb2): Ending clean mount
>> [ 1610.031342] xfs filesystem being mounted at /mnt/scratch supports 
>> timestamps until 2038-01-19 (0x7fff)
>> [ 1610.087139] XFS: Assertion failed: bp->b_flags & XBF_DONE, file: 
>> fs/xfs/xfs_trans_buf.c, line: 241
>> [ 1610.087162] [ cut here ]
>> [ 1610.087165] kernel BUG at fs/xfs/xfs_message.c:102!
>> [ 1610.087168] Oops: Exception in kernel mode, sig: 5 [#1]
>> [ 1610.087171] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
>> [ 1610.087175] Modules linked in: overlay dm_zero dm_thin_pool 
>> dm_persistent_data dm_bio_prison dm_snapshot dm_bufio loop dm_flakey xfs 
>> dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet 
>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat 
>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding rfkill ip_set tls 
>> nf_tables libcrc32c nfnetlink pseries_rng vmx_crypto ext4 mbcache jbd2 
>> sd_mod t10_pi crc64_rocksoft crc64 sg ibmvscsi scsi_transport_srp ibmveth 
>> fuse [last unloaded: scsi_debug]
>> [ 1610.087220] CPU: 11 PID: 225897 Comm: kworker/11:46 Not tainted 
>> 6.6.0-rc5-gb8b05bc6d83c #1
>> [ 1610.087224] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
>> of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
>> [ 1610.087227] Workqueue: xfs-inodegc/sdb2 xfs_inodegc_worker [xfs]
>> [ 1610.087303] NIP: c00802857edc LR: c00802857ec8 CTR: 
>> 7fff
>> [ 1610.087306] REGS: c0002441b810 TRAP: 0700 Not tainted 
>> (6.6.0-rc5-gb8b05bc6d83c)
>> [ 1610.087309] MSR: 8282b033  CR: 
>> 28000424 XER: 0007
>> [ 1610.087318] CFAR: c00802857d1c IRQMASK: 0 
>> [ 1610.087318] GPR00: c00802857ec8 c0002441bab0 c00802a68300 
>> ffea 
>> [ 1610.087318] GPR04: 000a c0002441b9b0  
>> c008016c6c40 
>> [ 1610.087318] GPR08: ffc0 0001  
>> c0080285a3a8 
>> [ 1610.087318] GPR12: c08311d0 c0117fff1b00 c0197de8 
>> c936bf40 
>> [ 1610.087318] GPR16:    
>> c009d16d48b0 
>> [ 1610.087318] GPR20: c00079e80205 c0001cb52f80 c0001cb52fc0 
>> 8000 
>> [ 1610.087318] GPR24: 0001 c0002441bbc8 c000e77a7000 
>> c000209b7e00 
>> [ 1610.087318] GPR28: c0080279ae48 c008016b25f0 c0002441bc10 
>> c0002dabaee8 
>> [ 1610.087354] NIP [c00802857edc] assfail+0x54/0x74 [xfs]
>> [ 1610.087420] LR [c00802857ec8] assfail+0x40/0x74 [xfs]
>> [ 1610.087485] Call Trace:
>> [ 1610.087486] [c0002441bab0] [c00802857ec8] assfail+0x40/0x74 [xfs] 
>> (unreliable)
>> [ 1610.087551] [c0002441bb10] [c0080281cebc] 
>> xfs_trans_read_buf_map+0x384/0x590 [xfs]
>> [ 1610.087622] [c0002441bba0] [c0080279ae48] 
>> xfs_imap_to_bp+0x70/0xa8 [xfs]
>> [ 1610.087691] [c0002441bbf0] [c0080280c3ec] 
>> xfs_inode_item_precommit+0x244/0x320 [xfs]
>> [ 1610.087760] [c0002441bc60] [c008027f3034] 
>> xfs_trans_run_precommits+0xac/0x160 [xfs]
>> [ 1610.087830] [c0002441bcb0] [c008027f45b0] 
>> __xfs_trans_commit+0x68/0x430 [xfs]
>> [ 1610.087900] [c0002441bd20] [c008027dfc30] 
>> xfs_inactive_ifree+0x158/0x2a0 [xfs]
>> [ 1610.087969] [c0002441bdb0] [c008027dff84] 
>> xfs_inactive+0x20c/0x420 [xfs]
>> [ 1610.088037] [c0002441bdf0] [c008027ceb90] 
>> xfs_inodegc_worker+0x148/0x250 [xfs]
>> [ 1610.088106] [c0002441be40] [c0188130] 
>> process_scheduled_works+0x230/0x4f0
>> [ 1610.088113] [c0002441bf10] [c018b164] 
>> worker_thread+0x1e4/0x500
>> [ 1610.088118] [c0002441bf90] [c0197f18] kthread+0x138/0x140
>> [ 1610.088122] [c0002441bfe0] [c000df98] 
>> start_kernel_thread+0x14/0x18
>> [ 1610.088127] Code: e8a5ca38 7c641b78 3c62 e863ca48 f8010010 f821ffa1 
>> 4bfffd91 3d22 e929ca50 89290010 2f89 419e0008 <0fe0> 0fe0 
>> 38210060 e8010010 
>> [ 1610.088140] ---[ end trace  ]---
>> [ 1610.093928] sh (1070303): drop_caches: 3
>> [ 1610.095600] pstore: backend (nvram) writing error (-1)
>> [ 1610.095605]
>> 
>> xfs/238 test was executed when the crash was encountered.
>> Devices were formatted with 4k block size.
> 
> Yeah, I've seen 

Re: [PATCH] powerpc/64s/radix: Don't warn on copros in radix__tlb_flush()

2023-10-17 Thread Sachin Sant



> On 17-Oct-2023, at 5:45 PM, Michael Ellerman  wrote:
> 
> Sachin reported a warning when running the inject-ra-err selftest:
> 
>  # selftests: powerpc/mce: inject-ra-err
>  Disabling lock debugging due to kernel taint
>  MCE: CPU19: machine check (Severe)  Real address Load/Store (foreign/control 
> memory) [Not recovered]
>  MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [1e48]
>  MCE: CPU19: Initiator CPU
>  MCE: CPU19: Unknown
>  [ cut here ]
>  WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 
> radix__tlb_flush+0x160/0x180
>  CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G   ME  
> 6.6.0-rc3-00055-g9ed22ae6be81 #4
>  Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
> of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
>  ...
>  NIP radix__tlb_flush+0x160/0x180
>  LR  radix__tlb_flush+0x104/0x180
>  Call Trace:
>radix__tlb_flush+0xf4/0x180 (unreliable)
>tlb_finish_mmu+0x15c/0x1e0
>exit_mmap+0x1a0/0x510
>__mmput+0x60/0x1e0
>exit_mm+0xdc/0x170
>do_exit+0x2bc/0x5a0
>do_group_exit+0x4c/0xc0
>sys_exit_group+0x28/0x30
>system_call_exception+0x138/0x330
>system_call_vectored_common+0x15c/0x2ec
> 
> And bisected it to commit e43c0a0c3c28 ("powerpc/64s/radix: combine
> final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning
> in radix__tlb_flush() if mm->context.copros is still elevated.
> 
> However it's possible for the copros count to be elevated if a process
> exits without first closing file descriptors that are associated with a
> copro, eg. VAS.
> 
> If the process exits with a VAS file still open, the release callback
> is queued up for exit_task_work() via:
>  exit_files()
>put_files_struct()
>  close_files()
>filp_close()
>  fput()
> 
> And called via:
>  exit_task_work()
>fput()
>  __fput()
>file->f_op->release(inode, file)
>  coproc_release()
>vas_user_win_ops->close_win()
>  vas_deallocate_window()
>mm_context_remove_vas_window()
>  mm_context_remove_copro()
> 
> But that is after exit_mm() has been called from do_exit() and triggered
> the warning.
> 
> Fix it by dropping the warning, and always calling __flush_all_mm().
> 
> In the normal case of no copros, that will result in a call to
> _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code
> does.
> 
> If the copros count is elevated then it will cause a global flush, which
> should flush translations from any copros. Note that the process table
> entry was cleared in arch_exit_mmap(), so copros should not be able to
> fetch any new translations.
> 
> Fixes: e43c0a0c3c28 ("powerpc/64s/radix: combine final TLB flush and lazy tlb 
> mm shootdown IPIs")
> Reported-by: Sachin Sant 
> Closes: 
> https://lore.kernel.org/all/a8e52547-4bf1-47ce-8aea-bc5a9d7e3...@linux.ibm.com/
> Signed-off-by: Nicholas Piggin 
> Signed-off-by: Michael Ellerman 
> ---

Thanks for the fix. This fixes the reported problem.

Tested-by: Sachin Sant 

- Sachin


Re: [PATCH v8 0/3] generic and PowerPC SED Opal keystore

2023-10-17 Thread Jens Axboe


On Wed, 04 Oct 2023 15:19:54 -0500, gjo...@linux.vnet.ibm.com wrote:
> This patchset has gone through numerous rounds of review and
> all comments/suggetions have been addressed. The reviews have
> covered all relevant areas including reviews by block and keyring
> developers as well as the SED Opal maintainer.
> 
> TCG SED Opal is a specification from The Trusted Computing Group
> that allows self encrypting storage devices (SED) to be locked at
> power on and require an authentication key to unlock the drive.
> 
> [...]

Applied, thanks!

[1/3] block:sed-opal: SED Opal keystore
  commit: 96ff37ceb203426b1bcebbae42399686110b0130
[2/3] block: sed-opal: keystore access for SED Opal keys
  commit: 5dd339722f5f612f349b068e8da6d6710fd0e460
[3/3] powerpc/pseries: PLPKS SED Opal keystore support
  commit: ec8cf230ceccfcc2bd29990c2902be168a92dee4

Best regards,
-- 
Jens Axboe





Re: [PATCH v8 0/3] generic and PowerPC SED Opal keystore

2023-10-17 Thread Jens Axboe
On 10/17/23 9:09 AM, Greg Joyce wrote:
> 
> Hi Jens,
> 
> I've addressed all the comments/issues on v7 of the patchset and
> haven't received any feedback on v8. Is there anything else that you'd
> like to see before this can be included?

Let's give it another shot!

-- 
Jens Axboe




Re: [PATCH v8 0/3] generic and PowerPC SED Opal keystore

2023-10-17 Thread Greg Joyce


Hi Jens,

I've addressed all the comments/issues on v7 of the patchset and
haven't received any feedback on v8. Is there anything else that you'd
like to see before this can be included?

Thanks,
Greg

On Wed, 2023-10-04 at 15:19 -0500, gjo...@linux.vnet.ibm.com wrote:
> From: Greg Joyce 
> 
> This patchset has gone through numerous rounds of review and
> all comments/suggetions have been addressed. The reviews have
> covered all relevant areas including reviews by block and keyring
> developers as well as the SED Opal maintainer.
> 
> TCG SED Opal is a specification from The Trusted Computing Group
> that allows self encrypting storage devices (SED) to be locked at
> power on and require an authentication key to unlock the drive.
> 
> PowerPC/pseries versions of key functions provide read/write access
> to SED Opal keys in the PLPKS keystore.
> 
> The SED block driver has been modified to read the SED Opal
> keystore to populate a key in the SED Opal keyring. Changes to the
> SED Opal key will be written to the SED Opal keystore.
> 
> 
> Changelog
> v8: - rebased to 6.6-rc4
>   - fixed issues using clang (thanks Nathan Chancellor and Nick
> Desaulniers)
>   - fixed crash if PLPKS is not present for pseries (thanks
> Michael
> Ellerman)
> 
> v7: - rebased to for-6.5/block
> 
> v6: - squashed two commits (suggested by Andrew Donnellan)
> 
> v5: - updated to reflect changes in PLPKS API
> 
> v4:
> - scope reduced to cover just SED Opal keys
> - base SED Opal keystore is now in SED block driver
> - removed use of enum to indicate type
> - refactored common code into common function that read and
>   write use
> - removed cast to void
> - added use of SED Opal keystore functions to SED block
> driver
> 
> v3:
> - No code changes, but per reviewer requests, adding
> additional
>   mailing lists(keyring, EFI) for wider review.
> 
> v2:
> - Include feedback from Gregory Joyce, Eric Richter and
>   Murilo Opsfelder Araujo.
> - Include suggestions from Michael Ellerman.
> - Moved a dependency from generic SED code to this patchset.
>   This patchset now builds of its own.
> 
> 
> 
> 
> Greg Joyce (3):
>   block:sed-opal: SED Opal keystore
>   block: sed-opal: keystore access for SED Opal keys
>   powerpc/pseries: PLPKS SED Opal keystore support
> 
>  arch/powerpc/platforms/pseries/Kconfig|   6 +
>  arch/powerpc/platforms/pseries/Makefile   |   1 +
>  .../powerpc/platforms/pseries/plpks_sed_ops.c | 131
> ++
>  block/Kconfig |   1 +
>  block/sed-opal.c  |  18 ++-
>  include/linux/sed-opal-key.h  |  26 
>  6 files changed, 181 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/platforms/pseries/plpks_sed_ops.c
>  create mode 100644 include/linux/sed-opal-key.h
> 



Re: [PATCH 1/4] add generic builtin command line

2023-10-17 Thread Daniel Walker (danielwa)
On Tue, Oct 17, 2023 at 04:10:42PM +0530, Pratyush Brahma wrote:
> For such a usecase, the CONFIG_CMDLINE_PREPEND seems to be quite useful as
> it would help to stitch bootloader
> and the desired build variant's configs together. Can you please help to
> merge this patch?

Yes, your at least the second person that's asked for it, and it's been on my
list for some time to release again. I'll try to release it as soon as possible.

Daniel

Re: [powerpc] Kernel crash while running LTP (bisected)

2023-10-17 Thread Sachin Sant



> On 17-Oct-2023, at 4:35 PM, Lorenzo Stoakes  wrote:
> 
> On Tue, Oct 17, 2023 at 02:46:07PM +0530, Sachin Sant wrote:
>> While running LTP tests (getpid02) on a Power10 server booted with
>> 6.6.0-rc6-next-20231016 following crash was seen:
>> 
>> [   76.386628] Kernel attempted to read user page (d8) - exploit attempt? 
>> (uid: 0)
>> [   76.386649] BUG: Kernel NULL pointer dereference on read at 0x00d8
>> [   76.386653] Faulting instruction address: 0xc04cda90
>> [   76.386658] Oops: Kernel access of bad area, sig: 11 [#1]
> [snip]
>> 
>> Git bisect points to following patch
>> 
>> commit 1db41d29b79ad271674081c752961edd064bbbac
>>mm: perform the mapping_map_writable() check after call_mmap()
>> 
>> Reverting the patch allows the test to complete.
>> 
>> - Sachin
> 
> Hi Sachin,
> 
> Thanks for the report but this was triggered in another test previously and
> has been fixed already (apologies for the inconvenience!) see [0]. Andrew
> took the -fix patch and applied to mm-unstable, this should wend its way to
> -next in the meantime.

Ah, thank you. Yes the fix works for me.

- Sachin


Re: [RFC PATCH v6 09/11] media: uapi: Add audio rate controls support

2023-10-17 Thread Hans Verkuil
On 17/10/2023 15:11, Shengjiu Wang wrote:
> On Mon, Oct 16, 2023 at 9:16 PM Hans Verkuil  wrote:
>>
>> Hi Shengjiu,
>>
>> On 13/10/2023 10:31, Shengjiu Wang wrote:
>>> Fixed point controls are used by the user to configure
>>> the audio sample rate to driver.
>>>
>>> Add V4L2_CID_ASRC_SOURCE_RATE and V4L2_CID_ASRC_DEST_RATE
>>> new IDs for ASRC rate control.
>>>
>>> Signed-off-by: Shengjiu Wang 
>>> ---
>>>  .../userspace-api/media/v4l/common.rst|  1 +
>>>  .../media/v4l/ext-ctrls-fixed-point.rst   | 36 +++
>>>  .../media/v4l/vidioc-g-ext-ctrls.rst  |  4 +++
>>>  .../media/v4l/vidioc-queryctrl.rst|  7 
>>>  .../media/videodev2.h.rst.exceptions  |  1 +
>>>  drivers/media/v4l2-core/v4l2-ctrls-core.c |  5 +++
>>>  drivers/media/v4l2-core/v4l2-ctrls-defs.c |  4 +++
>>>  include/media/v4l2-ctrls.h|  2 ++
>>>  include/uapi/linux/v4l2-controls.h| 13 +++
>>>  include/uapi/linux/videodev2.h|  3 ++
>>>  10 files changed, 76 insertions(+)
>>>  create mode 100644 
>>> Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst
>>>
>>> diff --git a/Documentation/userspace-api/media/v4l/common.rst 
>>> b/Documentation/userspace-api/media/v4l/common.rst
>>> index ea0435182e44..35707edffb13 100644
>>> --- a/Documentation/userspace-api/media/v4l/common.rst
>>> +++ b/Documentation/userspace-api/media/v4l/common.rst
>>> @@ -52,6 +52,7 @@ applicable to all devices.
>>>  ext-ctrls-fm-rx
>>>  ext-ctrls-detect
>>>  ext-ctrls-colorimetry
>>> +ext-ctrls-fixed-point
>>
>> Rename this to ext-ctrls-audio-m2m.
>>
>>>  fourcc
>>>  format
>>>  planar-apis
>>> diff --git 
>>> a/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst 
>>> b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst
>>> new file mode 100644
>>> index ..2ef6e250580c
>>> --- /dev/null
>>> +++ b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst
>>> @@ -0,0 +1,36 @@
>>> +.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later
>>> +
>>> +.. _fixed-point-controls:
>>> +
>>> +***
>>> +Fixed Point Control Reference
>>
>> This is for audio controls. "Fixed Point" is just the type, and it doesn't 
>> make
>> sense to group fixed point controls. But it does make sense to group the 
>> audio
>> controls.
>>
>> V4L2 controls can be grouped into classes. Basically it is a way to put 
>> controls
>> into categories, and for each category there is also a control that gives a
>> description of the class (see 2.15.15 in
>> https://linuxtv.org/downloads/v4l-dvb-apis-new/driver-api/v4l2-controls.html#introduction)
>>
>> If you use e.g. 'v4l2-ctl -l' to list all the controls, then you will see 
>> that
>> they are grouped based on what class of control they are.
>>
>> So I think it would be a good idea to create a new control class for M2M 
>> audio controls,
>> instead of just adding them to the catch-all 'User Controls' class.
>>
>> Search e.g. for V4L2_CTRL_CLASS_COLORIMETRY and V4L2_CID_COLORIMETRY_CLASS 
>> to see how
>> it is done.
>>
>> M2M_AUDIO would probably be a good name for the class.
>>
>>> +***
>>> +
>>> +These controls are intended to support an asynchronous sample
>>> +rate converter.
>>
>> Add ' (ASRC).' at the end to indicate the common abbreviation for
>> that.
>>
>>> +
>>> +.. _v4l2-audio-asrc:
>>> +
>>> +``V4L2_CID_ASRC_SOURCE_RATE``
>>> +sets the resampler source rate.
>>> +
>>> +``V4L2_CID_ASRC_DEST_RATE``
>>> +sets the resampler destination rate.
>>
>> Document the unit (Hz) for these two controls.
>>
>>> +
>>> +.. c:type:: v4l2_ctrl_fixed_point
>>> +
>>> +.. cssclass:: longtable
>>> +
>>> +.. tabularcolumns:: |p{1.5cm}|p{5.8cm}|p{10.0cm}|
>>> +
>>> +.. flat-table:: struct v4l2_ctrl_fixed_point
>>> +:header-rows:  0
>>> +:stub-columns: 0
>>> +:widths:   1 1 2
>>> +
>>> +* - __u32
>>
>> Hmm, shouldn't this be __s32?
>>
>>> +  - ``integer``
>>> +  - integer part of fixed point value.
>>> +* - __s32
>>
>> and this __u32?
>>
>> You want to be able to use this generic type as a signed value.
>>
>>> +  - ``fractional``
>>> +  - fractional part of fixed point value, which is Q31.
>>> diff --git a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst 
>>> b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
>>> index f9f73530a6be..1811dabf5c74 100644
>>> --- a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
>>> +++ b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
>>> @@ -295,6 +295,10 @@ still cause this situation.
>>>- ``p_av1_film_grain``
>>>- A pointer to a struct :c:type:`v4l2_ctrl_av1_film_grain`. Valid if 
>>> this control is
>>>  of type ``V4L2_CTRL_TYPE_AV1_FILM_GRAIN``.
>>> +* - struct :c:type:`v4l2_ctrl_fixed_point` *
>>> +  - ``p_fixed_point``
>>> +  - A pointer to a struct 

Re: [RFC PATCH v6 09/11] media: uapi: Add audio rate controls support

2023-10-17 Thread Shengjiu Wang
On Mon, Oct 16, 2023 at 9:16 PM Hans Verkuil  wrote:
>
> Hi Shengjiu,
>
> On 13/10/2023 10:31, Shengjiu Wang wrote:
> > Fixed point controls are used by the user to configure
> > the audio sample rate to driver.
> >
> > Add V4L2_CID_ASRC_SOURCE_RATE and V4L2_CID_ASRC_DEST_RATE
> > new IDs for ASRC rate control.
> >
> > Signed-off-by: Shengjiu Wang 
> > ---
> >  .../userspace-api/media/v4l/common.rst|  1 +
> >  .../media/v4l/ext-ctrls-fixed-point.rst   | 36 +++
> >  .../media/v4l/vidioc-g-ext-ctrls.rst  |  4 +++
> >  .../media/v4l/vidioc-queryctrl.rst|  7 
> >  .../media/videodev2.h.rst.exceptions  |  1 +
> >  drivers/media/v4l2-core/v4l2-ctrls-core.c |  5 +++
> >  drivers/media/v4l2-core/v4l2-ctrls-defs.c |  4 +++
> >  include/media/v4l2-ctrls.h|  2 ++
> >  include/uapi/linux/v4l2-controls.h| 13 +++
> >  include/uapi/linux/videodev2.h|  3 ++
> >  10 files changed, 76 insertions(+)
> >  create mode 100644 
> > Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst
> >
> > diff --git a/Documentation/userspace-api/media/v4l/common.rst 
> > b/Documentation/userspace-api/media/v4l/common.rst
> > index ea0435182e44..35707edffb13 100644
> > --- a/Documentation/userspace-api/media/v4l/common.rst
> > +++ b/Documentation/userspace-api/media/v4l/common.rst
> > @@ -52,6 +52,7 @@ applicable to all devices.
> >  ext-ctrls-fm-rx
> >  ext-ctrls-detect
> >  ext-ctrls-colorimetry
> > +ext-ctrls-fixed-point
>
> Rename this to ext-ctrls-audio-m2m.
>
> >  fourcc
> >  format
> >  planar-apis
> > diff --git 
> > a/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst 
> > b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst
> > new file mode 100644
> > index ..2ef6e250580c
> > --- /dev/null
> > +++ b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst
> > @@ -0,0 +1,36 @@
> > +.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later
> > +
> > +.. _fixed-point-controls:
> > +
> > +***
> > +Fixed Point Control Reference
>
> This is for audio controls. "Fixed Point" is just the type, and it doesn't 
> make
> sense to group fixed point controls. But it does make sense to group the audio
> controls.
>
> V4L2 controls can be grouped into classes. Basically it is a way to put 
> controls
> into categories, and for each category there is also a control that gives a
> description of the class (see 2.15.15 in
> https://linuxtv.org/downloads/v4l-dvb-apis-new/driver-api/v4l2-controls.html#introduction)
>
> If you use e.g. 'v4l2-ctl -l' to list all the controls, then you will see that
> they are grouped based on what class of control they are.
>
> So I think it would be a good idea to create a new control class for M2M 
> audio controls,
> instead of just adding them to the catch-all 'User Controls' class.
>
> Search e.g. for V4L2_CTRL_CLASS_COLORIMETRY and V4L2_CID_COLORIMETRY_CLASS to 
> see how
> it is done.
>
> M2M_AUDIO would probably be a good name for the class.
>
> > +***
> > +
> > +These controls are intended to support an asynchronous sample
> > +rate converter.
>
> Add ' (ASRC).' at the end to indicate the common abbreviation for
> that.
>
> > +
> > +.. _v4l2-audio-asrc:
> > +
> > +``V4L2_CID_ASRC_SOURCE_RATE``
> > +sets the resampler source rate.
> > +
> > +``V4L2_CID_ASRC_DEST_RATE``
> > +sets the resampler destination rate.
>
> Document the unit (Hz) for these two controls.
>
> > +
> > +.. c:type:: v4l2_ctrl_fixed_point
> > +
> > +.. cssclass:: longtable
> > +
> > +.. tabularcolumns:: |p{1.5cm}|p{5.8cm}|p{10.0cm}|
> > +
> > +.. flat-table:: struct v4l2_ctrl_fixed_point
> > +:header-rows:  0
> > +:stub-columns: 0
> > +:widths:   1 1 2
> > +
> > +* - __u32
>
> Hmm, shouldn't this be __s32?
>
> > +  - ``integer``
> > +  - integer part of fixed point value.
> > +* - __s32
>
> and this __u32?
>
> You want to be able to use this generic type as a signed value.
>
> > +  - ``fractional``
> > +  - fractional part of fixed point value, which is Q31.
> > diff --git a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst 
> > b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
> > index f9f73530a6be..1811dabf5c74 100644
> > --- a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
> > +++ b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
> > @@ -295,6 +295,10 @@ still cause this situation.
> >- ``p_av1_film_grain``
> >- A pointer to a struct :c:type:`v4l2_ctrl_av1_film_grain`. Valid if 
> > this control is
> >  of type ``V4L2_CTRL_TYPE_AV1_FILM_GRAIN``.
> > +* - struct :c:type:`v4l2_ctrl_fixed_point` *
> > +  - ``p_fixed_point``
> > +  - A pointer to a struct :c:type:`v4l2_ctrl_fixed_point`. Valid if 
> > this control is
> > +of type 

Re: [PATCH] powerpc/paravirt: Improve vcpu_is_preempted

2023-10-17 Thread Aboorva Devarajan
On Mon, 2023-10-09 at 10:47 +0530, Srikar Dronamraju wrote:

Hi Srikar,

Benchmarked this patch on baremetal POWER9 node by launching KVM to
observe the improvements achieved in KVM with the patched kernel.
Below, you can find the schbench latency result comparision.

System was running on SMT4 mode with the below configuration:

Setup:

$ lscpu
Architecture:ppc64le
Byte Order:  Little Endian
CPU(s):  128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  16
Socket(s):   2
NUMA node(s):8
Model:   2.3 (pvr 004e 1203)
Model name:  POWER9, altivec supported
CPU max MHz: 3800.
CPU min MHz: 2300.
L1d cache:   32K
L1i cache:   32K
L2 cache:512K
L3 cache:10240K
NUMA node0 CPU(s):   0-63
NUMA node8 CPU(s):   64-127
NUMA node250 CPU(s):
NUMA node251 CPU(s):
NUMA node252 CPU(s):
NUMA node253 CPU(s):
NUMA node254 CPU(s):
NUMA node255 CPU(s):

- Baseline kernel  : v6.6.0-rc5
- Patched kernel   : v6.6.0-rc5 with patchset
- schbench version : upto commit 2eef44 (schbench: record the 
  execution time in the matrix multiplication mode)


Results:


These results shows the schbench latency on a patched kernel compared
to a baseline kernel on KVM. The numbers in the "compare%" column
represent the percentage difference between the latency measured on the
baseline kernel and the patched kernel. A negative percentage means the
patched kernel performs less optimially (higher latency) than the
baseline, while a positive percentage means it performs better (lower
latency).


Scenarios:
--


Case 1: No Noise

Host: Idle
KVM 1: Launched a KVM affined to 0-39 CPUs (40 CPUs)
KVM 1 (Workload) : ./schbench -m 20 -t 2 -r 30 (benchmark)

schbench latency (niter: 20)

percentile  compare% (avg)
(higher the better)

50.0th: -4.84
75.0th: -8.09
90.0th: -3.39
95.0th: +5.16
99.0th: +90.78
99.5th: +36.34
99.9th: +8.31



Case 2: With Noise: Over-commit case: Multiple KVM guests sharing the
same set of CPUs

Two KVM instances are launched, where one being benchmarked, and the
other executing a workload to introduce noise.

KVM 1: Launched a KVM affined to 0-39 CPUs (40 CPUs)
KVM 1 (Workload) : ./schbench -m 20 -t 2 -r 30 (benchmark)

KVM 2 (Noise): Launched a KVM affined to 0-39 CPUs 

schbench latency (niter: 20)

percentile  compare% (avg) 
(higher the better)

50.0th: -1.47
75.0th: -5.72
90.0th: +7.88
95.0th: +10.71
99.0th: +512.08
99.5th: +380.61
99.9th: +90.39



Case 3: Overlap case: Multiple KVM guests sharing a subset of CPUs.

Two KVM instances are launched, where one being benchmarked, and the
other executing a workload to introduce noise.

KVM 1: Launched a KVM affined to 0-39 CPUs (40 CPUs)
KVM 1 (Workload) : ./schbench -m 20 -t 2 -r 30 (benchmark)

KVM 2 (Noise): Launched a KVM affined to 0-19 CPUs

schbench latency (niter: 20)

percentile  compare% (avg)
(higher the better)

50.0th: -1.63
75.0th: -2.78
90.0th: +57.62
95.0th: +87.90
99.0th: +343.66
99.5th: +178.01
99.9th: +36.07



The above results demonstrate the effectiveness of the proposed
approach, which utilizes the idle-hint in lppaca to detect the
preempted vCPU more efficiently. This approach is beneficial for
improving schbench latency on KVM, particularly the tail latencies.

Thanks,
Aboorva

> PowerVM Hypervisor dispatches on a whole core basis. In a shared
> LPAR, a
> CPU from a core that is preempted may have a larger latency. In
> such a scenario, its preferable to choose a different CPU to run.
> 
> If one of the CPUs in the core is active, i.e neither CEDED nor
> preempted, then consider this CPU as not preempted.
> 
> Also if any of the CPUs in the core has yielded but OS has not
> requested
> CEDE or CONFER, then consider this CPU to be preempted.
> 
> Cc: Ajay Kaher 
> Cc: Alexey Makhalov 
> Cc: Christophe Leroy 
> Cc: Juergen Gross 
> Cc: linux-ker...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: virtualizat...@lists.linux-foundation.org
> Signed-off-by: Srikar Dronamraju 
> ---
>  arch/powerpc/include/asm/paravirt.h | 33 ++-
> --
>  1 file changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/paravirt.h
> b/arch/powerpc/include/asm/paravirt.h
> index e08513d73119..a980756f58df 100644
> --- a/arch/powerpc/include/asm/paravirt.h
> +++ b/arch/powerpc/include/asm/paravirt.h
> @@ -121,9 +121,19 @@ static inline bool vcpu_is_preempted(int cpu)
>   if (!is_shared_processor())
>   return 

[PATCH] powerpc/64s/radix: Don't warn on copros in radix__tlb_flush()

2023-10-17 Thread Michael Ellerman
Sachin reported a warning when running the inject-ra-err selftest:

  # selftests: powerpc/mce: inject-ra-err
  Disabling lock debugging due to kernel taint
  MCE: CPU19: machine check (Severe)  Real address Load/Store (foreign/control 
memory) [Not recovered]
  MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [1e48]
  MCE: CPU19: Initiator CPU
  MCE: CPU19: Unknown
  [ cut here ]
  WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 
radix__tlb_flush+0x160/0x180
  CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G   ME   
   6.6.0-rc3-00055-g9ed22ae6be81 #4
  Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1030.20 
(NH1030_058) hv:phyp pSeries
  ...
  NIP radix__tlb_flush+0x160/0x180
  LR  radix__tlb_flush+0x104/0x180
  Call Trace:
radix__tlb_flush+0xf4/0x180 (unreliable)
tlb_finish_mmu+0x15c/0x1e0
exit_mmap+0x1a0/0x510
__mmput+0x60/0x1e0
exit_mm+0xdc/0x170
do_exit+0x2bc/0x5a0
do_group_exit+0x4c/0xc0
sys_exit_group+0x28/0x30
system_call_exception+0x138/0x330
system_call_vectored_common+0x15c/0x2ec

And bisected it to commit e43c0a0c3c28 ("powerpc/64s/radix: combine
final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning
in radix__tlb_flush() if mm->context.copros is still elevated.

However it's possible for the copros count to be elevated if a process
exits without first closing file descriptors that are associated with a
copro, eg. VAS.

If the process exits with a VAS file still open, the release callback
is queued up for exit_task_work() via:
  exit_files()
put_files_struct()
  close_files()
filp_close()
  fput()

And called via:
  exit_task_work()
fput()
  __fput()
file->f_op->release(inode, file)
  coproc_release()
vas_user_win_ops->close_win()
  vas_deallocate_window()
mm_context_remove_vas_window()
  mm_context_remove_copro()

But that is after exit_mm() has been called from do_exit() and triggered
the warning.

Fix it by dropping the warning, and always calling __flush_all_mm().

In the normal case of no copros, that will result in a call to
_tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code
does.

If the copros count is elevated then it will cause a global flush, which
should flush translations from any copros. Note that the process table
entry was cleared in arch_exit_mmap(), so copros should not be able to
fetch any new translations.

Fixes: e43c0a0c3c28 ("powerpc/64s/radix: combine final TLB flush and lazy tlb 
mm shootdown IPIs")
Reported-by: Sachin Sant 
Closes: 
https://lore.kernel.org/all/a8e52547-4bf1-47ce-8aea-bc5a9d7e3...@linux.ibm.com/
Signed-off-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/book3s64/radix_tlb.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
b/arch/powerpc/mm/book3s64/radix_tlb.c
index 39acc2cbab4c..9e1f6558d026 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1212,14 +1212,7 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 
smp_mb(); /* see radix__flush_tlb_mm */
exit_flush_lazy_tlbs(mm);
-   _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL);
-
-   /*
-* It should not be possible to have coprocessors still
-* attached here.
-*/
-   if (WARN_ON_ONCE(atomic_read(>context.copros) > 0))
-   __flush_all_mm(mm, true);
+   __flush_all_mm(mm, true);
 
preempt_enable();
} else {
-- 
2.41.0



Re: [powerpc] Kernel crash while running LTP (bisected)

2023-10-17 Thread Lorenzo Stoakes
On Tue, Oct 17, 2023 at 02:46:07PM +0530, Sachin Sant wrote:
> While running LTP tests (getpid02) on a Power10 server booted with
> 6.6.0-rc6-next-20231016 following crash was seen:
>
> [   76.386628] Kernel attempted to read user page (d8) - exploit attempt? 
> (uid: 0)
> [   76.386649] BUG: Kernel NULL pointer dereference on read at 0x00d8
> [   76.386653] Faulting instruction address: 0xc04cda90
> [   76.386658] Oops: Kernel access of bad area, sig: 11 [#1]
[snip]
>
> Git bisect points to following patch
>
> commit 1db41d29b79ad271674081c752961edd064bbbac
> mm: perform the mapping_map_writable() check after call_mmap()
>
> Reverting the patch allows the test to complete.
>
> - Sachin

Hi Sachin,

Thanks for the report but this was triggered in another test previously and
has been fixed already (apologies for the inconvenience!) see [0]. Andrew
took the -fix patch and applied to mm-unstable, this should wend its way to
-next in the meantime.

[0]:https://lore.kernel.org/all/c9eb4cc6-7db4-4c2b-838d-43a0b319a4f0@lucifer.local/


Re: [PATCHv9 2/2] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt

2023-10-17 Thread Hari Bathini




On 17/10/23 7:58 am, Pingfan Liu wrote:

*** Idea ***
For kexec -p, the boot cpu can be not the cpu0, this causes the problem
of allocating memory for paca_ptrs[]. However, in theory, there is no
requirement to assign cpu's logical id as its present sequence in the
device tree. But there is something like cpu_first_thread_sibling(),
which makes assumption on the mapping inside a core. Hence partially
loosening the mapping, i.e. unbind the mapping of core while keep the
mapping inside a core.

*** Implement ***
At this early stage, there are plenty of memory to utilize. Hence, this
patch allocates interim memory to link the cpu info on a list, then
reorder cpus by changing the list head. As a result, there is a rotate
shift between the sequence number in dt and the cpu logical number.

*** Result ***
After this patch, a boot-cpu's logical id will always be mapped into the
range [0,threads_per_core).

Besides this, at this phase, all threads in the boot core are forced to
be onlined. This restriction will be lifted in a later patch with
extra effort.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: Sourabh Jain 
Cc: Hari Bathini 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org


Thanks for working on this, Pingfan.
Looks good to me.

Acked-by: Hari Bathini 


---
  arch/powerpc/kernel/prom.c | 25 +
  arch/powerpc/kernel/setup-common.c | 84 +++---
  2 files changed, 82 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..7ed9034912ca 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
  unsigned int boot_cpu_node_count __ro_after_init;
  #endif
  static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
  static int __initdata boot_cpu_count;
+#endif
  
  static int __init early_parse_mem(char *p)

  {
@@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
const __be32 *intserv;
int i, nthreads;
int len;
-   int found = -1;
-   int found_thread = 0;
+   bool found = false;
  
  	/* We are scanning "cpu" nodes only */

if (type == NULL || strcmp(type, "cpu") != 0)
@@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
for (i = 0; i < nthreads; i++) {
if (be32_to_cpu(intserv[i]) ==
fdt_boot_cpuid_phys(initial_boot_params)) {
-   found = boot_cpu_count;
-   found_thread = i;
+   /*
+* always map the boot-cpu logical id into the
+* range of [0, thread_per_core)
+*/
+   boot_cpuid = i;
+   found = true;
+   /* This forces all threads in a core to be online */
+   if (nr_cpu_ids % nthreads != 0)
+   set_nr_cpu_ids(ALIGN(nr_cpu_ids, nthreads));
}
  #ifdef CONFIG_SMP
/* logical cpu id is always 0 on UP kernels */
@@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
}
  
  	/* Not the boot CPU */

-   if (found < 0)
+   if (!found)
return 0;
  
-	DBG("boot cpu: logical %d physical %d\n", found,

-   be32_to_cpu(intserv[found_thread]));
-   boot_cpuid = found;
+   DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
+   be32_to_cpu(intserv[boot_cpuid]));
  
-	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);

+   boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
  
  	/*

 * PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 707f0490639d..9802c7e5ee2f 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
  
  u32 *cpu_to_phys_id = NULL;
  
+struct interrupt_server_node {

+   struct list_head node;
+   boolavail;
+   int len;
+   __be32 intserv[];
+};
+
  /**
   * setup_cpu_maps - initialize the following cpu maps:
   *  cpu_possible_mask
@@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
  void __init smp_setup_cpu_maps(void)
  {
struct device_node *dn;
-   int cpu = 0;
-   int nthreads = 1;
+   int shift = 0, cpu = 0;
+   int j, nthreads = 1;
+   int len;
+   struct interrupt_server_node *intserv_node, *n;
+   struct list_head *bt_node, head;
+   bool avail, found_boot_cpu = false;
  
  	DBG("smp_setup_cpu_maps()\n");
  
+	

[PATCH 1/4] kbuild: remove ARCH_POSTLINK from module builds

2023-10-17 Thread Masahiro Yamada
The '%.ko' rule in arch/*/Makefile.postlink does nothing but call the
'true' command.

Remove the meaningless code.

Signed-off-by: Masahiro Yamada 
---

 arch/mips/Makefile.postlink| 3 ---
 arch/powerpc/Makefile.postlink | 3 ---
 arch/riscv/Makefile.postlink   | 3 ---
 arch/x86/Makefile.postlink | 3 ---
 scripts/Makefile.modfinal  | 5 +
 5 files changed, 1 insertion(+), 16 deletions(-)

diff --git a/arch/mips/Makefile.postlink b/arch/mips/Makefile.postlink
index 34e3bd71f3b0..6cfdc149d3bc 100644
--- a/arch/mips/Makefile.postlink
+++ b/arch/mips/Makefile.postlink
@@ -31,9 +31,6 @@ ifeq ($(CONFIG_RELOCATABLE),y)
$(call if_changed,relocs)
 endif
 
-%.ko: FORCE
-   @true
-
 clean:
@true
 
diff --git a/arch/powerpc/Makefile.postlink b/arch/powerpc/Makefile.postlink
index 1f860b3c9bec..ae5a4256b03d 100644
--- a/arch/powerpc/Makefile.postlink
+++ b/arch/powerpc/Makefile.postlink
@@ -35,9 +35,6 @@ ifdef CONFIG_RELOCATABLE
$(call if_changed,relocs_check)
 endif
 
-%.ko: FORCE
-   @true
-
 clean:
rm -f .tmp_symbols.txt
 
diff --git a/arch/riscv/Makefile.postlink b/arch/riscv/Makefile.postlink
index a46fc578b30b..829b9abc91f6 100644
--- a/arch/riscv/Makefile.postlink
+++ b/arch/riscv/Makefile.postlink
@@ -36,9 +36,6 @@ ifdef CONFIG_RELOCATABLE
$(call if_changed,relocs_strip)
 endif
 
-%.ko: FORCE
-   @true
-
 clean:
@true
 
diff --git a/arch/x86/Makefile.postlink b/arch/x86/Makefile.postlink
index 936093d29160..fef2e977cc7d 100644
--- a/arch/x86/Makefile.postlink
+++ b/arch/x86/Makefile.postlink
@@ -34,9 +34,6 @@ ifeq ($(CONFIG_X86_NEED_RELOCS),y)
$(call cmd,strip_relocs)
 endif
 
-%.ko: FORCE
-   @true
-
 clean:
@rm -f $(OUT_RELOCS)/vmlinux.relocs
 
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index b3a6aa8fbe8c..8568d256d6fb 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -28,14 +28,11 @@ quiet_cmd_cc_o_c = CC [M]  $@
 %.mod.o: %.mod.c FORCE
$(call if_changed_dep,cc_o_c)
 
-ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
-
 quiet_cmd_ld_ko_o = LD [M]  $@
   cmd_ld_ko_o +=   \
$(LD) -r $(KBUILD_LDFLAGS)  \
$(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE)  \
-   -T scripts/module.lds -o $@ $(filter %.o, $^);  \
-   $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
+   -T scripts/module.lds -o $@ $(filter %.o, $^)
 
 quiet_cmd_btf_ko = BTF [M] $@
   cmd_btf_ko = \
-- 
2.40.1



Re: [PATCHv9 1/2] powerpc/setup : Enable boot_cpu_hwid for PPC32

2023-10-17 Thread Hari Bathini




On 17/10/23 7:58 am, Pingfan Liu wrote:

In order to identify the boot cpu, its intserv[] should be recorded and
checked in smp_setup_cpu_maps().

smp_setup_cpu_maps() is shared between PPC64 and PPC32. Since PPC64 has
already used boot_cpu_hwid to carry that information, enabling this
variable on PPC32 so later it can also be used to carry that information
for PPC32 in the coming patch.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: Sourabh Jain 
Cc: Hari Bathini 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org


LGTM.

Acked-by: Hari Bathini 


---
  arch/powerpc/include/asm/smp.h | 2 +-
  arch/powerpc/kernel/prom.c | 3 +--
  arch/powerpc/kernel/setup-common.c | 2 --
  3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 576d0e15..5db9178cc800 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -26,7 +26,7 @@
  #include 
  
  extern int boot_cpuid;

-extern int boot_cpu_hwid; /* PPC64 only */
+extern int boot_cpu_hwid;
  extern int spinning_secondaries;
  extern u32 *cpu_to_phys_id;
  extern bool coregroup_enabled;
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 0b5878c3125b..ec82f5bda908 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -372,8 +372,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
be32_to_cpu(intserv[found_thread]));
boot_cpuid = found;
  
-	if (IS_ENABLED(CONFIG_PPC64))

-   boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
+   boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
  
  	/*

 * PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 2f1026fba00d..707f0490639d 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -87,9 +87,7 @@ EXPORT_SYMBOL(machine_id);
  int boot_cpuid = -1;
  EXPORT_SYMBOL_GPL(boot_cpuid);
  
-#ifdef CONFIG_PPC64

  int boot_cpu_hwid = -1;
-#endif
  
  /*

   * These are used in binfmt_elf.c to put aux entries on the stack


[powerpc] Kernel crash while running LTP (bisected)

2023-10-17 Thread Sachin Sant
While running LTP tests (getpid02) on a Power10 server booted with
6.6.0-rc6-next-20231016 following crash was seen:

[   76.386628] Kernel attempted to read user page (d8) - exploit attempt? (uid: 
0)
[   76.386649] BUG: Kernel NULL pointer dereference on read at 0x00d8
[   76.386653] Faulting instruction address: 0xc04cda90
[   76.386658] Oops: Kernel access of bad area, sig: 11 [#1]
[   76.386661] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
[   76.386667] Modules linked in: rpadlpar_io rpaphp xsk_diag nft_fib_inet 
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 bonding rfkill tls ip_set nf_tables nfnetlink sunrpc pseries_rng 
vmx_crypto aes_gcm_p10_crypto binfmt_misc xfs libcrc32c sd_mod t10_pi sr_mod 
cdrom crc64_rocksoft crc64 sg ibmvscsi ibmveth scsi_transport_srp fuse
[   76.386709] CPU: 22 PID: 5763 Comm: getpid02 Kdump: loaded Not tainted 
6.6.0-rc6-next-20231016 #3
[   76.386713] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
[   76.386718] NIP:  c04cda90 LR: c04cd840 CTR: 
[   76.386721] REGS: c001f491b840 TRAP: 0300   Not tainted  
(6.6.0-rc6-next-20231016)
[   76.386724] MSR:  80009033   CR: 48082804  
XER: 
[   76.386733] CFAR: c04cd848 DAR: 00d8 DSISR: 4000 
IRQMASK: 0
[   76.386733] GPR00: c04cd840 c001f491bae0 c1471a00 

[   76.386733] GPR04: 00fb   
0001
[   76.386733] GPR08: 01c4 c001fb8aa830 c001e5140d00 
c001eccfac00
[   76.386733] GPR12: 001f c00e87bf7300  

[   76.386733] GPR16:    

[   76.386733] GPR20: 7fff9944  c001e86bdd60 
c001e86be8e0
[   76.386733] GPR24: 0001 0001 0001 

[   76.386733] GPR28: 00fb c001e5140d00 7fff9944 
c001fb8aa830
[   76.386773] NIP [c04cda90] mmap_region+0x8b0/0xb30
[   76.386781] LR [c04cd840] mmap_region+0x660/0xb30
[   76.386784] Call Trace:
[   76.386786] [c001f491bae0] [c04cd840] mmap_region+0x660/0xb30 
(unreliable)
[   76.386791] [c001f491bc10] [c04ce0dc] do_mmap+0x3cc/0x5c0
[   76.386794] [c001f491bca0] [c0486724] vm_mmap_pgoff+0x134/0x240
[   76.386800] [c001f491bd80] [c04c98a8] ksys_mmap_pgoff+0x158/0x2b0
[   76.386806] [c001f491bdf0] [c0011834] do_mmap2+0x54/0xc0
[   76.386811] [c001f491be10] [c0036624] 
system_call_exception+0x134/0x330
[   76.386817] [c001f491be50] [c000d6a0] 
system_call_common+0x160/0x2e4
[   76.386822] --- interrupt: c00 at 0x7fff9932ff68
[   76.386825] NIP:  7fff9932ff68 LR: 10005074 CTR: 
[   76.386828] REGS: c001f491be80 TRAP: 0c00   Not tainted  
(6.6.0-rc6-next-20231016)
[   76.386831] MSR:  8280f033   CR: 
24002204  XER: 
[   76.386840] IRQMASK: 0
[   76.386840] GPR00: 005a 7fffd709f9f0 7fff99407300 

[   76.386840] GPR04: 0004 0003 0021 

[   76.386840] GPR08:    

[   76.386840] GPR12:  7fff994ea3d0  

[   76.386840] GPR16:  10034498 10034be8 
100336a8
[   76.386840] GPR20: 10034ba8 0001 1007c418 
10033770
[   76.386840] GPR24:   10034bd0 
1007c438
[   76.386840] GPR28: 10061c88 7fffd70afed5 1007c438 
10033770
[   76.386876] NIP [7fff9932ff68] 0x7fff9932ff68
[   76.386879] LR [10005074] 0x10005074
[   76.386881] --- interrupt: c00
[   76.386883] Code: 73890008 4082012c e93f0020 3b00 fb7f0078 4bfffc74 
6000 6000 e87f0088 3b00 4b20 6000  39490044 
7d005028 3108  [   76.386896] ---[ end trace  ]---
[   76.388667] pstore: backend (nvram) writing error (-1)

Git bisect points to following patch

commit 1db41d29b79ad271674081c752961edd064bbbac
mm: perform the mapping_map_writable() check after call_mmap()

Reverting the patch allows the test to complete.

- Sachin

Re: [PATCH 2/3] PCI: layerscape: add suspend/resume for ls1021a

2023-10-17 Thread Manivannan Sadhasivam
On Mon, Oct 16, 2023 at 04:18:36PM -0400, Frank Li wrote:
> On Mon, Oct 16, 2023 at 10:28:24PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Sep 15, 2023 at 02:43:05PM -0400, Frank Li wrote:
> > > ls1021a add suspend/resume support.
> > > 
> > 
> > Please add what the driver is doing during suspend/resume.
> > 
> > > Signed-off-by: Frank Li 
> > > ---
> > >  drivers/pci/controller/dwc/pci-layerscape.c | 88 -
> > >  1 file changed, 87 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> > > b/drivers/pci/controller/dwc/pci-layerscape.c
> > > index 20c48c06e2248..bc5a8ff1a26ce 100644
> > > --- a/drivers/pci/controller/dwc/pci-layerscape.c
> > > +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> > > @@ -35,6 +35,12 @@
> > >  #define PF_MCR_PTOMR BIT(0)
> > >  #define PF_MCR_EXL2S BIT(1)
> > >  
> > > +/* LS1021A PEXn PM Write Control Register */
> > > +#define SCFG_PEXPMWRCR(idx)  (0x5c + (idx) * 0x64)
> > > +#define PMXMTTURNOFF BIT(31)
> > > +#define SCFG_PEXSFTRSTCR 0x190
> > > +#define PEXSR(idx)   BIT(idx)
> > > +
> > >  #define PCIE_IATU_NUM6
> > >  
> > >  struct ls_pcie_drvdata {
> > > @@ -48,6 +54,8 @@ struct ls_pcie {
> > >   struct dw_pcie *pci;
> > >   const struct ls_pcie_drvdata *drvdata;
> > >   void __iomem *pf_base;
> > > + struct regmap *scfg;
> > > + int index;
> > >   bool big_endian;
> > >  };
> > >  
> > > @@ -170,13 +178,91 @@ static int ls_pcie_host_init(struct dw_pcie_rp *pp)
> > >   return 0;
> > >  }
> > >  
> > > +static void ls1021a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> > > +{
> > > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > + struct ls_pcie *pcie = to_ls_pcie(pci);
> > > + u32 val;
> > > +
> > > + if (!pcie->scfg) {
> > 
> > Can this ever happen?
> > 
> > > + dev_dbg(pcie->pci->dev, "SYSCFG is NULL\n");
> > > + return;
> > > + }
> > > +
> > > + /* Send Turn_off message */
> > 
> > "Send PME_Turn_Off message"
> > 
> > > + regmap_read(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), );
> > > + val |= PMXMTTURNOFF;
> > > + regmap_write(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), val);
> > > +
> > > + /* There are not register to check ACK, so wait 
> > > PCIE_PME_TO_L2_TIMEOUT_US */
> > 
> > "There is no specific register to check for PME_To_Ack from endpoint. So on 
> > the
> > safe side, wait for PCIE_PME_TO_L2_TIMEOUT_US."
> > 
> > > + mdelay(PCIE_PME_TO_L2_TIMEOUT_US/1000);
> > > +
> > > + /* Clear Turn_off message */
> > 
> > "PME_Turn_off". But I'm not sure if this is really required. Are you doing 
> > it
> > because the layerspace hw implements the PME_Turn_Off bit as "level 
> > triggered"?
> 
> I am not sure how hardware implement this. But reference manual said:
>  
> PMXMTTURNOFF:
> Generate PM turnoff message for power management of PCI Express controllers.
> This bit should be cleared by software.
> 0 Clear PM turnoff (default)
> 1 Trigger PM turnoff
> 

Hmm, okay. Atleast add the below comment to make it understandable in the
future:

"Layerscape hardware reference manual recommends clearing the PMXMTTURNOFF bit
to complete the PME_Turn_Off handshake."

- Mani

> Frank
> 
> > 
> > > + regmap_read(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), );
> > > + val &= ~PMXMTTURNOFF;
> > > + regmap_write(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), val);
> > > +}
> > > +
> > > +static void ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
> > > +{
> > > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > + struct ls_pcie *pcie = to_ls_pcie(pci);
> > > + u32 val;
> > > +
> > 
> > A comment here would be good.
> > 
> > > + regmap_read(pcie->scfg, SCFG_PEXSFTRSTCR, );
> > > + val |= PEXSR(pcie->index);
> > > + regmap_write(pcie->scfg, SCFG_PEXSFTRSTCR, val);
> > > +
> > > + regmap_read(pcie->scfg, SCFG_PEXSFTRSTCR, );
> > > + val &= ~PEXSR(pcie->index);
> > > + regmap_write(pcie->scfg, SCFG_PEXSFTRSTCR, val);
> > > +}
> > > +
> > > +static int ls1021a_pcie_host_init(struct dw_pcie_rp *pp)
> > > +{
> > > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > + struct ls_pcie *pcie = to_ls_pcie(pci);
> > > + struct device *dev = pcie->pci->dev;
> > > + u32 index[2];
> > > + int ret;
> > > +
> > > + ret = ls_pcie_host_init(pp);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + pcie->scfg = syscon_regmap_lookup_by_phandle(dev->of_node, 
> > > "fsl,pcie-scfg");
> > > + if (IS_ERR(pcie->scfg)) {
> > > + ret = PTR_ERR(pcie->scfg);
> > > + dev_err(dev, "No syscfg phandle specified\n");
> > > + pcie->scfg = NULL;
> > > + return ret;
> > > + }
> > > +
> > > + ret = of_property_read_u32_array(dev->of_node, "fsl,pcie-scfg", index, 
> > > 2);
> > > + if (ret) {
> > > + pcie->scfg = NULL;
> > > + return ret;
> > > + }
> > > +
> > > + pcie->index = index[1];
> > > +
> > 
> > The above syscon parsing could be done conditionally during probe itself. 
> > There
> > is no need to do it 

Re: [PATCH v2 0/3] Add generic data patching functions

2023-10-17 Thread Benjamin Gray

On 17/10/23 5:39 pm, Christophe Leroy wrote:

Le 16/10/2023 à 07:01, Benjamin Gray a écrit :

Currently patch_instruction() bases the write length on the value being
written. If the value looks like a prefixed instruction it writes 8 bytes,
otherwise it writes 4 bytes. This makes it potentially buggy to use for
writing arbitrary data, as if you want to write 4 bytes but it decides to
write 8 bytes it may clobber the following memory or be unaligned and
trigger an oops if it tries to cross a page boundary.

To solve this, this series pulls out the size parameter to the 'top' of
the text patching logic, and propagates it through the various functions.

The two sizes supported are int and long; this allows for patching
instructions and pointers on both ppc32 and ppc64. On ppc32 these are the
same size, so care is taken to only use the size parameter on static
functions, so the compiler can optimise it out entirely. Unfortunately
GCC trips over its own feet here and won't optimise in a way that is
optimal for strict RWX (mpc85xx_smp_defconfig) and no RWX
(pmac32_defconfig).

In the first case, patch_memory() is very large and can only be inlined
if a single function calls it. While the source only calls it in
patch_instruction(), an earlier optimisation pass inlined
patch_instruction() into patch_branch(), so now there are 'two' references
to patch_memory() and it cannot be inlined into patch_instruction().
Instead patch_instruction() becomes a single branch directly to
patch_memory().

We can fix this by marking patch_instruction() as noinline, but this
prevents patch_memory() from being directly inlined into patch_branch()
when RWX is disabled and patch_memory() is very small.

It may be possible to avoid this by merging together patch_instruction()
and patch_memory() on ppc32, but the only way I can think to do this
without duplicating the implementation involves using the preprocessor
to change if is_dword is a parameter or a local variable, which is gross.


What about:

diff --git a/arch/powerpc/include/asm/code-patching.h
b/arch/powerpc/include/asm/code-patching.h
index 7c6056bb1706..af89ef450c93 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -72,7 +72,7 @@ static inline int create_branch(ppc_inst_t *instr,
const u32 *addr,
   int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
   unsigned long target, int flags);
   int patch_branch(u32 *addr, unsigned long target, int flags);
-int patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_memory(void *addr, unsigned long val, bool is_dword);
   int raw_patch_instruction(u32 *addr, ppc_inst_t instr);

   /*
@@ -87,24 +87,28 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t instr);

   #ifdef CONFIG_PPC64

-int patch_uint(void *addr, unsigned int val);
-int patch_ulong(void *addr, unsigned long val);
+int patch_instruction(u32 *addr, ppc_inst_t instr);

   #define patch_u64 patch_ulong

   #else

-static inline int patch_uint(u32 *addr, unsigned int val)
+static inline int patch_instruction(u32 *addr, ppc_inst_t instr)
   {
-   return patch_instruction(addr, ppc_inst(val));
+   return patch_memory(addr, ppc_inst_val(instr), false);
   }

+#endif
+
   static inline int patch_ulong(void *addr, unsigned long val)
   {
-   return patch_instruction(addr, ppc_inst(val));
+   return patch_memory(addr, val, true);
   }

-#endif
+static inline int patch_uint(void *addr, unsigned int val)
+{
+   return patch_memory(addr, val, false);
+}

   #define patch_u32 patch_uint

diff --git a/arch/powerpc/lib/code-patching.c
b/arch/powerpc/lib/code-patching.c
index 60289332412f..77418b2a4aa4 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -355,7 +355,7 @@ static int __do_patch_memory(void *addr, unsigned
long val, bool is_dword)
return err;
   }

-static int patch_memory(void *addr, unsigned long val, bool is_dword)
+int patch_memory(void *addr, unsigned long val, bool is_dword)
   {
int err;
unsigned long flags;
@@ -378,6 +378,7 @@ static int patch_memory(void *addr, unsigned long
val, bool is_dword)

return err;
   }
+NOKPROBE_SYMBOL(patch_memory)

   #ifdef CONFIG_PPC64

@@ -390,26 +391,6 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
   }
   NOKPROBE_SYMBOL(patch_instruction)

-int patch_uint(void *addr, unsigned int val)
-{
-   return patch_memory(addr, val, false);
-}
-NOKPROBE_SYMBOL(patch_uint)
-
-int patch_ulong(void *addr, unsigned long val)
-{
-   return patch_memory(addr, val, true);
-}
-NOKPROBE_SYMBOL(patch_ulong)
-
-#else
-
-int patch_instruction(u32 *addr, ppc_inst_t instr)
-{
-   return patch_memory(addr, ppc_inst_val(instr), false);
-}
-NOKPROBE_SYMBOL(patch_instruction)
-
   #endif

   int patch_branch(u32 *addr, unsigned long target, int flags)



Wouldn't every caller need to initialise the is_dword parameter in that 
case? It can't tell it's unused across a 

Re: [PATCH v2 0/3] Add generic data patching functions

2023-10-17 Thread Christophe Leroy


Le 16/10/2023 à 07:01, Benjamin Gray a écrit :
> Currently patch_instruction() bases the write length on the value being
> written. If the value looks like a prefixed instruction it writes 8 bytes,
> otherwise it writes 4 bytes. This makes it potentially buggy to use for
> writing arbitrary data, as if you want to write 4 bytes but it decides to
> write 8 bytes it may clobber the following memory or be unaligned and
> trigger an oops if it tries to cross a page boundary.
> 
> To solve this, this series pulls out the size parameter to the 'top' of
> the text patching logic, and propagates it through the various functions.
> 
> The two sizes supported are int and long; this allows for patching
> instructions and pointers on both ppc32 and ppc64. On ppc32 these are the
> same size, so care is taken to only use the size parameter on static
> functions, so the compiler can optimise it out entirely. Unfortunately
> GCC trips over its own feet here and won't optimise in a way that is
> optimal for strict RWX (mpc85xx_smp_defconfig) and no RWX
> (pmac32_defconfig).
> 
> In the first case, patch_memory() is very large and can only be inlined
> if a single function calls it. While the source only calls it in
> patch_instruction(), an earlier optimisation pass inlined
> patch_instruction() into patch_branch(), so now there are 'two' references
> to patch_memory() and it cannot be inlined into patch_instruction().
> Instead patch_instruction() becomes a single branch directly to
> patch_memory().
> 
> We can fix this by marking patch_instruction() as noinline, but this
> prevents patch_memory() from being directly inlined into patch_branch()
> when RWX is disabled and patch_memory() is very small.
> 
> It may be possible to avoid this by merging together patch_instruction()
> and patch_memory() on ppc32, but the only way I can think to do this
> without duplicating the implementation involves using the preprocessor
> to change if is_dword is a parameter or a local variable, which is gross.

What about:

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 7c6056bb1706..af89ef450c93 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -72,7 +72,7 @@ static inline int create_branch(ppc_inst_t *instr, 
const u32 *addr,
  int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
   unsigned long target, int flags);
  int patch_branch(u32 *addr, unsigned long target, int flags);
-int patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_memory(void *addr, unsigned long val, bool is_dword);
  int raw_patch_instruction(u32 *addr, ppc_inst_t instr);

  /*
@@ -87,24 +87,28 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t instr);

  #ifdef CONFIG_PPC64

-int patch_uint(void *addr, unsigned int val);
-int patch_ulong(void *addr, unsigned long val);
+int patch_instruction(u32 *addr, ppc_inst_t instr);

  #define patch_u64 patch_ulong

  #else

-static inline int patch_uint(u32 *addr, unsigned int val)
+static inline int patch_instruction(u32 *addr, ppc_inst_t instr)
  {
-   return patch_instruction(addr, ppc_inst(val));
+   return patch_memory(addr, ppc_inst_val(instr), false);
  }

+#endif
+
  static inline int patch_ulong(void *addr, unsigned long val)
  {
-   return patch_instruction(addr, ppc_inst(val));
+   return patch_memory(addr, val, true);
  }

-#endif
+static inline int patch_uint(void *addr, unsigned int val)
+{
+   return patch_memory(addr, val, false);
+}

  #define patch_u32 patch_uint

diff --git a/arch/powerpc/lib/code-patching.c 
b/arch/powerpc/lib/code-patching.c
index 60289332412f..77418b2a4aa4 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -355,7 +355,7 @@ static int __do_patch_memory(void *addr, unsigned 
long val, bool is_dword)
return err;
  }

-static int patch_memory(void *addr, unsigned long val, bool is_dword)
+int patch_memory(void *addr, unsigned long val, bool is_dword)
  {
int err;
unsigned long flags;
@@ -378,6 +378,7 @@ static int patch_memory(void *addr, unsigned long 
val, bool is_dword)

return err;
  }
+NOKPROBE_SYMBOL(patch_memory)

  #ifdef CONFIG_PPC64

@@ -390,26 +391,6 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
  }
  NOKPROBE_SYMBOL(patch_instruction)

-int patch_uint(void *addr, unsigned int val)
-{
-   return patch_memory(addr, val, false);
-}
-NOKPROBE_SYMBOL(patch_uint)
-
-int patch_ulong(void *addr, unsigned long val)
-{
-   return patch_memory(addr, val, true);
-}
-NOKPROBE_SYMBOL(patch_ulong)
-
-#else
-
-int patch_instruction(u32 *addr, ppc_inst_t instr)
-{
-   return patch_memory(addr, ppc_inst_val(instr), false);
-}
-NOKPROBE_SYMBOL(patch_instruction)
-
  #endif

  int patch_branch(u32 *addr, unsigned long target, int flags)


> 
> For now I've removed the noinline, because at least the compiler might
> get smarter in future and do 

Re: [PATCH v6 0/5] powerpc/bpf: use BPF prog pack allocator

2023-10-17 Thread Hari Bathini




On 16/10/23 5:37 pm, Daniel Borkmann wrote:

On 10/12/23 10:03 PM, Hari Bathini wrote:

Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support on powerpc.

Both bpf_arch_text_copy() & bpf_arch_text_invalidate() functions,
needed for this support depend on instruction patching in text area.
Currently, patch_instruction() supports patching only one instruction
at a time. The first patch introduces patch_instructions() function
to enable patching more than one instruction at a time. This helps in
avoiding performance degradation while JITing bpf programs.

Patches 2 & 3 implement the above mentioned arch specific functions
using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing
code. The last patch enables the use of BPF prog pack allocator on
powerpc and also, ensures cleanup is handled gracefully.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-s...@kernel.org/

Changes in v6:
* No changes in patches 2-5/5 except addition of Acked-by tags from Song.
* Skipped merging code path of patch_instruction() & patch_instructions()
   to avoid performance overhead observed on ppc32 with that.


I presume this will be routed via Michael?


Yes, Daniel. This can go via linuxppc tree.

Thanks
Hari


Re: [PATCH 0/2] Allow nesting of lazy MMU mode

2023-10-17 Thread Aneesh Kumar K.V
Erhard Furtner  writes:

> On Thu, 12 Oct 2023 20:54:13 +0100
> "Matthew Wilcox (Oracle)"  wrote:
>
>> Dave Woodhouse reported that we now nest calls to
>> arch_enter_lazy_mmu_mode().  That was inadvertent, but in principle we
>> should allow it.  On further investigation, Juergen already fixed it
>> for Xen, but didn't tell anyone.  Fix it for Sparc & PowerPC too.
>> This may or may not help fix the problem that Erhard reported.
>> 
>> Matthew Wilcox (Oracle) (2):
>>   powerpc: Allow nesting of lazy MMU mode
>>   sparc: Allow nesting of lazy MMU mode
>> 
>>  arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 5 ++---
>>  arch/sparc/mm/tlb.c| 5 ++---
>>  2 files changed, 4 insertions(+), 6 deletions(-)
>> 
>> -- 
>> 2.40.1
>
> Applied the patch on top of v6.6-rc5 but unfortunately it did not fix my 
> reported issue.
>
> Regards,
> Erhard
> 

With the problem reported I guess we are finding the page->compound_head
wrong and hence folio->flags PG_dcache_clean check crashing. I still
don't know why we find page->compound_head wrong. Michael noted we are
using FLAT_MEM. That implies we are suppose to inialize struct page correctly
via init_unavailable_range because we are hitting this on an ioremap
address. We need to instrument the kernel to track the initialization of
the struct page backing these pfns which we know is crashing.

W.r.t arch_enter_lazy_mmu_mode() we can skip that completely on powerpc
because we don't allow the usage of set_pte on a valid pte entries. pte
updates are not done via set_pte interface and hence there is no TLB
invalidate required while using set_pte(). 

ie, we can do something like below. The change also make sure we call
set_pte_filter on all the ptes we are setting via set_ptes(). I haven't
sent this as a proper patch because we still are not able to fix the
issue Erhard reported. 

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 3ba9fe411604..95ab20cca2da 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, 
pte_t *ptep,
pte_t pte, unsigned int nr)
 {
/*
-* Make sure hardware valid bit is not set. We don't do
-* tlb flush for this update.
+* We don't need to call arch_enter/leave_lazy_mmu_mode()
+* because we expect set_ptes to be only be used on not present
+* and not hw_valid ptes. Hence there is not translation cache flush
+* involved that need to be batched.
 */
-   VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
+   for (;;) {
 
-   /* Note: mm->context.id might not yet have been assigned as
-* this context might not have been activated yet when this
-* is called.
-*/
-   pte = set_pte_filter(pte);
+   /*
+* Make sure hardware valid bit is not set. We don't do
+* tlb flush for this update.
+*/
+   VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-   /* Perform the setting of the PTE */
-   arch_enter_lazy_mmu_mode();
-   for (;;) {
+   /* Note: mm->context.id might not yet have been assigned as
+* this context might not have been activated yet when this
+* is called.
+*/
+   pte = set_pte_filter(pte);
+
+   /* Perform the setting of the PTE */
__set_pte_at(mm, addr, ptep, pte, 0);
if (--nr == 0)
break;
ptep++;
-   pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
addr += PAGE_SIZE;
+   /* increment the pfn */
+   pte = __pte(pte_val(pte) + PAGE_SIZE);
+
}
-   arch_leave_lazy_mmu_mode();
 }
 
 void unmap_kernel_page(unsigned long va)