Re: [PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-14 Thread Oscar Salvador
On Tue, May 14, 2024 at 04:04:43PM +0200, Björn Töpel wrote:
> From: Björn Töpel 
> 
> During memory hot remove, the ptdump functionality can end up touching
> stale data. Avoid any potential crashes (or worse), by holding the
> memory hotplug read-lock while traversing the page table.
> 
> This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
> Hold memory hotplug lock while walking for kernel page table dump").
> 
> Signed-off-by: Björn Töpel 

Reviewed-by: Oscar Salvador 

funny enough, it seems arm64 and riscv are the only ones holding the
hotplug lock here.
I think we have the same problem on the other arches as well (at least
on x86_64 that I can see).

If we happen to finally need the lock in those, I would rather have a
centric function in the generic mm code with the locking and then
calling an arch specific ptdump_show function, so the lock is not
scattered. But that is another story.

 

-- 
Oscar Salvador
SUSE Labs



Re: [PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-14 Thread David Hildenbrand

On 14.05.24 16:04, Björn Töpel wrote:

From: Björn Töpel 

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Signed-off-by: Björn Töpel 
---
  arch/riscv/mm/ptdump.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
  
  static int ptdump_show(struct seq_file *m, void *v)

  {
+   get_online_mems();
ptdump_walk(m, m->private);
+   put_online_mems();
  
  	return 0;

  }


Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb




[PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-14 Thread Björn Töpel
From: Björn Töpel 

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Signed-off-by: Björn Töpel 
---
 arch/riscv/mm/ptdump.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
+   get_online_mems();
ptdump_walk(m, m->private);
+   put_online_mems();
 
return 0;
 }
-- 
2.40.1




Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-05-13 Thread Michal Koutný
On Mon, Apr 08, 2024 at 04:58:18PM GMT, Michal Koutný  wrote:
> The kernel provides mechanisms, while it should not imply policies --
> default pid_max seems to be an example of the policy that does not fit
> all. At the same time pid_max must have some value assigned, so use the
> end of the allowed range -- pid_max_max.
> 
> This change thus increases initial pid_max from 32k to 4M (x86_64
> defconfig).

Out of curiosity I dug out the commit
acdc721fe26d ("[PATCH] pid-max-2.5.33-A0") v2.5.34~5
that introduced the 32k default. The commit message doesn't say why such
a sudden change though.
Previously, the limit was 1G of pids (i.e. effectively no default limit
like the intention of this series).

Honestly, I expected more enthusiasm or reasons against removing the
default value of pid_max. Is this really not of interest to anyone?

(Thanks, Andrew, for your responses. I don't plan to pursue this further
should there be no more interest in having less default limit values in
kernel.)

Regards,
Michal


signature.asc
Description: PGP signature


Re: [PATCH v2 5/6] kbuild: generate modules.builtin.ranges when linking the kernel

2024-05-12 Thread Masahiro Yamada
On Sun, May 12, 2024 at 7:44 AM Kris Van Hees  wrote:
>
> Signed-off-by: Kris Van Hees 
> Reviewed-by: Nick Alcock 
> ---
> Changes since v1:
>  - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
> ---
>  scripts/Makefile.vmlinux | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index c9f3e03124d7f..54095d72f7fd7 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -36,6 +36,23 @@ targets += vmlinux
>  vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
> +$(call if_changed_dep,link_vmlinux)
>
> +# module.builtin.ranges
> +# ---
> +ifdef CONFIG_BUILTIN_MODULE_RANGES
> +__default: modules.builtin.ranges
> +
> +quiet_cmd_modules_builtin_ranges = GEN $@
> +  cmd_modules_builtin_ranges = \
> +   $(srctree)/scripts/generate_builtin_ranges.awk \
> + $(filter-out FORCE,$+) > $@


$(filter-out FORCE,$+)

  ->

$(real-prereqs)



> +
> +vmlinux.map: vmlinux
> +
> +targets += modules.builtin.ranges
> +modules.builtin.ranges: modules.builtin.objs vmlinux.map vmlinux.o.map FORCE
> +   $(call if_changed,modules_builtin_ranges)
> +endif
> +
>  # Add FORCE to the prequisites of a target to force it to be always rebuilt.
>  # ---
>
> --
> 2.43.0
>
>


-- 
Best Regards
Masahiro Yamada



Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-05-12 Thread Steev Klimaszewski
On Sat, May 11, 2024 at 4:56 PM Dmitry Baryshkov
 wrote:
>
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware or
> for properly starting up the UCSI / altmode / battery manager support.
>
> The existing userspace implementation has several issue. It doesn't play
> well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> firmware location is changed (or if the firmware was not available at
> the time pd-mapper was started but the corresponding directory is
> mounted later), etc.
>
> However this configuration is largely static and common between
> different platforms. Provide in-kernel service implementing static
> per-platform data.
>
> To: Bjorn Andersson 
> To: Konrad Dybcio 
> To: Sibi Sankar 
> To: Mathieu Poirier 
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-remotep...@vger.kernel.org
> Cc: Johan Hovold 
> Cc: Xilin Wu 
> Cc: "Bryan O'Donoghue" 
> Cc: Steev Klimaszewski 
> Cc: Alexey Minnekhanov 
>
> --
>
> Changes in v8:
> - Reworked pd-mapper to register as an rproc_subdev / auxdev
> - Dropped Tested-by from Steev and Alexey from the last patch since the
>   implementation was changed significantly.
> - Add sensors, cdsp and mpss_root domains to 660 config (Alexey
>   Minnekhanov)
> - Added platform entry for sm4250 (used for qrb4210 / RB2)
> - Added locking to the pdr_get_domain_list() (Chris Lew)
> - Remove the call to qmi_del_server() and corresponding API (Chris Lew)
> - In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
> - Link to v7: 
> https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org
>
> Changes in v7:
> - Fixed modular build (Steev)
> - Link to v6: 
> https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
>
> Changes in v6:
> - Reworked mutex to fix lockdep issue on deregistration
> - Fixed dependencies between PD-mapper and remoteproc to fix modular
>   builds (Krzysztof)
> - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> - Fixed kerneldocs (Krzysztof)
> - Removed extra pr_debug messages (Krzysztof)
> - Fixed wcss build (Krzysztof)
> - Added platforms which do not require protection domain mapping to
>   silence the notice on those platforms
> - Link to v5: 
> https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
>
> Changes in v5:
> - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris 
> Lew)
> - pd_mapper: reworked to provide static configuration per platform
>   (Bjorn)
> - Link to v4: 
> https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
>
> Changes in v4:
> - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> - Added configuration for sm6350 (Thanks to Luca)
> - Removed RFC tag (Konrad)
> - Link to v3: 
> https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
>
> Changes in RFC v3:
> - Send start / stop notifications when PD-mapper domain list is changed
> - Reworked the way PD-mapper treats protection domains, register all of
>   them in a single batch
> - Added SC7180 domains configuration based on TCL Book 14 GO
> - Link to v2: 
> https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
>
> Changes in RFC v2:
> - Swapped num_domains / domains (Konrad)
> - Fixed an issue with battery not working on sc8280xp
> - Added missing configuration for QCS404
>
> ---
> Dmitry Baryshkov (5):
>   soc: qcom: pdr: protect locator_addr with the main mutex
>   soc: qcom: pdr: fix parsing of domains lists
>   soc: qcom: pdr: extract PDR message marshalling data
>   soc: qcom: add pd-mapper implementation
>   remoteproc: qcom: enable in-kernel PD mapper
>
>  drivers/remoteproc/qcom_common.c|  87 +
>  drivers/remoteproc/qcom_common.h|  10 +
>  drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
>  drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
>  drivers/soc/qcom/Kconfig|  15 +
>  drivers/soc/qcom/Makefile   |   2 +
>  drivers/soc/qcom/pdr_interface.c|  17 +-
>  drivers/soc/qcom/pdr_internal.h | 318 ++---
>  drivers/soc/qcom/qcom_pd_mapper.c   | 676 
> 
>  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
>  12 files changed, 1190 insertions(+), 300 deletions(-)
> ---
> base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
> change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
>
> Best regards,
> --
> Dmitry Baryshkov 
>

I've tested this over the weekend on my Thinkpad X13s with a number of
reboots and seems to do the correct thing in v8 as well.
Tested-by: Steev Klimaszewski 



[PATCH v2 5/6] kbuild: generate modules.builtin.ranges when linking the kernel

2024-05-11 Thread Kris Van Hees
Signed-off-by: Kris Van Hees 
Reviewed-by: Nick Alcock 
---
Changes since v1:
 - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
---
 scripts/Makefile.vmlinux | 17 +
 1 file changed, 17 insertions(+)

diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index c9f3e03124d7f..54095d72f7fd7 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -36,6 +36,23 @@ targets += vmlinux
 vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
+$(call if_changed_dep,link_vmlinux)
 
+# module.builtin.ranges
+# ---
+ifdef CONFIG_BUILTIN_MODULE_RANGES
+__default: modules.builtin.ranges
+
+quiet_cmd_modules_builtin_ranges = GEN $@
+  cmd_modules_builtin_ranges = \
+   $(srctree)/scripts/generate_builtin_ranges.awk \
+ $(filter-out FORCE,$+) > $@
+
+vmlinux.map: vmlinux
+
+targets += modules.builtin.ranges
+modules.builtin.ranges: modules.builtin.objs vmlinux.map vmlinux.o.map FORCE
+   $(call if_changed,modules_builtin_ranges)
+endif
+
 # Add FORCE to the prequisites of a target to force it to be always rebuilt.
 # ---
 
-- 
2.43.0




[PATCH v8 5/5] remoteproc: qcom: enable in-kernel PD mapper

2024-05-11 Thread Dmitry Baryshkov
Request in-kernel protection domain mapper to be started before starting
Qualcomm DSP and release it once DSP is stopped. Once all DSPs are
stopped, the PD mapper will be stopped too.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/remoteproc/qcom_common.c| 87 +
 drivers/remoteproc/qcom_common.h| 10 +
 drivers/remoteproc/qcom_q6v5_adsp.c |  3 ++
 drivers/remoteproc/qcom_q6v5_mss.c  |  3 ++
 drivers/remoteproc/qcom_q6v5_pas.c  |  3 ++
 drivers/remoteproc/qcom_q6v5_wcss.c |  3 ++
 6 files changed, 109 insertions(+)

diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index 03e5f5d533eb..8c8688f99f0a 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -25,6 +26,7 @@
 #define to_glink_subdev(d) container_of(d, struct qcom_rproc_glink, subdev)
 #define to_smd_subdev(d) container_of(d, struct qcom_rproc_subdev, subdev)
 #define to_ssr_subdev(d) container_of(d, struct qcom_rproc_ssr, subdev)
+#define to_pdm_subdev(d) container_of(d, struct qcom_rproc_pdm, subdev)
 
 #define MAX_NUM_OF_SS   10
 #define MAX_REGION_NAME_LENGTH  16
@@ -519,5 +521,90 @@ void qcom_remove_ssr_subdev(struct rproc *rproc, struct 
qcom_rproc_ssr *ssr)
 }
 EXPORT_SYMBOL_GPL(qcom_remove_ssr_subdev);
 
+static void pdm_dev_release(struct device *dev)
+{
+   struct auxiliary_device *adev = to_auxiliary_dev(dev);
+
+   kfree(adev);
+}
+
+static int pdm_notify_prepare(struct rproc_subdev *subdev)
+{
+   struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev);
+   struct auxiliary_device *adev;
+   int ret;
+
+   adev = kzalloc(sizeof(*adev), GFP_KERNEL);
+   if (!adev)
+   return -ENOMEM;
+
+   adev->dev.parent = pdm->dev;
+   adev->dev.release = pdm_dev_release;
+   adev->name = "pd-mapper";
+   adev->id = pdm->index;
+
+   ret = auxiliary_device_init(adev);
+   if (ret) {
+   kfree(adev);
+   return ret;
+   }
+
+   ret = auxiliary_device_add(adev);
+   if (ret) {
+   auxiliary_device_uninit(adev);
+   return ret;
+   }
+
+   pdm->adev = adev;
+
+   return 0;
+}
+
+
+static void pdm_notify_unprepare(struct rproc_subdev *subdev)
+{
+   struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev);
+
+   if (!pdm->adev)
+   return;
+
+   auxiliary_device_delete(pdm->adev);
+   auxiliary_device_uninit(pdm->adev);
+   pdm->adev = NULL;
+}
+
+/**
+ * qcom_add_pdm_subdev() - register PD Mapper subdevice
+ * @rproc: rproc handle
+ * @pdm:   PDM subdevice handle
+ *
+ * Register @pdm so that Protection Device mapper service is started when the
+ * DSP is started too.
+ */
+void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm)
+{
+   pdm->dev = >dev;
+   pdm->index = rproc->index;
+
+   pdm->subdev.prepare = pdm_notify_prepare;
+   pdm->subdev.unprepare = pdm_notify_unprepare;
+
+   rproc_add_subdev(rproc, >subdev);
+}
+EXPORT_SYMBOL_GPL(qcom_add_pdm_subdev);
+
+/**
+ * qcom_remove_pdm_subdev() - remove PD Mapper subdevice
+ * @rproc: rproc handle
+ * @pdm:   PDM subdevice handle
+ *
+ * Remove the PD Mapper subdevice.
+ */
+void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm)
+{
+   rproc_remove_subdev(rproc, >subdev);
+}
+EXPORT_SYMBOL_GPL(qcom_remove_pdm_subdev);
+
 MODULE_DESCRIPTION("Qualcomm Remoteproc helper driver");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/remoteproc/qcom_common.h b/drivers/remoteproc/qcom_common.h
index 9ef4449052a9..b07fbaa091a0 100644
--- a/drivers/remoteproc/qcom_common.h
+++ b/drivers/remoteproc/qcom_common.h
@@ -34,6 +34,13 @@ struct qcom_rproc_ssr {
struct qcom_ssr_subsystem *info;
 };
 
+struct qcom_rproc_pdm {
+   struct rproc_subdev subdev;
+   struct device *dev;
+   int index;
+   struct auxiliary_device *adev;
+};
+
 void qcom_minidump(struct rproc *rproc, unsigned int minidump_id,
void (*rproc_dumpfn_t)(struct rproc *rproc,
struct rproc_dump_segment *segment, void *dest, 
size_t offset,
@@ -52,6 +59,9 @@ void qcom_add_ssr_subdev(struct rproc *rproc, struct 
qcom_rproc_ssr *ssr,
 const char *ssr_name);
 void qcom_remove_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr);
 
+void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm);
+void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm);
+
 #if IS_ENABLED(CONFIG_QCOM_SYSMON)
 struct qcom_sysmon *qcom_add_sysmon_subdev(struct rproc *rproc,
   const char *name,
diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
b/drivers/remoteproc/qcom_q6v5_a

[PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-05-11 Thread Dmitry Baryshkov
Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issue. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold 
Cc: Xilin Wu 
Cc: "Bryan O'Donoghue" 
Cc: Steev Klimaszewski 
Cc: Alexey Minnekhanov 

--

Changes in v8:
- Reworked pd-mapper to register as an rproc_subdev / auxdev
- Dropped Tested-by from Steev and Alexey from the last patch since the
  implementation was changed significantly.
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey
  Minnekhanov)
- Added platform entry for sm4250 (used for qrb4210 / RB2)
- Added locking to the pdr_get_domain_list() (Chris Lew)
- Remove the call to qmi_del_server() and corresponding API (Chris Lew)
- In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
- Link to v7: 
https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: 
https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
  builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
  silence the notice on those platforms
- Link to v5: 
https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
  (Bjorn)
- Link to v4: 
https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: 
https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
  them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

---
Dmitry Baryshkov (5):
  soc: qcom: pdr: protect locator_addr with the main mutex
  soc: qcom: pdr: fix parsing of domains lists
  soc: qcom: pdr: extract PDR message marshalling data
  soc: qcom: add pd-mapper implementation
  remoteproc: qcom: enable in-kernel PD mapper

 drivers/remoteproc/qcom_common.c|  87 +
 drivers/remoteproc/qcom_common.h|  10 +
 drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
 drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
 drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
 drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
 drivers/soc/qcom/Kconfig|  15 +
 drivers/soc/qcom/Makefile   |   2 +
 drivers/soc/qcom/pdr_interface.c|  17 +-
 drivers/soc/qcom/pdr_internal.h | 318 ++---
 drivers/soc/qcom/qcom_pd_mapper.c   | 676 
 drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
 12 files changed, 1190 insertions(+), 300 deletions(-)
---
base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,
-- 
Dmitry Baryshkov 




Re: kernel BUG in ptr_stale

2024-05-09 Thread Kent Overstreet
On Thu, May 09, 2024 at 02:26:24PM +0800, Ubisectech Sirius wrote:
> Hello.
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
> Recently, our team has discovered a issue in Linux kernel 6.7. Attached to 
> the email were a PoC file of the issue.

This (and several of your others) are fixed in Linus's tree.

> 
> Stack dump:
> 
> bcachefs (loop1): mounting version 1.7: (unknown version) 
> opts=metadata_checksum=none,data_checksum=none,nojournal_transaction_names
> ----[ cut here ]
> kernel BUG at fs/bcachefs/buckets.h:114!
> invalid opcode:  [#1] PREEMPT SMP KASAN NOPTI
> CPU: 1 PID: 9472 Comm: syz-executor.1 Not tainted 6.7.0 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 
> 04/01/2014
> RIP: 0010:bucket_gen fs/bcachefs/buckets.h:114 [inline]
> RIP: 0010:ptr_stale+0x474/0x4e0 fs/bcachefs/buckets.h:188
> Code: 48 c7 c2 80 8c 1b 8b be 67 00 00 00 48 c7 c7 e0 8c 1b 8b c6 05 ea a6 72 
> 0b 01 e8 57 55 9c fd e9 fb fc ff ff e8 9d 02 bd fd 90 <0f> 0b 48 89 04 24 e8 
> 31 bb 13 fe 48 8b 04 24 e9 35 fc ff ff e8 23
> RSP: 0018:c90007c4ec38 EFLAGS: 00010246
> RAX: 0004 RBX: 0080 RCX: c90002679000
> RDX: 0004 RSI: 83ccf3b3 RDI: 0006
> RBP:  R08: 0006 R09: 1028
> R10: 0080 R11:  R12: 1028
> R13: 88804dee5100 R14:  R15: 88805b1a4110
> FS:  7f79ba8ab640() GS:88807ec0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f0bbda3f000 CR3: 5f37a000 CR4: 00750ef0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> PKRU: 5554
> Call Trace:
>  
>  bch2_bkey_ptrs_to_text+0xb4e/0x1760 fs/bcachefs/extents.c:1012
>  bch2_btree_ptr_v2_to_text+0x288/0x330 fs/bcachefs/extents.c:215
>  bch2_val_to_text fs/bcachefs/bkey_methods.c:287 [inline]
>  bch2_bkey_val_to_text+0x1c8/0x210 fs/bcachefs/bkey_methods.c:297
>  journal_validate_key+0x7ab/0xb50 fs/bcachefs/journal_io.c:322
>  journal_entry_btree_root_validate+0x31c/0x380 fs/bcachefs/journal_io.c:411
>  bch2_journal_entry_validate+0xc7/0x130 fs/bcachefs/journal_io.c:752
>  bch2_sb_clean_validate_late+0x14b/0x1e0 fs/bcachefs/sb-clean.c:32
>  bch2_read_superblock_clean+0xbb/0x250 fs/bcachefs/sb-clean.c:160
>  bch2_fs_recovery+0x113/0x52d0 fs/bcachefs/recovery.c:691
>  bch2_fs_start+0x365/0x5e0 fs/bcachefs/super.c:978
>  bch2_fs_open+0x1ac9/0x3890 fs/bcachefs/super.c:1968
>  bch2_mount+0x538/0x13c0 fs/bcachefs/fs.c:1863
>  legacy_get_tree+0x109/0x220 fs/fs_context.c:662
>  vfs_get_tree+0x93/0x380 fs/super.c:1771
>  do_new_mount fs/namespace.c:3337 [inline]
>  path_mount+0x679/0x1e40 fs/namespace.c:3664
>  do_mount fs/namespace.c:3677 [inline]
>  __do_sys_mount fs/namespace.c:3886 [inline]
>  __se_sys_mount fs/namespace.c:3863 [inline]
>  __x64_sys_mount+0x287/0x310 fs/namespace.c:3863
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7f79b9a91b3e
> Code: 48 c7 c0 ff ff ff ff eb aa e8 be 0d 00 00 66 2e 0f 1f 84 00 00 00 00 00 
> 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f79ba8aae38 EFLAGS: 0202 ORIG_RAX: 00a5
> RAX: ffda RBX: 000119f4 RCX: 7f79b9a91b3e
> RDX: 20011a00 RSI: 20011a40 RDI: 7f79ba8aae90
> RBP: 7f79ba8aaed0 R08: 7f79ba8aaed0 R09: 0181c050
> R10: 0181c050 R11: 0202 R12: 20011a00
> R13: 20011a40 R14: 7f79ba8aae90 R15: 21c0
>  
> Modules linked in:
> ---[ end trace  ]---
> 
> 
> Thank you for taking the time to read this email and we look forward to 
> working with you further.
> 
> 
> 
> 
> 
> 





[PATCH] tracing: Fix trace_pid_list_free() kernel-doc

2024-05-06 Thread Jeff Johnson
make C=1 reports:

kernel/trace/pid_list.c:458: warning: Function parameter or struct member 
'pid_list' not described in 'trace_pid_list_free'

Add the missing parameter to the trace_pid_list_free() kernel-doc.

Signed-off-by: Jeff Johnson 
---
 kernel/trace/pid_list.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/pid_list.c b/kernel/trace/pid_list.c
index 95106d02b32d..19b271a12c99 100644
--- a/kernel/trace/pid_list.c
+++ b/kernel/trace/pid_list.c
@@ -451,6 +451,7 @@ struct trace_pid_list *trace_pid_list_alloc(void)
 
 /**
  * trace_pid_list_free - Frees an allocated pid_list.
+ * @pid_list: The pid list to free.
  *
  * Frees the memory for a pid_list that was allocated.
  */

---
base-commit: dd5a440a31fae6e459c0d627162825505361
change-id: 20240506-trace_pid_list_free-kdoc-e2bf15be84ee




Re: [PATCH v3 1/2] virtiofs: use pages instead of pointer for kernel direct IO

2024-05-06 Thread Hou Tao



On 4/26/2024 10:39 PM, Hou Tao wrote:
> From: Hou Tao 
>
> When trying to insert a 10MB kernel module kept in a virtio-fs with cache
> disabled, the following warning was reported:
>
>   [ cut here ]
>   WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ..
>   Modules linked in:
>   CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ..
>   RIP: 0010:__alloc_pages+0x2bf/0x380
>   ..
>   Call Trace:
>
>? __warn+0x8e/0x150
>? __alloc_pages+0x2bf/0x380
>__kmalloc_large_node+0x86/0x160
>__kmalloc+0x33c/0x480
>virtio_fs_enqueue_req+0x240/0x6d0
>virtio_fs_wake_pending_and_unlock+0x7f/0x190
>queue_request_and_unlock+0x55/0x60
>fuse_simple_request+0x152/0x2b0
>fuse_direct_io+0x5d2/0x8c0
>fuse_file_read_iter+0x121/0x160
>__kernel_read+0x151/0x2d0
>kernel_read+0x45/0x50
>kernel_read_file+0x1a9/0x2a0
>init_module_from_file+0x6a/0xe0
>idempotent_init_module+0x175/0x230
>__x64_sys_finit_module+0x5d/0xb0
>x64_sys_call+0x1c3/0x9e0
>do_syscall_64+0x3d/0xc0
>entry_SYSCALL_64_after_hwframe+0x4b/0x53
>..
>
>   ---[ end trace  ]---
>
> The warning is triggered as follows:
>

SNIP
> @@ -1585,7 +1589,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct 
> iov_iter *iter,
>   size_t nbytes = min(count, nmax);
>  
>   err = fuse_get_user_pages(>ap, iter, , write,
> -   max_pages);
> +   max_pages, fc->use_pages_for_kvec_io);
>   if (err && !nbytes)
>   break;

Just find out that flush_kernel_vmap_range() and
invalidate_kernel_vmap_range() should be used before DMA rw operation
and after DMA read operation if the kvec IO is backed by vmalloc() area.
Will update it in v4.
>  
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index f239196103137..d4f04e19058c1 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -860,6 +860,9 @@ struct fuse_conn {
>   /** Passthrough support for read/write IO */
>   unsigned int passthrough:1;
>  
> + /* Use pages instead of pointer for kernel I/O */
> + unsigned int use_pages_for_kvec_io:1;
> +
>   /** Maximum stack depth for passthrough backing files */
>   int max_stack_depth;
>  
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 322af827a2329..36984c0e23d14 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -1512,6 +1512,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
>   fc->delete_stale = true;
>   fc->auto_submounts = true;
>   fc->sync_fs = true;
> + fc->use_pages_for_kvec_io = true;
>  
>   /* Tell FUSE to split requests that exceed the virtqueue's size */
>   fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,




Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-05-06 Thread Huacai Chen
On Mon, May 6, 2024 at 3:00 PM maobibo  wrote:
>
>
>
> On 2024/5/6 上午9:53, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:
> >>
> >> PARAVIRT option and pv ipi is added on guest kernel side, function
> >> pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
> >> firstly checks whether system runs on VM mode. If kernel runs on VM mode,
> >> it will call function kvm_para_available() to detect current hypervirsor
> >> type. Now only KVM type detection is supported, the paravirt function can
> >> work only if current hypervisor type is KVM, since there is only KVM
> >> supported on LoongArch now.
> >>
> >> PV IPI uses virtual IPI sender and virtual IPI receiver function. With
> >> virutal IPI sender, ipi message is stored in DDR memory rather than
> >> emulated HW. IPI multicast is supported, and 128 vcpus can received IPIs
> >> at the same time like X86 KVM method. Hypercall method is used for IPI
> >> sending.
> >>
> >> With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
> >> VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
> >> acknowledge. And IPI message is stored in DDR, no trap in get IPI message.
> >>
> >> Signed-off-by: Bibo Mao 
> >> ---
> >>   arch/loongarch/Kconfig    |   9 ++
> >>   arch/loongarch/include/asm/hardirq.h  |   1 +
> >>   arch/loongarch/include/asm/paravirt.h |  27 
> >>   .../include/asm/paravirt_api_clock.h  |   1 +
> >>   arch/loongarch/kernel/Makefile|   1 +
> >>   arch/loongarch/kernel/irq.c   |   2 +-
> >>   arch/loongarch/kernel/paravirt.c  | 151 ++
> >>   arch/loongarch/kernel/smp.c   |   4 +-
> >>   8 files changed, 194 insertions(+), 2 deletions(-)
> >>   create mode 100644 arch/loongarch/include/asm/paravirt.h
> >>   create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> >>   create mode 100644 arch/loongarch/kernel/paravirt.c
> >>
> >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >> index 54ad04dacdee..0a1540a8853e 100644
> >> --- a/arch/loongarch/Kconfig
> >> +++ b/arch/loongarch/Kconfig
> >> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
> >>  bool
> >>  default y
> >>
> >> +config PARAVIRT
> >> +   bool "Enable paravirtualization code"
> >> +   depends on AS_HAS_LVZ_EXTENSION
> >> +   help
> >> +  This changes the kernel so it can modify itself when it is run
> >> + under a hypervisor, potentially improving performance 
> >> significantly
> >> + over full virtualization.  However, when run without a hypervisor
> >> + the kernel is theoretically slower and slightly larger.
> >> +
> >>   config ARCH_SUPPORTS_KEXEC
> >>  def_bool y
> >>
> >> diff --git a/arch/loongarch/include/asm/hardirq.h 
> >> b/arch/loongarch/include/asm/hardirq.h
> >> index 9f0038e19c7f..b26d596a73aa 100644
> >> --- a/arch/loongarch/include/asm/hardirq.h
> >> +++ b/arch/loongarch/include/asm/hardirq.h
> >> @@ -21,6 +21,7 @@ enum ipi_msg_type {
> >>   typedef struct {
> >>  unsigned int ipi_irqs[NR_IPI];
> >>  unsigned int __softirq_pending;
> >> +   atomic_t message cacheline_aligned_in_smp;
> >>   } cacheline_aligned irq_cpustat_t;
> >>
> >>   DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> >> diff --git a/arch/loongarch/include/asm/paravirt.h 
> >> b/arch/loongarch/include/asm/paravirt.h
> >> new file mode 100644
> >> index ..58f7b7b89f2c
> >> --- /dev/null
> >> +++ b/arch/loongarch/include/asm/paravirt.h
> >> @@ -0,0 +1,27 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> >> +#define _ASM_LOONGARCH_PARAVIRT_H
> >> +
> >> +#ifdef CONFIG_PARAVIRT
> >> +#include 
> >> +struct static_key;
> >> +extern struct static_key paravirt_steal_enabled;
> >> +extern struct static_key paravirt_steal_rq_enabled;
> >> +
> >> +u64 dummy_steal_clock(int cpu);
> >> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> >> +
> >> +static inline u64 paravirt_steal_clock

Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-05-06 Thread maobibo




On 2024/5/6 上午9:53, Huacai Chen wrote:

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:


PARAVIRT option and pv ipi is added on guest kernel side, function
pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
firstly checks whether system runs on VM mode. If kernel runs on VM mode,
it will call function kvm_para_available() to detect current hypervirsor
type. Now only KVM type detection is supported, the paravirt function can
work only if current hypervisor type is KVM, since there is only KVM
supported on LoongArch now.

PV IPI uses virtual IPI sender and virtual IPI receiver function. With
virutal IPI sender, ipi message is stored in DDR memory rather than
emulated HW. IPI multicast is supported, and 128 vcpus can received IPIs
at the same time like X86 KVM method. Hypercall method is used for IPI
sending.

With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
acknowledge. And IPI message is stored in DDR, no trap in get IPI message.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/Kconfig|   9 ++
  arch/loongarch/include/asm/hardirq.h  |   1 +
  arch/loongarch/include/asm/paravirt.h |  27 
  .../include/asm/paravirt_api_clock.h  |   1 +
  arch/loongarch/kernel/Makefile|   1 +
  arch/loongarch/kernel/irq.c   |   2 +-
  arch/loongarch/kernel/paravirt.c  | 151 ++
  arch/loongarch/kernel/smp.c   |   4 +-
  8 files changed, 194 insertions(+), 2 deletions(-)
  create mode 100644 arch/loongarch/include/asm/paravirt.h
  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
  create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 54ad04dacdee..0a1540a8853e 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
 bool
 default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
  config ARCH_SUPPORTS_KEXEC
 def_bool y

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
  typedef struct {
 unsigned int ipi_irqs[NR_IPI];
 unsigned int __softirq_pending;
+   atomic_t message cacheline_aligned_in_smp;
  } cacheline_aligned irq_cpustat_t;

  DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3a7620b66bc6..c9bfeda89e40 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
  obj-$(CONFIG_STACKTRACE)   += stacktrace.o

  obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o

  obj-$(CONFIG_SMP)  += smp.o

diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
 per_cpu(irq_stack, i), per_cpu(irq_stack, i) + 
IRQ_STACK_SIZE);
 }

-   set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+   se

[PATCH] kernel/module: disable cfi for do_mod_ctors

2024-05-06 Thread Joey Jiao
CFI failure when both CONFIG_CONSTRUCTORS and CFI_CLANG enabled.

CFI failure at do_init_module+0x100/0x384 (target:
tsan.module_ctor+0x0/0xa98 [module_name_xx]; expected type: 0xa540670c)

Disable cfi for do_mod_ctors to avoid cfi check on mod->ctors[i]().

Signed-off-by: Joey Jiao 
---
 kernel/module/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..d51e63795637 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2453,6 +2453,7 @@ static int post_relocation(struct module *mod, const 
struct load_info *info)
 }
 
 /* Call module constructors. */
+__nocfi
 static void do_mod_ctors(struct module *mod)
 {
 #ifdef CONFIG_CONSTRUCTORS
-- 
2.43.2




Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-05-05 Thread Huacai Chen
Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:
>
> PARAVIRT option and pv ipi is added on guest kernel side, function
> pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
> firstly checks whether system runs on VM mode. If kernel runs on VM mode,
> it will call function kvm_para_available() to detect current hypervirsor
> type. Now only KVM type detection is supported, the paravirt function can
> work only if current hypervisor type is KVM, since there is only KVM
> supported on LoongArch now.
>
> PV IPI uses virtual IPI sender and virtual IPI receiver function. With
> virutal IPI sender, ipi message is stored in DDR memory rather than
> emulated HW. IPI multicast is supported, and 128 vcpus can received IPIs
> at the same time like X86 KVM method. Hypercall method is used for IPI
> sending.
>
> With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
> VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
> acknowledge. And IPI message is stored in DDR, no trap in get IPI message.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/Kconfig|   9 ++
>  arch/loongarch/include/asm/hardirq.h  |   1 +
>  arch/loongarch/include/asm/paravirt.h |  27 
>  .../include/asm/paravirt_api_clock.h  |   1 +
>  arch/loongarch/kernel/Makefile|   1 +
>  arch/loongarch/kernel/irq.c   |   2 +-
>  arch/loongarch/kernel/paravirt.c  | 151 ++
>  arch/loongarch/kernel/smp.c   |   4 +-
>  8 files changed, 194 insertions(+), 2 deletions(-)
>  create mode 100644 arch/loongarch/include/asm/paravirt.h
>  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>  create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 54ad04dacdee..0a1540a8853e 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> +   bool "Enable paravirtualization code"
> +   depends on AS_HAS_LVZ_EXTENSION
> +   help
> +  This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization.  However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
>  config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/hardirq.h 
> b/arch/loongarch/include/asm/hardirq.h
> index 9f0038e19c7f..b26d596a73aa 100644
> --- a/arch/loongarch/include/asm/hardirq.h
> +++ b/arch/loongarch/include/asm/hardirq.h
> @@ -21,6 +21,7 @@ enum ipi_msg_type {
>  typedef struct {
> unsigned int ipi_irqs[NR_IPI];
> unsigned int __softirq_pending;
> +   atomic_t message cacheline_aligned_in_smp;
>  } cacheline_aligned irq_cpustat_t;
>
>  DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> diff --git a/arch/loongarch/include/asm/paravirt.h 
> b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index ..58f7b7b89f2c
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include 
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> +   return static_call(pv_steal_clock)(cpu);
> +}
> +
> +int pv_ipi_init(void);
> +#else
> +static inline int pv_ipi_init(void)
> +{
> +   return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
> b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index ..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include 
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3a7620b66bc6..c9bfeda89e40 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>  obj-$(CONFIG_STACKTRACE)   += stacktrace.o
>
>  obj-$(CONFIG_PROC_FS)  += proc.o
>

Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-30 Thread Chris Lew




On 4/26/2024 6:36 PM, Dmitry Baryshkov wrote:

On Sat, 27 Apr 2024 at 04:03, Chris Lew  wrote:




On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:

diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
b/drivers/remoteproc/qcom_q6v5_adsp.c
index 1d24c9b656a8..02d0c626b03b 100644
--- a/drivers/remoteproc/qcom_q6v5_adsp.c
+++ b/drivers/remoteproc/qcom_q6v5_adsp.c
@@ -23,6 +23,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 

@@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
   int ret;
   unsigned int val;

- ret = qcom_q6v5_prepare(>q6v5);
+ ret = qcom_pdm_get();
   if (ret)
   return ret;


Would it make sense to try and model this as a rproc subdev? This
section of the remoteproc code seems to be focused on making specific
calls to setup and enable hardware resources, where as pd mapper is
software.

sysmon and ssr are also purely software and they are modeled as subdevs
in qcom_common. I'm not an expert on remoteproc organization but this
was just a thought.


Well, the issue is that the pd-mapper is a global, not a per-remoteproc instance



Both sysmon and ssr have some kind of global states that they manage 
too. Each subdev functionality tends to be a mix of per-remoteproc 
instance management and global state management.


If pd-mapper was completely global, pd-mapper would be able to 
instantiate by itself. Instead, instantiation is dependent on each 
remoteproc instance properly getting and putting references.


The pdm subdev could manage the references to pd-mapper for that 
remoteproc instance.


On the other hand, I think Bjorn recommended this could be moved to 
probe time in v4. The v4 version was doing the reinitialization-dance, 
but I think the recommendation could still apply to this version.




Thanks!
Chris



+ ret = qcom_q6v5_prepare(>q6v5);
+ if (ret)
+ goto put_pdm;
+
   ret = adsp_map_carveout(rproc);
   if (ret) {
   dev_err(adsp->dev, "ADSP smmu mapping failed\n");
@@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
   adsp_unmap_carveout(rproc);
   disable_irqs:
   qcom_q6v5_unprepare(>q6v5);
+put_pdm:
+ qcom_pdm_release();

   return ret;
   }









BUG: unable to handle kernel paging request in do_split

2024-04-29 Thread Ubisectech Sirius
Hello.
We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
Recently, our team has discovered a issue in Linux kernel 6.7. Attached to the 
email were a PoC file of the issue.

Stack dump:
BUG: unable to handle page fault for address: ed110c2fd97f
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 7ffd0067 P4D 7ffd0067 PUD 0
Oops:  [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 24082 Comm: syz-executor.3 Not tainted 6.7.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:do_split+0xfef/0x1e10 fs/ext4/namei.c:2047
Code: d2 0f 85 38 0b 00 00 8b 45 00 89 84 24 84 00 00 00 41 8d 45 ff 48 8d 1c 
c3 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 
e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ef
RSP: 0018:c90001e9f858 EFLAGS: 00010a02
RAX: dc00 RBX: 617ecbf8 RCX: c9001048f000
RDX: 11110c2fd97f RSI: 823364ab RDI: 0005
RBP: 8880617ecc00 R08: 0005 R09: 
R10:  R11:  R12: dc00
R13:  R14:  R15: 88801ee8d2b0
FS:  7f191402a640() GS:88802c60() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ed110c2fd97f CR3: 5500a000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554
Call Trace:
 
 make_indexed_dir+0x1158/0x1540 fs/ext4/namei.c:2342
 ext4_add_entry+0xcd0/0xe80 fs/ext4/namei.c:2454
 ext4_add_nondir+0x90/0x2b0 fs/ext4/namei.c:2795
 ext4_symlink+0x539/0x9e0 fs/ext4/namei.c:3436
 vfs_symlink fs/namei.c:4464 [inline]
 vfs_symlink+0x3f6/0x640 fs/namei.c:4448
 do_symlinkat+0x245/0x2f0 fs/namei.c:4490
 __do_sys_symlink fs/namei.c:4511 [inline]
 __se_sys_symlink fs/namei.c:4509 [inline]
 __x64_sys_symlink+0x79/0xa0 fs/namei.c:4509
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f191329002d
Code: c3 e8 97 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7f191402a028 EFLAGS: 0246 ORIG_RAX: 0058
RAX: ffda RBX: 7f19133cbf80 RCX: 7f191329002d
RDX:  RSI: 2e40 RDI: 20001640
RBP: 7f19132f14d0 R08:  R09: 
R10:  R11: 0246 R12: 
R13: 000b R14: 7f19133cbf80 R15: 7f191400a000
 
Modules linked in:
CR2: ed110c2fd97f
---[ end trace  ]---
RIP: 0010:do_split+0xfef/0x1e10 fs/ext4/namei.c:2047
Code: d2 0f 85 38 0b 00 00 8b 45 00 89 84 24 84 00 00 00 41 8d 45 ff 48 8d 1c 
c3 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 
e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ef
RSP: 0018:c90001e9f858 EFLAGS: 00010a02
RAX: dc00 RBX: 617ecbf8 RCX: c9001048f000
RDX: 11110c2fd97f RSI: 823364ab RDI: 0005
RBP: 8880617ecc00 R08: 0005 R09: 
R10:  R11:  R12: dc00
R13:  R14:  R15: 88801ee8d2b0
FS:  7f191402a640() GS:88802c60() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ed110c2fd97f CR3: 5500a000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554

Code disassembly (best guess):
   0:   d2 0f   rorb   %cl,(%rdi)
   2:   85 38   test   %edi,(%rax)
   4:   0b 00   or (%rax),%eax
   6:   00 8b 45 00 89 84   add%cl,-0x7b76ffbb(%rbx)
   c:   24 84   and$0x84,%al
   e:   00 00   add%al,(%rax)
  10:   00 41 8dadd%al,-0x73(%rcx)
  13:   45 ff 48 8d rex.RB decl -0x73(%r8)
  17:   1c c3   sbb$0xc3,%al
  19:   48 b8 00 00 00 00 00movabs $0xdc00,%rax
  20:   fc ff df
  23:   48 89 damov%rbx,%rdx
  26:   48 c1 ea 03 shr$0x3,%rdx
* 2a:   0f b6 14 02 movzbl (%rdx,%rax,1),%edx <-- trapping 
instruction
  2e:   48 89 d8mov%rbx,%rax
  31:   83 e0 07and$0x7,%eax
  34:   83 c0 03add$0x3,%eax
  37:   38 d0   cmp%dl,%al
  39:   7c 08   jl 0x43
  3b:   84 d2   test   %dl,%dl
  3d:   0f  .byte 0xf
  3e:   85 ef   test   

[PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-04-28 Thread Bibo Mao
PARAVIRT option and pv ipi is added on guest kernel side, function
pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
firstly checks whether system runs on VM mode. If kernel runs on VM mode,
it will call function kvm_para_available() to detect current hypervirsor
type. Now only KVM type detection is supported, the paravirt function can
work only if current hypervisor type is KVM, since there is only KVM
supported on LoongArch now.

PV IPI uses virtual IPI sender and virtual IPI receiver function. With
virutal IPI sender, ipi message is stored in DDR memory rather than
emulated HW. IPI multicast is supported, and 128 vcpus can received IPIs
at the same time like X86 KVM method. Hypercall method is used for IPI
sending.

With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
acknowledge. And IPI message is stored in DDR, no trap in get IPI message.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|   9 ++
 arch/loongarch/include/asm/hardirq.h  |   1 +
 arch/loongarch/include/asm/paravirt.h |  27 
 .../include/asm/paravirt_api_clock.h  |   1 +
 arch/loongarch/kernel/Makefile|   1 +
 arch/loongarch/kernel/irq.c   |   2 +-
 arch/loongarch/kernel/paravirt.c  | 151 ++
 arch/loongarch/kernel/smp.c   |   4 +-
 8 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 54ad04dacdee..0a1540a8853e 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
 typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+   atomic_t message cacheline_aligned_in_smp;
 } cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3a7620b66bc6..c9bfeda89e40 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + 
IRQ_STACK_SIZE);
}
 
-   set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+   set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI 
| ECFGF_PMC);
 }
diff --git a/arch/loongarch/kernel/paravir

Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-26 Thread Dmitry Baryshkov
On Sat, 27 Apr 2024 at 04:03, Chris Lew  wrote:
>
>
>
> On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:
> > diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
> > b/drivers/remoteproc/qcom_q6v5_adsp.c
> > index 1d24c9b656a8..02d0c626b03b 100644
> > --- a/drivers/remoteproc/qcom_q6v5_adsp.c
> > +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
> > @@ -23,6 +23,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >
> > @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
> >   int ret;
> >   unsigned int val;
> >
> > - ret = qcom_q6v5_prepare(>q6v5);
> > + ret = qcom_pdm_get();
> >   if (ret)
> >   return ret;
>
> Would it make sense to try and model this as a rproc subdev? This
> section of the remoteproc code seems to be focused on making specific
> calls to setup and enable hardware resources, where as pd mapper is
> software.
>
> sysmon and ssr are also purely software and they are modeled as subdevs
> in qcom_common. I'm not an expert on remoteproc organization but this
> was just a thought.

Well, the issue is that the pd-mapper is a global, not a per-remoteproc instance

>
> Thanks!
> Chris
>
> >
> > + ret = qcom_q6v5_prepare(>q6v5);
> > + if (ret)
> > + goto put_pdm;
> > +
> >   ret = adsp_map_carveout(rproc);
> >   if (ret) {
> >   dev_err(adsp->dev, "ADSP smmu mapping failed\n");
> > @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
> >   adsp_unmap_carveout(rproc);
> >   disable_irqs:
> >   qcom_q6v5_unprepare(>q6v5);
> > +put_pdm:
> > + qcom_pdm_release();
> >
> >   return ret;
> >   }
>


-- 
With best wishes
Dmitry



Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-26 Thread Chris Lew




On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:

diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
b/drivers/remoteproc/qcom_q6v5_adsp.c
index 1d24c9b656a8..02d0c626b03b 100644
--- a/drivers/remoteproc/qcom_q6v5_adsp.c
+++ b/drivers/remoteproc/qcom_q6v5_adsp.c
@@ -23,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
@@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)

int ret;
unsigned int val;
  
-	ret = qcom_q6v5_prepare(>q6v5);

+   ret = qcom_pdm_get();
if (ret)
return ret;


Would it make sense to try and model this as a rproc subdev? This 
section of the remoteproc code seems to be focused on making specific 
calls to setup and enable hardware resources, where as pd mapper is 
software.


sysmon and ssr are also purely software and they are modeled as subdevs 
in qcom_common. I'm not an expert on remoteproc organization but this 
was just a thought.


Thanks!
Chris

  
+	ret = qcom_q6v5_prepare(>q6v5);

+   if (ret)
+   goto put_pdm;
+
ret = adsp_map_carveout(rproc);
if (ret) {
dev_err(adsp->dev, "ADSP smmu mapping failed\n");
@@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
adsp_unmap_carveout(rproc);
  disable_irqs:
qcom_q6v5_unprepare(>q6v5);
+put_pdm:
+   qcom_pdm_release();
  
  	return ret;

  }





Re: [PATCH] kernel/trace/trace_probe:Fixed memory leak issues in trace_probe.c.

2024-04-26 Thread Google
Hi LuMingYin,

Thanks for finding the problem! But please make a commit message
following Documentation/process/submitting-patches.rst


On Fri, 26 Apr 2024 10:13:43 +0100
lumingyindet...@126.com wrote:

> From: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
> 
> At line 1408 of the file /linux/kernel/trace/trace_probe.c, pointer variables 
> named code and tmp are defined. At line 1437, a new dynamic memory area is 
> allocated using the function kcalloc. When the if statement at line 1467 
> evaluates to true, the program jumps to the out label at line 1469. Within 
> this function, there are two labels: out and fail. The difference between 
> these two labels is that fail additionally frees the dynamic memory area 
> pointed to by the variable code. Therefore, the program should jump to the 
> fail label instead of the out label. This commit fixes this bug.
> 

For example, you must line break after about 70 characters. Also,
please don't use the line number because the line number is easily
changed (function name is OK). Since this bug is very clear mistake,
so you can just explain that as following.


 If traceprobe_parse_probe_arg_body() fails to allocate 'parg->fmt', it
 jumps to 'out' instead of 'fail' by mistake. In the result, in this
 case the 'tmp' buffer is not freed and leaks its memory.

 Fix it by jumping to 'fail' in that case.

The first paragraph explains what happens, and second one to exaplain
how to fix it.

Also, please add this Fixes tag.

Fixes: 032330abd08b ("tracing/probes: Cleanup probe argument parser")

You can easily find this commit number with git blame.

Thank you,

> Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
> ---
>  kernel/trace/trace_probe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index dfe3ee6035ec..42bc0f362226 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char 
> *argv, ssize_t *size,
>   parg->fmt = kmalloc(len, GFP_KERNEL);
>   if (!parg->fmt) {
>   ret = -ENOMEM;
> - goto out;
> + goto fail;
>   }
>   snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
>parg->count);
> -- 
> 2.25.1
> 


-- 
Masami Hiramatsu (Google) 



[PATCH v3 0/2] virtiofs: fix the warning for kernel direct IO

2024-04-26 Thread Hou Tao
From: Hou Tao 

Hi,

The patch set aims to fix the warning related to an abnormal size
parameter of kmalloc() in virtiofs. Patch #1 fixes it by introducing
use_pages_for_kvec_io option in fuse_conn and enabling it in virtiofs.
Beside the abnormal size parameter for kmalloc, the gfp parameter is
also questionable: GFP_ATOMIC is used even when the allocation occurs
in a kworker context. Patch #2 fixes it by using GFP_NOFS when the
allocation is initiated by the kworker. For more details, please check
the individual patches.

As usual, comments are always welcome.

Change Log:

v3:
 * introduce use_pages_for_kvec_io for virtiofs. When the option is
   enabled, fuse will use iov_iter_extract_pages() to construct a page
   array and pass the pages array instead of a pointer to virtiofs.
   The benefit is twofold: the length of the data passed to virtiofs is
   limited by max_pages, and there is no memory copy compared with v2.

v2: 
https://lore.kernel.org/linux-fsdevel/20240228144126.2864064-1-hou...@huaweicloud.com/
  * limit the length of ITER_KVEC dio by max_pages instead of the
newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC
dio being consistent with other rw operations.
  * replace kmalloc-allocated bounce buffer by using a bounce buffer
backed by scattered pages when the length of the bounce buffer for
KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with
fragmented memory, the KVEC_ITER dio can be handled normally by
virtiofs. (Bernd Schubert)
  * merge the GFP_NOFS patch [1] into this patch-set and use
memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
(Benjamin Coddington)

v1: 
https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/

[1]: 
https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/

Hou Tao (2):
  virtiofs: use pages instead of pointer for kernel direct IO
  virtiofs: use GFP_NOFS when enqueuing request through kworker

 fs/fuse/file.c  | 12 
 fs/fuse/fuse_i.h|  3 +++
 fs/fuse/virtio_fs.c | 25 -
 3 files changed, 27 insertions(+), 13 deletions(-)

-- 
2.29.2




[PATCH v3 1/2] virtiofs: use pages instead of pointer for kernel direct IO

2024-04-26 Thread Hou Tao
From: Hou Tao 

When trying to insert a 10MB kernel module kept in a virtio-fs with cache
disabled, the following warning was reported:

  [ cut here ]
  WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ..
  Modules linked in:
  CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ..
  RIP: 0010:__alloc_pages+0x2bf/0x380
  ..
  Call Trace:
   
   ? __warn+0x8e/0x150
   ? __alloc_pages+0x2bf/0x380
   __kmalloc_large_node+0x86/0x160
   __kmalloc+0x33c/0x480
   virtio_fs_enqueue_req+0x240/0x6d0
   virtio_fs_wake_pending_and_unlock+0x7f/0x190
   queue_request_and_unlock+0x55/0x60
   fuse_simple_request+0x152/0x2b0
   fuse_direct_io+0x5d2/0x8c0
   fuse_file_read_iter+0x121/0x160
   __kernel_read+0x151/0x2d0
   kernel_read+0x45/0x50
   kernel_read_file+0x1a9/0x2a0
   init_module_from_file+0x6a/0xe0
   idempotent_init_module+0x175/0x230
   __x64_sys_finit_module+0x5d/0xb0
   x64_sys_call+0x1c3/0x9e0
   do_syscall_64+0x3d/0xc0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   ..
   
  ---[ end trace  ]---

The warning is triggered as follows:

1) syscall finit_module() handles the module insertion and it invokes
kernel_read_file() to read the content of the module first.

2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
passes it to kernel_read(). kernel_read() constructs a kvec iter by
using iov_iter_kvec() and passes it to fuse_file_read_iter().

3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
fuse_direct_io(). As for now, the maximal read size for kvec iter is
only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
the size of the 10MB-sized buffer in out_args[0] of a fuse request and
passes the fuse request to virtio_fs_wake_pending_and_unlock().

4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
queue the request. Because virtiofs need DMA-able address, so
virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for
all fuse args, copies these args into the bounce buffer and passed the
physical address of the bounce buffer to virtiofsd. The total length of
these fuse args for the passed fuse request is about 10MB, so
copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and
it triggers the warning in __alloc_pages():

if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
return NULL;

5) virtio_fs_enqueue_req() will retry the memory allocation in a
kworker, but it won't help, because kmalloc() will always return NULL
due to the abnormal size and finit_module() will hang forever.

A feasible solution is to limit the value of max_read for virtio-fs, so
the length passed to kmalloc() will be limited. However it will affect
the maximal read size for normal read. And for virtio-fs write initiated
from kernel, it has the similar problem but now there is no way to limit
fc->max_write in kernel.

So instead of limiting both the values of max_read and max_write in
kernel, introducing use_pages_for_kvec_io in fuse_conn and setting it as
true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use
pages instead of pointer to pass the KVEC_IO data.

Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Hou Tao 
---
 fs/fuse/file.c  | 12 
 fs/fuse/fuse_i.h|  3 +++
 fs/fuse/virtio_fs.c |  1 +
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b57ce41576407..82b77c5d8c643 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1471,13 +1471,17 @@ static inline size_t fuse_get_frag_size(const struct 
iov_iter *ii,
 
 static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
   size_t *nbytesp, int write,
-  unsigned int max_pages)
+  unsigned int max_pages,
+  bool use_pages_for_kvec_io)
 {
size_t nbytes = 0;  /* # bytes already packed in req */
ssize_t ret = 0;
 
-   /* Special case for kernel I/O: can copy directly into the buffer */
-   if (iov_iter_is_kvec(ii)) {
+   /* Special case for kernel I/O: can copy directly into the buffer.
+* However if the implementation of fuse_conn requires pages instead of
+* pointer (e.g., virtio-fs), use iov_iter_extract_pages() instead.
+*/
+   if (iov_iter_is_kvec(ii) && !use_pages_for_kvec_io) {
unsigned long user_addr = fuse_get_user_addr(ii);
size_t frag_size = fuse_get_frag_size(ii, *nbytesp);
 
@@ -1585,7 +1589,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct 
iov_iter *iter,
size_t nbytes = min(count, nmax);
 
err = fuse_get_user_pages(>ap, iter, , write,
-

Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-26 Thread Alexey Minnekhanov




On 24.04.2024 12:27, Dmitry Baryshkov wrote:

Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issue. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

Unlike previous revisions of the patchset, this iteration uses static
configuration per platform, rather than building it dynamically from the
list of DSPs being started.

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold 
Cc: Xilin Wu 
Cc: "Bryan O'Donoghue" 
--

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: 
https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
   builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
   silence the notice on those platforms
- Link to v5: 
https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
   (Bjorn)
- Link to v4: 
https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: 
https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
   them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404



I've tested this series on sdm660 device, with userspace pd-mapper
service disabled, and don't see any regressions - e.g Wi-Fi/BT
still come online and work as before.

Debug logs:
https://paste.sr.ht/~minlexx/bd03db4c582a3275078ce4fd05ea76ce46a52b8e

Missing cdsp_root and adsp_sensors PDs are not currently an issue,
because those are not enabled yet on SDM660 or hard to test, so

Tested-by: Alexey Minnekhanov 

--
Regards,
Alexey Minnekhanov
postmarketOS developer



[PATCH] kernel/trace/trace_probe:Fixed memory leak issues in trace_probe.c.

2024-04-26 Thread lumingyindetect
From: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>

At line 1408 of the file /linux/kernel/trace/trace_probe.c, pointer variables 
named code and tmp are defined. At line 1437, a new dynamic memory area is 
allocated using the function kcalloc. When the if statement at line 1467 
evaluates to true, the program jumps to the out label at line 1469. Within this 
function, there are two labels: out and fail. The difference between these two 
labels is that fail additionally frees the dynamic memory area pointed to by 
the variable code. Therefore, the program should jump to the fail label instead 
of the out label. This commit fixes this bug.

Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
---
 kernel/trace/trace_probe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index dfe3ee6035ec..42bc0f362226 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char 
*argv, ssize_t *size,
parg->fmt = kmalloc(len, GFP_KERNEL);
if (!parg->fmt) {
ret = -ENOMEM;
-   goto out;
+   goto fail;
}
snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
 parg->count);
-- 
2.25.1




Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-25 Thread Dmitry Baryshkov
On Thu, 25 Apr 2024 at 10:08, Steev Klimaszewski  wrote:
>
> Hi Dmitry,
>
> On Wed, Apr 24, 2024 at 4:28 AM Dmitry Baryshkov
>  wrote:
> >
> > Protection domain mapper is a QMI service providing mapping between
> > 'protection domains' and services supported / allowed in these domains.
> > For example such mapping is required for loading of the WiFi firmware or
> > for properly starting up the UCSI / altmode / battery manager support.
> >
> > The existing userspace implementation has several issue. It doesn't play
> > well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> > firmware location is changed (or if the firmware was not available at
> > the time pd-mapper was started but the corresponding directory is
> > mounted later), etc.
> >
> > However this configuration is largely static and common between
> > different platforms. Provide in-kernel service implementing static
> > per-platform data.
> >
> > Unlike previous revisions of the patchset, this iteration uses static
> > configuration per platform, rather than building it dynamically from the
> > list of DSPs being started.
> >
> > To: Bjorn Andersson 
> > To: Konrad Dybcio 
> > To: Sibi Sankar 
> > To: Mathieu Poirier 
> > Cc: linux-arm-...@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-remotep...@vger.kernel.org
> > Cc: Johan Hovold 
> > Cc: Xilin Wu 
> > Cc: "Bryan O'Donoghue" 
> > --
> >
> > Changes in v7:
> > - Fixed modular build (Steev)
> > - Link to v6: 
> > https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
> >
> > Changes in v6:
> > - Reworked mutex to fix lockdep issue on deregistration
> > - Fixed dependencies between PD-mapper and remoteproc to fix modular
> >   builds (Krzysztof)
> > - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> > - Fixed kerneldocs (Krzysztof)
> > - Removed extra pr_debug messages (Krzysztof)
> > - Fixed wcss build (Krzysztof)
> > - Added platforms which do not require protection domain mapping to
> >   silence the notice on those platforms
> > - Link to v5: 
> > https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
> >
> > Changes in v5:
> > - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris 
> > Lew)
> > - pd_mapper: reworked to provide static configuration per platform
> >   (Bjorn)
> > - Link to v4: 
> > https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
> >
> > Changes in v4:
> > - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> > - Added configuration for sm6350 (Thanks to Luca)
> > - Removed RFC tag (Konrad)
> > - Link to v3: 
> > https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
> >
> > Changes in RFC v3:
> > - Send start / stop notifications when PD-mapper domain list is changed
> > - Reworked the way PD-mapper treats protection domains, register all of
> >   them in a single batch
> > - Added SC7180 domains configuration based on TCL Book 14 GO
> > - Link to v2: 
> > https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
> >
> > Changes in RFC v2:
> > - Swapped num_domains / domains (Konrad)
> > - Fixed an issue with battery not working on sc8280xp
> > - Added missing configuration for QCS404
> >
> > ---
> > Dmitry Baryshkov (6):
> >   soc: qcom: pdr: protect locator_addr with the main mutex
> >   soc: qcom: pdr: fix parsing of domains lists
> >   soc: qcom: pdr: extract PDR message marshalling data
> >   soc: qcom: qmi: add a way to remove running service
> >   soc: qcom: add pd-mapper implementation
> >   remoteproc: qcom: enable in-kernel PD mapper
> >
> >  drivers/remoteproc/Kconfig  |   4 +
> >  drivers/remoteproc/qcom_q6v5_adsp.c |  11 +-
> >  drivers/remoteproc/qcom_q6v5_mss.c  |  10 +-
> >  drivers/remoteproc/qcom_q6v5_pas.c  |  12 +-
> >  drivers/remoteproc/qcom_q6v5_wcss.c |  12 +-
> >  drivers/soc/qcom/Kconfig|  14 +
> >  drivers/soc/qcom/Makefile   |   2 +
> >  drivers/soc/qcom/pdr_interface.c|   6 +-
> >  drivers/soc/qcom/pdr_internal.h | 318 ++---
> >  drivers/soc/qcom/qcom_pd_mapper.c   | 656 
> > 
> >  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
> >  drivers/soc/qcom/qmi_interface.c|  67 
> >  include/linux/soc/qcom/pd_mapper.h  |  28 ++
> >  include/linux/soc/qcom/qmi.h|   2 +
> >  14 files changed, 1193 insertions(+), 302 deletions(-)
> > ---
> > base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
> > change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
> >
> > Best regards,
> > --
> > Dmitry Baryshkov 
> >
> >
> I've tested this series over a large number of reboots, and the p-d
> devices(?) do always seem to come up (with the pd-mapper service
> disabled) on my Thinkpad X13s.  One less service to run in userland!
> Tested-by: Steev Klimaszewski 

Thank you!

-- 
With best wishes
Dmitry



Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-25 Thread Steev Klimaszewski
Hi Dmitry,

On Wed, Apr 24, 2024 at 4:28 AM Dmitry Baryshkov
 wrote:
>
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware or
> for properly starting up the UCSI / altmode / battery manager support.
>
> The existing userspace implementation has several issue. It doesn't play
> well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> firmware location is changed (or if the firmware was not available at
> the time pd-mapper was started but the corresponding directory is
> mounted later), etc.
>
> However this configuration is largely static and common between
> different platforms. Provide in-kernel service implementing static
> per-platform data.
>
> Unlike previous revisions of the patchset, this iteration uses static
> configuration per platform, rather than building it dynamically from the
> list of DSPs being started.
>
> To: Bjorn Andersson 
> To: Konrad Dybcio 
> To: Sibi Sankar 
> To: Mathieu Poirier 
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-remotep...@vger.kernel.org
> Cc: Johan Hovold 
> Cc: Xilin Wu 
> Cc: "Bryan O'Donoghue" 
> --
>
> Changes in v7:
> - Fixed modular build (Steev)
> - Link to v6: 
> https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
>
> Changes in v6:
> - Reworked mutex to fix lockdep issue on deregistration
> - Fixed dependencies between PD-mapper and remoteproc to fix modular
>   builds (Krzysztof)
> - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> - Fixed kerneldocs (Krzysztof)
> - Removed extra pr_debug messages (Krzysztof)
> - Fixed wcss build (Krzysztof)
> - Added platforms which do not require protection domain mapping to
>   silence the notice on those platforms
> - Link to v5: 
> https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
>
> Changes in v5:
> - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris 
> Lew)
> - pd_mapper: reworked to provide static configuration per platform
>   (Bjorn)
> - Link to v4: 
> https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
>
> Changes in v4:
> - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> - Added configuration for sm6350 (Thanks to Luca)
> - Removed RFC tag (Konrad)
> - Link to v3: 
> https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
>
> Changes in RFC v3:
> - Send start / stop notifications when PD-mapper domain list is changed
> - Reworked the way PD-mapper treats protection domains, register all of
>   them in a single batch
> - Added SC7180 domains configuration based on TCL Book 14 GO
> - Link to v2: 
> https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
>
> Changes in RFC v2:
> - Swapped num_domains / domains (Konrad)
> - Fixed an issue with battery not working on sc8280xp
> - Added missing configuration for QCS404
>
> ---
> Dmitry Baryshkov (6):
>   soc: qcom: pdr: protect locator_addr with the main mutex
>   soc: qcom: pdr: fix parsing of domains lists
>   soc: qcom: pdr: extract PDR message marshalling data
>   soc: qcom: qmi: add a way to remove running service
>   soc: qcom: add pd-mapper implementation
>   remoteproc: qcom: enable in-kernel PD mapper
>
>  drivers/remoteproc/Kconfig  |   4 +
>  drivers/remoteproc/qcom_q6v5_adsp.c |  11 +-
>  drivers/remoteproc/qcom_q6v5_mss.c  |  10 +-
>  drivers/remoteproc/qcom_q6v5_pas.c  |  12 +-
>  drivers/remoteproc/qcom_q6v5_wcss.c |  12 +-
>  drivers/soc/qcom/Kconfig|  14 +
>  drivers/soc/qcom/Makefile   |   2 +
>  drivers/soc/qcom/pdr_interface.c|   6 +-
>  drivers/soc/qcom/pdr_internal.h | 318 ++---
>  drivers/soc/qcom/qcom_pd_mapper.c   | 656 
> 
>  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
>  drivers/soc/qcom/qmi_interface.c|  67 
>  include/linux/soc/qcom/pd_mapper.h  |  28 ++
>  include/linux/soc/qcom/qmi.h|   2 +
>  14 files changed, 1193 insertions(+), 302 deletions(-)
> ---
> base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
> change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
>
> Best regards,
> --
> Dmitry Baryshkov 
>
>
I've tested this series over a large number of reboots, and the p-d
devices(?) do always seem to come up (with the pd-mapper service
disabled) on my Thinkpad X13s.  One less service to run in userland!
Tested-by: Steev Klimaszewski 



[PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-24 Thread Dmitry Baryshkov
Request in-kernel protection domain mapper to be started before starting
Qualcomm DSP and release it once DSP is stopped. Once all DSPs are
stopped, the PD mapper will be stopped too.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/remoteproc/Kconfig  |  4 
 drivers/remoteproc/qcom_q6v5_adsp.c | 11 ++-
 drivers/remoteproc/qcom_q6v5_mss.c  | 10 +-
 drivers/remoteproc/qcom_q6v5_pas.c  | 12 +++-
 drivers/remoteproc/qcom_q6v5_wcss.c | 12 +++-
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa85..a0ce552f89a1 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -181,6 +181,7 @@ config QCOM_Q6V5_ADSP
depends on QCOM_SYSMON || QCOM_SYSMON=n
depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+   depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
select MFD_SYSCON
select QCOM_PIL_INFO
select QCOM_MDT_LOADER
@@ -201,6 +202,7 @@ config QCOM_Q6V5_MSS
depends on QCOM_SYSMON || QCOM_SYSMON=n
depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+   depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
select MFD_SYSCON
select QCOM_MDT_LOADER
select QCOM_PIL_INFO
@@ -221,6 +223,7 @@ config QCOM_Q6V5_PAS
depends on QCOM_SYSMON || QCOM_SYSMON=n
depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+   depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
select MFD_SYSCON
select QCOM_PIL_INFO
select QCOM_MDT_LOADER
@@ -243,6 +246,7 @@ config QCOM_Q6V5_WCSS
depends on QCOM_SYSMON || QCOM_SYSMON=n
depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+   depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
select MFD_SYSCON
select QCOM_MDT_LOADER
select QCOM_PIL_INFO
diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
b/drivers/remoteproc/qcom_q6v5_adsp.c
index 1d24c9b656a8..02d0c626b03b 100644
--- a/drivers/remoteproc/qcom_q6v5_adsp.c
+++ b/drivers/remoteproc/qcom_q6v5_adsp.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
int ret;
unsigned int val;
 
-   ret = qcom_q6v5_prepare(>q6v5);
+   ret = qcom_pdm_get();
if (ret)
return ret;
 
+   ret = qcom_q6v5_prepare(>q6v5);
+   if (ret)
+   goto put_pdm;
+
ret = adsp_map_carveout(rproc);
if (ret) {
dev_err(adsp->dev, "ADSP smmu mapping failed\n");
@@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
adsp_unmap_carveout(rproc);
 disable_irqs:
qcom_q6v5_unprepare(>q6v5);
+put_pdm:
+   qcom_pdm_release();
 
return ret;
 }
@@ -478,6 +485,8 @@ static int adsp_stop(struct rproc *rproc)
if (handover)
qcom_adsp_pil_handover(>q6v5);
 
+   qcom_pdm_release();
+
return ret;
 }
 
diff --git a/drivers/remoteproc/qcom_q6v5_mss.c 
b/drivers/remoteproc/qcom_q6v5_mss.c
index 1779fc890e10..791f11e7adbf 100644
--- a/drivers/remoteproc/qcom_q6v5_mss.c
+++ b/drivers/remoteproc/qcom_q6v5_mss.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1581,10 +1582,14 @@ static int q6v5_start(struct rproc *rproc)
int xfermemop_ret;
int ret;
 
-   ret = q6v5_mba_load(qproc);
+   ret = qcom_pdm_get();
if (ret)
return ret;
 
+   ret = q6v5_mba_load(qproc);
+   if (ret)
+   goto put_pdm;
+
dev_info(qproc->dev, "MBA booted with%s debug policy, loading mpss\n",
 qproc->dp_size ? "" : "out");
 
@@ -1613,6 +1618,8 @@ static int q6v5_start(struct rproc *rproc)
 reclaim_mpss:
q6v5_mba_reclaim(qproc);
q6v5_dump_mba_logs(qproc);
+put_pdm:
+   qcom_pdm_release();
 
return ret;
 }
@@ -1627,6 +1634,7 @@ static int q6v5_stop(struct rproc *rproc)
dev_err(qproc->dev, "timed out on wait\n");
 
q6v5_mba_reclaim(qproc);
+   qcom_pdm_release();
 
return 0;
 }
diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
b/drivers/remoteproc/qcom_q6v5_pas.c
index 54d8005d40a3..653e54f975fc 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -261,10 +262,14 @@ static int adsp_start(struct rproc *rproc)
struct qcom_adsp *adsp = rproc->priv;
int ret;
 
-   ret = qcom_q6v5_prepare(>q6v5);
+   ret = qcom_pdm_get();
if (ret)
return ret;
 
+

[PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-24 Thread Dmitry Baryshkov
Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issue. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

Unlike previous revisions of the patchset, this iteration uses static
configuration per platform, rather than building it dynamically from the
list of DSPs being started.

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold 
Cc: Xilin Wu 
Cc: "Bryan O'Donoghue" 
--

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: 
https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
  builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
  silence the notice on those platforms
- Link to v5: 
https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
  (Bjorn)
- Link to v4: 
https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: 
https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
  them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

---
Dmitry Baryshkov (6):
  soc: qcom: pdr: protect locator_addr with the main mutex
  soc: qcom: pdr: fix parsing of domains lists
  soc: qcom: pdr: extract PDR message marshalling data
  soc: qcom: qmi: add a way to remove running service
  soc: qcom: add pd-mapper implementation
  remoteproc: qcom: enable in-kernel PD mapper

 drivers/remoteproc/Kconfig  |   4 +
 drivers/remoteproc/qcom_q6v5_adsp.c |  11 +-
 drivers/remoteproc/qcom_q6v5_mss.c  |  10 +-
 drivers/remoteproc/qcom_q6v5_pas.c  |  12 +-
 drivers/remoteproc/qcom_q6v5_wcss.c |  12 +-
 drivers/soc/qcom/Kconfig|  14 +
 drivers/soc/qcom/Makefile   |   2 +
 drivers/soc/qcom/pdr_interface.c|   6 +-
 drivers/soc/qcom/pdr_internal.h | 318 ++---
 drivers/soc/qcom/qcom_pd_mapper.c   | 656 
 drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
 drivers/soc/qcom/qmi_interface.c|  67 
 include/linux/soc/qcom/pd_mapper.h  |  28 ++
 include/linux/soc/qcom/qmi.h|   2 +
 14 files changed, 1193 insertions(+), 302 deletions(-)
---
base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,
-- 
Dmitry Baryshkov 




Re: Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-12 Thread Michal Koutný
On Thu, Apr 11, 2024 at 03:03:31PM -0700, Andrew Morton 
 wrote:
> A large increase in the maximum number of processes.

The change from (some) default to effective infinity is the crux of the
change. Because that is only a number.
(Thus I don't find the number's 12700% increase alone a big change.)

Actual maximum amount of processes is "workload dependent" and hence
should be determined based on the particular workload.

> Or did I misinterpret?

I thought you saw an issue with projection of that number into sizings
based on the default. Which of them comprises the large change in your
eyes?

Thanks,
Michal



Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-11 Thread Andrew Morton
On Thu, 11 Apr 2024 17:40:02 +0200 Michal Koutný  wrote:

> Hello.
> 
> On Mon, Apr 08, 2024 at 01:29:55PM -0700, Andrew Morton 
>  wrote:
> > That seems like a large change.
> 
> In what sense is it large?

A large increase in the maximum number of processes.  Or did I misinterpret?





[PATCH 3/5] openrisc: traps: Don't send signals to kernel mode threads

2024-04-11 Thread Stafford Horne
OpenRISC exception handling sends signals to user processes on floating
point exceptions and trap instructions (for debugging) among others.
There is a bug where the trap handling logic may send signals to kernel
threads, we should not send these signals to kernel threads, if that
happens we treat it as an error.

This patch adds conditions to die if the kernel receives these
exceptions in kernel mode code.

Fixes: 27267655c531 ("openrisc: Support floating point user api")
Signed-off-by: Stafford Horne 
---
 arch/openrisc/kernel/traps.c | 48 ++--
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/arch/openrisc/kernel/traps.c b/arch/openrisc/kernel/traps.c
index 88fe27e4c10c..211ddaa0c5fa 100644
--- a/arch/openrisc/kernel/traps.c
+++ b/arch/openrisc/kernel/traps.c
@@ -180,29 +180,39 @@ asmlinkage void unhandled_exception(struct pt_regs *regs, 
int ea, int vector)
 
 asmlinkage void do_fpe_trap(struct pt_regs *regs, unsigned long address)
 {
-   int code = FPE_FLTUNK;
-   unsigned long fpcsr = regs->fpcsr;
-
-   if (fpcsr & SPR_FPCSR_IVF)
-   code = FPE_FLTINV;
-   else if (fpcsr & SPR_FPCSR_OVF)
-   code = FPE_FLTOVF;
-   else if (fpcsr & SPR_FPCSR_UNF)
-   code = FPE_FLTUND;
-   else if (fpcsr & SPR_FPCSR_DZF)
-   code = FPE_FLTDIV;
-   else if (fpcsr & SPR_FPCSR_IXF)
-   code = FPE_FLTRES;
-
-   /* Clear all flags */
-   regs->fpcsr &= ~SPR_FPCSR_ALLF;
-
-   force_sig_fault(SIGFPE, code, (void __user *)regs->pc);
+   if (user_mode(regs)) {
+   int code = FPE_FLTUNK;
+   unsigned long fpcsr = regs->fpcsr;
+
+   if (fpcsr & SPR_FPCSR_IVF)
+   code = FPE_FLTINV;
+   else if (fpcsr & SPR_FPCSR_OVF)
+   code = FPE_FLTOVF;
+   else if (fpcsr & SPR_FPCSR_UNF)
+   code = FPE_FLTUND;
+   else if (fpcsr & SPR_FPCSR_DZF)
+   code = FPE_FLTDIV;
+   else if (fpcsr & SPR_FPCSR_IXF)
+   code = FPE_FLTRES;
+
+   /* Clear all flags */
+   regs->fpcsr &= ~SPR_FPCSR_ALLF;
+
+   force_sig_fault(SIGFPE, code, (void __user *)regs->pc);
+   } else {
+   pr_emerg("KERNEL: Illegal fpe exception 0x%.8lx\n", regs->pc);
+   die("Die:", regs, SIGFPE);
+   }
 }
 
 asmlinkage void do_trap(struct pt_regs *regs, unsigned long address)
 {
-   force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->pc);
+   if (user_mode(regs)) {
+   force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->pc);
+   } else {
+   pr_emerg("KERNEL: Illegal trap exception 0x%.8lx\n", regs->pc);
+   die("Die:", regs, SIGILL);
+   }
 }
 
 asmlinkage void do_unaligned_access(struct pt_regs *regs, unsigned long 
address)
-- 
2.44.0




Re: Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-11 Thread Michal Koutný
Hello.

On Mon, Apr 08, 2024 at 01:29:55PM -0700, Andrew Morton 
 wrote:
> That seems like a large change.

In what sense is it large?

I tried to lookup the code parts that depend on this default and either
add the other patches or mention the impact (that part could be more
thorough) in the commit message.

> It isn't clear why we'd want to merge this patchset.  Does it improve
> anyone's life and if so, how?

- kernel devs who don't care about policy
  - policy should be decided by distros/users, not in kernel

- users who need many threads
  - current default is too low
  - this is one more place to look at when configuring

- users who want to prevent fork-bombs
  - current default is ineffective (too high), false feeling of safety
  - i.e. they should configure appropriate mechanism appropriately


I thought that the first point alone would be convincing and that only
scaling impact might need clarification.

Regards,
Michal



[PATCH v2 13/13] mailbox: omap: Remove kernel FIFO message queuing

2024-04-10 Thread Andrew Davis
The kernel FIFO queue has a couple issues. The biggest issue is that
it causes extra latency in a path that can be used in real-time tasks,
such as communication with real-time remote processors.

The whole FIFO idea itself looks to be a leftover from before the
unified mailbox framework. The current mailbox framework expects
mbox_chan_received_data() to be called with data immediately as it
arrives. Remove the FIFO and pass the messages to the mailbox
framework directly as part of a threaded IRQ handler.

Signed-off-by: Andrew Davis 
---
 drivers/mailbox/Kconfig|   9 ---
 drivers/mailbox/omap-mailbox.c | 107 ++---
 2 files changed, 5 insertions(+), 111 deletions(-)

diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index 42940108a1874..78e4c74fbe5c2 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -68,15 +68,6 @@ config OMAP2PLUS_MBOX
  OMAP2/3; or IPU, IVA HD and DSP in OMAP4/5. Say Y here if you
  want to use OMAP2+ Mailbox framework support.
 
-config OMAP_MBOX_KFIFO_SIZE
-   int "Mailbox kfifo default buffer size (bytes)"
-   depends on OMAP2PLUS_MBOX
-   default 256
-   help
- Specify the default size of mailbox's kfifo buffers (bytes).
- This can also be changed at runtime (via the mbox_kfifo_size
- module parameter).
-
 config ROCKCHIP_MBOX
bool "Rockchip Soc Integrated Mailbox Support"
depends on ARCH_ROCKCHIP || COMPILE_TEST
diff --git a/drivers/mailbox/omap-mailbox.c b/drivers/mailbox/omap-mailbox.c
index c5d4083125856..46747559b438f 100644
--- a/drivers/mailbox/omap-mailbox.c
+++ b/drivers/mailbox/omap-mailbox.c
@@ -65,14 +65,6 @@ struct omap_mbox_fifo {
u32 intr_bit;
 };
 
-struct omap_mbox_queue {
-   spinlock_t  lock;
-   struct kfifofifo;
-   struct work_struct  work;
-   struct omap_mbox*mbox;
-   bool full;
-};
-
 struct omap_mbox_match_data {
u32 intr_type;
 };
@@ -90,7 +82,6 @@ struct omap_mbox_device {
 struct omap_mbox {
const char  *name;
int irq;
-   struct omap_mbox_queue  *rxq;
struct omap_mbox_device *parent;
struct omap_mbox_fifo   tx_fifo;
struct omap_mbox_fifo   rx_fifo;
@@ -99,10 +90,6 @@ struct omap_mbox {
boolsend_no_irq;
 };
 
-static unsigned int mbox_kfifo_size = CONFIG_OMAP_MBOX_KFIFO_SIZE;
-module_param(mbox_kfifo_size, uint, S_IRUGO);
-MODULE_PARM_DESC(mbox_kfifo_size, "Size of omap's mailbox kfifo (bytes)");
-
 static inline
 unsigned int mbox_read_reg(struct omap_mbox_device *mdev, size_t ofs)
 {
@@ -202,30 +189,6 @@ static void omap_mbox_disable_irq(struct omap_mbox *mbox, 
omap_mbox_irq_t irq)
mbox_write_reg(mbox->parent, bit, irqdisable);
 }
 
-/*
- * Message receiver(workqueue)
- */
-static void mbox_rx_work(struct work_struct *work)
-{
-   struct omap_mbox_queue *mq =
-   container_of(work, struct omap_mbox_queue, work);
-   u32 msg;
-   int len;
-
-   while (kfifo_len(>fifo) >= sizeof(msg)) {
-   len = kfifo_out(>fifo, (unsigned char *), sizeof(msg));
-   WARN_ON(len != sizeof(msg));
-
-   mbox_chan_received_data(mq->mbox->chan, (void *)(uintptr_t)msg);
-   spin_lock_irq(>lock);
-   if (mq->full) {
-   mq->full = false;
-   omap_mbox_enable_irq(mq->mbox, IRQ_RX);
-   }
-   spin_unlock_irq(>lock);
-   }
-}
-
 /*
  * Mailbox interrupt handler
  */
@@ -238,27 +201,15 @@ static void __mbox_tx_interrupt(struct omap_mbox *mbox)
 
 static void __mbox_rx_interrupt(struct omap_mbox *mbox)
 {
-   struct omap_mbox_queue *mq = mbox->rxq;
u32 msg;
-   int len;
 
while (!mbox_fifo_empty(mbox)) {
-   if (unlikely(kfifo_avail(>fifo) < sizeof(msg))) {
-   omap_mbox_disable_irq(mbox, IRQ_RX);
-   mq->full = true;
-   goto nomem;
-   }
-
msg = mbox_fifo_read(mbox);
-
-   len = kfifo_in(>fifo, (unsigned char *), sizeof(msg));
-   WARN_ON(len != sizeof(msg));
+   mbox_chan_received_data(mbox->chan, (void *)(uintptr_t)msg);
}
 
-   /* no more messages in the fifo. clear IRQ source. */
+   /* clear IRQ source. */
ack_mbox_irq(mbox, IRQ_RX);
-nomem:
-   schedule_work(>rxq->work);
 }
 
 static irqreturn_t mbox_interrupt(int irq, void *p)
@@ -274,57 +225,15 @@ static irqreturn_t mbox_interrupt(int irq, void *p)
return IRQ_HANDLED;
 }
 
-static struct omap_mbox_queue *mbox_queue_alloc(struct omap_mbox *mbox,
-   void (*work)(struct work_struct *))
-{
-   struct omap_mbox_queue *mq;
- 

[PATCH v2 fs/proc/bootconfig 2/2] fs/proc: Skip bootloader comment if no embedded kernel parameters

2024-04-08 Thread Paul E. McKenney
From: Masami Hiramatsu 

If the "bootconfig" kernel command-line argument was specified or if
the kernel was built with CONFIG_BOOT_CONFIG_FORCE, but if there are
no embedded kernel parameter, omit the "# Parameters from bootloader:"
comment from the /proc/bootconfig file.  This will cause automation
to fall back to the /proc/cmdline file, which will be identical to the
comment in this no-embedded-kernel-parameters case.

Signed-off-by: Masami Hiramatsu 
Signed-off-by: Paul E. McKenney 
---
 fs/proc/bootconfig.c   | 2 +-
 include/linux/bootconfig.h | 1 +
 init/main.c| 5 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
index e5635a6b127b0..87dcaae32ff87 100644
--- a/fs/proc/bootconfig.c
+++ b/fs/proc/bootconfig.c
@@ -63,7 +63,7 @@ static int __init copy_xbc_key_value_list(char *dst, size_t 
size)
dst += ret;
}
}
-   if (ret >= 0 && boot_command_line[0]) {
+   if (cmdline_has_extra_options() && ret >= 0 && boot_command_line[0]) {
ret = snprintf(dst, rest(dst, end), "# Parameters from 
bootloader:\n# %s\n",
   boot_command_line);
if (ret > 0)
diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index ca73940e26df8..e5ee2c694401e 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -10,6 +10,7 @@
 #ifdef __KERNEL__
 #include 
 #include 
+bool __init cmdline_has_extra_options(void);
 #else /* !__KERNEL__ */
 /*
  * NOTE: This is only for tools/bootconfig, because tools/bootconfig will
diff --git a/init/main.c b/init/main.c
index 2ca52474d0c30..881f6230ee59e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -487,6 +487,11 @@ static int __init warn_bootconfig(char *str)
 
 early_param("bootconfig", warn_bootconfig);
 
+bool __init cmdline_has_extra_options(void)
+{
+   return extra_command_line || extra_init_args;
+}
+
 /* Change NUL term back to "=", to make "param" the whole string. */
 static void __init repair_env_string(char *param, char *val)
 {
-- 
2.40.1




Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread kernel test robot
Hi Michal,

kernel test robot noticed the following build errors:

[auto build test ERROR on fec50db7033ea478773b159e0e2efb135270e3b7]

url:
https://github.com/intel-lab-lkp/linux/commits/Michal-Koutn/tracing-Remove-dependency-of-saved_cmdlines_buffer-on-PID_MAX_DEFAULT/20240408-230031
base:   fec50db7033ea478773b159e0e2efb135270e3b7
patch link:
https://lore.kernel.org/r/20240408145819.8787-3-mkoutny%40suse.com
patch subject: [PATCH 2/3] kernel/pid: Remove default pid_max value
config: arm-allnoconfig 
(https://download.01.org/0day-ci/archive/20240409/202404090903.3jz667sn-...@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 
8b3b4a92adee40483c27f26c478a384cd69c6f05)
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240409/202404090903.3jz667sn-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202404090903.3jz667sn-...@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/sysctl.c:23:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:522:36: warning: arithmetic between different 
enumeration types ('enum node_stat_item' and 'enum lru_list') 
[-Wenum-enum-conversion]
 522 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
 |   ~~~ ^ ~~~
>> kernel/sysctl.c:1819:14: error: initializing 'void *' with an expression of 
>> type 'const int *' discards qualifiers 
>> [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
1819 | .extra2 = _max_max,
 |   ^~~~
   1 warning and 1 error generated.


vim +1819 kernel/sysctl.c

f461d2dcd511c0 Christoph Hellwig   2020-04-24  1617  
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1618  static struct ctl_table 
kern_table[] = {
^1da177e4c3f41 Linus Torvalds  2005-04-16  1619 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1620 .procname   
= "panic",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1621 .data   
= _timeout,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1622 .maxlen 
= sizeof(int),
49f0ce5f92321c Jerome Marchand 2014-01-21  1623 .mode   
= 0644,
6d4561110a3e9f Eric W. Biederman   2009-11-16  1624 .proc_handler   
= proc_dointvec,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1625 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1626  #ifdef CONFIG_PROC_SYSCTL
^1da177e4c3f41 Linus Torvalds  2005-04-16  1627 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1628 .procname   
= "tainted",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1629 .maxlen 
= sizeof(long),
^1da177e4c3f41 Linus Torvalds  2005-04-16  1630 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1631 .proc_handler   
= proc_taint,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1632 },
2da02997e08d3e David Rientjes  2009-01-06  1633 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1634 .procname   
= "sysctl_writes_strict",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1635 .data   
= _writes_strict,
9e3961a0979817 Prarit Bhargava 2014-12-10  1636 .maxlen 
= sizeof(int),
2da02997e08d3e David Rientjes  2009-01-06  1637 .mode   
= 0644,
9e3961a0979817 Prarit Bhargava 2014-12-10  1638 .proc_handler   
= proc_dointvec_minmax,
78e36f3b0dae58 Xiaoming Ni 2022-01-21  1639 .extra1 
= SYSCTL_NEG_ONE,
eec4844fae7c03 Matteo Croce2019-07-18  1640 .extra2 
= SYSCTL_ONE,
2da02997e08d3e David Rientjes  2009-01-06  1641 },
964c9dff009189 Alexander Popov 2018-08-17  1642  #endif
1efff914afac8a Theodore Ts'o   2015-03-17  1643 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1644 .procname   
= "print-fatal-signals",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1645 .data   
= _fatal_signals,
964c9dff009189 Alexander Popov 2018-08-17  1646 .maxlen 
= sizeof(int),
1efff914afac8a Theodore Ts'o   2015-03-17  1647 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1648 .proc_handler   
= proc_dointvec,
1efff914afac8a Theodore Ts'o   2015-03-17  1649 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1650  #ifdef CONFIG_SPARC
^1da177e4c3f41 Linus Torvalds  2005-04-16  1651 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1652 .procname   
= "reb

Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread kernel test robot
Hi Michal,

kernel test robot noticed the following build warnings:

[auto build test WARNING on fec50db7033ea478773b159e0e2efb135270e3b7]

url:
https://github.com/intel-lab-lkp/linux/commits/Michal-Koutn/tracing-Remove-dependency-of-saved_cmdlines_buffer-on-PID_MAX_DEFAULT/20240408-230031
base:   fec50db7033ea478773b159e0e2efb135270e3b7
patch link:
https://lore.kernel.org/r/20240408145819.8787-3-mkoutny%40suse.com
patch subject: [PATCH 2/3] kernel/pid: Remove default pid_max value
config: alpha-allnoconfig 
(https://download.01.org/0day-ci/archive/20240409/202404090849.mgj3z0xi-...@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240409/202404090849.mgj3z0xi-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202404090849.mgj3z0xi-...@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/sysctl.c:1819:35: warning: initialization discards 'const' qualifier 
>> from pointer target type [-Wdiscarded-qualifiers]
1819 | .extra2 = _max_max,
 |   ^


vim +/const +1819 kernel/sysctl.c

f461d2dcd511c0 Christoph Hellwig   2020-04-24  1617  
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1618  static struct ctl_table 
kern_table[] = {
^1da177e4c3f41 Linus Torvalds  2005-04-16  1619 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1620 .procname   
= "panic",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1621 .data   
= _timeout,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1622 .maxlen 
= sizeof(int),
49f0ce5f92321c Jerome Marchand 2014-01-21  1623 .mode   
= 0644,
6d4561110a3e9f Eric W. Biederman   2009-11-16  1624 .proc_handler   
= proc_dointvec,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1625 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1626  #ifdef CONFIG_PROC_SYSCTL
^1da177e4c3f41 Linus Torvalds  2005-04-16  1627 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1628 .procname   
= "tainted",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1629 .maxlen 
= sizeof(long),
^1da177e4c3f41 Linus Torvalds  2005-04-16  1630 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1631 .proc_handler   
= proc_taint,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1632 },
2da02997e08d3e David Rientjes  2009-01-06  1633 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1634 .procname   
= "sysctl_writes_strict",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1635 .data   
= _writes_strict,
9e3961a0979817 Prarit Bhargava 2014-12-10  1636 .maxlen 
= sizeof(int),
2da02997e08d3e David Rientjes  2009-01-06  1637 .mode   
= 0644,
9e3961a0979817 Prarit Bhargava 2014-12-10  1638 .proc_handler   
= proc_dointvec_minmax,
78e36f3b0dae58 Xiaoming Ni 2022-01-21  1639 .extra1 
= SYSCTL_NEG_ONE,
eec4844fae7c03 Matteo Croce2019-07-18  1640 .extra2 
= SYSCTL_ONE,
2da02997e08d3e David Rientjes  2009-01-06  1641 },
964c9dff009189 Alexander Popov 2018-08-17  1642  #endif
1efff914afac8a Theodore Ts'o   2015-03-17  1643 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1644 .procname   
= "print-fatal-signals",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1645 .data   
= _fatal_signals,
964c9dff009189 Alexander Popov 2018-08-17  1646 .maxlen 
= sizeof(int),
1efff914afac8a Theodore Ts'o   2015-03-17  1647 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1648 .proc_handler   
= proc_dointvec,
1efff914afac8a Theodore Ts'o   2015-03-17  1649 },
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1650  #ifdef CONFIG_SPARC
^1da177e4c3f41 Linus Torvalds  2005-04-16  1651 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1652 .procname   
= "reboot-cmd",
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1653 .data   
= reboot_command,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1654 .maxlen 
= 256,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1655 .mode   
= 0644,
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1656 .proc_handler   
= proc_dostring,
^1da177e4c3f41 Linus Torvalds  2005-04-16  1657 },
^1da177e4c3f41 Linus Torvalds  2005-04-16  1658 {
f461d2dcd511c0 Christoph Hellwig   2020-04-24  1659 .procname   

Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread Andrew Morton
On Mon,  8 Apr 2024 16:58:18 +0200 Michal Koutný  wrote:

> The kernel provides mechanisms, while it should not imply policies --
> default pid_max seems to be an example of the policy that does not fit
> all. At the same time pid_max must have some value assigned, so use the
> end of the allowed range -- pid_max_max.
> 
> This change thus increases initial pid_max from 32k to 4M (x86_64
> defconfig).

That seems like a large change.

It isn't clear why we'd want to merge this patchset.  Does it improve
anyone's life and if so, how?




[PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread Michal Koutný
pid_max is a per-pidns (thus global too) limit on a number of tasks the
kernel admits. The knob can be configured by admin in the range between
pid_max_min and pid_max_max (sic). The default value sits between
those and it typically equals max(32k, 1k*nr_cpus).

The nr_cpu scaling was introduced in commit 72680a191b93 ("pids:
increase pid_max based on num_possible_cpus") to accommodate kernel's own
helper tasks (before workqueues). Generally, 1024 tasks/cpu cap is too
much if they were all running and it is also too little when they are
idle (memory being bottleneck).

The kernel also provides other mechanisms to restrict number of tasks --
threads-max sysctl and RLIMIT_NPROC with memory-scaled defaults and
generic pids cgroup controller (the last one being the solution of
fork-bombs, with qualified limits set up by admin).

The kernel provides mechanisms, while it should not imply policies --
default pid_max seems to be an example of the policy that does not fit
all. At the same time pid_max must have some value assigned, so use the
end of the allowed range -- pid_max_max.

This change thus increases initial pid_max from 32k to 4M (x86_64
defconfig).

This has effect on size of structure that alloc_pid/idr_alloc_cyclic
eventually uses and structure that kernel tracing uses with
'record-tgid' (~16 MiB).

Signed-off-by: Michal Koutný 
---
 include/linux/pid.h |  4 ++--
 include/linux/threads.h | 15 -------
 kernel/pid.c|  8 +++-
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index a3aad9b4074c..0d191ac02958 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -106,8 +106,8 @@ extern void exchange_tids(struct task_struct *task, struct 
task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 enum pid_type);
 
-extern int pid_max;
-extern int pid_max_min, pid_max_max;
+extern int pid_max_min, pid_max;
+extern const int pid_max_max;
 
 /*
  * look up a PID in the hash table. Must be called with the tasklist_lock
diff --git a/include/linux/threads.h b/include/linux/threads.h
index c34173e6c5f1..43f8f38a0c13 100644
--- a/include/linux/threads.h
+++ b/include/linux/threads.h
@@ -22,25 +22,18 @@
 
 #define MIN_THREADS_LEFT_FOR_ROOT 4
 
-/*
- * This controls the default maximum pid allocated to a process
- */
-#define PID_MAX_DEFAULT (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)
-
 /*
  * A maximum of 4 million PIDs should be enough for a while.
  * [NOTE: PID/TIDs are limited to 2^30 ~= 1 billion, see FUTEX_TID_MASK.]
  */
 #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
-   (sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
+   (sizeof(long) > 4 ? 4 * 1024 * 1024 : 0x8000))
 
 /*
- * Define a minimum number of pids per cpu.  Heuristically based
- * on original pid max of 32k for 32 cpus.  Also, increase the
- * minimum settable value for pid_max on the running system based
- * on similar defaults.  See kernel/pid.c:pid_idr_init() for details.
+ * Define a minimum number of pids per cpu. Mainly to accommodate
+ * smpboot_register_percpu_thread() kernel threads.
+ * See kernel/pid.c:pid_idr_init() for details.
  */
-#define PIDS_PER_CPU_DEFAULT   1024
 #define PIDS_PER_CPU_MIN   8
 
 #endif
diff --git a/kernel/pid.c b/kernel/pid.c
index da76ed1873f7..24ae505ac3b0 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -60,10 +60,10 @@ struct pid init_struct_pid = {
}, }
 };
 
-int pid_max = PID_MAX_DEFAULT;
+int pid_max = PID_MAX_LIMIT;
 
 int pid_max_min = RESERVED_PIDS + 1;
-int pid_max_max = PID_MAX_LIMIT;
+const int pid_max_max = PID_MAX_LIMIT;
 /*
  * Pseudo filesystems start inode numbering after one. We use Reserved
  * PIDs as a natural offset.
@@ -652,9 +652,7 @@ void __init pid_idr_init(void)
/* Verify no one has done anything silly: */
BUILD_BUG_ON(PID_MAX_LIMIT >= PIDNS_ADDING);
 
-   /* bump default and minimum pid_max based on number of cpus */
-   pid_max = min(pid_max_max, max_t(int, pid_max,
-   PIDS_PER_CPU_DEFAULT * num_possible_cpus()));
+   /* bump minimum pid_max based on number of cpus */
pid_max_min = max_t(int, pid_max_min,
PIDS_PER_CPU_MIN * num_possible_cpus());
pr_info("pid_max: default: %u minimum: %u\n", pid_max, pid_max_min);
-- 
2.44.0




[PATCH 0/3] kernel/pid: Remove default pid_max value

2024-04-08 Thread Michal Koutný
TL;DR excerpt from commit 02/03:

The kernel provides mechanisms, while it should not imply policies --
default pid_max seems to be an example of the policy that does not fit
all. At the same time pid_max must have some value assigned, so use the
end of the allowed range -- pid_max_max.

More details are in that commit's message. The other two commits are
related preparation and less related refresh in code that somewhat
references pid_max.

Michal Koutný (3):
  tracing: Remove dependency of saved_cmdlines_buffer on PID_MAX_DEFAULT
  kernel/pid: Remove default pid_max value
  tracing: Compare pid_max against pid_list capacity

 include/linux/pid.h   |  4 ++--
 include/linux/threads.h   | 15 ---
 kernel/pid.c  |  8 +++-
 kernel/trace/pid_list.c   |  6 +++---
 kernel/trace/pid_list.h   |  4 ++--
 kernel/trace/trace_sched_switch.c | 11 ++-
 6 files changed, 20 insertions(+), 28 deletions(-)


base-commit: fec50db7033ea478773b159e0e2efb135270e3b7
-- 
2.44.0




Re: [PATCH 13/13] mailbox: omap: Remove kernel FIFO message queuing

2024-04-01 Thread Andrew Davis

On 4/1/24 6:39 PM, Hari Nagalla wrote:

On 3/25/24 12:20, Andrew Davis wrote:

The kernel FIFO queue has a couple issues. The biggest issue is that
it causes extra latency in a path that can be used in real-time tasks,
such as communication with real-time remote processors.

The whole FIFO idea itself looks to be a leftover from before the
unified mailbox framework. The current mailbox framework expects
mbox_chan_received_data() to be called with data immediately as it
arrives. Remove the FIFO and pass the messages to the mailbox
framework directly.

Yes, this would definitely speed up the message receive path. With RT linux, 
the irq runs in thread context, so that is Ok. But with non-RT the whole 
receive path runs in interrupt context. So, i think it would be appropriate to 
use a threaded_irq()?


I was thinking the same at first, but seems some mailbox drivers use threaded, 
others
use non-threaded context. Since all we do in the IRQ context anymore is call
mbox_chan_received_data(), which is supposed to be IRQ safe, then it should be 
fine
either way. So for now I just kept this using the regular IRQ context as before.

If that does turn out to be an issue then let's switch to threaded.

Andrew



Re: [PATCH 13/13] mailbox: omap: Remove kernel FIFO message queuing

2024-04-01 Thread Hari Nagalla

On 3/25/24 12:20, Andrew Davis wrote:

The kernel FIFO queue has a couple issues. The biggest issue is that
it causes extra latency in a path that can be used in real-time tasks,
such as communication with real-time remote processors.

The whole FIFO idea itself looks to be a leftover from before the
unified mailbox framework. The current mailbox framework expects
mbox_chan_received_data() to be called with data immediately as it
arrives. Remove the FIFO and pass the messages to the mailbox
framework directly.
Yes, this would definitely speed up the message receive path. With RT 
linux, the irq runs in thread context, so that is Ok. But with non-RT 
the whole receive path runs in interrupt context. So, i think it would 
be appropriate to use a threaded_irq()?




[PATCH 13/13] mailbox: omap: Remove kernel FIFO message queuing

2024-03-25 Thread Andrew Davis
The kernel FIFO queue has a couple issues. The biggest issue is that
it causes extra latency in a path that can be used in real-time tasks,
such as communication with real-time remote processors.

The whole FIFO idea itself looks to be a leftover from before the
unified mailbox framework. The current mailbox framework expects
mbox_chan_received_data() to be called with data immediately as it
arrives. Remove the FIFO and pass the messages to the mailbox
framework directly.

Signed-off-by: Andrew Davis 
---
 drivers/mailbox/Kconfig|   9 ---
 drivers/mailbox/omap-mailbox.c | 103 +
 2 files changed, 3 insertions(+), 109 deletions(-)

diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index 42940108a1874..78e4c74fbe5c2 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -68,15 +68,6 @@ config OMAP2PLUS_MBOX
  OMAP2/3; or IPU, IVA HD and DSP in OMAP4/5. Say Y here if you
  want to use OMAP2+ Mailbox framework support.
 
-config OMAP_MBOX_KFIFO_SIZE
-   int "Mailbox kfifo default buffer size (bytes)"
-   depends on OMAP2PLUS_MBOX
-   default 256
-   help
- Specify the default size of mailbox's kfifo buffers (bytes).
- This can also be changed at runtime (via the mbox_kfifo_size
- module parameter).
-
 config ROCKCHIP_MBOX
bool "Rockchip Soc Integrated Mailbox Support"
depends on ARCH_ROCKCHIP || COMPILE_TEST
diff --git a/drivers/mailbox/omap-mailbox.c b/drivers/mailbox/omap-mailbox.c
index c5d4083125856..4e7e0e2f537b0 100644
--- a/drivers/mailbox/omap-mailbox.c
+++ b/drivers/mailbox/omap-mailbox.c
@@ -65,14 +65,6 @@ struct omap_mbox_fifo {
u32 intr_bit;
 };
 
-struct omap_mbox_queue {
-   spinlock_t  lock;
-   struct kfifofifo;
-   struct work_struct  work;
-   struct omap_mbox*mbox;
-   bool full;
-};
-
 struct omap_mbox_match_data {
u32 intr_type;
 };
@@ -90,7 +82,6 @@ struct omap_mbox_device {
 struct omap_mbox {
const char  *name;
int irq;
-   struct omap_mbox_queue  *rxq;
struct omap_mbox_device *parent;
struct omap_mbox_fifo   tx_fifo;
struct omap_mbox_fifo   rx_fifo;
@@ -99,10 +90,6 @@ struct omap_mbox {
boolsend_no_irq;
 };
 
-static unsigned int mbox_kfifo_size = CONFIG_OMAP_MBOX_KFIFO_SIZE;
-module_param(mbox_kfifo_size, uint, S_IRUGO);
-MODULE_PARM_DESC(mbox_kfifo_size, "Size of omap's mailbox kfifo (bytes)");
-
 static inline
 unsigned int mbox_read_reg(struct omap_mbox_device *mdev, size_t ofs)
 {
@@ -202,30 +189,6 @@ static void omap_mbox_disable_irq(struct omap_mbox *mbox, 
omap_mbox_irq_t irq)
mbox_write_reg(mbox->parent, bit, irqdisable);
 }
 
-/*
- * Message receiver(workqueue)
- */
-static void mbox_rx_work(struct work_struct *work)
-{
-   struct omap_mbox_queue *mq =
-   container_of(work, struct omap_mbox_queue, work);
-   u32 msg;
-   int len;
-
-   while (kfifo_len(>fifo) >= sizeof(msg)) {
-   len = kfifo_out(>fifo, (unsigned char *), sizeof(msg));
-   WARN_ON(len != sizeof(msg));
-
-   mbox_chan_received_data(mq->mbox->chan, (void *)(uintptr_t)msg);
-   spin_lock_irq(>lock);
-   if (mq->full) {
-   mq->full = false;
-   omap_mbox_enable_irq(mq->mbox, IRQ_RX);
-   }
-   spin_unlock_irq(>lock);
-   }
-}
-
 /*
  * Mailbox interrupt handler
  */
@@ -238,27 +201,15 @@ static void __mbox_tx_interrupt(struct omap_mbox *mbox)
 
 static void __mbox_rx_interrupt(struct omap_mbox *mbox)
 {
-   struct omap_mbox_queue *mq = mbox->rxq;
u32 msg;
-   int len;
 
while (!mbox_fifo_empty(mbox)) {
-   if (unlikely(kfifo_avail(>fifo) < sizeof(msg))) {
-   omap_mbox_disable_irq(mbox, IRQ_RX);
-   mq->full = true;
-   goto nomem;
-   }
-
msg = mbox_fifo_read(mbox);
-
-   len = kfifo_in(>fifo, (unsigned char *), sizeof(msg));
-   WARN_ON(len != sizeof(msg));
+   mbox_chan_received_data(mbox->chan, (void *)(uintptr_t)msg);
}
 
-   /* no more messages in the fifo. clear IRQ source. */
+   /* clear IRQ source. */
ack_mbox_irq(mbox, IRQ_RX);
-nomem:
-   schedule_work(>rxq->work);
 }
 
 static irqreturn_t mbox_interrupt(int irq, void *p)
@@ -274,57 +225,15 @@ static irqreturn_t mbox_interrupt(int irq, void *p)
return IRQ_HANDLED;
 }
 
-static struct omap_mbox_queue *mbox_queue_alloc(struct omap_mbox *mbox,
-   void (*work)(struct work_struct *))
-{
-   struct omap_mbox_queue *mq;
-   unsigned int size;
-
-  

[PATCH -next] fs: Fix kernel-doc comments to functions

2024-03-22 Thread Yang Li
This commit fix kernel-doc style comments with complete parameter
descriptions for the lookup_file(),lookup_dir_entry() and
lookup_file_dentry().

Signed-off-by: Yang Li 
---
 fs/tracefs/event_inode.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index dc067eeb6387..894c6ca1e500 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -336,6 +336,7 @@ static void update_inode_attr(struct dentry *dentry, struct 
inode *inode,
 
 /**
  * lookup_file - look up a file in the tracefs filesystem
+ * @parent_ei: Pointer to the eventfs_inode that represents parent of the file
  * @dentry: the dentry to look up
  * @mode: the permission that the file should have.
  * @attr: saved attributes changed by user
@@ -389,6 +390,7 @@ static struct dentry *lookup_file(struct eventfs_inode 
*parent_ei,
 /**
  * lookup_dir_entry - look up a dir in the tracefs filesystem
  * @dentry: the directory to look up
+ * @pei: Pointer to the parent eventfs_inode if available
  * @ei: the eventfs_inode that represents the directory to create
  *
  * This function will look up a dentry for a directory represented by
@@ -478,16 +480,20 @@ void eventfs_d_release(struct dentry *dentry)
 
 /**
  * lookup_file_dentry - create a dentry for a file of an eventfs_inode
+ * @dentry: The parent dentry under which the new file's dentry will be created
  * @ei: the eventfs_inode that the file will be created under
  * @idx: the index into the entry_attrs[] of the @ei
- * @parent: The parent dentry of the created file.
- * @name: The name of the file to create
  * @mode: The mode of the file.
  * @data: The data to use to set the inode of the file with on open()
  * @fops: The fops of the file to be created.
  *
- * Create a dentry for a file of an eventfs_inode @ei and place it into the
- * address located at @e_dentry.
+ * This function creates a dentry for a file associated with an
+ * eventfs_inode @ei. It uses the entry attributes specified by @idx,
+ * if available. The file will have the specified @mode and its inode will be
+ * set up with @data upon open. The file operations will be set to @fops.
+ *
+ * Return: Returns a pointer to the newly created file's dentry or an error
+ * pointer.
  */
 static struct dentry *
 lookup_file_dentry(struct dentry *dentry,
-- 
2.20.1.7.g153144c




[PATCH v7 6/7] LoongArch: Add pv ipi support on guest kernel side

2024-03-15 Thread Bibo Mao
PARAVIRT option and pv ipi is added on guest kernel side, function
pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
firstly checks whether system runs on VM mode. If kernel runs on VM mode,
it will call function kvm_para_available() to detect current hypervirsor
type. Now only KVM type detection is supported, the paravirt function can
work only if current hypervisor type is KVM, since there is only KVM
supported on LoongArch now.

PV IPI uses virtual IPI sender and virtual IPI receiver function. With
virutal IPI sender, ipi message is stored in DDR memory rather than
emulated HW. IPI multicast is supported, and 128 vcpus can received IPIs
at the same time like X86 KVM method. Hypercall method is used for IPI
sending.

With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
acknowledge. And IPI message is stored in DDR, no trap in get IPI message.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|   9 ++
 arch/loongarch/include/asm/hardirq.h  |   1 +
 arch/loongarch/include/asm/paravirt.h |  27 
 .../include/asm/paravirt_api_clock.h  |   1 +
 arch/loongarch/kernel/Makefile|   1 +
 arch/loongarch/kernel/irq.c   |   2 +-
 arch/loongarch/kernel/paravirt.c  | 151 ++
 arch/loongarch/kernel/smp.c   |   4 +-
 8 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index b274784c2e26..a1fccaf117aa 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -578,6 +578,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
 typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+   atomic_t message cacheline_aligned_in_smp;
 } cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + 
IRQ_STACK_SIZE);
}
 
-   set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+   set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI 
| ECFGF_PMC);
 }
diff --git a/arch/loongarch/kernel/paravir

[PATCH v6 6/7] LoongArch: Add pv ipi support on guest kernel side

2024-03-02 Thread Bibo Mao
PARAVIRT option and pv ipi is added on guest kernel side, function
pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
firstly checks whether system runs on VM mode. If kernel runs on VM mode,
it will call function kvm_para_available() to detect current hypervirsor
type. Now only KVM type detection is supported, the paravirt function can
work only if current hypervisor type is KVM, since there is only KVM
supported on LoongArch now.

PV IPI uses virtual IPI sender and virtual IPI receiver function. With
virutal IPI sender, ipi message is stored in DDR memory rather than
emulated HW. IPI multicast is supported, and 128 vcpus can received IPIs
at the same time like X86 KVM method. Hypercall method is used for IPI
sending.

With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
acknowledge. And IPI message is stored in DDR, no trap in get IPI message.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|   9 ++
 arch/loongarch/include/asm/hardirq.h  |   1 +
 arch/loongarch/include/asm/paravirt.h |  27 
 .../include/asm/paravirt_api_clock.h  |   1 +
 arch/loongarch/kernel/Makefile|   1 +
 arch/loongarch/kernel/irq.c   |   2 +-
 arch/loongarch/kernel/paravirt.c  | 151 ++
 arch/loongarch/kernel/smp.c   |   4 +-
 8 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 929f68926b34..fdaae9a0435c 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
 typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+   atomic_t message cacheline_aligned_in_smp;
 } cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + 
IRQ_STACK_SIZE);
}
 
-   set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+   set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI 
| ECFGF_PMC);
 }
diff --git a/arch/loongarch/kernel/paravir

Re: [PATCH v5 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-25 Thread maobibo




On 2024/2/24 下午5:15, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao  wrote:


Paravirt interface pv_ipi_init() is added here for guest kernel, it
firstly checks whether system runs on VM mode. If kernel runs on VM mode,
it will call function kvm_para_available() to detect current VMM type.
Now only KVM VMM type is detected,the paravirt function can work only if
current VMM is KVM hypervisor, since there is only KVM hypervisor
supported on LoongArch now.

There is not effective with pv_ipi_init() now, it is dummy function.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/Kconfig|  9 
  arch/loongarch/include/asm/kvm_para.h |  7 
  arch/loongarch/include/asm/paravirt.h | 27 
  .../include/asm/paravirt_api_clock.h  |  1 +
  arch/loongarch/kernel/Makefile|  1 +
  arch/loongarch/kernel/paravirt.c  | 41 +++
  arch/loongarch/kernel/setup.c |  1 +
  7 files changed, 87 insertions(+)
  create mode 100644 arch/loongarch/include/asm/paravirt.h
  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
  create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 929f68926b34..fdaae9a0435c 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
 bool
 default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
  config ARCH_SUPPORTS_KEXEC
 def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index d48f993ae206..af5d677a9052 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
  #ifndef _ASM_LOONGARCH_KVM_PARA_H
  #define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypercall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
  /*
   * LoongArch hypercall return code
   */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
  obj-$(CONFIG_STACKTRACE)   += stacktrace.o

  obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o

  obj-$(CONFIG_SMP)  += smp.o

diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..5cf794e8490f
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_ipi_init(void)
+{
+   if (!cp

Re: [PATCH] ftrace: fix most kernel-doc warnings

2024-02-25 Thread Google
On Thu, 22 Feb 2024 21:48:33 -0800
Randy Dunlap  wrote:

> Reduce the number of kernel-doc warnings from 52 down to 10, i.e.,
> fix 42 kernel-doc warnings by (a) using the Returns: format for
> function return values or (b) using "@var:" instead of "@var -"
> for function parameter descriptions.
> 
> Fix one return values list so that it is formatted correctly when
> rendered for output.
> 
> Spell "non-zero" with a hyphen in several places.

Looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thanks!

> 
> Signed-off-by: Randy Dunlap 
> Reported-by: kernel test robot 
> Link: 
> https://lore.kernel.org/oe-kbuild-all/202312180518.x6frydsn-...@intel.com/
> Cc: Steven Rostedt 
> Cc: Masami Hiramatsu 
> Cc: Mathieu Desnoyers 
> Cc: Mark Rutland 
> Cc: linux-trace-ker...@vger.kernel.org
> ---
> This patch addresses most of the reported kernel-doc warnings but does
> not fix all of them, so I did not use "Closes:" for the Link: tag.
> 
>  kernel/trace/ftrace.c |   90 ++++----
>  1 file changed, 46 insertions(+), 44 deletions(-)
> 
> diff -- a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1160,7 +1160,7 @@ __ftrace_lookup_ip(struct ftrace_hash *h
>   * Search a given @hash to see if a given instruction pointer (@ip)
>   * exists in it.
>   *
> - * Returns the entry that holds the @ip if found. NULL otherwise.
> + * Returns: the entry that holds the @ip if found. NULL otherwise.
>   */
>  struct ftrace_func_entry *
>  ftrace_lookup_ip(struct ftrace_hash *hash, unsigned long ip)
> @@ -1282,7 +1282,7 @@ static void free_ftrace_hash_rcu(struct
>  
>  /**
>   * ftrace_free_filter - remove all filters for an ftrace_ops
> - * @ops - the ops to remove the filters from
> + * @ops: the ops to remove the filters from
>   */
>  void ftrace_free_filter(struct ftrace_ops *ops)
>  {
> @@ -1587,7 +1587,7 @@ static struct dyn_ftrace *lookup_rec(uns
>   * @end: end of range to search (inclusive). @end points to the last byte
>   *   to check.
>   *
> - * Returns rec->ip if the related ftrace location is a least partly within
> + * Returns: rec->ip if the related ftrace location is a least partly within
>   * the given address range. That is, the first address of the instruction
>   * that is either a NOP or call to the function tracer. It checks the ftrace
>   * internal tables to determine if the address belongs or not.
> @@ -1607,9 +1607,10 @@ unsigned long ftrace_location_range(unsi
>   * ftrace_location - return the ftrace location
>   * @ip: the instruction pointer to check
>   *
> - * If @ip matches the ftrace location, return @ip.
> - * If @ip matches sym+0, return sym's ftrace location.
> - * Otherwise, return 0.
> + * Returns:
> + * * If @ip matches the ftrace location, return @ip.
> + * * If @ip matches sym+0, return sym's ftrace location.
> + * * Otherwise, return 0.
>   */
>  unsigned long ftrace_location(unsigned long ip)
>  {
> @@ -1639,7 +1640,7 @@ out:
>   * @start: start of range to search
>   * @end: end of range to search (inclusive). @end points to the last byte to 
> check.
>   *
> - * Returns 1 if @start and @end contains a ftrace location.
> + * Returns: 1 if @start and @end contains a ftrace location.
>   * That is, the instruction that is either a NOP or call to
>   * the function tracer. It checks the ftrace internal tables to
>   * determine if the address belongs or not.
> @@ -2574,7 +2575,7 @@ static void call_direct_funcs(unsigned l
>   * wants to convert to a callback that saves all regs. If FTRACE_FL_REGS
>   * is not set, then it wants to convert to the normal callback.
>   *
> - * Returns the address of the trampoline to set to
> + * Returns: the address of the trampoline to set to
>   */
>  unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec)
>  {
> @@ -2615,7 +2616,7 @@ unsigned long ftrace_get_addr_new(struct
>   * a function that saves all the regs. Basically the '_EN' version
>   * represents the current state of the function.
>   *
> - * Returns the address of the trampoline that is currently being called
> + * Returns: the address of the trampoline that is currently being called
>   */
>  unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec)
>  {
> @@ -2719,7 +2720,7 @@ struct ftrace_rec_iter {
>  /**
>   * ftrace_rec_iter_start - start up iterating over traced functions
>   *
> - * Returns an iterator handle that is used to iterate over all
> + * Returns: an iterator handle that is used to iterate over all
>   * the records that represent address locations where functions
>   * are traced.
>   *
> @@ -2751,7 +2

Re: [PATCH v5 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-24 Thread Huacai Chen
Hi, Bibo,

On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao  wrote:
>
> Paravirt interface pv_ipi_init() is added here for guest kernel, it
> firstly checks whether system runs on VM mode. If kernel runs on VM mode,
> it will call function kvm_para_available() to detect current VMM type.
> Now only KVM VMM type is detected,the paravirt function can work only if
> current VMM is KVM hypervisor, since there is only KVM hypervisor
> supported on LoongArch now.
>
> There is not effective with pv_ipi_init() now, it is dummy function.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/Kconfig|  9 
>  arch/loongarch/include/asm/kvm_para.h |  7 
>  arch/loongarch/include/asm/paravirt.h | 27 
>  .../include/asm/paravirt_api_clock.h  |  1 +
>  arch/loongarch/kernel/Makefile|  1 +
>  arch/loongarch/kernel/paravirt.c      | 41 +++
>  arch/loongarch/kernel/setup.c |  1 +
>  7 files changed, 87 insertions(+)
>  create mode 100644 arch/loongarch/include/asm/paravirt.h
>  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>  create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 929f68926b34..fdaae9a0435c 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> +   bool "Enable paravirtualization code"
> +   depends on AS_HAS_LVZ_EXTENSION
> +   help
> +  This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization.  However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
>  config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> b/arch/loongarch/include/asm/kvm_para.h
> index d48f993ae206..af5d677a9052 100644
> --- a/arch/loongarch/include/asm/kvm_para.h
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -2,6 +2,13 @@
>  #ifndef _ASM_LOONGARCH_KVM_PARA_H
>  #define _ASM_LOONGARCH_KVM_PARA_H
>
> +/*
> + * Hypercall code field
> + */
> +#define HYPERVISOR_KVM 1
> +#define HYPERVISOR_VENDOR_SHIFT8
> +#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) 
> + code)
> +
>  /*
>   * LoongArch hypercall return code
>   */
> diff --git a/arch/loongarch/include/asm/paravirt.h 
> b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index ..58f7b7b89f2c
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include 
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> +   return static_call(pv_steal_clock)(cpu);
> +}
> +
> +int pv_ipi_init(void);
> +#else
> +static inline int pv_ipi_init(void)
> +{
> +   return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
> b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index ..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include 
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3c808c680370..662e6e9de12d 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>  obj-$(CONFIG_STACKTRACE)   += stacktrace.o
>
>  obj-$(CONFIG_PROC_FS)  += proc.o
> +obj-$(CONFIG_PARAVIRT) += paravirt.o
>
>  obj-$(CONFIG_SMP)  += smp.o
>
> diff --git a/arch/loongarch/kernel/paravirt.c 
> b/arch/loongarch/kernel/paravirt.c
> new file mode 100644
> index ..5cf794e8490f
> --- /dev/null
> +++ b/arch/loongarch/kernel/paravirt.c
> @@ -0,0 +1,41 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct static_key paravirt_steal_enabled;
&

[PATCH] ftrace: fix most kernel-doc warnings

2024-02-22 Thread Randy Dunlap
Reduce the number of kernel-doc warnings from 52 down to 10, i.e.,
fix 42 kernel-doc warnings by (a) using the Returns: format for
function return values or (b) using "@var:" instead of "@var -"
for function parameter descriptions.

Fix one return values list so that it is formatted correctly when
rendered for output.

Spell "non-zero" with a hyphen in several places.

Signed-off-by: Randy Dunlap 
Reported-by: kernel test robot 
Link: https://lore.kernel.org/oe-kbuild-all/202312180518.x6frydsn-...@intel.com/
Cc: Steven Rostedt 
Cc: Masami Hiramatsu 
Cc: Mathieu Desnoyers 
Cc: Mark Rutland 
Cc: linux-trace-ker...@vger.kernel.org
---
This patch addresses most of the reported kernel-doc warnings but does
not fix all of them, so I did not use "Closes:" for the Link: tag.

 kernel/trace/ftrace.c |   90 
 1 file changed, 46 insertions(+), 44 deletions(-)

diff -- a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1160,7 +1160,7 @@ __ftrace_lookup_ip(struct ftrace_hash *h
  * Search a given @hash to see if a given instruction pointer (@ip)
  * exists in it.
  *
- * Returns the entry that holds the @ip if found. NULL otherwise.
+ * Returns: the entry that holds the @ip if found. NULL otherwise.
  */
 struct ftrace_func_entry *
 ftrace_lookup_ip(struct ftrace_hash *hash, unsigned long ip)
@@ -1282,7 +1282,7 @@ static void free_ftrace_hash_rcu(struct
 
 /**
  * ftrace_free_filter - remove all filters for an ftrace_ops
- * @ops - the ops to remove the filters from
+ * @ops: the ops to remove the filters from
  */
 void ftrace_free_filter(struct ftrace_ops *ops)
 {
@@ -1587,7 +1587,7 @@ static struct dyn_ftrace *lookup_rec(uns
  * @end: end of range to search (inclusive). @end points to the last byte
  * to check.
  *
- * Returns rec->ip if the related ftrace location is a least partly within
+ * Returns: rec->ip if the related ftrace location is a least partly within
  * the given address range. That is, the first address of the instruction
  * that is either a NOP or call to the function tracer. It checks the ftrace
  * internal tables to determine if the address belongs or not.
@@ -1607,9 +1607,10 @@ unsigned long ftrace_location_range(unsi
  * ftrace_location - return the ftrace location
  * @ip: the instruction pointer to check
  *
- * If @ip matches the ftrace location, return @ip.
- * If @ip matches sym+0, return sym's ftrace location.
- * Otherwise, return 0.
+ * Returns:
+ * * If @ip matches the ftrace location, return @ip.
+ * * If @ip matches sym+0, return sym's ftrace location.
+ * * Otherwise, return 0.
  */
 unsigned long ftrace_location(unsigned long ip)
 {
@@ -1639,7 +1640,7 @@ out:
  * @start: start of range to search
  * @end: end of range to search (inclusive). @end points to the last byte to 
check.
  *
- * Returns 1 if @start and @end contains a ftrace location.
+ * Returns: 1 if @start and @end contains a ftrace location.
  * That is, the instruction that is either a NOP or call to
  * the function tracer. It checks the ftrace internal tables to
  * determine if the address belongs or not.
@@ -2574,7 +2575,7 @@ static void call_direct_funcs(unsigned l
  * wants to convert to a callback that saves all regs. If FTRACE_FL_REGS
  * is not set, then it wants to convert to the normal callback.
  *
- * Returns the address of the trampoline to set to
+ * Returns: the address of the trampoline to set to
  */
 unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec)
 {
@@ -2615,7 +2616,7 @@ unsigned long ftrace_get_addr_new(struct
  * a function that saves all the regs. Basically the '_EN' version
  * represents the current state of the function.
  *
- * Returns the address of the trampoline that is currently being called
+ * Returns: the address of the trampoline that is currently being called
  */
 unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec)
 {
@@ -2719,7 +2720,7 @@ struct ftrace_rec_iter {
 /**
  * ftrace_rec_iter_start - start up iterating over traced functions
  *
- * Returns an iterator handle that is used to iterate over all
+ * Returns: an iterator handle that is used to iterate over all
  * the records that represent address locations where functions
  * are traced.
  *
@@ -2751,7 +2752,7 @@ struct ftrace_rec_iter *ftrace_rec_iter_
  * ftrace_rec_iter_next - get the next record to process.
  * @iter: The handle to the iterator.
  *
- * Returns the next iterator after the given iterator @iter.
+ * Returns: the next iterator after the given iterator @iter.
  */
 struct ftrace_rec_iter *ftrace_rec_iter_next(struct ftrace_rec_iter *iter)
 {
@@ -2776,7 +2777,7 @@ struct ftrace_rec_iter *ftrace_rec_iter_
  * ftrace_rec_iter_record - get the record at the iterator location
  * @iter: The current iterator location
  *
- * Returns the record that the current @iter is at.
+ * Returns: the record that the current @iter is at.
  */
 st

[PATCH v5 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-21 Thread Bibo Mao
Paravirt interface pv_ipi_init() is added here for guest kernel, it
firstly checks whether system runs on VM mode. If kernel runs on VM mode,
it will call function kvm_para_available() to detect current VMM type.
Now only KVM VMM type is detected,the paravirt function can work only if
current VMM is KVM hypervisor, since there is only KVM hypervisor
supported on LoongArch now.

There is not effective with pv_ipi_init() now, it is dummy function.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  9 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  1 +
 7 files changed, 87 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 929f68926b34..fdaae9a0435c 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index d48f993ae206..af5d677a9052 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypercall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
 /*
  * LoongArch hypercall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..5cf794e8490f
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_ipi_init(void)
+{
+   if (!cpu_has_hypervisor)
+   return 0;
+   if (!kvm_para_available())
+   return 0;
+
+   return 1;
+}
diff --git 

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread maobibo




On 2024/2/19 下午5:38, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 5:21 PM maobibo  wrote:




On 2024/2/19 下午4:48, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:




On 2024/2/19 上午10:42, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


The patch adds paravirt interface for guest kernel, function
pv_guest_initi() firstly checks whether system runs on VM mode. If kernel
runs on VM mode, it will call function kvm_para_available() to detect
whether current VMM is KVM hypervisor. And the paravirt function can work
only if current VMM is KVM hypervisor, since there is only KVM hypervisor
supported on LoongArch now.

This patch only adds paravirt interface for guest kernel, however there
is not effective pv functions added here.

Signed-off-by: Bibo Mao 
---
arch/loongarch/Kconfig|  9 
arch/loongarch/include/asm/kvm_para.h |  7 
arch/loongarch/include/asm/paravirt.h | 27 
.../include/asm/paravirt_api_clock.h  |  1 +
arch/loongarch/kernel/Makefile|  1 +
arch/loongarch/kernel/paravirt.c  | 41 +++
arch/loongarch/kernel/setup.c |  2 +
7 files changed, 88 insertions(+)
create mode 100644 arch/loongarch/include/asm/paravirt.h
create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
   bool
   default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
config ARCH_SUPPORTS_KEXEC
   def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
#ifndef _ASM_LOONGARCH_KVM_PARA_H
#define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
/*
 * LoongArch hypcall return code
 */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}

The steal time code can be removed in this patch, I think.


Originally I want to remove this piece of code, but it fails to compile
if CONFIG_PARAVIRT is selected. Here is reference code, function
paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.

static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
   if (static_key_false(_steal_enabled)) {
   u64 steal;

   steal = paravirt_steal_clock(smp_processor_id());
   steal -= this_rq()->prev_steal_time;
   steal = min(steal, maxtime);
   account_steal_time(steal);
   this_rq()->prev_steal_time += steal;

   return steal;
   }
#endif
   return 0;
}

OK, then keep it.




+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
obj-$(CONFIG_STACKTRACE)

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread Huacai Chen
On Mon, Feb 19, 2024 at 5:21 PM maobibo  wrote:
>
>
>
> On 2024/2/19 下午4:48, Huacai Chen wrote:
> > On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:
> >>
> >>
> >>
> >> On 2024/2/19 上午10:42, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
> >>>>
> >>>> The patch adds paravirt interface for guest kernel, function
> >>>> pv_guest_initi() firstly checks whether system runs on VM mode. If kernel
> >>>> runs on VM mode, it will call function kvm_para_available() to detect
> >>>> whether current VMM is KVM hypervisor. And the paravirt function can work
> >>>> only if current VMM is KVM hypervisor, since there is only KVM hypervisor
> >>>> supported on LoongArch now.
> >>>>
> >>>> This patch only adds paravirt interface for guest kernel, however there
> >>>> is not effective pv functions added here.
> >>>>
> >>>> Signed-off-by: Bibo Mao 
> >>>> ---
> >>>>arch/loongarch/Kconfig|  9 ++++
> >>>>arch/loongarch/include/asm/kvm_para.h |  7 ++++
> >>>>arch/loongarch/include/asm/paravirt.h | 27 
> >>>>.../include/asm/paravirt_api_clock.h  |  1 +
> >>>>arch/loongarch/kernel/Makefile|  1 +
> >>>>    arch/loongarch/kernel/paravirt.c  | 41 +++
> >>>>arch/loongarch/kernel/setup.c |  2 +
> >>>>7 files changed, 88 insertions(+)
> >>>>create mode 100644 arch/loongarch/include/asm/paravirt.h
> >>>>create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> >>>>create mode 100644 arch/loongarch/kernel/paravirt.c
> >>>>
> >>>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >>>> index 10959e6c3583..817a56dff80f 100644
> >>>> --- a/arch/loongarch/Kconfig
> >>>> +++ b/arch/loongarch/Kconfig
> >>>> @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
> >>>>   bool
> >>>>   default y
> >>>>
> >>>> +config PARAVIRT
> >>>> +   bool "Enable paravirtualization code"
> >>>> +   depends on AS_HAS_LVZ_EXTENSION
> >>>> +   help
> >>>> +  This changes the kernel so it can modify itself when it is run
> >>>> + under a hypervisor, potentially improving performance 
> >>>> significantly
> >>>> + over full virtualization.  However, when run without a 
> >>>> hypervisor
> >>>> + the kernel is theoretically slower and slightly larger.
> >>>> +
> >>>>config ARCH_SUPPORTS_KEXEC
> >>>>   def_bool y
> >>>>
> >>>> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> >>>> b/arch/loongarch/include/asm/kvm_para.h
> >>>> index 9425d3b7e486..41200e922a82 100644
> >>>> --- a/arch/loongarch/include/asm/kvm_para.h
> >>>> +++ b/arch/loongarch/include/asm/kvm_para.h
> >>>> @@ -2,6 +2,13 @@
> >>>>#ifndef _ASM_LOONGARCH_KVM_PARA_H
> >>>>#define _ASM_LOONGARCH_KVM_PARA_H
> >>>>
> >>>> +/*
> >>>> + * Hypcall code field
> >>>> + */
> >>>> +#define HYPERVISOR_KVM 1
> >>>> +#define HYPERVISOR_VENDOR_SHIFT8
> >>>> +#define HYPERCALL_CODE(vendor, code)   ((vendor << 
> >>>> HYPERVISOR_VENDOR_SHIFT) + code)
> >>>> +
> >>>>/*
> >>>> * LoongArch hypcall return code
> >>>> */
> >>>> diff --git a/arch/loongarch/include/asm/paravirt.h 
> >>>> b/arch/loongarch/include/asm/paravirt.h
> >>>> new file mode 100644
> >>>> index ..b64813592ba0
> >>>> --- /dev/null
> >>>> +++ b/arch/loongarch/include/asm/paravirt.h
> >>>> @@ -0,0 +1,27 @@
> >>>> +/* SPDX-License-Identifier: GPL-2.0 */
> >>>> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> >>>> +#define _ASM_LOONGARCH_PARAVIRT_H
> >>>> +
> >>>> +#ifdef CONFIG_PARAVIRT
> >>>> +#include 
> &g

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread maobibo




On 2024/2/19 下午4:48, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:




On 2024/2/19 上午10:42, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


The patch adds paravirt interface for guest kernel, function
pv_guest_initi() firstly checks whether system runs on VM mode. If kernel
runs on VM mode, it will call function kvm_para_available() to detect
whether current VMM is KVM hypervisor. And the paravirt function can work
only if current VMM is KVM hypervisor, since there is only KVM hypervisor
supported on LoongArch now.

This patch only adds paravirt interface for guest kernel, however there
is not effective pv functions added here.

Signed-off-by: Bibo Mao 
---
   arch/loongarch/Kconfig|  9 
   arch/loongarch/include/asm/kvm_para.h |  7 
   arch/loongarch/include/asm/paravirt.h | 27 
   .../include/asm/paravirt_api_clock.h  |  1 +
   arch/loongarch/kernel/Makefile|  1 +
   arch/loongarch/kernel/paravirt.c  | 41 +++
   arch/loongarch/kernel/setup.c |  2 +
   7 files changed, 88 insertions(+)
   create mode 100644 arch/loongarch/include/asm/paravirt.h
   create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
   create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
  bool
  default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
   config ARCH_SUPPORTS_KEXEC
  def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
   #ifndef _ASM_LOONGARCH_KVM_PARA_H
   #define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
   /*
* LoongArch hypcall return code
*/
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}

The steal time code can be removed in this patch, I think.


Originally I want to remove this piece of code, but it fails to compile
if CONFIG_PARAVIRT is selected. Here is reference code, function
paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.

static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
  if (static_key_false(_steal_enabled)) {
  u64 steal;

  steal = paravirt_steal_clock(smp_processor_id());
  steal -= this_rq()->prev_steal_time;
  steal = min(steal, maxtime);
  account_steal_time(steal);
  this_rq()->prev_steal_time += steal;

  return steal;
  }
#endif
  return 0;
}

OK, then keep it.




+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
   obj-$(CONFIG_STACKTRACE)   += stacktrace.o

   obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread Huacai Chen
On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:
>
>
>
> On 2024/2/19 上午10:42, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
> >>
> >> The patch adds paravirt interface for guest kernel, function
> >> pv_guest_initi() firstly checks whether system runs on VM mode. If kernel
> >> runs on VM mode, it will call function kvm_para_available() to detect
> >> whether current VMM is KVM hypervisor. And the paravirt function can work
> >> only if current VMM is KVM hypervisor, since there is only KVM hypervisor
> >> supported on LoongArch now.
> >>
> >> This patch only adds paravirt interface for guest kernel, however there
> >> is not effective pv functions added here.
> >>
> >> Signed-off-by: Bibo Mao 
> >> ---
> >>   arch/loongarch/Kconfig|  9 
> >>   arch/loongarch/include/asm/kvm_para.h     |  7 
> >>   arch/loongarch/include/asm/paravirt.h | 27 
> >>   .../include/asm/paravirt_api_clock.h  |  1 +
> >>   arch/loongarch/kernel/Makefile|  1 +
> >>   arch/loongarch/kernel/paravirt.c  | 41 +++
> >>   arch/loongarch/kernel/setup.c |  2 +
> >>   7 files changed, 88 insertions(+)
> >>   create mode 100644 arch/loongarch/include/asm/paravirt.h
> >>   create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> >>   create mode 100644 arch/loongarch/kernel/paravirt.c
> >>
> >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >> index 10959e6c3583..817a56dff80f 100644
> >> --- a/arch/loongarch/Kconfig
> >> +++ b/arch/loongarch/Kconfig
> >> @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
> >>  bool
> >>  default y
> >>
> >> +config PARAVIRT
> >> +   bool "Enable paravirtualization code"
> >> +   depends on AS_HAS_LVZ_EXTENSION
> >> +   help
> >> +  This changes the kernel so it can modify itself when it is run
> >> + under a hypervisor, potentially improving performance 
> >> significantly
> >> + over full virtualization.  However, when run without a hypervisor
> >> + the kernel is theoretically slower and slightly larger.
> >> +
> >>   config ARCH_SUPPORTS_KEXEC
> >>  def_bool y
> >>
> >> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> >> b/arch/loongarch/include/asm/kvm_para.h
> >> index 9425d3b7e486..41200e922a82 100644
> >> --- a/arch/loongarch/include/asm/kvm_para.h
> >> +++ b/arch/loongarch/include/asm/kvm_para.h
> >> @@ -2,6 +2,13 @@
> >>   #ifndef _ASM_LOONGARCH_KVM_PARA_H
> >>   #define _ASM_LOONGARCH_KVM_PARA_H
> >>
> >> +/*
> >> + * Hypcall code field
> >> + */
> >> +#define HYPERVISOR_KVM 1
> >> +#define HYPERVISOR_VENDOR_SHIFT8
> >> +#define HYPERCALL_CODE(vendor, code)   ((vendor << 
> >> HYPERVISOR_VENDOR_SHIFT) + code)
> >> +
> >>   /*
> >>* LoongArch hypcall return code
> >>*/
> >> diff --git a/arch/loongarch/include/asm/paravirt.h 
> >> b/arch/loongarch/include/asm/paravirt.h
> >> new file mode 100644
> >> index ..b64813592ba0
> >> --- /dev/null
> >> +++ b/arch/loongarch/include/asm/paravirt.h
> >> @@ -0,0 +1,27 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> >> +#define _ASM_LOONGARCH_PARAVIRT_H
> >> +
> >> +#ifdef CONFIG_PARAVIRT
> >> +#include 
> >> +struct static_key;
> >> +extern struct static_key paravirt_steal_enabled;
> >> +extern struct static_key paravirt_steal_rq_enabled;
> >> +
> >> +u64 dummy_steal_clock(int cpu);
> >> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> >> +
> >> +static inline u64 paravirt_steal_clock(int cpu)
> >> +{
> >> +   return static_call(pv_steal_clock)(cpu);
> >> +}
> > The steal time code can be removed in this patch, I think.
> >
> Originally I want to remove this piece of code, but it fails to compile
> if CONFIG_PARAVIRT is selected. Here is reference code, function
> paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.
>
> static __always_inline u64 steal_account_process_time(u64 maxtime)
> {
> #ifdef CON

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-18 Thread maobibo




On 2024/2/19 上午10:42, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


The patch adds paravirt interface for guest kernel, function
pv_guest_initi() firstly checks whether system runs on VM mode. If kernel
runs on VM mode, it will call function kvm_para_available() to detect
whether current VMM is KVM hypervisor. And the paravirt function can work
only if current VMM is KVM hypervisor, since there is only KVM hypervisor
supported on LoongArch now.

This patch only adds paravirt interface for guest kernel, however there
is not effective pv functions added here.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/Kconfig|  9 
  arch/loongarch/include/asm/kvm_para.h |  7 
  arch/loongarch/include/asm/paravirt.h | 27 
  .../include/asm/paravirt_api_clock.h  |  1 +
  arch/loongarch/kernel/Makefile|  1 +
  arch/loongarch/kernel/paravirt.c  | 41 +++
  arch/loongarch/kernel/setup.c |  2 +
  7 files changed, 88 insertions(+)
  create mode 100644 arch/loongarch/include/asm/paravirt.h
  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
  create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
 bool
 default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
  config ARCH_SUPPORTS_KEXEC
 def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
  #ifndef _ASM_LOONGARCH_KVM_PARA_H
  #define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
  /*
   * LoongArch hypcall return code
   */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}

The steal time code can be removed in this patch, I think.

Originally I want to remove this piece of code, but it fails to compile 
if CONFIG_PARAVIRT is selected. Here is reference code, function 
paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.


static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
if (static_key_false(_steal_enabled)) {
u64 steal;

steal = paravirt_steal_clock(smp_processor_id());
steal -= this_rq()->prev_steal_time;
steal = min(steal, maxtime);
account_steal_time(steal);
this_rq()->prev_steal_time += steal;

return steal;
}
#endif
return 0;
}


+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
  obj-$(CONFIG_STACKTRACE)   += stacktrace.o

  obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o

  obj-$(CONFIG_SMP)  += smp.o

diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d0

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-18 Thread Huacai Chen
Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
>
> The patch adds paravirt interface for guest kernel, function
> pv_guest_initi() firstly checks whether system runs on VM mode. If kernel
> runs on VM mode, it will call function kvm_para_available() to detect
> whether current VMM is KVM hypervisor. And the paravirt function can work
> only if current VMM is KVM hypervisor, since there is only KVM hypervisor
> supported on LoongArch now.
>
> This patch only adds paravirt interface for guest kernel, however there
> is not effective pv functions added here.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/Kconfig|  9 
>  arch/loongarch/include/asm/kvm_para.h |  7 
>  arch/loongarch/include/asm/paravirt.h | 27 
>  .../include/asm/paravirt_api_clock.h  |  1 +
>  arch/loongarch/kernel/Makefile|  1 +
>  arch/loongarch/kernel/paravirt.c  | 41 +++
>  arch/loongarch/kernel/setup.c |  2 +
>  7 files changed, 88 insertions(+)
>  create mode 100644 arch/loongarch/include/asm/paravirt.h
>  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>  create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 10959e6c3583..817a56dff80f 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> +   bool "Enable paravirtualization code"
> +   depends on AS_HAS_LVZ_EXTENSION
> +   help
> +  This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization.  However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
>  config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> b/arch/loongarch/include/asm/kvm_para.h
> index 9425d3b7e486..41200e922a82 100644
> --- a/arch/loongarch/include/asm/kvm_para.h
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -2,6 +2,13 @@
>  #ifndef _ASM_LOONGARCH_KVM_PARA_H
>  #define _ASM_LOONGARCH_KVM_PARA_H
>
> +/*
> + * Hypcall code field
> + */
> +#define HYPERVISOR_KVM 1
> +#define HYPERVISOR_VENDOR_SHIFT8
> +#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) 
> + code)
> +
>  /*
>   * LoongArch hypcall return code
>   */
> diff --git a/arch/loongarch/include/asm/paravirt.h 
> b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index ..b64813592ba0
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include 
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> +   return static_call(pv_steal_clock)(cpu);
> +}
The steal time code can be removed in this patch, I think.

> +
> +int pv_guest_init(void);
> +#else
> +static inline int pv_guest_init(void)
> +{
> +   return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
> b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index ..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include 
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3c808c680370..662e6e9de12d 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>  obj-$(CONFIG_STACKTRACE)   += stacktrace.o
>
>  obj-$(CONFIG_PROC_FS)  += proc.o
> +obj-$(CONFIG_PARAVIRT) += paravirt.o
>
>  obj-$(CONFIG_SMP)  += smp.o
>
> diff --git a/arch/loongarch/kernel/paravirt.c 
> b/arch/loongarch/kernel/paravirt.c
> new file mode 100644
> index ..21d01d05791a
> --- /dev/null
> +++ b/arch/loongarch/kernel/paravirt.c
> @@ -0,0 +1,41 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include

Re: [PATCH v2] x86/sgx: fix kernel-doc comment misuse

2024-02-12 Thread Jarkko Sakkinen
On Sun Feb 11, 2024 at 8:24 AM EET, Randy Dunlap wrote:
> Don't use "/**" for a non-kernel-doc comment. This prevents a warning
> from scripts/kernel-doc:
>
> main.c:740: warning: expecting prototype for A section metric is concatenated 
> in a way that @low bits 12(). Prototype was for sgx_calc_section_metric() 
> instead
>
> Cc: Jarkko Sakkinen 
> Cc: Dave Hansen 
> Cc: linux-...@vger.kernel.org
> Cc: x...@kernel.org
> Reviewed-by: Kai Huang 
> Signed-off-by: Randy Dunlap 
> ---
> v2: add Rev-by: Kai Huang
>
>  arch/x86/kernel/cpu/sgx/main.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff -- a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -731,7 +731,7 @@ out:
>   return 0;
>  }
>  
> -/**
> +/*
>   * A section metric is concatenated in a way that @low bits 12-31 define the
>   * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
>   * metric.


Reviewed-by: Jarkko Sakkinen 

BR, Jarkko



Re: [syzbot] [fs?] [trace?] BUG: unable to handle kernel paging request in tracefs_apply_options

2024-02-12 Thread syzbot
syzbot suspects this issue was fixed by commit:

commit ad579864637af46447208254719943179b69d41a
Author: Steven Rostedt (Google) 
Date:   Tue Jan 2 20:12:49 2024 +

tracefs: Check for dentry->d_inode exists in set_gid()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=17659d2418
start commit:   453f5db0619e Merge tag 'trace-v6.7-rc7' of git://git.kerne..
git tree:   upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=f8e72bae38c079e4
dashboard link: https://syzkaller.appspot.com/bug?extid=f8a023e0c6beabe2371a
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1414af31e8
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15e52409e8

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: tracefs: Check for dentry->d_inode exists in set_gid()

For information about bisection process see: https://goo.gl/tpsmEJ#bisection



[PATCH v2] x86/sgx: fix kernel-doc comment misuse

2024-02-10 Thread Randy Dunlap
Don't use "/**" for a non-kernel-doc comment. This prevents a warning
from scripts/kernel-doc:

main.c:740: warning: expecting prototype for A section metric is concatenated 
in a way that @low bits 12(). Prototype was for sgx_calc_section_metric() 
instead

Cc: Jarkko Sakkinen 
Cc: Dave Hansen 
Cc: linux-...@vger.kernel.org
Cc: x...@kernel.org
Reviewed-by: Kai Huang 
Signed-off-by: Randy Dunlap 
---
v2: add Rev-by: Kai Huang

 arch/x86/kernel/cpu/sgx/main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -- a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -731,7 +731,7 @@ out:
return 0;
 }
 
-/**
+/*
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
  * metric.



Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-02-02 Thread Luis Chamberlain
On Thu, Feb 01, 2024 at 03:27:54PM +0100, Marco Pagani wrote:
> 
> On 2024-01-30 21:47, Luis Chamberlain wrote:
> > 
> > It very much sounds like there is a desire to have this but without a
> > user, there is no justification.
> 
> I was working on a set of patches to fix an issue in the fpga subsystem
> when I came across your commit 557aafac1153 ("kernel/module: add
> documentation for try_module_get()") that made me realize we also had a
> safety problem. 
> 
> To solve this problem for the fpga manager, we had to add a mutex to
> ensure the low-level module still exists before calling
> try_module_get(). However, having a safer version of try_module_get()
> would have simplified the code and made it more robust against changes.
> 
> https://lore.kernel.org/linux-fpga/2024060242.149265-1-marpa...@redhat.com/
> 
> I suspect there may be other cases where try_module_get() is
> inadvertently called without ensuring that the module still exists
> that may benefit from a safer implementation.

Maybe so, however I'm not yet sure if this is safe from deadlocks.
Please work on a series of selftest simple modules which demonstrate
its use / and a simple bash script selftest loader which verifies this
won't bust. Consider you may have third party modules which also race
with this too, and other users without this new API.

> >> +bool try_module_get_safe(struct module *module)
> >> +{
> >> +  struct module *mod;
> >> +  bool ret = true;
> >> +
> >> +  if (!module)
> >> +  goto out;
> >> +
> >> +  mutex_lock(_mutex);
> > 
> > If a user comes around then this should be mutex_lock_interruptible(),
> > and add might_sleep()
> 
> Would it be okay to return false if it gets interrupted, or should I
> change the return type to int to propagate -EINTR? My concern with
> changing the signature is that it would be less straightforward to
> use the function in place of try_module_get().

Since we want a safe mechanism we might as well not allow a simple drop
in replacement but a more robust one so that users take care of the
return value properly.

  Luis



Re: [PATCH] lib/test_kmod: fix kernel-doc warnings

2024-02-01 Thread Luis Chamberlain
On Fri, Nov 03, 2023 at 09:20:44PM -0700, Randy Dunlap wrote:
> Fix all kernel-doc warnings in test_kmod.c:
> - Mark some enum values as private so that kernel-doc is not needed
>   for them
> - s/thread_mutex/thread_lock/ in a struct's kernel-doc comments
> - add kernel-doc info for @task_sync
> 
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_INVALID' not described in 
> enum 'kmod_test_case'
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_MAX' not described in enum 
> 'kmod_test_case'
> test_kmod.c:100: warning: Function parameter or member 'task_sync' not 
> described in 'kmod_test_device_info'
> test_kmod.c:134: warning: Function parameter or member 'thread_mutex' not 
> described in 'kmod_test_device'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Luis Chamberlain 
> Cc: linux-modu...@vger.kernel.org

Applied and pushed, thanks!

  Luis



Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-02-01 Thread Marco Pagani



On 2024-01-30 21:47, Luis Chamberlain wrote:
> On Tue, Jan 30, 2024 at 08:36:14PM +0100, Marco Pagani wrote:
>> The current implementation of try_module_get() requires the module to
>> exist and be live as a precondition. While this may seem intuitive at
>> first glance, enforcing the precondition can be tricky, considering that
>> modules can be unloaded at any time if not previously taken. For
>> instance, the caller could be preempted just before calling
>> try_module_get(), and while preempted, the module could be unloaded and
>> freed. More subtly, the module could also be unloaded at any point while
>> executing try_module_get() before incrementing the refount with
>> atomic_inc_not_zero().
>>
>> Neglecting the precondition that the module must exist and be live can
>> cause unexpected race conditions that can lead to crashes. However,
>> ensuring that the precondition is met may require additional locking
>> that increases the complexity of the code and can make it more
>> error-prone.
>>
>> This patch adds a slower yet safer implementation of try_module_get()
>> that checks if the module is valid by looking into the mod_tree before
>> taking the module's refcount. This new function can be safely called on
>> stale and invalid module pointers, relieving developers from the burden
>> of ensuring that the module exists and is live before attempting to take
>> it.
>>
>> The tree lookup and refcount increment are executed after taking the
>> module_mutex to prevent the module from being unloaded after looking up
>> the tree.
>>
>> Signed-off-by: Marco Pagani 
> 
> It very much sounds like there is a desire to have this but without a
> user, there is no justification.

I was working on a set of patches to fix an issue in the fpga subsystem
when I came across your commit 557aafac1153 ("kernel/module: add
documentation for try_module_get()") that made me realize we also had a
safety problem. 

To solve this problem for the fpga manager, we had to add a mutex to
ensure the low-level module still exists before calling
try_module_get(). However, having a safer version of try_module_get()
would have simplified the code and made it more robust against changes.

https://lore.kernel.org/linux-fpga/2024060242.149265-1-marpa...@redhat.com/

I suspect there may be other cases where try_module_get() is
inadvertently called without ensuring that the module still exists
that may benefit from a safer implementation.

>> +bool try_module_get_safe(struct module *module)
>> +{
>> +struct module *mod;
>> +bool ret = true;
>> +
>> +if (!module)
>> +goto out;
>> +
>> +mutex_lock(_mutex);
> 
> If a user comes around then this should be mutex_lock_interruptible(),
> and add might_sleep()

Would it be okay to return false if it gets interrupted, or should I
change the return type to int to propagate -EINTR? My concern with
changing the signature is that it would be less straightforward to
use the function in place of try_module_get().

>> +
>> +/*
>> + * Check if the address points to a valid live module and take
>> + * the refcount only if it points to the module struct.
>> + */
>> +mod = __module_address((unsigned long)module);
>> +if (mod && mod == module && module_is_live(mod))
>> +__module_get(mod);
>> +else
>> +ret = false;
>> +
>> +mutex_unlock(_mutex);
>> +
>> +out:
>> +return ret;
>> +}
>> +EXPORT_SYMBOL(try_module_get_safe);
> 
> And EXPORT_SYMBOL_GPL() would need to be used.

Okay, I initially used EXPORT_SYMBOL() to be compatible with
try_module_get().

> 
> I'd also expect selftests to be expanded for this case, but again,
> without a user, this is just trying to resolve a problem which does not
> exist.

I can add selftests in the next versions.
Thanks,
Marco




Re: [PATCH] lib/test_kmod: fix kernel-doc warnings

2024-01-31 Thread Randy Dunlap
Hi,

Any comments on this patch?
Thanks.


On 11/3/23 21:20, Randy Dunlap wrote:
> Fix all kernel-doc warnings in test_kmod.c:
> - Mark some enum values as private so that kernel-doc is not needed
>   for them
> - s/thread_mutex/thread_lock/ in a struct's kernel-doc comments
> - add kernel-doc info for @task_sync
> 
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_INVALID' not described in 
> enum 'kmod_test_case'
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_MAX' not described in enum 
> 'kmod_test_case'
> test_kmod.c:100: warning: Function parameter or member 'task_sync' not 
> described in 'kmod_test_device_info'
> test_kmod.c:134: warning: Function parameter or member 'thread_mutex' not 
> described in 'kmod_test_device'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Luis Chamberlain 
> Cc: linux-modu...@vger.kernel.org
> ---
>  lib/test_kmod.c |6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff -- a/lib/test_kmod.c b/lib/test_kmod.c
> --- a/lib/test_kmod.c
> +++ b/lib/test_kmod.c
> @@ -58,11 +58,14 @@ static int num_test_devs;
>   * @need_mod_put for your tests case.
>   */
>  enum kmod_test_case {
> + /* private: */
>   __TEST_KMOD_INVALID = 0,
> + /* public: */
>  
>   TEST_KMOD_DRIVER,
>   TEST_KMOD_FS_TYPE,
>  
> + /* private: */
>   __TEST_KMOD_MAX,
>  };
>  
> @@ -82,6 +85,7 @@ struct kmod_test_device;
>   * @ret_sync: return value if request_module() is used, sync request for
>   *   @TEST_KMOD_DRIVER
>   * @fs_sync: return value of get_fs_type() for @TEST_KMOD_FS_TYPE
> + * @task_sync: kthread's task_struct or %NULL if not running
>   * @thread_idx: thread ID
>   * @test_dev: test device test is being performed under
>   * @need_mod_put: Some tests (get_fs_type() is one) requires putting the 
> module
> @@ -108,7 +112,7 @@ struct kmod_test_device_info {
>   * @dev: pointer to misc_dev's own struct device
>   * @config_mutex: protects configuration of test
>   * @trigger_mutex: the test trigger can only be fired once at a time
> - * @thread_lock: protects @done count, and the @info per each thread
> + * @thread_mutex: protects @done count, and the @info per each thread
>   * @done: number of threads which have completed or failed
>   * @test_is_oom: when we run out of memory, use this to halt moving forward
>   * @kthreads_done: completion used to signal when all work is done

-- 
#Randy



[PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-01-31 Thread Bibo Mao
The patch adds paravirt interface for guest kernel, function
pv_guest_initi() firstly checks whether system runs on VM mode. If kernel
runs on VM mode, it will call function kvm_para_available() to detect
whether current VMM is KVM hypervisor. And the paravirt function can work
only if current VMM is KVM hypervisor, since there is only KVM hypervisor
supported on LoongArch now.

This patch only adds paravirt interface for guest kernel, however there
is not effective pv functions added here.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  9 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  2 +
 7 files changed, 88 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
 /*
  * LoongArch hypcall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_guest_init(void)
+{
+   if (!cpu_has_hypervisor)
+   return 0;
+   if (!kvm_para_available())
+   

Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-01-30 Thread Luis Chamberlain
On Tue, Jan 30, 2024 at 08:36:14PM +0100, Marco Pagani wrote:
> The current implementation of try_module_get() requires the module to
> exist and be live as a precondition. While this may seem intuitive at
> first glance, enforcing the precondition can be tricky, considering that
> modules can be unloaded at any time if not previously taken. For
> instance, the caller could be preempted just before calling
> try_module_get(), and while preempted, the module could be unloaded and
> freed. More subtly, the module could also be unloaded at any point while
> executing try_module_get() before incrementing the refount with
> atomic_inc_not_zero().
> 
> Neglecting the precondition that the module must exist and be live can
> cause unexpected race conditions that can lead to crashes. However,
> ensuring that the precondition is met may require additional locking
> that increases the complexity of the code and can make it more
> error-prone.
> 
> This patch adds a slower yet safer implementation of try_module_get()
> that checks if the module is valid by looking into the mod_tree before
> taking the module's refcount. This new function can be safely called on
> stale and invalid module pointers, relieving developers from the burden
> of ensuring that the module exists and is live before attempting to take
> it.
> 
> The tree lookup and refcount increment are executed after taking the
> module_mutex to prevent the module from being unloaded after looking up
> the tree.
> 
> Signed-off-by: Marco Pagani 

It very much sounds like there is a desire to have this but without a
user, there is no justification.

> +bool try_module_get_safe(struct module *module)
> +{
> + struct module *mod;
> + bool ret = true;
> +
> + if (!module)
> + goto out;
> +
> + mutex_lock(_mutex);

If a user comes around then this should be mutex_lock_interruptible(),
and add might_sleep()

> +
> + /*
> +  * Check if the address points to a valid live module and take
> +  * the refcount only if it points to the module struct.
> +  */
> + mod = __module_address((unsigned long)module);
> + if (mod && mod == module && module_is_live(mod))
> + __module_get(mod);
> + else
> + ret = false;
> +
> + mutex_unlock(_mutex);
> +
> +out:
> + return ret;
> +}
> +EXPORT_SYMBOL(try_module_get_safe);

And EXPORT_SYMBOL_GPL() would need to be used.

I'd also expect selftests to be expanded for this case, but again,
without a user, this is just trying to resolve a problem which does not
exist.

  Luis



[RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-01-30 Thread Marco Pagani
The current implementation of try_module_get() requires the module to
exist and be live as a precondition. While this may seem intuitive at
first glance, enforcing the precondition can be tricky, considering that
modules can be unloaded at any time if not previously taken. For
instance, the caller could be preempted just before calling
try_module_get(), and while preempted, the module could be unloaded and
freed. More subtly, the module could also be unloaded at any point while
executing try_module_get() before incrementing the refount with
atomic_inc_not_zero().

Neglecting the precondition that the module must exist and be live can
cause unexpected race conditions that can lead to crashes. However,
ensuring that the precondition is met may require additional locking
that increases the complexity of the code and can make it more
error-prone.

This patch adds a slower yet safer implementation of try_module_get()
that checks if the module is valid by looking into the mod_tree before
taking the module's refcount. This new function can be safely called on
stale and invalid module pointers, relieving developers from the burden
of ensuring that the module exists and is live before attempting to take
it.

The tree lookup and refcount increment are executed after taking the
module_mutex to prevent the module from being unloaded after looking up
the tree.

Signed-off-by: Marco Pagani 
---
 include/linux/module.h | 15 +++
 kernel/module/main.c   | 27 +++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 08364d5cbc07..86b6ea43d204 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -695,6 +695,19 @@ extern void __module_get(struct module *module);
  */
 extern bool try_module_get(struct module *module);
 
+/**
+ * try_module_get_safe() - safely take the refcount of a module.
+ * @module: address of the module to be taken.
+ *
+ * Safer version of try_module_get(). Check first if the module exists and is 
alive,
+ * and then take its reference count.
+ *
+ * Return:
+ * * %true - module exists and its refcount has been incremented or module is 
NULL.
+ * * %false - module does not exist.
+ */
+extern bool try_module_get_safe(struct module *module);
+
 /**
  * module_put() - release a reference count to a module
  * @module: the module we should release a reference count for
@@ -815,6 +828,8 @@ static inline bool try_module_get(struct module *module)
return true;
 }
 
+#define try_module_get_safe(module) try_module_get(module)
+
 static inline void module_put(struct module *module)
 {
 }
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 98fedfdb8db5..22376b69778c 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -842,6 +842,33 @@ bool try_module_get(struct module *module)
 }
 EXPORT_SYMBOL(try_module_get);
 
+bool try_module_get_safe(struct module *module)
+{
+   struct module *mod;
+   bool ret = true;
+
+   if (!module)
+   goto out;
+
+   mutex_lock(_mutex);
+
+   /*
+* Check if the address points to a valid live module and take
+* the refcount only if it points to the module struct.
+*/
+   mod = __module_address((unsigned long)module);
+   if (mod && mod == module && module_is_live(mod))
+   __module_get(mod);
+   else
+   ret = false;
+
+   mutex_unlock(_mutex);
+
+out:
+   return ret;
+}
+EXPORT_SYMBOL(try_module_get_safe);
+
 void module_put(struct module *module)
 {
int ret;

base-commit: 4515d08a742c76612b65d2f47a87d12860519842
-- 
2.43.0




[PATCH v3 4/6] LoongArch: Add paravirt interface for guest kernel

2024-01-22 Thread Bibo Mao
The patch add paravirt interface for guest kernel, function
pv_guest_init firstly checks whether system runs on VM mode. If kernel
runs on VM mode, it will call function kvm_para_available to detect
whether current VMM is KVM hypervisor. And the paravirt function can work
only if current VMM is KVM hypervisor, since there is only KVM hypervisor
supported on LoongArch now.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  9 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  2 +
 7 files changed, 88 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
 /*
  * LoongArch hypcall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_guest_init(void)
+{
+   if (!cpu_has_hypervisor)
+   return 0;
+   if (!kvm_para_available())
+   return 0;
+
+   return 1;
+}
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
in

BUG: unable to handle kernel paging request in __skb_flow_dissect

2024-01-16 Thread Ubisectech Sirius
Hello.
We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
Recently, our team has discovered a issue in Linux kernel 6.7.0-g052d534373b7. 
Attached to the email were a POC file of the issue.
Stack dump:
[ 185.664167][ T8332] BUG: unable to handle page fault for address: 
ed1029c40001
[ 185.665134][ T8332] #PF: supervisor read access in kernel mode
[ 185.665877][ T8332] #PF: error_code(0x) - not-present page
[ 185.666481][ T8332] PGD 7ffd0067 P4D 7ffd0067 PUD 3fff5067 PMD 0
[ 185.667129][ T8332] Oops:  [#1] PREEMPT SMP KASAN
[ 185.667719][ T8332] CPU: 1 PID: 8332 Comm: poc Not tainted 
6.7.0-g052d534373b7 #19
[ 185.668641][ T8332] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.15.0-1 04/01/2014
[ 185.669639][ T8332] RIP: 0010:__skb_flow_dissect 
(net/core/flow_dissector.c:1170 (discriminator 1))
[ 185.682210][ T8332] Call Trace:
[ 185.682595][ T8332] 
[ 185.717256][ T8332] __skb_get_hash (net/core/flow_dissector.c:1737 
net/core/flow_dissector.c:1770 net/core/flow_dissector.c:1794 
net/core/flow_dissector.c:1856)
[ 185.721978][ T8332] ip_tunnel_xmit (./include/linux/skbuff.h:1566 
net/ipv4/ip_tunnel.c:748)
[ 185.727788][ T8332] ipip_tunnel_xmit (net/ipv4/ipip.c:308)
[ 185.728396][ T8332] dev_hard_start_xmit (./include/linux/netdevice.h:5004 
net/core/dev.c:3547 net/core/dev.c:3563)
[ 185.729082][ T8332] __dev_queue_xmit (./include/linux/netdevice.h:3367 
net/core/dev.c:4352)
[ 185.736814][ T8332] neigh_connected_output (./include/linux/netdevice.h:3171 
net/core/neighbour.c:1592)
[ 185.737536][ T8332] ip_finish_output2 (./include/net/neighbour.h:542 
net/ipv4/ip_output.c:235)
[ 185.742239][ T8332] __ip_finish_output (net/ipv4/ip_output.c:313 
net/ipv4/ip_output.c:295)
[ 185.742943][ T8332] ip_finish_output (net/ipv4/ip_output.c:323)
[ 185.743556][ T8332] ip_mc_output (./include/linux/netfilter.h:303 
net/ipv4/ip_output.c:420)
[ 185.744137][ T8332] ip_local_out (./include/net/dst.h:451 
net/ipv4/ip_output.c:129)
[ 185.744746][ T8332] iptunnel_xmit (net/ipv4/ip_tunnel_core.c:84 
(discriminator 4))
[ 185.745390][ T8332] ip_tunnel_xmit (net/ipv4/ip_tunnel.c:833)
[ 185.750430][ T8332] dev_hard_start_xmit (./include/linux/netdevice.h:5004 
net/core/dev.c:3547 net/core/dev.c:3563)
[ 185.751114][ T8332] __dev_queue_xmit (./include/linux/netdevice.h:3367 
net/core/dev.c:4352)
[ 185.759138][ T8332] __bpf_redirect (./include/linux/netdevice.h:3367 
net/core/filter.c:2136 net/core/filter.c:2165 net/core/filter.c:2188)
[ 185.759757][ T8332] bpf_clone_redirect (net/core/filter.c:2459 
net/core/filter.c:2431)
[ 185.761088][ T8332] ___bpf_prog_run (kernel/bpf/core.c:1986)
[ 185.762499][ T8332] __bpf_prog_run512 (kernel/bpf/core.c:2227)
[ 185.778478][ T8332] bpf_test_run (./include/linux/bpf.h:1231 
./include/linux/filter.h:651 ./include/linux/filter.h:658 
net/bpf/test_run.c:423)
[ 185.783715][ T8332] bpf_prog_test_run_skb (net/bpf/test_run.c:1057)
[ 185.786538][ T8332] __sys_bpf (kernel/bpf/syscall.c:4107 
kernel/bpf/syscall.c:5475)
[ 185.793454][ T8332] __x64_sys_bpf (kernel/bpf/syscall.c:5559)
[ 185.794810][ T8332] do_syscall_64 (arch/x86/entry/common.c:52 
arch/x86/entry/common.c:83)
[ 185.795399][ T8332] entry_SYSCALL_64_after_hwframe 
(arch/x86/entry/entry_64.S:129)
[ 185.796182][ T8332] RIP: 0033:0x7f4f8955df29
Analyze of the issue:
The issue code in the __skb_flow_dissect 
function(net/core/flow_dissector.c:1170). The code are blow:
iph = __skb_header_pointer(skb, nhoff, sizeof(_iph), data, hlen, &_iph);
if (!iph || iph->ihl < 5) {
fdret = FLOW_DISSECT_RET_OUT_BAD;
break;
}
It looks like the function __skb_header_pointer will return a invalid address, 
and iph->ihl will read the invalid address to get value. So, I think the issue 
is lack of check the iph is valid or no.
Thank you for taking the time to read this email and we look forward to working 
with you further.
 Ubisectech Sirius Team
 Web: www.ubisectech.com
 Email: bugrep...@ubisectech.com


横板竖版组合LOGO_画板 1.png
Description: Binary data


poc.c
Description: Binary data


[PATCH v2 4/6] LoongArch: Add paravirt interface for guest kernel

2024-01-07 Thread Bibo Mao
The patch add paravirt interface for guest kernel, function
pv_guest_init firstly checks whether system runs on VM mode. If kernel
runs on VM mode, it will call function kvm_para_available to detect
whether current VMM is KVM hypervisor. And the paravirt function can work
only if current VMM is KVM hypervisor, and there is only KVM hypervisor
supported on LoongArch now.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  9 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  2 +
 7 files changed, 88 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index ee123820a476..d8ccaf46a50d 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -564,6 +564,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
 /*
  * LoongArch hypcall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_guest_init(void)
+{
+   if (!cpu_has_hypervisor)
+   return 0;
+   if (!kvm_para_available())
+   return 0;
+
+   return 1;
+}
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index d183a745fb

Re: [syzbot] [fs?] [trace?] BUG: unable to handle kernel paging request in tracefs_apply_options

2024-01-03 Thread Steven Rostedt
On Wed, 03 Jan 2024 13:41:31 -0800
syzbot  wrote:

> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:453f5db0619e Merge tag 'trace-v6.7-rc7' of git://git.kerne..
> git tree:   upstream
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=10ec3829e8
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f8e72bae38c079e4
> dashboard link: https://syzkaller.appspot.com/bug?extid=f8a023e0c6beabe2371a
> compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
> 2.40
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1414af31e8
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15e52409e8
> 
> Downloadable assets:
> disk image: 
> https://storage.googleapis.com/syzbot-assets/38b92a7149e8/disk-453f5db0.raw.xz
> vmlinux: 
> https://storage.googleapis.com/syzbot-assets/4f872267133f/vmlinux-453f5db0.xz
> kernel image: 
> https://storage.googleapis.com/syzbot-assets/587572061791/bzImage-453f5db0.xz
> 
> The issue was bisected to:
> 
> commit 7e8358edf503e87236c8d07f69ef0ed846dd5112
> Author: Steven Rostedt (Google) 
> Date:   Fri Dec 22 00:07:57 2023 +
> 
> eventfs: Fix file and directory uid and gid ownership
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=108cd519e8
> final oops: https://syzkaller.appspot.com/x/report.txt?x=128cd519e8
> console output: https://syzkaller.appspot.com/x/log.txt?x=148cd519e8
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+f8a023e0c6beabe23...@syzkaller.appspotmail.com
> Fixes: 7e8358edf503 ("eventfs: Fix file and directory uid and gid ownership")
> 
> BUG: unable to handle page fault for address: fff0
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x) - not-present page
> PGD d734067 P4D d734067 PUD d736067 PMD 0 
> Oops:  [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 5056 Comm: syz-executor170 Not tainted 
> 6.7.0-rc7-syzkaller-00049-g453f5db0619e #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 11/17/2023
> RIP: 0010:set_gid fs/tracefs/inode.c:224 [inline]
> RIP: 0010:tracefs_apply_options+0x4d0/0xa40 fs/tracefs/inode.c:337
> Code: 24 10 49 8b 1e 48 83 c3 f0 74 3d 48 89 d8 48 c1 e8 03 48 bd 00 00 00 00 
> 00 fc ff df 80 3c 28 00 74 08 48 89 df e8 70 ff 88 fe <48> 8b 1b 48 89 de 48 
> 83 e6 02 31 ff e8 bf fe 2c fe 48 83 e3 02 75
> RSP: 0018:c900040ffca8 EFLAGS: 00010246
> RAX: 1ffe RBX: fff0 RCX: 888014bf5940
> RDX:  RSI: 0004 RDI: c900040ffc20
> RBP: dc00 R08: 0003 R09: f5200081ff84
> R10: dc00 R11: f5200081ff84 R12: 88801d743888
> R13: 88801b0c3710 R14: 88801d7437e8 R15: 88801d743810
> FS:  557dd480() GS:8880b980() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: fff0 CR3: 1ec48000 CR4: 003506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  
>  tracefs_remount+0x78/0x80 fs/tracefs/inode.c:353
>  reconfigure_super+0x440/0x870 fs/super.c:1143
>  do_remount fs/namespace.c:2884 [inline]

This is the same bug that was fixed by:

   
https://lore.kernel.org/linux-trace-kernel/20240102151249.05da2...@gandalf.local.home/

And just waiting to be applied:

   https://lore.kernel.org/all/20240102210731.1f1c5...@gandalf.local.home/

Thanks,

-- Steve

>  path_mount+0xc24/0xfa0 fs/namespace.c:3656
>  do_mount fs/namespace.c:3677 [inline]
>  __do_sys_mount fs/namespace.c:3886 [inline]
>  __se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3863
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0x45/0x110 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x63/0x6b
> RIP: 0033:0x7fec326e8d99
> Code: 48 83 c4 28 c3 e8 67 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7ffc8103ddf8 EFLAGS: 0246 ORIG_RAX: 00a5
> RAX: ffda RBX: 7ffc8103de00 RCX: 7fec326e8d99
> RDX:  RSI: 20c0 RDI: 
> RBP: 7ffc8103de08 R08: 2140 R09: 7fec326b5b80
> R10: 02200022 R11: 0246 R12: 
> R13: 7ffc8103e068 R14: 0001 R15: 0001
>  
> Modules linked in:
> CR2: fff0
> ---[ end trace  ]---
> RIP: 0010:set_gid fs/tracefs/inode.c:224

[syzbot] [fs?] [trace?] BUG: unable to handle kernel paging request in tracefs_apply_options

2024-01-03 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:453f5db0619e Merge tag 'trace-v6.7-rc7' of git://git.kerne..
git tree:   upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=10ec3829e8
kernel config:  https://syzkaller.appspot.com/x/.config?x=f8e72bae38c079e4
dashboard link: https://syzkaller.appspot.com/bug?extid=f8a023e0c6beabe2371a
compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
2.40
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1414af31e8
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15e52409e8

Downloadable assets:
disk image: 
https://storage.googleapis.com/syzbot-assets/38b92a7149e8/disk-453f5db0.raw.xz
vmlinux: 
https://storage.googleapis.com/syzbot-assets/4f872267133f/vmlinux-453f5db0.xz
kernel image: 
https://storage.googleapis.com/syzbot-assets/587572061791/bzImage-453f5db0.xz

The issue was bisected to:

commit 7e8358edf503e87236c8d07f69ef0ed846dd5112
Author: Steven Rostedt (Google) 
Date:   Fri Dec 22 00:07:57 2023 +

eventfs: Fix file and directory uid and gid ownership

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=108cd519e8
final oops: https://syzkaller.appspot.com/x/report.txt?x=128cd519e8
console output: https://syzkaller.appspot.com/x/log.txt?x=148cd519e8

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+f8a023e0c6beabe23...@syzkaller.appspotmail.com
Fixes: 7e8358edf503 ("eventfs: Fix file and directory uid and gid ownership")

BUG: unable to handle page fault for address: fff0
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD d734067 P4D d734067 PUD d736067 PMD 0 
Oops:  [#1] PREEMPT SMP KASAN
CPU: 0 PID: 5056 Comm: syz-executor170 Not tainted 
6.7.0-rc7-syzkaller-00049-g453f5db0619e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
11/17/2023
RIP: 0010:set_gid fs/tracefs/inode.c:224 [inline]
RIP: 0010:tracefs_apply_options+0x4d0/0xa40 fs/tracefs/inode.c:337
Code: 24 10 49 8b 1e 48 83 c3 f0 74 3d 48 89 d8 48 c1 e8 03 48 bd 00 00 00 00 
00 fc ff df 80 3c 28 00 74 08 48 89 df e8 70 ff 88 fe <48> 8b 1b 48 89 de 48 83 
e6 02 31 ff e8 bf fe 2c fe 48 83 e3 02 75
RSP: 0018:c900040ffca8 EFLAGS: 00010246
RAX: 1ffe RBX: fff0 RCX: 888014bf5940
RDX:  RSI: 0004 RDI: c900040ffc20
RBP: dc00 R08: 0003 R09: f5200081ff84
R10: dc00 R11: f5200081ff84 R12: 88801d743888
R13: 88801b0c3710 R14: 88801d7437e8 R15: 88801d743810
FS:  557dd480() GS:8880b980() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: fff0 CR3: 1ec48000 CR4: 003506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 tracefs_remount+0x78/0x80 fs/tracefs/inode.c:353
 reconfigure_super+0x440/0x870 fs/super.c:1143
 do_remount fs/namespace.c:2884 [inline]
 path_mount+0xc24/0xfa0 fs/namespace.c:3656
 do_mount fs/namespace.c:3677 [inline]
 __do_sys_mount fs/namespace.c:3886 [inline]
 __se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3863
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x45/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fec326e8d99
Code: 48 83 c4 28 c3 e8 67 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7ffc8103ddf8 EFLAGS: 0246 ORIG_RAX: 00a5
RAX: ffda RBX: 7ffc8103de00 RCX: 7fec326e8d99
RDX:  RSI: 20c0 RDI: 
RBP: 7ffc8103de08 R08: 2140 R09: 7fec326b5b80
R10: 02200022 R11: 0246 R12: 
R13: 7ffc8103e068 R14: 0001 R15: 0001
 
Modules linked in:
CR2: fff0
---[ end trace  ]---
RIP: 0010:set_gid fs/tracefs/inode.c:224 [inline]
RIP: 0010:tracefs_apply_options+0x4d0/0xa40 fs/tracefs/inode.c:337
Code: 24 10 49 8b 1e 48 83 c3 f0 74 3d 48 89 d8 48 c1 e8 03 48 bd 00 00 00 00 
00 fc ff df 80 3c 28 00 74 08 48 89 df e8 70 ff 88 fe <48> 8b 1b 48 89 de 48 83 
e6 02 31 ff e8 bf fe 2c fe 48 83 e3 02 75
RSP: 0018:c900040ffca8 EFLAGS: 00010246
RAX: 1ffe RBX: fff0 RCX: 888014bf5940
RDX:  RSI: 0004 RDI: c900040ffc20
RBP: dc00 R08: 0003 R09: f5200081ff84
R10: dc00 R11: f5200081ff84 R12: 88801d743888
R13: 88801b0c3710 R14: 88801d7437e8 R15: 88801d743810
FS:  557dd480() GS:8880b980() knlGS:
CS:  0010 DS:  ES: 00

Re: [PATCH 4/5] LoongArch: Add paravirt interface for guest kernel

2024-01-03 Thread maobibo




On 2024/1/3 下午4:14, Juergen Gross wrote:

On 03.01.24 09:00, maobibo wrote:



On 2024/1/3 下午3:40, Jürgen Groß wrote:

On 03.01.24 08:16, Bibo Mao wrote:

The patch add paravirt interface for guest kernel, it checks whether
system runs on VM mode. If it is, it will detect hypervisor type. And
returns true it is KVM hypervisor, else return false. Currently only
KVM hypervisor is supported, so there is only hypervisor detection
for KVM type.


I guess you are talking of pv_guest_init() here? Or do you mean
kvm_para_available()?

yes, it is pv_guest_init. It will be better if all hypervisor detection
is called in function pv_guest_init. Currently there is only kvm 
hypervisor, kvm_para_available is hard-coded in pv_guest_init here.


I think this is no problem as long as there are not more hypervisors
supported.



I can split file paravirt.c into paravirt.c and kvm.c, paravirt.c is 
used for hypervisor detection, and move code relative with pv_ipi into 
kvm.c


I wouldn't do that right now.

Just be a little bit more specific in the commit message (use the related
function name instead of "it").

Sure, will do this in next version, and thanks for your guidance.

Regards
Bibo Mao



Juergen





Re: [PATCH 4/5] LoongArch: Add paravirt interface for guest kernel

2024-01-03 Thread Juergen Gross

On 03.01.24 09:00, maobibo wrote:



On 2024/1/3 下午3:40, Jürgen Groß wrote:

On 03.01.24 08:16, Bibo Mao wrote:

The patch add paravirt interface for guest kernel, it checks whether
system runs on VM mode. If it is, it will detect hypervisor type. And
returns true it is KVM hypervisor, else return false. Currently only
KVM hypervisor is supported, so there is only hypervisor detection
for KVM type.


I guess you are talking of pv_guest_init() here? Or do you mean
kvm_para_available()?

yes, it is pv_guest_init. It will be better if all hypervisor detection
is called in function pv_guest_init. Currently there is only kvm hypervisor, 
kvm_para_available is hard-coded in pv_guest_init here.


I think this is no problem as long as there are not more hypervisors
supported.



I can split file paravirt.c into paravirt.c and kvm.c, paravirt.c is used for 
hypervisor detection, and move code relative with pv_ipi into kvm.c


I wouldn't do that right now.

Just be a little bit more specific in the commit message (use the related
function name instead of "it").


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH 4/5] LoongArch: Add paravirt interface for guest kernel

2024-01-03 Thread maobibo




On 2024/1/3 下午3:40, Jürgen Groß wrote:

On 03.01.24 08:16, Bibo Mao wrote:

The patch add paravirt interface for guest kernel, it checks whether
system runs on VM mode. If it is, it will detect hypervisor type. And
returns true it is KVM hypervisor, else return false. Currently only
KVM hypervisor is supported, so there is only hypervisor detection
for KVM type.


I guess you are talking of pv_guest_init() here? Or do you mean
kvm_para_available()?

yes, it is pv_guest_init. It will be better if all hypervisor detection
is called in function pv_guest_init. Currently there is only kvm 
hypervisor, kvm_para_available is hard-coded in pv_guest_init here.


I can split file paravirt.c into paravirt.c and kvm.c, paravirt.c is 
used for hypervisor detection, and move code relative with pv_ipi into kvm.c


Regards
Bibo Mao




Signed-off-by: Bibo Mao 
---
  arch/loongarch/Kconfig    |  8 
  arch/loongarch/include/asm/kvm_para.h |  7 
  arch/loongarch/include/asm/paravirt.h | 27 
  .../include/asm/paravirt_api_clock.h  |  1 +
  arch/loongarch/kernel/Makefile    |  1 +
  arch/loongarch/kernel/paravirt.c  | 41 +++
  arch/loongarch/kernel/setup.c |  2 +
  7 files changed, 87 insertions(+)
  create mode 100644 arch/loongarch/include/asm/paravirt.h
  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
  create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index ee123820a476..940e5960d297 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -564,6 +564,14 @@ config CPU_HAS_PREFETCH
  bool
  default y
+config PARAVIRT
+    bool "Enable paravirtualization code"
+    help
+  This changes the kernel so it can modify itself when it is run
+  under a hypervisor, potentially improving performance 
significantly

+  over full virtualization.  However, when run without a hypervisor
+  the kernel is theoretically slower and slightly larger.
+
  config ARCH_SUPPORTS_KEXEC
  def_bool y
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h

index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
  #ifndef _ASM_LOONGARCH_KVM_PARA_H
  #define _ASM_LOONGARCH_KVM_PARA_H
+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM    1
+#define HYPERVISOR_VENDOR_SHIFT    8
+#define HYPERCALL_CODE(vendor, code)    ((vendor << 
HYPERVISOR_VENDOR_SHIFT) + code)

+
  /*
   * LoongArch hypcall return code
   */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h

new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+    return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+    return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h

new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile 
b/arch/loongarch/kernel/Makefile

index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES)    += module.o 
module-sections.o

  obj-$(CONFIG_STACKTRACE)    += stacktrace.o
  obj-$(CONFIG_PROC_FS)    += proc.o
+obj-$(CONFIG_PARAVIRT)    += paravirt.o
  obj-$(CONFIG_SMP)    += smp.o
diff --git a/arch/loongarch/kernel/paravirt.c 
b/arch/loongarch/kernel/paravirt.c

new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+    return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);


This is the 4th arch with the same definition of native_steal_clock() and
pv_steal_clock. I think we should add a common file kernel/paravirt.c and
move the related functions from the archs into the new file.

If you don't want to do th

Re: [PATCH 4/5] LoongArch: Add paravirt interface for guest kernel

2024-01-02 Thread Jürgen Groß

On 03.01.24 08:16, Bibo Mao wrote:

The patch add paravirt interface for guest kernel, it checks whether
system runs on VM mode. If it is, it will detect hypervisor type. And
returns true it is KVM hypervisor, else return false. Currently only
KVM hypervisor is supported, so there is only hypervisor detection
for KVM type.


I guess you are talking of pv_guest_init() here? Or do you mean
kvm_para_available()?



Signed-off-by: Bibo Mao 
---
  arch/loongarch/Kconfig|  8 
  arch/loongarch/include/asm/kvm_para.h |  7 
  arch/loongarch/include/asm/paravirt.h | 27 
  .../include/asm/paravirt_api_clock.h  |  1 +
  arch/loongarch/kernel/Makefile|  1 +
  arch/loongarch/kernel/paravirt.c  | 41 +++
  arch/loongarch/kernel/setup.c |  2 +
  7 files changed, 87 insertions(+)
  create mode 100644 arch/loongarch/include/asm/paravirt.h
  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
  create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index ee123820a476..940e5960d297 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -564,6 +564,14 @@ config CPU_HAS_PREFETCH
bool
default y
  
+config PARAVIRT

+   bool "Enable paravirtualization code"
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
  config ARCH_SUPPORTS_KEXEC
def_bool y
  
diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h

index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
  #ifndef _ASM_LOONGARCH_KVM_PARA_H
  #define _ASM_LOONGARCH_KVM_PARA_H
  
+/*

+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
  /*
   * LoongArch hypcall return code
   */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
  obj-$(CONFIG_STACKTRACE)  += stacktrace.o
  
  obj-$(CONFIG_PROC_FS)		+= proc.o

+obj-$(CONFIG_PARAVIRT) += paravirt.o
  
  obj-$(CONFIG_SMP)		+= smp.o
  
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c

new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);


This is the 4th arch with the same definition of native_steal_clock() and
pv_steal_clock. I think we should add a common file kernel/paravirt.c and
move the related functions from the archs into the new file.

If you don't want to do that I can prepare a series.


+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISO

[PATCH 4/5] LoongArch: Add paravirt interface for guest kernel

2024-01-02 Thread Bibo Mao
The patch add paravirt interface for guest kernel, it checks whether
system runs on VM mode. If it is, it will detect hypervisor type. And
returns true it is KVM hypervisor, else return false. Currently only
KVM hypervisor is supported, so there is only hypervisor detection
for KVM type.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  8 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  2 +
 7 files changed, 87 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index ee123820a476..940e5960d297 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -564,6 +564,14 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
 /*
  * LoongArch hypcall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+   if (!memcmp(, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_guest_init(void)
+{
+   if (!cpu_has_hypervisor)
+   return 0;
+   if (!kvm_para_available())
+   return 0;
+
+   return 1;
+}
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index d183a745fb85..fa680bdd0bd1 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -43,6 +43,7 @@
 #in

Re: Unable to trace nf_nat function using kprobe with latest kernel 6.1.66-1

2024-01-02 Thread P K
sure. function was available in /proc/kallsyms.

Any suggestion how to track source IP and natted IP in bpf after NAT
masquerade?
Basically I wanted to know the Source IP and source Port after NAT IP
using 5 tuple lookup?

-- PK

On Tue, Jan 2, 2024 at 8:00 PM Steven Rostedt  wrote:
>
> On Tue, 2 Jan 2024 11:23:54 +0530
> P K  wrote:
>
> > Hi,
> >
> > I am unable to trace nf_nat functions using kprobe with the latest kernel.
> > Previously in kernel 6.1.55-1 it was working fine.
> > Can someone please check if it's broken with the latest commit or  i
> > have to use it differently?
> >
>
> Note, attaching to kernel functions is never considered stable API and may
> break at any kernel release.
>
> Also, if the compiler decides to inline the function, and makes it no
> longer visible in /proc/kallsyms then that too will cause this to break.
>
> -- Steve
>
>
> > Mentioned below are output:
> > Kernel - 6.1.55-1
> > / # bpftrace -e 'kprobe:nf_nat_ipv4_manip_pkt { printf("func called\n"); }'
> > Attaching 1 probe...
> > cannot attach kprobe, probe entry may not exist
> > ERROR: Error attaching probe: 'kprobe:nf_nat_ipv4_manip_pkt'
> >
> >
> >
> > Kernel 6.1.55-1
> > / # bpftrace -e 'kprobe:nf_nat_ipv4_manip_pkt { printf("func called\n"); }'
> > Attaching 1 probe...
> > func called
> > func called
>



Re: Unable to trace nf_nat function using kprobe with latest kernel 6.1.66-1

2024-01-02 Thread Steven Rostedt
On Tue, 2 Jan 2024 11:23:54 +0530
P K  wrote:

> Hi,
> 
> I am unable to trace nf_nat functions using kprobe with the latest kernel.
> Previously in kernel 6.1.55-1 it was working fine.
> Can someone please check if it's broken with the latest commit or  i
> have to use it differently?
> 

Note, attaching to kernel functions is never considered stable API and may
break at any kernel release.

Also, if the compiler decides to inline the function, and makes it no
longer visible in /proc/kallsyms then that too will cause this to break.

-- Steve


> Mentioned below are output:
> Kernel - 6.1.55-1
> / # bpftrace -e 'kprobe:nf_nat_ipv4_manip_pkt { printf("func called\n"); }'
> Attaching 1 probe...
> cannot attach kprobe, probe entry may not exist
> ERROR: Error attaching probe: 'kprobe:nf_nat_ipv4_manip_pkt'
> 
> 
> 
> Kernel 6.1.55-1
> / # bpftrace -e 'kprobe:nf_nat_ipv4_manip_pkt { printf("func called\n"); }'
> Attaching 1 probe...
> func called
> func called




Unable to trace nf_nat function using kprobe with latest kernel 6.1.66-1

2024-01-01 Thread P K
Hi,

I am unable to trace nf_nat functions using kprobe with the latest kernel.
Previously in kernel 6.1.55-1 it was working fine.
Can someone please check if it's broken with the latest commit or  i
have to use it differently?

Mentioned below are output:
Kernel - 6.1.55-1
/ # bpftrace -e 'kprobe:nf_nat_ipv4_manip_pkt { printf("func called\n"); }'
Attaching 1 probe...
cannot attach kprobe, probe entry may not exist
ERROR: Error attaching probe: 'kprobe:nf_nat_ipv4_manip_pkt'



Kernel 6.1.55-1
/ # bpftrace -e 'kprobe:nf_nat_ipv4_manip_pkt { printf("func called\n"); }'
Attaching 1 probe...
func called
func called



[PATCH V12 12/14] RISC-V: paravirt: pvqspinlock: Add nopvspin kernel parameter

2023-12-25 Thread guoren
From: Guo Ren 

Disables the qspinlock slow path using PV optimizations which
allow the hypervisor to 'idle' the guest on lock contention.

Reviewed-by: Leonardo Bras 
Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
---
 Documentation/admin-guide/kernel-parameters.txt |  2 +-
 arch/riscv/kernel/qspinlock_paravirt.c  | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index b7794c96d91e..4aff81d741e2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3927,7 +3927,7 @@
as generic guest with no PV drivers. Currently support
XEN HVM, KVM, HYPER_V and VMWARE guest.
 
-   nopvspin[X86,XEN,KVM]
+   nopvspin[X86,XEN,KVM,RISC-V]
Disables the qspinlock slow path using PV optimizations
which allow the hypervisor to 'idle' the guest on lock
contention.
diff --git a/arch/riscv/kernel/qspinlock_paravirt.c 
b/arch/riscv/kernel/qspinlock_paravirt.c
index 7d1b9941..4b04c93c4b9b 100644
--- a/arch/riscv/kernel/qspinlock_paravirt.c
+++ b/arch/riscv/kernel/qspinlock_paravirt.c
@@ -43,8 +43,21 @@ EXPORT_STATIC_CALL(pv_queued_spin_lock_slowpath);
 DEFINE_STATIC_CALL(pv_queued_spin_unlock, native_queued_spin_unlock);
 EXPORT_STATIC_CALL(pv_queued_spin_unlock);
 
+static bool nopvspin __initdata;
+static __init int parse_nopvspin(char *arg)
+{
+   nopvspin = true;
+   return 0;
+}
+early_param("nopvspin", parse_nopvspin);
+
 void __init pv_qspinlock_init(void)
 {
+   if (nopvspin) {
+   pr_info("PV qspinlocks disabled\n");
+   return;
+   }
+
if (num_possible_cpus() == 1)
return;
 
-- 
2.40.1




Re: [PATCH 2/3] nvdimm/dimm_devs: fix kernel-doc for function params

2023-12-21 Thread Randy Dunlap
Hi Ira,

On 12/21/23 14:34, Ira Weiny wrote:
> Ira Weiny wrote:
>> Randy Dunlap wrote:
> 
> [snip]
> 
>>> diff -- a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
>>> --- a/drivers/nvdimm/dimm_devs.c
>>> +++ b/drivers/nvdimm/dimm_devs.c
>>> @@ -53,7 +53,10 @@ static int validate_dimm(struct nvdimm_d
>>>  
>>>  /**
>>>   * nvdimm_init_nsarea - determine the geometry of a dimm's namespace area
>>> - * @nvdimm: dimm to initialize
>>> + * @ndd: dimm to initialize
>>> + *
>>> + * Returns: %0 if the area is already valid, -errno on error,
>>
>> This is good...
>>
>>> ...otherwise an
>>> + * ND_CMD_* status code.
>>
>> I don't see where any of the ->ndctl() calls return an ND_CMD_* status
>> code.  What am I missing?

Probably nothing.
Yes, please drop that/fix that when you merge it.
Thanks.

> If you agree that this should be dropped I can clean it up as I'm trying
> to prep the nvdimm tree now and was hoping to take this series.
> 
> Ira
> 
>>
>> The rest looks good,
>> Ira

-- 
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html



Re: [PATCH 3/3] nvdimm/namespace: fix kernel-doc for function params

2023-12-21 Thread Ira Weiny
Randy Dunlap wrote:
> 
> 
> On 12/21/23 14:32, Ira Weiny wrote:
> > Randy Dunlap wrote:
> > 
> > [snip]
> > 
> >> @@ -1656,8 +1664,10 @@ static int select_pmem_id(struct nd_regi
> >>  /**
> >>   * create_namespace_pmem - validate interleave set labelling, retrieve 
> >> label0
> >>   * @nd_region: region with mappings to validate
> >> - * @nspm: target namespace to create
> >> + * @nd_mapping: container of dpa-resource-root + labels
> >>   * @nd_label: target pmem namespace label to evaluate
> >> + *
> >> + * Returns: the created  device on success or -errno on error
> > 
> > NIT: should this be ERR_PTR(-errno) on error?
> 
> Oh, for sure. Thanks for catching that.

I'll clean this up as well if you are good with me changing patch 2/3 as
well.

Ira



Re: [PATCH 3/3] nvdimm/namespace: fix kernel-doc for function params

2023-12-21 Thread Randy Dunlap



On 12/21/23 14:32, Ira Weiny wrote:
> Randy Dunlap wrote:
> 
> [snip]
> 
>> @@ -1656,8 +1664,10 @@ static int select_pmem_id(struct nd_regi
>>  /**
>>   * create_namespace_pmem - validate interleave set labelling, retrieve 
>> label0
>>   * @nd_region: region with mappings to validate
>> - * @nspm: target namespace to create
>> + * @nd_mapping: container of dpa-resource-root + labels
>>   * @nd_label: target pmem namespace label to evaluate
>> + *
>> + * Returns: the created  device on success or -errno on error
> 
> NIT: should this be ERR_PTR(-errno) on error?

Oh, for sure. Thanks for catching that.

> Generally good to me though.
> 
> Reviewed-by: Ira Weiny 
> 
>>   */
>>  static struct device *create_namespace_pmem(struct nd_region *nd_region,
>>  struct nd_mapping *nd_mapping,

-- 
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html



Re: [PATCH 2/3] nvdimm/dimm_devs: fix kernel-doc for function params

2023-12-21 Thread Ira Weiny
Ira Weiny wrote:
> Randy Dunlap wrote:

[snip]

> > diff -- a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> > --- a/drivers/nvdimm/dimm_devs.c
> > +++ b/drivers/nvdimm/dimm_devs.c
> > @@ -53,7 +53,10 @@ static int validate_dimm(struct nvdimm_d
> >  
> >  /**
> >   * nvdimm_init_nsarea - determine the geometry of a dimm's namespace area
> > - * @nvdimm: dimm to initialize
> > + * @ndd: dimm to initialize
> > + *
> > + * Returns: %0 if the area is already valid, -errno on error,
> 
> This is good...
> 
> > ...otherwise an
> > + * ND_CMD_* status code.
> 
> I don't see where any of the ->ndctl() calls return an ND_CMD_* status
> code.  What am I missing?

If you agree that this should be dropped I can clean it up as I'm trying
to prep the nvdimm tree now and was hoping to take this series.

Ira

> 
> The rest looks good,
> Ira



Re: [PATCH 3/3] nvdimm/namespace: fix kernel-doc for function params

2023-12-21 Thread Ira Weiny
Randy Dunlap wrote:

[snip]

> @@ -1656,8 +1664,10 @@ static int select_pmem_id(struct nd_regi
>  /**
>   * create_namespace_pmem - validate interleave set labelling, retrieve label0
>   * @nd_region: region with mappings to validate
> - * @nspm: target namespace to create
> + * @nd_mapping: container of dpa-resource-root + labels
>   * @nd_label: target pmem namespace label to evaluate
> + *
> + * Returns: the created  device on success or -errno on error

NIT: should this be ERR_PTR(-errno) on error?

Generally good to me though.

Reviewed-by: Ira Weiny 

>   */
>  static struct device *create_namespace_pmem(struct nd_region *nd_region,
>   struct nd_mapping *nd_mapping,



Re: [PATCH 2/3] nvdimm/dimm_devs: fix kernel-doc for function params

2023-12-21 Thread Ira Weiny
Randy Dunlap wrote:
> Adjust kernel-doc notation to prevent warnings when using -Wall.
> 
> dimm_devs.c:59: warning: Function parameter or member 'ndd' not described in 
> 'nvdimm_init_nsarea'
> dimm_devs.c:59: warning: Excess function parameter 'nvdimm' description in 
> 'nvdimm_init_nsarea'
> dimm_devs.c:59: warning: No description found for return value of 
> 'nvdimm_init_nsarea'
> dimm_devs.c:728: warning: No description found for return value of 
> 'nd_pmem_max_contiguous_dpa'
> dimm_devs.c:773: warning: No description found for return value of 
> 'nd_pmem_available_dpa'
> dimm_devs.c:844: warning: Function parameter or member 'ndd' not described in 
> 'nvdimm_allocated_dpa'
> dimm_devs.c:844: warning: Excess function parameter 'nvdimm' description in 
> 'nvdimm_allocated_dpa'
> dimm_devs.c:844: warning: No description found for return value of 
> 'nvdimm_allocated_dpa'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Dan Williams 
> Cc: Vishal Verma 
> Cc: Dave Jiang 
> Cc: Ira Weiny 
> Cc: nvd...@lists.linux.dev
> ---
>  drivers/nvdimm/dimm_devs.c |   14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff -- a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> --- a/drivers/nvdimm/dimm_devs.c
> +++ b/drivers/nvdimm/dimm_devs.c
> @@ -53,7 +53,10 @@ static int validate_dimm(struct nvdimm_d
>  
>  /**
>   * nvdimm_init_nsarea - determine the geometry of a dimm's namespace area
> - * @nvdimm: dimm to initialize
> + * @ndd: dimm to initialize
> + *
> + * Returns: %0 if the area is already valid, -errno on error,

This is good...

> ...otherwise an
> + * ND_CMD_* status code.

I don't see where any of the ->ndctl() calls return an ND_CMD_* status
code.  What am I missing?

The rest looks good,
Ira

>   */
>  int nvdimm_init_nsarea(struct nvdimm_drvdata *ndd)
>  {
> @@ -722,6 +725,9 @@ static unsigned long dpa_align(struct nd
>   *  contiguous unallocated dpa range.
>   * @nd_region: constrain available space check to this reference region
>   * @nd_mapping: container of dpa-resource-root + labels
> + *
> + * Returns: %0 if there is an alignment error, otherwise the max
> + *   unallocated dpa range
>   */
>  resource_size_t nd_pmem_max_contiguous_dpa(struct nd_region *nd_region,
>  struct nd_mapping *nd_mapping)
> @@ -767,6 +773,8 @@ resource_size_t nd_pmem_max_contiguous_d
>   *
>   * Validate that a PMEM label, if present, aligns with the start of an
>   * interleave set.
> + *
> + * Returns: %0 if there is an alignment error, otherwise the unallocated dpa
>   */
>  resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
> struct nd_mapping *nd_mapping)
> @@ -836,8 +844,10 @@ struct resource *nvdimm_allocate_dpa(str
>  
>  /**
>   * nvdimm_allocated_dpa - sum up the dpa currently allocated to this label_id
> - * @nvdimm: container of dpa-resource-root + labels
> + * @ndd: container of dpa-resource-root + labels
>   * @label_id: dpa resource name of the form pmem-
> + *
> + * Returns: sum of the dpa allocated to the label_id
>   */
>  resource_size_t nvdimm_allocated_dpa(struct nvdimm_drvdata *ndd,
>   struct nd_label_id *label_id)



Re: [PATCH 1/3] nvdimm/btt: fix btt_blk_cleanup() kernel-doc

2023-12-21 Thread Ira Weiny
Randy Dunlap wrote:
> Correct the function parameters to prevent kernel-doc warnings:
> 
> btt.c:1567: warning: Function parameter or member 'nd_region' not described 
> in 'btt_init'
> btt.c:1567: warning: Excess function parameter 'maxlane' description in 
> 'btt_init'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Vishal Verma 
> Cc: Dan Williams 
> Cc: Dave Jiang 
> Cc: Ira Weiny 

Reviewed-by: Ira Weiny 

> Cc: nvd...@lists.linux.dev
> ---
>  drivers/nvdimm/btt.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -- a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -1550,7 +1550,7 @@ static void btt_blk_cleanup(struct btt *
>   * @rawsize: raw size in bytes of the backing device
>   * @lbasize: lba size of the backing device
>   * @uuid:A uuid for the backing device - this is stored on media
> - * @maxlane: maximum number of parallel requests the device can handle
> + * @nd_region:nd_region for the REGION device
>   *
>   * Initialize a Block Translation Table on a backing device to provide
>   * single sector power fail atomicity.





Re: [PATCH] kernel/module: improve documentation for try_module_get()

2023-12-21 Thread Luis Chamberlain
On Thu, Dec 21, 2023 at 05:58:47PM +0100, Marco Pagani wrote:
> The sentence "this call will fail if the module is already being
> removed" is potentially confusing and may contradict the rest of the
> documentation. If one tries to get a module that has already been
> removed using a stale pointer, the kernel will crash.
> 
> Signed-off-by: Marco Pagani 

Thanks, patch applied!

  Luis



[PATCH] kernel/module: improve documentation for try_module_get()

2023-12-21 Thread Marco Pagani
The sentence "this call will fail if the module is already being
removed" is potentially confusing and may contradict the rest of the
documentation. If one tries to get a module that has already been
removed using a stale pointer, the kernel will crash.

Signed-off-by: Marco Pagani 
---
 include/linux/module.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index a98e188cf37b..08364d5cbc07 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -668,7 +668,7 @@ extern void __module_get(struct module *module);
  * @module: the module we should check for
  *
  * Only try to get a module reference count if the module is not being removed.
- * This call will fail if the module is already being removed.
+ * This call will fail if the module is in the process of being removed.
  *
  * Care must also be taken to ensure the module exists and is alive prior to
  * usage of this call. This can be gauranteed through two means:
-- 
2.43.0




Re: BUG: unable to handle kernel paging request in bpf_probe_read_compat_str

2023-12-20 Thread Hou Tao
Hi,

On 12/21/2023 1:50 AM, Yonghong Song wrote:
>
> On 12/20/23 1:19 AM, Hou Tao wrote:
>> Hi,
>>
>> On 12/14/2023 11:40 AM, xingwei lee wrote:
>>> Hello I found a bug in net/bpf in the lastest upstream linux and
>>> comfired in the lastest net tree and lastest net bpf titled BUG:
>>> unable to handle kernel paging request in bpf_probe_read_compat_str
>>>
>>> If you fix this issue, please add the following tag to the commit:
>>> Reported-by: xingwei Lee 
>>>
>>> kernel: net 9702817384aa4a3700643d0b26e71deac0172cfd / bpf
>>> 2f2fee2bf74a7e31d06fc6cb7ba2bd4dd7753c99
>>> Kernel config:
>>> https://syzkaller.appspot.com/text?tag=KernelConfig=b50bd31249191be8
>>>
>>> in the lastest bpf tree, the crash like:
>>>
>>> TITLE: BUG: unable to handle kernel paging request in
>>> bpf_probe_read_compat_str
>>> CORRUPTED: false ()
>>> MAINTAINERS (TO): [a...@linux-foundation.org linux...@kvack.org]
>>> MAINTAINERS (CC): [linux-kernel@vger.kernel.org]
>>>
>>> BUG: unable to handle page fault for address: ff0
>> Thanks for the report and reproducer. The output is incomplete. It
>> should be: "BUG: unable to handle page fault for address:
>> ff60". The address is a vsyscall address, so
>> handle_page_fault() considers that the fault address is in userspace
>> instead of kernel space, and there will be no fix-up for the exception
>> and oops happened. Will post a fix and a selftest for it.
>
> There is a proposed fix here:
>
> https://lore.kernel.org/bpf/87r0jwquhv.ffs@tglx/
>
> Not sure the fix in the above link is merged to some upstream branch
> or not.

It seems it has not been merged. will ping Thomas later.




Re: BUG: unable to handle kernel paging request in bpf_probe_read_compat_str

2023-12-20 Thread Yonghong Song



On 12/20/23 1:19 AM, Hou Tao wrote:

Hi,

On 12/14/2023 11:40 AM, xingwei lee wrote:

Hello I found a bug in net/bpf in the lastest upstream linux and
comfired in the lastest net tree and lastest net bpf titled BUG:
unable to handle kernel paging request in bpf_probe_read_compat_str

If you fix this issue, please add the following tag to the commit:
Reported-by: xingwei Lee 

kernel: net 9702817384aa4a3700643d0b26e71deac0172cfd / bpf
2f2fee2bf74a7e31d06fc6cb7ba2bd4dd7753c99
Kernel config: 
https://syzkaller.appspot.com/text?tag=KernelConfig=b50bd31249191be8

in the lastest bpf tree, the crash like:

TITLE: BUG: unable to handle kernel paging request in bpf_probe_read_compat_str
CORRUPTED: false ()
MAINTAINERS (TO): [a...@linux-foundation.org linux...@kvack.org]
MAINTAINERS (CC): [linux-kernel@vger.kernel.org]

BUG: unable to handle page fault for address: ff0

Thanks for the report and reproducer. The output is incomplete. It
should be: "BUG: unable to handle page fault for address:
ff60". The address is a vsyscall address, so
handle_page_fault() considers that the fault address is in userspace
instead of kernel space, and there will be no fix-up for the exception
and oops happened. Will post a fix and a selftest for it.


There is a proposed fix here:

https://lore.kernel.org/bpf/87r0jwquhv.ffs@tglx/

Not sure the fix in the above link is merged to some upstream branch or not.


#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD cf7a067 P4D cf7a067 PUD cf7c067 PMD cf9f067 0
Oops:  [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8219 Comm: 9de Not tainted 6.7.0-rc41
Hardware name: QEMU Standard PC (i440FX + PIIX, 4
RIP: 0010:strncpy_from_kernel_nofault+0xc4/0x270 mm/maccess.c:91
Code: 83 85 6c 17 00 00 01 48 8b 2c 24 eb 18 e8 0
RSP: 0018:c900114e7ac0 EFLAGS: 00010293
RAX:  RBX: c900114e7b30 RCX:2
RDX: 8880183abcc0 RSI: 81b8c9c4 RDI:c
RBP: ff60 R08: 0001 R09:0
R10: 0001 R11: 0001 R12:8
R13: ff60 R14: 0008 R15:0
FS:  () GS:88823bc0(0
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ff60 CR3: 0cf77000 CR4:0
PKRU: 5554
Call Trace:

bpf_probe_read_kernel_str_common kernel/trace/bpf_trace.c:262 [inline]
bpf_probe_read_compat_str kernel/trace/bpf_trace.c:310 [inline]
bpf_probe_read_compat_str+0x12f/0x170 kernel/trace/bpf_trace.c:303
bpf_prog_f17ebaf3f5f7baf8+0x42/0x44
bpf_dispatcher_nop_func include/linux/bpf.h:1196 [inline]
__bpf_prog_run include/linux/filter.h:651 [inline]
bpf_prog_run include/linux/filter.h:658 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2307 [inline]
bpf_trace_run2+0x14e/0x410 kernel/trace/bpf_trace.c:2346
trace_kfree include/trace/events/kmem.h:94 [inline]
kfree+0xec/0x150 mm/slab_common.c:1043
vma_numab_state_free include/linux/mm.h:638 [inline]
__vm_area_free+0x3e/0x140 kernel/fork.c:525
remove_vma+0x128/0x170 mm/mmap.c:146
exit_mmap+0x453/0xa70 mm/mmap.c:3332
__mmput+0x12a/0x4d0 kernel/fork.c:1349
mmput+0x62/0x70 kernel/fork.c:1371
exit_mm kernel/exit.c:567 [inline]
do_exit+0x9aa/0x2ac0 kernel/exit.c:858
do_group_exit+0xd4/0x2a0 kernel/exit.c:1021
__do_sys_exit_group kernel/exit.c:1032 [inline]
__se_sys_exit_group kernel/exit.c:1030 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1030
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x41/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b


=* repro.c =*
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#ifndef __NR_bpf
#define __NR_bpf 321
#endif

#define BITMASK(bf_off, bf_len) (((1ull << (bf_len)) - 1) << (bf_off))
#define STORE_BY_BITMASK(type, htobe, addr, val, bf_off, bf_len) \
  *(type*)(addr) =   \
  htobe((htobe(*(type*)(addr)) & ~BITMASK((bf_off), (bf_len))) | \
(((type)(val) << (bf_off)) & BITMASK((bf_off), (bf_len

uint64_t r[1] = {0x};

int main(void) {
  syscall(__NR_mmap, /*addr=*/0x1000ul, /*len=*/0x1000ul, /*prot=*/0ul,
  /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x2000ul, /*len=*/0x100ul, /*prot=*/7ul,
  /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x2100ul, /*len=*/0x1000ul, /*prot=*/0ul,
  /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  intptr_t res = 0;
  *(uint32_t*)0x20c0 = 0x11;
  *(uint32_t*)0x20c4 = 0xb;
  *(uint64_t*)0x20c8 = 0x2180;
  *(uint8_t*)0x2180 = 0x18;
  STORE_BY_BITMASK(uint8_t, , 0x2181, 0, 0, 4);
  STORE_BY_BITMASK(uint8_t, , 0x2181, 0, 4, 4);
  *(uint16_t*)0x2182 = 0;
  *(uint32_t*)0x2184 = 0;
  *(uint8_t*)0x2188 = 0;
  *(uint8_t*)0x2189 = 0;
  *(ui

Re: [PATCH] tracing/synthetic: fix kernel-doc warnings

2023-12-20 Thread Google
On Tue, 19 Dec 2023 22:12:26 -0800
Randy Dunlap  wrote:

> scripts/kernel-doc warns about using @args: for variadic arguments to
> functions. Documentation/doc-guide/kernel-doc.rst says that this should
> be written as @...: instead, so update the source code to match that,
> preventing the warnings.
> 
> trace_events_synth.c:1165: warning: Excess function parameter 'args' 
> description in '__synth_event_gen_cmd_start'
> trace_events_synth.c:1714: warning: Excess function parameter 'args' 
> description in 'synth_event_trace'
> 

Thanks, this looks good to me.

Acked-by: Masami Hiramatsu (Google) 

> Signed-off-by: Randy Dunlap 
> Cc: Steven Rostedt 
> Cc: Masami Hiramatsu 
> Cc: Mathieu Desnoyers 
> Cc: linux-trace-ker...@vger.kernel.org
> ---
>  kernel/trace/trace_events_synth.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff -- a/kernel/trace/trace_events_synth.c 
> b/kernel/trace/trace_events_synth.c
> --- a/kernel/trace/trace_events_synth.c
> +++ b/kernel/trace/trace_events_synth.c
> @@ -1137,7 +1137,7 @@ EXPORT_SYMBOL_GPL(synth_event_add_fields
>   * @cmd: A pointer to the dynevent_cmd struct representing the new event
>   * @name: The name of the synthetic event
>   * @mod: The module creating the event, NULL if not created from a module
> - * @args: Variable number of arg (pairs), one pair for each field
> + * @...: Variable number of arg (pairs), one pair for each field
>   *
>   * NOTE: Users normally won't want to call this function directly, but
>   * rather use the synth_event_gen_cmd_start() wrapper, which
> @@ -1695,7 +1695,7 @@ __synth_event_trace_end(struct synth_eve
>   * synth_event_trace - Trace a synthetic event
>   * @file: The trace_event_file representing the synthetic event
>   * @n_vals: The number of values in vals
> - * @args: Variable number of args containing the event values
> + * @...: Variable number of args containing the event values
>   *
>   * Trace a synthetic event using the values passed in the variable
>   * argument list.


-- 
Masami Hiramatsu (Google) 



Re: BUG: unable to handle kernel paging request in bpf_probe_read_compat_str

2023-12-20 Thread Hou Tao
Hi,

On 12/14/2023 11:40 AM, xingwei lee wrote:
> Hello I found a bug in net/bpf in the lastest upstream linux and
> comfired in the lastest net tree and lastest net bpf titled BUG:
> unable to handle kernel paging request in bpf_probe_read_compat_str
>
> If you fix this issue, please add the following tag to the commit:
> Reported-by: xingwei Lee 
>
> kernel: net 9702817384aa4a3700643d0b26e71deac0172cfd / bpf
> 2f2fee2bf74a7e31d06fc6cb7ba2bd4dd7753c99
> Kernel config: 
> https://syzkaller.appspot.com/text?tag=KernelConfig=b50bd31249191be8
>
> in the lastest bpf tree, the crash like:
>
> TITLE: BUG: unable to handle kernel paging request in 
> bpf_probe_read_compat_str
> CORRUPTED: false ()
> MAINTAINERS (TO): [a...@linux-foundation.org linux...@kvack.org]
> MAINTAINERS (CC): [linux-kernel@vger.kernel.org]
>
> BUG: unable to handle page fault for address: ff0

Thanks for the report and reproducer. The output is incomplete. It
should be: "BUG: unable to handle page fault for address:
ff60". The address is a vsyscall address, so
handle_page_fault() considers that the fault address is in userspace
instead of kernel space, and there will be no fix-up for the exception
and oops happened. Will post a fix and a selftest for it.
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x) - not-present page
> PGD cf7a067 P4D cf7a067 PUD cf7c067 PMD cf9f067 0
> Oops:  [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 8219 Comm: 9de Not tainted 6.7.0-rc41
> Hardware name: QEMU Standard PC (i440FX + PIIX, 4
> RIP: 0010:strncpy_from_kernel_nofault+0xc4/0x270 mm/maccess.c:91
> Code: 83 85 6c 17 00 00 01 48 8b 2c 24 eb 18 e8 0
> RSP: 0018:c900114e7ac0 EFLAGS: 00010293
> RAX:  RBX: c900114e7b30 RCX:2
> RDX: 8880183abcc0 RSI: 81b8c9c4 RDI:c
> RBP: ff60 R08: 0001 R09:0
> R10: 0001 R11: 0001 R12:8
> R13: ff60 R14: 0008 R15:0
> FS:  () GS:88823bc0(0
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: ff60 CR3: 0cf77000 CR4:0
> PKRU: 5554
> Call Trace:
> 
> bpf_probe_read_kernel_str_common kernel/trace/bpf_trace.c:262 [inline]
> bpf_probe_read_compat_str kernel/trace/bpf_trace.c:310 [inline]
> bpf_probe_read_compat_str+0x12f/0x170 kernel/trace/bpf_trace.c:303
> bpf_prog_f17ebaf3f5f7baf8+0x42/0x44
> bpf_dispatcher_nop_func include/linux/bpf.h:1196 [inline]
> __bpf_prog_run include/linux/filter.h:651 [inline]
> bpf_prog_run include/linux/filter.h:658 [inline]
> __bpf_trace_run kernel/trace/bpf_trace.c:2307 [inline]
> bpf_trace_run2+0x14e/0x410 kernel/trace/bpf_trace.c:2346
> trace_kfree include/trace/events/kmem.h:94 [inline]
> kfree+0xec/0x150 mm/slab_common.c:1043
> vma_numab_state_free include/linux/mm.h:638 [inline]
> __vm_area_free+0x3e/0x140 kernel/fork.c:525
> remove_vma+0x128/0x170 mm/mmap.c:146
> exit_mmap+0x453/0xa70 mm/mmap.c:3332
> __mmput+0x12a/0x4d0 kernel/fork.c:1349
> mmput+0x62/0x70 kernel/fork.c:1371
> exit_mm kernel/exit.c:567 [inline]
> do_exit+0x9aa/0x2ac0 kernel/exit.c:858
> do_group_exit+0xd4/0x2a0 kernel/exit.c:1021
> __do_sys_exit_group kernel/exit.c:1032 [inline]
> __se_sys_exit_group kernel/exit.c:1030 [inline]
> __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1030
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x41/0x110 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x63/0x6b
>
>
> =* repro.c =*
> // autogenerated by syzkaller (https://github.com/google/syzkaller)
>
> #define _GNU_SOURCE
>
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> #ifndef __NR_bpf
> #define __NR_bpf 321
> #endif
>
> #define BITMASK(bf_off, bf_len) (((1ull << (bf_len)) - 1) << (bf_off))
> #define STORE_BY_BITMASK(type, htobe, addr, val, bf_off, bf_len) \
>  *(type*)(addr) =   \
>  htobe((htobe(*(type*)(addr)) & ~BITMASK((bf_off), (bf_len))) | \
>(((type)(val) << (bf_off)) & BITMASK((bf_off), (bf_len
>
> uint64_t r[1] = {0x};
>
> int main(void) {
>  syscall(__NR_mmap, /*addr=*/0x1000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>  /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>  syscall(__NR_mmap, /*addr=*/0x2000ul, /*len=*/0x100ul, /*prot=*/7ul,
>  /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>  syscall(__NR_mmap, /*addr=*/0x2100ul, /*len=*/0x1000ul, /*prot=*/0ul,
>  /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>  intptr_t res = 0;
>  *(uint32_t*)0x20c0 = 0x11;
>  *(uint32_t*)0x20c4 =

[PATCH] tracing/synthetic: fix kernel-doc warnings

2023-12-19 Thread Randy Dunlap
scripts/kernel-doc warns about using @args: for variadic arguments to
functions. Documentation/doc-guide/kernel-doc.rst says that this should
be written as @...: instead, so update the source code to match that,
preventing the warnings.

trace_events_synth.c:1165: warning: Excess function parameter 'args' 
description in '__synth_event_gen_cmd_start'
trace_events_synth.c:1714: warning: Excess function parameter 'args' 
description in 'synth_event_trace'

Signed-off-by: Randy Dunlap 
Cc: Steven Rostedt 
Cc: Masami Hiramatsu 
Cc: Mathieu Desnoyers 
Cc: linux-trace-ker...@vger.kernel.org
---
 kernel/trace/trace_events_synth.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -- a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -1137,7 +1137,7 @@ EXPORT_SYMBOL_GPL(synth_event_add_fields
  * @cmd: A pointer to the dynevent_cmd struct representing the new event
  * @name: The name of the synthetic event
  * @mod: The module creating the event, NULL if not created from a module
- * @args: Variable number of arg (pairs), one pair for each field
+ * @...: Variable number of arg (pairs), one pair for each field
  *
  * NOTE: Users normally won't want to call this function directly, but
  * rather use the synth_event_gen_cmd_start() wrapper, which
@@ -1695,7 +1695,7 @@ __synth_event_trace_end(struct synth_eve
  * synth_event_trace - Trace a synthetic event
  * @file: The trace_event_file representing the synthetic event
  * @n_vals: The number of values in vals
- * @args: Variable number of args containing the event values
+ * @...: Variable number of args containing the event values
  *
  * Trace a synthetic event using the values passed in the variable
  * argument list.



Re: [PATCH] x86/sgx: fix kernel-doc comment misuse

2023-12-17 Thread Huang, Kai
On Sat, 2023-12-16 at 09:16 -0800, Randy Dunlap wrote:
> Don't use "/**" for a non-kernel-doc comment. This prevents a warning
> from scripts/kernel-doc:
> 
> main.c:740: warning: expecting prototype for A section metric is concatenated 
> in a way that @low bits 12(). Prototype was for sgx_calc_section_metric() 
> instead
> 
> Signed-off-by: Randy Dunlap 
> Cc: Jarkko Sakkinen 
> Cc: Dave Hansen 
> Cc: linux-...@vger.kernel.org
> Cc: x...@kernel.org

Reviewed-by: Kai Huang 

> ---
>  arch/x86/kernel/cpu/sgx/main.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -- a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -731,7 +731,7 @@ out:
>   return 0;
>  }
>  
> -/**
> +/*
>   * A section metric is concatenated in a way that @low bits 12-31 define the
>   * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
>   * metric.
> 



[PATCH] x86/sgx: fix kernel-doc comment misuse

2023-12-16 Thread Randy Dunlap
Don't use "/**" for a non-kernel-doc comment. This prevents a warning
from scripts/kernel-doc:

main.c:740: warning: expecting prototype for A section metric is concatenated 
in a way that @low bits 12(). Prototype was for sgx_calc_section_metric() 
instead

Signed-off-by: Randy Dunlap 
Cc: Jarkko Sakkinen 
Cc: Dave Hansen 
Cc: linux-...@vger.kernel.org
Cc: x...@kernel.org
---
 arch/x86/kernel/cpu/sgx/main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -- a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -731,7 +731,7 @@ out:
return 0;
 }
 
-/**
+/*
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
  * metric.



Re: [PATCH] hwspinlock/core: fix kernel-doc warnings

2023-12-15 Thread Bagas Sanjaya
On Tue, Dec 05, 2023 at 09:54:39PM -0800, Randy Dunlap wrote:
> diff -- a/drivers/hwspinlock/hwspinlock_core.c 
> b/drivers/hwspinlock/hwspinlock_core.c
> --- a/drivers/hwspinlock/hwspinlock_core.c
> +++ b/drivers/hwspinlock/hwspinlock_core.c
> @@ -84,8 +84,9 @@ static DEFINE_MUTEX(hwspinlock_tree_lock
>   * should decide between spin_trylock, spin_trylock_irq and
>   * spin_trylock_irqsave.
>   *
> - * Returns 0 if we successfully locked the hwspinlock or -EBUSY if
> + * Returns: %0 if we successfully locked the hwspinlock or -EBUSY if
>   * the hwspinlock was already taken.
> + *
>   * This function will never sleep.
>   */
>  int __hwspin_trylock(struct hwspinlock *hwlock, int mode, unsigned long 
> *flags)
> @@ -171,7 +172,7 @@ EXPORT_SYMBOL_GPL(__hwspin_trylock);
>  /**
>   * __hwspin_lock_timeout() - lock an hwspinlock with timeout limit
>   * @hwlock: the hwspinlock to be locked
> - * @timeout: timeout value in msecs
> + * @to: timeout value in msecs
>   * @mode: mode which controls whether local interrupts are disabled or not
>   * @flags: a pointer to where the caller's interrupt state will be saved at 
> (if
>   * requested)
> @@ -199,9 +200,11 @@ EXPORT_SYMBOL_GPL(__hwspin_trylock);
>   * to choose the appropriate @mode of operation, exactly the same way users
>   * should decide between spin_lock, spin_lock_irq and spin_lock_irqsave.
>   *
> - * Returns 0 when the @hwlock was successfully taken, and an appropriate
> + * Returns: %0 when the @hwlock was successfully taken, and an appropriate
>   * error code otherwise (most notably -ETIMEDOUT if the @hwlock is still
> - * busy after @timeout msecs). The function will never sleep.
> + * busy after @timeout msecs).
> + *
> + * The function will never sleep.
>   */
>  int __hwspin_lock_timeout(struct hwspinlock *hwlock, unsigned int to,
>   int mode, unsigned long *flags)
> @@ -304,13 +307,12 @@ EXPORT_SYMBOL_GPL(__hwspin_unlock);
>  
>  /**
>   * of_hwspin_lock_simple_xlate - translate hwlock_spec to return a lock id
> - * @bank: the hwspinlock device bank
>   * @hwlock_spec: hwlock specifier as found in the device tree
>   *
>   * This is a simple translation function, suitable for hwspinlock platform
>   * drivers that only has a lock specifier length of 1.
>   *
> - * Returns a relative index of the lock within a specified bank on success,
> + * Returns: a relative index of the lock within a specified bank on success,
>   * or -EINVAL on invalid specifier cell count.
>   */
>  static inline int
> @@ -332,9 +334,10 @@ of_hwspin_lock_simple_xlate(const struct
>   * hwspinlock device, so that it can be requested using the normal
>   * hwspin_lock_request_specific() API.
>   *
> - * Returns the global lock id number on success, -EPROBE_DEFER if the 
> hwspinlock
> - * device is not yet registered, -EINVAL on invalid args specifier value or 
> an
> - * appropriate error as returned from the OF parsing of the DT client node.
> + * Returns: the global lock id number on success, -EPROBE_DEFER if the
> + * hwspinlock device is not yet registered, -EINVAL on invalid args
> + * specifier value or an appropriate error as returned from the OF parsing
> + * of the DT client node.
>   */
>  int of_hwspin_lock_get_id(struct device_node *np, int index)
>  {
> @@ -399,9 +402,10 @@ EXPORT_SYMBOL_GPL(of_hwspin_lock_get_id)
>   * the hwspinlock device, so that it can be requested using the normal
>   * hwspin_lock_request_specific() API.
>   *
> - * Returns the global lock id number on success, -EPROBE_DEFER if the 
> hwspinlock
> - * device is not yet registered, -EINVAL on invalid args specifier value or 
> an
> - * appropriate error as returned from the OF parsing of the DT client node.
> + * Returns: the global lock id number on success, -EPROBE_DEFER if the
> + * hwspinlock device is not yet registered, -EINVAL on invalid args
> + * specifier value or an appropriate error as returned from the OF parsing
> + * of the DT client node.
>   */
>  int of_hwspin_lock_get_id_byname(struct device_node *np, const char *name)
>  {
> @@ -481,7 +485,7 @@ out:
>   *
>   * Should be called from a process context (might sleep)
>   *
> - * Returns 0 on success, or an appropriate error code on failure
> + * Returns: %0 on success, or an appropriate error code on failure
>   */
>  int hwspin_lock_register(struct hwspinlock_device *bank, struct device *dev,
>   const struct hwspinlock_ops *ops, int base_id, int num_locks)
> @@ -529,7 +533,7 @@ EXPORT_SYMBOL_GPL(hwspin_lock_register);
>   *
>   * Should be called from a process context (might sleep)
>   *
> - * Returns 0 on success, or an appropriate error code on failure
> + * Returns: %0 on success, or an appropriate error code on failure
>   */
>  int hwspin_lock_unregister(struct hwspinlock_device *bank)
>  {
> @@ -578,7 +582,7 @@ static int devm_hwspin_lock_device_match
>   *
>   * Should be called from a process context (might sleep)
>   *
> - * Returns 0 on success, 

BUG: unable to handle kernel paging request in bpf_probe_read_compat_str

2023-12-13 Thread xingwei lee
Hello I found a bug in net/bpf in the lastest upstream linux and
comfired in the lastest net tree and lastest net bpf titled BUG:
unable to handle kernel paging request in bpf_probe_read_compat_str

If you fix this issue, please add the following tag to the commit:
Reported-by: xingwei Lee 

kernel: net 9702817384aa4a3700643d0b26e71deac0172cfd / bpf
2f2fee2bf74a7e31d06fc6cb7ba2bd4dd7753c99
Kernel config: 
https://syzkaller.appspot.com/text?tag=KernelConfig=b50bd31249191be8

in the lastest bpf tree, the crash like:

TITLE: BUG: unable to handle kernel paging request in bpf_probe_read_compat_str
CORRUPTED: false ()
MAINTAINERS (TO): [a...@linux-foundation.org linux...@kvack.org]
MAINTAINERS (CC): [linux-kernel@vger.kernel.org]

BUG: unable to handle page fault for address: ff0
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD cf7a067 P4D cf7a067 PUD cf7c067 PMD cf9f067 0
Oops:  [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8219 Comm: 9de Not tainted 6.7.0-rc41
Hardware name: QEMU Standard PC (i440FX + PIIX, 4
RIP: 0010:strncpy_from_kernel_nofault+0xc4/0x270 mm/maccess.c:91
Code: 83 85 6c 17 00 00 01 48 8b 2c 24 eb 18 e8 0
RSP: 0018:c900114e7ac0 EFLAGS: 00010293
RAX:  RBX: c900114e7b30 RCX:2
RDX: 8880183abcc0 RSI: 81b8c9c4 RDI:c
RBP: ff60 R08: 0001 R09:0
R10: 0001 R11: 0001 R12:8
R13: ff60 R14: 0008 R15:0
FS:  () GS:88823bc0(0
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ff60 CR3: 0cf77000 CR4:0
PKRU: 5554
Call Trace:

bpf_probe_read_kernel_str_common kernel/trace/bpf_trace.c:262 [inline]
bpf_probe_read_compat_str kernel/trace/bpf_trace.c:310 [inline]
bpf_probe_read_compat_str+0x12f/0x170 kernel/trace/bpf_trace.c:303
bpf_prog_f17ebaf3f5f7baf8+0x42/0x44
bpf_dispatcher_nop_func include/linux/bpf.h:1196 [inline]
__bpf_prog_run include/linux/filter.h:651 [inline]
bpf_prog_run include/linux/filter.h:658 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2307 [inline]
bpf_trace_run2+0x14e/0x410 kernel/trace/bpf_trace.c:2346
trace_kfree include/trace/events/kmem.h:94 [inline]
kfree+0xec/0x150 mm/slab_common.c:1043
vma_numab_state_free include/linux/mm.h:638 [inline]
__vm_area_free+0x3e/0x140 kernel/fork.c:525
remove_vma+0x128/0x170 mm/mmap.c:146
exit_mmap+0x453/0xa70 mm/mmap.c:3332
__mmput+0x12a/0x4d0 kernel/fork.c:1349
mmput+0x62/0x70 kernel/fork.c:1371
exit_mm kernel/exit.c:567 [inline]
do_exit+0x9aa/0x2ac0 kernel/exit.c:858
do_group_exit+0xd4/0x2a0 kernel/exit.c:1021
__do_sys_exit_group kernel/exit.c:1032 [inline]
__se_sys_exit_group kernel/exit.c:1030 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1030
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x41/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b


=* repro.c =*
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#ifndef __NR_bpf
#define __NR_bpf 321
#endif

#define BITMASK(bf_off, bf_len) (((1ull << (bf_len)) - 1) << (bf_off))
#define STORE_BY_BITMASK(type, htobe, addr, val, bf_off, bf_len) \
 *(type*)(addr) =   \
 htobe((htobe(*(type*)(addr)) & ~BITMASK((bf_off), (bf_len))) | \
   (((type)(val) << (bf_off)) & BITMASK((bf_off), (bf_len

uint64_t r[1] = {0x};

int main(void) {
 syscall(__NR_mmap, /*addr=*/0x1000ul, /*len=*/0x1000ul, /*prot=*/0ul,
 /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
 syscall(__NR_mmap, /*addr=*/0x2000ul, /*len=*/0x100ul, /*prot=*/7ul,
 /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
 syscall(__NR_mmap, /*addr=*/0x2100ul, /*len=*/0x1000ul, /*prot=*/0ul,
 /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
 intptr_t res = 0;
 *(uint32_t*)0x20c0 = 0x11;
 *(uint32_t*)0x20c4 = 0xb;
 *(uint64_t*)0x20c8 = 0x2180;
 *(uint8_t*)0x2180 = 0x18;
 STORE_BY_BITMASK(uint8_t, , 0x2181, 0, 0, 4);
 STORE_BY_BITMASK(uint8_t, , 0x2181, 0, 4, 4);
 *(uint16_t*)0x2182 = 0;
 *(uint32_t*)0x2184 = 0;
 *(uint8_t*)0x2188 = 0;
 *(uint8_t*)0x2189 = 0;
 *(uint16_t*)0x218a = 0;
 *(uint32_t*)0x218c = 0;
 *(uint8_t*)0x2190 = 0x18;
 STORE_BY_BITMASK(uint8_t, , 0x2191, 1, 0, 4);
 STORE_BY_BITMASK(uint8_t, , 0x2191, 0, 4, 4);
 *(uint16_t*)0x2192 = 0;
 *(uint32_t*)0x2194 = 0x25702020;
 *(uint8_t*)0x2198 = 0;
 *(uint8_t*)0x2199 = 0;
 *(uint16_t*)0x219a = 0;
 *(uint32_t*)0x219c = 0x20202000;
 STORE_BY_BITMASK(uint8_t, , 0x21a0, 3, 0, 3);
 STORE_BY_BITMASK(uint8_t, , 0x21a0, 3, 3, 2);
 STORE_BY_BITMASK(uint8_t, , 0x21a0, 3, 5, 3);
 STORE_BY_BITMASK(uint8_t, , 0x21a1, 0xa, 0, 4);
 STORE_BY_BITMASK(uint8_t, , 0x21a1, 1, 4, 4);
 *(uint16_t*)0x21a2 = 0xfff

  1   2   3   4   5   6   7   8   9   10   >