[RFC PATCH v3 0/1] x86/sgx: Explicitly give up the CPU in EDMM's ioctl() to avoid softlockup

2024-05-15 Thread Bojun Zhu
Hi folks,

This is the third version of the patch to fix the softlockup in EDMM 
ioctl()[1][2].

If we run an enclave with a large EPC (30G or greater on my platform)
on a Linux kernel with preemption disabled (by configuring
"CONFIG_PREEMPT_NONE=y"), softlockup warnings are reported in the
"dmesg" log.

The EDMM ioctl() interfaces provided by the kernel
(sgx_ioc_enclave_{ modify_types | restrict_permissions | remove_pages })
support changing the attributes of an enclave's EPC pages in batches.
If a userspace application asks the kernel to handle too many EPC pages
in one call, the kernel may be stuck for a long time (with preemption
disabled).
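
The general pattern for avoiding this kind of lockup can be sketched in
userspace C. This is only an illustration under assumptions, not the patch
itself: process_pages(), handle_one_page(), yield_count and YIELD_INTERVAL
are hypothetical names, and sched_yield() merely stands in for a kernel
rescheduling point.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical number of pages handled between voluntary yields. */
#define YIELD_INTERVAL 1024

/* Counts how often the loop offered the CPU to other tasks. */
static unsigned long yield_count;

/* Stand-in for the per-page EPC work done by the ioctl() (hypothetical). */
static int handle_one_page(size_t index)
{
	(void)index;
	return 0;
}

/*
 * Walk nr_pages pages and give up the CPU every YIELD_INTERVAL pages, so
 * that a very large request cannot monopolise the CPU. Returns the number
 * of pages processed before stopping.
 */
static size_t process_pages(size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		if (handle_one_page(i))
			break;

		/* Userspace analogue of a kernel rescheduling point. */
		if ((i + 1) % YIELD_INTERVAL == 0) {
			sched_yield();
			yield_count++;
		}
	}
	return i;
}
```

With these assumed values, a request for 4096 pages would yield four times
instead of running the whole batch in one uninterrupted stretch.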

The log is as follows:

[ cut here ]
[  901.101294] watchdog: BUG: soft lockup - CPU#92 stuck for 23s! 
[occlum-run:4289]
[  901.109617] Modules linked in: veth xt_conntrack xt_MASQUERADE 
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter 
br_netfilter bridge stp llc overlay nls_iso8859_1 intel_rapl_msr 
intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common 
i10nm_edac nfit binfmt_misc ipmi_ssif x86_pkg_temp_thermal intel_powerclamp 
coretemp kvm_intel kvm crct10dif_pclmul polyval_clmulni polyval_generic 
ghash_clmulni_intel sha512_ssse3 sha256_ssse3 pmt_telemetry sha1_ssse3 
pmt_class joydev intel_sdsi input_leds aesni_intel crypto_simd cryptd dax_hmem 
cxl_acpi cmdlinepart rapl cxl_core ast spi_nor intel_cstate drm_shmem_helper 
einj mtd drm_kms_helper mei_me idxd isst_if_mmio isst_if_mbox_pci 
isst_if_common intel_vsec idxd_bus mei acpi_ipmi ipmi_si ipmi_devintf 
ipmi_msghandler acpi_pad acpi_power_meter mac_hid sch_fq_codel msr parport_pc 
ppdev lp parport ramoops reed_solomon pstore_blk pstore_zone efi_pstore drm 
ip_tables x_tables
[  901.109670]  autofs4 mlx5_ib ib_uverbs ib_core hid_generic usbhid hid ses 
enclosure scsi_transport_sas mlx5_core pci_hyperv_intf mlxfw igb ahci psample 
i2c_algo_bit i2c_i801 spi_intel_pci xhci_pci tls megaraid_sas dca spi_intel 
crc32_pclmul i2c_smbus i2c_ismt libahci xhci_pci_renesas wmi pinctrl_emmitsburg
[  901.109691] CPU: 92 PID: 4289 Comm: occlum-run Not tainted 6.9.0-rc5 #3
[  901.109693] Hardware name: Inspur NF5468-M7-A0-R0-00/NF5468-M7-A0-R0-00, 
BIOS 05.02.01 05/08/2023
[  901.109695] RIP: 0010:sgx_enclave_restrict_permissions+0xba/0x1f0
[  901.109701] Code: 48 c1 e6 05 48 89 d1 48 8d 5c 24 40 b8 0e 00 00 00 48 2b 
8e 70 8e 15 8b 48 c1 e9 05 48 c1 e1 0c 48 03 8e 68 8e 15 8b 0f 01 cf  00 00 
00 40 0f 85 b2 00 00 00 85 c0 0f 85 db 00 00 00 4c 89 ef
[  901.109702] RSP: 0018:ad0ae5d0f8c0 EFLAGS: 0202
[  901.109704] RAX:  RBX: ad0ae5d0f900 RCX: ad11dfc0e000
[  901.109705] RDX: ad2adcff81c0 RSI:  RDI: 9a12f5f4f000
[  901.109706] RBP: ad0ae5d0f9b0 R08: 0002 R09: 9a1289f57520
[  901.109707] R10: 005d R11: 0002 R12: 0006d8ff2000
[  901.109708] R13: 9a12f5f4f000 R14: ad0ae5d0fa18 R15: 9a12f5f4f020
[  901.109709] FS:  7fb20ad1d740() GS:9a317fe0() 
knlGS:
[  901.109710] CS:  0010 DS:  ES:  CR0: 80050033
[  901.109711] CR2: 7f8041811000 CR3: 000118530006 CR4: 00770ef0
[  901.109712] DR0:  DR1:  DR2: 
[  901.109713] DR3:  DR6: fffe07f0 DR7: 0400
[  901.109714] PKRU: 5554
[  901.109714] Call Trace:
[  901.109716]  
[  901.109718]  ? show_regs+0x67/0x70
[  901.109722]  ? watchdog_timer_fn+0x1f3/0x280
[  901.109725]  ? __pfx_watchdog_timer_fn+0x10/0x10
[  901.109727]  ? __hrtimer_run_queues+0xc8/0x220
[  901.109731]  ? hrtimer_interrupt+0x10c/0x250
[  901.109733]  ? __sysvec_apic_timer_interrupt+0x53/0x130
[  901.109736]  ? sysvec_apic_timer_interrupt+0x7b/0x90
[  901.109739]  
[  901.109740]  
[  901.109740]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  901.109745]  ? sgx_enclave_restrict_permissions+0xba/0x1f0
[  901.109747]  ? aa_file_perm+0x145/0x550
[  901.109750]  sgx_ioctl+0x1ab/0x900
[  901.109751]  ? xas_find+0x84/0x200
[  901.109754]  ? sgx_enclave_etrack+0xbb/0x140
[  901.109756]  ? sgx_encl_may_map+0x19a/0x240
[  901.109758]  ? common_file_perm+0x8a/0x1b0
[  901.109760]  ? obj_cgroup_charge_pages+0xa2/0x100
[  901.109763]  ? tlb_flush_mmu+0x31/0x1c0
[  901.109766]  ? tlb_finish_mmu+0x42/0x80
[  901.109767]  ? do_mprotect_pkey+0x150/0x530
[  901.109769]  ? __fget_light+0xc0/0x100
[  901.109772]  __x64_sys_ioctl+0x95/0xd0
[  901.109775]  x64_sys_call+0x1209/0x20c0
[  901.109777]  do_syscall_64+0x6d/0x110
[  901.109779]  ? syscall_exit_to_user_mode+0x86/0x1c0
[  901.109782]  ? do_syscall_64+0x79/0x110
[  901.109783]  ? syscall_exit_to_user_mode+0x86/0x1c0
[  901.109784]  ? do_syscall_64+0x79/0x110
[  901.109785]  ? free_unref_page+0x10e/0x180
[  901.109788]  ? __do_fault+0x36/0x130
[  901.109791]  ? 

[PATCH v3 0/1] arm64: Implement stack trace termination record

2021-04-20 Thread madvenka
From: "Madhavan T. Venkataraman" 

Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.

All tasks have a pt_regs structure right after the task stack in the stack
page. The pt_regs structure contains a stackframe field. Make this stackframe
field the final frame in the task stack so all stack traces end at a fixed
stack offset.

For kernel tasks, this is simple to understand. For user tasks, there is
some extra detail. User tasks get created via fork() et al. Once they return
from fork, they enter the kernel only on an EL0 exception. In arm64,
system calls are also EL0 exceptions.

The EL0 exception handler uses the task pt_regs mentioned above to save
register state and call different exception functions. All stack traces
from EL0 exception code must end at the pt_regs. So, make pt_regs->stackframe
the final frame in the EL0 exception stack.

To summarize, task_pt_regs(task)->stackframe will always be the final frame
in a stack trace.
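
The termination check can be illustrated with a small userspace model. This
is a sketch under assumptions, not the arm64 unwinder: struct frame_record,
unwind_is_complete() and the max_depth guard are hypothetical names, and
'final' plays the role of task_pt_regs(task)->stackframe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified frame record: saved frame pointer and return address. */
struct frame_record {
	struct frame_record *fp;	/* caller's frame record */
	uintptr_t pc;			/* return address */
};

/*
 * Walk the frame-record chain starting at 'start'. The trace counts as
 * complete only if the walk reaches 'final', the record at the known
 * location. Running off the chain or exceeding max_depth is treated as
 * an early termination.
 */
static bool unwind_is_complete(const struct frame_record *start,
			       const struct frame_record *final,
			       size_t max_depth)
{
	const struct frame_record *cur = start;
	size_t depth;

	for (depth = 0; cur && depth < max_depth; depth++) {
		if (cur == final)
			return true;	/* reached the termination record */
		cur = cur->fp;
	}
	return false;			/* ended somewhere else: unreliable */
}
```

A chain that ends anywhere other than the known final record is reported as
incomplete, which is the property reliable stacktracing depends on.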

Sample stack traces
===================

Showing just the last couple of frames in each stack trace to show how the
stack trace ends.

Primary CPU idle task
=====================

 ...
[0.077109]   rest_init+0x108/0x144
[0.077188]   arch_call_rest_init+0x18/0x24
[0.077220]   start_kernel+0x3ac/0x3e4
[0.077293]   __primary_switched+0xac/0xb0

Secondary CPU idle task
=======================

...
[0.077264]   secondary_start_kernel+0x228/0x388
[0.077326]   __secondary_switched+0x80/0x84

Sample kernel thread
====================

 ...
[   24.543250]   kernel_init+0xa4/0x164
[   24.561850]   ret_from_fork+0x10/0x18

Write system call (EL0 exception)
=================================

(using a test driver called callfd)

[ 1160.628723]   callfd_stack+0x3c/0x70
[ 1160.628768]   callfd_op+0x35c/0x3a8
[ 1160.628791]   callfd_write+0x5c/0xc8
[ 1160.628813]   vfs_write+0x104/0x3b8
[ 1160.628837]   ksys_write+0xd0/0x188
[ 1160.628859]   __arm64_sys_write+0x4c/0x60
[ 1160.628883]   el0_svc_common.constprop.0+0xa8/0x240
[ 1160.628904]   do_el0_svc+0x40/0xa8
[ 1160.628921]   el0_svc+0x2c/0x78
[ 1160.628942]   el0_sync_handler+0xb0/0xb8
[ 1160.628962]   el0_sync+0x17c/0x180

NULL pointer dereference exception (EL1 exception)
==================================================

[ 1160.637984]   callfd_stack+0x3c/0x70
[ 1160.638015]   die_kernel_fault+0x80/0x108
[ 1160.638042]   do_page_fault+0x520/0x600
[ 1160.638075]   do_translation_fault+0xa8/0xdc
[ 1160.638102]   do_mem_abort+0x68/0x100
[ 1160.638120]   el1_abort+0x40/0x60
[ 1160.638138]   el1_sync_handler+0xac/0xc8
[ 1160.638157]   el1_sync+0x74/0x100
[ 1160.638174]   0x0   <=== NULL pointer dereference
[ 1160.638189]   callfd_write+0x5c/0xc8
[ 1160.638211]   vfs_write+0x104/0x3b8
[ 1160.638234]   ksys_write+0xd0/0x188
[ 1160.638278]   __arm64_sys_write+0x4c/0x60
[ 1160.638325]   el0_svc_common.constprop.0+0xa8/0x240
[ 1160.638358]   do_el0_svc+0x40/0xa8
[ 1160.638379]   el0_svc+0x2c/0x78
[ 1160.638409]   el0_sync_handler+0xb0/0xb8
[ 1160.638452]   el0_sync+0x17c/0x180

Timer interrupt (EL1 exception)
===============================

Secondary CPU idle task interrupted by the timer interrupt:

[ 1160.702949] callfd_callback:
[ 1160.703006]   callfd_stack+0x3c/0x70
[ 1160.703060]   callfd_callback+0x30/0x40
[ 1160.703087]   call_timer_fn+0x48/0x220
[ 1160.703113]   run_timer_softirq+0x7cc/0xc70
[ 1160.703144]   __do_softirq+0x1ec/0x608
[ 1160.703166]   irq_exit+0x138/0x180
[ 1160.703193]   __handle_domain_irq+0x8c/0xf0
[ 1160.703218]   gic_handle_irq+0xec/0x410
[ 1160.703253]   el1_irq+0xc0/0x180
[ 1160.703278]   arch_local_irq_enable+0xc/0x28
[ 1160.703329]   default_idle_call+0x54/0x1d8
[ 1160.703355]   do_idle+0x2d8/0x350
[ 1160.703388]   cpu_startup_entry+0x2c/0x98
[ 1160.703412]   secondary_start_kernel+0x238/0x388
[ 1160.703446]   __secondary_switched+0x80/0x84
---
Changelog:

v3:
- Added Reviewed-by: Mark Brown.
- Fixed an extra space after a cast reported by checkpatch --strict.
- Synced with mainline tip.

v2:
- Changed some wordings as suggested by Mark Rutland.
- Removed the synthetic return PC for idle tasks. Changed the
  branches to start_kernel() and secondary_start_kernel() to
  calls so that they will have a proper return PC.

v1:
- Set up task_pt_regs(current)->stackframe as the final frame
  when a new task is initialized in copy_thread().
- Create pt_regs for the idle tasks and set up pt_regs->stackframe
  as the final frame for the idle tasks.
- Set up task_pt_regs(current)->stackframe as the final frame in
  the EL0 exception handler so the EL0 exception stack trace ends
  there.
- Terminate the stack trace successfully in unwind_frame() when

[PATCH v3 0/1] dwc2: Enable USB when booted in ACPI mode

2021-04-13 Thread Jeremy Linton
The BCM2711 has a designware USB controller that is commonly used
on the CM4 and RPi400. There is a desire to use these machines with
a standard UEFI+ACPI stack as is being done with the normal RPi4.

This patch enables this by adding ACPI module boilerplate to the
existing dwc2 controller.

It should also be noted that there is an ACPI table update
in the firmware which marks the ACPI _DMA() entries as
ResourceProducers. That change is required for this to work with
the 1G DMA translation present on the platform.

Changes:

  v2->v3: Add this cover letter to describe the patch changes
  
  v1->v2: Fix the kernel_ulong_t/set_parms() function typecasting
  warning by explicitly doing the type cast.

Jeremy Linton (1):
  usb: dwc2: Enable RPi in ACPI mode

 drivers/usb/dwc2/core.h |  2 ++
 drivers/usb/dwc2/params.c   | 18 +-
 drivers/usb/dwc2/platform.c |  1 +
 3 files changed, 20 insertions(+), 1 deletion(-)

-- 
2.29.2



[PATCH v3 0/1] NVIDIA Tegra memory improvements

2021-04-04 Thread Dmitry Osipenko
Hi,

Here is the last patch of the series, which had a minor problem in v2.
The rest of the patches have already been applied by Krzysztof Kozlowski.

Changelog:

v3: - Added new optional reg property for emc-tables nodes in order to
      fix dt_binding_check warning.

      Please note that I will prepare a separate patch for v5.14 that will
      add the new property to the device-trees since Thierry already
      sent out PR for v5.13.

v2: - Fixed typos in the converted schemas.
    - Corrected reg entry of tegra20-mc-gart schema to use fixed number of
      items.
    - Made power-domain to use maxItems instead of $ref phandle in schemas.

Dmitry Osipenko (1):
  dt-bindings: memory: tegra20: emc: Convert to schema

 .../memory-controllers/nvidia,tegra20-emc.txt | 130 
 .../nvidia,tegra20-emc.yaml   | 303 ++
 2 files changed, 303 insertions(+), 130 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-emc.txt
 create mode 100644 Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-emc.yaml

-- 
2.30.2



Re: [PATCH v3 0/1] drm/tiny: add support for Waveshare 2inch LCD module

2021-03-30 Thread carlis
On Tue, 30 Mar 2021 09:17:19 -0500
David Lechner  wrote:

> On 3/30/21 3:08 AM, Carlis wrote:
> > From: Xuezhi Zhang 
> > 
> > This adds a new module for the ST7789V controller with parameters
> > for the Waveshare 2inch LCD module.
> > 
> > Signed-off-by: Xuezhi Zhang 
> > ---
> > v2:change compatible value.
> > v3:change author name.
> > ---
> >   MAINTAINERS|   8 +
> >   drivers/gpu/drm/tiny/Kconfig   |  14 ++
> >   drivers/gpu/drm/tiny/Makefile  |   1 +
> >   drivers/gpu/drm/tiny/st7789v.c | 269 +
> >   4 files changed, 292 insertions(+)
> >   create mode 100644 drivers/gpu/drm/tiny/st7789v.c
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index d92f85ca831d..df25e8e0deb1 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -5769,6 +5769,14 @@ T:   git git://anongit.freedesktop.org/drm/drm-misc
> >   F:  Documentation/devicetree/bindings/display/sitronix,st7735r.yaml
> >   F:  drivers/gpu/drm/tiny/st7735r.c
> > +DRM DRIVER FOR SITRONIX ST7789V PANELS
> > +M: David Lechner   
> 
> I should not be added here. I don't have one of these displays.
OK, I will remove this in the next patch.
> 
> > +M: Xuezhi Zhang 
> > +S: Maintained
> > +T: git git://anongit.freedesktop.org/drm/drm-misc
> > +F: Documentation/devicetree/bindings/display/sitronix,st7789v-dbi.yaml
> > +F: drivers/gpu/drm/tiny/st7789v.c
> > +
> >   DRM DRIVER FOR SONY ACX424AKP PANELS
> >   M:Linus Walleij 
> >   S:Maintained  
thanks,
Xuezhi Zhang



[PATCH v3 0/1] nvmem: Change to unified property interface

2021-03-30 Thread Kevin Paul Herbert
nvmem: Change to unified property interface

Change from using device tree (Open Firmware) APIs to the unified
'fwnode' interface.

Change of_nvmem_cell_get() to fwnode_nvmem_cell_get(), and add a
wrapper for of_nvmem_cell_get().

Change of_nvmem_device_get() to fwnode_nvmem_device_get(). There
are no known accessors to the OF interface, so no need for a wrapper.

The first version of this patch incorrectly had a wrapper for
of_nvmem_device_get(), even though the comments about the patch
not needing this were correct.

The second version of this patch had an incorrect return type for
of_nvmem_device_get().



Re: [PATCH v3 0/1] drm/tiny: add support for Waveshare 2inch LCD module

2021-03-30 Thread David Lechner

On 3/30/21 3:08 AM, Carlis wrote:

> From: Xuezhi Zhang
>
> This adds a new module for the ST7789V controller with parameters for
> the Waveshare 2inch LCD module.
>
> Signed-off-by: Xuezhi Zhang
> ---
> v2:change compatible value.
> v3:change author name.
> ---
>   MAINTAINERS                    |   8 +
>   drivers/gpu/drm/tiny/Kconfig   |  14 ++
>   drivers/gpu/drm/tiny/Makefile  |   1 +
>   drivers/gpu/drm/tiny/st7789v.c | 269 +
>   4 files changed, 292 insertions(+)
>   create mode 100644 drivers/gpu/drm/tiny/st7789v.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d92f85ca831d..df25e8e0deb1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5769,6 +5769,14 @@ T:   git git://anongit.freedesktop.org/drm/drm-misc
>   F: Documentation/devicetree/bindings/display/sitronix,st7735r.yaml
>   F: drivers/gpu/drm/tiny/st7735r.c
>
> +DRM DRIVER FOR SITRONIX ST7789V PANELS
> +M: David Lechner

I should not be added here. I don't have one of these displays.

> +M: Xuezhi Zhang
> +S: Maintained
> +T: git git://anongit.freedesktop.org/drm/drm-misc
> +F: Documentation/devicetree/bindings/display/sitronix,st7789v-dbi.yaml
> +F: drivers/gpu/drm/tiny/st7789v.c
> +
>   DRM DRIVER FOR SONY ACX424AKP PANELS
>   M: Linus Walleij
>   S: Maintained


[PATCH v3 0/1] drm/tiny: add support for Waveshare 2inch LCD module

2021-03-30 Thread Carlis
From: Xuezhi Zhang 

This adds a new module for the ST7789V controller with parameters for
the Waveshare 2inch LCD module.

Signed-off-by: Xuezhi Zhang 
---
v2:change compatible value.
v3:change author name.
---
 MAINTAINERS|   8 +
 drivers/gpu/drm/tiny/Kconfig   |  14 ++
 drivers/gpu/drm/tiny/Makefile  |   1 +
 drivers/gpu/drm/tiny/st7789v.c | 269 +
 4 files changed, 292 insertions(+)
 create mode 100644 drivers/gpu/drm/tiny/st7789v.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d92f85ca831d..df25e8e0deb1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5769,6 +5769,14 @@ T:   git git://anongit.freedesktop.org/drm/drm-misc
 F: Documentation/devicetree/bindings/display/sitronix,st7735r.yaml
 F: drivers/gpu/drm/tiny/st7735r.c
 
+DRM DRIVER FOR SITRONIX ST7789V PANELS
+M: David Lechner 
+M: Xuezhi Zhang 
+S: Maintained
+T: git git://anongit.freedesktop.org/drm/drm-misc
+F: Documentation/devicetree/bindings/display/sitronix,st7789v-dbi.yaml
+F: drivers/gpu/drm/tiny/st7789v.c
+
 DRM DRIVER FOR SONY ACX424AKP PANELS
 M: Linus Walleij 
 S: Maintained
diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
index 2b6414f0fa75..ac2c7fb702f0 100644
--- a/drivers/gpu/drm/tiny/Kconfig
+++ b/drivers/gpu/drm/tiny/Kconfig
@@ -131,3 +131,17 @@ config TINYDRM_ST7735R
  * Okaya RH128128T 1.44" 128x128 TFT
 
  If M is selected the module will be called st7735r.
+
+config TINYDRM_ST7789V
+   tristate "DRM support for Sitronix ST7789V display panels"
+   depends on DRM && SPI
+   select DRM_KMS_HELPER
+   select DRM_KMS_CMA_HELPER
+   select DRM_MIPI_DBI
+   select BACKLIGHT_CLASS_DEVICE
+   help
+ DRM driver for Sitronix ST7789V with one of the following
+ LCDs:
+ * Waveshare 2inch lcd module 240x320 TFT
+
+ If M is selected the module will be called st7789v.
diff --git a/drivers/gpu/drm/tiny/Makefile b/drivers/gpu/drm/tiny/Makefile
index 6ae4e9e5a35f..aa0caa2b6c16 100644
--- a/drivers/gpu/drm/tiny/Makefile
+++ b/drivers/gpu/drm/tiny/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_TINYDRM_MI0283QT)+= mi0283qt.o
 obj-$(CONFIG_TINYDRM_REPAPER)  += repaper.o
 obj-$(CONFIG_TINYDRM_ST7586)   += st7586.o
 obj-$(CONFIG_TINYDRM_ST7735R)  += st7735r.o
+obj-$(CONFIG_TINYDRM_ST7789V)  += st7789v.o
diff --git a/drivers/gpu/drm/tiny/st7789v.c b/drivers/gpu/drm/tiny/st7789v.c
new file mode 100644
index ..9b4bb9edba40
--- /dev/null
+++ b/drivers/gpu/drm/tiny/st7789v.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * DRM driver for display panels connected to a Sitronix ST7789V
+ * display controller in SPI mode.
+ *
+ * Copyright 2017 David Lechner 
+ * Copyright (C) 2019 Glider bvba
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define ST7789V_PORCTRL     0xb2
+#define ST7789V_GCTRL       0xb7
+#define ST7789V_VCOMS       0xbb
+#define ST7789V_LCMCTRL     0xc0
+#define ST7789V_VDVVRHEN    0xc2
+#define ST7789V_VRHS        0xc3
+#define ST7789V_VDVS        0xc4
+#define ST7789V_FRCTRL2     0xc6
+#define ST7789V_PWCTRL1     0xd0
+#define ST7789V_PVGAMCTRL   0xe0
+#define ST7789V_NVGAMCTRL   0xe1
+
+#define ST7789V_MY          BIT(7)
+#define ST7789V_MX          BIT(6)
+#define ST7789V_MV          BIT(5)
+#define ST7789V_RGB         BIT(3)
+
+struct st7789v_cfg {
+   const struct drm_display_mode mode;
+   unsigned int left_offset;
+   unsigned int top_offset;
+   unsigned int write_only:1;
+   unsigned int rgb:1; /* RGB (vs. BGR) */
+};
+
+struct st7789v_priv {
+   struct mipi_dbi_dev dbidev; /* Must be first for .release() */
+   const struct st7789v_cfg *cfg;
+};
+
+static void st7789v_pipe_enable(struct drm_simple_display_pipe *pipe,
+   struct drm_crtc_state *crtc_state,
+   struct drm_plane_state *plane_state)
+{
+   struct mipi_dbi_dev *dbidev = drm_to_mipi_dbi_dev(pipe->crtc.dev);
+   struct st7789v_priv *priv = container_of(dbidev, struct st7789v_priv,
+dbidev);
+   struct mipi_dbi *dbi = &dbidev->dbi;
+   int ret, idx;
+   u8 addr_mode;
+
+   if (!drm_dev_enter(pipe->crtc.dev, &idx))
+   return;
+
+   DRM_DEBUG_KMS("\n");
+
+   ret = mipi_dbi_poweron_reset(dbidev);
+   if (ret)
+   goto out_exit;
+
+   msleep(150);
+
+   mipi_dbi_command(dbi, MIPI_DCS_EXIT_SLEEP_MODE);
+   msleep(100);
+
+
+   switch (dbidev->rotation) {
+   default:
+   addr_mode = 0;
+   break;
+   case 90:
+   addr_mode = ST7789V_MY | ST7789V_MV;
+   break;
+   case 180:
+   addr_mode = ST7789V_MX | ST7789V_MY;
+ 

Re: [PATCH v3 0/1] Allow drivers to modify dql.min_limit value

2021-03-22 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Sun, 21 Mar 2021 22:48:48 +0900 you wrote:
> Abstract: would like to directly set dql.min_limit value inside a
> driver to improve BQL performances of a CAN USB driver.
> 
> CAN packets have a small PDU: for classical CAN maximum size is
> roughly 16 bytes (8 for payload and 8 for arbitration, CRC and
> others).
> 
> [...]

Here is the summary with links:
  - [v3,1/1] netdev: add netdev_queue_set_dql_min_limit()
https://git.kernel.org/netdev/net-next/c/f57bac3c33e7

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [PATCH v3 0/1] correct the inside linear map range during hotplug check

2021-03-22 Thread Will Deacon
On Tue, 16 Feb 2021 10:03:50 -0500, Pavel Tatashin wrote:
> v3:   - Sync with linux-next where arch_get_mappable_range() was
> introduced.
> v2:   - Added test-by Tyler Hicks
>   - Addressed comments from Anshuman Khandual: moved check under
> IS_ENABLED(CONFIG_RANDOMIZE_BASE), added
> WARN_ON(start_linear_pa > end_linear_pa);
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: mm: correct the inside linear map range during hotplug check
  https://git.kernel.org/arm64/c/ee7febce0519

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


[PATCH v3 0/1] Allow drivers to modify dql.min_limit value

2021-03-21 Thread Vincent Mailhol
Abstract: I would like to directly set the dql.min_limit value inside a
driver to improve the BQL performance of a CAN USB driver.

CAN packets have a small PDU: for classical CAN maximum size is
roughly 16 bytes (8 for payload and 8 for arbitration, CRC and
others).

I am writing a CAN driver for a USB interface. To compensate for the
extra latency introduced by USB, I want to group several CAN
frames and do one USB bulk send. For this purpose, I implemented BQL in
my driver.

However, the BQL algorithms can take time to adjust, especially if
there are small bursts.

The best way I found is to directly modify dql.min_limit and set
it to some empirical value. This way, even during small burst events
I can have good throughput. Slightly increasing dql.min_limit
has no measurable impact on the latency as long as frames fit in the
same USB packet (i.e. BQL overhead is negligible compared to USB
overhead).

BQL was not designed for USB, nor was it designed for CAN's small
PDUs, which probably explains why I am the first one to ever have
thought of setting dql.min_limit within a driver.

The code I wrote looks like:

> #ifdef CONFIG_BQL
>   netdev_get_tx_queue(netdev, 0)->dql.min_limit = ;
> #endif

Using #ifdef to set up some variables is not a best practice. I am
sending this RFC to see if we can add a function to set
dql.min_limit in a cleaner way.
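
As a rough idea of what such a helper could look like, here is a
self-contained sketch. The function name matches the patch title below, but
the signature is an assumption, and struct dql and struct netdev_queue are
heavily reduced stand-ins for the kernel structures so that the example
compiles on its own.

```c
/* Pretend BQL is enabled for this sketch; drop the define for the no-op path. */
#define CONFIG_BQL 1

/* Minimal stand-ins for the kernel structures (hypothetical, reduced). */
struct dql {
	unsigned int min_limit;
};

struct netdev_queue {
#ifdef CONFIG_BQL
	struct dql dql;
#endif
	int unused;	/* keeps the struct non-empty without BQL */
};

/*
 * Setter that hides the #ifdef from drivers: callers compile unchanged
 * whether or not CONFIG_BQL is set.
 */
static inline void netdev_queue_set_dql_min_limit(struct netdev_queue *q,
						  unsigned int min_limit)
{
#ifdef CONFIG_BQL
	q->dql.min_limit = min_limit;
#else
	(void)q;
	(void)min_limit;
#endif
}
```

A driver would then call the helper once during setup with its own
empirical value, with no #ifdef at the call site.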

For your reference, this RFC is a follow-up to a discussion on the
linux-can mailing list:
https://lore.kernel.org/linux-can/20210309125708.ei75tr5vp2san...@pengutronix.de/

Thank you for your comments.

Yours sincerely,
Vincent

** Changelog **

RFC v2 -> v3
  - More verbose commit description.
  - Fix kernel documentation.

RFC v1 -> RFC v2
  - Fix incorrect #ifdef use.
Reference: 
https://lore.kernel.org/linux-can/20210309153547.q7zspf46k6ter...@pengutronix.de/

Link to RFC v1:
https://lore.kernel.org/linux-can/20210309152354.95309-1-mailhol.vinc...@wanadoo.fr/T/#t

Vincent Mailhol (1):
  netdev: add netdev_queue_set_dql_min_limit()

 include/linux/netdevice.h | 18 ++
 1 file changed, 18 insertions(+)

-- 
2.26.2



[PATCH v3 0/1] dump kmessage before machine_kexec

2021-03-19 Thread Pavel Tatashin
Changelog
v3
- Re-sending because it still has not landed in mainline.
- Sync with mainline
- Added Acked-by: Baoquan He
v2
- Added review-by's
- Sync with mainline

Allow studying the shutdown performance of kexec reboot calls by having
the kmsg log saved via pstore.

Previous submissions
v1 https://lore.kernel.org/lkml/20200605194642.62278-1-pasha.tatas...@soleen.com
v2 
https://lore.kernel.org/lkml/20210126204125.313820-1-pasha.tatas...@soleen.com

Pavel Tatashin (1):
  kexec: dump kmessage before machine_kexec

 kernel/kexec_core.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
2.25.1



[PATCH v3 0/1] GIC v4.1: Disable VSGI support for GIC CPUIF < v4.1

2021-03-17 Thread Lorenzo Pieralisi
This patchset is v3 of a previous version [1].

v2 -> v3:
- Coalesced all checks in one function (Marc's feedback)
- Allow sgi_ops on cpuif mismatch (to keep v4.1 doorbell
  mechanism that works fine even if GIC CPUIF < v4.1)

v1 -> v2:
- Fixed vGIC behaviour according to v1 [1] review
- Removed capability detection - rely on sanitised reg read
- Added vsgi specific flag (for gic and kvm)

[1] 
https://lore.kernel.org/linux-arm-kernel/20210302102744.12692-1-lorenzo.pieral...@arm.com

-- Original cover letter --

GIC v4.1 introduced changes to the GIC CPU interface. Systems that
integrate CPUs that do not support GIC v4.1 features (as reported in the
ID_AA64PFR0_EL1.GIC bitfield) together with a GIC v4.1 controller must
disable virtual SGI support in software, since the CPUIF and GIC controller
version mismatch results in CONSTRAINED UNPREDICTABLE behaviour at the
architectural level.

For systems with CPUs reporting ID_AA64PFR0_EL1.GIC == b0001 integrated
in a system with a GIC v4.1 it _should_ still be safe to enable vLPIs
(other than vSGI) since the protocol between the GIC redistributor and
the GIC CPUIF was not changed from GIC v4.0 to GIC v4.1.
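
The detection described above can be sketched as a plain C helper. The bit
position and encodings of ID_AA64PFR0_EL1.GIC (bits [27:24], 0b0001 for the
GICv3/v4.0 CPU interface, 0b0011 for GICv4.1) follow the Arm architecture
documentation; the function names and the gic_is_v4_1 parameter are made up
for illustration and are not the names used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64PFR0_EL1.GIC occupies bits [27:24]. */
#define PFR0_GIC_SHIFT	24
#define PFR0_GIC_MASK	0xfULL

/* Field encodings: 0b0001 = GICv3/v4.0 CPU interface, 0b0011 = GICv4.1. */
#define PFR0_GIC_V3_V4	0x1
#define PFR0_GIC_V4P1	0x3

/* Extract the GIC field from a raw ID_AA64PFR0_EL1 value. */
static unsigned int pfr0_gic_field(uint64_t pfr0)
{
	return (unsigned int)((pfr0 >> PFR0_GIC_SHIFT) & PFR0_GIC_MASK);
}

/*
 * vSGIs are only allowed when both the GIC controller and the CPU
 * interface are v4.1; any other combination runs without vSGI support.
 */
static bool vsgi_allowed(bool gic_is_v4_1, uint64_t pfr0)
{
	return gic_is_v4_1 && pfr0_gic_field(pfr0) >= PFR0_GIC_V4P1;
}
```

In this model a CPU reporting 0b0001 behind a v4.1 controller keeps
vSGIs disabled, while other v4.1 features are not affected by the check.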

Cc: Marc Zyngier 

Lorenzo Pieralisi (1):
  irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection

 arch/arm64/kvm/vgic/vgic-mmio-v3.c |  4 ++--
 drivers/irqchip/irq-gic-v4.c   | 27 +--
 include/linux/irqchip/arm-gic-v4.h |  2 ++
 3 files changed, 29 insertions(+), 4 deletions(-)

-- 
2.29.1



[PATCH v3 0/1] add ACPI binding to RX6110 driver

2021-03-16 Thread Claudius Heine
Hi,

It took some time, but we now have the official ACPI ID for the RX6110 RTC
driver from Seiko Epson.

regards,
Claudius

Johannes Hahn (1):
  rtc: rx6110: add ACPI bindings to I2C

 drivers/rtc/rtc-rx6110.c | 12 
 1 file changed, 12 insertions(+)

-- 
2.30.1



[PATCH v3 0/1] Unprivileged chroot

2021-03-11 Thread Mickaël Salaün
Hi,

This new patch replaces the path_is_under() check with
current_chrooted() as it is done with user namespaces.  Indeed, it is
much simpler to check the current root instead of limiting access to
a subset of files.

The chroot system call is currently limited to be used by processes with
the CAP_SYS_CHROOT capability.  This protects against malicious
processes willing to trick SUID-like binaries.  The following patch
allows unprivileged users to safely use chroot(2), which may be
complementary to the use of user namespaces.
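
The resulting permission decision can be modelled roughly as follows. This
sketch is based only on what this cover letter and the patch title state
(CAP_SYS_CHROOT, no_new_privs, current_chrooted()); the actual patch may
check further conditions, and struct task_view and may_chroot() are
hypothetical names.

```c
#include <stdbool.h>

/* Reduced, hypothetical view of the task state the check looks at. */
struct task_view {
	bool has_cap_sys_chroot;	/* task holds CAP_SYS_CHROOT */
	bool no_new_privs;		/* set via prctl(PR_SET_NO_NEW_PRIVS) */
	bool already_chrooted;		/* what current_chrooted() would report */
};

/*
 * Privileged tasks may always chroot. Unprivileged tasks may do so only
 * if they can no longer gain privileges and are not already chrooted.
 */
static bool may_chroot(const struct task_view *t)
{
	if (t->has_cap_sys_chroot)
		return true;
	return t->no_new_privs && !t->already_chrooted;
}
```

The no_new_privs requirement is what keeps SUID-like binaries from being
tricked: such a task cannot gain privileges through execve() afterwards.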

This patch is a follow-up of a previous one sent by Andy Lutomirski some
time ago:
https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.l...@amacapital.net/

This patch can be applied on top of v5.12-rc2.  I would really
appreciate constructive reviews.

Previous version:
https://lore.kernel.org/r/20210310181857.401675-1-...@digikod.net

Regards,

Mickaël Salaün (1):
  fs: Allow no_new_privs tasks to call chroot(2)

 fs/open.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)


base-commit: a38fd8748464831584a19438cbb3082b5a2dab15
-- 
2.30.2



[PATCH v3 0/1] Bluetooth: Suspend improvements

2021-03-03 Thread Abhishek Pandit-Subedi


Hi Marcel (and linux bluetooth),

Here are a few suspend improvements based on user reports we saw on
ChromeOS and feedback from Hans de Goede on the mailing list.

I have tested this using our ChromeOS suspend/resume automated tests
(full SRHealth test coverage and some suspend resume stress tests).

Thanks
Abhishek


Changes in v3:
* Minor change to if statement

Changes in v2:
* Removed hci_dev_lock from hci_cc_set_event_filter since flags are
  set/cleared atomically

Abhishek Pandit-Subedi (1):
  Bluetooth: Remove unneeded commands for suspend

 include/net/bluetooth/hci.h |  1 +
 net/bluetooth/hci_event.c   | 27 +++
 net/bluetooth/hci_request.c | 44 +++--
 3 files changed, 55 insertions(+), 17 deletions(-)

-- 
2.31.0.rc0.254.gbdcc3b1a9d-goog



[PATCH v3 0/1] s390/vfio-ap: fix circular lockdep when starting SE guest

2021-03-02 Thread Tony Krowiak
Commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated") introduced a change that results in a circular
lockdep when a Secure Execution guest that is configured with
crypto devices is started. The problem arose because the
patch moved the setting of the guest's AP masks under the protection of
the matrix_dev->lock when the vfio_ap driver is notified that the KVM
pointer has been set. Since it is not critical that setting/clearing of
the guest's AP masks be done under the matrix_dev->lock when the driver is
notified, the masks will no longer be updated under the matrix_dev->lock. The
lock is necessary for the setting/unsetting of the KVM pointer, however,
so that will remain in place.

The dependency chain for the circular lockdep resolved by this patch 
is (in reverse order):

2:  vfio_ap_mdev_group_notifier:    kvm->lock
                                    matrix_dev->lock

1:  handle_pqap:                    matrix_dev->lock
    kvm_vcpu_ioctl:                 vcpu->mutex

0:  kvm_s390_cpus_to_pv:            vcpu->mutex
    kvm_vm_ioctl:                   kvm->lock

Please note:
------------
* If checkpatch is run against this patch series, you may
  get a "WARNING: Unknown commit id 'f21916ec4826', maybe rebased or not 
  pulled?" message. The commit 'f21916ec4826', however, is definitely
  in the master branch on top of which this patch series was built, so I'm
  not sure why this message is being output by checkpatch.
* All acks granted from previous review of this patch have been removed due
  to the fact that this patch introduces non-trivial changes (see change
  log below).

Change log v2=> v3:
-------------------
* Added two fields - 'bool kvm_busy' and 'wait_queue_head_t wait_for_kvm' -
  to struct ap_matrix_mdev. The former indicates that the KVM
  pointer is in the process of being updated and the latter allows a
  function that needs access to the KVM pointer to wait until it is
  no longer being updated. This resolves the problem of synchronization
  between the functions that change the KVM pointer value and the functions
  that require access to it.

Change log v1=> v2:
-------------------
* No longer holding the matrix_dev->lock prior to setting/clearing the
  masks supplying the AP configuration to a KVM guest.
* Make all updates to the data in the matrix mdev that is used to manage
  AP resources used by the KVM guest in the vfio_ap_mdev_set_kvm() function
  instead of the group notifier callback.
* Check for the matrix mdev's KVM pointer in the vfio_ap_mdev_unset_kvm()
  function instead of the vfio_ap_mdev_release() function.

Tony Krowiak (1):
  s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

 drivers/s390/crypto/vfio_ap_ops.c | 312 ++
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 218 insertions(+), 96 deletions(-)

-- 
2.21.3



[PATCH v3 0/1] iio: adc: ad7124: allow more than 8 channels

2021-02-23 Thread alexandru.tachici
From: Alexandru Tachici 

Currently the AD7124-8 driver cannot use more than 8 IIO channels
because it assigns channel configurations one-to-one
to the channels specified in the device tree. This is not possible
when using more than 8 channels, as the AD7124-8 has only 8
configuration registers.

All configurations are marked as live if they are
programmed on the device. Any change that happens from
userspace (sampling rate, filters etc.) will invalidate
them.

To allow the user to use all channels at once the driver
will keep in memory configurations for all channels but
will program only 8 of them at a time on the device.

If multiple channels have the same configuration, only
one configuration register will be used.

If more configurations are needed than there are registers,
only the 8 most recently used configurations are kept
on the device, in LRU fashion (in the case of raw reads).

If a read is requested on a channel whose configuration
is not programmed:
- check if there are similar configurations already programmed
  - if yes: point channel to that config
  - if no: check if there are empty config slots
    - if yes: write config, push into queue of LRU configs
    - if no: pop one config, get its config slot number,
      write new config on the old slot, push new config
      into queue of LRU configs.
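
The slot selection steps above can be sketched as follows. This is only an
illustrative model, not the driver's code: the class name ConfigSlots, the
method slot_for() and the use of an ordered dictionary as the LRU queue are
all assumptions made for the example.

```python
from collections import OrderedDict


class ConfigSlots:
    """Hypothetical model of a fixed pool of device configuration registers."""

    def __init__(self, num_slots=8):
        self.num_slots = num_slots
        # Maps a programmed config -> slot number, ordered from least
        # recently used to most recently used.
        self.live = OrderedDict()

    def slot_for(self, config):
        """Return the slot holding `config`, programming it if needed."""
        if config in self.live:
            # A matching config is already programmed: share its slot.
            self.live.move_to_end(config)
            return self.live[config]
        if len(self.live) < self.num_slots:
            # An empty slot exists: take the lowest free slot number.
            used = set(self.live.values())
            slot = min(s for s in range(self.num_slots) if s not in used)
        else:
            # No empty slot: evict the least recently used config
            # and write the new config into its slot.
            _, slot = self.live.popitem(last=False)
        self.live[config] = slot
        return slot
```

With two slots, requesting configs "a", "b", "a", "c" in that order leaves
"a" and "c" programmed, because "b" was the least recently used entry when
"c" arrived.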

Alexandru Tachici (1):
  iio: adc: ad7124: allow more than 8 channels

 drivers/iio/adc/ad7124.c | 461 ++-
 1 file changed, 308 insertions(+), 153 deletions(-)

-- 
2.20.1



[PATCH v3 0/1] Automatic LSM stack ordering

2021-02-22 Thread Mickaël Salaün
Hi,

This patch series lets users avoid manually configuring the list of LSMs
enabled at boot and instead always rely on the up-to-date list of existing
LSMs.  Indeed, CONFIG_LSM may never be updated by a make oldconfig even
though users may select new LSMs over time.  With this patch, when running
make oldconfig, a new option CONFIG_LSM_AUTO is pre-selected to delegate
LSM ordering to the kernel developers, according to the user configuration.

This third series replaces the previous virtual dependencies with a new
option to automatically enable all selected LSMs.  This is cleaner,
simpler, and makes the transition more convenient.

This patch series can be applied on v5.11-7580-gea914b7ffbfd (or v5.11).
Previous version:
https://lore.kernel.org/r/20210215181511.2840674-1-...@digikod.net

Mickaël Salaün (1):
  security: Add CONFIG_LSM_AUTO to handle default LSM stack ordering

 security/Kconfig| 19 +++
 security/security.c | 26 +-
 2 files changed, 44 insertions(+), 1 deletion(-)


base-commit: 31caf8b2a847214be856f843e251fc2ed2cd1075
-- 
2.30.0



Re: [PATCH v3 0/1] phy: fsl-imx8-mipi-dphy: Hook into runtime pm

2021-02-19 Thread Liu Ying
Hi Guido,

On Fri, 2021-02-19 at 10:38 +0100, Guido Günther wrote:
> Hi,
> On Wed, Dec 16, 2020 at 07:22:32PM +0100, Guido Günther wrote:
> > This allows us to shut down the mipi power domain on the imx8. The alternative
> > would be to drop the dphy from the mipi power domain in the SoC's device tree
> > and only have the DSI host controller visible there but since the PD is mostly
> > about the PHY that would defeat its purpose.
> 
> Is there anything I can do to move that forward? I assume this needs to
> go via the phy/ subsystem not drm?

I cannot find patch 1/1 of v3 in my mailbox, so I'll provide comment on
v2.

Regards,
Liu Ying

> Cheers,
>  -- Guido
> 
> > This is basically a resend from February 2020 which went without feedback.
> > 
> > This allows to shut off the power domain when blanking the LCD panel:
> > 
> > pm_genpd_summary before:
> > 
> > domain                          status          slaves
> >     /device                                             runtime status
> > ----------------------------------------------------------------------
> > mipi                            on
> >     /devices/platform/soc@0/soc@0:bus@3080/30a00300.dphy  unsupported
> >     /devices/platform/soc@0/soc@0:bus@3080/30a0.mipi_dsi  suspended
> > 
> > after:
> > 
> > mipi                            off-0
> >     /devices/platform/soc@0/soc@0:bus@3080/30a00300.dphy  suspended
> >     /devices/platform/soc@0/soc@0:bus@3080/30a0.mipi_dsi  suspended
> > 
> > Changes from v1:
> >  - Tweak commit message slightly
> > 
> > Changes from v2:
> >   - As per review comment by Lucas Stach
> >     https://lore.kernel.org/linux-arm-kernel/ee22b072e0abe07559a3e6a63ccf6ece064a46cb.camel@pengutronix.de/
> >     Check for pm_runtime_get_sync failure
> > 
> > Guido Günther (1):
> >   phy: fsl-imx8-mipi-dphy: Hook into runtime pm
> > 
> >  .../phy/freescale/phy-fsl-imx8-mipi-dphy.c| 25 ++-
> >  1 file changed, 24 insertions(+), 1 deletion(-)
> > 
> > -- 
> > 2.29.2
> > 
> > 
> > ___
> > linux-arm-kernel mailing list
> > linux-arm-ker...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



Re: [PATCH v3 0/1] phy: fsl-imx8-mipi-dphy: Hook into runtime pm

2021-02-19 Thread Guido Günther
Hi,
On Wed, Dec 16, 2020 at 07:22:32PM +0100, Guido Günther wrote:
> This allows us to shut down the mipi power domain on the imx8. The alternative
> would be to drop the dphy from the mipi power domain in the SoC's device tree
> and only have the DSI host controller visible there but since the PD is mostly
> about the PHY that would defeat its purpose.

Is there anything I can do to move that forward? I assume this needs to
go via the phy/ subsystem not drm?
Cheers,
 -- Guido

> 
> This is basically a resend from February 2020 which went without feedback.
> 
> This allows to shut off the power domain when blanking the LCD panel:
> 
> pm_genpd_summary before:
> 
> domain                          status          slaves
>     /device                                             runtime status
> ----------------------------------------------------------------------
> mipi                            on
>     /devices/platform/soc@0/soc@0:bus@3080/30a00300.dphy  unsupported
>     /devices/platform/soc@0/soc@0:bus@3080/30a0.mipi_dsi  suspended
> 
> after:
> 
> mipi                            off-0
>     /devices/platform/soc@0/soc@0:bus@3080/30a00300.dphy  suspended
>     /devices/platform/soc@0/soc@0:bus@3080/30a0.mipi_dsi  suspended
> 
> Changes from v1:
>  - Tweak commit message slightly
> 
> Changes from v2:
>   - As per review comment by Lucas Stach
>     https://lore.kernel.org/linux-arm-kernel/ee22b072e0abe07559a3e6a63ccf6ece064a46cb.ca...@pengutronix.de/
>     Check for pm_runtime_get_sync failure
> 
> Guido Günther (1):
>   phy: fsl-imx8-mipi-dphy: Hook into runtime pm
> 
>  .../phy/freescale/phy-fsl-imx8-mipi-dphy.c| 25 ++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> -- 
> 2.29.2
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


[PATCH v3 0/1] Add FITRIM ioctl support for exFAT filesystem

2021-02-16 Thread Hyeongseok Kim
This is for adding FITRIM ioctl functionality to exFAT filesystem.
To do that, add generic ioctl function and FITRIM handler.

Changelog
=
v2->v3:
- Remove unnecessary local variable
- Merge all changes to a single patch

v1->v2:
- Change variable declaration order as reverse tree style.
- Return -EOPNOTSUPP from sb_issue_discard() just as it is.
- Remove cond_resched() in while loop.
- Move ioctl related code into its helper function.

Hyeongseok Kim (1):
  exfat: add support ioctl and FITRIM function

 fs/exfat/balloc.c   | 81 +
 fs/exfat/dir.c  |  5 +++
 fs/exfat/exfat_fs.h |  4 +++
 fs/exfat/file.c | 53 +
 4 files changed, 143 insertions(+)

-- 
2.27.0.83.g0313f36



[PATCH v3 0/1] correct the inside linear map range during hotplug check

2021-02-16 Thread Pavel Tatashin
v3: - Sync with linux-next where arch_get_mappable_range() was
      introduced.
v2: - Added Tested-by from Tyler Hicks
    - Addressed comments from Anshuman Khandual: moved check under
      IS_ENABLED(CONFIG_RANDOMIZE_BASE), added
      WARN_ON(start_linear_pa > end_linear_pa);

Fixes a hotplug error that may occur on systems with CONFIG_RANDOMIZE_BASE
enabled.

Applies against linux-next.

v1:
https://lore.kernel.org/lkml/20210213012316.1525419-1-pasha.tatas...@soleen.com
v2:
https://lore.kernel.org/lkml/20210215192237.362706-1-pasha.tatas...@soleen.com

Pavel Tatashin (1):
  arm64: mm: correct the inside linear map range during hotplug check

 arch/arm64/mm/mmu.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

-- 
2.25.1



[PATCH v3 0/1] AMD EPYC: fix schedutil perf regression (freq-invariance)

2021-02-03 Thread Giovanni Gherdovich
v2 at https://lore.kernel.org/lkml/20210122204038.3238-1-ggherdov...@suse.cz

Changes wrt v2:

- removed redundant "#ifdef CONFIG_ACPI_CPPC_LIB"

Giovanni Gherdovich (1):
  x86,sched: On AMD EPYC set freq_max = max_boost in schedutil invariant
formula

 drivers/cpufreq/acpi-cpufreq.c   | 61 ++--
 drivers/cpufreq/cpufreq.c|  3 ++
 include/linux/cpufreq.h  |  5 +++
 kernel/sched/cpufreq_schedutil.c |  8 +++--
 4 files changed, 73 insertions(+), 4 deletions(-)

-- 
2.26.2



[PATCH v3 0/1] scale loop device lock

2021-01-25 Thread Pavel Tatashin
Changelog
v3
  - Added review-by Tyler
  - Sync with mainline
v2
  - Addressed Tyler Hicks comments
  - added mutex_destroy()
  - comment in lo_open()
  - added lock around lo_disk
===

In our environment we use systemd portable containers in squashfs
format: we attach them to loop devices and mount them.

NAME                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop5                   7:5    0  76.4M  0 loop
`-BaseImageM1908      252:3    0  76.4M  1 crypt /BaseImageM1908
loop6                   7:6    0    20K  0 loop
`-test_launchperf20   252:17   0   1.3M  1 crypt /app/test_launchperf20
loop7                   7:7    0    20K  0 loop
`-test_launchperf18   252:4    0   1.5M  1 crypt /app/test_launchperf18
loop8                   7:8    0     8K  0 loop
`-test_launchperf8    252:25   0    28K  1 crypt app/test_launchperf8
loop9                   7:9    0   376K  0 loop
`-test_launchperf14   252:29   0  45.7M  1 crypt /app/test_launchperf14
loop10                  7:10   0    16K  0 loop
`-test_launchperf4    252:11   0   968K  1 crypt app/test_launchperf4
loop11                  7:11   0   1.2M  0 loop
`-test_launchperf17   252:26   0 150.4M  1 crypt /app/test_launchperf17
loop12                  7:12   0    36K  0 loop
`-test_launchperf19   252:13   0   3.3M  1 crypt /app/test_launchperf19
loop13                  7:13   0     8K  0 loop
...

We have over 50 loop devices which are mounted during boot.

We observed contentions around loop_ctl_mutex.

The sample contention stacks:

Contention 1:
__blkdev_get()
   bdev->bd_disk->fops->open()
      lo_open()
         mutex_lock_killable(&loop_ctl_mutex); <- contention

Contention 2:
__blkdev_put()
   disk->fops->release()
      lo_release()
         mutex_lock(&loop_ctl_mutex); <- contention

The total time spent waiting for loop_ctl_mutex during boot on our machine
(69 loop devices) is ~18.8s across 8 CPUs, i.e. 2.35s per CPU.

Scaling this lock eliminates this contention entirely, and improves the boot
performance by 2s on our machine.
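
The idea of replacing one global mutex with a lock per device can be
sketched as below. This is a simplified user-space model under stated
assumptions, not the driver's code: the LoopDevice class and its refcnt
field are hypothetical, and only the names lo_open() and lo_release() are
taken from the stacks above.

```python
import threading


class LoopDevice:
    """Hypothetical loop device carrying its own lock."""

    def __init__(self, number):
        self.number = number
        self.lock = threading.Lock()  # per-device lock instead of a global one
        self.refcnt = 0


def lo_open(dev):
    # Only openers of the same device serialize on this lock.
    with dev.lock:
        dev.refcnt += 1


def lo_release(dev):
    with dev.lock:
        dev.refcnt -= 1
```

In this model, holding the lock of one device does not delay lo_open() on
another device, which is the property that removes the boot-time contention
between unrelated loop devices.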

v2 https://lore.kernel.org/lkml/20200723211748.13139-1-pasha.tatas...@soleen.com
v1 
https://lore.kernel.org/lkml/20200717205322.127694-1-pasha.tatas...@soleen.com

Pavel Tatashin (1):
  loop: scale loop device by introducing per device lock

 drivers/block/loop.c | 92 +---
 drivers/block/loop.h |  1 +
 2 files changed, 54 insertions(+), 39 deletions(-)

-- 
2.25.1



[RFC PATCH v3 0/1] Adding support for IIO SCMI based sensors

2021-01-21 Thread Jyoti Bhayana
Hi,

This series adds support for ARM SCMI Protocol based IIO Device.
This driver provides support for Accelerometer and Gyroscope sensor
using new SCMI Sensor Protocol defined by the upcoming SCMIv3.0 ARM
specification, which is available at 

https://developer.arm.com/documentation/den0056/c/

This version of the patch series has been tested using 
version 5.4.21 branch of Android common kernel.

Any feedback welcome,

Thanks,

Jyoti Bhayana

v2 --> v3
- Incorporated the feedback comments from v2 review of the patch

v1 --> v2
- Incorporated the feedback comments from v1 review of the patch
- Regarding the new ABI for sensor_power,sensor_max_range,
and sensor_resolution, these are some of the sensor attributes
which Android passes to the apps. If there is any other way of getting
those values, please let us know

Jyoti Bhayana (1):
  iio/scmi: Adding support for IIO SCMI Based Sensors

 MAINTAINERS|   6 +
 drivers/iio/common/Kconfig |   1 +
 drivers/iio/common/Makefile|   1 +
 drivers/iio/common/scmi_sensors/Kconfig|  18 +
 drivers/iio/common/scmi_sensors/Makefile   |   5 +
 drivers/iio/common/scmi_sensors/scmi_iio.c | 736 +
 6 files changed, 767 insertions(+)
 create mode 100644 drivers/iio/common/scmi_sensors/Kconfig
 create mode 100644 drivers/iio/common/scmi_sensors/Makefile
 create mode 100644 drivers/iio/common/scmi_sensors/scmi_iio.c

-- 
2.30.0.280.ga3ce27912f-goog



[PATCH v3 0/1] arm64: PCI SMC config conduit

2021-01-20 Thread Jeremy Linton
This set provides a platform standardized way to access PCI
config space. It does that via an Arm specific interface
exported by the firmware. The Arm specification this is
based on can be found here:

The Arm PCI Configuration Space Access Firmware Interface
https://developer.arm.com/documentation/den0115/latest


v2->v3:
Convert from SMC only calls to arm_smccc_1_1_invoke() for better
  conformance with the specification.
v1->v2:
Add SMC_PCI_FEATURES calls to verify _READ, _WRITE and _SEG_INFO 
  functions exist.
Add a _SEG_INFO bus start, end validation against the ACPI table.
Adjust some function naming, and log messages.

Jeremy Linton (1):
  arm64: PCI: Enable SMC conduit

 arch/arm64/kernel/pci.c   | 111 ++
 include/linux/arm-smccc.h |  29 ++
 2 files changed, 140 insertions(+)

-- 
2.26.2



[PATCH v3 0/1] Add software TX timestamps to the CAN devices

2021-01-10 Thread Vincent Mailhol
With the ongoing work to add BQL to Socket CAN, I figured out that it
would be nice to have an easy way to measure the latency.

One easy way to do so is to check the round trip time of the
packet by taking the difference between the software rx timestamp and
the software tx timestamp.

rx timestamps are already available. This patch gives the missing
piece: add a tx software timestamp feature to the CAN devices.

Of course, the tx software timestamp might also be used for other
purposes such as performance measurements of the different queuing
disciplines (e.g. by checking the difference between the kernel tx
software timestamp and the userland tx software timestamp).
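
As a small worked example of the measurement described above, the round
trip time is just the rx software timestamp minus the tx software timestamp.
The function name and the nanosecond unit below are assumptions made for
illustration; how the timestamps are obtained from the socket is not shown.

```python
def round_trip_time_ns(tx_timestamp_ns, rx_timestamp_ns):
    """Return rx - tx, both given as software timestamps in nanoseconds."""
    if rx_timestamp_ns < tx_timestamp_ns:
        # An rx timestamp earlier than the tx one means the two values
        # do not belong to the same frame or the same clock.
        raise ValueError("rx timestamp precedes tx timestamp")
    return rx_timestamp_ns - tx_timestamp_ns
```

Averaging this value over many frames gives a rough latency figure for a
given queuing setup.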

v2 was a mistake, please ignore it (forgot to do git add, changes were
not reflected...)

v3 reflects the comments that Jeroen made in
https://lkml.org/lkml/2021/1/10/54

Vincent Mailhol (1):
  can: dev: add software tx timestamps

 drivers/net/can/dev.c | 1 +
 1 file changed, 1 insertion(+)

-- 
2.26.2


[PATCH v3 0/1] mfd: intel-m10-bmc: add sysfs files for mac_address

2021-01-05 Thread Russ Weight
Add two sysfs nodes to the Intel MAX10 BMC driver: mac_address
and mac_count. The mac_address provides the first of a series
of sequential MAC addresses assigned to the FPGA card. The
mac_count indicates how many MAC addresses are assigned to the
card.

Changelog v2 -> v3:
  - Updated Date and KernelVersion in ABI documentation

Changelog v1 -> v2:
  - Updated the documentation for the mac_address and mac_count
    sysfs nodes to clarify their usage.
  - Changed sysfs _show() functions to use sysfs_emit() instead
of sprintf.

Russ Weight (1):
  mfd: intel-m10-bmc: expose mac address and count

 .../ABI/testing/sysfs-driver-intel-m10-bmc| 21 +
 drivers/mfd/intel-m10-bmc.c   | 43 +++
 include/linux/mfd/intel-m10-bmc.h |  9 
 3 files changed, 73 insertions(+)

-- 
2.25.1



[RFC PATCH V3 0/1] block: fix I/O errors in BLKRRPART

2021-01-04 Thread Minwoo Im
Hello,

  This patch fixes I/O errors raised by the BLKRRPART ioctl() when it is
issued right after a format operation that changed the logical block size
of the block device, on the same file descriptor that was open before the
format.

Testcase:

  The following testcase is a case of NVMe namespace with the following
conditions:

  - Current LBA format is lbaf=0 (512 bytes logical block size)
  - LBA Format(lbaf=1) has 4096 bytes logical block size

  # Format block device logical block size 512B to 4096B
  nvme format /dev/nvme0n1 --lbaf=1 --force

  This causes I/O errors because the BLKRRPART ioctl() is issued right after
the format command on the same file descriptor, still open in the application
(e.g., nvme-cli), like:

  fd = open("/dev/nvme0n1", O_RDONLY);

  nvme_format(fd, ...);
  if (ioctl(fd, BLKRRPART) < 0)
...

Errors:

  We can see the Read command with Number of LBA(NLB) 0xffff(65535), which
underflowed because the BLKRRPART operation sized the request based
on i_blkbits of the block device, which is 9, via buffer_head.
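
  The underflow can be reproduced with plain arithmetic. The sketch below
assumes that the NLB field is a 16-bit, zero-based block count and that it
is derived as (request bytes >> LBA shift) - 1; the helper name nvme_nlb()
is hypothetical and only models that calculation.

```python
def nvme_nlb(request_bytes, lba_shift):
    """Zero-based block count as it would appear in a 16-bit NLB field."""
    return ((request_bytes >> lba_shift) - 1) & 0xFFFF


stale_blkbits = 9    # i_blkbits still describes 512-byte blocks
new_lba_shift = 12   # the namespace now uses 4096-byte blocks

# A single-block request sized from the stale i_blkbits is 512 bytes,
# which is zero blocks at the new block size, so "blocks - 1" wraps.
underflowed = nvme_nlb(1 << stale_blkbits, new_lba_shift)

# A request sized from the new block size gives the expected value 0.
expected = nvme_nlb(1 << new_lba_shift, new_lba_shift)
```

  Here underflowed is 65535, matching the len=65535 seen in the trace, while
expected is 0.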

  [dmesg-snip]  

[   10.771740] blk_update_request: operation not supported error, dev 
nvme0n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   10.780262] Buffer I/O error on dev nvme0n1, logical block 0, async page 
read

  [event-snip]  

 
kworker/0:1H-56  [000]    913.456922: nvme_setup_cmd: nvme0: 
disk=nvme0n1, qid=1, cmdid=216, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read 
slba=0, len=65535, ctrl=0x0, dsmgmt=0, reftag=0)
 ksoftirqd/0-9   [000] .Ns.   916.566351: nvme_complete_rq: nvme0: 
disk=nvme0n1, qid=1, cmdid=216, res=0x0, retries=0, flags=0x0, status=0x4002

  The patch below fixes the I/O errors by setting a flag on the gendisk so
that I/O requests from the block layer are rejected until the file descriptor
is re-opened and the block device is updated by __blkdev_get().  This is
based on the previous discussion [1].

Since V2:
  - Cover letter with testcase and error logs attached. Removed un-related
changes: empty line. (Chaitanya, [2])
  - Put blkdev with blkdev_put_no_open().

Since V1:
  - Updated patch to reject I/O rather than updating i_blkbits of the
block device's inode directly from driver. (Christoph, [1])

[1] 
https://lore.kernel.org/linux-nvme/20201223183143.GB13354@localhost.localdomain/T/#t
[2] 
https://lore.kernel.org/linux-nvme/20201230140504.GB7917@localhost.localdomain/T/#t

Thanks,

Minwoo Im (1):
  block: reject I/O for same fd if block size changed

 block/blk-settings.c|  8 
 block/partitions/core.c | 11 +++
 fs/block_dev.c  |  6 ++
 include/linux/genhd.h   |  1 +
 4 files changed, 26 insertions(+)

-- 
2.17.1



[PATCH v3 0/1] mm: memmap defer init dosn't work as expected

2020-12-23 Thread Baoquan He
Post the regression fix in a standalone patch, as Andrew suggested, to
make backporting to the -stable branch easier. This is rebased on the latest
master branch of the mainline kernel; there is almost no change
compared with v2.
https://lore.kernel.org/linux-mm/20201220082754.6900-1-...@redhat.com/

Tested on a system with 24G ram as below, adding 'memmap=128M!0x5'
to split the one ram region into two regions in numa node1 to simulate
the scenario of VMware.

[  +0.00] BIOS-provided physical RAM map:
[  +0.00] BIOS-e820: [mem 0x-0x0009bfff] usable
[  +0.00] BIOS-e820: [mem 0x0009c000-0x0009] reserved
[  +0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[  +0.00] BIOS-e820: [mem 0x0010-0x6cdcefff] usable
[  +0.00] BIOS-e820: [mem 0x6cdcf000-0x6efcefff] reserved
[  +0.00] BIOS-e820: [mem 0x6efcf000-0x6fdfefff] ACPI NVS
[  +0.00] BIOS-e820: [mem 0x6fdff000-0x6fffefff] ACPI data
[  +0.00] BIOS-e820: [mem 0x6000-0x6fff] usable
[  +0.00] BIOS-e820: [mem 0x7000-0x8fff] reserved
[  +0.00] BIOS-e820: [mem 0xe000-0x] reserved
[  +0.00] BIOS-e820: [mem 0x0001-0x00067f1f] usable
[  +0.00] BIOS-e820: [mem 0x00067f20-0x00067fff] reserved

Test passed as below. As you can see, with patch applied, memmap init
will cost much less time on numa node 1:

Without the patch:
[0.065029] Early memory node ranges
[0.065030]   node   0: [mem 0x1000-0x0009bfff]
[0.065032]   node   0: [mem 0x0010-0x6cdcefff]
[0.065034]   node   0: [mem 0x6000-0x6fff]
[0.065036]   node   0: [mem 0x0001-0x00027fff]
[0.065038]   node   1: [mem 0x00028000-0x0004]
[0.065040]   node   1: [mem 0x00050800-0x00067f1f]
[0.065185] Zeroed struct page in unavailable ranges: 16533 pages
[0.065187] Initmem setup node 0 [mem 0x1000-0x00027fff]
[0.069616] Initmem setup node 1 [mem 0x00028000-0x00067f1f]
[0.096298] ACPI: PM-Timer IO Port: 0x408

With the patch applied:
[0.065029] Early memory node ranges
[0.065030]   node   0: [mem 0x1000-0x0009bfff]
[0.065032]   node   0: [mem 0x0010-0x6cdcefff]
[0.065034]   node   0: [mem 0x6000-0x6fff]
[0.065036]   node   0: [mem 0x0001-0x00027fff]
[0.065038]   node   1: [mem 0x00028000-0x0004]
[0.065041]   node   1: [mem 0x00050800-0x00067f1f]
[0.065187] Zeroed struct page in unavailable ranges: 16533 pages
[0.065189] Initmem setup node 0 [mem 0x1000-0x00027fff]
[0.069572] Initmem setup node 1 [mem 0x00028000-0x00067f1f]
[0.070161] ACPI: PM-Timer IO Port: 0x408
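
The two logs can be compared with a little arithmetic. The sketch below
assumes that the gap between the "Initmem setup node 1" line and the next
log line (the ACPI PM-Timer message) approximates the time spent on memmap
init for node 1; the variable names are illustrative only.

```python
from decimal import Decimal


def gap_seconds(start, end):
    """Difference between two dmesg timestamps given as strings."""
    return Decimal(end) - Decimal(start)


# Timestamps copied from the logs above.
without_patch = gap_seconds("0.069616", "0.096298")
with_patch = gap_seconds("0.069572", "0.070161")
```

Under that assumption, without_patch is 0.026682 s and with_patch is
0.000589 s, so the node 1 step becomes roughly 45 times shorter.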


Baoquan He (1):
  mm: memmap defer init dosn't work as expected

 arch/ia64/mm/init.c | 4 ++--
 include/linux/mm.h  | 5 +++--
 mm/memory_hotplug.c | 2 +-
 mm/page_alloc.c | 8 +---
 4 files changed, 11 insertions(+), 8 deletions(-)

-- 
2.17.2



[PATCH v3 0/1] phy: fsl-imx8-mipi-dphy: Hook into runtime pm

2020-12-16 Thread Guido Günther
This allows us to shut down the mipi power domain on the imx8. The alternative
would be to drop the dphy from the mipi power domain in the SoC's device tree
and only have the DSI host controller visible there but since the PD is mostly
about the PHY that would defeat its purpose.

This is basically a resend from February 2020 which went without feedback.

This allows to shut off the power domain when blanking the LCD panel:

pm_genpd_summary before:

domain                          status          slaves
    /device                                             runtime status
----------------------------------------------------------------------
mipi                            on
    /devices/platform/soc@0/soc@0:bus@3080/30a00300.dphy  unsupported
    /devices/platform/soc@0/soc@0:bus@3080/30a0.mipi_dsi  suspended

after:

mipi                            off-0
    /devices/platform/soc@0/soc@0:bus@3080/30a00300.dphy  suspended
    /devices/platform/soc@0/soc@0:bus@3080/30a0.mipi_dsi  suspended

Changes from v1:
 - Tweak commit message slightly

Changes from v2:
  - As per review comment by Lucas Stach
    https://lore.kernel.org/linux-arm-kernel/ee22b072e0abe07559a3e6a63ccf6ece064a46cb.ca...@pengutronix.de/
    Check for pm_runtime_get_sync failure

Guido Günther (1):
  phy: fsl-imx8-mipi-dphy: Hook into runtime pm

 .../phy/freescale/phy-fsl-imx8-mipi-dphy.c| 25 ++-
 1 file changed, 24 insertions(+), 1 deletion(-)

-- 
2.29.2



[PATCH v3 0/1] net: Reduce rcu_barrier() contentions from 'unshare(CLONE_NEWNET)'

2020-12-11 Thread SeongJae Park
From: SeongJae Park 

On a few of our systems, I found frequent 'unshare(CLONE_NEWNET)' calls
make the number of active slab objects including 'sock_inode_cache' type
rapidly and continuously increase.  As a result, memory pressure occurs.

In more detail, I made an artificial reproducer that resembles the
workload in which we found the problem and reproduces the problem faster.  It
merely repeats 'unshare(CLONE_NEWNET)' 50,000 times in a loop.  It takes
about 2 minutes.  On a machine with 40 CPU cores and 70GB DRAM, the available
memory continuously decreased at a fast rate (about 120MB per second,
15GB in total within the 2 minutes).  Note that the issue doesn't
reproduce on every machine.  On my 6 CPU core machine, the problem
didn't reproduce.

'cleanup_net()' and 'fqdir_work_fn()' are functions that deallocate the
relevant memory objects.  They are asynchronously invoked by the work
queues and internally use 'rcu_barrier()' to ensure safe destructions.
'cleanup_net()' works in a batched manner in a single thread worker,
while 'fqdir_work_fn()' works for each 'fqdir_exit()' call in the
'system_wq'.

Therefore, 'fqdir_work_fn()' was called frequently under the workload and
made the contention for 'rcu_barrier()' high.  In more detail, the
global mutex, 'rcu_state.barrier_mutex', became the bottleneck.

I tried making 'rcu_barrier()' and the subsequent lightweight work in
'fqdir_work_fn()' be processed in batch by a dedicated single-thread worker,
and confirmed it works.  After the change, no continuous memory
reduction was observed, only some fluctuation; the available
memory reduction was only up to about 400MB.  The following patch is for
the change.  I think this is the right solution as a point fix for this
issue, but someone might blame different parts.
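
The batching idea can be sketched in user space as below. This is only a
model under stated assumptions: the class FqdirDestroyBatcher, its methods
and the barrier callback (a stand-in for 'rcu_barrier()') are hypothetical
names, and the real patch works on kernel work queues rather than Python
lists.

```python
class FqdirDestroyBatcher:
    """Hypothetical single worker that destroys queued objects in batches."""

    def __init__(self, barrier):
        self.barrier = barrier  # stand-in for the expensive barrier call
        self.pending = []
        self.destroyed = []

    def queue(self, fqdir):
        # Called once per exit request; cheap, no barrier here.
        self.pending.append(fqdir)

    def run_worker(self):
        # Take everything queued so far in one step.
        batch, self.pending = self.pending, []
        if not batch:
            return 0
        # One barrier covers the whole batch instead of one per object.
        self.barrier()
        self.destroyed.extend(batch)
        return len(batch)
```

If three objects are queued before the worker runs, the barrier is invoked
once rather than three times, which is what reduces the pressure on the
global mutex in this model.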

1. User: Frequent 'unshare()' calls
From some point of view, such frequent 'unshare()' calls might seem simply
insane.

2. Global mutex in 'rcu_barrier()'
Because of the global mutex, 'rcu_barrier()' callers could wait long
even after the callbacks started before the call finished.  Therefore,
similar issues could happen in other 'rcu_barrier()' usages.  Maybe we
can use some wait queue like mechanism to notify the waiters when the
desired time came.

I personally believe applying the point fix for now and improving
'rcu_barrier()' in the long term makes sense.  If I'm missing
something or you have a different opinion, please feel free to let me
know.


Patch History
-

Changes from v2
(https://lore.kernel.org/lkml/20201210080844.23741-1-sjp...@amazon.com/)
- Add numbers after the patch (Eric Dumazet)
- Make only 'rcu_barrier()' and subsequent lightweight works serialized
  (Eric Dumazet)

Changes from v1
(https://lore.kernel.org/netdev/20201208094529.23266-1-sjp...@amazon.com/)
- Keep xmas tree variable ordering (Jakub Kicinski)
- Add more numbers (Eric Dumazet)
- Use 'llist_for_each_entry_safe()' (Eric Dumazet)


SeongJae Park (1):
  net/ipv4/inet_fragment: Batch fqdir destroy works

 include/net/inet_frag.h  |  1 +
 net/ipv4/inet_fragment.c | 45 +---
 2 files changed, 39 insertions(+), 7 deletions(-)

-- 
2.17.1



[PATCH v3 0/1] Fix object remain in offline per-cpu quarantine

2020-12-01 Thread Kuan-Ying Lee
This patch fixes objects remaining in the offline per-cpu quarantine, as
described below.

Freed objects are put into the per-cpu quarantine if generic KASAN is enabled.
If a cpu is offline and users call kmem_cache_destroy, the kernel will detect
objects that still remain in the offline per-cpu quarantine and report an error.

Register a cpu hotplug function to remove all objects in the offline
per-cpu quarantine when the cpu is going offline. Set a per-cpu variable
to indicate this cpu is offline.
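
The behaviour described above can be modelled roughly as follows. This is an
illustrative sketch, not the KASAN code: the class PerCpuQuarantine and its
method names are hypothetical, and returning False from put() stands in for
"do not quarantine on an offline cpu".

```python
class PerCpuQuarantine:
    """Hypothetical model of per-cpu quarantine queues with an offline flag."""

    def __init__(self, num_cpus):
        self.queues = {cpu: [] for cpu in range(num_cpus)}
        self.offline = {cpu: False for cpu in range(num_cpus)}

    def put(self, cpu, obj):
        """Quarantine a freed object; return False if the cpu is offline."""
        if self.offline[cpu]:
            return False
        self.queues[cpu].append(obj)
        return True

    def cpu_going_offline(self, cpu):
        """Hotplug callback: mark the cpu offline and drain its queue."""
        self.offline[cpu] = True
        drained, self.queues[cpu] = self.queues[cpu], []
        return drained

    def cpu_online(self, cpu):
        self.offline[cpu] = False
```

After cpu_going_offline() runs for a cpu, its queue is empty, so a later
cache destroy would find no leftover objects there in this model.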

Changes since v3:
 - Add a barrier to ensure the ordering
 - Rename the init function

Changes since v2:
 - Thanks to Dmitry for the suggestions
 - Remove unnecessary code
 - Put offline variable into cpu_quarantine
 - Use single qlist_free_all call instead of iteration over all slabs
 - Add bug reporter in commit message

Kuan-Ying Lee (1):
  kasan: fix object remain in offline per-cpu quarantine

 mm/kasan/quarantine.c | 40 
 1 file changed, 40 insertions(+)

-- 
2.18.0



[PATCH V3 0/1] Add QPIC NAND support for IPQ6018

2020-11-30 Thread Kathiravan T
IPQ6018 has the QPIC NAND controller of version 1.5.0, which
uses the BAM DMA. Add support for the QPIC BAM, QPIC NAND and
enable the same in the board DTS file.

[V3]:
- Rebased on v5.10-rc6
- Renamed the qpic bam dma node name from 'dma' to 'dma-controller'
- Update the device register space to 64bit format

The last two points above are based on the latest changes in the QCOM
tree.

[V2]:
- Rebased on v5.10-rc2
- Replaced "ok" with "okay" for status property
- Dropped the MTD and dt-bindings patch as they are already picked in 
MTD tree

Kathiravan T (1):
  arm64: dts: ipq6018: Add the QPIC peripheral nodes

 arch/arm64/boot/dts/qcom/ipq6018-cp01-c1.dts | 16 
 arch/arm64/boot/dts/qcom/ipq6018.dtsi| 41 
 2 files changed, 57 insertions(+)


base-commit: b65054597872ce3aefbc6a666385eabdf9e288da
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3 0/1] Add macro definition for the upcoming new OST driver.

2020-10-26 Thread Zhou Yanjie
Add new macro definitions to "ingenic,sysost.h" and exchange the original
ABI values of OST_CLK_PERCPU_TIMER and OST_CLK_GLOBAL_TIMER, to prepare
for the upcoming new OST driver.

I'm sure that exchanging the ABI values of OST_CLK_PERCPU_TIMER and
OST_CLK_GLOBAL_TIMER will not affect the existing related drivers or
the SoCs which use these drivers, so we should be able to exchange
them safely.

v1->v2:
Rewrite the commit message so that each line is less than 80 characters.

v2->v3:
Add the description of why the exchange of ABI values will not affect
the existing driver into the commit message.

周琰杰 (Zhou Yanjie) (1):
  dt-bindings: timer: Add new OST support for the upcoming new driver.

 include/dt-bindings/clock/ingenic,sysost.h | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

-- 
2.11.0



[PATCH v3 0/1] ARM: dts: sun8i: add FriendlyArm ZeroPi support

2020-10-26 Thread Yu-Tung Chang
This patch adds FriendlyArm ZeroPi support.

Wiki:
http://wiki.friendlyarm.com/wiki/index.php/ZeroPi

Schematic:
http://wiki.friendlyarm.com/wiki/images/7/71/ZeroPi_20190731_Schematic.pdf

v1:
- Remove the extra spaces in description text.

v2:
- Remove the ehci0 and ohci0 device nodes.
- Remove the usbphy->usb0_id_det-gpios property.

v3:
- Enable RGMII RX/TX delay on PHY.

Yu-Tung Chang (1):
  ARM: dts: sun8i: add FriendlyArm ZeroPi support

 .../devicetree/bindings/arm/sunxi.yaml|  5 ++
 arch/arm/boot/dts/Makefile|  1 +
 arch/arm/boot/dts/sun8i-h3-zeropi.dts | 87 +++
 3 files changed, 93 insertions(+)
 create mode 100644 arch/arm/boot/dts/sun8i-h3-zeropi.dts

-- 
2.29.0



[PATCH v3 0/1] fix i2c polling mode workaround for FU540-C000 SoC

2020-10-15 Thread Sagar Shrikant Kadam
The polling mode workaround for the FU540-C000 on the HiFive Unleashed A00
board was added earlier. Its logic seems to work only when the interrupt
property is missing from (not added to) the i2c0 device node.

Here we address this issue by identifying the SoC based on its compatible
string and setting the master xfers to polling mode if it is the FU540-C000
SoC.

The fix has been tested on Linux 5.9.0-rc8 with a PMOD based RTCC sensor
connected to the I2C pins on the J1 header of the board. Log for reference:

# uname -a
Linux buildroot 5.9.0-rc8-1-g9da7791 #1 SMP Fri Oct 9 07:56:13 PDT 2020 
riscv64 GNU/Linux
# i2cdetect -y 0
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- 57 -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 6f
70: -- -- -- -- -- -- -- --
# i2cget 0 0x57 0 b -y
0xa5
# i2cset 0 0x57 0 0x9f b -y
# i2cget 0 0x57 0 b -y
0x9f
# i2cget 0 0x57 1 b -y
0xff
# i2cset 0 0x57 1 0xa9 b -y
# i2cget 0 0x57 1 b -y
0xa9
# i2cget 0 0x6f 0x20 b -y
0x98
# i2cset 0 0x6f 0x20 0xa5 b -y
# i2cget 0 0x6f 0x20 b -y
0xa5
# i2cget 0 0x6f 0x5f b -y
0x55
# i2cset 0 0x6f 0x5f 0x5a b -y
# i2cget 0 0x6f 0x5f b -y
0x5a
#

Without the fix here, it's observed that "i2cdetect -y 0"
turns the system unresponsive, with CPU stall messages.

Patch History:
===
V3:
-Rectified typo as suggested here:
 https://lkml.org/lkml/2020/10/9/902

V2: 
-Incorporated changes as suggested by Peter Korsgaard
 https://lkml.org/lkml/2020/10/8/663

V1: Base version



Sagar Shrikant Kadam (1):
  i2c: ocores: fix polling mode workaround on FU540-C000 SoC

 drivers/i2c/busses/i2c-ocores.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

-- 
2.7.4



[RESEND PATCH v3 0/1] PCI/ERR: fix regression introduced by 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")

2020-10-10 Thread Hedi Berriche
This is a resend of v3 as the original, sent over 6 hours ago, is yet
to make it to LKML.

- Changes since v2:

 * set status to PCI_ERS_RESULT_RECOVERED, in case of successful link
   reset, if and only if the initial value of error status is
   PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER.

- Changes since v1:

 * changed the commit message to clarify what broke post commit 6d2c89441571
 * dropped the misnomer post_reset_status variable in favour of a more natural
   approach that relies on a boolean to keep track of the outcome of
   reset_link()

After commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
pcie_do_recovery() no longer calls ->slot_reset() in the case of a successful
reset which breaks error recovery by breaking driver (re)initialisation.
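
The status rule from the v2 change note can be sketched as a small decision
function. This is a model under stated assumptions, not the code in
drivers/pci/pcie/err.c: the enum below only mirrors the names of a few
PCI_ERS_RESULT_* values (its numeric values are arbitrary) and the function
name is hypothetical.

```python
from enum import Enum, auto


class PciErsResult(Enum):
    # Names mirror PCI_ERS_RESULT_*; the values are arbitrary here.
    NEED_RESET = auto()
    DISCONNECT = auto()
    RECOVERED = auto()
    NO_AER_DRIVER = auto()


def status_after_successful_reset(initial):
    """Status to carry forward once the link reset has succeeded."""
    if initial in (PciErsResult.DISCONNECT, PciErsResult.NO_AER_DRIVER):
        # Only these two initial values are upgraded to RECOVERED.
        return PciErsResult.RECOVERED
    # Any other status is left untouched rather than clobbered.
    return initial
```

In this model a status such as NEED_RESET survives a successful link reset
unchanged, so later recovery steps that depend on it are not skipped.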

Cc: Russ Anderson 
Cc: Kuppuswamy Sathyanarayanan 
Cc: Bjorn Helgaas 
Cc: Ashok Raj 
Cc: Joerg Roedel 

Cc: sta...@kernel.org # v5.7+

---
Hedi Berriche (1):
  PCI/ERR: don't clobber status after reset_link()

 drivers/pci/pcie/err.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.28.0



[PATCH v3 0/1] PCI/ERR: fix regression introduced by 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")

2020-10-10 Thread Hedi Berriche
- Changes since v2:

 * set status to PCI_ERS_RESULT_RECOVERED, in case of successful link
   reset, if and only if the initial value of error status is
   PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER.

- Changes since v1:

 * changed the commit message to clarify what broke post commit 6d2c89441571
 * dropped the misnomer post_reset_status variable in favour of a more natural
   approach that relies on a boolean to keep track of the outcome of
   reset_link()

After commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
pcie_do_recovery() no longer calls ->slot_reset() in the case of a successful
reset which breaks error recovery by breaking driver (re)initialisation.

Cc: Russ Anderson 
Cc: Kuppuswamy Sathyanarayanan 
Cc: Bjorn Helgaas 
Cc: Ashok Raj 
Cc: Joerg Roedel 

Cc: sta...@kernel.org # v5.7+

---
Hedi Berriche (1):
  PCI/ERR: don't clobber status after reset_link()

 drivers/pci/pcie/err.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.28.0



[PATCH v3 0/1] 8bpp support for Ingenic-drm

2020-09-27 Thread Paul Cercueil
Final (?) version of my "small improvements to ingenic-drm" patchset.

Most of the patches of V2 have been merged to drm-misc-next, except this
one which required some more work.

In the CRTC's .atomic_check callback, the size of the gamma LUT property
is now checked, so that only a complete 256-entry palette is accepted.
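
As a rough illustration of that check, here is a Python model of the size
test. It is a sketch only: check_gamma_lut() is a hypothetical name, and the
choice to accept a missing LUT is an assumption made for this example, not
something stated in the cover letter.

```python
import errno

PALETTE_ENTRIES = 256  # a complete 8bpp palette, as described above


def check_gamma_lut(lut):
    """Return 0 if the LUT is acceptable, or a negative errno otherwise."""
    # Assumption for this sketch: no LUT at all means nothing to validate.
    if lut is None:
        return 0
    # Reject partial or oversized palettes.
    if len(lut) != PALETTE_ENTRIES:
        return -errno.EINVAL
    return 0
```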

Cheers,
-Paul

Paul Cercueil (1):
  drm/ingenic: Add support for paletted 8bpp

 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 66 +--
 1 file changed, 62 insertions(+), 4 deletions(-)

-- 
2.28.0



[PATCH v3 0/1] convert l2 cache dt bindings to YAML format

2020-09-22 Thread Sagar Kadam
This patch is created and tested on top of mainline linux 
commit 856deb866d16 ("Linux 5.9-rc5")

Reference log of "make dt_binding_check" is available here[1].

In case it is needed, the log of dt_binding_check without this patch
is available here [2].

[1] https://paste.ubuntu.com/p/d2bXwvpFz9/
[2] https://paste.ubuntu.com/p/X2TzBbCs3k/

Change History:

v3:
-Incorporated changes as suggested by Rob Herring here[3]
 [3] https://lkml.org/lkml/2020/9/15/670
-Rebased patch on 5.9-rc5

V2:
-Fixed bot failure mentioned by Rob Herring
-Updated dt-schema and kernel as suggested

V1:
Base version

Sagar Kadam (1):
  dt-bindings: riscv: sifive-l2-cache: convert bindings to json-schema

 .../devicetree/bindings/riscv/sifive-l2-cache.txt  | 51 
 .../devicetree/bindings/riscv/sifive-l2-cache.yaml | 90 ++
 2 files changed, 90 insertions(+), 51 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
 create mode 100644 Documentation/devicetree/bindings/riscv/sifive-l2-cache.yaml

-- 
2.7.4



Re: [PATCH v3 0/1] drm/bridge: ps8640: Make sure all needed is powered to get the EDID

2020-09-15 Thread Enric Balletbo i Serra
Hi Sam,

On 27/8/20 10:59, Enric Balletbo i Serra wrote:
> The first 4 patches of the series version 2:
>   - drm/bridge_connector: Set default status connected for eDP connectors
>   - drm/bridge: ps8640: Get the EDID from eDP control
>   - drm/bridge: ps8640: Return an error for incorrect attach flags
>   - drm/bridge: ps8640: Print an error if VDO control fails
> 
> Are already applied to drm-misc-next, so I removed them from this series. The
> pending patch is part of the original series and is a rework of the power
> handling to get the EDID. Basically, we need to make sure everything
> needed is powered to be able to get the EDID. Before, we saw that getting
> the EDID failed as explained in the third patch.
> 
> [1] https://lkml.org/lkml/2020/6/15/1208
> 
> Changes in v3:
> - Make poweron/poweroff and pre_enable/post_disable the reverse of each other
>   (Sam Ravnborg)
> 
> Changes in v2:
> - Use drm_bridge_chain_pre_enable/post_disable() helpers (Sam Ravnborg)
> 
> Enric Balletbo i Serra (1):
>   drm/bridge: ps8640: Rework power state handling
> 
>  drivers/gpu/drm/bridge/parade-ps8640.c | 68 ++
>  1 file changed, 58 insertions(+), 10 deletions(-)
> 

A gentle ping on this patch. It would be nice to land this together with the
already accepted patches.

Thanks,
  Enric


Re: [PATCH v3 0/1] drm/bridge: ps8640: Make sure all needed is powered to get the EDID

2020-09-15 Thread Neil Armstrong
Hi,

On 15/09/2020 14:40, Enric Balletbo i Serra wrote:
> Hi Sam,
> 
> On 27/8/20 10:59, Enric Balletbo i Serra wrote:
>> The first 4 patches of the series version 2:
>>   - drm/bridge_connector: Set default status connected for eDP connectors
>>   - drm/bridge: ps8640: Get the EDID from eDP control
>>   - drm/bridge: ps8640: Return an error for incorrect attach flags
>>   - drm/bridge: ps8640: Print an error if VDO control fails
>>
>> Are already applied to drm-misc-next, so I removed them from this series. The
>> pending patch is part of the original series and is a rework of the power
>> handling to get the EDID. Basically, we need to make sure everything
>> needed is powered to be able to get the EDID. Before, we saw that getting
>> the EDID failed as explained in the third patch.
>>
>> [1] https://lkml.org/lkml/2020/6/15/1208
>>
>> Changes in v3:
>> - Make poweron/poweroff and pre_enable/post_disable the reverse of each other
>>   (Sam Ravnborg)
>>
>> Changes in v2:
>> - Use drm_bridge_chain_pre_enable/post_disable() helpers (Sam Ravnborg)
>>
>> Enric Balletbo i Serra (1):
>>   drm/bridge: ps8640: Rework power state handling
>>
>>  drivers/gpu/drm/bridge/parade-ps8640.c | 68 ++
>>  1 file changed, 58 insertions(+), 10 deletions(-)
>>
> 
> A gentle ping on this patch. It would be nice to land this together with the
> already accepted patches.

Applying it to drm-misc-next

Thanks,
Neil

> 
> Thanks,
>   Enric
> 



[RFC/RFT PATCH v3 0/1] arc: add sparsemem support

2020-08-31 Thread Mike Rapoport
From: Mike Rapoport 

Hi,

This is yet another attempt to enable SPARSEMEM on ARC.

I've boot tested it on nSIM with haps_hs_defconfig with highmem and
sparsemem enabled.

With sparsemem the kernel text becomes a bit smaller, but bss and data are
slightly increased:

$ size discontig/vmlinux sparse/vmlinux
   text    data     bss     dec     hex filename
4429390  785456  244580 5459426  534de2 discontig/vmlinux
4415099  786224  244844 5446167  531a17 sparse/vmlinux

I've also added dummy global functions to wrap pfn_valid(), page_to_pfn()
and pfn_to_page(). Judging by objdump, sparsemem is a bit more efficient:

DISCONTIGMEM                    SPARSEMEM
:
seths   r2,0x3,r0               lsr     r2,r0,0xe
mpy     r2,r2,1896              mpy     r0,r0,0x24
add     r3,r2,0x8050066c        add3    r2,0x80529d1c,r2
add_s   r2,r2,0x80500668        ld_s    r2,[r2,0]
ld_s    r3,[r3,0]               bmskn   r2,r2,0x3
sub_s   r0,r0,r3                j_s.d   [blink]
ld_s    r2,[r2,0]               add_s   r0,r0,r2
mpy     r0,r0,0x24              nop_s
j_s.d   [blink]
add_s   r0,r0,r2

:
ld_s    r2,[r0,0]               ld_s    r2,[r0,0]
lsr_s   r2,r2,0x1f              lsr_s   r2,r2,0x1b
mpy     r2,r2,1896              add3    r2,0x80529d1c,r2
add     r3,r2,0x80500668        ld_s    r2,[r2,0]
add_s   r2,r2,0x8050066c        bmskn   r2,r2,0x3
ld_s    r3,[r3,0]               sub_s   r0,r0,r2
sub_s   r0,r0,r3                asr_s   r0,r0,0x2
ld_s    r2,[r2,0]               mpy     r0,r0,0x38e38e39
asr_s   r0,r0,0x2               j_s     [blink]
mpy     r0,r0,0x38e38e39
j_s.d   [blink]
add_s   r0,r0,r2
nop_s

:
cmp_s   r0,0x3                  lsr_s   r0,r0,0xe
mov_s   r2,0                    brhs.nt r0,0x20,24
mov.ls  r2,0x768                add3    r0,0x80529d1c,r0
add_s   r2,r2,0x80500814        breq_s  r0,0,12
ld.as   r3,[r2,-106]            ld_s    r0,[r0,0]
ld.as   r2,[r2,-104]            j_s.d   [blink]
add_s   r2,r2,r3                xbfu    r0,r0,0x1
j_s.d   [blink]                 j_s.d   [blink]
seths   r0,r2,r0                mov_s   r0,0
nop_s

Still, SPARSEMEM has an issue with potentially wasted memory allocated for
the memory map. The memory maps are allocated for each present section,
which means that if part of the section is not populated we'll have a bunch
of unused 'struct page' objects. The smaller the section size, the smaller
the memory overhead, but the section size cannot be much smaller than the
physical address because

MAX_PHYSMEM_BITS - SECTION_SIZE_BITS

has to fit into page flags and the room there is limited.
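
The trade-off can be made concrete with a little arithmetic. The Python sketch
below is only an illustration: sparsemem_tradeoff() is a hypothetical helper,
and the values used in the example afterwards (32 physical address bits, 8K
pages, a 36-byte struct page as the multiply by 0x24 in the listing above
appears to suggest) are assumptions, not figures taken from the patch.

```python
def sparsemem_tradeoff(max_physmem_bits, section_size_bits,
                       page_shift, struct_page_size):
    """Return the page-flags cost and memmap granularity of a section size."""
    # Bits needed in page flags to store a section number.
    section_bits = max_physmem_bits - section_size_bits
    pages_per_section = 1 << (section_size_bits - page_shift)
    return {
        "section_bits": section_bits,
        "nr_sections": 1 << section_bits,
        # A partially populated section still gets a memmap of this size.
        "memmap_bytes_per_section": pages_per_section * struct_page_size,
    }
```

With these assumed values, sparsemem_tradeoff(32, 28, 13, 36) needs 4 bits in
page flags and allocates 1179648 bytes of memmap per present section, while
shrinking the section to 24 bits cuts the memmap to 73728 bytes per section
at the cost of 8 bits in page flags.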

There is yet another possibility to support separate banks. It is possible
to use FLATMEM and free the memmap allocated for the hole, like, for
instance, ARM does [1]. This will require ARC's override for pfn_valid()
that takes into account the actual memory configuration rather than relying
on the memmap.

[1] https://elixir.bootlin.com/linux/latest/source/arch/arm/mm/init.c#L305

Mike Rapoport (1):
  arc: add sparsemem support

 arch/arc/Kconfig | 10 ++
 arch/arc/include/asm/sparsemem.h | 13 +
 arch/arc/mm/init.c   |  6 +-
 3 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 arch/arc/include/asm/sparsemem.h

-- 
2.26.2



[PATCH v3 0/1] drm/bridge: ps8640: Make sure all needed is powered to get the EDID

2020-08-27 Thread Enric Balletbo i Serra
The first 4 patches of the series version 2:
  - drm/bridge_connector: Set default status connected for eDP connectors
  - drm/bridge: ps8640: Get the EDID from eDP control
  - drm/bridge: ps8640: Return an error for incorrect attach flags
  - drm/bridge: ps8640: Print an error if VDO control fails

Are already applied to drm-misc-next, so I removed them from this series. The
pending patch is part of the original series and is a rework of the power
handling to get the EDID. Basically, we need to make sure everything
needed is powered to be able to get the EDID. Before, we saw that getting
the EDID failed as explained in the third patch.

[1] https://lkml.org/lkml/2020/6/15/1208

Changes in v3:
- Make poweron/poweroff and pre_enable/post_disable the reverse of each other
  (Sam Ravnborg)

Changes in v2:
- Use drm_bridge_chain_pre_enable/post_disable() helpers (Sam Ravnborg)

Enric Balletbo i Serra (1):
  drm/bridge: ps8640: Rework power state handling

 drivers/gpu/drm/bridge/parade-ps8640.c | 68 ++
 1 file changed, 58 insertions(+), 10 deletions(-)

-- 
2.28.0



[PATCH v3 0/1]extcon: ptn5150: Add usb-typec support for Intel LGM SoC

2020-08-27 Thread Ramuthevar,Vadivel MuruganX
Add usb-typec detection support for the Intel LGM SoC based boards.

The original driver does not support USB detection on Intel LGM SoC based
boards. We debugged and fixed the issue, but before we sent our patches
Mr. Krzyszto sent similar patches, so I have rebased on top of his latest
patches, which are present in the maintainer tree.

Built and tested; it works fine. The new patch was created on top of that.

Thanks to Chanwoo Choi for the review comments and suggestions
---
v3:
  - Chanwoo Choi review comments update
  - replace 'capability' with 'state' in commit message
  - add blank line
v2:
  - Krzyszto review comments update
  - squash my previous patches 1 to 5 as single patch
  - add extcon_set_property_capability for EXTCON_USB and
    EXTCON_PROP_USB_TYPEC_POLARITY


Ramuthevar Vadivel Murugan (1):
  extcon: ptn5150: Set the VBUS and POLARITY property capability

 drivers/extcon/extcon-ptn5150.c | 7 +++
 1 file changed, 7 insertions(+)

-- 
2.11.0



[PATCH v3 0/1] netfilter: nat: add a range check for l3/l4 protonum

2020-08-24 Thread Will McVicker
Hi Pablo,

> This patch is much smaller and if you confirm this addresses the
> issue, then this is awesome.

Yes, I can confirm the updated patch does fix the kernel panic. I have retested
on the Pixel 4 XL with version 4.14.180. Please see the updated patchset v3.

Thanks,
Will


Will McVicker (1):
  netfilter: nat: add a range check for l3/l4 protonum

 net/netfilter/nf_conntrack_netlink.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
2.28.0.297.g1956fa8f8d-goog



Re: [PATCH V3 0/1] irqchip: intmux: implement intmux PM

2020-07-27 Thread Marc Zyngier
On Mon, 27 Jul 2020 22:17:33 +0800, Joakim Zhang wrote:
> This patch intends to implement intmux PM.
> 
> ChangeLogs:
> V2->V3:
>   1. allocate a u32 saved_reg per channel.
> 
> V1->V2:
>   1. add more detailed commit message.
>   2. use u32 for 32bit HW registers.
>   3. fix kbuild failures.
>   4. move trivial functions into their respective callers.
>   5. squash two patches together.
> 
> [...]

Applied to irq/irqchip-next, thanks!

[1/1] irqchip/imx-intmux: Implement intmux runtime power management
  commit: bb403111e017a327737242eca40311921f833627

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




[PATCH V3 0/1] irqchip: intmux: implement intmux PM

2020-07-27 Thread Joakim Zhang
This patch intends to implement intmux PM.

ChangeLogs:
V2->V3:
1. allocate a u32 saved_reg per channel.

V1->V2:
1. add more detailed commit message.
2. use u32 for 32bit HW registers.
3. fix kbuild failures.
4. move trivial functions into their respective callers.
5. squash two patches together.

Joakim Zhang (1):
  irqchip: imx-intmux: implement intmux PM

 drivers/irqchip/irq-imx-intmux.c | 67 +++-
 1 file changed, 65 insertions(+), 2 deletions(-)

-- 
2.17.1



Re: [PATCH v3 0/1] ASoC: fsl_asrc: always select different clocks

2020-07-17 Thread Mark Brown
On Fri, Jul 17, 2020 at 01:34:34PM +0200, Arnaud Ferraris wrote:

> Understood, sorry about that. Should I do a "clean" re-send for this one?

It's fine, please just remember this for future submissions.




Re: [PATCH v3 0/1] ASoC: fsl_asrc: always select different clocks

2020-07-17 Thread Arnaud Ferraris
On 17/07/2020 at 13:21, Mark Brown wrote:
> On Fri, Jul 17, 2020 at 12:38:56PM +0200, Arnaud Ferraris wrote:
>> This patch fixes the automatic clock selection so it always selects
>> distinct input and output clocks.
> 
> Please don't send new patches in reply to old ones, it buries things and
> makes it hard to keep track of what the current version of a series
> looks like.  Just send new versions as a completely new thread.
> 
> Please don't send cover letters for single patches, if there is anything
> that needs saying put it in the changelog of the patch or after the ---
> if it's administrative stuff.  This reduces mail volume and ensures that 
> any important information is recorded in the changelog rather than being
> lost. 
> 

Understood, sorry about that. Should I do a "clean" re-send for this one?

Regards,
Arnaud


Re: [PATCH v3 0/1] ASoC: fsl_asrc: always select different clocks

2020-07-17 Thread Mark Brown
On Fri, Jul 17, 2020 at 12:38:56PM +0200, Arnaud Ferraris wrote:
> This patch fixes the automatic clock selection so it always selects
> distinct input and output clocks.

Please don't send new patches in reply to old ones, it buries things and
makes it hard to keep track of what the current version of a series
looks like.  Just send new versions as a completely new thread.

Please don't send cover letters for single patches, if there is anything
that needs saying put it in the changelog of the patch or after the ---
if it's administrative stuff.  This reduces mail volume and ensures that 
any important information is recorded in the changelog rather than being
lost. 




[PATCH v3 0/1] ASoC: fsl_asrc: always select different clocks

2020-07-17 Thread Arnaud Ferraris
This patch fixes the automatic clock selection so it always selects
distinct input and output clocks.

v2 -> v3:
- Update code comment, fix formatting and add more detailed explanations
  in commit message

v1 -> v2:
- compare clock indexes (and not the location in the clock table) to
  make sure input and output clocks are different
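
The v2 change can be pictured with a small Python model. This is a sketch
under stated assumptions, not the driver's selection logic: the candidate
lists, their ordering by preference, and the name pick_distinct_clocks() are
all hypothetical. The point it shows is comparing the clock indexes
themselves rather than their positions in a table.

```python
def pick_distinct_clocks(input_candidates, output_candidates):
    """Pick the first (input, output) pair whose clock indexes differ.

    Each candidate list holds clock indexes in order of preference.
    Returns None when no distinct pair exists.
    """
    for in_clk in input_candidates:
        for out_clk in output_candidates:
            # Compare the clock indexes, not list positions: the same
            # clock may sit at different positions in each list.
            if in_clk != out_clk:
                return in_clk, out_clk
    return None
```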

Arnaud Ferraris (1):
  ASoC: fsl_asrc: make sure the input and output clocks are different

 sound/soc/fsl/fsl_asrc.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)




[PATCH v3 0/1] power: Emit change uevent when updating sysfs

2020-07-07 Thread Abhishek Pandit-Subedi


Hi linux-pm,

ChromeOS has a udev rule to chown the `power/wakeup` attribute so that
the power manager can modify it during runtime.

(https://source.chromium.org/chromiumos/chromiumos/codesearch/+/master:src/platform2/power_manager/udev/99-powerd-permissions.rules)

In our automated tests, we found that the `power/wakeup` attributes
weren't being chown-ed for some boards. On investigating, I found that
when the drivers probe and call device_set_wakeup_capable, no uevent was
being emitted for the newly added power/wakeup attribute. This was
manifesting at boot on some boards (Marvell SDIO bluetooth and Broadcom
Serial bluetooth drivers) or during usb disconnects during resume
(Realtek btusb driver with reset resume quirk).

It seems reasonable to me that changes to the attributes of a device
should cause a changed uevent so I have added that here.

Here's an example of the kernel events after toggling the authorized
bit of /sys/bus/usb/devices/1-3/

$ echo 0 > /sys/bus/usb/devices/1-3/authorized
KERNEL[27.357994] remove   /devices/pci:00/:00:15.0/usb1/1-3/1-3:1.0/bluetooth/hci0/rfkill1 (rfkill)
KERNEL[27.358049] remove   /devices/pci:00/:00:15.0/usb1/1-3/1-3:1.0/bluetooth/hci0 (bluetooth)
KERNEL[27.358458] remove   /devices/pci:00/:00:15.0/usb1/1-3/1-3:1.0 (usb)
KERNEL[27.358486] remove   /devices/pci:00/:00:15.0/usb1/1-3/1-3:1.1 (usb)
KERNEL[27.358529] change   /devices/pci:00/:00:15.0/usb1/1-3 (usb)

$ echo 1 > /sys/bus/usb/devices/1-3/authorized
KERNEL[36.415749] change   /devices/pci:00/:00:15.0/usb1/1-3 (usb)
KERNEL[36.415798] add      /devices/pci:00/:00:15.0/usb1/1-3/1-3:1.0 (usb)
KERNEL[36.417414] add      /devices/pci:00/:00:15.0/usb1/1-3/1-3:1.0/bluetooth/hci0 (bluetooth)
KERNEL[36.417447] add      /devices/pci:00/:00:15.0/usb1/1-3/1-3:1.0/bluetooth/hci0/rfkill2 (rfkill)
KERNEL[36.417481] add      /devices/pci:00/:00:15.0/usb1/1-3/1-3:1.1 (usb)

Thanks
Abhishek

Changes in v3:
- Simplified error handling

Changes in v2:
- Add newline at end of bt_dev_err

Abhishek Pandit-Subedi (1):
  power: Emit changed uevent on wakeup_sysfs_add/remove

 drivers/base/power/sysfs.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

-- 
2.27.0.212.ge8ba1cc988-goog



Re: [PATCH v3 0/1] hwmon:max6697: Allow max6581 to create tempX_offset

2020-07-06 Thread Guenter Roeck
On 7/6/20 6:18 PM, Chu Lin wrote:
> Per max6581, reg 4d and reg 4e are used for temperature read offset.
> This patch will let the user specify the temperature read offset for
> max6581. This patch is tested on max6581 and only applies to max6581.
> 

Since this is a single patch, you don't need patch 0.
Just add the change log after "---" to the actual patch.

Thanks,
Guenter

> Testing:
> echo 16250 > temp2_offset
> cat temp2_offset
> 16250
> 
> echo 17500 > temp3_offset
> cat temp3_offset
> 17500
> cat temp4_offset
> 0
> cat temp2_offset
> 17500
> 
> echo 0 > temp2_offset
> cat temp2_offset
> 0
> cat temp3_offset
> 17500
> 
> echo -0 > temp2_offset
> cat temp2_offset
> 0
> 
> echo -10 > temp2_offset
> cat temp2_input
> 4875
> 
> echo 1 > temp2_offset
> cat temp2_input
> 47125
> 
> echo -2000 > temp2_offset
> cat temp2_input
> 34875
> 
> echo -0 > temp2_offset
> cat temp2_input
> 37000
> 
> Signed-off-by: Chu Lin 
> ---
> ChangeLog v2 -> v3:
>   - Use reverse christmas tree order convention
>   - Fix the type issue where comparison is always true
>   - Change the line limit to 100 char instead of 80 char
> 
> ChangeLog v1 -> v2:
>   - Simplify the offset reg raw value to millidegree Celsius conversion
>   - Substitute the temp1_offset with dummy attr
>   - Avoid using double negative in the macro definition
>   - Return the actual error when an i2c read/write fails
>   - Clamp the value to MAX or MIN respectively if an out of range input is
>     given
>   - Provide mux protection when multiple i2c accesses are required
> 
> Chu Lin (1):
>   hwmon:max6697: Allow max6581 to create tempX_offset attributes
> 
>  drivers/hwmon/max6697.c | 92 +++--
>  1 file changed, 88 insertions(+), 4 deletions(-)
> 



[PATCH v3 0/1] hwmon:max6697: Allow max6581 to create tempX_offset

2020-07-06 Thread Chu Lin
Per max6581, reg 4d and reg 4e are used for temperature read offset.
This patch will let the user specify the temperature read offset for
max6581. This patch is tested on max6581 and only applies to max6581.

Testing:
echo 16250 > temp2_offset
cat temp2_offset
16250

echo 17500 > temp3_offset
cat temp3_offset
17500
cat temp4_offset
0
cat temp2_offset
17500

echo 0 > temp2_offset
cat temp2_offset
0
cat temp3_offset
17500

echo -0 > temp2_offset
cat temp2_offset
0

echo -10 > temp2_offset
cat temp2_input
4875

echo 1 > temp2_offset
cat temp2_input
47125

echo -2000 > temp2_offset
cat temp2_input
34875

echo -0 > temp2_offset
cat temp2_input
37000

Signed-off-by: Chu Lin 
---
ChangeLog v2 -> v3:
  - Use reverse christmas tree order convention
  - Fix the type issue where comparison is always true
  - Change the line limit to 100 char instead of 80 char

ChangeLog v1 -> v2:
  - Simplify the offset reg raw value to millidegree Celsius conversion
  - Substitute the temp1_offset with dummy attr
  - Avoid using double negative in the macro definition
  - Return the actual error when an i2c read/write fails
  - Clamp the value to MAX or MIN respectively if an out of range input is given
  - Provide mux protection when multiple i2c accesses are required
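
The clamping item in the v2 changelog can be sketched as follows. This is a
Python model under loud assumptions: a signed 8-bit offset register with a
resolution of 250 millidegrees per step is assumed for illustration only (the
cover letter does not state the register format), and offset_to_reg() and
reg_to_offset() are hypothetical names.

```python
OFFSET_LSB_MC = 250                     # assumed resolution, millidegrees
OFFSET_MIN_MC = -128 * OFFSET_LSB_MC    # assumed signed 8-bit minimum
OFFSET_MAX_MC = 127 * OFFSET_LSB_MC     # assumed signed 8-bit maximum


def offset_to_reg(millidegrees):
    """Clamp an offset to the representable range and convert to steps."""
    clamped = max(OFFSET_MIN_MC, min(OFFSET_MAX_MC, millidegrees))
    # Round to the nearest step, with halves rounded away from zero.
    steps = (abs(clamped) + OFFSET_LSB_MC // 2) // OFFSET_LSB_MC
    return steps if clamped >= 0 else -steps


def reg_to_offset(reg):
    """Convert a raw register value back to millidegrees."""
    return reg * OFFSET_LSB_MC
```

Under these assumptions an out-of-range write is silently pinned to the
nearest limit instead of being rejected, which is the behaviour the changelog
item describes.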

Chu Lin (1):
  hwmon:max6697: Allow max6581 to create tempX_offset attributes

 drivers/hwmon/max6697.c | 92 +++--
 1 file changed, 88 insertions(+), 4 deletions(-)

-- 
2.27.0.383.g050319c2ae-goog



[RFC PATCH v3 0/1] Add rwsem "contended hook" API and mmap_lock histograms

2020-06-18 Thread Axel Rasmussen
The overall goal of this patch is to add tracepoints around mmap_lock
acquisition. This will let us collect latency histograms, so we can see
how long we block for in the contended case. Our goal is to collect this
data across all of production at Google, so low overhead is critical.

I'm sending this RFC for feedback on the changes to rwsem.{h,c} and
lockdep.h in particular. I'll describe reasoning for the down_write case,
for brevity.

We want to measure the time lock acquisition takes. Naively, this is:

u64 start = sched_clock();
down_write(/* ... */);
trace(sched_clock() - start);

My measurements show that this adds ~5-6% overhead to building a kernel on
a test machine [1]. This level of overhead is unacceptably high.

My measurements show that only instrumenting the contended case lowers
overhead to < 1%. Naively, we can instrument only the contended case like
this:

if (!down_write_trylock(/* ... */))
/* Time and call down_write as before. */

However, in the case where `_trylock` succeeds, we have lost the lockdep
annotations (e.g. around ordering) `down_write` would normally include.
(Granted, we don't run with lockdep in production, but debug builds do.)

Assuming we need lower overhead, we aren't okay with losing lock
annotations, and we reject various alternatives to this patch:

- Making rwsem.c's __down_write and __down_write_trylock public, so
  mmap_lock.c could construct its own version of LOCK_CONTENDED with
  tracepoint calls.
- Having mmap_lock.c reach into rwsem.c's internals with "extern" forward
  declarations for these functions (and removing "static inline").
- Somehow adding the instrumentation directly to rwsem.c (either affecting
  all locks, or polluting it some other way).

The remaining alternative, I think, is what this patch proposes: add API
surface to rwsem.h which allows callers to provide instrumentation
callbacks which are invoked in the contended case.
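
The idea of timing only the contended path can be shown with an ordinary
userspace lock. The Python sketch below is an analogy, not the proposed rwsem
API: acquire_timed_if_contended() and its record callback are hypothetical
names, and threading.Lock stands in for the rwsem.

```python
import threading
import time


def acquire_timed_if_contended(lock, record):
    """Acquire lock; call record(nanoseconds) only if we had to wait.

    Returns True when the contended (slow) path was taken.
    """
    # Fast path: an uncontended acquisition pays for no clock reads.
    if lock.acquire(blocking=False):
        return False
    # Slow path: time the blocking acquisition and report it.
    start = time.perf_counter_ns()
    lock.acquire()
    record(time.perf_counter_ns() - start)
    return True
```

A caller could pass a list's append method as the callback to collect
samples for a histogram. Note that this userspace form has no equivalent of
the lockdep concern raised above; it only illustrates where the timing calls
sit relative to the trylock.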



[1]: For measuring the overhead of the instrumentation, I've been timing a
defconfig kernel build. The numbers above come from a KVM instance with
4 CPUs + 32G RAM, running 5.8-rc1 with this patch applied and a histogram
trigger configured for the acquire_returned tracepoint. My test script is
simple:

for (( i=0; i<5; ++i)); do
make mrproper > /dev/null || exit 1
make defconfig > /dev/null || exit 1
sync || exit 1
echo 3 > /proc/sys/vm/drop_caches || exit 1
/usr/bin/time make -j5 > /dev/null
done

The numbers I'm giving above are computed as:
(avg of 5 runs with this hist trigger enabled) / (avg on 5.8-rc1).
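
For clarity, that ratio can be written out as a tiny helper. This is a Python
sketch of the arithmetic only; relative_build_time() is a hypothetical name
and no measured timings are reproduced here.

```python
from statistics import mean


def relative_build_time(instrumented_secs, baseline_secs):
    """Ratio of mean build times: instrumented kernel over baseline."""
    return mean(instrumented_secs) / mean(baseline_secs)
```

A result of 1.05 would correspond to 5% overhead.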

Axel Rasmussen (1):
  mmap_lock: add tracepoints around mmap_lock acquisition

 include/linux/lockdep.h  |  47 ++
 include/linux/mmap_lock.h|  27 ++-
 include/linux/rwsem.h|  12 ++
 include/trace/events/mmap_lock.h |  76 +
 kernel/locking/rwsem.c   |  64 +++
 mm/Kconfig   |  19 +++
 mm/Makefile  |   1 +
 mm/mmap_lock.c   | 281 +++
 8 files changed, 526 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/mmap_lock.h
 create mode 100644 mm/mmap_lock.c

--
2.27.0.111.gc72c7da667-goog



[PATCH v3 0/1] s390: virtio: let arch choose to accept devices without IOMMU feature

2020-06-17 Thread Pierre Morel
An architecture protecting the guest memory against unauthorized host
access may want to enforce VIRTIO I/O device protection through the
use of VIRTIO_F_IOMMU_PLATFORM.

Let's give the architecture a chance to accept or reject devices
without VIRTIO_F_IOMMU_PLATFORM.
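
A minimal model of such a hook might look like the following. It is a Python
sketch, not the patch: finalize_features() and the
arch_requires_iommu_platform callback are hypothetical names, and returning
-ENODEV on rejection is an assumption made for this example. The feature bit
number 33 is the one the VIRTIO specification assigns to
VIRTIO_F_IOMMU_PLATFORM.

```python
import errno

VIRTIO_F_IOMMU_PLATFORM = 33  # feature bit number


def finalize_features(features, arch_requires_iommu_platform):
    """Return 0 to accept the device, or a negative errno to reject it."""
    has_iommu = bool(features & (1 << VIRTIO_F_IOMMU_PLATFORM))
    # The architecture decides whether the feature is mandatory.
    if arch_requires_iommu_platform() and not has_iommu:
        return -errno.ENODEV
    return 0
```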

Pierre Morel (1):
  s390: virtio: let arch accept devices without IOMMU feature

 arch/s390/mm/init.c |  6 ++
 drivers/virtio/virtio.c | 22 ++
 include/linux/virtio.h  |  2 ++
 3 files changed, 30 insertions(+)

-- 
2.25.1

Changelog

to v3:

- add warning
  (Connie, Christian)

- add comment
  (Connie)

- change hook name
  (Halil, Connie)

to v2:

- put the test in virtio_finalize_features()
  (Connie)

- put the test inside VIRTIO core
  (Jason)

- pass a virtio device as parameter
  (Halil)




[PATCH v3 0/1] ARM: Add Rockchip rk3288w support

2020-06-01 Thread Mylène Josserand
Hello everyone,

Context
---

Here is my V3 of my patches that add the support for the Rockchip
RK3288w which is a revision of the RK3288. It is mostly the same SOC
except for, at least, one clock tree which is different.
This difference is only known by looking at the BSP kernel [1].

Currently, the mainline kernel will not hang on rk3288w but it is
probably by "chance" because we hit an issue on an older kernel version.

According to Rockchip's U-Boot [2], the rk3288w can be detected using
the HDMI revision number (= 0x1A) in this version of the SOC.

Changelog
-

In this V3, the revision's detection is not done in the kernel anymore.
This patch will handle the rk3288w clock tree according to a new
compatible "rockchip,rk3288w-cru" that must be provided by bootloaders.
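
Selecting the clock tree from the compatible string could be modelled like
this. The Python sketch is illustrative only: the branch names are
placeholders because the differing clocks are not listed in this letter, and
branches_for() is a hypothetical helper. The two compatible strings are the
existing "rockchip,rk3288-cru" and the new one named above.

```python
RK3288_CRU = "rockchip,rk3288-cru"
RK3288W_CRU = "rockchip,rk3288w-cru"

# Placeholder branch names; the real clock tables are not shown here.
COMMON_BRANCHES = ("common_branch_a", "common_branch_b")
RK3288_ONLY = ("rk3288_specific_branch",)
RK3288W_ONLY = ("rk3288w_specific_branch",)


def branches_for(compatible):
    """Return the clock branches to register for a given compatible."""
    if compatible == RK3288W_CRU:
        return COMMON_BRANCHES + RK3288W_ONLY
    if compatible == RK3288_CRU:
        return COMMON_BRANCHES + RK3288_ONLY
    raise ValueError("unsupported compatible: " + compatible)
```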

Changes since v2:
   - Remove all revision detection code; let the bootloaders handle that

Best regards,
Mylène Josserand

[1] 
https://github.com/rockchip-linux/kernel/blob/develop-4.4/drivers/clk/rockchip/clk-rk3288.c#L960..L964
[2] 
https://github.com/rockchip-linux/u-boot/blob/next-dev/arch/arm/mach-rockchip/rk3288/rk3288.c#L378..L388

Mylène Josserand (1):
  clk: rockchip: rk3288: Handle clock tree for rk3288w

 drivers/clk/rockchip/clk-rk3288.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

-- 
2.26.2



Re: [PATCH v3 0/1] net: ethernet: stmmac: simplify phy modes management for stm32

2020-05-18 Thread Christophe ROULLIER
Hi,

Just a "gentleman ping"

Regards,

Christophe.

On 27/04/2020 12:00, Christophe Roullier wrote:
> No new feature, just a simplification of the stm32 part to make it easier
> to use. Add all Ethernet clocks in DT by default, and activate them or not
> depending on the phy mode, the clock frequency, and whether property
> "st,ext-phyclk" is set.
> Keep backward compatibility
>
> version 3:
> Add acked from Alexandre Torgue
> Rebased on top of v5.7-rc2
>
> Christophe Roullier (1):
>net: ethernet: stmmac: simplify phy modes management for stm32
>
>   .../net/ethernet/stmicro/stmmac/dwmac-stm32.c | 74 +++
>   1 file changed, 44 insertions(+), 30 deletions(-)
>

[PATCH v3 0/1] vfio-ccw: Enable transparent CCW IPL from DASD

2020-05-05 Thread Jared Rossi
Remove the explicit prefetch check when using vfio-ccw devices.
This check does not trigger in practice as all Linux channel programs
are intended to use prefetch.

Version 3 improves logging by including the UUID of the vfio device
that triggers the warning.  A custom rate limit is used because
the generic rate limit of 10 per 5 seconds will still result in
multiple warnings during IPL. The warning message has been clarified
to reflect that a channel program will be executed using prefetch
even though prefetch was not specified.
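
To illustrate why the generic limit is too permissive here, the following
Python sketch models an interval/burst rate limiter. The RateLimit class is a
simplified stand-in written for this example, not the kernel's ratelimit
implementation, and the custom values used afterwards (one message per 300
seconds) are hypothetical since the letter does not state them.

```python
class RateLimit:
    """Allow at most `burst` events per `interval` seconds."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.begin = None   # start of the current window
        self.printed = 0    # events allowed in the current window

    def allow(self, now):
        # Open a new window on first use or once the interval has elapsed.
        if self.begin is None or now - self.begin >= self.interval:
            self.begin = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        return False
```

Feeding 100 warnings within one second to RateLimit(5.0, 10) lets 10 of them
through, while the hypothetical RateLimit(300.0, 1) lets only one through.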

The text of the warning itself does not explicitly refer to non-prefetching
channel programs as unsupported because it will trigger during IPL,
which is a normal and expected sequence.  Likewise, because we expect
the message to appear during IPL, the warning also does not explicitly
alert to the potential of an error, rather it simply notes that a
channel program is being executed in a way other than specified.

Version 3 also makes some word choice changes to the documentation.

Jared Rossi (1):
  vfio-ccw: Enable transparent CCW IPL from DASD

 Documentation/s390/vfio-ccw.rst |  6 ++
 drivers/s390/cio/vfio_ccw_cp.c  | 19 ---
 2 files changed, 18 insertions(+), 7 deletions(-)

-- 
2.17.0



[PATCH v3 0/1] dmaengine: avalon: Intel Avalon-MM DMA Interface for PCIe

2019-10-16 Thread Alexander Gordeev
This series is against v5.4-rc3

I am posting "avalon-dma" update alone and going to post "avalon-test"
update as a follow-up or in the next round.

Changes since v2:
- avalon_dma_register() return value bug fixed;
- device_prep_slave_sg() does not crash dmaengine_prep_slave_single() now;
- kernel configuration options removed in favour of module parameters;
- BUG_ONs, WARN_ONs and dev_dbgs removed;
- goto labels renamed, other style issues addressed;
- polling loop in interrupt handler commented;

Changes since v1:
- "avalon-dma" converted to "dmaengine" model;
- "avalon-drv" renamed to "avalon-test";

The Avalon-MM DMA Interface for PCIe is a design used in hard IPs for
Intel Arria, Cyclone or Stratix FPGAs. It transfers data between on-chip
memory and system memory.

Testing was done using a custom FPGA build with Arria 10 FPGA streaming
data to target device RAM:

  +----------+    +----------+    +----------+        +----------+
  | Nios CPU |<-->|   RAM    |<-->|  Avalon  |<-PCIe->| Host CPU |
  +----------+    +----------+    +----------+        +----------+

The RAM was examined for data integrity by examining RAM contents
from host CPU (indirectly - checking data DMAed to the system) and
from Nios CPU that has direct access to the device RAM. A companion
tool using "avalon-test" driver was used to DMA files to the device:
https://github.com/a-gordeev/avalon-tool.git

CC: dmaeng...@vger.kernel.org

Alexander Gordeev (1):
  dmaengine: avalon: Intel Avalon-MM DMA Interface for PCIe

 drivers/dma/Kconfig  |   2 +
 drivers/dma/Makefile |   1 +
 drivers/dma/avalon/Kconfig   |  14 +
 drivers/dma/avalon/Makefile  |   6 +
 drivers/dma/avalon/avalon-core.c | 476 +++
 drivers/dma/avalon/avalon-core.h |  92 ++
 drivers/dma/avalon/avalon-hw.c   | 186 
 drivers/dma/avalon/avalon-hw.h   |  85 ++
 drivers/dma/avalon/avalon-pci.c  | 144 ++
 9 files changed, 1006 insertions(+)
 create mode 100644 drivers/dma/avalon/Kconfig
 create mode 100644 drivers/dma/avalon/Makefile
 create mode 100644 drivers/dma/avalon/avalon-core.c
 create mode 100644 drivers/dma/avalon/avalon-core.h
 create mode 100644 drivers/dma/avalon/avalon-hw.c
 create mode 100644 drivers/dma/avalon/avalon-hw.h
 create mode 100644 drivers/dma/avalon/avalon-pci.c

Because the amount of changes since the previous version is quite big,
I am also posting the interdiff.

diff -u b/drivers/dma/avalon/Kconfig b/drivers/dma/avalon/Kconfig
--- b/drivers/dma/avalon/Kconfig
+++ b/drivers/dma/avalon/Kconfig
@@ -15,74 +14,0 @@
-
-if AVALON_DMA
-
-config AVALON_DMA_MASK_WIDTH
-   int "Avalon DMA streaming and coherent bitmask width"
-   range 0 64
-   default 64
-   help
- Width of bitmask for streaming and coherent DMA operations
-
-config AVALON_DMA_CTRL_BASE
-   hex "Avalon DMA controllers base"
-   default "0x"
-
-config AVALON_DMA_RD_EP_DST_LO
-   hex "Avalon DMA read controller base low"
-   default "0x8000"
-   help
- Specifies the lower 32-bits of the base address of the read
- status and descriptor table in the Root Complex memory.
-
-config AVALON_DMA_RD_EP_DST_HI
-   hex "Avalon DMA read controller base high"
-   default "0x"
-   help
- Specifies the upper 32-bits of the base address of the read
- status and descriptor table in the Root Complex memory.
-
-config AVALON_DMA_WR_EP_DST_LO
-   hex "Avalon DMA write controller base low"
-   default "0x80002000"
-   help
- Specifies the lower 32-bits of the base address of the write
- status and descriptor table in the Root Complex memory.
-
-config AVALON_DMA_WR_EP_DST_HI
-   hex "Avalon DMA write controller base high"
-   default "0x"
-   help
- Specifies the upper 32-bits of the base address of the write
- status and descriptor table in the Root Complex memory.
-
-config AVALON_DMA_PCI_VENDOR_ID
-   hex "PCI vendor ID"
-   default "0x1172"
-
-config AVALON_DMA_PCI_DEVICE_ID
-   hex "PCI device ID"
-   default "0xe003"
-
-config AVALON_DMA_PCI_BAR
-   int "PCI device BAR the Avalon DMA controller is mapped to"
-   range 0 5
-   default 0
-   help
- Number of PCI BAR the DMA controller is mapped to
-
-config AVALON_DMA_PCI_MSI_COUNT_ORDER
-   int "Count of MSIs the PCI device provides (order)"
-   range 0 5
-   default 5
-   help
- Number of vectors the PCI device uses in multiple MSI mode.
- This number is provided as the power of two.
-
-config AVALON_DMA_PCI_MSI_VECTOR
-   int "Vector number the DMA controller is mapped to"
-   range 0 31
-   default 0
-   help
- Number of MSI vector the DMA controller is mapped to in
- multiple MSI mode.
-
-endif
diff -u b/drivers/dma/avalon/avalon-core.c b/drivers/dma/avalon/avalon-core.c
--- 

[PATCH v3 0/1] x86/init: Add option to skip using RTC

2019-10-10 Thread Rahul Tanwar
Hi,

We have a new Atom Airmont core based product which does not support
RTC as persistent clock source.

Presently, the platform get/set wallclock ops always use the MC146818 RTC/CMOS
device to read and set the time. This causes a boot failure on our SOC, which
has no RTC. More specifically, it hangs in the RTC driver's
mach_get_cmos_time() when it polls the RTC_FREQ_SELECT register and loops
until the Update-In-Progress (UIP) flag is cleared, i.e. the code snippet below.

    while ((CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP))
        cpu_relax();

After a few rounds of review and feedback, we concluded that we should
control this from the Motorola MC146818 compatible RTC devicetree node.
Please see [1].

Make RTC read/write optional by detecting platforms that do not
support the RTC/CMOS device through the status property of the
corresponding DT node. If status says disabled, then noop the get/set
wallclock ops.

For non-DT platforms, or for DT platforms that do not define the
optional status property, proceed as before.
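
The override described above can be pictured with a small user-space C
model. This is only a sketch under stated assumptions: struct wallclock_ops,
rtc_get(), rtc_set(), get_noop(), set_noop() and wallclock_init() are
hypothetical names made up for illustration, the DT lookup is reduced to two
booleans, and the -EINVAL return of the no-op setter is an assumption rather
than something stated in this cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the platform wallclock ops. */
struct wallclock_ops {
	void (*get_wallclock)(struct timespec *now);
	int (*set_wallclock)(const struct timespec *now);
};

/* Stand-ins for the RTC/CMOS backed accessors (not kernel code). */
static void rtc_get(struct timespec *now)
{
	now->tv_sec = 1570665600; /* pretend value read from the RTC */
	now->tv_nsec = 0;
}

static int rtc_set(const struct timespec *now)
{
	(void)now;
	return 0;
}

/* No-op variants used when the RTC node is marked disabled. */
static void get_noop(struct timespec *now)
{
	(void)now; /* leave the caller's value untouched */
}

static int set_noop(const struct timespec *now)
{
	(void)now;
	return -EINVAL; /* assumed error code: there is nothing to write to */
}

/*
 * Models the status check: with no node, or with a node that is not
 * disabled, the default RTC-backed ops stay in place.
 */
static void wallclock_init(struct wallclock_ops *ops, bool node_present,
			   bool status_disabled)
{
	assert(ops != NULL);
	ops->get_wallclock = rtc_get;
	ops->set_wallclock = rtc_set;
	if (node_present && status_disabled) {
		ops->get_wallclock = get_noop;
		ops->set_wallclock = set_noop;
	}
}
```

Swapping the function pointers once at init time keeps the read/write paths
themselves free of per-call platform checks, which matches the v2 feedback
listed below.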

Patch is baselined upon Linux 5.4-rc2 at below Git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/core

[1] Documentation/devicetree/bindings/rtc/rtc-cmos.txt

v3:
* Rebase to latest 5.4-rc2 kernel.
* Fix a build warning reported by kbuild test robot.

v2:
* As per review feedback, do not hack RTC read/write functions directly. 
  Instead, override get/set wallclock ops during setup_arch init sequence.

v1:
* Detect platforms with no RTC in RTC read/write functions and skip RTC
  read/write if not applicable.


Rahul Tanwar (1):
  x86/init: Noop get/set wallclock when platform doesn't support RTC

 arch/x86/kernel/x86_init.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

-- 
2.11.0



[PATCH v3 0/1] leds: fix /sys/class/leds/<led>/trigger

2019-09-29 Thread Akinobu Mita
Reading /sys/class/leds/<led>/trigger returns all available LED triggers.
However, the size of this file is limited to PAGE_SIZE because of the
limitation for sysfs attribute.

Enabling LED CPU trigger on systems with thousands of CPUs easily hits
PAGE_SIZE limit, and makes it impossible to see all available LED triggers
and which trigger is currently activated.

This patch converts /sys/class/leds/<led>/trigger to a bin attribute and
removes the PAGE_SIZE limitation.
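
To get a feel for how quickly the limit is reached, the following C sketch
adds up the bytes needed just for the per-CPU trigger names. It is a rough
estimate under assumptions: the "cpu%u" naming, the single separator byte per
name and the 4096-byte page size are all assumed here, and every other
trigger in the list is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed 4 KiB page */

/*
 * Bytes needed to list the names "cpu0 cpu1 ... cpu<n-1> ", each
 * followed by one separator character. The naming is an assumption.
 */
static size_t cpu_trigger_bytes(unsigned int ncpus)
{
	size_t total = 0;
	unsigned int i;

	for (i = 0; i < ncpus; i++) {
		/* snprintf(NULL, 0, ...) only reports the formatted length */
		int len = snprintf(NULL, 0, "cpu%u", i);

		assert(len > 0);
		total += (size_t)len + 1; /* name plus separator */
	}
	return total;
}

/* Returns 1 if the CPU trigger names alone no longer fit in one page. */
static int exceeds_page(unsigned int ncpus)
{
	return cpu_trigger_bytes(ncpus) > SKETCH_PAGE_SIZE;
}
```

Calling exceeds_page() with the CPU count of a given machine shows whether
the names alone would overflow a single page in this simplified model.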

The first version of this series provided a new API that follows the
"one value per file" rule of sysfs. The second version dropped it because
there were a number of problems, and it turned out that the new API
should be submitted separately.

* v3
- Remove "query" parameters from led_trigger_snprintf() and
  led_trigger_format()
- Return -ENOMEM immediately if memory allocation fails
- Drop Acked-by: tag due to a certain amount of changes

* v2
- Update commit message
- Drop patches for new api

Akinobu Mita (1):
  leds: remove PAGE_SIZE limit of /sys/class/leds/<led>/trigger

 drivers/leds/led-class.c|  8 ++--
 drivers/leds/led-triggers.c | 90 ++---
 drivers/leds/leds.h |  6 +++
 include/linux/leds.h|  5 ---
 4 files changed, 78 insertions(+), 31 deletions(-)

Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Cc: Jacek Anaszewski 
Cc: Pavel Machek 
Cc: Dan Murphy 
-- 
2.7.4



[PATCH v3 0/1] intel_cht_int33fe: Split code to USB Micro-B and Type-C variants

2019-09-18 Thread Yauhen Kharuzhy
Patch to support INT33FE ACPI pseudo-device on hardware with USB Micro-B
connector.

v4:
- Micro-B variant: Don't print error to the kernel log if i2c_acpi_new_device()
  has returned -EPROBE_DEFER.

v3:
- Rename TypeB variant to Micro-B (we have only one such device for now and it
  has Micro-B connector)
- Rebase on current linus/master
- Remove empty lines and replace "TypeC" by "Type-C"

v2:
Instead of defining two separate modules with two separate config
options, compile the {common,typeb,typec} sources into one .ko module.
Call the needed variant-specific probe function after hardware type
detection in common code.

Yauhen Kharuzhy (1):
  platform/x86/intel_cht_int33fe: Split code to USB Micro-B and Type-C
variants

 drivers/platform/x86/Kconfig  |  12 +-
 drivers/platform/x86/Makefile |   4 +
 .../platform/x86/intel_cht_int33fe_common.c   | 147 ++
 .../platform/x86/intel_cht_int33fe_common.h   |  41 +
 .../platform/x86/intel_cht_int33fe_microb.c   |  67 
 ...ht_int33fe.c => intel_cht_int33fe_typec.c} |  78 +-
 6 files changed, 276 insertions(+), 73 deletions(-)
 create mode 100644 drivers/platform/x86/intel_cht_int33fe_common.c
 create mode 100644 drivers/platform/x86/intel_cht_int33fe_common.h
 create mode 100644 drivers/platform/x86/intel_cht_int33fe_microb.c
 rename drivers/platform/x86/{intel_cht_int33fe.c => intel_cht_int33fe_typec.c} 
(82%)

-- 
2.23.0.rc1



[PATCH v3 0/1] intel_cht_int33fe: Split code to USB Micro-B and Type-C variants

2019-09-18 Thread Yauhen Kharuzhy
Patch to support INT33FE ACPI pseudo-device on hardware with USB Micro-B
connector.

v3:
- Rename TypeB variant to Micro-B (we have only one such device for now and it
  has Micro-B connector)
- Rebase on current linus/master
- Remove empty lines and replace "TypeC" by "Type-C"

v2:
Instead of defining two separate modules with two separate config
options, compile the {common,typeb,typec} sources into one .ko module.
Call the needed variant-specific probe function after hardware type
detection in common code.

Yauhen Kharuzhy (1):
  platform/x86/intel_cht_int33fe: Split code to USB Micro-B and Type-C
variants

 drivers/platform/x86/Kconfig  |  12 +-
 drivers/platform/x86/Makefile |   4 +
 .../platform/x86/intel_cht_int33fe_common.c   | 147 ++
 .../platform/x86/intel_cht_int33fe_common.h   |  41 +
 .../platform/x86/intel_cht_int33fe_microb.c   |  63 
 ...ht_int33fe.c => intel_cht_int33fe_typec.c} |  78 +-
 6 files changed, 272 insertions(+), 73 deletions(-)
 create mode 100644 drivers/platform/x86/intel_cht_int33fe_common.c
 create mode 100644 drivers/platform/x86/intel_cht_int33fe_common.h
 create mode 100644 drivers/platform/x86/intel_cht_int33fe_microb.c
 rename drivers/platform/x86/{intel_cht_int33fe.c => intel_cht_int33fe_typec.c} 
(82%)

-- 
2.23.0.rc1



[PATCH v3 0/1] Add option to skip using RTC

2019-09-03 Thread Rahul Tanwar
Hi,

There is a new product which does not support RTC as persistent clock source.

Platform ops get/set wallclock are used to get/set timespec through kernel 
timekeeping read/update_persistent_clock64() routines. Presently, get/set
wallclock ops always use MC146818A RTC/CMOS device to read & set time.
This causes boot failure on our new SOC with no RTC.

Make RTC read/write optional by detecting platforms that do not support the
RTC/CMOS device through the status property of the corresponding DT node. If
status says disabled, then noop the get/set wallclock ops.

For non-DT machines, or for DT machines that do not define the optional
status property, proceed as before.

These patches are baselined upon Linux 5.3-rc6 at below Git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/core

v3:
* Fix a build warning reported by kbuild test robot.

v2:
* As per review feedback, do not hack RTC read/write functions directly. 
  Instead, override get/set wallclock ops during setup_arch init sequence.

v1:
* Detect platforms with no RTC in RTC read/write functions and skip RTC
  read/write if not applicable.


Rahul Tanwar (1):
  x86/init: Noop get/set wallclock when platform doesn't support RTC

 arch/x86/kernel/x86_init.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

-- 
2.11.0



Re: [PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load

2019-08-29 Thread Martin K. Petersen


> Problem description:
> 
> A node with Adaptec 6405 controller, latest BIOS V5.3-0[19204] A lot
> of disks attached to the controller.  Simple test: running mkfs.ext4
> on many disks on the same controller in parallel (mkfs is not
> important here, any serious io load triggers controller aborts)

Microchip folks: Please review!

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH v3 0/1] aacraid: Host adapter Adaptec 6405 constantly resets under high io load

2019-08-19 Thread Konstantin Khorenko
Problem description:

A node with Adaptec 6405 controller, latest BIOS V5.3-0[19204]
A lot of disks attached to the controller.
Simple test: running mkfs.ext4 on many disks on the same controller in
parallel (mkfs is not important here, any serious io load triggers controller
aborts)

Results:
* no problems (controller resets) with kernels prior to
  395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")

* latest ms kernel v5.2-rc6-15-g249155c20f9b - mkfs processes are in D state,
  lots of complaints in logs like:

  [  654.894633] aacraid: Host adapter abort request.
  aacraid: Outstanding commands on (0,1,43,0):
  [  699.441034] aacraid: Host adapter abort request.
  aacraid: Outstanding commands on (0,1,40,0):
  [  699.442950] aacraid: Host adapter reset request. SCSI hang ?
  [  714.457428] aacraid: Host adapter reset request. SCSI hang ?
  ...
  [  759.514759] aacraid: Host adapter reset request. SCSI hang ?
  [  759.514869] aacraid :03:00.0: outstanding cmd: midlevel-0
  [  759.514870] aacraid :03:00.0: outstanding cmd: lowlevel-0
  [  759.514872] aacraid :03:00.0: outstanding cmd: error handler-498
  [  759.514873] aacraid :03:00.0: outstanding cmd: firmware-471
  [  759.514875] aacraid :03:00.0: outstanding cmd: kernel-60
  [  759.514912] aacraid :03:00.0: Controller reset type is 3
  [  759.515013] aacraid :03:00.0: Issuing IOP reset
  [  850.296705] aacraid :03:00.0: IOP reset succeeded

Same complaints on Ubuntu kernel 4.15.0-50-generic:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777586

Controller:
===
03:00.0 RAID bus controller: Adaptec Series 6 - 6G SAS/PCIe 2 (rev 01)
 Subsystem: Adaptec Series 6 - ASR-6405 - 4 internal 6G SAS ports

Test:
=
# cat dev.list
/dev/sdq1
/dev/sde1
/dev/sds1
/dev/sdb1
/dev/sdk1
/dev/sdaj1
/dev/sdaf1
/dev/sdd1
/dev/sdac1
/dev/sdai1
/dev/sdz1
/dev/sdj1
/dev/sdy1
/dev/sdn1
/dev/sdae1
/dev/sdg1
/dev/sdi1
/dev/sdc1
/dev/sdf1
/dev/sdl1
/dev/sda1
/dev/sdab1
/dev/sdr1
/dev/sdo1
/dev/sdah1
/dev/sdm1
/dev/sdt1
/dev/sdp1
/dev/sdad1
/dev/sdh1

===
# cat run_mkfs.sh
#!/bin/bash

while read i; do
   mkfs.ext4 $i -q -E lazy_itable_init=1 -O uninit_bg -m 0 &
done

=
# cat dev.list | ./run_mkfs.sh

The issue is 100% reproducible.

I've bisected to the culprit patch; it is
395e5df79a95 ("scsi: aacraid: Remove reference to Series-9")

It changes the arc ctrl checks for Series-6 controllers,
and I've checked that restoring the original logic in the arc ctrl checks
eliminates the controller hangs/resets.

Konstantin Khorenko (1):
  scsi: aacraid: resurrect correct arc ctrl checks for Series-6

--
v3 changes:
 * introduced another wrapper to check for devices except for Series 6
   controllers upon request from Sagar Biradar (Microchip)

 * dropped mentions of private bug ids


 drivers/scsi/aacraid/aacraid.h  | 11 +++
 drivers/scsi/aacraid/comminit.c |  5 ++---
 drivers/scsi/aacraid/linit.c|  2 +-
 3 files changed, 14 insertions(+), 4 deletions(-)

-- 
2.15.1



Re: [PATCH v3 0/1] waitid: process group enhancement

2019-08-14 Thread Christian Brauner
On Wed, Aug 14, 2019 at 11:58:22AM -0400, Rich Felker wrote:
> On Wed, Aug 14, 2019 at 05:43:59PM +0200, Christian Brauner wrote:
> > Hey everyone,
> > 
> > This patch adds support for waiting on the current process group by
> > specifying waitid(P_PGID, 0, ...) as discussed in [1]. The details why
> > we need to do this are in the commit message of [PATCH 1/1] so I won't
> > repeat them here.
> > 
> > I've picked this up since the thread has gone stale and parts of
> > userspace are actually blocked by this.
> > 
> > Note that the patch has been changed to be more closely aligned with the
> > P_PIDFD changes to waitid() I have sitting in my for-next branch (cf. [2]).
> > This makes the merge conflict a little simpler and picks up on the
> > coding style discussions that guided the P_PIDFD patchset.
> > 
> > There was some desire to get this feature in with 5.3 (cf. [3]).
> > But given that this is a new feature for waitid() and for the sake of
> > avoiding any merge conflicts I would prefer to land this in the 5.4
> > merge window together with the P_PIDFD changes.
> 
> That makes 5.4 (or later, depending on other stuff) the hard minimum
> for RV32 ABI. Is that acceptable? I was under the impression (perhaps
> mistaken) that 5.3 was going to be next LTS series which is why I'd
> like to have the necessary syscalls for a complete working RV32
> userspace in it. If I'm wrong about that please ignore me. :-)

5.3 is not going to be an LTS and we don't do new features after the
merge window is closed anyway. :)

Christian


Re: [PATCH v3 0/1] waitid: process group enhancement

2019-08-14 Thread Rich Felker
On Wed, Aug 14, 2019 at 05:43:59PM +0200, Christian Brauner wrote:
> Hey everyone,
> 
> This patch adds support for waiting on the current process group by
> specifying waitid(P_PGID, 0, ...) as discussed in [1]. The details why
> we need to do this are in the commit message of [PATCH 1/1] so I won't
> repeat them here.
> 
> I've picked this up since the thread has gone stale and parts of
> userspace are actually blocked by this.
> 
> Note that the patch has been changed to be more closely aligned with the
> P_PIDFD changes to waitid() I have sitting in my for-next branch (cf. [2]).
> This makes the merge conflict a little simpler and picks up on the
> coding style discussions that guided the P_PIDFD patchset.
> 
> There was some desire to get this feature in with 5.3 (cf. [3]).
> But given that this is a new feature for waitid() and for the sake of
> avoiding any merge conflicts I would prefer to land this in the 5.4
> merge window together with the P_PIDFD changes.

That makes 5.4 (or later, depending on other stuff) the hard minimum
for RV32 ABI. Is that acceptable? I was under the impression (perhaps
mistaken) that 5.3 was going to be next LTS series which is why I'd
like to have the necessary syscalls for a complete working RV32
userspace in it. If I'm wrong about that please ignore me. :-)

Rich


[PATCH v3 0/1] waitid: process group enhancement

2019-08-14 Thread Christian Brauner
Hey everyone,

This patch adds support for waiting on the current process group by
specifying waitid(P_PGID, 0, ...) as discussed in [1]. The details why
we need to do this are in the commit message of [PATCH 1/1] so I won't
repeat them here.
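
For readers who want to see the call from userspace, here is a minimal C
sketch of waitid(P_PGID, 0, ...). It is an illustration under assumptions,
not part of the patch: reap_child_in_own_group() is a hypothetical helper,
and the fallback to an explicit getpgrp() id assumes that kernels without
this change reject id 0 with an error (expected to be EINVAL).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with `code`, then reap it by waiting on the
 * caller's own process group. Returns the child's exit status, or -1
 * on error.
 */
static int reap_child_in_own_group(int code)
{
	siginfo_t info;
	pid_t pid;

	memset(&info, 0, sizeof(info));
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code); /* child stays in the parent's process group */

	/* With the patch applied, id 0 means "my current process group". */
	if (waitid(P_PGID, 0, &info, WEXITED) < 0) {
		/*
		 * Kernels without the change are expected to reject id 0;
		 * fall back to naming the group explicitly.
		 */
		if (waitid(P_PGID, (id_t)getpgrp(), &info, WEXITED) < 0)
			return -1;
	}
	assert(info.si_pid == pid);
	return info.si_code == CLD_EXITED ? info.si_status : -1;
}
```

The fallback reads the group id in a separate step before waiting, whereas
passing 0 lets the kernel resolve the group as part of the call.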

I've picked this up since the thread has gone stale and parts of
userspace are actually blocked by this.

Note that the patch has been changed to be more closely aligned with the
P_PIDFD changes to waitid() I have sitting in my for-next branch (cf. [2]).
This makes the merge conflict a little simpler and picks up on the
coding style discussions that guided the P_PIDFD patchset.

There was some desire to get this feature in with 5.3 (cf. [3]).
But given that this is a new feature for waitid() and for the sake of
avoiding any merge conflicts I would prefer to land this in the 5.4
merge window together with the P_PIDFD changes.

Thanks!
Christian

/* v0 */
Link: https://www.sourceware.org/ml/libc-alpha/2019-07/msg00587.html

/* v1 */
Link: 
https://lore.kernel.org/lkml/20190814113822.9505-1-christian.brau...@ubuntu.com/

/* v2 */
Link: 
https://lore.kernel.org/lkml/20190814130732.23572-1-christian.brau...@ubuntu.com/

/* References */
[1]: https://www.sourceware.org/ml/libc-alpha/2019-07/msg00587.html
[2]: https://lore.kernel.org/lkml/2019072729.6516-1-christ...@brauner.io/
[3]: https://www.sourceware.org/ml/libc-alpha/2019-08/msg00304.html

Eric W. Biederman (1):
  waitid: Add support for waiting for the current process group

 kernel/exit.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

-- 
2.22.0



[PATCH v3 0/1] get_user_pages changes

2019-07-26 Thread Bharath Vedartham
In this 3rd version of the patch series, I have compressed the patches
of the previous patch series into one patch. This was suggested by Christoph
Hellwig.
The suggestion was to remove the pte_lookup functions and use the
get_user_pages* functions directly instead.

There is nothing different in this series compared to the previous
series; it essentially compresses the 3 patches of the original series
into one patch.

Bharath Vedartham (1):
  sgi-gru: Remove *pte_lookup functions

 drivers/misc/sgi-gru/grufault.c | 114 +---
 1 file changed, 25 insertions(+), 89 deletions(-)

-- 
2.7.4



[PATCH v3 0/1] builddeb: generate multi-arch friendly linux-libc-dev

2019-07-04 Thread Cedric Hombourger
Changes in v3:

 - add Multi-Arch: same to debian/control for linux-libc-dev

Changes in v2:

 - forward $debarch from mkdebian to builddeb
 - use dpkg-architecture -qDEB_HOST_MULTIARCH instead of $CC -dumpmachine

Cedric Hombourger (1):
  builddeb: generate multi-arch friendly linux-libc-dev package

 scripts/package/builddeb | 8 
 scripts/package/mkdebian | 5 +++--
 2 files changed, 11 insertions(+), 2 deletions(-)

-- 
2.11.0



[PATCH v3 0/1] mm/vmalloc.c: improve readability and rewrite vmap_area

2019-07-04 Thread Pengfei Li
v2 -> v3:
* patch 1-4: Abandoned
* patch 5:
  - Eliminate "flags" (suggested by Uladzislau Rezki)
  - Based on https://lkml.org/lkml/2019/6/6/455
and https://lkml.org/lkml/2019/7/3/661

v1 -> v2:
* patch 3: Rename __find_vmap_area to __search_va_in_busy_tree
   instead of __search_va_from_busy_tree.
* patch 5: Add motivation and necessary test data to the commit
   message.
* patch 5: Let va->flags use only some low bits of va_start
   instead of completely overwriting va_start.

The current implementation of struct vmap_area wastes space.

After applying this commit, sizeof(struct vmap_area) is
reduced from 11 words to 8 words.
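
One way to arrive at those numbers can be sketched in plain C. Everything
below is an assumption made for illustration: the stand-in types, the field
lists of both layouts and the choice of which fields share a union are not
taken from this cover letter, which only states that "flags" is eliminated
and that the size drops from 11 to 8 words. The sketch also assumes that
unsigned long and pointers have the same size, as on Linux.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for kernel types; every member is one machine word. */
struct rb_node { unsigned long parent_color; struct rb_node *right, *left; };
struct list_head { struct list_head *next, *prev; };
struct llist_node { struct llist_node *next; };
struct vm_struct; /* opaque in this sketch */

#define WORD sizeof(unsigned long)

/* Assumed "before" layout: every field has its own slot. */
struct vmap_area_before {
	unsigned long va_start;
	unsigned long va_end;
	unsigned long subtree_max_size;
	unsigned long flags;
	struct rb_node rb_node;
	struct list_head list;
	struct llist_node purge_list;
	struct vm_struct *vm;
};

/* Assumed "after" layout: flags dropped, three fields share one word. */
struct vmap_area_after {
	unsigned long va_start;
	unsigned long va_end;
	struct rb_node rb_node;
	struct list_head list;
	union {
		unsigned long subtree_max_size;
		struct vm_struct *vm;
		struct llist_node purge_list;
	};
};

/* Number of machine words saved by the repacked layout. */
static size_t vmap_area_words_saved(void)
{
	assert(sizeof(struct vmap_area_before) % WORD == 0);
	return (sizeof(struct vmap_area_before) -
		sizeof(struct vmap_area_after)) / WORD;
}
```

Sharing a slot like this is only safe when the overlapped fields are never
needed at the same time, so the sketch says nothing about whether that holds
for the real structure.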

Pengfei Li (1):
  Modify struct vmap_area to reduce its size

 include/linux/vmalloc.h | 20 +---
 mm/vmalloc.c| 24 ++--
 2 files changed, 23 insertions(+), 21 deletions(-)

-- 
2.21.0



[PATCH v3 0/1] iio: common: cros_ec_sensors: Add protocol v3 support

2019-07-01 Thread Fabien Lahoudere
This patch is part of a split of the following patch:
https://lkml.org/lkml/2019/6/18/268
To address Enric's comments from https://lkml.org/lkml/2019/6/25/949,
I extracted it from the other series to speed up acceptance, because
other patches need it to be upstreamed.

Changes since v2:
- Use patch 1 from v1 after discussion on ML

Changes since v1:
- Drop second patch
- return ENODEV if version is 0

Fabien Lahoudere (1):
  iio: common: cros_ec_sensors: determine protocol version

 .../cros_ec_sensors/cros_ec_sensors_core.c| 36 ++-
 1 file changed, 35 insertions(+), 1 deletion(-)

-- 
2.19.2



[PATCH v3 0/1] Add support for IPMB driver

2019-04-29 Thread Asmaa Mnebhi
Thank you for your feedback Wolfram. I have addressed your comments.

Concerning your questions:

"Why can't we use i2c_smbus_write_block_data()?"

i2c_smbus_write_block_data() does not allow me to pass the
requester_i2c_addr argument. Instead, it uses the
client->addr.
The client->addr in this driver is set to the
i2c address of the device where this driver is loaded
(since we used i2c_slave_register to register this device as
a slave).
But the address we want to pass to i2c_smbus_write_block_data_local
is actually the i2c address of the device on the other end of the
I2C bus. This is the case where our device acts as a master and
sends the IPMB (equivalent to I2C) response to the requester device
(which becomes the I2C slave).

"Can't we leave the default or will the compiler complain?"

I chose to leave the default because IPMB by definition only
allows master writes. It doesn't do any reads. So if any
external device tries to do a read, this i2c cb function
will just go to the default case.

"I really don't know enough about IPMB to judge if the design of
having one i2c-dev interface and another ipmb-dev interface is
a good solution"

I am open for discussion. My reasoning was that we need to interact
with user space so I used misc strictly to enable read/write.
Maybe we could do something similar to the i2c-slave-eeprom.c
where the eeprom_data struct uses bin_attributes?

Asmaa Mnebhi (1):
  Add support for IPMB driver

 drivers/char/ipmi/Kconfig|   8 +
 drivers/char/ipmi/Makefile   |   1 +
 drivers/char/ipmi/ipmb_dev_int.c | 386 +++
 3 files changed, 395 insertions(+)
 create mode 100644 drivers/char/ipmi/ipmb_dev_int.c

-- 
2.1.2



Re: [PATCH v3 0/1] Use HMM for ODP v3

2019-04-11 Thread Jerome Glisse
On Thu, Apr 11, 2019 at 12:29:43PM +, Leon Romanovsky wrote:
> On Wed, Apr 10, 2019 at 11:41:24AM -0400, jgli...@redhat.com wrote:
> > From: Jérôme Glisse 
> >
> > Changes since v1/v2 are about rebase and better comments in the code.
> > Previous cover letter slightly updated.
> >
> >
> > This patchset convert RDMA ODP to use HMM underneath this is motivated
> > by stronger code sharing for same feature (share virtual memory SVM or
> > Share Virtual Address SVA) and also stronger integration with mm code to
> > achieve that. It depends on HMM patchset posted for inclusion in 5.2 [2]
> > and [3].
> >
> > It has been tested with pingpong test with -o and others flags to test
> > different size/features associated with ODP.
> >
> > Moreover they are some features of HMM in the works like peer to peer
> > support, fast CPU page table snapshot, fast IOMMU mapping update ...
> > It will be easier for RDMA devices with ODP to leverage those if they
> > use HMM underneath.
> >
> > Quick summary of what HMM is:
> > HMM is a toolbox for device driver to implement software support for
> > Share Virtual Memory (SVM). Not only it provides helpers to mirror a
> > process address space on a device (hmm_mirror). It also provides
> > helper to allow to use device memory to back regular valid virtual
> > address of a process (any valid mmap that is not an mmap of a device
> > or a DAX mapping). They are two kinds of device memory. Private memory
> > that is not accessible to CPU because it does not have all the expected
> > properties (this is for all PCIE devices) or public memory which can
> > also be access by CPU without restriction (with OpenCAPI or CCIX or
> > similar cache-coherent and atomic inter-connect).
> >
> > Device driver can use each of HMM tools separatly. You do not have to
> > use all the tools it provides.
> >
> > For RDMA device i do not expect a need to use the device memory support
> > of HMM. This device memory support is geared toward accelerator like GPU.
> >
> >
> > You can find a branch [1] with all the prerequisite in. This patch is on
> > top of rdma-next with the HMM patchset [2] and mmu notifier patchset [3]
> > applied on top of it.
> >
> > [1] https://cgit.freedesktop.org/~glisse/linux/log/?h=rdma-5.2
> 
> Hi Jerome,
> 
> I took this branch and merged with our latest rdma-next, but it doesn't
> compile.
> 
> In file included from drivers/infiniband/hw/mlx5/mem.c:35:
> ./include/rdma/ib_umem_odp.h:110:20: error: field 'mirror' has incomplete type
>   struct hmm_mirror mirror;
>   ^~
> ./include/rdma/ib_umem_odp.h:132:18: warning: 'struct hmm_range' declared inside parameter list will not be visible outside of this definition or declaration
> struct hmm_range *range);
>   ^
> make[4]: *** [scripts/Makefile.build:276: drivers/infiniband/hw/mlx5/mem.o] Error 1
> 
> The reason is that in my .config, the ZONE_DEVICE, MEMORY_HOTPLUG and HMM
> options were disabled.

Silly me, I forgot to update the kconfig, so I pushed a branch with
proper kconfig changes in the ODP patch. It depends on changes
to the HMM kconfig so that HMM_MIRROR can be enabled on archs that
do not have everything for HMM_DEVICE.

https://cgit.freedesktop.org/~glisse/linux/log/?h=rdma-odp-hmm-v4

I am building various kconfig variations before posting to make
sure it is all good.

Cheers,
Jérôme


Re: [PATCH v3 0/1] Use HMM for ODP v3

2019-04-11 Thread Leon Romanovsky
On Wed, Apr 10, 2019 at 11:41:24AM -0400, jgli...@redhat.com wrote:
> From: Jérôme Glisse 
>
> Changes since v1/v2 are about rebase and better comments in the code.
> Previous cover letter slightly updated.
>
>
> This patchset convert RDMA ODP to use HMM underneath this is motivated
> by stronger code sharing for same feature (share virtual memory SVM or
> Share Virtual Address SVA) and also stronger integration with mm code to
> achieve that. It depends on HMM patchset posted for inclusion in 5.2 [2]
> and [3].
>
> It has been tested with pingpong test with -o and others flags to test
> different size/features associated with ODP.
>
> Moreover they are some features of HMM in the works like peer to peer
> support, fast CPU page table snapshot, fast IOMMU mapping update ...
> It will be easier for RDMA devices with ODP to leverage those if they
> use HMM underneath.
>
> Quick summary of what HMM is:
> HMM is a toolbox for device driver to implement software support for
> Share Virtual Memory (SVM). Not only it provides helpers to mirror a
> process address space on a device (hmm_mirror). It also provides
> helper to allow to use device memory to back regular valid virtual
> address of a process (any valid mmap that is not an mmap of a device
> or a DAX mapping). They are two kinds of device memory. Private memory
> that is not accessible to CPU because it does not have all the expected
> properties (this is for all PCIE devices) or public memory which can
> also be access by CPU without restriction (with OpenCAPI or CCIX or
> similar cache-coherent and atomic inter-connect).
>
> Device driver can use each of HMM tools separatly. You do not have to
> use all the tools it provides.
>
> For RDMA device i do not expect a need to use the device memory support
> of HMM. This device memory support is geared toward accelerator like GPU.
>
>
> You can find a branch [1] with all the prerequisite in. This patch is on
> top of rdma-next with the HMM patchset [2] and mmu notifier patchset [3]
> applied on top of it.
>
> [1] https://cgit.freedesktop.org/~glisse/linux/log/?h=rdma-5.2

Hi Jerome,

I took this branch and merged with our latest rdma-next, but it doesn't
compile.

In file included from drivers/infiniband/hw/mlx5/mem.c:35:
./include/rdma/ib_umem_odp.h:110:20: error: field 'mirror' has incomplete type
  struct hmm_mirror mirror;
  ^~
./include/rdma/ib_umem_odp.h:132:18: warning: 'struct hmm_range' declared inside parameter list will not be visible outside of this definition or declaration
struct hmm_range *range);
^
make[4]: *** [scripts/Makefile.build:276: drivers/infiniband/hw/mlx5/mem.o] Error 1

The reason is that in my .config, the ZONE_DEVICE, MEMORY_HOTPLUG and HMM
options were disabled.

Thanks




[PATCH v3 0/1] Use HMM for ODP v3

2019-04-10 Thread jglisse
From: Jérôme Glisse 

Changes since v1/v2 are about rebase and better comments in the code.
Previous cover letter slightly updated.


This patchset converts RDMA ODP to use HMM underneath. This is motivated
by stronger code sharing for the same feature (Shared Virtual Memory, SVM, or
Shared Virtual Address, SVA) and also stronger integration with mm code to
achieve that. It depends on the HMM patchset posted for inclusion in 5.2 [2]
and [3].

It has been tested with the pingpong test with -o and other flags to test
different sizes/features associated with ODP.

Moreover, there are some HMM features in the works, like peer-to-peer
support, fast CPU page table snapshot, fast IOMMU mapping update ...
It will be easier for RDMA devices with ODP to leverage those if they
use HMM underneath.

Quick summary of what HMM is:
HMM is a toolbox for device drivers to implement software support for
Shared Virtual Memory (SVM). Not only does it provide helpers to mirror a
process address space on a device (hmm_mirror), it also provides
helpers that allow device memory to back regular valid virtual
addresses of a process (any valid mmap that is not an mmap of a device
or a DAX mapping). There are two kinds of device memory: private memory
that is not accessible to the CPU because it does not have all the expected
properties (this is the case for all PCIe devices), and public memory that can
also be accessed by the CPU without restriction (with OpenCAPI, CCIX or a
similar cache-coherent and atomic interconnect).

A device driver can use each of the HMM tools separately. You do not have to
use all the tools it provides.

For RDMA devices I do not expect a need to use the device memory support
of HMM. This device memory support is geared toward accelerators like GPUs.


You can find a branch [1] with all the prerequisites in it. This patch is on
top of rdma-next with the HMM patchset [2] and mmu notifier patchset [3]
applied on top of it.

[1] https://cgit.freedesktop.org/~glisse/linux/log/?h=rdma-5.2
[2] https://lkml.org/lkml/2019/4/3/1032
[3] https://lkml.org/lkml/2019/3/26/900

Cc: linux-r...@vger.kernel.org
Cc: Jason Gunthorpe 
Cc: Leon Romanovsky 
Cc: Doug Ledford 
Cc: Artemy Kovalyov 
Cc: Moni Shoua 
Cc: Mike Marciniszyn 
Cc: Kaike Wan 
Cc: Dennis Dalessandro 

Jérôme Glisse (1):
  RDMA/odp: convert to use HMM for ODP v3

 drivers/infiniband/core/umem_odp.c | 486 -
 drivers/infiniband/hw/mlx5/mem.c   |  20 +-
 drivers/infiniband/hw/mlx5/mr.c|   2 +-
 drivers/infiniband/hw/mlx5/odp.c   | 106 ---
 include/rdma/ib_umem_odp.h |  48 ++-
 5 files changed, 219 insertions(+), 443 deletions(-)

-- 
2.20.1



Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-18 Thread John Hubbard
On 3/14/19 2:06 AM, Jan Kara wrote:
> On Wed 13-03-19 19:21:37, Christopher Lameter wrote:
>> On Wed, 13 Mar 2019, Christoph Hellwig wrote:
>>
>>> On Wed, Mar 13, 2019 at 09:11:13AM +1100, Dave Chinner wrote:
 On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote:
> IMHO I don't think that the copy_file_range() is going to carry us
> through the next wave of user performance requirements.  RDMA, while
> the first, is not the only technology which is looking to have direct
> access to files.  XDP is another.[1]

 Sure, all I doing here was demonstrating that people have been
 trying to get local direct access to file mappings to DMA directly
 into them for a long time. Direct Io games like these are now
 largely unnecessary because we now have much better APIs to do
 zero-copy data transfer between files (which can do hardware offload
 if it is available!).
>>>
>>> And that is just the file to file case.  There are tons of other
>>> users of get_user_pages, including various drivers that do large
>>> amounts of I/O like video capture.  For them it makes tons of sense
>>> to transfer directly to/from a mmap()ed file.
>>
>> That is very similar to the RDMA case and DAX etc. We need to have a way
>> to tell a filesystem that this is going to happen and that things need to
>> be setup for this to work properly.
> 
> The way to tell filesystem what's happening is exactly what we are working
> on with these patches...
> 
>> But if that has not been done then I think its proper to fail a long term
>> pin operation on page cache pages. Meaning the regular filesystems
>> maintain control of whats happening with their pages.
> 
> And as I mentioned in my other email, we cannot just fail the pin for
> pagecache pages as that would regress existing applications.
> 
>   Honza
> 

Christopher L,

Are you OK with this approach now? If so, I'd like to collect any additional
ACKs people are willing to provide, and ask Andrew to consider this first 
patch for 5.2, so we can get started.

thanks,
-- 
John Hubbard
NVIDIA


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-14 Thread John Hubbard

On 3/14/19 1:25 PM, William Kucharski wrote:
>> On Mar 14, 2019, at 7:30 AM, Jan Kara  wrote:
>>
>> Well I have some crash reports couple years old and they are not from QA
>> departments. So I'm pretty confident there are real users that use this in
>> production... and just reboot their machine in case it crashes.
>
> Do you know what the use case in those crashes actually was?
>
> I'm curious to know whether they were actually cases of say DMA from a video
> capture card or if the uses posited to date are simply theoretical.

It's not merely theoretical. In addition to Jan's bug reports, I've
personally investigated a bug that involved a GPU (acting basically as
an AI accelerator in this case) that was doing DMA to memory that turned
out to be file backed.

The backtrace for that is in the commit description.

As others have mentioned, this works well enough to lure people into
using it, but then fails when you load down a powerful system (and put
it under memory pressure).

I think that as systems get larger, and more highly threaded, we might
see more such failures--maybe even in the Direct IO case someday,
although so far that race window is so small that that one truly is
still theoretical (or, we just haven't been in communication with
anyone who hit it).

thanks,
--
John Hubbard
NVIDIA

> It's always good to know who might be doing this and why if for no other
> reason than as something to keep in mind when designing future interfaces.



Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-14 Thread William Kucharski



> On Mar 14, 2019, at 7:30 AM, Jan Kara  wrote:
> 
> Well I have some crash reports couple years old and they are not from QA
> departments. So I'm pretty confident there are real users that use this in
> production... and just reboot their machine in case it crashes.

Do you know what the use case in those crashes actually was?

I'm curious to know whether they were actually cases of, say, DMA from a video
capture card, or whether the uses posited to date are simply theoretical.

It's always good to know who might be doing this and why if for no other
reason than as something to keep in mind when designing future interfaces.



Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-14 Thread Jan Kara
On Thu 14-03-19 09:57:18, Jason Gunthorpe wrote:
> On Thu, Mar 14, 2019 at 10:03:45AM +0100, Jan Kara wrote:
> > On Wed 13-03-19 19:16:51, Christopher Lameter wrote:
> > > On Tue, 12 Mar 2019, Jerome Glisse wrote:
> > > 
> > > > > > This has been discussed extensively already. GUP usage is now
> > > > > > widespread in multiple drivers, removing that would regress
> > > > > > userspace, i.e. break existing applications. We all know what
> > > > > > the rules for that are.
> > >
> > > You are still misstating the issue. In RDMA land GUP is widely used for
> > > anonymous memory and memory based filesystems. *Not* for real filesystems.
> > 
> > Maybe in your RDMA land. But there are apparently other users which do use
> > mmap of a file on normal filesystem (e.g. ext4) as a buffer for DMA
> > (Infiniband does not prohibit this if nothing else, video capture devices
> > also use very similar pattern of gup-ing pages and using them as video
> > buffers). And these users are reporting occasional kernel crashes. That's
> > how this whole effort started. Sadly the DMA to file mmap is working good
> > enough that people started using it so at this point we cannot just tell:
> > Sorry it was a mistake to allow this, just rewrite your applications.
> 
> This is where we are in RDMA too.. People are trying it and the ones
> that do enough load testing find their kernel OOPs
> 
> So it is not clear at all if this has graduated to a real use, or just
> an experiment. Perhaps there are some system configurations that don't
> trigger crashes..

Well, I have some crash reports that are a couple of years old, and they are
not from QA departments. So I'm pretty confident there are real users that use
this in production... and just reboot their machine in case it crashes.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-14 Thread Jason Gunthorpe
On Thu, Mar 14, 2019 at 10:03:45AM +0100, Jan Kara wrote:
> On Wed 13-03-19 19:16:51, Christopher Lameter wrote:
> > On Tue, 12 Mar 2019, Jerome Glisse wrote:
> > 
> > > > > This has been discussed extensively already. GUP usage is now
> > > > > widespread in multiple drivers, removing that would regress
> > > > > userspace, i.e. break existing applications. We all know what
> > > > > the rules for that are.
> >
> > You are still misstating the issue. In RDMA land GUP is widely used for
> > anonymous memory and memory based filesystems. *Not* for real filesystems.
> 
> Maybe in your RDMA land. But there are apparently other users which do use
> mmap of a file on normal filesystem (e.g. ext4) as a buffer for DMA
> (Infiniband does not prohibit this if nothing else, video capture devices
> also use very similar pattern of gup-ing pages and using them as video
> buffers). And these users are reporting occasional kernel crashes. That's
> how this whole effort started. Sadly the DMA to file mmap is working good
> enough that people started using it so at this point we cannot just tell:
> Sorry it was a mistake to allow this, just rewrite your applications.

This is where we are in RDMA too.. People are trying it and the ones
that do enough load testing find their kernel OOPs

So it is not clear at all if this has graduated to a real use, or just
an experiment. Perhaps there are some system configurations that don't
trigger crashes..

Jason


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-14 Thread Jan Kara
On Wed 13-03-19 19:21:37, Christopher Lameter wrote:
> On Wed, 13 Mar 2019, Christoph Hellwig wrote:
> 
> > On Wed, Mar 13, 2019 at 09:11:13AM +1100, Dave Chinner wrote:
> > > On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote:
> > > > IMHO I don't think that the copy_file_range() is going to carry us
> > > > through the next wave of user performance requirements.  RDMA, while
> > > > the first, is not the only technology which is looking to have direct
> > > > access to files.  XDP is another.[1]
> > >
> > > Sure, all I was doing here was demonstrating that people have been
> > > trying to get local direct access to file mappings to DMA directly
> > > into them for a long time. Direct IO games like these are now
> > > largely unnecessary because we now have much better APIs to do
> > > zero-copy data transfer between files (which can do hardware offload
> > > if it is available!).
> >
> > And that is just the file to file case.  There are tons of other
> > users of get_user_pages, including various drivers that do large
> > amounts of I/O like video capture.  For them it makes tons of sense
> > to transfer directly to/from a mmap()ed file.
> 
> That is very similar to the RDMA case and DAX etc. We need to have a way
> to tell a filesystem that this is going to happen and that things need to
> be setup for this to work properly.

The way to tell filesystem what's happening is exactly what we are working
on with these patches...

> But if that has not been done then I think its proper to fail a long term
> pin operation on page cache pages. Meaning the regular filesystems
> maintain control of whats happening with their pages.

And as I mentioned in my other email, we cannot just fail the pin for
pagecache pages as that would regress existing applications.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-14 Thread Jan Kara
On Wed 13-03-19 19:16:51, Christopher Lameter wrote:
> On Tue, 12 Mar 2019, Jerome Glisse wrote:
> 
> > > > This has been discussed extensively already. GUP usage is now
> > > > widespread in multiple drivers, removing that would regress
> > > > userspace, i.e. break existing applications. We all know what
> > > > the rules for that are.
> 
> You are still misstating the issue. In RDMA land GUP is widely used for
> anonymous memory and memory based filesystems. *Not* for real filesystems.

Maybe in your RDMA land. But there are apparently other users which do use
mmap of a file on a normal filesystem (e.g. ext4) as a buffer for DMA
(Infiniband does not prohibit this, if nothing else; video capture devices
also use a very similar pattern of gup-ing pages and using them as video
buffers). And these users are reporting occasional kernel crashes. That's
how this whole effort started. Sadly the DMA to file mmap is working well
enough that people started using it, so at this point we cannot just say:
Sorry it was a mistake to allow this, just rewrite your applications.

Plus we have O_DIRECT IO which can use file mmap as a buffer and as Dave
Chinner mentioned there are real applications using this.

So no, we are not going to get away with "just forbid GUP for file backed
pages" which seems to be what you suggest. We might get away with that for
*some* GUP users and you are welcome to do that in the drivers you care
about but definitely not for all.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-13 Thread Jerome Glisse
On Wed, Mar 13, 2019 at 07:16:51PM +, Christopher Lameter wrote:
> On Tue, 12 Mar 2019, Jerome Glisse wrote:
> 
> > > > This has been discussed extensively already. GUP usage is now
> > > > widespread in multiple drivers, removing that would regress
> > > > userspace, i.e. break existing applications. We all know what
> > > > the rules for that are.
> 
> You are still misstating the issue. In RDMA land GUP is widely used for
> anonymous memory and memory based filesystems. *Not* for real filesystems.

Then why are there bug reports, such as the one pointed out in the cover
letter? It means someone is doing GUP on a filesystem. Moreover, looking at
the RDMA driver, I do not see anything that checks that the VA for GUP
belongs to a vma that is not backed by a regular file.

> 
> > > Because someone was able to get away with weird ways of abusing the system
> > > it not an argument that we should continue to allow such things. In fact
> > > we have repeatedly ensured that the kernel works reliably by improving the
> > > kernel so that a proper failure is occurring.
> >
> > Driver doing GUP on mmap of regular file is something that seems to
> > already have widespread user (in the RDMA devices at least). So they
> > are active users and they were never told that what they are doing
> > was illegal.
> 
> Not true. Again please differentiate the use cases between regular
> filesystem and anonymous mappings.

Again, where does the bug come from? Where in RDMA is the check that the
VA belongs to a vma that is not backed by a file?

> 
> > > Well swapout cannot occur if the page is pinned and those pages are also
> > > often mlocked.
> >
> > I would need to check the swapout code but i believe the write to disk
> > can happen before the pin checks happens. I believe the event flow is:
> > map read only, allocate swap, write to disk, try to free page which
> > checks for pin. So that you could write stale data to disk and the GUP
> > going away before you perform the pin checks.
> 
> Allocate swap is a separate step that associates a swap entry to an
> anonymous page.
> 
> > They are other thing to take into account and that need proper page
> > dirtying, like soft dirtyness for instance.
> 
> RDMA mapped pages are all dirty all the time.

The point is that neither the pte dirty bit nor the soft dirty bit may be
accurate, because a GUP user does not update those bits; thus the GUP user
needs to call set_page_dirty() or similar to properly report page dirtiness.

> > Well RDMA driver maintainer seems to report that this has been a valid
> > and working workload for their users.
> 
> No they dont.
> 
> Could you please get up to date on the discussion before posting?

Again, why are there bug reports? Where is the code in RDMA that checks
that the VA does not belong to a vma that is backed by a file?

As much as I would like this use case not to exist, I fear it does, and
it has been upstream for a while. This also very much applies to
O_DIRECT, whether you like it or not.

Cheers,
Jérôme


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-13 Thread Christopher Lameter
On Wed, 13 Mar 2019, Christoph Hellwig wrote:

> On Wed, Mar 13, 2019 at 09:11:13AM +1100, Dave Chinner wrote:
> > On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote:
> > > IMHO I don't think that the copy_file_range() is going to carry us
> > > through the next wave of user performance requirements.  RDMA, while
> > > the first, is not the only technology which is looking to have direct
> > > access to files.  XDP is another.[1]
> >
> > Sure, all I was doing here was demonstrating that people have been
> > trying to get local direct access to file mappings to DMA directly
> > into them for a long time. Direct IO games like these are now
> > largely unnecessary because we now have much better APIs to do
> > zero-copy data transfer between files (which can do hardware offload
> > if it is available!).
>
> And that is just the file to file case.  There are tons of other
> users of get_user_pages, including various drivers that do large
> amounts of I/O like video capture.  For them it makes tons of sense
> to transfer directly to/from a mmap()ed file.

That is very similar to the RDMA case and DAX etc. We need to have a way
to tell a filesystem that this is going to happen and that things need to
be set up for this to work properly.

But if that has not been done then I think it's proper to fail a long term
pin operation on page cache pages. Meaning the regular filesystems
maintain control of what's happening with their pages.


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-13 Thread Christopher Lameter
On Tue, 12 Mar 2019, Jerome Glisse wrote:

> > > This has been discussed extensively already. GUP usage is now widespread in
> > > multiple drivers, removing that would regress userspace, i.e. break existing
> > > applications. We all know what the rules for that are.

You are still misstating the issue. In RDMA land GUP is widely used for
anonymous memory and memory based filesystems. *Not* for real filesystems.

> > Because someone was able to get away with weird ways of abusing the system
> > it not an argument that we should continue to allow such things. In fact
> > we have repeatedly ensured that the kernel works reliably by improving the
> > kernel so that a proper failure is occurring.
>
> Driver doing GUP on mmap of regular file is something that seems to
> already have widespread user (in the RDMA devices at least). So they
> are active users and they were never told that what they are doing
> was illegal.

Not true. Again, please differentiate the use cases between regular
filesystems and anonymous mappings.

> > Well swapout cannot occur if the page is pinned and those pages are also
> > often mlocked.
>
> I would need to check the swapout code but i believe the write to disk
> can happen before the pin checks happens. I believe the event flow is:
> map read only, allocate swap, write to disk, try to free page which
> checks for pin. So that you could write stale data to disk and the GUP
> going away before you perform the pin checks.

Allocate swap is a separate step that associates a swap entry to an
anonymous page.

> They are other thing to take into account and that need proper page
> dirtying, like soft dirtyness for instance.

RDMA mapped pages are all dirty all the time.

> Well RDMA driver maintainer seems to report that this has been a valid
> and working workload for their users.

No, they don't.

Could you please get up to date on the discussion before posting?


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-13 Thread Christoph Hellwig
On Wed, Mar 13, 2019 at 09:11:13AM +1100, Dave Chinner wrote:
> On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote:
> > IMHO I don't think that the copy_file_range() is going to carry us
> > through the next wave of user performance requirements.  RDMA, while
> > the first, is not the only technology which is looking to have direct
> > access to files.  XDP is another.[1]
> 
> Sure, all I was doing here was demonstrating that people have been
> trying to get local direct access to file mappings to DMA directly
> into them for a long time. Direct IO games like these are now
> largely unnecessary because we now have much better APIs to do
> zero-copy data transfer between files (which can do hardware offload
> if it is available!).

And that is just the file to file case.  There are tons of other
users of get_user_pages, including various drivers that do large
amounts of I/O like video capture.  For them it makes tons of sense
to transfer directly to/from a mmap()ed file.


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-12 Thread Ira Weiny
On Wed, Mar 13, 2019 at 09:11:13AM +1100, Dave Chinner wrote:
> On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote:
> > IMHO I don't think that the copy_file_range() is going to carry us
> > through the next wave of user performance requirements.  RDMA, while
> > the first, is not the only technology which is looking to have direct
> > access to files.  XDP is another.[1]
> 
> Sure, all I was doing here was demonstrating that people have been
> trying to get local direct access to file mappings to DMA directly
> into them for a long time. Direct IO games like these are now
> largely unnecessary because we now have much better APIs to do
> zero-copy data transfer between files (which can do hardware offload
> if it is available!).
> 
> It's the long term pins that RDMA does that are the problem here.
> I'm assuming that for XDP, you're talking about userspace zero copy
> from files to the network hardware and vice versa? Transmit is
> simple (read-only mapping), but receive probably requires bpf
> programs to ensure that data (minus headers) in the incoming packet
> stream is correctly placed into the UMEM region?

Yes, exactly.

> 
> XDP receive seems pretty much like the same problem as RDMA writes
> into the file. i.e.  the incoming write DMAs are going to have to
> trigger page faults if the UMEM is a long term pin so the filesystem
> behaves correctly with this remote data placement.  I'd suggest that
> RDMA, XDP and any other hardware that is going to pin
> file-backed mappings for the long term need to use the same "inform
> the fs of a write operation into its mapping" mechanisms...

Yes agreed.  I have a hack patch I'm testing right now which allows the user to
take a LAYOUT lease from user space and GUP triggers on that, either allowing
or rejecting the pin based on the lease.  I think this is the first step of
what Jan suggested.[1]  There is a lot more detail to work out with what
happens if that lease needs to be broken.

> 
> And if we start talking about wanting to do peer-to-peer DMA from
> network/GPU device to storage device without going through a
> file-backed CPU mapping, we still need to have the filesystem
> involved to translate file offsets to storage locations the
> filesystem has allocated for the data and to lock them down for as
> long as the peer-to-peer DMA offload is in place.  In effect, this
> is the same problem as RDMA+FS-DAXs - the filesystem owns the file
> offset to storage location mapping and manages storage access
> arbitration, not the mm/vma mapping presented to userspace

I've only daydreamed about Peer-to-peer transfers.  But yes I think this is the
direction we need to go.  But the details of doing a

GPU -> RDMA -> {network } -> RDMA -> FS DAX

And back again... without CPU/OS involvement are only a twinkle in my eye...
If that.

Ira

[1] https://lore.kernel.org/lkml/20190212160707.ga19...@quack2.suse.cz/



Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-12 Thread Dave Chinner
On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote:
> IMHO I don't think that the copy_file_range() is going to carry us through the
> next wave of user performance requirements.  RDMA, while the first, is not the
> only technology which is looking to have direct access to files.  XDP is
> another.[1]

Sure, all I was doing here was demonstrating that people have been
trying to get local direct access to file mappings to DMA directly
into them for a long time. Direct IO games like these are now
largely unnecessary because we now have much better APIs to do
zero-copy data transfer between files (which can do hardware offload
if it is available!).

It's the long term pins that RDMA does that are the problem here.
I'm assuming that for XDP, you're talking about userspace zero copy
from files to the network hardware and vice versa? Transmit is
simple (read-only mapping), but receive probably requires bpf
programs to ensure that data (minus headers) in the incoming packet
stream is correctly placed into the UMEM region?

XDP receive seems pretty much like the same problem as RDMA writes
into the file. i.e.  the incoming write DMAs are going to have to
trigger page faults if the UMEM is a long term pin so the filesystem
behaves correctly with this remote data placement.  I'd suggest that
RDMA, XDP and any other hardware that is going to pin
file-backed mappings for the long term need to use the same "inform
the fs of a write operation into its mapping" mechanisms...

And if we start talking about wanting to do peer-to-peer DMA from
network/GPU device to storage device without going through a
file-backed CPU mapping, we still need to have the filesystem
involved to translate file offsets to storage locations the
filesystem has allocated for the data and to lock them down for as
long as the peer-to-peer DMA offload is in place.  In effect, this
is the same problem as RDMA+FS-DAX - the filesystem owns the file
offset to storage location mapping and manages storage access
arbitration, not the mm/vma mapping presented to userspace.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
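
For readers unfamiliar with the "much better APIs" mentioned above, the
following is a minimal userspace sketch of a file-to-file copy built on
copy_file_range(2). It is not code from this thread. The helper names
copy_whole_file() and copy_with_buffer(), the 1 GiB chunk size, and the
choice of which errno values trigger the plain read()/write() fallback are
all assumptions made for illustration; it also assumes a glibc new enough to
provide the copy_file_range() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size: how much we ask the kernel to copy per call. */
#define CHUNK ((size_t)1 << 30)

/* Plain read()/write() loop, used only when copy_file_range() is unusable. */
static ssize_t copy_with_buffer(int src, int dst)
{
    char buf[65536];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(src, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return total;               /* end of file */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}

/*
 * Copy all of src_path into dst_path. The data moves inside the kernel, so
 * no userspace buffer (and no GUP pin on one) is involved on the fast path.
 * Returns the number of bytes copied, or -1 on error.
 */
static ssize_t copy_whole_file(const char *src_path, const char *dst_path)
{
    assert(src_path != NULL && dst_path != NULL);

    int src = open(src_path, O_RDONLY);
    if (src < 0)
        return -1;
    int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst < 0) {
        close(src);
        return -1;
    }

    ssize_t total = 0;
    for (;;) {
        /* NULL offsets: use and advance each descriptor's file position. */
        ssize_t n = copy_file_range(src, NULL, dst, NULL, CHUNK, 0);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            break;                      /* end of source file */
        if (errno == EINTR)
            continue;
        if (errno == ENOSYS || errno == EXDEV ||
            errno == EINVAL || errno == EOPNOTSUPP) {
            /* Kernel or filesystem cannot do it: finish with a buffer. */
            ssize_t rest = copy_with_buffer(src, dst);
            total = rest < 0 ? -1 : total + rest;
        } else {
            total = -1;
        }
        break;
    }

    close(src);
    close(dst);
    return total;
}
```

Because both paths work from the descriptors' current file positions, the
fallback can pick up wherever a partially successful copy_file_range() run
stopped. Note that this only covers the file-to-file case; it does not help
the long term pin users (RDMA, XDP) discussed in the rest of the message.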


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-12 Thread Ira Weiny
On Tue, Mar 12, 2019 at 05:23:21AM +, Christopher Lameter wrote:
> On Mon, 11 Mar 2019, Dave Chinner wrote:
> 
> > > Direct IO on a mmapped file backed page doesnt make any sense.
> >
> > People have used it for many, many years as zero-copy data movement
> > pattern. i.e. mmap the destination file, use direct IO to DMA direct
> > into the destination file page cache pages, fdatasync() to force
> > writeback of the destination file.
> 
> Well we could make that more safe through a special API that designates a
> range of pages in a file in the same way as for RDMA. This is inherently
> not reliable as we found out.

I'm not following.  What API was not reliable?  In [2] we had ideas on such an
API but AFAIK these have not been tried.

From what I have seen the above is racy and is prone to the issues John has
seen.  The difference is that Direct IO has a smaller window than RDMA.  (Or at
least I thought we already established that?)

"And also remember that while RDMA might be the case at least some
people care about here it really isn't different from any of the other
gup + I/O cases, including doing direct I/O to a mmap area.  The only
difference in the various cases is how long the area should be pinned
down..."

-- Christoph Hellwig : https://lkml.org/lkml/2018/10/1/591

> 
> > Now we have copy_file_range() to optimise this sort of data
> > movement, the need for games with mmap+direct IO largely goes away.
> > However, we still can't just remove that functionality as it will
> > break lots of random userspace stuff...
> 
> It is already broken and unreliable. Are there really "lots" of these
> things around? Can we test this by adding a warning in the kernel and see
> where it actually crops up?

IMHO I don't think that the copy_file_range() is going to carry us through the
next wave of user performance requirements.  RDMA, while the first, is not the
only technology which is looking to have direct access to files.  XDP is
another.[1]

Ira

[1] https://www.kernel.org/doc/html/v4.19-rc1/networking/af_xdp.html
[2] 
https://lore.kernel.org/lkml/20190205175059.gb21...@iweiny-desk2.sc.intel.com/



Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-12 Thread Jason Gunthorpe
On Tue, Mar 12, 2019 at 11:35:29AM -0400, Jerome Glisse wrote:
> > > > Yes you now have the filesystem as well as the GUP pinner claiming
> > > > authority over the contents of a single memory segment. Maybe better not
> > > > allow that?
> > >
> > > This goes back to regressing existing driver with existing users.
> > 
> > There is no regression if that behavior never really worked.
> 
> Well RDMA driver maintainer seems to report that this has been a valid
> and working workload for their users.

I think it is more O_DIRECT that is the history here..

In RDMA land long term GUPs of file backed pages tend to crash the
kernel (what John is trying to fix here) so I'm not sure there are
actual real & tested users, only people that wish they could do this..

Jason


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-12 Thread Jerome Glisse
On Tue, Mar 12, 2019 at 04:52:07AM +, Christopher Lameter wrote:
> On Fri, 8 Mar 2019, Jerome Glisse wrote:
> 
> > >
> > > It would be good if that understanding would be enforced somehow given the
> > > problems that we see.
> >
> > This has been discussed extensively already. GUP usage is now widespread in
> > multiple drivers, removing that would regress userspace, i.e. break existing
> > applications. We all know what the rules for that are.
> 
> The applications that work are using anonymous memory and memory
> filesystems. I have never seen use cases with a real filesystem and would
> have objected if someone tried something crazy like that.
> 
> Because someone was able to get away with weird ways of abusing the system
> it not an argument that we should continue to allow such things. In fact
> we have repeatedly ensured that the kernel works reliably by improving the
> kernel so that a proper failure is occurring.

Drivers doing GUP on an mmap of a regular file is something that seems to
already have widespread use (in the RDMA devices at least). So there
are active users, and they were never told that what they are doing
was illegal.

Note that I am personally fine with breaking device drivers that cannot
abide by mmu notifiers, but the consensus seems to be that it is not fine
to do so.

> > > > In fact, the GUP documentation even recommends that pattern.
> > >
> > > Isnt that pattern safe for anonymous memory and memory filesystems like
> > > hugetlbfs etc? Which is the common use case.
> >
> > Still an issue in respect to swapout ie if anon/shmem page was map
> > read only in preparation for swapout and we do not report the page
> > > as dirty what ends up in swap might lack what was written last through
> > GUP.
> 
> Well swapout cannot occur if the page is pinned and those pages are also
> often mlocked.

I would need to check the swapout code, but I believe the write to disk
can happen before the pin check happens. I believe the event flow is:
map read only, allocate swap, write to disk, try to free the page, which
checks for a pin. So you could write stale data to disk, with the GUP
going away before you perform the pin check.

There are other things to take into account that need proper page
dirtying, like soft dirtiness for instance.


> > >
> > > Yes you now have the filesystem as well as the GUP pinner claiming
> > > authority over the contents of a single memory segment. Maybe better not
> > > allow that?
> >
> > This goes back to regressing existing driver with existing users.
> 
> There is no regression if that behavior never really worked.

Well, the RDMA driver maintainers seem to report that this has been a valid
and working workload for their users.


> > > Two filesystem trying to sync one memory segment both believing to have
> > > exclusive access and we want to sort this out. Why? Dont allow this.
> >
> > This is allowed, it always was, forbidding that case now would regress
> > existing application and it would also means that we are modifying the
> > API we expose to userspace. So again this is not something we can block
> > without regressing existing user.
> 
> We have always stopped the user from doing obviously stupid and risky
> things. It would be logical to do it here as well.

While I would rather only allow devices that can handle mmu notifiers,
it is just not acceptable to regress existing users, and they do seem
to exist and to have had working setups for a while.

Cheers,
Jérôme


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-11 Thread Christopher Lameter
On Mon, 11 Mar 2019, Dave Chinner wrote:

> > Direct IO on a mmapped file backed page doesnt make any sense.
>
> People have used it for many, many years as zero-copy data movement
> pattern. i.e. mmap the destination file, use direct IO to DMA direct
> into the destination file page cache pages, fdatasync() to force
> writeback of the destination file.

Well, we could make that safer through a special API that designates a
range of pages in a file in the same way as for RDMA. This is inherently
not reliable as we found out.

> Now we have copy_file_range() to optimise this sort of data
> movement, the need for games with mmap+direct IO largely goes away.
> However, we still can't just remove that functionality as it will
> break lots of random userspace stuff...

It is already broken and unreliable. Are there really "lots" of these
things around? Can we test this by adding a warning in the kernel and see
where it actually crops up?
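
For reference, the zero-copy pattern Dave describes in the quoted text above
(mmap the destination file, use direct IO to read straight into the mapped
pages, then fdatasync() the destination) can be sketched in userspace C
roughly as follows. This is a minimal sketch, not code from the thread: the
helper name copy_mmap_dio() is hypothetical, `len` is assumed to be a
multiple of the device block size as O_DIRECT requires, and the buffered
fallback for filesystems that reject O_DIRECT is an assumption added so the
example runs anywhere. It illustrates the pattern under discussion; it does
not make the pattern safe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the first `len` bytes of src_path into dst_path by reading the
 * source with O_DIRECT directly into a shared mapping of the destination.
 * Returns 0 on success, -1 on error.
 */
static int copy_mmap_dio(const char *src_path, const char *dst_path, size_t len)
{
    int ret = -1;
    size_t done = 0;
    void *map;

    assert(src_path != NULL && dst_path != NULL && len > 0);

    /* Direct IO on the source: the device DMAs into our buffer. */
    int src = open(src_path, O_RDONLY | O_DIRECT);
    if (src < 0 && errno == EINVAL)
        src = open(src_path, O_RDONLY);  /* no O_DIRECT here: buffered */
    if (src < 0)
        return -1;

    int dst = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (dst < 0)
        goto out_src;

    /* The mapping must be backed by file blocks, so size the file first. */
    if (ftruncate(dst, (off_t)len) < 0)
        goto out_dst;

    /* File-backed, page-aligned buffer: these are page cache pages. */
    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, dst, 0);
    if (map == MAP_FAILED)
        goto out_dst;

    /* The kernel pins the mapped pages (GUP) for the duration of each read. */
    while (done < len) {
        ssize_t n = read(src, (char *)map + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;
        done += (size_t)n;
    }

    /* Force writeback of the destination's now-dirty page cache pages. */
    if (done == len && fdatasync(dst) == 0)
        ret = 0;

    munmap(map, len);
out_dst:
    close(dst);
out_src:
    close(src);
    return ret;
}
```

The read() into `map` is the step where get_user_pages() hands file-backed
pages to the block layer, which is exactly the interaction with writeback
that the cover letter's "Direct IO" section is concerned about.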


Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-11 Thread Christopher Lameter
On Fri, 8 Mar 2019, Jerome Glisse wrote:

> >
> > It would be good if that understanding would be enforced somehow given the
> > problems that we see.
>
> This has been discussed extensively already. GUP usage is now widespread in
> multiple drivers, removing that would regress userspace, i.e. break existing
> applications. We all know what the rules for that are.

The applications that work are using anonymous memory and memory
filesystems. I have never seen use cases with a real filesystem and would
have objected if someone tried something crazy like that.

That someone was able to get away with weird ways of abusing the system
is not an argument that we should continue to allow such things. In fact
we have repeatedly ensured that the kernel works reliably by improving the
kernel so that a proper failure is occurring.


> > > In fact, the GUP documentation even recommends that pattern.
> >
> > Isnt that pattern safe for anonymous memory and memory filesystems like
> > hugetlbfs etc? Which is the common use case.
>
> Still an issue with respect to swapout, i.e. if an anon/shmem page was mapped
> read-only in preparation for swapout and we do not report the page
> as dirty, what ends up in swap might lack what was written last through
> GUP.

Well swapout cannot occur if the page is pinned and those pages are also
often mlocked.

> >
> > Yes you now have the filesystem as well as the GUP pinner claiming
> > authority over the contents of a single memory segment. Maybe better not
> > allow that?
>
> This goes back to regressing existing drivers with existing users.

There is no regression if that behavior never really worked.

> > Two filesystems trying to sync one memory segment, both believing they have
> > exclusive access, and we want to sort this out. Why? Don't allow this.
>
> This is allowed, it always was, forbidding that case now would regress
> existing applications and it would also mean that we are modifying the
> API we expose to userspace. So again this is not something we can block
> without regressing existing users.

We have always stopped the user from doing obviously stupid and risky
things. It would be logical to do it here as well.



Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-10 Thread Dave Chinner
On Fri, Mar 08, 2019 at 03:08:40AM +, Christopher Lameter wrote:
> On Wed, 6 Mar 2019, john.hubb...@gmail.com wrote:
> > Direct IO
> > =========
> >
> > Direct IO can cause corruption, if userspace does Direct-IO that writes to
> > a range of virtual addresses that are mmap'd to a file.  The pages written
> > to are file-backed pages that can be under write back, while the Direct IO
> > is taking place.  Here, Direct IO races with a write back: it calls
> > GUP before page_mkclean() has replaced the CPU pte with a read-only entry.
> > The race window is pretty small, which is probably why years have gone by
> > before we noticed this problem: Direct IO is generally very quick, and
> > tends to finish up before the filesystem gets around to doing anything with
> > the page contents.  However, it's still a real problem.  The solution is
> > to never let GUP return pages that are under write back, but instead,
> > force GUP to take a write fault on those pages.  That way, GUP will
> > properly synchronize with the active write back.  This does not change the
> > required GUP behavior, it just avoids that race.
> 
> > Direct IO on a mmapped file backed page doesn't make any sense.

People have used it for many, many years as a zero-copy data movement
pattern. i.e. mmap the destination file, use direct IO to DMA direct
into the destination file page cache pages, fdatasync() to force
writeback of the destination file.

Now we have copy_file_range() to optimise this sort of data
movement, the need for games with mmap+direct IO largely goes away.
However, we still can't just remove that functionality as it will
break lots of random userspace stuff...
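
For illustration only, here is a minimal userspace sketch of the
copy_file_range() + fdatasync() approach mentioned above. The helper name
copy_whole_file() is hypothetical and not taken from any existing code; the
sketch assumes Linux with glibc 2.27 or later, that both descriptors refer to
regular files, and that the source file position is at the start of the data
to be copied.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name is an assumption, not an existing API):
 * copy src_fd into dst_fd inside the kernel, without bouncing the data
 * through a userspace buffer, then force writeback of the destination.
 * Returns 0 on success, -1 on error with errno set.
 */
int copy_whole_file(int src_fd, int dst_fd)
{
    struct stat st;
    off_t remaining;

    if (fstat(src_fd, &st) < 0)
        return -1;

    remaining = st.st_size;
    while (remaining > 0) {
        /* NULL offsets: use and advance each descriptor's file position. */
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL,
                                    (size_t)remaining, 0);
        if (n < 0)
            return -1;
        if (n == 0)     /* hit EOF early, e.g. the source shrank */
            break;
        remaining -= n;
    }

    /* Same final step as the old pattern: push the data to stable storage. */
    return fdatasync(dst_fd);
}
```

Unlike the mmap+direct IO pattern, nothing here pins file-backed pages
through GUP, so the writeback race discussed in this thread does not arise on
this path.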

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

