Re: [PATCH] time: Fix overwrite err unexpected in clock_adjtime32
On Mon, Apr 12, 2021 at 02:52:11PM +, chenjun (AM) wrote:
> On 2021/4/12 22:20, Richard Cochran wrote:
> > On Mon, Apr 12, 2021 at 12:45:51PM +, Chen Jun wrote:
> >> the correct error is covered by put_old_timex32.
> >
> > Well, the non-negative return code (TIME_OK, TIME_INS, etc) is
> > clobbered by put_old_timex32().
> >
> >> Fixes: f1f1d5ebd10f ("posix-timers: Introduce a syscall for clock tuning.")
> >
> > This is not the correct commit for the "Fixes" tag.  Please find the
> > actual commit that introduced the issue.
> >
> > In commit f1f1d5ebd10f the code looked like this...
> >
> > long compat_sys_clock_adjtime(clockid_t which_clock,
> > 		struct compat_timex __user *utp)
> > {
> > 	struct timex txc;
> > 	mm_segment_t oldfs;
> > 	int err, ret;
> >
> > 	err = compat_get_timex(, utp);
> > 	if (err)
> > 		return err;
> >
> > 	oldfs = get_fs();
> > 	set_fs(KERNEL_DS);
> > 	ret = sys_clock_adjtime(which_clock, (struct timex __user *) );
> > 	set_fs(oldfs);
> >
> > 	err = compat_put_timex(utp, );
> > 	if (err)
> > 		return err;
> >
> > 	return ret;
> > }

Look at the code ^^^

> The implementation of clock_adjtime32 is similar to compat_sys_clock_adjtime.
> And I think f1f1d5ebd10 introduced the problem actually.

See how 'ret' and 'err' are two separate variables?  It makes a difference.

Thanks,
Richard
Re: [PATCH] time: Fix overwrite err unexpected in clock_adjtime32
On 2021/4/12 22:20, Richard Cochran wrote:
> On Mon, Apr 12, 2021 at 12:45:51PM +, Chen Jun wrote:
>> the correct error is covered by put_old_timex32.
>
> Well, the non-negative return code (TIME_OK, TIME_INS, etc) is
> clobbered by put_old_timex32().
>
>> Fixes: f1f1d5ebd10f ("posix-timers: Introduce a syscall for clock tuning.")
>
> This is not the correct commit for the "Fixes" tag.  Please find the
> actual commit that introduced the issue.
>
> In commit f1f1d5ebd10f the code looked like this...
>
> long compat_sys_clock_adjtime(clockid_t which_clock,
> 		struct compat_timex __user *utp)
> {
> 	struct timex txc;
> 	mm_segment_t oldfs;
> 	int err, ret;
>
> 	err = compat_get_timex(, utp);
> 	if (err)
> 		return err;
>
> 	oldfs = get_fs();
> 	set_fs(KERNEL_DS);
> 	ret = sys_clock_adjtime(which_clock, (struct timex __user *) );
> 	set_fs(oldfs);
>
> 	err = compat_put_timex(utp, );
> 	if (err)
> 		return err;
>
> 	return ret;
> }
>

f1f1d5ebd10: Introduce compat_sys_clock_adjtime
62a6fa97684: rename compat_sys_clock_adjtime to COMPAT_SYSCALL_DEFINE2(clock_adjtime
3a4d44b6162: move COMPAT_SYSCALL_DEFINE2(clock_adjtime from kernel/compat.c to kernel/time/posix-timers.c
8dabe7245bb: COMPAT_SYSCALL_DEFINE2(clock_adjtime, .. -> SYSCALL_DEFINE2(clock_adjtime32, ..

The implementation of clock_adjtime32 is similar to compat_sys_clock_adjtime.
And I think f1f1d5ebd10 introduced the problem actually.

> Thanks,
> Richard
>
>> Signed-off-by: Chen Jun
>> ---
>>  kernel/time/posix-timers.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
>> index bf540f5a..dd5697d 100644
>> --- a/kernel/time/posix-timers.c
>> +++ b/kernel/time/posix-timers.c
>> @@ -1191,8 +1191,8 @@ SYSCALL_DEFINE2(clock_adjtime32, clockid_t, which_clock,
>>
>>  	err = do_clock_adjtime(which_clock, );
>>
>> -	if (err >= 0)
>> -		err = put_old_timex32(utp, );
>> +	if (err >= 0 && put_old_timex32(utp, ))
>> +		return -EFAULT;
>>
>>  	return err;
>>  }
>> --
>> 2.9.4
>>
>

--
Regards
Chen Jun
Re: [PATCH] time: Fix overwrite err unexpected in clock_adjtime32
On Mon, Apr 12, 2021 at 12:45:51PM +, Chen Jun wrote:
> the correct error is covered by put_old_timex32.

Well, the non-negative return code (TIME_OK, TIME_INS, etc) is
clobbered by put_old_timex32().

> Fixes: f1f1d5ebd10f ("posix-timers: Introduce a syscall for clock tuning.")

This is not the correct commit for the "Fixes" tag.  Please find the
actual commit that introduced the issue.

In commit f1f1d5ebd10f the code looked like this...

long compat_sys_clock_adjtime(clockid_t which_clock,
		struct compat_timex __user *utp)
{
	struct timex txc;
	mm_segment_t oldfs;
	int err, ret;

	err = compat_get_timex(, utp);
	if (err)
		return err;

	oldfs = get_fs();
	set_fs(KERNEL_DS);
	ret = sys_clock_adjtime(which_clock, (struct timex __user *) );
	set_fs(oldfs);

	err = compat_put_timex(utp, );
	if (err)
		return err;

	return ret;
}

Thanks,
Richard

> Signed-off-by: Chen Jun
> ---
>  kernel/time/posix-timers.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index bf540f5a..dd5697d 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -1191,8 +1191,8 @@ SYSCALL_DEFINE2(clock_adjtime32, clockid_t, which_clock,
>
>  	err = do_clock_adjtime(which_clock, );
>
> -	if (err >= 0)
> -		err = put_old_timex32(utp, );
> +	if (err >= 0 && put_old_timex32(utp, ))
> +		return -EFAULT;
>
>  	return err;
>  }
> --
> 2.9.4
>
Re: [PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces
On 3/31/21 5:48 PM, Christophe Leroy wrote: > This patch adds the necessary glue to provide time namespaces. > > Things are mainly copied from ARM64. > > __arch_get_timens_vdso_data() calculates timens vdso data position > based on the vdso data position, knowing it is the next page in vvar. > This avoids having to redo the mflr/bcl/mflr/mtlr dance to locate > the page relative to running code position. > > Signed-off-by: Christophe Leroy Reviewed-by: Vincenzo Frascino # vDSO parts > --- > arch/powerpc/Kconfig | 3 +- > arch/powerpc/include/asm/vdso/gettimeofday.h | 10 ++ > arch/powerpc/include/asm/vdso_datapage.h | 2 - > arch/powerpc/kernel/vdso.c | 116 --- > arch/powerpc/kernel/vdso32/vdso32.lds.S | 2 +- > arch/powerpc/kernel/vdso64/vdso64.lds.S | 2 +- > 6 files changed, 114 insertions(+), 21 deletions(-) > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index c1344c05226c..71daff5f15d5 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -172,6 +172,7 @@ config PPC > select GENERIC_CPU_AUTOPROBE > select GENERIC_CPU_VULNERABILITIES if PPC_BARRIER_NOSPEC > select GENERIC_EARLY_IOREMAP > + select GENERIC_GETTIMEOFDAY > select GENERIC_IRQ_SHOW > select GENERIC_IRQ_SHOW_LEVEL > select GENERIC_PCI_IOMAPif PCI > @@ -179,7 +180,7 @@ config PPC > select GENERIC_STRNCPY_FROM_USER > select GENERIC_STRNLEN_USER > select GENERIC_TIME_VSYSCALL > - select GENERIC_GETTIMEOFDAY > + select GENERIC_VDSO_TIME_NS > select HAVE_ARCH_AUDITSYSCALL > select HAVE_ARCH_HUGE_VMAP if PPC_BOOK3S_64 && > PPC_RADIX_MMU > select HAVE_ARCH_JUMP_LABEL > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h > b/arch/powerpc/include/asm/vdso/gettimeofday.h > index d453e725c79f..e448df1dd071 100644 > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h > @@ -2,6 +2,8 @@ > #ifndef _ASM_POWERPC_VDSO_GETTIMEOFDAY_H > #define _ASM_POWERPC_VDSO_GETTIMEOFDAY_H > > +#include > + > #ifdef __ASSEMBLY__ > > #include > @@ 
-153,6 +155,14 @@ static __always_inline u64 __arch_get_hw_counter(s32 > clock_mode, > > const struct vdso_data *__arch_get_vdso_data(void); > > +#ifdef CONFIG_TIME_NS > +static __always_inline > +const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data > *vd) > +{ > + return (void *)vd + PAGE_SIZE; > +} > +#endif > + > static inline bool vdso_clocksource_ok(const struct vdso_data *vd) > { > return true; > diff --git a/arch/powerpc/include/asm/vdso_datapage.h > b/arch/powerpc/include/asm/vdso_datapage.h > index 3f958ecf2beb..a585c8e538ff 100644 > --- a/arch/powerpc/include/asm/vdso_datapage.h > +++ b/arch/powerpc/include/asm/vdso_datapage.h > @@ -107,9 +107,7 @@ extern struct vdso_arch_data *vdso_data; > bcl 20, 31, .+4 > 999: > mflr\ptr > -#if CONFIG_PPC_PAGE_SHIFT > 14 > addis \ptr, \ptr, (_vdso_datapage - 999b)@ha > -#endif > addi\ptr, \ptr, (_vdso_datapage - 999b)@l > .endm > > diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c > index b14907209822..717f2c9a7573 100644 > --- a/arch/powerpc/kernel/vdso.c > +++ b/arch/powerpc/kernel/vdso.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > #include > > #include > @@ -50,6 +51,12 @@ static union { > } vdso_data_store __page_aligned_data; > struct vdso_arch_data *vdso_data = _data_store.data; > > +enum vvar_pages { > + VVAR_DATA_PAGE_OFFSET, > + VVAR_TIMENS_PAGE_OFFSET, > + VVAR_NR_PAGES, > +}; > + > static int vdso_mremap(const struct vm_special_mapping *sm, struct > vm_area_struct *new_vma, > unsigned long text_size) > { > @@ -73,8 +80,12 @@ static int vdso64_mremap(const struct vm_special_mapping > *sm, struct vm_area_str > return vdso_mremap(sm, new_vma, _end - _start); > } > > +static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, > + struct vm_area_struct *vma, struct vm_fault *vmf); > + > static struct vm_special_mapping vvar_spec __ro_after_init = { > .name = "[vvar]", > + .fault = vvar_fault, > }; > > static struct vm_special_mapping 
vdso32_spec __ro_after_init = { > @@ -87,6 +98,94 @@ static struct vm_special_mapping vdso64_spec > __ro_after_init = { > .mremap = vdso64_mremap, > }; > > +#ifdef CONFIG_TIME
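The address arithmetic behind __arch_get_timens_vdso_data() can be illustrated with a small userspace sketch. This is not the powerpc code: SKETCH_PAGE_SIZE is an assumed 4 KiB value (the real PAGE_SIZE depends on the kernel configuration), and the two helper functions are hypothetical names. Only the enum is taken from the patch.

```c
#include <stddef.h>

/* Assumption for illustration only; the kernel's PAGE_SIZE is per-config. */
#define SKETCH_PAGE_SIZE 4096u

/* Page order inside the vvar mapping, as introduced by the patch. */
enum vvar_pages {
	VVAR_DATA_PAGE_OFFSET,
	VVAR_TIMENS_PAGE_OFFSET,
	VVAR_NR_PAGES,
};

/*
 * Hypothetical helper: given the address of the vdso data page, the timens
 * page is simply the next page, so no code-relative lookup is needed.
 */
const void *timens_page_from_data_page(const void *vd)
{
	return (const char *)vd + SKETCH_PAGE_SIZE;
}

/* Hypothetical helper: byte offset of a page inside the vvar mapping. */
size_t vvar_page_offset(enum vvar_pages page)
{
	return (size_t)page * SKETCH_PAGE_SIZE;
}
```

Because the timens page directly follows the data page, the offset between the two is exactly one page, which is what lets the patch avoid repeating the mflr/bcl/mflr/mtlr sequence.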
Re: [PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces
On Wed, Mar 31 2021 at 16:48, Christophe Leroy wrote:
> [Sorry, resending with complete destination list, I used the wrong script on
> the first delivery]
>
> This series adds support for time namespaces on powerpc.
>
> All timens selftests are successful.

If PPC people want to pick up the whole lot, no objections from my side.

Thanks,

        tglx
[PATCH] time: Fix overwrite err unexpected in clock_adjtime32
the correct error is covered by put_old_timex32.

Fixes: f1f1d5ebd10f ("posix-timers: Introduce a syscall for clock tuning.")
Signed-off-by: Chen Jun
---
 kernel/time/posix-timers.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index bf540f5a..dd5697d 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1191,8 +1191,8 @@ SYSCALL_DEFINE2(clock_adjtime32, clockid_t, which_clock,
 
 	err = do_clock_adjtime(which_clock, );
 
-	if (err >= 0)
-		err = put_old_timex32(utp, );
+	if (err >= 0 && put_old_timex32(utp, ))
+		return -EFAULT;
 
 	return err;
 }
-- 
2.9.4
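The effect of this two-line change can be seen in a small userspace sketch. It is not kernel code: fake_do_clock_adjtime() and fake_put_timex() are hypothetical stand-ins for do_clock_adjtime() and put_old_timex32(), and the TIME_* values are the ones documented for adjtimex(2).

```c
#include <errno.h>

/* Clock-state return codes as documented for adjtimex(2). */
#define TIME_OK  0
#define TIME_INS 1

/* Hypothetical stand-in: pretend the clock reports a pending leap second. */
static int fake_do_clock_adjtime(void)
{
	return TIME_INS;
}

/* Hypothetical stand-in for the copy back to userspace: 0 or -EFAULT. */
static int fake_put_timex(int copy_fails)
{
	return copy_fails ? -EFAULT : 0;
}

/* Pre-patch shape: the copy-out result overwrites the clock state. */
int adjtime32_before(int copy_fails)
{
	int err = fake_do_clock_adjtime();

	if (err >= 0)
		err = fake_put_timex(copy_fails);
	return err;
}

/* Post-patch shape: only a failed copy-out changes the return value. */
int adjtime32_after(int copy_fails)
{
	int err = fake_do_clock_adjtime();

	if (err >= 0 && fake_put_timex(copy_fails))
		return -EFAULT;
	return err;
}
```

With a successful copy-out, adjtime32_before(0) returns 0 (TIME_OK) even though the clock reported TIME_INS, while adjtime32_after(0) returns TIME_INS. Both variants return -EFAULT when the copy-out fails.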
Re: [RESEND] i2c: mediatek: Get device clock-stretch time via dts
On Wed, 2021-04-07 at 20:19 +0200, Wolfram Sang wrote:
> > Due to clock stretch, our HW IP cannot meet the ac-timing
> > spec(tSU;STA,tSU;STO).
> > There isn't a same delay for clock stretching, so we need pass a
> > parameter which can be found through measurement to meet most
> > conditions.
>
> What about using this existing binding?
>
> - i2c-scl-internal-delay-ns
> 	Number of nanoseconds the IP core additionally needs to setup SCL.
>

I can't see the relationship between "i2c-scl-falling-time-ns" and clock
stretching. Is there a parameter related to clock stretching?

If you think both of them will affect the ac-timing of SCL, then at this
point "i2c-scl-falling-time-ns" may be a good choice.
[PATCH 1/4] USB: serial: f81232: drop time-based drain delay
The f81232 driver now waits for the transmit FIFO to drain during close
so there is no need to keep the time-based drain delay, which would add
up to two seconds on every close for low line speeds.

Fixes: 98405f81036d ("USB: serial: f81232: add tx_empty function")
Signed-off-by: Johan Hovold
---
 drivers/usb/serial/f81232.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index b4b847dce4bc..a7a7af8d05bf 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -948,7 +948,6 @@ static int f81232_port_probe(struct usb_serial_port *port)
 
 	usb_set_serial_port_data(port, priv);
 
-	port->port.drain_delay = 256;
 	priv->port = port;
 
 	return 0;
-- 
2.26.3
[PATCH v4 3/6] perf arm-spe: Convert event kernel time to counter value
When handling a perf event, the Arm SPE decoder needs to decide whether
the event is earlier or later than the samples from the Arm SPE trace
data; to do the comparison, both times must use the same unit.

This patch converts the event kernel time to the arch timer's counter
value, so that it can be compared with the counter value contained in
the Arm SPE Timestamp packet.

Signed-off-by: Leo Yan
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 7620dcc45940..23714cf0380e 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -669,7 +669,7 @@ static int arm_spe_process_event(struct perf_session *session,
 	}
 
 	if (sample->time && (sample->time != (u64) -1))
-		timestamp = sample->time;
+		timestamp = perf_time_to_tsc(sample->time, >tc);
 	else
 		timestamp = 0;
 
-- 
2.25.1
[PATCH v4 4/6] perf arm-spe: Assign kernel time to synthesized event
The current code assigns the arch timer counter to the samples
synthesized from the Arm SPE trace, so the samples contain only the raw
counter value rather than the kernel time.

To fix the issue, this patch converts the timer counter to kernel time
and assigns it to the sample timestamp.

Signed-off-by: Leo Yan
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 23714cf0380e..c13a89f06ab8 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -234,7 +234,7 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
 	struct arm_spe_record *record = >decoder->record;
 
 	if (!spe->timeless_decoding)
-		sample->time = speq->timestamp;
+		sample->time = tsc_to_perf_time(record->timestamp, >tc);
 
 	sample->ip = record->from_ip;
 	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
-- 
2.25.1
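For readers unfamiliar with the two conversions these arm-spe patches rely on, the following is a minimal sketch of the arithmetic, assuming the time_zero/time_mult/time_shift scheme documented for the perf mmap page. struct tsc_conv, counter_to_ns() and ns_to_counter() are hypothetical names used for illustration; they are not the perf tool's perf_tsc_conversion API, and the sketch ignores the short-counter (time_mask) case.

```c
#include <stdint.h>

/* Hypothetical subset of the conversion parameters. */
struct tsc_conv {
	uint64_t time_zero;  /* kernel time (ns) at counter value 0 */
	uint32_t time_mult;  /* multiplier */
	uint16_t time_shift; /* right shift applied after multiplying */
};

/*
 * Counter -> kernel time: time_zero + (cyc * time_mult) >> time_shift,
 * split into quotient and remainder so the product does not overflow.
 */
uint64_t counter_to_ns(uint64_t cyc, const struct tsc_conv *tc)
{
	uint64_t quot = cyc >> tc->time_shift;
	uint64_t rem = cyc & (((uint64_t)1 << tc->time_shift) - 1);

	return tc->time_zero + quot * tc->time_mult +
	       ((rem * tc->time_mult) >> tc->time_shift);
}

/* Kernel time -> counter: the inverse, rounding down. time_mult must be non-zero. */
uint64_t ns_to_counter(uint64_t ns, const struct tsc_conv *tc)
{
	uint64_t t = ns - tc->time_zero;
	uint64_t quot = t / tc->time_mult;
	uint64_t rem = t % tc->time_mult;

	return (quot << tc->time_shift) +
	       (rem << tc->time_shift) / tc->time_mult;
}
```

With time_zero = 1000, time_mult = 5 and time_shift = 1, a counter value of 10 maps to 1025 ns and 1025 ns maps back to 10. The round trip is not exact for every value because both directions round down.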
[PATCH 5.11 073/210] mac80211: fix time-is-after bug in mlme
From: Ben Greear

commit 7d73cd946d4bc7d44cdc5121b1c61d5d71425dea upstream.

The incorrect timeout check caused probing to happen when it did
not need to happen.  This in turn caused tx performance drop
for around 5 seconds in ath10k-ct driver.  Possibly that tx drop
is due to a secondary issue, but fixing the probe to not happen
when traffic is running fixes the symptom.

Signed-off-by: Ben Greear
Fixes: 9abf4e49830d ("mac80211: optimize station connection monitor")
Acked-by: Felix Fietkau
Link: https://lore.kernel.org/r/20210330230749.14097-1-gree...@candelatech.com
Signed-off-by: Johannes Berg
Signed-off-by: Greg Kroah-Hartman
---
 net/mac80211/mlme.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -4707,7 +4707,10 @@ static void ieee80211_sta_conn_mon_timer
 	timeout = sta->rx_stats.last_rx;
 	timeout += IEEE80211_CONNECTION_IDLE_TIME;
 
-	if (time_is_before_jiffies(timeout)) {
+	/* If timeout is after now, then update timer to fire at
+	 * the later date, but do not actually probe at this time.
+	 */
+	if (time_is_after_jiffies(timeout)) {
 		mod_timer(>conn_mon_timer, round_jiffies_up(timeout));
 		return;
 	}
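The difference between the two checks is easy to get backwards, so here is a minimal userspace sketch of the comparison. It is not the kernel's jiffies.h: tick_after() mirrors the wrap-safe idea behind time_after(), and deadline_is_after(), deadline_is_before() and should_rearm_only() are hypothetical helpers that take "now" as a parameter instead of reading jiffies.

```c
#include <stdbool.h>

/* Wrap-safe "a is later than b" on a free-running tick counter. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Analogue of time_is_after_jiffies(): the deadline is still ahead of now. */
bool deadline_is_after(unsigned long now, unsigned long deadline)
{
	return tick_after(deadline, now);
}

/* Analogue of time_is_before_jiffies(): the deadline has already passed. */
bool deadline_is_before(unsigned long now, unsigned long deadline)
{
	return tick_after(now, deadline);
}

/*
 * Decision taken by the patched handler: true means "re-arm the timer
 * and return", false means "fall through to the probe".
 */
bool should_rearm_only(unsigned long now, unsigned long last_rx,
		       unsigned long idle_time)
{
	return deadline_is_after(now, last_rx + idle_time);
}
```

With recent traffic (now = 1000, last_rx = 990, idle_time = 30) the deadline of 1020 is still in the future, so the handler only re-arms. With last_rx = 900 the deadline of 930 has passed and the probe path runs. The pre-patch code used the "before" test here, which inverted this decision.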
[PATCH 5.11 027/210] IB/hfi1: Fix probe time panic when AIP is enabled with a buggy BIOS
From: Mike Marciniszyn commit 5de61a47eb9064cbbc5f3360d639e8e34a690a54 upstream. A panic can result when AIP is enabled: BUG: unable to handle kernel NULL pointer dereference at 000 PGD 0 P4D 0 Oops: 1 SMP PTI CPU: 70 PID: 981 Comm: systemd-udevd Tainted: G OE - - - 4.18.0-240.el8.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.01.01.0005.101720141054 10/17/2014 RIP: 0010:__bitmap_and+0x1b/0x70 RSP: 0018:99aa0845f9f0 EFLAGS: 00010246 RAX: RBX: 8d5a6fc18000 RCX: 0048 RDX: RSI: c06336f0 RDI: 8d5a8fa67750 RBP: 0079 R08: 000f R09: R10: R11: 0001 R12: c06336f0 R13: 00a0 R14: 8d5a6fc18000 R15: 0003 FS: 7fec137a5980() GS:8d5a9fa8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 000a04b48002 CR4: 001606e0 Call Trace: hfi1_num_netdev_contexts+0x7c/0x110 [hfi1] hfi1_init_dd+0xd7f/0x1a90 [hfi1] ? pci_bus_read_config_dword+0x49/0x70 ? pci_mmcfg_read+0x3e/0xe0 do_init_one.isra.18+0x336/0x640 [hfi1] local_pci_probe+0x41/0x90 pci_device_probe+0x105/0x1c0 really_probe+0x212/0x440 driver_probe_device+0x49/0xc0 device_driver_attach+0x50/0x60 __driver_attach+0x61/0x130 ? device_driver_attach+0x60/0x60 bus_for_each_dev+0x77/0xc0 ? klist_add_tail+0x3b/0x70 bus_add_driver+0x14d/0x1e0 ? dev_init+0x10b/0x10b [hfi1] driver_register+0x6b/0xb0 ? dev_init+0x10b/0x10b [hfi1] hfi1_mod_init+0x1e6/0x20a [hfi1] do_one_initcall+0x46/0x1c3 ? free_unref_page_commit+0x91/0x100 ? _cond_resched+0x15/0x30 ? kmem_cache_alloc_trace+0x140/0x1c0 do_init_module+0x5a/0x220 load_module+0x14b4/0x17e0 ? __do_sys_finit_module+0xa8/0x110 __do_sys_finit_module+0xa8/0x110 do_syscall_64+0x5b/0x1a0 The issue happens when pcibus_to_node() returns NO_NUMA_NODE. Fix this issue by moving the initialization of dd->node to hfi1_devdata allocation and remove the other pcibus_to_node() calls in the probe path and use dd->node instead. Affinity logic is adjusted to use a new field dd->affinity_entry as a guard instead of dd->node. 
Fixes: 4730f4a6c6b2 ("IB/hfi1: Activate the dummy netdev") Link: https://lore.kernel.org/r/1617025700-31865-4-git-send-email-dennis.dalessan...@cornelisnetworks.com Cc: sta...@vger.kernel.org Signed-off-by: Mike Marciniszyn Signed-off-by: Dennis Dalessandro Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hfi1/affinity.c | 21 + drivers/infiniband/hw/hfi1/hfi.h |1 + drivers/infiniband/hw/hfi1/init.c | 10 +- drivers/infiniband/hw/hfi1/netdev_rx.c |3 +-- 4 files changed, 16 insertions(+), 19 deletions(-) --- a/drivers/infiniband/hw/hfi1/affinity.c +++ b/drivers/infiniband/hw/hfi1/affinity.c @@ -632,22 +632,11 @@ static void _dev_comp_vect_cpu_mask_clea */ int hfi1_dev_affinity_init(struct hfi1_devdata *dd) { - int node = pcibus_to_node(dd->pcidev->bus); struct hfi1_affinity_node *entry; const struct cpumask *local_mask; int curr_cpu, possible, i, ret; bool new_entry = false; - /* -* If the BIOS does not have the NUMA node information set, select -* NUMA 0 so we get consistent performance. -*/ - if (node < 0) { - dd_dev_err(dd, "Invalid PCI NUMA node. Performance may be affected\n"); - node = 0; - } - dd->node = node; - local_mask = cpumask_of_node(dd->node); if (cpumask_first(local_mask) >= nr_cpu_ids) local_mask = topology_core_cpumask(0); @@ -660,7 +649,7 @@ int hfi1_dev_affinity_init(struct hfi1_d * create an entry in the global affinity structure and initialize it. 
*/ if (!entry) { - entry = node_affinity_allocate(node); + entry = node_affinity_allocate(dd->node); if (!entry) { dd_dev_err(dd, "Unable to allocate global affinity node\n"); @@ -751,6 +740,7 @@ int hfi1_dev_affinity_init(struct hfi1_d if (new_entry) node_affinity_add_tail(entry); + dd->affinity_entry = entry; mutex_unlock(_affinity.lock); return 0; @@ -766,10 +756,9 @@ void hfi1_dev_affinity_clean_up(struct h { struct hfi1_affinity_node *entry; - if (dd->node < 0) - return; - mutex_lock(_affinity.lock); + if (!dd->affinity_entry) + goto unlock; entry = node_affinity_lookup(dd->node); if (!entry) goto unlock; @@ -780,8 +769,8 @@ void hfi1_dev_affinity_clean_up(struct h */ _dev_comp_vect_cpu_mask_clean_up(dd, entry); unlock: +
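The shape of the change described in the commit message can be sketched in plain C. This is a simplified model, not the hfi1 driver: struct devdata_model and both functions are hypothetical, and the sketch assumes that the fall-back-to-node-0 logic visible in the removed lines is what moves to device-data allocation time.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct hfi1_devdata. */
struct devdata_model {
	int node;                   /* NUMA node, always valid after init */
	const void *affinity_entry; /* NULL until affinity init has run */
};

/*
 * Resolve the node once, at allocation time. A negative bus_node models
 * a BIOS that does not report NUMA information; node 0 is used instead.
 */
void devdata_init_node(struct devdata_model *dd, int bus_node)
{
	dd->node = bus_node < 0 ? 0 : bus_node;
	dd->affinity_entry = NULL;
}

/* Cleanup is guarded by the entry pointer rather than by the node value. */
int affinity_cleanup_needed(const struct devdata_model *dd)
{
	return dd->affinity_entry != NULL;
}
```

Resolving the node early means later users in the probe path never see a negative node, and the separate pointer gives cleanup an unambiguous "was affinity set up" test.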
[PATCH 5.10 066/188] mac80211: fix time-is-after bug in mlme
From: Ben Greear

commit 7d73cd946d4bc7d44cdc5121b1c61d5d71425dea upstream.

The incorrect timeout check caused probing to happen when it did
not need to happen.  This in turn caused tx performance drop
for around 5 seconds in ath10k-ct driver.  Possibly that tx drop
is due to a secondary issue, but fixing the probe to not happen
when traffic is running fixes the symptom.

Signed-off-by: Ben Greear
Fixes: 9abf4e49830d ("mac80211: optimize station connection monitor")
Acked-by: Felix Fietkau
Link: https://lore.kernel.org/r/20210330230749.14097-1-gree...@candelatech.com
Signed-off-by: Johannes Berg
Signed-off-by: Greg Kroah-Hartman
---
 net/mac80211/mlme.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -4660,7 +4660,10 @@ static void ieee80211_sta_conn_mon_timer
 	timeout = sta->rx_stats.last_rx;
 	timeout += IEEE80211_CONNECTION_IDLE_TIME;
 
-	if (time_is_before_jiffies(timeout)) {
+	/* If timeout is after now, then update timer to fire at
+	 * the later date, but do not actually probe at this time.
+	 */
+	if (time_is_after_jiffies(timeout)) {
 		mod_timer(>conn_mon_timer, round_jiffies_up(timeout));
 		return;
 	}
[PATCH 5.10 020/188] IB/hfi1: Fix probe time panic when AIP is enabled with a buggy BIOS
From: Mike Marciniszyn commit 5de61a47eb9064cbbc5f3360d639e8e34a690a54 upstream. A panic can result when AIP is enabled: BUG: unable to handle kernel NULL pointer dereference at 000 PGD 0 P4D 0 Oops: 1 SMP PTI CPU: 70 PID: 981 Comm: systemd-udevd Tainted: G OE - - - 4.18.0-240.el8.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.01.01.0005.101720141054 10/17/2014 RIP: 0010:__bitmap_and+0x1b/0x70 RSP: 0018:99aa0845f9f0 EFLAGS: 00010246 RAX: RBX: 8d5a6fc18000 RCX: 0048 RDX: RSI: c06336f0 RDI: 8d5a8fa67750 RBP: 0079 R08: 000f R09: R10: R11: 0001 R12: c06336f0 R13: 00a0 R14: 8d5a6fc18000 R15: 0003 FS: 7fec137a5980() GS:8d5a9fa8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 000a04b48002 CR4: 001606e0 Call Trace: hfi1_num_netdev_contexts+0x7c/0x110 [hfi1] hfi1_init_dd+0xd7f/0x1a90 [hfi1] ? pci_bus_read_config_dword+0x49/0x70 ? pci_mmcfg_read+0x3e/0xe0 do_init_one.isra.18+0x336/0x640 [hfi1] local_pci_probe+0x41/0x90 pci_device_probe+0x105/0x1c0 really_probe+0x212/0x440 driver_probe_device+0x49/0xc0 device_driver_attach+0x50/0x60 __driver_attach+0x61/0x130 ? device_driver_attach+0x60/0x60 bus_for_each_dev+0x77/0xc0 ? klist_add_tail+0x3b/0x70 bus_add_driver+0x14d/0x1e0 ? dev_init+0x10b/0x10b [hfi1] driver_register+0x6b/0xb0 ? dev_init+0x10b/0x10b [hfi1] hfi1_mod_init+0x1e6/0x20a [hfi1] do_one_initcall+0x46/0x1c3 ? free_unref_page_commit+0x91/0x100 ? _cond_resched+0x15/0x30 ? kmem_cache_alloc_trace+0x140/0x1c0 do_init_module+0x5a/0x220 load_module+0x14b4/0x17e0 ? __do_sys_finit_module+0xa8/0x110 __do_sys_finit_module+0xa8/0x110 do_syscall_64+0x5b/0x1a0 The issue happens when pcibus_to_node() returns NO_NUMA_NODE. Fix this issue by moving the initialization of dd->node to hfi1_devdata allocation and remove the other pcibus_to_node() calls in the probe path and use dd->node instead. Affinity logic is adjusted to use a new field dd->affinity_entry as a guard instead of dd->node. 
Fixes: 4730f4a6c6b2 ("IB/hfi1: Activate the dummy netdev") Link: https://lore.kernel.org/r/1617025700-31865-4-git-send-email-dennis.dalessan...@cornelisnetworks.com Cc: sta...@vger.kernel.org Signed-off-by: Mike Marciniszyn Signed-off-by: Dennis Dalessandro Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hfi1/affinity.c | 21 + drivers/infiniband/hw/hfi1/hfi.h |1 + drivers/infiniband/hw/hfi1/init.c | 10 +- drivers/infiniband/hw/hfi1/netdev_rx.c |3 +-- 4 files changed, 16 insertions(+), 19 deletions(-) --- a/drivers/infiniband/hw/hfi1/affinity.c +++ b/drivers/infiniband/hw/hfi1/affinity.c @@ -632,22 +632,11 @@ static void _dev_comp_vect_cpu_mask_clea */ int hfi1_dev_affinity_init(struct hfi1_devdata *dd) { - int node = pcibus_to_node(dd->pcidev->bus); struct hfi1_affinity_node *entry; const struct cpumask *local_mask; int curr_cpu, possible, i, ret; bool new_entry = false; - /* -* If the BIOS does not have the NUMA node information set, select -* NUMA 0 so we get consistent performance. -*/ - if (node < 0) { - dd_dev_err(dd, "Invalid PCI NUMA node. Performance may be affected\n"); - node = 0; - } - dd->node = node; - local_mask = cpumask_of_node(dd->node); if (cpumask_first(local_mask) >= nr_cpu_ids) local_mask = topology_core_cpumask(0); @@ -660,7 +649,7 @@ int hfi1_dev_affinity_init(struct hfi1_d * create an entry in the global affinity structure and initialize it. 
*/ if (!entry) { - entry = node_affinity_allocate(node); + entry = node_affinity_allocate(dd->node); if (!entry) { dd_dev_err(dd, "Unable to allocate global affinity node\n"); @@ -751,6 +740,7 @@ int hfi1_dev_affinity_init(struct hfi1_d if (new_entry) node_affinity_add_tail(entry); + dd->affinity_entry = entry; mutex_unlock(_affinity.lock); return 0; @@ -766,10 +756,9 @@ void hfi1_dev_affinity_clean_up(struct h { struct hfi1_affinity_node *entry; - if (dd->node < 0) - return; - mutex_lock(_affinity.lock); + if (!dd->affinity_entry) + goto unlock; entry = node_affinity_lookup(dd->node); if (!entry) goto unlock; @@ -780,8 +769,8 @@ void hfi1_dev_affinity_clean_up(struct h */ _dev_comp_vect_cpu_mask_clean_up(dd, entry); unlock: +
[tip: core/rcu] rcutorture: Make TREE03 use real-time tree.use_softirq setting
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     e2b949d54392ad890bb10fb8954d967e2fcd7503
Gitweb:        https://git.kernel.org/tip/e2b949d54392ad890bb10fb8954d967e2fcd7503
Author:        Paul E. McKenney
AuthorDate:    Thu, 14 Jan 2021 16:11:04 -08:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 08 Mar 2021 14:21:40 -08:00

rcutorture: Make TREE03 use real-time tree.use_softirq setting

TREE03 tests RCU priority boosting, which is a real-time feature.
It would also be good if it tested something closer to what is
actually used by the real-time folks.  This commit therefore adds
tree.use_softirq=0 to the TREE03 kernel boot parameters in TREE03.boot.

Signed-off-by: Paul E. McKenney
---
 tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
index 1c21894..64f864f 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
@@ -4,3 +4,4 @@ rcutree.gp_init_delay=3
 rcutree.gp_cleanup_delay=3
 rcutree.kthread_prio=2
 threadirqs
+tree.use_softirq=0
[PATCH v3 4/6] perf arm-spe: Assign kernel time to synthesized event
The current code assigns the arch timer counter to the samples
synthesized from the Arm SPE trace, so the samples contain only the raw
counter value rather than the kernel time.

To fix the issue, this patch converts the timer counter to kernel time
and assigns it to the sample timestamp.

Signed-off-by: Leo Yan
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index f66e10c62473..ec7df83b50fd 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -234,7 +234,7 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
 	struct arm_spe_record *record = >decoder->record;
 
 	if (!spe->timeless_decoding)
-		sample->time = speq->timestamp;
+		sample->time = tsc_to_perf_time(record->timestamp, >tc);
 
 	sample->ip = record->from_ip;
 	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
-- 
2.25.1
[PATCH v3 3/6] perf arm-spe: Convert event kernel time to counter value
When handling a perf event, the Arm SPE decoder needs to decide whether
the event is earlier or later than the samples from the Arm SPE trace
data; to do the comparison, both times must use the same unit.

This patch converts the event kernel time to the arch timer's counter
value, so that it can be compared with the counter value contained in
the Arm SPE Timestamp packet.

Signed-off-by: Leo Yan
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index b48816d5c0b4..f66e10c62473 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -669,7 +669,7 @@ static int arm_spe_process_event(struct perf_session *session,
 	}
 
 	if (sample->time && (sample->time != (u64) -1))
-		timestamp = sample->time;
+		timestamp = perf_time_to_tsc(sample->time, >tc);
 	else
 		timestamp = 0;
 
-- 
2.25.1
[PATCH v2] usb: core: reduce power-on-good delay time of root hub
Return the exact delay time given by the root hub descriptor;
this helps to reduce resume time etc.

Because the root hub descriptor is usually provided by the host
controller driver, if there is a compatibility issue for a root hub,
we can fix it easily without affecting other root hubs.

Acked-by: Alan Stern
Signed-off-by: Chunfeng Yun
---
v2: remove RFC tag, and add acked-by Alan
---
 drivers/usb/core/hub.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h
index 73f4482d833a..22ea1f4f2d66 100644
--- a/drivers/usb/core/hub.h
+++ b/drivers/usb/core/hub.h
@@ -148,8 +148,10 @@ static inline unsigned hub_power_on_good_delay(struct usb_hub *hub)
 {
 	unsigned delay = hub->descriptor->bPwrOn2PwrGood * 2;
 
-	/* Wait at least 100 msec for power to become stable */
-	return max(delay, 100U);
+	if (!hub->hdev->parent)	/* root hub */
+		return delay;
+	else /* Wait at least 100 msec for power to become stable */
+		return max(delay, 100U);
 }
 
 static inline int hub_port_debounce_be_connected(struct usb_hub *hub,
-- 
2.18.0
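The behavioural change is small enough to model outside the kernel. The sketch below is an illustration only: struct hub_model and power_on_good_delay_ms() are hypothetical names, with is_root_hub standing in for the !hub->hdev->parent test in the patch. The factor of two reflects that bPwrOn2PwrGood is expressed in 2 ms units.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down model of the fields the helper reads. */
struct hub_model {
	unsigned char pwr_on_2_pwr_good; /* bPwrOn2PwrGood, in units of 2 ms */
	bool is_root_hub;                /* stands in for !hub->hdev->parent */
};

/* Delay in milliseconds to wait after powering a port. */
unsigned power_on_good_delay_ms(const struct hub_model *hub)
{
	unsigned delay = hub->pwr_on_2_pwr_good * 2u;

	/* Root hub: trust the value supplied by the host controller driver. */
	if (hub->is_root_hub)
		return delay;

	/* External hub: keep the 100 ms floor for power to become stable. */
	return delay < 100u ? 100u : delay;
}
```

A root hub reporting bPwrOn2PwrGood = 10 now waits 20 ms, whereas an external hub reporting the same value still waits 100 ms. Values above the floor (for example 100, giving 200 ms) are returned unchanged in both cases.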
Re: [RFC PATCH] usb: core: reduce power-on-good delay time of root hub
On Fri, Apr 09, 2021 at 10:39:07AM +0800, Chunfeng Yun wrote:
> Return the exactly delay time given by root hub descriptor,
> this helps to reduce resume time etc.
>
> Due to the root hub descriptor is usually provided by the host
> controller driver, if there is compatibility for a root hub,
> we can fix it easily without affect other root hub
>
> Signed-off-by: Chunfeng Yun

Acked-by: Alan Stern

> ---
>  drivers/usb/core/hub.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h
> index 73f4482d833a..22ea1f4f2d66 100644
> --- a/drivers/usb/core/hub.h
> +++ b/drivers/usb/core/hub.h
> @@ -148,8 +148,10 @@ static inline unsigned hub_power_on_good_delay(struct usb_hub *hub)
>  {
>  	unsigned delay = hub->descriptor->bPwrOn2PwrGood * 2;
>
> -	/* Wait at least 100 msec for power to become stable */
> -	return max(delay, 100U);
> +	if (!hub->hdev->parent)	/* root hub */
> +		return delay;
> +	else /* Wait at least 100 msec for power to become stable */
> +		return max(delay, 100U);
>  }
>
>  static inline int hub_port_debounce_be_connected(struct usb_hub *hub,
> --
> 2.18.0
>
[PATCH v4 3/3] ima: enable loading of build time generated key on .ima keyring
The kernel currently only loads the kernel module signing key onto the builtin trusted keyring. Load the module signing key onto the IMA keyring as well. Signed-off-by: Nayna Jain Acked-by: Stefan Berger --- certs/system_certificates.S | 13 - certs/system_keyring.c| 50 --- include/keys/system_keyring.h | 7 + security/integrity/digsig.c | 2 ++ 4 files changed, 61 insertions(+), 11 deletions(-) diff --git a/certs/system_certificates.S b/certs/system_certificates.S index 8f29058adf93..dcad27ea8527 100644 --- a/certs/system_certificates.S +++ b/certs/system_certificates.S @@ -8,9 +8,11 @@ .globl system_certificate_list system_certificate_list: __cert_list_start: -#ifdef CONFIG_MODULE_SIG +__module_cert_start: +#if defined(CONFIG_MODULE_SIG) || defined(CONFIG_IMA_APPRAISE_MODSIG) .incbin "certs/signing_key.x509" #endif +__module_cert_end: .incbin "certs/x509_certificate_list" __cert_list_end: @@ -35,3 +37,12 @@ system_certificate_list_size: #else .long __cert_list_end - __cert_list_start #endif + + .align 8 + .globl module_cert_size +module_cert_size: +#ifdef CONFIG_64BIT + .quad __module_cert_end - __module_cert_start +#else + .long __module_cert_end - __module_cert_start +#endif diff --git a/certs/system_keyring.c b/certs/system_keyring.c index 4b693da488f1..2b3ad375ecc1 100644 --- a/certs/system_keyring.c +++ b/certs/system_keyring.c @@ -27,6 +27,7 @@ static struct key *platform_trusted_keys; extern __initconst const u8 system_certificate_list[]; extern __initconst const unsigned long system_certificate_list_size; +extern __initconst const unsigned long module_cert_size; /** * restrict_link_to_builtin_trusted - Restrict keyring addition by built in CA @@ -132,19 +133,11 @@ static __init int system_trusted_keyring_init(void) */ device_initcall(system_trusted_keyring_init); -/* - * Load the compiled-in list of X.509 certificates. 
- */ -static __init int load_system_certificate_list(void) +static __init int load_cert(const u8 *p, const u8 *end, struct key *keyring) { key_ref_t key; - const u8 *p, *end; size_t plen; - pr_notice("Loading compiled-in X.509 certificates\n"); - - p = system_certificate_list; - end = p + system_certificate_list_size; while (p < end) { /* Each cert begins with an ASN.1 SEQUENCE tag and must be more * than 256 bytes in size. @@ -159,7 +152,7 @@ static __init int load_system_certificate_list(void) if (plen > end - p) goto dodgy_cert; - key = key_create_or_update(make_key_ref(builtin_trusted_keys, 1), + key = key_create_or_update(make_key_ref(keyring, 1), "asymmetric", NULL, p, @@ -186,6 +179,43 @@ static __init int load_system_certificate_list(void) pr_err("Problem parsing in-kernel X.509 certificate list\n"); return 0; } + +__init int load_module_cert(struct key *keyring) +{ + const u8 *p, *end; + + if (!IS_ENABLED(CONFIG_IMA_APPRAISE_MODSIG)) + return 0; + + pr_notice("Loading compiled-in module X.509 certificates\n"); + + p = system_certificate_list; + end = p + module_cert_size; + + return load_cert(p, end, keyring); +} + +/* + * Load the compiled-in list of X.509 certificates. 
+ */ +static __init int load_system_certificate_list(void) +{ + const u8 *p, *end; + unsigned long size; + + pr_notice("Loading compiled-in X.509 certificates\n"); + +#ifdef CONFIG_MODULE_SIG + p = system_certificate_list; + size = system_certificate_list_size; +#else + p = system_certificate_list + module_cert_size; + size = system_certificate_list_size - module_cert_size; +#endif + + end = p + size; + return load_cert(p, end, builtin_trusted_keys); +} late_initcall(load_system_certificate_list); #ifdef CONFIG_SYSTEM_DATA_VERIFICATION diff --git a/include/keys/system_keyring.h b/include/keys/system_keyring.h index fb8b07daa9d1..f954276c616a 100644 --- a/include/keys/system_keyring.h +++ b/include/keys/system_keyring.h @@ -16,9 +16,16 @@ extern int restrict_link_by_builtin_trusted(struct key *keyring, const struct key_type *type, const union key_payload *payload, struct key *restriction_key); +extern __init int load_module_cert(struct key *keyring); #else #define restrict_link_by_builtin_trusted restrict_link_reject + +static inline __init int load_module_cert(struct key *keyring) +{ + return 0; +} + #endif #ifdef CONFIG_SECONDARY_TRUSTED_KEYRING diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c
[PATCH v4 1/3] keys: cleanup build time module signing keys
The "mrproper" target is still looking for build time generated keys in the kernel root directory instead of certs directory. Fix the path and remove the names of the files which are no longer generated. Fixes: cfc411e7fff3 ("Move certificate handling to its own directory") Signed-off-by: Nayna Jain Reviewed-by: Stefan Berger Reviewed-by: Mimi Zohar Reviewed-by: Jarkko Sakkinen --- Makefile | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index cc77fd45ca64..d64c94f41edb 100644 --- a/Makefile +++ b/Makefile @@ -1523,9 +1523,9 @@ MRPROPER_FILES += include/config include/generated \ debian snap tar-install \ .config .config.old .version \ Module.symvers \ - signing_key.pem signing_key.priv signing_key.x509 \ - x509.genkey extra_certificates signing_key.x509.keyid \ - signing_key.x509.signer vmlinux-gdb.py \ + certs/signing_key.pem certs/signing_key.x509 \ + certs/x509.genkey \ + vmlinux-gdb.py \ *.spec # Directories & files removed with 'make distclean' -- 2.29.2
[PATCH v4 2/3] ima: enable signing of modules with build time generated key
The kernel build process currently only signs kernel modules when MODULE_SIG is enabled. Also, sign the kernel modules at build time when IMA_APPRAISE_MODSIG is enabled. Signed-off-by: Nayna Jain Acked-by: Stefan Berger --- certs/Kconfig | 2 +- certs/Makefile | 8 init/Kconfig | 6 +++--- 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/certs/Kconfig b/certs/Kconfig index c94e93d8bccf..48675ad319db 100644 --- a/certs/Kconfig +++ b/certs/Kconfig @@ -4,7 +4,7 @@ menu "Certificates for signature checking" config MODULE_SIG_KEY string "File name or PKCS#11 URI of module signing key" default "certs/signing_key.pem" - depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG help Provide the file name of a private key/certificate in PEM format, or a PKCS#11 URI according to RFC7512. The file should contain, or diff --git a/certs/Makefile b/certs/Makefile index f4c25b67aad9..e3185c57fbd8 100644 --- a/certs/Makefile +++ b/certs/Makefile @@ -32,6 +32,14 @@ endif # CONFIG_SYSTEM_TRUSTED_KEYRING clean-files := x509_certificate_list .x509.list ifeq ($(CONFIG_MODULE_SIG),y) + SIGN_KEY = y +endif + +ifeq ($(CONFIG_IMA_APPRAISE_MODSIG),y) + SIGN_KEY = y +endif + +ifdef SIGN_KEY ### # # If module signing is requested, say by allyesconfig, but a key has not been diff --git a/init/Kconfig b/init/Kconfig index 5f5c776ef192..85e48a578f90 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -2164,7 +2164,7 @@ config MODULE_SIG_FORCE config MODULE_SIG_ALL bool "Automatically sign all modules" default y - depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG help Sign all modules during make modules_install. Without this option, modules must be signed manually, using the scripts/sign-file tool. @@ -2174,7 +2174,7 @@ comment "Do not forget to sign required modules with scripts/sign-file" choice prompt "Which hash algorithm should modules be signed with?" 
- depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG help This determines which sort of hashing algorithm will be used during signature generation. This algorithm _must_ be built into the kernel @@ -2206,7 +2206,7 @@ endchoice config MODULE_SIG_HASH string - depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG default "sha1" if MODULE_SIG_SHA1 default "sha224" if MODULE_SIG_SHA224 default "sha256" if MODULE_SIG_SHA256 -- 2.29.2
Re: [PATCH 3/3] media: venus: don't de-reference NULL pointers at IRQ time
Hi Mauro,

On 4/8/21 10:40 AM, Mauro Carvalho Chehab wrote:
> Smatch is warning that:
> 	drivers/media/platform/qcom/venus/hfi_venus.c:1100 venus_isr() warn: variable dereferenced before check 'hdev' (see line 1097)
>
> The logic basically does:
> 	hdev = to_hfi_priv(core);
>
> which is translated to:
> 	hdev = core->priv;
>
> If the IRQ code can receive a NULL pointer for hdev, there's
> a bug there, as it will first try to de-reference the pointer,
> and then check if it is null.
>
> After looking at the code, it seems that this indeed can happen:
> Basically, the venus IRQ thread is started with:
> 	devm_request_threaded_irq()
> So, it will only be freed after the driver unbinds.
>
> In order to prevent the IRQ code from working with freed data,
> the logic at venus_hfi_destroy() sets core->priv to NULL,
> which would make the IRQ code ignore any pending IRQs.
>
> There is, however, a race condition, as core->priv is set
> to NULL only after being freed. So, we also need to move the
> core->priv = NULL to happen earlier.
>
> Signed-off-by: Mauro Carvalho Chehab
> ---
>  drivers/media/platform/qcom/venus/hfi_venus.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Acked-by: Stanimir Varbanov

>
> diff --git a/drivers/media/platform/qcom/venus/hfi_venus.c b/drivers/media/platform/qcom/venus/hfi_venus.c
> index cebb20cf371f..ce98c523b3c6 100644
> --- a/drivers/media/platform/qcom/venus/hfi_venus.c
> +++ b/drivers/media/platform/qcom/venus/hfi_venus.c
> @@ -1094,12 +1094,15 @@ static irqreturn_t venus_isr(struct venus_core *core)
>  {
>  	struct venus_hfi_device *hdev = to_hfi_priv(core);
>  	u32 status;
> -	void __iomem *cpu_cs_base = hdev->core->cpu_cs_base;
> -	void __iomem *wrapper_base = hdev->core->wrapper_base;
> +	void __iomem *cpu_cs_base;
> +	void __iomem *wrapper_base;
>
>  	if (!hdev)
>  		return IRQ_NONE;
>
> +	cpu_cs_base = hdev->core->cpu_cs_base;
> +	wrapper_base = hdev->core->wrapper_base;
> +
>  	status = readl(wrapper_base + WRAPPER_INTR_STATUS);
>  	if (IS_V6(core)) {
>  		if (status & WRAPPER_INTR_STATUS_A2H_MASK ||
> @@ -1650,10 +1653,10 @@ void venus_hfi_destroy(struct venus_core *core)
>  {
>  	struct venus_hfi_device *hdev = to_hfi_priv(core);
>
> +	core->priv = NULL;
>  	venus_interface_queues_release(hdev);
>  	mutex_destroy(&hdev->lock);
>  	kfree(hdev);
> -	core->priv = NULL;
>  	core->ops = NULL;
>  }
>
>

--
regards,
Stan
[RFC PATCH] usb: core: reduce power-on-good delay time of root hub
Return exactly the delay time given by the root hub descriptor;
this helps to reduce resume time etc.

Because the root hub descriptor is usually provided by the host
controller driver, if there is a compatibility problem with a root hub,
we can fix it easily without affecting other root hubs.

Signed-off-by: Chunfeng Yun
---
 drivers/usb/core/hub.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h
index 73f4482d833a..22ea1f4f2d66 100644
--- a/drivers/usb/core/hub.h
+++ b/drivers/usb/core/hub.h
@@ -148,8 +148,10 @@ static inline unsigned hub_power_on_good_delay(struct usb_hub *hub)
 {
 	unsigned delay = hub->descriptor->bPwrOn2PwrGood * 2;

-	/* Wait at least 100 msec for power to become stable */
-	return max(delay, 100U);
+	if (!hub->hdev->parent)	/* root hub */
+		return delay;
+	else /* Wait at least 100 msec for power to become stable */
+		return max(delay, 100U);
 }

 static inline int hub_port_debounce_be_connected(struct usb_hub *hub,
--
2.18.0
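
For readers without a kernel tree at hand, the delay selection in this patch can be modelled in plain userspace C. This is only a sketch: struct hub_model and power_on_good_delay_ms() are hypothetical names invented for illustration, and is_root_hub stands in for the !hub->hdev->parent test in the real helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two things the kernel helper looks at. */
struct hub_model {
	unsigned char pwr_on_2_pwr_good;	/* bPwrOn2PwrGood, in units of 2 ms */
	int is_root_hub;			/* models !hub->hdev->parent */
};

/* Return the power-on-good delay in milliseconds. */
static unsigned int power_on_good_delay_ms(const struct hub_model *hub)
{
	unsigned int delay;

	assert(hub != NULL);
	delay = hub->pwr_on_2_pwr_good * 2U;

	/* Root hub: trust the descriptor supplied by the host controller driver. */
	if (hub->is_root_hub)
		return delay;

	/* Any other hub: wait at least 100 ms for power to become stable. */
	return delay < 100U ? 100U : delay;
}
```

With bPwrOn2PwrGood = 10, the model gives 20 ms for a root hub and 100 ms for any other hub, which is the saving the patch is after on resume.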
Re: [RFC PATCH v6 1/1] cmdline: Add capability to both append and prepend at the same time
On Sun, Apr 4, 2021 at 12:20 PM Christophe Leroy wrote:
>
> One user has expressed the need to both append and prepend some
> built-in parameters to the command line provided by the bootloader.
>
> Although it is a corner case, it is easy to implement so let's do it.
>
> When the user chooses to prepend the bootloader-provided command line
> with the built-in command line, he is offered the possibility to enter
> an additional built-in command line to be appended after the
> bootloader-provided command line.
>
> It is a complementary feature which has no impact on the already
> existing ones and/or the existing defconfig.
>
> Suggested-by: Daniel Walker
> Signed-off-by: Christophe Leroy
> ---
> Sending this out as an RFC, applies on top of the series
> ("Implement GENERIC_CMDLINE"). I will add it to the series next spin
> unless someone is against it.

Well, it works, but you are working around the existing kconfig and the
result is not great. You'd never design it this way.

Rob
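
For reference, the behaviour under discussion (built-in text before and after the bootloader string) can be sketched in userspace C. This is not code from the series: build_cmdline() and its parameters are hypothetical names, and the actual patch expresses the choice through Kconfig options rather than a runtime helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: join "prepend", the bootloader string and "append"
 * with single spaces, skipping empty or NULL parts.
 * Returns 0 on success, -1 if the result does not fit in dst.
 */
static int build_cmdline(char *dst, size_t len, const char *prepend,
			 const char *boot, const char *append)
{
	const char *parts[3] = { prepend, boot, append };
	size_t used = 0;
	int i;

	assert(dst != NULL && len > 0);
	dst[0] = '\0';

	for (i = 0; i < 3; i++) {
		size_t n;

		if (!parts[i] || parts[i][0] == '\0')
			continue;
		n = strlen(parts[i]);
		/* Room for an optional separator, the part and the final NUL. */
		if (used + (used ? 1 : 0) + n + 1 > len)
			return -1;
		if (used)
			dst[used++] = ' ';
		memcpy(dst + used, parts[i], n);
		used += n;
		dst[used] = '\0';
	}
	return 0;
}
```

Calling it with "quiet", "root=/dev/sda1" and "debug" yields "quiet root=/dev/sda1 debug"; leaving the prepend or append part empty degenerates to the existing single-sided behaviour.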
[PATCH 3/3] media: venus: don't de-reference NULL pointers at IRQ time
Smatch is warning that:
	drivers/media/platform/qcom/venus/hfi_venus.c:1100 venus_isr() warn: variable dereferenced before check 'hdev' (see line 1097)

The logic basically does:
	hdev = to_hfi_priv(core);

which is translated to:
	hdev = core->priv;

If the IRQ code can receive a NULL pointer for hdev, there's
a bug there, as it will first try to de-reference the pointer,
and then check if it is null.

After looking at the code, it seems that this indeed can happen:
Basically, the venus IRQ thread is started with:
	devm_request_threaded_irq()
So, it will only be freed after the driver unbinds.

In order to prevent the IRQ code from working with freed data,
the logic at venus_hfi_destroy() sets core->priv to NULL,
which would make the IRQ code ignore any pending IRQs.

There is, however, a race condition, as core->priv is set
to NULL only after being freed. So, we also need to move the
core->priv = NULL to happen earlier.

Signed-off-by: Mauro Carvalho Chehab
---
 drivers/media/platform/qcom/venus/hfi_venus.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/hfi_venus.c b/drivers/media/platform/qcom/venus/hfi_venus.c
index cebb20cf371f..ce98c523b3c6 100644
--- a/drivers/media/platform/qcom/venus/hfi_venus.c
+++ b/drivers/media/platform/qcom/venus/hfi_venus.c
@@ -1094,12 +1094,15 @@ static irqreturn_t venus_isr(struct venus_core *core)
 {
 	struct venus_hfi_device *hdev = to_hfi_priv(core);
 	u32 status;
-	void __iomem *cpu_cs_base = hdev->core->cpu_cs_base;
-	void __iomem *wrapper_base = hdev->core->wrapper_base;
+	void __iomem *cpu_cs_base;
+	void __iomem *wrapper_base;

 	if (!hdev)
 		return IRQ_NONE;

+	cpu_cs_base = hdev->core->cpu_cs_base;
+	wrapper_base = hdev->core->wrapper_base;
+
 	status = readl(wrapper_base + WRAPPER_INTR_STATUS);
 	if (IS_V6(core)) {
 		if (status & WRAPPER_INTR_STATUS_A2H_MASK ||
@@ -1650,10 +1653,10 @@ void venus_hfi_destroy(struct venus_core *core)
 {
 	struct venus_hfi_device *hdev = to_hfi_priv(core);

+	core->priv = NULL;
 	venus_interface_queues_release(hdev);
 	mutex_destroy(&hdev->lock);
 	kfree(hdev);
-	core->priv = NULL;
 	core->ops = NULL;
 }

--
2.30.2
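
The two orderings this patch fixes can be shown in a small single-threaded userspace model. It is a sketch only: the model_* names are hypothetical, and it deliberately does not model the concurrency or memory-ordering aspects of the real IRQ thread. It just shows "check before dereference" in the handler and "unpublish before free" in the teardown path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for venus_hfi_device and venus_core. */
struct model_hdev {
	unsigned long wrapper_base;
};

struct model_core {
	struct model_hdev *priv;
};

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/* Handler: test the pointer first, only then read through it. */
static enum model_irqreturn model_isr(struct model_core *core)
{
	struct model_hdev *hdev;
	unsigned long base;

	assert(core != NULL);
	hdev = core->priv;
	if (!hdev)
		return MODEL_IRQ_NONE;

	base = hdev->wrapper_base;	/* safe: hdev was checked above */
	(void)base;
	return MODEL_IRQ_HANDLED;
}

/* Teardown: clear the published pointer before releasing the memory. */
static void model_destroy(struct model_core *core)
{
	struct model_hdev *hdev;

	assert(core != NULL);
	hdev = core->priv;
	core->priv = NULL;
	free(hdev);
}
```

After model_destroy() the handler sees NULL and returns early instead of touching freed memory, which is the behaviour the reordered venus_hfi_destroy() aims for.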
Re: [PATCH] mtd: add OTP (one-time-programmable) erase ioctl
Hello,

Michael Walle wrote on Thu, 08 Apr 2021 08:55:42 +0200:
> Hi Tudor,
>
> On 2021-04-08 07:51, tudor.amba...@microchip.com wrote:
> > Would you please resend this patch, together with the mtd-utils
> > and the SPI NOR patch in a single patch set? You'll help us all
> > having all in a single place.
>
> This has already been picked-up:
> https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git/commit/?h=mtd/next&id=e3c1f1c92d6ede3cfa09d6a103d3d1c1ef645e35
>
> Although, I didn't receive an email notice.
>
> -michael

Sometimes the notifications are not triggered when there is a conflict
while applying the patch from patchwork directly. I usually answer
manually in this case but I might have forgotten.

About the patch, I felt it was good enough for merging, and I want to
avoid applying such patches right before freezing our branches. Hence,
I tend to be more aggressive earlier in the release cycles because I
hate when my patches get delayed indefinitely. The other side is a more
careful approach when -rc6 gets tagged so that I can drop anything
which would be crazily broken before our -next branches are stalled,
which would otherwise lead to a useless public revert.

Of course, I am fully open to removing this patch from -next if you
feel it was too early, and will happily drop it for this release: we
can move the patch to the next release if you agree on this
(especially since it touches the ABI).

Cheers,
Miquèl
Re: [PATCH] mtd: add OTP (one-time-programmable) erase ioctl
Hi Tudor,

On 2021-04-08 07:51, tudor.amba...@microchip.com wrote:
> Would you please resend this patch, together with the mtd-utils
> and the SPI NOR patch in a single patch set? You'll help us all
> having all in a single place.

This has already been picked-up:
https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git/commit/?h=mtd/next&id=e3c1f1c92d6ede3cfa09d6a103d3d1c1ef645e35

Although, I didn't receive an email notice.

-michael
Re: [PATCH] mtd: add OTP (one-time-programmable) erase ioctl
Michael, Would you please resend this patch, together with the mtd-utils and the SPI NOR patch in a single patch set? You'll help us all having all in a single place. For the new ioctl we'll need acks from all the mtd maintainers and at least a tested-by tag. Cheers, ta
[PATCH v13 07/18] arm64: kexec: flush image and lists during kexec load time
Currently, during kexec load we are copying relocation function and flushing it. However, we can also flush kexec relocation buffers and if new kernel image is already in place (i.e. crash kernel), we can also flush the new kernel image itself. Signed-off-by: Pavel Tatashin --- arch/arm64/kernel/machine_kexec.c | 49 +++ 1 file changed, 23 insertions(+), 26 deletions(-) diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 90a335c74442..3a034bc25709 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -59,23 +59,6 @@ void machine_kexec_cleanup(struct kimage *kimage) /* Empty routine needed to avoid build errors. */ } -int machine_kexec_post_load(struct kimage *kimage) -{ - void *reloc_code = page_to_virt(kimage->control_code_page); - - memcpy(reloc_code, arm64_relocate_new_kernel, - arm64_relocate_new_kernel_size); - kimage->arch.kern_reloc = __pa(reloc_code); - kexec_image_info(kimage); - - /* Flush the reloc_code in preparation for its execution. */ - __flush_dcache_area(reloc_code, arm64_relocate_new_kernel_size); - flush_icache_range((uintptr_t)reloc_code, (uintptr_t)reloc_code + - arm64_relocate_new_kernel_size); - - return 0; -} - /** * machine_kexec_prepare - Prepare for a kexec reboot. * @@ -152,6 +135,29 @@ static void kexec_segment_flush(const struct kimage *kimage) } } +int machine_kexec_post_load(struct kimage *kimage) +{ + void *reloc_code = page_to_virt(kimage->control_code_page); + + /* If in place flush new kernel image, else flush lists and buffers */ + if (kimage->head & IND_DONE) + kexec_segment_flush(kimage); + else + kexec_list_flush(kimage); + + memcpy(reloc_code, arm64_relocate_new_kernel, + arm64_relocate_new_kernel_size); + kimage->arch.kern_reloc = __pa(reloc_code); + kexec_image_info(kimage); + + /* Flush the reloc_code in preparation for its execution. 
*/ + __flush_dcache_area(reloc_code, arm64_relocate_new_kernel_size); + flush_icache_range((uintptr_t)reloc_code, (uintptr_t)reloc_code + + arm64_relocate_new_kernel_size); + + return 0; +} + /** * machine_kexec - Do the kexec reboot. * @@ -169,13 +175,6 @@ void machine_kexec(struct kimage *kimage) WARN(in_kexec_crash && (stuck_cpus || smp_crash_stop_failed()), "Some CPUs may be stale, kdump will be unreliable.\n"); - /* Flush the kimage list and its buffers. */ - kexec_list_flush(kimage); - - /* Flush the new image if already in place. */ - if ((kimage != kexec_crash_image) && (kimage->head & IND_DONE)) - kexec_segment_flush(kimage); - pr_info("Bye!\n"); local_daif_mask(); @@ -250,8 +249,6 @@ void arch_kexec_protect_crashkres(void) { int i; - kexec_segment_flush(kexec_crash_image); - for (i = 0; i < kexec_crash_image->nr_segments; i++) set_memory_valid( __phys_to_virt(kexec_crash_image->segment[i].mem), -- 2.25.1
Re: [PATCH] kernel/time: Feedback reply for hr_sleep syscall, a fine-grained sleep service
Marco!

On Wed, Apr 07 2021 at 11:32, Marco Faltelli wrote:
> Current sleep services (nanosleep) provide sleep periods very far from
> the expectations when scheduling microsecond-scale timers. On our
> testbed, using rdtscp() before and after a nanosleep() syscall to
> measure the effective elapsed time with a 1us timer, we got ~59us.
> Even with larger timeout periods, the difference is still evident
> (e.g., with a 100us timer, we measured ~158us of elapsed time).

So the delta is a constant of ~50us, right?

> We believe that one of the reasons is the use of the timespec
> structure, that needs to be copied from user to kernel and then
> converted into a single-value representation.

Interesting.

> In our work Metronome
> (https://dl.acm.org/doi/pdf/10.1145/3386367.3432730) we had the need
> for a precise microsecond-granularity sleep service, as nanosleep()
> was far from our needs, so we developed hr_sleep(), a new sleep
> service.

The above 'interesting' assumption made me curious about the deeper
insight, so I went to read. Let me give you a few comments on that
paper.

> In current conventional implementations of the Linux kernel, the support
> for (fine-grain) sleep periods of threads is based on the nanosleep()
> system call, which has index 35 in the current specification of the
> x86-64/Linux system call table.

There is also clock_nanosleep(2) for completeness' sake...

> The actual execution path of this system call, particularly at kernel
> side, is shown in Figure 1a. When entering kernel mode the thread exploits
> two main kernel
> level subsystems. One is the scheduling subsystem, which allows managing
> the run-queue of threads that can be rescheduled in CPU. The other one
> is the high-resolution timers subsystem, which allows posting
> timer-expiration requests to the Linux kernel timer wheel.

The timer wheel is not involved in this at all. If your timer ended
up on the timer wheel, your observed latencies would be in the
milliseconds range, not in the 50usec range.

> The latter is a data structure that keeps the ordered set of timer
> expiration requests, so that each time one of these timers expires the
> subsequent timer expiration request is activated.

Not exactly how the timer wheel in the kernel works, but that's
irrelevant because it is not involved in this.

> The expiration of a timer is associated with the interrupt coming from
> the High Precision Event Timer (HPET) on board of x86 processors.

You must have a really old and crappy machine to test on. HPET is
avoided on any CPU less than 10 years old and pretty much irrelevant or
a performance horror on any machine which has more than 4 cores.

> In any case, independently of whether preemption will occur, the CPU
> cycles spent for that preamble lead to a delay for the post of the
> timer-expiration request to the timer wheel, leading the thread to
> actually start its timed-sleep phase with a delay.

I would have expected a proper measurement of the delay which is caused
by that processing in the paper, but in absence of that I instrumented
it for you:

First of all I implemented the thing myself, because the crud you
posted fails to compile (see below) and for other reasons which I spare
myself to explain because of that.

The regular clock_nanosleep() over the hacked up nanosleep_u64(), which
just takes a u64 nanosecond value as argument instead of the timespec
pointer, has an overhead of ~64 CPU clock cycles on average according
to 'perf stat', which amounts to a whopping 32 nanoseconds per syscall
on my test machine running at 2 GHz.

That's _three_ orders of magnitude off from 50us. There goes the
theory.

So now where are these 50 microseconds coming from?

There is no massive software/hardware induced overhead caused by the
timespec pointer handling at all; the 50 microseconds are simply the
default timer slack value which is added to the expiry value to allow
better batching of timer expiries.

That slack is automatically zero for tasks in real time scheduling
classes and can also be modified by a system wide setting and per
process via prctl(PR_SET_TIMERSLACK, .) unless a system policy prevents
that.

That prctl unfortunately has a severe limitation: it does not allow
setting the slack value to 0; the minimum value is 1 nanosecond, and
I'm happy to discuss that when you come up with a proper scientific
proof that that _one_ nanosecond matters.

As a limited excuse I concede that the timer slack is barely
documented, but I'm thoroughly surprised that this has not been figured
out and that instead weird theories about the syscall entry code
implications make up several pages of handwaving content of a published
and 'reviewed' academic paper.

So here is the comparison between the regular clock_nanosleep() with the prctl() used and the u64 based variant which sets the
Re: [RESEND] i2c: mediatek: Get device clock-stretch time via dts
> Due to clock stretching, our HW IP cannot meet the ac-timing
> spec (tSU;STA, tSU;STO).
> Clock stretching does not always cause the same delay, so we need to
> pass a parameter, found through measurement, that meets most
> conditions.

What about using this existing binding?

- i2c-scl-internal-delay-ns
	Number of nanoseconds the IP core additionally needs to setup SCL.
Re: [PATCH] platform/surface: aggregator_registry: Give devices time to set up when connecting
Hi,

On 4/6/21 1:12 AM, Maximilian Luz wrote:
> Sometimes, the "base connected" event that we rely on to (re-)attach the
> device connected to the base is sent a bit too early. When this happens,
> some devices may not be completely ready yet.
>
> Specifically, the battery has been observed to report zero-values for
> things like full charge capacity, which, however, is only loaded once
> when the driver for that device probes. This can thus result in battery
> readings being unavailable.
>
> As we cannot easily and reliably discern between devices that are not
> ready yet and devices that are not connected (i.e. will never be ready),
> delay adding these devices. This should give them enough time to set up.
>
> The delay is set to 2.5 seconds, which should give us a good safety
> margin based on testing and still be fairly responsive for users.
>
> To achieve that delay switch to updating via a delayed work struct,
> which means that we can also get rid of some locking.
>
> Signed-off-by: Maximilian Luz

Thank you for your patch, I've applied this patch to my review-hans
branch:
https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/log/?h=review-hans

Note it will show up in my review-hans branch once I've pushed my
local branch there, which might take a while.

Once I've run some tests on this branch the patches there will be
added to the platform-drivers-x86/for-next branch and eventually
will be included in the pdx86 pull-request to Linus for the next
merge-window.

Regards, Hans > --- > .../surface/surface_aggregator_registry.c | 98 --- > 1 file changed, 40 insertions(+), 58 deletions(-) > > diff --git a/drivers/platform/surface/surface_aggregator_registry.c > b/drivers/platform/surface/surface_aggregator_registry.c > index eccb9d1007cd..685d37a7add1 100644 > --- a/drivers/platform/surface/surface_aggregator_registry.c > +++ b/drivers/platform/surface/surface_aggregator_registry.c > @@ -13,10 +13,10 @@ > #include > #include > #include > -#include > #include > #include > #include > +#include > > #include > #include > @@ -287,6 +287,13 @@ static int ssam_hub_add_devices(struct device *parent, > struct ssam_controller *c > > /* -- SSAM base-hub driver. > - */ > > +/* > + * Some devices (especially battery) may need a bit of time to be fully > usable > + * after being (re-)connected. This delay has been determined via > + * experimentation. > + */ > +#define SSAM_BASE_UPDATE_CONNECT_DELAY msecs_to_jiffies(2500) > + > enum ssam_base_hub_state { > SSAM_BASE_HUB_UNINITIALIZED, > SSAM_BASE_HUB_CONNECTED, > @@ -296,8 +303,8 @@ enum ssam_base_hub_state { > struct ssam_base_hub { > struct ssam_device *sdev; > > - struct mutex lock; /* Guards state update checks and transitions. 
*/ > enum ssam_base_hub_state state; > + struct delayed_work update_work; > > struct ssam_event_notifier notif; > }; > @@ -335,11 +342,7 @@ static ssize_t ssam_base_hub_state_show(struct device > *dev, struct device_attrib > char *buf) > { > struct ssam_base_hub *hub = dev_get_drvdata(dev); > - bool connected; > - > - mutex_lock(>lock); > - connected = hub->state == SSAM_BASE_HUB_CONNECTED; > - mutex_unlock(>lock); > + bool connected = hub->state == SSAM_BASE_HUB_CONNECTED; > > return sysfs_emit(buf, "%d\n", connected); > } > @@ -356,16 +359,20 @@ static const struct attribute_group ssam_base_hub_group > = { > .attrs = ssam_base_hub_attrs, > }; > > -static int __ssam_base_hub_update(struct ssam_base_hub *hub, enum > ssam_base_hub_state new) > +static void ssam_base_hub_update_workfn(struct work_struct *work) > { > + struct ssam_base_hub *hub = container_of(work, struct ssam_base_hub, > update_work.work); > struct fwnode_handle *node = dev_fwnode(>sdev->dev); > + enum ssam_base_hub_state state; > int status = 0; > > - lockdep_assert_held(>lock); > + status = ssam_base_hub_query_state(hub, ); > + if (status) > + return; > > - if (hub->state == new) > - return 0; > - hub->state = new; > + if (hub->state == state) > + return; > + hub->state = state; > > if (hub->state == SSAM_BASE_HUB_CONNECTED) > status = ssam_hub_add_devices(>sdev->dev, hub->sdev->ctrl, > node); > @@ -374,51 +381,28 @@ static int __ssam_base_hub_update(struct ssam_base_hub >
Re: [RESEND] i2c: mediatek: Get device clock-stretch time via dts
On Tue, 2021-04-06 at 21:48 +0200, Wolfram Sang wrote:
> On Sat, Mar 13, 2021 at 04:04:24PM +0800, qii.w...@mediatek.com wrote:
> > From: Qii Wang
> >
> > tSU,STA/tHD,STA/tSU,STOP may be out of spec due to device
> > clock-stretching or circuit loss; we could get the device
> > clock-stretch time from dts to adjust these parameters
> > to meet the spec via the EXT_CONF register.
> >
> > Signed-off-by: Qii Wang
>
> I tried to understand from the code what the new binding expresses, but
> I don't fully understand it. Is it the maximum clock stretch time?
> Because I cannot recall a device which always uses the same delay for
> clock stretching.
>

Due to clock stretching, our HW IP cannot meet the ac-timing
spec (tSU;STA, tSU;STO).
Clock stretching does not always cause the same delay, so we need to
pass a parameter, found through measurement, that meets most
conditions.
[PATCH] kernel/time: Feedback reply for hr_sleep syscall, a fine-grained sleep service
Current sleep services (nanosleep) provide sleep periods very far from
the expectations when scheduling microsecond-scale timers. On our
testbed, using rdtscp() before and after a nanosleep() syscall to
measure the effective elapsed time with a 1us timer, we got ~59us.
Even with larger timeout periods, the difference is still evident
(e.g., with a 100us timer, we measured ~158us of elapsed time).
We believe that one of the reasons is the use of the timespec
structure, which needs to be copied from user to kernel and then
converted into a single-value representation.

In our work Metronome
(https://dl.acm.org/doi/pdf/10.1145/3386367.3432730) we had the need
for a precise microsecond-granularity sleep service, as nanosleep()
was far from our needs, so we developed hr_sleep(), a new sleep
service.

Since the sleep periods needed in our case are small, we don't want
our sleep service to re-schedule a timer in case of a signal
interruption, so it just returns -EINTR to the user. The user must be
aware that this is a best-effort sleep service, so the sleep period
specified is an upper bound of the effective elapsed time.

We believe this patch can be useful in applications where fine-grained
granularity is requested for small sleep periods, and re-scheduling
the timer in case of a signal is not mandatory.

In the paper linked above, Section 3.1 provides more details about
hr_sleep() and Section 3.3 extensively evaluates hr_sleep() and
compares it to nanosleep(). For a 1us timeout, hr_sleep() takes ~3.8us
on average vs. the ~59us of nanosleep().

hr_sleep() has been previously submitted at
https://lore.kernel.org/lkml/20210115180733.5663-1-marco.falte...@uniroma2.it/.
This commit responds to the previous feedback in
https://lore.kernel.org/lkml/CALCETrWfnL=3m3nmmhs-a3si5jptsctf6cethvtsdnwa5mh...@mail.gmail.com/
and applies the requested changes.

Signed-off-by: Marco Faltelli --- arch/x86/entry/syscalls/syscall_64.tbl | 1 + kernel/time/hrtimer.c | 67 ++ 2 files changed, 68 insertions(+) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 7bf01cbe582f..85b14dfa40fb 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -364,6 +364,7 @@ 440common process_madvise sys_process_madvise 441common epoll_pwait2sys_epoll_pwait2 442common mount_setattr sys_mount_setattr +443common hr_sleepsys_hr_sleep # # Due to a historical design error, certain syscalls are numbered differently diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 4a66725b1d4a..887c01392e08 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -2006,6 +2006,73 @@ SYSCALL_DEFINE2(nanosleep_time32, struct old_timespec32 __user *, rqtp, } #endif + + +#ifdef CONFIG_64BIT + + +struct control_record { + struct task_struct *task; + int awake; + struct hrtimer hr_timer; +}; + + +static enum hrtimer_restart hr_sleep_callback(struct hrtimer *timer) +{ + struct control_record *control; + struct task_struct *the_task; + + control = (control_record *)container_of(timer, control_record, hr_timer); + control->awake = 1; + the_task = control->task; + wake_up_process(the_task); + + return HRTIMER_NORESTART; +} + + + +/** + * hr_sleep - a high-resolution sleep service for fine-grained timeouts + * @nanoseconds: the requested sleep period in nanoseconds + * + * Returns: + * 0 when the sleep request successfully terminated + * -EINVAL if a sleep period < 0 is requested + * -EINTR if a signal interrupted the calling thread + */ +SYSCALL_DEFINE1(hr_sleep, long, nanoseconds) +{ + DECLARE_WAIT_QUEUE_HEAD(the_queue); + struct control_record control; + ktime_t ktime_interval; + struct restart_block *restart; + + if (nanoseconds < 0) + return -EINVAL; + + if (nanoseconds == 0) + return 0; + + ktime_interval = ktime_set(0, nanoseconds); + 
hrtimer_init(&(control.hr_timer), CLOCK_MONOTONIC, HRTIMER_MODE_REL); + control.hr_timer.function = &hr_sleep_callback; + control.task = current; + control.awake = 0; + hrtimer_start(&(control.hr_timer), ktime_interval, HRTIMER_MODE_REL); + wait_event_interruptible(the_queue, control.awake == 1); + hrtimer_cancel(&(control.hr_timer)); + if (control.awake == 0) + //We have been interrupted by a signal + return -EINTR; + return 0; + +} + +#endif + + /* * Functions related to boot-time initialization: */ -- 2.25.1
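The overshoot described in the commit message can be reproduced from user space with a small harness. The following is only a minimal sketch: it uses clock_gettime(CLOCK_MONOTONIC) instead of the rdtscp() timing mentioned above, and the helper names now_ns() and measure_nanosleep_ns() are made up for illustration; they are not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock and return it as nanoseconds. */
static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Sleep for request_ns using nanosleep() and return the elapsed time in
 * nanoseconds, or -1 if the sleep did not complete (for example because
 * a signal interrupted it). Hypothetical helper, for illustration only.
 */
static int64_t measure_nanosleep_ns(long request_ns)
{
	struct timespec req;
	int64_t start, end;

	assert(request_ns >= 0);

	/* nanosleep() takes the period split into seconds and nanoseconds. */
	req.tv_sec = request_ns / 1000000000L;
	req.tv_nsec = request_ns % 1000000000L;

	start = now_ns();
	if (nanosleep(&req, NULL) != 0)
		return -1;
	end = now_ns();

	return end - start;
}
```

Calling measure_nanosleep_ns(1000) in a loop and averaging the results gives a figure comparable in spirit to the 1us measurement quoted above, although the absolute numbers depend on the machine, the kernel configuration and the timer slack of the calling thread.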
Re: [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
On Tue, Mar 30, 2021, Wanpeng Li wrote: > On Tue, 30 Mar 2021 at 01:15, Sean Christopherson wrote: > > > > +Thomas > > > > On Mon, Mar 29, 2021, Wanpeng Li wrote: > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > > index 32cf828..85695b3 100644 > > > --- a/arch/x86/kvm/vmx/vmx.c > > > +++ b/arch/x86/kvm/vmx/vmx.c > > > @@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct > > > kvm_vcpu *vcpu, > > >* into world and some more. > > >*/ > > > lockdep_hardirqs_off(CALLER_ADDR0); > > > - guest_exit_irqoff(); > > > + if (vtime_accounting_enabled_this_cpu()) > > > + guest_exit_irqoff(); > > > > This looks ok, as CONFIG_CONTEXT_TRACKING and > > CONFIG_VIRT_CPU_ACCOUNTING_GEN are > > selected by CONFIG_NO_HZ_FULL=y, and can't be enabled independently, e.g. > > the > > rcu_user_exit() call won't be delayed because it will never be called in the > > !vtime case. But it still feels wrong poking into those details, e.g. it'll > > be weird and/or wrong guest_exit_irqoff() gains stuff that isn't vtime > > specific. > > Could you elaborate what's the meaning of "it'll be weird and/or wrong > guest_exit_irqoff() gains stuff that isn't vtime specific."? For example, if RCU logic is added to guest_exit_irqoff() that is needed irrespective of vtime, then KVM will end up with different RCU logic depending on whether or not vtime is enabled. RCU is just an example. My point is that it doesn't seem impossible that there would be something in the future that wants to tap into the guest->host transition. Maybe that never happens and the vtime check is perfectly ok, but for me, the name guest_exit_irqoff() doesn't sound like something that should hinge on time accounting being enabled.
Re: [RESEND] i2c: mediatek: Get device clock-stretch time via dts
On Sat, Mar 13, 2021 at 04:04:24PM +0800, qii.w...@mediatek.com wrote: > From: Qii Wang > > tSU,STA/tHD,STA/tSU,STOP maybe out of spec due to device > clock-stretching or circuit loss, we could get device > clock-stretch time from dts to adjust these parameters > to meet the spec via EXT_CONF register. > > Signed-off-by: Qii Wang I tried to understand from the code what the new binding expresses, but I don't fully understand it. Is it the maximum clock stretch time? Because I cannot recall a device which always uses the same delay for clock stretching.
[PATCH] platform/surface: aggregator_registry: Give devices time to set up when connecting
Sometimes, the "base connected" event that we rely on to (re-)attach the device connected to the base is sent a bit too early. When this happens, some devices may not be completely ready yet. Specifically, the battery has been observed to report zero-values for things like full charge capacity, which, however, is only loaded once when the driver for that device probes. This can thus result in battery readings being unavailable. As we cannot easily and reliably discern between devices that are not ready yet and devices that are not connected (i.e. will never be ready), delay adding these devices. This should give them enough time to set up. The delay is set to 2.5 seconds, which should give us a good safety margin based on testing and still be fairly responsive for users. To achieve that delay switch to updating via a delayed work struct, which means that we can also get rid of some locking. Signed-off-by: Maximilian Luz --- .../surface/surface_aggregator_registry.c | 98 --- 1 file changed, 40 insertions(+), 58 deletions(-) diff --git a/drivers/platform/surface/surface_aggregator_registry.c b/drivers/platform/surface/surface_aggregator_registry.c index eccb9d1007cd..685d37a7add1 100644 --- a/drivers/platform/surface/surface_aggregator_registry.c +++ b/drivers/platform/surface/surface_aggregator_registry.c @@ -13,10 +13,10 @@ #include #include #include -#include #include #include #include +#include #include #include @@ -287,6 +287,13 @@ static int ssam_hub_add_devices(struct device *parent, struct ssam_controller *c /* -- SSAM base-hub driver. - */ +/* + * Some devices (especially battery) may need a bit of time to be fully usable + * after being (re-)connected. This delay has been determined via + * experimentation. 
+ */ +#define SSAM_BASE_UPDATE_CONNECT_DELAY msecs_to_jiffies(2500) + enum ssam_base_hub_state { SSAM_BASE_HUB_UNINITIALIZED, SSAM_BASE_HUB_CONNECTED, @@ -296,8 +303,8 @@ enum ssam_base_hub_state { struct ssam_base_hub { struct ssam_device *sdev; - struct mutex lock; /* Guards state update checks and transitions. */ enum ssam_base_hub_state state; + struct delayed_work update_work; struct ssam_event_notifier notif; }; @@ -335,11 +342,7 @@ static ssize_t ssam_base_hub_state_show(struct device *dev, struct device_attrib char *buf) { struct ssam_base_hub *hub = dev_get_drvdata(dev); - bool connected; - - mutex_lock(&hub->lock); - connected = hub->state == SSAM_BASE_HUB_CONNECTED; - mutex_unlock(&hub->lock); + bool connected = hub->state == SSAM_BASE_HUB_CONNECTED; return sysfs_emit(buf, "%d\n", connected); } @@ -356,16 +359,20 @@ static const struct attribute_group ssam_base_hub_group = { .attrs = ssam_base_hub_attrs, }; -static int __ssam_base_hub_update(struct ssam_base_hub *hub, enum ssam_base_hub_state new) +static void ssam_base_hub_update_workfn(struct work_struct *work) { + struct ssam_base_hub *hub = container_of(work, struct ssam_base_hub, update_work.work); struct fwnode_handle *node = dev_fwnode(&hub->sdev->dev); + enum ssam_base_hub_state state; int status = 0; - lockdep_assert_held(&hub->lock); + status = ssam_base_hub_query_state(hub, &state); + if (status) + return; - if (hub->state == new) - return 0; - hub->state = new; + if (hub->state == state) + return; + hub->state = state; if (hub->state == SSAM_BASE_HUB_CONNECTED) status = ssam_hub_add_devices(&hub->sdev->dev, hub->sdev->ctrl, node); @@ -374,51 +381,28 @@ static int __ssam_base_hub_update(struct ssam_base_hub *hub, enum ssam_base_hub_ if (status) dev_err(&hub->sdev->dev, "failed to update base-hub devices: %d\n", status); - - return status; -} - -static int ssam_base_hub_update(struct ssam_base_hub *hub) -{ - enum ssam_base_hub_state state; - int status; - - mutex_lock(&hub->lock); - - status = ssam_base_hub_query_state(hub, &state); - if 
(!status) - status = __ssam_base_hub_update(hub, state); - - mutex_unlock(&hub->lock); - return status; } static u32 ssam_base_hub_notif(struct ssam_event_notifier *nf, const struct ssam_event *event) { - struct ssam_base_hub *hub; - struct ssam_device *sdev; - enum ssam_base_hub_state new; - - hub = container_of(nf, struct ssam_base_hub, notif); - sdev = hub->sdev; + struct ssam_base_hub *hub = container_of(nf, struct ssam_base_hub, notif); + unsigned long delay; if (event->command_id != SSAM_EVENT_BAS_CID_CONNECTION) return 0; if (event->length < 1)
Re: [PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces
On Wed, Mar 31, 2021 at 04:48:47PM +, Christophe Leroy wrote: > This patch adds the necessary glue to provide time namespaces. > > Things are mainly copied from ARM64. > > __arch_get_timens_vdso_data() calculates timens vdso data position > based on the vdso data position, knowing it is the next page in vvar. > This avoids having to redo the mflr/bcl/mflr/mtlr dance to locate > the page relative to running code position. > Acked-by: Andrei Vagin > Signed-off-by: Christophe Leroy
[RFC PATCH v6 1/1] cmdline: Add capability to both append and prepend at the same time
One user has expressed the need to both append and prepend some built-in parameters to the command line provided by the bootloader. Although it is a corner case, it is easy to implement, so let's do it. When the user chooses to prepend the built-in command line to the bootloader-provided command line, they are offered the possibility to enter an additional built-in command line to be appended after the bootloader-provided command line. It is a complementary feature which has no impact on the already existing ones and/or the existing defconfig. Suggested-by: Daniel Walker Signed-off-by: Christophe Leroy --- Sending this out as an RFC, applies on top of the series ("Implement GENERIC_CMDLINE"). I will add it to the series next spin unless someone is against it. --- include/linux/cmdline.h | 3 +++ init/Kconfig| 12 +++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h index 020028e2bdf0..fb274a4d5519 100644 --- a/include/linux/cmdline.h +++ b/include/linux/cmdline.h @@ -36,6 +36,9 @@ static __always_inline bool __cmdline_build(char *dst, const char *src) len = cmdline_strlcat(dst, src, COMMAND_LINE_SIZE); + if (IS_ENABLED(CONFIG_CMDLINE_PREPEND)) + len = cmdline_strlcat(dst, " " CONFIG_CMDLINE_MORE, COMMAND_LINE_SIZE); + if (IS_ENABLED(CONFIG_CMDLINE_APPEND)) len = cmdline_strlcat(dst, " " CONFIG_CMDLINE, COMMAND_LINE_SIZE); diff --git a/init/Kconfig b/init/Kconfig index fa002e3765ab..cd3087ff4f28 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -128,6 +128,14 @@ config CMDLINE If this string is not empty, additional choices are proposed below to determine how it will be used by the kernel. +config CMDLINE_MORE + string "Additional default kernel command string" if GENERIC_CMDLINE && CMDLINE_PREPEND + default "" + help + Defines an additional default kernel command string. 
+ If this string is not empty, it is appended to the + command-line arguments provided by the bootloader + choice prompt "Kernel command line type" if CMDLINE != "" default CMDLINE_PREPEND if ARCH_WANT_CMDLINE_PREPEND_BY_DEFAULT @@ -154,7 +162,9 @@ config CMDLINE_PREPEND bool "Prepend to the bootloader kernel arguments" help The default kernel command string will be prepended to the - command-line arguments provided by the bootloader. + command-line arguments provided by the bootloader. When this + option is selected, another string can be added which will + be appended. config CMDLINE_FORCE bool "Always use the default kernel command string" -- 2.25.0
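To make the resulting ordering concrete, here is a small user-space sketch of how the final command line would be assembled in the prepend case: built-in string first, then the bootloader string, then the additional string. This is not the kernel code; build_cmdline() and its parameters are hypothetical stand-ins for __cmdline_build(), CONFIG_CMDLINE and CONFIG_CMDLINE_MORE, and unlike the patch it skips the extra separator when the additional string is empty.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space model of the prepend case:
 *   dst = builtin + " " + bootloader [+ " " + more]
 * Returns 0 on success, -1 if the result does not fit in dst.
 */
static int build_cmdline(char *dst, size_t size, const char *builtin,
			 const char *bootloader, const char *more)
{
	int n, m;

	assert(dst != NULL && size > 0);

	/* The built-in string goes in front of the bootloader arguments. */
	n = snprintf(dst, size, "%s %s", builtin, bootloader);
	if (n < 0 || (size_t)n >= size)
		return -1;

	/* The additional string, if any, goes after the bootloader arguments. */
	if (more[0] != '\0') {
		m = snprintf(dst + n, size - (size_t)n, " %s", more);
		if (m < 0 || (size_t)m >= size - (size_t)n)
			return -1;
	}

	return 0;
}
```

With builtin "console=ttyS0", bootloader "root=/dev/sda1" and more "quiet", the buffer ends up holding "console=ttyS0 root=/dev/sda1 quiet". Since later occurrences of a kernel parameter usually take precedence over earlier ones, the appended string is the place for settings that should win over the bootloader.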
Re: [PATCH net-next v1 6/9] net: dsa: qca: ar9331: add ageing time support
On 4/3/2021 04:48, Oleksij Rempel wrote: This switch provides global ageing time configuration, so let DSA use it. Signed-off-by: Oleksij Rempel Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH] perf record: Disallow -c and -F option at the same time
Em Fri, Apr 02, 2021 at 08:25:30PM -0700, Alexey Alexandrov escreveu: > A warning can be missed when the tool is run by some kind of automation. > Backward compatibility aside, I think conflicting flags should result in an > early exit to avoid later surprises. Sure, I agree with you in principle, but having erred out in the past, i.e. in making this be accepted, now making this out of the blue finally be considered what it always should have been considered, an error, feels like an error. I sent this message after merging the change, but before pushing it out publicly I felt some (more) discussion would be in order. Are you sure that potentially breaking existing scripts is ok in this case? Up to you, frankly. - Arnaldo > On Fri, Apr 2, 2021 at 6:37 AM Arnaldo Carvalho de Melo > wrote: > > > Em Fri, Apr 02, 2021 at 06:40:20PM +0900, Namhyung Kim escreveu: > > > It's confusing which one is effective when the both options are given. > > > The current code happens to use -c in this case but users might not be > > > aware of it. We can change it to complain about that instead of > > > relying on the implicit priority. > > > > > > Before: > > > $ perf record -c 11 -F 99 true > > > [ perf record: Woken up 1 times to write data ] > > > [ perf record: Captured and wrote 0.031 MB perf.data (8 samples) ] > > > > > > $ perf evlist -F > > > cycles: sample_period=11 > > > > > > After: > > > $ perf record -c 11 -F 99 true > > > cannot set frequency and period at the same time > > > > > > So this change can break existing usages, but I think it's rare to > > > have both options and it'd be better changing them. > > > > Humm, perhaps we can just make that an warning stating that -c is used > > if both are specified? > > > > $ perf record -c 11 -F 99 true > > Frequency and period can't be used the same time, -c 1 will be used. 
> > > > - Arnaldo > > > > > Suggested-by: Alexey Alexandrov > > > Signed-off-by: Namhyung Kim > > > --- > > > tools/perf/util/record.c | 8 +++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c > > > index f99852d54b14..43e5b563dee8 100644 > > > --- a/tools/perf/util/record.c > > > +++ b/tools/perf/util/record.c > > > @@ -157,9 +157,15 @@ static int get_max_rate(unsigned int *rate) > > > static int record_opts__config_freq(struct record_opts *opts) > > > { > > > bool user_freq = opts->user_freq != UINT_MAX; > > > + bool user_interval = opts->user_interval != ULLONG_MAX; > > > unsigned int max_rate; > > > > > > - if (opts->user_interval != ULLONG_MAX) > > > + if (user_interval && user_freq) { > > > + pr_err("cannot set frequency and period at the same > > time\n"); > > > + return -1; > > > + } > > > + > > > + if (user_interval) > > > opts->default_interval = opts->user_interval; > > > if (user_freq) > > > opts->freq = opts->user_freq; > > > -- > > > 2.31.0.208.g409f899ff0-goog > > > > > > > -- > > > > - Arnaldo > > -- - Arnaldo
Re: [PATCH net-next v1 6/9] net: dsa: qca: ar9331: add ageing time support
On Sat, Apr 03, 2021 at 01:48:45PM +0200, Oleksij Rempel wrote: > This switch provides global ageing time configuration, so let DSA use > it. > > Signed-off-by: Oleksij Rempel Reviewed-by: Andrew Lunn Andrew
[PATCH net-next v1 6/9] net: dsa: qca: ar9331: add ageing time support
This switch provides global ageing time configuration, so let DSA use it. Signed-off-by: Oleksij Rempel --- drivers/net/dsa/qca/ar9331.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/net/dsa/qca/ar9331.c b/drivers/net/dsa/qca/ar9331.c index 4a98f14f31f4..b2c22ba924f0 100644 --- a/drivers/net/dsa/qca/ar9331.c +++ b/drivers/net/dsa/qca/ar9331.c @@ -1115,6 +1115,25 @@ static void ar9331_sw_port_fast_age(struct dsa_switch *ds, int port) dev_err_ratelimited(priv->dev, "%s: error: %i\n", __func__, ret); } +static int ar9331_sw_set_ageing_time(struct dsa_switch *ds, +unsigned int ageing_time) +{ + struct ar9331_sw_priv *priv = (struct ar9331_sw_priv *)ds->priv; + struct regmap *regmap = priv->regmap; + u32 time, val; + + time = DIV_ROUND_UP(ageing_time, AR9331_SW_AT_AGE_TIME_COEF); + if (!time) + time = 1; + else if (time > U16_MAX) + time = U16_MAX; + + val = FIELD_PREP(AR9331_SW_AT_AGE_TIME, time) | AR9331_SW_AT_AGE_EN; + return regmap_update_bits(regmap, AR9331_SW_REG_ADDR_TABLE_CTRL, + AR9331_SW_AT_AGE_EN | AR9331_SW_AT_AGE_TIME, + val); +} + static const struct dsa_switch_ops ar9331_sw_ops = { .get_tag_protocol = ar9331_sw_get_tag_protocol, .setup = ar9331_sw_setup, @@ -1130,6 +1149,7 @@ static const struct dsa_switch_ops ar9331_sw_ops = { .port_fdb_dump = ar9331_sw_port_fdb_dump, .port_mdb_add = ar9331_sw_port_mdb_add, .port_mdb_del = ar9331_sw_port_mdb_del, + .set_ageing_time= ar9331_sw_set_ageing_time, }; static irqreturn_t ar9331_sw_irq(int irq, void *data) @@ -1476,6 +1496,8 @@ static int ar9331_sw_probe(struct mdio_device *mdiodev) priv->ops = ar9331_sw_ops; ds->ops = >ops; dev_set_drvdata(>dev, priv); + ds->ageing_time_min = AR9331_SW_AT_AGE_TIME_COEF; + ds->ageing_time_max = AR9331_SW_AT_AGE_TIME_COEF * U16_MAX; for (i = 0; i < ARRAY_SIZE(priv->port); i++) { struct ar9331_sw_port *port = >port[i]; -- 2.29.2
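The rounding and clamping done in ar9331_sw_set_ageing_time() can be illustrated in isolation. The sketch below is a user-space model only: the value of AR9331_SW_AT_AGE_TIME_COEF is not shown in the patch, so AGE_TIME_COEF_MS here is an assumed placeholder, and ageing_time_to_field() is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed placeholder; the patch does not show the real coefficient. */
#define AGE_TIME_COEF_MS 6900u
/* The hardware field is treated as 16 bits wide, as U16_MAX suggests. */
#define AGE_TIME_FIELD_MAX 0xffffu

/*
 * Convert an ageing time in milliseconds to the register field value:
 * round up to a whole number of coefficient units, then clamp to
 * [1, AGE_TIME_FIELD_MAX].
 */
static uint32_t ageing_time_to_field(uint32_t ageing_time_ms)
{
	/* 64-bit intermediate so the round-up addition cannot overflow. */
	uint64_t t = ((uint64_t)ageing_time_ms + AGE_TIME_COEF_MS - 1) /
		     AGE_TIME_COEF_MS;

	if (t == 0)
		t = 1;
	else if (t > AGE_TIME_FIELD_MAX)
		t = AGE_TIME_FIELD_MAX;

	assert(t >= 1 && t <= AGE_TIME_FIELD_MAX);
	return (uint32_t)t;
}
```

Rounding up means the programmed ageing time is never shorter than the requested one, and the lower clamp avoids writing a zero period while ageing is enabled. Under the assumed coefficient, a request of 300000 ms maps to a field value of 44.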
[PATCH v2 5/7] perf arm-spe: Assign kernel time to synthesized event
The current code assigns the arch timer counter to the samples synthesized from the Arm SPE trace, so the samples contain only the raw counter value rather than the kernel time. To fix the issue, this patch converts the timer counter to kernel time and assigns it to the sample timestamp. Signed-off-by: Leo Yan --- tools/perf/util/arm-spe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index 4cf558b0218a..80f5659e7f7e 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -234,7 +234,7 @@ static void arm_spe_prep_sample(struct arm_spe *spe, struct arm_spe_record *record = &speq->decoder->record; if (!spe->timeless_decoding) - sample->time = speq->timestamp; + sample->time = tsc_to_perf_time(record->timestamp, &spe->tc); sample->ip = record->from_ip; sample->cpumode = arm_spe_cpumode(spe, sample->ip); -- 2.25.1
[PATCH v2 4/7] perf arm-spe: Convert event kernel time to counter value
When handling a perf event, the Arm SPE decoder needs to decide whether this perf event is earlier or later than the samples from the Arm SPE trace data; to do the comparison, it needs to use the same unit for the time. This patch converts the event kernel time to the arch timer's counter value, so it can be compared with the counter value contained in the Arm SPE Timestamp packet. Signed-off-by: Leo Yan --- tools/perf/util/arm-spe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index 69ce3483d1af..4cf558b0218a 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -669,7 +669,7 @@ static int arm_spe_process_event(struct perf_session *session, } if (sample->time && (sample->time != (u64) -1)) - timestamp = perf_time_to_tsc(sample->time, &spe->tc); + timestamp = perf_time_to_tsc(sample->time, &spe->tc); else timestamp = 0; -- 2.25.1
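For readers unfamiliar with the conversion between kernel time and counter values, the arithmetic can be sketched as follows. This is a simplified user-space model based on the time_shift / time_mult / time_zero scheme documented for the perf mmap page; struct tsc_conv, counter_to_ns() and ns_to_counter() are hypothetical names, and the sketch ignores the short-counter handling that the real helpers may perform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the conversion parameters. */
struct tsc_conv {
	uint16_t time_shift;
	uint32_t time_mult;
	uint64_t time_zero;
};

/* Counter value to kernel time in ns: time_zero + cyc * mult / 2^shift. */
static uint64_t counter_to_ns(uint64_t cyc, const struct tsc_conv *tc)
{
	/* Split the product to avoid overflowing 64 bits. */
	uint64_t quot = cyc >> tc->time_shift;
	uint64_t rem = cyc & (((uint64_t)1 << tc->time_shift) - 1);

	return tc->time_zero + quot * tc->time_mult +
	       ((rem * tc->time_mult) >> tc->time_shift);
}

/* Kernel time in ns back to a counter value (inverse of the above). */
static uint64_t ns_to_counter(uint64_t ns, const struct tsc_conv *tc)
{
	uint64_t t, quot, rem;

	assert(tc->time_mult != 0);

	t = ns - tc->time_zero;
	quot = t / tc->time_mult;
	rem = t % tc->time_mult;

	return (quot << tc->time_shift) +
	       (rem << tc->time_shift) / tc->time_mult;
}
```

As an example, time_shift = 10 and time_mult = 512 describe a counter ticking every 0.5 ns, so 2000 counter ticks correspond to 1000 ns after time_zero. Once both timestamps are expressed in the same unit, the decoder can order perf events against trace samples with a plain integer comparison.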
Re: [PATCH] perf record: Disallow -c and -F option at the same time
Em Fri, Apr 02, 2021 at 06:40:20PM +0900, Namhyung Kim escreveu: > It's confusing which one is effective when the both options are given. > The current code happens to use -c in this case but users might not be > aware of it. We can change it to complain about that instead of > relying on the implicit priority. > > Before: > $ perf record -c 11 -F 99 true > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.031 MB perf.data (8 samples) ] > > $ perf evlist -F > cycles: sample_period=11 > > After: > $ perf record -c 11 -F 99 true > cannot set frequency and period at the same time > > So this change can break existing usages, but I think it's rare to > have both options and it'd be better changing them. Humm, perhaps we can just make that an warning stating that -c is used if both are specified? $ perf record -c 11 -F 99 true Frequency and period can't be used the same time, -c 1 will be used. - Arnaldo > Suggested-by: Alexey Alexandrov > Signed-off-by: Namhyung Kim > --- > tools/perf/util/record.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c > index f99852d54b14..43e5b563dee8 100644 > --- a/tools/perf/util/record.c > +++ b/tools/perf/util/record.c > @@ -157,9 +157,15 @@ static int get_max_rate(unsigned int *rate) > static int record_opts__config_freq(struct record_opts *opts) > { > bool user_freq = opts->user_freq != UINT_MAX; > + bool user_interval = opts->user_interval != ULLONG_MAX; > unsigned int max_rate; > > - if (opts->user_interval != ULLONG_MAX) > + if (user_interval && user_freq) { > + pr_err("cannot set frequency and period at the same time\n"); > + return -1; > + } > + > + if (user_interval) > opts->default_interval = opts->user_interval; > if (user_freq) > opts->freq = opts->user_freq; > -- > 2.31.0.208.g409f899ff0-goog > -- - Arnaldo
Re: [PATCH v3 3/3] ima: enable loading of build time generated key on .ima keyring
On 3/30/21 9:16 AM, Nayna Jain wrote: The kernel currently only loads the kernel module signing key onto the builtin trusted keyring. Load the module signing key onto the IMA keyring as well. Signed-off-by: Nayna Jain Acked-by: Stefan Berger --- certs/system_certificates.S | 13 +- certs/system_keyring.c| 47 +++ include/keys/system_keyring.h | 7 ++ security/integrity/digsig.c | 2 ++ 4 files changed, 58 insertions(+), 11 deletions(-) diff --git a/certs/system_certificates.S b/certs/system_certificates.S index 8f29058adf93..dcad27ea8527 100644 --- a/certs/system_certificates.S +++ b/certs/system_certificates.S @@ -8,9 +8,11 @@ .globl system_certificate_list system_certificate_list: __cert_list_start: -#ifdef CONFIG_MODULE_SIG +__module_cert_start: +#if defined(CONFIG_MODULE_SIG) || defined(CONFIG_IMA_APPRAISE_MODSIG) .incbin "certs/signing_key.x509" #endif +__module_cert_end: .incbin "certs/x509_certificate_list" __cert_list_end: @@ -35,3 +37,12 @@ system_certificate_list_size: #else .long __cert_list_end - __cert_list_start #endif + + .align 8 + .globl module_cert_size +module_cert_size: +#ifdef CONFIG_64BIT + .quad __module_cert_end - __module_cert_start +#else + .long __module_cert_end - __module_cert_start +#endif diff --git a/certs/system_keyring.c b/certs/system_keyring.c index 4b693da488f1..bb122bf4cc17 100644 --- a/certs/system_keyring.c +++ b/certs/system_keyring.c @@ -27,6 +27,7 @@ static struct key *platform_trusted_keys; extern __initconst const u8 system_certificate_list[]; extern __initconst const unsigned long system_certificate_list_size; +extern __initconst const unsigned long module_cert_size; /** * restrict_link_to_builtin_trusted - Restrict keyring addition by built in CA @@ -132,19 +133,11 @@ static __init int system_trusted_keyring_init(void) */ device_initcall(system_trusted_keyring_init); -/* - * Load the compiled-in list of X.509 certificates. 
- */ -static __init int load_system_certificate_list(void) +static __init int load_cert(const u8 *p, const u8 *end, struct key *keyring) { key_ref_t key; - const u8 *p, *end; size_t plen; - pr_notice("Loading compiled-in X.509 certificates\n"); - - p = system_certificate_list; - end = p + system_certificate_list_size; while (p < end) { /* Each cert begins with an ASN.1 SEQUENCE tag and must be more * than 256 bytes in size. @@ -159,7 +152,7 @@ static __init int load_system_certificate_list(void) if (plen > end - p) goto dodgy_cert; - key = key_create_or_update(make_key_ref(builtin_trusted_keys, 1), + key = key_create_or_update(make_key_ref(keyring, 1), "asymmetric", NULL, p, @@ -186,6 +179,40 @@ static __init int load_system_certificate_list(void) pr_err("Problem parsing in-kernel X.509 certificate list\n"); return 0; } + +__init int load_module_cert(struct key *keyring) +{ + const u8 *p, *end; + + if (!IS_ENABLED(CONFIG_IMA_APPRAISE_MODSIG)) + return 0; + + pr_notice("Loading compiled-in module X.509 certificates\n"); + + p = system_certificate_list; + end = p + module_cert_size; + + return load_cert(p, end, keyring); +} + +/* + * Load the compiled-in list of X.509 certificates. 
+ */ +static __init int load_system_certificate_list(void) +{ + const u8 *p, *end; + + pr_notice("Loading compiled-in X.509 certificates\n"); + +#ifdef CONFIG_MODULE_SIG + p = system_certificate_list; +#else + p = system_certificate_list + module_cert_size; +#endif + + end = p + system_certificate_list_size; + return load_cert(p, end, builtin_trusted_keys); +} late_initcall(load_system_certificate_list); #ifdef CONFIG_SYSTEM_DATA_VERIFICATION diff --git a/include/keys/system_keyring.h b/include/keys/system_keyring.h index fb8b07daa9d1..f954276c616a 100644 --- a/include/keys/system_keyring.h +++ b/include/keys/system_keyring.h @@ -16,9 +16,16 @@ extern int restrict_link_by_builtin_trusted(struct key *keyring, const struct key_type *type, const union key_payload *payload, struct key *restriction_key); +extern __init int load_module_cert(struct key *keyring); #else #define restrict_link_by_builtin_trusted restrict_link_reject + +static inline __init int load_module_cert(struct key *keyring) +{ + return 0; +} + #endif #ifdef CONFIG_SECONDARY_TRUSTED_KEYRING diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c index 250fb0836156..3b06a01bd0fd 100644 ---
Re: [PATCH v3 2/3] ima: enable signing of modules with build time generated key
On 3/30/21 9:16 AM, Nayna Jain wrote: The kernel build process currently only signs kernel modules when MODULE_SIG is enabled. Also, sign the kernel modules at build time when IMA_APPRAISE_MODSIG is enabled. Signed-off-by: Nayna Jain Acked-by: Stefan Berger --- certs/Kconfig | 2 +- certs/Makefile | 8 init/Kconfig | 6 +++--- 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/certs/Kconfig b/certs/Kconfig index c94e93d8bccf..48675ad319db 100644 --- a/certs/Kconfig +++ b/certs/Kconfig @@ -4,7 +4,7 @@ menu "Certificates for signature checking" config MODULE_SIG_KEY string "File name or PKCS#11 URI of module signing key" default "certs/signing_key.pem" - depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG help Provide the file name of a private key/certificate in PEM format, or a PKCS#11 URI according to RFC7512. The file should contain, or diff --git a/certs/Makefile b/certs/Makefile index f4c25b67aad9..e3185c57fbd8 100644 --- a/certs/Makefile +++ b/certs/Makefile @@ -32,6 +32,14 @@ endif # CONFIG_SYSTEM_TRUSTED_KEYRING clean-files := x509_certificate_list .x509.list ifeq ($(CONFIG_MODULE_SIG),y) + SIGN_KEY = y +endif + +ifeq ($(CONFIG_IMA_APPRAISE_MODSIG),y) + SIGN_KEY = y +endif + +ifdef SIGN_KEY ### # # If module signing is requested, say by allyesconfig, but a key has not been diff --git a/init/Kconfig b/init/Kconfig index 5f5c776ef192..85e48a578f90 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -2164,7 +2164,7 @@ config MODULE_SIG_FORCE config MODULE_SIG_ALL bool "Automatically sign all modules" default y - depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG help Sign all modules during make modules_install. Without this option, modules must be signed manually, using the scripts/sign-file tool. @@ -2174,7 +2174,7 @@ comment "Do not forget to sign required modules with scripts/sign-file" choice prompt "Which hash algorithm should modules be signed with?" 
- depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG help This determines which sort of hashing algorithm will be used during signature generation. This algorithm _must_ be built into the kernel @@ -2206,7 +2206,7 @@ endchoice config MODULE_SIG_HASH string - depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG default "sha1" if MODULE_SIG_SHA1 default "sha224" if MODULE_SIG_SHA224 default "sha256" if MODULE_SIG_SHA256
[PATCH] perf record: Disallow -c and -F option at the same time
It's confusing which one is effective when both options are given. The current code happens to use -c in this case but users might not be aware of it. We can change it to complain about that instead of relying on the implicit priority. Before: $ perf record -c 11 -F 99 true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.031 MB perf.data (8 samples) ] $ perf evlist -F cycles: sample_period=11 After: $ perf record -c 11 -F 99 true cannot set frequency and period at the same time So this change can break existing usages, but I think it's rare to have both options and it'd be better changing them. Suggested-by: Alexey Alexandrov Signed-off-by: Namhyung Kim --- tools/perf/util/record.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c index f99852d54b14..43e5b563dee8 100644 --- a/tools/perf/util/record.c +++ b/tools/perf/util/record.c @@ -157,9 +157,15 @@ static int get_max_rate(unsigned int *rate) static int record_opts__config_freq(struct record_opts *opts) { bool user_freq = opts->user_freq != UINT_MAX; + bool user_interval = opts->user_interval != ULLONG_MAX; unsigned int max_rate; - if (opts->user_interval != ULLONG_MAX) + if (user_interval && user_freq) { + pr_err("cannot set frequency and period at the same time\n"); + return -1; + } + + if (user_interval) opts->default_interval = opts->user_interval; if (user_freq) opts->freq = opts->user_freq; -- 2.31.0.208.g409f899ff0-goog
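The check introduced by this patch can be modelled outside of perf as below. This is only an illustrative sketch: struct sample_opts and sample_opts_config() are hypothetical stand-ins for struct record_opts and record_opts__config_freq(), keeping just the fields touched by the diff and using the same "unset" sentinels (UINT_MAX and ULLONG_MAX).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the fields of struct record_opts used here. */
struct sample_opts {
	unsigned int user_freq;           /* UINT_MAX when -F was not given */
	unsigned long long user_interval; /* ULLONG_MAX when -c was not given */
	unsigned int freq;
	unsigned long long default_interval;
};

/* Returns 0 on success, -1 if both a frequency and a period were given. */
static int sample_opts_config(struct sample_opts *opts)
{
	bool user_freq, user_interval;

	assert(opts != NULL);

	user_freq = opts->user_freq != UINT_MAX;
	user_interval = opts->user_interval != ULLONG_MAX;

	/* Reject the ambiguous combination instead of silently picking one. */
	if (user_interval && user_freq) {
		fprintf(stderr, "cannot set frequency and period at the same time\n");
		return -1;
	}

	if (user_interval)
		opts->default_interval = opts->user_interval;
	if (user_freq)
		opts->freq = opts->user_freq;

	return 0;
}
```

Failing early trades backward compatibility for predictability: a script that passed both options used to get period-based sampling without notice, and now gets an error instead.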
[PATCH 3/3] powerpc/32s: Define a MODULE area below kernel text all the time
On book3s/32, the segment below kernel text is used for module allocation when CONFIG_STRICT_KERNEL_RWX is defined. In order to benefit from the powerpc specific module_alloc() function, which allocates modules within 32 Mbytes of the end of kernel text, use that segment below PAGE_OFFSET at all times. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/book3s/32/pgtable.h | 2 -- arch/powerpc/mm/book3s32/mmu.c | 7 --- 3 files changed, 1 insertion(+), 10 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c1344c05226c..15a91202d5c3 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -1219,7 +1219,7 @@ config TASK_SIZE_BOOL config TASK_SIZE hex "Size of user task space" if TASK_SIZE_BOOL default "0x8000" if PPC_8xx - default "0xb000" if PPC_BOOK3S_32 && STRICT_KERNEL_RWX + default "0xb000" if PPC_BOOK3S_32 default "0xc000" endmenu diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h index 415ae29fa73a..83c65845a1a9 100644 --- a/arch/powerpc/include/asm/book3s/32/pgtable.h +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -194,10 +194,8 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot); #define VMALLOC_ENDioremap_bot #endif -#ifdef CONFIG_STRICT_KERNEL_RWX #define MODULES_ENDALIGN_DOWN(PAGE_OFFSET, SZ_256M) #define MODULES_VADDR (MODULES_END - SZ_256M) -#endif #ifndef __ASSEMBLY__ #include diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c index a0db398b5c26..159930351d9f 100644 --- a/arch/powerpc/mm/book3s32/mmu.c +++ b/arch/powerpc/mm/book3s32/mmu.c @@ -184,17 +184,10 @@ static bool is_module_segment(unsigned long addr) { if (!IS_ENABLED(CONFIG_MODULES)) return false; -#ifdef MODULES_VADDR if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M)) return false; if (addr > ALIGN(MODULES_END, SZ_256M) - 1) return false; -#else - if (addr < ALIGN_DOWN(VMALLOC_START, SZ_256M)) - return false; - if (addr > 
ALIGN(VMALLOC_END, SZ_256M) - 1) - return false; -#endif return true; } -- 2.25.0
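The address arithmetic behind the module area can be checked with a few lines of user-space C. The sketch below only mirrors the MODULES_END / MODULES_VADDR definitions and the range test from the diff; the PAGE_OFFSET value of 0xc0000000 used in the usage note is an assumption, the function names are made up, and the CONFIG_MODULES check is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256M 0x10000000u

/* Round x down or up to a multiple of a (a must be a power of two). */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))
#define ALIGN_UP(x, a)   (((x) + (a) - 1) & ~((a) - 1))

/* End of the module area: PAGE_OFFSET rounded down to a 256M segment. */
static uint32_t modules_end(uint32_t page_offset)
{
	return ALIGN_DOWN(page_offset, SZ_256M);
}

/* Start of the module area: one 256M segment below its end. */
static uint32_t modules_vaddr(uint32_t page_offset)
{
	return modules_end(page_offset) - SZ_256M;
}

/* Hypothetical model of the range test: is addr in a module segment? */
static int is_module_segment(uint32_t addr, uint32_t page_offset)
{
	uint32_t lo = ALIGN_DOWN(modules_vaddr(page_offset), SZ_256M);
	uint32_t hi = ALIGN_UP(modules_end(page_offset), SZ_256M) - 1;

	assert(lo <= hi);
	return addr >= lo && addr <= hi;
}
```

Assuming PAGE_OFFSET is 0xc0000000, the module area spans 0xb0000000 to 0xbfffffff, which is exactly one 256M segment directly below the kernel. That is consistent with the user task size being lowered so that user space ends where the module segment begins.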
[PATCH RESEND v1 4/4] powerpc/vdso: Add support for time namespaces
This patch adds the necessary glue to provide time namespaces. Things are mainly copied from ARM64. __arch_get_timens_vdso_data() calculates timens vdso data position based on the vdso data position, knowing it is the next page in vvar. This avoids having to redo the mflr/bcl/mflr/mtlr dance to locate the page relative to running code position. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 3 +- arch/powerpc/include/asm/vdso/gettimeofday.h | 10 ++ arch/powerpc/include/asm/vdso_datapage.h | 2 - arch/powerpc/kernel/vdso.c | 116 --- arch/powerpc/kernel/vdso32/vdso32.lds.S | 2 +- arch/powerpc/kernel/vdso64/vdso64.lds.S | 2 +- 6 files changed, 114 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c1344c05226c..71daff5f15d5 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -172,6 +172,7 @@ config PPC select GENERIC_CPU_AUTOPROBE select GENERIC_CPU_VULNERABILITIES if PPC_BARRIER_NOSPEC select GENERIC_EARLY_IOREMAP + select GENERIC_GETTIMEOFDAY select GENERIC_IRQ_SHOW select GENERIC_IRQ_SHOW_LEVEL select GENERIC_PCI_IOMAPif PCI @@ -179,7 +180,7 @@ config PPC select GENERIC_STRNCPY_FROM_USER select GENERIC_STRNLEN_USER select GENERIC_TIME_VSYSCALL - select GENERIC_GETTIMEOFDAY + select GENERIC_VDSO_TIME_NS select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HUGE_VMAP if PPC_BOOK3S_64 && PPC_RADIX_MMU select HAVE_ARCH_JUMP_LABEL diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h index d453e725c79f..e448df1dd071 100644 --- a/arch/powerpc/include/asm/vdso/gettimeofday.h +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h @@ -2,6 +2,8 @@ #ifndef _ASM_POWERPC_VDSO_GETTIMEOFDAY_H #define _ASM_POWERPC_VDSO_GETTIMEOFDAY_H +#include + #ifdef __ASSEMBLY__ #include @@ -153,6 +155,14 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode, const struct vdso_data *__arch_get_vdso_data(void); +#ifdef CONFIG_TIME_NS +static __always_inline +const struct 
vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd) +{ + return (void *)vd + PAGE_SIZE; +} +#endif + static inline bool vdso_clocksource_ok(const struct vdso_data *vd) { return true; diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h index 3f958ecf2beb..a585c8e538ff 100644 --- a/arch/powerpc/include/asm/vdso_datapage.h +++ b/arch/powerpc/include/asm/vdso_datapage.h @@ -107,9 +107,7 @@ extern struct vdso_arch_data *vdso_data; bcl 20, 31, .+4 999: mflr\ptr -#if CONFIG_PPC_PAGE_SHIFT > 14 addis \ptr, \ptr, (_vdso_datapage - 999b)@ha -#endif addi\ptr, \ptr, (_vdso_datapage - 999b)@l .endm diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index b14907209822..717f2c9a7573 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -50,6 +51,12 @@ static union { } vdso_data_store __page_aligned_data; struct vdso_arch_data *vdso_data = _data_store.data; +enum vvar_pages { + VVAR_DATA_PAGE_OFFSET, + VVAR_TIMENS_PAGE_OFFSET, + VVAR_NR_PAGES, +}; + static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma, unsigned long text_size) { @@ -73,8 +80,12 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str return vdso_mremap(sm, new_vma, _end - _start); } +static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, +struct vm_area_struct *vma, struct vm_fault *vmf); + static struct vm_special_mapping vvar_spec __ro_after_init = { .name = "[vvar]", + .fault = vvar_fault, }; static struct vm_special_mapping vdso32_spec __ro_after_init = { @@ -87,6 +98,94 @@ static struct vm_special_mapping vdso64_spec __ro_after_init = { .mremap = vdso64_mremap, }; +#ifdef CONFIG_TIME_NS +struct vdso_data *arch_get_vdso_data(void *vvar_page) +{ + return ((struct vdso_arch_data *)vvar_page)->data; +} + +/* + * The vvar mapping contains data for a specific time 
namespace, so when a task + * changes namespace we must unmap its vvar data for the old namespace. + * Subsequent faults will map in data for the new namespace. + * + * For more details see timens_setup_vdso_data(). + */ +int vdso_join_timens(struct task_struct *task, struct time_namespace *ns) +{ + struct mm_struct *mm = task->mm; + struct vm_area_struct *vma; + + mmap_read_lock(mm); + + for (vma = mm->mmap; vma; vma = vma->vm_next) { + unsigned long s
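To illustrate the layout the commit message relies on (the timens page being the page right after the vdso data page in vvar), here is a minimal userspace sketch. It is not the powerpc code: EXAMPLE_PAGE_SIZE, vvar_page() and timens_data() are hypothetical names, and the 4096-byte page size is an assumption.

```c
#include <stddef.h>

/* Assumption for this sketch only; the kernel uses PAGE_SIZE. */
#define EXAMPLE_PAGE_SIZE 4096UL

/* Same page order as the enum added to arch/powerpc/kernel/vdso.c. */
enum vvar_pages {
	VVAR_DATA_PAGE_OFFSET,
	VVAR_TIMENS_PAGE_OFFSET,
	VVAR_NR_PAGES,
};

/* Hypothetical helper: address of a given page inside the vvar area. */
static const void *vvar_page(const void *vvar_base, enum vvar_pages which)
{
	return (const char *)vvar_base + which * EXAMPLE_PAGE_SIZE;
}

/*
 * Hypothetical stand-in for __arch_get_timens_vdso_data(): given the
 * data page, the timens page is simply the next page, so no
 * position-relative address computation is needed.
 */
static const void *timens_data(const void *vd)
{
	return (const char *)vd + EXAMPLE_PAGE_SIZE;
}
```

The point of the design is that the caller already holds the vdso data pointer, so one addition replaces a second round of locating the page relative to the running code.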
[PATCH RESEND v1 0/4] powerpc/vdso: Add support for time namespaces
[Sorry, resending with the complete destination list; I used the wrong script on the first delivery]

This series adds support for time namespaces on powerpc.

All timens selftests are successful.

Christophe Leroy (3):
  lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline()
  lib/vdso: Add vdso_data pointer as input to __arch_get_timens_vdso_data()
  powerpc/vdso: Add support for time namespaces

Dmitry Safonov (1):
  powerpc/vdso: Separate vvar vma from vdso

 .../include/asm/vdso/compat_gettimeofday.h   |   3 +-
 arch/arm64/include/asm/vdso/gettimeofday.h   |   2 +-
 arch/powerpc/Kconfig                         |   3 +-
 arch/powerpc/include/asm/mmu_context.h       |   2 +-
 arch/powerpc/include/asm/vdso/gettimeofday.h |  10 ++
 arch/powerpc/include/asm/vdso_datapage.h     |   2 -
 arch/powerpc/kernel/vdso.c                   | 138 --
 arch/powerpc/kernel/vdso32/vdso32.lds.S      |   2 +-
 arch/powerpc/kernel/vdso64/vdso64.lds.S      |   2 +-
 arch/s390/include/asm/vdso/gettimeofday.h    |   3 +-
 arch/x86/include/asm/vdso/gettimeofday.h     |   3 +-
 lib/vdso/gettimeofday.c                      |  31 ++--
 12 files changed, 162 insertions(+), 39 deletions(-)

--
2.25.0
Re: [PATCH v3 1/3] keys: cleanup build time module signing keys
On Tue, Mar 30, 2021 at 09:16:34AM -0400, Nayna Jain wrote:
> The "mrproper" target is still looking for build time generated keys in
> the kernel root directory instead of certs directory. Fix the path and
> remove the names of the files which are no longer generated.
>
> Fixes: cfc411e7fff3 ("Move certificate handling to its own directory")
> Signed-off-by: Nayna Jain
> Reviewed-by: Stefan Berger
> Reviewed-by: Mimi Zohar
> ---
>  Makefile | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index d4784d181123..b7c2ed2a8684 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1523,9 +1523,9 @@ MRPROPER_FILES += include/config include/generated \
>  		  debian snap tar-install \
>  		  .config .config.old .version \
>  		  Module.symvers \
> -		  signing_key.pem signing_key.priv signing_key.x509 \
> -		  x509.genkey extra_certificates signing_key.x509.keyid \
> -		  signing_key.x509.signer vmlinux-gdb.py \
> +		  certs/signing_key.pem certs/signing_key.x509 \
> +		  certs/x509.genkey \
> +		  vmlinux-gdb.py \
>  		  *.spec
>
>  # Directories & files removed with 'make distclean'
> --
> 2.29.2
>
>

Reviewed-by: Jarkko Sakkinen

/Jarkko
[RFC v2 42/43] shmem: reduce time holding xa_lock when inserting pages
Rather than adding one page at a time to the page cache and taking the page cache xarray lock each time, where possible add pages in bulk by first populating an xarray node outside of the page cache before taking the lock to insert it. When a group of pages to be inserted will fill an xarray node, add them to a local xarray, export the xarray node, and then take the lock on the page cache xarray and insert the node. Signed-off-by: Anthony Yznaga --- mm/shmem.c | 162 ++--- 1 file changed, 156 insertions(+), 6 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index f495af51042e..a7c23b43b57f 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -827,17 +827,149 @@ static void shmem_delete_from_page_cache(struct page *page, void *radswap) BUG_ON(error); } +static int shmem_add_aligned_to_page_cache(struct page *pages[], int npages, + struct address_space *mapping, + pgoff_t index, gfp_t gfp, int order, + struct mm_struct *charge_mm) +{ + int xa_shift = order + XA_CHUNK_SHIFT - (order % XA_CHUNK_SHIFT); + XA_STATE_ORDER(xas, >i_pages, index, xa_shift); + struct xarray xa_tmp; + /* +* Specify order so xas_create_range() only needs to be called once +* to allocate the entire range. This guarantees that xas_store() +* will not fail due to lack of memory. +* Specify index == 0 so the minimum necessary nodes are allocated. +*/ + XA_STATE_ORDER(xas_tmp, _tmp, 0, xa_shift); + unsigned long nr = 1UL << order; + struct xa_node *node; + int i, error; + + if (npages * nr != 1 << xa_shift) { + WARN_ONCE(1, "npages (%d) not aligned to xa_shift\n", npages); + return -EINVAL; + } + if (!IS_ALIGNED(index, 1 << xa_shift)) { + WARN_ONCE(1, "index (%lu) not aligned to xa_shift\n", index); + return -EINVAL; + } + + for (i = 0; i < npages; i++) { + bool skipcharge = page_memcg(pages[i]) ? 
true : false; + + VM_BUG_ON_PAGE(PageTail(pages[i]), pages[i]); + VM_BUG_ON_PAGE(!PageLocked(pages[i]), pages[i]); + VM_BUG_ON_PAGE(!PageSwapBacked(pages[i]), pages[i]); + + page_ref_add(pages[i], nr); + pages[i]->mapping = mapping; + pages[i]->index = index + (i * nr); + + if (!skipcharge && !PageSwapCache(pages[i])) { + error = mem_cgroup_charge(pages[i], charge_mm, gfp); + if (error) { + if (PageTransHuge(pages[i])) { + count_vm_event(THP_FILE_FALLBACK); + count_vm_event(THP_FILE_FALLBACK_CHARGE); + } + goto error; + } + } + cgroup_throttle_swaprate(pages[i], gfp); + } + + xa_init(_tmp); + do { + xas_lock(_tmp); + xas_create_range(_tmp); + if (xas_error(_tmp)) + goto unlock; + for (i = 0; i < npages; i++) { + int j = 0; +next: + xas_store(_tmp, pages[i]); + if (++j < nr) { + xas_next(_tmp); + goto next; + } + if (i < npages - 1) + xas_next(_tmp); + } + xas_set_order(_tmp, 0, xa_shift); + node = xas_export_node(_tmp); +unlock: + xas_unlock(_tmp); + } while (xas_nomem(_tmp, gfp)); + + if (xas_error(_tmp)) { + error = xas_error(_tmp); + i = npages - 1; + goto error; + } + + do { + xas_lock_irq(); + xas_import_node(, node); + if (xas_error()) + goto unlock1; + mapping->nrpages += nr * npages; + xas_unlock(); + for (i = 0; i < npages; i++) { + __mod_lruvec_page_state(pages[i], NR_FILE_PAGES, nr); + __mod_lruvec_page_state(pages[i], NR_SHMEM, nr); + if (PageTransHuge(pages[i])) { + count_vm_event(THP_FILE_ALLOC); + __inc_node_page_state(pages[i], NR_SHMEM_THPS); + } + } + local_irq_enable(); + break; +unlock1: + xas_unlock_irq(); + } while (xas_nomem(, gfp)); + + if (xas_error()) { + error = xas_error(); +
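As a reading aid for the two sanity checks at the top of shmem_add_aligned_to_page_cache(), the following userspace sketch reproduces the xa_shift arithmetic. It assumes XA_CHUNK_SHIFT is 6 (the usual value when CONFIG_BASE_SMALL is not set); batch_shift() and batch_is_insertable() are hypothetical names, not part of the patch.

```c
#include <stdbool.h>

/* Assumption: 64 slots per xarray node. */
#define XA_CHUNK_SHIFT 6

/*
 * Same expression as in the patch: the smallest multiple of
 * XA_CHUNK_SHIFT that is strictly greater than the page order, i.e.
 * the shift covered by one full node of entries of that order.
 */
static int batch_shift(int order)
{
	return order + XA_CHUNK_SHIFT - (order % XA_CHUNK_SHIFT);
}

/*
 * A batch qualifies for the bulk path only if it covers exactly
 * 1 << xa_shift indices and starts on a boundary of that size.
 */
static bool batch_is_insertable(unsigned long index, int npages, int order)
{
	unsigned long span = 1UL << batch_shift(order);
	unsigned long nr = 1UL << order;

	if ((unsigned long)npages * nr != span)
		return false;
	return (index & (span - 1)) == 0;
}
```

With these assumptions, 64 order-0 pages starting at an index that is a multiple of 64 qualify, as do 8 order-9 pages starting at a multiple of 4096.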
[RFC v2 32/43] shmem: preserve shmem files a chunk at a time
To prepare for multithreading the work to preserve a shmem file,
divide the work into subranges of the total index range of the file.
The chunk size is a rather arbitrary 256k indices.

Signed-off-by: Anthony Yznaga
---
 mm/shmem_pkram.c | 64 +---
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/mm/shmem_pkram.c b/mm/shmem_pkram.c
index 8682b0c002c0..e52722b3a709 100644
--- a/mm/shmem_pkram.c
+++ b/mm/shmem_pkram.c
@@ -74,16 +74,14 @@ static int save_page(struct page *page, struct pkram_access *pa)
 	return err;
 }
 
-static int save_file_content(struct pkram_stream *ps, struct address_space *mapping)
+static int save_file_content_range(struct pkram_access *pa,
+				   struct address_space *mapping,
+				   unsigned long start, unsigned long end)
 {
-	PKRAM_ACCESS(pa, ps, pages);
 	struct pagevec pvec;
-	unsigned long start, end;
 	int err = 0;
 	int i;
 
-	start = 0;
-	end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
 	pagevec_init(&pvec);
 	for ( ; ; ) {
 		pvec.nr = find_get_pages_range(mapping, &start, end,
@@ -95,7 +93,7 @@ static int save_file_content(struct pkram_stream *ps, struct address_space *mapp
 			lock_page(page);
 
 			BUG_ON(page->mapping != mapping);
-			err = save_page(page, &pa);
+			err = save_page(page, pa);
 			if (PageCompound(page)) {
 				start = page->index + compound_nr(page);
 				i += compound_nr(page);
@@ -113,10 +111,62 @@ static int save_file_content(struct pkram_stream *ps, struct address_space *mapp
 		cond_resched();
 	}
 
-	pkram_finish_access(&pa, err == 0);
 	return err;
 }
 
+struct shmem_pkram_arg {
+	struct pkram_stream *ps;
+	struct address_space *mapping;
+	struct mm_struct *mm;
+	atomic64_t next;
+};
+
+unsigned long shmem_pkram_max_index_range = 512 * 512;
+
+static int get_save_range(unsigned long max, atomic64_t *next, unsigned long *start, unsigned long *end)
+{
+	unsigned long index;
+
+	index = atomic64_fetch_add(shmem_pkram_max_index_range, next);
+	if (index >= max)
+		return -ENODATA;
+
+	*start = index;
+	*end = index + shmem_pkram_max_index_range - 1;
+
+	return 0;
+}
+
+static int do_save_file_content(struct pkram_stream *ps,
+				struct address_space *mapping,
+				atomic64_t *next)
+{
+	PKRAM_ACCESS(pa, ps, pages);
+	unsigned long start, end, max;
+	int ret;
+
+	max = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
+
+	do {
+		ret = get_save_range(max, next, &start, &end);
+		if (!ret)
+			ret = save_file_content_range(&pa, mapping, start, end);
+	} while (!ret);
+
+	if (ret == -ENODATA)
+		ret = 0;
+
+	pkram_finish_access(&pa, ret == 0);
+	return ret;
+}
+
+static int save_file_content(struct pkram_stream *ps, struct address_space *mapping)
+{
+	struct shmem_pkram_arg arg = { ps, mapping, NULL, ATOMIC64_INIT(0) };
+
+	return do_save_file_content(arg.ps, arg.mapping, &arg.next);
+}
+
 static int save_file(struct dentry *dentry, struct pkram_stream *ps)
 {
 	PKRAM_ACCESS(pa_bytes, ps, bytes);
-- 
1.8.3.1
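The range hand-out in get_save_range() can be tried outside the kernel. The sketch below is a userspace approximation using C11 atomics instead of atomic64_t; CHUNK_INDICES and claim_range() are hypothetical names, and the chunk size simply mirrors the 512 * 512 value from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Mirrors shmem_pkram_max_index_range (512 * 512 = 256k indices). */
#define CHUNK_INDICES (512UL * 512UL)

/*
 * Claim the next chunk of page indices from a shared cursor.
 * Each caller gets a distinct [start, end] range because the
 * fetch-and-add is atomic. Returns false once the cursor has moved
 * past max, which plays the role of -ENODATA in the patch.
 */
static bool claim_range(atomic_ulong *next, unsigned long max,
			unsigned long *start, unsigned long *end)
{
	unsigned long index = atomic_fetch_add(next, CHUNK_INDICES);

	if (index >= max)
		return false;

	*start = index;
	/* Inclusive end; as in the patch it is not clamped to max. */
	*end = index + CHUNK_INDICES - 1;
	return true;
}
```

A worker would loop on claim_range() until it returns false, which is presumably how several threads could later share one cursor without further locking.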
Re: [PATCH net-next 1/1] stmmac: intel: add cross time-stamping freq difference adjustment
Hello: This patch was applied to netdev/net-next.git (refs/heads/master): On Tue, 30 Mar 2021 10:46:53 +0800 you wrote: > Cross time-stamping mechanism used in certain instance of Intel mGbE > may run at different clock frequency in comparison to the clock > frequency used by processor, so we introduce cross T/S frequency > adjustment to ensure TSC calculation is correct when processor got the > cross time-stamps. > > Signed-off-by: Wong Vee Khee > > [...] Here is the summary with links: - [net-next,1/1] stmmac: intel: add cross time-stamping freq difference adjustment https://git.kernel.org/netdev/net-next/c/1c137d4777b5 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
[PATCH v19 4/7] time: Add mechanism to recognize clocksource in time_get_snapshot
From: Thomas Gleixner System time snapshots are not conveying information about the current clocksource which was used, but callers like the PTP KVM guest implementation have the requirement to evaluate the clocksource type to select the appropriate mechanism. Introduce a clocksource id field in struct clocksource which is by default set to CSID_GENERIC (0). Clocksource implementations can set that field to a value which allows to identify the clocksource. Store the clocksource id of the current clocksource in the system_time_snapshot so callers can evaluate which clocksource was used to take the snapshot and act accordingly. Signed-off-by: Thomas Gleixner Signed-off-by: Jianyong Wu Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201209060932.212364-5-jianyong...@arm.com --- include/linux/clocksource.h | 6 ++ include/linux/clocksource_ids.h | 11 +++ include/linux/timekeeping.h | 12 +++- kernel/time/clocksource.c | 2 ++ kernel/time/timekeeping.c | 1 + 5 files changed, 27 insertions(+), 5 deletions(-) create mode 100644 include/linux/clocksource_ids.h diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 86d143db6523..1290d0dce840 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -62,6 +63,10 @@ struct module; * 400-499: Perfect * The ideal clocksource. A must-use where * available. + * @id:Defaults to CSID_GENERIC. The id value is captured + * in certain snapshot functions to allow callers to + * validate the clocksource from which the snapshot was + * taken. 
* @flags: Flags describing special properties * @enable:Optional function to enable the clocksource * @disable: Optional function to disable the clocksource @@ -100,6 +105,7 @@ struct clocksource { const char *name; struct list_headlist; int rating; + enum clocksource_idsid; enum vdso_clock_modevdso_clock_mode; unsigned long flags; diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h new file mode 100644 index ..4d8e19e05328 --- /dev/null +++ b/include/linux/clocksource_ids.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CLOCKSOURCE_IDS_H +#define _LINUX_CLOCKSOURCE_IDS_H + +/* Enum to give clocksources a unique identifier */ +enum clocksource_ids { + CSID_GENERIC= 0, + CSID_MAX, +}; + +#endif diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index c6792cf01bc7..78a98bdff76d 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -3,6 +3,7 @@ #define _LINUX_TIMEKEEPING_H #include +#include /* Included from linux/ktime.h */ @@ -243,11 +244,12 @@ struct ktime_timestamps { * @cs_was_changed_seq:The sequence number of clocksource change events */ struct system_time_snapshot { - u64 cycles; - ktime_t real; - ktime_t raw; - unsigned intclock_was_set_seq; - u8 cs_was_changed_seq; + u64 cycles; + ktime_t real; + ktime_t raw; + enum clocksource_idscs_id; + unsigned intclock_was_set_seq; + u8 cs_was_changed_seq; }; /** diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index cce484a2cc7c..4fe1df894ee5 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -920,6 +920,8 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq) clocksource_arch_init(cs); + if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX)) + cs->id = CSID_GENERIC; if (cs->vdso_clock_mode < 0 || cs->vdso_clock_mode >= VDSO_CLOCKMODE_MAX) { pr_warn("clocksource %s registered with invalid VDSO mode %d. 
Disabling VDSO support.\n", diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 6aee5768c86f..06f55f9258bf 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1048,6 +1048,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot) do { seq = read_seqcount_begin(_core.seq); now = tk_clock_read(>tkr_mono); + systime_snapshot->cs_id = tk->tkr_mono.clock->id; systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq; systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq; base_real = ktime_add(tk->tkr_mono.base, -- 2.29.2
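For readers who want to see how a caller might use the new cs_id field, here is a simplified userspace sketch. The structs are cut-down stand-ins for the kernel ones, and CSID_EXAMPLE_COUNTER, register_clocksource(), take_snapshot() and snapshot_usable() are hypothetical; only CSID_GENERIC, CSID_MAX and the range check come from the patch.

```c
enum clocksource_ids {
	CSID_GENERIC = 0,
	CSID_EXAMPLE_COUNTER,	/* hypothetical id, not in the patch */
	CSID_MAX,
};

/* Cut-down stand-ins for the kernel structures. */
struct clocksource {
	const char *name;
	enum clocksource_ids id;
};

struct system_time_snapshot {
	unsigned long long cycles;
	enum clocksource_ids cs_id;
};

/* Same range check as __clocksource_register_scale(), minus the warning. */
static void register_clocksource(struct clocksource *cs)
{
	if ((unsigned int)cs->id >= CSID_MAX)
		cs->id = CSID_GENERIC;
}

/* Record which clocksource produced the cycle value. */
static void take_snapshot(const struct clocksource *cur,
			  unsigned long long cycles,
			  struct system_time_snapshot *snap)
{
	snap->cycles = cycles;
	snap->cs_id = cur->id;
}

/* A consumer only trusts snapshots taken from the counter it expects. */
static int snapshot_usable(const struct system_time_snapshot *snap,
			   enum clocksource_ids wanted)
{
	return snap->cs_id == wanted;
}
```

A PTP-style consumer would call snapshot_usable() before converting the cycle value, and fall back to another mechanism when the ids do not match.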
[PATCH v3 3/3] ima: enable loading of build time generated key on .ima keyring
The kernel currently only loads the kernel module signing key onto the builtin trusted keyring. Load the module signing key onto the IMA keyring as well. Signed-off-by: Nayna Jain --- certs/system_certificates.S | 13 +- certs/system_keyring.c| 47 +++ include/keys/system_keyring.h | 7 ++ security/integrity/digsig.c | 2 ++ 4 files changed, 58 insertions(+), 11 deletions(-) diff --git a/certs/system_certificates.S b/certs/system_certificates.S index 8f29058adf93..dcad27ea8527 100644 --- a/certs/system_certificates.S +++ b/certs/system_certificates.S @@ -8,9 +8,11 @@ .globl system_certificate_list system_certificate_list: __cert_list_start: -#ifdef CONFIG_MODULE_SIG +__module_cert_start: +#if defined(CONFIG_MODULE_SIG) || defined(CONFIG_IMA_APPRAISE_MODSIG) .incbin "certs/signing_key.x509" #endif +__module_cert_end: .incbin "certs/x509_certificate_list" __cert_list_end: @@ -35,3 +37,12 @@ system_certificate_list_size: #else .long __cert_list_end - __cert_list_start #endif + + .align 8 + .globl module_cert_size +module_cert_size: +#ifdef CONFIG_64BIT + .quad __module_cert_end - __module_cert_start +#else + .long __module_cert_end - __module_cert_start +#endif diff --git a/certs/system_keyring.c b/certs/system_keyring.c index 4b693da488f1..bb122bf4cc17 100644 --- a/certs/system_keyring.c +++ b/certs/system_keyring.c @@ -27,6 +27,7 @@ static struct key *platform_trusted_keys; extern __initconst const u8 system_certificate_list[]; extern __initconst const unsigned long system_certificate_list_size; +extern __initconst const unsigned long module_cert_size; /** * restrict_link_to_builtin_trusted - Restrict keyring addition by built in CA @@ -132,19 +133,11 @@ static __init int system_trusted_keyring_init(void) */ device_initcall(system_trusted_keyring_init); -/* - * Load the compiled-in list of X.509 certificates. 
- */ -static __init int load_system_certificate_list(void) +static __init int load_cert(const u8 *p, const u8 *end, struct key *keyring) { key_ref_t key; - const u8 *p, *end; size_t plen; - pr_notice("Loading compiled-in X.509 certificates\n"); - - p = system_certificate_list; - end = p + system_certificate_list_size; while (p < end) { /* Each cert begins with an ASN.1 SEQUENCE tag and must be more * than 256 bytes in size. @@ -159,7 +152,7 @@ static __init int load_system_certificate_list(void) if (plen > end - p) goto dodgy_cert; - key = key_create_or_update(make_key_ref(builtin_trusted_keys, 1), + key = key_create_or_update(make_key_ref(keyring, 1), "asymmetric", NULL, p, @@ -186,6 +179,40 @@ static __init int load_system_certificate_list(void) pr_err("Problem parsing in-kernel X.509 certificate list\n"); return 0; } + +__init int load_module_cert(struct key *keyring) +{ + const u8 *p, *end; + + if (!IS_ENABLED(CONFIG_IMA_APPRAISE_MODSIG)) + return 0; + + pr_notice("Loading compiled-in module X.509 certificates\n"); + + p = system_certificate_list; + end = p + module_cert_size; + + return load_cert(p, end, keyring); +} + +/* + * Load the compiled-in list of X.509 certificates. 
+ */ +static __init int load_system_certificate_list(void) +{ + const u8 *p, *end; + + pr_notice("Loading compiled-in X.509 certificates\n"); + +#ifdef CONFIG_MODULE_SIG + p = system_certificate_list; +#else + p = system_certificate_list + module_cert_size; +#endif + + end = p + system_certificate_list_size; + return load_cert(p, end, builtin_trusted_keys); +} late_initcall(load_system_certificate_list); #ifdef CONFIG_SYSTEM_DATA_VERIFICATION diff --git a/include/keys/system_keyring.h b/include/keys/system_keyring.h index fb8b07daa9d1..f954276c616a 100644 --- a/include/keys/system_keyring.h +++ b/include/keys/system_keyring.h @@ -16,9 +16,16 @@ extern int restrict_link_by_builtin_trusted(struct key *keyring, const struct key_type *type, const union key_payload *payload, struct key *restriction_key); +extern __init int load_module_cert(struct key *keyring); #else #define restrict_link_by_builtin_trusted restrict_link_reject + +static inline __init int load_module_cert(struct key *keyring) +{ + return 0; +} + #endif #ifdef CONFIG_SECONDARY_TRUSTED_KEYRING diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c index 250fb0836156..3b06a01bd0fd 100644 --- a/security/integrity/digsig.c +++ b/security/integrity/digsig.c @@ -111,6 +111,8 @@ static int
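The loop factored out into load_cert() walks a blob of back-to-back DER certificates, each starting with an ASN.1 SEQUENCE tag and a two-byte length. A minimal userspace sketch of that walk follows; count_der_certs() is a hypothetical helper that only counts entries instead of adding keys to a keyring, and its header check is written the way I would expect it to read rather than copied from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the certificates in a blob of concatenated DER certificates.
 * Returns the count, or -1 if the blob looks malformed.
 */
static int count_der_certs(const uint8_t *p, const uint8_t *end)
{
	int n = 0;

	while (p < end) {
		size_t plen;

		/* Tag, length-of-length byte and two length bytes. */
		if (end - p < 4)
			return -1;
		/* 0x30 = SEQUENCE, 0x82 = two length bytes follow. */
		if (p[0] != 0x30 || p[1] != 0x82)
			return -1;
		plen = ((size_t)p[2] << 8) | p[3];
		plen += 4;	/* include the 4-byte header */
		if (plen > (size_t)(end - p))
			return -1;

		p += plen;
		n++;
	}
	return n;
}
```

Because the module signing certificate is placed first in system_certificate_list, limiting the walk to the first module_cert_size bytes visits only that certificate, which is what load_module_cert() relies on.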
[PATCH v3 2/3] ima: enable signing of modules with build time generated key
The kernel build process currently only signs kernel modules when MODULE_SIG is enabled. Also, sign the kernel modules at build time when IMA_APPRAISE_MODSIG is enabled. Signed-off-by: Nayna Jain --- certs/Kconfig | 2 +- certs/Makefile | 8 init/Kconfig | 6 +++--- 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/certs/Kconfig b/certs/Kconfig index c94e93d8bccf..48675ad319db 100644 --- a/certs/Kconfig +++ b/certs/Kconfig @@ -4,7 +4,7 @@ menu "Certificates for signature checking" config MODULE_SIG_KEY string "File name or PKCS#11 URI of module signing key" default "certs/signing_key.pem" - depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG help Provide the file name of a private key/certificate in PEM format, or a PKCS#11 URI according to RFC7512. The file should contain, or diff --git a/certs/Makefile b/certs/Makefile index f4c25b67aad9..e3185c57fbd8 100644 --- a/certs/Makefile +++ b/certs/Makefile @@ -32,6 +32,14 @@ endif # CONFIG_SYSTEM_TRUSTED_KEYRING clean-files := x509_certificate_list .x509.list ifeq ($(CONFIG_MODULE_SIG),y) + SIGN_KEY = y +endif + +ifeq ($(CONFIG_IMA_APPRAISE_MODSIG),y) + SIGN_KEY = y +endif + +ifdef SIGN_KEY ### # # If module signing is requested, say by allyesconfig, but a key has not been diff --git a/init/Kconfig b/init/Kconfig index 5f5c776ef192..85e48a578f90 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -2164,7 +2164,7 @@ config MODULE_SIG_FORCE config MODULE_SIG_ALL bool "Automatically sign all modules" default y - depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG help Sign all modules during make modules_install. Without this option, modules must be signed manually, using the scripts/sign-file tool. @@ -2174,7 +2174,7 @@ comment "Do not forget to sign required modules with scripts/sign-file" choice prompt "Which hash algorithm should modules be signed with?" 
- depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG help This determines which sort of hashing algorithm will be used during signature generation. This algorithm _must_ be built into the kernel @@ -2206,7 +2206,7 @@ endchoice config MODULE_SIG_HASH string - depends on MODULE_SIG + depends on MODULE_SIG || IMA_APPRAISE_MODSIG default "sha1" if MODULE_SIG_SHA1 default "sha224" if MODULE_SIG_SHA224 default "sha256" if MODULE_SIG_SHA256 -- 2.29.2
[PATCH v3 1/3] keys: cleanup build time module signing keys
The "mrproper" target is still looking for build time generated keys in
the kernel root directory instead of certs directory. Fix the path and
remove the names of the files which are no longer generated.

Fixes: cfc411e7fff3 ("Move certificate handling to its own directory")
Signed-off-by: Nayna Jain
Reviewed-by: Stefan Berger
Reviewed-by: Mimi Zohar
---
 Makefile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index d4784d181123..b7c2ed2a8684 100644
--- a/Makefile
+++ b/Makefile
@@ -1523,9 +1523,9 @@ MRPROPER_FILES += include/config include/generated \
 		  debian snap tar-install \
 		  .config .config.old .version \
 		  Module.symvers \
-		  signing_key.pem signing_key.priv signing_key.x509 \
-		  x509.genkey extra_certificates signing_key.x509.keyid \
-		  signing_key.x509.signer vmlinux-gdb.py \
+		  certs/signing_key.pem certs/signing_key.x509 \
+		  certs/x509.genkey \
+		  vmlinux-gdb.py \
 		  *.spec
 
 # Directories & files removed with 'make distclean'
-- 
2.29.2
Re: [PATCH 01/10] platform/x86: toshiba_acpi: bind life-time of toshiba_acpi_dev to parent
On Mon, 29 Mar 2021 at 17:30, Jonathan Cameron wrote:
>
> On Wed, 24 Mar 2021 14:55:39 +0200
> Alexandru Ardelean wrote:
>
> > The 'toshiba_acpi_dev' object is allocated first and free'd last. We can
> > bind it's life-time to the parent ACPI device object. This is a first step
> > in using more device-managed allocated functions for this.
> >
> > The main intent is to try to convert the IIO framework to export only
> > device-managed functions (i.e. devm_iio_device_alloc() and
> > devm_iio_device_register()). It's still not 100% sure that this is
> > possible, but for now, this is the process of taking it slowly in that
> > direction.
> >
> > Signed-off-by: Alexandru Ardelean
>
> Might just be me, but naming anything dev that isn't a struct device *
> is downright confusing?
>

I found it a bit odd as well, but I decided not to take it into
consideration for now.

> >
> > ---
> >  drivers/platform/x86/toshiba_acpi.c | 6 ++
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/platform/x86/toshiba_acpi.c
> > b/drivers/platform/x86/toshiba_acpi.c
> > index fa7232ad8c39..6d298810b7bf 100644
> > --- a/drivers/platform/x86/toshiba_acpi.c
> > +++ b/drivers/platform/x86/toshiba_acpi.c
> > @@ -2998,8 +2998,6 @@ static int toshiba_acpi_remove(struct acpi_device
> > *acpi_dev)
> >       if (toshiba_acpi)
> >               toshiba_acpi = NULL;
> >
> > -     kfree(dev);
> > -
> >       return 0;
> >  }
> >
> > @@ -3016,6 +3014,7 @@ static const char *find_hci_method(acpi_handle handle)
> >
> >  static int toshiba_acpi_add(struct acpi_device *acpi_dev)
> >  {
> > +     struct device *parent = &acpi_dev->dev;
> >       struct toshiba_acpi_dev *dev;
> >       const char *hci_method;
> >       u32 dummy;
> > @@ -3033,7 +3032,7 @@ static int toshiba_acpi_add(struct acpi_device
> > *acpi_dev)
> >               return -ENODEV;
> >       }
> >
> > -     dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > +     dev = devm_kzalloc(parent, sizeof(*dev), GFP_KERNEL);
> >       if (!dev)
> >               return -ENOMEM;
> >       dev->acpi_dev = acpi_dev;
> > @@ -3045,7 +3044,6 @@ static int toshiba_acpi_add(struct acpi_device
> > *acpi_dev)
> >       ret = misc_register(&dev->miscdev);
> >       if (ret) {
> >               pr_err("Failed to register miscdevice\n");
> > -             kfree(dev);
> >               return ret;
> >       }
> >
>
[PATCH net-next 1/1] stmmac: intel: add cross time-stamping freq difference adjustment
The cross time-stamping mechanism used in certain instances of Intel mGbE may
run at a different clock frequency from the one used by the processor, so we
introduce a cross T/S frequency adjustment to ensure the TSC calculation is
correct when the processor gets the cross time-stamps.

Signed-off-by: Wong Vee Khee
---
 .../net/ethernet/stmicro/stmmac/dwmac-intel.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
index 08b4852eed4c..3d9a57043af2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
@@ -22,8 +22,13 @@
 #define PCH_PTP_CLK_FREQ_19_2MHZ	(GMAC_GPO0)
 #define PCH_PTP_CLK_FREQ_200MHZ		(0)

+/* Cross-timestamping defines */
+#define ART_CPUID_LEAF			0x15
+#define EHL_PSE_ART_MHZ			19200000
+
 struct intel_priv_data {
 	int mdio_adhoc_addr;	/* mdio address for serdes & etc */
+	unsigned long crossts_adj;
 	bool is_pse;
 };

@@ -340,9 +345,26 @@ static int intel_crosststamp(ktime_t *device,
 		*system = convert_art_to_tsc(art_time);
 	}

+	system->cycles *= intel_priv->crossts_adj;
+
 	return 0;
 }

+static void intel_mgbe_pse_crossts_adj(struct intel_priv_data *intel_priv,
+				       int base)
+{
+	if (boot_cpu_has(X86_FEATURE_ART)) {
+		unsigned int art_freq;
+
+		/* On systems that support ART, ART frequency can be obtained
+		 * from ECX register of CPUID leaf (0x15).
+		 */
+		art_freq = cpuid_ecx(ART_CPUID_LEAF);
+		do_div(art_freq, base);
+		intel_priv->crossts_adj = art_freq;
+	}
+}
+
 static void common_default_data(struct plat_stmmacenet_data *plat)
 {
 	plat->clk_csr = 2;	/* clk_csr_i = 20-35MHz & MDC = clk_csr_i/16 */
@@ -551,6 +573,8 @@ static int ehl_pse0_common_data(struct pci_dev *pdev,
 	plat->bus_id = 2;
 	plat->addr64 = 32;

+	intel_mgbe_pse_crossts_adj(intel_priv, EHL_PSE_ART_MHZ);
+
 	return ehl_common_data(pdev, plat);
 }

@@ -587,6 +611,8 @@ static int ehl_pse1_common_data(struct pci_dev *pdev,
 	plat->bus_id = 3;
 	plat->addr64 = 32;

+	intel_mgbe_pse_crossts_adj(intel_priv, EHL_PSE_ART_MHZ);
+
 	return ehl_common_data(pdev, plat);
 }

@@ -913,6 +939,7 @@ static int intel_eth_pci_probe(struct pci_dev *pdev,

 	plat->bsp_priv = intel_priv;
 	intel_priv->mdio_adhoc_addr = INTEL_MGBE_ADHOC_ADDR;
+	intel_priv->crossts_adj = 1;

 	/* Initialize all MSI vectors to invalid so that it can be set
 	 * according to platform data settings below.
--
2.25.1
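As an illustration only, the adjustment described in this patch can be modelled in plain userspace C: the scale factor is the integer ratio between the processor's ART frequency and the PSE ART frequency, and the cycle count is multiplied by it. The names sim_crossts_adj(), sim_adjust_cycles() and SIM_PSE_ART_HZ are hypothetical, the 19.2 MHz base is an assumption, and the fallback to 1 is a choice made for this sketch (the patch itself only presets crossts_adj to 1 at probe time).

```c
#include <stdint.h>

/* Hypothetical stand-in for the PSE ART base frequency (19.2 MHz assumed). */
#define SIM_PSE_ART_HZ 19200000u

/*
 * Integer ratio between the CPU ART frequency and the PSE ART frequency.
 * Returning 1 when the ratio would be zero or undefined is an assumption of
 * this sketch, not something the patch does.
 */
static uint64_t sim_crossts_adj(uint32_t cpu_art_hz, uint32_t pse_art_hz)
{
	if (pse_art_hz == 0 || cpu_art_hz < pse_art_hz)
		return 1;
	return cpu_art_hz / pse_art_hz;	/* truncating division */
}

/* Scale a raw cycle count by the adjustment factor. */
static uint64_t sim_adjust_cycles(uint64_t cycles, uint64_t adj)
{
	return cycles * adj;
}
```

Because the division truncates, a ratio that is not a whole number (for example 24 MHz against 19.2 MHz) collapses to 1 in this model, which may be worth keeping in mind when reviewing the integer-only approach.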
Re: [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
On Tue, 30 Mar 2021 at 01:15, Sean Christopherson wrote:
>
> +Thomas
>
> On Mon, Mar 29, 2021, Wanpeng Li wrote:
> > From: Wanpeng Li
> >
> > The bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=209831
> > reported that the guest time remains 0 when running a while true
> > loop in the guest.
> >
> > The commit 87fa7f3e98a131 ("x86/kvm: Move context tracking where it
> > belongs") moves guest_exit_irqoff() close to vmexit breaks the
> > tick-based time accounting when the ticks that happen after IRQs are
> > disabled are incorrectly accounted to the host/system time. This is
> > because we exit the guest state too early.
> >
> > vtime-based time accounting is tied to context tracking, keep the
> > guest_exit_irqoff() around vmexit code when both vtime-based time
> > accounting and specific cpu is context tracking mode active.
> > Otherwise, leave guest_exit_irqoff() after handle_exit_irqoff()
> > and explicit IRQ window for tick-based time accounting.
> >
> > Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
> > Cc: Sean Christopherson
> > Signed-off-by: Wanpeng Li
> > ---
> >  arch/x86/kvm/svm/svm.c | 3 ++-
> >  arch/x86/kvm/vmx/vmx.c | 3 ++-
> >  arch/x86/kvm/x86.c     | 2 ++
> >  3 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 58a45bb..55fb5ce 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3812,7 +3812,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> >  	 * into world and some more.
> >  	 */
> >  	lockdep_hardirqs_off(CALLER_ADDR0);
> > -	guest_exit_irqoff();
> > +	if (vtime_accounting_enabled_this_cpu())
> > +		guest_exit_irqoff();
> >
> >  	instrumentation_begin();
> >  	trace_hardirqs_off_finish();
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 32cf828..85695b3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> >  	 * into world and some more.
> >  	 */
> >  	lockdep_hardirqs_off(CALLER_ADDR0);
> > -	guest_exit_irqoff();
> > +	if (vtime_accounting_enabled_this_cpu())
> > +		guest_exit_irqoff();
>
> This looks ok, as CONFIG_CONTEXT_TRACKING and CONFIG_VIRT_CPU_ACCOUNTING_GEN are
> selected by CONFIG_NO_HZ_FULL=y, and can't be enabled independently, e.g. the
> rcu_user_exit() call won't be delayed because it will never be called in the
> !vtime case.  But it still feels wrong poking into those details, e.g. it'll
> be weird and/or wrong if guest_exit_irqoff() gains stuff that isn't vtime
> specific.

Could you elaborate on what you mean by "it'll be weird and/or wrong if
guest_exit_irqoff() gains stuff that isn't vtime specific"?

    Wanpeng
Re: [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
+Thomas

On Mon, Mar 29, 2021, Wanpeng Li wrote:
> From: Wanpeng Li
>
> The bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=209831
> reported that the guest time remains 0 when running a while true
> loop in the guest.
>
> The commit 87fa7f3e98a131 ("x86/kvm: Move context tracking where it
> belongs") moves guest_exit_irqoff() close to vmexit breaks the
> tick-based time accounting when the ticks that happen after IRQs are
> disabled are incorrectly accounted to the host/system time. This is
> because we exit the guest state too early.
>
> vtime-based time accounting is tied to context tracking, keep the
> guest_exit_irqoff() around vmexit code when both vtime-based time
> accounting and specific cpu is context tracking mode active.
> Otherwise, leave guest_exit_irqoff() after handle_exit_irqoff()
> and explicit IRQ window for tick-based time accounting.
>
> Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
> Cc: Sean Christopherson
> Signed-off-by: Wanpeng Li
> ---
>  arch/x86/kvm/svm/svm.c | 3 ++-
>  arch/x86/kvm/vmx/vmx.c | 3 ++-
>  arch/x86/kvm/x86.c     | 2 ++
>  3 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 58a45bb..55fb5ce 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3812,7 +3812,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  	 * into world and some more.
>  	 */
>  	lockdep_hardirqs_off(CALLER_ADDR0);
> -	guest_exit_irqoff();
> +	if (vtime_accounting_enabled_this_cpu())
> +		guest_exit_irqoff();
>
>  	instrumentation_begin();
>  	trace_hardirqs_off_finish();
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 32cf828..85695b3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  	 * into world and some more.
>  	 */
>  	lockdep_hardirqs_off(CALLER_ADDR0);
> -	guest_exit_irqoff();
> +	if (vtime_accounting_enabled_this_cpu())
> +		guest_exit_irqoff();

This looks ok, as CONFIG_CONTEXT_TRACKING and CONFIG_VIRT_CPU_ACCOUNTING_GEN are
selected by CONFIG_NO_HZ_FULL=y, and can't be enabled independently, e.g. the
rcu_user_exit() call won't be delayed because it will never be called in the
!vtime case.  But it still feels wrong poking into those details, e.g. it'll
be weird and/or wrong if guest_exit_irqoff() gains stuff that isn't vtime
specific.  Maybe that will never happen though?  And of course, my hack
alternative also pokes into the details[*].

Thomas, do you have any input on the least awful way to handle this?  My
horrible hack was to force PF_VCPU around the window where KVM handles IRQs
after guest exit.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d9f931c63293..6ddf341cd755 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9118,6 +9118,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	vcpu->mode = OUTSIDE_GUEST_MODE;
 	smp_wmb();

+	/*
+	 * Temporarily pretend this task is running a vCPU when potentially
+	 * processing an IRQ exit, including the below opening of an IRQ
+	 * window.  Tick-based accounting of guest time relies on PF_VCPU
+	 * being set when the tick IRQ handler runs.
+	 */
+	current->flags |= PF_VCPU;
 	static_call(kvm_x86_handle_exit_irqoff)(vcpu);

 	/*
@@ -9132,6 +9139,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	++vcpu->stat.exits;
 	local_irq_disable();
 	kvm_after_interrupt(vcpu);
+	current->flags &= ~PF_VCPU;

 	if (lapic_in_kernel(vcpu)) {
 		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;

[*] https://lkml.kernel.org/r/20210206004218.312023-1-sea...@google.com

>  	instrumentation_begin();
>  	trace_hardirqs_off_finish();
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fe806e8..234c8b3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9185,6 +9185,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	++vcpu->stat.exits;
>  	local_irq_disable();
>  	kvm_after_interrupt(vcpu);
> +	if (!vtime_accounting_enabled_this_cpu())
> +		guest_exit_irqoff();
>
>  	if (lapic_in_kernel(vcpu)) {
>  		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
> --
> 2.7.4
>
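To make the accounting problem discussed in this thread concrete, here is a small userspace model, offered as a sketch rather than as kernel code. It assumes a single pending tick that can only be handled in the IRQ window after guest exit, and it uses hypothetical names (SIM_PF_VCPU, struct sim_task, sim_account_tick(), sim_run_guest_once()) that stand in for PF_VCPU and the tick handler.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_PF_VCPU 0x1u	/* hypothetical flag standing in for PF_VCPU */

struct sim_task {
	unsigned int flags;
	uint64_t guest_ticks;	/* ticks charged to guest time */
	uint64_t system_ticks;	/* ticks charged to host/system time */
};

/* Tick handler: the tick counts as guest time only if the flag is set now. */
static void sim_account_tick(struct sim_task *t)
{
	if (t->flags & SIM_PF_VCPU)
		t->guest_ticks++;
	else
		t->system_ticks++;
}

/*
 * One guest run. A tick becomes pending while IRQs are disabled and is
 * handled only in the IRQ window after the exit. If the flag is dropped
 * before that window, the tick is charged to the host.
 */
static void sim_run_guest_once(struct sim_task *t, bool keep_flag_over_irq_window)
{
	t->flags |= SIM_PF_VCPU;		/* guest entry */
	if (!keep_flag_over_irq_window)
		t->flags &= ~SIM_PF_VCPU;	/* early guest exit */
	sim_account_tick(t);			/* IRQ window: pending tick runs */
	t->flags &= ~SIM_PF_VCPU;		/* late guest exit (no-op if cleared) */
}
```

With keep_flag_over_irq_window set to false the model reproduces the reported symptom (guest time stays at 0); with it set to true the tick is charged to the guest, which is the effect both proposals in this thread aim for under tick-based accounting.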
Re: [PATCH v5 1/3] dt-bindings:iio:adc: add generic settling-time-us and oversampling-ratio channel properties
On Mon, 29 Mar 2021 15:06:20 +0200
Oleksij Rempel wrote:

> On Mon, Mar 29, 2021 at 11:25:32AM +0100, Jonathan Cameron wrote:
> > On Mon, 29 Mar 2021 09:31:29 +0200
> > Oleksij Rempel wrote:
> >
> > > Settling time and over sampling is a typical challenge for different IIO ADC
> > > devices. So, introduce channel specific settling-time-us and oversampling-ratio
> > > properties to cover this use case.
> > >
> > > Signed-off-by: Oleksij Rempel
> > > ---
> > >  Documentation/devicetree/bindings/iio/adc/adc.yaml | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/iio/adc/adc.yaml b/Documentation/devicetree/bindings/iio/adc/adc.yaml
> > > index 912a7635edc4..d5bc86d2a2af 100644
> > > --- a/Documentation/devicetree/bindings/iio/adc/adc.yaml
> > > +++ b/Documentation/devicetree/bindings/iio/adc/adc.yaml
> > > @@ -39,4 +39,12 @@ properties:
> > >        The first value specifies the positive input pin, the second
> > >        specifies the negative input pin.
> > >
> > > +  settling-time-us:
> > > +    description:
> > > +      Time between enabling the channel and firs stable readings.
> >
> > first
>
> ack
>
> > > +
> > > +  oversampling-ratio:
> > > +    $ref: /schemas/types.yaml#/definitions/uint32
> > > +    description: Number of data samples which are averaged for each read.
> >
> > I think I've asked about this in previous reviews, but I want a clear statement
> > of why you think this property is a feature of the 'board' (and hence should be
> > in device tree) rather than setting sensible defaults and leaving any control
> > to userspace?
>
> yes, my reply was:

Ah. I missed it somewhere along the way, thanks for repeating here.

> > Oversampling is used as replacement of or addition to the low-pass filter. The
> > filter can be implemented on board, but it will change settling time
> > characteristic. Since low-pass filter is board specific characteristic, this
> > property belongs in device tree as well.
>
> I could imagine that this values can be overwritten from user space for
> diagnostic, but we need some working default values.

Hmm. Low pass filters are interesting in that it is not obvious whether they
are a characteristic of the board (obviously they are if they are resistors /
caps etc on the board) or of the application. Some applications want noisy
messy data, others not so much.

What filter you need to achieve a specific noise level on a given board is
indeed a characteristic of the board. However, what that noise level is (which
actually drives the decision) is not a board characteristic.

If we have a configurable filter, then that can be argued to be a policy
decision and hence userspace, not DT.

>
> Should I integrate this comment in to the yaml?

Definitely. Whilst I'm not that keen on this one, you have made a reasonable
argument that it is 'sort of' a board characteristic, so I can live with that
as long as it is there.

Perhaps a slightly amended version of the above:

"Oversampling is used as replacement of or addition to the low-pass filter.
In some cases, the desired filtering characteristics are a function of the
device design and can interact with other characteristics such as
settling time."

Jonathan

>
> Regards,
> Oleksij
Re: [PATCH 01/10] platform/x86: toshiba_acpi: bind life-time of toshiba_acpi_dev to parent
On Wed, 24 Mar 2021 14:55:39 +0200
Alexandru Ardelean wrote:

> The 'toshiba_acpi_dev' object is allocated first and free'd last. We can
> bind it's life-time to the parent ACPI device object. This is a first step
> in using more device-managed allocated functions for this.
>
> The main intent is to try to convert the IIO framework to export only
> device-managed functions (i.e. devm_iio_device_alloc() and
> devm_iio_device_register()). It's still not 100% sure that this is
> possible, but for now, this is the process of taking it slowly in that
> direction.
>
> Signed-off-by: Alexandru Ardelean

Might just be me, but naming anything dev that isn't a struct device *
is downright confusing?

> ---
>  drivers/platform/x86/toshiba_acpi.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c
> index fa7232ad8c39..6d298810b7bf 100644
> --- a/drivers/platform/x86/toshiba_acpi.c
> +++ b/drivers/platform/x86/toshiba_acpi.c
> @@ -2998,8 +2998,6 @@ static int toshiba_acpi_remove(struct acpi_device *acpi_dev)
>  	if (toshiba_acpi)
>  		toshiba_acpi = NULL;
>
> -	kfree(dev);
> -
>  	return 0;
>  }
>
> @@ -3016,6 +3014,7 @@ static const char *find_hci_method(acpi_handle handle)
>
>  static int toshiba_acpi_add(struct acpi_device *acpi_dev)
>  {
> +	struct device *parent = &acpi_dev->dev;
>  	struct toshiba_acpi_dev *dev;
>  	const char *hci_method;
>  	u32 dummy;
> @@ -3033,7 +3032,7 @@ static int toshiba_acpi_add(struct acpi_device *acpi_dev)
>  		return -ENODEV;
>  	}
>
> -	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> +	dev = devm_kzalloc(parent, sizeof(*dev), GFP_KERNEL);
>  	if (!dev)
>  		return -ENOMEM;
>  	dev->acpi_dev = acpi_dev;
> @@ -3045,7 +3044,6 @@ static int toshiba_acpi_add(struct acpi_device *acpi_dev)
>  	ret = misc_register(&dev->miscdev);
>  	if (ret) {
>  		pr_err("Failed to register miscdevice\n");
> -		kfree(dev);
>  		return ret;
>  	}
>
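For readers less familiar with device-managed allocation, the idea behind replacing kzalloc()/kfree() with devm_kzalloc() in the quoted patch can be modelled in userspace. This is only a sketch under stated assumptions: struct sim_device, sim_devm_kzalloc(), sim_device_release() and sim_probe() are hypothetical names, and the list-based bookkeeping is a simplification, not the kernel's devres implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* One managed allocation: list link followed by the caller's payload. */
struct sim_devres {
	struct sim_devres *next;
	max_align_t data[];	/* payload, suitably aligned for any type */
};

/* Hypothetical stand-in for the parent device that owns the allocations. */
struct sim_device {
	struct sim_devres *res;
	unsigned int nres;
};

/* Zeroed allocation whose lifetime is tied to @dev. */
static void *sim_devm_kzalloc(struct sim_device *dev, size_t size)
{
	struct sim_devres *node = calloc(1, sizeof(*node) + size);

	if (!node)
		return NULL;
	node->next = dev->res;	/* newest entry first */
	dev->res = node;
	dev->nres++;
	return node->data;
}

/* Free everything owned by @dev, most recent allocation first. */
static unsigned int sim_device_release(struct sim_device *dev)
{
	unsigned int freed = 0;

	while (dev->res) {
		struct sim_devres *node = dev->res;

		dev->res = node->next;
		free(node);
		freed++;
	}
	dev->nres = 0;
	return freed;
}

struct sim_priv {
	int registered;
};

/* Probe-like function: the error path needs no explicit free. */
static int sim_probe(struct sim_device *dev, int fail_register)
{
	struct sim_priv *priv = sim_devm_kzalloc(dev, sizeof(*priv));

	if (!priv)
		return -1;
	if (fail_register)
		return -2;	/* allocation stays owned by dev */
	priv->registered = 1;
	return 0;
}
```

In this model the owner of the device calls sim_device_release() after a failed probe or on removal, which is why the error path and the remove path in the patch can drop their kfree() calls.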
Re: [PATCH v5 1/3] dt-bindings:iio:adc: add generic settling-time-us and oversampling-ratio channel properties
On Mon, Mar 29, 2021 at 11:25:32AM +0100, Jonathan Cameron wrote:
> On Mon, 29 Mar 2021 09:31:29 +0200
> Oleksij Rempel wrote:
>
> > Settling time and over sampling is a typical challenge for different IIO ADC
> > devices. So, introduce channel specific settling-time-us and oversampling-ratio
> > properties to cover this use case.
> >
> > Signed-off-by: Oleksij Rempel
> > ---
> >  Documentation/devicetree/bindings/iio/adc/adc.yaml | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/iio/adc/adc.yaml b/Documentation/devicetree/bindings/iio/adc/adc.yaml
> > index 912a7635edc4..d5bc86d2a2af 100644
> > --- a/Documentation/devicetree/bindings/iio/adc/adc.yaml
> > +++ b/Documentation/devicetree/bindings/iio/adc/adc.yaml
> > @@ -39,4 +39,12 @@ properties:
> >        The first value specifies the positive input pin, the second
> >        specifies the negative input pin.
> >
> > +  settling-time-us:
> > +    description:
> > +      Time between enabling the channel and firs stable readings.
>
> first

ack

> > +
> > +  oversampling-ratio:
> > +    $ref: /schemas/types.yaml#/definitions/uint32
> > +    description: Number of data samples which are averaged for each read.
>
> I think I've asked about this in previous reviews, but I want a clear statement
> of why you think this property is a feature of the 'board' (and hence should be
> in device tree) rather than setting sensible defaults and leaving any control
> to userspace?

yes, my reply was:
> Oversampling is used as replacement of or addition to the low-pass filter. The
> filter can be implemented on board, but it will change settling time
> characteristic. Since low-pass filter is board specific characteristic, this
> property belongs in device tree as well.

I could imagine that this values can be overwritten from user space for
diagnostic, but we need some working default values.

Should I integrate this comment in to the yaml?

Regards,
Oleksij
--
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-     |
Re: [PATCH v5 1/3] dt-bindings:iio:adc: add generic settling-time-us and oversampling-ratio channel properties
On Mon, 29 Mar 2021 09:31:29 +0200
Oleksij Rempel wrote:

> Settling time and over sampling is a typical challenge for different IIO ADC
> devices. So, introduce channel specific settling-time-us and oversampling-ratio
> properties to cover this use case.
>
> Signed-off-by: Oleksij Rempel
> ---
>  Documentation/devicetree/bindings/iio/adc/adc.yaml | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/iio/adc/adc.yaml b/Documentation/devicetree/bindings/iio/adc/adc.yaml
> index 912a7635edc4..d5bc86d2a2af 100644
> --- a/Documentation/devicetree/bindings/iio/adc/adc.yaml
> +++ b/Documentation/devicetree/bindings/iio/adc/adc.yaml
> @@ -39,4 +39,12 @@ properties:
>        The first value specifies the positive input pin, the second
>        specifies the negative input pin.
>
> +  settling-time-us:
> +    description:
> +      Time between enabling the channel and firs stable readings.

first

> +
> +  oversampling-ratio:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description: Number of data samples which are averaged for each read.

I think I've asked about this in previous reviews, but I want a clear statement
of why you think this property is a feature of the 'board' (and hence should be
in device tree) rather than setting sensible defaults and leaving any control
to userspace?

Jonathan

> +
>  additionalProperties: true
[PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
From: Wanpeng Li

The bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=209831
reported that the guest time remains 0 when running a while true
loop in the guest.

The commit 87fa7f3e98a131 ("x86/kvm: Move context tracking where it
belongs") moves guest_exit_irqoff() close to vmexit breaks the
tick-based time accounting when the ticks that happen after IRQs are
disabled are incorrectly accounted to the host/system time. This is
because we exit the guest state too early.

vtime-based time accounting is tied to context tracking, keep the
guest_exit_irqoff() around vmexit code when both vtime-based time
accounting and specific cpu is context tracking mode active.
Otherwise, leave guest_exit_irqoff() after handle_exit_irqoff()
and explicit IRQ window for tick-based time accounting.

Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
Cc: Sean Christopherson
Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/svm/svm.c | 3 ++-
 arch/x86/kvm/vmx/vmx.c | 3 ++-
 arch/x86/kvm/x86.c     | 2 ++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58a45bb..55fb5ce 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3812,7 +3812,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	 * into world and some more.
 	 */
 	lockdep_hardirqs_off(CALLER_ADDR0);
-	guest_exit_irqoff();
+	if (vtime_accounting_enabled_this_cpu())
+		guest_exit_irqoff();

 	instrumentation_begin();
 	trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 32cf828..85695b3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	 * into world and some more.
 	 */
 	lockdep_hardirqs_off(CALLER_ADDR0);
-	guest_exit_irqoff();
+	if (vtime_accounting_enabled_this_cpu())
+		guest_exit_irqoff();

 	instrumentation_begin();
 	trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fe806e8..234c8b3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9185,6 +9185,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	++vcpu->stat.exits;
 	local_irq_disable();
 	kvm_after_interrupt(vcpu);
+	if (!vtime_accounting_enabled_this_cpu())
+		guest_exit_irqoff();

 	if (lapic_in_kernel(vcpu)) {
 		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
--
2.7.4
[PATCH v5 1/3] dt-bindings:iio:adc: add generic settling-time-us and oversampling-ratio channel properties
Settling time and over sampling is a typical challenge for different IIO ADC
devices. So, introduce channel specific settling-time-us and oversampling-ratio
properties to cover this use case.

Signed-off-by: Oleksij Rempel
---
 Documentation/devicetree/bindings/iio/adc/adc.yaml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/iio/adc/adc.yaml b/Documentation/devicetree/bindings/iio/adc/adc.yaml
index 912a7635edc4..d5bc86d2a2af 100644
--- a/Documentation/devicetree/bindings/iio/adc/adc.yaml
+++ b/Documentation/devicetree/bindings/iio/adc/adc.yaml
@@ -39,4 +39,12 @@ properties:
       The first value specifies the positive input pin, the second
       specifies the negative input pin.

+  settling-time-us:
+    description:
+      Time between enabling the channel and firs stable readings.
+
+  oversampling-ratio:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: Number of data samples which are averaged for each read.
+
 additionalProperties: true
--
2.29.2
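As a rough illustration of what the oversampling-ratio description above means ("number of data samples which are averaged for each read"), the following userspace sketch averages a run of raw samples into one reading. The function name sim_oversample_read() and the 16-bit sample type are assumptions made for this example; a real driver would also wait for the settling time before taking the first sample.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Average `ratio` consecutive raw samples into one reading, rounded to the
 * nearest integer. Returns 0 for a ratio of 0 (a choice made for this sketch).
 */
static uint32_t sim_oversample_read(const uint16_t *raw, size_t ratio)
{
	uint64_t sum = 0;
	size_t i;

	if (ratio == 0)
		return 0;
	for (i = 0; i < ratio; i++)
		sum += raw[i];
	return (uint32_t)((sum + ratio / 2) / ratio);
}
```

A larger ratio suppresses more noise at the cost of a longer read, which is the trade-off behind the board-versus-userspace question raised in the review.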
[PATCH v3 25/27] perf tests: Support 'Convert perf time to TSC' test for hybrid
On a hybrid platform, "cycles:u" creates two "cycles" events, so the
second evsel in the evlist also needs initialization.

With this patch,

  # ./perf test 71
  71: Convert perf time to TSC: Ok

Signed-off-by: Jin Yao
---
v3:
 - No functional change.

 tools/perf/tests/perf-time-to-tsc.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/perf/tests/perf-time-to-tsc.c b/tools/perf/tests/perf-time-to-tsc.c
index 680c3cffb128..72f268c6cc5d 100644
--- a/tools/perf/tests/perf-time-to-tsc.c
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -20,6 +20,7 @@
 #include "tsc.h"
 #include "mmap.h"
 #include "tests.h"
+#include "pmu.h"

 #define CHECK__(x) {				\
 	while ((x) < 0) {			\
@@ -66,6 +67,10 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe
 	u64 test_tsc, comm1_tsc, comm2_tsc;
 	u64 test_time, comm1_time = 0, comm2_time = 0;
 	struct mmap *md;
+	bool hybrid = false;
+
+	if (perf_pmu__has_hybrid())
+		hybrid = true;

 	threads = thread_map__new(-1, getpid(), UINT_MAX);
 	CHECK_NOT_NULL__(threads);
@@ -88,6 +93,17 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe
 	evsel->core.attr.disabled = 1;
 	evsel->core.attr.enable_on_exec = 0;

+	/*
+	 * For hybrid "cycles:u", it creates two events.
+	 * Init the second evsel here.
+	 */
+	if (hybrid) {
+		evsel = evsel__next(evsel);
+		evsel->core.attr.comm = 1;
+		evsel->core.attr.disabled = 1;
+		evsel->core.attr.enable_on_exec = 0;
+	}
+
 	CHECK__(evlist__open(evlist));

 	CHECK__(evlist__mmap(evlist, UINT_MAX));
--
2.17.1
[PATCH v2 4/6] sched: introduce task block time in schedstats
Currently in schedstats we have sum_sleep_runtime and iowait_sum, but
there's no metric to show how long the task is in D state. Once a task is
in D state, it means the task is blocked in the kernel, for example the
task may be waiting for a mutex. The D state is more frequent than
iowait, and it is more critical than S state. So it is worth adding a
metric to measure it.

Signed-off-by: Yafang Shao
---
 include/linux/sched.h | 2 ++
 kernel/sched/debug.c  | 6 ++++--
 kernel/sched/stats.c  | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b687bb38897b..2b885481b8bf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -428,6 +428,8 @@ struct sched_statistics {

 	u64 block_start;
 	u64 block_max;
+	s64 sum_block_runtime;
+
 	u64 exec_max;
 	u64 slice_max;

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index d1bc616936d9..0995412dd3c0 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -499,10 +499,11 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 		(long long)(p->nvcsw + p->nivcsw),
 		p->prio);

-	SEQ_printf(m, "%9Ld.%06ld %9Ld.%06ld %9Ld.%06ld",
+	SEQ_printf(m, "%9lld.%06ld %9lld.%06ld %9lld.%06ld %9lld.%06ld",
 		SPLIT_NS(schedstat_val_or_zero(p->stats.wait_sum)),
 		SPLIT_NS(p->se.sum_exec_runtime),
-		SPLIT_NS(schedstat_val_or_zero(p->stats.sum_sleep_runtime)));
+		SPLIT_NS(schedstat_val_or_zero(p->stats.sum_sleep_runtime)),
+		SPLIT_NS(schedstat_val_or_zero(p->stats.sum_block_runtime)));

 #ifdef CONFIG_NUMA_BALANCING
 	SEQ_printf(m, " %d %d", task_node(p), task_numa_group_id(p));
@@ -941,6 +942,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 	u64 avg_atom, avg_per_cpu;

 	PN_SCHEDSTAT(stats.sum_sleep_runtime);
+	PN_SCHEDSTAT(stats.sum_block_runtime);
 	PN_SCHEDSTAT(stats.wait_start);
 	PN_SCHEDSTAT(stats.sleep_start);
 	PN_SCHEDSTAT(stats.block_start);
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index b2542f4d3192..21fae41c06f5 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -82,6 +82,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,

 		__schedstat_set(stats->block_start, 0);
 		__schedstat_add(stats->sum_sleep_runtime, delta);
+		__schedstat_add(stats->sum_block_runtime, delta);

 		if (p) {
 			if (p->in_iowait) {
--
2.18.2
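The bookkeeping this patch adds can be sketched in userspace as follows. This is an illustration under assumptions, not the scheduler code: struct sim_stats, sim_block_begin() and sim_block_end() are hypothetical names, timestamps are plain nanosecond counters, and a block_start of 0 is treated as "not blocked".

```c
#include <stdint.h>

struct sim_stats {
	uint64_t block_start;		/* timestamp when the task blocked, 0 if not blocked */
	uint64_t block_max;		/* longest single blocked interval */
	uint64_t sum_block_runtime;	/* total time spent blocked */
};

/* Task enters the blocked (D) state. */
static void sim_block_begin(struct sim_stats *s, uint64_t now_ns)
{
	s->block_start = now_ns;
}

/* Task is woken up: fold the blocked interval into the statistics. */
static void sim_block_end(struct sim_stats *s, uint64_t now_ns)
{
	uint64_t delta;

	if (!s->block_start)
		return;		/* was not blocked */
	delta = now_ns - s->block_start;
	if (delta > s->block_max)
		s->block_max = delta;
	s->block_start = 0;
	s->sum_block_runtime += delta;
}
```

Keeping a running sum next to the existing maximum is what lets a reader of the stats tell one long stall apart from many short ones.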
Re: [PATCH v4 1/3] dt-bindings:iio:adc: add generic settling-time-us and oversampling-ratio channel properties
On Mon, Mar 22, 2021 at 04:06:06PM +0100, Oleksij Rempel wrote:
> Settling time and over sampling is a typical challenge for different IIO ADC
> devices. So, introduce channel specific settling-time-us and oversampling-ratio
> properties to cover this use case.
>
> Signed-off-by: Oleksij Rempel
> ---
>  Documentation/devicetree/bindings/iio/adc/adc.yaml | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/iio/adc/adc.yaml b/Documentation/devicetree/bindings/iio/adc/adc.yaml
> index 912a7635edc4..66fd4b45f097 100644
> --- a/Documentation/devicetree/bindings/iio/adc/adc.yaml
> +++ b/Documentation/devicetree/bindings/iio/adc/adc.yaml
> @@ -39,4 +39,13 @@ properties:
>        The first value specifies the positive input pin, the second
>        specifies the negative input pin.
>
> +  settling-time-us:
> +    $ref: /schemas/types.yaml#/definitions/uint32

Don't need a type for properties with a standard unit suffix.

> +    description:
> +      Time between enabling the channel and firs stable readings.
> +
> +  oversampling-ratio:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description: Number of data samples which are averaged for each read.
> +
>  additionalProperties: true
> --
> 2.29.2
>
[PATCH v2 06/20] drm/dp: Clarify DP AUX registration time
The docs we had for drm_dp_aux_init() and drm_dp_aux_register() were mostly
correct, except for the fact that they made the assumption that all AUX
devices were grandchildren of their respective DRM devices. This is the
case for most normal GPUs, but is almost never the case with SoCs and
display bridges. So, let's fix this documentation to clarify when the right
time to use drm_dp_aux_init() or drm_dp_aux_register() is.

Signed-off-by: Lyude Paul
---
 drivers/gpu/drm/drm_dp_helper.c | 44 +++--
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
index eedbb48815b7..3fa858b9691c 100644
--- a/drivers/gpu/drm/drm_dp_helper.c
+++ b/drivers/gpu/drm/drm_dp_helper.c
@@ -1728,10 +1728,17 @@ EXPORT_SYMBOL(drm_dp_remote_aux_init);
  * drm_dp_aux_init() - minimally initialise an aux channel
  * @aux: DisplayPort AUX channel
  *
- * If you need to use the drm_dp_aux's i2c adapter prior to registering it
- * with the outside world, call drm_dp_aux_init() first. You must still
- * call drm_dp_aux_register() once the connector has been registered to
- * allow userspace access to the auxiliary DP channel.
+ * If you need to use the drm_dp_aux's i2c adapter prior to registering it with
+ * the outside world, call drm_dp_aux_init() first. For drivers which are
+ * grandparents to their AUX adapters (e.g. the AUX adapter is parented by a
+ * &drm_connector), you must still call drm_dp_aux_register() once the connector
+ * has been registered to allow userspace access to the auxiliary DP channel.
+ * Likewise, for such drivers you should also assign &drm_dp_aux.drm_dev as
+ * early as possible so that the &drm_device that corresponds to the AUX adapter
+ * may be mentioned in debugging output from the DRM DP helpers.
+ *
+ * For devices which use a separate platform device for their AUX adapters, this
+ * may be called as early as required by the driver.
  */
 void drm_dp_aux_init(struct drm_dp_aux *aux)
 {
@@ -1751,15 +1758,26 @@ EXPORT_SYMBOL(drm_dp_aux_init);
  * drm_dp_aux_register() - initialise and register aux channel
  * @aux: DisplayPort AUX channel
  *
- * Automatically calls drm_dp_aux_init() if this hasn't been done yet.
- * This should only be called when the underlying drm_connector is
- * initialiazed already. Therefore the best place to call this is from
- * &drm_connector_funcs.late_register. Not that drivers which don't follow this
- * will Oops when CONFIG_DRM_DP_AUX_CHARDEV is enabled.
- *
- * Drivers which need to use the aux channel before that point (e.g. at driver
- * load time, before drm_dev_register() has been called) need to call
- * drm_dp_aux_init().
+ * Automatically calls drm_dp_aux_init() if this hasn't been done yet. This
+ * should only be called once the parent of @aux, &drm_dp_aux.dev, is
+ * initialized. For devices which are grandparents of their AUX channels,
+ * &drm_dp_aux.dev will typically be the &drm_connector which
+ * corresponds to @aux. For these devices, it's advised to call
+ * drm_dp_aux_register() in &drm_connector_funcs.late_register, and likewise to
+ * call drm_dp_aux_unregister() in &drm_connector_funcs.early_unregister.
+ * Functions which don't follow this will likely Oops when
+ * %CONFIG_DRM_DP_AUX_CHARDEV is enabled.
+ *
+ * For devices where the AUX channel is a device that exists independently of
+ * the &drm_device that uses it, such as SoCs and bridge devices, it is
+ * recommended to call drm_dp_aux_register() after a &drm_device has been
+ * assigned to &drm_dp_aux.drm_dev, and likewise to call drm_dp_aux_unregister()
+ * once the &drm_device should no longer be associated with the AUX channel
+ * (e.g. on bridge detach).
+ *
+ * Drivers which need to use the aux channel before either of the two points
+ * mentioned above need to call drm_dp_aux_init() in order to use the AUX
+ * channel before registration.
  *
  * Returns 0 on success or a negative error code on failure.
  */
--
2.30.2
Re: [PATCH] dt-bindings: i2c: Add device clock-stretch time via dts
On Wed, 2021-03-24 at 11:12 -0600, Rob Herring wrote:
> On Sat, Mar 13, 2021 at 04:07:09PM +0800, qii.w...@mediatek.com wrote:
> > From: Qii Wang
> >
> > tSU,STA/tHD,STA/tSU,STOP maybe out of spec due to device
> > clock-stretching or circuit loss, we could get device
> > clock-stretch time from dts to adjust these parameters
> > to meet the spec via EXT_CONF register.
> >
> > Signed-off-by: Qii Wang
> > ---
> >  Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt b/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt
> > index 7f0194f..97f66f0 100644
> > --- a/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt
> > +++ b/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt
> > @@ -32,6 +32,7 @@ Optional properties:
> >    - mediatek,have-pmic: platform can control i2c form special pmic side.
> >      Only mt6589 and mt8135 support this feature.
> >    - mediatek,use-push-pull: IO config use push-pull mode.
> > +  - clock-stretch-ns: Slave device clock-stretch time.
>
> Should be a common I2C property?
>

Wolfram Sang will look at this next and think about it. I hope it
will become a common I2C property.

> >
> >  Example:
> >
> > --
> > 1.9.1
> >
[PATCH 2/3] ASoC:codec:max98373: Added 30ms turn on/off time delay
Amp requires 10 ~ 30ms for the power ON and OFF.
Added 30ms delay for stability.

Signed-off-by: Ryan Lee
---
 sound/soc/codecs/max98373.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/soc/codecs/max98373.c b/sound/soc/codecs/max98373.c
index 746c829312b8..1346a98ce8a1 100644
--- a/sound/soc/codecs/max98373.c
+++ b/sound/soc/codecs/max98373.c
@@ -28,11 +28,13 @@ static int max98373_dac_event(struct snd_soc_dapm_widget *w,
 		regmap_update_bits(max98373->regmap,
 			MAX98373_R20FF_GLOBAL_SHDN,
 			MAX98373_GLOBAL_EN_MASK, 1);
+		usleep_range(30000, 31000);
 		break;
 	case SND_SOC_DAPM_POST_PMD:
 		regmap_update_bits(max98373->regmap,
 			MAX98373_R20FF_GLOBAL_SHDN,
 			MAX98373_GLOBAL_EN_MASK, 0);
+		usleep_range(30000, 31000);
 		max98373->tdm_mode = false;
 		break;
 	default:
-- 
2.17.1
Re: [PATCH] dt-bindings: i2c: Add device clock-stretch time via dts
On Sat, Mar 13, 2021 at 04:07:09PM +0800, qii.w...@mediatek.com wrote: > From: Qii Wang > > tSU,STA/tHD,STA/tSU,STOP maybe out of spec due to device > clock-stretching or circuit loss, we could get device > clock-stretch time from dts to adjust these parameters > to meet the spec via EXT_CONF register. > > Signed-off-by: Qii Wang > --- > Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt > b/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt > index 7f0194f..97f66f0 100644 > --- a/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt > +++ b/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt > @@ -32,6 +32,7 @@ Optional properties: >- mediatek,have-pmic: platform can control i2c form special pmic side. > Only mt6589 and mt8135 support this feature. >- mediatek,use-push-pull: IO config use push-pull mode. > + - clock-stretch-ns: Slave device clock-stretch time. Should be a common I2C property? > > Example: > > -- > 1.9.1 >
[PATCH 01/10] platform/x86: toshiba_acpi: bind life-time of toshiba_acpi_dev to parent
The 'toshiba_acpi_dev' object is allocated first and freed last, so we can
bind its lifetime to the parent ACPI device object. This is a first step
towards using more device-managed allocation functions in this driver.

The main intent is to try to convert the IIO framework to export only
device-managed functions (i.e. devm_iio_device_alloc() and
devm_iio_device_register()). It is not yet certain that this is possible,
but for now this is a slow, step-by-step move in that direction.

Signed-off-by: Alexandru Ardelean
---
 drivers/platform/x86/toshiba_acpi.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c
index fa7232ad8c39..6d298810b7bf 100644
--- a/drivers/platform/x86/toshiba_acpi.c
+++ b/drivers/platform/x86/toshiba_acpi.c
@@ -2998,8 +2998,6 @@ static int toshiba_acpi_remove(struct acpi_device *acpi_dev)
 	if (toshiba_acpi)
 		toshiba_acpi = NULL;
 
-	kfree(dev);
-
 	return 0;
 }
 
@@ -3016,6 +3014,7 @@ static const char *find_hci_method(acpi_handle handle)
 
 static int toshiba_acpi_add(struct acpi_device *acpi_dev)
 {
+	struct device *parent = &acpi_dev->dev;
 	struct toshiba_acpi_dev *dev;
 	const char *hci_method;
 	u32 dummy;
@@ -3033,7 +3032,7 @@ static int toshiba_acpi_add(struct acpi_device *acpi_dev)
 		return -ENODEV;
 	}
 
-	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+	dev = devm_kzalloc(parent, sizeof(*dev), GFP_KERNEL);
 	if (!dev)
 		return -ENOMEM;
 	dev->acpi_dev = acpi_dev;
@@ -3045,7 +3044,6 @@ static int toshiba_acpi_add(struct acpi_device *acpi_dev)
 	ret = misc_register(&dev->miscdev);
 	if (ret) {
 		pr_err("Failed to register miscdevice\n");
-		kfree(dev);
 		return ret;
 	}
 
-- 
2.30.2
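To illustrate what binding the lifetime to the parent buys, here is a minimal userspace sketch. It is not the kernel's devres implementation: struct parent_dev, managed_zalloc() and parent_release() are hypothetical names invented for this example. It only shows the idea that allocations recorded against a parent are released in one place, which is why the individual kfree() calls on the error and remove paths can be dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the kernel's struct device and devres are richer. */
struct managed_res {
	struct managed_res *next;
	void *data;
};

struct parent_dev {
	struct managed_res *resources;	/* freed when the parent goes away */
};

/* Zeroed allocation whose lifetime is tied to @parent (cf. devm_kzalloc()). */
static void *managed_zalloc(struct parent_dev *parent, size_t size)
{
	struct managed_res *res;

	assert(parent != NULL);

	res = malloc(sizeof(*res));
	if (!res)
		return NULL;

	res->data = calloc(1, size);
	if (!res->data) {
		free(res);
		return NULL;
	}

	/* Record the block so the parent can release it later. */
	res->next = parent->resources;
	parent->resources = res;
	return res->data;
}

/* Release everything bound to @parent; returns how many blocks were freed. */
static int parent_release(struct parent_dev *parent)
{
	int freed = 0;

	assert(parent != NULL);

	while (parent->resources) {
		struct managed_res *res = parent->resources;

		parent->resources = res->next;
		free(res->data);
		free(res);
		freed++;
	}
	return freed;
}
```

A caller would allocate with managed_zalloc() in its add path and never free the block itself; parent_release() plays the role of the core tearing the device down.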
Re: [PATCH v4 18/22] x86/fpu/amx: Define AMX state components and have it used for boot-time checks
On Mar 20, 2021, at 14:31, Thomas Gleixner wrote: > On Sun, Feb 21 2021 at 10:56, Chang S. Bae wrote: >> >> +static void check_xtile_data_against_struct(int size) >> +{ >> +u32 max_palid, palid, state_size; >> +u32 eax, ebx, ecx, edx; >> +u16 max_tile; >> + >> +/* >> + * Check the maximum palette id: >> + * eax: the highest numbered palette subleaf. >> + */ >> +cpuid_count(TILE_CPUID, 0, _palid, , , ); >> + >> +/* >> + * Cross-check each tile size and find the maximum >> + * number of supported tiles. >> + */ >> +for (palid = 1, max_tile = 0; palid <= max_palid; palid++) { >> +u16 tile_size, max; >> + >> +/* >> + * Check the tile size info: >> + * eax[31:16]: bytes per title >> + * ebx[31:16]: the max names (or max number of tiles) >> + */ >> +cpuid_count(TILE_CPUID, palid, , , , ); >> +tile_size = eax >> 16; >> +max = ebx >> 16; >> + >> +if (WARN_ONCE(tile_size != sizeof(struct xtile_data), >> + "%s: struct is %zu bytes, cpu xtile %d bytes\n", >> + __stringify(XFEATURE_XTILE_DATA), >> + sizeof(struct xtile_data), tile_size)) >> +__xstate_dump_leaves(); >> + >> +if (max > max_tile) >> +max_tile = max; >> +} >> + >> +state_size = sizeof(struct xtile_data) * max_tile; >> +if (WARN_ONCE(size != state_size, >> + "%s: calculated size is %u bytes, cpu state %d bytes\n", >> + __stringify(XFEATURE_XTILE_DATA), state_size, size)) >> +__xstate_dump_leaves(); > > So we have 2 warnings which complain about inconsistent state and that's > it? Why has this absolutely no consequences? We just keep stuff enabled > and jug along, right? > > Which one of the two states is correct? Why don't we just disable that > muck and be done with it to play it safe? > > Failing to execute some workload by saying NO due to inconsistency is > far more useful than taking the chance of potential silent data > corruption. This change in fact follows the mainline code [1], where this type of warning is emitted with such mismatch. Yes, disabling the feature looks to be the right way. 
Or perhaps taking the larger of the two sizes is an option when they
mismatch? In any case, given this feedback, the mainline code needs to be
revised before this patch is applied. Correct me if you think otherwise.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n567

Thanks,
Chang
Re: [PATCH v4 net-next 02/11] net: bridge: add helper to retrieve the current ageing time
On 23/03/2021 01:51, Vladimir Oltean wrote: > From: Vladimir Oltean > > The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from: > > sysfs/ioctl/netlink > -> br_set_ageing_time >-> __set_ageing_time > > therefore not at bridge port creation time, so: > (a) switchdev drivers have to hardcode the initial value for the address > ageing time, because they didn't get any notification > (b) that hardcoded value can be out of sync, if the user changes the > ageing time before enslaving the port to the bridge > > We need a helper in the bridge, such that switchdev drivers can query > the current value of the bridge ageing time when they start offloading > it. > > Signed-off-by: Vladimir Oltean > Reviewed-by: Florian Fainelli > Reviewed-by: Tobias Waldekranz > --- > include/linux/if_bridge.h | 6 ++ > net/bridge/br_stp.c | 13 + > 2 files changed, 19 insertions(+) > The patch is mostly fine, there are a few minor nits (const qualifiers). If there is another version of the patch-set please add them, either way: Acked-by: Nikolay Aleksandrov > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h > index 920d3a02cc68..ebd16495459c 100644 > --- a/include/linux/if_bridge.h > +++ b/include/linux/if_bridge.h > @@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct > net_device *br_dev, > void br_fdb_clear_offload(const struct net_device *dev, u16 vid); > bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag); > u8 br_port_get_stp_state(const struct net_device *dev); > +clock_t br_get_ageing_time(struct net_device *br_dev); > #else > static inline struct net_device * > br_fdb_find_port(const struct net_device *br_dev, > @@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct > net_device *dev) > { > return BR_STATE_DISABLED; > } > + > +static inline clock_t br_get_ageing_time(struct net_device *br_dev) const > +{ > + return 0; > +} > #endif > > #endif > diff --git a/net/bridge/br_stp.c 
b/net/bridge/br_stp.c > index 86b5e05d3f21..3dafb6143cff 100644 > --- a/net/bridge/br_stp.c > +++ b/net/bridge/br_stp.c > @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t > ageing_time) > return 0; > } > > +clock_t br_get_ageing_time(struct net_device *br_dev) const > +{ > + struct net_bridge *br; const > + > + if (!netif_is_bridge_master(br_dev)) > + return 0; > + > + br = netdev_priv(br_dev); > + > + return jiffies_to_clock_t(br->ageing_time); > +} > +EXPORT_SYMBOL_GPL(br_get_ageing_time); > + > /* called under bridge lock */ > void __br_set_topology_change(struct net_bridge *br, unsigned char val) > { >
[PATCH v4 net-next 08/11] net: dsa: inherit the actual bridge port flags at join time
From: Vladimir Oltean DSA currently assumes that the bridge port starts off with this constellation of bridge port flags: - learning on - unicast flooding on - multicast flooding on - broadcast flooding on just by virtue of code copy-pasta from the bridge layer (new_nbp). This was a simple enough strategy thus far, because the 'bridge join' moment always coincided with the 'bridge port creation' moment. But with sandwiched interfaces, such as: br0 | bond0 | swp0 it may happen that the user has had time to change the bridge port flags of bond0 before enslaving swp0 to it. In that case, swp0 will falsely assume that the bridge port flags are those determined by new_nbp, when in fact this can happen: ip link add br0 type bridge ip link add bond0 type bond ip link set bond0 master br0 ip link set bond0 type bridge_slave learning off ip link set swp0 master br0 Now swp0 has learning enabled, bond0 has learning disabled. Not nice. Fix this by "dumpster diving" through the actual bridge port flags with br_port_flag_is_set, at bridge join time. We use this opportunity to split dsa_port_change_brport_flags into two distinct functions called dsa_port_inherit_brport_flags and dsa_port_clear_brport_flags, now that the implementation for the two cases is no longer similar. This patch also creates two functions called dsa_port_switchdev_sync and dsa_port_switchdev_unsync which collect what we have so far, even if that's asymmetrical. More is going to be added in the next patch. 
Signed-off-by: Vladimir Oltean --- net/dsa/port.c | 123 - 1 file changed, 82 insertions(+), 41 deletions(-) diff --git a/net/dsa/port.c b/net/dsa/port.c index fcbe5b1545b8..c712bf3da0a0 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -122,28 +122,84 @@ void dsa_port_disable(struct dsa_port *dp) rtnl_unlock(); } -static void dsa_port_change_brport_flags(struct dsa_port *dp, -bool bridge_offload) +static int dsa_port_inherit_brport_flags(struct dsa_port *dp, +struct netlink_ext_ack *extack) { - struct switchdev_brport_flags flags; - int flag; + const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | + BR_BCAST_FLOOD; + struct net_device *brport_dev = dsa_port_to_bridge_port(dp); + int flag, err; - flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD; - if (bridge_offload) - flags.val = flags.mask; - else - flags.val = flags.mask & ~BR_LEARNING; + for_each_set_bit(flag, , 32) { + struct switchdev_brport_flags flags = {0}; + + flags.mask = BIT(flag); - for_each_set_bit(flag, , 32) { - struct switchdev_brport_flags tmp; + if (br_port_flag_is_set(brport_dev, BIT(flag))) + flags.val = BIT(flag); + + err = dsa_port_bridge_flags(dp, flags, extack); + if (err && err != -EOPNOTSUPP) + return err; + } - tmp.val = flags.val & BIT(flag); - tmp.mask = BIT(flag); + return 0; +} - dsa_port_bridge_flags(dp, tmp, NULL); +static void dsa_port_clear_brport_flags(struct dsa_port *dp) +{ + const unsigned long val = BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD; + const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | + BR_BCAST_FLOOD; + int flag, err; + + for_each_set_bit(flag, , 32) { + struct switchdev_brport_flags flags = {0}; + + flags.mask = BIT(flag); + flags.val = val & BIT(flag); + + err = dsa_port_bridge_flags(dp, flags, NULL); + if (err && err != -EOPNOTSUPP) + dev_err(dp->ds->dev, + "failed to clear bridge port flag %lu: %pe\n", + flags.val, ERR_PTR(err)); } } +static int dsa_port_switchdev_sync(struct dsa_port *dp, + struct 
netlink_ext_ack *extack) +{ + int err; + + err = dsa_port_inherit_brport_flags(dp, extack); + if (err) + return err; + + return 0; +} + +static void dsa_port_switchdev_unsync(struct dsa_port *dp) +{ + /* Configure the port for standalone mode (no address learning, +* flood everything). +* The bridge only emits SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS events +* when the user requests it through netlink or sysfs, but not +* automatically at port join or leave, so we need to handle resetting +* the brport flags ourselves. But we even prefer it that way, because +* otherwise, some setups might never get the notification they need, +
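The per-flag loop described in the commit message can be sketched in plain C as follows. This is an illustration under assumptions, not the DSA code: the SK_BR_* bit values are arbitrary, struct brport_flags and inherit_brport_flags() are hypothetical, and the bit walk is an open-coded stand-in for for_each_set_bit(). It shows how the port's current flags are turned into one (mask, val) request per flag, so that a flag the user already cleared is inherited as cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; the kernel defines the real BR_* flags. */
#define SK_BR_FLOOD		(1UL << 0)
#define SK_BR_LEARNING		(1UL << 1)
#define SK_BR_MCAST_FLOOD	(1UL << 2)
#define SK_BR_BCAST_FLOOD	(1UL << 3)

struct brport_flags {
	unsigned long val;
	unsigned long mask;
};

/*
 * Split the bridge port's current flags into one (mask, val) request per
 * flag of interest. Returns the number of requests written to @out, which
 * needs room for 4 entries.
 */
static int inherit_brport_flags(unsigned long current_flags,
				struct brport_flags *out)
{
	const unsigned long mask = SK_BR_LEARNING | SK_BR_FLOOD |
				   SK_BR_MCAST_FLOOD | SK_BR_BCAST_FLOOD;
	int flag, n = 0;

	assert(out != NULL);

	for (flag = 0; flag < 32; flag++) {
		unsigned long bit = 1UL << flag;

		if (!(mask & bit))
			continue;	/* open-coded for_each_set_bit() */

		out[n].mask = bit;
		out[n].val = (current_flags & bit) ? bit : 0;
		n++;
	}
	return n;
}
```

For the bond0 example in the commit message (learning off, all flooding on), the request for the learning bit carries val 0, so the joining port ends up with learning disabled as well.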
[PATCH v4 net-next 02/11] net: bridge: add helper to retrieve the current ageing time
From: Vladimir Oltean The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from: sysfs/ioctl/netlink -> br_set_ageing_time -> __set_ageing_time therefore not at bridge port creation time, so: (a) switchdev drivers have to hardcode the initial value for the address ageing time, because they didn't get any notification (b) that hardcoded value can be out of sync, if the user changes the ageing time before enslaving the port to the bridge We need a helper in the bridge, such that switchdev drivers can query the current value of the bridge ageing time when they start offloading it. Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli Reviewed-by: Tobias Waldekranz --- include/linux/if_bridge.h | 6 ++ net/bridge/br_stp.c | 13 + 2 files changed, 19 insertions(+) diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index 920d3a02cc68..ebd16495459c 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev, void br_fdb_clear_offload(const struct net_device *dev, u16 vid); bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag); u8 br_port_get_stp_state(const struct net_device *dev); +clock_t br_get_ageing_time(struct net_device *br_dev); #else static inline struct net_device * br_fdb_find_port(const struct net_device *br_dev, @@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct net_device *dev) { return BR_STATE_DISABLED; } + +static inline clock_t br_get_ageing_time(struct net_device *br_dev) +{ + return 0; +} #endif #endif diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c index 86b5e05d3f21..3dafb6143cff 100644 --- a/net/bridge/br_stp.c +++ b/net/bridge/br_stp.c @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time) return 0; } +clock_t br_get_ageing_time(struct net_device *br_dev) +{ + struct net_bridge *br; + + if (!netif_is_bridge_master(br_dev)) + return 0; 
+ + br = netdev_priv(br_dev); + + return jiffies_to_clock_t(br->ageing_time); +} +EXPORT_SYMBOL_GPL(br_get_ageing_time); + /* called under bridge lock */ void __br_set_topology_change(struct net_bridge *br, unsigned char val) { -- 2.25.1
Re: [PATCH v3 net-next 07/12] net: dsa: sync ageing time when joining the bridge
On 3/20/2021 3:34 PM, Vladimir Oltean wrote: > From: Vladimir Oltean > > The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from: > > sysfs/ioctl/netlink > -> br_set_ageing_time >-> __set_ageing_time > > therefore not at bridge port creation time, so: > (a) drivers had to hardcode the initial value for the address ageing time, > because they didn't get any notification > (b) that hardcoded value can be out of sync, if the user changes the > ageing time before enslaving the port to the bridge > > Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli -- Florian
[PATCH v4 1/3] dt-bindings:iio:adc: add generic settling-time-us and oversampling-ratio channel properties
Settling time and oversampling are typical challenges for different IIO ADC
devices. So, introduce channel-specific settling-time-us and
oversampling-ratio properties to cover this use case.

Signed-off-by: Oleksij Rempel
---
 Documentation/devicetree/bindings/iio/adc/adc.yaml | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/devicetree/bindings/iio/adc/adc.yaml b/Documentation/devicetree/bindings/iio/adc/adc.yaml
index 912a7635edc4..66fd4b45f097 100644
--- a/Documentation/devicetree/bindings/iio/adc/adc.yaml
+++ b/Documentation/devicetree/bindings/iio/adc/adc.yaml
@@ -39,4 +39,13 @@ properties:
       The first value specifies the positive input pin, the second
       specifies the negative input pin.
 
+  settling-time-us:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description:
+      Time between enabling the channel and first stable readings.
+
+  oversampling-ratio:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: Number of data samples which are averaged for each read.
+
 additionalProperties: true
-- 
2.29.2
[PATCH 5.4 08/60] s390/vtime: fix increased steal time accounting
From: Gerald Schaefer commit d54cb7d54877d529bc1e0e1f47a3dd082f73add3 upstream. Commit 152e9b8676c6e ("s390/vtime: steal time exponential moving average") inadvertently changed the input value for account_steal_time() from "cputime_to_nsecs(steal)" to just "steal", resulting in broken increased steal time accounting. Fix this by changing it back to "cputime_to_nsecs(steal)". Fixes: 152e9b8676c6e ("s390/vtime: steal time exponential moving average") Cc: # 5.1 Reported-by: Sabine Forkel Reviewed-by: Heiko Carstens Signed-off-by: Gerald Schaefer Signed-off-by: Heiko Carstens Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/vtime.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/s390/kernel/vtime.c +++ b/arch/s390/kernel/vtime.c @@ -217,7 +217,7 @@ void vtime_flush(struct task_struct *tsk avg_steal = S390_lowcore.avg_steal_timer / 2; if ((s64) steal > 0) { S390_lowcore.steal_timer = 0; - account_steal_time(steal); + account_steal_time(cputime_to_nsecs(steal)); avg_steal += steal; } S390_lowcore.avg_steal_timer = avg_steal;
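The size of the error follows from the units involved. Below is a minimal sketch, assuming that the steal value is counted in s390 TOD clock units of 1/4096 microsecond; the helper names tod_units_to_ns() and overcount_factor() are hypothetical and only mirror the kind of conversion that cputime_to_nsecs() performs. Under that assumption, passing the raw value where nanoseconds are expected inflates the accounted steal time by a factor of about 4.096.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: 4096 units make up one microsecond, so
 * ns = units * 1000 / 4096 = units * 125 / 512.
 */
static uint64_t tod_units_to_ns(uint64_t units)
{
	/* Split the multiplication so the intermediate cannot overflow. */
	return ((units >> 9) * 125) + (((units & 0x1ff) * 125) >> 9);
}

/* How much larger the accounted value is when the conversion is skipped. */
static double overcount_factor(uint64_t units)
{
	uint64_t ns = tod_units_to_ns(units);

	assert(ns != 0);	/* too few units to form a whole nanosecond */
	return (double)units / (double)ns;
}
```

For 4096 units the conversion gives 1000 ns, and one second's worth of units (4096000000) gives 1000000000 ns.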
[PATCH 5.10 012/157] s390/vtime: fix increased steal time accounting
From: Gerald Schaefer commit d54cb7d54877d529bc1e0e1f47a3dd082f73add3 upstream. Commit 152e9b8676c6e ("s390/vtime: steal time exponential moving average") inadvertently changed the input value for account_steal_time() from "cputime_to_nsecs(steal)" to just "steal", resulting in broken increased steal time accounting. Fix this by changing it back to "cputime_to_nsecs(steal)". Fixes: 152e9b8676c6e ("s390/vtime: steal time exponential moving average") Cc: # 5.1 Reported-by: Sabine Forkel Reviewed-by: Heiko Carstens Signed-off-by: Gerald Schaefer Signed-off-by: Heiko Carstens Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/vtime.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/s390/kernel/vtime.c +++ b/arch/s390/kernel/vtime.c @@ -217,7 +217,7 @@ void vtime_flush(struct task_struct *tsk avg_steal = S390_lowcore.avg_steal_timer / 2; if ((s64) steal > 0) { S390_lowcore.steal_timer = 0; - account_steal_time(steal); + account_steal_time(cputime_to_nsecs(steal)); avg_steal += steal; } S390_lowcore.avg_steal_timer = avg_steal;
[PATCH 5.11 012/120] s390/vtime: fix increased steal time accounting
From: Gerald Schaefer commit d54cb7d54877d529bc1e0e1f47a3dd082f73add3 upstream. Commit 152e9b8676c6e ("s390/vtime: steal time exponential moving average") inadvertently changed the input value for account_steal_time() from "cputime_to_nsecs(steal)" to just "steal", resulting in broken increased steal time accounting. Fix this by changing it back to "cputime_to_nsecs(steal)". Fixes: 152e9b8676c6e ("s390/vtime: steal time exponential moving average") Cc: # 5.1 Reported-by: Sabine Forkel Reviewed-by: Heiko Carstens Signed-off-by: Gerald Schaefer Signed-off-by: Heiko Carstens Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/vtime.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/s390/kernel/vtime.c +++ b/arch/s390/kernel/vtime.c @@ -217,7 +217,7 @@ void vtime_flush(struct task_struct *tsk avg_steal = S390_lowcore.avg_steal_timer / 2; if ((s64) steal > 0) { S390_lowcore.steal_timer = 0; - account_steal_time(steal); + account_steal_time(cputime_to_nsecs(steal)); avg_steal += steal; } S390_lowcore.avg_steal_timer = avg_steal;
Re: [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time when joining the bridge
On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean wrote: > From: Vladimir Oltean > > The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from: > > sysfs/ioctl/netlink > -> br_set_ageing_time >-> __set_ageing_time > > therefore not at bridge port creation time, so: > (a) drivers had to hardcode the initial value for the address ageing time, > because they didn't get any notification > (b) that hardcoded value can be out of sync, if the user changes the > ageing time before enslaving the port to the bridge > > Signed-off-by: Vladimir Oltean > --- Reviewed-by: Tobias Waldekranz
Re: [PATCH v3 net-next 03/12] net: dsa: inherit the actual bridge port flags at join time
Hi Vladimir, I love your patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Vladimir-Oltean/Better-support-for-sandwiched-LAGs-with-bridge-and-DSA/20210321-063842 base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git d773b7957e4fd7b732a163df0e59d31ad4237302 config: arm64-randconfig-r021-20210321 (attached as .config) compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 14696baaf4c43fe53f738bc292bbe169eed93d5d) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install arm64 cross compiling tool for clang build # apt-get install binutils-aarch64-linux-gnu # https://github.com/0day-ci/linux/commit/3aac17167e3de0aeaf5287f9d586725bdc7495a5 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Vladimir-Oltean/Better-support-for-sandwiched-LAGs-with-bridge-and-DSA/20210321-063842 git checkout 3aac17167e3de0aeaf5287f9d586725bdc7495a5 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): >> net/dsa/port.c:167:5: warning: format specifies type 'int' but the argument >> has type 'unsigned long' [-Wformat] flags.val, err, ERR_PTR(err)); ^ include/linux/dev_printk.h:112:32: note: expanded from macro 'dev_err' _dev_err(dev, dev_fmt(fmt), ##__VA_ARGS__) ~~~ ^~~ 1 warning generated. 
vim +167 net/dsa/port.c

   148	
   149	static void dsa_port_clear_brport_flags(struct dsa_port *dp,
   150						struct netlink_ext_ack *extack)
   151	{
   152		const unsigned long val = BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
   153		const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
   154					   BR_BCAST_FLOOD;
   155		int flag, err;
   156	
   157		for_each_set_bit(flag, &mask, 32) {
   158			struct switchdev_brport_flags flags = {0};
   159	
   160			flags.mask = BIT(flag);
   161			flags.val = val & BIT(flag);
   162	
   163			err = dsa_port_bridge_flags(dp, flags, extack);
   164			if (err && err != -EOPNOTSUPP)
   165				dev_err(dp->ds->dev,
   166					"failed to clear bridge port flag %d: %d (%pe)\n",
 > 167					flags.val, err, ERR_PTR(err));
   168		}
   169	}
   170	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
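The warning comes down to a mismatch between the conversion specifier and the argument type: flags.val is an unsigned long, so it needs "%lu" rather than "%d". A small userspace sketch of the corrected format follows; format_flag_error() is a hypothetical helper, and since "%pe" is a kernel-only printk extension the sketch prints the raw error number instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an unsigned long flag value; "%lu" matches the argument type. */
static int format_flag_error(char *buf, size_t len, unsigned long flag_val,
			     int err)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "failed to clear bridge port flag %lu: %d",
			flag_val, err);
}
```

Compilers that check format strings (for example with -Wformat) accept this version without a diagnostic, because each specifier now matches the type of its argument.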
Re: [PATCH v3 net-next 03/12] net: dsa: inherit the actual bridge port flags at join time
Hi Vladimir, I love your patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Vladimir-Oltean/Better-support-for-sandwiched-LAGs-with-bridge-and-DSA/20210321-063842 base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git d773b7957e4fd7b732a163df0e59d31ad4237302 config: arm-mvebu_v5_defconfig (attached as .config) compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/3aac17167e3de0aeaf5287f9d586725bdc7495a5 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Vladimir-Oltean/Better-support-for-sandwiched-LAGs-with-bridge-and-DSA/20210321-063842 git checkout 3aac17167e3de0aeaf5287f9d586725bdc7495a5 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): In file included from include/linux/device.h:15, from include/linux/dma-mapping.h:7, from include/linux/skbuff.h:31, from include/net/net_namespace.h:39, from include/linux/netdevice.h:37, from include/linux/if_bridge.h:12, from net/dsa/port.c:9: net/dsa/port.c: In function 'dsa_port_clear_brport_flags': >> net/dsa/port.c:166:5: warning: format '%d' expects argument of type 'int', >> but argument 3 has type 'long unsigned int' [-Wformat=] 166 | "failed to clear bridge port flag %d: %d (%pe)\n", | ^ include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt' 19 | #define dev_fmt(fmt) fmt | ^~~ net/dsa/port.c:165:4: note: in expansion of macro 'dev_err' 165 |dev_err(dp->ds->dev, |^~~ net/dsa/port.c:166:40: note: format string is defined here 166 | "failed to clear bridge port flag %d: %d (%pe)\n", 
      |                                       ~^
      |                                        |
      |                                        int
      |                                       %ld

vim +166 net/dsa/port.c

   148	
   149	static void dsa_port_clear_brport_flags(struct dsa_port *dp,
   150						struct netlink_ext_ack *extack)
   151	{
   152		const unsigned long val = BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
   153		const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
   154					   BR_BCAST_FLOOD;
   155		int flag, err;
   156	
   157		for_each_set_bit(flag, &mask, 32) {
   158			struct switchdev_brport_flags flags = {0};
   159	
   160			flags.mask = BIT(flag);
   161			flags.val = val & BIT(flag);
   162	
   163			err = dsa_port_bridge_flags(dp, flags, extack);
   164			if (err && err != -EOPNOTSUPP)
   165				dev_err(dp->ds->dev,
 > 166					"failed to clear bridge port flag %d: %d (%pe)\n",
   167					flags.val, err, ERR_PTR(err));
   168		}
   169	}
   170	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
[PATCH v3 net-next 07/12] net: dsa: sync ageing time when joining the bridge
From: Vladimir Oltean The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from: sysfs/ioctl/netlink -> br_set_ageing_time -> __set_ageing_time therefore not at bridge port creation time, so: (a) drivers had to hardcode the initial value for the address ageing time, because they didn't get any notification (b) that hardcoded value can be out of sync, if the user changes the ageing time before enslaving the port to the bridge Signed-off-by: Vladimir Oltean --- Changes in v3: None. include/linux/if_bridge.h | 6 ++ net/bridge/br_stp.c | 13 + net/dsa/port.c| 10 ++ 3 files changed, 29 insertions(+) diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index 920d3a02cc68..ebd16495459c 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev, void br_fdb_clear_offload(const struct net_device *dev, u16 vid); bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag); u8 br_port_get_stp_state(const struct net_device *dev); +clock_t br_get_ageing_time(struct net_device *br_dev); #else static inline struct net_device * br_fdb_find_port(const struct net_device *br_dev, @@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct net_device *dev) { return BR_STATE_DISABLED; } + +static inline clock_t br_get_ageing_time(struct net_device *br_dev) +{ + return 0; +} #endif #endif diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c index 86b5e05d3f21..3dafb6143cff 100644 --- a/net/bridge/br_stp.c +++ b/net/bridge/br_stp.c @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time) return 0; } +clock_t br_get_ageing_time(struct net_device *br_dev) +{ + struct net_bridge *br; + + if (!netif_is_bridge_master(br_dev)) + return 0; + + br = netdev_priv(br_dev); + + return jiffies_to_clock_t(br->ageing_time); +} +EXPORT_SYMBOL_GPL(br_get_ageing_time); + /* called under bridge lock */ void 
__br_set_topology_change(struct net_bridge *br, unsigned char val) { diff --git a/net/dsa/port.c b/net/dsa/port.c index 124f8bb21204..95e6f2861290 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -173,6 +173,7 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp, { struct net_device *brport_dev = dsa_port_to_bridge_port(dp); struct net_device *br = dp->bridge_dev; + clock_t ageing_time; u8 stp_state; int err; @@ -193,6 +194,11 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp, if (err && err != -EOPNOTSUPP) return err; + ageing_time = br_get_ageing_time(br); + err = dsa_port_ageing_time(dp, ageing_time); + if (err && err != -EOPNOTSUPP) + return err; + return 0; } @@ -222,6 +228,10 @@ static void dsa_port_switchdev_unsync(struct dsa_port *dp) * allow this in standalone mode too. */ dsa_port_mrouter(dp->cpu_dp, true, NULL); + + /* Ageing time may be global to the switch chip, so don't change it +* here because we have no good reason (or value) to change it to. +*/ } int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br, -- 2.25.1
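The error handling used for each sync step in this patch (continue on -EOPNOTSUPP, abort on anything else) can be sketched generically as below. This is an illustration only: sync_step_fn, switchdev_sync_sketch() and the step_* fakes are hypothetical and do not exist in net/dsa.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*sync_step_fn)(void);

/*
 * Run each sync step in order. A step that reports -EOPNOTSUPP is treated
 * as "this switch cannot offload that attribute" and skipped; any other
 * error aborts the join and is returned to the caller.
 */
static int switchdev_sync_sketch(const sync_step_fn *steps, size_t nsteps)
{
	size_t i;

	assert(steps != NULL || nsteps == 0);

	for (i = 0; i < nsteps; i++) {
		int err = steps[i]();

		if (err && err != -EOPNOTSUPP)
			return err;
	}
	return 0;
}

/* Fake steps used to exercise the sketch. */
static int step_ok(void)
{
	return 0;
}

static int step_unsupported(void)
{
	return -EOPNOTSUPP;
}

static int step_broken(void)
{
	return -EINVAL;
}
```

The effect is that a driver which does not implement, say, ageing time offload can still join the bridge, while a real failure in any step is reported to the caller.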
[PATCH v3 net-next 03/12] net: dsa: inherit the actual bridge port flags at join time
From: Vladimir Oltean

DSA currently assumes that the bridge port starts off with this
constellation of bridge port flags:

- learning on
- unicast flooding on
- multicast flooding on
- broadcast flooding on

just by virtue of code copy-pasta from the bridge layer (new_nbp).
This was a simple enough strategy thus far, because the 'bridge join'
moment always coincided with the 'bridge port creation' moment.

But with sandwiched interfaces, such as:

 br0
  |
bond0
  |
 swp0

it may happen that the user has had time to change the bridge port flags
of bond0 before enslaving swp0 to it. In that case, swp0 will falsely
assume that the bridge port flags are those determined by new_nbp, when
in fact this can happen:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set bond0 type bridge_slave learning off
ip link set swp0 master br0

Now swp0 has learning enabled, bond0 has learning disabled. Not nice.

Fix this by "dumpster diving" through the actual bridge port flags with
br_port_flag_is_set, at bridge join time.

We use this opportunity to split dsa_port_change_brport_flags into two
distinct functions called dsa_port_inherit_brport_flags and
dsa_port_clear_brport_flags, now that the implementation for the two
cases is no longer similar.

Signed-off-by: Vladimir Oltean
---
Changes in v3:
Rewrote dsa_port_clear_brport_flags to at least catch errors, and to use
the same "for" loop structure as dsa_port_inherit_brport_flags.

 net/dsa/port.c | 125 -
 1 file changed, 83 insertions(+), 42 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index fcbe5b1545b8..8dbc6e0db30c 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -122,26 +122,82 @@ void dsa_port_disable(struct dsa_port *dp)
 	rtnl_unlock();
 }
 
-static void dsa_port_change_brport_flags(struct dsa_port *dp,
-					 bool bridge_offload)
+static int dsa_port_inherit_brport_flags(struct dsa_port *dp,
+					 struct netlink_ext_ack *extack)
 {
-	struct switchdev_brport_flags flags;
-	int flag;
+	const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+				   BR_BCAST_FLOOD;
+	struct net_device *brport_dev = dsa_port_to_bridge_port(dp);
+	int flag, err;
 
-	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
-	if (bridge_offload)
-		flags.val = flags.mask;
-	else
-		flags.val = flags.mask & ~BR_LEARNING;
+	for_each_set_bit(flag, &mask, 32) {
+		struct switchdev_brport_flags flags = {0};
 
-	for_each_set_bit(flag, &flags.mask, 32) {
-		struct switchdev_brport_flags tmp;
+		flags.mask = BIT(flag);
 
-		tmp.val = flags.val & BIT(flag);
-		tmp.mask = BIT(flag);
+		if (br_port_flag_is_set(brport_dev, BIT(flag)))
+			flags.val = BIT(flag);
 
-		dsa_port_bridge_flags(dp, tmp, NULL);
+		err = dsa_port_bridge_flags(dp, flags, extack);
+		if (err && err != -EOPNOTSUPP)
+			return err;
 	}
+
+	return 0;
+}
+
+static void dsa_port_clear_brport_flags(struct dsa_port *dp,
+					struct netlink_ext_ack *extack)
+{
+	const unsigned long val = BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+	const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+				   BR_BCAST_FLOOD;
+	int flag, err;
+
+	for_each_set_bit(flag, &mask, 32) {
+		struct switchdev_brport_flags flags = {0};
+
+		flags.mask = BIT(flag);
+		flags.val = val & BIT(flag);
+
+		err = dsa_port_bridge_flags(dp, flags, extack);
+		if (err && err != -EOPNOTSUPP)
+			dev_err(dp->ds->dev,
+				"failed to clear bridge port flag %d: %d (%pe)\n",
+				flags.val, err, ERR_PTR(err));
+	}
+}
+
+static int dsa_port_switchdev_sync(struct dsa_port *dp,
+				   struct netlink_ext_ack *extack)
+{
+	int err;
+
+	err = dsa_port_inherit_brport_flags(dp, extack);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+/* Configure the port for standalone mode (no address learning, flood
+ * everything, BR_STATE_FORWARDING, etc).
+ * The bridge only emits SWITCHDEV_ATTR_ID_PORT_* events when the user
+ * requests it through netlink or sysfs, but not automatically at port
+ * join or leave, so we need to handle resetting the brport flags ourselves.
+ * But we even prefer it that way, because otherwise, some setups might never
+ * get the notification they need, for example, when a port leaves a LAG that
+ * offloads the bridge,
Re: [PATCH v4 18/22] x86/fpu/amx: Define AMX state components and have it used for boot-time checks
On Sun, Feb 21 2021 at 10:56, Chang S. Bae wrote:
>
> +static void check_xtile_data_against_struct(int size)
> +{
> +	u32 max_palid, palid, state_size;
> +	u32 eax, ebx, ecx, edx;
> +	u16 max_tile;
> +
> +	/*
> +	 * Check the maximum palette id:
> +	 *   eax: the highest numbered palette subleaf.
> +	 */
> +	cpuid_count(TILE_CPUID, 0, &max_palid, &ebx, &ecx, &edx);
> +
> +	/*
> +	 * Cross-check each tile size and find the maximum
> +	 * number of supported tiles.
> +	 */
> +	for (palid = 1, max_tile = 0; palid <= max_palid; palid++) {
> +		u16 tile_size, max;
> +
> +		/*
> +		 * Check the tile size info:
> +		 *   eax[31:16]: bytes per tile
> +		 *   ebx[31:16]: the max names (or max number of tiles)
> +		 */
> +		cpuid_count(TILE_CPUID, palid, &eax, &ebx, &ecx, &edx);
> +		tile_size = eax >> 16;
> +		max = ebx >> 16;
> +
> +		if (WARN_ONCE(tile_size != sizeof(struct xtile_data),
> +			      "%s: struct is %zu bytes, cpu xtile %d bytes\n",
> +			      __stringify(XFEATURE_XTILE_DATA),
> +			      sizeof(struct xtile_data), tile_size))
> +			__xstate_dump_leaves();
> +
> +		if (max > max_tile)
> +			max_tile = max;
> +	}
> +
> +	state_size = sizeof(struct xtile_data) * max_tile;
> +	if (WARN_ONCE(size != state_size,
> +		      "%s: calculated size is %u bytes, cpu state %d bytes\n",
> +		      __stringify(XFEATURE_XTILE_DATA), state_size, size))
> +		__xstate_dump_leaves();

So we have 2 warnings which complain about inconsistent state and that's
it?

Why has this absolutely no consequences? We just keep stuff enabled and
jug along, right?

Which one of the two states is correct?

Why don't we just disable that muck and be done with it to play it safe?

Failing to execute some workload by saying NO due to inconsistency is
far more useful than taking the chance of potential silent data
corruption.

Thanks,

        tglx
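
For illustration only, the "say NO on inconsistency" approach asked for in the reply above could be modelled in plain userspace C roughly as follows. This is a minimal sketch, not the kernel's code: struct tile_report, xtile_layout_consistent(), filter_features() and the FEATURE_XTILE_DATA value are all hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit for this sketch; not a kernel definition. */
#define FEATURE_XTILE_DATA (UINT64_C(1) << 18)

/* Hypothetical per-palette report, standing in for the CPUID output. */
struct tile_report {
	uint16_t tile_size;	/* bytes per tile reported by the CPU */
	uint16_t max_tiles;	/* number of tiles reported by the CPU */
};

/*
 * Return true only if every palette reports the expected tile size and the
 * largest tile count yields the expected total state size.
 */
static bool xtile_layout_consistent(const struct tile_report *pal, size_t n,
				    size_t expected_tile_size,
				    size_t expected_state_size)
{
	size_t max_tiles = 0;

	assert(pal != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (pal[i].tile_size != expected_tile_size)
			return false;
		if (pal[i].max_tiles > max_tiles)
			max_tiles = pal[i].max_tiles;
	}

	return expected_tile_size * max_tiles == expected_state_size;
}

/* Fail closed: drop the feature bit instead of only warning. */
static uint64_t filter_features(uint64_t features, bool consistent)
{
	return consistent ? features : (features & ~FEATURE_XTILE_DATA);
}
```

A caller would compute the consistency result once at boot and pass the enabled-feature mask through filter_features(), so that a mismatch leaves the feature off rather than enabled with a warning.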
Re: [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time when joining the bridge
On Fri, Mar 19, 2021 at 03:13:03PM -0700, Florian Fainelli wrote:
> > diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> > index 86b5e05d3f21..3dafb6143cff 100644
> > --- a/net/bridge/br_stp.c
> > +++ b/net/bridge/br_stp.c
> > @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time)
> >  	return 0;
> >  }
> >
> > +clock_t br_get_ageing_time(struct net_device *br_dev)
> > +{
> > +	struct net_bridge *br;
> > +
> > +	if (!netif_is_bridge_master(br_dev))
> > +		return 0;
> > +
> > +	br = netdev_priv(br_dev);
> > +
> > +	return jiffies_to_clock_t(br->ageing_time);
>
> Don't you want an ASSERT_RTNL() in this function as well?

Hmm, I'm not sure. I don't think I'm accessing anything that is under
the protection of the rtnl_mutex. If anything, the ageing time is
protected by the "bridge lock", but I don't think there's much of an
issue if I read an unsigned int while not holding it.
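
As a rough userspace analogy for the lockless read discussed above (an assumption-laden sketch, not bridge code): a single word-sized value can be read without taking the writer's lock if the access itself is atomic. The struct bridge_model type and the model_*() helpers below are hypothetical, and C11 relaxed atomics stand in for whatever annotation the kernel side would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the bridge's private data. */
struct bridge_model {
	atomic_ulong ageing_time;	/* in ticks */
};

/* Writer side: in a real design this would run under the writer's lock. */
static void model_set_ageing_time(struct bridge_model *br, unsigned long t)
{
	assert(br != NULL);
	atomic_store_explicit(&br->ageing_time, t, memory_order_relaxed);
}

/* Reader side: one relaxed load, no lock taken. */
static unsigned long model_get_ageing_time(struct bridge_model *br)
{
	assert(br != NULL);
	return atomic_load_explicit(&br->ageing_time, memory_order_relaxed);
}
```

The reader may observe either the old or the new value while an update is in flight, but never a torn one, which is the property the reply relies on.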
Re: [RFC PATCH v2 net-next 03/16] net: dsa: inherit the actual bridge port flags at join time
On Fri, Mar 19, 2021 at 03:08:46PM -0700, Florian Fainelli wrote:
>
>
> On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> > From: Vladimir Oltean
> >
> > DSA currently assumes that the bridge port starts off with this
> > constellation of bridge port flags:
> >
> > - learning on
> > - unicast flooding on
> > - multicast flooding on
> > - broadcast flooding on
> >
> > just by virtue of code copy-pasta from the bridge layer (new_nbp).
> > This was a simple enough strategy thus far, because the 'bridge join'
> > moment always coincided with the 'bridge port creation' moment.
> >
> > But with sandwiched interfaces, such as:
> >
> >  br0
> >   |
> > bond0
> >   |
> >  swp0
> >
> > it may happen that the user has had time to change the bridge port flags
> > of bond0 before enslaving swp0 to it. In that case, swp0 will falsely
> > assume that the bridge port flags are those determined by new_nbp, when
> > in fact this can happen:
> >
> > ip link add br0 type bridge
> > ip link add bond0 type bond
> > ip link set bond0 master br0
> > ip link set bond0 type bridge_slave learning off
> > ip link set swp0 master br0
> >
> > Now swp0 has learning enabled, bond0 has learning disabled. Not nice.
> >
> > Fix this by "dumpster diving" through the actual bridge port flags with
> > br_port_flag_is_set, at bridge join time.
> >
> > We use this opportunity to split dsa_port_change_brport_flags into two
> > distinct functions called dsa_port_inherit_brport_flags and
> > dsa_port_clear_brport_flags, now that the implementation for the two
> > cases is no longer similar.
> >
> > Signed-off-by: Vladimir Oltean
> > ---
> >  net/dsa/port.c | 123 -
> >  1 file changed, 82 insertions(+), 41 deletions(-)
> >
> > diff --git a/net/dsa/port.c b/net/dsa/port.c
> > index fcbe5b1545b8..346c50467810 100644
> > --- a/net/dsa/port.c
> > +++ b/net/dsa/port.c
> > @@ -122,26 +122,82 @@ void dsa_port_disable(struct dsa_port *dp)
> >  	rtnl_unlock();
> >  }
> >
> > -static void dsa_port_change_brport_flags(struct dsa_port *dp,
> > -					 bool bridge_offload)
> > +static void dsa_port_clear_brport_flags(struct dsa_port *dp,
> > +					struct netlink_ext_ack *extack)
> >  {
> >  	struct switchdev_brport_flags flags;
> > -	int flag;
> >
> > -	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
> > -	if (bridge_offload)
> > -		flags.val = flags.mask;
> > -	else
> > -		flags.val = flags.mask & ~BR_LEARNING;
> > +	flags.mask = BR_LEARNING;
> > +	flags.val = 0;
> > +	dsa_port_bridge_flags(dp, flags, extack);
>
> Would not you want to use the same for_each_set_bit() loop that
> dsa_port_change_br_flags() uses, that would be a tad more compact.
> --
> Florian

The reworded version has an equal number of lines, but at least it
catches errors now:

static void dsa_port_clear_brport_flags(struct dsa_port *dp,
					struct netlink_ext_ack *extack)
{
	const unsigned long val = BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
	const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
				   BR_BCAST_FLOOD;
	int flag, err;

	for_each_set_bit(flag, &mask, 32) {
		struct switchdev_brport_flags flags = {0};

		flags.mask = BIT(flag);
		flags.val = val & BIT(flag);

		err = dsa_port_bridge_flags(dp, flags, extack);
		if (err && err != -EOPNOTSUPP)
			dev_err(dp->ds->dev,
				"failed to clear bridge port flag %d: %d (%pe)\n",
				flag, err, ERR_PTR(err));
	}
}
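
The one-request-per-flag pattern used in the reworded function can be tried outside the kernel with a small model like the one below. It is only a sketch under stated assumptions: the FLAG_* values, struct port_flags_req, apply_flags_one_by_one() and store_flag() are hypothetical names invented here, a plain loop replaces for_each_set_bit(), and unlike the function above the sketch returns the first real error instead of logging it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag bits; the kernel's BR_* values differ. */
#define FLAG_LEARNING		(1UL << 0)
#define FLAG_FLOOD		(1UL << 1)
#define FLAG_MCAST_FLOOD	(1UL << 2)
#define FLAG_BCAST_FLOOD	(1UL << 3)

struct port_flags_req {
	unsigned long mask;	/* the single flag this request changes */
	unsigned long val;	/* desired value for that flag */
};

/* Hypothetical per-flag setter: returns 0 or a negative errno. */
typedef int (*set_flag_fn)(void *port, struct port_flags_req req);

/*
 * Walk every set bit in @mask and issue one request per flag, taking the
 * desired value from @val. -EOPNOTSUPP is tolerated; the first other error
 * is remembered and returned after all flags have been attempted.
 */
static int apply_flags_one_by_one(void *port, unsigned long mask,
				  unsigned long val, set_flag_fn set_flag)
{
	int first_err = 0;

	assert(set_flag != NULL);

	for (unsigned int bit = 0; bit < 32; bit++) {
		struct port_flags_req req = { 0 };
		int err;

		if (!(mask & (1UL << bit)))
			continue;

		req.mask = 1UL << bit;
		req.val = val & req.mask;

		err = set_flag(port, req);
		if (err && err != -EOPNOTSUPP && !first_err)
			first_err = err;
	}

	return first_err;
}

/* Example setter that keeps the port state in an unsigned long. */
static int store_flag(void *port, struct port_flags_req req)
{
	unsigned long *state = port;

	*state = (*state & ~req.mask) | req.val;
	return 0;
}
```

Passing a mask of all four flags and a value of the three flood flags, as the function above does, leaves learning cleared and flooding set in the modelled port state.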
Re: [RFC PATCH v2 net-next 07/16] net: dsa: sync ageing time when joining the bridge
On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean
>
> The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:
>
> sysfs/ioctl/netlink
> -> br_set_ageing_time
>    -> __set_ageing_time
>
> therefore not at bridge port creation time, so:
> (a) drivers had to hardcode the initial value for the address ageing time,
>     because they didn't get any notification
> (b) that hardcoded value can be out of sync, if the user changes the
>     ageing time before enslaving the port to the bridge
>
> Signed-off-by: Vladimir Oltean
> ---
>  include/linux/if_bridge.h | 6 ++
>  net/bridge/br_stp.c | 13 +
>  net/dsa/port.c | 10 ++
>  3 files changed, 29 insertions(+)
>
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 920d3a02cc68..ebd16495459c 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
>  void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
>  bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
>  u8 br_port_get_stp_state(const struct net_device *dev);
> +clock_t br_get_ageing_time(struct net_device *br_dev);
>  #else
>  static inline struct net_device *
>  br_fdb_find_port(const struct net_device *br_dev,
> @@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct net_device *dev)
>  {
>  	return BR_STATE_DISABLED;
>  }
> +
> +static inline clock_t br_get_ageing_time(struct net_device *br_dev)
> +{
> +	return 0;
> +}
>  #endif
>
>  #endif
> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> index 86b5e05d3f21..3dafb6143cff 100644
> --- a/net/bridge/br_stp.c
> +++ b/net/bridge/br_stp.c
> @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time)
>  	return 0;
>  }
>
> +clock_t br_get_ageing_time(struct net_device *br_dev)
> +{
> +	struct net_bridge *br;
> +
> +	if (!netif_is_bridge_master(br_dev))
> +		return 0;
> +
> +	br = netdev_priv(br_dev);
> +
> +	return jiffies_to_clock_t(br->ageing_time);

Don't you want an ASSERT_RTNL() in this function as well?
--
Florian
Re: [RFC PATCH v2 net-next 03/16] net: dsa: inherit the actual bridge port flags at join time
On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean
>
> DSA currently assumes that the bridge port starts off with this
> constellation of bridge port flags:
>
> - learning on
> - unicast flooding on
> - multicast flooding on
> - broadcast flooding on
>
> just by virtue of code copy-pasta from the bridge layer (new_nbp).
> This was a simple enough strategy thus far, because the 'bridge join'
> moment always coincided with the 'bridge port creation' moment.
>
> But with sandwiched interfaces, such as:
>
>  br0
>   |
> bond0
>   |
>  swp0
>
> it may happen that the user has had time to change the bridge port flags
> of bond0 before enslaving swp0 to it. In that case, swp0 will falsely
> assume that the bridge port flags are those determined by new_nbp, when
> in fact this can happen:
>
> ip link add br0 type bridge
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set bond0 type bridge_slave learning off
> ip link set swp0 master br0
>
> Now swp0 has learning enabled, bond0 has learning disabled. Not nice.
>
> Fix this by "dumpster diving" through the actual bridge port flags with
> br_port_flag_is_set, at bridge join time.
>
> We use this opportunity to split dsa_port_change_brport_flags into two
> distinct functions called dsa_port_inherit_brport_flags and
> dsa_port_clear_brport_flags, now that the implementation for the two
> cases is no longer similar.
>
> Signed-off-by: Vladimir Oltean
> ---
>  net/dsa/port.c | 123 -
>  1 file changed, 82 insertions(+), 41 deletions(-)
>
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index fcbe5b1545b8..346c50467810 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -122,26 +122,82 @@ void dsa_port_disable(struct dsa_port *dp)
>  	rtnl_unlock();
>  }
>
> -static void dsa_port_change_brport_flags(struct dsa_port *dp,
> -					 bool bridge_offload)
> +static void dsa_port_clear_brport_flags(struct dsa_port *dp,
> +					struct netlink_ext_ack *extack)
>  {
>  	struct switchdev_brport_flags flags;
> -	int flag;
>
> -	flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
> -	if (bridge_offload)
> -		flags.val = flags.mask;
> -	else
> -		flags.val = flags.mask & ~BR_LEARNING;
> +	flags.mask = BR_LEARNING;
> +	flags.val = 0;
> +	dsa_port_bridge_flags(dp, flags, extack);

Would not you want to use the same for_each_set_bit() loop that
dsa_port_change_br_flags() uses, that would be a tad more compact.
--
Florian