Re: [PATCH net-next 0/5] PTP support for the SJA1105 DSA driver
On Fri, May 31, 2019 at 09:12:03PM +0300, Vladimir Oltean wrote:
> It won't work unless I make changes to dsa_switch_rcv. Or to the tagging code.
> Right now taggers can only return a pointer to the skb, or NULL, case
> in which DSA will free it.

The tagger can re-write the skb. Why not reform it into a PTP frame?
This clever trick is what the phyter does in hardware. See dp83640.c.

Thanks,
Richard
Re: [PATCH net-next 0/5] PTP support for the SJA1105 DSA driver
On Fri, May 31, 2019 at 07:16:17PM +0300, Vladimir Oltean wrote:
> But now comes the question on what to do on error cases - the meta
> frame didn't arrive. Should I just drop the skb waiting for it?

Yes, that is what other drivers do.

> Right now I "goto rcv_anyway" - which linuxptp doesn't like btw.

The application sees the missing time stamp and prints a warning
message. This is IMHO the right thing to do, so that the user is made
aware of the degradation of the synchronization.

Thanks,
Richard
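The drop-on-missing-meta policy discussed above can be modeled in plain C. This is a minimal userspace sketch, not driver code: the names (`rx_frame`, `deliver_or_drop`, `missing_meta_drops`) are illustrative assumptions, not taken from the sja1105 driver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the policy above: a received frame is only
 * delivered once its timestamp meta frame has arrived; otherwise it is
 * dropped, and the drop is counted so the loss stays observable (the
 * way linuxptp surfaces a missing timestamp as a warning). */
struct rx_frame {
	int has_meta;        /* did the matching meta frame arrive? */
	long long tstamp;    /* timestamp carried by the meta frame */
};

static unsigned long missing_meta_drops;

/* Returns the frame if it can be delivered with a timestamp, or NULL
 * after freeing it -- the "drop the skb" case, never "rcv_anyway". */
static struct rx_frame *deliver_or_drop(struct rx_frame *f)
{
	if (f->has_meta)
		return f;
	missing_meta_drops++;   /* make the degradation visible */
	free(f);
	return NULL;
}
```

The point of the counter is the same as the linuxptp warning: the frame is lost silently on the data path, but the synchronization degradation is still reported somewhere the user can see it.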
Re: [PATCH 3/3] habanalabs: restore unsecured registers default values
On Thu, May 30, 2019 at 11:46 AM Dalit Ben Zoor wrote:
>
> unsecured registers can be changed by the user, and hence should be
> restored to their default values in context switch
>
> Signed-off-by: Dalit Ben Zoor
> ---
>  drivers/misc/habanalabs/goya/goya.c | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
> index 87859c55b4b8..81c1d576783f 100644
> --- a/drivers/misc/habanalabs/goya/goya.c
> +++ b/drivers/misc/habanalabs/goya/goya.c
> @@ -786,7 +786,6 @@ static void goya_init_dma_ch(struct hl_device *hdev, int dma_id)
>  	else
>  		sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
>
> -	WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO + reg_off, lower_32_bits(sob_addr));
>  	WREG32(mmDMA_CH_0_WR_COMP_ADDR_HI + reg_off, upper_32_bits(sob_addr));
>  	WREG32(mmDMA_CH_0_WR_COMP_WDATA + reg_off, 0x8001);
>  }
> @@ -4560,10 +4559,12 @@ static int goya_memset_device_memory(struct hl_device *hdev, u64 addr, u64 size,
>  int goya_context_switch(struct hl_device *hdev, u32 asid)
>  {
>  	struct asic_fixed_properties *prop = &hdev->asic_prop;
> -	u64 addr = prop->sram_base_address;
> +	u64 addr = prop->sram_base_address, sob_addr;
>  	u32 size = hdev->pldm ? 0x1 : prop->sram_size;
>  	u64 val = 0xull;
> -	int rc;
> +	int rc, dma_id;
> +	u32 channel_off = mmDMA_CH_1_WR_COMP_ADDR_LO -
> +			mmDMA_CH_0_WR_COMP_ADDR_LO;
>
>  	rc = goya_memset_device_memory(hdev, addr, size, val, false);
>  	if (rc) {
> @@ -4571,7 +4572,19 @@ int goya_context_switch(struct hl_device *hdev, u32 asid)
>  		return rc;
>  	}
>
> +	/* we need to reset registers that the user is allowed to change */
> +	sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
> +	WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO, lower_32_bits(sob_addr));
> +
> +	for (dma_id = 1 ; dma_id < NUMBER_OF_EXT_HW_QUEUES ; dma_id++) {
> +		sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1000 +
> +				(dma_id - 1) * 4;
> +		WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO + channel_off * dma_id,
> +				lower_32_bits(sob_addr));
> +	}
> +
>  	WREG32(mmTPC_PLL_CLK_RLX_0, 0x200020);
> +
>  	goya_mmu_prepare(hdev, asid);
>
>  	goya_clear_sm_regs(hdev);
> --
> 2.17.1
>

The patch-set is:
Reviewed-by: Oded Gabbay
Re: [PATCH -mm] mm, swap: Fix bad swap file entry warning
(Resend as LKML didn't take outlook settings.)

> On Fri, 2019-05-31 at 11:27 -0700, Dexuan-Linux Cui wrote:
> > Hi,
> > Did you know about the panic reported here:
> > https://marc.info/?t=15593077303=1=2
> >
> > "Kernel panic - not syncing: stack-protector: Kernel stack is
> > corrupted in: write_irq_affinity.isra"
> >
> > This panic is reported on PowerPC and x86.
> >
> > In the case of x86, we see a lot of "get_swap_device: Bad swap file entry"
> > errors before the panic:
> >
> > ...
> > [   24.404693] get_swap_device: Bad swap file entry 5801
> > [   24.408702] get_swap_device: Bad swap file entry 5c01
> > [   24.412510] get_swap_device: Bad swap file entry 6001
> > [   24.416519] get_swap_device: Bad swap file entry 6401
> > [   24.420217] get_swap_device: Bad swap file entry 6801
> > [   24.423921] get_swap_device: Bad swap file entry 6c01
[..]

I don't have a panic, but I observe many lines like this.

> Looks familiar,
>
> https://lore.kernel.org/lkml/1559242868.6132.35.ca...@lca.pw/
>
> I suppose Andrew might be better off reverting the whole series first
> before Yury came up with a right fix, so that other people who are
> testing linux-next don't need to waste time for the same problem.

I didn't observe any problems with this series on top of 5.1, and
there's a fix for swap that eliminates the problem on top of current
next for me:
https://lkml.org/lkml/2019/5/30/1630

Could you please test my series with the patch above?

Thanks,
Yury
[PATCH -next] ALSA: lx6464es - Remove set but not used variables 'orun_mask, urun_mask'
Fixes gcc '-Wunused-but-set-variable' warning:

sound/pci/lx6464es/lx_core.c: In function 'lx_interrupt_handle_async_events':
sound/pci/lx6464es/lx_core.c:990:6: warning: variable 'urun_mask' set but not used [-Wunused-but-set-variable]
sound/pci/lx6464es/lx_core.c:989:6: warning: variable 'orun_mask' set but not used [-Wunused-but-set-variable]

They are never used, so can be removed.

Signed-off-by: YueHaibing
---
 sound/pci/lx6464es/lx_core.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/sound/pci/lx6464es/lx_core.c b/sound/pci/lx6464es/lx_core.c
index 9236a1a8c49b..dd3a873777eb 100644
--- a/sound/pci/lx6464es/lx_core.c
+++ b/sound/pci/lx6464es/lx_core.c
@@ -986,8 +986,6 @@ static int lx_interrupt_handle_async_events(struct lx6464es *chip, u32 irqsrc,
 	 * Stat[8]	LSB overrun
 	 *
 	 */
-	u64 orun_mask;
-	u64 urun_mask;

 	int eb_pending_out = (irqsrc & MASK_SYS_STATUS_EOBO) ? 1 : 0;
 	int eb_pending_in = (irqsrc & MASK_SYS_STATUS_EOBI) ? 1 : 0;
@@ -1010,9 +1008,6 @@ static int lx_interrupt_handle_async_events(struct lx6464es *chip, u32 irqsrc,
 				   *r_notified_out_pipe_mask);
 	}

-	orun_mask = ((u64)stat[7] << 32) + stat[8];
-	urun_mask = ((u64)stat[5] << 32) + stat[6];
-
 	/* todo: handle xrun notification */

 	return err;
[PATCH -next] pwm: pca9685: Remove set but not used variable 'pwm'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/pwm/pwm-pca9685.c: In function 'pca9685_pwm_gpio_free':
drivers/pwm/pwm-pca9685.c:173:21: warning: variable 'pwm' set but not used [-Wunused-but-set-variable]

It's not used since commit e926b12c611c ("pwm: Clear chip_data in pwm_put()")

Signed-off-by: YueHaibing
---
 drivers/pwm/pwm-pca9685.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/pwm/pwm-pca9685.c b/drivers/pwm/pwm-pca9685.c
index 567f5e2771c4..d16215c276bd 100644
--- a/drivers/pwm/pwm-pca9685.c
+++ b/drivers/pwm/pwm-pca9685.c
@@ -170,12 +170,10 @@ static void pca9685_pwm_gpio_set(struct gpio_chip *gpio, unsigned int offset,
 static void pca9685_pwm_gpio_free(struct gpio_chip *gpio, unsigned int offset)
 {
 	struct pca9685 *pca = gpiochip_get_data(gpio);
-	struct pwm_device *pwm;

 	pca9685_pwm_gpio_set(gpio, offset, 0);
 	pm_runtime_put(pca->chip.dev);
 	mutex_lock(&pca->lock);
-	pwm = &pca->chip.pwms[offset];
 	mutex_unlock(&pca->lock);
 }
[PATCH] treewide: fix typos of SPDX-License-Identifier
Prior to the adoption of SPDX, it was difficult for tools to determine
the correct license due to incomplete or badly formatted license text.
The SPDX solves this issue, assuming people can correctly spell
"SPDX-License-Identifier", although this assumption is broken in some
places.

Since scripts/spdxcheck.py parses only lines that exactly match the
correct tag, it cannot (should not) detect this kind of error. If the
correct tag is missing, scripts/checkpatch.pl warns like this:

  WARNING: Missing or malformed SPDX-License-Identifier tag in line *

So, people should notice it before the patch submission, but in reality
broken tags sometimes slip in. The checkpatch warning is not useful for
checking the committed files globally since a large number of files
still have no SPDX tag. Also, I am not sure about the legal effect when
the SPDX tag is broken. Anyway, these typos are absolutely worth fixing.
It is pretty easy to find suspicious lines by grep.

  $ git grep --not -e SPDX-License-Identifier --and -e SPDX- -- \
    :^LICENSES :^scripts/spdxcheck.py :^*/license-rules.rst
  arch/arm/kernel/bugs.c:// SPDX-Identifier: GPL-2.0
  drivers/phy/st/phy-stm32-usbphyc.c:// SPDX-Licence-Identifier: GPL-2.0
  drivers/pinctrl/sh-pfc/pfc-r8a77980.c:// SPDX-Lincense-Identifier: GPL 2.0
  lib/test_stackinit.c:// SPDX-Licenses: GPLv2
  sound/soc/codecs/max9759.c:// SPDX-Licence-Identifier: GPL-2.0

Signed-off-by: Masahiro Yamada
---
 arch/arm/kernel/bugs.c                | 2 +-
 drivers/phy/st/phy-stm32-usbphyc.c    | 2 +-
 drivers/pinctrl/sh-pfc/pfc-r8a77980.c | 2 +-
 lib/test_stackinit.c                  | 2 +-
 sound/soc/codecs/max9759.c            | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kernel/bugs.c b/arch/arm/kernel/bugs.c
index d41d3598e5e5..14c8dbbb7d2d 100644
--- a/arch/arm/kernel/bugs.c
+++ b/arch/arm/kernel/bugs.c
@@ -1,4 +1,4 @@
-// SPDX-Identifier: GPL-2.0
+// SPDX-License-Identifier: GPL-2.0
 #include
 #include
 #include
diff --git a/drivers/phy/st/phy-stm32-usbphyc.c b/drivers/phy/st/phy-stm32-usbphyc.c
index 1255cd1d9a60..56bdea4b0bd9 100644
--- a/drivers/phy/st/phy-stm32-usbphyc.c
+++ b/drivers/phy/st/phy-stm32-usbphyc.c
@@ -1,4 +1,4 @@
-// SPDX-Licence-Identifier: GPL-2.0
+// SPDX-License-Identifier: GPL-2.0
 /*
  * STMicroelectronics STM32 USB PHY Controller driver
  *
diff --git a/drivers/pinctrl/sh-pfc/pfc-r8a77980.c b/drivers/pinctrl/sh-pfc/pfc-r8a77980.c
index 473da65890a7..9ed4ead2dafb 100644
--- a/drivers/pinctrl/sh-pfc/pfc-r8a77980.c
+++ b/drivers/pinctrl/sh-pfc/pfc-r8a77980.c
@@ -1,4 +1,4 @@
-// SPDX-Lincense-Identifier: GPL 2.0
+// SPDX-License-Identifier: GPL-2.0
 /*
  * R8A77980 processor support - PFC hardware block.
  *
diff --git a/lib/test_stackinit.c b/lib/test_stackinit.c
index 13115b6f2b88..e97dc54b4fdf 100644
--- a/lib/test_stackinit.c
+++ b/lib/test_stackinit.c
@@ -1,4 +1,4 @@
-// SPDX-Licenses: GPLv2
+// SPDX-License-Identifier: GPL-2.0
 /*
  * Test cases for compiler-based stack variable zeroing via future
  * compiler flags or CONFIG_GCC_PLUGIN_STRUCTLEAK*.
diff --git a/sound/soc/codecs/max9759.c b/sound/soc/codecs/max9759.c
index ecfb4a80424b..00e9d4fd1651 100644
--- a/sound/soc/codecs/max9759.c
+++ b/sound/soc/codecs/max9759.c
@@ -1,4 +1,4 @@
-// SPDX-Licence-Identifier: GPL-2.0
+// SPDX-License-Identifier: GPL-2.0
 /*
  * MAX9759 Amplifier Driver
  *
--
2.17.1
Re: [PATCH bpf v2] bpf: preallocate a perf_sample_data per event fd
On Fri, May 31, 2019 at 6:28 PM Song Liu wrote:
> >
> > On May 31, 2019, at 3:37 PM, Matt Mullins wrote:
> >
> > It is possible that a BPF program can be called while another BPF
> > program is executing bpf_perf_event_output. This has been observed with
> > I/O completion occurring as a result of an interrupt:
> >
> > bpf_prog_247fd1341cddaea4_trace_req_end+0x8d7/0x1000
> > ? trace_call_bpf+0x82/0x100
> > ? sch_direct_xmit+0xe2/0x230
> > ? blk_mq_end_request+0x1/0x100
> > ? blk_mq_end_request+0x5/0x100
> > ? kprobe_perf_func+0x19b/0x240
> > ? __qdisc_run+0x86/0x520
> > ? blk_mq_end_request+0x1/0x100
> > ? blk_mq_end_request+0x5/0x100
> > ? kprobe_ftrace_handler+0x90/0xf0
> > ? ftrace_ops_assist_func+0x6e/0xe0
> > ? ip6_input_finish+0xbf/0x460
> > ? 0xa01e80bf
> > ? nbd_dbg_flags_show+0xc0/0xc0 [nbd]
> > ? blkdev_issue_zeroout+0x200/0x200
> > ? blk_mq_end_request+0x1/0x100
> > ? blk_mq_end_request+0x5/0x100
> > ? flush_smp_call_function_queue+0x6c/0xe0
> > ? smp_call_function_single_interrupt+0x32/0xc0
> > ? call_function_single_interrupt+0xf/0x20
> > ? call_function_single_interrupt+0xa/0x20
> > ? swiotlb_map_page+0x140/0x140
> > ? refcount_sub_and_test+0x1a/0x50
> > ? tcp_wfree+0x20/0xf0
> > ? skb_release_head_state+0x62/0xc0
> > ? skb_release_all+0xe/0x30
> > ? napi_consume_skb+0xb5/0x100
> > ? mlx5e_poll_tx_cq+0x1df/0x4e0
> > ? mlx5e_poll_tx_cq+0x38c/0x4e0
> > ? mlx5e_napi_poll+0x58/0xc30
> > ? mlx5e_napi_poll+0x232/0xc30
> > ? net_rx_action+0x128/0x340
> > ? __do_softirq+0xd4/0x2ad
> > ? irq_exit+0xa5/0xb0
> > ? do_IRQ+0x7d/0xc0
> > ? common_interrupt+0xf/0xf
> >
> > ? __rb_free_aux+0xf0/0xf0
> > ? perf_output_sample+0x28/0x7b0
> > ? perf_prepare_sample+0x54/0x4a0
> > ? perf_event_output+0x43/0x60
> > ? bpf_perf_event_output_raw_tp+0x15f/0x180
> > ? blk_mq_start_request+0x1/0x120
> > ? bpf_prog_411a64a706fc6044_should_trace+0xad4/0x1000
> > ? bpf_trace_run3+0x2c/0x80
> > ? nbd_send_cmd+0x4c2/0x690 [nbd]
> >
> > This also cannot be alleviated by further splitting the per-cpu
> > perf_sample_data structs (as in commit 283ca526a9bd ("bpf: fix
> > corruption on concurrent perf_event_output calls")), as a raw_tp could
> > be attached to the block:block_rq_complete tracepoint and execute during
> > another raw_tp. Instead, keep a pre-allocated perf_sample_data
> > structure per perf_event_array element and fail a bpf_perf_event_output
> > if that element is concurrently being used.
> >
> > Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data")
> > Signed-off-by: Matt Mullins
>
> This looks great. Thanks for the fix.
>
> Acked-by: Song Liu
>
> > ---
> > v1->v2:
> >  keep a pointer to the struct perf_sample_data rather than directly
> >  embedding it in the structure, avoiding the circular include and
> >  removing the need for in_use. Suggested by Song.
> >
> > include/linux/bpf.h      |  1 +
> > kernel/bpf/arraymap.c    |  3 ++-
> > kernel/trace/bpf_trace.c | 29 -
> > 3 files changed, 19 insertions(+), 14 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 4fb3aa2dc975..47fd85cfbbaf 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -472,6 +472,7 @@ struct bpf_event_entry {
> > 	struct file *perf_file;
> > 	struct file *map_file;
> > 	struct rcu_head rcu;
> > +	struct perf_sample_data *sd;
> > };
> >
> > bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
> >
> > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > index 584636c9e2eb..c7f5d593e04f 100644
> > --- a/kernel/bpf/arraymap.c
> > +++ b/kernel/bpf/arraymap.c
> > @@ -654,11 +654,12 @@ static struct bpf_event_entry *bpf_event_entry_gen(struct file *perf_file,
> > {
> > 	struct bpf_event_entry *ee;
> >
> > -	ee = kzalloc(sizeof(*ee), GFP_ATOMIC);
> > +	ee = kzalloc(sizeof(*ee) + sizeof(struct perf_sample_data), GFP_ATOMIC);
> > 	if (ee) {
> > 		ee->event = perf_file->private_data;
> > 		ee->perf_file = perf_file;
> > 		ee->map_file = map_file;
> > +		ee->sd = (void *)ee + sizeof(*ee);

This bit looks quite weird, but I don't have better ideas
to avoid circular .h pain.
Applied to bpf tree. Thanks
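The "weird bit" in the patch — one allocation that holds both the entry and its sample-data buffer, with `ee->sd` pointed at the trailing bytes — can be sketched in standalone C. The struct names below are simplified stand-ins, not the kernel types; the kernel version uses GCC's `void *` arithmetic, spelled here with a `char *` cast for strict-C portability.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for bpf_event_entry / perf_sample_data. */
struct sample_data {
	uint64_t period;
};

struct event_entry {
	int fd;
	struct sample_data *sd;   /* points into the same allocation */
};

/* Userspace analogue of:
 *   ee = kzalloc(sizeof(*ee) + sizeof(struct perf_sample_data), ...);
 *   ee->sd = (void *)ee + sizeof(*ee);
 * One free() releases both objects; calloc()'s max alignment plus the
 * pointer-sized struct keeps the trailing region aligned for uint64_t. */
static struct event_entry *entry_alloc(void)
{
	struct event_entry *ee;

	ee = calloc(1, sizeof(*ee) + sizeof(struct sample_data));
	if (ee)
		ee->sd = (struct sample_data *)((char *)ee + sizeof(*ee));
	return ee;
}
```

The design win is the same as in the patch: no second allocation to track or free, and no need to pull the full `perf_sample_data` definition into the header — only a pointer appears in the struct.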
Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()
On Sat, Jun 01, 2019 at 03:34:49AM +0100, Al Viro wrote:
> On Sat, Jun 01, 2019 at 03:25:27AM +0100, Al Viro wrote:
> > On Sat, Jun 01, 2019 at 10:15:26AM +0800, Gen Zhang wrote:
> > > In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> > > returns NULL when fails. So 'arg' should be checked. And 'mnt_opts'
> > > should be freed when error.
> >
> > What's the latter one for? On failure we'll get to put_fs_context()
> > pretty soon, so
> > 	security_free_mnt_opts(&fc->security);
> > will be called just fine. Leaving it allocated on failure is fine...
>
> Actually, right now in mainline it is not (btrfs_mount_root() has
> an odd call of security_sb_eat_lsm_opts()); eventually we will be
> down to just the callers in ->parse_monolithic() instances, at which
> point the above will become correct. At the moment it is not, so
> consider the objection withdrawn (and I really need to get some sleep,
> seeing how long did it take me to recall the context... ;-/)

Thanks for your comments. And have a good dream.

Thanks
Gen
Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()
On Sat, Jun 01, 2019 at 03:25:27AM +0100, Al Viro wrote:
> On Sat, Jun 01, 2019 at 10:15:26AM +0800, Gen Zhang wrote:
> > In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> > returns NULL when fails. So 'arg' should be checked. And 'mnt_opts'
> > should be freed when error.
>
> What's the latter one for? On failure we'll get to put_fs_context()
> pretty soon, so
> 	security_free_mnt_opts(&fc->security);
> will be called just fine. Leaving it allocated on failure is fine...

Paul Moore wrote:
> It seems like we should also check for, and potentially free *mnt_opts
> as the selinux_add_opt() error handling does just below this change,
> yes? If that is the case we might want to move that error handling
> code to the bottom of the function and jump there on error.

I am not familiar with this part. So could you please show the
function call sequence?

Thanks
Gen
Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()
On Sat, Jun 01, 2019 at 03:25:27AM +0100, Al Viro wrote:
> On Sat, Jun 01, 2019 at 10:15:26AM +0800, Gen Zhang wrote:
> > In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> > returns NULL when fails. So 'arg' should be checked. And 'mnt_opts'
> > should be freed when error.
>
> What's the latter one for? On failure we'll get to put_fs_context()
> pretty soon, so
> 	security_free_mnt_opts(&fc->security);
> will be called just fine. Leaving it allocated on failure is fine...

Actually, right now in mainline it is not (btrfs_mount_root() has
an odd call of security_sb_eat_lsm_opts()); eventually we will be
down to just the callers in ->parse_monolithic() instances, at which
point the above will become correct. At the moment it is not, so
consider the objection withdrawn (and I really need to get some sleep,
seeing how long did it take me to recall the context... ;-/)
Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()
On Sat, Jun 01, 2019 at 10:15:26AM +0800, Gen Zhang wrote:
> In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> returns NULL when fails. So 'arg' should be checked. And 'mnt_opts'
> should be freed when error.

What's the latter one for? On failure we'll get to put_fs_context()
pretty soon, so
	security_free_mnt_opts(&fc->security);
will be called just fine. Leaving it allocated on failure is fine...
[PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()
In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
returns NULL when fails. So 'arg' should be checked. And 'mnt_opts'
should be freed when error.

Signed-off-by: Gen Zhang
Reviewed-by: Ondrej Mosnacek
Fixes: 99dbbb593fe6 ("selinux: rewrite selinux_sb_eat_lsm_opts()")
---
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 3ec702c..f329fc0 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2616,6 +2616,7 @@ static int selinux_sb_eat_lsm_opts(char *options, void **mnt_opts)
 	char *from = options;
 	char *to = options;
 	bool first = true;
+	int ret;

 	while (1) {
 		int len = opt_len(from);
@@ -2635,15 +2636,16 @@ static int selinux_sb_eat_lsm_opts(char *options, void **mnt_opts)
 					*q++ = c;
 			}
 			arg = kmemdup_nul(arg, q - arg, GFP_KERNEL);
+			if (!arg) {
+				ret = -ENOMEM;
+				goto free_opt;
+			}
 		}
 		rc = selinux_add_opt(token, arg, mnt_opts);
 		if (unlikely(rc)) {
+			ret = rc;
 			kfree(arg);
-			if (*mnt_opts) {
-				selinux_free_mnt_opts(*mnt_opts);
-				*mnt_opts = NULL;
-			}
-			return rc;
+			goto free_opt;
 		}
 	} else {
 		if (!first) {	// copy with preceding comma
@@ -2661,6 +2663,12 @@ static int selinux_sb_eat_lsm_opts(char *options, void **mnt_opts)
 	}
 	*to = '\0';
 	return 0;
+free_opt:
+	if (*mnt_opts) {
+		selinux_free_mnt_opts(*mnt_opts);
+		*mnt_opts = NULL;
+	}
+	return ret;
 }

 static int selinux_sb_remount(struct super_block *sb, void *mnt_opts)
Re: [PATCH] firmware_loader: fix build without sysctl
Hi Matteo,

On Fri, 31 May 2019 11:12:39 +0200 Matteo Croce wrote:
>
> please correct the Fixes tag if possible.
> It seems that the hash of the offending commit now is d91bff3011cf

Unfortunately, these hashes will keep changing for things in Andrew's
patch queue until they are sent to Linus. In the end Andrew will
probably just squash this fix into the original patch.

Cheers,
Stephen Rothwell
Re: [PATCH v2 0/5] Introduce OPP bandwidth bindings
I'll have to Nack this series because it's making a couple of wrong
assumptions about bandwidth voting.

Firstly, it's mixing up OPP to bandwidth mapping (Eg: CPU freq to
CPU<->DDR bandwidth mapping) with the bandwidth levels that are actually
supported by an interconnect path (Eg: CPU<->DDR bandwidth levels). For
example, CPU0 might decide to vote for a max of 10 GB/s because it's a
little CPU and never needs anything higher than 10 GB/s even at CPU0's
max frequency. But that has no bearing on the bandwidth levels available
between CPU<->DDR. There needs to be a separate BW OPP table describing
the bandwidth levels available for the CPU<->DDR path and then a
separate mapping between CPU OPP and CPU<->DDR BW OPP. That way, the
mapping decision (policy or voltage based config decision) doesn't
affect the description of what the hardware really is capable of.

Put another way, if someone comes around and decides CPU0's max freq
should ask for 15 GB/s because some shared voltage rail would already be
pushed to a voltage sufficient to support 15 GB/s, then it shouldn't
change the HW description of what bandwidth levels are available between
CPU<->DDR. If the CPU<->DDR path supports 20 GB/s, it always does,
independent of the CPU OPP table mapping. By splitting out the available
bandwidth levels of the CPU<->DDR path into a separate BW OPP table, we
avoid these kinds of issues.

Also, one could easily imagine using a bandwidth counter or some other
means of BW measurement hardware to vote for bandwidth between CPU<->DDR
and CPU<->L3. That entity should be able to know/learn all the available
bandwidth levels in the CPU<->DDR path without forcing bandwidth levels
to be listed in the CPU OPP table. And if it's measuring bandwidth at a
point common for all CPUs, what CPU OPP table is it even supposed to
look at to learn all the available bandwidth levels? It just doesn't
make sense.
It's also easy to envision having multiple policies or devfreq governors
voting for an interconnect path. The mapping you are talking about in
this series is just an input for one of them (the cpufreq-map governor I
sent out a while ago).

Secondly, when it comes to bandwidth OPP tables, the peak bandwidth
should be the key/first element (similar to how frequency is now). Long
explanation follows.

All the sensible frequency combinations of all the hardware
interconnects between the master and slave port (Eg: GPU -> MMNOC ->
BIMC -> DDR) determine the peak bandwidth levels available in that
interconnect path. If multiple devices (GPU, CPU, etc) vote for
different peak bandwidths for an interconnect (say BIMC), the
interconnect provider picks the max peak bandwidth amongst all the
requests and then picks the lowest interconnect frequency that can
support the max peak bandwidth. So the devices (GPU, CPU, etc) actually
have some control over which interconnect frequencies are picked by
asking for a specific peak bandwidth -- so there's actually a notion of
useful levels.

Average bandwidth is an additive property -- so if CPU and GPU ask for
5 GB/s and 3 GB/s respectively for an interconnect, the interconnect
provider adds them up and configures the interconnect for 8 GB/s. So if
GPU asks for 5 GB/s average bandwidth, it has no idea what frequency the
interconnect will actually get configured to. So average bandwidth
really doesn't provide a sense of levels to pick from for a given
interconnect path. So peak bandwidth is a much better pick than average
bandwidth as the key to the bandwidth OPP table.

So what I think we need is:
* Bandwidth OPP support in the kernel
* Bandwidth OPP DT binding to describe the bandwidth levels available
  for different interconnect paths.
* A new "interconnect-opp" property that can point to different BW OPP
  tables for each of the interconnect paths listed under the
  interconnects property.
Then for mapping from device OPP to interconnect path bandwidth OPPs,
you just use the existing required-opps binding to link an entry in the
GPU OPP table to an entry in the GPU<->DDR bandwidth OPP table. That way
the hardware is actually described correctly and the mapping is kept
separate.

So, in the end, it'll look something like this in DT:

gpu_cache_opp_table: gpu_cache_opp_table {
	compatible = "operating-points-v2";

	gpu_cache_3000: opp-3000 {
		opp-peak-mbps = <3000>;
		avg-mbps = <1000>;
	};
	gpu_cache_6000: opp-6000 {
		opp-peak-mbps = <6000>;
		avg-mbps = <2000>;
	};
	gpu_cache_9000: opp-9000 {
		opp-peak-mbps = <9000>;
		avg-mbps = <9000>;
	};
};

gpu_ddr_opp_table: gpu_ddr_opp_table {
	compatible = "operating-points-v2";

	gpu_ddr_1525: opp-1525 {
		opp-peak-mbps = <1525>;
		avg-mbps = <452>;
	};
	gpu_ddr_3051: opp-3051 {
		opp-peak-mbps = <3051>;
		avg-mbps = <915>;
	};
	gpu_ddr_7500:
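The aggregation rule described above — averages add, peaks take the maximum, and the provider then picks the lowest level that covers the aggregate peak — can be sketched in plain C. This is an illustrative model of the semantics in the mail, not the interconnect framework API; all names are made up.

```c
/* One consumer's request on an interconnect path, in MB/s. */
struct bw_req {
	unsigned avg_mbps;
	unsigned peak_mbps;
};

/* Sum the averages, take the max of the peaks -- the provider-side
 * aggregation described above. */
static struct bw_req aggregate(const struct bw_req *reqs, int n)
{
	struct bw_req agg = { 0, 0 };

	for (int i = 0; i < n; i++) {
		agg.avg_mbps += reqs[i].avg_mbps;
		if (reqs[i].peak_mbps > agg.peak_mbps)
			agg.peak_mbps = reqs[i].peak_mbps;
	}
	return agg;
}

/* Pick the lowest supported level (ascending order) that satisfies the
 * aggregate peak; clamp to the highest level if nothing is big enough. */
static unsigned pick_level(const unsigned *levels, int n, unsigned peak)
{
	for (int i = 0; i < n; i++)
		if (levels[i] >= peak)
			return levels[i];
	return levels[n - 1];
}
```

This also illustrates why peak is the better OPP key: `pick_level()` lands on a discrete, predictable level from the peak request, while the summed average depends on what every other consumer happens to be asking for.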
mmotm 2019-05-31-19-09 uploaded
The mm-of-the-moment snapshot 2019-05-31-19-09 has been uploaded to

  http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue. Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release
(5.x or 5.x-rcY). The series file is in broken-out.tar.gz and is
duplicated in http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss. Both contain the string -mm-dd-hh-mm-ss, followed
by the base kernel version against which this patch series is to be
applied.

This tree is partially included in linux-next. To see which patches are
included in linux-next, consult the `series' file. Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A full copy of the full kernel tree with the linux-next and mmotm
patches already applied is available through git within an hour of the
mmotm release. Individual mmotm releases are tagged. The master branch
always points to the latest release, so it's constantly rebasing.

  http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/

The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree. It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

  http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/

and use of this tree is similar to
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above.
This mmotm tree contains the following patches against 5.2-rc2:
(patches marked "*" will be included in linux-next)

  origin.patch
  drivers-crypto-ux500-cryp-makefile-fix.patch
* mm-fix-documentation-vm-hmmrst-sphinx-warnings.patch
* lib-sortc-fix-kernel-doc-notation-warnings.patch
* mm-vmallocc-fix-typo-in-comment.patch
* mm-slab-remove-obsoleted-config_debug_slab_leak.patch
* arch-arm-boot-compressed-decompressc-fix-build-error-due-to-lz4-changes.patch
* kernel-fork-make-max_threads-symbol-static.patch
* prctl_set_mm-refactor-checks-from-validate_prctl_map.patch
* prctl_set_mm-refactor-checks-from-validate_prctl_map-checkpatch-fixes.patch
* prctl_set_mm-downgrade-mmap_sem-to-read-lock.patch
* prctl_set_mm-downgrade-mmap_sem-to-read-lock-checkpatch-fixes.patch
* mm-consider-subtrees-in-memoryevents.patch
* memcg-make-it-work-on-sparse-non-0-node-systems.patch
* ocfs2-fix-error-path-kobject-memory-leak.patch
* mm-gup-continue-vm_fault_retry-processing-event-for-pre-faults.patch
* scripts-gdb-fix-invocation-when-config_common_clk-is-not-set.patch
* z3fold-fix-sheduling-while-atomic.patch
* kasan-initialize-tag-to-0xff-in-__kasan_kmalloc.patch
* spdxcheckpy-fix-directory-structures-v3.patch
* iommu-intel-fix-variable-iommu-set-but-not-used.patch
* signal-trace_signal_deliver-when-signal_group_exit.patch
* generic-radix-trees-fix-kerneldoc-comment.patch
* mm-compaction-make-sure-we-isolate-a-valid-pfn.patch
* convert-struct-pid-count-to-refcount_t.patch
* mm-dev_pfn-exclude-memory_device_private-while-computing-virtual-address.patch
* fs-proc-allow-reporting-eip-esp-for-all-coredumping-threads.patch
* mm-mempolicy-fix-an-incorrect-rebind-node-in-mpol_rebind_nodemask.patch
* binfmt_flat-make-load_flat_shared_library-work.patch
* mm-fix-trying-to-reclaim-unevicable-lru-page.patch
* mm-memcontrol-dont-batch-updates-of-local-vm-stats-and-events.patch
* list_lru-fix-memory-leak-in-__memcg_init_list_lru_node.patch
* scripts-decode_stacktracesh-prefix-addr2line-with-cross_compile.patch
* mm-mlockall-error-for-flag-mcl_onfault.patch
* mm-fix-recent_rotated-history.patch
* fs-ocfs2-fix-race-in-ocfs2_dentry_attach_lock.patch
* mm-mmu_gather-remove-__tlb_reset_range-for-force-flush.patch
* mm-mmu_gather-remove-__tlb_reset_range-for-force-flush-checkpatch-fixes.patch
* scripts-decode_stacktrace-match-basepath-using-shell-prefix-operator-not-regex.patch
* scripts-decode_stacktrace-look-for-modules-with-kodebug-extension.patch
* scripts-decode_stacktrace-look-for-modules-with-kodebug-extension-v2.patch
* scripts-spellingtxt-drop-sepc-from-the-misspelling-list.patch
* scripts-spellingtxt-drop-sepc-from-the-misspelling-list-fix.patch
* scripts-spellingtxt-add-spelling-fix-for-prohibited.patch
* scripts-decode_stacktrace-accept-dash-underscore-in-modules.patch
* sh-configs-remove-config_logfs-from-defconfig.patch
* debugobjects-move-printk-out-of-db-lock-critical-sections.patch
* ocfs2-add-last-unlock-times-in-locking_state.patch
* ocfs2-add-locking-filter-debugfs-file.patch
* fs-ocfs-fix-spelling-mistake-hearbeating-heartbeat.patch
* ocfs2-clear-zero-in-unaligned-direct-io.patch
* ocfs2-clear-zero-in-unaligned-direct-io-checkpatch-fixes.patch
* ocfs2-wait-for-recovering-done-after-direct-unlock-request.patch
* ocfs2-checkpoint-appending-truncate-log-transaction-before-flushing.patch
*
Re: [RFC][PATCH] Makefile: Fix checkstack.pl arm64 wrong or unknown architecture
On Sat, Jun 1, 2019 at 2:45 AM George G. Davis wrote:
> > Following this pattern, does this work for you?
> >
> > diff --git a/scripts/checkstack.pl b/scripts/checkstack.pl
> > index 122aef5e4e14..371bd17a4983 100755
> > --- a/scripts/checkstack.pl
> > +++ b/scripts/checkstack.pl
> > @@ -46,7 +46,7 @@ my (@stack, $re, $dre, $x, $xs, $funcre);
> >  $x	= "[0-9a-f]";	# hex character
> >  $xs	= "[0-9a-f ]";	# hex character or space
> >  $funcre = qr/^$x* <(.*)>:$/;
> > -if ($arch eq 'aarch64') {
> > +if ($arch =~ '^(aarch|arm)64$') {
>
> Yes, that works, thanks!
>
> Will you submit a fix or would you like me to resubmit with the above
> suggested fix?

Please send v2. Thanks.

--
Best Regards
Masahiro Yamada
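The suggested pattern can be sanity-checked outside the script. This is a hypothetical shell check, not part of the patch; `grep -E` alternation behaves the same as the Perl pattern here.

```shell
# Both spellings of the 64-bit ARM arch name should match the new pattern.
for arch in aarch64 arm64; do
	echo "$arch" | grep -Eq '^(aarch|arm)64$' && echo "$arch matches"
done
# 32-bit "arm" must not match, thanks to the anchored "64" suffix.
echo arm | grep -Eq '^(aarch|arm)64$' || echo "arm does not match"
```

Expected output: `aarch64 matches`, `arm64 matches`, `arm does not match`.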
[5.2-rc REGRESSION] Random gcc crash for 'make -j12' when low on memory
Hi,

When compiling the kernel on v5.2-rc (both rc1 and rc2) with
"make -j12", gcc will randomly crash with a segfault, while on v5.1-rc7
everything is OK.

The crash only happens when the VM has only 1G of RAM; when given 4G of
RAM it no longer crashes. However, according to dmesg, no OOM is
triggered. Thus this looks like a regression.

The environment is:

  VM hypervisor: KVM
  vCPU: 8
  vRAM: 1G (crash) 4G (OK)
  Distro: Archlinux
  Tried kernel: Upstream v5.1-rc7 (good), v5.2-rc1 (fail), v5.2-rc2 (fail)
  Host CPU: Ryzen 1700 (no gcc crash on host)

Has something related to OOM handling changed?

Thanks,
Qu
Re: [PATCH v2] hooks: fix a missing-check bug in selinux_sb_eat_lsm_opts()
On Fri, May 31, 2019 at 11:45:28AM -0400, Paul Moore wrote:
> On Thu, May 30, 2019 at 9:34 PM Gen Zhang wrote:
> >
> > In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> > returns NULL when fails. So 'arg' should be checked.
> >
> > Signed-off-by: Gen Zhang
> > Reviewed-by: Ondrej Mosnacek
> > Fixes: 99dbbb593fe6 ("selinux: rewrite selinux_sb_eat_lsm_opts()")
>
> One quick note about the subject line, instead of using "hooks:" you
> should use "selinux:" since this is specific to SELinux. If the patch
> did apply to the LSM framework as a whole, I would suggest using
> "lsm:" instead of "hooks:" as "hooks" is too ambiguous of a prefix.
>
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 3ec702c..5a9e959 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -2635,6 +2635,8 @@ static int selinux_sb_eat_lsm_opts(char *options, void **mnt_opts)
> > 				*q++ = c;
> > 		}
> > 		arg = kmemdup_nul(arg, q - arg, GFP_KERNEL);
> > +		if (!arg)
> > +			return -ENOMEM;
>
> It seems like we should also check for, and potentially free *mnt_opts
> as the selinux_add_opt() error handling does just below this change,
> yes? If that is the case we might want to move that error handling
> code to the bottom of the function and jump there on error.
>
> > 	}
> > 	rc = selinux_add_opt(token, arg, mnt_opts);
> > 	if (unlikely(rc)) {
>
> --
> paul moore
> www.paul-moore.com

Yes, I agree with that. And I will work on this to resubmit.

Thanks
Gen
Re: [PATCH v2] hooks: fix a missing-check bug in selinux_add_mnt_opt()
On Fri, May 31, 2019 at 11:55:23AM -0400, Paul Moore wrote: > On Thu, May 30, 2019 at 4:55 AM Gen Zhang wrote: > > > > In selinux_add_mnt_opt(), 'val' is allcoted by kmemdup_nul(). It returns > > NULL when fails. So 'val' should be checked. > > > > Signed-off-by: Gen Zhang > > Fixes: 757cbe597fe8 ("LSM: new method: ->sb_add_mnt_opt()") > > Previous comments regarding "selinux:" instead of "hooks:" apply here as well. > Thanks for your comments, Paul. I will make some changes. > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > > index 3ec702c..4797c63 100644 > > --- a/security/selinux/hooks.c > > +++ b/security/selinux/hooks.c > > @@ -1052,8 +1052,11 @@ static int selinux_add_mnt_opt(const char *option, > > const char *val, int len, > > if (token == Opt_error) > > return -EINVAL; > > > > - if (token != Opt_seclabel) > > - val = kmemdup_nul(val, len, GFP_KERNEL); > > + if (token != Opt_seclabel) { > > + val = kmemdup_nul(val, len, GFP_KERNEL); > > + if (!val) > > + return -ENOMEM; > > It looks like this code is only ever called by NFS, which will > eventually clean up mnt_opts via security_free_mnt_opts(), but since > the selinux_add_opt() error handler below cleans up mnt_opts it might > be safer to do the same here in case this function is called multiple > times to add multiple options. > > > + } > > rc = selinux_add_opt(token, val, mnt_opts); > > if (unlikely(rc)) { > > kfree(val); > > -- > paul moore > www.paul-moore.com Thanks Gen
Re: [PATCH bpf v2] bpf: preallocate a perf_sample_data per event fd
> On May 31, 2019, at 3:37 PM, Matt Mullins wrote: > > It is possible that a BPF program can be called while another BPF > program is executing bpf_perf_event_output. This has been observed with > I/O completion occurring as a result of an interrupt: > > bpf_prog_247fd1341cddaea4_trace_req_end+0x8d7/0x1000 > ? trace_call_bpf+0x82/0x100 > ? sch_direct_xmit+0xe2/0x230 > ? blk_mq_end_request+0x1/0x100 > ? blk_mq_end_request+0x5/0x100 > ? kprobe_perf_func+0x19b/0x240 > ? __qdisc_run+0x86/0x520 > ? blk_mq_end_request+0x1/0x100 > ? blk_mq_end_request+0x5/0x100 > ? kprobe_ftrace_handler+0x90/0xf0 > ? ftrace_ops_assist_func+0x6e/0xe0 > ? ip6_input_finish+0xbf/0x460 > ? 0xa01e80bf > ? nbd_dbg_flags_show+0xc0/0xc0 [nbd] > ? blkdev_issue_zeroout+0x200/0x200 > ? blk_mq_end_request+0x1/0x100 > ? blk_mq_end_request+0x5/0x100 > ? flush_smp_call_function_queue+0x6c/0xe0 > ? smp_call_function_single_interrupt+0x32/0xc0 > ? call_function_single_interrupt+0xf/0x20 > ? call_function_single_interrupt+0xa/0x20 > ? swiotlb_map_page+0x140/0x140 > ? refcount_sub_and_test+0x1a/0x50 > ? tcp_wfree+0x20/0xf0 > ? skb_release_head_state+0x62/0xc0 > ? skb_release_all+0xe/0x30 > ? napi_consume_skb+0xb5/0x100 > ? mlx5e_poll_tx_cq+0x1df/0x4e0 > ? mlx5e_poll_tx_cq+0x38c/0x4e0 > ? mlx5e_napi_poll+0x58/0xc30 > ? mlx5e_napi_poll+0x232/0xc30 > ? net_rx_action+0x128/0x340 > ? __do_softirq+0xd4/0x2ad > ? irq_exit+0xa5/0xb0 > ? do_IRQ+0x7d/0xc0 > ? common_interrupt+0xf/0xf > > ? __rb_free_aux+0xf0/0xf0 > ? perf_output_sample+0x28/0x7b0 > ? perf_prepare_sample+0x54/0x4a0 > ? perf_event_output+0x43/0x60 > ? bpf_perf_event_output_raw_tp+0x15f/0x180 > ? blk_mq_start_request+0x1/0x120 > ? bpf_prog_411a64a706fc6044_should_trace+0xad4/0x1000 > ? bpf_trace_run3+0x2c/0x80 > ? 
nbd_send_cmd+0x4c2/0x690 [nbd] > > This also cannot be alleviated by further splitting the per-cpu > perf_sample_data structs (as in commit 283ca526a9bd ("bpf: fix > corruption on concurrent perf_event_output calls")), as a raw_tp could > be attached to the block:block_rq_complete tracepoint and execute during > another raw_tp. Instead, keep a pre-allocated perf_sample_data > structure per perf_event_array element and fail a bpf_perf_event_output > if that element is concurrently being used. > > Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data") > Signed-off-by: Matt Mullins This looks great. Thanks for the fix. Acked-by: Song Liu > --- > v1->v2: > keep a pointer to the struct perf_sample_data rather than directly > embedding it in the structure, avoiding the circular include and > removing the need for in_use. Suggested by Song. > > include/linux/bpf.h | 1 + > kernel/bpf/arraymap.c| 3 ++- > kernel/trace/bpf_trace.c | 29 - > 3 files changed, 19 insertions(+), 14 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 4fb3aa2dc975..47fd85cfbbaf 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -472,6 +472,7 @@ struct bpf_event_entry { > struct file *perf_file; > struct file *map_file; > struct rcu_head rcu; > + struct perf_sample_data *sd; > }; > > bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog > *fp); > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > index 584636c9e2eb..c7f5d593e04f 100644 > --- a/kernel/bpf/arraymap.c > +++ b/kernel/bpf/arraymap.c > @@ -654,11 +654,12 @@ static struct bpf_event_entry > *bpf_event_entry_gen(struct file *perf_file, > { > struct bpf_event_entry *ee; > > - ee = kzalloc(sizeof(*ee), GFP_ATOMIC); > + ee = kzalloc(sizeof(*ee) + sizeof(struct perf_sample_data), GFP_ATOMIC); > if (ee) { > ee->event = perf_file->private_data; > ee->perf_file = perf_file; > ee->map_file = map_file; > + ee->sd = (void *)ee + sizeof(*ee); > } > > 
return ee; > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index f92d6ad5e080..076f8e987355 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -410,17 +410,17 @@ static const struct bpf_func_proto > bpf_perf_event_read_value_proto = { > .arg4_type = ARG_CONST_SIZE, > }; > > -static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd); > - > static __always_inline u64 > __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map, > - u64 flags, struct perf_sample_data *sd) > + u64 flags, struct perf_raw_record *raw) > { > struct bpf_array *array = container_of(map, struct bpf_array, map); > unsigned int cpu = smp_processor_id(); > u64
Re: [PATCH v8 2/4] soc: qcom: Add AOSS QMP driver
Hi, On Fri, May 31, 2019 at 5:09 PM Bjorn Andersson wrote: > > On Fri 31 May 15:24 PDT 2019, Doug Anderson wrote: > > > Hi, > > > > On Thu, May 30, 2019 at 8:01 PM Bjorn Andersson > > wrote: > > > > > > +/** > > > + * qmp_send() - send a message to the AOSS > > > + * @qmp: qmp context > > > + * @data: message to be sent > > > + * @len: length of the message > > > + * > > > + * Transmit @data to AOSS and wait for the AOSS to acknowledge the > > > message. > > > + * @len must be a multiple of 4 and not longer than the mailbox size. > > > Access is > > > + * synchronized by this implementation. > > > + * > > > + * Return: 0 on success, negative errno on failure > > > + */ > > > +static int qmp_send(struct qmp *qmp, const void *data, size_t len) > > > +{ > > > + int ret; > > > + > > > + if (WARN_ON(len + sizeof(u32) > qmp->size)) > > > + return -EINVAL; > > > + > > > + if (WARN_ON(len % sizeof(u32))) > > > + return -EINVAL; > > > + > > > + mutex_lock(>tx_lock); > > > + > > > + /* The message RAM only implements 32-bit accesses */ > > > + __iowrite32_copy(qmp->msgram + qmp->offset + sizeof(u32), > > > +data, len / sizeof(u32)); > > > + writel(len, qmp->msgram + qmp->offset); > > > + qmp_kick(qmp); > > > + > > > + ret = wait_event_interruptible_timeout(qmp->event, > > > + qmp_message_empty(qmp), > > > HZ); > > > + if (!ret) { > > > + dev_err(qmp->dev, "ucore did not ack channel\n"); > > > + ret = -ETIMEDOUT; > > > + > > > + /* Clear message from buffer */ > > > + writel(0, qmp->msgram + qmp->offset); > > > + } else { > > > + ret = 0; > > > + } > > > > Just like Vinod said in in v7, the "ret = 0" is redundant. > > > > If the condition passed to wait_event_interruptible_timeout() evaluates > true the remote side has consumed the message and ret will be 1. We end > up in the else block (i.e. not timeout) and we want the function to > return 0, so we set ret to 0. > > Please let me know if I'm reading this wrong. Ah, it's me that's confused. I missed the "!" on ret. 
Maybe it'd be less confusing if you did: time_left = wait_event_interruptible_timeout(...) if (!time_left) Even though you _can_ use "ret", it's less confusing to use a different variable since (often) ret is an error code. Speaking of which: do you actually handle the case where you get an interrupt? Should the above just be wait_event_timeout()?
Re: [PATCH] ARM: xor-neon: Replace __GNUC__ checks with CONFIG_CC_IS_GCC
On Fri, May 31, 2019 at 11:03:19PM +0200, Arnd Bergmann wrote: > On Fri, May 31, 2019 at 10:06 PM 'Nick Desaulniers' via Clang Built > Linux wrote: > > > > On Fri, May 31, 2019 at 12:21 PM Arnd Bergmann wrote: > > > clang, I would suggest dropping your patch then, and instead adding > > > > I disagree. The minimum version of gcc required to build the kernel > > is 4.6, so the comment about older versions of gcc is irrelevant and > > should be removed. > > Sure, that's ok. It just feels wrong to remove a warning that points > to a real problem that still exists and can be detected at the moment. > > If we think that clang-9 is going to be fixed before its release, > the warning could be changed to test for that version as a minimum, > and point to the bugzilla entry for more details. > > Arnd I just tested the arm64 implementation and it shows the same warnings about cost as arm. However, I see a warning as something that can be resolved by the user. The GCC warning's solution is to just use a newer version of GCC (something fairly easily attainable). This new warning currently has no solution other than don't use clang. It is up to you and Nick but I would say unless we are going to prioritize fixing this, we shouldn't add a warning for it. I'd say it is more appropriate to fix it then add a warning saying upgrade to this version to fix it, like the GCC one (though I don't necessarily hate adding the warning assuming that clang 9 will have it fixed). Cheers, Nathan
[PATCH net v1 1/2] net: ethernet: mediatek: Use hw_feature to judge if HWLRO is supported
From: Sean Wang Use hw_features as the hardware capability flags to check whether hardware LRO is supported. Signed-off-by: Mark Lee Signed-off-by: Sean Wang --- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index f9fbb3ffa3a6..0b88febbaf2a 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -2298,13 +2298,13 @@ static int mtk_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd, switch (cmd->cmd) { case ETHTOOL_GRXRINGS: - if (dev->features & NETIF_F_LRO) { + if (dev->hw_features & NETIF_F_LRO) { cmd->data = MTK_MAX_RX_RING_NUM; ret = 0; } break; case ETHTOOL_GRXCLSRLCNT: - if (dev->features & NETIF_F_LRO) { + if (dev->hw_features & NETIF_F_LRO) { struct mtk_mac *mac = netdev_priv(dev); cmd->rule_cnt = mac->hwlro_ip_cnt; @@ -2312,11 +2312,11 @@ static int mtk_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd, } break; case ETHTOOL_GRXCLSRULE: - if (dev->features & NETIF_F_LRO) + if (dev->hw_features & NETIF_F_LRO) ret = mtk_hwlro_get_fdir_entry(dev, cmd); break; case ETHTOOL_GRXCLSRLALL: - if (dev->features & NETIF_F_LRO) + if (dev->hw_features & NETIF_F_LRO) ret = mtk_hwlro_get_fdir_all(dev, cmd, rule_locs); break; @@ -2333,11 +2333,11 @@ static int mtk_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd) switch (cmd->cmd) { case ETHTOOL_SRXCLSRLINS: - if (dev->features & NETIF_F_LRO) + if (dev->hw_features & NETIF_F_LRO) ret = mtk_hwlro_add_ipaddr(dev, cmd); break; case ETHTOOL_SRXCLSRLDEL: - if (dev->features & NETIF_F_LRO) + if (dev->hw_features & NETIF_F_LRO) ret = mtk_hwlro_del_ipaddr(dev, cmd); break; default: -- 2.17.1
Re: [PATCH net-next 00/12] code optimizations & bugfixes for HNS3 driver
From: David Miller Date: Fri, 31 May 2019 17:15:29 -0700 (PDT) > From: Huazhong Tan > Date: Fri, 31 May 2019 16:54:46 +0800 > >> This patch-set includes code optimizations and bugfixes for the HNS3 >> ethernet controller driver. >> >> [patch 1/12] removes the redundant core reset type >> >> [patch 2/12 - 3/12] fixes two VLAN related issues >> >> [patch 4/12] fixes a TM issue >> >> [patch 5/12 - 12/12] includes some patches related to RAS & MSI-X error > > Series applied. I reverted, you need to actually build test the infiniband side of your driver. drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function ‘hns_roce_v2_msix_interrupt_abn’: drivers/infiniband/hw/hns/hns_roce_hw_v2.c:5032:14: warning: passing argument 2 of ‘ops->set_default_reset_request’ makes pointer from integer without a cast [-Wint-conversion] HNAE3_FUNC_RESET); ^~~~ drivers/infiniband/hw/hns/hns_roce_hw_v2.c:5032:14: note: expected ‘long unsigned int *’ but argument is of type ‘int’ C-c C-cmake[5]: *** Deleting file 'drivers/net/wireless/ath/carl9170/cmd.o'
[PATCH net v1 2/2] net: ethernet: mediatek: Use NET_IP_ALIGN to judge if HW RX_2BYTE_OFFSET is enabled
From: Sean Wang Only enable the HW RX_2BYTE_OFFSET function when NET_IP_ALIGN equals 2. Signed-off-by: Mark Lee Signed-off-by: Sean Wang --- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 0b88febbaf2a..765cd56ebcd2 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -1778,6 +1778,7 @@ static void mtk_poll_controller(struct net_device *dev) static int mtk_start_dma(struct mtk_eth *eth) { + u32 rx_2b_offset = (NET_IP_ALIGN == 2) ? MTK_RX_2B_OFFSET : 0; int err; err = mtk_dma_init(eth); @@ -1794,7 +1795,7 @@ static int mtk_start_dma(struct mtk_eth *eth) MTK_QDMA_GLO_CFG); mtk_w32(eth, - MTK_RX_DMA_EN | MTK_RX_2B_OFFSET | + MTK_RX_DMA_EN | rx_2b_offset | MTK_RX_BT_32DWORDS | MTK_MULTI_EN, MTK_PDMA_GLO_CFG); -- 2.17.1
Re: [PATCH net-next 00/12] code optimizations & bugfixes for HNS3 driver
From: Huazhong Tan Date: Fri, 31 May 2019 16:54:46 +0800 > This patch-set includes code optimizations and bugfixes for the HNS3 > ethernet controller driver. > > [patch 1/12] removes the redundant core reset type > > [patch 2/12 - 3/12] fixes two VLAN related issues > > [patch 4/12] fixes a TM issue > > [patch 5/12 - 12/12] includes some patches related to RAS & MSI-X error Series applied.
Re: [PATCH -mm] mm, swap: Fix bad swap file entry warning
Michal Hocko writes: > On Fri 31-05-19 10:41:02, Huang, Ying wrote: >> From: Huang Ying >> >> Mike reported the following warning messages >> >> get_swap_device: Bad swap file entry 1401 >> >> This is produced by >> >> - total_swapcache_pages() >> - get_swap_device() >> >> Where get_swap_device() is used to check whether the swap device is >> valid and prevent it from being swapoff if so. But get_swap_device() >> may produce warning message as above for some invalid swap devices. >> This is fixed via calling swp_swap_info() before get_swap_device() to >> filter out the swap devices that may cause warning messages. >> >> Fixes: 6a946753dbe6 ("mm/swap_state.c: simplify total_swapcache_pages() with >> get_swap_device()") > > I suspect this is referring to a mmotm patch right? Yes. > There doesn't seem > to be any sha like this in Linus' tree AFAICS. If that is the case then > please note that mmotm patch showing up in linux-next do not have a > stable sha1 and therefore you shouldn't reference them in the commit > message. Instead please refer to the specific mmotm patch file so that > Andrew knows it should be folded in to it. Thanks for reminding! I will be more careful in the future. It seems that Andrew has identified the right patch to be folded into. Best Regards, Huang, Ying
Re: [PATCH bpf v2] bpf, riscv: clear high 32 bits for ALU32 add/sub/neg/lsh/rsh/arsh
On Fri, May 31, 2019 at 1:40 PM Palmer Dabbelt wrote: > > On Thu, 30 May 2019 15:29:22 PDT (-0700), luke.r.n...@gmail.com wrote: > > In BPF, 32-bit ALU operations should zero-extend their results into > > the 64-bit registers. > > > > The current BPF JIT on RISC-V emits incorrect instructions that perform > > sign extension only (e.g., addw, subw) on 32-bit add, sub, lsh, rsh, > > arsh, and neg. This behavior diverges from the interpreter and JITs > > for other architectures. > > > > This patch fixes the bugs by performing zero extension on the destination > > register of 32-bit ALU operations. > > > > Fixes: 2353ecc6f91f ("bpf, riscv: add BPF JIT for RV64G") > > Cc: Xi Wang > > Signed-off-by: Luke Nelson > > Reviewed-by: Palmer Dabbelt > > Thanks! I'm assuming this is going in through a BPF tree and not the RISC-V > tree, but LMK if that's not the case. Applied to bpf tree. Thanks
[PATCH net-next v1 4/6] net: ethernet: mediatek: Integrate hardware path from GMAC to PHY variants
From: Sean Wang Path routing on the various SoCs is all managed in the common function mtk_setup_hw_path, which determines the setup both from the applied devicetree description of the path between the GMAC and the target PHY or switch, and from the capability of the target SoC at runtime. Signed-off-by: Sean Wang --- drivers/net/ethernet/mediatek/Makefile | 3 +- drivers/net/ethernet/mediatek/mtk_eth_path.c | 323 +++ drivers/net/ethernet/mediatek/mtk_eth_soc.c | 70 +--- drivers/net/ethernet/mediatek/mtk_eth_soc.h | 123 ++- 4 files changed, 450 insertions(+), 69 deletions(-) create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_path.c diff --git a/drivers/net/ethernet/mediatek/Makefile b/drivers/net/ethernet/mediatek/Makefile index b8206605154e..212c86f9982f 100644 --- a/drivers/net/ethernet/mediatek/Makefile +++ b/drivers/net/ethernet/mediatek/Makefile @@ -3,4 +3,5 @@ # Makefile for the Mediatek SoCs built-in ethernet macs # -obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth_soc.o mtk_sgmii.o +obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth_soc.o mtk_sgmii.o \ + mtk_eth_path.o diff --git a/drivers/net/ethernet/mediatek/mtk_eth_path.c b/drivers/net/ethernet/mediatek/mtk_eth_path.c new file mode 100644 index ..61f705d945e5 --- /dev/null +++ b/drivers/net/ethernet/mediatek/mtk_eth_path.c @@ -0,0 +1,323 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2018-2019 MediaTek Inc.
+ +/* A library for configuring path from GMAC/GDM to target PHY + * + * Author: Sean Wang + * + */ + +#include +#include + +#include "mtk_eth_soc.h" + +struct mtk_eth_muxc { + int (*set_path)(struct mtk_eth *eth, int path); +}; + +static const char * const mtk_eth_mux_name[] = { + "mux_gdm1_to_gmac1_esw", "mux_gmac2_gmac0_to_gephy", + "mux_u3_gmac2_to_qphy", "mux_gmac1_gmac2_to_sgmii_rgmii", + "mux_gmac12_to_gephy_sgmii", +}; + +static const char * const mtk_eth_path_name[] = { + "gmac1_rgmii", "gmac1_trgmii", "gmac1_sgmii", "gmac2_rgmii", + "gmac2_sgmii", "gmac2_gephy", "gdm1_esw", +}; + +static int set_mux_gdm1_to_gmac1_esw(struct mtk_eth *eth, int path) +{ + bool updated = true; + u32 val, mask, set; + + switch (path) { + case MTK_ETH_PATH_GMAC1_SGMII: + mask = ~(u32)MTK_MUX_TO_ESW; + set = 0; + break; + case MTK_ETH_PATH_GDM1_ESW: + mask = ~(u32)MTK_MUX_TO_ESW; + set = MTK_MUX_TO_ESW; + break; + default: + updated = false; + break; + }; + + if (updated) { + val = mtk_r32(eth, MTK_MAC_MISC); + val = (val & mask) | set; + mtk_w32(eth, val, MTK_MAC_MISC); + } + + dev_dbg(eth->dev, "path %s in %s updated = %d\n", + mtk_eth_path_name[path], __func__, updated); + + return 0; +} + +static int set_mux_gmac2_gmac0_to_gephy(struct mtk_eth *eth, int path) +{ + unsigned int val = 0; + bool updated = true; + + switch (path) { + case MTK_ETH_PATH_GMAC2_GEPHY: + val = ~(u32)GEPHY_MAC_SEL; + break; + default: + updated = false; + break; + } + + if (updated) + regmap_update_bits(eth->infra, INFRA_MISC2, GEPHY_MAC_SEL, val); + + dev_dbg(eth->dev, "path %s in %s updated = %d\n", + mtk_eth_path_name[path], __func__, updated); + + return 0; +} + +static int set_mux_u3_gmac2_to_qphy(struct mtk_eth *eth, int path) +{ + unsigned int val = 0; + bool updated = true; + + switch (path) { + case MTK_ETH_PATH_GMAC2_SGMII: + val = CO_QPHY_SEL; + break; + default: + updated = false; + break; + } + + if (updated) + regmap_update_bits(eth->infra, INFRA_MISC2, CO_QPHY_SEL, val); + + 
dev_dbg(eth->dev, "path %s in %s updated = %d\n", + mtk_eth_path_name[path], __func__, updated); + + return 0; +} + +static int set_mux_gmac1_gmac2_to_sgmii_rgmii(struct mtk_eth *eth, int path) +{ + unsigned int val = 0; + bool updated = true; + + switch (path) { + case MTK_ETH_PATH_GMAC1_SGMII: + val = SYSCFG0_SGMII_GMAC1; + break; + case MTK_ETH_PATH_GMAC2_SGMII: + val = SYSCFG0_SGMII_GMAC2; + break; + case MTK_ETH_PATH_GMAC1_RGMII: + case MTK_ETH_PATH_GMAC2_RGMII: + regmap_read(eth->ethsys, ETHSYS_SYSCFG0, ); + val &= SYSCFG0_SGMII_MASK; + + if ((path == MTK_GMAC1_RGMII && val == SYSCFG0_SGMII_GMAC1) || + (path == MTK_GMAC2_RGMII && val == SYSCFG0_SGMII_GMAC2)) + val = 0; + else + updated = false; + break; +
Re: [PATCH v8 2/4] soc: qcom: Add AOSS QMP driver
On Fri 31 May 15:24 PDT 2019, Doug Anderson wrote: > Hi, > > On Thu, May 30, 2019 at 8:01 PM Bjorn Andersson > wrote: > > > > +/** > > + * qmp_send() - send a message to the AOSS > > + * @qmp: qmp context > > + * @data: message to be sent > > + * @len: length of the message > > + * > > + * Transmit @data to AOSS and wait for the AOSS to acknowledge the message. > > + * @len must be a multiple of 4 and not longer than the mailbox size. > > Access is > > + * synchronized by this implementation. > > + * > > + * Return: 0 on success, negative errno on failure > > + */ > > +static int qmp_send(struct qmp *qmp, const void *data, size_t len) > > +{ > > + int ret; > > + > > + if (WARN_ON(len + sizeof(u32) > qmp->size)) > > + return -EINVAL; > > + > > + if (WARN_ON(len % sizeof(u32))) > > + return -EINVAL; > > + > > + mutex_lock(>tx_lock); > > + > > + /* The message RAM only implements 32-bit accesses */ > > + __iowrite32_copy(qmp->msgram + qmp->offset + sizeof(u32), > > +data, len / sizeof(u32)); > > + writel(len, qmp->msgram + qmp->offset); > > + qmp_kick(qmp); > > + > > + ret = wait_event_interruptible_timeout(qmp->event, > > + qmp_message_empty(qmp), HZ); > > + if (!ret) { > > + dev_err(qmp->dev, "ucore did not ack channel\n"); > > + ret = -ETIMEDOUT; > > + > > + /* Clear message from buffer */ > > + writel(0, qmp->msgram + qmp->offset); > > + } else { > > + ret = 0; > > + } > > Just like Vinod said in in v7, the "ret = 0" is redundant. > If the condition passed to wait_event_interruptible_timeout() evaluates true the remote side has consumed the message and ret will be 1. We end up in the else block (i.e. not timeout) and we want the function to return 0, so we set ret to 0. Please let me know if I'm reading this wrong. 
> > > +static int qmp_qdss_clk_add(struct qmp *qmp) > > +{ > > + struct clk_init_data qdss_init = { > > + .ops = _qdss_clk_ops, > > + .name = "qdss", > > + }; > > As I mentioned in v7, there is no downside in marking qdss_init as > "static const" and it avoids the compiler inserting a memcpy() to get > this data on the stack. Using static const also reduces your stack > usage. > In which case we would just serve it from .ro, makes sense now that I read your comment again. > > > + int ret; > > + > > + qmp->qdss_clk.init = _init; > > + ret = clk_hw_register(qmp->dev, >qdss_clk); > > + if (ret < 0) { > > + dev_err(qmp->dev, "failed to register qdss clock\n"); > > + return ret; > > + } > > + > > + ret = of_clk_add_hw_provider(qmp->dev->of_node, > > of_clk_hw_simple_get, > > +>qdss_clk); > > I still prefer to devm-ify the whole driver, using > devm_add_action_or_reset() to handle things where there is no devm. > ...but I won't insist. > > > Above things are just nits and I won't insist. They also could be > addressed in follow-up patches. Thus: > > Reviewed-by: Douglas Anderson Thanks! Regards, Bjorn
[PATCH net-next v1 1/6] dt-bindings: clock: mediatek: Add an extra required property to sgmiisys
From: Sean Wang Add an extra required property "mediatek,physpeed" to sgmiisys to determine the link speed, matching the capability of the target PHY. Signed-off-by: Sean Wang --- .../devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt index 30cb645c0e54..f5518f26a914 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt @@ -9,6 +9,8 @@ Required Properties: - "mediatek,mt7622-sgmiisys", "syscon" - "mediatek,mt7629-sgmiisys", "syscon" - #clock-cells: Must be 1 +- mediatek,physpeed: Should be one of "auto", "1000" or "2500" to match up +the capability of the target PHY. The SGMIISYS controller uses the common clk binding from Documentation/devicetree/bindings/clock/clock-bindings.txt -- 2.17.1
[PATCH net-next v1 6/6] arm64: dts: mt7622: Enlarge the SGMII register range
From: Sean Wang Enlarge the SGMII register range and use 2.5G forced mode by default. Signed-off-by: Sean Wang --- arch/arm64/boot/dts/mediatek/mt7622.dtsi | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi index 4b1f5ae710eb..d1e13d340e26 100644 --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi @@ -929,7 +929,8 @@ sgmiisys: sgmiisys@1b128000 { compatible = "mediatek,mt7622-sgmiisys", "syscon"; - reg = <0 0x1b128000 0 0x1000>; + reg = <0 0x1b128000 0 0x3000>; #clock-cells = <1>; + mediatek,physpeed = "2500"; }; }; -- 2.17.1
[PATCH net-next v1 0/6] Add MT7629 ethernet support
From: Sean Wang MT7629 includes two sets of SGMIIs used for an external switch or PHY, an embedded switch (ESW) via GDM1, and a GePHY via GMAC2, so several patches in this series make the code base common with the older SoCs. Patches 1, 3 and 6 add extensions for SGMII to have the hardware configured for 1G, 2.5G or AN to fit the capability of the target PHY. Patch 6 serves as an example showing how to use these configurations so the underlying PHY speed matches up with the link speed of the target PHY. Patch 4 automatically configures the hardware path from GMACx to the target PHY, using the devicetree topology description to determine the proper value for the corresponding MUX. Patches 2 and 5 update MT7629 support, including the dt-binding document and its driver. Sean Wang (6): dt-bindings: clock: mediatek: Add an extra required property to sgmiisys dt-bindings: net: mediatek: Add support for MediaTek MT7629 SoC net: ethernet: mediatek: Extend SGMII related functions net: ethernet: mediatek: Integrate hardware path from GMAC to PHY variants net: ethernet: mediatek: Add MT7629 ethernet support arm64: dts: mt7622: Enlarge the SGMII register range .../arm/mediatek/mediatek,sgmiisys.txt| 2 + .../devicetree/bindings/net/mediatek-net.txt | 14 +- arch/arm64/boot/dts/mediatek/mt7622.dtsi | 3 +- drivers/net/ethernet/mediatek/Makefile| 3 +- drivers/net/ethernet/mediatek/mtk_eth_path.c | 323 ++ drivers/net/ethernet/mediatek/mtk_eth_soc.c | 97 +++--- drivers/net/ethernet/mediatek/mtk_eth_soc.h | 177 +- drivers/net/ethernet/mediatek/mtk_sgmii.c | 105 ++ 8 files changed, 647 insertions(+), 77 deletions(-) create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_path.c create mode 100644 drivers/net/ethernet/mediatek/mtk_sgmii.c -- 2.17.1
[PATCH net-next v1 2/6] dt-bindings: net: mediatek: Add support for MediaTek MT7629 SoC
From: Sean Wang Add binding document for the ethernet on MT7629 SoC. Signed-off-by: Sean Wang --- .../devicetree/bindings/net/mediatek-net.txt | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt index 503f2b9194e2..770ff98d4524 100644 --- a/Documentation/devicetree/bindings/net/mediatek-net.txt +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt @@ -11,6 +11,7 @@ Required properties: "mediatek,mt2701-eth": for MT2701 SoC "mediatek,mt7623-eth", "mediatek,mt2701-eth": for MT7623 SoC "mediatek,mt7622-eth": for MT7622 SoC + "mediatek,mt7629-eth": for MT7629 SoC - reg: Address and length of the register set for the device - interrupts: Should contain the three frame engines interrupts in numeric order. These are fe_int0, fe_int1 and fe_int2. @@ -19,14 +20,23 @@ Required properties: "ethif", "esw", "gp2", "gp1" : For MT2701 and MT7623 SoC "ethif", "esw", "gp0", "gp1", "gp2", "sgmii_tx250m", "sgmii_rx250m", "sgmii_cdr_ref", "sgmii_cdr_fb", "sgmii_ck", "eth2pll" : For MT7622 SoC + "ethif", "sgmiitop", "esw", "gp0", "gp1", "gp2", "fe", "sgmii_tx250m", + "sgmii_rx250m", "sgmii_cdr_ref", "sgmii_cdr_fb", "sgmii2_tx250m", + "sgmii2_rx250m", "sgmii2_cdr_ref", "sgmii2_cdr_fb", "sgmii_ck", + "eth2pll" : For MT7629 SoC. - power-domains: phandle to the power domain that the ethernet is part of - resets: Should contain phandles to the ethsys reset signals - reset-names: Should contain the names of reset signal listed in the resets property These are "fe", "gmac" and "ppe" - mediatek,ethsys: phandle to the syscon node that handles the port setup -- mediatek,sgmiisys: phandle to the syscon node that handles the SGMII setup - which is required for those SoCs equipped with SGMII such as MT7622 SoC. +- mediatek,infracfg: phandle to the syscon node that handles the path from + GMAC to PHY variants, which is required for MT7629 SoC. 
+- mediatek,sgmiisys: a list of phandles to the syscon node that handles the + SGMII setup which is required for those SoCs equipped with SGMII such + as MT7622 and MT7629 SoC. And MT7622 have only one set of SGMII shared + by GMAC1 and GMAC2; MT7629 have two independent sets of SGMII directed + to GMAC1 and GMAC2, respectively. - mediatek,pctl: phandle to the syscon node that handles the ports slew rate and driver current: only for MT2701 and MT7623 SoC -- 2.17.1
[PATCH net-next v1 5/6] net: ethernet: mediatek: Add MT7629 ethernet support
From: Sean Wang Add ethernet support to MT7629 SoC Signed-off-by: Sean Wang --- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 14 -- drivers/net/ethernet/mediatek/mtk_eth_soc.h | 19 +++ 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 382173fa4752..362eacd82b92 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -54,8 +54,10 @@ static const struct mtk_ethtool_stats { }; static const char * const mtk_clks_source_name[] = { - "ethif", "esw", "gp0", "gp1", "gp2", "trgpll", "sgmii_tx250m", - "sgmii_rx250m", "sgmii_cdr_ref", "sgmii_cdr_fb", "sgmii_ck", "eth2pll" + "ethif", "sgmiitop", "esw", "gp0", "gp1", "gp2", "fe", "trgpll", + "sgmii_tx250m", "sgmii_rx250m", "sgmii_cdr_ref", "sgmii_cdr_fb", + "sgmii2_tx250m", "sgmii2_rx250m", "sgmii2_cdr_ref", "sgmii2_cdr_fb", + "sgmii_ck", "eth2pll", }; void mtk_w32(struct mtk_eth *eth, u32 val, unsigned reg) @@ -2629,11 +2631,19 @@ static const struct mtk_soc_data mt7623_data = { .required_pctl = true, }; +static const struct mtk_soc_data mt7629_data = { + .ana_rgc3 = 0x128, + .caps = MT7629_CAPS | MTK_HWLRO, + .required_clks = MT7629_CLKS_BITMAP, + .required_pctl = false, +}; + const struct of_device_id of_mtk_match[] = { { .compatible = "mediatek,mt2701-eth", .data = _data}, { .compatible = "mediatek,mt7621-eth", .data = _data}, { .compatible = "mediatek,mt7622-eth", .data = _data}, { .compatible = "mediatek,mt7623-eth", .data = _data}, + { .compatible = "mediatek,mt7629-eth", .data = _data}, {}, }; MODULE_DEVICE_TABLE(of, of_mtk_match); diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h index 89d68dd60b3d..a0aa5008d5cc 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h @@ -475,15 +475,21 @@ enum mtk_tx_flags { */ enum mtk_clks_map { MTK_CLK_ETHIF, + 
MTK_CLK_SGMIITOP, MTK_CLK_ESW, MTK_CLK_GP0, MTK_CLK_GP1, MTK_CLK_GP2, + MTK_CLK_FE, MTK_CLK_TRGPLL, MTK_CLK_SGMII_TX_250M, MTK_CLK_SGMII_RX_250M, MTK_CLK_SGMII_CDR_REF, MTK_CLK_SGMII_CDR_FB, + MTK_CLK_SGMII2_TX_250M, + MTK_CLK_SGMII2_RX_250M, + MTK_CLK_SGMII2_CDR_REF, + MTK_CLK_SGMII2_CDR_FB, MTK_CLK_SGMII_CK, MTK_CLK_ETH2PLL, MTK_CLK_MAX @@ -502,6 +508,19 @@ enum mtk_clks_map { BIT(MTK_CLK_SGMII_CK) | \ BIT(MTK_CLK_ETH2PLL)) #define MT7621_CLKS_BITMAP (0) +#define MT7629_CLKS_BITMAP (BIT(MTK_CLK_ETHIF) | BIT(MTK_CLK_ESW) | \ +BIT(MTK_CLK_GP0) | BIT(MTK_CLK_GP1) | \ +BIT(MTK_CLK_GP2) | BIT(MTK_CLK_FE) | \ +BIT(MTK_CLK_SGMII_TX_250M) | \ +BIT(MTK_CLK_SGMII_RX_250M) | \ +BIT(MTK_CLK_SGMII_CDR_REF) | \ +BIT(MTK_CLK_SGMII_CDR_FB) | \ +BIT(MTK_CLK_SGMII2_TX_250M) | \ +BIT(MTK_CLK_SGMII2_RX_250M) | \ +BIT(MTK_CLK_SGMII2_CDR_REF) | \ +BIT(MTK_CLK_SGMII2_CDR_FB) | \ +BIT(MTK_CLK_SGMII_CK) | \ +BIT(MTK_CLK_ETH2PLL) | BIT(MTK_CLK_SGMIITOP)) enum mtk_dev_state { MTK_HW_INIT, -- 2.17.1
[PATCH net-next v1 3/6] net: ethernet: mediatek: Extend SGMII related functions
From: Sean Wang Add SGMII related logic into a separate file, and also provides options for forcing 1G, 2.5, AN mode for the target PHY, that can be determined from SGMII node in DTS. Signed-off-by: Sean Wang --- drivers/net/ethernet/mediatek/Makefile | 2 +- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 75 -- drivers/net/ethernet/mediatek/mtk_eth_soc.h | 39 +++- drivers/net/ethernet/mediatek/mtk_sgmii.c | 105 4 files changed, 184 insertions(+), 37 deletions(-) create mode 100644 drivers/net/ethernet/mediatek/mtk_sgmii.c diff --git a/drivers/net/ethernet/mediatek/Makefile b/drivers/net/ethernet/mediatek/Makefile index d41a2414c575..b8206605154e 100644 --- a/drivers/net/ethernet/mediatek/Makefile +++ b/drivers/net/ethernet/mediatek/Makefile @@ -3,4 +3,4 @@ # Makefile for the Mediatek SoCs built-in ethernet macs # -obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth_soc.o +obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth_soc.o mtk_sgmii.o diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 765cd56ebcd2..d0cff646d3de 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -165,36 +165,37 @@ static void mtk_gmac0_rgmii_adjust(struct mtk_eth *eth, int speed) mtk_w32(eth, val, TRGMII_TCK_CTRL); } -static void mtk_gmac_sgmii_hw_setup(struct mtk_eth *eth, int mac_id) +static int mtk_gmac_sgmii_hw_setup(struct mtk_eth *eth, int mac_id) { + int sid, err; u32 val; - /* Setup the link timer and QPHY power up inside SGMIISYS */ - regmap_write(eth->sgmiisys, SGMSYS_PCS_LINK_TIMER, -SGMII_LINK_TIMER_DEFAULT); + /* Enable GMAC with SGMII once we finish the SGMII setup. 
*/ + regmap_read(eth->ethsys, ETHSYS_SYSCFG0, ); + val &= ~SYSCFG0_SGMII_MASK; + regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val); - regmap_read(eth->sgmiisys, SGMSYS_SGMII_MODE, ); - val |= SGMII_REMOTE_FAULT_DIS; - regmap_write(eth->sgmiisys, SGMSYS_SGMII_MODE, val); + if (MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC_SHARED_SGMII)) + sid = 0; + else + sid = mac_id; - regmap_read(eth->sgmiisys, SGMSYS_PCS_CONTROL_1, ); - val |= SGMII_AN_RESTART; - regmap_write(eth->sgmiisys, SGMSYS_PCS_CONTROL_1, val); + if (MTK_HAS_FLAGS(eth->sgmii->flags[sid], MTK_SGMII_PHYSPEED_AN)) + err = mtk_sgmii_setup_mode_an(eth->sgmii, sid); + else + err = mtk_sgmii_setup_mode_force(eth->sgmii, sid); - regmap_read(eth->sgmiisys, SGMSYS_QPHY_PWR_STATE_CTRL, ); - val &= ~SGMII_PHYA_PWD; - regmap_write(eth->sgmiisys, SGMSYS_QPHY_PWR_STATE_CTRL, val); + if (err) + return err; /* Determine MUX for which GMAC uses the SGMII interface */ - if (MTK_HAS_CAPS(eth->soc->caps, MTK_DUAL_GMAC_SHARED_SGMII)) { - regmap_read(eth->ethsys, ETHSYS_SYSCFG0, ); - val &= ~SYSCFG0_SGMII_MASK; - val |= !mac_id ? SYSCFG0_SGMII_GMAC1 : SYSCFG0_SGMII_GMAC2; - regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val); - - dev_info(eth->dev, "setup shared sgmii for gmac=%d\n", -mac_id); - } + regmap_read(eth->ethsys, ETHSYS_SYSCFG0, ); + if (!mac_id) + val |= SYSCFG0_SGMII_GMAC1; + else + val |= MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC_SHARED_SGMII) ? 
+ SYSCFG0_SGMII_GMAC2 : SYSCFG0_SGMII_GMAC2_V2; + regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val); /* Setup the GMAC1 going through SGMII path when SoC also support * ESW on GMAC1 @@ -204,6 +205,8 @@ static void mtk_gmac_sgmii_hw_setup(struct mtk_eth *eth, int mac_id) mtk_w32(eth, 0, MTK_MAC_MISC); dev_info(eth->dev, "setup gmac1 going through sgmii"); } + + return 0; } static void mtk_phy_link_adjust(struct net_device *dev) @@ -295,6 +298,7 @@ static int mtk_phy_connect(struct net_device *dev) struct mtk_eth *eth; struct device_node *np; u32 val; + int err; eth = mac->hw; np = of_parse_phandle(mac->of_node, "phy-handle", 0); @@ -314,8 +318,11 @@ static int mtk_phy_connect(struct net_device *dev) case PHY_INTERFACE_MODE_RGMII: break; case PHY_INTERFACE_MODE_SGMII: - if (MTK_HAS_CAPS(eth->soc->caps, MTK_SGMII)) - mtk_gmac_sgmii_hw_setup(eth, mac->id); + if (MTK_HAS_CAPS(eth->soc->caps, MTK_SGMII)) { + err = mtk_gmac_sgmii_hw_setup(eth, mac->id); + if (err) + goto err_phy; + } break; case PHY_INTERFACE_MODE_MII: mac->ge_mode = 1; @@ -2484,13 +2491,16 @@
Re: [PATCH bpf v4] libbpf: Return btf_fd for load_sk_storage_btf
On Thu, May 30, 2019 at 2:34 PM Song Liu wrote: > > On Wed, May 29, 2019 at 11:30 AM Michal Rostecki > wrote: > > > > Before this change, function load_sk_storage_btf expected that > > libbpf__probe_raw_btf was returning a BTF descriptor, but in fact it was > > returning information about whether the probe was successful (0 or > > 1). load_sk_storage_btf was using that value as an argument to the close > > function, which was resulting in closing stdout and thus terminating the > > process which called that function. > > > > That bug was visible in bpftool. The `bpftool feature` subcommand was always > > exiting too early (because of closed stdout) and it didn't display all > > requested probes. `bpftool -j feature` or `bpftool -p feature` were not > > returning a valid JSON object. > > > > This change renames the libbpf__probe_raw_btf function to > > libbpf__load_raw_btf, which now returns a BTF descriptor, as expected in > > load_sk_storage_btf. > > > > v2: > > - Fix typo in the commit message. > > > > v3: > > - Simplify BTF descriptor handling in bpf_object__probe_btf_* functions. > > - Rename libbpf__probe_raw_btf function to libbpf__load_raw_btf and > > return a BTF descriptor. > > > > v4: > > - Fix typo in the commit message. > > > > Fixes: d7c4b3980c18 ("libbpf: detect supported kernel BTF features and > > sanitize BTF") > > Signed-off-by: Michal Rostecki > > Acked-by: Andrii Nakryiko > > Acked-by: Song Liu Applied. Thanks!
Re: [RFC][PATCH 2/2] reset: qcom-pon: Add support for gen2 pon
On Fri, May 31, 2019 at 4:53 PM Bjorn Andersson wrote: > > On Fri 31 May 16:47 PDT 2019, John Stultz wrote: > > > Add support for gen2 pon register so "reboot bootloader" can > > work on pixel3 and db845. > > > > Cc: Andy Gross > > Cc: David Brown > > Cc: Bjorn Andersson > > Cc: Amit Pundir > > Cc: Rob Herring > > Cc: Mark Rutland > > Cc: Sebastian Reichel > > Cc: linux-arm-...@vger.kernel.org > > Cc: devicet...@vger.kernel.org > > Signed-off-by: John Stultz > > --- > > arch/arm64/boot/dts/qcom/pm8998.dtsi | 2 +- > > drivers/power/reset/qcom-pon.c | 15 --- > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi > > b/arch/arm64/boot/dts/qcom/pm8998.dtsi > > index d3ca35a940fb..051a52df80f9 100644 > > --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi > > +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi > > @@ -39,7 +39,7 @@ > > #size-cells = <0>; > > > > pm8998_pon: pon@800 { > > - compatible = "qcom,pm8916-pon"; > > + compatible = "qcom,pm8998-pon"; > > > > reg = <0x800>; > > mode-bootloader = <0x2>; > > We want to take this through arm-soc and the rest through Sebastian's > tree, so please provide the dts update in a separate commit. Sure. I wasn't sure if tracking the change in a separate patch was worth it for such a trivial oneliner, but that's fine, I'll split it out. thanks for the review! -john
Re: [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup
On Wed, May 29, 2019 at 6:04 PM Roman Gushchin wrote: > > During my work on memcg-based memory accounting for bpf maps > I've done some cleanups and refactorings of the existing > memlock rlimit-based code. It makes it more robust, unifies > size to pages conversion, size checks and corresponding error > codes. Also it adds coverage for cgroup local storage and > socket local storage maps. > > It looks like some preliminary work on the mm side might be > required to start working on the memcg-based accounting, > so I'm sending these patches as a separate patchset. Applied. Thanks
Re: [PATCH v2 00/17] net: introduce Qualcomm IPA driver
On 2019-05-31 17:33, Bjorn Andersson wrote: On Fri 31 May 13:47 PDT 2019, Alex Elder wrote: On 5/31/19 2:19 PM, Arnd Bergmann wrote: > On Fri, May 31, 2019 at 6:36 PM Alex Elder wrote: >> On 5/31/19 9:58 AM, Dan Williams wrote: >>> On Thu, 2019-05-30 at 22:53 -0500, Alex Elder wrote: >>> >>> My question from the Nov 2018 IPA rmnet driver still stands; how does >>> this relate to net/ethernet/qualcomm/rmnet/ if at all? And if this is >>> really just a netdev talking to the IPA itself and unrelated to >>> net/ethernet/qualcomm/rmnet, let's call it "ipa%d" and stop cargo- >>> culting rmnet around just because it happens to be a net driver for a >>> QC SoC. >> >> First, the relationship between the IPA driver and the rmnet driver >> is that the IPA driver is assumed to sit between the rmnet driver >> and the hardware. > > Does this mean that IPA can only be used to back rmnet, and rmnet > can only be used on top of IPA, or can or both of them be combined > with another driver to talk to instead? No it does not mean that. As I understand it, one reason for the rmnet layer was to abstract the back end, which would allow using a modem, or using something else (a LAN?), without exposing certain details of the hardware. (Perhaps to support multiplexing, etc. without duplicating that logic in two "back-end" drivers?) To be perfectly honest, at first I thought having IPA use rmnet was a cargo cult thing like Dan suggested, because I didn't see the benefit. I now see why one would use that pass-through layer to handle the QMAP features. But back to your question. The other thing is that I see no reason the IPA couldn't present a "normal" (non QMAP) interface for a modem. It's something I'd really like to be able to do, but I can't do it without having the modem firmware change its configuration for these endpoints. 
My access to the people who implement the modem firmware has been very limited (something I hope to improve), and unless and until I can get corresponding changes on the modem side to implement connections that don't use QMAP, I can't implement such a thing. But any such changes would either be years into the future or for specific devices, and as such not applicable to any/most devices on the market now or in the coming years. But as Arnd points out, if the software split between IPA and rmnet is suboptimal, you are encouraged to fix that. Regards, Bjorn The split rmnet design was chosen because we could place rmnet over any transport - IPA, PCIe (https://lkml.org/lkml/2018/4/26/1159) or USB. rmnet registers an rx handler, so the rmnet packet processing itself happens in the same softirq when packets are queued to the network stack by IPA. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 3/5] asm-generic: Register fchmodat4 as syscall 428
On Fri, 31 May 2019 12:56:39 PDT (-0700), Arnd Bergmann wrote: On Fri, May 31, 2019 at 9:23 PM Palmer Dabbelt wrote: Signed-off-by: Palmer Dabbelt As usual, each patch needs a changelog text. I would prefer having a single patch here that changes /all/ system call tables at once, rather than doing one at a time like we used to. Works for me. That also gives me something to write about in the text :) In linux-next, we are at number 434 now, and there are a couple of other new system calls being proposed right now (more than usual), so you may have to change the number a few times. OK, no problem. It'll be a bit easier to handle the number that way. Note: most architectures use .tbl files now, the exceptions are include/uapi/asm-generic/unistd.h and arch/arm64/include/asm/unistd32.h, and the latter also requires changing __NR_compat_syscalls in asm/unistd.h. Numbers should now be the same across architectures, except for alpha, which has a +110 offset. We have discussed ways to have a single file to modify for a new call on all architectures, but no patches yet. OK, thanks. I'll wait a bit for feedback, but unless there's anything else I'll go ahead and finish this.
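For reference, wiring a new call into one of the per-architecture `.tbl` files the thread mentions is a one-line addition. A sketch of the table format (the number 428 comes from the patch subject and, as Arnd notes above, may need to be bumped before merging; the `sys_fchmodat4` entry-point name follows the usual `sys_` convention and is an assumption here):

```
428	common	fchmodat4	sys_fchmodat4
```

Each column is number, ABI, syscall name, and kernel entry point; the asm-generic `unistd.h` path instead adds a `__NR_fchmodat4` define plus a `__SYSCALL()` line.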
Re: [RFC][PATCH 2/2] reset: qcom-pon: Add support for gen2 pon
On Fri 31 May 16:47 PDT 2019, John Stultz wrote: > Add support for gen2 pon register so "reboot bootloader" can > work on pixel3 and db845. > > Cc: Andy Gross > Cc: David Brown > Cc: Bjorn Andersson > Cc: Amit Pundir > Cc: Rob Herring > Cc: Mark Rutland > Cc: Sebastian Reichel > Cc: linux-arm-...@vger.kernel.org > Cc: devicet...@vger.kernel.org > Signed-off-by: John Stultz > --- > arch/arm64/boot/dts/qcom/pm8998.dtsi | 2 +- > drivers/power/reset/qcom-pon.c | 15 --- > 2 files changed, 13 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi > b/arch/arm64/boot/dts/qcom/pm8998.dtsi > index d3ca35a940fb..051a52df80f9 100644 > --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi > +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi > @@ -39,7 +39,7 @@ > #size-cells = <0>; > > pm8998_pon: pon@800 { > - compatible = "qcom,pm8916-pon"; > + compatible = "qcom,pm8998-pon"; > > reg = <0x800>; > mode-bootloader = <0x2>; We want to take this through arm-soc and the rest through Sebastian's tree, so please provide the dts update in a separate commit. Apart from that this looks good! 
Reviewed-by: Bjorn Andersson Regards, Bjorn > diff --git a/drivers/power/reset/qcom-pon.c b/drivers/power/reset/qcom-pon.c > index 3fa1642d4c54..d0336a1612a4 100644 > --- a/drivers/power/reset/qcom-pon.c > +++ b/drivers/power/reset/qcom-pon.c > @@ -14,11 +14,15 @@ > > #define PON_SOFT_RB_SPARE0x8f > > +#define GEN1_REASON_SHIFT2 > +#define GEN2_REASON_SHIFT1 > + > struct pm8916_pon { > struct device *dev; > struct regmap *regmap; > u32 baseaddr; > struct reboot_mode_driver reboot_mode; > + long reason_shift; > }; > > static int pm8916_reboot_mode_write(struct reboot_mode_driver *reboot, > @@ -30,15 +34,18 @@ static int pm8916_reboot_mode_write(struct > reboot_mode_driver *reboot, > > ret = regmap_update_bits(pon->regmap, >pon->baseaddr + PON_SOFT_RB_SPARE, > - 0xfc, magic << 2); > + 0xfc, magic << pon->reason_shift); > if (ret < 0) > dev_err(pon->dev, "update reboot mode bits failed\n"); > > return ret; > } > > +static const struct of_device_id pm8916_pon_id_table[]; > + > static int pm8916_pon_probe(struct platform_device *pdev) > { > + const struct of_device_id *match; > struct pm8916_pon *pon; > int error; > > @@ -60,6 +67,7 @@ static int pm8916_pon_probe(struct platform_device *pdev) > return error; > > pon->reboot_mode.dev = >dev; > + pon->reason_shift = of_device_get_match_data(>dev); > pon->reboot_mode.write = pm8916_reboot_mode_write; > error = devm_reboot_mode_register(>dev, >reboot_mode); > if (error) { > @@ -73,8 +81,9 @@ static int pm8916_pon_probe(struct platform_device *pdev) > } > > static const struct of_device_id pm8916_pon_id_table[] = { > - { .compatible = "qcom,pm8916-pon" }, > - { .compatible = "qcom,pms405-pon" }, > + { .compatible = "qcom,pm8916-pon", .data = (void *)GEN1_REASON_SHIFT }, > + { .compatible = "qcom,pms405-pon", .data = (void *)GEN1_REASON_SHIFT }, > + { .compatible = "qcom,pm8998-pon", .data = (void *)GEN2_REASON_SHIFT }, > { } > }; > MODULE_DEVICE_TABLE(of, pm8916_pon_id_table); > -- > 2.17.1 >
Re: [RFC][PATCH 1/2] dt-bindings: power: reset: qcom: Add qcom,pm8998-pon compatability line
On Fri 31 May 16:47 PDT 2019, John Stultz wrote: > Update bindings to support for qcom,pm8998-pon which uses gen2 pon > > Cc: Andy Gross > Cc: David Brown > Cc: Bjorn Andersson Reviewed-by: Bjorn Andersson > Cc: Amit Pundir > Cc: Rob Herring > Cc: Mark Rutland > Cc: Sebastian Reichel > Cc: linux-arm-...@vger.kernel.org > Cc: devicet...@vger.kernel.org > Signed-off-by: John Stultz > --- > Documentation/devicetree/bindings/power/reset/qcom,pon.txt | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt > b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt > index 5705f575862d..0c0dc3a1e693 100644 > --- a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt > +++ b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt > @@ -9,6 +9,7 @@ Required Properties: > -compatible: Must be one of: > "qcom,pm8916-pon" > "qcom,pms405-pon" > + "qcom,pm8998-pon" > > -reg: Specifies the physical address of the pon register > > -- > 2.17.1 >
[RFC][PATCH 1/2] dt-bindings: power: reset: qcom: Add qcom,pm8998-pon compatability line
Update bindings to support for qcom,pm8998-pon which uses gen2 pon Cc: Andy Gross Cc: David Brown Cc: Bjorn Andersson Cc: Amit Pundir Cc: Rob Herring Cc: Mark Rutland Cc: Sebastian Reichel Cc: linux-arm-...@vger.kernel.org Cc: devicet...@vger.kernel.org Signed-off-by: John Stultz --- Documentation/devicetree/bindings/power/reset/qcom,pon.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt index 5705f575862d..0c0dc3a1e693 100644 --- a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt +++ b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt @@ -9,6 +9,7 @@ Required Properties: -compatible: Must be one of: "qcom,pm8916-pon" "qcom,pms405-pon" + "qcom,pm8998-pon" -reg: Specifies the physical address of the pon register -- 2.17.1
[RFC][PATCH 2/2] reset: qcom-pon: Add support for gen2 pon
Add support for gen2 pon register so "reboot bootloader" can work on pixel3 and db845. Cc: Andy Gross Cc: David Brown Cc: Bjorn Andersson Cc: Amit Pundir Cc: Rob Herring Cc: Mark Rutland Cc: Sebastian Reichel Cc: linux-arm-...@vger.kernel.org Cc: devicet...@vger.kernel.org Signed-off-by: John Stultz --- arch/arm64/boot/dts/qcom/pm8998.dtsi | 2 +- drivers/power/reset/qcom-pon.c | 15 --- 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi b/arch/arm64/boot/dts/qcom/pm8998.dtsi index d3ca35a940fb..051a52df80f9 100644 --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi @@ -39,7 +39,7 @@ #size-cells = <0>; pm8998_pon: pon@800 { - compatible = "qcom,pm8916-pon"; + compatible = "qcom,pm8998-pon"; reg = <0x800>; mode-bootloader = <0x2>; diff --git a/drivers/power/reset/qcom-pon.c b/drivers/power/reset/qcom-pon.c index 3fa1642d4c54..d0336a1612a4 100644 --- a/drivers/power/reset/qcom-pon.c +++ b/drivers/power/reset/qcom-pon.c @@ -14,11 +14,15 @@ #define PON_SOFT_RB_SPARE 0x8f +#define GEN1_REASON_SHIFT 2 +#define GEN2_REASON_SHIFT 1 + struct pm8916_pon { struct device *dev; struct regmap *regmap; u32 baseaddr; struct reboot_mode_driver reboot_mode; + long reason_shift; }; static int pm8916_reboot_mode_write(struct reboot_mode_driver *reboot, @@ -30,15 +34,18 @@ static int pm8916_reboot_mode_write(struct reboot_mode_driver *reboot, ret = regmap_update_bits(pon->regmap, pon->baseaddr + PON_SOFT_RB_SPARE, -0xfc, magic << 2); +0xfc, magic << pon->reason_shift); if (ret < 0) dev_err(pon->dev, "update reboot mode bits failed\n"); return ret; } +static const struct of_device_id pm8916_pon_id_table[]; + static int pm8916_pon_probe(struct platform_device *pdev) { + const struct of_device_id *match; struct pm8916_pon *pon; int error; @@ -60,6 +67,7 @@ static int pm8916_pon_probe(struct platform_device *pdev) return error; pon->reboot_mode.dev = >dev; + pon->reason_shift = of_device_get_match_data(>dev); 
pon->reboot_mode.write = pm8916_reboot_mode_write; error = devm_reboot_mode_register(>dev, >reboot_mode); if (error) { @@ -73,8 +81,9 @@ static int pm8916_pon_probe(struct platform_device *pdev) } static const struct of_device_id pm8916_pon_id_table[] = { - { .compatible = "qcom,pm8916-pon" }, - { .compatible = "qcom,pms405-pon" }, + { .compatible = "qcom,pm8916-pon", .data = (void *)GEN1_REASON_SHIFT }, + { .compatible = "qcom,pms405-pon", .data = (void *)GEN1_REASON_SHIFT }, + { .compatible = "qcom,pm8998-pon", .data = (void *)GEN2_REASON_SHIFT }, { } }; MODULE_DEVICE_TABLE(of, pm8916_pon_id_table); -- 2.17.1
[PATCH 6/8] EDAC/amd64: Decode syndrome before translating address
From: Yazen Ghannam AMD Family 17h systems currently require address translation in order to report the system address of a DRAM ECC error. This is currently done before decoding the syndrome information. The syndrome information does not depend on the address translation, so the proper EDAC csrow/channel reporting can function without the address. However, the syndrome information will not be decoded if the address translation fails. Decode the syndrome information before doing the address translation. The syndrome information is architecturally defined in MCA_SYND and can be considered robust. The address translation is system-specific and may fail on newer systems without proper updates to the translation algorithm. Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode function") Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index f0424c10cac0..4058b24b8e04 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -2567,13 +2567,6 @@ static void decode_umc_error(int node_id, struct mce *m) err.channel = find_umc_channel(m); - if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, _addr)) { - err.err_code = ERR_NORM_ADDR; - goto log_error; - } - - error_address_to_page_and_offset(sys_addr, ); - if (!(m->status & MCI_STATUS_SYNDV)) { err.err_code = ERR_SYND; goto log_error; @@ -2590,6 +2583,13 @@ static void decode_umc_error(int node_id, struct mce *m) err.csrow = m->synd & 0x7; + if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, _addr)) { + err.err_code = ERR_NORM_ADDR; + goto log_error; + } + + error_address_to_page_and_offset(sys_addr, ); + log_error: __log_ecc_error(mci, , ecc_type); } -- 2.17.1
[PATCH 3/8] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
From: Yazen Ghannam AMD Family 17h systems support x4 and x16 DRAM devices. However, the device type is not checked when setting EDAC_CTL_CAP. Set the appropriate EDAC_CTL_CAP flag based on the device type. Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h") Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index dd60cf5a3d96..125d6e2a828e 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid) static inline void f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt) { - u8 i, ecc_en = 1, cpk_en = 1; + u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1; for_each_umc(i) { if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) { ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED); cpk_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_CHIPKILL_CAP); + + dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6)); + dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7)); } } @@ -3163,8 +3166,12 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt) if (ecc_en) { mci->edac_ctl_cap |= EDAC_FLAG_SECDED; - if (cpk_en) - mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED; + if (cpk_en) { + if (dev_x4) + mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED; + else if (dev_x16) + mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED; + } } } -- 2.17.1
[PATCH 7/8] EDAC/amd64: Cache secondary Chip Select registers
From: Yazen Ghannam AMD Family 17h systems have a set of secondary Chip Select Base Addresses and Address Masks. These do not represent unique Chip Selects, rather they are used in conjunction with the primary Chip Select registers in certain use cases. Cache these secondary Chip Select registers for future use. Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 23 --- drivers/edac/amd64_edac.h | 4 2 files changed, 24 insertions(+), 3 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 4058b24b8e04..006417cb79dc 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -943,34 +943,51 @@ static void prep_chip_selects(struct amd64_pvt *pvt) static void read_umc_base_mask(struct amd64_pvt *pvt) { - u32 umc_base_reg, umc_mask_reg; - u32 base_reg, mask_reg; - u32 *base, *mask; + u32 umc_base_reg, umc_base_reg_sec; + u32 umc_mask_reg, umc_mask_reg_sec; + u32 base_reg, base_reg_sec; + u32 mask_reg, mask_reg_sec; + u32 *base, *base_sec; + u32 *mask, *mask_sec; int cs, umc; for_each_umc(umc) { umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR; + umc_base_reg_sec = get_umc_base(umc) + UMCCH_BASE_ADDR_SEC; for_each_chip_select(cs, umc, pvt) { base = >csels[umc].csbases[cs]; + base_sec = >csels[umc].csbases_sec[cs]; base_reg = umc_base_reg + (cs * 4); + base_reg_sec = umc_base_reg_sec + (cs * 4); if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n", umc, cs, *base, base_reg); + + if (!amd_smn_read(pvt->mc_node_id, base_reg_sec, base_sec)) + edac_dbg(0, "DCSB_SEC%d[%d]=0x%08x reg: 0x%x\n", +umc, cs, *base_sec, base_reg_sec); } umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK; + umc_mask_reg_sec = get_umc_base(umc) + UMCCH_ADDR_MASK_SEC; for_each_chip_select_mask(cs, umc, pvt) { mask = >csels[umc].csmasks[cs]; + mask_sec = >csels[umc].csmasks_sec[cs]; mask_reg = umc_mask_reg + (cs * 4); + mask_reg_sec = umc_mask_reg_sec + (cs * 4); if (!amd_smn_read(pvt->mc_node_id, 
mask_reg, mask)) edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n", umc, cs, *mask, mask_reg); + + if (!amd_smn_read(pvt->mc_node_id, mask_reg_sec, mask_sec)) + edac_dbg(0, "DCSM_SEC%d[%d]=0x%08x reg: 0x%x\n", +umc, cs, *mask_sec, mask_reg_sec); } } } diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h index 4dce6a2ac75f..68f12de6e654 100644 --- a/drivers/edac/amd64_edac.h +++ b/drivers/edac/amd64_edac.h @@ -259,7 +259,9 @@ /* UMC CH register offsets */ #define UMCCH_BASE_ADDR0x0 +#define UMCCH_BASE_ADDR_SEC0x10 #define UMCCH_ADDR_MASK0x20 +#define UMCCH_ADDR_MASK_SEC0x28 #define UMCCH_ADDR_CFG 0x30 #define UMCCH_DIMM_CFG 0x80 #define UMCCH_UMC_CFG 0x100 @@ -312,9 +314,11 @@ struct dram_range { /* A DCT chip selects collection */ struct chip_select { u32 csbases[NUM_CHIPSELECTS]; + u32 csbases_sec[NUM_CHIPSELECTS]; u8 b_cnt; u32 csmasks[NUM_CHIPSELECTS]; + u32 csmasks_sec[NUM_CHIPSELECTS]; u8 m_cnt; }; -- 2.17.1
[PATCH 2/8] EDAC/amd64: Support more than two controllers for chip selects handling
From: Yazen Ghannam The struct chip_select array that's used for saving chip select bases and masks is fixed at length of two. There should be one struct chip_select for each controller, so this array should be increased to support systems that may have more than two controllers. Increase the size of the struct chip_select array to eight, which is the largest number of controllers per die currently supported on AMD systems. Also, carve out the Family 17h+ reading of the bases/masks into a separate function. This effectively reverts the original bases/masks reading code to before Family 17h support was added. This is a second version of a commit that was reverted. Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"") Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 122 +- drivers/edac/amd64_edac.h | 5 +- 2 files changed, 69 insertions(+), 58 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 9fa2f205f05c..dd60cf5a3d96 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -943,91 +943,101 @@ static void prep_chip_selects(struct amd64_pvt *pvt) pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4; pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2; } else if (pvt->fam >= 0x17) { - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4; - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2; + int umc; + + for_each_umc(umc) { + pvt->csels[umc].b_cnt = 4; + pvt->csels[umc].m_cnt = 2; + } + } else { pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8; pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4; } } +static void read_umc_base_mask(struct amd64_pvt *pvt) +{ + u32 umc_base_reg, umc_mask_reg; + u32 base_reg, mask_reg; + u32 *base, *mask; + int cs, umc; + + for_each_umc(umc) { + umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR; + + for_each_chip_select(cs, umc, pvt) { + base = >csels[umc].csbases[cs]; + + base_reg = umc_base_reg + (cs * 4); + + if (!amd_smn_read(pvt->mc_node_id, base_reg, 
base)) + edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n", +umc, cs, *base, base_reg); + } + + umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK; + + for_each_chip_select_mask(cs, umc, pvt) { + mask = >csels[umc].csmasks[cs]; + + mask_reg = umc_mask_reg + (cs * 4); + + if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) + edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n", +umc, cs, *mask, mask_reg); + } + } +} + /* * Function 2 Offset F10_DCSB0; read in the DCS Base and DCS Mask registers */ static void read_dct_base_mask(struct amd64_pvt *pvt) { - int base_reg0, base_reg1, mask_reg0, mask_reg1, cs; + int cs; prep_chip_selects(pvt); - if (pvt->umc) { - base_reg0 = get_umc_base(0) + UMCCH_BASE_ADDR; - base_reg1 = get_umc_base(1) + UMCCH_BASE_ADDR; - mask_reg0 = get_umc_base(0) + UMCCH_ADDR_MASK; - mask_reg1 = get_umc_base(1) + UMCCH_ADDR_MASK; - } else { - base_reg0 = DCSB0; - base_reg1 = DCSB1; - mask_reg0 = DCSM0; - mask_reg1 = DCSM1; - } + if (pvt->umc) + return read_umc_base_mask(pvt); for_each_chip_select(cs, 0, pvt) { - int reg0 = base_reg0 + (cs * 4); - int reg1 = base_reg1 + (cs * 4); + int reg0 = DCSB0 + (cs * 4); + int reg1 = DCSB1 + (cs * 4); u32 *base0 = >csels[0].csbases[cs]; u32 *base1 = >csels[1].csbases[cs]; - if (pvt->umc) { - if (!amd_smn_read(pvt->mc_node_id, reg0, base0)) - edac_dbg(0, " DCSB0[%d]=0x%08x reg: 0x%x\n", -cs, *base0, reg0); - - if (!amd_smn_read(pvt->mc_node_id, reg1, base1)) - edac_dbg(0, " DCSB1[%d]=0x%08x reg: 0x%x\n", -cs, *base1, reg1); - } else { - if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, base0)) - edac_dbg(0, " DCSB0[%d]=0x%08x reg: F2x%x\n", -cs, *base0, reg0); + if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, base0)) + edac_dbg(0, " DCSB0[%d]=0x%08x reg:
[PATCH 8/8] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs
From: Yazen Ghannam Future AMD systems will support "Asymmetric" Dual-Rank DIMMs. These are DIMMs where the ranks are of different sizes. The even rank will use the Primary Even Chip Select registers and the odd rank will use the Secondary Odd Chip Select registers. Recognize if a Secondary Odd Chip Select is being used. Use the Secondary Odd Address Mask when calculating the chip select size. Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 006417cb79dc..6c284a4f980c 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -790,6 +790,9 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan) #define CS_EVEN_PRIMARYBIT(0) #define CS_ODD_PRIMARY BIT(1) +#define CS_ODD_SECONDARY BIT(2) + +#define csrow_sec_enabled(i, dct, pvt) ((pvt)->csels[(dct)].csbases_sec[(i)] & DCSB_CS_ENABLE) static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt) { @@ -801,6 +804,10 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt) if (csrow_enabled(2 * dimm + 1, ctrl, pvt)) cs_mode |= CS_ODD_PRIMARY; + /* Asymmetric Dual-Rank DIMM support. */ + if (csrow_sec_enabled(2 * dimm + 1, ctrl, pvt)) + cs_mode |= CS_ODD_SECONDARY; + return cs_mode; } @@ -1590,7 +1597,11 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc, */ dimm = csrow_nr >> 1; - addr_mask_orig = pvt->csels[umc].csmasks[dimm]; + /* Asymmetric Dual-Rank DIMM support. */ + if (cs_mode & CS_ODD_SECONDARY) + addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm]; + else + addr_mask_orig = pvt->csels[umc].csmasks[dimm]; /* * The number of zero bits in the mask is equal to the number of bits -- 2.17.1
[PATCH 4/8] EDAC/amd64: Initialize DIMM info for systems with more than two channels
From: Yazen Ghannam Currently, the DIMM info for AMD Family 17h systems is initialized in init_csrows(). This function is shared with legacy systems, and it has a limit of two channel support. This prevents initialization of the DIMM info for a number of ranks, so there will be missing ranks in the EDAC sysfs. Create a new init_csrows_df() for Family17h+ and revert init_csrows() back to pre-Family17h support. Loop over all channels in the new function in order to support systems with more than two channels. Fixes: bdcee7747f5c ("EDAC/amd64: Support more than two Unified Memory Controllers") Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 63 ++- 1 file changed, 49 insertions(+), 14 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 125d6e2a828e..d0926b181c7c 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -2837,6 +2837,46 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig) return nr_pages; } +static int init_csrows_df(struct mem_ctl_info *mci) +{ + struct amd64_pvt *pvt = mci->pvt_info; + enum edac_type edac_mode = EDAC_NONE; + enum dev_type dev_type = DEV_UNKNOWN; + struct dimm_info *dimm; + int empty = 1; + u8 umc, cs; + + if (mci->edac_ctl_cap & EDAC_FLAG_S16ECD16ED) { + edac_mode = EDAC_S16ECD16ED; + dev_type = DEV_X16; + } else if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED) { + edac_mode = EDAC_S4ECD4ED; + dev_type = DEV_X4; + } else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED) { + edac_mode = EDAC_SECDED; + } + + for_each_umc(umc) { + for_each_chip_select(cs, umc, pvt) { + if (!csrow_enabled(cs, umc, pvt)) + continue; + + empty = 0; + dimm = mci->csrows[cs]->channels[umc]->dimm; + + edac_dbg(1, "MC node: %d, csrow: %d\n", + pvt->mc_node_id, cs); + + dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs); + dimm->mtype = pvt->dram_type; + dimm->edac_mode = edac_mode; + dimm->dtype = dev_type; + } + } + + return empty; +} + /* * Initialize the array of csrow attribute 
instances, based on the values * from pci config hardware registers. @@ -2851,15 +2891,16 @@ static int init_csrows(struct mem_ctl_info *mci) int nr_pages = 0; u32 val; - if (!pvt->umc) { - amd64_read_pci_cfg(pvt->F3, NBCFG, ); + if (pvt->umc) + return init_csrows_df(mci); + + amd64_read_pci_cfg(pvt->F3, NBCFG, ); - pvt->nbcfg = val; + pvt->nbcfg = val; - edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: %d|DramEccEn: %d]\n", -pvt->mc_node_id, val, -!!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE)); - } + edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: %d|DramEccEn: %d]\n", +pvt->mc_node_id, val, +!!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE)); /* * We iterate over DCT0 here but we look at DCT1 in parallel, if needed. @@ -2896,13 +2937,7 @@ static int init_csrows(struct mem_ctl_info *mci) edac_dbg(1, "Total csrow%d pages: %u\n", i, nr_pages); /* Determine DIMM ECC mode: */ - if (pvt->umc) { - if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED) - edac_mode = EDAC_S4ECD4ED; - else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED) - edac_mode = EDAC_SECDED; - - } else if (pvt->nbcfg & NBCFG_ECC_ENABLE) { + if (pvt->nbcfg & NBCFG_ECC_ENABLE) { edac_mode = (pvt->nbcfg & NBCFG_CHIPKILL) ? EDAC_S4ECD4ED : EDAC_SECDED; -- 2.17.1
[PATCH 5/8] EDAC/amd64: Find Chip Select memory size using Address Mask
From: Yazen Ghannam

Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is
not robust.

The Chip Select Address Mask can be used to find the memory size. There
are a few cases.

1) For single-rank, use the address mask as the size.

2) For dual-rank non-interleaved, use the address mask divided by 2 as
   the size.

3) For dual-rank interleaved, do #2 but "de-interleave" the address
   mask first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead and do the de-interleaving. In the non-interleaved case,
the original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
	Original=0x9FE, De-interleaved=0x3FE

Fixes: fc00c6a41638 ("EDAC/amd64: Adjust printed chip select sizes when interleaved")
Signed-off-by: Yazen Ghannam
---
 drivers/edac/amd64_edac.c | 107 ++++++++++++++++++++++----------------
 1 file changed, 63 insertions(+), 44 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index d0926b181c7c..f0424c10cac0 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -788,51 +788,36 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
 		    (dclr & BIT(15)) ?  "yes" : "no");
 }
 
-/*
- * The Address Mask should be a contiguous set of bits in the non-interleaved
- * case. So to check for CS interleaving, find the most- and least-significant
- * bits of the mask, generate a contiguous bitmask, and compare the two.
- */
-static bool f17_cs_interleaved(struct amd64_pvt *pvt, u8 ctrl, int cs)
+#define CS_EVEN_PRIMARY		BIT(0)
+#define CS_ODD_PRIMARY		BIT(1)
+
+static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
-	u32 mask = pvt->csels[ctrl].csmasks[cs >> 1];
-	u32 msb = fls(mask) - 1, lsb = ffs(mask) - 1;
-	u32 test_mask = GENMASK(msb, lsb);
+	int cs_mode = 0;
+
+	if (csrow_enabled(2 * dimm, ctrl, pvt))
+		cs_mode |= CS_EVEN_PRIMARY;
 
-	edac_dbg(1, "mask=0x%08x test_mask=0x%08x\n", mask, test_mask);
+	if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
+		cs_mode |= CS_ODD_PRIMARY;
 
-	return mask ^ test_mask;
+	return cs_mode;
 }
 
 static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 {
-	int dimm, size0, size1, cs0, cs1;
+	int dimm, size0, size1, cs0, cs1, cs_mode;
 
 	edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
 	for (dimm = 0; dimm < 2; dimm++) {
-		size0 = 0;
 		cs0 = dimm * 2;
-
-		if (csrow_enabled(cs0, ctrl, pvt))
-			size0 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs0);
-
-		size1 = 0;
 		cs1 = dimm * 2 + 1;
-		if (csrow_enabled(cs1, ctrl, pvt)) {
-			/*
-			 * CS interleaving is only supported if both CSes have
-			 * the same amount of memory. Because they are
-			 * interleaved, it will look like both CSes have the
-			 * full amount of memory. Save the size for both as
-			 * half the amount we found on CS0, if interleaved.
-			 */
-			if (f17_cs_interleaved(pvt, ctrl, cs1))
-				size1 = size0 = (size0 >> 1);
-			else
-				size1 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs1);
-		}
+		cs_mode = f17_get_cs_mode(dimm, ctrl, pvt);
+
+		size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
+		size1 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs1);
 
 		amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n",
 			   cs0, size0,
@@ -1569,18 +1554,50 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
 	return ddr3_cs_size(cs_mode, false);
 }
 
-static int f17_base_addr_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 				    unsigned int cs_mode, int csrow_nr)
 {
-	u32 base_addr = pvt->csels[umc].csbases[csrow_nr];
+	u32 addr_mask_orig, addr_mask_deinterleaved;
+	u32 msb, weight, num_zero_bits;
+	int dimm, dual_rank, size = 0;
 
-	/* Each mask is used for every two base addresses. */
-	u32 addr_mask = pvt->csels[umc].csmasks[csrow_nr >> 1];
+	if (!cs_mode)
+		return size;
 
-	/* Register [31:1] = Address [39:9]. Size is in kBs here. */
-	u32 size = ((addr_mask >> 1) - (base_addr >> 1) + 1) >> 1;
+	dual_rank = !!(cs_mode &
[PATCH 1/8] EDAC/amd64: Fix number of DIMMs and Chip Select bases/masks on Family17h
From: Yazen Ghannam

...because AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS
masks per channel.

Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
Signed-off-by: Yazen Ghannam
---
 drivers/edac/amd64_edac.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 873437be86d9..9fa2f205f05c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -810,7 +810,7 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 
 	edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
-	for (dimm = 0; dimm < 4; dimm++) {
+	for (dimm = 0; dimm < 2; dimm++) {
 		size0 = 0;
 		cs0 = dimm * 2;
 
@@ -942,6 +942,9 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
 	} else if (pvt->fam == 0x15 && pvt->model == 0x30) {
 		pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
 		pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+	} else if (pvt->fam >= 0x17) {
+		pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
+		pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
 	} else {
 		pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
 		pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
-- 
2.17.1
[PATCH 0/8] AMD64 EDAC fixes for v5.2
From: Yazen Ghannam

Hi Boris,

This set contains a few fixes for some changes merged in v5.2. There
are also a couple of fixes for older issues.

In addition, there are a couple of patches to add support for
Asymmetric Dual-Rank DIMMs.

Thanks,
Yazen

Yazen Ghannam (8):
  EDAC/amd64: Fix number of DIMMs and Chip Select bases/masks on
    Family17h
  EDAC/amd64: Support more than two controllers for chip selects
    handling
  EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
  EDAC/amd64: Initialize DIMM info for systems with more than two
    channels
  EDAC/amd64: Find Chip Select memory size using Address Mask
  EDAC/amd64: Decode syndrome before translating address
  EDAC/amd64: Cache secondary Chip Select registers
  EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

 drivers/edac/amd64_edac.c | 348 ++++++++++++++++++++++++++------------
 drivers/edac/amd64_edac.h |   9 +-
 2 files changed, 232 insertions(+), 125 deletions(-)

-- 
2.17.1
Re: [PATCH 2/5] Add fchmodat4(), a new syscall
On Fri, 31 May 2019 12:51:00 PDT (-0700), Arnd Bergmann wrote:
> On Fri, May 31, 2019 at 9:23 PM Palmer Dabbelt wrote:
>> man 3p says that fchmodat() takes a flags argument, but the Linux
>> syscall does not. There doesn't appear to be a good userspace
>> workaround for this issue but the implementation in the kernel is
>> pretty straight-forward. The specific use case where the missing
>> flags came up was WRT a fuse filesystem implementation, but the
>> functionality is pretty generic so I'm assuming there would be
>> other use cases.
>>
>> Signed-off-by: Palmer Dabbelt
>> ---
>>  fs/open.c                | 21 +++++++++++++++++++--
>>  include/linux/syscalls.h |  5 +++++
>>  2 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/open.c b/fs/open.c
>> index a00350018a47..cfad7684e8d3 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -568,11 +568,17 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
>>  	return ksys_fchmod(fd, mode);
>>  }
>>
>> -int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
>> +int do_fchmodat4(int dfd, const char __user *filename, umode_t mode, int flags)
>> ...
>> +
>> +int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
>> +{
>> +	return do_fchmodat4(dfd, filename, mode, 0);
>> +}
>> +
>
> There is only one external caller of do_fchmodat(), so just change
> that to pass the extra argument here, and keep a single do_fchmodat()
> function used by the sys_fchmodat(), sys_fchmodat4(), sys_chmod() and
> ksys_chmod().

OK, I'll roll that up into a v2.
[RFC PATCH 2/9] x86/sgx: Do not naturally align MAP_FIXED address
SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
tracked and enforced by the CPU using a base+mask approach, similar to
hardware range registers such as the variable MTRRs. As a result, the
ELRANGE must be naturally sized and aligned.

To reduce boilerplate code that would be needed in every userspace
enclave loader, the SGX driver naturally aligns the mmap() address and
also requires the range to be naturally sized. Unfortunately, SGX fails
to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects
mmap() if userspace is attempting to map a small slice of an existing
enclave.

Signed-off-by: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/driver/main.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index afe844aa81d6..129d356aff30 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -79,7 +79,13 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
 					   unsigned long pgoff,
 					   unsigned long flags)
 {
-	if (len < 2 * PAGE_SIZE || len & (len - 1) || flags & MAP_PRIVATE)
+	if (flags & MAP_PRIVATE)
+		return -EINVAL;
+
+	if (flags & MAP_FIXED)
+		return addr;
+
+	if (len < 2 * PAGE_SIZE || len & (len - 1))
 		return -EINVAL;
 
 	addr = current->mm->get_unmapped_area(file, addr, 2 * len, pgoff,
-- 
2.21.0
[RFC PATCH 6/9] x86/sgx: Require userspace to provide allowed prots to ADD_PAGES
...to support (the equivalent of) existing Linux Security Module
functionality.

Because SGX manually manages EPC memory, all enclave VMAs are backed by
the same vm_file, i.e. /dev/sgx/enclave, so that SGX can implement the
necessary hooks to move pages in/out of the EPC. And because EPC pages
for any given enclave are fundamentally shared between processes, i.e.
CoW semantics are not possible with EPC pages, /dev/sgx/enclave must
always be MAP_SHARED. Lastly, all real world enclaves will need read,
write and execute permissions to EPC pages.

As a result, SGX does not play nice with existing LSM behavior, as it
is impossible to apply policies to enclaves with any reasonable
granularity, e.g. an LSM can deny access to EPC altogether, but can't
deny potentially dangerous behavior such as mapping pages RW->RX or
RWX.

To give LSMs enough information to implement their policies without
having to resort to ugly things, e.g. holding a reference to the
vm_file of each enclave page, require userspace to explicitly state the
allowed protections for each page (region), i.e. take
ALLOW_{READ,WRITE,EXEC} in the ADD_PAGES ioctl. The ALLOW_* flags will
be passed to LSMs so that they can make informed decisions when the
enclave is being built, i.e. when the source vm_file is available. For
example, SELinux's EXECMOD permission can be required if an enclave is
requesting both ALLOW_WRITE and ALLOW_EXEC.

Update the mmap()/mprotect() hooks to enforce the ALLOW_* protections,
a la the standard VM_MAY{READ,WRITE,EXEC} flags.

The ALLOW_EXEC flag also has a second (important) use in that it can be
used to prevent loading an enclave from a noexec file system. On SGX2
hardware (regardless of kernel support for SGX2), userspace could EADD
from a noexec path using read-only permissions and later mprotect() and
ENCLU[EMODPE] the page to gain execute permissions. By requiring
ALLOW_EXEC up front, SGX will be able to enforce noexec paths when
building the enclave.
Signed-off-by: Sean Christopherson --- arch/x86/include/uapi/asm/sgx.h| 9 - arch/x86/kernel/cpu/sgx/driver/ioctl.c | 23 +-- arch/x86/kernel/cpu/sgx/encl.c | 2 +- arch/x86/kernel/cpu/sgx/encl.h | 1 + 4 files changed, 27 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h index 4a12d6abbcb7..4489e92fa0dc 100644 --- a/arch/x86/include/uapi/asm/sgx.h +++ b/arch/x86/include/uapi/asm/sgx.h @@ -31,6 +31,11 @@ struct sgx_enclave_create { __u64 src; }; +/* Supported flags for struct sgx_enclave_add_pages. */ +#define SGX_ALLOW_READ VM_READ +#define SGX_ALLOW_WRITEVM_WRITE +#define SGX_ALLOW_EXEC VM_EXEC + /** * struct sgx_enclave_add_pages - parameter structure for the *%SGX_IOC_ENCLAVE_ADD_PAGES ioctl @@ -39,6 +44,7 @@ struct sgx_enclave_create { * @secinfo: address for the SECINFO data (common to all pages) * @nr_pages: number of pages (must be virtually contiguous) * @mrmask:bitmask for the measured 256 byte chunks (common to all pages) + * @flags: flags, e.g. 
SGX_ALLOW_{READ,WRITE,EXEC} (common to all pages) */ struct sgx_enclave_add_pages { __u64 addr; @@ -46,7 +52,8 @@ struct sgx_enclave_add_pages { __u64 secinfo; __u32 nr_pages; __u16 mrmask; -} __attribute__((__packed__)); + __u16 flags; +}; /** * struct sgx_enclave_init - parameter structure for the diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c index 6acfcbdeca9a..c30acd3fbbdd 100644 --- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c @@ -235,7 +235,8 @@ static int sgx_validate_secs(const struct sgx_secs *secs, } static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl, -unsigned long addr) +unsigned long addr, +unsigned long allowed_prot) { struct sgx_encl_page *encl_page; int ret; @@ -247,6 +248,7 @@ static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl, return ERR_PTR(-ENOMEM); encl_page->desc = addr; encl_page->encl = encl; + encl_page->allowed_prot = allowed_prot; ret = radix_tree_insert(>page_tree, PFN_DOWN(encl_page->desc), encl_page); if (ret) { @@ -530,7 +532,7 @@ static int sgx_encl_queue_page(struct sgx_encl *encl, static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, void *data, struct sgx_secinfo *secinfo, - unsigned int mrmask) + unsigned int mrmask, unsigned long allowed_prot) { u64 page_type = secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK; struct sgx_encl_page
[RFC PATCH 3/9] x86/sgx: Allow userspace to add multiple pages in single ioctl()
...to improve performance when building enclaves by reducing the number of user<->system transitions. Rather than provide arbitrary batching, e.g. with per-page SECINFO and mrmask, take advantage of the fact that any sane enclave will have large swaths of pages with identical properties, e.g. code vs. data sections. For simplicity and stability in the initial implementation, loop over the existing add page flow instead of taking a more agressive approach, which would require tracking transitions between VMAs and holding mmap_sem for an extended duration. Signed-off-by: Sean Christopherson --- arch/x86/include/uapi/asm/sgx.h| 21 ++--- arch/x86/kernel/cpu/sgx/driver/ioctl.c | 104 +++-- 2 files changed, 74 insertions(+), 51 deletions(-) diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h index 9ed690a38c70..4a12d6abbcb7 100644 --- a/arch/x86/include/uapi/asm/sgx.h +++ b/arch/x86/include/uapi/asm/sgx.h @@ -12,8 +12,8 @@ #define SGX_IOC_ENCLAVE_CREATE \ _IOW(SGX_MAGIC, 0x00, struct sgx_enclave_create) -#define SGX_IOC_ENCLAVE_ADD_PAGE \ - _IOW(SGX_MAGIC, 0x01, struct sgx_enclave_add_page) +#define SGX_IOC_ENCLAVE_ADD_PAGES \ + _IOW(SGX_MAGIC, 0x01, struct sgx_enclave_add_pages) #define SGX_IOC_ENCLAVE_INIT \ _IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init) #define SGX_IOC_ENCLAVE_SET_ATTRIBUTE \ @@ -32,21 +32,22 @@ struct sgx_enclave_create { }; /** - * struct sgx_enclave_add_page - parameter structure for the - * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl - * @addr: address within the ELRANGE - * @src: address for the page data - * @secinfo: address for the SECINFO data - * @mrmask:bitmask for the measured 256 byte chunks + * struct sgx_enclave_add_pages - parameter structure for the + *%SGX_IOC_ENCLAVE_ADD_PAGES ioctl + * @addr: start address within the ELRANGE + * @src: start address for the pages' data + * @secinfo: address for the SECINFO data (common to all pages) + * @nr_pages: number of pages (must be virtually contiguous) + * @mrmask:bitmask for 
the measured 256 byte chunks (common to all pages) */ -struct sgx_enclave_add_page { +struct sgx_enclave_add_pages { __u64 addr; __u64 src; __u64 secinfo; + __u32 nr_pages; __u16 mrmask; } __attribute__((__packed__)); - /** * struct sgx_enclave_init - parameter structure for the * %SGX_IOC_ENCLAVE_INIT ioctl diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c index a27ec26a9350..6acfcbdeca9a 100644 --- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c @@ -487,10 +487,9 @@ static int sgx_validate_tcs(struct sgx_encl *encl, struct sgx_tcs *tcs) return 0; } -static int __sgx_encl_add_page(struct sgx_encl *encl, +static int sgx_encl_queue_page(struct sgx_encl *encl, struct sgx_encl_page *encl_page, - void *data, - struct sgx_secinfo *secinfo, + void *data, struct sgx_secinfo *secinfo, unsigned int mrmask) { unsigned long page_index = sgx_encl_get_index(encl, encl_page); @@ -529,9 +528,9 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, return 0; } -static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, -void *data, struct sgx_secinfo *secinfo, -unsigned int mrmask) +static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, + void *data, struct sgx_secinfo *secinfo, + unsigned int mrmask) { u64 page_type = secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK; struct sgx_encl_page *encl_page; @@ -563,7 +562,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, goto out; } - ret = __sgx_encl_add_page(encl, encl_page, data, secinfo, mrmask); + ret = sgx_encl_queue_page(encl, encl_page, data, secinfo, mrmask); if (ret) { radix_tree_delete(_page->encl->page_tree, PFN_DOWN(encl_page->desc)); @@ -575,56 +574,79 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, return ret; } -/** - * sgx_ioc_enclave_add_page - handler for %SGX_IOC_ENCLAVE_ADD_PAGE - * - * @filep: open file to /dev/sgx - * @cmd: the command value - * @arg: 
pointer to an _enclave_add_page instance - * - * Add a page to an uninitialized enclave (EADD), and optionally extend the - * enclave's measurement with the contents of the page (EEXTEND). EADD and - * EEXTEND are done asynchronously via worker threads. A successful - * sgx_ioc_enclave_add_page() only
[RFC PATCH 9/9] security/selinux: Add enclave_load() implementation
The goal of selinux_enclave_load() is to provide a facsimile of the existing selinux_file_mprotect() and file_map_prot_check() policies, but tailored to the unique properties of SGX. For example, an enclave page is technically backed by a MAP_SHARED file, but the "file" is essentially shared memory that is never persisted anywhere and also requires execute permissions (for some pages). The basic concept is to require appropriate execute permissions on the source of the enclave for pages that are requesting PROT_EXEC, e.g. if an enclave page is being loaded from a regular file, require FILE__EXECUTE and/or FILE__EXECMOND, and if it's coming from an anonymous/private mapping, require PROCESS__EXECMEM since the process is essentially executing from the mapping, albeit in a roundabout way. Note, FILE__READ and FILE__WRITE are intentionally not required even if the source page is backed by a regular file. Writes to the enclave page are contained to the EPC, i.e. never hit the original file, and read permissions have already been vetted (or the VMA doesn't have PROT_READ, in which case loading the page into the enclave will fail). Signed-off-by: Sean Christopherson --- security/selinux/hooks.c | 85 1 file changed, 85 insertions(+) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 3ec702cf46ca..f436a055dda7 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -6726,6 +6726,87 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux) } #endif +#ifdef CONFIG_INTEL_SGX +int selinux_enclave_load(struct vm_area_struct *vma, unsigned long prot, +unsigned long *allowed_prot) +{ + const struct cred *cred = current_cred(); + u32 sid = cred_sid(cred); + int rc; + + /* SGX is supported only in 64-bit kernels. */ + WARN_ON_ONCE(!default_noexec); + + /* +* SGX is responsible for checking @prot vs @allowed_prot, and SELinux +* only cares about execute related permissions for enclaves. 
+*/ + if (!(*allowed_prot & PROT_EXEC)) + return 0; + + /* +* Loading an executable enclave page from a VMA that is not executable +* itself requires EXECUTE permissions on the source file, or if there +* is no regular source file, EXECMEM since the page is being loaded +* from a non-executable anonymous mapping. +*/ + if (!(vma->vm_flags & VM_EXEC)) { + if (vma->vm_file && !IS_PRIVATE(file_inode(vma->vm_file))) + rc = file_has_perm(cred, vma->vm_file, FILE__EXECUTE); + else + rc = avc_has_perm(_state, + sid, sid, SECCLASS_PROCESS, + PROCESS__EXECMEM, NULL); + + /* +* Reject the load if the enclave *needs* the page to be +* executable, otherwise prevent it from becoming executable. +*/ + if (rc) { + if (prot & PROT_EXEC) + return rc; + + *allowed_prot &= ~PROT_EXEC; + } + } + + /* +* An enclave page that may do RW->RX or W+X requires EXECMOD (backed +* by a regular file) or EXECMEM (loaded from an anonymous mapping). +* Note, this hybrid EXECMOD and EXECMEM behavior is intentional and +* reflects the nature of enclaves and the EPC, e.g. EPC is effectively +* a non-persistent shared file, but each enclave is a private domain +* within that shared file, so delegate to the source of the enclave. +*/ + if ((*allowed_prot & PROT_EXEC) && (*allowed_prot & PROT_WRITE)) { + if (vma->vm_file && !IS_PRIVATE(file_inode(vma->vm_file))) + rc = file_has_perm(cred, vma->vm_file, FILE__EXECMOD); + else + rc = avc_has_perm(_state, + sid, sid, SECCLASS_PROCESS, + PROCESS__EXECMEM, NULL); + /* +* Clear ALLOW_EXEC instead of ALLOWED_WRITE if permissions are +* lacking and @prot has neither PROT_WRITE or PROT_EXEC. If +* userspace wanted RX they would have requested RX, and due to +* lack of permissions they can never get RW->RX, i.e. the only +* useful transition is R->RW. +*/ + if (rc) { + if ((prot & PROT_EXEC) && (prot & PROT_WRITE)) + return rc; + + if (prot & PROT_EXEC) + *allowed_prot &= ~PROT_WRITE; + else + *allowed_prot &= ~PROT_EXEC; + } + } + + return 0; +} +#endif +
[RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM
This series is the result of a rather absurd amount of discussion over
how to get SGX to play nice with LSM policies, without having to resort
to evil shenanigans or put undue burden on userspace. The discussion
definitely wandered into completely insane territory at times, but I
think/hope we ended up with something reasonable.

The basic gist of the approach is to require userspace to declare what
protections are maximally allowed for any given page, e.g. add a flags
field for loading enclave pages that takes ALLOW_{READ,WRITE,EXEC}.
LSMs can then adjust the allowed protections, e.g. clear ALLOW_EXEC to
prevent ever mapping the page with PROT_EXEC. SGX enforces the allowed
perms via a new mprotect() vm_ops hook, e.g. like regular mprotect()
uses MAY_{READ,WRITE,EXEC}.

ALLOW_EXEC is used to deny things like loading an enclave from a noexec
file system or from a file without EXECUTE permissions, e.g. without
the ALLOW_EXEC concept, on SGX2 hardware (regardless of kernel support)
userspace could EADD from a noexec file using read-only permissions,
and later use mprotect() and ENCLU[EMODPE] to gain execute permissions.
ALLOW_WRITE is used in conjunction with ALLOW_EXEC to enforce SELinux's
EXECMOD (or EXECMEM).

This is very much an RFC series. It's only compile tested, likely has
obvious bugs, the SELinux patch could be completely harebrained, etc...
My goal at this point is to get feedback at a macro level, e.g. is the
core concept viable/acceptable, are there objections to hooking
mprotect(), etc...

Andy and Cedric, hopefully this aligns with your general expectations
based on our last discussion.

Lastly, I added a patch to allow userspace to add multiple pages in a
single ioctl(). It's obviously not directly related to the security
stuff, but the idea tangentially came up during earlier discussions and
it's something I think the UAPI should provide (it's a tiny change).
Since I was modifying the UAPI anyways, I threw it in.
Sean Christopherson (9):
  x86/sgx: Remove unused local variable in sgx_encl_release()
  x86/sgx: Do not naturally align MAP_FIXED address
  x86/sgx: Allow userspace to add multiple pages in single ioctl()
  mm: Introduce vm_ops->mprotect()
  x86/sgx: Restrict mapping without an enclave page to PROT_NONE
  x86/sgx: Require userspace to provide allowed prots to ADD_PAGES
  x86/sgx: Enforce noexec filesystem restriction for enclaves
  LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
  security/selinux: Add enclave_load() implementation

 arch/x86/include/uapi/asm/sgx.h        |  30 +++--
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 143 ++++++++++++++++------
 arch/x86/kernel/cpu/sgx/driver/main.c  |  13 ++-
 arch/x86/kernel/cpu/sgx/encl.c         |  31 ++++-
 arch/x86/kernel/cpu/sgx/encl.h         |   4 +
 include/linux/lsm_hooks.h              |  16 +++
 include/linux/mm.h                     |   2 +
 include/linux/security.h               |   2 +
 mm/mprotect.c                          |  15 ++-
 security/security.c                    |   8 ++
 security/selinux/hooks.c               |  85 +++++++++++++
 11 files changed, 290 insertions(+), 59 deletions(-)

-- 
2.21.0
[RFC PATCH 7/9] x86/sgx: Enforce noexec filesystem restriction for enclaves
Do not allow an enclave page to be mapped with PROT_EXEC if the source
page is backed by a file on a noexec file system.

Signed-off-by: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index c30acd3fbbdd..5f71be7cbb01 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -576,6 +576,27 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
 	return ret;
 }
 
+static int sgx_encl_page_protect(unsigned long src, unsigned long prot,
+				 unsigned long *allowed_prot)
+{
+	struct vm_area_struct *vma;
+
+	if (!(*allowed_prot & VM_EXEC))
+		goto do_check;
+
+	down_read(&current->mm->mmap_sem);
+	vma = find_vma(current->mm, src);
+	if (!vma || (vma->vm_file && path_noexec(&vma->vm_file->f_path)))
+		*allowed_prot &= ~VM_EXEC;
+	up_read(&current->mm->mmap_sem);
+
+do_check:
+	if (prot & ~*allowed_prot)
+		return -EACCES;
+
+	return 0;
+}
+
 static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
 			     unsigned long src, struct sgx_secinfo *secinfo,
 			     unsigned int mrmask, unsigned int flags)
@@ -589,8 +610,9 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
 	BUILD_BUG_ON(SGX_SECINFO_R != VM_READ ||
 		     SGX_SECINFO_W != VM_WRITE ||
 		     SGX_SECINFO_X != VM_EXEC);
 
-	if (prot & ~allowed_prot)
-		return -EACCES;
+	ret = sgx_encl_page_protect(src, prot, &allowed_prot);
+	if (ret)
+		return ret;
 
 	data_page = alloc_page(GFP_HIGHUSER);
 	if (!data_page)
-- 
2.21.0
[RFC PATCH 5/9] x86/sgx: Restrict mapping without an enclave page to PROT_NONE
To support LSM integration, SGX will require userspace to explicitly specify the allowed protections for each page. The allowed protections will be supplied to and modified by LSMs (based on their policies). To prevent userspace from circumventing the allowed protections, do not allow PROT_{READ,WRITE,EXEC} mappings to an enclave without an associated enclave page (which will track the allowed protections). Signed-off-by: Sean Christopherson --- arch/x86/kernel/cpu/sgx/driver/main.c | 5 + arch/x86/kernel/cpu/sgx/encl.c| 30 +++ arch/x86/kernel/cpu/sgx/encl.h| 3 +++ 3 files changed, 38 insertions(+) diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c index 129d356aff30..65a87c2fdf02 100644 --- a/arch/x86/kernel/cpu/sgx/driver/main.c +++ b/arch/x86/kernel/cpu/sgx/driver/main.c @@ -63,6 +63,11 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd, static int sgx_mmap(struct file *file, struct vm_area_struct *vma) { struct sgx_encl *encl = file->private_data; + int ret; + + ret = sgx_map_allowed(encl, vma->vm_start, vma->vm_end, vma->vm_flags); + if (ret) + return ret; vma->vm_ops = _vm_ops; vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO; diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index f23ea0fbaa47..955d4f430adc 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -235,6 +235,35 @@ static void sgx_vma_close(struct vm_area_struct *vma) kref_put(>refcount, sgx_encl_release); } +int sgx_map_allowed(struct sgx_encl *encl, unsigned long start, + unsigned long end, unsigned long prot) +{ + struct sgx_encl_page *page; + unsigned long addr; + + prot &= (VM_READ | VM_WRITE | VM_EXEC); + if (!prot || !encl) + return 0; + + mutex_lock(>lock); + + for (addr = start; addr < end; addr += PAGE_SIZE) { + page = radix_tree_lookup(>page_tree, addr >> PAGE_SHIFT); + if (!page) + return -EACCES; + } + + mutex_unlock(>lock); + + return 0; +} + +static int 
sgx_vma_mprotect(struct vm_area_struct *vma, unsigned long start, + unsigned long end, unsigned long prot) +{ + return sgx_map_allowed(vma->vm_private_data, start, end, prot); +} + static unsigned int sgx_vma_fault(struct vm_fault *vmf) { unsigned long addr = (unsigned long)vmf->address; @@ -372,6 +401,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr, const struct vm_operations_struct sgx_vm_ops = { .close = sgx_vma_close, .open = sgx_vma_open, + .mprotect = sgx_vma_mprotect, .fault = sgx_vma_fault, .access = sgx_vma_access, }; diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h index c557f0374d74..6e310e3b3fff 100644 --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -106,6 +106,9 @@ static inline unsigned long sgx_pcmd_offset(pgoff_t page_index) sizeof(struct sgx_pcmd); } +int sgx_map_allowed(struct sgx_encl *encl, unsigned long start, + unsigned long end, unsigned long prot); + enum sgx_encl_mm_iter { SGX_ENCL_MM_ITER_DONE = 0, SGX_ENCL_MM_ITER_NEXT = 1, -- 2.21.0
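The mmap()-time check in sgx_map_allowed() above is straightforward to model outside the kernel: walk the requested range page by page and refuse any READ/WRITE/EXEC mapping that touches an address with no tracked enclave page. A minimal userspace sketch follows; the flat `enclave_pages` array and all names are illustrative stand-ins for the kernel's radix tree and flag values, not the actual SGX code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL

#define NPAGES 16

/* One slot per page index; nonzero means "an enclave page exists here".
 * Stand-in for the kernel's encl->page_tree radix tree. */
static unsigned char enclave_pages[NPAGES];

/*
 * Mirror of the sgx_map_allowed() idea: reject any PROT_{READ,WRITE,EXEC}
 * mapping that covers an address without a backing enclave page.
 */
int map_allowed(unsigned long start, unsigned long end, unsigned long prot)
{
	unsigned long addr;

	prot &= (VM_READ | VM_WRITE | VM_EXEC);
	if (!prot)		/* PROT_NONE mappings are always fine */
		return 0;

	for (addr = start; addr < end; addr += PAGE_SIZE) {
		if (!enclave_pages[addr >> PAGE_SHIFT])
			return -EACCES;
	}
	return 0;
}
```

The real code additionally takes the enclave lock around the walk, since pages can be added concurrently via the add-page ioctl.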
Re: [PATCH v2 00/17] net: introduce Qualcomm IPA driver
On Fri 31 May 13:47 PDT 2019, Alex Elder wrote: > On 5/31/19 2:19 PM, Arnd Bergmann wrote: > > On Fri, May 31, 2019 at 6:36 PM Alex Elder wrote: > >> On 5/31/19 9:58 AM, Dan Williams wrote: > >>> On Thu, 2019-05-30 at 22:53 -0500, Alex Elder wrote: > >>> > >>> My question from the Nov 2018 IPA rmnet driver still stands; how does > >>> this relate to net/ethernet/qualcomm/rmnet/ if at all? And if this is > >>> really just a netdev talking to the IPA itself and unrelated to > >>> net/ethernet/qualcomm/rmnet, let's call it "ipa%d" and stop > >>> cargo-culting rmnet around just because it happens to be a net driver for a > >>> QC SoC. > >> > >> First, the relationship between the IPA driver and the rmnet driver > >> is that the IPA driver is assumed to sit between the rmnet driver > >> and the hardware. > > > > Does this mean that IPA can only be used to back rmnet, and rmnet > > can only be used on top of IPA, or can both of them be combined > > with another driver to talk to instead? > > No it does not mean that. > > As I understand it, one reason for the rmnet layer was to abstract > the back end, which would allow using a modem, or using something > else (a LAN?), without exposing certain details of the hardware. > (Perhaps to support multiplexing, etc. without duplicating that > logic in two "back-end" drivers?) > > To be perfectly honest, at first I thought having IPA use rmnet > was a cargo cult thing like Dan suggested, because I didn't see > the benefit. I now see why one would use that pass-through layer > to handle the QMAP features. > > But back to your question. The other thing is that I see no > reason the IPA couldn't present a "normal" (non QMAP) interface > for a modem. It's something I'd really like to be able to do, > but I can't do it without having the modem firmware change its > configuration for these endpoints. 
My access to the people who > implement the modem firmware has been very limited (something > I hope to improve), and unless and until I can get corresponding > changes on the modem side to implement connections that don't > use QMAP, I can't implement such a thing. > But any such changes would either be years into the future or for specific devices, and as such not applicable to any/most devices on the market now or in the coming years. But as Arnd points out, if the software split between IPA and rmnet is suboptimal you are encouraged to fix that. Regards, Bjorn
[RFC PATCH 1/9] x86/sgx: Remove unused local variable in sgx_encl_release()
Signed-off-by: Sean Christopherson --- arch/x86/kernel/cpu/sgx/encl.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 7216bdf07bd0..f23ea0fbaa47 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -463,7 +463,6 @@ EXPORT_SYMBOL_GPL(sgx_encl_destroy); void sgx_encl_release(struct kref *ref) { struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount); - struct sgx_encl_mm *encl_mm; if (encl->pm_notifier.notifier_call) unregister_pm_notifier(&encl->pm_notifier); -- 2.21.0
[RFC PATCH 8/9] LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
enclave_load() is roughly analogous to the existing file_mprotect(). Due to the nature of SGX and its Enclave Page Cache (EPC), all enclave VMAs are backed by a single file, i.e. /dev/sgx/enclave, that must be MAP_SHARED. Furthermore, all enclaves need read, write and execute VMAs. As a result, file_mprotect() does not provide any meaningful security for enclaves since an LSM can only deny/grant access to the EPC as a whole. security_enclave_load() is called when SGX is first loading an enclave page, i.e. copying a page from normal memory into the EPC. The notable difference from file_mprotect() is the allowed_prot parameter, which is essentially an SGX-specific version of a VMA's MAY_{READ,WRITE,EXEC} flags. The purpose of allowed_prot is to enable checks such as SELinux's FILE__EXECMOD permission without having to track and update VMAs across multiple mm structs, i.e. SGX can ensure userspace doesn't overstep its bounds simply by restricting an enclave VMA's protections by vetting what is maximally allowed during build time. An alternative to the allowed_prot approach would be to use an enclave's SIGSTRUCT (a smallish structure that can uniquely identify an enclave) as a proxy for the enclave. For example, SGX could take and hold a reference to the file containing the SIGSTRUCT (if it's in a file) and call security_enclave_load() during mprotect(). While the SIGSTRUCT approach would provide better precision, the actual value added was deemed to be negligible. On the other hand, pinning a file for the lifetime of the enclave is ugly, and essentially caching LSM policies in each page's allowed_prot avoids having to make an extra LSM upcall during mprotect(). Note, extensive discussion yielded no sane alternative to some form of SGX specific LSM hook[1]. 
[1] https://lkml.kernel.org/r/CALCETrXf8mSK45h7sTK5Wf+pXLVn=bjsc_rlpgo-h-qdzbr...@mail.gmail.com Signed-off-by: Sean Christopherson --- arch/x86/kernel/cpu/sgx/driver/ioctl.c | 14 +- include/linux/lsm_hooks.h | 16 include/linux/security.h | 2 ++ security/security.c | 8 4 files changed, 35 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c index 5f71be7cbb01..260417ecbcff 100644 --- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -580,21 +581,24 @@ static int sgx_encl_page_protect(unsigned long src, unsigned long prot, unsigned long *allowed_prot) { struct vm_area_struct *vma; + int ret = 0; - if (!(*allowed_prot & VM_EXEC)) + if (!(*allowed_prot & VM_EXEC) && !IS_ENABLED(CONFIG_SECURITY)) goto do_check; down_read(&current->mm->mmap_sem); vma = find_vma(current->mm, src); if (!vma || (vma->vm_file && path_noexec(&vma->vm_file->f_path))) *allowed_prot &= ~VM_EXEC; +#ifdef CONFIG_SECURITY + ret = security_enclave_load(vma, prot, allowed_prot); +#endif up_read(&current->mm->mmap_sem); do_check: - if (prot & ~*allowed_prot) - return -EACCES; - - return 0; + if (!ret && (prot & ~*allowed_prot)) + ret = -EACCES; + return ret; } static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 47f58cfb6a19..0562775424a0 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1446,6 +1446,14 @@ * @bpf_prog_free_security: * Clean up the security information stored inside bpf prog. * + * Security hooks for Intel SGX enclaves. + * + * @enclave_load: + * On success, returns 0 and optionally adjusts @allowed_prot + * @vma: the source memory region of the enclave page being loaded. + * @prot: the initial protection of the enclave page. + * @allowed_prot: the maximum protections of the enclave page. 
+ * Return 0 if permission is granted. */ union security_list_options { int (*binder_set_context_mgr)(struct task_struct *mgr); @@ -1807,6 +1815,11 @@ union security_list_options { int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux); void (*bpf_prog_free_security)(struct bpf_prog_aux *aux); #endif /* CONFIG_BPF_SYSCALL */ + +#ifdef CONFIG_INTEL_SGX + int (*enclave_load)(struct vm_area_struct *vma, unsigned long prot, + unsigned long *allowed_prot); +#endif /* CONFIG_INTEL_SGX */ }; struct security_hook_heads { @@ -2046,6 +2059,9 @@ struct security_hook_heads { struct hlist_head bpf_prog_alloc_security; struct hlist_head bpf_prog_free_security; #endif /* CONFIG_BPF_SYSCALL */ +#ifdef CONFIG_INTEL_SGX + struct hlist_head enclave_load; +#endif /* CONFIG_INTEL_SGX */ } __randomize_layout;
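The allowed_prot contract described above boils down to two steps: an LSM may clear bits from the per-page maximum, and the request is then denied if it asks for anything outside that maximum. A small self-contained sketch of that flow, with a toy stand-in for the LSM decision (all names here are illustrative; only the `prot & ~*allowed_prot` vetting mirrors the patch):

```c
#include <assert.h>
#include <errno.h>

#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL

/* Toy LSM policy: if the source file is marked noexec, strip EXEC
 * from the page's maximal protections. Stand-in for
 * security_enclave_load(). */
static int lsm_enclave_load(int src_noexec, unsigned long prot,
			    unsigned long *allowed_prot)
{
	(void)prot;
	if (src_noexec)
		*allowed_prot &= ~VM_EXEC;
	return 0;
}

/* Mirror of sgx_encl_page_protect(): let the policy shrink
 * allowed_prot, then vet the requested prot against it. */
int page_protect(int src_noexec, unsigned long prot,
		 unsigned long *allowed_prot)
{
	int ret = lsm_enclave_load(src_noexec, prot, allowed_prot);

	if (!ret && (prot & ~*allowed_prot))
		ret = -EACCES;
	return ret;
}
```

The point of caching the result in allowed_prot is that later mprotect() calls only need the cheap mask check, not another LSM upcall.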
[RFC PATCH 4/9] mm: Introduce vm_ops->mprotect()
SGX will use the mprotect() hook to prevent userspace from circumventing various security checks, i.e. Linux Security Modules. Enclaves are built by copying data from normal memory into the Enclave Page Cache (EPC). Due to the nature of SGX, the EPC is represented by a single file that must be MAP_SHARED, i.e. mprotect() only ever sees a single MAP_SHARED vm_file. Furthermore, all enclaves will need read, write and execute pages in the EPC. As a result, LSM policies cannot be meaningfully applied, e.g. an LSM can deny access to the EPC as a whole, but can't deny PROT_EXEC on a page that originated in a non-EXECUTE file (which is long gone by the time mprotect() is called). By hooking mprotect(), SGX can make explicit LSM upcalls while an enclave is being built, i.e. when the kernel has a handle to the origin of each enclave page, and enforce the result of the LSM policy whenever userspace maps the enclave page in the future. Alternatively, SGX could play games with MAY_{READ,WRITE,EXEC}, but that approach is quite ugly, e.g. would require userspace to call an SGX ioctl() prior to using mprotect() to extend a page's protections. 
Signed-off-by: Sean Christopherson --- include/linux/mm.h | 2 ++ mm/mprotect.c | 15 +++ 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 0e8834ac32b7..50a42364a885 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -458,6 +458,8 @@ struct vm_operations_struct { void (*close)(struct vm_area_struct * area); int (*split)(struct vm_area_struct * area, unsigned long addr); int (*mremap)(struct vm_area_struct * area); + int (*mprotect)(struct vm_area_struct * area, unsigned long start, + unsigned long end, unsigned long prot); vm_fault_t (*fault)(struct vm_fault *vmf); vm_fault_t (*huge_fault)(struct vm_fault *vmf, enum page_entry_size pe_size); diff --git a/mm/mprotect.c b/mm/mprotect.c index bf38dfbbb4b4..e466ca5e4fe0 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -547,13 +547,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len, goto out; } - error = security_file_mprotect(vma, reqprot, prot); - if (error) - goto out; - tmp = vma->vm_end; if (tmp > end) tmp = end; + + if (vma->vm_ops && vma->vm_ops->mprotect) { + error = vma->vm_ops->mprotect(vma, nstart, tmp, prot); + if (error) + goto out; + } + + error = security_file_mprotect(vma, reqprot, prot); + if (error) + goto out; + error = mprotect_fixup(vma, &prev, nstart, tmp, newflags); if (error) goto out; -- 2.21.0
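The shape of the hook above — an optional function pointer in a per-VMA ops table, consulted before the generic checks — can be sketched in a few lines of userspace C. Everything here is a miniature stand-in (toy `vm_area`/`vm_ops` structs, a made-up policy callback), not kernel code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct vm_area;	/* forward declaration for the ops table */

/* Miniature vm_operations_struct with the new optional hook. */
struct vm_ops {
	int (*mprotect)(struct vm_area *vma, unsigned long start,
			unsigned long end, unsigned long prot);
};

struct vm_area {
	const struct vm_ops *ops;
	unsigned long allowed_prot;	/* driver-private policy */
};

/* Example driver policy: deny anything beyond the cached maximum. */
static int deny_beyond_allowed(struct vm_area *vma, unsigned long start,
			       unsigned long end, unsigned long prot)
{
	(void)start; (void)end;
	return (prot & ~vma->allowed_prot) ? -EACCES : 0;
}

static const struct vm_ops restricted_ops = {
	.mprotect = deny_beyond_allowed,
};

/* Mirror of the do_mprotect_pkey() change: call the hook first,
 * when one exists; plain VMAs are unaffected. */
int do_mprotect(struct vm_area *vma, unsigned long start,
		unsigned long end, unsigned long prot)
{
	if (vma->ops && vma->ops->mprotect) {
		int error = vma->ops->mprotect(vma, start, end, prot);
		if (error)
			return error;
	}
	/* security_file_mprotect() + mprotect_fixup() would follow here */
	return 0;
}
```

Because the pointer is optional, only drivers that opt in (such as SGX) pay any cost; every other mapping takes the unchanged path.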
Re: [RFCv2 4/6] mm: factor out madvise's core functionality
On Fri, May 31, 2019 at 04:35:45PM +0200, Oleksandr Natalenko wrote: > On Fri, May 31, 2019 at 10:12:26PM +0900, Minchan Kim wrote: > > On Fri, May 31, 2019 at 09:04:20AM +0200, Oleksandr Natalenko wrote: > > > On Fri, May 31, 2019 at 03:43:11PM +0900, Minchan Kim wrote: > > > > This patch factor out madvise's core functionality so that upcoming > > > > patch can reuse it without duplication. It shouldn't change any > > > > behavior. > > > > > > > > Signed-off-by: Minchan Kim > > > > --- > > > > mm/madvise.c | 188 +++ > > > > 1 file changed, 101 insertions(+), 87 deletions(-) > > > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > > > index 9d749a1420b4..466623ea8c36 100644 > > > > --- a/mm/madvise.c > > > > +++ b/mm/madvise.c > > > > @@ -425,9 +425,10 @@ static int madvise_pageout_pte_range(pmd_t *pmd, > > > > unsigned long addr, > > > > struct page *page; > > > > int isolated = 0; > > > > struct vm_area_struct *vma = walk->vma; > > > > + struct task_struct *task = walk->private; > > > > unsigned long next; > > > > > > > > - if (fatal_signal_pending(current)) > > > > + if (fatal_signal_pending(task)) > > > > return -EINTR; > > > > > > > > next = pmd_addr_end(addr, end); > > > > @@ -505,12 +506,14 @@ static int madvise_pageout_pte_range(pmd_t *pmd, > > > > unsigned long addr, > > > > } > > > > > > > > static void madvise_pageout_page_range(struct mmu_gather *tlb, > > > > -struct vm_area_struct *vma, > > > > -unsigned long addr, unsigned long end) > > > > + struct task_struct *task, > > > > + struct vm_area_struct *vma, > > > > + unsigned long addr, unsigned long end) > > > > { > > > > struct mm_walk warm_walk = { > > > > .pmd_entry = madvise_pageout_pte_range, > > > > .mm = vma->vm_mm, > > > > + .private = task, > > > > }; > > > > > > > > tlb_start_vma(tlb, vma); > > > > @@ -519,9 +522,9 @@ static void madvise_pageout_page_range(struct > > > > mmu_gather *tlb, > > > > } > > > > > > > > > > > > -static long madvise_pageout(struct vm_area_struct *vma, > > > > - 
struct vm_area_struct **prev, > > > > - unsigned long start_addr, unsigned long > > > > end_addr) > > > > +static long madvise_pageout(struct task_struct *task, > > > > + struct vm_area_struct *vma, struct vm_area_struct > > > > **prev, > > > > + unsigned long start_addr, unsigned long end_addr) > > > > { > > > > struct mm_struct *mm = vma->vm_mm; > > > > struct mmu_gather tlb; > > > > @@ -532,7 +535,7 @@ static long madvise_pageout(struct vm_area_struct > > > > *vma, > > > > > > > > lru_add_drain(); > > > > tlb_gather_mmu(&tlb, mm, start_addr, end_addr); > > > > - madvise_pageout_page_range(&tlb, vma, start_addr, end_addr); > > > > + madvise_pageout_page_range(&tlb, task, vma, start_addr, > > > > end_addr); > > > > tlb_finish_mmu(&tlb, start_addr, end_addr); > > > > > > > > return 0; > > > > @@ -744,7 +747,8 @@ static long madvise_dontneed_single_vma(struct > > > > vm_area_struct *vma, > > > > return 0; > > > > } > > > > > > > > -static long madvise_dontneed_free(struct vm_area_struct *vma, > > > > +static long madvise_dontneed_free(struct mm_struct *mm, > > > > + struct vm_area_struct *vma, > > > > struct vm_area_struct **prev, > > > > unsigned long start, unsigned long > > > > end, > > > > int behavior) > > > > @@ -756,8 +760,8 @@ static long madvise_dontneed_free(struct > > > > vm_area_struct *vma, > > > > if (!userfaultfd_remove(vma, start, end)) { > > > > *prev = NULL; /* mmap_sem has been dropped, prev is > > > > stale */ > > > > > > > > - down_read(&current->mm->mmap_sem); > > > > - vma = find_vma(current->mm, start); > > > > + down_read(&mm->mmap_sem); > > > > + vma = find_vma(mm, start); > > > > if (!vma) > > > > return -ENOMEM; > > > > if (start < vma->vm_start) { > > > > @@ -804,7 +808,8 @@ static long madvise_dontneed_free(struct > > > > vm_area_struct *vma, > > > > * Application wants to free up the pages and associated backing store. > > > > * This is effectively punching a hole into the middle of a file. 
> > > > */ > > > > -static long madvise_remove(struct vm_area_struct *vma, > > > > +static long madvise_remove(struct mm_struct *mm, > > > > + struct vm_area_struct *vma, > > > > struct vm_area_struct **prev, > > >
Re: [PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering
Hello Saravana, On 5/23/19 6:01 PM, Saravana Kannan wrote: ... > Having functional dependencies explicitly called out in DT and > automatically added before the devices are probed, provides the > following benefits: ... > - Supplier devices like clock providers, regulators providers, etc > need to keep the resources they provide active and at a particular > state(s) during boot up even if their current set of consumers don't > request the resource to be active. This is because the rest of the > consumers might not have probed yet and turning off the resource > before all the consumers have probed could lead to a hang or > undesired user experience. This benefit provided by the sync_state() callback function introduced in this series gives us a mechanism to solve a specific problem encountered on Qualcomm Technologies, Inc. (QTI) boards when booting with drivers compiled as modules. QTI boards have a regulator that powers the PHYs for display, camera, USB, UFS, and PCIe. When these boards boot up, the boot loader enables this regulator along with other resources in order to display a splash screen image. The regulator must remain enabled until the Linux display driver has probed and made a request with the regulator framework to keep the regulator enabled. If the regulator is disabled prematurely, then the screen image is corrupted and the display hardware enters a bad state. We have observed that when the camera driver probes before the display driver, it performs this sequence: regulator_enable(), camera register IO, regulator_disable(). Since it is the first consumer of the shared regulator, the regulator is physically disabled (even though display hardware still requires it to be enabled). We have solved this problem when compiling drivers statically with a downstream regulator proxy-consumer driver. This proxy-consumer is able to make an enable request for the shared regulator before any other consumer. It then removes its request at late_initcall_sync. 
Unfortunately, when drivers are compiled as modules instead of compiled statically into the kernel image, late_initcall_sync is not a meaningful marker of driver probe completion. This means that our existing proxy voting system will not work when drivers are compiled as modules. The sync_state() callback resolves this issue by providing a notification that is guaranteed to arrive only after all consumers of the shared regulator have probed. QTI boards have other cases of shared resources such as bus bandwidth which must remain at least at a level set by boot loaders in order to properly support hardware blocks that are enabled before the Linux kernel starts booting. Take care, David -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
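The proxy-consumer scheme described above amounts to a boot-time vote that is only dropped once sync_state() guarantees all real consumers have probed. A toy model makes the ordering hazard and its fix concrete; the `reg`/`boot_hold` names and the simplified vote counting are illustrative, not the actual regulator framework API:

```c
#include <assert.h>

/* Toy shared regulator with a boot-time proxy hold, mirroring the
 * QTI proxy-consumer scheme described above. */
struct reg {
	int use_count;	/* consumer enable requests */
	int boot_hold;	/* 1 while the boot-time proxy vote is in place */
	int powered;
};

static void update(struct reg *r)
{
	/* The rail stays up while anyone votes for it. */
	r->powered = (r->use_count > 0) || r->boot_hold;
}

void reg_enable(struct reg *r)	{ r->use_count++; update(r); }
void reg_disable(struct reg *r)	{ r->use_count--; update(r); }

/* sync_state(): all consumers have now probed; drop the proxy vote. */
void reg_sync_state(struct reg *r)
{
	r->boot_hold = 0;
	update(r);
}
```

Without the proxy hold, the camera's enable/disable pair would drop use_count back to zero and physically power off the rail before the display driver ever got to vote — exactly the splash-screen corruption described above.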
Re: [PATCH v2 net-next 7/7] net: ethernet: ti: cpsw: add XDP support
On Sat, Jun 01, 2019 at 12:37:36AM +0200, Jesper Dangaard Brouer wrote: On Fri, 31 May 2019 20:03:33 +0300 Ivan Khoronzhuk wrote: Probably it's not a good example for others how it should be used, not a big problem to move it to separate pools.., even don't remember why I decided to use a shared pool, there were some more reasons... need to search in history. Using a shared pool makes it a lot harder to solve the issue I'm currently working on. That is handling/waiting for in-flight frames to complete, before removing the mem ID from the (r)hashtable lookup. I have working code that basically removes page_pool_destroy() from the public API, and instead lets xdp_rxq_info_unreg() call it when the in-flight count reaches zero (and delays fully removing the mem ID). Frankly, I don't see a reason why it can block smth; it can be considered like a not-shared pool. But Ok, anyway it can look more logical and can be reused by another SoC. I will add it per channel, not a problem, at least for now no blockers. Adding a pool per channel will create more page_pool_destroy() calls, one per pool, that can be dropped once you decide to remove it from the API. This API is called along with xdp_rxq_info_unreg(), and seems like not a problem to just remove page_pool_destroy(), except one case that worries me... cpsw has one interesting feature, sharing the same h/w with 2 network devices like dual mac, basically it's a 3 port switch, but used as 2 separate interfaces. So that, both of them share the same queues/channels/rings. XDP rxq requires a network device to be set in rxq info, which is used in the code as a pointer and is shared between xdp buffers, so can't be changed in flight. That's why each network interface has its own instances of rxq, but the page pools per each network device are common, so when I call xdp_rxq_info_unreg() per net device it doesn't mean I want to delete the page pool. But seems I can avoid it by calling xdp_rxq_info_unreg() for both when deleting page pools... 
-- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer -- Regards, Ivan Khoronzhuk
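The in-flight accounting Jesper describes — page_pool_destroy() leaving the public API, with xdp_rxq_info_unreg() freeing the pool only once the outstanding-page count reaches zero — can be modelled with a few lines of C. This is a toy single-threaded sketch of the lifecycle only (the real page_pool uses atomic counters and deferred work; all names below are illustrative):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy page pool with deferred teardown: the pool may only be torn
 * down once every outstanding page has been returned. */
struct pool {
	int inflight;		/* pages handed out, not yet returned */
	int destroy_requested;
	int destroyed;		/* stand-in flag for the real free */
};

struct pool *pool_create(void)
{
	return calloc(1, sizeof(struct pool));
}

static void pool_maybe_free(struct pool *p)
{
	if (p->destroy_requested && p->inflight == 0)
		p->destroyed = 1;
}

void pool_get_page(struct pool *p)
{
	p->inflight++;
}

void pool_put_page(struct pool *p)
{
	p->inflight--;
	pool_maybe_free(p);	/* last return may trigger the free */
}

/* Called from the unregister path: defer while pages are still out. */
void pool_destroy(struct pool *p)
{
	p->destroy_requested = 1;
	pool_maybe_free(p);
}
```

With one pool per RX queue, each queue's unregister path owns exactly one such lifecycle — which is what makes the shared-pool arrangement in cpsw awkward for this scheme.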
Re: [PATCH v2 00/17] net: introduce Qualcomm IPA driver
On Fri 31 May 12:19 PDT 2019, Arnd Bergmann wrote: > On Fri, May 31, 2019 at 6:36 PM Alex Elder wrote: > > On 5/31/19 9:58 AM, Dan Williams wrote: > > > On Thu, 2019-05-30 at 22:53 -0500, Alex Elder wrote: [..] > > So basically, the purpose of the rmnet driver is to handle QMAP > > protocol connections, and right now that's what the modem provides. > > Do you have any idea why this particular design was picked? From what I've seen of QMAP it seems like a reasonable design choice to have a software component (rmnet) dealing with this, separate from the transport. And I think IPA is the 4th or 5th mechanism for transporting QMAP packets back and forth to the modem. Downstream rmnet is copyright 2007-, and I know of interest in bringing at least one of the other transports upstream. Regards, Bjorn
Re: [RFCv2 6/6] mm: extend process_madvise syscall to support vector array
Hi Yann, On Fri, May 31, 2019 at 12:06:52PM +0200, Yann Droneaud wrote: > Hi, > > Le vendredi 31 mai 2019 à 15:43 +0900, Minchan Kim a écrit : > > > > diff --git a/include/uapi/asm-generic/mman-common.h > > b/include/uapi/asm-generic/mman-common.h > > index 92e347a89ddc..220c2b5eb961 100644 > > --- a/include/uapi/asm-generic/mman-common.h > > +++ b/include/uapi/asm-generic/mman-common.h > > @@ -75,4 +75,15 @@ > > #define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\ > > PKEY_DISABLE_WRITE) > > > > +struct pr_madvise_param { > > + int size; /* the size of this structure */ > > + int cookie; /* reserved to support atomicity */ > > + int nr_elem;/* count of below arrary fields */ > > Those should be unsigned. > > There's an implicit hole here on ABI with 64bits aligned pointers > > > + int __user *hints; /* hints for each range */ > > + /* to store result of each operation */ > > + const struct iovec __user *results; > > + /* input address ranges */ > > + const struct iovec __user *ranges; > > Using pointer type in uAPI structure require a 'compat' version of the > syscall need to be provided. > > If using iovec too. I will fix them when I submit next revision. Thanks for the review.
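Yann's two objections above — the implicit hole after `nr_elem` on 64-bit ABIs, and the need for a compat syscall whenever raw pointers or iovecs appear in a uAPI struct — are easy to demonstrate with `offsetof`. The sketch below contrasts the criticized layout with a compat-clean alternative carrying pointers as `u64`; both struct names are illustrative, not the eventual uAPI:

```c
#include <stddef.h>
#include <stdint.h>

/* The layout Yann objects to: on LP64, pointers are 8-byte aligned,
 * so 4 bytes of implicit padding appear after nr_elem, and the
 * struct differs between 32- and 64-bit userspace. */
struct madvise_param_bad {
	int32_t size;
	int32_t cookie;
	int32_t nr_elem;
	/* implicit 4-byte hole here on 64-bit ABIs */
	void *hints;
};

/* A compat-clean layout: fixed-width unsigned fields only, user
 * pointers carried as u64, identical on every ABI, so no compat
 * syscall wrapper is needed. */
struct madvise_param_good {
	uint32_t size;
	uint32_t cookie;
	uint32_t nr_elem;
	uint32_t reserved;	/* make the padding explicit */
	uint64_t hints;		/* user pointer cast to u64 */
};
```

The kernel side would read `hints` with `u64_to_user_ptr()`, and the same trick applies to the iovec arrays (e.g. carrying base/len pairs as u64/u64).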
Re: [RFCv2 3/6] mm: introduce MADV_PAGEOUT
Hey Johannes, On Fri, May 31, 2019 at 12:59:27PM -0400, Johannes Weiner wrote: > Hi Minchan, > > this looks pretty straight-forward to me, only one kink: > > On Fri, May 31, 2019 at 03:43:10PM +0900, Minchan Kim wrote: > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -2126,6 +2126,83 @@ static void shrink_active_list(unsigned long > > nr_to_scan, > > nr_deactivate, nr_rotated, sc->priority, file); > > } > > > > +unsigned long reclaim_pages(struct list_head *page_list) > > +{ > > + int nid = -1; > > + unsigned long nr_isolated[2] = {0, }; > > + unsigned long nr_reclaimed = 0; > > + LIST_HEAD(node_page_list); > > + struct reclaim_stat dummy_stat; > > + struct scan_control sc = { > > + .gfp_mask = GFP_KERNEL, > > + .priority = DEF_PRIORITY, > > + .may_writepage = 1, > > + .may_unmap = 1, > > + .may_swap = 1, > > + }; > > + > > + while (!list_empty(page_list)) { > > + struct page *page; > > + > > + page = lru_to_page(page_list); > > + if (nid == -1) { > > + nid = page_to_nid(page); > > + INIT_LIST_HEAD(&node_page_list); > > + nr_isolated[0] = nr_isolated[1] = 0; > > + } > > + > > + if (nid == page_to_nid(page)) { > > + list_move(&page->lru, &node_page_list); > > + nr_isolated[!!page_is_file_cache(page)] += > > + hpage_nr_pages(page); > > + continue; > > + } > > + > > + mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_ANON, > > + nr_isolated[0]); > > + mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_FILE, > > + nr_isolated[1]); > > + nr_reclaimed += shrink_page_list(&node_page_list, > > + NODE_DATA(nid), &sc, TTU_IGNORE_ACCESS, > > + &dummy_stat, true); > > + while (!list_empty(&node_page_list)) { > > + struct page *page = lru_to_page(&node_page_list); > > + > > + list_del(&page->lru); > > + putback_lru_page(page); > > + } > > + mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_ANON, > > + -nr_isolated[0]); > > + mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_FILE, > > + -nr_isolated[1]); > > + nid = -1; > > + } > > + > > + if (!list_empty(&node_page_list)) { > > + mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_ANON, > 
> + nr_isolated[0]); > > + mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_FILE, > > + nr_isolated[1]); > > + nr_reclaimed += shrink_page_list(&node_page_list, > > + NODE_DATA(nid), &sc, TTU_IGNORE_ACCESS, > > + &dummy_stat, true); > > + mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_ANON, > > + -nr_isolated[0]); > > + mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_FILE, > > + -nr_isolated[1]); > > + > > + while (!list_empty(&node_page_list)) { > > + struct page *page = lru_to_page(&node_page_list); > > + > > + list_del(&page->lru); > > + putback_lru_page(page); > > + } > > + > > + } > > The NR_ISOLATED accounting, nid parsing etc. is really awkward and > makes it hard to see what the function actually does. > > Can you please make those ISOLATED counters part of the isolation API? > Your patch really shows this is an overdue cleanup. Yeah, that was very painful. > > These are fast local percpu counters, we don't need the sprawling > batching we do all over vmscan.c, migrate.c, khugepaged.c, > compaction.c etc. Isolation can increase the counter page by page, and > reclaim or putback can likewise decrease them one by one. > > It looks like mlock is the only user of the isolation api that does > not participate in the NR_ISOLATED_* counters protocol, but I don't > see why it wouldn't, or why doing so would hurt. > > There also seem to be quite a few callsites that use the atomic > versions of the counter API when they're clearly under the irqsafe > lru_lock. That would be fixed automatically by this work as well. I agree on all points, so I will prepare a cleanup patch.
Re: Linux 5.1.6
Thanks, a bunch Greg! On 08:53 Fri 31 May , Greg KH wrote: I'm announcing the release of the 5.1.6 kernel. All users of the 5.1 kernel series must upgrade. The updated 5.1.y git tree can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.1.y and can be browsed at the normal kernel.org git web browser: https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary thanks, greg k-h Documentation/arm64/silicon-errata.txt |1 Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt |6 Makefile |2 arch/arm/include/asm/cp15.h|2 arch/arm/vdso/vgettimeofday.c |5 arch/arm64/Kconfig | 19 arch/arm64/include/asm/cpucaps.h |3 arch/arm64/include/asm/futex.h |2 arch/arm64/include/asm/pgtable.h |3 arch/arm64/include/asm/vdso_datapage.h |1 arch/arm64/kernel/asm-offsets.c|2 arch/arm64/kernel/cpu_errata.c | 24 arch/arm64/kernel/cpu_ops.c|1 arch/arm64/kernel/kaslr.c |6 arch/arm64/kernel/module.c |2 arch/arm64/kernel/syscall.c| 31 + arch/arm64/kernel/vdso.c |3 arch/arm64/kernel/vdso/gettimeofday.S |7 arch/arm64/mm/dma-mapping.c| 10 arch/arm64/mm/fault.c | 37 + arch/powerpc/boot/addnote.c|6 arch/powerpc/kernel/head_64.S |4 arch/powerpc/kernel/watchdog.c | 81 +- arch/powerpc/mm/numa.c | 18 arch/powerpc/perf/imc-pmu.c|7 arch/powerpc/platforms/powernv/opal-imc.c |2 arch/s390/kernel/kexec_elf.c |7 arch/s390/mm/pgtable.c |2 arch/sh/include/cpu-sh4/cpu/sh7786.h |2 arch/x86/Makefile |2 arch/x86/events/intel/cstate.c |2 arch/x86/events/intel/rapl.c |2 arch/x86/events/msr.c |1 arch/x86/ia32/ia32_signal.c| 29 arch/x86/include/asm/text-patching.h |4 arch/x86/include/asm/uaccess.h |7 arch/x86/kernel/alternative.c | 28 arch/x86/kernel/cpu/hygon.c|5 arch/x86/kernel/cpu/mce/core.c | 66 +- arch/x86/kernel/cpu/mce/inject.c | 14 arch/x86/kernel/cpu/microcode/core.c |3 arch/x86/kernel/ftrace.c |8 arch/x86/kernel/irq_64.c | 19 arch/x86/kernel/module.c |2 arch/x86/kernel/signal.c | 29 arch/x86/kernel/vmlinux.lds.S |6 arch/x86/kvm/irq.c |7 arch/x86/kvm/irq.h 
|1 arch/x86/kvm/pmu_amd.c |2 arch/x86/kvm/svm.c |6 arch/x86/kvm/vmx/nested.c |4 arch/x86/kvm/x86.c |2 arch/x86/lib/memcpy_64.S |3 arch/x86/mm/fault.c|2 arch/x86/platform/uv/tlb_uv.c |7 block/bio.c|2 block/blk-mq-sched.c | 12 block/blk-mq.c | 139 ++-- block/blk.h|2 block/genhd.c | 19 block/partition-generic.c
[PATCH] regulator: bd718x7: Drop unused include
This driver does not use any symbols from this header, so just drop the include. Cc: Matti Vaittinen Signed-off-by: Linus Walleij --- drivers/regulator/bd718x7-regulator.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/regulator/bd718x7-regulator.c b/drivers/regulator/bd718x7-regulator.c index fde4264da6ff..8c22cfb76173 100644 --- a/drivers/regulator/bd718x7-regulator.c +++ b/drivers/regulator/bd718x7-regulator.c @@ -4,7 +4,6 @@ #include #include -#include #include #include #include -- 2.20.1
[PATCH] regulator: bd70528: Drop unused include
This driver does not use any symbols from this header, so just drop the include. Cc: Matti Vaittinen Signed-off-by: Linus Walleij --- drivers/regulator/bd70528-regulator.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/regulator/bd70528-regulator.c b/drivers/regulator/bd70528-regulator.c index 30e3ed430a8a..0248a61f1006 100644 --- a/drivers/regulator/bd70528-regulator.c +++ b/drivers/regulator/bd70528-regulator.c @@ -4,7 +4,6 @@ #include #include -#include #include #include #include -- 2.20.1
Re: [PATCH] Revert "usb: core: remove local_irq_save() around ->complete() handler"
On 01.06.19 00:05, Greg Kroah-Hartman wrote: > On Fri, May 31, 2019 at 11:53:40PM +0200, Soeren Moch wrote: >> This reverts commit ed194d1367698a0872a2b75bbe06b3932ce9df3a. >> >> In contrast to the original patch description, apparently not all handlers >> were audited properly. E.g. my RT5370 based USB WIFI adapter (driver in >> drivers/net/wireless/ralink/rt2x00) hangs after a while under heavy load. >> This revert fixes this. > Why not just fix that driver? Wouldn't that be easier? > I suspect there are more drivers to fix. I only tested WIFI sticks so far, RTL8188 drivers also seem to suffer from this. I'm not sure how to fix all this properly, maybe Sebastian as original patch author can help here. This patch is mostly for -stable, to get an acceptable solution quickly. It was really annoying to get such an unstable WIFI connection over the last three kernel releases to my development board. Since my internet service provider forcefully updated my router box 3 weeks ago, I unfortunately see the same symptoms on my primary internet access. That's even worse, I need to reset this router box every few days. I'm not sure, however, that this is caused by the same problem, but it feels like this. So can we please fix this regression quickly and work out a proper fix later? In the original patch there is no reason given why this change is necessary. With this revert I at least see a stable connection. Thanks, Soeren
[PATCH] regulator: arizona-micsupp: Delete unused include
This driver uses no symbols from this header, so just drop this include. Signed-off-by: Linus Walleij --- drivers/regulator/arizona-micsupp.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/regulator/arizona-micsupp.c b/drivers/regulator/arizona-micsupp.c index be0d46da51a1..4876b1ceef23 100644 --- a/drivers/regulator/arizona-micsupp.c +++ b/drivers/regulator/arizona-micsupp.c @@ -16,7 +16,6 @@ #include #include #include -#include #include #include #include -- 2.20.1
Re: [PATCH v2 net-next 7/7] net: ethernet: ti: cpsw: add XDP support
On Fri, May 31, 2019 at 10:08:03PM +, Saeed Mahameed wrote: On Fri, 2019-05-31 at 20:03 +0300, Ivan Khoronzhuk wrote: On Fri, May 31, 2019 at 06:32:41PM +0200, Jesper Dangaard Brouer wrote: > On Fri, 31 May 2019 19:25:24 +0300 Ivan Khoronzhuk < > ivan.khoronz...@linaro.org> wrote: > > > On Fri, May 31, 2019 at 05:46:43PM +0200, Jesper Dangaard Brouer > > wrote: > > > From below code snippets, it looks like you only allocated 1 > > > page_pool > > > and sharing it with several RX-queues, as I don't have the full > > > context > > > and don't know this driver, I might be wrong? > > > > > > To be clear, a page_pool object is needed per RX-queue, as it > > > is > > > accessing a small RX page cache (which protected by > > > NAPI/softirq). > > > > There is one RX interrupt and one RX NAPI for all rx channels. > > So, what are you saying? > > You _are_ sharing the page_pool between several RX-channels, but it > is > safe because this hardware only have one RX interrupt + NAPI > instance?? I can miss smth but in case of cpsw technically it means: 1) RX interrupts are disabled while NAPI is scheduled, not for particular CPU or channel, but at all, for whole cpsw module. 2) RX channels are handled one by one by priority. Hi Ivan, I got a silly question.. What is the reason behind having multiple RX rings and one CPU/NAPI handling all of them ? priority ? how do you priorities ? Several. One of the reasons, from what I know: it can handle several cpus/napi, but because of errata on some SoCs (or for all of them) it was discarded, though the idea was that it can. Second, it uses the same davinci_cpdma API as tx channels that can be rate limited, and it's used not only by cpsw but also by another driver, so it can't be modified easily and there is no reason to. And third, the h/w has the ability to steer some filtered traffic to rx queues and can potentially be configured with ethtool ntuples or so, but it's not implemented yet. 3) After all of them handled and no more in budget - interrupts are enabled. 
4) If page is returned to the pool, and it's within NAPI, no races as it's returned protected by softirq. If it's returned not in softirq it's protected by producer lock of the ring. Probably it's not good example for others how it should be used, not a big problem to move it to separate pools.., even don't remember why I decided to use shared pool, there was some more reasons... need search in history. > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Principal Kernel Engineer at Red Hat > LinkedIn: http://www.linkedin.com/in/brouer -- Regards, Ivan Khoronzhuk
Re: [RESEND PATCH v4 6/6] leds: lm36274: Introduce the TI LM36274 LED driver
Jacek

On 5/31/19 4:57 PM, Jacek Anaszewski wrote:
Dan,

On 5/31/19 11:07 PM, Dan Murphy wrote:
Hello

On 5/31/19 2:44 PM, Jacek Anaszewski wrote:
On 5/31/19 8:23 AM, Lee Jones wrote:
On Thu, 30 May 2019, Jacek Anaszewski wrote:
On 5/30/19 9:38 AM, Lee Jones wrote:
On Wed, 29 May 2019, Jacek Anaszewski wrote:
On 5/29/19 3:58 PM, Lee Jones wrote:
On Fri, 24 May 2019, Jacek Anaszewski wrote:
Hi,

On 5/23/19 9:09 PM, Dan Murphy wrote:
Pavel

Thanks for the review

On 5/23/19 7:50 AM, Pavel Machek wrote:
Hi!

+++ b/drivers/leds/leds-lm36274.c

+static int lm36274_parse_dt(struct lm36274 *lm36274_data)
+{
+	struct fwnode_handle *child = NULL;
+	char label[LED_MAX_NAME_SIZE];
+	struct device *dev = &lm36274_data->pdev->dev;
+	const char *name;
+	int child_cnt;
+	int ret = -EINVAL;
+
+	/* There should only be 1 node */
+	child_cnt = device_get_child_node_count(dev);
+	if (child_cnt != 1)
+		return ret;

I'd do explicit "return -EINVAL" here.

ACK

+static int lm36274_probe(struct platform_device *pdev)
+{
+	struct ti_lmu *lmu = dev_get_drvdata(pdev->dev.parent);
+	struct lm36274 *lm36274_data;
+	int ret;
+
+	lm36274_data = devm_kzalloc(&pdev->dev, sizeof(*lm36274_data),
+				    GFP_KERNEL);
+	if (!lm36274_data) {
+		ret = -ENOMEM;
+		return ret;
+	}

And certainly do "return -ENOMEM" explicitly here.

ACK

Acked-by: Pavel Machek

I've done all amendments requested by Pavel and updated branch
ib-leds-mfd-regulator on linux-leds.git, but at the same time

What do you mean by updated? You cannot update an 'ib' (immutable
branch). Immutable means that it cannot change, by definition. We have
already talked about that.

Nobody has pulled, so the branch could have been safely updated.

You have no sure way to know that. And since I have no way to know, or
faith that you won't update it again, pulling it now/at all would seem
like a foolish thing to do.

Sorry, but you are simply unjust.
You're pretending to portray the situation as if I have been notoriously
causing merge conflicts in linux-next, which did not take place.

Just to recap what this discussion is about:

On 7 Apr 2019:
1. I sent pull request [0].
2. 45 minutes later I updated it after discovering one omission [1].
   There was a rather small chance for it to be pulled as quickly as that.
   And even if it had happened, it wouldn't have been much harm - we
   wouldn't have lost e.g. weeks of testing in linux-next due to that fact.

On 21 May 2019:
3. I sent another pull request [2] to you and the REGULATOR maintainers.
   After it turned out that the lack of feedback from the REGULATOR
   maintainers was caused by failing to send them the exact copies of the
   patches to review, I informed you about the possible need for updating
   the branch. Afterwards I received a reply from you saying that you
   hadn't pulled the branch anyway. At that point I was sure that neither
   the MFD nor the REGULATOR tree contained the patches. And only after
   that I updated the branch.

Here are 2 examples where you have changed immutable branches, which is
100% of the Pull Requests I have received from you. Using that record as a
benchmark, the situation hardly seems unjust.

Until you can provide me with an assurance that you will not keep
updating/changing the supposedly immutable pull-requests you send out, I
won't be pulling any more in.

I can just uphold the assurance which is implicitly assumed for anyone who
has never broken the proclaimed rules. As justified above.

You have broken the rules every (100% of the) time. Yes, I admit, I would
lose in court.

[0] https://lore.kernel.org/patchwork/patch/1059075/
[1] https://lore.kernel.org/patchwork/patch/1059080/
[2] https://lore.kernel.org/patchwork/patch/1077066/

So we have 2 choices moving forward; you can either provide me with
assurance that you have learned from this experience and will never change
an *immutable* branch again, or I can continue to handle them, which has
been the preference for some years.
If you choose the former and adaptations need to be made in the future, the
correct thing to do is create a *new*, different pull-request which has its
own *new*, different tag, but uses the original tag as a base.

I choose the former. That being said: hereby I solemnly declare to never
ever change an immutable branch again.

So how do I proceed with the requested change by Mark B on the LM36274
driver? Do I add a patch on top? Or do I submit a patch to the regulator
tree once the PR is pulled? Won't the change be a dependency for
[PATCH v4 1/6]?

Yes, that's why I am asking, as we would need to change the branch.

Dan

In each case, having all the commits in one set (and branch) should
simplify the integration.
[PATCH] media: cxusb-analog: Use ARRAY_SIZE for cxusub_medion_pin_config
Use ARRAY_SIZE for computing element count of cxusub_medion_pin_config
array as suggested by the kbuild test robot.

Reported-by: kbuild test robot
Signed-off-by: Maciej S. Szmigiero
---
 drivers/media/usb/dvb-usb/cxusb-analog.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/media/usb/dvb-usb/cxusb-analog.c b/drivers/media/usb/dvb-usb/cxusb-analog.c
index 9b42ca71c177..51d3cba32b60 100644
--- a/drivers/media/usb/dvb-usb/cxusb-analog.c
+++ b/drivers/media/usb/dvb-usb/cxusb-analog.c
@@ -1622,8 +1622,7 @@ int cxusb_medion_analog_init(struct dvb_usb_device *dvbdev)
 	/* TODO: setup audio samples insertion */

 	ret = v4l2_subdev_call(cxdev->cx25840, core, s_io_pin_config,
-			       sizeof(cxusub_medion_pin_config) /
-			       sizeof(cxusub_medion_pin_config[0]),
+			       ARRAY_SIZE(cxusub_medion_pin_config),
 			       cxusub_medion_pin_config);
 	if (ret != 0)
 		dev_warn(&dvbdev->udev->dev,
[PATCH bpf v2] bpf: preallocate a perf_sample_data per event fd
It is possible that a BPF program can be called while another BPF program
is executing bpf_perf_event_output. This has been observed with I/O
completion occurring as a result of an interrupt:

bpf_prog_247fd1341cddaea4_trace_req_end+0x8d7/0x1000
 ? trace_call_bpf+0x82/0x100
 ? sch_direct_xmit+0xe2/0x230
 ? blk_mq_end_request+0x1/0x100
 ? blk_mq_end_request+0x5/0x100
 ? kprobe_perf_func+0x19b/0x240
 ? __qdisc_run+0x86/0x520
 ? blk_mq_end_request+0x1/0x100
 ? blk_mq_end_request+0x5/0x100
 ? kprobe_ftrace_handler+0x90/0xf0
 ? ftrace_ops_assist_func+0x6e/0xe0
 ? ip6_input_finish+0xbf/0x460
 ? 0xa01e80bf
 ? nbd_dbg_flags_show+0xc0/0xc0 [nbd]
 ? blkdev_issue_zeroout+0x200/0x200
 ? blk_mq_end_request+0x1/0x100
 ? blk_mq_end_request+0x5/0x100
 ? flush_smp_call_function_queue+0x6c/0xe0
 ? smp_call_function_single_interrupt+0x32/0xc0
 ? call_function_single_interrupt+0xf/0x20
 ? call_function_single_interrupt+0xa/0x20
 ? swiotlb_map_page+0x140/0x140
 ? refcount_sub_and_test+0x1a/0x50
 ? tcp_wfree+0x20/0xf0
 ? skb_release_head_state+0x62/0xc0
 ? skb_release_all+0xe/0x30
 ? napi_consume_skb+0xb5/0x100
 ? mlx5e_poll_tx_cq+0x1df/0x4e0
 ? mlx5e_poll_tx_cq+0x38c/0x4e0
 ? mlx5e_napi_poll+0x58/0xc30
 ? mlx5e_napi_poll+0x232/0xc30
 ? net_rx_action+0x128/0x340
 ? __do_softirq+0xd4/0x2ad
 ? irq_exit+0xa5/0xb0
 ? do_IRQ+0x7d/0xc0
 ? common_interrupt+0xf/0xf
 ? __rb_free_aux+0xf0/0xf0
 ? perf_output_sample+0x28/0x7b0
 ? perf_prepare_sample+0x54/0x4a0
 ? perf_event_output+0x43/0x60
 ? bpf_perf_event_output_raw_tp+0x15f/0x180
 ? blk_mq_start_request+0x1/0x120
 ? bpf_prog_411a64a706fc6044_should_trace+0xad4/0x1000
 ? bpf_trace_run3+0x2c/0x80
 ? nbd_send_cmd+0x4c2/0x690 [nbd]

This also cannot be alleviated by further splitting the per-cpu
perf_sample_data structs (as in commit 283ca526a9bd ("bpf: fix corruption
on concurrent perf_event_output calls")), as a raw_tp could be attached to
the block:block_rq_complete tracepoint and execute during another raw_tp.
Instead, keep a pre-allocated perf_sample_data structure per
perf_event_array element and fail a bpf_perf_event_output if that element
is concurrently being used.

Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data")
Signed-off-by: Matt Mullins
---
v1->v2: keep a pointer to the struct perf_sample_data rather than directly
embedding it in the structure, avoiding the circular include and removing
the need for in_use. Suggested by Song.

 include/linux/bpf.h      |  1 +
 kernel/bpf/arraymap.c    |  3 ++-
 kernel/trace/bpf_trace.c | 29 -
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4fb3aa2dc975..47fd85cfbbaf 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -472,6 +472,7 @@ struct bpf_event_entry {
 	struct file *perf_file;
 	struct file *map_file;
 	struct rcu_head rcu;
+	struct perf_sample_data *sd;
 };

 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 584636c9e2eb..c7f5d593e04f 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -654,11 +654,12 @@ static struct bpf_event_entry *bpf_event_entry_gen(struct file *perf_file,
 {
 	struct bpf_event_entry *ee;

-	ee = kzalloc(sizeof(*ee), GFP_ATOMIC);
+	ee = kzalloc(sizeof(*ee) + sizeof(struct perf_sample_data), GFP_ATOMIC);
 	if (ee) {
 		ee->event = perf_file->private_data;
 		ee->perf_file = perf_file;
 		ee->map_file = map_file;
+		ee->sd = (void *)ee + sizeof(*ee);
 	}

 	return ee;

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f92d6ad5e080..076f8e987355 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -410,17 +410,17 @@ static const struct bpf_func_proto bpf_perf_event_read_value_proto = {
 	.arg4_type = ARG_CONST_SIZE,
 };

-static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd);
-
 static __always_inline u64
 __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map,
-			u64 flags, struct perf_sample_data *sd)
+			u64 flags, struct perf_raw_record *raw)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	unsigned int cpu = smp_processor_id();
 	u64 index = flags & BPF_F_INDEX_MASK;
 	struct bpf_event_entry *ee;
 	struct perf_event *event;
+	struct perf_sample_data *sd;
+	u64 ret;

 	if (index == BPF_F_CURRENT_CPU)
Re: [PATCH v2 net-next 7/7] net: ethernet: ti: cpsw: add XDP support
On Fri, 31 May 2019 20:03:33 +0300 Ivan Khoronzhuk wrote:

> Probably it's not good example for others how it should be used, not
> a big problem to move it to separate pools.., even don't remember why
> I decided to use shared pool, there was some more reasons... need
> search in history.

Using a shared pool makes it a lot harder to solve the issue I'm currently
working on. That is handling/waiting for in-flight frames to complete,
before removing the mem ID from the (r)hashtable lookup.

I have working code that basically removes page_pool_destroy() from the
public API, and instead lets xdp_rxq_info_unreg() call it when the
in-flight count reaches zero (and delays fully removing the mem ID).

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
linux kernel page allocation failure and tuning of page reclaim procedure
Hi

We are using a Renesas RZ/A1 processor based custom target board. The
linux kernel version is 4.9.123.

1) the platform is a low memory platform having 64MB of memory.

2) we are doing around 45MB of TCP data transfer from PC to target using
the netcat utility. On the target, a process receives data over a socket
and writes the data to a flash disk.

3) At the start of the data transfer, we explicitly clear the linux kernel
cached memory by calling
echo 3 > /proc/sys/vm/drop_caches

4) during the TCP data transfer, we could see free -m showing "free"
getting dropped to almost 1MB and most of the memory appearing as "cached"

# free -m
                     total    used    free  shared  buffers  cached
Mem:                    57      56       1       0        2      42
-/+ buffers/cache:              12      45
Swap:                    0       0       0

5) sometimes, we observed kernel memory getting exhausted as a page
allocation failure happens in the kernel with the backtrace as printed
below in point 7):

6) we have certain questions as below:

a) how did the kernel memory get exhausted? is the kernel page reclaim
mechanism not executing at the right time? since at the time of the low
memory condition in the kernel, even though the free memory drops to 1MB,
there is still 42MB of cached memory available, and the page reclaim
procedure should have triggered, which would have synced the cached pages
to disk and used the same for kernel page allocations

b) are there any parameters available within the linux memory subsystem
with which the reclaim procedure can be monitored and fine tuned?

c) can some amount of free memory be reserved so that the linux kernel
does not cache it for page cache usage and the kernel can use it for its
other required page allocations (particularly gfp_atomic) as needed above
on behalf of the netcat nc process? can some tuning be done in the linux
memory subsystem, e.g. by using /proc/sys/vm/min_free_kbytes, to achieve
this objective?

d) can we be provided with further clues on how to debug this issue
further for the out of memory condition in the kernel?
7) Backtrace:

# [  775.947949] nc.traditional: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
[  775.956362] CPU: 0 PID: 1288 Comm: nc.traditional Tainted: G O 4.9.123-pic6-g31a13de-dirty #19
[  775.966085] Hardware name: Generic R7S72100 (Flattened Device Tree)
[  775.972501] [] (unwind_backtrace) from [] (show_stack+0xb/0xc)
[  775.980118] [] (show_stack) from [] (warn_alloc+0x89/0xba)
[  775.987361] [] (warn_alloc) from [] (__alloc_pages_nodemask+0x1eb/0x634)
[  775.995790] [] (__alloc_pages_nodemask) from [] (__alloc_page_frag+0x39/0xde)
[  776.004685] [] (__alloc_page_frag) from [] (__netdev_alloc_skb+0x51/0xb0)
[  776.013217] [] (__netdev_alloc_skb) from [] (sh_eth_poll+0xbf/0x3c0)
[  776.021342] [] (sh_eth_poll) from [] (net_rx_action+0x77/0x170)
[  776.029051] [] (net_rx_action) from [] (__do_softirq+0x107/0x160)
[  776.036896] [] (__do_softirq) from [] (irq_exit+0x5d/0x80)
[  776.044165] [] (irq_exit) from [] (__handle_domain_irq+0x57/0x8c)
[  776.052007] [] (__handle_domain_irq) from [] (gic_handle_irq+0x31/0x48)
[  776.060362] [] (gic_handle_irq) from [] (__irq_svc+0x65/0xac)
[  776.067835] Exception stack(0xc1cafd70 to 0xc1cafdb8)
[  776.072876] fd60: 0002751c c1dec6a0 000c 521c3be5
[  776.081042] fd80: 56feb08e f64823a6 ffb35f7b feab513d f9cb0643 056c c1caff10 e000
[  776.089204] fda0: b1f49160 c1cafdc4 c180c677 c0234ace 200e0033
[  776.095816] [] (__irq_svc) from [] (__copy_to_user_std+0x7e/0x430)
[  776.103796] [] (__copy_to_user_std) from [] (copy_page_to_iter+0x105/0x250)
[  776.112503] [] (copy_page_to_iter) from [] (skb_copy_datagram_iter+0xa3/0x108)
[  776.121469] [] (skb_copy_datagram_iter) from [] (tcp_recvmsg+0x3ab/0x5f4)
[  776.130045] [] (tcp_recvmsg) from [] (inet_recvmsg+0x21/0x2c)
[  776.137576] [] (inet_recvmsg) from [] (sock_read_iter+0x51/0x6e)
[  776.145384] [] (sock_read_iter) from [] (__vfs_read+0x97/0xb0)
[  776.152967] [] (__vfs_read) from [] (vfs_read+0x51/0xb0)
[  776.159983] [] (vfs_read) from [] (SyS_read+0x27/0x52)
[  776.166837] [] (SyS_read) from [] (ret_fast_syscall+0x1/0x54)
[  776.174308] Mem-Info:
[  776.176650] active_anon:2037 inactive_anon:23 isolated_anon:0
[  776.176650]  active_file:2636 inactive_file:7391 isolated_file:32
[  776.176650]  unevictable:0 dirty:1366 writeback:1281 unstable:0
[  776.176650]  slab_reclaimable:719 slab_unreclaimable:724
[  776.176650]  mapped:1990 shmem:26 pagetables:159 bounce:0
[  776.176650]  free:373 free_pcp:6 free_cma:0
[  776.209062] Node 0 active_anon:8148kB inactive_anon:92kB active_file:10544kB inactive_file:29564kB unevictable:0kB isolated(anon):0kB isolated(file):128kB mapped:7960kB dirty:5464kB writeback:5124kB shmem:104kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
[  776.233602] Normal free:1492kB min:964kB
Re: [PATCH v3] HID: fix A4Tech horizontal scrolling
On Sun, 12 May 2019, Błażej Szczygieł wrote:

> Since the recent high-resolution scrolling changes, the A4Tech driver
> must check for the "REL_WHEEL_HI_RES" usage code.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203369
> Fixes: 2dc702c991e3774af9d7ce410eef410ca9e2357e ("HID: input: use the
> Resolution Multiplier for high-resolution scrolling")
>
> Signed-off-by: Błażej Szczygieł

Applied, thanks.

--
Jiri Kosina
SUSE Labs
Re: [PATCH v2 03/17] soc: qcom: ipa: main code
On 5/31/19 4:50 PM, David Miller wrote: > From: Alex Elder > Date: Thu, 30 May 2019 22:53:34 -0500 > >> +void *route_virt; > ... >> +void *filter_virt; > ... > > If these are arrays of u64's, please declare them as "u64 *" instead of > the opaque "void *". Good idea. I hadn't paid attention to that. These tables are arrays of 64-bit addresses so it's better to represent them that way. Thanks. -Alex
Re: [PATCH] livepatch: Fix ftrace module text permissions race
On Fri, May 31, 2019 at 02:12:56PM -0500, Josh Poimboeuf wrote: > > Anyway, the above is a separate problem. This patch looks > > fine for the original problem. > > Thanks for the review. I'll post another version, with the above > changes and with the patches split up like Miroslav suggested. The latest patches are here: https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=fix-livepatch-ftrace-race If the bot likes them, I'll post them properly soon. -- Josh
Re: [PATCH v8 2/4] soc: qcom: Add AOSS QMP driver
Hi,

On Thu, May 30, 2019 at 8:01 PM Bjorn Andersson wrote:
>
> +/**
> + * qmp_send() - send a message to the AOSS
> + * @qmp: qmp context
> + * @data: message to be sent
> + * @len: length of the message
> + *
> + * Transmit @data to AOSS and wait for the AOSS to acknowledge the message.
> + * @len must be a multiple of 4 and not longer than the mailbox size. Access is
> + * synchronized by this implementation.
> + *
> + * Return: 0 on success, negative errno on failure
> + */
> +static int qmp_send(struct qmp *qmp, const void *data, size_t len)
> +{
> +	int ret;
> +
> +	if (WARN_ON(len + sizeof(u32) > qmp->size))
> +		return -EINVAL;
> +
> +	if (WARN_ON(len % sizeof(u32)))
> +		return -EINVAL;
> +
> +	mutex_lock(&qmp->tx_lock);
> +
> +	/* The message RAM only implements 32-bit accesses */
> +	__iowrite32_copy(qmp->msgram + qmp->offset + sizeof(u32),
> +			 data, len / sizeof(u32));
> +	writel(len, qmp->msgram + qmp->offset);
> +	qmp_kick(qmp);
> +
> +	ret = wait_event_interruptible_timeout(qmp->event,
> +					       qmp_message_empty(qmp), HZ);
> +	if (!ret) {
> +		dev_err(qmp->dev, "ucore did not ack channel\n");
> +		ret = -ETIMEDOUT;
> +
> +		/* Clear message from buffer */
> +		writel(0, qmp->msgram + qmp->offset);
> +	} else {
> +		ret = 0;
> +	}

Just like Vinod said in v7, the "ret = 0" is redundant.

> +static int qmp_qdss_clk_add(struct qmp *qmp)
> +{
> +	struct clk_init_data qdss_init = {
> +		.ops = &qmp_qdss_clk_ops,
> +		.name = "qdss",
> +	};

As I mentioned in v7, there is no downside in marking qdss_init as "static
const" and it avoids the compiler inserting a memcpy() to get this data on
the stack. Using static const also reduces your stack usage.
> +	int ret;
> +
> +	qmp->qdss_clk.init = &qdss_init;
> +	ret = clk_hw_register(qmp->dev, &qmp->qdss_clk);
> +	if (ret < 0) {
> +		dev_err(qmp->dev, "failed to register qdss clock\n");
> +		return ret;
> +	}
> +
> +	ret = of_clk_add_hw_provider(qmp->dev->of_node, of_clk_hw_simple_get,
> +				     &qmp->qdss_clk);

I still prefer to devm-ify the whole driver, using
devm_add_action_or_reset() to handle things where there is no devm.
...but I won't insist.

The above things are just nits and I won't insist. They could also be
addressed in follow-up patches. Thus:

Reviewed-by: Douglas Anderson
Re: [PATCH] drivers: hid: Add a module description line to the hid_hyperv driver
On Thu, 30 May 2019, Sasha Levin wrote: > From: Joseph Salisbury > > This patch only adds a MODULE_DESCRIPTION statement to the driver. > This change is only cosmetic, so there should be no runtime impact. > > Signed-off-by: Joseph Salisbury > Reviewed-by: Michael Kelley > Signed-off-by: Sasha Levin > --- > drivers/hid/hid-hyperv.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c > index 704049e62d58a..d3311d714d359 100644 > --- a/drivers/hid/hid-hyperv.c > +++ b/drivers/hid/hid-hyperv.c > @@ -614,5 +614,7 @@ static void __exit mousevsc_exit(void) > } > > MODULE_LICENSE("GPL"); > +MODULE_DESCRIPTION("Microsoft Hyper-V Synthetic HID Driver"); > + Is there a reason why you didn't CC Joseph on this submission? CCing Joseph and applying. Thanks, -- Jiri Kosina SUSE Labs
Re: hid-related 5.2-rc1 boot hang
On Thu, 30 May 2019, Dave Hansen wrote: > On 5/29/19 2:17 AM, Hans de Goede wrote: > ... > > Dave, can you try building your initrd without the hid-logitech-dj module > > included in the initrd? > > I did this on a vanilla 5.2-rc2 kernel (without the reverts) and still > experienced the boot hang while the device was inserted. > > > Also can you check if your modprobe is provided by module-init-tools > > or by kmod ? > > $ dpkg -S `which modprobe` > kmod: /sbin/modprobe Benjamin, Hans, are you looking into this? If not, I think we should start reverting (at least the request_module() changes, not sure about the rest of logitech issues yet) next week. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH v1 1/2] fork: add clone3
On Fri, May 31, 2019 at 01:38:29PM -0700, Linus Torvalds wrote: > On Wed, May 29, 2019 at 3:24 PM Andrei Vagin wrote: > > > > Thank you for thinking about time namespaces. I looked at this patch > > quickly and I would suggest to move a termination signal out of flags. I > > think we can add a separate field (exit_signal) into clone_args. Does it > > make sense? For me, exit_signal in flags always looked weird... > > I agree - the child signal in flags was always just a "it fits" kind > of thing, and that was obviously true back then, but is not true now. (Traveling until Monday, so sorry for delayed responses.) Yip, I agree that this is a good idea (answered Andrei's mail just now saying the same thing). I'll send out a new version of the patch with these changes added next week. Christian
Re: [PATCH v1 1/2] fork: add clone3
On Wed, May 29, 2019 at 03:24:15PM -0700, Andrei Vagin wrote: > On Wed, May 29, 2019 at 05:22:36PM +0200, Christian Brauner wrote: > > This adds the clone3 system call. > > > > As mentioned several times already (cf. [7], [8]) here's the promised > > patchset for clone3(). > > > > We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last > > free flag from clone(). > > > > Independent of the CLONE_PIDFD patchset a time namespace has been discussed > > at Linux Plumber Conference last year and has been sent out and reviewed > > (cf. [5]). It is expected that it will go upstream in the not too distant > > future. However, it relies on the addition of the CLONE_NEWTIME flag to > > clone(). The only other good candidate - CLONE_DETACHED - is currently not > > recyclable as we have identified at least two large or widely used > > codebases that currently pass this flag (cf. [2], [3], and [4]). Given that > > CLONE_PIDFD grabbed the last clone() flag the time namespace is effectively > > blocked. clone3() has the advantage that it will unblock this patchset > > again. > > Hi Christian, Hi Andrei, (Traveling until Monday, so sorry for delayed responses.) > > Thank you for thinking about time namespaces. I looked at this patch > quickly and I would suggest to move a termination signal out of flags. I > think we can add a separate field (exit_signal) into clone_args. Does it > make sense? For me, exit_signal in flags always looked weird... Yup, that does sound good to me. > > I will look at this patch more detailed later this week. Thanks. Excellent! Christian
Re: [PATCH v1 1/2] fork: add clone3
On Wed, May 29, 2019 at 05:42:14PM +0200, Yann Droneaud wrote:
> On Wednesday, 29 May 2019 at 17:22 +0200, Christian Brauner wrote:
> > This adds the clone3 system call.
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index b4cba953040a..6bc3e3d17150 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -2472,7 +2475,96 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
> >		 unsigned long, tls)
> >  #endif
> >  {
> > -	return _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr, tls);
> > +	struct kernel_clone_args args = {
> > +		.flags = clone_flags,
> > +		.stack = newsp,
> > +		.pidfd = parent_tidptr,
> > +		.parent_tidptr = parent_tidptr,
> > +		.tls = tls,
> > +		.child_tidptr = child_tidptr,
> > +	};
> > +
> > +	/* clone(CLONE_PIDFD) uses parent_tidptr to return a pidfd */
> > +	if ((clone_flags & CLONE_PIDFD) && (clone_flags & CLONE_PARENT_SETTID))
> > +		return -EINVAL;
> > +
> > +	return _do_fork(&args);
> > +}
> > +
> > +static bool clone3_flags_valid(u64 flags)
> > +{
> > +	if (flags & CLONE_DETACHED)
> > +		return false;
> > +
> > +	if (flags & ~CLONE_MAX)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
> > +				     struct clone_args __user *uargs,
> > +				     size_t size)
> > +{
> > +	struct clone_args args;
> > +
> > +	if (unlikely(size > PAGE_SIZE))
> > +		return -E2BIG;
> > +
> > +	if (unlikely(size < sizeof(struct clone_args)))
> > +		return -EINVAL;
> > +
> > +	if (unlikely(!access_ok(uargs, size)))
> > +		return -EFAULT;
> > +
> > +	if (size > sizeof(struct clone_args)) {
> > +		unsigned char __user *addr;
> > +		unsigned char __user *end;
> > +		unsigned char val;
> > +
> > +		addr = (void __user *)uargs + sizeof(struct clone_args);
> > +		end = (void __user *)uargs + size;
> > +
> > +		for (; addr < end; addr++) {
> > +			if (get_user(val, addr))
> > +				return -EFAULT;
> > +			if (val)
> > +				return -E2BIG;

Should be -EINVAL: having something
after the structure should be > handled just like an invalid flags, while still allowing future > userspace program to probe for support for newer feature. (Traveling until Monday, so sorry for delayed responses.) This copies what: kernel/sched/core.c:sched_copy_attr() kernel/event/core.c:perf_copy_attr() are already doing. Consistency might be good here but, I think. Christian
Re: [PATCH v2 00/17] net: introduce Qualcomm IPA driver
On 5/31/19 4:12 PM, Arnd Bergmann wrote: > On Fri, May 31, 2019 at 10:47 PM Alex Elder wrote: >> On 5/31/19 2:19 PM, Arnd Bergmann wrote: >>> On Fri, May 31, 2019 at 6:36 PM Alex Elder wrote: On 5/31/19 9:58 AM, Dan Williams wrote: > On Thu, 2019-05-30 at 22:53 -0500, Alex Elder wrote: >>> >>> Does this mean that IPA can only be used to back rmnet, and rmnet >>> can only be used on top of IPA, or can or both of them be combined >>> with another driver to talk to instead? >> >> No it does not mean that. >> >> As I understand it, one reason for the rmnet layer was to abstract >> the back end, which would allow using a modem, or using something >> else (a LAN?), without exposing certain details of the hardware. >> (Perhaps to support multiplexing, etc. without duplicating that >> logic in two "back-end" drivers?) >> >> To be perfectly honest, at first I thought having IPA use rmnet >> was a cargo cult thing like Dan suggested, because I didn't see >> the benefit. I now see why one would use that pass-through layer >> to handle the QMAP features. >> >> But back to your question. The other thing is that I see no >> reason the IPA couldn't present a "normal" (non QMAP) interface >> for a modem. It's something I'd really like to be able to do, >> but I can't do it without having the modem firmware change its >> configuration for these endpoints. My access to the people who >> implement the modem firmware has been very limited (something >> I hope to improve), and unless and until I can get corresponding >> changes on the modem side to implement connections that don't >> use QMAP, I can't implement such a thing. > > Why would that require firmware changes? What I was thinking > here is to turn the bits of the rmnet driver that actually do anything > interesting on the headers into a library module (or a header file > with inline functions) that can be called directly by the ipa driver, > keeping the protocol unchanged. 
You know, it's possible you're right about not needing firmware changes.
But it has always been my impression they would be needed. Here's why.

It looks like this:

       GSI Channel            GSI Channel
            |                      |
   ---------v---      -----     ---v---------
  | AP      (ep)|====| IPA |====|(ep)  Modem |
   -------------      -----     -------------

The AP and Modem each have IPA endpoints (ep), which use GSI channels, to
communicate with the IPA. Each endpoint has configuration options (such as
checksum offload). I *thought* that the configurations of the two
endpoints needed to be compatible (e.g., they need to agree on whether
they're aggregating). But with your questioning I now think you may be
right, that only the local endpoint's configuration matters. I will
inquire further on this.

I *know* that the AP and modem exchange some information about IPA
configuration, but looking more closely that looks like it's all about the
configuration of shared IPA resources, not endpoints.

That said, the broader design (including the user space code) surely
assumes rmnet, and I don't have any sense of what impact changing that
would make. I am sure that changing it would not be well received.

-Alex

>>> Always passing data from one netdev to another both ways
>>> sounds like it introduces both direct CPU overhead, and
>>> problems with flow control when data gets buffered inbetween.
>>
>> My impression is the rmnet driver is a pretty thin layer,
>> so the CPU overhead is probably not that great (though
>> deaggregating a message is expensive). I agree with you
>> on the flow control.
>
> The CPU overhead I mean is not from executing code in the
> rmnet driver, but from passing packets through the network
> stack between the two drivers, i.e. adding each frame to
> a queue and taking it back out. I'm not sure how this ends
> up working in reality but from a first look it seems like
> we might bounce in and out of the softirq handler inbetween.
>
> Arnd
Re: [RFC PATCH v2 11/12] x86/mm/tlb: Use async and inline messages for flushing
> On May 31, 2019, at 2:47 PM, Andy Lutomirski wrote: > > > On May 31, 2019, at 2:33 PM, Nadav Amit wrote: > >>> On May 31, 2019, at 2:14 PM, Andy Lutomirski wrote: >>> On Thu, May 30, 2019 at 11:37 PM Nadav Amit wrote: When we flush userspace mappings, we can defer the TLB flushes, as long the following conditions are met: 1. No tables are freed, since otherwise speculative page walks might cause machine-checks. 2. No one would access userspace before flush takes place. Specifically, NMI handlers and kprobes would avoid accessing userspace. >>> >>> I think I need to ask the big picture question. When someone calls >>> flush_tlb_mm_range() (or the other entry points), if no page tables >>> were freed, they want the guarantee that future accesses (initiated >>> observably after the flush returns) will not use paging entries that >>> were replaced by stores ordered before flush_tlb_mm_range(). We also >>> need the guarantee that any effects from any memory access using the >>> old paging entries will become globally visible before >>> flush_tlb_mm_range(). >>> >>> I'm wondering if receipt of an IPI is enough to guarantee any of this. >>> If CPU 1 sets a dirty bit and CPU 2 writes to the APIC to send an IPI >>> to CPU 1, at what point is CPU 2 guaranteed to be able to observe the >>> dirty bit? An interrupt entry today is fully serializing by the time >>> it finishes, but interrupt entries are epicly slow, and I don't know >>> if the APIC waits long enough. Heck, what if IRQs are off on the >>> remote CPU? There are a handful of places where we touch user memory >>> with IRQs off, and it's (sadly) possible for user code to turn off >>> IRQs with iopl(). >>> >>> I *think* that Intel has stated recently that SMT siblings are >>> guaranteed to stop speculating when you write to the APIC ICR to poke >>> them, but SMT is very special. >>> >>> My general conclusion is that I think the code needs to document what >>> is guaranteed and why. 
>> I think I might have managed to confuse you with a bug I made (last minute
>> bug when I was doing some cleanup). This bug does not affect the performance
>> much, but it might have led you to think that I use the APIC sending as
>> synchronization.
>>
>> The idea is not for us to rely on the write to the ICR as something
>> serializing. The flow should be as follows:
>>
>>   CPU0					CPU1
>>
>>   flush_tlb_mm_range()
>>   __smp_call_function_many()
>>    [ prepare call_single_data (csd) ]
>>    [ lock csd ]
>>    [ send IPI ]
>>   (*)
>>    [ wait for csd to be unlocked ]
>>					[ interrupt ]
>>					[ copy csd info to stack ]
>>					[ csd unlock ]
>>    [ find csd is unlocked ]
>>    [ continue (**) ]
>>					[ flush TLB ]
>>
>> At (**) the pages might be recycled, written back to disk, etc. Note that
>> during (*), CPU0 might do some local TLB flushes, making it very likely that
>> the CSD will be unlocked by the time it gets there.
>>
>> As you can see, I don't rely on any special micro-architectural behavior.
>> The synchronization is done purely in software.
>>
>> Does it make more sense now?
>
> Yes. Have you benchmarked this?

Partially. Numbers are indeed worse. Here are preliminary results, comparing
to v1 (concurrent):

  n_threads	before	concurrent	+async
  ---------	------	----------	------
  1		661	663		663
  2		1436	1225 (-14%)	1115 (-22%)
  4		1571	1421 (-10%)	1289 (-18%)

Note that the benefit of "async" would be greater if the initiator does not
flush the TLB at all. This might happen in the case of kswapd, for example.

Let me try some micro-optimizations first, run more benchmarks and get back
to you.
Re: [PATCH v2 net-next 7/7] net: ethernet: ti: cpsw: add XDP support
On Fri, 2019-05-31 at 20:03 +0300, Ivan Khoronzhuk wrote: > On Fri, May 31, 2019 at 06:32:41PM +0200, Jesper Dangaard Brouer > wrote: > > On Fri, 31 May 2019 19:25:24 +0300 Ivan Khoronzhuk < > > ivan.khoronz...@linaro.org> wrote: > > > > > On Fri, May 31, 2019 at 05:46:43PM +0200, Jesper Dangaard Brouer > > > wrote: > > > > From below code snippets, it looks like you only allocated 1 > > > > page_pool > > > > and sharing it with several RX-queues, as I don't have the full > > > > context > > > > and don't know this driver, I might be wrong? > > > > > > > > To be clear, a page_pool object is needed per RX-queue, as it > > > > is > > > > accessing a small RX page cache (which protected by > > > > NAPI/softirq). > > > > > > There is one RX interrupt and one RX NAPI for all rx channels. > > > > So, what are you saying? > > > > You _are_ sharing the page_pool between several RX-channels, but it > > is > > safe because this hardware only have one RX interrupt + NAPI > > instance?? > > I can miss smth but in case of cpsw technically it means: > 1) RX interrupts are disabled while NAPI is scheduled, >not for particular CPU or channel, but at all, for whole cpsw > module. > 2) RX channels are handled one by one by priority. Hi Ivan, I got a silly question.. What is the reason behind having multiple RX rings and one CPU/NAPI handling all of them ? priority ? how do you priorities ? > 3) After all of them handled and no more in budget - interrupts are > enabled. > 4) If page is returned to the pool, and it's within NAPI, no races as > it's >returned protected by softirq. If it's returned not in softirq > it's protected >by producer lock of the ring. > > Probably it's not good example for others how it should be used, not > a big > problem to move it to separate pools.., even don't remember why I > decided to > use shared pool, there was some more reasons... need search in > history. 
> > > -- > > Best regards, > > Jesper Dangaard Brouer > > MSc.CS, Principal Kernel Engineer at Red Hat > > LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH] Revert "usb: core: remove local_irq_save() around ->complete() handler"
On Fri, May 31, 2019 at 11:53:40PM +0200, Soeren Moch wrote: > This reverts commit ed194d1367698a0872a2b75bbe06b3932ce9df3a. > > In contrast to the original patch description, apparently not all handlers > were audited properly. E.g. my RT5370 based USB WIFI adapter (driver in > drivers/net/wireless/ralink/rt2x00) hangs after a while under heavy load. > This revert fixes this. Why not just fix that driver? Wouldn't that be easier? thanks, greg k-h
Re: [RESEND PATCH v4 6/6] leds: lm36274: Introduce the TI LM36274 LED driver
Dan, On 5/31/19 11:07 PM, Dan Murphy wrote: Hello On 5/31/19 2:44 PM, Jacek Anaszewski wrote: On 5/31/19 8:23 AM, Lee Jones wrote: On Thu, 30 May 2019, Jacek Anaszewski wrote: On 5/30/19 9:38 AM, Lee Jones wrote: On Wed, 29 May 2019, Jacek Anaszewski wrote: On 5/29/19 3:58 PM, Lee Jones wrote: On Fri, 24 May 2019, Jacek Anaszewski wrote: Hi, On 5/23/19 9:09 PM, Dan Murphy wrote: Pavel Thanks for the review On 5/23/19 7:50 AM, Pavel Machek wrote: Hi!

+++ b/drivers/leds/leds-lm36274.c
+static int lm36274_parse_dt(struct lm36274 *lm36274_data)
+{
+	struct fwnode_handle *child = NULL;
+	char label[LED_MAX_NAME_SIZE];
+	struct device *dev = &lm36274_data->pdev->dev;
+	const char *name;
+	int child_cnt;
+	int ret = -EINVAL;
+
+	/* There should only be 1 node */
+	child_cnt = device_get_child_node_count(dev);
+	if (child_cnt != 1)
+		return ret;

I'd do explicit "return -EINVAL" here. ACK

+static int lm36274_probe(struct platform_device *pdev)
+{
+	struct ti_lmu *lmu = dev_get_drvdata(pdev->dev.parent);
+	struct lm36274 *lm36274_data;
+	int ret;
+
+	lm36274_data = devm_kzalloc(&pdev->dev, sizeof(*lm36274_data),
+				    GFP_KERNEL);
+	if (!lm36274_data) {
+		ret = -ENOMEM;
+		return ret;
+	}

And certainly do "return -ENOMEM" explicitly here. ACK Acked-by: Pavel Machek I've done all amendments requested by Pavel and updated branch ib-leds-mfd-regulator on linux-leds.git, but in the same time What do you mean by updated? You cannot update an 'ib' (immutable branch). Immutable means that it cannot change, by definition. We have already talked about that. Nobody has pulled so the branch could have been safely updated. You have no sure way to know that. And since I have no way to know, or faith that you won't update it again, pulling it now/at all would seem like a foolish thing to do. Sorry, but you are simply unjust. You're pretending to portray the situation as if I have been notoriously causing merge conflicts in linux-next which did not take place.
Just to recap what this discussion is about: On 7 Apr 2019: 1. I sent pull request [0]. 2. 45 minutes later I updated it after discovering one omission [1]. It was rather small chance for it to be pulled as quickly as that. And even if it happened it wouldn't have been much harmful - we wouldn't have lost e.g. weeks of testing in linux-next due to that fact. On 21 May 2019: 3. I sent another pull request [2] to you and REGULATOR maintainers. After it turned out that lack of feedback from REGULATOR maintainers was caused by failing to send them the exact copies of patches to review, I informed you about possible need for updating the branch. Afterwards I received a reply from you saying that you hadn't pulled the branch anyway. At that point I was sure that neither MFD nor REGULATOR tree contains the patches. And only after that I updated the branch. Here are 2 examples where you have changed immutable branches, which is 100% of the Pull Requests I have received from you. Using that record as a benchmark, the situation hardly seems unjust. Until you can provide me with an assurance that you will not keep updating/changing the supposedly immutable pull-requests you send out, I won't be pulling any more in. I can just uphold the assurance which is implicitly assumed for anyone who has never broken acclaimed rules. As justified above. You have broken the rules every (100% of the) time. Yes, I admit, I would lose in court. [0] https://lore.kernel.org/patchwork/patch/1059075/ [1] https://lore.kernel.org/patchwork/patch/1059080/ [2] https://lore.kernel.org/patchwork/patch/1077066/ So we have 2 choices moving forward; you can either provide me with assurance that you have learned from this experience and will never change an *immutable* branch again, or I can continue to handle them, which has been the preference for some years. 
If you choose the former and adaptions need to be made in the future, the correct thing to do is create a *new*, different pull-request which has its own *new*, different tag, but uses the original tag as a base. I choose the former. That being said: Hereby I solemnly declare never ever change an immutable branch again. So how do I proceed with the requested change by Mark B on the LM36274 driver. Do I add a patch on top? Or do I submit a patch to the regulator tree once the PR is pulled? Won't the change be a dependency for [PATCH v4 1/6] ? In each case, having all the commits in one set (and branch) should simplify the integration. -- Best regards, Jacek Anaszewski
[PATCH] Revert "usb: core: remove local_irq_save() around ->complete() handler"
This reverts commit ed194d1367698a0872a2b75bbe06b3932ce9df3a.

In contrast to the original patch description, apparently not all handlers
were audited properly. E.g. my RT5370 based USB WIFI adapter (driver in
drivers/net/wireless/ralink/rt2x00) hangs after a while under heavy load.
This revert fixes this.

Also revert the follow-up patch d6142b91e9cc249b3aa22c90fade67e2e2d52cdb
("usb: core: remove flags variable in __usb_hcd_giveback_urb()"), since now
we need the flags variable again.

Cc: Sebastian Andrzej Siewior
Cc: Alan Stern
Cc: Greg Kroah-Hartman
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: sta...@vger.kernel.org # 4.20+
Signed-off-by: Soeren Moch
---
 drivers/usb/core/hcd.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 94d22551fc1b..08d25fcf8b8e 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1739,6 +1739,7 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
 	struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus);
 	struct usb_anchor *anchor = urb->anchor;
 	int status = urb->unlinked;
+	unsigned long flags;
 
 	urb->hcpriv = NULL;
 	if (unlikely((urb->transfer_flags & URB_SHORT_NOT_OK) &&
@@ -1755,7 +1756,20 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
 
 	/* pass ownership to the completion handler */
 	urb->status = status;
+
+	/*
+	 * We disable local IRQs here avoid possible deadlock because
+	 * drivers may call spin_lock() to hold lock which might be
+	 * acquired in one hard interrupt handler.
+	 *
+	 * The local_irq_save()/local_irq_restore() around complete()
+	 * will be removed if current USB drivers have been cleaned up
+	 * and no one may trigger the above deadlock situation when
+	 * running complete() in tasklet.
+	 */
+	local_irq_save(flags);
 	urb->complete(urb);
+	local_irq_restore(flags);
 
 	usb_anchor_resume_wakeups(anchor);
 	atomic_dec(&urb->use_count);
-- 
2.17.1