Re: [PATCH net-next 0/5] PTP support for the SJA1105 DSA driver

2019-05-31 Thread Richard Cochran
On Fri, May 31, 2019 at 09:12:03PM +0300, Vladimir Oltean wrote:
> It won't work unless I make changes to dsa_switch_rcv.

Or to the tagging code.

> Right now taggers can only return a pointer to the skb, or NULL, case
> in which DSA will free it.

The tagger can re-write the skb.  Why not reform it into a PTP frame?
This clever trick is what the phyter does in hardware.  See dp83640.c.

Thanks,
Richard


Re: [PATCH net-next 0/5] PTP support for the SJA1105 DSA driver

2019-05-31 Thread Richard Cochran
On Fri, May 31, 2019 at 07:16:17PM +0300, Vladimir Oltean wrote:
> But now comes the question on what to do on error cases - the meta
> frame didn't arrive. Should I just drop the skb waiting for it?

Yes, that is what other drivers do.

> Right now I "goto rcv_anyway" - which linuxptp doesn't like btw.

The application sees the missing time stamp and prints a warning
message.  This is IMHO the right thing to do, so that the user is made
aware of the degradation of the synchronization.

Thanks,
Richard


Re: [PATCH 3/3] habanalabs: restore unsecured registers default values

2019-05-31 Thread Oded Gabbay
On Thu, May 30, 2019 at 11:46 AM Dalit Ben Zoor  wrote:
>
> Unsecured registers can be changed by the user and hence should be
> restored to their default values on context switch.
>
> Signed-off-by: Dalit Ben Zoor 
> ---
>  drivers/misc/habanalabs/goya/goya.c | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/misc/habanalabs/goya/goya.c 
> b/drivers/misc/habanalabs/goya/goya.c
> index 87859c55b4b8..81c1d576783f 100644
> --- a/drivers/misc/habanalabs/goya/goya.c
> +++ b/drivers/misc/habanalabs/goya/goya.c
> @@ -786,7 +786,6 @@ static void goya_init_dma_ch(struct hl_device *hdev, int 
> dma_id)
> else
> sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
>
> -   WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO + reg_off, lower_32_bits(sob_addr));
> WREG32(mmDMA_CH_0_WR_COMP_ADDR_HI + reg_off, upper_32_bits(sob_addr));
> WREG32(mmDMA_CH_0_WR_COMP_WDATA + reg_off, 0x8001);
>  }
> @@ -4560,10 +4559,12 @@ static int goya_memset_device_memory(struct hl_device 
> *hdev, u64 addr, u64 size,
>  int goya_context_switch(struct hl_device *hdev, u32 asid)
>  {
> struct asic_fixed_properties *prop = &hdev->asic_prop;
> -   u64 addr = prop->sram_base_address;
> +   u64 addr = prop->sram_base_address, sob_addr;
> u32 size = hdev->pldm ? 0x1 : prop->sram_size;
> u64 val = 0xull;
> -   int rc;
> +   int rc, dma_id;
> +   u32 channel_off = mmDMA_CH_1_WR_COMP_ADDR_LO -
> +   mmDMA_CH_0_WR_COMP_ADDR_LO;
>
> rc = goya_memset_device_memory(hdev, addr, size, val, false);
> if (rc) {
> @@ -4571,7 +4572,19 @@ int goya_context_switch(struct hl_device *hdev, u32 
> asid)
> return rc;
> }
>
> +   /* we need to reset registers that the user is allowed to change */
> +   sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
> +   WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO, lower_32_bits(sob_addr));
> +
> +   for (dma_id = 1 ; dma_id < NUMBER_OF_EXT_HW_QUEUES ; dma_id++) {
> +   sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1000 +
> +   (dma_id - 1) * 4;
> +   WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO + channel_off * dma_id,
> +   lower_32_bits(sob_addr));
> +   }
> +
> WREG32(mmTPC_PLL_CLK_RLX_0, 0x200020);
> +
> goya_mmu_prepare(hdev, asid);
>
> goya_clear_sm_regs(hdev);
> --
> 2.17.1
>
The patch-set is:
Reviewed-by: Oded Gabbay 


Re: [PATCH -mm] mm, swap: Fix bad swap file entry warning

2019-05-31 Thread Yuri Norov
(Resend as LKML didn't take outlook settings.)

> On Fri, 2019-05-31 at 11:27 -0700, Dexuan-Linux Cui wrote:
> > Hi,
> > Did you know about the panic reported here:
> >  https://marc.info/?t=15593077303=1=2
> > 
> > "Kernel panic - not syncing: stack-protector: Kernel stack is
> > corrupted in: write_irq_affinity.isra> "
> > 
> > This panic is reported on PowerPC and x86.
> > 
> > In the case of x86, we see a lot of "get_swap_device: Bad swap file entry"
> > errors before the panic:
> > 
> > ...
> > [   24.404693] get_swap_device: Bad swap file entry 5801
> > [   24.408702] get_swap_device: Bad swap file entry 5c01
> > [   24.412510] get_swap_device: Bad swap file entry 6001
> > [   24.416519] get_swap_device: Bad swap file entry 6401
> > [   24.420217] get_swap_device: Bad swap file entry 6801
> > [   24.423921] get_swap_device: Bad swap file entry 6c01

[..]

I don't have a panic, but I observe many lines like this.

> Looks familiar,
>
> https://lore.kernel.org/lkml/1559242868.6132.35.ca...@lca.pw/
>
> I suppose Andrew might be better off reverting the whole series first before
> Yury comes up with the right fix, so that other people who are testing
> linux-next don't need to waste time on the same problem.

I didn't observe any problems with this series on top of 5.1, and there's a fix
for swap that eliminates the problem on top of current -next for me:
https://lkml.org/lkml/2019/5/30/1630

Could you please test my series with the patch above?

Thanks,
Yury

 

[PATCH -next] ALSA: lx6464es - Remove set but not used variables 'orun_mask, urun_mask'

2019-05-31 Thread YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning:

sound/pci/lx6464es/lx_core.c: In function 'lx_interrupt_handle_async_events':
sound/pci/lx6464es/lx_core.c:990:6: warning:
 variable 'urun_mask' set but not used [-Wunused-but-set-variable]
sound/pci/lx6464es/lx_core.c:989:6: warning:
 variable 'orun_mask' set but not used [-Wunused-but-set-variable]

They are never used, so can be removed.

Signed-off-by: YueHaibing 
---
 sound/pci/lx6464es/lx_core.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/sound/pci/lx6464es/lx_core.c b/sound/pci/lx6464es/lx_core.c
index 9236a1a8c49b..dd3a873777eb 100644
--- a/sound/pci/lx6464es/lx_core.c
+++ b/sound/pci/lx6464es/lx_core.c
@@ -986,8 +986,6 @@ static int lx_interrupt_handle_async_events(struct lx6464es 
*chip, u32 irqsrc,
 * Stat[8]  LSB overrun
 * */
 
-   u64 orun_mask;
-   u64 urun_mask;
int eb_pending_out = (irqsrc & MASK_SYS_STATUS_EOBO) ? 1 : 0;
int eb_pending_in  = (irqsrc & MASK_SYS_STATUS_EOBI) ? 1 : 0;
 
@@ -1010,9 +1008,6 @@ static int lx_interrupt_handle_async_events(struct 
lx6464es *chip, u32 irqsrc,
*r_notified_out_pipe_mask);
}
 
-   orun_mask = ((u64)stat[7] << 32) + stat[8];
-   urun_mask = ((u64)stat[5] << 32) + stat[6];
-
/* todo: handle xrun notification */
 
return err;







[PATCH -next] pwm: pca9685: Remove set but not used variable 'pwm'

2019-05-31 Thread YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/pwm/pwm-pca9685.c: In function 'pca9685_pwm_gpio_free':
drivers/pwm/pwm-pca9685.c:173:21: warning:
 variable 'pwm' set but not used [-Wunused-but-set-variable]

It's not used since commit e926b12c611c ("pwm: Clear chip_data in pwm_put()")

Signed-off-by: YueHaibing 
---
 drivers/pwm/pwm-pca9685.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/pwm/pwm-pca9685.c b/drivers/pwm/pwm-pca9685.c
index 567f5e2771c4..d16215c276bd 100644
--- a/drivers/pwm/pwm-pca9685.c
+++ b/drivers/pwm/pwm-pca9685.c
@@ -170,12 +170,10 @@ static void pca9685_pwm_gpio_set(struct gpio_chip *gpio, 
unsigned int offset,
 static void pca9685_pwm_gpio_free(struct gpio_chip *gpio, unsigned int offset)
 {
struct pca9685 *pca = gpiochip_get_data(gpio);
-   struct pwm_device *pwm;
 
pca9685_pwm_gpio_set(gpio, offset, 0);
pm_runtime_put(pca->chip.dev);
mutex_lock(&pca->lock);
-   pwm = &pca->chip.pwms[offset];
mutex_unlock(&pca->lock);
 }





[PATCH] treewide: fix typos of SPDX-License-Identifier

2019-05-31 Thread Masahiro Yamada
Prior to the adoption of SPDX, it was difficult for tools to determine
the correct license due to incomplete or badly formatted license text.
The SPDX tag solves this issue, assuming people can correctly spell
"SPDX-License-Identifier" although this assumption is broken in some
places.

Since scripts/spdxcheck.py parses only lines that exactly match
the correct tag, it cannot (should not) detect this kind of error.

If the correct tag is missing, scripts/checkpatch.pl warns like this:

 WARNING: Missing or malformed SPDX-License-Identifier tag in line *

So, people should notice it before the patch submission, but in reality
broken tags sometimes slip in. The checkpatch warning is not useful for
checking the committed files globally since a large number of files still
have no SPDX tag.

Also, I am not sure about the legal effect when the SPDX tag is broken.

Anyway, these typos are absolutely worth fixing. It is pretty easy to
find suspicious lines by grep.

  $ git grep --not -e SPDX-License-Identifier --and -e SPDX- -- \
:^LICENSES :^scripts/spdxcheck.py :^*/license-rules.rst
  arch/arm/kernel/bugs.c:// SPDX-Identifier: GPL-2.0
  drivers/phy/st/phy-stm32-usbphyc.c:// SPDX-Licence-Identifier: GPL-2.0
  drivers/pinctrl/sh-pfc/pfc-r8a77980.c:// SPDX-Lincense-Identifier: GPL 2.0
  lib/test_stackinit.c:// SPDX-Licenses: GPLv2
  sound/soc/codecs/max9759.c:// SPDX-Licence-Identifier: GPL-2.0

Signed-off-by: Masahiro Yamada 
---

 arch/arm/kernel/bugs.c| 2 +-
 drivers/phy/st/phy-stm32-usbphyc.c| 2 +-
 drivers/pinctrl/sh-pfc/pfc-r8a77980.c | 2 +-
 lib/test_stackinit.c  | 2 +-
 sound/soc/codecs/max9759.c| 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kernel/bugs.c b/arch/arm/kernel/bugs.c
index d41d3598e5e5..14c8dbbb7d2d 100644
--- a/arch/arm/kernel/bugs.c
+++ b/arch/arm/kernel/bugs.c
@@ -1,4 +1,4 @@
-// SPDX-Identifier: GPL-2.0
+// SPDX-License-Identifier: GPL-2.0
 #include 
 #include 
 #include 
diff --git a/drivers/phy/st/phy-stm32-usbphyc.c 
b/drivers/phy/st/phy-stm32-usbphyc.c
index 1255cd1d9a60..56bdea4b0bd9 100644
--- a/drivers/phy/st/phy-stm32-usbphyc.c
+++ b/drivers/phy/st/phy-stm32-usbphyc.c
@@ -1,4 +1,4 @@
-// SPDX-Licence-Identifier: GPL-2.0
+// SPDX-License-Identifier: GPL-2.0
 /*
  * STMicroelectronics STM32 USB PHY Controller driver
  *
diff --git a/drivers/pinctrl/sh-pfc/pfc-r8a77980.c 
b/drivers/pinctrl/sh-pfc/pfc-r8a77980.c
index 473da65890a7..9ed4ead2dafb 100644
--- a/drivers/pinctrl/sh-pfc/pfc-r8a77980.c
+++ b/drivers/pinctrl/sh-pfc/pfc-r8a77980.c
@@ -1,4 +1,4 @@
-// SPDX-Lincense-Identifier: GPL 2.0
+// SPDX-License-Identifier: GPL-2.0
 /*
  * R8A77980 processor support - PFC hardware block.
  *
diff --git a/lib/test_stackinit.c b/lib/test_stackinit.c
index 13115b6f2b88..e97dc54b4fdf 100644
--- a/lib/test_stackinit.c
+++ b/lib/test_stackinit.c
@@ -1,4 +1,4 @@
-// SPDX-Licenses: GPLv2
+// SPDX-License-Identifier: GPL-2.0
 /*
  * Test cases for compiler-based stack variable zeroing via future
  * compiler flags or CONFIG_GCC_PLUGIN_STRUCTLEAK*.
diff --git a/sound/soc/codecs/max9759.c b/sound/soc/codecs/max9759.c
index ecfb4a80424b..00e9d4fd1651 100644
--- a/sound/soc/codecs/max9759.c
+++ b/sound/soc/codecs/max9759.c
@@ -1,4 +1,4 @@
-// SPDX-Licence-Identifier: GPL-2.0
+// SPDX-License-Identifier: GPL-2.0
 /*
  * MAX9759 Amplifier Driver
  *
-- 
2.17.1



Re: [PATCH bpf v2] bpf: preallocate a perf_sample_data per event fd

2019-05-31 Thread Alexei Starovoitov
On Fri, May 31, 2019 at 6:28 PM Song Liu  wrote:
>
>
>
> > On May 31, 2019, at 3:37 PM, Matt Mullins  wrote:
> >
> > It is possible that a BPF program can be called while another BPF
> > program is executing bpf_perf_event_output.  This has been observed with
> > I/O completion occurring as a result of an interrupt:
> >
> >   bpf_prog_247fd1341cddaea4_trace_req_end+0x8d7/0x1000
> >   ? trace_call_bpf+0x82/0x100
> >   ? sch_direct_xmit+0xe2/0x230
> >   ? blk_mq_end_request+0x1/0x100
> >   ? blk_mq_end_request+0x5/0x100
> >   ? kprobe_perf_func+0x19b/0x240
> >   ? __qdisc_run+0x86/0x520
> >   ? blk_mq_end_request+0x1/0x100
> >   ? blk_mq_end_request+0x5/0x100
> >   ? kprobe_ftrace_handler+0x90/0xf0
> >   ? ftrace_ops_assist_func+0x6e/0xe0
> >   ? ip6_input_finish+0xbf/0x460
> >   ? 0xa01e80bf
> >   ? nbd_dbg_flags_show+0xc0/0xc0 [nbd]
> >   ? blkdev_issue_zeroout+0x200/0x200
> >   ? blk_mq_end_request+0x1/0x100
> >   ? blk_mq_end_request+0x5/0x100
> >   ? flush_smp_call_function_queue+0x6c/0xe0
> >   ? smp_call_function_single_interrupt+0x32/0xc0
> >   ? call_function_single_interrupt+0xf/0x20
> >   ? call_function_single_interrupt+0xa/0x20
> >   ? swiotlb_map_page+0x140/0x140
> >   ? refcount_sub_and_test+0x1a/0x50
> >   ? tcp_wfree+0x20/0xf0
> >   ? skb_release_head_state+0x62/0xc0
> >   ? skb_release_all+0xe/0x30
> >   ? napi_consume_skb+0xb5/0x100
> >   ? mlx5e_poll_tx_cq+0x1df/0x4e0
> >   ? mlx5e_poll_tx_cq+0x38c/0x4e0
> >   ? mlx5e_napi_poll+0x58/0xc30
> >   ? mlx5e_napi_poll+0x232/0xc30
> >   ? net_rx_action+0x128/0x340
> >   ? __do_softirq+0xd4/0x2ad
> >   ? irq_exit+0xa5/0xb0
> >   ? do_IRQ+0x7d/0xc0
> >   ? common_interrupt+0xf/0xf
> >   
> >   ? __rb_free_aux+0xf0/0xf0
> >   ? perf_output_sample+0x28/0x7b0
> >   ? perf_prepare_sample+0x54/0x4a0
> >   ? perf_event_output+0x43/0x60
> >   ? bpf_perf_event_output_raw_tp+0x15f/0x180
> >   ? blk_mq_start_request+0x1/0x120
> >   ? bpf_prog_411a64a706fc6044_should_trace+0xad4/0x1000
> >   ? bpf_trace_run3+0x2c/0x80
> >   ? nbd_send_cmd+0x4c2/0x690 [nbd]
> >
> > This also cannot be alleviated by further splitting the per-cpu
> > perf_sample_data structs (as in commit 283ca526a9bd ("bpf: fix
> > corruption on concurrent perf_event_output calls")), as a raw_tp could
> > be attached to the block:block_rq_complete tracepoint and execute during
> > another raw_tp.  Instead, keep a pre-allocated perf_sample_data
> > structure per perf_event_array element and fail a bpf_perf_event_output
> > if that element is concurrently being used.
> >
> > Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for 
> > perf_sample_data")
> > Signed-off-by: Matt Mullins 
>
> This looks great. Thanks for the fix.
>
> Acked-by: Song Liu 
>
> > ---
> > v1->v2:
> >   keep a pointer to the struct perf_sample_data rather than directly
> >   embedding it in the structure, avoiding the circular include and
> >   removing the need for in_use.  Suggested by Song.
> >
> > include/linux/bpf.h  |  1 +
> > kernel/bpf/arraymap.c|  3 ++-
> > kernel/trace/bpf_trace.c | 29 -
> > 3 files changed, 19 insertions(+), 14 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 4fb3aa2dc975..47fd85cfbbaf 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -472,6 +472,7 @@ struct bpf_event_entry {
> >   struct file *perf_file;
> >   struct file *map_file;
> >   struct rcu_head rcu;
> > + struct perf_sample_data *sd;
> > };
> >
> > bool bpf_prog_array_compatible(struct bpf_array *array, const struct 
> > bpf_prog *fp);
> > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > index 584636c9e2eb..c7f5d593e04f 100644
> > --- a/kernel/bpf/arraymap.c
> > +++ b/kernel/bpf/arraymap.c
> > @@ -654,11 +654,12 @@ static struct bpf_event_entry 
> > *bpf_event_entry_gen(struct file *perf_file,
> > {
> >   struct bpf_event_entry *ee;
> >
> > - ee = kzalloc(sizeof(*ee), GFP_ATOMIC);
> > + ee = kzalloc(sizeof(*ee) + sizeof(struct perf_sample_data), 
> > GFP_ATOMIC);
> >   if (ee) {
> >   ee->event = perf_file->private_data;
> >   ee->perf_file = perf_file;
> >   ee->map_file = map_file;
> > + ee->sd = (void *)ee + sizeof(*ee);

This bit looks quite weird, but I don't have better ideas
to avoid circular .h pain.
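
For readers unfamiliar with the idiom, here is a minimal, self-contained
sketch of the single-allocation trick used above (plain C with illustrative
names, not the kernel structures; the void-pointer arithmetic is the same
GNU C extension the patch relies on):

#include <stdio.h>
#include <stdlib.h>

struct payload { int data[8]; };

struct entry {
	int id;
	struct payload *pd;	/* points into the same allocation */
};

static struct entry *entry_alloc(void)
{
	/* one allocation holds the entry plus its trailing payload */
	struct entry *e = calloc(1, sizeof(*e) + sizeof(struct payload));

	if (e)
		e->pd = (void *)e + sizeof(*e);	/* payload starts right after the entry */
	return e;
}

int main(void)
{
	struct entry *e = entry_alloc();

	if (!e)
		return 1;
	e->pd->data[0] = 42;
	printf("%d\n", e->pd->data[0]);
	free(e);	/* a single free releases both objects */
	return 0;
}

One consequence of this layout is that the payload's lifetime is tied to the
entry's, which is exactly what the patch wants for the per-element
perf_sample_data.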

Applied to bpf tree. Thanks


Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()

2019-05-31 Thread Gen Zhang
On Sat, Jun 01, 2019 at 03:34:49AM +0100, Al Viro wrote:
> On Sat, Jun 01, 2019 at 03:25:27AM +0100, Al Viro wrote:
> > On Sat, Jun 01, 2019 at 10:15:26AM +0800, Gen Zhang wrote:
> > > In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> > > returns NULL when fails. So 'arg' should be checked. And 'mnt_opts' 
> > > should be freed when error.
> > 
> > What's the latter one for?  On failure we'll get to put_fs_context()
> > pretty soon, so
> > security_free_mnt_opts(&fc->security);
> > will be called just fine.  Leaving it allocated on failure is fine...
> 
> Actually, right now in mainline it is not (btrfs_mount_root() has
> an odd call of security_sb_eat_lsm_opts()); eventually we will be
> down to just the callers in ->parse_monolithic() instances, at which
> point the above will become correct.  At the moment it is not, so
> consider the objection withdrawn (and I really need to get some sleep,
> seeing how long did it take me to recall the context... ;-/)
Thanks for your comments. And have a good dream.

Thanks
Gen


Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()

2019-05-31 Thread Gen Zhang
On Sat, Jun 01, 2019 at 03:25:27AM +0100, Al Viro wrote:
> On Sat, Jun 01, 2019 at 10:15:26AM +0800, Gen Zhang wrote:
> > In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> > returns NULL when fails. So 'arg' should be checked. And 'mnt_opts' 
> > should be freed when error.
> 
> What's the latter one for?  On failure we'll get to put_fs_context()
> pretty soon, so
> security_free_mnt_opts(&fc->security);
> will be called just fine.  Leaving it allocated on failure is fine...
Paul Moore  wrote:
>It seems like we should also check for, and potentially free *mnt_opts
>as the selinux_add_opt() error handling does just below this change,
>yes?  If that is the case we might want to move that error handling
>code to the bottom of the function and jump there on error.
I am not familiar with this part. So could you please show the function
call sequence?

Thanks
Gen


Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()

2019-05-31 Thread Al Viro
On Sat, Jun 01, 2019 at 03:25:27AM +0100, Al Viro wrote:
> On Sat, Jun 01, 2019 at 10:15:26AM +0800, Gen Zhang wrote:
> > In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> > returns NULL when fails. So 'arg' should be checked. And 'mnt_opts' 
> > should be freed when error.
> 
> What's the latter one for?  On failure we'll get to put_fs_context()
> pretty soon, so
> security_free_mnt_opts(&fc->security);
> will be called just fine.  Leaving it allocated on failure is fine...

Actually, right now in mainline it is not (btrfs_mount_root() has
an odd call of security_sb_eat_lsm_opts()); eventually we will be
down to just the callers in ->parse_monolithic() instances, at which
point the above will become correct.  At the moment it is not, so
consider the objection withdrawn (and I really need to get some sleep,
seeing how long did it take me to recall the context... ;-/)


Re: [PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()

2019-05-31 Thread Al Viro
On Sat, Jun 01, 2019 at 10:15:26AM +0800, Gen Zhang wrote:
> In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> returns NULL when fails. So 'arg' should be checked. And 'mnt_opts' 
> should be freed when error.

What's the latter one for?  On failure we'll get to put_fs_context()
pretty soon, so
security_free_mnt_opts(&fc->security);
will be called just fine.  Leaving it allocated on failure is fine...


[PATCH v3] selinux: lsm: fix a missing-check bug in selinux_sb_eat_lsm_opts()

2019-05-31 Thread Gen Zhang
In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
returns NULL when it fails, so 'arg' should be checked. 'mnt_opts'
should also be freed on error.

Signed-off-by: Gen Zhang 
Reviewed-by: Ondrej Mosnacek 
Fixes: 99dbbb593fe6 ("selinux: rewrite selinux_sb_eat_lsm_opts()")
---
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 3ec702c..f329fc0 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2616,6 +2616,7 @@ static int selinux_sb_eat_lsm_opts(char *options, void 
**mnt_opts)
char *from = options;
char *to = options;
bool first = true;
+   int ret;
 
while (1) {
int len = opt_len(from);
@@ -2635,15 +2636,16 @@ static int selinux_sb_eat_lsm_opts(char *options, void 
**mnt_opts)
*q++ = c;
}
arg = kmemdup_nul(arg, q - arg, GFP_KERNEL);
+   if (!arg) {
+   ret = -ENOMEM;
+   goto free_opt;
+   }
}
rc = selinux_add_opt(token, arg, mnt_opts);
if (unlikely(rc)) {
+   ret = rc;
kfree(arg);
-   if (*mnt_opts) {
-   selinux_free_mnt_opts(*mnt_opts);
-   *mnt_opts = NULL;
-   }
-   return rc;
+   goto free_opt;
}
} else {
if (!first) {   // copy with preceding comma
@@ -2661,6 +2663,12 @@ static int selinux_sb_eat_lsm_opts(char *options, void 
**mnt_opts)
}
*to = '\0';
return 0;
+free_opt:
+   if (*mnt_opts) {
+   selinux_free_mnt_opts(*mnt_opts);
+   *mnt_opts = NULL;
+   }
+   return ret;
 }
 
 static int selinux_sb_remount(struct super_block *sb, void *mnt_opts)


Re: [PATCH] firmware_loader: fix build without sysctl

2019-05-31 Thread Stephen Rothwell
Hi Matteo,

On Fri, 31 May 2019 11:12:39 +0200 Matteo Croce  wrote:
>
> please correct the Fixes tag if possible.
> It seems that the hash of the offending commit now is d91bff3011cf

Unfortunately, these hashes will keep changing for things in Andrew's
patch queue until they are sent to Linus.  In the end Andrew will
probably just squash this fix into the original patch.


Cheers,
Stephen Rothwell




Re: [PATCH v2 0/5] Introduce OPP bandwidth bindings

2019-05-31 Thread Saravana Kannan
I'll have to Nack this series because it's making a couple of wrong assumptions
about bandwidth voting.

Firstly, it's mixing up OPP to bandwidth mapping (Eg: CPU freq to CPU<->DDR
bandwidth mapping) with the bandwidth levels that are actually supported by an
interconnect path (Eg: CPU<->DDR bandwidth levels). For example, CPU0 might
decide to vote for a max of 10 GB/s because it's a little CPU and never needs
anything higher than 10 GB/s even at CPU0's max frequency. But that has no
bearing on bandwidth level available between CPU<->DDR.

There needs to be a separate BW OPP table describing the bandwidth levels
available for the CPU<->DDR path and then a separate mapping between CPU OPP to
CPU<->DDR BW OPP. That way, the mapping decision (policy or voltage based
config decision) doesn't affect the description of what the hardware really is
capable of.

Put another way, if someone comes around and decides the CPU0's max freq should
ask for 15 GB/s because some shared voltage rail would already be pushed to a
voltage sufficient to support 15 GB/s, then it shouldn't change the HW
description of what bandwidth levels are available between CPU<->DDR. If the
CPU<->DDR path supports 20 GB/s, it always does independent of the CPU OPP
table mapping.

By splitting out the available bandwidth levels of the CPU<->DDR path into a
separate BW OPP table, we avoid these kinds of issues.

Also, one could easily imagine using a bandwidth counter or some other means of
BW measurement hardware to vote for bandwidth between CPU<->DDR and CPU<->L3.
That entity should be able to know/learn all the available bandwidth levels in
the CPU<->DDR path without forcing bandwidth levels to be listed in CPU OPP
table. And if it's measuring bandwidth at a point common for all CPUs, what CPU
OPP table is it even supposed to look at to learn all the available bandwidth
levels? It just doesn't make sense.

It's also easy to envision having multiple policies or devfreq governors voting
for an interconnect path. The mapping you are talking about in this series is
just an input for one of them (the cpufreq-map governor I sent out a while
ago).

Secondly, when it comes to bandwidth OPP tables, the peak bandwidth should be
the key/first element (similar to how frequency is now). Long explanation
follows.

All the sensible frequency combinations of all the hardware interconnects
between the master and slave port (Eg: GPU -> MMNOC -> BIMC -> DDR) determine
the peak bandwidth levels available in that interconnect path.

If multiple devices (GPU, CPU, etc) vote for different peak bandwidths for an
interconnect (say BIMC), the interconnect provider picks the max peak bandwidth
amongst all the requests and then picks the lowest interconnect frequency that
can support the max peak bandwidth. 

So the devices (GPU, CPU, etc), actually have some control on what interconnect
frequencies are picked by asking for a specific peak bandwidth -- so there's
actually a notion of useful levels.

Average bandwidth is an additive property -- so if CPU and GPU ask for 5 GB/s
and 3 GB/s respectively for an interconnect, the interconnect provider adds
them up and configures the interconnect for 8 GB/s. So if GPU asks for 5 GB/s
average bandwidth, it has no idea what frequency the interconnect will actually
get configured to. So, average bandwidth really doesn't provide a sense of
levels to pick from for a given interconnect path.

So peak bandwidth is a much better pick than average bandwidth for being a key
to the bandwidth OPP table.
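
To make the aggregation described above concrete, here is a rough,
self-contained sketch (plain C; the names, numbers and selection logic are
illustrative only, not the interconnect framework): peak requests are
max-reduced, average requests are summed, and the lowest level whose peak
capacity covers the aggregate peak is picked.

#include <stdio.h>

struct bw_vote { unsigned int avg_mbps, peak_mbps; };

/* made-up frequency levels for one interconnect and the peak bandwidth
 * each level can sustain */
static const struct { unsigned int khz, peak_mbps; } levels[] = {
	{ 100000, 1525 }, { 200000, 3051 }, { 400000, 7500 },
};

static unsigned int pick_level_khz(const struct bw_vote *votes, unsigned int n)
{
	unsigned int agg_peak = 0, agg_avg = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (votes[i].peak_mbps > agg_peak)
			agg_peak = votes[i].peak_mbps;	/* max of peak requests */
		agg_avg += votes[i].avg_mbps;		/* sum of average requests */
	}

	printf("aggregate: peak %u MB/s, avg %u MB/s\n", agg_peak, agg_avg);

	for (i = 0; i < sizeof(levels) / sizeof(levels[0]); i++)
		if (levels[i].peak_mbps >= agg_peak)
			return levels[i].khz;		/* lowest level that fits */

	return levels[sizeof(levels) / sizeof(levels[0]) - 1].khz;
}

int main(void)
{
	/* e.g. GPU votes 5000 peak / 3000 avg, CPU votes 3000 peak / 1000 avg */
	struct bw_vote votes[] = { { 3000, 5000 }, { 1000, 3000 } };

	printf("selected %u kHz\n", pick_level_khz(votes, 2));
	return 0;
}

The useful observation is that only the peak column defines a discrete set of
levels a voter can meaningfully pick from; the summed average is an input to
the provider, not a level.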

So what I think we need is:
* Bandwidth OPP support in the kernel
* Bandwidth OPP DT binding to describe the bandwidth levels available for
  different interconnect paths.
* A new "interconnect-opp" property that can point to different BW OPP
  tables for each of the interconnect paths listed under interconnects
  property.

Then for mapping from device OPP to interconnect path bandwidth OPPs, you
just used the existing required-opps binding to link an entry in GPU OPP to
an entry in GPU<->DDR bandwidth OPP table. That way the hardware is
actually described correctly and the mapping is kept separate.

So, in the end, it'll look something like this in DT.

gpu_cache_opp_table: gpu_cache_opp_table {
compatible = "operating-points-v2";

gpu_cache_3000: opp-3000 {
opp-peak-mbps = <3000>;
avg-mbps = <1000>;
};
gpu_cache_6000: opp-6000 {
opp-peak-mbps = <6000>;
avg-mbps = <2000>;
};
gpu_cache_9000: opp-9000 {
opp-peak-mbps = <9000>;
avg-mbps = <9000>;
};
};

gpu_ddr_opp_table: gpu_ddr_opp_table {
compatible = "operating-points-v2";

gpu_ddr_1525: opp-1525 {
opp-peak-mbps = <1525>;
avg-mbps = <452>;
};
gpu_ddr_3051: opp-3051 {
opp-peak-mbps = <3051>;
avg-mbps = <915>;
};
gpu_ddr_7500: 

mmotm 2019-05-31-19-09 uploaded

2019-05-31 Thread akpm
The mm-of-the-moment snapshot 2019-05-31-19-09 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (5.x
or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss.  Both contain the string -mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/



The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/

and use of this tree is similar to
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above.


This mmotm tree contains the following patches against 5.2-rc2:
(patches marked "*" will be included in linux-next)

  origin.patch
  drivers-crypto-ux500-cryp-makefile-fix.patch
* mm-fix-documentation-vm-hmmrst-sphinx-warnings.patch
* lib-sortc-fix-kernel-doc-notation-warnings.patch
* mm-vmallocc-fix-typo-in-comment.patch
* mm-slab-remove-obsoleted-config_debug_slab_leak.patch
* arch-arm-boot-compressed-decompressc-fix-build-error-due-to-lz4-changes.patch
* kernel-fork-make-max_threads-symbol-static.patch
* prctl_set_mm-refactor-checks-from-validate_prctl_map.patch
* prctl_set_mm-refactor-checks-from-validate_prctl_map-checkpatch-fixes.patch
* prctl_set_mm-downgrade-mmap_sem-to-read-lock.patch
* prctl_set_mm-downgrade-mmap_sem-to-read-lock-checkpatch-fixes.patch
* mm-consider-subtrees-in-memoryevents.patch
* memcg-make-it-work-on-sparse-non-0-node-systems.patch
* ocfs2-fix-error-path-kobject-memory-leak.patch
* mm-gup-continue-vm_fault_retry-processing-event-for-pre-faults.patch
* scripts-gdb-fix-invocation-when-config_common_clk-is-not-set.patch
* z3fold-fix-sheduling-while-atomic.patch
* kasan-initialize-tag-to-0xff-in-__kasan_kmalloc.patch
* spdxcheckpy-fix-directory-structures-v3.patch
* iommu-intel-fix-variable-iommu-set-but-not-used.patch
* signal-trace_signal_deliver-when-signal_group_exit.patch
* generic-radix-trees-fix-kerneldoc-comment.patch
* mm-compaction-make-sure-we-isolate-a-valid-pfn.patch
* convert-struct-pid-count-to-refcount_t.patch
* mm-dev_pfn-exclude-memory_device_private-while-computing-virtual-address.patch
* fs-proc-allow-reporting-eip-esp-for-all-coredumping-threads.patch
* mm-mempolicy-fix-an-incorrect-rebind-node-in-mpol_rebind_nodemask.patch
* binfmt_flat-make-load_flat_shared_library-work.patch
* mm-fix-trying-to-reclaim-unevicable-lru-page.patch
* mm-memcontrol-dont-batch-updates-of-local-vm-stats-and-events.patch
* list_lru-fix-memory-leak-in-__memcg_init_list_lru_node.patch
* scripts-decode_stacktracesh-prefix-addr2line-with-cross_compile.patch
* mm-mlockall-error-for-flag-mcl_onfault.patch
* mm-fix-recent_rotated-history.patch
* fs-ocfs2-fix-race-in-ocfs2_dentry_attach_lock.patch
* mm-mmu_gather-remove-__tlb_reset_range-for-force-flush.patch
* mm-mmu_gather-remove-__tlb_reset_range-for-force-flush-checkpatch-fixes.patch
* scripts-decode_stacktrace-match-basepath-using-shell-prefix-operator-not-regex.patch
* scripts-decode_stacktrace-look-for-modules-with-kodebug-extension.patch
* scripts-decode_stacktrace-look-for-modules-with-kodebug-extension-v2.patch
* scripts-spellingtxt-drop-sepc-from-the-misspelling-list.patch
* scripts-spellingtxt-drop-sepc-from-the-misspelling-list-fix.patch
* scripts-spellingtxt-add-spelling-fix-for-prohibited.patch
* scripts-decode_stacktrace-accept-dash-underscore-in-modules.patch
* sh-configs-remove-config_logfs-from-defconfig.patch
* debugobjects-move-printk-out-of-db-lock-critical-sections.patch
* ocfs2-add-last-unlock-times-in-locking_state.patch
* ocfs2-add-locking-filter-debugfs-file.patch
* fs-ocfs-fix-spelling-mistake-hearbeating-heartbeat.patch
* ocfs2-clear-zero-in-unaligned-direct-io.patch
* ocfs2-clear-zero-in-unaligned-direct-io-checkpatch-fixes.patch
* ocfs2-wait-for-recovering-done-after-direct-unlock-request.patch
* ocfs2-checkpoint-appending-truncate-log-transaction-before-flushing.patch
* 

Re: [RFC][PATCH] Makefile: Fix checkstack.pl arm64 wrong or unknown architecture

2019-05-31 Thread Masahiro Yamada
On Sat, Jun 1, 2019 at 2:45 AM George G. Davis  wrote:
> > Following this pattern, does this work for you?
> >
> > diff --git a/scripts/checkstack.pl b/scripts/checkstack.pl
> > index 122aef5e4e14..371bd17a4983 100755
> > --- a/scripts/checkstack.pl
> > +++ b/scripts/checkstack.pl
> > @@ -46,7 +46,7 @@ my (@stack, $re, $dre, $x, $xs, $funcre);
> > $x  = "[0-9a-f]";   # hex character
> > $xs = "[0-9a-f ]";  # hex character or space
> > $funcre = qr/^$x* <(.*)>:$/;
> > -   if ($arch eq 'aarch64') {
> > +   if ($arch =~ '^(aarch|arm)64$') {
>
> Yes, that works, thanks!
>
> Will you submit a fix or would you like me to resubmit with the above 
> suggested
> fix?

Please send v2.

Thanks.

-- 
Best Regards
Masahiro Yamada


[5.2-rc REGRESSION] Random gcc crash for 'make -j12' when low on memory

2019-05-31 Thread Qu Wenruo
Hi,

When compiling the kernel on v5.2-rc (both rc1 and rc2) with "make
-j12", the gcc will randomly crash with segfault, while on v5.1-rc7
everything is OK.

The crash only happens when the VM has only 1G of RAM; when given 4G of RAM it
no longer crashes.
However, according to dmesg, no OOM was triggered.

Thus this looks like a regression.

The environment is:
VM hypervisor: KVM
vCPU: 8
vRAM: 1G (crash) 4G (OK)
Distro: Archlinux
Tried kernel: Upstream v5.1-rc7 (good), v5.2-rc1 (fail), v5.2-rc2(fail)

Host CPU: Ryzen 1700 (no gcc crash on host)

Has something related to OOM changed?

Thanks,
Qu






Re: [PATCH v2] hooks: fix a missing-check bug in selinux_sb_eat_lsm_opts()

2019-05-31 Thread Gen Zhang
On Fri, May 31, 2019 at 11:45:28AM -0400, Paul Moore wrote:
> On Thu, May 30, 2019 at 9:34 PM Gen Zhang  wrote:
> >
> > In selinux_sb_eat_lsm_opts(), 'arg' is allocated by kmemdup_nul(). It
> > returns NULL when fails. So 'arg' should be checked.
> >
> > Signed-off-by: Gen Zhang 
> > Reviewed-by: Ondrej Mosnacek 
> > Fixes: 99dbbb593fe6 ("selinux: rewrite selinux_sb_eat_lsm_opts()")
> 
> One quick note about the subject line, instead of using "hooks:" you
> should use "selinux:" since this is specific to SELinux.  If the patch
> did apply to the LSM framework as a whole, I would suggest using
> "lsm:" instead of "hooks:" as "hooks" is too ambiguous of a prefix.
> 
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 3ec702c..5a9e959 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -2635,6 +2635,8 @@ static int selinux_sb_eat_lsm_opts(char *options, 
> > void **mnt_opts)
> > *q++ = c;
> > }
> > arg = kmemdup_nul(arg, q - arg, GFP_KERNEL);
> > +   if (!arg)
> > +   return -ENOMEM;
> 
> It seems like we should also check for, and potentially free *mnt_opts
> as the selinux_add_opt() error handling does just below this change,
> yes?  If that is the case we might want to move that error handling
> code to the bottom of the function and jump there on error.
> 
> > }
> > rc = selinux_add_opt(token, arg, mnt_opts);
> > if (unlikely(rc)) {
> 
> -- 
> paul moore
> www.paul-moore.com
Yes, I agree with that. And I will work on this to resubmit.

Thanks
Gen


Re: [PATCH v2] hooks: fix a missing-check bug in selinux_add_mnt_opt()

2019-05-31 Thread Gen Zhang
On Fri, May 31, 2019 at 11:55:23AM -0400, Paul Moore wrote:
> On Thu, May 30, 2019 at 4:55 AM Gen Zhang  wrote:
> >
> > In selinux_add_mnt_opt(), 'val' is allocated by kmemdup_nul(). It returns
> > NULL when it fails. So 'val' should be checked.
> >
> > Signed-off-by: Gen Zhang 
> > Fixes: 757cbe597fe8 ("LSM: new method: ->sb_add_mnt_opt()")
> 
> Previous comments regarding "selinux:" instead of "hooks:" apply here as well.
> 
Thanks for your comments, Paul. I will make some changes.
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 3ec702c..4797c63 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -1052,8 +1052,11 @@ static int selinux_add_mnt_opt(const char *option, 
> > const char *val, int len,
> > if (token == Opt_error)
> > return -EINVAL;
> >
> > -   if (token != Opt_seclabel)
> > -   val = kmemdup_nul(val, len, GFP_KERNEL);
> > +   if (token != Opt_seclabel) {
> > +   val = kmemdup_nul(val, len, GFP_KERNEL);
> > +   if (!val)
> > +   return -ENOMEM;
> 
> It looks like this code is only ever called by NFS, which will
> eventually clean up mnt_opts via security_free_mnt_opts(), but since
> the selinux_add_opt() error handler below cleans up mnt_opts it might
> be safer to do the same here in case this function is called multiple
> times to add multiple options.
> 
> > +   }
> > rc = selinux_add_opt(token, val, mnt_opts);
> > if (unlikely(rc)) {
> > kfree(val);
> 
> -- 
> paul moore
> www.paul-moore.com
Thanks
Gen


Re: [PATCH bpf v2] bpf: preallocate a perf_sample_data per event fd

2019-05-31 Thread Song Liu



> On May 31, 2019, at 3:37 PM, Matt Mullins  wrote:
> 
> It is possible that a BPF program can be called while another BPF
> program is executing bpf_perf_event_output.  This has been observed with
> I/O completion occurring as a result of an interrupt:
> 
>   bpf_prog_247fd1341cddaea4_trace_req_end+0x8d7/0x1000
>   ? trace_call_bpf+0x82/0x100
>   ? sch_direct_xmit+0xe2/0x230
>   ? blk_mq_end_request+0x1/0x100
>   ? blk_mq_end_request+0x5/0x100
>   ? kprobe_perf_func+0x19b/0x240
>   ? __qdisc_run+0x86/0x520
>   ? blk_mq_end_request+0x1/0x100
>   ? blk_mq_end_request+0x5/0x100
>   ? kprobe_ftrace_handler+0x90/0xf0
>   ? ftrace_ops_assist_func+0x6e/0xe0
>   ? ip6_input_finish+0xbf/0x460
>   ? 0xa01e80bf
>   ? nbd_dbg_flags_show+0xc0/0xc0 [nbd]
>   ? blkdev_issue_zeroout+0x200/0x200
>   ? blk_mq_end_request+0x1/0x100
>   ? blk_mq_end_request+0x5/0x100
>   ? flush_smp_call_function_queue+0x6c/0xe0
>   ? smp_call_function_single_interrupt+0x32/0xc0
>   ? call_function_single_interrupt+0xf/0x20
>   ? call_function_single_interrupt+0xa/0x20
>   ? swiotlb_map_page+0x140/0x140
>   ? refcount_sub_and_test+0x1a/0x50
>   ? tcp_wfree+0x20/0xf0
>   ? skb_release_head_state+0x62/0xc0
>   ? skb_release_all+0xe/0x30
>   ? napi_consume_skb+0xb5/0x100
>   ? mlx5e_poll_tx_cq+0x1df/0x4e0
>   ? mlx5e_poll_tx_cq+0x38c/0x4e0
>   ? mlx5e_napi_poll+0x58/0xc30
>   ? mlx5e_napi_poll+0x232/0xc30
>   ? net_rx_action+0x128/0x340
>   ? __do_softirq+0xd4/0x2ad
>   ? irq_exit+0xa5/0xb0
>   ? do_IRQ+0x7d/0xc0
>   ? common_interrupt+0xf/0xf
>   
>   ? __rb_free_aux+0xf0/0xf0
>   ? perf_output_sample+0x28/0x7b0
>   ? perf_prepare_sample+0x54/0x4a0
>   ? perf_event_output+0x43/0x60
>   ? bpf_perf_event_output_raw_tp+0x15f/0x180
>   ? blk_mq_start_request+0x1/0x120
>   ? bpf_prog_411a64a706fc6044_should_trace+0xad4/0x1000
>   ? bpf_trace_run3+0x2c/0x80
>   ? nbd_send_cmd+0x4c2/0x690 [nbd]
> 
> This also cannot be alleviated by further splitting the per-cpu
> perf_sample_data structs (as in commit 283ca526a9bd ("bpf: fix
> corruption on concurrent perf_event_output calls")), as a raw_tp could
> be attached to the block:block_rq_complete tracepoint and execute during
> another raw_tp.  Instead, keep a pre-allocated perf_sample_data
> structure per perf_event_array element and fail a bpf_perf_event_output
> if that element is concurrently being used.
> 
> Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data")
> Signed-off-by: Matt Mullins 

This looks great. Thanks for the fix. 

Acked-by: Song Liu 

> ---
> v1->v2:
>   keep a pointer to the struct perf_sample_data rather than directly
>   embedding it in the structure, avoiding the circular include and
>   removing the need for in_use.  Suggested by Song.
> 
> include/linux/bpf.h  |  1 +
> kernel/bpf/arraymap.c|  3 ++-
> kernel/trace/bpf_trace.c | 29 -
> 3 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 4fb3aa2dc975..47fd85cfbbaf 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -472,6 +472,7 @@ struct bpf_event_entry {
>   struct file *perf_file;
>   struct file *map_file;
>   struct rcu_head rcu;
> + struct perf_sample_data *sd;
> };
> 
> bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog 
> *fp);
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 584636c9e2eb..c7f5d593e04f 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -654,11 +654,12 @@ static struct bpf_event_entry 
> *bpf_event_entry_gen(struct file *perf_file,
> {
>   struct bpf_event_entry *ee;
> 
> - ee = kzalloc(sizeof(*ee), GFP_ATOMIC);
> + ee = kzalloc(sizeof(*ee) + sizeof(struct perf_sample_data), GFP_ATOMIC);
>   if (ee) {
>   ee->event = perf_file->private_data;
>   ee->perf_file = perf_file;
>   ee->map_file = map_file;
> + ee->sd = (void *)ee + sizeof(*ee);
>   }
> 
>   return ee;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f92d6ad5e080..076f8e987355 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -410,17 +410,17 @@ static const struct bpf_func_proto 
> bpf_perf_event_read_value_proto = {
>   .arg4_type  = ARG_CONST_SIZE,
> };
> 
> -static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd);
> -
> static __always_inline u64
> __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map,
> - u64 flags, struct perf_sample_data *sd)
> + u64 flags, struct perf_raw_record *raw)
> {
>   struct bpf_array *array = container_of(map, struct bpf_array, map);
>   unsigned int cpu = smp_processor_id();
>   u64 

Re: [PATCH v8 2/4] soc: qcom: Add AOSS QMP driver

2019-05-31 Thread Doug Anderson
Hi,

On Fri, May 31, 2019 at 5:09 PM Bjorn Andersson
 wrote:
>
> On Fri 31 May 15:24 PDT 2019, Doug Anderson wrote:
>
> > Hi,
> >
> > On Thu, May 30, 2019 at 8:01 PM Bjorn Andersson
> >  wrote:
> > >
> > > +/**
> > > + * qmp_send() - send a message to the AOSS
> > > + * @qmp: qmp context
> > > + * @data: message to be sent
> > > + * @len: length of the message
> > > + *
> > > + * Transmit @data to AOSS and wait for the AOSS to acknowledge the 
> > > message.
> > > + * @len must be a multiple of 4 and not longer than the mailbox size. 
> > > Access is
> > > + * synchronized by this implementation.
> > > + *
> > > + * Return: 0 on success, negative errno on failure
> > > + */
> > > +static int qmp_send(struct qmp *qmp, const void *data, size_t len)
> > > +{
> > > +   int ret;
> > > +
> > > +   if (WARN_ON(len + sizeof(u32) > qmp->size))
> > > +   return -EINVAL;
> > > +
> > > +   if (WARN_ON(len % sizeof(u32)))
> > > +   return -EINVAL;
> > > +
> > > +   mutex_lock(&qmp->tx_lock);
> > > +
> > > +   /* The message RAM only implements 32-bit accesses */
> > > +   __iowrite32_copy(qmp->msgram + qmp->offset + sizeof(u32),
> > > +data, len / sizeof(u32));
> > > +   writel(len, qmp->msgram + qmp->offset);
> > > +   qmp_kick(qmp);
> > > +
> > > +   ret = wait_event_interruptible_timeout(qmp->event,
> > > +  qmp_message_empty(qmp), 
> > > HZ);
> > > +   if (!ret) {
> > > +   dev_err(qmp->dev, "ucore did not ack channel\n");
> > > +   ret = -ETIMEDOUT;
> > > +
> > > +   /* Clear message from buffer */
> > > +   writel(0, qmp->msgram + qmp->offset);
> > > +   } else {
> > > +   ret = 0;
> > > +   }
> >
> > Just like Vinod said in in v7, the "ret = 0" is redundant.
> >
>
> If the condition passed to wait_event_interruptible_timeout() evaluates
> true the remote side has consumed the message and ret will be 1. We end
> up in the else block (i.e. not timeout) and we want the function to
> return 0, so we set ret to 0.
>
> Please let me know if I'm reading this wrong.

Ah, it's me that's confused.  I missed the "!" on ret.  Maybe it'd be
less confusing if you did:

time_left = wait_event_interruptible_timeout(...)
if (!time_left)

Even though you _can_ use "ret", it's less confusing to use a
different variable since (often) ret is an error code.

Speaking of which: do you actually handle the case where you get an
interrupt?  Should the above just be wait_event_timeout()?
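
For reference, wait_event_interruptible_timeout() has three distinct outcomes,
which is part of what makes a single 'ret' check easy to misread.  A sketch of
spelling them out, reusing the names from the quoted code (this is not the
driver as posted, just an illustration of the return-value handling):

	long time_left;

	time_left = wait_event_interruptible_timeout(qmp->event,
						     qmp_message_empty(qmp), HZ);
	if (time_left > 0) {
		ret = 0;		/* acked before the timeout expired */
	} else if (time_left == 0) {
		ret = -ETIMEDOUT;	/* timed out, remote never acked */
		writel(0, qmp->msgram + qmp->offset);	/* clear the message */
	} else {
		ret = time_left;	/* -ERESTARTSYS: interrupted by a signal */
	}

Switching to wait_event_timeout() would remove the third case entirely, at the
cost of not being interruptible while waiting for the ack.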


Re: [PATCH] ARM: xor-neon: Replace __GNUC__ checks with CONFIG_CC_IS_GCC

2019-05-31 Thread Nathan Chancellor
On Fri, May 31, 2019 at 11:03:19PM +0200, Arnd Bergmann wrote:
> On Fri, May 31, 2019 at 10:06 PM 'Nick Desaulniers' via Clang Built
> Linux  wrote:
> >
> > On Fri, May 31, 2019 at 12:21 PM Arnd Bergmann  wrote:
> > > clang, I would suggest dropping your patch then, and instead adding
> >
> > I disagree.  The minimum version of gcc required to build the kernel
> > is 4.6, so the comment about older versions of gcc is irrelevant and
> > should be removed.
> 
> Sure, that's ok. It just feels wrong to remove a warning that points
> to a real problem that still exists and can be detected at the moment.
> 
> If we think that clang-9 is going to be fixed before its release,
> the warning could be changed to test for that version as a minimum,
> and point to the bugzilla entry for more details.
> 
>   Arnd

I just tested the arm64 implementation and it shows the same warnings
about cost as arm.

However, I see a warning as something that can be resolved by the user.
The GCC warning's solution is to just use a newer version of GCC
(something fairly easily attainable). This new warning currently has no
solution other than don't use clang.

It is up to you and Nick but I would say unless we are going to
prioritize fixing this, we shouldn't add a warning for it. I'd say it is
more appropriate to fix it then add a warning saying upgrade to this
version to fix it, like the GCC one (though I don't necessarily hate
adding the warning assuming that clang 9 will have it fixed).

Cheers,
Nathan


[PATCH net v1 1/2] net: ethernet: mediatek: Use hw_feature to judge if HWLRO is supported

2019-05-31 Thread sean.wang
From: Sean Wang 

Use hw_features, the hardware capability flags, to check whether hardware
LRO is supported.

Signed-off-by: Mark Lee 
Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index f9fbb3ffa3a6..0b88febbaf2a 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2298,13 +2298,13 @@ static int mtk_get_rxnfc(struct net_device *dev, struct 
ethtool_rxnfc *cmd,
 
switch (cmd->cmd) {
case ETHTOOL_GRXRINGS:
-   if (dev->features & NETIF_F_LRO) {
+   if (dev->hw_features & NETIF_F_LRO) {
cmd->data = MTK_MAX_RX_RING_NUM;
ret = 0;
}
break;
case ETHTOOL_GRXCLSRLCNT:
-   if (dev->features & NETIF_F_LRO) {
+   if (dev->hw_features & NETIF_F_LRO) {
struct mtk_mac *mac = netdev_priv(dev);
 
cmd->rule_cnt = mac->hwlro_ip_cnt;
@@ -2312,11 +2312,11 @@ static int mtk_get_rxnfc(struct net_device *dev, struct 
ethtool_rxnfc *cmd,
}
break;
case ETHTOOL_GRXCLSRULE:
-   if (dev->features & NETIF_F_LRO)
+   if (dev->hw_features & NETIF_F_LRO)
ret = mtk_hwlro_get_fdir_entry(dev, cmd);
break;
case ETHTOOL_GRXCLSRLALL:
-   if (dev->features & NETIF_F_LRO)
+   if (dev->hw_features & NETIF_F_LRO)
ret = mtk_hwlro_get_fdir_all(dev, cmd,
 rule_locs);
break;
@@ -2333,11 +2333,11 @@ static int mtk_set_rxnfc(struct net_device *dev, struct 
ethtool_rxnfc *cmd)
 
switch (cmd->cmd) {
case ETHTOOL_SRXCLSRLINS:
-   if (dev->features & NETIF_F_LRO)
+   if (dev->hw_features & NETIF_F_LRO)
ret = mtk_hwlro_add_ipaddr(dev, cmd);
break;
case ETHTOOL_SRXCLSRLDEL:
-   if (dev->features & NETIF_F_LRO)
+   if (dev->hw_features & NETIF_F_LRO)
ret = mtk_hwlro_del_ipaddr(dev, cmd);
break;
default:
-- 
2.17.1



Re: [PATCH net-next 00/12] code optimizations & bugfixes for HNS3 driver

2019-05-31 Thread David Miller
From: David Miller 
Date: Fri, 31 May 2019 17:15:29 -0700 (PDT)

> From: Huazhong Tan 
> Date: Fri, 31 May 2019 16:54:46 +0800
> 
>> This patch-set includes code optimizations and bugfixes for the HNS3
>> ethernet controller driver.
>> 
>> [patch 1/12] removes the redundant core reset type
>> 
>> [patch 2/12 - 3/12] fixes two VLAN related issues
>> 
>> [patch 4/12] fixes a TM issue
>> 
>> [patch 5/12 - 12/12] includes some patches related to RAS & MSI-X error
> 
> Series applied.

I reverted, you need to actually build test the infiniband side of your
driver.

drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 
‘hns_roce_v2_msix_interrupt_abn’:
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:5032:14: warning: passing argument 2 
of ‘ops->set_default_reset_request’ makes pointer from integer without a cast 
[-Wint-conversion]
  HNAE3_FUNC_RESET);
  ^~~~
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:5032:14: note: expected ‘long 
unsigned int *’ but argument is of type ‘int’
  C-c C-cmake[5]: *** Deleting file 'drivers/net/wireless/ath/carl9170/cmd.o'


[PATCH net v1 2/2] net: ethernet: mediatek: Use NET_IP_ALIGN to judge if HW RX_2BYTE_OFFSET is enabled

2019-05-31 Thread sean.wang
From: Sean Wang 

Only enable the HW RX_2BYTE_OFFSET function when NET_IP_ALIGN equals 2.

Signed-off-by: Mark Lee 
Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 0b88febbaf2a..765cd56ebcd2 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1778,6 +1778,7 @@ static void mtk_poll_controller(struct net_device *dev)
 
 static int mtk_start_dma(struct mtk_eth *eth)
 {
+   u32 rx_2b_offset = (NET_IP_ALIGN == 2) ? MTK_RX_2B_OFFSET : 0;
int err;
 
err = mtk_dma_init(eth);
@@ -1794,7 +1795,7 @@ static int mtk_start_dma(struct mtk_eth *eth)
MTK_QDMA_GLO_CFG);
 
mtk_w32(eth,
-   MTK_RX_DMA_EN | MTK_RX_2B_OFFSET |
+   MTK_RX_DMA_EN | rx_2b_offset |
MTK_RX_BT_32DWORDS | MTK_MULTI_EN,
MTK_PDMA_GLO_CFG);
 
-- 
2.17.1



Re: [PATCH net-next 00/12] code optimizations & bugfixes for HNS3 driver

2019-05-31 Thread David Miller
From: Huazhong Tan 
Date: Fri, 31 May 2019 16:54:46 +0800

> This patch-set includes code optimizations and bugfixes for the HNS3
> ethernet controller driver.
> 
> [patch 1/12] removes the redundant core reset type
> 
> [patch 2/12 - 3/12] fixes two VLAN related issues
> 
> [patch 4/12] fixes a TM issue
> 
> [patch 5/12 - 12/12] includes some patches related to RAS & MSI-X error

Series applied.


Re: [PATCH -mm] mm, swap: Fix bad swap file entry warning

2019-05-31 Thread Huang, Ying
Michal Hocko  writes:

> On Fri 31-05-19 10:41:02, Huang, Ying wrote:
>> From: Huang Ying 
>> 
>> Mike reported the following warning messages
>> 
>>   get_swap_device: Bad swap file entry 1401
>> 
>> This is produced by
>> 
>> - total_swapcache_pages()
>>   - get_swap_device()
>> 
>> Where get_swap_device() is used to check whether the swap device is
>> valid and prevent it from being swapoff if so.  But get_swap_device()
>> may produce warning message as above for some invalid swap devices.
>> This is fixed via calling swp_swap_info() before get_swap_device() to
>> filter out the swap devices that may cause warning messages.
>> 
>> Fixes: 6a946753dbe6 ("mm/swap_state.c: simplify total_swapcache_pages() with 
>> get_swap_device()")
>
> I suspect this is referring to a mmotm patch right?

Yes.

> There doesn't seem
> to be any sha like this in Linus' tree AFAICS. If that is the case then
> please note that mmotm patch showing up in linux-next do not have a
> stable sha1 and therefore you shouldn't reference them in the commit
> message. Instead please refer to the specific mmotm patch file so that
> Andrew knows it should be folded in to it.

Thanks for the reminder!  I will be more careful in the future.  It seems
that Andrew has identified the right patch to fold it into.

Best Regards,
Huang, Ying


Re: [PATCH bpf v2] bpf, riscv: clear high 32 bits for ALU32 add/sub/neg/lsh/rsh/arsh

2019-05-31 Thread Alexei Starovoitov
On Fri, May 31, 2019 at 1:40 PM Palmer Dabbelt  wrote:
>
> On Thu, 30 May 2019 15:29:22 PDT (-0700), luke.r.n...@gmail.com wrote:
> > In BPF, 32-bit ALU operations should zero-extend their results into
> > the 64-bit registers.
> >
> > The current BPF JIT on RISC-V emits incorrect instructions that perform
> > sign extension only (e.g., addw, subw) on 32-bit add, sub, lsh, rsh,
> > arsh, and neg. This behavior diverges from the interpreter and JITs
> > for other architectures.
> >
> > This patch fixes the bugs by performing zero extension on the destination
> > register of 32-bit ALU operations.
> >
> > Fixes: 2353ecc6f91f ("bpf, riscv: add BPF JIT for RV64G")
> > Cc: Xi Wang 
> > Signed-off-by: Luke Nelson 
>
> Reviewed-by: Palmer Dabbelt 
>
> Thanks!  I'm assuming this is going in through a BPF tree and not the RISC-V
> tree, but LMK if that's not the case.

Applied to bpf tree. Thanks


[PATCH net-next v1 4/6] net: ethernet: mediatek: Integrate hardware path from GMAC to PHY variants

2019-05-31 Thread sean.wang
From: Sean Wang 

All path routing on the various SoCs is managed in the common function
mtk_setup_hw_path, which selects the path between the GMAC and the target
PHY or switch at runtime, based on both the applied devicetree and the
capability of the target SoC.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/Makefile   |   3 +-
 drivers/net/ethernet/mediatek/mtk_eth_path.c | 323 +++
 drivers/net/ethernet/mediatek/mtk_eth_soc.c  |  70 +---
 drivers/net/ethernet/mediatek/mtk_eth_soc.h  | 123 ++-
 4 files changed, 450 insertions(+), 69 deletions(-)
 create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_path.c

diff --git a/drivers/net/ethernet/mediatek/Makefile 
b/drivers/net/ethernet/mediatek/Makefile
index b8206605154e..212c86f9982f 100644
--- a/drivers/net/ethernet/mediatek/Makefile
+++ b/drivers/net/ethernet/mediatek/Makefile
@@ -3,4 +3,5 @@
 # Makefile for the Mediatek SoCs built-in ethernet macs
 #
 
-obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth_soc.o mtk_sgmii.o
+obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth_soc.o mtk_sgmii.o \
+ mtk_eth_path.o
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_path.c 
b/drivers/net/ethernet/mediatek/mtk_eth_path.c
new file mode 100644
index ..61f705d945e5
--- /dev/null
+++ b/drivers/net/ethernet/mediatek/mtk_eth_path.c
@@ -0,0 +1,323 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018-2019 MediaTek Inc.
+
+/* A library for configuring path from GMAC/GDM to target PHY
+ *
+ * Author: Sean Wang 
+ *
+ */
+
+#include 
+#include 
+
+#include "mtk_eth_soc.h"
+
+struct mtk_eth_muxc {
+   int (*set_path)(struct mtk_eth *eth, int path);
+};
+
+static const char * const mtk_eth_mux_name[] = {
+   "mux_gdm1_to_gmac1_esw", "mux_gmac2_gmac0_to_gephy",
+   "mux_u3_gmac2_to_qphy", "mux_gmac1_gmac2_to_sgmii_rgmii",
+   "mux_gmac12_to_gephy_sgmii",
+};
+
+static const char * const mtk_eth_path_name[] = {
+   "gmac1_rgmii", "gmac1_trgmii", "gmac1_sgmii", "gmac2_rgmii",
+   "gmac2_sgmii", "gmac2_gephy", "gdm1_esw",
+};
+
+static int set_mux_gdm1_to_gmac1_esw(struct mtk_eth *eth, int path)
+{
+   bool updated = true;
+   u32 val, mask, set;
+
+   switch (path) {
+   case MTK_ETH_PATH_GMAC1_SGMII:
+   mask = ~(u32)MTK_MUX_TO_ESW;
+   set = 0;
+   break;
+   case MTK_ETH_PATH_GDM1_ESW:
+   mask = ~(u32)MTK_MUX_TO_ESW;
+   set = MTK_MUX_TO_ESW;
+   break;
+   default:
+   updated = false;
+   break;
+   };
+
+   if (updated) {
+   val = mtk_r32(eth, MTK_MAC_MISC);
+   val = (val & mask) | set;
+   mtk_w32(eth, val, MTK_MAC_MISC);
+   }
+
+   dev_dbg(eth->dev, "path %s in %s updated = %d\n",
+   mtk_eth_path_name[path], __func__, updated);
+
+   return 0;
+}
+
+static int set_mux_gmac2_gmac0_to_gephy(struct mtk_eth *eth, int path)
+{
+   unsigned int val = 0;
+   bool updated = true;
+
+   switch (path) {
+   case MTK_ETH_PATH_GMAC2_GEPHY:
+   val = ~(u32)GEPHY_MAC_SEL;
+   break;
+   default:
+   updated = false;
+   break;
+   }
+
+   if (updated)
+   regmap_update_bits(eth->infra, INFRA_MISC2, GEPHY_MAC_SEL, val);
+
+   dev_dbg(eth->dev, "path %s in %s updated = %d\n",
+   mtk_eth_path_name[path], __func__, updated);
+
+   return 0;
+}
+
+static int set_mux_u3_gmac2_to_qphy(struct mtk_eth *eth, int path)
+{
+   unsigned int val = 0;
+   bool updated = true;
+
+   switch (path) {
+   case MTK_ETH_PATH_GMAC2_SGMII:
+   val = CO_QPHY_SEL;
+   break;
+   default:
+   updated = false;
+   break;
+   }
+
+   if (updated)
+   regmap_update_bits(eth->infra, INFRA_MISC2, CO_QPHY_SEL, val);
+
+   dev_dbg(eth->dev, "path %s in %s updated = %d\n",
+   mtk_eth_path_name[path], __func__, updated);
+
+   return 0;
+}
+
+static int set_mux_gmac1_gmac2_to_sgmii_rgmii(struct mtk_eth *eth, int path)
+{
+   unsigned int val = 0;
+   bool updated = true;
+
+   switch (path) {
+   case MTK_ETH_PATH_GMAC1_SGMII:
+   val = SYSCFG0_SGMII_GMAC1;
+   break;
+   case MTK_ETH_PATH_GMAC2_SGMII:
+   val = SYSCFG0_SGMII_GMAC2;
+   break;
+   case MTK_ETH_PATH_GMAC1_RGMII:
+   case MTK_ETH_PATH_GMAC2_RGMII:
+   regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
+   val &= SYSCFG0_SGMII_MASK;
+
+   if ((path == MTK_GMAC1_RGMII && val == SYSCFG0_SGMII_GMAC1) ||
+   (path == MTK_GMAC2_RGMII && val == SYSCFG0_SGMII_GMAC2))
+   val = 0;
+   else
+   updated = false;
+   break;
+   

Re: [PATCH v8 2/4] soc: qcom: Add AOSS QMP driver

2019-05-31 Thread Bjorn Andersson
On Fri 31 May 15:24 PDT 2019, Doug Anderson wrote:

> Hi,
> 
> On Thu, May 30, 2019 at 8:01 PM Bjorn Andersson
>  wrote:
> >
> > +/**
> > + * qmp_send() - send a message to the AOSS
> > + * @qmp: qmp context
> > + * @data: message to be sent
> > + * @len: length of the message
> > + *
> > + * Transmit @data to AOSS and wait for the AOSS to acknowledge the message.
> > + * @len must be a multiple of 4 and not longer than the mailbox size. 
> > Access is
> > + * synchronized by this implementation.
> > + *
> > + * Return: 0 on success, negative errno on failure
> > + */
> > +static int qmp_send(struct qmp *qmp, const void *data, size_t len)
> > +{
> > +   int ret;
> > +
> > +   if (WARN_ON(len + sizeof(u32) > qmp->size))
> > +   return -EINVAL;
> > +
> > +   if (WARN_ON(len % sizeof(u32)))
> > +   return -EINVAL;
> > +
> > +   mutex_lock(&qmp->tx_lock);
> > +
> > +   /* The message RAM only implements 32-bit accesses */
> > +   __iowrite32_copy(qmp->msgram + qmp->offset + sizeof(u32),
> > +data, len / sizeof(u32));
> > +   writel(len, qmp->msgram + qmp->offset);
> > +   qmp_kick(qmp);
> > +
> > +   ret = wait_event_interruptible_timeout(qmp->event,
> > +  qmp_message_empty(qmp), HZ);
> > +   if (!ret) {
> > +   dev_err(qmp->dev, "ucore did not ack channel\n");
> > +   ret = -ETIMEDOUT;
> > +
> > +   /* Clear message from buffer */
> > +   writel(0, qmp->msgram + qmp->offset);
> > +   } else {
> > +   ret = 0;
> > +   }
> 
> Just like Vinod said in v7, the "ret = 0" is redundant.
> 

If the condition passed to wait_event_interruptible_timeout() evaluates
true the remote side has consumed the message and ret will be 1. We end
up in the else block (i.e. not timeout) and we want the function to
return 0, so we set ret to 0.

Please let me know if I'm reading this wrong.
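
To illustrate that with a tiny standalone model (illustrative only, not driver code; it just mirrors the return-value mapping discussed here):

#include <errno.h>
#include <stdio.h>

/* wait_event_interruptible_timeout() returns 0 on timeout, a positive count
 * when the condition became true, or a negative value if interrupted.
 * qmp_send() wants plain 0 on success, so the "ret = 0" in the else branch
 * maps "positive" to "success" rather than being redundant.
 */
static int model_send_result(int wait_ret)
{
        int ret = wait_ret;

        if (!ret)
                ret = -ETIMEDOUT;       /* ucore did not ack the channel */
        else
                ret = 0;                /* condition met (or wait interrupted) */

        return ret;
}

int main(void)
{
        printf("timeout: %d, acked: %d\n",
               model_send_result(0), model_send_result(42));
        return 0;
}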

> 
> > +static int qmp_qdss_clk_add(struct qmp *qmp)
> > +{
> > +   struct clk_init_data qdss_init = {
> > +   .ops = &qmp_qdss_clk_ops,
> > +   .name = "qdss",
> > +   };
> 
> As I mentioned in v7, there is no downside in marking qdss_init as
> "static const" and it avoids the compiler inserting a memcpy() to get
> this data on the stack.  Using static const also reduces your stack
> usage.
> 

In which case we would just serve it from .ro, makes sense now that I
read your comment again. 

> 
> > +   int ret;
> > +
> > +   qmp->qdss_clk.init = &qdss_init;
> > +   ret = clk_hw_register(qmp->dev, &qmp->qdss_clk);
> > +   if (ret < 0) {
> > +   dev_err(qmp->dev, "failed to register qdss clock\n");
> > +   return ret;
> > +   }
> > +
> > +   ret = of_clk_add_hw_provider(qmp->dev->of_node, 
> > of_clk_hw_simple_get,
> > +  &qmp->qdss_clk);
> 
> I still prefer to devm-ify the whole driver, using
> devm_add_action_or_reset() to handle things where there is no devm.
> ...but I won't insist.
> 
> 
> Above things are just nits and I won't insist.  They also could be
> addressed in follow-up patches.  Thus:
> 
> Reviewed-by: Douglas Anderson 

Thanks!

Regards,
Bjorn


[PATCH net-next v1 1/6] dt-bindings: clock: mediatek: Add an extra required property to sgmiisys

2019-05-31 Thread sean.wang
From: Sean Wang 

Add an extra required property "mediatek,physpeed" to sgmiisys to determine
the link speed, matching the capability of the target PHY.

Signed-off-by: Sean Wang 
---
 .../devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt  | 2 ++
 1 file changed, 2 insertions(+)

diff --git 
a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt 
b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt
index 30cb645c0e54..f5518f26a914 100644
--- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt
@@ -9,6 +9,8 @@ Required Properties:
- "mediatek,mt7622-sgmiisys", "syscon"
- "mediatek,mt7629-sgmiisys", "syscon"
 - #clock-cells: Must be 1
+- mediatek,physpeed: Should be one of "auto", "1000" or "2500" to match up
+the capability of the target PHY.
 
 The SGMIISYS controller uses the common clk binding from
 Documentation/devicetree/bindings/clock/clock-bindings.txt
-- 
2.17.1



[PATCH net-next v1 6/6] arm64: dts: mt7622: Enlarge the SGMII register range

2019-05-31 Thread sean.wang
From: Sean Wang 

Enlarge the SGMII register range and use 2.5G forced mode by default.

Signed-off-by: Sean Wang 
---
 arch/arm64/boot/dts/mediatek/mt7622.dtsi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi 
b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
index 4b1f5ae710eb..d1e13d340e26 100644
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
@@ -929,7 +929,8 @@
sgmiisys: sgmiisys@1b128000 {
compatible = "mediatek,mt7622-sgmiisys",
 "syscon";
-   reg = <0 0x1b128000 0 0x1000>;
+   reg = <0 0x1b128000 0 0x3000>;
#clock-cells = <1>;
+   mediatek,physpeed = "2500";
};
 };
-- 
2.17.1



[PATCH net-next v1 0/6] Add MT7629 ethernet support

2019-05-31 Thread sean.wang
From: Sean Wang 

MT7629 includes two sets of SGMIIs used for an external switch or PHY, an embedded
switch (ESW) via GDM1, and a GePHY via GMAC2, so several patches are added in the
series to make the code base common with the older SoCs.

Patches 1, 3 and 6 add an extension for SGMII so the hardware can be configured
for 1G, 2.5G or AN to fit the capability of the target PHY. Patch 6 is an
example showing how to use these configurations so the underlying PHY speed
matches the link speed of the target PHY.

Patch 4 automatically configures the hardware path from GMACx to the target
PHY, using the devicetree topology to determine the proper value for the
corresponding MUX.

Patches 2 and 5 update the dt-binding document and the driver for MT7629.

Sean Wang (6):
  dt-bindings: clock: mediatek: Add an extra required property to
sgmiisys
  dt-bindings: net: mediatek: Add support for MediaTek MT7629 SoC
  net: ethernet: mediatek: Extend SGMII related functions
  net: ethernet: mediatek: Integrate hardware path from GMAC to PHY
variants
  net: ethernet: mediatek: Add MT7629 ethernet support
  arm64: dts: mt7622: Enlarge the SGMII register range

 .../arm/mediatek/mediatek,sgmiisys.txt|   2 +
 .../devicetree/bindings/net/mediatek-net.txt  |  14 +-
 arch/arm64/boot/dts/mediatek/mt7622.dtsi  |   3 +-
 drivers/net/ethernet/mediatek/Makefile|   3 +-
 drivers/net/ethernet/mediatek/mtk_eth_path.c  | 323 ++
 drivers/net/ethernet/mediatek/mtk_eth_soc.c   |  97 +++---
 drivers/net/ethernet/mediatek/mtk_eth_soc.h   | 177 +-
 drivers/net/ethernet/mediatek/mtk_sgmii.c | 105 ++
 8 files changed, 647 insertions(+), 77 deletions(-)
 create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_path.c
 create mode 100644 drivers/net/ethernet/mediatek/mtk_sgmii.c

-- 
2.17.1



[PATCH net-next v1 2/6] dt-bindings: net: mediatek: Add support for MediaTek MT7629 SoC

2019-05-31 Thread sean.wang
From: Sean Wang 

Add binding document for the ethernet on MT7629 SoC.

Signed-off-by: Sean Wang 
---
 .../devicetree/bindings/net/mediatek-net.txt   | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt 
b/Documentation/devicetree/bindings/net/mediatek-net.txt
index 503f2b9194e2..770ff98d4524 100644
--- a/Documentation/devicetree/bindings/net/mediatek-net.txt
+++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
@@ -11,6 +11,7 @@ Required properties:
"mediatek,mt2701-eth": for MT2701 SoC
"mediatek,mt7623-eth", "mediatek,mt2701-eth": for MT7623 SoC
"mediatek,mt7622-eth": for MT7622 SoC
+   "mediatek,mt7629-eth": for MT7629 SoC
 - reg: Address and length of the register set for the device
 - interrupts: Should contain the three frame engines interrupts in numeric
order. These are fe_int0, fe_int1 and fe_int2.
@@ -19,14 +20,23 @@ Required properties:
"ethif", "esw", "gp2", "gp1" : For MT2701 and MT7623 SoC
 "ethif", "esw", "gp0", "gp1", "gp2", "sgmii_tx250m", "sgmii_rx250m",
"sgmii_cdr_ref", "sgmii_cdr_fb", "sgmii_ck", "eth2pll" : For MT7622 SoC
+   "ethif", "sgmiitop", "esw", "gp0", "gp1", "gp2", "fe", "sgmii_tx250m",
+   "sgmii_rx250m", "sgmii_cdr_ref", "sgmii_cdr_fb", "sgmii2_tx250m",
+   "sgmii2_rx250m", "sgmii2_cdr_ref", "sgmii2_cdr_fb", "sgmii_ck",
+   "eth2pll" : For MT7629 SoC.
 - power-domains: phandle to the power domain that the ethernet is part of
 - resets: Should contain phandles to the ethsys reset signals
 - reset-names: Should contain the names of reset signal listed in the resets
property
These are "fe", "gmac" and "ppe"
 - mediatek,ethsys: phandle to the syscon node that handles the port setup
-- mediatek,sgmiisys: phandle to the syscon node that handles the SGMII setup
-   which is required for those SoCs equipped with SGMII such as MT7622 SoC.
+- mediatek,infracfg: phandle to the syscon node that handles the path from
+   GMAC to PHY variants, which is required for MT7629 SoC.
+- mediatek,sgmiisys: a list of phandles to the syscon node that handles the
+   SGMII setup which is required for those SoCs equipped with SGMII such
+   as MT7622 and MT7629 SoC. MT7622 has only one set of SGMII shared
+   by GMAC1 and GMAC2; MT7629 has two independent sets of SGMII directed
+   to GMAC1 and GMAC2, respectively.
 - mediatek,pctl: phandle to the syscon node that handles the ports slew rate
and driver current: only for MT2701 and MT7623 SoC
 
-- 
2.17.1



[PATCH net-next v1 5/6] net: ethernet: mediatek: Add MT7629 ethernet support

2019-05-31 Thread sean.wang
From: Sean Wang 

Add ethernet support to MT7629 SoC

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 14 --
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 19 +++
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 382173fa4752..362eacd82b92 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -54,8 +54,10 @@ static const struct mtk_ethtool_stats {
 };
 
 static const char * const mtk_clks_source_name[] = {
-   "ethif", "esw", "gp0", "gp1", "gp2", "trgpll", "sgmii_tx250m",
-   "sgmii_rx250m", "sgmii_cdr_ref", "sgmii_cdr_fb", "sgmii_ck", "eth2pll"
+   "ethif", "sgmiitop", "esw", "gp0", "gp1", "gp2", "fe", "trgpll",
+   "sgmii_tx250m", "sgmii_rx250m", "sgmii_cdr_ref", "sgmii_cdr_fb",
+   "sgmii2_tx250m", "sgmii2_rx250m", "sgmii2_cdr_ref", "sgmii2_cdr_fb",
+   "sgmii_ck", "eth2pll",
 };
 
 void mtk_w32(struct mtk_eth *eth, u32 val, unsigned reg)
@@ -2629,11 +2631,19 @@ static const struct mtk_soc_data mt7623_data = {
.required_pctl = true,
 };
 
+static const struct mtk_soc_data mt7629_data = {
+   .ana_rgc3 = 0x128,
+   .caps = MT7629_CAPS | MTK_HWLRO,
+   .required_clks = MT7629_CLKS_BITMAP,
+   .required_pctl = false,
+};
+
 const struct of_device_id of_mtk_match[] = {
{ .compatible = "mediatek,mt2701-eth", .data = _data},
{ .compatible = "mediatek,mt7621-eth", .data = _data},
{ .compatible = "mediatek,mt7622-eth", .data = _data},
{ .compatible = "mediatek,mt7623-eth", .data = _data},
+   { .compatible = "mediatek,mt7629-eth", .data = _data},
{},
 };
 MODULE_DEVICE_TABLE(of, of_mtk_match);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 89d68dd60b3d..a0aa5008d5cc 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -475,15 +475,21 @@ enum mtk_tx_flags {
  */
 enum mtk_clks_map {
MTK_CLK_ETHIF,
+   MTK_CLK_SGMIITOP,
MTK_CLK_ESW,
MTK_CLK_GP0,
MTK_CLK_GP1,
MTK_CLK_GP2,
+   MTK_CLK_FE,
MTK_CLK_TRGPLL,
MTK_CLK_SGMII_TX_250M,
MTK_CLK_SGMII_RX_250M,
MTK_CLK_SGMII_CDR_REF,
MTK_CLK_SGMII_CDR_FB,
+   MTK_CLK_SGMII2_TX_250M,
+   MTK_CLK_SGMII2_RX_250M,
+   MTK_CLK_SGMII2_CDR_REF,
+   MTK_CLK_SGMII2_CDR_FB,
MTK_CLK_SGMII_CK,
MTK_CLK_ETH2PLL,
MTK_CLK_MAX
@@ -502,6 +508,19 @@ enum mtk_clks_map {
 BIT(MTK_CLK_SGMII_CK) | \
 BIT(MTK_CLK_ETH2PLL))
 #define MT7621_CLKS_BITMAP (0)
+#define MT7629_CLKS_BITMAP (BIT(MTK_CLK_ETHIF) | BIT(MTK_CLK_ESW) |  \
+BIT(MTK_CLK_GP0) | BIT(MTK_CLK_GP1) | \
+BIT(MTK_CLK_GP2) | BIT(MTK_CLK_FE) | \
+BIT(MTK_CLK_SGMII_TX_250M) | \
+BIT(MTK_CLK_SGMII_RX_250M) | \
+BIT(MTK_CLK_SGMII_CDR_REF) | \
+BIT(MTK_CLK_SGMII_CDR_FB) | \
+BIT(MTK_CLK_SGMII2_TX_250M) | \
+BIT(MTK_CLK_SGMII2_RX_250M) | \
+BIT(MTK_CLK_SGMII2_CDR_REF) | \
+BIT(MTK_CLK_SGMII2_CDR_FB) | \
+BIT(MTK_CLK_SGMII_CK) | \
+BIT(MTK_CLK_ETH2PLL) | BIT(MTK_CLK_SGMIITOP))
 
 enum mtk_dev_state {
MTK_HW_INIT,
-- 
2.17.1



[PATCH net-next v1 3/6] net: ethernet: mediatek: Extend SGMII related functions

2019-05-31 Thread sean.wang
From: Sean Wang 

Add the SGMII-related logic in a separate file, and also provide options for
forcing 1G, 2.5G or AN mode for the target PHY, which can be determined from
the SGMII node in the DTS.
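
As a rough sketch of that idea (the helper and flag values below are placeholders for illustration, not the actual mtk_sgmii.c code, which is not shown in this excerpt), the property could be mapped to speed flags like this:

#include <linux/bits.h>
#include <linux/of.h>
#include <linux/string.h>
#include <linux/types.h>

/* Placeholder flag values, for illustration only. */
#define SGMII_SPEED_AN          BIT(31)
#define SGMII_SPEED_1000        BIT(0)
#define SGMII_SPEED_2500        BIT(1)

static u32 sgmii_parse_physpeed(struct device_node *np)
{
        const char *str;

        /* Default to autonegotiation when the property is absent. */
        if (of_property_read_string(np, "mediatek,physpeed", &str))
                return SGMII_SPEED_AN;

        if (!strcmp(str, "2500"))
                return SGMII_SPEED_2500;
        if (!strcmp(str, "1000"))
                return SGMII_SPEED_1000;

        return SGMII_SPEED_AN;  /* "auto" or anything unrecognized */
}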

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/Makefile  |   2 +-
 drivers/net/ethernet/mediatek/mtk_eth_soc.c |  75 --
 drivers/net/ethernet/mediatek/mtk_eth_soc.h |  39 +++-
 drivers/net/ethernet/mediatek/mtk_sgmii.c   | 105 
 4 files changed, 184 insertions(+), 37 deletions(-)
 create mode 100644 drivers/net/ethernet/mediatek/mtk_sgmii.c

diff --git a/drivers/net/ethernet/mediatek/Makefile 
b/drivers/net/ethernet/mediatek/Makefile
index d41a2414c575..b8206605154e 100644
--- a/drivers/net/ethernet/mediatek/Makefile
+++ b/drivers/net/ethernet/mediatek/Makefile
@@ -3,4 +3,4 @@
 # Makefile for the Mediatek SoCs built-in ethernet macs
 #
 
-obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth_soc.o
+obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth_soc.o mtk_sgmii.o
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 765cd56ebcd2..d0cff646d3de 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -165,36 +165,37 @@ static void mtk_gmac0_rgmii_adjust(struct mtk_eth *eth, 
int speed)
mtk_w32(eth, val, TRGMII_TCK_CTRL);
 }
 
-static void mtk_gmac_sgmii_hw_setup(struct mtk_eth *eth, int mac_id)
+static int mtk_gmac_sgmii_hw_setup(struct mtk_eth *eth, int mac_id)
 {
+   int sid, err;
u32 val;
 
-   /* Setup the link timer and QPHY power up inside SGMIISYS */
-   regmap_write(eth->sgmiisys, SGMSYS_PCS_LINK_TIMER,
-SGMII_LINK_TIMER_DEFAULT);
+   /* Enable GMAC with SGMII once we finish the SGMII setup. */
+   regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
+   val &= ~SYSCFG0_SGMII_MASK;
+   regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
 
-   regmap_read(eth->sgmiisys, SGMSYS_SGMII_MODE, &val);
-   val |= SGMII_REMOTE_FAULT_DIS;
-   regmap_write(eth->sgmiisys, SGMSYS_SGMII_MODE, val);
+   if (MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC_SHARED_SGMII))
+   sid = 0;
+   else
+   sid = mac_id;
 
-   regmap_read(eth->sgmiisys, SGMSYS_PCS_CONTROL_1, &val);
-   val |= SGMII_AN_RESTART;
-   regmap_write(eth->sgmiisys, SGMSYS_PCS_CONTROL_1, val);
+   if (MTK_HAS_FLAGS(eth->sgmii->flags[sid], MTK_SGMII_PHYSPEED_AN))
+   err = mtk_sgmii_setup_mode_an(eth->sgmii, sid);
+   else
+   err = mtk_sgmii_setup_mode_force(eth->sgmii, sid);
 
-   regmap_read(eth->sgmiisys, SGMSYS_QPHY_PWR_STATE_CTRL, &val);
-   val &= ~SGMII_PHYA_PWD;
-   regmap_write(eth->sgmiisys, SGMSYS_QPHY_PWR_STATE_CTRL, val);
+   if (err)
+   return err;
 
/* Determine MUX for which GMAC uses the SGMII interface */
-   if (MTK_HAS_CAPS(eth->soc->caps, MTK_DUAL_GMAC_SHARED_SGMII)) {
-   regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
-   val &= ~SYSCFG0_SGMII_MASK;
-   val |= !mac_id ? SYSCFG0_SGMII_GMAC1 : SYSCFG0_SGMII_GMAC2;
-   regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
-
-   dev_info(eth->dev, "setup shared sgmii for gmac=%d\n",
-mac_id);
-   }
+   regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
+   if (!mac_id)
+   val |= SYSCFG0_SGMII_GMAC1;
+   else
+   val |= MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC_SHARED_SGMII) ?
+   SYSCFG0_SGMII_GMAC2 : SYSCFG0_SGMII_GMAC2_V2;
+   regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
 
/* Setup the GMAC1 going through SGMII path when SoC also support
 * ESW on GMAC1
@@ -204,6 +205,8 @@ static void mtk_gmac_sgmii_hw_setup(struct mtk_eth *eth, 
int mac_id)
mtk_w32(eth, 0, MTK_MAC_MISC);
dev_info(eth->dev, "setup gmac1 going through sgmii");
}
+
+   return 0;
 }
 
 static void mtk_phy_link_adjust(struct net_device *dev)
@@ -295,6 +298,7 @@ static int mtk_phy_connect(struct net_device *dev)
struct mtk_eth *eth;
struct device_node *np;
u32 val;
+   int err;
 
eth = mac->hw;
np = of_parse_phandle(mac->of_node, "phy-handle", 0);
@@ -314,8 +318,11 @@ static int mtk_phy_connect(struct net_device *dev)
case PHY_INTERFACE_MODE_RGMII:
break;
case PHY_INTERFACE_MODE_SGMII:
-   if (MTK_HAS_CAPS(eth->soc->caps, MTK_SGMII))
-   mtk_gmac_sgmii_hw_setup(eth, mac->id);
+   if (MTK_HAS_CAPS(eth->soc->caps, MTK_SGMII)) {
+   err = mtk_gmac_sgmii_hw_setup(eth, mac->id);
+   if (err)
+   goto err_phy;
+   }
break;
case PHY_INTERFACE_MODE_MII:
mac->ge_mode = 1;
@@ -2484,13 +2491,16 @@ 

Re: [PATCH bpf v4] libbpf: Return btf_fd for load_sk_storage_btf

2019-05-31 Thread Alexei Starovoitov
On Thu, May 30, 2019 at 2:34 PM Song Liu  wrote:
>
> On Wed, May 29, 2019 at 11:30 AM Michal Rostecki  
> wrote:
> >
> > Before this change, function load_sk_storage_btf expected that
> > libbpf__probe_raw_btf was returning a BTF descriptor, but in fact it was
> > returning information about whether the probe was successful (0 or
> > 1). load_sk_storage_btf was using that value as an argument of the close
> > function, which was resulting in closing stdout and thus terminating the
> > process which called that function.
> >
> > That bug was visible in bpftool. `bpftool feature` subcommand was always
> > exiting too early (because of closed stdout) and it didn't display all
> > requested probes. `bpftool -j feature` or `bpftool -p feature` were not
> > returning a valid json object.
> >
> > This change renames the libbpf__probe_raw_btf function to
> > libbpf__load_raw_btf, which now returns a BTF descriptor, as expected in
> > load_sk_storage_btf.
> >
> > v2:
> > - Fix typo in the commit message.
> >
> > v3:
> > - Simplify BTF descriptor handling in bpf_object__probe_btf_* functions.
> > - Rename libbpf__probe_raw_btf function to libbpf__load_raw_btf and
> > return a BTF descriptor.
> >
> > v4:
> > - Fix typo in the commit message.
> >
> > Fixes: d7c4b3980c18 ("libbpf: detect supported kernel BTF features and 
> > sanitize BTF")
> > Signed-off-by: Michal Rostecki 
> > Acked-by: Andrii Nakryiko 
>
> Acked-by: Song Liu 

Applied. Thanks!


Re: [RFC][PATCH 2/2] reset: qcom-pon: Add support for gen2 pon

2019-05-31 Thread John Stultz
On Fri, May 31, 2019 at 4:53 PM Bjorn Andersson
 wrote:
>
> On Fri 31 May 16:47 PDT 2019, John Stultz wrote:
>
> > Add support for gen2 pon register so "reboot bootloader" can
> > work on pixel3 and db845.
> >
> > Cc: Andy Gross 
> > Cc: David Brown 
> > Cc: Bjorn Andersson 
> > Cc: Amit Pundir 
> > Cc: Rob Herring 
> > Cc: Mark Rutland 
> > Cc: Sebastian Reichel 
> > Cc: linux-arm-...@vger.kernel.org
> > Cc: devicet...@vger.kernel.org
> > Signed-off-by: John Stultz 
> > ---
> >  arch/arm64/boot/dts/qcom/pm8998.dtsi |  2 +-
> >  drivers/power/reset/qcom-pon.c   | 15 ---
> >  2 files changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
> > b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> > index d3ca35a940fb..051a52df80f9 100644
> > --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> > @@ -39,7 +39,7 @@
> >   #size-cells = <0>;
> >
> >   pm8998_pon: pon@800 {
> > - compatible = "qcom,pm8916-pon";
> > + compatible = "qcom,pm8998-pon";
> >
> >   reg = <0x800>;
> >   mode-bootloader = <0x2>;
>
> We want to take this through arm-soc and the rest through Sebastian's
> tree, so please provide the dts update in a separate commit.

Sure. I wasn't sure if tracking the change in a separate patch was
worth it for such a trivial oneliner, but that's fine, I'll split it
out.

thanks for the review!
-john


Re: [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup

2019-05-31 Thread Alexei Starovoitov
On Wed, May 29, 2019 at 6:04 PM Roman Gushchin  wrote:
>
> During my work on memcg-based memory accounting for bpf maps
> I've done some cleanups and refactorings of the existing
> memlock rlimit-based code. It makes it more robust, unifies
> size to pages conversion, size checks and corresponding error
> codes. Also it adds coverage for cgroup local storage and
> socket local storage maps.
>
> It looks like some preliminary work on the mm side might be
> required to start working on the memcg-based accounting,
> so I'm sending these patches as a separate patchset.

Applied. Thanks


Re: [PATCH v2 00/17] net: introduce Qualcomm IPA driver

2019-05-31 Thread Subash Abhinov Kasiviswanathan

On 2019-05-31 17:33, Bjorn Andersson wrote:

On Fri 31 May 13:47 PDT 2019, Alex Elder wrote:


On 5/31/19 2:19 PM, Arnd Bergmann wrote:
> On Fri, May 31, 2019 at 6:36 PM Alex Elder  wrote:
>> On 5/31/19 9:58 AM, Dan Williams wrote:
>>> On Thu, 2019-05-30 at 22:53 -0500, Alex Elder wrote:
>>>
>>> My question from the Nov 2018 IPA rmnet driver still stands; how does
>>> this relate to net/ethernet/qualcomm/rmnet/ if at all? And if this is
>>> really just a netdev talking to the IPA itself and unrelated to
>>> net/ethernet/qualcomm/rmnet, let's call it "ipa%d" and stop cargo-
>>> culting rmnet around just because it happens to be a net driver for a
>>> QC SoC.
>>
>> First, the relationship between the IPA driver and the rmnet driver
>> is that the IPA driver is assumed to sit between the rmnet driver
>> and the hardware.
>
> Does this mean that IPA can only be used to back rmnet, and rmnet
> can only be used on top of IPA, or can or both of them be combined
> with another driver to talk to instead?

No it does not mean that.

As I understand it, one reason for the rmnet layer was to abstract
the back end, which would allow using a modem, or using something
else (a LAN?), without exposing certain details of the hardware.
(Perhaps to support multiplexing, etc. without duplicating that
logic in two "back-end" drivers?)

To be perfectly honest, at first I thought having IPA use rmnet
was a cargo cult thing like Dan suggested, because I didn't see
the benefit.  I now see why one would use that pass-through layer
to handle the QMAP features.

But back to your question.  The other thing is that I see no
reason the IPA couldn't present a "normal" (non QMAP) interface
for a modem.  It's something I'd really like to be able to do,
but I can't do it without having the modem firmware change its
configuration for these endpoints.  My access to the people who
implement the modem firmware has been very limited (something
I hope to improve), and unless and until I can get corresponding
changes on the modem side to implement connections that don't
use QMAP, I can't implement such a thing.



But any such changes would either be years into the future or for
specific devices, and as such not applicable to most devices on
the market now or in the coming years.


But as Arnd points out, if the software split between IPA and rmnet is
suboptimal you are encouraged to fix that.

Regards,
Bjorn


The split rmnet design was chosen because we could place rmnet
over any transport - IPA, PCIe (https://lkml.org/lkml/2018/4/26/1159)
or USB.

rmnet registers an rx handler, so the rmnet packet processing itself
happens in the same softirq when packets are queued to the network stack
by IPA.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 3/5] asm-generic: Register fchmodat4 as syscall 428

2019-05-31 Thread Palmer Dabbelt

On Fri, 31 May 2019 12:56:39 PDT (-0700), Arnd Bergmann wrote:

On Fri, May 31, 2019 at 9:23 PM Palmer Dabbelt  wrote:


Signed-off-by: Palmer Dabbelt 


As usual, each patch needs a changelog text. I would prefer having a single
patch here that changes /all/ system call tables at once, rather than doing one
at a time like we used to.


Works for me.  That also gives me something to write about in the text :)


In linux-next, we are at number 434 now, and there are a couple of other
new system calls being proposed right now (more than usual), so you may
have to change the number a few times.


OK, no problem.  It'll be a bit easier to handle the number that way.


Note: most architectures use .tbl files now, the exceptions are
include/uapi/asm-generic/unistd.h and arch/arm64/include/asm/unistd32.h,
and the latter also requires changing __NR_compat_syscalls in asm/unistd.h.

Numbers should now be the same across architectures, except for alpha,
which has a +110 offset. We have discussed ways to have a single
file to modify for a new call on all architectures, but no patches yet.


OK, thanks.  I'll wait a bit for feedback, but unless there's anything else
I'll go ahead and finish this.


Re: [RFC][PATCH 2/2] reset: qcom-pon: Add support for gen2 pon

2019-05-31 Thread Bjorn Andersson
On Fri 31 May 16:47 PDT 2019, John Stultz wrote:

> Add support for gen2 pon register so "reboot bootloader" can
> work on pixel3 and db845.
> 
> Cc: Andy Gross 
> Cc: David Brown 
> Cc: Bjorn Andersson 
> Cc: Amit Pundir 
> Cc: Rob Herring 
> Cc: Mark Rutland 
> Cc: Sebastian Reichel 
> Cc: linux-arm-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org
> Signed-off-by: John Stultz 
> ---
>  arch/arm64/boot/dts/qcom/pm8998.dtsi |  2 +-
>  drivers/power/reset/qcom-pon.c   | 15 ---
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
> b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> index d3ca35a940fb..051a52df80f9 100644
> --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
> +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> @@ -39,7 +39,7 @@
>   #size-cells = <0>;
>  
>   pm8998_pon: pon@800 {
> - compatible = "qcom,pm8916-pon";
> + compatible = "qcom,pm8998-pon";
>  
>   reg = <0x800>;
>   mode-bootloader = <0x2>;

We want to take this through arm-soc and the rest through Sebastian's
tree, so please provide the dts update in a separate commit.

Apart from that this looks good!

Reviewed-by: Bjorn Andersson 

Regards,
Bjorn

> diff --git a/drivers/power/reset/qcom-pon.c b/drivers/power/reset/qcom-pon.c
> index 3fa1642d4c54..d0336a1612a4 100644
> --- a/drivers/power/reset/qcom-pon.c
> +++ b/drivers/power/reset/qcom-pon.c
> @@ -14,11 +14,15 @@
>  
>  #define PON_SOFT_RB_SPARE0x8f
>  
> +#define GEN1_REASON_SHIFT2
> +#define GEN2_REASON_SHIFT1
> +
>  struct pm8916_pon {
>   struct device *dev;
>   struct regmap *regmap;
>   u32 baseaddr;
>   struct reboot_mode_driver reboot_mode;
> + long reason_shift;
>  };
>  
>  static int pm8916_reboot_mode_write(struct reboot_mode_driver *reboot,
> @@ -30,15 +34,18 @@ static int pm8916_reboot_mode_write(struct 
> reboot_mode_driver *reboot,
>  
>   ret = regmap_update_bits(pon->regmap,
>pon->baseaddr + PON_SOFT_RB_SPARE,
> -  0xfc, magic << 2);
> +  0xfc, magic << pon->reason_shift);
>   if (ret < 0)
>   dev_err(pon->dev, "update reboot mode bits failed\n");
>  
>   return ret;
>  }
>  
> +static const struct of_device_id pm8916_pon_id_table[];
> +
>  static int pm8916_pon_probe(struct platform_device *pdev)
>  {
> + const struct of_device_id *match;
>   struct pm8916_pon *pon;
>   int error;
>  
> @@ -60,6 +67,7 @@ static int pm8916_pon_probe(struct platform_device *pdev)
>   return error;
>  
> >   pon->reboot_mode.dev = &pdev->dev;
> > + pon->reason_shift = of_device_get_match_data(&pdev->dev);
>   pon->reboot_mode.write = pm8916_reboot_mode_write;
> >   error = devm_reboot_mode_register(&pdev->dev, &pon->reboot_mode);
>   if (error) {
> @@ -73,8 +81,9 @@ static int pm8916_pon_probe(struct platform_device *pdev)
>  }
>  
>  static const struct of_device_id pm8916_pon_id_table[] = {
> - { .compatible = "qcom,pm8916-pon" },
> - { .compatible = "qcom,pms405-pon" },
> + { .compatible = "qcom,pm8916-pon", .data = (void *)GEN1_REASON_SHIFT },
> + { .compatible = "qcom,pms405-pon", .data = (void *)GEN1_REASON_SHIFT },
> + { .compatible = "qcom,pm8998-pon", .data = (void *)GEN2_REASON_SHIFT },
>   { }
>  };
>  MODULE_DEVICE_TABLE(of, pm8916_pon_id_table);
> -- 
> 2.17.1
> 


Re: [RFC][PATCH 1/2] dt-bindings: power: reset: qcom: Add qcom,pm8998-pon compatibility line

2019-05-31 Thread Bjorn Andersson
On Fri 31 May 16:47 PDT 2019, John Stultz wrote:

> Update bindings to add support for qcom,pm8998-pon, which uses gen2 pon
> 
> Cc: Andy Gross 
> Cc: David Brown 
> Cc: Bjorn Andersson 

Reviewed-by: Bjorn Andersson 

> Cc: Amit Pundir 
> Cc: Rob Herring 
> Cc: Mark Rutland 
> Cc: Sebastian Reichel 
> Cc: linux-arm-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org
> Signed-off-by: John Stultz 
> ---
>  Documentation/devicetree/bindings/power/reset/qcom,pon.txt | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt 
> b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt
> index 5705f575862d..0c0dc3a1e693 100644
> --- a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt
> +++ b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt
> @@ -9,6 +9,7 @@ Required Properties:
>  -compatible: Must be one of:
>   "qcom,pm8916-pon"
>   "qcom,pms405-pon"
> + "qcom,pm8998-pon"
>  
>  -reg: Specifies the physical address of the pon register
>  
> -- 
> 2.17.1
> 


[RFC][PATCH 1/2] dt-bindings: power: reset: qcom: Add qcom,pm8998-pon compatibility line

2019-05-31 Thread John Stultz
Update bindings to add support for qcom,pm8998-pon, which uses gen2 pon

Cc: Andy Gross 
Cc: David Brown 
Cc: Bjorn Andersson 
Cc: Amit Pundir 
Cc: Rob Herring 
Cc: Mark Rutland 
Cc: Sebastian Reichel 
Cc: linux-arm-...@vger.kernel.org
Cc: devicet...@vger.kernel.org
Signed-off-by: John Stultz 
---
 Documentation/devicetree/bindings/power/reset/qcom,pon.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt 
b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt
index 5705f575862d..0c0dc3a1e693 100644
--- a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt
+++ b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt
@@ -9,6 +9,7 @@ Required Properties:
 -compatible: Must be one of:
"qcom,pm8916-pon"
"qcom,pms405-pon"
+   "qcom,pm8998-pon"
 
 -reg: Specifies the physical address of the pon register
 
-- 
2.17.1



[RFC][PATCH 2/2] reset: qcom-pon: Add support for gen2 pon

2019-05-31 Thread John Stultz
Add support for gen2 pon register so "reboot bootloader" can
work on pixel3 and db845.

Cc: Andy Gross 
Cc: David Brown 
Cc: Bjorn Andersson 
Cc: Amit Pundir 
Cc: Rob Herring 
Cc: Mark Rutland 
Cc: Sebastian Reichel 
Cc: linux-arm-...@vger.kernel.org
Cc: devicet...@vger.kernel.org
Signed-off-by: John Stultz 
---
 arch/arm64/boot/dts/qcom/pm8998.dtsi |  2 +-
 drivers/power/reset/qcom-pon.c   | 15 ---
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
b/arch/arm64/boot/dts/qcom/pm8998.dtsi
index d3ca35a940fb..051a52df80f9 100644
--- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
+++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
@@ -39,7 +39,7 @@
#size-cells = <0>;
 
pm8998_pon: pon@800 {
-   compatible = "qcom,pm8916-pon";
+   compatible = "qcom,pm8998-pon";
 
reg = <0x800>;
mode-bootloader = <0x2>;
diff --git a/drivers/power/reset/qcom-pon.c b/drivers/power/reset/qcom-pon.c
index 3fa1642d4c54..d0336a1612a4 100644
--- a/drivers/power/reset/qcom-pon.c
+++ b/drivers/power/reset/qcom-pon.c
@@ -14,11 +14,15 @@
 
 #define PON_SOFT_RB_SPARE  0x8f
 
+#define GEN1_REASON_SHIFT  2
+#define GEN2_REASON_SHIFT  1
+
 struct pm8916_pon {
struct device *dev;
struct regmap *regmap;
u32 baseaddr;
struct reboot_mode_driver reboot_mode;
+   long reason_shift;
 };
 
 static int pm8916_reboot_mode_write(struct reboot_mode_driver *reboot,
@@ -30,15 +34,18 @@ static int pm8916_reboot_mode_write(struct 
reboot_mode_driver *reboot,
 
ret = regmap_update_bits(pon->regmap,
 pon->baseaddr + PON_SOFT_RB_SPARE,
-0xfc, magic << 2);
+0xfc, magic << pon->reason_shift);
if (ret < 0)
dev_err(pon->dev, "update reboot mode bits failed\n");
 
return ret;
 }
 
+static const struct of_device_id pm8916_pon_id_table[];
+
 static int pm8916_pon_probe(struct platform_device *pdev)
 {
+   const struct of_device_id *match;
struct pm8916_pon *pon;
int error;
 
@@ -60,6 +67,7 @@ static int pm8916_pon_probe(struct platform_device *pdev)
return error;
 
pon->reboot_mode.dev = &pdev->dev;
+   pon->reason_shift = of_device_get_match_data(&pdev->dev);
pon->reboot_mode.write = pm8916_reboot_mode_write;
error = devm_reboot_mode_register(&pdev->dev, &pon->reboot_mode);
if (error) {
@@ -73,8 +81,9 @@ static int pm8916_pon_probe(struct platform_device *pdev)
 }
 
 static const struct of_device_id pm8916_pon_id_table[] = {
-   { .compatible = "qcom,pm8916-pon" },
-   { .compatible = "qcom,pms405-pon" },
+   { .compatible = "qcom,pm8916-pon", .data = (void *)GEN1_REASON_SHIFT },
+   { .compatible = "qcom,pms405-pon", .data = (void *)GEN1_REASON_SHIFT },
+   { .compatible = "qcom,pm8998-pon", .data = (void *)GEN2_REASON_SHIFT },
{ }
 };
 MODULE_DEVICE_TABLE(of, pm8916_pon_id_table);
-- 
2.17.1
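
To make the reason-shift handling above concrete, a small standalone illustration (the shift constants mirror the patch; the helper itself is not part of the driver):

#include <stdio.h>

#define GEN1_REASON_SHIFT       2
#define GEN2_REASON_SHIFT       1

/* Bits that regmap_update_bits() would set in PON_SOFT_RB_SPARE under the
 * 0xfc mask for a given reboot-mode magic value.
 */
static unsigned int reason_bits(unsigned int magic, int shift)
{
        return (magic << shift) & 0xfc;
}

int main(void)
{
        /* mode-bootloader magic from the dtsi is 0x2 */
        printf("gen1: 0x%02x, gen2: 0x%02x\n",
               reason_bits(0x2, GEN1_REASON_SHIFT),
               reason_bits(0x2, GEN2_REASON_SHIFT));
        return 0;       /* prints "gen1: 0x08, gen2: 0x04" */
}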



[PATCH 6/8] EDAC/amd64: Decode syndrome before translating address

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.

Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.

Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode 
function")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f0424c10cac0..4058b24b8e04 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2567,13 +2567,6 @@ static void decode_umc_error(int node_id, struct mce *m)
 
err.channel = find_umc_channel(m);
 
-   if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, 
&sys_addr)) {
-   err.err_code = ERR_NORM_ADDR;
-   goto log_error;
-   }
-
-   error_address_to_page_and_offset(sys_addr, );
-
if (!(m->status & MCI_STATUS_SYNDV)) {
err.err_code = ERR_SYND;
goto log_error;
@@ -2590,6 +2583,13 @@ static void decode_umc_error(int node_id, struct mce *m)
 
err.csrow = m->synd & 0x7;
 
+   if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, 
&sys_addr)) {
+   err.err_code = ERR_NORM_ADDR;
+   goto log_error;
+   }
+
+   error_address_to_page_and_offset(sys_addr, );
+
 log_error:
__log_ecc_error(mci, &err, ecc_type);
 }
-- 
2.17.1



[PATCH 3/8] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems support x4 and x16 DRAM devices. However, the
device type is not checked when setting EDAC_CTL_CAP.

Set the appropriate EDAC_CTL_CAP flag based on the device type.

Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index dd60cf5a3d96..125d6e2a828e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 static inline void
 f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 {
-   u8 i, ecc_en = 1, cpk_en = 1;
+   u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
 
for_each_umc(i) {
if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
cpk_en &= !!(pvt->umc[i].umc_cap_hi & 
UMC_ECC_CHIPKILL_CAP);
+
+   dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
+   dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));
}
}
 
@@ -3163,8 +3166,12 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, 
struct amd64_pvt *pvt)
if (ecc_en) {
mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
 
-   if (cpk_en)
-   mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+   if (cpk_en) {
+   if (dev_x4)
+   mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+   else if (dev_x16)
+   mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;
+   }
}
 }
 
-- 
2.17.1



[PATCH 7/8] EDAC/amd64: Cache secondary Chip Select registers

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems have a set of secondary Chip Select Base
Addresses and Address Masks. These do not represent unique Chip
Selects, rather they are used in conjunction with the primary
Chip Select registers in certain use cases.

Cache these secondary Chip Select registers for future use.

Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 23 ---
 drivers/edac/amd64_edac.h |  4 
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 4058b24b8e04..006417cb79dc 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -943,34 +943,51 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
 
 static void read_umc_base_mask(struct amd64_pvt *pvt)
 {
-   u32 umc_base_reg, umc_mask_reg;
-   u32 base_reg, mask_reg;
-   u32 *base, *mask;
+   u32 umc_base_reg, umc_base_reg_sec;
+   u32 umc_mask_reg, umc_mask_reg_sec;
+   u32 base_reg, base_reg_sec;
+   u32 mask_reg, mask_reg_sec;
+   u32 *base, *base_sec;
+   u32 *mask, *mask_sec;
int cs, umc;
 
for_each_umc(umc) {
umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+   umc_base_reg_sec = get_umc_base(umc) + UMCCH_BASE_ADDR_SEC;
 
for_each_chip_select(cs, umc, pvt) {
base = &pvt->csels[umc].csbases[cs];
+   base_sec = &pvt->csels[umc].csbases_sec[cs];
 
base_reg = umc_base_reg + (cs * 4);
+   base_reg_sec = umc_base_reg_sec + (cs * 4);
 
if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
 umc, cs, *base, base_reg);
+
+   if (!amd_smn_read(pvt->mc_node_id, base_reg_sec, 
base_sec))
+   edac_dbg(0, "DCSB_SEC%d[%d]=0x%08x reg: 
0x%x\n",
+umc, cs, *base_sec, base_reg_sec);
}
 
umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+   umc_mask_reg_sec = get_umc_base(umc) + UMCCH_ADDR_MASK_SEC;
 
for_each_chip_select_mask(cs, umc, pvt) {
mask = &pvt->csels[umc].csmasks[cs];
+   mask_sec = &pvt->csels[umc].csmasks_sec[cs];
 
mask_reg = umc_mask_reg + (cs * 4);
+   mask_reg_sec = umc_mask_reg_sec + (cs * 4);
 
if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
 umc, cs, *mask, mask_reg);
+
+   if (!amd_smn_read(pvt->mc_node_id, mask_reg_sec, 
mask_sec))
+   edac_dbg(0, "DCSM_SEC%d[%d]=0x%08x reg: 
0x%x\n",
+umc, cs, *mask_sec, mask_reg_sec);
}
}
 }
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 4dce6a2ac75f..68f12de6e654 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -259,7 +259,9 @@
 
 /* UMC CH register offsets */
 #define UMCCH_BASE_ADDR0x0
+#define UMCCH_BASE_ADDR_SEC0x10
 #define UMCCH_ADDR_MASK0x20
+#define UMCCH_ADDR_MASK_SEC0x28
 #define UMCCH_ADDR_CFG 0x30
 #define UMCCH_DIMM_CFG 0x80
 #define UMCCH_UMC_CFG  0x100
@@ -312,9 +314,11 @@ struct dram_range {
 /* A DCT chip selects collection */
 struct chip_select {
u32 csbases[NUM_CHIPSELECTS];
+   u32 csbases_sec[NUM_CHIPSELECTS];
u8 b_cnt;
 
u32 csmasks[NUM_CHIPSELECTS];
+   u32 csmasks_sec[NUM_CHIPSELECTS];
u8 m_cnt;
 };
 
-- 
2.17.1



[PATCH 2/8] EDAC/amd64: Support more than two controllers for chip selects handling

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

The struct chip_select array that's used for saving chip select bases
and masks is fixed at a length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

This is a second version of a commit that was reverted.

Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for 
chip select handling"")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 122 +-
 drivers/edac/amd64_edac.h |   5 +-
 2 files changed, 69 insertions(+), 58 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 9fa2f205f05c..dd60cf5a3d96 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -943,91 +943,101 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
} else if (pvt->fam >= 0x17) {
-   pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
-   pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+   int umc;
+
+   for_each_umc(umc) {
+   pvt->csels[umc].b_cnt = 4;
+   pvt->csels[umc].m_cnt = 2;
+   }
+
} else {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
}
 }
 
+static void read_umc_base_mask(struct amd64_pvt *pvt)
+{
+   u32 umc_base_reg, umc_mask_reg;
+   u32 base_reg, mask_reg;
+   u32 *base, *mask;
+   int cs, umc;
+
+   for_each_umc(umc) {
+   umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+
+   for_each_chip_select(cs, umc, pvt) {
+   base = &pvt->csels[umc].csbases[cs];
+
+   base_reg = umc_base_reg + (cs * 4);
+
+   if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+   edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *base, base_reg);
+   }
+
+   umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+
+   for_each_chip_select_mask(cs, umc, pvt) {
+   mask = &pvt->csels[umc].csmasks[cs];
+
+   mask_reg = umc_mask_reg + (cs * 4);
+
+   if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+   edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *mask, mask_reg);
+   }
+   }
+}
+
 /*
  * Function 2 Offset F10_DCSB0; read in the DCS Base and DCS Mask registers
  */
 static void read_dct_base_mask(struct amd64_pvt *pvt)
 {
-   int base_reg0, base_reg1, mask_reg0, mask_reg1, cs;
+   int cs;
 
prep_chip_selects(pvt);
 
-   if (pvt->umc) {
-   base_reg0 = get_umc_base(0) + UMCCH_BASE_ADDR;
-   base_reg1 = get_umc_base(1) + UMCCH_BASE_ADDR;
-   mask_reg0 = get_umc_base(0) + UMCCH_ADDR_MASK;
-   mask_reg1 = get_umc_base(1) + UMCCH_ADDR_MASK;
-   } else {
-   base_reg0 = DCSB0;
-   base_reg1 = DCSB1;
-   mask_reg0 = DCSM0;
-   mask_reg1 = DCSM1;
-   }
+   if (pvt->umc)
+   return read_umc_base_mask(pvt);
 
for_each_chip_select(cs, 0, pvt) {
-   int reg0   = base_reg0 + (cs * 4);
-   int reg1   = base_reg1 + (cs * 4);
+   int reg0   = DCSB0 + (cs * 4);
+   int reg1   = DCSB1 + (cs * 4);
u32 *base0 = &pvt->csels[0].csbases[cs];
u32 *base1 = &pvt->csels[1].csbases[cs];
 
-   if (pvt->umc) {
-   if (!amd_smn_read(pvt->mc_node_id, reg0, base0))
-   edac_dbg(0, "  DCSB0[%d]=0x%08x reg: 0x%x\n",
-cs, *base0, reg0);
-
-   if (!amd_smn_read(pvt->mc_node_id, reg1, base1))
-   edac_dbg(0, "  DCSB1[%d]=0x%08x reg: 0x%x\n",
-cs, *base1, reg1);
-   } else {
-   if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, base0))
-   edac_dbg(0, "  DCSB0[%d]=0x%08x reg: F2x%x\n",
-cs, *base0, reg0);
+   if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, base0))
+   edac_dbg(0, "  DCSB0[%d]=0x%08x reg: 

[PATCH 8/8] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

Future AMD systems will support "Asymmetric" Dual-Rank DIMMs. These are
DIMMs where the ranks are of different sizes.

The even rank will use the Primary Even Chip Select registers and the
odd rank will use the Secondary Odd Chip Select registers.

Recognize if a Secondary Odd Chip Select is being used. Use the
Secondary Odd Address Mask when calculating the chip select size.

Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 006417cb79dc..6c284a4f980c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -790,6 +790,9 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, 
u32 dclr, int chan)
 
 #define CS_EVEN_PRIMARYBIT(0)
 #define CS_ODD_PRIMARY BIT(1)
+#define CS_ODD_SECONDARY   BIT(2)
+
+#define csrow_sec_enabled(i, dct, pvt) ((pvt)->csels[(dct)].csbases_sec[(i)] & 
DCSB_CS_ENABLE)
 
 static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
@@ -801,6 +804,10 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct 
amd64_pvt *pvt)
if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
cs_mode |= CS_ODD_PRIMARY;
 
+   /* Asymmetric Dual-Rank DIMM support. */
+   if (csrow_sec_enabled(2 * dimm + 1, ctrl, pvt))
+   cs_mode |= CS_ODD_SECONDARY;
+
return cs_mode;
 }
 
@@ -1590,7 +1597,11 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt 
*pvt, u8 umc,
 */
dimm = csrow_nr >> 1;
 
-   addr_mask_orig = pvt->csels[umc].csmasks[dimm];
+   /* Asymmetric Dual-Rank DIMM support. */
+   if (cs_mode & CS_ODD_SECONDARY)
+   addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm];
+   else
+   addr_mask_orig = pvt->csels[umc].csmasks[dimm];
 
/*
 * The number of zero bits in the mask is equal to the number of bits
-- 
2.17.1



[PATCH 4/8] EDAC/amd64: Initialize DIMM info for systems with more than two channels

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

Currently, the DIMM info for AMD Family 17h systems is initialized in
init_csrows(). This function is shared with legacy systems, and it has a
limit of two channels.

This prevents initialization of the DIMM info for a number of ranks, so
there will be missing ranks in the EDAC sysfs.

Create a new init_csrows_df() for Family17h+ and revert init_csrows()
back to pre-Family17h support.

Loop over all channels in the new function in order to support systems
with more than two channels.

Fixes: bdcee7747f5c ("EDAC/amd64: Support more than two Unified Memory 
Controllers")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 63 ++-
 1 file changed, 49 insertions(+), 14 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 125d6e2a828e..d0926b181c7c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2837,6 +2837,46 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 
dct, int csrow_nr_orig)
return nr_pages;
 }
 
+static int init_csrows_df(struct mem_ctl_info *mci)
+{
+   struct amd64_pvt *pvt = mci->pvt_info;
+   enum edac_type edac_mode = EDAC_NONE;
+   enum dev_type dev_type = DEV_UNKNOWN;
+   struct dimm_info *dimm;
+   int empty = 1;
+   u8 umc, cs;
+
+   if (mci->edac_ctl_cap & EDAC_FLAG_S16ECD16ED) {
+   edac_mode = EDAC_S16ECD16ED;
+   dev_type = DEV_X16;
+   } else if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED) {
+   edac_mode = EDAC_S4ECD4ED;
+   dev_type = DEV_X4;
+   } else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED) {
+   edac_mode = EDAC_SECDED;
+   }
+
+   for_each_umc(umc) {
+   for_each_chip_select(cs, umc, pvt) {
+   if (!csrow_enabled(cs, umc, pvt))
+   continue;
+
+   empty = 0;
+   dimm = mci->csrows[cs]->channels[umc]->dimm;
+
+   edac_dbg(1, "MC node: %d, csrow: %d\n",
+   pvt->mc_node_id, cs);
+
+   dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+   dimm->mtype = pvt->dram_type;
+   dimm->edac_mode = edac_mode;
+   dimm->dtype = dev_type;
+   }
+   }
+
+   return empty;
+}
+
 /*
  * Initialize the array of csrow attribute instances, based on the values
  * from pci config hardware registers.
@@ -2851,15 +2891,16 @@ static int init_csrows(struct mem_ctl_info *mci)
int nr_pages = 0;
u32 val;
 
-   if (!pvt->umc) {
-   amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
+   if (pvt->umc)
+   return init_csrows_df(mci);
+
+   amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
 
-   pvt->nbcfg = val;
+   pvt->nbcfg = val;
 
-   edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: 
%d|DramEccEn: %d]\n",
-pvt->mc_node_id, val,
-!!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
-   }
+   edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: %d|DramEccEn: %d]\n",
+pvt->mc_node_id, val,
+!!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
 
/*
 * We iterate over DCT0 here but we look at DCT1 in parallel, if needed.
@@ -2896,13 +2937,7 @@ static int init_csrows(struct mem_ctl_info *mci)
edac_dbg(1, "Total csrow%d pages: %u\n", i, nr_pages);
 
/* Determine DIMM ECC mode: */
-   if (pvt->umc) {
-   if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED)
-   edac_mode = EDAC_S4ECD4ED;
-   else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED)
-   edac_mode = EDAC_SECDED;
-
-   } else if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
+   if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
edac_mode = (pvt->nbcfg & NBCFG_CHIPKILL)
? EDAC_S4ECD4ED
: EDAC_SECDED;
-- 
2.17.1



[PATCH 5/8] EDAC/amd64: Find Chip Select memory size using Address Mask

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is not
robust.

The Chip Select Address Mask can be used to find the memory size. There
are a few cases.

1) For single-rank, use the address mask as the size.

2) For dual-rank non-interleaved, use the address mask divided by 2 as
the size.

3) For dual-rank interleaved, do #2 but "de-interleave" the address mask
first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead and do the de-interleaving. In the non-interleaved case, the
original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
Original=0x9FE, De-interleaved=0x3FE
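
A minimal standalone sketch of that de-interleaving step (plain C stand-ins for the kernel's fls()/hweight helpers; not the patch code itself):

#include <stdint.h>
#include <stdio.h>

/* Count the zero bits in the middle of a (non-zero) mask and drop that many
 * bits off the top, leaving a contiguous mask starting at bit 1.
 */
static uint32_t deinterleave_mask(uint32_t mask)
{
        int msb = 31 - __builtin_clz(mask);             /* highest set bit */
        int weight = __builtin_popcount(mask);          /* number of set bits */
        int num_zero_bits = msb - weight;               /* BIT(0) is always 0 */

        return ((1u << (msb - num_zero_bits + 1)) - 1) & ~1u;
}

int main(void)
{
        printf("0x%X -> 0x%X\n", 0x9FE, deinterleave_mask(0x9FE));
        return 0;       /* prints "0x9FE -> 0x3FE", matching the example above */
}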

Fixes: fc00c6a41638 ("EDAC/amd64: Adjust printed chip select sizes when 
interleaved")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 107 ++
 1 file changed, 63 insertions(+), 44 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index d0926b181c7c..f0424c10cac0 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -788,51 +788,36 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, 
u32 dclr, int chan)
 (dclr & BIT(15)) ?  "yes" : "no");
 }
 
-/*
- * The Address Mask should be a contiguous set of bits in the non-interleaved
- * case. So to check for CS interleaving, find the most- and least-significant
- * bits of the mask, generate a contiguous bitmask, and compare the two.
- */
-static bool f17_cs_interleaved(struct amd64_pvt *pvt, u8 ctrl, int cs)
+#define CS_EVEN_PRIMARYBIT(0)
+#define CS_ODD_PRIMARY BIT(1)
+
+static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
-   u32 mask = pvt->csels[ctrl].csmasks[cs >> 1];
-   u32 msb = fls(mask) - 1, lsb = ffs(mask) - 1;
-   u32 test_mask = GENMASK(msb, lsb);
+   int cs_mode = 0;
+
+   if (csrow_enabled(2 * dimm, ctrl, pvt))
+   cs_mode |= CS_EVEN_PRIMARY;
 
-   edac_dbg(1, "mask=0x%08x test_mask=0x%08x\n", mask, test_mask);
+   if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
+   cs_mode |= CS_ODD_PRIMARY;
 
-   return mask ^ test_mask;
+   return cs_mode;
 }
 
 static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 {
-   int dimm, size0, size1, cs0, cs1;
+   int dimm, size0, size1, cs0, cs1, cs_mode;
 
edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
for (dimm = 0; dimm < 2; dimm++) {
-   size0 = 0;
cs0 = dimm * 2;
-
-   if (csrow_enabled(cs0, ctrl, pvt))
-   size0 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs0);
-
-   size1 = 0;
cs1 = dimm * 2 + 1;
 
-   if (csrow_enabled(cs1, ctrl, pvt)) {
-   /*
-* CS interleaving is only supported if both CSes have
-* the same amount of memory. Because they are
-* interleaved, it will look like both CSes have the
-* full amount of memory. Save the size for both as
-* half the amount we found on CS0, if interleaved.
-*/
-   if (f17_cs_interleaved(pvt, ctrl, cs1))
-   size1 = size0 = (size0 >> 1);
-   else
-   size1 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs1);
-   }
+   cs_mode = f17_get_cs_mode(dimm, ctrl, pvt);
+
+   size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
+   size1 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs1);
 
amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n",
cs0,size0,
@@ -1569,18 +1554,50 @@ static int f16_dbam_to_chip_select(struct amd64_pvt 
*pvt, u8 dct,
return ddr3_cs_size(cs_mode, false);
 }
 
-static int f17_base_addr_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
unsigned int cs_mode, int csrow_nr)
 {
-   u32 base_addr = pvt->csels[umc].csbases[csrow_nr];
+   u32 addr_mask_orig, addr_mask_deinterleaved;
+   u32 msb, weight, num_zero_bits;
+   int dimm, dual_rank, size = 0;
 
-   /*  Each mask is used for every two base addresses. */
-   u32 addr_mask = pvt->csels[umc].csmasks[csrow_nr >> 1];
+   if (!cs_mode)
+   return size;
 
-   /*  Register [31:1] = Address [39:9]. Size is in kBs here. */
-   u32 size = ((addr_mask >> 1) - (base_addr >> 1) + 1) >> 1;
+   dual_rank = !!(cs_mode & 

[PATCH 1/8] EDAC/amd64: Fix number of DIMMs and Chip Select bases/masks on Family17h

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

...because AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS
masks per channel.

Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 873437be86d9..9fa2f205f05c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -810,7 +810,7 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt 
*pvt, u8 ctrl)
 
edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
-   for (dimm = 0; dimm < 4; dimm++) {
+   for (dimm = 0; dimm < 2; dimm++) {
size0 = 0;
cs0 = dimm * 2;
 
@@ -942,6 +942,9 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
} else if (pvt->fam == 0x15 && pvt->model == 0x30) {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+   } else if (pvt->fam >= 0x17) {
+   pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
+   pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
} else {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
-- 
2.17.1



[PATCH 0/8] AMD64 EDAC fixes for v5.2

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

Hi Boris,

This set contains a few fixes for some changes merged in v5.2. There
are also a couple of fixes for older issues. In addition, there are a
couple of patches to add support for Asymmetric Dual-Rank DIMMs.

Thanks,
Yazen

Yazen Ghannam (8):
  EDAC/amd64: Fix number of DIMMs and Chip Select bases/masks on
Family17h
  EDAC/amd64: Support more than two controllers for chip selects
handling
  EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
  EDAC/amd64: Initialize DIMM info for systems with more than two
channels
  EDAC/amd64: Find Chip Select memory size using Address Mask
  EDAC/amd64: Decode syndrome before translating address
  EDAC/amd64: Cache secondary Chip Select registers
  EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

 drivers/edac/amd64_edac.c | 348 --
 drivers/edac/amd64_edac.h |   9 +-
 2 files changed, 232 insertions(+), 125 deletions(-)

-- 
2.17.1



Re: [PATCH 2/5] Add fchmodat4(), a new syscall

2019-05-31 Thread Palmer Dabbelt

On Fri, 31 May 2019 12:51:00 PDT (-0700), Arnd Bergmann wrote:

On Fri, May 31, 2019 at 9:23 PM Palmer Dabbelt  wrote:


man 3p says that fchmodat() takes a flags argument, but the Linux
syscall does not.  There doesn't appear to be a good userspace
workaround for this issue but the implementation in the kernel is pretty
straight-forward.  The specific use case where the missing flags came up
was WRT a fuse filesystem implemenation, but the functionality is pretty
generic so I'm assuming there would be other use cases.

Signed-off-by: Palmer Dabbelt 
---
 fs/open.c| 21 +++--
 include/linux/syscalls.h |  5 +
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index a00350018a47..cfad7684e8d3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -568,11 +568,17 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
return ksys_fchmod(fd, mode);
 }

-int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
+int do_fchmodat4(int dfd, const char __user *filename, umode_t mode, int flags)

...

+
+int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
+{
+   return do_fchmodat4(dfd, filename, mode, 0);
+}
+


There is only one external caller of do_fchmodat(), so just change that
to pass the extra argument here, and keep a single do_fchmodat()
function used by the sys_fchmod(), sys_fchmod4(), sys_chmod()
and ksys_chmod().


OK, I'll roll that up into a v2.
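
For reference, calling the new syscall from userspace would look roughly like the sketch below (the syscall number is a placeholder, since per the discussion above the final number is still moving, and glibc would normally provide a wrapper):

#define _GNU_SOURCE
#include <fcntl.h>              /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_fchmodat4
#define __NR_fchmodat4 434      /* placeholder: use the number from the final syscall table */
#endif

int main(void)
{
        /* Pass AT_SYMLINK_NOFOLLOW, a flag the existing 3-argument
         * fchmodat() syscall cannot accept. */
        long ret = syscall(__NR_fchmodat4, AT_FDCWD, "some-path", 0644,
                           AT_SYMLINK_NOFOLLOW);

        if (ret < 0)
                perror("fchmodat4");
        return ret ? 1 : 0;
}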


[RFC PATCH 2/9] x86/sgx: Do not naturally align MAP_FIXED address

2019-05-31 Thread Sean Christopherson
SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
tracked and enforced by the CPU using a base+mask approach, similar to
how hardware range registers such as the variable MTRRs work.  As a result,
the ELRANGE must be naturally sized and aligned.

To reduce boilerplate code that would be needed in every userspace
enclave loader, the SGX driver naturally aligns the mmap() address and
also requires the range to be naturally sized.  Unfortunately, SGX fails
to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
if userspace is attempting to map a small slice of an existing enclave.
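
For illustration, the rejected case is a userspace mapping of a single page
inside an already-built enclave (sketch; enclave_fd, encl_base and offset are
assumed to come from the enclave loader):

	/* Map one 4 KiB slice of an existing enclave at a fixed address. */
	void *p = mmap((void *)(encl_base + offset), 4096,
		       PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_FIXED, enclave_fd, 0);

Without the waiver, the len < 2 * PAGE_SIZE check alone rejects such a
request even though the fixed address already lies inside a valid ELRANGE.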

Signed-off-by: Sean Christopherson 
---
 arch/x86/kernel/cpu/sgx/driver/main.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c 
b/arch/x86/kernel/cpu/sgx/driver/main.c
index afe844aa81d6..129d356aff30 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -79,7 +79,13 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
   unsigned long pgoff,
   unsigned long flags)
 {
-   if (len < 2 * PAGE_SIZE || len & (len - 1) || flags & MAP_PRIVATE)
+   if (flags & MAP_PRIVATE)
+   return -EINVAL;
+
+   if (flags & MAP_FIXED)
+   return addr;
+
+   if (len < 2 * PAGE_SIZE || len & (len - 1))
return -EINVAL;
 
addr = current->mm->get_unmapped_area(file, addr, 2 * len, pgoff,
-- 
2.21.0



[RFC PATCH 6/9] x86/sgx: Require userspace to provide allowed prots to ADD_PAGES

2019-05-31 Thread Sean Christopherson
...to support (the equivalent) of existing Linux Security Module
functionality.

Because SGX manually manages EPC memory, all enclave VMAs are backed by
the same vm_file, i.e. /dev/sgx/enclave, so that SGX can implement the
necessary hooks to move pages in/out of the EPC.  And because EPC pages
for any given enclave are fundamentally shared between processes, i.e.
CoW semantics are not possible with EPC pages, /dev/sgx/enclave must
always be MAP_SHARED.  Lastly, all real world enclaves will need read,
write and execute permissions to EPC pages.  As a result, SGX does not
play nice with existing LSM behavior as it is impossible to apply
policies to enclaves with any reasonable granularity, e.g. an LSM can
deny access to EPC altogether, but can't deny potentially dangerous
behavior such as mapping pages RW->RX or RWX.

To give LSMs enough information to implement their policies without
having to resort to ugly things, e.g. holding a reference to the vm_file
of each enclave page, require userspace to explicitly state the allowed
protections for each page (region), i.e. take ALLOW_{READ,WRITE,EXEC}
in the ADD_PAGES ioctl.

The ALLOW_* flags will be passed to LSMs so that they can make informed
decisions when the enclave is being built, i.e. when the source vm_file
is available.  For example, SELinux's EXECMOD permission can be
required if an enclave is requesting both ALLOW_WRITE and ALLOW_EXEC.

Update the mmap()/mprotect() hooks to enforce the ALLOW_* protections,
a la the standard VM_MAY{READ,WRITE,EXEC} flags.

The ALLOW_EXEC flag also has a second (important) use in that it can
be used to prevent loading an enclave from a noexec file system: on
SGX2 hardware (regardless of kernel support for SGX2), userspace could
EADD from a noexec path using read-only permissions and later mprotect()
and ENCLU[EMODPE] the page to gain execute permissions.  By requiring
ALLOW_EXEC up front, SGX will be able to enforce noexec paths when
building the enclave.
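
As an illustration, an enclave loader adding a read/execute code region might
fill the ioctl argument roughly as follows (sketch; enclave_fd, encl_addr,
src_buf and secinfo are assumed to be set up by the loader):

	struct sgx_enclave_add_pages ap = {
		.addr     = encl_addr,           /* destination inside ELRANGE */
		.src      = (__u64)src_buf,      /* source data in normal memory */
		.secinfo  = (__u64)&secinfo,
		.nr_pages = 16,
		.mrmask   = 0xffff,
		/* code pages: never writable, so an LSM can allow RX but not RWX */
		.flags    = SGX_ALLOW_READ | SGX_ALLOW_EXEC,
	};
	ioctl(enclave_fd, SGX_IOC_ENCLAVE_ADD_PAGES, &ap);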

Signed-off-by: Sean Christopherson 
---
 arch/x86/include/uapi/asm/sgx.h|  9 -
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 23 +--
 arch/x86/kernel/cpu/sgx/encl.c |  2 +-
 arch/x86/kernel/cpu/sgx/encl.h |  1 +
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 4a12d6abbcb7..4489e92fa0dc 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -31,6 +31,11 @@ struct sgx_enclave_create  {
__u64   src;
 };
 
+/* Supported flags for struct sgx_enclave_add_pages. */
+#define SGX_ALLOW_READ VM_READ
+#define SGX_ALLOW_WRITE VM_WRITE
+#define SGX_ALLOW_EXEC VM_EXEC
+
 /**
  * struct sgx_enclave_add_pages - parameter structure for the
  *%SGX_IOC_ENCLAVE_ADD_PAGES ioctl
@@ -39,6 +44,7 @@ struct sgx_enclave_create  {
  * @secinfo:   address for the SECINFO data (common to all pages)
  * @nr_pages:  number of pages (must be virtually contiguous)
  * @mrmask:bitmask for the measured 256 byte chunks (common to all pages)
+ * @flags: flags, e.g. SGX_ALLOW_{READ,WRITE,EXEC} (common to all pages)
  */
 struct sgx_enclave_add_pages {
__u64   addr;
@@ -46,7 +52,8 @@ struct sgx_enclave_add_pages {
__u64   secinfo;
__u32   nr_pages;
__u16   mrmask;
-} __attribute__((__packed__));
+   __u16   flags;
+};
 
 /**
  * struct sgx_enclave_init - parameter structure for the
diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c 
b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index 6acfcbdeca9a..c30acd3fbbdd 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -235,7 +235,8 @@ static int sgx_validate_secs(const struct sgx_secs *secs,
 }
 
 static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
-unsigned long addr)
+unsigned long addr,
+unsigned long allowed_prot)
 {
struct sgx_encl_page *encl_page;
int ret;
@@ -247,6 +248,7 @@ static struct sgx_encl_page *sgx_encl_page_alloc(struct 
sgx_encl *encl,
return ERR_PTR(-ENOMEM);
encl_page->desc = addr;
encl_page->encl = encl;
+   encl_page->allowed_prot = allowed_prot;
ret = radix_tree_insert(&encl->page_tree, PFN_DOWN(encl_page->desc),
encl_page);
if (ret) {
@@ -530,7 +532,7 @@ static int sgx_encl_queue_page(struct sgx_encl *encl,
 
 static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
   void *data, struct sgx_secinfo *secinfo,
-  unsigned int mrmask)
+  unsigned int mrmask, unsigned long allowed_prot)
 {
u64 page_type = secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK;
struct sgx_encl_page 

[RFC PATCH 3/9] x86/sgx: Allow userspace to add multiple pages in single ioctl()

2019-05-31 Thread Sean Christopherson
...to improve performance when building enclaves by reducing the number
of user<->system transitions.  Rather than provide arbitrary batching,
e.g. with per-page SECINFO and mrmask, take advantage of the fact that
any sane enclave will have large swaths of pages with identical
properties, e.g. code vs. data sections.

For simplicity and stability in the initial implementation, loop over
the existing add page flow instead of taking a more aggressive approach,
which would require tracking transitions between VMAs and holding
mmap_sem for an extended duration.
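
For example, an enclave's whole .text section can then be added with a single
call instead of one ioctl() per 4 KiB page (sketch; values assumed):

	struct sgx_enclave_add_pages ap = {
		.addr     = text_encl_addr,
		.src      = (__u64)text_buf,
		.secinfo  = (__u64)&text_secinfo,   /* common to all pages */
		.nr_pages = text_size / 4096,       /* virtually contiguous range */
		.mrmask   = 0xffff,                 /* common to all pages */
	};
	ioctl(enclave_fd, SGX_IOC_ENCLAVE_ADD_PAGES, &ap);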

Signed-off-by: Sean Christopherson 
---
 arch/x86/include/uapi/asm/sgx.h|  21 ++---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 104 +++--
 2 files changed, 74 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 9ed690a38c70..4a12d6abbcb7 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -12,8 +12,8 @@
 
 #define SGX_IOC_ENCLAVE_CREATE \
_IOW(SGX_MAGIC, 0x00, struct sgx_enclave_create)
-#define SGX_IOC_ENCLAVE_ADD_PAGE \
-   _IOW(SGX_MAGIC, 0x01, struct sgx_enclave_add_page)
+#define SGX_IOC_ENCLAVE_ADD_PAGES \
+   _IOW(SGX_MAGIC, 0x01, struct sgx_enclave_add_pages)
 #define SGX_IOC_ENCLAVE_INIT \
_IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
 #define SGX_IOC_ENCLAVE_SET_ATTRIBUTE \
@@ -32,21 +32,22 @@ struct sgx_enclave_create  {
 };
 
 /**
- * struct sgx_enclave_add_page - parameter structure for the
- *   %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
- * @addr:  address within the ELRANGE
- * @src:   address for the page data
- * @secinfo:   address for the SECINFO data
- * @mrmask:bitmask for the measured 256 byte chunks
+ * struct sgx_enclave_add_pages - parameter structure for the
+ *%SGX_IOC_ENCLAVE_ADD_PAGES ioctl
+ * @addr:  start address within the ELRANGE
+ * @src:   start address for the pages' data
+ * @secinfo:   address for the SECINFO data (common to all pages)
+ * @nr_pages:  number of pages (must be virtually contiguous)
+ * @mrmask:bitmask for the measured 256 byte chunks (common to all pages)
  */
-struct sgx_enclave_add_page {
+struct sgx_enclave_add_pages {
__u64   addr;
__u64   src;
__u64   secinfo;
+   __u32   nr_pages;
__u16   mrmask;
 } __attribute__((__packed__));
 
-
 /**
  * struct sgx_enclave_init - parameter structure for the
  *   %SGX_IOC_ENCLAVE_INIT ioctl
diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c 
b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index a27ec26a9350..6acfcbdeca9a 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -487,10 +487,9 @@ static int sgx_validate_tcs(struct sgx_encl *encl, struct 
sgx_tcs *tcs)
return 0;
 }
 
-static int __sgx_encl_add_page(struct sgx_encl *encl,
+static int sgx_encl_queue_page(struct sgx_encl *encl,
   struct sgx_encl_page *encl_page,
-  void *data,
-  struct sgx_secinfo *secinfo,
+  void *data, struct sgx_secinfo *secinfo,
   unsigned int mrmask)
 {
unsigned long page_index = sgx_encl_get_index(encl, encl_page);
@@ -529,9 +528,9 @@ static int __sgx_encl_add_page(struct sgx_encl *encl,
return 0;
 }
 
-static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
-void *data, struct sgx_secinfo *secinfo,
-unsigned int mrmask)
+static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
+  void *data, struct sgx_secinfo *secinfo,
+  unsigned int mrmask)
 {
u64 page_type = secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK;
struct sgx_encl_page *encl_page;
@@ -563,7 +562,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, 
unsigned long addr,
goto out;
}
 
-   ret = __sgx_encl_add_page(encl, encl_page, data, secinfo, mrmask);
+   ret = sgx_encl_queue_page(encl, encl_page, data, secinfo, mrmask);
if (ret) {
radix_tree_delete(&encl_page->encl->page_tree,
  PFN_DOWN(encl_page->desc));
@@ -575,56 +574,79 @@ static int sgx_encl_add_page(struct sgx_encl *encl, 
unsigned long addr,
return ret;
 }
 
-/**
- * sgx_ioc_enclave_add_page - handler for %SGX_IOC_ENCLAVE_ADD_PAGE
- *
- * @filep: open file to /dev/sgx
- * @cmd:   the command value
- * @arg:   pointer to an _enclave_add_page instance
- *
- * Add a page to an uninitialized enclave (EADD), and optionally extend the
- * enclave's measurement with the contents of the page (EEXTEND).  EADD and
- * EEXTEND are done asynchronously via worker threads.  A successful
- * sgx_ioc_enclave_add_page() only 

[RFC PATCH 9/9] security/selinux: Add enclave_load() implementation

2019-05-31 Thread Sean Christopherson
The goal of selinux_enclave_load() is to provide a facsimile of the
existing selinux_file_mprotect() and file_map_prot_check() policies,
but tailored to the unique properties of SGX.

For example, an enclave page is technically backed by a MAP_SHARED file,
but the "file" is essentially shared memory that is never persisted
anywhere and also requires execute permissions (for some pages).

The basic concept is to require appropriate execute permissions on the
source of the enclave for pages that are requesting PROT_EXEC, e.g. if
an enclave page is being loaded from a regular file, require
FILE__EXECUTE and/or FILE__EXECMOD, and if it's coming from an
anonymous/private mapping, require PROCESS__EXECMEM since the process
is essentially executing from the mapping, albeit in a roundabout way.

Note, FILE__READ and FILE__WRITE are intentionally not required even if
the source page is backed by a regular file.  Writes to the enclave page
are contained to the EPC, i.e. never hit the original file, and read
permissions have already been vetted (or the VMA doesn't have PROT_READ,
in which case loading the page into the enclave will fail).

Signed-off-by: Sean Christopherson 
---
 security/selinux/hooks.c | 85 
 1 file changed, 85 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 3ec702cf46ca..f436a055dda7 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6726,6 +6726,87 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux 
*aux)
 }
 #endif
 
+#ifdef CONFIG_INTEL_SGX
+int selinux_enclave_load(struct vm_area_struct *vma, unsigned long prot,
+unsigned long *allowed_prot)
+{
+   const struct cred *cred = current_cred();
+   u32 sid = cred_sid(cred);
+   int rc;
+
+   /* SGX is supported only in 64-bit kernels. */
+   WARN_ON_ONCE(!default_noexec);
+
+   /*
+* SGX is responsible for checking @prot vs @allowed_prot, and SELinux
+* only cares about execute related permissions for enclaves.
+*/
+   if (!(*allowed_prot & PROT_EXEC))
+   return 0;
+
+   /*
+* Loading an executable enclave page from a VMA that is not executable
+* itself requires EXECUTE permissions on the source file, or if there
+* is no regular source file, EXECMEM since the page is being loaded
+* from a non-executable anonymous mapping.
+*/
+   if (!(vma->vm_flags & VM_EXEC)) {
+   if (vma->vm_file && !IS_PRIVATE(file_inode(vma->vm_file)))
+   rc = file_has_perm(cred, vma->vm_file, FILE__EXECUTE);
+   else
+   rc = avc_has_perm(&selinux_state,
+ sid, sid, SECCLASS_PROCESS,
+ PROCESS__EXECMEM, NULL);
+
+   /*
+* Reject the load if the enclave *needs* the page to be
+* executable, otherwise prevent it from becoming executable.
+*/
+   if (rc) {
+   if (prot & PROT_EXEC)
+   return rc;
+
+   *allowed_prot &= ~PROT_EXEC;
+   }
+   }
+
+   /*
+* An enclave page that may do RW->RX or W+X requires EXECMOD (backed
+* by a regular file) or EXECMEM (loaded from an anonymous mapping).
+* Note, this hybrid EXECMOD and EXECMEM behavior is intentional and
+* reflects the nature of enclaves and the EPC, e.g. EPC is effectively
+* a non-persistent shared file, but each enclave is a private domain
+* within that shared file, so delegate to the source of the enclave.
+*/
+   if ((*allowed_prot & PROT_EXEC) && (*allowed_prot & PROT_WRITE)) {
+   if (vma->vm_file && !IS_PRIVATE(file_inode(vma->vm_file)))
+   rc = file_has_perm(cred, vma->vm_file, FILE__EXECMOD);
+   else
+   rc = avc_has_perm(&selinux_state,
+ sid, sid, SECCLASS_PROCESS,
+ PROCESS__EXECMEM, NULL);
+   /*
+* Clear ALLOW_EXEC instead of ALLOWED_WRITE if permissions are
+* lacking and @prot has neither PROT_WRITE or PROT_EXEC.  If
+* userspace wanted RX they would have requested RX, and due to
+* lack of permissions they can never get RW->RX, i.e. the only
+* useful transition is R->RW.
+*/
+   if (rc) {
+   if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
+   return rc;
+
+   if (prot & PROT_EXEC)
+   *allowed_prot &= ~PROT_WRITE;
+   else
+   *allowed_prot &= ~PROT_EXEC;
+   }
+   }
+
+   return 0;
+}
+#endif
+
 

[RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM

2019-05-31 Thread Sean Christopherson
This series is the result of a rather absurd amount of discussion over
how to get SGX to play nice with LSM policies, without having to resort
to evil shenanigans or put undue burden on userspace.  The discussion
definitely wandered into completely insane territory at times, but I
think/hope we ended up with something reasonable.

The basic gist of the approach is to require userspace to declare what
protections are maximally allowed for any given page, e.g. add a flags
field for loading enclave pages that takes ALLOW_{READ,WRITE,EXEC}.  LSMs
can then adjust the allowed protections, e.g. clear ALLOW_EXEC to prevent
ever mapping the page with PROT_EXEC.  SGX enforces the allowed perms
via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses
MAY_{READ,WRITE,EXEC}.

ALLOW_EXEC is used to deny things like loading an enclave from a noexec
file system or from a file without EXECUTE permissions, e.g. without
the ALLOW_EXEC concept, on SGX2 hardware (regardless of kernel support)
userspace could EADD from a noexec file using read-only permissions,
and later use mprotect() and ENCLU[EMODPE] to gain execute permissions.

ALLOW_WRITE is used in conjunction with ALLOW_EXEC to enforce SELinux's
EXECMOD (or EXECMEM).

This is very much an RFC series.  It's only compile tested, likely has
obvious bugs, the SELinux patch could be completely harebrained, etc...
My goal at this point is to get feedback at a macro level, e.g. is the
core concept viable/acceptable, are there objection to hooking
mprotect(), etc...

Andy and Cedric, hopefully this aligns with your general expectations
based on our last discussion.

Lastly, I added a patch to allow userspace to add multiple pages in a
single ioctl().  It's obviously not directly related to the security
stuff, but the idea tangentially came up during earlier discussions and
it's something I think the UAPI should provide (it's a tiny change).
Since I was modifying the UAPI anyways, I threw it in.

Sean Christopherson (9):
  x86/sgx: Remove unused local variable in sgx_encl_release()
  x86/sgx: Do not naturally align MAP_FIXED address
  x86/sgx: Allow userspace to add multiple pages in single ioctl()
  mm: Introduce vm_ops->mprotect()
  x86/sgx: Restrict mapping without an enclave page to PROT_NONE
  x86/sgx: Require userspace to provide allowed prots to ADD_PAGES
  x86/sgx: Enforce noexec filesystem restriction for enclaves
  LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
  security/selinux: Add enclave_load() implementation

 arch/x86/include/uapi/asm/sgx.h|  30 --
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 143 +
 arch/x86/kernel/cpu/sgx/driver/main.c  |  13 ++-
 arch/x86/kernel/cpu/sgx/encl.c |  31 +-
 arch/x86/kernel/cpu/sgx/encl.h |   4 +
 include/linux/lsm_hooks.h  |  16 +++
 include/linux/mm.h |   2 +
 include/linux/security.h   |   2 +
 mm/mprotect.c  |  15 ++-
 security/security.c|   8 ++
 security/selinux/hooks.c   |  85 +++
 11 files changed, 290 insertions(+), 59 deletions(-)

-- 
2.21.0



[RFC PATCH 7/9] x86/sgx: Enforce noexec filesystem restriction for enclaves

2019-05-31 Thread Sean Christopherson
Do not allow an enclave page to be mapped with PROT_EXEC if the source
page is backed by a file on a noexec file system.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c 
b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index c30acd3fbbdd..5f71be7cbb01 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -576,6 +576,27 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, 
unsigned long addr,
return ret;
 }
 
+static int sgx_encl_page_protect(unsigned long src, unsigned long prot,
+unsigned long *allowed_prot)
+{
+   struct vm_area_struct *vma;
+
+   if (!(*allowed_prot & VM_EXEC))
+   goto do_check;
+
+   down_read(&current->mm->mmap_sem);
+   vma = find_vma(current->mm, src);
+   if (!vma || (vma->vm_file && path_noexec(&vma->vm_file->f_path)))
+   *allowed_prot &= ~VM_EXEC;
+   up_read(&current->mm->mmap_sem);
+
+do_check:
+   if (prot & ~*allowed_prot)
+   return -EACCES;
+
+   return 0;
+}
+
 static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
 unsigned long src, struct sgx_secinfo *secinfo,
 unsigned int mrmask, unsigned int flags)
@@ -589,8 +610,9 @@ static int sgx_encl_add_page(struct sgx_encl *encl, 
unsigned long addr,
BUILD_BUG_ON(SGX_SECINFO_R != VM_READ || SGX_SECINFO_W != VM_WRITE ||
 SGX_SECINFO_X != VM_EXEC);
 
-   if (prot & ~allowed_prot)
-   return -EACCES;
+   ret = sgx_encl_page_protect(src, prot, &allowed_prot);
+   if (ret)
+   return ret;
 
data_page = alloc_page(GFP_HIGHUSER);
if (!data_page)
-- 
2.21.0



[RFC PATCH 5/9] x86/sgx: Restrict mapping without an enclave page to PROT_NONE

2019-05-31 Thread Sean Christopherson
To support LSM integration, SGX will require userspace to explicitly
specify the allowed protections for each page.  The allowed protections
will be supplied to and modified by LSMs (based on their policies).
To prevent userspace from circumventing the allowed protections, do not
allow PROT_{READ,WRITE,EXEC} mappings to an enclave without an
associated enclave page (which will track the allowed protections).

Signed-off-by: Sean Christopherson 
---
 arch/x86/kernel/cpu/sgx/driver/main.c |  5 +
 arch/x86/kernel/cpu/sgx/encl.c| 30 +++
 arch/x86/kernel/cpu/sgx/encl.h|  3 +++
 3 files changed, 38 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c 
b/arch/x86/kernel/cpu/sgx/driver/main.c
index 129d356aff30..65a87c2fdf02 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -63,6 +63,11 @@ static long sgx_compat_ioctl(struct file *filep, unsigned 
int cmd,
 static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct sgx_encl *encl = file->private_data;
+   int ret;
+
+   ret = sgx_map_allowed(encl, vma->vm_start, vma->vm_end, vma->vm_flags);
+   if (ret)
+   return ret;
 
vma->vm_ops = &sgx_vm_ops;
vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index f23ea0fbaa47..955d4f430adc 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -235,6 +235,35 @@ static void sgx_vma_close(struct vm_area_struct *vma)
kref_put(&encl->refcount, sgx_encl_release);
 }
 
+int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
+   unsigned long end, unsigned long prot)
+{
+   struct sgx_encl_page *page;
+   unsigned long addr;
+
+   prot &= (VM_READ | VM_WRITE | VM_EXEC);
+   if (!prot || !encl)
+   return 0;
+
+   mutex_lock(&encl->lock);
+
+   for (addr = start; addr < end; addr += PAGE_SIZE) {
+   page = radix_tree_lookup(&encl->page_tree, addr >> PAGE_SHIFT);
+   if (!page)
+   return -EACCES;
+   }
+
+   mutex_unlock(&encl->lock);
+
+   return 0;
+}
+
+static int sgx_vma_mprotect(struct vm_area_struct *vma, unsigned long start,
+   unsigned long end, unsigned long prot)
+{
+   return sgx_map_allowed(vma->vm_private_data, start, end, prot);
+}
+
 static unsigned int sgx_vma_fault(struct vm_fault *vmf)
 {
unsigned long addr = (unsigned long)vmf->address;
@@ -372,6 +401,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, 
unsigned long addr,
 const struct vm_operations_struct sgx_vm_ops = {
.close = sgx_vma_close,
.open = sgx_vma_open,
+   .mprotect = sgx_vma_mprotect,
.fault = sgx_vma_fault,
.access = sgx_vma_access,
 };
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index c557f0374d74..6e310e3b3fff 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -106,6 +106,9 @@ static inline unsigned long sgx_pcmd_offset(pgoff_t 
page_index)
   sizeof(struct sgx_pcmd);
 }
 
+int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
+   unsigned long end, unsigned long prot);
+
 enum sgx_encl_mm_iter {
SGX_ENCL_MM_ITER_DONE   = 0,
SGX_ENCL_MM_ITER_NEXT   = 1,
-- 
2.21.0



Re: [PATCH v2 00/17] net: introduce Qualcomm IPA driver

2019-05-31 Thread Bjorn Andersson
On Fri 31 May 13:47 PDT 2019, Alex Elder wrote:

> On 5/31/19 2:19 PM, Arnd Bergmann wrote:
> > On Fri, May 31, 2019 at 6:36 PM Alex Elder  wrote:
> >> On 5/31/19 9:58 AM, Dan Williams wrote:
> >>> On Thu, 2019-05-30 at 22:53 -0500, Alex Elder wrote:
> >>>
> >>> My question from the Nov 2018 IPA rmnet driver still stands; how does
> >>> this relate to net/ethernet/qualcomm/rmnet/ if at all? And if this is
> >>> really just a netdev talking to the IPA itself and unrelated to
> >>> net/ethernet/qualcomm/rmnet, let's call it "ipa%d" and stop cargo-
> >>> culting rmnet around just because it happens to be a net driver for a
> >>> QC SoC.
> >>
> >> First, the relationship between the IPA driver and the rmnet driver
> >> is that the IPA driver is assumed to sit between the rmnet driver
> >> and the hardware.
> > 
> > Does this mean that IPA can only be used to back rmnet, and rmnet
> > can only be used on top of IPA, or can or both of them be combined
> > with another driver to talk to instead?
> 
> No it does not mean that.
> 
> As I understand it, one reason for the rmnet layer was to abstract
> the back end, which would allow using a modem, or using something
> else (a LAN?), without exposing certain details of the hardware.
> (Perhaps to support multiplexing, etc. without duplicating that
> logic in two "back-end" drivers?)
> 
> To be perfectly honest, at first I thought having IPA use rmnet
> was a cargo cult thing like Dan suggested, because I didn't see
> the benefit.  I now see why one would use that pass-through layer
> to handle the QMAP features.
> 
> But back to your question.  The other thing is that I see no
> reason the IPA couldn't present a "normal" (non QMAP) interface
> for a modem.  It's something I'd really like to be able to do,
> but I can't do it without having the modem firmware change its
> configuration for these endpoints.  My access to the people who
> implement the modem firmware has been very limited (something
> I hope to improve), and unless and until I can get corresponding
> changes on the modem side to implement connections that don't
> use QMAP, I can't implement such a thing.
> 

But any such changes would either be years into the future or for
specific devices and as such not applicable to any/most of devices on
the market now or in the coming years.


But as Arnd points out, if the software split between IPA and rmnet is
suboptimal you are encouraged to fix that.

Regards,
Bjorn


[RFC PATCH 1/9] x86/sgx: Remove unused local variable in sgx_encl_release()

2019-05-31 Thread Sean Christopherson
Signed-off-by: Sean Christopherson 
---
 arch/x86/kernel/cpu/sgx/encl.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7216bdf07bd0..f23ea0fbaa47 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -463,7 +463,6 @@ EXPORT_SYMBOL_GPL(sgx_encl_destroy);
 void sgx_encl_release(struct kref *ref)
 {
struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
-   struct sgx_encl_mm *encl_mm;
 
if (encl->pm_notifier.notifier_call)
unregister_pm_notifier(>pm_notifier);
-- 
2.21.0



[RFC PATCH 8/9] LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX

2019-05-31 Thread Sean Christopherson
enclave_load() is roughly analogous to the existing file_mprotect().

Due to the nature of SGX and its Enclave Page Cache (EPC), all enclave
VMAs are backed by a single file, i.e. /dev/sgx/enclave, that must be
MAP_SHARED.  Furthermore, all enclaves need read, write and execute
VMAs.  As a result, file_mprotect() does not provide any meaningful
security for enclaves since an LSM can only deny/grant access to the
EPC as a whole.

security_enclave_load() is called when SGX is first loading an enclave
page, i.e. copying a page from normal memory into the EPC.  The notable
difference from file_mprotect() is the allowed_prot parameter, which
is essentially an SGX-specific version of a VMA's MAY_{READ,WRITE,EXEC}
flags.  The purpose of allowed_prot is to enable checks such as
SELinux's FILE__EXECMOD permission without having to track and update
VMAs across multiple mm structs, i.e. SGX can ensure userspace doesn't
overstep its bounds simply by restricting an enclave VMA's protections
by vetting what is maximally allowed during build time.

An alternative to the allowed_prot approach would be to use an enclave's
SIGSTRUCT (a smallish structure that can uniquely identify an enclave)
as a proxy for the enclave.  For example, SGX could take and hold a
reference to the file containing the SIGSTRUCT (if it's in a file) and
call security_enclave_load() during mprotect().  While the SIGSTRUCT
approach would provide better precision, the actual value added was
deemed to be negligible.  On the other hand, pinning a file for the
lifetime of the enclave is ugly, and essentially caching LSM policies
in each page's allowed_prot avoids having to make an extra LSM upcall
during mprotect().

Note, extensive discussion yielded no sane alternative to some form of
SGX specific LSM hook[1].

[1] 
https://lkml.kernel.org/r/CALCETrXf8mSK45h7sTK5Wf+pXLVn=bjsc_rlpgo-h-qdzbr...@mail.gmail.com

Signed-off-by: Sean Christopherson 
---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 14 +-
 include/linux/lsm_hooks.h  | 16 
 include/linux/security.h   |  2 ++
 security/security.c|  8 
 4 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c 
b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index 5f71be7cbb01..260417ecbcff 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -580,21 +581,24 @@ static int sgx_encl_page_protect(unsigned long src, 
unsigned long prot,
 unsigned long *allowed_prot)
 {
struct vm_area_struct *vma;
+   int ret = 0;
 
-   if (!(*allowed_prot & VM_EXEC))
+   if (!(*allowed_prot & VM_EXEC) && !IS_ENABLED(CONFIG_SECURITY))
goto do_check;
 
down_read(&current->mm->mmap_sem);
vma = find_vma(current->mm, src);
if (!vma || (vma->vm_file && path_noexec(&vma->vm_file->f_path)))
*allowed_prot &= ~VM_EXEC;
+#ifdef CONFIG_SECURITY
+   ret = security_enclave_load(vma, prot, allowed_prot);
+#endif
up_read(&current->mm->mmap_sem);
 
 do_check:
-   if (prot & ~*allowed_prot)
-   return -EACCES;
-
-   return 0;
+   if (!ret && (prot & ~*allowed_prot))
+   ret = -EACCES;
+   return ret;
 }
 
 static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 47f58cfb6a19..0562775424a0 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1446,6 +1446,14 @@
  * @bpf_prog_free_security:
  * Clean up the security information stored inside bpf prog.
  *
+ * Security hooks for Intel SGX enclaves.
+ *
+ * @enclave_load:
+ * On success, returns 0 and optionally adjusts @allowed_prot
+ * @vma: the source memory region of the enclave page being loaded.
+ * @prot: the initial protection of the enclave page.
+ * @allowed_prot: the maximum protections of the enclave page.
+ * Return 0 if permission is granted.
  */
 union security_list_options {
int (*binder_set_context_mgr)(struct task_struct *mgr);
@@ -1807,6 +1815,11 @@ union security_list_options {
int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
 #endif /* CONFIG_BPF_SYSCALL */
+
+#ifdef CONFIG_INTEL_SGX
+   int (*enclave_load)(struct vm_area_struct *vma, unsigned long prot,
+   unsigned long *allowed_prot);
+#endif /* CONFIG_INTEL_SGX */
 };
 
 struct security_hook_heads {
@@ -2046,6 +2059,9 @@ struct security_hook_heads {
struct hlist_head bpf_prog_alloc_security;
struct hlist_head bpf_prog_free_security;
 #endif /* CONFIG_BPF_SYSCALL */
+#ifdef CONFIG_INTEL_SGX
+   struct hlist_head enclave_load;
+#endif /* CONFIG_INTEL_SGX */
 } __randomize_layout;

[RFC PATCH 4/9] mm: Introduce vm_ops->mprotect()

2019-05-31 Thread Sean Christopherson
SGX will use the mprotect() hook to prevent userspace from circumventing
various security checks, i.e. Linux Security Modules.

Enclaves are built by copying data from normal memory into the Enclave
Page Cache (EPC).  Due to the nature of SGX, the EPC is represented by a
single file that must be MAP_SHARED, i.e. mprotect() only ever sees a
single MAP_SHARED vm_file.  Furthermore, all enclaves will need read,
write and execute pages in the EPC.

As a result, LSM policies cannot be meaningfully applied, e.g. an LSM
can deny access to the EPC as a whole, but can't deny PROT_EXEC on a page
that originated in a non-EXECUTE file (which is long gone by the time
mprotect() is called).

By hooking mprotect(), SGX can make explicit LSM upcalls while an
enclave is being built, i.e. when the kernel has a handle to origin of
each enclave page, and enforce the result of the LSM policy whenever
userspace maps the enclave page in the future.

Alternatively, SGX could play games with MAY_{READ,WRITE,EXEC}, but
that approach is quite ugly, e.g. would require userspace to call an
SGX ioctl() prior to using mprotect() to extend a page's protections.

Signed-off-by: Sean Christopherson 
---
 include/linux/mm.h |  2 ++
 mm/mprotect.c  | 15 +++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..50a42364a885 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -458,6 +458,8 @@ struct vm_operations_struct {
void (*close)(struct vm_area_struct * area);
int (*split)(struct vm_area_struct * area, unsigned long addr);
int (*mremap)(struct vm_area_struct * area);
+   int (*mprotect)(struct vm_area_struct * area, unsigned long start,
+   unsigned long end, unsigned long prot);
vm_fault_t (*fault)(struct vm_fault *vmf);
vm_fault_t (*huge_fault)(struct vm_fault *vmf,
enum page_entry_size pe_size);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index bf38dfbbb4b4..e466ca5e4fe0 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -547,13 +547,20 @@ static int do_mprotect_pkey(unsigned long start, size_t 
len,
goto out;
}
 
-   error = security_file_mprotect(vma, reqprot, prot);
-   if (error)
-   goto out;
-
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
+
+   if (vma->vm_ops && vma->vm_ops->mprotect) {
+   error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
+   if (error)
+   goto out;
+   }
+
+   error = security_file_mprotect(vma, reqprot, prot);
+   if (error)
+   goto out;
+
error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
if (error)
goto out;
-- 
2.21.0



Re: [RFCv2 4/6] mm: factor out madvise's core functionality

2019-05-31 Thread Minchan Kim
On Fri, May 31, 2019 at 04:35:45PM +0200, Oleksandr Natalenko wrote:
> On Fri, May 31, 2019 at 10:12:26PM +0900, Minchan Kim wrote:
> > On Fri, May 31, 2019 at 09:04:20AM +0200, Oleksandr Natalenko wrote:
> > > On Fri, May 31, 2019 at 03:43:11PM +0900, Minchan Kim wrote:
> > > > This patch factor out madvise's core functionality so that upcoming
> > > > patch can reuse it without duplication. It shouldn't change any 
> > > > behavior.
> > > > 
> > > > Signed-off-by: Minchan Kim 
> > > > ---
> > > >  mm/madvise.c | 188 +++
> > > >  1 file changed, 101 insertions(+), 87 deletions(-)
> > > > 
> > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > index 9d749a1420b4..466623ea8c36 100644
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -425,9 +425,10 @@ static int madvise_pageout_pte_range(pmd_t *pmd, 
> > > > unsigned long addr,
> > > > struct page *page;
> > > > int isolated = 0;
> > > > struct vm_area_struct *vma = walk->vma;
> > > > +   struct task_struct *task = walk->private;
> > > > unsigned long next;
> > > >  
> > > > -   if (fatal_signal_pending(current))
> > > > +   if (fatal_signal_pending(task))
> > > > return -EINTR;
> > > >  
> > > > next = pmd_addr_end(addr, end);
> > > > @@ -505,12 +506,14 @@ static int madvise_pageout_pte_range(pmd_t *pmd, 
> > > > unsigned long addr,
> > > >  }
> > > >  
> > > >  static void madvise_pageout_page_range(struct mmu_gather *tlb,
> > > > -struct vm_area_struct *vma,
> > > > -unsigned long addr, unsigned long end)
> > > > +   struct task_struct *task,
> > > > +   struct vm_area_struct *vma,
> > > > +   unsigned long addr, unsigned long end)
> > > >  {
> > > > struct mm_walk warm_walk = {
> > > > .pmd_entry = madvise_pageout_pte_range,
> > > > .mm = vma->vm_mm,
> > > > +   .private = task,
> > > > };
> > > >  
> > > > tlb_start_vma(tlb, vma);
> > > > @@ -519,9 +522,9 @@ static void madvise_pageout_page_range(struct 
> > > > mmu_gather *tlb,
> > > >  }
> > > >  
> > > >  
> > > > -static long madvise_pageout(struct vm_area_struct *vma,
> > > > -   struct vm_area_struct **prev,
> > > > -   unsigned long start_addr, unsigned long 
> > > > end_addr)
> > > > +static long madvise_pageout(struct task_struct *task,
> > > > +   struct vm_area_struct *vma, struct vm_area_struct 
> > > > **prev,
> > > > +   unsigned long start_addr, unsigned long end_addr)
> > > >  {
> > > > struct mm_struct *mm = vma->vm_mm;
> > > > struct mmu_gather tlb;
> > > > @@ -532,7 +535,7 @@ static long madvise_pageout(struct vm_area_struct 
> > > > *vma,
> > > >  
> > > > lru_add_drain();
> > > > tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
> > > > -   madvise_pageout_page_range(&tlb, vma, start_addr, end_addr);
> > > > +   madvise_pageout_page_range(&tlb, task, vma, start_addr, 
> > > > end_addr);
> > > > tlb_finish_mmu(&tlb, start_addr, end_addr);
> > > >  
> > > > return 0;
> > > > @@ -744,7 +747,8 @@ static long madvise_dontneed_single_vma(struct 
> > > > vm_area_struct *vma,
> > > > return 0;
> > > >  }
> > > >  
> > > > -static long madvise_dontneed_free(struct vm_area_struct *vma,
> > > > +static long madvise_dontneed_free(struct mm_struct *mm,
> > > > + struct vm_area_struct *vma,
> > > >   struct vm_area_struct **prev,
> > > >   unsigned long start, unsigned long 
> > > > end,
> > > >   int behavior)
> > > > @@ -756,8 +760,8 @@ static long madvise_dontneed_free(struct 
> > > > vm_area_struct *vma,
> > > > if (!userfaultfd_remove(vma, start, end)) {
> > > > *prev = NULL; /* mmap_sem has been dropped, prev is 
> > > > stale */
> > > >  
> > > > -   down_read(&current->mm->mmap_sem);
> > > > -   vma = find_vma(current->mm, start);
> > > > +   down_read(&mm->mmap_sem);
> > > > +   vma = find_vma(mm, start);
> > > > if (!vma)
> > > > return -ENOMEM;
> > > > if (start < vma->vm_start) {
> > > > @@ -804,7 +808,8 @@ static long madvise_dontneed_free(struct 
> > > > vm_area_struct *vma,
> > > >   * Application wants to free up the pages and associated backing store.
> > > >   * This is effectively punching a hole into the middle of a file.
> > > >   */
> > > > -static long madvise_remove(struct vm_area_struct *vma,
> > > > +static long madvise_remove(struct mm_struct *mm,
> > > > +   struct vm_area_struct *vma,
> > > > struct vm_area_struct **prev,
> > > 

Re: [PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering

2019-05-31 Thread David Collins
Hello Saravana,

On 5/23/19 6:01 PM, Saravana Kannan wrote:
...
> Having functional dependencies explicitly called out in DT and
> automatically added before the devices are probed, provides the
> following benefits:
...
> - Supplier devices like clock providers, regulators providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.
This benefit provided by the sync_state() callback function introduced in
this series gives us a mechanism to solve a specific problem encountered
on Qualcomm Technologies, Inc. (QTI) boards when booting with drivers
compiled as modules.  QTI boards have a regulator that powers the PHYs for
display, camera, USB, UFS, and PCIe.  When these boards boot up, the boot
loader enables this regulator along with other resources in order to
display a splash screen image.  The regulator must remain enabled until
the Linux display driver has probed and made a request with the regulator
framework to keep the regulator enabled.  If the regulator is disabled
prematurely, then the screen image is corrupted and the display hardware
enters a bad state.

We have observed that when the camera driver probes before the display
driver, it performs this sequence: regulator_enable(), camera register IO,
regulator_disable().  Since it is the first consumer of the shared
regulator, the regulator is physically disabled (even though display
hardware still requires it to be enabled).  We have solved this problem
when compiling drivers statically with a downstream regulator
proxy-consumer driver.  This proxy-consumer is able to make an enable
request for the shared regulator before any other consumer.  It then
removes its request at late_initcall_sync.
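
A sketch of the problematic consumer pattern, using the standard regulator
consumer API (the "phy" supply name is assumed and error handling is omitted):

	/* Camera probe path: correct in isolation, but since the camera is
	 * the only consumer that has made an enable request so far, the final
	 * regulator_disable() physically turns off the rail that the not yet
	 * probed display driver still depends on.
	 */
	struct regulator *vreg = devm_regulator_get(dev, "phy");

	regulator_enable(vreg);
	/* ... camera register I/O ... */
	regulator_disable(vreg);    /* use count drops to zero -> rail off */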

Unfortunately, when drivers are compiled as modules instead of compiled
statically into the kernel image, late_initcall_sync is not a meaningful
marker of driver probe completion.  This means that our existing proxy
voting system will not work when drivers are compiled as modules.  The
sync_state() callback resolves this issue by providing a notification that
is guaranteed to arrive only after all consumers of the shared regulator
have probed.

QTI boards have other cases of shared resources such as bus bandwidth
which must remain at least at a level set by boot loaders in order to
properly support hardware blocks that are enabled before the Linux kernel
starts booting.

Take care,
David

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v2 net-next 7/7] net: ethernet: ti: cpsw: add XDP support

2019-05-31 Thread Ivan Khoronzhuk

On Sat, Jun 01, 2019 at 12:37:36AM +0200, Jesper Dangaard Brouer wrote:

On Fri, 31 May 2019 20:03:33 +0300
Ivan Khoronzhuk  wrote:


Probably it's not good example for others how it should be used, not
a big problem to move it to separate pools.., even don't remember why
I decided to use shared pool, there was some more reasons... need
search in history.


Using a shared pool makes it a lot harder to solve the issue I'm
currently working on.  That is handling/waiting for in-flight frames to
complete, before removing the mem ID from the (r)hashtable lookup.  I
have working code, that basically remove page_pool_destroy() from
public API, and instead lets xdp_rxq_info_unreg() call it when
in-flight count reach zero (and delay fully removing the mem ID).


Frankly, I don't see why it would block anything; it can be treated as if
it were not a shared pool. But OK, anyway it may look more logical and can
be reused by another SoC. I will add it per channel, not a problem,
at least for now there are no blockers. Adding a pool per channel will
create more page_pool_destroy() calls, one per pool, which can be dropped
once you decide to remove it from the API.

This API is called along with xdp_rxq_info_unreg(), and it seems like it is
not a problem to just remove page_pool_destroy(), except for one case that
worries me... cpsw has one interesting feature: it shares the same h/w
between 2 network devices (dual mac); basically it's a 3 port switch, but
used as 2 separate interfaces. As a result, both of them share the same
queues/channels/rings.
XDP rxq requires a network device to be set in the rxq info, which is used
in the code as a pointer and is shared between xdp buffers, so it can't be
changed in flight. That's why each network interface has its own instances
of rxq, but the page pools are common to both network devices, so when I
call xdp_rxq_info_unreg() per net device it doesn't mean I want to delete
the page pool. But it seems I can avoid that by calling xdp_rxq_info_unreg()
for both only when deleting the page pools...



--
Best regards,
 Jesper Dangaard Brouer
 MSc.CS, Principal Kernel Engineer at Red Hat
 LinkedIn: http://www.linkedin.com/in/brouer


--
Regards,
Ivan Khoronzhuk


Re: [PATCH v2 00/17] net: introduce Qualcomm IPA driver

2019-05-31 Thread Bjorn Andersson
On Fri 31 May 12:19 PDT 2019, Arnd Bergmann wrote:

> On Fri, May 31, 2019 at 6:36 PM Alex Elder  wrote:
> > On 5/31/19 9:58 AM, Dan Williams wrote:
> > > On Thu, 2019-05-30 at 22:53 -0500, Alex Elder wrote:
[..]
> > So basically, the purpose of the rmnet driver is to handle QMAP
> > protocol connections, and right now that's what the modem provides.
> 
> Do you have any idea why this particular design was picked?
> 

From what I've seen of QMAP it seems like a reasonable design choice to
have a software component (rmnet) dealing with this, separate from the
transport. And I think IPA is the 4th or 5th mechanism for transporting
QMAP packets back and forth to the modem.


Downstream rmnet is copyright 2007-, and I know of interest in bringing
at least one of the other transports upstream.

Regards,
Bjorn


Re: [RFCv2 6/6] mm: extend process_madvise syscall to support vector arrary

2019-05-31 Thread Minchan Kim
Hi Yann,

On Fri, May 31, 2019 at 12:06:52PM +0200, Yann Droneaud wrote:
> Hi,
> 
> Le vendredi 31 mai 2019 à 15:43 +0900, Minchan Kim a écrit :
> > 
> > diff --git a/include/uapi/asm-generic/mman-common.h
> > b/include/uapi/asm-generic/mman-common.h
> > index 92e347a89ddc..220c2b5eb961 100644
> > --- a/include/uapi/asm-generic/mman-common.h
> > +++ b/include/uapi/asm-generic/mman-common.h
> > @@ -75,4 +75,15 @@
> >  #define PKEY_ACCESS_MASK   (PKEY_DISABLE_ACCESS |\
> >  PKEY_DISABLE_WRITE)
> >  
> > +struct pr_madvise_param {
> > +   int size;   /* the size of this structure */
> > +   int cookie; /* reserved to support atomicity */
> > +   int nr_elem;/* count of below arrary fields */
> 
> Those should be unsigned.
> 
> There's an implicit hole here on ABI with 64bits aligned pointers
> 
> > +   int __user *hints;  /* hints for each range */
> > +   /* to store result of each operation */
> > +   const struct iovec __user *results;
> > +   /* input address ranges */
> > +   const struct iovec __user *ranges;
> 
> Using pointer type in uAPI structure require a 'compat' version of the
> syscall need to be provided.
> 
> If using iovec too.

I will fix them when I submit next revision.
Thanks for the review.
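
A possible rework addressing both points would use only fixed-width fields
and carry user pointers as __u64, so no compat handling is needed for the
structure itself (sketch; the iovec arrays it points to would still need
compat handling or a fixed-width replacement):

	struct pr_madvise_param {
		__u32 size;      /* size of this structure */
		__u32 cookie;    /* reserved to support atomicity */
		__u32 nr_elem;   /* number of entries in the arrays below */
		__u32 reserved;  /* make the padding explicit */
		__u64 hints;     /* user pointer to int hints[nr_elem] */
		__u64 results;   /* user pointer to struct iovec results[nr_elem] */
		__u64 ranges;    /* user pointer to struct iovec ranges[nr_elem] */
	};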


Re: [RFCv2 3/6] mm: introduce MADV_PAGEOUT

2019-05-31 Thread Minchan Kim
Hey Johannes,

On Fri, May 31, 2019 at 12:59:27PM -0400, Johannes Weiner wrote:
> Hi Michan,
> 
> this looks pretty straight-forward to me, only one kink:
> 
> On Fri, May 31, 2019 at 03:43:10PM +0900, Minchan Kim wrote:
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2126,6 +2126,83 @@ static void shrink_active_list(unsigned long 
> > nr_to_scan,
> > nr_deactivate, nr_rotated, sc->priority, file);
> >  }
> >  
> > +unsigned long reclaim_pages(struct list_head *page_list)
> > +{
> > +   int nid = -1;
> > +   unsigned long nr_isolated[2] = {0, };
> > +   unsigned long nr_reclaimed = 0;
> > +   LIST_HEAD(node_page_list);
> > +   struct reclaim_stat dummy_stat;
> > +   struct scan_control sc = {
> > +   .gfp_mask = GFP_KERNEL,
> > +   .priority = DEF_PRIORITY,
> > +   .may_writepage = 1,
> > +   .may_unmap = 1,
> > +   .may_swap = 1,
> > +   };
> > +
> > +   while (!list_empty(page_list)) {
> > +   struct page *page;
> > +
> > +   page = lru_to_page(page_list);
> > +   if (nid == -1) {
> > +   nid = page_to_nid(page);
> > +   INIT_LIST_HEAD(&node_page_list);
> > +   nr_isolated[0] = nr_isolated[1] = 0;
> > +   }
> > +
> > +   if (nid == page_to_nid(page)) {
> > +   list_move(&page->lru, &node_page_list);
> > +   nr_isolated[!!page_is_file_cache(page)] +=
> > +   hpage_nr_pages(page);
> > +   continue;
> > +   }
> > +
> > +   mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_ANON,
> > +   nr_isolated[0]);
> > +   mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_FILE,
> > +   nr_isolated[1]);
> > +   nr_reclaimed += shrink_page_list(&node_page_list,
> > +   NODE_DATA(nid), &sc, TTU_IGNORE_ACCESS,
> > +   &dummy_stat, true);
> > +   while (!list_empty(&node_page_list)) {
> > +   struct page *page = lru_to_page(&node_page_list);
> > +
> > +   list_del(&page->lru);
> > +   putback_lru_page(page);
> > +   }
> > +   mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_ANON,
> > +   -nr_isolated[0]);
> > +   mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_FILE,
> > +   -nr_isolated[1]);
> > +   nid = -1;
> > +   }
> > +
> > +   if (!list_empty(&node_page_list)) {
> > +   mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_ANON,
> > +   nr_isolated[0]);
> > +   mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_FILE,
> > +   nr_isolated[1]);
> > +   nr_reclaimed += shrink_page_list(&node_page_list,
> > +   NODE_DATA(nid), &sc, TTU_IGNORE_ACCESS,
> > +   &dummy_stat, true);
> > +   mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_ANON,
> > +   -nr_isolated[0]);
> > +   mod_node_page_state(NODE_DATA(nid), NR_ISOLATED_FILE,
> > +   -nr_isolated[1]);
> > +
> > +   while (!list_empty(&node_page_list)) {
> > +   struct page *page = lru_to_page(&node_page_list);
> > +
> > +   list_del(&page->lru);
> > +   putback_lru_page(page);
> > +   }
> > +
> > +   }
> 
> The NR_ISOLATED accounting, nid parsing etc. is really awkward and
> makes it hard to see what the function actually does.
> 
> Can you please make those ISOLATED counters part of the isolation API?
> Your patch really shows this is an overdue cleanup.

Yeah, that was very painful.

> 
> These are fast local percpu counters, we don't need the sprawling
> batching we do all over vmscan.c, migrate.c, khugepaged.c,
> compaction.c etc. Isolation can increase the counter page by page, and
> reclaim or putback can likewise decrease them one by one.
> 
> It looks like mlock is the only user of the isolation api that does
> not participate in the NR_ISOLATED_* counters protocol, but I don't
> see why it wouldn't, or why doing so would hurt.
> 
> There are also seem to be quite a few callsites that use the atomic
> versions of the counter API when they're clearly under the irqsafe
> lru_lock. That would be fixed automatically by this work as well.

I agree all points so will prepare clean up patch.
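
A rough sketch of what folding the accounting into the isolation API could
look like (the helper name is hypothetical):

	static inline void mod_isolated_page_state(struct page *page, int sign)
	{
		mod_node_page_state(page_pgdat(page),
				    NR_ISOLATED_ANON + page_is_file_cache(page),
				    sign * hpage_nr_pages(page));
	}

	/* isolate_lru_page() and friends would call this with +1, while
	 * putback_lru_page()/reclaim completion would call it with -1, so
	 * callers like reclaim_pages() no longer open-code the batching.
	 */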


Re: Linux 5.1.6

2019-05-31 Thread Bhaskar Chowdhury


Thanks a bunch, Greg!

On 08:53 Fri 31 May , Greg KH wrote:

I'm announcing the release of the 5.1.6 kernel.

All users of the 5.1 kernel series must upgrade.

The updated 5.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git 
linux-5.1.y
and can be browsed at the normal kernel.org git web browser:

https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary

thanks,

greg k-h



Documentation/arm64/silicon-errata.txt |1
Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt |6
Makefile   |2
arch/arm/include/asm/cp15.h|2
arch/arm/vdso/vgettimeofday.c  |5
arch/arm64/Kconfig |   19
arch/arm64/include/asm/cpucaps.h   |3
arch/arm64/include/asm/futex.h |2
arch/arm64/include/asm/pgtable.h   |3
arch/arm64/include/asm/vdso_datapage.h |1
arch/arm64/kernel/asm-offsets.c|2
arch/arm64/kernel/cpu_errata.c |   24
arch/arm64/kernel/cpu_ops.c|1
arch/arm64/kernel/kaslr.c  |6
arch/arm64/kernel/module.c |2
arch/arm64/kernel/syscall.c|   31 +
arch/arm64/kernel/vdso.c   |3
arch/arm64/kernel/vdso/gettimeofday.S  |7
arch/arm64/mm/dma-mapping.c|   10
arch/arm64/mm/fault.c  |   37 +
arch/powerpc/boot/addnote.c|6
arch/powerpc/kernel/head_64.S  |4
arch/powerpc/kernel/watchdog.c |   81 +-
arch/powerpc/mm/numa.c |   18
arch/powerpc/perf/imc-pmu.c|7
arch/powerpc/platforms/powernv/opal-imc.c  |2
arch/s390/kernel/kexec_elf.c   |7
arch/s390/mm/pgtable.c |2
arch/sh/include/cpu-sh4/cpu/sh7786.h   |2
arch/x86/Makefile  |2
arch/x86/events/intel/cstate.c |2
arch/x86/events/intel/rapl.c   |2
arch/x86/events/msr.c  |1
arch/x86/ia32/ia32_signal.c|   29
arch/x86/include/asm/text-patching.h   |4
arch/x86/include/asm/uaccess.h |7
arch/x86/kernel/alternative.c  |   28
arch/x86/kernel/cpu/hygon.c|5
arch/x86/kernel/cpu/mce/core.c |   66 +-
arch/x86/kernel/cpu/mce/inject.c   |   14
arch/x86/kernel/cpu/microcode/core.c   |3
arch/x86/kernel/ftrace.c   |8
arch/x86/kernel/irq_64.c   |   19
arch/x86/kernel/module.c   |2
arch/x86/kernel/signal.c   |   29
arch/x86/kernel/vmlinux.lds.S  |6
arch/x86/kvm/irq.c |7
arch/x86/kvm/irq.h |1
arch/x86/kvm/pmu_amd.c |2
arch/x86/kvm/svm.c |6
arch/x86/kvm/vmx/nested.c  |4
arch/x86/kvm/x86.c |2
arch/x86/lib/memcpy_64.S   |3
arch/x86/mm/fault.c|2
arch/x86/platform/uv/tlb_uv.c  |7
block/bio.c|2
block/blk-mq-sched.c   |   12
block/blk-mq.c |  139 ++--
block/blk.h|2
block/genhd.c  |   19
block/partition-generic.c 

[PATCH] regulator: bd718x7: Drop unused include

2019-05-31 Thread Linus Walleij
This driver does not use any symbols from 
so just drop the include.

Cc: Matti Vaittinen 
Signed-off-by: Linus Walleij 
---
 drivers/regulator/bd718x7-regulator.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/regulator/bd718x7-regulator.c 
b/drivers/regulator/bd718x7-regulator.c
index fde4264da6ff..8c22cfb76173 100644
--- a/drivers/regulator/bd718x7-regulator.c
+++ b/drivers/regulator/bd718x7-regulator.c
@@ -4,7 +4,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.20.1



[PATCH] regulator: bd70528: Drop unused include

2019-05-31 Thread Linus Walleij
This driver does not use any symbols from 
so just drop the include.

Cc: Matti Vaittinen 
Signed-off-by: Linus Walleij 
---
 drivers/regulator/bd70528-regulator.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/regulator/bd70528-regulator.c 
b/drivers/regulator/bd70528-regulator.c
index 30e3ed430a8a..0248a61f1006 100644
--- a/drivers/regulator/bd70528-regulator.c
+++ b/drivers/regulator/bd70528-regulator.c
@@ -4,7 +4,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.20.1



Re: [PATCH] Revert "usb: core: remove local_irq_save() around ->complete() handler"

2019-05-31 Thread Soeren Moch



On 01.06.19 00:05, Greg Kroah-Hartman wrote:
> On Fri, May 31, 2019 at 11:53:40PM +0200, Soeren Moch wrote:
>> This reverts commit ed194d1367698a0872a2b75bbe06b3932ce9df3a.
>>
>> In contrast to the original patch description, apparently not all handlers
>> were audited properly. E.g. my RT5370 based USB WIFI adapter (driver in
>> drivers/net/wireless/ralink/rt2x00) hangs after a while under heavy load.
>> This revert fixes this.
> Why not just fix that driver?  Wouldn't that be easier?
>
I suspect there are more drivers to fix. I only tested WIFI sticks so
far, RTL8188 drivers also seem to suffer from this. I'm not sure how to
fix all this properly, maybe Sebastian as original patch author can help
here.
This patch is mostly for -stable, to get an acceptable solution quickly.
It was really annoying to get such unstable WIFI connection over the
last three kernel releases to my development board.  Since my internet
service provider forcefully updated my router box 3 weeks ago, I
unfortunately see the same symptoms on my primary internet access.
That's even worse, I need to reset this router box every few days. I'm
not sure, however, that this is caused by the same problem, but it feels
like this.
So can we please fix this regression quickly and work out a proper fix
later? In the original patch there is no reason given why this change is
necessary. With this revert I at least see a stable connection.

Thanks,
Soeren



[PATCH] regulator: arizona-micsupp: Delete unused include

2019-05-31 Thread Linus Walleij
This driver uses no symbols from  so just drop
this include.

Signed-off-by: Linus Walleij 
---
 drivers/regulator/arizona-micsupp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/regulator/arizona-micsupp.c 
b/drivers/regulator/arizona-micsupp.c
index be0d46da51a1..4876b1ceef23 100644
--- a/drivers/regulator/arizona-micsupp.c
+++ b/drivers/regulator/arizona-micsupp.c
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.20.1



Re: [PATCH v2 net-next 7/7] net: ethernet: ti: cpsw: add XDP support

2019-05-31 Thread Ivan Khoronzhuk

On Fri, May 31, 2019 at 10:08:03PM +, Saeed Mahameed wrote:

On Fri, 2019-05-31 at 20:03 +0300, Ivan Khoronzhuk wrote:

On Fri, May 31, 2019 at 06:32:41PM +0200, Jesper Dangaard Brouer
wrote:
> On Fri, 31 May 2019 19:25:24 +0300 Ivan Khoronzhuk <
> ivan.khoronz...@linaro.org> wrote:
>
> > On Fri, May 31, 2019 at 05:46:43PM +0200, Jesper Dangaard Brouer
> > wrote:
> > > From below code snippets, it looks like you only allocated 1
> > > page_pool
> > > and sharing it with several RX-queues, as I don't have the full
> > > context
> > > and don't know this driver, I might be wrong?
> > >
> > > To be clear, a page_pool object is needed per RX-queue, as it
> > > is
> > > accessing a small RX page cache (which protected by
> > > NAPI/softirq).
> >
> > There is one RX interrupt and one RX NAPI for all rx channels.
>
> So, what are you saying?
>
> You _are_ sharing the page_pool between several RX-channels, but it
> is
> safe because this hardware only have one RX interrupt + NAPI
> instance??

I can miss smth but in case of cpsw technically it means:
1) RX interrupts are disabled while NAPI is scheduled,
   not for particular CPU or channel, but at all, for whole cpsw
module.
2) RX channels are handled one by one by priority.


Hi Ivan, I got a silly question..

What is the reason behind having multiple RX rings and one CPU/NAPI
handling all of them? Priority? How do you prioritize?

Several.
One reason, from what I know, is that it could handle several cpus/napi, but
because of errata on some SoCs (or on all of them) that was discarded; still,
the idea was that it can. Second, it uses the same davinci_cpdma API as the tx
channels, which can be rate limited, and that API is used not only by cpsw but
also by another driver, so it can't be modified easily and there is no reason
to. And third, the h/w has the ability to steer some filtered traffic to rx
queues and could potentially be configured with ethtool ntuples or so, but
that's not implemented yet.




3) After all of them handled and no more in budget - interrupts are
enabled.
4) If page is returned to the pool, and it's within NAPI, no races as
it's
   returned protected by softirq. If it's returned not in softirq
it's protected
   by producer lock of the ring.

Probably it's not good example for others how it should be used, not
a big
problem to move it to separate pools.., even don't remember why I
decided to
use shared pool, there was some more reasons... need search in
history.
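
For reference, a minimal per-RX-queue page_pool setup could look roughly
like the sketch below. This is only an illustration of the generic API,
not cpsw code; the container struct and names (my_rxq, pool_size) are
assumptions.

#include <linux/err.h>
#include <linux/numa.h>
#include <linux/dma-mapping.h>
#include <net/page_pool.h>
#include <net/xdp.h>

/* Illustrative sketch only: one page_pool per RX queue, registered as
 * that queue's XDP memory model.
 */
struct my_rxq {
	struct xdp_rxq_info xdp_rxq;	/* assumed already registered */
	struct page_pool *pool;
};

static int my_rxq_pool_create(struct my_rxq *rxq, struct device *dev,
			      unsigned int pool_size)
{
	struct page_pool_params pp_params = {
		.order		= 0,
		.flags		= 0,
		.pool_size	= pool_size,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	rxq->pool = page_pool_create(&pp_params);
	if (IS_ERR(rxq->pool))
		return PTR_ERR(rxq->pool);

	/* pages from this pool can now be recycled by the XDP core */
	return xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq, MEM_TYPE_PAGE_POOL,
					  rxq->pool);
}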

> --
> Best regards,
>  Jesper Dangaard Brouer
>  MSc.CS, Principal Kernel Engineer at Red Hat
>  LinkedIn: http://www.linkedin.com/in/brouer


--
Regards,
Ivan Khoronzhuk


Re: [RESEND PATCH v4 6/6] leds: lm36274: Introduce the TI LM36274 LED driver

2019-05-31 Thread Dan Murphy

Jacek

On 5/31/19 4:57 PM, Jacek Anaszewski wrote:

Dan,

On 5/31/19 11:07 PM, Dan Murphy wrote:

Hello

On 5/31/19 2:44 PM, Jacek Anaszewski wrote:

On 5/31/19 8:23 AM, Lee Jones wrote:

On Thu, 30 May 2019, Jacek Anaszewski wrote:


On 5/30/19 9:38 AM, Lee Jones wrote:

On Wed, 29 May 2019, Jacek Anaszewski wrote:


On 5/29/19 3:58 PM, Lee Jones wrote:

On Fri, 24 May 2019, Jacek Anaszewski wrote:


Hi,

On 5/23/19 9:09 PM, Dan Murphy wrote:

Pavel

Thanks for the review

On 5/23/19 7:50 AM, Pavel Machek wrote:

Hi!


+++ b/drivers/leds/leds-lm36274.c



+static int lm36274_parse_dt(struct lm36274 *lm36274_data)
+{
+    struct fwnode_handle *child = NULL;
+    char label[LED_MAX_NAME_SIZE];
+    struct device *dev = &lm36274_data->pdev->dev;
+    const char *name;
+    int child_cnt;
+    int ret = -EINVAL;
+
+    /* There should only be 1 node */
+    child_cnt = device_get_child_node_count(dev);
+    if (child_cnt != 1)
+    return ret;


I'd do explicit "return -EINVAL" here.



ACK


+static int lm36274_probe(struct platform_device *pdev)
+{
+    struct ti_lmu *lmu = dev_get_drvdata(pdev->dev.parent);
+    struct lm36274 *lm36274_data;
+    int ret;
+
+    lm36274_data = devm_kzalloc(&pdev->dev, 
sizeof(*lm36274_data),

+    GFP_KERNEL);
+    if (!lm36274_data) {
+    ret = -ENOMEM;
+    return ret;
+    }


And certainly do "return -ENOMEM" explicitly here.



ACK


Acked-by: Pavel Machek 


I've done all amendments requested by Pavel and updated branch
ib-leds-mfd-regulator on linux-leds.git, but in the same time


What do you mean by updated?  You cannot update an 'ib' (immutable
branch).  Immutable means that it cannot change, by definition.


We have already talked about that. Nobody has pulled so the branch
could have been safely updated.


You have no sure way to know that.  And since I have no way to know,
or faith that you won't update it again, pulling it now/at all would
seem like a foolish thing to do.


Sorry, but you are simply unjust. You're pretending to portray the
situation as if I have been notoriously causing merge conflicts in
linux-next which did not take place.

Just to recap what this discussion is about:

On 7 Apr 2019:

1. I sent pull request [0].
2. 45 minutes later I updated it after discovering one omission [1].
    It was rather small chance for it to be pulled as quickly as 
that.

    And even if it happened it wouldn't have been much harmful - we
    wouldn't have lost e.g. weeks of testing in linux-next due to 
that

    fact.

On 21 May 2019:

3. I sent another pull request [2] to you and REGULATOR maintainers.
    After it turned out that lack of feedback from REGULATOR 
maintainers

    was caused by failing to send them the exact copies of patches to
    review, I informed you about possible need for updating the 
branch.
    Afterwards I received a reply from you saying that you hadn't 
pulled

    the branch anyway. At that point I was sure that neither MFD nor
    REGULATOR tree contains the patches. And only after that I 
updated

    the branch.


Here are 2 examples where you have changed immutable branches, which
is 100% of the Pull Requests I have received from you. Using that
record as a benchmark, the situation hardly seems unjust.


Until you can provide me with an assurance that you will not keep
updating/changing the supposedly immutable pull-requests you send 
out,

I won't be pulling any more in.


I can just uphold the assurance which is implicitly assumed for 
anyone

who has never broken acclaimed rules. As justified above.


You have broken the rules every (100% of the) time.


Yes, I admit, I would lose in court.


[0] https://lore.kernel.org/patchwork/patch/1059075/
[1] https://lore.kernel.org/patchwork/patch/1059080/
[2] https://lore.kernel.org/patchwork/patch/1077066/


So we have 2 choices moving forward; you can either provide me with
assurance that you have learned from this experience and will never
change an *immutable* branch again, or I can continue to handle them,
which has been the preference for some years.

If you choose the former and adaptions need to be made in the future,
the correct thing to do is create a *new*, different pull-request
which has its own *new*, different tag, but uses the original tag as a
base.


I choose the former. That being said:

Hereby I solemnly declare never ever change an immutable branch again.

So how do I proceed with the requested change by Mark B on the 
LM36274 driver.


Do I add a patch on top?

Or do I submit a patch to the regulator tree once the PR is pulled?


Won't the change be a dependency for [PATCH v4 1/6] ?


Yes thats why I am asking as we would need to change the branch.


Dan


In each case, having all the commits in one set (and branch) should
simplify the integration.



[PATCH] media: cxusb-analog: Use ARRAY_SIZE for cxusub_medion_pin_config

2019-05-31 Thread Maciej S. Szmigiero
Use ARRAY_SIZE for computing the element count of the
cxusub_medion_pin_config array, as suggested by the kbuild test robot.

Reported-by: kbuild test robot 
Signed-off-by: Maciej S. Szmigiero 
---
 drivers/media/usb/dvb-usb/cxusb-analog.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/media/usb/dvb-usb/cxusb-analog.c 
b/drivers/media/usb/dvb-usb/cxusb-analog.c
index 9b42ca71c177..51d3cba32b60 100644
--- a/drivers/media/usb/dvb-usb/cxusb-analog.c
+++ b/drivers/media/usb/dvb-usb/cxusb-analog.c
@@ -1622,8 +1622,7 @@ int cxusb_medion_analog_init(struct dvb_usb_device 
*dvbdev)
/* TODO: setup audio samples insertion */
 
ret = v4l2_subdev_call(cxdev->cx25840, core, s_io_pin_config,
-  sizeof(cxusub_medion_pin_config) /
-  sizeof(cxusub_medion_pin_config[0]),
+  ARRAY_SIZE(cxusub_medion_pin_config),
   cxusub_medion_pin_config);
if (ret != 0)
		dev_warn(&dvbdev->udev->dev,


[PATCH bpf v2] bpf: preallocate a perf_sample_data per event fd

2019-05-31 Thread Matt Mullins
It is possible that a BPF program can be called while another BPF
program is executing bpf_perf_event_output.  This has been observed with
I/O completion occurring as a result of an interrupt:

bpf_prog_247fd1341cddaea4_trace_req_end+0x8d7/0x1000
? trace_call_bpf+0x82/0x100
? sch_direct_xmit+0xe2/0x230
? blk_mq_end_request+0x1/0x100
? blk_mq_end_request+0x5/0x100
? kprobe_perf_func+0x19b/0x240
? __qdisc_run+0x86/0x520
? blk_mq_end_request+0x1/0x100
? blk_mq_end_request+0x5/0x100
? kprobe_ftrace_handler+0x90/0xf0
? ftrace_ops_assist_func+0x6e/0xe0
? ip6_input_finish+0xbf/0x460
? 0xa01e80bf
? nbd_dbg_flags_show+0xc0/0xc0 [nbd]
? blkdev_issue_zeroout+0x200/0x200
? blk_mq_end_request+0x1/0x100
? blk_mq_end_request+0x5/0x100
? flush_smp_call_function_queue+0x6c/0xe0
? smp_call_function_single_interrupt+0x32/0xc0
? call_function_single_interrupt+0xf/0x20
? call_function_single_interrupt+0xa/0x20
? swiotlb_map_page+0x140/0x140
? refcount_sub_and_test+0x1a/0x50
? tcp_wfree+0x20/0xf0
? skb_release_head_state+0x62/0xc0
? skb_release_all+0xe/0x30
? napi_consume_skb+0xb5/0x100
? mlx5e_poll_tx_cq+0x1df/0x4e0
? mlx5e_poll_tx_cq+0x38c/0x4e0
? mlx5e_napi_poll+0x58/0xc30
? mlx5e_napi_poll+0x232/0xc30
? net_rx_action+0x128/0x340
? __do_softirq+0xd4/0x2ad
? irq_exit+0xa5/0xb0
? do_IRQ+0x7d/0xc0
? common_interrupt+0xf/0xf

? __rb_free_aux+0xf0/0xf0
? perf_output_sample+0x28/0x7b0
? perf_prepare_sample+0x54/0x4a0
? perf_event_output+0x43/0x60
? bpf_perf_event_output_raw_tp+0x15f/0x180
? blk_mq_start_request+0x1/0x120
? bpf_prog_411a64a706fc6044_should_trace+0xad4/0x1000
? bpf_trace_run3+0x2c/0x80
? nbd_send_cmd+0x4c2/0x690 [nbd]

This also cannot be alleviated by further splitting the per-cpu
perf_sample_data structs (as in commit 283ca526a9bd ("bpf: fix
corruption on concurrent perf_event_output calls")), as a raw_tp could
be attached to the block:block_rq_complete tracepoint and execute during
another raw_tp.  Instead, keep a pre-allocated perf_sample_data
structure per perf_event_array element and fail a bpf_perf_event_output
if that element is concurrently being used.

Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data")
Signed-off-by: Matt Mullins 
---
v1->v2:
keep a pointer to the struct perf_sample_data rather than directly
embedding it in the structure, avoiding the circular include and
removing the need for in_use.  Suggested by Song.

 include/linux/bpf.h  |  1 +
 kernel/bpf/arraymap.c|  3 ++-
 kernel/trace/bpf_trace.c | 29 -
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4fb3aa2dc975..47fd85cfbbaf 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -472,6 +472,7 @@ struct bpf_event_entry {
struct file *perf_file;
struct file *map_file;
struct rcu_head rcu;
+   struct perf_sample_data *sd;
 };
 
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog 
*fp);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 584636c9e2eb..c7f5d593e04f 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -654,11 +654,12 @@ static struct bpf_event_entry *bpf_event_entry_gen(struct 
file *perf_file,
 {
struct bpf_event_entry *ee;
 
-   ee = kzalloc(sizeof(*ee), GFP_ATOMIC);
+   ee = kzalloc(sizeof(*ee) + sizeof(struct perf_sample_data), GFP_ATOMIC);
if (ee) {
ee->event = perf_file->private_data;
ee->perf_file = perf_file;
ee->map_file = map_file;
+   ee->sd = (void *)ee + sizeof(*ee);
}
 
return ee;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f92d6ad5e080..076f8e987355 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -410,17 +410,17 @@ static const struct bpf_func_proto 
bpf_perf_event_read_value_proto = {
.arg4_type  = ARG_CONST_SIZE,
 };
 
-static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd);
-
 static __always_inline u64
 __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map,
-   u64 flags, struct perf_sample_data *sd)
+   u64 flags, struct perf_raw_record *raw)
 {
struct bpf_array *array = container_of(map, struct bpf_array, map);
unsigned int cpu = smp_processor_id();
u64 index = flags & BPF_F_INDEX_MASK;
struct bpf_event_entry *ee;
struct perf_event *event;
+   struct perf_sample_data *sd;
+   u64 ret;
 
if (index == BPF_F_CURRENT_CPU)

Re: [PATCH v2 net-next 7/7] net: ethernet: ti: cpsw: add XDP support

2019-05-31 Thread Jesper Dangaard Brouer
On Fri, 31 May 2019 20:03:33 +0300
Ivan Khoronzhuk  wrote:

> Probably it's not good example for others how it should be used, not
> a big problem to move it to separate pools.., even don't remember why
> I decided to use shared pool, there was some more reasons... need
> search in history.

Using a shared pool makes it a lot harder to solve the issue I'm
currently working on.  That is handling/waiting for in-flight frames to
complete, before removing the mem ID from the (r)hashtable lookup.  I
have working code, that basically remove page_pool_destroy() from
public API, and instead lets xdp_rxq_info_unreg() call it when
in-flight count reach zero (and delay fully removing the mem ID).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


linux kernel page allocation failure and tuning of page reclaim procedure

2019-05-31 Thread amit nagal
Hi

We are using Renesas RZ/A1 processor based custom target board . linux
kernel version is 4.9.123.

1) the platform is a low-memory platform with 64MB of RAM.

2) we are doing around a 45MB TCP data transfer from a PC to the target
using the netcat utility.
On the target, a process receives the data over a socket and writes it
to a flash disk.

3) At the start of the data transfer, we explicitly drop the linux
kernel page cache by calling echo 3 > /proc/sys/vm/drop_caches

4) during the TCP data transfer, we can see free -m showing "free"
dropping to almost 1MB and most of the memory appearing as "cached":

# free -m
             total       used       free     shared    buffers     cached
Mem:            57         56          1          0          2         42
-/+ buffers/cache:          12         45
Swap:            0          0          0

5) sometimes we observed kernel memory getting exhausted, as a page
allocation failure happens in the kernel with the backtrace printed
below in point 7):

6) we have certain questions, as below:
a) how did the kernel memory get exhausted? Is the kernel page reclaim
mechanism not executing at the right time?
At the time of the low-memory condition, even though free memory drops
to 1MB, there is still about 42MB of cached memory available, so the
page reclaim procedure should have triggered, which would have synced
the cached pages to disk and reused them for kernel page allocations.

b) are there any parameters available within the linux memory
subsystem with which the reclaim procedure can be monitored and fine
tuned?

c) can some amount of free memory be reserved so that the linux kernel
does not cache it for page cache usage, and the kernel can use it for
its other required page allocations (particularly GFP_ATOMIC, as needed
above on behalf of the netcat nc process)?
Can some tuning be done in the linux memory subsystem, e.g. by using
/proc/sys/vm/min_free_kbytes, to achieve this objective? (A small
illustration follows after these questions.)

d) can we be provided with further clues on how to debug this
out-of-memory condition in the kernel further?
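
As a rough illustration of question c), vm.min_free_kbytes can be raised
so the kernel keeps a larger reserve free for atomic allocations. The
snippet below is only a sketch; the value 4096 is an example, not a
recommendation for this board (the same can be done with a simple echo
or sysctl from a shell).

#include <stdio.h>

int main(void)
{
	/* equivalent to: echo 4096 > /proc/sys/vm/min_free_kbytes */
	FILE *f = fopen("/proc/sys/vm/min_free_kbytes", "w");

	if (!f) {
		perror("/proc/sys/vm/min_free_kbytes");
		return 1;
	}
	fprintf(f, "%d\n", 4096);
	fclose(f);
	return 0;
}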

7) Backtrace:
# [  775.947949] nc.traditional: page allocation failure: order:0,
mode:0x2080020(GFP_ATOMIC)
[  775.956362] CPU: 0 PID: 1288 Comm: nc.traditional Tainted: G
   O4.9.123-pic6-g31a13de-dirty #19
[  775.966085] Hardware name: Generic R7S72100 (Flattened Device Tree)
[  775.972501] [] (unwind_backtrace) from []
(show_stack+0xb/0xc)
[  775.980118] [] (show_stack) from []
(warn_alloc+0x89/0xba)
[  775.987361] [] (warn_alloc) from []
(__alloc_pages_nodemask+0x1eb/0x634)
[  775.995790] [] (__alloc_pages_nodemask) from []
(__alloc_page_frag+0x39/0xde)
[  776.004685] [] (__alloc_page_frag) from []
(__netdev_alloc_skb+0x51/0xb0)
[  776.013217] [] (__netdev_alloc_skb) from []
(sh_eth_poll+0xbf/0x3c0)
[  776.021342] [] (sh_eth_poll) from []
(net_rx_action+0x77/0x170)
[  776.029051] [] (net_rx_action) from []
(__do_softirq+0x107/0x160)
[  776.036896] [] (__do_softirq) from []
(irq_exit+0x5d/0x80)
[  776.044165] [] (irq_exit) from []
(__handle_domain_irq+0x57/0x8c)
[  776.052007] [] (__handle_domain_irq) from []
(gic_handle_irq+0x31/0x48)
[  776.060362] [] (gic_handle_irq) from []
(__irq_svc+0x65/0xac)
[  776.067835] Exception stack(0xc1cafd70 to 0xc1cafdb8)
[  776.072876] fd60: 0002751c c1dec6a0 000c 521c3be5
[  776.081042] fd80: 56feb08e f64823a6 ffb35f7b feab513d f9cb0643
056c c1caff10 e000
[  776.089204] fda0: b1f49160 c1cafdc4 c180c677 c0234ace 200e0033 
[  776.095816] [] (__irq_svc) from []
(__copy_to_user_std+0x7e/0x430)
[  776.103796] [] (__copy_to_user_std) from []
(copy_page_to_iter+0x105/0x250)
[  776.112503] [] (copy_page_to_iter) from []
(skb_copy_datagram_iter+0xa3/0x108)
[  776.121469] [] (skb_copy_datagram_iter) from []
(tcp_recvmsg+0x3ab/0x5f4)
[  776.130045] [] (tcp_recvmsg) from []
(inet_recvmsg+0x21/0x2c)
[  776.137576] [] (inet_recvmsg) from []
(sock_read_iter+0x51/0x6e)
[  776.145384] [] (sock_read_iter) from []
(__vfs_read+0x97/0xb0)
[  776.152967] [] (__vfs_read) from [] (vfs_read+0x51/0xb0)
[  776.159983] [] (vfs_read) from [] (SyS_read+0x27/0x52)
[  776.166837] [] (SyS_read) from []
(ret_fast_syscall+0x1/0x54)
[  776.174308] Mem-Info:
[  776.176650] active_anon:2037 inactive_anon:23 isolated_anon:0
[  776.176650]  active_file:2636 inactive_file:7391 isolated_file:32
[  776.176650]  unevictable:0 dirty:1366 writeback:1281 unstable:0
[  776.176650]  slab_reclaimable:719 slab_unreclaimable:724
[  776.176650]  mapped:1990 shmem:26 pagetables:159 bounce:0
[  776.176650]  free:373 free_pcp:6 free_cma:0
[  776.209062] Node 0 active_anon:8148kB inactive_anon:92kB
active_file:10544kB inactive_file:29564kB unevictable:0kB
isolated(anon):0kB isolated(file):128kB mapped:7960kB dirty:5464kB
writeback:5124kB shmem:104kB writeback_tmp:0kB unstable:0kB
pages_scanned:0 all_unreclaimable? no [  776.233602] Normal
free:1492kB min:964kB 

Re: [PATCH v3] HID: fix A4Tech horizontal scrolling

2019-05-31 Thread Jiri Kosina
On Sun, 12 May 2019, Błażej Szczygieł wrote:

> Since recent high resolution scrolling changes the A4Tech driver must
> check for the "REL_WHEEL_HI_RES" usage code.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203369
> Fixes: 2dc702c991e3774af9d7ce410eef410ca9e2357e ("HID: input: use the
> Resolution Multiplier for high-resolution scrolling")
> 
> Signed-off-by: Błażej Szczygieł 

Applied, thanks.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v2 03/17] soc: qcom: ipa: main code

2019-05-31 Thread Alex Elder
On 5/31/19 4:50 PM, David Miller wrote:
> From: Alex Elder 
> Date: Thu, 30 May 2019 22:53:34 -0500
> 
>> +void *route_virt;
>  ...
>> +void *filter_virt;
>  ...
> 
> If these are arrays of u64's, please declare them as "u64 *" instead of
> the opaque "void *".

Good idea.  I hadn't paid attention to that.  These tables are
arrays of 64-bit addresses so it's better to represent them that
way.

Thanks.

-Alex


Re: [PATCH] livepatch: Fix ftrace module text permissions race

2019-05-31 Thread Josh Poimboeuf
On Fri, May 31, 2019 at 02:12:56PM -0500, Josh Poimboeuf wrote:
> > Anyway, the above is a separate problem. This patch looks
> > fine for the original problem.
> 
> Thanks for the review.  I'll post another version, with the above
> changes and with the patches split up like Miroslav suggested.

The latest patches are here:

  
https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=fix-livepatch-ftrace-race

If the bot likes them, I'll post them properly soon.

-- 
Josh


Re: [PATCH v8 2/4] soc: qcom: Add AOSS QMP driver

2019-05-31 Thread Doug Anderson
Hi,

On Thu, May 30, 2019 at 8:01 PM Bjorn Andersson
 wrote:
>
> +/**
> + * qmp_send() - send a message to the AOSS
> + * @qmp: qmp context
> + * @data: message to be sent
> + * @len: length of the message
> + *
> + * Transmit @data to AOSS and wait for the AOSS to acknowledge the message.
> + * @len must be a multiple of 4 and not longer than the mailbox size. Access 
> is
> + * synchronized by this implementation.
> + *
> + * Return: 0 on success, negative errno on failure
> + */
> +static int qmp_send(struct qmp *qmp, const void *data, size_t len)
> +{
> +   int ret;
> +
> +   if (WARN_ON(len + sizeof(u32) > qmp->size))
> +   return -EINVAL;
> +
> +   if (WARN_ON(len % sizeof(u32)))
> +   return -EINVAL;
> +
> +   mutex_lock(&qmp->tx_lock);
> +
> +   /* The message RAM only implements 32-bit accesses */
> +   __iowrite32_copy(qmp->msgram + qmp->offset + sizeof(u32),
> +data, len / sizeof(u32));
> +   writel(len, qmp->msgram + qmp->offset);
> +   qmp_kick(qmp);
> +
> +   ret = wait_event_interruptible_timeout(qmp->event,
> +  qmp_message_empty(qmp), HZ);
> +   if (!ret) {
> +   dev_err(qmp->dev, "ucore did not ack channel\n");
> +   ret = -ETIMEDOUT;
> +
> +   /* Clear message from buffer */
> +   writel(0, qmp->msgram + qmp->offset);
> +   } else {
> +   ret = 0;
> +   }

Just like Vinod said in v7, the "ret = 0" is redundant.


> +static int qmp_qdss_clk_add(struct qmp *qmp)
> +{
> +   struct clk_init_data qdss_init = {
> +   .ops = &qmp_qdss_clk_ops,
> +   .name = "qdss",
> +   };

As I mentioned in v7, there is no downside in marking qdss_init as
"static const" and it avoids the compiler inserting a memcpy() to get
this data on the stack.  Using static const also reduces your stack
usage.
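
For illustration, the suggestion amounts to something like the following
sketch (not the actual driver patch; it assumes the driver's existing
clk-provider definitions):

/* the init data lives in rodata instead of being rebuilt on the stack */
static const struct clk_init_data qdss_init = {
	.ops = &qmp_qdss_clk_ops,
	.name = "qdss",
};

/* ... and in qmp_qdss_clk_add(): qmp->qdss_clk.init = &qdss_init; */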


> +   int ret;
> +
> +   qmp->qdss_clk.init = &qdss_init;
> +   ret = clk_hw_register(qmp->dev, &qmp->qdss_clk);
> +   if (ret < 0) {
> +   dev_err(qmp->dev, "failed to register qdss clock\n");
> +   return ret;
> +   }
> +
> +   ret = of_clk_add_hw_provider(qmp->dev->of_node, of_clk_hw_simple_get,
> +    &qmp->qdss_clk);

I still prefer to devm-ify the whole driver, using
devm_add_action_or_reset() to handle things where there is no devm.
...but I won't insist.
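
A rough sketch of that devm approach (illustrative only; the helper names
below are assumptions, not the driver's code):

static void qmp_del_clk_provider(void *of_node)
{
	of_clk_del_provider(of_node);
}

static int qmp_qdss_clk_add_devm(struct qmp *qmp)
{
	int ret;

	/* qmp->qdss_clk.init is assumed to be set up as in the patch */
	ret = devm_clk_hw_register(qmp->dev, &qmp->qdss_clk);
	if (ret)
		return ret;

	ret = of_clk_add_hw_provider(qmp->dev->of_node, of_clk_hw_simple_get,
				     &qmp->qdss_clk);
	if (ret)
		return ret;

	return devm_add_action_or_reset(qmp->dev, qmp_del_clk_provider,
					qmp->dev->of_node);
}

devm_of_clk_add_hw_provider() could fold the last two steps together, if
available on the target kernel.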


Above things are just nits and I won't insist.  They also could be
addressed in follow-up patches.  Thus:

Reviewed-by: Douglas Anderson 


Re: [PATCH] drivers: hid: Add a module description line to the hid_hyperv driver

2019-05-31 Thread Jiri Kosina
On Thu, 30 May 2019, Sasha Levin wrote:

> From: Joseph Salisbury 
> 
> This patch only adds a MODULE_DESCRIPTION statement to the driver.
> This change is only cosmetic, so there should be no runtime impact.
> 
> Signed-off-by: Joseph Salisbury 
> Reviewed-by: Michael Kelley 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/hid/hid-hyperv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
> index 704049e62d58a..d3311d714d359 100644
> --- a/drivers/hid/hid-hyperv.c
> +++ b/drivers/hid/hid-hyperv.c
> @@ -614,5 +614,7 @@ static void __exit mousevsc_exit(void)
>  }
>  
>  MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Microsoft Hyper-V Synthetic HID Driver");
> +

Is there a reason why you didn't CC Joseph on this submission?

CCing Joseph and applying. Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: hid-related 5.2-rc1 boot hang

2019-05-31 Thread Jiri Kosina
On Thu, 30 May 2019, Dave Hansen wrote:

> On 5/29/19 2:17 AM, Hans de Goede wrote:
> ...
> > Dave, can you try building your initrd without the hid-logitech-dj module
> > included in the initrd?
> 
> I did this on a vanilla 5.2-rc2 kernel (without the reverts) and still
> experienced the boot hang while the device was inserted.
> 
> > Also can you check if your modprobe is provided by module-init-tools
> > or by kmod ?
> 
> $ dpkg -S `which modprobe`
> kmod: /sbin/modprobe

Benjamin, Hans, are you looking into this?

If not, I think we should start reverting (at least the request_module() 
changes, not sure about the rest of logitech issues yet) next week.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v1 1/2] fork: add clone3

2019-05-31 Thread Christian Brauner
On Fri, May 31, 2019 at 01:38:29PM -0700, Linus Torvalds wrote:
> On Wed, May 29, 2019 at 3:24 PM Andrei Vagin  wrote:
> >
> > Thank you for thinking about time namespaces. I looked at this patch
> > quickly and I would suggest to move a termination signal out of flags. I
> > think we can add a separate field (exit_signal) into clone_args. Does it
> > make sense? For me, exit_signal in flags always looked weird...
> 
> I agree - the child signal in flags was always just a "it fits" kind
> of thing, and that was obviously true back then, but is not true now.

(Traveling until Monday, so sorry for delayed responses.)

Yip, I agree that this is a good idea (answered Andrei's mail just now
saying the same thing). I'll send out a new version of the patch with
these changes added next week.
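
A rough sketch of what a separate exit_signal field could look like
(illustrative layout only; field names and ordering are assumptions, not
the final ABI):

#include <linux/types.h>

struct clone_args {
	__aligned_u64 flags;
	__aligned_u64 pidfd;
	__aligned_u64 child_tid;
	__aligned_u64 parent_tid;
	__aligned_u64 exit_signal;	/* no longer packed into flags */
	__aligned_u64 stack;
	__aligned_u64 stack_size;
	__aligned_u64 tls;
};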

Christian


Re: [PATCH v1 1/2] fork: add clone3

2019-05-31 Thread Christian Brauner
On Wed, May 29, 2019 at 03:24:15PM -0700, Andrei Vagin wrote:
> On Wed, May 29, 2019 at 05:22:36PM +0200, Christian Brauner wrote:
> > This adds the clone3 system call.
> > 
> > As mentioned several times already (cf. [7], [8]) here's the promised
> > patchset for clone3().
> > 
> > We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last
> > free flag from clone().
> > 
> > Independent of the CLONE_PIDFD patchset a time namespace has been discussed
> > at Linux Plumber Conference last year and has been sent out and reviewed
> > (cf. [5]). It is expected that it will go upstream in the not too distant
> > future. However, it relies on the addition of the CLONE_NEWTIME flag to
> > clone(). The only other good candidate - CLONE_DETACHED - is currently not
> > recyclable as we have identified at least two large or widely used
> > codebases that currently pass this flag (cf. [2], [3], and [4]). Given that
> > CLONE_PIDFD grabbed the last clone() flag the time namespace is effectively
> > blocked. clone3() has the advantage that it will unblock this patchset
> > again.
> 
> Hi Christian,

Hi Andrei,

(Traveling until Monday, so sorry for delayed responses.)

> 
> Thank you for thinking about time namespaces. I looked at this patch
> quickly and I would suggest to move a termination signal out of flags. I
> think we can add a separate field (exit_signal) into clone_args. Does it
> make sense? For me, exit_signal in flags always looked weird...

Yup, that does sound good to me.

> 
> I will look at this patch more detailed later this week. Thanks.

Excellent!

Christian


Re: [PATCH v1 1/2] fork: add clone3

2019-05-31 Thread Christian Brauner
On Wed, May 29, 2019 at 05:42:14PM +0200, Yann Droneaud wrote:
> Le mercredi 29 mai 2019 à 17:22 +0200, Christian Brauner a écrit :
> > This adds the clone3 system call.
> > 
> > 
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index b4cba953040a..6bc3e3d17150 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -2472,7 +2475,96 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, 
> > unsigned long, newsp,
> >  unsigned long, tls)
> >  #endif
> >  {
> > -   return _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr, 
> > tls);
> > +   struct kernel_clone_args args = {
> > +   .flags = clone_flags,
> > +   .stack = newsp,
> > +   .pidfd = parent_tidptr,
> > +   .parent_tidptr = parent_tidptr,
> > +   .tls = tls,
> > +   .child_tidptr = child_tidptr,
> > +   };
> > +
> > +   /* clone(CLONE_PIDFD) uses parent_tidptr to return a pidfd */
> > +   if ((clone_flags & CLONE_PIDFD) && (clone_flags & CLONE_PARENT_SETTID))
> > +   return -EINVAL;
> > +
> > +   return _do_fork(&args);
> > +}
> > +
> > +static bool clone3_flags_valid(u64 flags)
> > +{
> > +   if (flags & CLONE_DETACHED)
> > +   return false;
> > +
> > +   if (flags & ~CLONE_MAX)
> > +   return false;
> > +
> > +   return true;
> > +}
> > +
> > +static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
> > +struct clone_args __user *uargs,
> > +size_t size)
> > +{
> > +   struct clone_args args;
> > +
> > +   if (unlikely(size > PAGE_SIZE))
> > +   return -E2BIG;
> > +
> > +   if (unlikely(size < sizeof(struct clone_args)))
> > +   return -EINVAL;
> > +
> > +   if (unlikely(!access_ok(uargs, size)))
> > +   return -EFAULT;
> > +
> > +   if (size > sizeof(struct clone_args)) {
> > +   unsigned char __user *addr;
> > +   unsigned char __user *end;
> > +   unsigned char val;
> > +
> > +   addr = (void __user *)uargs + sizeof(struct clone_args);
> > +   end = (void __user *)uargs + size;
> > +
> > +   for (; addr < end; addr++) {
> > +   if (get_user(val, addr))
> > +   return -EFAULT;
> > +   if (val)
> > +   return -E2BIG;
> 
> Should be -EINVAL: having something after the structure should be
> handled just like an invalid flags, while still allowing future
> userspace program to probe for support for newer feature.

(Traveling until Monday, so sorry for delayed responses.)

This copies what:

kernel/sched/core.c:sched_copy_attr()
kernel/event/core.c:perf_copy_attr()

are already doing. Consistency might be good to keep here, I think.

Christian


Re: [PATCH v2 00/17] net: introduce Qualcomm IPA driver

2019-05-31 Thread Alex Elder
On 5/31/19 4:12 PM, Arnd Bergmann wrote:
> On Fri, May 31, 2019 at 10:47 PM Alex Elder  wrote:
>> On 5/31/19 2:19 PM, Arnd Bergmann wrote:
>>> On Fri, May 31, 2019 at 6:36 PM Alex Elder  wrote:
 On 5/31/19 9:58 AM, Dan Williams wrote:
> On Thu, 2019-05-30 at 22:53 -0500, Alex Elder wrote:
>>>
>>> Does this mean that IPA can only be used to back rmnet, and rmnet
>>> can only be used on top of IPA, or can or both of them be combined
>>> with another driver to talk to instead?
>>
>> No it does not mean that.
>>
>> As I understand it, one reason for the rmnet layer was to abstract
>> the back end, which would allow using a modem, or using something
>> else (a LAN?), without exposing certain details of the hardware.
>> (Perhaps to support multiplexing, etc. without duplicating that
>> logic in two "back-end" drivers?)
>>
>> To be perfectly honest, at first I thought having IPA use rmnet
>> was a cargo cult thing like Dan suggested, because I didn't see
>> the benefit.  I now see why one would use that pass-through layer
>> to handle the QMAP features.
>>
>> But back to your question.  The other thing is that I see no
>> reason the IPA couldn't present a "normal" (non QMAP) interface
>> for a modem.  It's something I'd really like to be able to do,
>> but I can't do it without having the modem firmware change its
>> configuration for these endpoints.  My access to the people who
>> implement the modem firmware has been very limited (something
>> I hope to improve), and unless and until I can get corresponding
>> changes on the modem side to implement connections that don't
>> use QMAP, I can't implement such a thing.
> 
> Why would that require firmware changes? What I was thinking
> here is to turn the bits of the rmnet driver that actually do anything
> interesting on the headers into a library module (or a header file
> with inline functions) that can be called directly by the ipa driver,
> keeping the protocol unchanged.

You know, it's possible you're right about not needing
firmware changes.  But it has always been my impression
they would be needed.  Here's why.

It looks like this:

             GSI Channel     GSI Channel
                  |               |
   -----------    v    -------    v    --------------
   | AP (ep) |=========| IPA |=========| (ep) Modem |
   -----------         -------         --------------

The AP and Modem each have IPA endpoints (ep), which use GSI channels,
to communicate with the IPA. Each endpoint has configuration options
(such as checksum offload).  I *thought* that the configurations of
the two endpoints need to be compatible (e.g., they need to agree on
whether they're aggregating).  But with your questioning I now think
you may be right, that only the local endpoint's configuration matters.

I will inquire further on this.  I *know* that the AP and modem
exchange some information about IPA configuration, but looking more
closely that looks like it's all about the configuration of shared
IPA resources, not endpoints.

That said, the broader design (including the user space code)
surely assumes rmnet, and I don't have any sense of what impact
changing that would make.  I am sure that changing it would not
be well received.

-Alex

>>> Always passing data from one netdev to another both ways
>>> sounds like it introduces both direct CPU overhead, and
>>> problems with flow control when data gets buffered inbetween.
>>
>> My impression is the rmnet driver is a pretty thin layer,
>> so the CPU overhead is probably not that great (though
>> deaggregating a message is expensive).  I agree with you
>> on the flow control.
> 
> The CPU overhead I mean is not from executing code in the
> rmnet driver, but from passing packets through the network
> stack between the two drivers, i.e. adding each frame to
> a queue and taking it back out. I'm not sure how this ends
> up working in reality but from a first look it seems like
> we might bounce in an out of the softirq handler inbetween.
> 
>   Arnd
> 



Re: [RFC PATCH v2 11/12] x86/mm/tlb: Use async and inline messages for flushing

2019-05-31 Thread Nadav Amit
> On May 31, 2019, at 2:47 PM, Andy Lutomirski  wrote:
> 
> 
> On May 31, 2019, at 2:33 PM, Nadav Amit  wrote:
> 
>>> On May 31, 2019, at 2:14 PM, Andy Lutomirski  wrote:
>>> 
 On Thu, May 30, 2019 at 11:37 PM Nadav Amit  wrote:
 When we flush userspace mappings, we can defer the TLB flushes, as long
 the following conditions are met:
 
 1. No tables are freed, since otherwise speculative page walks might
 cause machine-checks.
 
 2. No one would access userspace before flush takes place. Specifically,
 NMI handlers and kprobes would avoid accessing userspace.
>>> 
>>> I think I need to ask the big picture question.  When someone calls
>>> flush_tlb_mm_range() (or the other entry points), if no page tables
>>> were freed, they want the guarantee that future accesses (initiated
>>> observably after the flush returns) will not use paging entries that
>>> were replaced by stores ordered before flush_tlb_mm_range().  We also
>>> need the guarantee that any effects from any memory access using the
>>> old paging entries will become globally visible before
>>> flush_tlb_mm_range().
>>> 
>>> I'm wondering if receipt of an IPI is enough to guarantee any of this.
>>> If CPU 1 sets a dirty bit and CPU 2 writes to the APIC to send an IPI
>>> to CPU 1, at what point is CPU 2 guaranteed to be able to observe the
>>> dirty bit?  An interrupt entry today is fully serializing by the time
>>> it finishes, but interrupt entries are epicly slow, and I don't know
>>> if the APIC waits long enough.  Heck, what if IRQs are off on the
>>> remote CPU?  There are a handful of places where we touch user memory
>>> with IRQs off, and it's (sadly) possible for user code to turn off
>>> IRQs with iopl().
>>> 
>>> I *think* that Intel has stated recently that SMT siblings are
>>> guaranteed to stop speculating when you write to the APIC ICR to poke
>>> them, but SMT is very special.
>>> 
>>> My general conclusion is that I think the code needs to document what
>>> is guaranteed and why.
>> 
>> I think I might have managed to confuse you with a bug I made (last minute
>> bug when I was doing some cleanup). This bug does not affect the performance
>> much, but it might led you to think that I use the APIC sending as
>> synchronization.
>> 
>> The idea is not for us to rely on write to ICR as something serializing. The
>> flow should be as follows:
>> 
>> 
>>   CPU0CPU1
>> 
>> flush_tlb_mm_range()
>> __smp_call_function_many()
>> [ prepare call_single_data (csd) ]
>> [ lock csd ] 
>> [ send IPI ]
>>   (*)
>> [ wait for csd to be unlocked ]
>>   [ interrupt ]
>>   [ copy csd info to stack ]
>>   [ csd unlock ]
>> [ find csd is unlocked ]
>> [ continue (**) ]
>>   [ flush TLB ]
>> 
>> 
>> At (**) the pages might be recycled, written-back to disk, etc. Note that
>> during (*), CPU0 might do some local TLB flushes, making it very likely that
>> CSD will be unlocked by the time it gets there.
>> 
>> As you can see, I don’t rely on any special micro-architectural behavior.
>> The synchronization is done purely in software.
>> 
>> Does it make more sense now?
> 
> Yes.  Have you benchmarked this?

Partially. Numbers are indeed worse. Here are preliminary results, comparing
to v1 (concurrent):

n_threads   before      concurrent      +async
---------   ------      ----------      ------
1           661         663             663
2           1436        1225 (-14%)     1115 (-22%)
4           1571        1421 (-10%)     1289 (-18%)

Note that the benefit of “async" would be greater if the initiator does not
flush the TLB at all. This might happen in the case of kswapd, for example.
Let me try some micro-optimizations first, run more benchmarks and get back
to you.



Re: [PATCH v2 net-next 7/7] net: ethernet: ti: cpsw: add XDP support

2019-05-31 Thread Saeed Mahameed
On Fri, 2019-05-31 at 20:03 +0300, Ivan Khoronzhuk wrote:
> On Fri, May 31, 2019 at 06:32:41PM +0200, Jesper Dangaard Brouer
> wrote:
> > On Fri, 31 May 2019 19:25:24 +0300 Ivan Khoronzhuk <
> > ivan.khoronz...@linaro.org> wrote:
> > 
> > > On Fri, May 31, 2019 at 05:46:43PM +0200, Jesper Dangaard Brouer
> > > wrote:
> > > > From below code snippets, it looks like you only allocated 1
> > > > page_pool
> > > > and sharing it with several RX-queues, as I don't have the full
> > > > context
> > > > and don't know this driver, I might be wrong?
> > > > 
> > > > To be clear, a page_pool object is needed per RX-queue, as it
> > > > is
> > > > accessing a small RX page cache (which protected by
> > > > NAPI/softirq).
> > > 
> > > There is one RX interrupt and one RX NAPI for all rx channels.
> > 
> > So, what are you saying?
> > 
> > You _are_ sharing the page_pool between several RX-channels, but it
> > is
> > safe because this hardware only have one RX interrupt + NAPI
> > instance??
> 
> I can miss smth but in case of cpsw technically it means:
> 1) RX interrupts are disabled while NAPI is scheduled,
>not for particular CPU or channel, but at all, for whole cpsw
> module.
> 2) RX channels are handled one by one by priority.

Hi Ivan, I got a silly question.. 

What is the reason behind having multiple RX rings and one CPU/NAPI
handling all of them ? priority ? how do you prioritize ?

> 3) After all of them handled and no more in budget - interrupts are
> enabled.
> 4) If page is returned to the pool, and it's within NAPI, no races as
> it's
>returned protected by softirq. If it's returned not in softirq
> it's protected 
>by producer lock of the ring.
> 
> Probably it's not good example for others how it should be used, not
> a big
> problem to move it to separate pools.., even don't remember why I
> decided to
> use shared pool, there was some more reasons... need search in
> history.
> 
> > -- 
> > Best regards,
> >  Jesper Dangaard Brouer
> >  MSc.CS, Principal Kernel Engineer at Red Hat
> >  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH] Revert "usb: core: remove local_irq_save() around ->complete() handler"

2019-05-31 Thread Greg Kroah-Hartman
On Fri, May 31, 2019 at 11:53:40PM +0200, Soeren Moch wrote:
> This reverts commit ed194d1367698a0872a2b75bbe06b3932ce9df3a.
> 
> In contrast to the original patch description, apparently not all handlers
> were audited properly. E.g. my RT5370 based USB WIFI adapter (driver in
> drivers/net/wireless/ralink/rt2x00) hangs after a while under heavy load.
> This revert fixes this.

Why not just fix that driver?  Wouldn't that be easier?

thanks,

greg k-h


Re: [RESEND PATCH v4 6/6] leds: lm36274: Introduce the TI LM36274 LED driver

2019-05-31 Thread Jacek Anaszewski

Dan,

On 5/31/19 11:07 PM, Dan Murphy wrote:

Hello

On 5/31/19 2:44 PM, Jacek Anaszewski wrote:

On 5/31/19 8:23 AM, Lee Jones wrote:

On Thu, 30 May 2019, Jacek Anaszewski wrote:


On 5/30/19 9:38 AM, Lee Jones wrote:

On Wed, 29 May 2019, Jacek Anaszewski wrote:


On 5/29/19 3:58 PM, Lee Jones wrote:

On Fri, 24 May 2019, Jacek Anaszewski wrote:


Hi,

On 5/23/19 9:09 PM, Dan Murphy wrote:

Pavel

Thanks for the review

On 5/23/19 7:50 AM, Pavel Machek wrote:

Hi!


+++ b/drivers/leds/leds-lm36274.c



+static int lm36274_parse_dt(struct lm36274 *lm36274_data)
+{
+    struct fwnode_handle *child = NULL;
+    char label[LED_MAX_NAME_SIZE];
+    struct device *dev = &lm36274_data->pdev->dev;
+    const char *name;
+    int child_cnt;
+    int ret = -EINVAL;
+
+    /* There should only be 1 node */
+    child_cnt = device_get_child_node_count(dev);
+    if (child_cnt != 1)
+    return ret;


I'd do explicit "return -EINVAL" here.



ACK


+static int lm36274_probe(struct platform_device *pdev)
+{
+    struct ti_lmu *lmu = dev_get_drvdata(pdev->dev.parent);
+    struct lm36274 *lm36274_data;
+    int ret;
+
+    lm36274_data = devm_kzalloc(&pdev->dev, 
sizeof(*lm36274_data),

+    GFP_KERNEL);
+    if (!lm36274_data) {
+    ret = -ENOMEM;
+    return ret;
+    }


And certainly do "return -ENOMEM" explicitly here.



ACK


Acked-by: Pavel Machek 


I've done all amendments requested by Pavel and updated branch
ib-leds-mfd-regulator on linux-leds.git, but in the same time


What do you mean by updated?  You cannot update an 'ib' (immutable
branch).  Immutable means that it cannot change, by definition.


We have already talked about that. Nobody has pulled so the branch
could have been safely updated.


You have no sure way to know that.  And since I have no way to know,
or faith that you won't update it again, pulling it now/at all would
seem like a foolish thing to do.


Sorry, but you are simply unjust. You're pretending to portray the
situation as if I have been notoriously causing merge conflicts in
linux-next which did not take place.

Just to recap what this discussion is about:

On 7 Apr 2019:

1. I sent pull request [0].
2. 45 minutes later I updated it after discovering one omission [1].
    It was rather small chance for it to be pulled as quickly as that.
    And even if it happened it wouldn't have been much harmful - we
    wouldn't have lost e.g. weeks of testing in linux-next due to that
    fact.

On 21 May 2019:

3. I sent another pull request [2] to you and REGULATOR maintainers.
    After it turned out that lack of feedback from REGULATOR 
maintainers

    was caused by failing to send them the exact copies of patches to
    review, I informed you about possible need for updating the branch.
    Afterwards I received a reply from you saying that you hadn't 
pulled

    the branch anyway. At that point I was sure that neither MFD nor
    REGULATOR tree contains the patches. And only after that I updated
    the branch.


Here are 2 examples where you have changed immutable branches, which
is 100% of the Pull Requests I have received from you.  Using that
record as a benchmark, the situation hardly seems unjust.


Until you can provide me with an assurance that you will not keep
updating/changing the supposedly immutable pull-requests you send out,
I won't be pulling any more in.


I can just uphold the assurance which is implicitly assumed for anyone
who has never broken acclaimed rules. As justified above.


You have broken the rules every (100% of the) time.


Yes, I admit, I would lose in court.


[0] https://lore.kernel.org/patchwork/patch/1059075/
[1] https://lore.kernel.org/patchwork/patch/1059080/
[2] https://lore.kernel.org/patchwork/patch/1077066/


So we have 2 choices moving forward; you can either provide me with
assurance that you have learned from this experience and will never
change an *immutable* branch again, or I can continue to handle them,
which has been the preference for some years.

If you choose the former and adaptions need to be made in the future,
the correct thing to do is create a *new*, different pull-request
which has its own *new*, different tag, but uses the original tag as a
base.


I choose the former. That being said:

Hereby I solemnly declare never ever change an immutable branch again.

So how do I proceed with the requested change by Mark B on the LM36274 
driver.


Do I add a patch on top?

Or do I submit a patch to the regulator tree once the PR is pulled?


Won't the change be a dependency for [PATCH v4 1/6] ?

In each case, having all the commits in one set (and branch) should
simplify the integration.

--
Best regards,
Jacek Anaszewski


[PATCH] Revert "usb: core: remove local_irq_save() around ->complete() handler"

2019-05-31 Thread Soeren Moch
This reverts commit ed194d1367698a0872a2b75bbe06b3932ce9df3a.

In contrast to the original patch description, apparently not all handlers
were audited properly. E.g. my RT5370 based USB WIFI adapter (driver in
drivers/net/wireless/ralink/rt2x00) hangs after a while under heavy load.
This revert fixes this.

Also revert the follow-up patch d6142b91e9cc249b3aa22c90fade67e2e2d52cdb
("usb: core: remove flags variable in __usb_hcd_giveback_urb()"), since now
we need the flags variable again.

Cc: Sebastian Andrzej Siewior 
Cc: Alan Stern 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: sta...@vger.kernel.org # 4.20+
Signed-off-by: Soeren Moch 
---
 drivers/usb/core/hcd.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 94d22551fc1b..08d25fcf8b8e 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1739,6 +1739,7 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus);
struct usb_anchor *anchor = urb->anchor;
int status = urb->unlinked;
+   unsigned long flags;

urb->hcpriv = NULL;
if (unlikely((urb->transfer_flags & URB_SHORT_NOT_OK) &&
@@ -1755,7 +1756,20 @@ static void __usb_hcd_giveback_urb(struct urb *urb)

/* pass ownership to the completion handler */
urb->status = status;
+
+   /*
+* We disable local IRQs here avoid possible deadlock because
+* drivers may call spin_lock() to hold lock which might be
+* acquired in one hard interrupt handler.
+*
+* The local_irq_save()/local_irq_restore() around complete()
+* will be removed if current USB drivers have been cleaned up
+* and no one may trigger the above deadlock situation when
+* running complete() in tasklet.
+*/
+   local_irq_save(flags);
urb->complete(urb);
+   local_irq_restore(flags);

usb_anchor_resume_wakeups(anchor);
atomic_dec(>use_count);
--
2.17.1


