Re: [PATCH net-next] net: freescale: Remove unused declarations

2023-08-18 Thread Leon Romanovsky
On Thu, Aug 17, 2023 at 09:41:59PM +0800, Yue Haibing wrote:
> Commit 5d93cfcf7360 ("net: dpaa: Convert to phylink") removed
> fman_set_mac_active_pause()/fman_get_pause_cfg() but not declarations.
> Commit 48257c4f168e ("Add fs_enet ethernet network driver, for several
> embedded platforms.") declared but never implemented
> fs_enet_platform_init() and fs_enet_platform_cleanup().
> 
> Signed-off-by: Yue Haibing 
> ---
>  drivers/net/ethernet/freescale/fman/mac.h| 4 
>  drivers/net/ethernet/freescale/fs_enet/fs_enet.h | 5 -
>  2 files changed, 9 deletions(-)
> 

Thanks,
Reviewed-by: Leon Romanovsky 


Re: [PATCH v4 00/33] Per-VMA locks

2023-07-11 Thread Leon Romanovsky
On Tue, Jul 11, 2023 at 09:35:13AM -0700, Suren Baghdasaryan wrote:
> On Tue, Jul 11, 2023 at 4:09 AM Leon Romanovsky  wrote:
> >
> > On Tue, Jul 11, 2023 at 02:01:41PM +0300, Leon Romanovsky wrote:
> > > On Tue, Jul 11, 2023 at 12:39:34PM +0200, Vlastimil Babka wrote:
> > > > On 7/11/23 12:35, Leon Romanovsky wrote:
> > > > >
> > > > > On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:
> > > > >
> > > > > <...>
> > > > >
> > > > >> Laurent Dufour (1):
> > > > >>   powerc/mm: try VMA lock-based page fault handling first
> > > > >
> > > > > Hi,
> > > > >
> > > > > This series and specifically the commit above broke docker over PPC.
> > > > > It causes to docker service stuck while trying to activate. Revert of
> > > > > this commit allows us to use docker again.
> > > >
> > > > Hi,
> > > >
> > > > there have been follow-up fixes, that are part of 6.4.3 stable (also
> > > > 6.5-rc1) Does that version work for you?
> > >
> > > I'll recheck it again on clean system, but for the record:
> > > 1. We are running 6.5-rc1 kernels.
> > > 2. PPC doesn't compile for us on -rc1 without this fix.
> > > https://lore.kernel.org/all/20230629124500.1.I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4@changeid/
> >
> > Ohh, I see it in -rc1, let's recheck.
> 
> Hi Leon,
> Please let us know how it goes.

Once we rebuilt a clean -rc1, docker worked for us.
Sorry for the noise.

> 
> >
> > > 3. I didn't see anything relevant -rc1 with "git log 
> > > arch/powerpc/mm/fault.c".
> 
> The fixes Vlastimil was referring to are not in the fault.c, they are
> in the main mm and fork code. More specifically, check for these
> patches to exist in the branch you are testing:
> 
> mm: lock newly mapped VMA with corrected ordering
> fork: lock VMAs of the parent process when forking
> mm: lock newly mapped VMA which can be modified after it becomes visible
> mm: lock a vma before stack expansion

Thanks

> 
> Thanks,
> Suren.
> 
> > >
> > > Do you have in mind anything specific to check?
> > >
> > > Thanks
> > >
> >
> >


Re: [PATCH v4 00/33] Per-VMA locks

2023-07-11 Thread Leon Romanovsky


On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:

<...>

> Laurent Dufour (1):
>   powerc/mm: try VMA lock-based page fault handling first

Hi,

This series, and specifically the commit above, broke docker on PPC.
The docker service gets stuck while trying to activate. Reverting
this commit allows us to use docker again.

[user@ppc-135-3-200-205 ~]# sudo systemctl status docker
● docker.service - Docker Application Container Engine
 Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor 
preset: disabled)
 Active: activating (start) since Mon 2023-06-26 14:47:07 IDT; 3h 50min ago
TriggeredBy: ● docker.socket
   Docs: https://docs.docker.com
   Main PID: 276555 (dockerd)
 Memory: 44.2M
 CGroup: /system.slice/docker.service
 └─ 276555 /usr/bin/dockerd -H fd:// 
--containerd=/run/containerd/containerd.sock

Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: 
time="2023-06-26T14:47:07.129383166+03:00" level=info msg="Graph migration to 
content-addressability took 0.00 se>
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: 
time="2023-06-26T14:47:07.129666160+03:00" level=warning msg="Your kernel does 
not support cgroup cfs period"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: 
time="2023-06-26T14:47:07.129684117+03:00" level=warning msg="Your kernel does 
not support cgroup cfs quotas"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: 
time="2023-06-26T14:47:07.129697085+03:00" level=warning msg="Your kernel does 
not support cgroup rt period"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: 
time="2023-06-26T14:47:07.129711513+03:00" level=warning msg="Your kernel does 
not support cgroup rt runtime"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: 
time="2023-06-26T14:47:07.129720656+03:00" level=warning msg="Unable to find 
blkio cgroup in mounts"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: 
time="2023-06-26T14:47:07.129805617+03:00" level=warning msg="mountpoint for 
pids not found"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: 
time="2023-06-26T14:47:07.130199070+03:00" level=info msg="Loading containers: 
start."
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: 
time="2023-06-26T14:47:07.132688568+03:00" level=warning msg="Running modprobe 
bridge br_netfilter failed with me>
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: 
time="2023-06-26T14:47:07.271014050+03:00" level=info msg="Default bridge 
(docker0) is assigned with an IP addres>

Python script which we used for bisect:

import subprocess
import time
import sys


def run_command(cmd):
    """Run cmd in a shell; return True if it hangs for more than 30 seconds."""
    print('running:', cmd)

    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)

    try:
        stdout, stderr = p.communicate(timeout=30)

    except subprocess.TimeoutExpired:
        # The command did not finish in time: treat it as stuck.
        return True

    print(stdout.decode())
    print(stderr.decode())
    print('rc:', p.returncode)

    return False


def main():
    commands = [
        'sudo systemctl stop docker',
        'sudo systemctl status docker',
        'sudo systemctl is-active docker',
        'sudo systemctl start docker',
        'sudo systemctl status docker',
    ]

    # Restart docker up to 1000 times; stop at the first stuck command.
    for i in range(1000):
        title = f'Try no. {i + 1}'
        print('*' * 50, title, '*' * 50)

        for cmd in commands:
            if run_command(cmd):
                print(f'Reproduced on try no. {i + 1}!')
                print(f'"{cmd}" is stuck!')

                return 1

        print('\n')
        time.sleep(30)
    return 0


if __name__ == '__main__':
    sys.exit(main())

Thanks


Re: [PATCH v4 00/33] Per-VMA locks

2023-07-11 Thread Leon Romanovsky
On Tue, Jul 11, 2023 at 02:01:41PM +0300, Leon Romanovsky wrote:
> On Tue, Jul 11, 2023 at 12:39:34PM +0200, Vlastimil Babka wrote:
> > On 7/11/23 12:35, Leon Romanovsky wrote:
> > > 
> > > On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:
> > > 
> > > <...>
> > > 
> > >> Laurent Dufour (1):
> > >>   powerc/mm: try VMA lock-based page fault handling first
> > > 
> > > Hi,
> > > 
> > > This series and specifically the commit above broke docker over PPC.
> > > It causes to docker service stuck while trying to activate. Revert of
> > > this commit allows us to use docker again.
> > 
> > Hi,
> > 
> > there have been follow-up fixes, that are part of 6.4.3 stable (also
> > 6.5-rc1) Does that version work for you?
> 
> I'll recheck it again on clean system, but for the record:
> 1. We are running 6.5-rc1 kernels.
> 2. PPC doesn't compile for us on -rc1 without this fix.
> https://lore.kernel.org/all/20230629124500.1.I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4@changeid/

Ohh, I see it in -rc1, let's recheck.

> 3. I didn't see anything relevant -rc1 with "git log arch/powerpc/mm/fault.c".
> 
> Do you have in mind anything specific to check?
> 
> Thanks
> 


Re: [PATCH v4 00/33] Per-VMA locks

2023-07-11 Thread Leon Romanovsky
On Tue, Jul 11, 2023 at 12:39:34PM +0200, Vlastimil Babka wrote:
> On 7/11/23 12:35, Leon Romanovsky wrote:
> > 
> > On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:
> > 
> > <...>
> > 
> >> Laurent Dufour (1):
> >>   powerc/mm: try VMA lock-based page fault handling first
> > 
> > Hi,
> > 
> > This series and specifically the commit above broke docker over PPC.
> > It causes to docker service stuck while trying to activate. Revert of
> > this commit allows us to use docker again.
> 
> Hi,
> 
> there have been follow-up fixes, that are part of 6.4.3 stable (also
> 6.5-rc1) Does that version work for you?

I'll recheck on a clean system, but for the record:
1. We are running 6.5-rc1 kernels.
2. PPC doesn't compile for us on -rc1 without this fix.
https://lore.kernel.org/all/20230629124500.1.I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4@changeid/
3. I didn't see anything relevant in -rc1 with "git log arch/powerpc/mm/fault.c".

Do you have in mind anything specific to check?

Thanks


Re: [PATCH v4] net: wan: Add checks for NULL for utdm in undo_uhdlc_init and unmap_si_regs

2023-01-11 Thread Leon Romanovsky
On Wed, Jan 11, 2023 at 10:55:33PM +0300, Esina Ekaterina wrote:
> If uhdlc_priv_tsa != 1 then utdm is not initialized.
> And if ret != NULL then goto undo_uhdlc_init, where
> utdm is dereferenced. Same if dev == NULL.
> 
> Found by Astra Linux on behalf of Linux Verification Center
> (linuxtesting.org) with SVACE.
> 
> Signed-off-by: Esina Ekaterina 
>   ---
> v4: Fix style
> v3: Remove braces
> v2: Add check for NULL for unmap_si_regs
> ---
>  drivers/net/wan/fsl_ucc_hdlc.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

In addition to what Jakub said, please don't send patches as reply-to.
Please send them as new threads.

Thanks


Re: [PATCH] PCI/AER: Rate limit the reporting of the correctable errors

2023-01-03 Thread Leon Romanovsky
On Wed, Jan 04, 2023 at 10:27:33AM +0530, Rajat Khandelwal wrote:
> Hi Bjorn,
> 
> Thanks for the acknowledgement.
> 
> On 1/4/2023 12:44 AM, Bjorn Helgaas wrote:
> > [+cc Paul, Sasha, Leon, Frederick]
> > 
> > (Please cc folks who have commented on previous versions of your
> > patch.)
> > 
> > On Tue, Jan 03, 2023 at 10:25:48PM +0530, Rajat Khandelwal wrote:
> > > There are many instances where correctable errors tend to inundate
> > > the message buffer. We observe such instances during thunderbolt PCIe
> > > tunneling.

<...>

> > > [54982.838808] igc :2b:00.0:   device [8086:5502] error 
> > > status/mask=1000/2000
> > > [54982.838817] igc :2b:00.0:[12] Timeout
> > Please remove the timestamps; they don't contribute to understanding
> > the problem.
> 
> --> Sure.

Please don't add "-->" or any other marker to replies. It breaks the mail
color scheme.

> 
> > 
> > > This gets repeated continuously, thus inundating the buffer.
> > Did you verify that we actually clear the Correctable Error Status
> > register?
> 
> --> This patch targets only rate limiting the correctable errors since they 
> are
> non-fatal, and they kind of inundate the CPU logs, particularly during 
> thunderbolt
> connections. It doesn't have an impact anywhere else.
> As per your suggestion in the igc patch, I found rate limiting as a doable 
> option
> currently. Have eradicated any kind of masking the bits.

You didn't answer the question that was asked: "Did you verify that we
actually clear the Correctable Error Status register?".

Thanks
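
For readers following the quoted patch description: rate limiting of this
kind is usually an interval/burst counter. The block below is a minimal
userspace sketch of that pattern only. It is not the kernel's ratelimit
code and not the code from the patch; struct sketch_ratelimit and
sketch_ratelimit_ok() are hypothetical names, and time is passed in by the
caller in arbitrary ticks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interval/burst limiter state (illustration only). */
struct sketch_ratelimit {
	uint64_t interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* messages allowed per window */
	uint64_t begin;		/* start of the current window */
	unsigned int printed;	/* messages emitted in this window */
	unsigned int missed;	/* messages suppressed in this window */
};

/* Return true if a message may be emitted at time "now". */
bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, uint64_t now)
{
	/* A new window starts once the interval has elapsed. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

Note that a limiter like this only reduces log volume. It says nothing
about whether the Correctable Error Status register is cleared, which is
the question that was asked above.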


Re: [PATCH net-next] eth: fealnx: delete the driver for Myson MTD-800

2022-10-26 Thread Leon Romanovsky
On Tue, Oct 25, 2022 at 11:42:54AM -0700, Jakub Kicinski wrote:
> The git history for this driver seems to be completely
> automated / tree wide changes. I can't find any boards
> or systems which would use this chip. Google search
> shows pictures of towel warmers and no networking products.
> 
> Signed-off-by: Jakub Kicinski 
> ---
> CC: tsbog...@alpha.franken.de
> CC: m...@ellerman.id.au
> CC: npig...@gmail.com
> CC: christophe.le...@csgroup.eu
> CC: lukas.bulw...@gmail.com
> CC: a...@arndb.de
> CC: step...@networkplumber.org
> CC: shay...@amazon.com
> CC: l...@kernel.org
> CC: m...@semihalf.com
> CC: pe...@nvidia.com
> CC: wsa+rene...@sang-engineering.com
> CC: linux-m...@vger.kernel.org
> CC: linuxppc-dev@lists.ozlabs.org
> ---
>  arch/mips/configs/mtx1_defconfig  |1 -
>  arch/powerpc/configs/ppc6xx_defconfig |1 -
>  drivers/net/ethernet/Kconfig  |   10 -
>  drivers/net/ethernet/Makefile |1 -
>  drivers/net/ethernet/fealnx.c | 1953 -
>  5 files changed, 1966 deletions(-)
>  delete mode 100644 drivers/net/ethernet/fealnx.c

Thanks,
Reviewed-by: Leon Romanovsky 


Re: [PATCH] net: move from strlcpy with unused retval to strscpy

2022-08-21 Thread Leon Romanovsky
On Thu, Aug 18, 2022 at 11:00:34PM +0200, Wolfram Sang wrote:
> Follow the advice of the below link and prefer 'strscpy' in this
> subsystem. Conversion is 1:1 because the return value is not used.
> Generated by a coccinelle script.
> 
> Link: 
> https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=v6a6g1ouzcprm...@mail.gmail.com/
> Signed-off-by: Wolfram Sang 
> ---

<...>

>  drivers/net/ethernet/mellanox/mlx4/en_ethtool.c  |  6 +++---
>  drivers/net/ethernet/mellanox/mlx4/fw.c  |  2 +-
>  .../net/ethernet/mellanox/mlx5/core/en_ethtool.c |  4 ++--
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  2 +-
>  .../ethernet/mellanox/mlx5/core/ipoib/ethtool.c  |  2 +-

Thanks,
Reviewed-by: Leon Romanovsky 
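
The "1:1 because the return value is not used" argument in the commit
message rests on the two helpers differing mainly in what they return.
Below is a small userspace sketch of those semantics, assuming the
documented kernel behaviour: strlcpy() returns strlen(src), strscpy()
returns the number of bytes copied or -E2BIG on truncation. The
sketch_strlcpy() and sketch_strscpy() names are hypothetical and these are
not the kernel implementations.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* strlcpy-like: always returns strlen(src), even when truncating. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len < size ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* strscpy-like: returns bytes copied (without NUL), or -E2BIG if truncated. */
long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

With a 4-byte buffer and the source "abcdefgh", both leave "abc" in the
buffer. Only the return value differs (8 versus -E2BIG), so callers that
ignore it see no change.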


Re: [PATCH 11/22] rdmavt: Replace comments with C99 initializers

2022-03-27 Thread Leon Romanovsky
On Sat, Mar 26, 2022 at 05:58:58PM +0100, Benjamin Stürz wrote:
> This replaces comments with C99's designated
> initializers because the kernel supports them now.
> 
> Signed-off-by: Benjamin Stürz 
> ---
>  drivers/infiniband/sw/rdmavt/rc.c | 62 +++
>  1 file changed, 31 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rdmavt/rc.c 
> b/drivers/infiniband/sw/rdmavt/rc.c
> index 4e5d4a27633c..121b8a23ac07 100644
> --- a/drivers/infiniband/sw/rdmavt/rc.c
> +++ b/drivers/infiniband/sw/rdmavt/rc.c
> @@ -10,37 +10,37 @@
>   * Convert the AETH credit code into the number of credits.
>   */
>  static const u16 credit_table[31] = {
> - 0,  /* 0 */
> - 1,  /* 1 */
> - 2,  /* 2 */
> - 3,  /* 3 */
> - 4,  /* 4 */
> - 6,  /* 5 */
> - 8,  /* 6 */
> - 12, /* 7 */
> - 16, /* 8 */
> - 24, /* 9 */
> - 32, /* A */
> - 48, /* B */
> - 64, /* C */
> - 96, /* D */
> - 128,/* E */
> - 192,/* F */
> - 256,/* 10 */
> - 384,/* 11 */
> - 512,/* 12 */
> - 768,/* 13 */
> - 1024,   /* 14 */
> - 1536,   /* 15 */
> - 2048,   /* 16 */
> - 3072,   /* 17 */
> - 4096,   /* 18 */
> - 6144,   /* 19 */
> - 8192,   /* 1A */
> - 12288,  /* 1B */
> - 16384,  /* 1C */
> - 24576,  /* 1D */
> - 32768   /* 1E */
> + [0x00] = 0,
> + [0x01] = 1,
> + [0x02] = 2,
> + [0x03] = 3,
> + [0x04] = 4,
> + [0x05] = 6,
> + [0x06] = 8,
> + [0x07] = 12,
> + [0x08] = 16,
> + [0x09] = 24,
> + [0x0A] = 32,
> + [0x0B] = 48,
> + [0x0C] = 64,
> + [0x0D] = 96,
> + [0x0E] = 128,
> + [0x0F] = 192,
> + [0x10] = 256,
> + [0x11] = 384,
> + [0x12] = 512,
> + [0x13] = 768,
> + [0x14] = 1024,
> + [0x15] = 1536,
> + [0x16] = 2048,
> + [0x17] = 3072,
> + [0x18] = 4096,
> + [0x19] = 6144,
> + [0x1A] = 8192,
> + [0x1B] = 12288,
> + [0x1C] = 16384,
> + [0x1D] = 24576,
> + [0x1E] = 32768
>  };

I have a hard time seeing any value in this commit. Why is this change needed?

Thanks

>  
>  /**
> -- 
> 2.35.1
> 
> 
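
For context on the question above, here is a reduced sketch showing that
the positional and the designated form produce the same array contents.
It uses only the first eight entries of the table from the quoted diff.
The credits_positional, credits_designated and credit_tables_match()
names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Positional form: the index is documented only by the comment. */
const uint16_t credits_positional[8] = {
	0,	/* 0 */
	1,	/* 1 */
	2,	/* 2 */
	3,	/* 3 */
	4,	/* 4 */
	6,	/* 5 */
	8,	/* 6 */
	12,	/* 7 */
};

/* Designated form: the index is part of the initializer itself. */
const uint16_t credits_designated[8] = {
	[0x00] = 0,
	[0x01] = 1,
	[0x02] = 2,
	[0x03] = 3,
	[0x04] = 4,
	[0x05] = 6,
	[0x06] = 8,
	[0x07] = 12,
};

/* Return 1 when both tables hold identical contents. */
int credit_tables_match(void)
{
	return memcmp(credits_positional, credits_designated,
		      sizeof(credits_positional)) == 0;
}
```

Both arrays compare equal, so the conversion changes notation only. The
one practical difference is that a misplaced entry cannot silently shift
the rest of a designated table.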


Re: [RFC v1 PATCH 3/3] driver: update all the code that use soc_device_match

2021-04-19 Thread Leon Romanovsky
On Mon, Apr 19, 2021 at 12:27:22PM +0800, Alice Guo (OSS) wrote:
> From: Alice Guo 
> 
> Update all the code that use soc_device_match because add support for
> soc_device_match returning -EPROBE_DEFER.
> 
> Signed-off-by: Alice Guo 
> ---
>  drivers/bus/ti-sysc.c |  2 +-
>  drivers/clk/renesas/r8a7795-cpg-mssr.c|  4 +++-
>  drivers/clk/renesas/rcar-gen2-cpg.c   |  2 +-
>  drivers/clk/renesas/rcar-gen3-cpg.c   |  2 +-
>  drivers/dma/fsl-dpaa2-qdma/dpaa2-qdma.c   |  7 ++-
>  drivers/dma/ti/k3-psil.c  |  3 +++
>  drivers/dma/ti/k3-udma.c  |  2 +-
>  drivers/gpu/drm/bridge/nwl-dsi.c  |  2 +-
>  drivers/gpu/drm/meson/meson_drv.c |  4 +++-
>  drivers/gpu/drm/omapdrm/dss/dispc.c   |  2 +-
>  drivers/gpu/drm/omapdrm/dss/dpi.c |  4 +++-
>  drivers/gpu/drm/omapdrm/dss/dsi.c |  3 +++
>  drivers/gpu/drm/omapdrm/dss/dss.c |  3 +++
>  drivers/gpu/drm/omapdrm/dss/hdmi4_core.c  |  3 +++
>  drivers/gpu/drm/omapdrm/dss/venc.c|  4 +++-
>  drivers/gpu/drm/omapdrm/omap_drv.c|  3 +++
>  drivers/gpu/drm/rcar-du/rcar_du_crtc.c|  4 +++-
>  drivers/gpu/drm/rcar-du/rcar_lvds.c   |  2 +-
>  drivers/gpu/drm/tidss/tidss_dispc.c   |  4 +++-
>  drivers/iommu/ipmmu-vmsa.c|  7 +--
>  drivers/media/platform/rcar-vin/rcar-core.c   |  2 +-
>  drivers/media/platform/rcar-vin/rcar-csi2.c   |  2 +-
>  drivers/media/platform/vsp1/vsp1_uif.c|  4 +++-
>  drivers/mmc/host/renesas_sdhi_core.c  |  2 +-
>  drivers/mmc/host/renesas_sdhi_internal_dmac.c |  2 +-
>  drivers/mmc/host/sdhci-of-esdhc.c | 21 ++-
>  drivers/mmc/host/sdhci-omap.c |  2 +-
>  drivers/mmc/host/sdhci_am654.c|  2 +-
>  drivers/net/ethernet/renesas/ravb_main.c  |  4 +++-
>  drivers/net/ethernet/ti/am65-cpsw-nuss.c  |  2 +-
>  drivers/net/ethernet/ti/cpsw.c|  2 +-
>  drivers/net/ethernet/ti/cpsw_new.c|  2 +-
>  drivers/phy/ti/phy-omap-usb2.c|  4 +++-
>  drivers/pinctrl/renesas/core.c|  2 +-
>  drivers/pinctrl/renesas/pfc-r8a7790.c |  5 -
>  drivers/pinctrl/renesas/pfc-r8a7794.c |  5 -
>  drivers/soc/fsl/dpio/dpio-driver.c| 13 
>  drivers/soc/renesas/r8a774c0-sysc.c   |  5 -
>  drivers/soc/renesas/r8a7795-sysc.c|  2 +-
>  drivers/soc/renesas/r8a77990-sysc.c   |  5 -
>  drivers/soc/ti/k3-ringacc.c   |  2 +-
>  drivers/staging/mt7621-pci/pci-mt7621.c   |  2 +-
>  drivers/thermal/rcar_gen3_thermal.c   |  4 +++-
>  drivers/thermal/ti-soc-thermal/ti-bandgap.c   | 10 +++--
>  drivers/usb/gadget/udc/renesas_usb3.c |  2 +-
>  drivers/usb/host/ehci-platform.c  |  4 +++-
>  drivers/usb/host/xhci-rcar.c  |  2 +-
>  drivers/watchdog/renesas_wdt.c|  2 +-
>  48 files changed, 131 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/bus/ti-sysc.c b/drivers/bus/ti-sysc.c
> index 5fae60f8c135..00c59aa217c1 100644
> --- a/drivers/bus/ti-sysc.c
> +++ b/drivers/bus/ti-sysc.c
> @@ -2909,7 +2909,7 @@ static int sysc_init_soc(struct sysc *ddata)
>   }
>  
>   match = soc_device_match(sysc_soc_feat_match);
> - if (!match)
> + if (!match || IS_ERR(match))
>   return 0;
>  
>   if (match->data)
> diff --git a/drivers/clk/renesas/r8a7795-cpg-mssr.c 
> b/drivers/clk/renesas/r8a7795-cpg-mssr.c
> index c32d2c678046..90a18336a4c3 100644
> --- a/drivers/clk/renesas/r8a7795-cpg-mssr.c
> +++ b/drivers/clk/renesas/r8a7795-cpg-mssr.c
> @@ -439,6 +439,7 @@ static const unsigned int r8a7795es2_mod_nullify[] 
> __initconst = {
>  
>  static int __init r8a7795_cpg_mssr_init(struct device *dev)
>  {
> + const struct soc_device_attribute *match;
>   const struct rcar_gen3_cpg_pll_config *cpg_pll_config;
>   u32 cpg_mode;
>   int error;
> @@ -453,7 +454,8 @@ static int __init r8a7795_cpg_mssr_init(struct device 
> *dev)
>   return -EINVAL;
>   }
>  
> - if (soc_device_match(r8a7795es1)) {
> + match = soc_device_match(r8a7795es1);
> + if (!IS_ERR(match) && match) {

"if (!IS_ERR_OR_NULL(match))" in all places.

Thanks
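
To spell out the suggestion: the two-part checks in the patch and the
single helper test the same condition. The block below is a userspace
sketch that mimics the kernel's error-pointer convention (the top 4095
addresses encode -errno values). The sketch_* names are hypothetical
stand-ins for ERR_PTR(), IS_ERR() and IS_ERR_OR_NULL(). The value 517 for
EPROBE_DEFER matches the kernel's definition but is defined locally here
because userspace errno.h does not provide it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO	4095
#define SKETCH_EPROBE_DEFER	517	/* kernel-internal errno, defined here */

/* Encode a negative errno value as a pointer, like ERR_PTR(). */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True for pointers in the reserved error range, like IS_ERR(). */
static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* True for NULL or an error pointer, like IS_ERR_OR_NULL(). */
static inline bool sketch_is_err_or_null(const void *ptr)
{
	return !ptr || sketch_is_err(ptr);
}

/* The form used in the patch: "!IS_ERR(match) && match". */
static inline bool sketch_patch_form(const void *match)
{
	return !sketch_is_err(match) && match;
}
```

For NULL, for an encoded -EPROBE_DEFER and for a valid pointer,
sketch_patch_form(p) and !sketch_is_err_or_null(p) give the same answer.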


[PATCH net-next 22/23] net/freescale: Don't set zero if FW not-available in ucc_geth

2020-03-01 Thread Leon Romanovsky
From: Leon Romanovsky 

Rely on ethtool to properly present the fact that FW is not
available for the ucc_geth driver.

Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/freescale/ucc_geth_ethtool.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth_ethtool.c 
b/drivers/net/ethernet/freescale/ucc_geth_ethtool.c
index bc7ba70d176c..14c08a868190 100644
--- a/drivers/net/ethernet/freescale/ucc_geth_ethtool.c
+++ b/drivers/net/ethernet/freescale/ucc_geth_ethtool.c
@@ -334,7 +334,6 @@ uec_get_drvinfo(struct net_device *netdev,
struct ethtool_drvinfo *drvinfo)
 {
strlcpy(drvinfo->driver, DRV_NAME, sizeof(drvinfo->driver));
-   strlcpy(drvinfo->fw_version, "N/A", sizeof(drvinfo->fw_version));
strlcpy(drvinfo->bus_info, "QUICC ENGINE", sizeof(drvinfo->bus_info));
 }
 
-- 
2.24.1



[PATCH net-next 20/23] net/freescale: Clean drivers from static versions

2020-03-01 Thread Leon Romanovsky
From: Leon Romanovsky 

There is no need to set static versions because the Linux kernel is
released as a whole, with the same version applicable to the entire
code base.

Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c  |  2 --
 drivers/net/ethernet/freescale/enetc/enetc_pf.c | 13 -
 drivers/net/ethernet/freescale/enetc/enetc_vf.c | 12 
 drivers/net/ethernet/freescale/fec_main.c   |  1 -
 .../net/ethernet/freescale/fs_enet/fs_enet-main.c   |  2 --
 drivers/net/ethernet/freescale/fs_enet/fs_enet.h|  2 --
 drivers/net/ethernet/freescale/gianfar.c|  2 --
 drivers/net/ethernet/freescale/gianfar.h|  1 -
 drivers/net/ethernet/freescale/gianfar_ethtool.c|  2 --
 drivers/net/ethernet/freescale/ucc_geth.c   |  1 -
 drivers/net/ethernet/freescale/ucc_geth.h   |  1 -
 drivers/net/ethernet/freescale/ucc_geth_ethtool.c   |  1 -
 12 files changed, 40 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index 66d150872d48..13ab669ca8b3 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -110,8 +110,6 @@ static void dpaa_get_drvinfo(struct net_device *net_dev,
 
strlcpy(drvinfo->driver, KBUILD_MODNAME,
sizeof(drvinfo->driver));
-   len = snprintf(drvinfo->version, sizeof(drvinfo->version),
-  "%X", 0);
len = snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
   "%X", 0);
 
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index fc0d7d99e9a1..545a344bce00 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -7,12 +7,6 @@
 #include 
 #include "enetc_pf.h"
 
-#define ENETC_DRV_VER_MAJ 1
-#define ENETC_DRV_VER_MIN 0
-
-#define ENETC_DRV_VER_STR __stringify(ENETC_DRV_VER_MAJ) "." \
- __stringify(ENETC_DRV_VER_MIN)
-static const char enetc_drv_ver[] = ENETC_DRV_VER_STR;
 #define ENETC_DRV_NAME_STR "ENETC PF driver"
 static const char enetc_drv_name[] = ENETC_DRV_NAME_STR;
 
@@ -929,9 +923,6 @@ static int enetc_pf_probe(struct pci_dev *pdev,
 
netif_carrier_off(ndev);
 
-   netif_info(priv, probe, ndev, "%s v%s\n",
-  enetc_drv_name, enetc_drv_ver);
-
return 0;
 
 err_reg_netdev:
@@ -959,9 +950,6 @@ static void enetc_pf_remove(struct pci_dev *pdev)
enetc_sriov_configure(pdev, 0);
 
priv = netdev_priv(si->ndev);
-   netif_info(priv, drv, si->ndev, "%s v%s remove\n",
-  enetc_drv_name, enetc_drv_ver);
-
unregister_netdev(si->ndev);
 
enetc_mdio_remove(pf);
@@ -995,4 +983,3 @@ module_pci_driver(enetc_pf_driver);
 
 MODULE_DESCRIPTION(ENETC_DRV_NAME_STR);
 MODULE_LICENSE("Dual BSD/GPL");
-MODULE_VERSION(ENETC_DRV_VER_STR);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_vf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
index ebd21bf4cfa1..28a786b2f3e7 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_vf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
@@ -4,12 +4,6 @@
 #include 
 #include "enetc.h"
 
-#define ENETC_DRV_VER_MAJ 1
-#define ENETC_DRV_VER_MIN 0
-
-#define ENETC_DRV_VER_STR __stringify(ENETC_DRV_VER_MAJ) "." \
- __stringify(ENETC_DRV_VER_MIN)
-static const char enetc_drv_ver[] = ENETC_DRV_VER_STR;
 #define ENETC_DRV_NAME_STR "ENETC VF driver"
 static const char enetc_drv_name[] = ENETC_DRV_NAME_STR;
 
@@ -201,9 +195,6 @@ static int enetc_vf_probe(struct pci_dev *pdev,
 
netif_carrier_off(ndev);
 
-   netif_info(priv, probe, ndev, "%s v%s\n",
-  enetc_drv_name, enetc_drv_ver);
-
return 0;
 
 err_reg_netdev:
@@ -225,8 +216,6 @@ static void enetc_vf_remove(struct pci_dev *pdev)
struct enetc_ndev_priv *priv;
 
priv = netdev_priv(si->ndev);
-   netif_info(priv, drv, si->ndev, "%s v%s remove\n",
-  enetc_drv_name, enetc_drv_ver);
unregister_netdev(si->ndev);
 
enetc_free_msix(priv);
@@ -254,4 +243,3 @@ module_pci_driver(enetc_vf_driver);
 
 MODULE_DESCRIPTION(ENETC_DRV_NAME_STR);
 MODULE_LICENSE("Dual BSD/GPL");
-MODULE_VERSION(ENETC_DRV_VER_STR);
diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 12edd4e358f8..af7653e341f2 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2128,7 +2128,6 @@ static void fec_enet_get_drvinfo(struct net_device *ndev,
 
strlcpy(info->driver, fep->pdev->dev.driver-&g

[PATCH net-next 00/23] Clean driver, module and FW versions

2020-03-01 Thread Leon Romanovsky
From: Leon Romanovsky 

Hi,

This is the second batch of the series which removes various static versions
in favour of the globally defined Linux kernel version.

The first part with better cover letter can be found here
https://lore.kernel.org/lkml/20200224085311.460338-1-l...@kernel.org

The code is based on
68e2c37690b0 ("Merge branch 'hsr-several-code-cleanup-for-hsr-module'")

and WIP branch is
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=ethtool

Thanks

Leon Romanovsky (23):
  net/broadcom: Clean broadcom code from driver versions
  net/broadcom: Don't set N/A FW if it is not available
  net/brocade: Delete driver version
  net/liquidio: Delete driver version assignment
  net/liquidio: Delete non-working LIQUIDIO_PACKAGE check
  net/cavium: Clean driver versions
  net/cavium: Delete N/A assignments for ethtool
  net/chelsio: Delete drive and  module versions
  net/chelsio: Don't set N/A for not available FW
  net/cirrus: Delete driver version
  net/cisco: Delete driver and module versions
  net/cortina: Delete driver version from ethtool output
  net/davicom: Delete ethtool version assignment
  net/dec: Delete driver versions
  net/dlink: Remove driver version and release date
  net/dnet: Delete static version from the driver
  net/emulex: Delete driver version
  net/faraday: Delete driver version from the drivers
  net/fealnx: Delete driver version
  net/freescale: Clean drivers from static versions
  net/freescale: Don't set zero if FW not-available in dpaa
  net/freescale: Don't set zero if FW not-available in ucc_geth
  net/freescale: Don't set zero if FW iand bus not-available in gianfar

 drivers/net/ethernet/broadcom/b44.c   |  5 
 drivers/net/ethernet/broadcom/bcm63xx_enet.c  | 10 ++-
 drivers/net/ethernet/broadcom/bcmsysport.c|  1 -
 drivers/net/ethernet/broadcom/bnx2.c  | 11 
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h   |  8 +-
 .../ethernet/broadcom/bnx2x/bnx2x_ethtool.c   |  7 -
 .../net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  7 -
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |  8 --
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++-
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  1 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c |  1 -
 .../net/ethernet/broadcom/genet/bcmgenet.c|  1 -
 drivers/net/ethernet/broadcom/tg3.c   | 11 +---
 drivers/net/ethernet/brocade/bna/bnad.c   |  4 ---
 drivers/net/ethernet/brocade/bna/bnad.h   |  2 --
 .../net/ethernet/brocade/bna/bnad_ethtool.c   |  1 -
 .../ethernet/cavium/liquidio/lio_ethtool.c|  2 --
 .../net/ethernet/cavium/liquidio/lio_main.c   |  8 --
 .../ethernet/cavium/liquidio/lio_vf_main.c|  5 ++--
 .../cavium/liquidio/liquidio_common.h |  6 -
 .../ethernet/cavium/liquidio/octeon_console.c | 10 ++-
 .../net/ethernet/cavium/octeon/octeon_mgmt.c  |  6 -
 .../ethernet/cavium/thunder/nicvf_ethtool.c   |  2 --
 drivers/net/ethernet/chelsio/cxgb/common.h|  1 -
 drivers/net/ethernet/chelsio/cxgb/cxgb2.c |  3 ---
 .../net/ethernet/chelsio/cxgb3/cxgb3_main.c   |  4 ---
 drivers/net/ethernet/chelsio/cxgb3/version.h  |  2 --
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h|  3 +--
 .../ethernet/chelsio/cxgb4/cxgb4_ethtool.c|  6 +
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   | 10 ---
 .../ethernet/chelsio/cxgb4vf/cxgb4vf_main.c   |  9 ---
 .../ethernet/chelsio/libcxgb/libcxgb_ppm.c|  2 --
 drivers/net/ethernet/cirrus/ep93xx_eth.c  |  2 --
 drivers/net/ethernet/cisco/enic/enic.h|  2 --
 .../net/ethernet/cisco/enic/enic_ethtool.c|  1 -
 drivers/net/ethernet/cisco/enic/enic_main.c   |  3 ---
 drivers/net/ethernet/cortina/gemini.c |  2 --
 drivers/net/ethernet/davicom/dm9000.c |  2 --
 drivers/net/ethernet/dec/tulip/de2104x.c  | 15 ---
 drivers/net/ethernet/dec/tulip/dmfe.c | 14 --
 drivers/net/ethernet/dec/tulip/tulip_core.c   | 26 ++-
 drivers/net/ethernet/dec/tulip/uli526x.c  | 13 --
 drivers/net/ethernet/dec/tulip/winbond-840.c  | 12 -
 drivers/net/ethernet/dlink/dl2k.c |  9 ---
 drivers/net/ethernet/dlink/sundance.c | 20 --
 drivers/net/ethernet/dnet.c   |  1 -
 drivers/net/ethernet/dnet.h   |  1 -
 drivers/net/ethernet/emulex/benet/be.h|  1 -
 .../net/ethernet/emulex/benet/be_ethtool.c|  1 -
 drivers/net/ethernet/emulex/benet/be_main.c   |  5 +---
 drivers/net/ethernet/faraday/ftgmac100.c  |  2 --
 drivers/net/ethernet/faraday/ftmac100.c   |  3 ---
 drivers/net/ethernet/fealnx.c | 20 --
 .../ethernet/freescale/dpaa/dpaa_ethtool.c| 11 
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 13 --
 .../net/ethernet/freescale/enetc/enetc_vf.c   | 12 -
 drivers/net/ethernet/freescale/fec_main.c |  1 -
 .../ethernet/freescale/fs_enet/fs_enet-m

Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN

2019-12-24 Thread Leon Romanovsky
On Tue, Dec 24, 2019 at 06:03:50PM -0800, John Hubbard wrote:
> On 12/22/19 5:23 AM, Leon Romanovsky wrote:
> > On Fri, Dec 20, 2019 at 03:54:55PM -0800, John Hubbard wrote:
> > > On 12/20/19 10:29 AM, Leon Romanovsky wrote:
> > > ...
> > > > > $ ./build.sh
> > > > > $ build/bin/run_tests.py
> > > > >
> > > > > If you get things that far I think Leon can get a reproduction for you
> > > >
> > > > I'm not so optimistic about that.
> > > >
> > >
> > > OK, I'm going to proceed for now on the assumption that I've got an 
> > > overflow
> > > problem that happens when huge pages are pinned. If I can get more 
> > > information,
> > > great, otherwise it's probably enough.
> > >
> > > One thing: for your repro, if you know the huge page size, and the system
> > > page size for that case, that would really help. Also the number of pins 
> > > per
> > > page, more or less, that you'd expect. Because Jason says that only 2M 
> > > huge
> > > pages are used...
> > >
> > > Because the other possibility is that the refcount really is going 
> > > negative,
> > > likely due to a mismatched pin/unpin somehow.
> > >
> > > If there's not an obvious repro case available, but you do have one (is 
> > > it easy
> > > to repro, though?), then *if* you have the time, I could point you to a 
> > > github
> > > branch that reduces GUP_PIN_COUNTING_BIAS by, say, 4x, by applying this:
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index bb44c4d2ada7..8526fd03b978 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -1077,7 +1077,7 @@ static inline void put_page(struct page *page)
> > >* get_user_pages and page_mkclean and other calls that race to set up 
> > > page
> > >* table entries.
> > >*/
> > > -#define GUP_PIN_COUNTING_BIAS (1U << 10)
> > > +#define GUP_PIN_COUNTING_BIAS (1U << 8)
> > >
> > >   void unpin_user_page(struct page *page);
> > >   void unpin_user_pages_dirty_lock(struct page **pages, unsigned long 
> > > npages,
> > >
> > > If that fails to repro, then we would be zeroing in on the root cause.
> > >
> > > The branch is here (I just tested it and it seems healthy):
> > >
> > > g...@github.com:johnhubbard/linux.git  
> > > pin_user_pages_tracking_v11_with_diags
> >
> > Hi,
> >
> > We tested the following branch and here comes results:
>
> Thanks for this testing run!
>
> > [root@server consume_mtts]# (master) $ grep foll_pin /proc/vmstat
> > nr_foll_pin_requested 0
> > nr_foll_pin_returned 0
> >
>
> Zero pinned pages!

Maybe we are missing some CONFIG_* option?
https://lore.kernel.org/linux-rdma/12a28917-f8c9-5092-2f01-92bb74714...@nvidia.com/T/#mf900896f5dfc86cdee9246219990c632ed77115f

>
> ...now I'm confused. Somehow FOLL_PIN and pin_user_pages*() calls are
> not happening. And although the backtraces below show some of my new
> routines (like try_grab_page), they also confirm the above: there is no
> pin_user_page*() call in the stack.
>
> In particular, it looks like ib_umem_get() is calling through to
> get_user_pages*(), rather than pin_user_pages*(). I don't see how this
> is possible, because the code on my screen shows ib_umem_get() calling
> pin_user_pages_fast().
>
> Any thoughts or ideas are welcome here.
>
> However, glossing over all of that and assuming that the new
> GUP_PIN_COUNTING_BIAS of 256 is applied, it's interesting that we still
> see any overflow. I'm less confident now that this is a true refcount
> overflow.

Earlier in this email thread, I posted a possible function call chain that
doesn't involve a refcount overflow, but for some reason the refcount
overflow was chosen as the direction to explore.
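
For context, the overflow hypothesis can be put into rough numbers. This is
only a minimal sketch, assuming a signed 32-bit page refcount and assuming
that every pinned subpage adds GUP_PIN_COUNTING_BIAS to the refcount of the
compound head; the helper name is hypothetical and is not part of the series.

```c
#include <assert.h>
#include <limits.h>

/*
 * Rough number of times a whole huge page can be pinned before a signed
 * 32-bit refcount on the compound head would overflow.
 * Assumption: every subpage pin adds `bias` to the head page's refcount.
 * References already held on the page are ignored.
 */
long long max_whole_page_pins(unsigned long long huge_size,
			      unsigned long long base_size,
			      unsigned int bias)
{
	unsigned long long subpages, per_pin;

	assert(base_size != 0 && bias != 0 && huge_size >= base_size);

	subpages = huge_size / base_size;
	per_pin = subpages * bias;	/* refcount added by one full pin */

	return (long long)(INT_MAX / per_pin);
}
```

Under these assumptions, 1GB huge pages with 4KB subpages and a bias of
1 << 10 give 7, which matches the "about 7 pins" estimate quoted elsewhere in
this thread. For 2M huge pages with 4KB subpages the same helper gives 4095
at a bias of 1 << 10 and 16383 at a bias of 1 << 8.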

>
> Also, any information that would get me closer to being able to attempt
> my own reproduction of the problem are *very* welcome. :)

It is an ancient verification test (~10 years old), and making it
understandable and standalone is not an easy task :).

>
> thanks,
> --
> John Hubbard
> NVIDIA
>
> > [root@server consume_mtts]# (master) $ dmesg
> > [  425.221459] [ cut here ]
> > [  425.225894] WARNING: CPU: 1 PID: 6738 at mm/gup.c:61 
> > try_grab_compound_head+0x90/0xa0
> > [  425.228021] Modules linked in: mlx5_ib mlx5_core mlxfw mlx4_ib mlx4_en 
> > ptp pps_core mlx4_core bonding ip6_gre ip6_tunnel tunnel6 i

Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN

2019-12-22 Thread Leon Romanovsky
On Fri, Dec 20, 2019 at 03:54:55PM -0800, John Hubbard wrote:
> On 12/20/19 10:29 AM, Leon Romanovsky wrote:
> ...
> >> $ ./build.sh
> >> $ build/bin/run_tests.py
> >>
> >> If you get things that far I think Leon can get a reproduction for you
> >
> > I'm not so optimistic about that.
> >
>
> OK, I'm going to proceed for now on the assumption that I've got an overflow
> problem that happens when huge pages are pinned. If I can get more 
> information,
> great, otherwise it's probably enough.
>
> One thing: for your repro, if you know the huge page size, and the system
> page size for that case, that would really help. Also the number of pins per
> page, more or less, that you'd expect. Because Jason says that only 2M huge
> pages are used...
>
> Because the other possibility is that the refcount really is going negative,
> likely due to a mismatched pin/unpin somehow.
>
> If there's not an obvious repro case available, but you do have one (is it 
> easy
> to repro, though?), then *if* you have the time, I could point you to a github
> branch that reduces GUP_PIN_COUNTING_BIAS by, say, 4x, by applying this:
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bb44c4d2ada7..8526fd03b978 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1077,7 +1077,7 @@ static inline void put_page(struct page *page)
>   * get_user_pages and page_mkclean and other calls that race to set up page
>   * table entries.
>   */
> -#define GUP_PIN_COUNTING_BIAS (1U << 10)
> +#define GUP_PIN_COUNTING_BIAS (1U << 8)
>
>  void unpin_user_page(struct page *page);
>  void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>
> If that fails to repro, then we would be zeroing in on the root cause.
>
> The branch is here (I just tested it and it seems healthy):
>
> g...@github.com:johnhubbard/linux.git  pin_user_pages_tracking_v11_with_diags

Hi,

We tested that branch and here are the results:
[root@server consume_mtts]# (master) $ grep foll_pin /proc/vmstat
nr_foll_pin_requested 0
nr_foll_pin_returned 0

[root@server consume_mtts]# (master) $ dmesg
[  425.221459] [ cut here ]
[  425.225894] WARNING: CPU: 1 PID: 6738 at mm/gup.c:61 
try_grab_compound_head+0x90/0xa0
[  425.228021] Modules linked in: mlx5_ib mlx5_core mlxfw mlx4_ib mlx4_en ptp 
pps_core mlx4_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre ip_tunnel 
rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm ib_uverbs ib_ipoib ib_umad ib_srp 
scsi_transport_srp rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm 
ib_cm ib_core [last unloaded: mlxfw]
[  425.235266] CPU: 1 PID: 6738 Comm: consume_mtts Tainted: G   O  
5.5.0-rc2+ #1
[  425.237480] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[  425.239738] RIP: 0010:try_grab_compound_head+0x90/0xa0
[  425.241170] Code: 06 48 8d 4f 34 f0 0f b1 57 34 74 cd 85 c0 74 cf 8d 14 06 
f0 0f b1 11 74 c0 eb f1 8d 14 06 f0 0f b1 11 74 b5 85 c0 75 f3 eb b5 <0f> 0b 31 
c0 c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41
[  425.245739] RSP: 0018:c96878a8 EFLAGS: 00010082
[  425.247124] RAX: 8001 RBX: 7f780488a000 RCX: 0bb0
[  425.248956] RDX: ea000e031087 RSI: 8a00 RDI: ea000dc58000
[  425.250761] RBP: ea000e031080 R08: c9687974 R09: 000fffe0
[  425.252661] R10:  R11: 88836256 R12: 008a
[  425.254487] R13: 8003716000e7 R14: 7f780488a000 R15: c9687974
[  425.256309] FS:  7f780d9d3740() GS:8883b1c8() 
knlGS:
[  425.258401] CS:  0010 DS:  ES:  CR0: 80050033
[  425.259949] CR2: 02334048 CR3: 00039c68c001 CR4: 001606a0
[  425.261884] Call Trace:
[  425.262735]  gup_pgd_range+0x517/0x5a0
[  425.263819]  internal_get_user_pages_fast+0x210/0x250
[  425.265193]  ib_umem_get+0x298/0x550 [ib_uverbs]
[  425.266476]  mr_umem_get+0xc9/0x260 [mlx5_ib]
[  425.267699]  mlx5_ib_reg_user_mr+0xcc/0x7e0 [mlx5_ib]
[  425.269134]  ? xas_load+0x8/0x80
[  425.270074]  ? xa_load+0x48/0x90
[  425.271038]  ? lookup_get_idr_uobject.part.10+0x12/0x70 [ib_uverbs]
[  425.272757]  ib_uverbs_reg_mr+0x127/0x280 [ib_uverbs]
[  425.274120]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc2/0xf0 
[ib_uverbs]
[  425.276058]  ib_uverbs_cmd_verbs.isra.6+0x5be/0xbe0 [ib_uverbs]
[  425.277657]  ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
[  425.279155]  ? __alloc_pages_nodemask+0x148/0x2b0
[  425.280445]  ib_uverbs_ioctl+0xc0/0x120 [ib_uverbs]
[  425.281755]  do_vfs_ioctl+0x9d/0x650
[  425.282766]  ksys_ioctl+0x70/0x80
[  425.283745]  __x64_sys_ioctl+0x16/0x20
[  425.284912]  do_syscall_64+0x42/0x130
[  425.285973]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  425

Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN

2019-12-21 Thread Leon Romanovsky
On Fri, Dec 20, 2019 at 03:54:55PM -0800, John Hubbard wrote:
> On 12/20/19 10:29 AM, Leon Romanovsky wrote:
> ...
> >> $ ./build.sh
> >> $ build/bin/run_tests.py
> >>
> >> If you get things that far I think Leon can get a reproduction for you
> >
> > I'm not so optimistic about that.
> >
>
> OK, I'm going to proceed for now on the assumption that I've got an overflow
> problem that happens when huge pages are pinned. If I can get more 
> information,
> great, otherwise it's probably enough.
>
> One thing: for your repro, if you know the huge page size, and the system
> page size for that case, that would really help. Also the number of pins per
> page, more or less, that you'd expect. Because Jason says that only 2M huge
> pages are used...
>
> Because the other possibility is that the refcount really is going negative,
> likely due to a mismatched pin/unpin somehow.
>
> If there's not an obvious repro case available, but you do have one (is it 
> easy
> to repro, though?), then *if* you have the time, I could point you to a github
> branch that reduces GUP_PIN_COUNTING_BIAS by, say, 4x, by applying this:

I'll see what I can do this Sunday.

>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bb44c4d2ada7..8526fd03b978 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1077,7 +1077,7 @@ static inline void put_page(struct page *page)
>   * get_user_pages and page_mkclean and other calls that race to set up page
>   * table entries.
>   */
> -#define GUP_PIN_COUNTING_BIAS (1U << 10)
> +#define GUP_PIN_COUNTING_BIAS (1U << 8)
>
>  void unpin_user_page(struct page *page);
>  void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>
> If that fails to repro, then we would be zeroing in on the root cause.
>
> The branch is here (I just tested it and it seems healthy):
>
> g...@github.com:johnhubbard/linux.git  pin_user_pages_tracking_v11_with_diags
>
>
>
> thanks,
> --
> John Hubbard
> NVIDIA


Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN

2019-12-20 Thread Leon Romanovsky
On Thu, Dec 19, 2019 at 02:58:43PM -0800, John Hubbard wrote:
> On 12/19/19 1:07 PM, Jason Gunthorpe wrote:
> ...
> > > 3. It would be nice if I could reproduce this. I have a two-node mlx5 
> > > Infiniband
> > > test setup, but I have done only the tiniest bit of user space IB coding, 
> > > so
> > > if you have any test programs that aren't too hard to deal with that could
> > > possibly hit this, or be tweaked to hit it, I'd be grateful. Keeping in 
> > > mind
> > > that I'm not an advanced IB programmer. At all. :)
> >
> > Clone this:
> >
> > https://github.com/linux-rdma/rdma-core.git
> >
> > Install all the required deps to build it (notably cython), see the 
> > README.md
> >
> > $ ./build.sh
> > $ build/bin/run_tests.py
> >
> > If you get things that far I think Leon can get a reproduction for you
> >
>
> Cool, it's up and running (1 failure, 3 skipped, out of 67 tests).
>
> This is a great test suite to have running, I'll add it to my scripts. Here's 
> the
> full output in case the failure or skip cases are a problem:
>
> $ sudo ./build/bin/run_tests.py --verbose
>
> test_create_ah (tests.test_addr.AHTest) ... ok
> test_create_ah_roce (tests.test_addr.AHTest) ... skipped "Can't run RoCE 
> tests on IB link layer"
> test_destroy_ah (tests.test_addr.AHTest) ... ok
> test_create_comp_channel (tests.test_cq.CCTest) ... ok
> test_destroy_comp_channel (tests.test_cq.CCTest) ... ok
> test_create_cq_ex (tests.test_cq.CQEXTest) ... ok
> test_create_cq_ex_bad_flow (tests.test_cq.CQEXTest) ... ok
> test_destroy_cq_ex (tests.test_cq.CQEXTest) ... ok
> test_create_cq (tests.test_cq.CQTest) ... ok
> test_create_cq_bad_flow (tests.test_cq.CQTest) ... ok
> test_destroy_cq (tests.test_cq.CQTest) ... ok
> test_rc_traffic_cq_ex (tests.test_cqex.CqExTestCase) ... ok
> test_ud_traffic_cq_ex (tests.test_cqex.CqExTestCase) ... ok
> test_xrc_traffic_cq_ex (tests.test_cqex.CqExTestCase) ... ok
> test_create_dm (tests.test_device.DMTest) ... ok
> test_create_dm_bad_flow (tests.test_device.DMTest) ... ok
> test_destroy_dm (tests.test_device.DMTest) ... ok
> test_destroy_dm_bad_flow (tests.test_device.DMTest) ... ok
> test_dm_read (tests.test_device.DMTest) ... ok
> test_dm_write (tests.test_device.DMTest) ... ok
> test_dm_write_bad_flow (tests.test_device.DMTest) ... ok
> test_dev_list (tests.test_device.DeviceTest) ... ok
> test_open_dev (tests.test_device.DeviceTest) ... ok
> test_query_device (tests.test_device.DeviceTest) ... ok
> test_query_device_ex (tests.test_device.DeviceTest) ... ok
> test_query_gid (tests.test_device.DeviceTest) ... ok
> test_query_port (tests.test_device.DeviceTest) ... FAIL
> test_query_port_bad_flow (tests.test_device.DeviceTest) ... ok
> test_create_dm_mr (tests.test_mr.DMMRTest) ... ok
> test_destroy_dm_mr (tests.test_mr.DMMRTest) ... ok
> test_buffer (tests.test_mr.MRTest) ... ok
> test_dereg_mr (tests.test_mr.MRTest) ... ok
> test_dereg_mr_twice (tests.test_mr.MRTest) ... ok
> test_lkey (tests.test_mr.MRTest) ... ok
> test_read (tests.test_mr.MRTest) ... ok
> test_reg_mr (tests.test_mr.MRTest) ... ok
> test_reg_mr_bad_flags (tests.test_mr.MRTest) ... ok
> test_reg_mr_bad_flow (tests.test_mr.MRTest) ... ok
> test_rkey (tests.test_mr.MRTest) ... ok
> test_write (tests.test_mr.MRTest) ... ok
> test_dereg_mw_type1 (tests.test_mr.MWTest) ... ok
> test_dereg_mw_type2 (tests.test_mr.MWTest) ... ok
> test_reg_mw_type1 (tests.test_mr.MWTest) ... ok
> test_reg_mw_type2 (tests.test_mr.MWTest) ... ok
> test_reg_mw_wrong_type (tests.test_mr.MWTest) ... ok
> test_odp_rc_traffic (tests.test_odp.OdpTestCase) ... ok
> test_odp_ud_traffic (tests.test_odp.OdpTestCase) ... skipped 'ODP is not 
> supported - ODP recv not supported'
> test_odp_xrc_traffic (tests.test_odp.OdpTestCase) ... ok
> test_default_allocators (tests.test_parent_domain.ParentDomainTestCase) ... ok
> test_mem_align_allocators (tests.test_parent_domain.ParentDomainTestCase) ... 
> ok
> test_without_allocators (tests.test_parent_domain.ParentDomainTestCase) ... ok
> test_alloc_pd (tests.test_pd.PDTest) ... ok
> test_create_pd_none_ctx (tests.test_pd.PDTest) ... ok
> test_dealloc_pd (tests.test_pd.PDTest) ... ok
> test_destroy_pd_twice (tests.test_pd.PDTest) ... ok
> test_multiple_pd_creation (tests.test_pd.PDTest) ... ok
> test_create_qp_ex_no_attr (tests.test_qp.QPTest) ... ok
> test_create_qp_ex_no_attr_connected (tests.test_qp.QPTest) ... ok
> test_create_qp_ex_with_attr (tests.test_qp.QPTest) ... ok
> test_create_qp_ex_with_attr_connected (tests.test_qp.QPTest) ... ok
> test_create_qp_no_attr (tests.test_qp.QPTest) ... ok
> test_create_qp_no_attr_connected (tests.test_qp.QPTest) ... ok
> test_create_qp_with_attr (tests.test_qp.QPTest) ... ok
> test_create_qp_with_attr_connected (tests.test_qp.QPTest) ... ok
> test_modify_qp (tests.test_qp.QPTest) ... ok
> test_query_qp (tests.test_qp.QPTest) ... ok
> test_rdmacm_sync_traffic (tests.test_rdmacm.CMTestCase) ... skipped 'No 
> devices with net interface'
>
> 

Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN

2019-12-20 Thread Leon Romanovsky
On Thu, Dec 19, 2019 at 05:07:43PM -0400, Jason Gunthorpe wrote:
> On Thu, Dec 19, 2019 at 12:30:31PM -0800, John Hubbard wrote:
> > On 12/19/19 5:26 AM, Leon Romanovsky wrote:
> > > On Mon, Dec 16, 2019 at 02:25:12PM -0800, John Hubbard wrote:
> > > > Hi,
> > > >
> > > > This implements an API naming change (put_user_page*() -->
> > > > unpin_user_page*()), and also implements tracking of FOLL_PIN pages. It
> > > > extends that tracking to a few select subsystems. More subsystems will
> > > > be added in follow up work.
> > >
> > > Hi John,
> > >
> > > > The patchset generates kernel panics in our IB testing. In our tests, we
> > > > allocated a single memory block and registered multiple MRs using that
> > > > single block.
> > >
> > > The possible bad flow is:
> > > >   ib_umem_get() ->
> > >pin_user_pages_fast(FOLL_WRITE) ->
> > > internal_get_user_pages_fast(FOLL_WRITE) ->
> > >  gup_pgd_range() ->
> > >   gup_huge_pd() ->
> > >gup_hugepte() ->
> > > try_grab_compound_head() ->
> >
> > Hi Leon,
> >
> > Thanks very much for the detailed report! So we're overflowing...
> >
> > At first look, this seems likely to be hitting a weak point in the
> > GUP_PIN_COUNTING_BIAS-based design, one that I believed could be deferred
> > (there's a writeup in Documentation/core-api/pin_user_page.rst, lines
> > 99-121). Basically it's pretty easy to overflow the page->_refcount
> > with huge pages if the pages have a *lot* of subpages.
> >
> > We can only do about 7 pins on 1GB huge pages that use 4KB subpages.
>
> Considering that establishing these pins is entirely under user
> control, we can't have a limit here.
>
> If the number of allowed pins are exhausted then the
> pin_user_pages_fast() must fail back to the user.
>
> > 3. It would be nice if I could reproduce this. I have a two-node mlx5 
> > Infiniband
> > test setup, but I have done only the tiniest bit of user space IB coding, so
> > if you have any test programs that aren't too hard to deal with that could
> > possibly hit this, or be tweaked to hit it, I'd be grateful. Keeping in mind
> > that I'm not an advanced IB programmer. At all. :)
>
> Clone this:
>
> https://github.com/linux-rdma/rdma-core.git
>
> Install all the required deps to build it (notably cython), see the README.md
>
> $ ./build.sh
> $ build/bin/run_tests.py
>
> If you get things that far I think Leon can get a reproduction for you

I'm not so optimistic about that.

Thanks

>
> Jason


Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN

2019-12-19 Thread Leon Romanovsky
On Mon, Dec 16, 2019 at 02:25:12PM -0800, John Hubbard wrote:
> Hi,
>
> This implements an API naming change (put_user_page*() -->
> unpin_user_page*()), and also implements tracking of FOLL_PIN pages. It
> extends that tracking to a few select subsystems. More subsystems will
> be added in follow up work.

Hi John,

The patchset generates kernel panics in our IB testing. In our tests, we
allocated a single memory block and registered multiple MRs using that
single block.

The possible bad flow is:
 ib_umem_get() ->
  pin_user_pages_fast(FOLL_WRITE) ->
   internal_get_user_pages_fast(FOLL_WRITE) ->
    gup_pgd_range() ->
     gup_huge_pd() ->
      gup_hugepte() ->
       try_grab_compound_head() ->

static __maybe_unused struct page *try_grab_compound_head(struct page *page,
                                                          int refs,
                                                          unsigned int flags)
{
        if (flags & FOLL_GET)
                return try_get_compound_head(page, refs);
        else if (flags & FOLL_PIN)
                return try_pin_compound_head(page, refs);

        WARN_ON_ONCE(1);
        return NULL;
}

# (master) $ dmesg
[10924.70] mlx5_core :00:08.0 eth2: Link up
[10924.725383] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[10960.902254] [ cut here ]
[10960.905614] WARNING: CPU: 3 PID: 8838 at mm/gup.c:61 
try_grab_compound_head+0x92/0xd0
[10960.907313] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss 
nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod ib_srpt 
target_core_mod ib_srp rpcrdma rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib iw_cm 
ib_cm mlx5_ib ib_uverbs ib_core kvm_intel mlx5_core rfkill mlxfw sunrpc 
virtio_net pci_hyperv_intf kvm irqbypass net_failover crc32_pclmul i2c_piix4 
ptp crc32c_intel failover pcspkr ghash_clmulni_intel i2c_core pps_core 
sch_fq_codel ip_tables ata_generic pata_acpi serio_raw ata_piix floppy [last 
unloaded: mlxkvl]
[10960.917806] CPU: 3 PID: 8838 Comm: consume_mtts Tainted: G   OE 
5.5.0-rc2-for-upstream-perf-2019-12-18_10-06-50-78 #1
[10960.920530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[10960.923024] RIP: 0010:try_grab_compound_head+0x92/0xd0
[10960.924329] Code: e4 8d 14 06 48 8d 4f 34 f0 0f b1 57 34 0f 94 c2 84 d2 75 
cb 85 c0 74 cd 8d 14 06 f0 0f b1 11 0f 94 c2 84 d2 75 b9 66 90 eb ea <0f> 0b 31 
ff eb b7 85 c0 66 0f 1f 44 00 00 74 ab 8d 14 06 f0 0f b1
[10960.928512] RSP: 0018:c9000129f880 EFLAGS: 00010082
[10960.929831] RAX: 8001 RBX: 7f6397446000 RCX: 000fffe0
[10960.931422] RDX: 0004 RSI: 00011800 RDI: ea000f5d8000
[10960.933005] RBP: c9000129f93c R08: c9000129f93c R09: 0020
[10960.934584] R10: 88840774b200 R11: 88800230 R12: 7f6397446000
[10960.936212] R13: 0046 R14: 8003d76000e7 R15: 0080
[10960.937793] FS:  7f63a0590740() GS:88842f98() 
knlGS:
[10960.939962] CS:  0010 DS:  ES:  CR0: 80050033
[10960.941367] CR2: 023e9008 CR3: 000406d0a002 CR4: 007606e0
[10960.942975] DR0:  DR1:  DR2: 
[10960.944654] DR3:  DR6: fffe0ff0 DR7: 0400
[10960.946394] PKRU: 5554
[10960.947310] Call Trace:
[10960.948193]  gup_pgd_range+0x61e/0x950
[10960.949585]  internal_get_user_pages_fast+0x98/0x1c0
[10960.951313]  ib_umem_get+0x2b3/0x5a0 [ib_uverbs]
[10960.952929]  mr_umem_get+0xd8/0x280 [mlx5_ib]
[10960.954150]  ? xas_store+0x49/0x550
[10960.955187]  mlx5_ib_reg_user_mr+0x149/0x7a0 [mlx5_ib]
[10960.956478]  ? xas_load+0x9/0x80
[10960.957474]  ? xa_load+0x54/0x90
[10960.958465]  ? lookup_get_idr_uobject.part.10+0x12/0x80 [ib_uverbs]
[10960.959926]  ib_uverbs_reg_mr+0x138/0x2a0 [ib_uverbs]
[10960.961192]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 
[ib_uverbs]
[10960.963208]  ib_uverbs_cmd_verbs.isra.8+0x997/0xb30 [ib_uverbs]
[10960.964603]  ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
[10960.965949]  ? mem_cgroup_commit_charge+0x6a/0x140
[10960.967177]  ? page_add_new_anon_rmap+0x58/0xc0
[10960.968360]  ib_uverbs_ioctl+0xbc/0x130 [ib_uverbs]
[10960.969595]  do_vfs_ioctl+0xa6/0x640
[10960.970631]  ? syscall_trace_enter+0x1f8/0x2e0
[10960.971829]  ksys_ioctl+0x60/0x90
[10960.972825]  __x64_sys_ioctl+0x16/0x20
[10960.973888]  do_syscall_64+0x48/0x130
[10960.974949]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10960.976219] RIP: 0033:0x7f639fe9b267
[10960.977260] Code: b3 66 90 48 8b 05 19 3c 2c 00 64 c7 00 26 00 00 00 48 c7 
c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d e9 3b 2c 00 f7 d8 64 89 01 48
[10960.981413] RSP: 002b:7fff5335ca08 EFLAGS: 0246 ORIG_RAX: 
0010
[10960.983472] RAX: 

Re: [PATCH v7 07/24] IB/umem: use get_user_pages_fast() to pin DMA pages

2019-11-24 Thread Leon Romanovsky
On Thu, Nov 21, 2019 at 10:36:43AM -0400, Jason Gunthorpe wrote:
> On Thu, Nov 21, 2019 at 12:07:46AM -0800, Christoph Hellwig wrote:
> > On Wed, Nov 20, 2019 at 11:13:37PM -0800, John Hubbard wrote:
> > > And get rid of the mmap_sem calls, as part of that. Note
> > > that get_user_pages_fast() will, if necessary, fall back to
> > > __gup_longterm_unlocked(), which takes the mmap_sem as needed.
> > >
> > > Reviewed-by: Jan Kara 
> > > Reviewed-by: Jason Gunthorpe 
> > > Reviewed-by: Ira Weiny 
> > > Signed-off-by: John Hubbard 
> >
> > Looks fine,
> >
> > Reviewed-by: Christoph Hellwig 
> >
> > Jason, can you queue this up for 5.5 to reduce this patch stack a bit?
>
> Yes, I said I'd do this in an earlier revision. Now that it is clear this
> won't go through Andrew's tree, applied to rdma for-next

Jason,

This patch broke RDMA completely.
The change from get_user_pages() to get_user_pages_fast() causes an endless
stream of splats due to the combination of the following code:

189 struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
190                             size_t size, int access)
...
263         if (!umem->writable)
264                 gup_flags |= FOLL_FORCE;
265

and

2398 int get_user_pages_fast(unsigned long start, int nr_pages,
2399                         unsigned int gup_flags, struct page **pages)
2400 {
2401         unsigned long addr, len, end;
2402         int nr = 0, ret = 0;
2403
2404         if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM)))
2405                 return -EINVAL;
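
The interaction of the two excerpts can be sketched in user space. This is
only an illustration, not code from the tree: the numeric flag values, the
FOLL_WRITE starting value, the FOLL_LONGTERM addition, and all function names
below are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>

/* Assumed flag values, for illustration only. */
#define FOLL_WRITE    0x01u
#define FOLL_FORCE    0x10u
#define FOLL_LONGTERM 0x10000u

/* Mirrors the mask check at the top of get_user_pages_fast(). */
int fast_gup_flags_check(unsigned int gup_flags)
{
	if (gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM))
		return -EINVAL;	/* the kernel also fires WARN_ON_ONCE here */
	return 0;
}

/*
 * Mirrors how ib_umem_get() builds its flags: FOLL_FORCE is added for a
 * read-only umem. Starting from FOLL_WRITE and adding FOLL_LONGTERM is an
 * assumption about the code elided from the excerpt.
 */
unsigned int umem_gup_flags(int writable)
{
	unsigned int gup_flags = FOLL_WRITE;

	if (!writable)
		gup_flags |= FOLL_FORCE;
	return gup_flags | FOLL_LONGTERM;
}

/* Self-check: only the read-only case trips the mask. */
int gup_flags_demo(void)
{
	assert(fast_gup_flags_check(umem_gup_flags(0)) == -EINVAL);
	assert(fast_gup_flags_check(umem_gup_flags(1)) == 0);
	return 0;
}
```

With these assumptions, every read-only registration carries FOLL_FORCE into
the fast path and is rejected by the mask, while a writable one passes.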


[   72.623266] [ cut here ]
[   72.624143] WARNING: CPU: 1 PID: 2557 at mm/gup.c:2404 
get_user_pages_fast+0x115/0x180
[   72.625426] Modules linked in: xt_MASQUERADE iptable_nat nf_nat
   br_netfilter overlay ib_srp scsi_transport_srp rpcrdma rdma_ucm
   ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad iw_cm ib_ipoib
   ib_cm mlx5_ib ib_uverbs ib_core mlx5_core mlxfw ptp pps_core
[   72.629024] CPU: 1 PID: 2557 Comm: python3 Not tainted 
5.4.0-rc5-J6120-Geeb831ec5b80 #1
[   72.630435] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[   72.631973] RIP: 0010:get_user_pages_fast+0x115/0x180
[   72.632873] Code: 8d 0c 10 85 c0 89 d0 0f 49 c1 eb 97 fa 4c 8d
   44 24 0c 4c 89 e9 44 89 e2 48 89 df e8 25 dd ff ff fb 8b 44 24 0c
   e9 75 ff ff ff <0f> 0b b8 ea ff ff ff e9 6d ff ff ff 65 4c 8b 2c
   25 00 5d 01 00 49
[   72.635967] RSP: 0018:c9edf930 EFLAGS: 00010202
[   72.636896] RAX:  RBX: 7f931c392000 RCX: 8883f4909000
[   72.638117] RDX: 00010011 RSI: 0001 RDI: 7f931c392000
[   72.639353] RBP: 7f931c392000 R08: 0020 R09: 8883f4909000
[   72.640602] R10:  R11: 8884068d6760 R12: 888409e45ba0
[   72.641858] R13: 8884068d6760 R14: 1600 R15: 8883f4909000
[   72.643115] FS:  7f9323754700() GS:88842fa8() 
knlGS:
[   72.644586] CS:  0010 DS:  ES:  CR0: 80050033
[   72.645628] CR2: 7f931c3ca000 CR3: 0003f83d6006 CR4: 007606a0
[   72.646875] DR0:  DR1:  DR2: 
[   72.648120] DR3:  DR6: fffe0ff0 DR7: 0400
[   72.649374] PKRU: 5554
[   72.649935] Call Trace:
[   72.650477]  ib_umem_get+0x298/0x550 [ib_uverbs]
[   72.651347]  mlx5_ib_db_map_user+0xad/0x130 [mlx5_ib]
[   72.652279]  mlx5_ib_create_cq+0x1e8/0xaa0 [mlx5_ib]
[   72.653207]  create_cq+0x1c8/0x2d0 [ib_uverbs]
[   72.654050]  ib_uverbs_create_cq+0x70/0xa0 [ib_uverbs]
[   72.654988]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc2/0xf0 
[ib_uverbs]
[   72.656331]  ib_uverbs_cmd_verbs.isra.6+0x5be/0xbe0 [ib_uverbs]
[   72.657398]  ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
[   72.658423]  ? kvm_clock_get_cycles+0xd/0x10
[   72.659229]  ? kmem_cache_alloc+0x176/0x1c0
[   72.660025]  ? filemap_map_pages+0x18c/0x350
[   72.660838]  ib_uverbs_ioctl+0xc0/0x120 [ib_uverbs]
[   72.661756]  do_vfs_ioctl+0xa1/0x610
[   72.662458]  ksys_ioctl+0x70/0x80
[   72.663119]  __x64_sys_ioctl+0x16/0x20
[   72.663850]  do_syscall_64+0x42/0x110
[   72.664562]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   72.665492] RIP: 0033:0x7f9332958267
[   72.666185] Code: b3 66 90 48
8b 05 19 3c 2c 00 64 c7 00 26 00
00 00 48 c7 c0 ff ff ff ff c3 66
2e 0f 1f 84 00 00 00 00 00 b8 10
00 00 00 0f 05 <48> 3d 01 f0 ff
ff 73 01 c3 48 8b 0d e9 3b 2c 00
f7 d8 64 89 01 48
[   72.669347] RSP: 002b:7f9323751928 EFLAGS: 0246 ORIG_RAX: 
0010
[   72.670706] RAX: ffda RBX: 7f93237519b8 RCX: 7f9332958267
[   72.671937] RDX: 7f93237519a0 RSI: c0181b01 RDI: 0008
[   72.673176] RBP: 7f9323751980 R08: 0005 R09: 7f931c3ffef0
[   72.674415] R10: 1000 R11: 0246 R12: 7f931c400030
[   

Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-07 Thread Leon Romanovsky
On Mon, Jan 07, 2019 at 09:01:29PM -0700, Jason Gunthorpe wrote:
> On Sun, Jan 06, 2019 at 09:43:46AM +1100, Benjamin Herrenschmidt wrote:
> > On Sat, 2019-01-05 at 10:51 -0700, Jason Gunthorpe wrote:
> > >
> > > > Interesting.  I've investigated this further, though I don't have as
> > > > many new clues as I'd like.  The problem occurs reliably, at least on
> > > > one particular type of machine (a POWER8 "Garrison" with ConnectX-4).
> > > > I don't yet know if it occurs with other machines, I'm having trouble
> > > > getting access to other machines with a suitable card.  I didn't
> > > > manage to reproduce it on a different POWER8 machine with a
> > > > ConnectX-5, but I don't know if it's the difference in machine or
> > > > difference in card revision that's important.
> > >
> > > Make sure the card has the latest firmware is always good advice..
> > >
> > > > So possibilities that occur to me:
> > > >   * It's something specific about how the vfio-pci driver uses D3
> > > > state - have you tried rebinding your device to vfio-pci?
> > > >   * It's something specific about POWER, either the kernel or the PCI
> > > > bridge hardware
> > > >   * It's something specific about this particular type of machine
> > >
> > > Does the EEH indicate what happend to actually trigger it?
> >
> > In a very cryptic way that requires manual parsing using non-public
> > docs sadly but yes. From the look of it, it's a completion timeout.
> >
> > Looks to me like we don't get a response to a config space access
> > during the change of D state. I don't know if it's the write of the D3
> > state itself or the read back though (it's probably detected on the
> > read back or a subsequent read, but that doesn't tell me which specific
> > one failed).
>
> If it is just one card doing it (again, check you have latest
> firmware) I wonder if it is a sketchy PCI-E electrical link that is
> causing a long re-training cycle? Can you tell if the PCI-E link is
> permanently gone or does it eventually return?
>
> Does the card work in Gen 3 when it starts? Is there any indication of
> PCI-E link errors?
>
> Everytime or sometimes?
>
> POWER 8 firmware is good? If the link does eventually come back, is
> the POWER8's D3 resumption timeout long enough?
>
> If this doesn't lead to an obvious conclusion you'll probably need to
> connect to IBM's Mellanox support team to get more information from
> the card side.

+1, I tried to find any Mellanox-internal bugs related to your issue
and didn't find anything concrete.

Thanks

>
> Jason




Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2018-12-05 Thread Leon Romanovsky
On Thu, Dec 06, 2018 at 03:19:51PM +1100, David Gibson wrote:
> Mellanox ConnectX-5 IB cards (MT27800) seem to cause a call trace when
> unbound from their regular driver and attached to vfio-pci in order to pass
> them through to a guest.
>
> This goes away if the disable_idle_d3 option is used, so it looks like a
> problem with the hardware handling D3 state.  To fix that more permanently,
> use a device quirk to disable D3 state for these devices.
>
> We do this by renaming the existing quirk_no_ata_d3() more generally and
> attaching it to the ConnectX-[45] devices (0x15b3:0x1013).
>
> Signed-off-by: David Gibson 
> ---
>  drivers/pci/quirks.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
>

Hi David,

Thanks for your patch.

I would like to reproduce the call trace before moving forward,
but I am having trouble reproducing the original issue.

I work with vfio-pci and CX-4/5 cards on a daily basis. I just tried
to enter the D3 state manually, and it worked for me.

Can you please post your full calltrace, and "lspci -s PCI_ID -vv"
output?

Thanks




Re: Regression from patch 'tty: hvc: hvc_poll() break hv read loop'

2018-09-04 Thread Leon Romanovsky
On Wed, Sep 05, 2018 at 01:51:56PM +1000, Nicholas Piggin wrote:
> On Tue, 4 Sep 2018 15:16:35 -0600
> Jason Gunthorpe  wrote:
>
> > On Wed, Sep 05, 2018 at 07:15:29AM +1000, Nicholas Piggin wrote:
> > > On Tue, 4 Sep 2018 11:48:08 -0600
> > > Jason Gunthorpe  wrote:
> > >
> > > > Hi Nicholas,
> > > >
> > > > I am testing 4.19-rc2 and I see bad behavior with my qemu hvc0
> > > > console..
> > > >
> > > > Running interactive with qemu (qemu-2.11.2-1.fc28) on the console
> > > > providing hvc0, using options like:
> > > >
> > > > -nographic
> > > > -chardev stdio,id=stdio,mux=on,signal=off
> > > > -mon chardev=stdio
> > > > -device isa-serial,chardev=stdio
> > > > -device virtio-serial-pci
> > > > -device virtconsole,chardev=stdio
> > > >
> > > > I see the hvc0 console hang regularly, ie doing something like 'up
> > > > arrow' in bash causes the hvc0 console to hang. Prior kernels worked
> > > > OK.
> > > >
> > > > Any ideas? I'm not familiar with this code.. Thanks!
> > >
> > > Yes I have had another report, I'm working on a fix. Sorry it has taken
> > > a while and thank you for the report.
> >
> > Okay, let me know when you have a fix and I will be able to test it
> > for you!
>
> Can you try this?

It worked for me, so it will work for Jason too (we have the same setup).

Tested-by: Leon Romanovsky 

Thanks