Re: [PATCH RFC 1/1] powerpc/eeh: Provide a unique ID for each EEH recovery

2020-06-25 Thread Oliver O'Halloran
On Wed, Jun 24, 2020 at 3:20 PM Sam Bobroff  wrote:
>
> Give a unique ID to each recovery event, to ease log parsing and
> prepare for parallel recovery.
>
> Also add some new messages with a very simple format that may be
> useful to log-parsers.
>
> Signed-off-by: Sam Bobroff 
> ---
> This patch should be applied on top of my recent(ish) set:
> "powerpc/eeh: Synchronization for safety".

If you're going to do a respin I'd post these as a single series and
rebase it on mainline. There's a bit of drift.

> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 68e6dfa526a5..54f921ff7621 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -197,7 +197,8 @@ EXPORT_SYMBOL_GPL(eeh_recovery_must_be_locked);
>   * for the indicated PCI device, and puts them into a buffer
>   * for RTAS error logging.
>   */
> -static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
> +static size_t eeh_dump_dev_log(unsigned int event_id, struct eeh_dev *edev,
> +  char *buf, size_t len)

If we're going to pass around some event context then IMO we should
pass around the eeh_event itself rather than just an ID number. That
would give us somewhere to put any extra per-event context (such as
the saved stacktrace) rather than dumping it into eeh_pe.

We'd probably have to fix the "special" events so they're signalled by
some means other than a NULL event pointer.

*snip*


> @@ -280,19 +283,26 @@ static void eeh_pe_report_pdev(struct pci_dev *pdev, 
> eeh_report_fn fn,
> driver = eeh_pcid_get(pdev);
>
> if (!driver)
> -   pci_info(pdev, "no driver");
> +   pci_info(pdev, "EEH(%u): no driver", event_id);
> else if (!driver->err_handler)
> -   pci_info(pdev, "driver not EEH aware");
> +   pci_info(pdev, "EEH(%u): driver not EEH aware", 
> event_id);
> else if (late)
> -   pci_info(pdev, "driver bound too late");
> +   pci_info(pdev, "EEH(%u): driver bound too late", 
> event_id);
> else {
> -   new_result = fn(pdev, driver);

> +   pr_warn("EEH(%u): EVENT=HANDLER_CALL 
> DEVICE=%04x:%02x:%02x.%x DRIVER='%s' HANDLER='%s'\n",

WHY ARE WE YELLING




> @@ -579,7 +598,8 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
>   * lock is dropped (which it must be in order to take the PCI rescan/remove
>   * lock without risking a deadlock).
>   */
> -static void eeh_rmv_device(struct pci_dev *pdev, void *userdata)
> +static void eeh_rmv_device(unsigned int event_id,
> +  struct pci_dev *pdev, void *userdata)
>  {
> struct eeh_dev *edev;
> struct pci_driver *driver;
> @@ -588,8 +608,8 @@ static void eeh_rmv_device(struct pci_dev *pdev, void 
> *userdata)
>
> edev = pci_dev_to_eeh_dev(pdev);
> if (!edev) {
> -   pci_warn(pdev, "EEH: Device removed during processing 
> (#%d)\n",
> -__LINE__);
> +   pci_warn(pdev, "EEH(%u): Device removed during processing 
> (#%d)\n",
> +event_id, __LINE__);

It's already there, but what's with the __LINE__ ?

> diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
> index a7a8dc182efb..bd38d6fe5449 100644
> --- a/arch/powerpc/kernel/eeh_event.c
> +++ b/arch/powerpc/kernel/eeh_event.c
> @@ -26,6 +26,9 @@ static DEFINE_SPINLOCK(eeh_eventlist_lock);
>  static DECLARE_COMPLETION(eeh_eventlist_event);
>  static LIST_HEAD(eeh_eventlist);
>

> +/* Event ID 0 is reserved for special events */
> +static atomic_t eeh_event_id = ATOMIC_INIT(1);
> +

I don't think using zero for all special events is a good idea.
Special events are just events that are detected by the EEH
notification interrupt. Unlike the MMIO / config space detection
mechanism we don't have any device or PE context available in the
interrupt handler so the work of figuring out where the error came
from is punted to the recovery thread.

IMO this function probably shouldn't be calling
eeh_handle_normal_event() at all. Instead it should queue a new
eeh_event (with a unique ID) for each error it finds. That way
handling a "special" event just consists of scanning for which PHB /
PE is currently broken and the actual recovery path is identical. If
we switched to using a threaded IRQ handler (which can block) for the
EEH notification interrupts we could probably kill off special events
entirely.

> @@ -1338,7 +1367,7 @@ void eeh_handle_special_event(void)
> if (rc == EEH_NEXT_ERR_FROZEN_PE ||
> rc == EEH_NEXT_ERR_FENCED_PHB) {
> eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
> -   eeh_handle_normal_event(pe);
> +   eeh_handle_normal_event(0, pe);

I think that needs to be a 

[powerpc:next] BUILD SUCCESS 105fb38124a490f38e9c1e23bb4c4a0b6ba12fdb

2020-06-25 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
next
branch HEAD: 105fb38124a490f38e9c1e23bb4c4a0b6ba12fdb  powerpc/8xx: Modify 
ptep_get()

elapsed time: 942m

configs tested: 120
configs skipped: 5

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
arc haps_hs_smp_defconfig
powerpc  g5_defconfig
mipsjmr3927_defconfig
s390 allyesconfig
sh   se7751_defconfig
arm   imx_v6_v7_defconfig
armxcep_defconfig
arm  pxa255-idp_defconfig
arm  tango4_defconfig
arm   mainstone_defconfig
arm  moxart_defconfig
m68kq40_defconfig
sh  sdk7786_defconfig
s390  allnoconfig
armmps2_defconfig
arm pxa_defconfig
arm lpc18xx_defconfig
mips   ip27_defconfig
arm eseries_pxa_defconfig
mips  loongson3_defconfig
i386 alldefconfig
nds32 allnoconfig
sh   se7724_defconfig
mips loongson1b_defconfig
pariscallnoconfig
armlart_defconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
csky allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
xtensa  defconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips  allnoconfig
mips allmodconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
powerpc defconfig
i386 randconfig-a002-20200624
i386 randconfig-a006-20200624
i386 randconfig-a003-20200624
i386 randconfig-a001-20200624
i386 randconfig-a005-20200624
i386 randconfig-a004-20200624
x86_64   randconfig-a004-20200624
x86_64   randconfig-a002-20200624
x86_64   randconfig-a003-20200624
x86_64   randconfig-a005-20200624
x86_64   randconfig-a001-20200624
x86_64   randconfig-a006-20200624
i386 randconfig-a013-20200624
i386 randconfig-a016-20200624
i386 randconfig-a012-20200624
i386 randconfig-a014-20200624
i386 randconfig-a011-20200624
i386 randconfig-a015-20200624
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscvallmodconfig
s390 allmodconfig
s390   

[powerpc:fixes-test] BUILD SUCCESS c1ed1754f271f6b7acb1bfdc8cfb62220fbed423

2020-06-25 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
fixes-test
branch HEAD: c1ed1754f271f6b7acb1bfdc8cfb62220fbed423  powerpc/kvm/book3s64: 
Fix kernel crash with nested kvm & DEBUG_VIRTUAL

elapsed time: 945m

configs tested: 114
configs skipped: 108

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
arc haps_hs_smp_defconfig
s390 allyesconfig
powerpc  g5_defconfig
mipsjmr3927_defconfig
sh   se7751_defconfig
arm   imx_v6_v7_defconfig
armxcep_defconfig
arm  pxa255-idp_defconfig
arm  tango4_defconfig
arm pxa_defconfig
arm lpc18xx_defconfig
mips   ip27_defconfig
arm eseries_pxa_defconfig
mips  loongson3_defconfig
i386 alldefconfig
nds32 allnoconfig
sh   se7724_defconfig
mips loongson1b_defconfig
pariscallnoconfig
armlart_defconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
csky allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
xtensa  defconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips  allnoconfig
mips allmodconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
powerpc defconfig
i386 randconfig-a002-20200624
i386 randconfig-a006-20200624
i386 randconfig-a003-20200624
i386 randconfig-a001-20200624
i386 randconfig-a005-20200624
i386 randconfig-a004-20200624
i386 randconfig-a013-20200624
i386 randconfig-a016-20200624
i386 randconfig-a012-20200624
i386 randconfig-a014-20200624
i386 randconfig-a011-20200624
i386 randconfig-a015-20200624
x86_64   randconfig-a004-20200624
x86_64   randconfig-a002-20200624
x86_64   randconfig-a003-20200624
x86_64   randconfig-a005-20200624
x86_64   randconfig-a001-20200624
x86_64   randconfig-a006-20200624
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscvallmodconfig
s390  allnoconfig
s390 allmodconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
sparc64 defconfig
sparc64  

[powerpc:merge] BUILD SUCCESS 90a3c369b73246c37616c947e83b9939e41974c8

2020-06-25 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
merge
branch HEAD: 90a3c369b73246c37616c947e83b9939e41974c8  Automatic merge of 
'master', 'next' and 'fixes' (2020-06-25 23:12)

elapsed time: 908m

configs tested: 124
configs skipped: 5

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
arc haps_hs_smp_defconfig
powerpc  g5_defconfig
mipsjmr3927_defconfig
s390 allyesconfig
sh   se7751_defconfig
arm   imx_v6_v7_defconfig
armxcep_defconfig
arm  pxa255-idp_defconfig
arm  tango4_defconfig
arm   mainstone_defconfig
arm  moxart_defconfig
m68kq40_defconfig
sh  sdk7786_defconfig
s390  allnoconfig
armmps2_defconfig
arm pxa_defconfig
arm lpc18xx_defconfig
mips   ip27_defconfig
arm eseries_pxa_defconfig
mips  loongson3_defconfig
i386 alldefconfig
nds32 allnoconfig
sh   se7724_defconfig
mips loongson1b_defconfig
powerpc  mpc885_ads_defconfig
shsh7763rdp_defconfig
c6xevmc6472_defconfig
riscv  rv32_defconfig
pariscallnoconfig
armlart_defconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
csky allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
xtensa  defconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips  allnoconfig
mips allmodconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
powerpc defconfig
i386 randconfig-a002-20200624
i386 randconfig-a006-20200624
i386 randconfig-a003-20200624
i386 randconfig-a004-20200624
i386 randconfig-a001-20200624
i386 randconfig-a005-20200624
i386 randconfig-a013-20200624
i386 randconfig-a014-20200624
i386 randconfig-a016-20200624
i386 randconfig-a012-20200624
i386 randconfig-a011-20200624
i386 randconfig-a015-20200624
x86_64   randconfig-a004-20200624
x86_64   randconfig-a002-20200624
x86_64   randconfig-a003-20200624
x86_64   randconfig-a005-20200624
x86_64   randconfig-a001-20200624
x86_64   randconfig-a006-20200624
riscv   

Re: [PATCH] powerpc/64: indirect function call use bctrl rather than blrl in ret_from_kernel_thread

2020-06-25 Thread Michael Ellerman
On Thu, 11 Jun 2020 22:11:19 +1000, Nicholas Piggin wrote:
> blrl is not recommended to use as an indirect function call, as it may
> corrupt the link stack predictor.
> 
> This is not a performance critical path but this should be fixed for
> consistency.

Applied to powerpc/next.

[1/1] powerpc/64: indirect function call use bctrl rather than blrl in 
ret_from_kernel_thread
  https://git.kernel.org/powerpc/c/89bbe4c798bc3a43c882179adb5222c1a972ac70

cheers


Re: [PATCH] powerpc/mm: Fix typo in IS_ENABLED()

2020-06-25 Thread Michael Ellerman
On Fri, 5 Jun 2020 07:18:06 -0700, Kees Cook wrote:
> IS_ENABLED() matches names exactly, so the missing "CONFIG_" prefix
> means this code would never be built.
> 
> Also fixes a missing newline in pr_warn().

Applied to powerpc/next.

[1/1] powerpc/mm: Fix typo in IS_ENABLED()
  https://git.kernel.org/powerpc/c/55bd9ac468397c4f12a33b7ec714b5d0362c3aa2

cheers


Re: [PATCH V2] powerpc/pseries/svm: Drop unused align argument in alloc_shared_lppaca() function

2020-06-25 Thread Michael Ellerman
On Fri, 12 Jun 2020 19:59:53 +0530, Satheesh Rajendran wrote:
> Argument "align" in alloc_shared_lppaca() was unused inside the
> function. Let's drop it and update code comment for page alignment.

Applied to powerpc/next.

[1/1] powerpc/pseries/svm: Drop unused align argument in alloc_shared_lppaca() 
function
  https://git.kernel.org/powerpc/c/178748b6d14946f080d49bc7dcc47b7cc4437e4d

cheers


Re: [PATCH 0/3] powerpc/dt_cpu_ftrs: Make use of ISA_V3_* macros

2020-06-25 Thread Michael Ellerman
On Wed, 10 Jun 2020 18:51:11 -0300, Murilo Opsfelder Araujo wrote:
> The first patch removes unused macro ISA_V2_07B.  The second and third
> patches make use of macros ISA_V3_0B and ISA_V3_1, respectively,
> instead their corresponding literals.
> 
> Murilo Opsfelder Araujo (3):
>   powerpc/dt_cpu_ftrs: Remove unused macro ISA_V2_07B
>   powerpc/dt_cpu_ftrs: Make use of macro ISA_V3_0B
>   powerpc/dt_cpu_ftrs: Make use of macro ISA_V3_1
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc/dt_cpu_ftrs: Remove unused macro ISA_V2_07B
  https://git.kernel.org/powerpc/c/f39eb5d8ac707fd59029a06c3f985f29b1aaa26b
[2/3] powerpc/dt_cpu_ftrs: Make use of macro ISA_V3_0B
  https://git.kernel.org/powerpc/c/e781f12a60a7bddb50909d42478cca8724c8b113
[3/3] powerpc/dt_cpu_ftrs: Make use of macro ISA_V3_1
  https://git.kernel.org/powerpc/c/7714394706c0309b3f3fc474b390463d60eb6cb1

cheers


Re: [PATCH] arch: powerpc: ppc4xx compile flag optimizations

2020-06-25 Thread Michael Ellerman
On Thu, 22 Dec 2016 09:06:08 +0100, John Crispin wrote:
> This patch splits up the compile flags between ppc40x and ppc44x.

Applied to powerpc/next.

[1/1] powerpc/4xx: ppc4xx compile flag optimizations
  https://git.kernel.org/powerpc/c/548ad77d10f7ad6e5f84a0026978da2ed1df0dae

cheers


Re: [PATCH] powerpc/ptdump: Fix build failure in hashpagetable.c

2020-06-25 Thread Michael Ellerman
On Mon, 15 Jun 2020 13:18:39 + (UTC), Christophe Leroy wrote:
> H_SUCCESS is only defined when CONFIG_PPC_PSERIES is defined.
> 
> != H_SUCCESS means != 0. Modify the test accordingly.

Applied to powerpc/next.

[1/1] powerpc/ptdump: Fix build failure in hashpagetable.c
  https://git.kernel.org/powerpc/c/7c466b0807960edc13e4b855be85ea765df9a6cd

cheers


Re: [PATCH 1/2] selftests/powerpc: Allow choice of CI memory location in alignment_handler test

2020-06-25 Thread Michael Ellerman
On Wed, 20 May 2020 12:11:02 +1000, Jordan Niethe wrote:
> The alignment handler selftest needs cache-inhibited memory and
> currently /dev/fb0 is relied on to provided this. This prevents running
> the test on systems without /dev/fb0 (e.g., mambo). Read the commandline
> arguments for an optional path to be used instead, as well as an
> optional offset to be for mmaping this path.

Applied to powerpc/next.

[1/2] selftests/powerpc: Allow choice of CI memory location in 
alignment_handler test
  https://git.kernel.org/powerpc/c/01bd294642841998c76d9e6929597dcb7da9466a
[2/2] selftests/powerpc: Add prefixed loads/stores to alignment_handler test
  https://git.kernel.org/powerpc/c/620a6473df36f8dc6f70bc85ff3465b2e21d1254

cheers


Re: [PATCH] powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k

2020-06-25 Thread Michael Ellerman
On Mon, 15 Jun 2020 07:48:25 + (UTC), Christophe Leroy wrote:
> FIX_EARLY_DEBUG_BASE reserves a 128k area for debuging.
> 
> When page size is 256k, the calculation results in a 0 number of
> pages, leading to the following failure:
> 
>   CC  arch/powerpc/kernel/asm-offsets.s
> In file included from ./arch/powerpc/include/asm/nohash/32/pgtable.h:77:0,
>  from ./arch/powerpc/include/asm/nohash/pgtable.h:8,
>  from ./arch/powerpc/include/asm/pgtable.h:20,
>  from ./include/linux/pgtable.h:6,
>  from ./arch/powerpc/include/asm/kup.h:42,
>  from ./arch/powerpc/include/asm/uaccess.h:9,
>  from ./include/linux/uaccess.h:11,
>  from ./include/linux/crypto.h:21,
>  from ./include/crypto/hash.h:11,
>  from ./include/linux/uio.h:10,
>  from ./include/linux/socket.h:8,
>  from ./include/linux/compat.h:15,
>  from arch/powerpc/kernel/asm-offsets.c:14:
> ./arch/powerpc/include/asm/fixmap.h:75:2: error: overflow in enumeration 
> values
>   __end_of_permanent_fixed_addresses,
>   ^
> make[2]: *** [arch/powerpc/kernel/asm-offsets.s] Error 1
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k
  https://git.kernel.org/powerpc/c/03fd42d458fb9cb69e712600bd69ff77ff3a45a8

cheers


Re: [PATCH] powerpc/8xx: Modify ptep_get()

2020-06-25 Thread Michael Ellerman
On Thu, 18 Jun 2020 12:07:46 + (UTC), Christophe Leroy wrote:
> Move ptep_get() close to pte_update(), in an ifdef section already
> dedicated to powerpc 8xx. This section contains explanation about
> the layout of page table entries.
> 
> Also modify it to return 4 times the pte value instead of padding
> with zeroes.

Applied to powerpc/next.

[1/1] powerpc/8xx: Modify ptep_get()
  https://git.kernel.org/powerpc/c/105fb38124a490f38e9c1e23bb4c4a0b6ba12fdb

cheers


Re: [PATCH v3 0/2] powerpc: CMDLINE config cleanup

2020-06-25 Thread Michael Ellerman
On Fri, 12 Jun 2020 10:42:18 +1200, Chris Packham wrote:
> This series cleans up the config options related to the boot command line.
> 
> Chris Packham (2):
>   powerpc: Remove inaccessible CMDLINE default
>   powerpc: configs: remove CMDLINE_BOOL
> 
>  arch/powerpc/Kconfig   | 6 +-
>  arch/powerpc/configs/44x/akebono_defconfig | 2 --
>  arch/powerpc/configs/44x/arches_defconfig  | 2 --
>  arch/powerpc/configs/44x/bamboo_defconfig  | 2 --
>  arch/powerpc/configs/44x/bluestone_defconfig   | 2 --
>  arch/powerpc/configs/44x/canyonlands_defconfig | 2 --
>  arch/powerpc/configs/44x/currituck_defconfig   | 2 --
>  arch/powerpc/configs/44x/eiger_defconfig   | 2 --
>  arch/powerpc/configs/44x/fsp2_defconfig| 1 -
>  arch/powerpc/configs/44x/icon_defconfig| 2 --
>  arch/powerpc/configs/44x/iss476-smp_defconfig  | 1 -
>  arch/powerpc/configs/44x/katmai_defconfig  | 2 --
>  arch/powerpc/configs/44x/rainier_defconfig | 2 --
>  arch/powerpc/configs/44x/redwood_defconfig | 2 --
>  arch/powerpc/configs/44x/sam440ep_defconfig| 2 --
>  arch/powerpc/configs/44x/sequoia_defconfig | 2 --
>  arch/powerpc/configs/44x/taishan_defconfig | 2 --
>  arch/powerpc/configs/44x/warp_defconfig| 1 -
>  arch/powerpc/configs/holly_defconfig   | 1 -
>  arch/powerpc/configs/mvme5100_defconfig| 3 +--
>  arch/powerpc/configs/ps3_defconfig | 2 --
>  arch/powerpc/configs/skiroot_defconfig | 1 -
>  arch/powerpc/configs/storcenter_defconfig  | 1 -
>  23 files changed, 2 insertions(+), 43 deletions(-)

Applied to powerpc/next.

[1/2] powerpc: Remove inaccessible CMDLINE default
  https://git.kernel.org/powerpc/c/f134a7cef1d7de45fab028a945b7f34c615718e1
[2/2] powerpc/configs: Remove CMDLINE_BOOL
  https://git.kernel.org/powerpc/c/0488d32530ecff973c40f2592a6eab2907d0a5cc

cheers


Re: [PATCH] powerpc/mm/book3s64: Skip 16G page reservation with radix

2020-06-25 Thread Michael Ellerman
On Mon, 22 Jun 2020 12:10:19 +0530, Aneesh Kumar K.V wrote:
> With hash translation, the hypervisor can hint the LPAR about 16GB contiguous 
> range
> via ibm,expected#pages. The kernel marks the range specified in the device 
> tree
> as reserved. Avoid doing this when using radix translation. Radix translation
> only supports 1G gigantic hugepage and kernel can do the 1G gigantic hugepage
> allocation via early memblock reservation. This can be done because with radix
> translation pages are not required to be contiguous on the host.

Applied to powerpc/next.

[1/1] powerpc/mm/book3s64: Skip 16G page reservation with radix
  https://git.kernel.org/powerpc/c/86590e524ee834b629afc55d8e5786091fbf84cc

cheers


Re: [PATCH kernel v2] powerpc/powernv/ioda: Return correct error if TCE level allocation failed

2020-06-25 Thread Michael Ellerman
On Wed, 17 Jun 2020 10:38:35 +1000, Alexey Kardashevskiy wrote:
> The iommu_table_ops::xchg_no_kill() callback updates TCE. It is quite
> possible that not entire table is allocated if it is huge and multilevel
> so xchg may also allocate subtables. If failed, it returns H_HARDWARE
> for failed allocation and H_TOO_HARD if it needs it but cannot do because
> the alloc parameter is "false" (set when called with MMU=off to force
> retry with MMU=on).
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/powernv/ioda: Return correct error if TCE level allocation failed
  https://git.kernel.org/powerpc/c/5f202c1a1d429bee3ddab647711f181c72d157da

cheers


Re: [PATCH kernel] powerpc/xive: Ignore kmemleak false positives

2020-06-25 Thread Michael Ellerman
On Fri, 12 Jun 2020 14:33:03 +1000, Alexey Kardashevskiy wrote:
> xive_native_provision_pages() allocates memory and passes the pointer to
> OPAL so kmemleak cannot find the pointer usage in the kernel memory and
> produces a false positive report (below) (even if the kernel did scan
> OPAL memory, it is unable to deal with __pa() addresses anyway).
> 
> This silences the warning.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/xive: Ignore kmemleak false positives
  https://git.kernel.org/powerpc/c/f0993c839e95dd6c7f054a1015e693c87e33e4fb

cheers


Re: [PATCH] powerpc/fsl_booke/32: fix build with CONFIG_RANDOMIZE_BASE

2020-06-25 Thread Michael Ellerman
On Sat, 13 Jun 2020 23:28:01 +0700, Arseny Solokha wrote:
> Building the current 5.8 kernel for a e500 machine with
> CONFIG_RANDOMIZE_BASE set yields the following failure:
> 
>   arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init':
>   arch/powerpc/mm/nohash/kaslr_booke.c:387:2: error: implicit declaration
> of function 'flush_icache_range'; did you mean 'flush_tlb_range'?
> [-Werror=implicit-function-declaration]
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/fsl_booke/32: Fix build with CONFIG_RANDOMIZE_BASE
  https://git.kernel.org/powerpc/c/7e4773f73dcfb92e7e33532162f722ec291e75a4

cheers


Re: [PATCH] powerpc/kvm/book3s64/nested: Fix kernel crash with nested kvm

2020-06-25 Thread Michael Ellerman
On Thu, 11 Jun 2020 17:31:59 +0530, Aneesh Kumar K.V wrote:
> __pa() do check for addr value passed and if < PAGE_OFFSET
> results in BUG.
> 
>  #define __pa(x)  
> \
> ({\
>   VIRTUAL_BUG_ON((unsigned long)(x) < PAGE_OFFSET);   \
>   (unsigned long)(x) & 0x0fffUL;  \
> })
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/kvm/book3s64: Fix kernel crash with nested kvm & DEBUG_VIRTUAL
  https://git.kernel.org/powerpc/c/c1ed1754f271f6b7acb1bfdc8cfb62220fbed423

cheers


[PATCH V3 2/3] cpufreq: Register governors at core_initcall

2020-06-25 Thread Viresh Kumar
From: Quentin Perret 

Currently, most CPUFreq governors are registered at core_initcall time
when used as default, and module_init otherwise. In preparation for
letting users specify the default governor on the kernel command line,
change all of them to use core_initcall unconditionally, as is already
the case for schedutil and performance. This will enable us to assume
builtin governors have been registered before the builtin CPUFreq
drivers probe.

And since all governors now have similar init/exit patterns, introduce
two new macros cpufreq_governor_{init,exit}() to factorize the code.

Acked-by: Viresh Kumar 
Signed-off-by: Quentin Perret 
Signed-off-by: Viresh Kumar 
---
 .../platforms/cell/cpufreq_spudemand.c| 26 ++-
 drivers/cpufreq/cpufreq_conservative.c| 22 
 drivers/cpufreq/cpufreq_ondemand.c| 24 +
 drivers/cpufreq/cpufreq_performance.c | 14 ++
 drivers/cpufreq/cpufreq_powersave.c   | 18 +++--
 drivers/cpufreq/cpufreq_userspace.c   | 18 +++--
 include/linux/cpufreq.h   | 14 ++
 kernel/sched/cpufreq_schedutil.c  |  6 +
 8 files changed, 36 insertions(+), 106 deletions(-)

diff --git a/arch/powerpc/platforms/cell/cpufreq_spudemand.c 
b/arch/powerpc/platforms/cell/cpufreq_spudemand.c
index 55b31eadb3c8..ca7849e113d7 100644
--- a/arch/powerpc/platforms/cell/cpufreq_spudemand.c
+++ b/arch/powerpc/platforms/cell/cpufreq_spudemand.c
@@ -126,30 +126,8 @@ static struct cpufreq_governor spu_governor = {
.stop = spu_gov_stop,
.owner = THIS_MODULE,
 };
-
-/*
- * module init and destoy
- */
-
-static int __init spu_gov_init(void)
-{
-   int ret;
-
-   ret = cpufreq_register_governor(_governor);
-   if (ret)
-   printk(KERN_ERR "registration of governor failed\n");
-   return ret;
-}
-
-static void __exit spu_gov_exit(void)
-{
-   cpufreq_unregister_governor(_governor);
-}
-
-
-module_init(spu_gov_init);
-module_exit(spu_gov_exit);
+cpufreq_governor_init(spu_governor);
+cpufreq_governor_exit(spu_governor);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Christian Krafft ");
-
diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index 737ff3b9c2c0..aa39ff31ec9f 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -322,17 +322,7 @@ static struct dbs_governor cs_governor = {
.start = cs_start,
 };
 
-#define CPU_FREQ_GOV_CONSERVATIVE  (_governor.gov)
-
-static int __init cpufreq_gov_dbs_init(void)
-{
-   return cpufreq_register_governor(CPU_FREQ_GOV_CONSERVATIVE);
-}
-
-static void __exit cpufreq_gov_dbs_exit(void)
-{
-   cpufreq_unregister_governor(CPU_FREQ_GOV_CONSERVATIVE);
-}
+#define CPU_FREQ_GOV_CONSERVATIVE  (cs_governor.gov)
 
 MODULE_AUTHOR("Alexander Clouter ");
 MODULE_DESCRIPTION("'cpufreq_conservative' - A dynamic cpufreq governor for "
@@ -343,11 +333,9 @@ MODULE_LICENSE("GPL");
 #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
 struct cpufreq_governor *cpufreq_default_governor(void)
 {
-   return CPU_FREQ_GOV_CONSERVATIVE;
+   return _FREQ_GOV_CONSERVATIVE;
 }
-
-core_initcall(cpufreq_gov_dbs_init);
-#else
-module_init(cpufreq_gov_dbs_init);
 #endif
-module_exit(cpufreq_gov_dbs_exit);
+
+cpufreq_governor_init(CPU_FREQ_GOV_CONSERVATIVE);
+cpufreq_governor_exit(CPU_FREQ_GOV_CONSERVATIVE);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c 
b/drivers/cpufreq/cpufreq_ondemand.c
index 82a4d37ddecb..ac361a8b1d3b 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -408,7 +408,7 @@ static struct dbs_governor od_dbs_gov = {
.start = od_start,
 };
 
-#define CPU_FREQ_GOV_ONDEMAND  (_dbs_gov.gov)
+#define CPU_FREQ_GOV_ONDEMAND  (od_dbs_gov.gov)
 
 static void od_set_powersave_bias(unsigned int powersave_bias)
 {
@@ -429,7 +429,7 @@ static void od_set_powersave_bias(unsigned int 
powersave_bias)
continue;
 
policy = cpufreq_cpu_get_raw(cpu);
-   if (!policy || policy->governor != CPU_FREQ_GOV_ONDEMAND)
+   if (!policy || policy->governor != _FREQ_GOV_ONDEMAND)
continue;
 
policy_dbs = policy->governor_data;
@@ -461,16 +461,6 @@ void od_unregister_powersave_bias_handler(void)
 }
 EXPORT_SYMBOL_GPL(od_unregister_powersave_bias_handler);
 
-static int __init cpufreq_gov_dbs_init(void)
-{
-   return cpufreq_register_governor(CPU_FREQ_GOV_ONDEMAND);
-}
-
-static void __exit cpufreq_gov_dbs_exit(void)
-{
-   cpufreq_unregister_governor(CPU_FREQ_GOV_ONDEMAND);
-}
-
 MODULE_AUTHOR("Venkatesh Pallipadi ");
 MODULE_AUTHOR("Alexey Starikovskiy ");
 MODULE_DESCRIPTION("'cpufreq_ondemand' - A dynamic cpufreq governor for "
@@ -480,11 +470,9 @@ MODULE_LICENSE("GPL");
 #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND
 struct cpufreq_governor 

[PATCH v3 0/3] cpufreq: Allow default governor on cmdline and fix locking issues

2020-06-25 Thread Viresh Kumar
Hi,

I have picked Quentin's series over my patch, modified both and tested.

V2->V3:
- default_governor is a string now and we don't set it on governor
  registration or unregistration anymore.
- Fixed locking issues in cpufreq_init_policy().

--
Viresh

Original cover letter fro Quentin:

This series enables users of prebuilt kernels (e.g. distro kernels) to
specify their CPUfreq governor of choice using the kernel command line,
instead of having to wait for the system to fully boot to userspace to
switch using the sysfs interface. This is helpful for 2 reasons:
  1. users get to choose the governor that runs during the actual boot;
  2. it simplifies the userspace boot procedure a bit (one less thing to
 worry about).

To enable this, the first patch moves all governor init calls to
core_initcall, to make sure they are registered by the time the drivers
probe. This should be relatively low impact as registering a governor
is a simple procedure (it gets added to a llist), and all governors
already load at core_initcall anyway when they're set as the default
in Kconfig. This also allows to clean-up the governors' init/exit code,
and reduces boilerplate.

The second patch introduces the new command line parameter, inspired by
its cpuidle counterpart. More details can be found in the respective
patch headers.

Changes in v2:
 - added Viresh's ack to patch 01
 - moved the assignment of 'default_governor' in patch 02 to the governor
   registration path instead of the driver registration (Viresh)

Quentin Perret (2):
  cpufreq: Register governors at core_initcall
  cpufreq: Specify default governor on command line

Viresh Kumar (1):
  cpufreq: Fix locking issues with governors

 .../admin-guide/kernel-parameters.txt |  5 +
 Documentation/admin-guide/pm/cpufreq.rst  |  6 +-
 .../platforms/cell/cpufreq_spudemand.c| 26 +-
 drivers/cpufreq/cpufreq.c | 93 ---
 drivers/cpufreq/cpufreq_conservative.c| 22 +
 drivers/cpufreq/cpufreq_ondemand.c| 24 ++---
 drivers/cpufreq/cpufreq_performance.c | 14 +--
 drivers/cpufreq/cpufreq_powersave.c   | 18 +---
 drivers/cpufreq/cpufreq_userspace.c   | 18 +---
 include/linux/cpufreq.h   | 14 +++
 kernel/sched/cpufreq_schedutil.c  |  6 +-
 11 files changed, 106 insertions(+), 140 deletions(-)

-- 
2.25.0.rc1.19.g042ed3e048af



Re: [PATCH] selftests/powerpc: Fix build issue with output directory

2020-06-25 Thread Michael Ellerman
Harish  writes:
> We use OUTPUT directory as TMPOUT for checking no-pie option. When
> building powerpc/ from selftests directory, the OUTPUT directory
> eventually points to powerpc/pmu/ebb/ and gets removed when
> checking for -no-pie option in try-run routine, subsequently build
> fails with the following

It worked fine until:
  f2f02ebd8f38 ("kbuild: improve cc-option to clean up all temporary files") 
when building powerpc/ from selftests directory, the

I updated the change log to mention that.

cheers


> $ make -C powerpc
> ...
> ...
> TARGET=ebb; BUILD_TARGET=$OUTPUT/$TARGET; mkdir -p $BUILD_TARGET; make 
> OUTPUT=$BUILD_TARGET -k -C $TARGET all
> make[2]: Entering directory 
> '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb'
> make[2]: *** No rule to make target 'Makefile'.
> make[2]: Failed to remake makefile 'Makefile'.
> make[2]: *** No rule to make target 'ebb.c', needed by 
> '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
> make[2]: *** No rule to make target 'ebb_handler.S', needed by 
> '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
> make[2]: *** No rule to make target 'trace.c', needed by 
> '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
> make[2]: *** No rule to make target 'busy_loop.S', needed by 
> '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
> make[2]: Target 'all' not remade because of errors.
>
> Fix this by adding a suffix to the OUTPUT directory so that the
> failure is avoided
>
> Fixes: 9686813f6e9d ("selftests/powerpc: Fix try-run when source tree is not 
> writable")
> Signed-off-by: Harish 


Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line

2020-06-25 Thread Viresh Kumar
On 23-06-20, 15:21, Quentin Perret wrote:
> @@ -2789,7 +2796,13 @@ static int __init cpufreq_core_init(void)
>   cpufreq_global_kobject = kobject_create_and_add("cpufreq", 
> _subsys.dev_root->kobj);
>   BUG_ON(!cpufreq_global_kobject);
>  
> + mutex_lock(_governor_mutex);
> + if (!default_governor)

Also is this check really required ? The pointer will always be NULL
at this point, isn't it ?

> + default_governor = cpufreq_default_governor();
> + mutex_unlock(_governor_mutex);
> +
>   return 0;
>  }
>  module_param(off, int, 0444);
> +module_param_string(default_governor, cpufreq_param_governor, 
> CPUFREQ_NAME_LEN, 0444);
>  core_initcall(cpufreq_core_init);
> -- 
> 2.27.0.111.gc72c7da667-goog

-- 
viresh


Re: [PATCH 00/13] iommu: Remove usage of dev->archdata.iommu

2020-06-25 Thread Jerry Snitselaar

On Thu Jun 25 20, Joerg Roedel wrote:

From: Joerg Roedel 

Hi,

here is a patch-set to remove the usage of dev->archdata.iommu from
the IOMMU code in the kernel and replace its uses by the iommu per-device
private data field. The changes also remove the field entirely from
the architectures which no longer need it.

On PowerPC the field is called dev->archdata.iommu_domain and was only
used by the PAMU IOMMU driver. It gets removed as well.

The patches have been runtime tested on Intel VT-d and compile tested
with allyesconfig for:

* x86 (32 and 64 bit)
* arm and arm64
* ia64 (only drivers/ because build failed for me in
arch/ia64)
* PPC64

Besides that the changes also survived my IOMMU tree compile tests.

Please review.

Regards,

Joerg

Joerg Roedel (13):
 iommu/exynos: Use dev_iommu_priv_get/set()
 iommu/vt-d: Use dev_iommu_priv_get/set()
 iommu/msm: Use dev_iommu_priv_get/set()
 iommu/omap: Use dev_iommu_priv_get/set()
 iommu/rockchip: Use dev_iommu_priv_get/set()
 iommu/tegra: Use dev_iommu_priv_get/set()
 iommu/pamu: Use dev_iommu_priv_get/set()
 iommu/mediatek: Do no use dev->archdata.iommu
 x86: Remove dev->archdata.iommu pointer
 ia64: Remove dev->archdata.iommu pointer
 arm: Remove dev->archdata.iommu pointer
 arm64: Remove dev->archdata.iommu pointer
 powerpc/dma: Remove dev->archdata.iommu_domain

arch/arm/include/asm/device.h |  3 ---
arch/arm64/include/asm/device.h   |  3 ---
arch/ia64/include/asm/device.h|  3 ---
arch/powerpc/include/asm/device.h |  3 ---
arch/x86/include/asm/device.h |  3 ---
.../gpu/drm/i915/selftests/mock_gem_device.c  | 10 --
drivers/iommu/exynos-iommu.c  | 20 +--
drivers/iommu/fsl_pamu_domain.c   |  8 
drivers/iommu/intel/iommu.c   | 18 -
drivers/iommu/msm_iommu.c |  4 ++--
drivers/iommu/mtk_iommu.h |  2 ++
drivers/iommu/mtk_iommu_v1.c  | 10 --
drivers/iommu/omap-iommu.c| 20 +--
drivers/iommu/rockchip-iommu.c|  8 
drivers/iommu/tegra-gart.c|  8 
drivers/iommu/tegra-smmu.c|  8 
.../media/platform/s5p-mfc/s5p_mfc_iommu.h|  4 +++-
17 files changed, 64 insertions(+), 71 deletions(-)

--
2.27.0



Reviewed-by: Jerry Snitselaar 



Re: [PATCH v2] SUNRPC: Add missing definition of ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE

2020-06-25 Thread Bruce Fields
On Mon, Jun 15, 2020 at 08:33:40AM -0400, Chuck Lever wrote:
> 
> 
> > On Jun 15, 2020, at 2:25 AM, Christophe Leroy  
> > wrote:
> > 
> > Even if that's only a warning, not including asm/cacheflush.h
> > leads to svc_flush_bvec() being empty allthough powerpc defines
> > ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE.
> > 
> >  CC  net/sunrpc/svcsock.o
> > net/sunrpc/svcsock.c:227:5: warning: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is 
> > not defined [-Wundef]
> > #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
> > ^
> > 
> > Include linux/highmem.h so that asm/cacheflush.h will be included.
> > 
> > Reported-by: Christophe Leroy 
> > Reported-by: kernel test robot 
> > Cc: Chuck Lever 
> > Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()")
> > Signed-off-by: Christophe Leroy 
> 
> LGTM.
> 
> Acked-by: Chuck Lever 

Thanks, applying for 5.8.--b.


Re: [PATCH] powerpc: Warn about use of smt_snooze_delay

2020-06-25 Thread Tyrel Datwyler
On 6/25/20 3:29 AM, Christophe Leroy wrote:
> 
> 
> Le 25/06/2020 à 12:03, Joel Stanley a écrit :
>> It's not done anything for a long time. Save the percpu variable, and
>> emit a warning to remind users to not expect it to do anything.
> 
> Why not just drop the file entirely  if it is useless ?

Userspace tooling that hasn't learned its useless yet. Joel has also submitted a
pull request for the ppc64_util tool in question to drop using this interface.

-Tyrel

> 
> Christophe
> 
>>
>> Signed-off-by: Joel Stanley 
>> ---
>>   arch/powerpc/kernel/sysfs.c | 41 +
>>   1 file changed, 14 insertions(+), 27 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
>> index 571b3259697e..530ae92bc46d 100644
>> --- a/arch/powerpc/kernel/sysfs.c
>> +++ b/arch/powerpc/kernel/sysfs.c
>> @@ -32,29 +32,25 @@
>>     static DEFINE_PER_CPU(struct cpu, cpu_devices);
>>   -/*
>> - * SMT snooze delay stuff, 64-bit only for now
>> - */
>> -
>>   #ifdef CONFIG_PPC64
>>   -/* Time in microseconds we delay before sleeping in the idle loop */
>> -static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
>> +/*
>> + * Snooze delay has not been hooked up since 3fa8cad82b94
>> ("powerpc/pseries/cpuidle:
>> + * smt-snooze-delay cleanup.") and has been broken even longer. As was
>> foretold in
>> + * 2014:
>> + *
>> + *  "ppc64_util currently utilises it. Once we fix ppc64_util, propose to 
>> clean
>> + *  up the kernel code."
>> + *
>> + * At some point in the future this code should be removed.
>> + */
>>     static ssize_t store_smt_snooze_delay(struct device *dev,
>>     struct device_attribute *attr,
>>     const char *buf,
>>     size_t count)
>>   {
>> -    struct cpu *cpu = container_of(dev, struct cpu, dev);
>> -    ssize_t ret;
>> -    long snooze;
>> -
>> -    ret = sscanf(buf, "%ld", );
>> -    if (ret != 1)
>> -    return -EINVAL;
>> -
>> -    per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
>> +    WARN_ON_ONCE("smt_snooze_delay sysfs file has no effect\n");
>>   return count;
>>   }
>>   @@ -62,9 +58,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
>>    struct device_attribute *attr,
>>    char *buf)
>>   {
>> -    struct cpu *cpu = container_of(dev, struct cpu, dev);
>> +    WARN_ON_ONCE("smt_snooze_delay sysfs file has no effect\n");
>>   -    return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
>> +    return sprintf(buf, "100\n");
>>   }
>>     static DEVICE_ATTR(smt_snooze_delay, 0644, show_smt_snooze_delay,
>> @@ -72,16 +68,7 @@ static DEVICE_ATTR(smt_snooze_delay, 0644,
>> show_smt_snooze_delay,
>>     static int __init setup_smt_snooze_delay(char *str)
>>   {
>> -    unsigned int cpu;
>> -    long snooze;
>> -
>> -    if (!cpu_has_feature(CPU_FTR_SMT))
>> -    return 1;
>> -
>> -    snooze = simple_strtol(str, NULL, 10);
>> -    for_each_possible_cpu(cpu)
>> -    per_cpu(smt_snooze_delay, cpu) = snooze;
>> -
>> +    WARN_ON_ONCE("smt-snooze-delay command line option has no effect\n");
>>   return 1;
>>   }
>>   __setup("smt-snooze-delay=", setup_smt_snooze_delay);
>>



Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line

2020-06-25 Thread Rafael J. Wysocki
On Thu, Jun 25, 2020 at 3:50 PM Quentin Perret  wrote:
>
> On Thursday 25 Jun 2020 at 15:28:43 (+0200), Rafael J. Wysocki wrote:
> > On Thu, Jun 25, 2020 at 1:53 PM Quentin Perret  wrote:
> > >
> > > On Thursday 25 Jun 2020 at 13:44:34 (+0200), Rafael J. Wysocki wrote:
> > > > On Thu, Jun 25, 2020 at 1:36 PM Viresh Kumar  
> > > > wrote:
> > > > > This change is not right IMO. This part handles the set-policy case,
> > > > > where there are no governors. Right now this code, for some reasons
> > > > > unknown to me, forcefully uses the default governor set to indicate
> > > > > the policy, which is not a great idea in my opinion TBH. This doesn't
> > > > > and shouldn't care about governor modules and should only be looking
> > > > > at strings instead of governor pointer.
> > > >
> > > > Sounds right.
> > > >
> > > > > Rafael, I even think we should remove this code completely and just
> > > > > rely on what the driver has sent to us. Using the selected governor
> > > > > for set policy drivers is very confusing and also we shouldn't be
> > > > > forced to compiling any governor for the set-policy case.
> > > >
> > > > Well, AFAICS the idea was to use the default governor as a kind of
> > > > default policy proxy, but I agree that strings should be sufficient
> > > > for that.
> > >
> > > I agree with all the above. I'd much rather not rely on the default
> > > governor name to populate the default policy, too, so +1 from me.
> >
> > So before this series the default governor was selected at the kernel
> > configuration time (pre-build) and was always built-in.  Because it
> > could not go away, its name could be used to indicate the default
> > policy for the "setpolicy" drivers.
> >
> > After this series, however, it cannot be used this way reliably, but
> > you can still pass cpufreq_param_governor to cpufreq_parse_policy()
> > instead of def_gov->name in cpufreq_init_policy(), can't you?
>
> Good point. I also need to fallback to the default builtin governor if
> the command line parameter isn't valid (or non-existent), so perhaps
> something like so?

Yes, that should work if I haven't missed anything.

> iff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index dad6b85f4c89..20a2020abf88 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -653,6 +653,23 @@ static unsigned int cpufreq_parse_policy(char 
> *str_governor)
> return CPUFREQ_POLICY_UNKNOWN;
>  }
>
> +static unsigned int cpufreq_default_policy(void)
> +{
> +   unsigned int pol;
> +
> +   pol = cpufreq_parse_policy(cpufreq_param_governor);
> +   if (pol != CPUFREQ_POLICY_UNKNOWN)
> +   return pol;
> +
> +   if (IS_BUILTIN(CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE))
> +   return CPUFREQ_POLICY_PERFORMANCE;
> +
> +   if (IS_BUILTIN(CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE))
> +   return CPUFREQ_POLICY_POWERSAVE;
> +
> +   return CPUFREQ_POLICY_UNKNOWN;
> +}
> +
>  /**
>   * cpufreq_parse_governor - parse a governor string only for has_target()
>   * @str_governor: Governor name.
> @@ -1085,8 +1102,8 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> *policy)
> /* Use the default policy if there is no last_policy. */
> if (policy->last_policy) {
> pol = policy->last_policy;
> -   } else if (default_governor) {
> -   pol = cpufreq_parse_policy(default_governor->name);
> +   } else {
> +   pol = cpufreq_default_policy();
> /*
>  * In case the default governor is neiter 
> "performance"
>  * nor "powersave", fall back to the initial policy


Re: [PATCH] hvc: unify console setup naming

2020-06-25 Thread Petr Mladek
On Sat 2020-06-20 02:22:40, Sergey Senozhatsky wrote:
> Use the 'common' foo_console_setup() naming scheme. There are 71
> foo_console_setup() callbacks and only one foo_setup_console().
> 
> Signed-off-by: Sergey Senozhatsky 

This patch is commited in printk/linux.git, branch
for-5.9-console-return-codes.

Best Regards,
Petr


Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line

2020-06-25 Thread Rafael J. Wysocki
On Thu, Jun 25, 2020 at 1:53 PM Quentin Perret  wrote:
>
> On Thursday 25 Jun 2020 at 13:44:34 (+0200), Rafael J. Wysocki wrote:
> > On Thu, Jun 25, 2020 at 1:36 PM Viresh Kumar  
> > wrote:
> > > This change is not right IMO. This part handles the set-policy case,
> > > where there are no governors. Right now this code, for some reasons
> > > unknown to me, forcefully uses the default governor set to indicate
> > > the policy, which is not a great idea in my opinion TBH. This doesn't
> > > and shouldn't care about governor modules and should only be looking
> > > at strings instead of governor pointer.
> >
> > Sounds right.
> >
> > > Rafael, I even think we should remove this code completely and just
> > > rely on what the driver has sent to us. Using the selected governor
> > > for set policy drivers is very confusing and also we shouldn't be
> > > forced to compiling any governor for the set-policy case.
> >
> > Well, AFAICS the idea was to use the default governor as a kind of
> > default policy proxy, but I agree that strings should be sufficient
> > for that.
>
> I agree with all the above. I'd much rather not rely on the default
> governor name to populate the default policy, too, so +1 from me.

So before this series the default governor was selected at the kernel
configuration time (pre-build) and was always built-in.  Because it
could not go away, its name could be used to indicate the default
policy for the "setpolicy" drivers.

After this series, however, it cannot be used this way reliably, but
you can still pass cpufreq_param_governor to cpufreq_parse_policy()
instead of def_gov->name in cpufreq_init_policy(), can't you?


Re: [PATCH 02/13] iommu/vt-d: Use dev_iommu_priv_get/set()

2020-06-25 Thread Lu Baolu

Hi Joerg,

On 2020/6/25 21:08, Joerg Roedel wrote:

From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
  .../gpu/drm/i915/selftests/mock_gem_device.c   | 10 --
  drivers/iommu/intel/iommu.c| 18 +-


For changes in VT-d driver,

Reviewed-by: Lu Baolu 

Best regards,
baolu


  2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c 
b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 9b105b811f1f..e08601905a64 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -24,6 +24,7 @@
  
  #include 

  #include 
+#include 
  
  #include 
  
@@ -118,6 +119,9 @@ struct drm_i915_private *mock_gem_device(void)

  {
struct drm_i915_private *i915;
struct pci_dev *pdev;
+#if IS_ENABLED(CONFIG_IOMMU_API) && defined(CONFIG_INTEL_IOMMU)
+   struct dev_iommu iommu;
+#endif
int err;
  
  	pdev = kzalloc(sizeof(*pdev), GFP_KERNEL);

@@ -136,8 +140,10 @@ struct drm_i915_private *mock_gem_device(void)
dma_coerce_mask_and_coherent(>dev, DMA_BIT_MASK(64));
  
  #if IS_ENABLED(CONFIG_IOMMU_API) && defined(CONFIG_INTEL_IOMMU)

-   /* hack to disable iommu for the fake device; force identity mapping */
-   pdev->dev.archdata.iommu = (void *)-1;
+   /* HACK HACK HACK to disable iommu for the fake device; force identity 
mapping */
+   memset(, 0, sizeof(iommu));
+   iommu.priv = (void *)-1;
+   pdev->dev.iommu = 
  #endif
  
  	pci_set_drvdata(pdev, i915);

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e98..2ce490c2eab8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -372,7 +372,7 @@ struct device_domain_info *get_domain_info(struct device 
*dev)
if (!dev)
return NULL;
  
-	info = dev->archdata.iommu;

+   info = dev_iommu_priv_get(dev);
if (unlikely(info == DUMMY_DEVICE_DOMAIN_INFO ||
 info == DEFER_DEVICE_DOMAIN_INFO))
return NULL;
@@ -743,12 +743,12 @@ struct context_entry *iommu_context_addr(struct 
intel_iommu *iommu, u8 bus,
  
  static int iommu_dummy(struct device *dev)

  {
-   return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
+   return dev_iommu_priv_get(dev) == DUMMY_DEVICE_DOMAIN_INFO;
  }
  
  static bool attach_deferred(struct device *dev)

  {
-   return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
+   return dev_iommu_priv_get(dev) == DEFER_DEVICE_DOMAIN_INFO;
  }
  
  /**

@@ -2420,7 +2420,7 @@ static inline void unlink_domain_info(struct 
device_domain_info *info)
list_del(>link);
list_del(>global);
if (info->dev)
-   info->dev->archdata.iommu = NULL;
+   dev_iommu_priv_set(info->dev, NULL);
  }
  
  static void domain_remove_dev_info(struct dmar_domain *domain)

@@ -2453,7 +2453,7 @@ static void do_deferred_attach(struct device *dev)
  {
struct iommu_domain *domain;
  
-	dev->archdata.iommu = NULL;

+   dev_iommu_priv_set(dev, NULL);
domain = iommu_get_domain_for_dev(dev);
if (domain)
intel_iommu_attach_device(domain, dev);
@@ -2599,7 +2599,7 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
list_add(>link, >devices);
list_add(>global, _domain_list);
if (dev)
-   dev->archdata.iommu = info;
+   dev_iommu_priv_set(dev, info);
spin_unlock_irqrestore(_domain_lock, flags);
  
  	/* PASID table is mandatory for a PCI device in scalable mode. */

@@ -4004,7 +4004,7 @@ static void quirk_ioat_snb_local_iommu(struct pci_dev 
*pdev)
if (!drhd || drhd->reg_base_addr - vtbar != 0xa000) {
pr_warn_once(FW_BUG "BIOS assigned incorrect VT-d unit for Intel(R) 
QuickData Technology device\n");
add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
-   pdev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
+   dev_iommu_priv_set(>dev, DUMMY_DEVICE_DOMAIN_INFO);
}
  }
  DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IOAT_SNB, 
quirk_ioat_snb_local_iommu);
@@ -4043,7 +4043,7 @@ static void __init init_no_remapping_devices(void)
drhd->ignored = 1;
for_each_active_dev_scope(drhd->devices,
  drhd->devices_cnt, i, dev)
-   dev->archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
+   dev_iommu_priv_set(dev, 
DUMMY_DEVICE_DOMAIN_INFO);
}
}
  }
@@ -5665,7 +5665,7 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
return ERR_PTR(-ENODEV);
  
  	if 

[PATCH 10/13] ia64: Remove dev->archdata.iommu pointer

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

There are no users left, all drivers have been converted to use the
per-device private pointer offered by IOMMU core.

Signed-off-by: Joerg Roedel 
---
 arch/ia64/include/asm/device.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/ia64/include/asm/device.h b/arch/ia64/include/asm/device.h
index 3eb397415381..918b198cd5bb 100644
--- a/arch/ia64/include/asm/device.h
+++ b/arch/ia64/include/asm/device.h
@@ -6,9 +6,6 @@
 #define _ASM_IA64_DEVICE_H
 
 struct dev_archdata {
-#ifdef CONFIG_IOMMU_API
-   void *iommu; /* hook for IOMMU specific extension */
-#endif
 };
 
 struct pdev_archdata {
-- 
2.27.0



[PATCH 08/13] iommu/mediatek: Do no use dev->archdata.iommu

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

The iommu private pointer is already used in the Mediatek IOMMU v1
driver, so move the dma_iommu_mapping pointer into 'struct
mtk_iommu_data' and do not use dev->archdata.iommu anymore.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/mtk_iommu.h|  2 ++
 drivers/iommu/mtk_iommu_v1.c | 10 --
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index ea949a324e33..1682406e98dc 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -62,6 +62,8 @@ struct mtk_iommu_data {
struct iommu_device iommu;
const struct mtk_iommu_plat_data *plat_data;
 
+   struct dma_iommu_mapping*mapping; /* For mtk_iommu_v1.c */
+
struct list_headlist;
struct mtk_smi_larb_iommu   larb_imu[MTK_LARB_NR_MAX];
 };
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index c9d79cff4d17..82ddfe9170d4 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -269,7 +269,7 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
int ret;
 
/* Only allow the domain created internally. */
-   mtk_mapping = data->dev->archdata.iommu;
+   mtk_mapping = data->mapping;
if (mtk_mapping->domain != domain)
return 0;
 
@@ -369,7 +369,6 @@ static int mtk_iommu_create_mapping(struct device *dev,
struct mtk_iommu_data *data;
struct platform_device *m4updev;
struct dma_iommu_mapping *mtk_mapping;
-   struct device *m4udev;
int ret;
 
if (args->args_count != 1) {
@@ -401,8 +400,7 @@ static int mtk_iommu_create_mapping(struct device *dev,
return ret;
 
data = dev_iommu_priv_get(dev);
-   m4udev = data->dev;
-   mtk_mapping = m4udev->archdata.iommu;
+   mtk_mapping = data->mapping;
if (!mtk_mapping) {
/* MTK iommu support 4GB iova address space. */
mtk_mapping = arm_iommu_create_mapping(_bus_type,
@@ -410,7 +408,7 @@ static int mtk_iommu_create_mapping(struct device *dev,
if (IS_ERR(mtk_mapping))
return PTR_ERR(mtk_mapping);
 
-   m4udev->archdata.iommu = mtk_mapping;
+   data->mapping = mtk_mapping;
}
 
return 0;
@@ -459,7 +457,7 @@ static void mtk_iommu_probe_finalize(struct device *dev)
int err;
 
data= dev_iommu_priv_get(dev);
-   mtk_mapping = data->dev->archdata.iommu;
+   mtk_mapping = data->mapping;
 
err = arm_iommu_attach_device(dev, mtk_mapping);
if (err)
-- 
2.27.0



[PATCH 06/13] iommu/tegra: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/tegra-gart.c | 8 
 drivers/iommu/tegra-smmu.c | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 5fbdff6ff41a..fac720273889 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -113,8 +113,8 @@ static int gart_iommu_attach_dev(struct iommu_domain 
*domain,
 
if (gart->active_domain && gart->active_domain != domain) {
ret = -EBUSY;
-   } else if (dev->archdata.iommu != domain) {
-   dev->archdata.iommu = domain;
+   } else if (dev_iommu_priv_get(dev) != domain) {
+   dev_iommu_priv_set(dev, domain);
gart->active_domain = domain;
gart->active_devices++;
}
@@ -131,8 +131,8 @@ static void gart_iommu_detach_dev(struct iommu_domain 
*domain,
 
spin_lock(>dom_lock);
 
-   if (dev->archdata.iommu == domain) {
-   dev->archdata.iommu = NULL;
+   if (dev_iommu_priv_get(dev) == domain) {
+   dev_iommu_priv_set(dev, NULL);
 
if (--gart->active_devices == 0)
gart->active_domain = NULL;
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 7426b7666e2b..124c8848ab7e 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -465,7 +465,7 @@ static void tegra_smmu_as_unprepare(struct tegra_smmu *smmu,
 static int tegra_smmu_attach_dev(struct iommu_domain *domain,
 struct device *dev)
 {
-   struct tegra_smmu *smmu = dev->archdata.iommu;
+   struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
struct tegra_smmu_as *as = to_smmu_as(domain);
struct device_node *np = dev->of_node;
struct of_phandle_args args;
@@ -780,7 +780,7 @@ static struct iommu_device *tegra_smmu_probe_device(struct 
device *dev)
 * supported by the Linux kernel, so abort after the
 * first match.
 */
-   dev->archdata.iommu = smmu;
+   dev_iommu_priv_set(dev, smmu);
 
break;
}
@@ -797,7 +797,7 @@ static struct iommu_device *tegra_smmu_probe_device(struct 
device *dev)
 
 static void tegra_smmu_release_device(struct device *dev)
 {
-   dev->archdata.iommu = NULL;
+   dev_iommu_priv_set(dev, NULL);
 }
 
 static const struct tegra_smmu_group_soc *
@@ -856,7 +856,7 @@ static struct iommu_group *tegra_smmu_group_get(struct 
tegra_smmu *smmu,
 static struct iommu_group *tegra_smmu_device_group(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct tegra_smmu *smmu = dev->archdata.iommu;
+   struct tegra_smmu *smmu = dev_iommu_priv_get(dev);
struct iommu_group *group;
 
group = tegra_smmu_group_get(smmu, fwspec->ids[0]);
-- 
2.27.0



[PATCH 05/13] iommu/rockchip: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/rockchip-iommu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index d25c2486ca07..e5d86b7177de 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -836,7 +836,7 @@ static size_t rk_iommu_unmap(struct iommu_domain *domain, 
unsigned long _iova,
 
 static struct rk_iommu *rk_iommu_from_dev(struct device *dev)
 {
-   struct rk_iommudata *data = dev->archdata.iommu;
+   struct rk_iommudata *data = dev_iommu_priv_get(dev);
 
return data ? data->iommu : NULL;
 }
@@ -1059,7 +1059,7 @@ static struct iommu_device *rk_iommu_probe_device(struct 
device *dev)
struct rk_iommudata *data;
struct rk_iommu *iommu;
 
-   data = dev->archdata.iommu;
+   data = dev_iommu_priv_get(dev);
if (!data)
return ERR_PTR(-ENODEV);
 
@@ -1073,7 +1073,7 @@ static struct iommu_device *rk_iommu_probe_device(struct 
device *dev)
 
 static void rk_iommu_release_device(struct device *dev)
 {
-   struct rk_iommudata *data = dev->archdata.iommu;
+   struct rk_iommudata *data = dev_iommu_priv_get(dev);
 
device_link_del(data->link);
 }
@@ -1100,7 +1100,7 @@ static int rk_iommu_of_xlate(struct device *dev,
iommu_dev = of_find_device_by_node(args->np);
 
data->iommu = platform_get_drvdata(iommu_dev);
-   dev->archdata.iommu = data;
+   dev_iommu_priv_set(dev, data);
 
platform_device_put(iommu_dev);
 
-- 
2.27.0



[PATCH 04/13] iommu/omap: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/omap-iommu.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index c8282cc212cb..e84ead6fb234 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -71,7 +71,7 @@ static struct omap_iommu_domain *to_omap_domain(struct 
iommu_domain *dom)
  **/
 void omap_iommu_save_ctx(struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
struct omap_iommu *obj;
u32 *p;
int i;
@@ -101,7 +101,7 @@ EXPORT_SYMBOL_GPL(omap_iommu_save_ctx);
  **/
 void omap_iommu_restore_ctx(struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
struct omap_iommu *obj;
u32 *p;
int i;
@@ -1398,7 +1398,7 @@ static size_t omap_iommu_unmap(struct iommu_domain 
*domain, unsigned long da,
 
 static int omap_iommu_count(struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
int count = 0;
 
while (arch_data->iommu_dev) {
@@ -1459,8 +1459,8 @@ static void omap_iommu_detach_fini(struct 
omap_iommu_domain *odomain)
 static int
 omap_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
struct omap_iommu_domain *omap_domain = to_omap_domain(domain);
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
struct omap_iommu_device *iommu;
struct omap_iommu *oiommu;
int ret = 0;
@@ -1524,7 +1524,7 @@ omap_iommu_attach_dev(struct iommu_domain *domain, struct 
device *dev)
 static void _omap_iommu_detach_dev(struct omap_iommu_domain *omap_domain,
   struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
struct omap_iommu_device *iommu = omap_domain->iommus;
struct omap_iommu *oiommu;
int i;
@@ -1650,7 +1650,7 @@ static struct iommu_device 
*omap_iommu_probe_device(struct device *dev)
int num_iommus, i;
 
/*
-* Allocate the archdata iommu structure for DT-based devices.
+* Allocate the per-device iommu structure for DT-based devices.
 *
 * TODO: Simplify this when removing non-DT support completely from the
 * IOMMU users.
@@ -1698,7 +1698,7 @@ static struct iommu_device 
*omap_iommu_probe_device(struct device *dev)
of_node_put(np);
}
 
-   dev->archdata.iommu = arch_data;
+   dev_iommu_priv_set(dev, arch_data);
 
/*
 * use the first IOMMU alone for the sysfs device linking.
@@ -1712,19 +1712,19 @@ static struct iommu_device 
*omap_iommu_probe_device(struct device *dev)
 
 static void omap_iommu_release_device(struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
 
if (!dev->of_node || !arch_data)
return;
 
-   dev->archdata.iommu = NULL;
+   dev_iommu_priv_set(dev, NULL);
kfree(arch_data);
 
 }
 
 static struct iommu_group *omap_iommu_device_group(struct device *dev)
 {
-   struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
+   struct omap_iommu_arch_data *arch_data = dev_iommu_priv_get(dev);
struct iommu_group *group = ERR_PTR(-EINVAL);
 
if (!arch_data)
-- 
2.27.0



[PATCH 01/13] iommu/exynos: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/exynos-iommu.c  | 20 +--
 .../media/platform/s5p-mfc/s5p_mfc_iommu.h|  4 +++-
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 60c8a56e4a3f..6a9b67302369 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -173,7 +173,7 @@ static u32 lv2ent_offset(sysmmu_iova_t iova)
 #define REG_V5_FAULT_AR_VA 0x070
 #define REG_V5_FAULT_AW_VA 0x080
 
-#define has_sysmmu(dev)(dev->archdata.iommu != NULL)
+#define has_sysmmu(dev)(dev_iommu_priv_get(dev) != NULL)
 
 static struct device *dma_dev;
 static struct kmem_cache *lv2table_kmem_cache;
@@ -226,7 +226,7 @@ static const struct sysmmu_fault_info sysmmu_v5_faults[] = {
 };
 
 /*
- * This structure is attached to dev.archdata.iommu of the master device
+ * This structure is attached to dev->iommu->priv of the master device
  * on device add, contains a list of SYSMMU controllers defined by device tree,
  * which are bound to given master device. It is usually referenced by 'owner'
  * pointer.
@@ -670,7 +670,7 @@ static int __maybe_unused exynos_sysmmu_suspend(struct 
device *dev)
struct device *master = data->master;
 
if (master) {
-   struct exynos_iommu_owner *owner = master->archdata.iommu;
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(master);
 
mutex_lock(>rpm_lock);
if (data->domain) {
@@ -688,7 +688,7 @@ static int __maybe_unused exynos_sysmmu_resume(struct 
device *dev)
struct device *master = data->master;
 
if (master) {
-   struct exynos_iommu_owner *owner = master->archdata.iommu;
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(master);
 
mutex_lock(>rpm_lock);
if (data->domain) {
@@ -837,8 +837,8 @@ static void exynos_iommu_domain_free(struct iommu_domain 
*iommu_domain)
 static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain,
struct device *dev)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct exynos_iommu_domain *domain = to_exynos_domain(iommu_domain);
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
phys_addr_t pagetable = virt_to_phys(domain->pgtable);
struct sysmmu_drvdata *data, *next;
unsigned long flags;
@@ -875,8 +875,8 @@ static void exynos_iommu_detach_device(struct iommu_domain 
*iommu_domain,
 static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain,
   struct device *dev)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct exynos_iommu_domain *domain = to_exynos_domain(iommu_domain);
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
struct sysmmu_drvdata *data;
phys_addr_t pagetable = virt_to_phys(domain->pgtable);
unsigned long flags;
@@ -1237,7 +1237,7 @@ static phys_addr_t exynos_iommu_iova_to_phys(struct 
iommu_domain *iommu_domain,
 
 static struct iommu_device *exynos_iommu_probe_device(struct device *dev)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
struct sysmmu_drvdata *data;
 
if (!has_sysmmu(dev))
@@ -1263,7 +1263,7 @@ static struct iommu_device 
*exynos_iommu_probe_device(struct device *dev)
 
 static void exynos_iommu_release_device(struct device *dev)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
struct sysmmu_drvdata *data;
 
if (!has_sysmmu(dev))
@@ -1287,8 +1287,8 @@ static void exynos_iommu_release_device(struct device 
*dev)
 static int exynos_iommu_of_xlate(struct device *dev,
 struct of_phandle_args *spec)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct platform_device *sysmmu = of_find_device_by_node(spec->np);
+   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
struct sysmmu_drvdata *data, *entry;
 
if (!sysmmu)
@@ -1305,7 +1305,7 @@ static int exynos_iommu_of_xlate(struct device *dev,
 
INIT_LIST_HEAD(>controllers);
mutex_init(>rpm_lock);
-   dev->archdata.iommu = owner;
+   dev_iommu_priv_set(dev, owner);
}
 
list_for_each_entry(entry, >controllers, owner_node)
diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_iommu.h 
b/drivers/media/platform/s5p-mfc/s5p_mfc_iommu.h
index 152a713fff78..1a32266b7ddc 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc_iommu.h
+++ 

[PATCH 02/13] iommu/vt-d: Use dev_iommu_priv_get/set()

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Remove the use of dev->archdata.iommu and use the private per-device
pointer provided by IOMMU core code instead.

Signed-off-by: Joerg Roedel 
---
 .../gpu/drm/i915/selftests/mock_gem_device.c   | 10 --
 drivers/iommu/intel/iommu.c| 18 +-
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c 
b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 9b105b811f1f..e08601905a64 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -24,6 +24,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -118,6 +119,9 @@ struct drm_i915_private *mock_gem_device(void)
 {
struct drm_i915_private *i915;
struct pci_dev *pdev;
+#if IS_ENABLED(CONFIG_IOMMU_API) && defined(CONFIG_INTEL_IOMMU)
+   struct dev_iommu iommu;
+#endif
int err;
 
pdev = kzalloc(sizeof(*pdev), GFP_KERNEL);
@@ -136,8 +140,10 @@ struct drm_i915_private *mock_gem_device(void)
dma_coerce_mask_and_coherent(>dev, DMA_BIT_MASK(64));
 
 #if IS_ENABLED(CONFIG_IOMMU_API) && defined(CONFIG_INTEL_IOMMU)
-   /* hack to disable iommu for the fake device; force identity mapping */
-   pdev->dev.archdata.iommu = (void *)-1;
+   /* HACK HACK HACK to disable iommu for the fake device; force identity 
mapping */
+   memset(, 0, sizeof(iommu));
+   iommu.priv = (void *)-1;
+   pdev->dev.iommu = 
 #endif
 
pci_set_drvdata(pdev, i915);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e98..2ce490c2eab8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -372,7 +372,7 @@ struct device_domain_info *get_domain_info(struct device 
*dev)
if (!dev)
return NULL;
 
-   info = dev->archdata.iommu;
+   info = dev_iommu_priv_get(dev);
if (unlikely(info == DUMMY_DEVICE_DOMAIN_INFO ||
 info == DEFER_DEVICE_DOMAIN_INFO))
return NULL;
@@ -743,12 +743,12 @@ struct context_entry *iommu_context_addr(struct 
intel_iommu *iommu, u8 bus,
 
 static int iommu_dummy(struct device *dev)
 {
-   return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
+   return dev_iommu_priv_get(dev) == DUMMY_DEVICE_DOMAIN_INFO;
 }
 
 static bool attach_deferred(struct device *dev)
 {
-   return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
+   return dev_iommu_priv_get(dev) == DEFER_DEVICE_DOMAIN_INFO;
 }
 
 /**
@@ -2420,7 +2420,7 @@ static inline void unlink_domain_info(struct 
device_domain_info *info)
list_del(>link);
list_del(>global);
if (info->dev)
-   info->dev->archdata.iommu = NULL;
+   dev_iommu_priv_set(info->dev, NULL);
 }
 
 static void domain_remove_dev_info(struct dmar_domain *domain)
@@ -2453,7 +2453,7 @@ static void do_deferred_attach(struct device *dev)
 {
struct iommu_domain *domain;
 
-   dev->archdata.iommu = NULL;
+   dev_iommu_priv_set(dev, NULL);
domain = iommu_get_domain_for_dev(dev);
if (domain)
intel_iommu_attach_device(domain, dev);
@@ -2599,7 +2599,7 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
list_add(>link, >devices);
list_add(>global, _domain_list);
if (dev)
-   dev->archdata.iommu = info;
+   dev_iommu_priv_set(dev, info);
spin_unlock_irqrestore(_domain_lock, flags);
 
/* PASID table is mandatory for a PCI device in scalable mode. */
@@ -4004,7 +4004,7 @@ static void quirk_ioat_snb_local_iommu(struct pci_dev 
*pdev)
if (!drhd || drhd->reg_base_addr - vtbar != 0xa000) {
pr_warn_once(FW_BUG "BIOS assigned incorrect VT-d unit for 
Intel(R) QuickData Technology device\n");
add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
-   pdev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
+   dev_iommu_priv_set(>dev, DUMMY_DEVICE_DOMAIN_INFO);
}
 }
 DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IOAT_SNB, 
quirk_ioat_snb_local_iommu);
@@ -4043,7 +4043,7 @@ static void __init init_no_remapping_devices(void)
drhd->ignored = 1;
for_each_active_dev_scope(drhd->devices,
  drhd->devices_cnt, i, dev)
-   dev->archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
+   dev_iommu_priv_set(dev, 
DUMMY_DEVICE_DOMAIN_INFO);
}
}
 }
@@ -5665,7 +5665,7 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
return ERR_PTR(-ENODEV);
 
if (translation_pre_enabled(iommu))
-   dev->archdata.iommu = DEFER_DEVICE_DOMAIN_INFO;
+   dev_iommu_priv_set(dev, 

[PATCH 00/13] iommu: Remove usage of dev->archdata.iommu

2020-06-25 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

here is a patch-set to remove the usage of dev->archdata.iommu from
the IOMMU code in the kernel and replace its uses by the iommu per-device
private data field. The changes also remove the field entirely from
the architectures which no longer need it.

On PowerPC the field is called dev->archdata.iommu_domain and was only
used by the PAMU IOMMU driver. It gets removed as well.

The patches have been runtime tested on Intel VT-d and compile tested
with allyesconfig for:

* x86 (32 and 64 bit)
* arm and arm64
* ia64 (only drivers/ because build failed for me in
arch/ia64)
* PPC64

Besides that the changes also survived my IOMMU tree compile tests.

Please review.

Regards,

Joerg

Joerg Roedel (13):
  iommu/exynos: Use dev_iommu_priv_get/set()
  iommu/vt-d: Use dev_iommu_priv_get/set()
  iommu/msm: Use dev_iommu_priv_get/set()
  iommu/omap: Use dev_iommu_priv_get/set()
  iommu/rockchip: Use dev_iommu_priv_get/set()
  iommu/tegra: Use dev_iommu_priv_get/set()
  iommu/pamu: Use dev_iommu_priv_get/set()
  iommu/mediatek: Do no use dev->archdata.iommu
  x86: Remove dev->archdata.iommu pointer
  ia64: Remove dev->archdata.iommu pointer
  arm: Remove dev->archdata.iommu pointer
  arm64: Remove dev->archdata.iommu pointer
  powerpc/dma: Remove dev->archdata.iommu_domain

 arch/arm/include/asm/device.h |  3 ---
 arch/arm64/include/asm/device.h   |  3 ---
 arch/ia64/include/asm/device.h|  3 ---
 arch/powerpc/include/asm/device.h |  3 ---
 arch/x86/include/asm/device.h |  3 ---
 .../gpu/drm/i915/selftests/mock_gem_device.c  | 10 --
 drivers/iommu/exynos-iommu.c  | 20 +--
 drivers/iommu/fsl_pamu_domain.c   |  8 
 drivers/iommu/intel/iommu.c   | 18 -
 drivers/iommu/msm_iommu.c |  4 ++--
 drivers/iommu/mtk_iommu.h |  2 ++
 drivers/iommu/mtk_iommu_v1.c  | 10 --
 drivers/iommu/omap-iommu.c| 20 +--
 drivers/iommu/rockchip-iommu.c|  8 
 drivers/iommu/tegra-gart.c|  8 
 drivers/iommu/tegra-smmu.c|  8 
 .../media/platform/s5p-mfc/s5p_mfc_iommu.h|  4 +++-
 17 files changed, 64 insertions(+), 71 deletions(-)

-- 
2.27.0



Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line

2020-06-25 Thread Rafael J. Wysocki
On Thu, Jun 25, 2020 at 1:36 PM Viresh Kumar  wrote:
>
> After your last email (reply to my patch), I noticed a change which
> isn't required. :)
>
> On 23-06-20, 15:21, Quentin Perret wrote:
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 0128de3603df..4b1a5c0173cf 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -50,6 +50,9 @@ static LIST_HEAD(cpufreq_governor_list);
> >  #define for_each_governor(__governor)\
> >   list_for_each_entry(__governor, _governor_list, governor_list)
> >
> > +static char cpufreq_param_governor[CPUFREQ_NAME_LEN];
> > +static struct cpufreq_governor *default_governor;
> > +
> >  /**
> >   * The "cpufreq driver" - the arch- or hardware-dependent low
> >   * level driver of CPUFreq support, and its spinlock. This lock
> > @@ -1055,7 +1058,6 @@ __weak struct cpufreq_governor 
> > *cpufreq_default_governor(void)
> >
> >  static int cpufreq_init_policy(struct cpufreq_policy *policy)
> >  {
> > - struct cpufreq_governor *def_gov = cpufreq_default_governor();
> >   struct cpufreq_governor *gov = NULL;
> >   unsigned int pol = CPUFREQ_POLICY_UNKNOWN;
> >
> > @@ -1065,8 +1067,8 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> > *policy)
> >   if (gov) {
> >   pr_debug("Restoring governor %s for cpu %d\n",
> >policy->governor->name, policy->cpu);
> > - } else if (def_gov) {
> > - gov = def_gov;
> > + } else if (default_governor) {
> > + gov = default_governor;
> >   } else {
> >   return -ENODATA;
> >   }
>
>
> > @@ -1074,8 +1076,8 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> > *policy)
> >   /* Use the default policy if there is no last_policy. */
> >   if (policy->last_policy) {
> >   pol = policy->last_policy;
> > - } else if (def_gov) {
> > - pol = cpufreq_parse_policy(def_gov->name);
> > + } else if (default_governor) {
> > + pol = cpufreq_parse_policy(default_governor->name);
>
> This change is not right IMO. This part handles the set-policy case,
> where there are no governors. Right now this code, for some reasons
> unknown to me, forcefully uses the default governor set to indicate
> the policy, which is not a great idea in my opinion TBH. This doesn't
> and shouldn't care about governor modules and should only be looking
> at strings instead of governor pointer.

Sounds right.

> Rafael, I even think we should remove this code completely and just
> rely on what the driver has sent to us. Using the selected governor
> for set policy drivers is very confusing and also we shouldn't be
> forced to compiling any governor for the set-policy case.

Well, AFAICS the idea was to use the default governor as a kind of
default policy proxy, but I agree that strings should be sufficient
for that.

I'll have a look at what to do with that code.


Re: [PATCH v2 1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings

2020-06-25 Thread Aneesh Kumar K.V
"Aneesh Kumar K.V"  writes:


> Fixing this includes 3 parts:
>
> - Re-walk the init_mm page tables from mem_init() and initialize
>   the PMD and PTE fragment count to 1.
> - When freeing PUD, PMD and PTE page table pages, check explicitly
>   if they come from memblock and if so free then appropriately.
> - When we do early memblock based allocation of PMD and PUD pages,
>   allocate in PAGE_SIZE granularity so that we are sure the
>   complete page is used as pagetable page.
>
> Since we now do PAGE_SIZE allocations for both PUD table and
> PMD table (Note that PTE table allocation is already of PAGE_SIZE),
> we end up allocating more memory for the same amount of system RAM.
> Here is a comparision of how much more we need for a 64T and 2G
> system after this patch:
>

Missed updating the commit message w.r.t page table fragments.  Updated
one below.

powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings

We can hit the following BUG_ON during memory unplug:

kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:342!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
NIP [c0093308] pmd_fragment_free+0x48/0xc0
LR [c147bfec] remove_pagetable+0x578/0x60c
Call Trace:
0xc0805000 (unreliable)
remove_pagetable+0x384/0x60c
radix__remove_section_mapping+0x18/0x2c
remove_section_mapping+0x1c/0x3c
arch_remove_memory+0x11c/0x180
try_remove_memory+0x120/0x1b0
__remove_memory+0x20/0x40
dlpar_remove_lmb+0xc0/0x114
dlpar_memory+0x8b0/0xb20
handle_dlpar_errorlog+0xc0/0x190
pseries_hp_work_fn+0x2c/0x60
process_one_work+0x30c/0x810
worker_thread+0x98/0x540
kthread+0x1c4/0x1d0
ret_from_kernel_thread+0x5c/0x74

This occurs when unplug is attempted for such memory which has
been mapped using memblock pages as part of early kernel page
table setup. We wouldn't have initialized the PMD or PTE fragment
count for those PMD or PTE pages.

This can be fixed by allocating memory in PAGE_SIZE granularity
during early page table allocation. This makes sure a specific
page is not shared for another memblock allocation and we can
free them correctly on removing page-table pages.

Since we now do PAGE_SIZE allocations for both PUD table and
PMD table (Note that PTE table allocation is already of PAGE_SIZE),
we end up allocating more memory for the same amount of system RAM.
Here is a comparision of how much more we need for a 64T and 2G
system after this patch:

1. 64T system
-
64T RAM would need 64G for vmemmap with struct page size being 64B.

128 PUD tables for 64T memory (1G mappings)
1 PUD table and 64 PMD tables for 64G vmemmap (2M mappings)

With default PUD[PMD]_TABLE_SIZE(4K), (128+1+64)*4K=772K
With PAGE_SIZE(64K) table allocations, (128+1+64)*64K=12352K

2. 2G system

2G RAM would need 2M for vmemmap with struct page size being 64B.

1 PUD table for 2G memory (1G mapping)
1 PUD table and 1 PMD table for 2M vmemmap (2M mappings)

With default PUD[PMD]_TABLE_SIZE(4K), (1+1+1)*4K=12K
With new PAGE_SIZE(64K) table allocations, (1+1+1)*64K=192K

Signed-off-by: Bharata B Rao 
Signed-off-by: Aneesh Kumar K.V 

-aneesh


Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line

2020-06-25 Thread Rafael J. Wysocki
On Thu, Jun 25, 2020 at 10:50 AM Viresh Kumar  wrote:
>
> On 24-06-20, 16:32, Quentin Perret wrote:
> > Right, but I must admit that, looking at this more, I'm getting a bit
> > confused with the overall locking for governors :/
> >
> > When in cpufreq_init_policy() we find a governor using
> > find_governor(policy->last_governor), what guarantees this governor is
> > not concurrently unregistered? That is, what guarantees this governor
> > doesn't go away between that find_governor() call, and the subsequent
> > call to try_module_get() in cpufreq_set_policy() down the line?
> >
> > Can we somewhat assume that whatever governor is referred to by
> > policy->last_governor will have a non-null refcount? Or are the
> > cpufreq_online() and cpufreq_unregister_governor() path mutually
> > exclusive? Or is there something else?
>
> This should be sufficient to fix pending issues I believe. Based over your
> patches.

LGTM, but can you post it in a new thread to let Patchwork pick it up?

> -8<-
> From: Viresh Kumar 
> Date: Thu, 25 Jun 2020 13:15:23 +0530
> Subject: [PATCH] cpufreq: Fix locking issues with governors
>
> The locking around governors handling isn't adequate currently. The list
> of governors should never be traversed without locking in place. Also we
> must make sure the governor isn't removed while it is still referenced
> by code.
>
> Reported-by: Quentin Perret 
> Signed-off-by: Viresh Kumar 
> ---
>  drivers/cpufreq/cpufreq.c | 59 ---
>  1 file changed, 36 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4b1a5c0173cf..dad6b85f4c89 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -624,6 +624,24 @@ static struct cpufreq_governor *find_governor(const char 
> *str_governor)
> return NULL;
>  }
>
> +static struct cpufreq_governor *get_governor(const char *str_governor)
> +{
> +   struct cpufreq_governor *t;
> +
> +   mutex_lock(_governor_mutex);
> +   t = find_governor(str_governor);
> +   if (!t)
> +   goto unlock;
> +
> +   if (!try_module_get(t->owner))
> +   t = NULL;
> +
> +unlock:
> +   mutex_unlock(_governor_mutex);
> +
> +   return t;
> +}
> +
>  static unsigned int cpufreq_parse_policy(char *str_governor)
>  {
> if (!strncasecmp(str_governor, "performance", CPUFREQ_NAME_LEN))
> @@ -643,28 +661,14 @@ static struct cpufreq_governor 
> *cpufreq_parse_governor(char *str_governor)
>  {
> struct cpufreq_governor *t;
>
> -   mutex_lock(_governor_mutex);
> -
> -   t = find_governor(str_governor);
> -   if (!t) {
> -   int ret;
> -
> -   mutex_unlock(_governor_mutex);
> -
> -   ret = request_module("cpufreq_%s", str_governor);
> -   if (ret)
> -   return NULL;
> -
> -   mutex_lock(_governor_mutex);
> +   t = get_governor(str_governor);
> +   if (t)
> +   return t;
>
> -   t = find_governor(str_governor);
> -   }
> -   if (t && !try_module_get(t->owner))
> -   t = NULL;
> -
> -   mutex_unlock(_governor_mutex);
> +   if (request_module("cpufreq_%s", str_governor))
> +   return NULL;
>
> -   return t;
> +   return get_governor(str_governor);
>  }
>
>  /**
> @@ -818,12 +822,14 @@ static ssize_t show_scaling_available_governors(struct 
> cpufreq_policy *policy,
> goto out;
> }
>
> +   mutex_lock(_governor_mutex);
> for_each_governor(t) {
> if (i >= (ssize_t) ((PAGE_SIZE / sizeof(char))
> - (CPUFREQ_NAME_LEN + 2)))
> -   goto out;
> +   break;
> i += scnprintf([i], CPUFREQ_NAME_PLEN, "%s ", t->name);
> }
> +   mutex_unlock(_governor_mutex);
>  out:
> i += sprintf([i], "\n");
> return i;
> @@ -1060,11 +1066,14 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> *policy)
>  {
> struct cpufreq_governor *gov = NULL;
> unsigned int pol = CPUFREQ_POLICY_UNKNOWN;
> +   bool put_governor = false;
> +   int ret;
>
> if (has_target()) {
> /* Update policy governor to the one used before hotplug. */
> -   gov = find_governor(policy->last_governor);
> +   gov = get_governor(policy->last_governor);
> if (gov) {
> +   put_governor = true;
> pr_debug("Restoring governor %s for cpu %d\n",
>  policy->governor->name, policy->cpu);
> } else if (default_governor) {
> @@ -1091,7 +1100,11 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> *policy)
> return -ENODATA;
> }
>
> -   return cpufreq_set_policy(policy, gov, pol);
> +

Re: [PATCH] powerpc: Warn about use of smt_snooze_delay

2020-06-25 Thread Christophe Leroy




Le 25/06/2020 à 12:03, Joel Stanley a écrit :

It's not done anything for a long time. Save the percpu variable, and
emit a warning to remind users to not expect it to do anything.


Why not just drop the file entirely  if it is useless ?

Christophe



Signed-off-by: Joel Stanley 
---
  arch/powerpc/kernel/sysfs.c | 41 +
  1 file changed, 14 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 571b3259697e..530ae92bc46d 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -32,29 +32,25 @@
  
  static DEFINE_PER_CPU(struct cpu, cpu_devices);
  
-/*

- * SMT snooze delay stuff, 64-bit only for now
- */
-
  #ifdef CONFIG_PPC64
  
-/* Time in microseconds we delay before sleeping in the idle loop */

-static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
+/*
+ * Snooze delay has not been hooked up since 3fa8cad82b94 
("powerpc/pseries/cpuidle:
+ * smt-snooze-delay cleanup.") and has been broken even longer. As was 
foretold in
+ * 2014:
+ *
+ *  "ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean
+ *  up the kernel code."
+ *
+ * At some point in the future this code should be removed.
+ */
  
  static ssize_t store_smt_snooze_delay(struct device *dev,

  struct device_attribute *attr,
  const char *buf,
  size_t count)
  {
-   struct cpu *cpu = container_of(dev, struct cpu, dev);
-   ssize_t ret;
-   long snooze;
-
-   ret = sscanf(buf, "%ld", );
-   if (ret != 1)
-   return -EINVAL;
-
-   per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
+   WARN_ON_ONCE("smt_snooze_delay sysfs file has no effect\n");
return count;
  }
  
@@ -62,9 +58,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,

 struct device_attribute *attr,
 char *buf)
  {
-   struct cpu *cpu = container_of(dev, struct cpu, dev);
+   WARN_ON_ONCE("smt_snooze_delay sysfs file has no effect\n");
  
-	return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));

+   return sprintf(buf, "100\n");
  }
  
  static DEVICE_ATTR(smt_snooze_delay, 0644, show_smt_snooze_delay,

@@ -72,16 +68,7 @@ static DEVICE_ATTR(smt_snooze_delay, 0644, 
show_smt_snooze_delay,
  
  static int __init setup_smt_snooze_delay(char *str)

  {
-   unsigned int cpu;
-   long snooze;
-
-   if (!cpu_has_feature(CPU_FTR_SMT))
-   return 1;
-
-   snooze = simple_strtol(str, NULL, 10);
-   for_each_possible_cpu(cpu)
-   per_cpu(smt_snooze_delay, cpu) = snooze;
-
+   WARN_ON_ONCE("smt-snooze-delay command line option has no effect\n");
return 1;
  }
  __setup("smt-snooze-delay=", setup_smt_snooze_delay);



Re: FSL P5020/P5040: DPAA Ethernet issue with the latest Git kernel

2020-06-25 Thread Alexander Gordeev
On Thu, Jun 25, 2020 at 01:01:52AM +0200, Christian Zigotzky wrote:
[...]
> I compiled a test kernel with the option "CONFIG_TEST_BITMAP=y"
> yesterday. After that Skateman and I booted it and looked for the
> bitmap tests with "dmesg | grep -i bitmap".
> 
> Results:
> 
> FSL P5020:
> 
> [    0.297756] test_bitmap: loaded.
> [    0.298113] test_bitmap: parselist: 14: input is '0-2047:128/256'
> OK, Time: 562
> [    0.298142] test_bitmap: parselist_user: 14: input is
> '0-2047:128/256' OK, Time: 761
> [    0.301553] test_bitmap: all 1663 tests passed
> 
> FSL P5040:
> 
> [    0.296563] test_bitmap: loaded.
> [    0.296894] test_bitmap: parselist: 14: input is '0-2047:128/256'
> OK, Time: 540
> [    0.296920] test_bitmap: parselist_user: 14: input is
> '0-2047:128/256' OK, Time: 680
> [    0.24] test_bitmap: all 1663 tests passed

Thanks for the test! So it works as expected.

I would suggest to compare what is going on on the device probing
with and without the bisected commit.

There seems to be MAC and PHY mode initialization issue that might
resulted from the bitmap format change.

I put Madalin and Sascha on CC as they have done some works on
this part recently.

Thanks!


> Thanks,
> Christian


[PATCH] powerpc: Warn about use of smt_snooze_delay

2020-06-25 Thread Joel Stanley
It's not done anything for a long time. Save the percpu variable, and
emit a warning to remind users to not expect it to do anything.

Signed-off-by: Joel Stanley 
---
 arch/powerpc/kernel/sysfs.c | 41 +
 1 file changed, 14 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 571b3259697e..530ae92bc46d 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -32,29 +32,25 @@
 
 static DEFINE_PER_CPU(struct cpu, cpu_devices);
 
-/*
- * SMT snooze delay stuff, 64-bit only for now
- */
-
 #ifdef CONFIG_PPC64
 
-/* Time in microseconds we delay before sleeping in the idle loop */
-static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 };
+/*
+ * Snooze delay has not been hooked up since 3fa8cad82b94 
("powerpc/pseries/cpuidle:
+ * smt-snooze-delay cleanup.") and has been broken even longer. As was 
foretold in
+ * 2014:
+ *
+ *  "ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean
+ *  up the kernel code."
+ *
+ * At some point in the future this code should be removed.
+ */
 
 static ssize_t store_smt_snooze_delay(struct device *dev,
  struct device_attribute *attr,
  const char *buf,
  size_t count)
 {
-   struct cpu *cpu = container_of(dev, struct cpu, dev);
-   ssize_t ret;
-   long snooze;
-
-   ret = sscanf(buf, "%ld", );
-   if (ret != 1)
-   return -EINVAL;
-
-   per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
+   WARN_ON_ONCE("smt_snooze_delay sysfs file has no effect\n");
return count;
 }
 
@@ -62,9 +58,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
 struct device_attribute *attr,
 char *buf)
 {
-   struct cpu *cpu = container_of(dev, struct cpu, dev);
+   WARN_ON_ONCE("smt_snooze_delay sysfs file has no effect\n");
 
-   return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
+   return sprintf(buf, "100\n");
 }
 
 static DEVICE_ATTR(smt_snooze_delay, 0644, show_smt_snooze_delay,
@@ -72,16 +68,7 @@ static DEVICE_ATTR(smt_snooze_delay, 0644, 
show_smt_snooze_delay,
 
 static int __init setup_smt_snooze_delay(char *str)
 {
-   unsigned int cpu;
-   long snooze;
-
-   if (!cpu_has_feature(CPU_FTR_SMT))
-   return 1;
-
-   snooze = simple_strtol(str, NULL, 10);
-   for_each_possible_cpu(cpu)
-   per_cpu(smt_snooze_delay, cpu) = snooze;
-
+   WARN_ON_ONCE("smt-snooze-delay command line option has no effect\n");
return 1;
 }
 __setup("smt-snooze-delay=", setup_smt_snooze_delay);
-- 
2.27.0



Re: PowerPC KVM-PR issue

2020-06-25 Thread Christian Zigotzky

On 15 June 2020 at 01:39 pm, Christian Zigotzky wrote:

On 14 June 2020 at 04:52 pm, Christian Zigotzky wrote:

On 14 June 2020 at 02:53 pm, Nicholas Piggin wrote:

Excerpts from Christian Zigotzky's message of June 12, 2020 11:01 pm:

On 11 June 2020 at 04:47 pm, Christian Zigotzky wrote:

On 10 June 2020 at 01:23 pm, Christian Zigotzky wrote:

On 10 June 2020 at 11:06 am, Christian Zigotzky wrote:

On 10 June 2020 at 00:18 am, Christian Zigotzky wrote:

Hello,

KVM-PR doesn't work anymore on my Nemo board [1]. I figured out
that the Git kernels and the kernel 5.7 are affected.

Error message: Fienix kernel: kvmppc_exit_pr_progint: emulation at
700 failed ()

I can boot virtual QEMU PowerPC machines with KVM-PR with the
kernel 5.6 without any problems on my Nemo board.

I tested it with QEMU 2.5.0 and QEMU 5.0.0 today.

Could you please check KVM-PR on your PowerPC machine?

Thanks,
Christian

[1] https://en.wikipedia.org/wiki/AmigaOne_X1000
I figured out that the PowerPC updates 5.7-1 [1] are responsible 
for

the KVM-PR issue. Please test KVM-PR on your PowerPC machines and
check the PowerPC updates 5.7-1 [1].

Thanks

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d38c07afc356ddebaa3ed8ecb3f553340e05c969 





I tested the latest Git kernel with Mac-on-Linux/KVM-PR today.
Unfortunately I can't use KVM-PR with MoL anymore because of this
issue (see screenshots [1]). Please check the PowerPC updates 5.7-1.

Thanks

[1]
-
https://i.pinimg.com/originals/0c/b3/64/0cb364a40241fa2b7f297d4272bbb8b7.png 


-
https://i.pinimg.com/originals/9a/61/d1/9a61d170b1c9f514f7a78a3014ffd18f.png 




Hi All,

I bisected today because of the KVM-PR issue.

Result:

9600f261acaaabd476d7833cec2dd20f2919f1a0 is the first bad commit
commit 9600f261acaaabd476d7833cec2dd20f2919f1a0
Author: Nicholas Piggin 
Date:   Wed Feb 26 03:35:21 2020 +1000

 powerpc/64s/exception: Move KVM test to common code

 This allows more code to be moved out of unrelocated regions. 
The
 system call KVMTEST is changed to be open-coded and remain in 
the

 tramp area to avoid having to move it to entry_64.S. The custom
nature
 of the system call entry code means the hcall case can be 
made more

 streamlined than regular interrupt handlers.

 mpe: Incorporate fix from Nick:

 Moving KVM test to the common entry code missed the case of 
HMI and
 MCE, which do not do __GEN_COMMON_ENTRY (because they don't 
want to

 switch to virt mode).

 This means a MCE or HMI exception that is taken while KVM is
running a
 guest context will not be switched out of that context, and 
KVM won't
 be notified. Found by running sigfuz in guest with patched 
host on
 POWER9 DD2.3, which causes some TM related HMI interrupts 
(which are

 expected and supposed to be handled by KVM).

 This fix adds a __GEN_REALMODE_COMMON_ENTRY for those 
handlers to add
 the KVM test. This makes them look a little more like other 
handlers

 that all use __GEN_COMMON_ENTRY.

 Signed-off-by: Nicholas Piggin 
 Signed-off-by: Michael Ellerman 
 Link:
https://lore.kernel.org/r/20200225173541.1549955-13-npig...@gmail.com

:04 04 ec21cec22d165f8696d69532734cb2985d532cb0
87dd49a9cd7202ec79350e8ca26cea01f1dbd93d M    arch

-

The following commit is the problem: powerpc/64s/exception: Move KVM
test to common code [1]

These changes were included in the PowerPC updates 5.7-1. [2]

Another test:

git checkout d38c07afc356ddebaa3ed8ecb3f553340e05c969 (PowerPC 
updates

5.7-1 [2] ) -> KVM-PR doesn't work.

After that: git revert d38c07afc356ddebaa3ed8ecb3f553340e05c969 -m 1
-> KVM-PR works.

Could you please check the first bad commit? [1]

Thanks,
Christian


[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9600f261acaaabd476d7833cec2dd20f2919f1a0 


[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d38c07afc356ddebaa3ed8ecb3f553340e05c969 


Hi All,

I tried to revert the __GEN_REALMODE_COMMON_ENTRY fix for the 
latest Git

kernel and for the stable kernel 5.7.2 but without any success. There
was lot of restructuring work during the kernel 5.7 development 
time in

the PowerPC area so it isn't possible reactivate the old code. That
means we have lost the whole KVM-PR support. I also reported this 
issue

to Alexander Graf two days ago. He wrote: "Howdy :). It looks pretty
broken. Have you ever made a bisect to see where the problem comes 
from?"


Please check the KVM-PR code.

Does this patch fix it for you?

The CTR register reload in the KVM interrupt path used the wrong save
area for SLB (and NMI) interrupts.

Fixes: 9600f261acaaa ("powerpc/64s/exception: Move KVM test to 
common code")

Reported-by: Christian Zigotzky 
Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/kernel/exceptions-64s.S | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 

Re: [PATCH 16/17] arch: remove HAVE_COPY_THREAD_TLS

2020-06-25 Thread Thomas Bogendoerfer
On Tue, Jun 23, 2020 at 01:43:25AM +0200, Christian Brauner wrote:
> All architectures support copy_thread_tls() now, so remove the legacy
> copy_thread() function and the HAVE_COPY_THREAD_TLS config option. Everyone
> uses the same process creation calling convention based on
> copy_thread_tls() and struct kernel_clone_args. This will make it easier to
> maintain the core process creation code under kernel/, simplifies the
> callpaths and makes the identical for all architectures.
> [..] 
>  arch/mips/Kconfig  |  1 -

Acked-by: Thomas Bogendoerfer 

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]


Re: [PATCH 17/17] arch: rename copy_thread_tls() back to copy_thread()

2020-06-25 Thread Thomas Bogendoerfer
On Tue, Jun 23, 2020 at 01:43:26AM +0200, Christian Brauner wrote:
> Now that HAVE_COPY_THREAD_TLS has been removed, rename copy_thread_tls()
> back simply copy_thread(). It's a simpler name, and doesn't imply that only
> tls is copied here. This finishes an outstanding chunk of internal process
> creation work since we've added clone3().
> [..]
>  arch/mips/kernel/process.c   | 2 +-

Acked-by: Thomas Bogendoerfer 

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]


Re: [PATCH v5 01/13] powerpc: Remove Xilinx PPC405/PPC440 support

2020-06-25 Thread Joel Stanley
On Fri, 19 Jun 2020 at 11:02, Michael Ellerman  wrote:
>
> Nathan Chancellor  writes:
> >> It's kind of nuts that the zImage points to some arbitrary image
> >> depending on what's configured and the order of things in the Makefile.
> >> But I'm not sure how we make it less nuts without risking breaking
> >> people's existing setups.
> >
> > Hi Michael,
> >
> > For what it's worth, this is squared this away in terms of our CI by
> > just building and booting the uImage directly, rather than implicitly
> > using the zImage:
> >
> > https://github.com/ClangBuiltLinux/continuous-integration/pull/282
> > https://github.com/ClangBuiltLinux/boot-utils/pull/22
>
> Great.
>
> > We were only using the zImage because that is what Joel Stanley intially
> > set us up with when PowerPC 32-bit was added to our CI:
> >
> > https://github.com/ClangBuiltLinux/continuous-integration/pull/100
>
> Ah, so Joel owes us all beers then ;)

Hey, you owe me beers for finding broken machines!

This machine was picked from a vague discussion on an internal chat.
The two requirements were that it would build, and boot in qemu.

If there's a better supported 32 bit machine then we should switch the
CI over. We don't want the Clang CI to be the only user and give the
false impression that someone out there is still booting upstream
kernels on it.

> > Admittedly, we really do not have many PowerPC experts in our
> > organization so we are supporting it on a "best effort" basis, which
> > often involves using whatever knowledge is floating around or can be
> > gained from interactions such as this :) so thank you for that!
>
> No worries. I definitely don't expect you folks to invest much effort in
> powerpc, especially the old 32-bit stuff, so always happy to help debug
> things, and really appreciate the testing you do.

+1

Cheers,

Joel


Re: [PATCH v4 6/8] arm: Break cyclic percpu include

2020-06-25 Thread Will Deacon
On Wed, Jun 24, 2020 at 07:53:20PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 23, 2020 at 10:02:57AM +0100, Will Deacon wrote:
> > On Tue, Jun 23, 2020 at 10:36:51AM +0200, Peter Zijlstra wrote:
> > > In order to use  in irqflags.h, we need to make sure
> > > asm/percpu.h does not itself depend on irqflags.h.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > > ---
> > >  arch/arm/include/asm/percpu.h |2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > --- a/arch/arm/include/asm/percpu.h
> > > +++ b/arch/arm/include/asm/percpu.h
> > > @@ -10,6 +10,8 @@
> > >   * in the TPIDRPRW. TPIDRPRW only exists on V6K and V7
> > >   */
> > >  #if defined(CONFIG_SMP) && !defined(CONFIG_CPU_V6)
> > > +register unsigned long current_stack_pointer asm ("sp");
> > 
> > If you define this unconditionally, then we can probably get rid of the
> > copy in asm/thread_info.h, rather than duplicate the same #define.
> 
> The below delta seems to build arm-allnoconfig, arm-defconfig and
> arm-allmodconfig.
> 
> Although please don't ask me how asm/thread_info.h includes asm/percpu.h
> 
> Does that work for you?

Yes, thanks! I can't believe you removed the helpful comment.

> -/*
> - * how to get the current stack pointer in C
> - */

Will


[PATCH v2 4/4] powerpc/mm/radix: Create separate mappings for hot-plugged memory

2020-06-25 Thread Aneesh Kumar K.V
To enable memory unplug without splitting kernel page table
mapping, we force the max mapping size to the LMB size. LMB
size is the unit in which hypervisor will do memory add/remove
operation.

This implies on pseries system, we now end up mapping
memory with 2M page size instead of 1G. To improve
that we want hypervisor to hint the kernel about the hotplug
memory range.  This was added that as part of

commit b6eca183e23e ("powerpc/kernel: Enables memory
hot-remove after reboot on pseries guests")

But we still don't do that on PowerVM. Once we get PowerVM
updated, we can then force the 2M mapping only to hot-pluggable
memory region using memblock_is_hotpluggable(). Till then
let's depend on LMB size for finding the mapping page size
for linear range.

Signed-off-by: Bharata B Rao 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 83 
 arch/powerpc/platforms/powernv/setup.c   | 10 ++-
 2 files changed, 81 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 78ad11812e0d..a4179e4da49d 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,6 +35,7 @@
 
 unsigned int mmu_pid_bits;
 unsigned int mmu_base_pid;
+unsigned int radix_mem_block_size;
 
 static __ref void *early_alloc_pgtable(unsigned long size, int nid,
unsigned long region_start, unsigned long region_end)
@@ -266,6 +268,7 @@ static unsigned long next_boundary(unsigned long addr, 
unsigned long end)
 
 static int __meminit create_physical_mapping(unsigned long start,
 unsigned long end,
+unsigned long max_mapping_size,
 int nid, pgprot_t _prot)
 {
unsigned long vaddr, addr, mapping_size = 0;
@@ -279,6 +282,8 @@ static int __meminit create_physical_mapping(unsigned long 
start,
int rc;
 
gap = next_boundary(addr, end) - addr;
+   if (gap > max_mapping_size)
+   gap = max_mapping_size;
previous_size = mapping_size;
prev_exec = exec;
 
@@ -329,8 +334,9 @@ static void __init radix_init_pgtable(void)
 
/* We don't support slb for radix */
mmu_slb_size = 0;
+
/*
-* Create the linear mapping, using standard page size for now
+* Create the linear mapping
 */
for_each_memblock(memory, reg) {
/*
@@ -346,6 +352,7 @@ static void __init radix_init_pgtable(void)
 
WARN_ON(create_physical_mapping(reg->base,
reg->base + reg->size,
+   radix_mem_block_size,
-1, PAGE_KERNEL));
}
 
@@ -486,6 +493,49 @@ static int __init radix_dt_scan_page_sizes(unsigned long 
node,
return 1;
 }
 
+static int __init probe_memory_block_size(unsigned long node, const char 
*uname, int
+ depth, void *data)
+{
+   const __be32 *block_size;
+   int len;
+
+   if (depth != 1)
+   return 0;
+
+   if (!strcmp(uname, "ibm,dynamic-reconfiguration-memory")) {
+
+   block_size = of_get_flat_dt_prop(node, "ibm,lmb-size", );
+   if (!block_size || len < dt_root_size_cells * sizeof(__be32))
+   /*
+* Nothing in the device tree
+*/
+   return MIN_MEMORY_BLOCK_SIZE;
+
+   return dt_mem_next_cell(dt_root_size_cells, _size);
+
+   }
+
+   return 0;
+}
+
+static unsigned long radix_memory_block_size(void)
+{
+   unsigned long mem_block_size = MIN_MEMORY_BLOCK_SIZE;
+
+   if (firmware_has_feature(FW_FEATURE_OPAL)) {
+
+   mem_block_size = 1UL * 1024 * 1024 * 1024;
+
+   } else if (firmware_has_feature(FW_FEATURE_LPAR)) {
+   mem_block_size = of_scan_flat_dt(probe_memory_block_size, NULL);
+   if (!mem_block_size)
+   mem_block_size = MIN_MEMORY_BLOCK_SIZE;
+   }
+
+   return mem_block_size;
+}
+
+
 void __init radix__early_init_devtree(void)
 {
int rc;
@@ -494,17 +544,27 @@ void __init radix__early_init_devtree(void)
 * Try to find the available page sizes in the device-tree
 */
rc = of_scan_flat_dt(radix_dt_scan_page_sizes, NULL);
-   if (rc != 0)  /* Found */
-   goto found;
+   if (rc == 0) {
+   /*
+* no page size details found in device tree
+* let's assume we have page 4k and 64k support
+*/
+   mmu_psize_defs[MMU_PAGE_4K].shift = 12;
+

[PATCH v2 3/4] powerpc/mm/radix: Remove split_kernel_mapping()

2020-06-25 Thread Aneesh Kumar K.V
From: Bharata B Rao 

We split the page table mapping on memory unplug if the
linear range was mapped with huge page mapping (for ex: 1G)
The page table splitting code has a few issues:

1. Recursive locking

Memory unplug path takes cpu_hotplug_lock and calls stop_machine()
for splitting the mappings. However stop_machine() takes
cpu_hotplug_lock again causing deadlock.

2. BUG: sleeping function called from in_atomic() context
-
Memory unplug path (remove_pagetable) takes init_mm.page_table_lock
spinlock and later calls stop_machine() which does wait_for_completion()

3. Bad unlock unbalance
---
Memory unplug path takes init_mm.page_table_lock spinlock and calls
stop_machine(). The stop_machine thread function runs in a different
thread context (migration thread) which tries to release and reaquire
ptl. Releasing ptl from a different thread than which acquired it
causes bad unlock unbalance.

These problems can be avoided if we avoid mapping hot-plugged memory
with 1G mapping, thereby removing the need for splitting them during
unplug. The kernel always make sure the minimum unplug request is
SUBSECTION_SIZE for device memory and SECTION_SIZE for regular memory.

In preparation for such a change remove page table splitting support.

This essentially is a revert of
commit 4dd5f8a99e791 ("powerpc/mm/radix: Split linear mapping on hot-unplug")

Signed-off-by: Bharata B Rao 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 95 +---
 1 file changed, 19 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index af57604f295f..78ad11812e0d 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -723,32 +722,6 @@ static void free_pud_table(pud_t *pud_start, p4d_t *p4d)
p4d_clear(p4d);
 }
 
-struct change_mapping_params {
-   pte_t *pte;
-   unsigned long start;
-   unsigned long end;
-   unsigned long aligned_start;
-   unsigned long aligned_end;
-};
-
-static int __meminit stop_machine_change_mapping(void *data)
-{
-   struct change_mapping_params *params =
-   (struct change_mapping_params *)data;
-
-   if (!data)
-   return -1;
-
-   spin_unlock(_mm.page_table_lock);
-   pte_clear(_mm, params->aligned_start, params->pte);
-   create_physical_mapping(__pa(params->aligned_start),
-   __pa(params->start), -1, PAGE_KERNEL);
-   create_physical_mapping(__pa(params->end), __pa(params->aligned_end),
-   -1, PAGE_KERNEL);
-   spin_lock(_mm.page_table_lock);
-   return 0;
-}
-
 static void remove_pte_table(pte_t *pte_start, unsigned long addr,
 unsigned long end)
 {
@@ -777,52 +750,6 @@ static void remove_pte_table(pte_t *pte_start, unsigned 
long addr,
}
 }
 
-/*
- * clear the pte and potentially split the mapping helper
- */
-static void __meminit split_kernel_mapping(unsigned long addr, unsigned long 
end,
-   unsigned long size, pte_t *pte)
-{
-   unsigned long mask = ~(size - 1);
-   unsigned long aligned_start = addr & mask;
-   unsigned long aligned_end = addr + size;
-   struct change_mapping_params params;
-   bool split_region = false;
-
-   if ((end - addr) < size) {
-   /*
-* We're going to clear the PTE, but not flushed
-* the mapping, time to remap and flush. The
-* effects if visible outside the processor or
-* if we are running in code close to the
-* mapping we cleared, we are in trouble.
-*/
-   if (overlaps_kernel_text(aligned_start, addr) ||
-   overlaps_kernel_text(end, aligned_end)) {
-   /*
-* Hack, just return, don't pte_clear
-*/
-   WARN_ONCE(1, "Linear mapping %lx->%lx overlaps kernel "
- "text, not splitting\n", addr, end);
-   return;
-   }
-   split_region = true;
-   }
-
-   if (split_region) {
-   params.pte = pte;
-   params.start = addr;
-   params.end = end;
-   params.aligned_start = addr & ~(size - 1);
-   params.aligned_end = min_t(unsigned long, aligned_end,
-   (unsigned long)__va(memblock_end_of_DRAM()));
-   stop_machine(stop_machine_change_mapping, , NULL);
-   return;
-   }
-
-   pte_clear(_mm, addr, pte);
-}
-
 static void remove_pmd_table(pmd_t *pmd_start, unsigned 

[PATCH v2 1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings

2020-06-25 Thread Aneesh Kumar K.V
We can hit the following BUG_ON during memory unplug:

kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:342!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
NIP [c0093308] pmd_fragment_free+0x48/0xc0
LR [c147bfec] remove_pagetable+0x578/0x60c
Call Trace:
0xc0805000 (unreliable)
remove_pagetable+0x384/0x60c
radix__remove_section_mapping+0x18/0x2c
remove_section_mapping+0x1c/0x3c
arch_remove_memory+0x11c/0x180
try_remove_memory+0x120/0x1b0
__remove_memory+0x20/0x40
dlpar_remove_lmb+0xc0/0x114
dlpar_memory+0x8b0/0xb20
handle_dlpar_errorlog+0xc0/0x190
pseries_hp_work_fn+0x2c/0x60
process_one_work+0x30c/0x810
worker_thread+0x98/0x540
kthread+0x1c4/0x1d0
ret_from_kernel_thread+0x5c/0x74

This occurs when unplug is attempted for such memory which has
been mapped using memblock pages as part of early kernel page
table setup. We wouldn't have initialized the PMD or PTE fragment
count for those PMD or PTE pages.

Fixing this includes 3 parts:

- Re-walk the init_mm page tables from mem_init() and initialize
  the PMD and PTE fragment count to 1.
- When freeing PUD, PMD and PTE page table pages, check explicitly
  if they come from memblock and if so free then appropriately.
- When we do early memblock based allocation of PMD and PUD pages,
  allocate in PAGE_SIZE granularity so that we are sure the
  complete page is used as pagetable page.

Since we now do PAGE_SIZE allocations for both PUD table and
PMD table (Note that PTE table allocation is already of PAGE_SIZE),
we end up allocating more memory for the same amount of system RAM.
Here is a comparision of how much more we need for a 64T and 2G
system after this patch:

1. 64T system
-
64T RAM would need 64G for vmemmap with struct page size being 64B.

128 PUD tables for 64T memory (1G mappings)
1 PUD table and 64 PMD tables for 64G vmemmap (2M mappings)

With default PUD[PMD]_TABLE_SIZE(4K), (128+1+64)*4K=772K
With PAGE_SIZE(64K) table allocations, (128+1+64)*64K=12352K

2. 2G system

2G RAM would need 2M for vmemmap with struct page size being 64B.

1 PUD table for 2G memory (1G mapping)
1 PUD table and 1 PMD table for 2M vmemmap (2M mappings)

With default PUD[PMD]_TABLE_SIZE(4K), (1+1+1)*4K=12K
With new PAGE_SIZE(64K) table allocations, (1+1+1)*64K=192K

Signed-off-by: Bharata B Rao 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 16 +++-
 arch/powerpc/mm/book3s64/pgtable.c   |  5 -
 arch/powerpc/mm/book3s64/radix_pgtable.c | 15 +++
 arch/powerpc/mm/pgtable-frag.c   |  3 +++
 4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 69c5b051734f..e1af0b394ceb 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -107,9 +107,23 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, 
unsigned long addr)
return pud;
 }
 
+static inline void __pud_free(pud_t *pud)
+{
+   struct page *page = virt_to_page(pud);
+
+   /*
+* Early pud pages allocated via memblock allocator
+* can't be directly freed to slab
+*/
+   if (PageReserved(page))
+   free_reserved_page(page);
+   else
+   kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), pud);
+}
+
 static inline void pud_free(struct mm_struct *mm, pud_t *pud)
 {
-   kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), pud);
+   return __pud_free(pud);
 }
 
 static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index c58ad1049909..85de5b574dd7 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -339,6 +339,9 @@ void pmd_fragment_free(unsigned long *pmd)
 {
struct page *page = virt_to_page(pmd);
 
+   if (PageReserved(page))
+   return free_reserved_page(page);
+
BUG_ON(atomic_read(>pt_frag_refcount) <= 0);
if (atomic_dec_and_test(>pt_frag_refcount)) {
pgtable_pmd_page_dtor(page);
@@ -356,7 +359,7 @@ static inline void pgtable_free(void *table, int index)
pmd_fragment_free(table);
break;
case PUD_INDEX:
-   kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), table);
+   __pud_free(table);
break;
 #if defined(CONFIG_PPC_4K_PAGES) && defined(CONFIG_HUGETLB_PAGE)
/* 16M hugepd directory at pud level */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 8acb96de0e48..80be96d3018f 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -57,6 +57,13 @@ static __ref void *early_alloc_pgtable(unsigned long size, 
int nid,

[PATCH v2 0/4] powerpc/mm/radix: Memory unplug fixes

2020-06-25 Thread Aneesh Kumar K.V
This is the next version of the fixes for memory unplug on radix.
The issues and the fix are described in the actual patches.

Changes from v1:
- Added back patch to drop split_kernel_mapping
- Most of the split_kernel_mapping related issues are now described
  in the removal patch
- drop pte fragment change
- use lmb size as the max mapping size.
- Radix baremetal now use memory block size of 1G.


Changes from v0:
==
- Rebased to latest kernel.
- Took care of p4d changes.
- Addressed Aneesh's review feedback:
 - Added comments.
 - Indentation fixed.
- Dropped the 1st patch (setting DRCONF_MEM_HOTREMOVABLE lmb flags) as
  it is debatable if this flag should be set in the device tree by OS
  and not by platform in case of hotplug. This can be looked at separately.
  (The fixes in this patchset remain valid without the dropped patch)
- Dropped the last patch that removed split_kernel_mapping() to ensure
  that spilitting code is available for any radix guest running on
  platforms that don't set DRCONF_MEM_HOTREMOVABLE.


Aneesh Kumar K.V (2):
  powerpc/mm/radix: Fix PTE/PMD fragment count for early page table
mappings
  powerpc/mm/radix: Create separate mappings for hot-plugged memory

Bharata B Rao (2):
  powerpc/mm/radix: Free PUD table when freeing pagetable
  powerpc/mm/radix: Remove split_kernel_mapping()

 arch/powerpc/include/asm/book3s/64/pgalloc.h |  16 +-
 arch/powerpc/mm/book3s64/pgtable.c   |   5 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c | 199 +++
 arch/powerpc/mm/pgtable-frag.c   |   3 +
 arch/powerpc/platforms/powernv/setup.c   |  10 +-
 5 files changed, 144 insertions(+), 89 deletions(-)

-- 
2.26.2