Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts

2020-12-02 Thread kernel test robot
Hi Alexey,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.10-rc6 next-20201201]
[cannot apply to powerpc/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Alexey-Kardashevskiy/powerpc-kuap-Restore-AMR-after-replaying-soft-interrupts/20201203-135010
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 34816d20f173a90389c8a7e641166d8ea9dce70a
config: powerpc-wii_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/d2a99260ef6d241abe6fb961ee3ed84bbc5db7f5
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Alexey-Kardashevskiy/powerpc-kuap-Restore-AMR-after-replaying-soft-interrupts/20201203-135010
git checkout d2a99260ef6d241abe6fb961ee3ed84bbc5db7f5
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/uaccess.h:9,
from include/linux/uaccess.h:11,
from include/linux/sched/task.h:11,
from include/linux/sched/signal.h:9,
from include/linux/rcuwait.h:6,
from include/linux/percpu-rwsem.h:7,
from include/linux/fs.h:33,
from include/linux/compat.h:17,
from arch/powerpc/kernel/asm-offsets.c:14:
   arch/powerpc/include/asm/kup.h: In function 'get_kuap':
>> arch/powerpc/include/asm/kup.h:19:26: warning: conversion from 'long long 
>> unsigned int' to 'long unsigned int' changes value from 
>> '13835058055282163712' to '0' [-Woverflow]
  19 | #define AMR_KUAP_BLOCKED (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
 |  ^
   arch/powerpc/include/asm/kup.h:73:9: note: in expansion of macro 
'AMR_KUAP_BLOCKED'
  73 |  return AMR_KUAP_BLOCKED;
 | ^~~~
--
   In file included from arch/powerpc/include/asm/uaccess.h:9,
from include/linux/uaccess.h:11,
from include/linux/sched/task.h:11,
from include/linux/sched/signal.h:9,
from include/linux/rcuwait.h:6,
from include/linux/percpu-rwsem.h:7,
from include/linux/fs.h:33,
from include/linux/huge_mm.h:8,
from include/linux/mm.h:687,
from include/linux/ring_buffer.h:5,
from include/linux/trace_events.h:6,
from include/trace/trace_events.h:21,
from include/trace/define_trace.h:102,
from include/trace/events/sched.h:656,
from kernel/sched/core.c:10:
   arch/powerpc/include/asm/kup.h: In function 'get_kuap':
>> arch/powerpc/include/asm/kup.h:19:26: warning: conversion from 'long long 
>> unsigned int' to 'long unsigned int' changes value from 
>> '13835058055282163712' to '0' [-Woverflow]
  19 | #define AMR_KUAP_BLOCKED (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
 |  ^
   arch/powerpc/include/asm/kup.h:73:9: note: in expansion of macro 
'AMR_KUAP_BLOCKED'
  73 |  return AMR_KUAP_BLOCKED;
 | ^~~~
   kernel/sched/core.c: In function 'ttwu_stat':
   kernel/sched/core.c:2419:13: warning: variable 'rq' set but not used 
[-Wunused-but-set-variable]
2419 |  struct rq *rq;
 | ^~
--
   In file included from arch/powerpc/include/asm/uaccess.h:9,
from include/linux/uaccess.h:11,
from include/linux/sched/task.h:11,
from include/linux/sched/signal.h:9,
from include/linux/sched/cputime.h:5,
from kernel/sched/sched.h:11,
from kernel/sched/fair.c:23:
   arch/powerpc/include/asm/kup.h: In function 'get_kuap':
>> arch/powerpc/include/asm/kup.h:19:26: warning: conversion from 'long long 
>> unsigned int' to 'long unsigned int' changes value from 
>> '13835058055282163712' to '0' [-Woverflow]
  19 | #define AMR_KUAP_BLOCKED (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
 |  ^
   arch/powerpc/include/asm/kup.h:73:9: note: in expansion of macro 
'AMR_KUAP_BLOCKED'
  73 |  return 

Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

2020-12-02 Thread kernel test robot
Hi Andy,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/x86/mm soc/for-next linus/master v5.10-rc6 
next-20201201]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Andy-Lutomirski/x86-mm-Lightweight-lazy-mm-refcounting/20201203-132706
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 238c91115cd05c71447ea071624a4c9fe661f970
config: m68k-randconfig-s032-20201203 (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-179-ga00755aa-dirty
# https://github.com/0day-ci/linux/commit/0d9b23b22e621d8e588095b4d0f9f39110a57901
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Andy-Lutomirski/x86-mm-Lightweight-lazy-mm-refcounting/20201203-132706
git checkout 0d9b23b22e621d8e588095b4d0f9f39110a57901
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=m68k

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   m68k-linux-ld: kernel/fork.o: in function `__mmput':
>> fork.c:(.text+0x214): undefined reference to `arch_fixup_lazy_mm_refs'
   m68k-linux-ld: kernel/fork.o: in function `mmput_async_fn':
   fork.c:(.text+0x990): undefined reference to `arch_fixup_lazy_mm_refs'
   m68k-linux-ld: kernel/fork.o: in function `mmput':
   fork.c:(.text+0xa68): undefined reference to `arch_fixup_lazy_mm_refs'
   m68k-linux-ld: kernel/fork.o: in function `dup_mm.isra.0':
   fork.c:(.text+0x1192): undefined reference to `arch_fixup_lazy_mm_refs'
   m68k-linux-ld: kernel/fork.o: in function `mm_access':
   fork.c:(.text+0x13c6): undefined reference to `arch_fixup_lazy_mm_refs'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts

2020-12-02 Thread Aneesh Kumar K.V
Alexey Kardashevskiy  writes:

> When interrupted in raw_copy_from_user()/... after user memory access
> is enabled, a nested handler may also access user memory (perf is
> one example) and when it does so, it calls prevent_read_from_user()
> which prevents the upper handler from accessing user memory.
>
> This saves/restores AMR when replaying interrupts.
>
> get_kuap/set_kuap have stubs for disabled KUAP on RADIX but there are
> none for hash-only configs (BOOK3E) so this adds stubs and moves
> AMR_KUAP_BLOCK_xxx.
>
> Found by syzkaller. More likely to break with enabled
> CONFIG_DEBUG_ATOMIC_SLEEP, the call chain is
> timer_interrupt -> ktime_get -> read_seqcount_begin -> local_irq_restore.

Can you test this with 
https://github.com/kvaneesh/linux/commits/hash-kuap-reworked-2

We do save/restore AMR on interrupt entry and exit.

-aneesh



Re: powerpc 5.10-rcN boot failures with RCU_SCALE_TEST=m

2020-12-02 Thread Michael Ellerman
Uladzislau Rezki  writes:
> On Thu, Dec 03, 2020 at 01:03:32AM +1100, Michael Ellerman wrote:
...
>> 
>> The SMP bringup stalls because _cpu_up() is blocked trying to take
>> cpu_hotplug_lock for writing:
>> 
>> [  401.403132][T0] task:swapper/0   state:D stack:12512 pid:1 
>> ppid: 0 flags:0x0800
>> [  401.403502][T0] Call Trace:
>> [  401.403907][T0] [c62c37d0] [c62c3830] 
>> 0xc62c3830 (unreliable)
>> [  401.404068][T0] [c62c39b0] [c0019d70] 
>> __switch_to+0x2e0/0x4a0
>> [  401.404189][T0] [c62c3a10] [c0b87228] 
>> __schedule+0x288/0x9b0
>> [  401.404257][T0] [c62c3ad0] [c0b879b8] 
>> schedule+0x68/0x120
>> [  401.404324][T0] [c62c3b00] [c0184ad4] 
>> percpu_down_write+0x164/0x170
>> [  401.404390][T0] [c62c3b50] [c0116b68] 
>> _cpu_up+0x68/0x280
>> [  401.404475][T0] [c62c3bb0] [c0116e70] 
>> cpu_up+0xf0/0x140
>> [  401.404546][T0] [c62c3c30] [c011776c] 
>> bringup_nonboot_cpus+0xac/0xf0
>> [  401.404643][T0] [c62c3c80] [c0eea1b8] 
>> smp_init+0x40/0xcc
>> [  401.404727][T0] [c62c3ce0] [c0ec43dc] 
>> kernel_init_freeable+0x1e0/0x3a0
>> [  401.404799][T0] [c62c3db0] [c0011ec4] 
>> kernel_init+0x24/0x150
>> [  401.404958][T0] [c62c3e20] [c000daf0] 
>> ret_from_kernel_thread+0x5c/0x6c
>> 
>> It can't get it because kprobe_optimizer() has taken it for read and is now
>> blocked waiting for synchronize_rcu_tasks():
>> 
>> [  401.418808][T0] task:kworker/0:1 state:D stack:13392 pid:   12 
>> ppid: 2 flags:0x0800
>> [  401.418951][T0] Workqueue: events kprobe_optimizer
>> [  401.419078][T0] Call Trace:
>> [  401.419121][T0] [c62ef650] [c62ef710] 
>> 0xc62ef710 (unreliable)
>> [  401.419213][T0] [c62ef830] [c0019d70] 
>> __switch_to+0x2e0/0x4a0
>> [  401.419281][T0] [c62ef890] [c0b87228] 
>> __schedule+0x288/0x9b0
>> [  401.419347][T0] [c62ef950] [c0b879b8] 
>> schedule+0x68/0x120
>> [  401.419415][T0] [c62ef980] [c0b8e664] 
>> schedule_timeout+0x2a4/0x340
>> [  401.419484][T0] [c62efa80] [c0b894ec] 
>> wait_for_completion+0x9c/0x170
>> [  401.419552][T0] [c62efae0] [c01ac85c] 
>> __wait_rcu_gp+0x19c/0x210
>> [  401.419619][T0] [c62efb40] [c01ac90c] 
>> synchronize_rcu_tasks_generic+0x3c/0x70
>> [  401.419690][T0] [c62efbe0] [c022a3dc] 
>> kprobe_optimizer+0x1dc/0x470
>> [  401.419757][T0] [c62efc60] [c0136684] 
>> process_one_work+0x2f4/0x530
>> [  401.419823][T0] [c62efd20] [c0138d28] 
>> worker_thread+0x78/0x570
>> [  401.419891][T0] [c62efdb0] [c0142424] 
>> kthread+0x194/0x1a0
>> [  401.419976][T0] [c62efe20] [c000daf0] 
>> ret_from_kernel_thread+0x5c/0x6c
>> 
>> But why is the synchronize_rcu_tasks() not completing?
>> 
> I think that it is because RCU is not fully initialized by that time.

Yeah that would explain it :)

> The 36dadef23fcc ("kprobes: Init kprobes in early_initcall") patch
> switches to early_initcall() that has a higher priority sequence than
> core_initcall() that is used to complete an RCU setup in the 
> rcu_set_runtime_mode().

I was looking at debug_lockdep_rcu_enabled(), which is:

noinstr int notrace debug_lockdep_rcu_enabled(void)
{
return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
   current->lockdep_recursion == 0;
}

That is not firing any warnings for me because rcu_scheduler_active is:

(gdb) p/x rcu_scheduler_active
$1 = 0x1

Which is:

#define RCU_SCHEDULER_INIT  1

But that's different to RCU_SCHEDULER_RUNNING, which is set in
rcu_set_runtime_mode() as you mentioned:

static int __init rcu_set_runtime_mode(void)
{
rcu_test_sync_prims();
rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
kfree_rcu_scheduler_running();
rcu_test_sync_prims();
return 0;
}

The comment on rcu_scheduler_active implies that once we're at
RCU_SCHEDULER_INIT things should work:

/*
 * The rcu_scheduler_active variable is initialized to the value
 * RCU_SCHEDULER_INACTIVE and transitions RCU_SCHEDULER_INIT just before the
 * first task is spawned.  So when this variable is RCU_SCHEDULER_INACTIVE,
 * RCU can assume that there is but one task, allowing RCU to (for example)
 * optimize synchronize_rcu() to a simple barrier().  When this variable
 * is RCU_SCHEDULER_INIT, RCU must actually do all the hard work required
 * to detect real grace periods.  This variable is also used to suppress
 * boot-time false positives from lockdep-RCU error checking.  Finally, it
 * transitions from RCU_SCHEDULER_INIT to RCU_SCHEDULER_RUNNING after RCU
 * is fully initialized, 

Re: [powerpc:next-test 121/184] arch/powerpc/kernel/firmware.c:31:9-10: WARNING: return of 0/1 in function 'check_kvm_guest' with return type bool

2020-12-02 Thread Srikar Dronamraju
Hi, 

Thanks for the report.

* kernel test robot  [2020-12-03 07:25:06]:

> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> next-test
> head:   fb003959777a635dea8910cf71109b612c7f940c
> commit: 77354ecf8473208a5cc5f20a501760f7d6d631cd [121/184] powerpc: Rename 
> is_kvm_guest() to check_kvm_guest()

Sorry for the nitpick, but the commit should have been
Commit 107c55005fbd ("powerpc/pseries: Add KVM guest doorbell restrictions")

because that's the patch that introduced is_kvm_guest.
My patch was just renaming is_kvm_guest to check_kvm_guest.

> config: powerpc-randconfig-c003-20201202 (attached as .config)
> compiler: powerpc64le-linux-gcc (GCC) 9.3.0
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> 
> 
> "coccinelle warnings: (new ones prefixed by >>)"
> >> arch/powerpc/kernel/firmware.c:31:9-10: WARNING: return of 0/1 in function 
> >> 'check_kvm_guest' with return type bool
> 
> Please review and possibly fold the followup patch.
> 

But I agree we can still fold the followup patch into the
"Rename is_kvm_guest() to check_kvm_guest()" commit.




-- 
Thanks and Regards
Srikar Dronamraju


[PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts

2020-12-02 Thread Alexey Kardashevskiy
When interrupted in raw_copy_from_user()/... after user memory access
is enabled, a nested handler may also access user memory (perf is
one example) and when it does so, it calls prevent_read_from_user()
which prevents the upper handler from accessing user memory.

This saves/restores AMR when replaying interrupts.

get_kuap/set_kuap have stubs for disabled KUAP on RADIX but there are
none for hash-only configs (BOOK3E) so this adds stubs and moves
AMR_KUAP_BLOCK_xxx.

Found by syzkaller. More likely to break with enabled
CONFIG_DEBUG_ATOMIC_SLEEP, the call chain is
timer_interrupt -> ktime_get -> read_seqcount_begin -> local_irq_restore.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* fixed compile on hash
* moved get/set to arch_local_irq_restore
* block KUAP before replaying


---

This is an example:

[ cut here ]
Bug: Read fault blocked by AMR!
WARNING: CPU: 0 PID: 1603 at 
/home/aik/p/kernel/arch/powerpc/include/asm/book3s/64/kup-radix.h:145 
__do_page_fau

Modules linked in:
CPU: 0 PID: 1603 Comm: amr Not tainted 5.10.0-rc6_v5.10-rc6_a+fstn1 #24
NIP:  c009ece8 LR: c009ece4 CTR: 
REGS: cdc63560 TRAP: 0700   Not tainted  (5.10.0-rc6_v5.10-rc6_a+fstn1)
MSR:  80021033   CR: 28002888  XER: 2004
CFAR: c01fa928 IRQMASK: 1
GPR00: c009ece4 cdc637f0 c2397600 001f
GPR04: c20eb318  cdc63494 0027
GPR08: c0007fe4de68 cdfe9180  0001
GPR12: 2000 c30a  
GPR16:    bfff
GPR20:  c000134a4020 c19c2218 0fe0
GPR24:   cd106200 4000
GPR28:  0300 cdc63910 c1946730
NIP [c009ece8] __do_page_fault+0xb38/0xde0
LR [c009ece4] __do_page_fault+0xb34/0xde0
Call Trace:
[cdc637f0] [c009ece4] __do_page_fault+0xb34/0xde0 (unreliable)
[cdc638a0] [c000c968] handle_page_fault+0x10/0x2c
--- interrupt: 300 at strncpy_from_user+0x290/0x440
LR = strncpy_from_user+0x284/0x440
[cdc63ba0] [c0c3dcb0] strncpy_from_user+0x2f0/0x440 (unreliable)
[cdc63c30] [c068b888] getname_flags+0x88/0x2c0
[cdc63c90] [c0662a44] do_sys_openat2+0x2d4/0x5f0
[cdc63d30] [c066560c] do_sys_open+0xcc/0x140
[cdc63dc0] [c0045e10] system_call_exception+0x160/0x240
[cdc63e20] [c000da60] system_call_common+0xf0/0x27c
Instruction dump:
409c0048 3fe2ff5b 3bfff128 fac10060 fae10068 482f7a85 6000 3c62ff5b
7fe4fb78 3863f250 4815bbd9 6000 <0fe0> 3c62ff5b 3863f2b8 4815c8b5
irq event stamp: 254
hardirqs last  enabled at (253): [] 
arch_local_irq_restore+0xa0/0x150
hardirqs last disabled at (254): [] 
data_access_common_virt+0x1b0/0x1d0
softirqs last  enabled at (0): [] copy_process+0x78c/0x2120
softirqs last disabled at (0): [<>] 0x0
---[ end trace ba98aec5151f3aeb ]---
---
 arch/powerpc/include/asm/book3s/64/kup-radix.h |  3 ---
 arch/powerpc/include/asm/kup.h | 10 ++
 arch/powerpc/kernel/irq.c  |  6 ++
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index a39e2d193fdc..4ad607461b75 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -5,9 +5,6 @@
 #include 
 #include 
 
-#define AMR_KUAP_BLOCK_READ	UL(0x4000000000000000)
-#define AMR_KUAP_BLOCK_WRITE	UL(0x8000000000000000)
-#define AMR_KUAP_BLOCKED   (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
 #define AMR_KUAP_SHIFT 62
 
 #ifdef __ASSEMBLY__
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 0d93331d0fab..e63930767a96 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -14,6 +14,10 @@
 #define KUAP_CURRENT_WRITE 8
 #define KUAP_CURRENT   (KUAP_CURRENT_READ | KUAP_CURRENT_WRITE)
 
+#define AMR_KUAP_BLOCK_READ	UL(0x4000000000000000)
+#define AMR_KUAP_BLOCK_WRITE	UL(0x8000000000000000)
+#define AMR_KUAP_BLOCKED   (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
+
 #ifdef CONFIG_PPC_BOOK3S_64
 #include 
 #endif
@@ -64,6 +68,12 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, 
bool is_write)
 }
 
 static inline void kuap_check_amr(void) { }
+static inline unsigned long get_kuap(void)
+{
+   return AMR_KUAP_BLOCKED;
+}
+
+static inline void set_kuap(unsigned long value) { }
 
 /*
  * book3s/64/kup-radix.h defines these functions for the !KUAP case to flush
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 7d0f7682d01d..d9fd46da04d6 100644
--- 

Re: [PATCH kernel] powerpc/perf: Stop crashing with generic_compat_pmu

2020-12-02 Thread Madhavan Srinivasan



On 12/2/20 8:31 AM, Alexey Kardashevskiy wrote:

Hi Maddy,

I just noticed that I still have "powerpc/perf: Add checks for 
reserved values" in my pile (pushed here 
https://github.com/aik/linux/commit/61e1bc3f2e19d450e2e2d39174d422160b21957b 
), do we still need it? The lockups I saw were fixed by 
https://github.com/aik/linux/commit/17899eaf88d689 but it is hardly a 
replacement. Thanks,


Sorry, I missed this. Will look at this again, since we will need
generation-specific checks for the reserved field.


Maddy




On 04/06/2020 02:34, Madhavan Srinivasan wrote:



On 6/2/20 8:26 AM, Alexey Kardashevskiy wrote:
The bhrb_filter_map ("The Branch History Rolling Buffer") callback is
only defined in raw CPUs' power_pmu structs. The "architected" CPUs use
generic_compat_pmu, which does not have this callback, so crashes occur.

This adds a NULL pointer check for bhrb_filter_map() which behaves as if
the callback returned an error.

This does not add the same check for config_bhrb() as the only caller
checks for cpuhw->bhrb_users which remains zero if bhrb_filter_map==0.


Changes looks fine.
Reviewed-by: Madhavan Srinivasan 

The commit be80e758d0c2e ('powerpc/perf: Add generic compat mode pmu driver')
which introduced generic_compat_pmu was merged in v5.2. So we need to
CC stable starting from 5.2 :(. My bad, sorry.

Maddy


Signed-off-by: Alexey Kardashevskiy 
---
  arch/powerpc/perf/core-book3s.c | 19 ++-
  1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c 
b/arch/powerpc/perf/core-book3s.c

index 3dcfecf858f3..36870569bf9c 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1515,9 +1515,16 @@ static int power_pmu_add(struct perf_event 
*event, int ef_flags)

  ret = 0;
   out:
  if (has_branch_stack(event)) {
-    power_pmu_bhrb_enable(event);
-    cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
-    event->attr.branch_sample_type);
+    u64 bhrb_filter = -1;
+
+    if (ppmu->bhrb_filter_map)
+    bhrb_filter = ppmu->bhrb_filter_map(
+    event->attr.branch_sample_type);
+
+    if (bhrb_filter != -1) {
+    cpuhw->bhrb_filter = bhrb_filter;
+    power_pmu_bhrb_enable(event); /* Does bhrb_users++ */
+    }
  }

  perf_pmu_enable(event->pmu);
@@ -1839,7 +1846,6 @@ static int power_pmu_event_init(struct 
perf_event *event)

  int n;
  int err;
  struct cpu_hw_events *cpuhw;
-    u64 bhrb_filter;

  if (!ppmu)
  return -ENOENT;
@@ -1945,7 +1951,10 @@ static int power_pmu_event_init(struct 
perf_event *event)

  err = power_check_constraints(cpuhw, events, cflags, n + 1);

  if (has_branch_stack(event)) {
-    bhrb_filter = ppmu->bhrb_filter_map(
+    u64 bhrb_filter = -1;
+
+    if (ppmu->bhrb_filter_map)
+    bhrb_filter = ppmu->bhrb_filter_map(
  event->attr.branch_sample_type);

  if (bhrb_filter == -1) {






[MOCKUP] x86/mm: Lightweight lazy mm refcounting

2020-12-02 Thread Andy Lutomirski
For context, this is part of a series here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm=7c4bcc0a464ca60be1e0aeba805a192be0ee81e5

This code compiles, but I haven't even tried to boot it.  The earlier
part of the series isn't terribly interesting -- it's a handful of
cleanups that remove all reads of ->active_mm from arch/x86.  I've
been meaning to do that for a while, and now I did it.  But, with
that done, I think we can move to a totally different lazy mm refcounting
model.

So this patch is a mockup of what this could look like.  The algorithm
involves no atomics at all in the context switch path except for a
single atomic_long_xchg() of a percpu variable when going from lazy
mode to nonlazy mode.  Even that could be optimized -- I suspect it could
be replaced with non-atomic code if mm_users > 0.  Instead, on mm exit,
there's a single loop over all CPUs on which that mm could be lazily
loaded that atomic_long_cmpxchg_relaxed()'s a remote percpu variable to
tell the CPU to kindly mmdrop() the mm when it reschedules.  All cpus
that don't have the mm lazily loaded are ignored because they can't
have lazy references, and no cpus can gain lazy references after
mm_users hits zero.  (I need to verify that all the relevant barriers
are in place.  I suspect that they are on x86 but that I'm short an
smp_mb() on arches for which _relaxed atomics are genuinely relaxed.)

Here's how I think it fits with various arches:

x86: On bare metal (i.e. paravirt flush unavailable), the loop won't do
much.  The existing TLB shootdown when user tables are freed will
empty mm_cpumask of everything but the calling CPU.  So x86 ends up
pretty close to as good as we can get short of reworking mm_cpumask() itself.

arm64: with s/for_each_cpu/for_each_online_cpu, I think it will give
good performance.  The mmgrab()/mmdrop() overhead goes away, and,
on smallish systems, the cost of the loop should be low.

power: same as ARM, except that the loop may be rather larger since
the systems are bigger.  But I imagine it's still faster than Nick's
approach -- a cmpxchg to a remote cacheline should still be faster than
an IPI shootdown.  (Nick, don't benchmark this patch -- at least the
mm_users optimization mentioned above should be done, but also the
mmgrab() and mmdrop() aren't actually removed.)

Other arches: I don't know.  Further research is required.

What do you all think?


As mentioned, there are several things blatantly wrong with this patch:

The coding style is not up to kernel standards.  I have prototypes in the
wrong places and other hacks.

mms are likely to be freed with IRQs off.  I think this is safe, but it's
suboptimal.

This whole thing is in arch/x86.  The core algorithm ought to move outside
arch/, but doing so without making a mess might take some thought.  It
doesn't help that different architectures have very different ideas
of what mm_cpumask() does.

Finally, this patch has no benefit by itself.  I didn't remove the
active_mm refcounting, so the big benefit of removing mmgrab() and
mmdrop() calls on transitions to and from lazy mode isn't there.
There is no point at all in benchmarking this patch as is.  That
being said, getting rid of the active_mm refcounting shouldn't be
so hard, since x86 (in this tree) no longer uses active_mm at all.

I should contemplate whether the calling CPU is special in
arch_fixup_lazy_mm_refs().  On a very very quick think, it's not, but
it needs more thought.

Signed-off-by: Andy Lutomirski 

 arch/x86/include/asm/tlbflush.h | 20 
 arch/x86/mm/tlb.c   | 81 +++--
 kernel/fork.c   |  5 ++
 3 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 8c87a2e0b660..efcd4f125f2c 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -124,6 +124,26 @@ struct tlb_state {
 */
unsigned short user_pcid_flush_mask;
 
+   /*
+* Lazy mm tracking.
+*
+*  - If this is NULL, it means that any mm_struct referenced by
+*this CPU is kept alive by a real reference count.
+*
+*  - If this is nonzero but the low bit is clear, it points to
+*an mm_struct that must not be freed out from under this
+*CPU.
+*
+*  - If the low bit is set, it still points to an mm_struct,
+*but some other CPU has mmgrab()'d it on our behalf, and we
+*must mmdrop() it when we're done with it.
+*
+* See lazy_mm_grab() and related functions for the precise
+* access rules.
+*/
+   atomic_long_t   lazy_mm;
+
+
/*
 * Access to this CR4 shadow and to H/W CR4 is protected by
 * disabling interrupts when modifying either one.
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e27300fc865b..00f5bace534b 100644
--- 

Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-12-02 Thread Andy Lutomirski
On Tue, Dec 1, 2020 at 6:50 PM Nicholas Piggin  wrote:
>
> Excerpts from Andy Lutomirski's message of November 29, 2020 3:55 am:
> > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin  wrote:
> >>
> >> And get rid of the generic sync_core_before_usermode facility. This is
> >> functionally a no-op in the core scheduler code, but it also catches
> >>
> >> This helper is the wrong way around I think. The idea that membarrier
> >> state requires a core sync before returning to user is the easy one
> >> that does not need hiding behind membarrier calls. The gap in core
> >> synchronization due to x86's sysret/sysexit and lazy tlb mode, is the
> >> tricky detail that is better put in x86 lazy tlb code.
> >>
> >> Consider if an arch did not synchronize core in switch_mm either, then
> >> membarrier_mm_sync_core_before_usermode would be in the wrong place
> >> but arch specific mmu context functions would still be the right place.
> >> There is also an exit_lazy_tlb case that is not covered by this call, which
> >> could be a bug (a kthread uses the membarrier process's mm, then context
> >> switches back to the process without an mm switch or a lazy mm switch).
> >>
> >> This makes lazy tlb code a bit more modular.
> >
> > I have a couple of membarrier fixes that I want to send out today or
> > tomorrow, and they might eliminate the need for this patch.  Let me
> > think about this a little bit.  I'll cc you.  The existing code is way
> > too subtle and the comments are far too confusing for me to be quickly
> > confident about any of my conclusions :)
> >
>
> Thanks for the heads-up. I'll have to have a better look through them
> but I don't know that it eliminates the need for this entirely although
> it might close some gaps and make this not a bug fix. The problem here
> is x86 code wanted something to be called when a lazy mm is unlazied,
> but it missed some spots and also the core scheduler doesn't need to
> know about those x86 details if it has this generic call that annotates
> the lazy handling better.

I'll send v3 tomorrow.  They add more sync_core_before_usermode() callers.

Having looked at your patches a bunch and the membarrier code a bunch,
I don't think I like the approach of pushing this logic out into new
core functions called by arch code.  Right now, even with my
membarrier patches applied, understanding how (for example) the x86
switch_mm_irqs_off() plus the scheduler code provides the barriers
that membarrier needs is quite complicated, and it's not clear to me
that the code is even correct.  At the very least I'm pretty sure that
the x86 comments are misleading.  With your patches, someone trying to
audit the code would have to follow core code calling into arch code
and back out into core code to figure out what's going on.  I think
the result is worse.

I wrote this incomplete patch which takes the opposite approach (sorry
for whitespace damage):

commit 928b5c60e93f475934892d6e0b357ebf0a2bf87d
Author: Andy Lutomirski 
Date:   Wed Dec 2 17:24:02 2020 -0800

[WIP] x86/mm: Handle unlazying membarrier core sync in the arch code

The core scheduler isn't a great place for
membarrier_mm_sync_core_before_usermode() -- the core scheduler
doesn't actually know whether we are lazy.  With the old code, if a
CPU is running a membarrier-registered task, goes idle, gets unlazied
via a TLB shootdown IPI, and switches back to the
membarrier-registered task, it will do an unnecessary core sync.

Conveniently, x86 is the only architecture that does anything in this
hook, so we can just move the code.

XXX: actually delete the old code.

Signed-off-by: Andy Lutomirski 

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3338a1feccf9..e27300fc865b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -496,6 +496,8 @@ void switch_mm_irqs_off(struct mm_struct *prev,
struct mm_struct *next,
  * from one thread in a process to another thread in the same
  * process. No TLB flush required.
  */
+
+// XXX: why is this okay wrt membarrier?
 if (!was_lazy)
 return;

@@ -508,12 +510,24 @@ void switch_mm_irqs_off(struct mm_struct *prev,
struct mm_struct *next,
 smp_mb();
 next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
-next_tlb_gen)
+next_tlb_gen) {
+/*
+ * We're reactivating an mm, and membarrier might
+ * need to serialize.  Tell membarrier.
+ */
+
+// XXX: I can't understand the logic in
+// membarrier_mm_sync_core_before_usermode().  What's
+// the mm check for?
+membarrier_mm_sync_core_before_usermode(next);
 return;
+}

 /*
  * TLB contents went out of date while we were in lazy
  * mode. Fall through to the TLB switching code below.
+ * No need for an 

[PATCH] powerpc/book3s_hv_uvmem: Check for failed page migration

2020-12-02 Thread Alistair Popple
migrate_vma_pages() may still clear MIGRATE_PFN_MIGRATE on pages which
are not able to be migrated. Drivers may safely copy data prior to
calling migrate_vma_pages(); however, a remote mapping must not be
established until after migrate_vma_pages() has returned, as the
migration could still fail.

UV_PAGE_IN both copies and maps the data page; therefore it should
only be called after checking the results of migrate_vma_pages().

Signed-off-by: Alistair Popple 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 84e5a2dc8be5..08aa6a90c525 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -762,7 +762,10 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
goto out_finalize;
}
 
-   if (pagein) {
+   *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+   migrate_vma_pages(&mig);
+
+   if ((*mig.src & MIGRATE_PFN_MIGRATE) && pagein) {
pfn = *mig.src >> MIGRATE_PFN_SHIFT;
spage = migrate_pfn_to_page(*mig.src);
if (spage) {
@@ -773,8 +776,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
}
}
 
-   *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
-   migrate_vma_pages(&mig);
 out_finalize:
migrate_vma_finalize(&mig);
return ret;
-- 
2.20.1



Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-02 Thread Andy Lutomirski
> On Dec 1, 2020, at 7:47 PM, Nicholas Piggin  wrote:
>
> Excerpts from Andy Lutomirski's message of December 1, 2020 4:31 am:
>> other arch folk: there's some background here:
>>
>> https://lkml.kernel.org/r/calcetrvxube8lfnn-qs+dzroqaiw+sfug1j047ybyv31sat...@mail.gmail.com
>>
>>> On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski  wrote:
>>>
>>> On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski  wrote:

 On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin  wrote:
>
> On big systems, the mm refcount can become highly contended when doing
> a lot of context switching with threaded applications (particularly
> switching between the idle thread and an application thread).
>
> Abandoning lazy tlb slows switching down quite a bit in the important
> user->idle->user cases, so instead implement a non-refcounted scheme
> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> any remaining lazy ones.
>
> Shootdown IPIs are some concern, but they have not been observed to be
> a big problem with this scheme (the powerpc implementation generated
> 314 additional interrupts on a 144 CPU system during a kernel compile).
> There are a number of strategies that could be employed to reduce IPIs
> if they turn out to be a problem for some workload.

 I'm still wondering whether we can do even better.

>>>
>>> Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
>>> the TLB.  On x86, this will shoot down all lazies as long as even a
>>> single pagetable was freed.  (Or at least it will if we don't have a
>>> serious bug, but the code seems okay.  We'll hit pmd_free_tlb, which
>>> sets tlb->freed_tables, which will trigger the IPI.)  So, on
>>> architectures like x86, the shootdown approach should be free.  The
>>> only way it ought to have any excess IPIs is if we have CPUs in
>>> mm_cpumask() that don't need IPI to free pagetables, which could
>>> happen on paravirt.
>>
>> Indeed, on x86, we do this:
>>
>> [   11.558844]  flush_tlb_mm_range.cold+0x18/0x1d
>> [   11.559905]  tlb_finish_mmu+0x10e/0x1a0
>> [   11.561068]  exit_mmap+0xc8/0x1a0
>> [   11.561932]  mmput+0x29/0xd0
>> [   11.562688]  do_exit+0x316/0xa90
>> [   11.563588]  do_group_exit+0x34/0xb0
>> [   11.564476]  __x64_sys_exit_group+0xf/0x10
>> [   11.565512]  do_syscall_64+0x34/0x50
>>
>> and we have info->freed_tables set.
>>
>> What are the architectures that have large systems like?
>>
>> x86: we already zap lazies, so it should cost basically nothing to do
>
> This is not zapping lazies, this is freeing the user page tables.
>
> "lazy mm" is where a switch to a kernel thread takes on the
> previous mm for its kernel mapping rather than switch to init_mm.

The intent of the code is to flush the TLB after freeing user page
tables, but, on bare metal, lazies get zapped as a side effect.

Anyway, I'm going to send out a mockup of an alternative approach shortly.


[PATCH] powerpc: add security.config, enforcing lockdown=integrity

2020-12-02 Thread Daniel Axtens
It's sometimes handy to have a config that boots a bit like a system
under secure boot (forcing lockdown=integrity, without needing any
extra stuff like a command line option).

This config file allows that, and also turns on a few assorted security
and hardening options for good measure.
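For reference, a fragment like this is normally applied on top of an existing defconfig; the kbuild %.config rule merges it via scripts/kconfig/merge_config.sh. A typical invocation (defconfig target name illustrative) would be:

```sh
# Merge the fragment on top of a powerpc defconfig; the kbuild
# %.config rule runs scripts/kconfig/merge_config.sh under the hood.
make ppc64le_defconfig security.config

# Equivalent explicit invocation:
./scripts/kconfig/merge_config.sh -m .config arch/powerpc/configs/security.config
make olddefconfig
```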

Suggested-by: Michael Ellerman 
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/configs/security.config | 15 +++
 1 file changed, 15 insertions(+)
 create mode 100644 arch/powerpc/configs/security.config

diff --git a/arch/powerpc/configs/security.config 
b/arch/powerpc/configs/security.config
new file mode 100644
index ..1c91a35c6a73
--- /dev/null
+++ b/arch/powerpc/configs/security.config
@@ -0,0 +1,15 @@
+# This is the equivalent of booting with lockdown=integrity
+CONFIG_SECURITY=y
+CONFIG_SECURITYFS=y
+CONFIG_SECURITY_LOCKDOWN_LSM=y
+CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
+CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY=y
+
+# These are some general, reasonably inexpensive hardening options
+CONFIG_HARDENED_USERCOPY=y
+CONFIG_FORTIFY_SOURCE=y
+CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
+
+# UBSAN bounds checking is very cheap and good for hardening
+CONFIG_UBSAN=y
+# CONFIG_UBSAN_MISC is not set
\ No newline at end of file
-- 
2.25.1



Re: [PATCH 2/2] ASoC: fsl: Add imx-hdmi machine driver

2020-12-02 Thread Shengjiu Wang
On Thu, Dec 3, 2020 at 4:23 AM Nicolin Chen  wrote:
>
> On Fri, Nov 27, 2020 at 01:30:21PM +0800, Shengjiu Wang wrote:
> > The driver is initially designed for sound card using HDMI
> > interface on i.MX platform. There are internal HDMI IPs or
> > external HDMI modules connected to the SAI or AUD2HTX interface.
> > It supports both transmitter and receiver devices.
> >
> > Signed-off-by: Shengjiu Wang 
> > ---
> >  sound/soc/fsl/Kconfig|  12 ++
> >  sound/soc/fsl/Makefile   |   2 +
> >  sound/soc/fsl/imx-hdmi.c | 235 +++
> >  3 files changed, 249 insertions(+)
> >  create mode 100644 sound/soc/fsl/imx-hdmi.c
>
> > diff --git a/sound/soc/fsl/imx-hdmi.c b/sound/soc/fsl/imx-hdmi.c
> > new file mode 100644
> > index ..ac164514b1b2
> > --- /dev/null
> > +++ b/sound/soc/fsl/imx-hdmi.c
>
> > +static int imx_hdmi_hw_params(struct snd_pcm_substream *substream,
> > +   struct snd_pcm_hw_params *params)
> > +{
> > + struct snd_soc_pcm_runtime *rtd = substream->private_data;
> > + struct imx_hdmi_data *data = snd_soc_card_get_drvdata(rtd->card);
> > + bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
> > + struct snd_soc_dai *cpu_dai = asoc_rtd_to_cpu(rtd, 0);
> > + struct snd_soc_card *card = rtd->card;
> > + struct device *dev = card->dev;
> > + int ret;
> > +
> > + /* set cpu DAI configuration */
> > + ret = snd_soc_dai_set_sysclk(cpu_dai, data->cpu_priv.sysclk_id[tx],
> > +  8 * data->cpu_priv.slot_width * 
> > params_rate(params),
>
> Looks like fixed 2 slots being used, judging by the set_tdm_slot
> call below. Then...why "8 *"? Probably need a line of comments?

The master clock is always 256 * rate when slot_width = 32, so we use
8 * slot_width. Will add comments.

>
> > +  tx ? SND_SOC_CLOCK_OUT : 
> > SND_SOC_CLOCK_IN);
> > + if (ret && ret != -ENOTSUPP) {
> > + dev_err(dev, "failed to set cpu sysclk: %d\n", ret);
> > + return ret;
> > + }
> > +
> > + ret = snd_soc_dai_set_tdm_slot(cpu_dai, 0, 0, 2, 
> > data->cpu_priv.slot_width);
>
> May have a local variable to cache slot_width.

ok.

>
> > +static int imx_hdmi_probe(struct platform_device *pdev)
>
> > + data->dai.name = "i.MX HDMI";
> > + data->dai.stream_name = "i.MX HDMI";
> > + data->dai.cpus->dai_name = dev_name(&cpu_pdev->dev);
> > + data->dai.platforms->of_node = cpu_np;
> > + data->dai.ops = &imx_hdmi_ops;
> > + data->dai.playback_only = true;
> > + data->dai.capture_only = false;
> > + data->dai.init = imx_hdmi_init;
> > +
> > +
> > + if (of_property_read_bool(np, "hdmi-out")) {
> > + data->dai.playback_only = true;
> > + data->dai.capture_only = false;
> > + data->dai.codecs->dai_name = "i2s-hifi";
> > + data->dai.codecs->name = "hdmi-audio-codec.1";
> > + data->dai.dai_fmt = data->dai_fmt |
> > + SND_SOC_DAIFMT_NB_NF |
> > + SND_SOC_DAIFMT_CBS_CFS;
> > + }
> > +
> > + if (of_property_read_bool(np, "hdmi-in")) {
> > + data->dai.playback_only = false;
> > + data->dai.capture_only = true;
> > + data->dai.codecs->dai_name = "i2s-hifi";
> > + data->dai.codecs->name = "hdmi-audio-codec.2";
> > + data->dai.dai_fmt = data->dai_fmt |
> > + SND_SOC_DAIFMT_NB_NF |
> > + SND_SOC_DAIFMT_CBM_CFM;
> > + }
> > +
> > + if ((data->dai.playback_only && data->dai.capture_only) ||
> > + (!data->dai.playback_only && !data->dai.capture_only)) {
> > + dev_err(&pdev->dev, "Wrongly enable HDMI DAI link\n");
> > + goto fail;
> > + }
>
> Seems that this condition check can never be true, given that:
> 1. By default: playback_only=true && capture_only=false
> 2. Conditionally overwritten: playback_only=true && capture_only=false
> 3. Conditionally overwritten: playback_only=false && capture_only=true
>
> If I understand it correctly, probably should be something like:
> bool hdmi_out = of_property_read_bool(np, "hdmi-out");
> bool hdmi_in = of_property_read_bool(np, "hdmi-in");
>
> if ((hdmi_out && hdmi_in) || (!hdmi_out && !hdmi_in))
> // "Invalid HDMI DAI link"; goto fail;
>
> if (hdmi_out) {
> // ...
> } else if (hdmi_in) {
> // ...
> } else // No need of this line if two properties are exclusive
>

Good catch, will update it.

> > + data->card.num_links = 1;
> > + data->card.dai_link = &data->dai;
> > +
> > + platform_set_drvdata(pdev, &data->card);
>
> Why pass card pointer?

It seems to duplicate dev_set_drvdata(card->dev, card) in
snd_soc_register_card(). Will remove it.

best regards
wang shengjiu


[powerpc:merge] BUILD SUCCESS a1aeabd25a36d9e019381278e543e2d538dd44a7

2020-12-02 Thread kernel test robot
   allmodconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a004-20201202
x86_64   randconfig-a006-20201202
x86_64   randconfig-a001-20201202
x86_64   randconfig-a002-20201202
x86_64   randconfig-a005-20201202
x86_64   randconfig-a003-20201202
i386 randconfig-a004-20201202
i386 randconfig-a005-20201202
i386 randconfig-a001-20201202
i386 randconfig-a002-20201202
i386 randconfig-a006-20201202
i386 randconfig-a003-20201202
i386 randconfig-a014-20201202
i386 randconfig-a013-20201202
i386 randconfig-a011-20201202
i386 randconfig-a015-20201202
i386 randconfig-a012-20201202
i386 randconfig-a016-20201202
riscvnommu_k210_defconfig
riscvallyesconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
riscvallmodconfig
x86_64   rhel
x86_64   allyesconfig
x86_64rhel-7.6-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a016-20201202
x86_64   randconfig-a012-20201202
x86_64   randconfig-a014-20201202
x86_64   randconfig-a013-20201202
x86_64   randconfig-a015-20201202
x86_64   randconfig-a011-20201202

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[powerpc:next-test] BUILD REGRESSION fb003959777a635dea8910cf71109b612c7f940c

2020-12-02 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
next-test
branch HEAD: fb003959777a635dea8910cf71109b612c7f940c  powerpc/44x: Don't 
support 47x code and non 47x code at the same time

Error/Warning reports:

https://lore.kernel.org/linuxppc-dev/202012030320.sonpcjr8-...@intel.com
https://lore.kernel.org/linuxppc-dev/202012030759.zueuldq3-...@intel.com

Error/Warning in current branch:

drivers/misc/lkdtm/powerpc.c:13:10: error: use of undeclared identifier 
'SLB_VSID_KERNEL'
drivers/misc/lkdtm/powerpc.c:13:54: error: no member named 'sllp' in 'struct 
mmu_psize_def'
drivers/misc/lkdtm/powerpc.c:17:15: error: implicit declaration of function 
'mk_vsid_data' [-Werror,-Wimplicit-function-declaration]
drivers/misc/lkdtm/powerpc.c:18:15: error: implicit declaration of function 
'mk_esid_data' [-Werror,-Wimplicit-function-declaration]
drivers/misc/lkdtm/powerpc.c:18:38: error: use of undeclared identifier 
'SLB_NUM_BOLTED'
drivers/misc/lkdtm/powerpc.c:37:37: error: use of undeclared identifier 
'MMU_SEGSIZE_1T'
drivers/misc/lkdtm/powerpc.c:37:53: error: use of undeclared identifier 
'mmu_vmalloc_psize'

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
`-- powerpc-randconfig-c003-20201202
`-- 
arch-powerpc-kernel-firmware.c:WARNING:return-of-in-function-check_kvm_guest-with-return-type-bool

clang_recent_errors
`-- powerpc64-randconfig-r013-20201202
|-- 
drivers-misc-lkdtm-powerpc.c:error:implicit-declaration-of-function-mk_esid_data-Werror-Wimplicit-function-declaration
|-- 
drivers-misc-lkdtm-powerpc.c:error:implicit-declaration-of-function-mk_vsid_data-Werror-Wimplicit-function-declaration
|-- 
drivers-misc-lkdtm-powerpc.c:error:no-member-named-sllp-in-struct-mmu_psize_def
|-- 
drivers-misc-lkdtm-powerpc.c:error:use-of-undeclared-identifier-MMU_SEGSIZE_1T
|-- 
drivers-misc-lkdtm-powerpc.c:error:use-of-undeclared-identifier-SLB_NUM_BOLTED
|-- 
drivers-misc-lkdtm-powerpc.c:error:use-of-undeclared-identifier-SLB_VSID_KERNEL
`-- 
drivers-misc-lkdtm-powerpc.c:error:use-of-undeclared-identifier-mmu_vmalloc_psize

elapsed time: 733m

configs tested: 141
configs skipped: 2

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
armclps711x_defconfig
armshmobile_defconfig
powerpc64alldefconfig
mips decstation_r4k_defconfig
nios2 3c120_defconfig
mips bigsur_defconfig
mipsmalta_qemu_32r6_defconfig
mips mpc30x_defconfig
microblaze  mmu_defconfig
sh sh7710voipgw_defconfig
powerpc redwood_defconfig
parisc   alldefconfig
mips  pic32mzda_defconfig
powerpc skiroot_defconfig
sh  rsk7264_defconfig
powerpcfsp2_defconfig
sh shx3_defconfig
shsh7785lcr_defconfig
shtitan_defconfig
csky alldefconfig
m68k  sun3x_defconfig
arm  ep93xx_defconfig
sh   se7724_defconfig
powerpc powernv_defconfig
powerpc  walnut_defconfig
arm s3c6400_defconfig
arm   aspeed_g5_defconfig
sparc   sparc32_defconfig
mips tb0287_defconfig
m68k  hp300_defconfig
mips cu1830-neo_defconfig
sh   se7343_defconfig
armu300_defconfig
m68k alldefconfig
armspear6xx_defconfig
arm assabet_defconfig
sh  ul2_defconfig
xtensa  nommu_kc705_defconfig
nds32 allnoconfig
arm   versatile_defconfig
shhp6xx_defconfig
arm   sunxi_defconfig
powerpc mpc5200_defconfig
powerpc mpc8272_ads_defconfig
openriscdefconfig
powerpc   holly_defconfig
arm  prima2_defconfig
arc   tb10x_defconfig
m68kdefconfig
armkeystone_defconfig
mipsnlm_xlr_defconfig
xtensa  defconfig
powerpc stx_gp3_defconfig
arm   cns3420vb_defconfig
arm   omap1_defconfig
mips

[powerpc:fixes-test] BUILD SUCCESS a1ee28117077c3bf24e5ab6324c835eaab629c45

2020-12-02 Thread kernel test robot
 allyesconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a004-20201202
x86_64   randconfig-a006-20201202
x86_64   randconfig-a001-20201202
x86_64   randconfig-a002-20201202
x86_64   randconfig-a005-20201202
x86_64   randconfig-a003-20201202
i386 randconfig-a004-20201202
i386 randconfig-a005-20201202
i386 randconfig-a001-20201202
i386 randconfig-a002-20201202
i386 randconfig-a006-20201202
i386 randconfig-a003-20201202
i386 randconfig-a014-20201202
i386 randconfig-a013-20201202
i386 randconfig-a011-20201202
i386 randconfig-a015-20201202
i386 randconfig-a012-20201202
i386 randconfig-a016-20201202
riscvnommu_k210_defconfig
riscvallyesconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
riscvallmodconfig
x86_64   rhel
x86_64   allyesconfig
x86_64rhel-7.6-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a016-20201202
x86_64   randconfig-a012-20201202
x86_64   randconfig-a014-20201202
x86_64   randconfig-a013-20201202
x86_64   randconfig-a015-20201202
x86_64   randconfig-a011-20201202



[PATCH v3 18/18] ibmvfc: drop host lock when completing commands in CRQ

2020-12-02 Thread Tyrel Datwyler
The legacy CRQ path holds the host lock even while completing commands.
This presents a problem in legacy single-queue mode when nr_hw_queues
is greater than one, since calling scsi_done() introduces the potential
for deadlock.

If nr_hw_queues is greater than one, drop the host lock in the legacy
CRQ path when completing a command.

Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index e499599662ec..e2200bdff2be 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2969,6 +2969,7 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, 
struct ibmvfc_host *vhost)
 {
long rc;
struct ibmvfc_event *evt = (struct ibmvfc_event 
*)be64_to_cpu(crq->ioba);
+   unsigned long flags;
 
switch (crq->valid) {
case IBMVFC_CRQ_INIT_RSP:
@@ -3039,7 +3040,12 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, 
struct ibmvfc_host *vhost)
del_timer(&evt->timer);
list_del(&evt->queue);
ibmvfc_trc_end(evt);
-   evt->done(evt);
+   if (nr_scsi_hw_queues > 1) {
+   spin_unlock_irqrestore(vhost->host->host_lock, flags);
+   evt->done(evt);
+   spin_lock_irqsave(vhost->host->host_lock, flags);
+   } else
+   evt->done(evt);
 }
 
 /**
-- 
2.27.0



[PATCH v3 16/18] ibmvfc: enable MQ and set reasonable defaults

2020-12-02 Thread Tyrel Datwyler
Turn on MQ by default and set sane values for the upper limit on hw
queues for the scsi host, and the number of hw scsi channels to request
from the partner VIOS.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 980eb9afe93a..93c234a5512d 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -41,9 +41,9 @@
 #define IBMVFC_DEFAULT_LOG_LEVEL   2
 #define IBMVFC_MAX_CDB_LEN 16
 #define IBMVFC_CLS3_ERROR  0
-#define IBMVFC_MQ  0
-#define IBMVFC_SCSI_CHANNELS   0
-#define IBMVFC_SCSI_HW_QUEUES  1
+#define IBMVFC_MQ  1
+#define IBMVFC_SCSI_CHANNELS   8
+#define IBMVFC_SCSI_HW_QUEUES  16
 #define IBMVFC_MIG_NO_SUB_TO_CRQ   0
 #define IBMVFC_MIG_NO_N_TO_M   0
 
-- 
2.27.0



[PATCH v3 17/18] ibmvfc: provide modules parameters for MQ settings

2020-12-02 Thread Tyrel Datwyler
Add the various module parameter toggles for adjusting the MQ
characteristics at boot/load time, as well as a device attribute for
changing the number of scsi channels the client requests.

Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 76 +-
 1 file changed, 66 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index e353b9e88104..e499599662ec 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -40,6 +40,12 @@ static unsigned int disc_threads = IBMVFC_MAX_DISC_THREADS;
 static unsigned int ibmvfc_debug = IBMVFC_DEBUG;
 static unsigned int log_level = IBMVFC_DEFAULT_LOG_LEVEL;
 static unsigned int cls3_error = IBMVFC_CLS3_ERROR;
+static unsigned int mq_enabled = IBMVFC_MQ;
+static unsigned int nr_scsi_hw_queues = IBMVFC_SCSI_HW_QUEUES;
+static unsigned int nr_scsi_channels = IBMVFC_SCSI_CHANNELS;
+static unsigned int mig_channels_only = IBMVFC_MIG_NO_SUB_TO_CRQ;
+static unsigned int mig_no_less_channels = IBMVFC_MIG_NO_N_TO_M;
+
 static LIST_HEAD(ibmvfc_head);
 static DEFINE_SPINLOCK(ibmvfc_driver_lock);
 static struct scsi_transport_template *ibmvfc_transport_template;
@@ -49,6 +55,22 @@ MODULE_AUTHOR("Brian King ");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(IBMVFC_DRIVER_VERSION);
 
+module_param_named(mq, mq_enabled, uint, S_IRUGO);
+MODULE_PARM_DESC(mq, "Enable multiqueue support. "
+"[Default=" __stringify(IBMVFC_MQ) "]");
+module_param_named(scsi_host_queues, nr_scsi_hw_queues, uint, S_IRUGO);
+MODULE_PARM_DESC(scsi_host_queues, "Number of SCSI Host submission queues. "
+"[Default=" __stringify(IBMVFC_SCSI_HW_QUEUES) "]");
+module_param_named(scsi_hw_channels, nr_scsi_channels, uint, S_IRUGO);
+MODULE_PARM_DESC(scsi_hw_channels, "Number of hw scsi channels to request. "
+"[Default=" __stringify(IBMVFC_SCSI_CHANNELS) "]");
+module_param_named(mig_channels_only, mig_channels_only, uint, S_IRUGO);
+MODULE_PARM_DESC(mig_channels_only, "Prevent migration to non-channelized 
system. "
+"[Default=" __stringify(IBMVFC_MIG_NO_SUB_TO_CRQ) "]");
+module_param_named(mig_no_less_channels, mig_no_less_channels, uint, S_IRUGO);
+MODULE_PARM_DESC(mig_no_less_channels, "Prevent migration to system with less 
channels. "
+"[Default=" __stringify(IBMVFC_MIG_NO_N_TO_M) "]");
+
 module_param_named(init_timeout, init_timeout, uint, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(init_timeout, "Initialization timeout in seconds. "
 "[Default=" __stringify(IBMVFC_INIT_TIMEOUT) "]");
@@ -823,7 +845,7 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
crq->cur = 0;
 
if (vhost->scsi_scrqs.scrqs) {
-   for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
+   for (i = 0; i < nr_scsi_hw_queues; i++) {
scrq = &vhost->scsi_scrqs.scrqs[i];
memset(scrq->msgs, 0, PAGE_SIZE);
scrq->cur = 0;
@@ -3234,6 +3256,37 @@ static ssize_t ibmvfc_store_log_level(struct device *dev,
return strlen(buf);
 }
 
+static ssize_t ibmvfc_show_scsi_channels(struct device *dev,
+struct device_attribute *attr, char 
*buf)
+{
+   struct Scsi_Host *shost = class_to_shost(dev);
+   struct ibmvfc_host *vhost = shost_priv(shost);
+   unsigned long flags = 0;
+   int len;
+
+   spin_lock_irqsave(shost->host_lock, flags);
+   len = snprintf(buf, PAGE_SIZE, "%d\n", vhost->client_scsi_channels);
+   spin_unlock_irqrestore(shost->host_lock, flags);
+   return len;
+}
+
+static ssize_t ibmvfc_store_scsi_channels(struct device *dev,
+struct device_attribute *attr,
+const char *buf, size_t count)
+{
+   struct Scsi_Host *shost = class_to_shost(dev);
+   struct ibmvfc_host *vhost = shost_priv(shost);
+   unsigned long flags = 0;
+   unsigned int channels;
+
+   spin_lock_irqsave(shost->host_lock, flags);
+   channels = simple_strtoul(buf, NULL, 10);
+   vhost->client_scsi_channels = min(channels, nr_scsi_hw_queues);
+   ibmvfc_hard_reset_host(vhost);
+   spin_unlock_irqrestore(shost->host_lock, flags);
+   return strlen(buf);
+}
+
 static DEVICE_ATTR(partition_name, S_IRUGO, ibmvfc_show_host_partition_name, 
NULL);
 static DEVICE_ATTR(device_name, S_IRUGO, ibmvfc_show_host_device_name, NULL);
 static DEVICE_ATTR(port_loc_code, S_IRUGO, ibmvfc_show_host_loc_code, NULL);
@@ -3242,6 +3295,8 @@ static DEVICE_ATTR(npiv_version, S_IRUGO, 
ibmvfc_show_host_npiv_version, NULL);
 static DEVICE_ATTR(capabilities, S_IRUGO, ibmvfc_show_host_capabilities, NULL);
 static DEVICE_ATTR(log_level, S_IRUGO | S_IWUSR,
   ibmvfc_show_log_level, ibmvfc_store_log_level);
+static DEVICE_ATTR(nr_scsi_channels, S_IRUGO | S_IWUSR,
+  

[PATCH v3 15/18] ibmvfc: send Cancel MAD down each hw scsi channel

2020-12-02 Thread Tyrel Datwyler
In general the client needs to send Cancel MADs and task management
commands down the same channel as the command(s) they are intended to
cancel or abort. The client assigns cancel keys per LUN and thus must
send a Cancel down each channel on which commands were submitted for
that LUN. Further, the client must then wait for those cancel
completions prior to submitting a LUN RESET or ABORT TASK SET.

Add a cancel event pointer and cancel rsp iu storage to the
ibmvfc_sub_queue struct so that the cancel routine can assign a cancel
event to each applicable queue. In legacy CRQ mode, treat the CRQ as a
subqueue by using a subqueue struct allocated on the stack. Wait for
completion of each submitted cancel.

Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 104 ++---
 drivers/scsi/ibmvscsi/ibmvfc.h |  38 ++--
 2 files changed, 90 insertions(+), 52 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index ec3db5a6baf3..e353b9e88104 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2339,67 +2339,103 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, 
int type)
 {
struct ibmvfc_host *vhost = shost_priv(sdev->host);
struct ibmvfc_event *evt, *found_evt;
-   union ibmvfc_iu rsp;
-   int rsp_rc = -EBUSY;
+   struct ibmvfc_sub_queue *scrqs;
+   struct ibmvfc_sub_queue legacy_crq;
+   int rsp_rc = 0;
unsigned long flags;
u16 status;
+   int cancel_cnt = 0;
+   int num_hwq;
+   int ret = 0;
+   int i;
 
ENTER;
spin_lock_irqsave(vhost->host->host_lock, flags);
-   found_evt = NULL;
-   list_for_each_entry(evt, &vhost->sent, queue) {
-   if (evt->cmnd && evt->cmnd->device == sdev) {
-   found_evt = evt;
+   if (vhost->using_channels && vhost->scsi_scrqs.active_queues) {
+   num_hwq = vhost->scsi_scrqs.active_queues;
+   scrqs = vhost->scsi_scrqs.scrqs;
+   } else {
+   /* Use ibmvfc_sub_queue on the stack to fake legacy CRQ as a 
subqueue */
+   num_hwq = 1;
+   scrqs = &legacy_crq;
+   }
+
+   for (i = 0; i < num_hwq; i++) {
+   scrqs[i].cancel_event = NULL;
+   found_evt = NULL;
+   list_for_each_entry(evt, &vhost->sent, queue) {
+   if (evt->cmnd && evt->cmnd->device == sdev && evt->hwq 
== i) {
+   found_evt = evt;
+   cancel_cnt++;
+   break;
+   }
+   }
+
+   if (!found_evt)
+   continue;
+
+   if (vhost->logged_in) {
+   scrqs[i].cancel_event = ibmvfc_init_tmf(vhost, sdev, 
type);
+   scrqs[i].cancel_event->hwq = i;
+   scrqs[i].cancel_event->sync_iu = &scrqs[i].cancel_rsp;
+   rsp_rc = ibmvfc_send_event(scrqs[i].cancel_event, 
vhost, default_timeout);
+   if (rsp_rc)
+   break;
+   } else {
+   rsp_rc = -EBUSY;
break;
}
}
 
-   if (!found_evt) {
+   spin_unlock_irqrestore(vhost->host->host_lock, flags);
+
+   if (!cancel_cnt) {
if (vhost->log_level > IBMVFC_DEFAULT_LOG_LEVEL)
sdev_printk(KERN_INFO, sdev, "No events found to 
cancel\n");
-   spin_unlock_irqrestore(vhost->host->host_lock, flags);
return 0;
}
 
-   if (vhost->logged_in) {
-   evt = ibmvfc_init_tmf(vhost, sdev, type);
-   evt->sync_iu = &rsp;
-   rsp_rc = ibmvfc_send_event(evt, vhost, default_timeout);
-   }
-
-   spin_unlock_irqrestore(vhost->host->host_lock, flags);
-
if (rsp_rc != 0) {
sdev_printk(KERN_ERR, sdev, "Failed to send cancel event. 
rc=%d\n", rsp_rc);
/* If failure is received, the host adapter is most likely going
 through reset, return success so the caller will wait for the 
command
 being cancelled to get returned */
-   return 0;
+   goto free_events;
}
 
sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands.\n");
 
-   wait_for_completion(&evt->comp);
-   status = be16_to_cpu(rsp.mad_common.status);
-   spin_lock_irqsave(vhost->host->host_lock, flags);
-   ibmvfc_free_event(evt);
-   spin_unlock_irqrestore(vhost->host->host_lock, flags);
+   for (i = 0; i < num_hwq; i++) {
+   if (!scrqs[i].cancel_event)
+   continue;
 
-   if (status != IBMVFC_MAD_SUCCESS) {
-   sdev_printk(KERN_WARNING, sdev, "Cancel failed with rc=%x\n", 
status);
-   switch (status) {
-   case 

[PATCH v3 12/18] ibmvfc: send commands down HW Sub-CRQ when channelized

2020-12-02 Thread Tyrel Datwyler
When the client has negotiated the use of channels, all vfcFrames are
required to go down a Sub-CRQ channel or it is a protocol violation. If
the adapter state is channelized, submit vfcFrames to the appropriate
Sub-CRQ via the h_send_sub_crq() helper.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 32 +++-
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index b51ae17883b7..d94db73d4a15 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -701,6 +701,15 @@ static int ibmvfc_send_crq(struct ibmvfc_host *vhost, u64 
word1, u64 word2)
return plpar_hcall_norets(H_SEND_CRQ, vdev->unit_address, word1, word2);
 }
 
+static int ibmvfc_send_sub_crq(struct ibmvfc_host *vhost, u64 cookie, u64 
word1,
+  u64 word2, u64 word3, u64 word4)
+{
+   struct vio_dev *vdev = to_vio_dev(vhost->dev);
+
+   return plpar_hcall_norets(H_SEND_SUB_CRQ, vdev->unit_address, cookie,
+ word1, word2, word3, word4);
+}
+
 /**
  * ibmvfc_send_crq_init - Send a CRQ init message
  * @vhost: ibmvfc host struct
@@ -1513,15 +1522,19 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
 struct ibmvfc_host *vhost, unsigned long timeout)
 {
__be64 *crq_as_u64 = (__be64 *) &evt->crq;
+   int channel_cmd = 0;
int rc;
 
/* Copy the IU into the transfer area */
*evt->xfer_iu = evt->iu;
-   if (evt->crq.format == IBMVFC_CMD_FORMAT)
+   if (evt->crq.format == IBMVFC_CMD_FORMAT) {
evt->xfer_iu->cmd.tag = cpu_to_be64((u64)evt);
-   else if (evt->crq.format == IBMVFC_MAD_FORMAT)
+   channel_cmd = 1;
+   } else if (evt->crq.format == IBMVFC_MAD_FORMAT) {
evt->xfer_iu->mad_common.tag = cpu_to_be64((u64)evt);
-   else
+   if (evt->xfer_iu->mad_common.opcode == IBMVFC_TMF_MAD)
+   channel_cmd = 1;
+   } else
BUG();
 
list_add_tail(>queue, >sent);
@@ -1534,8 +1547,17 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
 
mb();
 
-   if ((rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]),
- be64_to_cpu(crq_as_u64[1])))) {
+   if (vhost->using_channels && channel_cmd)
+   rc = ibmvfc_send_sub_crq(vhost,
+
vhost->scsi_scrqs.scrqs[evt->hwq].vios_cookie,
+be64_to_cpu(crq_as_u64[0]),
+be64_to_cpu(crq_as_u64[1]),
+0, 0);
+   else
+   rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]),
+be64_to_cpu(crq_as_u64[1]));
+
+   if (rc) {
list_del(>queue);
del_timer(>timer);
 
-- 
2.27.0



[PATCH v3 14/18] ibmvfc: add cancel mad initialization helper

2020-12-02 Thread Tyrel Datwyler
Add a helper routine for initializing a Cancel MAD. This will be useful
for a channelized client that needs to send Cancel commands down every
channel on which commands were sent for a particular LUN.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 67 --
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index d65de320252e..ec3db5a6baf3 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2286,6 +2286,44 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host 
*vhost, void *device,
return SUCCESS;
 }
 
+static struct ibmvfc_event *ibmvfc_init_tmf(struct ibmvfc_host *vhost,
+struct scsi_device *sdev,
+int type)
+{
+   struct scsi_target *starget = scsi_target(sdev);
+   struct fc_rport *rport = starget_to_rport(starget);
+   struct ibmvfc_event *evt;
+   struct ibmvfc_tmf *tmf;
+
+   evt = ibmvfc_get_event(vhost);
+   ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_MAD_FORMAT);
+
+   tmf = &evt->iu.tmf;
+   memset(tmf, 0, sizeof(*tmf));
+   if (ibmvfc_check_caps(vhost, IBMVFC_HANDLE_VF_WWPN)) {
+   tmf->common.version = cpu_to_be32(2);
+   tmf->target_wwpn = cpu_to_be64(rport->port_name);
+   } else {
+   tmf->common.version = cpu_to_be32(1);
+   }
+   tmf->common.opcode = cpu_to_be32(IBMVFC_TMF_MAD);
+   tmf->common.length = cpu_to_be16(sizeof(*tmf));
+   tmf->scsi_id = cpu_to_be64(rport->port_id);
+   int_to_scsilun(sdev->lun, &tmf->lun);
+   if (!ibmvfc_check_caps(vhost, IBMVFC_CAN_SUPPRESS_ABTS))
+   type &= ~IBMVFC_TMF_SUPPRESS_ABTS;
+   if (vhost->state == IBMVFC_ACTIVE)
+   tmf->flags = cpu_to_be32((type | IBMVFC_TMF_LUA_VALID));
+   else
+   tmf->flags = cpu_to_be32(((type & IBMVFC_TMF_SUPPRESS_ABTS) | 
IBMVFC_TMF_LUA_VALID));
+   tmf->cancel_key = cpu_to_be32((unsigned long)sdev->hostdata);
+   tmf->my_cancel_key = cpu_to_be32((unsigned long)starget->hostdata);
+
+   init_completion(&evt->comp);
+
+   return evt;
+}
+
 /**
  * ibmvfc_cancel_all - Cancel all outstanding commands to the device
  * @sdev:  scsi device to cancel commands
@@ -2300,9 +2338,6 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host *vhost, 
void *device,
 static int ibmvfc_cancel_all(struct scsi_device *sdev, int type)
 {
struct ibmvfc_host *vhost = shost_priv(sdev->host);
-   struct scsi_target *starget = scsi_target(sdev);
-   struct fc_rport *rport = starget_to_rport(starget);
-   struct ibmvfc_tmf *tmf;
struct ibmvfc_event *evt, *found_evt;
union ibmvfc_iu rsp;
int rsp_rc = -EBUSY;
@@ -2327,32 +2362,8 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, 
int type)
}
 
if (vhost->logged_in) {
-   evt = ibmvfc_get_event(vhost);
-   ibmvfc_init_event(evt, ibmvfc_sync_completion, 
IBMVFC_MAD_FORMAT);
-
-   tmf = &evt->iu.tmf;
-   memset(tmf, 0, sizeof(*tmf));
-   if (ibmvfc_check_caps(vhost, IBMVFC_HANDLE_VF_WWPN)) {
-   tmf->common.version = cpu_to_be32(2);
-   tmf->target_wwpn = cpu_to_be64(rport->port_name);
-   } else {
-   tmf->common.version = cpu_to_be32(1);
-   }
-   tmf->common.opcode = cpu_to_be32(IBMVFC_TMF_MAD);
-   tmf->common.length = cpu_to_be16(sizeof(*tmf));
-   tmf->scsi_id = cpu_to_be64(rport->port_id);
-   int_to_scsilun(sdev->lun, &tmf->lun);
-   if (!ibmvfc_check_caps(vhost, IBMVFC_CAN_SUPPRESS_ABTS))
-   type &= ~IBMVFC_TMF_SUPPRESS_ABTS;
-   if (vhost->state == IBMVFC_ACTIVE)
-   tmf->flags = cpu_to_be32((type | IBMVFC_TMF_LUA_VALID));
-   else
-   tmf->flags = cpu_to_be32(((type & IBMVFC_TMF_SUPPRESS_ABTS) | IBMVFC_TMF_LUA_VALID));
-   tmf->cancel_key = cpu_to_be32((unsigned long)sdev->hostdata);
-   tmf->my_cancel_key = cpu_to_be32((unsigned long)starget->hostdata);
-
+   evt = ibmvfc_init_tmf(vhost, sdev, type);
evt->sync_iu = &rsp;
-   init_completion(&evt->comp);
rsp_rc = ibmvfc_send_event(evt, vhost, default_timeout);
}
 
-- 
2.27.0



[PATCH v3 13/18] ibmvfc: register Sub-CRQ handles with VIOS during channel setup

2020-12-02 Thread Tyrel Datwyler
If the ibmvfc client adapter requests channels it must submit a number
of Sub-CRQ handles matching the number of channels being requested. The
VIOS in its response will overwrite this with the actual number of
channel resources allocated, which may be less than what was requested.
The client then must store the VIOS Sub-CRQ handle for each queue. This
VIOS handle is needed as a parameter to h_send_sub_crq().

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index d94db73d4a15..d65de320252e 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4512,15 +4512,35 @@ static void ibmvfc_discover_targets(struct ibmvfc_host *vhost)
 static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt)
 {
struct ibmvfc_host *vhost = evt->vhost;
+   struct ibmvfc_channel_setup *setup = vhost->channel_setup_buf;
+   struct ibmvfc_scsi_channels *scrqs = &vhost->scsi_scrqs;
u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status);
int level = IBMVFC_DEFAULT_LOG_LEVEL;
+   int flags, active_queues, i;
 
ibmvfc_free_event(evt);
 
switch (mad_status) {
case IBMVFC_MAD_SUCCESS:
ibmvfc_dbg(vhost, "Channel Setup succeeded\n");
+   flags = be32_to_cpu(setup->flags);
vhost->do_enquiry = 0;
+   active_queues = be32_to_cpu(setup->num_scsi_subq_channels);
+   scrqs->active_queues = active_queues;
+
+   if (flags & IBMVFC_CHANNELS_CANCELED) {
+   ibmvfc_dbg(vhost, "Channels Canceled\n");
+   vhost->using_channels = 0;
+   } else {
+   if (active_queues)
+   vhost->using_channels = 1;
+   for (i = 0; i < active_queues; i++)
+   scrqs->scrqs[i].vios_cookie =
+   be64_to_cpu(setup->channel_handles[i]);
+
+   ibmvfc_dbg(vhost, "Using %u channels\n",
+  vhost->scsi_scrqs.active_queues);
+   }
break;
case IBMVFC_MAD_FAILED:
level += ibmvfc_retry_host_init(vhost);
@@ -4544,9 +4564,19 @@ static void ibmvfc_channel_setup(struct ibmvfc_host *vhost)
struct ibmvfc_channel_setup_mad *mad;
struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf;
struct ibmvfc_event *evt = ibmvfc_get_event(vhost);
+   struct ibmvfc_scsi_channels *scrqs = &vhost->scsi_scrqs;
+   unsigned int num_channels =
+   min(vhost->client_scsi_channels, vhost->max_vios_scsi_channels);
+   int i;
 
memset(setup_buf, 0, sizeof(*setup_buf));
-   setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS);
+   if (num_channels == 0)
+   setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS);
+   else {
+   setup_buf->num_scsi_subq_channels = cpu_to_be32(num_channels);
+   for (i = 0; i < num_channels; i++)
+   setup_buf->channel_handles[i] = cpu_to_be64(scrqs->scrqs[i].cookie);
+   }
 
ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT);
mad = &evt->iu.channel_setup;
-- 
2.27.0



[PATCH v3 11/18] ibmvfc: set and track hw queue in ibmvfc_event struct

2020-12-02 Thread Tyrel Datwyler
Extract the hwq id from a SCSI command and store it in the ibmvfc_event
structure to identify which Sub-CRQ to send the command down when
channels are being utilized.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 5 +
 drivers/scsi/ibmvscsi/ibmvfc.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 0eb91ac86d96..b51ae17883b7 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -1397,6 +1397,7 @@ static void ibmvfc_init_event(struct ibmvfc_event *evt,
evt->crq.format = format;
evt->done = done;
evt->eh_comp = NULL;
+   evt->hwq = 0;
 }
 
 /**
@@ -1748,6 +1749,8 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd,
struct ibmvfc_cmd *vfc_cmd;
struct ibmvfc_fcp_cmd_iu *iu;
struct ibmvfc_event *evt;
+   u32 tag_and_hwq = blk_mq_unique_tag(cmnd->request);
+   u16 hwq = blk_mq_unique_tag_to_hwq(tag_and_hwq);
int rc;
 
if (unlikely((rc = fc_remote_port_chkready(rport))) ||
@@ -1775,6 +1778,8 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd,
}
 
vfc_cmd->correlation = cpu_to_be64(evt);
+   if (vhost->using_channels)
+   evt->hwq = hwq % vhost->scsi_scrqs.active_queues;
 
if (likely(!(rc = ibmvfc_map_sg_data(cmnd, evt, vfc_cmd, vhost->dev
return ibmvfc_send_event(evt, vhost, 0);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index dff26dbd912c..e0ffb0416223 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -781,6 +781,7 @@ struct ibmvfc_event {
struct completion comp;
struct completion *eh_comp;
struct timer_list timer;
+   u16 hwq;
 };
 
 /* a pool of event structs for use */
-- 
2.27.0



[PATCH v3 10/18] ibmvfc: advertise client support for using hardware channels

2020-12-02 Thread Tyrel Datwyler
Previous patches have plumbed the necessary Sub-CRQ interface and
channel negotiation MADs to fully channelized hardware queues.

Advertise client support via NPIV Login capability
IBMVFC_CAN_USE_CHANNELS when the client bits have MQ enabled via
vhost->mq_enabled, or when channels were already in use during a
subsequent NPIV Login. The latter is required because channel support is
only renegotiated after a CRQ pair is broken. Simple NPIV Logout/Logins
require the client to continue to advertise the channel capability until
the CRQ pair between the client and the VIOS is broken.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index d584a4ce0682..0eb91ac86d96 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -1282,6 +1282,10 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost)
 
login_info->max_cmds = cpu_to_be32(max_requests + IBMVFC_NUM_INTERNAL_REQ);
login_info->capabilities = cpu_to_be64(IBMVFC_CAN_MIGRATE | IBMVFC_CAN_SEND_VF_WWPN);
+
+   if (vhost->mq_enabled || vhost->using_channels)
+   login_info->capabilities |= cpu_to_be64(IBMVFC_CAN_USE_CHANNELS);
+
login_info->async.va = cpu_to_be64(vhost->async_crq.msg_token);
login_info->async.len = cpu_to_be32(vhost->async_crq.size * sizeof(*vhost->async_crq.msgs));
strncpy(login_info->partition_name, vhost->partition_name, IBMVFC_MAX_NAME);
-- 
2.27.0



[PATCH v3 09/18] ibmvfc: implement channel enquiry and setup commands

2020-12-02 Thread Tyrel Datwyler
New NPIV_ENQUIRY_CHANNEL and NPIV_SETUP_CHANNEL management datagrams
(MADs) were defined in a previous patchset. If the client advertises a
desire to use channels and the partner VIOS is channel capable then the
client must proceed with channel enquiry to determine the maximum number
of channels the VIOS is capable of providing, and registering SubCRQs
via channel setup with the VIOS immediately following NPIV Login. This
handshaking should not be performed for subsequent NPIV Logins unless
the CRQ connection has been reset.

Implement these two new MADs and issue them following a successful NPIV
login where the VIOS has set the SUPPORT_CHANNELS capability bit in the
NPIV Login response.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 135 -
 drivers/scsi/ibmvscsi/ibmvfc.h |   3 +
 2 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 6b299df786fc..d584a4ce0682 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -806,6 +806,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
spin_lock_irqsave(vhost->host->host_lock, flags);
vhost->state = IBMVFC_NO_CRQ;
vhost->logged_in = 0;
+   vhost->do_enquiry = 1;
+   vhost->using_channels = 0;
 
/* Clean out the queue */
memset(crq->msgs, 0, PAGE_SIZE);
@@ -4476,6 +4478,118 @@ static void ibmvfc_discover_targets(struct ibmvfc_host *vhost)
ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD);
 }
 
+static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt)
+{
+   struct ibmvfc_host *vhost = evt->vhost;
+   u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status);
+   int level = IBMVFC_DEFAULT_LOG_LEVEL;
+
+   ibmvfc_free_event(evt);
+
+   switch (mad_status) {
+   case IBMVFC_MAD_SUCCESS:
+   ibmvfc_dbg(vhost, "Channel Setup succeeded\n");
+   vhost->do_enquiry = 0;
+   break;
+   case IBMVFC_MAD_FAILED:
+   level += ibmvfc_retry_host_init(vhost);
+   ibmvfc_log(vhost, level, "Channel Setup failed\n");
+   fallthrough;
+   case IBMVFC_MAD_DRIVER_FAILED:
+   return;
+   default:
+   dev_err(vhost->dev, "Invalid Channel Setup response: 0x%x\n",
+   mad_status);
+   ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD);
+   return;
+   }
+
+   ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_QUERY);
+   wake_up(&vhost->work_wait_q);
+}
+
+static void ibmvfc_channel_setup(struct ibmvfc_host *vhost)
+{
+   struct ibmvfc_channel_setup_mad *mad;
+   struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf;
+   struct ibmvfc_event *evt = ibmvfc_get_event(vhost);
+
+   memset(setup_buf, 0, sizeof(*setup_buf));
+   setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS);
+
+   ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT);
+   mad = &evt->iu.channel_setup;
+   memset(mad, 0, sizeof(*mad));
+   mad->common.version = cpu_to_be32(1);
+   mad->common.opcode = cpu_to_be32(IBMVFC_CHANNEL_SETUP);
+   mad->common.length = cpu_to_be16(sizeof(*mad));
+   mad->buffer.va = cpu_to_be64(vhost->channel_setup_dma);
+   mad->buffer.len = cpu_to_be32(sizeof(*vhost->channel_setup_buf));
+
+   ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_INIT_WAIT);
+
+   if (!ibmvfc_send_event(evt, vhost, default_timeout))
+   ibmvfc_dbg(vhost, "Sent channel setup\n");
+   else
+   ibmvfc_link_down(vhost, IBMVFC_LINK_DOWN);
+}
+
+static void ibmvfc_channel_enquiry_done(struct ibmvfc_event *evt)
+{
+   struct ibmvfc_host *vhost = evt->vhost;
+   struct ibmvfc_channel_enquiry *rsp = &evt->xfer_iu->channel_enquiry;
+   u32 mad_status = be16_to_cpu(rsp->common.status);
+   int level = IBMVFC_DEFAULT_LOG_LEVEL;
+
+   switch (mad_status) {
+   case IBMVFC_MAD_SUCCESS:
+   ibmvfc_dbg(vhost, "Channel Enquiry succeeded\n");
+   vhost->max_vios_scsi_channels = be32_to_cpu(rsp->num_scsi_subq_channels);
+   ibmvfc_free_event(evt);
+   break;
+   case IBMVFC_MAD_FAILED:
+   level += ibmvfc_retry_host_init(vhost);
+   ibmvfc_log(vhost, level, "Channel Enquiry failed\n");
+   fallthrough;
+   case IBMVFC_MAD_DRIVER_FAILED:
+   ibmvfc_free_event(evt);
+   return;
+   default:
+   dev_err(vhost->dev, "Invalid Channel Enquiry response: 0x%x\n",
+   mad_status);
+   ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD);
+   ibmvfc_free_event(evt);
+   return;
+   }
+
+   ibmvfc_channel_setup(vhost);
+}
+
+static void ibmvfc_channel_enquiry(struct ibmvfc_host *vhost)
+{
+   

[PATCH v3 08/18] ibmvfc: map/request irq and register Sub-CRQ interrupt handler

2020-12-02 Thread Tyrel Datwyler
Create an irq mapping for the hw_irq number provided by phyp firmware,
and request an irq that is assigned to our Sub-CRQ interrupt handler.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 649268996a5c..6b299df786fc 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5134,12 +5134,34 @@ static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
goto reg_failed;
}
 
+   scrq->irq = irq_create_mapping(NULL, scrq->hw_irq);
+
+   if (!scrq->irq) {
+   rc = -EINVAL;
+   dev_err(dev, "Error mapping sub-crq[%d] irq\n", index);
+   goto irq_failed;
+   }
+
+   snprintf(scrq->name, sizeof(scrq->name), "ibmvfc-%x-scsi%d",
+vdev->unit_address, index);
+   rc = request_irq(scrq->irq, ibmvfc_interrupt_scsi, 0, scrq->name, scrq);
+
+   if (rc) {
+   dev_err(dev, "Couldn't register sub-crq[%d] irq\n", index);
+   irq_dispose_mapping(scrq->irq);
+   goto irq_failed;
+   }
+
scrq->hwq_id = index;
scrq->vhost = vhost;
 
LEAVE;
return 0;
 
+irq_failed:
+   do {
+   plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie);
+   } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
 reg_failed:
dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
 dma_map_failed:
-- 
2.27.0



[PATCH v3 07/18] ibmvfc: define Sub-CRQ interrupt handler routine

2020-12-02 Thread Tyrel Datwyler
Simple handler that calls Sub-CRQ drain routine directly.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index b61ae1df21e5..649268996a5c 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3461,6 +3461,16 @@ static void ibmvfc_drain_sub_crq(struct ibmvfc_sub_queue *scrq)
spin_unlock_irqrestore(scrq->vhost->host->host_lock, flags);
 }
 
+static irqreturn_t ibmvfc_interrupt_scsi(int irq, void *scrq_instance)
+{
+   struct ibmvfc_sub_queue *scrq = (struct ibmvfc_sub_queue *)scrq_instance;
+
+   ibmvfc_toggle_scrq_irq(scrq, 0);
+   ibmvfc_drain_sub_crq(scrq);
+
+   return IRQ_HANDLED;
+}
+
 /**
  * ibmvfc_init_tgt - Set the next init job step for the target
  * @tgt:   ibmvfc target struct
-- 
2.27.0



[PATCH v3 01/18] ibmvfc: add vhost fields and defaults for MQ enablement

2020-12-02 Thread Tyrel Datwyler
Introduce several new vhost fields for managing MQ state of the adapter
as well as initial defaults for MQ enablement.

Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c |  9 -
 drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 42e4d35e0d35..f1d677a7423d 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
}
 
shost->transportt = ibmvfc_transport_template;
-   shost->can_queue = max_requests;
+   shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
shost->max_lun = max_lun;
shost->max_id = max_targets;
shost->max_sectors = IBMVFC_MAX_SECTORS;
shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
shost->unique_id = shost->host_no;
+   shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES;
 
vhost = shost_priv(shost);
INIT_LIST_HEAD(&vhost->sent);
@@ -5178,6 +5179,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
vhost->partition_number = -1;
vhost->log_level = log_level;
vhost->task_set = 1;
+
+   vhost->mq_enabled = IBMVFC_MQ;
+   vhost->client_scsi_channels = IBMVFC_SCSI_CHANNELS;
+   vhost->using_channels = 0;
+   vhost->do_enquiry = 1;
+
strcpy(vhost->partition_name, "UNKNOWN");
init_waitqueue_head(&vhost->work_wait_q);
init_waitqueue_head(&vhost->init_wait_q);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 9d58cfd774d3..e095daada70e 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -41,16 +41,21 @@
 #define IBMVFC_DEFAULT_LOG_LEVEL   2
 #define IBMVFC_MAX_CDB_LEN 16
 #define IBMVFC_CLS3_ERROR  0
+#define IBMVFC_MQ  0
+#define IBMVFC_SCSI_CHANNELS   0
+#define IBMVFC_SCSI_HW_QUEUES  1
+#define IBMVFC_MIG_NO_SUB_TO_CRQ   0
+#define IBMVFC_MIG_NO_N_TO_M   0
 
 /*
  * Ensure we have resources for ERP and initialization:
- * 1 for ERP
  * 1 for initialization
  * 1 for NPIV Logout
  * 2 for BSG passthru
  * 2 for each discovery thread
+ * 1 ERP for each possible HW Queue
  */
-#define IBMVFC_NUM_INTERNAL_REQ	(1 + 1 + 1 + 2 + (disc_threads * 2))
+#define IBMVFC_NUM_INTERNAL_REQ	(1 + 1 + 2 + (disc_threads * 2) + IBMVFC_SCSI_HW_QUEUES)
 
 #define IBMVFC_MAD_SUCCESS 0x00
 #define IBMVFC_MAD_NOT_SUPPORTED   0xF1
@@ -826,6 +831,10 @@ struct ibmvfc_host {
int delay_init;
int scan_complete;
int logged_in;
+   int mq_enabled;
+   int using_channels;
+   int do_enquiry;
+   int client_scsi_channels;
int aborting_passthru;
int events_to_log;
 #define IBMVFC_AE_LINKUP   0x0001
-- 
2.27.0



[PATCH v3 03/18] ibmvfc: add Subordinate CRQ definitions

2020-12-02 Thread Tyrel Datwyler
Subordinate Command Response Queues (Sub CRQ) are used in conjunction
with the primary CRQ when more than one queue is needed by the virtual
IO adapter. Recent phyp firmware versions support Sub CRQ's with ibmvfc
adapters. This feature is a prerequisite for supporting multiple
hardware backed submission queues in the vfc adapter.

The Sub CRQ command element differs from the standard CRQ in that it is
32 bytes long, as opposed to 16 bytes for the latter. Despite this extra
16 bytes, the ibmvfc protocol will initially use the original CRQ
command element mapped to the first 16 bytes of the Sub CRQ element.

Add definitions for the Sub CRQ command element and queue.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index e095daada70e..b3cd35cbf067 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -656,6 +656,29 @@ struct ibmvfc_crq_queue {
dma_addr_t msg_token;
 };
 
+struct ibmvfc_sub_crq {
+   struct ibmvfc_crq crq;
+   __be64 reserved[2];
+} __packed __aligned(8);
+
+struct ibmvfc_sub_queue {
+   struct ibmvfc_sub_crq *msgs;
+   dma_addr_t msg_token;
+   int size, cur;
+   struct ibmvfc_host *vhost;
+   unsigned long cookie;
+   unsigned long vios_cookie;
+   unsigned long hw_irq;
+   unsigned long irq;
+   unsigned long hwq_id;
+   char name[32];
+};
+
+struct ibmvfc_scsi_channels {
+   struct ibmvfc_sub_queue *scrqs;
+   unsigned int active_queues;
+};
+
 enum ibmvfc_ae_link_state {
IBMVFC_AE_LS_LINK_UP= 0x01,
IBMVFC_AE_LS_LINK_BOUNCED   = 0x02,
-- 
2.27.0



[PATCH v3 06/18] ibmvfc: add handlers to drain and complete Sub-CRQ responses

2020-12-02 Thread Tyrel Datwyler
The logic for iterating over the Sub-CRQ responses is similar to that
of the primary CRQ. Add the necessary handlers for processing those
responses.

Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 80 ++
 1 file changed, 80 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index e082935f56cf..b61ae1df21e5 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3381,6 +3381,86 @@ static int ibmvfc_toggle_scrq_irq(struct ibmvfc_sub_queue *scrq, int enable)
return rc;
 }
 
+static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
+{
+   struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba);
+   unsigned long flags;
+
+   switch (crq->valid) {
+   case IBMVFC_CRQ_CMD_RSP:
+   break;
+   case IBMVFC_CRQ_XPORT_EVENT:
+   return;
+   default:
+   dev_err(vhost->dev, "Got an invalid message type 0x%02x\n", crq->valid);
+   return;
+   }
+
+   /* The only kind of payload CRQs we should get are responses to
+* things we send. Make sure this response is to something we
+* actually sent
+*/
+   if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) {
+   dev_err(vhost->dev, "Returned correlation_token 0x%08llx is invalid!\n", crq->ioba);
+   return;
+   }
+
+   if (unlikely(atomic_read(&evt->free))) {
+   dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n", crq->ioba);
+   return;
+   }
+
+   del_timer(&evt->timer);
+   list_del(&evt->queue);
+   ibmvfc_trc_end(evt);
+   spin_unlock_irqrestore(vhost->host->host_lock, flags);
+   evt->done(evt);
+   spin_lock_irqsave(vhost->host->host_lock, flags);
+}
+
+static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_sub_queue *scrq)
+{
+   struct ibmvfc_crq *crq;
+
+   crq = &scrq->msgs[scrq->cur].crq;
+   if (crq->valid & 0x80) {
+   if (++scrq->cur == scrq->size)
+   scrq->cur = 0;
+   rmb();
+   } else
+   crq = NULL;
+
+   return crq;
+}
+
+static void ibmvfc_drain_sub_crq(struct ibmvfc_sub_queue *scrq)
+{
+   struct ibmvfc_crq *crq;
+   unsigned long flags;
+   int done = 0;
+
+   spin_lock_irqsave(scrq->vhost->host->host_lock, flags);
+   while (!done) {
+   while ((crq = ibmvfc_next_scrq(scrq)) != NULL) {
+   ibmvfc_handle_scrq(crq, scrq->vhost);
+   crq->valid = 0;
+   wmb();
+   }
+
+   ibmvfc_toggle_scrq_irq(scrq, 1);
+   if ((crq = ibmvfc_next_scrq(scrq)) != NULL) {
+   ibmvfc_toggle_scrq_irq(scrq, 0);
+   ibmvfc_handle_scrq(crq, scrq->vhost);
+   crq->valid = 0;
+   wmb();
+   } else
+   done = 1;
+   }
+   spin_unlock_irqrestore(scrq->vhost->host->host_lock, flags);
+}
+
 /**
  * ibmvfc_init_tgt - Set the next init job step for the target
  * @tgt:   ibmvfc target struct
-- 
2.27.0



[PATCH v3 05/18] ibmvfc: add Sub-CRQ IRQ enable/disable routine

2020-12-02 Thread Tyrel Datwyler
Each Sub-CRQ has its own interrupt. A hypercall is required to toggle
the IRQ state. Provide the necessary mechanism via a helper function.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index f879be666c84..e082935f56cf 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3361,6 +3361,26 @@ static void ibmvfc_tasklet(void *data)
spin_unlock_irqrestore(vhost->host->host_lock, flags);
 }
 
+static int ibmvfc_toggle_scrq_irq(struct ibmvfc_sub_queue *scrq, int enable)
+{
+   struct device *dev = scrq->vhost->dev;
+   struct vio_dev *vdev = to_vio_dev(dev);
+   unsigned long rc;
+   int irq_action = H_ENABLE_VIO_INTERRUPT;
+
+   if (!enable)
+   irq_action = H_DISABLE_VIO_INTERRUPT;
+
+   rc = plpar_hcall_norets(H_VIOCTL, vdev->unit_address, irq_action,
+   scrq->hw_irq, 0, 0);
+
+   if (rc)
+   dev_err(dev, "Couldn't %s sub-crq[%lu] irq. rc=%ld\n",
+   enable ? "enable" : "disable", scrq->hwq_id, rc);
+
+   return rc;
+}
+
 /**
  * ibmvfc_init_tgt - Set the next init job step for the target
  * @tgt:   ibmvfc target struct
-- 
2.27.0



[PATCH v3 00/18] ibmvfc: initial MQ development

2020-12-02 Thread Tyrel Datwyler
Recent updates in pHyp Firmware and VIOS releases provide new infrastructure
towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each
Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the
partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then
negotiated with the VIOS via new Management Datagrams (MADs) for channel setup.

This initial implementation adds the necessary Sub-CRQ framework and implements
the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS
HW backed channels. The event pool and locking still leverages the legacy single
queue implementation, and as such lock contention is problematic when increasing
the number of queues. However, this initial work demonstrates a 1.2x factor
increase in IOPs when configured with two HW queues despite lock contention.

changes in v3:
* Patch 4: changed firmware support logging to dev_warn_once
* Patch 6: adjusted locking
* Patch 15: dropped logging verbosity, moved cancel event tracking into subqueue
* Patch 17: removed write permission for migration module parameters
drive hard reset after update to num of scsi channels

changes in v2:
* Patch 4: NULL'd scsi_scrq reference after deallocation
* Patch 6: Added switch case to handle XPORT event
* Patch 9: fixed ibmvfc_event leak and double free
* added support for cancel command with MQ
* added parameter toggles for MQ settings

Tyrel Datwyler (18):
  ibmvfc: add vhost fields and defaults for MQ enablement
  ibmvfc: define hcall wrapper for registering a Sub-CRQ
  ibmvfc: add Subordinate CRQ definitions
  ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
  ibmvfc: add Sub-CRQ IRQ enable/disable routine
  ibmvfc: add handlers to drain and complete Sub-CRQ responses
  ibmvfc: define Sub-CRQ interrupt handler routine
  ibmvfc: map/request irq and register Sub-CRQ interrupt handler
  ibmvfc: implement channel enquiry and setup commands
  ibmvfc: advertise client support for using hardware channels
  ibmvfc: set and track hw queue in ibmvfc_event struct
  ibmvfc: send commands down HW Sub-CRQ when channelized
  ibmvfc: register Sub-CRQ handles with VIOS during channel setup
  ibmvfc: add cancel mad initialization helper
  ibmvfc: send Cancel MAD down each hw scsi channel
  ibmvfc: enable MQ and set reasonable defaults
  ibmvfc: provide modules parameters for MQ settings
  ibmvfc: drop host lock when completing commands in CRQ

 drivers/scsi/ibmvscsi/ibmvfc.c | 721 +
 drivers/scsi/ibmvscsi/ibmvfc.h |  79 +++-
 2 files changed, 711 insertions(+), 89 deletions(-)

-- 
2.27.0



[PATCH v3 02/18] ibmvfc: define hcall wrapper for registering a Sub-CRQ

2020-12-02 Thread Tyrel Datwyler
Sub-CRQs are registered with firmware via a hypercall. Abstract that
interface into a simpler helper function.

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index f1d677a7423d..64674054dbae 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -138,6 +138,20 @@ static void ibmvfc_tgt_move_login(struct ibmvfc_target *);
 
 static const char *unknown_error = "unknown error";
 
+static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba,
+ unsigned long length, unsigned long *cookie,
+ unsigned long *irq)
+{
+   unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+   long rc;
+
+   rc = plpar_hcall(H_REG_SUB_CRQ, retbuf, unit_address, ioba, length);
+   *cookie = retbuf[0];
+   *irq = retbuf[1];
+
+   return rc;
+}
+
static int ibmvfc_check_caps(struct ibmvfc_host *vhost, unsigned long cap_flags)
 {
u64 host_caps = be64_to_cpu(vhost->login_buf->resp.capabilities);
-- 
2.27.0



[PATCH v3 04/18] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels

2020-12-02 Thread Tyrel Datwyler
Allocate a set of Sub-CRQs in advance. During channel setup the client
and VIOS negotiate the number of queues the VIOS supports and the number
that the client desires to request. Its possible that the final channel
resources allocated is less than requested, but the client is still
responsible for sending handles for every queue it is hoping for.

Also, provide deallocation cleanup routines.

Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 129 +
 drivers/scsi/ibmvscsi/ibmvfc.h |   1 +
 2 files changed, 130 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 64674054dbae..f879be666c84 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -793,6 +793,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
unsigned long flags;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
+   struct ibmvfc_crq_queue *crq = &vhost->crq;
+   struct ibmvfc_sub_queue *scrq;
+   int i;
 
/* Close the CRQ */
do {
@@ -809,6 +811,14 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
memset(crq->msgs, 0, PAGE_SIZE);
crq->cur = 0;
 
+   if (vhost->scsi_scrqs.scrqs) {
+   for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
+   scrq = &vhost->scsi_scrqs.scrqs[i];
+   memset(scrq->msgs, 0, PAGE_SIZE);
+   scrq->cur = 0;
+   }
+   }
+
/* And re-open it again */
rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address,
crq->msg_token, PAGE_SIZE);
@@ -4983,6 +4993,118 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
return retrc;
 }
 
+static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
+ int index)
+{
+   struct device *dev = vhost->dev;
+   struct vio_dev *vdev = to_vio_dev(dev);
+   struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
+   int rc = -ENOMEM;
+
+   ENTER;
+
+   scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL);
+   if (!scrq->msgs)
+   return rc;
+
+   scrq->size = PAGE_SIZE / sizeof(*scrq->msgs);
+   scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE,
+DMA_BIDIRECTIONAL);
+
+   if (dma_mapping_error(dev, scrq->msg_token))
+   goto dma_map_failed;
+
+   rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
+  &scrq->cookie, &scrq->hw_irq);
+
+   if (rc) {
+   dev_warn(dev, "Error registering sub-crq: %d\n", rc);
+   if (rc == H_PARAMETER)
+   dev_warn_once(dev, "Firmware may not support MQ\n");
+   goto reg_failed;
+   }
+
+   scrq->hwq_id = index;
+   scrq->vhost = vhost;
+
+   LEAVE;
+   return 0;
+
+reg_failed:
+   dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
+dma_map_failed:
+   free_page((unsigned long)scrq->msgs);
+   LEAVE;
+   return rc;
+}
+
+static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index)
+{
+   struct device *dev = vhost->dev;
+   struct vio_dev *vdev = to_vio_dev(dev);
+   struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
+   long rc;
+
+   ENTER;
+
+   do {
+   rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
+   scrq->cookie);
+   } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
+
+   if (rc)
+   dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc);
+
+   dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
+   free_page((unsigned long)scrq->msgs);
+   LEAVE;
+}
+
+static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
+{
+   int i, j;
+
+   ENTER;
+
+   vhost->scsi_scrqs.scrqs = kcalloc(IBMVFC_SCSI_HW_QUEUES,
+ sizeof(*vhost->scsi_scrqs.scrqs),
+ GFP_KERNEL);
+   if (!vhost->scsi_scrqs.scrqs)
+   return -1;
+
+   for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
+   if (ibmvfc_register_scsi_channel(vhost, i)) {
+   for (j = i; j > 0; j--)
+   ibmvfc_deregister_scsi_channel(vhost, j - 1);
+   kfree(vhost->scsi_scrqs.scrqs);
+   vhost->scsi_scrqs.scrqs = NULL;
+   vhost->scsi_scrqs.active_queues = 0;
+   LEAVE;
+   return -1;
+   }
+   }
+
+   LEAVE;
+   return 0;
+}
+
+static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
+{
+   int i;
+
+   ENTER;
+   if (!vhost->scsi_scrqs.scrqs)
+   return;
+
+   for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++)
+ 

[PATCH] powerpc: fix boolreturn.cocci warnings

2020-12-02 Thread kernel test robot
From: kernel test robot 

arch/powerpc/kernel/firmware.c:31:9-10: WARNING: return of 0/1 in function 
'check_kvm_guest' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Fixes: 77354ecf8473 ("powerpc: Rename is_kvm_guest() to check_kvm_guest()")
CC: Srikar Dronamraju 
Reported-by: kernel test robot 
Signed-off-by: kernel test robot 
---

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   fb003959777a635dea8910cf71109b612c7f940c
commit: 77354ecf8473208a5cc5f20a501760f7d6d631cd [121/184] powerpc: Rename 
is_kvm_guest() to check_kvm_guest()

 firmware.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/powerpc/kernel/firmware.c
+++ b/arch/powerpc/kernel/firmware.c
@@ -28,11 +28,11 @@ bool check_kvm_guest(void)
 
hyper_node = of_find_node_by_path("/hypervisor");
if (!hyper_node)
-   return 0;
+   return false;
 
if (!of_device_is_compatible(hyper_node, "linux,kvm"))
-   return 0;
+   return false;
 
-   return 1;
+   return true;
 }
 #endif


[powerpc:next-test 121/184] arch/powerpc/kernel/firmware.c:31:9-10: WARNING: return of 0/1 in function 'check_kvm_guest' with return type bool

2020-12-02 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   fb003959777a635dea8910cf71109b612c7f940c
commit: 77354ecf8473208a5cc5f20a501760f7d6d631cd [121/184] powerpc: Rename 
is_kvm_guest() to check_kvm_guest()
config: powerpc-randconfig-c003-20201202 (attached as .config)
compiler: powerpc64le-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


"coccinelle warnings: (new ones prefixed by >>)"
>> arch/powerpc/kernel/firmware.c:31:9-10: WARNING: return of 0/1 in function 
>> 'check_kvm_guest' with return type bool

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH v2 06/17] ibmvfc: add handlers to drain and complete Sub-CRQ responses

2020-12-02 Thread Tyrel Datwyler
On 12/2/20 7:46 AM, Brian King wrote:
> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>> The logic for iterating over the Sub-CRQ responses is similar to that
>> of the primary CRQ. Add the necessary handlers for processing those
>> responses.
>>
>> Signed-off-by: Tyrel Datwyler 
>> ---
>>  drivers/scsi/ibmvscsi/ibmvfc.c | 77 ++
>>  1 file changed, 77 insertions(+)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index 97f00fefa809..e9da3f60c793 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -3381,6 +3381,83 @@ static int ibmvfc_toggle_scrq_irq(struct 
>> ibmvfc_sub_queue *scrq, int enable)
>>  return rc;
>>  }
>>  
>> +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host 
>> *vhost)
>> +{
>> +struct ibmvfc_event *evt = (struct ibmvfc_event 
>> *)be64_to_cpu(crq->ioba);
>> +unsigned long flags;
>> +
>> +switch (crq->valid) {
>> +case IBMVFC_CRQ_CMD_RSP:
>> +break;
>> +case IBMVFC_CRQ_XPORT_EVENT:
>> +return;
>> +default:
>> +dev_err(vhost->dev, "Got an invalid message type 0x%02x\n", 
>> crq->valid);
>> +return;
>> +}
>> +
>> +/* The only kind of payload CRQs we should get are responses to
>> + * things we send. Make sure this response is to something we
>> + * actually sent
>> + */
>> +if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) {
>> +dev_err(vhost->dev, "Returned correlation_token 0x%08llx is 
>> invalid!\n",
>> +crq->ioba);
>> +return;
>> +}
>> +
>> +if (unlikely(atomic_read(&evt->free))) {
>> +dev_err(vhost->dev, "Received duplicate correlation_token 
>> 0x%08llx!\n",
>> +crq->ioba);
>> +return;
>> +}
>> +
>> +spin_lock_irqsave(vhost->host->host_lock, flags);
>> +del_timer(&evt->timer);
>> +list_del(&evt->queue);
>> +ibmvfc_trc_end(evt);
>> +spin_unlock_irqrestore(vhost->host->host_lock, flags);
>> +evt->done(evt);
>> +}
>> +
>> +static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_sub_queue *scrq)
>> +{
>> +struct ibmvfc_crq *crq;
>> +
>> +crq = &scrq->msgs[scrq->cur].crq;
>> +if (crq->valid & 0x80) {
>> +if (++scrq->cur == scrq->size)
> 
> You are incrementing the cur pointer without any locks held. Although
> unlikely, could you also be in ibmvfc_reset_crq in another thread?
> If so, you'd have a subtle race condition here where the cur pointer could
> be read, then ibmvfc_reset_crq writes it to zero, then this thread
> writes it to a non zero value, which would then cause you to be out of
> sync with the VIOS as to where the cur pointer is.

Oof, yeah I was previously holding the lock the whole time, but switched it up
once I realized I can't complete a scsi command with the lock held, and got a
little too loose with it.
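For what it's worth, the lost update Brian describes can be sketched in plain userspace C (a pthread mutex stands in for the host lock; the struct and function names are illustrative, not the driver's): the cursor advance and the reset must serialize on the same lock, so the read-increment-write of ->cur can never interleave with the write of zero.

```c
#include <pthread.h>

/* Illustrative stand-in for the sub-queue cursor state; the real
 * struct ibmvfc_sub_queue and the host lock look different. */
struct sub_queue {
	unsigned int cur;	/* next ring slot to consume */
	unsigned int size;	/* number of ring slots */
	pthread_mutex_t lock;	/* stands in for the host lock */
};

/* Advance the cursor entirely inside the critical section, so a
 * concurrent reset cannot slip between the read and write of ->cur. */
static unsigned int scrq_advance(struct sub_queue *q)
{
	unsigned int slot;

	pthread_mutex_lock(&q->lock);
	slot = q->cur;
	if (++q->cur == q->size)
		q->cur = 0;
	pthread_mutex_unlock(&q->lock);
	return slot;
}

/* A reset (as ibmvfc_reset_crq would do) takes the same lock, so both
 * sides always observe a consistent cursor. */
static void scrq_reset(struct sub_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->cur = 0;
	pthread_mutex_unlock(&q->lock);
}
```

The point is the lock scope: every access to ->cur sits inside the critical section, including the wraparound.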

-Tyrel
> 
>> +scrq->cur = 0;
>> +rmb();
>> +} else
>> +crq = NULL;
>> +
>> +return crq;
>> +}
>> +
> 
> 
> 



Re: [PATCH v2 04/17] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels

2020-12-02 Thread Tyrel Datwyler
On 12/2/20 7:25 AM, Brian King wrote:
> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>> +static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
>> +  int index)
>> +{
>> +struct device *dev = vhost->dev;
>> +struct vio_dev *vdev = to_vio_dev(dev);
>> +struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
>> +int rc = -ENOMEM;
>> +
>> +ENTER;
>> +
>> +scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL);
>> +if (!scrq->msgs)
>> +return rc;
>> +
>> +scrq->size = PAGE_SIZE / sizeof(*scrq->msgs);
>> +scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE,
>> + DMA_BIDIRECTIONAL);
>> +
>> +if (dma_mapping_error(dev, scrq->msg_token))
>> +goto dma_map_failed;
>> +
>> +rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
>> +   &scrq->cookie, &scrq->hw_irq);
>> +
>> +if (rc) {
>> +dev_warn(dev, "Error registering sub-crq: %d\n", rc);
>> +dev_warn(dev, "Firmware may not support MQ\n");
> 
> Will this now get logged everywhere this new driver runs if the firmware
> does not support sub CRQs? Is there something better that could be done
> here to only log this for a true error and not just because a new driver
> is running with an older firmware release?

I suppose we can guess that, if the rc is H_PARAMETER, sub-crqs are probably not
supported, and do a dev_warn_once() for the "no MQ support" message. H_PARAMETER
could mean other things though so we still need to log the failure in my 
opinion.
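Concretely, the rc handling could be reshuffled along these lines (untested sketch against the hunk quoted above; treating H_PARAMETER as "probably no MQ support" is the guess discussed here, not confirmed firmware behavior):

```c
	rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
			   &scrq->cookie, &scrq->hw_irq);
	if (rc) {
		/* Always log the registration failure itself... */
		dev_warn(dev, "Error registering sub-crq: %d\n", rc);
		/* ...but only nag once about probable missing MQ support. */
		if (rc == H_PARAMETER)
			dev_warn_once(dev, "Firmware may not support MQ\n");
		goto reg_failed;
	}
```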

-Tyrel

> 
>> +goto reg_failed;
>> +}
>> +
>> +scrq->hwq_id = index;
>> +scrq->vhost = vhost;
>> +
>> +LEAVE;
>> +return 0;
>> +
>> +reg_failed:
>> +dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
>> +dma_map_failed:
>> +free_page((unsigned long)scrq->msgs);
>> +LEAVE;
>> +return rc;
>> +}
>> +
> 
> 
> 



Re: [PATCH v2 17/17] ibmvfc: provide modules parameters for MQ settings

2020-12-02 Thread Tyrel Datwyler
On 12/2/20 10:40 AM, Brian King wrote:
> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>> +module_param_named(mig_channels_only, mig_channels_only, uint, S_IRUGO | 
>> S_IWUSR);
>> +MODULE_PARM_DESC(mig_channels_only, "Prevent migration to non-channelized 
>> system. "
>> + "[Default=" __stringify(IBMVFC_MIG_NO_SUB_TO_CRQ) "]");
>> +module_param_named(mig_no_less_channels, mig_no_less_channels, uint, 
>> S_IRUGO | S_IWUSR);
>> +MODULE_PARM_DESC(mig_no_less_channels, "Prevent migration to system with 
>> less channels. "
>> + "[Default=" __stringify(IBMVFC_MIG_NO_N_TO_M) "]");
> 
> Both of these are writeable, but it doesn't look like you do any 
> re-negotiation
> with the VIOS for these changed settings to take effect if someone changes
> them at runtime.

For some reason I convinced myself that these could just be changed on the fly,
but yes, for them to actually take effect, we need to re-negotiate the channel setup.

> 
>> +
>>  module_param_named(init_timeout, init_timeout, uint, S_IRUGO | S_IWUSR);
>>  MODULE_PARM_DESC(init_timeout, "Initialization timeout in seconds. "
>>   "[Default=" __stringify(IBMVFC_INIT_TIMEOUT) "]");
> 
>> @@ -3228,6 +3250,36 @@ static ssize_t ibmvfc_store_log_level(struct device 
>> *dev,
>>  return strlen(buf);
>>  }
>>  
>> +static ssize_t ibmvfc_show_scsi_channels(struct device *dev,
>> + struct device_attribute *attr, char 
>> *buf)
>> +{
>> +struct Scsi_Host *shost = class_to_shost(dev);
>> +struct ibmvfc_host *vhost = shost_priv(shost);
>> +unsigned long flags = 0;
>> +int len;
>> +
>> +spin_lock_irqsave(shost->host_lock, flags);
>> +len = snprintf(buf, PAGE_SIZE, "%d\n", vhost->client_scsi_channels);
>> +spin_unlock_irqrestore(shost->host_lock, flags);
>> +return len;
>> +}
>> +
>> +static ssize_t ibmvfc_store_scsi_channels(struct device *dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t count)
>> +{
>> +struct Scsi_Host *shost = class_to_shost(dev);
>> +struct ibmvfc_host *vhost = shost_priv(shost);
>> +unsigned long flags = 0;
>> +unsigned int channels;
>> +
>> +spin_lock_irqsave(shost->host_lock, flags);
>> +channels = simple_strtoul(buf, NULL, 10);
>> +vhost->client_scsi_channels = min(channels, nr_scsi_hw_queues);
> 
> Don't we need to do a LIP here for this new setting to go into effect?

Actually, we need a hard reset to break the CRQ Pair. A LIP will only do a NPIV
Logout/Login which keeps the existing channel setup.

-Tyrel

> 
>> +spin_unlock_irqrestore(shost->host_lock, flags);
>> +return strlen(buf);
>> +}
>> +
>>  static DEVICE_ATTR(partition_name, S_IRUGO, 
>> ibmvfc_show_host_partition_name, NULL);
>>  static DEVICE_ATTR(device_name, S_IRUGO, ibmvfc_show_host_device_name, 
>> NULL);
>>  static DEVICE_ATTR(port_loc_code, S_IRUGO, ibmvfc_show_host_loc_code, NULL);
> 
> 
> 



Re: [PATCH v2 15/17] ibmvfc: send Cancel MAD down each hw scsi channel

2020-12-02 Thread Tyrel Datwyler
On 12/2/20 10:27 AM, Brian King wrote:
> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>> In general the client needs to send Cancel MADs and task management
>> commands down the same channel as the command(s) intended to cancel or
>> abort. The client assigns cancel keys per LUN and thus must send a
>> Cancel down each channel on which commands were submitted for that LUN. Further,
>> the client then must wait for those cancel completions prior to
>> submitting a LUN RESET or ABORT TASK SET.
>>
>> Allocate event pointers for each possible scsi channel and assign an
>> event for each channel that requires a cancel. Wait for completion of each
>> submitted cancel.
>>
>> Signed-off-by: Tyrel Datwyler 
>> ---
>>  drivers/scsi/ibmvscsi/ibmvfc.c | 106 +
>>  1 file changed, 68 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index 0b6284020f06..97e8eed04b01 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -2339,32 +2339,52 @@ static int ibmvfc_cancel_all(struct scsi_device 
>> *sdev, int type)
>>  {
>>  struct ibmvfc_host *vhost = shost_priv(sdev->host);
>>  struct ibmvfc_event *evt, *found_evt;
>> -union ibmvfc_iu rsp;
>> -int rsp_rc = -EBUSY;
>> +struct ibmvfc_event **evt_list;
>> +union ibmvfc_iu *rsp;
>> +int rsp_rc = 0;
>>  unsigned long flags;
>>  u16 status;
>> +int num_hwq = 1;
>> +int i;
>> +int ret = 0;
>>  
>>  ENTER;
>>  spin_lock_irqsave(vhost->host->host_lock, flags);
>> -found_evt = NULL;
>> -list_for_each_entry(evt, &vhost->sent, queue) {
>> -if (evt->cmnd && evt->cmnd->device == sdev) {
>> -found_evt = evt;
>> -break;
>> +if (vhost->using_channels && vhost->scsi_scrqs.active_queues)
>> +num_hwq = vhost->scsi_scrqs.active_queues;
>> +
>> +evt_list = kcalloc(num_hwq, sizeof(*evt_list), GFP_KERNEL);
>> +rsp = kcalloc(num_hwq, sizeof(*rsp), GFP_KERNEL);
> 
> Can't this just go on the stack? We don't want to be allocating memory
> during error recovery. Or, alternatively, you could put this in the
> vhost structure and protect it with a mutex. We only have enough events
> to single thread these anyway.
Yes, this could just go on the stack.
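Since num_hwq is bounded by the number of hardware queues the driver itself configures, the on-stack variant might look like this (sketch only; it assumes IBMVFC_SCSI_HW_QUEUES is the compile-time upper bound on active_queues, as the allocation code earlier in the series implies):

```c
	struct ibmvfc_event *evt_list[IBMVFC_SCSI_HW_QUEUES];
	union ibmvfc_iu rsp[IBMVFC_SCSI_HW_QUEUES];
	int num_hwq = 1;

	if (vhost->using_channels && vhost->scsi_scrqs.active_queues)
		num_hwq = vhost->scsi_scrqs.active_queues;
	/* num_hwq <= IBMVFC_SCSI_HW_QUEUES, so fixed-size arrays suffice
	 * and no allocation can fail during error recovery. */
```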

> 
>> +
>> +for (i = 0; i < num_hwq; i++) {
>> +sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands 
>> on queue %d.\n", i);
> 
> Prior to this patch, if there was nothing outstanding to the device and 
> cancel_all was called,
> no messages would get printed. This is changing that behavior. Is that 
> intentional? Additionally,
> it looks like this will get a lot more verbose, logging a message for each hw 
> queue, regardless
> of whether there was anything outstanding. Perhaps you want to move this down 
> to after the check
> for !found_evt?

It would actually print "no commands found to cancel". I think it's fair to make
it less verbose or at least make them dbg output for each queue.

-Tyrel

> 
>> +
>> +found_evt = NULL;
>> +list_for_each_entry(evt, &vhost->sent, queue) {
>> +if (evt->cmnd && evt->cmnd->device == sdev && evt->hwq 
>> == i) {
>> +found_evt = evt;
>> +break;
>> +}
>>  }
>> -}
>>  
> 
> 
> 



[powerpc:next-test 124/184] drivers/misc/lkdtm/powerpc.c:13:54: error: no member named 'sllp' in 'struct mmu_psize_def'

2020-12-02 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   fb003959777a635dea8910cf71109b612c7f940c
commit: d1579010cfe439978f9830b5632f9795049c6717 [124/184] lkdtm/powerpc: Add 
SLB multihit test
config: powerpc64-randconfig-r013-20201202 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
2671fccf0381769276ca8246ec0499adcb9b0355)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d1579010cfe439978f9830b5632f9795049c6717
git remote add powerpc 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
git fetch --no-tags powerpc next-test
git checkout d1579010cfe439978f9830b5632f9795049c6717
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> drivers/misc/lkdtm/powerpc.c:13:54: error: no member named 'sllp' in 'struct 
>> mmu_psize_def'
   flags = SLB_VSID_KERNEL | mmu_psize_defs[page_size].sllp;
 ~ ^
>> drivers/misc/lkdtm/powerpc.c:13:10: error: use of undeclared identifier 
>> 'SLB_VSID_KERNEL'
   flags = SLB_VSID_KERNEL | mmu_psize_defs[page_size].sllp;
   ^
>> drivers/misc/lkdtm/powerpc.c:17:15: error: implicit declaration of function 
>> 'mk_vsid_data' [-Werror,-Wimplicit-function-declaration]
: "r" (mk_vsid_data(p, ssize, flags)),
   ^
>> drivers/misc/lkdtm/powerpc.c:18:15: error: implicit declaration of function 
>> 'mk_esid_data' [-Werror,-Wimplicit-function-declaration]
  "r" (mk_esid_data(p, ssize, SLB_NUM_BOLTED))
   ^
   drivers/misc/lkdtm/powerpc.c:18:15: note: did you mean 'mk_vsid_data'?
   drivers/misc/lkdtm/powerpc.c:17:15: note: 'mk_vsid_data' declared here
: "r" (mk_vsid_data(p, ssize, flags)),
   ^
>> drivers/misc/lkdtm/powerpc.c:18:38: error: use of undeclared identifier 
>> 'SLB_NUM_BOLTED'
  "r" (mk_esid_data(p, ssize, SLB_NUM_BOLTED))
  ^
   drivers/misc/lkdtm/powerpc.c:23:34: error: use of undeclared identifier 
'SLB_NUM_BOLTED'
 "r" (mk_esid_data(p, ssize, SLB_NUM_BOLTED + 1))
 ^
>> drivers/misc/lkdtm/powerpc.c:37:37: error: use of undeclared identifier 
>> 'MMU_SEGSIZE_1T'
   insert_slb_entry((unsigned long)p, MMU_SEGSIZE_1T, 
mmu_vmalloc_psize);
  ^
>> drivers/misc/lkdtm/powerpc.c:37:53: error: use of undeclared identifier 
>> 'mmu_vmalloc_psize'
   insert_slb_entry((unsigned long)p, MMU_SEGSIZE_1T, 
mmu_vmalloc_psize);
  ^
   drivers/misc/lkdtm/powerpc.c:56:37: error: use of undeclared identifier 
'MMU_SEGSIZE_1T'
   insert_slb_entry((unsigned long)p, MMU_SEGSIZE_1T, mmu_linear_psize);
  ^
   drivers/misc/lkdtm/powerpc.c:85:18: error: use of undeclared identifier 
'SLB_NUM_BOLTED'
 "r" (esid | SLB_NUM_BOLTED)
 ^
   drivers/misc/lkdtm/powerpc.c:94:19: error: use of undeclared identifier 
'SLB_NUM_BOLTED'
 "r" (esid | (SLB_NUM_BOLTED + 1))
  ^
   11 errors generated.

vim +13 drivers/misc/lkdtm/powerpc.c

 7  
 8  /* Inserts new slb entries */
 9  static void insert_slb_entry(unsigned long p, int ssize, int page_size)
10  {
11  unsigned long flags;
12  
  > 13  flags = SLB_VSID_KERNEL | mmu_psize_defs[page_size].sllp;
14  preempt_disable();
15  
16  asm volatile("slbmte %0,%1" :
  > 17   : "r" (mk_vsid_data(p, ssize, flags)),
  > 18 "r" (mk_esid_data(p, ssize, SLB_NUM_BOLTED))
19   : "memory");
20  
21  asm volatile("slbmte %0,%1" :
22  : "r" (mk_vsid_data(p, ssize, flags)),
23"

Re: [PATCH v2 2/2] kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1

2020-12-02 Thread Kees Cook
On Wed, Dec 02, 2020 at 11:37:38AM +0900, Masahiro Yamada wrote:
> On Wed, Dec 2, 2020 at 5:56 AM Kees Cook  wrote:
> >
> > On Tue, Dec 01, 2020 at 10:31:37PM +0900, Masahiro Yamada wrote:
> > > On Wed, Nov 25, 2020 at 7:22 AM Kees Cook  wrote:
> > > >
> > > > On Thu, Nov 19, 2020 at 01:13:27PM -0800, Nick Desaulniers wrote:
> > > > > On Thu, Nov 19, 2020 at 12:57 PM Nathan Chancellor
> > > > >  wrote:
> > > > > >
> > > > > > ld.lld 10.0.1 spews a bunch of various warnings about .rela 
> > > > > > sections,
> > > > > > along with a few others. Newer versions of ld.lld do not have these
> > > > > > warnings. As a result, do not add '--orphan-handling=warn' to
> > > > > > LDFLAGS_vmlinux if ld.lld's version is not new enough.
> > > > > >
> > > > > > Link: https://github.com/ClangBuiltLinux/linux/issues/1187
> > > > > > Link: https://github.com/ClangBuiltLinux/linux/issues/1193
> > > > > > Reported-by: Arvind Sankar 
> > > > > > Reported-by: kernelci.org bot 
> > > > > > Reported-by: Mark Brown 
> > > > > > Reviewed-by: Kees Cook 
> > > > > > Signed-off-by: Nathan Chancellor 
> > > > >
> > > > > Thanks for the additions in v2.
> > > > > Reviewed-by: Nick Desaulniers 
> > > >
> > > > I'm going to carry this for a few days in -next, and if no one screams,
> > > > ask Linus to pull it for v5.10-rc6.
> > > >
> > > > Thanks!
> > > >
> > > > --
> > > > Kees Cook
> > >
> > >
> > > Sorry for the delay.
> > > Applied to linux-kbuild.
> >
> > Great, thanks!
> >
> > > But, I already see this in linux-next.
> > > Please let me know if I should drop it from my tree.
> >
> > My intention was to get this to Linus this week. Do you want to do that
> > yourself, or Ack the patches in my tree and I'll send it?
> 
> I will send a kbuild pull request myself this week.

Okay, thanks! I've removed it from my -next tree now.

-- 
Kees Cook


Re: [PATCH v2 17/17] ibmvfc: provide modules parameters for MQ settings

2020-12-02 Thread Brian King
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> +module_param_named(mig_channels_only, mig_channels_only, uint, S_IRUGO | 
> S_IWUSR);
> +MODULE_PARM_DESC(mig_channels_only, "Prevent migration to non-channelized 
> system. "
> +  "[Default=" __stringify(IBMVFC_MIG_NO_SUB_TO_CRQ) "]");
> +module_param_named(mig_no_less_channels, mig_no_less_channels, uint, S_IRUGO 
> | S_IWUSR);
> +MODULE_PARM_DESC(mig_no_less_channels, "Prevent migration to system with 
> less channels. "
> +  "[Default=" __stringify(IBMVFC_MIG_NO_N_TO_M) "]");

Both of these are writeable, but it doesn't look like you do any re-negotiation
with the VIOS for these changed settings to take effect if someone changes
them at runtime.

> +
>  module_param_named(init_timeout, init_timeout, uint, S_IRUGO | S_IWUSR);
>  MODULE_PARM_DESC(init_timeout, "Initialization timeout in seconds. "
>"[Default=" __stringify(IBMVFC_INIT_TIMEOUT) "]");

> @@ -3228,6 +3250,36 @@ static ssize_t ibmvfc_store_log_level(struct device 
> *dev,
>   return strlen(buf);
>  }
>  
> +static ssize_t ibmvfc_show_scsi_channels(struct device *dev,
> +  struct device_attribute *attr, char 
> *buf)
> +{
> + struct Scsi_Host *shost = class_to_shost(dev);
> + struct ibmvfc_host *vhost = shost_priv(shost);
> + unsigned long flags = 0;
> + int len;
> +
> + spin_lock_irqsave(shost->host_lock, flags);
> + len = snprintf(buf, PAGE_SIZE, "%d\n", vhost->client_scsi_channels);
> + spin_unlock_irqrestore(shost->host_lock, flags);
> + return len;
> +}
> +
> +static ssize_t ibmvfc_store_scsi_channels(struct device *dev,
> +  struct device_attribute *attr,
> +  const char *buf, size_t count)
> +{
> + struct Scsi_Host *shost = class_to_shost(dev);
> + struct ibmvfc_host *vhost = shost_priv(shost);
> + unsigned long flags = 0;
> + unsigned int channels;
> +
> + spin_lock_irqsave(shost->host_lock, flags);
> + channels = simple_strtoul(buf, NULL, 10);
> + vhost->client_scsi_channels = min(channels, nr_scsi_hw_queues);

Don't we need to do a LIP here for this new setting to go into effect?

> + spin_unlock_irqrestore(shost->host_lock, flags);
> + return strlen(buf);
> +}
> +
>  static DEVICE_ATTR(partition_name, S_IRUGO, ibmvfc_show_host_partition_name, 
> NULL);
>  static DEVICE_ATTR(device_name, S_IRUGO, ibmvfc_show_host_device_name, NULL);
>  static DEVICE_ATTR(port_loc_code, S_IRUGO, ibmvfc_show_host_loc_code, NULL);



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH v2 16/17] ibmvfc: enable MQ and set reasonable defaults

2020-12-02 Thread Brian King
Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH v2 15/17] ibmvfc: send Cancel MAD down each hw scsi channel

2020-12-02 Thread Brian King
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> In general the client needs to send Cancel MADs and task management
> commands down the same channel as the command(s) intended to cancel or
> abort. The client assigns cancel keys per LUN and thus must send a
> Cancel down each channel on which commands were submitted for that LUN. Further,
> the client then must wait for those cancel completions prior to
> submitting a LUN RESET or ABORT TASK SET.
> 
> Allocate event pointers for each possible scsi channel and assign an
> event for each channel that requires a cancel. Wait for completion of each
> submitted cancel.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 106 +
>  1 file changed, 68 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 0b6284020f06..97e8eed04b01 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -2339,32 +2339,52 @@ static int ibmvfc_cancel_all(struct scsi_device 
> *sdev, int type)
>  {
>   struct ibmvfc_host *vhost = shost_priv(sdev->host);
>   struct ibmvfc_event *evt, *found_evt;
> - union ibmvfc_iu rsp;
> - int rsp_rc = -EBUSY;
> + struct ibmvfc_event **evt_list;
> + union ibmvfc_iu *rsp;
> + int rsp_rc = 0;
>   unsigned long flags;
>   u16 status;
> + int num_hwq = 1;
> + int i;
> + int ret = 0;
>  
>   ENTER;
>   spin_lock_irqsave(vhost->host->host_lock, flags);
> - found_evt = NULL;
> - list_for_each_entry(evt, &vhost->sent, queue) {
> - if (evt->cmnd && evt->cmnd->device == sdev) {
> - found_evt = evt;
> - break;
> + if (vhost->using_channels && vhost->scsi_scrqs.active_queues)
> + num_hwq = vhost->scsi_scrqs.active_queues;
> +
> + evt_list = kcalloc(num_hwq, sizeof(*evt_list), GFP_KERNEL);
> + rsp = kcalloc(num_hwq, sizeof(*rsp), GFP_KERNEL);

Can't this just go on the stack? We don't want to be allocating memory
during error recovery. Or, alternatively, you could put this in the
vhost structure and protect it with a mutex. We only have enough events
to single thread these anyway.

> +
> + for (i = 0; i < num_hwq; i++) {
> + sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands 
> on queue %d.\n", i);

Prior to this patch, if there was nothing outstanding to the device and 
cancel_all was called,
no messages would get printed. This is changing that behavior. Is that 
intentional? Additionally,
it looks like this will get a lot more verbose, logging a message for each hw 
queue, regardless
of whether there was anything outstanding. Perhaps you want to move this down 
to after the check
for !found_evt?

> +
> + found_evt = NULL;
> + list_for_each_entry(evt, &vhost->sent, queue) {
> + if (evt->cmnd && evt->cmnd->device == sdev && evt->hwq 
> == i) {
> + found_evt = evt;
> + break;
> + }
>   }
> - }
>  



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH kernel] powerpc/kuap: Restore AMR after replaying soft interrupts

2020-12-02 Thread kernel test robot
Hi Alexey,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on linus/master v5.10-rc6 next-20201201]
[cannot apply to scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Alexey-Kardashevskiy/powerpc-kuap-Restore-AMR-after-replaying-soft-interrupts/20201202-094132
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-randconfig-r024-20201202 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
2671fccf0381769276ca8246ec0499adcb9b0355)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/6b38a9b10a8384beeaa820e1c935cc4cabdb951e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Alexey-Kardashevskiy/powerpc-kuap-Restore-AMR-after-replaying-soft-interrupts/20201202-094132
git checkout 6b38a9b10a8384beeaa820e1c935cc4cabdb951e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from arch/powerpc/kernel/irq.c:31:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer 
arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c),
   ^~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 
'DEF_PCI_AC_NORET'
   __do_##name al; \
   ^~
   <scratch space>:100:1: note: expanded from here
   __do_insw
   ^
   arch/powerpc/include/asm/io.h:542:56: note: expanded from macro '__do_insw'
   #define __do_insw(p, b, n)  readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
  ~^
   In file included from arch/powerpc/kernel/irq.c:31:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer 
arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
   ^~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 
'DEF_PCI_AC_NORET'
   __do_##name al; \
   ^~
   <scratch space>:102:1: note: expanded from here
   __do_insl
   ^
   arch/powerpc/include/asm/io.h:543:56: note: expanded from macro '__do_insl'
   #define __do_insl(p, b, n)  readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
  ~^
   In file included from arch/powerpc/kernel/irq.c:31:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer 
arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
   ^~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 
'DEF_PCI_AC_NORET'
   __do_##n

Re: [PATCH v2 01/17] ibmvfc: add vhost fields and defaults for MQ enablement

2020-12-02 Thread Tyrel Datwyler
On 12/2/20 7:14 AM, Brian King wrote:
> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>> Introduce several new vhost fields for managing MQ state of the adapter
>> as well as initial defaults for MQ enablement.
>>
>> Signed-off-by: Tyrel Datwyler 
>> ---
>>  drivers/scsi/ibmvscsi/ibmvfc.c |  9 -
>>  drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++--
>>  2 files changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index 42e4d35e0d35..f1d677a7423d 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
>> struct vio_device_id *id)
>>  }
>>  
>>  shost->transportt = ibmvfc_transport_template;
>> -shost->can_queue = max_requests;
>> +shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
> 
> This doesn't look right. can_queue is the SCSI host queue depth, not the MQ 
> queue depth.

Our max_requests is the total number of commands allowed across all queues. From
what I understand, can_queue is the total number of commands in flight allowed
for each hw queue.

/*
 * In scsi-mq mode, the number of hardware queues supported by the LLD.
 *
 * Note: it is assumed that each hardware queue has a queue depth of
 * can_queue. In other words, the total queue depth per host
 * is nr_hw_queues * can_queue. However, for when host_tagset is set,
 * the total queue depth is can_queue.
 */

We currently don't use the host wide shared tagset.
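The arithmetic in question, per the scsi_host.h comment quoted above (illustrative helper functions, not driver code):

```c
/* Per the scsi_host.h comment quoted above: without host_tagset the
 * block layer sizes each hardware queue at can_queue tags, so the
 * host-wide budget is nr_hw_queues * can_queue. Dividing max_requests
 * across the queues keeps that budget equal to max_requests. */
static unsigned int per_queue_depth(unsigned int max_requests,
				    unsigned int nr_hw_queues)
{
	return max_requests / nr_hw_queues;
}

static unsigned int total_depth(unsigned int can_queue,
				unsigned int nr_hw_queues,
				int host_tagset)
{
	return host_tagset ? can_queue : can_queue * nr_hw_queues;
}
```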

-Tyrel

> 
>>  shost->max_lun = max_lun;
>>  shost->max_id = max_targets;
>>  shost->max_sectors = IBMVFC_MAX_SECTORS;
>>  shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
>>  shost->unique_id = shost->host_no;
>> +shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES;
>>  
>>  vhost = shost_priv(shost);
>>  INIT_LIST_HEAD(>sent);
> 
> 
> 



Re: [PATCH 00/13] ibmvfc: initial MQ development

2020-12-02 Thread Tyrel Datwyler
On 12/2/20 4:03 AM, Hannes Reinecke wrote:
> On 11/26/20 2:48 AM, Tyrel Datwyler wrote:
>> Recent updates in pHyp Firmware and VIOS releases provide new infrastructure
>> towards enabling Subordinate Command Response Queues (Sub-CRQs) such that 
>> each
>> Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on 
>> the
>> partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and 
>> then
>> negotiated with the VIOS via new Management Datagrams (MADs) for channel 
>> setup.
>>
>> This initial implementation adds the necessary Sub-CRQ framework and 
>> implements
>> the new MADs for negotiating and assigning a set of Sub-CRQs to associated 
>> VIOS
>> HW backed channels. The event pool and locking still leverages the legacy 
>> single
>> queue implementation, and as such lock contention is problematic when 
>> increasing
>> the number of queues. However, this initial work demonstrates a 1.2x factor
>> increase in IOPs when configured with two HW queues despite lock contention.
>>
> Why do you still hold the hold lock during submission?

Proof of concept.

> An initial check on the submission code path didn't reveal anything obvious, 
> so
> it _should_ be possible to drop the host lock there.

It's used to protect the event pool and the event free/sent lists. This could
probably have its own lock instead of the host lock.

> Or at least move it into the submission function itself to avoid lock
> contention. Hmm?

I have a followup patch to do that, but I didn't see any change in performance.
I've got another patch I'm finishing that provides dedicated event pools for
each subqueue such that they will no longer have any dependency on the host 
lock.

-Tyrel

> 
> Cheers,
> 
> Hannes



Re: [PATCH] drivers: char: tpm: remove unneeded MODULE_VERSION() usage

2020-12-02 Thread Jarkko Sakkinen
On Wed, Dec 02, 2020 at 01:15:53PM +0100, Enrico Weigelt, metux IT consult 
wrote:
> Remove MODULE_VERSION(), as it isn't needed at all: the only version
> making sense is the kernel version.

The kernel version doesn't make sense here either. Why are you mentioning it
in the commit message? Please just derive the commit message from
the one that Greg wrote.

> Link: https://lkml.org/lkml/2017/11/22/480
>

Remove the spurious empty line.

> Signed-off-by: Enrico Weigelt 
> ---
>  drivers/char/tpm/st33zp24/i2c.c  | 1 -
>  drivers/char/tpm/st33zp24/spi.c  | 1 -
>  drivers/char/tpm/st33zp24/st33zp24.c | 1 -
>  drivers/char/tpm/tpm-interface.c | 1 -
>  drivers/char/tpm/tpm_atmel.c | 1 -
>  drivers/char/tpm/tpm_crb.c   | 1 -
>  drivers/char/tpm/tpm_i2c_infineon.c  | 1 -
>  drivers/char/tpm/tpm_ibmvtpm.c   | 1 -
>  drivers/char/tpm/tpm_infineon.c  | 1 -
>  drivers/char/tpm/tpm_nsc.c   | 1 -
>  drivers/char/tpm/tpm_tis.c   | 1 -
>  drivers/char/tpm/tpm_tis_core.c  | 1 -
>  drivers/char/tpm/tpm_vtpm_proxy.c| 1 -
>  13 files changed, 13 deletions(-)
> 
> diff --git a/drivers/char/tpm/st33zp24/i2c.c b/drivers/char/tpm/st33zp24/i2c.c
> index 7c617edff4ca..7ed9829cacc4 100644
> --- a/drivers/char/tpm/st33zp24/i2c.c
> +++ b/drivers/char/tpm/st33zp24/i2c.c
> @@ -313,5 +313,4 @@ module_i2c_driver(st33zp24_i2c_driver);
>  
>  MODULE_AUTHOR("TPM support (tpmsupp...@list.st.com)");
>  MODULE_DESCRIPTION("STM TPM 1.2 I2C ST33 Driver");
> -MODULE_VERSION("1.3.0");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/st33zp24/spi.c b/drivers/char/tpm/st33zp24/spi.c
> index a75dafd39445..147efea4eb05 100644
> --- a/drivers/char/tpm/st33zp24/spi.c
> +++ b/drivers/char/tpm/st33zp24/spi.c
> @@ -430,5 +430,4 @@ module_spi_driver(st33zp24_spi_driver);
>  
>  MODULE_AUTHOR("TPM support (tpmsupp...@list.st.com)");
>  MODULE_DESCRIPTION("STM TPM 1.2 SPI ST33 Driver");
> -MODULE_VERSION("1.3.0");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/st33zp24/st33zp24.c 
> b/drivers/char/tpm/st33zp24/st33zp24.c
> index 4ec10ab5e576..e0f1a5828993 100644
> --- a/drivers/char/tpm/st33zp24/st33zp24.c
> +++ b/drivers/char/tpm/st33zp24/st33zp24.c
> @@ -646,5 +646,4 @@ EXPORT_SYMBOL(st33zp24_pm_resume);
>  
>  MODULE_AUTHOR("TPM support (tpmsupp...@list.st.com)");
>  MODULE_DESCRIPTION("ST33ZP24 TPM 1.2 driver");
> -MODULE_VERSION("1.3.0");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm-interface.c 
> b/drivers/char/tpm/tpm-interface.c
> index 1621ce818705..dfdc68b8bf88 100644
> --- a/drivers/char/tpm/tpm-interface.c
> +++ b/drivers/char/tpm/tpm-interface.c
> @@ -514,5 +514,4 @@ module_exit(tpm_exit);
>  
>  MODULE_AUTHOR("Leendert van Doorn (leend...@watson.ibm.com)");
>  MODULE_DESCRIPTION("TPM Driver");
> -MODULE_VERSION("2.0");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_atmel.c b/drivers/char/tpm/tpm_atmel.c
> index 54a6750a6757..35bf249cc95a 100644
> --- a/drivers/char/tpm/tpm_atmel.c
> +++ b/drivers/char/tpm/tpm_atmel.c
> @@ -231,5 +231,4 @@ module_exit(cleanup_atmel);
>  
>  MODULE_AUTHOR("Leendert van Doorn (leend...@watson.ibm.com)");
>  MODULE_DESCRIPTION("TPM Driver");
> -MODULE_VERSION("2.0");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
> index a9dcf31eadd2..3e72b7b99cce 100644
> --- a/drivers/char/tpm/tpm_crb.c
> +++ b/drivers/char/tpm/tpm_crb.c
> @@ -748,5 +748,4 @@ static struct acpi_driver crb_acpi_driver = {
>  module_acpi_driver(crb_acpi_driver);
>  MODULE_AUTHOR("Jarkko Sakkinen ");
>  MODULE_DESCRIPTION("TPM2 Driver");
> -MODULE_VERSION("0.1");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_i2c_infineon.c 
> b/drivers/char/tpm/tpm_i2c_infineon.c
> index a19d32cb4e94..8920b7c19fcb 100644
> --- a/drivers/char/tpm/tpm_i2c_infineon.c
> +++ b/drivers/char/tpm/tpm_i2c_infineon.c
> @@ -731,5 +731,4 @@ static struct i2c_driver tpm_tis_i2c_driver = {
>  module_i2c_driver(tpm_tis_i2c_driver);
>  MODULE_AUTHOR("Peter Huewe ");
>  MODULE_DESCRIPTION("TPM TIS I2C Infineon Driver");
> -MODULE_VERSION("2.2.0");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
> index 994385bf37c0..5b04d113f634 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.c
> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
> @@ -750,5 +750,4 @@ module_exit(ibmvtpm_module_exit);
>  
>  MODULE_AUTHOR("ad...@us.ibm.com");
>  MODULE_DESCRIPTION("IBM vTPM Driver");
> -MODULE_VERSION("1.0");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/char/tpm/tpm_infineon.c b/drivers/char/tpm/tpm_infineon.c
> index 9c924a1440a9..8a58966c5c9b 100644
> --- a/drivers/char/tpm/tpm_infineon.c
> +++ b/drivers/char/tpm/tpm_infineon.c
> @@ -621,5 +621,4 @@ module_pnp_driver(tpm_inf_pnp_driver);
>  
>  MODULE_AUTHOR("Marcel Selhorst ");
>  MODULE_DESCRIPTION("Driver for Infineon TPM SLD 9630 TT 1.1 / SLB 9635 TT 
> 1.2");
> -MODULE_VERSION("1.9.2");
>  MODULE_LICENSE("GPL");
> diff --git 

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-02 Thread Peter Zijlstra
On Wed, Dec 02, 2020 at 06:38:12AM -0800, Andy Lutomirski wrote:
> 
> > On Dec 2, 2020, at 6:20 AM, Peter Zijlstra  wrote:
> > 
> > On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote:
> >> + * - A delayed freeing and RCU-like quiescing sequence based on
> >> + *   mm switching to avoid IPIs completely.
> > 
> > That one's interesting too. so basically you want to count switch_mm()
> > invocations on each CPU. Then, periodically snapshot the counter on each
> > CPU, and when they've all changed, increment a global counter.
> > 
> > Then, you snapshot the global counter and wait for it to increment
> > (twice I think, the first increment might already be in progress).
> > 
> > The only question here is what should drive this machinery.. the tick
> > probably.
> > 
> > This shouldn't be too hard to do I think.
> > 
> > Something a little like so perhaps?
> 
> I don’t think this will work.  A CPU can go idle with lazy mm and nohz
> forever.  This could lead to unbounded memory use on a lightly loaded
> system.

Hurm.. quite so indeed. Fixing that seems to end up with requiring that
other proposal, such that we can tell which CPU has what active_mm
stuck.

Also, more complicated... :/


Re: [PATCH v2 14/17] ibmvfc: add cancel mad initialization helper

2020-12-02 Thread Brian King
Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH v2 06/17] ibmvfc: add handlers to drain and complete Sub-CRQ responses

2020-12-02 Thread Brian King
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host 
> *vhost)
> +{
> + struct ibmvfc_event *evt = (struct ibmvfc_event 
> *)be64_to_cpu(crq->ioba);
> + unsigned long flags;
> +
> + switch (crq->valid) {
> + case IBMVFC_CRQ_CMD_RSP:
> + break;
> + case IBMVFC_CRQ_XPORT_EVENT:
> + return;
> + default:
> + dev_err(vhost->dev, "Got an invalid message type 0x%02x\n", 
> crq->valid);
> + return;
> + }
> +
> + /* The only kind of payload CRQs we should get are responses to
> +  * things we send. Make sure this response is to something we
> +  * actually sent
> +  */
> + if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) {
> + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is 
> invalid!\n",
> + crq->ioba);
> + return;
> + }
> +
> + if (unlikely(atomic_read(&evt->free))) {
> + dev_err(vhost->dev, "Received duplicate correlation_token 
> 0x%08llx!\n",
> + crq->ioba);
> + return;
> + }
> +
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + del_timer(&evt->timer);
> + list_del(&evt->queue);
> + ibmvfc_trc_end(evt);

Another thought here... If you are going through ibmvfc_purge_requests at the 
same time
as this code, you could check the free bit above, then have 
ibmvfc_purge_requests
put the event on the free queue and call scsi_done, then you come down and get 
the host
lock here, remove the command from the free list, and call the done function 
again,
which could result in a double completion to the scsi layer.

I think you need to grab the host lock before you check the free bit to avoid 
this race.

> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> + evt->done(evt);
> +}
> +


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH v2 09/17] ibmvfc: implement channel enquiry and setup commands

2020-12-02 Thread Brian King
Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH v2 06/17] ibmvfc: add handlers to drain and complete Sub-CRQ responses

2020-12-02 Thread Brian King
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> The logic for iterating over the Sub-CRQ responses is similar to that
> of the primary CRQ. Add the necessary handlers for processing those
> responses.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 77 ++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 97f00fefa809..e9da3f60c793 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -3381,6 +3381,83 @@ static int ibmvfc_toggle_scrq_irq(struct 
> ibmvfc_sub_queue *scrq, int enable)
>   return rc;
>  }
>  
> +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host 
> *vhost)
> +{
> + struct ibmvfc_event *evt = (struct ibmvfc_event 
> *)be64_to_cpu(crq->ioba);
> + unsigned long flags;
> +
> + switch (crq->valid) {
> + case IBMVFC_CRQ_CMD_RSP:
> + break;
> + case IBMVFC_CRQ_XPORT_EVENT:
> + return;
> + default:
> + dev_err(vhost->dev, "Got an invalid message type 0x%02x\n", 
> crq->valid);
> + return;
> + }
> +
> + /* The only kind of payload CRQs we should get are responses to
> +  * things we send. Make sure this response is to something we
> +  * actually sent
> +  */
> + if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) {
> + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is 
> invalid!\n",
> + crq->ioba);
> + return;
> + }
> +
> + if (unlikely(atomic_read(&evt->free))) {
> + dev_err(vhost->dev, "Received duplicate correlation_token 
> 0x%08llx!\n",
> + crq->ioba);
> + return;
> + }
> +
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + del_timer(&evt->timer);
> + list_del(&evt->queue);
> + ibmvfc_trc_end(evt);
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> + evt->done(evt);
> +}
> +
> +static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_sub_queue *scrq)
> +{
> + struct ibmvfc_crq *crq;
> +
> + crq = &scrq->msgs[scrq->cur].crq;
> + if (crq->valid & 0x80) {
> + if (++scrq->cur == scrq->size)

You are incrementing the cur pointer without any locks held. Although
unlikely, could you also be in ibmvfc_reset_crq in another thread?
If so, you'd have a subtle race condition here where the cur pointer could
be read, then ibmvfc_reset_crq writes it to zero, then this thread
writes it to a non zero value, which would then cause you to be out of
sync with the VIOS as to where the cur pointer is.

> + scrq->cur = 0;
> + rmb();
> + } else
> + crq = NULL;
> +
> + return crq;
> +}
> +



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH v2 04/17] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels

2020-12-02 Thread Brian King
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> +static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
> +   int index)
> +{
> + struct device *dev = vhost->dev;
> + struct vio_dev *vdev = to_vio_dev(dev);
> + struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
> + int rc = -ENOMEM;
> +
> + ENTER;
> +
> + scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL);
> + if (!scrq->msgs)
> + return rc;
> +
> + scrq->size = PAGE_SIZE / sizeof(*scrq->msgs);
> + scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE,
> +  DMA_BIDIRECTIONAL);
> +
> + if (dma_mapping_error(dev, scrq->msg_token))
> + goto dma_map_failed;
> +
> + rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
> +&scrq->cookie, &scrq->hw_irq);
> +
> + if (rc) {
> + dev_warn(dev, "Error registering sub-crq: %d\n", rc);
> + dev_warn(dev, "Firmware may not support MQ\n");

Will this now get logged everywhere this new driver runs if the firmware
does not support sub CRQs? Is there something better that could be done
here to only log this for a true error and not just because a new driver
is running with an older firmware release?

> + goto reg_failed;
> + }
> +
> + scrq->hwq_id = index;
> + scrq->vhost = vhost;
> +
> + LEAVE;
> + return 0;
> +
> +reg_failed:
> + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
> +dma_map_failed:
> + free_page((unsigned long)scrq->msgs);
> + LEAVE;
> + return rc;
> +}
> +



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH v2 01/17] ibmvfc: add vhost fields and defaults for MQ enablement

2020-12-02 Thread Brian King
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> Introduce several new vhost fields for managing MQ state of the adapter
> as well as initial defaults for MQ enablement.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c |  9 -
>  drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++--
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 42e4d35e0d35..f1d677a7423d 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
> struct vio_device_id *id)
>   }
>  
>   shost->transportt = ibmvfc_transport_template;
> - shost->can_queue = max_requests;
> + shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);

This doesn't look right. can_queue is the SCSI host queue depth, not the MQ 
queue depth.

>   shost->max_lun = max_lun;
>   shost->max_id = max_targets;
>   shost->max_sectors = IBMVFC_MAX_SECTORS;
>   shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
>   shost->unique_id = shost->host_no;
> + shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES;
>  
>   vhost = shost_priv(shost);
>   INIT_LIST_HEAD(&vhost->sent);



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH kernel v3] powerpc/pci: Remove LSI mappings on device teardown

2020-12-02 Thread Cédric Le Goater
On 12/2/20 1:52 AM, Alexey Kardashevskiy wrote:
> From: Oliver O'Halloran 
> 
> When a passthrough IO adapter is removed from a pseries machine using hash
> MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS
> to clear all page table entries related to the adapter. If some are still
> present, the RTAS call which isolates the PCI slot returns error 9001
> "valid outstanding translations" and the removal of the IO adapter fails.
> This is because when the PHBs are scanned, Linux maps automatically the
> INTx interrupts in the Linux interrupt number space but these are never
> removed.
> 
> This problem can be fixed by adding the corresponding unmap operation when
> the device is removed. There's no pcibios_* hook for the remove case, but
> the same effect can be achieved using a bus notifier.
> 
> Because INTx are shared among PHBs (and potentially across the system),
> this adds tracking of virq to unmap them only when the last user is gone.
> 
> Signed-off-by: Oliver O'Halloran 
> [aik: added refcounter]
> Signed-off-by: Alexey Kardashevskiy 


Reviewed-by: Cédric Le Goater 

I did some PHB hotplug tests on a KVM guest and a LPAR using only LSIs.

Tested-by: Cédric Le Goater 

Thanks Alexey,

C.

> ---
> Changes:
> v3:
> * free @vi on error path
> 
> v2:
> * added refcounter
> ---
>  arch/powerpc/kernel/pci-common.c | 82 ++--
>  1 file changed, 78 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/pci-common.c 
> b/arch/powerpc/kernel/pci-common.c
> index be108616a721..2b555997b295 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -353,6 +353,55 @@ struct pci_controller 
> *pci_find_controller_for_domain(int domain_nr)
>   return NULL;
>  }
>  
> +struct pci_intx_virq {
> + int virq;
> + struct kref kref;
> + struct list_head list_node;
> +};
> +
> +static LIST_HEAD(intx_list);
> +static DEFINE_MUTEX(intx_mutex);
> +
> +static void ppc_pci_intx_release(struct kref *kref)
> +{
> + struct pci_intx_virq *vi = container_of(kref, struct pci_intx_virq, 
> kref);
> +
> + list_del(&vi->list_node);
> + irq_dispose_mapping(vi->virq);
> + kfree(vi);
> +}
> +
> +static int ppc_pci_unmap_irq_line(struct notifier_block *nb,
> +unsigned long action, void *data)
> +{
> + struct pci_dev *pdev = to_pci_dev(data);
> +
> + if (action == BUS_NOTIFY_DEL_DEVICE) {
> + struct pci_intx_virq *vi;
> +
> + mutex_lock(&intx_mutex);
> + list_for_each_entry(vi, &intx_list, list_node) {
> + if (vi->virq == pdev->irq) {
> + kref_put(&vi->kref, ppc_pci_intx_release);
> + break;
> + }
> + }
> + mutex_unlock(&intx_mutex);
> + }
> +
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block ppc_pci_unmap_irq_notifier = {
> + .notifier_call = ppc_pci_unmap_irq_line,
> +};
> +
> +static int ppc_pci_register_irq_notifier(void)
> +{
> + return bus_register_notifier(&pci_bus_type,
> + &ppc_pci_unmap_irq_notifier);
> +}
> +arch_initcall(ppc_pci_register_irq_notifier);
> +
>  /*
>   * Reads the interrupt pin to determine if interrupt is use by card.
>   * If the interrupt is used, then gets the interrupt line from the
> @@ -361,6 +410,12 @@ struct pci_controller 
> *pci_find_controller_for_domain(int domain_nr)
>  static int pci_read_irq_line(struct pci_dev *pci_dev)
>  {
>   int virq;
> + struct pci_intx_virq *vi, *vitmp;
> +
> + /* Preallocate vi as rewind is complex if this fails after mapping */
> + vi = kzalloc(sizeof(struct pci_intx_virq), GFP_KERNEL);
> + if (!vi)
> + return -1;
>  
>   pr_debug("PCI: Try to map irq for %s...\n", pci_name(pci_dev));
>  
> @@ -377,12 +432,12 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
>* function.
>*/
>   if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_PIN, &pin))
> - return -1;
> + goto error_exit;
>   if (pin == 0)
> - return -1;
> + goto error_exit;
>   if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_LINE, &line) ||
>   line == 0xff || line == 0) {
> - return -1;
> + goto error_exit;
>   }
>   pr_debug(" No map ! Using line %d (pin %d) from PCI config\n",
>line, pin);
> @@ -394,14 +449,33 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
>  
>   if (!virq) {
>   pr_debug(" Failed to map !\n");
> - return -1;
> + goto error_exit;
>   }
>  
>   pr_debug(" Mapped to linux irq %d\n", virq);
>  
>   pci_dev->irq = virq;
>  
> + mutex_lock(&intx_mutex);
> + list_for_each_entry(vitmp, &intx_list, list_node) {
> + if (vitmp->virq == virq) 

Re: powerpc 5.10-rcN boot failures with RCU_SCALE_TEST=m

2020-12-02 Thread Uladzislau Rezki
On Thu, Dec 03, 2020 at 01:03:32AM +1100, Michael Ellerman wrote:
> Daniel Axtens  writes:
> > Hi all,
> >
> > I'm having some difficulty tracking down a bug.
> >
> > Some configurations of the powerpc kernel since somewhere in the 5.10
> > merge window fail to boot on some ppc64 systems. They hang while trying
> > to bring up SMP. It seems to depend on the RCU_SCALE/PERF_TEST option.
> > (It was renamed in the 5.10 merge window.)
> >
> > I can reproduce it as follows with qemu tcg:
> >
> > make -j64 pseries_le_defconfig
> > scripts/config -m RCU_SCALE_TEST
> > scripts/config -m RCU_PERF_TEST
> > make -j 64 vmlinux CC="ccache gcc"
> >
> > qemu-system-ppc64 -cpu power9 -M pseries -m 1G -nographic -vga none -smp 4 
> > -kernel vmlinux
> >
> > ...
> > [0.036284][T0] Mount-cache hash table entries: 8192 (order: 0, 
> > 65536 bytes, linear)
> > [0.036481][T0] Mountpoint-cache hash table entries: 8192 (order: 0, 
> > 65536 bytes, linear)
> > [0.148168][T1] POWER9 performance monitor hardware support 
> > registered
> > [0.151118][T1] rcu: Hierarchical SRCU implementation.
> > [0.186660][T1] smp: Bringing up secondary CPUs ...
> > 
> 
> One does not simply hang :)
> 
> > I have no idea why RCU_SCALE/PERF_TEST would be causing this, but that
> > seems to be what does it: if I don't set that, the kernel boots fine.
> 
> It seems to be TASKS_RCU that is the key.
> 
> I don't need RCU_SCALE_TEST enabled, I can trigger it just with the
> following applied:
> 
> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> index 0ebe15a84985..f3500c95d6a1 100644
> --- a/kernel/rcu/Kconfig
> +++ b/kernel/rcu/Kconfig
> @@ -78,7 +78,7 @@ config TASKS_RCU_GENERIC
> task-based RCU implementations.  Not for manual selection.
>  
>  config TASKS_RCU
> - def_bool PREEMPTION
> + def_bool y
>   help
> This option enables a task-based RCU implementation that uses
> only voluntary context switch (not preemption!), idle, and
> 
> 
> And bisect points to:
>   36dadef23fcc ("kprobes: Init kprobes in early_initcall")
> 
> Which moved init_kprobes() prior to SMP bringup.
> 
> 
> For some reason when it gets stuck sysrq doesn't work, but I was able to
> get it into gdb and manually call handle_sysrq('t') to get the output
> below.
> 
> The SMP bringup stalls because _cpu_up() is blocked trying to take
> cpu_hotplug_lock for writing:
> 
> [  401.403132][T0] task:swapper/0   state:D stack:12512 pid:1 
> ppid: 0 flags:0x0800
> [  401.403502][T0] Call Trace:
> [  401.403907][T0] [c62c37d0] [c62c3830] 
> 0xc62c3830 (unreliable)
> [  401.404068][T0] [c62c39b0] [c0019d70] 
> __switch_to+0x2e0/0x4a0
> [  401.404189][T0] [c62c3a10] [c0b87228] 
> __schedule+0x288/0x9b0
> [  401.404257][T0] [c62c3ad0] [c0b879b8] 
> schedule+0x68/0x120
> [  401.404324][T0] [c62c3b00] [c0184ad4] 
> percpu_down_write+0x164/0x170
> [  401.404390][T0] [c62c3b50] [c0116b68] 
> _cpu_up+0x68/0x280
> [  401.404475][T0] [c62c3bb0] [c0116e70] cpu_up+0xf0/0x140
> [  401.404546][T0] [c62c3c30] [c011776c] 
> bringup_nonboot_cpus+0xac/0xf0
> [  401.404643][T0] [c62c3c80] [c0eea1b8] 
> smp_init+0x40/0xcc
> [  401.404727][T0] [c62c3ce0] [c0ec43dc] 
> kernel_init_freeable+0x1e0/0x3a0
> [  401.404799][T0] [c62c3db0] [c0011ec4] 
> kernel_init+0x24/0x150
> [  401.404958][T0] [c62c3e20] [c000daf0] 
> ret_from_kernel_thread+0x5c/0x6c
> 
> It can't get it because kprobe_optimizer() has taken it for read and is now
> blocked waiting for synchronize_rcu_tasks():
> 
> [  401.418808][T0] task:kworker/0:1 state:D stack:13392 pid:   12 
> ppid: 2 flags:0x0800
> [  401.418951][T0] Workqueue: events kprobe_optimizer
> [  401.419078][T0] Call Trace:
> [  401.419121][T0] [c62ef650] [c62ef710] 
> 0xc62ef710 (unreliable)
> [  401.419213][T0] [c62ef830] [c0019d70] 
> __switch_to+0x2e0/0x4a0
> [  401.419281][T0] [c62ef890] [c0b87228] 
> __schedule+0x288/0x9b0
> [  401.419347][T0] [c62ef950] [c0b879b8] 
> schedule+0x68/0x120
> [  401.419415][T0] [c62ef980] [c0b8e664] 
> schedule_timeout+0x2a4/0x340
> [  401.419484][T0] [c62efa80] [c0b894ec] 
> wait_for_completion+0x9c/0x170
> [  401.419552][T0] [c62efae0] [c01ac85c] 
> __wait_rcu_gp+0x19c/0x210
> [  401.419619][T0] [c62efb40] [c01ac90c] 
> synchronize_rcu_tasks_generic+0x3c/0x70
> [  401.419690][T0] [c62efbe0] [c022a3dc] 
> kprobe_optimizer+0x1dc/0x470
> [  401.419757][T0] [c62efc60] [c0136684] 
> process_one_work+0x2f4/0x530
> [  401.419823][T0] [c62efd20] [c0138d28] 

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-02 Thread Andy Lutomirski


> On Dec 2, 2020, at 6:20 AM, Peter Zijlstra  wrote:
> 
> On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote:
>> + * - A delayed freeing and RCU-like quiescing sequence based on
>> + *   mm switching to avoid IPIs completely.
> 
> That one's interesting too. so basically you want to count switch_mm()
> invocations on each CPU. Then, periodically snapshot the counter on each
> CPU, and when they've all changed, increment a global counter.
> 
> Then, you snapshot the global counter and wait for it to increment
> (twice I think, the first increment might already be in progress).
> 
> The only question here is what should drive this machinery.. the tick
> probably.
> 
> This shouldn't be too hard to do I think.
> 
> Something a little like so perhaps?

I don’t think this will work.  A CPU can go idle with lazy mm and nohz forever. 
 This could lead to unbounded memory use on a lightly loaded system.



Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-02 Thread Peter Zijlstra
On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote:
> +  * - A delayed freeing and RCU-like quiescing sequence based on
> +  *   mm switching to avoid IPIs completely.

That one's interesting too. so basically you want to count switch_mm()
invocations on each CPU. Then, periodically snapshot the counter on each
CPU, and when they've all changed, increment a global counter.

Then, you snapshot the global counter and wait for it to increment
(twice I think, the first increment might already be in progress).

The only question here is what should drive this machinery.. the tick
probably.

This shouldn't be too hard to do I think.

Something a little like so perhaps?


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41404afb7f4c..27b64a60a468 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4525,6 +4525,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 * finish_task_switch()'s mmdrop().
 */
switch_mm_irqs_off(prev->active_mm, next->mm, next);
+   rq->nr_mm_switches++;
 
if (!prev->mm) {// from kernel
/* will mmdrop() in finish_task_switch(). */
@@ -4739,6 +4740,80 @@ unsigned long long task_sched_runtime(struct task_struct 
*p)
return ns;
 }
 
+static DEFINE_PER_CPU(unsigned long[2], mm_switches);
+
+static struct {
+   unsigned long __percpu *switches[2];
+   unsigned long generation;
+   atomic_t complete;
   struct wait_queue_head wait;
+} mm_foo = {
   .switches = &mm_switches,
+   .generation = 0,
+   .complete = -1, // XXX bootstrap, hotplug
+   .wait = __WAIT_QUEUE_HEAD_INITIALIZER(mm_foo.wait),
+};
+
+static bool mm_gen_tick(int cpu, struct rq *rq)
+{
+   unsigned long prev, curr, switches = rq->nr_mm_switches;
+   int idx = READ_ONCE(mm_foo.generation) & 1;
+
+   /* DATA-DEP on mm_foo.generation */
+
+   prev = __this_cpu_read(mm_foo.switches[idx^1]);
+   curr = __this_cpu_read(mm_foo.switches[idx]);
+
+   /* we haven't switched since the last generation */
+   if (prev == switches)
+   return false;
+
+   __this_cpu_write(mm_foo.switches[idx], switches);
+
+   /*
+* If @curr is less than @prev, this is the first update of
+* this generation, per the above, switches has also increased since,
+* so mark out CPU complete.
+*/
+   if ((long)(curr - prev) < 0 && atomic_dec_and_test(&mm_foo.complete)) {
+   /*
+* All CPUs are complete, IOW they all switched at least once
+* since the last generation. Reset the completion counter and
+* increment the generation.
+*/
+   atomic_set(&mm_foo.complete, nr_online_cpus());
+   /*
+* Matches the address dependency above:
+*
+*   idx = gen & 1  complete = nr_cpus
+*
+*   curr = sw[idx] generation++;
+*   prev = sw[idx^1]
+*   if (curr < prev)
+* complete--
+*
+* If we don't observe the new generation; we'll not decrement. 
If we
+* do see the new generation, we must also see the new 
completion count.
+*/
+   smp_wmb();
+   mm_foo.generation++;
+   return true;
+   }
+
+   return false;
+}
+
+static void mm_gen_wake(void)
+{
+   wake_up_all(&mm_foo.wait);
+}
+
+static void mm_gen_wait(void)
+{
+   unsigned int gen = READ_ONCE(mm_foo.generation);
+   wait_event(mm_foo.wait, READ_ONCE(mm_foo.generation) - gen > 1);
+}
+
 /*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
@@ -4750,6 +4825,7 @@ void scheduler_tick(void)
struct task_struct *curr = rq->curr;
struct rq_flags rf;
unsigned long thermal_pressure;
+   bool wake_mm_gen;
 
arch_scale_freq_tick();
sched_clock_tick();
@@ -4763,8 +4839,13 @@ void scheduler_tick(void)
calc_global_load_tick(rq);
psi_task_tick(rq);
 
+   wake_mm_gen = mm_gen_tick(cpu, rq);
+
rq_unlock(rq, &rf);
 
+   if (wake_mm_gen)
+   mm_gen_wake();
+
perf_event_task_tick();
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bf9d8da7d35e..62fb685db8d0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -927,6 +927,7 @@ struct rq {
unsigned intttwu_pending;
 #endif
u64 nr_switches;
+   u64 nr_mm_switches;
 
 #ifdef CONFIG_UCLAMP_TASK
/* Utilization clamp values based on CPU's RUNNABLE tasks */


Re: [PATCH kernel v3] powerpc/pci: Remove LSI mappings on device teardown

2020-12-02 Thread Frederic Barrat




On 02/12/2020 01:52, Alexey Kardashevskiy wrote:

From: Oliver O'Halloran 

When a passthrough IO adapter is removed from a pseries machine using hash
MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS
to clear all page table entries related to the adapter. If some are still
present, the RTAS call which isolates the PCI slot returns error 9001
"valid outstanding translations" and the removal of the IO adapter fails.
This is because when the PHBs are scanned, Linux maps automatically the
INTx interrupts in the Linux interrupt number space but these are never
removed.

This problem can be fixed by adding the corresponding unmap operation when
the device is removed. There's no pcibios_* hook for the remove case, but
the same effect can be achieved using a bus notifier.

Because INTx are shared among PHBs (and potentially across the system),
this adds tracking of virq to unmap them only when the last user is gone.

Signed-off-by: Oliver O'Halloran 
[aik: added refcounter]
Signed-off-by: Alexey Kardashevskiy 
---



Looks ok to me.
Reviewed-by: Frederic Barrat 



Changes:
v3:
* free @vi on error path

v2:
* added refcounter
---
  arch/powerpc/kernel/pci-common.c | 82 ++--
  1 file changed, 78 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index be108616a721..2b555997b295 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -353,6 +353,55 @@ struct pci_controller *pci_find_controller_for_domain(int 
domain_nr)
return NULL;
  }
  
+struct pci_intx_virq {

+   int virq;
+   struct kref kref;
+   struct list_head list_node;
+};
+
+static LIST_HEAD(intx_list);
+static DEFINE_MUTEX(intx_mutex);
+
+static void ppc_pci_intx_release(struct kref *kref)
+{
+   struct pci_intx_virq *vi = container_of(kref, struct pci_intx_virq, 
kref);
+
+   list_del(&vi->list_node);
+   irq_dispose_mapping(vi->virq);
+   kfree(vi);
+}
+
+static int ppc_pci_unmap_irq_line(struct notifier_block *nb,
+  unsigned long action, void *data)
+{
+   struct pci_dev *pdev = to_pci_dev(data);
+
+   if (action == BUS_NOTIFY_DEL_DEVICE) {
+   struct pci_intx_virq *vi;
+
+   mutex_lock(&intx_mutex);
+   list_for_each_entry(vi, &intx_list, list_node) {
+   if (vi->virq == pdev->irq) {
+   kref_put(&vi->kref, ppc_pci_intx_release);
+   break;
+   }
+   }
+   mutex_unlock(&intx_mutex);
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block ppc_pci_unmap_irq_notifier = {
+   .notifier_call = ppc_pci_unmap_irq_line,
+};
+
+static int ppc_pci_register_irq_notifier(void)
+{
+   return bus_register_notifier(&pci_bus_type,
+   &ppc_pci_unmap_irq_notifier);
+}
+arch_initcall(ppc_pci_register_irq_notifier);
+
  /*
   * Reads the interrupt pin to determine if interrupt is use by card.
   * If the interrupt is used, then gets the interrupt line from the
@@ -361,6 +410,12 @@ struct pci_controller *pci_find_controller_for_domain(int 
domain_nr)
  static int pci_read_irq_line(struct pci_dev *pci_dev)
  {
int virq;
+   struct pci_intx_virq *vi, *vitmp;
+
+   /* Preallocate vi as rewind is complex if this fails after mapping */
+   vi = kzalloc(sizeof(struct pci_intx_virq), GFP_KERNEL);
+   if (!vi)
+   return -1;
  
  	pr_debug("PCI: Try to map irq for %s...\n", pci_name(pci_dev));
  
@@ -377,12 +432,12 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)

 * function.
 */
	if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_PIN, &pin))
-   return -1;
+   goto error_exit;
if (pin == 0)
-   return -1;
+   goto error_exit;
	if (pci_read_config_byte(pci_dev, PCI_INTERRUPT_LINE, &line) ||
line == 0xff || line == 0) {
-   return -1;
+   goto error_exit;
}
pr_debug(" No map ! Using line %d (pin %d) from PCI config\n",
 line, pin);
@@ -394,14 +449,33 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
  
  	if (!virq) {

pr_debug(" Failed to map !\n");
-   return -1;
+   goto error_exit;
}
  
  	pr_debug(" Mapped to linux irq %d\n", virq);
  
  	pci_dev->irq = virq;
  
+	mutex_lock(&intx_mutex);
+   list_for_each_entry(vitmp, &intx_list, list_node) {
+   if (vitmp->virq == virq) {
+   kref_get(&vitmp->kref);
+   kfree(vi);
+   vi = NULL;
+   break;
+   }
+   }
+   if (vi) {
+   vi->virq = virq;
+   kref_init(&vi->kref);
+   

Re: powerpc 5.10-rcN boot failures with RCU_SCALE_TEST=m

2020-12-02 Thread Michael Ellerman
Daniel Axtens  writes:
> Hi all,
>
> I'm having some difficulty tracking down a bug.
>
> Some configurations of the powerpc kernel since somewhere in the 5.10
> merge window fail to boot on some ppc64 systems. They hang while trying
> to bring up SMP. It seems to depend on the RCU_SCALE/PERF_TEST option.
> (It was renamed in the 5.10 merge window.)
>
> I can reproduce it as follows with qemu tcg:
>
> make -j64 pseries_le_defconfig
> scripts/config -m RCU_SCALE_TEST
> scripts/config -m RCU_PERF_TEST
> make -j 64 vmlinux CC="ccache gcc"
>
> qemu-system-ppc64 -cpu power9 -M pseries -m 1G -nographic -vga none -smp 4 
> -kernel vmlinux
>
> ...
> [0.036284][T0] Mount-cache hash table entries: 8192 (order: 0, 65536 
> bytes, linear)
> [0.036481][T0] Mountpoint-cache hash table entries: 8192 (order: 0, 
> 65536 bytes, linear)
> [0.148168][T1] POWER9 performance monitor hardware support registered
> [0.151118][T1] rcu: Hierarchical SRCU implementation.
> [0.186660][T1] smp: Bringing up secondary CPUs ...
> 

One does not simply hang :)

> I have no idea why RCU_SCALE/PERF_TEST would be causing this, but that
> seems to be what does it: if I don't set that, the kernel boots fine.

It seems to be TASKS_RCU that is the key.

I don't need RCU_SCALE_TEST enabled, I can trigger it just with the
following applied:

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 0ebe15a84985..f3500c95d6a1 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -78,7 +78,7 @@ config TASKS_RCU_GENERIC
  task-based RCU implementations.  Not for manual selection.
 
 config TASKS_RCU
-   def_bool PREEMPTION
+   def_bool y
help
  This option enables a task-based RCU implementation that uses
  only voluntary context switch (not preemption!), idle, and


And bisect points to:
  36dadef23fcc ("kprobes: Init kprobes in early_initcall")

Which moved init_kprobes() prior to SMP bringup.


For some reason when it gets stuck sysrq doesn't work, but I was able to
get it into gdb and manually call handle_sysrq('t') to get the output
below.

The SMP bringup stalls because _cpu_up() is blocked trying to take
cpu_hotplug_lock for writing:

[  401.403132][T0] task:swapper/0   state:D stack:12512 pid:1 ppid: 
0 flags:0x0800
[  401.403502][T0] Call Trace:
[  401.403907][T0] [c62c37d0] [c62c3830] 0xc62c3830 
(unreliable)
[  401.404068][T0] [c62c39b0] [c0019d70] 
__switch_to+0x2e0/0x4a0
[  401.404189][T0] [c62c3a10] [c0b87228] 
__schedule+0x288/0x9b0
[  401.404257][T0] [c62c3ad0] [c0b879b8] schedule+0x68/0x120
[  401.404324][T0] [c62c3b00] [c0184ad4] 
percpu_down_write+0x164/0x170
[  401.404390][T0] [c62c3b50] [c0116b68] _cpu_up+0x68/0x280
[  401.404475][T0] [c62c3bb0] [c0116e70] cpu_up+0xf0/0x140
[  401.404546][T0] [c62c3c30] [c011776c] 
bringup_nonboot_cpus+0xac/0xf0
[  401.404643][T0] [c62c3c80] [c0eea1b8] smp_init+0x40/0xcc
[  401.404727][T0] [c62c3ce0] [c0ec43dc] 
kernel_init_freeable+0x1e0/0x3a0
[  401.404799][T0] [c62c3db0] [c0011ec4] 
kernel_init+0x24/0x150
[  401.404958][T0] [c62c3e20] [c000daf0] 
ret_from_kernel_thread+0x5c/0x6c

It can't get it because kprobe_optimizer() has taken it for read and is now
blocked waiting for synchronize_rcu_tasks():

[  401.418808][T0] task:kworker/0:1 state:D stack:13392 pid:   12 ppid: 
2 flags:0x0800
[  401.418951][T0] Workqueue: events kprobe_optimizer
[  401.419078][T0] Call Trace:
[  401.419121][T0] [c62ef650] [c62ef710] 0xc62ef710 
(unreliable)
[  401.419213][T0] [c62ef830] [c0019d70] 
__switch_to+0x2e0/0x4a0
[  401.419281][T0] [c62ef890] [c0b87228] 
__schedule+0x288/0x9b0
[  401.419347][T0] [c62ef950] [c0b879b8] schedule+0x68/0x120
[  401.419415][T0] [c62ef980] [c0b8e664] 
schedule_timeout+0x2a4/0x340
[  401.419484][T0] [c62efa80] [c0b894ec] 
wait_for_completion+0x9c/0x170
[  401.419552][T0] [c62efae0] [c01ac85c] 
__wait_rcu_gp+0x19c/0x210
[  401.419619][T0] [c62efb40] [c01ac90c] 
synchronize_rcu_tasks_generic+0x3c/0x70
[  401.419690][T0] [c62efbe0] [c022a3dc] 
kprobe_optimizer+0x1dc/0x470
[  401.419757][T0] [c62efc60] [c0136684] 
process_one_work+0x2f4/0x530
[  401.419823][T0] [c62efd20] [c0138d28] 
worker_thread+0x78/0x570
[  401.419891][T0] [c62efdb0] [c0142424] kthread+0x194/0x1a0
[  401.419976][T0] [c62efe20] [c000daf0] 
ret_from_kernel_thread+0x5c/0x6c

But why is the synchronize_rcu_tasks() not completing?

Hopefully Paul can help there, otherwise I'll try 

Re: [PATCH 5/8] powerpc/64s/powernv: ratelimit harmless HMI error printing

2020-12-02 Thread Michael Ellerman
Nicholas Piggin  writes:
> Harmless HMI errors can be triggered by guests in some cases, and don't
> contain much useful information anyway. Ratelimit these to avoid
> flooding the console/logs.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/platforms/powernv/opal-hmi.c | 27 +--
>  1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c 
> b/arch/powerpc/platforms/powernv/opal-hmi.c
> index 3e1f064a18db..959da6df0227 100644
> --- a/arch/powerpc/platforms/powernv/opal-hmi.c
> +++ b/arch/powerpc/platforms/powernv/opal-hmi.c
> @@ -240,19 +240,22 @@ static void print_hmi_event_info(struct OpalHMIEvent 
> *hmi_evt)
>   break;
>   }
>  
> - printk("%s%s Hypervisor Maintenance interrupt [%s]\n",
> - level, sevstr,
> - hmi_evt->disposition == OpalHMI_DISPOSITION_RECOVERED ?
> - "Recovered" : "Not recovered");
> - error_info = hmi_evt->type < ARRAY_SIZE(hmi_error_types) ?
> - hmi_error_types[hmi_evt->type]
> - : "Unknown";
> - printk("%s Error detail: %s\n", level, error_info);
> - printk("%s  HMER: %016llx\n", level, be64_to_cpu(hmi_evt->hmer));
> - if ((hmi_evt->type == OpalHMI_ERROR_TFAC) ||
> - (hmi_evt->type == OpalHMI_ERROR_TFMR_PARITY))
> - printk("%s  TFMR: %016llx\n", level,
> + if (hmi_evt->severity != OpalHMI_SEV_NO_ERROR || printk_ratelimit()) {
> + printk("%s%s Hypervisor Maintenance interrupt [%s]\n",
> + level, sevstr,
> + hmi_evt->disposition == OpalHMI_DISPOSITION_RECOVERED ?
> + "Recovered" : "Not recovered");
> + error_info = hmi_evt->type < ARRAY_SIZE(hmi_error_types) ?
> + hmi_error_types[hmi_evt->type]
> + : "Unknown";
> + printk("%s Error detail: %s\n", level, error_info);
> + printk("%s  HMER: %016llx\n", level,
> + be64_to_cpu(hmi_evt->hmer));
> + if ((hmi_evt->type == OpalHMI_ERROR_TFAC) ||
> + (hmi_evt->type == OpalHMI_ERROR_TFMR_PARITY))
> + printk("%s  TFMR: %016llx\n", level,
>   be64_to_cpu(hmi_evt->tfmr));
> + }

Same comment RE printk_ratelimit(), I folded this in:

diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c 
b/arch/powerpc/platforms/powernv/opal-hmi.c
index 959da6df0227..f0c1830deb51 100644
--- a/arch/powerpc/platforms/powernv/opal-hmi.c
+++ b/arch/powerpc/platforms/powernv/opal-hmi.c
@@ -213,6 +213,8 @@ static void print_hmi_event_info(struct OpalHMIEvent 
*hmi_evt)
"A hypervisor resource error occurred",
"CAPP recovery process is in progress",
};
+   static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
 
/* Print things out */
if (hmi_evt->version < OpalHMIEvt_V1) {
@@ -240,7 +242,7 @@ static void print_hmi_event_info(struct OpalHMIEvent 
*hmi_evt)
break;
}
 
-   if (hmi_evt->severity != OpalHMI_SEV_NO_ERROR || printk_ratelimit()) {
+   if (hmi_evt->severity != OpalHMI_SEV_NO_ERROR || __ratelimit(&rs)) {
printk("%s%s Hypervisor Maintenance interrupt [%s]\n",
level, sevstr,
hmi_evt->disposition == OpalHMI_DISPOSITION_RECOVERED ?

cheers


Re: [PATCH 4/8] KVM: PPC: Book3S HV: Ratelimit machine check messages coming from guests

2020-12-02 Thread Michael Ellerman
Nicholas Piggin  writes:
> A number of machine check exceptions are triggerable by the guest.
> Ratelimit these to avoid a guest flooding the host console and logs.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kvm/book3s_hv.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index e3b1839fc251..c94f9595133d 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1328,8 +1328,12 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
>   r = RESUME_GUEST;
>   break;
>   case BOOK3S_INTERRUPT_MACHINE_CHECK:
> - /* Print the MCE event to host console. */
> - machine_check_print_event_info(&vcpu->arch.mce_evt, false, true);
> + /*
> +  * Print the MCE event to host console. Ratelimit so the guest
> +  * can't flood the host log.
> +  */
> + if (printk_ratelimit())
> + machine_check_print_event_info(&vcpu->arch.mce_evt, false, true);

You're not supposed to use printk_ratelimit(), because there's a single
rate limit state for all printks. ie. some other noisy printk() can
cause this one to never be printed.

I folded this in:

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cbbc4f0a26fe..cfaa91b27112 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1327,12 +1327,14 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
case BOOK3S_INTERRUPT_SYSTEM_RESET:
r = RESUME_GUEST;
break;
-   case BOOK3S_INTERRUPT_MACHINE_CHECK:
+   case BOOK3S_INTERRUPT_MACHINE_CHECK: {
+   static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
/*
 * Print the MCE event to host console. Ratelimit so the guest
 * can't flood the host log.
 */
-   if (printk_ratelimit())
+   if (__ratelimit(&rs))
machine_check_print_event_info(&vcpu->arch.mce_evt, false, true);
 
/*
@@ -1361,6 +1363,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 
r = RESUME_HOST;
break;
+   }
case BOOK3S_INTERRUPT_PROGRAM:
{
ulong flags;
@@ -1520,12 +1523,16 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu 
*vcpu)
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
+   {
+   static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
/* Pass the machine check to the L1 guest */
r = RESUME_HOST;
/* Print the MCE event to host console. */
-   if (printk_ratelimit())
+   if (__ratelimit(&rs))
machine_check_print_event_info(&vcpu->arch.mce_evt, false, true);
break;
+   }
/*
 * We get these next two if the guest accesses a page which it thinks
 * it has mapped but which is not actually present, either because


cheers


Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-02 Thread Peter Zijlstra
On Wed, Dec 02, 2020 at 12:17:31PM +0100, Peter Zijlstra wrote:

> So the obvious 'improvement' here would be something like:
> 
>   for_each_online_cpu(cpu) {
>   p = rcu_dereference(cpu_rq(cpu)->curr);
>   if (p->active_mm != mm)
>   continue;
>   __cpumask_set_cpu(cpu, tmpmask);
>   }
>   on_each_cpu_mask(tmpmask, ...);
> 
> The remote CPU will never switch _to_ @mm, on account of it being quite
> dead, but it is quite prone to false negatives.
> 
> Consider that __schedule() sets rq->curr *before* context_switch(), this
> means we'll see next->active_mm, even though prev->active_mm might still
> be our @mm.
> 
> Now, because we'll be removing the atomic ops from context_switch()'s
> active_mm swizzling, I think we can change this to something like the
> below. The hope being that the cost of the new barrier can be offset by
> the loss of the atomics.
> 
> Hmm ?
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 41404afb7f4c..2597c5c0ccb0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4509,7 +4509,6 @@ context_switch(struct rq *rq, struct task_struct *prev,
>   if (!next->mm) {// to kernel
>   enter_lazy_tlb(prev->active_mm, next);
>  
> - next->active_mm = prev->active_mm;
>   if (prev->mm)   // from user
>   mmgrab(prev->active_mm);
>   else
> @@ -4524,6 +4523,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
>* case 'prev->active_mm == next->mm' through
>* finish_task_switch()'s mmdrop().
>*/
> + next->active_mm = next->mm;
>   switch_mm_irqs_off(prev->active_mm, next->mm, next);

I think that next->active_mm store should be after switch_mm(),
otherwise we still race.

>  
>   if (!prev->mm) {// from kernel
> @@ -5713,11 +5713,9 @@ static void __sched notrace __schedule(bool preempt)
>  
>   if (likely(prev != next)) {
>   rq->nr_switches++;
> - /*
> -  * RCU users of rcu_dereference(rq->curr) may not see
> -  * changes to task_struct made by pick_next_task().
> -  */
> - RCU_INIT_POINTER(rq->curr, next);
> +
> + next->active_mm = prev->active_mm;
> + rcu_assign_pointer(rq->curr, next);
>   /*
>* The membarrier system call requires each architecture
>* to have a full memory barrier after updating


[PATCH] drivers: char: tpm: remove unneeded MODULE_VERSION() usage

2020-12-02 Thread Enrico Weigelt, metux IT consult
Remove MODULE_VERSION(), as it isn't needed at all: the only version
making sense is the kernel version.

Link: https://lkml.org/lkml/2017/11/22/480

Signed-off-by: Enrico Weigelt 
---
 drivers/char/tpm/st33zp24/i2c.c  | 1 -
 drivers/char/tpm/st33zp24/spi.c  | 1 -
 drivers/char/tpm/st33zp24/st33zp24.c | 1 -
 drivers/char/tpm/tpm-interface.c | 1 -
 drivers/char/tpm/tpm_atmel.c | 1 -
 drivers/char/tpm/tpm_crb.c   | 1 -
 drivers/char/tpm/tpm_i2c_infineon.c  | 1 -
 drivers/char/tpm/tpm_ibmvtpm.c   | 1 -
 drivers/char/tpm/tpm_infineon.c  | 1 -
 drivers/char/tpm/tpm_nsc.c   | 1 -
 drivers/char/tpm/tpm_tis.c   | 1 -
 drivers/char/tpm/tpm_tis_core.c  | 1 -
 drivers/char/tpm/tpm_vtpm_proxy.c| 1 -
 13 files changed, 13 deletions(-)

diff --git a/drivers/char/tpm/st33zp24/i2c.c b/drivers/char/tpm/st33zp24/i2c.c
index 7c617edff4ca..7ed9829cacc4 100644
--- a/drivers/char/tpm/st33zp24/i2c.c
+++ b/drivers/char/tpm/st33zp24/i2c.c
@@ -313,5 +313,4 @@ module_i2c_driver(st33zp24_i2c_driver);
 
 MODULE_AUTHOR("TPM support (tpmsupp...@list.st.com)");
 MODULE_DESCRIPTION("STM TPM 1.2 I2C ST33 Driver");
-MODULE_VERSION("1.3.0");
 MODULE_LICENSE("GPL");
diff --git a/drivers/char/tpm/st33zp24/spi.c b/drivers/char/tpm/st33zp24/spi.c
index a75dafd39445..147efea4eb05 100644
--- a/drivers/char/tpm/st33zp24/spi.c
+++ b/drivers/char/tpm/st33zp24/spi.c
@@ -430,5 +430,4 @@ module_spi_driver(st33zp24_spi_driver);
 
 MODULE_AUTHOR("TPM support (tpmsupp...@list.st.com)");
 MODULE_DESCRIPTION("STM TPM 1.2 SPI ST33 Driver");
-MODULE_VERSION("1.3.0");
 MODULE_LICENSE("GPL");
diff --git a/drivers/char/tpm/st33zp24/st33zp24.c 
b/drivers/char/tpm/st33zp24/st33zp24.c
index 4ec10ab5e576..e0f1a5828993 100644
--- a/drivers/char/tpm/st33zp24/st33zp24.c
+++ b/drivers/char/tpm/st33zp24/st33zp24.c
@@ -646,5 +646,4 @@ EXPORT_SYMBOL(st33zp24_pm_resume);
 
 MODULE_AUTHOR("TPM support (tpmsupp...@list.st.com)");
 MODULE_DESCRIPTION("ST33ZP24 TPM 1.2 driver");
-MODULE_VERSION("1.3.0");
 MODULE_LICENSE("GPL");
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 1621ce818705..dfdc68b8bf88 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -514,5 +514,4 @@ module_exit(tpm_exit);
 
 MODULE_AUTHOR("Leendert van Doorn (leend...@watson.ibm.com)");
 MODULE_DESCRIPTION("TPM Driver");
-MODULE_VERSION("2.0");
 MODULE_LICENSE("GPL");
diff --git a/drivers/char/tpm/tpm_atmel.c b/drivers/char/tpm/tpm_atmel.c
index 54a6750a6757..35bf249cc95a 100644
--- a/drivers/char/tpm/tpm_atmel.c
+++ b/drivers/char/tpm/tpm_atmel.c
@@ -231,5 +231,4 @@ module_exit(cleanup_atmel);
 
 MODULE_AUTHOR("Leendert van Doorn (leend...@watson.ibm.com)");
 MODULE_DESCRIPTION("TPM Driver");
-MODULE_VERSION("2.0");
 MODULE_LICENSE("GPL");
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index a9dcf31eadd2..3e72b7b99cce 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -748,5 +748,4 @@ static struct acpi_driver crb_acpi_driver = {
 module_acpi_driver(crb_acpi_driver);
 MODULE_AUTHOR("Jarkko Sakkinen ");
 MODULE_DESCRIPTION("TPM2 Driver");
-MODULE_VERSION("0.1");
 MODULE_LICENSE("GPL");
diff --git a/drivers/char/tpm/tpm_i2c_infineon.c 
b/drivers/char/tpm/tpm_i2c_infineon.c
index a19d32cb4e94..8920b7c19fcb 100644
--- a/drivers/char/tpm/tpm_i2c_infineon.c
+++ b/drivers/char/tpm/tpm_i2c_infineon.c
@@ -731,5 +731,4 @@ static struct i2c_driver tpm_tis_i2c_driver = {
 module_i2c_driver(tpm_tis_i2c_driver);
 MODULE_AUTHOR("Peter Huewe ");
 MODULE_DESCRIPTION("TPM TIS I2C Infineon Driver");
-MODULE_VERSION("2.2.0");
 MODULE_LICENSE("GPL");
diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index 994385bf37c0..5b04d113f634 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -750,5 +750,4 @@ module_exit(ibmvtpm_module_exit);
 
 MODULE_AUTHOR("ad...@us.ibm.com");
 MODULE_DESCRIPTION("IBM vTPM Driver");
-MODULE_VERSION("1.0");
 MODULE_LICENSE("GPL");
diff --git a/drivers/char/tpm/tpm_infineon.c b/drivers/char/tpm/tpm_infineon.c
index 9c924a1440a9..8a58966c5c9b 100644
--- a/drivers/char/tpm/tpm_infineon.c
+++ b/drivers/char/tpm/tpm_infineon.c
@@ -621,5 +621,4 @@ module_pnp_driver(tpm_inf_pnp_driver);
 
 MODULE_AUTHOR("Marcel Selhorst ");
 MODULE_DESCRIPTION("Driver for Infineon TPM SLD 9630 TT 1.1 / SLB 9635 TT 
1.2");
-MODULE_VERSION("1.9.2");
 MODULE_LICENSE("GPL");
diff --git a/drivers/char/tpm/tpm_nsc.c b/drivers/char/tpm/tpm_nsc.c
index 038701d48351..6ab2fe7e8782 100644
--- a/drivers/char/tpm/tpm_nsc.c
+++ b/drivers/char/tpm/tpm_nsc.c
@@ -412,5 +412,4 @@ module_exit(cleanup_nsc);
 
 MODULE_AUTHOR("Leendert van Doorn (leend...@watson.ibm.com)");
 MODULE_DESCRIPTION("TPM Driver");
-MODULE_VERSION("2.0");
 MODULE_LICENSE("GPL");
diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index 4ed6e660273a..3074235b405d 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ 

Re: [PATCH 2/2] powerpc/ps3: make system bus's remove and shutdown callbacks return void

2020-12-02 Thread Takashi Iwai
On Wed, 02 Dec 2020 13:14:06 +0100,
Michael Ellerman wrote:
> 
> Uwe Kleine-König  writes:
> > Hello Michael,
> >
> > On Sat, Nov 28, 2020 at 09:48:30AM +0100, Takashi Iwai wrote:
> >> On Thu, 26 Nov 2020 17:59:50 +0100,
> >> Uwe Kleine-König wrote:
> >> > 
> >> > The driver core ignores the return value of struct device_driver::remove
> >> > because there is only little that can be done. For the shutdown callback
> >> > it's ps3_system_bus_shutdown() which ignores the return value.
> >> > 
> >> > To simplify the quest to make struct device_driver::remove return void,
> >> > let struct ps3_system_bus_driver::remove return void, too. All users
> >> > already unconditionally return 0, this commit makes it obvious that
> >> > returning an error code is a bad idea and ensures future users behave
> >> > accordingly.
> >> > 
> >> > Signed-off-by: Uwe Kleine-König 
> >> 
> >> For the sound bit:
> >> Acked-by: Takashi Iwai 
> >
> > assuming that you are the one who will apply this patch: Note that it
> > depends on patch 1 that Takashi already applied to his tree. So you
> > either have to wait untils patch 1 appears in some tree that you merge
> > before applying, or you have to take patch 1, too. (With Takashi
> > optinally dropping it then.)
> 
> Thanks. I've picked up both patches.
> 
> If Takashi doesn't want to rebase his tree to drop patch 1 that's OK, it
> will just arrive in mainline via two paths, but git should handle it.

Yeah, I'd like to avoid rebasing, so let's get it merge from both
trees.  git can handle such a case gracefully.


thanks,

Takashi


Re: [PATCH 2/2] powerpc/ps3: make system bus's remove and shutdown callbacks return void

2020-12-02 Thread Michael Ellerman
Uwe Kleine-König  writes:
> Hello Michael,
>
> On Sat, Nov 28, 2020 at 09:48:30AM +0100, Takashi Iwai wrote:
>> On Thu, 26 Nov 2020 17:59:50 +0100,
>> Uwe Kleine-König wrote:
>> > 
>> > The driver core ignores the return value of struct device_driver::remove
>> > because there is only little that can be done. For the shutdown callback
>> > it's ps3_system_bus_shutdown() which ignores the return value.
>> > 
>> > To simplify the quest to make struct device_driver::remove return void,
>> > let struct ps3_system_bus_driver::remove return void, too. All users
>> > already unconditionally return 0, this commit makes it obvious that
>> > returning an error code is a bad idea and ensures future users behave
>> > accordingly.
>> > 
>> > Signed-off-by: Uwe Kleine-König 
>> 
>> For the sound bit:
>> Acked-by: Takashi Iwai 
>
> assuming that you are the one who will apply this patch: Note that it
> depends on patch 1 that Takashi already applied to his tree. So you
> either have to wait untils patch 1 appears in some tree that you merge
> before applying, or you have to take patch 1, too. (With Takashi
> optinally dropping it then.)

Thanks. I've picked up both patches.

If Takashi doesn't want to rebase his tree to drop patch 1 that's OK, it
will just arrive in mainline via two paths, but git should handle it.

cheers


Re: [PATCH 00/13] ibmvfc: initial MQ development

2020-12-02 Thread Hannes Reinecke

On 11/26/20 2:48 AM, Tyrel Datwyler wrote:

Recent updates in pHyp Firmware and VIOS releases provide new infrastructure
towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each
Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the
partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then
negotiated with the VIOS via new Management Datagrams (MADs) for channel setup.

This initial implementation adds the necessary Sub-CRQ framework and implements
the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS
HW backed channels. The event pool and locking still leverages the legacy single
queue implementation, and as such lock contention is problematic when increasing
the number of queues. However, this initial work demonstrates a 1.2x factor
increase in IOPs when configured with two HW queues despite lock contention.


Why do you still hold the host lock during submission?
An initial check on the submission code path didn't reveal anything 
obvious, so it _should_ be possible to drop the host lock there.
Or at least move it into the submission function itself to avoid lock 
contention. Hmm?


Cheers,

Hannes
--
Dr. Hannes Reinecke            Kernel Storage Architect
h...@suse.de                   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] EDAC, mv64x60: Fix error return code in mv64x60_pci_err_probe()

2020-12-02 Thread Borislav Petkov
On Tue, Nov 24, 2020 at 02:30:09PM +0800, Wang ShaoBo wrote:
> Fix to return -ENODEV error code when edac_pci_add_device() failed instead
> of 0 in mv64x60_pci_err_probe(), as done elsewhere in this function.
> 
> Fixes: 4f4aeeabc061 ("drivers-edac: add marvell mv64x60 driver")
> Signed-off-by: Wang ShaoBo 
> ---
>  drivers/edac/mv64x60_edac.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/edac/mv64x60_edac.c b/drivers/edac/mv64x60_edac.c
> index 3c68bb525d5d..456b9ca1fe8d 100644
> --- a/drivers/edac/mv64x60_edac.c
> +++ b/drivers/edac/mv64x60_edac.c
> @@ -168,6 +168,7 @@ static int mv64x60_pci_err_probe(struct platform_device 
> *pdev)
>  
>   if (edac_pci_add_device(pci, pdata->edac_idx) > 0) {
>   edac_dbg(3, "failed edac_pci_add_device()\n");
> + res = -ENODEV;
>   goto err;
>   }

That driver depends on MV64X60 and I don't see anything in the tree
enabling it and I can't select it AFAICT:

config MV64X60
bool
select PPC_INDIRECT_PCI
select CHECK_CACHE_COHERENCY

PPC folks, what do we do here?

If not used anymore, I'd love to have one less EDAC driver.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-02 Thread Peter Zijlstra
On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote:
> +static void shoot_lazy_tlbs(struct mm_struct *mm)
> +{
> + if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
> + /*
> +  * IPI overheads have not found to be expensive, but they could
> +  * be reduced in a number of possible ways, for example (in
> +  * roughly increasing order of complexity):
> +  * - A batch of mms requiring IPIs could be gathered and freed
> +  *   at once.
> +  * - CPUs could store their active mm somewhere that can be
> +  *   remotely checked without a lock, to filter out
> +  *   false-positives in the cpumask.
> +  * - After mm_users or mm_count reaches zero, switching away
> +  *   from the mm could clear mm_cpumask to reduce some IPIs
> +  *   (some batching or delaying would help).
> +  * - A delayed freeing and RCU-like quiescing sequence based on
> +  *   mm switching to avoid IPIs completely.
> +  */
> + on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 
> 1);
> + if (IS_ENABLED(CONFIG_DEBUG_VM))
> + on_each_cpu(do_check_lazy_tlb, (void *)mm, 1);

So the obvious 'improvement' here would be something like:

for_each_online_cpu(cpu) {
p = rcu_dereference(cpu_rq(cpu)->curr);
if (p->active_mm != mm)
continue;
__cpumask_set_cpu(cpu, tmpmask);
}
on_each_cpu_mask(tmpmask, ...);

The remote CPU will never switch _to_ @mm, on account of it being quite
dead, but it is quite prone to false negatives.

Consider that __schedule() sets rq->curr *before* context_switch(), this
means we'll see next->active_mm, even though prev->active_mm might still
be our @mm.

Now, because we'll be removing the atomic ops from context_switch()'s
active_mm swizzling, I think we can change this to something like the
below. The hope being that the cost of the new barrier can be offset by
the loss of the atomics.

Hmm ?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41404afb7f4c..2597c5c0ccb0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4509,7 +4509,6 @@ context_switch(struct rq *rq, struct task_struct *prev,
if (!next->mm) {// to kernel
enter_lazy_tlb(prev->active_mm, next);
 
-   next->active_mm = prev->active_mm;
if (prev->mm)   // from user
mmgrab(prev->active_mm);
else
@@ -4524,6 +4523,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 * case 'prev->active_mm == next->mm' through
 * finish_task_switch()'s mmdrop().
 */
+   next->active_mm = next->mm;
switch_mm_irqs_off(prev->active_mm, next->mm, next);
 
if (!prev->mm) {// from kernel
@@ -5713,11 +5713,9 @@ static void __sched notrace __schedule(bool preempt)
 
if (likely(prev != next)) {
rq->nr_switches++;
-   /*
-* RCU users of rcu_dereference(rq->curr) may not see
-* changes to task_struct made by pick_next_task().
-*/
-   RCU_INIT_POINTER(rq->curr, next);
+
+   next->active_mm = prev->active_mm;
+   rcu_assign_pointer(rq->curr, next);
/*
 * The membarrier system call requires each architecture
 * to have a full memory barrier after updating


Re: CONFIG_PPC_VAS depends on 64k pages...?

2020-12-02 Thread Will Springer
On Tuesday, December 1, 2020 5:16:51 AM PST Bulent Abali wrote:
> I don't know anything about VAS page size requirements in the kernel.  I
> checked the user compression library and saw that we do a sysconf to
> get the page size; so the library should be immune to page size by
> design. But it wouldn't surprise me if a 64KB constant is inadvertently
> hardcoded somewhere else in the library.  Giving heads up to Tulio and
> Raphael who are owners of the github repo.
> 
> https://github.com/libnxz/power-gzip/blob/master/lib/nx_zlib.c#L922
> 
> If we got this wrong in the library it might manifest itself as an error
> message of the sort "excessive page faults".  The library must touch
> pages ahead to make them present in the memory; occasional page faults
> is acceptable. It will retry.

Hm, good to know. As I said I haven't noticed any problems so far, over a 
few different days of testing. My change is now in the Void Linux kernel 
package, and is working for others as well (including the Void maintainer 
Daniel/q66 who I CC'd initially).

> 
> Bulent
> 
> 
> 
> 
> From: "Sukadev Bhattiprolu" 
> To: "Christophe Leroy" 
> Cc: "Will Springer" , linuxppc-dev@lists.ozlabs.org, dan...@octaforge.org,
> Bulent Abali/Watson/IBM@IBM, ha...@linux.ibm.com
> Date: 12/01/2020 12:53 AM
> Subject: Re: CONFIG_PPC_VAS depends on 64k pages...?
> Subject:Re: CONFIG_PPC_VAS depends on 64k pages...?
> 
> Christophe Leroy [christophe.le...@csgroup.eu] wrote:
> > Hi,
> > 
> > Le 19/11/2020 à 11:58, Will Springer a écrit :
> > > I learned about the POWER9 gzip accelerator a few months ago when
> > > the
> > > support hit upstream Linux 5.8. However, for some reason the Kconfig
> > > dictates that VAS depends on a 64k page size, which is problematic
> > > as I
> > > run Void Linux, which uses a 4k-page kernel.
> > > 
> > > Some early poking by others indicated there wasn't an obvious page
> > > size
> > > dependency in the code, and suggested I try modifying the config to
> > > switch it on. I did so, but was stopped by a minor complaint of an
> > > "unexpected DT configuration" by the VAS code. I wasn't equipped to
> > > figure out exactly what this meant, even after finding the
> > > offending condition, so after writing a very drawn-out forum post
> > > asking for help, I dropped the subject.
> > > 
> > > Fast forward to today, when I was reminded of the whole thing again,
> > > and decided to debug a bit further. Apparently the VAS platform
> > > device (derived from the DT node) has 5 resources on my 4k kernel,
> > > instead of 4 (which evidently works for others who have had success
> > > on 64k kernels). I have no idea what this means in practice (I
> > > don't know how to introspect it), but after making a tiny patch[1],
> > > everything came up smoothly and I was doing blazing-fast gzip
> > > (de)compression in no time.
> > > 
> > > Everything seems to work fine on 4k pages. So, what's up? Are there
> > > pitfalls lurking around that I've yet to stumble over? More
> > > reasonably,
> > > I'm curious as to why the feature supposedly depends on 64k pages,
> > > or if there's anything else I should be concerned about.
> 
> Will,
> 
> The reason I put in that config check is because we were only able to
> test 64K pages at that point.
> 
> It is interesting that it is working for you. Following code in skiboot
> https://github.com/open-power/skiboot/blob/master/hw/vas.c should
> restrict it to 64K pages. IIRC there is also a corresponding change in
> some NX registers that should also be configured to allow 4K pages. 

Huh, that is interesting indeed. As far as the kernel code, the only thing 
specific to 64k pages I could find was in [1], where 
VAS_XLATE_LPCR_PAGE_SIZE is set. There is also NX_PAGE_SIZE in drivers/
crypto/nx/nx.h, which is set to 4096, but I don't know if that's related to 
kernel page size at all. Without a better idea of the code base, I didn't
examine more thoroughly.

[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/platforms/powernv/vas-window.c#n293

> static int init_north_ctl(struct proc_chip *chip)
> {
>  uint64_t val = 0ULL;
> 
>  val = SETFIELD(VAS_64K_MODE_MASK, val, true);
>  val = SETFIELD(VAS_ACCEPT_PASTE_MASK, val, true);
>  val = SETFIELD(VAS_ENABLE_WC_MMIO_BAR, val, true);
>  val = SETFIELD(VAS_ENABLE_UWC_MMIO_BAR, val, true);
>  val = SETFIELD(VAS_ENABLE_RMA_MMIO_BAR, val, true);
> 
>  return vas_scom_write(chip, VAS_MISC_N_CTL, val);
> }
> 
> I am copying Bulent Albali and Haren Myneni who have been working with
> VAS/NX for their thoughts/experience.

Thanks for this and for your input, by the way.

> 
> > Maybe ask Sukadev who did the implementation and is maintaining it ?
> > 
> > > I do have to say I'm quite satisfied with the results of the NX
> > > accelerator, though. Being able to shuffle data to a RaptorCS box
> > > over gigE and