date:20221116

Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc

2022-11-16 Thread Christophe Leroy



Le 16/11/2022 à 18:01, Hari Bathini a écrit :
> 
> 
> On 16/11/22 12:14 am, Christophe Leroy wrote:
>>
>>
>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>>
>>>
>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
 Hi Christophe,

 On 11/11/22 4:55 pm, Christophe Leroy wrote:
> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>> Most BPF programs are small, but they consume a page each. For 
>> systems
>> with busy traffic and many BPF programs, this may also add 
>> significant
>> pressure on instruction TLB. High iTLB pressure usually slows down 
>> the
>> whole system causing visible performance degradation for production
>> workloads.
>>
>> bpf_prog_pack, a customized allocator that packs multiple bpf 
>> programs
>> into preallocated memory chunks, was proposed [1] to address it. This
>> series extends this support on powerpc.
>>
>> Patches 1 & 2 add the arch specific functions needed to support this
>> feature. Patch 3 enables the support for powerpc. The last patch
>> ensures cleanup is handled racefully.
>>

>> Tested the changes successfully on a PowerVM. patch_instruction(),
>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
>> Posting the patches in the meanwhile for feedback on these changes.
>
> I did a quick test on ppc32, I don't get such a problem, only 
> something
> wrong in the dump print as traps intructions only are dumped, but
> tcpdump works as expected:

 Thanks for the quick test. Could you please share the config you used.
 I am probably missing a few knobs in my conifg...

>>>
>>
>> I also managed to test it on QEMU. The config is based on 
>> pmac32_defconfig.
> 
> I had the same config but hit this problem:
> 
>  # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>  test_bpf: #0 TAX
>  [ cut here ]
>  WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 
> bpf_int_jit_compile+0x8a0/0x9f8

I get no such problem, on QEMU, and I checked the .config has:
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_STRICT_MODULE_RWX=y

Boot successful.
/ # ifconfig eth0 10.0.2.15
e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
/ # tftp -g 10.0.2.2 -r test_bpf.ko
/ # echo 1 > /proc/sys/net/core/bpf_jit_enable
/ # insmod ./test_bpf.ko
test_bpf: #0 TAX jited:1 216 87 86 PASS
test_bpf: #1 TXA jited:1 57 27 27 PASS
test_bpf: #2 ADD_SUB_MUL_K jited:1 50 PASS
test_bpf: #3 DIV_MOD_KX jited:1 110 PASS
test_bpf: #4 AND_OR_LSH_K jited:1 67 26 PASS
test_bpf: #5 LD_IMM_0 jited:1 77 PASS
...

By the way, you can note that during the boot you get:

This platform has HASH MMU, STRICT_MODULE_RWX won't work

See why in 0670010f3b10 ("powerpc/32s: Enable STRICT_MODULE_RWX for the 
603 core")

Nevertheless it should prevent patch_instruction() to work.

Could you had a pr_err() in __patch_instruction() in the failure path to 
print and check exec_addr and patch_addr ?



>  jited:1
>  kernel tried to execute exec-protected page (be857020) - exploit 
> attempt? (uid: 0)
>  BUG: Unable to handle kernel instruction fetch
>  Faulting instruction address: 0xbe857020

I'm a bit surprised of this. On hash based book3s/32 there is no way to 
protect pages for exec-protection. Protection is performed at segment 
level, all kernel segments have the NX bit set except the segment used 
for module text, which is by default 0xb000-0xbfff.

Or maybe this is the first time that address is accessed, and the ISI 
handler does the check before loading the hash table ?

> 
> bpf_jit_binary_pack_finalize() function failed due to 
> patch_instruction() ..

Is there a way tell BPF core that jit failed in that case to avoid that ?

Christophe

[powerpc:fixes-test] BUILD SUCCESS eb761a1760bf30cf64e98ee8d914866e62ec9e8a

2022-11-16 Thread kernel test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
fixes-test
branch HEAD: eb761a1760bf30cf64e98ee8d914866e62ec9e8a  powerpc: Fix writable 
sections being moved into the rodata region

elapsed time: 725m

configs tested: 67
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arc defconfig
alpha   defconfig
s390defconfig
s390 allmodconfig
um i386_defconfig
um   x86_64_defconfig
s390 allyesconfig
powerpc   allnoconfig
mips allyesconfig
powerpc  allmodconfig
sh   allmodconfig
ia64 allmodconfig
x86_64  rhel-8.3-func
x86_64rhel-8.3-kselftests
x86_64 rhel-8.3-kunit
x86_64   rhel-8.3-kvm
x86_64   rhel-8.3-syz
i386  randconfig-a001
sh  r7785rp_defconfig
m68k allyesconfig
i386  randconfig-a003
alphaallyesconfig
m68k allmodconfig
arc  allyesconfig
i386  randconfig-a005
i386 allyesconfig
i386defconfig
x86_64  defconfig
x86_64   allyesconfig
x86_64   rhel-8.3
x86_64randconfig-a006
x86_64randconfig-a004
x86_64randconfig-a002
arm64allyesconfig
arm defconfig
arm  allyesconfig
i386  randconfig-a012
i386  randconfig-a014
i386  randconfig-a016
x86_64allnoconfig
shtitan_defconfig
powerpc mpc85xx_cds_defconfig
mipsar7_defconfig
armzeus_defconfig
arm ezx_defconfig
i386  randconfig-c001
microblaze  defconfig
sh  rsk7203_defconfig
sparc64  alldefconfig
riscvnommu_virt_defconfig
riscv  rv32_defconfig
riscvnommu_k210_defconfig
riscv allnoconfig
i386   debian-10.3-kselftests
i386  debian-10.3
i386  debian-10.3-kvm
i386debian-10.3-kunit
i386 debian-10.3-func

clang tested configs:
i386  randconfig-a002
i386  randconfig-a006
i386  randconfig-a004
powerpc pseries_defconfig
x86_64randconfig-a012
x86_64randconfig-a014
x86_64randconfig-a016
hexagon  randconfig-r041-20221117
hexagon  randconfig-r045-20221117

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

[powerpc:merge] BUILD SUCCESS b9b83072dc2b0a262b201eac66ed2cafd202a93e

2022-11-16 Thread kernel test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
merge
branch HEAD: b9b83072dc2b0a262b201eac66ed2cafd202a93e  Automatic merge of 
'master' into merge (2022-11-14 14:07)

elapsed time: 724m

configs tested: 54
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
um i386_defconfig
um   x86_64_defconfig
arc defconfig
s390 allmodconfig
alpha   defconfig
s390defconfig
ia64 allmodconfig
s390 allyesconfig
x86_64randconfig-a011
x86_64  defconfig
i386  randconfig-a001
x86_64randconfig-a015
i386defconfig
i386  randconfig-a003
m68k allmodconfig
powerpc   allnoconfig
arc  allyesconfig
powerpc  allmodconfig
x86_64randconfig-a013
sh   allmodconfig
i386  randconfig-a005
mips allyesconfig
x86_64  rhel-8.3-func
alphaallyesconfig
x86_64   rhel-8.3
x86_64rhel-8.3-kselftests
i386  randconfig-a012
x86_64   allyesconfig
arc  randconfig-r043-20221116
i386 allyesconfig
x86_64randconfig-a006
x86_64   rhel-8.3-syz
i386  randconfig-a016
m68k allyesconfig
i386  randconfig-a014
x86_64   rhel-8.3-kvm
x86_64 rhel-8.3-kunit
arm defconfig
arm64allyesconfig
arm  allyesconfig

clang tested configs:
x86_64randconfig-a012
i386  randconfig-a013
x86_64randconfig-a014
i386  randconfig-a002
i386  randconfig-a011
x86_64randconfig-a016
i386  randconfig-a006
i386  randconfig-a004
hexagon  randconfig-r041-20221116
riscvrandconfig-r042-20221116
hexagon  randconfig-r045-20221116
x86_64randconfig-a005
s390 randconfig-r044-20221116
i386  randconfig-a015

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

Re: [PATCH v2 3/3] treewide: use get_random_u32_between() when possible

2022-11-16 Thread Jason A. Donenfeld

On Thu, Nov 17, 2022 at 12:55:47AM +0100, Jason A. Donenfeld wrote:
> 1) How/whether to make f(0, UR2_MAX) safe,
>- without additional 64-bit arithmetic,
>- minimizing the number of branches.
>I have a few ideas I'll code golf for a bit.
> I think I can make progress with (1) alone by fiddling around with
> godbolt enough, like usual.

The code gen is definitely worse.

Original half-open interval:

return floor + get_random_u32_below(ceil - floor);

Suggested fully closed interval:

ceil = ceil - floor + 1;
return likely(ceil) ? floor + get_random_u32_below(ceil) : get_random_u32();

Is the worse code gen actually worth it? Options:

 a) Decide worse codegen is worth it.
 b) Declare f(0, U32_MAX) undefined and just not handle it.
 c) Stick with original half-open interval that doesn't have this problem.

Jason

Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

2022-11-16 Thread Huang, Kai

On Wed, 2022-11-16 at 17:11 +, Sean Christopherson wrote:
> On Wed, Nov 16, 2022, Huang, Kai wrote:
> > On Tue, 2022-11-15 at 20:16 +, Sean Christopherson wrote:
> > > On Thu, Nov 10, 2022, Huang, Kai wrote:
> > > > On Thu, 2022-11-10 at 01:33 +, Huang, Kai wrote:
> > > > Hmm.. I wasn't thinking thoroughly.  I forgot CPU compatibility check 
> > > > also
> > > > happens on all online cpus when loading KVM.  For this case, IRQ is 
> > > > disabled and
> > > > cpu_active() is true.  For the hotplug case, IRQ is enabled but  
> > > > cpu_active() is
> > > > false.
> > > 
> > > Actually, you're right (and wrong).  You're right in that the WARN is 
> > > flawed.  And
> > > the reason for that is because you're wrong about the hotplug case.  In 
> > > this version
> > > of things, the compatibility checks are routed through hardware enabling, 
> > > i.e. this
> > > flow is used only when loading KVM.  This helper should only be called 
> > > via SMP function
> > > call, which means that IRQs should always be disabled.
> > 
> > Did you mean below code change in later patch "[PATCH 39/44] KVM: Drop
> > kvm_count_lock and instead protect kvm_usage_count with kvm_lock"?
> > 
> > /*
> >  * Abort the CPU online process if hardware virtualization cannot
> >  * be enabled. Otherwise running VMs would encounter unrecoverable
> > @@ -5039,13 +5039,16 @@ static int kvm_online_cpu(unsigned int cpu)
> > if (kvm_usage_count) {
> > WARN_ON_ONCE(atomic_read(_enable_failed));
> >  
> > +   local_irq_save(flags);
> > hardware_enable_nolock(NULL);
> > +   local_irq_restore(flags);
> 
> Sort of.  What I was saying is that in this v1, the compatibility checks that 
> are
> done during harware enabling are initiated from vendor code, i.e. VMX and SVM 
> call
> {svm,vmx}_check_processor_compat() directly.  As a result, the compat checks 
> that
> are handled in common code:
> 
>   if (__cr4_reserved_bits(cpu_has, c) !=
>   __cr4_reserved_bits(cpu_has, _cpu_data))
>   return -EIO;
> 
> are skipped.  And if that's fixed, then the above hardware_enable_nolock() 
> call
> will bounce through kvm_x86_check_processor_compatibility() with IRQs enabled
> once the KVM hotplug hook is moved to the ONLINE section.

Oh I see.  So you still want the kvm_x86_ops->check_processor_compatibility(),
in order to avoid duplicating the above code in SVM and VMX.

> 
> As above, the simple "fix" would be to disable IRQs, but that's not actually
> necessary.  The only requirement is that preemption is disabled so that the 
> checks
> are done on the current CPU.  
> 

Probably even preemption is allowed, as long as the compatibility check is not
scheduled to another cpu.


> The "IRQs disabled" check was a deliberately
> agressive WARN that was added to guard against doing compatibility checks from
> the "wrong" location.
> 
> E.g. this is what I ended up with for a changelog to drop the irqs_disabled()
> check and for the end code (though it's not tested yet...)
> 
> Drop kvm_x86_check_processor_compatibility()'s WARN that IRQs are
> disabled, as the ONLINE section runs with IRQs disabled.  The WARN wasn't
 ^
 enabled.

> intended to be a requirement, e.g. disabling preemption is sufficient,
> the IRQ thing was purely an aggressive sanity check since the helper was
> only ever invoked via SMP function call.
> 
> 
> static int kvm_x86_check_processor_compatibility(void)
> {
> int cpu = smp_processor_id();
> struct cpuinfo_x86 *c = _data(cpu);
> 
> /*
>  * Compatibility checks are done when loading KVM and when enabling
>  * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
>  * compatible, i.e. KVM should never perform a compatibility check on
>  * an offline CPU.
>  */
> WARN_ON(!cpu_online(cpu));

Looks good to me.  Perhaps this also can be removed, though.

And IMHO the removing of WARN_ON(!irq_disabled()) should be folded to the patch
"[PATCH 37/44] KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section". 
Because moving from STARTING section to ONLINE section changes the IRQ status
when the compatibility check is called.

> 
> if (__cr4_reserved_bits(cpu_has, c) !=
> __cr4_reserved_bits(cpu_has, _cpu_data))
> return -EIO;
> 
> return static_call(kvm_x86_check_processor_compatibility)();
> }
> 
> 
> int kvm_arch_hardware_enable(void)
> {
> struct kvm *kvm;
> struct kvm_vcpu *vcpu;
> unsigned long i;
> int ret;
> u64 local_tsc;
> u64 max_tsc = 0;
> bool stable, backwards_tsc = false;
> 
> kvm_user_return_msr_cpu_online();
> 
> ret = kvm_x86_check_processor_compatibility();
> if (ret)
> return ret;
> 
>

Re: [PATCH mm-unstable v1 12/20] RDMA/siw: remove FOLL_FORCE usage

2022-11-16 Thread Jason Gunthorpe

On Wed, Nov 16, 2022 at 11:26:51AM +0100, David Hildenbrand wrote:
> GUP now supports reliable R/O long-term pinning in COW mappings, such
> that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
> far in one corner case (DAXFS file with holes), which can be ignored
> because GUP does not support long-term pinning in fsdax (see
> check_vma_flags()).
> 
> Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
> for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
> using FOLL_FORCE, which is really only for ptrace access.
> 
> Cc: Bernard Metzler 
> Cc: Jason Gunthorpe 
> Cc: Leon Romanovsky 
> Signed-off-by: David Hildenbrand 
> ---
>  drivers/infiniband/sw/siw/siw_mem.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [PATCH mm-unstable v1 11/20] RDMA/usnic: remove FOLL_FORCE usage

2022-11-16 Thread Jason Gunthorpe

On Wed, Nov 16, 2022 at 11:26:50AM +0100, David Hildenbrand wrote:
> GUP now supports reliable R/O long-term pinning in COW mappings, such
> that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
> far in one corner case (DAXFS file with holes), which can be ignored
> because GUP does not support long-term pinning in fsdax (see
> check_vma_flags()).
> 
> Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
> for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
> using FOLL_FORCE, which is really only for ptrace access.
> 
> Cc: Christian Benvenuti 
> Cc: Nelson Escobar 
> Cc: Jason Gunthorpe 
> Cc: Leon Romanovsky 
> Signed-off-by: David Hildenbrand 
> ---
>  drivers/infiniband/hw/usnic/usnic_uiom.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [PATCH mm-unstable v1 10/20] RDMA/umem: remove FOLL_FORCE usage

2022-11-16 Thread Jason Gunthorpe

On Wed, Nov 16, 2022 at 11:26:49AM +0100, David Hildenbrand wrote:
> GUP now supports reliable R/O long-term pinning in COW mappings, such
> that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
> far in one corner case (DAXFS file with holes), which can be ignored
> because GUP does not support long-term pinning in fsdax (see
> check_vma_flags()).
> 
> Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
> for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
> using FOLL_FORCE, which is really only for ptrace access.
> 
> Tested-by: Leon Romanovsky  # Over mlx4 and mlx5.
> Cc: Jason Gunthorpe 
> Cc: Leon Romanovsky 
> Signed-off-by: David Hildenbrand 
> ---
>  drivers/infiniband/core/umem.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [PATCH v2 3/3] treewide: use get_random_u32_between() when possible

2022-11-16 Thread Jason A. Donenfeld

On Wed, Nov 16, 2022 at 04:31:18PM -0800, Kees Cook wrote:
> On Thu, Nov 17, 2022 at 01:03:14AM +0100, Jason A. Donenfeld wrote:
> > On Thu, Nov 17, 2022 at 12:55:47AM +0100, Jason A. Donenfeld wrote:
> > > 2) What to call it:
> > >- between I still like, because it mirrors "I'm thinking of a number
> > >  between 1 and 10 and..." that everybody knows,
> > >- inclusive I guess works, but it's not a preposition,
> > >- bikeshed color #3?
> > 
> > - between
> > - ranged
> > - spanning
> > 
> > https://www.thefreedictionary.com/List-of-prepositions.htm
> > - amid
> > 
> > Sigh, names.
> 
> I think "inclusive" is best.

I find it not very descriptive of what the function does. Is there one
you like second best? Or are you convinced they're all much much much
worse than "inclusive" that they shouldn't be considered?

Jason

Re: [PATCH v2 3/3] treewide: use get_random_u32_between() when possible

2022-11-16 Thread Kees Cook

On Thu, Nov 17, 2022 at 01:03:14AM +0100, Jason A. Donenfeld wrote:
> On Thu, Nov 17, 2022 at 12:55:47AM +0100, Jason A. Donenfeld wrote:
> > 2) What to call it:
> >- between I still like, because it mirrors "I'm thinking of a number
> >  between 1 and 10 and..." that everybody knows,
> >- inclusive I guess works, but it's not a preposition,
> >- bikeshed color #3?
> 
> - between
> - ranged
> - spanning
> 
> https://www.thefreedictionary.com/List-of-prepositions.htm
> - amid
> 
> Sigh, names.

I think "inclusive" is best. The other words still don't provide
unambiguous language. It's the language used in formal math, e.g.
sigma-notation, etc. It's an adjective for "get random" (verb, noun).

-- 
Kees Cook

Re: [PATCH v2 3/3] treewide: use get_random_u32_between() when possible

2022-11-16 Thread Jason A. Donenfeld

On Thu, Nov 17, 2022 at 12:55:47AM +0100, Jason A. Donenfeld wrote:
> 2) What to call it:
>- between I still like, because it mirrors "I'm thinking of a number
>  between 1 and 10 and..." that everybody knows,
>- inclusive I guess works, but it's not a preposition,
>- bikeshed color #3?

- between
- ranged
- spanning

https://www.thefreedictionary.com/List-of-prepositions.htm
- amid

Sigh, names.

Re: [PATCH v2 3/3] treewide: use get_random_u32_between() when possible

2022-11-16 Thread Jason A. Donenfeld

On Wed, Nov 16, 2022 at 02:43:13PM -0800, Kees Cook wrote:
> On Mon, Nov 14, 2022 at 05:45:58PM +0100, Jason A. Donenfeld wrote:
> > -   (get_random_u32_below(1024) + 1) * PAGE_SIZE;
> > +   get_random_u32_between(1, 1024 + 1) * PAGE_SIZE;
> 
> I really don't like "between". Can't this be named "inclusive" (and
> avoid adding 1 everywhere, which seems ugly), or at least named
> something less ambiguous?
> 
> > -   n = get_random_u32_below(100) + 1;
> > +   n = get_random_u32_between(1, 101);
> 
> Because I find this much less readable. "Below 100" is clear: 0-99
> inclusive, plus 1, so 1-100 inclusive. "Between 1 and 101" is not obvious
> to me to mean: 1-100 inclusive.
> 
> These seem so much nicer:
>   get_random_u32_inclusive(1, 1024)
>   get_random_u32_inclusive(1, 100)

Yann pointed out something similar -- the half-closed interval being
confusing -- and while I was initially dismissive, I've warmed up to
doing this fully closed after sending a diff of that:

https://lore.kernel.org/lkml/y3qt8hixj8gio...@zx2c4.com/

So okay, let's say that I'll implement the inclusive version instead. We
now have two problems to solve:

1) How/whether to make f(0, UR2_MAX) safe,
   - without additional 64-bit arithmetic,
   - minimizing the number of branches.
   I have a few ideas I'll code golf for a bit.

2) What to call it:
   - between I still like, because it mirrors "I'm thinking of a number
 between 1 and 10 and..." that everybody knows,
   - inclusive I guess works, but it's not a preposition,
   - bikeshed color #3?

I think I can make progress with (1) alone by fiddling around with
godbolt enough, like usual. I could use some more ideas for (2) though.

Jason

Re: [PATCH v4 2/3] powerpc/pseries: PLPKS SED Opal keystore support

2022-11-16 Thread Greg Joyce

On Fri, 2022-10-07 at 19:09 +, Elliott, Robert (Servers) wrote:
> > -Original Message-
> > From: gjo...@linux.vnet.ibm.com 
> > Sent: Friday, August 19, 2022 5:32 PM
> > To: linux-bl...@vger.kernel.org
> > Cc: linuxppc-dev@lists.ozlabs.org; jonathan.derr...@linux.dev;
> > brk...@linux.vnet.ibm.com; msucha...@suse.de; m...@ellerman.id.au;
> > na...@linux.ibm.com; ax...@kernel.dk; a...@linux-foundation.org;
> > gjo...@linux.vnet.ibm.com; linux-...@vger.kernel.org;
> > keyri...@vger.kernel.org; dhowe...@redhat.com; jar...@kernel.org
> > Subject: [PATCH v4 2/3] powerpc/pseries: PLPKS SED Opal keystore
> > support
> > 
> > +++ b/arch/powerpc/platforms/pseries/plpks_sed_ops.c
> ...
> > +struct plpks_sed_object_data {
> > +   u_char version;
> > +   u_char pad1[7];
> > +   u_long authority;
> > +   u_long range;
> > +   u_int  key_len;
> > +   u_char key[32];
> > +};
> ...
> > +/*
> > + * Read the SED Opal key from PLPKS given the label
> > + */
> > +int sed_read_key(char *keyname, char *key, u_int *keylen)
> > +{
> ...
> > +   *keylen = be32_to_cpu(data->key_len);
> > +
> > +   if (var.data) {
> > +   memcpy(key, var.data + offset, var.datalen - offset);
> > +   key[*keylen] = '\0';
> 
> Is there a guarantee that key_len is always < sizeof key, or
> does that need to be checked in more places?

Changed keylen paramter to be the maximum size that it copied. This 
will help avoid buffer overwrite.

Re: [PATCH v4 3/3] block: sed-opal: keystore access for SED Opal keys

2022-11-16 Thread Greg Joyce

On Fri, 2022-10-07 at 12:21 -0600, Jonathan Derrick wrote:
> LGTM besides comment below
> 
> Reviewed-by: Jonathan Derrick 
> 
> On 8/19/2022 4:31 PM, gjo...@linux.vnet.ibm.com wrote:
> > From: Greg Joyce 
> > 
> > Allow for permanent SED authentication keys by
> > reading/writing to the SED Opal non-volatile keystore.
> > 
> > Signed-off-by: Greg Joyce 
> > ---
> >  block/sed-opal.c | 18 --
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/sed-opal.c b/block/sed-opal.c
> > index 3bdb31cf3e7c..11b0eb3a656b 100644
> > --- a/block/sed-opal.c
> > +++ b/block/sed-opal.c
> > @@ -18,6 +18,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -2697,7 +2698,13 @@ static int opal_set_new_pw(struct opal_dev
> > *dev, struct opal_new_pw *opal_pw)
> > if (ret)
> > return ret;
> >  
> > -   /* update keyring with new password */
> > +   /* update keyring and arch var with new password */
> > +   ret = sed_write_key(OPAL_AUTH_KEY,
> > +   opal_pw->new_user_pw.opal_key.key,
> > +   opal_pw->new_user_pw.opal_key.key_len);
> > +   if (ret != -EOPNOTSUPP)
> > +   pr_warn("error updating SED key: %d\n", ret);
> I cant see any reason this would fail and make the keys inconsistent,
> but it seems
> like update_sed_opal_key() should be dependent on sed_write_key()
> succeeding

The thought was that since the key was already updated on the SED
drive, there should be an attempt to update it in the key store
even in the unlikely event the keyring update failed.

> 
> > +
> > ret = update_sed_opal_key(OPAL_AUTH_KEY,
> >   opal_pw->new_user_pw.opal_key.key,
> >   opal_pw-
> > >new_user_pw.opal_key.key_len);
> > @@ -2920,6 +2927,8 @@ EXPORT_SYMBOL_GPL(sed_ioctl);
> >  static int __init sed_opal_init(void)
> >  {
> > struct key *kr;
> > +   char init_sed_key[OPAL_KEY_MAX];
> > +   int keylen = OPAL_KEY_MAX;
> >  
> > kr = keyring_alloc(".sed_opal",
> >GLOBAL_ROOT_UID, GLOBAL_ROOT_GID,
> > current_cred(),
> > @@ -2932,6 +2941,11 @@ static int __init sed_opal_init(void)
> >  
> > sed_opal_keyring = kr;
> >  
> > -   return 0;
> > +   if (sed_read_key(OPAL_AUTH_KEY, init_sed_key, ) < 0) {
> > +   memset(init_sed_key, '\0', sizeof(init_sed_key));
> > +   keylen = OPAL_KEY_MAX;
> > +   }
> > +
> > +   return update_sed_opal_key(OPAL_AUTH_KEY, init_sed_key,
> > keylen);
> >  }
> >  late_initcall(sed_opal_init);

Re: [patch 13/39] PCI/MSI: Use msi_domain_info::bus_token

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 13:51, Jason Gunthorpe wrote:
> On Fri, Nov 11, 2022 at 02:54:35PM +0100, Thomas Gleixner wrote:
>>  /* PCI-MSI is oneshot-safe */
>>  info->chip->flags |= IRQCHIP_ONESHOT_SAFE;
>> +/* Let the core update the bus token */
>> +info->bus_token = DOMAIN_BUS_PCI_MSI;
>
> comment seems a bit obvious

:)

> Reviewed-by: Jason Gunthorpe 
>
> Should the callers be updated to set this in their "struct
> msi_domain_info" ?

For PCI/MSI we can handle that in the core for all of them. :)

The other msi_domain_info usage in various places needs obviously
special care.

Thanks,

tglx

Re: [patch 12/39] genirq/msi: Add bus token to struct msi_domain_info

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 13:49, Jason Gunthorpe wrote:

> On Fri, Nov 11, 2022 at 02:54:33PM +0100, Thomas Gleixner wrote:
>> From: Ahmed S. Darwish 
>> 
>> Add a bus token member to struct msi_domain_info and let
>> msi_create_irq_domain() set the bus token.
>> 
>> That allows to remove the bus token updates at the call sites.
>> 
>> Suggested-by: Thomas Gleixner 
>> Signed-off-by: Ahmed S. Darwish 
>> Signed-off-by: Thomas Gleixner 
>> ---
>>  include/linux/msi.h |   19 +++
>>  kernel/irq/msi.c|7 +--
>>  2 files changed, 16 insertions(+), 10 deletions(-)
>
> Reviewed-by: Jason Gunthorpe 
>
>>  struct msi_domain_info {
>> -u32 flags;
>> -struct msi_domain_ops   *ops;
>> -struct irq_chip *chip;
>> -void*chip_data;
>> -irq_flow_handler_t  handler;
>> -void*handler_data;
>> -const char  *handler_name;
>> -void*data;
>> +u32 flags;
>> +enum irq_domain_bus_token   bus_token;
>> +struct msi_domain_ops   *ops;
>> +struct irq_chip *chip;
>> +void*chip_data;
>> +irq_flow_handler_t  handler;
>> +void*handler_data;
>> +const char  *handler_name;
>> +void*data;
>>  };
>
> This is why I've been frowning on horizontal alignment :(

Yes, it's annoying when you have to adjust it, but it's fundamentaly
simpler to parse than the clogged together word salad.

Re: [patch 08/39] genirq/msi: Provide msi_domain_ops::post_free()

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 13:44, Jason Gunthorpe wrote:
> On Fri, Nov 11, 2022 at 02:54:27PM +0100, Thomas Gleixner wrote:
>> To prepare for removing the exposure of __msi_domain_free_irqs() provide a
>> post_free() callback in the MSI domain ops which can be used to solve
>> the problem of the only user of __msi_domain_free_irqs() in arch/powerpc.
>> 
>> Signed-off-by: Thomas Gleixner 
>> ---
>>  include/linux/msi.h |4 
>>  kernel/irq/msi.c|2 ++
>>  2 files changed, 6 insertions(+)
>
> Make sense, but I do wonder why PPC needs this ordering..

Some hypervisor thing.

Re: [PATCH v2 3/3] treewide: use get_random_u32_between() when possible

2022-11-16 Thread Kees Cook

On Mon, Nov 14, 2022 at 05:45:58PM +0100, Jason A. Donenfeld wrote:
> - (get_random_u32_below(1024) + 1) * PAGE_SIZE;
> + get_random_u32_between(1, 1024 + 1) * PAGE_SIZE;

I really don't like "between". Can't this be named "inclusive" (and
avoid adding 1 everywhere, which seems ugly), or at least named
something less ambiguous?

> - n = get_random_u32_below(100) + 1;
> + n = get_random_u32_between(1, 101);

Because I find this much less readable. "Below 100" is clear: 0-99
inclusive, plus 1, so 1-100 inclusive. "Between 1 and 101" is not obvious
to me to mean: 1-100 inclusive.

These seem so much nicer:
get_random_u32_inclusive(1, 1024)
get_random_u32_inclusive(1, 100)

-- 
Kees Cook

Re: [PATCH 2/2] cxl: fix possible null-ptr-deref in cxl_pci_init_afu|adapter()

2022-11-16 Thread Frederic Barrat





On 11/11/2022 15:54, Yang Yingliang wrote:

If device_register() fails in cxl_pci_afu|adapter(), the device
is not added, device_unregister() can not be called in the error
path, otherwise it will cause a null-ptr-deref because of removing
not added device.

As comment of device_register() says, it should use put_device() to give
up the reference in the error path. So split device_unregister() into
device_del() and put_device(), then goes to put dev when register fails.

Fixes: f204e0b8cedd ("cxl: Driver code for powernv PCIe based cards for userspace 
access")
Signed-off-by: Yang Yingliang 
---



Acked-by: Frederic Barrat 

  Fred



  drivers/misc/cxl/pci.c | 20 
  1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 3de0aea62ade..6d495d641c95 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1164,10 +1164,10 @@ static int pci_init_afu(struct cxl *adapter, int slice, 
struct pci_dev *dev)
 * if it returns an error!
 */
if ((rc = cxl_register_afu(afu)))
-   goto err_put1;
+   goto err_put_dev;
  
  	if ((rc = cxl_sysfs_afu_add(afu)))

-   goto err_put1;
+   goto err_del_dev;
  
  	adapter->afu[afu->slice] = afu;
  
@@ -1176,10 +1176,12 @@ static int pci_init_afu(struct cxl *adapter, int slice, struct pci_dev *dev)
  
  	return 0;
  
-err_put1:

+err_del_dev:
+   device_del(>dev);
+err_put_dev:
pci_deconfigure_afu(afu);
cxl_debugfs_afu_remove(afu);
-   device_unregister(>dev);
+   put_device(>dev);
return rc;
  
  err_free_native:

@@ -1667,23 +1669,25 @@ static struct cxl *cxl_pci_init_adapter(struct pci_dev 
*dev)
 * even if it returns an error!
 */
if ((rc = cxl_register_adapter(adapter)))
-   goto err_put1;
+   goto err_put_dev;
  
  	if ((rc = cxl_sysfs_adapter_add(adapter)))

-   goto err_put1;
+   goto err_del_dev;
  
  	/* Release the context lock as adapter is configured */

cxl_adapter_context_unlock(adapter);
  
  	return adapter;
  
-err_put1:

+err_del_dev:
+   device_del(>dev);
+err_put_dev:
/* This should mirror cxl_remove_adapter, except without the
 * sysfs parts
 */
cxl_debugfs_adapter_remove(adapter);
cxl_deconfigure_adapter(adapter);
-   device_unregister(>dev);
+   put_device(>dev);
return ERR_PTR(rc);
  
  err_release:

Re: [patch 39/39] x86/apic: Remove X86_IRQ_ALLOC_CONTIGUOUS_VECTORS

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:55:17PM +0100, Thomas Gleixner wrote:
> Now that the PCI/MSI core code does early checking for multi-MSI support
> X86_IRQ_ALLOC_CONTIGUOUS_VECTORS is not required anymore.

> Remove the flag and rely on MSI_FLAG_MULTI_PCI_MSI.

Reviewed-by: Jason Gunthorpe 

Jason

Re: [PATCH 1/2] cxl: fix possible null-ptr-deref in cxl_guest_init_afu|adapter()

2022-11-16 Thread Frederic Barrat





On 11/11/2022 15:54, Yang Yingliang wrote:

If device_register() fails in cxl_register_afu|adapter(), the device
is not added, device_unregister() can not be called in the error path,
otherwise it will cause a null-ptr-deref because of removing not added
device.

As comment of device_register() says, it should use put_device() to give
up the reference in the error path. So split device_unregister() into
device_del() and put_device(), then goes to put dev when register fails.

Fixes: 14baf4d9c739 ("cxl: Add guest-specific code")
Signed-off-by: Yang Yingliang 
---



Thanks for fixing it!
At first, I was slightly uneasy about calling device_del() and 
put_device() directly, i.e. it would be better not to worry about what's 
under the hood of device_unregister(). But 1) I don't see how else to 
fix it and 2) more importantly, I looked at the history of 
device_unregister() to see how frequently it changed. I can now rest 
easy :-)


Acked-by: Frederic Barrat 

  Fred




  drivers/misc/cxl/guest.c | 24 ++--
  1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
index 375f692ae9d6..fb95a2d5cef4 100644
--- a/drivers/misc/cxl/guest.c
+++ b/drivers/misc/cxl/guest.c
@@ -965,10 +965,10 @@ int cxl_guest_init_afu(struct cxl *adapter, int slice, 
struct device_node *afu_n
 * if it returns an error!
 */
if ((rc = cxl_register_afu(afu)))
-   goto err_put1;
+   goto err_put_dev;
  
  	if ((rc = cxl_sysfs_afu_add(afu)))

-   goto err_put1;
+   goto err_del_dev;
  
  	/*

 * pHyp doesn't expose the programming models supported by the
@@ -984,7 +984,7 @@ int cxl_guest_init_afu(struct cxl *adapter, int slice, 
struct device_node *afu_n
afu->modes_supported = CXL_MODE_DIRECTED;
  
  	if ((rc = cxl_afu_select_best_mode(afu)))

-   goto err_put2;
+   goto err_remove_sysfs;
  
  	adapter->afu[afu->slice] = afu;
  
@@ -1004,10 +1004,12 @@ int cxl_guest_init_afu(struct cxl *adapter, int slice, struct device_node *afu_n
  
  	return 0;
  
-err_put2:

+err_remove_sysfs:
cxl_sysfs_afu_remove(afu);
-err_put1:
-   device_unregister(>dev);
+err_del_dev:
+   device_del(>dev);
+err_put_dev:
+   put_device(>dev);
free = false;
guest_release_serr_irq(afu);
  err2:
@@ -1141,18 +1143,20 @@ struct cxl *cxl_guest_init_adapter(struct device_node 
*np, struct platform_devic
 * even if it returns an error!
 */
if ((rc = cxl_register_adapter(adapter)))
-   goto err_put1;
+   goto err_put_dev;
  
  	if ((rc = cxl_sysfs_adapter_add(adapter)))

-   goto err_put1;
+   goto err_del_dev;
  
  	/* release the context lock as the adapter is configured */

cxl_adapter_context_unlock(adapter);
  
  	return adapter;
  
-err_put1:

-   device_unregister(>dev);
+err_del_dev:
+   device_del(>dev);
+err_put_dev:
+   put_device(>dev);
free = false;
cxl_guest_remove_chardev(adapter);
  err1:

Re: [patch 38/39] genirq/msi: Remove msi_domain_ops::msi_check()

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:55:15PM +0100, Thomas Gleixner wrote:
> No more users.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/msi.h |4 
>  kernel/irq/msi.c|   17 +
>  2 files changed, 1 insertion(+), 20 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 37/39] PCI/MSI: Remove redundant msi_check() callback

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:55:14PM +0100, Thomas Gleixner wrote:
> All these sanity checks are now done _before_ any allocation work
> happens. No point in doing it twice.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/irqdomain.c |   48 
> 
>  1 file changed, 48 deletions(-)

Reviewed-by: Jason Gunthorpe 

Much clearer this way

Jason

Re: [patch 36/39] PCI/MSI: Validate MSIX contiguous restriction early

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:55:12PM +0100, Thomas Gleixner wrote:
> With interrupt domains the sanity check for MSI-X vector validation can be
> done _before_ any allocation happens. The sanity check only applies to the
> allocation functions which have an 'entries' array argument. The entries
> array is filled by the caller with the requested MSI-X indicies. Some drivers
> have gaps in the index space which is not supported on all architectures.
> 
> The PCI/MSI irqdomain has a 'feature' bit to enforce this validation late
> during the allocation phase.
> 
> Just do it right away before doing any other work along with the other
> sanity checks on that array.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/msi.c |   11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 35/39] PCI/MSI: Reject MSI-X early

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:55:11PM +0100, Thomas Gleixner wrote:
> Similar to PCI multi-MSI reject MSI-X enablement when a irq domain is
> attached to the device which does not support MSI-X.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/msi.c |4 
>  1 file changed, 4 insertions(+)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 34/39] PCI/MSI: Reject multi-MSI early

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:55:09PM +0100, Thomas Gleixner wrote:
> When hierarchical MSI interrupt domains are enabled then there is no point
> to do tons of work and detect the missing support for multi-MSI late in the
> allocation path.
> 
> Just query the domain feature flags right away. The query function is going
> to be used for other purposes later and has a mode argument which influences
> the result:
> 
>   ALLOW_LEGACY returns true when:
>  - there is no irq domain attached (legacy support)
>  - there is a irq domain attached which has the feature flag set
> 
>   DENY_LEGACY returns only true when:
>  - there is a irq domain attached which has the feature flag set
> 
> This allows to use the function universally without ifdeffery in the
> calling code.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/irqdomain.c |   22 ++
>  drivers/pci/msi/msi.c   |4 
>  drivers/pci/msi/msi.h   |9 +
>  3 files changed, 35 insertions(+)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 33/39] PCI/MSI: Sanitize MSI-X checks

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:55:07PM +0100, Thomas Gleixner wrote:
> There is no point in doing the same sanity checks over and over in a loop
> during MSI-X enablement. Put them in front of the loop and return early
> when they fail.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/msi.c |   67 
> +-
>  1 file changed, 34 insertions(+), 33 deletions(-)

> + /* PCI_IRQ_VIRTUAL is a horrible hack! */

New comment too :)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 18/39] PCI/MSI: Move mask and unmask helpers to msi.h

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:43PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> The upcoming support for per device MSI interrupt domains needs to share
> some of the inline helpers with the MSI implementation.
> 
> Move them to the header file.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 
> 
> ---
>  drivers/pci/msi/msi.c | 61 +--
>  drivers/pci/msi/msi.h | 83 
> +---
>  2 files changed, 74 insertions(+), 70 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 17/39] PCI/MSI: Get rid of externs in msi.h

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:42PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> Follow the style of 
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/msi.h |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 16/39] genirq: Get rid of GENERIC_MSI_IRQ_DOMAIN

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:40PM +0100, Thomas Gleixner wrote:
> Adjust to reality and remove another layer of pointless Kconfig
> indirection. CONFIG_GENERIC_MSI_IRQ is good enough to serve
> all purposes.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/base/Makefile   |2 +-
>  drivers/bus/fsl-mc/Kconfig  |2 +-
>  drivers/dma/Kconfig |2 +-
>  drivers/dma/qcom/hidma.c|8 
>  drivers/iommu/Kconfig   |2 +-
>  drivers/irqchip/Kconfig |6 +++---
>  drivers/mailbox/Kconfig |2 +-
>  drivers/pci/Kconfig |1 -
>  drivers/perf/Kconfig|2 +-
>  drivers/soc/ti/Kconfig  |2 +-
>  include/asm-generic/msi.h   |4 ++--
>  include/linux/device.h  |8 +++-
>  include/linux/gpio/driver.h |2 +-
>  include/linux/msi.h |   10 ++
>  kernel/irq/Kconfig  |7 +--
>  kernel/irq/msi.c|3 ---
>  16 files changed, 23 insertions(+), 40 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 15/39] PCI/MSI: Get rid of PCI_MSI_IRQ_DOMAIN

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:38PM +0100, Thomas Gleixner wrote:
> What a zoo:
> 
>  PCI_MSI
>   select GENERIC_MSI_IRQ
> 
>  PCI_MSI_IRQ_DOMAIN
>   def_bool y
>   depends on PCI_MSI
>   select GENERIC_MSI_IRQ_DOMAIN
> 
> Ergo PCI_MSI enables PCI_MSI_IRQ_DOMAIN which in turn selects
> GENERIC_MSI_IRQ_DOMAIN. So all the dependencies on PCI_MSI_IRQ_DOMAIN are
> just an indirection to PCI_MSI.
> 
> Match the reality and just admit that PCI_MSI requires
> GENERIC_MSI_IRQ_DOMAIN.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  arch/um/drivers/Kconfig |1 
>  arch/um/include/asm/pci.h   |2 -
>  arch/x86/Kconfig|1 
>  arch/x86/include/asm/pci.h  |4 +-
>  drivers/pci/Kconfig |8 +
>  drivers/pci/controller/Kconfig  |   30 +---
>  drivers/pci/controller/dwc/Kconfig  |   48 
> 
>  drivers/pci/controller/mobiveil/Kconfig |6 ++--
>  drivers/pci/msi/Makefile|2 -
>  drivers/pci/probe.c |2 -
>  include/linux/msi.h |   32 ++---
>  11 files changed, 56 insertions(+), 80 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 14/39] PCI/MSI: Let the MSI core free descriptors

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:37PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> Let the core do the freeing of descriptors and just keep it around for the
> legacy case.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/irqdomain.c |   10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -24,11 +24,12 @@ void pci_msi_teardown_msi_irqs(struct pc
>   struct irq_domain *domain;
>  
>   domain = dev_get_msi_domain(>dev);
> - if (domain && irq_domain_is_hierarchy(domain))
> + if (domain && irq_domain_is_hierarchy(domain)) {
>   msi_domain_free_irqs_descs_locked(domain, >dev);
> - else
> + } else {
>   pci_msi_legacy_teardown_msi_irqs(dev);
> - msi_free_msi_descs(>dev);
> + msi_free_msi_descs(>dev);
> + }
>  }
>  
>  /**
> @@ -170,6 +171,9 @@ struct irq_domain *pci_msi_create_irq_do
>   if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
>   pci_msi_domain_update_chip_ops(info);
>  
> + /* Let the core code free MSI descriptors when freeing interrupts */
> + info->flags |= MSI_FLAG_FREE_MSI_DESCS;

Comment repeats the one on the enum declaration?

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 13/39] PCI/MSI: Use msi_domain_info::bus_token

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:35PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> Set the bus token in the msi_domain_info structure and let the core code
> handle the update.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/irqdomain.c |   11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -162,8 +162,6 @@ struct irq_domain *pci_msi_create_irq_do
>struct msi_domain_info *info,
>struct irq_domain *parent)
>  {
> - struct irq_domain *domain;
> -
>   if (WARN_ON(info->flags & MSI_FLAG_LEVEL_CAPABLE))
>   info->flags &= ~MSI_FLAG_LEVEL_CAPABLE;
>  
> @@ -178,13 +176,10 @@ struct irq_domain *pci_msi_create_irq_do
>  
>   /* PCI-MSI is oneshot-safe */
>   info->chip->flags |= IRQCHIP_ONESHOT_SAFE;
> + /* Let the core update the bus token */
> + info->bus_token = DOMAIN_BUS_PCI_MSI;

comment seems a bit obvious

Reviewed-by: Jason Gunthorpe 

Should the callers be updated to set this in their "struct
msi_domain_info" ?

Jason

Re: [patch 12/39] genirq/msi: Add bus token to struct msi_domain_info

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:33PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> Add a bus token member to struct msi_domain_info and let
> msi_create_irq_domain() set the bus token.
> 
> That allows to remove the bus token updates at the call sites.
> 
> Suggested-by: Thomas Gleixner 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/msi.h |   19 +++
>  kernel/irq/msi.c|7 +--
>  2 files changed, 16 insertions(+), 10 deletions(-)

Reviewed-by: Jason Gunthorpe 

>  struct msi_domain_info {
> - u32 flags;
> - struct msi_domain_ops   *ops;
> - struct irq_chip *chip;
> - void*chip_data;
> - irq_flow_handler_t  handler;
> - void*handler_data;
> - const char  *handler_name;
> - void*data;
> + u32 flags;
> + enum irq_domain_bus_token   bus_token;
> + struct msi_domain_ops   *ops;
> + struct irq_chip *chip;
> + void*chip_data;
> + irq_flow_handler_t  handler;
> + void*handler_data;
> + const char  *handler_name;
> + void*data;
>  };

This is why I've been frowning on horizontal alignment :(

Jason

Re: [patch 11/39] genirq/irqdomain: Move bus token enum into a seperate header

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:32PM +0100, Thomas Gleixner wrote:
> Split the bus token defines out into a seperate header file to avoid
> inclusion of irqdomain.h in msi.h.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/irqdomain.h  |   22 +-
>  include/linux/irqdomain_defs.h |   26 ++
>  2 files changed, 27 insertions(+), 21 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 10/39] genirq/msi: Make __msi_domain_free_irqs() static

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:30PM +0100, Thomas Gleixner wrote:
> Now that the last user is gone, confine it to the core code.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/msi.h |4 
>  kernel/irq/msi.c|3 ++-
>  2 files changed, 2 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 09/39] powerpc/pseries/msi: Use msi_domain_ops::msi_post_free()

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:28PM +0100, Thomas Gleixner wrote:
> Use the new msi_post_free() callback which is invoked after the interrupts
> have been freed to tell the hypervisor about the shutdown.
> 
> This allows to remove the exposure of __msi_domain_free_irqs().
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Michael Ellerman 
> Cc: Christophe Leroy 
> Cc: linuxppc-dev@lists.ozlabs.org
> ---
>  arch/powerpc/platforms/pseries/msi.c |7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 08/39] genirq/msi: Provide msi_domain_ops::post_free()

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:27PM +0100, Thomas Gleixner wrote:
> To prepare for removing the exposure of __msi_domain_free_irqs() provide a
> post_free() callback in the MSI domain ops which can be used to solve
> the problem of the only user of __msi_domain_free_irqs() in arch/powerpc.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/msi.h |4 
>  kernel/irq/msi.c|2 ++
>  2 files changed, 6 insertions(+)

Make sense, but I do wonder why PPC needs this ordering..

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 07/39] genirq/msi: Make __msi_domain_alloc_irqs() static

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:25PM +0100, Thomas Gleixner wrote:
> Nothing outside of the core code requires this.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/msi.h |7 ++-
>  kernel/irq/msi.c|6 --
>  2 files changed, 6 insertions(+), 7 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 06/39] genirq/msi: Add missing kernel doc to msi_next_desc()

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:23PM +0100, Thomas Gleixner wrote:
> W=1 complains about this.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  kernel/irq/msi.c |1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 05/39] genirq/msi: Remove filter from msi_free_descs_free_range()

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:22PM +0100, Thomas Gleixner wrote:
> When a range of descriptors is freed then all of them are not associated to
> a linux interrupt. Remove the filter and add a warning to the free function.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/base/platform-msi.c |2 +-
>  include/linux/msi.h |5 ++---
>  kernel/irq/msi.c|   19 ++-
>  3 files changed, 13 insertions(+), 13 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 04/39] genirq/msi: Use MSI_DESC_ALL in msi_add_simple_msi_descs()

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:20PM +0100, Thomas Gleixner wrote:
> There are no associated MSI descriptors in the requested range when the MSI
> descriptor allocation fails. Use MSI_DESC_ALL as the filter which prepares
> the next step to get rid of the filter for freeing.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  kernel/irq/msi.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 01/39] PCI/MSI: Check for MSI enabled in __pci_msix_enable()

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:54:15PM +0100, Thomas Gleixner wrote:
> PCI/MSI and PCI/MSI-X are mutually exclusive, but the MSI-X enable code
> lacks a check for already enabled MSI.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/msi.c |5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Jason Gunthorpe 

Jason

Re: [patch 30/39] PCI/MSI: Move pci_msi_restore_state() to api.c

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:55:03PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_msi_enabled() and add kernel-doc for the function.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 
> 
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index ee9ed5ccd94d..8d1cf6db9bd7 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -308,6 +308,21 @@ void pci_free_irq_vectors(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL(pci_free_irq_vectors);
>  
> +/**
> + * pci_restore_msi_state() - Restore cached MSI(-X) state on device
> + * @dev: the PCI device to operate on
> + *
> + * Write the Linux-cached MSI(-X) state back on device. This is
> + * typically useful upon system resume, or after an error-recovery PCI
> + * adapter reset.
> + */
> +void pci_restore_msi_state(struct pci_dev *dev)
> +{
> + __pci_restore_msi_state(dev);
> + __pci_restore_msix_state(dev);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_msi_state);

This leaves behind two functions that are only called from here.

I think they should be moved into this file as well.

Or perhaps more broadly, it would make sense if the functions
were split up into 'for irq chip implementors' 'for drivers' and
'shared utilities'.

There are several other examples, like
__pci_enable_msix_range()/__pci_enable_msi_range() that are only used
by api.c and could reasonably move as well, plus their tree of
single-use support functions.

And if you think about it that way - things have ended up so that
api.c is really all about vector allocation (as that is the only thing
the drivers actually do) so it would be tempting to call it
allocator.c and consolidate everything in that topic.

Jason

Re: [patch 02/39] iommu/vt-d: Remove bogus check for multi MSI-X

2022-11-16 Thread Ashok Raj

On Wed, Nov 16, 2022 at 06:02:30PM +0100, Thomas Gleixner wrote:
> On Wed, Nov 16 2022 at 07:52, Ashok Raj wrote:
> > On Fri, Nov 11, 2022 at 02:54:17PM +0100, Thomas Gleixner wrote:
> >> PCI/Multi-MSI is MSI specific and not supported for MSI-X.
> >> 
> >> Signed-off-by: Thomas Gleixner 
> >> ---
> >>  drivers/iommu/intel/irq_remapping.c |3 +--
> >>  1 file changed, 1 insertion(+), 2 deletions(-)
> >> 
> >> --- a/drivers/iommu/intel/irq_remapping.c
> >> +++ b/drivers/iommu/intel/irq_remapping.c
> >> @@ -1334,8 +1334,7 @@ static int intel_irq_remapping_alloc(str
> >>  
> >>if (!info || !iommu)
> >>return -EINVAL;
> >> -  if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
> >> -  info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
> >> +  if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI)
> >>return -EINVAL;
> >>  
> >>/*
> >> 
> >
> > This check is only making sure that when multi-msi is requested that the
> > type has to be either MSI/MSIX.
> 
> MSI-X does not support multi vector allocations on a single entry.
> 
> > Wouldn't this change return -EINVAL when type = MSIX?
> 
> Rightfully so. MSIX vectors are allocated one by one. Has been that way
> forever.
> 

I thought why block multi-vector allocation on MSIX, but if there is no
use case makes perfect sense.

Thanks for the clarification.

Reviewed-by: Ashok Raj

Re: [patch 31/39] Documentation: PCI: Add reference to PCI/MSI device driver APIs

2022-11-16 Thread Jason Gunthorpe

On Fri, Nov 11, 2022 at 02:55:04PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> All exported device-driver MSI APIs are now grouped in one place at
> drivers/pci/msi/api.c with comprehensive kernel-docs added.
> 
> Reference these kernel-docs in the official PCI/MSI howto.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 
> 
> ---
>  Documentation/PCI/msi-howto.rst |   10 ++
>  1 file changed, 10 insertions(+)

I was wondering what the payoff was for moving everything to api.c,
this seems pretty good.

Though, in some regards it might be cleaner to teach ":export:" about
how to process symbol namespaces and put all the architecture facing
exports in some PCI_MSI_ARCH/IRQCHIP namespace which could achieve the
same effect for kdoc as moving all the code around and have the bonus
of discouraging people from mis-using the APIs inside inappropriate
drivers.

But, I like the idea, and the outcome is great so

Reviewed-by: Jason Gunthorpe 

Jason

Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc

2022-11-16 Thread Christophe Leroy



Le 16/11/2022 à 18:01, Hari Bathini a écrit :
>>
>> I also managed to test it on QEMU. The config is based on 
>> pmac32_defconfig.
> 
> I had the same config but hit this problem:
> 
>  # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>  test_bpf: #0 TAX
>  [ cut here ]
>  WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 
> bpf_int_jit_compile+0x8a0/0x9f8

Ok, I'll give it a try with test_bpf module.

Christophe

Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

2022-11-16 Thread Sean Christopherson

On Wed, Nov 16, 2022, Huang, Kai wrote:
> On Tue, 2022-11-15 at 20:16 +, Sean Christopherson wrote:
> > On Thu, Nov 10, 2022, Huang, Kai wrote:
> > > On Thu, 2022-11-10 at 01:33 +, Huang, Kai wrote:
> > > Hmm.. I wasn't thinking thoroughly.  I forgot CPU compatibility check also
> > > happens on all online cpus when loading KVM.  For this case, IRQ is 
> > > disabled and
> > > cpu_active() is true.  For the hotplug case, IRQ is enabled but  
> > > cpu_active() is
> > > false.
> > 
> > Actually, you're right (and wrong).  You're right in that the WARN is 
> > flawed.  And
> > the reason for that is because you're wrong about the hotplug case.  In 
> > this version
> > of things, the compatibility checks are routed through hardware enabling, 
> > i.e. this
> > flow is used only when loading KVM.  This helper should only be called via 
> > SMP function
> > call, which means that IRQs should always be disabled.
> 
> Did you mean below code change in later patch "[PATCH 39/44] KVM: Drop
> kvm_count_lock and instead protect kvm_usage_count with kvm_lock"?
> 
>   /*
>* Abort the CPU online process if hardware virtualization cannot
>* be enabled. Otherwise running VMs would encounter unrecoverable
> @@ -5039,13 +5039,16 @@ static int kvm_online_cpu(unsigned int cpu)
>   if (kvm_usage_count) {
>   WARN_ON_ONCE(atomic_read(_enable_failed));
>  
> + local_irq_save(flags);
>   hardware_enable_nolock(NULL);
> + local_irq_restore(flags);

Sort of.  What I was saying is that in this v1, the compatibility checks that 
are
done during harware enabling are initiated from vendor code, i.e. VMX and SVM 
call
{svm,vmx}_check_processor_compat() directly.  As a result, the compat checks 
that
are handled in common code:

if (__cr4_reserved_bits(cpu_has, c) !=
__cr4_reserved_bits(cpu_has, _cpu_data))
return -EIO;

are skipped.  And if that's fixed, then the above hardware_enable_nolock() call
will bounce through kvm_x86_check_processor_compatibility() with IRQs enabled
once the KVM hotplug hook is moved to the ONLINE section.

As above, the simple "fix" would be to disable IRQs, but that's not actually
necessary.  The only requirement is that preemption is disabled so that the 
checks
are done on the current CPU.  The "IRQs disabled" check was a deliberately
agressive WARN that was added to guard against doing compatibility checks from
the "wrong" location.

E.g. this is what I ended up with for a changelog to drop the irqs_disabled()
check and for the end code (though it's not tested yet...)

Drop kvm_x86_check_processor_compatibility()'s WARN that IRQs are
disabled, as the ONLINE section runs with IRQs disabled.  The WARN wasn't
intended to be a requirement, e.g. disabling preemption is sufficient,
the IRQ thing was purely an aggressive sanity check since the helper was
only ever invoked via SMP function call.

static int kvm_x86_check_processor_compatibility(void)
{
int cpu = smp_processor_id();
struct cpuinfo_x86 *c = _data(cpu);

/*
 * Compatibility checks are done when loading KVM and when enabling
 * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
 * compatible, i.e. KVM should never perform a compatibility check on
 * an offline CPU.
 */
WARN_ON(!cpu_online(cpu));

if (__cr4_reserved_bits(cpu_has, c) !=
__cr4_reserved_bits(cpu_has, _cpu_data))
return -EIO;

return static_call(kvm_x86_check_processor_compatibility)();
}

int kvm_arch_hardware_enable(void)
{
struct kvm *kvm;
struct kvm_vcpu *vcpu;
unsigned long i;
int ret;
u64 local_tsc;
u64 max_tsc = 0;
bool stable, backwards_tsc = false;

kvm_user_return_msr_cpu_online();

ret = kvm_x86_check_processor_compatibility();
if (ret)
return ret;

ret = static_call(kvm_x86_hardware_enable)();
if (ret != 0)
return ret;

}

Re: [patch 32/39] PCI/MSI: Reorder functions in msi.c

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 10:28, Bjorn Helgaas wrote:

> On Fri, Nov 11, 2022 at 02:55:06PM +0100, Thomas Gleixner wrote:
>> From: Ahmed S. Darwish 
>> 
>> There is no way to navigate msi.c without banging the head against the wall
>> every now and then because MSI and MSI-X specific functions are
>> intermingled and the code flow is completely non-obvious.
>> 
>> Reorder everthing so common helpers, MSI and MSI-X specific functions are
>> grouped together.
>
> s/everthing/everything/
>
>> Suggested-by: Thomas Gleixner 
>> Signed-off-by: Ahmed S. Darwish 
>> Signed-off-by: Thomas Gleixner 
>
> Acked-by: Bjorn Helgaas 
>
> I assume this is pure code movement, so I didn't even look at the
> text below.

It is.

Re: [patch 27/39] PCI/MSI: Move pci_disable_msix() to api.c

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 10:26, Bjorn Helgaas wrote:
>>  /**
>> + * pci_disable_msix() - Disable MSI-X interrupt mode on device
>> + * @dev: the PCI device to operate on
>> + *
>> + * Legacy device driver API to disable MSI-X interrupt mode on device,
>> + * free earlier-allocated interrupt vectors, and restore INTx emulation.
>
> Isn't INTx *emulation* a PCIe implementation detail?  Doesn't seem
> relevant to callers that it's emulated.

Don't know how the emulation ended up there. It restores INTx which is
obvioulsy not a wire on PCIe... But yes, it's uninteresting.

Re: [patch 23/39] PCI/MSI: Move pci_alloc_irq_vectors_affinity() to api.c

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 10:23, Bjorn Helgaas wrote:
> On Fri, Nov 11, 2022 at 02:54:51PM +0100, Thomas Gleixner wrote:
>> + * Same as pci_alloc_irq_vectors(), but with the extra @affd parameter.
>> + * Check that function docs, and  irq_affinity, for more details.
>
> Is " irq_affinity" some kernel-doc syntax, or is the "&"
> superfluous?

The latter.

Re: [patch 22/39] PCI/MSI: Move pci_alloc_irq_vectors() to api.c

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 10:22, Bjorn Helgaas wrote:
>> + * Allocate up to @max_vecs interrupt vectors on device. MSI-X irq
>
> s/irq/IRQ/
>
>> + * vector allocation has a higher precedence over plain MSI, which has a
>> + * higher precedence over legacy INTx emulation.
>> + *
>> + * Upon a successful allocation, the caller should use pci_irq_vector()
>> + * to get the Linux IRQ number to be passed to request_threaded_irq().
>> + * The driver must call pci_free_irq_vectors() on cleanup.
>> + *
>> + * Return: number of allocated vectors (which might be smaller than
>> + * @max_vecs), -ENOSPC if less than @min_vecs interrupt vectors are
>
> s/less/fewer/ (also in some previous patches, IIRC)

Will fix.

Re: [patch 20/39] PCI/MSI: Move pci_enable_msi() API to api.c

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 10:18, Bjorn Helgaas wrote:
> On Fri, Nov 11, 2022 at 02:54:46PM +0100, Thomas Gleixner wrote:
>> From: Ahmed S. Darwish 
>> 
>> To distangle the maze in msi.c all exported device-driver MSI APIs are now
>> to be grouped in one file, api.c.
>> 
>> Move pci_enable_msi() and make its kernel-doc comprehensive.
>> 
>> Signed-off-by: Ahmed S. Darwish 
>> Signed-off-by: Thomas Gleixner 
>
> Acked-by: Bjorn Helgaas 
>
> Nit: suggest "disentangle" or "untangle" for "distangle" here and in
> subsequent patches.

My fault. I suggested the word to Ahmed well knowing that this is one of
the words I never get spelled correctly :)

Re: [patch 15/39] PCI/MSI: Get rid of PCI_MSI_IRQ_DOMAIN

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 10:12, Bjorn Helgaas wrote:
> On Fri, Nov 11, 2022 at 02:54:38PM +0100, Thomas Gleixner wrote:
>> What a zoo:
>> 
>>  PCI_MSI
>>  select GENERIC_MSI_IRQ
>> 
>>  PCI_MSI_IRQ_DOMAIN
>>  def_bool y
>>  depends on PCI_MSI
>>  select GENERIC_MSI_IRQ_DOMAIN
>> 
>> Ergo PCI_MSI enables PCI_MSI_IRQ_DOMAIN which in turn selects
>> GENERIC_MSI_IRQ_DOMAIN. So all the dependencies on PCI_MSI_IRQ_DOMAIN are
>> just an indirection to PCI_MSI.
>> 
>> Match the reality and just admit that PCI_MSI requires
>> GENERIC_MSI_IRQ_DOMAIN.
>> 
>> Signed-off-by: Thomas Gleixner 
>
> Acked-by: Bjorn Helgaas 
>
> Just FYI, this will conflict with my work-in-progress to add more
> COMPILE_TEST coverage:
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?id=72b5e7c401a1
>
> No *actual* conflicts, just textually next door, so should be sipmle
> to resolve.  Worst case I can postpone my patch until the next cycle.

Linus should be able to resolve that conflict :)

Re: [patch 11/39] genirq/irqdomain: Move bus token enum into a seperate header

2022-11-16 Thread Ashok Raj

On Fri, Nov 11, 2022 at 02:54:32PM +0100, Thomas Gleixner wrote:
> Split the bus token defines out into a seperate header file to avoid
> inclusion of irqdomain.h in msi.h.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/irqdomain.h  |   22 +-
>  include/linux/irqdomain_defs.h |   26 ++
>  2 files changed, 27 insertions(+), 21 deletions(-)
> 


for Patches 5-11:

Reviewed-by: Ashok Raj

Re: [patch 03/39] iommu/amd: Remove bogus check for multi MSI-X

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 08:02, Ashok Raj wrote:
> On Fri, Nov 11, 2022 at 02:54:19PM +0100, Thomas Gleixner wrote:
>> PCI/Multi-MSI is MSI specific and not supported for MSI-X
>> 
>> Signed-off-by: Thomas Gleixner 
>> ---
>>  drivers/iommu/amd/iommu.c |3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -3294,8 +3294,7 @@ static int irq_remapping_alloc(struct ir
>>  
>>  if (!info)
>>  return -EINVAL;
>> -if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
>> -info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
>> +if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI)
>>  return -EINVAL;
>>  
>>  /*
>> 
>
> nit:
>
> maybe better to merge patch2/3 since both seem related?

What's better about it? It's two different drivers.

Re: [patch 02/39] iommu/vt-d: Remove bogus check for multi MSI-X

2022-11-16 Thread Thomas Gleixner

On Wed, Nov 16 2022 at 07:52, Ashok Raj wrote:
> On Fri, Nov 11, 2022 at 02:54:17PM +0100, Thomas Gleixner wrote:
>> PCI/Multi-MSI is MSI specific and not supported for MSI-X.
>> 
>> Signed-off-by: Thomas Gleixner 
>> ---
>>  drivers/iommu/intel/irq_remapping.c |3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> --- a/drivers/iommu/intel/irq_remapping.c
>> +++ b/drivers/iommu/intel/irq_remapping.c
>> @@ -1334,8 +1334,7 @@ static int intel_irq_remapping_alloc(str
>>  
>>  if (!info || !iommu)
>>  return -EINVAL;
>> -if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
>> -info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
>> +if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI)
>>  return -EINVAL;
>>  
>>  /*
>> 
>
> This check is only making sure that when multi-msi is requested that the
> type has to be either MSI/MSIX.

MSI-X does not support multi vector allocations on a single entry.

> Wouldn't this change return -EINVAL when type = MSIX?

Rightfully so. MSIX vectors are allocated one by one. Has been that way
forever.

Thanks,

tglx

Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc

2022-11-16 Thread Hari Bathini





On 16/11/22 12:14 am, Christophe Leroy wrote:



Le 14/11/2022 à 18:27, Christophe Leroy a écrit :



Le 14/11/2022 à 15:47, Hari Bathini a écrit :

Hi Christophe,

On 11/11/22 4:55 pm, Christophe Leroy wrote:

Le 10/11/2022 à 19:43, Hari Bathini a écrit :

Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support on powerpc.

Patches 1 & 2 add the arch specific functions needed to support this
feature. Patch 3 enables the support for powerpc. The last patch
ensures cleanup is handled racefully.




Tested the changes successfully on a PowerVM. patch_instruction(),
needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
Posting the patches in the meanwhile for feedback on these changes.


I did a quick test on ppc32, I don't get such a problem, only something
wrong in the dump print as traps intructions only are dumped, but
tcpdump works as expected:


Thanks for the quick test. Could you please share the config you used.
I am probably missing a few knobs in my conifg...





I also managed to test it on QEMU. The config is based on pmac32_defconfig.


I had the same config but hit this problem:

# echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
test_bpf: #0 TAX
[ cut here ]
	WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 
bpf_int_jit_compile+0x8a0/0x9f8

Modules linked in: test_bpf(+)
CPU: 0 PID: 96 Comm: modprobe Not tainted 6.1.0-rc5+ #116
Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
NIP:  c0038224 LR: c0037f74 CTR: 0009
REGS: f1b41b10 TRAP: 0700   Not tainted  (6.1.0-rc5+)
MSR:  00029032   CR: 82004882  XER: 2000

	GPR00: c0037f74 f1b41bd0 c12333c0 ffea c19747e0 c0123a08 c1974a00 
c12333c0
	GPR08:  0b58 9032 0003 82004882 100d816a f266 

	GPR16:  c0c1  c0c1 c1913780  001a 
100a2ee3
	GPR24: 014c be857000 be857020 f1b41c00 c194c180 f1b46000 ffea 
f1b46000

NIP [c0038224] bpf_int_jit_compile+0x8a0/0x9f8
LR [c0037f74] bpf_int_jit_compile+0x5f0/0x9f8
Call Trace:
[f1b41bd0] [c0037f74] bpf_int_jit_compile+0x5f0/0x9f8 (unreliable)
[f1b41ca0] [c0123790] bpf_prog_select_runtime+0x178/0x1c4
[f1b41cd0] [c06cc4b0] bpf_prepare_filter+0x524/0x624
[f1b41d20] [c06cc63c] bpf_prog_create+0x8c/0xd0
[f1b41d40] [be85083c] test_bpf_init+0x46c/0x11b0 [test_bpf]
[f1b41df0] [c0008534] do_one_initcall+0x58/0x1b8
[f1b41e60] [c00b6c38] do_init_module+0x54/0x1e4
[f1b41e80] [c00b8f80] sys_init_module+0x10c/0x174
[f1b41f10] [c0014390] system_call_exception+0x94/0x144
[f1b41f30] [c001a1ac] ret_from_syscall+0x0/0x2c
--- interrupt: c00 at 0xfef48ac
NIP:  0fef48ac LR: 10015de0 CTR: 
REGS: f1b41f40 TRAP: 0c00   Not tainted  (6.1.0-rc5+)
MSR:  d032   CR: 88002224  XER: 2000

	GPR00: 0080 aff34990 a7a8d380 a75c0010 00477e90 1051b9a0 0fedfdd8 
d032
	GPR08:  0008 00478ffa 400f26fa 400f23c7 100d816a  

	GPR16:  1051b950  1051b92c 100d 100a2eab 100a2e97 
100a2ee3
	GPR24: 100a2ed5 1051b980 0001 100d01a8 1051b920 a75c0010 1051b9a0 
a7a86388

NIP [0fef48ac] 0xfef48ac
LR [10015de0] 0x10015de0
--- interrupt: c00
Instruction dump:
8081001c 38a4 7f23cb78 4bfff5d1 7f23cb78 8081001c 480eba85 82410098
8261009c 82a100a4 82e100ac 4bfffce8 <0fe0> 92010090 92410098 
92e100ac
---[ end trace  ]---
jited:1
	kernel tried to execute exec-protected page (be857020) - exploit 
attempt? (uid: 0)

BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xbe857020
Vector: 400 (Instruction Access) at [f1b41c30]
pc: be857020
lr: be84eaa4: __run_one+0xec/0x248 [test_bpf]
sp: f1b41cf0
   msr: 40009032
  current = 0xc12333c0
pid   = 96, comm = modprobe
	Linux version 6.1.0-rc5+ (root@hbathini-Standard-PC-Q35-ICH9-2009) 
(powerpc-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, GNU ld 
(GNU Binutils for Ubuntu) 2.38) #116 Wed Nov 16 20:48:10 IST 2022

enter ? for help
[link register   ] be84eaa4 __run_one+0xec/0x248 [test_bpf]
[f1b41cf0] be84e9dc __run_one+0x24/0x248 [test_bpf] (unreliable)
[f1b41d40] be850c78 test_bpf_init+0x8a8/0x11b0 [test_bpf]
[f1b41df0] c0008534 do_one_initcall+0x58/0x1b8

Re: [patch 01/39] PCI/MSI: Check for MSI enabled in __pci_msix_enable()

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:15PM +0100, Thomas Gleixner wrote:
> PCI/MSI and PCI/MSI-X are mutually exclusive, but the MSI-X enable code
> lacks a check for already enabled MSI.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/msi.c |5 +
>  1 file changed, 5 insertions(+)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -935,6 +935,11 @@ static int __pci_enable_msix_range(struc
>   if (maxvec < minvec)
>   return -ERANGE;
>  
> + if (dev->msi_enabled) {
> + pci_info(dev, "can't enable MSI-X (MSI already enabled)\n");
> + return -EINVAL;
> + }
> +
>   if (WARN_ON_ONCE(dev->msix_enabled))
>   return -EINVAL;
>  
>

Re: [patch 28/39] PCI/MSI: Move pci_irq_get_affinity() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:59PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_irq_get_affinity() and let its kernel-doc match rest of the
> file.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

One nit below.

> ---
>  drivers/pci/msi/api.c | 43 +++
>  drivers/pci/msi/msi.c | 38 --
>  2 files changed, 43 insertions(+), 38 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 653a61868ae6..473df7ba0584 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -9,6 +9,7 @@
>   */
>  
>  #include 
> +#include 
>  
>  #include "msi.h"
>  
> @@ -251,6 +252,48 @@ int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
>  EXPORT_SYMBOL(pci_irq_vector);
>  
>  /**
> + * pci_irq_get_affinity() - Get a device interrupt vector affinity
> + * @dev: the PCI device to operate on
> + * @nr:  device-relative interrupt vector index (0-based); has different
> + *   meanings, depending on interrupt mode
> + * MSI-Xthe index in the MSI-X vector table
> + * MSI  the index of the enabled MSI vectors
> + * INTx must be 0
> + *
> + * Return: MSI/MSI-X vector affinity, NULL if @nr is out of range or if
> + * the MSI(-X) vector was allocated without explicit affinity
> + * requirements (e.g., by pci_enable_msi(), pci_enable_msix_range(), or
> + * pci_alloc_irq_vectors() without the %PCI_IRQ_AFFINITY flag). Return a
> + * generic set of CPU ids representing all possible CPUs available
> + * during system boot if the device is in legacy INTx mode.

s/ids/IDs/

> + */
> +const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
> +{
> + int idx, irq = pci_irq_vector(dev, nr);
> + struct msi_desc *desc;
> +
> + if (WARN_ON_ONCE(irq <= 0))
> + return NULL;
> +
> + desc = irq_get_msi_desc(irq);
> + /* Non-MSI does not have the information handy */
> + if (!desc)
> + return cpu_possible_mask;
> +
> + /* MSI[X] interrupts can be allocated without affinity descriptor */
> + if (!desc->affinity)
> + return NULL;
> +
> + /*
> +  * MSI has a mask array in the descriptor.
> +  * MSI-X has a single mask.
> +  */
> + idx = dev->msi_enabled ? nr : 0;
> + return >affinity[idx].mask;
> +}
> +EXPORT_SYMBOL(pci_irq_get_affinity);
> +
> +/**
>   * pci_free_irq_vectors() - Free previously allocated IRQs for a device
>   * @dev: the PCI device to operate on
>   *
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 6fa90d07d2e4..d78646d1c116 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -854,44 +854,6 @@ int __pci_enable_msix_range(struct pci_dev *dev,
>   }
>  }
>  
> -/**
> - * pci_irq_get_affinity - return the affinity of a particular MSI vector
> - * @dev: PCI device to operate on
> - * @nr:  device-relative interrupt vector index (0-based).
> - *
> - * @nr has the following meanings depending on the interrupt mode:
> - *   MSI-X:  The index in the MSI-X vector table
> - *   MSI:The index of the enabled MSI vectors
> - *   INTx:   Must be 0
> - *
> - * Return: A cpumask pointer or NULL if @nr is out of range
> - */
> -const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
> -{
> - int idx, irq = pci_irq_vector(dev, nr);
> - struct msi_desc *desc;
> -
> - if (WARN_ON_ONCE(irq <= 0))
> - return NULL;
> -
> - desc = irq_get_msi_desc(irq);
> - /* Non-MSI does not have the information handy */
> - if (!desc)
> - return cpu_possible_mask;
> -
> - /* MSI[X] interrupts can be allocated without affinity descriptor */
> - if (!desc->affinity)
> - return NULL;
> -
> - /*
> -  * MSI has a mask array in the descriptor.
> -  * MSI-X has a single mask.
> -  */
> - idx = dev->msi_enabled ? nr : 0;
> - return >affinity[idx].mask;
> -}
> -EXPORT_SYMBOL(pci_irq_get_affinity);
> -
>  struct pci_dev *msi_desc_to_pci_dev(struct msi_desc *desc)
>  {
>   return to_pci_dev(desc->dev);
>

Re: [patch 37/39] PCI/MSI: Remove redundant msi_check() callback

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:55:14PM +0100, Thomas Gleixner wrote:
> All these sanity checks are now done _before_ any allocation work
> happens. No point in doing it twice.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/irqdomain.c |   48 
> 
>  1 file changed, 48 deletions(-)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -64,51 +64,6 @@ static irq_hw_number_t pci_msi_domain_ca
>   (pci_domain_nr(dev->bus) & 0x) << 27;
>  }
>  
> -static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc)
> -{
> - return !desc->pci.msi_attrib.is_msix && desc->nvec_used > 1;
> -}
> -
> -/**
> - * pci_msi_domain_check_cap - Verify that @domain supports the capabilities
> - * for @dev
> - * @domain:  The interrupt domain to check
> - * @info:The domain info for verification
> - * @dev: The device to check
> - *
> - * Returns:
> - *  0 if the functionality is supported
> - *  1 if Multi MSI is requested, but the domain does not support it
> - *  -ENOTSUPP otherwise
> - */
> -static int pci_msi_domain_check_cap(struct irq_domain *domain,
> - struct msi_domain_info *info,
> - struct device *dev)
> -{
> - struct msi_desc *desc = msi_first_desc(dev, MSI_DESC_ALL);
> -
> - /* Special handling to support __pci_enable_msi_range() */
> - if (pci_msi_desc_is_multi_msi(desc) &&
> - !(info->flags & MSI_FLAG_MULTI_PCI_MSI))
> - return 1;
> -
> - if (desc->pci.msi_attrib.is_msix) {
> - if (!(info->flags & MSI_FLAG_PCI_MSIX))
> - return -ENOTSUPP;
> -
> - if (info->flags & MSI_FLAG_MSIX_CONTIGUOUS) {
> - unsigned int idx = 0;
> -
> - /* Check for gaps in the entry indices */
> - msi_for_each_desc(desc, dev, MSI_DESC_ALL) {
> - if (desc->msi_index != idx++)
> - return -ENOTSUPP;
> - }
> - }
> - }
> - return 0;
> -}
> -
>  static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
>   struct msi_desc *desc)
>  {
> @@ -118,7 +73,6 @@ static void pci_msi_domain_set_desc(msi_
>  
>  static struct msi_domain_ops pci_msi_domain_ops_default = {
>   .set_desc   = pci_msi_domain_set_desc,
> - .msi_check  = pci_msi_domain_check_cap,
>  };
>  
>  static void pci_msi_domain_update_dom_ops(struct msi_domain_info *info)
> @@ -130,8 +84,6 @@ static void pci_msi_domain_update_dom_op
>   } else {
>   if (ops->set_desc == NULL)
>   ops->set_desc = pci_msi_domain_set_desc;
> - if (ops->msi_check == NULL)
> - ops->msi_check = pci_msi_domain_check_cap;
>   }
>  }
>  
>

Re: [patch 36/39] PCI/MSI: Validate MSIX contiguous restriction early

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:55:12PM +0100, Thomas Gleixner wrote:
> With interrupt domains the sanity check for MSI-X vector validation can be
> done _before_ any allocation happens. The sanity check only applies to the
> allocation functions which have an 'entries' array argument. The entries
> array is filled by the caller with the requested MSI-X indicies. Some drivers
> have gaps in the index space which is not supported on all architectures.
> 
> The PCI/MSI irqdomain has a 'feature' bit to enforce this validation late
> during the allocation phase.
> 
> Just do it right away before doing any other work along with the other
> sanity checks on that array.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

s/indicies/indices/ (commit log)
s/irqdomain/irq domain/?  IIRC previous logs used "irq domain"
s/MSIX/MSI-X/ (subject line)

> ---
>  drivers/pci/msi/msi.c |   11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -725,13 +725,17 @@ static int msix_capability_init(struct p
>   return ret;
>  }
>  
> -static bool pci_msix_validate_entries(struct msix_entry *entries, int nvec, 
> int hwsize)
> +static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry 
> *entries,
> +   int nvec, int hwsize)
>  {
> + bool nogap;
>   int i, j;
>  
>   if (!entries)
>   return true;
>  
> + nogap = pci_msi_domain_supports(dev, MSI_FLAG_MSIX_CONTIGUOUS, 
> DENY_LEGACY);
> +
>   for (i = 0; i < nvec; i++) {
>   /* Entry within hardware limit? */
>   if (entries[i].entry >= hwsize)
> @@ -742,6 +746,9 @@ static bool pci_msix_validate_entries(st
>   if (entries[i].entry == entries[j].entry)
>   return false;
>   }
> + /* Check for unsupported gaps */
> + if (nogap && entries[i].entry != i)
> + return false;
>   }
>   return true;
>  }
> @@ -773,7 +780,7 @@ int __pci_enable_msix_range(struct pci_d
>   if (hwsize < 0)
>   return hwsize;
>  
> - if (!pci_msix_validate_entries(entries, nvec, hwsize))
> + if (!pci_msix_validate_entries(dev, entries, nvec, hwsize))
>   return -EINVAL;
>  
>   /* PCI_IRQ_VIRTUAL is a horrible hack! */
>

Re: [patch 35/39] PCI/MSI: Reject MSI-X early

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:55:11PM +0100, Thomas Gleixner wrote:
> Similar to PCI multi-MSI reject MSI-X enablement when a irq domain is
> attached to the device which does not support MSI-X.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/msi.c |4 
>  1 file changed, 4 insertions(+)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -760,6 +760,10 @@ int __pci_enable_msix_range(struct pci_d
>   if (WARN_ON_ONCE(dev->msix_enabled))
>   return -EINVAL;
>  
> + /* Check MSI-X early on irq domain enabled architectures */
> + if (!pci_msi_domain_supports(dev, MSI_FLAG_PCI_MSIX, ALLOW_LEGACY))
> + return -ENOTSUPP;
> +
>   if (!pci_msi_supported(dev, nvec) || dev->current_state != PCI_D0)
>   return -EINVAL;
>  
>

Re: [patch 34/39] PCI/MSI: Reject multi-MSI early

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:55:09PM +0100, Thomas Gleixner wrote:
> When hierarchical MSI interrupt domains are enabled then there is no point
> to do tons of work and detect the missing support for multi-MSI late in the
> allocation path.
> 
> Just query the domain feature flags right away. The query function is going
> to be used for other purposes later and has a mode argument which influences
> the result:
> 
>   ALLOW_LEGACY returns true when:
>  - there is no irq domain attached (legacy support)
>  - there is a irq domain attached which has the feature flag set
> 
>   DENY_LEGACY returns only true when:
>  - there is a irq domain attached which has the feature flag set
> 
> This allows to use the function universally without ifdeffery in the
> calling code.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/irqdomain.c |   22 ++
>  drivers/pci/msi/msi.c   |4 
>  drivers/pci/msi/msi.h   |9 +
>  3 files changed, 35 insertions(+)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -187,6 +187,28 @@ struct irq_domain *pci_msi_create_irq_do
>  }
>  EXPORT_SYMBOL_GPL(pci_msi_create_irq_domain);
>  
> +/**
> + * pci_msi_domain_supports - Check for support of a particular feature flag
> + * @pdev:The PCI device to operate on
> + * @feature_mask:The feature mask to check for (full match)
> + * @mode:If ALLOW_LEGACY this grants the feature when there is 
> no irq domain
> + *   associated to the device. If DENY_LEGACY the lack of an 
> irq domain
> + *   makes the feature unsupported

Looks like some of these might be wider than 80 columns, which I think
was the typical width of this file.

> + */
> +bool pci_msi_domain_supports(struct pci_dev *pdev, unsigned int feature_mask,
> +  enum support_mode mode)
> +{
> + struct msi_domain_info *info;
> + struct irq_domain *domain;
> +
> + domain = dev_get_msi_domain(>dev);
> +
> + if (!domain || !irq_domain_is_hierarchy(domain))
> + return mode == ALLOW_LEGACY;
> + info = domain->host_data;
> + return (info->flags & feature_mask) == feature_mask;
> +}
> +
>  /*
>   * Users of the generic MSI infrastructure expect a device to have a single 
> ID,
>   * so with DMA aliases we have to pick the least-worst compromise. Devices 
> with
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -347,6 +347,10 @@ static int msi_capability_init(struct pc
>   struct msi_desc *entry;
>   int ret;
>  
> + /* Reject multi-MSI early on irq domain enabled architectures */
> + if (nvec > 1 && !pci_msi_domain_supports(dev, MSI_FLAG_MULTI_PCI_MSI, 
> ALLOW_LEGACY))
> + return 1;
> +
>   /*
>* Disable MSI during setup in the hardware, but mark it enabled
>* so that setup code can evaluate it.
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -97,6 +97,15 @@ int __pci_enable_msix_range(struct pci_d
>  void __pci_restore_msi_state(struct pci_dev *dev);
>  void __pci_restore_msix_state(struct pci_dev *dev);
>  
> +/* irq_domain related functionality */
> +
> +enum support_mode {
> + ALLOW_LEGACY,
> + DENY_LEGACY,
> +};
> +
> +bool pci_msi_domain_supports(struct pci_dev *dev, unsigned int feature_mask, 
> enum support_mode mode);
> +
>  /* Legacy (!IRQDOMAIN) fallbacks */
>  
>  #ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
>

Re: [patch 33/39] PCI/MSI: Sanitize MSI-X checks

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:55:07PM +0100, Thomas Gleixner wrote:
> There is no point in doing the same sanity checks over and over in a loop
> during MSI-X enablement. Put them in front of the loop and return early
> when they fail.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/msi.c |   67 
> +-
>  1 file changed, 34 insertions(+), 33 deletions(-)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -721,47 +721,31 @@ static int msix_capability_init(struct p
>   return ret;
>  }
>  
> -static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
> -  int nvec, struct irq_affinity *affd, int flags)
> +static bool pci_msix_validate_entries(struct msix_entry *entries, int nvec, 
> int hwsize)
>  {
> - int nr_entries;
>   int i, j;
>  
> - if (!pci_msi_supported(dev, nvec) || dev->current_state != PCI_D0)
> - return -EINVAL;
> + if (!entries)
> + return true;
>  
> - nr_entries = pci_msix_vec_count(dev);
> - if (nr_entries < 0)
> - return nr_entries;
> - if (nvec > nr_entries && !(flags & PCI_IRQ_VIRTUAL))
> - return nr_entries;
> -
> - if (entries) {
> - /* Check for any invalid entries */
> - for (i = 0; i < nvec; i++) {
> - if (entries[i].entry >= nr_entries)
> - return -EINVAL; /* invalid entry */
> - for (j = i + 1; j < nvec; j++) {
> - if (entries[i].entry == entries[j].entry)
> - return -EINVAL; /* duplicate entry */
> - }
> + for (i = 0; i < nvec; i++) {
> + /* Entry within hardware limit? */
> + if (entries[i].entry >= hwsize)
> + return false;
> +
> + /* Check for duplicate entries */
> + for (j = i + 1; j < nvec; j++) {
> + if (entries[i].entry == entries[j].entry)
> + return false;
>   }
>   }
> -
> - /* Check whether driver already requested for MSI IRQ */
> - if (dev->msi_enabled) {
> - pci_info(dev, "can't enable MSI-X (MSI IRQ already 
> assigned)\n");
> - return -EINVAL;
> - }
> - return msix_capability_init(dev, entries, nvec, affd);
> + return true;
>  }
>  
> -int __pci_enable_msix_range(struct pci_dev *dev,
> - struct msix_entry *entries, int minvec,
> - int maxvec, struct irq_affinity *affd,
> - int flags)
> +int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, 
> int minvec,
> + int maxvec, struct irq_affinity *affd, int flags)
>  {
> - int rc, nvec = maxvec;
> + int hwsize, rc, nvec = maxvec;
>  
>   if (maxvec < minvec)
>   return -ERANGE;
> @@ -774,6 +758,23 @@ int __pci_enable_msix_range(struct pci_d
>   if (WARN_ON_ONCE(dev->msix_enabled))
>   return -EINVAL;
>  
> + if (!pci_msi_supported(dev, nvec) || dev->current_state != PCI_D0)
> + return -EINVAL;
> +
> + hwsize = pci_msix_vec_count(dev);
> + if (hwsize < 0)
> + return hwsize;
> +
> + if (!pci_msix_validate_entries(entries, nvec, hwsize))
> + return -EINVAL;
> +
> + /* PCI_IRQ_VIRTUAL is a horrible hack! */
> + if (nvec > hwsize && !(flags & PCI_IRQ_VIRTUAL))
> + nvec = hwsize;
> +
> + if (nvec < minvec)
> + return -ENOSPC;
> +
>   rc = pci_setup_msi_context(dev);
>   if (rc)
>   return rc;
> @@ -785,7 +786,7 @@ int __pci_enable_msix_range(struct pci_d
>   return -ENOSPC;
>   }
>  
> - rc = __pci_enable_msix(dev, entries, nvec, affd, flags);
> + rc = msix_capability_init(dev, entries, nvec, affd);
>   if (rc == 0)
>   return nvec;
>  
>

Re: [patch 32/39] PCI/MSI: Reorder functions in msi.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:55:06PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> There is no way to navigate msi.c without banging the head against the wall
> every now and then because MSI and MSI-X specific functions are
> intermingled and the code flow is completely non-obvious.
> 
> Reorder everthing so common helpers, MSI and MSI-X specific functions are
> grouped together.

s/everthing/everything/

> Suggested-by: Thomas Gleixner 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

I assume this is pure code movement, so I didn't even look at the
text below.

> ---
>  drivers/pci/msi/msi.c |  577 
> +-
>  1 file changed, 295 insertions(+), 282 deletions(-)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -16,6 +16,97 @@
>  int pci_msi_enable = 1;
>  int pci_msi_ignore_mask;
>  
> +/**
> + * pci_msi_supported - check whether MSI may be enabled on a device
> + * @dev: pointer to the pci_dev data structure of MSI device function
> + * @nvec: how many MSIs have been requested?
> + *
> + * Look at global flags, the device itself, and its parent buses
> + * to determine if MSI/-X are supported for the device. If MSI/-X is
> + * supported return 1, else return 0.
> + **/
> +static int pci_msi_supported(struct pci_dev *dev, int nvec)
> +{
> + struct pci_bus *bus;
> +
> + /* MSI must be globally enabled and supported by the device */
> + if (!pci_msi_enable)
> + return 0;
> +
> + if (!dev || dev->no_msi)
> + return 0;
> +
> + /*
> +  * You can't ask to have 0 or less MSIs configured.
> +  *  a) it's stupid ..
> +  *  b) the list manipulation code assumes nvec >= 1.
> +  */
> + if (nvec < 1)
> + return 0;
> +
> + /*
> +  * Any bridge which does NOT route MSI transactions from its
> +  * secondary bus to its primary bus must set NO_MSI flag on
> +  * the secondary pci_bus.
> +  *
> +  * The NO_MSI flag can either be set directly by:
> +  * - arch-specific PCI host bus controller drivers (deprecated)
> +  * - quirks for specific PCI bridges
> +  *
> +  * or indirectly by platform-specific PCI host bridge drivers by
> +  * advertising the 'msi_domain' property, which results in
> +  * the NO_MSI flag when no MSI domain is found for this bridge
> +  * at probe time.
> +  */
> + for (bus = dev->bus; bus; bus = bus->parent)
> + if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI)
> + return 0;
> +
> + return 1;
> +}
> +
> +static void pcim_msi_release(void *pcidev)
> +{
> + struct pci_dev *dev = pcidev;
> +
> + dev->is_msi_managed = false;
> + pci_free_irq_vectors(dev);
> +}
> +
> +/*
> + * Needs to be separate from pcim_release to prevent an ordering problem
> + * vs. msi_device_data_release() in the MSI core code.
> + */
> +static int pcim_setup_msi_release(struct pci_dev *dev)
> +{
> + int ret;
> +
> + if (!pci_is_managed(dev) || dev->is_msi_managed)
> + return 0;
> +
> + ret = devm_add_action(>dev, pcim_msi_release, dev);
> + if (!ret)
> + dev->is_msi_managed = true;
> + return ret;
> +}
> +
> +/*
> + * Ordering vs. devres: msi device data has to be installed first so that
> + * pcim_msi_release() is invoked before it on device release.
> + */
> +static int pci_setup_msi_context(struct pci_dev *dev)
> +{
> + int ret = msi_setup_device_data(>dev);
> +
> + if (!ret)
> + ret = pcim_setup_msi_release(dev);
> + return ret;
> +}
> +
> +/*
> + * Helper functions for mask/unmask and MSI message handling
> + */
> +
>  void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
>  {
>   raw_spinlock_t *lock = _pci_dev(desc->dev)->msi_lock;
> @@ -163,15 +254,8 @@ void pci_write_msi_msg(unsigned int irq,
>  }
>  EXPORT_SYMBOL_GPL(pci_write_msi_msg);
>  
> -void pci_free_msi_irqs(struct pci_dev *dev)
> -{
> - pci_msi_teardown_msi_irqs(dev);
>  
> - if (dev->msix_base) {
> - iounmap(dev->msix_base);
> - dev->msix_base = NULL;
> - }
> -}
> +/* PCI/MSI specific functionality */
>  
>  static void pci_intx_for_msi(struct pci_dev *dev, int enable)
>  {
> @@ -190,111 +274,6 @@ static void pci_msi_set_enable(struct pc
>   pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
>  }
>  
> -/*
> - * Architecture override returns true when the PCI MSI message should be
> - * written by the generic restore function.
> - */
> -bool __weak arch_restore_msi_irqs(struct pci_dev *dev)
> -{
> - return true;
> -}
> -
> -void __pci_restore_msi_state(struct pci_dev *dev)
> -{
> - struct msi_desc *entry;
> - u16 control;
> -
> - if (!dev->msi_enabled)
> - return;
> -
> - entry = irq_get_msi_desc(dev->irq);
> -
> - pci_intx_for_msi(dev, 0);
> - pci_msi_set_enable(dev, 0);
> - if

Re: [patch 31/39] Documentation: PCI: Add reference to PCI/MSI device driver APIs

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:55:04PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> All exported device-driver MSI APIs are now grouped in one place at
> drivers/pci/msi/api.c with comprehensive kernel-docs added.
> 
> Reference these kernel-docs in the official PCI/MSI howto.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  Documentation/PCI/msi-howto.rst |   10 ++
>  1 file changed, 10 insertions(+)
> ---
> --- a/Documentation/PCI/msi-howto.rst
> +++ b/Documentation/PCI/msi-howto.rst
> @@ -285,3 +285,13 @@ to bridges between the PCI root and the
>  It is also worth checking the device driver to see whether it supports MSIs.
>  For example, it may contain calls to pci_alloc_irq_vectors() with the
>  PCI_IRQ_MSI or PCI_IRQ_MSIX flags.
> +
> +
> +List of device drivers MSI(-X) APIs
> +===
> +
> +The PCI/MSI subystem has a dedicated C file for its exported device driver
> +APIs — `drivers/pci/msi/api.c`. The following functions are exported:
> +
> +.. kernel-doc:: drivers/pci/msi/api.c
> +   :export:
>

Re: [patch 30/39] PCI/MSI: Move pci_msi_restore_state() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:55:03PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_msi_enabled() and add kernel-doc for the function.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index ee9ed5ccd94d..8d1cf6db9bd7 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -308,6 +308,21 @@ void pci_free_irq_vectors(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL(pci_free_irq_vectors);
>  
> +/**
> + * pci_restore_msi_state() - Restore cached MSI(-X) state on device
> + * @dev: the PCI device to operate on
> + *
> + * Write the Linux-cached MSI(-X) state back on device. This is
> + * typically useful upon system resume, or after an error-recovery PCI
> + * adapter reset.
> + */
> +void pci_restore_msi_state(struct pci_dev *dev)
> +{
> + __pci_restore_msi_state(dev);
> + __pci_restore_msix_state(dev);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_msi_state);
> +
>  /**
>   * pci_msi_enabled() - Are MSI(-X) interrupts enabled system-wide?
>   *
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 59c33bc7fe81..a5d168c823ff 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -199,7 +199,7 @@ bool __weak arch_restore_msi_irqs(struct pci_dev *dev)
>   return true;
>  }
>  
> -static void __pci_restore_msi_state(struct pci_dev *dev)
> +void __pci_restore_msi_state(struct pci_dev *dev)
>  {
>   struct msi_desc *entry;
>   u16 control;
> @@ -231,7 +231,7 @@ static void pci_msix_clear_and_set_ctrl(struct pci_dev 
> *dev, u16 clear, u16 set)
>   pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
>  }
>  
> -static void __pci_restore_msix_state(struct pci_dev *dev)
> +void __pci_restore_msix_state(struct pci_dev *dev)
>  {
>   struct msi_desc *entry;
>   bool write_msg;
> @@ -257,13 +257,6 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>   pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>  }
>  
> -void pci_restore_msi_state(struct pci_dev *dev)
> -{
> - __pci_restore_msi_state(dev);
> - __pci_restore_msix_state(dev);
> -}
> -EXPORT_SYMBOL_GPL(pci_restore_msi_state);
> -
>  static void pcim_msi_release(void *pcidev)
>  {
>   struct pci_dev *dev = pcidev;
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index f3f4ede53171..8170ef2c5ad0 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -94,6 +94,8 @@ void pci_free_msi_irqs(struct pci_dev *dev);
>  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, 
> struct irq_affinity *affd);
>  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, 
> int minvec,
>   int maxvec,  struct irq_affinity *affd, int flags);
> +void __pci_restore_msi_state(struct pci_dev *dev);
> +void __pci_restore_msix_state(struct pci_dev *dev);
>  
>  /* Legacy (!IRQDOMAIN) fallbacks */
>  
>

Re: [patch 29/39] PCI/MSI: Move pci_msi_enabled() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:55:01PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_msi_enabled() and make its kernel-doc comprehensive.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/api.c | 12 
>  drivers/pci/msi/msi.c | 14 +-
>  drivers/pci/msi/msi.h |  3 +++
>  3 files changed, 16 insertions(+), 13 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 473df7ba0584..ee9ed5ccd94d 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -307,3 +307,15 @@ void pci_free_irq_vectors(struct pci_dev *dev)
>   pci_disable_msi(dev);
>  }
>  EXPORT_SYMBOL(pci_free_irq_vectors);
> +
> +/**
> + * pci_msi_enabled() - Are MSI(-X) interrupts enabled system-wide?
> + *
> + * Return: true if MSI has not been globally disabled through ACPI FADT,
> + * PCI bridge quirks, or the "pci=nomsi" kernel command-line option.
> + */
> +int pci_msi_enabled(void)
> +{
> + return pci_msi_enable;
> +}
> +EXPORT_SYMBOL(pci_msi_enabled);
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index d78646d1c116..59c33bc7fe81 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -13,7 +13,7 @@
>  #include "../pci.h"
>  #include "msi.h"
>  
> -static int pci_msi_enable = 1;
> +int pci_msi_enable = 1;
>  int pci_msi_ignore_mask;
>  
>  void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
> @@ -864,15 +864,3 @@ void pci_no_msi(void)
>  {
>   pci_msi_enable = 0;
>  }
> -
> -/**
> - * pci_msi_enabled - is MSI enabled?
> - *
> - * Returns true if MSI has not been disabled by the command-line option
> - * pci=nomsi.
> - **/
> -int pci_msi_enabled(void)
> -{
> - return pci_msi_enable;
> -}
> -EXPORT_SYMBOL(pci_msi_enabled);
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index 77e2587f7e4f..f3f4ede53171 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -84,6 +84,9 @@ static inline __attribute_const__ u32 msi_multi_mask(struct 
> msi_desc *desc)
>   return (1 << (1 << desc->pci.msi_attrib.multi_cap)) - 1;
>  }
>  
> +/* Subsystem variables */
> +extern int pci_msi_enable;
> +
>  /* MSI internal functions invoked from the public APIs */
>  void pci_msi_shutdown(struct pci_dev *dev);
>  void pci_msix_shutdown(struct pci_dev *dev);
>

Re: [patch 27/39] PCI/MSI: Move pci_disable_msix() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:58PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_disable_msix() and make its kernel-doc comprehensive.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

Trivial question below.

> ---
>  drivers/pci/msi/api.c | 24 
>  drivers/pci/msi/msi.c | 14 +-
>  drivers/pci/msi/msi.h |  1 +
>  3 files changed, 26 insertions(+), 13 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 83ea38ffa116..653a61868ae6 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -112,6 +112,30 @@ int pci_enable_msix_range(struct pci_dev *dev, struct 
> msix_entry *entries,
>  EXPORT_SYMBOL(pci_enable_msix_range);
>  
>  /**
> + * pci_disable_msix() - Disable MSI-X interrupt mode on device
> + * @dev: the PCI device to operate on
> + *
> + * Legacy device driver API to disable MSI-X interrupt mode on device,
> + * free earlier-allocated interrupt vectors, and restore INTx emulation.

Isn't INTx *emulation* a PCIe implementation detail?  Doesn't seem
relevant to callers that it's emulated.

> + * The PCI device Linux IRQ (@dev->irq) is restored to its default pin
> + * assertion IRQ. This is the cleanup pair of pci_enable_msix_range().
> + *
> + * NOTE: The newer pci_alloc_irq_vectors() / pci_free_irq_vectors() API
> + * pair should, in general, be used instead.
> + */
> +void pci_disable_msix(struct pci_dev *dev)
> +{
> + if (!pci_msi_enabled() || !dev || !dev->msix_enabled)
> + return;
> +
> + msi_lock_descs(>dev);
> + pci_msix_shutdown(dev);
> + pci_free_msi_irqs(dev);
> + msi_unlock_descs(>dev);
> +}
> +EXPORT_SYMBOL(pci_disable_msix);
> +
> +/**
>   * pci_alloc_irq_vectors() - Allocate multiple device interrupt vectors
>   * @dev:  the PCI device to operate on
>   * @min_vecs: minimum required number of vectors (must be >= 1)
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 1226d66da992..6fa90d07d2e4 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -736,7 +736,7 @@ static int __pci_enable_msix(struct pci_dev *dev, struct 
> msix_entry *entries,
>   return msix_capability_init(dev, entries, nvec, affd);
>  }
>  
> -static void pci_msix_shutdown(struct pci_dev *dev)
> +void pci_msix_shutdown(struct pci_dev *dev)
>  {
>   struct msi_desc *desc;
>  
> @@ -758,18 +758,6 @@ static void pci_msix_shutdown(struct pci_dev *dev)
>   pcibios_alloc_irq(dev);
>  }
>  
> -void pci_disable_msix(struct pci_dev *dev)
> -{
> - if (!pci_msi_enable || !dev || !dev->msix_enabled)
> - return;
> -
> - msi_lock_descs(>dev);
> - pci_msix_shutdown(dev);
> - pci_free_msi_irqs(dev);
> - msi_unlock_descs(>dev);
> -}
> -EXPORT_SYMBOL(pci_disable_msix);
> -
>  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
>  struct irq_affinity *affd)
>  {
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index 8c4a5289432d..77e2587f7e4f 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -86,6 +86,7 @@ static inline __attribute_const__ u32 msi_multi_mask(struct 
> msi_desc *desc)
>  
>  /* MSI internal functions invoked from the public APIs */
>  void pci_msi_shutdown(struct pci_dev *dev);
> +void pci_msix_shutdown(struct pci_dev *dev);
>  void pci_free_msi_irqs(struct pci_dev *dev);
>  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, 
> struct irq_affinity *affd);
>  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, 
> int minvec,
>

Re: [patch 26/39] PCI/MSI: Move pci_msix_vec_count() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:56PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_msix_vec_count() and make its kernel-doc comprehensive.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/api.c | 20 
>  drivers/pci/msi/msi.c | 20 
>  2 files changed, 20 insertions(+), 20 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 2ff2a9ccfc47..83ea38ffa116 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -60,6 +60,26 @@ void pci_disable_msi(struct pci_dev *dev)
>  EXPORT_SYMBOL(pci_disable_msi);
>  
>  /**
> + * pci_msix_vec_count() - Get number of MSI-X interrupt vectors on device
> + * @dev: the PCI device to operate on
> + *
> + * Return: number of MSI-X interrupt vectors available on this device
> + * (i.e., the device's MSI-X capability structure "table size"), -EINVAL
> + * if the device is not MSI-X capable, other errnos otherwise.
> + */
> +int pci_msix_vec_count(struct pci_dev *dev)
> +{
> + u16 control;
> +
> + if (!dev->msix_cap)
> + return -EINVAL;
> +
> + pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, );
> + return msix_table_size(control);
> +}
> +EXPORT_SYMBOL(pci_msix_vec_count);
> +
> +/**
>   * pci_enable_msix_range() - Enable MSI-X interrupt mode on device
>   * @dev: the PCI device to operate on
>   * @entries: input/output parameter, array of MSI-X configuration entries
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index ed8caf5ac99f..1226d66da992 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -701,26 +701,6 @@ void pci_msi_shutdown(struct pci_dev *dev)
>   pcibios_alloc_irq(dev);
>  }
>  
> -/**
> - * pci_msix_vec_count - return the number of device's MSI-X table entries
> - * @dev: pointer to the pci_dev data structure of MSI-X device function
> - * This function returns the number of device's MSI-X table entries and
> - * therefore the number of MSI-X vectors device is capable of sending.
> - * It returns a negative errno if the device is not capable of sending MSI-X
> - * interrupts.
> - **/
> -int pci_msix_vec_count(struct pci_dev *dev)
> -{
> - u16 control;
> -
> - if (!dev->msix_cap)
> - return -EINVAL;
> -
> - pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, );
> - return msix_table_size(control);
> -}
> -EXPORT_SYMBOL(pci_msix_vec_count);
> -
>  static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
>int nvec, struct irq_affinity *affd, int flags)
>  {
>

Re: [patch 25/39] PCI/MSI: Move pci_free_irq_vectors() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:54PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_free_irq_vectors() and make its kernel-doc comprehensive.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/api.c | 15 +++
>  drivers/pci/msi/msi.c | 13 -
>  2 files changed, 15 insertions(+), 13 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 0f1ec87e3f9f..2ff2a9ccfc47 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -205,3 +205,18 @@ int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
>   return irq ? irq : -EINVAL;
>  }
>  EXPORT_SYMBOL(pci_irq_vector);
> +
> +/**
> + * pci_free_irq_vectors() - Free previously allocated IRQs for a device
> + * @dev: the PCI device to operate on
> + *
> + * Undo the interrupt vector allocations and possible device MSI/MSI-X
> + * enablement earlier done through pci_alloc_irq_vectors_affinity() or
> + * pci_alloc_irq_vectors().
> + */
> +void pci_free_irq_vectors(struct pci_dev *dev)
> +{
> + pci_disable_msix(dev);
> + pci_disable_msi(dev);
> +}
> +EXPORT_SYMBOL(pci_free_irq_vectors);
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 38ad2fe4b85c..ed8caf5ac99f 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -887,19 +887,6 @@ int __pci_enable_msix_range(struct pci_dev *dev,
>  }
>  
>  /**
> - * pci_free_irq_vectors - free previously allocated IRQs for a device
> - * @dev: PCI device to operate on
> - *
> - * Undoes the allocations and enabling in pci_alloc_irq_vectors().
> - */
> -void pci_free_irq_vectors(struct pci_dev *dev)
> -{
> - pci_disable_msix(dev);
> - pci_disable_msi(dev);
> -}
> -EXPORT_SYMBOL(pci_free_irq_vectors);
> -
> -/**
>   * pci_irq_get_affinity - return the affinity of a particular MSI vector
>   * @dev: PCI device to operate on
>   * @nr:  device-relative interrupt vector index (0-based).
>

Re: [patch 24/39] PCI/MSI: Move pci_irq_vector() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:53PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_irq_vector() and let its kernel-doc match the rest of the file.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/api.c | 23 +++
>  drivers/pci/msi/msi.c | 24 
>  2 files changed, 23 insertions(+), 24 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 8546749afa6e..0f1ec87e3f9f 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -182,3 +182,26 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, 
> unsigned int min_vecs,
>   return nvecs;
>  }
>  EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
> +
> +/**
> + * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector
> + * @dev: the PCI device to operate on
> + * @nr:  device-relative interrupt vector index (0-based); has different
> + *   meanings, depending on interrupt mode
> + * MSI-Xthe index in the MSI-X vector table
> + * MSI  the index of the enabled MSI vectors
> + * INTx must be 0
> + *
> + * Return: the Linux IRQ number, or -EINVAL if @nr is out of range
> + */
> +int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
> +{
> + unsigned int irq;
> +
> + if (!dev->msi_enabled && !dev->msix_enabled)
> + return !nr ? dev->irq : -EINVAL;
> +
> + irq = msi_get_virq(>dev, nr);
> + return irq ? irq : -EINVAL;
> +}
> +EXPORT_SYMBOL(pci_irq_vector);
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index a028774f438d..38ad2fe4b85c 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -900,30 +900,6 @@ void pci_free_irq_vectors(struct pci_dev *dev)
>  EXPORT_SYMBOL(pci_free_irq_vectors);
>  
>  /**
> - * pci_irq_vector - return Linux IRQ number of a device vector
> - * @dev: PCI device to operate on
> - * @nr:  Interrupt vector index (0-based)
> - *
> - * @nr has the following meanings depending on the interrupt mode:
> - *   MSI-X:  The index in the MSI-X vector table
> - *   MSI:The index of the enabled MSI vectors
> - *   INTx:   Must be 0
> - *
> - * Return: The Linux interrupt number or -EINVAl if @nr is out of range.
> - */
> -int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
> -{
> - unsigned int irq;
> -
> - if (!dev->msi_enabled && !dev->msix_enabled)
> - return !nr ? dev->irq : -EINVAL;
> -
> - irq = msi_get_virq(>dev, nr);
> - return irq ? irq : -EINVAL;
> -}
> -EXPORT_SYMBOL(pci_irq_vector);
> -
> -/**
>   * pci_irq_get_affinity - return the affinity of a particular MSI vector
>   * @dev: PCI device to operate on
>   * @nr:  device-relative interrupt vector index (0-based).
>

Re: [patch 23/39] PCI/MSI: Move pci_alloc_irq_vectors_affinity() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:51PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_alloc_irq_vectors_affinity() and let its kernel-doc reference
> pci_alloc_irq_vectors() documentation added in parent commit.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

One question below.

> ---
>  drivers/pci/msi/api.c | 59 +++-
>  drivers/pci/msi/msi.c | 65 
> +
>  2 files changed, 59 insertions(+), 65 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 1714905943fb..8546749afa6e 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -123,3 +123,62 @@ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned 
> int min_vecs,
> flags, NULL);
>  }
>  EXPORT_SYMBOL(pci_alloc_irq_vectors);
> +
> +/**
> + * pci_alloc_irq_vectors_affinity() - Allocate multiple device interrupt
> + *vectors with affinity requirements
> + * @dev:  the PCI device to operate on
> + * @min_vecs: minimum required number of vectors (must be >= 1)
> + * @max_vecs: maximum desired number of vectors
> + * @flags:allocation flags, as in pci_alloc_irq_vectors()
> + * @affd: affinity requirements (can be %NULL).
> + *
> + * Same as pci_alloc_irq_vectors(), but with the extra @affd parameter.
> + * Check that function docs, and  irq_affinity, for more details.

Is " irq_affinity" some kernel-doc syntax, or is the "&"
superfluous?

> + */
> +int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int 
> min_vecs,
> +unsigned int max_vecs, unsigned int flags,
> +struct irq_affinity *affd)
> +{
> + struct irq_affinity msi_default_affd = {0};
> + int nvecs = -ENOSPC;
> +
> + if (flags & PCI_IRQ_AFFINITY) {
> + if (!affd)
> + affd = _default_affd;
> + } else {
> + if (WARN_ON(affd))
> + affd = NULL;
> + }
> +
> + if (flags & PCI_IRQ_MSIX) {
> + nvecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
> + affd, flags);
> + if (nvecs > 0)
> + return nvecs;
> + }
> +
> + if (flags & PCI_IRQ_MSI) {
> + nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
> + if (nvecs > 0)
> + return nvecs;
> + }
> +
> + /* use legacy IRQ if allowed */
> + if (flags & PCI_IRQ_LEGACY) {
> + if (min_vecs == 1 && dev->irq) {
> + /*
> +  * Invoke the affinity spreading logic to ensure that
> +  * the device driver can adjust queue configuration
> +  * for the single interrupt case.
> +  */
> + if (affd)
> + irq_create_affinity_masks(1, affd);
> + pci_intx(dev, 1);
> + return 1;
> + }
> + }
> +
> + return nvecs;
> +}
> +EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 6700ef1c734e..a028774f438d 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -887,71 +887,6 @@ int __pci_enable_msix_range(struct pci_dev *dev,
>  }
>  
>  /**
> - * pci_alloc_irq_vectors_affinity - allocate multiple IRQs for a device
> - * @dev: PCI device to operate on
> - * @min_vecs:minimum number of vectors required (must be >= 
> 1)
> - * @max_vecs:maximum (desired) number of vectors
> - * @flags:   flags or quirks for the allocation
> - * @affd:optional description of the affinity requirements
> - *
> - * Allocate up to @max_vecs interrupt vectors for @dev, using MSI-X or MSI
> - * vectors if available, and fall back to a single legacy vector
> - * if neither is available.  Return the number of vectors allocated,
> - * (which might be smaller than @max_vecs) if successful, or a negative
> - * error code on error. If less than @min_vecs interrupt vectors are
> - * available for @dev the function will fail with -ENOSPC.
> - *
> - * To get the Linux IRQ number used for a vector that can be passed to
> - * request_irq() use the pci_irq_vector() helper.
> - */
> -int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int 
> min_vecs,
> -unsigned int max_vecs, unsigned int flags,
> -struct irq_affinity *affd)
> -{
> - struct irq_affinity msi_default_affd = {0};
> - int nvecs = -ENOSPC;
> -
> - if (flags & PCI_IRQ_AFFINITY) {

[PATCH printk v5 25/40] tty: hvc: use console_is_registered()

2022-11-16 Thread John Ogness

It is not reliable to check for CON_ENABLED in order to identify if a
console is registered. Use console_is_registered() instead.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 drivers/tty/hvc/hvc_console.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 4802cfaa107f..a683e21df19c 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -264,8 +264,8 @@ static void hvc_port_destruct(struct tty_port *port)
 
 static void hvc_check_console(int index)
 {
-   /* Already enabled, bail out */
-   if (hvc_console.flags & CON_ENABLED)
+   /* Already registered, bail out */
+   if (console_is_registered(_console))
return;
 
/* If this index is what the user requested, then register
-- 
2.30.2

Re: [patch 22/39] PCI/MSI: Move pci_alloc_irq_vectors() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:50PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Make pci_alloc_irq_vectors() a real function instead of wrapper and add
> proper kernel doc to it.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

Two nits below.

> ---
>  drivers/pci/msi/api.c | 33 +
>  include/linux/pci.h   | 17 +
>  2 files changed, 42 insertions(+), 8 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index d48050555d55..1714905943fb 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -90,3 +90,36 @@ int pci_enable_msix_range(struct pci_dev *dev, struct 
> msix_entry *entries,
>   return __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL, 0);
>  }
>  EXPORT_SYMBOL(pci_enable_msix_range);
> +
> +/**
> + * pci_alloc_irq_vectors() - Allocate multiple device interrupt vectors
> + * @dev:  the PCI device to operate on
> + * @min_vecs: minimum required number of vectors (must be >= 1)
> + * @max_vecs: maximum desired number of vectors
> + * @flags:One or more of:
> + *%PCI_IRQ_MSIX  Allow trying MSI-X vector allocations
> + *%PCI_IRQ_MSI   Allow trying MSI vector allocations
> + *%PCI_IRQ_LEGACYAllow trying legacy INTx interrupts, if
> + *   and only if @min_vecs == 1
> + *%PCI_IRQ_AFFINITY  Auto-manage IRQs affinity by spreading
> + *   the vectors around available CPUs
> + *
> + * Allocate up to @max_vecs interrupt vectors on device. MSI-X irq

s/irq/IRQ/

> + * vector allocation has a higher precedence over plain MSI, which has a
> + * higher precedence over legacy INTx emulation.
> + *
> + * Upon a successful allocation, the caller should use pci_irq_vector()
> + * to get the Linux IRQ number to be passed to request_threaded_irq().
> + * The driver must call pci_free_irq_vectors() on cleanup.
> + *
> + * Return: number of allocated vectors (which might be smaller than
> + * @max_vecs), -ENOSPC if less than @min_vecs interrupt vectors are

s/less/fewer/ (also in some previous patches, IIRC)

> + * available, other errnos otherwise.
> + */
> +int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> +   unsigned int max_vecs, unsigned int flags)
> +{
> + return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> +   flags, NULL);
> +}
> +EXPORT_SYMBOL(pci_alloc_irq_vectors);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 2bda4a4e47e8..6a356a39ba94 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1553,6 +1553,8 @@ static inline int pci_enable_msix_exact(struct pci_dev 
> *dev,
>   return rc;
>   return 0;
>  }
> +int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> +   unsigned int max_vecs, unsigned int flags);
>  int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int 
> min_vecs,
>  unsigned int max_vecs, unsigned int flags,
>  struct irq_affinity *affd);
> @@ -1586,6 +1588,13 @@ pci_alloc_irq_vectors_affinity(struct pci_dev *dev, 
> unsigned int min_vecs,
>   return 1;
>   return -ENOSPC;
>  }
> +static inline int
> +pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> +   unsigned int max_vecs, unsigned int flags)
> +{
> + return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> +   flags, NULL);
> +}
>  
>  static inline void pci_free_irq_vectors(struct pci_dev *dev)
>  {
> @@ -1900,14 +1909,6 @@ pci_alloc_irq_vectors_affinity(struct pci_dev *dev, 
> unsigned int min_vecs,
>  }
>  #endif /* CONFIG_PCI */
>  
> -static inline int
> -pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> -   unsigned int max_vecs, unsigned int flags)
> -{
> - return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
> -   NULL);
> -}
> -
>  /* Include architecture-dependent settings and functions */
>  
>  #include 
>

[PATCH printk v5 00/40] reduce console_lock scope

2022-11-16 Thread John Ogness

This is v5 of a series to prepare for threaded/atomic
printing. v4 is here [0]. This series focuses on reducing the
scope of the BKL console_lock. It achieves this by switching to
SRCU and a dedicated mutex for console list iteration and
modification, respectively. The console_lock will no longer
offer this protection.

Also, during the review of v2 it came to our attention that
many console drivers are checking CON_ENABLED to see if they
are registered. Because this flag can change without
unregistering and because this flag does not represent an
atomic point when an (un)registration process is complete,
a new console_is_registered() function is introduced. This
function uses the console_list_lock to synchronize with the
(un)registration process to provide a reliable status.

All users of the console_lock for list iteration have been
modified. For the call sites where the console_lock is still
needed (for other reasons), comments are added to explain
exactly why the console_lock is needed.

All users of CON_ENABLED for registration status have been
modified to use console_is_registered(). Note that there are
still users of CON_ENABLED, but this is for legitimate purposes
about a registered console being able to print.

The base commit for this series is from Paul McKenney's RCU tree
and provides an NMI-safe SRCU implementation [1]. Without the
NMI-safe SRCU implementation, this series is not less safe than
mainline. But we will need the NMI-safe SRCU implementation for
atomic consoles anyway, so we might as well get it in
now. Especially since it _does_ increase the reliability for
mainline in the panic path.

Changes since v4:

printk:

- Introduce console_init_seq() to handle the now rather complex
  procedure to find an appropriate start sequence number for a
  new console upon registration.

- When registering a non-boot console and boot consoles are
  registered, try to flush all the consoles to get the next @seq
  value before falling back to use the @seq of the enabled boot
  console that is furthest behind.

- For console_force_preferred_locked(), make the console the
  head of the console list.

John Ogness

[0] 
https://lore.kernel.org/lkml/20221114162932.141883-1-john.ogn...@linutronix.de
[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/log/?h=srcunmisafe.2022.11.09a

John Ogness (38):
  printk: Prepare for SRCU console list protection
  printk: register_console: use "registered" for variable names
  printk: move @seq initialization to helper
  printk: fix setting first seq for consoles
  um: kmsg_dump: only dump when no output console available
  tty: serial: kgdboc: document console_lock usage
  tty: tty_io: document console_lock usage
  proc: consoles: document console_lock usage
  printk: introduce console_list_lock
  console: introduce wrappers to read/write console flags
  um: kmsg_dumper: use srcu console list iterator
  kdb: use srcu console list iterator
  printk: console_flush_all: use srcu console list iterator
  printk: __pr_flush: use srcu console list iterator
  printk: console_is_usable: use console_srcu_read_flags
  printk: console_unblank: use srcu console list iterator
  printk: console_flush_on_panic: use srcu console list iterator
  printk: console_device: use srcu console list iterator
  console: introduce console_is_registered()
  serial_core: replace uart_console_enabled() with
uart_console_registered()
  tty: nfcon: use console_is_registered()
  efi: earlycon: use console_is_registered()
  tty: hvc: use console_is_registered()
  tty: serial: earlycon: use console_is_registered()
  tty: serial: pic32_uart: use console_is_registered()
  tty: serial: samsung_tty: use console_is_registered()
  tty: serial: xilinx_uartps: use console_is_registered()
  usb: early: xhci-dbc: use console_is_registered()
  netconsole: avoid CON_ENABLED misuse to track registration
  printk, xen: fbfront: create/use safe function for forcing preferred
  tty: tty_io: use console_list_lock for list synchronization
  proc: consoles: use console_list_lock for list iteration
  tty: serial: kgdboc: use srcu console list iterator
  tty: serial: kgdboc: use console_list_lock for list traversal
  tty: serial: kgdboc: synchronize tty_find_polling_driver() and
register_console()
  tty: serial: kgdboc: use console_list_lock to trap exit
  printk: relieve console_lock of list synchronization duties
  tty: serial: sh-sci: use setup() callback for early console

Thomas Gleixner (2):
  serial: kgdboc: Lock console list in probe function
  printk: Convert console_drivers list to hlist

 .clang-format   |   1 +
 arch/m68k/emu/nfcon.c   |   9 +-
 arch/um/kernel/kmsg_dump.c  |  24 +-
 drivers/firmware/efi/earlycon.c |   8 +-
 drivers/net/netconsole.c|  21 +-
 drivers/tty/hvc/hvc_console.c   |   4 +-
 drivers/tty/serial/8250/8250_core.c |   2 +-
 drivers/tty/serial/earlycon.c   |   4 +-
 drivers/tty/serial/kgdboc.c

Re: [patch 21/39] PCI/MSI: Move pci_enable_msix_range() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:48PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c, all exported device-driver MSI APIs are
> now to be grouped in one file, api.c.
> 
> Move pci_enable_msix_range() and make its kernel-doc comprehensive.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/api.c | 32 
>  drivers/pci/msi/msi.c | 30 --
>  drivers/pci/msi/msi.h |  3 +++
>  3 files changed, 39 insertions(+), 26 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 63d7f8f6a284..d48050555d55 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -58,3 +58,35 @@ void pci_disable_msi(struct pci_dev *dev)
>   msi_unlock_descs(>dev);
>  }
>  EXPORT_SYMBOL(pci_disable_msi);
> +
> +/**
> + * pci_enable_msix_range() - Enable MSI-X interrupt mode on device
> + * @dev: the PCI device to operate on
> + * @entries: input/output parameter, array of MSI-X configuration entries
> + * @minvec:  minimum required number of MSI-X vectors
> + * @maxvec:  maximum desired number of MSI-X vectors
> + *
> + * Legacy device driver API to enable MSI-X interrupt mode on device and
> + * configure its MSI-X capability structure as appropriate.  The passed
> + * @entries array must have each of its members "entry" field set to a
> + * desired (valid) MSI-X vector number, where the range of valid MSI-X
> + * vector numbers can be queried through pci_msix_vec_count().  If
> + * successful, the driver must invoke pci_disable_msix() on cleanup.
> + *
> + * NOTE: The newer pci_alloc_irq_vectors() / pci_free_irq_vectors() API
> + * pair should, in general, be used instead.
> + *
> + * Return: number of MSI-X vectors allocated (which might be smaller
> + * than @maxvecs), where Linux IRQ numbers for such allocated vectors
> + * are saved back in the @entries array elements' "vector" field. Return
> + * -ENOSPC if less than @minvecs interrupt vectors are available.
> + * Return -EINVAL if one of the passed @entries members "entry" field
> + * was invalid or a duplicate, or if plain MSI interrupts mode was
> + * earlier enabled on device. Return other errnos otherwise.
> + */
> +int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
> +   int minvec, int maxvec)
> +{
> + return __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL, 0);
> +}
> +EXPORT_SYMBOL(pci_enable_msix_range);
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 98f07ad9af62..6700ef1c734e 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -844,10 +844,10 @@ int __pci_enable_msi_range(struct pci_dev *dev, int 
> minvec, int maxvec,
>   }
>  }
>  
> -static int __pci_enable_msix_range(struct pci_dev *dev,
> -struct msix_entry *entries, int minvec,
> -int maxvec, struct irq_affinity *affd,
> -int flags)
> +int __pci_enable_msix_range(struct pci_dev *dev,
> + struct msix_entry *entries, int minvec,
> + int maxvec, struct irq_affinity *affd,
> + int flags)
>  {
>   int rc, nvec = maxvec;
>  
> @@ -887,28 +887,6 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
>  }
>  
>  /**
> - * pci_enable_msix_range - configure device's MSI-X capability structure
> - * @dev: pointer to the pci_dev data structure of MSI-X device function
> - * @entries: pointer to an array of MSI-X entries
> - * @minvec: minimum number of MSI-X IRQs requested
> - * @maxvec: maximum number of MSI-X IRQs requested
> - *
> - * Setup the MSI-X capability structure of device function with a maximum
> - * possible number of interrupts in the range between @minvec and @maxvec
> - * upon its software driver call to request for MSI-X mode enabled on its
> - * hardware device function. It returns a negative errno if an error occurs.
> - * If it succeeds, it returns the actual number of interrupts allocated and
> - * indicates the successful configuration of MSI-X capability structure
> - * with new allocated MSI-X interrupts.
> - **/
> -int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
> - int minvec, int maxvec)
> -{
> - return __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL, 0);
> -}
> -EXPORT_SYMBOL(pci_enable_msix_range);
> -
> -/**
>   * pci_alloc_irq_vectors_affinity - allocate multiple IRQs for a device
>   * @dev: PCI device to operate on
>   * @min_vecs:minimum number of vectors required (must be >= 
> 1)
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index 00bb98d5bb0e..8c4a5289432d 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -88,8 +88,11 @@ static inline __attribute_const__ u32 
>

Re: [patch 20/39] PCI/MSI: Move pci_enable_msi() API to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:46PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> To distangle the maze in msi.c all exported device-driver MSI APIs are now
> to be grouped in one file, api.c.
> 
> Move pci_enable_msi() and make its kernel-doc comprehensive.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

Nit: suggest "disentangle" or "untangle" for "distangle" here and in
subsequent patches.

> ---
>  drivers/pci/msi/api.c | 23 +++
>  drivers/pci/msi/msi.c | 14 ++
>  drivers/pci/msi/msi.h |  1 +
>  3 files changed, 26 insertions(+), 12 deletions(-)
> ---
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index 7485942cbe5d..63d7f8f6a284 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -13,6 +13,29 @@
>  #include "msi.h"
>  
>  /**
> + * pci_enable_msi() - Enable MSI interrupt mode on device
> + * @dev: the PCI device to operate on
> + *
> + * Legacy device driver API to enable MSI interrupts mode on device and
> + * allocate a single interrupt vector. On success, the allocated vector
> + * Linux IRQ will be saved at @dev->irq. The driver must invoke
> + * pci_disable_msi() on cleanup.
> + *
> + * NOTE: The newer pci_alloc_irq_vectors() / pci_free_irq_vectors() API
> + * pair should, in general, be used instead.
> + *
> + * Return: 0 on success, errno otherwise
> + */
> +int pci_enable_msi(struct pci_dev *dev)
> +{
> + int rc = __pci_enable_msi_range(dev, 1, 1, NULL);
> + if (rc < 0)
> + return rc;
> + return 0;
> +}
> +EXPORT_SYMBOL(pci_enable_msi);
> +
> +/**
>   * pci_disable_msi() - Disable MSI interrupt mode on device
>   * @dev: the PCI device to operate on
>   *
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 4a1300b74518..98f07ad9af62 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -790,8 +790,8 @@ void pci_disable_msix(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL(pci_disable_msix);
>  
> -static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int 
> maxvec,
> -   struct irq_affinity *affd)
> +int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> +struct irq_affinity *affd)
>  {
>   int nvec;
>   int rc;
> @@ -844,16 +844,6 @@ static int __pci_enable_msi_range(struct pci_dev *dev, 
> int minvec, int maxvec,
>   }
>  }
>  
> -/* deprecated, don't use */
> -int pci_enable_msi(struct pci_dev *dev)
> -{
> - int rc = __pci_enable_msi_range(dev, 1, 1, NULL);
> - if (rc < 0)
> - return rc;
> - return 0;
> -}
> -EXPORT_SYMBOL(pci_enable_msi);
> -
>  static int __pci_enable_msix_range(struct pci_dev *dev,
>  struct msix_entry *entries, int minvec,
>  int maxvec, struct irq_affinity *affd,
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index 634879277349..00bb98d5bb0e 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -87,6 +87,7 @@ static inline __attribute_const__ u32 msi_multi_mask(struct 
> msi_desc *desc)
>  /* MSI internal functions invoked from the public APIs */
>  void pci_msi_shutdown(struct pci_dev *dev);
>  void pci_free_msi_irqs(struct pci_dev *dev);
> +int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, 
> struct irq_affinity *affd);
>  
>  /* Legacy (!IRQDOMAIN) fallbacks */
>  #ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
>

Re: [patch 19/39] PCI/MSI: Move pci_disable_msi() to api.c

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:45PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> msi.c is a maze of randomly sorted functions which makes the code
> unreadable. As a first step split the driver visible API and the internal
> implementation which also allows proper API documentation via one file.
> 
> Create drivers/pci/msi/api.c to group all exported device-driver PCI/MSI
> APIs in one C file.
> 
> Begin by moving pci_disable_msi() there and add kernel-doc for the function
> as appropriate.
> 
> Suggested-by: Thomas Gleixner 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/Makefile |  3 +--
>  drivers/pci/msi/api.c| 37 +
>  drivers/pci/msi/msi.c| 22 +-
>  drivers/pci/msi/msi.h|  4 
>  4 files changed, 47 insertions(+), 19 deletions(-)
>  create mode 100644 drivers/pci/msi/api.c
> ---
> diff --git a/drivers/pci/msi/Makefile b/drivers/pci/msi/Makefile
> index 4e0a7e07965e..839ff72d72a8 100644
> --- a/drivers/pci/msi/Makefile
> +++ b/drivers/pci/msi/Makefile
> @@ -2,6 +2,5 @@
>  #
>  # Makefile for the PCI/MSI
>  obj-$(CONFIG_PCI)+= pcidev_msi.o
> -obj-$(CONFIG_PCI_MSI)+= msi.o
> -obj-$(CONFIG_PCI_MSI)+= irqdomain.o
> +obj-$(CONFIG_PCI_MSI)+= api.o msi.o irqdomain.o
>  obj-$(CONFIG_PCI_MSI_ARCH_FALLBACKS) += legacy.o
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> new file mode 100644
> index ..7485942cbe5d
> --- /dev/null
> +++ b/drivers/pci/msi/api.c
> @@ -0,0 +1,37 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCI MSI/MSI-X — Exported APIs for device drivers
> + *
> + * Copyright (C) 2003-2004 Intel
> + * Copyright (C) Tom Long Nguyen (tom.l.ngu...@intel.com)
> + * Copyright (C) 2016 Christoph Hellwig.
> + * Copyright (C) 2022 Linutronix GmbH
> + */
> +
> +#include 
> +
> +#include "msi.h"
> +
> +/**
> + * pci_disable_msi() - Disable MSI interrupt mode on device
> + * @dev: the PCI device to operate on
> + *
> + * Legacy device driver API to disable MSI interrupt mode on device,
> + * free earlier allocated interrupt vectors, and restore INTx emulation.
> + * The PCI device Linux IRQ (@dev->irq) is restored to its default
> + * pin-assertion IRQ. This is the cleanup pair of pci_enable_msi().
> + *
> + * NOTE: The newer pci_alloc_irq_vectors() / pci_free_irq_vectors() API
> + * pair should, in general, be used instead.
> + */
> +void pci_disable_msi(struct pci_dev *dev)
> +{
> + if (!pci_msi_enabled() || !dev || !dev->msi_enabled)
> + return;
> +
> + msi_lock_descs(>dev);
> + pci_msi_shutdown(dev);
> + pci_free_msi_irqs(dev);
> + msi_unlock_descs(>dev);
> +}
> +EXPORT_SYMBOL(pci_disable_msi);
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 5c310df55d0d..4a1300b74518 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -163,7 +163,7 @@ void pci_write_msi_msg(unsigned int irq, struct msi_msg 
> *msg)
>  }
>  EXPORT_SYMBOL_GPL(pci_write_msi_msg);
>  
> -static void free_msi_irqs(struct pci_dev *dev)
> +void pci_free_msi_irqs(struct pci_dev *dev)
>  {
>   pci_msi_teardown_msi_irqs(dev);
>  
> @@ -413,7 +413,7 @@ static int msi_capability_init(struct pci_dev *dev, int 
> nvec,
>  
>  err:
>   pci_msi_unmask(entry, msi_multi_mask(entry));
> - free_msi_irqs(dev);
> + pci_free_msi_irqs(dev);
>  fail:
>   dev->msi_enabled = 0;
>  unlock:
> @@ -531,7 +531,7 @@ static int msix_setup_interrupts(struct pci_dev *dev, 
> void __iomem *base,
>   goto out_unlock;
>  
>  out_free:
> - free_msi_irqs(dev);
> + pci_free_msi_irqs(dev);
>  out_unlock:
>   msi_unlock_descs(>dev);
>   kfree(masks);
> @@ -680,7 +680,7 @@ int pci_msi_vec_count(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL(pci_msi_vec_count);
>  
> -static void pci_msi_shutdown(struct pci_dev *dev)
> +void pci_msi_shutdown(struct pci_dev *dev)
>  {
>   struct msi_desc *desc;
>  
> @@ -701,18 +701,6 @@ static void pci_msi_shutdown(struct pci_dev *dev)
>   pcibios_alloc_irq(dev);
>  }
>  
> -void pci_disable_msi(struct pci_dev *dev)
> -{
> - if (!pci_msi_enable || !dev || !dev->msi_enabled)
> - return;
> -
> - msi_lock_descs(>dev);
> - pci_msi_shutdown(dev);
> - free_msi_irqs(dev);
> - msi_unlock_descs(>dev);
> -}
> -EXPORT_SYMBOL(pci_disable_msi);
> -
>  /**
>   * pci_msix_vec_count - return the number of device's MSI-X table entries
>   * @dev: pointer to the pci_dev data structure of MSI-X device function
> @@ -797,7 +785,7 @@ void pci_disable_msix(struct pci_dev *dev)
>  
>   msi_lock_descs(>dev);
>   pci_msix_shutdown(dev);
> - free_msi_irqs(dev);
> + pci_free_msi_irqs(dev);
>   msi_unlock_descs(>dev);
>  }
>  EXPORT_SYMBOL(pci_disable_msix);
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index

Re: [patch 18/39] PCI/MSI: Move mask and unmask helpers to msi.h

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:43PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> The upcoming support for per device MSI interrupt domains needs to share
> some of the inline helpers with the MSI implementation.
> 
> Move them to the header file.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/msi.c | 61 +--
>  drivers/pci/msi/msi.h | 83 
> +---
>  2 files changed, 74 insertions(+), 70 deletions(-)
> ---
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 160af9f01669..5c310df55d0d 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -16,7 +16,7 @@
>  static int pci_msi_enable = 1;
>  int pci_msi_ignore_mask;
>  
> -static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, 
> u32 set)
> +void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set)
>  {
>   raw_spinlock_t *lock = _pci_dev(desc->dev)->msi_lock;
>   unsigned long flags;
> @@ -32,65 +32,6 @@ static noinline void pci_msi_update_mask(struct msi_desc 
> *desc, u32 clear, u32 s
>   raw_spin_unlock_irqrestore(lock, flags);
>  }
>  
> -static inline void pci_msi_mask(struct msi_desc *desc, u32 mask)
> -{
> - pci_msi_update_mask(desc, 0, mask);
> -}
> -
> -static inline void pci_msi_unmask(struct msi_desc *desc, u32 mask)
> -{
> - pci_msi_update_mask(desc, mask, 0);
> -}
> -
> -static inline void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
> -{
> - return desc->pci.mask_base + desc->msi_index * PCI_MSIX_ENTRY_SIZE;
> -}
> -
> -/*
> - * This internal function does not flush PCI writes to the device.  All
> - * users must ensure that they read from the device before either assuming
> - * that the device state is up to date, or returning out of this file.
> - * It does not affect the msi_desc::msix_ctrl cache either. Use with care!
> - */
> -static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl)
> -{
> - void __iomem *desc_addr = pci_msix_desc_addr(desc);
> -
> - if (desc->pci.msi_attrib.can_mask)
> - writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
> -}
> -
> -static inline void pci_msix_mask(struct msi_desc *desc)
> -{
> - desc->pci.msix_ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
> - pci_msix_write_vector_ctrl(desc, desc->pci.msix_ctrl);
> - /* Flush write to device */
> - readl(desc->pci.mask_base);
> -}
> -
> -static inline void pci_msix_unmask(struct msi_desc *desc)
> -{
> - desc->pci.msix_ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
> - pci_msix_write_vector_ctrl(desc, desc->pci.msix_ctrl);
> -}
> -
> -static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
> -{
> - if (desc->pci.msi_attrib.is_msix)
> - pci_msix_mask(desc);
> - else
> - pci_msi_mask(desc, mask);
> -}
> -
> -static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
> -{
> - if (desc->pci.msi_attrib.is_msix)
> - pci_msix_unmask(desc);
> - else
> - pci_msi_unmask(desc, mask);
> -}
> -
>  /**
>   * pci_msi_mask_irq - Generic IRQ chip callback to mask PCI/MSI interrupts
>   * @data:pointer to irqdata associated to that interrupt
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index fc92603b33e1..d8f62d911f08 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -8,21 +8,67 @@
>  int pci_msi_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
>  void pci_msi_teardown_msi_irqs(struct pci_dev *dev);
>  
> -#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
> -int pci_msi_legacy_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
> -void pci_msi_legacy_teardown_msi_irqs(struct pci_dev *dev);
> -#else
> -static inline int pci_msi_legacy_setup_msi_irqs(struct pci_dev *dev, int 
> nvec, int type)
> +/* Mask/unmask helpers */
> +void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set);
> +
> +static inline void pci_msi_mask(struct msi_desc *desc, u32 mask)
>  {
> - WARN_ON_ONCE(1);
> - return -ENODEV;
> + pci_msi_update_mask(desc, 0, mask);
>  }
>  
> -static inline void pci_msi_legacy_teardown_msi_irqs(struct pci_dev *dev)
> +static inline void pci_msi_unmask(struct msi_desc *desc, u32 mask)
>  {
> - WARN_ON_ONCE(1);
> + pci_msi_update_mask(desc, mask, 0);
> +}
> +
> +static inline void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
> +{
> + return desc->pci.mask_base + desc->msi_index * PCI_MSIX_ENTRY_SIZE;
> +}
> +
> +/*
> + * This internal function does not flush PCI writes to the device.  All
> + * users must ensure that they read from the device before either assuming
> + * that the device state is up to date, or returning out of this file.
> + * It does not affect the msi_desc::msix_ctrl cache either. Use with care!
> + */
> +static inline void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 
> ctrl)
> +{
> +

Re: [patch 17/39] PCI/MSI: Get rid of externs in msi.h

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:42PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> Follow the style of 
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/msi.h |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -5,12 +5,12 @@
>  
>  #define msix_table_size(flags)   ((flags & PCI_MSIX_FLAGS_QSIZE) + 1)
>  
> -extern int pci_msi_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
> -extern void pci_msi_teardown_msi_irqs(struct pci_dev *dev);
> +int pci_msi_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
> +void pci_msi_teardown_msi_irqs(struct pci_dev *dev);
>  
>  #ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
> -extern int pci_msi_legacy_setup_msi_irqs(struct pci_dev *dev, int nvec, int 
> type);
> -extern void pci_msi_legacy_teardown_msi_irqs(struct pci_dev *dev);
> +int pci_msi_legacy_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
> +void pci_msi_legacy_teardown_msi_irqs(struct pci_dev *dev);
>  #else
>  static inline int pci_msi_legacy_setup_msi_irqs(struct pci_dev *dev, int 
> nvec, int type)
>  {
>

Re: [patch 14/39] PCI/MSI: Let the MSI core free descriptors

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:37PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> Let the core do the freeing of descriptors and just keep it around for the
> legacy case.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/irqdomain.c |   10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -24,11 +24,12 @@ void pci_msi_teardown_msi_irqs(struct pc
>   struct irq_domain *domain;
>  
>   domain = dev_get_msi_domain(>dev);
> - if (domain && irq_domain_is_hierarchy(domain))
> + if (domain && irq_domain_is_hierarchy(domain)) {
>   msi_domain_free_irqs_descs_locked(domain, >dev);
> - else
> + } else {
>   pci_msi_legacy_teardown_msi_irqs(dev);
> - msi_free_msi_descs(>dev);
> + msi_free_msi_descs(>dev);
> + }
>  }
>  
>  /**
> @@ -170,6 +171,9 @@ struct irq_domain *pci_msi_create_irq_do
>   if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
>   pci_msi_domain_update_chip_ops(info);
>  
> + /* Let the core code free MSI descriptors when freeing interrupts */
> + info->flags |= MSI_FLAG_FREE_MSI_DESCS;
> +
>   info->flags |= MSI_FLAG_ACTIVATE_EARLY | MSI_FLAG_DEV_SYSFS;
>   if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
>   info->flags |= MSI_FLAG_MUST_REACTIVATE;
>

Re: [patch 13/39] PCI/MSI: Use msi_domain_info::bus_token

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:35PM +0100, Thomas Gleixner wrote:
> From: Ahmed S. Darwish 
> 
> Set the bus token in the msi_domain_info structure and let the core code
> handle the update.
> 
> Signed-off-by: Ahmed S. Darwish 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/irqdomain.c |   11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -162,8 +162,6 @@ struct irq_domain *pci_msi_create_irq_do
>struct msi_domain_info *info,
>struct irq_domain *parent)
>  {
> - struct irq_domain *domain;
> -
>   if (WARN_ON(info->flags & MSI_FLAG_LEVEL_CAPABLE))
>   info->flags &= ~MSI_FLAG_LEVEL_CAPABLE;
>  
> @@ -178,13 +176,10 @@ struct irq_domain *pci_msi_create_irq_do
>  
>   /* PCI-MSI is oneshot-safe */
>   info->chip->flags |= IRQCHIP_ONESHOT_SAFE;
> + /* Let the core update the bus token */
> + info->bus_token = DOMAIN_BUS_PCI_MSI;
>  
> - domain = msi_create_irq_domain(fwnode, info, parent);
> - if (!domain)
> - return NULL;
> -
> - irq_domain_update_bus_token(domain, DOMAIN_BUS_PCI_MSI);
> - return domain;
> + return msi_create_irq_domain(fwnode, info, parent);
>  }
>  EXPORT_SYMBOL_GPL(pci_msi_create_irq_domain);
>  
>

Re: [patch 04/39] genirq/msi: Use MSI_DESC_ALL in msi_add_simple_msi_descs()

2022-11-16 Thread Ashok Raj

On Fri, Nov 11, 2022 at 02:54:20PM +0100, Thomas Gleixner wrote:
> There are no associated MSI descriptors in the requested range when the MSI
> descriptor allocation fails. Use MSI_DESC_ALL as the filter which prepares
> the next step to get rid of the filter for freeing.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  kernel/irq/msi.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -120,7 +120,7 @@ static int msi_add_simple_msi_descs(stru
>  fail_mem:
>   ret = -ENOMEM;
>  fail:
> - msi_free_msi_descs_range(dev, MSI_DESC_NOTASSOCIATED, index, last);
> + msi_free_msi_descs_range(dev, MSI_DESC_ALL, index, last);
>   return ret;
>  }
>  
> 
Reviewed-by: Ashok Raj

Re: [patch 15/39] PCI/MSI: Get rid of PCI_MSI_IRQ_DOMAIN

2022-11-16 Thread Bjorn Helgaas

On Fri, Nov 11, 2022 at 02:54:38PM +0100, Thomas Gleixner wrote:
> What a zoo:
> 
>  PCI_MSI
>   select GENERIC_MSI_IRQ
> 
>  PCI_MSI_IRQ_DOMAIN
>   def_bool y
>   depends on PCI_MSI
>   select GENERIC_MSI_IRQ_DOMAIN
> 
> Ergo PCI_MSI enables PCI_MSI_IRQ_DOMAIN which in turn selects
> GENERIC_MSI_IRQ_DOMAIN. So all the dependencies on PCI_MSI_IRQ_DOMAIN are
> just an indirection to PCI_MSI.
> 
> Match the reality and just admit that PCI_MSI requires
> GENERIC_MSI_IRQ_DOMAIN.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

Just FYI, this will conflict with my work-in-progress to add more
COMPILE_TEST coverage:
  
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?id=72b5e7c401a1

No *actual* conflicts, just textually next door, so should be sipmle
to resolve.  Worst case I can postpone my patch until the next cycle.

> ---
>  arch/um/drivers/Kconfig |1 
>  arch/um/include/asm/pci.h   |2 -
>  arch/x86/Kconfig|1 
>  arch/x86/include/asm/pci.h  |4 +-
>  drivers/pci/Kconfig |8 +
>  drivers/pci/controller/Kconfig  |   30 +---
>  drivers/pci/controller/dwc/Kconfig  |   48 
> 
>  drivers/pci/controller/mobiveil/Kconfig |6 ++--
>  drivers/pci/msi/Makefile|2 -
>  drivers/pci/probe.c |2 -
>  include/linux/msi.h |   32 ++---
>  11 files changed, 56 insertions(+), 80 deletions(-)
> 
> --- a/arch/um/drivers/Kconfig
> +++ b/arch/um/drivers/Kconfig
> @@ -381,7 +381,6 @@ config UML_PCI_OVER_VIRTIO
>   select UML_IOMEM_EMULATION
>   select UML_DMA_EMULATION
>   select PCI_MSI
> - select PCI_MSI_IRQ_DOMAIN
>   select PCI_LOCKLESS_CONFIG
>  
>  config UML_PCI_OVER_VIRTIO_DEVICE_ID
> --- a/arch/um/include/asm/pci.h
> +++ b/arch/um/include/asm/pci.h
> @@ -7,7 +7,7 @@
>  /* Generic PCI */
>  #include 
>  
> -#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> +#ifdef CONFIG_PCI_MSI
>  /*
>   * This is a bit of an annoying hack, and it assumes we only have
>   * the virt-pci (if anything). Which is true, but still.
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1109,7 +1109,6 @@ config X86_LOCAL_APIC
>   def_bool y
>   depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || 
> PCI_MSI
>   select IRQ_DOMAIN_HIERARCHY
> - select PCI_MSI_IRQ_DOMAIN if PCI_MSI
>  
>  config X86_IO_APIC
>   def_bool y
> --- a/arch/x86/include/asm/pci.h
> +++ b/arch/x86/include/asm/pci.h
> @@ -21,7 +21,7 @@ struct pci_sysdata {
>  #ifdef CONFIG_X86_64
>   void*iommu; /* IOMMU private data */
>  #endif
> -#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> +#ifdef CONFIG_PCI_MSI
>   void*fwnode;/* IRQ domain for MSI assignment */
>  #endif
>  #if IS_ENABLED(CONFIG_VMD)
> @@ -52,7 +52,7 @@ static inline int pci_proc_domain(struct
>  }
>  #endif
>  
> -#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> +#ifdef CONFIG_PCI_MSI
>  static inline void *_pci_root_bus_fwnode(struct pci_bus *bus)
>  {
>   return to_pci_sysdata(bus)->fwnode;
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -38,6 +38,7 @@ source "drivers/pci/pcie/Kconfig"
>  
>  config PCI_MSI
>   bool "Message Signaled Interrupts (MSI and MSI-X)"
> + select GENERIC_MSI_IRQ_DOMAIN
>   select GENERIC_MSI_IRQ
>   help
>  This allows device drivers to enable MSI (Message Signaled
> @@ -51,11 +52,6 @@ config PCI_MSI
>  
>  If you don't know what to do here, say Y.
>  
> -config PCI_MSI_IRQ_DOMAIN
> - def_bool y
> - depends on PCI_MSI
> - select GENERIC_MSI_IRQ_DOMAIN
> -
>  config PCI_MSI_ARCH_FALLBACKS
>   bool
>  
> @@ -192,7 +188,7 @@ config PCI_LABEL
>  
>  config PCI_HYPERV
>   tristate "Hyper-V PCI Frontend"
> - depends on ((X86 && X86_64) || ARM64) && HYPERV && PCI_MSI && 
> PCI_MSI_IRQ_DOMAIN && SYSFS
> + depends on ((X86 && X86_64) || ARM64) && HYPERV && PCI_MSI && SYSFS
>   select PCI_HYPERV_INTERFACE
>   help
> The PCI device frontend driver allows the kernel to import arbitrary
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -19,7 +19,7 @@ config PCI_AARDVARK
>   tristate "Aardvark PCIe controller"
>   depends on (ARCH_MVEBU && ARM64) || COMPILE_TEST
>   depends on OF
> - depends on PCI_MSI_IRQ_DOMAIN
> + depends on PCI_MSI
>   select PCI_BRIDGE_EMUL
>   help
>Add support for Aardvark 64bit PCIe Host Controller. This
> @@ -29,7 +29,7 @@ config PCI_AARDVARK
>  config PCIE_XILINX_NWL
>   bool "NWL PCIe Core"
>   depends on ARCH_ZYNQMP || COMPILE_TEST
> - depends on PCI_MSI_IRQ_DOMAIN
> + depends on PCI_MSI
>   help
>Say 'Y' here if you want kernel support for Xilinx
>NWL PCIe controller. The controller can act as Root Port
> @@ -53,7

Re: [patch 03/39] iommu/amd: Remove bogus check for multi MSI-X

2022-11-16 Thread Ashok Raj

On Fri, Nov 11, 2022 at 02:54:19PM +0100, Thomas Gleixner wrote:
> PCI/Multi-MSI is MSI specific and not supported for MSI-X
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/iommu/amd/iommu.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3294,8 +3294,7 @@ static int irq_remapping_alloc(struct ir
>  
>   if (!info)
>   return -EINVAL;
> - if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
> - info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
> + if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI)
>   return -EINVAL;
>  
>   /*
> 

nit:

maybe better to merge patch2/3 since both seem related?

Re: [patch 02/39] iommu/vt-d: Remove bogus check for multi MSI-X

2022-11-16 Thread Ashok Raj

On Fri, Nov 11, 2022 at 02:54:17PM +0100, Thomas Gleixner wrote:
> PCI/Multi-MSI is MSI specific and not supported for MSI-X.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/iommu/intel/irq_remapping.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> --- a/drivers/iommu/intel/irq_remapping.c
> +++ b/drivers/iommu/intel/irq_remapping.c
> @@ -1334,8 +1334,7 @@ static int intel_irq_remapping_alloc(str
>  
>   if (!info || !iommu)
>   return -EINVAL;
> - if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
> - info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
> + if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI)
>   return -EINVAL;
>  
>   /*
> 

This check is only making sure that when multi-msi is requested that the
type has to be either MSI/MSIX. 

Wouldn't this change return -EINVAL when type = MSIX?

Re: [PATCH 13/44] KVM: x86: Serialize vendor module initialization (hardware setup)

2022-11-16 Thread Sean Christopherson

On Wed, Nov 16, 2022, Huang, Kai wrote:
> On Wed, 2022-11-02 at 23:18 +, Sean Christopherson wrote:
> > Acquire a new mutex, vendor_module_lock, in kvm_x86_vendor_init() while
> > doing hardware setup to ensure that concurrent calls are fully serialized.
> > KVM rejects attempts to load vendor modules if a different module has
> > already been loaded, but doesn't handle the case where multiple vendor
> > modules are loaded at the same time, and module_init() doesn't run under
> > the global module_mutex.
> > 
> > Note, in practice, this is likely a benign bug as no platform exists that
> > supports both SVM and VMX, i.e. barring a weird VM setup, one of the
> > vendor modules is guaranteed to fail a support check before modifying
> > common KVM state.
> > 
> > Alternatively, KVM could perform an atomic CMPXCHG on .hardware_enable,
> > but that comes with its own ugliness as it would require setting
> > .hardware_enable before success is guaranteed, e.g. attempting to load
> > the "wrong" could result in spurious failure to load the "right" module.
> > 
> > Introduce a new mutex as using kvm_lock is extremely deadlock prone due
> > to kvm_lock being taken under cpus_write_lock(), and in the future, under
> > under cpus_read_lock().  Any operation that takes cpus_read_lock() while
> > holding kvm_lock would potentially deadlock, e.g. kvm_timer_init() takes
> > cpus_read_lock() to register a callback.  In theory, KVM could avoid
> > such problematic paths, i.e. do less setup under kvm_lock, but avoiding
> > all calls to cpus_read_lock() is subtly difficult and thus fragile.  E.g.
> > updating static calls also acquires cpus_read_lock().
> > 
> > Inverting the lock ordering, i.e. always taking kvm_lock outside
> > cpus_read_lock(), is not a viable option, e.g. kvm_online_cpu() takes
> > kvm_lock and is called under cpus_write_lock().
> 
> "kvm_online_cpu() takes kvm_lock and is called under cpus_write_lock()" hasn't
> happened yet.

Doh, right.  Thanks!

> > The lockdep splat below is dependent on future patches to take
> > cpus_read_lock() in hardware_enable_all(), but as above, deadlock is
> > already is already possible.
> 
> IIUC kvm_lock by design is supposed to protect vm_list, thus IMHO naturally it
> doesn't fit to protect multiple vendor module loading.

A different way to look at it is that kvm_lock protects anything that is global 
to
all of KVM, and it just so happens that lists and counters of VMs are the only
such resources (lumping in the usage in vm_uevent_notify_change() and the future
usage to protect kvm_usage_count).

> Looks above argument is good enough.  I am not sure  whether we need 
> additional
> justification which comes from future patches. :)

To try to prevent someone from trying to eliminate the "extra" lock, like this
series does for kvm_count_lock.  Hopefully future someones that want to clean up
the code do a git blame to understand why the lock was introduced and don't 
waste
their time running into the same issues (or worse, don't run into the issues and
break KVM).

> Also, do you also want to update Documentation/virt/kvm/locking.rst" in this
> patch?

Hmm, yeah.  That'd also be a good place to document why kvm_lock isn't used.

Re: [patch 01/39] PCI/MSI: Check for MSI enabled in __pci_msix_enable()

2022-11-16 Thread Ashok Raj

On Fri, Nov 11, 2022 at 02:54:15PM +0100, Thomas Gleixner wrote:
> PCI/MSI and PCI/MSI-X are mutually exclusive, but the MSI-X enable code
> lacks a check for already enabled MSI.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/msi.c |5 +
>  1 file changed, 5 insertions(+)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -935,6 +935,11 @@ static int __pci_enable_msix_range(struc
>   if (maxvec < minvec)
>   return -ERANGE;
>  
> + if (dev->msi_enabled) {
> + pci_info(dev, "can't enable MSI-X (MSI already enabled)\n");
> + return -EINVAL;
> + }
> +

nit: 

Can the pre-enabled checks for msi and msix be moved up before any vector
range check?

not that it matters for how it fails, does EBUSY sound better?

looks good.

Reviewed-by: Ashok Raj 

>   if (WARN_ON_ONCE(dev->msix_enabled))
>   return -EINVAL;

Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

2022-11-16 Thread Huang, Kai

On Tue, 2022-11-15 at 20:16 +, Sean Christopherson wrote:
> On Thu, Nov 10, 2022, Huang, Kai wrote:
> > On Thu, 2022-11-10 at 01:33 +, Huang, Kai wrote:
> > > > @@ -9283,7 +9283,13 @@ static int
> > > > kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
> > > >     int cpu = smp_processor_id();
> > > >     struct cpuinfo_x86 *c = _data(cpu);
> > > >  
> > > > -   WARN_ON(!irqs_disabled());
> > > > +   /*
> > > > +* Compatibility checks are done when loading KVM and when 
> > > > enabling
> > > > +* hardware, e.g. during CPU hotplug, to ensure all online CPUs 
> > > > are
> > > > +* compatible, i.e. KVM should never perform a compatibility 
> > > > check
> > > > on
> > > > +* an offline CPU.
> > > > +*/
> > > > +   WARN_ON(!irqs_disabled() && cpu_active(cpu));
> > > >  
> > > 
> > > Also, the logic of:
> > > 
> > >   !irqs_disabled() && cpu_active(cpu)
> > > 
> > > is quite weird.
> > > 
> > > The original "WARN(!irqs_disabled())" is reasonable because in STARTING
> > > section
> > > the IRQ is indeed disabled.
> > > 
> > > But this doesn't make sense anymore after we move to ONLINE section, in 
> > > which
> > > IRQ has already been enabled (see start_secondary()).  IIUC the WARN_ON()
> > > doesn't get exploded is purely because there's an additional 
> > > cpu_active(cpu)
> > > check.
> > > 
> > > So, a more reasonable check should be something like:
> > > 
> > >   WARN_ON(irqs_disabled() || cpu_active(cpu) || !cpu_online(cpu));
> > > 
> > > Or we can simply do:
> > > 
> > >   WARN_ON(!cpu_online(cpu) || cpu_active(cpu));
> > > 
> > > (because I don't know whether it's possible IRQ can somehow get disabled 
> > > in
> > > ONLINE section).
> > > 
> > > Btw above is purely based on code analysis, but I haven't done any test.
> > 
> > Hmm.. I wasn't thinking thoroughly.  I forgot CPU compatibility check also
> > happens on all online cpus when loading KVM.  For this case, IRQ is 
> > disabled and
> > cpu_active() is true.  For the hotplug case, IRQ is enabled but  
> > cpu_active() is
> > false.
> 
> Actually, you're right (and wrong).  You're right in that the WARN is flawed. 
>  And
> the reason for that is because you're wrong about the hotplug case.  In this 
> version
> of things, the compatibility checks are routed through hardware enabling, 
> i.e. this
> flow is used only when loading KVM.  This helper should only be called via 
> SMP function
> call, which means that IRQs should always be disabled.

Did you mean below code change in later patch "[PATCH 39/44] KVM: Drop
kvm_count_lock and instead protect kvm_usage_count with kvm_lock"?

/*
 * Abort the CPU online process if hardware virtualization cannot
 * be enabled. Otherwise running VMs would encounter unrecoverable
@@ -5039,13 +5039,16 @@ static int kvm_online_cpu(unsigned int cpu)
if (kvm_usage_count) {
WARN_ON_ONCE(atomic_read(_enable_failed));
 
+   local_irq_save(flags);
hardware_enable_nolock(NULL);
+   local_irq_restore(flags);
+

Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data

2022-11-16 Thread Stephen Röttger

On Tue, Nov 15, 2022 at 5:16 AM Michael Sammler  wrote:
> > We're currently working on a feature in chromium that uses pkeys for
> > in-process isolation. Being able to use the pkey state in the seccomp
> > filter would be pretty useful for this. For example, it would allow
> > us to enforce that no code outside the isolated thread would ever
> > map/mprotect executable memory.
> > We can probably do something similar by adding instruction pointer
> > checks to the seccomp filter, but that feels quite hacky and this
> > feature would make a much nicer implementation.
> >
> > Are there any plans to make a version 2 of this patch?
>
> Thanks for your interest in this patch, but I am now working on other 
> projects and currently don't plan to make a version 2 of this patch.

I'd be happy to take over writing a version 2 for this.

Kees and Dave, does this feature overall look good to you?

>From the discussion, I think there are two proposed changes:
* use an architecture-generic interface as Ram Pai suggested (i.e. add
a read_pkey function)
* ensure to restore the pkru value or fetch it from the xsave buffer


smime.p7s
Description: S/MIME Cryptographic Signature

[PATCH] arch/powerpc/setup: Fix reference count issue in pas_setup_mce_regs()

2022-11-16 Thread Xiongfeng Wang

pci_get_device() will increase the reference count for the returned
pci_dev, and also decrease the reference count for the input parameter
*from* if it is not NULL.

In function pas_setup_mce_regs(), after we break out the loop with 'dev'
not NULL, we need to decrease the reference count of 'dev'. Since
pci_dev_put() can handle the NULL input parameter, we can just add
pci_dev_put() after the loop. Also add pci_dev_put() for another two
pci_get_device().

Fixes: cd7834167ffb ("[POWERPC] pasemi: Print more information at machine 
check")
Signed-off-by: Xiongfeng Wang 
---
 arch/powerpc/platforms/pasemi/setup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/pasemi/setup.c 
b/arch/powerpc/platforms/pasemi/setup.c
index 2aef49e04dd4..4799b0a7b727 100644
--- a/arch/powerpc/platforms/pasemi/setup.c
+++ b/arch/powerpc/platforms/pasemi/setup.c
@@ -167,6 +167,7 @@ static int __init pas_setup_mce_regs(void)
dev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa00a, dev);
reg++;
}
+   pci_dev_put(dev);
 
dev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa001, NULL);
if (dev && reg+4 < MAX_MCE_REGS) {
@@ -183,6 +184,7 @@ static int __init pas_setup_mce_regs(void)
mce_regs[reg].addr = pasemi_pci_getcfgaddr(dev, 0xc1c);
reg++;
}
+   pci_dev_put(dev);
 
dev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa009, NULL);
if (dev && reg+2 < MAX_MCE_REGS) {
@@ -193,6 +195,7 @@ static int __init pas_setup_mce_regs(void)
mce_regs[reg].addr = pasemi_pci_getcfgaddr(dev, 0x214);
reg++;
}
+   pci_dev_put(dev);
 
num_mce_regs = reg;
 
-- 
2.20.1

Re: [PATCH mm-unstable v1 17/20] drm/exynos: remove FOLL_FORCE usage

2022-11-16 Thread Daniel Vetter

On Wed, Nov 16, 2022 at 11:26:56AM +0100, David Hildenbrand wrote:
> FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
> using unpin_user_pages_dirty_lock(true), the assumption is that all these
> pages are writable.
> 
> FOLL_FORCE in this case seems to be a legacy leftover. Let's just remove
> it.
> 
> Cc: Inki Dae 
> Cc: Seung-Woo Kim 
> Cc: Kyungmin Park 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Krzysztof Kozlowski 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Daniel Vetter 

Plus ack for merging through the appropriate non-drm tree.
-Daniel

> ---
>  drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
> b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> index 471fd6c8135f..e19c2ceb3759 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> @@ -477,7 +477,7 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct 
> g2d_data *g2d,
>   }
>  
>   ret = pin_user_pages_fast(start, npages,
> -   FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> +   FOLL_WRITE | FOLL_LONGTERM,
> g2d_userptr->pages);
>   if (ret != npages) {
>   DRM_DEV_ERROR(g2d->dev,
> -- 
> 2.38.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage

2022-11-16 Thread Daniel Vetter

On Wed, Nov 16, 2022 at 11:26:55AM +0100, David Hildenbrand wrote:
> FOLL_FORCE is really only for ptrace access. According to commit
> 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
> writable"), get_vaddr_frames() currently pins all pages writable as a
> workaround for issues with read-only buffers.
> 
> FOLL_FORCE, however, seems to be a legacy leftover as it predates
> commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are
> always writable"). Let's just remove it.
> 
> Once the read-only buffer issue has been resolved, FOLL_WRITE could
> again be set depending on the DMA direction.
> 
> Cc: Hans Verkuil 
> Cc: Marek Szyprowski 
> Cc: Tomasz Figa 
> Cc: Marek Szyprowski 
> Cc: Mauro Carvalho Chehab 
> Signed-off-by: David Hildenbrand 

Also code I looked at while looking at follow_pfn stuff

Reviewed-by: Daniel Vetter 

> ---
>  drivers/media/common/videobuf2/frame_vector.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/media/common/videobuf2/frame_vector.c 
> b/drivers/media/common/videobuf2/frame_vector.c
> index 542dde9d2609..062e98148c53 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> nr_frames,
>   start = untagged_addr(start);
>  
>   ret = pin_user_pages_fast(start, nr_frames,
> -   FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> +   FOLL_WRITE | FOLL_LONGTERM,
> (struct page **)(vec->ptrs));
>   if (ret > 0) {
>   vec->got_ref = true;
> -- 
> 2.38.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH mm-unstable v1 14/20] drm/etnaviv: remove FOLL_FORCE usage

2022-11-16 Thread Daniel Vetter

On Wed, Nov 16, 2022 at 11:26:53AM +0100, David Hildenbrand wrote:
> GUP now supports reliable R/O long-term pinning in COW mappings, such
> that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
> far in one corner case (DAXFS file with holes), which can be ignored
> because GUP does not support long-term pinning in fsdax (see
> check_vma_flags()).
> 
> commit cd5297b0855f ("drm/etnaviv: Use FOLL_FORCE for userptr")
> documents that FOLL_FORCE | FOLL_WRITE was really only used for reliable
> R/O pinning.
> 
> Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
> for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
> using FOLL_FORCE, which is really only for ptrace access.
> 
> Cc: Daniel Vetter 
> Cc: Lucas Stach 
> Cc: Russell King 
> Cc: Christian Gmeiner 
> Cc: David Airlie 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Daniel Vetter 

Also ack for merging through whatever tree suits best, since I guess this
should all land together.
-Daniel

> ---
>  drivers/gpu/drm/etnaviv/etnaviv_gem.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> index cc386f8a7116..efe2240945d0 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> @@ -638,6 +638,7 @@ static int etnaviv_gem_userptr_get_pages(struct 
> etnaviv_gem_object *etnaviv_obj)
>   struct page **pvec = NULL;
>   struct etnaviv_gem_userptr *userptr = _obj->userptr;
>   int ret, pinned = 0, npages = etnaviv_obj->base.size >> PAGE_SHIFT;
> + unsigned int gup_flags = FOLL_LONGTERM;
>  
>   might_lock_read(>mm->mmap_lock);
>  
> @@ -648,14 +649,15 @@ static int etnaviv_gem_userptr_get_pages(struct 
> etnaviv_gem_object *etnaviv_obj)
>   if (!pvec)
>   return -ENOMEM;
>  
> + if (!userptr->ro)
> + gup_flags |= FOLL_WRITE;
> +
>   do {
>   unsigned num_pages = npages - pinned;
>   uint64_t ptr = userptr->ptr + pinned * PAGE_SIZE;
>   struct page **pages = pvec + pinned;
>  
> - ret = pin_user_pages_fast(ptr, num_pages,
> -   FOLL_WRITE | FOLL_FORCE | 
> FOLL_LONGTERM,
> -   pages);
> + ret = pin_user_pages_fast(ptr, num_pages, gup_flags, pages);
>   if (ret < 0) {
>   unpin_user_pages(pvec, pinned);
>   kvfree(pvec);
> -- 
> 2.38.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH mm-unstable v1 13/20] media: videobuf-dma-sg: remove FOLL_FORCE usage

2022-11-16 Thread Daniel Vetter

On Wed, Nov 16, 2022 at 11:26:52AM +0100, David Hildenbrand wrote:
> GUP now supports reliable R/O long-term pinning in COW mappings, such
> that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
> far in one corner case (DAXFS file with holes), which can be ignored
> because GUP does not support long-term pinning in fsdax (see
> check_vma_flags()).
> 
> Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
> for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
> using FOLL_FORCE, which is really only for ptrace access.
> 
> Cc: Mauro Carvalho Chehab 
> Signed-off-by: David Hildenbrand 

I looked at this a while ago when going through some of the follow_pfn
stuff, so

Reviewed-by: Daniel Vetter 

> ---
>  drivers/media/v4l2-core/videobuf-dma-sg.c | 14 +-
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c 
> b/drivers/media/v4l2-core/videobuf-dma-sg.c
> index f75e5eedeee0..234e9f647c96 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-sg.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
> @@ -151,17 +151,16 @@ static void videobuf_dma_init(struct videobuf_dmabuf 
> *dma)
>  static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
>   int direction, unsigned long data, unsigned long size)
>  {
> + unsigned int gup_flags = FOLL_LONGTERM;
>   unsigned long first, last;
> - int err, rw = 0;
> - unsigned int flags = FOLL_FORCE;
> + int err;
>  
>   dma->direction = direction;
>   switch (dma->direction) {
>   case DMA_FROM_DEVICE:
> - rw = READ;
> + gup_flags |= FOLL_WRITE;
>   break;
>   case DMA_TO_DEVICE:
> - rw = WRITE;
>   break;
>   default:
>   BUG();
> @@ -177,14 +176,11 @@ static int videobuf_dma_init_user_locked(struct 
> videobuf_dmabuf *dma,
>   if (NULL == dma->pages)
>   return -ENOMEM;
>  
> - if (rw == READ)
> - flags |= FOLL_WRITE;
> -
>   dprintk(1, "init user [0x%lx+0x%lx => %lu pages]\n",
>   data, size, dma->nr_pages);
>  
> - err = pin_user_pages(data & PAGE_MASK, dma->nr_pages,
> -  flags | FOLL_LONGTERM, dma->pages, NULL);
> + err = pin_user_pages(data & PAGE_MASK, dma->nr_pages, gup_flags,
> +  dma->pages, NULL);
>  
>   if (err != dma->nr_pages) {
>   dma->nr_pages = (err >= 0) ? err : 0;
> -- 
> 2.38.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH mm-unstable v1 09/20] mm/gup: reliable R/O long-term pinning in COW mappings

2022-11-16 Thread Daniel Vetter

On Wed, Nov 16, 2022 at 11:26:48AM +0100, David Hildenbrand wrote:
> We already support reliable R/O pinning of anonymous memory. However,
> assume we end up pinning (R/O long-term) a pagecache page or the shared
> zeropage inside a writable private ("COW") mapping. The next write access
> will trigger a write-fault and replace the pinned page by an exclusive
> anonymous page in the process page tables to break COW: the pinned page no
> longer corresponds to the page mapped into the process' page table.
> 
> Now that FAULT_FLAG_UNSHARE can break COW on anything mapped into a
> COW mapping, let's properly break COW first before R/O long-term
> pinning something that's not an exclusive anon page inside a COW
> mapping. FAULT_FLAG_UNSHARE will break COW and map an exclusive anon page
> instead that can get pinned safely.
> 
> With this change, we can stop using FOLL_FORCE|FOLL_WRITE for reliable
> R/O long-term pinning in COW mappings.
> 
> With this change, the new R/O long-term pinning tests for non-anonymous
> memory succeed:
>   # [RUN] R/O longterm GUP pin ... with shared zeropage
>   ok 151 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP pin ... with memfd
>   ok 152 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP pin ... with tmpfile
>   ok 153 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP pin ... with huge zeropage
>   ok 154 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP pin ... with memfd hugetlb (2048 kB)
>   ok 155 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP pin ... with memfd hugetlb (1048576 kB)
>   ok 156 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP-fast pin ... with shared zeropage
>   ok 157 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP-fast pin ... with memfd
>   ok 158 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP-fast pin ... with tmpfile
>   ok 159 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP-fast pin ... with huge zeropage
>   ok 160 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (2048 kB)
>   ok 161 Longterm R/O pin is reliable
>   # [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (1048576 kB)
>   ok 162 Longterm R/O pin is reliable
> 
> Note 1: We don't care about short-term R/O-pinning, because they have
> snapshot semantics: they are not supposed to observe modifications that
> happen after pinning.
> 
> As one example, assume we start direct I/O to read from a page and store
> page content into a file: modifications to page content after starting
> direct I/O are not guaranteed to end up in the file. So even if we'd pin
> the shared zeropage, the end result would be as expected -- getting zeroes
> stored to the file.
> 
> Note 2: For shared mappings we'll now always fallback to the slow path to
> lookup the VMA when R/O long-term pining. While that's the necessary price
> we have to pay right now, it's actually not that bad in practice: most
> FOLL_LONGTERM users already specify FOLL_WRITE, for example, along with
> FOLL_FORCE because they tried dealing with COW mappings correctly ...
> 
> Note 3: For users that use FOLL_LONGTERM right now without FOLL_WRITE,
> such as VFIO, we'd now no longer pin the shared zeropage. Instead, we'd
> populate exclusive anon pages that we can pin. There was a concern that
> this could affect the memlock limit of existing setups.
> 
> For example, a VM running with VFIO could run into the memlock limit and
> fail to run. However, we essentially had the same behavior already in
> commit 17839856fd58 ("gup: document and work around "COW can break either
> way" issue") which got merged into some enterprise distros, and there were
> not any such complaints. So most probably, we're fine.
> 
> Signed-off-by: David Hildenbrand 

I don't think my ack is any good for the implementation, but for the
driver side semantics this sounds like what we want :-)

Acked-by: Daniel Vetter 

> ---
>  include/linux/mm.h | 27 ---
>  mm/gup.c   | 10 +-
>  mm/huge_memory.c   |  2 +-
>  mm/hugetlb.c   |  7 ---
>  4 files changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6bd2ee5872dd..e8cc838f42f9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3095,8 +3095,12 @@ static inline int vm_fault_to_errno(vm_fault_t 
> vm_fault, int foll_flags)
>   * Must be called with the (sub)page that's actually referenced via the
>   * page table entry, which might not necessarily be the head page for a
>   * PTE-mapped THP.
> + *
> + * If the vma is NULL, we're coming from the GUP-fast path and might have
> + * to fallback to the slow path just to lookup the vma.
>   */
> -static inline bool gup_must_unshare(unsigned int flags, struct page *page)
> +static inline bool gup_must_unshare(struct vm_area_struct *vma,
> + unsigned int flags, struct page *page)
>  {
>   /*
>*

[PATCH mm-unstable v1 19/20] habanalabs: remove FOLL_FORCE usage

2022-11-16 Thread David Hildenbrand

FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
using unpin_user_pages_dirty_lock(true), the assumption is that all these
pages are writable.

FOLL_FORCE in this case seems to be due to copy-and-past from other
drivers. Let's just remove it.

Acked-by: Oded Gabbay 
Cc: Oded Gabbay 
Cc: Arnd Bergmann 
Cc: Greg Kroah-Hartman 
Signed-off-by: David Hildenbrand 
---
 drivers/misc/habanalabs/common/memory.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/common/memory.c 
b/drivers/misc/habanalabs/common/memory.c
index ef28f3b37b93..e35cca96bbef 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -2312,8 +2312,7 @@ static int get_user_memory(struct hl_device *hdev, u64 
addr, u64 size,
if (!userptr->pages)
return -ENOMEM;
 
-   rc = pin_user_pages_fast(start, npages,
-FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+   rc = pin_user_pages_fast(start, npages, FOLL_WRITE | FOLL_LONGTERM,
 userptr->pages);
 
if (rc != npages) {
-- 
2.38.1

[PATCH mm-unstable v1 18/20] RDMA/hw/qib/qib_user_pages: remove FOLL_FORCE usage

2022-11-16 Thread David Hildenbrand

FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
using unpin_user_pages_dirty_lock(true), the assumption is that all these
pages are writable.

FOLL_FORCE in this case seems to be a legacy leftover. Let's just remove
it.

Cc: Dennis Dalessandro 
Cc: Jason Gunthorpe 
Cc: Leon Romanovsky 
Signed-off-by: David Hildenbrand 
---
 drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c 
b/drivers/infiniband/hw/qib/qib_user_pages.c
index f4b5f05058e4..f693bc753b6b 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -110,7 +110,7 @@ int qib_get_user_pages(unsigned long start_page, size_t 
num_pages,
for (got = 0; got < num_pages; got += ret) {
ret = pin_user_pages(start_page + got * PAGE_SIZE,
 num_pages - got,
-FOLL_LONGTERM | FOLL_WRITE | FOLL_FORCE,
+FOLL_LONGTERM | FOLL_WRITE,
 p + got, NULL);
if (ret < 0) {
mmap_read_unlock(current->mm);
-- 
2.38.1

1 2 >

1 - 100 of 117 matches

Mail list logo