Re: [tip: locking/core] lockdep: Fix lockdep recursion

2020-10-14 Thread Paul E. McKenney
On Tue, Oct 13, 2020 at 12:30:25PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 13, 2020 at 09:26:50AM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 13, 2020 at 01:25:44PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 13, 2020 at 12:44:50PM +0200, Peter Zijlstra wrote:
> > > > On Tue, Oct 13, 2020 at 12:34:06PM +0200, Peter Zijlstra wrote:
> > > > > On Mon, Oct 12, 2020 at 02:28:12PM -0700, Paul E. McKenney wrote:
> > > > > > It is certainly an accident waiting to happen.  Would something like
> > > > > > the following make sense?
> > > > > 
> > > > > Sadly no.
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index bfd38f2..52a63bc 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -4067,6 +4067,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > > > > >  
> > > > > > rnp = rdp->mynode;
> > > > > > mask = rdp->grpmask;
> > > > > > +   lockdep_off();
> > > > > > raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > > WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
> > > > > > newcpu = !(rnp->expmaskinitnext & mask);
> > > > > > @@ -4086,6 +4087,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > > > > > } else {
> > > > > > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > > > }
> > > > > > +   lockdep_on();
> > > > > > smp_mb(); /* Ensure RCU read-side usage follows above 
> > > > > > initialization. */
> > > > > >  }
> > > > > 
> > > > > This will just shut it up, but will not fix the actual problem of that
> > > > > spin-lock ending up in trace_lock_acquire() which relies on RCU which
> > > > > isn't looking.
> > > > > 
> > > > > > What we need here is to suppress tracing, not lockdep. Let me consider.
> > > > 
> > > > > We appear to have a similar problem with rcu_report_dead(), its
> > > > raw_spin_unlock()s can end up in trace_lock_release() while we just
> > > > killed RCU.
> > > 
> > > So we can deal with the explicit trace_*() calls like the below, but I
> > > really don't like it much. It also doesn't help with function tracing.
> > > This is really early/late in the hotplug cycle and should be considered
> > > entry, we shouldn't be tracing anything here.
> > > 
> > > Paul, would it be possible to use a scheme similar to IRQ/NMI for
> > > hotplug? That seems to mostly rely on atomic ops, not locks.
> > 
> > The rest of the rcu_node tree and the various grace-period/hotplug races
> > makes that question non-trivial.  I will look into it, but I have no
> > reason for optimism.
> > 
> > But there is only one way to find out...  ;-)
> 
> The aforementioned races get really ugly really fast.  So I do not
> believe that a lockless approach is a strategy to win here.
> 
> But why not use something sort of like a sequence counter, but adapted
> for local on-CPU use?  This should quiet the diagnostics for the full
> time that RCU needs its locks.  Untested patch below.
> 
> Thoughts?

Well, one problem with the previous patch is that it can result in false
negatives.  If a CPU incorrectly executes an RCU operation while offline,
but does so just when some other CPU sharing that same leaf rcu_node
structure is itself coming online or going offline, the incorrect RCU
operation will be ignored.  So this version avoids these false negatives
by putting the new ->ofl-seq field in the rcu_data structure instead of
the rcu_node structure.

Thanx, Paul
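
The sequence-count-like scheme described above can be sketched in plain
userspace C.  This is only an illustration under stated assumptions, not
the patch itself: struct cpu_rcu_state and every function name below are
hypothetical stand-ins, the field is spelled ofl_seq here, and the memory
barriers and locking that real kernel code would need are left out.  The
point it shows is that the counter is odd exactly while a CPU is inside
its own online/offline transition, and that keeping the counter per CPU
means one CPU's transition does not forgive RCU use on another CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU rcu_data structure. */
struct cpu_rcu_state {
	unsigned long ofl_seq;	/* odd while a transition is in flight */
	bool online;		/* RCU's view of whether the CPU is online */
};

/* Enter the transition window: the counter becomes odd. */
void ofl_seq_begin(struct cpu_rcu_state *s)
{
	s->ofl_seq++;
	assert(s->ofl_seq & 0x1);
}

/* Leave the transition window: the counter becomes even again. */
void ofl_seq_end(struct cpu_rcu_state *s)
{
	s->ofl_seq++;
	assert(!(s->ofl_seq & 0x1));
}

/*
 * Lockdep-style check: RCU use is accepted if the CPU is online, or if
 * this CPU's own counter is odd (it is mid-transition).
 */
bool rcu_use_is_ok(const struct cpu_rcu_state *s)
{
	return s->online || (s->ofl_seq & 0x1);
}

/* Loosely modeled on rcu_cpu_starting(): locks are taken while "offline". */
void cpu_starting(struct cpu_rcu_state *s)
{
	ofl_seq_begin(s);
	/* ...lock acquisition and release would happen here... */
	s->online = true;
	ofl_seq_end(s);
}

/* Loosely modeled on rcu_report_dead(): locks are dropped while "offline". */
void cpu_report_dead(struct cpu_rcu_state *s)
{
	ofl_seq_begin(s);
	s->online = false;
	/* ...lock releases would happen here... */
	ofl_seq_end(s);
}
```

With two independent cpu_rcu_state instances, starting a transition on one
leaves rcu_use_is_ok() false on the other, which is the false-negative case
that moving the field out of the shared leaf structure is meant to avoid.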



commit 7deaa04b02298001426730ed0e6214ac20d1a1c1
Author: Paul E. McKenney 
Date:   Tue Oct 13 12:39:23 2020 -0700

rcu: Prevent lockdep-RCU splats on lock acquisition/release

The rcu_cpu_starting() and rcu_report_dead() functions transition the
current CPU between online and offline state from an RCU perspective.
Unfortunately, this means that the rcu_cpu_starting() function's lock
acquisition and the rcu_report_dead() function's lock releases happen
while the CPU is offline from an RCU perspective, which can result in
lockdep-RCU splats about using RCU from an offline CPU.  In reality,
aside from the splats, both transitions are safe because a new grace
period cannot start until these functions release their locks.

But the false-positive splats nevertheless need to be eliminated.

This commit therefore uses sequence-count-like synchronization to forgive
use of RCU while RCU thinks a CPU is offline across the full extent of
the rcu_cpu_starting() and rcu_report_dead() functions' lock acquisitions
and releases.

One approach would have been to use the actual sequence-count primitives
provided by the Linux kernel.  Unfortunately, the resulting code looks
completely broken and wrong, and is likely to result in patches that

Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-14 Thread Mike Kravetz
On 10/14/20 11:18 AM, David Hildenbrand wrote:
> On 14.10.20 19:56, Mina Almasry wrote:
>> On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand  wrote:
>>>
>>> On 14.10.20 17:22, David Hildenbrand wrote:
>>>> Hi everybody,
>>>>
>>>> Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
>>>> with hugetlbfs and reported that this results in [1]
>>>>
>>>> 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
>>>>
>>>> 2. Any hugetlbfs allocations failing. (I assume because some accounting is wrong)
>>>>
>>>>
>>>> QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE)
>>>> to discard pages that are reported as free by a VM. The reporting
>>>> granularity is in pageblock granularity. So when the guest reports
>>>> 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
>>>>
>>>> I was also able to reproduce (also with virtio-mem, which similarly
>>>> uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
>>>> (and on v5.7.X from F32).
>>>>
>>>> Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
>>>> is broken with cgroups. I did *not* try without cgroups yet.
>>>>
>>>> Any ideas?
>>
>> Hi David,
>>
>> I may be able to dig in and take a look. How do I reproduce this
>> though? I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
>> hugetlb region?
>>
> 
> Hi Mina,
> 
> thanks for having a look. I started poking around myself but,
> being new to cgroup code, I even failed to understand why that code gets
> triggered though the hugetlb controller isn't even enabled.
> 
> I assume you at least have to make sure that there is
> a page populated (MAP_POPULATE, or read/write it). But I am not
> sure yet if a single fallocate(FALLOC_FL_PUNCH_HOLE) is
> sufficient, or if it will require a sequence of
> populate+discard(punch) (or multi-threading).

FWIW - I ran libhugetlbfs tests which do a bunch of hole punching
with (and without) hugetlb controller enabled and did not see this issue.

May need to reproduce via QEMU as below.
-- 
Mike Kravetz

> What definitely makes it trigger is via QEMU
> 
> qemu-system-x86_64 \
> -machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
> -cpu host,migratable=on \
> -m 4096 \
> -object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,size=4294967296 \
> -overcommit mem-lock=off \
> -smp 4,sockets=1,dies=1,cores=2,threads=2 \
> -nodefaults \
> -nographic \
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
> -blockdev '{"driver":"file","filename":"../Fedora-Cloud-Base-32-1.6.x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
> -blockdev '{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1 \
> -chardev stdio,nosignal,id=serial \
> -device isa-serial,chardev=serial \
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7,free-page-reporting=on
> 
> 
> However, you need a recent QEMU (>= v5.1 IIRC) and a recent kernel
> (>= v5.7) inside your guest image.
> 
> Fedora rawhide qcow2 should do: 
> https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Rawhide-20201004.n.1.x86_64.qcow2
> 
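
For anyone who wants to try the populate-then-punch sequence discussed
above without QEMU, here is a minimal sketch.  It is not a confirmed
reproducer: the thread leaves open whether a single
fallocate(FALLOC_FL_PUNCH_HOLE) is enough, and the libhugetlbfs tests did
not trigger the warning.  The function name, the memfd name and the 2M
huge page size are assumptions made for illustration; the hugetlb path
also needs huge pages reserved on the host.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* memfd_create(), fallocate(), FALLOC_FL_* */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_2M	(2UL << 20)	/* assumed huge page size */

/*
 * Populate one page of a memfd, punch it out again, and check that the
 * hole reads back as zeros.  Returns 0 on success and -1 if any step
 * fails (errno is left as set by the failing call).
 */
int populate_and_punch(int use_hugetlb)
{
	long base = sysconf(_SC_PAGESIZE);
	size_t len;
	char *p;
	int fd, ret = -1;

	if (base <= 0)
		return -1;
	len = use_hugetlb ? HUGE_2M : (size_t)base;
	assert(len >= (size_t)base);	/* always cover at least one base page */

	fd = memfd_create("punch-demo", use_hugetlb ? MFD_HUGETLB : 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out_close;

	/* MAP_POPULATE faults the page in; the memset dirties it as well. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_POPULATE, fd, 0);
	if (p == MAP_FAILED)
		goto out_close;
	memset(p, 0xaa, len);

	/* Discard the page, as QEMU does for a range reported free. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      0, (off_t)len) == 0 && p[0] == 0)
		ret = 0;

	munmap(p, len);
out_close:
	close(fd);
	return ret;
}
```

Calling populate_and_punch(1) inside a cgroup, possibly in a loop or from
several threads, would be the natural next experiment; whether that hits
the page_counter warning is exactly the open question in this thread.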


Re: [RFC PATCH v2] checkpatch: add shebang check to EXECUTE_PERMISSIONS

2020-10-14 Thread Miguel Ojeda
On Wed, Oct 14, 2020 at 8:05 PM Joe Perches  wrote:
>
> Any 'formatting off/on' marker should be tool agnostic.

Agreed, they should have used a compiler-agnostic name for the marker.

Cheers,
Miguel


Re: [PATCH 5.8 000/124] 5.8.15-rc1 review

2020-10-14 Thread Jeffrin Jose T
On Wed, 2020-10-14 at 11:56 +0200, Greg Kroah-Hartman wrote:
> On Mon, Oct 12, 2020 at 11:00:07PM +0530, Jeffrin Jose T wrote:
> >  * On Mon, 2020-10-12 at 15:30 +0200, Greg Kroah-Hartman wrote:
> > > > This is the start of the stable review cycle for the 5.8.15 release.
> > > > There are 124 patches in this series, all will be posted as a response
> > > > to this one.  If anyone has any issues with these being applied, please
> > > > let me know.
> > > 
> > > Responses should be made by Wed, 14 Oct 2020 13:31:22 +.
> > > Anything received after that time might be too late.
> > > 
> > > The whole patch series can be found in one patch at:
> > >   
> > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.gz
> > > or in the git tree and branch at:
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-
> > > stable-rc.git linux-5.8.y
> > > and the diffstat can be found below.
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > hello,
> > 
> > Compiled and booted 5.8.15-rc1+ .  No typical dmesg regression.
> > I also  have something to mention here. I saw  a warning related in
> > several  kernels which looks like the following...
> > 
> > "MDS CPU bug present and SMT on, data leak possible"
> > 
> > But now in 5.8.15-rc1+ , that warning disappeared.
> 
> Odds are your microcode/bios finally got updated on that machine,
> right?
> 
I do not think that the BIOS got updated, because the bug is still shown
to be present in 5.8.13-rc1+. So maybe the updated kernel is fixing the
bug or gives a workaround.

-- 
software engineer
rajagiri school of engineering and technology



[PATCH v2 05/20] kvm: x86/mmu: Add functions to handle changed TDP SPTEs

2020-10-14 Thread Ben Gardon
The existing bookkeeping done by KVM when a PTE is changed is spread
around several functions. This makes it difficult to remember all the
stats, bitmaps, and other subsystems that need to be updated whenever a
PTE is modified. When a non-leaf PTE is marked non-present or becomes a
leaf PTE, page table memory must also be freed. To simplify the MMU and
facilitate the use of atomic operations on SPTEs in future patches, create
functions to handle some of the bookkeeping required as a result of
a change.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c  |  39 +--
 arch/x86/kvm/mmu/mmu_internal.h |  38 +++
 arch/x86/kvm/mmu/tdp_mmu.c  | 112 
 3 files changed, 152 insertions(+), 37 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a3340ed59ad1d..8bf20723c6177 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -105,21 +105,6 @@ enum {
AUDIT_POST_SYNC
 };
 
-#undef MMU_DEBUG
-
-#ifdef MMU_DEBUG
-static bool dbg = 0;
-module_param(dbg, bool, 0644);
-
-#define pgprintk(x...) do { if (dbg) printk(x); } while (0)
-#define rmap_printk(x...) do { if (dbg) printk(x); } while (0)
-#define MMU_WARN_ON(x) WARN_ON(x)
-#else
-#define pgprintk(x...) do { } while (0)
-#define rmap_printk(x...) do { } while (0)
-#define MMU_WARN_ON(x) do { } while (0)
-#endif
-
 #define PTE_PREFETCH_NUM   8
 
 #define PT32_LEVEL_BITS 10
@@ -211,7 +196,6 @@ static u64 __read_mostly shadow_nx_mask;
 static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
-static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mmio_value;
 static u64 __read_mostly shadow_mmio_access_mask;
 static u64 __read_mostly shadow_present_mask;
@@ -287,8 +271,8 @@ static void kvm_flush_remote_tlbs_with_range(struct kvm 
*kvm,
kvm_flush_remote_tlbs(kvm);
 }
 
-static void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
-   u64 start_gfn, u64 pages)
+void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn,
+   u64 pages)
 {
struct kvm_tlb_range range;
 
@@ -324,12 +308,6 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct 
kvm_vcpu *vcpu)
return vcpu->arch.mmu == &vcpu->arch.guest_mmu;
 }
 
-static inline bool spte_ad_enabled(u64 spte)
-{
-   MMU_WARN_ON(is_mmio_spte(spte));
-   return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_DISABLED_MASK;
-}
-
 static inline bool spte_ad_need_write_protect(u64 spte)
 {
MMU_WARN_ON(is_mmio_spte(spte));
@@ -347,12 +325,6 @@ static inline u64 spte_shadow_accessed_mask(u64 spte)
return spte_ad_enabled(spte) ? shadow_accessed_mask : 0;
 }
 
-static inline u64 spte_shadow_dirty_mask(u64 spte)
-{
-   MMU_WARN_ON(is_mmio_spte(spte));
-   return spte_ad_enabled(spte) ? shadow_dirty_mask : 0;
-}
-
 static inline bool is_access_track_spte(u64 spte)
 {
return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
@@ -767,13 +739,6 @@ static bool is_accessed_spte(u64 spte)
 : !is_access_track_spte(spte);
 }
 
-static bool is_dirty_spte(u64 spte)
-{
-   u64 dirty_mask = spte_shadow_dirty_mask(spte);
-
-   return dirty_mask ? spte & dirty_mask : spte & PT_WRITABLE_MASK;
-}
-
 /* Rules for using mmu_spte_set:
  * Set the sptep from nonpresent to present.
  * Note: the sptep being assigned *must* be either not present
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 6cedf578c9a8d..c053a157e4d55 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -8,6 +8,21 @@
 
 #include 
 
+#undef MMU_DEBUG
+
+#ifdef MMU_DEBUG
+static bool dbg = 0;
+module_param(dbg, bool, 0644);
+
+#define pgprintk(x...) do { if (dbg) printk(x); } while (0)
+#define rmap_printk(x...) do { if (dbg) printk(x); } while (0)
+#define MMU_WARN_ON(x) WARN_ON(x)
+#else
+#define pgprintk(x...) do { } while (0)
+#define rmap_printk(x...) do { } while (0)
+#define MMU_WARN_ON(x) do { } while (0)
+#endif
+
 struct kvm_mmu_page {
struct list_head link;
struct hlist_node hash_link;
@@ -105,6 +120,8 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 #define ACC_USER_MASK    PT_USER_MASK
 #define ACC_ALL  (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
 
+static u64 __read_mostly shadow_dirty_mask;
+
 /* Functions for interpreting SPTEs */
 static inline bool is_mmio_spte(u64 spte)
 {
@@ -150,4 +167,25 @@ static inline bool kvm_mmu_put_root(struct kvm_mmu_page 
*sp)
 }
 
 
+static inline bool spte_ad_enabled(u64 spte)
+{
+  

[PATCH v2 04/20] kvm: x86/mmu: Allocate and free TDP MMU roots

2020-10-14 Thread Ben Gardon
The TDP MMU must be able to allocate paging structure root pages and track
the usage of those pages. Implement a similar, but separate system for root
page allocation to that of the x86 shadow paging implementation. When
future patches add synchronization model changes to allow for parallel
page faults, these pages will need to be handled differently from the
x86 shadow paging based MMU's root pages.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/mmu/mmu.c  |  29 +---
 arch/x86/kvm/mmu/mmu_internal.h |  24 +++
 arch/x86/kvm/mmu/tdp_mmu.c  | 114 
 arch/x86/kvm/mmu/tdp_mmu.h  |   5 ++
 5 files changed, 162 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6b6dbc20ce23a..e0ec1dd271a32 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -989,6 +989,7 @@ struct kvm_arch {
 * operations.
 */
bool tdp_mmu_enabled;
+   struct list_head tdp_mmu_roots;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f53d29e09367c..a3340ed59ad1d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -144,11 +144,6 @@ module_param(dbg, bool, 0644);
 #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
| shadow_x_mask | shadow_nx_mask | shadow_me_mask)
 
-#define ACC_EXEC_MASK    1
-#define ACC_WRITE_MASK   PT_WRITABLE_MASK
-#define ACC_USER_MASK    PT_USER_MASK
-#define ACC_ALL          (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
-
 /* The mask for the R/X bits in EPT PTEs */
 #define PT64_EPT_READABLE_MASK 0x1ull
 #define PT64_EPT_EXECUTABLE_MASK   0x4ull
@@ -209,7 +204,7 @@ struct kvm_shadow_walk_iterator {
 __shadow_walk_next(&(_walker), spte))
 
 static struct kmem_cache *pte_list_desc_cache;
-static struct kmem_cache *mmu_page_header_cache;
+struct kmem_cache *mmu_page_header_cache;
 static struct percpu_counter kvm_total_used_mmu_pages;
 
 static u64 __read_mostly shadow_nx_mask;
@@ -3588,9 +3583,13 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t 
*root_hpa,
return;
 
sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
-   --sp->root_count;
-   if (!sp->root_count && sp->role.invalid)
-   kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+
+   if (kvm_mmu_put_root(sp)) {
+   if (sp->tdp_mmu_page)
+   kvm_tdp_mmu_free_root(kvm, sp);
+   else if (sp->role.invalid)
+   kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+   }
 
*root_hpa = INVALID_PAGE;
 }
@@ -3680,8 +3679,16 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
hpa_t root;
unsigned i;
 
-   if (shadow_root_level >= PT64_ROOT_4LEVEL) {
-   root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
+   if (vcpu->kvm->arch.tdp_mmu_enabled) {
+   root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
+
+   if (!VALID_PAGE(root))
+   return -ENOSPC;
+   vcpu->arch.mmu->root_hpa = root;
+   } else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
+   root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level,
+ true);
+
if (!VALID_PAGE(root))
return -ENOSPC;
vcpu->arch.mmu->root_hpa = root;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 74ccbf001a42e..6cedf578c9a8d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -43,8 +43,12 @@ struct kvm_mmu_page {
 
/* Number of writes since the last time traversal visited this page.  */
atomic_t write_flooding_count;
+
+   bool tdp_mmu_page;
 };
 
+extern struct kmem_cache *mmu_page_header_cache;
+
 static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
 {
struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
@@ -96,6 +100,11 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
(PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
* PT64_LEVEL_BITS))) - 1))
 
+#define ACC_EXEC_MASK    1
+#define ACC_WRITE_MASK   PT_WRITABLE_MASK
+#define ACC_USER_MASK    PT_USER_MASK
+#define ACC_ALL          (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
+
 /* Functions for interpreting SPTEs */
 static inline bool is_mmio_spte(u64 spte)
 {
@@ -126,4 +135,19 @@ static inline kvm_pfn_t spte_to_pfn(u64 pte)
return (pte & PT64_BASE_ADDR_MASK) 

[PATCH -next] Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed"

2020-10-14 Thread Qian Cai
This reverts commit 3a3181e16fbde752007759f8759d25e0ff1fc425 which
causes memory corruptions on POWER9 NV.

Signed-off-by: Qian Cai 
---
 arch/powerpc/include/asm/pci-bridge.h |   6 --
 arch/powerpc/kernel/pci-common.c  | 114 --
 2 files changed, 120 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index d21e070352dc..d2a2a14e56f9 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -48,9 +48,6 @@ struct pci_controller_ops {
 
 /*
  * Structure of a PCI controller (host bridge)
- *
- * @irq_count: number of interrupt mappings
- * @irq_map: interrupt mappings
  */
 struct pci_controller {
struct pci_bus *bus;
@@ -130,9 +127,6 @@ struct pci_controller {
 
void *private_data;
struct npu *npu;
-
-   unsigned int irq_count;
-   unsigned int *irq_map;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index deb831f0ae13..be108616a721 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -353,115 +353,6 @@ struct pci_controller *pci_find_controller_for_domain(int 
domain_nr)
return NULL;
 }
 
-/*
- * Assumption is made on the interrupt parent. All interrupt-map
- * entries are considered to have the same parent.
- */
-static int pcibios_irq_map_count(struct pci_controller *phb)
-{
-   const __be32 *imap;
-   int imaplen;
-   struct device_node *parent;
-   u32 intsize, addrsize, parintsize, paraddrsize;
-
-   if (of_property_read_u32(phb->dn, "#interrupt-cells", &intsize))
-   return 0;
-   if (of_property_read_u32(phb->dn, "#address-cells", &addrsize))
-   return 0;
-
-   imap = of_get_property(phb->dn, "interrupt-map", &imaplen);
-   if (!imap) {
-   pr_debug("%pOF : no interrupt-map\n", phb->dn);
-   return 0;
-   }
-   imaplen /= sizeof(u32);
-   pr_debug("%pOF : imaplen=%d\n", phb->dn, imaplen);
-
-   if (imaplen < (addrsize + intsize + 1))
-   return 0;
-
-   imap += intsize + addrsize;
-   parent = of_find_node_by_phandle(be32_to_cpup(imap));
-   if (!parent) {
-   pr_debug("%pOF : no imap parent found !\n", phb->dn);
-   return 0;
-   }
-
-   if (of_property_read_u32(parent, "#interrupt-cells", &parintsize)) {
-   pr_debug("%pOF : parent lacks #interrupt-cells!\n", phb->dn);
-   return 0;
-   }
-
-   if (of_property_read_u32(parent, "#address-cells", &paraddrsize))
-   paraddrsize = 0;
-
-   return imaplen / (addrsize + intsize + 1 + paraddrsize + parintsize);
-}
-
-static void pcibios_irq_map_init(struct pci_controller *phb)
-{
-   phb->irq_count = pcibios_irq_map_count(phb);
-   if (phb->irq_count < PCI_NUM_INTX)
-   phb->irq_count = PCI_NUM_INTX;
-
-   pr_debug("%pOF : interrupt map #%d\n", phb->dn, phb->irq_count);
-
-   phb->irq_map = kcalloc(phb->irq_count, sizeof(unsigned int),
-  GFP_KERNEL);
-}
-
-static void pci_irq_map_register(struct pci_dev *pdev, unsigned int virq)
-{
-   struct pci_controller *phb = pci_bus_to_host(pdev->bus);
-   int i;
-
-   if (!phb->irq_map)
-   return;
-
-   for (i = 0; i < phb->irq_count; i++) {
-   /*
-* Look for an empty or an equivalent slot, as INTx
-* interrupts can be shared between adapters.
-*/
-   if (phb->irq_map[i] == virq || !phb->irq_map[i]) {
-   phb->irq_map[i] = virq;
-   break;
-   }
-   }
-
-   if (i == phb->irq_count)
-   pr_err("PCI:%s all platform interrupts mapped\n",
-  pci_name(pdev));
-}
-
-/*
- * Clearing the mapped interrupts will also clear the underlying
- * mappings of the ESB pages of the interrupts when under XIVE. It is
- * a requirement of PowerVM to clear all memory mappings before
- * removing a PHB.
- */
-static void pci_irq_map_dispose(struct pci_bus *bus)
-{
-   struct pci_controller *phb = pci_bus_to_host(bus);
-   int i;
-
-   if (!phb->irq_map)
-   return;
-
-   pr_debug("PCI: Clearing interrupt mappings for PHB %04x:%02x...\n",
-pci_domain_nr(bus), bus->number);
-   for (i = 0; i < phb->irq_count; i++)
-   irq_dispose_mapping(phb->irq_map[i]);
-
-   kfree(phb->irq_map);
-}
-
-void pcibios_remove_bus(struct pci_bus *bus)
-{
-   pci_irq_map_dispose(bus);
-}
-EXPORT_SYMBOL_GPL(pcibios_remove_bus);
-
 /*
  * Reads the interrupt pin to determine if interrupt is use by card.
  * If the interrupt is used, then gets the interrupt line from the
@@ -510,8 +401,6 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
 
pci_dev->irq = virq;
 
-   /* Record all 

[PATCH v2 06/20] KVM: Cache as_id in kvm_memory_slot

2020-10-14 Thread Ben Gardon
From: Peter Xu 

Cache the address space ID just like the slot ID.  It will be used in
order to fill in the dirty ring entries.

Suggested-by: Paolo Bonzini 
Suggested-by: Sean Christopherson 
Reviewed-by: Sean Christopherson 
Signed-off-by: Peter Xu 
---
 include/linux/kvm_host.h | 1 +
 virt/kvm/kvm_main.c  | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 05e3c2fb3ef78..c6f45687ba89c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -346,6 +346,7 @@ struct kvm_memory_slot {
unsigned long userspace_addr;
u32 flags;
short id;
+   u16 as_id;
 };
 
 static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot 
*memslot)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 68edd25dcb11f..2e85392131252 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1247,6 +1247,11 @@ static int kvm_delete_memslot(struct kvm *kvm,
 
memset(&new, 0, sizeof(new));
new.id = old->id;
+   /*
+* This is only for debugging purpose; it should never be referenced
+* for a removed memslot.
+*/
+   new.as_id = as_id;
 
r = kvm_set_memslot(kvm, mem, old, &new, as_id, KVM_MR_DELETE);
if (r)
@@ -1313,6 +1318,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (!mem->memory_size)
return kvm_delete_memslot(kvm, mem, &old, as_id);
 
+   new.as_id = as_id;
new.id = id;
new.base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
new.npages = mem->memory_size >> PAGE_SHIFT;
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v2 09/20] kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg

2020-10-14 Thread Ben Gardon
In order to avoid creating executable hugepages in the TDP MMU PF
handler, remove the dependency between disallowed_hugepage_adjust and
the shadow_walk_iterator. This will open the function up to being used
by the TDP MMU PF handler in a future patch.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c | 13 +++--
 arch/x86/kvm/mmu/paging_tmpl.h |  3 ++-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 05024b8ae5a4d..288b97e96202e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3243,13 +3243,12 @@ static int kvm_mmu_hugepage_adjust(struct kvm_vcpu 
*vcpu, gfn_t gfn,
return level;
 }
 
-static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it,
-  gfn_t gfn, kvm_pfn_t *pfnp, int *levelp)
+static void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
+  kvm_pfn_t *pfnp, int *levelp)
 {
int level = *levelp;
-   u64 spte = *it.sptep;
 
-   if (it.level == level && level > PG_LEVEL_4K &&
+   if (cur_level == level && level > PG_LEVEL_4K &&
is_shadow_present_pte(spte) &&
!is_large_pte(spte)) {
/*
@@ -3259,7 +3258,8 @@ static void disallowed_hugepage_adjust(struct 
kvm_shadow_walk_iterator it,
 * patching back for them into pfn the next 9 bits of
 * the address.
 */
-   u64 page_mask = KVM_PAGES_PER_HPAGE(level) - 
KVM_PAGES_PER_HPAGE(level - 1);
+   u64 page_mask = KVM_PAGES_PER_HPAGE(level) -
+   KVM_PAGES_PER_HPAGE(level - 1);
*pfnp |= gfn & page_mask;
(*levelp)--;
}
@@ -3292,7 +3292,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, 
u32 error_code,
 * large page, as the leaf could be executable.
 */
if (nx_huge_page_workaround_enabled)
-   disallowed_hugepage_adjust(it, gfn, &pfn, &level);
+   disallowed_hugepage_adjust(*it.sptep, gfn, it.level,
+  &pfn, &level);
 
base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == level)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 9a1a15f19beb6..50e268eb8e1a9 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -695,7 +695,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 * large page, as the leaf could be executable.
 */
if (nx_huge_page_workaround_enabled)
-   disallowed_hugepage_adjust(it, gw->gfn, &pfn, &level);
+   disallowed_hugepage_adjust(*it.sptep, gw->gfn, it.level,
+  &pfn, &level);
 
base_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == level)
-- 
2.28.0.1011.ga647a8990f-goog
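
The pfn adjustment performed by disallowed_hugepage_adjust() in the patch
above can be worked through with a small userspace sketch.  It assumes
x86-64 style paging (9 index bits per level, level 1 being a 4K page), as
the "next 9 bits of the address" comment in the patch suggests; the macro
and function names here are illustrative stand-ins, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 9 index bits per paging level, level 1 == 4K page. */
#define LEVEL_BITS		9
#define PAGES_PER_HPAGE(level)	(1ULL << (((level) - 1) * LEVEL_BITS))

/*
 * Mask selecting the gfn bits that index into the next-lower level when
 * a mapping at 'level' is split.  Only meaningful for level > 1.
 */
uint64_t split_page_mask(int level)
{
	assert(level > 1);
	return PAGES_PER_HPAGE(level) - PAGES_PER_HPAGE(level - 1);
}

/* Patch the selected gfn bits back into a huge-page-aligned pfn. */
uint64_t adjust_pfn(uint64_t pfn, uint64_t gfn, int level)
{
	return pfn | (gfn & split_page_mask(level));
}
```

For a 2M mapping (level 2) the mask is 512 - 1 = 0x1ff, so the low 9 bits
of the gfn are copied into the pfn; for a 1G mapping (level 3) the mask is
262144 - 512 = 0x3fe00, which selects the next 9 bits up.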



[PATCH v2 14/20] kvm: x86/mmu: Support changed pte notifier in tdp MMU

2020-10-14 Thread Ben Gardon
In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
a hook and handle the change_pte MMU notifier.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c  | 21 ++---
 arch/x86/kvm/mmu/mmu_internal.h | 29 +
 arch/x86/kvm/mmu/tdp_mmu.c  | 56 +
 arch/x86/kvm/mmu/tdp_mmu.h  |  3 ++
 4 files changed, 98 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e6ab79d8f215f..ef9ea3f45241b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -135,9 +135,6 @@ enum {
 
 #include 
 
-#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
-#define SPTE_MMU_WRITEABLE (1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
-
 /* make pte_list_desc fit well in cache line */
 #define PTE_LIST_EXT 3
 
@@ -1615,13 +1612,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct 
kvm_rmap_head *rmap_head,
pte_list_remove(rmap_head, sptep);
goto restart;
} else {
-   new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
-   new_spte |= (u64)new_pfn << PAGE_SHIFT;
-
-   new_spte &= ~PT_WRITABLE_MASK;
-   new_spte &= ~SPTE_HOST_WRITEABLE;
-
-   new_spte = mark_spte_for_access_track(new_spte);
+   new_spte = kvm_mmu_changed_pte_notifier_make_spte(
+   *sptep, new_pfn);
 
mmu_spte_clear_track_bits(sptep);
mmu_spte_set(sptep, new_spte);
@@ -1777,7 +1769,14 @@ int kvm_unmap_hva_range(struct kvm *kvm, unsigned long 
start, unsigned long end,
 
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 {
-   return kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
+   int r;
+
+   r = kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
+
+   if (kvm->arch.tdp_mmu_enabled)
+   r |= kvm_tdp_mmu_set_spte_hva(kvm, hva, &pte);
+
+   return r;
 }
 
 static int kvm_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d886fe750be38..49c3a04d2b894 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -115,6 +115,12 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
(PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
* PT64_LEVEL_BITS))) - 1))
 
+#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
+#define PT64_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1))
+#else
+#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#endif
+
 #define ACC_EXEC_MASK    1
 #define ACC_WRITE_MASK   PT_WRITABLE_MASK
 #define ACC_USER_MASK    PT_USER_MASK
@@ -132,6 +138,12 @@ static u64 __read_mostly shadow_x_mask;/* mutual 
exclusive with nx_mask */
  */
 static u64 __read_mostly shadow_acc_track_mask;
 
+#define PT_FIRST_AVAIL_BITS_SHIFT 10
+#define PT64_SECOND_AVAIL_BITS_SHIFT 54
+
+#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
+#define SPTE_MMU_WRITEABLE (1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
+
 /* Functions for interpreting SPTEs */
 static inline bool is_mmio_spte(u64 spte)
 {
@@ -264,4 +276,21 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache 
*mc);
 
 u64 mark_spte_for_access_track(u64 spte);
 
+static inline u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte,
+kvm_pfn_t new_pfn)
+{
+   u64 new_spte;
+
+   new_spte = old_spte & ~PT64_BASE_ADDR_MASK;
+   new_spte |= (u64)new_pfn << PAGE_SHIFT;
+
+   new_spte &= ~PT_WRITABLE_MASK;
+   new_spte &= ~SPTE_HOST_WRITEABLE;
+
+   new_spte = mark_spte_for_access_track(new_spte);
+
+   return new_spte;
+}
+
+
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 575970d8805a4..90abd55c89375 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -677,3 +677,59 @@ int kvm_tdp_mmu_test_age_hva(struct kvm *kvm, unsigned 
long hva)
return kvm_tdp_mmu_handle_hva_range(kvm, hva, hva + 1, 0,
test_age_gfn);
 }
+
+/*
+ * Handle the changed_pte MMU notifier for the TDP MMU.
+ * data is a pointer to the new pte_t mapping the HVA specified by the MMU
+ * notifier.
+ * Returns non-zero if a flush is needed before releasing the MMU lock.
+ */
+static int set_tdp_spte(struct kvm *kvm, struct kvm_memory_slot *slot,

[PATCH v2 12/20] kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU

2020-10-14 Thread Ben Gardon
In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
hooks to handle the invalidate range family of MMU notifiers.
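
As a rough illustration of what such a hook has to do per memslot, the
sketch below clamps an HVA range to one slot and converts it to a GFN
range. It is a user-space approximation, not the code in this patch:
struct sketch_memslot, SKETCH_PAGE_SHIFT and
sketch_hva_range_to_gfn_range() are hypothetical names, and 4 KiB pages
are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Hypothetical, simplified stand-in for a memslot. */
struct sketch_memslot {
	uint64_t base_gfn;       /* first guest frame number in the slot */
	uint64_t npages;         /* slot size in 4 KiB pages */
	uint64_t userspace_addr; /* HVA backing base_gfn */
};

/*
 * Clamp the HVA range [start, end) to the slot and convert it to the
 * GFN range [*gfn_start, *gfn_end). Returns false if the HVA range
 * does not overlap the slot at all.
 */
static bool sketch_hva_range_to_gfn_range(const struct sketch_memslot *slot,
					  uint64_t start, uint64_t end,
					  uint64_t *gfn_start,
					  uint64_t *gfn_end)
{
	uint64_t slot_end = slot->userspace_addr +
			    (slot->npages << SKETCH_PAGE_SHIFT);
	uint64_t hva_start = start > slot->userspace_addr ?
			     start : slot->userspace_addr;
	uint64_t hva_end = end < slot_end ? end : slot_end;

	assert(gfn_start && gfn_end);

	if (hva_start >= hva_end)
		return false;

	/* Round the start down and the end up to page granularity. */
	*gfn_start = slot->base_gfn +
		     ((hva_start - slot->userspace_addr) >> SKETCH_PAGE_SHIFT);
	*gfn_end = slot->base_gfn +
		   ((hva_end + SKETCH_PAGE_SIZE - 1 - slot->userspace_addr)
		    >> SKETCH_PAGE_SHIFT);
	return true;
}
```

A caller would run this for every memslot and hand each resulting GFN
range to the zap routine, skipping slots for which it returns false.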

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c |  9 -
 arch/x86/kvm/mmu/tdp_mmu.c | 80 +++---
 arch/x86/kvm/mmu/tdp_mmu.h |  2 +
 3 files changed, 85 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 421a12a247b67..00534133f99fc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1781,7 +1781,14 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long 
hva,
 int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end,
unsigned flags)
 {
-   return kvm_handle_hva_range(kvm, start, end, 0, kvm_unmap_rmapp);
+   int r;
+
+   r = kvm_handle_hva_range(kvm, start, end, 0, kvm_unmap_rmapp);
+
+   if (kvm->arch.tdp_mmu_enabled)
+   r |= kvm_tdp_mmu_zap_hva_range(kvm, start, end);
+
+   return r;
 }
 
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 78d41a1949651..9ec6c26ed6619 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -58,7 +58,7 @@ bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa)
 }
 
 static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
- gfn_t start, gfn_t end);
+ gfn_t start, gfn_t end, bool can_yield);
 
 void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root)
 {
@@ -71,7 +71,7 @@ void kvm_tdp_mmu_free_root(struct kvm *kvm, struct 
kvm_mmu_page *root)
 
list_del(&root->link);
 
-   zap_gfn_range(kvm, root, 0, max_gfn);
+   zap_gfn_range(kvm, root, 0, max_gfn, false);
 
free_page((unsigned long)root->spt);
kmem_cache_free(mmu_page_header_cache, root);
@@ -318,9 +318,14 @@ static bool tdp_mmu_iter_cond_resched(struct kvm *kvm, 
struct tdp_iter *iter)
  * non-root pages mapping GFNs strictly within that range. Returns true if
  * SPTEs have been cleared and a TLB flush is needed before releasing the
  * MMU lock.
+ * If can_yield is true, will release the MMU lock and reschedule if the
+ * scheduler needs the CPU or there is contention on the MMU lock. If this
+ * function cannot yield, it will not release the MMU lock or reschedule and
+ * the caller must ensure it does not supply too large a GFN range, or the
+ * operation can cause a soft lockup.
  */
 static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
- gfn_t start, gfn_t end)
+ gfn_t start, gfn_t end, bool can_yield)
 {
struct tdp_iter iter;
bool flush_needed = false;
@@ -341,7 +346,10 @@ static bool zap_gfn_range(struct kvm *kvm, struct 
kvm_mmu_page *root,
 
tdp_mmu_set_spte(kvm, &iter, 0);
 
-   flush_needed = !tdp_mmu_iter_cond_resched(kvm, &iter);
+   if (can_yield)
+   flush_needed = !tdp_mmu_iter_cond_resched(kvm, &iter);
+   else
+   flush_needed = true;
}
return flush_needed;
 }
@@ -364,7 +372,7 @@ bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t 
start, gfn_t end)
 */
get_tdp_mmu_root(kvm, root);
 
-   flush |= zap_gfn_range(kvm, root, start, end);
+   flush |= zap_gfn_range(kvm, root, start, end, true);
 
put_tdp_mmu_root(kvm, root);
}
@@ -502,3 +510,65 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 
error_code,
 
return ret;
 }
+
+static int kvm_tdp_mmu_handle_hva_range(struct kvm *kvm, unsigned long start,
+   unsigned long end, unsigned long data,
+   int (*handler)(struct kvm *kvm, struct kvm_memory_slot *slot,
+  struct kvm_mmu_page *root, gfn_t start,
+  gfn_t end, unsigned long data))
+{
+   struct kvm_memslots *slots;
+   struct kvm_memory_slot *memslot;
+   struct kvm_mmu_page *root;
+   int ret = 0;
+   int as_id;
+
+   for_each_tdp_mmu_root(kvm, root) {
+   /*
+* Take a reference on the root so that it cannot be freed if
+* this thread releases the MMU lock and yields in this loop.
+*/
+   get_tdp_mmu_root(kvm, root);
+
+   as_id = kvm_mmu_page_as_id(root);
+   slots = __kvm_memslots(kvm, as_id);
+   kvm_for_each_memslot(memslot, slots) {
+   unsigned long hva_start, hva_end;
+

[PATCH v2 15/20] kvm: x86/mmu: Support dirty logging for the TDP MMU

2020-10-14 Thread Ben Gardon
Dirty logging is a key feature of the KVM MMU and must be supported by
the TDP MMU. Add support for both the write protection and PML dirty
logging modes.
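
The difference between the two modes can be sketched in user space as
follows. This is an illustration only: the bit positions and the names
SKETCH_WRITABLE, SKETCH_DIRTY and sketch_clear_dirty_masked() are
assumptions made for the example, not the definitions used by KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions, assumed for this sketch only. */
#define SKETCH_WRITABLE (1ULL << 1)
#define SKETCH_DIRTY    (1ULL << 6)

/*
 * For every bit set in mask, update the matching entry of sptes[]:
 * with wrprot the writable bit is cleared (write-protection mode),
 * otherwise the dirty bit is cleared (PML-style mode).
 * Returns true if any entry changed.
 */
static bool sketch_clear_dirty_masked(uint64_t *sptes, size_t n,
				      unsigned long mask, bool wrprot)
{
	bool changed = false;

	assert(sptes);

	for (size_t i = 0; i < n && mask; i++, mask >>= 1) {
		uint64_t old, updated;

		if (!(mask & 1))
			continue;

		old = sptes[i];
		updated = wrprot ? (old & ~SKETCH_WRITABLE)
				 : (old & ~SKETCH_DIRTY);
		if (updated != old) {
			sptes[i] = updated;
			changed = true;
		}
	}
	return changed;
}
```

The return value stands in for "something changed, so a TLB flush may be
needed"; entries whose relevant bit was already clear are left alone.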

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c  |  20 ++-
 arch/x86/kvm/mmu/mmu_internal.h |   6 +
 arch/x86/kvm/mmu/tdp_iter.h |   7 +-
 arch/x86/kvm/mmu/tdp_mmu.c  | 292 +++-
 arch/x86/kvm/mmu/tdp_mmu.h  |  10 ++
 include/linux/kvm_host.h|   1 +
 virt/kvm/kvm_main.c |   6 +-
 7 files changed, 327 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ef9ea3f45241b..b2ce57761d2f1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -277,12 +277,6 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct 
kvm_vcpu *vcpu)
return vcpu->arch.mmu == &vcpu->arch.guest_mmu;
 }
 
-static inline bool spte_ad_need_write_protect(u64 spte)
-{
-   MMU_WARN_ON(is_mmio_spte(spte));
-   return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK;
-}
-
 bool is_nx_huge_page_enabled(void)
 {
return READ_ONCE(nx_huge_pages);
@@ -1483,6 +1477,9 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm 
*kvm,
 {
struct kvm_rmap_head *rmap_head;
 
+   if (kvm->arch.tdp_mmu_enabled)
+   kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
+   slot->base_gfn + gfn_offset, mask, true);
while (mask) {
rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + 
__ffs(mask),
  PG_LEVEL_4K, slot);
@@ -1509,6 +1506,9 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 {
struct kvm_rmap_head *rmap_head;
 
+   if (kvm->arch.tdp_mmu_enabled)
+   kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
+   slot->base_gfn + gfn_offset, mask, false);
while (mask) {
rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + 
__ffs(mask),
  PG_LEVEL_4K, slot);
@@ -5853,6 +5853,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
spin_lock(&kvm->mmu_lock);
flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
+   if (kvm->arch.tdp_mmu_enabled)
+   flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_4K);
spin_unlock(&kvm->mmu_lock);
 
/*
@@ -5941,6 +5943,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 
spin_lock(&kvm->mmu_lock);
flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
+   if (kvm->arch.tdp_mmu_enabled)
+   flush |= kvm_tdp_mmu_clear_dirty_slot(kvm, memslot);
spin_unlock(&kvm->mmu_lock);
 
/*
@@ -5962,6 +5966,8 @@ void kvm_mmu_slot_largepage_remove_write_access(struct 
kvm *kvm,
spin_lock(&kvm->mmu_lock);
flush = slot_handle_large_level(kvm, memslot, slot_rmap_write_protect,
false);
+   if (kvm->arch.tdp_mmu_enabled)
+   flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_2M);
spin_unlock(&kvm->mmu_lock);
 
if (flush)
@@ -5976,6 +5982,8 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
 
spin_lock(&kvm->mmu_lock);
flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false);
+   if (kvm->arch.tdp_mmu_enabled)
+   flush |= kvm_tdp_mmu_slot_set_dirty(kvm, memslot);
spin_unlock(&kvm->mmu_lock);
 
if (flush)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 49c3a04d2b894..a7230532bb845 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -232,6 +232,12 @@ static inline bool is_executable_pte(u64 spte)
return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
 }
 
+static inline bool spte_ad_need_write_protect(u64 spte)
+{
+   MMU_WARN_ON(is_mmio_spte(spte));
+   return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK;
+}
+
 void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn,
u64 pages);
 
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 884ed2c70bfed..47170d0dc98e5 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -41,11 +41,14 @@ struct tdp_iter {
  * Iterates over every SPTE mapping the GFN range [start, end) in a
  * preorder traversal.
  */
-#define for_each_tdp_pte(iter, root, root_level, start, end) \
-   for (tdp_iter_start(&iter, root, root_level, PG_LEVEL_4K, start); \
+#define for_each_tdp_pte_min_level(iter, root, root_level, min_level, start, 
end) \
+   for 

[PATCH v2 19/20] kvm: x86/mmu: Don't clear write flooding count for direct roots

2020-10-14 Thread Ben Gardon
Direct roots don't have a write flooding count because the guest can't
affect that paging structure. Thus there's no need to clear the write
flooding count on a fast CR3 switch for direct roots.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2e8bf8d19c35a..3935c10278736 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4266,7 +4266,13 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, 
gpa_t new_pgd,
 */
vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 
-   
__clear_sp_write_flooding_count(to_shadow_page(vcpu->arch.mmu->root_hpa));
+   /*
+* If this is a direct root page, it doesn't have a write flooding
+* count. Otherwise, clear the write flooding count.
+*/
+   if (!new_role.direct)
+   __clear_sp_write_flooding_count(
+   to_shadow_page(vcpu->arch.mmu->root_hpa));
 }
 
 void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, bool skip_tlb_flush,
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v2 16/20] kvm: x86/mmu: Support disabling dirty logging for the tdp MMU

2020-10-14 Thread Ben Gardon
Dirty logging ultimately breaks down MMU mappings to 4k granularity.
When dirty logging is no longer needed, these granular mappings
represent a useless performance penalty. When dirty logging is disabled,
search the paging structure for mappings that could be re-constituted
into a large page mapping. Zap those mappings so that they can be
faulted in again at a higher mapping level.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c |  3 ++
 arch/x86/kvm/mmu/tdp_mmu.c | 59 ++
 arch/x86/kvm/mmu/tdp_mmu.h |  2 ++
 3 files changed, 64 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b2ce57761d2f1..8fcf5e955c475 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5918,6 +5918,9 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
spin_lock(&kvm->mmu_lock);
slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
 kvm_mmu_zap_collapsible_spte, true);
+
+   if (kvm->arch.tdp_mmu_enabled)
+   kvm_tdp_mmu_zap_collapsible_sptes(kvm, memslot);
spin_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 099c7d68aeb1d..94624cc1df84c 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1019,3 +1019,62 @@ bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct 
kvm_memory_slot *slot)
return spte_set;
 }
 
+/*
+ * Clear non-leaf entries (and free associated page tables) which could
+ * be replaced by large mappings, for GFNs within the slot.
+ */
+static void zap_collapsible_spte_range(struct kvm *kvm,
+  struct kvm_mmu_page *root,
+  gfn_t start, gfn_t end)
+{
+   struct tdp_iter iter;
+   kvm_pfn_t pfn;
+   bool spte_set = false;
+
+   tdp_root_for_each_pte(iter, root, start, end) {
+   if (!is_shadow_present_pte(iter.old_spte) ||
+   is_last_spte(iter.old_spte, iter.level))
+   continue;
+
+   pfn = spte_to_pfn(iter.old_spte);
+   if (kvm_is_reserved_pfn(pfn) ||
+   !PageTransCompoundMap(pfn_to_page(pfn)))
+   continue;
+
+   tdp_mmu_set_spte(kvm, &iter, 0);
+   spte_set = true;
+
+   spte_set = !tdp_mmu_iter_cond_resched(kvm, &iter);
+   }
+
+   if (spte_set)
+   kvm_flush_remote_tlbs(kvm);
+}
+
+/*
+ * Clear non-leaf entries (and free associated page tables) which could
+ * be replaced by large mappings, for GFNs within the slot.
+ */
+void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
+  const struct kvm_memory_slot *slot)
+{
+   struct kvm_mmu_page *root;
+   int root_as_id;
+
+   for_each_tdp_mmu_root(kvm, root) {
+   root_as_id = kvm_mmu_page_as_id(root);
+   if (root_as_id != slot->as_id)
+   continue;
+
+   /*
+* Take a reference on the root so that it cannot be freed if
+* this thread releases the MMU lock and yields in this loop.
+*/
+   get_tdp_mmu_root(kvm, root);
+
+   zap_collapsible_spte_range(kvm, root, slot->base_gfn,
+  slot->base_gfn + slot->npages);
+
+   put_tdp_mmu_root(kvm, root);
+   }
+}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index add8bb97c56dd..dc4cdc5cc29f5 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -38,4 +38,6 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
   gfn_t gfn, unsigned long mask,
   bool wrprot);
 bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
+  const struct kvm_memory_slot *slot);
 #endif /* __KVM_X86_MMU_TDP_MMU_H */
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v2 18/20] kvm: x86/mmu: Support MMIO in the TDP MMU

2020-10-14 Thread Ben Gardon
In order to support MMIO, KVM must be able to walk the TDP paging
structures to find mappings for a given GFN. Support this walk for
the TDP MMU.
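
The split this patch makes (first collect the SPTEs along the walk, then
check them) can be sketched in user space as below. This is a simplified
illustration: SKETCH_PRESENT, the single rsvd_mask and
sketch_check_walk() are assumptions for the example, whereas the real
code uses per-level reserved-bit checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative "present" bit, assumed for this sketch only. */
#define SKETCH_PRESENT (1ULL << 0)

/*
 * sptes[level - 1] holds the entry recorded at each level of a walk,
 * from root down to leaf. Check every present entry against rsvd_mask,
 * stopping at the first non-present entry, and report the leaf entry
 * through *leaf_spte. Returns true if a reserved bit was seen.
 */
static bool sketch_check_walk(const uint64_t *sptes, int root, int leaf,
			      uint64_t rsvd_mask, uint64_t *leaf_spte)
{
	bool reserved = false;

	assert(sptes && leaf_spte && leaf >= 1 && root >= leaf);

	for (int level = root; level >= leaf; level--) {
		if (!(sptes[level - 1] & SKETCH_PRESENT))
			break;
		/* Bitwise OR keeps the loop body branch-light. */
		reserved |= (sptes[level - 1] & rsvd_mask) != 0;
	}

	*leaf_spte = sptes[leaf - 1];
	return reserved;
}
```

Because the check only consumes an array of entries, the same loop can
serve whichever walker filled that array.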

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

v2: Thanks to Dan Carpenter and kernel test robot for finding that root
was used uninitialized in get_mmio_spte.

Signed-off-by: Ben Gardon 
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
---
 arch/x86/kvm/mmu/mmu.c | 70 ++
 arch/x86/kvm/mmu/tdp_mmu.c | 18 ++
 arch/x86/kvm/mmu/tdp_mmu.h |  2 ++
 3 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 58d2412817c87..2e8bf8d19c35a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3853,54 +3853,82 @@ static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, 
u64 addr, bool direct)
return vcpu_match_mmio_gva(vcpu, addr);
 }
 
-/* return true if reserved bit is detected on spte. */
-static bool
-walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
+/*
+ * Return the level of the lowest level SPTE added to sptes.
+ * That SPTE may be non-present.
+ */
+static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes)
 {
struct kvm_shadow_walk_iterator iterator;
-   u64 sptes[PT64_ROOT_MAX_LEVEL], spte = 0ull;
-   struct rsvd_bits_validate *rsvd_check;
-   int root, leaf;
-   bool reserved = false;
+   int leaf = vcpu->arch.mmu->root_level;
+   u64 spte;
 
-   rsvd_check = &vcpu->arch.mmu->shadow_zero_check;
 
walk_shadow_page_lockless_begin(vcpu);
 
-   for (shadow_walk_init(&iterator, vcpu, addr),
-leaf = root = iterator.level;
+   for (shadow_walk_init(&iterator, vcpu, addr);
 shadow_walk_okay(&iterator);
 __shadow_walk_next(&iterator, spte)) {
+   leaf = iterator.level;
spte = mmu_spte_get_lockless(iterator.sptep);
 
sptes[leaf - 1] = spte;
-   leaf--;
 
if (!is_shadow_present_pte(spte))
break;
 
+   }
+
+   walk_shadow_page_lockless_end(vcpu);
+
+   return leaf;
+}
+
+/* return true if reserved bit is detected on spte. */
+static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
+{
+   u64 sptes[PT64_ROOT_MAX_LEVEL];
+   struct rsvd_bits_validate *rsvd_check;
+   int root = vcpu->arch.mmu->root_level;
+   int leaf;
+   int level;
+   bool reserved = false;
+
+   if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) {
+   *sptep = 0ull;
+   return reserved;
+   }
+
+   if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
+   leaf = kvm_tdp_mmu_get_walk(vcpu, addr, sptes);
+   else
+   leaf = get_walk(vcpu, addr, sptes);
+
+   rsvd_check = &vcpu->arch.mmu->shadow_zero_check;
+
+   for (level = root; level >= leaf; level--) {
+   if (!is_shadow_present_pte(sptes[level - 1]))
+   break;
/*
 * Use a bitwise-OR instead of a logical-OR to aggregate the
 * reserved bit and EPT's invalid memtype/XWR checks to avoid
 * adding a Jcc in the loop.
 */
-   reserved |= __is_bad_mt_xwr(rsvd_check, spte) |
-   __is_rsvd_bits_set(rsvd_check, spte, 
iterator.level);
+   reserved |= __is_bad_mt_xwr(rsvd_check, sptes[level - 1]) |
+   __is_rsvd_bits_set(rsvd_check, sptes[level - 1],
+  level);
}
 
-   walk_shadow_page_lockless_end(vcpu);
-
if (reserved) {
pr_err("%s: detect reserved bits on spte, addr 0x%llx, dump 
hierarchy:\n",
   __func__, addr);
-   while (root > leaf) {
+   for (level = root; level >= leaf; level--)
pr_err("-- spte 0x%llx level %d.\n",
-  sptes[root - 1], root);
-   root--;
-   }
+  sptes[level - 1], level);
}
 
-   *sptep = spte;
+   *sptep = sptes[leaf - 1];
+
return reserved;
 }
 
@@ -3912,7 +3940,7 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, 
u64 addr, bool direct)
if (mmio_info_in_cache(vcpu, addr, direct))
return RET_PF_EMULATE;
 
-   reserved = walk_shadow_page_get_mmio_spte(vcpu, addr, );
+   reserved = get_mmio_spte(vcpu, addr, );
if (WARN_ON(reserved))
return -EINVAL;
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index c471f2e977d11..b1515b89606e1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ 

[PATCH v2 20/20] kvm: x86/mmu: NX largepage recovery for TDP MMU

2020-10-14 Thread Ben Gardon
When KVM maps a largepage backed region at a lower level in order to
make it executable (i.e. NX large page shattering), it reduces the TLB
performance of that region. In order to avoid making this degradation
permanent, KVM must periodically reclaim shattered NX largepages by
zapping them and allowing them to be rebuilt in the page fault handler.

With this patch, the TDP MMU does not respect KVM's rate limiting on
reclaim. It traverses the entire TDP structure every time. This will be
addressed in a future patch.
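
For orientation, the GFN range zapped for one shattered page depends on
the level of the page table page being reclaimed. The sketch below shows
that arithmetic under the usual x86 assumption of 512 entries per table
(9 bits per level); sketch_pages_per_hpage() and sketch_zap_range() are
hypothetical helper names, not functions from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 4 KiB pages covered by one entry at the given level. */
static uint64_t sketch_pages_per_hpage(int level)
{
	assert(level >= 1 && level <= 5);
	return 1ULL << ((level - 1) * 9);
}

/*
 * Compute the GFN range [*start, *end) to zap for a page table page
 * that starts at gfn and sits at the given level.
 */
static void sketch_zap_range(uint64_t gfn, int level,
			     uint64_t *start, uint64_t *end)
{
	assert(start && end);
	*start = gfn;
	*end = gfn + sketch_pages_per_hpage(level);
}
```

For example, a level-2 page covers 512 GFNs (2 MiB) and a level-3 page
covers 262144 GFNs (1 GiB).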

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c  | 13 +
 arch/x86/kvm/mmu/mmu_internal.h |  3 +++
 arch/x86/kvm/mmu/tdp_mmu.c  |  6 ++
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3935c10278736..5c8a35e4c872b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1030,7 +1030,7 @@ static void account_shadowed(struct kvm *kvm, struct 
kvm_mmu_page *sp)
kvm_mmu_gfn_disallow_lpage(slot, gfn);
 }
 
-static void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
if (sp->lpage_disallowed)
return;
@@ -1058,7 +1058,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct 
kvm_mmu_page *sp)
kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
 
-static void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
--kvm->stat.nx_lpage_splits;
sp->lpage_disallowed = false;
@@ -6362,8 +6362,13 @@ static void kvm_recover_nx_lpages(struct kvm *kvm)
  struct kvm_mmu_page,
  lpage_disallowed_link);
WARN_ON_ONCE(!sp->lpage_disallowed);
-   kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
-   WARN_ON_ONCE(sp->lpage_disallowed);
+   if (sp->tdp_mmu_page)
+   kvm_tdp_mmu_zap_gfn_range(kvm, sp->gfn,
+   sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level));
+   else {
+   kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+   WARN_ON_ONCE(sp->lpage_disallowed);
+   }
 
if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
kvm_mmu_commit_zap_page(kvm, &invalid_list);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index a7230532bb845..88899a2666d86 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -299,4 +299,7 @@ static inline u64 
kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte,
 }
 
 
+void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index b1515b89606e1..2949759c6aa84 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -289,6 +289,9 @@ static void __handle_changed_spte(struct kvm *kvm, int 
as_id, gfn_t gfn,
 
list_del(&sp->link);
 
+   if (sp->lpage_disallowed)
+   unaccount_huge_nx_page(kvm, sp);
+
for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
old_child_spte = *(pt + i);
*(pt + i) = 0;
@@ -567,6 +570,9 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 
error_code,
new_spte = make_nonleaf_spte(child_pt,
 !shadow_accessed_mask);
 
+   if (huge_page_disallowed && req_level >= iter.level)
+   account_huge_nx_page(vcpu->kvm, sp);
+
tdp_mmu_set_spte(vcpu->kvm, &iter, new_spte);
}
}
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v2 17/20] kvm: x86/mmu: Support write protection for nesting in tdp MMU

2020-10-14 Thread Ben Gardon
To support nested virtualization, KVM will sometimes need to write
protect pages which are part of a shadowed paging structure or are not
writable in the shadowed paging structure. Add a function to write
protect GFN mappings for this purpose.
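
The bit manipulation involved can be sketched in user space as follows.
The bit positions mirror the x86 writable bit (bit 1) and the
SPTE_MMU_WRITEABLE definition earlier in this series (bit 11), but the
SKETCH_* names and sketch_write_protect() are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions for this sketch. */
#define SKETCH_WRITABLE      (1ULL << 1)
#define SKETCH_MMU_WRITEABLE (1ULL << 11)

/*
 * Remove write access from one leaf entry. Clearing the software
 * "MMU writeable" bit as well records that the entry must not be made
 * writable again without going through the full fault path.
 * Returns true if the entry changed, i.e. a TLB flush would be needed.
 */
static bool sketch_write_protect(uint64_t *spte)
{
	assert(spte);

	if (!(*spte & SKETCH_WRITABLE))
		return false;

	*spte &= ~(SKETCH_WRITABLE | SKETCH_MMU_WRITEABLE);
	return true;
}
```

Calling it a second time on the same entry returns false, which matches
the idea that a flush is only needed when something was actually changed.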

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c |  4 +++
 arch/x86/kvm/mmu/tdp_mmu.c | 50 ++
 arch/x86/kvm/mmu/tdp_mmu.h |  3 +++
 3 files changed, 57 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8fcf5e955c475..58d2412817c87 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1553,6 +1553,10 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
write_protected |= __rmap_write_protect(kvm, rmap_head, true);
}
 
+   if (kvm->arch.tdp_mmu_enabled)
+   write_protected |=
+   kvm_tdp_mmu_write_protect_gfn(kvm, slot, gfn);
+
return write_protected;
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 94624cc1df84c..c471f2e977d11 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1078,3 +1078,53 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
put_tdp_mmu_root(kvm, root);
}
 }
+
+/*
+ * Removes write access on the last level SPTE mapping this GFN and unsets the
+ * SPTE_MMU_WRITEABLE bit to ensure future writes continue to be intercepted.
+ * Returns true if an SPTE was set and a TLB flush is needed.
+ */
+static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
+ gfn_t gfn)
+{
+   struct tdp_iter iter;
+   u64 new_spte;
+   bool spte_set = false;
+
+   tdp_root_for_each_leaf_pte(iter, root, gfn, gfn + 1) {
+   if (!is_writable_pte(iter.old_spte))
+   break;
+
+   new_spte = iter.old_spte &
+   ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
+
+   tdp_mmu_set_spte(kvm, &iter, new_spte);
+   spte_set = true;
+   }
+
+   return spte_set;
+}
+
+/*
+ * Removes write access on the last level SPTE mapping this GFN and unsets the
+ * SPTE_MMU_WRITEABLE bit to ensure future writes continue to be intercepted.
+ * Returns true if an SPTE was set and a TLB flush is needed.
+ */
+bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
+  struct kvm_memory_slot *slot, gfn_t gfn)
+{
+   struct kvm_mmu_page *root;
+   int root_as_id;
+   bool spte_set = false;
+
+   lockdep_assert_held(&kvm->mmu_lock);
+   for_each_tdp_mmu_root(kvm, root) {
+   root_as_id = kvm_mmu_page_as_id(root);
+   if (root_as_id != slot->as_id)
+   continue;
+
+   spte_set = write_protect_gfn(kvm, root, gfn) || spte_set;
+   }
+   return spte_set;
+}
+
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index dc4cdc5cc29f5..b66283db43221 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -40,4 +40,7 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
   const struct kvm_memory_slot *slot);
+
+bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
+  struct kvm_memory_slot *slot, gfn_t gfn);
 #endif /* __KVM_X86_MMU_TDP_MMU_H */
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v2 11/20] kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU

2020-10-14 Thread Ben Gardon
Attach struct kvm_mmu_pages to every page in the TDP MMU to track
metadata, facilitate NX reclaim, and enable improved parallelism of MMU
operations in future patches.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/include/asm/kvm_host.h |  4 
 arch/x86/kvm/mmu/tdp_mmu.c  | 13 ++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e0ec1dd271a32..2568dcd134156 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -989,7 +989,11 @@ struct kvm_arch {
 * operations.
 */
bool tdp_mmu_enabled;
+
+   /* List of struct tdp_mmu_pages being used as roots */
struct list_head tdp_mmu_roots;
+   /* List of struct tdp_mmu_pages not being used as roots */
+   struct list_head tdp_mmu_pages;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f92c12c4ce31a..78d41a1949651 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -34,6 +34,7 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
kvm->arch.tdp_mmu_enabled = true;
 
INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
+   INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
 }
 
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
@@ -188,6 +189,7 @@ static void __handle_changed_spte(struct kvm *kvm, int 
as_id, gfn_t gfn,
bool is_leaf = is_present && is_last_spte(new_spte, level);
bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
u64 *pt;
+   struct kvm_mmu_page *sp;
u64 old_child_spte;
int i;
 
@@ -253,6 +255,9 @@ static void __handle_changed_spte(struct kvm *kvm, int 
as_id, gfn_t gfn,
 */
if (was_present && !was_leaf && (pfn_changed || !is_present)) {
pt = spte_to_child_pt(old_spte, level);
+   sp = sptep_to_sp(pt);
+
+   list_del(&sp->link);
 
for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
old_child_spte = *(pt + i);
@@ -266,6 +271,7 @@ static void __handle_changed_spte(struct kvm *kvm, int 
as_id, gfn_t gfn,
   KVM_PAGES_PER_HPAGE(level));
 
free_page((unsigned long)pt);
+   kmem_cache_free(mmu_page_header_cache, sp);
}
 }
 
@@ -434,8 +440,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 
error_code,
bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
struct kvm_mmu *mmu = vcpu->arch.mmu;
struct tdp_iter iter;
-   struct kvm_mmu_memory_cache *pf_pt_cache =
-   &vcpu->arch.mmu_shadow_page_cache;
+   struct kvm_mmu_page *sp;
u64 *child_pt;
u64 new_spte;
int ret;
@@ -479,7 +484,9 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 
error_code,
}
 
if (!is_shadow_present_pte(iter.old_spte)) {
-   child_pt = kvm_mmu_memory_cache_alloc(pf_pt_cache);
+   sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level);
+   list_add(&sp->link, &vcpu->kvm->arch.tdp_mmu_pages);
+   child_pt = sp->spt;
clear_page(child_pt);
new_spte = make_nonleaf_spte(child_pt,
 !shadow_accessed_mask);
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v2 10/20] kvm: x86/mmu: Add TDP MMU PF handler

2020-10-14 Thread Ben Gardon
Add functions to handle page faults in the TDP MMU. These page faults
are currently handled in much the same way as the x86 shadow paging
based MMU; however, the ordering of some operations is slightly
different. Future patches will add eager NX splitting, a fast page fault
handler, and parallel page faults.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c  |  82 +++--
 arch/x86/kvm/mmu/mmu_internal.h |  59 +++
 arch/x86/kvm/mmu/tdp_mmu.c  | 124 
 arch/x86/kvm/mmu/tdp_mmu.h  |   5 ++
 4 files changed, 212 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 288b97e96202e..421a12a247b67 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -141,23 +141,6 @@ enum {
 /* make pte_list_desc fit well in cache line */
 #define PTE_LIST_EXT 3
 
-/*
- * Return values of handle_mmio_page_fault, mmu.page_fault, and 
fast_page_fault().
- *
- * RET_PF_RETRY: let CPU fault again on the address.
- * RET_PF_EMULATE: mmio page fault, emulate the instruction directly.
- * RET_PF_INVALID: the spte is invalid, let the real page fault path update it.
- * RET_PF_FIXED: The faulting entry has been fixed.
- * RET_PF_SPURIOUS: The faulting entry was already fixed, e.g. by another vCPU.
- */
-enum {
-   RET_PF_RETRY = 0,
-   RET_PF_EMULATE,
-   RET_PF_INVALID,
-   RET_PF_FIXED,
-   RET_PF_SPURIOUS,
-};
-
 struct pte_list_desc {
u64 *sptes[PTE_LIST_EXT];
struct pte_list_desc *more;
@@ -195,19 +178,11 @@ static struct percpu_counter kvm_total_used_mmu_pages;
 static u64 __read_mostly shadow_nx_mask;
 static u64 __read_mostly shadow_x_mask;/* mutual exclusive with 
nx_mask */
 static u64 __read_mostly shadow_user_mask;
-static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_mmio_value;
 static u64 __read_mostly shadow_mmio_access_mask;
 static u64 __read_mostly shadow_present_mask;
 static u64 __read_mostly shadow_me_mask;
 
-/*
- * SPTEs used by MMUs without A/D bits are marked with SPTE_AD_DISABLED_MASK;
- * shadow_acc_track_mask is the set of bits to be cleared in non-accessed
- * pages.
- */
-static u64 __read_mostly shadow_acc_track_mask;
-
 /*
  * The mask/shift to use for saving the original R/X bits when marking the PTE
  * as not-present for access tracking purposes. We do not save the W bit as the
@@ -314,22 +289,11 @@ static inline bool spte_ad_need_write_protect(u64 spte)
return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK;
 }
 
-static bool is_nx_huge_page_enabled(void)
+bool is_nx_huge_page_enabled(void)
 {
return READ_ONCE(nx_huge_pages);
 }
 
-static inline u64 spte_shadow_accessed_mask(u64 spte)
-{
-   MMU_WARN_ON(is_mmio_spte(spte));
-   return spte_ad_enabled(spte) ? shadow_accessed_mask : 0;
-}
-
-static inline bool is_access_track_spte(u64 spte)
-{
-   return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
-}
-
 /*
  * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
  * the memslots generation and is derived as follows:
@@ -377,7 +341,7 @@ static u64 get_mmio_spte_generation(u64 spte)
return gen;
 }
 
-static u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
+u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
 {
 
u64 gen = kvm_vcpu_memslots(vcpu)->generation & MMIO_SPTE_GEN_MASK;
@@ -2468,7 +2432,7 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
__shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
+u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
 {
u64 spte;
 
@@ -2886,15 +2850,10 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 E820_TYPE_RAM);
 }
 
-/* Bits which may be returned by set_spte() */
-#define SET_SPTE_WRITE_PROTECTED_PT    BIT(0)
-#define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1)
-#define SET_SPTE_SPURIOUS  BIT(2)
-
-static int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
-gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
-bool can_unsync, bool host_writable, bool ad_disabled,
-u64 *new_spte)
+int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
+ gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
+ bool can_unsync, bool host_writable, bool ad_disabled,
+ u64 *new_spte)
 {
u64 spte = 0;
int ret = 0;
@@ -3187,9 +3146,9 @@ static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn,
return level;
 }
 

[PATCH v2 07/20] kvm: x86/mmu: Support zapping SPTEs in the TDP MMU

2020-10-14 Thread Ben Gardon
Add functions to zap SPTEs to the TDP MMU. These are needed to tear down
TDP MMU roots properly and implement other MMU functions which require
tearing down mappings. Future patches will add functions to populate the
page tables, but as of this patch there will not be any work for these
functions to do.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c  |  15 +
 arch/x86/kvm/mmu/tdp_iter.c |   5 ++
 arch/x86/kvm/mmu/tdp_iter.h |   1 +
 arch/x86/kvm/mmu/tdp_mmu.c  | 109 
 arch/x86/kvm/mmu/tdp_mmu.h  |   2 +
 5 files changed, 132 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8bf20723c6177..337ab6823e312 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5787,6 +5787,10 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
kvm_reload_remote_mmus(kvm);
 
kvm_zap_obsolete_pages(kvm);
+
+   if (kvm->arch.tdp_mmu_enabled)
+   kvm_tdp_mmu_zap_all(kvm);
+
spin_unlock(&kvm->mmu_lock);
 }
 
@@ -5827,6 +5831,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
int i;
+   bool flush;
 
spin_lock(&kvm->mmu_lock);
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
@@ -5846,6 +5851,12 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
}
}
 
+   if (kvm->arch.tdp_mmu_enabled) {
+   flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end);
+   if (flush)
+   kvm_flush_remote_tlbs(kvm);
+   }
+
spin_unlock(&kvm->mmu_lock);
 }
 
@@ -6012,6 +6023,10 @@ void kvm_mmu_zap_all(struct kvm *kvm)
}
 
kvm_mmu_commit_zap_page(kvm, &invalid_list);
+
+   if (kvm->arch.tdp_mmu_enabled)
+   kvm_tdp_mmu_zap_all(kvm);
+
spin_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index b07e9f0c5d4aa..701eb753b701e 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -174,3 +174,8 @@ void tdp_iter_refresh_walk(struct tdp_iter *iter)
   iter->root_level, iter->min_level, goal_gfn);
 }
 
+u64 *tdp_iter_root_pt(struct tdp_iter *iter)
+{
+   return iter->pt_path[iter->root_level - 1];
+}
+
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index d629a53e1b73f..884ed2c70bfed 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -52,5 +52,6 @@ void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level,
int min_level, gfn_t goal_gfn);
 void tdp_iter_next(struct tdp_iter *iter);
 void tdp_iter_refresh_walk(struct tdp_iter *iter);
+u64 *tdp_iter_root_pt(struct tdp_iter *iter);
 
 #endif /* __KVM_X86_MMU_TDP_ITER_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f2bd3a6928ce9..9b5cd4a832f1a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -56,8 +56,13 @@ bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa)
return sp->tdp_mmu_page && sp->root_count;
 }
 
+static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
+ gfn_t start, gfn_t end);
+
 void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root)
 {
+   gfn_t max_gfn = 1ULL << (boot_cpu_data.x86_phys_bits - PAGE_SHIFT);
+
lockdep_assert_held(&kvm->mmu_lock);
 
WARN_ON(root->root_count);
@@ -65,6 +70,8 @@ void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root)
 
list_del(&root->link);
 
+   zap_gfn_range(kvm, root, 0, max_gfn);
+
free_page((unsigned long)root->spt);
kmem_cache_free(mmu_page_header_cache, root);
 }
@@ -155,6 +162,11 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
u64 old_spte, u64 new_spte, int level);
 
+static int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
+{
+   return sp->role.smm ? 1 : 0;
+}
+
 /**
  * handle_changed_spte - handle bookkeeping associated with an SPTE change
  * @kvm: kvm instance
@@ -262,3 +274,100 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 {
__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level);
 }
+
+static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
+   u64 new_spte)
+{
+   u64 *root_pt = tdp_iter_root_pt(iter);
+   struct kvm_mmu_page *root = sptep_to_sp(root_pt);
+   int as_id = kvm_mmu_page_as_id(root);
+
+   *iter->sptep = new_spte;
+
+   handle_changed_spte(kvm, as_id, 

[PATCH v2 02/20] kvm: x86/mmu: Introduce tdp_iter

2020-10-14 Thread Ben Gardon
The TDP iterator implements a pre-order traversal of a TDP paging
structure. This iterator will be used in future patches to create
an efficient implementation of the KVM MMU for the TDP case.
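
As a rough illustration of what "pre-order" means here, the following is a
minimal userspace sketch over a toy page table. It is not the tdp_iter code
from this patch: the toy_* names, the 4-entry tables, and the flag values are
all assumptions made for the example, and the recursion stands in for
whatever iteration state the real iterator keeps.

```c
#include <stddef.h>

#define TOY_ENTRIES 4		/* assumption: real x86 tables hold 512 entries */
#define TOY_PRESENT 0x1u	/* hypothetical "entry is valid" flag */
#define TOY_LEAF    0x2u	/* hypothetical "entry maps a page" flag */

/* Hypothetical toy entry: flags plus a pointer to the child table, if any. */
struct toy_pte {
	unsigned int flags;
	struct toy_pte *child;
};

typedef void (*toy_visit_fn)(const struct toy_pte *pte, int level, void *data);

/*
 * Pre-order walk: visit each entry first, then descend into the table it
 * points to (if it is present and not a leaf) before moving on to the next
 * entry at the same level.
 */
static void toy_walk_preorder(const struct toy_pte *table, int level,
			      toy_visit_fn visit, void *data)
{
	for (size_t i = 0; i < TOY_ENTRIES; i++) {
		const struct toy_pte *pte = &table[i];

		visit(pte, level, data);

		if (level > 1 && (pte->flags & TOY_PRESENT) &&
		    !(pte->flags & TOY_LEAF) && pte->child)
			toy_walk_preorder(pte->child, level - 1, visit, data);
	}
}

/* Example visitor: counts how many entries the walk has seen. */
static void toy_count_visit(const struct toy_pte *pte, int level, void *data)
{
	(void)pte;
	(void)level;
	(*(size_t *)data)++;
}
```

Because a parent entry is visited before its children, a caller can inspect
or change a non-leaf entry before the walk descends through it.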

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/Makefile   |   3 +-
 arch/x86/kvm/mmu/mmu.c  |  66 
 arch/x86/kvm/mmu/mmu_internal.h |  66 
 arch/x86/kvm/mmu/tdp_iter.c | 176 
 arch/x86/kvm/mmu/tdp_iter.h |  56 ++
 5 files changed, 300 insertions(+), 67 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/tdp_iter.c
 create mode 100644 arch/x86/kvm/mmu/tdp_iter.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 7f86a14aed0e9..4525c1151bf99 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -15,7 +15,8 @@ kvm-$(CONFIG_KVM_ASYNC_PF)+= $(KVM)/async_pf.o
 
 kvm-y  += x86.o emulate.o i8259.o irq.o lapic.o \
   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
-  hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o
+  hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
+  mmu/tdp_iter.o
 
 kvm-intel-y  += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
   vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6c9db349600c8..6d82784ed5679 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -121,28 +121,6 @@ module_param(dbg, bool, 0644);
 
 #define PTE_PREFETCH_NUM   8
 
-#define PT_FIRST_AVAIL_BITS_SHIFT 10
-#define PT64_SECOND_AVAIL_BITS_SHIFT 54
-
-/*
- * The mask used to denote special SPTEs, which can be either MMIO SPTEs or
- * Access Tracking SPTEs.
- */
-#define SPTE_SPECIAL_MASK (3ULL << 52)
-#define SPTE_AD_ENABLED_MASK (0ULL << 52)
-#define SPTE_AD_DISABLED_MASK (1ULL << 52)
-#define SPTE_AD_WRPROT_ONLY_MASK (2ULL << 52)
-#define SPTE_MMIO_MASK (3ULL << 52)
-
-#define PT64_LEVEL_BITS 9
-
-#define PT64_LEVEL_SHIFT(level) \
-   (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS)
-
-#define PT64_INDEX(address, level)\
-   (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
-
-
 #define PT32_LEVEL_BITS 10
 
 #define PT32_LEVEL_SHIFT(level) \
@@ -155,19 +133,6 @@ module_param(dbg, bool, 0644);
 #define PT32_INDEX(address, level)\
(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
 
-
-#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
-#define PT64_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1))
-#else
-#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
-#endif
-#define PT64_LVL_ADDR_MASK(level) \
-   (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
-   * PT64_LEVEL_BITS))) - 1))
-#define PT64_LVL_OFFSET_MASK(level) \
-   (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
-   * PT64_LEVEL_BITS))) - 1))
-
 #define PT32_BASE_ADDR_MASK PAGE_MASK
 #define PT32_DIR_BASE_ADDR_MASK \
(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
@@ -192,8 +157,6 @@ module_param(dbg, bool, 0644);
 #define SPTE_HOST_WRITEABLE(1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
 #define SPTE_MMU_WRITEABLE (1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
 
-#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
-
 /* make pte_list_desc fit well in cache line */
 #define PTE_LIST_EXT 3
 
@@ -349,11 +312,6 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 access_mask)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
-static bool is_mmio_spte(u64 spte)
-{
-   return (spte & SPTE_SPECIAL_MASK) == SPTE_MMIO_MASK;
-}
-
 static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
 {
return sp->role.ad_disabled;
@@ -626,35 +584,11 @@ static int is_nx(struct kvm_vcpu *vcpu)
return vcpu->arch.efer & EFER_NX;
 }
 
-static int is_shadow_present_pte(u64 pte)
-{
-   return (pte != 0) && !is_mmio_spte(pte);
-}
-
-static int is_large_pte(u64 pte)
-{
-   return pte & PT_PAGE_SIZE_MASK;
-}
-
-static int is_last_spte(u64 pte, int level)
-{
-   if (level == PG_LEVEL_4K)
-   return 1;
-   if (is_large_pte(pte))
-   return 1;
-   return 0;
-}
-
 static bool is_executable_pte(u64 spte)
 {
return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
 }
 
-static kvm_pfn_t spte_to_pfn(u64 pte)
-{
-   return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
-}
-
 static gfn_t pse36_gfn_delta(u32 gpte)
 {
int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h 

[PATCH v2 08/20] kvm: x86/mmu: Separate making non-leaf sptes from link_shadow_page

2020-10-14 Thread Ben Gardon
The TDP MMU page fault handler will need to be able to create non-leaf
SPTEs to build up the paging structures. Rather than re-implementing the
function, factor the SPTE creation out of link_shadow_page.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 337ab6823e312..05024b8ae5a4d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2468,21 +2468,30 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
__shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
-struct kvm_mmu_page *sp)
+static u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
 {
u64 spte;
 
-   BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
-
-   spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK |
+   spte = __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
   shadow_user_mask | shadow_x_mask | shadow_me_mask;
 
-   if (sp_ad_disabled(sp))
+   if (ad_disabled)
spte |= SPTE_AD_DISABLED_MASK;
else
spte |= shadow_accessed_mask;
 
+   return spte;
+}
+
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+struct kvm_mmu_page *sp)
+{
+   u64 spte;
+
+   BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
+
+   spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
+
mmu_spte_set(sptep, spte);
 
mmu_page_add_parent_pte(vcpu, sp, sptep);
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v2 13/20] kvm: x86/mmu: Add access tracking for tdp_mmu

2020-10-14 Thread Ben Gardon
In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. The
main Linux MM uses the access tracking MMU notifiers for swap and other
features. Add hooks to handle the test/flush HVA (range) family of
MMU notifiers.
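
The difference between the "test" and "age" notifier flavours can be sketched
as follows. This is only a toy model under stated assumptions: the toy_*
names and the TOY_ACCESSED bit position are made up for the example, and the
sketch ignores SPTEs without hardware A/D bits, which KVM handles separately
through access tracking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ACCESSED (1ULL << 5)	/* hypothetical accessed-bit position */

/* "test age": report whether any entry was accessed, without modifying it. */
static bool toy_test_age(const uint64_t *sptes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (sptes[i] & TOY_ACCESSED)
			return true;
	}
	return false;
}

/* "age": report whether any entry was accessed and clear the accessed bits. */
static bool toy_age(uint64_t *sptes, size_t n)
{
	bool young = false;

	for (size_t i = 0; i < n; i++) {
		if (sptes[i] & TOY_ACCESSED) {
			sptes[i] &= ~TOY_ACCESSED;
			young = true;
		}
	}
	return young;
}
```

Calling toy_age() twice in a row on an untouched range returns true and then
false, which is the property the MM relies on to find cold pages.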

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/kvm/mmu/mmu.c  |  34 +-
 arch/x86/kvm/mmu/mmu_internal.h |  17 +
 arch/x86/kvm/mmu/tdp_mmu.c  | 113 ++--
 arch/x86/kvm/mmu/tdp_mmu.h  |   4 ++
 4 files changed, 145 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 00534133f99fc..e6ab79d8f215f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -175,8 +175,6 @@ static struct kmem_cache *pte_list_desc_cache;
 struct kmem_cache *mmu_page_header_cache;
 static struct percpu_counter kvm_total_used_mmu_pages;
 
-static u64 __read_mostly shadow_nx_mask;
-static u64 __read_mostly shadow_x_mask;/* mutual exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_mmio_value;
 static u64 __read_mostly shadow_mmio_access_mask;
@@ -221,7 +219,6 @@ static u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;
 static u8 __read_mostly shadow_phys_bits;
 
 static void mmu_spte_set(u64 *sptep, u64 spte);
-static bool is_executable_pte(u64 spte);
 static union kvm_mmu_page_role
 kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
 
@@ -516,11 +513,6 @@ static int is_nx(struct kvm_vcpu *vcpu)
return vcpu->arch.efer & EFER_NX;
 }
 
-static bool is_executable_pte(u64 spte)
-{
-   return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
-}
-
 static gfn_t pse36_gfn_delta(u32 gpte)
 {
int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
@@ -695,14 +687,6 @@ static bool spte_has_volatile_bits(u64 spte)
return false;
 }
 
-static bool is_accessed_spte(u64 spte)
-{
-   u64 accessed_mask = spte_shadow_accessed_mask(spte);
-
-   return accessed_mask ? spte & accessed_mask
-: !is_access_track_spte(spte);
-}
-
 /* Rules for using mmu_spte_set:
  * Set the sptep from nonpresent to present.
  * Note: the sptep being assigned *must* be either not present
@@ -838,7 +822,7 @@ static u64 mmu_spte_get_lockless(u64 *sptep)
return __get_spte_lockless(sptep);
 }
 
-static u64 mark_spte_for_access_track(u64 spte)
+u64 mark_spte_for_access_track(u64 spte)
 {
if (spte_ad_enabled(spte))
return spte & ~shadow_accessed_mask;
@@ -1842,12 +1826,24 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
 {
-   return kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp);
+   int young = false;
+
+   young = kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp);
+   if (kvm->arch.tdp_mmu_enabled)
+   young |= kvm_tdp_mmu_age_hva_range(kvm, start, end);
+
+   return young;
 }
 
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
 {
-   return kvm_handle_hva(kvm, hva, 0, kvm_test_age_rmapp);
+   int young = false;
+
+   young = kvm_handle_hva(kvm, hva, 0, kvm_test_age_rmapp);
+   if (kvm->arch.tdp_mmu_enabled)
+   young |= kvm_tdp_mmu_test_age_hva(kvm, hva);
+
+   return young;
 }
 
 #ifdef MMU_DEBUG
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f7fe5616eff98..d886fe750be38 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -122,6 +122,8 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_accessed_mask;
+static u64 __read_mostly shadow_nx_mask;
+static u64 __read_mostly shadow_x_mask;/* mutual exclusive with nx_mask */
 
 /*
  * SPTEs used by MMUs without A/D bits are marked with SPTE_AD_DISABLED_MASK;
@@ -205,6 +207,19 @@ static inline bool is_access_track_spte(u64 spte)
return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
 }
 
+static inline bool is_accessed_spte(u64 spte)
+{
+   u64 accessed_mask = spte_shadow_accessed_mask(spte);
+
+   return accessed_mask ? spte & accessed_mask
+: !is_access_track_spte(spte);
+}
+
+static inline bool is_executable_pte(u64 spte)
+{
+   return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
+}
+
 void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn,
u64 pages);
 
@@ -247,4 +262,6 @@ bool is_nx_huge_page_enabled(void);
 
 void *mmu_memory_cache_alloc(struct 

[PATCH v2 01/20] kvm: x86/mmu: Separate making SPTEs from set_spte

2020-10-14 Thread Ben Gardon
Separate the functions for generating leaf page table entries from the
function that inserts them into the paging structure. This refactoring
will facilitate changes to the MMU synchronization model to use atomic
compare / exchanges (which are not guaranteed to succeed) instead of a
monolithic MMU lock.
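
To illustrate why computing the entry separately from installing it matters
for a compare/exchange model, here is a minimal userspace sketch using C11
atomics. The helper name toy_try_set_spte and the plain uint64_t entry are
assumptions for the example; this is not code from the series.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: install new_spte only if the slot still holds
 * old_spte. Returns false when another thread changed the entry first, in
 * which case the caller must re-read the entry and retry or give up.
 */
static bool toy_try_set_spte(_Atomic uint64_t *sptep, uint64_t old_spte,
			     uint64_t new_spte)
{
	return atomic_compare_exchange_strong(sptep, &old_spte, new_spte);
}
```

The new value has to be fully computed before the exchange is attempted, and
the attempt may fail, so a function that both builds and writes the entry in
one step does not fit this pattern.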

No functional change expected.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This commit introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
Reviewed-by: Peter Shier 
---
 arch/x86/kvm/mmu/mmu.c | 49 --
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 32e0e5c0524e5..6c9db349600c8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2987,20 +2987,15 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 #define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1)
 #define SET_SPTE_SPURIOUS  BIT(2)
 
-static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-   unsigned int pte_access, int level,
-   gfn_t gfn, kvm_pfn_t pfn, bool speculative,
-   bool can_unsync, bool host_writable)
+static int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
+gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
+bool can_unsync, bool host_writable, bool ad_disabled,
+u64 *new_spte)
 {
u64 spte = 0;
int ret = 0;
-   struct kvm_mmu_page *sp;
 
-   if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
-   return 0;
-
-   sp = sptep_to_sp(sptep);
-   if (sp_ad_disabled(sp))
+   if (ad_disabled)
spte |= SPTE_AD_DISABLED_MASK;
else if (kvm_vcpu_ad_need_write_protect(vcpu))
spte |= SPTE_AD_WRPROT_ONLY_MASK;
@@ -3053,8 +3048,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 * is responsibility of mmu_get_page / kvm_sync_page.
 * Same reasoning can be applied to dirty page accounting.
 */
-   if (!can_unsync && is_writable_pte(*sptep))
-   goto set_pte;
+   if (!can_unsync && is_writable_pte(old_spte))
+   goto out;
 
if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
pgprintk("%s: found shadow page for %llx, marking ro\n",
@@ -3065,15 +3060,37 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
}
}
 
-   if (pte_access & ACC_WRITE_MASK) {
-   kvm_vcpu_mark_page_dirty(vcpu, gfn);
+   if (pte_access & ACC_WRITE_MASK)
spte |= spte_shadow_dirty_mask(spte);
-   }
 
if (speculative)
spte = mark_spte_for_access_track(spte);
 
-set_pte:
+out:
+   *new_spte = spte;
+   return ret;
+}
+
+static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
+   unsigned int pte_access, int level,
+   gfn_t gfn, kvm_pfn_t pfn, bool speculative,
+   bool can_unsync, bool host_writable)
+{
+   u64 spte;
+   struct kvm_mmu_page *sp;
+   int ret;
+
+   if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
+   return 0;
+
+   sp = sptep_to_sp(sptep);
+
+   ret = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep, speculative,
+   can_unsync, host_writable, sp_ad_disabled(sp), &spte);
+
+   if (spte & PT_WRITABLE_MASK)
+   kvm_vcpu_mark_page_dirty(vcpu, gfn);
+
if (*sptep == spte)
ret |= SET_SPTE_SPURIOUS;
else if (mmu_spte_update(sptep, spte))
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v2 03/20] kvm: x86/mmu: Init / Uninit the TDP MMU

2020-10-14 Thread Ben Gardon
The TDP MMU offers an alternative mode of operation to the x86 shadow
paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU
will require new fields that need to be initialized and torn down. Add
hooks into the existing KVM MMU initialization process to do that
initialization / cleanup. Currently the initialization and cleanup
functions do not do very much; more operations will be added in
future patches.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon 
---
 arch/x86/include/asm/kvm_host.h |  9 
 arch/x86/kvm/Makefile   |  2 +-
 arch/x86/kvm/mmu/mmu.c  |  5 +
 arch/x86/kvm/mmu/tdp_mmu.c  | 38 +
 arch/x86/kvm/mmu/tdp_mmu.h  | 10 +
 5 files changed, 63 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kvm/mmu/tdp_mmu.c
 create mode 100644 arch/x86/kvm/mmu/tdp_mmu.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d0f77235da923..6b6dbc20ce23a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -980,6 +980,15 @@ struct kvm_arch {
 
struct kvm_pmu_event_filter *pmu_event_filter;
struct task_struct *nx_lpage_recovery_thread;
+
+   /*
+* Whether the TDP MMU is enabled for this VM. This contains a
+* snapshot of the TDP MMU module parameter from when the VM was
+* created and remains unchanged for the life of the VM. If this is
+* true, TDP MMU handler functions will run for various MMU
+* operations.
+*/
+   bool tdp_mmu_enabled;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 4525c1151bf99..fd6b1b0cc27c0 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -16,7 +16,7 @@ kvm-$(CONFIG_KVM_ASYNC_PF)+= $(KVM)/async_pf.o
 kvm-y  += x86.o emulate.o i8259.o irq.o lapic.o \
   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
-  mmu/tdp_iter.o
+  mmu/tdp_iter.o mmu/tdp_mmu.o
 
 kvm-intel-y  += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
   vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6d82784ed5679..f53d29e09367c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -19,6 +19,7 @@
 #include "ioapic.h"
 #include "mmu.h"
 #include "mmu_internal.h"
+#include "tdp_mmu.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
 #include "kvm_emulate.h"
@@ -5833,6 +5834,8 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 {
struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 
+   kvm_mmu_init_tdp_mmu(kvm);
+
node->track_write = kvm_mmu_pte_write;
node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
kvm_page_track_register_notifier(kvm, node);
@@ -5843,6 +5846,8 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 
kvm_page_track_unregister_notifier(kvm, node);
+
+   kvm_mmu_uninit_tdp_mmu(kvm);
 }
 
 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
new file mode 100644
index 0..b3809835e90b1
--- /dev/null
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "tdp_mmu.h"
+
+static bool __read_mostly tdp_mmu_enabled = false;
+
+static bool is_tdp_mmu_enabled(void)
+{
+#ifdef CONFIG_X86_64
+   if (!READ_ONCE(tdp_mmu_enabled))
+   return false;
+
+   if (WARN_ONCE(!tdp_enabled,
+ "Creating a VM with TDP MMU enabled requires TDP."))
+   return false;
+
+   return true;
+
+#else
+   return false;
+#endif /* CONFIG_X86_64 */
+}
+
+/* Initializes the TDP MMU for the VM, if enabled. */
+void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
+{
+   if (!is_tdp_mmu_enabled())
+   return;
+
+   /* This should not be changed for the lifetime of the VM. */
+   kvm->arch.tdp_mmu_enabled = true;
+}
+
+void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
+{
+   if (!kvm->arch.tdp_mmu_enabled)
+   return;
+}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
new file mode 100644
index 0..cd4a562a70e9a
--- /dev/null
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifndef __KVM_X86_MMU_TDP_MMU_H
+#define __KVM_X86_MMU_TDP_MMU_H
+
+#include <linux/kvm_host.h>
+
+void kvm_mmu_init_tdp_mmu(struct kvm *kvm);
+void kvm_mmu_uninit_tdp_mmu(struct kvm 

[PATCH v2 00/20] Introduce the TDP MMU

2020-10-14 Thread Ben Gardon
Over the years, the needs for KVM's x86 MMU have grown from running small
guests to live migrating multi-terabyte VMs with hundreds of vCPUs. Where
we previously depended on shadow paging to run all guests, we now have
two-dimensional paging (TDP). This patch set introduces a new
implementation of much of the KVM MMU, optimized for running guests with
TDP. We have re-implemented many of the MMU functions to take advantage of
the relative simplicity of TDP and eliminate the need for an rmap.
Building on this simplified implementation, a future patch set will change
the synchronization model for this "TDP MMU" to enable more parallelism
than the monolithic MMU lock. A TDP MMU is currently in use at Google
and has given us the performance necessary to live migrate our 416 vCPU,
12TiB m2-ultramem-416 VMs.

This work was motivated by the need to handle page faults in parallel for
very large VMs. When VMs have hundreds of vCPUs and terabytes of memory,
KVM's MMU lock suffers extreme contention, resulting in soft-lockups and
long latency on guest page faults. This contention can be easily seen
running the KVM selftests demand_paging_test with a couple hundred vCPUs.
Over a 1 second profile of the demand_paging_test, with 416 vCPUs and 4G
per vCPU, 98% of the time was spent waiting for the MMU lock. At Google,
the TDP MMU reduced the test duration by 89% and the execution was
dominated by get_user_pages and the user fault FD ioctl instead of the
MMU lock.

This series is the first of two. In this series we add a basic
implementation of the TDP MMU. In the next series we will improve the
performance of the TDP MMU and allow it to execute MMU operations
in parallel.

The overall purpose of the KVM MMU is to program paging structures
(CR3/EPT/NPT) to encode the mapping of guest addresses to host physical
addresses (HPA), and to provide utilities for other KVM features, for
example dirty logging. The definition of the L1 guest physical address
(GPA) to HPA mapping comes in two parts: KVM's memslots map GPA to HVA,
and the kernel MM/x86 host page tables map HVA -> HPA. Without TDP, the
MMU must program the x86 page tables to encode the full translation of
guest virtual addresses (GVA) to HPA. This requires "shadowing" the
guest's page tables to create a composite x86 paging structure. This
solution is complicated, requires separate paging structures for each
guest CR3, and requires emulating guest page table changes. The TDP case
is much simpler. In this case, KVM lets the guest control CR3 and programs
the EPT/NPT paging structures with the GPA -> HPA mapping. The guest has
no way to change this mapping and only one version of the paging structure
is needed per L1 paging mode. In this case the paging mode is some
combination of the number of levels in the paging structure, the address
space (normal execution or system management mode, on x86), and other
attributes. Most VMs only ever use 1 paging mode and so only ever need one
TDP structure.
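
The two-part GPA -> HPA definition described above can be sketched in
userspace C as follows. Everything here is a toy model: struct toy_memslot,
the linear toy_host_walk_linear() stand-in for the host page tables, and the
base addresses are assumptions chosen for the example, not KVM data
structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_HVA_BASE   0x7f0000000000ULL	/* hypothetical host virtual base */
#define TOY_HPA_BASE   0x40000000ULL		/* hypothetical host physical base */

/* Hypothetical memslot: a contiguous GPA range backed by a host virtual range. */
struct toy_memslot {
	uint64_t base_gfn;	/* first guest frame number in the slot */
	uint64_t npages;	/* number of pages in the slot */
	uint64_t hva_base;	/* host virtual address backing base_gfn */
};

/* Part 1: memslots map GPA -> HVA. Returns false if no slot covers gpa. */
static bool toy_gpa_to_hva(const struct toy_memslot *slots, size_t nslots,
			   uint64_t gpa, uint64_t *hva)
{
	uint64_t gfn = gpa >> TOY_PAGE_SHIFT;

	for (size_t i = 0; i < nslots; i++) {
		const struct toy_memslot *s = &slots[i];

		if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages) {
			*hva = s->hva_base +
			       (gpa - (s->base_gfn << TOY_PAGE_SHIFT));
			return true;
		}
	}
	return false;
}

/* Part 2 stand-in: a linear mapping in place of a real host page table walk. */
static bool toy_host_walk_linear(uint64_t hva, uint64_t *hpa)
{
	if (hva < TOY_HVA_BASE)
		return false;
	*hpa = TOY_HPA_BASE + (hva - TOY_HVA_BASE);
	return true;
}

typedef bool (*toy_hva_to_hpa_fn)(uint64_t hva, uint64_t *hpa);

/* The composition GPA -> HVA -> HPA is what the TDP structure encodes. */
static bool toy_gpa_to_hpa(const struct toy_memslot *slots, size_t nslots,
			   toy_hva_to_hpa_fn host_walk, uint64_t gpa,
			   uint64_t *hpa)
{
	uint64_t hva;

	return toy_gpa_to_hva(slots, nslots, gpa, &hva) && host_walk(hva, hpa);
}
```

In the TDP case only this composed mapping needs to be programmed into the
EPT/NPT structures; the guest's own GVA -> GPA tables are left to the guest.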

This series implements a "TDP MMU" through alternative implementations of
MMU functions for running L1 guests with TDP. The TDP MMU falls back to
the existing shadow paging implementation when TDP is not available, and
interoperates with the existing shadow paging implementation for nesting.
The use of the TDP MMU can be controlled by a module parameter which is
snapshotted at VM creation and follows the life of the VM. This snapshot
is used in many functions to decide whether or not to use TDP MMU handlers
for a given operation.
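
A minimal sketch of the snapshot idea, assuming a global knob that can change
at any time and a per-VM copy taken once at creation. The toy_* names are
hypothetical and the handler just counts which paths would run; this is not
the code from the series.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the module-wide knob; it may be flipped at any time. */
static atomic_bool toy_tdp_mmu_param;

struct toy_vm {
	bool tdp_mmu_enabled;	/* snapshot; fixed for the VM's lifetime */
};

/* Take the snapshot once, when the VM is created. */
static void toy_vm_init(struct toy_vm *vm)
{
	vm->tdp_mmu_enabled = atomic_load(&toy_tdp_mmu_param);
}

/*
 * MMU operations consult the per-VM snapshot, never the global knob, so a
 * VM keeps using the same set of handlers for its whole life.
 * Returns the number of handlers that would run for this VM.
 */
static int toy_zap_all(const struct toy_vm *vm)
{
	int handlers_run = 1;	/* the shadow MMU handler always runs */

	if (vm->tdp_mmu_enabled)
		handlers_run++;	/* the TDP MMU handler runs as well */

	return handlers_run;
}
```

Changing the global after a VM exists affects only VMs created later.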

This series can also be viewed in Gerrit here:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
(Thanks to Dmitry Vyukov for setting up the
Gerrit instance)

Changes v1 -> v2:
  Big thanks to Paolo and Sean for your thorough reviews!
  - Moved some function definitions to mmu_internal.h instead of just
declaring them there.
  - Dropped the commit to add an as_id field to memslots in favor of
Peter Xu's which is part of the dirty ring patch set. I've included a
copy of that patch from v13 of the patch set in this series.
  - Fixed comment style on SPDX license headers
  - Added a min_level to the tdp_iter and removed the tdp_iter_no_step_down
function
  - Removed the tdp_mmu module parameter and defaulted the global to false
  - Unified more NX reclaim code
  - Added helper functions for setting SPTEs in the TDP MMU
  - Renamed tdp_iter macros for clarity
  - Renamed kvm_tdp_mmu_page_fault to kvm_tdp_mmu_map and gave it the same
signature as __direct_map
  - Converted some WARN_ONs to BUG_ONs or removed them
  - Changed dlog to dirty_log to match convention
  - Switched make_spte to return a return code and use a return parameter
for the new SPTE
  - Refactored TDP MMU root allocation
  - Other misc cleanups and bug fixes

Ben Gardon (19):
  kvm: x86/mmu: Separate making SPTEs from set_spte
  kvm: x86/mmu: Introduce tdp_iter
  kvm: x86/mmu: Init / Uninit the TDP MMU
  kvm: x86/mmu: Allocate and free TDP MMU roots
  

Re: [PATCH v4 3/6] drivers: hwmon: Add the iEi WT61P803 PUZZLE HWMON driver

2020-10-14 Thread Luka Kovacic
Hello Guenter,

On Tue, Oct 13, 2020 at 8:51 PM Guenter Roeck  wrote:
>
> On 10/13/20 11:09 AM, Luka Kovacic wrote:
> > Hello Guenter,
> >
> > On Sun, Oct 11, 2020 at 11:26 PM Guenter Roeck  wrote:
> >>
> >> On Wed, Oct 07, 2020 at 02:48:58AM +0200, Luka Kovacic wrote:
> >>> Add the iEi WT61P803 PUZZLE HWMON driver, that handles the fan speed
> >>> control via PWM, reading fan speed and reading on-board temperature
> >>> sensors.
> >>>
> >>> The driver registers a HWMON device and a simple thermal cooling device to
> >>> enable in-kernel fan management.
> >>>
> >>> This driver depends on the iEi WT61P803 PUZZLE MFD driver.
> >>>
> >>> Signed-off-by: Luka Kovacic 
> >>> Cc: Luka Perkov 
> >>> Cc: Robert Marko 
> >>> ---
> >>>  drivers/hwmon/Kconfig |   8 +
> >>>  drivers/hwmon/Makefile|   1 +
> >>>  drivers/hwmon/iei-wt61p803-puzzle-hwmon.c | 457 ++
> >>>  3 files changed, 466 insertions(+)
> >>>  create mode 100644 drivers/hwmon/iei-wt61p803-puzzle-hwmon.c
> >>>
> >>> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> >>> index 8dc28b26916e..ff279df9bf40 100644
> >>> --- a/drivers/hwmon/Kconfig
> >>> +++ b/drivers/hwmon/Kconfig
> >>> @@ -722,6 +722,14 @@ config SENSORS_IBMPOWERNV
> >>> This driver can also be built as a module. If so, the module
> >>> will be called ibmpowernv.
> >>>
> >>> +config SENSORS_IEI_WT61P803_PUZZLE_HWMON
> >>> + tristate "iEi WT61P803 PUZZLE MFD HWMON Driver"
> >>> + depends on MFD_IEI_WT61P803_PUZZLE
> >>> + help
> >>> +   The iEi WT61P803 PUZZLE MFD HWMON Driver handles reading fan speed
> >>> +   and writing fan PWM values. It also supports reading on-board
> >>> +   temperature sensors.
> >>> +
> >>>  config SENSORS_IIO_HWMON
> >>>   tristate "Hwmon driver that uses channels specified via iio maps"
> >>>   depends on IIO
> >>> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> >>> index a8f4b35b136b..b0afb2d6896f 100644
> >>> --- a/drivers/hwmon/Makefile
> >>> +++ b/drivers/hwmon/Makefile
> >>> @@ -83,6 +83,7 @@ obj-$(CONFIG_SENSORS_HIH6130)   += hih6130.o
> >>>  obj-$(CONFIG_SENSORS_ULTRA45)+= ultra45_env.o
> >>>  obj-$(CONFIG_SENSORS_I5500)  += i5500_temp.o
> >>>  obj-$(CONFIG_SENSORS_I5K_AMB)+= i5k_amb.o
> >>> +obj-$(CONFIG_SENSORS_IEI_WT61P803_PUZZLE_HWMON) += iei-wt61p803-puzzle-hwmon.o
> >>>  obj-$(CONFIG_SENSORS_IBMAEM) += ibmaem.o
> >>>  obj-$(CONFIG_SENSORS_IBMPEX) += ibmpex.o
> >>>  obj-$(CONFIG_SENSORS_IBMPOWERNV)+= ibmpowernv.o
> >>> diff --git a/drivers/hwmon/iei-wt61p803-puzzle-hwmon.c b/drivers/hwmon/iei-wt61p803-puzzle-hwmon.c
> >>> new file mode 100644
> >>> index ..be7b019d126c
> >>> --- /dev/null
> >>> +++ b/drivers/hwmon/iei-wt61p803-puzzle-hwmon.c
> >>> @@ -0,0 +1,457 @@
> >>> +// SPDX-License-Identifier: GPL-2.0-only
> >>> +/* iEi WT61P803 PUZZLE MCU HWMON Driver
> >>> + *
> >>> + * Copyright (C) 2020 Sartura Ltd.
> >>> + * Author: Luka Kovacic 
> >>> + */
> >>> +
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +
> >>> +#define IEI_WT61P803_PUZZLE_HWMON_MAX_TEMP_NUM 2
> >>> +#define IEI_WT61P803_PUZZLE_HWMON_MAX_FAN_NUM 5
> >>> +#define IEI_WT61P803_PUZZLE_HWMON_MAX_PWM_NUM 2
> >>> +#define IEI_WT61P803_PUZZLE_HWMON_MAX_PWM_VAL 255
> >>> +
> >>> +/**
> >>> + * struct iei_wt61p803_puzzle_thermal_cooling_device - Thermal cooling device instance
> >>> + *
> >>> + * @mcu_hwmon:   MCU HWMON struct pointer
> >>> + * @tcdev:   Thermal cooling device pointer
> >>> + * @name:Thermal cooling device name
> >>> + * @pwm_channel: PWM channel (0 or 1)
> >>> + * @cooling_levels:  Thermal cooling device cooling levels
> >>> + */
> >>> +struct iei_wt61p803_puzzle_thermal_cooling_device {
> >>> + struct iei_wt61p803_puzzle_hwmon *mcu_hwmon;
> >>> + struct thermal_cooling_device *tcdev;
> >>> + char name[THERMAL_NAME_LENGTH];
> >>> + int pwm_channel;
> >>> + u8 *cooling_levels;
> >>> +};
> >>> +
> >>> +/**
> >>> + * struct iei_wt61p803_puzzle_hwmon - MCU HWMON Driver
> >>> + *
> >>> + * @mcu: MCU struct pointer
> >>> + * @response_buffer  Global MCU response buffer allocation
> >>> + * @thermal_cooling_dev_present: Per-channel thermal cooling device control
> >>> + * @cdev:Per-channel thermal cooling device private structure
> >>> + */
> >>> +struct iei_wt61p803_puzzle_hwmon {
> >>> + struct iei_wt61p803_puzzle *mcu;
> >>> + unsigned char *response_buffer;
> >>> + bool thermal_cooling_dev_present[IEI_WT61P803_PUZZLE_HWMON_MAX_PWM_NUM];
> >>> + struct iei_wt61p803_puzzle_thermal_cooling_device
> >>> + 

Re: WARN_ONCE triggered: tpm_tis: Add a check for invalid status

2020-10-14 Thread James Bottomley
On Wed, 2020-10-14 at 19:57 +0200, Dirk Gouders wrote:
> On my laptop the check introduced with 55707d531af62b (tpm_tis: Add a
> check for invalid status) triggered the warning (output below).
> 
> So, my laptop seems to be a candidate for testing.

I'm afraid this is a known problem on a wide range of TIS TPMs ... it's
fixed by the patch set I'm trying to get upstream:

https://lore.kernel.org/linux-integrity/20201001180925.13808-1-james.bottom...@hansenpartnership.com/

But in the meantime, it's harmless.  The TIS code at the point in the
trace is trying to send a TPM2_GetCapability() command which fails
because the locality isn't listening, but the design of that command is
only to trigger an interrupt to probe the interrupt handling; nothing
else depends on it succeeding.

James




Re: [PATCH v6 02/25] objtool: Add a pass for generating __mcount_loc

2020-10-14 Thread Peter Zijlstra
On Wed, Oct 14, 2020 at 06:50:04PM +0200, Ingo Molnar wrote:
> Meh, adding --mcount as an option to 'objtool check' was a valid hack for a 
> prototype patchset, but please turn this into a proper subcommand, just 
> like 'objtool orc' is.
> 
> 'objtool check' should ... keep checking. :-)

No, no subcommands. orc being a subcommand was a mistake.


Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-14 Thread David Hildenbrand
On 14.10.20 19:56, Mina Almasry wrote:
> On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand  wrote:
>>
>> On 14.10.20 17:22, David Hildenbrand wrote:
>>> Hi everybody,
>>>
>>> Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
>>> with hugetlbfs and reported that this results in [1]
>>>
>>> 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
>>>
>>> 2. Any hugetlbfs allocations failing. (I assume because some accounting is wrong)
>>>
>>>
>>> QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE)
>>> to discard pages that are reported as free by a VM. The reporting
>>> granularity is in pageblock granularity. So when the guest reports
>>> 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
>>>
>>> I was also able to reproduce (also with virtio-mem, which similarly
>>> uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
>>> (and on v5.7.X from F32).
>>>
>>> Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
>>> is broken with cgroups. I did *not* try without cgroups yet.
>>>
>>> Any ideas?
> 
> Hi David,
> 
> I may be able to dig in and take a look. How do I reproduce this
> though? I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
> hugetlb region?
> 

Hi Mina,

thanks for having a look. I started poking around myself but,
being new to cgroup code, I even failed to understand why that code gets
triggered though the hugetlb controller isn't even enabled.

I assume you at least have to make sure that there is
a page populated (MAP_POPULATE, or read/write it). But I am not
sure yet if a single fallocate(FALLOC_FL_PUNCH_HOLE) is
sufficient, or if it will require a sequence of
populate+discard(punch) (or multi-threading).
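
A minimal sketch of the populate+discard sequence without QEMU could look like
the following. It is only an illustration and has not been shown to trigger the
warning: the helper names punch_hole() and populate_and_punch() are made up,
the flag values are copied from <linux/falloc.h>, and a 64-bit off_t is
assumed.

```python
import ctypes
import mmap
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Python has no wrapper for fallocate(2) that takes a mode argument,
# so call libc directly. Assumption: off_t is 64 bits wide.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(fd, offset, length):
    """Discard [offset, offset + length) of fd without changing its size."""
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def populate_and_punch(fd, page_size, rounds=1):
    """Fault in one page of fd through a shared mapping, then punch it out."""
    os.ftruncate(fd, page_size)
    with mmap.mmap(fd, page_size) as mem:
        for _ in range(rounds):
            mem[0:1] = b"\x01"            # write access populates the page
            punch_hole(fd, 0, page_size)  # discard it again
```

For the hugetlb case one would presumably pass a descriptor obtained from
os.memfd_create("repro", os.MFD_HUGETLB) and page_size = 2 * 1024 * 1024, with
huge pages reserved beforehand. Whether one round is enough, or whether several
rounds or threads are needed, is exactly the open question.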

What definitely triggers it is running QEMU like this:

qemu-system-x86_64 \
-machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu host,migratable=on \
-m 4096 \
-object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,size=4294967296 \
-overcommit mem-lock=off \
-smp 4,sockets=1,dies=1,cores=2,threads=2 \
-nodefaults \
-nographic \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-blockdev '{"driver":"file","filename":"../Fedora-Cloud-Base-32-1.6.x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1 \
-chardev stdio,nosignal,id=serial \
-device isa-serial,chardev=serial \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7,free-page-reporting=on


However, you need a recent QEMU (>= v5.1 IIRC) and a recent kernel
(>= v5.7) inside your guest image.

Fedora rawhide qcow2 should do: 
https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Rawhide-20201004.n.1.x86_64.qcow2


-- 
Thanks,

David / dhildenb



Re: [PATCH] NFS: Fix mode bits and nlink count for v4 referral dirs

2020-10-14 Thread Trond Myklebust
On Tue, 2020-10-06 at 08:14 -0700, Ashish Sangwan wrote:
> Request for mode bits and nlink count in the nfs4_get_referral call
> and if server returns them use them instead of hard coded values.
> 
> CC: sta...@vger.kernel.org
> Signed-off-by: Ashish Sangwan 
> ---
>  fs/nfs/nfs4proc.c | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6e95c85fe395..efec05c5f535 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -266,7 +266,9 @@ const u32 nfs4_fs_locations_bitmap[3] = {
>   | FATTR4_WORD0_FSID
>   | FATTR4_WORD0_FILEID
>   | FATTR4_WORD0_FS_LOCATIONS,
> - FATTR4_WORD1_OWNER
> + FATTR4_WORD1_MODE
> + | FATTR4_WORD1_NUMLINKS
> + | FATTR4_WORD1_OWNER
>   | FATTR4_WORD1_OWNER_GROUP
>   | FATTR4_WORD1_RAWDEV
>   | FATTR4_WORD1_SPACE_USED
> @@ -7594,16 +7596,28 @@ nfs4_listxattr_nfs4_user(struct inode *inode,
> char *list, size_t list_len)
>   */
>  static void nfs_fixup_referral_attributes(struct nfs_fattr *fattr)
>  {
> + bool fix_mode = true, fix_nlink = true;
> +
>   if (!(((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) ||
>  (fattr->valid & NFS_ATTR_FATTR_FILEID)) &&
> (fattr->valid & NFS_ATTR_FATTR_FSID) &&
> (fattr->valid & NFS_ATTR_FATTR_V4_LOCATIONS)))
>   return;
>  
> + if (fattr->valid & NFS_ATTR_FATTR_MODE)
> + fix_mode = false;
> + if (fattr->valid & NFS_ATTR_FATTR_NLINK)
> + fix_nlink = false;
>   fattr->valid |= NFS_ATTR_FATTR_TYPE | NFS_ATTR_FATTR_MODE |
>   NFS_ATTR_FATTR_NLINK | NFS_ATTR_FATTR_V4_REFERRAL;
> - fattr->mode = S_IFDIR | S_IRUGO | S_IXUGO;
> - fattr->nlink = 2;
> +
> + if (fix_mode)
> + fattr->mode = S_IFDIR | S_IRUGO | S_IXUGO;
> + else
> + fattr->mode |= S_IFDIR;
> +
> + if (fix_nlink)
> + fattr->nlink = 2;
>  }
>  
>  static int _nfs4_proc_fs_locations(struct rpc_clnt *client, struct
> inode *dir,

NACK to this patch. The whole point is that if the server has a
referral, then it is not going to give us any attributes other than the
ones we're already asking for because it may not even have a real
directory. The client is required to fake up an inode, hence the
existing code.
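
For reference, the placeholder attributes that the existing code fakes up for a
referral can be sketched in userspace as below. This is an illustration in
Python, not the kernel code: S_IRUGO and S_IXUGO are kernel-only macros and are
expanded by hand here, and the function name is hypothetical.

```python
import stat

# Kernel-only convenience macros, expanded by hand.
S_IRUGO = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH   # 0444
S_IXUGO = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH   # 0111


def fake_referral_attrs():
    """Return the placeholder mode and link count used for a referral."""
    return {"mode": stat.S_IFDIR | S_IRUGO | S_IXUGO, "nlink": 2}
```

stat.filemode(fake_referral_attrs()["mode"]) renders this as dr-xr-xr-x, i.e. a
read-only, searchable directory.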

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.mykleb...@hammerspace.com




Re: [PATCH] lib: kunit: add test_min_heap test conversion to KUnit

2020-10-14 Thread Ian Rogers
On Mon, Oct 12, 2020 at 2:03 PM Brendan Higgins
 wrote:
>
> On Tue, Aug 4, 2020 at 9:22 AM Vitor Massaru Iha  wrote:
> >
> > Hi Peter,
> >
> > On Tue, Aug 4, 2020 at 11:23 AM  wrote:
> > >
> > > On Tue, Aug 04, 2020 at 10:46:21AM -0300, Vitor Massaru Iha wrote:
> > > > On Tue, Aug 4, 2020 at 10:25 AM  wrote:
> > > > > On Wed, Jul 29, 2020 at 06:57:17PM -0300, Vitor Massaru Iha wrote:
> > > > >
> > > > > > The results can be seen this way:
> > > > > >
> > > > > > This is an excerpt from the test.log with the result in TAP format:
> > > > > > [snip]
> > > > > > ok 5 - example
> > > > > > # Subtest: min-heap
> > > > > > 1..6
> > > > > > ok 1 - test_heapify_all_true
> > > > > > ok 2 - test_heapify_all_false
> > > > > > ok 3 - test_heap_push_true
> > > > > > ok 4 - test_heap_push_false
> > > > > > ok 5 - test_heap_pop_push_true
> > > > > > ok 6 - test_heap_pop_push_false
> > > > > > [snip]
> > >
> > > So ^ is TAP format?
> >
> > Yep, you can see the spec here: 
> > https://testanything.org/tap-specification.html
> >
> > >
> > > > > I don't care or care to use either; what does dmesg do? It used to be
> > > > > that just building the self-tests was sufficient and any error would
> > > > > show in dmesg when you boot the machine.
> > > > >
> > > > > But if I now have to use some damn tool, this is a regression.
> > > >
> > > > If you don't want to, you don't need to use the kunit-tool. If you
> > > > compile the tests as builtin and run the Kernel on your machine
> > > > the test result will be shown in dmesg in TAP format.
> > >
> > > That's seems a lot more verbose than it is now. I've recently even done
> > > a bunch of tests that don't print anything on success, dmesg is clutter
> > > enough already.
> >
> > What tests do you refer to?
> >
> > Running the test_min_heap.c, I got this from dmesg:
> >
> > min_heap_test: test passed
> >
> > And running min_heap_kunit.c:
> >
> > ok 1 - min-heap
>
> Gentle poke. I think Vitor was looking for a response. My guess is
> that Ian was waiting for a follow up patch.

There were some issues in the original patch, but they should be easy to
fix. I'm more concerned about whether Peter's issues with the general
direction of the patch, verbosity and testing frameworks have been
addressed. I see Vitor followed up with Peter, but I'm not sure that means
the approach has been accepted.
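
On the verbosity point, someone who only wants to hear about failures could
filter the TAP lines after the fact. The sketch below is hypothetical: the
function name and the regular expression are illustrative and not part of
KUnit or kunit_tool, and it assumes result lines shaped like the excerpt quoted
above ("ok N - name" or "not ok N - name") with any dmesg timestamps already
stripped.

```python
import re

# Matches result lines such as "ok 3 - test_heap_push_true".
RESULT = re.compile(r"^\s*(not ok|ok) (\d+) - (\S+)")


def failed_tests(log_text):
    """Return the names of all test cases reported as 'not ok'."""
    failures = []
    for line in log_text.splitlines():
        m = RESULT.match(line)
        if m and m.group(1) == "not ok":
            failures.append(m.group(3))
    return failures
```

An empty list then plays the role of the old single "test passed" line.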

Thanks,
Ian


[PATCH v3 2/5] arm64: tegra: Specify sdhci clock parent for Tegra194 SDMMC1 and SDMMC3

2020-10-14 Thread Tamás Szűcs
Use PLLC4_MUXED as clock source for SDMMC1 and SDMMC3 core clocks. This enables
more suitable interface clocks for higher data rate modes.

Signed-off-by: Tamás Szűcs 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 058fdb1ffa1a..7e3ceeb5bc43 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -701,6 +701,9 @@
clocks = < TEGRA194_CLK_SDMMC1>,
 < TEGRA194_CLK_SDMMC_LEGACY_TM>;
clock-names = "sdhci", "tmclk";
+   assigned-clocks = < TEGRA194_CLK_SDMMC1>,
+ < TEGRA194_CLK_PLLC4_MUXED>;
+   assigned-clock-parents = < 
TEGRA194_CLK_PLLC4_MUXED>;
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
interconnects = < TEGRA194_MEMORY_CLIENT_SDMMCRA 
>,
@@ -739,6 +742,9 @@
clocks = < TEGRA194_CLK_SDMMC3>,
 < TEGRA194_CLK_SDMMC_LEGACY_TM>;
clock-names = "sdhci", "tmclk";
+   assigned-clocks = < TEGRA194_CLK_SDMMC3>,
+ < TEGRA194_CLK_PLLC4_MUXED>;
+   assigned-clock-parents = < 
TEGRA194_CLK_PLLC4_MUXED>;
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
interconnects = < TEGRA194_MEMORY_CLIENT_SDMMCR 
>,
-- 
2.20.1



Re: [patch 11/12] usb: core: Replace in_interrupt() in comments

2020-10-14 Thread Alan Stern
On Wed, Oct 14, 2020 at 06:41:23PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-10-14 12:27:21 [-0400], Alan Stern wrote:
> > > --- a/drivers/usb/core/hcd.c
> > > +++ b/drivers/usb/core/hcd.c
> > > @@ -746,9 +746,6 @@ static int rh_call_control (struct usb_h
> > >   * Root Hub interrupt transfers are polled using a timer if the
> > >   * driver requests it; otherwise the driver is responsible for
> > >   * calling usb_hcd_poll_rh_status() when an event occurs.
> > > - *
> > > - * Completions are called in_interrupt(), but they may or may not
> > > - * be in_irq().
> > 
> > This comment should not be removed; instead it should be changed to say 
> > that completion handlers are called with interrupts disabled.
> 
> The timer callback:
>   rh_timer_func() -> usb_hcd_poll_rh_status()  
> 
> invokes the function with enabled interrupts.

Well, it doesn't change the interrupt settings.  It might call 
usb_hcd_poll_rh_status() with interrupts enabled or disabled, depending 
on how it was called originally.

But that wasn't what I meant.  usb_hcd_poll_rh_status() calls 
usb_hcd_giveback_urb() with interrupts disabled always, and that routine 
may call __usb_hcd_giveback_urb(), which calls

urb->complete(urb);

In this case the completion handler would be invoked with interrupts 
disabled.  Alternatively, __usb_hcd_giveback_urb() may be invoked from a 
BH handler, in which case the completion handler will run in softirq 
context with interrupts enabled.

So I guess it would be best to say that completion handlers may be 
called with interrupts enabled or disabled.  Or you might want to put 
such a comment in __usb_hcd_giveback_urb().

> > > @@ -1691,7 +1690,6 @@ static void usb_giveback_urb_bh(unsigned
> > >   * @hcd: host controller returning the URB
> > >   * @urb: urb being returned to the USB device driver.
> > >   * @status: completion status code for the URB.
> > > - * Context: in_interrupt()
> > 
> > The comment should be changed to say that the routine runs in a BH 
> > handler (or however you want to express it).
> 
> Do you mean usb_hcd_giveback_urb() runs in BH context or that the
> completion callback of the URB runs in BH context?

Actually I meant that usb_hcd_giveback_urb_bh() runs in BH context.  
Sorry, I got confused about the location of this hunk.

To be explicit: The comment for usb_hcd_giveback_urb() should say that 
the function expects to be called with interrupts disabled (whether the 
context is task, atomic, BH, interrupt, etc. doesn't matter).

> The completion callback of the URB may run in BH or IRQ context
> depending on HCD.
> 
> > > --- a/drivers/usb/core/message.c
> > > +++ b/drivers/usb/core/message.c
> > 
> > > @@ -934,7 +939,7 @@ int usb_get_device_descriptor(struct usb
> > >  /*
> > >   * usb_set_isoch_delay - informs the device of the packet transmit delay
> > >   * @dev: the device whose delay is to be informed
> > > - * Context: !in_interrupt()
> > > + * Context: can sleep
> > 
> > Why is this comment different from all the others?
> 
> It says !in_interrupt() which is also true for preempt-disabled regions.
> But the caller must not have preemption disabled. "can sleep" is more
> obvious as what it needs.

But all the other comments in this patch say:

 * Context: task context, might sleep.

Why doesn't this comment say the same thing?

Alan Stern


RE: [RFC] Documentation: Add documentation for new performance_profile sysfs class

2020-10-14 Thread Limonciello, Mario
> As far as I know, the profiles affect  the thermal behavior like "how long to
> wait before starting the fan and at what temperature" or "how fast to run the
> fan with the current cpu load and temperature".
> 
> The only way that firmware uses to "control" performance should be the odvp0
> DPTF variable.
> 
> On Windows HP expose  both "Cool" and "Quiet" profile respectively as "Ideal
> for when the computer feels warm to the touch" and "Ideal for quiet
> environments".
> 
> more detail: https://support.hp.com/in-en/document/c06063108
> 
> To be precise "Quiet" profile should be available only on platform without
> dedicate GPU (I'm investigating if there is other case), instead the other 3
> profiles ("HP Recommended", "Performance" and "Cool") are available on all
> platform that support thermal profile.
> 
> reading here seems that Dell offer identical profiles:
> https://www.dell.com/support/manuals/it/it/itbsdt1/dell-command-power-manager-v2.2/userguide_dell/thermal-management?guid=guid-c05d2582-fc07-4e3e-918a-965836d20752=en-us
> 

The Dell profiles here I don't think should be conflated with DTT/DPTF though.
These won't affect anything in the DTT vault.

Those profiles are already accessible from Linux userland, there is already a
command line tool in libsmbios that can do it.

$ sudo smbios-thermal-ctl -i
Libsmbios version : 2.4.3
smbios-thermal-ctl version : 2.4.3

 Print all the Available Thermal Information of your system:
---

Supported Thermal Modes:
 Balanced
 Cool Bottom
 Quiet
 Performance

Supported Active Acoustic Controller (AAC) modes:

Supported AAC Configuration type:
Global (AAC enable/disable applies to all supported USTT modes)

$ sudo smbios-thermal-ctl -g
Helper function to Get current Thermal Mode settings

 Print Current Status of Thermal Information:
---

Current Thermal Modes:
 Balanced

Current Active Acoustic Controller (AAC) Mode:
 AAC mode Disabled

Current Active Acoustic Controller (AAC) Mode:
Global (AAC enable/disable applies to all supported USTT modes)

---
*If* we choose to export them for Dell from the kernel, we should probably also
filter from kernel side and modify the existing userspace tool to use the kernel
API when possible.

I'm not convinced that Dell would want to tie into this proposed knob with those
settings because I expect that existing software like thermald will not work as
efficiently.



Re: [PATCH v2] checkpatch: add new exception to repeated word check

2020-10-14 Thread Dwaipayan Ray
On Wed, Oct 14, 2020 at 11:33 PM Joe Perches  wrote:
>
> On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> > moved the repeated word test to check for more file types. But after
> > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > new warnings of the type:
>
> Perhaps instead of adding more content checks so that
> word boundaries are not something like \S but also
> not punctuation so that content like
>
> git git://
> @size size
>
> does not match?
>
>
Hi,
So currently the words are trimmed of non-alphabetic characters before the check:

while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
my $first = $1;
my $second = $2;

where, the word_pattern is:
my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';

So do you perhaps recommend modifying this word pattern to
include the punctuation as well rather than trimming them off?
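
To illustrate why the current pattern matches lines such as "git git://", here
is a small Python sketch (Python's re module accepts the same pattern syntax in
this case). The helper name is made up, and the case-insensitive comparison of
the two captures is an assumption about what the check does afterwards.

```python
import re

# Same word pattern as quoted above from checkpatch.pl.
WORD = r"\b[A-Z]?[a-z]{2,}\b"
# First word, one space, then a lookahead capturing the following word.
PAIR = re.compile(r"\b(" + WORD + r") (?=(" + WORD + r"))")


def repeated_words(line):
    """Return every word that is immediately followed by itself."""
    return [m.group(1) for m in PAIR.finditer(line)
            if m.group(1).lower() == m.group(2).lower()]
```

Because \b also sits between a letter and punctuation, repeated_words("git
git://git.kernel.org/pub/scm") reports "git" and repeated_words("@size size of
the buffer") reports "size", which are the kind of false positives being
discussed.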

Thanks,
Dwaipayan.


Re: [PATCH 18/20] arch: dts: Fix EHCI/OHCI DT nodes name

2020-10-14 Thread Serge Semin
On Wed, Oct 14, 2020 at 11:00:45AM -0700, Florian Fainelli wrote:
> On 10/14/20 3:14 AM, Serge Semin wrote:
> > In accordance with the Generic EHCI/OHCI bindings the corresponding node
> > name is supposed to comply with the Generic USB HCD DT schema, which
> > requires the USB nodes to have the name acceptable by the regexp:
> > "^usb(@.*)?" . Let's fix the DTS files, which have the nodes defined with
> > incompatible names.
> > 
> > Signed-off-by: Serge Semin 
> > 
> > ---
> > 
> > Please, test the patch out to make sure it doesn't break the dependent DTS
> > files. I did only a manual grepping of the possible nodes dependencies.
> 
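
As a quick way to see which node names the quoted regexp accepts, a sketch
along these lines may help. The function name and the sample node names are
hypothetical; only the pattern itself is taken from the commit message above.

```python
import re

# Pattern quoted from the Generic USB HCD DT schema.
USB_NODE_NAME = re.compile(r"^usb(@.*)?")


def is_valid_usb_node_name(name):
    """Check a node name the way an unanchored pattern search would."""
    return USB_NODE_NAME.search(name) is not None
```

With this, "usb" and "usb@1000" pass while names such as "ehci@1000" or
"ohci@2000" do not. Note that the pattern as quoted has no trailing "$", so it
only constrains how the name starts.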

> Not sure how you envisioned these changes to be picked up, but you may
> need to split these changes between ARM/ARM64, MIPS and PowerPC at
> least. And within ARM/ARM64 you will most likely have to split according
> to the various SoC maintainers.

Hmm, I don't really know how it's going to be done in this case, but there must
be a way to get the cross-platform patches picked up in general. For
instance, see the patches like:
714acdbd1c94 arch: rename copy_thread_tls() back to copy_thread()
140c8180eb7c arch: remove HAVE_COPY_THREAD_TLS
They touched files from different architectures, but have still been merged in.
Maybe I should have copied these three patches to the "linux-a...@vger.kernel.org"
list or some other mailing list...

-Sergey

> -- 
> Florian


Re: [PATCH] firmware: arm_scmi: Fix duplicate workqueue name

2020-10-14 Thread Florian Fainelli
On 10/14/20 10:39 AM, Sudeep Holla wrote:
> On Wed, Oct 14, 2020 at 10:13:04AM -0700, Florian Fainelli wrote:
>> On 10/14/20 9:18 AM, Sudeep Holla wrote:
>>> On Wed, Oct 14, 2020 at 02:48:19PM +0100, Cristian Marussi wrote:
>>>
>>> [...]
>>>
>
> I have pushed a version with above change [1], please check if you are
> happy with that ?
>
> [1] https://git.kernel.org/sudeep.holla/linux/c/b2cd15549b

 I agree with the need to retain _notify name, but I'm not so sure about
 the above patch...which is:

>>>
>>> I agree, I thought about it and just cooked up this as a quick solution.
>>> I will move to that, even I wasn't happy with this TBH.
>>
>> The reason why I went with just dev_name() was such that the workqueue
>> name and the device nodes under /sys would strictly match, which helps
>> as an user, and also it avoided the temporary buffer and its size
>> limitations.
> 
> Agreed. I just showed that as example and was hoping to use some nice
> kstr* APIs to achieve what I wanted but soon realised there exists none.
> So as replied earlier, I will take this change as is for now. Let us
> address this in future if it becomes an issue.
> 
> Thanks for the quick test; we now know whom to bother if we need more testing,
> as our internal platforms are not that great at covering all the aspects
> we add in the spec and even in the kernel.

No problem! I still need to find the time to upgrade the ATF equivalent
implementation to support SCMI 3.0, have not done that just yet.
-- 
Florian


Re: [PATCH v21 07/12] landlock: Support filesystem access-control

2020-10-14 Thread James Morris
On Thu, 8 Oct 2020, Mickaël Salaün wrote:

> +config ARCH_EPHEMERAL_STATES
> + def_bool n
> + help
> +   An arch should select this symbol if it does not keep an internal kernel
> +   state for kernel objects such as inodes, but instead relies on something
> +   else (e.g. the host kernel for an UML kernel).
> +

This is used to disable Landlock for UML, correct? I wonder if it could be 
more specific: "ephemeral states" is a very broad term.

How about something like ARCH_OWN_INODES ?


-- 
James Morris



[PATCH v3 1/5] arm64: tegra: Enable signal voltage switching on Tegra194 SDMMC1 and SDMMC3

2020-10-14 Thread Tamás Szűcs
Add pad voltage configuration nodes for SDMMC pads with configurable voltages
and enable supported SD card, SDIO and eMMC modes.

Signed-off-by: Tamás Szűcs 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 45 
 1 file changed, 45 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index e9c90f0f44ff..058fdb1ffa1a 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -705,6 +706,9 @@
interconnects = < TEGRA194_MEMORY_CLIENT_SDMMCRA 
>,
< TEGRA194_MEMORY_CLIENT_SDMMCWA 
>;
interconnect-names = "dma-mem", "write";
+   pinctrl-names = "sdmmc-3v3", "sdmmc-1v8";
+   pinctrl-0 = <_3v3>;
+   pinctrl-1 = <_1v8>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -716,6 +720,15 @@
nvidia,pad-autocal-pull-down-offset-sdr104 = <0x00>;
nvidia,default-tap = <0x9>;
nvidia,default-trim = <0x5>;
+   cap-sd-highspeed;
+   cap-mmc-highspeed;
+   sd-uhs-sdr12;
+   sd-uhs-sdr25;
+   sd-uhs-sdr50;
+   sd-uhs-sdr104;
+   cap-sdio-irq;
+   mmc-ddr-1_8v;
+   mmc-hs200-1_8v;
status = "disabled";
};
 
@@ -731,6 +744,9 @@
interconnects = < TEGRA194_MEMORY_CLIENT_SDMMCR 
>,
< TEGRA194_MEMORY_CLIENT_SDMMCW 
>;
interconnect-names = "dma-mem", "write";
+   pinctrl-names = "sdmmc-3v3", "sdmmc-1v8";
+   pinctrl-0 = <_3v3>;
+   pinctrl-1 = <_1v8>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -743,6 +759,15 @@
nvidia,pad-autocal-pull-down-offset-sdr104 = <0x00>;
nvidia,default-tap = <0x9>;
nvidia,default-trim = <0x5>;
+   cap-sd-highspeed;
+   cap-mmc-highspeed;
+   sd-uhs-sdr12;
+   sd-uhs-sdr25;
+   sd-uhs-sdr50;
+   sd-uhs-sdr104;
+   cap-sdio-irq;
+   mmc-ddr-1_8v;
+   mmc-hs200-1_8v;
status = "disabled";
};
 
@@ -1269,6 +1294,26 @@
 
#interrupt-cells = <2>;
interrupt-controller;
+
+   sdmmc1_3v3: sdmmc1-3v3 {
+   pins = "sdmmc1-hv";
+   power-source = ;
+   };
+
+   sdmmc1_1v8: sdmmc1-1v8 {
+   pins = "sdmmc1-hv";
+   power-source = ;
+   };
+
+   sdmmc3_3v3: sdmmc3-3v3 {
+   pins = "sdmmc3-hv";
+   power-source = ;
+   };
+
+   sdmmc3_1v8: sdmmc3-1v8 {
+   pins = "sdmmc3-hv";
+   power-source = ;
+   };
};
 
host1x@13e0 {
-- 
2.20.1



Re: [PATCH RFC v4 06/13] perf vendor events arm64: Add hip09 SMMUv3 PMCG events

2020-10-14 Thread Robin Murphy

On 2020-10-08 11:15, John Garry wrote:

Add the SMMUv3 PMCG (Performance Monitor Event Group) events for hip09
platform.

This contains a mix of architected and IMP def events

Signed-off-by: John Garry 
---
  .../hisilicon/hip09/sys/smmu-v3-pmcg.json | 42 +++
  1 file changed, 42 insertions(+)
  create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip09/sys/smmu-v3-pmcg.json

diff --git a/tools/perf/pmu-events/arch/arm64/hisilicon/hip09/sys/smmu-v3-pmcg.json b/tools/perf/pmu-events/arch/arm64/hisilicon/hip09/sys/smmu-v3-pmcg.json
new file mode 100644
index ..8abafbb2dcb4
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip09/sys/smmu-v3-pmcg.json
@@ -0,0 +1,42 @@
+[
+   {
+   "ArchStdEvent": "smmuv3_pmcg.CYCLES"
+   "Compat": "0x00030736"
+   },
+   {
+   "ArchStdEvent": "smmuv3_pmcg.TRANSACTION"
+   "Compat": "0x00030736"
+   },
+   {
+   "ArchStdEvent": "smmuv3_pmcg.TLB_MISS"
+   "Compat": "0x00030736"
+   },
+   {
+   "ArchStdEvent": "smmuv3_pmcg.CONFIG_CACHE_MISS"
+   "Compat": "0x00030736"
+   },
+   {
+   "ArchStdEvent": "smmuv3_pmcg.TRANS_TABLE_WALK_ACCESS"
+   "Compat": "0x00030736"
+   },
+   {
+   "ArchStdEvent": "smmuv3_pmcg.CONFIG_STRUCT_ACCESS"
+   "Compat": "0x00030736"
+   },
+   {
+   "ArchStdEvent": "smmuv3_pmcg.PCIE_ATS_TRANS_RQ"
+   "Compat": "0x00030736"
+   },
+   {
+   "ArchStdEvent": "smmuv3_pmcg.PCIE_ATS_TRANS_PASSED"
+   "Compat": "0x00030736"
+   },
+   {
+   "EventCode": "0x8a",
+   "EventName": "smmuv3_pmcg.L1_TLB",
+   "BriefDescription": "SMMUv3 PMCG L1 TABLE transation",
+   "PublicDescription": "SMMUv3 PMCG L1 TABLE transation",


Those typos are either missing "c"s or "l"s, but with SMMU it's never 
clear which ;)


Robin.


+   "Unit": "smmuv3_pmcg",
+   "Compat": "0x00030736"
+   },
+]



[PATCH v3 3/5] arm64: tegra: Fix CD on Jetson AGX Xavier SDMMC1

2020-10-14 Thread Tamás Szűcs
Change GPIO used for card detection on SDMMC1. Also, specify bus width and
disable write protect while at it.

Signed-off-by: Tamás Szűcs 
---
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index d71b7a1140fe..31b5f00a7547 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -75,7 +75,9 @@
 
/* SDMMC1 (SD/MMC) */
mmc@340 {
-   cd-gpios = < TEGRA194_MAIN_GPIO(A, 0) GPIO_ACTIVE_LOW>;
+   bus-width = <4>;
+   cd-gpios = < TEGRA194_MAIN_GPIO(G, 7) GPIO_ACTIVE_LOW>;
+   disable-wp;
};
 
/* SDMMC4 (eMMC) */
-- 
2.20.1



[PATCH v3 4/5] arm64: tegra: Add vmmc-supply regulator for Jetson AGX Xavier SDMMC1

2020-10-14 Thread Tamás Szűcs
Create regulator for VDD_3V3_SD and add it to SDMMC1. When vmmc-supply is
undefined the initialization sequence specified in aliases is disregarded.

Signed-off-by: Tamás Szűcs 
---
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index 31b5f00a7547..604a2c8a7478 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -78,6 +78,8 @@
bus-width = <4>;
cd-gpios = < TEGRA194_MAIN_GPIO(G, 7) 
GPIO_ACTIVE_LOW>;
disable-wp;
+
+   vmmc-supply = <_3v3_sd>;
};
 
/* SDMMC4 (eMMC) */
@@ -356,4 +358,14 @@
gpio = < TEGRA194_MAIN_GPIO(Z, 1) GPIO_ACTIVE_HIGH>;
enable-active-high;
};
+
+   vdd_3v3_sd: regulator@5 {
+   compatible = "regulator-fixed";
+   regulator-name = "VDD_3V3_SD";
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   gpio = < TEGRA194_MAIN_GPIO(A, 0) GPIO_ACTIVE_HIGH>;
+   regulator-boot-on;
+   enable-active-high;
+   };
 };
-- 
2.20.1



[PATCH v3 5/5] arm64: tegra: Configure SDIO cards on Jetson AGX Xavier SDMMC1

2020-10-14 Thread Tamás Szűcs
Preserve SDIO card power during suspend/resume cycles and enable waking up the
host system on SDIO IRQ assertion.

Signed-off-by: Tamás Szűcs 
---
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index 604a2c8a7478..e3098b1555bc 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -78,6 +78,8 @@
bus-width = <4>;
cd-gpios = < TEGRA194_MAIN_GPIO(G, 7) 
GPIO_ACTIVE_LOW>;
disable-wp;
+   keep-power-in-suspend;
+   wakeup-source;
 
vmmc-supply = <&vdd_3v3_sd>;
};
-- 
2.20.1



[PATCH v3 0/5] arm64: tegra: Xavier SDMMC changes

2020-10-14 Thread Tamás Szűcs
Upstream Xavier SDMMC needs some love. I've been able to test with a Jetson AGX 
Xavier.
Changes here work for me with 1.8 V and 3.3 V SD and SDIO devices.

Changes in v3:
- I started seeing occasional eMMC init timeouts on cold starts when HS400 is 
enabled, so drop this until a fix is found
- add missing vmmc-supply regulator for SDMMC1
- specify bus width and disable WP for SDMMC1
- specify primary clock source for SDMMC1 and SDMMC3

Changes in v2:
- fix board name in commit messages
- rebase on for-next

Kind regards,
Tamas


Tamás Szűcs (5):
  arm64: tegra: Enable signal voltage switching on Tegra194 SDMMC1 and
SDMMC3
  arm64: tegra: Specify sdhci clock parent for Tegra194 SDMMC1 and
SDMMC3
  arm64: tegra: Fix CD on Jetson AGX Xavier SDMMC1
  arm64: tegra: Add vmmc-supply regulator for Jetson AGX Xavier SDMMC1
  arm64: tegra: Configure SDIO cards on Jetson AGX Xavier SDMMC1

 .../arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 18 ++-
 arch/arm64/boot/dts/nvidia/tegra194.dtsi  | 51 +++
 2 files changed, 68 insertions(+), 1 deletion(-)

-- 
2.20.1



Re: [RFC PATCH v2] checkpatch: add shebang check to EXECUTE_PERMISSIONS

2020-10-14 Thread Joe Perches
On Wed, 2020-10-14 at 19:45 +0200, Miguel Ojeda wrote:
>   - Code that should be specially-formatted should be in a
> clang-format-off section to begin with, so it doesn't count.

clang-format is not the end-all tool.
Any 'formatting off/on' marker should be tool agnostic.




WARN_ONCE triggered: tpm_tis: Add a check for invalid status

2020-10-14 Thread Dirk Gouders
On my laptop the check introduced with 55707d531af62b (tpm_tis: Add a
check for invalid status) triggered the warning (output below).

So, my laptop seems to be a candidate for testing.

Dirk

[7.255467] ------------[ cut here ]------------
[7.255468] TPM returned invalid status
[7.255481] WARNING: CPU: 4 PID: 816 at drivers/char/tpm/tpm_tis_core.c:249 
tpm_tis_status+0x5e/0x7f [tpm_tis_core]
[7.255481] Modules linked in: nvram tpm_tis(+) tpm_tis_core tpm wmi 
pinctrl_amd
[7.255485] CPU: 4 PID: 816 Comm: udevd Not tainted 5.9.0-x86_64+ #148
[7.255486] Hardware name: LENOVO 20U50008GE/20U50008GE, BIOS R19ET26W (1.10 
) 06/22/2020
[7.255488] RIP: 0010:tpm_tis_status+0x5e/0x7f [tpm_tis_core]
[7.255489] Code: 44 8a 64 24 07 41 f6 c4 23 74 21 45 31 e4 80 3d 86 26 00 
00 00 75 15 48 c7 c7 2a 71 01 a0 c6 05 76 26 00 00 01 e8 e1 17 0e e1 <0f> 0b 48 
8b 44 24 08 65 48 33 04 25 28 00 00 00 74 05 e8 cd cd a2
[7.255489] RSP: 0018:c9f7f8b8 EFLAGS: 00010286
[7.255490] RAX:  RBX: 8883f9117198 RCX: 038a
[7.255490] RDX: 0001 RSI: 0002 RDI: 82dbd34c
[7.255491] RBP: 8883f8aae000 R08: 0001 R09: 00012d00
[7.255491] R10:  R11: 003c R12: 
[7.255492] R13:  R14: 8883f7c8a000 R15: 0016
[7.255493] FS:  7f350c98d780() GS:88840ed0() 
knlGS:
[7.255493] CS:  0010 DS:  ES:  CR0: 80050033
[7.255494] CR2: 7f350c572ca8 CR3: 0003f700 CR4: 00350ee0
[7.255494] Call Trace:
[7.255497]  tpm_tis_send_data+0x28/0x187 [tpm_tis_core]
[7.255498]  tpm_tis_send_main+0x14/0x84 [tpm_tis_core]
[7.255500]  tpm_transmit+0xc5/0x2d7 [tpm]
[7.255503]  tpm_transmit_cmd+0x23/0x8b [tpm]
[7.255504]  tpm2_get_tpm_pt+0x71/0xb2 [tpm]
[7.255505]  tpm_tis_probe_irq_single+0x134/0xbe2 [tpm_tis_core]
[7.255506]  tpm_tis_core_init+0x3b3/0x47b [tpm_tis_core]
[7.255508]  tpm_tis_plat_probe+0xa7/0xc5 [tpm_tis]
[7.255512]  platform_drv_probe+0x2a/0x6b
[7.255514]  really_probe+0x157/0x32a
[7.255516]  driver_probe_device+0x63/0x97
[7.255517]  device_driver_attach+0x37/0x50
[7.255518]  __driver_attach+0x92/0x9a
[7.255519]  ? device_driver_attach+0x50/0x50
[7.255520]  bus_for_each_dev+0x70/0xa6
[7.255521]  bus_add_driver+0x103/0x1b4
[7.255522]  driver_register+0x99/0xd2
[7.255523]  ? 0xa0023000
[7.255524]  init_tis+0x88/0x1000 [tpm_tis]
[7.255526]  ? consume_skb+0x9/0x20
[7.255529]  ? ___cache_free+0x5c/0x153
[7.255531]  ? _cond_resched+0x1b/0x1e
[7.255532]  do_one_initcall+0x8a/0x195
[7.255534]  ? kmem_cache_alloc_trace+0x80/0x8f
[7.255536]  do_init_module+0x56/0x1e8
[7.255537]  load_module+0x1c2c/0x2139
[7.255538]  ? __do_sys_finit_module+0x8f/0xb6
[7.255539]  __do_sys_finit_module+0x8f/0xb6
[7.255541]  do_syscall_64+0x5d/0x6a
[7.255543]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[7.255544] RIP: 0033:0x7f350cae8919
[7.255545] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 47 05 0c 00 f7 d8 64 89 01 48
[7.255545] RSP: 002b:7ffe0e5ed928 EFLAGS: 0246 ORIG_RAX: 
0139
[7.255546] RAX: ffda RBX: 55a400c373e0 RCX: 7f350cae8919
[7.255546] RDX:  RSI: 7f350cbc1a7d RDI: 000b
[7.255547] RBP: 0002 R08:  R09: 7ffe0e5edaa0
[7.255547] R10: 000b R11: 0246 R12: 7f350cbc1a7d
[7.255548] R13:  R14: 55a400c30e10 R15: 55a400c373e0
[7.255548] ---[ end trace 883dd41f557db5d3 ]---


Re: [SUSPECTED SPAM][PATCH 0/2] Remove Xen PVH dependency on PCI

2020-10-14 Thread Andrew Cooper
On 14/10/2020 18:53, Jason Andryuk wrote:
> A Xen PVH domain doesn't have a PCI bus or devices,

[*] Yet.

> so it doesn't need PCI support built in.

Untangling the dependencies is a good thing, but eventually we plan to
put an optional PCI bus back in, e.g. for SRIOV use cases.

~Andrew


Re: [PATCH v2] checkpatch: add new exception to repeated word check

2020-10-14 Thread Joe Perches
On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> moved the repeated word test to check for more file types. But after
> this, if checkpatch.pl is run on MAINTAINERS, it generates several
> new warnings of the type:

Perhaps, instead of adding more content checks, tighten the
word boundaries so that they are not just \S but also
exclude punctuation, so that content like

git git://
@size size

does not match?
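
To illustrate the idea, here is a minimal userspace sketch in C. It is not the checkpatch.pl Perl code and it swaps the regex for a simple whitespace tokenizer: two adjacent tokens count as a repeat only when both consist purely of letters. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if the token is non-empty and made only of ASCII letters. */
static bool is_plain_word(const char *tok, size_t len)
{
	if (len == 0)
		return false;
	for (size_t i = 0; i < len; i++)
		if (!isalpha((unsigned char)tok[i]))
			return false;
	return true;
}

/*
 * Return true if line contains two adjacent, identical,
 * whitespace-delimited tokens that are both plain words.
 * Tokens carrying punctuation ("git://", "@size") never match.
 */
static bool has_repeated_word(const char *line)
{
	const char *prev = NULL;
	size_t prev_len = 0;
	const char *p = line;

	assert(line != NULL);
	while (*p) {
		/* Skip leading whitespace, then measure the next token. */
		while (*p && isspace((unsigned char)*p))
			p++;
		const char *start = p;
		while (*p && !isspace((unsigned char)*p))
			p++;
		size_t len = (size_t)(p - start);

		if (len && prev && len == prev_len &&
		    is_plain_word(start, len) &&
		    strncmp(prev, start, len) == 0)
			return true;
		if (len) {
			prev = start;
			prev_len = len;
		}
	}
	return false;
}
```

With this rule "is is" is reported, while "git git://" and "@size size" are not, because the tokens differ once punctuation is treated as part of the token.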




Re: [PATCH] MIPS: DEC: Restore bootmem reservation for firmware working memory area

2020-10-14 Thread Serge Semin
Hello Maciej,

On Wed, Oct 14, 2020 at 12:10:09PM +0100, Maciej W. Rozycki wrote:
> Fix a crash on DEC platforms starting with:
> 

...

> @@ -146,6 +150,9 @@ void __init plat_mem_setup(void)
>  
>   ioport_resource.start = ~0UL;
>   ioport_resource.end = 0UL;
> +
> + /* Stay away from the firmware working memory area for now. */

> + memblock_reserve(PHYS_OFFSET, __pa_symbol(&_text));

Shouldn't that be:
+   memblock_reserve(PHYS_OFFSET, __pa_symbol(&_text) - PHYS_OFFSET);
instead?

-Sergey

>  }
>  
>  /*
> 
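
For readers following along: memblock_reserve() takes a base address and a size, not an end address, which is what the question above is about. A minimal userspace sketch of the arithmetic follows; struct region and both helpers are hypothetical stand-ins, not kernel code, and the addresses in the note below are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a reserved range: [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/* Build a region from a start address and an exclusive end address. */
static struct region region_from_range(uint64_t start, uint64_t end)
{
	struct region r;

	assert(end >= start);
	r.base = start;
	r.size = end - start;	/* size is the difference, not the end */
	return r;
}

/* Exclusive end address covered by the region. */
static uint64_t region_end(struct region r)
{
	return r.base + r.size;
}
```

With a made-up start of 0x80000000 and an end of 0x80030000, the size is 0x30000. Passing the end address as the size instead would cover a range that extends far past the intended end whenever the start is non-zero.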


Re: [PATCH 18/20] arch: dts: Fix EHCI/OHCI DT nodes name

2020-10-14 Thread Florian Fainelli
On 10/14/20 3:14 AM, Serge Semin wrote:
> In accordance with the Generic EHCI/OHCI bindings the corresponding node
> name is supposed to comply with the Generic USB HCD DT schema, which
> requires the USB nodes to have the name acceptable by the regexp:
> "^usb(@.*)?" . Let's fix the DTS files, which have the nodes defined with
> incompatible names.
> 
> Signed-off-by: Serge Semin 
> 
> ---
> 
> Please, test the patch out to make sure it doesn't break the dependent DTS
> files. I did only a manual grepping of the possible nodes dependencies.

Not sure how you envisioned these changes to be picked up, but you may
need to split these changes between ARM/ARM64, MIPS and PowerPC at
least. And within ARM/ARM64 you will most likely have to split according
to the various SoC maintainers.
-- 
Florian


[tip: sched/urgent] sched: Replace zero-length array with flexible-array

2020-10-14 Thread tip-bot2 for zhuguangqing
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: eba9f08293d76370049ec85581ab3d7f6d069e3e
Gitweb:
https://git.kernel.org/tip/eba9f08293d76370049ec85581ab3d7f6d069e3e
Author: zhuguangqing 
AuthorDate: Wed, 14 Oct 2020 22:02:20 +08:00
Committer: Ingo Molnar 
CommitterDate: Wed, 14 Oct 2020 19:55:19 +02:00

sched: Replace zero-length array with flexible-array

In the following commit:

  04f5c362ec6d: ("sched/fair: Replace zero-length array with flexible-array")

a zero-length array cpumask[0] has been replaced with cpumask[].
But there is still a cpumask[0] in 'struct sched_group_capacity'
which was missed.

The point of using [] instead of [0] is that with [] the compiler will
generate a build warning if it isn't the last member of a struct.

[ mingo: Rewrote the changelog. ]

Signed-off-by: zhuguangqing 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20201014140220.11384-1-zhuguangqin...@gmail.com
---
 kernel/sched/sched.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 28709f6..648f023 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1471,7 +1471,7 @@ struct sched_group_capacity {
int id;
 #endif
 
-   unsigned long   cpumask[0]; /* Balance mask */
+   unsigned long   cpumask[];  /* Balance mask */
 };
 
 struct sched_group {
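
As a side note on the changelog above, here is a minimal userspace sketch of how a flexible array member is typically declared and allocated. The struct and function names are hypothetical and only loosely modelled on the patched struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct; cpumask[] is a flexible array member. */
struct group_capacity {
	int id;
	unsigned long cpumask[];	/* must be the last member */
};

/* Allocate the header plus nr_longs trailing words, all zeroed. */
static struct group_capacity *group_capacity_alloc(int id, size_t nr_longs)
{
	struct group_capacity *gc;

	assert(nr_longs > 0);
	gc = calloc(1, sizeof(*gc) + nr_longs * sizeof(gc->cpumask[0]));
	if (!gc)
		return NULL;
	gc->id = id;
	return gc;
}
```

The caller sizes the trailing array at allocation time, and sizeof(*gc) covers only the fixed header. Placing another member after cpumask[] is rejected by the compiler, which is the safety net a [0] array does not provide.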


[tip: x86/urgent] x86/syscalls: Document the fact that syscalls 512-547 are a legacy mistake

2020-10-14 Thread tip-bot2 for Andy Lutomirski
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: c3b484c439b0bab7a698495f33ef16286a1000c4
Gitweb:
https://git.kernel.org/tip/c3b484c439b0bab7a698495f33ef16286a1000c4
Author: Andy Lutomirski 
AuthorDate: Sun, 11 Oct 2020 19:51:21 -07:00
Committer: Ingo Molnar 
CommitterDate: Wed, 14 Oct 2020 19:53:40 +02:00

x86/syscalls: Document the fact that syscalls 512-547 are a legacy mistake

Since this commit:

  6365b842aae4 ("x86/syscalls: Split the x32 syscalls into their own table")

there is no need for special x32-specific syscall numbers.  I forgot to
update the comments in syscall_64.tbl.  Add comments to make it clear to
future contributors that this range is a legacy wart.

Reported-by: Jessica Clarke 
Signed-off-by: Andy Lutomirski 
Signed-off-by: Ingo Molnar 
Link: 
https://lore.kernel.org/r/6c56fb4ddd18fc60a238eb4d867e4b3d97c6351e.1602471055.git.l...@kernel.org
---
 arch/x86/entry/syscalls/syscall_64.tbl | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index f30d6ae..4adb5d2 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -363,10 +363,10 @@
 439     common  faccessat2              sys_faccessat2
 
 #
-# x32-specific system call numbers start at 512 to avoid cache impact
-# for native 64-bit operation. The __x32_compat_sys stubs are created
-# on-the-fly for compat_sys_*() compatibility system calls if X86_X32
-# is defined.
+# Due to a historical design error, certain syscalls are numbered differently
+# in x32 as compared to native x86_64.  These syscalls have numbers 512-547.
+# Do not add new syscalls to this range.  Numbers 548 and above are available
+# for non-x32 use.
 #
 512     x32     rt_sigaction            compat_sys_rt_sigaction
 513     x32     rt_sigreturn            compat_sys_x32_rt_sigreturn
@@ -404,3 +404,5 @@
 545     x32     execveat                compat_sys_execveat
 546     x32     preadv2                 compat_sys_preadv64v2
 547     x32     pwritev2                compat_sys_pwritev64v2
+# This is the end of the legacy x32 range.  Numbers 548 and above are
+# not special and are not to be used for x32-specific syscalls.


[tip: sched/urgent] sched/features: Fix !CONFIG_JUMP_LABEL case

2020-10-14 Thread tip-bot2 for Juri Lelli
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: a73f863af4ce9730795eab7097fb2102e6854365
Gitweb:
https://git.kernel.org/tip/a73f863af4ce9730795eab7097fb2102e6854365
Author: Juri Lelli 
AuthorDate: Tue, 13 Oct 2020 07:31:14 +02:00
Committer: Ingo Molnar 
CommitterDate: Wed, 14 Oct 2020 19:55:46 +02:00

sched/features: Fix !CONFIG_JUMP_LABEL case

Commit:

  765cc3a4b224e ("sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG 
builds")

made sched features static for !CONFIG_SCHED_DEBUG configurations, but
overlooked the CONFIG_SCHED_DEBUG=y and !CONFIG_JUMP_LABEL cases.

For the latter echoing changes to /sys/kernel/debug/sched_features has
the nasty effect of effectively changing what sched_features reports,
but without actually changing the scheduler behaviour (since different
translation units get different sysctl_sched_features).

Fix CONFIG_SCHED_DEBUG=y and !CONFIG_JUMP_LABEL configurations by properly
restructuring ifdefs.

Fixes: 765cc3a4b224e ("sched/core: Optimize sched_feat() for 
!CONFIG_SCHED_DEBUG builds")
Co-developed-by: Daniel Bristot de Oliveira 
Signed-off-by: Daniel Bristot de Oliveira 
Signed-off-by: Juri Lelli 
Signed-off-by: Ingo Molnar 
Acked-by: Patrick Bellasi 
Reviewed-by: Valentin Schneider 
Link: https://lore.kernel.org/r/20201013053114.160628-1-juri.le...@redhat.com
---
 kernel/sched/core.c  |  2 +-
 kernel/sched/sched.h | 13 ++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8160ab5..d2003a7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -44,7 +44,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 
-#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_JUMP_LABEL)
+#ifdef CONFIG_SCHED_DEBUG
 /*
  * Debugging: various feature bits
  *
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 648f023..df80bfc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1629,7 +1629,7 @@ enum {
 
 #undef SCHED_FEAT
 
-#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_JUMP_LABEL)
+#ifdef CONFIG_SCHED_DEBUG
 
 /*
  * To support run-time toggling of sched features, all the translation units
@@ -1637,6 +1637,7 @@ enum {
  */
 extern const_debug unsigned int sysctl_sched_features;
 
+#ifdef CONFIG_JUMP_LABEL
 #define SCHED_FEAT(name, enabled)  \
 static __always_inline bool static_branch_##name(struct static_key *key) \
 {  \
@@ -1649,7 +1650,13 @@ static __always_inline bool static_branch_##name(struct 
static_key *key) \
 extern struct static_key sched_feat_keys[__SCHED_FEAT_NR];
 #define sched_feat(x) (static_branch_##x(&sched_feat_keys[__SCHED_FEAT_##x]))
 
-#else /* !(SCHED_DEBUG && CONFIG_JUMP_LABEL) */
+#else /* !CONFIG_JUMP_LABEL */
+
+#define sched_feat(x) (sysctl_sched_features & (1UL << __SCHED_FEAT_##x))
+
+#endif /* CONFIG_JUMP_LABEL */
+
+#else /* !SCHED_DEBUG */
 
 /*
  * Each translation unit has its own copy of sysctl_sched_features to allow
@@ -1665,7 +1672,7 @@ static const_debug __maybe_unused unsigned int 
sysctl_sched_features =
 
 #define sched_feat(x) !!(sysctl_sched_features & (1UL << __SCHED_FEAT_##x))
 
-#endif /* SCHED_DEBUG && CONFIG_JUMP_LABEL */
+#endif /* SCHED_DEBUG */
 
 extern struct static_key_false sched_numa_balancing;
 extern struct static_key_false sched_schedstats;
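
A minimal userspace sketch of the non-jump-label fallback follows: one shared feature mask that every reader tests with a bit operation. The enum entries and function names are hypothetical; the real feature list and macros live in kernel/sched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature indices, standing in for __SCHED_FEAT_* values. */
enum {
	FEAT_ALPHA,
	FEAT_BETA,
	FEAT_NR,
};

/* One shared mask, analogous to a single extern sysctl_sched_features. */
static unsigned int shared_features = 1U << FEAT_ALPHA;

/* Test one bit of the shared mask, like the !CONFIG_JUMP_LABEL sched_feat(). */
static bool feat_enabled(int feat)
{
	assert(feat >= 0 && feat < FEAT_NR);
	return shared_features & (1U << feat);
}

/* Flip a feature at run time; every reader of shared_features sees it. */
static void feat_set(int feat, bool on)
{
	assert(feat >= 0 && feat < FEAT_NR);
	if (on)
		shared_features |= 1U << feat;
	else
		shared_features &= ~(1U << feat);
}
```

The bug described in the changelog corresponds to each translation unit holding its own static copy of the mask: a write through the debugfs file would then update one copy while the scheduler code kept reading another.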


Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-14 Thread Mina Almasry
On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand  wrote:
>
> On 14.10.20 17:22, David Hildenbrand wrote:
> > Hi everybody,
> >
> > Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
> > with hugetlbfs and reported that this results in [1]
> >
> > 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 
> > page_counter_uncharge+0x4b/0x5
> >
> > 2. Any hugetlbfs allocations failing. (I assume because some accounting is 
> > wrong)
> >
> >
> > QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE)
> > to discard pages that are reported as free by a VM. The reporting
> > granularity is in pageblock granularity. So when the guest reports
> > 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
> >
> > I was also able to reproduce (also with virtio-mem, which similarly
> > uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
> > (and on v5.7.X from F32).
> >
> > Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
> > is broken with cgroups. I did *not* try without cgroups yet.
> >
> > Any ideas?

Hi David,

I may be able to dig in and take a look. How do I reproduce this
though? I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
hugetlb region?

>
> Just tried without the hugetlb controller, seems to work just fine.
>
> I'd like to note that
> - The controller was not activated
> - I had to compile the hugetlb controller out to make it work.
>
> --
> Thanks,
>
> David / dhildenb
>
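
For anyone wanting to try the call discussed above, here is a minimal sketch of a hole punch via fallocate(2). The helper name punch_hole is hypothetical, and this is not a confirmed reproducer for the warning: it only shows the syscall usage. On a hugetlbfs file the offset and length would presumably need to be aligned to the huge page size (2MB in the report).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Deallocate [offset, offset + len) in fd while keeping the file size.
 * FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE.
 * Returns 0 on success or a negative errno value on failure.
 */
static int punch_hole(int fd, off_t offset, off_t len)
{
	assert(len > 0);
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len) == -1)
		return -errno;
	return 0;
}
```

A caller would open a file on the filesystem under test, populate a range, and then call punch_hole(fd, 0, 2UL << 20) to discard the first 2MB.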


[PATCH 0/2] Remove Xen PVH dependency on PCI

2020-10-14 Thread Jason Andryuk
A Xen PVH domain doesn't have a PCI bus or devices, so it doesn't need
PCI support built in.  Currently, XEN_PVH depends on XEN_PVHVM which
depends on PCI.

The first patch introduces XEN_PVHVM_GUEST as a toplevel item and
changes XEN_PVHVM to a hidden variable.  This allows XEN_PVH to depend
on XEN_PVHVM without PCI while XEN_PVHVM_GUEST depends on PCI.

The second patch moves XEN_512GB to clean up the option nesting.

Jason Andryuk (2):
  xen: Remove Xen PVH/PVHVM dependency on PCI
  xen: Kconfig: nest Xen guest options

 arch/x86/xen/Kconfig | 38 ++
 drivers/xen/Makefile |  2 +-
 2 files changed, 23 insertions(+), 17 deletions(-)

-- 
2.26.2



[PATCH 2/2] xen: Kconfig: nest Xen guest options

2020-10-14 Thread Jason Andryuk
Moving XEN_512GB allows it to nest under XEN_PV.  That also allows
XEN_PVH to nest under XEN as a sibling to XEN_PV and XEN_PVHVM giving:

[*]   Xen guest support
[*] Xen PV guest support
[*]   Limit Xen pv-domain memory to 512GB
[*]   Xen PV Dom0 support
[*] Xen PVHVM guest support
[*] Xen PVH guest support

Signed-off-by: Jason Andryuk 
---
 arch/x86/xen/Kconfig | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index b75007eb4ec4..2b105888927c 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -26,6 +26,19 @@ config XEN_PV
help
  Support running as a Xen PV guest.
 
+config XEN_512GB
+   bool "Limit Xen pv-domain memory to 512GB"
+   depends on XEN_PV && X86_64
+   default y
+   help
+ Limit paravirtualized user domains to 512GB of RAM.
+
+ The Xen tools and crash dump analysis tools might not support
+ pv-domains with more than 512 GB of RAM. This option controls the
+ default setting of the kernel to use only up to 512 GB or more.
+ It is always possible to change the default via specifying the
+ boot parameter "xen_512gb_limit".
+
 config XEN_PV_SMP
def_bool y
depends on XEN_PV && SMP
@@ -53,19 +66,6 @@ config XEN_PVHVM_GUEST
help
  Support running as a Xen PVHVM guest.
 
-config XEN_512GB
-   bool "Limit Xen pv-domain memory to 512GB"
-   depends on XEN_PV
-   default y
-   help
- Limit paravirtualized user domains to 512GB of RAM.
-
- The Xen tools and crash dump analysis tools might not support
- pv-domains with more than 512 GB of RAM. This option controls the
- default setting of the kernel to use only up to 512 GB or more.
- It is always possible to change the default via specifying the
- boot parameter "xen_512gb_limit".
-
 config XEN_SAVE_RESTORE
bool
depends on XEN
-- 
2.26.2



[PATCH 1/2] xen: Remove Xen PVH/PVHVM dependency on PCI

2020-10-14 Thread Jason Andryuk
A Xen PVH domain doesn't have a PCI bus or devices, so it doesn't need
PCI support built in.  Currently, XEN_PVH depends on XEN_PVHVM which
depends on PCI.

Introduce XEN_PVHVM_GUEST as a toplevel item and change XEN_PVHVM to a
hidden variable.  This allows XEN_PVH to depend on XEN_PVHVM without PCI
while XEN_PVHVM_GUEST depends on PCI.

In drivers/xen, compile platform-pci depending on XEN_PVHVM_GUEST since
that pulls in the PCI dependency for linking.

Signed-off-by: Jason Andryuk 
---
---
 arch/x86/xen/Kconfig | 18 --
 drivers/xen/Makefile |  2 +-
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 218acbd5c7a0..b75007eb4ec4 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -39,16 +39,20 @@ config XEN_DOM0
  Support running as a Xen PV Dom0 guest.
 
 config XEN_PVHVM
-   bool "Xen PVHVM guest support"
-   default y
-   depends on XEN && PCI && X86_LOCAL_APIC
-   help
- Support running as a Xen PVHVM guest.
+   def_bool y
+   depends on XEN && X86_LOCAL_APIC
 
 config XEN_PVHVM_SMP
def_bool y
depends on XEN_PVHVM && SMP
 
+config XEN_PVHVM_GUEST
+   bool "Xen PVHVM guest support"
+   default y
+   depends on XEN_PVHVM && PCI
+   help
+ Support running as a Xen PVHVM guest.
+
 config XEN_512GB
bool "Limit Xen pv-domain memory to 512GB"
depends on XEN_PV
@@ -76,7 +80,9 @@ config XEN_DEBUG_FS
  Enabling this option may incur a significant performance overhead.
 
 config XEN_PVH
-   bool "Support for running as a Xen PVH guest"
+   bool "Xen PVH guest support"
depends on XEN && XEN_PVHVM && ACPI
select PVH
def_bool n
+   help
+ Support for running as a Xen PVH guest.
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index babdca808861..c3621b9f4012 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -21,7 +21,7 @@ obj-$(CONFIG_XEN_GNTDEV)  += xen-gntdev.o
 obj-$(CONFIG_XEN_GRANT_DEV_ALLOC)  += xen-gntalloc.o
 obj-$(CONFIG_XENFS)+= xenfs/
 obj-$(CONFIG_XEN_SYS_HYPERVISOR)   += sys-hypervisor.o
-obj-$(CONFIG_XEN_PVHVM)+= platform-pci.o
+obj-$(CONFIG_XEN_PVHVM_GUEST)  += platform-pci.o
 obj-$(CONFIG_SWIOTLB_XEN)  += swiotlb-xen.o
 obj-$(CONFIG_XEN_MCE_LOG)  += mcelog.o
 obj-$(CONFIG_XEN_PCIDEV_BACKEND)   += xen-pciback/
-- 
2.26.2



[GIT PULL] SPDX patches for 5.10-rc1

2020-10-14 Thread Greg KH
The following changes since commit 856deb866d16e29bd65952e0289066f6078af773:

  Linux 5.9-rc5 (2020-09-13 16:06:00 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx.git 
tags/spdx-5.10-rc1

for you to fetch changes up to c5c553850899e2662ecf749ac21fff95d17f59a4:

  scripts/spdxcheck.py: handle license identifiers in XML comments (2020-10-02 
11:31:26 +0200)


SPDX patches for 5.10-rc1

Here are some SPDX-specific changes for 5.10-rc1.

They include:
- driver fixes to make spdxcheck.py work properly
- add GFDL licenses as "deprecated" but required due to some of
  our documentation using them
- add Zlib license as "deprecated" but required because we have
  code with this license in the tree.
- convert some drivers to have SPDX identifiers that previously
  didn't have them.

All have been in linux-next for a very long time with no reported
issues.

Signed-off-by: Greg Kroah-Hartman 


Lukas Bulwahn (2):
  net/mlx5: IPsec: make spdxcheck.py happy
  scripts/spdxcheck.py: handle license identifiers in XML comments

Mauro Carvalho Chehab (1):
  LICENSE: add GFDL deprecated licenses

Mikhail Zaslonko (1):
  LICENSES/deprecated: add Zlib license text

Thomas Gleixner (5):
  scsi/qla4xxx: Convert to SPDX license identifiers
  scsi/qla2xxx: Convert to SPDX license identifiers
  net/qlcnic: Convert to SPDX license identifiers
  net/qlge: Convert to SPDX license identifiers
  net/qla3xxx: Convert to SPDX license identifiers

 .../device_drivers/qlogic/LICENSE.qla3xxx  |  46 ---
 .../device_drivers/qlogic/LICENSE.qlcnic   | 288 --
 .../networking/device_drivers/qlogic/LICENSE.qlge  | 288 --
 Documentation/scsi/LICENSE.qla2xxx | 290 --
 Documentation/scsi/LICENSE.qla4xxx | 289 --
 LICENSES/deprecated/GFDL-1.1   | 377 +++
 LICENSES/deprecated/GFDL-1.2   | 417 +
 LICENSES/deprecated/Zlib   |  27 ++
 MAINTAINERS|   3 -
 .../mellanox/mlx5/core/accel/ipsec_offload.c   |   2 +-
 drivers/net/ethernet/qlogic/qla3xxx.c  |   3 +-
 drivers/net/ethernet/qlogic/qla3xxx.h  |   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h|   3 +-
 .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c|   3 +-
 .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h|   3 +-
 .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c  |   3 +-
 .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_vnic.c  |   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c|   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.c|   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h|   3 +-
 .../net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c|   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hdr.h|   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c |   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h |   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_init.c   |   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c |   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c   |   3 +-
 .../net/ethernet/qlogic/qlcnic/qlcnic_minidump.c   |   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov.h  |   3 +-
 .../ethernet/qlogic/qlcnic/qlcnic_sriov_common.c   |   3 +-
 .../net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c   |   3 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c  |   3 +-
 drivers/scsi/qla2xxx/qla_attr.c|   3 +-
 drivers/scsi/qla2xxx/qla_bsg.c |   3 +-
 drivers/scsi/qla2xxx/qla_bsg.h |   3 +-
 drivers/scsi/qla2xxx/qla_dbg.c |   3 +-
 drivers/scsi/qla2xxx/qla_dbg.h |   3 +-
 drivers/scsi/qla2xxx/qla_def.h |   3 +-
 drivers/scsi/qla2xxx/qla_dfs.c |   3 +-
 drivers/scsi/qla2xxx/qla_fw.h  |   3 +-
 drivers/scsi/qla2xxx/qla_gbl.h |   3 +-
 drivers/scsi/qla2xxx/qla_gs.c  |   3 +-
 drivers/scsi/qla2xxx/qla_init.c|   3 +-
 drivers/scsi/qla2xxx/qla_inline.h  |   3 +-
 drivers/scsi/qla2xxx/qla_iocb.c|   3 +-
 drivers/scsi/qla2xxx/qla_isr.c |   3 +-
 drivers/scsi/qla2xxx/qla_mbx.c |   3 +-
 drivers/scsi/qla2xxx/qla_mid.c |   3 +-
 drivers/scsi/qla2xxx/qla_mr.c  |   3 +-
 drivers/scsi/qla2xxx/qla_mr.h  |   3 +-
 drivers/scsi/qla2xxx/qla_nvme.c|   3 +-
 drivers/scsi/qla2xxx/qla_nvme.h|   3 +-
 

[PATCH] printk: ringbuffer: Wrong data pointer when appending small string

2020-10-14 Thread Petr Mladek
data_realloc() returns a wrong data pointer when the block is wrapped and
the size is not increased. This can happen when pr_cont() wants to
add only a few characters and there is already space for them because
of alignment.

It might cause writing outside the buffer. It has been detected by LTP
tests with KASAN enabled:

[  221.921944] 
oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=c,mems_allowed=0,oom_memcg=/0,task_memcg=in
[  221.922108] 
==
[  221.922111] BUG: KASAN: global-out-of-bounds in vprintk_store+0x362/0x3d0
[  221.922112] Write of size 2 at addr ba51dbcd by task
memcg_test_1/11282
[  221.922113]
[  221.922114] CPU: 1 PID: 11282 Comm: memcg_test_1 Not tainted
5.9.0-next-20201013 #1
[  221.922116] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[  221.922116] Call Trace:
[  221.922117]  dump_stack+0xa4/0xd9
[  221.922118]  print_address_description.constprop.0+0x21/0x210
[  221.922119]  ? _raw_write_lock_bh+0xe0/0xe0
[  221.922120]  ? vprintk_store+0x362/0x3d0
[  221.922121]  kasan_report.cold+0x37/0x7c
[  221.922122]  ? vprintk_store+0x362/0x3d0
[  221.922123]  check_memory_region+0x18c/0x1f0
[  221.922124]  memcpy+0x3c/0x60
[  221.922125]  vprintk_store+0x362/0x3d0
[  221.922125]  ? __ia32_sys_syslog+0x50/0x50
[  221.922126]  ? _raw_spin_lock_irqsave+0x9b/0x100
[  221.922127]  ? _raw_spin_lock_irq+0xf0/0xf0
[  221.922128]  ? __kasan_check_write+0x14/0x20
[  221.922129]  vprintk_emit+0x8d/0x1f0
[  221.922130]  vprintk_default+0x1d/0x20
[  221.922131]  vprintk_func+0x5a/0x100
[  221.922132]  printk+0xb2/0xe3
[  221.922133]  ? swsusp_write.cold+0x189/0x189
[  221.922134]  ? kernfs_vfs_xattr_set+0x60/0x60
[  221.922134]  ? _raw_write_lock_bh+0xe0/0xe0
[  221.922135]  ? trace_hardirqs_on+0x38/0x100
[  221.922136]  pr_cont_kernfs_path.cold+0x49/0x4b
[  221.922137]  mem_cgroup_print_oom_context.cold+0x74/0xc3
[  221.922138]  dump_header+0x340/0x3bf
[  221.922139]  oom_kill_process.cold+0xb/0x10
[  221.922140]  out_of_memory+0x1e9/0x860
[  221.922141]  ? oom_killer_disable+0x210/0x210
[  221.922142]  mem_cgroup_out_of_memory+0x198/0x1c0
[  221.922143]  ? mem_cgroup_count_precharge_pte_range+0x250/0x250
[  221.922144]  try_charge+0xa9b/0xc50
[  221.922145]  ? arch_stack_walk+0x9e/0xf0
[  221.922146]  ? memory_high_write+0x230/0x230
[  221.922146]  ? avc_has_extended_perms+0x830/0x830
[  221.922147]  ? stack_trace_save+0x94/0xc0
[  221.922148]  ? stack_trace_consume_entry+0x90/0x90
[  221.922149]  __memcg_kmem_charge+0x73/0x120
[  221.922150]  ? cred_has_capability+0x10f/0x200
[  221.922151]  ? mem_cgroup_can_attach+0x260/0x260
[  221.922152]  ? selinux_sb_eat_lsm_opts+0x2f0/0x2f0
[  221.922153]  ? obj_cgroup_charge+0x16b/0x220
[  221.922154]  ? kmem_cache_alloc+0x78/0x4c0
[  221.922155]  obj_cgroup_charge+0x122/0x220
[  221.922156]  ? vm_area_alloc+0x20/0x90
[  221.922156]  kmem_cache_alloc+0x78/0x4c0
[  221.922157]  vm_area_alloc+0x20/0x90
[  221.922158]  mmap_region+0x3ed/0x9a0
[  221.922159]  ? cap_mmap_addr+0x1d/0x80
[  221.922160]  do_mmap+0x3ee/0x720
[  221.922161]  vm_mmap_pgoff+0x16a/0x1c0
[  221.922162]  ? randomize_stack_top+0x90/0x90
[  221.922163]  ? copy_page_range+0x1980/0x1980
[  221.922163]  ksys_mmap_pgoff+0xab/0x350
[  221.922164]  ? find_mergeable_anon_vma+0x110/0x110
[  221.922165]  ? __audit_syscall_entry+0x1a6/0x1e0
[  221.922166]  __x64_sys_mmap+0x8d/0xb0
[  221.922167]  do_syscall_64+0x38/0x50
[  221.922168]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  221.922169] RIP: 0033:0x7fe8f5e75103
[  221.922172] Code: 54 41 89 d4 55 48 89 fd 53 4c 89 cb 48 85 ff 74
56 49 89 d9 45 89 f8 45 89 f2 44 89 e2 4c 89 ee 48 89 ef b8 09 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 7d 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66
2e 0f
[  221.922173] RSP: 002b:7ffd38c90198 EFLAGS: 0246 ORIG_RAX:
0009
[  221.922175] RAX: ffda RBX:  RCX: 7fe8f5e75103
[  221.922176] RDX: 0003 RSI: 1000 RDI: 
[  221.922178] RBP:  R08:  R09: 
[  221.922179] R10: 2022 R11: 0246 R12: 0003
[  221.922180] R13: 1000 R14: 2022 R15: 
[  221.922181]
[  221.922182] The buggy address belongs to the variable:
[  221.922183]  clear_seq+0x2d/0x40
[  221.922183]
[  221.922184] Memory state around the buggy address:
[  221.922185]  ba51da80: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[  221.922187]  ba51db00: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[  221.922188] >ba51db80: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
00 f9 f9 f9
[  221.922189]   ^
[  221.922190]  ba51dc00: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
00 f9 f9 f9
[  221.922191]  ba51dc80: f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9
00 f9 f9 f9
[  221.922193] 

Re: [PATCH] perf jevents: Fix event code for events referencing std arch events

2020-10-14 Thread John Garry

On 14/10/2020 17:49, Arnaldo Carvalho de Melo wrote:

Ok, applied,


Thanks


please consider adding a Fixes tag next time.



Can do if it helps, but I only thought it appropriate when fixing 
something merged to mainline.


John


[GIT PULL] Driver core patches for 5.10-rc1

2020-10-14 Thread Greg KH
The following changes since commit 856deb866d16e29bd65952e0289066f6078af773:

  Linux 5.9-rc5 (2020-09-13 16:06:00 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git 
tags/driver-core-5.10-rc1

for you to fetch changes up to ee4906770ee931394179bcd42cabb196bc952276:

  regmap: debugfs: use semicolons rather than commas to separate statements 
(2020-10-02 15:48:52 +0200)


Driver Core patches for 5.10-rc1

Here is the "big" set of driver core patches for 5.10-rc1.

They include a lot of different things, all related to the driver core
and/or some driver logic:
- sysfs common write functions to make it easier to audit sysfs
  attributes
- device connection cleanups and fixes
- devm helpers for a few functions
- NOIO allocations for when devices are being removed
- minor cleanups and fixes

All have been in linux-next for a while with no reported issues.

Signed-off-by: Greg Kroah-Hartman 


Andy Shevchenko (1):
  driver core: Annotate dev_err_probe() with __must_check

Bartosz Golaszewski (4):
  devres: provide devm_krealloc()
  hwmon: pmbus: use more devres helpers
  iio: adc: xilinx-xadc: use devm_krealloc()
  platform_device: switch to simpler IDA interface

Greg Kroah-Hartman (4):
  Revert "test_firmware: Test platform fw loading on non-EFI systems"
  Revert "driver core: Annotate dev_err_probe() with __must_check"
  Merge 5.9-rc5 into driver-core-next
  platform/x86: intel_pmc_core: do not create a static struct device

Heikki Krogerus (5):
  device connection: Remove device_connection_find()
  device connection: Remove device_connection_add()
  device connection: Remove struct device_connection
  device property: Move fwnode_connection_find_match() under drivers/base/property.c
  Documentation: Remove device connection documentation

Jim Cromie (1):
  dyndbg: use keyword, arg varnames for query term pairs

Joe Perches (8):
  sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output
  drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functions
  drivers core: Remove strcat uses around sysfs_emit and neaten
  drivers core: Reindent a couple uses around sysfs_emit
  drivers core: Miscellaneous changes for sysfs_emit
  mm: and drivers core: Convert hugetlb_report_node_meminfo to sysfs_emit
  drivers core: Use sysfs_emit for shared_cpu_map_show and shared_cpu_list_show
  drivers core: node: Use a more typical macro definition style for ACCESS_ATTR

Jonathan Neuschäfer (1):
  docs: driver-api: firmware: fallback-mechanisms: Fix rendering of bullet point

Julia Lawall (1):
  regmap: debugfs: use semicolons rather than commas to separate statements

Kees Cook (1):
  test_firmware: Test platform fw loading on non-EFI systems

Oliver Neukum (1):
  driver core: force NOIO allocations during unplug

Randy Dunlap (1):
  lib: devres: delete duplicated words

Saravana Kannan (1):
  scripts/dev-needs: Add script to list device dependencies

Stephen Boyd (2):
  syscore: Use pm_pr_dbg() for syscore_{suspend,resume}()
  driver core: platform: Document return type of more functions

Zenghui Yu (1):
  driver core: Use the ktime_us_delta() helper

 Documentation/driver-api/device_connection.rst |  43 ---
 Documentation/driver-api/driver-model/devres.rst   |   1 +
 .../driver-api/firmware/fallback-mechanisms.rst|   1 +
 Documentation/driver-api/index.rst |   1 -
 Documentation/filesystems/sysfs.rst|   8 +-
 MAINTAINERS|   6 +
 drivers/base/Makefile  |   2 +-
 drivers/base/arch_topology.c   |   2 +-
 drivers/base/bus.c |   2 +-
 drivers/base/cacheinfo.c   |  49 ++--
 drivers/base/class.c   |   2 +-
 drivers/base/core.c|  63 +++--
 drivers/base/cpu.c |  84 +++---
 drivers/base/dd.c  |   8 +-
 drivers/base/devcon.c  | 231 ---
 drivers/base/devcoredump.c |   2 +-
 drivers/base/devres.c  | 105 +++
 drivers/base/firmware_loader/fallback.c|   4 +-
 drivers/base/memory.c  |  62 ++--
 drivers/base/node.c| 306 ++--
 drivers/base/platform.c|  37 ++-
 drivers/base/power/sysfs.c | 160 +++
 drivers/base/power/wakeup_stats.c  |  17 +-
 drivers/base/property.c|  

[GIT PULL] TTY/Serial driver patches for 5.10-rc1

2020-10-14 Thread Greg KH
The following changes since commit ba4f184e126b751d1bffad5897f263108befc780:

  Linux 5.9-rc6 (2020-09-20 16:33:55 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tags/tty-5.10-rc1

for you to fetch changes up to 4be87603b6dc9e49c2e07151bb51180dc0b6964a:

  serial: mcf: add sysrq capability (2020-10-05 13:32:30 +0200)


TTY/Serial patches for 5.10-rc1

Here is the big set of tty and serial driver patches for 5.10-rc1.

Lots of little things in here, including:
- tasklet_setup api conversions
- sysrq support for capital letters
- vt and vc cleanups and unwinding the mess some more
- serial driver updates and minor tweaks
- new device ids
- rs485 support for some drivers
- serial binding documentation updates
- lots of small serial driver changes for reported issues

All have been in linux-next for a while with no reported issues.

Signed-off-by: Greg Kroah-Hartman 


Alex Dewar (1):
  serial: core: don't use snprintf() for formatting sysfs attrs

Allen Pais (4):
  tty: ipwireless: convert tasklets to use new tasklet_setup() API
  tty: atmel_serial: convert tasklets to use new tasklet_setup() API
  tty: ifx6x60: convert tasklets to use new tasklet_setup() API
  tty: timbuart: convert tasklets to use new tasklet_setup() API

Andrij Abyzov (1):
  serial: 8250_fsl: Fix TX interrupt handling condition

Andrzej Pietrasiewicz (1):
  tty/sysrq: Extend the sysrq_key_table to cover capital letters

Andy Shevchenko (1):
  serial: sa1100: use platform_get_resource()

Angelo Dureghello (2):
  serial: fsl_lpuart: add sysrq support when using dma
  serial: mcf: add sysrq capability

Artem Savkov (1):
  pty: do tty_flip_buffer_push without port->lock in pty_write

Christophe JAILLET (2):
  tty: serial: icom: switch from 'pci_' to 'dma_' API
  tty: synclink_gt: switch from 'pci_' to 'dma_' API

Daniel Mack (1):
  sc16is7xx: Set iobase to device index

Douglas Anderson (1):
  tty: serial: qcom_geni_serial: 115.2 is a better console default than 9600

Du Huanpeng (1):
  serial: 8250_pci: Add WCH384_8S 8 port serial device

Fabio Estevam (1):
  serial: fsl_lpuart: Fix typo in "transfer"

Greg Kroah-Hartman (3):
  Merge 5.9-rc3 into tty-next
  Merge 5.9.0-rc6 into tty-next
  Merge ba31128384dfd ("Merge tag 'libnvdimm-fixes-5.9-rc7' of git://git.kernel.org/.../nvdimm/nvdimm") into tty-next

Hsin-Yi Wang (2):
  tty: serial: print earlycon info after match->setup
  tty: serial: 8250_mtk: set regshift for mmio32

Jan Kara (1):
  dax: Fix compilation for CONFIG_DAX && !CONFIG_FS_DAX

Jason Yan (1):
  serial: ucc_uart: make qe_uart_set_mctrl() static

Jiri Slaby (25):
  vt: make vc_data pointers const in selection.h
  vt: declare xy for get/putconsxy properly
  vc: propagate "viewed as bool" from screenpos up
  vc_screen: document and cleanup vcs_vc
  vc_screen: rewrite vcs_size to accept vc, not inode
  vc_screen: sanitize types in vcs_write
  vc_screen: extract vcs_write_buf_noattr
  vc_screen: extract vcs_write_buf
  vc_screen: eliminate ifdefs from vcs_write_buf
  vc_screen: sanitize types in vcs_read
  vs_screen: kill tmp_count from vcs_read
  vc_screen: extract vcs_read_buf_uni
  vc_screen: extract vcs_read_buf_noattr
  vc_screen: extract vcs_read_buf
  vc_screen: extract vcs_read_buf_header
  vc_screen: prune macros
  tty: n_gsm, eliminate indirection for gsm->{output,error}()
  newport_con: fix no return statement in newport_show_logo
  newport_con: make module's init & exit static using module_driver
  tty: fix kernel-doc
  tty: ldiscs, fix kernel-doc
  tty: vt, fix kernel-doc
  tty: synclink, fix kernel-doc
  tty: serial, fix kernel-doc
  Revert "vc_screen: extract vcs_read_buf_header"

Julia Lawall (1):
  pch_uart: drop double zeroing

Krzysztof Kozlowski (2):
  serial: 8250: Simplify with dev_err_probe()
  serial: core: Simplify with dev_err_probe()

Lad Prabhakar (2):
  dt-bindings: serial: renesas, scif: Document r8a774e1 bindings
  dt-bindings: serial: renesas, hscif: Document r8a774e1 bindings

Linus Torvalds (1):
  Merge tag 'libnvdimm-fixes-5.9-rc7' of git://git.kernel.org/.../nvdimm/nvdimm

Marek Vasut (1):
  serial: stm32: Add RS485 RTS GPIO control again

Matthias Schiffer (1):
  tty: serial: imx: disable TXDC IRQ in imx_uart_shutdown() to avoid IRQ storm

Paras Sharma (1):
  serial: qcom_geni_serial: To correct QUP Version detection logic

Peng Fan (2):
  tty: serial: lpuart: fix lpuart32_write usage
  tty: serial: fsl_lpuart: fix lpuart32_poll_get_char

Peter Zijlstra (1):
  serial: pl011: Fix lockdep splat when 

Re: [RFC PATCH v2] checkpatch: add shebang check to EXECUTE_PERMISSIONS

2020-10-14 Thread Miguel Ojeda
On Wed, Oct 14, 2020 at 10:40 AM Joe Perches  wrote:
>
> Eek no.
>
> Mindless use of either tool isn't a great thing.

That is a matter of opinion. I (and others) definitely want to get to the
point where the kernel sources are automatically formatted, because it has
significant advantages. The biggest is that it saves a lot of time for
everyone involved.

> Linux source code has generally been created with
> human readability in mind by humans, not scripts.

Code readability is by definition for humans, and any code formatter
(like clang-format) is written for them.

> Please don't try to replace human readable code
> with mindless tools.

Please do :-)

> There is a _lot_ of relatively inappropriate
> output in how clang-format changes existing code
> in the kernel.

Sure, but note that:

  - Code that should be specially-formatted should be in a
clang-format-off section to begin with, so it doesn't count.

  - Some people like to tweak identifiers or refactor code to make it
fit in the line limit beautifully. If you are doing that, then you
should do that for clang-format too. It is not fair to compare both
outputs when the tool is restricted on the transformations it can
perform. You can help the tool, too, the same way you can help
yourself.

  - Some style differences may be worth ignoring if the advantage is
getting automatic formatting.

Cheers,
Miguel
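
The first bullet above refers to clang-format's off/on comment markers. A
minimal sketch of how a hand-aligned table can be fenced off from the
formatter is shown here. The table contents and the names baud_entry,
baud_table and baud_for_code are made up for illustration and are not taken
from any kernel driver.

```c
#include <stddef.h>

/* Hypothetical lookup table that someone aligned by hand. */
struct baud_entry {
	unsigned int code;	/* made-up register encoding */
	unsigned int baud;	/* line speed in bits per second */
};

/* clang-format off */
static const struct baud_entry baud_table[] = {
	{ 0x00,   9600 },
	{ 0x01,  19200 },
	{ 0x02,  38400 },
	{ 0x03, 115200 },
};
/* clang-format on */

/* Everything outside the off/on pair is left to the formatter. */
static unsigned int baud_for_code(unsigned int code)
{
	size_t i;

	for (i = 0; i < sizeof(baud_table) / sizeof(baud_table[0]); i++) {
		if (baud_table[i].code == code)
			return baud_table[i].baud;
	}
	return 0; /* unknown code */
}
```

Only the table keeps its manual column alignment; the lookup function is
ordinary code that the tool is free to reflow.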


Re: [PATCH RFC v4 00/13] perf pmu-events: Support event aliasing for system PMUs

2020-10-14 Thread John Garry

On 14/10/2020 12:16, Jiri Olsa wrote:
>> My thought was that since the kernel part needs acceptance first [0], which
>> is based on v5.9-rc7, I would just use the same baseline here.
>>
>> However I suppose I should still use Arnaldo's perf/core from now on as
>> baseline, so I'll look at that now.
>
> yes please, I can't already apply 2nd patch..
>
> patching file pmu-events/jevents.c
> Hunk #1 succeeded at 82 with fuzz 2 (offset 29 lines).
> Hunk #2 FAILED at 335.
> Hunk #3 FAILED at 354.
> Hunk #4 succeeded at 406 (offset 22 lines).
> Hunk #5 FAILED at 439.
> Hunk #6 FAILED at 531.
> Hunk #7 FAILED at 555.
> Hunk #8 FAILED at 602.
> Hunk #9 FAILED at 698.
> Hunk #10 succeeded at 770 (offset 2 lines).
> Hunk #11 succeeded at 813 (offset 2 lines).
> Hunk #12 succeeded at 1075 (offset 2 lines).
> Hunk #13 succeeded at 1242 (offset 2 lines).
> Hunk #14 succeeded at 1273 (offset 2 lines).
> 7 out of 14 hunks FAILED -- saving rejects to file pmu-events/jevents.c.rej
> can't find file to patch at input line 439

OK, so I already did the rebase and it was quite straightforward - the
conflicts came with the recent changes in jevents.

I will send out soon.

Thanks,
John

Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class

2020-10-14 Thread Elia Devito
Hi,

On Wednesday, 14 October 2020 17:46:43 CEST, Rafael J. Wysocki wrote:
> On Wed, Oct 14, 2020 at 4:16 PM Hans de Goede  wrote:
> > Hi,
> > 
> > On 10/14/20 3:55 PM, Rafael J. Wysocki wrote:
> > > On Tue, Oct 13, 2020 at 3:09 PM Hans de Goede  wrote:
> > >> Hi,
> > >> 
> > >> On 10/12/20 6:42 PM, Rafael J. Wysocki wrote:
> > >>> On Wed, Oct 7, 2020 at 8:41 PM Limonciello, Mario wrote:
> > > On Wed, 2020-10-07 at 15:58 +, Limonciello, Mario wrote:
> > >>> On Mon, 2020-10-05 at 12:58 +, Limonciello, Mario wrote:
> > > > On modern systems CPU/GPU/... performance is often dynamically
> > > > configurable in the form of e.g. variable clock-speeds and TDP. The
> > > > performance is often automatically adjusted to the load by some
> > > > automatic-mechanism (which may very well live outside the kernel).
> > > > 
> > > > These auto performance-adjustment mechanisms often can be configured
> > > > with one of several performance-profiles, with either a bias towards
> > > > low-power consumption (and cool and quiet) or towards performance
> > > > (and higher power consumption and thermals).
> > > > 
> > > > Introduce a new performance_profile class/sysfs API which offers a
> > > > generic API for selecting the performance-profile of these
> > > > automatic-mechanisms.
> >  
> >  If introducing an API for this - let me ask the question, why even
> >  let each driver offer a class interface and userspace need to change
> >  "each" driver's performance setting?
> >  
> >  I would think that you could just offer something kernel-wide like
> >  /sys/power/performance-profile
> >  
> >  Userspace can read and write to a single file.  All drivers can get
> >  notified on this sysfs file changing.
> >  
> >  The systems that react in firmware (such as the two that prompted
> >  this discussion) can change at that time.  It leaves the possibility
> >  for a more open kernel implementation that can do the same thing
> >  though too by directly modifying device registers instead of ACPI
> >  devices.
> > >>> 
> > > >>> The problem, as I've mentioned in previous discussions we had
> > > >>> about this, is that, as you've seen in replies to this mail, this
> > > >>> would suddenly be making the kernel apply policy.
> > > >>> 
> > > >>> There's going to be pushback as soon as policy is enacted in the
> > > >>> kernel, and you take away the different knobs for individual
> > > >>> components (or you can control them centrally as well as
> > > >>> individually). As much as I hate the quantity of knobs[1], I don't
> > > >>> think that trying to reduce the number of knobs in the kernel is a
> > > >>> good use of our time, and easier to enact, coordinated with design
> > > >>> targets, in user-space.
> > > >>> 
> > > >>> Unless you can think of a way to implement this kernel wide setting
> > > >>> without adding one more exponent on the number of possibilities for
> > > >>> the testing matrix, I'll +1 Hans' original API.
> > >> 
> > > >> Actually I offered two proposals in my reply.  So are you NAKing
> > > >> both?
> > > > 
> > > > No, this is only about the first portion of the email, which I
> > > > quoted. And I'm not NAK'ing it, but I don't see how it can work
> > > > without being antithetical to what kernel "users" expect, or what
> > > > the folks consuming those interfaces (presumably us both) would
> > > > expect to be able to test and maintain.
> >  
> >  (Just so others are aware, Bastien and I had a previous discussion on
> >  this topic that he alluded to here:
> >  https://gitlab.freedesktop.org/hadess/power-profiles-daemon/-/issues/1)
> >  
> >  In general I agree that we shouldn't be offering 100's of knobs to
> >  change
> >  things and protect users from themselves where possible.
> >  
> >  Whether the decisions are made in the kernel or in userspace you
> >  still have a matrix once you're letting someone change 2 different
> >  kernel devices that offer policy.  I'd argue it's actually worse if
> >  you let userspace change it though.
> >  
> >  Let's go back to my GPU and platform example and let's say both
> >  offer the new knob here.  Userspace software such as your
> >  PPD picks performance.  Both the platform device and GPU device get
> >  changed, hopefully no conflicts.
> >  Then 

Re: [PATCH v5 3/5] counter: Add character device interface

2020-10-14 Thread David Lechner

On 9/26/20 9:18 PM, William Breathitt Gray wrote:

diff --git a/drivers/counter/counter-chrdev.c b/drivers/counter/counter-chrdev.c
new file mode 100644
index 000000000000..2be3846e4105
--- /dev/null
+++ b/drivers/counter/counter-chrdev.c




+/**
+ * counter_push_event - queue event for userspace reading
+ * @counter:   pointer to Counter structure
+ * @event: triggered event
+ * @channel:   event channel
+ *
+ * Note: If no one is watching for the respective event, it is silently
+ * discarded.
+ *
+ * RETURNS:
+ * 0 on success, negative error number on failure.
+ */
+int counter_push_event(struct counter_device *const counter, const u8 event,
+  const u8 channel)
+{
+   struct counter_event ev = {0};
+   unsigned int copied = 0;
+   unsigned long flags;
+   struct counter_event_node *event_node;
+   struct counter_comp_node *comp_node;
+   int err;
+
+   ev.timestamp = ktime_get_ns();
+   ev.watch.event = event;
+   ev.watch.channel = channel;
+
+   raw_spin_lock_irqsave(&counter->events_lock, flags);
+
+   /* Search for event in the list */
+   list_for_each_entry(event_node, &counter->events_list, l)
+   if (event_node->event == event &&
+   event_node->channel == channel)
+   break;
+
+   /* If event is not in the list */
+   if (&event_node->l == &counter->events_list)
+   goto exit_early;
+
+   /* Read and queue relevant comp for userspace */
+   list_for_each_entry(comp_node, &event_node->comp_list, l) {
+   err = counter_get_data(counter, comp_node, _u8);


Currently all counter devices are memory mapped devices so calling
counter_get_data() here with interrupts disabled is probably OK, but
if any counter drivers are added that use I2C/SPI/etc. that will take
a long time to read, it would cause problems leaving interrupts
disabled here.

Brainstorming: Would it make sense to separate the event from the
component value being read? As I mentioned in one of my previous
reviews, I think there are some cases where we would just want to
know when an event happened and not read any additional data anyway.
In the case of a slow communication bus, this would also let us
queue the event in the kfifo and notify poll right away and then
defer the reads in a workqueue for later.
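
To make the brainstorming above a little more concrete, here is a rough
userspace model of the split: a fast path that only records the event, and a
slow path that fills in the component value later. This is only a sketch
under assumptions. The names evq, evq_push, evq_fill_values and
fake_bus_read are hypothetical, the plain array stands in for a kfifo, and
the fill function stands in for work that would run from a workqueue.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVQ_SIZE 8u /* power of two so the index arithmetic wraps cleanly */

struct ev {
	uint64_t timestamp;
	uint8_t event;
	uint8_t channel;
	bool value_valid;	/* set once the deferred read has run */
	uint64_t value;
};

struct evq {
	struct ev slot[EVQ_SIZE];
	unsigned int head;	/* next slot to write */
	unsigned int tail;	/* next slot still waiting for its value */
};

/*
 * Fast path: what would run with interrupts disabled. It touches memory
 * only and never talks to the (possibly slow) bus.
 * Simplification: a slot counts as free once its value has been filled in;
 * a real queue would also track the userspace reader.
 */
static bool evq_push(struct evq *q, uint64_t now, uint8_t event,
		     uint8_t channel)
{
	struct ev *e;

	if (q->head - q->tail == EVQ_SIZE)
		return false;	/* queue full, event dropped */

	e = &q->slot[q->head % EVQ_SIZE];
	e->timestamp = now;
	e->event = event;
	e->channel = channel;
	e->value_valid = false;
	e->value = 0;
	q->head++;
	return true;
}

/* Stand-in for a slow I2C/SPI register read; always succeeds here. */
static int fake_bus_read(uint8_t channel, uint64_t *value)
{
	*value = 1000u + channel;
	return 0;
}

/* Slow path: what would run later in process context and may sleep. */
static unsigned int evq_fill_values(struct evq *q,
				    int (*read)(uint8_t, uint64_t *))
{
	unsigned int filled = 0;

	while (q->tail != q->head) {
		struct ev *e = &q->slot[q->tail % EVQ_SIZE];

		if (read(e->channel, &e->value) == 0) {
			e->value_valid = true;
			filled++;
		}
		q->tail++;
	}
	return filled;
}
```

With this shape, a watcher that only cares about when an event happened can
be notified right after evq_push(), while the value arrives whenever the
deferred read completes.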




+   if (err)
+   goto err_counter_get_data;
+
+   ev.watch.component = comp_node->component;
+
+   copied += kfifo_put(&counter->events, ev);
+   }
+
+   if (copied)
+   wake_up_poll(&counter->events_wait, EPOLLIN);
+
+exit_early:
+   raw_spin_unlock_irqrestore(&counter->events_lock, flags);
+
+   return 0;
+
+err_counter_get_data:
+   raw_spin_unlock_irqrestore(&counter->events_lock, flags);
+   return err;
+}




Re: [GIT PULL] io_uring updates for 5.10-rc1

2020-10-14 Thread Nick Desaulniers
Sorry for not reporting it sooner.  It looks to me like a GNU `as` bug:
https://github.com/ClangBuiltLinux/linux/issues/1153#issuecomment-692265433
When I'm done with the three build breakages that popped up overnight I'll try
to report it to GNU binutils folks.

(We run an issue tracker out of
https://github.com/ClangBuiltLinux/linux/issues, if you're interested to see what
the outstanding known issues are, or recently solved ones.  We try to
aggressively track when and where patches land for the inevitable backports.
We have 118 people in our github group!)


Re: [PATCH v7 3/3] iommu/tegra-smmu: Add PCI support

2020-10-14 Thread Robin Murphy

On 2020-10-09 17:19, Nicolin Chen wrote:

This patch simply adds support for PCI devices.

Reviewed-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Signed-off-by: Nicolin Chen 
---

Changelog
v6->v7
  * Renamed goto labels, suggested by Thierry.
v5->v6
  * Added Dmitry's Reviewed-by and Tested-by.
v4->v5
  * Added Dmitry's Reviewed-by
v3->v4
  * Dropped !iommu_present() check
  * Added CONFIG_PCI check in the exit path
v2->v3
  * Replaced ternary conditional operator with if-else in .device_group()
  * Dropped change in tegra_smmu_remove()
v1->v2
  * Added error-out labels in tegra_smmu_probe()
  * Dropped pci_request_acs() since IOMMU core would call it.

  drivers/iommu/tegra-smmu.c | 35 +--
  1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index be29f5977145..2941d6459076 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -10,6 +10,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -865,7 +866,11 @@ static struct iommu_group *tegra_smmu_device_group(struct device *dev)
group->smmu = smmu;
group->soc = soc;
  
-	group->group = iommu_group_alloc();

+   if (dev_is_pci(dev))
+   group->group = pci_device_group(dev);


Just to check, is it OK to have two or more swgroups "owning" the same 
iommu_group if an existing one gets returned here? It looks like that 
might not play nice with the use of iommu_group_set_iommudata().
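
A toy model of the concern, assuming the group keeps exactly one private
data pointer that each owner sets for itself. The toy_* types and functions
below are hypothetical stand-ins written for this illustration; they are not
the IOMMU core API and they ignore release callbacks and reference counting.

```c
#include <stddef.h>

/* Toy stand-in for an IOMMU group: a single private-data slot. */
struct toy_group {
	void *iommudata;
};

/* Toy stand-in for the driver's per-swgroup bookkeeping. */
struct toy_swgroup {
	unsigned int id;
	struct toy_group *group;
};

static void toy_group_set_iommudata(struct toy_group *g, void *data)
{
	g->iommudata = data; /* silently replaces any previous owner */
}

/* Each swgroup registers itself as the owner of the group it was given. */
static void toy_swgroup_attach(struct toy_swgroup *sw, struct toy_group *g)
{
	sw->group = g;
	toy_group_set_iommudata(g, sw);
}

static int toy_group_owned_by(const struct toy_group *g,
			      const struct toy_swgroup *sw)
{
	return g->iommudata == sw;
}
```

If two swgroups are handed the same existing group, the second attach
overwrites the first owner's pointer while the first swgroup still believes
it owns the group.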


Robin.


+   else
+   group->group = generic_device_group(dev);
+
if (IS_ERR(group->group)) {
devm_kfree(smmu->dev, group);
mutex_unlock(&smmu->lock);
@@ -1075,22 +1080,32 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
iommu_device_set_fwnode(&smmu->iommu, dev->fwnode);
  
  	err = iommu_device_register(&smmu->iommu);

-   if (err) {
-   iommu_device_sysfs_remove(&smmu->iommu);
-   return ERR_PTR(err);
-   }
+   if (err)
+   goto remove_sysfs;
  
  	err = bus_set_iommu(&platform_bus_type, &tegra_smmu_ops);

-   if (err < 0) {
-   iommu_device_unregister(&smmu->iommu);
-   iommu_device_sysfs_remove(&smmu->iommu);
-   return ERR_PTR(err);
-   }
+   if (err < 0)
+   goto unregister;
+
+#ifdef CONFIG_PCI
+   err = bus_set_iommu(&pci_bus_type, &tegra_smmu_ops);
+   if (err < 0)
+   goto unset_platform_bus;
+#endif
  
  	if (IS_ENABLED(CONFIG_DEBUG_FS))

tegra_smmu_debugfs_init(smmu);
  
  	return smmu;

+
+unset_platform_bus: __maybe_unused;
+   bus_set_iommu(&platform_bus_type, NULL);
+unregister:
+   iommu_device_unregister(&smmu->iommu);
+remove_sysfs:
+   iommu_device_sysfs_remove(&smmu->iommu);
+
+   return ERR_PTR(err);
  }
  
  void tegra_smmu_remove(struct tegra_smmu *smmu)




Re: [PATCH 00/20] dt-bindings: usb: Add generic USB HCD, xHCI, DWC USB3 DT schema

2020-10-14 Thread Serge Semin
Ah, forgot to mark the series as v2. Sorry about that. The next one will be v3
then...

-Sergey

On Wed, Oct 14, 2020 at 01:13:42PM +0300, Serge Semin wrote:
> We've performed some work on the Generic USB HCD, xHCI and DWC USB3 DT
> bindings in the framework of the Baikal-T1 SoC support integration into
> the kernel. This patchset is a result of that work.
> 
> First of all we moved the generic USB properties from the legacy text
> bindings into the USB HCD DT schema. So now the generic USB HCD-compatible
> DT nodes are validated taking into account the optional properties like:
> maximum-speed, dr_mode, otg-rev, usb-role-switch, etc. We've fixed these
> properties a bit so they would correspond to the functionality the kernel
> currently supports.
> 
> Secondly we converted generic USB xHCI text bindings file into the DT
> schema. It had to be split up into two bindings: DT schema with generic
> xHCI properties and a generic xHCI device DT schema. The latter will be
> used to validate the pure xHCI-based nodes, while the former can be
> utilized by some vendor-specific versions of xHCI.
> 
> Thirdly, and this was the primary goal for the Baikal-T1 SoC USB, we
> converted the legacy text-based DWC USB3 bindings to DT schema and altered
> the result a bit so it would be more coherent with what the controller and
> its driver actually support. Since we've now got the DWC USB3 DT schema,
> we used it to validate the sub-nodes of the Qualcomm, TI and Amlogic DWC3
> DT nodes.
> 
> Finally we've also fixed all the OHCI/EHCI, xHCI and DW USB3 compatible DT
> nodes so they would comply with the node naming schema declared in the USB
> HCD DT bindings file.
> 
> Link: https://lore.kernel.org/linux-usb/20201010224121.12672-1-sergey.se...@baikalelectronics.ru/
> Changelog v2:
> - Thanks to Sergei Shtylyov for suggesting the commit logs grammar fixes:
>   [PATCH 04/18] dt-bindings: usb: usb-hcd: Add "ulpi/serial/hsic" PHY types
>   [PATCH 05/18] dt-bindings: usb: usb-hcd: Add "tpl-support" property
>   [PATCH 11/18] dt-bindings: usb: dwc3: Add interrupt-names property support
>   [PATCH 13/18] dt-bindings: usb: dwc3: Add Tx De-emphasis restrictions
>   [PATCH 17/18] dt-bindings: usb: keystone-dwc3: Validate DWC3 sub-node
> - Set FL-adj of the amlogic,meson-g12a-usb controller with value 0x20 instead
>   of completely removing the property.
> - Drop the patch:
>   [PATCH 02/18] dt-bindings: usb: usb-hcd: Add "wireless" maximum-speed property value
>   since "wireless" speed type is deprecated due to the lack of devices
>   supporting it.
> - Drop quotes from around the compat string constant.
> - Discard '|' from the property descriptions, since we don't need to preserve
>   the text formatting.
> - Convert abbreviated form of the "maximum-speed" enum constraint into
>   the multi-lined version of the list.
> - Fix the DW USB3 "clock-names" prop description to be referring to the
>   enumerated clock-names instead of the ones from the Databook.
> - Add explicit "additionalProperties: true" to the usb-xhci.yaml schema,
>   since additionalProperties/unevaluatedProperties are going to be mandatory
>   for each binding.
> - Use "oneOf: [dwc2.yaml#, snps,dwc3.yaml#]" instead of the bulky "if:
>   properties: compatible: ..." statement.
> - Discard the "^dwc3@[0-9a-f]+$" nodes from being acceptable as sub-nodes
>   of the Qualcomm DWC3 DT nodes.
> - Add new patches:
>   [PATCH 18/20] arch: dts: Fix EHCI/OHCI DT nodes name
>   [PATCH 19/20] arch: dts: Fix xHCI DT nodes name
>   [PATCH 20/20] arch: dts: Fix DWC USB3 DT nodes name
> 
> Signed-off-by: Serge Semin 
> Cc: Alexey Malahov 
> Cc: Pavel Parkhomenko 
> Cc: Andy Gross 
> Cc: Bjorn Andersson 
> Cc: Manu Gautam 
> Cc: Roger Quadros 
> Cc: Lad Prabhakar 
> Cc: Yoshihiro Shimoda 
> Cc: Neil Armstrong 
> Cc: Kevin Hilman 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-snps-...@lists.infradead.org
> Cc: linux-m...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> 
> Serge Semin (20):
>   dt-bindings: usb: usb-hcd: Convert generic USB properties to DT schema
>   dt-bindings: usb: usb-hcd: Add "otg-rev" property restriction
>   dt-bindings: usb: usb-hcd: Add "ulpi/serial/hsic" PHY types
>   dt-bindings: usb: usb-hcd: Add "tpl-support" property
>   dt-bindings: usb: usb-hcd: Add generic "usb-phy" property
>   dt-bindings: usb: Convert xHCI bindings to DT schema
>   dt-bindings: usb: xhci: Add Broadcom STB v2 compatible device
>   dt-bindings: usb: renesas-xhci: Refer to the usb-xhci.yaml file
>   dt-bindings: usb: Convert DWC USB3 bindings to DT schema
>   dt-bindings: usb: dwc3: Add interrupt-names property support
>   dt-bindings: usb: dwc3: Add synopsys,dwc3 compatible string
>   dt-bindings: usb: dwc3: Add Tx De-emphasis restrictions
>   dt-bindings: usb: dwc3: Add Frame Length Adj restrictions
>   dt-bindings: usb: meson-g12a-usb: Fix FL-adj 

Re: [PATCH] firmware: arm_scmi: Fix duplicate workqueue name

2020-10-14 Thread Sudeep Holla
On Wed, Oct 14, 2020 at 10:13:04AM -0700, Florian Fainelli wrote:
> On 10/14/20 9:18 AM, Sudeep Holla wrote:
> > On Wed, Oct 14, 2020 at 02:48:19PM +0100, Cristian Marussi wrote:
> > 
> > [...]
> > 
> >>>
> >>> I have pushed a version with above change [1], please check if you are
> >>> happy with that ?
> >>>
> >>> [1] https://git.kernel.org/sudeep.holla/linux/c/b2cd15549b
> >>
> >> I agree with the need to retain _notify name, but I'm not so sure about
> >> the above patch...which is:
> >>
> > 
> > I agree, I thought about it and just cooked up this as a quick solution.
> > I will move to that, even I wasn't happy with this TBH.
> 
> The reason why I went with just dev_name() was such that the workqueue
> name and the device nodes under /sys would strictly match, which helps
> as a user, and also it avoided the temporary buffer and its size
> limitations.

Agreed. I just showed that as an example and was hoping to use some nice
kstr* APIs to achieve what I wanted but soon realised there are none.
So as replied earlier, I will take this change as is for now. Let us
address this in future if it becomes an issue.

Thanks for the quick test, we now know whom to bother if we need more
testing, as our internal platforms are not that great at covering all the
aspects we add in the spec and even in the kernel.

-- 
Regards,
Sudeep
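
The "temporary buffer and its size limitations" point can be illustrated
with a small userspace sketch. It assumes the alternative composed the name
as "<device name>_notify" in a fixed-size buffer; make_notify_name is a
hypothetical helper written for this illustration, not code from either
version of the patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Compose "<dev>_notify" into a caller-supplied buffer.
 * Returns 0 on success and -1 if the name would not fit.
 */
static int make_notify_name(char *buf, size_t len, const char *dev)
{
	int n = snprintf(buf, len, "%s_notify", dev);

	/* snprintf() returns the length it wanted to write, not what fit. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Without the length check, two long device names could be cut down to the
same prefix and produce duplicate names again, which is what using the
device name directly avoids.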


[for-next][PATCH 07/12] tracing: Check that the synthetic event and field names are legal

2020-10-14 Thread Steven Rostedt
From: Tom Zanussi 

Call the is_good_name() function used by probe events to make sure
synthetic event and field names don't contain illegal characters and
cause unexpected parsing of synthetic event commands.

Link: https://lkml.kernel.org/r/c4d4bb59d3ac39bcbd70fba0cf837d6b1cedb015.1602598160.git.zanu...@kernel.org

Fixes: 4b147936fa50 (tracing: Add support for 'synthetic' events)
Reported-by: Masami Hiramatsu 
Reviewed-by: Masami Hiramatsu 
Tested-by: Masami Hiramatsu 
Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_synth.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index b19e2f4159ab..8c9d6e464da0 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -572,6 +572,10 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
ret = -ENOMEM;
goto free;
}
+   if (!is_good_name(field->name)) {
+   ret = -EINVAL;
+   goto free;
+   }
 
if (field_type[0] == ';')
field_type++;
@@ -1112,6 +1116,11 @@ static int __create_synth_event(int argc, const char *name, const char **argv)
 
mutex_lock(&event_mutex);
 
+   if (!is_good_name(name)) {
+   ret = -EINVAL;
+   goto out;
+   }
+
event = find_synth_event(name);
if (event) {
ret = -EEXIST;
-- 
2.28.0
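
For readers unfamiliar with the kind of check the patch adds, here is a
userspace sketch of a name validator. It assumes the rule is "looks like a C
identifier" (letters, digits and underscores, not starting with a digit);
that rule and the name name_is_legal are assumptions made for this
illustration, and the kernel's is_good_name() may differ in detail.

```c
#include <ctype.h>
#include <stdbool.h>

/* Accept only identifier-like names: [A-Za-z_][A-Za-z0-9_]* */
static bool name_is_legal(const char *name)
{
	/* First character: a letter or an underscore (rejects "" too). */
	if (!isalpha((unsigned char)*name) && *name != '_')
		return false;

	/* Remaining characters: letters, digits or underscores. */
	while (*++name != '\0') {
		if (!isalnum((unsigned char)*name) && *name != '_')
			return false;
	}
	return true;
}
```

Under this rule a name such as "mye;vent" is rejected up front, so the
semicolon can never be mistaken for a field separator later in parsing.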




[for-next][PATCH 11/12] selftests/ftrace: Add test case for synthetic event syntax errors

2020-10-14 Thread Steven Rostedt
From: Tom Zanussi 

Add a selftest that verifies that the syntax error messages and caret
positions are correct for most of the possible synthetic event syntax
error cases.

Link: https://lkml.kernel.org/r/af611928ce79f86eaf0af8654f1d7802d5cc21ff.1602598160.git.zanu...@kernel.org

Tested-by: Masami Hiramatsu 
Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 .../trigger-synthetic_event_syntax_errors.tc  | 19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc

diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc
new file mode 100644
index 000000000000..ada594fe16cb
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc
@@ -0,0 +1,19 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test synthetic_events syntax parser errors
+# requires: synthetic_events error_log
+
+check_error() { # command-with-error-pos-by-^
+ftrace_errlog_check 'synthetic_events' "$1" 'synthetic_events'
+}
+
+check_error 'myevent ^chr arg' # INVALID_TYPE
+check_error 'myevent ^char str[];; int v'  # INVALID_TYPE
+check_error 'myevent char ^str]; int v'# INVALID_NAME
+check_error 'myevent char ^str;[]' # INVALID_NAME
+check_error 'myevent ^char str[; int v'# INVALID_TYPE
+check_error '^mye;vent char str[]' # BAD_NAME
+check_error 'myevent char str[]; ^int' # INVALID_FIELD
+check_error '^myevent' # INCOMPLETE_CMD
+
+exit 0
-- 
2.28.0
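
Each check_error line in the test marks the expected error position with a
caret inside the command. The idea of splitting such a marked string into
"command without the caret" and "caret offset" can be sketched in C as
follows. split_caret is a hypothetical helper written for this illustration
and it assumes a single caret per command; the selftest itself does this
work in shell.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy 'marked' into 'out' with the first '^' removed.
 * Returns the caret's byte offset, or -1 if there is no caret
 * or the result does not fit in 'out'.
 */
static long split_caret(const char *marked, char *out, size_t out_len)
{
	const char *caret = strchr(marked, '^');
	size_t pos, rest;

	if (!caret)
		return -1;

	pos = (size_t)(caret - marked);	/* bytes before the caret */
	rest = strlen(caret + 1);	/* bytes after the caret */
	if (pos + rest + 1 > out_len)
		return -1;

	memcpy(out, marked, pos);
	memcpy(out + pos, caret + 1, rest + 1); /* includes the NUL */
	return (long)pos;
}
```

The cleaned command is what gets written to synthetic_events, and the offset
is what the error log's caret line is expected to point at.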




[for-next][PATCH 01/12] tracing: Check return value of __create_val_fields() before using its result

2020-10-14 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

While writing a histogram trigger, I made a typo.

Wrote:
  echo 'hist:key=pid:ts=common_timestamp.usec' > events/sched/sched_waking/trigger

Instead of:
  echo 'hist:key=pid:ts=common_timestamp.usecs' > events/sched/sched_waking/trigger

and the following crash happened:

 BUG: kernel NULL pointer dereference, address: 0008
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x) - not-present page
 PGD 0 P4D 0
 Oops:  [#1] PREEMPT SMP PTI
 CPU: 4 PID: 1641 Comm: sh Not tainted 5.9.0-rc5-test+ #549
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 RIP: 0010:event_hist_trigger_func+0x70b/0x1ee0
 Code: 24 08 89 d5 49 89 cc e9 8c 00 00 00 4c 89 f2 41 b9 00 10 00 00 4c 89 e1 
44 89 ee 4c 89 ff e8 dc d3 ff ff 45 89 ea 4b 8b 14 d7  42 08 04 74 17 41 8b 
8f c0 00 00 00 8d 71 01 41 89 b7 c0 00 00
 RSP: 0018:959213d53db0 EFLAGS: 00010202
 RAX: ffea RBX:  RCX: 00084c04
 RDX:  RSI: df7326aefebd174c RDI: 00031080
 RBP: 0002 R08: 0001 R09: 0001
 R10: 0001 R11: 0046 R12: 959211dcf690
 R13: 0001 R14: 95925a36e370 R15: 959251c89800
 FS:  7fb9ea934740() GS:95925ab0() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 0008 CR3: c976c005 CR4: 001706e0
 Call Trace:
  ? trigger_process_regex+0x78/0x110
  trigger_process_regex+0xc5/0x110
  event_trigger_write+0x71/0xd0
  vfs_write+0xca/0x210
  ksys_write+0x70/0xf0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fb9eaa29487
 Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 
8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 
c3 48 83 ec 28 48 89 54 24 18 48 89 74 24

This was caused by accessing the hist_data fields after the call to
__create_val_fields() without checking whether the creation succeeded.
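
The same failure mode can be shown in plain userspace C. The following is
only a minimal sketch, not the kernel code: create_field(), struct table
and FL_STRING are hypothetical stand-ins for the creation helper, hist_data
and HIST_FIELD_FL_STRING. It illustrates that the result slot may only be
dereferenced when the creation call reports success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define FL_STRING  0x1	/* hypothetical flag, stands in for HIST_FIELD_FL_STRING */
#define MAX_FIELDS 4

struct field {
	unsigned long flags;
	int str_idx;
};

struct table {
	struct field *fields[MAX_FIELDS];	/* a slot stays NULL until creation succeeds */
	int n_str;
};

/* Hypothetical creator: fills slot idx on success, leaves it NULL on error. */
static int create_field(struct table *t, int idx, const char *expr,
			struct field *storage)
{
	assert(idx >= 0 && idx < MAX_FIELDS);

	if (strcmp(expr, "comm") != 0)
		return -EINVAL;		/* unknown expression, e.g. a typo */

	storage->flags = FL_STRING;
	storage->str_idx = -1;
	t->fields[idx] = storage;
	return 0;
}

/* Only touch t->fields[idx] when create_field() returned 0. */
static int create_var(struct table *t, int idx, const char *expr,
		      struct field *storage)
{
	int ret = create_field(t, idx, expr, storage);

	if (!ret && (t->fields[idx]->flags & FL_STRING))
		t->fields[idx]->str_idx = t->n_str++;

	return ret;
}
```

Without the "!ret" test, a failed create_field() would leave the slot NULL
and the flags check would dereference it.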

Link: https://lkml.kernel.org/r/20201013154852.3abd8...@gandalf.local.home

Fixes: 63a1e5de3006 ("tracing: Save normal string variables")
Reviewed-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_hist.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index c74a7d157306..96c3f86b81c5 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -3687,7 +3687,7 @@ static int create_var_field(struct hist_trigger_data *hist_data,
 
 	ret = __create_val_field(hist_data, val_idx, file, var_name, expr_str, flags);
 
-	if (hist_data->fields[val_idx]->flags & HIST_FIELD_FL_STRING)
+	if (!ret && hist_data->fields[val_idx]->flags & HIST_FIELD_FL_STRING)
 		hist_data->fields[val_idx]->var_str_idx = hist_data->n_var_str++;
 
 	return ret;
-- 
2.28.0




[for-next][PATCH 03/12] tracing/boot: Add ftrace.instance.*.alloc_snapshot option

2020-10-14 Thread Steven Rostedt
From: Masami Hiramatsu 

Add ftrace.instance.*.alloc_snapshot option.

This option has been described in Documentation/trace/boottime-trace.rst
but not implemented yet.

ftrace.[instance.INSTANCE.]alloc_snapshot
   Allocate snapshot buffer.

Unlike kernel.alloc_snapshot, which allocates the snapshot buffer only
for the main instance, this option can allocate a snapshot buffer for
any new instance.

Link: https://lkml.kernel.org/r/160234368948.400560.15313384470765915015.stgit@devnote2

Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_boot.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/trace/trace_boot.c b/kernel/trace/trace_boot.c
index 754e3cf2df3a..c22a152ef0b4 100644
--- a/kernel/trace/trace_boot.c
+++ b/kernel/trace/trace_boot.c
@@ -284,6 +284,12 @@ trace_boot_enable_tracer(struct trace_array *tr, struct xbc_node *node)
 		if (tracing_set_tracer(tr, p) < 0)
 			pr_err("Failed to set given tracer: %s\n", p);
 	}
+
+	/* Since tracer can free snapshot buffer, allocate snapshot here.*/
+	if (xbc_node_find_value(node, "alloc_snapshot", NULL)) {
+		if (tracing_alloc_snapshot_instance(tr) < 0)
+			pr_err("Failed to allocate snapshot buffer\n");
+	}
 }
 
 static void __init
-- 
2.28.0




[for-next][PATCH 04/12] tracing: Fix some typos in comments

2020-10-14 Thread Steven Rostedt
From: Qiujun Huang 

s/wihin/within/
s/retrieven/retrieved/
s/suppport/support/
s/wil/will/
s/accidently/accidentally/
s/if the if the/if the/

Link: https://lkml.kernel.org/r/20201010140924.3809-1-hqjag...@gmail.com

Signed-off-by: Qiujun Huang 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.c | 4 ++--
 kernel/trace/trace.h | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0806fa9f2815..63c97012ed39 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9465,7 +9465,7 @@ __init static int tracer_alloc_buffers(void)
 	}
 
 	/*
-	 * Make sure we don't accidently add more trace options
+	 * Make sure we don't accidentally add more trace options
 	 * than we have bits for.
 	 */
 	BUILD_BUG_ON(TRACE_ITER_LAST_BIT > TRACE_FLAGS_MAX_SIZE);
@@ -9494,7 +9494,7 @@ __init static int tracer_alloc_buffers(void)
 
 	/*
 	 * The prepare callbacks allocates some memory for the ring buffer. We
-	 * don't free the buffer if the if the CPU goes down. If we were to free
+	 * don't free the buffer if the CPU goes down. If we were to free
 	 * the buffer, then the user would lose any trace that was in the
 	 * buffer. The memory will be removed once the "instance" is removed.
 	 */
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 5b0e797cacdd..f777bb68e660 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -246,7 +246,7 @@ typedef bool (*cond_update_fn_t)(struct trace_array *tr, void *cond_data);
  * tracing_snapshot_cond(tr, cond_data), the cond_data passed in is
  * passed in turn to the cond_snapshot.update() function.  That data
  * can be compared by the update() implementation with the cond_data
- * contained wihin the struct cond_snapshot instance associated with
+ * contained within the struct cond_snapshot instance associated with
  * the trace_array.  Because the tr->max_lock is held throughout the
  * update() call, the update() function can directly retrieve the
  * cond_snapshot and cond_data associated with the per-instance
@@ -271,7 +271,7 @@ typedef bool (*cond_update_fn_t)(struct trace_array *tr, void *cond_data);
  * take the snapshot, by returning 'true' if so, 'false' if no
  * snapshot should be taken.  Because the max_lock is held for
  * the duration of update(), the implementation is safe to
- * directly retrieven and save any implementation data it needs
+ * directly retrieved and save any implementation data it needs
  * to in association with the snapshot.
  */
 struct cond_snapshot {
@@ -573,7 +573,7 @@ struct tracer {
  *   The function callback, which can use the FTRACE bits to
  *check for recursion.
  *
- * Now if the arch does not suppport a feature, and it calls
+ * Now if the arch does not support a feature, and it calls
  * the global list function which calls the ftrace callback
  * all three of these steps will do a recursion protection.
  * There's no reason to do one if the previous caller already
@@ -1479,7 +1479,7 @@ __trace_event_discard_commit(struct trace_buffer *buffer,
 /*
  * Helper function for event_trigger_unlock_commit{_regs}().
  * If there are event triggers attached to this event that requires
- * filtering against its fields, then they wil be called as the
+ * filtering against its fields, then they will be called as the
  * entry already holds the field information of the current event.
  *
  * It also checks if the event should be discarded or not.
-- 
2.28.0




[for-next][PATCH 10/12] tracing: Handle synthetic event array field type checking correctly

2020-10-14 Thread Steven Rostedt
From: Tom Zanussi 

Since synthetic event array types are derived from the field name,
there may be a semicolon at the end of the type which should be
stripped off.

If there are more characters following that, normal type string
checking will result in an invalid type.

Without this patch, you can end up with an invalid field type string
that gets displayed in both the synthetic event description and the
event format:

Before:

  # echo 'myevent char str[16]; int v' >> synthetic_events
  # cat synthetic_events
    myevent char[16]; str; int v

  name: myevent
  ID: 1936
  format:
    field:unsigned short common_type;	offset:0;	size:2;	signed:0;
    field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
    field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
    field:int common_pid;	offset:4;	size:4;	signed:1;

    field:char str[16];;	offset:8;	size:16;	signed:1;
    field:int v;	offset:40;	size:4;	signed:1;

  print fmt: "str=%s, v=%d", REC->str, REC->v

After:

  # echo 'myevent char str[16]; int v' >> synthetic_events
  # cat synthetic_events
    myevent char[16] str; int v

  # cat events/synthetic/myevent/format
  name: myevent
  ID: 1936
  format:
    field:unsigned short common_type;	offset:0;	size:2;	signed:0;
    field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
    field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
    field:int common_pid;	offset:4;	size:4;	signed:1;

    field:char str[16];	offset:8;	size:16;	signed:1;
    field:int v;	offset:40;	size:4;	signed:1;

  print fmt: "str=%s, v=%d", REC->str, REC->v
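
The two checks described above can be sketched in userspace C. This is an
illustration under assumptions, not the kernel implementation:
array_suffix_len() and char_array_size() are hypothetical helpers. The
first drops a trailing ';' from an array suffix, the second rejects a
"char[N]" type that has anything after the closing bracket. Unlike a
parser that knows about dynamic strings, this sketch also treats an empty
size ("char[]") as malformed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of an array suffix such as "[16];" without a trailing ';'. */
static size_t array_suffix_len(const char *array)
{
	size_t l;

	assert(array != NULL);
	l = strlen(array);
	if (l && array[l - 1] == ';')
		l--;
	return l;
}

/*
 * Parse "char[N]" and return N, or -1 if the type is malformed.
 * Any character after the closing ']' makes the type invalid.
 */
static long char_array_size(const char *type)
{
	const char *start, *end;
	char *stop;
	long n;

	if (strncmp(type, "char[", 5) != 0)
		return -1;
	start = type + 5;

	end = strchr(type, ']');
	if (!end || end <= start || end[1] != '\0')
		return -1;

	n = strtol(start, &stop, 10);
	if (stop != end || n <= 0)
		return -1;
	return n;
}
```

With these helpers, "[16];" contributes four characters to the type
string, and "char[16];" is reported as an invalid type instead of being
displayed with a stray semicolon.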

Link: https://lkml.kernel.org/r/6587663b56c2d45ab9d8c8472a2110713cdec97d.1602598160.git.zanu...@kernel.org

[ : wrote parse_synth_field() snippet. ]
Fixes: 4b147936fa50 (tracing: Add support for 'synthetic' events)
Reported-by: Masami Hiramatsu 
Tested-by: Masami Hiramatsu 
Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_synth.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index f77851018121..d239f0e2af8f 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -174,7 +174,7 @@ static int synth_field_string_size(char *type)
 	start += sizeof("char[") - 1;
 
 	end = strchr(type, ']');
-	if (!end || end < start)
+	if (!end || end < start || type + strlen(type) > end + 1)
 		return -EINVAL;
 
 	len = end - start;
@@ -625,8 +625,14 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
 	if (field_type[0] == ';')
 		field_type++;
 	len = strlen(field_type) + 1;
-	if (array)
-		len += strlen(array);
+
+	if (array) {
+		int l = strlen(array);
+
+		if (l && array[l - 1] == ';')
+			l--;
+		len += l;
+	}
 	if (prefix)
 		len += strlen(prefix);
 
-- 
2.28.0




Re: [GIT PULL] Hyper-V commits for 5.10

2020-10-14 Thread pr-tracker-bot
The pull request you sent on Tue, 13 Oct 2020 13:12:14 +:

> ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git 
> tags/hyperv-next-signed

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4907a43da83184d4e88009654c9b31f5e091f709

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


[for-next][PATCH 00/12] tracing: Last minute updates before sending to Linus

2020-10-14 Thread Steven Rostedt
  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
for-next

Head SHA1: d3c07fac565261101337b9535df072361297ea2d


Axel Rasmussen (1):
  tracing: support "bool" type in synthetic trace events

Gaurav Kohli (1):
  tracing: Fix race in trace_open and buffer resize call

Masami Hiramatsu (1):
  tracing/boot: Add ftrace.instance.*.alloc_snapshot option

Qiujun Huang (1):
  tracing: Fix some typos in comments

Steven Rostedt (VMware) (1):
  tracing: Check return value of __create_val_fields() before using its result

Tom Zanussi (7):
  tracing: Don't show dynamic string internals in synthetic event description
  tracing: Move is_good_name() from trace_probe.h to trace.h
  tracing: Check that the synthetic event and field names are legal
  tracing: Add synthetic event error logging
  selftests/ftrace: Change synthetic event name for inter-event-combined test
  tracing: Handle synthetic event array field type checking correctly
  selftests/ftrace: Add test case for synthetic event syntax errors


 kernel/trace/ring_buffer.c |  10 ++
 kernel/trace/trace.c   |   4 +-
 kernel/trace/trace.h   |  21 +++-
 kernel/trace/trace_boot.c  |   6 +
 kernel/trace/trace_events_hist.c   |   2 +-
 kernel/trace/trace_events_synth.c  | 127 -
 kernel/trace/trace_probe.h |  13 ---
 .../trigger-inter-event-combined-hist.tc   |   8 +-
 .../trigger-synthetic_event_syntax_errors.tc   |  19 +++
 9 files changed, 180 insertions(+), 30 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc


Re: [GIT PULL v2] objtool changes for v5.10

2020-10-14 Thread pr-tracker-bot
The pull request you sent on Tue, 13 Oct 2020 12:38:31 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> objtool-core-2020-10-13

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6873139ed078bfe0341d4cbb69e5af1b323bf532

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [PATCH 09/20] dt-bindings: usb: Convert DWC USB3 bindings to DT schema

2020-10-14 Thread Serge Semin
On Wed, Oct 14, 2020 at 08:32:19AM -0500, Rob Herring wrote:
> On Wed, 14 Oct 2020 13:13:51 +0300, Serge Semin wrote:
> > DWC USB3 DT node is supposed to be compliant with the Generic xHCI
> > Controller schema, but with additional vendor-specific properties, the
> > controller-specific reference clocks and PHYs. So let's convert the
> > currently available legacy text-based DWC USB3 bindings to the DT schema
> > and make sure the DWC USB3 nodes are also validated against the
> > usb-xhci.yaml schema.
> > 
> > Note we have to discard the nodename restriction of being prefixed with
> > "dwc3@" string, since in accordance with the usb-hcd.yaml schema USB nodes
> > are supposed to be named as "^usb(@.*)".
> > 
> > Signed-off-by: Serge Semin 
> > 
> > ---
> > 
> > Changelog v2:
> > - Discard '|' from the descriptions, since we don't need to preserve
> >   the text formatting in any of them.
> > - Drop quotes from around the string constants.
> > - Fix the "clock-names" prop description to be referring the enumerated
> >   clock-names instead of the ones from the Databook.
> > ---
> >  .../devicetree/bindings/usb/dwc3.txt  | 125 
> >  .../devicetree/bindings/usb/snps,dwc3.yaml| 295 ++
> >  2 files changed, 295 insertions(+), 125 deletions(-)
> >  delete mode 100644 Documentation/devicetree/bindings/usb/dwc3.txt
> >  create mode 100644 Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > 
> 
> 

> My bot found errors running 'make dt_binding_check' on your patch:
> 
> ./Documentation/devicetree/bindings/usb/snps,dwc3.yaml:44:4: [warning] wrong 
> indentation: expected 4 but found 3 (indentation)
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/usb/qcom,dwc3.example.dt.yaml:
>  dwc3@a60: $nodename:0: 'dwc3@a60' does not match '^usb(@.*)?'
>   From schema: 
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/usb/amlogic,meson-g12a-usb-ctrl.example.dt.yaml:
>  usb@ff50: snps,quirk-frame-length-adjustment: True is not of type 'array'
>   From schema: 
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> 
> 
> See https://patchwork.ozlabs.org/patch/1382003
> 
> If you already ran 'make dt_binding_check' and didn't see the above
> error(s), then make sure dt-schema is up to date:
> 
> pip3 install git+https://github.com/devicetree-org/dt-schema.git@master 
> --upgrade
> 
> Please check and re-submit.
> 

Both of these errors are fixed in the following patches of the series:
[PATCH 17/20] dt-bindings: usb: qcom,dwc3: Validate DWC3 sub-node
[PATCH 15/20] dt-bindings: usb: meson-g12a-usb: Fix FL-adj property value

This patch preserves the original legacy bindings and doesn't touch the
dependent bogus DT schemas.

-Sergey



[for-next][PATCH 12/12] tracing: support "bool" type in synthetic trace events

2020-10-14 Thread Steven Rostedt
From: Axel Rasmussen 

It's common [1] to define tracepoint fields as "bool" when they contain
a true / false value. Currently, defining a synthetic event with a
"bool" field yields EINVAL. It's possible to work around this by using
e.g. u8 (assuming sizeof(bool) is 1, and bool is unsigned; if either of
these properties doesn't match, you get EINVAL [2]).

Supporting "bool" explicitly makes hooking this up easier and more
portable for userspace.
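
As an illustration of what "supporting bool explicitly" amounts to, here is
a small userspace sketch of a type-name lookup that returns both the size
and the print format. Note the assumptions: the kernel code uses an
if/else chain of strcmp() calls, whereas this sketch swaps in a lookup
table, and struct type_info and lookup_type() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct type_info {
	const char *name;	/* type name as written by the user */
	size_t size;		/* storage size of the field */
	const char *fmt;	/* printf-style format used to display it */
};

static const struct type_info known_types[] = {
	{ "long",          sizeof(long),          "%ld" },
	{ "unsigned long", sizeof(unsigned long), "%lu" },
	{ "bool",          sizeof(bool),          "%d"  },
	{ "pid_t",         sizeof(pid_t),         "%d"  },
};

/* Return the entry for a type name, or NULL if the type is unknown. */
static const struct type_info *lookup_type(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(known_types) / sizeof(known_types[0]); i++) {
		if (strcmp(known_types[i].name, name) == 0)
			return &known_types[i];
	}
	return NULL;
}
```

An unknown name yields NULL here, which corresponds to the EINVAL that a
"bool" field produced before this change.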

[1]: grep -r "bool" include/trace/events/
[2]: check_synth_field() in kernel/trace/trace_events_hist.c

Link: https://lkml.kernel.org/r/20201009220524.485102-2-axelrasmus...@google.com

Acked-by: Michel Lespinasse 
Acked-by: David Rientjes 
Signed-off-by: Axel Rasmussen 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_synth.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index d239f0e2af8f..3212e2c653b3 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -229,6 +229,8 @@ static int synth_field_size(char *type)
 		size = sizeof(long);
 	else if (strcmp(type, "unsigned long") == 0)
 		size = sizeof(unsigned long);
+	else if (strcmp(type, "bool") == 0)
+		size = sizeof(bool);
 	else if (strcmp(type, "pid_t") == 0)
 		size = sizeof(pid_t);
 	else if (strcmp(type, "gfp_t") == 0)
@@ -271,6 +273,8 @@ static const char *synth_field_fmt(char *type)
 		fmt = "%ld";
 	else if (strcmp(type, "unsigned long") == 0)
 		fmt = "%lu";
+	else if (strcmp(type, "bool") == 0)
+		fmt = "%d";
 	else if (strcmp(type, "pid_t") == 0)
 		fmt = "%d";
 	else if (strcmp(type, "gfp_t") == 0)
-- 
2.28.0




Re: [GIT PULL] xen: branch for v5.10-rc1

2020-10-14 Thread pr-tracker-bot
The pull request you sent on Wed, 14 Oct 2020 07:39:17 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git 
> for-linus-5.10b-rc1-tag

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a09b1d78505eb9fe27597a5174c61a7c66253fe8

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


[for-next][PATCH 06/12] tracing: Move is_good_name() from trace_probe.h to trace.h

2020-10-14 Thread Steven Rostedt
From: Tom Zanussi 

is_good_name() is useful for other trace infrastructure, such as
synthetic events, so make it available via trace.h.
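
For readers who want to try the rule outside the kernel, here is a
userspace rendition of the same check. It is a sketch, assuming the
standard <ctype.h> classifiers in the default "C" locale; the casts to
unsigned char and the NULL assertion are additions for userspace safety
and are not part of the moved helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* A name must start with a letter or '_' and continue with [A-Za-z0-9_]. */
static bool is_good_name(const char *name)
{
	assert(name != NULL);

	if (!isalpha((unsigned char)*name) && *name != '_')
		return false;
	while (*++name != '\0') {
		if (!isalnum((unsigned char)*name) && *name != '_')
			return false;
	}
	return true;
}
```

Under this rule a name such as waking+wakeup_latency is rejected because
of the '+', and an empty string is rejected by the first test.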

Link: https://lkml.kernel.org/r/cc6d6a2d7da6957fcbe1e2922e76d18d2bb459b4.1602598160.git.zanu...@kernel.org

Acked-by: Masami Hiramatsu 
Tested-by: Masami Hiramatsu 
Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.h   | 13 +
 kernel/trace/trace_probe.h | 13 -
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f777bb68e660..34e0c4d5a6e7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_FTRACE_SYSCALLS
 #include /* For NR_SYSCALLS   */
@@ -2090,4 +2091,16 @@ static __always_inline void trace_iterator_reset(struct trace_iterator *iter)
 	iter->pos = -1;
 }
 
+/* Check the name is good for event/group/fields */
+static inline bool is_good_name(const char *name)
+{
+	if (!isalpha(*name) && *name != '_')
+		return false;
+	while (*++name != '\0') {
+		if (!isalpha(*name) && !isdigit(*name) && *name != '_')
+			return false;
+	}
+	return true;
+}
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 04d00987da69..2f703a20c724 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -348,18 +347,6 @@ bool trace_probe_match_command_args(struct trace_probe *tp,
 #define trace_probe_for_each_link_rcu(pos, tp) \
 	list_for_each_entry_rcu(pos, &(tp)->event->files, list)
 
-/* Check the name is good for event/group/fields */
-static inline bool is_good_name(const char *name)
-{
-	if (!isalpha(*name) && *name != '_')
-		return false;
-	while (*++name != '\0') {
-		if (!isalpha(*name) && !isdigit(*name) && *name != '_')
-			return false;
-	}
-	return true;
-}
-
 #define TPARG_FL_RETURN BIT(0)
 #define TPARG_FL_KERNEL BIT(1)
 #define TPARG_FL_FENTRY BIT(2)
-- 
2.28.0




[for-next][PATCH 08/12] tracing: Add synthetic event error logging

2020-10-14 Thread Steven Rostedt
From: Tom Zanussi 

Add support for synthetic event error logging, which entails adding a
logging function for it, a way to save the synthetic event command,
and a set of specific synthetic event parse error strings and
handling.
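
The error table in this patch relies on the "X-macro" idiom: one list is
expanded twice, once into enum constants and once into message strings, so
the two can never get out of order. A reduced userspace sketch follows.
The three entries are taken from the patch, while SYNTH_ERR_COUNT and
token_pos() are hypothetical additions for illustration (token_pos() is
modeled on what errpos() appears to compute: the offset of the offending
token in the saved command).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ERRORS \
	C(BAD_NAME,     "Illegal name"),	\
	C(INVALID_TYPE, "Invalid type"),	\
	C(CMD_TOO_LONG, "Command too long")

/* First expansion: enum constants SYNTH_ERR_BAD_NAME, ... */
#define C(a, b) SYNTH_ERR_##a
enum { ERRORS, SYNTH_ERR_COUNT };
#undef C

/* Second expansion: the matching message strings, in the same order. */
#define C(a, b) b
static const char *const err_text[] = { ERRORS };
#undef C

/* Offset of the offending token within the command, 0 if not found. */
static size_t token_pos(const char *cmd, const char *token)
{
	const char *found;

	assert(cmd != NULL && token != NULL);
	found = strstr(cmd, token);
	return found ? (size_t)(found - cmd) : 0;
}
```

Adding a new error then means adding a single C(...) line; the enum value
and its text are generated together.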

Link: https://lkml.kernel.org/r/ed099c66df13b40cfc633aaeb17f66c37a923066.1602598160.git.zanu...@kernel.org

[ : wrote save_cmdstr() seq_buf implementation. ]
Tested-by: Masami Hiramatsu 
Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_synth.c | 92 ++-
 1 file changed, 90 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 8c9d6e464da0..f77851018121 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -20,6 +20,48 @@
 
 #include "trace_synth.h"
 
+#undef ERRORS
+#define ERRORS	\
+	C(BAD_NAME,		"Illegal name"),		\
+	C(CMD_INCOMPLETE,	"Incomplete command"),		\
+	C(EVENT_EXISTS,		"Event already exists"),	\
+	C(TOO_MANY_FIELDS,	"Too many fields"),		\
+	C(INCOMPLETE_TYPE,	"Incomplete type"),		\
+	C(INVALID_TYPE,		"Invalid type"),		\
+	C(INVALID_FIELD,	"Invalid field"),		\
+	C(CMD_TOO_LONG,		"Command too long"),
+
+#undef C
+#define C(a, b)		SYNTH_ERR_##a
+
+enum { ERRORS };
+
+#undef C
+#define C(a, b)		b
+
+static const char *err_text[] = { ERRORS };
+
+static char last_cmd[MAX_FILTER_STR_VAL];
+
+static int errpos(const char *str)
+{
+	return err_pos(last_cmd, str);
+}
+
+static void last_cmd_set(char *str)
+{
+	if (!str)
+		return;
+
+	strncpy(last_cmd, str, MAX_FILTER_STR_VAL - 1);
+}
+
+static void synth_err(u8 err_type, u8 err_pos)
+{
+	tracing_log_err(NULL, "synthetic_events", last_cmd, err_text,
+			err_type, err_pos);
+}
+
 static int create_synth_event(int argc, const char **argv);
 static int synth_event_show(struct seq_file *m, struct dyn_event *ev);
 static int synth_event_release(struct dyn_event *ev);
@@ -545,8 +587,10 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
 		field_type++;
 
 	if (!strcmp(field_type, "unsigned")) {
-		if (argc < 3)
+		if (argc < 3) {
+			synth_err(SYNTH_ERR_INCOMPLETE_TYPE, errpos(field_type));
 			return ERR_PTR(-EINVAL);
+		}
 		prefix = "unsigned ";
 		field_type = argv[1];
 		field_name = argv[2];
@@ -573,6 +617,7 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
 		goto free;
 	}
 	if (!is_good_name(field->name)) {
+		synth_err(SYNTH_ERR_BAD_NAME, errpos(field_name));
 		ret = -EINVAL;
 		goto free;
 	}
@@ -601,6 +646,7 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
 
 	size = synth_field_size(field->type);
 	if (size < 0) {
+		synth_err(SYNTH_ERR_INVALID_TYPE, errpos(field_type));
 		ret = -EINVAL;
 		goto free;
 	} else if (size == 0) {
@@ -621,6 +667,7 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
 			field->is_dynamic = true;
 			size = sizeof(u64);
 		} else {
+			synth_err(SYNTH_ERR_INVALID_TYPE, errpos(field_type));
 			ret = -EINVAL;
 			goto free;
 		}
@@ -1098,12 +1145,47 @@ int synth_event_gen_cmd_array_start(struct dynevent_cmd *cmd, const char *name,
 }
 EXPORT_SYMBOL_GPL(synth_event_gen_cmd_array_start);
 
+static int save_cmdstr(int argc, const char *name, const char **argv)
+{
+	struct seq_buf s;
+	char *buf;
+	int i;
+
+	buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	seq_buf_init(&s, buf, MAX_DYNEVENT_CMD_LEN);
+
+	seq_buf_puts(&s, name);
+
+	for (i = 0; i < argc; i++) {
+		seq_buf_putc(&s, ' ');
+		seq_buf_puts(&s, argv[i]);
+	}
+
+	if (!seq_buf_buffer_left(&s)) {
+		synth_err(SYNTH_ERR_CMD_TOO_LONG, 0);
+		kfree(buf);
+		return -EINVAL;
+	}
+	buf[s.len] = 0;
+	last_cmd_set(buf);
+
+	kfree(buf);
+	return 0;
+}
+
 static int __create_synth_event(int argc, const char *name, const char **argv)
 {
 	struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
 	struct synth_event *event = NULL;
 	int i, consumed = 0, n_fields = 0, ret = 0;
 
+	ret = save_cmdstr(argc, name, argv);
+	if (ret)
+		return ret;
+
 	/*
 	 * Argument syntax:
   

Re: [GIT PULL] objtool changes for v5.10

2020-10-14 Thread pr-tracker-bot
The pull request you sent on Tue, 13 Oct 2020 10:26:25 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> objtool-core-2020-10-13

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/ab0a40ea88204e1291b56da8128e2845fec8ee88

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [GIT PULL] x86/seves for v5.10

2020-10-14 Thread pr-tracker-bot
The pull request you sent on Tue, 13 Oct 2020 13:08:19 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> tags/x86_seves_for_v5.10

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/da9803dfd3955bd2f9909d55e23f188ad76dbe58

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


[for-next][PATCH 02/12] tracing: Fix race in trace_open and buffer resize call

2020-10-14 Thread Steven Rostedt
From: Gaurav Kohli 

The following race can occur if trace_open and a resize of the
cpu buffer run in parallel on different CPUs:

CPUX                                CPUY
                                    ring_buffer_resize
                                    atomic_read(&buffer->resize_disabled)
tracing_open
tracing_reset_online_cpus
ring_buffer_reset_cpu
rb_reset_cpu
                                    rb_update_pages
                                    remove/insert pages
resetting pointer

This race can cause a data abort or sometimes an infinite loop in
rb_remove_pages and rb_insert_pages while checking pages
for sanity.

Take the buffer lock to fix this.
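
The idea of the fix, reduced to a userspace sketch: both the resize path
and the reset path take the same mutex, so a reset can never observe a
half-updated page list. This is only an illustration under assumptions;
struct buffer, buffer_resize() and buffer_reset() are hypothetical names,
and a pthread mutex stands in for the kernel mutex in the ring buffer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct buffer {
	pthread_mutex_t mutex;	/* serializes resize against reset */
	size_t nr_pages;
	size_t head;		/* index of the current page, kept < nr_pages */
};

/* Change the number of pages; the head index is clamped under the lock. */
static void buffer_resize(struct buffer *b, size_t nr_pages)
{
	assert(nr_pages > 0);

	pthread_mutex_lock(&b->mutex);
	b->nr_pages = nr_pages;
	if (b->head >= nr_pages)
		b->head = nr_pages - 1;
	pthread_mutex_unlock(&b->mutex);
}

/* Reset the head pointer; cannot interleave with a resize in progress. */
static void buffer_reset(struct buffer *b)
{
	pthread_mutex_lock(&b->mutex);
	b->head = 0;
	pthread_mutex_unlock(&b->mutex);
}
```

Because both functions hold the mutex for their whole critical section,
the invariant head < nr_pages holds whenever either one starts.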

Link: https://lkml.kernel.org/r/1601976833-24377-1-git-send-email-gko...@codeaurora.org

Cc: sta...@vger.kernel.org
Fixes: b23d7a5f4a07a ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
Signed-off-by: Gaurav Kohli 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/ring_buffer.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 93ef0ab6ea20..15bf28b13e50 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4866,6 +4866,9 @@ void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return;
 
+	/* prevent another thread from changing buffer sizes */
+	mutex_lock(&buffer->mutex);
+
 	atomic_inc(&cpu_buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
@@ -4876,6 +4879,8 @@ void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
 
 	atomic_dec(&cpu_buffer->record_disabled);
 	atomic_dec(&cpu_buffer->resize_disabled);
+
+	mutex_unlock(&buffer->mutex);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
 
@@ -4889,6 +4894,9 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
 	struct ring_buffer_per_cpu *cpu_buffer;
 	int cpu;
 
+	/* prevent another thread from changing buffer sizes */
+	mutex_lock(&buffer->mutex);
+
 	for_each_online_buffer_cpu(buffer, cpu) {
 		cpu_buffer = buffer->buffers[cpu];
 
@@ -4907,6 +4915,8 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
 		atomic_dec(&cpu_buffer->record_disabled);
 		atomic_dec(&cpu_buffer->resize_disabled);
 	}
+
+	mutex_unlock(&buffer->mutex);
 }
 
 /**
-- 
2.28.0




[for-next][PATCH 09/12] selftests/ftrace: Change synthetic event name for inter-event-combined test

2020-10-14 Thread Steven Rostedt
From: Tom Zanussi 

This test uses waking+wakeup_latency as an event name, which doesn't
make sense since it includes an operator.  Illegal names are now
detected by the synthetic event command parsing, which causes this
test to fail.  Change the name to 'waking_plus_wakeup_latency' to
prevent this.

Link: https://lkml.kernel.org/r/a1ee2f76ff28ef7166fb788ca8be968887808920.1602598160.git.zanu...@kernel.org

Fixes: f06eec4d0f2c (selftests: ftrace: Add inter-event hist triggers testcases)
Acked-by: Masami Hiramatsu 
Tested-by: Masami Hiramatsu 
Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 .../inter-event/trigger-inter-event-combined-hist.tc  | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc
index 7449a4b8f1f9..9098f1e7433f 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc
@@ -25,12 +25,12 @@ echo 'wakeup_latency u64 lat pid_t pid' >> synthetic_events
 echo 'hist:keys=pid:ts1=common_timestamp.usecs if comm=="ping"' >> events/sched/sched_wakeup/trigger
 echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts1:onmatch(sched.sched_wakeup).wakeup_latency($wakeup_lat,next_pid) if next_comm=="ping"' > events/sched/sched_switch/trigger
 
-echo 'waking+wakeup_latency u64 lat; pid_t pid' >> synthetic_events
-echo 'hist:keys=pid,lat:sort=pid,lat:ww_lat=$waking_lat+$wakeup_lat:onmatch(synthetic.wakeup_latency).waking+wakeup_latency($ww_lat,pid)' >> events/synthetic/wakeup_latency/trigger
-echo 'hist:keys=pid,lat:sort=pid,lat' >> events/synthetic/waking+wakeup_latency/trigger
+echo 'waking_plus_wakeup_latency u64 lat; pid_t pid' >> synthetic_events
+echo 'hist:keys=pid,lat:sort=pid,lat:ww_lat=$waking_lat+$wakeup_lat:onmatch(synthetic.wakeup_latency).waking_plus_wakeup_latency($ww_lat,pid)' >> events/synthetic/wakeup_latency/trigger
+echo 'hist:keys=pid,lat:sort=pid,lat' >> events/synthetic/waking_plus_wakeup_latency/trigger
 
 ping $LOCALHOST -c 3
-if ! grep -q "pid:" events/synthetic/waking+wakeup_latency/hist; then
+if ! grep -q "pid:" events/synthetic/waking_plus_wakeup_latency/hist; then
     fail "Failed to create combined histogram"
 fi
 
-- 
2.28.0




[for-next][PATCH 05/12] tracing: Don't show dynamic string internals in synthetic event description

2020-10-14 Thread Steven Rostedt
From: Tom Zanussi 

For synthetic event dynamic fields, the type contains "__data_loc",
which is an internal part of the type that is only meant to be
displayed in the format, not in the event description itself.
Showing it in the description is confusing to users, because printing
it suggests that __data_loc can be used on the command line to define
an event field, which it cannot.

So filter it out from the description, while leaving it in the type.
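
A userspace sketch of the filtering step, for illustration only:
display_type() is a hypothetical helper. It relies on the fact that
sizeof("__data_loc") counts the terminating NUL, so advancing by that
amount skips both the prefix and the single space after it. The check for
that space is an extra guard added in this sketch so that a bare
"__data_loc" string is never overrun.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the user-visible part of a type string, i.e. the text after
 * "__data_loc " if that prefix is present, or the whole type otherwise.
 */
static const char *display_type(const char *type)
{
	const char *t;

	assert(type != NULL);
	t = strstr(type, "__data_loc");
	if (t && t[sizeof("__data_loc") - 1] == ' ')
		return t + sizeof("__data_loc");
	return type;
}
```

The stored type string is left untouched, so the format output can keep
showing the full internal type.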

Link: https://lkml.kernel.org/r/b3b7baf7813298a5ede4ff02e2e837b91c05a724.1602598160.git.zanu...@kernel.org

Reported-by: Masami Hiramatsu 
Tested-by: Masami Hiramatsu 
Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_synth.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 3b2dcc42b8ee..b19e2f4159ab 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -1867,14 +1867,22 @@ static int __synth_event_show(struct seq_file *m, struct synth_event *event)
 {
 	struct synth_field *field;
 	unsigned int i;
+	char *type, *t;
 
 	seq_printf(m, "%s\t", event->name);
 
 	for (i = 0; i < event->n_fields; i++) {
 		field = event->fields[i];
 
+		type = field->type;
+		t = strstr(type, "__data_loc");
+		if (t) { /* __data_loc belongs in format but not event desc */
+			t += sizeof("__data_loc");
+			type = t;
+		}
+
 		/* parameter values */
-		seq_printf(m, "%s %s%s", field->type, field->name,
+		seq_printf(m, "%s %s%s", type, field->name,
 			   i == event->n_fields - 1 ? "" : "; ");
 	}
 
-- 
2.28.0




Re: [PATCH] firmware: arm_scmi: Fix duplicate workqueue name

2020-10-14 Thread Sudeep Holla
On Wed, Oct 14, 2020 at 06:32:42PM +0100, Sudeep Holla wrote:
> On Wed, Oct 14, 2020 at 10:08:32AM -0700, Florian Fainelli wrote:
> > On 10/14/20 2:18 AM, Sudeep Holla wrote:
> > > Hi Florian,
> > >
> > > Thanks for the patch, it shows someone else is also using this and
> > > testing .
> > >
> > > On Tue, Oct 13, 2020 at 07:17:37PM -0700, Florian Fainelli wrote:
> > >> When more than a single SCMI device are present in the system, the
> > >> creation of the notification workqueue with the WQ_SYSFS flag will lead
> > >> to the following sysfs duplicate node warning:
> > >>
> > >
> > > Please trim the calltrace next time without timestamp and register raw
> > > hex values.
> >
> > Will do, thanks!
> >
> 
> Thanks!
> 
> > >
> > > You using this on 32-bit platform ? If so, thanks for additional test
> > > coverage.
> >
> > We have a mix of ARMv7/LPAE (Brahma-B15) and ARMv8 (Brahma-B53,
> > Cortex-A72) devices that we regularly test with 32-bit and 64-bit kernels.
> >
> 
> Ah OK, good to know.
> 
> [...]
> 
> > >> Fix this by using dev_name(handle->dev) which guarantees that the name is
> > >> unique and this also helps correlate which notification workqueue
> > >> corresponds to which SCMI device instance.
> > >>
> > >
> > > I am curious how multiple SCMI instances are used. We know of a few
> > > limitations in the code that prevent handling that yet, so I am
> > > interested to know if you are carrying more patches/fixes.
> >
> > We currently have two SCMI device nodes in Device Tree:
> >
> > - the first one is responsible for all of the base, performance, sensors
> > protocols and is present on all of the chips listed above
> >
> > - the second one is responsible for a proprietary protocol through which
> > we encapsulate a variety of operations towards a secure agent in the
> > system, this is only present in a subset of devices.
> >
> 
> Is there any particular reason it can't exist in the same node? Also, are
> they talking to different SCMI firmware implementations, meaning different
> locations in the system? The reason I ask is that we have the notion of one
> platform with agent id = 0 as per the specification. It can be split in terms
> of implementation, and the parts can have some side-band communication amongst
> themselves, but it can't have an agent ID other than 0. That would violate
> the specification.
> 
> I don't have issues with splitting it into 2 or more SCMI devices as long as
> that doesn't suggest the existence of multiple SCMI platform firmware
> implementations with different agent IDs.
> 
> Also Cristian has posted patches to support custom protocols[1]. It would
> be good if you can take a look/review/test/comment...
> 

Pressed enter too early, link added now.

-- 
Regards,
Sudeep

[1] https://lore.kernel.org/lkml/20201014150545.44807-1-cristian.maru...@arm.com


Re: [PATCH v3 7/7] selftests/ftrace: Add test case for synthetic event syntax errors

2020-10-14 Thread Steven Rostedt
On Wed, 14 Oct 2020 11:06:36 +0900
Masami Hiramatsu  wrote:

> Hi Tom,
> 
> On Tue, 13 Oct 2020 09:17:58 -0500
> Tom Zanussi  wrote:
> 
> > Add a selftest that verifies that the syntax error messages and caret
> > positions are correct for most of the possible synthetic event syntax
> > error cases.
> > 
> > Signed-off-by: Tom Zanussi 
> > ---
> >  .../trigger-synthetic_event_syntax_errors.tc  | 19 +++
> >  1 file changed, 19 insertions(+)
> >  create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc
> > 
> > diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc
> > new file mode 100644
> > index ..ada594fe16cb
> > --- /dev/null
> > +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic_event_syntax_errors.tc
> > @@ -0,0 +1,19 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0
> > +# description: event trigger - test synthetic_events syntax parser errors
> > +# requires: synthetic_events error_log  
> 
> This also requires dynamic strings support. So, its "requires" line should be
> 
> # requires: synthetic_events error_log "char name[]' >> synthetic_events":README
> 
> > +
> > +check_error() { # command-with-error-pos-by-^
> > +ftrace_errlog_check 'synthetic_events' "$1" 'synthetic_events'
> > +}
> > +  
> 
> BTW, some errors look a bit odd.
> 
> > +check_error 'myevent ^chr arg' # INVALID_TYPE
> > +check_error 'myevent ^char str[];; int v'  # INVALID_TYPE  
> 
> I think there is a wrong "void" argument between ";", instead of invalid type.
> 
> > +check_error 'myevent char ^str]; int v'# INVALID_NAME
> > +check_error 'myevent char ^str;[]' # INVALID_NAME  
> 
> This is also not an invalid name but '[]' is an invalid type. 
> 
> > +check_error 'myevent ^char str[; int v'# INVALID_TYPE
> > +check_error '^mye;vent char str[]' # BAD_NAME
> > +check_error 'myevent char str[]; ^int' # INVALID_FIELD  
> 
> Isn't it an incomplete command?
> 
> > +check_error '^myevent' # INCOMPLETE_CMD
> > +
> > +exit 0  
> 

Hi Masami,

I finished testing this series along with other patches (some from you),
and I'm ready to push this to next, and hopefully soon to Linus. You have a
"tested-by" for the entire series. Are you OK with this patch too? Can we
push this forward and fix up any issues you have later?

-- Steve


Re: [PATCH] firmware: arm_scmi: Fix duplicate workqueue name

2020-10-14 Thread Sudeep Holla
On Wed, Oct 14, 2020 at 10:08:32AM -0700, Florian Fainelli wrote:
> On 10/14/20 2:18 AM, Sudeep Holla wrote:
> > Hi Florian,
> >
> > Thanks for the patch, it shows someone else is also using this and
> > testing .
> >
> > On Tue, Oct 13, 2020 at 07:17:37PM -0700, Florian Fainelli wrote:
> >> When more than a single SCMI device are present in the system, the
> >> creation of the notification workqueue with the WQ_SYSFS flag will lead
> >> to the following sysfs duplicate node warning:
> >>
> >
> > Please trim the calltrace next time without timestamp and register raw
> > hex values.
>
> Will do, thanks!
>

Thanks!

> >
> > You using this on 32-bit platform ? If so, thanks for additional test
> > coverage.
>
> We have a mix of ARMv7/LPAE (Brahma-B15) and ARMv8 (Brahma-B53,
> Cortex-A72) devices that we regularly test with 32-bit and 64-bit kernels.
>

Ah OK, good to know.

[...]

> >> Fix this by using dev_name(handle->dev) which guarantees that the name is
> >> unique and this also helps correlate which notification workqueue
> >> corresponds to which SCMI device instance.
> >>
> >
> > I am curious how multiple SCMI instances are used. We know of a few
> > limitations in the code that prevent handling that yet, so I am
> > interested to know if you are carrying more patches/fixes.
>
> We currently have two SCMI device nodes in Device Tree:
>
> - the first one is responsible for all of the base, performance, sensors
> protocols and is present on all of the chips listed above
>
> - the second one is responsible for a proprietary protocol through which
> we encapsulate a variety of operations towards a secure agent in the
> system, this is only present in a subset of devices.
>

Is there any particular reason it can't exist in the same node? Also, are
they talking to different SCMI firmware implementations, meaning different
locations in the system? The reason I ask is that we have the notion of one
platform with agent id = 0 as per the specification. It can be split in terms
of implementation, and the parts can have some side-band communication amongst
themselves, but it can't have an agent ID other than 0. That would violate
the specification.

I don't have issues with splitting it into 2 or more SCMI devices as long as
that doesn't suggest the existence of multiple SCMI platform firmware
implementations with different agent IDs.

Also Cristian has posted patches to support custom protocols[1]. It would
be good if you can take a look/review/test/comment...

> The reason for that split was because the nature of the SCMI traffic was
> different, the first one is comprised of small messages dealing with
> power management, most messages can be completed synchronously (or
> quasi, if you remember our mailbox implementation) whereas the second is
> comprised of large and asynchronously completed messages. This is not
> probably a common set-up, and we have since eliminated that second SCMI
> interface and went with a direct Linux to agent interface that does not
> "speak" SCMI.
>

I still don't see strong reasons for splitting it. Also, from the above I
assume you still have one SCMI server/platform implementation and just
different types of transport for the standard vs. custom protocols, and all
sorts of such combinations. I think we can deal with that as long as they are
of the same transport type, like mailbox or smc.

> We may still have multiple SCMI devices in the future since we are
> trying to separate out the CPU performance domain which involves writing
> to registers that are either ARM system control registers, or only on an
> ARM CPU local bus. The other clock/performance controls can go directly
> to the agent implementing SCMI.
>

We have something called fastchannels for that. Your SCMI perf protocol
can present those registers in those fastchannels if they are MMIO.
If they are system registers, then you need not use SCMI at all for CPU
DVFS, and you would still need only one SCMI device for all other needs.

> >
> >> Fixes: bd31b249692e ("firmware: arm_scmi: Add notification dispatch and delivery")
> >> Signed-off-by: Florian Fainelli 
> >> ---
> >>  drivers/firmware/arm_scmi/notify.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/firmware/arm_scmi/notify.c b/drivers/firmware/arm_scmi/notify.c
> >> index 4731daaacd19..24c9ef232f48 100644
> >> --- a/drivers/firmware/arm_scmi/notify.c
> >> +++ b/drivers/firmware/arm_scmi/notify.c
> >> @@ -1468,7 +1468,7 @@ int scmi_notification_init(struct scmi_handle *handle)
> >>    ni->gid = gid;
> >>    ni->handle = handle;
> >>
> >> -  ni->notify_wq = alloc_workqueue("scmi_notify",
> >> +  ni->notify_wq = alloc_workqueue(dev_name(handle->dev),
> >
> > Not sure if it makes sense to add "_notify" to clearly identify the workqueue.
>
> I had intentionally not done it because:
>

OK fair enough.

> - it would create a different name than "dev_name" and I liked having
> the same name throughout the entire sysfs hierarchy to easily identify
> the device
>

Good point. I was worried if we need to create 

Re: [PATCH net] net: dsa: ksz: fix padding size of skb

2020-10-14 Thread Vladimir Oltean
On Wed, Oct 14, 2020 at 07:02:13PM +0200, Christian Eggers wrote:
> > Otherwise said, the frame must be padded to
> > max(skb->len, ETH_ZLEN) + tail tag length.
> At first I thought the same when working on this. But IMHO the padding must
> only ensure the minimum required size, there is no need to pad to the "real"
> size of the skb. The check for the tailroom above ensures that enough memory
> for the "real" size is available.

Yes, that's right, that's the current logic, but what's the point of
your patch, then, if the call to __skb_put_padto is only supposed to
ensure ETH_ZLEN length?
In fact, __skb_put_padto fundamentally does:
- an extension of skb->len to the requested argument, via __skb_put
- a zero-filling of the extra area
So if you include the length of the tag in the call to __skb_put_padto,
then what's the other skb_put() from ksz8795_xmit, ksz9477_xmit,
ksz9893_xmit going to do? Aren't you increasing the frame length twice
by the length of one tag when you are doing this? What problem are you
actually trying to solve?
Can you show a skb_dump(KERN_ERR, skb, true) before and after your change?
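
To make the length arithmetic in question concrete, here is a small user-space model. It is only a sketch under stated assumptions: struct toy_skb, toy_put_padto() and toy_put() are hypothetical stand-ins that track nothing but the frame length, and the two xmit_len_*() functions model the two orderings being discussed rather than the actual ksz tagger code.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_ZLEN 60 /* minimum Ethernet frame length, excluding FCS */

/* Toy stand-in for an skb: only the length is modelled. */
struct toy_skb {
	size_t len;
};

/* Models __skb_put_padto(): extend len up to min_len if it is shorter. */
void toy_put_padto(struct toy_skb *skb, size_t min_len)
{
	assert(skb != NULL);
	if (skb->len < min_len)
		skb->len = min_len;
}

/* Models skb_put(): unconditionally extend len by n bytes. */
void toy_put(struct toy_skb *skb, size_t n)
{
	assert(skb != NULL);
	skb->len += n;
}

/* Pad to ETH_ZLEN only, then append the tail tag. */
size_t xmit_len_pad_then_tag(size_t payload_len, size_t tag_len)
{
	struct toy_skb skb = { .len = payload_len };

	toy_put_padto(&skb, ETH_ZLEN);
	toy_put(&skb, tag_len);
	return skb.len;
}

/* Pad to ETH_ZLEN plus tag length, then append the tail tag anyway. */
size_t xmit_len_pad_with_tag_then_tag(size_t payload_len, size_t tag_len)
{
	struct toy_skb skb = { .len = payload_len };

	toy_put_padto(&skb, ETH_ZLEN + tag_len);
	toy_put(&skb, tag_len);
	return skb.len;
}
```

In this model a 42-byte frame with a 1-byte tag ends up 61 bytes long with the first ordering and 62 bytes with the second, so the tag length is counted twice for short frames. For frames already longer than ETH_ZLEN plus the tag, both orderings give the same length.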


Re: fw_devlink on will break all snps,dw-apb-gpio users

2020-10-14 Thread Saravana Kannan
On Wed, Oct 14, 2020 at 4:12 AM Jisheng Zhang wrote:
>
> Hi,
>
> If set fw_devlink as on, any consumers of dw apb gpio won't probe.
>
> The related dts looks like:
>
> gpio0: gpio@2400 {
>compatible = "snps,dw-apb-gpio";
>#address-cells = <1>;
>#size-cells = <0>;
>
>porta: gpio-port@0 {
>   compatible = "snps,dw-apb-gpio-port";
>   gpio-controller;
>   #gpio-cells = <2>;
>   ngpios = <32>;
>   reg = <0>;
>};
> };
>
> device_foo {
> status = "okay"
> ...;
> reset-gpio = <, 0, GPIO_ACTIVE_HIGH>;
> };
>
> If I change the reset-gpio property to use another kind of gpio phandle,
> e.g gpio expander, then device_foo can be probed successfully.
>
> The gpio expander dt node looks like:
>
> expander3: gpio@44 {
> compatible = "fcs,fxl6408";
> pinctrl-names = "default";
> pinctrl-0 = <_pmux>;
> reg = <0x44>;
> gpio-controller;
> #gpio-cells = <2>;
> interrupt-parent = <>;
> interrupts = <23 IRQ_TYPE_NONE>;
> interrupt-controller;
> #interrupt-cells = <2>;
> };
>
> The common pattern looks like the devlink can't cope with suppliers from
> child dt node.

fw_devlink doesn't have any problem dealing with child devices being
suppliers. The problem with your case is that the
drivers/gpio/gpio-dwapb.c driver directly parses the child nodes and
never creates struct devices for them. If you have a node with
compatible string, fw_devlink expects you to create and probe a struct
device for it. So change your driver to add the child devices as
devices instead of just parsing the node directly and doing stuff with
it.

Either that, or stop putting "compatible" string in a node if you
don't plan to actually treat it as a device -- but that's too late for
this driver (it needs to be backward compatible). So change the driver
to add of_platform_populate() and write a driver that probes
"snps,dw-apb-gpio-port".

-Saravana


Re: [PATCH v2] perf bench: Use condition variables in numa.

2020-10-14 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 14, 2020 at 06:14:18PM +0200, Jiri Olsa escreveu:
> On Wed, Oct 14, 2020 at 08:39:51AM -0700, Ian Rogers wrote:
> > The pthread_mutex_lock avoids any race on g->nr_tasks_started and
> > g->p.nr_tasks is set up in init() along with all the global state. I
> > don't think there's any race on g->nr_tasks_started and doing a signal
> > for every thread starting will just cause unnecessary wake-ups for the
> > main thread. I think it is better to keep it. I added loops on all the
> > pthread_cond_waits so the code is robust against spurious wake ups.
> 
> ah, I missed that mutex call
> 
> Acked-by: Jiri Olsa 

Thanks, applied.

- Arnaldo
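
The pattern Ian describes (count under a mutex, signal once when the last thread has started, and wait in a loop) can be sketched in isolation as follows. This is not the perf bench numa code: struct start_gate, worker() and run_and_wait_for_start() are hypothetical names, and NR_TASKS is an arbitrary count chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_TASKS 4 /* hypothetical number of worker threads */

struct start_gate {
	pthread_mutex_t lock;
	pthread_cond_t all_started;
	int nr_tasks;
	int nr_tasks_started;
};

/* Each worker bumps the counter under the lock; only the last one signals. */
void *worker(void *arg)
{
	struct start_gate *g = arg;

	pthread_mutex_lock(&g->lock);
	g->nr_tasks_started++;
	if (g->nr_tasks_started == g->nr_tasks)
		pthread_cond_signal(&g->all_started);
	pthread_mutex_unlock(&g->lock);
	return NULL;
}

/* Starts the workers and returns how many had started when the wait ended. */
int run_and_wait_for_start(void)
{
	struct start_gate g = { .nr_tasks = NR_TASKS, .nr_tasks_started = 0 };
	pthread_t tids[NR_TASKS];
	int i, ret, started;

	ret = pthread_mutex_init(&g.lock, NULL);
	assert(ret == 0);
	ret = pthread_cond_init(&g.all_started, NULL);
	assert(ret == 0);

	for (i = 0; i < NR_TASKS; i++) {
		ret = pthread_create(&tids[i], NULL, worker, &g);
		assert(ret == 0);
	}

	pthread_mutex_lock(&g.lock);
	/* Re-check the predicate in a loop so spurious wake-ups are harmless. */
	while (g.nr_tasks_started != g.nr_tasks)
		pthread_cond_wait(&g.all_started, &g.lock);
	started = g.nr_tasks_started;
	pthread_mutex_unlock(&g.lock);

	for (i = 0; i < NR_TASKS; i++)
		pthread_join(tids[i], NULL);

	pthread_cond_destroy(&g.all_started);
	pthread_mutex_destroy(&g.lock);
	(void)ret;
	return started;
}
```

Because the counter is only touched with the mutex held and the waiter re-checks it after every wake-up, one signal from the last worker is enough; signalling on every start would only wake the main thread to find the predicate still false.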


Re: [PATCH 04/20] dt-bindings: usb: usb-hcd: Add "tpl-support" property

2020-10-14 Thread Serge Semin
On Wed, Oct 14, 2020 at 08:27:56AM -0500, Rob Herring wrote:
> On Wed, 14 Oct 2020 13:13:46 +0300, Serge Semin wrote:
> > The host controller device might be designed to work for the particular
> > products or applications. In that case its DT node is supposed to be
> > equipped with the tpl-support property.
> > 
> > Signed-off-by: Serge Semin 
> > 
> > ---
> > 
> > Changelog v2:
> > - Grammar fix: "s/it'/its"
> > - Discard '|' from the property description, since we don't need to preserve
> >   the text formatting.
> > ---
> >  Documentation/devicetree/bindings/usb/usb-hcd.yaml | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> 
> 
> My bot found errors running 'make dt_binding_check' on your patch:
> 
> Traceback (most recent call last):
>   File "/usr/local/bin/dt-extract-example", line 45, in 
>     binding = yaml.load(open(args.yamlfile, encoding='utf-8').read())
>   File "/usr/local/lib/python3.8/dist-packages/ruamel/yaml/main.py", line 343, in load
>     return constructor.get_single_data()
>   File "/usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py", line 111, in get_single_data
>     node = self.composer.get_single_node()
>   File "_ruamel_yaml.pyx", line 706, in _ruamel_yaml.CParser.get_single_node
>   File "_ruamel_yaml.pyx", line 724, in _ruamel_yaml.CParser._compose_document
>   File "_ruamel_yaml.pyx", line 775, in _ruamel_yaml.CParser._compose_node
>   File "_ruamel_yaml.pyx", line 891, in _ruamel_yaml.CParser._compose_mapping_node
>   File "_ruamel_yaml.pyx", line 904, in _ruamel_yaml.CParser._parse_next_event
> ruamel.yaml.scanner.ScannerError: mapping values are not allowed in this context
>   in "", line 27, column 14
> make[1]: *** [Documentation/devicetree/bindings/Makefile:20: Documentation/devicetree/bindings/usb/usb-hcd.example.dts] Error 1
> make[1]: *** Deleting file 'Documentation/devicetree/bindings/usb/usb-hcd.example.dts'
> make[1]: *** Waiting for unfinished jobs
> ./Documentation/devicetree/bindings/usb/usb-hcd.yaml:27:14: [error] syntax error: mapping values are not allowed here (syntax)
> make[1]: *** [Documentation/devicetree/bindings/Makefile:59: Documentation/devicetree/bindings/processed-schema-examples.json] Error 123
> make: *** [Makefile:1366: dt_binding_check] Error 2
> 
> 
> See https://patchwork.ozlabs.org/patch/1382001
> 
> If you already ran 'make dt_binding_check' and didn't see the above
> error(s), then make sure dt-schema is up to date:
> 
> pip3 install git+https://github.com/devicetree-org/dt-schema.git@master --upgrade
> 
> Please check and re-submit.

Hm, that's weird. Of course I did the dt_binding_check before submission, but
even after the dt-schema repo update I failed to see the error:

$ make -j8 ARCH=mips CROSS_COMPILE=mipsel-baikal-linux- dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/usb/usb-hcd.yaml
  CHKDT   Documentation/devicetree/bindings/usb/usb-hcd.yaml
  SCHEMA  Documentation/devicetree/bindings/processed-schema-examples.yaml
  DTC Documentation/devicetree/bindings/usb/usb-hcd.example.dt.yaml
  CHECK   Documentation/devicetree/bindings/usb/usb-hcd.example.dt.yaml

Rob, any idea why the bot got mad at me?

-Sergey

> 


Re: [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation

2020-10-14 Thread Ingo Molnar


* Alexey Budankov  wrote:

> 
> Patch set provides threaded trace streaming for base perf record
> operation. Provided streaming mode (--threads) mitigates profiling
> data losses and resolves scalability issues of serial and asynchronous
> (--aio) trace streaming modes on multicore server systems. The patch
> set is based on the prototype [1], [2] and the most closely relates
> to mode 3) "mode that creates thread for every monitored memory map".
> 
> The threaded mode executes one-to-one mapping of trace streaming threads
> to mapped data buffers and streaming into per-CPU trace files located
> at data directory. The data buffers and threads are affined to NUMA
> nodes and monitored CPUs according to system topology. --cpu option
> can be used to specify exact CPUs to be monitored.

Yay! This should really be the default trace capture model everywhere
possible.

Can we do this for perf top too? It's really struggling with lots of cores.

If on a 64-core system I run just a moderately higher frequency 'perf top'
of 1 kHz:

  perf top -e cycles -F 1000

perf stays stuck forever in 'Collecting samples...', and I also get a lot
of:

  [548112.871089] Uhhuh. NMI received for unknown reason 31 on CPU 25.
  [548112.871089] Do you have a strange power saving mode enabled?

Thanks,

Ingo


Re: [ANNOUNCE] libtraceevent.git

2020-10-14 Thread Steven Rostedt
On Wed, 14 Oct 2020 20:56:53 +0800
Zamir SUN  wrote:

> > 
> > So should I just add that one patch and tag it?
> >   
> 
> That would be great, at least for Fedora packaging.

I'm going with version 1.1.0 and not following the kernel versioning, as
that would just add to the confusion.

Here's the tarball:

  https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git/snapshot/libtraceevent-1.1.0.tar.gz

-- Steve



Re: [PATCH 3/3] mfd: ahc1ec0-wdt: Add sub-device watchdog for Advantech embedded controller

2020-10-14 Thread kernel test robot
Hi Shihlun,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on linux/master v5.9]
[cannot apply to lee-mfd/for-mfd-next next-20201013]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Shihlun-Lin/mfd-ahc1ec0-Add-support-for-Advantech-embedded-controller/20201014-164627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b5fc7a89e58bcc059a3d5e4db79c481fb437de59
config: parisc-randconfig-r003-20201014 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/c0d1228cbc430e5be6ac8a311f8fa783f9285628
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Shihlun-Lin/mfd-ahc1ec0-Add-support-for-Advantech-embedded-controller/20201014-164627
        git checkout c0d1228cbc430e5be6ac8a311f8fa783f9285628
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   drivers/mfd/ahc1ec0-wdt.c: In function 'wdt_proc_read':
>> drivers/mfd/ahc1ec0-wdt.c:162:17: warning: variable 'name' set but not used [-Wunused-but-set-variable]
     162 |  unsigned char *name, *chip_name, *is_enable;
         |                 ^~~~
   drivers/mfd/ahc1ec0-wdt.c: In function 'adv_ec_wdt_probe':
>> drivers/mfd/ahc1ec0-wdt.c:426:31: warning: variable 'adv_ec_data' set but not used [-Wunused-but-set-variable]
     426 |  struct adv_ec_platform_data *adv_ec_data;
         |                               ^~~~~~~~~~~

vim +/name +162 drivers/mfd/ahc1ec0-wdt.c

   159  
   160  static int wdt_proc_read(struct seq_file *m, void *p)
   161  {
 > 162  unsigned char *name, *chip_name, *is_enable;
   163  unsigned long current_timeout = 0;
   164  
   165  name = m->private;
   166  
   167  chip_name = wdt_data.chip_name;
   168  current_timeout = wdt_data.current_timeout;
   169  is_enable = wdt_data.is_enable;
   170  
   171  seq_printf(m, "name   : %s\n", chip_name);
   172  seq_printf(m, "timeout: %ld\n", current_timeout);
   173  seq_printf(m, "is_enable  : %s\n", is_enable);
   174  
   175  return 0;
   176  }
   177  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



