[RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA

2021-04-29 Thread Shanker Donthineni
For pass-through device assignment, the ARM64 KVM hypervisor retrieves
the memory region properties (physical address, size, and whether the
region is backed by struct page) from the VMA. The prefetchable
attribute of a BAR region isn't visible to KVM, so it cannot make an
optimal decision for the stage-2 attributes.

This patch updates vma->vm_page_prot to map with the write-combine
attribute if the associated BAR is prefetchable. On ARM64,
pgprot_writecombine() maps to the memory type MT_NORMAL_NC, which has
no read side effects and allows multiple writes to be combined.
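
For reference, arm64 defines the helper this patch relies on as below
(taken from arch/arm64/include/asm/pgtable.h and reflowed here, so
readers can see where MT_NORMAL_NC comes from):

#define pgprot_writecombine(prot) \
	__pgprot_modify(prot, PTE_ATTRINDX_MASK, \
			PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)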

Signed-off-by: Shanker Donthineni 
---
 drivers/vfio/pci/vfio_pci.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 5023e23db3bc..1b734fe1dd51 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1703,7 +1703,11 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
}
 
vma->vm_private_data = vdev;
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+   if (IS_ENABLED(CONFIG_ARM64) &&
+   (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH))
+   vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+   else
+   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
 
/*
-- 
2.17.1



[RFC 0/2] Honor PCI prefetchable attributes for a virtual machine on ARM64

2021-04-29 Thread Shanker Donthineni
Problem statement: A virtual machine crashes when the NVIDIA GPU driver 
accesses a prefetchable BAR space, due to unaligned reads/writes on 
pass-through devices. The same binary works as expected in the host kernel. 
Only one BAR holds control & status registers (CSR); the other PCI BARs are 
marked prefetchable. The NVIDIA GPU driver uses the write-combine feature 
when mapping the prefetchable BARs to improve performance. The problem 
applies to any other driver that wants to enable WC.
 
Solution: Honor PCI prefetchable attributes for the guest operating systems.
 
Proposal: ARM64 KVM reads the needed information from the VMA struct, e.g. 
the region's physical address, size, and memory type (struct-page backed 
mapping or anonymous memory), when setting up a stage-2 page table. Right 
now a memory region can be mapped either as DEVICE (strongly ordered) or 
NORMAL (write-back cacheable), depending on the VM_PFNMAP flag in the VMA. 
VFIO-PCI will keep the prefetchable (write-combine) information in 
vma->vm_page_prot like the other fields, and KVM will prepare stage-2 
entries based on the memory-type attribute set in the VMA.
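
A minimal sketch of the idea on the KVM side (illustrative only, not the
posted hunk): when handling a stage-2 fault on a VM_PFNMAP region, the
memory type can be recovered from the attribute index that VFIO left in
vma->vm_page_prot:

	if (device) {
		prot |= KVM_PGTABLE_PROT_DEVICE;
		/* pgprot_writecombine() installs MT_NORMAL_NC on arm64 */
		if ((pgprot_val(vma->vm_page_prot) & PTE_ATTRINDX_MASK) ==
		    PTE_ATTRINDX(MT_NORMAL_NC))
			prot |= KVM_PGTABLE_PROT_WC;
	}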

Shanker Donthineni (2):
  vfio/pci: keep the prefetchable attribute of a BAR region in VMA
  KVM: arm64: Add write-combine support for stage-2 entries

 arch/arm64/include/asm/kvm_mmu.h |  3 ++-
 arch/arm64/include/asm/kvm_pgtable.h |  2 ++
 arch/arm64/include/asm/memory.h  |  4 +++-
 arch/arm64/kvm/hyp/pgtable.c |  9 +++--
 arch/arm64/kvm/mmu.c | 22 +++---
 arch/arm64/kvm/vgic/vgic-v2.c|  2 +-
 drivers/vfio/pci/vfio_pci.c  |  6 +-
 7 files changed, 39 insertions(+), 9 deletions(-)

-- 
2.17.1



[RFC 2/2] KVM: arm64: Add write-combine support for stage-2 entries

2021-04-29 Thread Shanker Donthineni
In the current implementation, device memory is always mapped as
DEVICE_nGnRE at stage-2. In the host kernel, device drivers are free
to choose between the Device memory type and write-combine
(Normal non-cacheable), depending on the use case. The PCI
specification has the concept of a prefetchable BAR, where multiple
writes can be combined and reads have no side effects. It provides a
huge performance improvement and also allows unaligned access.
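
(For comparison, a host driver typically opts into write-combining
explicitly; a minimal example, where "bar" is assumed to be the index
of a prefetchable BAR on pdev:

	void __iomem *fb = ioremap_wc(pci_resource_start(pdev, bar),
				      pci_resource_len(pdev, bar));

No equivalent choice is visible to the guest today, which is what this
series addresses.)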

NVIDIA GPU PCIe devices have 3 BAR regions. Two regions are mapped to
video/compute memory and marked as prefetchable. The GPU driver takes
advantage of the write-combine feature for higher performance. The
same driver has no issues in the host kernel but crashes inside the
virtual machine because of unaligned accesses.

This patch finds the PTE attributes of the device memory in the VMA
and sets the stage-2 attribute to NORMAL_NC for WC regions, keeping
the default DEVICE_nGnRE for non-WC regions.

Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/kvm_mmu.h |  3 ++-
 arch/arm64/include/asm/kvm_pgtable.h |  2 ++
 arch/arm64/include/asm/memory.h  |  4 +++-
 arch/arm64/kvm/hyp/pgtable.c |  9 +++--
 arch/arm64/kvm/mmu.c | 21 ++---
 arch/arm64/kvm/vgic/vgic-v2.c|  2 +-
 6 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 90873851f677..dec498a6ba2f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -160,7 +160,8 @@ void stage2_unmap_vm(struct kvm *kvm);
 int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
- phys_addr_t pa, unsigned long size, bool writable);
+ phys_addr_t pa, unsigned long size, bool writable,
+ bool writecombine);
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8886d43cfb11..26f28220f6f3 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -35,6 +35,7 @@ struct kvm_pgtable {
  * @KVM_PGTABLE_PROT_W:Write permission.
  * @KVM_PGTABLE_PROT_R:Read permission.
  * @KVM_PGTABLE_PROT_DEVICE:   Device attributes.
+ * @KVM_PGTABLE_PROT_WC:   Normal non-cacheable (WC).
  */
 enum kvm_pgtable_prot {
KVM_PGTABLE_PROT_X  = BIT(0),
@@ -42,6 +43,7 @@ enum kvm_pgtable_prot {
KVM_PGTABLE_PROT_R  = BIT(2),
 
KVM_PGTABLE_PROT_DEVICE = BIT(3),
+   KVM_PGTABLE_PROT_WC = BIT(4),
 };
 
 #define PAGE_HYP   (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0aabc3be9a75..04a812b59437 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -144,13 +144,15 @@
  * Memory types for Stage-2 translation
  */
 #define MT_S2_NORMAL   0xf
+#define MT_S2_WRITE_COMBINE	5
 #define MT_S2_DEVICE_nGnRE 0x1
 
 /*
  * Memory types for Stage-2 translation when ID_AA64MMFR2_EL1.FWB is 0001
- * Stage-2 enforces Normal-WB and Device-nGnRE
+ * Stage-2 enforces Normal-WB, Normal-NC and Device-nGnRE
  */
 #define MT_S2_FWB_NORMAL   6
+#define MT_S2_FWB_WRITE_COMBINE	5
 #define MT_S2_FWB_DEVICE_nGnRE 1
 
 #ifdef CONFIG_ARM64_4K_PAGES
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 926fc07074f5..bdfed559eae2 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -444,9 +444,14 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
struct stage2_map_data *data)
 {
bool device = prot & KVM_PGTABLE_PROT_DEVICE;
-   kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
-   PAGE_S2_MEMATTR(NORMAL);
u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
+   kvm_pte_t attr = PAGE_S2_MEMATTR(NORMAL);
+
+   if (device) {
+   attr = (prot & KVM_PGTABLE_PROT_WC) ?
+   PAGE_S2_MEMATTR(WRITE_COMBINE) :
+   PAGE_S2_MEMATTR(DEVICE_nGnRE);
+   }
 
if (!(prot & KVM_PGTABLE_PROT_X))
attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8711894db8c2..5b8ec1ab12e2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -487,6 +487,16 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
}
 }
 
+/**
+ * is_vma_write_combine - check whether a VMA is mapped write-combine
+ * Return true if the VMA is mapped with MT_NORMAL_NC, otherwise false
+ */
+static bool inline is_vma_write_co
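
A minimal sketch of what such a helper could look like (illustrative,
not the posted code), assuming the write-combine information lives in
the attribute-index bits that pgprot_writecombine() sets:

	static inline bool is_vma_write_combine(struct vm_area_struct *vma)
	{
		pteval_t pteval = pgprot_val(vma->vm_page_prot);

		return (pteval & PTE_ATTRINDX_MASK) ==
		       PTE_ATTRINDX(MT_NORMAL_NC);
	}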

Re: [PATCH v2] arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening

2018-03-10 Thread Shanker Donthineni
Hi Will,

On 03/09/2018 07:48 AM, Will Deacon wrote:
> Hi Shanker,
> 
> On Mon, Mar 05, 2018 at 11:06:43AM -0600, Shanker Donthineni wrote:
>> The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of SMC
>> V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses
>> the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead
>> of Silicon provider service ID 0xC2001700.
>>
>> Signed-off-by: Shanker Donthineni 
>> ---
>> Changes since v1:
>>   - Trivial change in cpucaps.h (refresh after removing
>>     ARM64_HARDEN_BP_POST_GUEST_EXIT)
>>
>>  arch/arm64/include/asm/cpucaps.h |  5 ++--
>>  arch/arm64/include/asm/kvm_asm.h |  2 --
>>  arch/arm64/kernel/bpi.S  |  8 --
>>  arch/arm64/kernel/cpu_errata.c   | 55 ++--
>>  arch/arm64/kvm/hyp/entry.S   | 12 -
>>  arch/arm64/kvm/hyp/switch.c  | 10 
>>  6 files changed, 21 insertions(+), 71 deletions(-)
> 
> Could you reply to my outstanding question on the last version of this patch
> please?
> 

I replied to your comments. This patch's contents have been discussed with
the QCOM CPU architecture and design team. Their recommendation was to keep
two variants of the variant-2 mitigation in order to take advantage of the
Falkor hardware and avoid the unnecessary overhead of always calling SMCCC.
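
The resulting dispatch boils down to the following shape (condensed from
the patch later in this digest; the real code first picks the SMCCC
conduit callback and then overrides it for Falkor):

	u32 midr = read_cpuid_id();

	if (((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR) ||
	    ((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR_V1))
		cb = qcom_link_stack_sanitization;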


> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-March/564194.html
> 
> Will
> 



Re: [PATCH] arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening

2018-03-10 Thread Shanker Donthineni
Hi Will,

On 03/06/2018 09:25 AM, Will Deacon wrote:
> On Mon, Mar 05, 2018 at 12:03:33PM -0600, Shanker Donthineni wrote:
>> On 03/05/2018 11:15 AM, Will Deacon wrote:
>>> On Mon, Mar 05, 2018 at 10:57:58AM -0600, Shanker Donthineni wrote:
>>>> On 03/05/2018 09:56 AM, Will Deacon wrote:
>>>>> On Fri, Mar 02, 2018 at 03:50:18PM -0600, Shanker Donthineni wrote:
>>>>>> @@ -199,33 +208,15 @@ static int enable_smccc_arch_workaround_1(void 
>>>>>> *data)
>>>>>>  return 0;
>>>>>>  }
>>>>>>  
>>>>>> +if (((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR) ||
>>>>>> +((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR_V1))
>>>>>> +cb = qcom_link_stack_sanitization;
>>>>>
>>>>> Is this just a performance thing? Do you actually see an advantage over
>>>>> always making the firmware call? We've seen minimal impact in our testing.
>>>>>
>>>>
>>>> Yes, we have a couple of advantages using the standard
>>>> SMCCC_ARCH_WORKAROUND_1 framework:
>>>>   - Improves the code readability.
>>>>   - Avoids the unnecessary MIDR checks on each vCPU exit.
>>>>   - Validates the ID_AA64PFR0_EL1.CSV2 feature for Falkor chips.
>>>>   - Avoids the 2nd link stack sanitization workaround in firmware.
>>>
>>> What I mean is, can we drop qcom_link_stack_sanitization altogether and
>>> use the SMCCC interface for everything?
>>>
>>
>> No, we would like to keep qcom_link_stack_sanitization for the host kernel,
>> since it takes a few CPU cycles instead of a heavyweight SMCCC call.
> 
> Is that something that you can actually measure in the workloads and
> benchmarks that you care about? If so, fine, but that doesn't seem to be the
> case for the Cortex cores we've looked at internally and it would be nice to
> avoid having different workarounds in the kernel just because the SMCCC
> interface wasn't baked in time, rather than because there's a meaningful
> performance difference.
>

We've seen a noticeable performance improvement with the microbench
workloads, and some of our customers have observed improvements on heavy
workloads as well. Unfortunately I can't share those specific results here.
SMCCC call overhead is much higher than the link stack workaround on
Falkor, ~99X.

The host kernel workaround takes less than ~20 CPU cycles, whereas
SMCCC_ARCH_WORKAROUND_1 consumes thousands of CPU cycles to sanitize the
branch prediction on Falkor.

Workloads inside virtual machines in particular show much better results,
because no KVM involvement is required when a guest calls
qcom_link_stack_sanitization().
 
> Will
> 



[PATCH v7] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-03-07 Thread Shanker Donthineni
The DCache clean & ICache invalidation requirements for instructions
to be data coherent are discoverable through new fields in CTR_EL0.
The two control bits DIC and IDC were defined for this purpose. There
is no need to perform point-of-unification cache maintenance
operations from software on systems where the CPU caches are
transparent.

This patch optimizes the three functions __flush_cache_user_range(),
clean_dcache_area_pou() and invalidate_icache_range() when the
hardware reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips
the two instructions 'DC CVAU' and 'IC IVAU', and the associated loop
logic, in order to avoid unnecessary overhead.

CTR_EL0.DIC: Instruction cache invalidation requirements for
 instruction to data coherence. The meaning of this bit[29]:
  0: Instruction cache invalidation to the point of unification
 is required for instruction to data coherence.
  1: Instruction cache invalidation to the point of unification
 is not required for instruction to data coherence.

CTR_EL0.IDC: Data cache clean requirements for instruction to data
 coherence. The meaning of this bit[28]:
  0: Data cache clean to the point of unification is required for
 instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
 or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
  1: Data cache clean to the point of unification is not required
 for instruction to data coherence.
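
As a hedged illustration of how these bits can be consumed: hypothetical
capability checks in the style of cpufeature.c. has_cache_idc() and
has_cache_dic() are illustrative names and bodies, not a quote of the
patch; read_sanitised_ftr_reg(), SYS_CTR_EL0, BIT() and the CTR_* shifts
are existing kernel symbols.

	static bool has_cache_idc(const struct arm64_cpu_capabilities *entry,
				  int __unused)
	{
		/* IDC=1: D-cache clean to PoU not needed for I/D coherence */
		return read_sanitised_ftr_reg(SYS_CTR_EL0) & BIT(CTR_IDC_SHIFT);
	}

	static bool has_cache_dic(const struct arm64_cpu_capabilities *entry,
				  int __unused)
	{
		/* DIC=1: I-cache invalidation to PoU not needed either */
		return read_sanitised_ftr_reg(SYS_CTR_EL0) & BIT(CTR_DIC_SHIFT);
	}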

Co-authored-by: Philip Elcan 
Signed-off-by: Shanker Donthineni 
---
Changes since v6:
  -Both I-Cache and D-Cache changes are symmetric as Will suggested.
  -Remove Kconfig option.
  -Patch __flush_icache_all().

Changes since v5:
  -Addressed Mark's review comments.

Changes since v4:
  -Moved patching ARM64_HAS_CACHE_DIC inside invalidate_icache_by_line
  -Removed 'dsb ishst' for ARM64_HAS_CACHE_DIC as Mark suggested.

Changes since v3:
  -Added preprocessor guard CONFIG_xxx to code snippets in cache.S
  -Changed barrier attributes from ISH to ISHST.

Changes since v2:
  -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
  -Single Kconfig option.

Changes since v1:
  -Reworded commit text.
  -Used the alternatives framework as Catalin suggested.
  -Rebased on top of https://patchwork.kernel.org/patch/10227927/

 arch/arm64/include/asm/cache.h  |  4 
 arch/arm64/include/asm/cacheflush.h |  7 +--
 arch/arm64/include/asm/cpucaps.h|  4 +++-
 arch/arm64/kernel/cpufeature.c  | 36 ++--
 arch/arm64/mm/cache.S   | 21 -
 5 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index ea9bb4e..9bbffc7 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -20,8 +20,12 @@
 
 #define CTR_L1IP_SHIFT 14
 #define CTR_L1IP_MASK  3
+#define CTR_DMINLINE_SHIFT 16
+#define CTR_ERG_SHIFT  20
 #define CTR_CWG_SHIFT  24
 #define CTR_CWG_MASK   15
+#define CTR_IDC_SHIFT  28
+#define CTR_DIC_SHIFT  29
 
 #define CTR_L1IP(ctr)  (((ctr) >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK)
 
diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index bef9f41..d51bde1 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -133,8 +133,11 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
 
 static inline void __flush_icache_all(void)
 {
-   asm("ic ialluis");
-   dsb(ish);
+   /* Instruction cache invalidation is not required for I/D coherence? */
+   if (!cpus_have_const_cap(ARM64_HAS_CACHE_DIC)) {
+   asm("ic ialluis");
+   dsb(ish);
+   }
 }
 
 #define flush_dcache_mmap_lock(mapping) \
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index bb26382..8dd42ae 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -45,7 +45,9 @@
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
 #define ARM64_HARDEN_BP_POST_GUEST_EXIT	25
 #define ARM64_HAS_RAS_EXTN	26
+#define ARM64_HAS_CACHE_IDC	27
+#define ARM64_HAS_CACHE_DIC	28
 
-#define ARM64_NCAPS	27
+#define ARM64_NCAPS	29
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 2985a06..9f39e9c 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -199,12 +199,12 @@ static int __init register_cpu_hwcaps_dumper(void)
 };
 
 static const struct arm64_ftr_bits ftr_ctr[] = {
-	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1),		/* RES1 */
-	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 29, 1, 1),	/* DIC */
-   AR

Re: [PATCH v6] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-03-06 Thread Shanker Donthineni
Hi Will,

On 03/06/2018 12:48 PM, Shanker Donthineni wrote:
> Hi Will,
> 
> On 03/06/2018 09:23 AM, Will Deacon wrote:
>> Hi Shanker,
>>
>> On Tue, Mar 06, 2018 at 08:47:27AM -0600, Shanker Donthineni wrote:
>>> On 03/06/2018 07:44 AM, Will Deacon wrote:
>>>> I think this is a slight asymmetry with the code for the I-side. On the
>>>> I-side, you hook into invalidate_icache_by_line, whereas on the D-side you
>>>> hook into the callers of dcache_by_line_op. Why is that?
>>>>
>>>
>>> There is no particular reason other than the complexity of the macro with
>>> another alternative. I tried to avoid this change by updating
>>> __clean_dcache_area_pou(). I can change it if you'd like to see the I-side
>>> and D-side changes made symmetric, something like this...
>>>
>>>  .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
>>>   
>>>   .if   (\op == cvau)
>>>   alternative_if ARM64_HAS_CACHE_IDC
>>> dsb ishst
>>> b   9997f
>>>   alternative_else_nop_endif
>>>   .endif
>>>
>>> dcache_line_size \tmp1, \tmp2
>>> add \size, \kaddr, \size
>>> sub \tmp2, \tmp1, #1
>>> bic \kaddr, \kaddr, \tmp2
>>>  9998:
>>> .if (\op == cvau || \op == cvac)
>>>  alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
>>> dc  \op, \kaddr
>>>  alternative_else
>>> dc  civac, \kaddr
>>>  alternative_endif
>>> .elseif (\op == cvap)
>>>  alternative_if ARM64_HAS_DCPOP
>>> sys 3, c7, c12, 1, \kaddr   // dc cvap
>>>  alternative_else
>>> dc  cvac, \kaddr
>>>  alternative_endif
>>> .else
>>> dc  \op, \kaddr
>>> .endif
>>> add \kaddr, \kaddr, \tmp1
>>> cmp \kaddr, \size
>>> b.lo9998b
>>> dsb \domain
>>> 9997:
>>> .endm
>>
>> I think it would be cleaner the other way round, actually -- move the check
>> out of invalidate_icache_by_line and into its two callers.
>>
> 
> Sure, I'll send out the next patch with your suggestions.
> 
>>>> I notice that the only user other than
>>>> flush_icache_range/__flush_cache_user_range or invalidate_icache_by_line
>>>> is in KVM, via invalidate_icache_range. If you want to hook in there, why
>>>> aren't you also patching __flush_icache_all? If so, I'd rather have the
>>>> I-side code consistent with the D-side code and do this in the handful of
>>>> callers. We might even be able to elide a branch or two that way.
>>>>
>>>
>>> Agree with you, it saves function calls overhead. I'll do this change...
>>>
>>> static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
>>> {
>>> 	if (!cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
>>> 		__invalidate_icache_guest_page(pfn, size);
>>> }
>>>
>>>
>>>> I'm going to assume that I-cache aliases are all coherent if DIC=1, so it's
>>>> safe to elide our alias sync code.
>>>>
>>>
>>> I'm not sure whether all I-cache aliases are coherent if DIC=1 or not.
>>> Unfortunately I don't have any hardware to test DIC=1. I've verified IDC=1.
>>
>> I checked with our architects and aliases don't pose a problem here, so you
>> can ignore me :)
>>
> 
> I also confirmed with Thomas Speier, we can skip __flush_icache_all() if 
> DIC=1.
> 
>  
I'm planning to patch __flush_icache_all() itself instead of changing the
callers. This way we can avoid "ic ialluis" completely. Is this okay with you?

static inline void __flush_icache_all(void)
{
	/* Instruction cache invalidation is not required for I/D coherence? */
	if (!cpus_have_const_cap(ARM64_HAS_CACHE_DIC)) {
		asm("ic ialluis");
		dsb(ish);
	}
}

>> Will
>>
> 



Re: [PATCH v6] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-03-06 Thread Shanker Donthineni
Hi Will,

On 03/06/2018 09:23 AM, Will Deacon wrote:
> Hi Shanker,
> 
> On Tue, Mar 06, 2018 at 08:47:27AM -0600, Shanker Donthineni wrote:
>> On 03/06/2018 07:44 AM, Will Deacon wrote:
>>> I think this is a slight asymmetry with the code for the I-side. On the
>>> I-side, you hook into invalidate_icache_by_line, whereas on the D-side you
>>> hook into the callers of dcache_by_line_op. Why is that?
>>>
>>
>> There is no particular reason other than the complexity of the macro with
>> another alternative. I tried to avoid this change by updating
>> __clean_dcache_area_pou(). I can change it if you'd like to see the I-side
>> and D-side changes made symmetric, something like this...
>>
>>  .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
>>   
>>   .if(\op == cvau)
>>   alternative_if ARM64_HAS_CACHE_IDC
>> dsb  ishst
>> b   9997f
>>   alternative_else_nop_endif
>>   .endif
>>
>>  dcache_line_size \tmp1, \tmp2
>>  add \size, \kaddr, \size
>>  sub \tmp2, \tmp1, #1
>>  bic \kaddr, \kaddr, \tmp2
>>  9998:
>>  .if (\op == cvau || \op == cvac)
>>  alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
>>  dc  \op, \kaddr
>>  alternative_else
>>  dc  civac, \kaddr
>>  alternative_endif
>>  .elseif (\op == cvap)
>>  alternative_if ARM64_HAS_DCPOP
>>  sys 3, c7, c12, 1, \kaddr   // dc cvap
>>  alternative_else
>>  dc  cvac, \kaddr
>>  alternative_endif
>>  .else
>>  dc  \op, \kaddr
>>  .endif
>>  add \kaddr, \kaddr, \tmp1
>>  cmp \kaddr, \size
>>  b.lo9998b
>>  dsb \domain
>> 9997:
>>  .endm
> 
> I think it would be cleaner the other way round, actually -- move the check
> out of invalidate_icache_by_line and into its two callers.
> 

Sure, I'll send out the next patch with your suggestions.

>>> I notice that the only user other than
>>> flush_icache_range/__flush_cache_user_range or invalidate_icache_by_line
>>> is in KVM, via invalidate_icache_range. If you want to hook in there, why
>>> aren't you also patching __flush_icache_all? If so, I'd rather have the
>>> I-side code consistent with the D-side code and do this in the handful of
>>> callers. We might even be able to elide a branch or two that way.
>>>
>>
>> Agree with you, it saves function calls overhead. I'll do this change...
>>
>> static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
>> {
>>  if (!cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
>>  	__invalidate_icache_guest_page(pfn, size);
>> }
>>
>>
>>> I'm going to assume that I-cache aliases are all coherent if DIC=1, so it's
>>> safe to elide our alias sync code.
>>>
>>
>> I'm not sure whether all I-cache aliases are coherent if DIC=1 or not.
>> Unfortunately I don't have any hardware to test DIC=1. I've verified IDC=1.
> 
> I checked with our architects and aliases don't pose a problem here, so you
> can ignore me :)
> 

I also confirmed with Thomas Speier, we can skip __flush_icache_all() if DIC=1.

 
> Will
> 



Re: [PATCH v6] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-03-06 Thread Shanker Donthineni
Hi Will

On 03/06/2018 07:44 AM, Will Deacon wrote:
> Hi Shanker,
> 
> On Wed, Feb 28, 2018 at 10:14:00PM -0600, Shanker Donthineni wrote:
>> The DCache clean & ICache invalidation requirements for instructions
>> to be data coherent are discoverable through new fields in CTR_EL0.
>> The two control bits DIC and IDC were defined for this purpose. There
>> is no need to perform point-of-unification cache maintenance
>> operations from software on systems where the CPU caches are
>> transparent.
>>
>> This patch optimizes the three functions __flush_cache_user_range(),
>> clean_dcache_area_pou() and invalidate_icache_range() when the
>> hardware reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips
>> the two instructions 'DC CVAU' and 'IC IVAU', and the associated loop
>> logic, in order to avoid unnecessary overhead.
>>
>> CTR_EL0.DIC: Instruction cache invalidation requirements for
>>  instruction to data coherence. The meaning of this bit[29]:
>>   0: Instruction cache invalidation to the point of unification
>>  is required for instruction to data coherence.
>>   1: Instruction cache invalidation to the point of unification
>>  is not required for instruction to data coherence.
>>
>> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>>  coherence. The meaning of this bit[28]:
>>   0: Data cache clean to the point of unification is required for
>>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>>   1: Data cache clean to the point of unification is not required
>>  for instruction to data coherence.
>>
>> Co-authored-by: Philip Elcan 
>> Signed-off-by: Shanker Donthineni 
>> ---
>> Changes since v5:
>>   -Addressed Mark's review comments.
> 
> This mostly looks good now. Just a few comments inline.
> 
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 7381eeb..41af850 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1091,6 +1091,18 @@ config ARM64_RAS_EXTN
>>and access the new registers if the system supports the extension.
>>Platform RAS features may additionally depend on firmware support.
>>  
>> +config ARM64_SKIP_CACHE_POU
>> +bool "Enable support to skip cache PoU operations"
>> +default y
>> +help
>> +  Explicit point of unification cache operations can be eliminated
>> +  in software if the hardware handles them transparently. The new
>> +  CTR_EL0 bits, CTR_EL0.DIC and CTR_EL0.IDC, indicate the hardware
>> +  capabilities of the ICache and DCache PoU requirements.
>> +
>> +  Selecting this feature will allow the kernel to optimize cache
>> +  maintenance to the PoU.
>> +
>>  endmenu
> 
> Let's not bother with a Kconfig option. I think the extra couple of NOPs
> this introduces for CPUs that don't implement the new features isn't going
> to hurt anybody.
> 

Okay, I'll get rid of Kconfig option.

>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> index 3c78835..39f2274 100644
>> --- a/arch/arm64/include/asm/assembler.h
>> +++ b/arch/arm64/include/asm/assembler.h
>> @@ -444,6 +444,11 @@
>>   *  Corrupts:   tmp1, tmp2
>>   */
>>  .macro invalidate_icache_by_line start, end, tmp1, tmp2, label
>> +#ifdef CONFIG_ARM64_SKIP_CACHE_POU
>> +alternative_if ARM64_HAS_CACHE_DIC
>> +b   9996f
>> +alternative_else_nop_endif
>> +#endif
>>  icache_line_size \tmp1, \tmp2
>>  sub \tmp2, \tmp1, #1
>>  bic \tmp2, \start, \tmp2
>> @@ -453,6 +458,7 @@
>>  cmp \tmp2, \end
>>  b.lo9997b
>>  dsb ish
>> +9996:
>>  isb
>>  .endm
>>  
>> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>> index ea9bb4e..d460e9f 100644
>> --- a/arch/arm64/include/asm/cache.h
>> +++ b/arch/arm64/include/asm/cache.h
>> @@ -20,8 +20,12 @@
>>  
>>  #define CTR_L1IP_SHIFT  14
>>  #define CTR_L1IP_MASK   3
>> +#define CTR_DMLINE_SHIFT	16
> 
> This should be "CTR_DMINLINE_SHIFT"
> 

I'll change it.

>> +#define CTR_ERG_SHIFT   20
>>  #define CTR_CWG_SHIFT   24
>>  #define CTR_CWG_MASK	15
>> +#define CTR_IDC_SHIFT	28
>> +#define CTR_DIC_SHIFT	29

Re: [PATCH] arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening

2018-03-05 Thread Shanker Donthineni
Hi Will,

On 03/05/2018 11:15 AM, Will Deacon wrote:
> On Mon, Mar 05, 2018 at 10:57:58AM -0600, Shanker Donthineni wrote:
>> Hi Will,
>>
>> On 03/05/2018 09:56 AM, Will Deacon wrote:
>>> Hi Shanker,
>>>
>>> On Fri, Mar 02, 2018 at 03:50:18PM -0600, Shanker Donthineni wrote:
>>>> The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of SMC
>>>> V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses
>>>> the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead
>>>> of Silicon provider service ID 0xC2001700.
>>>>
>>>> Signed-off-by: Shanker Donthineni 
>>>> ---
>>>>  arch/arm64/include/asm/cpucaps.h |  2 +-
>>>>  arch/arm64/include/asm/kvm_asm.h |  2 --
>>>>  arch/arm64/kernel/bpi.S  |  8 --
>>>>  arch/arm64/kernel/cpu_errata.c   | 55 ++--
>>>>  arch/arm64/kvm/hyp/entry.S   | 12 -
>>>>  arch/arm64/kvm/hyp/switch.c  | 10 
>>>>  6 files changed, 20 insertions(+), 69 deletions(-)
>>>
>>> I'm happy to take this via arm64 if I get an ack from Marc/Christoffer.
>>>
>>>> diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
>>>> index bb26382..6ecc249 100644
>>>> --- a/arch/arm64/include/asm/cpucaps.h
>>>> +++ b/arch/arm64/include/asm/cpucaps.h
>>>> @@ -43,7 +43,7 @@
>>>>  #define ARM64_SVE 22
>>>>  #define ARM64_UNMAP_KERNEL_AT_EL0 23
>>>>  #define ARM64_HARDEN_BRANCH_PREDICTOR 24
>>>> -#define ARM64_HARDEN_BP_POST_GUEST_EXIT   25
>>>> +/* #define ARM64_UNALLOCATED_ENTRY25 */
>>>>  #define ARM64_HAS_RAS_EXTN26
>>>>  
>>>>  #define ARM64_NCAPS   27
>>>
>>> These aren't ABI, so I think you can just drop
>>> ARM64_HARDEN_BP_POST_GUEST_EXIT and repack the others accordingly.
>>>
>> Sure, I'll remove it completely in v2 patch.
>>  
>>>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>>>> index 24961b7..ab4d0a9 100644
>>>> --- a/arch/arm64/include/asm/kvm_asm.h
>>>> +++ b/arch/arm64/include/asm/kvm_asm.h
>>>> @@ -68,8 +68,6 @@
>>>>  
>>>>  extern u32 __init_stage2_translation(void);
>>>>  
>>>> -extern void __qcom_hyp_sanitize_btac_predictors(void);
>>>> -
>>>>  #endif
>>>>  
>>>>  #endif /* __ARM_KVM_ASM_H__ */
>>>> diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
>>>> index e5de335..dc4eb15 100644
>>>> --- a/arch/arm64/kernel/bpi.S
>>>> +++ b/arch/arm64/kernel/bpi.S
>>>> @@ -55,14 +55,6 @@ ENTRY(__bp_harden_hyp_vecs_start)
>>>>.endr
>>>>  ENTRY(__bp_harden_hyp_vecs_end)
>>>>  
>>>> -ENTRY(__qcom_hyp_sanitize_link_stack_start)
>>>> -  stp x29, x30, [sp, #-16]!
>>>> -  .rept   16
>>>> -  bl  . + 4
>>>> -  .endr
>>>> -  ldp x29, x30, [sp], #16
>>>> -ENTRY(__qcom_hyp_sanitize_link_stack_end)
>>>> -
>>>>  .macro smccc_workaround_1 inst
>>>>sub sp, sp, #(8 * 4)
>>>>stp x2, x3, [sp, #(8 * 0)]
>>>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
>>>> index 52f15cd..d779ffd4 100644
>>>> --- a/arch/arm64/kernel/cpu_errata.c
>>>> +++ b/arch/arm64/kernel/cpu_errata.c
>>>> @@ -67,8 +67,6 @@ static int cpu_enable_trap_ctr_access(void *__unused)
>>>>  DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
>>>>  
>>>>  #ifdef CONFIG_KVM
>>>> -extern char __qcom_hyp_sanitize_link_stack_start[];
>>>> -extern char __qcom_hyp_sanitize_link_stack_end[];
>>>>  extern char __smccc_workaround_1_smc_start[];
>>>>  extern char __smccc_workaround_1_smc_end[];
>>>>  extern char __smccc_workaround_1_hvc_start[];
>>>> @@ -115,8 +113,6 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
>>>>spin_unlock(&bp_lock);
>>>>  }
>>>>  #else
>>>> -#define __qcom_hyp_sanitize_link_stack_start  NULL
>>>> -#define __qco

[PATCH v2] arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening

2018-03-05 Thread Shanker Donthineni
The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of the
SMC Calling Convention v1.1 to mitigate CVE-2017-5715. This patch uses
the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead
of the silicon-provider service ID 0xC2001700.

Signed-off-by: Shanker Donthineni 
---
Changes since v1:
  - Trivial change in cpucaps.h (refresh after removing
    ARM64_HARDEN_BP_POST_GUEST_EXIT)

 arch/arm64/include/asm/cpucaps.h |  5 ++--
 arch/arm64/include/asm/kvm_asm.h |  2 --
 arch/arm64/kernel/bpi.S  |  8 --
 arch/arm64/kernel/cpu_errata.c   | 55 ++--
 arch/arm64/kvm/hyp/entry.S   | 12 -
 arch/arm64/kvm/hyp/switch.c  | 10 
 6 files changed, 21 insertions(+), 71 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index bb26382..324c85e 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -43,9 +43,8 @@
 #define ARM64_SVE  22
 #define ARM64_UNMAP_KERNEL_AT_EL0  23
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
-#define ARM64_HARDEN_BP_POST_GUEST_EXIT	25
-#define ARM64_HAS_RAS_EXTN	26
+#define ARM64_HAS_RAS_EXTN	25
 
-#define ARM64_NCAPS	27
+#define ARM64_NCAPS	26
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 24961b7..ab4d0a9 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -68,8 +68,6 @@
 
 extern u32 __init_stage2_translation(void);
 
-extern void __qcom_hyp_sanitize_btac_predictors(void);
-
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
index e5de335..dc4eb15 100644
--- a/arch/arm64/kernel/bpi.S
+++ b/arch/arm64/kernel/bpi.S
@@ -55,14 +55,6 @@ ENTRY(__bp_harden_hyp_vecs_start)
.endr
 ENTRY(__bp_harden_hyp_vecs_end)
 
-ENTRY(__qcom_hyp_sanitize_link_stack_start)
-   stp x29, x30, [sp, #-16]!
-   .rept   16
-   bl  . + 4
-   .endr
-   ldp x29, x30, [sp], #16
-ENTRY(__qcom_hyp_sanitize_link_stack_end)
-
 .macro smccc_workaround_1 inst
sub sp, sp, #(8 * 4)
stp x2, x3, [sp, #(8 * 0)]
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 52f15cd..d779ffd4 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -67,8 +67,6 @@ static int cpu_enable_trap_ctr_access(void *__unused)
 DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
 
 #ifdef CONFIG_KVM
-extern char __qcom_hyp_sanitize_link_stack_start[];
-extern char __qcom_hyp_sanitize_link_stack_end[];
 extern char __smccc_workaround_1_smc_start[];
 extern char __smccc_workaround_1_smc_end[];
 extern char __smccc_workaround_1_hvc_start[];
@@ -115,8 +113,6 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
spin_unlock(&bp_lock);
 }
 #else
-#define __qcom_hyp_sanitize_link_stack_start   NULL
-#define __qcom_hyp_sanitize_link_stack_end NULL
 #define __smccc_workaround_1_smc_start NULL
 #define __smccc_workaround_1_smc_end   NULL
 #define __smccc_workaround_1_hvc_start NULL
@@ -161,12 +157,25 @@ static void call_hvc_arch_workaround_1(void)
arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_1, NULL);
 }
 
+static void qcom_link_stack_sanitization(void)
+{
+   u64 tmp;
+
+	asm volatile("mov	%0, x30		\n"
+		     ".rept	16		\n"
+		     "bl	. + 4		\n"
+		     ".endr			\n"
+		     "mov	x30, %0		\n"
+		     : "=&r" (tmp));
+}
+
 static int enable_smccc_arch_workaround_1(void *data)
 {
const struct arm64_cpu_capabilities *entry = data;
bp_hardening_cb_t cb;
void *smccc_start, *smccc_end;
struct arm_smccc_res res;
+   u32 midr = read_cpuid_id();
 
if (!entry->matches(entry, SCOPE_LOCAL_CPU))
return 0;
@@ -199,33 +208,15 @@ static int enable_smccc_arch_workaround_1(void *data)
return 0;
}
 
+   if (((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR) ||
+   ((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR_V1))
+   cb = qcom_link_stack_sanitization;
+
install_bp_hardening_cb(entry, cb, smccc_start, smccc_end);
 
return 0;
 }
 
-static void qcom_link_stack_sanitization(void)
-{
-   u64 tmp;
-
-	asm volatile("mov	%0, x30		\n"
-		     ".rept	16		\n"
-		     "bl	. + 4		\n"
-		     ".endr			\n"
-"mov   

Re: [PATCH] arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening

2018-03-05 Thread Shanker Donthineni
Hi Will,

On 03/05/2018 09:56 AM, Will Deacon wrote:
> Hi Shanker,
> 
> On Fri, Mar 02, 2018 at 03:50:18PM -0600, Shanker Donthineni wrote:
>> The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of SMC
>> V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses
>> the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead
>> of Silicon provider service ID 0xC2001700.
>>
>> Signed-off-by: Shanker Donthineni 
>> ---
>>  arch/arm64/include/asm/cpucaps.h |  2 +-
>>  arch/arm64/include/asm/kvm_asm.h |  2 --
>>  arch/arm64/kernel/bpi.S  |  8 --
>>  arch/arm64/kernel/cpu_errata.c   | 55 ++--
>>  arch/arm64/kvm/hyp/entry.S   | 12 -
>>  arch/arm64/kvm/hyp/switch.c  | 10 
>>  6 files changed, 20 insertions(+), 69 deletions(-)
> 
> I'm happy to take this via arm64 if I get an ack from Marc/Christoffer.
> 
>> diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
>> index bb26382..6ecc249 100644
>> --- a/arch/arm64/include/asm/cpucaps.h
>> +++ b/arch/arm64/include/asm/cpucaps.h
>> @@ -43,7 +43,7 @@
>>  #define ARM64_SVE   22
>>  #define ARM64_UNMAP_KERNEL_AT_EL0   23
>>  #define ARM64_HARDEN_BRANCH_PREDICTOR   24
>> -#define ARM64_HARDEN_BP_POST_GUEST_EXIT 25
>> +/* #define ARM64_UNALLOCATED_ENTRY  25 */
>>  #define ARM64_HAS_RAS_EXTN  26
>>  
>>  #define ARM64_NCAPS 27
> 
> These aren't ABI, so I think you can just drop
> ARM64_HARDEN_BP_POST_GUEST_EXIT and repack the others accordingly.
> 
Sure, I'll remove it completely in v2 patch.
 
>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>> index 24961b7..ab4d0a9 100644
>> --- a/arch/arm64/include/asm/kvm_asm.h
>> +++ b/arch/arm64/include/asm/kvm_asm.h
>> @@ -68,8 +68,6 @@
>>  
>>  extern u32 __init_stage2_translation(void);
>>  
>> -extern void __qcom_hyp_sanitize_btac_predictors(void);
>> -
>>  #endif
>>  
>>  #endif /* __ARM_KVM_ASM_H__ */
>> diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
>> index e5de335..dc4eb15 100644
>> --- a/arch/arm64/kernel/bpi.S
>> +++ b/arch/arm64/kernel/bpi.S
>> @@ -55,14 +55,6 @@ ENTRY(__bp_harden_hyp_vecs_start)
>>  .endr
>>  ENTRY(__bp_harden_hyp_vecs_end)
>>  
>> -ENTRY(__qcom_hyp_sanitize_link_stack_start)
>> -stp x29, x30, [sp, #-16]!
>> -.rept   16
>> -bl  . + 4
>> -.endr
>> -ldp x29, x30, [sp], #16
>> -ENTRY(__qcom_hyp_sanitize_link_stack_end)
>> -
>>  .macro smccc_workaround_1 inst
>>  sub sp, sp, #(8 * 4)
>>  stp x2, x3, [sp, #(8 * 0)]
>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
>> index 52f15cd..d779ffd4 100644
>> --- a/arch/arm64/kernel/cpu_errata.c
>> +++ b/arch/arm64/kernel/cpu_errata.c
>> @@ -67,8 +67,6 @@ static int cpu_enable_trap_ctr_access(void *__unused)
>>  DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
>>  
>>  #ifdef CONFIG_KVM
>> -extern char __qcom_hyp_sanitize_link_stack_start[];
>> -extern char __qcom_hyp_sanitize_link_stack_end[];
>>  extern char __smccc_workaround_1_smc_start[];
>>  extern char __smccc_workaround_1_smc_end[];
>>  extern char __smccc_workaround_1_hvc_start[];
>> @@ -115,8 +113,6 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
>>  spin_unlock(&bp_lock);
>>  }
>>  #else
>> -#define __qcom_hyp_sanitize_link_stack_startNULL
>> -#define __qcom_hyp_sanitize_link_stack_end  NULL
>>  #define __smccc_workaround_1_smc_start  NULL
>>  #define __smccc_workaround_1_smc_endNULL
>>  #define __smccc_workaround_1_hvc_start  NULL
>> @@ -161,12 +157,25 @@ static void call_hvc_arch_workaround_1(void)
>>  arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_1, NULL);
>>  }
>>  
>> +static void qcom_link_stack_sanitization(void)
>> +{
>> +u64 tmp;
>> +
>> +asm volatile("mov   %0, x30 \n"
>> + ".rept 16  \n"
>> + "bl. + 4   \n"
>> + ".endr \n"
>> + "mov   x30, %0 \n"
>> + : "=&r" (tmp));

[PATCH] arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening

2018-03-02 Thread Shanker Donthineni
The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of the
SMC Calling Convention v1.1 to mitigate CVE-2017-5715. This patch uses
the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead
of the silicon-provider service ID 0xC2001700.

Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/cpucaps.h |  2 +-
 arch/arm64/include/asm/kvm_asm.h |  2 --
 arch/arm64/kernel/bpi.S  |  8 --
 arch/arm64/kernel/cpu_errata.c   | 55 ++--
 arch/arm64/kvm/hyp/entry.S   | 12 -
 arch/arm64/kvm/hyp/switch.c  | 10 
 6 files changed, 20 insertions(+), 69 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index bb26382..6ecc249 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -43,7 +43,7 @@
 #define ARM64_SVE  22
 #define ARM64_UNMAP_KERNEL_AT_EL0  23
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
-#define ARM64_HARDEN_BP_POST_GUEST_EXIT	25
+/* #define ARM64_UNALLOCATED_ENTRY	25 */
 #define ARM64_HAS_RAS_EXTN	26
 
 #define ARM64_NCAPS	27
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 24961b7..ab4d0a9 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -68,8 +68,6 @@
 
 extern u32 __init_stage2_translation(void);
 
-extern void __qcom_hyp_sanitize_btac_predictors(void);
-
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
index e5de335..dc4eb15 100644
--- a/arch/arm64/kernel/bpi.S
+++ b/arch/arm64/kernel/bpi.S
@@ -55,14 +55,6 @@ ENTRY(__bp_harden_hyp_vecs_start)
.endr
 ENTRY(__bp_harden_hyp_vecs_end)
 
-ENTRY(__qcom_hyp_sanitize_link_stack_start)
-   stp x29, x30, [sp, #-16]!
-   .rept   16
-   bl  . + 4
-   .endr
-   ldp x29, x30, [sp], #16
-ENTRY(__qcom_hyp_sanitize_link_stack_end)
-
 .macro smccc_workaround_1 inst
sub sp, sp, #(8 * 4)
stp x2, x3, [sp, #(8 * 0)]
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 52f15cd..d779ffd4 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -67,8 +67,6 @@ static int cpu_enable_trap_ctr_access(void *__unused)
 DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
 
 #ifdef CONFIG_KVM
-extern char __qcom_hyp_sanitize_link_stack_start[];
-extern char __qcom_hyp_sanitize_link_stack_end[];
 extern char __smccc_workaround_1_smc_start[];
 extern char __smccc_workaround_1_smc_end[];
 extern char __smccc_workaround_1_hvc_start[];
@@ -115,8 +113,6 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
spin_unlock(&bp_lock);
 }
 #else
-#define __qcom_hyp_sanitize_link_stack_start   NULL
-#define __qcom_hyp_sanitize_link_stack_end NULL
 #define __smccc_workaround_1_smc_start NULL
 #define __smccc_workaround_1_smc_end   NULL
 #define __smccc_workaround_1_hvc_start NULL
@@ -161,12 +157,25 @@ static void call_hvc_arch_workaround_1(void)
arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_1, NULL);
 }
 
+static void qcom_link_stack_sanitization(void)
+{
+   u64 tmp;
+
+	asm volatile("mov	%0, x30		\n"
+		     ".rept	16		\n"
+		     "bl	. + 4		\n"
+		     ".endr			\n"
+		     "mov	x30, %0		\n"
+		     : "=&r" (tmp));
+}
+
 static int enable_smccc_arch_workaround_1(void *data)
 {
const struct arm64_cpu_capabilities *entry = data;
bp_hardening_cb_t cb;
void *smccc_start, *smccc_end;
struct arm_smccc_res res;
+   u32 midr = read_cpuid_id();
 
if (!entry->matches(entry, SCOPE_LOCAL_CPU))
return 0;
@@ -199,33 +208,15 @@ static int enable_smccc_arch_workaround_1(void *data)
return 0;
}
 
+   if (((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR) ||
+   ((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR_V1))
+   cb = qcom_link_stack_sanitization;
+
install_bp_hardening_cb(entry, cb, smccc_start, smccc_end);
 
return 0;
 }
 
-static void qcom_link_stack_sanitization(void)
-{
-   u64 tmp;
-
-	asm volatile("mov	%0, x30		\n"
-		     ".rept	16		\n"
-		     "bl	. + 4		\n"
-		     ".endr			\n"
-		     "mov	x30, %0		\n"
-		     : "=&r" (tmp));
-}
-
-static int qcom_enable_link_stack_sanitization(

[PATCH v6] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-28 Thread Shanker Donthineni
The DCache clean & ICache invalidation requirements for instructions
to be data coherent are discoverable through new fields in CTR_EL0.
The two control bits DIC and IDC were defined for this purpose. There
is no need to perform point-of-unification cache maintenance
operations from software on systems where the CPU caches are
transparent.

This patch optimizes the three functions __flush_cache_user_range(),
clean_dcache_area_pou() and invalidate_icache_range() when the
hardware reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips
the two instructions 'DC CVAU' and 'IC IVAU', and the associated loop
logic, in order to avoid unnecessary overhead.

CTR_EL0.DIC: Instruction cache invalidation requirements for
 instruction to data coherence. The meaning of this bit[29]:
  0: Instruction cache invalidation to the point of unification
 is required for instruction to data coherence.
  1: Instruction cache invalidation to the point of unification
 is not required for instruction to data coherence.

CTR_EL0.IDC: Data cache clean requirements for instruction to data
 coherence. The meaning of this bit[28]:
  0: Data cache clean to the point of unification is required for
 instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
 or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
  1: Data cache clean to the point of unification is not required
 for instruction to data coherence.

Co-authored-by: Philip Elcan 
Signed-off-by: Shanker Donthineni 
---
Changes since v5:
  -Addressed Mark's review comments.

Changes since v4:
  -Moved patching ARM64_HAS_CACHE_DIC inside invalidate_icache_by_line
  -Removed 'dsb ishst' for ARM64_HAS_CACHE_DIC as Mark suggested.

Changes since v3:
  -Added preprocessor guard CONFIG_xxx to code snippets in cache.S
  -Changed barrier attributes from ISH to ISHST.

Changes since v2:
  -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
  -Single Kconfig option.

Changes since v1:
  -Reworded commit text.
  -Used the alternatives framework as Catalin suggested.
  -Rebased on top of https://patchwork.kernel.org/patch/10227927/

 arch/arm64/Kconfig | 12 
 arch/arm64/include/asm/assembler.h |  6 ++
 arch/arm64/include/asm/cache.h |  4 
 arch/arm64/include/asm/cpucaps.h   |  4 +++-
 arch/arm64/kernel/cpufeature.c | 40 --
 arch/arm64/mm/cache.S  | 13 +
 6 files changed, 72 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7381eeb..41af850 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1091,6 +1091,18 @@ config ARM64_RAS_EXTN
  and access the new registers if the system supports the extension.
  Platform RAS features may additionally depend on firmware support.
 
+config ARM64_SKIP_CACHE_POU
+   bool "Enable support to skip cache PoU operations"
+   default y
+   help
+	  Explicit point of unification cache operations can be eliminated
+	  in software if the hardware handles them transparently. The new
+	  CTR_EL0 bits, CTR_EL0.DIC and CTR_EL0.IDC, indicate the hardware
+	  capabilities of the ICache and DCache PoU requirements.
+
+ Selecting this feature will allow the kernel to optimize cache
+ maintenance to the PoU.
+
 endmenu
 
 config ARM64_SVE
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 3c78835..39f2274 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -444,6 +444,11 @@
  * Corrupts:   tmp1, tmp2
  */
.macro invalidate_icache_by_line start, end, tmp1, tmp2, label
+#ifdef CONFIG_ARM64_SKIP_CACHE_POU
+alternative_if ARM64_HAS_CACHE_DIC
+   b   9996f
+alternative_else_nop_endif
+#endif
icache_line_size \tmp1, \tmp2
sub \tmp2, \tmp1, #1
bic \tmp2, \start, \tmp2
@@ -453,6 +458,7 @@
cmp \tmp2, \end
b.lo9997b
dsb ish
+9996:
isb
.endm
 
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index ea9bb4e..d460e9f 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -20,8 +20,12 @@
 
 #define CTR_L1IP_SHIFT 14
 #define CTR_L1IP_MASK  3
+#define CTR_DMLINE_SHIFT   16
+#define CTR_ERG_SHIFT  20
 #define CTR_CWG_SHIFT  24
 #define CTR_CWG_MASK   15
+#define CTR_IDC_SHIFT  28
+#define CTR_DIC_SHIFT  29
 
 #define CTR_L1IP(ctr)  (((ctr) >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK)
 
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index bb26382..8dd42ae 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -45,7 +45,9 @@
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
 #define ARM64_HARDEN_BP_POST_GUEST_EXIT  

[PATCH v5] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-24 Thread Shanker Donthineni
The DCache clean & ICache invalidation requirements for instructions
to be data coherent are discoverable through new fields in CTR_EL0.
The two control bits DIC and IDC were defined for this purpose. There
is no need to perform point-of-unification cache maintenance
operations from software on systems where the CPU caches are
transparent.

This patch optimizes the three functions __flush_cache_user_range(),
clean_dcache_area_pou() and invalidate_icache_range() when the
hardware reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips
the two instructions 'DC CVAU' and 'IC IVAU', and the associated loop
logic, in order to avoid unnecessary overhead.

CTR_EL0.DIC: Instruction cache invalidation requirements for
 instruction to data coherence. The meaning of this bit[29]:
  0: Instruction cache invalidation to the point of unification
 is required for instruction to data coherence.
  1: Instruction cache invalidation to the point of unification
 is not required for instruction to data coherence.

CTR_EL0.IDC: Data cache clean requirements for instruction to data
 coherence. The meaning of this bit[28]:
  0: Data cache clean to the point of unification is required for
 instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
 or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
  1: Data cache clean to the point of unification is not required
 for instruction to data coherence.

Signed-off-by: Philip Elcan 
Signed-off-by: Shanker Donthineni 
---
Changes since v4:
  -Moved patching ARM64_HAS_CACHE_DIC inside invalidate_icache_by_line
  -Removed 'dsb ishst' for ARM64_HAS_CACHE_DIC as Mark suggested.

Changes since v3:
  -Added preprocessor guard CONFIG_xxx to code snippets in cache.S
  -Changed barrier attributes from ISH to ISHST.

Changes since v2:
  -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
  -Single Kconfig option.

Changes since v1:
  -Reworded commit text.
  -Used the alternatives framework as Catalin suggested.
  -Rebased on top of https://patchwork.kernel.org/patch/10227927/

 arch/arm64/Kconfig | 12 
 arch/arm64/include/asm/assembler.h |  6 ++
 arch/arm64/include/asm/cache.h |  5 +
 arch/arm64/include/asm/cpucaps.h   |  4 +++-
 arch/arm64/kernel/cpufeature.c | 40 --
 arch/arm64/mm/cache.S  | 13 +
 6 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f55fe5b..82b8053 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN
  and access the new registers if the system supports the extension.
  Platform RAS features may additionally depend on firmware support.
 
+config ARM64_SKIP_CACHE_POU
+   bool "Enable support to skip cache POU operations"
+   default y
+   help
+	  Explicit point of unification cache operations can be eliminated
+	  in software if the hardware handles them transparently. The new
+	  CTR_EL0 bits, CTR_EL0.DIC and CTR_EL0.IDC, indicate the hardware
+	  capabilities of the ICache and DCache POU requirements.
+
+	  Selecting this feature will allow the kernel to optimize the POU
+	  cache maintenance operations where 'D{I}C C{I}VAU' would be required.
+
 endmenu
 
 config ARM64_SVE
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 3c78835..39f2274 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -444,6 +444,11 @@
  * Corrupts:   tmp1, tmp2
  */
.macro invalidate_icache_by_line start, end, tmp1, tmp2, label
+#ifdef CONFIG_ARM64_SKIP_CACHE_POU
+alternative_if ARM64_HAS_CACHE_DIC
+   b   9996f
+alternative_else_nop_endif
+#endif
icache_line_size \tmp1, \tmp2
sub \tmp2, \tmp1, #1
bic \tmp2, \start, \tmp2
@@ -453,6 +458,7 @@
cmp \tmp2, \end
b.lo9997b
dsb ish
+9996:
isb
.endm
 
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index ea9bb4e..e22178b 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -20,8 +20,13 @@
 
 #define CTR_L1IP_SHIFT 14
 #define CTR_L1IP_MASK  3
+#define CTR_DMLINE_SHIFT   16
+#define CTR_ERG_SHIFT  20
 #define CTR_CWG_SHIFT  24
 #define CTR_CWG_MASK   15
+#define CTR_IDC_SHIFT  28
+#define CTR_DIC_SHIFT  29
+#define CTR_B31_SHIFT  31
 
 #define CTR_L1IP(ctr)  (((ctr) >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK)
 
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index bb26382..8dd42ae 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -45,7 +45,9 @@
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
 #define ARM64

[PATCH v4] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-22 Thread Shanker Donthineni
The DCache clean & ICache invalidation requirements for instructions
to be data coherent are discoverable through new fields in CTR_EL0.
The two control bits DIC and IDC were defined for this purpose. There
is no need to perform point-of-unification cache maintenance
operations from software on systems where the CPU caches are
transparent.

This patch optimizes the three functions __flush_cache_user_range(),
clean_dcache_area_pou() and invalidate_icache_range() when the
hardware reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips
the two instructions 'DC CVAU' and 'IC IVAU', and the associated loop
logic, in order to avoid unnecessary overhead.

CTR_EL0.DIC: Instruction cache invalidation requirements for
 instruction to data coherence. The meaning of this bit[29]:
  0: Instruction cache invalidation to the point of unification
 is required for instruction to data coherence.
  1: Instruction cache invalidation to the point of unification
 is not required for instruction to data coherence.

CTR_EL0.IDC: Data cache clean requirements for instruction to data
 coherence. The meaning of this bit[28]:
  0: Data cache clean to the point of unification is required for
 instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
 or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
  1: Data cache clean to the point of unification is not required
 for instruction to data coherence.

Signed-off-by: Philip Elcan 
Signed-off-by: Shanker Donthineni 
---
Changes since v3:
  -Added preprocessor guard CONFIG_xxx to code snippets in cache.S
  -Changed barrier attributes from ISH to ISHST.

Changes since v2:
  -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
  -Single Kconfig option.

Changes since v1:
  -Reworded commit text.
  -Used the alternatives framework as Catalin suggested.
  -Rebased on top of https://patchwork.kernel.org/patch/10227927/

 arch/arm64/Kconfig   | 12 
 arch/arm64/include/asm/cache.h   |  5 +
 arch/arm64/include/asm/cpucaps.h |  4 +++-
 arch/arm64/kernel/cpufeature.c   | 40 ++--
 arch/arm64/mm/cache.S| 29 -
 5 files changed, 82 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f55fe5b..82b8053 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN
  and access the new registers if the system supports the extension.
  Platform RAS features may additionally depend on firmware support.
 
+config ARM64_SKIP_CACHE_POU
+   bool "Enable support to skip cache POU operations"
+   default y
+   help
+ Explicit point of unification cache operations can be eliminated
+ in software if the hardware handles them transparently. The new
+ CTR_EL0 bits, CTR_EL0.DIC and CTR_EL0.IDC, indicate the hardware's
+ ICache and DCache POU maintenance requirements.
+
+ Selecting this feature will allow the kernel to optimize the POU
+ cache maintenance operations where it requires 'D{I}C C{I}VAU'.
+
 endmenu
 
 config ARM64_SVE
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index ea9bb4e..e22178b 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -20,8 +20,13 @@
 
 #define CTR_L1IP_SHIFT 14
 #define CTR_L1IP_MASK  3
+#define CTR_DMLINE_SHIFT   16
+#define CTR_ERG_SHIFT  20
 #define CTR_CWG_SHIFT  24
 #define CTR_CWG_MASK   15
+#define CTR_IDC_SHIFT  28
+#define CTR_DIC_SHIFT  29
+#define CTR_B31_SHIFT  31
 
 #define CTR_L1IP(ctr)  (((ctr) >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK)
 
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index bb26382..8dd42ae 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -45,7 +45,9 @@
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
 #define ARM64_HARDEN_BP_POST_GUEST_EXIT25
 #define ARM64_HAS_RAS_EXTN 26
+#define ARM64_HAS_CACHE_IDC            27
+#define ARM64_HAS_CACHE_DIC            28
 
-#define ARM64_NCAPS                    27
+#define ARM64_NCAPS                    29
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index ff8a6e9..c0b0db0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -199,12 +199,12 @@ static int __init register_cpu_hwcaps_dumper(void)
 };
 
 static const struct arm64_ftr_bits ftr_ctr[] = {
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1),  /* RES1 */
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 29, 1, 1), /* DIC */
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 28, 1, 1),

Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Shanker Donthineni
Hi Mark,

On 02/21/2018 09:09 AM, Mark Rutland wrote:
> On Wed, Feb 21, 2018 at 07:49:06AM -0600, Shanker Donthineni wrote:
>> The DCache clean & ICache invalidation requirements for instructions
>> to be data coherence are discoverable through new fields in CTR_EL0.
>> The following two control bits DIC and IDC were defined for this
>> purpose. No need to perform point of unification cache maintenance
>> operations from software on systems where CPU caches are transparent.
>>
>> This patch optimize the three functions __flush_cache_user_range(),
>> clean_dcache_area_pou() and invalidate_icache_range() if the hardware
>> reports CTR_EL0.IDC and/or CTR_EL0.IDC. Basically it skips the two
>> instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic
>> in order to avoid the unnecessary overhead.
>>
>> CTR_EL0.DIC: Instruction cache invalidation requirements for
>>  instruction to data coherence. The meaning of this bit[29].
>>   0: Instruction cache invalidation to the point of unification
>>  is required for instruction to data coherence.
>>   1: Instruction cache cleaning to the point of unification is
>>   not required for instruction to data coherence.
>>
>> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>>  coherence. The meaning of this bit[28].
>>   0: Data cache clean to the point of unification is required for
>>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>>   1: Data cache clean to the point of unification is not required
>>  for instruction to data coherence.
>>
>> Signed-off-by: Philip Elcan 
>> Signed-off-by: Shanker Donthineni 
>> ---
>> Changes since v2:
>>   -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
>>   -Single Kconfig option.
>>
>> Changes since v1:
>>   -Reworded commit text.
>>   -Used the alternatives framework as Catalin suggested.
>>   -Rebased on top of https://patchwork.kernel.org/patch/10227927/
>>
>>  arch/arm64/Kconfig   | 12 
>>  arch/arm64/include/asm/cache.h   |  5 +
>>  arch/arm64/include/asm/cpucaps.h |  4 +++-
>>  arch/arm64/kernel/cpufeature.c   | 40 
>> ++--
>>  arch/arm64/mm/cache.S| 21 +++--
>>  5 files changed, 73 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index f55fe5b..82b8053 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN
>>and access the new registers if the system supports the extension.
>>Platform RAS features may additionally depend on firmware support.
>>  
>> +config ARM64_SKIP_CACHE_POU
>> +bool "Enable support to skip cache POU operations"
>> +default y
>> +help
>> +  Explicit point of unification cache operations can be eliminated
>> +  in software if the hardware handles transparently. The new bits in
>> +  CTR_EL0, CTR_EL0.DIC and CTR_EL0.IDC indicates the hardware
>> +  capabilities of ICache and DCache POU requirements.
>> +
>> +  Selecting this feature will allow the kernel to optimize the POU
>> +  cache maintaince operations where it requires 'D{I}C C{I}VAU'
>> +
>>  endmenu
> 
> Is it worth having a config option for this at all? The savings from turning
> this off seem trivial.
> 
>>  
>>  config ARM64_SVE
>> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>> index ea9bb4e..e22178b 100644
>> --- a/arch/arm64/include/asm/cache.h
>> +++ b/arch/arm64/include/asm/cache.h
>> @@ -20,8 +20,13 @@
>>  
>>  #define CTR_L1IP_SHIFT  14
>>  #define CTR_L1IP_MASK   3
>> +#define CTR_DMLINE_SHIFT16
>> +#define CTR_ERG_SHIFT   20
>>  #define CTR_CWG_SHIFT   24
>>  #define CTR_CWG_MASK15
>> +#define CTR_IDC_SHIFT   28
>> +#define CTR_DIC_SHIFT   29
>> +#define CTR_B31_SHIFT   31
>>  
>>  #define CTR_L1IP(ctr)   (((ctr) >> CTR_L1IP_SHIFT) & 
>> CTR_L1IP_MASK)
>>  
>> diff --git a/arch/arm64/include/asm/cpucaps.h 
>> b/arch/arm64/include/asm/cpucaps.h
>> index bb26382..8dd42ae 100644
>> --- a/arch/arm64/include/asm/cpucaps.h
>> +++ b/arch/arm64/include/asm/cpucaps.h
>> @@ -45,7 +45,9 

[PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Shanker Donthineni
The DCache clean & ICache invalidation requirements for instructions
to be data coherent are discoverable through new fields in CTR_EL0.
The following two control bits, DIC and IDC, were defined for this
purpose. There is no need to perform point of unification cache
maintenance operations from software on systems where the CPU caches
are transparent.

This patch optimizes the three functions __flush_cache_user_range(),
clean_dcache_area_pou() and invalidate_icache_range() if the hardware
reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips the two
instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic,
in order to avoid the unnecessary overhead.

CTR_EL0.DIC: Instruction cache invalidation requirements for
 instruction to data coherence. The meaning of this bit[29].
  0: Instruction cache invalidation to the point of unification
 is required for instruction to data coherence.
  1: Instruction cache invalidation to the point of unification is
  not required for instruction to data coherence.

CTR_EL0.IDC: Data cache clean requirements for instruction to data
 coherence. The meaning of this bit[28].
  0: Data cache clean to the point of unification is required for
 instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
 or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
  1: Data cache clean to the point of unification is not required
 for instruction to data coherence.

Signed-off-by: Philip Elcan 
Signed-off-by: Shanker Donthineni 
---
Changes since v2:
  -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
  -Single Kconfig option.

Changes since v1:
  -Reworded commit text.
  -Used the alternatives framework as Catalin suggested.
  -Rebased on top of https://patchwork.kernel.org/patch/10227927/

 arch/arm64/Kconfig   | 12 
 arch/arm64/include/asm/cache.h   |  5 +
 arch/arm64/include/asm/cpucaps.h |  4 +++-
 arch/arm64/kernel/cpufeature.c   | 40 ++--
 arch/arm64/mm/cache.S| 21 +++--
 5 files changed, 73 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f55fe5b..82b8053 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN
  and access the new registers if the system supports the extension.
  Platform RAS features may additionally depend on firmware support.
 
+config ARM64_SKIP_CACHE_POU
+   bool "Enable support to skip cache POU operations"
+   default y
+   help
+ Explicit point of unification cache operations can be eliminated
+ in software if the hardware handles them transparently. The new
+ CTR_EL0 bits, CTR_EL0.DIC and CTR_EL0.IDC, indicate the hardware's
+ ICache and DCache POU maintenance requirements.
+
+ Selecting this feature will allow the kernel to optimize the POU
+ cache maintenance operations where it requires 'D{I}C C{I}VAU'.
+
 endmenu
 
 config ARM64_SVE
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index ea9bb4e..e22178b 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -20,8 +20,13 @@
 
 #define CTR_L1IP_SHIFT 14
 #define CTR_L1IP_MASK  3
+#define CTR_DMLINE_SHIFT   16
+#define CTR_ERG_SHIFT  20
 #define CTR_CWG_SHIFT  24
 #define CTR_CWG_MASK   15
+#define CTR_IDC_SHIFT  28
+#define CTR_DIC_SHIFT  29
+#define CTR_B31_SHIFT  31
 
 #define CTR_L1IP(ctr)  (((ctr) >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK)
 
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index bb26382..8dd42ae 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -45,7 +45,9 @@
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
 #define ARM64_HARDEN_BP_POST_GUEST_EXIT25
 #define ARM64_HAS_RAS_EXTN 26
+#define ARM64_HAS_CACHE_IDC            27
+#define ARM64_HAS_CACHE_DIC            28
 
-#define ARM64_NCAPS                    27
+#define ARM64_NCAPS                    29
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index ff8a6e9..12e100a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -199,12 +199,12 @@ static int __init register_cpu_hwcaps_dumper(void)
 };
 
 static const struct arm64_ftr_bits ftr_ctr[] = {
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1),  /* RES1 */
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 29, 1, 1), /* DIC */
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 28, 1, 1), /* IDC */
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, 24, 4, 0), /* CWG */
-   ARM64_FTR_BITS(FT

Re: [PATCH v2] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Shanker Donthineni
Hi Catalin,

On 02/21/2018 05:12 AM, Catalin Marinas wrote:
> On Mon, Feb 19, 2018 at 08:59:06PM -0600, Shanker Donthineni wrote:
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index f55fe5b..4061210 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1095,6 +1095,27 @@ config ARM64_RAS_EXTN
>>and access the new registers if the system supports the extension.
>>Platform RAS features may additionally depend on firmware support.
>>  
>> +config ARM64_CACHE_IDC
>> +bool "Enable support for DCache clean PoU optimization"
>> +default y
>> +help
>> +  The data cache clean to the point of unification is not required
>> +  for instruction to be data coherence if CTR_EL0.IDC has value 1.
>> +
>> +  Selecting this feature will allow the kernel to optimize the POU
>> +  cache maintaince operations where it requires 'DC CVAU'.
>> +
>> +config ARM64_CACHE_DIC
>> +bool "Enable support for ICache invalidation PoU optimization"
>> +default y
>> +help
>> +  Instruction cache invalidation to the point of unification is not
>> +  required for instruction to be data coherence if CTR_EL0.DIC has
>> +  value 1.
>> +
>> +  Selecting this feature will allow the kernel to optimize the POU
>> +  cache maintaince operations where it requires 'IC IVAU'.
> 
> A single Kconfig entry is sufficient for both features.
> 

I'll do that in the v3 patch.

>> @@ -864,6 +864,22 @@ static bool has_no_fpsimd(const struct 
>> arm64_cpu_capabilities *entry, int __unus
>>  ID_AA64PFR0_FP_SHIFT) < 0;
>>  }
>>  
>> +#ifdef CONFIG_ARM64_CACHE_IDC
>> +static bool has_cache_idc(const struct arm64_cpu_capabilities *entry,
>> +  int __unused)
>> +{
>> +return !!(read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_IDC_SHIFT));
>> +}
>> +#endif
>> +
>> +#ifdef CONFIG_ARM64_CACHE_DIC
>> +static bool has_cache_dic(const struct arm64_cpu_capabilities *entry,
>> +  int __unused)
>> +{
>> +return !!(read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_DIC_SHIFT));
>> +}
>> +#endif
> 
> Nitpick: no need for !! since the function type is bool already.
> 

Sure, I'll remove '!!'.
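
For clarity, a sketch of what the v3 helper would then look like (the
implicit conversion to bool already normalises the result to 0/1, so
the '!!' is redundant):

        static bool has_cache_idc(const struct arm64_cpu_capabilities *entry,
                                  int __unused)
        {
                return read_sanitised_ftr_reg(SYS_CTR_EL0) &
                       (1UL << CTR_IDC_SHIFT);
        }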

>> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
>> index 758bde7..7d37d71 100644
>> --- a/arch/arm64/mm/cache.S
>> +++ b/arch/arm64/mm/cache.S
>> @@ -50,6 +50,9 @@ ENTRY(flush_icache_range)
>>   */
>>  ENTRY(__flush_cache_user_range)
>>  uaccess_ttbr0_enable x2, x3, x4
>> +alternative_if ARM64_HAS_CACHE_IDC
>> +b   8f
>> +alternative_else_nop_endif
>>  dcache_line_size x2, x3
>>  sub x3, x2, #1
>>  bic x4, x0, x3
>> @@ -60,6 +63,11 @@ user_alt 9f, "dc cvau, x4",  "dc civac, x4",  
>> ARM64_WORKAROUND_CLEAN_CACHE
>>  b.lo1b
>>  dsb ish
>>  
>> +8:
>> +alternative_if ARM64_HAS_CACHE_DIC
>> +mov x0, #0
>> +b   1f
>> +alternative_else_nop_endif
>>  invalidate_icache_by_line x0, x1, x2, x3, 9f
>>  mov x0, #0
>>  1:
> 
> You can add another label at mov x0, #0 below this hunk and keep a
> single instruction in the alternative path.
> 
> However, my worry is that in an implementation with DIC set, we also
> skip the DSB/ISB sequence in the invalidate_icache_by_line macro. For
> example, in an implementation with transparent PoU, we could have:
> 
>   str , [addr]
>   // no cache maintenance or barrier
>   br  
> 

Thanks for pointing out the missing barriers. I think it makes sense to
follow the existing barrier semantics in order to avoid unknown behavior.
 
> Is an ISB required between the instruction store and execution? I would
> say yes but maybe Will has a better opinion here.
> 
Agreed, an ISB is required, especially for self-modifying code. I'll
include it in the v3 patch.
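
For reference, the ordering that still has to hold for self-modifying
code on an IDC=1/DIC=1 system is roughly (a sketch, not the patch code):

        str     w1, [x0]        // store the new instruction
        dsb     ish             // complete the store
        isb                     // resynchronise instruction fetch
        br      x0              // execute the new instruction

i.e. the DSB/ISB pair must survive even when the 'DC CVAU' and
'IC IVAU' loops are skipped.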

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-19 Thread Shanker Donthineni
The DCache clean & ICache invalidation requirements for instructions
to be data coherent are discoverable through new fields in CTR_EL0.
The following two control bits, DIC and IDC, were defined for this
purpose. There is no need to perform point of unification cache
maintenance operations from software on systems where the CPU caches
are transparent.

This patch optimizes the three functions __flush_cache_user_range(),
clean_dcache_area_pou() and invalidate_icache_range() if the hardware
reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips the two
instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic,
in order to avoid the unnecessary overhead.

CTR_EL0.DIC: Instruction cache invalidation requirements for
 instruction to data coherence. The meaning of this bit[29].
  0: Instruction cache invalidation to the point of unification
 is required for instruction to data coherence.
  1: Instruction cache invalidation to the point of unification is
  not required for instruction to data coherence.

CTR_EL0.IDC: Data cache clean requirements for instruction to data
 coherence. The meaning of this bit[28].
  0: Data cache clean to the point of unification is required for
 instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
 or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
  1: Data cache clean to the point of unification is not required
 for instruction to data coherence.

Signed-off-by: Philip Elcan 
Signed-off-by: Shanker Donthineni 
---
Changes since v1:
  -Reworded commit text.
  -Used the alternatives framework as Catalin suggested.
  -Rebased on top of https://patchwork.kernel.org/patch/10227927/

 arch/arm64/Kconfig   | 21 +++
 arch/arm64/include/asm/cache.h   |  5 +
 arch/arm64/include/asm/cpucaps.h |  4 +++-
 arch/arm64/kernel/cpufeature.c   | 44 ++--
 arch/arm64/mm/cache.S| 15 ++
 5 files changed, 82 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f55fe5b..4061210 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1095,6 +1095,27 @@ config ARM64_RAS_EXTN
  and access the new registers if the system supports the extension.
  Platform RAS features may additionally depend on firmware support.
 
+config ARM64_CACHE_IDC
+   bool "Enable support for DCache clean PoU optimization"
+   default y
+   help
+ The data cache clean to the point of unification is not required
+ for instructions to be data coherent if CTR_EL0.IDC has the value 1.
+
+ Selecting this feature will allow the kernel to optimize the POU
+ cache maintenance operations where it requires 'DC CVAU'.
+
+config ARM64_CACHE_DIC
+   bool "Enable support for ICache invalidation PoU optimization"
+   default y
+   help
+ Instruction cache invalidation to the point of unification is not
+ required for instructions to be data coherent if CTR_EL0.DIC has
+ the value 1.
+
+ Selecting this feature will allow the kernel to optimize the POU
+ cache maintenance operations where it requires 'IC IVAU'.
+
 endmenu
 
 config ARM64_SVE
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index ea9bb4e..e22178b 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -20,8 +20,13 @@
 
 #define CTR_L1IP_SHIFT 14
 #define CTR_L1IP_MASK  3
+#define CTR_DMLINE_SHIFT   16
+#define CTR_ERG_SHIFT  20
 #define CTR_CWG_SHIFT  24
 #define CTR_CWG_MASK   15
+#define CTR_IDC_SHIFT  28
+#define CTR_DIC_SHIFT  29
+#define CTR_B31_SHIFT  31
 
 #define CTR_L1IP(ctr)  (((ctr) >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK)
 
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index bb26382..8dd42ae 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -45,7 +45,9 @@
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
 #define ARM64_HARDEN_BP_POST_GUEST_EXIT25
 #define ARM64_HAS_RAS_EXTN 26
+#define ARM64_HAS_CACHE_IDC            27
+#define ARM64_HAS_CACHE_DIC            28
 
-#define ARM64_NCAPS                    27
+#define ARM64_NCAPS                    29
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index ff8a6e9..53a7266 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -199,12 +199,12 @@ static int __init register_cpu_hwcaps_dumper(void)
 };
 
 static const struct arm64_ftr_bits ftr_ctr[] = {
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1),  /* RES1 */
-   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 29, 1, 1), /* DIC */
-

Re: [PATCH] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-19 Thread Shanker Donthineni
Thanks Catalin for your comments.

On 02/19/2018 11:18 AM, Catalin Marinas wrote:
> On Mon, Feb 19, 2018 at 10:35:30AM -0600, Shanker Donthineni wrote:
>> On 02/19/2018 08:38 AM, Catalin Marinas wrote:
>>> On the patch, I'd rather have an alternative framework entry for no VAU
>>> cache maint required and some ret instruction at the beginning of the
>>> cache maint function rather than jumping out of the loop somewhere
>>> inside the cache maintenance code, penalising the CPUs that do require
>>> it.
>>
>> Alternative framework might break things in case of CPU hotplug. I need one
>> more confirmation from you on incorporating alternative framework. 
> 
> CPU hotplug can be an issue but it should be handled like other similar
> cases: if a CPU comes online late and its features are incompatible, it
> should not be brought online. The cpufeature code handles this.
> 
> With Will's patch for CTR_EL0, we handle different CPU features during
> boot, defaulting to the lowest value for the IDC/DIC bits.
> 
> I suggest you add new ARM64_HAS_* feature bits and enable them based on
> CTR_EL0.IDC and DIC. You could check for both being 1 with a single
> feature bit but I guess an implementation is allowed to have these
> different (e.g. DIC == 0 and IDC == 1).
> 

I'll add two new features ARM64_HAS_DIC and ARM64_HAS_IDC to support
all implementations. Unfortunately, QCOM server chips support IDC but not DIC.
 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-19 Thread Shanker Donthineni
Hi Will,

On 02/19/2018 08:43 AM, Will Deacon wrote:
> Hi Shanker,
> 
> On Fri, Feb 16, 2018 at 06:57:46PM -0600, Shanker Donthineni wrote:
>> Two point of unification cache maintenance operations 'DC CVAU' and
>> 'IC IVAU' are optional for implementors as per ARMv8 specification.
>> This patch parses the updated CTR_EL0 register definition and adds
>> the required changes to skip POU operations if the hardware reports
>> CTR_EL0.IDC and/or CTR_EL0.IDC.
>>
>> CTR_EL0.DIC: Instruction cache invalidation requirements for
>>  instruction to data coherence. The meaning of this bit[29].
>>   0: Instruction cache invalidation to the point of unification
>>  is required for instruction to data coherence.
>>   1: Instruction cache cleaning to the point of unification is
>>   not required for instruction to data coherence.
>>
>> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>>  coherence. The meaning of this bit[28].
>>   0: Data cache clean to the point of unification is required for
>>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>>   1: Data cache clean to the point of unification is not required
>>  for instruction to data coherence.
>>
>> Signed-off-by: Philip Elcan 
>> Signed-off-by: Shanker Donthineni 
>> ---
>>  arch/arm64/include/asm/assembler.h | 48 
>> --
>>  arch/arm64/include/asm/cache.h |  2 ++
>>  arch/arm64/kernel/cpufeature.c |  2 ++
>>  arch/arm64/mm/cache.S  | 26 ++---
>>  4 files changed, 51 insertions(+), 27 deletions(-)
> 
> I was looking at our CTR_EL0 code last week but forgot to post the patch I
> wrote fixing up some of the fields. I just send it now, so please can
> you rebase on top of:
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/560488.html
> 
> Also:
> 
>> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>> index ea9bb4e..aea533b 100644
>> --- a/arch/arm64/include/asm/cache.h
>> +++ b/arch/arm64/include/asm/cache.h
>> @@ -22,6 +22,8 @@
>>  #define CTR_L1IP_MASK   3
>>  #define CTR_CWG_SHIFT   24
>>  #define CTR_CWG_MASK15
>> +#define CTR_IDC_SHIFT   28
>> +#define CTR_DIC_SHIFT   29
>>  
>>  #define CTR_L1IP(ctr)   (((ctr) >> CTR_L1IP_SHIFT) & 
>> CTR_L1IP_MASK)
>>  
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 29b1f87..f42bb5a 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -200,6 +200,8 @@ static int __init register_cpu_hwcaps_dumper(void)
>>  
>>  static const struct arm64_ftr_bits ftr_ctr[] = {
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1),   /* RAO 
>> */
>> +ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_DIC_SHIFT, 
>> 1, 0),   /* DIC */
>> +ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_IDC_SHIFT, 
>> 1, 0),   /* IDC */
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, 24, 4, 0), 
>> /* CWG */
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 20, 4, 0),  
>> /* ERG */
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),  
>> /* DminLine */
> 
> Could you update the other table entries here to use the CTR_*_SHIFT values
> as well?
> 

I'll do.

> Thanks,
> 
> Will
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-19 Thread Shanker Donthineni
Hi Catalin,

On 02/19/2018 08:38 AM, Catalin Marinas wrote:
> On Fri, Feb 16, 2018 at 06:57:46PM -0600, Shanker Donthineni wrote:
>> Two point of unification cache maintenance operations 'DC CVAU' and
>> 'IC IVAU' are optional for implementors as per ARMv8 specification.
>> This patch parses the updated CTR_EL0 register definition and adds
>> the required changes to skip POU operations if the hardware reports
>> CTR_EL0.IDC and/or CTR_EL0.IDC.
>>
>> CTR_EL0.DIC: Instruction cache invalidation requirements for
>>  instruction to data coherence. The meaning of this bit[29].
>>   0: Instruction cache invalidation to the point of unification
>>  is required for instruction to data coherence.
>>   1: Instruction cache cleaning to the point of unification is
>>   not required for instruction to data coherence.
>>
>> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>>  coherence. The meaning of this bit[28].
>>   0: Data cache clean to the point of unification is required for
>>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>>   1: Data cache clean to the point of unification is not required
>>  for instruction to data coherence.
> 
> There is a difference between cache maintenance to PoU "is not required"
> and the actual instructions being optional (i.e. undef when executed).
> If your caches are transparent and DC CVAU/IC IVAU is not required,
> these instructions should behave as NOPs. So, are you trying to improve
> the performance of the cache maintenance routines in the kernel? If yes,
> please show some (relative) numbers and a better description in the
> commit log.
> 

Yes, I agree with you: PoU instructions are NOPs if the caches are
transparent. There was no issue from a correctness point of view, but it
causes unnecessary overhead in the ASM routines, where the code walks
through the VA range in cache-line-size increments. This overhead is
noticeable with 64K PAGE_SIZE, especially with section mappings. I'll
reword the commit text to reflect your comments in the v2 patch.

e.g. for a 512M section with a 64K PAGE_SIZE kernel, assuming a 64-byte
cache line, flush_icache_range() consumes around 256M CPU cycles:

Icache loop overhead: 512MBytes / 64Bytes * 4 instructions per loop
Dcache loop overhead: 512MBytes / 64Bytes * 4 instructions per loop

With this patch it takes less than ~1K cycles.
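
The loop-count arithmetic behind those numbers, as a quick sanity check
(assuming the ~4 instructions per iteration quoted above; the ~256M
cycle figure additionally assumes a few cycles per iteration):

        #include <stdio.h>

        int main(void)
        {
                unsigned long region = 512UL << 20;     /* 512M section */
                unsigned long line = 64;                /* cache line bytes */
                unsigned long iters = region / line;    /* 8388608 per loop */

                /* one D-cache loop plus one I-cache loop */
                printf("%lu iterations/loop, ~%lu instructions total\n",
                       iters, 2 * iters * 4);
                return 0;
        }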

 
> On the patch, I'd rather have an alternative framework entry for no VAU
> cache maint required and some ret instruction at the beginning of the
> cache maint function rather than jumping out of the loop somewhere
> inside the cache maintenance code, penalising the CPUs that do require
> it.
> 

The alternatives framework might break things in the case of CPU hotplug. I
need one more confirmation from you before incorporating the alternatives
framework.

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] KVM: arm/arm64: No need to zero CNTVOFF in kvm_timer_vcpu_put() for VHE

2018-02-19 Thread Shanker Donthineni
In AArch64/AArch32, the virtual counter uses a fixed virtual offset
of zero in the following situations as per ARMv8 specifications:

1) HCR_EL2.E2H is 1, and CNTVCT_EL0/CNTVCT are read from EL2.
2) HCR_EL2.{E2H, TGE} is {1, 1}, and either:
   — CNTVCT_EL0 is read from Non-secure EL0 or EL2.
   — CNTVCT is read from Non-secure EL0.

So there is no need to zero CNTVOFF_EL2/CNTVOFF for the VHE case.

Signed-off-by: Shanker Donthineni 
---
 virt/kvm/arm/arch_timer.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 70268c0..86eca324 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -541,9 +541,11 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 * The kernel may decide to run userspace after calling vcpu_put, so
 * we reset cntvoff to 0 to ensure a consistent read between user
 * accesses to the virtual counter and kernel access to the physical
-* counter.
+* counter for the non-VHE case. For VHE, the virtual counter uses a fixed
+* virtual offset of zero, so there is no need to zero the CNTVOFF_EL2 register.
 */
-   set_cntvoff(0);
+   if (!has_vhe())
+   set_cntvoff(0);
 }
 
 /*
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-16 Thread Shanker Donthineni
Two point of unification cache maintenance operations 'DC CVAU' and
'IC IVAU' are optional for implementors as per ARMv8 specification.
This patch parses the updated CTR_EL0 register definition and adds
the required changes to skip POU operations if the hardware reports
CTR_EL0.IDC and/or CTR_EL0.DIC.

CTR_EL0.DIC: Instruction cache invalidation requirements for
 instruction to data coherence. The meaning of this bit[29].
  0: Instruction cache invalidation to the point of unification
 is required for instruction to data coherence.
  1: Instruction cache invalidation to the point of unification is
  not required for instruction to data coherence.

CTR_EL0.IDC: Data cache clean requirements for instruction to data
 coherence. The meaning of this bit[28].
  0: Data cache clean to the point of unification is required for
 instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
 or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
  1: Data cache clean to the point of unification is not required
 for instruction to data coherence.

Signed-off-by: Philip Elcan 
Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/assembler.h | 48 --
 arch/arm64/include/asm/cache.h |  2 ++
 arch/arm64/kernel/cpufeature.c |  2 ++
 arch/arm64/mm/cache.S  | 26 ++---
 4 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index 3c78835..9eaa948 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
.macro save_and_disable_daif, flags
mrs \flags, daif
@@ -334,9 +335,9 @@
  * raw_dcache_line_size - get the minimum D-cache line size on this CPU
  * from the CTR register.
  */
-   .macro  raw_dcache_line_size, reg, tmp
-   mrs \tmp, ctr_el0   // read CTR
-   ubfm\tmp, \tmp, #16, #19// cache line size encoding
+   .macro  raw_dcache_line_size, reg, tmp, ctr
+   mrs \ctr, ctr_el0   // read CTR
+   ubfm\tmp, \ctr, #16, #19// cache line size encoding
mov \reg, #4// bytes per word
lsl \reg, \reg, \tmp// actual cache line size
.endm
@@ -344,9 +345,9 @@
 /*
  * dcache_line_size - get the safe D-cache line size across all CPUs
  */
-   .macro  dcache_line_size, reg, tmp
-   read_ctr\tmp
-   ubfm\tmp, \tmp, #16, #19// cache line size encoding
+   .macro  dcache_line_size, reg, tmp, ctr
+   read_ctr\ctr
+   ubfm\tmp, \ctr, #16, #19// cache line size encoding
mov \reg, #4// bytes per word
lsl \reg, \reg, \tmp// actual cache line size
.endm
@@ -355,9 +356,9 @@
  * raw_icache_line_size - get the minimum I-cache line size on this CPU
  * from the CTR register.
  */
-   .macro  raw_icache_line_size, reg, tmp
-   mrs \tmp, ctr_el0   // read CTR
-   and \tmp, \tmp, #0xf// cache line size encoding
+   .macro  raw_icache_line_size, reg, tmp, ctr
+   mrs \ctr, ctr_el0   // read CTR
+   and \tmp, \ctr, #0xf// cache line size encoding
mov \reg, #4// bytes per word
lsl \reg, \reg, \tmp// actual cache line size
.endm
@@ -365,9 +366,9 @@
 /*
  * icache_line_size - get the safe I-cache line size across all CPUs
  */
-   .macro  icache_line_size, reg, tmp
-   read_ctr\tmp
-   and \tmp, \tmp, #0xf// cache line size encoding
+   .macro  icache_line_size, reg, tmp, ctr
+   read_ctr\ctr
+   and \tmp, \ctr, #0xf// cache line size encoding
mov \reg, #4// bytes per word
lsl \reg, \reg, \tmp// actual cache line size
.endm
@@ -408,13 +409,21 @@
  * size:   size of the region
  * Corrupts:   kaddr, size, tmp1, tmp2
  */
-   .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
-   dcache_line_size \tmp1, \tmp2
+   .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2, tmp3
+   dcache_line_size \tmp1, \tmp2, \tmp3
add \size, \kaddr, \size
sub \tmp2, \tmp1, #1
bic \kaddr, \kaddr, \tmp2
 9998:
-   .if (\op == cvau || \op == cvac)
+   .if (\op == cvau)
+alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
+   tbnz\tmp3, #CTR_IDC_SHIFT, 9997f
+   dc  cvau, \kaddr
+alternative_else
+   dc  civac, \kaddr
+   nop
+alternative_endif
+   .elseif (\op == cvac)
 alternative_if_not ARM64_WORKA

[PATCH] arm64: Add missing Falkor part number for branch predictor hardening

2018-02-11 Thread Shanker Donthineni
References to the CPU part number MIDR_QCOM_FALKOR were dropped from the
mailing list patch due to a mainline/arm64 branch dependency, so this
patch adds the missing part number.

Fixes: ec82b567a74f ("arm64: Implement branch predictor hardening for Falkor")
Signed-off-by: Shanker Donthineni 
---
 arch/arm64/kernel/cpu_errata.c | 9 +
 arch/arm64/kvm/hyp/switch.c| 4 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 0782359..52f15cd 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -408,6 +408,15 @@ static int qcom_enable_link_stack_sanitization(void *data)
},
{
.capability = ARM64_HARDEN_BRANCH_PREDICTOR,
+   MIDR_ALL_VERSIONS(MIDR_QCOM_FALKOR),
+   .enable = qcom_enable_link_stack_sanitization,
+   },
+   {
+   .capability = ARM64_HARDEN_BP_POST_GUEST_EXIT,
+   MIDR_ALL_VERSIONS(MIDR_QCOM_FALKOR),
+   },
+   {
+   .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN),
.enable = enable_smccc_arch_workaround_1,
},
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 116252a8..870f4b1 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -407,8 +407,10 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
u32 midr = read_cpuid_id();
 
/* Apply BTAC predictors mitigation to all Falkor chips */
-   if ((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR_V1)
+   if (((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR) ||
+   ((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR_V1)) {
__qcom_hyp_sanitize_btac_predictors();
+   }
}
 
fp_enabled = __fpsimd_enabled();
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] irqchip/gic-v3: Use wmb() instead of smb_wmb() in gic_raise_softirq()

2018-02-01 Thread Shanker Donthineni
Hi Will, Thanks for your quick reply.

On 02/01/2018 04:33 AM, Will Deacon wrote:
> Hi Shanker,
> 
> On Wed, Jan 31, 2018 at 06:03:42PM -0600, Shanker Donthineni wrote:
>> A DMB instruction can be used to ensure the relative order of only
>> memory accesses before and after the barrier. Since writes to system
>> registers are not memory operations, barrier DMB is not sufficient
>> for observability of memory accesses that occur before ICC_SGI1R_EL1
>> writes.
>>
>> A DSB instruction ensures that no instructions that appear in program
>> order after the DSB instruction, can execute until the DSB instruction
>> has completed.
>>
>> Signed-off-by: Shanker Donthineni 
>> ---
>>  drivers/irqchip/irq-gic-v3.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
>> index b56c3e2..980ae8e 100644
>> --- a/drivers/irqchip/irq-gic-v3.c
>> +++ b/drivers/irqchip/irq-gic-v3.c
>> @@ -688,7 +688,7 @@ static void gic_raise_softirq(const struct cpumask 
>> *mask, unsigned int irq)
>>   * Ensure that stores to Normal memory are visible to the
>>   * other CPUs before issuing the IPI.
>>   */
>> -smp_wmb();
>> +wmb();
> 
> I think this is the right thing to do and the smp_wmb() was accidentally
> pulled in here as a copy-paste from the GICv2 driver where it is sufficient
> in practice.
> 
> Did you spot this by code inspection, or did the DMB actually cause
> observable failures? (trying to figure out whether or not this need to go
> to -stable).
> 

We inspected the code because the kernel was causing failures in
scheduler/IPI_RESCHEDULE. After some time of debugging, we landed in the
GIC driver and found that the issue was due to the DMB barrier.

Side note: we're also missing synchronization barriers in the GIC driver
after writing some of the ICC_* system registers. I'm planning to post
those changes for comments.

e.g: gic_write_sgi1r(val) and gic_write_eoir(irqnr);
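
For example (a sketch of the kind of change meant here, not a posted
patch), the EOI accessor would gain an ISB so that the ICC_EOIR1_EL1
write takes effect before any subsequent instructions:

        static inline void gic_write_eoir(u32 irq)
        {
                write_sysreg_s(irq, SYS_ICC_EOIR1_EL1);
                isb();
        }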
 

> Anyway:
> 
> Acked-by: Will Deacon 
> 
> Cheers,
> 
> Will
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] irqchip/gic-v3: Use wmb() instead of smb_wmb() in gic_raise_softirq()

2018-01-31 Thread Shanker Donthineni
A DMB instruction can be used to ensure the relative order of only
memory accesses before and after the barrier. Since writes to system
registers are not memory operations, a DMB barrier is not sufficient
to make memory accesses that occur before the ICC_SGI1R_EL1 write
observable.

A DSB instruction ensures that no instruction that appears in program
order after the DSB can execute until the DSB itself has completed.
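
For reference, the arm64 barrier macros (per
arch/arm64/include/asm/barrier.h at the time of writing, modulo the
__smp_* indirection) make the distinction explicit:

        #define wmb()           dsb(st)
        #define smp_wmb()       dmb(ishst)

so wmb() provides the DSB needed before the ICC_SGI1R_EL1 system
register write, while smp_wmb() only orders memory accesses against
other memory accesses.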

Signed-off-by: Shanker Donthineni 
---
 drivers/irqchip/irq-gic-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index b56c3e2..980ae8e 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -688,7 +688,7 @@ static void gic_raise_softirq(const struct cpumask *mask, 
unsigned int irq)
 * Ensure that stores to Normal memory are visible to the
 * other CPUs before issuing the IPI.
 */
-   smp_wmb();
+   wmb();
 
for_each_cpu(cpu, mask) {
u64 cluster_id = MPIDR_TO_SGI_CLUSTER_ID(cpu_logical_map(cpu));
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: Implement branch predictor hardening for Falkor

2018-01-08 Thread Shanker Donthineni
Hi Will/Catalin,

Please drop
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=kpti&id=79ad24ef6c260efa0614896b15e67f4829448e32
in which you've removed the FALKOR MIDR change. I've posted a v2 patch
series including the typo fix & the FALKOR MIDR patch, which is already
available in the upstream v4.15-rc7 branch. Please merge the v2 patch.

On 01/08/2018 01:10 PM, Shanker Donthineni wrote:
> Hi Will,
> 
> On 01/08/2018 12:44 PM, Will Deacon wrote:
>> On Mon, Jan 08, 2018 at 05:09:33PM +, Will Deacon wrote:
>>> On Fri, Jan 05, 2018 at 02:28:59PM -0600, Shanker Donthineni wrote:
>>>> Falkor is susceptible to branch predictor aliasing and can
>>>> theoretically be attacked by malicious code. This patch
>>>> implements a mitigation for these attacks, preventing any
>>>> malicious entries from affecting other victim contexts.
>>>
>>> Thanks, Shanker. I'll pick this up (fixing the typo pointed out by Drew).
>>
>> Note that MIDR_FALKOR doesn't exist in mainline, so I had to drop those
>> changes too. See the kpti branch for details.
>>
> 
> The FALKOR MIDR patch is already available in the upstream kernel v4.15-rc7
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64?h=v4.15-rc7&id=c622cc013cece073722592cff1ac6643a33b1622
> 
> If you want I can resend the above patch in v2 series including typo fix.
> 
>> If you'd like anything else done here, please send additional patches to me
>> and Catalin that we can apply on top of what we currently have. Note that
>> I'm in the air tomorrow, so won't be picking up email.
>>
>> Cheers,
>>
>> Will
>>
>> ___
>> linux-arm-kernel mailing list
>> linux-arm-ker...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 1/2] arm64: Define cputype macros for Falkor CPU

2018-01-08 Thread Shanker Donthineni
Add cputype definition macros for Qualcomm Datacenter Technologies
Falkor CPU in cputype.h. It's unfortunate that the first revision
of the Falkor CPU used the wrong part number 0x800; this was fixed in
the v2 chip with part number 0xC00, and the same value will be used
for future revisions.

Signed-off-by: Shanker Donthineni 
Signed-off-by: Will Deacon 
---
This patch is available at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64?h=v4.15-rc7&id=c622cc013cece073722592cff1ac6643a33b1622

 arch/arm64/include/asm/cputype.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 84385b9..424ca71d 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -93,6 +93,7 @@
 #define BRCM_CPU_PART_VULCAN   0x516
 
 #define QCOM_CPU_PART_FALKOR_V10x800
+#define QCOM_CPU_PART_FALKOR   0xC00
 
 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
@@ -103,6 +104,7 @@
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
 #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
+#define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR)
 
 #ifndef __ASSEMBLY__
 
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 2/2] arm64: Implement branch predictor hardening for Falkor

2018-01-08 Thread Shanker Donthineni
Falkor is susceptible to branch predictor aliasing and can
theoretically be attacked by malicious code. This patch
implements a mitigation for these attacks, preventing any
malicious entries from affecting other victim contexts.

Signed-off-by: Shanker Donthineni 
---
Changes since v1:
  Corrected typo to fix the compilation errors if HARDEN_BRANCH_PREDICTOR=n

This patch requires the FALKOR MIDR, which is available in upstream
v4.15-rc7:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64?h=v4.15-rc7&id=c622cc013cece073722592cff1ac6643a33b1622
and is also attached to this v2 patch series.

 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/kvm_asm.h |  2 ++
 arch/arm64/kernel/bpi.S  |  8 +++
 arch/arm64/kernel/cpu_errata.c   | 49 ++--
 arch/arm64/kvm/hyp/entry.S   | 12 ++
 arch/arm64/kvm/hyp/switch.c  | 10 
 6 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 51616e7..7049b48 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -43,7 +43,8 @@
 #define ARM64_SVE  22
 #define ARM64_UNMAP_KERNEL_AT_EL0  23
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
+#define ARM64_HARDEN_BP_POST_GUEST_EXIT        25
 
-#define ARM64_NCAPS                            25
+#define ARM64_NCAPS                            26
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ab4d0a9..24961b7 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -68,6 +68,8 @@
 
 extern u32 __init_stage2_translation(void);
 
+extern void __qcom_hyp_sanitize_btac_predictors(void);
+
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
index 2b10d52..44ffcda 100644
--- a/arch/arm64/kernel/bpi.S
+++ b/arch/arm64/kernel/bpi.S
@@ -77,3 +77,11 @@ ENTRY(__psci_hyp_bp_inval_start)
ldp x2, x3, [sp], #16
ldp x0, x1, [sp], #16
 ENTRY(__psci_hyp_bp_inval_end)
+
+ENTRY(__qcom_hyp_sanitize_link_stack_start)
+   stp x29, x30, [sp, #-16]!
+   .rept   16
+   bl  . + 4
+   .endr
+   ldp x29, x30, [sp], #16
+ENTRY(__qcom_hyp_sanitize_link_stack_end)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index cb0fb37..9ee9d2e 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -54,6 +54,8 @@ static int cpu_enable_trap_ctr_access(void *__unused)
 
 #ifdef CONFIG_KVM
 extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[];
+extern char __qcom_hyp_sanitize_link_stack_start[];
+extern char __qcom_hyp_sanitize_link_stack_end[];
 
 static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start,
const char *hyp_vecs_end)
@@ -96,8 +98,10 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
spin_unlock(&bp_lock);
 }
 #else
-#define __psci_hyp_bp_inval_start  NULL
-#define __psci_hyp_bp_inval_endNULL
+#define __psci_hyp_bp_inval_start  NULL
+#define __psci_hyp_bp_inval_endNULL
+#define __qcom_hyp_sanitize_link_stack_start   NULL
+#define __qcom_hyp_sanitize_link_stack_end NULL
 
 static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
  const char *hyp_vecs_start,
@@ -138,6 +142,29 @@ static int enable_psci_bp_hardening(void *data)
 
return 0;
 }
+
+static void qcom_link_stack_sanitization(void)
+{
+   u64 tmp;
+
+   asm volatile("mov   %0, x30 \n"
+".rept 16  \n"
+"bl. + 4   \n"
+".endr \n"
+"mov   x30, %0 \n"
+: "=&r" (tmp));
+}
+
+static int qcom_enable_link_stack_sanitization(void *data)
+{
+   const struct arm64_cpu_capabilities *entry = data;
+
+   install_bp_hardening_cb(entry, qcom_link_stack_sanitization,
+   __qcom_hyp_sanitize_link_stack_start,
+   __qcom_hyp_sanitize_link_stack_end);
+
+   return 0;
+}
 #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
 
 #define MIDR_RANGE(model, min, max) \
@@ -302,6 +329,24 @@ static int enable_psci_bp_hardening(void *data)
MIDR_ALL_VERSIONS(MIDR_CORTEX_A75),
.enable = enable_psci_bp_hardening,
},
+   {
+   .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
+   MIDR_ALL_VERSIONS(MIDR_QCOM_FALKOR_V1),
+   .enable = qcom_enable_link_stack_sanitization,
+   },
+   {
+   .capability = ARM64_HAR

Re: [PATCH] arm64: Implement branch predictor hardening for Falkor

2018-01-08 Thread Shanker Donthineni
Hi Will,

On 01/08/2018 12:44 PM, Will Deacon wrote:
> On Mon, Jan 08, 2018 at 05:09:33PM +, Will Deacon wrote:
>> On Fri, Jan 05, 2018 at 02:28:59PM -0600, Shanker Donthineni wrote:
>>> Falkor is susceptible to branch predictor aliasing and can
>>> theoretically be attacked by malicious code. This patch
>>> implements a mitigation for these attacks, preventing any
>>> malicious entries from affecting other victim contexts.
>>
>> Thanks, Shanker. I'll pick this up (fixing the typo pointed out by Drew).
> 
> Note that MIDR_FALKOR doesn't exist in mainline, so I had to drop those
> changes too. See the kpti branch for details.
> 

The FALKOR MIDR patch is already available in the upstream kernel v4.15-rc7

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64?h=v4.15-rc7&id=c622cc013cece073722592cff1ac6643a33b1622

If you want, I can resend the above patch in a v2 series including the typo fix.

> If you'd like anything else done here, please send additional patches to me
> and Catalin that we can apply on top of what we currently have. Note that
> I'm in the air tomorrow, so won't be picking up email.
> 
> Cheers,
> 
> Will
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: Implement branch predictor hardening for Falkor

2018-01-08 Thread Shanker Donthineni
Hi Andrew,

On 01/08/2018 03:28 AM, Andrew Jones wrote:
> Hi Shanker,
> 
> On Fri, Jan 05, 2018 at 02:28:59PM -0600, Shanker Donthineni wrote:
> ...
>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
>> index cb0fb37..daf53a5 100644
>> --- a/arch/arm64/kernel/cpu_errata.c
>> +++ b/arch/arm64/kernel/cpu_errata.c
>> @@ -54,6 +54,8 @@ static int cpu_enable_trap_ctr_access(void *__unused)
>>  
>>  #ifdef CONFIG_KVM
>>  extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[];
>> +extern char __qcom_hyp_sanitize_link_stack_start[];
>> +extern char __qcom_hyp_sanitize_link_stack_end[];
>>  
>>  static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start,
>>  const char *hyp_vecs_end)
>> @@ -96,8 +98,10 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t 
>> fn,
>>  spin_unlock(&bp_lock);
>>  }
>>  #else
>> -#define __psci_hyp_bp_inval_start   NULL
>> -#define __psci_hyp_bp_inval_end NULL
>> +#define __psci_hyp_bp_inval_start   NULL
>> +#define __psci_hyp_bp_inval_end NULL
>> +#define __qcom_hyp_sanitize_link_stack_startNULL
>> +#define __qcom_hyp_sanitize_link_stack_startNULL
>   ^^ copy+paste error here

Thanks for catching the typo; I'll fix it in the v2 patch.

> 
> Thanks,
> drew
> 
> ___________
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] arm64: Implement branch predictor hardening for Falkor

2018-01-05 Thread Shanker Donthineni
Falkor is susceptible to branch predictor aliasing and can
theoretically be attacked by malicious code. This patch
implements a mitigation for these attacks, preventing any
malicious entries from affecting other victim contexts.

Signed-off-by: Shanker Donthineni 
---
 This patch has been verified using tip of
   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti
and
   
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64?h=v4.15-rc6&id=c622cc013cece073722592cff1ac6643a33b1622

 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/kvm_asm.h |  2 ++
 arch/arm64/kernel/bpi.S  |  8 +++
 arch/arm64/kernel/cpu_errata.c   | 49 ++--
 arch/arm64/kvm/hyp/entry.S   | 12 ++
 arch/arm64/kvm/hyp/switch.c  | 10 
 6 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 51616e7..7049b48 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -43,7 +43,8 @@
 #define ARM64_SVE  22
 #define ARM64_UNMAP_KERNEL_AT_EL0  23
 #define ARM64_HARDEN_BRANCH_PREDICTOR  24
+#define ARM64_HARDEN_BP_POST_GUEST_EXIT        25
 
-#define ARM64_NCAPS                            25
+#define ARM64_NCAPS                            26
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ab4d0a9..24961b7 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -68,6 +68,8 @@
 
 extern u32 __init_stage2_translation(void);
 
+extern void __qcom_hyp_sanitize_btac_predictors(void);
+
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
index 2b10d52..44ffcda 100644
--- a/arch/arm64/kernel/bpi.S
+++ b/arch/arm64/kernel/bpi.S
@@ -77,3 +77,11 @@ ENTRY(__psci_hyp_bp_inval_start)
ldp x2, x3, [sp], #16
ldp x0, x1, [sp], #16
 ENTRY(__psci_hyp_bp_inval_end)
+
+ENTRY(__qcom_hyp_sanitize_link_stack_start)
+   stp x29, x30, [sp, #-16]!
+   .rept   16
+   bl  . + 4
+   .endr
+   ldp x29, x30, [sp], #16
+ENTRY(__qcom_hyp_sanitize_link_stack_end)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index cb0fb37..daf53a5 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -54,6 +54,8 @@ static int cpu_enable_trap_ctr_access(void *__unused)
 
 #ifdef CONFIG_KVM
 extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[];
+extern char __qcom_hyp_sanitize_link_stack_start[];
+extern char __qcom_hyp_sanitize_link_stack_end[];
 
 static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start,
const char *hyp_vecs_end)
@@ -96,8 +98,10 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
spin_unlock(&bp_lock);
 }
 #else
-#define __psci_hyp_bp_inval_start  NULL
-#define __psci_hyp_bp_inval_endNULL
+#define __psci_hyp_bp_inval_start  NULL
+#define __psci_hyp_bp_inval_endNULL
+#define __qcom_hyp_sanitize_link_stack_start   NULL
+#define __qcom_hyp_sanitize_link_stack_start   NULL
 
 static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
  const char *hyp_vecs_start,
@@ -138,6 +142,29 @@ static int enable_psci_bp_hardening(void *data)
 
return 0;
 }
+
+static void qcom_link_stack_sanitization(void)
+{
+   u64 tmp;
+
+   asm volatile("mov   %0, x30 \n"
+".rept 16  \n"
+"bl. + 4   \n"
+".endr \n"
+"mov   x30, %0 \n"
+: "=&r" (tmp));
+}
+
+static int qcom_enable_link_stack_sanitization(void *data)
+{
+   const struct arm64_cpu_capabilities *entry = data;
+
+   install_bp_hardening_cb(entry, qcom_link_stack_sanitization,
+   __qcom_hyp_sanitize_link_stack_start,
+   __qcom_hyp_sanitize_link_stack_end);
+
+   return 0;
+}
 #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
 
 #define MIDR_RANGE(model, min, max) \
@@ -302,6 +329,24 @@ static int enable_psci_bp_hardening(void *data)
MIDR_ALL_VERSIONS(MIDR_CORTEX_A75),
.enable = enable_psci_bp_hardening,
},
+   {
+   .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
+   MIDR_ALL_VERSIONS(MIDR_QCOM_FALKOR_V1),
+   .enable = qcom_enable_link_stack_sanitization,
+   },
+   {
+   .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
+   MIDR_ALL_VERSIONS(MIDR_QCO

[PATCH v5 2/2] arm64: Add software workaround for Falkor erratum 1041

2017-12-11 Thread Shanker Donthineni
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter and the next 4KB region.

When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.

1) A System Error Interrupt (SEI) being raised by the Falkor core due
   to the errant memory access attempting to access a region of memory
   that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
   memory. This behavior may only occur if the instruction cache is
   disabled prior to or coincident with translation being changed from
   enabled to disabled.

The conditions leading to this erratum will not occur when either of the
following occur:
 1) A higher exception level disables translation of a lower exception level
   (e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
 2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).

To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
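
To illustrate the required sequence concretely, here is a minimal C sketch of
an E1041-safe MMU disable for EL1 (illustrative only; it assumes the caller
passes the current SCTLR_EL1 value, and the patch below implements the same
thing as the pre_disable_mmu_workaround assembler macro):

static inline void disable_mmu_el1(unsigned long sctlr)
{
	asm volatile("isb\n"				/* E1041: fence speculative fetches first */
		     "msr	sctlr_el1, %0\n"	/* SCTLR_EL1.M (bit 0): 1 -> 0 */
		     "isb\n"				/* complete the context change */
		     : : "r" (sctlr & ~1UL) : "memory");
}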

Signed-off-by: Shanker Donthineni 
---
Changes since v4:
  Rebased to kernel v4.15-rc3 and removed the alternatives.
Changes since v3:
  Rebased to kernel v4.15-rc1.
Changes since v2:
  Repost the corrected patches.
Changes since v1:
  Apply the workaround where it's required.

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 12 +++-
 arch/arm64/include/asm/assembler.h | 10 ++
 arch/arm64/kernel/cpu-reset.S  |  1 +
 arch/arm64/kernel/efi-entry.S  |  2 ++
 arch/arm64/kernel/head.S   |  1 +
 arch/arm64/kernel/relocate_kernel.S|  1 +
 arch/arm64/kvm/hyp-init.S  |  1 +
 8 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/silicon-errata.txt 
b/Documentation/arm64/silicon-errata.txt
index 304bf22..fc1c884 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -75,3 +75,4 @@ stable kernels.
 | Qualcomm Tech. | Falkor v1       | E1003   | QCOM_FALKOR_ERRATUM_1003    |
 | Qualcomm Tech. | Falkor v1       | E1009   | QCOM_FALKOR_ERRATUM_1009    |
 | Qualcomm Tech. | QDF2400 ITS     | E0065   | QCOM_QDF2400_ERRATUM_0065   |
+| Qualcomm Tech. | Falkor v{1,2}   | E1041   | QCOM_FALKOR_ERRATUM_1041    |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a93339f..c9a7e9e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -557,7 +557,6 @@ config QCOM_QDF2400_ERRATUM_0065
 
  If unsure, say Y.
 
-
 config SOCIONEXT_SYNQUACER_PREITS
bool "Socionext Synquacer: Workaround for GICv3 pre-ITS"
default y
@@ -576,6 +575,17 @@ config HISILICON_ERRATUM_161600802
   a 128kB offset to be applied to the target address in these commands.
 
  If unsure, say Y.
+
+config QCOM_FALKOR_ERRATUM_E1041
+   bool "Falkor E1041: Speculative instruction fetches might cause errant 
memory access"
+   default y
+   help
+ The Falkor CPU may speculatively fetch instructions from an improper
+ memory location when MMU translation is changed from SCTLR_ELn[M]=1
+ to SCTLR_ELn[M]=0. Prefixing the write with an ISB instruction avoids
+ the problem.
+
+ If unsure, say Y.
+
 endmenu
 
 
diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index aef72d8..8b16828 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -512,4 +512,14 @@
 #endif
.endm
 
+/**
+ * Erratum workaround prior to disabling the MMU. Insert an ISB immediately
+ * prior to executing the MSR that will change SCTLR_ELn[M] from a value of 1 to 0.
+ */
+   .macro pre_disable_mmu_workaround
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
+   isb
+#endif
+   .endm
+
 #endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
index 65f42d2..2a752cb 100644
--- a/arch/arm64/kernel/cpu-reset.S
+++ b/arch/arm64/kernel/cpu-reset.S
@@ -37,6 +37,7 @@ ENTRY(__cpu_soft_restart)
mrs x12, sctlr_el1
ldr x13, =SCTLR_ELx_FLAGS
bic x12, x12, x13
+   pre_disable_mmu_workaround
msr sctlr_el1, x12
isb
 
diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry

[PATCH v5 1/2] arm64: Define cputype macros for Falkor CPU

2017-12-11 Thread Shanker Donthineni
Add cputype definition macros for Qualcomm Datacenter Technologies
Falkor CPU in cputype.h. It's unfortunate that the first revision
of the Falkor CPU used the wrong part number 0x800; this was fixed
in the v2 chip with part number 0xC00, and the same value will be
used for future revisions.
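
For context, MIDR_CPU_MODEL() composes these part numbers with the
implementer code following the architectural MIDR_EL1 layout (implementer in
bits [31:24], part number in bits [15:4]). A rough standalone sketch, with
illustrative helper names rather than the kernel's exact macros:

#include <stdint.h>
#include <stdbool.h>

#define MIDR_IMPLEMENTOR_SHIFT	24
#define MIDR_PARTNUM_SHIFT	4
#define MIDR_MODEL_MASK		((0xffu << MIDR_IMPLEMENTOR_SHIFT) | \
				 (0xfffu << MIDR_PARTNUM_SHIFT))

static inline uint32_t midr_cpu_model(uint32_t implementer, uint32_t partnum)
{
	return (implementer << MIDR_IMPLEMENTOR_SHIFT) |
	       (partnum << MIDR_PARTNUM_SHIFT);
}

/* Matching masks out the variant and revision fields, so one model value
 * covers every revision of a part - the effect MIDR_ALL_VERSIONS() has. */
static inline bool midr_is_model(uint32_t midr, uint32_t model)
{
	return (midr & MIDR_MODEL_MASK) == model;
}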

Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/cputype.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 235e77d..cbf08d7 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -91,6 +91,7 @@
 #define BRCM_CPU_PART_VULCAN   0x516
 
 #define QCOM_CPU_PART_FALKOR_V10x800
+#define QCOM_CPU_PART_FALKOR   0xC00
 
 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
@@ -99,6 +100,7 @@
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
 #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
+#define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR)
 
 #ifndef __ASSEMBLY__
 
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RESEND PATCH v4 2/2] arm64: Add software workaround for Falkor erratum 1041

2017-12-11 Thread Shanker Donthineni
Thanks Mark, I'll post a v5 patch without the alternatives.


On 12/11/2017 04:45 AM, Mark Rutland wrote:
> Hi,
> 
> On Sun, Dec 10, 2017 at 08:03:43PM -0600, Shanker Donthineni wrote:
>> +/**
>> + * Erratum workaround prior to disabling the MMU. Insert an ISB immediately
>> + * prior to executing the MSR that will change SCTLR_ELn[M] from a value of 1 to 0.
>> + */
>> +.macro pre_disable_mmu_workaround
>> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
>> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
>> +isb
>> +alternative_else_nop_endif
>> +#endif
>> +.endm
> 
> There's really no need for this to be an alternative. It makes the
> kernel larger and more complex due to all the altinstr data and probing
> code.
> 
> As Will suggested last time [1], please just use the ifdef, and always
> compile-in the extra ISB if CONFIG_QCOM_FALKOR_ERRATUM_E1041 is
> selected. Get rid of the alternatives and probing code.
> 
> All you need here is:
> 
>   /*
>* Some Falkor parts make errant speculative instruction fetches
>* when SCTLR_ELx.M is cleared. An ISB before the write to
>* SCTLR_ELx prevents this.
>*/
>   .macro pre_disable_mmu_workaround
> #ifdef
>   isb
> #endif
>   .endm
> 
>> +
>> +.macro pre_disable_mmu_early_workaround
>> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
>> +isb
>> +#endif
>> +.endm
>> +
> 
> ... and we don't need a special early variant.
> 
> Thanks,
> Mark.
> 
> [1] https://lkml.kernel.org/r/20171201112457.ge18...@arm.com
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RESEND PATCH v4 2/2] arm64: Add software workaround for Falkor erratum 1041

2017-12-10 Thread Shanker Donthineni
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter and the next 4KB region.

When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.

1) A System Error Interrupt (SEI) being raised by the Falkor core due
   to the errant memory access attempting to access a region of memory
   that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
   memory. This behavior may only occur if the instruction cache is
   disabled prior to or coincident with translation being changed from
   enabled to disabled.

The conditions leading to this erratum will not occur when either of the
following occur:
 1) A higher exception level disables translation of a lower exception level
   (e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
 2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).

To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.

Signed-off-by: Shanker Donthineni 
---
Changes since v3:
  Rebased to kernel v4.15-rc1.
Changes since v2:
  Repost the corrected patches.
Changes since v1:
  Apply the workaround where it's required.

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 12 +++-
 arch/arm64/include/asm/assembler.h | 19 +++
 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/kernel/cpu-reset.S  |  1 +
 arch/arm64/kernel/cpu_errata.c | 16 
 arch/arm64/kernel/efi-entry.S  |  2 ++
 arch/arm64/kernel/head.S   |  1 +
 arch/arm64/kernel/relocate_kernel.S|  1 +
 arch/arm64/kvm/hyp-init.S  |  1 +
 10 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt 
b/Documentation/arm64/silicon-errata.txt
index 304bf22..fc1c884 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -75,3 +75,4 @@ stable kernels.
 | Qualcomm Tech. | Falkor v1       | E1003   | QCOM_FALKOR_ERRATUM_1003    |
 | Qualcomm Tech. | Falkor v1       | E1009   | QCOM_FALKOR_ERRATUM_1009    |
 | Qualcomm Tech. | QDF2400 ITS     | E0065   | QCOM_QDF2400_ERRATUM_0065   |
+| Qualcomm Tech. | Falkor v{1,2}   | E1041   | QCOM_FALKOR_ERRATUM_1041    |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a93339f..c9a7e9e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -557,7 +557,6 @@ config QCOM_QDF2400_ERRATUM_0065
 
  If unsure, say Y.
 
-
 config SOCIONEXT_SYNQUACER_PREITS
bool "Socionext Synquacer: Workaround for GICv3 pre-ITS"
default y
@@ -576,6 +575,17 @@ config HISILICON_ERRATUM_161600802
   a 128kB offset to be applied to the target address in these commands.
 
  If unsure, say Y.
+
+config QCOM_FALKOR_ERRATUM_E1041
+   bool "Falkor E1041: Speculative instruction fetches might cause errant 
memory access"
+   default y
+   help
+ The Falkor CPU may speculatively fetch instructions from an improper
+ memory location when MMU translation is changed from SCTLR_ELn[M]=1
+ to SCTLR_ELn[M]=0. Prefixing the write with an ISB instruction avoids
+ the problem.
+
+ If unsure, say Y.
+
 endmenu
 
 
diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index aef72d8..c77742a 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
.macro save_and_disable_daif, flags
mrs \flags, daif
@@ -512,4 +513,22 @@
 #endif
.endm
 
+/**
+ * Erratum workaround prior to disabling the MMU. Insert an ISB immediately
+ * prior to executing the MSR that will change SCTLR_ELn[M] from a value of 1 to 0.
+ */
+   .macro pre_disable_mmu_workaround
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
+alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
+   isb
+alternative_else_nop_endif
+#endif
+   .endm
+
+   .macro pre_disable_mmu_early_workaround
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
+   isb
+#endif
+   .endm
+
 #endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
inde

[RESEND PATCH v4 1/2] arm64: Define cputype macros for Falkor CPU

2017-12-10 Thread Shanker Donthineni
Add cputype definition macros for Qualcomm Datacenter Technologies
Falkor CPU in cputype.h. It's unfortunate that the first revision
of the Falkor CPU used the wrong part number 0x800; this was fixed
in the v2 chip with part number 0xC00, and the same value will be
used for future revisions.

Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/cputype.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 235e77d..cbf08d7 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -91,6 +91,7 @@
 #define BRCM_CPU_PART_VULCAN   0x516
 
 #define QCOM_CPU_PART_FALKOR_V10x800
+#define QCOM_CPU_PART_FALKOR   0xC00
 
 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
@@ -99,6 +100,7 @@
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
 #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
+#define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR)
 
 #ifndef __ASSEMBLY__
 
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 2/2] arm64: Add software workaround for Falkor erratum 1041

2017-12-04 Thread Shanker Donthineni
Hi Will,

On 12/03/2017 07:35 AM, Shanker Donthineni wrote:
> Hi Will, thanks for your review comments.
> 
> On 12/01/2017 05:24 AM, Will Deacon wrote:
>> On Mon, Nov 27, 2017 at 05:18:00PM -0600, Shanker Donthineni wrote:
>>> The ARM architecture defines the memory locations that are permitted
>>> to be accessed as the result of a speculative instruction fetch from
>>> an exception level for which all stages of translation are disabled.
>>> Specifically, the core is permitted to speculatively fetch from the
>>> 4KB region containing the current program counter 4K and next 4K.
>>>
>>> When translation is changed from enabled to disabled for the running
>>> exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
>>> Falkor core may errantly speculatively access memory locations outside
>>> of the 4KB region permitted by the architecture. The errant memory
>>> access may lead to one of the following unexpected behaviors.
>>>
>>> 1) A System Error Interrupt (SEI) being raised by the Falkor core due
>>>to the errant memory access attempting to access a region of memory
>>>that is protected by a slave-side memory protection unit.
>>> 2) Unpredictable device behavior due to a speculative read from device
>>>memory. This behavior may only occur if the instruction cache is
>>>disabled prior to or coincident with translation being changed from
>>>enabled to disabled.
>>>
>>> The conditions leading to this erratum will not occur when either of the
>>> following occur:
>>>  1) A higher exception level disables translation of a lower exception level
>>>(e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
>>>  2) An exception level disabling its stage-1 translation if its stage-2
>>> translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
>>> to 0 when HCR_EL2[VM] has a value of 1).
>>>
>>> To avoid the errant behavior, software must execute an ISB immediately
>>> prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
>>>
>>> Signed-off-by: Shanker Donthineni 
>>> ---
>>> Changes since v3:
>>>   Rebased to kernel v4.15-rc1.
>>> Changes since v2:
>>>   Repost the corrected patches.
>>> Changes since v1:
>>>   Apply the workaround where it's required.
>>>
>>>  Documentation/arm64/silicon-errata.txt |  1 +
>>>  arch/arm64/Kconfig | 12 +++-
>>>  arch/arm64/include/asm/assembler.h | 19 +++
>>>  arch/arm64/include/asm/cpucaps.h   |  3 ++-
>>>  arch/arm64/kernel/cpu-reset.S  |  1 +
>>>  arch/arm64/kernel/cpu_errata.c | 16 
>>>  arch/arm64/kernel/efi-entry.S  |  2 ++
>>>  arch/arm64/kernel/head.S   |  1 +
>>>  arch/arm64/kernel/relocate_kernel.S|  1 +
>>>  arch/arm64/kvm/hyp-init.S  |  1 +
>>
>> This is an awful lot of code just to add an ISB instruction prior to
>> disabling the MMU. Why do you need to go through the alternatives framework
>> for this? Just do it with an #ifdef; this isn't a fastpath.
>>
> 
> Dropping the alternatives framework would only avoid changes to two files,
> cpu_errata.c and cpucaps.h. Even though this is a slow path, the cpu_errata.c
> changes provide a nice debug message which indicates that the erratum E1041
> workaround has been applied.
> 
> The erratum log line would be very useful to confirm that our customers are
> running a kernel with the E1041 patch, just by looking at dmesg. Other than
> that I don't have any strong opinion against avoiding the alternatives and
> handling this with an #ifdef.
> 
> Should I go ahead and post a v5 patch without alternatives?
> 


Please provide your thoughts on the next step. We would like to get this
erratum workaround merged into the v4.15 kernel.

>> Will
>>
>> ___
>> linux-arm-kernel mailing list
>> linux-arm-ker...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 2/2] arm64: Add software workaround for Falkor erratum 1041

2017-12-03 Thread Shanker Donthineni
Hi Will, thanks for your review comments.

On 12/01/2017 05:24 AM, Will Deacon wrote:
> On Mon, Nov 27, 2017 at 05:18:00PM -0600, Shanker Donthineni wrote:
>> The ARM architecture defines the memory locations that are permitted
>> to be accessed as the result of a speculative instruction fetch from
>> an exception level for which all stages of translation are disabled.
>> Specifically, the core is permitted to speculatively fetch from the
>> 4KB region containing the current program counter 4K and next 4K.
>>
>> When translation is changed from enabled to disabled for the running
>> exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
>> Falkor core may errantly speculatively access memory locations outside
>> of the 4KB region permitted by the architecture. The errant memory
>> access may lead to one of the following unexpected behaviors.
>>
>> 1) A System Error Interrupt (SEI) being raised by the Falkor core due
>>to the errant memory access attempting to access a region of memory
>>that is protected by a slave-side memory protection unit.
>> 2) Unpredictable device behavior due to a speculative read from device
>>memory. This behavior may only occur if the instruction cache is
>>disabled prior to or coincident with translation being changed from
>>enabled to disabled.
>>
>> The conditions leading to this erratum will not occur when either of the
>> following occur:
>>  1) A higher exception level disables translation of a lower exception level
>>(e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
>>  2) An exception level disabling its stage-1 translation if its stage-2
>> translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
>> to 0 when HCR_EL2[VM] has a value of 1).
>>
>> To avoid the errant behavior, software must execute an ISB immediately
>> prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
>>
>> Signed-off-by: Shanker Donthineni 
>> ---
>> Changes since v3:
>>   Rebased to kernel v4.15-rc1.
>> Changes since v2:
>>   Repost the corrected patches.
>> Changes since v1:
>>   Apply the workaround where it's required.
>>
>>  Documentation/arm64/silicon-errata.txt |  1 +
>>  arch/arm64/Kconfig | 12 +++-
>>  arch/arm64/include/asm/assembler.h | 19 +++
>>  arch/arm64/include/asm/cpucaps.h   |  3 ++-
>>  arch/arm64/kernel/cpu-reset.S  |  1 +
>>  arch/arm64/kernel/cpu_errata.c | 16 
>>  arch/arm64/kernel/efi-entry.S  |  2 ++
>>  arch/arm64/kernel/head.S   |  1 +
>>  arch/arm64/kernel/relocate_kernel.S|  1 +
>>  arch/arm64/kvm/hyp-init.S  |  1 +
> 
> This is an awful lot of code just to add an ISB instruction prior to
> disabling the MMU. Why do you need to go through the alternatives framework
> for this? Just do it with an #ifdef; this isn't a fastpath.
> 

Dropping the alternatives framework would only avoid changes to two files,
cpu_errata.c and cpucaps.h. Even though this is a slow path, the cpu_errata.c
changes provide a nice debug message which indicates that the erratum E1041
workaround has been applied.

The erratum log line would be very useful to confirm that our customers are
running a kernel with the E1041 patch, just by looking at dmesg. Other than
that I don't have any strong opinion against avoiding the alternatives and
handling this with an #ifdef.
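
For reference, that dmesg line comes from the .desc field of the capability
table entry in cpu_errata.c. A sketch of such an entry (the .desc string and
the MIDR matching shown here are illustrative, not copied from the patch):

	{
		.desc = "Qualcomm Technologies Falkor erratum 1041",
		.capability = ARM64_WORKAROUND_QCOM_FALKOR_E1041,
		MIDR_ALL_VERSIONS(MIDR_QCOM_FALKOR),
	},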

Should I go ahead and post a v5 patch without alternatives?

> Will
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 0/2] Implement a software workaround for Falkor erratum 1041

2017-11-27 Thread Shanker Donthineni
On the Falkor CPU, we’ve discovered a hardware issue which might lead to a
kernel crash or unexpected behavior. The Falkor core may errantly
access memory locations on speculative instruction fetches. This may
happen whenever the MMU translation state (the SCTLR_ELn[M] bit) is changed
from enabled to disabled for the currently running exception level. To
prevent the errant hardware behavior, software must execute an ISB
immediately prior to executing the MSR that changes SCTLR_ELn[M] from a
value of 1 to 0.

These v4 patches are based on 4.15-rc1 and tested on the QDF2400 platform.

Patch 2 from the v1 series was dropped to accommodate review comments; the
workaround is now applied only where it's required.

The wrong patches were posted in v2.

Shanker Donthineni (2):
  arm64: Define cputype macros for Falkor CPU
  arm64: Add software workaround for Falkor erratum 1041

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 12 +++-
 arch/arm64/include/asm/assembler.h | 19 +++
 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/include/asm/cputype.h   |  2 ++
 arch/arm64/kernel/cpu-reset.S  |  1 +
 arch/arm64/kernel/cpu_errata.c | 16 
 arch/arm64/kernel/efi-entry.S  |  2 ++
 arch/arm64/kernel/head.S   |  1 +
 arch/arm64/kernel/relocate_kernel.S|  1 +
 arch/arm64/kvm/hyp-init.S  |  1 +
 11 files changed, 57 insertions(+), 2 deletions(-)

-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 1/2] arm64: Define cputype macros for Falkor CPU

2017-11-27 Thread Shanker Donthineni
Add cputype definition macros for Qualcomm Datacenter Technologies
Falkor CPU in cputype.h. It's unfortunate that the first revision
of the Falkor CPU used the wrong part number 0x800; this was fixed
in the v2 chip with part number 0xC00, and the same value will be
used for future revisions.

Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/cputype.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 235e77d..cbf08d7 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -91,6 +91,7 @@
 #define BRCM_CPU_PART_VULCAN   0x516
 
 #define QCOM_CPU_PART_FALKOR_V10x800
+#define QCOM_CPU_PART_FALKOR   0xC00
 
 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
@@ -99,6 +100,7 @@
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
 #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
+#define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR)
 
 #ifndef __ASSEMBLY__
 
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 2/2] arm64: Add software workaround for Falkor erratum 1041

2017-11-27 Thread Shanker Donthineni
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter and the next 4KB region.

When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.

1) A System Error Interrupt (SEI) being raised by the Falkor core due
   to the errant memory access attempting to access a region of memory
   that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
   memory. This behavior may only occur if the instruction cache is
   disabled prior to or coincident with translation being changed from
   enabled to disabled.

The conditions leading to this erratum will not occur when either of the
following occur:
 1) A higher exception level disables translation of a lower exception level
   (e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
 2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).

To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.

Signed-off-by: Shanker Donthineni 
---
Changes since v3:
  Rebased to kernel v4.15-rc1.
Changes since v2:
  Repost the corrected patches.
Changes since v1:
  Apply the workaround where it's required.

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 12 +++-
 arch/arm64/include/asm/assembler.h | 19 +++
 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/kernel/cpu-reset.S  |  1 +
 arch/arm64/kernel/cpu_errata.c | 16 
 arch/arm64/kernel/efi-entry.S  |  2 ++
 arch/arm64/kernel/head.S   |  1 +
 arch/arm64/kernel/relocate_kernel.S|  1 +
 arch/arm64/kvm/hyp-init.S  |  1 +
 10 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt 
b/Documentation/arm64/silicon-errata.txt
index 304bf22..fc1c884 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -75,3 +75,4 @@ stable kernels.
 | Qualcomm Tech. | Falkor v1       | E1003   | QCOM_FALKOR_ERRATUM_1003    |
 | Qualcomm Tech. | Falkor v1       | E1009   | QCOM_FALKOR_ERRATUM_1009    |
 | Qualcomm Tech. | QDF2400 ITS     | E0065   | QCOM_QDF2400_ERRATUM_0065   |
+| Qualcomm Tech. | Falkor v{1,2}   | E1041   | QCOM_FALKOR_ERRATUM_1041    |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a93339f..c9a7e9e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -557,7 +557,6 @@ config QCOM_QDF2400_ERRATUM_0065
 
  If unsure, say Y.
 
-
 config SOCIONEXT_SYNQUACER_PREITS
bool "Socionext Synquacer: Workaround for GICv3 pre-ITS"
default y
@@ -576,6 +575,17 @@ config HISILICON_ERRATUM_161600802
   a 128kB offset to be applied to the target address in these commands.
 
  If unsure, say Y.
+
+config QCOM_FALKOR_ERRATUM_E1041
+   bool "Falkor E1041: Speculative instruction fetches might cause errant 
memory access"
+   default y
+   help
+ The Falkor CPU may speculatively fetch instructions from an improper
+ memory location when MMU translation is changed from SCTLR_ELn[M]=1
+ to SCTLR_ELn[M]=0. Prefixing the write with an ISB instruction avoids
+ the problem.
+
+ If unsure, say Y.
+
 endmenu
 
 
diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index aef72d8..c77742a 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
.macro save_and_disable_daif, flags
mrs \flags, daif
@@ -512,4 +513,22 @@
 #endif
.endm
 
+/**
+ * Erratum workaround prior to disabling the MMU. Insert an ISB immediately
+ * prior to executing the MSR that will change SCTLR_ELn[M] from a value of 1 to 0.
+ */
+   .macro pre_disable_mmu_workaround
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
+alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
+   isb
+alternative_else_nop_endif
+#endif
+   .endm
+
+   .macro pre_disable_mmu_early_workaround
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
+   isb
+#endif
+   .endm
+
 #endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
inde

Re: [PATCH v2 2/2] arm64: Add software workaround for Falkor erratum 1041

2017-11-12 Thread Shanker Donthineni
Hi, 

Sorry, I posted a wrong patch which causes compilation errors.
Please disregard this patch; I have posted a v3 patch to fix the build
issue.

https://patchwork.kernel.org/patch/10055077/

On 11/12/2017 07:16 PM, Shanker Donthineni wrote:
> The ARM architecture defines the memory locations that are permitted
> to be accessed as the result of a speculative instruction fetch from
> an exception level for which all stages of translation are disabled.
> Specifically, the core is permitted to speculatively fetch from the
> 4KB region containing the current program counter and the next 4KB region.
> 
> When translation is changed from enabled to disabled for the running
> exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
> Falkor core may errantly speculatively access memory locations outside
> of the 4KB region permitted by the architecture. The errant memory
> access may lead to one of the following unexpected behaviors.
> 
> 1) A System Error Interrupt (SEI) being raised by the Falkor core due
>to the errant memory access attempting to access a region of memory
>that is protected by a slave-side memory protection unit.
> 2) Unpredictable device behavior due to a speculative read from device
>memory. This behavior may only occur if the instruction cache is
>disabled prior to or coincident with translation being changed from
>enabled to disabled.
> 
> The conditions leading to this erratum will not occur when either of the
> following occur:
>  1) A higher exception level disables translation of a lower exception level
>(e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
>  2) An exception level disabling its stage-1 translation if its stage-2
> translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
> to 0 when HCR_EL2[VM] has a value of 1).
> 
> To avoid the errant behavior, software must execute an ISB immediately
> prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
> 
> Signed-off-by: Shanker Donthineni 
> ---
>  Documentation/arm64/silicon-errata.txt |  1 +
>  arch/arm64/Kconfig | 10 ++
>  arch/arm64/include/asm/assembler.h | 18 ++
>  arch/arm64/include/asm/cpucaps.h   |  3 ++-
>  arch/arm64/kernel/cpu-reset.S  |  1 +
>  arch/arm64/kernel/cpu_errata.c | 16 
>  arch/arm64/kernel/efi-entry.S  |  2 ++
>  arch/arm64/kernel/head.S   |  1 +
>  arch/arm64/kernel/relocate_kernel.S|  1 +
>  arch/arm64/kvm/hyp-init.S  |  1 +
>  10 files changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt 
> b/Documentation/arm64/silicon-errata.txt
> index 66e8ce1..704770c0 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -74,3 +74,4 @@ stable kernels.
>  | Qualcomm Tech. | Falkor v1       | E1003   | QCOM_FALKOR_ERRATUM_1003    |
>  | Qualcomm Tech. | Falkor v1       | E1009   | QCOM_FALKOR_ERRATUM_1009    |
>  | Qualcomm Tech. | QDF2400 ITS     | E0065   | QCOM_QDF2400_ERRATUM_0065   |
> +| Qualcomm Tech. | Falkor v{1,2}   | E1041   | QCOM_FALKOR_ERRATUM_1041    |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0df64a6..8f73eac 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -539,6 +539,16 @@ config QCOM_QDF2400_ERRATUM_0065
>  
> If unsure, say Y.
>  
> +config QCOM_FALKOR_ERRATUM_E1041
> + bool "Falkor E1041: Speculative instruction fetches might cause errant 
> memory access"
> + default y
> + help
> +   The Falkor CPU may speculatively fetch instructions from an improper
> +   memory location when MMU translation is changed from SCTLR_ELn[M]=1
> +   to SCTLR_ELn[M]=0. Prefixing the write with an ISB instruction avoids
> +   the problem.
> +
> +   If unsure, say Y.
> +
>  endmenu
>  
>  
> diff --git a/arch/arm64/include/asm/assembler.h 
> b/arch/arm64/include/asm/assembler.h
> index d58a625..eb11cdf 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -499,4 +499,22 @@
>  #endif
>   .endm
>  
> +/**
> + * Erratum workaround prior to disabling the MMU. Insert an ISB immediately
> + * prior to executing the MSR that will change SCTLR_ELn[M] from a value of 1 to 0.
> + */
> + .macro pre_disable_mmu_workaround
> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
> + isb
> +alternative_else_nop_endif
> +#endif
> + .end
> +
> + .macro pre_disable_mmu_early_workaround
> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
> +

[PATCH v3 1/2] arm64: Define cputype macros for Falkor CPU

2017-11-12 Thread Shanker Donthineni
Add cputype definition macros for Qualcomm Datacenter Technologies
Falkor CPU in cputype.h. It's unfortunate that the first revision
of the Falkor CPU used the wrong part number 0x800; this was fixed
in the v2 chip with part number 0xC00, and the same value will be
used for future revisions.

Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/cputype.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 235e77d..cbf08d7 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -91,6 +91,7 @@
 #define BRCM_CPU_PART_VULCAN   0x516
 
 #define QCOM_CPU_PART_FALKOR_V10x800
+#define QCOM_CPU_PART_FALKOR   0xC00
 
 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
@@ -99,6 +100,7 @@
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
 #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
+#define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR)
 
 #ifndef __ASSEMBLY__
 
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v3 2/2] arm64: Add software workaround for Falkor erratum 1041

2017-11-12 Thread Shanker Donthineni
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter and the next 4KB region.

When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.

1) A System Error Interrupt (SEI) being raised by the Falkor core due
   to the errant memory access attempting to access a region of memory
   that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
   memory. This behavior may only occur if the instruction cache is
   disabled prior to or coincident with translation being changed from
   enabled to disabled.

The conditions leading to this erratum will not occur when either of the
following occur:
 1) A higher exception level disables translation of a lower exception level
   (e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
 2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).

To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.

Signed-off-by: Shanker Donthineni 
---
Changes since v2:
  Repost the corrected patches.
Changes since v1:
  Apply the workaround where it's required.

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 10 ++
 arch/arm64/include/asm/assembler.h | 19 +++
 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/kernel/cpu-reset.S  |  1 +
 arch/arm64/kernel/cpu_errata.c | 16 
 arch/arm64/kernel/efi-entry.S  |  2 ++
 arch/arm64/kernel/head.S   |  1 +
 arch/arm64/kernel/relocate_kernel.S|  1 +
 arch/arm64/kvm/hyp-init.S  |  1 +
 10 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/silicon-errata.txt 
b/Documentation/arm64/silicon-errata.txt
index 66e8ce1..704770c0 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -74,3 +74,4 @@ stable kernels.
 | Qualcomm Tech. | Falkor v1       | E1003   | QCOM_FALKOR_ERRATUM_1003    |
 | Qualcomm Tech. | Falkor v1       | E1009   | QCOM_FALKOR_ERRATUM_1009    |
 | Qualcomm Tech. | QDF2400 ITS     | E0065   | QCOM_QDF2400_ERRATUM_0065   |
+| Qualcomm Tech. | Falkor v{1,2}   | E1041   | QCOM_FALKOR_ERRATUM_1041    |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6..8f73eac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -539,6 +539,16 @@ config QCOM_QDF2400_ERRATUM_0065
 
  If unsure, say Y.
 
+config QCOM_FALKOR_ERRATUM_E1041
+   bool "Falkor E1041: Speculative instruction fetches might cause errant 
memory access"
+   default y
+   help
+ The Falkor CPU may speculatively fetch instructions from an improper
+ memory location when MMU translation is changed from SCTLR_ELn[M]=1
+ to SCTLR_ELn[M]=0. Prefixing the write with an ISB instruction avoids
+ the problem.
+
+ If unsure, say Y.
+
 endmenu
 
 
diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index d58a625..dd9cec5 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Enable and disable interrupts.
@@ -499,4 +500,22 @@
 #endif
.endm
 
+/**
+ * Erratum workaround prior to disabling the MMU. Insert an ISB immediately
+ * prior to executing the MSR that will change SCTLR_ELn[M] from a value of 1 to 0.
+ */
+   .macro pre_disable_mmu_workaround
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
+alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
+   isb
+alternative_else_nop_endif
+#endif
+   .endm
+
+   .macro pre_disable_mmu_early_workaround
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
+   isb
+#endif
+   .endm
+
 #endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8da6216..7f7a59d 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -40,7 +40,8 @@
 #define ARM64_WORKAROUND_85892119
 #define ARM64_WORKAROUND_CAVIUM_30115  20
 #define ARM64_HAS_DCPOP21
+#define ARM64_WORKAROUND_QCOM_FALKOR_E1041 22
 
-#define ARM64_NCAPS   

[PATCH v3 0/2] Implement a software workaround for Falkor erratum 1041

2017-11-12 Thread Shanker Donthineni
On the Falkor CPU, we’ve discovered a hardware issue which might lead to a
kernel crash or unexpected behavior. The Falkor core may errantly
access memory locations on speculative instruction fetches. This may
happen whenever the MMU translation state (the SCTLR_ELn[M] bit) is changed
from enabled to disabled for the currently running exception level. To
prevent the errant hardware behavior, software must execute an ISB
immediately prior to executing the MSR that changes SCTLR_ELn[M] from a
value of 1 to 0. To keep the workaround simple, this patch series issues an
ISB whenever SCTLR_ELn[M] is changed to 0, fixing Falkor erratum 1041.

Patch 2 from the v1 series was dropped to accommodate review comments; the
workaround is now applied only where it's required.

The wrong patches were posted in v2.

Patch1:
  - CPUTYPE definitions for Falkor CPU.

Patch2:
  - Actual workaround changes for erratum E1041.

Shanker Donthineni (2):
  arm64: Define cputype macros for Falkor CPU
  arm64: Add software workaround for Falkor erratum 1041

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 10 ++
 arch/arm64/include/asm/assembler.h | 18 ++
 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/include/asm/cputype.h   |  2 ++
 arch/arm64/kernel/cpu-reset.S  |  1 +
 arch/arm64/kernel/cpu_errata.c | 16 
 arch/arm64/kernel/efi-entry.S  |  2 ++
 arch/arm64/kernel/head.S   |  1 +
 arch/arm64/kernel/relocate_kernel.S|  1 +
 arch/arm64/kvm/hyp-init.S  |  1 +
 11 files changed, 55 insertions(+), 1 deletion(-)

-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 0/2] Implement a software workaround for Falkor erratum 1041

2017-11-12 Thread Shanker Donthineni
On the Falkor CPU, we’ve discovered a hardware issue which might lead to a
kernel crash or unexpected behavior. The Falkor core may errantly
access memory locations on speculative instruction fetches. This may
happen whenever the MMU translation state (the SCTLR_ELn[M] bit) is changed
from enabled to disabled for the currently running exception level. To
prevent the errant hardware behavior, software must execute an ISB
immediately prior to executing the MSR that changes SCTLR_ELn[M] from a
value of 1 to 0. To keep the workaround simple, this patch series issues an
ISB whenever SCTLR_ELn[M] is changed to 0, fixing Falkor erratum 1041.

Patch 2 from the v1 series was dropped to accommodate review comments; the
workaround is now applied only where it's required.

Patch1:
  - CPUTYPE definitions for Falkor CPU.

Patch2:
  - Actual workaround changes for erratum E1041.

Shanker Donthineni (2):
  arm64: Define cputype macros for Falkor CPU
  arm64: Add software workaround for Falkor erratum 1041

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 10 ++
 arch/arm64/include/asm/assembler.h | 18 ++
 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/include/asm/cputype.h   |  2 ++
 arch/arm64/kernel/cpu-reset.S  |  1 +
 arch/arm64/kernel/cpu_errata.c | 16 
 arch/arm64/kernel/efi-entry.S  |  2 ++
 arch/arm64/kernel/head.S   |  1 +
 arch/arm64/kernel/relocate_kernel.S|  1 +
 arch/arm64/kvm/hyp-init.S  |  1 +
 11 files changed, 55 insertions(+), 1 deletion(-)

-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 2/2] arm64: Add software workaround for Falkor erratum 1041

2017-11-12 Thread Shanker Donthineni
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter and the next 4KB region.

When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.

1) A System Error Interrupt (SEI) being raised by the Falkor core due
   to the errant memory access attempting to access a region of memory
   that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
   memory. This behavior may only occur if the instruction cache is
   disabled prior to or coincident with translation being changed from
   enabled to disabled.

The conditions leading to this erratum will not occur when either of the
following occur:
 1) A higher exception level disables translation of a lower exception level
   (e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
 2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).

To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.

Signed-off-by: Shanker Donthineni 
---
 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 10 ++
 arch/arm64/include/asm/assembler.h | 18 ++
 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/kernel/cpu-reset.S  |  1 +
 arch/arm64/kernel/cpu_errata.c | 16 
 arch/arm64/kernel/efi-entry.S  |  2 ++
 arch/arm64/kernel/head.S   |  1 +
 arch/arm64/kernel/relocate_kernel.S|  1 +
 arch/arm64/kvm/hyp-init.S  |  1 +
 10 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/silicon-errata.txt 
b/Documentation/arm64/silicon-errata.txt
index 66e8ce1..704770c0 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -74,3 +74,4 @@ stable kernels.
 | Qualcomm Tech. | Falkor v1       | E1003   | QCOM_FALKOR_ERRATUM_1003    |
 | Qualcomm Tech. | Falkor v1       | E1009   | QCOM_FALKOR_ERRATUM_1009    |
 | Qualcomm Tech. | QDF2400 ITS     | E0065   | QCOM_QDF2400_ERRATUM_0065   |
+| Qualcomm Tech. | Falkor v{1,2}   | E1041   | QCOM_FALKOR_ERRATUM_1041    |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6..8f73eac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -539,6 +539,16 @@ config QCOM_QDF2400_ERRATUM_0065
 
  If unsure, say Y.
 
+config QCOM_FALKOR_ERRATUM_E1041
+   bool "Falkor E1041: Speculative instruction fetches might cause errant 
memory access"
+   default y
+   help
+ The Falkor CPU may speculatively fetch instructions from an improper
+ memory location when MMU translation is changed from SCTLR_ELn[M]=1
+ to SCTLR_ELn[M]=0. Prefixing the write with an ISB instruction avoids
+ the problem.
+
+ If unsure, say Y.
+
 endmenu
 
 
diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index d58a625..eb11cdf 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -499,4 +499,22 @@
 #endif
.endm
 
+/**
+ * Erratum workaround prior to disabling the MMU. Insert an ISB immediately
+ * prior to executing the MSR that will change SCTLR_ELn[M] from a value of 1 to 0.
+ */
+   .macro pre_disable_mmu_workaround
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
+alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
+   isb
+alternative_else_nop_endif
+#endif
+   .end
+
+   .macro pre_disable_mmu_early_workaround
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
+   isb
+#endif
+   .end
+
 #endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8da6216..7f7a59d 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -40,7 +40,8 @@
 #define ARM64_WORKAROUND_85892119
 #define ARM64_WORKAROUND_CAVIUM_30115  20
 #define ARM64_HAS_DCPOP21
+#define ARM64_WORKAROUND_QCOM_FALKOR_E1041 22
 
-#define ARM64_NCAPS22
+#define ARM64_NCAPS23
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
index 65f42d2..2a752cb 100644
--- a/arch/arm64/kernel/c

[PATCH v2 1/2] arm64: Define cputype macros for Falkor CPU

2017-11-12 Thread Shanker Donthineni
Add cputype definition macros for Qualcomm Datacenter Technologies
Falkor CPU in cputype.h. It's unfortunate that the first revision
of the Falkor CPU used the wrong part number 0x800; this was fixed
in the v2 chip with part number 0xC00, and the same value will be
used for future revisions.

Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/cputype.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 235e77d..cbf08d7 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -91,6 +91,7 @@
 #define BRCM_CPU_PART_VULCAN   0x516
 
 #define QCOM_CPU_PART_FALKOR_V10x800
+#define QCOM_CPU_PART_FALKOR   0xC00
 
 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
@@ -99,6 +100,7 @@
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
 #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
+#define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR)
 
 #ifndef __ASSEMBLY__
 
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/3] arm64: Add software workaround for Falkor erratum 1041

2017-11-12 Thread Shanker Donthineni
Hi James,

On 11/10/2017 04:24 AM, James Morse wrote:
> Hi Shanker,
> 
> On 09/11/17 15:22, Shanker Donthineni wrote:
>> On 11/09/2017 05:08 AM, James Morse wrote:
>>> On 04/11/17 21:43, Shanker Donthineni wrote:
>>>> On 11/03/2017 10:11 AM, Robin Murphy wrote:
>>>>> On 03/11/17 03:27, Shanker Donthineni wrote:
>>>>>> The ARM architecture defines the memory locations that are permitted
>>>>>> to be accessed as the result of a speculative instruction fetch from
>>>>>> an exception level for which all stages of translation are disabled.
>>>>>> Specifically, the core is permitted to speculatively fetch from the
>>>>>> 4KB region containing the current program counter and next 4KB.
>>>>>>
>>>>>> When translation is changed from enabled to disabled for the running
>>>>>> exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
>>>>>> Falkor core may errantly speculatively access memory locations outside
>>>>>> of the 4KB region permitted by the architecture. The errant memory
>>>>>> access may lead to one of the following unexpected behaviors.
>>>>>>
>>>>>> 1) A System Error Interrupt (SEI) being raised by the Falkor core due
>>>>>>to the errant memory access attempting to access a region of memory
>>>>>>that is protected by a slave-side memory protection unit.
>>>>>> 2) Unpredictable device behavior due to a speculative read from device
>>>>>>memory. This behavior may only occur if the instruction cache is
>>>>>>disabled prior to or coincident with translation being changed from
>>>>>>enabled to disabled.
>>>>>>
>>>>>> To avoid the errant behavior, software must execute an ISB immediately
>>>>>> prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
> 
>>>>>> diff --git a/arch/arm64/include/asm/assembler.h 
>>>>>> b/arch/arm64/include/asm/assembler.h
>>>>>> index b6dfb4f..4c91efb 100644
>>>>>> --- a/arch/arm64/include/asm/assembler.h
>>>>>> +++ b/arch/arm64/include/asm/assembler.h
>>>>>> @@ -514,6 +515,22 @@
>>>>>>   *   reg: the value to be written.
>>>>>>   */
>>>>>>  .macro  write_sctlr, eln, reg
>>>>>> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041
>>>>>> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
>>>>>> +tbnz\reg, #0, 8000f  // enable MMU?
>>>
>>> Won't this match any change that leaves the MMU enabled?
>>
>> Yes. No need to apply workaround if the MMU is going to be enabled.
> 
> (Sorry, looks like I had this upside down)
> 
> My badly-made point is that you can't know if the MMU is being disabled unless you
> have both the old and new values.
> 
> As an example, in el2_setup, (where the MMU is disabled), we set the EE/E0E 
> bits
> to match the kernel's endianness. Won't your macro will insert an unnecessary
> isb? Is this needed for the errata workaround?
> 

Yes, it's not required in this case. I'll post a v2 patch and apply the
workaround only where it's absolutely required. It seems that handling the
workaround inside helper macros is causing confusion.
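
The root of the confusion, as a sketch (hypothetical helper, not kernel code):
the ISB is only architecturally required for a 1 -> 0 transition of the M bit,
but a macro that sees only the value being written cannot distinguish "MMU
stays off" (0 -> 0) from "MMU being disabled" (1 -> 0), so it over-applies
the workaround.

static inline int e1041_isb_needed(unsigned long old_sctlr,
				   unsigned long new_sctlr)
{
	return (old_sctlr & 1) && !(new_sctlr & 1);	/* M bit: 1 -> 0 */
}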

> 
>>> I think the macro is making this more confusing. Disabling the MMU is 
>>> obvious
>>> from the call-site, (and really rare!). Trying to work it out from a macro 
>>> makes
>>> it more complicated than necessary.
> 
>> It's not clear: are you suggesting not to use the read{write}_sctlr() macros,
>> and instead to apply the workaround at the call-site based on the MMU-on status?
> 
> Yes. This is the only way to patch only the locations that turn the MMU off.
> 
> 
>> If yes, it simplifies
>> the code logic, but then CONFIG_QCOM_FALKOR_ERRATUM_1041 references are
>> scattered everywhere.
> 
> Wouldn't they only appear in the places that are affected by the errata?
> This is exactly what we want, anyone touching that code now knows they need to
> double check this behaviour, (and ask you to test it!).
> 
> Otherwise we have a macro second-guessing what is happening; if it's not quite
> right (because some information has been lost), we're now not sure what we need
> to do if we ever refactor any of this code.
> 
> [...]
> 
>>>> I'll prefer alternatives
>>>> just to avoid the unnecessar

Re: [PATCH 3/3] arm64: Add software workaround for Falkor erratum 1041

2017-11-09 Thread Shanker Donthineni
Hi James,

On 11/09/2017 05:08 AM, James Morse wrote:
> Hi Shanker, Robin,
> 
> On 04/11/17 21:43, Shanker Donthineni wrote:
>> On 11/03/2017 10:11 AM, Robin Murphy wrote:
>>> On 03/11/17 03:27, Shanker Donthineni wrote:
>>>> The ARM architecture defines the memory locations that are permitted
>>>> to be accessed as the result of a speculative instruction fetch from
>>>> an exception level for which all stages of translation are disabled.
>>>> Specifically, the core is permitted to speculatively fetch from the
>>>> 4KB region containing the current program counter and next 4KB.
>>>>
>>>> When translation is changed from enabled to disabled for the running
>>>> exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
>>>> Falkor core may errantly speculatively access memory locations outside
>>>> of the 4KB region permitted by the architecture. The errant memory
>>>> access may lead to one of the following unexpected behaviors.
>>>>
>>>> 1) A System Error Interrupt (SEI) being raised by the Falkor core due
>>>>to the errant memory access attempting to access a region of memory
>>>>that is protected by a slave-side memory protection unit.
>>>> 2) Unpredictable device behavior due to a speculative read from device
>>>>memory. This behavior may only occur if the instruction cache is
>>>>disabled prior to or coincident with translation being changed from
>>>>enabled to disabled.
>>>>
>>>> To avoid the errant behavior, software must execute an ISB immediately
>>>> prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
> 
> 
>>>> diff --git a/arch/arm64/include/asm/assembler.h 
>>>> b/arch/arm64/include/asm/assembler.h
>>>> index b6dfb4f..4c91efb 100644
>>>> --- a/arch/arm64/include/asm/assembler.h
>>>> +++ b/arch/arm64/include/asm/assembler.h
>>>> @@ -30,6 +30,7 @@
>>>>  #include 
>>>>  #include 
>>>>  #include 
>>>> +#include 
>>>>  
>>>>  /*
>>>>   * Enable and disable interrupts.
>>>> @@ -514,6 +515,22 @@
>>>>   *   reg: the value to be written.
>>>>   */
>>>>.macro  write_sctlr, eln, reg
>>>> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041
>>>> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
>>>> +  tbnz\reg, #0, 8000f  // enable MMU?
> 
> Won't this match any change that leaves the MMU enabled?
> 

Yes. No need to apply the workaround if the MMU is going to be enabled.

> I think the macro is making this more confusing. Disabling the MMU is obvious
> from the call-site, (and really rare!). Trying to work it out from a macro 
> makes
> it more complicated than necessary.
>

I'm not clear: are you suggesting not to use the read{write}_sctlr() macros and
instead apply the workaround at the call-site based on the MMU-on status? If
yes, it simplifies the code logic, but then CONFIG_QCOM_FALKOR_ERRATUM_1041
references are scattered everywhere.
 
> 
>>> Do we really need the branch here? It's not like enabling the MMU is
>>> something we do on the syscall fastpath, and I can't imagine an extra
>>> ISB hurts much (and is probably comparable to a mispredicted branch
> 
>> I don't have any strong opinion on whether to use an ISB conditionally
>> or unconditionally. Yes, the current kernel code is not touching
>> SCTLR_ELn register on the system call fast path. I would like to keep
>> it as a conditional ISB in case a future kernel accesses the
>> SCTLR_ELn on the fast path. An extra ISB should not hurt a lot but I
>> believe it has more overhead than the TBZ+branch mis-prediction on Falkor
>> CPU. This patch has been tested on the real hardware to fix the problem.
> 
>> I'm open to change to an unconditional ISB if it's the better fix.
>>
>>> anyway). In fact, is there any noticeable hit on other
>>> microarchitectures if we save the alternative bother and just do it
>>> unconditionally always?
>>>
>>
>> I can't comment on the performance impacts of other CPUs since I don't
>> have access to their development platforms. I'll prefer alternatives
>> just to avoid the unnecessary overhead on future Qualcomm Datacenter
>> server CPUs and regression on other CPUs because of inserting an ISB
> 
> I think hiding errata on other CPUs is a good argument.
> 
> My suggestion would be:
>> #ifdef CONFIG_QCOM_FALK

Re: [PATCH 3/3] arm64: Add software workaround for Falkor erratum 1041

2017-11-04 Thread Shanker Donthineni
Hi Robin, Thanks for your review comments. 

On 11/03/2017 10:11 AM, Robin Murphy wrote:
> On 03/11/17 03:27, Shanker Donthineni wrote:
>> The ARM architecture defines the memory locations that are permitted
>> to be accessed as the result of a speculative instruction fetch from
>> an exception level for which all stages of translation are disabled.
>> Specifically, the core is permitted to speculatively fetch from the
>> 4KB region containing the current program counter and next 4KB.
>>
>> When translation is changed from enabled to disabled for the running
>> exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
>> Falkor core may errantly speculatively access memory locations outside
>> of the 4KB region permitted by the architecture. The errant memory
>> access may lead to one of the following unexpected behaviors.
>>
>> 1) A System Error Interrupt (SEI) being raised by the Falkor core due
>>to the errant memory access attempting to access a region of memory
>>that is protected by a slave-side memory protection unit.
>> 2) Unpredictable device behavior due to a speculative read from device
>>memory. This behavior may only occur if the instruction cache is
>>disabled prior to or coincident with translation being changed from
>>enabled to disabled.
>>
>> To avoid the errant behavior, software must execute an ISB immediately
>> prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
>>
>> Signed-off-by: Shanker Donthineni 
>> ---
>>  Documentation/arm64/silicon-errata.txt |  1 +
>>  arch/arm64/Kconfig | 10 ++
>>  arch/arm64/include/asm/assembler.h | 17 +
>>  arch/arm64/include/asm/cpucaps.h   |  3 ++-
>>  arch/arm64/kernel/cpu_errata.c | 16 
>>  arch/arm64/kernel/efi-entry.S  |  4 ++--
>>  arch/arm64/kernel/head.S   |  4 ++--
>>  7 files changed, 50 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/arm64/silicon-errata.txt 
>> b/Documentation/arm64/silicon-errata.txt
>> index 66e8ce1..704770c0 100644
>> --- a/Documentation/arm64/silicon-errata.txt
>> +++ b/Documentation/arm64/silicon-errata.txt
>> @@ -74,3 +74,4 @@ stable kernels.
>>  | Qualcomm Tech. | Falkor v1   | E1003   | 
>> QCOM_FALKOR_ERRATUM_1003|
>>  | Qualcomm Tech. | Falkor v1   | E1009   | 
>> QCOM_FALKOR_ERRATUM_1009|
>>  | Qualcomm Tech. | QDF2400 ITS | E0065   | 
>> QCOM_QDF2400_ERRATUM_0065   |
>> +| Qualcomm Tech. | Falkor v{1,2}   | E1041   | 
>> QCOM_FALKOR_ERRATUM_1041|
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 0df64a6..7e933fb 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -539,6 +539,16 @@ config QCOM_QDF2400_ERRATUM_0065
>>  
>>If unsure, say Y.
>>  
>> +config QCOM_FALKOR_ERRATUM_1041
>> +bool "Falkor E1041: Speculative instruction fetches might cause errant 
>> memory access"
>> +default y
>> +help
>> +  Falkor CPU may speculatively fetch instructions from an improper
>> +  memory location when MMU translation is changed from SCTLR_ELn[M]=1
>> +  to SCTLR_ELn[M]=0. Prefix an ISB instruction to fix the problem.
>> +
>> +  If unsure, say Y.
>> +
>>  endmenu
>>  
>>  
>> diff --git a/arch/arm64/include/asm/assembler.h 
>> b/arch/arm64/include/asm/assembler.h
>> index b6dfb4f..4c91efb 100644
>> --- a/arch/arm64/include/asm/assembler.h
>> +++ b/arch/arm64/include/asm/assembler.h
>> @@ -30,6 +30,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  /*
>>   * Enable and disable interrupts.
>> @@ -514,6 +515,22 @@
>>   *   reg: the value to be written.
>>   */
>>  .macro  write_sctlr, eln, reg
>> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041
>> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
>> +tbnz\reg, #0, 8000f  // enable MMU?
> 
> Do we really need the branch here? It's not like enabling the MMU is
> something we do on the syscall fastpath, and I can't imagine an extra
> ISB hurts much (and is probably comparable to a mispredicted branch

I don't have any strong opinion on whether to use an ISB conditionally
or unconditionally. Yes, the current kernel code is not touching
SCTLR_ELn register on the system call fast path. I would like to keep
it as a conditional ISB in case a future kernel accesses the
SCTLR_ELn on the fast path. An 

[PATCH 3/3] arm64: Add software workaround for Falkor erratum 1041

2017-11-02 Thread Shanker Donthineni
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter and next 4KB.

When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.

1) A System Error Interrupt (SEI) being raised by the Falkor core due
   to the errant memory access attempting to access a region of memory
   that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
   memory. This behavior may only occur if the instruction cache is
   disabled prior to or coincident with translation being changed from
   enabled to disabled.

To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.

Signed-off-by: Shanker Donthineni 
---
 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 10 ++
 arch/arm64/include/asm/assembler.h | 17 +
 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/kernel/cpu_errata.c | 16 
 arch/arm64/kernel/efi-entry.S  |  4 ++--
 arch/arm64/kernel/head.S   |  4 ++--
 7 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt 
b/Documentation/arm64/silicon-errata.txt
index 66e8ce1..704770c0 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -74,3 +74,4 @@ stable kernels.
 | Qualcomm Tech. | Falkor v1   | E1003   | 
QCOM_FALKOR_ERRATUM_1003|
 | Qualcomm Tech. | Falkor v1   | E1009   | 
QCOM_FALKOR_ERRATUM_1009|
 | Qualcomm Tech. | QDF2400 ITS | E0065   | 
QCOM_QDF2400_ERRATUM_0065   |
+| Qualcomm Tech. | Falkor v{1,2}   | E1041   | 
QCOM_FALKOR_ERRATUM_1041|
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6..7e933fb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -539,6 +539,16 @@ config QCOM_QDF2400_ERRATUM_0065
 
  If unsure, say Y.
 
+config QCOM_FALKOR_ERRATUM_1041
+   bool "Falkor E1041: Speculative instruction fetches might cause errant 
memory access"
+   default y
+   help
+ Falkor CPU may speculatively fetch instructions from an improper
+ memory location when MMU translation is changed from SCTLR_ELn[M]=1
+ to SCTLR_ELn[M]=0. Prefix an ISB instruction to fix the problem.
+
+ If unsure, say Y.
+
 endmenu
 
 
diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index b6dfb4f..4c91efb 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Enable and disable interrupts.
@@ -514,6 +515,22 @@
  *   reg: the value to be written.
  */
.macro  write_sctlr, eln, reg
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041
+alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
+   tbnz\reg, #0, 8000f  // enable MMU?
+   isb
+8000:
+alternative_else_nop_endif
+#endif
+   msr sctlr_\eln, \reg
+   .endm
+
+   .macro  early_write_sctlr, eln, reg
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041
+   tbnz\reg, #0, 8000f  // enable MMU?
+   isb
+8000:
+#endif
msr sctlr_\eln, \reg
.endm
 
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8da6216..7f7a59d 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -40,7 +40,8 @@
 #define ARM64_WORKAROUND_858921			19
 #define ARM64_WORKAROUND_CAVIUM_30115		20
 #define ARM64_HAS_DCPOP				21
+#define ARM64_WORKAROUND_QCOM_FALKOR_E1041	22
 
-#define ARM64_NCAPS				22
+#define ARM64_NCAPS				23
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 0e27f86..27f9a45 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -179,6 +179,22 @@ static int cpu_enable_trap_ctr_access(void *__unused)
   MIDR_CPU_VAR_REV(0, 0)),
},
 #endif
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041
+   {
+   .desc = "Qualcomm Technologies Falkor erratum 1041",
+   .capability = ARM64_WORKAROUND_QCOM_FALKOR_E1041,
+   MIDR_RANGE

[PATCH 0/3] Implement a software workaround for Falkor erratum 1041

2017-11-02 Thread Shanker Donthineni
On the Falkor CPU, we’ve discovered a hardware issue which might lead to a
kernel crash or unexpected behavior. The Falkor core may errantly
access memory locations on speculative instruction fetches. This may
happen whenever the MMU translation state, the SCTLR_ELn[M] bit, is changed
from enabled to disabled for the currently running exception level. To
prevent the errant hardware behavior, software must execute an ISB
immediately prior to executing the MSR that changes SCTLR_ELn[M] from a
value of 1 to 0. To keep the workaround simple, this patch
series issues an ISB whenever SCTLR_ELn[M] is changed to 0, fixing
Falkor erratum 1041.

Patch1:
  - CPUTYPE definitions for Falkor CPU.

Patch2:
  - Define two ASM helper macros to read/write SCTLR_ELn register.

Patch3:
  - Actual workaround changes for erratum E1041.

Shanker Donthineni (3):
  arm64: Define cputype macros for Falkor CPU
  arm64: Prepare SCTLR_ELn accesses to handle Falkor erratum 1041
  arm64: Add software workaround for Falkor erratum 1041

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 10 ++
 arch/arm64/include/asm/assembler.h | 35 ++
 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/include/asm/cputype.h   |  2 ++
 arch/arm64/kernel/cpu-reset.S  |  4 ++--
 arch/arm64/kernel/cpu_errata.c | 16 
 arch/arm64/kernel/efi-entry.S  |  8 
 arch/arm64/kernel/head.S   | 18 -
 arch/arm64/kernel/relocate_kernel.S|  4 ++--
 arch/arm64/kvm/hyp-init.S  |  6 +++---
 arch/arm64/mm/proc.S   |  6 +++---
 12 files changed, 89 insertions(+), 24 deletions(-)

-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH 1/3] arm64: Define cputype macros for Falkor CPU

2017-11-02 Thread Shanker Donthineni
Add cputype definition macros for Qualcomm Datacenter Technologies
Falkor CPU in cputype.h. It's unfortunate that the first revision
of the Falkor CPU used the wrong part number 0x800; this was fixed in the v2
chip with part number 0xC00, and the same value will be used for
future revisions.

Signed-off-by: Shanker Donthineni 
Signed-off-by: Neil Leeder 
---
 arch/arm64/include/asm/cputype.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 235e77d..cbf08d7 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -91,6 +91,7 @@
 #define BRCM_CPU_PART_VULCAN   0x516
 
 #define QCOM_CPU_PART_FALKOR_V10x800
+#define QCOM_CPU_PART_FALKOR   0xC00
 
 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, 
ARM_CPU_PART_CORTEX_A53)
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, 
ARM_CPU_PART_CORTEX_A57)
@@ -99,6 +100,7 @@
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX_83XX)
 #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, 
QCOM_CPU_PART_FALKOR_V1)
+#define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR)
 
 #ifndef __ASSEMBLY__
 
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH 2/3] arm64: Prepare SCTLR_ELn accesses to handle Falkor erratum 1041

2017-11-02 Thread Shanker Donthineni
This patch introduces two helper macros, read_sctlr and write_sctlr,
to access the system register SCTLR_ELn. Replace all MRS/MSR references
to sctlr_el1{el2} with these macros.

This should cause no behavioral change.

Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/assembler.h  | 18 ++
 arch/arm64/kernel/cpu-reset.S   |  4 ++--
 arch/arm64/kernel/efi-entry.S   |  8 
 arch/arm64/kernel/head.S| 18 +-
 arch/arm64/kernel/relocate_kernel.S |  4 ++--
 arch/arm64/kvm/hyp-init.S   |  6 +++---
 arch/arm64/mm/proc.S|  6 +++---
 7 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index d58a625..b6dfb4f 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -499,4 +499,22 @@
 #endif
.endm
 
+/**
+ * Read value of the system control register SCTLR_ELn.
+ *   eln: which system control register.
+ *   reg: contents of the SCTLR_ELn.
+ */
+   .macro  read_sctlr, eln, reg
+   mrs \reg, sctlr_\eln
+   .endm
+
+/**
+ * Write the value to the system control register SCTLR_ELn.
+ *   eln: which system control register.
+ *   reg: the value to be written.
+ */
+   .macro  write_sctlr, eln, reg
+   msr sctlr_\eln, \reg
+   .endm
+
 #endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
index 65f42d2..9224abd 100644
--- a/arch/arm64/kernel/cpu-reset.S
+++ b/arch/arm64/kernel/cpu-reset.S
@@ -34,10 +34,10 @@
  */
 ENTRY(__cpu_soft_restart)
/* Clear sctlr_el1 flags. */
-   mrs x12, sctlr_el1
+   read_sctlr el1, x12
ldr x13, =SCTLR_ELx_FLAGS
bic x12, x12, x13
-   msr sctlr_el1, x12
+   write_sctlr el1, x12
isb
 
cbz x0, 1f  // el2_switch?
diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
index 4e6ad35..acae627 100644
--- a/arch/arm64/kernel/efi-entry.S
+++ b/arch/arm64/kernel/efi-entry.S
@@ -93,17 +93,17 @@ ENTRY(entry)
mrs x0, CurrentEL
cmp x0, #CurrentEL_EL2
b.ne1f
-   mrs x0, sctlr_el2
+   read_sctlr el2, x0
bic x0, x0, #1 << 0 // clear SCTLR.M
bic x0, x0, #1 << 2 // clear SCTLR.C
-   msr sctlr_el2, x0
+   write_sctlr el2, x0
isb
b   2f
 1:
-   mrs x0, sctlr_el1
+   read_sctlr el1, x0
bic x0, x0, #1 << 0 // clear SCTLR.M
bic x0, x0, #1 << 2 // clear SCTLR.C
-   msr sctlr_el1, x0
+   write_sctlr el1, x0
isb
 2:
/* Jump to kernel entry point */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 0b243ec..b8d5b73 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -388,18 +388,18 @@ ENTRY(el2_setup)
mrs x0, CurrentEL
cmp x0, #CurrentEL_EL2
b.eq1f
-   mrs x0, sctlr_el1
+   read_sctlr el1, x0
 CPU_BE(orr x0, x0, #(3 << 24)  )   // Set the EE and E0E 
bits for EL1
 CPU_LE(bic x0, x0, #(3 << 24)  )   // Clear the EE and E0E 
bits for EL1
-   msr sctlr_el1, x0
+   write_sctlr el1, x0
mov w0, #BOOT_CPU_MODE_EL1  // This cpu booted in EL1
isb
ret
 
-1: mrs x0, sctlr_el2
+1: read_sctlr el2, x0
 CPU_BE(orr x0, x0, #(1 << 25)  )   // Set the EE bit for 
EL2
 CPU_LE(bic x0, x0, #(1 << 25)  )   // Clear the EE bit for 
EL2
-   msr sctlr_el2, x0
+   write_sctlr el2, x0
 
 #ifdef CONFIG_ARM64_VHE
/*
@@ -511,7 +511,7 @@ install_el2_stub:
mov x0, #0x0800 // Set/clear RES{1,0} bits
 CPU_BE(movkx0, #0x33d0, lsl #16)   // Set EE and E0E on BE 
systems
 CPU_LE(movkx0, #0x30d0, lsl #16)   // Clear EE and E0E on 
LE systems
-   msr sctlr_el1, x0
+   write_sctlr el1, x0
 
/* Coprocessor traps. */
mov x0, #0x33ff
@@ -664,7 +664,7 @@ ENTRY(__enable_mmu)
msr ttbr0_el1, x1   // load TTBR0
msr ttbr1_el1, x2   // load TTBR1
isb
-   msr sctlr_el1, x0
+   write_sctlr el1, x0
isb
/*
 * Invalidate the local I-cache so that any instructions fetched
@@ -716,7 +716,7 @@ ENDPROC(__relocate_kernel)
 __primary_switch:
 #ifdef CONFIG_RANDOMIZE_BASE
mov x19, x0 // preserve new SCTLR_EL1 value
-   mrs x20, sctlr_el1  // preserve old SCTLR_EL1 value
+   read_sctlr el1, x20 // preserve old SCTLR_EL1 value
 #endif
 
bl  __enable_mmu
@@ -732,14 +732,14 @@ __primary_switch:
 * to take into acco

Re: [PATCH v4 00/26] KVM/ARM: Add support for GICv4

2017-10-08 Thread Shanker Donthineni
Hi Marc,

I've tested this patch series on the QDF2400 server platform using an NVMe
card; the basic functionality works fine, and the log messages below show that
around 70 interrupts are delivered to the vCPU directly.


Tested-by: Shanker Donthineni 


From guest kernel:
 /mnt # cat /proc/interrupts | grep ITS
 51: 83   ITS-MSI 32768 Edge  nvme0q0, nvme0q1
 52:  0   ITS-MSI 16384 Edge  virtio0-config
 53:  0   ITS-MSI 16385 Edge  virtio0-input.0
 54:  0   ITS-MSI 16386 Edge  virtio0-output.0
 
From host kernel:
 /mnt # cat /proc/interrupts | grep GICv4
 388:  9  GICv4-vpe   0 Edge  vcpu

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


[PATCH] arm64: KVM: Reject non-compliant HVC calls from guest kernel

2017-08-07 Thread Shanker Donthineni
The SMC/HVC instructions with a non-zero immediate value are not compliant
with the 'SMC Calling Convention' system software document. Add a
validation check in handle_hvc() to reject non-compliant HVC calls from the
VM, and inject an undefined instruction exception for those calls.

http://infocenter.arm.com/help/topic/com.arm.doc.den0028b/ARM_DEN0028B_SMC_Calling_Convention.pdf

Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/esr.h |  4 
 arch/arm64/kvm/handle_exit.c | 12 +++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 8cabd57..fa988e5 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -107,6 +107,9 @@
 #define ESR_ELx_AR (UL(1) << 14)
 #define ESR_ELx_CM (UL(1) << 8)
 
+/* ISS field definitions for HVC/SVC instruction execution traps */
+#define ESR_HVC_IMMEDIATE(esr)	((esr) & 0xffff)
+
 /* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV (UL(1) << 24)
 #define ESR_ELx_COND_SHIFT (20)
@@ -114,6 +117,7 @@
 #define ESR_ELx_WFx_ISS_WFE(UL(1) << 0)
 #define ESR_ELx_xVC_IMM_MASK   ((1UL << 16) - 1)
 
+
 /* ESR value templates for specific events */
 
 /* BRK instruction trap from AArch64 state */
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 17d8a16..a900dcd 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -42,13 +42,15 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run 
*run)
kvm_vcpu_hvc_get_imm(vcpu));
vcpu->stat.hvc_exit_stat++;
 
-   ret = kvm_psci_call(vcpu);
-   if (ret < 0) {
-   kvm_inject_undefined(vcpu);
-   return 1;
+   /* HVC immediate value must be zero for all compliant calls */
+   if (!ESR_HVC_IMMEDIATE(kvm_vcpu_get_hsr(vcpu))) {
+   ret = kvm_psci_call(vcpu);
+   if (ret >= 0)
+   return ret;
}
 
-   return ret;
+   kvm_inject_undefined(vcpu);
+   return 1;
 }
 
 static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
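
For reference, a guest-side illustration of the distinction (not part of the
patch itself; the PSCI function ID is just an example):

	register unsigned long x0 asm("x0") = 0x84000000;	/* PSCI_VERSION */

	asm volatile("hvc #0" : "+r" (x0));	/* imm == 0: routed to kvm_psci_call() */
	asm volatile("hvc #1" : "+r" (x0));	/* imm != 0: UNDEF injected into the guest */
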
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH] KVM: arm/arm64: Fix bug in advertising KVM_CAP_MSI_DEVID capability

2017-07-08 Thread Shanker Donthineni
Commit 0e4e82f154e3 ("KVM: arm64: vgic-its: Enable ITS emulation as
a virtual MSI controller") tried to advertise KVM_CAP_MSI_DEVID, but
the code logic was not updating the dist->msis_require_devid field
correctly. If the hypervisor tool creates the ITS device after VGIC
initialization, then we don't advertise the KVM_CAP_MSI_DEVID capability.

Update the field msis_require_devid to true inside vgic_its_create()
to fix the issue.

Fixes: 0e4e82f154e3 ("vgic-its: Enable ITS emulation as a virtual MSI 
controller")
Signed-off-by: Shanker Donthineni 
---
 virt/kvm/arm/vgic/vgic-init.c | 3 ---
 virt/kvm/arm/vgic/vgic-its.c  | 1 +
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 3a0b899..5801261 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -285,9 +285,6 @@ int vgic_init(struct kvm *kvm)
if (ret)
goto out;
 
-   if (vgic_has_its(kvm))
-   dist->msis_require_devid = true;
-
kvm_for_each_vcpu(i, vcpu, kvm)
kvm_vgic_vcpu_enable(vcpu);
 
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 2dff288..aa6b68d 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -1598,6 +1598,7 @@ static int vgic_its_create(struct kvm_device *dev, u32 
type)
INIT_LIST_HEAD(&its->device_list);
INIT_LIST_HEAD(&its->collection_list);
 
+   dev->kvm->arch.vgic.msis_require_devid = true;
dev->kvm->arch.vgic.has_its = true;
its->enabled = false;
its->dev = dev;
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [PATCH v2 38/52] KVM: arm/arm64: GICv4: Wire init/teardown of per-VM support

2017-07-08 Thread Shanker Donthineni
Hi Marc,

On 06/28/2017 10:03 AM, Marc Zyngier wrote:
> Should the HW support GICv4 and an ITS being associated with this
> VM, let's init the its_vm and its_vpe structures.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  virt/kvm/arm/vgic/vgic-init.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 3a0b8999f011..0de1f0d986d4 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -285,8 +285,14 @@ int vgic_init(struct kvm *kvm)
>   if (ret)
>   goto out;
>  
> - if (vgic_has_its(kvm))
> + if (vgic_has_its(kvm)) {
>   dist->msis_require_devid = true;
> + if (kvm_vgic_global_state.has_gicv4) {
> + ret = vgic_v4_init(kvm);
> + if (ret)
> + goto out;
> + }

This is not quite right; the ITS virtual device may not be initialized at the
time vgic_init() is called. This change breaks the existing KVM functionality
with the QEMU hypervisor tool. In later patches, the code assumes that
vgic_v4_init(kvm) was called whenever vgic_has_its(kvm) returns true.

The right change would be to move this logic inside vgic_its_create(),
something like this:

 --- a/virt/kvm/arm/vgic/vgic-init.c
 +++ b/virt/kvm/arm/vgic/vgic-init.c
 @@ -285,14 +285,8 @@ int vgic_init(struct kvm *kvm)
 if (ret)
 goto out;
 
 -   if (vgic_has_its(kvm)) { 
 +   if (vgic_has_its(kvm))
 dist->msis_require_devid = true;
 -   if (kvm_vgic_global_state.has_gicv4) {
 -   ret = vgic_v4_init(kvm);
 -   if (ret)
 -   goto out;
 -   }
 -   }
 
 kvm_for_each_vcpu(i, vcpu, kvm)
 kvm_vgic_vcpu_enable(vcpu);

 --- a/virt/kvm/arm/vgic/vgic-its.c
 +++ b/virt/kvm/arm/vgic/vgic-its.c
 @@ -1637,6 +1637,7 @@ static int vgic_register_its_iodev(struct kvm *kvm, 
struct
  static int vgic_its_create(struct kvm_device *dev, u32 type)
  {
 struct vgic_its *its;
 +   int ret;
 
 if (type != KVM_DEV_TYPE_ARM_VGIC_ITS) 
 return -ENODEV;
 @@ -1657,6 +1658,12 @@ static int vgic_its_create(struct kvm_device *dev, u32 
ty
 its->enabled = false;
 its->dev = dev;
  
 +   if (kvm_vgic_global_state.has_gicv4) {
 +   ret = vgic_v4_init(dev->kvm);
 +   if (ret)
 +   return -ENOMEM;
 +   }
 +


> + }
>  
>   kvm_for_each_vcpu(i, vcpu, kvm)
>   kvm_vgic_vcpu_enable(vcpu);
> @@ -323,6 +329,9 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
>  
>   kfree(dist->spis);
>   dist->nr_spis = 0;
> +
> + if (kvm_vgic_global_state.has_gicv4 && vgic_has_its(kvm))
> + vgic_v4_teardown(kvm);
>  }
>  
>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


RE: [PATCH v2 00/52] irqchip: KVM: Add support for GICv4

2017-07-01 Thread Shanker Donthineni
Hi Marc,

I've successfully verified the basic GICv4 functionality with the v2 series
plus Eric's IRQ bypass patches on the QDF2400 platform, with a minor change in
vgic-init.c. Nice, I don't see any deadlock or catastrophic issues
running on QCOM hardware. You can add my Tested-by; I'll provide comments
after reviewing the giant v2 series.

Tested-by: Shanker Donthineni 

-Original Message-
From: linux-arm-kernel [mailto:linux-arm-kernel-boun...@lists.infradead.org]
On Behalf Of Marc Zyngier
Sent: Wednesday, June 28, 2017 10:03 AM
To: linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
kvmarm@lists.cs.columbia.edu
Cc: Mark Rutland ; Jason Cooper
; Eric Auger ; Christoffer Dall
; Thomas Gleixner ; Shanker
Donthineni 
Subject: [PATCH v2 00/52] irqchip: KVM: Add support for GICv4

Yes, it's been a long time coming, but I really wasn't looking forward to
picking this up again. Anyway...

This (monster of a) series implements full support for GICv4, bringing
direct injection of MSIs to KVM on arm and arm64, assuming you have the
right hardware (which is quite unlikely).

To get an idea of the design, I'd recommend you start with patch #32, which
tries to shed some light on the approach that I've taken. And before that,
please digest some of the GICv3/GICv4 architecture documentation[1] (less
than 800 pages!). Once you feel reasonably insane, you'll be in the right
mood to read the code.

The structure of the series is fairly simple. The initial 34 patches add
some generic support for GICv4, while the rest of the code plugs KVM into
it. This series relies on Eric Auger's irq-bypass series[2], which is a
prerequisite for this work.

The stack has been *very lightly* tested on an arm64 model, with a PCI
virtio block device passed from the host to a guest (using kvmtool and
Jean-Philippe Brucker's excellent VFIO support patches[3]). As it has never
seen any HW, I expect things to be subtly broken, so go forward and test if
you can, though I'm mostly interested in people reviewing the code at the
moment.

I've pushed out a branch based on 4.12-rc6 containing the dependencies (as
well as a couple of debug patches):

git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
kvm-arm64/gicv4-kvm

* From v1:
  - The bulk of the 30-something initial patches have seen countless
bugs being fixed, and some key data structures have been subtly
tweaked (or killed altogether). They are still quite similar to
what I had in v1 though.
  - The whole KVM code is brand new and as I said above, only lightly
tested.
  - Collected a bunch a R-bs from Thomas and Eric (many thanks, guys).

[1]
https://static.docs.arm.com/ihi0069/c/IHI0069C_gic_architecture_specificatio
n.pdf
[2] http://www.spinics.net/lists/kvm/msg151463.html
[3] http://www.spinics.net/lists/kvm/msg151823.html

Marc Zyngier (52):
  genirq: Let irq_set_vcpu_affinity() iterate over hierarchy
  irqchip/gic-v3: Add redistributor iterator
  irqchip/gic-v3: Add VLPI/DirectLPI discovery
  irqchip/gic-v3-its: Move LPI definitions around
  irqchip/gic-v3-its: Add probing for VLPI properties
  irqchip/gic-v3-its: Macro-ize its_send_single_command
  irqchip/gic-v3-its: Implement irq_set_irqchip_state for pending state
  irqchip/gic-v3-its: Split out property table allocation
  irqchip/gic-v3-its: Allow use of indirect VCPU tables
  irqchip/gic-v3-its: Split out pending table allocation
  irqchip/gic-v3-its: Rework LPI freeing
  irqchip/gic-v3-its: Generalize device table allocation
  irqchip/gic-v3-its: Generalize LPI configuration
  irqchip/gic-v4: Add management structure definitions
  irqchip/gic-v3-its: Add GICv4 ITS command definitions
  irqchip/gic-v3-its: Add VLPI configuration hook
  irqchip/gic-v3-its: Add VLPI map/unmap operations
  irqchip/gic-v3-its: Add VLPI configuration handling
  irqchip/gic-v3-its: Add VPE domain infrastructure
  irqchip/gic-v3-its: Add VPE irq domain allocation/teardown
  irqchip/gic-v3-its: Add VPE irq domain [de]activation
  irqchip/gic-v3-its: Add VPENDBASER/VPROPBASER accessors
  irqchip/gic-v3-its: Add VPE scheduling
  irqchip/gic-v3-its: Add VPE invalidation hook
  irqchip/gic-v3-its: Add VPE affinity changes
  irqchip/gic-v3-its: Add VPE interrupt masking
  irqchip/gic-v3-its: Support VPE doorbell invalidation even when
!DirectLPI
  irqchip/gic-v3-its: Set implementation defined bit to enable VLPIs
  irqchip/gic-v4: Add per-VM VPE domain creation
  irqchip/gic-v4: Add VPE command interface
  irqchip/gic-v4: Add VLPI configuration interface
  irqchip/gic-v4: Add some basic documentation
  irqchip/gic-v4: Enable low-level GICv4 operations
  irqchip/gic-v3: Advertise GICv4 support to KVM
  KVM: arm/arm64: vgic: Move kvm_vgic_destroy call around
  KVM: arm/arm64: vITS: Add MSI translation helpers
  KVM: arm/arm64: GICv4: Add init and teardown of the vPE irq domain
  KVM: arm/arm64: GICv4: Wire init/teardown of per-VM support
  KVM: arm/arm

Re: [RFC PATCH 24/33] irqchip/gic-v3-its: Add VPE scheduling

2017-03-16 Thread Shanker Donthineni
Hi Eric,


On 03/16/2017 04:23 PM, Auger Eric wrote:
> Hi,
>
> On 17/01/2017 11:20, Marc Zyngier wrote:
>> When a VPE is scheduled to run, the corresponding redistributor must
>> be told so, by setting VPROPBASER to the VM's property table, and
>> VPENDBASER to the vcpu's pending table.
>>
>> When scheduled out, we preserve the IDAI and PendingLast bits. The
>> latter is specially important, as it tells the hypervisor that
>> there are pending interrupts for this vcpu.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  drivers/irqchip/irq-gic-v3-its.c   | 57 ++
>>  include/linux/irqchip/arm-gic-v3.h | 63 
>> ++
>>  2 files changed, 120 insertions(+)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
>> b/drivers/irqchip/irq-gic-v3-its.c
>> index 598e25b..f918d59 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -143,6 +143,7 @@ static DEFINE_IDA(its_vpeid_ida);
>>  
>>  #define gic_data_rdist()(raw_cpu_ptr(gic_rdists->rdist))
>>  #define gic_data_rdist_rd_base()(gic_data_rdist()->rd_base)
>> +#define gic_data_rdist_vlpi_base()  (gic_data_rdist_rd_base() + SZ_128K)
>>  
>>  static struct its_collection *dev_event_to_col(struct its_device *its_dev,
>> u32 event)
>> @@ -2039,8 +2040,64 @@ static const struct irq_domain_ops its_domain_ops = {
>>  .deactivate = its_irq_domain_deactivate,
>>  };
>>  
>> +static int its_vpe_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
>> +{
>> +struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
>> +struct its_cmd_info *info = vcpu_info;
>> +u64 val;
>> +
>> +switch (info->cmd_type) {
>> +case SCHEDULE_VPE:
>> +{
>> +void * __iomem vlpi_base = gic_data_rdist_vlpi_base();
>> +
>> +/* Schedule the VPE */
>> +val  = virt_to_phys(page_address(vpe->its_vm->vprop_page)) &
>> +GENMASK_ULL(51, 12);
>> +val |= (LPI_NRBITS - 1) & GICR_VPROPBASER_IDBITS_MASK;
>> +val |= GICR_VPROPBASER_RaWb;
>> +val |= GICR_VPROPBASER_InnerShareable;
>> +gits_write_vpropbaser(val, vlpi_base + GICR_VPROPBASER);
>> +
>> +val  = virt_to_phys(page_address(vpe->vpt_page)) & GENMASK(51, 
>> 16);
>> +val |= GICR_VPENDBASER_WaWb;
>> +val |= GICR_VPENDBASER_NonShareable;
>> +val |= GICR_PENDBASER_PendingLast;
> don't you want to restore the vpe->pending_last here? Anyway, I
> understand this will force the HW to read the LPI pending table.

It's not a good idea to always set the PendingLast bit. There is no correctness
issue, but it causes a huge impact on system performance: there is no need to
read the pending table contents from memory if no VLPIs are pending on the vPE
that is being scheduled.
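
For illustration, the conditional restore being argued for (assuming a
vpe->pending_last flag captured when the vPE was last descheduled) would look
something like:

	val  = virt_to_phys(page_address(vpe->vpt_page)) & GENMASK(51, 16);
	val |= GICR_VPENDBASER_WaWb;
	val |= GICR_VPENDBASER_NonShareable;
	if (vpe->pending_last)		/* only force a pending-table read when needed */
		val |= GICR_PENDBASER_PendingLast;
	gits_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);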

> Reviewed-by: Eric Auger 
>
> Eric
>
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [PATCH v2] arm64: kvm: Use has_vhe() instead of hyp_alternate_select()

2017-03-06 Thread Shanker Donthineni

Hi Marc,


On 03/06/2017 02:34 AM, Marc Zyngier wrote:

Hi Shanker,

On Mon, Mar 06 2017 at  2:33:18 am GMT, Shanker Donthineni 
 wrote:

Now all the cpu_hwcaps features have their own static keys. We don't
need a separate function hyp_alternate_select() to patch the vhe/nvhe
code. We can achieve the same functionality by using has_vhe(). It
improves code readability, uses jump label instructions, and the
compiler also generates better code with fewer instructions.

How do you define "better"? Which compiler? Do you have any benchmarking data?
I'm using gcc version 5.2.0. With has_vhe() it shows a smaller code
size, as shown below. I tried to benchmark
the code changes using Christoffer's microbench tool, but I'm not seeing a
noticeable difference on the QDF2400 platform.


hyp_alternate_select() uses BR/BLR instructions to dispatch the vhe/nvhe
code, which is not good for branch prediction. The compiler treats the
patched code as a function call, so the contents of registers x0-x18 are
not reusable after the vhe/nvhe call.
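
For reference: hyp_alternate_select(), roughly as defined in
arch/arm64/include/asm/kvm_hyp.h at the time, patches a function pointer via an
alternative; the caller then invokes it indirectly (hence the fn()() call
syntax and the indirect branch):

#define hyp_alternate_select(fname, orig_fn, alt_fn, cond)		\
typeof(orig_fn) * __hyp_text fname(void)				\
{									\
	typeof(alt_fn) *val = orig_fn;					\
	asm volatile(ALTERNATIVE("nop		\n",			\
				 "mov	%0, %1	\n",			\
				 cond)					\
		     : "+r" (val) : "r" ((typeof(orig_fn) *)alt_fn));	\
	return val;							\
}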


Current code:
arch/arm64/kvm/hyp/switch.o:     file format elf64-littleaarch64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000040  2**0
                  ALLOC
  3 .hyp.text     00000550  0000000000000000  0000000000000000  00000040  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE

New code:
arch/arm64/kvm/hyp/switch.o:     file format elf64-littleaarch64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000040  2**0
                  ALLOC
  3 .hyp.text     00000488  0000000000000000  0000000000000000  00000040  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE



Signed-off-by: Shanker Donthineni 
---
v2: removed 'Change-Id: Ia8084189833f2081ff13c392deb5070c46a64038' from commit

  arch/arm64/kvm/hyp/debug-sr.c  | 12 ++
  arch/arm64/kvm/hyp/switch.c| 50 +++---
  arch/arm64/kvm/hyp/sysreg-sr.c | 23 +--
  3 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index f5154ed..e5642c2 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -109,9 +109,13 @@ static void __hyp_text __debug_save_spe_nvhe(u64 
*pmscr_el1)
dsb(nsh);
  }
  
-static hyp_alternate_select(__debug_save_spe,

-   __debug_save_spe_nvhe, __debug_save_spe_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
+static void __hyp_text __debug_save_spe(u64 *pmscr_el1)
+{
+   if (has_vhe())
+   __debug_save_spe_vhe(pmscr_el1);
+   else
+   __debug_save_spe_nvhe(pmscr_el1);
+}

I have two worries about this kind of thing:
- Not all compilers do support jump labels, leading to a memory access
on each static key (GCC 4.8, for example). This would immediately
introduce a pretty big regression
- The hyp_alternate_select() method doesn't introduce a fast/slow path
duality. Each path has the exact same cost. I'm not keen on choosing
what is supposed to be the fast path, really.
Yes, it'll require a runtime check if the compiler doesn't support ASM
GOTO labels.
Agreed, hyp_alternate_select() has a constant branch overhead, but it
might cause a branch prediction penalty.
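
To illustrate the first point: without asm goto support, a static-key test
degrades to roughly a load plus a conditional branch, conceptually (not the
literal kernel code):

	if (atomic_read(&key->enabled) > 0)	/* load + test instead of a patched NOP */
		__debug_save_spe_vhe(pmscr_el1);
	else
		__debug_save_spe_nvhe(pmscr_el1);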



Thanks,

M.


--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH v2] arm64: kvm: Use has_vhe() instead of hyp_alternate_select()

2017-03-05 Thread Shanker Donthineni
Now all the cpu_hwcaps features have their own static keys. We don't
need a separate function hyp_alternate_select() to patch the vhe/nvhe
code. We can achieve the same functionality by using has_vhe(). It
improves code readability, uses jump label instructions, and the
compiler also generates better code with fewer instructions.

Signed-off-by: Shanker Donthineni 
---
v2: removed 'Change-Id: Ia8084189833f2081ff13c392deb5070c46a64038' from commit

 arch/arm64/kvm/hyp/debug-sr.c  | 12 ++
 arch/arm64/kvm/hyp/switch.c| 50 +++---
 arch/arm64/kvm/hyp/sysreg-sr.c | 23 +--
 3 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index f5154ed..e5642c2 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -109,9 +109,13 @@ static void __hyp_text __debug_save_spe_nvhe(u64 
*pmscr_el1)
dsb(nsh);
 }
 
-static hyp_alternate_select(__debug_save_spe,
-   __debug_save_spe_nvhe, __debug_save_spe_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
+static void __hyp_text __debug_save_spe(u64 *pmscr_el1)
+{
+   if (has_vhe())
+   __debug_save_spe_vhe(pmscr_el1);
+   else
+   __debug_save_spe_nvhe(pmscr_el1);
+}
 
 static void __hyp_text __debug_restore_spe(u64 pmscr_el1)
 {
@@ -180,7 +184,7 @@ void __hyp_text __debug_cond_save_host_state(struct 
kvm_vcpu *vcpu)
 
__debug_save_state(vcpu, &vcpu->arch.host_debug_state.regs,
   kern_hyp_va(vcpu->arch.host_cpu_context));
-   __debug_save_spe()(&vcpu->arch.host_debug_state.pmscr_el1);
+   __debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
 }
 
 void __hyp_text __debug_cond_restore_host_state(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index aede165..c5c77b8 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -33,13 +33,9 @@ static bool __hyp_text __fpsimd_enabled_vhe(void)
return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
 }
 
-static hyp_alternate_select(__fpsimd_is_enabled,
-   __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
-
 bool __hyp_text __fpsimd_enabled(void)
 {
-   return __fpsimd_is_enabled()();
+   return has_vhe() ? __fpsimd_enabled_vhe() : __fpsimd_enabled_nvhe();
 }
 
 static void __hyp_text __activate_traps_vhe(void)
@@ -63,9 +59,10 @@ static void __hyp_text __activate_traps_nvhe(void)
write_sysreg(val, cptr_el2);
 }
 
-static hyp_alternate_select(__activate_traps_arch,
-   __activate_traps_nvhe, __activate_traps_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
+static void __hyp_text __activate_traps_arch(void)
+{
+   has_vhe() ? __activate_traps_vhe() : __activate_traps_nvhe();
+}
 
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
@@ -97,7 +94,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
write_sysreg(0, pmselr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
-   __activate_traps_arch()();
+   __activate_traps_arch();
 }
 
 static void __hyp_text __deactivate_traps_vhe(void)
@@ -127,9 +124,10 @@ static void __hyp_text __deactivate_traps_nvhe(void)
write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
 }
 
-static hyp_alternate_select(__deactivate_traps_arch,
-   __deactivate_traps_nvhe, __deactivate_traps_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
+static void __hyp_text __deactivate_traps_arch(void)
+{
+   has_vhe() ? __deactivate_traps_vhe() : __deactivate_traps_nvhe();
+}
 
 static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 {
@@ -142,7 +140,7 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu 
*vcpu)
if (vcpu->arch.hcr_el2 & HCR_VSE)
vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
-   __deactivate_traps_arch()();
+   __deactivate_traps_arch();
write_sysreg(0, hstr_el2);
write_sysreg(0, pmuserenr_el0);
 }
@@ -183,20 +181,14 @@ static void __hyp_text __vgic_restore_state(struct 
kvm_vcpu *vcpu)
__vgic_v2_restore_state(vcpu);
 }
 
-static bool __hyp_text __true_value(void)
+static bool __check_arm_834220(void)
 {
-   return true;
-}
+   if (cpus_have_const_cap(ARM64_WORKAROUND_834220))
+   return true;
 
-static bool __hyp_text __false_value(void)
-{
return false;
 }
 
-static hyp_alternate_select(__check_arm_834220,
-   __false_value, __true_value,
-   ARM64_WORKAROUND_834220);
-
 static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar)
 {

[PATCH] arm64: kvm: Use has_vhe() instead of hyp_alternate_select()

2017-03-05 Thread Shanker Donthineni
Now all the cpu_hwcaps features have their own static keys. We don't
need a separate function hyp_alternate_select() to patch the vhe/nvhe
code. We can achieve the same functionality by using has_vhe(). It
improves code readability, uses jump label instructions, and the
compiler also generates better code with fewer instructions.

Change-Id: Ia8084189833f2081ff13c392deb5070c46a64038
Signed-off-by: Shanker Donthineni 
---
 arch/arm64/kvm/hyp/debug-sr.c  | 12 ++
 arch/arm64/kvm/hyp/switch.c| 50 +++---
 arch/arm64/kvm/hyp/sysreg-sr.c | 23 +--
 3 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index f5154ed..e5642c2 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -109,9 +109,13 @@ static void __hyp_text __debug_save_spe_nvhe(u64 
*pmscr_el1)
dsb(nsh);
 }
 
-static hyp_alternate_select(__debug_save_spe,
-   __debug_save_spe_nvhe, __debug_save_spe_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
+static void __hyp_text __debug_save_spe(u64 *pmscr_el1)
+{
+   if (has_vhe())
+   __debug_save_spe_vhe(pmscr_el1);
+   else
+   __debug_save_spe_nvhe(pmscr_el1);
+}
 
 static void __hyp_text __debug_restore_spe(u64 pmscr_el1)
 {
@@ -180,7 +184,7 @@ void __hyp_text __debug_cond_save_host_state(struct 
kvm_vcpu *vcpu)
 
__debug_save_state(vcpu, &vcpu->arch.host_debug_state.regs,
   kern_hyp_va(vcpu->arch.host_cpu_context));
-   __debug_save_spe()(&vcpu->arch.host_debug_state.pmscr_el1);
+   __debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
 }
 
 void __hyp_text __debug_cond_restore_host_state(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index aede165..c5c77b8 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -33,13 +33,9 @@ static bool __hyp_text __fpsimd_enabled_vhe(void)
return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
 }
 
-static hyp_alternate_select(__fpsimd_is_enabled,
-   __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
-
 bool __hyp_text __fpsimd_enabled(void)
 {
-   return __fpsimd_is_enabled()();
+   return has_vhe() ? __fpsimd_enabled_vhe() : __fpsimd_enabled_nvhe();
 }
 
 static void __hyp_text __activate_traps_vhe(void)
@@ -63,9 +59,10 @@ static void __hyp_text __activate_traps_nvhe(void)
write_sysreg(val, cptr_el2);
 }
 
-static hyp_alternate_select(__activate_traps_arch,
-   __activate_traps_nvhe, __activate_traps_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
+static void __hyp_text __activate_traps_arch(void)
+{
+   has_vhe() ? __activate_traps_vhe() : __activate_traps_nvhe();
+}
 
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
@@ -97,7 +94,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
write_sysreg(0, pmselr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
-   __activate_traps_arch()();
+   __activate_traps_arch();
 }
 
 static void __hyp_text __deactivate_traps_vhe(void)
@@ -127,9 +124,10 @@ static void __hyp_text __deactivate_traps_nvhe(void)
write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
 }
 
-static hyp_alternate_select(__deactivate_traps_arch,
-   __deactivate_traps_nvhe, __deactivate_traps_vhe,
-   ARM64_HAS_VIRT_HOST_EXTN);
+static void __hyp_text __deactivate_traps_arch(void)
+{
+   has_vhe() ? __deactivate_traps_vhe() : __deactivate_traps_nvhe();
+}
 
 static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 {
@@ -142,7 +140,7 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu 
*vcpu)
if (vcpu->arch.hcr_el2 & HCR_VSE)
vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
-   __deactivate_traps_arch()();
+   __deactivate_traps_arch();
write_sysreg(0, hstr_el2);
write_sysreg(0, pmuserenr_el0);
 }
@@ -183,20 +181,14 @@ static void __hyp_text __vgic_restore_state(struct 
kvm_vcpu *vcpu)
__vgic_v2_restore_state(vcpu);
 }
 
-static bool __hyp_text __true_value(void)
+static bool __check_arm_834220(void)
 {
-   return true;
-}
+   if (cpus_have_const_cap(ARM64_WORKAROUND_834220))
+   return true;
 
-static bool __hyp_text __false_value(void)
-{
return false;
 }
 
-static hyp_alternate_select(__check_arm_834220,
-   __false_value, __true_value,
-   ARM64_WORKAROUND_834220);
-
 static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar)
 {
u64 par, tmp;
@@ -251,7 +243,7 @@ static bool __hyp_tex

Re: [RFC PATCH 24/33] irqchip/gic-v3-its: Add VPE scheduling

2017-02-13 Thread Shanker Donthineni
R_CACHEABILITY_MASK
\
+   GIC_BASER_CACHEABILITY(GICR_VPROPBASER, OUTER, MASK)
+#define GICR_VPROPBASER_CACHEABILITY_MASK  \
+   GICR_VPROPBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_VPROPBASER_InnerShareable \
+   GIC_BASER_SHAREABILITY(GICR_VPROPBASER, InnerShareable)
+
+#define GICR_VPROPBASER_nCnB   GIC_BASER_CACHEABILITY(GICR_VPROPBASER,
INNER, nCnB)
+#define GICR_VPROPBASER_nC GIC_BASER_CACHEABILITY(GICR_VPROPBASER,
INNER, nC)
+#define GICR_VPROPBASER_RaWt   GIC_BASER_CACHEABILITY(GICR_VPROPBASER,
INNER, RaWt)
+#define GICR_VPROPBASER_RaWb   GIC_BASER_CACHEABILITY(GICR_VPROPBASER,
INNER, RaWt)
+#define GICR_VPROPBASER_WaWt   GIC_BASER_CACHEABILITY(GICR_VPROPBASER,
INNER, WaWt)
+#define GICR_VPROPBASER_WaWb   GIC_BASER_CACHEABILITY(GICR_VPROPBASER,
INNER, WaWb)
+#define GICR_VPROPBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_VPROPBASER,
INNER, RaWaWt)
+#define GICR_VPROPBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_VPROPBASER,
INNER, RaWaWb)
+
+#define GICR_VPENDBASER0x0078
+
+#define GICR_VPENDBASER_SHAREABILITY_SHIFT (10)
+#define GICR_VPENDBASER_INNER_CACHEABILITY_SHIFT   (7)
+#define GICR_VPENDBASER_OUTER_CACHEABILITY_SHIFT   (56)
+#define GICR_VPENDBASER_SHAREABILITY_MASK  \
+   GIC_BASER_SHAREABILITY(GICR_VPENDBASER, SHAREABILITY_MASK)
+#define GICR_VPENDBASER_INNER_CACHEABILITY_MASK
\
+   GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, MASK)
+#define GICR_VPENDBASER_OUTER_CACHEABILITY_MASK
\
+   GIC_BASER_CACHEABILITY(GICR_VPENDBASER, OUTER, MASK)
+#define GICR_VPENDBASER_CACHEABILITY_MASK  \
+   GICR_VPENDBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_VPENDBASER_NonShareable   \
+   GIC_BASER_SHAREABILITY(GICR_VPENDBASER, NonShareable)
+
+#define GICR_VPENDBASER_nCnB   GIC_BASER_CACHEABILITY(GICR_VPENDBASER,
INNER, nCnB)
+#define GICR_VPENDBASER_nC GIC_BASER_CACHEABILITY(GICR_VPENDBASER,
INNER, nC)
+#define GICR_VPENDBASER_RaWt   GIC_BASER_CACHEABILITY(GICR_VPENDBASER,
INNER, RaWt)
+#define GICR_VPENDBASER_RaWb   GIC_BASER_CACHEABILITY(GICR_VPENDBASER,
INNER, RaWt)
+#define GICR_VPENDBASER_WaWt   GIC_BASER_CACHEABILITY(GICR_VPENDBASER,
INNER, WaWt)
+#define GICR_VPENDBASER_WaWb   GIC_BASER_CACHEABILITY(GICR_VPENDBASER,
INNER, WaWb)
+#define GICR_VPENDBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER,
INNER, RaWaWt)
+#define GICR_VPENDBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_VPENDBASER,
INNER, RaWaWb)
+
+#define GICR_PENDBASER_Dirty   (1ULL << 60)
+#define GICR_PENDBASER_PendingLast (1ULL << 61)
+#define GICR_PENDBASER_IDAI(1ULL << 62)
+#define GICR_PENDBASER_Valid   (1ULL << 63)
+
+/*
   * ITS registers, offsets from ITS_base
   */
  #define GITS_CTLR 0x


--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [RFC PATCH 28/33] irqchip/gic-v3-its: Support VPE doorbell invalidation even when !DirectLPI

2017-02-13 Thread Shanker Donthineni

Hi Marc,


On 01/17/2017 04:20 AM, Marc Zyngier wrote:

When we don't have the DirectLPI feature, we must work around the
architecture shortcomings to be able to perform the required
invalidation.

For this, we create a fake device whose sole purpose is to
provide a way to issue a map/inv/unmap sequence (and the corresponding
sync operations). That's 6 commands and a full serialization point
to be able to do this.

You just have to hope the hypervisor won't do that too often...

Signed-off-by: Marc Zyngier 
---
  drivers/irqchip/irq-gic-v3-its.c | 59
++--
  1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c
b/drivers/irqchip/irq-gic-v3-its.c
index 008fb71..3787579 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -133,6 +133,9 @@ struct its_device {
u32 device_id;
  };
  
+static struct its_device *vpe_proxy_dev;

+static DEFINE_RAW_SPINLOCK(vpe_proxy_dev_lock);
+
  static LIST_HEAD(its_nodes);
  static DEFINE_SPINLOCK(its_lock);
  static struct rdists *gic_rdists;
@@ -993,8 +996,35 @@ static void lpi_update_config(struct irq_data *d, u8
clr, u8 set)
struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
void __iomem *rdbase;
  
-		rdbase = per_cpu_ptr(gic_rdists->rdist,

vpe->col_idx)->rd_base;
-   writeq_relaxed(d->hwirq, rdbase + GICR_INVLPIR);
+   if (gic_rdists->has_direct_lpi) {
+   rdbase = per_cpu_ptr(gic_rdists->rdist,
vpe->col_idx)->rd_base;
+   writeq_relaxed(d->hwirq, rdbase + GICR_INVLPIR);
+   } else {
+   /*
+* This is insane.
+*
+* If a GICv4 doesn't implement Direct LPIs,
+* the only way to perform an invalidate is to
+* use a fake device to issue a MAP/INV/UNMAP
+* sequence. Since each of these commands has
+* a sync operation, this is really fast. Not.
+*
+* We always use event 0, and this serialize
+* all VPE invalidations in the system.
+*
+* Broken by design(tm).
+*/
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&vpe_proxy_dev_lock, flags);
+
+   vpe_proxy_dev->event_map.col_map[0] =
vpe->col_idx;
+   its_send_mapvi(vpe_proxy_dev, vpe->vpe_db_lpi, 0);
+   its_send_inv(vpe_proxy_dev, 0);
+   its_send_discard(vpe_proxy_dev, 0);
+
+   raw_spin_unlock_irqrestore(&vpe_proxy_dev_lock,
flags);
+   }
}
  }
  
@@ -2481,6 +2511,31 @@ static struct irq_domain *its_init_vpe_domain(void)

struct fwnode_handle *handle;
struct irq_domain *domain;
  
+	if (gic_rdists->has_direct_lpi) {

+   pr_info("ITS: Using DirectLPI for VPE invalidation\n");
+   } else {
+   struct its_node *its;
+
+   list_for_each_entry(its, &its_nodes, entry) {
+   u32 devid;
+
+   if (!its->is_v4)
+   continue;
+
+   /* Use the last possible DevID */
+   devid = GENMASK(its->device_ids - 1, 0);
How do we know this 'devid' is not being used by real hardware devices?
I think we need some kind of check in its_msi_prepare() to skip this device
or WARN.
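
For illustration, a guard of the kind suggested (the helper name and exact call
site are hypothetical, not part of this series) could look like:

static bool its_dev_id_is_reserved(u32 dev_id)
{
	/* The VPE proxy device squats on the last possible DevID */
	return vpe_proxy_dev && dev_id == vpe_proxy_dev->device_id;
}

	/* in its_msi_prepare(), before allocating the its_device: */
	if (WARN_ON_ONCE(its_dev_id_is_reserved(dev_id)))
		return -EINVAL;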

Unfortunately, Qualcomm hardware doesn't support the Direct LPI feature.


--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [RFC PATCH 24/33] irqchip/gic-v3-its: Add VPE scheduling

2017-02-13 Thread Shanker Donthineni
 (1 << 0)
  
  /*

+ * Re-Distributor registers, offsets from VLPI_base
+ */
+#define GICR_VPROPBASER			0x0070
+
+#define GICR_VPROPBASER_IDBITS_MASK	0x1f
+
+#define GICR_VPROPBASER_SHAREABILITY_SHIFT		(10)
+#define GICR_VPROPBASER_INNER_CACHEABILITY_SHIFT	(7)
+#define GICR_VPROPBASER_OUTER_CACHEABILITY_SHIFT	(56)
+
+#define GICR_VPROPBASER_SHAREABILITY_MASK	\
+	GIC_BASER_SHAREABILITY(GICR_VPROPBASER, SHAREABILITY_MASK)
+#define GICR_VPROPBASER_INNER_CACHEABILITY_MASK	\
+	GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, MASK)
+#define GICR_VPROPBASER_OUTER_CACHEABILITY_MASK	\
+	GIC_BASER_CACHEABILITY(GICR_VPROPBASER, OUTER, MASK)
+#define GICR_VPROPBASER_CACHEABILITY_MASK	\
+	GICR_VPROPBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_VPROPBASER_InnerShareable	\
+	GIC_BASER_SHAREABILITY(GICR_VPROPBASER, InnerShareable)
+
+#define GICR_VPROPBASER_nCnB	GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, nCnB)
+#define GICR_VPROPBASER_nC	GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, nC)
+#define GICR_VPROPBASER_RaWt	GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWt)
+#define GICR_VPROPBASER_RaWb	GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWb)
+#define GICR_VPROPBASER_WaWt	GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, WaWt)
+#define GICR_VPROPBASER_WaWb	GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, WaWb)
+#define GICR_VPROPBASER_RaWaWt	GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWaWt)
+#define GICR_VPROPBASER_RaWaWb	GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWaWb)
+
+#define GICR_VPENDBASER			0x0078
+
+#define GICR_VPENDBASER_SHAREABILITY_SHIFT		(10)
+#define GICR_VPENDBASER_INNER_CACHEABILITY_SHIFT	(7)
+#define GICR_VPENDBASER_OUTER_CACHEABILITY_SHIFT	(56)
+#define GICR_VPENDBASER_SHAREABILITY_MASK	\
+	GIC_BASER_SHAREABILITY(GICR_VPENDBASER, SHAREABILITY_MASK)
+#define GICR_VPENDBASER_INNER_CACHEABILITY_MASK	\
+	GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, MASK)
+#define GICR_VPENDBASER_OUTER_CACHEABILITY_MASK	\
+	GIC_BASER_CACHEABILITY(GICR_VPENDBASER, OUTER, MASK)
+#define GICR_VPENDBASER_CACHEABILITY_MASK	\
+	GICR_VPENDBASER_INNER_CACHEABILITY_MASK
+
+#define GICR_VPENDBASER_NonShareable	\
+	GIC_BASER_SHAREABILITY(GICR_VPENDBASER, NonShareable)
+
+#define GICR_VPENDBASER_nCnB	GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, nCnB)
+#define GICR_VPENDBASER_nC	GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, nC)
+#define GICR_VPENDBASER_RaWt	GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWt)
+#define GICR_VPENDBASER_RaWb	GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWb)
+#define GICR_VPENDBASER_WaWt	GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, WaWt)
+#define GICR_VPENDBASER_WaWb	GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, WaWb)
+#define GICR_VPENDBASER_RaWaWt	GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWaWt)
+#define GICR_VPENDBASER_RaWaWb	GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWaWb)
+
+#define GICR_PENDBASER_Dirty		(1ULL << 60)
+#define GICR_PENDBASER_PendingLast	(1ULL << 61)
+#define GICR_PENDBASER_IDAI		(1ULL << 62)
+#define GICR_PENDBASER_Valid		(1ULL << 63)
+
+/*
   * ITS registers, offsets from ITS_base
   */
  #define GITS_CTLR 0x


--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [RFC PATCH 23/33] irqchip/gic-v3-its: Add VPENDBASER/VPROPBASER accessors

2017-02-13 Thread Shanker Donthineni

Hi Marc,


On 01/17/2017 04:20 AM, Marc Zyngier wrote:

V{PEND,PROP}BASER being 64bit registers, they need some ad-hoc
accessors on 32bit, especially given that VPENDBASER contains
a Valid bit, making the access a bit convoluted.

Signed-off-by: Marc Zyngier 
---
  arch/arm/include/asm/arch_gicv3.h   | 28 
  arch/arm64/include/asm/arch_gicv3.h |  5 +
  2 files changed, 33 insertions(+)

diff --git a/arch/arm/include/asm/arch_gicv3.h b/arch/arm/include/asm/arch_gicv3.h
index 2747590..3f18832 100644
--- a/arch/arm/include/asm/arch_gicv3.h
+++ b/arch/arm/include/asm/arch_gicv3.h
@@ -291,5 +291,33 @@ static inline u64 __gic_readq_nonatomic(const volatile void __iomem *addr)
   */
  #define gits_write_cwriter(v, c)  __gic_writeq_nonatomic(v, c)
  
+/*

+ * GITS_VPROPBASER - hi and lo bits may be accessed independently.
+ */
+#define gits_write_vpropbaser(v, c)__gic_writeq_nonatomic(v, c)
+
+/*
+ * GITS_VPENDBASER - the Valid bit must be cleared before changing
+ * anything else.
+ */
+static inline void gits_write_vpendbaser(u64 val, void * __iomem addr)
+{
+   u32 tmp;
+
+   tmp = readl_relaxed(addr + 4);
+   if (tmp & GICR_PENDBASER_Valid) {
+   tmp &= ~GICR_PENDBASER_Valid;
+   writel_relaxed(tmp, addr + 4);
+   }
+
+   /*
+* Use the fact that __gic_writeq_nonatomic writes the second
+* half of the 64bit quantity after the first.
+*/
+   __gic_writeq_nonatomic(val, addr);
I'm not sure whether software has to check the register write pending
bit GICR_CTLR.RWP here or not. The GICv3 spec says the effect of a write
to the GICR_VPENDBASER register is not guaranteed to be visible
throughout the affinity hierarchy, as indicated by GICR_CTLR.RWP == 0.
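
If it is required, I'd expect the Valid-bit clear to be followed by a
poll, along these lines (an illustrative sketch only: 'rdbase' would be
the redistributor base, which this 32-bit helper doesn't currently
receive, and a GICR_CTLR_RWP bit definition is assumed):

	tmp = readl_relaxed(addr + 4);
	if (tmp & GICR_PENDBASER_Valid) {
		tmp &= ~GICR_PENDBASER_Valid;
		writel_relaxed(tmp, addr + 4);
		/* hypothetical: wait for the clear to take effect */
		while (readl_relaxed(rdbase + GICR_CTLR) & GICR_CTLR_RWP)
			cpu_relax();
	}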


--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [RFC PATCH 21/33] irqchip/gic-v3-its: Add VPE irq domain allocation/teardown

2017-02-13 Thread Shanker Donthineni

Hi Marc,


On 01/17/2017 04:20 AM, Marc Zyngier wrote:

When creating a VM, the low level GICv4 code is responsible for:
- allocating each VPE a unique VPEID
- allocating a doorbell interrupt for each VPE
- allocating the pending tables for each VPE
- allocating the property table for the VM

This of course has to be reversed when the VM is brought down.

All of this is wired into the irq domain alloc/free methods.

Signed-off-by: Marc Zyngier 
---
 drivers/irqchip/irq-gic-v3-its.c | 174 +++
  1 file changed, 174 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index ddd8096..54d0075 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -139,6 +139,7 @@ static struct rdists *gic_rdists;
  static struct irq_domain *its_parent;
  
  static unsigned long its_list_map;

+static DEFINE_IDA(its_vpeid_ida);
  
  #define gic_data_rdist()		(raw_cpu_ptr(gic_rdists->rdist))

  #define gic_data_rdist_rd_base()  (gic_data_rdist()->rd_base)
@@ -1146,6 +1147,11 @@ static struct page *its_allocate_prop_table(gfp_t gfp_flags)
return prop_page;
  }
  
+static void its_free_prop_table(struct page *prop_page)

+{
+   free_pages((unsigned long)page_address(prop_page),
+  get_order(LPI_PROPBASE_SZ));
+}
  
  static int __init its_alloc_lpi_tables(void)

  {
@@ -1444,6 +1450,12 @@ static struct page *its_allocate_pending_table(gfp_t gfp_flags)
return pend_page;
  }
  
+static void its_free_pending_table(struct page *pt)

+{
+   free_pages((unsigned long)page_address(pt),
+  get_order(max(LPI_PENDBASE_SZ, SZ_64K)));
+}
+
  static void its_cpu_init_lpis(void)
  {
void __iomem *rbase = gic_data_rdist_rd_base();
@@ -1666,6 +1678,34 @@ static bool its_alloc_device_table(struct its_node *its, u32 dev_id)
return its_alloc_table_entry(baser, dev_id);
  }
  
+static bool its_alloc_vpe_table(u32 vpe_id)

+{
+   struct its_node *its;
+
+   /*
+* Make sure the L2 tables are allocated on *all* v4 ITSs. We
+* could try and only do it on ITSs corresponding to devices
+* that have interrupts targeted at this VPE, but the
+* complexity becomes crazy (and you have tons of memory
+* anyway, right?).
+*/
+   list_for_each_entry(its, &its_nodes, entry) {
+   struct its_baser *baser;
+
+   if (!its->is_v4)
+   continue;
+
+   baser = its_get_baser(its, GITS_BASER_TYPE_VCPU);
+   if (!baser)
+   return false;
+
+   if (!its_alloc_table_entry(baser, vpe_id))
+   return false;
+   }
+
+   return true;
+}
+
  static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
					      int nvecs)
  {
@@ -1922,7 +1962,141 @@ static struct irq_chip its_vpe_irq_chip = {
.name   = "GICv4-vpe",
  };
  
+static int its_vpe_id_alloc(void)

+{
+   return ida_simple_get(&its_vpeid_ida, 0, 1 << 16, GFP_KERNEL);
+}
+
+static void its_vpe_id_free(u16 id)
+{
+   ida_simple_remove(&its_vpeid_ida, id);
+}
+
+static int its_vpe_init(struct its_vpe *vpe)
+{
+   struct page *vpt_page;
+   int vpe_id;
+
+   /* Allocate vpe_id */
+   vpe_id = its_vpe_id_alloc();
+   if (vpe_id < 0)
+   return vpe_id;
+
+   /* Allocate VPT */
+   vpt_page = its_allocate_pending_table(GFP_KERNEL);
+   if (vpt_page) {



Change to 'if (!vpt_page)'.


--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [RFC PATCH 17/33] irqchip/gic-v3-its: Add VLPI configuration hook

2017-02-13 Thread Shanker Donthineni



On 01/17/2017 04:20 AM, Marc Zyngier wrote:

Add the skeleton irq_set_vcpu_affinity method that will be used
to configure VLPIs.

Signed-off-by: Marc Zyngier 
---
  drivers/irqchip/irq-gic-v3-its.c | 33 +
  1 file changed, 33 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 0dbc8b0..1bd78ca 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -36,6 +36,7 @@
  
  #include 

  #include 
+#include 
  
  #include 

  #include 
@@ -771,6 +772,37 @@ static int its_irq_set_irqchip_state(struct irq_data *d,
 	return 0;
  }
  
+static int its_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)

+{
+   struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+   struct its_cmd_info *info = vcpu_info;
+   u32 event = its_get_event_id(d);
+
+   /* Need a v4 ITS */
+   if (!its_dev->its->is_v4 || !info)
+   return -EINVAL;
+
+   switch (info->cmd_type) {
+   case MAP_VLPI:
+   {
+   return 0;
+   }
+
+   case UNMAP_VLPI:
+   {
+   return 0;
+   }
+
+   case PROP_UPDATE_VLPI:
+   {
+   return 0;
+   }
+
+   default:
+   return -EINVAL;
+   }

Missing a return statement.

--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [RFC PATCH 11/33] irqchip/gic-v3-its: Split out pending table allocation

2017-02-13 Thread Shanker Donthineni

Hi Marc,


On 01/17/2017 04:20 AM, Marc Zyngier wrote:

Just as for the property table, let's move the pending table
allocation to a separate function.

Signed-off-by: Marc Zyngier 
---
  drivers/irqchip/irq-gic-v3-its.c | 29 -
  1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 14305db1..dce8f8c 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1188,6 +1188,24 @@ static int its_alloc_collections(struct its_node *its)
 	return 0;
  }
  
+static struct page *its_allocate_pending_table(gfp_t gfp_flags)

+{
The PEND and PROP table sizes are defined as compile-time macros, but
per the ITS spec an implementation with a 24-bit LPI space is also
possible. It would be nicer to parametrize both table sizes so that it
would be easier to enable 24-bit LPIs later.

Actually, Qualcomm server chips support 24-bit IDBITS.
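
A sketch of what I mean -- illustrative only; 'lpi_id_bits' is a made-up
parameter, and LPI INTIDs start at 8192:

	static unsigned long lpi_prop_table_size(unsigned int lpi_id_bits)
	{
		/* one configuration byte per LPI */
		return (1UL << lpi_id_bits) - 8192;
	}

	static unsigned long lpi_pend_table_size(unsigned int lpi_id_bits)
	{
		/* one pending bit per INTID, with a 64K floor */
		return max_t(unsigned long, (1UL << lpi_id_bits) / 8, SZ_64K);
	}

The callers would then derive 'lpi_id_bits' from GICD_TYPER.IDbits
(capped to whatever the driver supports) instead of using the fixed
LPI_PROPBASE_SZ/LPI_PENDBASE_SZ constants.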

--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [RFC PATCH 10/33] irqchip/gic-v4-its: Allow use of indirect VCPU tables

2017-02-13 Thread Shanker Donthineni

Hi Marc,


On 01/17/2017 04:20 AM, Marc Zyngier wrote:

The VCPU tables can be quite sparse as well, and it makes sense
to use indirect tables as well if possible.
The VCPU table has a maximum of 2^16 entries, compared to 2^32 entries
in the device table. ITS hardware implementations may not support an
indirect table here because the memory requirement is already low.

Signed-off-by: Marc Zyngier 
---
  drivers/irqchip/irq-gic-v3-its.c | 20 +---
  1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index c92ff4d..14305db1 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1060,10 +1060,13 @@ static int its_setup_baser(struct its_node *its, struct its_baser *baser,
return 0;
  }
  
-static bool its_parse_baser_device(struct its_node *its, struct its_baser *baser,
-				   u32 psz, u32 *order)
+static bool its_parse_indirect_baser(struct its_node *its,
+				     struct its_baser *baser,
+				     u32 psz, u32 *order)
  {
-   u64 esz = GITS_BASER_ENTRY_SIZE(its_read_baser(its, baser));
+   u64 tmp = its_read_baser(its, baser);
+   u64 type = GITS_BASER_TYPE(tmp);
+   u64 esz = GITS_BASER_ENTRY_SIZE(tmp);
u64 val = GITS_BASER_InnerShareable | GITS_BASER_WaWb;
u32 ids = its->device_ids;
u32 new_order = *order;
@@ -1102,8 +1105,9 @@ static bool its_parse_baser_device(struct its_node *its, struct its_baser *baser
 	if (new_order >= MAX_ORDER) {
 		new_order = MAX_ORDER - 1;
 		ids = ilog2(PAGE_ORDER_TO_SIZE(new_order) / (int)esz);
-		pr_warn("ITS@%pa: Device Table too large, reduce ids %u->%u\n",
-			&its->phys_base, its->device_ids, ids);
+		pr_warn("ITS@%pa: %s Table too large, reduce ids %u->%u\n",
+			&its->phys_base, its_base_type_string[type],
+			its->device_ids, ids);
 	}
  
  	*order = new_order;

@@ -1154,8 +1158,10 @@ static int its_alloc_tables(struct its_node *its)
if (type == GITS_BASER_TYPE_NONE)
continue;
  
-		if (type == GITS_BASER_TYPE_DEVICE)
-			indirect = its_parse_baser_device(its, baser, psz, &order);
Try to allocate as much memory as possible first, and then attempt to
enable the indirect table:

#define ITS_VPES_MAX	(65536)

	if (type == GITS_BASER_TYPE_VCPU)
		order = get_order(esz * ITS_VPES_MAX);

On the Qualcomm implementation, 1 MByte of memory (65536 vPEs * 16-byte
vPE entry size) is enough to cover the whole 16-bit vPE space with a
flat table.

+   if (type == GITS_BASER_TYPE_DEVICE ||
+   type == GITS_BASER_TYPE_VCPU)
+   indirect = its_parse_indirect_baser(its, baser,
+   psz, &order);
  
 		err = its_setup_baser(its, baser, cache, shr, psz, order,
 				      indirect);
if (err < 0) {


--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [RFC PATCH 06/33] irqchip/gic-v3-its: Add probing for VLPI properties

2017-02-13 Thread Shanker Donthineni



On 01/17/2017 04:20 AM, Marc Zyngier wrote:

Add the probing code for the ITS VLPI support. This includes
configuring the ITS number if the single VMOVP command feature
is not supported.

Signed-off-by: Marc Zyngier 
---
 drivers/irqchip/irq-gic-v3-its.c   | 47 ++
  include/linux/irqchip/arm-gic-v3.h |  4 
  2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 9304dd2..99f6130 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -103,6 +103,7 @@ struct its_node {
u32 ite_size;
u32 device_ids;
int numa_node;
+   boolis_v4;
  };
  
  #define ITS_ITT_ALIGN		SZ_256

@@ -135,6 +136,8 @@ static DEFINE_SPINLOCK(its_lock);
  static struct rdists *gic_rdists;
  static struct irq_domain *its_parent;
  
+static unsigned long its_list_map;

+
  #define gic_data_rdist()  (raw_cpu_ptr(gic_rdists->rdist))
  #define gic_data_rdist_rd_base()  (gic_data_rdist()->rd_base)
  
@@ -1661,8 +1664,8 @@ static int __init its_probe_one(struct resource *res,
  {
struct its_node *its;
void __iomem *its_base;
-   u32 val;
-   u64 baser, tmp;
+   u32 val, ctlr;
+   u64 baser, tmp, typer;
int err;
  
  	its_base = ioremap(res->start, resource_size(res));

@@ -1695,9 +1698,44 @@ static int __init its_probe_one(struct resource *res,
raw_spin_lock_init(&its->lock);
INIT_LIST_HEAD(&its->entry);
INIT_LIST_HEAD(&its->its_device_list);
+   typer = gic_read_typer(its_base + GITS_TYPER);
its->base = its_base;
its->phys_base = res->start;
-	its->ite_size = ((gic_read_typer(its_base + GITS_TYPER) >> 4) & 0xf) + 1;
+   its->ite_size = ((typer >> 4) & 0xf) + 1;

I think we should move the bit manipulation to a macro, something like
this:

	its->ite_size = GITS_TYPER_ITEBITS(typer);

#define GITS_TYPER_ITEBITS_SHIFT	4
#define GITS_TYPER_ITEBITS(r)		((((r) >> GITS_TYPER_ITEBITS_SHIFT) & 0xf) + 1)



+   its->is_v4 = !!(typer & GITS_TYPER_VLPIS);
+   if (its->is_v4 && !(typer & GITS_TYPER_VMOVP)) {
+   int its_number;
+
+   its_number = find_first_zero_bit(&its_list_map, 16);
+   if (its_number >= 16) {
+   pr_err("ITS@%pa: No ITSList entry available!\n",
+  &res->start);
+   err = -EINVAL;
+   goto out_free_its;
+   }
+
+   ctlr = readl_relaxed(its_base + GITS_CTLR);
+   ctlr &= ~GITS_CTLR_ITS_NUMBER;
+   ctlr |= its_number << GITS_CTLR_ITS_NUMBER_SHIFT;
+   writel_relaxed(ctlr, its_base + GITS_CTLR);
+   ctlr = readl_relaxed(its_base + GITS_CTLR);
+		if ((ctlr & GITS_CTLR_ITS_NUMBER) !=
+		    (its_number << GITS_CTLR_ITS_NUMBER_SHIFT)) {
+			its_number = ctlr & GITS_CTLR_ITS_NUMBER;
+			its_number >>= GITS_CTLR_ITS_NUMBER_SHIFT;
+		}
+
+   if (test_and_set_bit(its_number, &its_list_map)) {
+   pr_err("ITS@%pa: Duplicate ITSList entry %d\n",
+  &res->start, its_number);
+   err = -EINVAL;
+   goto out_free_its;
+   }
+
+		pr_info("ITS@%pa: Using ITS number %d\n", &res->start,
+			its_number);
+   } else {
+   pr_info("ITS@%pa: Single VMOVP capable\n", &res->start);
+   }

Can we move this to a separate function, for code readability purposes?

--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [RFC PATCH 02/33] irqchip/gic-v3: Add VLPI/DirectLPI discovery

2017-02-13 Thread Shanker Donthineni



On 01/17/2017 04:20 AM, Marc Zyngier wrote:

Add helper functions that probe for VLPI and DirectLPI properties.

Signed-off-by: Marc Zyngier 
---
  drivers/irqchip/irq-gic-v3.c   | 22 ++
  include/linux/irqchip/arm-gic-v3.h |  3 +++
  2 files changed, 25 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 5cadec0..8a6de91 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -514,6 +514,24 @@ static int gic_populate_rdist(void)
return -ENODEV;
  }
  
+static int __gic_update_vlpi_properties(struct redist_region *region,
+					void __iomem *ptr)
+{
+	u64 typer = gic_read_typer(ptr + GICR_TYPER);
+	gic_data.rdists.has_vlpis &= !!(typer & GICR_TYPER_VLPIS);
+	gic_data.rdists.has_direct_lpi &= !!(typer & GICR_TYPER_DirectLPIS);
+
+   return 1;
+}
+
+static void gic_update_vlpi_properties(void)
+{
+   gic_scan_rdist_properties(__gic_update_vlpi_properties);
+   pr_info("%sVLPI support, %sdirect LPI support\n",

Would it be better to keep one space after 'no'?
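
i.e. presumably something along these lines:

	pr_info("%sVLPI support, %sdirect LPI support\n",
		gic_data.rdists.has_vlpis ? "" : "no ",
		gic_data.rdists.has_direct_lpi ? "" : "no ");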

--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [RFC PATCH 01/33] irqchip/gic-v3: Add redistributor iterator

2017-02-13 Thread Shanker Donthineni

Hi Marc,


On 01/17/2017 04:20 AM, Marc Zyngier wrote:

In order to discover the VLPI properties, we need to iterate over
the redistributor regions. As we already have code that does this,
let's factor it out and make it slightly more generic.

Signed-off-by: Marc Zyngier 
---
 drivers/irqchip/irq-gic-v3.c | 77 
  1 file changed, 56 insertions(+), 21 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index c132f29..5cadec0 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -421,24 +421,15 @@ static void __init gic_dist_init(void)
gic_write_irouter(affinity, base + GICD_IROUTER + i * 8);
  }
  
-static int gic_populate_rdist(void)

+static int gic_scan_rdist_properties(int (*fn)(struct redist_region *,
+  void __iomem *))
I don't see this function parsing GICR properties; maybe renaming it to
gic_redist_iterator() would make it more readable.



  {
-   unsigned long mpidr = cpu_logical_map(smp_processor_id());
-   u64 typer;
-   u32 aff;
+   int ret = 0;
For readability purposes, set ret = -ENODEV here to cover the error
case where gic_data.nr_redist_regions == 0.

int i;
  
-	/*

-* Convert affinity to a 32bit value that can be matched to
-* GICR_TYPER bits [63:32].
-*/
-   aff = (MPIDR_AFFINITY_LEVEL(mpidr, 3) << 24 |
-  MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16 |
-  MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8 |
-  MPIDR_AFFINITY_LEVEL(mpidr, 0));
-
 	for (i = 0; i < gic_data.nr_redist_regions; i++) {
 		void __iomem *ptr = gic_data.redist_regions[i].redist_base;
+		u64 typer;
 		u32 reg;
 
 		reg = readl_relaxed(ptr + GICR_PIDR2) & GIC_PIDR2_ARCH_MASK;
@@ -450,14 +441,14 @@ static int gic_populate_rdist(void)
 
 		do {
 			typer = gic_read_typer(ptr + GICR_TYPER);
-			if ((typer >> 32) == aff) {
-				u64 offset = ptr - gic_data.redist_regions[i].redist_base;
-				gic_data_rdist_rd_base() = ptr;
-				gic_data_rdist()->phys_base = gic_data.redist_regions[i].phys_base + offset;
-				pr_info("CPU%d: found redistributor %lx region %d:%pa\n",
-					smp_processor_id(), mpidr, i,
-					&gic_data_rdist()->phys_base);
+			ret = fn(gic_data.redist_regions + i, ptr);
+			switch (ret) {
+			case 0:
 				return 0;
+			case -1:
+				break;
+			default:
+				ret = 0;
 			}
 
 			if (gic_data.redist_regions[i].single_redist)
 
@@ -473,9 +464,53 @@ static int gic_populate_rdist(void)
 		} while (!(typer & GICR_TYPER_LAST));
  
  
+	if (ret == -1)
+		ret = -ENODEV;
+
__gic_populate_rdist() returns 1 to try the next entry in the list. We
should not return 0 here when no matching entry is found; otherwise
gic_populate_rdist() assumes that it found the corresponding GICR.
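
i.e. instead of the unconditional 'return 0' below, something like:

	if (ret == -1)
		ret = -ENODEV;

	return ret;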


+   return 0;
+}
+
+static int __gic_populate_rdist(struct redist_region *region, void __iomem *ptr)
+{
+   unsigned long mpidr = cpu_logical_map(smp_processor_id());
+   u64 typer;
+   u32 aff;
+
+   /*
+* Convert affinity to a 32bit value that can be matched to
+* GICR_TYPER bits [63:32].
+*/
+   aff = (MPIDR_AFFINITY_LEVEL(mpidr, 3) << 24 |
+  MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16 |
+  MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8 |
+  MPIDR_AFFINITY_LEVEL(mpidr, 0));
+
+   typer = gic_read_typer(ptr + GICR_TYPER);
+   if ((typer >> 32) == aff) {
+   u64 offset = ptr - region->redist_base;
+   gic_data_rdist_rd_base() = ptr;
+   gic_data_rdist()->phys_base = region->phys_base + offset;
+
+   pr_info("CPU%d: found redistributor %lx region %d:%pa\n",
+   smp_processor_id(), mpidr,
+   (int)(region - gic_data.redist_regions),
+   &gic_data_rdist()->phys_base);
+   return 0;
+   }
+
+   /* Try next one */
+   return 1;
+}
+
+static int gic_populate_rdist(void)
+{
+   if (gic_scan_rdist_properties(__gic_populate_rdist) == 0)

What about 'if (!gic_scan_rdist_properties(__gic_populate_rdist))'?

+   return 0;
+
/* We couldn't even deal with ourselves... */
WARN(true, "CPU%d: mpidr %lx has no re-distributor!\n",
-smp_processor_id(), mpidr);
+ 

[RESEND PATCH] KVM: arm/arm64: vgic: Stop injecting the MSI occurrence twice

2017-02-03 Thread Shanker Donthineni
The IRQFD framework calls the architecture-dependent function
twice if the corresponding GSI type is edge-triggered. For ARM,
the function kvm_set_msi() gets called twice whenever the IRQFD
receives the event signal, and the rest of the code path tries
to inject the MSI without any validation checks. There is no need
to call vgic_its_inject_msi() a second time; skipping it avoids
unnecessary overhead in the IRQ queue logic, and it also avoids
the possibility of the VM seeing the MSI twice.

Simple fix: return -1 if the argument 'level' is zero.

Signed-off-by: Shanker Donthineni 
Reviewed-by: Eric Auger 
Reviewed-by: Christoffer Dall 
---
Forgot to CC the kvmarm list earlier, including now.

 virt/kvm/arm/vgic/vgic-irqfd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/virt/kvm/arm/vgic/vgic-irqfd.c b/virt/kvm/arm/vgic/vgic-irqfd.c
index d918dcf..f138ed2 100644
--- a/virt/kvm/arm/vgic/vgic-irqfd.c
+++ b/virt/kvm/arm/vgic/vgic-irqfd.c
@@ -99,6 +99,9 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
if (!vgic_has_its(kvm))
return -ENODEV;
 
+   if (!level)
+   return -1;
+
return vgic_its_inject_msi(kvm, &msi);
 }
 
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH v3] arm64: KVM: Optimize __guest_enter/exit() to save a few instructions

2016-08-30 Thread Shanker Donthineni
We are doing an unnecessary stack push/pop operation when restoring
the guest registers x0-x18 in __guest_enter(). This patch saves two
instructions by using x18 as a base register. There is no need to
store the vcpu context pointer on the stack because it is redundant;
the same information is available in tpidr_el2. The __guest_exit()
calling convention is slightly modified: the caller now pushes only
regs x0-x1 to the stack instead of regs x0-x3.

Signed-off-by: Shanker Donthineni 
Reviewed-by: Christoffer Dall 
---
Tested this patch using the Qualcomm QDF24XXX platform.

Changes since v2:
  Removed macros save_x0_to_x3/restore_x0_to_x3.
  Modified el1_sync() to use regs x0 and x1.
  Edited commit text.

Changes since v1:
  Incorporated Christoffer's suggestions.
  __guest_exit prototype is changed to 'void __guest_exit(u64 reason, struct 
kvm_vcpu *vcpu)'.

 arch/arm64/kvm/hyp/entry.S | 101 -
 arch/arm64/kvm/hyp/hyp-entry.S |  37 ++-
 2 files changed, 63 insertions(+), 75 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index ce9e5e5..3967c231 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -55,79 +55,78 @@
  */
 ENTRY(__guest_enter)
// x0: vcpu
-   // x1: host/guest context
-   // x2-x18: clobbered by macros
+   // x1: host context
+   // x2-x17: clobbered by macros
+   // x18: guest context
 
// Store the host regs
save_callee_saved_regs x1
 
-   // Preserve vcpu & host_ctxt for use at exit time
-   stp x0, x1, [sp, #-16]!
+   // Store the host_ctxt for use at exit time
+   str x1, [sp, #-16]!
 
-   add x1, x0, #VCPU_CONTEXT
+   add x18, x0, #VCPU_CONTEXT
 
-   // Prepare x0-x1 for later restore by pushing them onto the stack
-   ldp x2, x3, [x1, #CPU_XREG_OFFSET(0)]
-   stp x2, x3, [sp, #-16]!
+   // Restore guest regs x0-x17
+   ldp x0, x1,   [x18, #CPU_XREG_OFFSET(0)]
+   ldp x2, x3,   [x18, #CPU_XREG_OFFSET(2)]
+   ldp x4, x5,   [x18, #CPU_XREG_OFFSET(4)]
+   ldp x6, x7,   [x18, #CPU_XREG_OFFSET(6)]
+   ldp x8, x9,   [x18, #CPU_XREG_OFFSET(8)]
+   ldp x10, x11, [x18, #CPU_XREG_OFFSET(10)]
+   ldp x12, x13, [x18, #CPU_XREG_OFFSET(12)]
+   ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
+   ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
 
-   // x2-x18
-   ldp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
-   ldp x4, x5,   [x1, #CPU_XREG_OFFSET(4)]
-   ldp x6, x7,   [x1, #CPU_XREG_OFFSET(6)]
-   ldp x8, x9,   [x1, #CPU_XREG_OFFSET(8)]
-   ldp x10, x11, [x1, #CPU_XREG_OFFSET(10)]
-   ldp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
-   ldp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
-   ldp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
-   ldr x18,  [x1, #CPU_XREG_OFFSET(18)]
-
-   // x19-x29, lr
-   restore_callee_saved_regs x1
-
-   // Last bits of the 64bit state
-   ldp x0, x1, [sp], #16
+   // Restore guest regs x19-x29, lr
+   restore_callee_saved_regs x18
+
+   // Restore guest reg x18
+   ldr x18,  [x18, #CPU_XREG_OFFSET(18)]
 
// Do not touch any register after this!
eret
 ENDPROC(__guest_enter)
 
 ENTRY(__guest_exit)
-   // x0: vcpu
-   // x1: return code
-   // x2-x3: free
-   // x4-x29,lr: vcpu regs
-   // vcpu x0-x3 on the stack
-
-   add x2, x0, #VCPU_CONTEXT
-
-   stp x4, x5,   [x2, #CPU_XREG_OFFSET(4)]
-   stp x6, x7,   [x2, #CPU_XREG_OFFSET(6)]
-   stp x8, x9,   [x2, #CPU_XREG_OFFSET(8)]
-   stp x10, x11, [x2, #CPU_XREG_OFFSET(10)]
-   stp x12, x13, [x2, #CPU_XREG_OFFSET(12)]
-   stp x14, x15, [x2, #CPU_XREG_OFFSET(14)]
-   stp x16, x17, [x2, #CPU_XREG_OFFSET(16)]
-   str x18,  [x2, #CPU_XREG_OFFSET(18)]
-
-   ldp x6, x7, [sp], #16   // x2, x3
-   ldp x4, x5, [sp], #16   // x0, x1
-
-   stp x4, x5, [x2, #CPU_XREG_OFFSET(0)]
-   stp x6, x7, [x2, #CPU_XREG_OFFSET(2)]
+   // x0: return code
+   // x1: vcpu
+   // x2-x29,lr: vcpu regs
+   // vcpu x0-x1 on the stack
+
+   add x1, x1, #VCPU_CONTEXT
+
+   // Store the guest regs x2 and x3
+   stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
+
+   // Retrieve the guest regs x0-x1 from the stack
+   ldp x2, x3, [sp], #16   // x0, x1
+
+   // Store the guest regs x0-x1 and x4-x18
+   stp x2, x3,   [x1, #CPU_XREG_OFFSET(0)]
+   stp x4, x5,   [x1, #CPU_XREG_OFFSET(4)]
+   stp x6, x7,   [x1, #CPU_XREG_OFFSET(6)]
+   stp x8, x9,   [x1, #CPU_XREG_OFFSET(8)]
+   stp x10, x11, [x1, #CPU_XREG_OFFSET(10)]
+   stp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
+   stp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
+   stp x16, x17, [x1, #C

Re: [PATCH v2] arm64: KVM: Save four instructions in __guest_enter/exit()

2016-08-30 Thread Shanker Donthineni

Hi Marc,


On 08/30/2016 05:54 AM, Marc Zyngier wrote:

On 30/08/16 10:55, Christoffer Dall wrote:

On Mon, Aug 29, 2016 at 10:51:14PM -0500, Shanker Donthineni wrote:

We are doing an unnecessary stack push/pop operation when restoring
the guest registers x0-x18 in __guest_enter(). This patch saves the
two instructions by using x18 as a base register. No need to store
the vcpu context pointer in stack because it is redundant, the same
information is available in tpidr_el2. The function __guest_exit()
prototype is simplified and caller pushes the regs x0-x1 to stack
instead of regs x0-x3.

Signed-off-by: Shanker Donthineni 

This looks reasonable to me:

Reviewed-by: Christoffer Dall 

Unless Marc has any insight into this having a negative effect on ARM
CPUs, I'll go ahead an merge this.

I've given it a go on Seattle, and couldn't observe any difference with
the original code, which is pretty good news!

I have some comments below, though:


-Christoffer


---
Changes since v1:
   Incorporated Cristoffer suggestions.
   __guest_exit prototype is changed to 'void __guest_exit(u64 reason,

struct kvm_vcpu *vcpu)'.

  arch/arm64/kvm/hyp/entry.S | 101

+

  arch/arm64/kvm/hyp/hyp-entry.S |  11 +++--
  2 files changed, 57 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index ce9e5e5..f70489a 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -55,75 +55,76 @@
   */
  ENTRY(__guest_enter)
// x0: vcpu
-   // x1: host/guest context
-   // x2-x18: clobbered by macros
+   // x1: host context
+   // x2-x17: clobbered by macros
+   // x18: guest context
  
  	// Store the host regs

save_callee_saved_regs x1
  
-	// Preserve vcpu & host_ctxt for use at exit time

-   stp x0, x1, [sp, #-16]!
+   // Store the host_ctxt for use at exit time
+   str x1, [sp, #-16]!
  
-	add	x1, x0, #VCPU_CONTEXT

+   add x18, x0, #VCPU_CONTEXT
  
-	// Prepare x0-x1 for later restore by pushing them onto the stack

-   ldp x2, x3, [x1, #CPU_XREG_OFFSET(0)]
-   stp x2, x3, [sp, #-16]!
+   // Restore guest regs x0-x17
+   ldp x0, x1,   [x18, #CPU_XREG_OFFSET(0)]
+   ldp x2, x3,   [x18, #CPU_XREG_OFFSET(2)]
+   ldp x4, x5,   [x18, #CPU_XREG_OFFSET(4)]
+   ldp x6, x7,   [x18, #CPU_XREG_OFFSET(6)]
+   ldp x8, x9,   [x18, #CPU_XREG_OFFSET(8)]
+   ldp x10, x11, [x18, #CPU_XREG_OFFSET(10)]
+   ldp x12, x13, [x18, #CPU_XREG_OFFSET(12)]
+   ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
+   ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
  
-	// x2-x18

-   ldp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
-   ldp x4, x5,   [x1, #CPU_XREG_OFFSET(4)]
-   ldp x6, x7,   [x1, #CPU_XREG_OFFSET(6)]
-   ldp x8, x9,   [x1, #CPU_XREG_OFFSET(8)]
-   ldp x10, x11, [x1, #CPU_XREG_OFFSET(10)]
-   ldp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
-   ldp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
-   ldp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
-   ldr x18,  [x1, #CPU_XREG_OFFSET(18)]
+   // Restore guest regs x19-x29, lr
+   restore_callee_saved_regs x18
  
-	// x19-x29, lr

-   restore_callee_saved_regs x1
-
-   // Last bits of the 64bit state
-   ldp x0, x1, [sp], #16
+   // Restore guest reg x18
+   ldr x18,  [x18, #CPU_XREG_OFFSET(18)]
  
  	// Do not touch any register after this!

eret
  ENDPROC(__guest_enter)
  
+/*

+ * void __guest_exit(u64 exit_reason, struct kvm_vcpu *vcpu);
+ */

I'm not sure this comment makes much sense as it stands. This is not a C
function by any stretch of the imagination, but the continuation of
__guest_enter. The calling convention is not the C one at all (see how
the stack is involved), and caller-saved registers are going to be
clobbered.


I'll remove this confusing comment.


  ENTRY(__guest_exit)
-   // x0: vcpu
-   // x1: return code
-   // x2-x3: free
-   // x4-x29,lr: vcpu regs
-   // vcpu x0-x3 on the stack
-
-   add x2, x0, #VCPU_CONTEXT
-
-   stp x4, x5,   [x2, #CPU_XREG_OFFSET(4)]
-   stp x6, x7,   [x2, #CPU_XREG_OFFSET(6)]
-   stp x8, x9,   [x2, #CPU_XREG_OFFSET(8)]
-   stp x10, x11, [x2, #CPU_XREG_OFFSET(10)]
-   stp x12, x13, [x2, #CPU_XREG_OFFSET(12)]
-   stp x14, x15, [x2, #CPU_XREG_OFFSET(14)]
-   stp x16, x17, [x2, #CPU_XREG_OFFSET(16)]
-   str x18,  [x2, #CPU_XREG_OFFSET(18)]
-
-   ldp x6, x7, [sp], #16   // x2, x3
-   ldp x4, x5, [sp], #16   // x0, x1
-
-   stp x4, x5, [x2, #CPU_XREG_OFFSET(0)]
-   stp x6, x7, [x2, #CPU_XREG_OFFSET(2)]
+   // x0: return code
+   // x1: vcpu
+   // x2-x29,lr: vcpu regs
+   // vcpu x0-x1 on the stack
+
+   add x1, x

[PATCH v2] arm64: KVM: Save four instructions in __guest_enter/exit()

2016-08-29 Thread Shanker Donthineni
We are doing an unnecessary stack push/pop operation when restoring
the guest registers x0-x18 in __guest_enter(). This patch saves two
instructions by using x18 as a base register. There is no need to
store the vcpu context pointer on the stack because it is redundant;
the same information is available in tpidr_el2. The __guest_exit()
prototype is simplified: the caller pushes only regs x0-x1 to the
stack instead of regs x0-x3.

Signed-off-by: Shanker Donthineni 
---
Changes since v1:
  Incorporated Christoffer's suggestions.
  __guest_exit prototype is changed to 'void __guest_exit(u64 reason, struct 
kvm_vcpu *vcpu)'.

 arch/arm64/kvm/hyp/entry.S | 101 +
 arch/arm64/kvm/hyp/hyp-entry.S |  11 +++--
 2 files changed, 57 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index ce9e5e5..f70489a 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -55,75 +55,76 @@
  */
 ENTRY(__guest_enter)
// x0: vcpu
-   // x1: host/guest context
-   // x2-x18: clobbered by macros
+   // x1: host context
+   // x2-x17: clobbered by macros
+   // x18: guest context
 
// Store the host regs
save_callee_saved_regs x1
 
-   // Preserve vcpu & host_ctxt for use at exit time
-   stp x0, x1, [sp, #-16]!
+   // Store the host_ctxt for use at exit time
+   str x1, [sp, #-16]!
 
-   add x1, x0, #VCPU_CONTEXT
+   add x18, x0, #VCPU_CONTEXT
 
-   // Prepare x0-x1 for later restore by pushing them onto the stack
-   ldp x2, x3, [x1, #CPU_XREG_OFFSET(0)]
-   stp x2, x3, [sp, #-16]!
+   // Restore guest regs x0-x17
+   ldp x0, x1,   [x18, #CPU_XREG_OFFSET(0)]
+   ldp x2, x3,   [x18, #CPU_XREG_OFFSET(2)]
+   ldp x4, x5,   [x18, #CPU_XREG_OFFSET(4)]
+   ldp x6, x7,   [x18, #CPU_XREG_OFFSET(6)]
+   ldp x8, x9,   [x18, #CPU_XREG_OFFSET(8)]
+   ldp x10, x11, [x18, #CPU_XREG_OFFSET(10)]
+   ldp x12, x13, [x18, #CPU_XREG_OFFSET(12)]
+   ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
+   ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
 
-   // x2-x18
-   ldp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
-   ldp x4, x5,   [x1, #CPU_XREG_OFFSET(4)]
-   ldp x6, x7,   [x1, #CPU_XREG_OFFSET(6)]
-   ldp x8, x9,   [x1, #CPU_XREG_OFFSET(8)]
-   ldp x10, x11, [x1, #CPU_XREG_OFFSET(10)]
-   ldp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
-   ldp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
-   ldp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
-   ldr x18,  [x1, #CPU_XREG_OFFSET(18)]
+   // Restore guest regs x19-x29, lr
+   restore_callee_saved_regs x18
 
-   // x19-x29, lr
-   restore_callee_saved_regs x1
-
-   // Last bits of the 64bit state
-   ldp x0, x1, [sp], #16
+   // Restore guest reg x18
+   ldr x18,  [x18, #CPU_XREG_OFFSET(18)]
 
// Do not touch any register after this!
eret
 ENDPROC(__guest_enter)
 
+/*
+ * void __guest_exit(u64 exit_reason, struct kvm_vcpu *vcpu);
+ */
 ENTRY(__guest_exit)
-   // x0: vcpu
-   // x1: return code
-   // x2-x3: free
-   // x4-x29,lr: vcpu regs
-   // vcpu x0-x3 on the stack
-
-   add x2, x0, #VCPU_CONTEXT
-
-   stp x4, x5,   [x2, #CPU_XREG_OFFSET(4)]
-   stp x6, x7,   [x2, #CPU_XREG_OFFSET(6)]
-   stp x8, x9,   [x2, #CPU_XREG_OFFSET(8)]
-   stp x10, x11, [x2, #CPU_XREG_OFFSET(10)]
-   stp x12, x13, [x2, #CPU_XREG_OFFSET(12)]
-   stp x14, x15, [x2, #CPU_XREG_OFFSET(14)]
-   stp x16, x17, [x2, #CPU_XREG_OFFSET(16)]
-   str x18,  [x2, #CPU_XREG_OFFSET(18)]
-
-   ldp x6, x7, [sp], #16   // x2, x3
-   ldp x4, x5, [sp], #16   // x0, x1
-
-   stp x4, x5, [x2, #CPU_XREG_OFFSET(0)]
-   stp x6, x7, [x2, #CPU_XREG_OFFSET(2)]
+   // x0: return code
+   // x1: vcpu
+   // x2-x29,lr: vcpu regs
+   // vcpu x0-x1 on the stack
+
+   add x1, x1, #VCPU_CONTEXT
+
+   // Store the guest regs x2 and x3
+   stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
+
+   // Retrieve the guest regs x0-x1 from the stack
+   ldp x2, x3, [sp], #16   // x0, x1
+
+   // Store the guest regs x0-x1 and x4-x18
+   stp x2, x3,   [x1, #CPU_XREG_OFFSET(0)]
+   stp x4, x5,   [x1, #CPU_XREG_OFFSET(4)]
+   stp x6, x7,   [x1, #CPU_XREG_OFFSET(6)]
+   stp x8, x9,   [x1, #CPU_XREG_OFFSET(8)]
+   stp x10, x11, [x1, #CPU_XREG_OFFSET(10)]
+   stp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
+   stp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
+   stp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
+   str x18,  [x1, #CPU_XREG_OFFSET(18)]
+
+   // Store the guest regs x19-x29, lr
+   save_callee_saved_regs x1
 
-   save_callee

Re: [PATCH] arm64: KVM: Save two instructions in __guest_enter()

2016-08-29 Thread Shanker Donthineni

Hi Christoffer,

This change may not provide a measurable performance improvement, but
we can still save a few CPU cycles on each vCPU context switch, and it
also improves the code readability.



On 08/25/2016 08:31 AM, Christoffer Dall wrote:

Hi Shanker,

On Tue, Aug 09, 2016 at 08:15:36PM -0500, Shanker Donthineni wrote:

We are doing an unnecessary stack push/pop operation when restoring
the guest registers x0-x18 in __guest_enter(). This patch saves the
two instructions by using x18 as a base register. No need to store
the vcpu context pointer in stack because it is redundant and not
being used anywhere, the same information is available in tpidr_el2.

Does this have any measureable benefit?

Thanks,
-Christoffer


--
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH] arm64: KVM: Save two instructions in __guest_enter()

2016-08-09 Thread Shanker Donthineni
We are doing an unnecessary stack push/pop operation when restoring
the guest registers x0-x18 in __guest_enter(). This patch saves two
instructions by using x18 as a base register. There is no need to
store the vcpu context pointer on the stack because it is redundant
and not used anywhere; the same information is available in tpidr_el2.

Signed-off-by: Shanker Donthineni 
---
 arch/arm64/kvm/hyp/entry.S | 66 ++
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index ce9e5e5..d2e09a1 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -55,37 +55,32 @@
  */
 ENTRY(__guest_enter)
// x0: vcpu
-   // x1: host/guest context
-   // x2-x18: clobbered by macros
+   // x1: host context
+   // x2-x17: clobbered by macros
+   // x18: guest context
 
// Store the host regs
save_callee_saved_regs x1
 
-   // Preserve vcpu & host_ctxt for use at exit time
-   stp x0, x1, [sp, #-16]!
+   // Preserve the host_ctxt for use at exit time
+   str x1, [sp, #-16]!
 
-   add x1, x0, #VCPU_CONTEXT
+   add x18, x0, #VCPU_CONTEXT
 
-   // Prepare x0-x1 for later restore by pushing them onto the stack
-   ldp x2, x3, [x1, #CPU_XREG_OFFSET(0)]
-   stp x2, x3, [sp, #-16]!
+   // Restore guest regs x19-x29, lr
+   restore_callee_saved_regs x18
 
-   // x2-x18
-   ldp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
-   ldp x4, x5,   [x1, #CPU_XREG_OFFSET(4)]
-   ldp x6, x7,   [x1, #CPU_XREG_OFFSET(6)]
-   ldp x8, x9,   [x1, #CPU_XREG_OFFSET(8)]
-   ldp x10, x11, [x1, #CPU_XREG_OFFSET(10)]
-   ldp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
-   ldp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
-   ldp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
-   ldr x18,  [x1, #CPU_XREG_OFFSET(18)]
-
-   // x19-x29, lr
-   restore_callee_saved_regs x1
-
-   // Last bits of the 64bit state
-   ldp x0, x1, [sp], #16
+   // Restore guest regs x0-x18
+   ldp x0, x1,   [x18, #CPU_XREG_OFFSET(0)]
+   ldp x2, x3,   [x18, #CPU_XREG_OFFSET(2)]
+   ldp x4, x5,   [x18, #CPU_XREG_OFFSET(4)]
+   ldp x6, x7,   [x18, #CPU_XREG_OFFSET(6)]
+   ldp x8, x9,   [x18, #CPU_XREG_OFFSET(8)]
+   ldp x10, x11, [x18, #CPU_XREG_OFFSET(10)]
+   ldp x12, x13, [x18, #CPU_XREG_OFFSET(12)]
+   ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
+   ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
+   ldr x18,  [x18, #CPU_XREG_OFFSET(18)]
 
// Do not touch any register after this!
eret
@@ -100,6 +95,16 @@ ENTRY(__guest_exit)
 
add x2, x0, #VCPU_CONTEXT
 
+   // Store the guest regs x19-x29, lr
+   save_callee_saved_regs x2
+
+   // Retrieve the guest regs x0-x3 from the stack
+   ldp x21, x22, [sp], #16 // x2, x3
+   ldp x19, x20, [sp], #16 // x0, x1
+
+   // Store the guest regs x0-x18
+   stp x19, x20, [x2, #CPU_XREG_OFFSET(0)]
+   stp x21, x22, [x2, #CPU_XREG_OFFSET(2)]
stp x4, x5,   [x2, #CPU_XREG_OFFSET(4)]
stp x6, x7,   [x2, #CPU_XREG_OFFSET(6)]
stp x8, x9,   [x2, #CPU_XREG_OFFSET(8)]
@@ -109,20 +114,13 @@ ENTRY(__guest_exit)
stp x16, x17, [x2, #CPU_XREG_OFFSET(16)]
str x18,  [x2, #CPU_XREG_OFFSET(18)]
 
-   ldp x6, x7, [sp], #16   // x2, x3
-   ldp x4, x5, [sp], #16   // x0, x1
+   // Restore the host_ctxt from the stack
+   ldr x2, [sp], #16
 
-   stp x4, x5, [x2, #CPU_XREG_OFFSET(0)]
-   stp x6, x7, [x2, #CPU_XREG_OFFSET(2)]
-
-   save_callee_saved_regs x2
-
-   // Restore vcpu & host_ctxt from the stack
-   // (preserving return code in x1)
-   ldp x0, x2, [sp], #16
// Now restore the host regs
restore_callee_saved_regs x2
 
+   // Preserving return code (x1)
mov x0, x1
ret
 ENDPROC(__guest_exit)
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, 
Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [PULL 00/29] KVM/ARM Changes for v4.7

2016-05-17 Thread Shanker Donthineni
Hi Itaru,

Look at this commit that might be causing the problem.

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/firmware/efi/arm-runtime.c?id=14c43be60166981f0b1f034ad9c59252c6f99e0d

Review your EFI system table and runtime service region attributes.

On 05/17/2016 03:00 AM, Julien Grall wrote:
> Hello,
>
> On 17/05/2016 00:28, Itaru Kitayama wrote:
>> The new v4.6 upstream kernel gets to the prompt on Mustang (Rev A3).
>
> I would recommend you to bissect Linux and finger one or multiple commits 
> which break booting on your board.
>
> Regards,
>

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [PULL 00/29] KVM/ARM Changes for v4.7

2016-05-17 Thread Shanker Donthineni
mory: 10432K (8003f9a8 - 8003fa4b)
>> kvm [1]: 8-bit VMID
>> kvm [1]: Hyp mode initialized successfully
>> INFO: rcu_preempt detected stalls on CPUs/tasks:
>> 0-...: (113 GPs behind) idle=1f9/1/0 softirq=86/86 fqs=0
>> 2-...: (107 GPs behind) idle=559/1/0 softirq=120/120 fqs=0
>> 4-...: (107 GPs behind) idle=33b/1/0 softirq=106/106 fqs=0
>> 5-...: (108 GPs behind) idle=333/1/0 softirq=130/130 fqs=0
>> 6-...: (105 GPs behind) idle=2f7/1/0 softirq=120/120 fqs=0
>> 7-...: (105 GPs behind) idle=327/1/0 softirq=131/131 fqs=0
>> (detected by 1, t=5252 jiffies, g=-135, c=-136, q=8)
>> Task dump for CPU 0:
>> swapper/0   R  running task0 0  0 0x
>> Call trace:
>> [] __switch_to+0x1c8/0x240
>> [] __f.8076+0x10/0x28
>> Task dump for CPU 2:
>> swapper/2   R  running task0 0  1 0x
>> Call trace:
>> [] __switch_to+0x1c8/0x240
>> [<0006a42a>] 0x6a42a
>> Task dump for CPU 4:
>> swapper/4   R  running task0 0  1 0x
>> Call trace:
>> [] __switch_to+0x1c8/0x240
>> [<00048555>] 0x48555
>> Task dump for CPU 5:
>> swapper/5   R  running task0 0  1 0x
>> Call trace:
>> [] __switch_to+0x1c8/0x240
>> [<00048554>] 0x48554
>> Task dump for CPU 6:
>> swapper/6   R  running task0 0  1 0x
>> Call trace:
>> [] __switch_to+0x1c8/0x240
>> [<0004855b>] 0x4855b
>> Task dump for CPU 7:
>> swapper/7   R  running task0 0  1 0x
>> Call trace:
>> [] __switch_to+0x1c8/0x240
>> [<0004855a>] 0x4855a
>> rcu_preempt kthread starved for 5252 jiffies! g18446744073709551481
>> c18446744073709551480 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
>> rcu_preempt S 0808ba20 0 7  2 0x
>> Call trace:
>> [] __switch_to+0x1c8/0x240
>> [] __schedule+0xb68/0x2558
>> [] schedule+0xc4/0x230
>> [] schedule_timeout+0x430/0x858
>> [] rcu_gp_kthread+0x1138/0x1ed0
>> [] kthread+0x1cc/0x1e0
>> [] ret_from_fork+0x10/0x40
>> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/0:1]
>> Modules linked in:
>>
>> CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW
> 4.6.0-rc7-next-20160513 #6
>> Hardware name: AppliedMicro Mustang/Mustang, BIOS 3.05.05-beta_rc Jan 27
>> 2016
>> task: 8003c034 ti: 8003c040 task.ti: 8003c040
>> PC is at smp_call_function_many+0x780/0x7b0
>> LR is at smp_call_function_many+0x73c/0x7b0
>> pc : [] lr : [] pstate: 8045
>> sp : 8003c0403b50
>> x29: 8003c0403b50 x28: 0a482540
>> x27: 0a482550 x26: 09ffa000
>> x25: 0b34e7c0 x24: 0a041dd0
>> x23: 0b34e7c0 x22: 8003ffe2aa08
>> x21: 0b34e7c0 x20: 8003ffe2aa00
>> x19: 0b34e240 x18: 05f5e0ff
>> x17:  x16: 
>> x15: 00c4 x14: 8003fff79500
>> x13: 0a464f28 x12: 0b59cd10
>> x11:  x10: 0a62eef8
>> x9 : 0b59ccd0 x8 : 8003c040
>> x7 :  x6 : 0b59ccd0
>> x5 :  x4 : 
>> x3 : 8003ffdfdc18 x2 : 
>> x1 : 0a043548 x0 : 0003
>>
>> Kernel panic - not syncing: softlockup: hung tasks
>> CPU: 1 PID: 1 Comm: swapper/0 Tainted: GWL
>> 4.6.0-rc7-next-20160513 #6
>> Hardware name: AppliedMicro Mustang/Mustang, BIOS 3.05.05-beta_rc Jan 27
>> 2016
>> Call trace:
>> [] dump_backtrace+0x0/0x4b0
>> [] show_stack+0x3c/0x60
>> [] dump_stack+0x1dc/0x2c8
>> [] panic+0x264/0x5fc
>> [] watchdog_timer_fn+0x804/0x840
>> [] __hrtimer_run_queues+0x4b4/0xf50
>> [] hrtimer_interrupt+0x174/0x478
>> [] arch_timer_handler_phys+0xac/0xc8
>> [] handle_percpu_devid_irq+0x214/0x8b0
>> [] generic_handle_irq+0x8c/0xb0
>> [] __handle_domain_irq+0x178/0x288
>> [] gic_handle_irq+0x1e8/0x270
>> Exception stack(0x8003ffe13fa0 to 0x8003ffe140c0)
>> 3fa0: 8003c0403a30 8003ffe2aa00 8003c0403b50
> 082b6390
>> 3fc0: 8045 0a041dd0 8003ffe10020
> 8003ffe14010
>> 3fe0: 0a482550 8003c040 0800f000
> 8003c0403a30
>> 4000: 8003c0403b50 8003c0403a30 
> 
>> 4020: 00

Re: [PATCH v6 10/10] clocksource: arm_arch_timer: Remove arch_timer_get_timecounter

2016-04-23 Thread Shanker Donthineni


On 04/11/2016 10:33 AM, Julien Grall wrote:
> The only call of arch_timer_get_timecounter (in KVM) has been removed.
>
> Signed-off-by: Julien Grall 
> Acked-by: Christoffer Dall 

Tested-by: Shanker Donthineni 

Using the Qualcomm Technologies QDF2XXX server.

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [PATCH v6 09/10] KVM: arm/arm64: vgic: Rely on the GIC driver to parse the firmware tables

2016-04-23 Thread Shanker Donthineni
Hi Julien,


On 04/11/2016 10:32 AM, Julien Grall wrote:
> Currently, the firmware tables are parsed 2 times: once in the GIC
> drivers, the other time when initializing the vGIC. It means code
> duplication and makes it more tedious to add support for another
> firmware table (like ACPI).
>
> Use the recently introduced helper gic_get_kvm_info() to get
> information about the virtual GIC.
>
> With this change, the virtual GIC becomes agnostic to the firmware
> table and KVM will be able to initialize the vGIC on ACPI.
>
> Signed-off-by: Julien Grall 
> Reviewed-by: Christoffer Dall 
>
Tested-by: Shanker Donthineni 

Using the Qualcomm Technologies QDF2XXX server with PAGE_SIZE=4K.
> ---
> Cc: Marc Zyngier 
> Cc: Gleb Natapov 
> Cc: Paolo Bonzini 
>
> Changes in v6:
> - Add Christoffer's reviewed-by
>
> Changes in v4:
> - Remove validation check as they are already done during
> parsing.
> - Move the alignement check from the parsing to the vGIC code.
> - Fix typo in the commit message
>
> Changes in v2:
> - Use 0 rather than a negative value to know when the maintenance
> IRQ
> is not present.
> - Use resource for vcpu and vctrl.
> ---
>  include/kvm/arm_vgic.h |  7 +++---
>  virt/kvm/arm/vgic-v2.c | 61
> +-
>  virt/kvm/arm/vgic-v3.c | 47 +-
>  virt/kvm/arm/vgic.c| 50 ++---
>  4 files changed, 73 insertions(+), 92 deletions(-)
>
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 281caf8..be6037a 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define VGIC_NR_IRQS_LEGACY  256
>  #define VGIC_NR_SGIS 16
> @@ -353,15 +354,15 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu,
> struct irq_phys_map *map);
>  #define vgic_initialized(k)  (!!((k)->arch.vgic.nr_cpus))
>  #define vgic_ready(k)((k)->arch.vgic.ready)
>  
> -int vgic_v2_probe(struct device_node *vgic_node,
> +int vgic_v2_probe(const struct gic_kvm_info *gic_kvm_info,
> const struct vgic_ops **ops,
> const struct vgic_params **params);
>  #ifdef CONFIG_KVM_ARM_VGIC_V3
> -int vgic_v3_probe(struct device_node *vgic_node,
> +int vgic_v3_probe(const struct gic_kvm_info *gic_kvm_info,
> const struct vgic_ops **ops,
> const struct vgic_params **params);
>  #else
> -static inline int vgic_v3_probe(struct device_node *vgic_node,
> +static inline int vgic_v3_probe(const struct gic_kvm_info *gic_kvm_info,
>   const struct vgic_ops **ops,
>   const struct vgic_params **params)
>  {
> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
> index 67ec334..7e826c9 100644
> --- a/virt/kvm/arm/vgic-v2.c
> +++ b/virt/kvm/arm/vgic-v2.c
> @@ -20,9 +20,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
> -#include 
> -#include 
>  
>  #include 
>  
> @@ -186,38 +183,39 @@ static void vgic_cpu_init_lrs(void *params)
>  }
>  
>  /**
> - * vgic_v2_probe - probe for a GICv2 compatible interrupt controller in
> DT
> - * @node:pointer to the DT node
> - * @ops: address of a pointer to the GICv2 operations
> - * @params:  address of a pointer to HW-specific parameters
> + * vgic_v2_probe - probe for a GICv2 compatible interrupt controller
> + * @gic_kvm_info:pointer to the GIC description
> + * @ops: address of a pointer to the GICv2 operations
> + * @params:  address of a pointer to HW-specific parameters
>   *
>   * Returns 0 if a GICv2 has been found, with the low level operations
>   * in *ops and the HW parameters in *params. Returns an error code
>   * otherwise.
>   */
> -int vgic_v2_probe(struct device_node *vgic_node,
> -   const struct vgic_ops **ops,
> -   const struct vgic_params **params)
> +int vgic_v2_probe(const struct gic_kvm_info *gic_kvm_info,
> +const struct vgic_ops **ops,
> +const struct vgic_params **params)
>  {
>   int ret;
> - struct resource vctrl_res;
> - struct resource vcpu_res;
>   struct vgic_params *vgic = &vgic_v2_params;
> + const struct resource *vctrl_res = &gic_kvm_info->vctrl;
> + const struct resource *vcpu_res = &gic_kvm_info->vcpu;
>  
> - vgic->maint_irq = irq_of_parse_and_map(vgic_node, 0);
> - if (!vgic->maint_irq) {
> - kvm_err("err

Re: [PATCH v6 08/10] KVM: arm/arm64: arch_timer: Rely on the arch timer to parse the firmware tables

2016-04-23 Thread Shanker Donthineni


On 04/11/2016 10:32 AM, Julien Grall wrote:
> The firmware table is currently parsed by the virtual timer code in
> order to retrieve the virtual timer interrupt. However, this is already
> done by the arch timer driver.
>
> To avoid code duplication, use the newly introduced function
> arch_timer_get_kvm_info(),
> which returns all the information required by the virtual timer code.
>
> Signed-off-by: Julien Grall 
> Acked-by: Christoffer Dall 
>
Tested-by: Shanker Donthineni 

Using the Qualcomm Technologies QDF2XXX server platform.

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [PATCH v6 07/10] irqchip/gic-v3: Parse and export virtual GIC information

2016-04-23 Thread Shanker Donthineni


On 04/11/2016 10:32 AM, Julien Grall wrote:
> Fill up the recently introduced gic_kvm_info with the hardware
> information used for virtualization.
>
> Signed-off-by: Julien Grall 
> Cc: Thomas Gleixner 
> Cc: Jason Cooper 
> Cc: Marc Zyngier 
>
Tested-by: Shanker Donthineni 

Using the Qualcomm Technologies QDF2XXX server platform.

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [PATCH v6 06/10] irqchip/gic-v3: Gather all ACPI specific data in a single structure

2016-04-23 Thread Shanker Donthineni


On 04/11/2016 10:32 AM, Julien Grall wrote:
> The ACPI code requires the use of global variables in order to collect
> information from the tables.
>
> To make clear those variables are ACPI specific, gather all of them in a
> single structure.
>
> Furthermore, even if some of the variables are not marked with
> __initdata, they are all only used during the initialization. Therefore,
> the new variable, which hold the structure, can be marked with
> __initdata.
>
> Signed-off-by: Julien Grall 
> Acked-by: Christoffer Dall 
> Reviewed-by: Hanjun Guo 
>
Tested-by: Shanker Donthineni 

Using the Qualcomm Technologies QDF2XXX server platform.

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [PATCH v6 04/10] irqchip/gic-v2: Parse and export virtual GIC information

2016-04-23 Thread Shanker Donthineni


On 04/11/2016 10:32 AM, Julien Grall wrote:
> For now, the firmware tables are parsed 2 times: once in the GIC
> drivers, the other time when initializing the vGIC. It means code
> duplication and makes it more tedious to add support for another
> firmware table (like ACPI).
>
> Introduce a new structure and set of helpers to get/set the virtual GIC
> information. Also fill up the structure for GICv2.
>
> Signed-off-by: Julien Grall 
>
Tested-by: Shanker Donthineni 

Using the Qualcomm Technologies QDF2XXX server platform.

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [PATCH v6 03/10] irqchip/gic-v2: Gather ACPI specific data in a single structure

2016-04-23 Thread Shanker Donthineni


On 04/11/2016 10:32 AM, Julien Grall wrote:
> The ACPI code requires the use of global variables in order to collect
> information from the tables.
>
> For now, a single global variable is used, but more will be added in a
> subsequent patch. To make it clear they are ACPI specific, gather all
> the information in a single structure.
>
> Signed-off-by: Julien Grall 
> Acked-by: Christoffer Dall 
> Acked-by: Hanjun Guo 
>
Tested-by: Shanker Donthineni 

Using the Qualcomm Technologies QDF2XXX server platform.

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [PATCH v6 02/10] clocksource: arm_arch_timer: Extend arch_timer_kvm_info to get the virtual IRQ

2016-04-23 Thread Shanker Donthineni


On 04/11/2016 10:32 AM, Julien Grall wrote:
> Currently, the firmware table is parsed by the virtual timer code in
> order to retrieve the virtual timer interrupt. However, this is already
> done by the arch timer driver.
>
> To avoid code duplication, extend arch_timer_kvm_info to get the virtual
> IRQ.
>
> Note that the KVM code will be modified in a subsequent patch.
>
> Signed-off-by: Julien Grall 
> Acked-by: Christoffer Dall 
>
Tested-by: Shanker Donthineni 

Using the Qualcomm Technologies QDF2XXX server platform.
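
Concretely, the extension is a single field next to the timecounter
introduced in patch 01/10; a sketch of the resulting structure:

	struct arch_timer_kvm_info {
		struct timecounter timecounter;
		int virtual_irq;	/* PPI of the EL1 virtual timer */
	};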

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [PATCH v6 01/10] clocksource: arm_arch_timer: Gather KVM specific information in a structure

2016-04-23 Thread Shanker Donthineni


On 04/11/2016 10:32 AM, Julien Grall wrote:
> Introduce a structure which is filled up by the arch timer driver and
> used by the virtual timer in KVM.
>
> The first member of this structure will be the timecounter. More members
> will be added later.
>
> A stub for the new helper isn't introduced because KVM requires the arch
> timer for both ARM64 and ARM32.
>
> The function arch_timer_get_timecounter is kept for the time being and
> will be dropped in a subsequent patch.
>
> Signed-off-by: Julien Grall 
> Acked-by: Christoffer Dall 
>
Tested-by: Shanker Donthineni 

Using the Qualcomm Technologies QDF2XXX server platform.
> ---
> Cc: Daniel Lezcano 
> Cc: Thomas Gleixner 
> Cc: Marc Zyngier 
>
> Changes in v6:
> - Add Christoffer's acked-by
>
> Changes in v3:
> - Rename the patch
> - Move the KVM changes and removal of arch_timer_get_timecounter
> in separate patches.
> ---
>  drivers/clocksource/arm_arch_timer.c | 12 +---
>  include/clocksource/arm_arch_timer.h |  5 +
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 5152b38..62bdfe7 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -468,11 +468,16 @@ static struct cyclecounter cyclecounter = {
>   .mask   = CLOCKSOURCE_MASK(56),
>  };
>  
> -static struct timecounter timecounter;
> +static struct arch_timer_kvm_info arch_timer_kvm_info;
> +
> +struct arch_timer_kvm_info *arch_timer_get_kvm_info(void)
> +{
> + return &arch_timer_kvm_info;
> +}
>  
>  struct timecounter *arch_timer_get_timecounter(void)
>  {
> - return &timecounter;
> + return &arch_timer_kvm_info.timecounter;
>  }
>  
>  static void __init arch_counter_register(unsigned type)
> @@ -500,7 +505,8 @@ static void __init arch_counter_register(unsigned type)
>   clocksource_register_hz(&clocksource_counter, arch_timer_rate);
>   cyclecounter.mult = clocksource_counter.mult;
>   cyclecounter.shift = clocksource_counter.shift;
> - timecounter_init(&timecounter, &cyclecounter, start_count);
> + timecounter_init(&arch_timer_kvm_info.timecounter,
> +  &cyclecounter, start_count);
>  
>   /* 56 bits minimum, so we assume worst case rollover */
>   sched_clock_register(arch_timer_read_counter, 56,
> arch_timer_rate);
> diff --git a/include/clocksource/arm_arch_timer.h b/include/clocksource/arm_arch_timer.h
> index 25d0914..9101ed6b 100644
> --- a/include/clocksource/arm_arch_timer.h
> +++ b/include/clocksource/arm_arch_timer.h
> @@ -49,11 +49,16 @@ enum arch_timer_reg {
>  
>  #define ARCH_TIMER_EVT_STREAM_FREQ   1   /* 100us */
>  
> +struct arch_timer_kvm_info {
> + struct timecounter timecounter;
> +};
> +
>  #ifdef CONFIG_ARM_ARCH_TIMER
>  
>  extern u32 arch_timer_get_rate(void);
>  extern u64 (*arch_timer_read_counter)(void);
>  extern struct timecounter *arch_timer_get_timecounter(void);
> +extern struct arch_timer_kvm_info *arch_timer_get_kvm_info(void);
>  
>  #else
>  

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [PATCH v5 6/9] irqchip/gic-v3: Parse and export virtual GIC information

2016-04-11 Thread Shanker Donthineni
Hi Julien,

On 04/11/2016 09:27 AM, Julien Grall wrote:
> Hello Hanjun,
>
> On 11/04/16 06:27, Hanjun Guo wrote:
>> On 2016/4/4 19:37, Julien Grall wrote:
>>> +static void __init gic_acpi_setup_kvm_info(void)
>>> +{
>>> +	int irq;
>>> +
>>> +	if (!gic_acpi_collect_virt_info()) {
>>> +		pr_warn("Unable to get hardware information used for virtualization\n");
>>> +		return;
>>> +	}
>>> +
>>> +	gic_v3_kvm_info.type = GIC_V3;
>>> +
>>> +	irq = acpi_register_gsi(NULL, acpi_data.maint_irq,
>>> +				acpi_data.maint_irq_mode,
>>> +				ACPI_ACTIVE_HIGH);
>>> +	if (irq <= 0)
>>> +		return;
>>> +
>>> +	gic_v3_kvm_info.maint_irq = irq;
>>> +
>>> +	if (acpi_data.vcpu_base) {
>>
>> Sorry, I'm not familiar with KVM, but I got a question here: will
>> KVM work without a valid vcpu_base in GICv3 mode?
>
Yes, KVM works without vcpu_base in GICv3 mode. The vcpu_base will be used
for emulating the vGICv2 feature. The vGICv3 emulation is done through the
system registers.

> vcpu_base is only required for supporting GICv2 on GICv3.
>
Yes, you are right.

> Regards,
>

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [PATCH v5 6/9] irqchip/gic-v3: Parse and export virtual GIC information

2016-04-08 Thread Shanker Donthineni
>  return 0;

> + if (first_madt) {
> + first_madt = false;
> +
> + acpi_data.maint_irq = gicc->vgic_interrupt;
> + acpi_data.maint_irq_mode = maint_irq_mode;
> + acpi_data.vcpu_base = gicc->gicv_base_address;
> +
> + return 0;
> + }
> +
> + /*
> +  * The maintenance interrupt and GICV should be the same for every CPU
> +  */
> + if ((acpi_data.maint_irq != gicc->vgic_interrupt) ||
> + (acpi_data.maint_irq_mode != maint_irq_mode) ||
> + (acpi_data.vcpu_base != gicc->gicv_base_address))
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +static bool __init gic_acpi_collect_virt_info(void)
> +{
> + int count;
> +
> + count = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
> +   gic_acpi_parse_virt_madt_gicc, 0);
> +
> + return (count > 0);
> +}
> +
>  #define ACPI_GICV3_DIST_MEM_SIZE (SZ_64K)
> +#define ACPI_GICV2_VCTRL_MEM_SIZE(SZ_4K)
> +#define ACPI_GICV2_VCPU_MEM_SIZE (SZ_8K)
> +
> +static void __init gic_acpi_setup_kvm_info(void)
> +{
> + int irq;
> +
> + if (!gic_acpi_collect_virt_info()) {
> + pr_warn("Unable to get hardware information used for virtualization\n");
> + return;
> + }
> +
> + gic_v3_kvm_info.type = GIC_V3;
> +
> + irq = acpi_register_gsi(NULL, acpi_data.maint_irq,
> + acpi_data.maint_irq_mode,
> + ACPI_ACTIVE_HIGH);
> + if (irq <= 0)
> + return;
> +
> + gic_v3_kvm_info.maint_irq = irq;
> +
> + if (acpi_data.vcpu_base) {
> + struct resource *vcpu = &gic_v3_kvm_info.vcpu;
> +
> + vcpu->flags = IORESOURCE_MEM;
> + vcpu->start = acpi_data.vcpu_base;
> + vcpu->end = vcpu->start + ACPI_GICV2_VCPU_MEM_SIZE - 1;
> + }
> +
> + gic_set_kvm_info(&gic_v3_kvm_info);
> +}
>  
>  static int __init
>  gic_acpi_init(struct acpi_subtable_header *header, const unsigned long end)
> @@ -1159,6 +1265,8 @@ gic_acpi_init(struct acpi_subtable_header *header, const unsigned long end)
>   goto out_fwhandle_free;
>  
>   acpi_set_irq_model(ACPI_IRQ_MODEL_GIC, domain_handle);
> + gic_acpi_setup_kvm_info();
> +
>   return 0;
>  
>  out_fwhandle_free:
> diff --git a/include/linux/irqchip/arm-gic-common.h b/include/linux/irqchip/arm-gic-common.h
> index ef34f6f..c647b05 100644
> --- a/include/linux/irqchip/arm-gic-common.h
> +++ b/include/linux/irqchip/arm-gic-common.h
> @@ -15,6 +15,7 @@
>  
>  enum gic_type {
>   GIC_V2,
> + GIC_V3,
>  };
>  
>  struct gic_kvm_info {

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: Intermittent guest kernel crashes with v4.5-rc6.

2016-03-03 Thread Shanker Donthineni


On 03/03/2016 08:03 AM, Marc Zyngier wrote:
> On 03/03/16 13:25, Shanker Donthineni wrote:
>>
>> On 03/02/2016 11:35 AM, Marc Zyngier wrote:
>>> On 02/03/16 15:48, Shanker Donthineni wrote:
>>>
>>>> We haven't started running heavy workloads in VMs. So far we
>> have noticed this random behavior only during guest
>>>> kernel boot (at EL1).  
>>>>
>>>> We didn't see this problem on 4.3 kernel. Do you think it is
>>>> related to TLB conflicts?
>>> I cannot imagine why a DSB would solve a TLB conflict. But the fact that
>>> you didn't see it crashing on 4.3 is a good indication that something
>>> else is at play.
>>>
>>> In 4.5, we've rewritten a large part of KVM in C, which has changed the
>>> ordering of the various accesses a lot. It could be that a latent
>>> problem is now exposed more widely.
>>>
>>> Can you try moving this DSB around and find out what is the earliest
>>> point where it solves this problem? Some sort of bisection?
>> The furthest I can move 'dsb ishst' up is to the beginning of
>> __guest_enter(), but not outside of this function.
>>
>> I don't understand why the code below is failing, with the branch
>> instruction causing problems.
>>
>> /* Jump in the fire! */
>> +  dsb(ishst);
>> exit_code = __guest_enter(vcpu, host_ctxt);
>> /* And we're baaack! */
> That's very worrying. I can't see how the branch can have an influence
> on the the DSB (nor why the DSB has an influence on the rest of the
> execution, btw).
>
> What if you replace the DSB with an ISB? Do you observe a similar
> behaviour (works if the barrier is in __guest_enter, but not if it is
> outside)?
I have already tried an isb, without success. I did another
experiment: flushing the stage-2 TLBs before calling __guest_enter()
fixed the problem.

> Another thing worth looking at is what happened just before we decided
> to get back into the guest. Or to put it differently, what was the
> reason to exit in the first place. Was it a Stage-2 fault by any chance?

I will collect as much debug data as possible and send the results
to you. I went through your refactored KVM 'C' code and did not
find anything suspicious. I am thinking maybe Qualcomm CPUs
have very aggressive prefetch logic that is causing the problem.

> Thanks,
>
>   M.

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: Intermittent guest kernel crashes with v4.5-rc6.

2016-03-03 Thread Shanker Donthineni


On 03/02/2016 11:35 AM, Marc Zyngier wrote:
> On 02/03/16 15:48, Shanker Donthineni wrote:
>
>> We haven't started running heavy workloads in VMs. So far we
>> have noticed this random behavior only during guest
>> kernel boot (at EL1).  
>>
>> We didn't see this problem on 4.3 kernel. Do you think it is
>> related to TLB conflicts?
> I cannot imagine why a DSB would solve a TLB conflict. But the fact that
> you didn't see it crashing on 4.3 is a good indication that something
> else is at play.
>
> In 4.5, we've rewritten a large part of KVM in C, which has changed the
> ordering of the various accesses a lot. It could be that a latent
> problem is now exposed more widely.
>
> Can you try moving this DSB around and find out what is the earliest
> point where it solves this problem? Some sort of bisection?
The furthest I can move 'dsb ishst' up is to the beginning of
__guest_enter(), but not outside of this function.

I don't understand why the code below is failing, with the branch
instruction causing problems.

/* Jump in the fire! */
+  dsb(ishst);
exit_code = __guest_enter(vcpu, host_ctxt);
/* And we're baaack! */

> Thanks,
>
>   M.

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: Intermittent guest kernel crashes with v4.5-rc6.

2016-03-02 Thread Shanker Donthineni


On 03/02/2016 09:09 AM, Marc Zyngier wrote:
> On 02/03/16 14:59, Shanker Donthineni wrote:
>> Hi Marc,
>>
>> Thanks for your quick reply.
>>
>> On 03/02/2016 08:16 AM, Marc Zyngier wrote:
>>> On 02/03/16 13:56, Shanker Donthineni wrote:
>>>> For some reason the v4.5-rc6 kernel is not stable for guest machines
>>>> on Qualcomm server platforms. We are getting IABT translation faults
>>>> while booting the guest kernel. The problem disappears with the
>>>> following code snippet (inserting a "dsb ish" instruction just before
>>>> switching to the EL1 guest). I am using the v4.5-rc6 kernel for both
>>>> host and guest machines.
>>>>
>>>> Please let me know if you have any thoughts or ideas for tracing this
>>>> problem.
>>>>
>>>> --- a/arch/arm64/kvm/hyp/entry.S
>>>> +++ b/arch/arm64/kvm/hyp/entry.S
>>>> @@ -88,6 +88,7 @@ ENTRY(__guest_enter)
>>>> 	ldp	x0, x1, [sp], #16
>>>>
>>>> 	// Do not touch any register after this!
>>>> +	dsb	ish
>>>> 	eret
>>>> ENDPROC(__guest_enter)
>>>>
>>>>
>>>> Using below QEMU command for launching guest machine:
>>>>
>>>> qemu-system-aarch64 -machine type=virt,accel=kvm,gic-version=3  \
>>>> -cpu "host" -smp cpus=1,maxcpus=1 -m 256M -serial stdio \
>>>> -kernel /boot/Image -initrd /boot/rootfs.cpio.gz \
>>>> -append 'earlycon=pl011,0x0900 \
>>>> console=ttyAMA0,115200 root=/dev/ram'
>>>>
>>>>
>>>> Guest machine crash log messages:
>>>>
>>>> [0.00] Booting Linux on physical CPU 0x0
>>>> [0.00] Boot CPU: AArch64 Processor [510f2811]
>>>> [0.00] Bad mode in Synchronous Abort handler detected, code
>>>> 0x860f -- IABT (current EL)
>>>> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.rc6+
>>>> [0.00] task: ffc000d52200 ti: ffc000d44000 task.ti:
>>>> ffc000d44000
>>>> [0.00] PC is at early_init_dt_scan_root+0x28/0x94
>>>> [0.00] LR is at of_scan_flat_dt+0x9c/0xd0
>>>> [0.00] pc : [] lr : []
>>>> pstate: 83c5
>>>> [0.00] sp : ffc000d47e80
>>>> [0.00] x29: ffc000d47e80 x28: 
>>>>
>>> If you're getting a prefetch abort, it would be interesting to find out
>>> what instruction is there, whether the page is mapped at stage-2 or not,
>>> what are the stage-2 permissions... Basically, a full description of the
>>> memory state.
>>>
>>> Also, does it work if you do a "dsb ishst" instead?
>>>
>>> Thanks,
>>>
>>> M.
>> Most of the time it is faulting at ldr/str instructions. I have
>> verified the stage-1 page and the corresponding stage-2 page
>> attributes (SH, AP, PERM), PA, etc. after the IABT; everything
>> perfectly matches. I am very confident that the stage-1/stage-2 MMU
>> page tables are correct.
>>
>> Instruction "dsb ishst" also fixing the problem.
>>
>> One more interesting observation: if I retry an instruction fetch
>> that caused the IABT, the second fetch is successful and I don't see
>> the IABT. I used the experimental code below to test.
>>
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -346,6 +346,7 @@ el1_sync:
>> 	b.eq	el1_undef
>> 	cmp	x24, #ESR_ELx_EC_BREAKPT_CUR	// debug exception in EL1
>> 	b.ge	el1_dbg
>> +	kernel_exit 1
>> 	b	el1_inv
>>  el1_da:
>>
>>
> OK, that's pretty scary, especially considering that we don't have a DSB
> on that path. Do you ever see it exploding at EL0?
>
> Thanks,
>
>   M.

We haven't started running heavy workloads in VMs. So far we
have noticed this random behavior only during guest
kernel boot (at EL1).  

We didn't see this problem on 4.3 kernel. Do you think it is
related to TLB conflicts?

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



  1   2   >