Re: [PATCH v4 01/16] perf/x86/intel: Add x86_pmu.pebs_vmx for Ice Lake Servers

2021-04-12 Thread Liuxiangdong (Aven, Cloud Infrastructure Service Product Dept.)




On 2021/4/9 16:46, Like Xu wrote:

Hi Liuxiangdong,

On 2021/4/9 16:33, Liuxiangdong (Aven, Cloud Infrastructure Service 
Product Dept.) wrote:

Do you have any comments or ideas about it?

https://lore.kernel.org/kvm/606e5ef6.2060...@huawei.com/


My expectation is that there may be many fewer PEBS samples
on Skylake without any soft lockup.

You may need to confirm the statement

"All that matters is that the EPT pages don't get
unmapped ever while PEBS is active"

is true in the kernel level.

Try "-overcommit mem-lock=on" for your qemu.



We were already using "-overcommit mem-lock=on" for QEMU when the soft
lockup occurred.

It seems that an EPT violation happens when we use PEBS.

[ 5199.056246] Call Trace:
[ 5199.056248]  _raw_spin_lock+0x1b/0x20
[ 5199.056251]  follow_page_pte+0xf5/0x580
[ 5199.056258]  __get_user_pages+0x1d6/0x750
[ 5199.056262]  get_user_pages_unlocked+0xdc/0x310
[ 5199.056265]  __gfn_to_pfn_memslot+0x12d/0x4d0 [kvm]
[ 5199.056304]  try_async_pf+0xcc/0x250 [kvm]
[ 5199.056337]  direct_page_fault+0x413/0xa90 [kvm]
[ 5199.056367]  kvm_mmu_page_fault+0x77/0x5e0 [kvm]
[ 5199.056395]  ? vprintk_emit+0xa2/0x240
[ 5199.056399]  ? vmx_vmexit+0x1d/0x40 [kvm_intel]
[ 5199.056407]  ? vmx_vmexit+0x11/0x40 [kvm_intel]
[ 5199.056412]  vmx_handle_exit+0xfe/0x640 [kvm_intel]
[ 5199.056418]  vcpu_enter_guest+0x904/0x1450 [kvm]
[ 5199.056445]  ? kvm_apic_has_interrupt+0x44/0x80 [kvm]
[ 5199.056472]  ? apic_has_interrupt_for_ppr+0x62/0x90 [kvm]
[ 5199.056498]  ? kvm_arch_vcpu_ioctl_run+0xeb/0x550 [kvm]
[ 5199.056523]  kvm_arch_vcpu_ioctl_run+0xeb/0x550 [kvm]
[ 5199.056547]  kvm_vcpu_ioctl+0x23e/0x5b0 [kvm]
[ 5199.056568]  __x64_sys_ioctl+0x8e/0xd0
[ 5199.056571]  do_syscall_64+0x33/0x40
[ 5199.056574]  entry_SYSCALL_64_after_hwframe+0x44/0xae


SDM 17.4.9.2 "Setting Up the DS Save Area" says:

The recording of branch records in the BTS buffer (or PEBS records in
the PEBS buffer) may not operate properly if accesses to the linear
addresses in any of the three DS save area sections cause page faults,
VM exits, or the setting of accessed or dirty flags in the paging
structures (ordinary or EPT). For that reason, system software should
establish paging structures (both ordinary and EPT) to prevent such
occurrences. Implications of this may be that an operating system
should allocate this memory from a non-paged pool and that system
software cannot do “lazy” page-table entry propagation for these pages.
Some newer processor generations support “lazy” EPT page-table entry
propagation for PEBS; see Section 18.3.10.1 and Section 18.9.5 for more
information. A virtual-machine monitor may choose to allow use of PEBS
by guest software only if EPT maps all guest-physical memory as present
and read/write.


The soft lockup may be caused by unmapped EPT pages. So, is there a way
to map all GPAs before we use PEBS on Skylake?





On 2021/4/6 13:14, Xu, Like wrote:

Hi Xiangdong,

On 2021/4/6 11:24, Liuxiangdong (Aven, Cloud Infrastructure Service 
Product Dept.) wrote:

Hi,like.
Some questions about this new pebs patches set:
https://lore.kernel.org/kvm/20210329054137.120994-2-like...@linux.intel.com/ 



The new hardware facility supporting guest PEBS is only available
on Intel Ice Lake Server platforms for now.


Yes, we have documented this "EPT-friendly PEBS" capability in the SDM
18.3.10.1 Processor Event Based Sampling (PEBS) Facility

And again, this patch set doesn't officially support guest PEBS on 
the Skylake.





AFAIK, Icelake supports adaptive PEBS and extended PEBS which 
Skylake doesn't.
But we can still use IA32_PEBS_ENABLE MSR to indicate 
general-purpose counter in Skylake.


For Skylake, only the PMC0-PMC3 are valid for PEBS and you may
mask the other unsupported bits in the pmu->pebs_enable_mask.


Is there anything else that only Icelake supports in this patches set?


The PDIR counter on the Ice Lake is the fixed counter 0
while the PDIR counter on the Sky Lake is the gp counter 1.

You may also expose x86_pmu.pebs_vmx for Skylake in the 1st patch.




Besides, we have tried this patches set in Icelake.  We can use 
pebs(eg: "perf record -e cycles:pp")
when guest is kernel-5.11, but can't when kernel-4.18.  Is there a 
minimum guest kernel version requirement?


The Ice Lake CPU model has been added since v5.4.

You may double check whether the stable tree(s) code has
INTEL_FAM6_ICELAKE in the arch/x86/include/asm/intel-family.h.




Thanks,
Xiangdong Liu










Re: [PATCH v4 01/16] perf/x86/intel: Add x86_pmu.pebs_vmx for Ice Lake Servers

2021-04-09 Thread Liuxiangdong (Aven, Cloud Infrastructure Service Product Dept.)

Do you have any comments or ideas about it?

https://lore.kernel.org/kvm/606e5ef6.2060...@huawei.com/


On 2021/4/6 13:14, Xu, Like wrote:

Hi Xiangdong,

On 2021/4/6 11:24, Liuxiangdong (Aven, Cloud Infrastructure Service 
Product Dept.) wrote:

Hi,like.
Some questions about this new pebs patches set:
https://lore.kernel.org/kvm/20210329054137.120994-2-like...@linux.intel.com/ 



The new hardware facility supporting guest PEBS is only available
on Intel Ice Lake Server platforms for now.


Yes, we have documented this "EPT-friendly PEBS" capability in the SDM
18.3.10.1 Processor Event Based Sampling (PEBS) Facility

And again, this patch set doesn't officially support guest PEBS on the 
Skylake.





AFAIK, Icelake supports adaptive PEBS and extended PEBS which 
Skylake doesn't.
But we can still use IA32_PEBS_ENABLE MSR to indicate general-purpose 
counter in Skylake.


For Skylake, only the PMC0-PMC3 are valid for PEBS and you may
mask the other unsupported bits in the pmu->pebs_enable_mask.


Is there anything else that only Icelake supports in this patches set?


The PDIR counter on the Ice Lake is the fixed counter 0
while the PDIR counter on the Sky Lake is the gp counter 1.

You may also expose x86_pmu.pebs_vmx for Skylake in the 1st patch.




Besides, we have tried this patches set in Icelake.  We can use 
pebs(eg: "perf record -e cycles:pp")
when guest is kernel-5.11, but can't when kernel-4.18.  Is there a 
minimum guest kernel version requirement?


The Ice Lake CPU model has been added since v5.4.

You may double check whether the stable tree(s) code has
INTEL_FAM6_ICELAKE in the arch/x86/include/asm/intel-family.h.




Thanks,
Xiangdong Liu






Re: [PATCH v4 01/16] perf/x86/intel: Add x86_pmu.pebs_vmx for Ice Lake Servers

2021-04-07 Thread Liuxiangdong (Aven, Cloud Infrastructure Service Product Dept.)




On 2021/4/6 13:14, Xu, Like wrote:

Hi Xiangdong,

On 2021/4/6 11:24, Liuxiangdong (Aven, Cloud Infrastructure Service 
Product Dept.) wrote:

Hi,like.
Some questions about this new pebs patches set:
https://lore.kernel.org/kvm/20210329054137.120994-2-like...@linux.intel.com/ 



The new hardware facility supporting guest PEBS is only available
on Intel Ice Lake Server platforms for now.


Yes, we have documented this "EPT-friendly PEBS" capability in the SDM
18.3.10.1 Processor Event Based Sampling (PEBS) Facility

And again, this patch set doesn't officially support guest PEBS on the 
Skylake.





AFAIK, Icelake supports adaptive PEBS and extended PEBS which 
Skylake doesn't.
But we can still use IA32_PEBS_ENABLE MSR to indicate general-purpose 
counter in Skylake.


For Skylake, only the PMC0-PMC3 are valid for PEBS and you may
mask the other unsupported bits in the pmu->pebs_enable_mask.


Is there anything else that only Icelake supports in this patches set?


The PDIR counter on the Ice Lake is the fixed counter 0
while the PDIR counter on the Sky Lake is the gp counter 1.

You may also expose x86_pmu.pebs_vmx for Skylake in the 1st patch.



Yes. In fact, I have tried using this patch set on Skylake after these
modifications:

1.  Expose x86_pmu.pebs_vmx for Skylake.
2.  Use PMC0-PMC3 for PEBS:
2.1 Replace "INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed" with
    "x86_pmu.max_pebs_events" in "x86_pmu_handle_guest_pebs".
2.2 Unmask other unsupported bits in the pmu->pebs_enable_mask.
    IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is always 0 on
    Skylake, so pmu->pebs_enable_mask equals
    ((1ull << pmu->nr_arch_gp_counters) - 1).
2.3 Replace "pmc->idx == 32" with "pmc->idx == 1" because the
    PDIR counter on Skylake is the gp counter 1.
3.  Skip patch 09 because Skylake does not support adaptive PEBS.
4.  Disable all CPU model checks in this patch set, just for testing.
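As a rough illustration of the expression in 2.2, the sketch below
evaluates ((1ull << n) - 1) for a given number of general-purpose
counters and then restricts it to PMC0-PMC3. The helper names are
hypothetical, and the sketch only shows the bit arithmetic; it does not
say how pmu->pebs_enable_mask is actually consumed by the patch set.

```python
def low_bits_mask(nr_gp_counters):
    """Value of ((1ull << nr_gp_counters) - 1): bits 0..n-1 set."""
    return (1 << nr_gp_counters) - 1


# Assumption for illustration: only PMC0-PMC3 are valid for PEBS,
# as stated for Skylake earlier in this thread.
SKYLAKE_PEBS_COUNTERS = low_bits_mask(4)


def restrict_to_pebs_counters(mask):
    """Keep only the bits that correspond to PMC0-PMC3."""
    return mask & SKYLAKE_PEBS_COUNTERS


if __name__ == "__main__":
    for n in (4, 8):
        full = low_bits_mask(n)
        print(f"n={n}: mask={full:#x}, "
              f"PMC0-3 only={restrict_to_pebs_counters(full):#x}")
```

With 8 general-purpose counters the plain expression also covers
PMC4-PMC7, which is why the extra restriction may matter.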


Unfortunately, the guest records for only a few seconds, and then the
host reliably hits a soft lockup.

Is there anything wrong?




Besides, we have tried this patches set in Icelake.  We can use 
pebs(eg: "perf record -e cycles:pp")
when guest is kernel-5.11, but can't when kernel-4.18.  Is there a 
minimum guest kernel version requirement?


The Ice Lake CPU model has been added since v5.4.

You may double check whether the stable tree(s) code has
INTEL_FAM6_ICELAKE in the arch/x86/include/asm/intel-family.h.




Thanks,
Xiangdong Liu






Re: [PATCH v4 01/16] perf/x86/intel: Add x86_pmu.pebs_vmx for Ice Lake Servers

2021-04-06 Thread Liuxiangdong (Aven, Cloud Infrastructure Service Product Dept.)




On 2021/4/6 20:47, Andi Kleen wrote:

AFAIK, Icelake supports adaptive PEBS and extended PEBS which Skylake
doesn't.
But we can still use IA32_PEBS_ENABLE MSR to indicate general-purpose
counter in Skylake.
Is there anything else that only Icelake supports in this patches set?

Only Icelake server has the support for recovering from an EPT violation
on the PEBS data structures. To use it on Skylake server you would
need to pin the whole guest, but that is currently not done.

Sorry, I have some questions about "pin the whole guest". Do you mean
that VmPin equals VmSize in "/proc/$(pidof qemu-kvm)/status"? Or just
that VmLck equals VmSize? Or something else?

Besides, we have tried this patches set in Icelake.  We can use pebs(eg:
"perf record -e cycles:pp")
when guest is kernel-5.11, but can't when kernel-4.18.  Is there a minimum
guest kernel version requirement?

You would need a guest kernel that supports Icelake server PEBS. 4.18
would need backports for that.


-Andi




Re: [PATCH v4 01/16] perf/x86/intel: Add x86_pmu.pebs_vmx for Ice Lake Servers

2021-04-05 Thread Liuxiangdong (Aven, Cloud Infrastructure Service Product Dept.)

Hi, Like.
Some questions about this new PEBS patch set:
https://lore.kernel.org/kvm/20210329054137.120994-2-like...@linux.intel.com/

The new hardware facility supporting guest PEBS is only available
on Intel Ice Lake Server platforms for now.


AFAIK, Icelake supports adaptive PEBS and extended PEBS, which Skylake
doesn't.
But we can still use the IA32_PEBS_ENABLE MSR to select general-purpose
counters on Skylake.

Is there anything else in this patch set that only Icelake supports?


Besides, we have tried this patch set on Icelake. We can use PEBS (e.g.
"perf record -e cycles:pp")
when the guest kernel is 5.11, but not when it is 4.18. Is there a
minimum guest kernel version requirement?



Thanks,
Xiangdong Liu


Re: [PATCH v3 00/17] KVM: x86/pmu: Add support to enable Guest PEBS via DS

2021-01-28 Thread Liuxiangdong (Aven, Cloud Infrastructure Service Product Dept.)




On 2021/1/26 15:08, Xu, Like wrote:

On 2021/1/25 22:47, Liuxiangdong (Aven, Cloud Infrastructure Service
Product Dept.) wrote:

Thanks for replying,

On 2021/1/25 10:41, Like Xu wrote:

+ k...@vger.kernel.org

Hi Liuxiangdong,

On 2021/1/22 18:02, Liuxiangdong (Aven, Cloud Infrastructure Service
Product Dept.) wrote:

Hi Like,

Some questions about
https://lore.kernel.org/kvm/20210104131542.495413-1-like...@linux.intel.com/


Thanks for trying the PEBS feature in the guest,
and I assume you have correctly applied the QEMU patches for guest PEBS.


Is there any other patch that needs to be applied? I use qemu 5.2.0.
(downloaded from github on January 14th)

Two qemu patches are attached against qemu tree
(commit 31ee895047bdcf7387e3570cbd2a473c6f744b08)
and then run the guest with "-cpu,pebs=true".

Note, these two patches are just for testing and not finalized for qemu upstream.

Yes, we can use pebs in IceLake when qemu patches applied.
Thanks very much!

1)Test in IceLake

In the [PATCH v3 10/17] KVM: x86/pmu: Expose CPUIDs feature bits PDCM,
DS, DTES64, we only support Ice Lake with the following x86_model(s):

#define INTEL_FAM6_ICELAKE_X    0x6A
#define INTEL_FAM6_ICELAKE_D    0x6C

you can check the eax output of "cpuid -l 1 -1 -r",
for example "0x000606a4" meets this requirement.

It's INTEL_FAM6_ICELAKE_X

Yes, it's the target hardware.


cpuid -l 1 -1 -r

CPU:
0x0001 0x00: eax=0x000606a6 ebx=0xb4800800 ecx=0x7ffefbf7
edx=0xbfebfbff
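
For reference, a minimal sketch of how the eax value from CPUID leaf 1
maps to family, model and stepping, using the standard Intel display
family/model rules. The function name is hypothetical and only meant as
a quick way to cross-check the output above against the x86_model
values.

```python
def decode_cpuid_eax(eax):
    """Return (family, model, stepping) from CPUID.01H:EAX."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


if __name__ == "__main__":
    family, model, stepping = decode_cpuid_eax(0x000606A6)
    print(f"family={family} model={model:#x} stepping={stepping}")
```

For eax=0x000606a6 this gives family 6 and model 0x6A (106 decimal),
which matches the "Model: 106" line reported for the host.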


HOST:

CPU family:  6

Model:   106

Model name:  Intel(R) Xeon(R) Platinum 8378A CPU $@ $@

microcode: sig=0x606a6, pf=0x1, revision=0xd000122

As long as you get the latest BIOS from the provider,
you may check 'cat /proc/cpuinfo | grep code | uniq' with the latest one.

OK. I'll do it later.

Guest:  linux kernel 5.11.0-rc2

I assume it's the "upstream tag v5.11-rc2" which is fine.

Yes.

We can find pebs/intel_pt flag in guest cpuinfo, but there still exists
error when we use perf

Just a note, intel_pt and pebs are two features and we can write
pebs records to intel_pt buffer with extra hardware support.
(by default, pebs records are written to the pebs buffer)

You may check the output of "dmesg | grep PEBS" in the guest
to see if the guest PEBS cpuinfo is exposed and use "perf record
-e cycles:pp" to see if the PEBS feature actually works in the guest.

I applied only the PEBS patch set to Linux kernel 5.11.0-rc2, tested
perf in the guest, and dumped the stack where -EOPNOTSUPP is returned.

Yes, you may apply the qemu patches and try it again.


(1)
# perf record -e instructions:pp
Error:
instructions:pp: PMU Hardware doesn't support
sampling/overflow-interrupts. Try 'perf stat'

[  117.793266] Call Trace:
[  117.793270]  dump_stack+0x57/0x6a
[  117.793275]  intel_pmu_setup_lbr_filter+0x137/0x190
[  117.793280]  intel_pmu_hw_config+0x18b/0x320
[  117.793288]  hsw_hw_config+0xe/0xa0
[  117.793290]  x86_pmu_event_init+0x8e/0x210
[  117.793293]  perf_try_init_event+0x40/0x130
[  117.793297]  perf_event_alloc.part.22+0x611/0xde0
[  117.793299]  ? alloc_fd+0xba/0x180
[  117.793302]  __do_sys_perf_event_open+0x1bd/0xd90
[  117.793305]  do_syscall_64+0x33/0x40
[  117.793308]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Do we need lbr when we use pebs?

No, LBR and PEBS are two features and we enable them separately.


I tried to apply lbr patch
set(https://lore.kernel.org/kvm/911adb63-ba05-ea93-c038-1c09cff15...@intel.com/)
to kernel and qemu, but there is still other problem.
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for
event
...

We don't need that patch for PEBS feature.


(2)
# perf record -e instructions:ppp
Error:
instructions:ppp: PMU Hardware doesn't support
sampling/overflow-interrupts. Try 'perf stat'

[  115.188498] Call Trace:
[  115.188503]  dump_stack+0x57/0x6a
[  115.188509]  x86_pmu_hw_config+0x1eb/0x220
[  115.188515]  intel_pmu_hw_config+0x13/0x320
[  115.188519]  hsw_hw_config+0xe/0xa0
[  115.188521]  x86_pmu_event_init+0x8e/0x210
[  115.188524]  perf_try_init_event+0x40/0x130
[  115.188528]  perf_event_alloc.part.22+0x611/0xde0
[  115.188530]  ? alloc_fd+0xba/0x180
[  115.188534]  __do_sys_perf_event_open+0x1bd/0xd90
[  115.188538]  do_syscall_64+0x33/0x40
[  115.188541]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is because x86_pmu.intel_cap.pebs_format is always 0 in
x86_pmu_max_precise().

We rdmsr MSR_IA32_PERF_CAPABILITIES(0x0345)  from HOST, it's f4c5.
 From guest, it's 2000
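
The two values can be decoded by hand. Below is a minimal sketch that
assumes the bit layout of the kernel's union perf_capabilities (LBR
format in bits 5:0, PEBS format in bits 11:8, and so on); the function
name is hypothetical.

```python
def decode_perf_capabilities(value):
    """Split an IA32_PERF_CAPABILITIES value into its low bit fields."""
    return {
        "lbr_format": value & 0x3F,           # bits 5:0
        "pebs_trap": (value >> 6) & 1,        # bit 6
        "pebs_arch_reg": (value >> 7) & 1,    # bit 7
        "pebs_format": (value >> 8) & 0xF,    # bits 11:8
        "smm_freeze": (value >> 12) & 1,      # bit 12
        "full_width_write": (value >> 13) & 1,  # bit 13
        "pebs_baseline": (value >> 14) & 1,   # bit 14
        "perf_metrics": (value >> 15) & 1,    # bit 15
    }


if __name__ == "__main__":
    for name, raw in (("host", 0xF4C5), ("guest", 0x2000)):
        print(name, decode_perf_capabilities(raw))
```

Under that layout the host value 0xf4c5 carries PEBS format 4 with
PEBS_BASELINE set, while the guest value 0x2000 only has the
full-width-write bit set, so the guest sees PEBS format 0.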


# perf record -e cycles:pp

Error:

cycles:pp: PMU Hardware doesn’t support sampling/overflow-interrupts.
Try ‘perf stat’

Could you give some advice?

If you have more specific comments or any concerns, just let me know.


2)Test in Skylake

HOST:

CPU family:  6

Model:   8

Re: [PATCH v3 00/17] KVM: x86/pmu: Add support to enable Guest PEBS via DS

2021-01-25 Thread Liuxiangdong (Aven, Cloud Infrastructure Service Product Dept.)

Thanks for replying,

On 2021/1/25 10:41, Like Xu wrote:

+ k...@vger.kernel.org

Hi Liuxiangdong,

On 2021/1/22 18:02, Liuxiangdong (Aven, Cloud Infrastructure Service 
Product Dept.) wrote:

Hi Like,

Some questions about 
https://lore.kernel.org/kvm/20210104131542.495413-1-like...@linux.intel.com/ 



Thanks for trying the PEBS feature in the guest,
and I assume you have correctly applied the QEMU patches for guest PEBS.

Is there any other patch that needs to be applied? I use qemu 5.2.0.
(downloaded from github on January 14th)



1)Test in IceLake


In the [PATCH v3 10/17] KVM: x86/pmu: Expose CPUIDs feature bits PDCM, 
DS, DTES64, we only support Ice Lake with the following x86_model(s):


#define INTEL_FAM6_ICELAKE_X    0x6A
#define INTEL_FAM6_ICELAKE_D    0x6C

you can check the eax output of "cpuid -l 1 -1 -r",
for example "0x000606a4" meets this requirement.

It's INTEL_FAM6_ICELAKE_X
cpuid -l 1 -1 -r

CPU:
   0x0001 0x00: eax=0x000606a6 ebx=0xb4800800 ecx=0x7ffefbf7 
edx=0xbfebfbff




HOST:

CPU family:  6

Model:   106

Model name:  Intel(R) Xeon(R) Platinum 8378A CPU 
$@ $@


microcode: sig=0x606a6, pf=0x1, revision=0xd000122


As long as you get the latest BIOS from the provider,
you may check 'cat /proc/cpuinfo | grep code | uniq' with the latest one.

OK. I'll do it later.




Guest:  linux kernel 5.11.0-rc2


I assume it's the "upstream tag v5.11-rc2" which is fine.

Yes.




We can find pebs/intel_pt flag in guest cpuinfo, but there still 
exists error when we use perf


Just a note, intel_pt and pebs are two features and we can write
pebs records to intel_pt buffer with extra hardware support.
(by default, pebs records are written to the pebs buffer)

You may check the output of "dmesg | grep PEBS" in the guest
to see if the guest PEBS cpuinfo is exposed and use "perf record
-e cycles:pp" to see if the PEBS feature actually works in the guest.


I applied only the PEBS patch set to Linux kernel 5.11.0-rc2, tested
perf in the guest, and dumped the stack where -EOPNOTSUPP is returned.


(1)
# perf record -e instructions:pp
Error:
instructions:pp: PMU Hardware doesn't support 
sampling/overflow-interrupts. Try 'perf stat'


[  117.793266] Call Trace:
[  117.793270]  dump_stack+0x57/0x6a
[  117.793275]  intel_pmu_setup_lbr_filter+0x137/0x190
[  117.793280]  intel_pmu_hw_config+0x18b/0x320
[  117.793288]  hsw_hw_config+0xe/0xa0
[  117.793290]  x86_pmu_event_init+0x8e/0x210
[  117.793293]  perf_try_init_event+0x40/0x130
[  117.793297]  perf_event_alloc.part.22+0x611/0xde0
[  117.793299]  ? alloc_fd+0xba/0x180
[  117.793302]  __do_sys_perf_event_open+0x1bd/0xd90
[  117.793305]  do_syscall_64+0x33/0x40
[  117.793308]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Do we need lbr when we use pebs?

I tried to apply lbr patch 
set(https://lore.kernel.org/kvm/911adb63-ba05-ea93-c038-1c09cff15...@intel.com/) 
to kernel and qemu, but there is still other problem.

Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) 
for event

...

(2)
# perf record -e instructions:ppp
Error:
instructions:ppp: PMU Hardware doesn't support 
sampling/overflow-interrupts. Try 'perf stat'


[  115.188498] Call Trace:
[  115.188503]  dump_stack+0x57/0x6a
[  115.188509]  x86_pmu_hw_config+0x1eb/0x220
[  115.188515]  intel_pmu_hw_config+0x13/0x320
[  115.188519]  hsw_hw_config+0xe/0xa0
[  115.188521]  x86_pmu_event_init+0x8e/0x210
[  115.188524]  perf_try_init_event+0x40/0x130
[  115.188528]  perf_event_alloc.part.22+0x611/0xde0
[  115.188530]  ? alloc_fd+0xba/0x180
[  115.188534]  __do_sys_perf_event_open+0x1bd/0xd90
[  115.188538]  do_syscall_64+0x33/0x40
[  115.188541]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is because x86_pmu.intel_cap.pebs_format is always 0 in
x86_pmu_max_precise().


We rdmsr MSR_IA32_PERF_CAPABILITIES(0x0345)  from HOST, it's f4c5.
From guest, it's 2000



# perf record -e cycles:pp

Error:

cycles:pp: PMU Hardware doesn’t support sampling/overflow-interrupts. 
Try ‘perf stat’


Could you give some advice?


If you have more specific comments or any concerns, just let me know.



2)Test in Skylake

HOST:

CPU family:  6

Model:   85

Model name:  Intel(R) Xeon(R) Gold 6146 CPU @

   3.20GHz

microcode: 0x264

Guest: linux 4.18

We cannot find the intel_pt flag in the guest cpuinfo because
cpu_has_vmx_intel_pt() returns false.


You may check vmx_pebs_supported().

It's true.




SECONDARY_EXEC_PT_USE_GPA/VM_EXIT_CLEAR_IA32_RTIT_CTL/VM_ENTRY_LOAD_IA32_RTIT_CTL
are all disabled.


Is it because microcode is not supported?

And is there a new microcode which can support these bits? How can we
get it?


Currently, this patch set doesn't support guest PEBS on the Skylake
platforms, and if