from:"Christopher Covington"

[Qemu-devel] [PATCH v3] build: include sys/sysmacros.h for major() and minor()

2016-12-28 Thread Christopher Covington

The definition of the major() and minor() macros are moving within glibc to
. Include this header when it is available to avoid the
following sorts of build-stopping messages:

qga/commands-posix.c: In function ‘dev_major_minor’:
qga/commands-posix.c:656:13: error: In the GNU C Library, "major" is defined
 by . For historical compatibility, it is
 currently defined by  as well, but we plan to
 remove this soon. To use "major", include 
 directly. If you did not intend to use a system-defined macro
 "major", you should undefine it after including . [-Werror]
 *devmajor = major(st.st_rdev);
 ^~

qga/commands-posix.c:657:13: error: In the GNU C Library, "minor" is defined
 by . For historical compatibility, it is
 currently defined by  as well, but we plan to
 remove this soon. To use "minor", include 
 directly. If you did not intend to use a system-defined macro
 "minor", you should undefine it after including . [-Werror]
 *devminor = minor(st.st_rdev);
 ^~

The additional include allows the build to complete on Fedora 26 (Rawhide)
with glibc version 2.24.90.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 configure | 18 ++
 include/sysemu/os-posix.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/configure b/configure
index 218df87d21..58a33c71ad 100755
--- a/configure
+++ b/configure
@@ -4746,6 +4746,20 @@ if test "$modules" = "yes" && test "$LD_REL_FLAGS" = ""; 
then
 fi
 
 ##
+# check for sysmacros.h
+
+have_sysmacros=no
+cat > $TMPC << EOF
+#include 
+int main(void) {
+return makedev(0, 0);
+}
+EOF
+if compile_prog "" "" ; then
+have_sysmacros=yes
+fi
+
+##
 # End of CC checks
 # After here, no more $cc or $ld runs
 
@@ -5721,6 +5735,10 @@ if test "$have_af_vsock" = "yes" ; then
   echo "CONFIG_AF_VSOCK=y" >> $config_host_mak
 fi
 
+if test "$have_sysmacros" = "yes" ; then
+  echo "CONFIG_SYSMACROS=y" >> $config_host_mak
+fi
+
 # Hold two types of flag:
 #   CONFIG_THREAD_SETNAME_BYTHREAD  - we've got a way of setting the name on
 # a thread we have a handle to
diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index b0a6c0695b..900bdcb45a 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -34,6 +34,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_SYSMACROS
+#include 
+#endif
+
 void os_set_line_buffering(void);
 void os_set_proc_name(const char *s);
 void os_setup_signal_handling(void);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

Re: [Qemu-devel] [PATCH v2] build: include sys/sysmacros.h for major() and minor()

2016-12-28 Thread Christopher Covington

Hi Eric,

On 12/28/2016 11:10 AM, Eric Blake wrote:
> On 12/28/2016 08:53 AM, Christopher Covington wrote:
>> The definition of the major() and minor() macros are moving within glibc to
>> . Include this header to avoid the following sorts of
>> build-stopping messages:
>>
> 
>> The additional include allows the build to complete on Fedora 26 (Rawhide)
>> with glibc version 2.24.90.
>>
>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>> ---
>>  include/sysemu/os-posix.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
>> index b0a6c0695b..772d58f7ed 100644
>> --- a/include/sysemu/os-posix.h
>> +++ b/include/sysemu/os-posix.h
>> @@ -28,6 +28,7 @@
>>  
>>  #include 
>>  #include 
>> +#include 
> 
> I repeat what I said on v1:
> 
> Works for glibc; but  is non-standard and not present
> on some other systems, so this may fail to build elsewhere.

I read your response to v1 but got stuck on this "some other systems"
statement which seems too vague for me to act on. I see the following
operating systems checked in configure:

Cygwin, mingw32, GNU/kFreeBSD, FreeBSD, DragonFly, NetBSD, OpenBSD,
Darwin, SunOS, AIX, Haiku, and Linux.

But I'm really not sure what list of C libraries and corresponding mkdev.h
versus sysmacros.h versus types.h usage this translates to.

> You'll probably need a configure probe.

I'm testing that now and will hopefully send it out as v3 shortly.

> Autoconf also says that some platforms have  instead of
>  (per its AC_HEADER_MAJOR macro).

`git grep mkdev` returns no results for me so I conclude that no currently
supported OS/libc requires it.

In case anyone wants to work around these messages, I'd like to highlight
the --disable-werror option to ./configure. If I had known about it this
morning, I probably would be happily authoring other changes right now.

Thanks,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

[Qemu-devel] [PATCH v2] build: include sys/sysmacros.h for major() and minor()

2016-12-28 Thread Christopher Covington

The definition of the major() and minor() macros are moving within glibc to
. Include this header to avoid the following sorts of
build-stopping messages:

qga/commands-posix.c: In function ‘dev_major_minor’:
qga/commands-posix.c:656:13: error: In the GNU C Library, "major" is defined
 by . For historical compatibility, it is
 currently defined by  as well, but we plan to
 remove this soon. To use "major", include 
 directly. If you did not intend to use a system-defined macro
 "major", you should undefine it after including . [-Werror]
 *devmajor = major(st.st_rdev);
 ^~

qga/commands-posix.c:657:13: error: In the GNU C Library, "minor" is defined
 by . For historical compatibility, it is
 currently defined by  as well, but we plan to
 remove this soon. To use "minor", include 
 directly. If you did not intend to use a system-defined macro
 "minor", you should undefine it after including . [-Werror]
 *devminor = minor(st.st_rdev);
 ^~

The additional include allows the build to complete on Fedora 26 (Rawhide)
with glibc version 2.24.90.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 include/sysemu/os-posix.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index b0a6c0695b..772d58f7ed 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -28,6 +28,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

Re: [Qemu-devel] [PATCH kvm-unit-tests v8 03/10] arm/arm64: add some delay routines

2016-12-27 Thread Christopher Covington

On 12/09/2016 07:15 AM, Andrew Jones wrote:
> On Fri, Dec 09, 2016 at 11:41:06AM +, Andre Przywara wrote:
>> Hi,
>>
>> On 08/12/16 17:50, Andrew Jones wrote:
>>> Allow a thread to wait some specified amount of time. Can
>>> specify in cycles, usecs, and msecs.

>>> +++ b/lib/arm/asm/delay.h
>>> @@ -0,0 +1,14 @@
>>> +#ifndef _ASMARM_DELAY_H_
>>> +#define _ASMARM_DELAY_H_
>>> +/*
>>> + * Copyright (C) 2016, Red Hat Inc, Andrew Jones 
>>> + *
>>> + * This work is licensed under the terms of the GNU LGPL, version 2.
>>> + */
>>> +#include 
>>> +
>>> +extern void delay(u64 cycles);
>>
>> Nit: Shouldn't this parameter be called "ticks"? Cycles might be a bit
>> misleading, especially since this prototype is the only documentation on
>> this. You might just want to fix this when applying the patches.
> 
> Right or wrong the kernel uses 'cycles' for this function, named
> __timer_delay for arm and __delay for arm64. I guess I prefer
> consistency here.

I too expect timers to tick and CPUs to cycle. The benefit of
parameter-name-precise consistency with the Linux source is not
obvious to me.

Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

Re: [Qemu-devel] [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking

2016-12-01 Thread Christopher Covington

On 12/01/2016 03:27 PM, Andre Przywara wrote:
> Hi,
> 
> On 01/12/16 05:16, Wei Huang wrote:
>> From: Christopher Covington <c...@codeaurora.org>
>>
>> Calculate the numbers of cycles per instruction (CPI) implied by ARM
>> PMU cycle counter values. The code includes a strict checking facility
>> intended for the -icount option in TCG mode in the configuration file.
>>
>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>> Signed-off-by: Wei Huang <w...@redhat.com>
>> Reviewed-by: Andrew Jones <drjo...@redhat.com>
>> ---
>>  arm/pmu.c | 123 
>> +-
>>  arm/unittests.cfg |  14 +++
>>  2 files changed, 136 insertions(+), 1 deletion(-)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 3566a27..29d7c2c 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -69,6 +69,27 @@ static inline void set_pmccfiltr(uint32_t value)
>>  set_pmxevtyper(value);
>>  isb();
>>  }
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to 
>> compensate
>> + * for, so hand assemble everything between, and including, the PMCR 
>> accesses
>> + * to start and stop counting. isb instructions were inserted to make sure
>> + * pmccntr read after this function returns the exact instructions executed 
>> in
>> + * the controlled block. Total instrs = isb + mcr + 2*loop = 2 + 2*loop.
>> + */
>> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
>> +{
>> +asm volatile(
>> +"   mcr p15, 0, %[pmcr], c9, c12, 0\n"
>> +"   isb\n"
>> +"1: subs%[loop], %[loop], #1\n"
>> +"   bgt 1b\n"
>> +"   mcr p15, 0, %[z], c9, c12, 0\n"
>> +"   isb\n"
>> +: [loop] "+r" (loop)
>> +: [pmcr] "r" (pmcr), [z] "r" (0)
>> +: "cc");
>> +}
>>  #elif defined(__aarch64__)
>>  DEFINE_GET_SYSREG32(pmcr, el0)
>>  DEFINE_SET_SYSREG32(pmcr, el0)
>> @@ -77,6 +98,27 @@ DEFINE_GET_SYSREG64(pmccntr, el0);
>>  DEFINE_SET_SYSREG64(pmccntr, el0);
>>  DEFINE_SET_SYSREG32(pmcntenset, el0);
>>  DEFINE_SET_SYSREG32(pmccfiltr, el0);
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to 
>> compensate
>> + * for, so hand assemble everything between, and including, the PMCR 
>> accesses
>> + * to start and stop counting. isb instructions are inserted to make sure
>> + * pmccntr read after this function returns the exact instructions executed
>> + * in the controlled block. Total instrs = isb + msr + 2*loop = 2 + 2*loop.
>> + */
>> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
>> +{
>> +asm volatile(
>> +"   msr pmcr_el0, %[pmcr]\n"
>> +"   isb\n"
>> +"1: subs%[loop], %[loop], #1\n"
>> +"   b.gt1b\n"
>> +"   msr pmcr_el0, xzr\n"
>> +"   isb\n"
>> +: [loop] "+r" (loop)
>> +: [pmcr] "r" (pmcr)
>> +: "cc");
>> +}
>>  #endif
>>  
>>  /*
>> @@ -134,6 +176,79 @@ static bool check_cycles_increase(void)
>>  return success;
>>  }
>>  
>> +/*
>> + * Execute a known number of guest instructions. Only even instruction 
>> counts
>> + * greater than or equal to 4 are supported by the in-line assembly code. 
>> The
>> + * control register (PMCR_EL0) is initialized with the provided value 
>> (allowing
>> + * for example for the cycle counter or event counters to be reset). At the 
>> end
>> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
>> + * counting, allowing the cycle counter or event counters to be read at the
>> + * leisure of the calling code.
>> + */
>> +static void measure_instrs(int num, uint32_t pmcr)
>> +{
>> +int loop = (num - 2) / 2;
>> +
>> +assert(num >= 4 && ((num - 2) % 2 == 0));
>> +precise_instrs_loop(loop, pmcr);
>> +}
>> +
>> +/*
>> + * Measure cycle counts for various known instruction counts. Ensure that 
>> the
>> + * cycle counter progresses (similar to check_cycles_increase() but with 
>> more
>> + * instructions and using reset and stop controls). If supplied a positive,
>> + * nonzero CPI parameter, also strictly

Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases

2016-11-16 Thread Christopher Covington

On 11/16/2016 11:25 AM, Andrew Jones wrote:
> On Wed, Nov 16, 2016 at 11:08:42AM -0500, Christopher Covington wrote:
>> On 11/16/2016 08:01 AM, Andrew Jones wrote:
>>> On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote:
>>>>
>>>>
>>>> On 11/14/2016 09:12 AM, Christopher Covington wrote:
>>>>> Hi Drew, Wei,
>>>>>
>>>>> On 11/14/2016 05:05 AM, Andrew Jones wrote:
>>>>>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
>>>>>>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
>>>>>>>>> From: Christopher Covington <c...@codeaurora.org>
>>>>>>>>>
>>>>>>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>>>>>>>>> even for the smallest delta of two subsequent reads.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>>>>>>>>> Signed-off-by: Wei Huang <w...@redhat.com>
>>>>>>>>> ---
>>>>>>>>>  arm/pmu.c | 98 
>>>>>>>>> +++
>>>>>>>>>  1 file changed, 98 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/arm/pmu.c b/arm/pmu.c
>>>>>>>>> index 0b29088..d5e3ac3 100644
>>>>>>>>> --- a/arm/pmu.c
>>>>>>>>> +++ b/arm/pmu.c
>>>>>>>>> @@ -14,6 +14,7 @@
>>>>>>>>>   */
>>>>>>>>>  #include "libcflat.h"
>>>>>>>>>  
>>>>>>>>> +#define PMU_PMCR_E (1 << 0)
>>>>>>>>>  #define PMU_PMCR_N_SHIFT   11
>>>>>>>>>  #define PMU_PMCR_N_MASK0x1f
>>>>>>>>>  #define PMU_PMCR_ID_SHIFT  16
>>>>>>>>> @@ -21,6 +22,10 @@
>>>>>>>>>  #define PMU_PMCR_IMP_SHIFT 24
>>>>>>>>>  #define PMU_PMCR_IMP_MASK  0xff
>>>>>>>>>  
>>>>>>>>> +#define PMU_CYCLE_IDX  31
>>>>>>>>> +
>>>>>>>>> +#define NR_SAMPLES 10
>>>>>>>>> +
>>>>>>>>>  #if defined(__arm__)
>>>>>>>>>  static inline uint32_t pmcr_read(void)
>>>>>>>>>  {
>>>>>>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>>>>>>>>>   asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>>>>>>>>   return ret;
>>>>>>>>>  }
>>>>>>>>> +
>>>>>>>>> +static inline void pmcr_write(uint32_t value)
>>>>>>>>> +{
>>>>>>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline void pmselr_write(uint32_t value)
>>>>>>>>> +{
>>>>>>>>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline void pmxevtyper_write(uint32_t value)
>>>>>>>>> +{
>>>>>>>>> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/*
>>>>>>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, 
>>>>>>>>> returning 64
>>>>>>>>> + * bits doesn't seem worth the trouble when differential usage of 
>>>>>>>>> the result is
>>>>>>>>> + * expected (with differences that can easily fit in 32 bits). So 
>>>>>>>>> just return
>>>>>>>>> + * the lower 32 bits of the cycle count in AArch32.
>>>>>>>>
>>>>>>>> Like I said in t

Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases

2016-11-16 Thread Christopher Covington

On 11/16/2016 08:01 AM, Andrew Jones wrote:
> On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote:
>>
>>
>> On 11/14/2016 09:12 AM, Christopher Covington wrote:
>>> Hi Drew, Wei,
>>>
>>> On 11/14/2016 05:05 AM, Andrew Jones wrote:
>>>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
>>>>>
>>>>>
>>>>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
>>>>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
>>>>>>> From: Christopher Covington <c...@codeaurora.org>
>>>>>>>
>>>>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>>>>>>> even for the smallest delta of two subsequent reads.
>>>>>>>
>>>>>>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>>>>>>> Signed-off-by: Wei Huang <w...@redhat.com>
>>>>>>> ---
>>>>>>>  arm/pmu.c | 98 
>>>>>>> +++
>>>>>>>  1 file changed, 98 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arm/pmu.c b/arm/pmu.c
>>>>>>> index 0b29088..d5e3ac3 100644
>>>>>>> --- a/arm/pmu.c
>>>>>>> +++ b/arm/pmu.c
>>>>>>> @@ -14,6 +14,7 @@
>>>>>>>   */
>>>>>>>  #include "libcflat.h"
>>>>>>>  
>>>>>>> +#define PMU_PMCR_E (1 << 0)
>>>>>>>  #define PMU_PMCR_N_SHIFT   11
>>>>>>>  #define PMU_PMCR_N_MASK0x1f
>>>>>>>  #define PMU_PMCR_ID_SHIFT  16
>>>>>>> @@ -21,6 +22,10 @@
>>>>>>>  #define PMU_PMCR_IMP_SHIFT 24
>>>>>>>  #define PMU_PMCR_IMP_MASK  0xff
>>>>>>>  
>>>>>>> +#define PMU_CYCLE_IDX  31
>>>>>>> +
>>>>>>> +#define NR_SAMPLES 10
>>>>>>> +
>>>>>>>  #if defined(__arm__)
>>>>>>>  static inline uint32_t pmcr_read(void)
>>>>>>>  {
>>>>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>>>>>>> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>>>>>> return ret;
>>>>>>>  }
>>>>>>> +
>>>>>>> +static inline void pmcr_write(uint32_t value)
>>>>>>> +{
>>>>>>> +   asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline void pmselr_write(uint32_t value)
>>>>>>> +{
>>>>>>> +   asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline void pmxevtyper_write(uint32_t value)
>>>>>>> +{
>>>>>>> +   asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, 
>>>>>>> returning 64
>>>>>>> + * bits doesn't seem worth the trouble when differential usage of the 
>>>>>>> result is
>>>>>>> + * expected (with differences that can easily fit in 32 bits). So just 
>>>>>>> return
>>>>>>> + * the lower 32 bits of the cycle count in AArch32.
>>>>>>
>>>>>> Like I said in the last review, I'd rather we not do this. We should
>>>>>> return the full value and then the test case should confirm the upper
>>>>>> 32 bits are zero.
>>>>>
>>>>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
>>>>> register. We can force it to a more coarse-grained cycle counter with
>>>>> PMCR.D bit=1 (see below). But it is still not a 64-bit register.
>>>
>>> AArch32 System Register Descriptions
>>> Performance Monitors registers
>>> PMCCNTR, Performance Monitors Cycle Count Register
>>>
>>> To access the PMCCNTR when accessing as a 32-bit re

Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases

2016-11-14 Thread Christopher Covington

Hi Drew, Wei,

On 11/14/2016 05:05 AM, Andrew Jones wrote:
> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
>>
>>
>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
>>>> From: Christopher Covington <c...@codeaurora.org>
>>>>
>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>>>> even for the smallest delta of two subsequent reads.
>>>>
>>>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>>>> Signed-off-by: Wei Huang <w...@redhat.com>
>>>> ---
>>>>  arm/pmu.c | 98 
>>>> +++
>>>>  1 file changed, 98 insertions(+)
>>>>
>>>> diff --git a/arm/pmu.c b/arm/pmu.c
>>>> index 0b29088..d5e3ac3 100644
>>>> --- a/arm/pmu.c
>>>> +++ b/arm/pmu.c
>>>> @@ -14,6 +14,7 @@
>>>>   */
>>>>  #include "libcflat.h"
>>>>  
>>>> +#define PMU_PMCR_E (1 << 0)
>>>>  #define PMU_PMCR_N_SHIFT   11
>>>>  #define PMU_PMCR_N_MASK0x1f
>>>>  #define PMU_PMCR_ID_SHIFT  16
>>>> @@ -21,6 +22,10 @@
>>>>  #define PMU_PMCR_IMP_SHIFT 24
>>>>  #define PMU_PMCR_IMP_MASK  0xff
>>>>  
>>>> +#define PMU_CYCLE_IDX  31
>>>> +
>>>> +#define NR_SAMPLES 10
>>>> +
>>>>  #if defined(__arm__)
>>>>  static inline uint32_t pmcr_read(void)
>>>>  {
>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>>>>asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>>>return ret;
>>>>  }
>>>> +
>>>> +static inline void pmcr_write(uint32_t value)
>>>> +{
>>>> +  asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
>>>> +}
>>>> +
>>>> +static inline void pmselr_write(uint32_t value)
>>>> +{
>>>> +  asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
>>>> +}
>>>> +
>>>> +static inline void pmxevtyper_write(uint32_t value)
>>>> +{
>>>> +  asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
>>>> +}
>>>> +
>>>> +/*
>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, 
>>>> returning 64
>>>> + * bits doesn't seem worth the trouble when differential usage of the 
>>>> result is
>>>> + * expected (with differences that can easily fit in 32 bits). So just 
>>>> return
>>>> + * the lower 32 bits of the cycle count in AArch32.
>>>
>>> Like I said in the last review, I'd rather we not do this. We should
>>> return the full value and then the test case should confirm the upper
>>> 32 bits are zero.
>>
>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
>> register. We can force it to a more coarse-grained cycle counter with
>> PMCR.D bit=1 (see below). But it is still not a 64-bit register.

AArch32 System Register Descriptions
Performance Monitors registers
PMCCNTR, Performance Monitors Cycle Count Register

To access the PMCCNTR when accessing as a 32-bit register:
MRC p15,0,,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
MCR p15,0,,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are 
unchanged

To access the PMCCNTR when accessing as a 64-bit register:
MRRC p15,0,,,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into 
Rt2
MCRR p15,0,,,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32]

Regards,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

Re: [Qemu-devel] [kvm-unit-tests PATCHv6 2/3] arm: pmu: Check cycle count increases

2016-10-12 Thread Christopher Covington

Hi Wei,

On 10/12/2016 11:49 AM, Wei Huang wrote:
> On 10/11/2016 01:40 PM, Christopher Covington wrote:
>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>> even for the smallest delta of two subsequent reads.
>>
>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>> Reviewed-by: Andrew Jones <drjo...@redhat.com>
>> ---
>>  arm/pmu.c | 60 
>>  1 file changed, 60 insertions(+)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 42d0ee1..4334de4 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -14,6 +14,8 @@
>>   */
>>  #include "libcflat.h"
>>  
>> +#define NR_SAMPLES 10
>> +
>>  #if defined(__arm__)
>>  static inline uint32_t get_pmcr(void)
>>  {
>> @@ -22,6 +24,25 @@ static inline uint32_t get_pmcr(void)
>>  asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>  return ret;
>>  }
>> +
>> +static inline void set_pmcr(uint32_t pmcr)
>> +{
>> +asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (pmcr));
>> +}
>> +
>> +/*
>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, 
>> returning 64
>> + * bits doesn't seem worth the trouble when differential usage of the 
>> result is
>> + * expected (with differences that can easily fit in 32 bits). So just 
>> return
>> + * the lower 32 bits of the cycle count in AArch32.
>> + */
>> +static inline unsigned long get_pmccntr(void)
>> +{
>> +unsigned long cycles;
>> +
>> +asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
>> +return cycles;
>> +}
>>  #elif defined(__aarch64__)
>>  static inline uint32_t get_pmcr(void)
>>  {
>> @@ -30,6 +51,19 @@ static inline uint32_t get_pmcr(void)
>>  asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
>>  return ret;
>>  }
>> +
>> +static inline void set_pmcr(uint32_t pmcr)
>> +{
>> +asm volatile("msr pmcr_el0, %0" : : "r" (pmcr));
>> +}
>> +
>> +static inline unsigned long get_pmccntr(void)
>> +{
>> +unsigned long cycles;
>> +
>> +asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
>> +return cycles;
>> +}
>>  #endif
>>  
>>  struct pmu_data {
>> @@ -72,11 +106,37 @@ static bool check_pmcr(void)
>>  return pmu.implementer != 0;
>>  }
>>  
>> +/*
>> + * Ensure that the cycle counter progresses between back-to-back reads.
>> + */
>> +static bool check_cycles_increase(void)
>> +{
>> +struct pmu_data pmu = {0};
> 
> Compilation error on my machine:
> 
> arm/pmu.c: In function ‘check_cycles_increase’:
> arm/pmu.c:148:9: error: missing braces around initializer
> [-Werror=missing-braces]
>   struct pmu_data pmu = {0};
> 
> Same for Patch 3.

"...So your compiler complains about {0}? Is there a problem besides the
warning? If not, then I'm still a bit inclined to keep the code neat. The
warnings will go away with compiler updates."

https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg06064.html

Thanks,
Cov
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

[Qemu-devel] [kvm-unit-tests PATCHv6 3/3] arm: pmu: Add CPI checking

2016-10-11 Thread Christopher Covington

Calculate the numbers of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility
intended for the -icount option in TCG mode but it is not yet enabled
in the configuration file. Enabling it must wait on infrastructure
improvements which allow for different tests to be run on TCG versus
KVM.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c | 103 +-
 1 file changed, 102 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index 4334de4..76a 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -43,6 +43,23 @@ static inline unsigned long get_pmccntr(void)
asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
return cycles;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+   asm volatile(
+   "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
+   "1: subs%[i], %[i], #1\n"
+   "   bgt 1b\n"
+   "   mcr p15, 0, %[z], c9, c12, 0\n"
+   : [i] "+r" (i)
+   : [pmcr] "r" (pmcr), [z] "r" (0)
+   : "cc");
+}
 #elif defined(__aarch64__)
 static inline uint32_t get_pmcr(void)
 {
@@ -64,6 +81,23 @@ static inline unsigned long get_pmccntr(void)
asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
return cycles;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+   asm volatile(
+   "   msr pmcr_el0, %[pmcr]\n"
+   "1: subs%[i], %[i], #1\n"
+   "   b.gt1b\n"
+   "   msr pmcr_el0, xzr\n"
+   : [i] "+r" (i)
+   : [pmcr] "r" (pmcr)
+   : "cc");
+}
 #endif
 
 struct pmu_data {
@@ -131,12 +165,79 @@ static bool check_cycles_increase(void)
return true;
 }
 
-int main(void)
+/*
+ * Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+   int i = (num - 1) / 2;
+
+   assert(num >= 3 && ((num - 1) % 2 == 0));
+   loop(i, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+   struct pmu_data pmu = {0};
+
+   pmu.cycle_counter_reset = 1;
+   pmu.enable = 1;
+
+   if (cpi > 0)
+   printf("Checking for CPI=%d.\n", cpi);
+   printf("instrs : cycles0 cycles1 ...\n");
+
+   for (int i = 3; i < 300; i += 32) {
+   int avg, sum = 0;
+
+   printf("%d :", i);
+   for (int j = 0; j < NR_SAMPLES; j++) {
+   int cycles;
+
+   measure_instrs(i, pmu.pmcr_el0);
+   cycles = get_pmccntr();
+   printf(" %d", cycles);
+
+   if (!cycles || (cpi > 0 && cycles != i * cpi)) {
+   printf("\n");
+   return false;
+   }
+
+   sum += cycles;
+   }
+   avg = sum / NR_SAMPLES;
+   printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
+   sum, avg, i / avg, avg / i);
+   }
+
+   return true;
+}
+
+int main(int argc, char *argv[])
 {
+   int cpi = 0;
+
+   if (argc >= 1)
+   cpi = atol(argv[0]);
+
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
report("Monotonically increasing cycle count", check_cycles_increase());
+   report("Cycle/instruction ratio", check_cpi(cpi));
 
return report_summary();
 }
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

[Qemu-devel] [kvm-unit-tests PATCHv6 2/3] arm: pmu: Check cycle count increases

2016-10-11 Thread Christopher Covington

Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
even for the smallest delta of two subsequent reads.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
Reviewed-by: Andrew Jones <drjo...@redhat.com>
---
 arm/pmu.c | 60 
 1 file changed, 60 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 42d0ee1..4334de4 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -14,6 +14,8 @@
  */
 #include "libcflat.h"
 
+#define NR_SAMPLES 10
+
 #if defined(__arm__)
 static inline uint32_t get_pmcr(void)
 {
@@ -22,6 +24,25 @@ static inline uint32_t get_pmcr(void)
asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
return ret;
 }
+
+static inline void set_pmcr(uint32_t pmcr)
+{
+   asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (pmcr));
+}
+
+/*
+ * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
+ * bits doesn't seem worth the trouble when differential usage of the result is
+ * expected (with differences that can easily fit in 32 bits). So just return
+ * the lower 32 bits of the cycle count in AArch32.
+ */
+static inline unsigned long get_pmccntr(void)
+{
+   unsigned long cycles;
+
+   asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
+   return cycles;
+}
 #elif defined(__aarch64__)
 static inline uint32_t get_pmcr(void)
 {
@@ -30,6 +51,19 @@ static inline uint32_t get_pmcr(void)
asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
return ret;
 }
+
+static inline void set_pmcr(uint32_t pmcr)
+{
+   asm volatile("msr pmcr_el0, %0" : : "r" (pmcr));
+}
+
+static inline unsigned long get_pmccntr(void)
+{
+   unsigned long cycles;
+
+   asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
+   return cycles;
+}
 #endif
 
 struct pmu_data {
@@ -72,11 +106,37 @@ static bool check_pmcr(void)
return pmu.implementer != 0;
 }
 
+/*
+ * Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+   struct pmu_data pmu = {0};
+
+   pmu.enable = 1;
+   set_pmcr(pmu.pmcr_el0);
+
+   for (int i = 0; i < NR_SAMPLES; i++) {
+   unsigned long a, b;
+
+   a = get_pmccntr();
+   b = get_pmccntr();
+
+   if (a >= b) {
+   printf("Read %ld then %ld.\n", a, b);
+   return false;
+   }
+   }
+
+   return true;
+}
+
 int main(void)
 {
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
+   report("Monotonically increasing cycle count", check_cycles_increase());
 
return report_summary();
 }
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

[Qemu-devel] [kvm-unit-tests PATCHv6 1/3] arm: Add PMU test

2016-10-11 Thread Christopher Covington

Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU). As of
October 2016, whether KVM mode has a PMU at all is a tricky
question of which QEMU / mach-virt version is used. So only enable
the TCG mode tests for now.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
Reviewed-by: Andrew Jones <drjo...@redhat.com>
---
 arm/Makefile.common |  3 +-
 arm/pmu.c   | 82 +
 arm/unittests.cfg   | 14 +
 3 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index ccb554d..f98f422 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -11,7 +11,8 @@ endif
 
 tests-common = \
$(TEST_DIR)/selftest.flat \
-   $(TEST_DIR)/spinlock-test.flat
+   $(TEST_DIR)/spinlock-test.flat \
+   $(TEST_DIR)/pmu.flat
 
 all: test_cases
 
diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 000..42d0ee1
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,82 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright 2015 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+
+#if defined(__arm__)
+static inline uint32_t get_pmcr(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
+   return ret;
+}
+#elif defined(__aarch64__)
+static inline uint32_t get_pmcr(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
+   return ret;
+}
+#endif
+
+struct pmu_data {
+   union {
+   uint32_t pmcr_el0;
+   struct {
+   uint32_t enable:1;
+   uint32_t event_counter_reset:1;
+   uint32_t cycle_counter_reset:1;
+   uint32_t cycle_counter_clock_divider:1;
+   uint32_t event_counter_export:1;
+   uint32_t cycle_counter_disable_when_prohibited:1;
+   uint32_t cycle_counter_long:1;
+   uint32_t reserved:4;
+   uint32_t counters:5;
+   uint32_t identification_code:8;
+   uint32_t implementer:8;
+   };
+   };
+};
+
+/*
+ * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
+ * null. Also print out a couple other interesting fields for diagnostic
+ * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero event counters, but hopefully
+ * support for at least the instructions event will be added in the future and
+ * the reported number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+   struct pmu_data pmu;
+
+   pmu.pmcr_el0 = get_pmcr();
+
+   printf("PMU implementer: %c\n", pmu.implementer);
+   printf("Identification code: 0x%x\n", pmu.identification_code);
+   printf("Event counters:  %d\n", pmu.counters);
+
+   return pmu.implementer != 0;
+}
+
+int main(void)
+{
+   report_prefix_push("pmu");
+
+   report("Control register", check_pmcr());
+
+   return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index ffd12e5..edaed4a 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -51,3 +51,17 @@ file = selftest.flat
 smp = $MAX_SMP
 extra_params = -append 'smp'
 groups = selftest
+
+# Test PMU support with -icount IPC=1
+[pmu-icount-1]
+file = pmu.flat
+extra_params = -icount 0 -append '1'
+groups = pmu
+accel = tcg
+
+# Test PMU support with -icount IPC=256
+[pmu-icount-256]
+file = pmu.flat
+extra_params = -icount 8 -append '256'
+groups = pmu
+accel = tcg
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

Re: [Qemu-devel] [PATCH] hw/intc/arm_gic_kvm: Fix build on aarch64

2016-10-11 Thread Christopher Covington

On 10/11/2016 12:43 PM, Peter Maydell wrote:
> On 11 October 2016 at 17:32, Christopher Covington <c...@codeaurora.org> 
> wrote:
>> Remove unused debugging code to fix native building on aarch64. Without
>> this change, the following -Werr output inhibits make from completing.
>>
>>   qemu/hw/intc/arm_gic_kvm.c:38:18: error: debug_gic_kvm defined but not 
>> used [-Werror=unused-const-variable=]
>>static const int debug_gic_kvm = 0;
>> ^
>>   cc1: all warnings being treated as errors
>>   qemu/rules.mak:60: recipe for target 'hw/intc/arm_gic_kvm.o' failed
>>   make[1]: *** [hw/intc/arm_gic_kvm.o] Error 1
>>   Makefile:205: recipe for target 'subdir-aarch64-softmmu' failed
> 
> This builds for me on aarch64, so presumably this is a "newer
> compiler is more picky" warning.

>From Fedora 25:
gcc (GCC) 6.2.1 20160916 (Red Hat 6.2.1-2)

> In any case, since we don't use the DPRINTF macro we may as well
> dump it (we should use tracepoints if we want to add interesting
> debug-logging in future anyhow).
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>

Thanks Peter!

Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

[Qemu-devel] [PATCH] hw/intc/arm_gic_kvm: Fix build on aarch64

2016-10-11 Thread Christopher Covington

Remove unused debugging code to fix native building on aarch64. Without
this change, the following -Werr output inhibits make from completing.

  qemu/hw/intc/arm_gic_kvm.c:38:18: error: debug_gic_kvm defined but not used 
[-Werror=unused-const-variable=]
   static const int debug_gic_kvm = 0;
^
  cc1: all warnings being treated as errors
  qemu/rules.mak:60: recipe for target 'hw/intc/arm_gic_kvm.o' failed
  make[1]: *** [hw/intc/arm_gic_kvm.o] Error 1
  Makefile:205: recipe for target 'subdir-aarch64-softmmu' failed

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 hw/intc/arm_gic_kvm.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
index ae7ac58..11729ee 100644
--- a/hw/intc/arm_gic_kvm.c
+++ b/hw/intc/arm_gic_kvm.c
@@ -30,20 +30,6 @@
 #include "gic_internal.h"
 #include "vgic_common.h"
 
-//#define DEBUG_GIC_KVM
-
-#ifdef DEBUG_GIC_KVM
-static const int debug_gic_kvm = 1;
-#else
-static const int debug_gic_kvm = 0;
-#endif
-
-#define DPRINTF(fmt, ...) do { \
-if (debug_gic_kvm) { \
-printf("arm_gic: " fmt , ## __VA_ARGS__); \
-} \
-} while (0)
-
 #define TYPE_KVM_ARM_GIC "kvm-arm-gic"
 #define KVM_ARM_GIC(obj) \
  OBJECT_CHECK(GICState, (obj), TYPE_KVM_ARM_GIC)
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

[Qemu-devel] [RFC] Use cpu_get_icount as cpu_get_host_ticks fallback

2016-03-19 Thread Christopher Covington

The previous increment-on-read fallback didn't increment fast
enough for some versions of grub.

https://bugs.launchpad.net/qemu-linaro/+bug/893208

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
I unfortunately don't have the opportunity to fully test this right
now, but I'm sending it out nevertheless on the off chance that
someone else might.
---
 include/qemu/timer.h | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index d0946cb..60c6dd6 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -998,13 +998,12 @@ static inline int64_t cpu_get_host_ticks(void)
 }
 
 #else
-/* The host CPU doesn't have an easily accessible cycle counter.
-   Just return a monotonically increasing value.  This will be
-   totally wrong, but hopefully better than nothing.  */
+/* The host CPU doesn't have an easily accessible cycle counter, so just return
+   the instruction count. This may make the CPU look like it has an IPC of
+   exactly 1, but that shouldn't cause any functional problems. */
 static inline int64_t cpu_get_host_ticks (void)
 {
-static int64_t ticks = 0;
-return ticks++;
+return cpu_get_icount();
 }
 #endif
 
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [Qemu-arm] [PATCH] linux-user: arm: Remove ARM_cpsr and similar #defines

2016-03-03 Thread Christopher Covington

On 03/03/2016 07:18 AM, Peter Maydell wrote:
> Typoed qemu-devel email address again, sorry. I must figure out a
> way to automate "cc the usual suspects"...

git send-email ... \
  --cc-cmd='scripts/get_maintainer.pl --norolestats' \
  ...

Cheers,
Cov

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [Qemu-arm] [PATCH] linux-user: arm: Remove ARM_cpsr and similar #defines

2016-03-03 Thread Christopher Covington

On 03/03/2016 07:18 AM, Peter Maydell wrote:
> Typoed qemu-devel email address again, sorry. I must figure out a
> way to automate "cc the usual suspects"...
> 
> thanks
> -- PMM
> 
> On 3 March 2016 at 12:11, Peter Maydell <peter.mayd...@linaro.org> wrote:
>> The #defines of ARM_cpsr and friends in linux-user/arm/target-syscall.h
>> can clash with versions in the system headers if building on an
>> ARM or AArch64 build (though this seems to be dependent on the version
>> of the system headers). The QEMU defines are not very useful (it's
>> not clear that they're intended for use with the target_pt_regs struct
>> rather than (say) the CPUARMState structure) and we only use them in one
>> function in elfload.c anyway. So just remove the #defines and directly
>> access regs->uregs[].
>>
>> Reported-by: Christopher Covington <c...@codeaurora.org>
>> Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
>> ---
>> Christopher, can you check that this resolves your compile issues, please?

It does. Thanks Peter!

Tested-by: Christopher Covington <c...@codeaurora.org>

>> NB that I have not touched the similar code in bsd-user/elfload.c because
>> that's clearly dead code, since there's no bsd-user/arm support anyway.
>>
>>  linux-user/arm/target_syscall.h | 20 +---
>>  linux-user/elfload.c| 19 ++-
>>  2 files changed, 11 insertions(+), 28 deletions(-)
>>
>> diff --git a/linux-user/arm/target_syscall.h 
>> b/linux-user/arm/target_syscall.h
>> index ea863db..11077b7 100644
>> --- a/linux-user/arm/target_syscall.h
>> +++ b/linux-user/arm/target_syscall.h
>> @@ -4,29 +4,11 @@
>>  /* this struct defines the way the registers are stored on the
>> stack during a system call. */
>>
>> +/* uregs[0..15] are r0 to r15; uregs[16] is CPSR; uregs[17] is ORIG_r0 */
>>  struct target_pt_regs {
>>  abi_long uregs[18];
>>  };
>>
>> -#define ARM_cpsr   uregs[16]
>> -#define ARM_pc uregs[15]
>> -#define ARM_lr uregs[14]
>> -#define ARM_sp uregs[13]
>> -#define ARM_ip uregs[12]
>> -#define ARM_fp uregs[11]
>> -#define ARM_r10uregs[10]
>> -#define ARM_r9 uregs[9]
>> -#define ARM_r8 uregs[8]
>> -#define ARM_r7 uregs[7]
>> -#define ARM_r6 uregs[6]
>> -#define ARM_r5 uregs[5]
>> -#define ARM_r4 uregs[4]
>> -#define ARM_r3 uregs[3]
>> -#define ARM_r2 uregs[2]
>> -#define ARM_r1 uregs[1]
>> -#define ARM_r0 uregs[0]
>> -#define ARM_ORIG_r0uregs[17]
>> -
>>  #define ARM_SYSCALL_BASE   0x90
>>  #define ARM_THUMB_SYSCALL  0
>>
>> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>> index 19dc7f5..ad014da 100644
>> --- a/linux-user/elfload.c
>> +++ b/linux-user/elfload.c
>> @@ -273,19 +273,20 @@ static inline void init_thread(struct target_pt_regs 
>> *regs,
>>  abi_long stack = infop->start_stack;
>>  memset(regs, 0, sizeof(*regs));
>>
>> -regs->ARM_cpsr = 0x10;
>> -if (infop->entry & 1)
>> -regs->ARM_cpsr |= CPSR_T;
>> -regs->ARM_pc = infop->entry & 0xfffe;
>> -regs->ARM_sp = infop->start_stack;
>> +regs->uregs[16] = ARM_CPU_MODE_USR;
>> +if (infop->entry & 1) {
>> +regs->uregs[16] |= CPSR_T;
>> +}
>> +regs->uregs[15] = infop->entry & 0xfffe;
>> +regs->uregs[13] = infop->start_stack;
>>  /* FIXME - what to for failure of get_user()? */
>> -get_user_ual(regs->ARM_r2, stack + 8); /* envp */
>> -get_user_ual(regs->ARM_r1, stack + 4); /* envp */
>> +get_user_ual(regs->uregs[2], stack + 8); /* envp */
>> +get_user_ual(regs->uregs[1], stack + 4); /* envp */
>>  /* XXX: it seems that r0 is zeroed after ! */
>> -regs->ARM_r0 = 0;
>> +regs->uregs[0] = 0;
>>  /* For uClinux PIC binaries.  */
>>  /* XXX: Linux does this only on ARM with no MMU (do we care ?) */
>> -regs->ARM_r10 = infop->start_data;
>> +regs->uregs[10] = infop->start_data;
>>  }
>>
>>  #define ELF_NREG18
>> --
>> 1.9.1
>>
>>


-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] Build breakage: error: "ARM_cpsr" redefined

2016-03-02 Thread Christopher Covington

Hi,

Attempting to build QEMU on aarch64, I got the following error.

===

./configure
--target-list=arm-softmmu,aarch64-softmmu,arm-linux-user,aarch64-linux-user
make
[...]
  CCarm-linux-user/exec.o
In file included from /root/qemu/linux-user/qemu.h:16:0,
 from /root/qemu/exec.c:42:
/root/qemu/linux-user/arm/target_syscall.h:11:0: error: "ARM_cpsr"
redefined [-Werror]
 #define ARM_cpsr uregs[16]
 ^
In file included from /usr/include/sys/user.h:25:0,
 from /usr/include/sys/procfs.h:34,
 from /usr/include/sys/ucontext.h:26,
 from /usr/include/signal.h:360,
 from /root/qemu/include/qemu/osdep.h:79,
 from /root/qemu/exec.c:19:
/usr/include/asm/ptrace.h:64:0: note: this is the location of the
previous definition
 #define ARM_cpsr pstate
 ^
In file included from /root/qemu/linux-user/qemu.h:16:0,
 from /root/qemu/exec.c:42:
/root/qemu/linux-user/arm/target_syscall.h:12:0: error: "ARM_pc"
redefined [-Werror]
 #define ARM_pc  uregs[15]
 ^
In file included from /usr/include/sys/user.h:25:0,
 from /usr/include/sys/procfs.h:34,
 from /usr/include/sys/ucontext.h:26,
 from /usr/include/signal.h:360,
 from /root/qemu/include/qemu/osdep.h:79,
 from /root/qemu/exec.c:19:
/usr/include/asm/ptrace.h:65:0: note: this is the location of the
previous definition
 #define ARM_pc  pc
 ^
In file included from /root/qemu/linux-user/qemu.h:16:0,
 from /root/qemu/exec.c:42:
/root/qemu/linux-user/arm/target_syscall.h:14:0: error: "ARM_sp"
redefined [-Werror]
 #define ARM_sp  uregs[13]
 ^
In file included from /usr/include/sys/user.h:25:0,
 from /usr/include/sys/procfs.h:34,
 from /usr/include/sys/ucontext.h:26,
 from /usr/include/signal.h:360,
 from /root/qemu/include/qemu/osdep.h:79,
 from /root/qemu/exec.c:19:
/usr/include/asm/ptrace.h:66:0: note: this is the location of the
previous definition
 #define ARM_sp  sp
 ^
cc1: all warnings being treated as errors
make[1]: *** [exec.o] Error 1
make: *** [subdir-arm-linux-user] Error 2

===

git log
commit ed6128ebbdd7cd885d39980659dad4b5c8ae8158
Merge: 9c279be 4ade054
Author: Peter Maydell <peter.mayd...@linaro.org>
Date:   Tue Mar 1 15:54:03 2016 +

===

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [PATCH] Rename cpu_get_icount_{locked,biased}

2016-02-10 Thread Christopher Covington

The function does not provide locking but rather adds a bias value.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 cpus.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/cpus.c b/cpus.c
index 898426c..50403c4 100644
--- a/cpus.c
+++ b/cpus.c
@@ -164,7 +164,7 @@ int64_t cpu_get_icount_raw(void)
 }
 
 /* Return the virtual CPU time, based on the instruction counter.  */
-static int64_t cpu_get_icount_locked(void)
+static int64_t cpu_get_icount_biased(void)
 {
 int64_t icount = cpu_get_icount_raw();
 return timers_state.qemu_icount_bias + cpu_icount_to_ns(icount);
@@ -177,7 +177,7 @@ int64_t cpu_get_icount(void)
 
 do {
 start = seqlock_read_begin(_state.vm_clock_seqlock);
-icount = cpu_get_icount_locked();
+icount = cpu_get_icount_biased();
 } while (seqlock_read_retry(_state.vm_clock_seqlock, start));
 
 return icount;
@@ -293,7 +293,7 @@ static void icount_adjust(void)
 
 seqlock_write_lock(_state.vm_clock_seqlock);
 cur_time = cpu_get_clock_locked();
-cur_icount = cpu_get_icount_locked();
+cur_icount = cpu_get_icount_biased();
 
 delta = cur_icount - cur_time;
 /* FIXME: This is a very crude algorithm, somewhat prone to oscillation.  
*/
@@ -356,7 +356,7 @@ static void icount_warp_rt(void)
  * In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too
  * far ahead of real time.
  */
-int64_t cur_icount = cpu_get_icount_locked();
+int64_t cur_icount = cpu_get_icount_biased();
 int64_t delta = clock - cur_icount;
 warp_delta = MIN(warp_delta, delta);
 }
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH v2 1/5] target-arm: Add the pmceid0 and pmceid1 registers

2016-02-09 Thread Christopher Covington

On 02/09/2016 12:19 PM, Peter Maydell wrote:
> On 6 February 2016 at 00:55, Alistair Francis
>  wrote:
>> Signed-off-by: Aaron Lindsay 
>> Signed-off-by: Alistair Francis 
>> Tested-by: Nathan Rossi 
>> ---
>>
>>  target-arm/cpu-qom.h | 2 ++
>>  target-arm/cpu.c | 2 ++
>>  target-arm/cpu64.c   | 2 ++
>>  target-arm/helper.c  | 8 
>>  4 files changed, 14 insertions(+)
>>
>> diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
>> index 07c0a71..1cc4502 100644
>> --- a/target-arm/cpu-qom.h
>> +++ b/target-arm/cpu-qom.h
>> @@ -148,6 +148,8 @@ typedef struct ARMCPU {
>>  uint32_t id_pfr0;
>>  uint32_t id_pfr1;
>>  uint32_t id_dfr0;
>> +uint32_t pmceid0;
>> +uint32_t pmceid1;
>>  uint32_t id_afr0;
>>  uint32_t id_mmfr0;
>>  uint32_t id_mmfr1;
>> diff --git a/target-arm/cpu.c b/target-arm/cpu.c
>> index 7ddbf3d..937f845 100644
>> --- a/target-arm/cpu.c
>> +++ b/target-arm/cpu.c
>> @@ -1156,6 +1156,8 @@ static void cortex_a15_initfn(Object *obj)
>>  cpu->id_pfr0 = 0x1131;
>>  cpu->id_pfr1 = 0x00011011;
>>  cpu->id_dfr0 = 0x02010555;
>> +cpu->pmceid0 = 0x0481; /* PMUv3 events 0x0, 0x8, and 0x11 */
> 
> These are:
>  SW_INCR   # insn architecturally executed, cc pass, software increment
>  INST_RETIRED # insn architecturally executed
>  CPU_CYCLES # cycle
> 
> However we don't actually implement any of these, so should
> we be advertising them?

Perhaps I'm missing something, but I was under the impression that CPU
cycle accounting was implemented as pmccntr_read/write in
target-arm/helper.c.

The instruction count event may need a wrapper around cpu_get_icount().

SWINC is pretty trivial, but I don't think we actually use it, other
than for some testing (but unfortunately not yet part of the
kvm-unit-tests patchset).

Thanks,
Cov

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [RFC 09/14] Implement remaining PMU functionality

2016-02-02 Thread Christopher Covington

Hi Alistair,

On 02/02/2016 04:22 PM, Alistair Francis wrote:
> On Wed, Aug 5, 2015 at 9:51 AM, Christopher Covington
> <c...@codeaurora.org> wrote:
>> This adds logic to increment PMEVCNTR's based on different event inputs,
>> implements all remaining CP registers, and triggers an interrupt on
>> event overflow.
> 
> We (Xilinx) need parts of this patch to avoid kernel panics when
> booting the 4.4 Linux kernel. Have you done any more work on it? If
> you can send out a pach set I'm happy to have a look at it.

This issue sounds related to Lorenzo Piersali's patch arm64: kernel: fix
PMUv3 registers unconditional access.

As for the status of our TCG PMU patches, unfortunately, last I recall,
I was writing some kvm-unit-tests that Drew wanted me to test against
the KVM PMU, which required real hardware. I got distracted with using
an upstream kernel and a certain distribution on the hardware and have
yet to return.

Cov

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH RFC V5 0/9] Implement GIC-500 from GICv3 family for arm64

2016-01-29 Thread Christopher Covington

On 10/20/2015 01:22 PM, Shlomo Pongratz wrote:
> From: Shlomo Pongratz <shlomo.pongr...@huawei.com>
> 
> This patch is a first step multicores support for arm64.
> 
> This implemntation was tested up to 100 cores.
> 
> Things left to do:
> 
> Support SPI, ITS and ITS CONTROL, note that this patch porpose is to enable
> running multi cores using the "virt" virtual machine and this goal is achived
> without that.
> 
> Add GICv2 backwards competability. Since there is a GICv2 implementation I
> can't see the pusprose for it.
> 
> Special thanks to Peter Crostwaite whose patch to th Linux (kernel) i.e.
> Implement cpu_relax as yield solved the problem of the boot process getting
> stuck for 24 cores and more.
> 
> Figure out why virtual machine name changed from virt-v3 to virt-v3-machine

Hi Shlomo,

Were you planning on another revision of this patchset? Are there any
things you would like help with?

Peter,

Do you have any thoughts about what is essential and what isn't for a
first wave of TCG GICv3 patches to be mergeable?

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [RFC v1 0/2] Add a generic loader

2016-01-28 Thread Christopher Covington

Hi Alistair,

On 01/15/2016 07:19 PM, Alistair Francis wrote:
> This work is based on the original work by Li Guang with extra
> features added by Peter C.
> 
> The idea of this loader is to allow the user to load multiple images
> or values into QEMU at startup.

> I have tested this on ARM and it works. What do people think? Is it worth
> pursuing to try and get accepted?

For what it's worth, I think this is worthwhile.

Cov

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [kvm-unit-tests PATCHv5 3/3] arm: pmu: Add CPI checking

2015-11-11 Thread Christopher Covington

On 11/10/2015 09:05 PM, Andrew Jones wrote:
> On Mon, Nov 02, 2015 at 09:58:14AM -0600, Andrew Jones wrote:
>> On Fri, Oct 30, 2015 at 03:32:43PM -0400, Christopher Covington wrote:
>>> Hi Drew,
>>>
>>> On 10/30/2015 09:00 AM, Andrew Jones wrote:
>>>> On Wed, Oct 28, 2015 at 03:12:55PM -0400, Christopher Covington wrote:
>>>>> Calculate the numbers of cycles per instruction (CPI) implied by ARM
>>>>> PMU cycle counter values. The code includes a strict checking facility
>>>>> intended for the -icount option in TCG mode but it is not yet enabled
>>>>> in the configuration file. Enabling it must wait on infrastructure
>>>>> improvements which allow for different tests to be run on TCG versus
>>>>> KVM.
>>>>>
>>>>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>>>>> ---
>>>>>  arm/pmu.c | 103 
>>>>> +-
>>>>>  1 file changed, 102 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arm/pmu.c b/arm/pmu.c
>>>>> index 4334de4..76a 100644
>>>>> --- a/arm/pmu.c
>>>>> +++ b/arm/pmu.c
>>>>> @@ -43,6 +43,23 @@ static inline unsigned long get_pmccntr(void)
>>>>>   asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
>>>>>   return cycles;
>>>>>  }
>>>>> +
>>>>> +/*
>>>>> + * Extra instructions inserted by the compiler would be difficult to 
>>>>> compensate
>>>>> + * for, so hand assemble everything between, and including, the PMCR 
>>>>> accesses
>>>>> + * to start and stop counting.
>>>>> + */
>>>>> +static inline void loop(int i, uint32_t pmcr)
>>>>> +{
>>>>> + asm volatile(
>>>>> + "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
>>>>> + "1: subs%[i], %[i], #1\n"
>>>>> + "   bgt 1b\n"
>>>>> + "   mcr p15, 0, %[z], c9, c12, 0\n"
>>>>> + : [i] "+r" (i)
>>>>> + : [pmcr] "r" (pmcr), [z] "r" (0)
>>>>> + : "cc");
>>>>> +}
>>>>>  #elif defined(__aarch64__)
>>>>>  static inline uint32_t get_pmcr(void)
>>>>>  {
>>>>> @@ -64,6 +81,23 @@ static inline unsigned long get_pmccntr(void)
>>>>>   asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
>>>>>   return cycles;
>>>>>  }
>>>>> +
>>>>> +/*
>>>>> + * Extra instructions inserted by the compiler would be difficult to 
>>>>> compensate
>>>>> + * for, so hand assemble everything between, and including, the PMCR 
>>>>> accesses
>>>>> + * to start and stop counting.
>>>>> + */
>>>>> +static inline void loop(int i, uint32_t pmcr)
>>>>> +{
>>>>> + asm volatile(
>>>>> + "   msr pmcr_el0, %[pmcr]\n"
>>>>> + "1: subs%[i], %[i], #1\n"
>>>>> + "   b.gt1b\n"
>>>>> + "   msr pmcr_el0, xzr\n"
>>>>> + : [i] "+r" (i)
>>>>> + : [pmcr] "r" (pmcr)
>>>>> + : "cc");
>>>>> +}
>>>>>  #endif
>>>>>  
>>>>>  struct pmu_data {
>>>>> @@ -131,12 +165,79 @@ static bool check_cycles_increase(void)
>>>>>   return true;
>>>>>  }
>>>>>  
>>>>> -int main(void)
>>>>> +/*
>>>>> + * Execute a known number of guest instructions. Only odd instruction 
>>>>> counts
>>>>> + * greater than or equal to 3 are supported by the in-line assembly 
>>>>> code. The
>>>>> + * control register (PMCR_EL0) is initialized with the provided value 
>>>>> (allowing
>>>>> + * for example for the cycle counter or event counters to be reset). At 
>>>>> the end
>>>>> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
>>>>> + * counting, allowing the cycle counter or event counters to be read at 
>>>>> the
>>>>> + * leisure of the calling code.
>>

Re: [Qemu-devel] [kvm-unit-tests PATCHv5 3/3] arm: pmu: Add CPI checking

2015-10-30 Thread Christopher Covington

Hi Drew,

On 10/30/2015 09:00 AM, Andrew Jones wrote:
> On Wed, Oct 28, 2015 at 03:12:55PM -0400, Christopher Covington wrote:
>> Calculate the numbers of cycles per instruction (CPI) implied by ARM
>> PMU cycle counter values. The code includes a strict checking facility
>> intended for the -icount option in TCG mode but it is not yet enabled
>> in the configuration file. Enabling it must wait on infrastructure
>> improvements which allow for different tests to be run on TCG versus
>> KVM.
>>
>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>> ---
>>  arm/pmu.c | 103 
>> +-
>>  1 file changed, 102 insertions(+), 1 deletion(-)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 4334de4..76a 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -43,6 +43,23 @@ static inline unsigned long get_pmccntr(void)
>>  asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
>>  return cycles;
>>  }
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to 
>> compensate
>> + * for, so hand assemble everything between, and including, the PMCR 
>> accesses
>> + * to start and stop counting.
>> + */
>> +static inline void loop(int i, uint32_t pmcr)
>> +{
>> +asm volatile(
>> +"   mcr p15, 0, %[pmcr], c9, c12, 0\n"
>> +"1: subs%[i], %[i], #1\n"
>> +"   bgt 1b\n"
>> +"   mcr p15, 0, %[z], c9, c12, 0\n"
>> +: [i] "+r" (i)
>> +: [pmcr] "r" (pmcr), [z] "r" (0)
>> +: "cc");
>> +}
>>  #elif defined(__aarch64__)
>>  static inline uint32_t get_pmcr(void)
>>  {
>> @@ -64,6 +81,23 @@ static inline unsigned long get_pmccntr(void)
>>  asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
>>  return cycles;
>>  }
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to 
>> compensate
>> + * for, so hand assemble everything between, and including, the PMCR 
>> accesses
>> + * to start and stop counting.
>> + */
>> +static inline void loop(int i, uint32_t pmcr)
>> +{
>> +asm volatile(
>> +"   msr pmcr_el0, %[pmcr]\n"
>> +"1: subs%[i], %[i], #1\n"
>> +"   b.gt1b\n"
>> +"   msr pmcr_el0, xzr\n"
>> +: [i] "+r" (i)
>> +: [pmcr] "r" (pmcr)
>> +: "cc");
>> +}
>>  #endif
>>  
>>  struct pmu_data {
>> @@ -131,12 +165,79 @@ static bool check_cycles_increase(void)
>>  return true;
>>  }
>>  
>> -int main(void)
>> +/*
>> + * Execute a known number of guest instructions. Only odd instruction counts
>> + * greater than or equal to 3 are supported by the in-line assembly code. 
>> The
>> + * control register (PMCR_EL0) is initialized with the provided value 
>> (allowing
>> + * for example for the cycle counter or event counters to be reset). At the 
>> end
>> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
>> + * counting, allowing the cycle counter or event counters to be read at the
>> + * leisure of the calling code.
>> + */
>> +static void measure_instrs(int num, uint32_t pmcr)
>> +{
>> +int i = (num - 1) / 2;
>> +
>> +assert(num >= 3 && ((num - 1) % 2 == 0));
>> +loop(i, pmcr);
>> +}
>> +
>> +/*
>> + * Measure cycle counts for various known instruction counts. Ensure that 
>> the
>> + * cycle counter progresses (similar to check_cycles_increase() but with 
>> more
>> + * instructions and using reset and stop controls). If supplied a positive,
>> + * nonzero CPI parameter, also strictly check that every measurement matches
>> + * it. Strict CPI checking is used to test -icount mode.
>> + */
>> +static bool check_cpi(int cpi)
>> +{
>> +struct pmu_data pmu = {0};
>> +
>> +pmu.cycle_counter_reset = 1;
>> +pmu.enable = 1;
>> +
>> +if (cpi > 0)
>> +printf("Checking for CPI=%d.\n", cpi);
>> +printf("instrs : cycles0 cycles1 ...\n");
>> +
>> +for (int i = 3; i < 300; i += 32) {
>> +int avg, sum = 0;
>> +
>> +printf("%d :", i);
>> +

[Qemu-devel] [kvm-unit-tests PATCHv5 3/3] arm: pmu: Add CPI checking

2015-10-28 Thread Christopher Covington

Calculate the numbers of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility
intended for the -icount option in TCG mode but it is not yet enabled
in the configuration file. Enabling it must wait on infrastructure
improvements which allow for different tests to be run on TCG versus
KVM.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c | 103 +-
 1 file changed, 102 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index 4334de4..76a 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -43,6 +43,23 @@ static inline unsigned long get_pmccntr(void)
asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
return cycles;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+   asm volatile(
+   "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
+   "1: subs%[i], %[i], #1\n"
+   "   bgt 1b\n"
+   "   mcr p15, 0, %[z], c9, c12, 0\n"
+   : [i] "+r" (i)
+   : [pmcr] "r" (pmcr), [z] "r" (0)
+   : "cc");
+}
 #elif defined(__aarch64__)
 static inline uint32_t get_pmcr(void)
 {
@@ -64,6 +81,23 @@ static inline unsigned long get_pmccntr(void)
asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
return cycles;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+   asm volatile(
+   "   msr pmcr_el0, %[pmcr]\n"
+   "1: subs%[i], %[i], #1\n"
+   "   b.gt1b\n"
+   "   msr pmcr_el0, xzr\n"
+   : [i] "+r" (i)
+   : [pmcr] "r" (pmcr)
+   : "cc");
+}
 #endif
 
 struct pmu_data {
@@ -131,12 +165,79 @@ static bool check_cycles_increase(void)
return true;
 }
 
-int main(void)
+/*
+ * Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+   int i = (num - 1) / 2;
+
+   assert(num >= 3 && ((num - 1) % 2 == 0));
+   loop(i, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+   struct pmu_data pmu = {0};
+
+   pmu.cycle_counter_reset = 1;
+   pmu.enable = 1;
+
+   if (cpi > 0)
+   printf("Checking for CPI=%d.\n", cpi);
+   printf("instrs : cycles0 cycles1 ...\n");
+
+   for (int i = 3; i < 300; i += 32) {
+   int avg, sum = 0;
+
+   printf("%d :", i);
+   for (int j = 0; j < NR_SAMPLES; j++) {
+   int cycles;
+
+   measure_instrs(i, pmu.pmcr_el0);
+   cycles = get_pmccntr();
+   printf(" %d", cycles);
+
+   if (!cycles || (cpi > 0 && cycles != i * cpi)) {
+   printf("\n");
+   return false;
+   }
+
+   sum += cycles;
+   }
+   avg = sum / NR_SAMPLES;
+   printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
+   sum, avg, i / avg, avg / i);
+   }
+
+   return true;
+}
+
+int main(int argc, char *argv[])
 {
+   int cpi = 0;
+
+   if (argc >= 1)
+   cpi = atol(argv[0]);
+
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
report("Monotonically increasing cycle count", check_cycles_increase());
+   report("Cycle/instruction ratio", check_cpi(cpi));
 
return report_summary();
 }
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv5 2/3] arm: pmu: Check cycle count increases

2015-10-28 Thread Christopher Covington

Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
even for the smallest delta of two subsequent reads.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
Reviewed-by: Andrew Jones <drjo...@redhat.com>
---
 arm/pmu.c | 60 
 1 file changed, 60 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 42d0ee1..4334de4 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -14,6 +14,8 @@
  */
 #include "libcflat.h"
 
+#define NR_SAMPLES 10
+
 #if defined(__arm__)
 static inline uint32_t get_pmcr(void)
 {
@@ -22,6 +24,25 @@ static inline uint32_t get_pmcr(void)
asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
return ret;
 }
+
+static inline void set_pmcr(uint32_t pmcr)
+{
+   asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (pmcr));
+}
+
+/*
+ * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
+ * bits doesn't seem worth the trouble when differential usage of the result is
+ * expected (with differences that can easily fit in 32 bits). So just return
+ * the lower 32 bits of the cycle count in AArch32.
+ */
+static inline unsigned long get_pmccntr(void)
+{
+   unsigned long cycles;
+
+   asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
+   return cycles;
+}
 #elif defined(__aarch64__)
 static inline uint32_t get_pmcr(void)
 {
@@ -30,6 +51,19 @@ static inline uint32_t get_pmcr(void)
asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
return ret;
 }
+
+static inline void set_pmcr(uint32_t pmcr)
+{
+   asm volatile("msr pmcr_el0, %0" : : "r" (pmcr));
+}
+
+static inline unsigned long get_pmccntr(void)
+{
+   unsigned long cycles;
+
+   asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
+   return cycles;
+}
 #endif
 
 struct pmu_data {
@@ -72,11 +106,37 @@ static bool check_pmcr(void)
return pmu.implementer != 0;
 }
 
+/*
+ * Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+   struct pmu_data pmu = {0};
+
+   pmu.enable = 1;
+   set_pmcr(pmu.pmcr_el0);
+
+   for (int i = 0; i < NR_SAMPLES; i++) {
+   unsigned long a, b;
+
+   a = get_pmccntr();
+   b = get_pmccntr();
+
+   if (a >= b) {
+   printf("Read %ld then %ld.\n", a, b);
+   return false;
+   }
+   }
+
+   return true;
+}
+
 int main(void)
 {
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
+   report("Monotonically increasing cycle count", check_cycles_increase());
 
return report_summary();
 }
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv6] ARM PMU tests

2015-10-28 Thread Christopher Covington

Changes from v5:

2/3 arm: pmu: Check cycle count increases
* Use universal initializer {0} despite spurious compiler warnings.
* Add Drew's Reviewed-by.

3/3 arm: pmu: Add CPI checking
* Use numeric constant 0 directly with no intermediate variable.
* More tabs in inline assembly.
* Make A32/A64 inline assembly justification comments uniform.
* Check argc properly.

Thanks,
Christopher Covington

[Qemu-devel] [kvm-unit-tests PATCHv5 1/3] arm: Add PMU test

2015-10-28 Thread Christopher Covington

Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU).

Signed-off-by: Christopher Covington <c...@codeaurora.org>
Reviewed-by: Andrew Jones <drjo...@redhat.com>
---
 arm/pmu.c| 82 
 arm/unittests.cfg|  5 +++
 config/config-arm-common.mak |  4 ++-
 3 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 000..42d0ee1
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,82 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright 2015 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+
+#if defined(__arm__)
+static inline uint32_t get_pmcr(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
+   return ret;
+}
+#elif defined(__aarch64__)
+static inline uint32_t get_pmcr(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
+   return ret;
+}
+#endif
+
+struct pmu_data {
+   union {
+   uint32_t pmcr_el0;
+   struct {
+   uint32_t enable:1;
+   uint32_t event_counter_reset:1;
+   uint32_t cycle_counter_reset:1;
+   uint32_t cycle_counter_clock_divider:1;
+   uint32_t event_counter_export:1;
+   uint32_t cycle_counter_disable_when_prohibited:1;
+   uint32_t cycle_counter_long:1;
+   uint32_t reserved:4;
+   uint32_t counters:5;
+   uint32_t identification_code:8;
+   uint32_t implementer:8;
+   };
+   };
+};
+
+/*
+ * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
+ * null. Also print out a couple other interesting fields for diagnostic
+ * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero event counters, but hopefully
+ * support for at least the instructions event will be added in the future and
+ * the reported number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+   struct pmu_data pmu;
+
+   pmu.pmcr_el0 = get_pmcr();
+
+   printf("PMU implementer: %c\n", pmu.implementer);
+   printf("Identification code: 0x%x\n", pmu.identification_code);
+   printf("Event counters:  %d\n", pmu.counters);
+
+   return pmu.implementer != 0;
+}
+
+int main(void)
+{
+   report_prefix_push("pmu");
+
+   report("Control register", check_pmcr());
+
+   return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index e068a0c..fd94adb 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -35,3 +35,8 @@ file = selftest.flat
 smp = `getconf _NPROCESSORS_CONF`
 extra_params = -append 'smp'
 groups = selftest
+
+# Test PMU support without -icount
+[pmu]
+file = pmu.flat
+groups = pmu
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 698555d..b34d04c 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -11,7 +11,8 @@ endif
 
 tests-common = \
$(TEST_DIR)/selftest.flat \
-   $(TEST_DIR)/spinlock-test.flat
+   $(TEST_DIR)/spinlock-test.flat \
+   $(TEST_DIR)/pmu.flat
 
 all: test_cases
 
@@ -70,3 +71,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
 
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
 $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
+$(TEST_DIR)/pmu.elf: $(cstart.o) $(TEST_DIR)/pmu.o
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv5 1/3] arm: Add PMU test

2015-10-26 Thread Christopher Covington

Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU).

Signed-off-by: Christopher Covington <c...@codeaurora.org>
Reviewed-by: Andrew Jones <drjo...@redhat.com>
---
 arm/pmu.c| 82 
 arm/unittests.cfg|  5 +++
 config/config-arm-common.mak |  4 ++-
 3 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 000..42d0ee1
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,82 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright 2015 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+
+#if defined(__arm__)
+static inline uint32_t get_pmcr(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
+   return ret;
+}
+#elif defined(__aarch64__)
+static inline uint32_t get_pmcr(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
+   return ret;
+}
+#endif
+
+struct pmu_data {
+   union {
+   uint32_t pmcr_el0;
+   struct {
+   uint32_t enable:1;
+   uint32_t event_counter_reset:1;
+   uint32_t cycle_counter_reset:1;
+   uint32_t cycle_counter_clock_divider:1;
+   uint32_t event_counter_export:1;
+   uint32_t cycle_counter_disable_when_prohibited:1;
+   uint32_t cycle_counter_long:1;
+   uint32_t reserved:4;
+   uint32_t counters:5;
+   uint32_t identification_code:8;
+   uint32_t implementer:8;
+   };
+   };
+};
+
+/*
+ * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
+ * null. Also print out a couple other interesting fields for diagnostic
+ * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero event counters, but hopefully
+ * support for at least the instructions event will be added in the future and
+ * the reported number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+   struct pmu_data pmu;
+
+   pmu.pmcr_el0 = get_pmcr();
+
+   printf("PMU implementer: %c\n", pmu.implementer);
+   printf("Identification code: 0x%x\n", pmu.identification_code);
+   printf("Event counters:  %d\n", pmu.counters);
+
+   return pmu.implementer != 0;
+}
+
+int main(void)
+{
+   report_prefix_push("pmu");
+
+   report("Control register", check_pmcr());
+
+   return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index e068a0c..fd94adb 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -35,3 +35,8 @@ file = selftest.flat
 smp = `getconf _NPROCESSORS_CONF`
 extra_params = -append 'smp'
 groups = selftest
+
+# Test PMU support without -icount
+[pmu]
+file = pmu.flat
+groups = pmu
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 698555d..b34d04c 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -11,7 +11,8 @@ endif
 
 tests-common = \
$(TEST_DIR)/selftest.flat \
-   $(TEST_DIR)/spinlock-test.flat
+   $(TEST_DIR)/spinlock-test.flat \
+   $(TEST_DIR)/pmu.flat
 
 all: test_cases
 
@@ -70,3 +71,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
 
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
 $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
+$(TEST_DIR)/pmu.elf: $(cstart.o) $(TEST_DIR)/pmu.o
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv5 3/3] arm: pmu: Add CPI checking

2015-10-26 Thread Christopher Covington

Calculate the numbers of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility
intended for the -icount option in TCG mode but it is not yet enabled
in the configuration file. Enabling it must wait on infrastructure
improvements which allow for different tests to be run on TCG versus
KVM.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c | 105 +-
 1 file changed, 104 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index c44d708..59f26ab 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -43,6 +43,25 @@ static inline unsigned long get_pmccntr(void)
asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
return cycles;
 }
+
+/*
+ * Extra instructions such as `mov rd, #0` inserted by the compiler would be
+ * difficult to compensate for, so hand assemble everything between, and
+ * including, the PMCR accesses to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+   uint32_t z = 0;
+
+   asm volatile(
+   "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
+   "1: subs %[i], %[i], #1\n"
+   "   bgt 1b\n"
+   "   mcr p15, 0, %[z], c9, c12, 0\n"
+   : [i] "+r" (i)
+   : [pmcr] "r" (pmcr), [z] "r" (z)
+   : "cc");
+}
 #elif defined(__aarch64__)
 static inline uint32_t get_pmcr(void)
 {
@@ -64,6 +83,23 @@ static inline unsigned long get_pmccntr(void)
asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
return cycles;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+   asm volatile(
+   "   msr pmcr_el0, %[pmcr]\n"
+   "1: subs %[i], %[i], #1\n"
+   "   b.gt 1b\n"
+   "   msr pmcr_el0, xzr\n"
+   : [i] "+r" (i)
+   : [pmcr] "r" (pmcr)
+   : "cc");
+}
 #endif
 
 struct pmu_data {
@@ -131,12 +167,79 @@ static bool check_cycles_increase(void)
return true;
 }
 
-int main(void)
+/*
+ * Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+   int i = (num - 1) / 2;
+
+   assert(num >= 3 && ((num - 1) % 2 == 0));
+   loop(i, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+   struct pmu_data pmu = { {0} };
+
+   pmu.cycle_counter_reset = 1;
+   pmu.enable = 1;
+
+   if (cpi > 0)
+   printf("Checking for CPI=%d.\n", cpi);
+   printf("instrs : cycles0 cycles1 ...\n");
+
+   for (int i = 3; i < 300; i += 32) {
+   int avg, sum = 0;
+
+   printf("%d :", i);
+   for (int j = 0; j < NR_SAMPLES; j++) {
+   int cycles;
+
+   measure_instrs(i, pmu.pmcr_el0);
+   cycles = get_pmccntr();
+   printf(" %d", cycles);
+
+   if (!cycles || (cpi > 0 && cycles != i * cpi)) {
+   printf("\n");
+   return false;
+   }
+
+   sum += cycles;
+   }
+   avg = sum / NR_SAMPLES;
+   printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
+   sum, avg, i / avg, avg / i);
+   }
+
+   return true;
+}
+
+int main(int argc, char *argv[])
 {
+   int cpi = 0;
+
+   if (argc > 1)
+   cpi = atol(argv[0]);
+
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
report("Monotonically increasing cycle count", check_cycles_increase());
+   report("Cycle/instruction ratio", check_cpi(cpi));
 
return report_summary();
 }
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv5] ARM PMU tests

2015-10-26 Thread Christopher Covington

Changes from v4:
* Add Drew's Reviewed-by to first patch.
* Explain use of 32-bit cycle count values in AArch32.
* Zero-initialize pmu_data struct before use in check_cycles_increase and 
check_cpi. While the insistence on not using memset is entirely my own vanity, 
I blame the funny syntax on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119 
(adds extra set of braces) and checkpatch (adds spaces).
* Improve formatting of inline assembly and better explain why so much code 
must be inline assembly.

Re: [Qemu-devel] [kvm-unit-tests PATCHv5 2/3] arm: pmu: Check cycle count increases

2015-10-26 Thread Christopher Covington

On 10/26/2015 11:58 AM, Andrew Jones wrote:
> On Mon, Oct 26, 2015 at 11:38:49AM -0400, Christopher Covington wrote:
>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>> even for the smallest delta of two subsequent reads.
>>
>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>> ---
>>  arm/pmu.c | 60 
>>  1 file changed, 60 insertions(+)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 42d0ee1..c44d708 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -14,6 +14,8 @@
>>   */
>>  #include "libcflat.h"
>>  
>> +#define NR_SAMPLES 10
>> +
>>  #if defined(__arm__)
>>  static inline uint32_t get_pmcr(void)
>>  {
>> @@ -22,6 +24,25 @@ static inline uint32_t get_pmcr(void)
>>  asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>  return ret;
>>  }
>> +
>> +static inline void set_pmcr(uint32_t pmcr)
>> +{
>> +asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (pmcr));
>> +}
>> +
>> +/*
>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, 
>> returning 64
>> + * bits doesn't seem worth the trouble when differential usage of the 
>> result is
>> + * expected (with differences that can easily fit in 32 bits). So just 
>> return
>> + * the lower 32 bits of the cycle count in AArch32.
>> + */
>> +static inline unsigned long get_pmccntr(void)
>> +{
>> +unsigned long cycles;
>> +
>> +asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
>> +return cycles;
>> +}
>>  #elif defined(__aarch64__)
>>  static inline uint32_t get_pmcr(void)
>>  {
>> @@ -30,6 +51,19 @@ static inline uint32_t get_pmcr(void)
>>  asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
>>  return ret;
>>  }
>> +
>> +static inline void set_pmcr(uint32_t pmcr)
>> +{
>> +asm volatile("msr pmcr_el0, %0" : : "r" (pmcr));
>> +}
>> +
>> +static inline unsigned long get_pmccntr(void)
>> +{
>> +unsigned long cycles;
>> +
>> +asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
>> +return cycles;
>> +}
>>  #endif
>>  
>>  struct pmu_data {
>> @@ -72,11 +106,37 @@ static bool check_pmcr(void)
>>  return pmu.implementer != 0;
>>  }
>>  
>> +/*
>> + * Ensure that the cycle counter progresses between back-to-back reads.
>> + */
>> +static bool check_cycles_increase(void)
>> +{
>> +struct pmu_data pmu = { {0} };
> 
> One set of {} is enough, and looks better.

As I tried to mention in the cover letter, that can trigger a spurious GCC
warning (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119). But if you don't
mind, I don't mind.

> Otherwise
> Reviewed-by: Andrew Jones <drjo...@redhat.com>

Thanks!

Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv5 2/3] arm: pmu: Check cycle count increases

2015-10-26 Thread Christopher Covington

Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
even for the smallest delta of two subsequent reads.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c | 60 
 1 file changed, 60 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 42d0ee1..c44d708 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -14,6 +14,8 @@
  */
 #include "libcflat.h"
 
+#define NR_SAMPLES 10
+
 #if defined(__arm__)
 static inline uint32_t get_pmcr(void)
 {
@@ -22,6 +24,25 @@ static inline uint32_t get_pmcr(void)
asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
return ret;
 }
+
+static inline void set_pmcr(uint32_t pmcr)
+{
+   asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (pmcr));
+}
+
+/*
+ * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
+ * bits doesn't seem worth the trouble when differential usage of the result is
+ * expected (with differences that can easily fit in 32 bits). So just return
+ * the lower 32 bits of the cycle count in AArch32.
+ */
+static inline unsigned long get_pmccntr(void)
+{
+   unsigned long cycles;
+
+   asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
+   return cycles;
+}
 #elif defined(__aarch64__)
 static inline uint32_t get_pmcr(void)
 {
@@ -30,6 +51,19 @@ static inline uint32_t get_pmcr(void)
asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
return ret;
 }
+
+static inline void set_pmcr(uint32_t pmcr)
+{
+   asm volatile("msr pmcr_el0, %0" : : "r" (pmcr));
+}
+
+static inline unsigned long get_pmccntr(void)
+{
+   unsigned long cycles;
+
+   asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
+   return cycles;
+}
 #endif
 
 struct pmu_data {
@@ -72,11 +106,37 @@ static bool check_pmcr(void)
return pmu.implementer != 0;
 }
 
+/*
+ * Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+   struct pmu_data pmu = { {0} };
+
+   pmu.enable = 1;
+   set_pmcr(pmu.pmcr_el0);
+
+   for (int i = 0; i < NR_SAMPLES; i++) {
+   unsigned long a, b;
+
+   a = get_pmccntr();
+   b = get_pmccntr();
+
+   if (a >= b) {
+   printf("Read %ld then %ld.\n", a, b);
+   return false;
+   }
+   }
+
+   return true;
+}
+
 int main(void)
 {
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
+   report("Monotonically increasing cycle count", check_cycles_increase());
 
return report_summary();
 }
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [kvm-unit-tests PATCHv4 3/3] arm: pmu: Add CPI checking

2015-10-19 Thread Christopher Covington

Hi Drew,

I appreciate your feedback on these patches.

On 10/18/2015 02:28 PM, Andrew Jones wrote:

>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -37,6 +37,18 @@ static inline unsigned long get_pmccntr(void)
>>  asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
>>  return cycles;
>>  }
>> +
>> +static inline void loop(int i, uint32_t pmcr)
>> +{
>> +uint32_t z = 0;
>> +
>> +asm volatile(
>> +"   mcr p15, 0, %[pmcr], c9, c12, 0\n"
>> +"   1: subs %[i], %[i], #1\n"
>> +"   bgt 1b\n"
>> +"   mcr p15, 0, %[z], c9, c12, 0\n"
>> +: [i] "+r" (i) : [pmcr] "r" (pmcr), [z] "r" (z) : "cc");
> 
> Assembly is always ugly, but we can do a bit better formatting with tabs
> 
>   asm volatile(
>   "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
>   "1: subs%[i], %[i], #1\n"
>   "   bgt 1b\n"
>   "   mcr p15, 0, %[z], c9, c12, 0\n"
>   : [i] "+r" (i)
>   : [pmcr] "r" (pmcr), [z] "r" (z)
>   : "cc");
> 
> Actually it can be even cleaner because you already created set_pmcr()
> 
>   set_pmcr(pmcr);
> 
>   asm volatile(
>   "1: subs%0, %0, #1\n"
>   "   bgt 1b\n"
>   : "+r" (i) : : "cc");
> 
>   set_pmcr(0);

Is there any way to ensure that the compiler won't for example put a `mov rd,
#0` between the `bgt 1b` and the `mcr , rn`?

>> @@ -125,12 +147,79 @@ static bool check_cycles_increase(void)
>>  return true;
>>  }
>>  
>> -int main(void)
>> +/*
>> + * Execute a known number of guest instructions. Only odd instruction counts
>> + * greater than or equal to 3 are supported by the in-line assembly code. 
>> The
> 
> Not all odd counts, right? But rather all multiples of 3? IIUC this is because
> the loop is two instructions (sub + branch), and then the clearing of the pmcr
> register counts as the 3rd?

Clearing the PMCR doesn't happen as part of the loop, but as part of the loop
exit or epilogue.

total_instrs = iteration_count * loop_instrs + eipilogue_instrs
total_instrs = iteration_count * 2 + 1

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH] target-arm: Use common CPU cycle infrastructure

2015-10-14 Thread Christopher Covington

On 10/13/2015 04:53 PM, Peter Maydell wrote:
> On 24 September 2015 at 20:43, Christopher Covington <c...@codeaurora.org> 
> wrote:
>> cpu_get_ticks() provides a common interface across targets for
>> calculating CPU cycles. Using this fixes PMCCNTR reads when -icount
>> is specified (previously a non-increasing value was returned).
>>
>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>> ---
>>  target-arm/helper.c | 9 +++--
>>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> So, I think the conclusion from this thread was that we should
> (a) change the default cpu_get_host_ticks() implementation to
> call get_clock()
> (b) rebase this patch and apply it
> 
> Is anybody planning to do that?

It's not in my plans at the moment.

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv4 1/3] arm: Add PMU test

2015-10-12 Thread Christopher Covington

Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU).

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c| 82 
 arm/unittests.cfg|  5 +++
 config/config-arm-common.mak |  4 ++-
 3 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 000..42d0ee1
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,82 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright 2015 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+
+#if defined(__arm__)
+static inline uint32_t get_pmcr(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
+   return ret;
+}
+#elif defined(__aarch64__)
+static inline uint32_t get_pmcr(void)
+{
+   uint32_t ret;
+
+   asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
+   return ret;
+}
+#endif
+
+struct pmu_data {
+   union {
+   uint32_t pmcr_el0;
+   struct {
+   uint32_t enable:1;
+   uint32_t event_counter_reset:1;
+   uint32_t cycle_counter_reset:1;
+   uint32_t cycle_counter_clock_divider:1;
+   uint32_t event_counter_export:1;
+   uint32_t cycle_counter_disable_when_prohibited:1;
+   uint32_t cycle_counter_long:1;
+   uint32_t reserved:4;
+   uint32_t counters:5;
+   uint32_t identification_code:8;
+   uint32_t implementer:8;
+   };
+   };
+};
+
+/*
+ * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
+ * null. Also print out a couple other interesting fields for diagnostic
+ * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero event counters, but hopefully
+ * support for at least the instructions event will be added in the future and
+ * the reported number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+   struct pmu_data pmu;
+
+   pmu.pmcr_el0 = get_pmcr();
+
+   printf("PMU implementer: %c\n", pmu.implementer);
+   printf("Identification code: 0x%x\n", pmu.identification_code);
+   printf("Event counters:  %d\n", pmu.counters);
+
+   return pmu.implementer != 0;
+}
+
+int main(void)
+{
+   report_prefix_push("pmu");
+
+   report("Control register", check_pmcr());
+
+   return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index e068a0c..fd94adb 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -35,3 +35,8 @@ file = selftest.flat
 smp = `getconf _NPROCESSORS_CONF`
 extra_params = -append 'smp'
 groups = selftest
+
+# Test PMU support without -icount
+[pmu]
+file = pmu.flat
+groups = pmu
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 698555d..b34d04c 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -11,7 +11,8 @@ endif
 
 tests-common = \
$(TEST_DIR)/selftest.flat \
-   $(TEST_DIR)/spinlock-test.flat
+   $(TEST_DIR)/spinlock-test.flat \
+   $(TEST_DIR)/pmu.flat
 
 all: test_cases
 
@@ -70,3 +71,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
 
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
 $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
+$(TEST_DIR)/pmu.elf: $(cstart.o) $(TEST_DIR)/pmu.o
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv4 2/3] arm: pmu: Check cycle count increases

2015-10-12 Thread Christopher Covington

Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
even for the smallest delta of two subsequent reads.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c | 54 ++
 1 file changed, 54 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 42d0ee1..ae81970 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -14,6 +14,8 @@
  */
 #include "libcflat.h"
 
+#define NR_SAMPLES 10
+
 #if defined(__arm__)
 static inline uint32_t get_pmcr(void)
 {
@@ -22,6 +24,19 @@ static inline uint32_t get_pmcr(void)
asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
return ret;
 }
+
+static inline void set_pmcr(uint32_t pmcr)
+{
+   asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (pmcr));
+}
+
+static inline unsigned long get_pmccntr(void)
+{
+   unsigned long cycles;
+
+   asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
+   return cycles;
+}
 #elif defined(__aarch64__)
 static inline uint32_t get_pmcr(void)
 {
@@ -30,6 +45,19 @@ static inline uint32_t get_pmcr(void)
asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
return ret;
 }
+
+static inline void set_pmcr(uint32_t pmcr)
+{
+   asm volatile("msr pmcr_el0, %0" : : "r" (pmcr));
+}
+
+static inline unsigned long get_pmccntr(void)
+{
+   unsigned long cycles;
+
+   asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
+   return cycles;
+}
 #endif
 
 struct pmu_data {
@@ -72,11 +100,37 @@ static bool check_pmcr(void)
return pmu.implementer != 0;
 }
 
+/*
+ * Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+   struct pmu_data pmu;
+
+   pmu.enable = 1;
+   set_pmcr(pmu.pmcr_el0);
+
+   for (int i = 0; i < NR_SAMPLES; i++) {
+   unsigned long a, b;
+
+   a = get_pmccntr();
+   b = get_pmccntr();
+
+   if (a >= b) {
+   printf("Read %ld then %ld.\n", a, b);
+   return false;
+   }
+   }
+
+   return true;
+}
+
 int main(void)
 {
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
+   report("Monotonically increasing cycle count", check_cycles_increase());
 
return report_summary();
 }
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv4 3/3] arm: pmu: Add CPI checking

2015-10-12 Thread Christopher Covington

Calculate the numbers of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility
intended for the -icount option in TCG mode but it is not yet enabled
in the configuration file. Enabling it must wait on infrastructure
improvements which allow for different tests to be run on TCG versus
KVM.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c | 91 ++-
 1 file changed, 90 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index ae81970..169c36c 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -37,6 +37,18 @@ static inline unsigned long get_pmccntr(void)
asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
return cycles;
 }
+
+static inline void loop(int i, uint32_t pmcr)
+{
+   uint32_t z = 0;
+
+   asm volatile(
+   "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
+   "   1: subs %[i], %[i], #1\n"
+   "   bgt 1b\n"
+   "   mcr p15, 0, %[z], c9, c12, 0\n"
+   : [i] "+r" (i) : [pmcr] "r" (pmcr), [z] "r" (z) : "cc");
+}
 #elif defined(__aarch64__)
 static inline uint32_t get_pmcr(void)
 {
@@ -58,6 +70,16 @@ static inline unsigned long get_pmccntr(void)
asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
return cycles;
 }
+
+static inline void loop(int i, uint32_t pmcr)
+{
+   asm volatile(
+   "   msr pmcr_el0, %[pmcr]\n"
+   "   1: subs %[i], %[i], #1\n"
+   "   b.gt 1b\n"
+   "   msr pmcr_el0, xzr\n"
+   : [i] "+r" (i) : [pmcr] "r" (pmcr) : "cc");
+}
 #endif
 
 struct pmu_data {
@@ -125,12 +147,79 @@ static bool check_cycles_increase(void)
return true;
 }
 
-int main(void)
+/*
+ * Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+   int i = (num - 1) / 2;
+
+   assert(num >= 3 && ((num - 1) % 2 == 0));
+   loop(i, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+   struct pmu_data pmu;
+
+   pmu.cycle_counter_reset = 1;
+   pmu.enable = 1;
+
+   if (cpi > 0)
+   printf("Checking for CPI=%d.\n", cpi);
+   printf("instrs : cycles0 cycles1 ...\n");
+
+   for (int i = 3; i < 300; i += 32) {
+   int avg, sum = 0;
+
+   printf("%d :", i);
+   for (int j = 0; j < NR_SAMPLES; j++) {
+   int cycles;
+
+   measure_instrs(i, pmu.pmcr_el0);
+   cycles = get_pmccntr();
+   printf(" %d", cycles);
+
+   if (!cycles || (cpi > 0 && cycles != i * cpi)) {
+   printf("\n");
+   return false;
+   }
+
+   sum += cycles;
+   }
+   avg = sum / NR_SAMPLES;
+   printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
+   sum, avg, i / avg, avg / i);
+   }
+
+   return true;
+}
+
+int main(int argc, char *argv[])
 {
+   int cpi = 0;
+
+   if (argc > 1)
+   cpi = atol(argv[0]);
+
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
report("Monotonically increasing cycle count", check_cycles_increase());
+   report("Cycle/instruction ratio", check_cpi(cpi));
 
return report_summary();
 }
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv4] ARM PMU tests

2015-10-12 Thread Christopher Covington

Changes from v3 in response to Drew's suggestions:

* Improved pmu_data / PMCR fields and usage
* Straightened out awkward conditionals
* Added 32-bit support
* Styling enhancements
* Deferred -icount testing to later patch

[Qemu-devel] [kvm-unit-tests PATCHv3 3/3] arm: pmu: Add CPI checking

2015-10-06 Thread Christopher Covington

Check the numbers of cycles per instruction (CPI) implied by ARM PMU
cycle counter values. Check that in -icount mode these strictly
match the specified rate.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c | 72 ++-
 arm/unittests.cfg | 13 ++
 2 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index 589e605..0ad113d 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -84,12 +84,82 @@ static bool check_cycles_increase(void)
return true;
 }
 
-int main(void)
+/* Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, struct pmu_data pmcr)
+{
+   int i = (num - 1) / 2;
+
+   if (num < 3 || ((num - 1) % 2))
+   abort();
+
+   asm volatile(
+   "msr pmcr_el0, %[pmcr]\n"
+   "1: subs %[i], %[i], #1\n"
+   "b.gt 1b\n"
+   "msr pmcr_el0, xzr"
+   : [i] "+r" (i) : [pmcr] "r" (pmcr) : "cc");
+}
+
+/* Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+   struct pmu_data pmcr;
+
+   pmcr.cycle_counter_reset = 1;
+   pmcr.enable = 1;
+
+   if (cpi > 0)
+   printf("Checking for CPI=%d.\n", cpi);
+   printf("instrs : cycles0 cycles1 ...\n");
+
+   for (int i = 3; i < 300; i += 32) {
+   int avg, sum = 0;
+
+   printf("%d :", i);
+   for (int j = 0; j < samples; j++) {
+   int cycles;
+
+   measure_instrs(i, pmcr);
+   asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
+   printf(" %d", cycles);
+
+   if (!cycles || (cpi > 0 && cycles != i * cpi)) {
+   printf("\n");
+   return false;
+   }
+
+   sum += cycles;
+   }
+   avg = sum / samples;
+   printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
+   sum, avg, i / avg, avg / i);
+   }
+
+   return true;
+}
+
+int main(int argc, char *argv[])
 {
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
report("Monotonically increasing cycle count", check_cycles_increase());
 
+   int cpi = (argc == 1 ? atol(argv[0]) : 0);
+
+   report("Cycle/instruction ratio", check_cpi(cpi));
+
return report_summary();
 }
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index fd94adb..333ee0d 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -39,4 +39,17 @@ groups = selftest
 # Test PMU support without -icount
 [pmu]
 file = pmu.flat
+extra_params = -append '-1'
+groups = pmu
+
+# Test PMU support with -icount IPC=1
+[pmu-icount-1]
+file = pmu.flat
+extra_params = -icount 0 -append '1'
+groups = pmu
+
+# Test PMU support with -icount IPC=256
+[pmu-icount-256]
+file = pmu.flat
+extra_params = -icount 8 -append '256'
 groups = pmu
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv3 1/3] arm: Add PMU test

2015-10-06 Thread Christopher Covington

Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU).

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c   | 66 +
 arm/unittests.cfg   |  5 
 config/config-arm64.mak |  4 ++-
 3 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 000..91a3688
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,66 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright 2015 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+
+struct pmu_data {
+   union {
+   uint32_t pmcr_el0;
+   struct {
+   uint32_t enable:1;
+   uint32_t event_counter_reset:1;
+   uint32_t cycle_counter_reset:1;
+   uint32_t cycle_counter_clock_divider:1;
+   uint32_t event_counter_export:1;
+   uint32_t cycle_counter_disable_when_prohibited:1;
+   uint32_t cycle_counter_long:1;
+   uint32_t zeros:4;
+   uint32_t num_counters:5;
+   uint32_t identification_code:8;
+   uint32_t implementer:8;
+   };
+   };
+};
+
+/* As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
+ * null. Also print out a couple other interesting fields for diagnostic
+ * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero of them, but hopefully support for
+ * at least the instructions event will be added in the future and the reported
+ * number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+   struct pmu_data pmcr;
+
+   asm volatile("mrs %0, pmcr_el0" : "=r" (pmcr));
+
+   printf("PMU implementer: %c\n", pmcr.implementer);
+   printf("Identification code: 0x%x\n", pmcr.identification_code);
+   printf("Event counters:  %d\n", pmcr.num_counters);
+
+   if (pmcr.implementer)
+   return true;
+
+   return false;
+}
+
+int main(void)
+{
+   report_prefix_push("pmu");
+
+   report("Control register", check_pmcr());
+
+   return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index e068a0c..fd94adb 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -35,3 +35,8 @@ file = selftest.flat
 smp = `getconf _NPROCESSORS_CONF`
 extra_params = -append 'smp'
 groups = selftest
+
+# Test PMU support without -icount
+[pmu]
+file = pmu.flat
+groups = pmu
diff --git a/config/config-arm64.mak b/config/config-arm64.mak
index d61b703..140b611 100644
--- a/config/config-arm64.mak
+++ b/config/config-arm64.mak
@@ -12,9 +12,11 @@ cflatobjs += lib/arm64/processor.o
 cflatobjs += lib/arm64/spinlock.o
 
 # arm64 specific tests
-tests =
+tests = $(TEST_DIR)/pmu.flat
 
 include config/config-arm-common.mak
 
 arch_clean: arm_clean
$(RM) lib/arm64/.*.d
+
+$(TEST_DIR)/pmu.elf: $(cstart.o) $(TEST_DIR)/pmu.o
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv3 2/3] arm: pmu: Check cycle count increases

2015-10-06 Thread Christopher Covington

Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
even for the smallest delta of two subsequent reads.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 91a3688..589e605 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -33,6 +33,8 @@ struct pmu_data {
};
 };
 
+static const int samples = 10;
+
 /* As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
  * null. Also print out a couple other interesting fields for diagnostic
  * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
@@ -56,11 +58,38 @@ static bool check_pmcr(void)
return false;
 }
 
+/* Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+   struct pmu_data pmcr;
+
+   pmcr.enable = 1;
+   asm volatile("msr pmcr_el0, %0" : : "r" (pmcr));
+
+   for (int i = 0; i < samples; i++) {
+   int a, b;
+
+   asm volatile(
+   "mrs %[a], pmccntr_el0\n"
+   "mrs %[b], pmccntr_el0\n"
+   : [a] "=r" (a), [b] "=r" (b));
+
+   if (a >= b) {
+   printf("Read %d then %d.\n", a, b);
+   return false;
+   }
+   }
+
+   return true;
+}
+
 int main(void)
 {
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
+   report("Monotonically increasing cycle count", check_cycles_increase());
 
return report_summary();
 }
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv3] ARM PMU tests

2015-10-06 Thread Christopher Covington

Changes from v2:

* Explicit test for monotonically increasing cycle count
* Tests now pass or fail
* Tests broken into functions
* Tests/functions broken into separate patches in series
* Style improvements as suggested by Wei Huang and Linux checkpatch.pl
* Spelling and comment improvements

Re: [Qemu-devel] Enabling PMU in qemu arm64

2015-10-02 Thread Christopher Covington

On 10/01/2015 03:32 PM, Pranith Kumar wrote:
> On Thu, Oct 1, 2015 at 12:21 PM, Christopher Covington
> <c...@codeaurora.org> wrote:
>>
>> Are you using KVM or TCG (are you running on an x86 host or an arm64 host)?
> 
> I am using TCG, aarch64-softmmu on x86 host.
> 
>>
>> We have published some patches implementing the PMU registers and instruction
>> counting (but not any other events) for TCG mode [1], but more work is
>> required to get these changes into shape for inclusion upstream.
>>
>> 1. https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00567.html
> 
> Thanks for the pointer. From the patch series I can see that patches 7
> and 9  are for enabling PMU in ARM virt. Do you plan on submitting
> them upstream?
> I will try these patches locally and see how it goes.
> 
>> To guide and justify the changes I'm currently trying to write kvm-unit-tests
>> that measure
>>
>> A) IPC using PMCCNTR_EL0 (implemented upstream, at least when not using
>> -icount) and code with known length in instructions;
> 
> PMCCNTR_EL0 always returns 0 for me(in 2.4, will check tip).

Make sure it's enabled (PMCR_EL0 = 1). I meant to copy you on the patch but
forgot. Please see "[kvm-unit-tests PATCHv2] arm: Add PMU test" for an example.

>> B) CPU frequency using PMCCNTR_EL0 and CNTVCT_EL0; and
>> C) instructions event in the PMU for code with known length in instructions
> 
> I am guessing these two are not upstream yet, would be great to see it there.

PMCCNTR_EL0 and CNTVCT_EL0 are supported upstream (so B should work). Regular
events, such as the instructions event (8 IIRC), are not yet supported
(further changes are required before C will work).

Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH] target-arm: Use common CPU cycle infrastructure

2015-10-02 Thread Christopher Covington

On 09/29/2015 10:07 AM, Christopher Covington wrote:
> On 09/28/2015 06:05 PM, Alistair Francis wrote:
>> On Thu, Sep 24, 2015 at 12:43 PM, Christopher Covington
>> <c...@codeaurora.org> wrote:
>>> cpu_get_ticks() provides a common interface across targets for
>>> calculating CPU cycles. Using this fixes PMCCNTR reads when -icount
>>> is specified (previously a non-increasing value was returned).
>>>
>>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>>> ---
>>>  target-arm/helper.c | 9 +++--
>>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/target-arm/helper.c b/target-arm/helper.c
>>> index 7dc49cb..32923fb 100644
>>> --- a/target-arm/helper.c
>>> +++ b/target-arm/helper.c
>>> @@ -729,8 +729,7 @@ void pmccntr_sync(CPUARMState *env)
>>>  {
>>>  uint64_t temp_ticks;
>>>
>>> -temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
>>> -  get_ticks_per_sec(), 100);
>>> +temp_ticks = cpu_get_ticks();
> 
>> Also I don't think this is correct. cpu_get_ticks() returns the host
>> CPU cycle counter, when in this case we want the guest cycles.
> 
> I too find the use of host CPU cycles quite perplexing. Paolo suggested it
> [1]. Maybe there are timeouts in some software that tend to work better in
> such a mode. Perhaps it is faster, although my intuition is that it's just
> providing a facade of frequency scaling to the guest.
> 
> 1. https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg00162.html
> 
> I like to declare guest instructions per guest CPU cycles = 1. As I understand
> it, an "-icount 0" pair of parameters is how to do this in QEMU for x86. I'd
> like it to work for ARM.
> 
> I wrote a simple assembly test case which I'm working on porting it to the
> kvm-unit-tests framework. In the non-icount case, I saw roughly the same order
> of magnitude for guest IPC before and after the patch. I'd like to also write
> CPU frequency (guest CPU cycles per generic timer guest seconds) and (M)IPS
> (guest instructions per generic timer guest seconds) tests.

I've sent out the CPI test case and while exercising it I noticed that
Laurent's patch fixed -icount. So my original goal has been accomplished. I'm
happy to rebase this patch if the authorities (Peter Maydell?) think using
cpu_get_ticks() is the right thing to do. Otherwise I'll probably try to move
on to support for the instructions event in the ARM PMU.

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [kvm-unit-tests PATCHv2] arm: Add PMU test

2015-10-02 Thread Christopher Covington

Add test the ARM Performance Monitors Unit (PMU). The informational
fields from the control register are printed, but not checked, and
the number of cycles it takes to run a known-instruction-count loop
is printed, but not checked. Once QEMU is fixed, we can at least
begin to check for IPC == 1 when using -icount.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c   | 89 +
 arm/unittests.cfg   | 11 ++
 config/config-arm64.mak |  4 ++-
 3 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 000..f724c2c
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,89 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright 2015 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+
+struct pmu_data {
+   union {
+   uint32_t pmcr_el0;
+   struct {
+   unsigned int enable:1;
+   unsigned int event_counter_reset:1;
+   unsigned int cycle_counter_reset:1;
+   unsigned int cycle_counter_clock_divider:1;
+   unsigned int event_counter_export:1;
+   unsigned int cycle_counter_disable_when_prohibited:1;
+   unsigned int cycle_counter_long:1;
+   unsigned int zeros:4;
+   unsigned int num_counters:5;
+   unsigned int identification_code:8;
+   unsigned int implementor:8;
+   };
+   };
+};
+
+/* Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported. The control register (PMCR) is
+ * initialized with the provided value (allowing for example for the cycle
+ * counter or eventer count to be reset if needed). After the known instruction
+ * count loop, zero is written to the PMCR to disable counting, allowing the
+ * cycle counter or event counters to be read as needed at a later time.
+ */
+static void measure_instrs(int len, struct pmu_data pmcr)
+{
+   int i = (len - 1) / 2;
+
+   if (len < 3 || ((len - 1) % 2))
+   abort();
+
+   asm volatile(
+   "msr pmcr_el0, %[pmcr]\n"
+   "1: subs %[i], %[i], #1\n"
+   "b.gt 1b\n"
+   "msr pmcr_el0, xzr"
+   : [i] "+r" (i) : [pmcr] "r" (pmcr) : "cc");
+}
+
+int main()
+{
+   struct pmu_data pmcr;
+   const int samples = 10;
+
+   asm volatile("mrs %0, pmcr_el0" : "=r" (pmcr));
+
+   printf("PMU implementor: %c\n", pmcr.implementor);
+   printf("Identification code: 0x%x\n", pmcr.identification_code);
+   printf("Event counters:  %d\n", pmcr.num_counters);
+
+   pmcr.cycle_counter_reset = 1;
+   pmcr.enable = 1;
+
+   printf("\ninstructions : cycles0 cycles1 ...\n");
+
+   for (int i = 3; i < 300; i += 32) {
+   int avg, sum = 0;
+   printf("%d :", i);
+   for (int j = 0; j < samples; j++) {
+   int val;
+   measure_instrs(i, pmcr);
+   asm volatile("mrs %0, pmccntr_el0" : "=r" (val));
+   sum += val;
+   printf(" %d", val);
+   }
+   avg = sum / samples;
+   printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n", sum, avg, i / 
avg, avg / i);
+   }
+
+   return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index e068a0c..b3b7ff4 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -35,3 +35,14 @@ file = selftest.flat
 smp = `getconf _NPROCESSORS_CONF`
 extra_params = -append 'smp'
 groups = selftest
+
+# Test PMU support without -icount
+[pmu]
+file = pmu.flat
+groups = pmu
+
+# Test PMU support with -icount
+[pmu-icount]
+file = pmu.flat
+groups = pmu
+extra_params = -icount 0
diff --git a/config/config-arm64.mak b/config/config-arm64.mak
index d61b703..140b611 100644
--- a/config/config-arm64.mak
+++ b/config/config-arm64.mak
@@ -12,9 +12,11 @@ cflatobjs += lib/arm64/processor.o
 cflatobjs += lib/arm64/spinlock.o
 
 # arm64 specific tests
-tests =
+tests =

Re: [Qemu-devel] [PATCH] target-arm: Use common CPU cycle infrastructure

2015-10-02 Thread Christopher Covington

On 10/02/2015 01:25 PM, Peter Crosthwaite wrote:
> On Fri, Oct 2, 2015 at 9:56 AM, Peter Maydell <peter.mayd...@linaro.org> 
> wrote:
>> On 2 October 2015 at 17:44, Christopher Covington <c...@codeaurora.org> 
>> wrote:
>>> I've sent out the CPI test case and while exercising it I noticed that
>>> Laurent's patch fixed -icount. So my original goal has been accomplished. 
>>> I'm
>>> happy to rebase this patch if the authorities (Peter Maydell?) think using
>>> cpu_get_ticks() is the right thing to do. Otherwise I'll probably try to 
>>> move
>>> on to support for the instructions event in the ARM PMU.
>>
>> Authority here is probably Peter Crosthwaite. I can produce an
>> opinion but I'd have to go and research a bunch of stuff to do
>> that, so I'm hoping to avoid it...
> 
> So my idea here is the CPU input frequency should be a property of the CPU.
> 
> Some experimental results confirm that the PMCCNTR on many common ARM
> implementations is directly connected to the input clock and can be
> relied on as a straight free-running counter. I think a genuine
> instruction counter is something else

Yes, the "genuine" instruction counter is something else. The instruction
count is only relevant for folks trying to get deterministic execution by
using the -icount option. QEMU TCG mode does not emulate a cycle-level input
clock for the guest (the whole class of functional models skip this
time-consuming step) but rather operates a block at a time. By doing a little
extra, I think it also interpolates the exact instruction count. Specifying a
fixed IPC = n is the most sensible way of deterministically calculating a
PMCCNTR_EL0 value that I know of. The -icount option allows users to choose
such deterministic behavior.

> and this timer should be independent of any core provider of cycle count.

What, if anything, do you think should be hooked up to the core provider of
cycle count?

Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH] target-arm: Use common CPU cycle infrastructure

2015-10-02 Thread Christopher Covington

On 10/02/2015 03:56 PM, Peter Crosthwaite wrote:
> On Fri, Oct 2, 2015 at 12:25 PM, Christopher Covington
> <c...@codeaurora.org> wrote:
>> On 10/02/2015 01:25 PM, Peter Crosthwaite wrote:
>>> On Fri, Oct 2, 2015 at 9:56 AM, Peter Maydell <peter.mayd...@linaro.org> 
>>> wrote:
>>>> On 2 October 2015 at 17:44, Christopher Covington <c...@codeaurora.org> 
>>>> wrote:
>>>>> I've sent out the CPI test case and while exercising it I noticed that
>>>>> Laurent's patch fixed -icount. So my original goal has been accomplished. 
>>>>> I'm
>>>>> happy to rebase this patch if the authorities (Peter Maydell?) think using
>>>>> cpu_get_ticks() is the right thing to do. Otherwise I'll probably try to 
>>>>> move
>>>>> on to support for the instructions event in the ARM PMU.
>>>>
>>>> Authority here is probably Peter Crosthwaite. I can produce an
>>>> opinion but I'd have to go and research a bunch of stuff to do
>>>> that, so I'm hoping to avoid it...
>>>
>>> So my idea here is the CPU input frequency should be a property of the CPU.
>>>
>>> Some experimental results confirm that the PMCCNTR on many common ARM
>>> implementations is directly connected to the input clock and can be
>>> relied on as a straight free-running counter. I think a genuine
>>> instruction counter is something else
>>
>> Yes, the "genuine" instruction counter is something else. The instruction
>> count is only relevant for folks trying to get deterministic execution by
>> using the -icount option. QEMU TCG mode does not emulate a cycle-level input
>> clock for the guest (the whole class of functional models skip this
>> time-consuming step) but rather operates a block at a time. By doing a little
>> extra, I think it also interpolates the exact instruction count. Specifying a
>> fixed IPC = n is the most sensible way of deterministically calculating a
>> PMCCNTR_EL0 value that I know of. The -icount option allows users to choose
>> such deterministic behavior.
>>
>>> and this timer should be independent of any core provider of cycle count.
>>
>> What, if anything, do you think should be hooked up to the core provider of
>> cycle count?
> 
> Depends, Is this a virtual-machine only concept, or do you have
> something with a real-hardware analogue?

What I meant to ask was, do you see any reason for cpu_get_ticks() to exist?
If no architecture besides i386 wants to use it, perhaps the code should be
moved there.

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] Enabling PMU in qemu arm64

2015-10-01 Thread Christopher Covington

Hi Pranith,

On 10/01/2015 11:34 AM, Pranith Kumar wrote:
> Hi Christoph,
> 
> On the qemu mailing list you mentioned that you use  perf events in
> linux ARM64 guests. I was wondering how you enabled access to the PMU?
> 
> I get illegal instruction whenever I execute any "MSR PMUSERENR_EL0,
> 1" to enable user access. Any help is appreciated.

Are you using KVM or TCG (are you running on an x86 host or an arm64 host)?

We have published some patches implementing the PMU registers and instruction
counting (but not any other events) for TCG mode [1], but more work is
required to get these changes into shape for inclusion upstream.

1. https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00567.html

To guide and justify the changes I'm currently trying to write kvm-unit-tests
that measure

A) IPC using PMCCNTR_EL0 (implemented upstream, at least when not using
-icount) and code with known length in instructions;
B) CPU frequency using PMCCNTR_EL0 and CNTVCT_EL0; and
C) instructions event in the PMU for code with known length in instructions

If you're using KVM, I think Shannon Zhao at Linaro has been working on that.

Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [PATCH] arm: Add PMU test

2015-10-01 Thread Christopher Covington

Beginning with just a read of the control register, add plumbing
for testing the ARM Performance Monitors Unit (PMU).

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/pmu.c| 31 +++
 arm/unittests.cfg|  5 +
 config/config-arm-common.mak |  4 +++-
 3 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 000..b1e3c7a
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,31 @@
+#include "libcflat.h"
+
+union pmcr_el0 {
+   struct {
+   unsigned int implementor:8;
+   unsigned int identification_code:8;
+   unsigned int num_counters:5;
+   unsigned int zeros:4;
+   unsigned int cycle_counter_long:1;
+   unsigned int cycle_counter_disable_when_prohibited:1;
+   unsigned int event_counter_export:1;
+   unsigned int cycle_counter_clock_divider:1;
+   unsigned int cycle_counter_reset:1;
+   unsigned int event_counter_reset:1;
+   unsigned int enable:1;
+   } split;
+   uint32_t full;
+};
+
+int main()
+{
+   union pmcr_el0 pmcr;
+
+   asm volatile("mrs %0, pmcr_el0\n" : "=r" (pmcr));
+
+   printf("PMU implementor: 0x%x\n", pmcr.split.implementor);
+   printf("Identification code: 0x%x\n", pmcr.split.identification_code);
+   printf("Event counters:  %d\n", pmcr.split.num_counters);
+
+   return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index e068a0c..d3deb6a 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -35,3 +35,8 @@ file = selftest.flat
 smp = `getconf _NPROCESSORS_CONF`
 extra_params = -append 'smp'
 groups = selftest
+
+# Test PMU support
+[pmu]
+file = pmu.flat
+groups = pmu
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 698555d..b34d04c 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -11,7 +11,8 @@ endif
 
 tests-common = \
$(TEST_DIR)/selftest.flat \
-   $(TEST_DIR)/spinlock-test.flat
+   $(TEST_DIR)/spinlock-test.flat \
+   $(TEST_DIR)/pmu.flat
 
 all: test_cases
 
@@ -70,3 +71,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
 
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
 $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
+$(TEST_DIR)/pmu.elf: $(cstart.o) $(TEST_DIR)/pmu.o
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [PATCHv2] arm: Fail on unknown subtest

2015-10-01 Thread Christopher Covington

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/selftest.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arm/selftest.c b/arm/selftest.c
index fc9ec60..f4a5030 100644
--- a/arm/selftest.c
+++ b/arm/selftest.c
@@ -376,6 +376,9 @@ int main(int argc, char **argv)
cpumask_set_cpu(0, _reported);
while (!cpumask_full(_reported))
cpu_relax();
+   } else {
+   printf("Unknown subtest\n");
+   abort();
}
 
return report_summary();
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [PATCH] kvm-unit-tests: arm: Fail on unknown subtest

2015-09-30 Thread Christopher Covington

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 arm/selftest.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arm/selftest.c b/arm/selftest.c
index fc9ec60..aa16a91 100644
--- a/arm/selftest.c
+++ b/arm/selftest.c
@@ -376,6 +376,8 @@ int main(int argc, char **argv)
cpumask_set_cpu(0, _reported);
while (!cpumask_full(_reported))
cpu_relax();
+   } else {
+   report("Unknown subtest", false);
}
 
return report_summary();
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH] target-arm: Use common CPU cycle infrastructure

2015-09-30 Thread Christopher Covington

On 09/28/2015 06:05 PM, Alistair Francis wrote:
> On Thu, Sep 24, 2015 at 12:43 PM, Christopher Covington
> <c...@codeaurora.org> wrote:
>> cpu_get_ticks() provides a common interface across targets for
>> calculating CPU cycles. Using this fixes PMCCNTR reads when -icount
>> is specified (previously a non-increasing value was returned).
>>
>> Signed-off-by: Christopher Covington <c...@codeaurora.org>
>> ---
>>  target-arm/helper.c | 9 +++--
>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/target-arm/helper.c b/target-arm/helper.c
>> index 7dc49cb..32923fb 100644
>> --- a/target-arm/helper.c
>> +++ b/target-arm/helper.c
>> @@ -729,8 +729,7 @@ void pmccntr_sync(CPUARMState *env)
>>  {
>>  uint64_t temp_ticks;
>>
>> -temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
>> -  get_ticks_per_sec(), 100);
>> +temp_ticks = cpu_get_ticks();

> Also I don't think this is correct. cpu_get_ticks() returns the host
> CPU cycle counter, when in this case we want the guest cycles.

I too find the use of host CPU cycles quite perplexing. Paolo suggested it
[1]. Maybe there are timeouts in some software that tend to work better in
such a mode. Perhaps it is faster, although my intuition is that it's just
providing a facade of frequency scaling to the guest.

1. https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg00162.html

I like to declare guest instructions per guest CPU cycles = 1. As I understand
it, an "-icount 0" pair of parameters is how to do this in QEMU for x86. I'd
like it to work for ARM.

I wrote a simple assembly test case which I'm working on porting it to the
kvm-unit-tests framework. In the non-icount case, I saw roughly the same order
of magnitude for guest IPC before and after the patch. I'd like to also write
CPU frequency (guest CPU cycles per generic timer guest seconds) and (M)IPS
(guest instructions per generic timer guest seconds) tests.

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [PATCH] s/cpu_get_real_ticks/cpu_get_host_ticks/

2015-09-25 Thread Christopher Covington

This should help clarify the purpose of the function that returns
the host system's CPU cycle count.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 bsd-user/main.c   |  2 +-
 cpus.c|  6 +++---
 hw/intc/xics.c|  2 +-
 hw/ppc/ppc.c  |  4 ++--
 include/qemu/timer.h  | 20 ++--
 linux-user/main.c |  4 ++--
 target-alpha/sys_helper.c |  2 +-
 7 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index f0a1268..adf2de0 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -108,7 +108,7 @@ void cpu_list_unlock(void)
 
 uint64_t cpu_get_tsc(CPUX86State *env)
 {
-return cpu_get_real_ticks();
+return cpu_get_host_ticks();
 }
 
 static void write_dt(void *ptr, unsigned long addr, unsigned long limit,
diff --git a/cpus.c b/cpus.c
index 056..eba7d5b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -191,7 +191,7 @@ int64_t cpu_get_ticks(void)
 
 ticks = timers_state.cpu_ticks_offset;
 if (timers_state.cpu_ticks_enabled) {
-ticks += cpu_get_real_ticks();
+ticks += cpu_get_host_ticks();
 }
 
 if (timers_state.cpu_ticks_prev > ticks) {
@@ -239,7 +239,7 @@ void cpu_enable_ticks(void)
 /* Here, the really thing protected by seqlock is cpu_clock_offset. */
 seqlock_write_lock(_state.vm_clock_seqlock);
 if (!timers_state.cpu_ticks_enabled) {
-timers_state.cpu_ticks_offset -= cpu_get_real_ticks();
+timers_state.cpu_ticks_offset -= cpu_get_host_ticks();
 timers_state.cpu_clock_offset -= get_clock();
 timers_state.cpu_ticks_enabled = 1;
 }
@@ -255,7 +255,7 @@ void cpu_disable_ticks(void)
 /* Here, the really thing protected by seqlock is cpu_clock_offset. */
 seqlock_write_lock(_state.vm_clock_seqlock);
 if (timers_state.cpu_ticks_enabled) {
-timers_state.cpu_ticks_offset += cpu_get_real_ticks();
+timers_state.cpu_ticks_offset += cpu_get_host_ticks();
 timers_state.cpu_clock_offset = cpu_get_clock_locked();
 timers_state.cpu_ticks_enabled = 0;
 }
diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 67881c7..9ff5796 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -848,7 +848,7 @@ static target_ulong h_xirr_x(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 uint32_t xirr = icp_accept(ss);
 
 args[0] = xirr;
-args[1] = cpu_get_real_ticks();
+args[1] = cpu_get_host_ticks();
 return H_SUCCESS;
 }
 
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index b77e303..2c604ef 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -834,7 +834,7 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
 static void timebase_pre_save(void *opaque)
 {
 PPCTimebase *tb = opaque;
-uint64_t ticks = cpu_get_real_ticks();
+uint64_t ticks = cpu_get_host_ticks();
 PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
 
 if (!first_ppc_cpu->env.tb_env) {
@@ -878,7 +878,7 @@ static int timebase_post_load(void *opaque, int version_id)
  NANOSECONDS_PER_SECOND);
 guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
 
-tb_off_adj = guest_tb - cpu_get_real_ticks();
+tb_off_adj = guest_tb - cpu_get_host_ticks();
 
 tb_off = first_ppc_cpu->env.tb_env->tb_offset;
 trace_ppc_tb_adjust(tb_off, tb_off_adj, tb_off_adj - tb_off,
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 9939246..d0946cb 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -857,7 +857,7 @@ int64_t cpu_icount_to_ns(int64_t icount);
 
 #if defined(_ARCH_PPC)
 
-static inline int64_t cpu_get_real_ticks(void)
+static inline int64_t cpu_get_host_ticks(void)
 {
 int64_t retval;
 #ifdef _ARCH_PPC64
@@ -883,7 +883,7 @@ static inline int64_t cpu_get_real_ticks(void)
 
 #elif defined(__i386__)
 
-static inline int64_t cpu_get_real_ticks(void)
+static inline int64_t cpu_get_host_ticks(void)
 {
 int64_t val;
 asm volatile ("rdtsc" : "=A" (val));
@@ -892,7 +892,7 @@ static inline int64_t cpu_get_real_ticks(void)
 
 #elif defined(__x86_64__)
 
-static inline int64_t cpu_get_real_ticks(void)
+static inline int64_t cpu_get_host_ticks(void)
 {
 uint32_t low,high;
 int64_t val;
@@ -905,7 +905,7 @@ static inline int64_t cpu_get_real_ticks(void)
 
 #elif defined(__hppa__)
 
-static inline int64_t cpu_get_real_ticks(void)
+static inline int64_t cpu_get_host_ticks(void)
 {
 int val;
 asm volatile ("mfctl %%cr16, %0" : "=r"(val));
@@ -914,7 +914,7 @@ static inline int64_t cpu_get_real_ticks(void)
 
 #elif defined(__ia64)
 
-static inline int64_t cpu_get_real_ticks(void)
+static inline int64_t cpu_get_host_ticks(void)
 {
 int64_t val;
 asm volatile ("mov %0 = ar.itc" : "=r"(val) :: "memory");
@@ -923,7 +923,7 @@ static inline int64_t cpu_get_real_ticks(void)
 
 #elif defined(__s390__)
 
-st

[Qemu-devel] [PATCH] target-arm: Use common CPU cycle infrastructure

2015-09-24 Thread Christopher Covington

cpu_get_ticks() provides a common interface across targets for
calculating CPU cycles. Using this fixes PMCCNTR reads when -icount
is specified (previously a non-increasing value was returned).

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 target-arm/helper.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 7dc49cb..32923fb 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -729,8 +729,7 @@ void pmccntr_sync(CPUARMState *env)
 {
 uint64_t temp_ticks;
 
-temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
-  get_ticks_per_sec(), 100);
+temp_ticks = cpu_get_ticks();
 
 if (env->cp15.c9_pmcr & PMCRD) {
 /* Increment once every 64 processor clock cycles */
@@ -768,8 +767,7 @@ static uint64_t pmccntr_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 return env->cp15.c15_ccnt;
 }
 
-total_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
-   get_ticks_per_sec(), 100);
+total_ticks = cpu_get_ticks();
 
 if (env->cp15.c9_pmcr & PMCRD) {
 /* Increment once every 64 processor clock cycles */
@@ -789,8 +787,7 @@ static void pmccntr_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
 return;
 }
 
-total_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
-   get_ticks_per_sec(), 100);
+total_ticks = cpu_get_ticks();
 
 if (env->cp15.c9_pmcr & PMCRD) {
 /* Increment once every 64 processor clock cycles */
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH RFC V4 4/4] Add virt-v3 machine that uses GIC-500

2015-09-24 Thread Christopher Covington

On 09/24/2015 02:03 PM, Christopher Covington wrote:
> Hi,
> 
> On 09/17/2015 01:38 PM, Shlomo Pongratz wrote:
>> From: Pavel Fedin <p.fe...@samsung.com>
>>
>> I would like to offer this, slightly improved implementation. The key thing 
>> is a new
>> kernel_irqchip_type member in Machine class. Currently it it used only by 
>> virt machine for
>> its internal purposes, however in future it is to be passed to KVM in
>> kvm_irqchip_create(). The variable is defined as int in order to be 
>> architecture agnostic,
>> for potential future users.
>>
>> Signed-off-by: Pavel Fedin <p.fe...@samsung.com>
>> ---
>>  hw/arm/virt.c | 72 
>> +--
>>  include/hw/arm/fdt.h  |  2 ++
>>  include/hw/arm/virt.h |  1 +
>>  target-arm/machine.c  |  7 -
>>  4 files changed, 73 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 4d15309..4c2ae7f 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
> 
>> @@ -445,6 +462,14 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq 
>> *pic, int type, bool secure)
>>  sysbus_mmio_map(gicbusdev, 1, vbi->memmap[VIRT_GIC_CPU].base);
>>  }
>>  
>> +if (type == 3) {
>> +/* Connect GIC to CPU */
>> +for (i = 0; i < smp_cpus; i++) {
>> +    CPUState *cpu = qemu_get_cpu(i);
>> +aatch64_registers_with_opaque_set(OBJECT(cpu), (void *)gicdev);
> 
> Typo--should be "aarch64".
> 
> With that, feel free to add the following if it's any use:
> 
> Tested-by: Christopher Covington <c...@codeaurora.org>

I originally tested building only for aarch64-softmmu, but I've now noticed a
build issue with arm-softmmu.

Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH RFC V4 4/4] Add virt-v3 machine that uses GIC-500

2015-09-24 Thread Christopher Covington

Hi,

On 09/17/2015 01:38 PM, Shlomo Pongratz wrote:
> From: Pavel Fedin <p.fe...@samsung.com>
> 
> I would like to offer this, slightly improved implementation. The key thing 
> is a new
> kernel_irqchip_type member in Machine class. Currently it it used only by 
> virt machine for
> its internal purposes, however in future it is to be passed to KVM in
> kvm_irqchip_create(). The variable is defined as int in order to be 
> architecture agnostic,
> for potential future users.
> 
> Signed-off-by: Pavel Fedin <p.fe...@samsung.com>
> ---
>  hw/arm/virt.c | 72 
> +--
>  include/hw/arm/fdt.h  |  2 ++
>  include/hw/arm/virt.h |  1 +
>  target-arm/machine.c  |  7 -
>  4 files changed, 73 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 4d15309..4c2ae7f 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c

> @@ -445,6 +462,14 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq 
> *pic, int type, bool secure)
>  sysbus_mmio_map(gicbusdev, 1, vbi->memmap[VIRT_GIC_CPU].base);
>  }
>  
> +if (type == 3) {
> +/* Connect GIC to CPU */
> +for (i = 0; i < smp_cpus; i++) {
> +CPUState *cpu = qemu_get_cpu(i);
> +aatch64_registers_with_opaque_set(OBJECT(cpu), (void *)gicdev);

Typo--should be "aarch64".

With that, feel free to add the following if it's any use:

Tested-by: Christopher Covington <c...@codeaurora.org>

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [PATCHv2] target-arm: Use physical addresses for ldrex/strex

2015-09-23 Thread Christopher Covington

As different virtual addresses may end up aliasing by pointing to
the same physical address, modify load- and store-exclusive to
use physical addresses with the exclusive monitor.

Written by Derek Hower.

Signed-off-by: Christopher Covington <c...@codeaurora.org>
---
 target-arm/helper-a64.h|  2 ++
 target-arm/helper.c| 25 +
 target-arm/translate-a64.c | 25 +++--
 3 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index 1d3d10f..a713d29 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -46,3 +46,5 @@ DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
+
+DEF_HELPER_3(get_phys_addr64, i64, env, i64, i32)
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 12ea88f..7bcff98 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -24,6 +24,31 @@ static inline bool get_phys_addr(CPUARMState *env, 
target_ulong address,
 #define PMCRE   0x1
 #endif
 
+#ifdef TARGET_AARCH64
+
+uint64_t HELPER(get_phys_addr64)(CPUARMState *env,
+ uint64_t vaddr, uint32_t memidx)
+{
+#ifdef CONFIG_USER_ONLY
+  return vaddr;
+#else
+  hwaddr phys_addr;
+  int prot;   /* ignored */
+  target_ulong page_size; /* ignored */
+  MemTxAttrs attrs = {};  /* ignored */
+  uint32_t fsr;   /* ignored */
+
+  /* We just want the address from this function and don't care about faults.
+   * Therefore, we always assume the operation is a load.
+   */
+  get_phys_addr(env, vaddr, 0, memidx == 0, _addr, , ,
+_size, );
+  return phys_addr;
+#endif
+}
+
+#endif
+
 static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
 {
 int nregs;
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index ec0936c..fb34de2 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1708,7 +1708,17 @@ static void gen_load_exclusive(DisasContext *s, int rt, 
int rt2,
 tcg_gen_mov_i64(cpu_reg(s, rt), tmp);
 
 tcg_temp_free_i64(tmp);
-tcg_gen_mov_i64(cpu_exclusive_addr, addr);
+
+/* The monitor must be set on the physical address. We've already read the
+ * address at this point, so we know the translation won't fault.
+ */
+TCGv_i64 physaddr = tcg_temp_new_i64();
+TCGv_i32 idx = tcg_temp_new_i32();
+tcg_gen_movi_i32(idx, get_mem_index(s));
+gen_helper_get_phys_addr64(physaddr, cpu_env, addr, idx);
+tcg_gen_mov_i64(cpu_exclusive_addr, physaddr);
+tcg_temp_free_i64(physaddr);
+tcg_temp_free_i32(idx);
 }
 
 #ifdef CONFIG_USER_ONLY
@@ -1745,13 +1755,24 @@ static void gen_store_exclusive(DisasContext *s, int 
rd, int rt, int rt2,
  * basic block ends at the branch insn.
  */
 tcg_gen_mov_i64(addr, inaddr);
-tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_exclusive_addr, fail_label);
 
 tmp = tcg_temp_new_i64();
 tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), MO_TE + size);
 tcg_gen_brcond_i64(TCG_COND_NE, tmp, cpu_exclusive_val, fail_label);
 tcg_temp_free_i64(tmp);
 
+/* The monitor must be checked on the physical address. We've alredy loaded
+ * this address, so we don't need to check for a fault condition.
+ */
+TCGv_i64 physaddr = tcg_temp_new_i64();
+TCGv_i32 idx = tcg_temp_new_i32();
+tcg_gen_movi_i32(idx, get_mem_index(s));
+gen_helper_get_phys_addr64(physaddr, cpu_env, addr, idx);
+
+tcg_gen_brcond_i64(TCG_COND_NE, physaddr, cpu_exclusive_addr, fail_label);
+tcg_temp_free_i64(physaddr);
+tcg_temp_free_i32(idx);
+
 if (is_pair) {
 TCGv_i64 addrhi = tcg_temp_new_i64();
 TCGv_i64 tmphi = tcg_temp_new_i64();
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] QEMU+Aarch64: in_asm log skips instructions of loop-programs

2015-09-18 Thread Christopher Covington

On 09/18/2015 04:15 AM, Sergey Smolov wrote:
> Hi Christopher,
> 
> 18.09.2015 02:02, Christopher Covington пишет:
>> Hi Sergey,
>>
>> On 09/04/2015 12:38 PM, Sergey Smolov wrote:
>>>> 03.09.2015 19:35, Peter Maydell пишет:
>>>>> On 3 September 2015 at 15:31, Sergey Smolov <smo...@ispras.ru> wrote:
>>>>>> Do you think it is possible to implement another QEMU logger which will
>>>>>> make a record for every executed block,
>>>>> Yes (this would just need to disable the TB linking optimisation,
>>>>> which we've discussed providing a debug toggle for in another
>>>>> thread).
>>> Ok, I've implemented a mapping between disassembled forms of instructions 
>>> and
>>> executed TBs.
>>> Now my logger does "loop unrolling" successfully.
>> This sounds like it solves the same issue as -d nochain but in what's 
>> probably
>> a more time efficient manner. Are you able to share your implementation?

> In which way should I share it? Am I need to create a patch and send it to
> mailing list?

That would be ideal. If you're not familiar with the process, just let me know
and I'd be happy to help.

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] QEMU+Aarch64: in_asm log skips instructions of loop-programs

2015-09-17 Thread Christopher Covington

Hi Sergey,

On 09/04/2015 12:38 PM, Sergey Smolov wrote:
> 
>>
>> 03.09.2015 19:35, Peter Maydell пишет:
>>> On 3 September 2015 at 15:31, Sergey Smolov <smo...@ispras.ru> wrote:
>>>> Do you think it is possible to implement another QEMU logger which will
>>>> make a record for every executed block,
>>> Yes (this would just need to disable the TB linking optimisation,
>>> which we've discussed providing a debug toggle for in another
>>> thread).
> 
> Ok, I've implemented a mapping between disassembled forms of instructions and
> executed TBs.
> Now my logger does "loop unrolling" successfully.

This sounds like it solves the same issue as -d nochain but in what's probably
a more time efficient manner. Are you able to share your implementation?

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH 9/9] target-arm: Wire up HLT 0xf000 as the A64 semihosting instruction

2015-09-14 Thread Christopher Covington

Hi Peter,

On 08/27/2015 02:35 PM, Peter Maydell wrote:
> On 13 August 2015 at 17:35, Peter Maydell <peter.mayd...@linaro.org> wrote:
>> For the A64 instruction set, the semihosting call instruction
>> is 'HLT 0xf000'. Wire this up to call do_arm_semihosting()
>> if semihosting is enabled.
>>
>> Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
>> ---
> 
>> @@ -1553,8 +1554,17 @@ static void disas_exc(DisasContext *s, uint32_t insn)
>>  unallocated_encoding(s);
>>  break;
>>  }
>> -/* HLT */
>> -unsupported_encoding(s, insn);
>> +/* HLT. This has two purposes.
>> + * Architecturally, it is an external halting debug instruction.
>> + * Since QEMU doesn't implement external debug, we treat this as
>> + * it is required for halting debug disabled: it will UNDEF.
>> + * Secondly, "HLT 0xf000" is the A64 semihosting syscall 
>> instruction.
>> + */
>> +if (semihosting_enabled() && imm16 == 0xf000) {
>> +gen_exception_internal_insn(s, 0, EXCP_SEMIHOST);
>> +} else {
>> +unsupported_encoding(s, insn);
>> +}
> 
> Christopher pointed out to me at KVM Forum that this isn't
> consistent with how we do 32-bit ARM semihosting, which has a
> check to prevent its use from userspace in system emulation.
> (The idea is that semihosting is basically a "guest can pwn
> your host" API, so giving access to it to guest userspace is
> kind of brave.)

> There is a usecase for allowing unfettered access to semihosting
> in system emulation mode (basically, running bare metal test
> binaries). I think we should deal with that by having a separate
> command line option for "userspace semihosting access is OK",
> which changes the behaviour for both 32-bit and 64-bit semihosting
> APIs. Alternatively, we could instead allow userspace to use
> "safe" parts of the semihosting API, like "print to stdout",
> but not the less safe parts like "open and write to arbitrary
> host files". Or we could decide that this safety check isn't
> actually very useful (no other model/debug environment has it
> that I know of) and drop it entirely; but that makes me a little
> nervous.

I find allowing trusted guests to access host files to be a very useful
feature. To me it is very similar to passing through / (root) via VirtIO-9P.
Perhaps a useful way of making sure the user knows what files their guest is
gaining access to would be to have a semihosting path prefix option. That way
access could be allowed nowhere; clearly allow everywhere (/); or clearly be
restricted to, and relative to, a certain sysroot directory
(/home/user/my-sysroot).

Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] add multiple times opening support to a virtserialport

2015-08-27 Thread Christopher Covington

On 07/24/2015 08:00 AM, Matt Ma wrote:
 Hi all,
 
 Linaro has developed the foundation for the new Android Emulator code
 base based on a fairly recent upstream QEMU code base, when we
 re-based the code, we updated the device model to be more virtio based
 (for example the drives are now virtio block devices). The aim of this
 is to minimise the delta between upstream and the Android specific
 changes to QEMU. One Android emulator specific feature is the
 AndroidPipe.
 
 AndroidPipe is a communication channel between the guest system and
 the emulator itself. Guest side device node can be opened by multi
 processes at the same time with different service name. It has a
 de-multiplexer on the QEMU side to figure out which service the guest
 actually wanted, so the first write after opening device node is the
 service name guest wanted, after QEMU backend receive this service
 name, create a corresponding communication channel, initialize related
 component, such as file descriptor which connect to the host socket
 serve. So each opening in guest will create a separated communication
 channel.
 
 We can create a separate device for each service type, however some
 services, such as the OpenGL emulation, need to have multiple open
 channels at a time. This is currently not possible using the
 virtserialport which can only be opened once.
 
 Current virtserialport can not  be opened by multiple processes at the
 same time. I know virtserialport has provided buffers beforehand to
 cache data from host to guest, so even there is no guest read, data
 can still be transported from host to guest kernel, when there is
 guest read request, just copy cached data to user space.
 
 We are not sure clearly whether virtio can support
 multi-open-per-device semantics or not, followings are just our
 initial ideas about adding multi-open-per-device feature to a port:
 
 * when there is a open request on a port, kernel will allocate a
 portclient with new id and __wait_queue_head to track this request
 * save this portclient in file-private_data
 * guest kernel pass this portclient info to QEMU and notify that the
 port has been opened
 * QEMU backend will create a clientinfo struct to track this
 communication channel, initialize related component
 * we may change the kernel side strategy of allocating receiving
 buffers in advance to a new strategy, that is when there is a read
 request:
 - allocate a port_buffer, put user space buffer address to
 port_buffer.buf, share memory to avoid memcpy
 - put both portclient id(or portclient addrss) and port_buffer.buf
 to virtqueue, that is the length of buffers chain is 2
 - kick to notify QEMU backend to consume read buffer
 - QEMU backend read portclient info firstly to find the correct
 clientinfo, then read host data directly into virtqueue buffer to
 avoid memcpy
 - guest kernel will wait(similarly in block mode, because the user
 space address has been put into virtqueue) until QEMU backend has
 consumed buffer(all data/part data/nothing have been sent to host
 side)
 - if nothing has been read from host and file descriptor is in
 block mode, read request will wait through __wait_queue_head until
 host side is readable
 
 * above read logic may change the current behavior of transferring
 data to guest kernel even without guest user read
 
 * when there is a write request:
 - allocate a port_buffer, put user space buffer address to
 port_buffer.buf, share memory to avoid memcpy
 - put both portclient id(or portclient addrss) and port_buffer.buf
 to virtqueue, the length of buffers chain is 2
 - kick to notify QEMU backend to consume write buffer
 - QEMU backend read portclient info firstly to find the correct
 clientinfo, then write the virtqueue buffer content to host side as
 current logic
 - guest kernel will wait(similarly in block mode, because the user
 space address has been put into virtqueue) until QEMU backend has
 consumed buffer(all data/part data/nothing have been receive from host
 side)
 - if nothing has been sent out and file descriptor is in block
 mode, write request will wait through __wait_queue_head until host
 side is writable
 
 We obviously don't want to regress existing virtio behaviour and
 performance and welcome the communities expertise to point out
 anything we may have missed out before we get to far down implementing
 our initial proof-of-concept.

Would virtio-vsock be interesting for your purposes?

http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf

(Video doesn't seem to be up yet, but should probably be available eventually
at the following link)

https://www.youtube.com/playlist?list=PLW3ep1uCIRfyLNSu708gWG7uvqlolk0ep

Regards,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH v2 0/3] SysFS driver for QEMU fw_cfg device

2015-08-26 Thread Christopher Covington

Hi Gabriel,

On 08/19/2015 04:49 PM, Gabriel L. Somlo wrote:
 Hi Ard,
 
 On Wed, Aug 19, 2015 at 11:42:02AM +0200, Ard Biesheuvel wrote:
 (missed some cc's)

 On 19 August 2015 at 11:38, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 From: Gabriel L. Somlo so...@cmu.edu
 Several different architectures supported by QEMU are set up with a
 firmware configuration (fw_cfg) device, used to pass configuration
 blobs into the guest by the host running QEMU.

 Historically, these config blobs were mostly of interest to the guest
 BIOS, but since QEMU v2.4 it is possible to insert arbitrary blobs via
 the command line, which makes them potentially interesting to userspace
 (e.g. for passing early boot environment variables, etc.).


 Does 'potentially interesting' mean you have a use case? Could you 
 elaborate?
 
 My personal one would be something like:
 
 cat  guestinfo.txt  EOT
   KEY1=val1
   KEY2=val2
   ...
 EOT
 
 qemu-system-x86_64 ... -fw-cfg name=opt/guestinfo,file=./guestinfo.txt ...
 
 Then, from inside the guest:
 
   . /sys/firmware/qemu_fw_cfg/by_name/opt/guestinfo/raw
 
   do_something_with $KEY1 $KEY2
   ...
 
 But I'm thinking this is only one of the many positive things one
 could do with the ability to access random host-supplied blobs from
 guest userspace :)

I do this with kernel parameters:

host:
qemu-system-aarch64 -append=KEY1=val1 KEY2=val2

guest:
KEY1=`sed -nr s/.*KEY1=([^ ]+).*/\1/ /proc/cmdline`
KEY2=`sed -nr s/.*KEY2=([^ ]+).*/\1/ /proc/cmdline`

do_something_with $KEY1 $KEY2

In practice it's just script=hostfile, where hostfile is available to the
guest via a 9P passthrough filesystem mount.

While quite architecture specific, I've also previously used an
angel-cmdline tool for similar purposes. Peter's recent semihosting patches
support such a tool for AArch64. (On AArch32 upstream QEMU disallows
semihosting from userspace.)

Before I had 9P on all the simulators I regularly ran, I used a semihosting
based angel-load tool.

Regards,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH 0/9] target-arm: Implement A64 semihosting

2015-08-25 Thread Christopher Covington

Hi Peter,

On 08/13/2015 12:35 PM, Peter Maydell wrote:
 This patch series implements support for semihosting for the
 64-bit ARM instruction set.
 
 It owes a significant debt to the patches sent earlier
 by Christopher Covington (and with code written by Derek Hower).
 However, it is a full from-scratch rewrite (since there were
 several things which I felt those patches didn't take the
 right approach on). I mostly just looked at the earlier
 patches to check I hadn't missed anything.
 
 The changes in the A64 API compared to the A32/T32 one are:
  * input syscall number is in register W0
  * return result is in register X0
  * all argument parameter blocks are 64 bits wide, not 32
  * there is a new SyncCacheRange syscall
  * the SYS_EXIT syscall takes a parameter block and is able
to pass a guest exit status out
  * the insn used to trigger semihosting is a HLT, not an
SVC or BKPT.
 
 I've tested this for A32, T32 and A64 semihosting, for
 both usermode and system emulation, with and without gdb
 remote syscalls.
 
 The test code I wrote to do the testing is here:
 https://git.linaro.org/people/peter.maydell/semihosting-tests.git/
 (not very exciting, but might be handy if anybody needs a
 basic how to run C code starting with bare metal system
 emulation template.)
 
 The test series also includes a bugfix: we haven't correctly
 forwarded SYS_WRITE0 (print string to terminal) to gdb since
 the gdb hosted syscall support was added to QEMU back in 2007...

Your work on this is greatly appreciated.

Tested-by: Christopher Covington c...@codeaurora.org

This works for simple Linux userspace angel-load, angel-store, and angel-exit
utilities as well as at least one newlib/libgloss test. Some more complicated
newlib/libgloss binaries don't run, but that appears to be because of
attempted vbar_el3 accesses.

If it's not much trouble, adding your semihosting tests to kvm-unit-tests
might be nice.

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] ARM softmmu breakpoint misbehavior

2015-08-25 Thread Christopher Covington

On 08/24/2015 01:36 PM, Sergey Fedorov wrote:
 Hi all,
 
 Seems there is a bug in ARM breakpoint emulation. I am not sure how to
 fix it and I would appreciate any suggestion. It is best illustrated by
 a simple test which sets up and enables an unlinked address match
 breakpoint but does not enable debug exceptions globally by
 MDSCR_EL1.MDE bit.
 
 cat test.s EOF   
 .text
 .global _start
 _start:
 adr x0, bp
 msr dbgbvr0_el1, x0
 mov x0, #1
 orr x0, x0, #(0xf  5)
 msr dbgbcr0_el1, x0
 bp:
 nop
 wfi
 b   .
 EOF
 
 aarch64-linux-gnu-as -o test.o test.s
 aarch64-linux-gnu-ld -Ttext=0x4000 -o test.elf test.o
 qemu-system-aarch64 -nographic -machine virt -cpu cortex-a57 -kernel
 test.elf -D qemu.log -d in_asm,exec -singlestep
 
 First, it fails with segmentation fault. What actually happens is a CPU
 breakpoint is inserted in hw_breakpoint_update(). After that, when
 translating bp() an internal debug exception is generated in
 gen_intermediate_code_internal_a64() since there is a CPU breakpoint
 which matches the address of the instruction being translated. Then
 arm_debug_excp_handler() get called in order to handle this breakpoint.
 It calls check_breakpoints() and discovers there is no breakpoints
 enabled since MDSCR_EL1.MDE is not set. It simply returns and we
 eventually get to cpu_handle_guest_debug(), then gdb_set_stop_cpu()
 which does segmentation fault.
 
 I managed to avoid this segmentation fault with this patch:
 
 diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
 index 663c05d..223b939 100644
 --- a/target-arm/op_helper.c
 +++ b/target-arm/op_helper.c
 @@ -889,6 +889,15 @@ void arm_debug_excp_handler(CPUState *cs)
  }
  }
  } else {
 +CPUBreakpoint *bp;
 +uint64_t pc = is_a64(env) ? env-pc : env-regs[15];
 +
 +QTAILQ_FOREACH(bp, cs-breakpoints, entry) {
 +if (bp-pc == pc  !(bp-flags  BP_CPU)) {
 +return;
 +}
 +}
 +
  if (check_breakpoints(cpu)) {
  bool same_el = (arm_debug_target_el(env) ==
 arm_current_el(env));
  if (extended_addresses_enabled(env)) {
 @@ -900,6 +909,8 @@ void arm_debug_excp_handler(CPUState *cs)
  raise_exception(env, EXCP_PREFETCH_ABORT,
  syn_breakpoint(same_el),
  arm_debug_target_el(env));
 +} else {
 +cpu_resume_from_signal(cs, NULL);
  }
  }
  }
 
 The patch adds a check for non-CPU breakpoints first, then calls
 cpu_resume_from_signal() if no CPU breakpoint matches.
 
 With this patch Qemu hangs generating internal debug exception over and
 over:
 
 head -40 qemu.log
 
 IN:
 0x4000:  10a0  adr x0, #+0x14 (addr 0x4014)
 
 Trace 0x7ff11e237000 [4000]
 
 IN:
 0x4004:  d5100080  msr (unknown), x0
 
 Trace 0x7ff11e237040 [4004]
 
 IN:
 0x4008:  d2800020  mov x0, #0x1
 
 Trace 0x7ff11e237080 [4008]
 
 IN:
 0x400c:  b27b0c00  orr x0, x0, #0x1e0
 
 Trace 0x7ff11e2370c0 [400c]
 
 IN:
 0x4010:  d51000a0  msr (unknown), x0
 
 Trace 0x7ff11e237110 [4010]
 
 IN:
 0x4014:  d503201f  nop
 Disassembler disagrees with translator over instruction decoding
 Please report this to qemu-devel@nongnu.org
 
 Trace 0x7ff11e237150 [4014]
 Trace 0x7ff11e237150 [4014]
 Trace 0x7ff11e237150 [4014]
 Trace 0x7ff11e237150 [4014]
 Trace 0x7ff11e237150 [4014]
 Trace 0x7ff11e237150 [4014]
 Trace 0x7ff11e237150 [4014]
 Trace 0x7ff11e237150 [4014]
 Trace 0x7ff11e237150 [4014]
 
 It looks like a bug, but I actually have no idea how would be best to
 overcome this situation. I would be thankful for any suggestion :)

Sorry that I don't have anything more useful to say than the following, but
adding -d int to the mix might help illustrate the alleged internal debug
exception in the trace. Peter recently posted a series related to semihosting
(target-arm: Implement A64 semihosting) that I think touches some of this
code, if you haven't seen that already.

Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH 6/9] target-arm/arm-semi.c: Support widening APIs to 64 bits

2015-08-20 Thread Christopher Covington

On Thu, Aug 13, 2015 at 9:35 AM, Peter Maydell peter.mayd...@linaro.org wrote:
 The 64-bit A64 semihosting API has some pervasive changes from
 the 32-bit version:
  * all parameter blocks are arrays of 64-bit values, not 32-bit
  * the semihosting call number is passed in W0
  * the return value is a 64-bit value in X0

 Implement the necessary handling for this widening.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org

Reviewed-by: Christopher Covington christopher.coving...@linaro.org

Re: [Qemu-devel] [PATCH 9/9] target-arm: Wire up HLT 0xf000 as the A64 semihosting instruction

2015-08-20 Thread Christopher Covington

On Aug 13, 2015 9:35 AM, Peter Maydell peter.mayd...@linaro.org wrote:

 For the A64 instruction set, the semihosting call instruction
 is 'HLT 0xf000'. Wire this up to call do_arm_semihosting()
 if semihosting is enabled.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org

Reviewed-by: Christopher Covington christopher.coving...@linaro.org

Re: [Qemu-devel] [PATCH 4/9] target-arm/arm-semi.c: Factor out repeated 'return env-regs[0]'

2015-08-19 Thread Christopher Covington

On Thu, Aug 13, 2015 at 9:35 AM, Peter Maydell peter.mayd...@linaro.org wrote:
 Factor out a repeated pattern in the semihosting code:

 gdb_do_syscall(arm_semi_cb, system,%s, arg0, (int)arg1+1);
 /* arm_semi_cb sets env-regs[0] to the syscall return value */
 return env-regs[0];

 For A64 the return value will go in a different register; pull
 the sequence out into its own function that passes the return
 value in a static variable rather than overloading regs[0]
 for the purpose, so the code will work on both A32/T32 and A64.

 Note that the lack-of-synchronization bug noted in the FIXME
 comment is not introduced by this commit, but was already present.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org

Reviewed-by: Christopher Covington christopher.coving...@linaro.org

Re: [Qemu-devel] [PATCH 7/9] target-arm/arm-semi.c: Implement A64 specific SyncCacheRange call

2015-08-19 Thread Christopher Covington

On Thu, Aug 13, 2015 at 9:35 AM, Peter Maydell peter.mayd...@linaro.org wrote:
 The A64 semihosting ABI defines a new call SyncCacheRange
 for doing a 'clean D-cache and invalidate I-cache' sequence.
 Since QEMU doesn't implement caches, we can implement this as a nop.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org

Reviewed-by: Christopher Covington christopher.coving...@linaro.org

[Qemu-devel] [RFC 1/2] icount: Print instruction count on exit

2015-08-11 Thread Christopher Covington

When -icount shift=n is in use, print the instruction count when
finished. In conjunction with the `time` command, this can be used to
calculate how many instructions per second QEMU TCG can translate and
execute. The output can also be used to double-check future facilities
such as exposing the instruction count to guest/target software
through interfaces such as an emulated Performance Monitors Unit.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 cpus.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/cpus.c b/cpus.c
index a822ce3..b0bc8ec 100644
--- a/cpus.c
+++ b/cpus.c
@@ -511,6 +511,13 @@ void cpu_ticks_init(void)
 vmstate_register(NULL, 0, vmstate_timers, timers_state);
 }
 
+static Notifier icount_exit_notifier;
+
+static void print_instruction_count(Notifier *notifier, void *data)
+{
+printf(Executed %PRId64 target instructions.\n, cpu_get_icount_raw());
+}
+
 void configure_icount(QemuOpts *opts, Error **errp)
 {
 const char *option;
@@ -541,6 +548,8 @@ void configure_icount(QemuOpts *opts, Error **errp)
 if (errno != 0 || *rem_str != '\0' || !strlen(option)) {
 error_setg(errp, icount: Invalid shift value);
 }
+icount_exit_notifier.notify = print_instruction_count;
+qemu_add_exit_notifier(icount_exit_notifier);
 use_icount = 1;
 return;
 } else if (icount_align_option) {
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [RFC 2/2] qemu-log: Add in_icount option

2015-08-11 Thread Christopher Covington

This allows one to see the size of blocks that get translated (in
target instructions) without the verbosity that in_asm would bring.
This is a step towards generating Basic Block Vectors (BBVs)* which
are histograms of blocks within a given interval. BBVs are useful in
determining whether one interval is similar to another. Note that this
does not yet provide useful information for circular chains of blocks
nor partially executed blocks. When such cases are handled reliably,
this output can also be used to double-check future facilities such as
exposing the instruction count to guest/target software through
interfaces such as an emulated Performance Monitors Unit.

* Basic Block used loosely; single-entry not guaranteed.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 include/qemu/log.h | 1 +
 qemu-log.c | 2 ++
 target-arm/translate-a64.c | 4 
 3 files changed, 7 insertions(+)

diff --git a/include/qemu/log.h b/include/qemu/log.h
index 0b0eef5..6c000ae 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -41,6 +41,7 @@ static inline bool qemu_log_enabled(void)
 #define LOG_UNIMP  (1  10)
 #define LOG_GUEST_ERROR(1  11)
 #define CPU_LOG_MMU(1  12)
+#define CPU_LOG_TB_IN_ICOUNT (1  13)
 
 /* Returns true if a bit is set in the current loglevel mask
  */
diff --git a/qemu-log.c b/qemu-log.c
index b3ebd3c..4a6cbc2 100644
--- a/qemu-log.c
+++ b/qemu-log.c
@@ -154,6 +154,8 @@ const QEMULogItem qemu_log_items[] = {
   show generated host assembly code for each compiled TB },
 { CPU_LOG_TB_IN_ASM, in_asm,
   show target assembly code for each compiled TB },
+{ CPU_LOG_TB_IN_ICOUNT, in_icount,
+  show target instruction count for each compiled TB },
 { CPU_LOG_TB_OP, op,
   show micro ops for each compiled TB },
 { CPU_LOG_TB_OP_OPT, op_opt,
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 0b0f4ae..4877c30 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -11132,6 +11132,10 @@ done_generating:
 gen_tb_end(tb, num_insns);
 
 #ifdef DEBUG_DISAS
+if (qemu_loglevel_mask(CPU_LOG_TB_IN_ICOUNT) 
+qemu_log_in_addr_range(pc_start)) {
+qemu_log(0x TARGET_FMT_lx  [size=%d]\n, pc_start, num_insns);
+}
 if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM) 
 qemu_log_in_addr_range(pc_start)) {
 qemu_log(\n);
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] RFC: Instruction Counting Debug Facilities

2015-08-11 Thread Christopher Covington

Hi,

Please find in this series two small patches adding debugging
facilities related to instruction counting. My ultimate goal is to
provide accurate instruction counts to target software through
the Performance Monitors Unit (PMU) and enable the collection of
Basic Block Vectors (BBVs). These patches are intended to
facilitate future work towards that. Please let me know if you
think there are better ways to accomplish these goals.

My only reason for hiding the instruction count exit notifier
behind -icount shift=n is that I figure the kind of person who is
interested in such a metric is the kind of person who would
want to run with -icount shift=n set. Perhaps there's a statistics
or verbose option that would be more appropriate to key off of.
Additionally printing MIPS would be even neater, but I leave that
for later.

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH v4 07/11] qemu-log: new option -dfilter to limit output

2015-08-10 Thread Christopher Covington

Hi Alex,

On 08/03/2015 05:14 AM, Alex Bennée wrote:
 When debugging big programs or system emulation sometimes you want both
 the verbosity of cpu,exec et all but don't want to generate lots of logs
 for unneeded stuff. This patch adds a new option -dfilter which allows
 you to specify interesting address ranges in the form:
 
   -dfilter 0x8000-0x9000,0xffc8+0x200,...
 
 Then logging code can use the new qemu_log_in_addr_range() function to
 decide if it will output logging information for the given range.
 
 Signed-off-by: Alex Bennée alex.ben...@linaro.org

My usual flow is to filter based on mode (CurrentEL on AArch64) and PID
(CONTEXTIDR on AArch64). Do you foresee any problems with adding such filters?

Thanks,
Christopher Covington

 v2
   - More clean-ups to the documentation
 
 v3
   - re-base
   - use GArray instead of GList to avoid cache bouncing
   - checkpatch fixes
 ---
  include/qemu/log.h |  2 ++
  qemu-log.c | 57 
 ++
  qemu-options.hx| 16 +++
  vl.c   |  3 +++
  4 files changed, 78 insertions(+)
 
 diff --git a/include/qemu/log.h b/include/qemu/log.h
 index b80f8f5..ade1f76 100644
 --- a/include/qemu/log.h
 +++ b/include/qemu/log.h
 @@ -182,6 +182,8 @@ static inline void qemu_set_log(int log_flags)
  }
  
  void qemu_set_log_filename(const char *filename);
 +void qemu_set_dfilter_ranges(const char *ranges);
 +bool qemu_log_in_addr_range(uint64_t addr);
  int qemu_str_to_log_mask(const char *str);
  
  /* Print a usage message listing all the valid logging categories
 diff --git a/qemu-log.c b/qemu-log.c
 index 77ed7bc..b3ebd3c 100644
 --- a/qemu-log.c
 +++ b/qemu-log.c
 @@ -19,11 +19,13 @@
  
  #include qemu-common.h
  #include qemu/log.h
 +#include qemu/range.h
  
  static char *logfilename;
  FILE *qemu_logfile;
  int qemu_loglevel;
  static int log_append = 0;
 +static GArray *debug_regions;
  
  void qemu_log(const char *fmt, ...)
  {
 @@ -92,6 +94,61 @@ void qemu_set_log_filename(const char *filename)
  qemu_set_log(qemu_loglevel);
  }
  
 +/* Returns true if addr is in our debug filter or no filter defined
 + */
 +bool qemu_log_in_addr_range(uint64_t addr)
 +{
 +if (debug_regions) {
 +int i = 0;
 +for (i = 0; i  debug_regions-len; i++) {
 +struct Range *range = g_array_index(debug_regions, Range, i);
 +if (addr = range-begin  addr = range-end) {
 +return true;
 +}
 +}
 +return false;
 +} else {
 +return true;
 +}
 +}
 +
 +
 +void qemu_set_dfilter_ranges(const char *filter_spec)
 +{
 +gchar **ranges = g_strsplit(filter_spec, ,, 0);
 +if (ranges) {
 +gchar **next = ranges;
 +gchar *r = *next++;
 +debug_regions = g_array_sized_new(FALSE, FALSE,
 +  sizeof(Range), 
 g_strv_length(ranges));
 +while (r) {
 +gchar *delim = g_strrstr(r, -);
 +if (!delim) {
 +delim = g_strrstr(r, +);
 +}
 +if (delim) {
 +struct Range range;
 +range.begin = strtoul(r, NULL, 0);
 +switch (*delim) {
 +case '+':
 +range.end = range.begin + strtoul(delim+1, NULL, 0);
 +break;
 +case '-':
 +range.end = strtoul(delim+1, NULL, 0);
 +break;
 +default:
 +g_assert_not_reached();
 +}
 +g_array_append_val(debug_regions, range);
 +} else {
 +g_error(Bad range specifier in: %s, r);
 +}
 +r = *next++;
 +}
 +g_strfreev(ranges);
 +}
 +}
 +
  const QEMULogItem qemu_log_items[] = {
  { CPU_LOG_TB_OUT_ASM, out_asm,
show generated host assembly code for each compiled TB },
 diff --git a/qemu-options.hx b/qemu-options.hx
 index ae53346..90f0df9 100644
 --- a/qemu-options.hx
 +++ b/qemu-options.hx
 @@ -2987,6 +2987,22 @@ STEXI
  Output log in @var{logfile} instead of to stderr
  ETEXI
  
 +DEF(dfilter, HAS_ARG, QEMU_OPTION_DFILTER, \
 +-dfilter range,..  filter debug output to range of addresses (useful 
 for -d cpu,exec,etc..)\n,
 +QEMU_ARCH_ALL)
 +STEXI
 +@item -dfilter @var{range1}[,...]
 +@findex -dfilter
 +Filter debug output to that relevant to a range of target addresses. The 
 filter
 +spec can be either @var{start}-@var{end} or @var{start}+@var{size} where 
 @var{start}
 +@var{end} and @var{size} are the addresses and sizes required. For example:
 +@example
 +-dfilter 0x8000-0x9000,0xffc8+0x200
 +@end example
 +Will dump output for any code in the 0x1000 sized block starting at 0x8000 
 and
 +the 0x200 sized block starting at 0xffc8.
 +ETEXI
 +
  DEF(L, HAS_ARG, QEMU_OPTION_L, \
  -L path set the directory for the BIOS

Re: [Qemu-devel] [PATCH v4 05/11] qemu-log: Improve the exec TB execution logging

2015-08-10 Thread Christopher Covington

Hi Peter, Alex,

On 08/03/2015 05:14 AM, Alex Bennée wrote:
 From: Peter Maydell peter.mayd...@linaro.org
 
 Improve the TB execution logging so that it is easier to identify
 what is happening from trace logs:
  * move the Trace logging of executed TBs into cpu_tb_exec()
so that it is emitted if and only if we actually execute a TB,
and for consistency for the CPU state logging
  * log when we link two TBs together via tb_add_jump()
  * log when cpu_tb_exec() returns early from a chain of TBs
 
 The new style logging looks like this:
 
 Trace 0x7fb7cc822ca0 [ffcdce00]
 Linking TBs 0x7fb7cc822ca0 [ffcdce00] index 0 - 0x7fb7cc823110 
 [ffcdce10]
 Trace 0x7fb7cc823110 [ffcdce10]
 Trace 0x7fb7cc823420 [ffc000302688]
 Trace 0x7fb7cc8234a0 [ffc000302698]
 Trace 0x7fb7cc823520 [ffc0003026a4]
 Trace 0x7fb7cc823560 [ffcdce44]
 Linking TBs 0x7fb7cc823560 [ffcdce44] index 1 - 0x7fb7cc8235d0 
 [ffcdce70]
 Trace 0x7fb7cc8235d0 [ffcdce70]
 Abandoned execution of TB chain before 0x7fb7cc8235d0 [ffcdce70]
 Trace 0x7fb7cc8235d0 [ffcdce70]
 Trace 0x7fb7cc822fd0 [ffcdd52c]

Do you think there's some way to log the loop count when a circular chain is
executed?

System Emulation started at Mon Aug 10 15:30:49 2015
Invocation:aarch64-softmmu/qemu-system-aarch64 -M virt -cpu cortex-a57 -m 2G
-kernel psci-exit -d exec,int,in_asm -nodefaults -nographic -monitor none
-icount shift=0

IN:
0x8000:  d2800140  mov x0, #0xa
0x8004:  f1000400  subs x0, x0, #0x1 (1)
0x8008:  54e1  b.ne #-0x4 (addr 0x8004)

Trace 0x7f38787cb000 [8000]

IN:
0x8004:  f1000400  subs x0, x0, #0x1 (1)
0x8008:  54e1  b.ne #-0x4 (addr 0x8004)

Linking TBs 0x7f38787cb000 [8000] index 1 - 0x7f38787cb0d0
[8004]
Trace 0x7f38787cb0d0 [8004]
Linking TBs 0x7f38787cb0d0 [8004] index 1 - 0x7f38787cb0d0
[8004]
Trace 0x7f38787cb0d0 [8004]

IN:
0x800c:  d2800100  mov x0, #0x8
0x8010:  f2b08000  movk x0, #0x8400, lsl #16
0x8014:  d402  hvc #0x0

Linking TBs 0x7f38787cb0d0 [8004] index 0 - 0x7f38787cb1c0
[800c]
Trace 0x7f38787cb1c0 [800c]
Taking exception 11 [Hypervisor Call]
...from EL1
...with ESR 0x5a00

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] Logging number of translated instructions per basic block

2015-08-06 Thread Christopher Covington

On 08/05/2015 01:11 PM, Christopher Covington wrote:
 Hi Anthony,
 
 On 07/28/2015 05:20 PM, Anthony Carno wrote:
 Hi there,

 As the subject of my email suggests, is there a way to log the number of
 translated instructions per basic block?  I've dug into the debug code a bit
 (specifically /target_disas /and the arch-specific /print_insn_*/) and 
 haven't
 found anything that would suggest this is doable, but before I go and move on
 to my plan B (which is devising a plan C haha), I'd like to ask and make sure
 I'm not missing something.  Thank you!

Perhaps a better answer than my previous one is cpu_get_ticks() when -icount
is used:

https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg00162.html

Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [RFC 01/14] Make unknown semihosting calls non-fatal

2015-08-06 Thread Christopher Covington

Hi Alex,

Thanks for taking a look.

On 08/06/2015 05:11 AM, Alex Bennée wrote:
 
 Christopher Covington c...@codeaurora.org writes:
 
 Signed-off-by: Christopher Covington c...@codeaurora.org
 ---
  target-arm/arm-semi.c | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

 diff --git a/target-arm/arm-semi.c b/target-arm/arm-semi.c
 index a8b83e6..bcc70ec 100644
 --- a/target-arm/arm-semi.c
 +++ b/target-arm/arm-semi.c
 @@ -186,8 +186,6 @@ static void arm_semi_flen_cb(CPUState *cs, target_ulong 
 ret, target_ulong err)
  #define SET_ARG(n, val) put_user_ual(val, args + (n) * 4)
  uint32_t do_arm_semihosting(CPUARMState *env)
  {
 -ARMCPU *cpu = arm_env_get_cpu(env);
 -CPUState *cs = CPU(cpu);
  target_ulong args;
  target_ulong arg0, arg1, arg2, arg3;
  char * s;
 @@ -195,6 +193,8 @@ uint32_t do_arm_semihosting(CPUARMState *env)
  uint32_t ret;
  uint32_t len;
  #ifdef CONFIG_USER_ONLY
 +ARMCPU *cpu = arm_env_get_cpu(env);
 +CPUState *cs = CPU(cpu);
  TaskState *ts = cs-opaque;
  #else
  CPUARMState *ts = env;
 @@ -562,7 +562,6 @@ uint32_t do_arm_semihosting(CPUARMState *env)
  exit(ret);
  default:
  fprintf(stderr, qemu: Unsupported SemiHosting SWI 0x%02x\n, nr);
 -cpu_dump_state(cs, stderr, fprintf, 0);
 -abort();
 +return -1;
 
 Do SemiHosting calls return anything to the guest? If so removing the
 abort means the guest may go unpredictable if the call is just ignored.
 If so the argument would be to leave the abort() in and just add the
 missing semihosting calls.

If were to pursue this further, I would want to review how external debuggers
supporting semihosting on hardware behave. However my current thinking is that
there are more commonly used mechanisms than semihosting that we can move to
for equivalent functionality.

Functionality: standard mechanism

Guest-initiated host filesystem access: VirtIO 9P passthrough filesystem
Guest-initiated exit: poweroff (PSCI)
Guest-initiated instruction counting: perf stat -e instructions:u (PMU)
Guest-initiated checkpoint dump: [maybe QMP over guest agent VirtIO serial?]

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [RFC 10/14] bbvec: Move mode/PID change detection to register writes

2015-08-05 Thread Christopher Covington

This should speed up basic block detection since these were previously
being checked for on each basic block / instruction.

Written by Aaron Lindsay.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 target-arm/cpu.h   | 13 +
 target-arm/helper-a64.c|  2 +-
 target-arm/helper.c| 39 +--
 target-arm/helper.h|  2 --
 target-arm/translate-a64.c |  2 --
 target-arm/translate.c |  2 --
 6 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index f6857fa..6c4ba9c 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -670,14 +670,27 @@ static inline uint32_t pstate_read(CPUARMState *env)
 | env-pstate | env-daif;
 }
 
+void update_instruction_count(CPUARMState *env);
+#ifdef CONFIG_BBVEC
+void context_check_mode(CPUARMState *env);
+#endif
+
 static inline void pstate_write(CPUARMState *env, uint32_t val)
 {
+bool mode_changed = (env-pstate ^ val)  PSTATE_M;
 env-ZF = (~val)  PSTATE_Z;
 env-NF = val;
 env-CF = (val  29)  1;
 env-VF = (val  3)  0x8000;
 env-daif = val  PSTATE_DAIF;
 env-pstate = val  ~CACHED_PSTATE_BITS;
+
+if (mode_changed) {
+update_instruction_count(env);
+#ifdef CONFIG_BBVEC
+context_check_mode(env);
+#endif
+}
 }
 
 /* Return the current CPSR value.  */
diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 8803293..e647b90 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -558,8 +558,8 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
 env-condexec_bits = 0;
 }
 
-pstate_write(env, PSTATE_DAIF | new_mode);
 env-aarch64 = 1;
+pstate_write(env, PSTATE_DAIF | new_mode);
 aarch64_restore_sp(env, new_el);
 
 env-pc = addr;
diff --git a/target-arm/helper.c b/target-arm/helper.c
index a659e67..c1f4c47 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -387,6 +387,7 @@ static void fcse_write(CPUARMState *env, const ARMCPRegInfo 
*ri, uint64_t value)
 }
 }
 
+void context_check_pid(CPUARMState *env);
 static void contextidr_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
@@ -399,6 +400,9 @@ static void contextidr_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
 arm_tlb_flush(env, 1);
 }
 raw_write(env, ri, value);
+#ifdef CONFIG_BBVEC
+context_check_pid(env);
+#endif
 }
 
 static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -4292,29 +4296,34 @@ void HELPER(bbv_profile)(CPUARMState *env)
 }
 }
 
-void HELPER(context_check_mode)(CPUARMState *env)
+void context_check_mode(CPUARMState *env)
 {
-uint32_t mode;
+uint32_t priv_mode; /* nonzero if privileged */
+
+if (!bbtrace_initialized())
+return;
 
-/* Get current mode: userspace or privileged */
 if (env-aarch64) {
-mode = extract32(env-pstate, 2, 2);
+priv_mode = extract32(env-pstate, 2, 2);
 }
 else if ((env-uncached_cpsr  0x1f) == ARM_CPU_MODE_USR) {
-mode = 0;
+priv_mode = 0;
 }
 /* We don't currently implement the Virtualization or TrustZone
  * extensions, so PL2 and PL3 don't exist for us.
  */
-else mode = 1;
+else priv_mode = 1;
 
-bb_context_check_mode(env-prof_ic, mode);
+bb_context_check_mode(env-prof_ic, priv_mode);
 }
 
-void HELPER(context_check_pid)(CPUARMState *env)
+void context_check_pid(CPUARMState *env)
 {
 uint64_t pid;
 
+if (!bbtrace_initialized())
+return;
+
 /* Read pid from CONTEXTIDR register. In aarch32, if EL1 is not in AArch64
  * mode, we need to shift out the address space identifier in the first 8 
bits.
  *
@@ -4331,7 +4340,7 @@ void HELPER(context_check_pid)(CPUARMState *env)
 bb_context_check_pid(env-prof_ic, pid);
 }
 
-void HELPER(update_instruction_count)(CPUARMState *env)
+void update_instruction_count(CPUARMState *env)
 {
 if (bbtrace_initialized()) {
 /*
@@ -4360,7 +4369,7 @@ void HELPER(update_instruction_count)(CPUARMState *env)
 
 #else //!CONFIG_BBVEC
 
-void HELPER(update_instruction_count)(CPUARMState *env)
+void update_instruction_count(CPUARMState *env)
 {
 pmevcntr_increment(env, PMU_COUNTER_TYPE_INSTRUCTIONS, env-prof_ic);
 pmevcntr_increment(env, PMU_COUNTER_TYPE_CYCLES, env-prof_ic);
@@ -4369,6 +4378,11 @@ void HELPER(update_instruction_count)(CPUARMState *env)
 
 #endif //CONFIG_BBVEC
 
+void HELPER(update_instruction_count)(CPUARMState *env)
+{
+update_instruction_count(env);
+}
+
 /* Sign/zero extend */
 uint32_t HELPER(sxtb16)(uint32_t x)
 {
@@ -4525,6 +4539,11 @@ void switch_mode(CPUARMState *env, int mode)
 if (mode == old_mode)
 return;
 
+update_instruction_count(env);
+#ifdef CONFIG_BBVEC
+context_check_mode(env);
+#endif
+
 if (old_mode == ARM_CPU_MODE_FIQ) {
 memcpy (env-fiq_regs, env-regs + 8, 5 * sizeof(uint32_t

[Qemu-devel] [RFC 12/14] bbvec: Detect mode changes after uncached_cpsr update

2015-08-05 Thread Christopher Covington

The previous code checked for the mode change before the new mode was
written to env-uncached_cpsr, which unfortunately made the bbvec output
look reasonable for small tests.

Written by Aaron Lindsay.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 target-arm/helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 297eb7c..ae0a4ac 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4281,6 +4281,9 @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t 
mask)
 }
 mask = ~CACHED_CPSR_BITS;
 env-uncached_cpsr = (env-uncached_cpsr  ~mask) | (val  mask);
+#ifdef CONFIG_BBVEC
+context_check_mode(env);
+#endif
 }
 
 #ifdef CONFIG_BBVEC
@@ -4540,9 +4543,6 @@ void switch_mode(CPUARMState *env, int mode)
 return;
 
 update_instruction_count(env);
-#ifdef CONFIG_BBVEC
-context_check_mode(env);
-#endif
 
 if (old_mode == ARM_CPU_MODE_FIQ) {
 memcpy (env-fiq_regs, env-regs + 8, 5 * sizeof(uint32_t));
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [RFC 07/14] Add PMU to ARM virt platform

2015-08-05 Thread Christopher Covington

This reserves one PPI (private peripheral interrupt) for the PMU and
creates a corresponding entry in the device tree.

Writteb by Aaron Lindsay.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 hw/arm/virt.c| 21 +
 target-arm/cpu-qom.h |  4 ++--
 target-arm/cpu.c |  8 
 target-arm/cpu.h |  3 ++-
 target-arm/helper.c  |  6 +++---
 5 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 565f573..bd1fbb6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -295,6 +295,23 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi)
GIC_FDT_IRQ_TYPE_PPI, 10, irqflags);
 }
 
+static void fdt_add_pmu_nodes(const VirtBoardInfo *vbi)
+{
+uint32_t irqflags = GIC_FDT_IRQ_FLAGS_EDGE_LO_HI;
+const char nodename[] = /pmu;
+const char compat[] = arm,armv8-pmuv3\0arm,cortex-a15-pmu;
+
+irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
+ GIC_FDT_IRQ_PPI_CPU_WIDTH, (1  vbi-smp_cpus) - 1);
+
+
+qemu_fdt_add_subnode(vbi-fdt, nodename);
+qemu_fdt_setprop(vbi-fdt, nodename, compatible, compat, sizeof(compat));
+
+qemu_fdt_setprop_cells(vbi-fdt, nodename, interrupts,
+   GIC_FDT_IRQ_TYPE_PPI, 7, irqflags);
+}
+
 static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
 {
 int cpu;
@@ -384,6 +401,9 @@ static uint32_t create_gic(const VirtBoardInfo *vbi, 
qemu_irq *pic)
 /* virtual timer */
 qdev_connect_gpio_out(cpudev, 1,
   qdev_get_gpio_in(gicdev, ppibase + 27));
+/* PMU */
+qdev_connect_gpio_out(cpudev, 2,
+  qdev_get_gpio_in(gicdev, ppibase + 23));
 
 sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev, 
ARM_CPU_IRQ));
 }
@@ -802,6 +822,7 @@ static void machvirt_init(MachineState *machine)
 }
 g_strfreev(cpustr);
 fdt_add_timer_nodes(vbi);
+fdt_add_pmu_nodes(vbi);
 fdt_add_cpu_nodes(vbi);
 fdt_add_psci_node(vbi);
 
diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index ed5a644..bb6722f 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -84,8 +84,8 @@ typedef struct ARMCPU {
 
 /* Timers used by the generic (architected) timer */
 QEMUTimer *gt_timer[NUM_GTIMERS];
-/* GPIO outputs for generic timer */
-qemu_irq gt_timer_outputs[NUM_GTIMERS];
+/* GPIO outputs for generic timer and PMU */
+qemu_irq ppi_outputs[NUM_GTIMERS + 1];
 
 /* 'compatible' string for this CPU for Linux device trees */
 const char *dtb_compatible;
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 3ca3fa8..87d0772 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -379,17 +379,17 @@ static void arm_cpu_initfn(Object *obj)
 /* VIRQ and VFIQ are unused with KVM but we add them to maintain
  * the same interface as non-KVM CPUs.
  */
-qdev_init_gpio_in(DEVICE(cpu), arm_cpu_kvm_set_irq, 4);
+qdev_init_gpio_in(DEVICE(cpu), arm_cpu_kvm_set_irq, 5);
 } else {
-qdev_init_gpio_in(DEVICE(cpu), arm_cpu_set_irq, 4);
+qdev_init_gpio_in(DEVICE(cpu), arm_cpu_set_irq, 5);
 }
 
 cpu-gt_timer[GTIMER_PHYS] = timer_new(QEMU_CLOCK_VIRTUAL, GTIMER_SCALE,
 arm_gt_ptimer_cb, cpu);
 cpu-gt_timer[GTIMER_VIRT] = timer_new(QEMU_CLOCK_VIRTUAL, GTIMER_SCALE,
 arm_gt_vtimer_cb, cpu);
-qdev_init_gpio_out(DEVICE(cpu), cpu-gt_timer_outputs,
-   ARRAY_SIZE(cpu-gt_timer_outputs));
+qdev_init_gpio_out(DEVICE(cpu), cpu-ppi_outputs,
+   ARRAY_SIZE(cpu-ppi_outputs));
 #endif
 
 /* DTB consumers generally don't in fact care what the 'compatible'
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 2525569..44084a5 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -112,9 +112,10 @@ typedef struct ARMGenericTimer {
 uint64_t ctl; /* Timer Control register */
 } ARMGenericTimer;
 
+#define NUM_GTIMERS 2
 #define GTIMER_PHYS 0
 #define GTIMER_VIRT 1
-#define NUM_GTIMERS 2
+#define PMU_IDX 2
 
 typedef struct {
 uint64_t raw_tcr;
diff --git a/target-arm/helper.c b/target-arm/helper.c
index d2c02be..1843ec5 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1216,7 +1216,7 @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
 uint64_t nexttick;
 
 gt-ctl = deposit32(gt-ctl, 2, 1, istatus);
-qemu_set_irq(cpu-gt_timer_outputs[timeridx],
+qemu_set_irq(cpu-ppi_outputs[timeridx],
  (istatus  !(gt-ctl  2)));
 if (istatus) {
 /* Next transition is when count rolls back over to zero */
@@ -1237,7 +1237,7 @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
 } else {
 /* Timer disabled: ISTATUS and timer output always clear */
 gt-ctl = ~4

[Qemu-devel] [RFC 06/14] Added support for block profiling for AArch32 and Aarch64

2015-08-05 Thread Christopher Covington

Blocks are split on all branch instructions. Code assumes a translation
block does not end in a branch. When a branch instruction is identified,
gen_store_is_jmp(1) is called to indicate that a split should happen on
this instruction. The exit notifier is used to finalize the block
profiler, ensuring all block vector files are closed and the block stats
are printed to stdout when QEMU exits.

Written by Gideon Billings and Aaron Lindsay.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 Makefile.objs  |   1 +
 bbv_profiler.c |  77 +++
 bbv_profiler.h |  35 
 configure  |  19 ++
 include/exec/cpu-defs.h|   6 +
 qemu-options.hx|  10 +
 rules.mak  |   3 +-
 target-arm/helper.c|  57 +
 target-arm/helper.h|   6 +
 target-arm/translate-a64.c |  80 +++
 target-arm/translate.c | 503 -
 vl.c   |  58 ++
 12 files changed, 850 insertions(+), 5 deletions(-)
 create mode 100644 bbv_profiler.c
 create mode 100644 bbv_profiler.h

diff --git a/Makefile.objs b/Makefile.objs
index 28999d3..7da955a 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -2,6 +2,7 @@
 # Common libraries for tools and emulators
 stub-obj-y = stubs/
 util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o
+util-obj-$(CONFIG_BBVEC) += bbv_profiler.o
 
 ###
 # block-obj-y is code used by both qemu system emulation and qemu-img
diff --git a/bbv_profiler.c b/bbv_profiler.c
new file mode 100644
index 000..51e8060
--- /dev/null
+++ b/bbv_profiler.c
@@ -0,0 +1,77 @@
+/*
+ * Copyright (c) 2015 the Linux Foundation.
+ * Written by Gideon Billings and Aaron Lindsay.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see http://www.gnu.org/licenses/.
+ */
+
+#include assert.h
+#include bbv_profiler.h
+
+static BasicBlockTraceHandle trace = NULL;
+static uint32_t mode = 0;
+static uint64_t pid = 0;
+static Notifier exit_notifier;
+
+static void bbtrace_exit_notifier(Notifier *notifier, void *data)
+{
+   assert(trace != NULL);
+
+   // In addition to freeing memory, bbvec_free also closes open files and
+   // prints statistics.
+   bbvec_free(trace);
+}
+
+void bbtrace_init(uint64_t interval_size, const char *filename_base, bool 
trace_userspace_only, bool combine_pids)
+{
+   assert(trace == NULL);
+   trace = bbvec_create(interval_size, filename_base, 
trace_userspace_only, combine_pids);
+}
+
+Notifier* get_bbtrace_exit_notifier(void) {
+   exit_notifier.notify = bbtrace_exit_notifier;
+   return exit_notifier;
+}
+
+int bbtrace_initialized(void) {
+   return (trace != 0);
+}
+
+void bb_process(uint64_t PC, uint64_t IC)
+{
+   assert(trace != NULL);
+
+   bbvec_insert_bb(trace, PC, IC);
+}
+
+void bb_context_check_mode(uint64_t IC, uint32_t new_mode)
+{
+   assert(trace != NULL);
+
+   if (mode != new_mode) {
+   mode = new_mode;
+   bool usrmode = !new_mode;
+   bbvec_mode_change(trace, usrmode, IC);
+   }
+}
+
+void bb_context_check_pid(uint64_t IC, uint64_t new_pid)
+{
+   assert(trace != NULL);
+
+   if (pid != new_pid) {
+   pid = new_pid;
+   bbvec_pid_change(trace, new_pid, IC);
+   }
+}
diff --git a/bbv_profiler.h b/bbv_profiler.h
new file mode 100644
index 000..26dfa1f
--- /dev/null
+++ b/bbv_profiler.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2015 the Linux Foundation.
+ * Written by Gideon Billings and Aaron Lindsay.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see http://www.gnu.org/licenses/.
+ */
+
+#ifndef BBV_PROFILER
+#define BBV_PROFILER
+
+#include bbvec_trace.h
+#include stdbool.h
+#include

[Qemu-devel] [RFC 11/14] Print bbvec stats on 'magic' exceptions

2015-08-05 Thread Christopher Covington

This is necessary because we need a way to differentiate between
instructions executed in a PID by the benchmark we care about and those
executed by CRIU.

Written by Aaron Lindsay.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 bbv_profiler.c  | 15 +++
 bbv_profiler.h  |  1 +
 target-arm/helper-a64.c | 10 ++
 target-arm/helper.c | 12 
 4 files changed, 38 insertions(+)

diff --git a/bbv_profiler.c b/bbv_profiler.c
index 51e8060..66984b2 100644
--- a/bbv_profiler.c
+++ b/bbv_profiler.c
@@ -19,6 +19,12 @@
 #include assert.h
 #include bbv_profiler.h
 
+/* Magic number, which is set as bits 16-31 of the target of a branch which
+ * causes an exception to send a signal to the plugin.
+ */
+#define BBV_MAGIC_NUM 0xdead
+#define BBV_PRINT_STATS 0x0
+
 static BasicBlockTraceHandle trace = NULL;
 static uint32_t mode = 0;
 static uint64_t pid = 0;
@@ -75,3 +81,12 @@ void bb_context_check_pid(uint64_t IC, uint64_t new_pid)
bbvec_pid_change(trace, new_pid, IC);
}
 }
+
+/* Check if the bbv plugin is being signaled to do something by an exception */
+void bb_check_exception(uint64_t pc) {
+   if (((pc  16)  0x) == BBV_MAGIC_NUM) {
+   uint16_t value = pc  0x;
+   if (value == BBV_PRINT_STATS)
+   bbvec_print_stats(trace);
+   }
+}
diff --git a/bbv_profiler.h b/bbv_profiler.h
index 26dfa1f..b922451 100644
--- a/bbv_profiler.h
+++ b/bbv_profiler.h
@@ -31,5 +31,6 @@ int bbtrace_initialized(void);
 void bb_process(uint64_t PC, uint64_t IC);
 void bb_context_check_mode(uint64_t IC, uint32_t mode);
 void bb_context_check_pid(uint64_t IC, uint64_t tpid);
+void bb_check_exception(uint64_t pc);
 
 #endif
diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index e647b90..95eb096 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -27,6 +27,10 @@
 #include qemu/crc32c.h
 #include zlib.h /* For crc32 */
 
+#ifdef CONFIG_BBVEC
+#include bbv_profiler.h
+#endif // CONFIG_BBVEC
+
 /* C2.4.7 Multiply and divide */
 /* special cases for 0 and LLONG_MIN are mandated by the standard */
 uint64_t HELPER(udiv64)(uint64_t num, uint64_t den)
@@ -470,6 +474,12 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
 uint64_t mask;
 #endif
 
+#ifdef CONFIG_BBVEC
+if (bbtrace_initialized()) {
+bb_check_exception(is_a64(env) ? env-pc : env-regs[15]);
+}
+#endif
+
 uint32_t syndrome =
   cs-exception_index == EXCP_ARMV8_HLT ?
 env-exception.syndrome  ~0x :
diff --git a/target-arm/helper.c b/target-arm/helper.c
index c1f4c47..297eb7c 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4773,6 +4773,12 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
 
 arm_log_exception(cs-exception_index);
 
+#ifdef CONFIG_BBVEC
+if (bbtrace_initialized()) {
+bb_check_exception(env-regs[15]);
+}
+#endif
+
 lr = 0xfff1;
 if (env-v7m.current_sp)
 lr |= 4;
@@ -5074,6 +5080,12 @@ void arm_cpu_do_interrupt(CPUState *cs)
 return;
 }
 
+#ifdef CONFIG_BBVEC
+if (bbtrace_initialized()) {
+bb_check_exception(env-regs[15]);
+}
+#endif
+
 /* If this is a debug exception we must update the DBGDSCR.MOE bits */
 switch (env-exception.syndrome  ARM_EL_EC_SHIFT) {
 case EC_BREAKPOINT:
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [RFC 14/14] bbvec: Properly detect conditional thumb2 branching instructions

2015-08-05 Thread Christopher Covington

Add bbvec detection of thumb2 branch instructions via a stripped-down
version of the thumb2 instruction decoding logic. Unfortunately, this
code still called into disas_neon_ls_insn(), which is apparently not
free from side effects. Therefore, calling into this function twice
(once in the added bbvec thumb2 branch detection code and once in the
original decoding code) causes QEMU to report segfaults and illegal
instruction exceptions to the guest kernel (possibly among other
undesirable, still-undiscovered behavior). Because neon instructions
are not allowed to write to the PC, eliminating the call to
disas_neon_ls_insn() altogether is safe.

Written by Aaron Lindsay.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 target-arm/translate.c | 264 -
 1 file changed, 261 insertions(+), 3 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index f9d69ef..8e9b97b 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -9411,6 +9411,260 @@ gen_thumb2_data_op(DisasContext *s, int op, int conds, 
uint32_t shifter_out,
 return 0;
 }
 
+#ifdef CONFIG_BBVEC
+/* Call gen_store_is_jmp(1) if the instruction is a thumb2 branch instruction.
+ * This is called by disas_thumb_insn. The switch/case/if structures are a
+ * simplified copy of the logic in disas_thumb2_insn */
+static void thumb2_branch_detection(CPUARMState *env, DisasContext *s, 
uint16_t insn_hw1)
+{
+uint32_t insn, shift, rd, rn, rs;
+int op;
+
+if (!(arm_dc_feature(s, ARM_FEATURE_THUMB2)
+  || arm_dc_feature(s, ARM_FEATURE_M))) {
+/* Thumb-1 cores may need to treat bl and blx as a pair of
+   16-bit instructions to get correct prefetch abort behavior.  */
+insn = insn_hw1;
+if ((insn  (1  12)) == 0) {
+ARCH(5);
+gen_store_is_jmp(1);
+return;
+}
+if (insn  (1  11)) {
+gen_store_is_jmp(1);
+return;
+}
+if ((s-pc  ~TARGET_PAGE_MASK) == 0) {
+return;
+}
+/* Fall through to 32-bit decode.  */
+}
+
+insn = arm_lduw_code(env, s-pc, s-bswap_code);
+insn |= (uint32_t)insn_hw1  16;
+
+if ((insn  0xf800e800) != 0xf000e800) {
+ARCH(6T2);
+}
+
+rn = (insn  16)  0xf;
+rs = (insn  12)  0xf;
+rd = (insn  8)  0xf;
+switch ((insn  25)  0xf) {
+case 0: case 1: case 2: case 3:
+/* 16-bit instructions.  Should never happen.  */
+return;
+case 4:
+if (insn  (1  22)) {
+/* Other load/store, table branch.  */
+if (insn  0x0120) {
+/* Load/store doubleword.  */
+if (insn  (1  20)) {
+/* ldrd */
+if (rs == 15 || rd == 15)
+gen_store_is_jmp(1);
+}
+} else if ((insn  (1  23)) == 0) {
+/* Load/store exclusive word.  */
+} else if ((insn  (7  5)) == 0) {
+/* Table Branch.  */
+gen_store_is_jmp(1);
+} else {
+int op2 = (insn  6)  0x3;
+op = (insn  4)  0x3;
+switch (op2) {
+case 0:
+return;
+case 1:
+/* Load/store exclusive byte/halfword/doubleword */
+if (op == 2) {
+return;
+}
+ARCH(7);
+break;
+case 2:
+/* Load-acquire/store-release */
+if (op == 3) {
+return;
+}
+/* Fall through */
+case 3:
+/* Load-acquire/store-release exclusive */
+ARCH(8);
+break;
+}
+if (!(op2  1)) {
+if (insn  (1  20)) {
+if (op = 0  op = 2  rs == 15)
+gen_store_is_jmp(1);
+}
+} else if (insn  (1  20)) {
+if (rs == 15 || rd == 15)
+gen_store_is_jmp(1);
+}
+}
+} else {
+/* Load/store multiple, RFE, SRS.  */
+if (((insn  23)  1) == ((insn  24)  1)) {
+/* RFE, SRS: not available in user mode or on M profile */
+if (IS_USER(s) || arm_dc_feature(s, ARM_FEATURE_M)) {
+goto illegal_op;
+}
+if (insn  (1  20)) {
+/* rfe */
+if (insn  (1  21)) {
+/* Base writeback.  */
+if (rn == 15)
+gen_store_is_jmp(1);
+}
+}
+} else {
+int i, loaded_base = 0

Re: [Qemu-devel] Logging number of translated instructions per basic block

2015-08-05 Thread Christopher Covington

Hi Anthony,

On 07/28/2015 05:20 PM, Anthony Carno wrote:
 Hi there,
 
 As the subject of my email suggests, is there a way to log the number of
 translated instructions per basic block?  I've dug into the debug code a bit
 (specifically /target_disas /and the arch-specific /print_insn_*/) and haven't
 found anything that would suggest this is doable, but before I go and move on
 to my plan B (which is devising a plan C haha), I'd like to ask and make sure
 I'm not missing something.  Thank you!

I just posted a rather unruly (but functional) set of patches with the cover
letter title of RFC: ARM Semihosting, PMU, and BBV Changes. These changes
include support for collecting block vectors, which are nothing more than
accumulating instruction counts broken per block (and interval). While I would
guess there are probably much better ways to implement the same functionality,
perhaps it's a starting point.

Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [RFC 02/14] Added semihosting support for A64 in full-system mode

2015-08-05 Thread Christopher Covington

This is for full-system only; not implemented in user mode

Written by Derek Hower.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 include/exec/softmmu-semi.h |  21 ++-
 target-arm/arm-semi.c   | 142 
 target-arm/cpu.h|   3 +-
 target-arm/helper-a64.c |  28 -
 target-arm/helper.c |   1 +
 target-arm/internals.h  |   8 +++
 target-arm/translate-a64.c  |   2 +-
 7 files changed, 174 insertions(+), 31 deletions(-)

diff --git a/include/exec/softmmu-semi.h b/include/exec/softmmu-semi.h
index 8401f7d..9ab8353 100644
--- a/include/exec/softmmu-semi.h
+++ b/include/exec/softmmu-semi.h
@@ -9,6 +9,13 @@
 #ifndef SOFTMMU_SEMI_H
 #define SOFTMMU_SEMI_H 1
 
+static inline uint64_t softmmu_tget64(CPUArchState *env, uint64_t addr)
+{
+uint64_t val;
+
+cpu_memory_rw_debug(ENV_GET_CPU(env), addr, (uint8_t *)val, 8, 0);
+return tswap64(val);
+}
 static inline uint32_t softmmu_tget32(CPUArchState *env, uint32_t addr)
 {
 uint32_t val;
@@ -24,19 +31,27 @@ static inline uint32_t softmmu_tget8(CPUArchState *env, 
uint32_t addr)
 return val;
 }
 
+#define get_user_u64(arg, p) ({ arg = softmmu_tget64(env, p) ; 0; })
 #define get_user_u32(arg, p) ({ arg = softmmu_tget32(env, p) ; 0; })
 #define get_user_u8(arg, p) ({ arg = softmmu_tget8(env, p) ; 0; })
 #define get_user_ual(arg, p) get_user_u32(arg, p)
 
+static inline void softmmu_tput64(CPUArchState *env, uint64_t addr, uint64_t 
val)
+{
+val = tswap64(val);
+cpu_memory_rw_debug(ENV_GET_CPU(env), addr, (uint8_t *)val, 8, 1);
+}
+
 static inline void softmmu_tput32(CPUArchState *env, uint32_t addr, uint32_t 
val)
 {
 val = tswap32(val);
 cpu_memory_rw_debug(ENV_GET_CPU(env), addr, (uint8_t *)val, 4, 1);
 }
+#define put_user_u64(arg, p) ({ softmmu_tput64(env, p, arg) ; 0; })
 #define put_user_u32(arg, p) ({ softmmu_tput32(env, p, arg) ; 0; })
 #define put_user_ual(arg, p) put_user_u32(arg, p)
 
-static void *softmmu_lock_user(CPUArchState *env, uint32_t addr, uint32_t len,
+static void *softmmu_lock_user(CPUArchState *env, target_ulong addr, uint32_t 
len,
int copy)
 {
 uint8_t *p;
@@ -48,11 +63,11 @@ static void *softmmu_lock_user(CPUArchState *env, uint32_t 
addr, uint32_t len,
 return p;
 }
 #define lock_user(type, p, len, copy) softmmu_lock_user(env, p, len, copy)
-static char *softmmu_lock_user_string(CPUArchState *env, uint32_t addr)
+static char *softmmu_lock_user_string(CPUArchState *env, target_ulong addr)
 {
 char *p;
 char *s;
-uint8_t c;
+uint8_t c = 0;
 /* TODO: Make this something that isn't fixed size.  */
 s = p = malloc(1024);
 if (!s) {
diff --git a/target-arm/arm-semi.c b/target-arm/arm-semi.c
index bcc70ec..b89ea8f 100644
--- a/target-arm/arm-semi.c
+++ b/target-arm/arm-semi.c
@@ -4,6 +4,9 @@
  *  Copyright (c) 2005, 2007 CodeSourcery.
  *  Written by Paul Brook.
  *
+ *  Copyright (c) 2015 the Linux Foundation.
+ *  Written by Derek Hower.
+ *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
  *  the Free Software Foundation; either version 2 of the License, or
@@ -140,19 +143,35 @@ static void arm_semi_cb(CPUState *cs, target_ulong ret, 
target_ulong err)
 #else
syscall_err = err;
 #endif
-env-regs[0] = ret;
+if (env-aarch64) {
+  env-xregs[0] = ret;
+} else {
+  env-regs[0] = ret;
+}
 } else {
 /* Fixup syscalls that use nonstardard return conventions.  */
 switch (env-regs[0]) {
 case TARGET_SYS_WRITE:
 case TARGET_SYS_READ:
+  if (env-aarch64) {
+env-xregs[0] = arm_semi_syscall_len - ret;
+  } else {
 env-regs[0] = arm_semi_syscall_len - ret;
+  }
 break;
 case TARGET_SYS_SEEK:
+  if (env-aarch64) {
+env-xregs[0] = 0;
+  } else {
 env-regs[0] = 0;
+  }
 break;
 default:
+  if (env-aarch64) {
+env-xregs[0] = ret;
+  } else {
 env-regs[0] = ret;
+  }
 break;
 }
 }
@@ -165,8 +184,13 @@ static void arm_semi_flen_cb(CPUState *cs, target_ulong 
ret, target_ulong err)
 /* The size is always stored in big-endian order, extract
the value. We assume the size always fit in 32 bits.  */
 uint32_t size;
-cpu_memory_rw_debug(cs, env-regs[13]-64+32, (uint8_t *)size, 4, 0);
-env-regs[0] = be32_to_cpu(size);
+if (env-aarch64) {
+  cpu_memory_rw_debug(cs, env-pc-64+32, (uint8_t *)size, 4, 0);
+  env-xregs[0] = be32_to_cpu(size);
+} else {
+  cpu_memory_rw_debug(cs, env-regs[13]-64+32, (uint8_t *)size, 4, 0);
+  env-regs[0] = be32_to_cpu(size);
+}
 #ifdef CONFIG_USER_ONLY
 ((TaskState *)cs-opaque)-swi_errno = err;
 #else

[Qemu-devel] [RFC 08/14] Add instruction-counting infrastructure to target-arm

2015-08-05 Thread Christopher Covington

This (partially) divorces counting instructions from basic block
collection so instructions can be counted without the bbv plugin being
enabled.

Written by Aaron Lindsay.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 include/exec/cpu-defs.h|  2 ++
 target-arm/helper.c| 43 +++
 target-arm/helper.h|  2 ++
 target-arm/translate-a64.c | 18 ++
 target-arm/translate.c | 18 ++
 5 files changed, 67 insertions(+), 16 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 1eed930..9af080f 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -136,6 +136,8 @@ typedef struct CPUIOTLBEntry {
 CPU_COMMON_TLB  \
 /* Instruction Count (for profiling) */
\
 uint64_t prof_ic;   \
+/* How much of prof_ic's value have we processed? */\
+uint64_t prof_ic_last;  \
 /* PC (for profiling) */\
 uint64_t prof_pc;   \
 /* next page start (for profiling) */   \
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 1843ec5..be3ad01 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -673,6 +673,15 @@ static inline bool arm_ccnt_enabled(CPUARMState *env)
 return true;
 }
 
+/* Called by anything that wants to be an input for event counts to the PMU
+ * (except for SWINC, event 0x000, since its events can target specific
+ * counters)
+ */
+static void pmevcntr_increment(CPUARMState *env, uint8_t event_type,
+   uint64_t increment_by)
+{
+}
+
 void pmccntr_sync(CPUARMState *env)
 {
 uint64_t temp_ticks;
@@ -4060,6 +4069,40 @@ void HELPER(context_check_pid)(CPUARMState *env)
 
 bb_context_check_pid(env-prof_ic, pid);
 }
+
+void HELPER(update_instruction_count)(CPUARMState *env)
+{
+if (bbtrace_initialized()) {
+  /*
+   * If the bbv plugin is compiled in and enabled, we must account for the
+   * fact that bbv_profile needs to see prof_ic before we clear it.
+   * However, it doesn't always clear the counter every time this gets
+   * called, so we must keep track of the last value seen to ensure we
+   * update the instruction counter correctly in that case.
+   */
+increment_instruction_counters(env-prof_ic - env-prof_ic_last);
+if (env-prof_pc  env-prof_is_jmp) {
+// If this is the end of a basic block, zero out last_seen counter 
too
+env-prof_ic_last = 0;
+} else {
+env-prof_ic_last = env-prof_ic;
+}
+} else {
+pmevcntr_increment(env, PMU_COUNTER_TYPE_INSTRUCTIONS, env-prof_ic);
+pmevcntr_increment(env, PMU_COUNTER_TYPE_CYCLES, env-prof_ic);
+env-prof_ic = 0;
+}
+}
+
+#else //!CONFIG_BBVEC
+
+void HELPER(update_instruction_count)(CPUARMState *env)
+{
+pmevcntr_increment(env, PMU_COUNTER_TYPE_INSTRUCTIONS, env-prof_ic);
+pmevcntr_increment(env, PMU_COUNTER_TYPE_CYCLES, env-prof_ic);
+env-prof_ic = 0;
+}
+
 #endif //CONFIG_BBVEC
 
 /* Sign/zero extend */
diff --git a/target-arm/helper.h b/target-arm/helper.h
index 41291c9..02edf3a 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -43,6 +43,8 @@ DEF_HELPER_1(context_check_mode, void, env)
 DEF_HELPER_1(context_check_pid, void, env)
 #endif // CONFIG_BBVEC
 
+DEF_HELPER_1(update_instruction_count, void, env)
+
 DEF_HELPER_3(ssat, i32, env, i32, i32)
 DEF_HELPER_3(usat, i32, env, i32, i32)
 DEF_HELPER_3(ssat16, i32, env, i32, i32)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 80b27ed..f6f8832 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -55,9 +55,9 @@ static TCGv_i64 cpu_exclusive_high;
  * keep track of the current pc, so the last pc in the block can be
  * captured.   */
 static TCGv_i64 cpu_prof_pc;
-static TCGv_i64 cpu_prof_ic;
 static TCGv_i64 cpu_prof_is_jmp;
 #endif // CONFIG_BBVEC
+static TCGv_i64 cpu_prof_ic;
 #ifdef CONFIG_USER_ONLY
 static TCGv_i64 cpu_exclusive_test;
 static TCGv_i32 cpu_exclusive_info;
@@ -124,10 +124,11 @@ void a64_translate_init(void)
 
 #ifdef CONFIG_BBVEC
 // bbvec profiling globals
-cpu_prof_ic = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUARMState, 
prof_ic), prof_ic);
 cpu_prof_pc = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUARMState, 
prof_pc), prof_pc);
 cpu_prof_is_jmp = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUARMState, 
prof_is_jmp), prof_is_jmp);
 #endif // CONFIG_BBVEC
+cpu_prof_ic = tcg_global_mem_new_i64(TCG_AREG0,
+offsetof(CPUARMState, prof_ic), prof_ic);
 
 cpu_exclusive_addr = tcg_global_mem_new_i64(TCG_AREG0,
 offsetof

[Qemu-devel] [RFC 01/14] Make unknown semihosting calls non-fatal

2015-08-05 Thread Christopher Covington

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 target-arm/arm-semi.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target-arm/arm-semi.c b/target-arm/arm-semi.c
index a8b83e6..bcc70ec 100644
--- a/target-arm/arm-semi.c
+++ b/target-arm/arm-semi.c
@@ -186,8 +186,6 @@ static void arm_semi_flen_cb(CPUState *cs, target_ulong 
ret, target_ulong err)
 #define SET_ARG(n, val) put_user_ual(val, args + (n) * 4)
 uint32_t do_arm_semihosting(CPUARMState *env)
 {
-ARMCPU *cpu = arm_env_get_cpu(env);
-CPUState *cs = CPU(cpu);
 target_ulong args;
 target_ulong arg0, arg1, arg2, arg3;
 char * s;
@@ -195,6 +193,8 @@ uint32_t do_arm_semihosting(CPUARMState *env)
 uint32_t ret;
 uint32_t len;
 #ifdef CONFIG_USER_ONLY
+ARMCPU *cpu = arm_env_get_cpu(env);
+CPUState *cs = CPU(cpu);
 TaskState *ts = cs-opaque;
 #else
 CPUARMState *ts = env;
@@ -562,7 +562,6 @@ uint32_t do_arm_semihosting(CPUARMState *env)
 exit(ret);
 default:
 fprintf(stderr, qemu: Unsupported SemiHosting SWI 0x%02x\n, nr);
-cpu_dump_state(cs, stderr, fprintf, 0);
-abort();
+return -1;
 }
 }
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [RFC 09/14] Implement remaining PMU functionality

2015-08-05 Thread Christopher Covington

This adds logic to increment PMEVCNTR's based on different event inputs,
implements all remaining CP registers, and triggers an interrupt on
event overflow.

Written by Aaron Lindsay.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 target-arm/cpu-qom.h |   2 +
 target-arm/cpu.c |   2 +
 target-arm/cpu.h |  37 ++--
 target-arm/cpu64.c   |   2 +
 target-arm/helper.c  | 538 ++-
 5 files changed, 425 insertions(+), 156 deletions(-)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index bb6722f..2a0f3f4 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -136,6 +136,8 @@ typedef struct ARMCPU {
 uint32_t id_pfr0;
 uint32_t id_pfr1;
 uint32_t id_dfr0;
+uint32_t pmceid0;
+uint32_t pmceid1;
 uint32_t id_afr0;
 uint32_t id_mmfr0;
 uint32_t id_mmfr1;
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 87d0772..6a728d9 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -938,6 +938,8 @@ static void cortex_a15_initfn(Object *obj)
 cpu-id_pfr0 = 0x1131;
 cpu-id_pfr1 = 0x00011011;
 cpu-id_dfr0 = 0x02010555;
+cpu-pmceid0 = 0x0481; /* PMUv3 events 0x0, 0x8, and 0x11 */
+cpu-pmceid1 = 0x;
 cpu-id_afr0 = 0x;
 cpu-id_mmfr0 = 0x10201105;
 cpu-id_mmfr1 = 0x2000;
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 44084a5..f6857fa 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -112,11 +112,22 @@ typedef struct ARMGenericTimer {
 uint64_t ctl; /* Timer Control register */
 } ARMGenericTimer;
 
+/* Indices into ARMCPU.ppi_outputs (and .gt_timer for timers) */
 #define NUM_GTIMERS 2
 #define GTIMER_PHYS 0
 #define GTIMER_VIRT 1
 #define PMU_IDX 2
 
+enum pmu_counter_type {
+PMU_COUNTER_TYPE_SWINC = 0x000,
+PMU_COUNTER_TYPE_INSTRUCTIONS = 0x008,
+PMU_COUNTER_TYPE_CYCLES = 0x011
+};
+
+/* Performance monitor event counter state */
+#define NUM_PMU_COUNTERS 4 /* 0-30, inclusive, doesn't count cycle counter */
+#define PMU_COUNTER_MASK 0x800F /* Mask of bits allowed for 
PMINTEN{SET|CLR}*/
+
 typedef struct {
 uint64_t raw_tcr;
 uint32_t mask;
@@ -287,12 +298,13 @@ typedef struct CPUARMState {
 };
 uint32_t c9_insn; /* Cache lockdown registers.  */
 uint32_t c9_data;
-uint64_t c9_pmcr; /* performance monitor control register */
-uint64_t c9_pmcnten; /* perf monitor counter enables */
+uint32_t c9_pmcr; /* performance monitor control register */
+uint64_t c9_pmccntr;
+uint32_t c9_pmcnten; /* perf monitor counter enables */
 uint32_t c9_pmovsr; /* perf monitor overflow status */
-uint32_t c9_pmxevtyper; /* perf monitor event type */
 uint32_t c9_pmuserenr; /* perf monitor user enable */
 uint32_t c9_pminten; /* perf monitor interrupt enables */
+uint32_t c9_pmselr; /* perf monitor event counter selection */
 union { /* Memory attribute redirection */
 struct {
 #ifdef HOST_WORDS_BIGENDIAN
@@ -354,6 +366,9 @@ typedef struct CPUARMState {
 uint64_t tpidruro_ns;
 uint64_t tpidrro_el[1];
 };
+uint32_t c14_pmccfiltr; /* Performance Monitor Filter Register */
+uint32_t c14_pmevcntr[NUM_PMU_COUNTERS];
+uint32_t c14_pmevtyper[NUM_PMU_COUNTERS];
 uint64_t c14_cntfrq; /* Counter Frequency register */
 uint64_t c14_cntkctl; /* Timer Control register */
 ARMGenericTimer c14_timer[NUM_GTIMERS];
@@ -371,11 +386,6 @@ typedef struct CPUARMState {
 uint64_t dbgwvr[16]; /* watchpoint value registers */
 uint64_t dbgwcr[16]; /* watchpoint control registers */
 uint64_t mdscr_el1;
-/* If the counter is enabled, this stores the last time the counter
- * was reset. Otherwise it stores the counter value
- */
-uint64_t c15_ccnt;
-uint64_t pmccfiltr_el0; /* Performance Monitor Filter Register */
 } cp15;
 
 struct {
@@ -508,17 +518,6 @@ int cpu_arm_signal_handler(int host_signum, void *pinfo,
 int arm_cpu_handle_mmu_fault(CPUState *cpu, vaddr address, int rw,
  int mmu_idx);
 
-/**
- * pmccntr_sync
- * @env: CPUARMState
- *
- * Synchronises the counter in the PMCCNTR. This must always be called twice,
- * once before any action that might affect the timer and again afterwards.
- * The function is used to swap the state of the register if required.
- * This only happens when not in user mode (!CONFIG_USER_ONLY)
- */
-void pmccntr_sync(CPUARMState *env);
-
 /* SCTLR bit meanings. Several bits have been reused in newer
  * versions of the architecture; in that case we define constants
  * for both old and new bit meanings. Code which tests against those
diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c
index 270bc2f..b2a3a74 100644
--- a/target-arm/cpu64.c
+++ b/target-arm/cpu64.c
@@ -132,6 +132,8 @@ static void aarch64_a57_initfn

[Qemu-devel] [RFC 13/14] Enable negative icount values for QEMU.

2015-08-05 Thread Christopher Covington

The icount setting specifies how far to shift the instruction
count as a ratio of ns to tie system time to instruction count.
Allow a negative value (i.e. a right shift instead of a left shift)
to be used.

Written by Pat Galizia.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 cpus.c  | 17 +
 target-arm/helper.c | 14 ++
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/cpus.c b/cpus.c
index 62d157a..1c92a85 100644
--- a/cpus.c
+++ b/cpus.c
@@ -110,6 +110,8 @@ static int64_t vm_clock_warp_start = -1;
 static int icount_time_shift;
 /* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
 #define MAX_ICOUNT_SHIFT 10
+/* In cases where we have a negative icount shift, specify a lower bound. */
+#define MIN_ICOUNT_SHIFT -5
 
 static QEMUTimer *icount_rt_timer;
 static QEMUTimer *icount_vm_timer;
@@ -174,6 +176,8 @@ int64_t cpu_get_icount(void)
 
 int64_t cpu_icount_to_ns(int64_t icount)
 {
+if (icount_time_shift  0)
+return icount  (-icount_time_shift);
 return icount  icount_time_shift;
 }
 
@@ -271,6 +275,7 @@ static void icount_adjust(void)
 int64_t cur_time;
 int64_t cur_icount;
 int64_t delta;
+int64_t icount_shifted;
 
 /* Protected by TimersState mutex.  */
 static int64_t last_delta;
@@ -287,8 +292,8 @@ static void icount_adjust(void)
 delta = cur_icount - cur_time;
 /* FIXME: This is a very crude algorithm, somewhat prone to oscillation.  
*/
 if (delta  0
- last_delta + ICOUNT_WOBBLE  delta * 2
- icount_time_shift  0) {
+ last_delta + ICOUNT_WOBBLE  delta * 2
+ icount_time_shift  MIN_ICOUNT_SHIFT) {
 /* The guest is getting too far ahead.  Slow time down.  */
 icount_time_shift--;
 }
@@ -299,8 +304,10 @@ static void icount_adjust(void)
 icount_time_shift++;
 }
 last_delta = delta;
-timers_state.qemu_icount_bias = cur_icount
-  - (timers_state.qemu_icount  
icount_time_shift);
+icount_shifted = (icount_time_shift = 0) ?
+timers_state.qemu_icount  icount_time_shift :
+timers_state.qemu_icount  (-icount_time_shift);
+timers_state.qemu_icount_bias = cur_icount - icount_shifted;
 seqlock_write_unlock(timers_state.vm_clock_seqlock);
 }
 
@@ -321,6 +328,8 @@ static void icount_adjust_vm(void *opaque)
 
 static int64_t qemu_icount_round(int64_t count)
 {
+if (icount_time_shift  0)
+return count;
 return (count + (1  icount_time_shift) - 1)  icount_time_shift;
 }
 
diff --git a/target-arm/helper.c b/target-arm/helper.c
index ae0a4ac..0436df5 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4345,6 +4345,11 @@ void context_check_pid(CPUARMState *env)
 
 void update_instruction_count(CPUARMState *env)
 {
+CPUState *cpu = CPU(arm_env_get_cpu(env));
+int prevIoState = cpu-can_do_io;
+if (use_icount)
+cpu-can_do_io = 1;
+
 if (bbtrace_initialized()) {
 /*
  * If the bbv plugin is compiled in and enabled, we must account for 
the
@@ -4368,15 +4373,24 @@ void update_instruction_count(CPUARMState *env)
 pmevcntr_increment(env, PMU_COUNTER_TYPE_CYCLES, env-prof_ic);
 env-prof_ic = 0;
 }
+
+if (use_icount)
+cpu-can_do_io = prevIoState;
 }
 
 #else //!CONFIG_BBVEC
 
 void update_instruction_count(CPUARMState *env)
 {
+CPUState *cpu = CPU(arm_env_get_cpu(env));
+int prevIoState = cpu-can_do_io;
+if (use_icount)
+cpu-can_do_io = 1;
 pmevcntr_increment(env, PMU_COUNTER_TYPE_INSTRUCTIONS, env-prof_ic);
 pmevcntr_increment(env, PMU_COUNTER_TYPE_CYCLES, env-prof_ic);
 env-prof_ic = 0;
+if (use_icount)
+cpu-can_do_io = prevIoState;
 }
 
 #endif //CONFIG_BBVEC
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [RFC 03/14] Fix makefile

2015-08-05 Thread Christopher Covington

Fixes https://bugs.launchpad.net/qemu/+bug/1319493/

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 Makefile | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 93af871..fcc6314 100644
--- a/Makefile
+++ b/Makefile
@@ -408,7 +408,11 @@ ifneq ($(CONFIG_MODULES),)
done
 endif
 ifneq ($(HELPERS-y),)
-   $(call install-prog,$(HELPERS-y),$(DESTDIR)$(libexecdir))
+   $(INSTALL_DIR) $(DESTDIR)$(libexecdir)
+   $(INSTALL_PROG) $(HELPERS-y) $(DESTDIR)$(libexecdir)
+ifneq ($(STRIP),)
+   $(STRIP) $(addprefix $(DESTDIR)$(libexecdir)/,$(notdir $(HELPERS-y)))
+endif
 endif
 ifneq ($(BLOBS),)
set -e; for x in $(BLOBS); do \
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [RFC 04/14] Modify load exclusive/store exclusive to use physical addresses with the monitor

2015-08-05 Thread Christopher Covington

Written by Derek Hower.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 target-arm/helper-a64.h|  2 ++
 target-arm/helper.c| 22 ++
 target-arm/translate-a64.c | 25 +++--
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index 1d3d10f..a713d29 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -46,3 +46,5 @@ DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
+
+DEF_HELPER_3(get_phys_addr64, i64, env, i64, i32)
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 4491b05..be564b2 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -24,6 +24,28 @@ static inline int get_phys_addr(CPUARMState *env, 
target_ulong address,
 #define PMCRE   0x1
 #endif
 
+#ifdef TARGET_AARCH64
+
+uint64_t HELPER(get_phys_addr64)(CPUARMState *env,
+ uint64_t vaddr, uint32_t memidx)
+{
+#ifdef CONFIG_USER_ONLY
+  return vaddr;
+#else
+  hwaddr phys_addr;
+  int prot;   // ignored
+  target_ulong page_size; // ignored
+  MemTxAttrs attrs = {};  // ignored
+
+  // we just want the address from this function and don't care about faults.
+  // therefore, we always assume the operation is a load
+  get_phys_addr(env, vaddr, 0, memidx == 0, phys_addr, attrs, prot, 
page_size);
+  return phys_addr;
+#endif
+}
+
+#endif
+
 static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
 {
 int nregs;
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 14a501c..20d1d3c 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1683,7 +1683,17 @@ static void gen_load_exclusive(DisasContext *s, int rt, 
int rt2,
 tcg_gen_mov_i64(cpu_reg(s, rt), tmp);
 
 tcg_temp_free_i64(tmp);
-tcg_gen_mov_i64(cpu_exclusive_addr, addr);
+
+// the monitor must be set on the physical address
+// we've already read the address at this point, so we know
+// the translation won't fault
+TCGv_i64 physaddr = tcg_temp_new_i64();
+TCGv_i32 idx = tcg_temp_new_i32();
+tcg_gen_movi_i32(idx, get_mem_index(s));
+gen_helper_get_phys_addr64(physaddr, cpu_env, addr, idx);
+tcg_gen_mov_i64(cpu_exclusive_addr, physaddr);
+tcg_temp_free_i64(physaddr);
+tcg_temp_free_i32(idx);
 }
 
 #ifdef CONFIG_USER_ONLY
@@ -1720,13 +1730,24 @@ static void gen_store_exclusive(DisasContext *s, int 
rd, int rt, int rt2,
  * basic block ends at the branch insn.
  */
 tcg_gen_mov_i64(addr, inaddr);
-tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_exclusive_addr, fail_label);
 
 tmp = tcg_temp_new_i64();
 tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), MO_TE + size);
 tcg_gen_brcond_i64(TCG_COND_NE, tmp, cpu_exclusive_val, fail_label);
 tcg_temp_free_i64(tmp);
 
+// the monitor must be checked on the physical address.
+// We've alredy loaded this address, so we don't need to check for
+// a fault condition
+TCGv_i64 physaddr = tcg_temp_new_i64();
+TCGv_i32 idx = tcg_temp_new_i32();
+tcg_gen_movi_i32(idx, get_mem_index(s));
+gen_helper_get_phys_addr64(physaddr, cpu_env, addr, idx);
+
+tcg_gen_brcond_i64(TCG_COND_NE, physaddr, cpu_exclusive_addr, fail_label);
+tcg_temp_free_i64(physaddr);
+tcg_temp_free_i32(idx);
+
 if (is_pair) {
 TCGv_i64 addrhi = tcg_temp_new_i64();
 TCGv_i64 tmphi = tcg_temp_new_i64();
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[Qemu-devel] [RFC 05/14] Fixed TLB invalidate ops.

2015-08-05 Thread Christopher Covington

Prior to this patch, QEMU was only invalidating the TLB for the local
processor on a TLB flush event, causing unstable behavoir in smp
mode. This patch corrects the behavoir so that all TLBs are
invalidated across the system.

Written by Derek Hower.

Signed-off-by: Christopher Covington c...@codeaurora.org
---
 target-arm/helper.c | 68 +++--
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index be564b2..ff3c8f7 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -355,23 +355,35 @@ void init_cpreg_list(ARMCPU *cpu)
 g_list_free(keys);
 }
 
-static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t 
value)
+static void arm_tlb_flush(CPUARMState *env, int flush_global)
 {
-ARMCPU *cpu = arm_env_get_cpu(env);
+CPUState* cpu;
+CPU_FOREACH(cpu) {
+  tlb_flush(cpu, flush_global);
+}
+}
 
+static void arm_tlb_flush_page(CPUARMState *env, target_ulong addr)
+{
+CPUState* cpu;
+CPU_FOREACH(cpu) {
+  tlb_flush_page(cpu, addr);
+}
+}
+
+static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t 
value)
+{
 raw_write(env, ri, value);
-tlb_flush(CPU(cpu), 1); /* Flush TLB as domain not tracked in TLB */
+arm_tlb_flush(env, 1);/* Flush TLB as domain not tracked in TLB */
 }
 
 static void fcse_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t 
value)
 {
-ARMCPU *cpu = arm_env_get_cpu(env);
-
 if (raw_read(env, ri) != value) {
 /* Unlike real hardware the qemu TLB uses virtual addresses,
  * not modified virtual addresses, so this causes a TLB flush.
  */
-tlb_flush(CPU(cpu), 1);
+arm_tlb_flush(env, 1);
 raw_write(env, ri, value);
 }
 }
@@ -379,15 +391,13 @@ static void fcse_write(CPUARMState *env, const 
ARMCPRegInfo *ri, uint64_t value)
 static void contextidr_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
-ARMCPU *cpu = arm_env_get_cpu(env);
-
 if (raw_read(env, ri) != value  !arm_feature(env, ARM_FEATURE_MPU)
  !extended_addresses_enabled(env)) {
 /* For VMSA (when not using the LPAE long descriptor page table
  * format) this register includes the ASID, so do a TLB flush.
  * For PMSA it is purely a process ID and no action is needed.
  */
-tlb_flush(CPU(cpu), 1);
+arm_tlb_flush(env, 1);
 }
 raw_write(env, ri, value);
 }
@@ -396,36 +406,28 @@ static void tlbiall_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
   uint64_t value)
 {
 /* Invalidate all (TLBIALL) */
-ARMCPU *cpu = arm_env_get_cpu(env);
-
-tlb_flush(CPU(cpu), 1);
+arm_tlb_flush(env, 1);
 }
 
 static void tlbimva_write(CPUARMState *env, const ARMCPRegInfo *ri,
   uint64_t value)
 {
 /* Invalidate single TLB entry by MVA and ASID (TLBIMVA) */
-ARMCPU *cpu = arm_env_get_cpu(env);
-
-tlb_flush_page(CPU(cpu), value  TARGET_PAGE_MASK);
+arm_tlb_flush(env, value  TARGET_PAGE_MASK);
 }
 
 static void tlbiasid_write(CPUARMState *env, const ARMCPRegInfo *ri,
uint64_t value)
 {
 /* Invalidate by ASID (TLBIASID) */
-ARMCPU *cpu = arm_env_get_cpu(env);
-
-tlb_flush(CPU(cpu), value == 0);
+arm_tlb_flush(env, value == 0);
 }
 
 static void tlbimvaa_write(CPUARMState *env, const ARMCPRegInfo *ri,
uint64_t value)
 {
 /* Invalidate single entry by MVA, all ASIDs (TLBIMVAA) */
-ARMCPU *cpu = arm_env_get_cpu(env);
-
-tlb_flush_page(CPU(cpu), value  TARGET_PAGE_MASK);
+arm_tlb_flush_page(env, value  TARGET_PAGE_MASK);
 }
 
 /* IS variants of TLB operations must affect all cores */
@@ -1792,13 +1794,11 @@ static void vmsa_ttbcr_raw_write(CPUARMState *env, 
const ARMCPRegInfo *ri,
 static void vmsa_ttbcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
-ARMCPU *cpu = arm_env_get_cpu(env);
-
 if (arm_feature(env, ARM_FEATURE_LPAE)) {
 /* With LPAE the TTBCR could result in a change of ASID
  * via the TTBCR.A1 bit, so do a TLB flush.
  */
-tlb_flush(CPU(cpu), 1);
+arm_tlb_flush(env, 1);
 }
 vmsa_ttbcr_raw_write(env, ri, value);
 }
@@ -1818,11 +1818,10 @@ static void vmsa_ttbcr_reset(CPUARMState *env, const 
ARMCPRegInfo *ri)
 static void vmsa_tcr_el1_write(CPUARMState *env, const ARMCPRegInfo *ri,
uint64_t value)
 {
-ARMCPU *cpu = arm_env_get_cpu(env);
 TCR *tcr = raw_ptr(env, ri);
 
 /* For AArch64 the A1 bit could result in a change of ASID, so TLB flush. 
*/
-tlb_flush(CPU(cpu), 1);
+arm_tlb_flush(env, 1);
 tcr-raw_tcr = value;
 }
 
@@ -1833,9 +1832,7 @@ static void vmsa_ttbr_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
  * must flush the TLB

[Qemu-devel] RFC: ARM Semihosting, PMU, and BBV Changes

2015-08-05 Thread Christopher Covington

Hi,

This series is a jumble of changes that I have found useful for
creating samples of long-running applications. I do not expect any of
these patches to be merged upstream as-is but I'm publishing them as a
way to ask for high-level feedback, as there may very well be much
better ways to achieve the same goal. These changes are based on
commit 1fd1a2cc182d4e331acd92c2cdbb52b3e78c040e.

While the patches are in chronological order, their functionality can
roughly be put in the following categories:

A) Functional fixes
B) Guest-emulator/host communication
C) Instrumentation/profiling capabilities

A) Patches 3, 4, 5, and 13 can perhaps be categorized as functional
fixes. They may be fixed on the current tip. As a write this, I'm
thinking this category may be the best one to target for initial
upstreaming.

B) Patches 1, 2, and 11 implement communications mechanisms. We have
used Angel semihosting and magic exceptions to perform various kinds
of guest to emulator and guest to host communication. Since these
patches were originally developed, I've been able to reduce or remove
our need for them, but if anyone has suggestions on better ways to not
need to communicate in the first place or use existing mechanisms to
communicate, that'd be appreciated. As an example, we previously used
semihosting open(), read(), and write() calls for host filesystem
access, but have been able to replace those by mounting a VirtIO-MMIO
9P filesystem. Another example is using poweroff, which ends up making
a PSCI call, to end the run, rather than a custom executable that
makes a semihosting exit call.

C) The instrumentation implemented in patches 6, 7, 8, 9, 10, 12, and
14 allow us to measure instruction counts and block vectors, with
the primary application of running the offline SimPoint algorithm [1]
on the collected block vectors and dumping application level
checkpoints using CRIU [2] in a second pass.

Thanks,
Christopher Covington

1. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.4012
2. http://criu.org/Main_Page

Re: [Qemu-devel] Using the one disk image file on 2 virtual machines at the same time

2015-07-31 Thread Christopher Covington

On 07/29/2015 01:46 PM, John Snow wrote:
 
 
 On 07/29/2015 01:29 PM, Manjong Han wrote:
 Thanks, Stefan.

 2015-07-29 17:46 GMT+09:00 Stefan Hajnoczi stefa...@gmail.com:

 You should probably use qcow2 backing files instead:

   10G.qcow2 -- vm001.qcow2
 ^-- vm002.qcow2

 The command to create these files is:

   qemu-img create -f qcow2 -o backing_file=10G.qcow2 vm001.qcow2.

 Both VMs share the data in 10G.qcow2.  All writes go to vm001.qcow2 or
 vm002.qcow2, respectively, so they don't corrupt each other.


 I tried to create a backing files, using the commands which you told.

 $ qemu-img create -f qcow2 -o backing_file=10G.qcow2 vm001.qcow2
 $ qemu-img create -f qcow2 -o backing_file=10G.qcow2 vm002.qcow2

 And, I used these backing files on each virtual machines.
 But, new files weren't written on original disk image(10G.qcow2)..
 The backing files were working each other.

 Standard file systems (ext4, xfs) and volume managers (LVM) are not
 cluster-aware by default.  They must only be accessed from one machine
 at a time.  Otherwise you risk data corruption.


 I think that I must probably use a shared file system like NFS..

 
 Yes, any files written using the backing files like outlined above will
 put new files in the overlays (e.g. vm001.qcow2 or vm002.qcow2) and NOT
 into the backing file (10G.qcow2)
 
 this is a safe way to share a base image for an OS, but it's not a
 method of accomplishing a concurrent fileshare.
 
 You'll want to configure an NFS or CIFS share (etc) in the base image
 and then allow the multiple VMs to utilize that share.

Using a VirtIO-9P based passthrough filesystem on both might produce the same
result with less host-side configuration required.

http://wiki.qemu.org/Documentation/9psetup

Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [RFC] Towards an Heterogeneous QEMU

2015-07-31 Thread Christopher Covington

 of the rest?

Thanks,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] User space vs kernel space instructions distribution.

2015-07-22 Thread Christopher Covington

On 07/14/2015 04:45 AM, Peter Maydell wrote:
 On 14 July 2015 at 09:32, Shlomo Pongratz shlomopongr...@gmail.com wrote:
 Hi,

 I'm running aarm64 QEMU and I'm counting the number of instructions which
 belong to user space vs kernel space. My measurements shows that 99
 percent of instructions are in kernel space.

How are you counting? Instrumenting QEMU?

 I've used both the address of the instructions and the EL just to be sure. I
 also added an option to disable block chaining just to make sure that all
 the instructions in every TB  is counted.
 When examining some kernel's instructions against the objdump of the kernel
 I've noticed that most of them are in interrupts/timers area.

 Does this make sense?
 Did someone also encountered this phenomenon?
 
 Depends entirely on your workload, obviously. If the system only
 boots then most instructions will be in kernel space. If the system
 is only sitting idle then it'll just be executing the kernel space
 idle loop. If you're measuring solely the section of time where
 a userspace program is doing real work with the CPU and you're
 still seeing a 99% figure then the obvious conclusion would be that
 your measurement approach is wrong...
 
 If your measurement instrumentation is intrusive and is significantly
 slowing down QEMU then you'll naturally find that the guest spends
 more time in timer interrupt handling, because the timer interrupts
 come in in real time, and you've just effectively reduced the speed
 of your CPU, so it can get less useful work done between timer
 interrupts.

I find such behavior undesirable. As best I understand, -icount exists to
provide an alternative, although it may have bugs. I've tinkered with -icount
a little, but I have yet to come up with useful tests to verify its correct
behavior. If anyone has suggestions for how to test it, I'd be eager to hear.

Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PATCH pic32 2/7] Stop simulation when processor is suspended forever by WAIT instruction with interrupts disabled.

2015-07-09 Thread Christopher Covington

On 06/30/2015 09:57 PM, Serge Vakulenko wrote:
 Hi Peter and Leon,
 
 With a bit of thinking, I agree, that the question of session
 termination on WAIT instruction is quite complicated in case of
 multi-core system, background i/o activity, mipsR6 core etc. So I'm
 going to find another solution for the task. What I essentially want
 here is to stop the simulator when the target Unix system is halted,
 like:
 
 $ /usr/local/qemu-mips/bin/qemu-system-mipsel -M pic32mx7-max32
 -nographic -monitor none -serial stdio -bios boot-max32.hex -kernel
 unix.hex -sd sdcard.img
 Board: chipKIT Max32
 Processor: M4K
 RAM size: 128 kbytes
 Load file: 'boot-max32.hex', 6720 bytes
 Load file: 'unix.hex', 144992 bytes
 Card0 image 'sdcard.img', 102888 kbytes
 [...]
 2.11 BSD UNIX (pic32) (console)
 
 login: root
 Password:
 Welcome to RetroBSD!
 erase, kill ^U, intr ^C
 # halt
 killing processes... done
 syncing disks... done
 halted
 $ _   -- QEMU terminated
 
 On BSD, the halt command uses reboot(RB_HALT) system call to terminate
 the operating system. It essentially results in an endless loop on
 wait instruction with interrupts disabled., like for(;;) {
 asm(wait); }. For pic32 it makes little sense to continue
 simulation in this case.
 
 Fortunately, I've found a solution which does not require modification
 of generic code. Everything can be done in the platform-specific part.

Why doesn't the OS do more than busy-loop in halt()? For example poke a
memory-mapped register, or make a firmware or semihosting call?

Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] Monitoring write to memory

2015-07-09 Thread Christopher Covington

On 07/01/2015 08:23 AM, Jun Koi wrote:
 Hello,
 
 I am trying to monitor all the memory writing events inside Qemu by
 instrumenting tcg_gen_qemu_st8, tcg_gen_qemu_st16, tcg_gen_qemu_st32,
 tcg_gen_qemu_st64, as followings:
 
 
 // in tcg-op.h
 
 void helper_checkmem(int64_t data, int64_t address);  // this is declared
 elsewhere
 
 static inline void tcg_gen_qemu_st8(struct uc_struct *uc, TCGv arg, TCGv addr,
 int mem_index)
 {
 #if TARGET_LONG_BITS == 32
 TCGArg args[2] = { GET_TCGV_I32(arg), GET_TCGV_I32(addr) };
 #else
 TCGArg args[2] = { GET_TCGV_I64(arg), GET_TCGV_I64(addr) };
 #endif
 tcg_gen_callN(tcg_ctx, helper_checkmem, dh_retvar_void, 2, args);
 tcg_gen_qemu_st_tl(uc, arg, addr, mem_index, MO_UB);
 }
 
 
 However, when I compile, helper_checkmem() is never called at runtime when
 memory is accessed.
 What is wrong with my code?

What guest code are you running? Does it have a strb instruction? If you're
able to code a simple bare metal test case you should be able to debug qemu
with gdb to see what code paths are taken and why the ones you anticipated 
aren't.

Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] Asking about QEMU's process in memory address space of host

2015-07-09 Thread Christopher Covington

On 07/08/2015 07:14 AM, Piyawath Boukom wrote:
 Dear peoples in mailing-list,
 
 My name is Piyawath Boukom, I’m a student from Tokyo Tech.
 I hope you can enlighten me about this.
 
 Below are things that those I would like to perform.
 
 - I want to identify where guest kernel lives in QEMU’s process on host 
 machine, is it possible to find a set of logical address space of guest 
 kernel on host machine ? (which part of memory space is being used by guest 
 kernel)
 
 - If I can specify where QEMU’s process lives in host memory address space 
 then I know where stack, heap, data, text, etc. of QEMU live. So, can I 
 determine where guest machine’s physical memory lives ? (in host logical 
 address)
 
 - If above are possible to do, can I write those things into a file ? (ex. 
 data in guest machine’s memory, binary data, etc.)
 
 *Host and guest are Linux.

The savevm monitor command dumps pretty much all the state to a file,
although you have to use QCOW2 format disk images.

https://en.wikibooks.org/wiki/QEMU/Monitor#savevm

Once the snapshot/checkpoint has been saved, you can dig into it to get
specific memory contents. You'll have to write your own scripts for this as
far as I know.

For kernel virtual addresses, check the guest's dmesg. `dmesg | sed -nr
'/Virtual kernel memory/,/\[/ p'` does that for me on AArch64.

If there's a particular user space process you're interested in, run `cat
/proc/$PID/maps` in the guest immediately before or after saving the
checkpoint to get its virtual addresses.

Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [Qemu-devel] [PULL 1/8] target-sh4: use bit number for SR constants

2015-06-02 Thread Christopher Covington

Hi Aurelien,

On 06/01/2015 05:29 PM, Aurelien Jarno wrote:
 Use the bit number for SR constants instead of using a bit mask. This
 make possible to also use the constants for shifts.
 
 Reviewed-by: Richard Henderson r...@twiddle.net
 Signed-off-by: Aurelien Jarno aurel...@aurel32.net
 ---
  target-sh4/cpu.c   |  3 +-
  target-sh4/cpu.h   | 30 ++--
  target-sh4/gdbstub.c   |  4 +--
  target-sh4/helper.c| 27 +-
  target-sh4/op_helper.c | 26 -
  target-sh4/translate.c | 75 
 ++
  6 files changed, 85 insertions(+), 80 deletions(-)
 
 diff --git a/target-sh4/cpu.c b/target-sh4/cpu.c
 index d187a2b..cccb14f 100644
 --- a/target-sh4/cpu.c
 +++ b/target-sh4/cpu.c
 @@ -61,7 +61,8 @@ static void superh_cpu_reset(CPUState *s)
  env-fpscr = FPSCR_PR; /* value for userspace according to the kernel */
  set_float_rounding_mode(float_round_nearest_even, env-fp_status); /* 
 ?! */
  #else
 -env-sr = SR_MD | SR_RB | SR_BL | SR_I3 | SR_I2 | SR_I1 | SR_I0;
 +env-sr = (1u  SR_MD) | (1u  SR_RB) | (1u  SR_BL) |
 +  (1u  SR_I3) | (1u  SR_I2) | (1u  SR_I1) | (1u  SR_I0);

I like using the BIT() macro for this kind of thing.

Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

1 2 >

1 - 100 of 181 matches

Mail list logo