[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-27 Thread Giacomo Travaglini via gem5-users
Hi Niko,

Have a look at the following patch:

https://gem5-review.googlesource.com/c/public/gem5/+/38095

This should solve your problem.

Kind Regards

Giacomo


[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-25 Thread Giacomo Travaglini via gem5-users


> -----Original Message-----
> From: POLYCHRONOU Nikolaos
> Sent: 25 November 2020 12:22
> To: Giacomo Travaglini; gem5 users mailing list
> Subject: RE: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5
> on Linux
>
> Hello,
> To conclude: when I instantiate the arm_o3 model, my scripts work perfectly.
> After spending some days on this, I am almost convinced that the HPI model
> in the MinorCPU does not increment the cycle counter, as CPU_STATE_ON is
> never used.
> In the O3 model it is used. I will try to spend some time to see why, and
> also try your suggestion.
> Kind regards,
> Nikolaos

Thanks for spotting this! This is a real bug in gem5: the cycle counters are
not updated in the MinorCPU model.
I will post a patch to fix this and will keep you posted.

Kind Regards

Giacomo


[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-25 Thread POLYCHRONOU Nikolaos via gem5-users
Hello,
To conclude: when I instantiate the arm_o3 model, my scripts work perfectly.
After spending some days on this, I am almost convinced that the HPI model in
the MinorCPU does not increment the cycle counter, as CPU_STATE_ON is never
used.
In the O3 model it is used. I will try to spend some time to see why, and also
try your suggestion.
Kind regards,
Nikolaos


[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-25 Thread Giacomo Travaglini via gem5-users


> -----Original Message-----
> From: POLYCHRONOU Nikolaos
> Sent: 20 November 2020 10:26
> To: gem5 users mailing list; Giacomo Travaglini
> Subject: RE: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5
> on Linux
>
> [...]
>
> As you can see, the CPU cycles are never incremented, while the implemented
> events in the PMU return values.
> This test is done without loading any module or touching anything. I just
> boot and run this from a script.
> So from this test, EL0 access is not required, but the ccnt is still zero.
> I believe either I am doing something wrong when I create my addPMUs (I don't
> think so, because it doesn't seem relevant as the other events increase) or
> the counter is not hooked up.
> Thank you
>

The cycle counter should be enabled by default.
Could you rerun your application with a breakpoint on
BaseCPU::updateCycleCounters and check what happens when ppActiveCycles is
notified?

It would be helpful if you could check whether the cycle counter value is
incremented correctly.
A print would also work, but I strongly recommend using gdb to understand what
is going on.

Kind Regards

Giacomo



[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-20 Thread POLYCHRONOU Nikolaos via gem5-users
Here are some examples:

------------------------------------------------------------
perf stat -e armv8_pmuv3/cpu_cycles/ ./dijkstra_small input.dat

 Performance counter stats for './dijkstra_small input.dat':

             0  armv8_pmuv3/cpu_cycles/

   0.012288848 seconds time elapsed
------------------------------------------------------------
perf stat -e armv8_pmuv3/br_immed_retired/,cpu-cycles,cache-misses,branch-misses,armv8_pmuv3/st_retired/,armv8_pmuv3/st_retired/,instructions ./dijkstra_small input.dat

 Performance counter stats for './dijkstra_small input.dat':

             0  armv8_pmuv3/br_immed_retired/
             0  cpu-cycles
             0  cache-misses
        244128  branch-misses
             0  armv8_pmuv3/st_retired/
             0  armv8_pmuv3/st_retired/
             0  instructions

   0.011671384 seconds time elapsed
------------------------------------------------------------
 Performance counter stats for './example/build/armv8/release/bin/example -s 200 -n 10 -x 1 -z 10':

        108323  branch-misses
             0  cache-misses                    #  0.000 % of all cache refs
             0  cache-references
             0  cycles
             0  instructions
             0  L1-dcache-load-misses           #  0.00% of all L1-dcache hits
             0  L1-dcache-load-misses           #  0.00% of all L1-dcache hits
             0  L1-dcache-loads
             0  L1-dcache-store-misses
             0  L1-dcache-stores
             0  L1-icache-load-misses
        358406  branch-load-misses
             0  L1-icache-load-misses
        358406  branch-load-misses
       3974014  branch-loads
             0  dTLB-load-misses
             0  iTLB-load-misses
             0  armv8_pmuv3/br_immed_retired/
        358406  armv8_pmuv3/br_mis_pred/
       3974014  armv8_pmuv3/br_pred/
       2670637  armv8_pmuv3/br_retired/
             0  armv8_pmuv3/br_return_retired/
             0  armv8_pmuv3/cpu_cycles/
             0  armv8_pmuv3/inst_retired/
             0  armv8_pmuv3/inst_spec/
             0  armv8_pmuv3/l1d_cache/
             0  armv8_pmuv3/l1d_cache_refill/
             0  armv8_pmuv3/l1d_tlb_refill/
             0  armv8_pmuv3/l1i_cache_refill/
             0  armv8_pmuv3/l1i_tlb_refill/
             0  armv8_pmuv3/l2d_cache/
             0  armv8_pmuv3/l2d_cache_refill/
             0  armv8_pmuv3/l2d_cache_wb/
    2147483647  armv8_pmuv3/ld_retired/
    5634549932  armv8_pmuv3/mem_access/         (76.27%)
    8997366789  armv8_pmuv3/st_retired/         (47.74%)
   10917948112  armv8_pmuv3/sw_incr/            (19.67%)

   0.014571150 seconds time elapsed
------------------------------------------------------------

As you can see, the CPU cycles are never incremented, while the implemented
events in the PMU return values.
This test is done without loading any module or touching anything. I just boot
and run this from a script.
So from this test, EL0 access is not required, but the ccnt is still zero. I
believe either I am doing something wrong when I create my addPMUs (I don't
think so, because it doesn't seem relevant as the other events increase) or
the counter is not hooked up.
Thank you
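
When comparing runs like the ones above, it can help to list the counters that
stay at zero automatically. The following is only a minimal sketch in Python,
assuming the plain-text layout that perf stat printed in this thread (a count
followed by an event name); the helper names parse_perf_stat and stuck_at_zero
are made up for this illustration and are not part of perf or gem5.

```python
import re

# Matches counter lines such as "    244128  branch-misses".
# The "seconds time elapsed" line is skipped because its number has a dot.
COUNTER_LINE = re.compile(r"^\s*([\d,]+)\s+(\S+)")


def parse_perf_stat(text):
    """Return (event_name, count) pairs, in order, from perf stat text."""
    counters = []
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match is None:
            continue
        count = int(match.group(1).replace(",", ""))
        counters.append((match.group(2), count))
    return counters


def stuck_at_zero(counters):
    """Return the distinct event names whose every reading is zero."""
    nonzero = {name for name, count in counters if count != 0}
    names = []
    for name, _ in counters:
        if name not in nonzero and name not in names:
            names.append(name)
    return names


# Sample copied from the second run reported in this thread.
SAMPLE = """\
 Performance counter stats for './dijkstra_small input.dat':

             0  armv8_pmuv3/br_immed_retired/
             0  cpu-cycles
             0  cache-misses
        244128  branch-misses
             0  armv8_pmuv3/st_retired/
             0  armv8_pmuv3/st_retired/
             0  instructions

   0.011671384 seconds time elapsed
"""

if __name__ == "__main__":
    for name in stuck_at_zero(parse_perf_stat(SAMPLE)):
        print("always zero:", name)
```

A zero reading alone does not say whether the event is unimplemented or simply
never notified; it only narrows down which counters to inspect.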



[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-19 Thread POLYCHRONOU Nikolaos via gem5-users
Hello Giacomo,
Thank you for your help.
After doing what you proposed, I tested both perf and reading the counters
directly. The event counters work, but the cycle counter is still always zero.
I double-checked my implementation against known ones from the literature, and
I also read the gem5 console output.
I see that gem5 enables the cycle counter and the event counters. I checked the
configuration with the PMUVerbose flag and it is correct, and I am able to read
all of the counters. I tried this both without the module that enables EL0
access to the registers and with the module loaded via insmod.
One possibility is that my module does not work, so the cycle counter returns
zero (usually, if userspace access is not enabled by a module, the read results
in an illegal instruction), while the event counters can still be read because
some register is not implemented yet in the configuration, which leaves them
accessible. Another possibility is that the DBGEN and NIDEN signals are zero,
but I don't see why gem5 would disable them (if they are disabled, reads of
these counters always return zero).
Since I am trying to run the libflush library in gem5 on ARMv8, I need the
ccnt to work. I also could not find a solution for the cycle counter on the
internet. I have seen reports of PMU problems, i.e. the PMU ccnt returning
zero, but the solution is not documented. In another example the branch misses
are counted, which I am also able to read.
Thank you


[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-19 Thread Giacomo Travaglini via gem5-users


> -----Original Message-----
> From: POLYCHRONOU Nikolaos
> Sent: 19 November 2020 12:04
> To: gem5 users mailing list; Giacomo Travaglini
> Subject: RE: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5
> on Linux
>
> Hello Giacomo,
>
> So apparently it works when I don't pass the --dtb option. Quite funny, I
> thought this option was mandatory.
> I will now test whether my scripts run.
> I would like some guidance on how to implement the following:
> > > Have you defined a probe point in the icache/bpred?
> > > You should instantiate a pmuProbePoint in those classes; those will
> > > automatically notify the PMU probe listener if wired correctly.
> I would appreciate some guidance, even just basic steps to get started.
>

I have never done it myself, but it involves adding probe points in the C++
world.
I suggest you have a look at a single example, like the probe point for
branch predictor misses.

Just grep for pmuProbePoint("Misses") or ppMisses in gem5 and see how the
variable is used.

Then try to replicate it for your use case. Run a simple workload and check
whether the counter is correctly updated.

Once you get to the point where things work, feel free to contribute your
addition to the develop branch if you want, so that other people can benefit
from your success.

Kind Regards

Giacomo
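
To make the probe point idea above more concrete, here is a toy model of the
pattern in Python. It is a sketch only and is not gem5 code: the real probe
points live in gem5's C++ sources, and every class and method name below
(ProbePoint, ToyPMU, ToyCore, attach, run) is hypothetical. It shows a producer
notifying a probe point, a PMU-like listener accumulating the count, and a
probe that is never notified leaving its counter at zero.

```python
class ProbePoint:
    """Toy probe point: forwards each notification to its listeners."""

    def __init__(self, name):
        self.name = name
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def notify(self, amount=1):
        for callback in self._listeners:
            callback(amount)


class ToyPMU:
    """Toy PMU: one counter per attached probe point."""

    def __init__(self):
        self.counters = {}

    def attach(self, event_name, probe):
        self.counters[event_name] = 0

        def on_notify(amount, name=event_name):
            self.counters[name] += amount

        probe.add_listener(on_notify)


class ToyCore:
    """Toy CPU model exposing two probe points."""

    def __init__(self):
        self.pp_misses = ProbePoint("Misses")
        self.pp_cycles = ProbePoint("ActiveCycles")

    def run(self, cycles, mispredicts, notify_cycles=True):
        # A model that forgets to notify the cycle probe leaves the
        # listener's cycle counter untouched.
        if notify_cycles:
            self.pp_cycles.notify(cycles)
        for _ in range(mispredicts):
            self.pp_misses.notify()


if __name__ == "__main__":
    core = ToyCore()
    pmu = ToyPMU()
    pmu.attach("br_mis_pred", core.pp_misses)
    pmu.attach("cpu_cycles", core.pp_cycles)
    core.run(cycles=100, mispredicts=7, notify_cycles=False)
    print(pmu.counters)
```

In this toy the wiring is correct in both cases; only the missing notify call
keeps the cycle counter at zero while the branch counter advances.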


[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-19 Thread POLYCHRONOU Nikolaos via gem5-users
Hello Giacomo,

So apparently it works when I don't pass the --dtb option. Quite funny, I
thought this option was mandatory.
I will now test whether my scripts run.
I would like some guidance on how to implement the following:
> > Have you defined a probe point in the icache/bpred?
> > You should instantiate a pmuProbePoint in those classes; those will
> > automatically notify the PMU probe listener if wired correctly.
I would appreciate some guidance, even just basic steps to get started.


[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-19 Thread POLYCHRONOU Nikolaos via gem5-users
Okay, I will check this out.
Are the probes also added in addPMUs?
For example, I see this in the docstring of addPMUs:

> :type ints: List[int]
> :param events: Additional events to be measured by the PMUs
> :type events: List[Union[ProbeEvent, SoftwareIncrement]]
> """

And

> for ev in events:
>     isa.pmu.addEvent(ev)

Is this where I need to hook up the probes, or do I need to go to the
SimObjects?

Another question: is it possible to link to the counters of stats.txt and
use them somehow in the C program instead of using the PMU counters?

Many thanks
Niko


-----Original Message-----
From: Giacomo Travaglini
Sent: Thursday, November 19, 2020 12:22 PM
To: POLYCHRONOU Nikolaos; gem5 users mailing list
Subject: RE: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on
Linux

> -----Original Message-----
> From: POLYCHRONOU Nikolaos
> Sent: 19 November 2020 11:03
> To: Giacomo Travaglini; gem5 users mailing list
> Subject: RE: [gem5-users] Re: Using perf_event with the ARM PMU inside
> gem5 on Linux
>
> For the DTB I use the following: armv8_gem5_v1_1cpu.dtb
>

That's the problem. If you have a look at the DTS sources, or if you decompile
the DTB, you will see there is no PMU entry. You have to either add it manually
yourself, or rely on DTB autogeneration.

The latter is the preferred approach: simply run starter_fs.py *without* the
--dtb option.

> Fot the  configs/example/arm/devices.py? i do the following modifications:
>def addPMUs(self, ints, events=[]):
> """
> Instantiates 1 ArmPMU per PE. The method is accepting a list of
> interrupt numbers (ints) used by the PMU and a list of events to
> register in it.
>
> :param ints: List of interrupt numbers. The code will iterate over
> the cpu list in order and will assign to every cpu in the cluster
> a PMU with the matching interrupt.
> :type ints: List[int]
> :param events: Additional events to be measured by the PMUs
> :type events: List[Union[ProbeEvent, SoftwareIncrement]]
> """
> assert len(ints) == len(self.cpus)
> for cpu, pint in zip(self.cpus, ints):
> int_cls = ArmPPI if pint < 32 else ArmSPI
> for isa in cpu.isa:
> isa.pmu = ArmPMU(interrupt=int_cls(num=pint))
> isa.pmu.addArchEvents(cpu=cpu, itb=cpu.itb, dtb=cpu.dtb,
>   icache=getattr(cpu, 'icache', None),
>   dcache=getattr(cpu, 'dcache', None),
>   l2cache=getattr(self, 'l2', None))
> for ev in events:
> isa.pmu.addEvent(ev) and in the def SImpleSystem
>  def AddCaches
> +   ints = [20 for i in range (self._num_cpus)]
> # connect each cluster to the memory hierarchy
> for cluster in self._clusters:
> cluster.addPMUs(ints)
> cluster.connectMemSide(cluster_mem_bus)
>
>
> I am not sure this is the correct way to instantiate the pmu in the 
> devices.py but is seems to be working for what i want to do-- READ 
> the registers with assemply Can you suggest me anothe rmore clever way 
> if you have one?
>
> Also when i load the module to enable user el0 access to the pmu it 
> seems the module is not working so yes maybe it sisnt recognized.
>
> The PMU is enabled in the kernel config. i check this so many times 
> but i am not able to see the message in the boot that they are enabled 
> as i saw in some posts.
> Verified again now. PMU is instantiated in the kernel config
>
> ____________
> From: Giacomo Travaglini [giacomo.travagl...@arm.com]
> Sent: 19 November 2020 11:41
> To: POLYCHRONOU Nikolaos; gem5 users mailing list
> Subject: RE: [gem5-users] Re: Using perf_event with the ARM PMU inside 
> gem5 on Linux
>
> > -Original Message-
> > From: POLYCHRONOU Nikolaos 
> > Sent: 19 November 2020 10:25
> > To: gem5 users mailing list 
> > Cc: Giacomo Travaglini 
> > Subject: RE: [gem5-users] Re: Using perf_event with the ARM PMU 
> > inside
> gem5
> > on Linux
> >
> > Hello and thank you for your answer, Yes I write assemply language 
> > to instantiate the counters. I don't bother with perf even if I 
> > tried to access the cycle counter but the file descriptor didnt 
> > open.
> >   static int perf_fd_cpu_cycles;
> >   static struct perf_event_attr attr_cpu_cycles;
> >   attr_cpu_cycles.size 

[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-19 Thread Giacomo Travaglini via gem5-users



> -Original Message-
> From: POLYCHRONOU Nikolaos 
> Sent: 19 November 2020 11:03
> To: Giacomo Travaglini ; gem5 users mailing
> list 
> Subject: RE: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5
> on Linux
>
> For the dtb i use the following armv8_gem5_v1_1cpu.dtb
>

That's the problem. If you have a look at the DTS sources, or if you decompile
the DTB, you will see there is no PMU entry. You have to either add it manually
yourself, or rely on DTB autogeneration.

The latter is the preferred approach: simply run starter_fs.py *without* the
--dtb option.
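
As a rough way to check the point above, one could decompile the DTB (for example with `dtc -I dtb -O dts`) and scan the text for a PMU node. The sketch below is only an illustration: the helper name `has_pmu_node` and the sample DTS fragments are made up, and it assumes the PMU node uses the `arm,armv8-pmuv3` compatible string, in line with the `armv8_pmuv3` driver name seen elsewhere in this thread.

```python
import re

# Assumption: an ARMv8 PMU node advertises this compatible string.
PMU_COMPATIBLE = "arm,armv8-pmuv3"


def has_pmu_node(dts_text):
    """Return True if any 'compatible' property in the DTS text names the PMU."""
    for match in re.finditer(r'compatible\s*=\s*([^;]*);', dts_text):
        # A compatible property may list several quoted strings.
        if PMU_COMPATIBLE in re.findall(r'"([^"]*)"', match.group(1)):
            return True
    return False


# Hypothetical DTS fragments, for illustration only.
with_pmu = 'pmu { compatible = "arm,armv8-pmuv3"; };'
without_pmu = 'timer { compatible = "arm,armv8-timer"; };'

print(has_pmu_node(with_pmu))     # True
print(has_pmu_node(without_pmu))  # False
```

If the check fails on a prebuilt DTB, that would be consistent with the diagnosis above, and dropping --dtb so that gem5 autogenerates the tree is the simpler fix.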


[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-19 Thread POLYCHRONOU Nikolaos via gem5-users
For the DTB I use the following: armv8_gem5_v1_1cpu.dtb

For configs/example/arm/devices.py, I made the following modifications:

    def addPMUs(self, ints, events=[]):
        """
        Instantiates 1 ArmPMU per PE. The method is accepting a list of
        interrupt numbers (ints) used by the PMU and a list of events to
        register in it.

        :param ints: List of interrupt numbers. The code will iterate over
            the cpu list in order and will assign to every cpu in the cluster
            a PMU with the matching interrupt.
        :type ints: List[int]
        :param events: Additional events to be measured by the PMUs
        :type events: List[Union[ProbeEvent, SoftwareIncrement]]
        """
        assert len(ints) == len(self.cpus)
        for cpu, pint in zip(self.cpus, ints):
            int_cls = ArmPPI if pint < 32 else ArmSPI
            for isa in cpu.isa:
                isa.pmu = ArmPMU(interrupt=int_cls(num=pint))
                isa.pmu.addArchEvents(cpu=cpu, itb=cpu.itb, dtb=cpu.dtb,
                                      icache=getattr(cpu, 'icache', None),
                                      dcache=getattr(cpu, 'dcache', None),
                                      l2cache=getattr(self, 'l2', None))
                for ev in events:
                    isa.pmu.addEvent(ev)

and in SimpleSystem.addCaches:

+       ints = [20 for i in range(self._num_cpus)]
        # connect each cluster to the memory hierarchy
        for cluster in self._clusters:
            cluster.addPMUs(ints)
            cluster.connectMemSide(cluster_mem_bus)


I am not sure this is the correct way to instantiate the PMU in devices.py,
but it seems to be working for what I want to do: read the registers with
assembly.
Can you suggest a cleverer way if you have one?

Also, when I load the module to enable user (EL0) access to the PMU, it seems the
module is not working, so yes, maybe it isn't recognized.

The PMU is enabled in the kernel config. I checked this many times, but I am
not able to see the boot message saying it is enabled, as I saw in some
posts.
Verified again now: the PMU is enabled in the kernel config.



[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-19 Thread Giacomo Travaglini via gem5-users



> -Original Message-
> From: POLYCHRONOU Nikolaos 
> Sent: 19 November 2020 10:25
> To: gem5 users mailing list 
> Cc: Giacomo Travaglini 
> Subject: RE: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5
> on Linux
>
> Hello and thank you for your answer,
> Yes, I write assembly to set up the counters. I don't bother with
> perf, although I did try to access the cycle counter, but the file descriptor
> didn't open.
>   static int perf_fd_cpu_cycles;
>   static struct perf_event_attr attr_cpu_cycles;
>   attr_cpu_cycles.size = sizeof(attr_cpu_cycles);
>   attr_cpu_cycles.exclude_kernel = 1;
>   attr_cpu_cycles.exclude_hv = 1;
>   attr_cpu_cycles.exclude_callchain_kernel = 1;
>   attr_cpu_cycles.type = PERF_TYPE_RAW;
>   attr_cpu_cycles.config = 0x11;
>
>   /* Open the file descriptor corresponding to this counter. The counter
>   should start at this moment. */
>   if ((perf_fd_cpu_cycles = syscall(__NR_perf_event_open, &attr_cpu_cycles, 0,
>                                     -1, -1, 0)) == -1)
>       fprintf(stderr, "perf_event_open fail %d %d: %s\n",
>               perf_fd_cpu_cycles, errno, strerror(errno));
>
> The above code is an example I used from the posts I attached, but with the
> cycle counter.
>

If you cannot open the file descriptor, it means something is wrong: the
Linux kernel doesn't recognise a PMU.

This could be caused by one of the following reasons:

1) You are not instantiating a PMU in the gem5 platform. Are you using the
addPMUs helper in configs/example/arm/devices.py?
2) Which DTB are you using? Are you relying on a prebuilt DTB or on
autogeneration? You should check whether your DTB contains a valid PMU entry.
3) Is the PMU driver enabled in your kernel config?

https://community.arm.com/developer/ip-products/system/b/embedded-blog/posts/using-the-arm-performance-monitor-unit-pmu-linux-driver
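
For point 3 in the list above, a small script can scan the kernel .config used to build the image. This is only a sketch: the helper name `missing_pmu_options` and the sample config text are hypothetical, and the set of options checked (CONFIG_PERF_EVENTS, CONFIG_HW_PERF_EVENTS, CONFIG_ARM_PMU) is an assumption about what the ARM PMU perf driver needs, so it is worth comparing against the article linked above.

```python
# Assumption: these options together enable the ARM PMU perf driver.
REQUIRED_OPTIONS = ("CONFIG_PERF_EVENTS", "CONFIG_HW_PERF_EVENTS", "CONFIG_ARM_PMU")


def missing_pmu_options(config_text):
    """Return the required options that are not set to 'y' or 'm'."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in REQUIRED_OPTIONS if opt not in enabled]


# Hypothetical .config excerpt, for illustration only.
sample = """
CONFIG_PERF_EVENTS=y
# CONFIG_ARM_PMU is not set
CONFIG_HW_PERF_EVENTS=y
"""

print(missing_pmu_options(sample))  # ['CONFIG_ARM_PMU']
```

An empty list only says the options are set; the boot log still has to show that the driver actually probed a PMU.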

>
> The assembly I use to set up the cycle counter (CCNT) is the one provided in
> the Armageddon/libflush library. I will take a look at the one you linked.
>
> It seems all the event counters differ from the ones in
> stats.txt.
> But what I do is
> m5 resetstats
> Run code (instantiate pmus -> code -> read pmus)
> m5 dumpstats

I understand, but which events in particular? Anyway, as it seems the PMU
is not correctly configured, I would defer
the handling of this problem until we are sure we are doing things right.

In any case, you should expect a small mismatch:

m5 resetstats <- gem5 starts counting here
Run code (instantiate pmus -> code -> read pmus) <- PMU starts
counting here
m5 dumpstats

>
> So in a way I see why there is a difference. I plan to include the 
> asm_volatile of
> these instructions in my C code and see again.
>
> Also for the probe points I can search it a little bit.
> Regards
> Nikos
>
>

[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-19 Thread POLYCHRONOU Nikolaos via gem5-users
Hello and thank you for your answer,
Yes, I write assembly to set up the counters. I don't bother with
perf, although I did try to access the cycle counter, but the file descriptor
didn't open.
  static int perf_fd_cpu_cycles;
  static struct perf_event_attr attr_cpu_cycles;
  attr_cpu_cycles.size = sizeof(attr_cpu_cycles);
  attr_cpu_cycles.exclude_kernel = 1;
  attr_cpu_cycles.exclude_hv = 1;
  attr_cpu_cycles.exclude_callchain_kernel = 1;
  attr_cpu_cycles.type = PERF_TYPE_RAW;
  attr_cpu_cycles.config = 0x11;

  /* Open the file descriptor corresponding to this counter. The counter
  should start at this moment. */
  if ((perf_fd_cpu_cycles = syscall(__NR_perf_event_open, &attr_cpu_cycles, 0,
                                    -1, -1, 0)) == -1)
      fprintf(stderr, "perf_event_open fail %d %d: %s\n", perf_fd_cpu_cycles,
              errno, strerror(errno));
The above code is an example I used from the posts I attached, but with the
cycle counter.


The assembly I use to set up the cycle counter (CCNT) is the one provided in the
Armageddon/libflush library. I will take a look at the one you linked.

It seems all the event counters differ from the ones in stats.txt.
But what I do is
m5 resetstats
Run code (instantiate pmus -> code -> read pmus)
m5 dumpstats

So in a way I see why there is a difference. I plan to include these
instructions as asm volatile in my C code and check again.

As for the probe points, I will look into them a bit.
Regards
Nikos



[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-19 Thread Giacomo Travaglini via gem5-users
Hi Nikolaos

> -Original Message-
> From: POLYCHRONOU Nikolaos via gem5-users 
> Sent: 18 November 2020 07:20
> To: gem5-users@gem5.org
> Cc: POLYCHRONOU Nikolaos 
> Subject: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on
> Linux
>
> Hello.
>
> I encounter the following problem when I try to run starter_fs.py in
> aarch64.
>
> Following these posts:
> https://www.mail-archive.com/gem5-users@gem5.org/msg18401.html
> and
> https://stackoverflow.com/questions/63988672/using-perf-event-with-the-arm-pmu-inside-gem5
>
> I followed the steps to apply the patch and changes, and also instantiated the
> PMUs. I wrote a program in the image to access the registers directly.
>
That's great, I can see Pierre documented his work really well. Which branch
are you using?
Just so you know, all required patches are now merged into develop and will be
part of the next release (gem5v21).

When you say you are accessing the registers directly, do you mean you are
adding some inline assembly to manually initialize the PMU? Or are you relying
on the perf_events API (which is basically a syscall)?

> I manage to read all the events except the cycle counter, which always returns
> 0.
> I tried to read the cycle counter by instantiating a PMU event counter with 0x11,
> but, as with reading from the CCNT, it didn't work either.
>

If you are manually accessing the PMU via inline assembly (MSR/MRS), it might
be that you are not correctly initializing the cycle counter.

The following article explains how to access/initialize the PMU either manually
or via perf_event_open:

http://zhiyisun.github.io/2016/03/02/How-to-Use-Performance-Monitor-Unit-(PMU)-of-64-bit-ARMv8-A-in-Linux.html

You can debug what is going on by:

- Checking gem5 warnings in stdout/stderr
- Using the PMUVerbose debug flag
- Using gdb and putting a breakpoint on any PMU read/write to understand what is
going on.
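
To make the second debugging option above concrete, here is a sketch that assembles a gem5 command line with the PMUVerbose flag turned on. The helper name `gem5_debug_command` is made up, and the binary path `build/ARM/gem5.opt` is an assumption about a typical build tree; only `--debug-flags`, PMUVerbose and starter_fs.py come from the discussion and from gem5 itself.

```python
def gem5_debug_command(gem5_binary, config_script, debug_flags, extra_args=()):
    """Build an argv list; gem5's own options go before the config script."""
    cmd = [gem5_binary]
    if debug_flags:
        cmd.append("--debug-flags=" + ",".join(debug_flags))
    cmd.append(config_script)
    cmd.extend(extra_args)
    return cmd


# Paths are assumptions about a typical gem5 checkout.
cmd = gem5_debug_command(
    "build/ARM/gem5.opt",
    "configs/example/arm/starter_fs.py",
    ["PMUVerbose"],
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run; the PMU register reads and writes then show up in the gem5 output alongside the usual warnings.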

> How does gem5 increment this counter? Are the steps to read it the same as on a
> real platform, or does the simulator have a different configuration?
>
> Also, the values obtained by reading the counters are not exactly the same as the
> m5 resetstats - m5 dumpstats values. I guess maybe these two are syscalls, as I read.
>

Which stats / event counters are different?

> Another question is how to add the other events that are not implemented. I
> tried the following in ArmPMU.py, but it didn't work: although I see
> the events created in the console, they return 0.
>
> self.addEvent(ProbeEvent(self,0x01,icache,"L1I_CACHE_REFILL"))
>
> self.addEvent(ProbeEvent(self,0x0D, bpred, "BR_IMMED_RETIRED"))
>
> Do I need to add some counters in the component? Or are they
> implemented and I am doing something wrong?
>
>

Have you defined a probe point in the icache/bpred?
You should instantiate a pmuProbePoint in those classes; those will 
automatically notify the PMU probe listener if
wired correctly.
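
The behaviour described above (events that are created but always read 0) can be pictured with a toy model. Everything in the sketch below is hypothetical: `ToyProbePoint` and `ToyCounter` are plain Python stand-ins, not gem5 classes, and the real probe points live in the simulated components. It only shows that a counter changes when something actually notifies the point it listens to.

```python
class ToyProbePoint:
    """Toy stand-in for a probe point; NOT the gem5 class."""

    def __init__(self, name):
        self.name = name
        self.listeners = []

    def connect(self, listener):
        self.listeners.append(listener)

    def notify(self, amount=1):
        # The component calls this each time the event happens.
        for listener in self.listeners:
            listener(amount)


class ToyCounter:
    """Toy stand-in for a PMU event counter."""

    def __init__(self, event_id):
        self.event_id = event_id
        self.value = 0

    def add(self, amount):
        self.value += amount


refill_point = ToyProbePoint("L1I_CACHE_REFILL")
refills = ToyCounter(0x01)
refill_point.connect(refills.add)

# Registered as an event, but no component ever notifies it.
never_fired = ToyCounter(0x0D)

for _ in range(3):
    refill_point.notify()

print(refills.value)      # 3
print(never_fired.value)  # 0
```

In this picture, adding a ProbeEvent on the Python side corresponds to creating the counter; the missing piece is the component-side notification.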

>
> Really want some guidance.
>
> Thank you
>



[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-11-17 Thread POLYCHRONOU Nikolaos via gem5-users
Hello.
I encounter the following problem when I try to run starter_fs.py in
aarch64.
Following these posts:
https://www.mail-archive.com/gem5-users@gem5.org/msg18401.html and
https://stackoverflow.com/questions/63988672/using-perf-event-with-the-arm-pmu-inside-gem5
I followed the steps to apply the patch and changes, and also instantiated the PMUs.
I wrote a program in the image to access the registers directly.
I manage to read all the events except the cycle counter, which always returns 0.
I tried to read the cycle counter by instantiating a PMU event counter with 0x11,
but, as with reading from the CCNT, it didn't work either.
How does gem5 increment this counter? Are the steps to read it the same as on a
real platform, or does the simulator have a different configuration?
Also, the values obtained by reading the counters are not exactly the same as the
m5 resetstats - m5 dumpstats values. I guess maybe these two are syscalls, as I read.
Another question is how to add the other events that are not implemented. I
tried the following in ArmPMU.py, but it didn't work: although I see
the events created in the console, they return 0.
self.addEvent(ProbeEvent(self,0x01,icache,"L1I_CACHE_REFILL"))

self.addEvent(ProbeEvent(self,0x0D, bpred, "BR_IMMED_RETIRED"))

Do I need to add some counters in the component? Or are they implemented
and I am doing something wrong?

Really want some guidance.
Thank you










Nikolaos Foivos POLYCHRONOU
PhD Student - Security of Embedded Systems/IoT/IIoT
Département DSYS / LSOSP

17, rue des martyrs | 38000 Grenoble
Fix work . +33 4 38 78 19 58
Office Bat. 4022 - P. 221
nikolaos.polychro...@cea.fr

LETI, technology research institute
Commissariat à l'énergie atomique et aux énergies alternatives
www.leti.fr  | LETI is a member of the Carnot Institutes 
network








[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-09-30 Thread Giacomo Travaglini via gem5-users
This is great Pierre!

I recommend the following documentation, http://www.gem5.org/contributing,
which will guide you through the contributing process. If you have any other
questions, don’t hesitate to ask.

About the issue you are encountering, I am not that familiar with perf_events;
I think you can debug what is going on
during PMU probing via the PMUVerbose flag in gem5 (it will tell you which
register is being read/written) or via gdb.

Can I ask you to open a ticket in our bug tracker for this anyway?
https://gem5.atlassian.net/secure/BrowseProjects.jspa

Many thanks

Giacomo

From: Pierre Ayoub 
Sent: 30 September 2020 12:37
To: Giacomo Travaglini 
Cc: gem5-users 
Subject: Re: Using perf_event with the ARM PMU inside gem5 on Linux

Hi Giacomo,

Many thanks. This time, it works fine and I feel that I really understand how
the DTB, the GIC and the gem5 code interact together! After correctly declaring
the PMU in the DTB like you did, we get this confirmation at boot time that
the Linux kernel sees it correctly:
[0.239967] hw perfevents: enabled with armv8_pmuv3 PMU driver, 32 counters
available
Just one thing. On my real ARM hardware, I used perf_event with the
PERF_TYPE_HARDWARE event type. It doesn't work like this on my gem5
simulated system: perf_event was not able to establish a correspondence between
gem5 events and architectural events, even though the event numbers are the
same and correspond to the ARMv8 specification. I don't know the reason. Thus,
the workaround is to use the PERF_TYPE_RAW event type with the event IDs
declared in the ArmPMU.py file, directly in our C source code.

I will see how to send patches and learn how to use gerrit. Thanks for your 
help.

Best,
Pierre


From: "Giacomo Travaglini" <giacomo.travagl...@arm.com>
To: "gem5-users" <gem5-users@gem5.org>
Cc: "Pierre Ayoub" <pierre.ay...@irisa.fr>
Sent: Tuesday, 29 September 2020 22:22:15
Subject: RE: Using perf_event with the ARM PMU inside gem5 on Linux
Hey Pierre,

You are actually very close to getting it right! The problem is: there should be a
single PMU instantiation.

What you need to do in the BaseCPU is:

# Generate nodes from the BaseCPU children.
# Please note: this is mainly needed for the ISA class
for node in self.recurseDeviceTree(state):
    yield node

Please feel free to push these BaseCPU and ArmISA changes as separate patches to
gerrit if you want (I have implemented it in the same way). I will post the PMU
one (it is similar to what you are doing, but I have done some other refactoring).

Another thing. You are using PPIs for the PMU (good).
PPIs are per-CPU interrupts; being local to a PE, there’s no need to have
a different PPI number per core (and the GIC/PMU driver might actually complain).

So rather than doing:

ints = [20, 21, 22, 23]

You should do something like (example)

ints = [22, 22, 22, 22]
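
A small helper can build that list for any core count and reject numbers outside the PPI range. This is a sketch rather than gem5 code: the name `pmu_interrupts` is made up, the default of 22 simply mirrors the example above, and the 16..31 range is the GIC's PPI interrupt ID range (the addPMUs helper in devices.py itself only tests `pint < 32`).

```python
# GIC interrupt IDs 16..31 are private peripheral interrupts (PPIs).
PPI_RANGE = range(16, 32)


def pmu_interrupts(num_cpus, ppi=22):
    """Return one interrupt number per core, reusing the same PPI for all."""
    if ppi not in PPI_RANGE:
        raise ValueError("%d is not a PPI (expected 16..31)" % ppi)
    return [ppi] * num_cpus


print(pmu_interrupts(4))  # [22, 22, 22, 22]
```

The returned list has the shape that addPMUs expects for its ints argument, one entry per CPU in the cluster.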

Kind Regards

Giacomo


From: Pierre Ayoub via gem5-users <gem5-users@gem5.org>
Sent: 29 September 2020 18:05
To: Giacomo Travaglini <giacomo.travagl...@arm.com>
Cc: gem5-users <gem5-users@gem5.org>; Pierre Ayoub <pierre.ay...@irisa.fr>
Subject: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

Hi Giacomo,

Thank you for your reply. Your hint about the DTB gave me a great starting
point for researching it and its relation to the Linux
kernel and the ARM PMU. I thought that I would be able to fix this myself by
studying how gem5 generates the DTB and how the PMU is declared in a DTB.
However, although I have learned a lot of things, I was wrong.

In my system script, I declare and attach a PMU like this:
ints = [20, 21, 22, 23]
assert len(ints) == len(system.cpu_cluster.cpus)
for cpu, pint in zip(system.cpu_cluster.cpus, ints):
    for isa in cpu.isa:
        isa.pmu = ArmPMU(interrupt=ArmPPI(num=pint))
        isa.pmu.addArchEvents(
            cpu=cpu, dtb=cpu.dtb, itb=cpu.itb,
            icache=getattr(cpu, "icache", None),
            dcache=getattr(cpu, "dcache", None),
            l2cache=getattr(system.cpu_cluster, "l2", None))

And I applied this patch to gem5:
diff --git i/src/arch/arm/ArmISA.py w/src/arch/arm/ArmISA.py
index 2641ec3fb..3d85c1b75 100644
--- i/src/arch/arm/ArmISA.py
+++ w/src/arch/arm/ArmISA.py
@@ -36,6 +36,7 @@
 from m5.params import *
 from m5.proxy import *

+from m5.SimObject import SimObject
 from m5.objects.ArmPMU import ArmPMU
 from m5.objects.ArmSystem import SveVectorLength
 from m5.objects.BaseISA import BaseISA
@@ -49,6 +50,8 @@ class ArmISA(BaseISA):
 cxx_class = 'ArmISA::I

[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-09-30 Thread Pierre Ayoub via gem5-users
Hi Giacomo, 

Many thanks. This time it works fine, and I feel that I really understand how 
the DTB, the GIC and the gem5 code interact! After declaring the PMU correctly 
in the DTB as you did, we get this confirmation at boot time that the Linux 
kernel sees it: 

> [ 0.239967] hw perfevents: enabled with armv8_pmuv3 PMU driver, 32 counters available

Just one thing. On my real ARM hardware, I used perf_event with the 
PERF_TYPE_HARDWARE event type. This does not work on my gem5 simulated system: 
perf_event was not able to establish a correspondence between the gem5 events 
and the architectural events, even though the event numbers are the same and 
correspond to the ARMv8 specification. I don't know the reason. Thus, the 
workaround is to use the PERF_TYPE_RAW event type, with the event IDs declared 
in the ArmPMU.py file used directly in our C source code. 
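
As an illustration of this workaround, here is a minimal Python sketch that turns architectural event names into the rNNN raw-event strings accepted by the perf tool. The event numbers are the common ARMv8 PMUv3 ones; the dictionary and both helper names are invented for this example, and which of these events a given gem5 build actually counts depends on what ArmPMU.py registers.

```python
# Common ARMv8 PMUv3 architectural event numbers (assumed subset; check
# ArmPMU.py for the events your gem5 build really registers).
ARMV8_EVENTS = {
    "L1I_CACHE_REFILL": 0x01,
    "L1I_TLB_REFILL": 0x02,
    "L1D_CACHE_REFILL": 0x03,
    "L1D_TLB_REFILL": 0x05,
    "INST_RETIRED": 0x08,
    "BR_MIS_PRED": 0x10,
    "CPU_CYCLES": 0x11,
}


def raw_event_spec(name):
    """Return perf's raw-event syntax (r followed by the hex event number)."""
    return "r{:x}".format(ARMV8_EVENTS[name])


def perf_stat_command(names, workload):
    """Build an argv list for `perf stat` counting the given raw events."""
    events = ",".join(raw_event_spec(n) for n in names)
    return ["perf", "stat", "-e", events, "--"] + list(workload)
```

For example, perf_stat_command(["L1D_CACHE_REFILL", "BR_MIS_PRED"], ["./a.out"]) builds a perf stat invocation with -e r3,r10. In C, the same numbers would go into the config field of a perf_event_attr whose type is PERF_TYPE_RAW.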

I will see how to send patches and learn how to use gerrit. Thanks for your 
help. 

Best, 
Pierre 

> From: "Giacomo Travaglini" 
> To: "gem5-users" 
> Cc: "Pierre Ayoub" 
> Sent: Tuesday, 29 September 2020 22:22:15
> Subject: RE: Using perf_event with the ARM PMU inside gem5 on Linux

> Hey Pierre,

> You are actually very close to getting it right! The problem is: there should be a
> single PMU instantiation.

> What you need to do in the BaseCPU is:

> # Generate nodes from the BaseCPU children.
> # Please note: this is mainly needed for the ISA class
> for node in self.recurseDeviceTree(state):
>     yield node

> Please feel free to push these BaseCPU and ArmISA changes as separate patches to
> gerrit if you want (I have implemented them in the same way). I will post the PMU
> one (it is similar to what you are doing, but I have done some other refactoring).

> Another thing: you are using PPIs for the PMU (good).

> PPIs are per-CPU interrupts; because they are local to a PE, there is no need for
> a different PPI number per core (and the GIC/PMU driver might actually complain).

> So rather than doing:

> ints = [20, 21, 22, 23]

> You should do something like this (example):

> ints = [22, 22, 22, 22]

> Kind Regards

> Giacomo

> From: Pierre Ayoub via gem5-users 
> Sent: 29 September 2020 18:05
> To: Giacomo Travaglini 
> Cc: gem5-users ; Pierre Ayoub 
> Subject: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

> Hi Giacomo,

> Thank you for your reply. Your hint about the DTB gave me a great starting
> point for researching it and its relation to the Linux kernel and the ARM PMU.
> I thought that I would be able to fix this myself by studying how gem5 generates
> the DTB and how the PMU is declared in a DTB. However, although I have learned
> a lot, I was wrong.

> In my system script, I declare and attach a PMU like this:

>> ints = [20, 21, 22, 23]
>> assert len(ints) == len(system.cpu_cluster.cpus)
>> for cpu, pint in zip(system.cpu_cluster.cpus, ints):
>>     for isa in cpu.isa:
>>         isa.pmu = ArmPMU(interrupt=ArmPPI(num=pint))
>>         isa.pmu.addArchEvents(
>>             cpu=cpu, dtb=cpu.dtb, itb=cpu.itb,
>>             icache=getattr(cpu, "icache", None),
>>             dcache=getattr(cpu, "dcache", None),
>>             l2cache=getattr(system.cpu_cluster, "l2", None))

> And I applied this patch to gem5:

>> diff --git i/src/arch/arm/ArmISA.py w/src/arch/arm/ArmISA.py
>> index 2641ec3fb..3d85c1b75 100644
>> --- i/src/arch/arm/ArmISA.py
>> +++ w/src/arch/arm/ArmISA.py
>> @@ -36,6 +36,7 @@
>> from m5.params import *
>> from m5.proxy import *

>> +from m5.SimObject import SimObject
>> from m5.objects.ArmPMU import ArmPMU
>> from m5.objects.ArmSystem import SveVectorLength
>> from m5.objects.BaseISA import BaseISA
>> @@ -49,6 +50,8 @@ class ArmISA(BaseISA):
>> cxx_class = 'ArmISA::ISA'
>> cxx_header = "arch/arm/isa.hh"

>> + generateDeviceTree = SimObject.recurseDeviceTree
>> +
>> system = Param.System(Parent.any, "System this ISA object belongs to")

>> pmu = Param.ArmPMU(NULL, "Performance Monitoring Unit")
>> diff --git i/src/arch/arm/ArmPMU.py w/src/arch/arm/ArmPMU.py
>> index 047e908b3..58553fbf9 100644
>> --- i/src/arch/arm/ArmPMU.py
>> +++ w/src/arch/arm/ArmPMU.py
>> @@ -40,6 +40,7 @@ from m5.params import *
>> from m5.params import isNullPointer
>> from m5.proxy import *
>> from m5.objects.Gic import ArmInterruptPin
>> +from m5.util.fdthelper import *

>> class ProbeEvent(object):
>> def __init__(self, pmu, _eventId, obj, *listOfNames):
>> @@ -76,6 +77,17 @@ class ArmPMU(SimObject):

>> _events = None

>> + def generateDe

[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-09-29 Thread Giacomo Travaglini via gem5-users
Hey Pierre,

You are actually very close to getting it right! The problem is: there should be a 
single PMU instantiation.

What you need to do in the BaseCPU is:

# Generate nodes from the BaseCPU children.
# Please note: this is mainly needed for the ISA class
for node in self.recurseDeviceTree(state):
    yield node
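
The generator pattern in the snippet above can be modelled with a small self-contained sketch. All class and method names here are stand-ins invented for illustration; they are not gem5's FdtNode or SimObject API. The point is only that the nodes produced by the CPU's children are yielded next to the "cpus" node rather than appended under a cpu node.

```python
class FakeNode:
    """Stand-in for a device tree node (hypothetical, not gem5's FdtNode)."""
    def __init__(self, name):
        self.name = name
        self.children = []

    def append(self, child):
        self.children.append(child)


class FakeSimObject:
    """Stand-in for a SimObject that owns child objects (hypothetical)."""
    def __init__(self, *children):
        self.children = list(children)

    def recurse_device_tree(self):
        # Forward every node generated by the children, in order.
        for child in self.children:
            yield from child.generate_device_tree()

    def generate_device_tree(self):
        # By default an object adds nothing itself and only forwards.
        yield from self.recurse_device_tree()


class FakePMU(FakeSimObject):
    def generate_device_tree(self):
        yield FakeNode("pmu")


class FakeCPU(FakeSimObject):
    def generate_device_tree(self):
        cpus = FakeNode("cpus")
        cpus.append(FakeNode("cpu@0"))
        yield cpus
        # Yield the children's nodes as siblings of "cpus" instead of
        # appending them under the cpu node.
        for node in self.recurse_device_tree():
            yield node


isa = FakeSimObject(FakePMU())  # the ISA stand-in only forwards its PMU node
cpu = FakeCPU(isa)
top_level = [n.name for n in cpu.generate_device_tree()]
```

With this toy model, top_level contains "cpus" followed by "pmu", and the "cpus" node keeps only its own "cpu@0" child.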

Please feel free to push these BaseCPU and ArmISA changes as separate patches to 
gerrit if you want (I have implemented them in the same way). I will post the PMU 
one (it is similar to what you are doing, but I have done some other refactoring).

Another thing: you are using PPIs for the PMU (good).
PPIs are per-CPU interrupts; because they are local to a PE, there is no need for 
a different PPI number per core (and the GIC/PMU driver might actually complain).

So rather than doing:

ints = [20, 21, 22, 23]

You should do something like this (example):

ints = [22, 22, 22, 22]
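
A minimal sketch of building such a list for any core count, with a range check on the interrupt ID. The helper name is invented for this example; the 16..31 range is the GIC interrupt ID range reserved for PPIs.

```python
PPI_FIRST, PPI_LAST = 16, 31  # GIC interrupt IDs reserved for PPIs


def shared_ppi_list(ppi, num_cpus):
    """Return the same PPI number once per core, after a range check."""
    if not PPI_FIRST <= ppi <= PPI_LAST:
        raise ValueError("INTID %d is not a PPI (expected 16..31)" % ppi)
    return [ppi] * num_cpus
```

In the system script this would replace the hand-written list, for example ints = shared_ppi_list(22, len(system.cpu_cluster.cpus)).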

Kind Regards

Giacomo


From: Pierre Ayoub via gem5-users 
Sent: 29 September 2020 18:05
To: Giacomo Travaglini 
Cc: gem5-users ; Pierre Ayoub 
Subject: [gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

Hi Giacomo,

Thank you for your reply. Your hint about the DTB gave me a great starting 
point for researching it and its relation to the Linux kernel and the ARM PMU. 
I thought that I would be able to fix this myself by studying how gem5 generates 
the DTB and how the PMU is declared in a DTB. However, although I have learned 
a lot, I was wrong.

In my system script, I declare and attach a PMU like this:
ints = [20, 21, 22, 23]
assert len(ints) == len(system.cpu_cluster.cpus)
for cpu, pint in zip(system.cpu_cluster.cpus, ints):
    for isa in cpu.isa:
        isa.pmu = ArmPMU(interrupt=ArmPPI(num=pint))
        isa.pmu.addArchEvents(
            cpu=cpu, dtb=cpu.dtb, itb=cpu.itb,
            icache=getattr(cpu, "icache", None),
            dcache=getattr(cpu, "dcache", None),
            l2cache=getattr(system.cpu_cluster, "l2", None))

And I applied this patch to gem5:
diff --git i/src/arch/arm/ArmISA.py w/src/arch/arm/ArmISA.py
index 2641ec3fb..3d85c1b75 100644
--- i/src/arch/arm/ArmISA.py
+++ w/src/arch/arm/ArmISA.py
@@ -36,6 +36,7 @@
 from m5.params import *
 from m5.proxy import *

+from m5.SimObject import SimObject
 from m5.objects.ArmPMU import ArmPMU
 from m5.objects.ArmSystem import SveVectorLength
 from m5.objects.BaseISA import BaseISA
@@ -49,6 +50,8 @@ class ArmISA(BaseISA):
     cxx_class = 'ArmISA::ISA'
     cxx_header = "arch/arm/isa.hh"

+    generateDeviceTree = SimObject.recurseDeviceTree
+
     system = Param.System(Parent.any, "System this ISA object belongs to")

     pmu = Param.ArmPMU(NULL, "Performance Monitoring Unit")
diff --git i/src/arch/arm/ArmPMU.py w/src/arch/arm/ArmPMU.py
index 047e908b3..58553fbf9 100644
--- i/src/arch/arm/ArmPMU.py
+++ w/src/arch/arm/ArmPMU.py
@@ -40,6 +40,7 @@ from m5.params import *
 from m5.params import isNullPointer
 from m5.proxy import *
 from m5.objects.Gic import ArmInterruptPin
+from m5.util.fdthelper import *

 class ProbeEvent(object):
     def __init__(self, pmu, _eventId, obj, *listOfNames):
@@ -76,6 +77,17 @@ class ArmPMU(SimObject):

     _events = None

+    def generateDeviceTree(self, state):
+        node = FdtNode("pmu")
+        node.appendCompatible("arm,armv8-pmuv3")
+        # gem5 uses GIC controller interrupt notation, where PPI interrupts
+        # start at 16. However, the Linux kernel starts from 0, and uses a tag
+        # (set to 1) to indicate the PPI interrupt type.
+        node.append(FdtPropertyWords("interrupts", [
+            1, int(self.interrupt.num) - 16, 0xf04
+        ]))
+        yield node
+
     def addEvent(self, newObject):
         if not (isinstance(newObject, ProbeEvent)
             or isinstance(newObject, SoftwareIncrement)):
diff --git i/src/cpu/BaseCPU.py w/src/cpu/BaseCPU.py
index ab70d1d7f..e5d0ed3dd 100644
--- i/src/cpu/BaseCPU.py
+++ w/src/cpu/BaseCPU.py
@@ -302,6 +302,9 @@ class BaseCPU(ClockedObject):
             node.appendPhandle(phandle_key)
             cpus_node.append(node)

+        for subnode in self.recurseDeviceTree(state):
+            node.append(subnode)
+
         yield cpus_node

     def __init__(self, **kwargs):

I end up with a DTB with this:
pmu {
    compatible = "arm,armv8-pmuv3";
    interrupts = <0x01 0x04 0xf04>;
};
pmu {
compatible = &qu

[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-09-29 Thread Pierre Ayoub via gem5-users
Hi Giacomo, 

Thank you for your reply. Your hint about the DTB gave me a great starting 
point for researching it and its relation to the Linux kernel and the ARM PMU. 
I thought that I would be able to fix this myself by studying how gem5 generates 
the DTB and how the PMU is declared in a DTB. However, although I have learned 
a lot, I was wrong. 

In my system script, I declare and attach a PMU like this: 

> ints = [20, 21, 22, 23]
> assert len(ints) == len(system.cpu_cluster.cpus)
> for cpu, pint in zip(system.cpu_cluster.cpus, ints):
>     for isa in cpu.isa:
>         isa.pmu = ArmPMU(interrupt=ArmPPI(num=pint))
>         isa.pmu.addArchEvents(
>             cpu=cpu, dtb=cpu.dtb, itb=cpu.itb,
>             icache=getattr(cpu, "icache", None),
>             dcache=getattr(cpu, "dcache", None),
>             l2cache=getattr(system.cpu_cluster, "l2", None))

And I applied this patch to gem5: 

> diff --git i/src/arch/arm/ArmISA.py w/src/arch/arm/ArmISA.py
> index 2641ec3fb..3d85c1b75 100644
> --- i/src/arch/arm/ArmISA.py
> +++ w/src/arch/arm/ArmISA.py
> @@ -36,6 +36,7 @@
>  from m5.params import *
>  from m5.proxy import *

> +from m5.SimObject import SimObject
>  from m5.objects.ArmPMU import ArmPMU
>  from m5.objects.ArmSystem import SveVectorLength
>  from m5.objects.BaseISA import BaseISA
> @@ -49,6 +50,8 @@ class ArmISA(BaseISA):
>      cxx_class = 'ArmISA::ISA'
>      cxx_header = "arch/arm/isa.hh"

> +    generateDeviceTree = SimObject.recurseDeviceTree
> +
>      system = Param.System(Parent.any, "System this ISA object belongs to")

>      pmu = Param.ArmPMU(NULL, "Performance Monitoring Unit")
> diff --git i/src/arch/arm/ArmPMU.py w/src/arch/arm/ArmPMU.py
> index 047e908b3..58553fbf9 100644
> --- i/src/arch/arm/ArmPMU.py
> +++ w/src/arch/arm/ArmPMU.py
> @@ -40,6 +40,7 @@ from m5.params import *
>  from m5.params import isNullPointer
>  from m5.proxy import *
>  from m5.objects.Gic import ArmInterruptPin
> +from m5.util.fdthelper import *

>  class ProbeEvent(object):
>      def __init__(self, pmu, _eventId, obj, *listOfNames):
> @@ -76,6 +77,17 @@ class ArmPMU(SimObject):

>      _events = None

> +    def generateDeviceTree(self, state):
> +        node = FdtNode("pmu")
> +        node.appendCompatible("arm,armv8-pmuv3")
> +        # gem5 uses GIC controller interrupt notation, where PPI interrupts
> +        # start at 16. However, the Linux kernel starts from 0, and uses a tag
> +        # (set to 1) to indicate the PPI interrupt type.
> +        node.append(FdtPropertyWords("interrupts", [
> +            1, int(self.interrupt.num) - 16, 0xf04
> +        ]))
> +        yield node
> +
>      def addEvent(self, newObject):
>          if not (isinstance(newObject, ProbeEvent)
>              or isinstance(newObject, SoftwareIncrement)):
> diff --git i/src/cpu/BaseCPU.py w/src/cpu/BaseCPU.py
> index ab70d1d7f..e5d0ed3dd 100644
> --- i/src/cpu/BaseCPU.py
> +++ w/src/cpu/BaseCPU.py
> @@ -302,6 +302,9 @@ class BaseCPU(ClockedObject):
>              node.appendPhandle(phandle_key)
>              cpus_node.append(node)

> +        for subnode in self.recurseDeviceTree(state):
> +            node.append(subnode)
> +
>          yield cpus_node

>      def __init__(self, **kwargs):

I end up with a DTB with this: 

> pmu {
>     compatible = "arm,armv8-pmuv3";
>     interrupts = <0x01 0x04 0xf04>;
> };
> pmu {
>     compatible = "arm,armv8-pmuv3";
>     interrupts = <0x01 0x05 0xf04>;
> };
> pmu {
>     compatible = "arm,armv8-pmuv3";
>     interrupts = <0x01 0x06 0xf04>;
> };
> pmu {
>     compatible = "arm,armv8-pmuv3";
>     interrupts = <0x01 0x07 0xf04>;
> };

That is one PMU declaration per core. However, it does not work. I don't even know 
if this kind of declaration is correct; maybe we have to declare the PMU once 
for all cores instead of once per core? 
Note that the kernel configuration (in /proc/config.gz) is correct for 
perf_event to initialize normally. 
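
For reference, the interrupts cells in the DTB above follow directly from the PPI numbers in the system script and the arithmetic in the ArmPMU.py patch. The sketch below reproduces that arithmetic on its own; the function names are invented for this example, and the 0xf04 flags value is copied from the patch as-is (as far as I understand the GIC binding, its low bits select an active-high level-sensitive trigger and the 0xf byte is a CPU mask, but treat that reading as an assumption).

```python
GIC_PPI = 1      # first cell: interrupt type tag, 1 for a PPI
PPI_BASE = 16    # gem5 uses GIC interrupt IDs, where PPIs start at 16
FLAGS = 0xf04    # third cell, copied unchanged from the patch


def pmu_interrupt_cells(gic_intid):
    """Convert a GIC PPI interrupt ID into the three DT interrupt cells."""
    if not PPI_BASE <= gic_intid < PPI_BASE + 16:
        raise ValueError("INTID %d is not a PPI" % gic_intid)
    return [GIC_PPI, gic_intid - PPI_BASE, FLAGS]


def format_cells(cells):
    """Render the cells the way the decompiled DTB shows them."""
    return "<" + " ".join("0x%02x" % c for c in cells) + ">"
```

With PPI 20, format_cells(pmu_interrupt_cells(20)) gives <0x01 0x04 0xf04>, which matches the first pmu node; PPIs 21 to 23 give the 0x05 to 0x07 entries.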

Many thanks if you help me, and many thanks also if you post a patch in the 
future. 

Best, 
Pierre 

> From: "Giacomo Travaglini" 
> To: "gem5-users" 
> Cc: "Pierre Ayoub" 
> Sent: Thursday, 24 September 2020 12:09:17
> Subject: RE: Using perf_event with the ARM PMU inside gem5 on Linux

> Hi Pierre,

> First of all, many thanks for explaining your problem in detail. This is
> very helpful.

> The reason why you are not able to use perf_events is probably that the
> kernel is not aware of the presence of PMUs. This is usually communicated to
> Linux via the DTB. I can see that we are not enabling DTB autogen for the
> ArmPMU.

> I will post a patch.

> Kind Regards

> Giacomo

> From: Pierre Ayoub via gem5-users 
> Sent: 23 September 2020 08:45
> To: gem5-users@gem5.org
> Cc: Pierre Ayoub 
> Subject: [gem5-users] Using perf_event with the ARM PMU inside gem5 on Linux

> Hi gem5 users,

> TL;DR:
> ------

> I know that the ARM PMU is partially implemented, thanks to the gem5 source
> code and some publications. I have a binary which uses perf_event to access the
> PMU on a Linux-based OS, under an ARM processor, on real hardware. Could it use
> perf_event inside a gem5 full-system simulation with a Linux kernel, under the
> ARM ISA? So far, I haven't found the right way to do 

[gem5-users] Re: Using perf_event with the ARM PMU inside gem5 on Linux

2020-09-24 Thread Giacomo Travaglini via gem5-users
Hi Pierre,

First of all, many thanks for explaining your problem in detail. This is 
very helpful.

The reason why you are not able to use perf_events is probably that the 
kernel is not aware of the presence of PMUs. This is usually communicated to 
Linux via the DTB. I can see that we are not enabling DTB autogen for the ArmPMU.

I will post a patch.

Kind Regards

Giacomo

From: Pierre Ayoub via gem5-users 
Sent: 23 September 2020 08:45
To: gem5-users@gem5.org
Cc: Pierre Ayoub 
Subject: [gem5-users] Using perf_event with the ARM PMU inside gem5 on Linux


Hi gem5 users,

TL;DR:
------

I know that the ARM PMU is partially implemented, thanks to the gem5 source
code and some publications. I have a binary which uses perf_event to access the
PMU on a Linux-based OS, under an ARM processor, on real hardware. Could it use
perf_event inside a gem5 full-system simulation with a Linux kernel, under the
ARM ISA? So far, I haven't found the right way to do it. If someone knows, I
will be very grateful!

Detailed information:
---------------------

I have a binary (developed by myself) which uses perf_event on real ARM
hardware, to get cache misses and mispredicted branches, and it works well. My
"perf_event_attr.type" is configured with "PERF_TYPE_HARDWARE" and the
".config" field with "PERF_COUNT_HW_CACHE_MISSES" and another with
"PERF_COUNT_HW_BRANCH_MISSES." However, when I put this binary on a gem5 fs
simulation, configured with the DerivO3CPU, ArmSystem, and RealView platform, I
got the following error:

"ENOENT (2): No such file or directory"

The perf_event file descriptor is not created by the kernel (it is equal to -1).
To be precise, this error is returned by the perf_event_open() syscall. The
error is documented in the perf_event_open.2 manpage, and also discussed here.
However, that did not help me understand the error in the context of gem5.
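
One way to check from inside the simulated system whether the kernel registered a hardware PMU at all is to look at the perf event sources exposed in sysfs. The sketch below assumes the usual /sys/bus/event_source/devices layout, where each PMU directory holds a "type" file, and assumes the ARM PMU shows up under a name starting with armv8_pmuv3 (the name used elsewhere in this thread). Both function names are invented for this example.

```python
import os


def registered_pmus(root="/sys/bus/event_source/devices"):
    """Map each PMU name found under `root` to its perf type number."""
    pmus = {}
    if not os.path.isdir(root):
        return pmus
    for name in sorted(os.listdir(root)):
        type_file = os.path.join(root, name, "type")
        try:
            with open(type_file) as f:
                pmus[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # entry without a readable numeric "type" file
    return pmus


def has_arm_pmu(pmus):
    """True if any registered PMU name looks like the ARMv8 PMUv3 driver."""
    return any(name.startswith("armv8_pmuv3") for name in pmus)
```

If has_arm_pmu(registered_pmus()) is false on the guest, the kernel never bound the PMU driver, and hardware events are unlikely to be usable regardless of the perf_event_attr settings.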

I don't know if we can access the PMU through perf_event in gem5. If so,
maybe we have to use RAW events? (i.e., do you know if perf_event is supposed
to be initialized with PERF_TYPE_HARDWARE or PERF_TYPE_RAW, to be used with
gem5?) In the gem5 example code under configs, I have found a snippet in
devices.py which "Instantiates 1 ArmPMU per PE" (addPMUs()). However, after a few
tries, I don't understand how to use this correctly and how it is related to
perf_event.

I used code similar to addPMUs() in devices.py, with PPI interrupt numbers
20, 21, 22, and 23 (one per core) according to the RealView interrupt mapping,
with the ArmPPI class. However, perf_event_open() still returns the same
error. Note also that I got this message during boot:

src/arch/arm/pmu.cc:293: warn: Not doing anything for write to miscreg pmuserenr_el0.

This register is documented in the ARMv8-A architecture manual. I have checked
the pmu.cc file and saw that writing to this register is not implemented (it is
marked as TODO). Normally, this should not be a problem: the register enables
(when set to 1) userland access to the PMU, which I don't need because I want
to access the PMU through the Linux kernel perf_event interface.


With --debug-flags=PMUVerbose, I get the following:

0: system.cpu_cluster.cpus0.isa.pmu: Initializing the PMU.
[...]
0: system.cpu_cluster.cpus0.isa.pmu: PMU: Adding Probe Driven event with id '0x2'as probe system.cpu_cluster.cpus0.itb:Refills
[...]
8687351673751: system.cpu_cluster.cpus0.isa.pmu: Assigning PMU to ContextID 0.
[...]
8687351673751: system.cpu_cluster.cpus0.isa.pmu: updateCounter(31): Disabling counter
[...]

Now, you know all I know about this issue!
