Re: Plugin ideas/experiments

2019-06-04 Thread Christophe Lyon
On Mon, 3 Jun 2019 at 21:36, Christophe Lyon  wrote:
>
> On Thu, 30 May 2019 at 12:28, Alex Bennée  wrote:
> >
> >
> > Hi,
> >
> > Food for thought for today's sync up. I've been writing QEMU plugins to
> > exercise the plugin system and see what sort of useful information you
> > can extract when you can control the instruction stream.
> >
> > For example I now have a plugin that can break down instruction counts
> > for any given run, for example a kernel boot:
> >
> >   Instruction Classes:
> >   Class:   UDEF               not counted
> >   Class:   SVE                (68 hits)
> >   Class: Reserved             (0 hits)
> >   Class:   PCrel addr         (4589078 hits)
> >   Class:   Add/Sub (imm,tags) (0 hits)
> >   Class:   Add/Sub (imm)      (26832113 hits)
> >   Class:   Logical (imm)      (74304974 hits)
> >   Class:   Move Wide (imm)    (10933759 hits)
> >   Class:   Bitfield           (71470957 hits)
> >   Class:   Extract            (85655 hits)
> >   Class: Data Proc Imm        (0 hits)
> >   Class:   Cond Branch (imm)  (37227632 hits)
> >   Class:   Exception Gen      (6 hits)
> >   Class: NOP                  not counted
> >   Class:   Hints              (244825554 hits)
> >   Class:   Barriers           (1668558 hits)
> >   Class:   PSTATE             (202144 hits)
> >   Class:   System Insn        (7132992 hits)
> >   Class:   System Reg         (2268308 hits)
> >   Class:   Branch (reg)       (6280976 hits)
> >   Class:   Branch (imm)       (18347905 hits)
> >   Class:   Cmp & Branch       (180167025 hits)
> >   Class:   Tst & Branch       (4092972 hits)
> >   Class: Branches             (0 hits)
> >   Class:   AdvSimd ldstmult   (0 hits)
> >   Class:   AdvSimd ldstmult++ (0 hits)
> >   Class:   AdvSimd ldst       (0 hits)
> >   Class:   AdvSimd ldst++     (0 hits)
> >   Class:   ldst excl          (160861365 hits)
> >   Class: Prefetch             (0 hits)
> >   Class:   Load Reg (lit)     (12828544 hits)
> >   Class:   ldst noalloc pair  (0 hits)
> >   Class:   ldst pair          (60381349 hits)
> >   Class:   ldst reg           (0 hits)
> >   Class:   Atomic ldst        (0 hits)
> >   Class:   ldst reg (reg off) (0 hits)
> >   Class:   ldst reg (pac)     (0 hits)
> >   Class:   ldst reg (imm)     (119597941 hits)
> >   Class: Loads & Stores       (0 hits)
> >   Class: Data Proc Reg        (113586343 hits)
> >   Class: Scalar FP            (0 hits)
> >   Class: Unclassified         (0 hits)
> >
> > You can break down each class to individual instructions. For example
> > the Hints are mostly:
> >
> >   Individual Instructions:
> >   Instr: wfe   (132400072 hits) (op=0xd503205f/  Hints)
> >   Instr: sevl  (66433640 hits)  (op=0xd50320bf/  Hints)
> >   Instr: yield (29619246 hits)  (op=0xd503203f/  Hints)
> >   Instr: wfi   (2865 hits)      (op=0xd503207f/  Hints)
> >
> > So I'm looking for a similar experiment that would be useful for the
> > memory sub-system. When I chatted to Maxim we thought maybe a simplified
> > cache line simulator might be useful. The aim wouldn't be to simulate
> > what a real cache might do but to be useful say for identifying regions
> > of code which might be susceptible to cache line bouncing. So as
> > compiler writers what sort of run time memory behaviour would you like
> > to track? What sort of information would be useful to extract with such
> > a tool?
> >
> > I'm open to ideas ;-)
> >
>
> On our side (ST), we use qemu plugins for various things:
> - code coverage
> - code profiling
> - loop analysis (more compiler developer oriented than the previous ones)
>

For more detail, have a look at the example plugins in:
https://github.com/atos-tools/qemu/tree/stable-3.1.plugins/tcg/plugins

The example plugins there include:
- Full instruction trace
- DineroIV cache simulator
- Instruction group/mnemonic dynamic count
- coverage
- per-function profile
- oprofile
- function call stack
- global instruction count
- I/O memory mapped simulation
- block trace

Other, more advanced plugins (see for instance the paper at
https://ppopp19.sigplan.org/details/PPoPP-2019-papers/9/Data-Flow-Dependence-Profiling-for-Structured-Transformations):
- Function call graph, CFGs and function call-stack sampling (flamegraph)
- Dynamic Dependence Graph

> Christophe
>
> >
> > --
> > Alex Bennée
> > ___
> > linaro-toolchain mailing list
> > linaro-toolchain@lists.linaro.org
> > https://lists.linaro.org/mailman/listinfo/linaro-toolchain


Re: Plugin ideas/experiments

2019-06-03 Thread Christophe Lyon

On our side (ST), we use qemu plugins for various things:
- code coverage
- code profiling
- loop analysis (more compiler developer oriented than the previous ones)

Christophe



Re: Plugin ideas/experiments

2019-06-03 Thread Peter Smith

In our embedded compiler team we used a fast model plugin to check
that our Cortex-M3 execute-only code did indeed not read the
executable instructions (no literal pools etc.). You may have this
emulated already though. Another demo I saw was a cache visualisation
plugin that gave a graphical display of the cache as the program was
running. Pretty, but it sure did slow the model down.
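
As a rough illustration of the execute-only check described above, the sketch below scans a stream of data loads and flags any that touch an execute-only range. It is not the fast model plugin itself: the function name, the (pc, addr, size) event format and the sample addresses are all assumptions made up for this example.

```python
def find_xo_violations(xo_ranges, loads):
    """Return (pc, addr) for every data load that overlaps an execute-only range.

    xo_ranges: list of (start, end) half-open address ranges (hypothetical).
    loads:     iterable of (pc, addr, size) data-load events (hypothetical format).
    """
    violations = []
    for pc, addr, size in loads:
        for start, end in xo_ranges:
            # Overlap test between [addr, addr + size) and [start, end).
            if addr < end and addr + size > start:
                violations.append((pc, addr))
                break
    return violations


# Made-up sample: one execute-only region and three loads.
xo = [(0x08000000, 0x08001000)]
sample_loads = [
    (0x08000010, 0x20000000, 4),  # load from RAM, fine
    (0x08000014, 0x08000FFE, 4),  # straddles the end of the region, flagged
    (0x08000018, 0x08001000, 4),  # first byte after the region, fine
]
result = find_xo_violations(xo, sample_loads)
```

A literal pool load would show up here as a load whose address lies inside the same region as its pc.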

Peter



Re: Plugin ideas/experiments

2019-06-03 Thread Adhemerval Zanella



Back at IBM, one internal project we used regularly was an instruction
tracer based on an out-of-tree patch to valgrind.  The idea was to get the
precise instruction sequence for a specific text segment boundary so we
could load it later on a powerpc simulator to post-analyse the code
behaviour regarding instruction latency, op-port utilization, CPU stalls, etc.

Not sure if it would be that useful without a post-analysis tool, but I think
it might be useful for some arch-specific optimization.  What do you think?
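
A minimal sketch of the tracing idea, assuming the emulator can hand over (pc, opcode) pairs: keep only the instructions inside a given text range and write them out for a later post-analysis step. The function names and the one-line-per-instruction text format are invented for this example; they are not the valgrind patch or any QEMU API.

```python
import io


def trace_segment(events, start, end, out):
    """Write 'pc opcode' lines for instructions with start <= pc < end.

    events: iterable of (pc, opcode) pairs (hypothetical format).
    Returns the number of instructions written.
    """
    count = 0
    for pc, opcode in events:
        if start <= pc < end:
            out.write(f"{pc:#x} {opcode:#010x}\n")
            count += 1
    return count


def load_trace(text):
    """Parse the text written by trace_segment back into (pc, opcode) tuples."""
    return [
        tuple(int(field, 16) for field in line.split())
        for line in text.splitlines()
        if line
    ]


# Made-up addresses; the opcodes are the hint encodings quoted in this thread.
buf = io.StringIO()
events = [
    (0x0FFC, 0xD503203F),
    (0x1000, 0xD503205F),
    (0x1004, 0xD50320BF),
    (0x2000, 0xD503207F),
]
written = trace_segment(events, 0x1000, 0x2000, buf)
```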


Plugin ideas/experiments

2019-05-30 Thread Alex Bennée

Hi,

Food for thought for today's sync up. I've been writing QEMU plugins to
exercise the plugin system and see what sort of useful information you
can extract when you can control the instruction stream.

For example I now have a plugin that can break down instruction counts
for any given run, for example a kernel boot:

  Instruction Classes:
  Class:   UDEF               not counted
  Class:   SVE                (68 hits)
  Class: Reserved             (0 hits)
  Class:   PCrel addr         (4589078 hits)
  Class:   Add/Sub (imm,tags) (0 hits)
  Class:   Add/Sub (imm)      (26832113 hits)
  Class:   Logical (imm)      (74304974 hits)
  Class:   Move Wide (imm)    (10933759 hits)
  Class:   Bitfield           (71470957 hits)
  Class:   Extract            (85655 hits)
  Class: Data Proc Imm        (0 hits)
  Class:   Cond Branch (imm)  (37227632 hits)
  Class:   Exception Gen      (6 hits)
  Class: NOP                  not counted
  Class:   Hints              (244825554 hits)
  Class:   Barriers           (1668558 hits)
  Class:   PSTATE             (202144 hits)
  Class:   System Insn        (7132992 hits)
  Class:   System Reg         (2268308 hits)
  Class:   Branch (reg)       (6280976 hits)
  Class:   Branch (imm)       (18347905 hits)
  Class:   Cmp & Branch       (180167025 hits)
  Class:   Tst & Branch       (4092972 hits)
  Class: Branches             (0 hits)
  Class:   AdvSimd ldstmult   (0 hits)
  Class:   AdvSimd ldstmult++ (0 hits)
  Class:   AdvSimd ldst       (0 hits)
  Class:   AdvSimd ldst++     (0 hits)
  Class:   ldst excl          (160861365 hits)
  Class: Prefetch             (0 hits)
  Class:   Load Reg (lit)     (12828544 hits)
  Class:   ldst noalloc pair  (0 hits)
  Class:   ldst pair          (60381349 hits)
  Class:   ldst reg           (0 hits)
  Class:   Atomic ldst        (0 hits)
  Class:   ldst reg (reg off) (0 hits)
  Class:   ldst reg (pac)     (0 hits)
  Class:   ldst reg (imm)     (119597941 hits)
  Class: Loads & Stores       (0 hits)
  Class: Data Proc Reg        (113586343 hits)
  Class: Scalar FP            (0 hits)
  Class: Unclassified         (0 hits)

You can break down each class to individual instructions. For example
the Hints are mostly:

  Individual Instructions:
  Instr: wfe   (132400072 hits) (op=0xd503205f/  Hints)
  Instr: sevl  (66433640 hits)  (op=0xd50320bf/  Hints)
  Instr: yield (29619246 hits)  (op=0xd503203f/  Hints)
  Instr: wfi   (2865 hits)      (op=0xd503207f/  Hints)
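
The plugin source is not shown in this thread, so the following is only a guess at how such a breakdown could be computed: match each opcode against a (mask, pattern) table to find its class, and keep one counter per class and one per opcode. The single Hints row uses a mask and pattern assumed from the A64 hint encoding; the four opcodes are the ones in the listing above, and every function name is hypothetical.

```python
from collections import Counter

# (mask, pattern, class name). Only one row is filled in; the mask/pattern
# pair is an assumption, not taken from the plugin.
CLASS_TABLE = [
    (0xFFFFF01F, 0xD503201F, "Hints"),
]

# Opcode-to-mnemonic pairs copied from the listing in the post.
INSN_NAMES = {
    0xD503205F: "wfe",
    0xD50320BF: "sevl",
    0xD503203F: "yield",
    0xD503207F: "wfi",
}


def classify(opcode):
    """Return the first class whose masked pattern matches the opcode."""
    for mask, pattern, name in CLASS_TABLE:
        if (opcode & mask) == pattern:
            return name
    return "Unclassified"


def count_stream(opcodes):
    """Count hits per class and per individual opcode."""
    class_hits = Counter()
    insn_hits = Counter()
    for op in opcodes:
        class_hits[classify(op)] += 1
        insn_hits[op] += 1
    return class_hits, insn_hits


def report(class_hits, insn_hits):
    """Format the counters roughly in the style of the output above."""
    lines = [f"Class: {name} ({hits} hits)" for name, hits in class_hits.most_common()]
    for op, hits in insn_hits.most_common():
        mnemonic = INSN_NAMES.get(op, "unknown")
        lines.append(f"Instr: {mnemonic} ({hits} hits) (op={op:#010x}/ {classify(op)})")
    return lines
```

A real plugin would do the classification once at translation time and only bump a counter at execution time, but the table-driven matching would look much the same.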

So I'm looking for a similar experiment that would be useful for the
memory sub-system. When I chatted to Maxim we thought maybe a simplified
cache line simulator might be useful. The aim wouldn't be to simulate
what a real cache might do but to be useful say for identifying regions
of code which might be susceptible to cache line bouncing. So as
compiler writers what sort of run time memory behaviour would you like
to track? What sort of information would be useful to extract with such
a tool?
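
To make the cache line bouncing idea concrete, here is a deliberately simplified sketch. It does not model a real cache: it only remembers which CPU last wrote each line and counts every write that comes from a different CPU, together with the pc that did it. The 64-byte line size, the class and method names, and the access(cpu, pc, addr, is_write) event format are all assumptions for illustration.

```python
from collections import defaultdict

LINE_SIZE = 64  # bytes per cache line; an assumed value


class BounceTracker:
    """Count writes that move a cache line from one CPU to another."""

    def __init__(self, line_size=LINE_SIZE):
        self.line_size = line_size
        self.last_writer = {}               # line index -> cpu that last wrote it
        self.bounces = defaultdict(int)     # line index -> number of writer changes
        self.bounce_pcs = defaultdict(set)  # line index -> pcs of the bouncing writes

    def access(self, cpu, pc, addr, is_write):
        # Reads are ignored in this simplification.
        if not is_write:
            return
        line = addr // self.line_size
        prev = self.last_writer.get(line)
        if prev is not None and prev != cpu:
            self.bounces[line] += 1
            self.bounce_pcs[line].add(pc)
        self.last_writer[line] = cpu

    def hottest(self, n=10):
        """Return up to n (line base address, bounce count, sorted pcs) entries."""
        ranked = sorted(self.bounces.items(), key=lambda kv: kv[1], reverse=True)
        return [
            (line * self.line_size, count, sorted(self.bounce_pcs[line]))
            for line, count in ranked[:n]
        ]


# Made-up sample: two CPUs writing different words of the same line.
tracker = BounceTracker()
tracker.access(0, 0x1000, 0x8000, True)
tracker.access(1, 0x2000, 0x8008, True)   # same line, other CPU
tracker.access(0, 0x1000, 0x8010, True)   # and back again
tracker.access(0, 0x1000, 0x9000, True)   # different line, single writer
tracker.access(1, 0x2000, 0x8000, False)  # read, ignored here
```

The pcs collected per line are what would point back at the regions of code involved.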

I'm open to ideas ;-)


--
Alex Bennée