[gem5-users] Detect cache miss in DCACHE

2019-08-02 Thread Francisco Carlos
Hi all,

I want to detect the instructions that cause cache misses in the data cache and
to know their memory access latency.

I would like to know if there is a simple way to do it.

I have already looked at the output of different debug flags, such as Cache,
O3CpuAll and LSQUnit.

I have also tried modifying the execute stage to compute the time each memory
instruction spends there. If that time was higher than the hit latency, I
assumed the instruction had caused a cache miss. However, this approach
doesn't work in all situations.
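A minimal Python sketch of the latency-threshold heuristic described above. It assumes you have already extracted, for each memory instruction, the tick at which its access was sent to the data cache and the tick at which it completed; the `MemAccess` record and the `HIT_LATENCY_TICKS` value are hypothetical, not a gem5 output format or parameter. The result is only a "probable miss" label, since an access can take longer than the hit latency for reasons other than a miss (for example a blocked cache port or a TLB walk).

```python
from dataclasses import dataclass
from typing import List


# Hypothetical per-instruction record; gem5 does not emit this format directly.
@dataclass
class MemAccess:
    seq_num: int        # dynamic instruction sequence number
    pc: int             # instruction address
    issue_tick: int     # tick the access was sent to the data cache
    complete_tick: int  # tick the response came back

    @property
    def latency(self) -> int:
        return self.complete_tick - self.issue_tick


# Assumed value for illustration only; substitute your L1D hit latency in ticks.
HIT_LATENCY_TICKS = 2000


def probable_misses(accesses: List[MemAccess],
                    hit_latency: int = HIT_LATENCY_TICKS) -> List[MemAccess]:
    """Return the accesses whose latency exceeds the hit latency.

    Heuristic only: a slow access is not necessarily a cache miss.
    """
    return [a for a in accesses if a.latency > hit_latency]


if __name__ == "__main__":
    trace = [
        MemAccess(seq_num=1, pc=0x400100, issue_tick=1000, complete_tick=3000),
        MemAccess(seq_num=2, pc=0x400104, issue_tick=1500, complete_tick=61500),
    ]
    for a in probable_misses(trace):
        print(f"sn={a.seq_num} pc={a.pc:#x} latency={a.latency} ticks")
```

Here only the second access is reported, because the first one completes in exactly the assumed hit latency.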


Thanks in advance

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] x86 instruction decoding

2019-08-02 Thread Shyam Murthy
Hi Gabe,

I was reading through this today 
(https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=129).
 

Within gem5, however, the instructions MOVZX_B_R_M and MOVZX_W_R_M are
translated into load byte and load word microops, which in turn cause partial
register stalls. Yet developers and compilers use MOVZX/MOVSX specifically as
an optimization to eliminate these partial register stalls. So isn't the
current modeling overly conservative?

My main concern comes from an application where the compiler generates byte
loads that zero-extend to 32 bits (as per the optimization), but gem5 still
generates stalls because of the load byte microop. I think this may need to be
remodeled slightly; the only case where being conservative and stalling might
be needed is when the zero extension is into 16 bits.
Let me know if you think my reasoning is correct.

Thanks,
Shyam
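The rule this thread turns on can be sketched in Python as a model of how an x86-64 write to a general-purpose register combines with the register's old value. This illustrates the architectural behaviour only, not gem5 code; the function names are hypothetical, and the high-byte registers (AH, BH, CH, DH) are ignored for simplicity. An 8- or 16-bit destination keeps the upper bits of the old value, so the old value is a real input; a 32-bit destination is zero-extended to 64 bits and a 64-bit destination replaces everything, so the old value is not needed. That is why MOVZX into a 32-bit register has no true dependency on the destination's old value, while MOVZX into a 16-bit register does.

```python
MASK64 = (1 << 64) - 1


def needs_old_value(dest_size_bytes: int) -> bool:
    """True if a write of this size to a 64-bit GPR preserves some old bits."""
    if dest_size_bytes not in (1, 2, 4, 8):
        raise ValueError("operand size must be 1, 2, 4 or 8 bytes")
    # 8- and 16-bit writes merge into the old value; 32-bit writes are
    # zero-extended to 64 bits; 64-bit writes replace the whole register.
    return dest_size_bytes in (1, 2)


def write_gpr(old: int, result: int, dest_size_bytes: int) -> int:
    """Return the new 64-bit register value after a write of the given size."""
    merge = needs_old_value(dest_size_bytes)
    low_mask = (1 << (8 * dest_size_bytes)) - 1
    if not merge:
        # No merge: keep only the written bits; the upper bits become zero.
        return result & low_mask
    # Merge: keep the old upper bits and replace only the low bits.
    return (old & ~low_mask & MASK64) | (result & low_mask)


if __name__ == "__main__":
    old_rax = 0x1122334455667788
    loaded_byte = 0xAB
    # movzx eax, byte [mem]: 32-bit destination, old RAX not needed
    print(hex(write_gpr(old_rax, loaded_byte, 4)))  # 0xab
    # movzx ax, byte [mem]: 16-bit destination, upper 48 bits of RAX kept
    print(hex(write_gpr(old_rax, loaded_byte, 2)))  # 0x11223344556600ab
```

A decoder that only listed the destination as a source when `needs_old_value` is true would avoid the extra dependency for 32- and 64-bit writes, which is the optimization Gabe describes as not implemented.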

> On Aug 1, 2019, at 3:38 PM, Gabe Black wrote:
> 
> There is no way to disable that. The number and identity of the instruction's
> sources/destinations would need to change based on the operand size, and 
> that's not implemented. You could possibly add extra information to the 
> microops to help determine when that sort of thing is happening. All the 
> microops that do partial register updates have that behavior (so most of 
> them), not just lea.
> 
> Gabe
> 
> On Wed, Jul 31, 2019 at 8:06 PM Shyam Murthy wrote:
> Thanks, Gabe. Suppose I'm trying to carry out a data flow analysis on the
> program; I quite often rely on the source registers tagged by gem5. Wouldn't
> I be tracking false dependencies in this process? Is there a way I can
> disable this?
> 
> Additionally, have you modelled this only for the LEA op, or for other
> operations too? You call the merge method within the static inst class; I
> assumed this was because x86 has a lot of instructions like ADD AX, imm,
> where the source register is also overwritten by the output. However, I
> guess you primarily call the merge method within the static inst class to
> model partial register updates.
> 
> Thanks,
> Shyam
> 
>> On Jul 31, 2019, at 9:03 PM, Gabe Black wrote:
>> 
>> Hi Shyam. I think the reason is that x86 instructions (and the microops as 
>> I've implemented them) can do partial register updates, i.e. writing to only 
>> the lowest byte of a register. In that case, you need the old value to fill 
>> in part of the new value of the register. When writing to 32 bits or more of 
>> the register (although x86 is full of exceptions), you'd generally not need 
>> the old value since you're either writing all 64 bits or zero extending to 
>> 64 bits in the 32 bit case. That optimization is not implemented, and may or 
>> may not be realistic.
>> 
>> Gabe
>> 
>> On Tue, Jul 30, 2019 at 2:40 PM Shyam Murthy wrote:
>> The main reason I am asking is that I am trying to do some dependency 
>> analysis in the programs, and false dependencies show up in the process 
>> because architecture registers that are destination registers also get 
>> populated as source registers (when there is no true dependency). Am I 
>> understanding something incorrectly? 
>> 
>> Thanks,
>> Shyam
>> 
>> On Tue, Jul 30, 2019 at 2:25 PM Shyam Murthy wrote:
>> Hi Gabe,
>> 
>> Why is it that, for some operations like ld and lea, the decoding logic in
>> build/X86/arch/generated/decoder-ns.cc.inc also decodes the destination
>> register as a src register?
>> 
>> Thanks,
>> Shyam

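To see how an extra source register becomes a false dependency in the kind of data flow analysis discussed in this thread, here is a minimal Python sketch that builds read-after-write edges from a list of decoded instructions. The `Inst` record and register names are hypothetical stand-ins for whatever you extract from gem5's decoded microops. When the destination of a 32-bit load is also tagged as a source (for merging, as described above), the analysis reports an edge to the previous writer of that register even though no data flows along it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Inst:
    name: str          # label only, for readability
    dests: List[str]   # architectural registers written
    srcs: List[str]    # architectural registers read (as tagged)


def raw_edges(insts: List[Inst]) -> List[Tuple[int, int, str]]:
    """Return (consumer_index, producer_index, register) for every source
    register whose value was produced by an earlier instruction."""
    last_writer: Dict[str, int] = {}
    edges: List[Tuple[int, int, str]] = []
    for i, inst in enumerate(insts):
        # Sources are resolved before this instruction's own writes.
        for reg in inst.srcs:
            if reg in last_writer:
                edges.append((i, last_writer[reg], reg))
        for reg in inst.dests:
            last_writer[reg] = i
    return edges


if __name__ == "__main__":
    # Same two instructions; only the tagging of the 32-bit load differs.
    with_merge_src = [
        Inst("add rax, rbx", dests=["rax"], srcs=["rax", "rbx"]),
        Inst("mov eax, [rsi]", dests=["rax"], srcs=["rax", "rsi"]),
    ]
    without_merge_src = [
        Inst("add rax, rbx", dests=["rax"], srcs=["rax", "rbx"]),
        Inst("mov eax, [rsi]", dests=["rax"], srcs=["rsi"]),
    ]
    print(raw_edges(with_merge_src))     # [(1, 0, 'rax')]
    print(raw_edges(without_merge_src))  # []
```

Note that a post-processing step cannot simply drop the destination from the source list, because instructions such as ADD AX, imm genuinely read it; some extra per-microop information, as Gabe suggests, would be needed to tell the two cases apart.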

Re: [gem5-users] How to debug unexpected silent termination of benchmark program in SE mode

2019-08-02 Thread Ciro Santilli
Have you tried logging with build/ARM/gem5.opt --debug-flags ExecAll ?
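Because the simulation runs for days, a full ExecAll trace would be very large, and usually only the instructions executed just before the exit are of interest. A minimal Python sketch that keeps only the last N lines of a trace file, or of standard input so the trace can be piped in rather than stored; the script name `tail_trace.py` and the default of 1000 lines are arbitrary choices, and it assumes nothing about the format of the trace lines.

```python
import sys
from collections import deque
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def tail_lines(lines: Iterable[T], n: int) -> List[T]:
    """Return the last n items of an iterable, holding at most n in memory."""
    return list(deque(lines, maxlen=n))


def main(argv: List[str]) -> None:
    # Usage: python tail_trace.py TRACE_FILE|- [N]
    path = argv[1]
    n = int(argv[2]) if len(argv) > 2 else 1000
    if path == "-":
        lines = tail_lines(sys.stdin, n)
    else:
        with open(path, errors="replace") as f:
            lines = tail_lines(f, n)
    sys.stdout.writelines(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

If the trace goes to standard output, it could be used as `build/ARM/gem5.opt --debug-flags ExecAll configs/example/arm/starter_se.py ... | python tail_trace.py - 5000`, where `...` stands for the rest of the original command line.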

On Fri, Aug 2, 2019 at 4:32 PM Grant Vesely wrote:
>
> Good morning all,
>
> I am attempting to measure the performance of the ARM HPI model
> shipped with gem5 (in SE mode) using the XSBench
> (https://github.com/ANL-CESAR/XSBench) benchmark program.
>
> gem5 appears to exit normally ("exiting with last active thread
> context @ n") without the benchmark completing (the results XSBench
> prints to stdout never appear).
>
> I am using gem5.opt 2.0 with the command-line `build/ARM/gem5.opt
> configs/example/arm/starter_se.py --cpu="hpi" --num-cores=1
> --mem-size=8GB ~/XSBench/src/XSBench`.
>
> I don't have a core dump, stack trace, or source code location (e.g.
> if an assertion was violated) that I can use, and the simulation takes
> several days to reach the failure point, which appears to be in the
> main benchmark loop
> (XSBench/src/Simulation.c:run_history_based_simulation()), so I can't
> step through it in GDB. What can I do to debug this?
>
> Cheers,
>
> Grant Vesely

[gem5-users] Why is whenReady local to individual blocks?

2019-08-02 Thread Simon William
Hello all,

I am inspecting the code for the basic cache interface, and have come across
something I find peculiar. The cache consists of one block per way in each set,
i.e. the number of sets times the number of ways. Whenever a block is filled,
its whenReady value is set based on the cache's fill_latency (currently defined
as the data_latency instead of having its own parameter), along with other
parameters. whenReady determines the latency of the next access to that block
if the access arrives before the block fill operation completes.

However, this does not seem to accurately capture the effect of filling the
cache line. At a very basic level, without any hardware modifications, no
cache line should be accessible while a line is being filled, as the bitlines
will be busy with the fill. At the very least, blocks in different ways of the
same set should not be accessible until the current tick passes the whenReady
value of every block in the set, considering that these blocks share the same
wordline. Are these assumptions correct?
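The two policies being compared can be sketched as a toy model in Python: per-block readiness, where only the block being filled is delayed, and the per-set readiness proposed above, where any access to a set waits until every block in that set is ready. The class and method names are hypothetical and this is not gem5's cache code; the model simply assumes an access first waits for any pending fill that blocks it and then pays a fixed lookup latency, with all times in arbitrary ticks.

```python
from typing import List


class ToyCache:
    """Toy model of fill-induced delays; one when_ready value per block."""

    def __init__(self, num_sets: int, num_ways: int, lookup_latency: int):
        self.lookup_latency = lookup_latency
        # when_ready[set][way]: tick at which that block's fill completes
        self.when_ready: List[List[int]] = [
            [0] * num_ways for _ in range(num_sets)
        ]

    def fill(self, set_idx: int, way: int, now: int, fill_latency: int) -> None:
        self.when_ready[set_idx][way] = now + fill_latency

    def access_latency_per_block(self, set_idx: int, way: int, now: int) -> int:
        # Only the accessed block's own pending fill can delay the access.
        wait = max(0, self.when_ready[set_idx][way] - now)
        return wait + self.lookup_latency

    def access_latency_per_set(self, set_idx: int, way: int, now: int) -> int:
        # Any pending fill in the same set delays the access; `way` is kept
        # only so both policies share the same signature.
        wait = max(0, max(self.when_ready[set_idx]) - now)
        return wait + self.lookup_latency


if __name__ == "__main__":
    cache = ToyCache(num_sets=4, num_ways=2, lookup_latency=2)
    cache.fill(set_idx=1, way=0, now=100, fill_latency=10)  # way 0 busy until 110
    print(cache.access_latency_per_block(1, 1, now=104))  # 2: way 1 not delayed
    print(cache.access_latency_per_set(1, 1, now=104))    # 8: waits 6, then 2
```

Under the per-set policy, an access to another way of a set that is being filled pays the remaining fill time, while accesses to other sets are unaffected.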

Second question: is whenReady updated when writes come from the CPU side, or
only from the memory side? I can only find it being set on the memory side.

Thanks for the help,
William Simon

[gem5-users] How to debug unexpected silent termination of benchmark program in SE mode

2019-08-02 Thread Grant Vesely
Good morning all,

I am attempting to measure the performance of the ARM HPI model
shipped with gem5 (in SE mode) using the XSBench
(https://github.com/ANL-CESAR/XSBench) benchmark program.

gem5 appears to exit normally ("exiting with last active thread
context @ n") without the benchmark completing (the results XSBench
prints to stdout never appear).

I am using gem5.opt 2.0 with the command-line `build/ARM/gem5.opt
configs/example/arm/starter_se.py --cpu="hpi" --num-cores=1
--mem-size=8GB ~/XSBench/src/XSBench`.

I don't have a core dump, stack trace, or source code location (e.g.
if an assertion was violated) that I can use, and the simulation takes
several days to reach the failure point, which appears to be in the
main benchmark loop
(XSBench/src/Simulation.c:run_history_based_simulation()), so I can't
step through it in GDB. What can I do to debug this?

Cheers,

Grant Vesely