[gem5-users] Re: Dirty blocks in L1I cache

2024-04-23 Thread Eliot Moss via gem5-users

On 4/23/2024 3:51 AM, Theodoros Papavasiliou via gem5-users wrote:

Hello everyone,

I'm running some spec2017 benchmarks on gem5 and I noticed there are some dirty blocks inside the L1 instruction cache. 
These blocks are also shared with the L1 data cache.


So, what is a possible explanation for:
1) having dirty blocks in instruction cache and
2) having the same blocks in both L1 data and instruction caches?

System configuration
CPU: O3, clock=3.4GHz
L1D: size=32KiB, assoc=8, latency=2
L1I: size=32KiB, assoc=8, latency=2
L2: size=128KiB, assoc=8, latency=15
No prefetchers

Run for 20 million instructions
I'm using private_l1_private_l2_cache_hierarchy.py


If the program creates or modifies code, then the created / modified
code will be dirty in the L1 data cache.  If that code is then executed,
it will be fetched into the L1 instruction cache.  It is still dirty in
that it has not yet been written back to main memory.  That's one scenario,
though others may be possible.

Regards - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Wondering about PCIe ATS PRI support / ARM SMMU

2024-03-23 Thread Eliot Moss via gem5-users

Dear gem5-ers:

While I'm not working on it just at the moment, I was hoping there might be
support for dynamic page mapping for I/O devices via the PCIe ATS (Address
Translation Services) PRI (Page Request Interface) facility.  My reading of
the ARM SMMU code is that it is not *quite* there, and I am not sure what
would be required to add it.  Also, it would be great if it were present for
x86 systems.  What can folks tell me about all that?  :-)

Regards - Eliot


[gem5-users] Re: warn: MOVNTDQ: Ignoring non-temporal hint, modeling as cacheable!

2024-03-19 Thread Eliot Moss via gem5-users

On 3/19/2024 4:54 AM, Sasi Kiran Reddy via gem5-users wrote:

Hi Guys,

When I simulate some of the spec2006 benchmarks like bzip2, libq, namd, etc., in x86, the simulation gets stuck
in the middle, with the warning message "warn: MOVNTDQ: Ignoring non-temporal hint, modeling as cacheable!".


May I know how to rectify this error and run my simulation smoothly in x86?


The underlying engine does not support the hint,
and will execute the instruction as MOVDQ.  This
is fine.  You can just ignore the warning.  It
may not lead to exactly the same cache contents,
misses, etc., as on a real processor, but there
are many other small ways in which simulations
are not exact replicas of real CPUs.

Best wishes - Eliot Moss


[gem5-users] Re: Dual load cause xbar busy

2024-02-21 Thread Eliot Moss via gem5-users

On 2/21/2024 5:04 AM, chengyong zhong via gem5-users wrote:

Thanks for the clarification.
IMP, it is a common scenario when modeling an HPC core; can anyone provide some
tips or sample programs?
On the other hand, I found that multi-banking is supported in the Ruby cache model (ruby/structures/BankedArray.cc); how is
multi-load implemented in the Ruby model?

Thanks


I've never used Ruby so someone else will have to answer that.

Cheers - EM


[gem5-users] Re: Dual load cause xbar busy

2024-02-20 Thread Eliot Moss via gem5-users

On 2/20/2024 9:29 PM, chengyong zhong wrote:

Hi Eliot,
Thanks for your kind reply. Are there any samples implementing this feature in
the gem5 code repository?


I wrote: Unless I've missed something, gem5 does not provide dual / multi port 
caches at present.

Hence, no example (that I am aware of).  Maybe somebody else on the list knows 
if there is
something out there that can be adapted.

In principle, requests could be sent one after the other over the existing xbar,
but it seemed what you were looking for is concurrent sending of two requests
at the same time, as if there were two separate sets of wires from the cpu to
the data cache.  This does not require another xbar, but it does require an
additional port out of the cpu and an additional port into the data cache.
The problem is that, AFAIK, the code implementing the cache is not prepared
to consider multiple packets arriving in the same cycle or to provide
concurrent cycling of the cache as a dual-ported cache would operate.  It is
more or less assumed that each access cycles a single memory array (on hits, anyway).

The code is pretty complicated because of all the different kinds of packets
and behaviors, so I don't think it would be trivial to grow it into the kind of
parallelism you seem to be seeking.

Can someone else confirm or deny my understanding?

Best - EM


[gem5-users] Re: Dual load cause xbar busy

2024-02-20 Thread Eliot Moss via gem5-users

On 2/20/2024 8:18 AM, chengyong zhong via gem5-users wrote:

Hi all,
I'm using the O3CPU model for performance evaluation. We have two LoadUnits, and I find that if two loads issue at the same time,
the second load is blocked and rescheduled after a few cycles of latency.

The O3CPUAll and Xbar traces show:
"The crossbar layer is now busy from tick xxx to xxx"
"Memory requst from inst was not sent (cache is blocked: 1, cache_got_blocked: 1)"
My question is: how can I modify the model to support dual load/store with the
XBar?


If there is only one bus to the cache, as there seems to be, you need to add
an additional path to the cache and rework the cache to support dual access.
As we computer scientists say "this is a mere matter of programming", but IMO
it is not trivial.  Unless I've missed something, gem5 does not provide dual /
multi port caches at present.
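
For reference, this is roughly what the single path looks like in a classic-caches Python config. This is an illustrative fragment, not a complete script; the names `cpu` and `l1d` are mine, and the CPU class name varies across gem5 versions (e.g. DerivO3CPU in older releases):

```python
# Illustrative fragment only: the single CPU -> L1D connection in a
# classic gem5 cache config.
from m5.objects import O3CPU, Cache

cpu = O3CPU()
l1d = Cache(size='32kB', assoc=8,
            tag_latency=2, data_latency=2, response_latency=2,
            mshrs=4, tgts_per_mshr=20)

# One port pair: every load and store from the LSQ funnels through this
# single connection, so two loads issued in the same cycle contend here.
cpu.dcache_port = l1d.cpu_side
```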

Regards - EM


[gem5-users] Re: Architectural state of registers - O3CPU

2024-02-14 Thread Eliot Moss via gem5-users

On 2/14/2024 1:14 PM, Eliot Moss via gem5-users wrote:

On 2/14/2024 12:52 PM, reverent.green--- via gem5-users wrote:


I would like to add some additional information. The register number does
vary in each iteration; sometimes it is above 100, so I think it should be
the physical register value.  If my understanding is correct, the physical
register should be set during the IEW stage before the instruction is
committed or squashed at the last stage; otherwise out-of-order execution
wouldn't be possible.  So in the end I am searching for the point at which the
physical register is set and marked as ready for subsequent instructions
that depend on this specific register.


Yes, it makes sense that it is a physical register.  For arithmetic, register
to register move, etc., it would be written in IEW.  But for loads, it cannot
be written until LSQ processing, which is later in the pipeline.  I believe
there is a notion of the register being *ready*, and it will be marked ready
when it is written.  Likewise, once all of an instruction's input registers
are ready, that instruction may be executed (the instruction itself becomes
ready).  You can look for the 'writeback' function in lsq_unit.cc.  It clearly
has some relationship to IEW, but it explicitly calls completeAcc, which does
the actual write into the register.  The specific code for that came from the
instruction's template.  This is necessarily so - consider the difference
between loading a byte (say) vs a word, and sign- vs zero-extended values.


See also function writebackInsts in iew.cc.  EM


[gem5-users] Re: Architectural state of registers - O3CPU

2024-02-14 Thread Eliot Moss via gem5-users

On 2/14/2024 12:52 PM, reverent.green--- via gem5-users wrote:


I would like to add some additional information. The register number does
vary in each iteration; sometimes it is above 100, so I think it should be
the physical register value.  If my understanding is correct, the physical
register should be set during the IEW stage before the instruction is
committed or squashed at the last stage; otherwise out-of-order execution
wouldn't be possible.  So in the end I am searching for the point at which the
physical register is set and marked as ready for subsequent instructions
that depend on this specific register.


Yes, it makes sense that it is a physical register.  For arithmetic, register
to register move, etc., it would be written in IEW.  But for loads, it cannot
be written until LSQ processing, which is later in the pipeline.  I believe
there is a notion of the register being *ready*, and it will be marked ready
when it is written.  Likewise, once all of an instruction's input registers
are ready, that instruction may be executed (the instruction itself becomes
ready).  You can look for the 'writeback' function in lsq_unit.cc.  It clearly
has some relationship to IEW, but it explicitly calls completeAcc, which does
the actual write into the register.  The specific code for that came from the
instruction's template.  This is necessarily so - consider the difference
between loading a byte (say) vs a word, and sign- vs zero-extended values.

Regards - EM


[gem5-users] Re: Architectural state of registers - O3CPU

2024-02-14 Thread Eliot Moss via gem5-users

On 2/14/2024 12:26 PM, reverent.green--- via gem5-users wrote:

Hey Eliot,
thank you for your answer. I have a follow-up question.
I know that there are more physical registers than architectural ones and that the architectural state should be set in 
the final commit stage.
So if the debug message linked in my earlier mail shows e.g. "Setting int register 54 to 0x53000", this "register 54" 
should be a physical register, and it can be used without setting the architectural state?

Do you know at which point in the O3 pipeline this physical register is set for 
an instruction?


That's something where I'd need to dig into the code to make sure.  However,
the number 54 is fairly large, so my first impression is that it is a physical
register number, not a logical (architectural) one.  On the other hand, if you
count up integer registers, floating point registers, vector registers, etc.,
54 could be in the range of the architectural registers.  I do know that if
you request debug trace information from gem5, it will tend to refer to
architectural registers.

I don't know precisely where the physical register is set, but my first
thought is IEW - the W part stands for Writeback, i.e., when registers
typically are written.  However, loads are probably written later since they
are not computational but wait for a response from the cache.  As I recall,
the load/store queue processing is a separate step in the pipeline, coming
later than IEW.

EM


[gem5-users] Re: Architectural state of registers - O3CPU

2024-02-14 Thread Eliot Moss via gem5-users

On 2/14/2024 11:19 AM, reverent.green--- via gem5-users wrote:

Hello everyone,
can someone give me a hint where exactly in the code the architectural state of (load) instructions is set and 
becomes visible? I tried to trace instructions during execution via log outputs, but got a bit lost during the IEW 
stage.
I know that instructions which depend on specific registers will wait until the register is marked ready by an 
earlier usage. (https://github.com/gem5/gem5/blob/stable/src/cpu/o3/regfile.hh#L273)

But is this already equivalent to the architectural state?

And how is this handled during wrong speculative execution, with the 
subsequent rollback/squashing?
Kind regards
Robin


A typical out-of-order processor does register renaming, so there are
generally *many* more physical registers than architectural ones, and the
hardware maintains a dynamic mapping.  If necessary, the architectural state
can be constructed, but generally would not be unless you're switching threads
or something.  While IEW may update the registers (I believe), it is the
commit stage that makes the change "permanent".
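
Conceptually (this is a simplified sketch, not gem5's actual code), renaming maintains a map from architectural to physical registers plus a free list; each new destination gets a fresh physical register, and sources read the current mapping first:

```python
# Conceptual register-renaming sketch (not gem5's implementation).
free_list = [f"p{i}" for i in range(8)]   # pool of physical registers
rename_map = {}                            # architectural -> physical

def rename(dst, srcs):
    # Read source mappings BEFORE allocating dst (handles r1 = r1 + ...).
    mapped_srcs = [rename_map[s] for s in srcs]
    phys = free_list.pop(0)                # fresh physical reg for dst
    rename_map[dst] = phys
    return phys, mapped_srcs

rename_map["r1"] = free_list.pop(0)        # architectural r1 -> p0
rename_map["r2"] = free_list.pop(0)        # architectural r2 -> p1
d1, s1 = rename("r3", ["r1", "r2"])        # r3 = r1 + r2
d2, s2 = rename("r1", ["r3"])              # r1 = r3 + 1 (WAR hazard removed)
print(d1, s1, d2, s2)
```

At commit, the mapping for the committed destination becomes part of the "permanent" architectural state; on a squash, the speculative mappings are simply discarded and the physical registers returned to the free list.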

Does that help?

Eliot Moss


[gem5-users] Re: Effective address and ISA

2024-02-06 Thread Eliot Moss via gem5-users

On 2/6/2024 11:13 AM, Nazmus Sakib via gem5-users wrote:

I think gem5 has this SplitDataRequest() method that breaks up a request if it 
would need more than one cache line.
In fact, the page fault is occurring before it goes to the cache. The panic message says the address is 0x400. By 
looking into the disassembly and the output log of --debug-flags=ExecAll, I think the address is an instruction address, 
as I have found addresses starting with 0x400, for example:

0x400bf8 @__libc_start_main+808    :   movz   x1, #0,

However, the last instruction I see in the output from the ExecAll flag is a store 
instruction:
0x41c0ec @_dl_debug_initialize+124    :   stlr   x0, [x3]          : MemWrite : 
 D=0x00492000 A=0x498028
Right after this, the panic message occurs.
In fact, using --debug-flags=LSQUnit, I can see the message "Fault on store 
pc", which points to this store instruction.


Yes, it *can* split requests; I'm just not sure it's prepared
to do so in this case.

Since you have mentioned instructions, I'm now wondering if there
could be an issue with really small instruction cache lines.

The store might be faulting because of address translation, but it
might also be faulting because of ordering constraints (it's a
store release instruction).  It is 64-bit, which means it would
cross cache lines.  Maybe that's disallowed for such ordering
ops?

EM


[gem5-users] Re: Effective address and ISA

2024-02-05 Thread Eliot Moss via gem5-users

On 2/5/2024 1:39 PM, Nazmus Sakib wrote:

I am trying to see how small I can set the cacheline size (gem5 ARM, test 
binary is aarch64).
When I set it to 4 bytes, I get a page fault for address 0x400c00. After a bunch of debugging (using prints of 
my own and debug flags), I think the problem is that when trying to generate address 0x400c00, it is only generating 0x400, 
and since that is not in the VMA list, the fixfault() function cannot assign a new page, nor can the page table lookup() access 
the already assigned page.
I am guessing the MSBs of 0x400c00 are somehow lost in address generation. So I was trying to find where this is 
happening and why.
I know a 4-byte cacheline is unrealistic, and also I am running a 64-bit binary, but I want to find the exact reason for 
this page fault (I might also be missing some basic understanding of computer systems theory).

Note: cacheline size=8 (bytes) works fine!


From: Eliot Moss
Sent: 05 February 2024 09:47
To: The gem5 Users mailing list
Cc: Nazmus Sakib
Subject: Re: [gem5-users] Effective address and ISA


On 2/5/2024 10:41 AM, Nazmus Sakib via gem5-users wrote:

Hello.
I was trying to find how the virtual (logical) addresses are calculated and 
passed on to the cpu.
In the load/store queue, after a request object is created, the 
corresponding instruction is assigned an effective
address from this request object, something like inst->effaddr = req->getVirt(). I 
found setVirt(), the set-virtual-address
function. But I cannot find who calls this setVirt() and where.

For example: ldr x0, [x1,#1024] // an ARM instruction
Here, the address would be x1+1024, i.e., the content of the x1 register plus the 
immediate 1024.
How and where does this address calculation take place? Where can I see the 
contents of the x1 register being added to 1024,
and who calls the setVirt() function?
As I understand it, address calculation is ISA specific, and the dynamic/static 
instruction classes work with ISA files to
get this done. I wanted to know how this works: the interface connecting 
ISA features to the cpu pipeline.


src/arch/arm/isa/insts/macromem.isa has definitions of micro-ops used for
memory instructions.  In there you can find some of the effective
address calculation code being generated (look for eaCode and EA).
See also the instruction templates in src/arch/arm/isa/templates/mem.isa.

These isa files are processed by a custom macro processor to generate
the actual decoding, execution, etc., functions, which you can find
in the build hierarchy.

The whole construction is somewhat complex, but I hope I've answered
your question.  Was it just a point of curiosity, or is there something
specific you're trying to do?

Eliot Moss


A guess would be that the code is not set up to expect that an aligned 8-byte
quantity might break across cache lines.  To make that work, the 8-byte
access would have to be broken into two 4-byte accesses, since each can
miss separately.  It would likely take deeper changes to make that work,
though I would think it is possible with concomitant effort.
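
The splitting itself is simple to state (this is a sketch of the idea, not gem5's SplitDataRequest code): chop the access at every cache-line boundary it crosses:

```python
def split_across_lines(addr, size, line_size):
    """Split a memory access into per-cache-line chunks (conceptual sketch)."""
    chunks = []
    while size > 0:
        line_end = (addr // line_size + 1) * line_size  # next line boundary
        n = min(size, line_end - addr)                  # bytes left in this line
        chunks.append((addr, n))
        addr += n
        size -= n
    return chunks

# An 8-byte access at 0x1006 crosses one boundary with 8-byte lines,
# but needs three chunks with 4-byte lines.
print(split_across_lines(0x1006, 8, 8))
print(split_across_lines(0x1006, 8, 4))
```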

When you say it is generating 0x400, is that as the whole address?  It
looks suspiciously like the page number (i.e., shift right by 12 bits).
But anyway, as I mentioned, a cache line size of 4 bytes has other
problems with it and that may somehow be leading to the behavior you see.

EM


[gem5-users] Re: Effective address and ISA

2024-02-05 Thread Eliot Moss via gem5-users

On 2/5/2024 10:41 AM, Nazmus Sakib via gem5-users wrote:

Hello.
I was trying to find how the virtual (logical) addresses are calculated and 
passed on to the cpu.
In the load/store queue, after a request object is created, the corresponding instruction is assigned an effective 
address from this request object, something like inst->effaddr = req->getVirt(). I found setVirt(), the set-virtual-address 
function. But I cannot find who calls this setVirt() and where.


For example: ldr x0, [x1,#1024] // an ARM instruction
Here, the address would be x1+1024, i.e., the content of the x1 register plus the 
immediate 1024.
How and where does this address calculation take place? Where can I see the contents of the x1 register being added to 1024, 
and who calls the setVirt() function?
As I understand it, address calculation is ISA specific, and the dynamic/static instruction classes work with ISA files to 
get this done. I wanted to know how this works: the interface connecting ISA features to the cpu pipeline.


src/arch/arm/isa/insts/macromem.isa has definitions of micro-ops used for
memory instructions.  In there you can find some of the effective
address calculation code being generated (look for eaCode and EA).
See also the instruction templates in src/arch/arm/isa/templates/mem.isa.

These isa files are processed by a custom macro processor to generate
the actual decoding, execution, etc., functions, which you can find
in the build hierarchy.

The whole construction is somewhat complex, but I hope I've answered
your question.  Was it just a point of curiosity, or is there something
specific you're trying to do?

Eliot Moss


[gem5-users] Re: How Can I Save the output file-Full System Simulation

2024-01-16 Thread Eliot Moss via gem5-users

On 1/16/2024 1:15 AM, sun2k23 via gem5-users wrote:


I think maybe you can try the /sbin/m5 utility, using the writefile option. Then you can redirect files from the simulated OS 
environment to the local host.

The command format is like below:


S2K



At 2024-01-16 11:41:57, "hu miao via gem5-users"  wrote:

Hi:
     First of all, I am very, very grateful to Hao Nguyen, who answered me 
another question about full system
simulation the other day, because I don't know much about gem5 user lists, 
and clicking on the reply below always
shows a connection error, so I haven't been able to reply.
     I still want to ask a question today about full-system simulation. 
I'm using a full system simulation to test
the MachSuite benchmark. MachSuite contains a lot of algorithms, and I want 
to know how long each algorithm runs. I
have the running time of each algorithm printed to an output .txt, which 
makes it easier for me to organize the
data, but I found that the directory that exists after logging in with 
m5term localhost 3456 is [root@gem5-host],
and the file is also in that environment. How can I save the output .txt 
file in my own server directory, or how
can I retrieve the output.txt file from [root@gem5-host]?
If someone could answer my question, I would greatly appreciate it.


Actually, you can set the output directory from the command line to gem5.
I do that so that each run goes into a different directory.  I'm sure that
people have written all kinds of shell-script wrappers (as I have) to
handle this.

Eliot Moss


[gem5-users] Re: About gem5 stats granularity

2024-01-12 Thread Eliot Moss via gem5-users

On 1/12/2024 7:57 AM, elio.vinciguerra--- via gem5-users wrote:
Hi everybody, I would like the statistics provided by gem5 in stats.txt at instruction-level granularity. I 
noticed that by default gem5 provides them globally, from the beginning of execution to the end. Is it possible to change 
this behavior and somehow get the stats for each simulated instruction?


I'm not sure this makes much sense.  If you mean stats
broken down for each individual instruction executed,
the output would be prohibitively large.  It also would
be hard to pin down / define, since in pipelined and
out-of-order processors, the execution of different
instructions is overlapped: an instruction can be fetched
but not executed, it can be executed speculatively and
later dropped, etc.

Consider just the "simple" question: How long did this
instruction take to execute?  It might have taken (say)
10 cycles in a deep pipeline to go all the way from being
fetched to committed, but its contribution to the total
execution time may be just one cycle or even zero (a
correctly predicted branch on some architectures).

Maybe if you tell us what you are really trying to get at
we can be more helpful :-) ...

Eliot Moss


[gem5-users] Re: importing packages like numpy

2024-01-02 Thread Eliot Moss via gem5-users

On 1/2/2024 9:28 AM, saras nanda via gem5-users wrote:

Hi Everyone ,

I am doing a full system simulation on ARM using fs_bigLITTLE.py and fs_power.py. I am trying to import the numpy library 
in my python script, but it has been running for 3-4 days and the library is still not imported. How can I speed this 
up? Once I import it I can checkpoint, but somehow it is very slow and takes a really long time. Please provide me some 
suggestions on this.


You can run to checkpoint using a simple and fast cpu model,
such as the atomic simple cpu.  Then you can run after checkpoint
with a more complex cpu model.  Of course, the system may already
be doing that for you.

Another possible issue is whether you're firing up a gazillion
server processes in the OS.  Trimming down the boot sequence
helped me save a lot.

HTH

Eliot Moss


[gem5-users] Re: config.dot generation fails

2023-12-28 Thread Eliot Moss via gem5-users

On 12/28/2023 9:24 PM, Arka Maity via gem5-users wrote:

Hi Eliot,


Thanks for your response. Yes, the above issue is just a warning and does not seem to affect the actual simulation runs. 
I was just worried that something could be wrong with my configuration, which might cause issues later.


Is there a way to suppress the generation of the config.dot files? I tried to search for any cmd line options, but no 
luck. I rely more on config.[ini/json] files and so config.dot is not useful to me anyways.


The --dot-config parameter gives the name of the file to use, default "config.dot".  Perhaps setting --dot-config=None 
or something like that will suppress it.


Regards - EM


[gem5-users] Re: config.dot generation fails

2023-12-28 Thread Eliot Moss via gem5-users

On 12/27/2023 9:53 PM, Arka Maity via gem5-users wrote:

Hi All,

My ruby configuration instantiates multiple ruby networks, Memtesters, and CHI controllers. When I start the 
simulation, I encounter the following warning.


Warning: flat edge between adjacent nodes one of which has a record shape - replace records with HTML-like labels
Edge system_cpu3_data_sequencer_in_ports -> system_cpu5_port
Error: lost system_cpu5_port system_cpu3_data_sequencer_in_ports edge


warn: failed to generate dot output from /config.dot.

These warnings are preceded by a huge dump of what looks like a part of the 
above config.dot file.

When I change the number of Memtesters (instantiated as system.cpu….), the warning disappears. Any ideas on how to 
resolve this? Sorry, I am unable to recreate this issue with a simpler configuration.


This seems to be something deeply technical in dot, a graphviz tool.  See this, 
for example:

https://forum.graphviz.org/t/why-does-this-link-not-show-up-in-this-graph/258

Since it has to do with the layout of the graph, changing the number of nodes, etc., certainly could affect it.  I think 
for gem5 purposes, we (you) can probably live with it, just realizing that some edge that is supposed to be there won't 
be.  The program / script that sets up to call dot might possibly be changed to do some things in a different way to 
avoid this, of course.


Hope this helps, at least with understanding what is going on.

Eliot Moss


[gem5-users] Re: Read Miss Operation at Last Level Cache (LLC)

2023-12-08 Thread Eliot Moss via gem5-users

On 12/8/2023 1:31 AM, zahra moein via gem5-users wrote:

Hello,

Thank you very much for your response. I appreciate your assistance, and I made an effort to understand the code, which 
provided me with a better understanding.


However, I couldn't determine the exact origin of the packet sent by sendTimingResp() for it to be received by 
recvTimingResp().


I defined a counter and counted the number of times handleFill() was called in the last-level cache. This count was 
equal to system.l2.overallMshrMisses::total and system.mem_ctrls.dram.numReads::total.

The last-level cache is L2, and the prefetcher is disabled.

Based on this, can I conclude that whenever handleFill() is used in the last-level cache, a block is evicted from the 
last-level cache, and a clean block is brought from the main memory to the last-level cache?


I think your conclusion may be too strong because a block *can* be moved
from the IO cache (for example) to the last level cache.  A block brought
from memory is by definition clean, so that part holds.  I think violation of
your assumption would be rare, and perhaps in some systems it would always
hold.

HTH.

Eliot Moss


[gem5-users] Re: Read Miss Operation at Last Level Cache (LLC)

2023-12-01 Thread Eliot Moss via gem5-users

On 12/1/2023 6:24 PM, zahra moein via gem5-users wrote:

Thank you for your response.

I would like to seek further clarification regarding the function recvTimingResp:
  void BaseCache::recvTimingResp(PacketPtr pkt)

Could you please confirm whether this receives a packet coming from memory? If it does, I would 
appreciate your guidance on how to determine whether this packet is a response to a read-miss request in the last-level 
cache.


Thank you for your valuable assistance.


It's a packet received from whatever the cache is connected to.  If it's the
last level cache, then it should come from the memory bus, which would have
gotten it from memory.  But you need to analyze what sort of packet it is, and
the nature of the *request* that resulted in the packet, etc.  You should at
least get a sense of the control flow of that function and perhaps of some of
the significant functions it calls.

If the data are headed for this cache, then there will be some place where
they get loaded into a cache block.

I'm not sure you can say with 100% certainty it has to do with a read miss,
though if you look for where read misses generate requests, that may be
helpful.  One possibly confounding case is packet arriving because of prefetch
- but I think they should be marked as such.

Btw, another case is when another cache responds, but from the viewpoint of
this cache it will still be a recvTimingResp (at the memory bus, a
recvTimingSnoopResp will be involved).

My intention was to point you where to start understanding the code, well
enough at least, not to indicate the specific code location.  (My notion is
that you'll be better off if you study it some and learn more about how it
works (not every detail, since there's a lot of complexity there!), as opposed
to being directed to specific line(s) of code.)  Another clue would be where
statistics counters get updated :-) .

It's conceivable that it will be easier to find the data coming from memory
due to read misses by looking at packets coming to the memory bus from the
memory controller(s).  (I assume you're not interested in data arriving from
I/O devices.)

If you end up really stuck, I may be able to find time to take a look.

HTH -- EM


[gem5-users] Re: Read Miss Operation at Last Level Cache (LLC)

2023-11-30 Thread Eliot Moss via gem5-users

On 11/30/2023 2:07 PM, zahra moein via gem5-users wrote:

Hi everyone,

As we already know, a "Read miss" at the last cache level (LLC) means that the desired block in the LLC for reading was 
not found. Consequently, it is necessary to locate a victim block and copy the desired block from the main memory to the 
victim block's location in the last cache level.


I want to see the data content of the blocks that were brought to the last level of the cache due to a read miss from 
the main memory. However, I am unsure where the Read Miss operation is implemented. Based on my research, it appears 
that the function:


CacheBlk* BaseCache::handleFill(PacketPtr pkt, CacheBlk *blk, 
PacketList &writebacks, bool allocate)

may be helpful in this regard.

I would greatly appreciate any suggestions or guidance on how to effectively validate my findings. For your information, 
I am utilizing a classic cache configuration and an O3 CPU type.


Thank you for your attention to this matter, and I eagerly look forward to your 
guidance.


If you're running a timing model, the packet will arrive via
a recvTimingResp call.  The code for that function looks
at the nature of the response and possibly loads data into a
cache block.  That's where I would look.

Best wishes - EM


[gem5-users] Re: Understanding Squashed Loads/Stores

2023-11-21 Thread Eliot Moss via gem5-users

On 11/21/2023 12:16 PM, Arth Shah via gem5-users wrote:

Hi everyone,

I'm running a benchmark on the O3CPU model (aarch64) and see something strange that I wasn't able to understand. I see a 
lot of Squashed loads and stores in the LSQ but it doesn't seem like it is due to branch misprediction or Cache 
blockages. What else could cause this magnitude of squashes in LSQ?


cpu.lsq0.squashedLoads       657977
cpu.lsq0.squashedStores      1386633

cpu.lsq0.blockedByCache            0
cpu.iew.branchMispredicts           31


I haven't dug into the code, but I wonder what happens
on a TLB miss / page fault.  The load/store might be
part way through the process and then get squashed ...

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Is SMT Supported in ARM Full System Simulation

2023-10-10 Thread Eliot Moss via gem5-users

On 10/10/2023 4:04 AM, Abdelrahman S. Hussein via gem5-users wrote:

Hello,

I am considering using ARM ISA for simulation on gem5. I understand that SMT is NOT supported for Full System Simulation 
for x86. I just would like to know if gem5 supports SMT for Full System simulation in ARM ISA.


Not as far as I know.  This has to do with the underlying
generic models used in gem5.  They are customized to each
instruction set by fiddling parameters, adding functional
units, etc., and of course the instruction formats and
actions can be adjusted.  But the nature of the models
(in-order, out-of-order) are the same.

If I am wrong I'm sure someone will correct me!

Best wishes - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Squashing Instructions after Page Table Fault

2023-10-09 Thread Eliot Moss via gem5-users

You observed that the check on line 471 in tlb.cc did not seem to be the one
causing the fault in the case you were looking at.  It occurs to me that the
line 471 check is for a *resident* page.  If the page is *not* resident, some
other check would apply, and the fault might be raised when the OS examines
the PTE to determine what to do with a disallowed access to a non-resident
page.

Could that be the scenario you were looking at?  That would indeed seem to be
more involved, though at the point gem5 does the interrupt for a non-resident
page (one not in the TLB) you might be able to more directly do a check of the
PTE.  To do that you would need to emulate walking the page tables (hoping
that all the relevant page table pages are themselves resident).
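
For reference, a sketch of the address decomposition such an emulated walk
would use on x86-64 with 4-level paging and 4 KiB pages (the bit positions
are the architectural ones; the function name is just illustrative):

```python
# Split an x86-64 virtual address into its 4-level page-table indices
# (PML4, PDPT, PD, PT) plus the page offset, for 4 KiB pages.
def walk_indices(va):
    offset = va & 0xFFF              # low 12 bits: offset within the page
    levels = []
    for shift in (39, 30, 21, 12):   # PML4, PDPT, PD, PT index positions
        levels.append((va >> shift) & 0x1FF)  # each index is 9 bits wide
    return levels, offset

# Example: the canonical user-space top address 0x00007FFFFFFFF000
idx, off = walk_indices(0x00007FFFFFFFF000)   # -> [255, 511, 511, 511], 0
```

Each index selects a PTE in the table at that level; the walk follows the
physical pointer in each entry down to the leaf PTE, whose U/S bit is the
one in question.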

Yes, possibly a bit of a mess ...

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Squashing Instructions after Page Table Fault

2023-10-04 Thread Eliot Moss via gem5-users

On 10/4/2023 10:03 AM, reverent.green--- via gem5-users wrote:

Hi Yuan,



thank you very much for your detailed response. My understanding of the
fault handling in gem5 is getting better and better. Using debug flags, I
can trace the control flow during the execution of my code.



I am currently inspecting tlb.cc in further detail, but I am still searching
for the exact check for my problem.  To further specify my question:



During the attempt to access kernel memory, the “user/supervisor” (U/S)
pagetable attribute is used to check whether this page table belongs to
kernel memory or not. If I want to access the memory, it should raise the
page table fault. I am looking for this specific check. My goal is, to
experiment with gem5 and to customize it. Currently, the instruction is not
executed when raising a Page Table Fault. In a first step, I want to change
the check in order to execute the instruction although it wants to access
kernel memory. So I explicitly search for this check inside this command
chain during the Page Fault handling.



Thank you very much in advance.



Best regards



Robin


Assuming we're talking about the x86 architecture, line 471 in tlb.cc is where
the check in question happens:

https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/x86/tlb.cc#L471

Note that the raw bits of the PTE have been abstracted out in the gem5 TLB
entry data structure, hence properties such as entry->user.

HTH

Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Accessing dependent memory locations in a single instruction

2023-09-25 Thread Eliot Moss via gem5-users

On 9/25/2023 4:59 PM, Leonard Peterson via gem5-users wrote:

Hello,

I'm trying to implement an instruction "myinst" that accesses dependent memory locations (similar to 
pointer chasing) using the TimingCPU model (initiateAcc() and completeAcc()). For example:


  myinst r0,0x14000

The above line will first read 8 bytes from address 0x14000, which is, say, 0xFF000. It will then read 
8 bytes from address 0xFF000 (or "qword ptr [0x14000]") into register r0. This requires issuing 
another initiateMemRead() call within the instruction's completeAcc() function. However, I haven't 
found another example instruction that does this.


I wonder whether this is currently supported by TimingCPU and what is the 
correct way to approach this?


Intel machines are an example of ones that have complex addressing modes.  The way
gem5 handles this is for the instruction, called a macro instruction, to be
broken into two or more micro instructions for execution.  This is done in the
decoding process.  It is typical in the out-of-order cpu, but could apply to
others, I think.

Hope this helps ... EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Counters for # DRAM reads, writes, page hits, and page misses

2023-09-08 Thread Eliot Moss via gem5-users

On 9/8/2023 2:55 AM, Aritra Bagchi via gem5-users wrote:

Hi all,

Can anyone indicate how to extract performance counters such as the number of DRAM read operations, 
the number of DRAM write operations, the number of times a page miss occurs, etc.?


Inside src/mem/mem_ctrl.cc, MemCtrl::recvTimingReq( ) method, there are two methods for inserting 
new read and write operations into their respective queues, namely addToReadQueue( ) 
and addToWriteQueue( ). Can the #reads and #writes be obtained from here? And what about # page 
hits/misses? Any help is appreciated.


The way things generally work in gem5 is that you get a stats dump at
the end of a run.  There are also ways to request such dumps more frequently.
You get a lot of details about accesses to caches and memories.  Are you
looking at stats dumps and not seeing what you hope for?
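
If the counters are in the dump but hard to pick out, the stats.txt file is
easy to filter - each stat is one "name  value  # description" line in the
classic format.  A minimal sketch (the stat names in the sample are
illustrative; check your own dump for the exact names):

```python
# Minimal parser for gem5's stats.txt format: each stat line is
# "name  value  # description".  Returns {name: value-as-string}.
def parse_stats(text):
    stats = {}
    for line in text.splitlines():
        parts = line.split('#')[0].split()   # drop the description
        if len(parts) >= 2:
            stats[parts[0]] = parts[1]
    return stats

sample = """
system.mem_ctrl.readReqs    12345   # Number of read requests
system.mem_ctrl.writeReqs    6789   # Number of write requests
"""
counts = parse_stats(sample)
```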

Best - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Latency or speed

2023-09-05 Thread Eliot Moss via gem5-users

On 9/5/2023 9:30 AM, 中国石油大学张天 via gem5-users wrote:
Hello, I would like to ask, in Gem5, will differences in the order of magnitude of operations such 
as Add affect factors such as latency or execution speed?


I'm not sure how to answer that.  Things depend so much on processor
model and workload.  If an operation is very common, and if other concurrent
work can't hide its latency, then overall execution time will be longer
with a higher latency.  Likewise, instructions per clock (speed) will suffer.

But there are many factors.  Remember a functional unit might be able to take a
new add on every cycle, but might require (say) 5 cycles to complete each one.
The *speed* is one per cycle, but the *latency* is 5.  Then again, an FU might
not be pipelined, and its speed and latency would both be one operation per 5
cycles.  Yet, there might be more than one FU that handles adds, so there are
other sources of parallelism.
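
The arithmetic behind that distinction is easy to sketch (the cycle counts
here are made up for illustration):

```python
# Completion time of n back-to-back independent ops on one functional unit.
def pipelined_cycles(n, latency, initiation_interval=1):
    # A new op can start every initiation_interval cycles; the last one,
    # started at (n-1)*interval, still needs `latency` cycles to finish.
    return (n - 1) * initiation_interval + latency

def unpipelined_cycles(n, latency):
    # The FU is busy for the full latency of each op before the next starts.
    return n * latency

# 10 independent adds with latency 5:
fast = pipelined_cycles(10, 5)     # 14 cycles
slow = unpipelined_cycles(10, 5)   # 50 cycles
```

With dependent ops the pipelined case degrades toward the unpipelined one,
which is the point about instruction dependencies below.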

I think all we can offer is a kind of general insight like this.  Plus I note
that part of what matters in a particular workload is the dependencies between
instructions.  If the whole rest of the computation requires the result of that
add, then things will be slower than if there are multiple things going on.

And all of this assumes the out-of-order processor model.  An entirely in-order
processor will tend to be more strongly affected by the high latency of a common
operation.

None of this is particular to gem5 - it's general principles of computer 
architecture.

Maybe others can refine / add to this ...

HTH -- EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: can't run riscv simulation with any CPU model except Atomic

2023-08-24 Thread Eliot Moss via gem5-users

On 8/24/2023 1:35 AM, oe-fans via gem5-users wrote:
After 2 hours, the message appears in m5term. My CPU is a Xeon E5-2687W v3. I think the boot time is too 
long; is there any way to accelerate it?


I have seen it take longer if the number of cpus told to gem5
does not match the number of cpus in the Linux dtb file.  This
was a while ago, so I'm not sure it's still true ...

Best - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: How to connected CpuSidePort and MemSidePort within a simobject ( not in config file)

2023-08-14 Thread Eliot Moss via gem5-users

On 8/14/2023 3:47 PM, Khan Shaikhul Hadi wrote:
Instead of directly connecting all level 1 caches ( icache, dcache etc) to CPU and next level bus, I 
want to create a controller module that will have all those caches . This controller module will 
receive all cpu requests and distribute them to caches. Similarly it will receive all requests to 
next level caches and send them to next level cache. The reason I want to do such a thing is that, 
for my current work, I need to observe requests and responses from caches and modify them based on 
some protocol. Initially I thought of modifying the caches all together, but that became more 
complicated and I thought if I could connect those caches within a module, that would simplify 
things without sacrificing any performance modeling of the caches. Problem is I could not figure out 
how to connect the cache simobject cpu and mem side port with the internal port of the controller 
module.


Best
Shaikhul


I'd be tempted to make a new subclass of CommMonitor and interpose instances between modules.  You 
could keeps the stats capabilities or not.


Regards - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: How to connected CpuSidePort and MemSidePort within a simobject ( not in config file)

2023-08-14 Thread Eliot Moss via gem5-users

On 8/14/2023 1:42 PM, Khan Shaikhul Hadi wrote:

Initially I was thinking doing something like this as you suggested:

CpuSidePort cacheMemSidePortConnection = cache.memSidePort;
MemSidePort cacheCpuSidePortConnection = cache.cpuSidePort;


problem is when I looked into how python code done this connection, constructor 
has


cpuSidePort(p.name + ".cpu_side_port", *this, "CpuSidePort"),
memSidePort(p.name + ".mem_side_port", this, "MemSidePort"),


So, when I was thinking about using CpuSidePort cacheMemSidePortConnection = cache.memSidePort, I 
could not make sense of how to deal with those constructor arguments.  Especially the "this" pointer, 
which is supposed to be a pointer or reference to the cache object, but my simobject is not the cache 
object.  So, I'm not sure how to deal with this.


Thanks for your input though.

Best
Shaikhul


If you have already declared Cache cache, then the Cache constructor has run 
and the fields are
available for assigning to other variables - assuming they are public or you 
can arrange access.
Of course you really need to write:

Cache cache(parameters to Cache constructor);

But perhaps you could clarify what you're really trying to do (bigger picture) 
rather than
saying "I want this port connected to that one".

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: How to connected CpuSidePort and MemSidePort within a simobject ( not in config file)

2023-08-14 Thread Eliot Moss via gem5-users

On 8/14/2023 11:58 AM, Khan Shaikhul Hadi via gem5-users wrote:
In my code I'll have a simobject which has its own cache. As classical cache use CpuSidePort and 
MemSidePort to receive and respond to request, I want to create some internal CpuSidePort and 
MemSidePort in my simobject like below


class SimObject : public ClockedObject
{
    Cache cache;
    CpuSidePort cacheMemSidePortConnection;
    MemSidePort cacheCpuSidePortConnection;

    // CpuSidePort and MemSidePort classes could follow the same structure
    // as BaseCache's CpuSidePort and MemSidePort

    ...
    ...
    ...
}

My question is how could I connect this ports with cache such that when I schedule some request pkt 
using cacheCpuSidePortConnection, cache's cpuSidePort will catch that packet, when cache's 
memSidePort schedule some req pkt, cacheMemSidePort will catch that pkt.
In the front end, I could see in the library, we could do that using param ( cache.cpu_side_port = 
cpu.mem_side_port). But could not find any reference that connects to port within a simobject.

Any suggestions  or resources which I could follow ?


Why not just do this?

CpuSidePort cacheMemSidePortConnection = cache.memSidePort;
MemSidePort cacheCpuSidePortConnection = cache.cpuSidePort;

Then whatever is connected with the cache is exactly what's connected to your
thing - though I am not sure what you get by doing this.

The connecting up is generally done by Python code in terms of parameters
in the Python classes.  Thus Cache.py has class BaseCache  with:

cpu_side = ResponsePort(...)
mem_side = RequestPort(...)

and in cache/base.cc the constructor has:

cpuSidePort(p.name + ".cpu_side_port", *this, "CpuSidePort"),
memSidePort(p.name + ".mem_side_port", this, "MemSidePort"),

cache/base.hh has (after a whole series of class definitions):

   CpuSidePort cpuSidePort;
   MemSidePort memSidePort;

Here's another possibility: write your thing as a subclass of Cache, assuming 
it is
supposed to act somewhat like a cache.

I don't feel totally confident in guiding you here since it's not clear what 
you're
really hoping to do ...

Maybe others will have other, more useful, perspectives for you ...   EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Build Error 134

2023-08-06 Thread Eliot Moss via gem5-users

On 8/6/2023 1:50 AM, Kaiwen Xue via gem5-users wrote:

On Sat, Aug 5, 2023 at 5:13 AM Eliot Moss via gem5-users wrote:


On 8/5/2023 2:16 AM, Kaiwen Xue via gem5-users wrote:

Hi,

I'm new to gem5 and trying to follow the official tutorial [1] to
build an x86 opt target from commit hash 48b4788.

The compilation failed with Error 134. Outputs didn't seem to be
meaningful - they are just normal building messages and ended with
Error 134. I shall attach a detailed build output at the end of this
email.

The build command I used was `python3 `which scons`
build/X86/gem5.opt`. My python3 version is 3.8.10. My scons version is
3.1.2. I've checked all the other dependencies and they seem to be
fine. My machine is a physical server with ubuntu 20.04 running Linux
5.15.

In addition, every time I remove the built directory and rebuilt, the
file names before the Error 134 message are different. e.g., the
attached output has "scons: *** [build/X86/cpu/o3/O3Checker.py.cc]
Error 134", but the file name would be different across different builds.

Is there a way to narrow down this issue?

Thanks!
Kevin


My guess is that you ran out of memory - some of the compilations need quite a
lot!  Since scons typically builds in parallel, you get some variation in
which jobs are running when.  You *might* try compiling just one thing at a
time (-j1).  No need to rm everything - you can continue where it aborted.
But still, if even one job demands too much memory, you will fail again.


Hi Eliot,

Thanks for the response! However, I have more than abundant resources
on my server. It's not a virtual machine. It has more than 200GB of
free disk space and 100GB of free memory. I can't continue where it
aborted as well, because the error seemed to start repeating every
time and more compilation wouldn't make more progress.

I'm suspecting that might be a compiler bug though. The parser
reported shift/reduce conflict:
Generating LALR tables
WARNING: 4 shift/reduce conflicts
WARNING: 1 reduce/reduce conflict
WARNING: reduce/reduce conflict in state 98 resolved using rule
(params -> empty)
WARNING: rejected rule (types -> empty) in state 98

which might lead to the following return type warnings in the cache
coherence .sm file:
MESI_Two_Level-L1cache.sm:246: Warning: Non-void return ignored,
return type is 'bool'
MESI_Two_Level-L1cache.sm:248: Warning: Non-void return ignored,
return type is 'bool'
MESI_Two_Level-L1cache.sm:887: Warning: Non-void return ignored,
return type is 'Tick'
MESI_Two_Level-L1cache.sm:999: Warning: Non-void return ignored,
return type is 'Tick'
MESI_Two_Level-L1cache.sm:740: Warning: Unused action:
e_sendAckToRequestor, send invalidate ack to requestor (could be L2 or
L1)
MESI_Two_Level-L2cache.sm:235: Warning: Non-void return ignored,
return type is 'bool'
MESI_Two_Level-L2cache.sm:237: Warning: Non-void return ignored,
return type is 'bool'
MESI_Two_Level-L2cache.sm:594: Warning: Unused action:
fw_sendFwdInvToSharers, invalidate sharers for request
MESI_Two_Level-L2cache.sm:764: Warning: Unused action:
kk_removeRequestSharer, Remove L1 Request sharer from list
MESI_Two_Level-L2cache.sm:780: Warning: Unused action:
mm_markExclusive, set the exclusive owner
MESI_Two_Level-dir.sm:160: Warning: Non-void return ignored, return
type is 'bool'
MESI_Two_Level-dir.sm:294: Warning: Non-void return ignored, return
type is 'Tick'
MESI_Two_Level-dir.sm:298: Warning: Non-void return ignored, return
type is 'Tick'
MESI_Two_Level-dir.sm:302: Warning: Non-void return ignored, return
type is 'Tick'
MESI_Two_Level-dir.sm:348: Warning: Non-void return ignored, return
type is 'Tick'
MESI_Two_Level-dir.sm:351: Warning: Unused action:
p_popIncomingDMARequestQueue, Pop incoming DMA queue
MESI_Two_Level-dma.sm:189: Warning: Non-void return ignored, return
type is 'Tick'
MESI_Two_Level-dma.sm:193: Warning: Non-void return ignored, return
type is 'Tick'

Any idea why those happened? Any response or hints are appreciated!


That looks to me like something coming from ruby, and that parser may be
quite distinct from the one in g++.  But I don't use ruby so I don't
really know ...  Sorry - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Build Error 134

2023-08-05 Thread Eliot Moss via gem5-users

On 8/5/2023 2:16 AM, Kaiwen Xue via gem5-users wrote:

Hi,

I'm new to gem5 and trying to follow the official tutorial [1] to
build an x86 opt target from commit hash 48b4788.

The compilation failed with Error 134. Outputs didn't seem to be
meaningful - they are just normal building messages and ended with
Error 134. I shall attach a detailed build output at the end of this
email.

The build command I used was `python3 `which scons`
build/X86/gem5.opt`. My python3 version is 3.8.10. My scons version is
3.1.2. I've checked all the other dependencies and they seem to be
fine. My machine is a physical server with ubuntu 20.04 running Linux
5.15.

In addition, every time I remove the built directory and rebuilt, the
file names before the Error 134 message are different. e.g., the
attached output has "scons: *** [build/X86/cpu/o3/O3Checker.py.cc]
Error 134", but the file name would be different across different builds.

Is there a way to narrow down this issue?

Thanks!
Kevin


My guess is that you ran out of memory - some of the compilations need quite a
lot!  Since scons typically builds in parallel, you get some variation in
which jobs are running when.  You *might* try compiling just one thing at a
time (-j1).  No need to rm everything - you can continue where it aborted.
But still, if even one job demands too much memory, you will fail again.

If you're running in a virtual machine, increase its main memory size.  As I
recall, something like 5-6Gb are needed to build gem5.  Using somewhat more
than that won't hurt.
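
As a back-of-the-envelope check (the per-job memory figure is this thread's
estimate, not an official gem5 number), you can size the scons -jN flag so
the build fits in RAM:

```python
# Crude heuristic: cap parallel build jobs by both core count and an
# assumed peak memory use per compile/link job (~6 GB for gem5 here).
def safe_jobs(mem_gb, cores, gb_per_job=6):
    return max(1, min(cores, mem_gb // gb_per_job))

# e.g. a 16-core machine with 32 GB of RAM:
jobs = safe_jobs(32, 16)   # -> 5, so run something like: scons -j5 ...
```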

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: how clflush is simulated in classic cache ( not ruby ) ?

2023-08-03 Thread Eliot Moss via gem5-users

On 8/3/2023 11:56 AM, Khan Shaikhul Hadi wrote:
I'm not sure about silent suppression, as that flush instruction is called right after a write happens 
to that memory.  A fault should arise when the write request is issued.


This would happen during address translation, which comes before putting
something in the write queue, I believe.  So my scenario is still plausible,
though I do not claim it is right.

Also I checked the packet received in the cpu side port and it receives multiple CacheMaintenance  ( 
CacheClean) requests ( equivalent number of times workload executes clflushopt instruction). So, I'm 
assuming gem5 handles clflushopt instruction somehow, I just could not track how they are  doing it 
without placing it in the store queue as it will be difficult to simulate the following fence 
without this.


To get at various things like this, I like to use the Undo Debugger (udb).
It's a commercially available time travel debugger, but you can get a free
trial and they're also willing to give a free license for academic use.
You can find a place where the cache maintenance packet is issued and
work backwards from there, possibly going an event at a time to find each
prior stage of handling.

Best wishes - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: how clflush is simulated in classic cache ( not ruby ) ?

2023-08-02 Thread Eliot Moss via gem5-users

On 8/2/2023 3:20 PM, Khan Shaikhul Hadi via gem5-users wrote:


But my gdb traces showing that request->isMemAccessRequired() is returning false. That's where I'm 
confused. I'm running this simulation in SE mode.


I always deal with FS mode, but I don't think that matters.

I wonder if the particular access in question is one to a page not currently
mapped.  That should result in silent suppression of the flush (no page fault),
I believe.

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: how clflush is simulated in classic cache ( not ruby ) ?

2023-08-01 Thread Eliot Moss via gem5-users

On 8/1/2023 5:15 PM, Khan Shaikhul Hadi via gem5-users wrote:
As far as I understand, gem5 simulates the functionality of the clflush instruction for the classic 
cache. Can anyone explain how it does that?


I traced the Clflushopt::initiateAcc() function call, which eventually calls the LSQ::pushRequest() function 
in lsq.cc. But after completion of translation, it checks request->isMemAccessRequired() and isLoad, 
both of which return false. As a result it does not call the write() function, which should put the 
instruction in the store queue, and instead just returns inst->getFault().


Without placing this request in the store queue, how does this request reach the cache to invalid 
the block ?

Where does gem5 get timing for this clflush instruction?


My reading of the code suggests that request->isMemAccessRequired() will return 
true, since
this is a request.  Things will then move on to do the write.  Eventually a 
suitable packet
will be sent to memory (interestingly, it carries no data).

HTH

Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Possible bug in cache/base.cc

2023-07-14 Thread Eliot Moss via gem5-users

Dear gem5-ers -

I've run across something, which unfortunately I cannot reliably repeat, that
suggests an oversight in the code of src/mem/cache/base.cc.  In particular, it
appears that a HardPFResp can arrive where the MSHR remembered in the packet's
senderState no longer has any targets.  This causes this line to fail with an
assertion error in getTarget():

const QueueEntry::Target *initial_tgt = mshr->getTarget();

Presumably some suitable conditionalization on mshr->hasTargets() could be
used to fix things, unless the problem is that somehow the target(s)
disappeared when they should not have.  My suspicion is that something to do
with snooping by another cache caused the target to go away, and the situation
should just be ignored, but I did not want to attempt a fix along the lines of
testing hasTargets() without further confirmation.  There is also the question
of what to do about the stats in this case, there being no obvious basis for
determining a latency.  [We could change the senderState to include the time
the prefetch began, though, and use HardPFReq as the command.]

Regards - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Can't explain timing result for flush and fence in classical cache hierarchy

2023-07-12 Thread Eliot Moss via gem5-users

On 7/6/2023 1:47 PM, Khan Shaikhul Hadi via gem5-users wrote:
In my configuration I used CPUTypes.O3 and PrivateL1SharedL2CacheHierarchy to check how clflush and 
fence impact the timing of the workload. In my workload I run 10,000 iterations to update an array 
value, 200 updates per thread. In the workload, I have:

for( ;index to simulate two consecutive localized write operations and see the impact of the fence. Insertion of 
FENCE ( macro to insert mfence ) increases execution time by 24%. In the second scenario, I have:


for( ;index Where FLUSH (macro for _mm_clflush) should take more time to complete than ARR[index+1]=thread_ID, 
as this memory update should be highly localized and the flush needs to get acknowledgement from all 
levels of cache before completing. So, FENCE should have a much larger penalty for the flush than for the write 
operation. So, I was hoping to see a large execution-time increase from inserting fences in the 
second scenario. But insertion of the fence increases execution time by only 2%, which is 
counterintuitive.
Can anyone explain why I'm seeing this behaviour ? As far as I understand, the memory fence should 
let the following instruction execute after all previous instructions are completed and removed from 
the store buffer in which case clflush should take more time than regular write operation.


Sorry I am only now seeing this ...

IIRC from my work on improving cache write back / flush behavior,
the gem5 implementation considers the flush complete when the
operation reaches the L1 cache - similar to what happens with
stores.  I agree that from a timing standpoint this is wrong,
which is why I undertook some substantial surgery.  I need to
forward port to more recent releases, do testing, etc., but in
principle have a solution that:

- Gives line flush instructions timing where they are not complete
  until any write back makes it to the memory bus.

- Deals with the weaker ordering of clwb and clflushopt (which
  required retooling the store unit queue processing order).

- Supports invd, wbinvd, and wbnoinvd in addition to the line
  flush operations.

Not sure when I will be able to accomplish putting these together
as patches for the powers that be to review ...

Regards - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Analyzing instruction cycle count

2023-07-11 Thread Eliot Moss via gem5-users

On 7/11/2023 9:13 PM, Nick F via gem5-users wrote:

Good afternoon,

I have been trying to use Gem5 to research and study the performance of several different computer 
architectures. However, I have been noticing that I may be unable to accurately model the 
differences in cycle length for computer programs.


Take for example these two programs:

#include 

int main(void)
{
     for (uint32_t i = 0; i < 1000; i++) {
     uint32_t x = 5 * 6;
     if (x != 30) {
     return 1;
     }
     }
     return 0;
}

#include 

int main(void)
{
     for (uint32_t i = 0; i < 1000; i++) {
     uint32_t x = 5 + 6;
     if (x != 11) {
     return 1;
     }
     }
     return 0;
}

Compiling and running both individually on a basic RISC-V CPU config, they both exit at exactly 
1,297,721,000. However, in a real system, each multiply operation would take longer and I'd suspect 
doing 1000 multiplications would have even a tiny difference in performance. My own research would 
also have difficulties analyzing relative performance unless I'm missing something.


Even custom instructions seem to execute in a single CPU cycle regardless of how the hardware would 
be implemented.


Is there a good way to define cycle delays in my Gem5 environment? I can implement a "multiply" 
function inserts a bunch of no-ops, but that would make it more complicated when the program 
complexity grows.


I've written a small blog post exploring some of what I've tried in the past week. If anyone here has any suggestions I'd be interested to hear them.


Unless you work really hard to defeat it, any compiler will do 5 + 6 or 5 * 6 at
compile time, producing a small constant.  The if test will go away, and the 
loop
probably will, too.  Did you look at the actual machine code of the executable?

Even if you turned all optimizations off, etc., many cores can do adds and
multiplies in about the same amount of time, and given pipelines, other work,
etc., it might well come out to the same number of cycles even if the operations
are there.
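
You can see the same effect from Python itself, whose bytecode compiler also
folds constant arithmetic at compile time (shown here only as a stand-in for
what gcc/clang would do to the C loop):

```python
# CPython folds 5 * 6 into the constant 30 when compiling this statement;
# no runtime multiply survives, much as an optimizing C compiler would
# reduce the loop body to a constant.
code = compile("x = 5 * 6", "<demo>", "exec")
folded = 30 in code.co_consts   # the folded result is stored as a constant
```

The analogous check for the C programs is to disassemble the binary
(objdump -d) and confirm no mul/add instruction remains in main.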

So, I'd first check the machine code, and would also want to know the specific
gem5 model being used (in order, out of order, etc.) and other parameters ...

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: recvAtomicLogic() in mem_ctrl.cc

2023-07-11 Thread Eliot Moss via gem5-users

On 7/11/2023 5:46 PM, Ayaz Akram via gem5-users wrote:

Hi Eliot,

Based on my understanding, when pkt->makeResponse() is called it updates the "cmd" of the pkt with 
the appropriate responseCommand (this line of code: cmd = cmd.responseCommand();) . If you look at  
"MemCmd::commandInfo[]"  in packet.cc, the response command for a "WriteReq" command is "WriteResp". 
And the attributes of a "WriteResp" command don't have "HasData", which is why the response pkt will 
return false on a "hasData()" check.


You might also want to look at the struct CommandInfo in packet.hh.


Ah, yes - I was confusing hasData with the STATIC_DATA and DYNAMIC_DATA
properties, which have to do with whether the data field is set (and
with whether it needs to be deleted when the packet is deleted), which
is separate from the logical notion of whether the packet is carrying
data from one place to another.

Thanks for the reminder!   EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: recvAtomicLogic() in mem_ctrl.cc

2023-07-11 Thread Eliot Moss via gem5-users

On 7/11/2023 3:03 PM, John Smith wrote:
Thanks for responding, Eliot. I somewhat understand that after the write is accomplished, the 
returning packet won't have the data. But still, why is the returned value 0 in that case? Shouldn't 
it still be equal to the memory access latency?


In the Atomic case this code is assuming the write can
be absorbed into a write buffer, so there is no additional
latency visible to the user.  Of course it is *possible* to
saturate the buffers, and if you want a more accurate
accounting you can use a Timing model instead.

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: recvAtomicLogic() in mem_ctrl.cc

2023-07-11 Thread Eliot Moss via gem5-users

On 7/11/2023 3:20 PM, Ayaz Akram via gem5-users wrote:

Hi John,

If you are checking if the pkt is write when pkt->hasData() condition is true in recvAtomicLogic() 
function, the check (pkt_is_write) will always be false. The reason is that a write pkt would have 
already written its data to the memory (abstract memory) in the previous line of code 
"mem_intr->access(pkt);" That access to the memory interface converts a request pkt into a response 
pkt and adds or removes data from the pkt (depending on if the request was a read or a write). Also, 
in this implementation, the accessLatency will only be returned if the request was a read request 
i.e., the write requests would not see any latency.


Dear Ayaz - That makes sense to me, but I could not find where
the dropping of the data happens in the code.  The makeResponse
function on packets does not affect the data, and neither does
the writeData function (which grabs the data and copies it to
the memory).  If you know where this happens, it might improve
John's and my understandings of how this code path works.

Regards - Eliot
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: recvAtomicLogic() in mem_ctrl.cc

2023-07-11 Thread Eliot Moss via gem5-users

On 7/11/2023 1:28 PM, John Smith via gem5-users wrote:
So, I used the function pkt->isWrite() to check if the packet is a write request. And I observed 
that inside the pkt->hasData() if condition, pkt->isWrite() returned false. Hence only the read 
packets were entering the if(pkt->hasData()) condition


So you're saying that inside the if condition, pkt->isWrite is *always* false?

I see.  I couldn't find a place in the code (in the version I have downloaded
anyway) where the data is dropped, but I can imagine it happening after the
write is accomplished (though I don't see why), so that the "returning"
packet no longer has data.  What are the exact types of the components
involved?  And maybe someone else is more competent to answer this since it
is somewhat stumping me from my reading of the code.

Cheers - Eliot
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: recvAtomicLogic() in mem_ctrl.cc

2023-07-11 Thread Eliot Moss via gem5-users

On 7/11/2023 1:01 PM, Eliot Moss wrote:

On 7/11/2023 12:52 PM, John Smith wrote:
Okay, but I've also noticed that a WriteReq generally carries no data. Why exactly is that? Cause 
if we are writing to memory, then the memory access latency shouldn't be 0 right?


I believe that happens if the write got its data by snooping a cache.
The packet still goes to the memory, but with the write suppressed.
This certainly happens in the Timing case; I admit I'm a little less
clear about the Atomic one.


Sorry - I see I was responding about a read.

So, what surprises me is that you're saying that write requests generally
carry no data.  That doesn't seem right.  What leads you to that conclusion?

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: recvAtomicLogic() in mem_ctrl.cc

2023-07-11 Thread Eliot Moss via gem5-users

On 7/11/2023 12:52 PM, John Smith wrote:
Okay, but I've also noticed that a WriteReq generally carries no data. Why exactly is that? Cause if 
we are writing to memory, then the memory access latency shouldn't be 0 right?


I believe that happens if the write got its data by snooping a cache.
The packet still goes to the memory, but with the write suppressed.
This certainly happens in the Timing case; I admit I'm a little less
clear about the Atomic one.

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: recvAtomicLogic() in mem_ctrl.cc

2023-07-11 Thread Eliot Moss via gem5-users

On 7/11/2023 12:37 PM, John Smith via gem5-users wrote:

Hi everyone,

Could someone please help me with explaining what's happening in the below code snippet? It's the 
receiveAtomicLogic() function in mem_ctrl.cc. Why are we returning the latency as 0 if the packet 
doesn't have any data? And in what case will the packet have/not have data?


// do the actual memory access and turn the packet into a response
mem_intr->access(pkt);

if (pkt->hasData()) {
    // this value is not supposed to be accurate, just enough to
    // keep things going, mimic a closed page
    // also this latency can't be 0
    return mem_intr->accessLatency();
}

return 0;


John - Certain packets carry no data.  For example, a cache line invalidate
without write back will have that property.  Maybe others.

Best - Eliot
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Error: snoop filter exceeded capacity

2023-07-10 Thread Eliot Moss via gem5-users

On 7/10/2023 5:17 PM, John Smith wrote:
I understood how to pass it. However, --param='system.membus.snoop_filter=NULL' doesn't seem to 
work. I'm getting the following error:

NameError: name 'NULL' is not defined


I see.  Well, this line was in an older version of XBar.py:

snoop_filter = Param.SnoopFilter(NULL, "Selected snoop filter.")

and the resulting config file shows a value of NULL.  Maybe on the
command line it will be happier with something more python-ish,
such as None?

I am adding the gem5 list back onto this since someone there probably
knows.

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Error: snoop filter exceeded capacity

2023-07-10 Thread Eliot Moss via gem5-users

On 7/10/2023 3:59 PM, John Smith via gem5-users wrote:

I'm sorry. Here's the error message I got:

build/X86/mem/snoop_filter.cc:197: panic: panic condition !is_hit && (cachedLocations.size() >= 
maxEntryCount) occurred: snoop filter exceeded capacity of 131072 cache blocks


That snoop filter capacity should be enough for 8MB of data (131072 * 64).
While I don't personally use snoop filters (you can just turn that component
off), if you want to use them then maybe increase the size, e.g., double it.
You can add something like this on the command line:

--param system.membus.snoop_filter.max_capacity="16MiB"

that is, enough capacity to deal with 16 MB of cache storage below.  I would
hope you should not need to do this for each tol2bus ...

Of course you can also modify the python code that sets up the system,
but the command line approach may be easier for you.

To turn off snoop filters, do:

--param system.membus.snoop_filter=Null

However, snoop filters offer a possible performance benefit.  (I am not sure
how realistic they are in practice, however - maybe someone else can answer
that.)

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Adding a delay of certain ticks in gem5

2023-07-06 Thread Eliot Moss via gem5-users

On 7/6/2023 12:48 PM, John Smith via gem5-users wrote:
I've looked into the schedule() function which is used to schedule events. But can this function be 
used to simulate delays?


Not by itself.  You schedule an event at something like curTick() + 100.
When the event happens, a function gets called.  So, the last step you
do before your wait is to schedule the event.  The event handler is a
new function that does the rest of the steps.  And any state you'll need
will need to be available, etc.

I guess my point is that it's doable, but non-trivial.  You can't just
pause the gem5 code and pick it up again - it's an event driven system.

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Adding a delay of certain ticks in gem5

2023-07-06 Thread Eliot Moss via gem5-users

On 7/6/2023 11:12 AM, John Smith via gem5-users wrote:

Greetings,
If I want to, for example, add a delay of 100 ticks before a line of code executes in the function 
handleTimingReqMiss() in cache.cc, how do I go about doing that?


Generally speaking, you'll have to schedule an event and then do the
rest of the work in the event handler - something like that.  You can't
just suspend code in the middle.  You'll probably need to break things
into two functions to accomplish this.

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Question about InOrder cpu models

2023-07-05 Thread Eliot Moss via gem5-users

On 7/4/2023 7:17 PM, Eliot Moss via gem5-users wrote:

Dear gem5-ers --

I am thinking of trying to put together something that roughly models ARM's
R82, which is an 8-stage, width 3, in order cpu.  (It's also not a single
thing, but has numerous options you choose, and then set up RTL and can have
your design manufactured.)  I see that there are three non-SMT and one SMT in
order pipeline models, but I'm not clear how I would use them -- swap them in
for the one that does not have 5, 9, or smt in its name?  Or what?  I do know
that I'll need to put together a new system model that uses the ARM isa and is
at least slightly extended from InOrderCPU.py.  Any other things to watch out
for?  Thanks - Eliot


Following up my own question a bit ...

Is InOrderCPU (cpu/inorder) deprecated or something?  Even adding InOrderCPU
to CPU_MODELS in build_opts/ARM does not cause the inorder directory to
compile.  Not sure how to make it happen.

Meanwhile, there is MinorCPU (cpu/minor), which seems perhaps to be intended
to replace inorder.  Is that right?

Regards - Eliot
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Question about InOrder cpu models

2023-07-04 Thread Eliot Moss via gem5-users

Dear gem5-ers --

I am thinking of trying to put together something that roughly models ARM's
R82, which is an 8-stage, width 3, in order cpu.  (It's also not a single
thing, but has numerous options you choose, and then set up RTL and can have
your design manufactured.)  I see that there are three non-SMT and one SMT in
order pipeline models, but I'm not clear how I would use them -- swap them in
for the one that does not have 5, 9, or smt in its name?  Or what?  I do know
that I'll need to put together a new system model that uses the ARM isa and is
at least slightly extended from InOrderCPU.py.  Any other things to watch out
for?  Thanks - Eliot
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Shared variables between multi-core RISCV CPUs

2023-07-02 Thread Eliot Moss via gem5-users

On 7/1/2023 11:38 PM, Abdlerhman Abotaleb via gem5-users wrote:

How can I share a variable between multiple cores in gem5?  (I'm simulating
RISC-V cores.)
I can see that each core allocates different VPN to PFN translation.
So even if I explicitly assign a memory address to a variable (i.e. char*arr = 0x20010 then 
dereference it later) it will be in different physical memory location for each core.

Do you have an idea how to do it from the source code?
Thank you.


My thinking is that it's no different under gem5 than under Linux.  Ways
I can think of:

- Map a file of appropriate size shared; put the variable at a fixed offset
  in the mapped region.

- Similarly for shared memory (shmem).

If you care, the file might be from a file system that is actually kept
in a chunk of physical memory.

The only other thing I can imagine is setting up a new device for which
you write a driver, and that driver allocates a kernel buffer and gives
access to it, allowing it to be mapped.  But that's more complicated and
more work than just sharing an ordinary page.

Regards - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: bit-slice processors?

2023-06-28 Thread Eliot Moss via gem5-users

On 6/28/2023 9:42 AM, 中国石油大学张天 via gem5-users wrote:

Can gem5 be used to simulate bit-slice processors?


If you're up for defining a new cpu model - not a light undertaking - then I
see no reason why not.  If you're not already aware, gem5 is not a circuit
level model.  Its modules are written in C++ and the overall system is
instantiated and components connected up using python scripts.  You just need
to be able to code the semantics of your cpu in C++.  I would think that
suitable bit masking, shifting, etc., would allow you to model a slice up to
64 bits wide fairly easily, and wider ones by using multiple precision
arithmetic.

You need to be able to describe the (logical) cpu state, the instructions (as
macro-ops and micro-ops), decoding, addressing, faults, etc., in C++ (decoding
and instruction descriptions are generally done using files in a particular
format that use python to generate the tedious repetitive details).

Does this help?  If you're asking is there anything pre-built for this, I
think the answer is no.

Regards - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Fatal error for when clflush is included in workload for O3 system simulation

2023-06-22 Thread Eliot Moss via gem5-users

On 6/22/2023 8:19 PM, Khan Shaikhul Hadi via gem5-users wrote:

Hi,
Thank you for your response with the patch link. It helped me a lot to understand what's going on 
and limitations with clflush.


Do you have any idea if clflush alternative for arm isa is implemented in gem5 properly or not. I 
work on persistent memory and for x86 isa, you need clflush and fence ( which also may not be 
implemented properly). If I move to arm, clflush and fence should be replaced with 
similar functionality instruction ( I have no idea about arm isa, sorry can't mention explicit 
instruction. Most likely DC CVAU and memory barrier in arm isa ).


These will boil down to the same micro-ops, just so you know ...

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Persistent memory in gem5: How to test persistent memory workload properly.

2023-06-22 Thread Eliot Moss via gem5-users

On 6/22/2023 5:47 PM, Khan Shaikhul Hadi wrote:

Hi Eliot,
Thank you for your detailed answer.
For my current work, I need "CLFLUSH" and "MFENCE" to work properly. For clflush, I was planning to 
modify the instructions execution to issue a flush request to the cache and handle the rest using 
directory based cache coherence protocol. As flush is treated as a store operation , I don't know 
how to receive flush_complete response and clear store buffer in gem5. Do you have any suggestion on 
that or which file/function should I look at to get some idea.


It's just an ordinary response.  An item will not be removed from the store
queue until a response comes.  The issue at present, as I recall, is that
things like an ordinary store get a response from the cache as soon as the
cache starts processing the store.  In this case you need to defer that
response until the request has been processed all the way out to the memory
bus, which will send a response back down.

I'm not sure what the directory cache coherence protocol has to do with it.
(Note: I do *not* use Ruby / Garnet.)

For fence, it's true that X86 ensures total store order but you need a fence to ensure that all the 
respective reads on the update read persisted data not volatile data. else  it may result in 
inconsistent data upon crash. Based on my understanding, the memory fence does not let other 
instructions move forwards unless all the content of the store buffer upto the fence instruction is 
cleared.  If the fence does not work properly in gem5, then I want to implement this property. Do 
you have any suggestions for me? I'm not well versed with gem5 yet.


Sure - if you feel a fence is necessary, use a fence.  gem5 implements it by
waiting for the store queue to empty (for sfence and mfence) and the load
queue to empty (for lfence and mfence).

But with eADR as the presumed future, I am not sure of the usefulness of
the flushing any more ...

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Persistent memory in gem5: How to test persistent memory workload properly.

2023-06-22 Thread Eliot Moss via gem5-users

On 6/22/2023 4:54 PM, Khan Shaikhul Hadi via gem5-users wrote:

Hi,
I want to simulate a Persistent Memory machine in gem5. Gem5 has an NVMe module but at instruction 
level ,for most part, it does not simulate CLFLUSH ( specially for MESI cache coherence protocol ). 
I am also not sure if it simulates memory fence properly (For out of order cpu, it seems like 
MFenceOp.execute just returns no fault without doing anything. I was expecting it would do something 
to ensure the store buffer is clean before other instructions could proceed or something like 
that.). In that case, how one runs a persistent memory benchmark in gem5.


Side note: To make an update persistent on the x86 architecture, the update
must be followed by a FLUSH and a FENCE.


I think you're mostly correct about this.

I coded up an improved version, but have not extracted the code and submitted
it back.  Maybe I can give that (largish) patch some priority, to help others
out.  There are several issues involved:

- Decoding clwb (which I presume you want), which is fairly easy.

- Giving clflush, clflushopt, and clwb, along with sfence, mfence, and lfence
  the right ordering properties to model the x86 semantics (a given x86
  implementation *might* impose more order, but, clflushopt and clwb do not
  follow total-store-order as if they were other stores).

- Having clflush, clflushopt, and clwb not complete until the data have
  reached the memory controller.  (I believe the version at present treats
  them as done when the request reaches the cache.)

- I also added support for bulk cache flush operations (wbinvd and wbnoinvd),
  which may be of less use because they're privileged, for security reasons.


The resulting design has clflush and friends send a packet up to the Point of
Coherence (that's an ARM term, but means where coherence is resolved,
generally the memory bus).  Then snoops are sent back down to all caches.
This means that asking for flush of a line that is residing dirty in some
other cpu's L1 cache (for example) will indeed flush the line.  When the data
(if any) are sent to the memory bus, the bus then sends a response down.  (If
no caches hold the data, a response is also sent.)  In principle it would be
possible to force a wait until the data are recorded in memory array, but
since Intel guarantee persistence once data reach the controller, having the
packet cross the memory bus suffices.

Dealing with the weaker ordering of clflushopt and clwb required substantial
surgery to the store queue part of gem5's out of order cpu model, since it
processed items strictly on order when TSO was set (which is the appropriate
setting for x86).

The bulk clean ops (wbinvd, etc.) required a kind of additional "engine" in
the caches to find dirty lines and write them back, and then to detect when
they had all reached the memory bus before indicating completion.

Anyway, yes, the setup as is will give you the semantics, but not the timing.

One further observation.  More recent Intel models support eADR, an obscure
acronym which turns out to mean that once data reach the cache, they will be
persistent.  This means you no longer need to use clflush and friends.
Further, given that x86 implements total-store-order on ordinary stores, for
the most part you don't even need fences, unless for some reason you need to
know that a given store has actually reached the cache.  (If you're processing
things in a transaction-like way, you simply updated the commit record after
updating everything else.  If, after a crash, the commit record indicates
"committed", you know the previous stores also reached the cache so the
transaction is durable (persisted).)  Therefore, if you wanted a fence just for
ordering purposes, you don't need it.  The fence *does* guarantee that the
store queue empties before you proceed - but on a substantially out of order
machine, emptying a queue like that might have a noticeable impact on
performance for small transactions.  There might be occasions where you need
fences to prevent loads and stores from passing each other, but (except for
the above noted clflushopt and clwb) x86 semantics requires loads to be
handled in order, and stores to be handled in order, but the two queues are
separate.  (A load does need to see preceding stores to the same byte by the
same cpu, though.)

Regards - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: simulate a multi-core processor with Gem5

2023-06-20 Thread Eliot Moss via gem5-users

On 6/20/2023 10:41 AM, 中国石油大学张天 via gem5-users wrote:
How to simulate a multi-core processor with Gem5, such as how to write configuration files? For 
example, in the following form:


You don't write config files.  You write python code that creates
instances of python classes.  The gem5 system will instantiate the
corresponding C++ classes and connect everything together as the
python objects indicate.

Regards - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: How clflush execution works in gem5 ?

2023-06-16 Thread Eliot Moss via gem5-users

On 6/16/2023 11:39 AM, Khan Shaikhul Hadi via gem5-users wrote:

Hi,

I'm trying to figure out how "clflush" instruction works in gem5. Specially, how it issues a signal 
to the cache controller to evict the block from cache hierarchy throughout the system and how it 
receives confirmation to clean the store buffer so that the next fence let following instructions to 
proceed. Anyone have any idea how this works or where I should look for better understanding ?



I have tried to trace clflush execution and found some confusing facts. It would be great if anyone 
could clarify this.



1. "clflush" instruction execution eventually calls Clflushopt::initiateAcc() 
(build/X86/arch/x86/generated/exec-ns.cc.inc ) as macroop definition of CLFLUSH uses clflushopt. So, 
there is no dedicated clflush operation in gem5 but all flush operations are treated as clflushopt ?


clflush is a clflushopt followed by a microop that waits for the store queues
to be empty.  This is what causes the stronger ordering of clflush vs 
clflushopt.

2. When  Clflushopt::initiateAcc() executes in timing simulation ( CPUType::TIMING), it eventually 
calls TimingSimpleCPU::writeMem() function in src/cpu/simple/timing.cc. Here you have :


if (data == NULL) {
    assert(flags & Request::STORE_NO_DATA);
    // This must be a cache block cleaning request
    memset(newData, 0, size);
} else {
    memcpy(newData, data, size);
}


So, I was assuming it will have data==NULL and execute memset() but it actually executes memcpy(). 
This seems weird. Am I missing something ?


Some processors have an operation to zero a cache block (line).  That's what
the memset is for.  Otherwise the flushed data have been sent to the memory
and need to be stored (memcpy).

3. For out-of-order simulation (CPUType.O3), Clflush::initiateAcc() is called twice the number of 
clflush instructions in my workload. For example, if my workload has 6 clflush instructions, gdb 
  breakpoint at Clflush::initiateAcc shows that this function is called 12 times (timing simulation 
called this function 6 times as it should). Can anyone explain what happens here?


I'd have to go dig into the code, but maybe what you're seeing is that the
instruction must first do a virtual address translation, and only after the
result of that is available (some number of cycles later) can it send the
actual request (which is put into a store queue and acted on in due course).

Note further that an operation like clflush may travel all the way out to the
coherent xbar closest to the memory and then snoops will be sent down to *all*
the caches (since the line in question may be in some other processor's L1
cache, for example).  Whichever cache has the data will respond.  If none
respond, then the cache line is not resident anyway (or was not dirty and is
now dropped by all the caches) so there is no further work to do.

There are some aspects of this where gem5 does not follow what x86 processors
do ... in particular, gem5 handles all x86 memory store operations (clflush is
in this category) in order (Intel TSO - total store order), even though Intel
ordering of clflushopt and clwb is weaker.  I coded up something more like
actual Intel behavior, but have not submitted it back to gem5 :-( ...  It made
the store queue processing rather more subtle, since the existing code counted
on things proceeding in order.

HTH
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Request for Guidance: Extracting Detailed Information for Floating-Point Instructions

2023-06-16 Thread Eliot Moss via gem5-users

On 6/16/2023 3:07 AM, Alexandra-Nicoleta DAVID via gem5-users wrote:

Dear gem5 Community,

I am currently using the gem5 simulator for my research work and I find it a powerful and insightful 
tool for studying and understanding the inner workings of computer architectures.


I am particularly interested in exploring and understanding the behavior of floating-point 
instructions within certain benchmarking suites. For my study, I need to extract detailed 
information about each floating-point instruction that is executed, such as the Program Counter 
(PC), source register, and destination register.


Despite my efforts, I am having difficulty obtaining this data. I have been trying to use the trace 
functionality, but it seems I may be missing some key steps or perhaps there is a better approach.


Could anyone guide me on how to accomplish this task? Specifically, I would appreciate it if you 
could share any scripts, changes in the source code, configuration options, or any other method that 
would allow me to collect the information I need.


What leads you to believe that the trace functionality does not provide what
you want?  In my experience it will, if you select the right trace options
(i.e., debug flags).  These flags are summarized at the end of SConscript
files (for the python code to help generate .hh and .cc files from them).
Looking at gem5/src/cpu/SConscript, you will see numerous Exec flags.  You
may want something like this set of them:

ExecEnable,ExecOpClass,ExecMacro,ExecUser

You may need to experiment with adding or substituting other flags to get
what you want.  There is no filter to get *only* f.p. ops, though you could
dig into the code to see where these flags are used, and then set up and add
a new flag (add it to SConscript, #include the right file into the .hh/.cc
file(s) where you will use it, etc.).

Note: debug flags are active only in opt and debug builds, not fast builds.

Finally, you pass them on the command line using --debug-flags flg1,flg2,flg3
etc.  You will get a LOT of output; you may want to pipe it through a grep
(etc.) filter and then compress it with gzip or something.

HTH - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Writing a script to run multiple simulations at once

2023-06-14 Thread Eliot Moss via gem5-users

On 6/14/2023 11:30 AM, Derek Christ wrote:

Hello Eliot Moss,  [one ell please]



a shared Python file with parameter settings sounds useful.



What I meant with running gem5 without the gem5 executable was to use the
compiled library directly from the Python configuration script.



From what I have seen, the gem5 executable sets up some internal state and
then directly calls the embedded Python interpreter to launch the
user-provided script.



But as I see it there is no technical reason why it shouldn't be possible to
call this setup routine directly from Python. This would reduce the
complexity to only one single Python script.


Well, there may remain value to having a standard setup/run script that
invokes a user supplied script.  It helps keep gem5 per se separate from the
user's setup / configuration - a principle of modularity.

A quick look at main.cc suggests you may be right that this *could* be done,
though I have no idea what those various setup functions do and whether any of
that would be hard to do from python.  What I suppose I am missing is the
motivation - why such a change would be substantially better.  My applications
tend to be quite complex and I find I need the layers of script, for various
reasons.  Maybe this has more to do with preference to write in python vs bash
scripts vs C++ code.

gem5 is not currently packaged as a library, I don't think, though I suppose
it could be.  Given the amount of existing projects and infrastructure, one
would need to continue to support the current way of doing things as well.
This might further complicate the system and its maintenance - one hopes by
not very much.

HTH - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Writing a script to run multiple simulations at once

2023-06-14 Thread Eliot Moss via gem5-users

On 6/14/2023 2:32 AM, Derek Christ via gem5-users wrote:

Hello,

maybe I have missed something in the official docs, but I'm not sure how to run multiple simulations 
with different parameters concurrently to speed up the process.


What I have done is I created a Python script that sets environment variables and then kicks-off 
gem5 which in turn runs another Python script that reads those environment variables to configure 
the simulation.

But I'm sure there has to be a better way?

I think it might be easier if it would be possible to run gem5 purely with Python (without the gem5 
executable) because then it would be easy to pass custom parameters to the gem5 configuration.


Is there something that I missed?

Thanks

Best
Derek


I use scripts that set up environment variables, command line parameters, and
a python file with python parameter settings.  The latter is placed in a
directory unique to the run.

It happens that my scripts are shell scripts, but with varying degrees of
pain, other scripting languages, or even hard coding in C or something, would
work.  I just use what I am skilled with.

You *must* have the gem5 executable, however.  The bulk of the simulation is
run with that (optimizing compiled) C++ code.  It implements the actual
modules as well as the event driven simulation part.  Python is used mostly
for configuring the simulation before the event loop is kicked off.  If the
simulator were written entirely in python it would be much slower.

HTH - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Modeling DRAM memory latencies

2023-06-13 Thread Eliot Moss via gem5-users

On 6/13/2023 9:07 AM, Vincent Abraham wrote:
Sure, I'm using the 22.1.0.0 version and the memory controller files (MemCtrl.py, mem_ctrl.cc, 
mem_ctrl.hh) are located in src/mem/.


On Tue, Jun 13, 2023 at 8:32 AM Eliot Moss <m...@cs.umass.edu> wrote:

On 6/13/2023 7:10 AM, Vincent Abraham wrote:
 > Hi,
 > I'm afraid just changing the parameters doesn't do the job for me. I want to add a delay at the
 > memory controller level, when it sends the requests to the memory. Could anyone point me to a
 > function where I should do the changes? Also, how should I add the delay?


Ok.  I'm sure there are multiple strategies, but on reflection here's a simple
one: Insert a bridge just before each memory controller.  A bridge has a delay
parameter that you can set, and it also has queues in each direction whose
size you can set.  If you don't want to delay responses as well, you will need
to modify bridge to have a response delay separate from the current delay that
gets applied in both directions.  You add the python parameter to Bridge.py
and the corresponding C++ field in bridge.hh.  Initialize it in
Bridge::Bridge(...) and then use it in
Bridge::BridgeRequestPort::recvTimingResp.

This method avoids getting into the rather complex internals of the memory
controller module.  The down side is that it does not affect the controller's
stats.
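
A rough sketch of what that wiring could look like in a configuration script.  This is not runnable on its own: it assumes a classic-cache system object named `system` with a `membus` and a single memory controller `mem_ctrl` (both hypothetical names), and uses the Bridge port names of recent gem5 versions.

```python
from m5.objects import Bridge

# One-way delay applied to traffic through the bridge; req_size/resp_size
# bound the queued requests and responses in each direction.
delay_bridge = Bridge(delay='10ns', req_size=16, resp_size=16)

# Splice the bridge in between the memory bus and the controller.
delay_bridge.cpu_side_port = system.membus.mem_side_ports
delay_bridge.mem_side_port = system.mem_ctrl.port
system.delay_bridge = delay_bridge
```

Attaching the bridge to `system` keeps it in the SimObject tree so it is instantiated with the rest of the configuration.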

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Modeling DRAM memory latencies

2023-06-13 Thread Eliot Moss via gem5-users

On 6/13/2023 7:10 AM, Vincent Abraham wrote:

Hi,
I'm afraid just changing the parameters doesn't do the job for me. I want to add a delay at the 
memory controller level, when it sends the requests to the memory. Could anyone point me to a 
function where I should do the changes? Also, how should I add the delay?


Which version of the code are you working with and which file
has the memory controller in it?  (This has changed over time.)
Armed with that info I can probably tell you where to make the
change.

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Modeling DRAM memory latencies

2023-06-10 Thread Eliot Moss via gem5-users

Yes, adjusting some parameters in the memory controller
may be the easiest then - though I'd have to analyze the
parameters and their meanings to see whether you'd need
to add new parameter(s) and code.

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Modeling DRAM memory latencies

2023-06-10 Thread Eliot Moss via gem5-users

On 6/10/2023 2:32 PM, Vincent Abraham via gem5-users wrote:
I'm extremely sorry if I worded my question incorrectly. I'm actually trying to introduce a delay 
whenever a read/write request happens in the main memory. For example, in a memory write, the data 
would only be flagged as dirty after a 10ns delay.


I'm trying to understand what you're talking about,
but having some difficulty.  "Flagged as dirty" is a
*cache* concept, not a main memory concept, AFAIK.
There are a variety of queues and parameters in the
memory controllers.  If you find the right place to
delay something, you can simply add some more time
to the schedule time for the next event in the
processing pipeline.  Or, with more coding, insert
an additional internal queue.

But what does "flagged as dirty" mean to you?

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Modeling DRAM memory latencies

2023-06-10 Thread Eliot Moss via gem5-users

On 6/10/2023 11:12 AM, Vincent Abraham via gem5-users wrote:

Hi everyone,
I'm trying to model additional latencies in the main memory while performing write/read operations. 
Could anyone tell me how I could go about doing it?


Of course the dram module has a gazillion timing and energy parameters that
you can simply look up in your config.ini file, if that's what you mean.

But I suspect you want to know something like the distribution of access
times.  You can see comm_monitor for statistics examples (and the memory
controller already has a lot as well), but it might go roughly like this.

First, set up a map in the controller module along these lines:

std::unordered_map<PacketPtr, Tick> arrivalTime;

When a packet arrives at the controller, put it into the map like this:

arrivalTime.emplace(pkt, curTick());

Later, when the packet has finished processing, you can find out how long it
took by code like this:

auto it = arrivalTime.find(pkt);
assert(it != arrivalTime.end());  // it really should be there
Tick arrival = it->second;
Tick latency = curTick() - arrival;
arrivalTime.erase(it);

The other part is recording statistics.  To get a histogram over all packets,
declare a stat like this as a member of the memory controller's class:

Stats::Histogram pktLatencies;

In the controller module's regStats function, add this:

pktLatencies
.init(20) // or whatever number of buckets you want
.name(name() + ".pkt_latencies")
.desc("Histogram of packet latencies")
.flags(cdf | dist | nozero);  // I like cdf, but pdf can be good too

Of course you don't have to use a histogram, and you don't have to use just
one.  For example, you could have one for reads and one for writes, and enter
packets conditionalized on isRead() and isWrite().

The one piece I did not mention yet is adding a sample to the histogram.  When
a packet finishes and you know its latency, just do:

pktLatencies.sample(latency);

The magic of stats will do the rest.

Cheers - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Changing Parameters in NVMInterface subclasses

2023-06-10 Thread Eliot Moss via gem5-users

On 6/10/2023 7:47 AM, Vincent Abraham via gem5-users wrote:

Hi everyone,
I want to change some of the parameters like tREAD, tWRITE, etc. in the NVM_2400_1x64 class. I tried 
to create a new subclass for it but I wasn't able to pass the class by name in the mem-type 
parameter in fs.py. And even with changing the parameters in the existing class itself, it seems 
like the values aren't getting updated when I try to print some of them in the MemConfig.py file. It 
would be great if someone could help me fix this.


You can actually change them on the command line by using:

--param fullNameOfParameter=DesiredValue

If you look in a config.ini file from a run you can see the
full name of the component written out; add .nameOfParameter
to that to get the full name of the parameter.  Some
parameter values need quotes around them (as I recall,
ones that are strings as opposed to number when used in
python).
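
As a toy illustration of how the full name is assembled: the section header in config.ini is the component's full name, and appending the parameter key gives the `--param` target.  The section name and value below are made up for illustration; read the real ones out of your own config.ini.

```python
import configparser

# Toy excerpt of a config.ini from a previous run (made-up values).
toy = """
[system.mem_ctrls.dram]
tREAD = 125ns
"""
cfg = configparser.ConfigParser()
cfg.read_string(toy)

section = "system.mem_ctrls.dram"
value = cfg[section]["tREAD"]   # configparser lowercases keys internally

# Full parameter name = section (component) name + "." + parameter,
# with quotes around string-like values as the text above describes.
flag = f"--param '{section}.tREAD=\"{value}\"'"
print(flag)
```

The printed flag can be passed directly on the gem5 command line.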

To be able to mention the class by name in fs.py you may
need to edit Options.py or MemConfig.py.

Some of these are more deeply embedded into the compiled
code and changes require a rebuild.

HTH - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Custom cache memory

2023-05-21 Thread Eliot Moss via gem5-users

On 5/20/2023 6:39 PM, Pavitra bhade via gem5-users wrote:

Dear all,

I am looking to create a cache memory with  different structure and mapping algorithm. For eg an 
additional bit for each cache memory location, different mapping algorithm based on the value of 
that bit etc. I want a to implement a different logic for cache mapping, as against the regular LRU 
algorithm etc. Can this custom cache memory be implemented using gem5?


Yes.  You can create your own subclass of BaseTags, and of BaseCache, if need be.
You need to adjust cache.hh / cache.cc / cache_impl.hh a little to know about the
new class.  You can add additional fields to blk.hh / blk.cc, etc.

Regards - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: How to use more CPU for running FS model?

2023-05-07 Thread Eliot Moss via gem5-users

On 5/7/2023 9:15 PM, Xiang Li via gem5-users wrote:

Hi,

I'm running X86 FS model, it would take me a long time for starting a FS model. I find it just using 
one CPU, can gem5 use more CPU when running FS model?


No.  It's modeling a whole complex system at the level of individual small
steps of each component in the system, done using a single event queue.
Given that each event could in principle affect any of the other units,
things pretty much have to be done in order.

One might be able to build a concurrent simulator for multiprocessors,
but gem5 is not that simulator - and it would not be an easy task to
get much speedup.  Imagine, for example, what happens when each cpu
gets access to a bus, or its possible need to react to cache misses
coming from other cpus, etc.

Best wishes - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: 回复:Re: 回复:Re: 回复:gem5& execute&

2023-05-07 Thread Eliot Moss via gem5-users

On 5/7/2023 10:50 AM, 中国石油大学张天 via gem5-users wrote:
Sorry for taking so long to reply to you. The goal I want to achieve is a simple reproduction of 
this article (Extending Moore's Law via Computationally Error Tolerant Computing), ultimately 
completing such a system. I believe the core goal should be the design of the RRNS core. Because it 
did mention gem5, I thought I could use gem5 as a tool or other tools such as McPAT to achieve this. 
But through my understanding of the ALU part of Gem5, I found that there was no specific 
implementation, so I thought about using a translator to implement it through Gem5+RTL. The 
attachment contains a paper that guided me to generate these ideas, which you can read.
I am currently quite confused, and I hope you can provide me with some suggestions after fully 
understanding.

Thank you.


Attaching gem5 to RTL goes beyond my knowledge.
But maybe someone else on the list can pick up the thread ...

EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Different latencies

2023-05-02 Thread Eliot Moss via gem5-users

On 5/2/2023 1:01 PM, Shen, Fangjia wrote:
Regarding the data latency, I think it depends on whether the cache is sequential access (access 
cache tags, then data) or parallel access (access tags and data at the same time - common 
optimization for the L1 cache).  See the code for BaseCache::calculateAccessLatency. If 
sequentialAccess==true,  the sum of tag latency and data latency is used, otherwise the greater of 
tag latency and data latency.


Regards,
Fangjia


Good point, Fangjia!
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Different latencies

2023-05-01 Thread Eliot Moss via gem5-users

On 4/30/2023 11:33 PM, Inderjit singh via gem5-users wrote:

Any ideas.
Inderjit Singh

On Wed, May 11, 2022 at 10:08 PM Inderjit singh wrote:


1. What is the difference between data/tag/response latency?
2. How can I add write latency (for NVM caches) in gem5, any patch?


The tag part of a cache records addresses and access/state bits, typically the
Valid bit (does this entry contain valid information - if not, the entry
should be ignored), Dirty (does the data differ from caches farther away or
the main memory), Shared (are there other copies of this line/block in other
caches), etc.

A cache first takes the address in the request (typically a read or write) and
sees if that address occurs in a valid entry in the cache.  The amount of
time to do this is the TAG latency.

The DATA latency is the additional time to supply (or update) the data if the
tag matches (a cache hit).

I believe the RESPONSE latency is how long it takes a cache to pass through a
response coming from beyond itself.
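
A toy sketch of how tag and data latency might combine on a hit.  It mirrors the sequential-vs-parallel distinction made by the classic cache model (see BaseCache::calculateAccessLatency); the function name and the numbers are illustrative only.

```python
def access_latency(tag_lat, data_lat, sequential_access):
    """Cycles to satisfy a cache hit (simplified sketch).

    sequential: the tag lookup must finish before the data array is read.
    parallel:   tags and data are probed at the same time (common for L1).
    """
    if sequential_access:
        return tag_lat + data_lat
    return max(tag_lat, data_lat)

# e.g. 2-cycle tags and 2-cycle data:
print(access_latency(2, 2, True))   # -> 4 (sequential)
print(access_latency(2, 2, False))  # -> 2 (parallel)
```

The RESPONSE latency would then be added on top when the access has to be satisfied from beyond this cache.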


As for your question about write latency in NVM caches, I find the question a
little bit ambiguous.  I assume you mean volatile caches that cache
information coming from NVM.  DATA latency would be the write latency into
such a cache.  If you're talking about the internal buffers of an NVM, there
are a slew of parameters in the NVM interface (which combines all the
buffering, etc., with the actual data array).  See NVMInterface.py, but also
MemInterface.py.  To set such parameters on the command line, you have to do
it for each controller.

This bash code might give you a sense of a way to do it using a script:

# MEMORY parameters

export GEM5_MEMTYPE=${GEM5_MEMTYPE:-NVM_2400_1x64}
export MEM_CHANNELS=${MEM_CHANNELS:-8}
export GEM5_XPARAMS="${GEM5_XPARAMS} --mem-channels=${MEM_CHANNELS}"
export TOTAL_MEM_SIZE_MB=${TOTAL_MEM_SIZE_MB:-512}

## tREAD and tWRITE are guesses ...
## Other values adjusted for 2666 MHz (vs 1200MHz in default gem5)

export MEM_TCK=${MEM_TCK:-0.357ns}
export MEM_TREAD=${MEM_TREAD:-125ns}
export MEM_TWRITE=${MEM_TWRITE:-180ns}
export MEM_TSEND=${MEM_TSEND:-6.28ns}
export MEM_TBURST=${MEM_TBURST:-1.528ns}
export MEM_TWTR=${MEM_TWTR:-0.357ns}
export MEM_TRTW=${MEM_TRTW:-0.357ns}
export MEM_TCS=${MEM_TCS:-0.714ns}
export MEM_DEVICE_SIZE=${MEM_DEVICE_SIZE:=$((TOTAL_MEM_SIZE_MB / MEM_CHANNELS))MB}

for (( i=0 ; i < $MEM_CHANNELS ; ++i )) ; do
export GEM5_XPARAMS="${GEM5_XPARAMS} --param system.mem_ctrls[$i].dram.tCK=\'${MEM_TCK}\'"
export GEM5_XPARAMS="${GEM5_XPARAMS} --param system.mem_ctrls[$i].dram.tREAD=\'${MEM_TREAD}\'"
export GEM5_XPARAMS="${GEM5_XPARAMS} --param system.mem_ctrls[$i].dram.tWRITE=\'${MEM_TWRITE}\'"
export GEM5_XPARAMS="${GEM5_XPARAMS} --param system.mem_ctrls[$i].dram.tSEND=\'${MEM_TSEND}\'"
export GEM5_XPARAMS="${GEM5_XPARAMS} --param system.mem_ctrls[$i].dram.tBURST=\'${MEM_TBURST}\'"
export GEM5_XPARAMS="${GEM5_XPARAMS} --param system.mem_ctrls[$i].dram.tWTR=\'${MEM_TWTR}\'"
export GEM5_XPARAMS="${GEM5_XPARAMS} --param system.mem_ctrls[$i].dram.tRTW=\'${MEM_TRTW}\'"
export GEM5_XPARAMS="${GEM5_XPARAMS} --param system.mem_ctrls[$i].dram.tCS=\'${MEM_TCS}\'"
export GEM5_XPARAMS="${GEM5_XPARAMS} --param system.mem_ctrls[$i].dram.device_size=\'${MEM_DEVICE_SIZE}\'"

done

This is designed to allow things like MEM_TCK to be set before running this
piece of script, but otherwise to supply the default value 0.357ns (for
example).  Once all the above is done, the command line itself must include
${GEM5_XPARAMS} to expand out all of the above.  Also, I know it's confusing,
but in an NVM-only system, the NVM controllers have the name "dram" even
though they are of NVM type!

HTH - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: CXL (Compute Express Link) in gem5?

2023-04-24 Thread Eliot Moss via gem5-users

On 3/27/2023 6:13 AM, gabriel.busnot--- via gem5-users wrote:

Thanks, Gabriel, for your response, now a month ago.  I want to turn my
attention back to this ... :-)


I can’t provide you with an assertive answer but I’ve also been looking at
CXL recently so here is what I understand so far.



 From a functional perspective, the classic cache system seems able to
support the hierarchical coherency aspects just fine with the coherent Xbar
of each chip connected to a CPU side port of the other chip’s Xbar. The
performance will probably be quite off, though. You could improve on it by
implementing a kind of throttle adapter SimObject that would model the CXL
link layer between the 2 xbars. Snoop performance modeling will remain
atomic/blocking just as with any classic cache configuration.


I'm trying to envision doing this in a way that would work.  First, I
interpret you as saying that each component that plays this CXL "game" has its
own coherent Xbar.  You seemed to say that they would be cross connected.
Suppose we have two devices, X and Y.  The mem side of X would be connected to
the cpu side of Y, and mem side of Y to the cpu side of X.  What confuses me
about this is that it seems it would lead to infinite forwarding to mem sides.
It also seems to make it difficult to offer a single point of coherence.

A second arrangement I thought of is that CXL memories could be "level
infinity" caches, i.e., act like caches though the set of lines they hold is
fixed, and their lines are always valid.  Their mem sides would go to a final
coherency Xbar that would serve as the point-of-coherence of the system.  A
CXL memory would always fast-route requests having to do with things outside its
address space to this coherency bus, so that some other memory could
respond.

A third arrangement would be a variation on the second one: CXL memories are
level-infinity caches on the other side of a coherent Xbar "memory bus" with
routing such that each CXL memory get requests pertaining only to its part of
the physical address space.  A CXL device that has its own cache would connect
like a cpu+cache to the memory bus.  A CXL device that has no cache could
connect directly to the coherent Xbar memory bus.  It is not clear to me how
that is different from the current sort of arrangement.

The setup I would like to be able to assemble is this:

- Regular cpu cores with a regular L1/L2/L3 cache hierarchy
- A memory system like the smart memory cube [SMC] - highly parallel
- A processor-in-memory [PIM] that:
  - has more-direct access to the SMC, but that access is still coherent
  - has a private scratch pad memory (non-coherent)
  - has its own cache that is coherent with the regular cores' memory
    hierarchy
  - has its own DMA units that transport data between coherent memory and the
private scratchpad

I have previously built most of this, but the PIM's cache and DMA were not
coherent, and going through extra protocols to deal with that dragged
performance down.


As for Ruby, the goal is further away. AFAIK, no protocol supports
hierarchical coherency (home node to home node requests, snoopable home
node, etc.). If you don’t care too much about these details, then I would
argue that configuring any Ruby protocol as usual and configuring your
topology to force traffic through a single link could get you closer to a
CXL-style configuration.  You could also implement a link adapter/bridge
component to model the CXL link layer better.


I'm not really interested in Ruby - I've generally "rolled my own", so to
speak.

Maybe it would be useful to set up a Zoom meeting where we can sketch systems
diagrams or something!

Best wishes - Eliot
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: 回复:Re: 回复:gem5& execute&

2023-04-20 Thread Eliot Moss via gem5-users

On 4/20/2023 12:20 PM, 中国石油大学张天 via gem5-users wrote:
Okay, thank you. I have received your suggestion and I will think it over tomorrow. It's already 
midnight here, so I'll go to bed first 。 Thank you again. By the way, it seems that every time I 
send you an email, you always reject it.


If you mean I am always saying "that's not a concept in gem5",
then I am just trying to be honest.  There's nothing wrong with
playing around with a residual based ALU.  I just don't see how
it makes any sense in gem5 because of the level at which it
models things.  You might be able to connect such a hardware
model up, invoke it whenever an instruction comes along that
wants to use it, and plug the result back in, but what would
that really accomplish, assuming it is just an alternative
implementation (at the gate level) of existing instructions?
They aren't modeled at the gate level anyway.  Maybe it would
stress test your model, but assuming the mode is correct, I
suspect all it would do is dramatically slow down the simulation.

Maybe I/we could be more helpful if you could clarify your
broader goal, what you're trying to accomplish, and why you
think gem5 is a good vehicle for it.  If I am just totally
misunderstanding, I apologize.

Regards - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: 回复:gem5& execute&

2023-04-20 Thread Eliot Moss via gem5-users

On 4/20/2023 11:56 AM, 中国石油大学张天 via gem5-users wrote:
Thank you for your reply again, hahahaha. I have been thinking recently whether it is possible to 
design ALU through Verilog, translate it through Verilator or other tools, and ultimately use it in 
Gem5. I'm not sure if it's feasible. I am so obsessed with ALU because I need to provide a rough 
reproduction of an article that uses the Residual Number System (RNS) to design a CPU core. Since I 
first learned about Gem5, I wanted to keep working on this tool. But now it doesn't seem very good.


As far as gem5 is concerned, you could just write the
operations in C++.  Though if the overall result is the
same as regular binary arithmetic, there is little point.
gem5 simply does not model at that level.  What would you
be showing?

If you want to be modeling things at that level, maybe
building up a CPU using an FPGA board is for you.

Best wishes - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: gem5& execute&

2023-04-20 Thread Eliot Moss via gem5-users

On 4/20/2023 11:33 AM, Eliot Moss via gem5-users wrote:

On 4/20/2023 10:58 AM, 中国石油大学张天 via gem5-users wrote:
Hello everyone, I would like to ask, when executing non memory access instructions in Gem5, 
shouldn't it be executed in ALU? But ALU has not been specifically designed and implemented, how 
is this instruction executed?


gem5 does not model at the circuit level.  A piece of C++ code,
defined with the instruction, will perform the operation.  It
charges an indicated amount of time, according to the functional
unit.  gem5 deals with all the queueing and timing, but does not
model at the gate level.

Best wishes - EM


It occurred to me to add a little bit to this.  The out-of-order (O3)
cpu model allows you to define a bunch of different functional units,
which ones can handle which instructions, how many of each the cpu
has, their timing, etc.  For that model anyway there is no single
"ALU".  What I said above still holds: gem5 defines enough to be
able to determine timing (and an estimate of power consumption) but
it does not model at a more detailed level than that.
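
For concreteness, a functional-unit pool entry in a configuration script looks roughly like this.  It is a sketch in the style of src/cpu/o3/FuncUnitConfig.py; the class name, latency, and count below are made up for illustration.

```python
from m5.objects import FUDesc, OpDesc

class MyIntALU(FUDesc):
    # Which operation classes this unit handles, and their latency in cycles.
    opList = [OpDesc(opClass='IntAlu', opLat=1)]
    count = 4  # how many such units the CPU has
```

The O3 CPU schedules instructions onto these descriptors; nothing below the opLat/count level is modeled.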

The model of DRAM chips is somewhat more detailed and realistic
(perhaps), but that's necessary to obtain accurate timing
predictions.

HTH -- EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: gem5& execute&

2023-04-20 Thread Eliot Moss via gem5-users

On 4/20/2023 10:58 AM, 中国石油大学张天 via gem5-users wrote:
Hello everyone, I would like to ask, when executing non memory access instructions in Gem5, 
shouldn't it be executed in ALU? But ALU has not been specifically designed and implemented, how is 
this instruction executed?


gem5 does not model at the circuit level.  A piece of C++ code,
defined with the instruction, will perform the operation.  It
charges an indicated amount of time, according to the functional
unit.  gem5 deals with all the queueing and timing, but does not
model at the gate level.

Best wishes - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: SPEC2017 Calculating CPI

2023-04-13 Thread Eliot Moss via gem5-users

On 4/13/2023 6:24 AM, Farbin Fayza via gem5-users wrote:

Hi Heng,
Thanks for your reply. Do you mean something like this? I changed atomic.cc 
like this:

void AtomicSimpleCPU::drainResume()
{
     assert(!tickEvent.scheduled());
     if (switchedOut())
         return;

     DPRINTF(SimpleCPU, "Resume\n");
     verifyMemoryMode();

     assert(!threadContexts.empty());

     _status = BaseSimpleCPU::Idle;

     threadInfo[0]->thread->lastSuspend = curTick();
     for (ThreadID tid = 0; tid < numThreads; tid++) {
         if (threadInfo[tid]->thread->status() == ThreadContext::Active) {
             threadInfo[tid]->execContextStats.notIdleFraction = 1;
             activeThreads.push_back(tid);
             _status = BaseSimpleCPU::Running;

             // Tick if any threads active
             if (!tickEvent.scheduled()) {
                 schedule(tickEvent, nextCycle());
             }
         } else {
             threadInfo[tid]->execContextStats.notIdleFraction = 0;
         }
     }


     // Reschedule any power gating event (if any)
     schedulePowerGatingEvent();
}

After this change, I rebuilt gem5 and ran a simulation on simple.py. But still, the number of cycles 
is very higher than the number of instructions.


Some things I can imagine happening ...

- Maybe numCycles is actually the number of ticks?  A tick is the smallest unit of time in the
  system, usually a picosecond - though that would seem to imply a really high execution rate
  of nearly 10 insts per nanosecond

- Maybe the cpu is idle a lot

It should not be too hard to find where those statistics are defined, output,
and manipulated in the source code, by using grep.
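
To check the first possibility with a bit of arithmetic: with gem5's usual one-picosecond tick, a tick count exceeds the cycle count by a clock-dependent factor.  The 2 GHz clock below is an assumption for illustration; your configuration may differ.

```python
# If a "cycle" count looks implausibly large, check whether it is really
# a tick count.
ticks_per_second = 10**12          # gem5 default: 1 tick = 1 picosecond
clock_hz = 2 * 10**9               # assumed 2 GHz core clock
ticks_per_cycle = ticks_per_second // clock_hz
print(ticks_per_cycle)             # -> 500 ticks per cycle
```

So at 2 GHz, a tick count mistaken for a cycle count would be inflated by a factor of 500.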

HTH - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: 回复:the definition of IntAlu

2023-04-12 Thread Eliot Moss via gem5-users

On 4/12/2023 8:12 AM, 中国石油大学张天 via gem5-users wrote:


So my current understanding is that there is no specific implementation of the ALU in gem5, but some 
required attributes are set, such as delay, number of functional units, power consumption, etc. So 
to design the ALU, it doesn't work. The feeling is to regard the functional unit as a black box, no 
matter what is done in the middle, only define how long it takes to get out of the black box, and 
how much power is used in the black box, etc. Do you think I understand this right?


Yes.  gem5 is a *timing* simulator, but at the functional level,
not the circuit level or logic gate level.  There are, of course,
other simulators that can do that.

See this Wikipedia page for some free circuit simulators:

https://en.wikipedia.org/wiki/List_of_free_electronics_circuit_simulators

and this one for (not necessarily free) HDL simulators:

https://en.wikipedia.org/wiki/List_of_HDL_simulators

and this one for an overview of electronic design automation software:

https://en.wikipedia.org/wiki/Comparison_of_EDA_software

You might wish to simulate your own design using FPGA boards as well.

I've not used any of these tools myself, but am generally aware of them.

Regards - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: the definition of IntAlu

2023-04-12 Thread Eliot Moss via gem5-users

On 4/12/2023 7:17 AM, Eliot Moss wrote:

On 4/11/2023 11:50 PM, 中国石油大学张天 via gem5-users wrote:
In gem5, where are the actual definitions of various functional units? For example, where is the 
definition of IntAlu?


src/cpu/o3/FuncUnitConfig.py


Typed too fast:

Not to be snarky, but I found this in less than 30 seconds using grep over the 
sources.

My command was grep -ri intalu gem5/src

Best - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: the definition of IntAlu

2023-04-12 Thread Eliot Moss via gem5-users

On 4/11/2023 11:50 PM, 中国石油大学张天 via gem5-users wrote:
In gem5, where are the actual definitions of various functional units? For example, where is the 
definition of IntAlu?


src/cpu/o3/FuncUnitConfig.py

No to be narky, but I found this in less than 30 seconds using grep over the 
sources.

Best - Eliot Mos
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: CPU&

2023-04-09 Thread Eliot Moss via gem5-users

On 4/9/2023 9:33 PM, 中国石油大学张天 via gem5-users wrote:
Hi everyone, I'm new to gem5. I wonder if it is possible to make changes to the ALU with Gem5? For 
example, I want to implement the addition of two numbers through the Residue Number System instead 
of binary. Or is there any way to implement addition based on Residue Number System? Such as 
modifying instructions?


If addition produces the same result, how would you know the difference?
gem5 does not model at the gate level, only the overall effect.  You can
certainly add new instructions however, if you can figure an encoding, etc.

Regards - Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: SPEC2017 - Most of the metrics in m5.out/stats.txt are 0 or undefined

2023-04-07 Thread Eliot Moss via gem5-users

On 4/7/2023 10:05 PM, Farbin Fayza via gem5-users wrote:
Could you kindly tell me if there's any way to run the gem5 simulation faster using multiple cores? 
Is it possible while we run SPEC?


The only way is to run multiple distinct simulations in parallel.
This can be done by directing their outputs (both the gem5 output
and the simulated program's output) to different files.  When I
do this I also have some temporary mounts that I need to be
careful about, so I copy the (fortunately small) mounted
drive files separately for each run.

Anyway, I do this all the time.

HTH -- Eliot Moss
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Error in loading libprotobuf9

2023-04-05 Thread Eliot Moss via gem5-users

On 4/5/2023 1:25 PM, Ponda, Esha Ashish via gem5-users wrote:

I tried to look into it and found that my OS has libprotobuf17 installed



./build/ARM/gem5.opt configs/example/arm/starter_se.py --cpu="minor" 
tests/test-progs/hello/bin/arm/linux/hello


I am trying to run the above command and getting the following error:



Any suggestions on what I can do to avoid getting the above error?


I wonder if you have updated your OS *after* building gem5.  Maybe
a rebuild of gem5 will cause it to pick up the current version of
libprotobuf?

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Error in loading libprotobuf9

2023-04-05 Thread Eliot Moss via gem5-users

On 4/5/2023 9:59 AM, Ponda, Esha Ashish via gem5-users wrote:

Hello,
I have tried to install the libprotobuf9 package multiple times, yet it is showing this same error. 
Can anyone help me with this please?


I think you need to research the right name for the *package*
under whatever Linux distribution you are using.  There is no
single answer to that question, but the package name clearly
is not libprotobuf9; rather, that is the base name of one or
more files within the package.

One of my Ubuntu systems gives me this as a list of package
names that start with libprotobuf, using the command:

apt list | grep "^libprotobuf"

libprotobuf-c-dev/jammy-updates,jammy-security 1.3.3-1ubuntu2.1 amd64
libprotobuf-c-dev/jammy-updates,jammy-security 1.3.3-1ubuntu2.1 i386
libprotobuf-c1/jammy-updates,jammy-security 1.3.3-1ubuntu2.1 amd64
libprotobuf-c1/jammy-updates,jammy-security 1.3.3-1ubuntu2.1 i386
libprotobuf-dev/jammy-updates,jammy-security,now 3.12.4-1ubuntu7.22.04.1 amd64 [installed]
libprotobuf-dev/jammy-updates,jammy-security 3.12.4-1ubuntu7.22.04.1 i386
libprotobuf-java-format-java/jammy,jammy 1.3-1.1 all
libprotobuf-java/jammy-updates,jammy-updates,jammy-security,jammy-security 3.12.4-1ubuntu7.22.04.1 all
libprotobuf-lite23/jammy-updates,jammy-security,now 3.12.4-1ubuntu7.22.04.1 amd64 [installed,automatic]
libprotobuf-lite23/jammy-updates,jammy-security 3.12.4-1ubuntu7.22.04.1 i386
libprotobuf2-java/jammy,jammy 2.6.1-4 all
libprotobuf23/jammy-updates,jammy-security,now 3.12.4-1ubuntu7.22.04.1 amd64 [installed,automatic]
libprotobuf23/jammy-updates,jammy-security 3.12.4-1ubuntu7.22.04.1 i386

The *installed* ones were adequate on my system.  You could dig into
these packages online to see if they contain what you need. That version
9 seems pretty old at this point - Ubuntu bionic offers 10, for example.

Regards - Eliot Moss


[gem5-users] Re: debugging python code inside GEM5

2023-03-31 Thread Eliot Moss via gem5-users

On 3/31/2023 3:26 AM, G via gem5-users wrote:

Hello,

gem5 seems to be C++ wrapping Python, meaning C++ is on top.
I can easily debug with a gdb setup in VSCode, but does anyone know how to debug
into the embedded Python code?

Such as se.py?


Personally, I have found adding in print statements to do
pretty well when I have a python code issue needing exploration.
I know it's "old school", but it's typically straightforward.
Not all debugging needs a GUI tool.

(Likewise, I find gdb, or udb (the Undo debugger, a commercial
product that academics can obtain for free, and which can wind
backward and forward through an execution) to serve me well on
the C++ code.)
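As a minimal sketch of that "old school" approach (the function and values here are made up for illustration, not taken from se.py), one can sprinkle prints through a config script and, where interactive inspection is wanted, drop into the standard-library debugger pdb:

```python
# Sketch: something like this can go into a gem5 config script
# (e.g. se.py) at a point of interest.  pdb needs no GUI.
import pdb

def configure_cache(size_kib, assoc):
    # A print statement is often enough to see what values flow through.
    print(f"configuring cache: size={size_kib} KiB assoc={assoc}")
    # Uncomment to stop here and inspect variables interactively:
    # pdb.set_trace()
    return {"size": size_kib * 1024, "assoc": assoc}

cfg = configure_cache(32, 8)
print(cfg)
```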

Best wishes - Eliot Moss


[gem5-users] Re: what is the significance of hostMemory parameter in stats.txt file

2023-03-30 Thread Eliot Moss via gem5-users

On 3/30/2023 7:13 AM, Sadhana . via gem5-users wrote:


   Can the hostMemory parameter value be considered the memory footprint size of an application?
Does it include the memory occupied by the entire application?


I believe it is the total memory of the simulated machine,
and has nothing to do with how much of that memory is used
by the simulated application.  Using grep over the source
code can help answer questions like this :-) ...

Regards - Eliot Moss


[gem5-users] CXL (Compute Express Link) in gem5?

2023-03-25 Thread Eliot Moss via gem5-users

I'm wondering what work has been done to model CXL in gem5.

Is it something that can be modeled with existing gem5
components by adjusting their timing and other parameters,
or would modeling it well require new components?

From a quick high-level review of what CXL is (Wikipedia),
I *think* I'm most interested in CXL.cache (giving a device
high performance coherent access to memory) and possibly
CXL.mem.  I'm more interested in modeling the performance
than in modeling all the parameter read-out and setup that
would be in CXL.io, as I understand it.

Regards - Eliot Moss


[gem5-users] Re: Retired instructions versus ticks

2023-03-22 Thread Eliot Moss via gem5-users

On 3/22/2023 12:09 PM, Priyanka Ankolekar via gem5-users wrote:


Regarding the other part of your email:
Let me begin by saying I am a novice to both RISCV and gem5.
I have a RISCV RTL with a certain config. I have set up gem5 to match that configuration. I want to 
make sure that they are indeed equivalent so that I can run some experiments on gem5 (instead of on 
RTL) since that would be faster and easier. In order to establish that equivalence, I am running a 
simple benchmark test on both RTL and gem5. The final numbers like DMIPS/MHz etc match fairly 
closely. But I want to dig further to see if the retired instruction/s at a given tick, for both 
these setups, are also a close match.

Hence the questions.


My suggestion would be to:

- Read the CSR at points of interest - one hopes not *too* many points to avoid being overwhelmed 
with output.  Do this in gem5 and in your RTL.


- Add code to gem5 to print the tick and the value read when the CSR is read.  A DPRINTF
call would serve nicely.  grep can help you find where the right code is, using the register name.


Would this do the trick?

Best - EM


[gem5-users] Re: Retired instructions versus ticks

2023-03-22 Thread Eliot Moss via gem5-users

On 3/22/2023 11:11 AM, Priyanka Ankolekar wrote:

Sorry, I should have clarified. I am using the RISCV ISA in gem5.


(As you could have done,) I checked the gem5 sources,
and it *does* model that register, returning totalInsts
as gem5 calculates that.  Presumably that is the same as
statistics will give you, but you could read it on the
fly.  Not sure if the instruction to read that is
privileged, though if it is, you could (as a hack)
change gem5 to allow it to be read in user mode.

Cheers - EM

PS: You did not respond to the other part of what I
said: what is it that you are really trying to do that
the previous suggestions do not satisfy?  Cheers - EM


[gem5-users] Re: Retired instructions versus ticks

2023-03-22 Thread Eliot Moss via gem5-users

On 3/22/2023 8:37 AM, Priyanka Ankolekar via gem5-users wrote:

Thank you, Eliot.

Is there a way to probe minstret CSR to get the retired instructions?


??  What ISA are you talking about?

I doubt that gem5 would model such details of a processor
architecture.  Maybe you should back up a little and tell
us what you're really trying to do, since neither the
retired instructions stats nor a full trace seem to meet
your need ...

Best - Eliot Moss


[gem5-users] Re: Retired instructions versus ticks

2023-03-20 Thread Eliot Moss via gem5-users

On 3/20/2023 5:05 PM, Priyanka Ankolekar via gem5-users wrote:

Hi Eliot,
(Picking this up again after a while.) :-)

Thank you for your detailed answer. I was able to get a lot of useful data 
points from these statistics.
Is there a way to get what instruction was retired/committed and when (tick)?


That would be a full trace.  For that, look into the various debug flags.

Be prepared for a LOT of output!!

Best - EM


[gem5-users] Re: Question about setting up to use NVM

2023-03-19 Thread Eliot Moss via gem5-users

On 3/18/2023 10:40 PM, Ayaz Akram via gem5-users wrote:

Hi Eliot,

MemCtrl() memory controller in gem5 can control a single DRAM interface or a single NVM interface at 
a time. I think one way to verify that things are set-up correctly is to confirm this from the 
"m5out/config.ini". If config.ini seems to be using 'MemCtrl' type for the memory controller and the 
memory interface connected to the controller is of the type 'NVMInterface', I think that should 
confirm that you are simulating an NVM device.


Thanks, Ayaz!  Yes, that's what I get.  It seems odd that the component
gets named blah-blah.dram, but it clearly says NVMInterface, so that's ok.

I appreciate the confirmation.

Eliot


[gem5-users] Question about setting up to use NVM

2023-03-18 Thread Eliot Moss via gem5-users

Dear gem5-ers -

I wanted to set up to use NVM only, so I tried this on the command line:

  --nvm-type=NVM_2400_1x64

This had no effect.  Digging into configs/common/MemConfig.py was not directly
enlightening.  However, it seems that Options.py sets mem-type to a particular
DRAM by default.  Doing --mem-type=None does not work, giving an error message
that None is not a valid memory type.

What *seems* to work is saying this:

  --mem-type=NVM_2400_1x64

If that is the "approved" way, then fine, but I wanted to make sure that I am
not ending up with some strange mix of controller and device, part intended
for DRAM and part for NVM ...
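For concreteness, a sketch of the sort of invocation in question (the se.py location, CPU options, and workload name are illustrative and vary across gem5 versions; the key flag is --mem-type with an NVM interface name):

```shell
# Illustrative only; adjust paths and options to your gem5 checkout.
./build/X86/gem5.opt configs/example/se.py \
    --cpu-type=DerivO3CPU --caches \
    --mem-type=NVM_2400_1x64 \
    --cmd=./hello
```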

(Maybe mem-type should be set to that default only if neither mem-type nor
nvm-type are given?  I also noticed that the default and flags for "colors"
are such that one cannot get colors printed any more.  I did a local fix,
but maybe that was not intended?)

Regards - Eliot Moss


[gem5-users] Re: Error: Can't find a working Python installation redux

2023-03-15 Thread Eliot Moss via gem5-users

On 3/15/2023 8:57 AM, Kar, Anurag Arunkumar via gem5-users wrote:

Hi,

I tried following previous archived threads, which said the solution to this problem was to provide
the path to PYTHON_CONFIG and not to use a conda environment.


I am not using a conda environment and am providing the path in the scons 
command:

scons ./build/ARM_MESI_Three_Level/gem5.opt PYTHON_CONFIG=/usr/bin/python-config

scons: Reading SConscript files ...

Mkdir("/data/anurag/gem5-public/build/ARM_MESI_Three_Level/gem5.build")

Checking for linker -Wl,--as-needed support... (cached) yes

Checking for compiler -Wno-free-nonheap-object support... (cached) yes

Checking for compiler -gz support... (cached) yes

Checking for linker -gz support... (cached) yes

Info: Using Python config: python3-config

Checking for C header file Python.h... (cached) yes

Checking Python version... no

Error: Can't find a working Python installation

I have been able to build gem5 on this very machine before, I’m not sure what changed between then 
and now. Can someone help me debug this?


To pass an environment variable into a program, write your command line this 
way:

PYTHON_CONFIG=/usr/bin/python-config scons ./build/ARM_MESI_Three_Level/gem5.opt

or else do this first:

export PYTHON_CONFIG=/usr/bin/python-config

BUT ... at least in my setup, python-config does not exist, yet all works fine.

Is python accessible through PATH?  That's what I would check first.  If you're
on bash, then "type python" will show you.

Cheers - Eliot Moss


[gem5-users] Re: SPEC CPU2017 X86 SE mode - instruction 'palignr_Vdq_Wdq_Ib' unimplemented

2023-03-14 Thread Eliot Moss via gem5-users

On 3/13/2023 5:33 PM, Abitha Thyagarajan via gem5-users wrote:

Hi Eliot and Mirco,

I had the same issue with `palignr_Vdq_Wdq_Ib` being unimplemented. I tried compiling my application 
binary (i.e., the one I was trying to run on gem5, not gem5 itself) to exclude SSE which contains 
that instruction. I used gcc flags `-mno-sse3 -mno-ssse3 -mno-sse4.1 -mno-sse4.2 -mno-sse4`. 
However, I still get the warning message about `palignr_Vdq_Wdq_Ib` being unimplemented.


Do either of you have any idea about this? If you had success with resolving this warning by another 
method, please let me know.


Sorry - I have no deep knowledge on this and was only making a suggestion.

However, maybe the instruction is coming from a library routine that gets
linked in later, and this not from your actual gcc output.  You can probably
locate the offending instruction using objdump, looking at (searching) the
assembly code of a fully linked executable.

If it *is* a library routine, I'm not sure exactly how to get a version that
does not use that instruction.  Some library routines test hardware ability
at run time and choose different function versions based on the result.  If
the gem5-modeled CPU says "I have SSE", then you may get that version.  It may
be possible to configure the gem5 cpu without SSE (I've not researched that).

I hope this gives you some useful possibilities to pursue.

Best - EM


[gem5-users] Re: Slowness when running SAT Solver in gem5 SE mode

2023-03-13 Thread Eliot Moss via gem5-users

Umm, it's a simulator, and you requested the most detailed simulation mode
(DerivO3CPU).  I expect slowdown factors of *at least* 1000 with such a mode.
That you are seeing perhaps 4000-5000 does not surprise me all that much.  The
simulator has to do a lot of work for each simulated instruction.
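A back-of-the-envelope sketch of that time dilation (all numbers below are illustrative assumptions, not measurements from this thread):

```python
# Illustrative: a host retiring ~3.4e9 instructions/s, and a detailed
# O3 simulation progressing at ~1e6 simulated instructions/s.
host_ips = 3.4e9
sim_rate = 1.0e6

slowdown = host_ips / sim_rate
print(f"slowdown factor: ~{slowdown:,.0f}x")  # in the thousands

# Wall-clock time to simulate a 20-million-instruction run at that rate:
sim_insts = 20e6
print(f"wall-clock estimate: ~{sim_insts / sim_rate:.0f} s")
```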

If you just want hit/miss statistics, a simpler cpu model might serve your
purpose.  The simplest that might do it is AtomicSimpleCPU - it certainly
maintains a model of the cache; I am less certain about branch prediction.

Tools other than gem5 may get you this kind of result faster - maybe some of the
valgrind tools, for example.

Others might be able to confirm or argue against my sense of the slowdown
factor (which I sometimes call the *time dilation* of simulation).

HTH - Eliot Moss


[gem5-users] Re: error when running Z3 in SE mode on gem5

2023-03-11 Thread Eliot Moss via gem5-users

I think the key thing here might be the:

"Resource temporarily unavailable"

You seem to have exceeded some limited resource, perhaps
available memory, number of processes (forks), or something
like that.
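A quick way to see what might be pinching, assuming a Unix host (which limits matter depends on the workload; the three below are common culprits for EAGAIN):

```python
import resource

# "Resource temporarily unavailable" (EAGAIN) often traces back to one of
# these per-process limits; a value of -1 (RLIM_INFINITY) means unlimited.
limits = {
    "max processes (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
    "address space (RLIMIT_AS)":    resource.RLIMIT_AS,
    "open files (RLIMIT_NOFILE)":   resource.RLIMIT_NOFILE,
}
for name, which in limits.items():
    soft, hard = resource.getrlimit(which)
    print(f"{name}: soft={soft} hard={hard}")
```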

You may also have more success booting an actual kernel
as opposed to running with emulated syscalls, though of
course it will tend to be a bit slower to simulate.

Regards - Eliot Moss

