[gem5-users] Committing loads

2021-05-06 Thread Farhad Yusufali via gem5-users
 Hello all,

I’ve been staring at the memory trace of a benchmark and am confused as to why 
some loads that I expect to commit are not being committed. Note that I am 
using Ruby.

I’ve been tracing the path loads take, and it seems to end at IEW::instToCommit 
(more information can be found here: 
https://www.gem5.org/project/2020/07/18/gem5-o3cpu-backend.html).

instToCommit primarily seems to just add the instruction to a structure called 
the “iewQueue”; nothing is actually committed at that point.

Then later on, in the tick() function in iew_impl.hh, the load is actually 
committed using the following code:

// Update structures based on instructions committed.
if (fromCommit->commitInfo[tid].doneSeqNum != 0 &&
    !fromCommit->commitInfo[tid].squash &&
    !fromCommit->commitInfo[tid].robSquashing) {

    ldstQueue.commitStores(fromCommit->commitInfo[tid].doneSeqNum, tid);
    ldstQueue.commitLoads(fromCommit->commitInfo[tid].doneSeqNum, tid);

    updateLSQNextCycle = true;
    instQueue.commit(fromCommit->commitInfo[tid].doneSeqNum, tid);
}

In the commitLoads function [see code excerpt above], the load entry is 
actually freed from the LSQ. I’ve been trying to find the link between the 
instToCommit function and the code in tick(), but to no avail. Where/how 
exactly does the tick function decide that it can commit a load? Essentially, 
what’s the bridge between IEW::instToCommit and IEW::tick? I was trying to 
figure out the connection between “fromCommit” [see code excerpt above] and 
“iewQueue”, but it seems to involve wires, time buffers, and several other 
concepts I’m not familiar with.
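
For what it’s worth, here is my current mental model of that bridge, as a 
minimal self-contained sketch (this class is my own simplification of what I 
see in src/cpu/timebuf.hh, not gem5’s actual code, so corrections are welcome):

#include <cassert>
#include <cstdint>
#include <iostream>
#include <vector>

// What Commit sends back to IEW each cycle (simplified).
struct CommitInfo {
    uint64_t doneSeqNum = 0;   // youngest committed sequence number
    bool squash = false;
};

// A tiny circular time buffer: stages write at offset 0 ("now") and read at
// a negative offset ("N cycles ago"); this is how I understand the "wires"
// connecting Commit and IEW.
class TimeBuffer {
    std::vector<CommitInfo> buf;
    std::size_t now = 0;
  public:
    explicit TimeBuffer(std::size_t depth) : buf(depth) {}
    void advance() { now = (now + 1) % buf.size(); buf[now] = CommitInfo{}; }
    CommitInfo &write() { return buf[now]; }           // Commit's toIEW wire
    const CommitInfo &read(int delay) const {          // IEW's fromCommit wire
        assert(delay < 0 && static_cast<std::size_t>(-delay) < buf.size());
        return buf[(now + buf.size() - static_cast<std::size_t>(-delay))
                   % buf.size()];
    }
};

int main() {
    TimeBuffer timeBuffer(4);
    // Cycle 0: Commit retires up to seqNum 42 and writes it on its wire.
    timeBuffer.write().doneSeqNum = 42;
    timeBuffer.advance();  // clock edge
    // Cycle 1: IEW::tick() reads the one-cycle-old entry; this is where it
    // would call ldstQueue.commitLoads(fromCommit.doneSeqNum, tid).
    const CommitInfo &fromCommit = timeBuffer.read(-1);
    std::cout << "IEW sees doneSeqNum = " << fromCommit.doneSeqNum << "\n";
}

If that picture is roughly right, then the commitLoads call in IEW::tick is 
driven by whatever Commit wrote onto the time buffer a cycle earlier, not by 
instToCommit directly.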

Thanks,
Farhad




[gem5-users] L1 load misses

2021-04-22 Thread Farhad Yusufali via gem5-users
Hello all,

I’m simulating a single core system with Ruby, using the MESI_Two_Level 
protocol. I need to measure the number of L1 load misses. To do so, I’m summing 
up these two statistics:

system.ruby.L1Cache_Controller.NP.Load
system.ruby.L1Cache_Controller.I.Load

I would expect the resulting sum to give me the number of load requests that 
miss in the L1. However, the numbers I’m seeing aren’t what I expect, so I’m 
trying to figure out what’s going on.

a) Does anyone know where the two stats above are generated? 
MESI_Two_Level-L1cache.sm doesn’t seem to generate them, nor can I find any 
reference to them in the C++ code…

b) Can anyone confirm that this is in fact the correct way to measure the 
number of L1 load misses?
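
In case the problem is on my end, this is exactly how I’m computing the sum 
(a minimal sketch, assuming the default m5out/stats.txt layout of 
"name  value  # description"):

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::ifstream stats("m5out/stats.txt");
    std::string line, name;
    double value, l1LoadMisses = 0;
    while (std::getline(stats, line)) {
        std::istringstream fields(line);
        if (!(fields >> name >> value))
            continue;  // skip blank lines, headers, and comments
        if (name == "system.ruby.L1Cache_Controller.NP.Load" ||
            name == "system.ruby.L1Cache_Controller.I.Load")
            l1LoadMisses += value;
    }
    std::cout << "L1 load misses (NP.Load + I.Load): " << l1LoadMisses << "\n";
}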

Thanks,
Farhad





[gem5-users] Re: Simulated vs Committed Ops

2021-03-15 Thread Farhad Yusufali via gem5-users
That’s very helpful, thanks a lot!

Best,
Farhad

On Mar 15, 2021, at 1:32 PM, Giacomo Travaglini <giacomo.travagl...@arm.com> wrote:

Yeah, this is a quirk of gem5. I tried to address it some time ago [1]; I 
should probably go back and rebase the patchset.

Basically, numOp, being a Counter (and not a Stat), doesn’t get reset when you 
reset stats. So numOp is the total number of ops since the beginning of the 
simulation, whereas committedOps is the real indication of committed ops in 
the region of interest (since it can be reset).

My advice is to simply look at committedOps if you just want to know the ops 
executed in the region of interest.
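
To illustrate with a toy sketch (these are not gem5’s real classes, just the 
reset semantics described above):

#include <iostream>

struct Counter { long v = 0; void inc() { ++v; } };  // never reset
struct Stat { long v = 0; void inc() { ++v; } void reset() { v = 0; } };

int main() {
    Counter numOp;      // feeds simOps
    Stat committedOps;  // feeds system.cpu.commit.committedOps

    for (int i = 0; i < 90; ++i) { numOp.inc(); committedOps.inc(); }  // warm-up
    committedOps.reset();  // your reset pseudo-instruction fires here
    for (int i = 0; i < 10; ++i) { numOp.inc(); committedOps.inc(); }  // region of interest

    // Prints 100 vs 10: a 10x discrepancy caused purely by the reset,
    // not by any difference in the work actually executed.
    std::cout << numOp.v << " vs " << committedOps.v << "\n";
}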

Kind Regards

Giacomo

[1]: https://gem5-review.googlesource.com/c/public/gem5/+/25303


[gem5-users] Re: Simulated vs Committed Ops

2021-03-15 Thread Farhad Yusufali via gem5-users
Hi Giacomo,

I only execute a single workload every time I run a simulation, so there are no 
interactions between different workloads.

I do reset my stats at the beginning of the region of interest of my workload 
though (using a pseudo-instruction).

Thanks,
Farhad

On Mar 15, 2021, at 12:56 PM, Giacomo Travaglini <giacomo.travagl...@arm.com> wrote:

Hi Farhad,

Quick question: are you resetting the stats between one workload and the other?

Kind Regards

Giacomo


[gem5-users] Simulated vs Committed Ops

2021-03-15 Thread Farhad Yusufali via gem5-users
Hello all,

I’m trying to figure out the difference between two stats - simOps and 
system.cpu.commit.committedOps. I’m using a single 4-wide DerivO3CPU, running 
serial workloads (i.e., no threads). In src/cpu/o3/cpu.cc, in the instDone 
function, I found the following code, which is executed every time an 
instruction is committed:

thread[tid]->numOp++;
thread[tid]->numOps++;
committedOps[tid]++;

As you can see, both committedOps and numOp are incremented once for every op 
committed. In the same file, in the totalOps function, the numOp values across 
the different threads are accumulated to return a total op count for the 
entire core:

template <class Impl>
Counter
FullO3CPU<Impl>::totalOps() const
{
    Counter total(0);

    ThreadID size = thread.size();
    for (ThreadID i = 0; i < size; i++) {
        total += thread[i]->numOp;
    }

    return total;
}

Finally, in src/cpu/base.hh, in the BaseCPU class, there is a function, 
numSimulatedOps, that coalesces op counts across CPUs:

static Counter numSimulatedOps()
{
    Counter total = 0;

    int size = cpuList.size();
    for (int i = 0; i < size; ++i)
        total += cpuList[i]->totalOps();

    return total;
}

This total value is used to generate the simOps stat.

To summarize, every time an op is committed, both committedOps and numOp are 
incremented. committedOps is dumped out via the 
system.cpu.commit.committedOps stat, while numOp is accumulated across both 
threads and CPUs and then dumped out as the simOps stat. Since I’m using a 
single CPU with a single thread, I would expect both stats to be identical; 
however, they end up vastly different (sometimes by over a factor of 10, with 
simOps usually the larger).

Does anyone have any insight into why this could be happening?

Thanks,
Farhad

[gem5-users] Re: ThreadID vs ContextID vs threadNumber

2020-11-13 Thread Farhad Yusufali via gem5-users
Got it, thanks. If I have a ThreadID and know what CPU it's running on, how 
would I go about retrieving the corresponding ContextID?

Thanks,
Farhad

From: Adrian Herrera 
Sent: November 13, 2020 10:53 AM
To: gem5 users mailing list 
Cc: Farhad Yusufali ; Giacomo Travaglini 

Subject: Re: [gem5-users] Re: ThreadID vs ContextID vs threadNumber

Hi Farhad,



To clarify:

  *   CPU ports, in all simple, minor and O3 configurations, tag generated 
Requests with ContextID.
  *   Outgoing Packets from CPU ports have a Request pointer inside, via which 
you can get the ContextID.
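
In toy form, the idea is the following (these are stand-ins, not gem5’s real 
classes; the actual accessors are Packet::req and Request::contextId()):

#include <iostream>
#include <memory>

using ContextID = int;

struct Request {             // tagged by the CPU port when the access is issued
    ContextID contextId;
};

struct Packet {              // what travels out of the CPU port
    std::shared_ptr<Request> req;
};

ContextID issuingContext(const Packet &pkt) { return pkt.req->contextId; }

int main() {
    Packet pkt{std::make_shared<Request>(Request{3})};
    std::cout << "packet issued by context " << issuingContext(pkt) << "\n";
}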



Kind regards,

Adrian.



From: Giacomo Travaglini via gem5-users 
Reply to: gem5 users mailing list 
Date: Friday, 13 November 2020 at 14:39
To: gem5 users mailing list 
Cc: Farhad Yusufali , Giacomo Travaglini 

Subject: [gem5-users] Re: ThreadID vs ContextID vs threadNumber



Hi Farhad,



ThreadID -> index of the thread within the CPU

ContextID -> global index of the thread within the System



As you can imagine, they differ in an MP simulation with multiple CPUs per System.
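
A toy illustration of the distinction (the arithmetic mapping below is an 
assumption for a homogeneous system; gem5 actually assigns ContextIDs as 
thread contexts register with the System):

#include <iostream>

using ThreadID  = int;  // index of the thread within its CPU
using ContextID = int;  // global index of the thread within the System

ContextID toContextId(int cpuId, ThreadID tid, int threadsPerCpu) {
    return cpuId * threadsPerCpu + tid;
}

int main() {
    // 2 CPUs x 2 hardware threads: ThreadIDs repeat, ContextIDs do not.
    for (int cpu = 0; cpu < 2; ++cpu)
        for (ThreadID tid = 0; tid < 2; ++tid)
            std::cout << "cpu " << cpu << ", thread " << tid
                      << " -> context " << toContextId(cpu, tid, 2) << "\n";
}

With one thread per CPU (no SMT), ThreadID is always 0 and the ContextID 
effectively reduces to the CPU index.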



Kind Regards



Giacomo




[gem5-users] Re: ThreadID vs ContextID vs threadNumber

2020-11-13 Thread Farhad Yusufali via gem5-users
Hello,

Just following up on this, any help would be appreciated.

Thanks,
Farhad


[gem5-users] ThreadID vs ContextID vs threadNumber

2020-11-11 Thread Farhad Yusufali via gem5-users
Hi all,

I'm trying to identify which thread a packet and/or instruction belongs to. I 
see three possible options, and I'm hoping someone can clarify the difference 
between them. I was not able to find much documentation online.

What is the difference between ContextID, ThreadID and threadNumber? (this last 
one is a member of the DynInstPtr class)

I'm not using SMT; in this case, do any of the IDs end up being the same?

Thanks,
Farhad

[gem5-users] Re: SE Mode crashing with multithread workload

2020-10-15 Thread Farhad Yusufali via gem5-users
Hi all,

Just following up on this. Any help would be appreciated!

Thanks,
Farhad



[gem5-users] Re: Shared L2 with Mesh XY Topology

2020-10-14 Thread Farhad Yusufali via gem5-users
If private/shared is determined by the coherence protocol, what does the 
"number of L2 caches" represent? For instance, I'm using the MESI Two Level 
protocol, which according to http://www.m5sim.org/MESI_Two_Level uses a shared 
L2 cache. Yet even with this protocol I can have multiple L2 caches via the 
"--num-l2caches" parameter. What does it mean to have multiple shared caches? 
I thought a shared cache was, by definition, singular.

Thanks,
Farhad

From: Krishna, Tushar via gem5-users 
Sent: October 13, 2020 1:43 PM
To: gem5 users mailing list 
Cc: Krishna, Tushar 
Subject: [gem5-users] Re: Shared L2 with Mesh XY Topology

Private vs shared L2 depends on the coherence protocol you use. The coherence 
protocol exposes the total number of controllers (L1/L2/Dir), which you can 
connect in whatever way you want. Similarly, if you have a shared L2, I think 
the coherence protocol can expose multiple NUCA slices or a single one. [I am 
not familiar with the latest in terms of the coherence protocols in gem5 
today.]
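
As a toy illustration of what sliced/banked shared L2s mean (this assumes a 
block-interleaved mapping; the protocol's actual address-to-slice mapping may 
differ):

#include <cstdint>
#include <iostream>

// Every core can reach every slice, and each block lives in exactly one
// slice, so the slices together behave as one logically shared L2.
int l2SliceFor(uint64_t addr, int numSlices, int blockSize = 64) {
    return static_cast<int>((addr / blockSize) % numSlices);
}

int main() {
    for (uint64_t addr : {0x00, 0x40, 0x80, 0xC0, 0x100})
        std::cout << "addr 0x" << std::hex << addr << std::dec
                  << " -> slice " << l2SliceFor(addr, 4) << "\n";
}

Under such a mapping, --num-l2caches just sets the number of slices of the one 
logically shared L2.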

There does not need to be one L2 per core; Mesh_* is an example topology file 
which assumes equal numbers of L1s and L2s and connects them to all the 
routers. See MeshDirCorners_* for an example of how to connect different 
numbers of L1s, L2s, and Directories.

Cheers,
Tushar

[gem5-users] SE Mode crashing with multithread workload

2020-10-13 Thread Farhad Yusufali via gem5-users
Hi all,

My gem5 version is fa70478413e4650d0058cbfe81fd5ce362101994. I'm trying to run 
a multithreaded workload in SE mode, but it's crashing. Here is my very simple 
workload:


#include <iostream>
#include <pthread.h>

using namespace std;

int sum[4];

void* thread(void* sum) {
  for (int i = 0; i < 1000; i++)
    *((int*)sum) += i;

  return 0;
}

int main() {
  sum[0] = sum[1] = sum[2] = sum[3] = 0;
  pthread_t threads[4];

  for (int tid = 0; tid < 4; tid++)
    pthread_create(&threads[tid], NULL, thread, &sum[tid]);

  for (int tid = 0; tid < 4; tid++)
    pthread_join(threads[tid], NULL);

  cout << sum[0] << " " << sum[1] << " " << sum[2] << " " << sum[3] << endl;
  return 0;
}



When I run it with:

build/X86/gem5.opt --debug-flags=PseudoInst configs/example/se.py --cmd=./multi 
--num-cpus=4 --ruby --cpu-type=DerivO3CPU

I get:
panic: panic condition !clobber occurred: EmulationPageTable::allocate: addr 
0x7778d000 already mapped

I found an existing thread that discusses this but no update was posted: 
https://www.mail-archive.com/gem5-users@gem5.org/msg17926.html

Was this ever resolved?

Thanks,
Farhad

[gem5-users] Shared L2 with Mesh XY Topology

2020-10-13 Thread Farhad Yusufali via gem5-users
Hi all,

I'm trying to simulate a multicore system that uses a Mesh XY topology, and has 
a single shared L2. However, the documentation here 
(http://www.m5sim.org/Interconnection_Network) says the following:

Mesh_*: This topology requires the number of directories to be equal to the 
number of cpus. The number of routers/switches is equal to the number of cpus 
in the system. Each router/switch is connected to one L1, one L2 (if present), 
and one Directory. The number of rows in the mesh has to be specified by 
--mesh-rows. This parameter enables the creation of non-symmetrical meshes too.

Since there needs to be one L2 per core, I assume the L2s are private. (Unless 
I'm misunderstanding and these are just NUCA slices of one shared L2?).

How can I go about using a Mesh_XY topology with a shared L2 cache?

Thanks,
Farhad

[gem5-users] Re: Approximating the value of L1 misses

2020-09-10 Thread Farhad Yusufali via gem5-users
Hi all,

I didn't get a response on this so just following up - any feedback or insight 
would be appreciated!

Thanks,
Farhad


[gem5-users] Approximating the value of L1 misses

2020-09-08 Thread Farhad Yusufali via gem5-users
Hi all,

I'm trying to reduce the latency of L1 misses - to do so, upon an L1 miss, 
instead of fetching from the L2, I would like to use a predictor to predict the 
value of the load and return this to the core (the details of the predictor are 
irrelevant). This way the processor is not stalled waiting for the block to be 
retrieved from the L2/memory. After every N misses to an address, in addition 
to generating a prediction, the L1 would also retrieve the data from the L2 and 
use this to train the predictor (the retrieved block would also be inserted 
into the L1). This is effectively an implementation of Load Value Approximation 
by San Miguel et al.
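
To make the intended flow concrete, here is a toy model of what I'm 
implementing (plain C++, not SLICC; the predictor is a last-value stand-in, 
and the L2 fetch is modelled synchronously for brevity):

#include <cstdint>
#include <iostream>
#include <unordered_map>

struct LoadValueApproximator {
    static constexpr int N = 4;  // train on every Nth miss to an address
    std::unordered_map<uint64_t, int> missCount;
    std::unordered_map<uint64_t, uint64_t> predictor;  // last-value stand-in

    // On an L1 load miss: always return a prediction immediately, so the
    // core is not stalled; on every Nth miss, also fetch the true data to
    // train the predictor (and, in the real design, fill the L1).
    uint64_t onL1Miss(uint64_t addr) {
        if (++missCount[addr] % N == 0)
            predictor[addr] = fetchFromL2(addr);
        return predictor[addr];
    }

    uint64_t fetchFromL2(uint64_t addr) { return addr * 2; }  // fake memory
};

int main() {
    LoadValueApproximator lva;
    for (int i = 1; i <= 8; ++i)
        std::cout << "miss " << i << " -> " << lva.onL1Miss(0x40) << "\n";
}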

I'm using the Ruby cache system because I need access to Garnet. To implement 
this, I'm currently modifying the Ruby implementation of the L1 cache in 
gem5/src/mem/ruby/protocol/MESI_Two_Level-L1cache.sm.

However, despite my technique having nothing to do with cache coherence, I'm 
essentially having to modify the details of the MESI protocol to get it to 
work. For example, say I need to fetch data for training after predicting a 
load that is not currently in the L1. If the block is in the I state, I need 
to send a request to the L2 to retrieve the data. However, if the block is in 
the I_S state, I don't need to send a request to the L2, since the data is 
already on its way. This is just one of many ways I need to modify the 
coherence protocol.

Is there an easier way to implement this without having to deal with all the 
intricacies and headaches of modifying/verifying coherence protocols? (I cannot 
switch to the classical cache system)

Thanks,
Farhad