Re: [gem5-users] Printing stats in ROI

2019-10-23 Thread Prathap Kolakkampadath
Why don't you use the existing m5_ pseudo instructions around the ROI of the
benchmark? Note that you need to compile your benchmark with the m5 library.
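For example, a minimal ROI instrumentation could look like this (a sketch:
the ops header path and the library under util/m5 vary by gem5 version, and
run_roi() is a placeholder for your benchmark's kernel):

    #include <gem5/m5ops.h>   /* header location varies across gem5 versions */

    static void run_roi(void)
    {
        /* region-of-interest kernel goes here */
    }

    int main(void)
    {
        /* ... benchmark setup ... */
        m5_reset_stats(0, 0);   /* ROI begin: zero the stats */
        run_roi();
        m5_dump_stats(0, 0);    /* ROI end: write a stats snapshot */
        return 0;
    }

Link the binary against the m5 library built in util/m5 for your target ISA.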

If you are looking for more data, you may also add your own statistics in the
respective mem/cache source files and recompile gem5.

Regards,
Prathap

On Wed, Oct 23, 2019 at 4:41 PM Victor Kariofillis 
wrote:

> Hi,
>
> I have implemented pseudo instructions for recognizing the Region of
> Interest of the benchmarks that I am running. What I want to do is to start
> printing some information (cache data) to a file as soon as the ROI begins.
> This printing will be done through the base.cc file in mem/cache. I tried
> having a boolean "roi" variable in system.hh, so that it would be globally
> available, but I'm getting compiler errors about multiple definitions. How
> can I know, from base.cc, whether the program's execution is inside the ROI?
>
> I also have another question. Is it possible to delay this printing until
> the ROI ends? The problem I have with that is that when the "roiend" pseudo
> instruction is encountered, I dump the stats and exit the simulation. Is it
> possible to print these data from base.cc just as it is done with regStats
> before exiting the simulation?
>
> Thanks,
> Victor

Re: [gem5-users] Fwd: Issue related to requests having no contextID

2017-10-10 Thread Prathap Kolakkampadath
Read requests to memory addresses that are in the write queue and not yet
written to memory have to be serviced from the write queue. I am not sure if
this answers your question, but it seems like this is what you are
observing.
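For reference, the check in question looks roughly like the following in the
classic DRAM controller's addToReadQueue() (paraphrased from memory of
src/mem/dram_ctrl.cc of that era; member names may differ in your tree):

    // scan the write queue for a pending write that covers this read;
    // if one is found, the read is serviced from the queued write data
    // and never reaches DRAM (counted by the servicedByWrQ stat)
    bool foundInWrQ = false;
    for (const auto& p : writeQueue) {
        if (p->addr <= addr && (addr + size) <= (p->addr + p->size)) {
            foundInWrQ = true;
            servicedByWrQ++;
            break;
        }
    }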

Thanks,
Prathap

On Oct 10, 2017 2:17 AM, "Prakhar Javre"  wrote:

> A gentle reminder for our last query. Please have a look.
> [Edit] We printed the command names of the packets in the writeQueue inside
> the chooseNext function.
>
> for (auto i = queue.begin(); i != queue.end(); ++i) {
>     DRAMPacket* dram_pkt = *i;
>     // log any packet whose request carries no context ID
>     if (!dram_pkt->pkt->req->hasContextId()) {
>         inform("command : %s, read: %d, addr: %lu",
>                dram_pkt->pkt->cmdString(), dram_pkt->pkt->isRead(),
>                dram_pkt->addr);
>     }
>     if (ranks[dram_pkt->rank]->isAvailable()) {
>         queue.erase(i);
>         queue.push_front(dram_pkt);
>         found_packet = true;
>         break;
>     }
> }
>
> It turns out that some of the packets in the write queue have isRead() true
> and their command is changed to ReadReq and ReadResp.
> Can anyone help us understand why this peculiar behavior is
> observed?
>
> Thanks
> Prakhar Jawre
> IIT Kanpur, India
> -
>
> Hi guys,
>
> We are implementing a scheme for protection against timing-channel attacks
> in DRAM controllers. It requires choosing a request (for the DRAM
> controller) from a specific core at any particular time. While checking for
> this, we found that many requests do not have context IDs. We also
> modified the cache code to assign context IDs to writeback requests, but
> apparently even some of the read requests (and other requests too) do not
> have any context ID. Can you help us figure out exactly where these
> requests are coming from?
>
> System config -
> 2 Cores, L1I, L1D, L2(shared), Prefetch off.
>
> Thanks,
> Prakhar Jawre
> IIT Kanpur, India
>
>
>

Re: [gem5-users] ramulator

2017-10-07 Thread Prathap Kolakkampadath
Read this https://github.com/CMU-SAFARI/ramulator/blob/master/README.md

Prathap

On Oct 7, 2017 1:32 AM, "crown"  wrote:

> Hi
> How to integrate ramulator with gem5?
>
>
>
>  yours sincerely
>
>
>
>   crown
>
>
>
>

Re: [gem5-users] Data_Cache

2017-09-08 Thread Prathap Kolakkampadath
Analyze your benchmark by running it for a smaller number of loops (say 2
loops: first loop all misses, second loop all hits) and look at the gem5
statistics. This might give you a lead.

Thanks,
Prathap

On Sep 8, 2017 4:27 PM, "Jackie Chan" <chanjackie...@gmail.com> wrote:

> I'm actually using SE mode, so the only process running is this small
> program.
>
> On Fri, Sep 8, 2017 at 7:10 PM, Prathap Kolakkampadath <
> kvprat...@gmail.com> wrote:
>
>> Could you provide more details about your system configuration? How are
>> you making sure that no other process or the kernel is accessing memory?
>>
>> Thanks,
>> Prathap
>>
>> On Sep 8, 2017 7:54 AM, "Jackie Chan" <chanjackie...@gmail.com> wrote:
>>
>> Hey guys!
>>
>> I'm running a small program on gem5 to test data cache. The cache size is
>> 32kB (assoc: 8 and block size: 64 bytes). The program loads independent
>> data values from an array with an offset of 64 bytes (the cache block
>> size), such that the total size of array is 32kB, and the accesses keep on
>> happening in a loop. Theoretically the number of data cache misses should
>> be very low in this case. However I am noticing 72 million data cache
>> misses for a total of 520 million accesses. I reduced the size of the
>> array to 30kB rather than 32kB (using the same data cache size) and the
>> number of misses dropped to less than a thousand. I am not sure what I
>> might be missing here. Any thoughts?

Re: [gem5-users] Data_Cache

2017-09-08 Thread Prathap Kolakkampadath
Could you provide more details about your system configuration? How are you
making sure that no other process or the kernel is accessing memory?

Thanks,
Prathap

On Sep 8, 2017 7:54 AM, "Jackie Chan"  wrote:

Hey guys!

I'm running a small program on gem5 to test data cache. The cache size is
32kB (assoc: 8 and block size: 64 bytes). The program loads independent
data values from an array with an offset of 64 bytes (the cache block
size), such that the total size of array is 32kB, and the accesses keep on
happening in a loop. Theoretically the number of data cache misses should
be very low in this case. However I am noticing 72 million data cache
misses for a total of 520 million accesses. I reduced the size of the
array to 30kB rather than 32kB (using the same data cache size) and the
number of misses dropped to less than a thousand. I am not sure what I
might be missing here. Any thoughts?
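For what it's worth, the geometry works out as follows: with 64-byte blocks,
a 32kB 8-way cache has 32768/64 = 512 lines in 64 sets; a 32kB array walked
at a 64-byte stride touches 512 distinct lines, 8 per set, exactly filling
the cache, so with true LRU the steady-state miss count should indeed be near
zero. A sketch of such a test (an assumption about the shape of the code, not
the poster's actual program):

    #include <stdint.h>

    #define ARRAY_SIZE (32 * 1024)   /* matches the cache capacity */
    #define STRIDE     64            /* one cache block per access */

    static uint8_t array[ARRAY_SIZE];

    int main(void)
    {
        volatile uint8_t sink;
        /* each inner-loop iteration loads from a distinct cache line */
        for (long iter = 0; iter < 1000000; iter++)
            for (int i = 0; i < ARRAY_SIZE; i += STRIDE)
                sink = array[i];
        (void)sink;
        return 0;
    }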


Re: [gem5-users] gem5 and McPAT (II)

2016-03-18 Thread Prathap Kolakkampadath
Hi Marcos,

Please take a look at the paper below.
"Micro-architectural simulation of embedded core
heterogeneity with gem5 and McPAT"
http://damien.courousse.fr/pdf/2015-Endo-HiPEAC-RAPIDO.pdf

Hope this helps.

Thanks,
Prathap

On Tue, Mar 15, 2016 at 6:55 PM, Andreas Hansson 
wrote:

> Hi Marcos,
>
> I am not familiar with what it is you are trying to accomplish, but for
> power modelling of any existing CPU I would suggest having a look at:
> http://www.powmon.ecs.soton.ac.uk/powermodeling/
>
> We recently posted a set of patches that enable you to easily couple such
> power models to gem5.
>
> Andreas
>
> On 15/03/2016, 14:42, "gem5-users on behalf of Marcos Horro Varela"
>  wrote:
>
> >Hello all,
> >
> >Some time ago I asked whether there is an 'official integration' of gem5 and
> >McPAT. Since then I have not been able to make them work together, but I
> >have not received a response either. Could anyone give me a clue? I ask
> >mainly because I was thinking about making a real implementation to
> >integrate them.
> >Thank you all,
> >
> >Best regards,
> >
> >--
> >Marcos Horro Varela,
> >BSc student
> >University of A Coruña
> >+34 618 62 67 37
> >http://markos-horro.com
>

[gem5-users] How to define L2 as Outer Cacheable?

2016-01-06 Thread Prathap Kolakkampadath
Hello Users,

I have defined a memory type for Normal Memory allocation using the NMRR
register.
This memory type has the Inner Cacheable property "Non-Cacheable" and the
Outer Cacheable property "WriteBack-WriteAllocate".
Accesses to a memory region allocated with this memory type bypass both the
L1 cache and the L2 cache. This means
both L1 and L2 fall in the Inner Cacheable domain.
According to the ARM Architecture Reference Manual, it is possible to have
one inner cache (L1) and one outer cache (L2); this is implementation defined.

If my understanding is correct, in real systems a cache connected as a slave
to the AMBA bus falls under the Outer Cacheable domain.
How can this be controlled in gem5?
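For reference, the memory-type setup described above could be programmed like
this (a sketch assuming ARMv7 TEX remap is enabled, SCTLR.TRE = 1, and that
PRRR already marks region n as Normal memory; the field encodings are from
the ARMv7-A ARM and should be checked against your manual revision):

    #include <stdint.h>

    /* Make memory-type region n inner Non-cacheable (IRn = 0b00) and
     * outer Write-Back Write-Allocate (ORn = 0b01). */
    static void set_inner_nc_outer_wbwa(unsigned n)
    {
        uint32_t nmrr;
        asm volatile("mrc p15, 0, %0, c10, c2, 1" : "=r"(nmrr)); /* read NMRR */
        nmrr &= ~(0x3u << (2 * n));        /* IRn = 0b00: inner Non-cacheable */
        nmrr &= ~(0x3u << (2 * n + 16));   /* clear ORn */
        nmrr |=  (0x1u << (2 * n + 16));   /* ORn = 0b01: outer WB-WA */
        asm volatile("mcr p15, 0, %0, c10, c2, 1" : : "r"(nmrr)); /* write NMRR */
        asm volatile("isb");
        /* a TLB invalidate is also required before the new attributes apply */
    }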


Thanks,
Prathap

Re: [gem5-users] Modelling command bus contention in DRAM controller

2015-11-12 Thread Prathap Kolakkampadath
Hello Andreas,

I don't quite understand, why updating the CAS to other banks only on
RowHit request would break the "non-monotonic temporal order".
I will try to post a fix on review board.

Thanks,
Prathap



On Wed, Nov 11, 2015 at 4:17 PM, Andreas Hansson <andreas.hans...@arm.com>
wrote:

> Hi Prathap,
>
> Let me first reiterate that I don’t think this would ever be a problem in
> a realistic scenario (the three arguments from before), but it would be good
> to quantify the impact.
>
> The “solution” in my view would need the controller to take decisions in a
> non-monotonic temporal order, and that would also mean that the data bus
> occupancy would have to be tracked as intervals rather than an end value. I
> think the same holds true for the column (and other) constraints. Perhaps
> the latter can be “tricked” by not updating it and relying on the other
> constraints, but conceptually we would need to track the start and end, not
> just the end. Agreed?
>
> Andreas
>
> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
> Kolakkampadath <kvprat...@gmail.com>
> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
> Date: Wednesday, 11 November 2015 at 21:54
>
> To: gem5 users mailing list <gem5-users@gem5.org>
> Subject: Re: [gem5-users] Modelling command bus contention in DRAM
> controller
>
> Hello Andreas,
>
> see my comments below.
>
> Thanks,
> Prathap
>
> On Wed, Nov 11, 2015 at 12:59 PM, Andreas Hansson <andreas.hans...@arm.com
> > wrote:
>
>> Hi Prathap,
>>
>> Ok, so for FCFS we are seeing the expected behaviour. Agreed?
>>
>
>   >> Agreed.  Because CAS is ordered.
>
>
>>
>> I completely agree on the point of the ordered CAS, and for FR-FCFS we
>> could indeed hit the case you describe. Additionally, the model makes
>> scheduling decisions “conservatively” early (assuming it has to precharge
>> the page), so there is also an inherent window where we decide to do
>> something, and something else could show up in the meanwhile, which we
>> would have chosen instead.
>>
>
>
>> I agree that we could fix this. The arguments against: 1) in any case, a
>> real controller has a pipeline latency that will limit the visibility to
>> the scheduler, so if the window is in the order of the “fronted pipeline
>> latency” of the model then it’s not really a problem since we would have
>> missed them in reality as well (admittedly here it is slightly longer), 2)
>> with more things in the queues (typical case), the likelihood of having to
>> make a bad decision because of this window is very small, 3) I fear it
>> might add quite some complexity to account for these gaps (as opposed to
>> just tracking next CAS), with a very small impact in most full-blown
>> use-cases.
>>
>
>>> I agree that this may not be an issue on larger use-cases; however
> the implementation differs from how a real DRAM controller schedules the
> commands, where CAS can be reordered based
>>> on the readiness of the respective Bank.
>
>
>>
>> It would be great to actually figure out if this is an issue on larger
>> use-cases, and what the performance impact on the simulator is for fixing
>> the issue. Will you take a stab at coding up a fix?
>>
>
>>> I think this can be easily fixed by "updating the next CAS to banks,
> only if the packet is a row hit". I believe this works assuming tRRD for
> any DRAM module is greater than the CAS-CAS delay.
>>> I did a fix and ran dram_sweep.py. There was absolutely no
> difference in the performance, which was expected.
>>> Presently I am not able to anticipate any other complexity.
>
>
>>
>> Andreas
>>
>> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
>> Kolakkampadath <kvprat...@gmail.com>
>> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
>> Date: Wednesday, 11 November 2015 at 18:47
>>
>> To: gem5 users mailing list <gem5-users@gem5.org>
>> Subject: Re: [gem5-users] Modelling command bus contention in DRAM
>> controller
>>
>> Hello Andreas,
>>
>> Please see my comments below
>>
>> Thanks,
>> Prathap
>>
>> On Wed, Nov 11, 2015 at 12:38 PM, Andreas Hansson <
>> andreas.hans...@arm.com> wrote:
>>
>>> Hi Prathap,
>>>
>>> I don’t quite understand the statement about the second CAS being issued
>>> before the first one. FCFS by construction won’t do that (in any case,
>>> please do not use FCFS for anything besides debugging, it’s really not
>>> representative).

Re: [gem5-users] Modelling command bus contention in DRAM controller

2015-11-11 Thread Prathap Kolakkampadath
Hello Andreas,

Please see my comments below

Thanks,
Prathap

On Wed, Nov 11, 2015 at 12:38 PM, Andreas Hansson <andreas.hans...@arm.com>
wrote:

> Hi Prathap,
>
> I don’t quite understand the statement about the second CAS being issued
> before the first one. FCFS by construction won’t do that (in any case,
> please do not use FCFS for anything besides debugging, it’s really not
> representative).
>

>>>> This could happen even in fr-fcfs, in case a hit request arrives soon
after a miss request has been selected by the scheduler.

>
> The latency you quote for access (2), is that taking the colAllowedAt and
> busBusyUntil into account? Remember that doDRAMAccess is not necessarily
> coinciding with then this access actually starts.
>

>>>> My point here is that a CAS to a bank has to be issued as soon as the bank
is available. In that case, request 2 should be ready before request
one. However, in the current implementation, "all CAS are strictly ordered".

>
> It could very well be that there is a bug, and if there is we should get
> it sorted.
>
>>>> I believe that this could be a bug.

>
> Andreas
>
> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
> Kolakkampadath <kvprat...@gmail.com>
> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
> Date: Wednesday, 11 November 2015 at 17:43
>
> To: gem5 users mailing list <gem5-users@gem5.org>
> Subject: Re: [gem5-users] Modelling command bus contention in DRAM
> controller
>
> Hello Andreas,
>
> I believe it is restrictive.
> Below is the DRAM trace under the fcfs scheduler for two requests, where the
> first request is a RowMiss request to Bank0
> and the second request is a RowHit request to Bank1.
>
> 1) *Memory access latency of the first miss request.*
> From the trace, the memory access latency of the first miss request is
> 52.5ns (tRP(15) + tRCD(15) + tCL(15) + tBURST(7.5)).
> This is expected.
> 2) *Memory access latency of the second request, which is a Hit to a
> different Bank.*
> From the trace, the memory access latency for the second request is
> also 52.5ns.
> This is unexpected. The CAS of this ready request should have been issued
> before the CAS of the first Miss request.
>
> In doDRAMAccess() the miss request updates the next read/write burst of
> all banks, so the CAS of the ready request
> can now be issued only after the CAS of the Miss request.
>
> 321190719635810: system.mem_ctrls: Timing access to addr 4291233984,
> rank/bank/row 0 0 65422
> 321190719635810: system.mem_ctrls: RowMiss:READ
> 321190719635810: system.mem_ctrls: Access to 4291233984, ready at
> 321190719688310 bus busy until 321190719688310.
> 321190719643310: system.mem_ctrls: Timing access to addr 3983119872,
> rank/bank/row 0 1 56019
> 321190719643310: system.mem_ctrls: RowHit:READ
> 321190719643310: system.mem_ctrls: Access to 3983119872, ready at
> 321190719695810 bus busy until 321190719695810.
>
> Please let me know what you think.
>
> Thanks,
> Prathap
>
>
> On Wed, Nov 11, 2015 at 3:00 AM, Andreas Hansson <andreas.hans...@arm.com>
> wrote:
>
>> Hi Prathap,
>>
>> Could you elaborate on why you think this line is causing problems. It
>> sounds like you are suggesting this line is too restrictive?
>>
>> It simply enforces a minimum col-to-col timing, there could still be
>> other constraints that are more restrictive.
>>
>> Andreas
>>
>> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
>> Kolakkampadath <kvprat...@gmail.com>
>> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
>> Date: Tuesday, 10 November 2015 at 21:30
>>
>> To: gem5 users mailing list <gem5-users@gem5.org>
>> Subject: Re: [gem5-users] Modelling command bus contention in DRAM
>> controller
>>
>> Hi Andreas,
>>
>> To be more precise, I believe the code snippet below in doDRAMAccess()
>> should be called only for a Row Hit request. For a Row Miss request, why
>> do we have to update bank.colAllowedAt for all the banks?
>>
>> // update the time for the next read/write burst for each
>> // bank (add a max with tCCD/tCCD_L here)
>> ranks[j]->banks[i].colAllowedAt = std::max(cmd_at + cmd_dly,
>>                                            ranks[j]->banks[i].colAllowedAt);
>>
>>
>> Thanks,
>>
>> Prathap
>>
>>
>>
>> On Tue, Nov 10, 2015 at 12:13 PM, Prathap Kolakkampadath <
>> kvprat...@gmail.com> wrote:
>>
>>> Hi Andreas,
>>>
>>> As you said, all the ACT-to-ACT constraints are taken into account.

Re: [gem5-users] Modelling command bus contention in DRAM controller

2015-11-11 Thread Prathap Kolakkampadath
Hello Andreas,

see my comments below.

Thanks,
Prathap

On Wed, Nov 11, 2015 at 12:59 PM, Andreas Hansson <andreas.hans...@arm.com>
wrote:

> Hi Prathap,
>
> Ok, so for FCFS we are seeing the expected behaviour. Agreed?
>

  >> Agreed.  Because CAS is ordered.


>
> I completely agree on the point of the ordered CAS, and for FR-FCFS we
> could indeed hit the case you describe. Additionally, the model makes
> scheduling decisions “conservatively” early (assuming it has to precharge
> the page), so there is also an inherent window where we decide to do
> something, and something else could show up in the meanwhile, which we
> would have chosen instead.
>


> I agree that we could fix this. The arguments against: 1) in any case, a
> real controller has a pipeline latency that will limit the visibility to
> the scheduler, so if the window is in the order of the “frontend pipeline
> latency” of the model then it’s not really a problem since we would have
> missed them in reality as well (admittedly here it is slightly longer), 2)
> with more things in the queues (typical case), the likelihood of having to
> make a bad decision because of this window is very small, 3) I fear it
> might add quite some complexity to account for these gaps (as opposed to
> just tracking next CAS), with a very small impact in most full-blown
> use-cases.
>

   >> I agree that this may not be an issue on larger use-cases; however
the implementation differs from how a real DRAM controller schedules the
commands, where CAS can be reordered based
   >> on the readiness of the respective Bank.


>
> It would be great to actually figure out if this is an issue on larger
> use-cases, and what the performance impact on the simulator is for fixing
> the issue. Will you take a stab at coding up a fix?
>

   >> I think this can be easily fixed by "updating the next CAS to banks,
only if the packet is a row hit". I believe this works assuming tRRD for
any DRAM module is greater than the CAS-CAS delay.
   >> I did a fix and ran dram_sweep.py. There was absolutely no difference
in the performance, which was expected.
   >> Presently I am not able to anticipate any other complexity.
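A sketch of that guard, applied against the colAllowedAt update in
doDRAMAccess() discussed earlier in this exchange (the row-hit condition is
the proposal here, not upstream gem5 code; the surrounding names follow the
classic dram_ctrl.cc of that era):

    // only a row hit updates the next allowed CAS on the other banks,
    // so a row miss no longer pushes back CASes that are already ready
    if (row_hit) {
        for (int j = 0; j < ranksPerChannel; j++) {
            for (int i = 0; i < banksPerRank; i++) {
                ranks[j]->banks[i].colAllowedAt =
                    std::max(cmd_at + cmd_dly,
                             ranks[j]->banks[i].colAllowedAt);
            }
        }
    }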


>
> Andreas
>
> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
> Kolakkampadath <kvprat...@gmail.com>
> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
> Date: Wednesday, 11 November 2015 at 18:47
>
> To: gem5 users mailing list <gem5-users@gem5.org>
> Subject: Re: [gem5-users] Modelling command bus contention in DRAM
> controller
>
> Hello Andreas,
>
> Please see my comments below
>
> Thanks,
> Prathap
>
> On Wed, Nov 11, 2015 at 12:38 PM, Andreas Hansson <andreas.hans...@arm.com
> > wrote:
>
>> Hi Prathap,
>>
>> I don’t quite understand the statement about the second CAS being issued
>> before the first one. FCFS by construction won’t do that (in any case,
>> please do not use FCFS for anything besides debugging, it’s really not
>> representative).
>>
>
> >>>> This could happen even in fr-fcfs, in case a hit request arrives soon
> after a miss request has been selected by the scheduler.
>
>>
>> The latency you quote for access (2), is that taking the colAllowedAt and
>> busBusyUntil into account? Remember that doDRAMAccess is not necessarily
>> coinciding with then this access actually starts.
>>
>
> >>>> My point here is that a CAS to a bank has to be issued as soon as the
> bank is available. In that case, request 2 should be ready before
> request one. However, in the current implementation, "all CAS are strictly
> ordered".
>
>>
>> It could very well be that there is a bug, and if there is we should get
>> it sorted.
>>
> >>>> I believe that this could be a bug.
>
>>
>> Andreas
>>
>> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
>> Kolakkampadath <kvprat...@gmail.com>
>> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
>> Date: Wednesday, 11 November 2015 at 17:43
>>
>> To: gem5 users mailing list <gem5-users@gem5.org>
>> Subject: Re: [gem5-users] Modelling command bus contention in DRAM
>> controller
>>
>> Hello Andreas,
>>
>> I believe it is restrictive.
>> Below is the DRAM trace under the fcfs scheduler for two requests, where
>> the first request is a RowMiss request to Bank0
>> and the second request is a RowHit request to Bank1.
>>
>> 1) *Memory access latency of first miss request*.
>> From the trace, the memory access latency of the first miss request is
>> 52.5ns (tRP(15) + tRCD(15) + tCL(15) + tBURST(7.5)). This is expected.

Re: [gem5-users] Modelling command bus contention in DRAM controller

2015-11-10 Thread Prathap Kolakkampadath
Hi Andreas,

As you said, all the ACT-to-ACT constraints are taken into account.
All col-to-col constraints are taken into account except when there is an
open request (Hit) after a closed request (Miss).
If I am using the *FCFS* scheduler, and there are two requests in the queue,
Request1 and Request2, like below, then according
to the current implementation the CAS of Request2 is only issued after the
CAS of Request1. Is that correct?
I don't see where in doDRAMAccess() the CAS of the second request is updated
ahead of the CAS of the first request.

*Request1@Bank1 (PRE-ACT-CAS) --> Request2@Bank2 (CAS)*

Could you please clarify?

I will also take a look into the util/dram_sweep_plot.py.

Thanks,
Prathap

On Tue, Nov 10, 2015 at 9:41 AM, Andreas Hansson <andreas.hans...@arm.com>
wrote:

> Hi Prathap,
>
> All the col-to-col, act-to-act etc are taken into account, just not
> command-bus contention. Have a look at util/dram_sweep_plot.py for a
> graphical “test bench” for the DRAM controller. As you will see, it never
> exceeds the theoretical max. This script relies on the
> configs/dram/sweep.py for the actual generation of data.
>
> Andreas
>
> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
> Kolakkampadath <kvprat...@gmail.com>
> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
> Date: Monday, 9 November 2015 at 21:53
> To: gem5 users mailing list <gem5-users@gem5.org>
> Subject: Re: [gem5-users] Modelling command bus contention in DRAM
> controller
>
> Hello Andreas,
>
> One problem could be when there is a Miss request followed by a Hit
> request. Taking the example below: initially the queue has only one request,
> R1 (Miss); as soon as this request is selected, another request, R2 (Hit),
> arrives in the queue. Here the CAS of R2 is ready and could be issued right
> away in the next clock cycle. However, I believe that in the
> simulator, while it computes the ready time of R1, it also recomputes the
> next CAS that can be issued to the other banks. Thus the CAS of R2 can now
> be issued only after the CAS of R1. If I am right, this could be a problem?
>
> Request1@Bank1 (PRE-ACT-CAS) --> Request2@Bank2 (CAS)
>
> Thanks,
> Prathap
>
> On Mon, Nov 9, 2015 at 1:27 PM, Andreas Hansson <andreas.hans...@arm.com>
> wrote:
>
>> Hi Prathap,
>>
>> Command-bus contention is intentionally not modelled. The main reason for
>> this is to keep the model performant. Moreover, in real devices the command
>> bus is typically designed to _not_ be a bottleneck. Admittedly this choice
>> could be reassessed if needed.
>>
>> Andreas
>>
>> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
>> Kolakkampadath <kvprat...@gmail.com>
>> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
>> Date: Monday, 9 November 2015 at 18:25
>> To: gem5 users mailing list <gem5-users@gem5.org>
>> Subject: [gem5-users] Modelling command bus contention in DRAM controller
>>
>>
>> Hello Users,
>>
>> After closely looking at the doDRAMAccess() implementation of the DRAM
>> controller in gem5, I suspect that the current implementation may not
>> be taking into account the command bus contention that can happen when
>> the DRAM timing constraints take particular values.
>>
>> For example, in the scenario below, the queue has two closed requests, one
>> to Bank1 and the other to Bank2.
>>
>> Request1@Bank1 (PRE-ACT-CAS) --> Request2@Bank2 (PRE-ACT-CAS)
>>
>> Let's say tRP (8 cycles), tRCD (8 cycles), tCL (8 cycles), and tRRD (8
>> cycles). In this case the ACT of R2 and the CAS of R1 become active at the
>> same time.
>> At this point one command needs to be delayed by one clock cycle. I don't
>> see how the simulator handles this. If the simulator does handle it,
>> could someone please point me to the code snippet where this is handled.
>>
>>
>> Thanks,
>> Prathap
>>
>>

Re: [gem5-users] Modelling command bus contention in DRAM controller

2015-11-10 Thread Prathap Kolakkampadath
Hi Andreas,

To be more precise, I believe the code snippet below in doDRAMAccess()
should be called only for a Row Hit request. For a Row Miss request, why do
we have to update bank.colAllowedAt for all the banks?

// update the time for the next read/write burst for each
// bank (add a max with tCCD/tCCD_L here)
ranks[j]->banks[i].colAllowedAt = std::max(cmd_at + cmd_dly,
                                           ranks[j]->banks[i].colAllowedAt);


Thanks,

Prathap



On Tue, Nov 10, 2015 at 12:13 PM, Prathap Kolakkampadath <
kvprat...@gmail.com> wrote:

> Hi Andreas,
>
> As you said, all the ACT-to-ACT constraints are taken into account.
> All col-to-col constraints are taken into account except when there is an
> open request (Hit) after a closed request (Miss).
> If I am using the *FCFS* scheduler, and there are two requests in the
> queue, Request1 and Request2, like below, then according
> to the current implementation the CAS of Request2 is only issued after the
> CAS of Request1. Is that correct?
> I don't see where in doDRAMAccess() the CAS of the second request is
> updated ahead of the CAS of the first request.
>
> *Request1@Bank1 (PRE-ACT-CAS) --> Request2@Bank2 (CAS)*
>
> Could you please clarify?
>
> I will also take a look into the util/dram_sweep_plot.py.
>
> Thanks,
> Prathap
>
> On Tue, Nov 10, 2015 at 9:41 AM, Andreas Hansson <andreas.hans...@arm.com>
> wrote:
>
>> Hi Prathap,
>>
>> All the col-to-col, act-to-act etc are taken into account, just not
>> command-bus contention. Have a look at util/dram_sweep_plot.py for a
>> graphical “test bench” for the DRAM controller. As you will see, it never
>> exceeds the theoretical max. This script relies on the
>> configs/dram/sweep.py for the actual generation of data.
>>
>> Andreas
>>
>> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
>> Kolakkampadath <kvprat...@gmail.com>
>> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
>> Date: Monday, 9 November 2015 at 21:53
>> To: gem5 users mailing list <gem5-users@gem5.org>
>> Subject: Re: [gem5-users] Modelling command bus contention in DRAM
>> controller
>>
>> Hello Andreas,
>>
>> One problem could be when there is a Miss request followed by a Hit
>> request. Taking the example below: initially the queue has only one
>> request, R1 (Miss); as soon as this request is selected, another request,
>> R2 (Hit), arrives in the queue. Here the CAS of R2 is ready and could be
>> issued right away in the next clock cycle. However, I believe that in the
>> simulator, while it computes the ready time of R1, it also recomputes the
>> next CAS that can be issued to the other banks. Thus the CAS of R2 can now
>> be issued only after the CAS of R1. If I am right, this could be a problem?
>>
>> Request1@Bank1 (PRE-ACT-CAS) --> Request2@Bank2 (CAS)
>>
>> Thanks,
>> Prathap
>>
>> On Mon, Nov 9, 2015 at 1:27 PM, Andreas Hansson <andreas.hans...@arm.com>
>> wrote:
>>
>>> Hi Prathap,
>>>
>>> Command-bus contention is intentionally not modelled. The main reason
>>> for this is to keep the model performant. Moreover, in real devices the
>>> command bus is typically designed to _not_ be a bottleneck. Admittedly this
>>> choice could be reassessed if needed.
>>>
>>> Andreas
>>>
>>> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
>>> Kolakkampadath <kvprat...@gmail.com>
>>> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
>>> Date: Monday, 9 November 2015 at 18:25
>>> To: gem5 users mailing list <gem5-users@gem5.org>
>>> Subject: [gem5-users] Modelling command bus contention in DRAM
>>> controller
>>>
>>>
>>> Hello Users,
>>>
>>> After closely looking at the doDRAMAccess() implementation of the DRAM
>>> controller in gem5, I suspect that the current implementation may not
>>> be taking into account the command bus contention that can happen when
>>> the DRAM timing constraints take particular values.
>>>
>>> For example, in the scenario below, the queue has two closed requests,
>>> one to Bank1 and the other to Bank2.
>>>
>>> Request1@Bank1 (PRE-ACT-CAS) --> Request2@Bank2 (PRE-ACT-CAS)
>>>
>>> Let's say tRP (8 cycles), tRCD (8 cycles), tCL (8 cycles), and tRRD (8
>>> cycles). In this case the ACT of R2 and the CAS of R1 become active at
>>> the same time.
>>> At this point one command needs to be delayed by one clock cycle. I
>>> don't see how the simulator handles this. If the simulator does handle
>>> it, could someone please point me to the code snippet where this is
>>> handled.

Re: [gem5-users] Modelling command bus contention in DRAM controller

2015-11-09 Thread Prathap Kolakkampadath
Hello Andreas,

Thanks for your reply.

Prathap

On Mon, Nov 9, 2015 at 1:27 PM, Andreas Hansson <andreas.hans...@arm.com>
wrote:

> Hi Prathap,
>
> Command-bus contention is intentionally not modelled. The main reason for
> this is to keep the model performant. Moreover, in real devices the command
> bus is typically designed to _not_ be a bottleneck. Admittedly this choice
> could be reassessed if needed.
>
> Andreas
>
> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
> Kolakkampadath <kvprat...@gmail.com>
> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
> Date: Monday, 9 November 2015 at 18:25
> To: gem5 users mailing list <gem5-users@gem5.org>
> Subject: [gem5-users] Modelling command bus contention in DRAM controller
>
>
> Hello Users,
>
> After closely looking at the doDRAMAccess() implementation of the DRAM
> controller in gem5, I suspect that the current implementation may not
> be taking into account the command bus contention that can happen when the
> DRAM timing constraints take particular values.
>
> For example, in the scenario below, the queue has two closed requests, one
> to Bank1 and the other to Bank2.
>
> Request1@Bank1 (PRE-ACT-CAS) --> Request2@Bank2 (PRE-ACT-CAS)
>
> Let's say tRP (8 cycles), tRCD (8 cycles), tCL (8 cycles), and tRRD (8
> cycles). In this case the ACT of R2 and the CAS of R1 become active at the
> same time.
> At this point one command needs to be delayed by one clock cycle. I don't
> see how the simulator handles this. If the simulator does handle it,
> could someone please point me to the code snippet where this is handled.
>
>
> Thanks,
> Prathap
>
>

[gem5-users] Modelling command bus contention in DRAM controller

2015-11-09 Thread Prathap Kolakkampadath
Hello Users,

After closely looking at the doDRAMAccess() implementation of the DRAM
controller in gem5, I suspect that the current implementation may not
be taking into account the command bus contention that can happen when
the DRAM timing constraints take particular values.

For example, in the scenario below, the queue has two closed requests, one to
Bank1 and the other to Bank2.

Request1@Bank1 (PRE-ACT-CAS) --> Request2@Bank2 (PRE-ACT-CAS)

Let's say tRP (8 cycles), tRCD (8 cycles), tCL (8 cycles), and tRRD (8
cycles). In this case the ACT of R2 and the CAS of R1 become active at the
same time.
At this point one command needs to be delayed by one clock cycle. I don't
see how the simulator handles this. If the simulator does handle it,
could someone please point me to the code snippet where this is handled.
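For concreteness, one way the collision arises with these values (cycle
numbers relative to R1's PRE, and assuming R2's own PRE has completed in
time):

    cycle  0 : PRE Bank1 (R1)
    cycle  8 : ACT Bank1 (tRP after PRE)
    cycle 16 : CAS Bank1 (tRCD after ACT)
    cycle 16 : ACT Bank2 (tRRD after ACT Bank1)  <-- collides with R1's CAS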


Thanks,
Prathap

Re: [gem5-users] Modelling command bus contention in DRAM controller

2015-11-09 Thread Prathap Kolakkampadath
Hello Andreas,

One problem could be when there is a Miss request followed by a Hit
request. Taking the example below: initially the queue has only one request,
R1 (Miss); as soon as this request is selected, another request, R2 (Hit),
arrives in the queue. Here the CAS of R2 is ready and could be issued right
away in the next clock cycle. However, I believe that in the simulator,
while it computes the ready time of R1, it also recomputes the next CAS that
can be issued to the other banks. Thus the CAS of R2 can now be issued only
after the CAS of R1. If I am right, this could be a problem?

Request1@Bank1 (PRE-ACT-CAS) --> Request2@Bank2 (CAS)

Thanks,
Prathap

On Mon, Nov 9, 2015 at 1:27 PM, Andreas Hansson <andreas.hans...@arm.com>
wrote:

> Hi Prathap,
>
> Command-bus contention is intentionally not modelled. The main reason for
> this is to keep the model performant. Moreover, in real devices the command
> bus is typically designed to _not_ be a bottleneck. Admittedly this choice
> could be reassessed if needed.
>
> Andreas
>
> From: gem5-users <gem5-users-boun...@gem5.org> on behalf of Prathap
> Kolakkampadath <kvprat...@gmail.com>
> Reply-To: gem5 users mailing list <gem5-users@gem5.org>
> Date: Monday, 9 November 2015 at 18:25
> To: gem5 users mailing list <gem5-users@gem5.org>
> Subject: [gem5-users] Modelling command bus contention in DRAM controller
>
>
> Hello Users,
>
> After closely looking at the doDRAMAccess() implementation of the DRAM
> controller in gem5, I suspect that the current implementation may not
> be taking into account the command bus contention that can happen when the
> DRAM timing constraints take particular values.
>
> For example, in the scenario below, the queue has two closed requests, one
> to Bank1 and the other to Bank2.
>
> Request1@Bank1 (PRE-ACT-CAS) --> Request2@Bank2 (PRE-ACT-CAS)
>
> Let's say tRP (8 cycles), tRCD (8 cycles), tCL (8 cycles), and tRRD (8
> cycles). In this case the ACT of R2 and the CAS of R1 become active at the
> same time.
> At this point one command needs to be delayed by one clock cycle. I don't
> see how the simulator handles this. If the simulator does handle it,
> could someone please point me to the code snippet where this is handled.
>
>
> Thanks,
> Prathap
>
>

Re: [gem5-users] ARM cortex A-15 configuration

2015-11-01 Thread Prathap Kolakkampadath
Hello Pierre/Fernando,


Thanks for your replies.
Based on [1], the ROB size for the Cortex-A15 is 60 entries. However, as per
this article,
http://www.anandtech.com/show/6787/nvidia-tegra-4-architecture-deep-dive-plus-tegra-4i-phoenix-hands-on/2,
the
number of ROB entries is 128. I ran some tests changing the ROB size
to 128 and proportionally increasing the sizes of the LSQ and IQ (3 times the
default values of gem5's O3_ARM_v7a).
Now the results look similar to the real platform, so I am not sure how
correct the Cortex-A15 settings mentioned in [1] are.

Thanks,
Prathap


On Sun, Nov 1, 2015 at 5:49 AM, Fernando Endo <fernando.en...@gmail.com>
wrote:

> Hello,
>
> Regarding [1], the instruction latencies of a A15 can be set as those of
> the A9 in [2], or as those of an A72 (
> http://infocenter.arm.com/help/topic/com.arm.doc.uan0016a/cortex_a72_software_optimization_guide_external.pdf
> )
>
> Best regards,
>
> --
> Fernando A. Endo, Post-doc
>
> INRIA Rennes-Bretagne Atlantique
> France
>
>
> 2015-10-20 9:36 GMT+02:00 Pierre-Yves Péneau <pierre-yves.pen...@lirmm.fr>
> :
>
>> Hi,
>>
>> You can find some informations on [1] for Cortex A7 and A15. See [2] for
>> Cortex A8 and A9.
>>
>> [1] http://damien.courousse.fr/pdf/2015-Endo-HiPEAC-RAPIDO.pdf
>> [2] http://damien.courousse.fr/pdf/Endo2014-gem5-SAMOS.pdf
>>
>>
>> On 19/10/2015 17:17, Prathap Kolakkampadath wrote:
>> > Hello Users,
>> >
>> > What is the exact configuration for the Cortex-A15?
>> > The configuration file "configs/common/O3_ARM_v7a.py" doesn't seem to
>> > replicate the Cortex-A15 correctly. For example, based on the document
>> > below, the Cortex-A15 should have a ROB of size 128, which is 3 times
>> > more than the size of the ROB (40) specified in the gem5 configuration
>> > file.
>> > How can I find the exact information regarding ROB/LQ/SQ etc.?
>> >
>> >
>> http://www.anandtech.com/show/6787/nvidia-tegra-4-architecture-deep-dive-plus-tegra-4i-phoenix-hands-on/2
>> >
>> > Thanks,
>> > Prathap
>> >
>> >
>> >
>>
>> --
>> +--+
>> | Pierre-Yves Péneau|  first.last at lirmm.fr  |
>> | PhD student - LIRMM - Sysmic  |+ 33 4 67 41 85 85|
>> | Bâtiment 4 Bureau H2.2|http://walafc0.org|
>> +--+
>>
>>
>>
>
>

[gem5-users] ARM cortex A-15 configuration

2015-10-19 Thread Prathap Kolakkampadath
Hello Users,

What is the exact configuration for the Cortex-A15?
The configuration file "configs/common/O3_ARM_v7a.py" doesn't seem to
replicate the Cortex-A15 correctly. For example, based on the document below,
the Cortex-A15 should have a ROB of size 128, which is 3 times more than the
size of the ROB (40) specified in the gem5 configuration file.
How can I find the exact information regarding ROB/LQ/SQ etc.?

http://www.anandtech.com/show/6787/nvidia-tegra-4-architecture-deep-dive-plus-tegra-4i-phoenix-hands-on/2

Thanks,
Prathap

Re: [gem5-users] Sources of In-determinism in Full System Simulators

2015-08-15 Thread Prathap Kolakkampadath
I am observing this behaviour with a synthetic benchmark (Nonlinear
Predictive Control). I am not sure if this benchmark is adding any kind of
randomness.
I need to look into this.

I tested with the EEMBC benchmarks, where I don't see any variation across
repeated runs.
So, as Andreas mentioned, this must be introduced by this specific benchmark.

Thanks Andreas and Steve.

On Thu, Aug 13, 2015 at 1:22 PM, Steve Reinhardt ste...@gmail.com wrote:

 Even with x86 you should be seeing deterministic results. If you are
 regularly seeing inconsistencies, you can try running two copies with debug
 tracing (I suggest Exec,ExecMacro,Cache as a starting set of flags) and
 comparing their output with util/tracediff to see where they diverge.

 Steve

 On Thu, Aug 13, 2015 at 9:44 AM Andreas Hansson andreas.hans...@arm.com
 wrote:

 Hi Prathap,

 That sounds very odd and should not happen unless the workload itself is
 somehow random. What is it you are running? Are you sure you’re running
 exactly the same thing?

 If it does indeed vary then it would be good if you can track down why by
 running two simulations in lock-step and determining where they diverge.

 We regularly run the ARM regressions with UBSan to ensure there is no
 undefined behaviour in the simulator. I know that for X86 there are quite a
 few warnings from UBSan, so that could be a reason if you’re using x86.

 Andreas

 From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Thursday, 13 August 2015 09:11
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Sources of In-determinism in Full System Simulators

 Hello User,

  I am running a benchmark in gem5 full system mode. A checkpoint is created
  in atomic mode, and the simulation then switches to detailed mode before
  starting the benchmark. On repeated runs of the benchmark from the same
  checkpoint, the number of memory requests arriving at the DRAM banks
  differs, with up to 5% variation. Can someone point out what could be the
  sources of in-determinism?


 Thanks,
 Prathap


[gem5-users] Sources of In-determinism in Full System Simulators

2015-08-13 Thread Prathap Kolakkampadath
Hello User,

I am running a benchmark in gem5 full system mode. A checkpoint is created in
atomic mode, and the simulation then switches to detailed mode before
starting the benchmark. On repeated runs of the benchmark from the same
checkpoint, the number of memory requests arriving at the DRAM banks differs,
with up to 5% variation. Can someone point out what could be the sources of
in-determinism?


Thanks,
Prathap

Re: [gem5-users] How queued port is modelled in real platforms?

2015-07-27 Thread Prathap Kolakkampadath
Hello Andreas,

I have modelled a system with large MSHRs, LSQ depth, etc. With this I could
see that the packet queue size grows beyond 100 and hits this assertion.
After disabling this assertion the test runs to completion.

1) Is it safe to disable this?

However, as I mentioned in an earlier email, I have modified the DRAM
controller switching algorithm to prioritize reads and never switch to
writes as long as there are reads in the read buffer. With this
modification, in one set of memory-intensive benchmarks with a high page hit
rate, I could see that min_number_of_writes_per_switch is ~15. I expect
that the write buffers (DRAM and cache) get full, as a result the core
stalls, and no more requests arrive at the DRAM controller. Once the DRAM
controller drains the existing reads it switches to writes, and when a write
is serviced and the corresponding buffer is freed, the core can generate a
new load/store. But the number of writes per switch (15) that I see doesn't
justify the round-trip time.

Further debugging this issue, I observed that once the write queue/write
buffers are full and the DRAM controller services the queued reads, these
reads generate writebacks (due to evictions). Note that the DRAM controller's
write buffer is full at this time. These writebacks get queued in the
port (deferred packets), and any further reads will be queued behind the
writebacks.

2) Is this the desired behaviour, to address write-after-read hazards?

Thanks,
Prathap



On Mon, Jul 27, 2015 at 9:22 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

   100 was chosen to be “sufficiently infinite”, and to only break if
  something is wrong.

  The caches have a limited number of MSHRs, the cores have limited LSQ
 depth etc. We could easily add an outstanding transaction limit to the
 crossbar class. In the end it is a detail/speed trade-off. If it does not
 matter, do not model it…

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Monday, 27 July 2015 15:15
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] How queued port is modelled in real platforms?

Hello Andreas,

   Currently, the reasonable limit of this queue is set to 100. Is there a
  specific reason to choose this as the maximum packet queue size?
   Do any bus interface protocols specify this limit in real platforms?

  Thanks,
  Prathap

 On Mon, Jul 27, 2015 at 4:54 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  The queued port is indeed infinite, and is a convenience construct. It
 should only be used in places where there is already an inherent limit to
 the number of outstanding requests. There is an assert in the queued port
 to ensure things do not grow uncontrollably.

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Sunday, 26 July 2015 18:34
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] How queued port is modelled in real platforms?

    Hello Users,

   Gem5 implements a queued port to interface memory objects. In my
  understanding this queued port is of infinite size. Is this specific to the
  Gem5 implementation? How are packets handled in real hardware if the
  request rate of a layer is faster than the service rate of the underlying
  layer?
   It would be great if someone could help me understand this.

  Thanks,
  Prathap




Re: [gem5-users] Handling write backs

2015-07-27 Thread Prathap Kolakkampadath
Hello Andreas,

Now I understand.

Thanks,
Prathap



On Mon, Jul 27, 2015 at 4:49 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  When you write with a granularity smaller than a cache line (to your L1
 D cache), the cache will read the line in exclusive state, and then write
 the specified part. If you write a whole line, then there is no need to
 first read. The latter behaviour is supported for whole-line write
 operations only.

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Tuesday, 21 July 2015 23:14
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Handling write backs

  Hello Users,

  I figured out that gem5 implements a fetch-on-write-miss policy.
  On a write miss, allocateMissBuffer() is called to allocate an MSHR,
  which sends a timing request to bring in the cache line.
  Once the response arrives, handleFill() is called,
  which is responsible for inserting the block into the cache. While
  inserting, if the victim block being replaced is dirty, a writeback packet
  is generated and copied into the write buffers.
  After that, satisfyCpuSideRequest() is called to write the data into the
  newly assigned block and mark it as dirty.
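  Condensed, the path just described (function names as in the classic cache
  model of that era, simplified from memory; this is a sketch, not verbatim
  gem5 code):

      // store miss in access()
      // allocateMissBuffer()    : allocate an MSHR and issue a timing read
      //                           for the line (fetch-on-write-miss)
      // ... fill response returns ...
      // handleFill()            : insert the line; if the victim is dirty,
      //                           queue a writeback packet in the write
      //                           buffer
      // satisfyCpuSideRequest() : apply the store's data, mark the block
      //                           dirty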

  Thanks,
  Prathap






 On Tue, Jul 21, 2015 at 11:21 AM, Prathap Kolakkampadath 
 kvprat...@gmail.com wrote:

   Hello Users,

   I am using the classic memory system. What is the write miss policy
  implemented in Gem5?
   Looking at the code it looks like gem5 implements a
  *no-fetch-on-write-miss* policy: access() inserts a block into the cache
  when the request is a writeback and it misses the cache.
   However, when I run a test with a bunch of write misses, I see an equal
  number of reads and writes to DRAM memory. This could happen if the policy
  is *fetch-on-write-miss*. So far I couldn't figure this out. It would be
  great if someone could give some pointers to understand this further.

  Thanks,
  Prathap

 On Mon, Jul 20, 2015 at 2:02 PM, Prathap Kolakkampadath 
 kvprat...@gmail.com wrote:

  Hello Users,

  I am running a test which generates write misses to the LLC. I am looking
  at the cache implementation code. What I understood is: writes are treated
  as writebacks; on a miss, writeback commands allocate a new block in the
  cache, write the data into it, and mark this block as dirty. When the
  dirty blocks are replaced, they will be written into the write buffers.

  I have the following questions on this:
  1) When I run the test which generates write misses, I see the same
  number of reads from memory as the number of writes. Does this mean
  writebacks also fetch the cache line from main memory?

  2) When will the blocks in the write buffers be written to memory? Are
  they written when the write buffers are full?

  It would be great if someone can help me understand this.


  Thanks,
  Prathap





Re: [gem5-users] How queued port is modelled in real platforms?

2015-07-27 Thread Prathap Kolakkampadath
Hello Andreas,

Currently, the reasonable limit of this queue is set to 100. Is there a
specific reason to choose this as the maximum packet queue size?
Do any bus interface protocols specify this limit in real platforms?
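For reference, the check being discussed lives in the packet queue behind the
queued port (paraphrased from src/mem/packet_queue.cc of that era; the exact
form and message differ across gem5 versions):

    // sanity check: a queued port is meant for contexts where some other
    // resource already bounds the number of outstanding requests, so the
    // transmit list should never grow anywhere near this large
    assert(transmitList.size() < 100);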

Thanks,
Prathap

On Mon, Jul 27, 2015 at 4:54 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  The queued port is indeed infinite, and is a convenience construct. It
 should only be used in places where there is already an inherent limit to
 the number of outstanding requests. There is an assert in the queued port
 to ensure things do not grow uncontrollably.

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Sunday, 26 July 2015 18:34
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] How queued port is modelled in real platforms?

    Hello Users,

   Gem5 implements a queued port to interface memory objects. In my
  understanding this queued port is of infinite size. Is this specific to the
  Gem5 implementation? How are packets handled in real hardware if the
  request rate of a layer is faster than the service rate of the underlying
  layer?
   It would be great if someone could help me understand this.

  Thanks,
  Prathap




[gem5-users] How queued port is modelled in real platforms?

2015-07-26 Thread Prathap Kolakkampadath
Hello Users,

Gem5 implements a queued port to interface memory objects. In my
understanding this queued port is of infinite size. Is this specific to
the Gem5 implementation? How are packets handled in real hardware if the
request rate of a layer is faster than the service rate of the underlying
layer?
It would be great if someone could help me in understanding this.

Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Handling write backs

2015-07-21 Thread Prathap Kolakkampadath
Hello Users,

I am using the classic memory system. What is the write miss policy
implemented in Gem5?
Looking at the code it seems that gem5 implements a *no-fetch-on-write-miss*
policy; access() inserts a block into the cache when the request is a
writeback and it misses the cache.
However, when I run a test with a bunch of write misses, I see an equal
number of reads and writes to DRAM memory. This could only happen if the
policy is *fetch-on-write-miss*. So far I couldn't figure this out. It would
be great if someone can throw some pointers to understand this further.

Thanks,
Prathap

On Mon, Jul 20, 2015 at 2:02 PM, Prathap Kolakkampadath kvprat...@gmail.com
 wrote:

 Hello Users,

 I am running a test which generates write misses to the LLC. I am looking at
 the cache implementation code. What I understood is, writes are treated as
 write backs; on a miss, write-back commands allocate a new block in the cache,
 write the data into it, and mark this block as dirty. When the dirty blocks
 are replaced, they are written into the write buffers.

 I have the following questions on this:
 1) When I run the test which generates write misses, I see the same number of
 reads from memory as the number of writes. Does this mean write backs also
 fetch the cache line from main memory?

 2) When will the blocks in the write buffers be written to memory? Are they
 written when the write buffers are full?

 It would be great if someone can help me in understanding this.


 Thanks,
 Prathap


___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Dynamic allocation of L1 MSHRs

2015-07-21 Thread Prathap Kolakkampadath
Hello Davesh,

I did this by manipulating the isFull function as you have rightly pointed
out.
Thanks for the reply.

Regards,
Prathap

On Tue, Jul 21, 2015 at 2:20 PM, Davesh Shingari shingaridav...@gmail.com
wrote:

 Hi

 I think you should look at the isFull function, which checks whether the
 MSHR queue is full or not. You can check whether it is a miss request and
 allocate the size of the MSHR queue per core dynamically, as sketched
 below.
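
 A standalone sketch of such per-core MSHR accounting (the names are
 assumptions, not the real gem5 MSHRQueue interface, whose isFull() takes
 no core argument):

 #include <cstddef>
 #include <vector>

 struct PerCoreMshrBudget {
     std::vector<std::size_t> allocated; // outstanding MSHRs per core
     std::vector<std::size_t> limit;     // configured budget per core

     explicit PerCoreMshrBudget(const std::vector<std::size_t> &limits)
         : allocated(limits.size(), 0), limit(limits) {}

     // Analogous to MSHRQueue::isFull(), evaluated per requesting core.
     bool isFull(std::size_t core) const {
         return allocated[core] >= limit[core];
     }
     void onAllocate(std::size_t core)   { ++allocated[core]; }
     void onDeallocate(std::size_t core) { --allocated[core]; }
 };

 int main() {
     PerCoreMshrBudget budget({1, 3}); // e.g. core0 gets 1 MSHR, core1 gets 3
     budget.onAllocate(0);
     return budget.isFull(0) ? 0 : 1;  // core0 is now at its budget
 }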

 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Handling write backs

2015-07-21 Thread Prathap Kolakkampadath
Hello Users,

I figured out that Gem5 implements a fetch-on-write-miss policy.
On a write miss, allocateMissBuffer() is called to allocate an MSHR, which
sends the timing request to bring in the cache line.
Once the response is ready, handleFill() is called on the response path; it
is responsible for inserting the block into the cache. While inserting, if
the victim block being replaced is dirty, a write-back packet is generated
and copied into the write buffers.
After that, satisfyCpuSideRequest() is called to write the data into the
newly allocated block and mark it as dirty.
This also explains the equal read and write counts: every write miss first
triggers a line fill (a DRAM read), and the dirty line it eventually evicts
becomes a DRAM write.
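
A compilable sketch of that sequence (the function names mirror the gem5
classic cache, but the types here are stand-in stubs, not gem5's):

#include <iostream>

struct CacheBlk { bool dirty; };

void handleFill(CacheBlk &victim)
{
    // Response path: insert the fetched line into the set; a dirty victim
    // becomes a write-back packet in the write buffers.
    if (victim.dirty)
        std::cout << "queue write-back for dirty victim\n";
    // satisfyCpuSideRequest(): apply the pending store and mark the newly
    // filled block as dirty.
    std::cout << "write data into new block, mark it dirty\n";
}

int main()
{
    // Write miss: allocateMissBuffer() reserves an MSHR and a timing read
    // (line fill) goes to memory; its response triggers handleFill().
    CacheBlk victim{true};
    handleFill(victim);
}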

Thanks,
Prathap






On Tue, Jul 21, 2015 at 11:21 AM, Prathap Kolakkampadath 
kvprat...@gmail.com wrote:

 Hello Users,

 I am using the classic memory system. What is the write miss policy
 implemented in Gem5?
 Looking at the code it seems that gem5 implements a
 *no-fetch-on-write-miss* policy; access() inserts a block into the cache
 when the request is a writeback and it misses the cache.
 However, when I run a test with a bunch of write misses, I see an equal
 number of reads and writes to DRAM memory. This could only happen if the
 policy is *fetch-on-write-miss*. So far I couldn't figure this out. It
 would be great if someone can throw some pointers to understand this
 further.

 Thanks,
 Prathap

 On Mon, Jul 20, 2015 at 2:02 PM, Prathap Kolakkampadath 
 kvprat...@gmail.com wrote:

 Hello Users,

 I am running a test which generates write misses to the LLC. I am looking at
 the cache implementation code. What I understood is, writes are treated as
 write backs; on a miss, write-back commands allocate a new block in the cache,
 write the data into it, and mark this block as dirty. When the dirty blocks
 are replaced, they are written into the write buffers.

 I have the following questions on this:
 1) When I run the test which generates write misses, I see the same number
 of reads from memory as the number of writes. Does this mean write backs
 also fetch the cache line from main memory?

 2) When will the blocks in the write buffers be written to memory? Are they
 written when the write buffers are full?

 It would be great if someone can help me in understanding this.


 Thanks,
 Prathap



___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] Handling write backs

2015-07-20 Thread Prathap Kolakkampadath
Hello Users,

I am running a test which generates write misses to the LLC. I am looking at
the cache implementation code. What I understood is, writes are treated as
write backs; on a miss, write-back commands allocate a new block in the cache,
write the data into it, and mark this block as dirty. When the dirty blocks
are replaced, they are written into the write buffers.

I have the following questions on this:
1) When I run the test which generates write misses, I see the same number of
reads from memory as the number of writes. Does this mean write backs also
fetch the cache line from main memory?

2) When will the blocks in the write buffers be written to memory? Are they
written when the write buffers are full?

It would be great if someone can help me in understanding this.


Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Tracking DRAM requests from a process

2015-07-20 Thread Prathap Kolakkampadath
Hello Davesh,

You could probably use something like this to check whether a request is
coming from switch_cpus0:

if (system()->getMasterName(pkt->req->masterId()) == "switch_cpus0.data")
{ ... }

Hope this helps.

Thanks,
Prathap

On Mon, Jul 20, 2015 at 6:09 PM, Davesh Shingari shingaridav...@gmail.com
wrote:

 Polydoros Petrakis polpetras at gmail.com writes:

 
 
  Maybe you can check the physical memory range allocated for each
 process and track requests depending on the access address. (Check which
 range it belongs to)
 
 
  On 31 March 2015 at 00:30, Prathap Kolakkampadath kvprathap at
 gmail.com wrote:
 
 
 
 
  Hello Andreas,
  I am trying to collect the per-request memory access latency of a
 specific Linux process's read requests. I am running two memory intensive
 Linux processes on a single core.
  In my understanding, the thread id/context id and master id are the same
 for the memory requests of the two processes running on the same core. So
 I won't be able to differentiate the requests of one process from the
 other using those. Is it possible to differentiate the memory requests at
 the DRAM controller layer based on the Linux process id?
 
 
  Thanks,
  Prathap
 
 
 
 
 
  On Mon, Mar 30, 2015 at 12:35 PM, Andreas Hansson Andreas.Hansson
 at arm.com wrote:
 
 
 
 
 
 
 
  Hi Prathap,
 
  Could you be a bit more specific about what you mean by “tracking
 requests”. Each request that originates in the CPU has an ASID and
 ThreadID associated with it, as well as a MasterID. You should be able
 to access these at the DRAM controller if that’s
   what you’re after. Note that you end up getting requests without this
 information (write backs etc), so you cannot always rely on it.
 
  Andreas
 
 
 
  From: Prathap Kolakkampadath kvprathap at gmail.comReply-To: gem5
 users mailing list gem5-users at gem5.orgDate: Monday, 30 March 2015
 17:27To: gem5 users mailing list gem5-users at gem5.orgSubject:
 [gem5-users] Tracking DRAM requests from a process
 
 
 
 
 
 
 
 
 
  Hello Users,
 
   I am running Gem5 in ARM FS mode using the classic memory system. I am
  running two processes on a single core. I need to track the DRAM requests
  (memory access latency) of a particular process. Is it possible to
  identify the process id of a Linux process in the DRAM controller layer?
 
 
 
 
  Thanks,
 
  Prathap Kumar Valsan
 
 
 
 
 
 
 
 
 
 
 
 
  ___
  gem5-users mailing list
  gem5-users at gem5.org
  http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

 Hi

 Did you find a way to know the core where the memory request originated?

 If I use the functions to get the threadId or asid, I get the following error:
 gem5.opt: build/ARM/mem/request.hh:533: int Request::getAsid() const:
 Assertion `privateFlags.isSet(VALID_VADDR)' failed.

 And for masterId, the number changes with simulation (I think it is
 generated statically).

 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] DRAMCtrl: Question on read/write draining while not using the write threshold.

2015-07-16 Thread Prathap Kolakkampadath
Hello Andreas,

I kind of figured out what's going on.

With the modified DRAM controller switching mechanism, the DRAM controller
write buffer and the LLC write buffers become full, because the DRAM
controller doesn't service writes as long as there are reads in the queue.
Once the LLC write buffer is full, the LLC controller locks the cpu-side
port; as a result, the core cannot generate any further misses to the LLC.
At this point, the DRAM controller continues to service reads from the read
queue; this can generate evictions at the LLC. These evicted writes get
queued at the DRAM controller port. Once the DRAM read queue is empty, the
controller switches to writes. After a write burst and the propagation
delay, one write buffer entry is freed and the core can generate more read
fills. However, the read will appear at the DRAM controller only after the
queued writes at the port (due to evictions) are serviced.

Do you think this hypothesis is correct?

Thanks,
Prathap

On Thu, Jul 16, 2015 at 11:44 AM, Prathap Kolakkampadath 
kvprat...@gmail.com wrote:

 Hello Andreas,


 Below are the changes:

 @@ -1295,7 +1295,8 @@

          // we have so many writes that we have to transition
          if (writeQueue.size() > writeHighThreshold) {
 -            switch_to_writes = true;
 +            if (readQueue.empty())
 +                switch_to_writes = true;
          }
      }

 @@ -1332,7 +1333,7 @@
      if (writeQueue.empty() ||
          (writeQueue.size() + minWritesPerSwitch < writeLowThreshold &&
           !drainManager) ||
 -        (!readQueue.empty() && writesThisTime >= minWritesPerSwitch)) {
 +        !readQueue.empty()) {
          // turn the bus back around for reads again
          busState = WRITE_TO_READ;

 Previously, I used some bank reservation schemes and was not using all the
 banks. Now I re-ran without any additional changes other than the above
 and still get a *mean* writes per turnaround of ~15.
 Once the cache is blocked due to the write buffers being full, the core
 should be able to send another request to DRAM as soon as one write buffer
 entry is freed.
 In my system this round-trip time is 45.5 ns [24 (L2 hit + miss latency) +
 4 (L1 hit + miss latency) + 7.5 (tBURST) + 10 (Xbar request + response)].
 Note that the static latencies are set to 0.

 I am trying to figure out the unexpected number of writes processed per
 switch.
 Also attached the gem5 statistics.

 Thanks,
 Prathap


 On Thu, Jul 16, 2015 at 6:06 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  It sounds like something is going wrong in your write-switching
 algorithm. Have you verified that a read is actually showing up when you
 think it is?

  If needed, is there any chance you could post the patch on RB, or
 include the changes in a mail?

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Thursday, 16 July 2015 00:36
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] DRAMCtrl: Question on read/write draining while
 not using the write threshold.

 Hello Users,

  I have experimented by modifying the DRAM controller write draining
 algorithm in such a way that the DRAM controller always processes reads
 and switches to writes only when the read queue is empty; the controller
 switches from writes to reads immediately when a read arrives in the read
 queue.

  With this modification, I ran a very memory intensive test on four cores
 simultaneously. Each miss generates a read (line fill) and a write (write
 back) to DRAM.

  First, I brief what I am expecting: the DRAM controller continues to
 process reads; meanwhile the DRAM write queue fills up and eventually
 fills up the write buffers in the cache, and therefore the LLC locks up,
 so there are no further reads and writes to the DRAM from the core.
 At this point, the DRAM controller processes reads until the read queue is
 empty, switches to writes, and keeps processing writes until a new read
 request arrives. Note that the LLC is blocked at this moment. Once a write
 is processed and the corresponding write buffer of the cache is cleared, a
 core can generate a new miss (which generates a line fill first). During
 this round trip time (as observed in my system 45 ns, with tBURST 7.5 ns),
 the DRAM controller can process almost 6 requests (45/7.5). After that it
 should switch back to reads.

  However, from the gem5 statistics, I observe that the mean writes per
 turnaround is 30 instead of ~6. I don't understand why this is the case.
 Can someone help me in understanding this behaviour?

  Thanks,
  Prathap



Re: [gem5-users] MSHR Queue Full Handling

2015-07-15 Thread Prathap Kolakkampadath
Hello Davesh,

I think it should be possible by passing the desired L1 MSHR setting for
each core while instantiating the dcache in CacheConfig.py.
Also look at the BaseCache constructor to see how these parameters are
being set.


Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] DRAMCtrl: Question on read/write draining while not using the write threshold.

2015-07-15 Thread Prathap Kolakkampadath
Hello Users,

I have experimented by modifying the DRAM controller write draining
algorithm in such a way that the DRAM controller always processes reads and
switches to writes only when the read queue is empty; the controller
switches from writes to reads immediately when a read arrives in the read
queue.

With this modification, I ran a very memory intensive test on four cores
simultaneously. Each miss generates a read (line fill) and a write (write
back) to DRAM.

First, I brief what I am expecting: the DRAM controller continues to
process reads; meanwhile the DRAM write queue fills up and eventually fills
up the write buffers in the cache, and therefore the LLC locks up, so there
are no further reads and writes to the DRAM from the core.
At this point, the DRAM controller processes reads until the read queue is
empty, switches to writes, and keeps processing writes until a new read
request arrives. Note that the LLC is blocked at this moment. Once a write
is processed and the corresponding write buffer of the cache is cleared, a
core can generate a new miss (which generates a line fill first). During
this round trip time (as observed in my system 45 ns, with tBURST 7.5 ns),
the DRAM controller can process almost 6 requests (45/7.5). After that it
should switch back to reads.

However, from the gem5 statistics, I observe that the mean writes per
turnaround is 30 instead of ~6. I don't understand why this is the case.
Can someone help me in understanding this behaviour?
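
A quick check of the arithmetic above (both numbers are from this message):

#include <cstdio>

int main() {
    const double roundTrip = 45.0, tBURST = 7.5; // ns
    std::printf("expected writes per switch ~= %.0f\n", roundTrip / tBURST);
    // prints 6
}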

Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] Question on retry requests due to write queue full.

2015-07-14 Thread Prathap Kolakkampadath
Hello Users,

I am using the classic memory system with the following DRAM controller
parameters:
write_buffer_size = 64
write_high_thresh_perc = 85
write_low_thresh_perc = 50
min_writes_per_switch = 18

According to the write draining algorithm, the bus has to turn around to
writes when writeQueue.size() > writeHighThreshold. However, when I run
some memory intensive benchmarks, I get a high number of write retry
requests because the write queue is full, as reported in the gem5
statistics:
# Number of times write queue was full causing retry
system.mem_ctrls.numWrRetry          183731


I am not sure how this could happen, as the DRAM controller drains writes
in a batch whenever the write queue size grows beyond the high threshold,
which is only 85% of the write buffer size. Is this expected?

Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Question on retry requests due to write queue full.

2015-07-14 Thread Prathap Kolakkampadath
I think if the benchmark is write intensive, this could happen: when the
DRAM controller is processing writes, many new writes (cache evictions) can
arrive at a rate faster than the rate at which the DRAM controller
processes them.

Thanks,
Prathap

On Tue, Jul 14, 2015 at 11:27 AM, Prathap Kolakkampadath 
kvprat...@gmail.com wrote:

 Hello Users,

 I am using the classic memory system with the following DRAM controller
 parameters:
 write_buffer_size = 64
 write_high_thresh_perc = 85
 write_low_thresh_perc = 50
 min_writes_per_switch = 18

 According to the write draining algorithm, the bus has to turn around to
 writes when writeQueue.size() > writeHighThreshold. However, when I run
 some memory intensive benchmarks, I get a high number of write retry
 requests because the write queue is full, as reported in the gem5
 statistics:
  # Number of times write queue was full causing retry
 system.mem_ctrls.numWrRetry          183731


 I am not sure how this could happen, as the DRAM controller drains writes
 in a batch whenever the write queue size grows beyond the high threshold,
 which is only 85% of the write buffer size. Is this expected?

 Thanks,
 Prathap

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Question on retry requests due to write queue full.

2015-07-14 Thread Prathap Kolakkampadath
Hello Andreas,

I think that could be the case.

I am running a very memory intensive synthetic benchmark, in which every
memory operation generates a read and a write to DRAM.

Thanks,
Prathap

On Tue, Jul 14, 2015 at 12:39 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  Your writes are probably arriving faster than the controller can actually
 send them to the DRAM. What is it you’re running?

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Tuesday, 14 July 2015 17:27
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Question on retry requests due to write queue full.

  Hello Users,

 I am using the classic memory system with the following DRAM controller
 parameters:
 write_buffer_size = 64
 write_high_thresh_perc = 85
 write_low_thresh_perc = 50
 min_writes_per_switch = 18

 According to the write draining algorithm, the bus has to turn around to
 writes when writeQueue.size() > writeHighThreshold. However, when I run
 some memory intensive benchmarks, I get a high number of write retry
 requests because the write queue is full, as reported in the gem5
 statistics:
  # Number of times write queue was full causing retry
 system.mem_ctrls.numWrRetry          183731


  I am not sure how this could happen, as the DRAM controller drains writes
 in a batch whenever the write queue size grows beyond the high threshold,
 which is only 85% of the write buffer size. Is this expected?

  Thanks,
  Prathap


 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Suspecting bubbles in the DRAM controller command bus

2015-07-12 Thread Prathap Kolakkampadath
Thanks Andreas.

On Sun, Jul 12, 2015 at 5:55 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  Indeed, that’s the one.

  The controller is quite well tested, but if you’ve found a bug the
 easiest is to post a patch on the reviewboard.

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Friday, 10 July 2015 18:41

 To: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Suspecting bubbles in the DRAM controller
 command bus

Hello Andreas,

  Ok. So the below code makes sure that the next scheduling decision
 happens early enough. Is that correct?
  // Update the minimum timing between the requests, this is a
 // conservative estimate of when we have to schedule the next
 // request to not introduce any unnecessary bubbles. In most cases
 // we will wake up sooner than we have to.
 nextReqTime = busBusyUntil - (tRP + tRCD + tCL);

  Thanks,
  Prathap

 On Fri, Jul 10, 2015 at 11:51 AM, Andreas Hansson andreas.hans...@arm.com
  wrote:

  Hi Prathap,

  If we have no row hits left in the queue, the open-adaptive (and
 close-adaptive) policy will auto precharge. The next scheduling decision
 will happen early enough that we can hide any preparation needed to not
 introduce bubbles on the bus. Thus, the activate will happen early enough
 to get 100% utilisation if this is possible.

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Friday, 10 July 2015 17:11
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Suspecting bubbles in the DRAM controller
 command bus

   Hello Andreas,

  I am still not very clear.
  "If we have not already precharged, we need to take the hit and do it
 now."
  What if we don't have any row hits left in the queue? I agree that with
 the open-adaptive policy, Bank1 will be auto precharged. According to the
 code snippet below, it still has to issue an activate now. Shouldn't this
 have been done back in time (bank-level parallelism)?

  Thanks,
  Prathap






 On Fri, Jul 10, 2015 at 2:16 AM, Andreas Hansson andreas.hans...@arm.com
  wrote:

  Hi Prathap,

  The expression ensures that we do not “go back in time” when deciding
 to precharge the bank. If we have not already precharged, we need to take
 the hit and do it now. For the access pattern you describe, with a
 closed-adaptive or open-adaptive page policy we will issue the last column
 access with an auto-precharge. In any case, if R0-9 are to bank 0 and R10
 to bank 1 then we can prepare R10 without the need to precharge bank 0.

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Thursday, 9 July 2015 18:26
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Suspecting bubbles in the DRAM controller command
 bus

   Hello Users,

  I suspect the DRAM controller code is adding unwanted bubbles in the
 command bus.

  Consider there are 10 row-hit read requests - R0 to R9 - in the queue,
 all targeting Bank0, and a row-miss request - R10 - to Bank1 of the same
 rank, numbered in arrival order. According to FR-FCFS with the open-page
 policy, the DRAM controller processes all row-hit requests to Bank0 and
 then chooses the row-miss request to Bank1. I suspect the problem lies
 here in the controller code, when it updates the access latency of the
 row-miss request - R10 - to Bank1.

 According to JEDEC timing constraints, the controller can issue a
 Precharge to another bank after a clock cycle (tCK) delay and an Activate
 after a tRRD-cycle delay (ACT-ACT delay between two banks). This means, by
 the time the DRAM controller processes the 10 row-hit requests, Bank1
 should already be precharged and activated.


  However, I am not sure if this is taken care of in the below snippet
 of code.

 if (bank.openRow == dram_pkt->row) {
     // nothing to do
 } else {
     row_hit = false;

     // If there is a page open, precharge it.
     if (bank.openRow != Bank::NO_ROW) {
         *prechargeBank(bank, std::max(bank.preAllowedAt, curTick()))*;
     }

     // next we need to account for the delay in activating the
     // page
     Tick act_tick = std::max(bank.actAllowedAt, curTick());

     // Record the activation and deal with all the global timing
     // constraints caused by a new activation (tRRD and tXAW)
     activateBank(bank, act_tick, dram_pkt->row);

     // issue the command as early as possible
     cmd_at = bank.colAllowedAt;
 }

  shouldn't this be

 *prechargeBank(bank, std::max(bank.preAllowedAt, dram_pkt->entryTime))*;

  I am not sure if my understanding is correct. Please clarify.

  Thanks,
  Prathap

Re: [gem5-users] Suspecting bubbles in the DRAM controller command bus

2015-07-10 Thread Prathap Kolakkampadath
Hello Andreas,

Ok. So the below code makes sure that the next scheduling decision happens
early enough. Is that correct?
// Update the minimum timing between the requests, this is a
// conservative estimate of when we have to schedule the next
// request to not introduce any unnecessary bubbles. In most cases
// we will wake up sooner than we have to.
nextReqTime = busBusyUntil - (tRP + tRCD + tCL);
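
For a concrete number, a small check of that margin using assumed
DDR3-1600 timings (tRP = tRCD = tCL = 13.75 ns; these values are an
assumption, not taken from this thread):

#include <cstdio>

int main() {
    const double tRP = 13.75, tRCD = 13.75, tCL = 13.75; // ns, assumed
    // The controller schedules the next decision at least this far before
    // busBusyUntil, so a precharge+activate for a row miss can be hidden.
    std::printf("margin = %.2f ns\n", tRP + tRCD + tCL); // 41.25 ns
}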

Thanks,
Prathap

On Fri, Jul 10, 2015 at 11:51 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  If we have no row hits left in the queue, the open-adaptive (and
 close-adaptive) policy will auto precharge. The next scheduling decision
 will happen early enough that we can hide any preparation needed to not
 introduce bubbles on the bus. Thus, the activate will happen early enough
 to get 100% utilisation if this is possible.

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Friday, 10 July 2015 17:11
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Suspecting bubbles in the DRAM controller
 command bus

   Hello Andreas,

  I am still not very clear.
  "If we have not already precharged, we need to take the hit and do it
 now."
  What if we don't have any row hits left in the queue? I agree that with
 the open-adaptive policy, Bank1 will be auto precharged. According to the
 code snippet below, it still has to issue an activate now. Shouldn't this
 have been done back in time (bank-level parallelism)?

  Thanks,
  Prathap






 On Fri, Jul 10, 2015 at 2:16 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  The expression ensures that we do not “go back in time” when deciding
 to precharge the bank. If we have not already precharged, we need to take
 the hit and do it now. For the access pattern you describe, with a
 closed-adaptive or open-adaptive page policy we will issue the last column
 access with an auto-precharge. In any case, if R0-9 are to bank 0 and R10
 to bank 1 then we can prepare R10 without the need to precharge bank 0.

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Thursday, 9 July 2015 18:26
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Suspecting bubbles in the DRAM controller command
 bus

   Hello Users,

  I suspect the DRAM controller code is adding unwanted bubbles in the
 command bus.

  Consider there are 10 row-hit read requests - R0 to R9 - in the queue,
 all targeting Bank0, and a row-miss request - R10 - to Bank1 of the same
 rank, numbered in arrival order. According to FR-FCFS with the open-page
 policy, the DRAM controller processes all row-hit requests to Bank0 and
 then chooses the row-miss request to Bank1. I suspect the problem lies
 here in the controller code, when it updates the access latency of the
 row-miss request - R10 - to Bank1.

 According to JEDEC timing constraints, the controller can issue a
 Precharge to another bank after a clock cycle (tCK) delay and an Activate
 after a tRRD-cycle delay (ACT-ACT delay between two banks). This means, by
 the time the DRAM controller processes the 10 row-hit requests, Bank1
 should already be precharged and activated.


  However, I am not sure if this is taken care of in the below snippet of
 code.

 if (bank.openRow == dram_pkt->row) {
     // nothing to do
 } else {
     row_hit = false;

     // If there is a page open, precharge it.
     if (bank.openRow != Bank::NO_ROW) {
         *prechargeBank(bank, std::max(bank.preAllowedAt, curTick()))*;
     }

     // next we need to account for the delay in activating the
     // page
     Tick act_tick = std::max(bank.actAllowedAt, curTick());

     // Record the activation and deal with all the global timing
     // constraints caused by a new activation (tRRD and tXAW)
     activateBank(bank, act_tick, dram_pkt->row);

     // issue the command as early as possible
     cmd_at = bank.colAllowedAt;
 }

  shouldn't this be

 *prechargeBank(bank, std::max(bank.preAllowedAt, dram_pkt->entryTime))*;

  I am not sure if my understanding is correct. Please clarify.

  Thanks,
  Prathap


Re: [gem5-users] Suspecting bubbles in the DRAM controller command bus

2015-07-10 Thread Prathap Kolakkampadath
Hello Andreas,

I am still not very clear.
 "If we have not already precharged, we need to take the hit and do it now."
What if we don't have any row hits left in the queue? I agree that with the
open-adaptive policy, Bank1 will be auto precharged. According to the code
snippet below, it still has to issue an activate now. Shouldn't this have
been done back in time (bank-level parallelism)?

Thanks,
Prathap






On Fri, Jul 10, 2015 at 2:16 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  The expression ensures that we do not “go back in time” when deciding to
 precharge the bank. If we have not already precharged, we need to take the
 hit and do it now. For the access pattern you describe, with a
 closed-adaptive or open-adaptive page policy we will issue the last column
 access with an auto-precharge. In any case, if R0-9 are to bank 0 and R10
 to bank 1 then we can prepare R10 without the need to precharge bank 0.

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Thursday, 9 July 2015 18:26
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Suspecting bubbles in the DRAM controller command
 bus

   Hello Users,

  I suspect the DRAM controller code is adding unwanted bubbles in the
 command bus.

  Consider there are 10 row-hit read requests - R0 to R9 - in the queue,
 all targeting Bank0, and a row-miss request - R10 - to Bank1 of the same
 rank, numbered in arrival order. According to FR-FCFS with the open-page
 policy, the DRAM controller processes all row-hit requests to Bank0 and
 then chooses the row-miss request to Bank1. I suspect the problem lies
 here in the controller code, when it updates the access latency of the
 row-miss request - R10 - to Bank1.

 According to JEDEC timing constraints, the controller can issue a
 Precharge to another bank after a clock cycle (tCK) delay and an Activate
 after a tRRD-cycle delay (ACT-ACT delay between two banks). This means, by
 the time the DRAM controller processes the 10 row-hit requests, Bank1
 should already be precharged and activated.


  However, I am not sure if this is taken care of in the below snippet of
 code.

 if (bank.openRow == dram_pkt->row) {
     // nothing to do
 } else {
     row_hit = false;

     // If there is a page open, precharge it.
     if (bank.openRow != Bank::NO_ROW) {
         *prechargeBank(bank, std::max(bank.preAllowedAt, curTick()))*;
     }

     // next we need to account for the delay in activating the
     // page
     Tick act_tick = std::max(bank.actAllowedAt, curTick());

     // Record the activation and deal with all the global timing
     // constraints caused by a new activation (tRRD and tXAW)
     activateBank(bank, act_tick, dram_pkt->row);

     // issue the command as early as possible
     cmd_at = bank.colAllowedAt;
 }

  shouldn't this be

 *prechargeBank(bank, std::max(bank.preAllowedAt, dram_pkt->entryTime))*;

  I am not sure if my understanding is correct. Please clarify.

  Thanks,
  Prathap


 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] Suspecting bubbles in the DRAM controller command bus

2015-07-09 Thread Prathap Kolakkampadath
Hello Users,

I suspect the DRAM controller code is adding unwanted bubbles in the
command bus.

Consider there are 10 row-hit read requests - R0 to R9 - in the queue, all
targeting Bank0, and a row-miss request - R10 - to Bank1 of the same rank,
numbered in arrival order. According to FR-FCFS with the open-page policy,
the DRAM controller processes all row-hit requests to Bank0 and then
chooses the row-miss request to Bank1. I suspect the problem lies here in
the controller code, when it updates the access latency of the row-miss
request - R10 - to Bank1.

According to JEDEC timing constraints, the controller can issue a Precharge
to another bank after a clock cycle (tCK) delay and an Activate after a
tRRD-cycle delay (ACT-ACT delay between two banks). This means, by the time
the DRAM controller processes the 10 row-hit requests, Bank1 should already
be precharged and activated.


However, I am not sure if this is taken care of in the below snippet of
code.

if (bank.openRow == dram_pkt->row) {
    // nothing to do
} else {
    row_hit = false;

    // If there is a page open, precharge it.
    if (bank.openRow != Bank::NO_ROW) {
        *prechargeBank(bank, std::max(bank.preAllowedAt, curTick()))*;
    }

    // next we need to account for the delay in activating the
    // page
    Tick act_tick = std::max(bank.actAllowedAt, curTick());

    // Record the activation and deal with all the global timing
    // constraints caused by a new activation (tRRD and tXAW)
    activateBank(bank, act_tick, dram_pkt->row);

    // issue the command as early as possible
    cmd_at = bank.colAllowedAt;
}

shouldn't this be

*prechargeBank(bank, std::max(bank.preAllowedAt, dram_pkt->entryTime))*;

I am not sure if my understanding is correct. Please clarify.

Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] Help to understand memory trace

2015-07-01 Thread Prathap Kolakkampadath
Hello Users,

I am analyzing the memory access pattern of a benchmark, for which I have
connected the communication monitor between the CPU and the dcache and
obtained a trace. A snippet of the trace looks like this:

w,2174471252,4,66,850031503453000
w,2174471256,4,66,850031503453250
w,2174471260,4,66,850031503453500
r,2282845416,4,74,850031503516750
r,4017452700,4,74,850031503518250
u,4017452700,4,2097218,850031503521500
u,4017452700,4,2097218,850031503524000
w,2174471204,4,66,850031503531500

Here I can see some packets are marked with command 'u' (neither read nor
write). What does this mean?
I am also trying to understand the flag values (66, 74, 2097218). Can
anyone help me in understanding what these flag values indicate?

Thanks,
Prathap Kumar Valsan
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Help to understand memory trace

2015-07-01 Thread Prathap Kolakkampadath
Thanks Andreas.

On Wed, Jul 1, 2015 at 2:42 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap.

  Have a look at src/proto/packet.proto. The ASCII output is not really
 intended to be used for anything besides “checking”.

  The fields in the trace correspond to the enums in src/mem/packet.hh and
 src/mem/request.hh
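
 A quick way to compare those fields against the enums is to print them in
 hex (the specific bit meanings are left to the headers):

 #include <cstdio>

 int main() {
     const unsigned flags[] = {66, 74, 2097218};
     for (unsigned f : flags)
         std::printf("0x%x\n", f); // 0x42, 0x4a, 0x200042
 }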

  Andreas

   From: gem5-users gem5-users-boun...@gem5.org on behalf of Prathap
 Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Wednesday, 1 July 2015 11:53
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Help to understand memory trace

Hello Users,

  I am analyzing the memory access pattern of a benchmark, for which I have
 connected the communication monitor between the CPU and the dcache and
 obtained a trace. A snippet of the trace looks like this:

 w,2174471252,4,66,850031503453000
 w,2174471256,4,66,850031503453250
 w,2174471260,4,66,850031503453500
 r,2282845416,4,74,850031503516750
 r,4017452700,4,74,850031503518250
 u,4017452700,4,2097218,850031503521500
 u,4017452700,4,2097218,850031503524000
 w,2174471204,4,66,850031503531500

   Here I can see some packets are marked with command 'u' (neither read
 nor write). What does this mean?
   I am also trying to understand the flag values (66, 74, 2097218). Can
 anyone help me in understanding what these flag values indicate?

  Thanks,
  Prathap Kumar Valsan



 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] How to model a die-stacked DRAM?

2015-06-18 Thread Prathap Kolakkampadath
Hello Users,

Has anyone tried to model a die-stacked DRAM using gem5's classic memory
system?
I read a couple of papers, in which they model die-stacked DRAM using
DRAMSim2.
How difficult would it be to model, and are there any pointers on where to start?

Thanks,
Prathap Kumar Valsan
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] L2 cache partitioning

2015-06-14 Thread Prathap Kolakkampadath
Thanks

On Sat, Jun 13, 2015 at 4:38 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  We have some patches to restrict way allocation in the cache itself (not
 per core though). You can probably use that as a starting point. I’m afraid
 beyond that you will need to add the appropriate functionality to look at
 e.g. masterId and decide on a way. I’ll try and get those patches posted in
 the next few days.
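
 For illustration, a standalone sketch of way partitioning during victim
 selection, where each core is given a bitmask of allowed ways in an 8-way
 set (an assumed scheme, not the patches mentioned above):

 #include <cstdint>
 #include <cstdio>

 int findVictimWay(std::uint8_t wayMask)
 {
     for (int way = 0; way < 8; ++way)
         if (wayMask & (1u << way))
             return way; // first allowed way; LRU among allowed ways omitted
     return -1;          // no way allowed for this requester
 }

 int main() {
     // core0 restricted to ways 0-1, core1 to ways 2-3, and so on.
     std::printf("core0 victim way: %d\n", findVictimWay(0x03));
 }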

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Monday, 8 June 2015 17:29
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] L2 cache partitioning

  Dear Users,

  I am using ARM Full System configuration, where L2 is 8-way set
 associative shared Last Level Cache. I am trying to partition the L2 cache
 by *ways* among four cores, so that each core gets two ways.
  Is there hardware support (a configuration register) available to do
 this? If not, can anyone throw some pointers to achieve way partitioning.


  Thanks in advance.

  Prathap


 -- IMPORTANT NOTICE: The contents of this email and any attachments are
 confidential and may also be privileged. If you are not the intended
 recipient, please notify the sender immediately and do not disclose the
 contents to any other person, use it for any purpose, or store or copy the
 information in any medium. Thank you.

 ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
 Registered in England  Wales, Company No: 2557590
 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
 Registered in England  Wales, Company No: 2548782

 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] L2 cache partitioning

2015-06-08 Thread Prathap Kolakkampadath
Dear Users,

I am using ARM Full System configuration, where L2 is 8-way set associative
shared Last Level Cache. I am trying to partition the L2 cache by *ways*
among four cores, so that each core gets two ways.
Is there hardware support (a configuration register) available to do this?
If not, can anyone throw some pointers to achieve way partitioning.


Thanks in advance.

Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] Dynamic allocation of L1 MSHRs

2015-05-06 Thread Prathap Kolakkampadath
Hello Users,

I am simulating an ARM detailed (O3) quad-core CPU with private L1 caches
and a shared L2 cache.
I am trying to regulate the number of outstanding requests a core can
generate. I know that by statically changing the number of L1 MSHRs (passed
as parameters from O3v7a.py), I can restrict the number of outstanding
requests of a core.

I would like to have private caches with a different number of L1 MSHRs for
each core (e.g. core0 - 1 MSHR, core2 - 3 MSHRs, etc.). How can I make this
assignment through the configuration file?

I would also like to dynamically change this allocation.
Can I make use of m5 (special instructions) to do this? Can anyone shed
some light on this?

Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Dynamic allocation of L1 MSHRs

2015-05-06 Thread Prathap Kolakkampadath
Hello Users,

I understood that through CacheConfig.py I can connect L1 caches with
different MSHRs to each core. However, I am not sure how to dynamically
change the number of L1 MSHRs allocated to each core. Can someone shed some
light on this?

Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] Query regarding blocking cache slave port

2015-05-04 Thread Prathap Kolakkampadath
Hello All,

I am simulating an ARM O3 multi-core system with private L1 cache and a
Shared L2 cache.
I am investigating the MSHR contention in the L2 cache. If the cache has no
free MSHRs, it marks the access path of the cache as blocked and also sets
the blocked flag in the slave interface. This means there won't be any
further accesses to the L2 cache.
Instead of blocking the L2 cache altogether, I would like to place an MSHR
reservation per core, so that only requests from a selected core are
blocked, based on its respective MSHR utilization.

I am not sure if this is feasible. Does the L2 bus have an arbiter which
can be modified to do this?

Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Query regarding blocking cache slave port

2015-05-04 Thread Prathap Kolakkampadath
Thanks Andreas.

On Mon, May 4, 2015 at 5:11 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  Check retryWaiting in Xbar. There we choose the port to go next when one
 or more ports had to wait. If you want to implement what you suggest you
 also have to perform a check in recvTimingReq to not just see if the layer
 is busy, but also check if the port asking is within budget.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Monday, 4 May 2015 22:56
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Query regarding blocking cache slave port

   Hi Andreas,

  Thanks for your reply. I am trying to figure out how to implement this
 based on your inputs. Can you also please point out the data structures
 which maintain the queue in the crossbar?

  Thanks,
  Prathap

 On Mon, May 4, 2015 at 4:04 PM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  The most sensible place to implement the arbitration is indeed in the
 crossbar which is conceptually part of the L2 cache. By default the
 crossbar uses First-Come First-Served, but you can change that with not
 too much coding. The tricky bit in this case is to base the selection on
 MSHRs, since the crossbar has no such accounting. I would think the
 easiest is to add outstanding transaction counting per SlavePort in the
 crossbar, and then only let a port have X outstanding transactions.
 Overall this would be valuable functionality, so if you do code it up,
 please post a patch. It would be a great contribution.
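
 A standalone sketch of that accounting (the names are assumptions; the
 gem5 crossbar keeps no such counters by default):

 #include <cstddef>
 #include <unordered_map>

 struct PortBudget {
     std::unordered_map<int, std::size_t> outstanding; // per slave-port id
     std::size_t maxOutstanding = 4;                   // the budget "X"

     // Checked from recvTimingReq(): refuse the request (forcing a retry)
     // when the port is over budget, even if the layer itself is idle.
     bool withinBudget(int portId) const {
         auto it = outstanding.find(portId);
         return it == outstanding.end() || it->second < maxOutstanding;
     }
     void onRequest(int portId)  { ++outstanding[portId]; }
     void onResponse(int portId) { --outstanding[portId]; }
 };

 int main() {
     PortBudget budget;
     return budget.withinBudget(0) ? 0 : 1; // port 0 starts within budget
 }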

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Monday, 4 May 2015 20:18
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Query regarding blocking cache slave port

   Hello All,

  I am simulating an ARM O3 multi-core system with private L1 cache and a
 Shared L2 cache.
 I am investigating the MSHR contention in the L2 cache. If the cache has
 no free MSHRs, it marks the access path of the cache as blocked and also
 sets the blocked flag in the slave interface. This means there won't be
 any further accesses to the L2 cache.
 Instead of blocking the L2 cache altogether, I would like to place an MSHR
 reservation per core, so that only requests from a selected core are
 blocked, based on its respective MSHR utilization.

  I am not sure if this is feasible. Does the L2 bus have an arbiter which
 can be modified to do this?

  Thanks,
  Prathap


 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users




 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Query regarding blocking cache slave port

2015-05-04 Thread Prathap Kolakkampadath
Hi Andreas,

Thanks for your reply. I am trying to figure out how to implement this
based on your inputs. Can you also please point out the data structures
which maintain the queue in the crossbar?

Thanks,
Prathap

On Mon, May 4, 2015 at 4:04 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  The most sensible place to implement the arbitration is indeed in the
 crossbar which is conceptually part of the L2 cache. By default the
 crossbar uses First-Come First-Served, but you can change that with not
 too much coding. The tricky bit in this case is to base the selection on
 MSHRs, since the crossbar has no such accounting. I would think the
 easiest is to add outstanding transaction counting per SlavePort in the
 crossbar, and then only let a port have X outstanding transactions.
 Overall this would be valuable functionality, so if you do code it up,
 please post a patch. It would be a great contribution.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Monday, 4 May 2015 20:18
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Query regarding blocking cache slave port

   Hello All,

  I am simulating an ARM O3 multi-core system with private L1 cache and a
 Shared L2 cache.
 I am investigating the MSHR contention in the L2 cache. If the cache has
 no free MSHRs, it marks the access path of the cache as blocked and also
 sets the blocked flag in the slave interface. This means there won't be
 any further accesses to the L2 cache.
 Instead of blocking the L2 cache altogether, I would like to place an MSHR
 reservation per core, so that only requests from a selected core are
 blocked, based on its respective MSHR utilization.

  I am not sure if this is feasible. Does the L2 bus have an arbiter which
 can be modified to do this?

  Thanks,
  Prathap


 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] bytesWritten (8 * number of 64-bit stores to unique addresses)

2015-04-21 Thread Prathap Kolakkampadath
Hello Patrick,

Can you check the number of last-level cache misses as reported by
stats.txt?

Prathap
On Apr 21, 2015 5:47 PM, Patrick plafr...@gmail.com wrote:

 I looked back at this, and it's still not clear to me what is going on. I
 decreased the size of the write queue to 2, and when running
 the simulation described in my previous message (in which 512 64-bit stores
 to unique addresses are issued), bytesWritten in one run was reported to be
 only 1,664 bytes. With the write queue set to size 2, I would expect
 bytesWritten to be at least 4096 - 128 = 3,968 bytes (the burstSize is 64
 bytes).

 Any additional help is appreciated.

 Regards,
 Patrick

 On Thu, Apr 16, 2015 at 2:07 PM, Patrick plafr...@gmail.com wrote:

 Thanks, Andreas. This makes sense.

 On Wed, Apr 15, 2015 at 5:26 PM, Andreas Hansson andreas.hans...@arm.com
  wrote:

  Hi Patrick,

  When it comes to the stores you are looking at a rather small number
 of operations, and my guess is that they are still in the DRAM write
 queues. These queues are not drained at the moment once the writes fall
 below the “low water mark”.

  Andreas

   From: Patrick plafr...@gmail.com
 Reply-To: gem5 users mailing list gem5-users@gem5.org
 Date: Wednesday, 15 April 2015 19:13
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] bytesWritten  (8 * number of 64-bit stores to
 unique addresses)

  I am looking at stats.txt for the amount of data written to the DRAM
 during the execution of a process in full system mode. I looked at the
 execution trace, and there are at least 512 64-bit stores to unique
 addresses. However, stats.txt reports only 2,304 bytesWritten to the
 memory. It is a 4-channel memory configuration. stats.txt reports 1,152
 bytesWritten on channel 0, 0 bytesWritten to channel 1, 0
 bytesWritten to channel 2, and 1,152 bytesWritten to channel 3.

  Does anyone know what would cause this? I thought maybe the data might
 be getting left in the caches, but I am waiting until the process exits
 before calling m5 resetstats. The bytesReadDRAM is also less than expected,
 based on the number of loads in the instruction trace. I thought
 perhaps this was because no-write-allocate was being used, but the
 discussion linked below suggests that the default is write-allocate. I
 can't find where this is configured in gem5, so I'm not able to check this
 at the moment.

  http://comments.gmane.org/gmane.comp.emulators.m5.users/12597

  Any help is appreciated.

   ​-​
 Patrick


 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users




 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] Question on maximum number of outstanding DRAM memory requests that can be generated by a core.

2015-01-18 Thread Prathap Kolakkampadath via gem5-users
Hello Users,

Is the maximum number of outstanding DRAM memory requests that a core can
generate at a time limited by the number of MSHRs in its private cache?

For example, in a 4-core system configuration, each core has a private L1
cache with 6 MSHRs. The system's last-level cache, shared by all four
cores, has 24 MSHRs. If a memory-intensive program running on core 0
generates many L2 data-cache read misses, is the number of outstanding
memory requests it can generate at a time limited by the number of L1
MSHRs or by the number of L2 MSHRs?
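
In the classic memory system both limits apply: each cache level can only
track as many outstanding misses as it has MSHRs, so the tighter of the two
is what the core effectively sees. A minimal sketch of where these knobs
live (hedged: parameter names from the classic BaseCache of this era,
latency values purely illustrative):

from m5.objects import BaseCache

l1d = BaseCache(size='32kB', assoc=2, hit_latency=2,
                response_latency=2, mshrs=6, tgts_per_mshr=8)
l2 = BaseCache(size='1MB', assoc=16, hit_latency=12,
               response_latency=12, mshrs=24, tgts_per_mshr=12)
# A single core can keep at most min(L1 mshrs, L2 mshrs) = 6 reads in
# flight to DRAM here; the 24 shared L2 MSHRs are the pool from which all
# four cores' L1 misses must allocate.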

Thanks and Regards,
Prathap Kumar Valsan
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] DRAMCTRL: Seeing an unusual behaviour with FR-FCFS scheduler

2014-11-14 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

The above patch doesn't affect my scenario on a single rank. However, I
suspect there is a bug in the FR-FCFS implementation in gem5: the requests
do not overlap. A precharge and/or activate should be issued as soon as a
memory request enters the queue. Consider a queue holding a request to
Bank1 that is not a row hit, with read requests both ahead of it and behind
it that all target the same row of Bank0. The worst-case delay for the
Bank1 request should be tRP+tRCD; after that delay it too becomes a row hit
(once the precharge and activate complete). In the current implementation,
however, the Bank1 request is served only after all the row hits ahead of
and behind it are done, because these timing parameters are set in
doDRAMAccess(), which is called only after the queue is reordered.

Does this observation look correct?
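
To make the distinction concrete, here is a toy model of the policy being
debated (plain Python, not the gem5 code; Bank and its ready_tick field are
hypothetical illustration, not gem5 names):

class Bank(object):
    def __init__(self, open_row=None, ready_tick=0):
        self.open_row = open_row      # row currently open in this bank
        self.ready_tick = ready_tick  # tick when PRE+ACT (tRP+tRCD) completes

def choose_next(queue, banks):
    # queue is kept in arrival (FCFS) order
    for req in queue:
        if banks[req['bank']].open_row == req['row']:
            return req                # first-ready: oldest current row hit
    # No immediate hit: take the request whose bank preps earliest (ties go
    # to the oldest, since the queue is in arrival order). Once its
    # precharge+activate completes it is itself a row hit, and the argument
    # above is that it should then beat younger hits to other banks.
    return min(queue, key=lambda r: banks[r['bank']].ready_tick)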

Thanks,
Prathap

On Fri, Nov 14, 2014 at 8:45 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  It would very well be a bug. We have got a small fix for the FRFCFS
 about to go out (but it should not affect your scenario assuming It is a
 single rank):

  diff --git a/src/mem/dram_ctrl.cc b/src/mem/dram_ctrl.cc
 --- a/src/mem/dram_ctrl.cc
 +++ b/src/mem/dram_ctrl.cc
 @@ -1483,8 +1489,8 @@
      // 1) Commands that access the same rank as previous burst
      //    and can prep the bank seamlessly.
      // 2) Commands (any rank) with earliest bank prep
 -    if (!switched_cmd_type && same_rank_match &&
 -        min_act_at_same_rank <= min_cmd_at) {
 +    if ((bank_mask == 0) || (!switched_cmd_type && same_rank_match &&
 +        min_act_at_same_rank <= min_cmd_at)) {
          bank_mask = bank_mask_same_rank;
      }

  Could you give that a spin?

  Thanks,

  Andreas

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Friday, 14 November 2014 00:11
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] DRAMCTRL: Seeing an unusual behaviour with FR-FCFS
 scheduler

   Hi Users,

  For the following scenario:


  Read0 Read1 Read2 Read3 Read4 Read5 Read6 Read7 Read8 Read9 Read10 Read11

  There are 12 reads in the read queue numbered in the order of arrival.
  Read 0 to Read3 access same row  of Bank1, Read4 access Bank0, Read5 to
 Read8 access same row of Bank2 and Read9 to Read11 access same row of Bank3.

   According to the FR-FCFS scheduler, even though there is only a single
  request (Read4) to Bank0, it should be scheduled right after Read0 to
  Read3. Within the window of Read0-Read3, Read4 would have finished its
  precharge and activate and be ready to schedule. Though Read5 and Read9
  are also ready, Read4 should be scheduled as the next row hit, according
  to the FCFS tie-break.

   However, I see a different behaviour: Read4 is scheduled only after all
  the other row hits to Bank2 and Bank3 are scheduled. I also noticed from
  the debug prints that Read4 never becomes a row hit.

   Are we failing to mark the read as a row hit after the precharge and
  activate? I am trying to figure this out. Is my understanding correct?

  Thanks,
  Prathap


___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] DRAMCTRL: Seeing an unusual behaviour with FR-FCFS scheduler

2014-11-14 Thread Prathap Kolakkampadath via gem5-users
Hi Andreas,

As you said, it shouldn't have to switch to another bank that doesn't have
a row hit. However, a request that is not a row hit when it enters the
queue becomes one after tRP+tRCD. From then on, FR-FCFS should schedule it
ahead of subsequent requests in the queue that are row hits to other banks.
In the current code, though, the activate and precharge are issued to a
queued request only when there are no more row hits in the queue. Does this
make sense?

Thanks,
Prathap

On Fri, Nov 14, 2014 at 4:54 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  I am not sure I understand what you think is wrong here. If there are
 row hits, why should the controller switch to another bank (except to avoid
 starvation etc)?

  It will switch once there are no row hits or the limit on in-row-hits is
 reached, and at that point it will switch to the request that is first
 ready, or FCFS if there is more than one that would be ready at the same
 time. Note that there are also constraints like tRRD etc that have to be
 respected.

  If you want to study the behaviour on a larger scale, you can have a
 look at util/dram_sweep_plot.py (you need to run config/dram/sweep.py first
 for a given memory config), and look at the efficiency of the controller
 across a range of in-row-hits and bank parallelism. For the entire 3D
 surface the controller achieves a throughput in the order of 95% of what an
 oracle scheduler would.
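
 Roughly, the workflow is (hedged: the exact sweep.py options vary by
 revision, so this invocation shape is an assumption; check the script's
 --help):

 ./build/ARM/gem5.opt configs/dram/sweep.py      # sweep row hits x bank parallelism
 python util/dram_sweep_plot.py                  # plot the resulting 3D surface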

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Friday, November 14, 2014 at 10:37 PM
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] DRAMCTRL: Seeing an unusual behaviour with
 FR-FCFS scheduler

Hello Andreas,

  The above patch doesn't affect my scenario on a single rank. However, I
 suspect there is a bug in the FR-FCFS implementation in gem5: the requests
 do not overlap. A precharge and/or activate should be issued as soon as a
 memory request enters the queue. Consider a queue holding a request to
 Bank1 that is not a row hit, with read requests both ahead of it and behind
 it that all target the same row of Bank0. The worst-case delay for the
 Bank1 request should be tRP+tRCD; after that delay it too becomes a row hit
 (once the precharge and activate complete). In the current implementation,
 however, the Bank1 request is served only after all the row hits ahead of
 and behind it are done, because these timing parameters are set in
 doDRAMAccess(), which is called only after the queue is reordered.

  Does this observation look correct?

  Thanks,
 Prathap

 On Fri, Nov 14, 2014 at 8:45 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  It would very well be a bug. We have got a small fix for the FRFCFS
 about to go out (but it should not affect your scenario assuming It is a
 single rank):

  diff --git a/src/mem/dram_ctrl.cc b/src/mem/dram_ctrl.cc
 --- a/src/mem/dram_ctrl.cc
 +++ b/src/mem/dram_ctrl.cc
 @@ -1483,8 +1489,8 @@
      // 1) Commands that access the same rank as previous burst
      //    and can prep the bank seamlessly.
      // 2) Commands (any rank) with earliest bank prep
 -    if (!switched_cmd_type && same_rank_match &&
 -        min_act_at_same_rank <= min_cmd_at) {
 +    if ((bank_mask == 0) || (!switched_cmd_type && same_rank_match &&
 +        min_act_at_same_rank <= min_cmd_at)) {
          bank_mask = bank_mask_same_rank;
      }

  Could you give that a spin?

  Thanks,

  Andreas

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Friday, 14 November 2014 00:11
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] DRAMCTRL: Seeing an unusual behaviour with FR-FCFS
 scheduler

   Hi Users,

  For the following scenario:


  Read0 Read1 Read2 Read3 Read4 Read5 Read6 Read7 Read8 Read9 Read10
 Read11

  There are 12 reads in the read queue numbered in the order of arrival.
  Read 0 to Read3 access same row  of Bank1, Read4 access Bank0, Read5 to
 Read8 access same row of Bank2 and Read9 to Read11 access same row of Bank3.

   According to the FR-FCFS scheduler, even though there is only a single
  request (Read4) to Bank0, it should be scheduled right after Read0 to
  Read3. Within the window of Read0-Read3, Read4 would have finished its
  precharge and activate and be ready to schedule. Though Read5 and Read9
  are also ready, Read4 should be scheduled as the next row hit, according
  to the FCFS tie-break.

   However, I see a different behaviour: Read4 is scheduled only after all
  the other row hits to Bank2 and Bank3 are scheduled. I also noticed from
  the debug prints that Read4 never becomes a row hit.

   Are we failing to mark the read as a row hit after the precharge and
  activate? I am trying to figure this out. Is my understanding correct?
[gem5-users] DRAMCTRL: Seeing an unusual behaviour with FR-FCFS scheduler

2014-11-13 Thread Prathap Kolakkampadath via gem5-users
Hi Users,

For the following scenario:


Read0 Read1 Read2 Read3 Read4 Read5 Read6 Read7 Read8 Read9 Read10 Read11

There are 12 reads in the read queue numbered in the order of arrival.
Read 0 to Read3 access same row  of Bank1, Read4 access Bank0, Read5 to
Read8 access same row of Bank2 and Read9 to Read11 access same row of Bank3.

According to the FR-FCFS scheduler, even though there is only a single
request (Read4) to Bank0, it should be scheduled right after Read0 to
Read3. Within the window of Read0-Read3, Read4 would have finished its
precharge and activate and be ready to schedule. Though Read5 and Read9 are
also ready, Read4 should be scheduled as the next row hit, according to the
FCFS tie-break.

However, I see a different behaviour: Read4 is scheduled only after all the
other row hits to Bank2 and Bank3 are scheduled. I also noticed from the
debug prints that Read4 never becomes a row hit.

Are we failing to mark the read as a row hit after the precharge and
activate? I am trying to figure this out. Is my understanding correct?

Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] DRAM memory access latency

2014-11-10 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

 waiting in the port until the crossbar can accept it

Is this because of bus contention? If so, is there a way to reduce this
latency by changing any parameters in gem5?

Thanks,
Prathap

On Thu, Nov 6, 2014 at 2:30 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  I suspect the answer to the mysterious 50 ns is due to the responses
 being sent back using a so called “queued port” in gem5. Thus, from the
 memory controller’s point of view the packet is all done, but is now
 waiting in the port until the crossbar can accept it. This queue can hold a
 number of packets if there has been a burst of responses that are trickling
 through the crossbar on their way back.

  You can always run with some debug flags to verify this (XBar, DRAM,
 PacketQueue etc).
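
 For example (flag names as listed above; the rest of the command line is
 whatever you normally run, so treat this shape as illustrative):

 ./build/ARM/gem5.opt --debug-flags=XBar,DRAM,PacketQueue \
     --debug-file=queue.trace configs/example/fs.py ...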

  Coincidentally I have been working on a patch to remove this “invisible”
 queue and should hopefully have this on the review board shortly.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Thursday, November 6, 2014 at 5:47 PM
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org

 Subject: Re: [gem5-users] DRAM memory access latency

   Hello Andreas,

  Thanks for your reply.


  OK, I get that the memory access latency indeed includes the queueing
 latency, and that a read/write request that misses in the buffer incurs a
 static latency of static frontend latency + static backend latency.


  To summarize, the test i run is a latency benchmark which is a pointer
 chasing test(only one request at a time) , generate reads to a specific
 DRAM bank (Bank partitioned).This test is running on cpu0 of 4 cpu
 arm_detailed running at 1GHZ frequency with 1MB shared L2 cache and  single
 channel LPDDR3 x32 DRAM. The bank used by cpu0 is not shared between other
 cpu's.

  Test statistics:

 system.mem_ctrls.avgQLat                                42816.35   # Average queueing delay per DRAM burst
 system.mem_ctrls.avgBusLat                               5000.00   # Average bus latency per DRAM burst
 system.mem_ctrls.avgMemAccLat                           63816.35   # Average memory access latency per DRAM burst
 system.mem_ctrls.avgRdQLen                                  2.00   # Average read queue length when enqueuing
 system.mem_ctrls.avgGap                                136814.25   # Average gap between requests
 system.l2.ReadReq_avg_miss_latency::switch_cpus0.data  114767.654811   # average ReadReq miss latency

  Based on the above test statistics:

  avgMemAccLat is ~63 ns, which I presume is the sum of tRP (15 ns) +
 tRCD (15 ns) + tCL (15 ns) + static latency (~20 ns).
 Is this breakdown correct?

  However, l2.ReadReq_avg_miss_latency is ~114 ns, which is ~50 ns more
 than avgMemAccLat. I couldn't figure out the components contributing to
 this extra 50 ns. Your thoughts on this are much appreciated.

  Regards,
  Prathap




 On Thu, Nov 6, 2014 at 3:03 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  The avgMemAccLat does indeed include any queueing latency. For the
 precise components included in the various latencies I would suggest
 checking the source code.

  Note that the controller is not just accounting for the static (and
 dynamic) DRAM latency, but also the static controller pipeline latency (and
 dynamic queueing latency). The controller static latency is two parameters
 that are by default also adding a few 10’s of nanoseconds.

  Let me know if you need more help breaking out the various components.

  Andreas

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Wednesday, 5 November 2014 05:36
 To: Tao Zhang tao.zhang.0...@gmail.com, gem5 users mailing list 
 gem5-users@gem5.org, Amin Farmahini amin...@gmail.com
 Subject: Re: [gem5-users] DRAM memory access latency

  Hi Tao,Amin,

  According to gem5 source, MemAccLat is the time difference between the
 packet enters in the controller and packet leaves the controller. I presume
  this added with BusLatency and static backend latency should match with
 system.l2.ReadReq_avg_miss_latency. However i see a difference of approx
 50ns.


  As mentioned above if MemAccLat is the time a packet spends in memory
 controller, then it should include the queuing latency too. In that case
 the value of  avgQLat looks suspicious. Is the avgQlat part of
 avgMemAccLat?

  Thanks,
 Prathap



 On Tue, Nov 4, 2014 at 3:11 PM, Tao Zhang tao.zhang.0...@gmail.com
 wrote:

  From the stats, I'd like to use system.mem_ctrls.avgMemAccLat as the
 overall average memory latency. It is 63.816ns, which is very close to 60ns
 as you calculated. I guess the extra 3.816ns is due to the refresh penalty.

 -Tao

 On Tue, Nov 4, 2014 at 12:10 PM, Prathap Kolakkampadath 
 kvprat...@gmail.com wrote:

  Hi Toa, Amin,


  Thanks for your reply.

  To discard interbank interference and queueing delay, i have

Re: [gem5-users] DRAM memory access latency

2014-11-06 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

Thanks for your reply.


OK, I get that the memory access latency indeed includes the queueing
latency, and that a read/write request that misses in the buffer incurs a
static latency of static frontend latency + static backend latency.


To summarize, the test I run is a latency benchmark, a pointer-chasing
test (only one outstanding request at a time) that generates reads to a
specific DRAM bank (the banks are partitioned). The test runs on cpu0 of a
4-CPU arm_detailed system at 1 GHz with a 1 MB shared L2 cache and a
single-channel LPDDR3 x32 DRAM. The bank used by cpu0 is not shared with
the other CPUs.

Test statistics:

system.mem_ctrls.avgQLat                                43816.35   # Average queueing delay per DRAM burst
system.mem_ctrls.avgBusLat                               5000.00   # Average bus latency per DRAM burst
system.mem_ctrls.avgMemAccLat                           63816.35   # Average memory access latency per DRAM burst
system.mem_ctrls.avgRdQLen                                  2.00   # Average read queue length when enqueuing
system.mem_ctrls.avgGap                                136814.25   # Average gap between requests
system.l2.ReadReq_avg_miss_latency::switch_cpus0.data  114767.654811   # average ReadReq miss latency

Based on the above test statistics:

avgMemAccLat is ~63 ns, which I presume is the sum of tRP (15 ns) +
tRCD (15 ns) + tCL (15 ns) + static latency (~20 ns).
Is this breakdown correct?

However, l2.ReadReq_avg_miss_latency is ~114 ns, which is ~50 ns more than
avgMemAccLat. I couldn't figure out the components contributing to this
extra 50 ns. Your thoughts on this are much appreciated.
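
For reference, the arithmetic across this thread, laid out (numbers from
the stats above; the refresh attribution is Tao's estimate and the
crossbar/queued-port attribution is Andreas's, both elsewhere in the
thread):

    tRP + tRCD + tCL + tBURST = 15 + 15 + 15 + 5 = 50 ns of DRAM timing
    + static controller latency (~10-15 ns)       ~ 60-65 ns expected
    measured avgMemAccLat                         = 63.8 ns (remainder ~ refresh)
    measured L2 ReadReq miss latency              = 114.8 ns
    gap: 114.8 - 63.8                             ~ 51 ns (crossbar + queued response port)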

Regards,
Prathap




On Thu, Nov 6, 2014 at 3:03 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  The avgMemAccLat does indeed include any queueing latency. For the
 precise components included in the various latencies I would suggest
 checking the source code.

  Note that the controller is not just accounting for the static (and
 dynamic) DRAM latency, but also the static controller pipeline latency (and
 dynamic queueing latency). The controller static latency is two parameters
 that are by default also adding a few 10’s of nanoseconds.

  Let me know if you need more help breaking out the various components.

  Andreas

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Wednesday, 5 November 2014 05:36
 To: Tao Zhang tao.zhang.0...@gmail.com, gem5 users mailing list 
 gem5-users@gem5.org, Amin Farmahini amin...@gmail.com
 Subject: Re: [gem5-users] DRAM memory access latency

  Hi Tao,Amin,

  According to gem5 source, MemAccLat is the time difference between the
 packet enters in the controller and packet leaves the controller. I presume
  this added with BusLatency and static backend latency should match with
 system.l2.ReadReq_avg_miss_latency. However i see a difference of approx
 50ns.


  As mentioned above if MemAccLat is the time a packet spends in memory
 controller, then it should include the queuing latency too. In that case
 the value of  avgQLat looks suspicious. Is the avgQlat part of
 avgMemAccLat?

  Thanks,
 Prathap



 On Tue, Nov 4, 2014 at 3:11 PM, Tao Zhang tao.zhang.0...@gmail.com
 wrote:

  From the stats, I'd like to use system.mem_ctrls.avgMemAccLat as the
 overall average memory latency. It is 63.816ns, which is very close to 60ns
 as you calculated. I guess the extra 3.816ns is due to the refresh penalty.

 -Tao

 On Tue, Nov 4, 2014 at 12:10 PM, Prathap Kolakkampadath 
 kvprat...@gmail.com wrote:

  Hi Toa, Amin,


  Thanks for your reply.

  To discard interbank interference and queueing delay, i have
 partitioned the banks so that the latency benchmark has exclusive access to
 a bank. Also latency benchmark is a pointer chasing benchmark, which will
 generate a single read request at a time.


  stats.txt says this:

 system.mem_ctrls.avgQLat                                43816.35   # Average queueing delay per DRAM burst
 system.mem_ctrls.avgBusLat                               5000.00   # Average bus latency per DRAM burst
 system.mem_ctrls.avgMemAccLat                           63816.35   # Average memory access latency per DRAM burst
 system.mem_ctrls.avgRdQLen                                  2.00   # Average read queue length when enqueuing
 system.mem_ctrls.avgGap                                136814.25   # Average gap between requests
 system.l2.ReadReq_avg_miss_latency::switch_cpus0.data  114767.654811   # average ReadReq miss latency

  The average gap between requests is equal to the L2 latency + DRAM
 latency for this test. Also, avgRdQLen is 2 because the cache line size is
 64 bytes and the DRAM interface is x32.

  Is the final latency the sum of avgQLat + avgBusLat + avgMemAccLat?
 Also, when avgRdQLen is only 2, I am not sure what accounts for the high
 queueing latency.

  Regards,
  Prathap



 On Tue, Nov 4, 2014 at 1:38 PM, Amin Farmahini amin...@gmail.com
 wrote:

  Prathap,

  You are probably missing DRAM queuing latency (major reason) and other
 on-chip latencies (such as bus latency) if any.

Re: [gem5-users] DRAM memory access latency

2014-11-06 Thread Prathap Kolakkampadath via gem5-users
Thanks for your reply. I will try to verify this and will get back to you
with results once I run with your patch.

Regards,
Prathap

On Thu, Nov 6, 2014 at 2:30 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  I suspect the answer to the mysterious 50 ns is due to the responses
 being sent back using a so called “queued port” in gem5. Thus, from the
 memory controller’s point of view the packet is all done, but is now
 waiting in the port until the crossbar can accept it. This queue can hold a
 number of packets if there has been a burst of responses that are trickling
 through the crossbar on their way back.

  You can always run with some debug flags to verify this (XBar, DRAM,
 PacketQueue etc).

  Coincidentally I have been working on a patch to remove this “invisible”
 queue and should hopefully have this on the review board shortly.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Thursday, November 6, 2014 at 5:47 PM
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org

 Subject: Re: [gem5-users] DRAM memory access latency

   Hello Andreas,

  Thanks for your reply.


  OK, I get that the memory access latency indeed includes the queueing
 latency, and that a read/write request that misses in the buffer incurs a
 static latency of static frontend latency + static backend latency.


  To summarize, the test i run is a latency benchmark which is a pointer
 chasing test(only one request at a time) , generate reads to a specific
 DRAM bank (Bank partitioned).This test is running on cpu0 of 4 cpu
 arm_detailed running at 1GHZ frequency with 1MB shared L2 cache and  single
 channel LPDDR3 x32 DRAM. The bank used by cpu0 is not shared between other
 cpu's.

  Test statistics:

 system.mem_ctrls.avgQLat                                43816.35   # Average queueing delay per DRAM burst
 system.mem_ctrls.avgBusLat                               5000.00   # Average bus latency per DRAM burst
 system.mem_ctrls.avgMemAccLat                           63816.35   # Average memory access latency per DRAM burst
 system.mem_ctrls.avgRdQLen                                  2.00   # Average read queue length when enqueuing
 system.mem_ctrls.avgGap                                136814.25   # Average gap between requests
 system.l2.ReadReq_avg_miss_latency::switch_cpus0.data  114767.654811   # average ReadReq miss latency

  Based on the above test statistics:

  avgMemAccLat is ~63 ns, which I presume is the sum of tRP (15 ns) +
 tRCD (15 ns) + tCL (15 ns) + static latency (~20 ns).
 Is this breakdown correct?

  However, l2.ReadReq_avg_miss_latency is ~114 ns, which is ~50 ns more
 than avgMemAccLat. I couldn't figure out the components contributing to
 this extra 50 ns. Your thoughts on this are much appreciated.

  Regards,
  Prathap




 On Thu, Nov 6, 2014 at 3:03 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  The avgMemAccLat does indeed include any queueing latency. For the
 precise components included in the various latencies I would suggest
 checking the source code.

  Note that the controller is not just accounting for the static (and
 dynamic) DRAM latency, but also the static controller pipeline latency (and
 dynamic queueing latency). The controller static latency is two parameters
 that are by default also adding a few 10’s of nanoseconds.

  Let me know if you need more help breaking out the various components.

  Andreas

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Wednesday, 5 November 2014 05:36
 To: Tao Zhang tao.zhang.0...@gmail.com, gem5 users mailing list 
 gem5-users@gem5.org, Amin Farmahini amin...@gmail.com
 Subject: Re: [gem5-users] DRAM memory access latency

  Hi Tao,Amin,

  According to gem5 source, MemAccLat is the time difference between the
 packet enters in the controller and packet leaves the controller. I presume
  this added with BusLatency and static backend latency should match with
 system.l2.ReadReq_avg_miss_latency. However i see a difference of approx
 50ns.


  As mentioned above if MemAccLat is the time a packet spends in memory
 controller, then it should include the queuing latency too. In that case
 the value of  avgQLat looks suspicious. Is the avgQlat part of
 avgMemAccLat?

  Thanks,
 Prathap



 On Tue, Nov 4, 2014 at 3:11 PM, Tao Zhang tao.zhang.0...@gmail.com
 wrote:

  From the stats, I'd like to use system.mem_ctrls.avgMemAccLat as the
 overall average memory latency. It is 63.816ns, which is very close to 60ns
 as you calculated. I guess the extra 3.816ns is due to the refresh penalty.

 -Tao

 On Tue, Nov 4, 2014 at 12:10 PM, Prathap Kolakkampadath 
 kvprat...@gmail.com wrote:

  Hi Toa, Amin,


  Thanks for your reply.

  To discard interbank interference and queueing delay, i have
 partitioned the banks so that the latency benchmark has exclusive access to
 a bank

[gem5-users] DRAM memory access latency

2014-11-04 Thread Prathap Kolakkampadath via gem5-users
Hello Users,

I am measuring the DRAM worst-case memory access latency (tRP + tRCD +
tCL + tBURST) using a latency benchmark on arm_detailed (1 GHz) with a 1 MB
shared L2 cache and LPDDR3 x32 DRAM.

According to the DRAM timing parameters, tRP = 15 ns, tRCD = 15 ns, tCL =
15 ns, tBURST = 5 ns. The latency measured by the benchmark on a cache hit
is 22 ns and on a cache miss is 132 ns, which means the DRAM memory access
latency is ~110 ns. According to the calculation, however, it should be
tRP + tRCD + tCL + tBURST + static_backend_latency (10 ns) = 60 ns.


The latency I observe is almost 50 ns higher than it should be. Is there
anything I am missing? Does anyone know what else could add to the DRAM
memory access latency?
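
Spelled out, the expectation is 15 + 15 + 15 + 5 = 50 ns of DRAM timing
plus the ~10 ns static backend latency, i.e. ~60 ns, while the measured
miss-minus-hit difference is 132 - 22 = 110 ns; the ~50 ns gap is what the
replies in this thread attribute to queueing, the crossbar, and the queued
response port.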

Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] DRAM memory access latency

2014-11-04 Thread Prathap Kolakkampadath via gem5-users
Hi Tao,Amin,

According to the gem5 source, MemAccLat is the time difference between
when the packet enters the controller and when it leaves the controller. I
presume that this, added to the bus latency and the static backend latency,
should match system.l2.ReadReq_avg_miss_latency. However, I see a
difference of approximately 50 ns.


As mentioned above, if MemAccLat is the time a packet spends in the memory
controller, then it should include the queueing latency too. In that case
the value of avgQLat looks suspicious. Is avgQLat part of avgMemAccLat?

Thanks,
Prathap



On Tue, Nov 4, 2014 at 3:11 PM, Tao Zhang tao.zhang.0...@gmail.com wrote:

 From the stats, I'd like to use system.mem_ctrls.avgMemAccLat as the
 overall average memory latency. It is 63.816ns, which is very close to 60ns
 as you calculated. I guess the extra 3.816ns is due to the refresh penalty.

 -Tao

 On Tue, Nov 4, 2014 at 12:10 PM, Prathap Kolakkampadath 
 kvprat...@gmail.com wrote:

 Hi Toa, Amin,


 Thanks for your reply.

 To discard interbank interference and queueing delay, i have partitioned
 the banks so that the latency benchmark has exclusive access to a bank.
 Also latency benchmark is a pointer chasing benchmark, which will generate
 a single read request at a time.


 stats.txt says this:

 system.mem_ctrls.avgQLat                                43816.35   # Average queueing delay per DRAM burst
 system.mem_ctrls.avgBusLat                               5000.00   # Average bus latency per DRAM burst
 system.mem_ctrls.avgMemAccLat                           63816.35   # Average memory access latency per DRAM burst
 system.mem_ctrls.avgRdQLen                                  2.00   # Average read queue length when enqueuing
 system.mem_ctrls.avgGap                                136814.25   # Average gap between requests
 system.l2.ReadReq_avg_miss_latency::switch_cpus0.data  114767.654811   # average ReadReq miss latency

 The average gap between requests is equal to the L2 latency + DRAM
 latency for this test. Also, avgRdQLen is 2 because the cache line size is
 64 bytes and the DRAM interface is x32.
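
 The burst arithmetic behind that: an x32 interface with burst length 8
 transfers 8 beats x 4 bytes = 32 bytes per DRAM burst, so one 64-byte
 cache line splits into 64 / 32 = 2 bursts, and a single outstanding miss
 already enqueues two read packets.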

 Is the final latency the sum of avgQLat + avgBusLat + avgMemAccLat?
 Also, when avgRdQLen is only 2, I am not sure what accounts for the high
 queueing latency.

 Regards,
 Prathap



 On Tue, Nov 4, 2014 at 1:38 PM, Amin Farmahini amin...@gmail.com wrote:

 Prathap,

 You are probably missing DRAM queuing latency (major reason) and other
 on-chip latencies (such as bus latency) if any.

 Thanks,
 Amin

 On Tue, Nov 4, 2014 at 1:28 PM, Prathap Kolakkampadath via gem5-users 
 gem5-users@gem5.org wrote:

 Hello Users,

 I am measuring DRAM worst case memory access latency(tRP+tRCD
 +tCL+tBURST) using a latency benchmark on arm_detailed(1Ghz) with 1MB
 shared L2 cache and  LPDDR3 x32 DRAM.

  According to the DRAM timing parameters, tRP = 15 ns, tRCD = 15 ns, tCL =
  15 ns, tBURST = 5 ns. The latency measured by the benchmark on a cache hit
  is 22 ns and on a cache miss is 132 ns, which means the DRAM memory access
  latency is ~110 ns. According to the calculation, however, it should be
  tRP + tRCD + tCL + tBURST + static_backend_latency (10 ns) = 60 ns.


 The latency what i observe is almost 50ns higher than what it is
 supposed to be. Is there anything which I am missing? Do any one know what
 else could add to the DRAM memory access latency?

 Thanks,
 Prathap


 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users





___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Questions on DRAM Controller model

2014-10-16 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

According to the gem5 documentation:

Load & store buffers (for read and write access) don’t impose any
restriction on the number of active memory accesses. Therefore, the maximum
number of outstanding CPU’s memory access requests is not limited by CPU
Memory Object but by underlying memory system model.

I assume this means the number of outstanding memory accesses of a CPU
depends on the number of L1 MSHRs. Is this correct?


However, As quoted from the paper
http://web.eecs.umich.edu/~atgutier/papers/ispass_2014.pdf

gem5’s fetch engine only
allows a single outstanding I-cache access, whereas modern
OoO CPUs are fully pipelined allowing multiple parallel
accesses to instruction cache lines. This specification error in
the fetch stage contributes to the I-cache miss statistic error.

If this is the case, doesn't it limit the number of outstanding read
requests? Is it possible to generate multiple parallel accesses by setting
the response_latency of the L1 cache to 0?


Kindly let me know what you think about this.

Thanks,
Prathap




On Tue, Oct 14, 2014 at 4:22 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hello Prathap,

  I do not dare say, but perhaps some interaction between your generated
 access sequence and the O3 model (parameters) restrict the number of
 outstanding L1 misses? There are plenty debug flags to help in drilling
 down on this issue. Have a look in src/cpu/o3/Sconscript for the O3 related
 debug flags and src/mem/cache/Sconscript for the cache flags.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Tuesday, October 14, 2014 at 9:21 PM

 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

   Hello Andreas

  Whenever i switch to O3 Cpu from a checkpoint, i could see from
 config.ini that CPU is getting switched but the mem_mode is still set to
 atomic. However when booting in O3 CPU itself(without restoring from a
 checkpoint) the mem_mode is set to timing. Not sure why. Anyhow i could run
 my tests on O3 CPU with mem_mode timing(as verified from config.ini)

  When i run one memory-intensive tests, which generates cache miss on
 every read, in parallel with a pointer chasing test(one outstanding request
 at a time) and both the cpu's share the same bank of DRAM Controller. In my
 setup, as # of L1 MSHRs are 10, memory-intensive test can generate up to 10
 Outstanding requests at a time. Since CPU speed is much faster than DRAM
 controller, can generate outstanding requests and all the requests are
 targeted to same bank, i expect to see the DRAM queue size to be 10 all the
 time when there is a request coming from pointer chasing test. If this
 assumption is correct i could see a better interference in model as i could
 see in real platforms.

  Don't you think DRAM queue size would get  filled up to the size of
 number of L1 MSHRs according to above scenario. And what could be the case
 in order to fill the DRAM up to the size of # of L1 MSHRs.

  Thanks,
  Prathap Kumar Valsan
  Research Assistant
  University of Kansas

 On Tue, Oct 14, 2014 at 2:30 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  The O3 CPU only works with the memory system in timing mode, so I do
 not understand what two points you are comparing when you say the results
 are exactly the same.

  The read queue is likely to never fill up unless all these transactions
 are generated at once. While the first one is being served by the memory
 controller you may have more coming in etc, but I do not understand why you
 think it would ever fill up.

  For “debugging” make sure that the config.ini actually captures what
 you think you are simulating. Also, you have a lot of DRAM-related stats in
 the stats.txt output.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Tuesday, 14 October 2014 04:33

 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

Hi Andreas, users

  I ran the test with ARM O3 cpu(--cpu-type=detailed) , mem_mode=timing,
 the results are exactly the same compared to mem_mode=atomic.
  I have partitioned the DRAM banks using software. Both the benchmarks-
 latency-sensitive and bandwidth -sensitive (both generates only reads)
 running in parallel using the same DRAM bank.
 From status file, i observe expected number L2 misses and DRAM requests
 are getting generated.
 In my system, the number of L1 MSHRs are 10 and number of L2 MSHR's are
 32. So i expect that when a request from a latency-sensitive benchmark
 comes to DRAM, the readQ size has to be 10. However what i am observing is
 most of the time the Queue is not getting filled and hence there is less
 queueing latency and interference.

  I am using classic memory system with default DRAM
 controller,DDR3_1600_x64. Addressing map is RoRaBaChCo, page
 policy

Re: [gem5-users] Questions on DRAM Controller model

2014-10-15 Thread Prathap Kolakkampadath via gem5-users
Thanks Andreas.


On Tue, Oct 14, 2014 at 4:22 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hello Prathap,

  I do not dare say, but perhaps some interaction between your generated
 access sequence and the O3 model (parameters) restrict the number of
 outstanding L1 misses? There are plenty debug flags to help in drilling
 down on this issue. Have a look in src/cpu/o3/Sconscript for the O3 related
 debug flags and src/mem/cache/Sconscript for the cache flags.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Tuesday, October 14, 2014 at 9:21 PM

 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

   Hello Andreas

  Whenever i switch to O3 Cpu from a checkpoint, i could see from
 config.ini that CPU is getting switched but the mem_mode is still set to
 atomic. However when booting in O3 CPU itself(without restoring from a
 checkpoint) the mem_mode is set to timing. Not sure why. Anyhow i could run
 my tests on O3 CPU with mem_mode timing(as verified from config.ini)

  When i run one memory-intensive tests, which generates cache miss on
 every read, in parallel with a pointer chasing test(one outstanding request
 at a time) and both the cpu's share the same bank of DRAM Controller. In my
 setup, as # of L1 MSHRs are 10, memory-intensive test can generate up to 10
 Outstanding requests at a time. Since CPU speed is much faster than DRAM
 controller, can generate outstanding requests and all the requests are
 targeted to same bank, i expect to see the DRAM queue size to be 10 all the
 time when there is a request coming from pointer chasing test. If this
 assumption is correct i could see a better interference in model as i could
 see in real platforms.

  Don't you think DRAM queue size would get  filled up to the size of
 number of L1 MSHRs according to above scenario. And what could be the case
 in order to fill the DRAM up to the size of # of L1 MSHRs.

  Thanks,
  Prathap Kumar Valsan
  Research Assistant
  University of Kansas

 On Tue, Oct 14, 2014 at 2:30 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  The O3 CPU only works with the memory system in timing mode, so I do
 not understand what two points you are comparing when you say the results
 are exactly the same.

  The read queue is likely to never fill up unless all these transactions
 are generated at once. While the first one is being served by the memory
 controller you may have more coming in etc, but I do not understand why you
 think it would ever fill up.

  For “debugging” make sure that the config.ini actually captures what
 you think you are simulating. Also, you have a lot of DRAM-related stats in
 the stats.txt output.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Tuesday, 14 October 2014 04:33

 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

Hi Andreas, users

  I ran the test with ARM O3 cpu(--cpu-type=detailed) , mem_mode=timing,
 the results are exactly the same compared to mem_mode=atomic.
  I have partitioned the DRAM banks using software. Both the benchmarks-
 latency-sensitive and bandwidth -sensitive (both generates only reads)
 running in parallel using the same DRAM bank.
 From status file, i observe expected number L2 misses and DRAM requests
 are getting generated.
 In my system, the number of L1 MSHRs are 10 and number of L2 MSHR's are
 32. So i expect that when a request from a latency-sensitive benchmark
 comes to DRAM, the readQ size has to be 10. However what i am observing is
 most of the time the Queue is not getting filled and hence there is less
 queueing latency and interference.

  I am using classic memory system with default DRAM
 controller,DDR3_1600_x64. Addressing map is RoRaBaChCo, page
 policy-open_adaptive, and frfcfs scheduler.

  Do you have any thoughts on this? How could i debug this further?

  Appreciate your help.

  Thanks,
  Prathap Kumar Valsan
  Research Assistant
  University of Kansas

 On Mon, Oct 13, 2014 at 4:21 AM, Andreas Hansson andreas.hans...@arm.com
  wrote:

  Hi Prathap,

  Indeed. The atomic mode is for fast-forwarding only. Once you actually
 want to get some representative performance numbers you have to run in
 timing mode with either the O3 or Minor CPU model.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Monday, 13 October 2014 10:19

 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

  Thanks for your reply. The memory mode which I used is atomic. I
 think, I need to run the tests in timing More. I believe which shows up
 interference and queueing delay similar to real platforms.

 Prathap
 On Oct 13, 2014 2:55 AM, Andreas Hansson andreas.hans...@arm.com wrote:

Re: [gem5-users] Questions on DRAM Controller model

2014-10-14 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas

Whenever I switch to the O3 CPU from a checkpoint, I can see from config.ini
that the CPU gets switched but mem_mode is still set to atomic. However,
when booting with the O3 CPU itself (without restoring from a checkpoint),
mem_mode is set to timing. I am not sure why. In any case, I could run my
tests on the O3 CPU with mem_mode timing (as verified from config.ini).
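
One thing worth checking from the config side (a sketch, not a verified
fix: System.mem_mode is the real parameter, but where fs.py sets it differs
by revision):

# Hedged sketch: force the system into timing mode before m5.instantiate();
# mem_mode must match a timing-capable CPU model (O3/Minor).
test_sys.mem_mode = 'timing'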

I run a memory-intensive test, which generates a cache miss on every read,
in parallel with a pointer-chasing test (one outstanding request at a
time); both CPUs share the same bank of the DRAM controller. In my setup
there are 10 L1 MSHRs, so the memory-intensive test can generate up to 10
outstanding requests at a time. Since the CPU is much faster than the DRAM
controller and all the requests target the same bank, I expect the DRAM
queue to hold 10 entries whenever a request from the pointer-chasing test
arrives. If that were the case, I would see interference in the model
comparable to what I see on real platforms.

Don't you think the DRAM queue would fill up to the number of L1 MSHRs in
the above scenario? And what would it take to fill the DRAM queue up to the
number of L1 MSHRs?

Thanks,
Prathap Kumar Valsan
Research Assistant
University of Kansas

On Tue, Oct 14, 2014 at 2:30 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  The O3 CPU only works with the memory system in timing mode, so I do not
 understand what two points you are comparing when you say the results are
 exactly the same.

  The read queue is likely to never fill up unless all these transactions
 are generated at once. While the first one is being served by the memory
 controller you may have more coming in etc, but I do not understand why you
 think it would ever fill up.

  For “debugging” make sure that the config.ini actually captures what you
 think you are simulating. Also, you have a lot of DRAM-related stats in the
 stats.txt output.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Tuesday, 14 October 2014 04:33

 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

Hi Andreas, users

  I ran the test with ARM O3 cpu(--cpu-type=detailed) , mem_mode=timing,
 the results are exactly the same compared to mem_mode=atomic.
  I have partitioned the DRAM banks using software. Both the benchmarks-
 latency-sensitive and bandwidth -sensitive (both generates only reads)
 running in parallel using the same DRAM bank.
 From status file, i observe expected number L2 misses and DRAM requests
 are getting generated.
 In my system, the number of L1 MSHRs are 10 and number of L2 MSHR's are
 32. So i expect that when a request from a latency-sensitive benchmark
 comes to DRAM, the readQ size has to be 10. However what i am observing is
 most of the time the Queue is not getting filled and hence there is less
 queueing latency and interference.

  I am using classic memory system with default DRAM
 controller,DDR3_1600_x64. Addressing map is RoRaBaChCo, page
 policy-open_adaptive, and frfcfs scheduler.

  Do you have any thoughts on this? How could i debug this further?

  Appreciate your help.

  Thanks,
  Prathap Kumar Valsan
  Research Assistant
  University of Kansas

 On Mon, Oct 13, 2014 at 4:21 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  Indeed. The atomic mode is for fast-forwarding only. Once you actually
 want to get some representative performance numbers you have to run in
 timing mode with either the O3 or Minor CPU model.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Monday, 13 October 2014 10:19

 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

  Thanks for your reply. The memory mode which I used is atomic. I think,
 I need to run the tests in timing More. I believe which shows up
 interference and queueing delay similar to real platforms.

 Prathap
 On Oct 13, 2014 2:55 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  I don’t dare say exactly what is going wrong in your setup, but I am
 confident that Ruby will not magically make things more representative (it
 will likely give you a whole lot more problems though). In the end it is
 all about configuring the building blocks to match the system you want to
 capture. The crossbars and caches in the classic memory system do make some
 simplifications, but I have not yet seen a case when they are not
 sufficiently accurate.

  Have you looked at the various policy settings in the DRAM controller,
 e.g. the page policy and address mapping? If you’re trying to correlate
 with a real platform, also see Anthony’s ISPASS paper from last year for
 some sensible steps in simplifying the problem and dividing it into
 manageable chunks.

[gem5-users] Unit of avg_miss_latency

2014-10-14 Thread Prathap Kolakkampadath via gem5-users
Hi Users,

Below is the avg miss latency for l2 captured from stats.txt. What is the
unit of this? Does this mean 230ns?


system.l2.ReadReq_avg_miss_latency::cpu0.data
230466.136072   # average ReadReq miss latency

Thanks,
Prathap
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Unit of avg_miss_latency

2014-10-14 Thread Prathap Kolakkampadath via gem5-users
Thanks Amin
On Oct 14, 2014 8:27 PM, Amin Farmahini amin...@gmail.com wrote:

 Picoseconds. Each tick is a picosecond in gem5.

 Amin
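
 So the stat above converts as 230466.136072 ticks x 1 ps/tick =
 230466.14 ps, roughly 230.5 ns, so yes, about 230 ns.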

 On Tue, Oct 14, 2014 at 7:53 PM, Prathap Kolakkampadath via gem5-users 
 gem5-users@gem5.org wrote:

 Hi Users,

 Below is the avg miss latency for l2 captured from stats.txt. What is the
 unit of this? Does this mean 230ns?


 system.l2.ReadReq_avg_miss_latency::cpu0.data
 230466.136072   # average ReadReq miss latency

 Thanks,
 Prathap

 ___
 gem5-users mailing list
 gem5-users@gem5.org
 http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users



___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Questions on DRAM Controller model

2014-10-13 Thread Prathap Kolakkampadath via gem5-users
Thanks for your reply. The memory mode I used is atomic. I think I need to
run the tests in timing mode, which I believe will show interference and
queueing delay similar to real platforms.

Prathap
On Oct 13, 2014 2:55 AM, Andreas Hansson andreas.hans...@arm.com wrote:

  Hi Prathap,

  I don’t dare say exactly what is going wrong in your setup, but I am
 confident that Ruby will not magically make things more representative (it
 will likely give you a whole lot more problems though). In the end it is
 all about configuring the building blocks to match the system you want to
 capture. The crossbars and caches in the classic memory system do make some
 simplifications, but I have not yet seen a case when they are not
 sufficiently accurate.

  Have you looked at the various policy settings in the DRAM controller,
 e.g. the page policy and address mapping? If you’re trying to correlate
 with a real platform, also see Anthony’s ISPASS paper from last year for
 some sensible steps in simplifying the problem and dividing it into
 manageable chunks.

  Good luck.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Monday, 13 October 2014 00:29
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

   Hello Andreas/Users,

 I used to create a checkpoint until linux boot using Atomic Simple CPU and
 then restore from this checkpoint to detailed O3 cpu before running the
 test. I notice that the mem-mode is  set to atomic and not timing. Will
 that be the reason for less contention in memory bus i am observing?

  Thanks,
  Prathap

 On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath 
 kvprat...@gmail.com wrote:

  Hello Andreas,

  Even after configuring the model like the actual hardware, i still not
 seeing enough interference to the read request under consideration. I am
 using the classic memory system model. Since it uses atomic and functional
 Packet allocation protocol, I would like to switch to Ruby( I think it
 more resembles with real platform).


  I am hitting in to below problem when i use ruby.

 /build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches
 --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4
 --mem-size=512MB
 --kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
 --disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
 --machine-type=VExpress_EMM
 --dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
 --cpu-type=detailed --ruby --mem-type=ddr3_1600_x64

 Traceback (most recent call last):
   File "<string>", line 1, in <module>
   File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in main
     exec filecode in scope
   File "configs/example/fs.py", line 302, in <module>
     test_sys = build_test_system(np)
   File "configs/example/fs.py", line 138, in build_test_system
     Ruby.create_system(options, test_sys, test_sys.iobus, test_sys._dma_ports)
   File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line 825, in __getattr__
     raise AttributeError, err_string
 AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
   (C++ object is not yet constructed, so wrapped C++ methods are unavailable.)

  What could be the cause of this?

  Thanks,
 Prathap



 On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  There are many possible reasons for the discrepancy, and obviously
 there are many ways of building a memory controller :-). Have you
 configured the model to look like the actual hardware? The most obvious
 differences would be in terms of buffer sizes, the page policy, arbitration
 policy, the threshold before closing a page, the read/write switching,
 actual timings etc. It is also worth checking if the controller hardware
 treats writes the same way the model does (early responses, minimise
 switching).

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Tuesday, 9 September 2014 18:56
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

  Hello Andreas,

  Thanks for your reply. I read your ISPASS paper and got a fair
 understanding about the architecture.
 I am trying to reproduce, in the simulator environment, the results
 collected from running synthetic benchmarks (latency and bandwidth) on
 real hardware. However, I see variations in the results and am trying to
 understand the reasons.

  The experiment has latency (memory non-intensive, with random access) as
 the primary task and bandwidth (memory intensive, with sequential access)
 as the co-runner task.


  On real hardware
 case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
 case 2 - 1 corunner : latency of the test is 225.95ns

Re: [gem5-users] Questions on DRAM Controller model

2014-10-13 Thread Prathap Kolakkampadath via gem5-users
Hi Andreas, users

I ran the test with the ARM O3 CPU (--cpu-type=detailed) and
mem_mode=timing; the results are exactly the same as with mem_mode=atomic.
I have partitioned the DRAM banks in software. Both benchmarks,
latency-sensitive and bandwidth-sensitive (both generate only reads), run
in parallel on the same DRAM bank.
From the stats file, I observe that the expected numbers of L2 misses and
DRAM requests are generated.
In my system the number of L1 MSHRs is 10 and the number of L2 MSHRs is
32, so I expect that when a request from the latency-sensitive benchmark
arrives at the DRAM, the read queue size will be 10. However, what I
observe is that most of the time the queue does not fill up, and hence
there is little queueing latency and interference.

I am using the classic memory system with the default DRAM controller,
DDR3_1600_x64. The address mapping is RoRaBaChCo, the page policy is
open_adaptive, and the scheduler is frfcfs.
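
For reference, these are the knobs as they appear in the config; a minimal
sketch (hedged: parameter names and enum values from src/mem/DRAMCtrl.py of
this era):

from m5.objects import DDR3_1600_x64

mem_ctrl = DDR3_1600_x64()
mem_ctrl.addr_mapping = 'RoRaBaChCo'    # row / rank / bank / channel / column
mem_ctrl.page_policy = 'open_adaptive'  # keep rows open, close adaptively
mem_ctrl.mem_sched_policy = 'frfcfs'    # first-ready FCFS scheduling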

Do you have any thoughts on this? How could I debug this further?

Appreciate your help.

Thanks,
Prathap Kumar Valsan
Research Assistant
University of Kansas

On Mon, Oct 13, 2014 at 4:21 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  Indeed. The atomic mode is for fast-forwarding only. Once you actually
 want to get some representative performance numbers you have to run in
 timing mode with either the O3 or Minor CPU model.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Monday, 13 October 2014 10:19

 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

  Thanks for your reply. The memory mode which I used is atomic. I think,
 I need to run the tests in timing More. I believe which shows up
 interference and queueing delay similar to real platforms.

 Prathap
 On Oct 13, 2014 2:55 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  I don’t dare say exactly what is going wrong in your setup, but I am
 confident that Ruby will not magically make things more representative (it
 will likely give you a whole lot more problems though). In the end it is
 all about configuring the building blocks to match the system you want to
 capture. The crossbars and caches in the classic memory system do make some
 simplifications, but I have not yet seen a case when they are not
 sufficiently accurate.

  Have you looked at the various policy settings in the DRAM controller,
 e.g. the page policy and address mapping? If you’re trying to correlate
 with a real platform, also see Anthony’s ISPASS paper from last year for
 some sensible steps in simplifying the problem and dividing it into
 manageable chunks.

  Good luck.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Monday, 13 October 2014 00:29
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

   Hello Andreas/Users,

 I used to create a checkpoint until linux boot using Atomic Simple CPU
 and then restore from this checkpoint to detailed O3 cpu before running the
 test. I notice that the mem-mode is  set to atomic and not timing. Will
 that be the reason for less contention in memory bus i am observing?

  Thanks,
  Prathap

 On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath 
 kvprat...@gmail.com wrote:

  Hello Andreas,

  Even after configuring the model like the actual hardware, i still not
 seeing enough interference to the read request under consideration. I am
 using the classic memory system model. Since it uses atomic and functional
 Packet allocation protocol, I would like to switch to Ruby( I think it
 more resembles with real platform).


  I am hitting in to below problem when i use ruby.

 /build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py
 --caches --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB
 --num-cpus=4 --mem-size=512MB
 --kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
 --disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
 --machine-type=VExpress_EMM
 --dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
 --cpu-type=detailed --ruby --mem-type=ddr3_1600_x64

  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in main
      exec filecode in scope
    File "configs/example/fs.py", line 302, in <module>
      test_sys = build_test_system(np)
    File "configs/example/fs.py", line 138, in build_test_system
      Ruby.create_system(options, test_sys, test_sys.iobus, test_sys._dma_ports)
    File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line 825, in __getattr__
      raise AttributeError, err_string
  AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
    (C++ object is not yet constructed, so wrapped C++ methods are unavailable.)

Re: [gem5-users] Questions on DRAM Controller model

2014-10-12 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

Even after configuring the model like the actual hardware, I am still not
seeing enough interference on the read request under consideration. I am
using the classic memory system model. Since it uses the atomic and
functional packet allocation protocol, I would like to switch to Ruby (I
think it resembles a real platform more closely).


I am hitting the below problem when I use Ruby.

/build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches
--l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4
--mem-size=512MB
--kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
--disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
--machine-type=VExpress_EMM
--dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
--cpu-type=detailed --ruby --mem-type=ddr3_1600_x64

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in
main
    exec filecode in scope
  File "configs/example/fs.py", line 302, in <module>
    test_sys = build_test_system(np)
  File "configs/example/fs.py", line 138, in build_test_system
    Ruby.create_system(options, test_sys, test_sys.iobus,
test_sys._dma_ports)
  File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line 825,
in __getattr__
    raise AttributeError, err_string
AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
  (C++ object is not yet constructed, so wrapped C++ methods are
unavailable.)

What could be the cause of this?

Thanks,
Prathap



On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  There are many possible reasons for the discrepancy, and obviously there
 are many ways of building a memory controller :-). Have you configured the
 model to look like the actual hardware? The most obvious differences would
 be in terms of buffer sizes, the page policy, arbitration policy, the
 threshold before closing a page, the read/write switching, actual timings
 etc. It is also worth checking if the controller hardware treats writes the
 same way the model does (early responses, minimise switching).

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Tuesday, 9 September 2014 18:56
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

  Hello Andreas,

  Thanks for your reply. I read your ISPASS paper and got a fair
 understanding of the architecture. I am trying to reproduce, in the
 simulator, results collected from running synthetic benchmarks (latency
 and bandwidth) on real hardware. However, I see variations in the results
 and am trying to understand the reasons.

  The experiment has latency (memory non-intensive, random access) as the
 primary task and bandwidth (memory-intensive, sequential access) as the
 co-runner task.


  On real hardware
 case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
 case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s

  On simulator
  case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
 case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s


  In case 1, where the latency test runs alone (0 co-runners), the results
 match in both environments. However, in case 2, when run with the
 bandwidth test (1 co-runner), the results differ a lot.
 Do you have any thoughts on this?
 Thanks,
 Prathap

 On Mon, Sep 8, 2014 at 1:46 PM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  Have you read our ISPASS paper from last year? It’s referenced in the
 header file, as well as on gem5.org.

1. Yes and no. Two different buffers are used in the model, but they
are random access, so you can treat the entries any way you want.
2. Yes and no. It’s a C++ model, so the scheduler executes in 0 time.
Thus, when looking at the various requests it effectively sees all the
banks.
3. Yes and no. See above.

 Remember that this is a model. The goal is not to be representative down
 to every last element of an RTL design. The goal is to be representative of
 a real design, and then be fast. Both of these goals are delivered upon by
 the model.

  I hope that explains it. If there is anything in the results you do not
 agree with, please do say so.

  Thanks,

  Andreas

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Monday, 8 September 2014 18:38
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Questions on DRAM Controller model

  Hello Everybody,

 I am using DDR3_1600_x64. I am trying to understand the memory controller
 design and have a few questions about it.

 1) Does the memory controller have a separate bank request buffer (read
 and write buffers) for each bank, or just a global queue?

Re: [gem5-users] Questions on DRAM Controller model

2014-10-12 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas/Users,

I create a checkpoint at Linux boot using the Atomic Simple CPU and then
restore from this checkpoint to the detailed O3 CPU before running the
test. I notice that the mem-mode is set to atomic and not timing. Could
that be the reason for the lower memory-bus contention I am observing?

Thanks,
Prathap

On Sun, Oct 12, 2014 at 4:56 PM, Prathap Kolakkampadath kvprat...@gmail.com
 wrote:

 Hello Andreas,

 Even after configuring the model like the actual hardware, I am still not
 seeing enough interference on the read request under consideration. I am
 using the classic memory system model. Since it uses the atomic and
 functional packet allocation protocol, I would like to switch to Ruby (I
 think it resembles a real platform more closely).


 I am hitting the below problem when I use Ruby.

 /build/ARM/gem5.opt --stats-file=cr1A1.txt configs/example/fs.py --caches
 --l2cache --l1d_size=32kB --l1i_size=32kB --l2_size=1MB --num-cpus=4
 --mem-size=512MB
 --kernel=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/vmlinux
 --disk-image=/home/prathap/WorkSpace/gem5/fullsystem/disks/arm-ubuntu-natty-headless.img
 --machine-type=VExpress_EMM
 --dtb-file=/home/prathap/WorkSpace/linux-linaro-tracking-gem5/arch/arm/boot/dts/vexpress-v2p-ca15-tc1-gem5_4cpus.dtb
 --cpu-type=detailed --ruby --mem-type=ddr3_1600_x64

 Traceback (most recent call last):
   File "<string>", line 1, in <module>
   File "/home/prathap/WorkSpace/gem5/src/python/m5/main.py", line 388, in
 main
     exec filecode in scope
   File "configs/example/fs.py", line 302, in <module>
     test_sys = build_test_system(np)
   File "configs/example/fs.py", line 138, in build_test_system
     Ruby.create_system(options, test_sys, test_sys.iobus,
 test_sys._dma_ports)
   File "/home/prathap/WorkSpace/gem5/src/python/m5/SimObject.py", line
 825, in __getattr__
     raise AttributeError, err_string
 AttributeError: object 'LinuxArmSystem' has no attribute '_dma_ports'
   (C++ object is not yet constructed, so wrapped C++ methods are
 unavailable.)

 What could be the cause of this?

 Thanks,
 Prathap



 On Tue, Sep 9, 2014 at 1:35 PM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  There are many possible reasons for the discrepancy, and obviously
 there are many ways of building a memory controller :-). Have you
 configured the model to look like the actual hardware? The most obvious
 differences would be in terms of buffer sizes, the page policy, arbitration
 policy, the threshold before closing a page, the read/write switching,
 actual timings etc. It is also worth checking if the controller hardware
 treats writes the same way the model does (early responses, minimise
 switching).

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Tuesday, 9 September 2014 18:56
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] Questions on DRAM Controller model

  Hello Andreas,

  Thanks for your reply. I read your ISPASS paper and got a fair
 understanding of the architecture. I am trying to reproduce, in the
 simulator, results collected from running synthetic benchmarks (latency
 and bandwidth) on real hardware. However, I see variations in the results
 and am trying to understand the reasons.

  The experiment has latency (memory non-intensive, random access) as the
 primary task and bandwidth (memory-intensive, sequential access) as the
 co-runner task.


  On real hardware
 case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
 case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s

  On simulator
  case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
 case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s


  In case 1, where the latency test runs alone (0 co-runners), the results
 match in both environments. However, in case 2, when run with the
 bandwidth test (1 co-runner), the results differ a lot.
 Do you have any thoughts on this?
 Thanks,
 Prathap

 On Mon, Sep 8, 2014 at 1:46 PM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  Have you read our ISPASS paper from last year? It’s referenced in the
 header file, as well as on gem5.org.

1. Yes and no. Two different buffers are used in the model, but they
are random access, so you can treat the entries any way you want.
2. Yes and no. It’s a C++ model, so the scheduler executes in 0
time. Thus, when looking at the various requests it effectively sees all
the banks.
3. Yes and no. See above.

 Remember that this is a model. The goal is not to be representative down
 to every last element of an RTL design. The goal is to be representative of
 a real design, and then be fast. Both of these goals are delivered upon by
 the model.

  I hope that explains it. If there is anything in the results you do
 not agree with, please do say so.

  Thanks,

  Andreas


Re: [gem5-users] Tracking DRAM read/write requests

2014-10-04 Thread Prathap Kolakkampadath via gem5-users
Hi Users,

Thanks Amin.

You said it right: the miss rate of my benchmark was low. I have modified
the benchmark so that every read request is a cache miss. After this I
could see improvements, and the DRAM queue is getting filled. However, when
I print the queue size from reorderQueue(), I observe that the queue is
refilled only after the current queue size decrements to zero. For example,
initially the queue size is 9 (L1 MSHR = 10); it decrements to 8, 7, ... 2,
and only becomes 9 again after the queue is almost empty. Is this due to
the latency between when data is read from DRAM and when the respective
MSHR is cleared?


Regards,
Prathap
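
For reference, the MSHR counts discussed above are ordinary cache
parameters in the config scripts. A minimal sketch in the style of
configs/common/Caches.py; the latency and associativity values are
illustrative, and parameter names vary across gem5 versions:

    from m5.objects import BaseCache

    class L1DCache(BaseCache):
        size = '32kB'
        assoc = 2              # illustrative
        hit_latency = 2        # illustrative
        response_latency = 2   # illustrative
        mshrs = 10             # up to 10 outstanding misses per core
        tgts_per_mshr = 12     # illustrative

    class L2Cache(BaseCache):
        size = '1MB'
        assoc = 8              # illustrative
        hit_latency = 20       # illustrative
        response_latency = 20  # illustrative
        mshrs = 30             # the shared L2 tracks up to 30 misses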



On Fri, Oct 3, 2014 at 3:58 PM, Prathap Kolakkampadath kvprat...@gmail.com
wrote:

 Hi Users,

 I am using a 4-CPU O3 ARMv7 system with DDR3_1600_x64. The L1 I/D cache
 size is 32kB and the L2 cache size is 1MB, with 10 L1 MSHRs and 30 L2
 MSHRs. According to my understanding, this should enable each core to
 generate 10 outstanding memory requests.
 I am running a bandwidth test on all CPUs, which is memory-intensive and
 generates back-to-back read requests to DRAM.

 However, when I captured the DRAM debug messages, I could see that the
 DRAM read queue size varies only between 0-2 (I expected the queue to
 fill) and reads are scheduled immediately, whereas the write queue size
 varies and goes above 20.
 Any guess on what's going wrong?
 I can use a CommMonitor to track incoming requests to DRAM, but how can I
 track reads/writes to DRAM?

 Thanks,
 Prathap


[gem5-users] Tracking DRAM read/write requests

2014-10-03 Thread Prathap Kolakkampadath via gem5-users
Hi Users,

I am using a 4-CPU O3 ARMv7 system with DDR3_1600_x64. The L1 I/D cache
size is 32kB and the L2 cache size is 1MB, with 10 L1 MSHRs and 30 L2
MSHRs. According to my understanding, this should enable each core to
generate 10 outstanding memory requests.
I am running a bandwidth test on all CPUs, which is memory-intensive and
generates back-to-back read requests to DRAM.

However, when I captured the DRAM debug messages, I could see that the
DRAM read queue size varies only between 0-2 (I expected the queue to
fill) and reads are scheduled immediately, whereas the write queue size
varies and goes above 20.
Any guess on what's going wrong?
I can use a CommMonitor to track incoming requests to DRAM, but how can I
track reads/writes to DRAM?

Thanks,
Prathap
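
One way to get that visibility, as a sketch: a CommMonitor can be spliced
between the memory bus and the controller in the config script, and the
controller's per-request activity can be printed with --debug-flags=DRAM.
The port and attribute names below follow the classic-memory-system
scripts of this era and may need adjusting for other gem5 versions:

    from m5.objects import CommMonitor

    # Splice a monitor between the memory bus and the DRAM controller so
    # that every request and response to memory passes through it.
    system.monitor = CommMonitor()
    system.monitor.slave = system.membus.master       # traffic from the bus
    system.mem_ctrls[0].port = system.monitor.master  # on to the controller

Running gem5 with --debug-flags=DRAM then shows each read and write as the
controller enqueues and services it.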

[gem5-users] Query regarding DRAM controller's FR-FCFS scheduler implementation.

2014-10-01 Thread Prathap Kolakkampadath via gem5-users
Hi Users,


I am going through the FR-FCFS implementation of the gem5 DRAM controller.

When the queue.size() is greater than 1 and memSchedPolicy ==
Enums::frfcfs, the chooseNext function calls reorderQueue.

The reorderQueue function searches the queue for row hits first, and if
there is a row hit it selects that request as the next request to be
processed.

My question is: what if there are multiple row hits? Is it not supposed to
choose the first-come request among the multiple row hits? I think the
current implementation doesn't arbitrate among multiple row hits. Is this
correct, or am I missing something?


Thanks,
Prathap

Re: [gem5-users] Query regarding DRAM controller's FR-FCFS scheduler implementation.

2014-10-01 Thread Prathap Kolakkampadath via gem5-users
On Wed, Oct 1, 2014 at 1:59 PM, Prathap Kolakkampadath kvprat...@gmail.com
wrote:

 Hi Users,


 I am going through the FR-FCFS implementation of the gem5 DRAM controller.

 When the queue.size() is greater than 1 and memSchedPolicy ==
 Enums::frfcfs, the chooseNext function calls reorderQueue.

 The reorderQueue function searches the queue for row hits first, and if
 there is a row hit it selects that request as the next request to be
 processed.

 My question is: what if there are multiple row hits? Is it not supposed to
 choose the first-come request among the multiple row hits? I think the
 current implementation doesn't arbitrate among multiple row hits. Is this
 correct, or am I missing something?


 Thanks,
 Prathap


Re: [gem5-users] Query regarding DRAM controller's FR-FCFS scheduler implementation.

2014-10-01 Thread Prathap Kolakkampadath via gem5-users
Yes. I got that.
Thanks Amin.

On Wed, Oct 1, 2014 at 2:56 PM, Amin Farmahini amin...@gmail.com wrote:

 Prathap,

 As far as I remember, it chooses the first request (the oldest one) among
 hits. It starts from the head of the queue, and once there is a hit, you
 have got the first-come request among multiple row hits.

 Thanks,
 Amin
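
In other words, the scan order itself provides the arbitration. A minimal
Python sketch of the selection logic (the real implementation is C++ inside
the DRAM controller source; the packet fields here are hypothetical
stand-ins for the model's bank state):

    from collections import namedtuple

    # Hypothetical stand-in for a queued DRAM packet: 'row' is the target
    # row, 'open_row' the row currently open in the target bank.
    DramPkt = namedtuple('DramPkt', ['addr', 'row', 'open_row'])

    def is_row_hit(pkt):
        return pkt.row == pkt.open_row

    def choose_next(queue):
        # FR-FCFS: the queue is ordered oldest -> newest, so the first
        # row hit found while scanning from the head is also the oldest.
        for pkt in queue:
            if is_row_hit(pkt):
                return pkt
        # No row hit anywhere: plain first-come, first-served.
        return queue[0]

    # Two row hits queued; the older one (addr 0x100) is chosen.
    queue = [DramPkt(0x040, row=1, open_row=2),
             DramPkt(0x100, row=3, open_row=3),
             DramPkt(0x200, row=3, open_row=3)]
    assert choose_next(queue).addr == 0x100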

 On Wed, Oct 1, 2014 at 1:59 PM, Prathap Kolakkampadath via gem5-users 
 gem5-users@gem5.org wrote:

 Hi Users,


 I am going through the FR-FCFS implementation of the gem5 DRAM controller.

 When the queue.size() is greater than 1 and memSchedPolicy ==
 Enums::frfcfs, the chooseNext function calls reorderQueue.

 The reorderQueue function searches the queue for row hits first, and if
 there is a row hit it selects that request as the next request to be
 processed.

 My question is: what if there are multiple row hits? Is it not supposed
 to choose the first-come request among the multiple row hits? I think the
 current implementation doesn't arbitrate among multiple row hits. Is this
 correct, or am I missing something?


 Thanks,
 Prathap


[gem5-users] Switching CPU type from a checkpoint fails when using memory type dramsim2

2014-09-09 Thread Prathap Kolakkampadath via gem5-users
Hello Everybody,

I have created a checkpoint with cpu type 'atomic' and mem type 'dramsim2'.
While switching to cpu type 'detailed' from this checkpoint, the simulation
fails with the error below.

Switch at curTick count:1
info: Entering event queue @ 3534903961500.  Starting simulation...
writing vis file to
ext/dramsim2/DRAMSim2//results//DDR3_micron_32M_8B_x8_sg15/2GB.1Ch.1R.scheme2.open_page.32TQ.32CQ.RtB.pRank.vis
Switched CPUS @ tick 3534903971500
switching cpus
 REAL SIMULATION 
info: Entering event queue @ 3534903971500.  Starting simulation...
gem5.opt: build/ARM/mem/dramsim2.cc:293: void
DRAMSim2::readComplete(unsigned int, uint64_t, uint64_t): Assertion `cycle
== divCeil(curTick() - startTick, wrapper.clockPeriod() *
SimClock::Int::ns)' failed.
Program aborted at tick 3535124958500
Aborted (core dumped)

Does anyone know what went wrong?

Thanks,
Prathap

Re: [gem5-users] Questions on DRAM Controller model

2014-09-09 Thread Prathap Kolakkampadath via gem5-users
Hello Andreas,

Thanks for your reply. I read your ISPASS paper and got a fair
understanding of the architecture. I am trying to reproduce, in the
simulator, results collected from running synthetic benchmarks (latency
and bandwidth) on real hardware. However, I see variations in the results
and am trying to understand the reasons.

The experiment has latency (memory non-intensive, random access) as the
primary task and bandwidth (memory-intensive, sequential access) as the
co-runner task.


On real hardware
case 1 - 0 corunner : latency of the test is 74.88ns and b/w 854.74MB/s
case 2 - 1 corunner : latency of the test is 225.95ns and b/w 283.24MB/s

On simulator
case 1 - 0 corunner : latency of the test is 76.08ns and b/w 802.25MB/s
case 2 - 1 corunner : latency of the test is 93.69ns and b/w 651.57MB/s


In case 1, where the latency test runs alone (0 co-runners), the results
match in both environments. However, in case 2, when run with the
bandwidth test (1 co-runner), the results differ a lot.
Do you have any thoughts on this?
Thanks,
Prathap

On Mon, Sep 8, 2014 at 1:46 PM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  Have you read our ISPASS paper from last year? It’s referenced in the
 header file, as well as on gem5.org.

1. Yes and no. Two different buffers are used in the model, but they
are random access, so you can treat the entries any way you want.
2. Yes and no. It’s a C++ model, so the scheduler executes in 0 time.
Thus, when looking at the various requests it effectively sees all the
banks.
3. Yes and no. See above.
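
To make point 1 concrete: the two buffers are the controller's global read
and write queues, not per-bank structures, and their depths are plain
parameters of the controller. A sketch; the names follow the controller's
Python config, and the defaults vary by gem5 version:

    from m5.objects import DDR3_1600_x64

    mem_ctrl = DDR3_1600_x64()
    mem_ctrl.read_buffer_size = 32    # entries in the global read queue
    mem_ctrl.write_buffer_size = 64   # entries in the global write queue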

 Remember that this is a model. The goal is not to be representative down
 to every last element of an RTL design. The goal is to be representative of
 a real design, and then be fast. Both of these goals are delivered upon by
 the model.

  I hope that explains it. If there is anything in the results you do not
 agree with, please do say so.

  Thanks,

  Andreas

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Monday, 8 September 2014 18:38
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] Questions on DRAM Controller model

  Hello Everybody,

 I am using DDR3_1600_x64. I am trying to understand the memory controller
 design and have a few questions about it.

 1) Does the memory controller have a separate bank request buffer (read
 and write buffers) for each bank, or just a global queue?
 2) Is there a per-bank scheduler that arbitrates between queued requests
 in parallel with the other bank schedulers?
 3) Is there a DRAM bus scheduler that arbitrates between requests from
 different banks?

 Thanks,
 Prathap


[gem5-users] Questions on DRAM Controller model

2014-09-08 Thread Prathap Kolakkampadath via gem5-users
Hello Everybody,

I am using DDR3_1600_x64. I am trying to understand the memory controller
design and have a few questions about it.

1) Does the memory controller have a separate bank request buffer (read
and write buffers) for each bank, or just a global queue?
2) Is there a per-bank scheduler that arbitrates between queued requests
in parallel with the other bank schedulers?
3) Is there a DRAM bus scheduler that arbitrates between requests from
different banks?

Thanks,
Prathap

Re: [gem5-users] Switching from Atomic CPU to Detailed CPU after Linux booted up

2014-09-03 Thread Prathap Kolakkampadath via gem5-users
Thanks Mitch. It worked.
Is it possible to verify that the system is running with the switched CPU
after the restore?


On Tue, Sep 2, 2014 at 3:33 PM, Mitch Hayenga mitch.hayenga+g...@gmail.com
wrote:

 Yes, you can. Generally the preferred way to run is to boot/start a
 benchmark with the atomic CPU and then drop a checkpoint. You can then
 restore from the checkpoint with the detailed CPU.

 Simple use case:
 1) Specify the gem5 command with --checkpoint-at-end and the atomic CPU.
 2) Once the benchmark/boot gets to the place you desire, kill the
 simulation (Ctrl-C); this creates a checkpoint.
 3) Restore from the checkpoint with the detailed CPU (specify the desired
 CPU model and also -r1 to restore from the checkpoint). See the sketch
 below.
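
 As a sketch with fs.py, using the stock option names and hypothetical
 kernel/disk paths:

    # 1) Boot and run with the atomic CPU; killing the run (Ctrl-C) with
    # --checkpoint-at-end drops a checkpoint:
    ./build/ARM/gem5.opt configs/example/fs.py --cpu-type=atomic \
        --checkpoint-at-end --kernel=/path/to/vmlinux \
        --disk-image=/path/to/disk.img

    # 2) Restore from checkpoint 1 with the detailed (O3) CPU and caches:
    ./build/ARM/gem5.opt configs/example/fs.py --cpu-type=detailed \
        --caches --l2cache -r 1 --kernel=/path/to/vmlinux \
        --disk-image=/path/to/disk.img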




 On Tue, Sep 2, 2014 at 3:20 PM, Prathap Kolakkampadath via gem5-users 
 gem5-users@gem5.org wrote:

 Hi Users,

 I am trying to run some benchmarks on the detailed ARM CPU. However, the
 simulation takes a very long time for Linux to boot up and gets stuck
 after freeing init memory without mounting the filesystem. With the
 atomic CPU, the kernel boots to the console quite quickly. I would like
 to know if I can use the atomic simple CPU during the Linux boot phase
 and then switch to the detailed CPU before I start running benchmarks.

 Thanks,
 Prathap


[gem5-users] Switching from Atomic CPU to Detailed CPU after Linux booted up

2014-09-02 Thread Prathap Kolakkampadath via gem5-users
Hi Users,

I am trying to run some benchmarks on the detailed ARM CPU. However, the
simulation takes a very long time for Linux to boot up and gets stuck
after freeing init memory without mounting the filesystem. With the
atomic CPU, the kernel boots to the console quite quickly. I would like
to know if I can use the atomic simple CPU during the Linux boot phase
and then switch to the detailed CPU before I start running benchmarks.

Thanks,
Prathap

Re: [gem5-users] How to add shared nonblocking L3 cache in gem5?

2014-08-28 Thread Prathap Kolakkampadath via gem5-users
Thanks Andreas. I have one more question regarding caches. Is it possible
to create a system with multiple L2 caches, each private to a specific
core?


On Wed, Aug 27, 2014 at 2:51 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  You can easily create a subclass of the BaseCache and give it suitable
 parameters for an L3. This should be fairly straightforward and also easy
 to instantiate in the Python scripts (e.g. fs.py).

  Andreas
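
 A minimal sketch of such a subclass, in the style of
 configs/common/Caches.py; every value below is illustrative, and
 parameter names differ between gem5 versions:

    from m5.objects import BaseCache

    class L3Cache(BaseCache):
        size = '4MB'            # illustrative
        assoc = 16              # illustrative
        hit_latency = 40        # illustrative, in cycles
        response_latency = 40   # illustrative
        mshrs = 32              # nonblocking: misses are tracked in MSHRs
        tgts_per_mshr = 12      # illustrative
        write_buffers = 16      # illustrative

 It can then be instantiated in fs.py between the L2 and the memory bus,
 in the same way CacheConfig.py wires up the L2.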

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Wednesday, 27 August 2014 05:25
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] How to add shared nonblocking L3 cache in gem5?

  Hi Users,


  I am new to gem5 and I want to add a nonblocking shared last-level
 cache (L3). I can see L3 cache options in Options.py with default values
 set. However, there is no entry for L3 in Caches.py and CacheConfig.py.

  So would extending Caches.py and CacheConfig.py be enough to create an
 L3 cache?


  Thanks,
 Prathap



Re: [gem5-users] How to add shared nonblocking L3 cache in gem5?

2014-08-28 Thread Prathap Kolakkampadath via gem5-users
In that case, would the MSHRs be shared between the two L2 caches, or can
each L2 cache have its own MSHRs?

Thanks.


On Thu, Aug 28, 2014 at 11:52 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi Prathap,

  Definitely. The gem5 memory system let’s you build any tree-topology you
 like, even unbalanced (L2 for one core, and no L2 for another etc, 2 core
 for one L2 and a single core for the next). Just instantiate an L2 per
 core, connect it with a CoherentBus to the L1s of that core, and then use a
 CoherentBus on the memory-side of the L2 to “merge” the tree into the L3
 (or use split L3’s as well).

  If you’ve got pydot installed, gem5 generates a PDF/SVG showing the
 system layout, so you can visually ensure you’ve accomplished what you
 intended.

  Andreas

   From: Prathap Kolakkampadath kvprat...@gmail.com
 Date: Thursday, 28 August 2014 17:47
 To: Andreas Hansson andreas.hans...@arm.com
 Cc: gem5 users mailing list gem5-users@gem5.org
 Subject: Re: [gem5-users] How to add shared nonblocking L3 cache in gem5?

   Thanks Andreas. I have one more question regarding caches. Is it
 possible to create a system with multiple L2 caches, each private to a
 specific core?


 On Wed, Aug 27, 2014 at 2:51 AM, Andreas Hansson andreas.hans...@arm.com
 wrote:

  Hi Prathap,

  You can easily create a subclass of the BaseCache and give it suitable
 parameters for an L3. This should be fairly straightforward and also easy
 to instantiate in the Python scripts (e.g. fs.py).

  Andreas

   From: Prathap Kolakkampadath via gem5-users gem5-users@gem5.org
 Reply-To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Wednesday, 27 August 2014 05:25
 To: gem5 users mailing list gem5-users@gem5.org
 Subject: [gem5-users] How to add shared nonblocking L3 cache in gem5?

  Hi Users,


   I am new to gem5 and I want to add a nonblocking shared last-level
 cache (L3). I can see L3 cache options in Options.py with default values
 set. However, there is no entry for L3 in Caches.py and CacheConfig.py.

   So would extending Caches.py and CacheConfig.py be enough to create an
 L3 cache?


  Thanks,
 Prathap



[gem5-users] How to add shared nonblocking L3 cache in gem5?

2014-08-26 Thread Prathap Kolakkampadath via gem5-users
Hi Users,


I am new to gem5 and I want to add a nonblocking shared last-level cache
(L3). I can see L3 cache options in Options.py with default values set.
However, there is no entry for L3 in Caches.py and CacheConfig.py.

So would extending Caches.py and CacheConfig.py be enough to create an L3
cache?


Thanks,
Prathap

Re: [gem5-users] Integrate DRAMSim2 with gem5

2014-08-25 Thread Prathap Kolakkampadath via gem5-users
Thanks Andreas and Debiprasanna for your replies. I wanted to use a
cycle-accurate memory system simulator. I got more details about the
native gem5 DRAM controller from the gem5 publications page. I am doing
some experiments with the LLC, and it's really good to know that the
native DRAM controller matches contemporary controller architectures.


On Mon, Aug 25, 2014 at 4:14 AM, Andreas Hansson andreas.hans...@arm.com
wrote:

  Hi all,

  DRAMSim2 is indeed integrated through a wrapper that has been part of
 gem5 since March. Note that we integrated it primarily as a reference
 implementation, and I would suggest sticking to the native gem5 DRAM
 controller. It is more flexible, more tightly integrated with the gem5
 configuration scripts, and will not slow down your simulations the way
 DRAMSim2 does.

  See the gem5 publications page for more info on the DRAM controller in
 gem5: http://www.gem5.org/Publications

  Andreas

   From: Debiprasanna Sahoo via gem5-users gem5-users@gem5.org
 Reply-To: Debiprasanna Sahoo debiprasanna.sa...@gmail.com, gem5 users
 mailing list gem5-users@gem5.org
 Date: Monday, August 25, 2014 at 5:24 AM
 To: Prathap Kolakkampadath kvprat...@gmail.com, gem5 users mailing list
 gem5-users@gem5.org
 Subject: Re: [gem5-users] Integrate DRAMSim2 with gem5

  Hi Prathap,

  It is already integrated in the development version. You can download it
 from the dev repository. In the next release of gem5 stable, you can find
 the same in the stable repository.

  Thanks,
 Debiprasanna Sahoo
  Research Scholar
 IIT Bhubaneswar


 On Sun, Aug 24, 2014 at 11:16 PM, Prathap Kolakkampadath via gem5-users 
 gem5-users@gem5.org wrote:

 Hi Users,

   Has anyone successfully integrated DRAMSim2 with gem5? If so, please
  point me to the patch and the version of gem5 used.

  Thanks,
 Prathap





[gem5-users] Integrate DRAMSim2 with gem5

2014-08-24 Thread Prathap Kolakkampadath via gem5-users
Hi Users,

Has anyone successfully integrated DRAMSim2 with gem5? If so, please point
me to the patch and the version of gem5 used.

Thanks,
Prathap