Re: [gem5-users] Questions about the detailed and useful comments in the Cache access related code

2018-08-29 Thread Gongjin Sun
Excellent explanations, Nikos! Now I got it. Thank you for your time!  I
believe more users will benefit from your explanations.

Best regards

gjins


On Wed, Aug 29, 2018 at 8:23 AM, Nikos Nikoleris 
wrote:

> Hi Gjins,
>
> On 29/08/2018 10:50, Gongjin Sun wrote:
> > Thank you for clear explanations, Nikos. But I still have several
> > follow-up discussions. Please see them below.
> >
> >
> > On Tue, Aug 28, 2018 at 5:32 AM, Nikos Nikoleris
> > wrote:
> >
> > Hi Gjins,
> >
> > Please see below for my response.
> >
> > On 27/08/2018 07:28, Gongjin Sun wrote:
> > >
> > > 1 BaseCache::access(PacketPtr pkt, CacheBlk *&blk, Cycles &lat,
> > > PacketList &writebacks) (src/mem/cache/base.cc)
> > >
> > > (1) In the segment "if (pkt->isEviction()) { ...}", if I understand it
> > > correctly, this code segment checks whether arriving requests
> > > (Writeback and CleanEvict) have already had their copies (for the same
> > > block address) in the Write Buffer and handles them accordingly.
> > >
> > > But I notice the comments
> > > "// We check for presence of block in above caches before issuing
> > > // Writeback or CleanEvict to write buffer. Therefore the only
> > > ...
> > > ", it is confusing to say here "in above caches". Shouldn't it be "for
> > > presence of block in this Write Buffer"?
> >
> > At this point, a cache above performed an eviction and this cache has
> > received the packet pkt. Before anything else, we search the write
> > buffer of this cache for any packet wbPkt for the same block. If we
> > find a matching wbPkt, then wbPkt has to be a writeback (can't be a
> > CleanEvict).
> >
> > When we add a packet (wbPkt) to the write buffer we check if the block
> > is cached above (see Cache::doWritebacks()). If it is cached above and
> > the packet is a CleanEvict or a WritebackClean then we just squash it
> > and we don't add it to the write buffer.
> >
> > In this case, we just received an eviction from a cache above (pkt),
> > which means that wbPkt can't be a CleanEvict since it would have been
> > squashed.
> >
> > I agree though the comment here is not crystal clear. We should
> > probably update it.
> >
> >
> > Thanks. For example, in the "a Writeback generated in this cache peer
> > cache ...", does "this cache peer cache" mean "this cache" or "this
> > cache's peer cache (in another core)"?
> >
>
> I believe this is a typo. It should be:
> Therefore the only possible cases can be of a CleanEvict or a
> WritebackClean packet coming from above encountering a Writeback
> generated in this cache and waiting in the write buffer.
>
> > In addition, for "Cases of upper level peer caches ... simultaneously",
> > it says two scenarios: 1) upper level peer caches (they should be
> > multiple cores' L1 cache assuming this cache is shared L2) generate
> > CleanEvict and Writeback respectively, 2) upper level peer caches only
> > generate CleanEvict, is my understanding correct?
> >
>
> 1) Could be more than one cache above
> 2) A cache above can generate CleanEvict or WritebackClean, if I am not
> missing something.
>
>
> >
> > >
> > > Also, about the comments
> > > "// Dirty writeback from above trumps our clean writeback... discard
> > > here", why is the locally found writeback clean? I think it could be
> > > clean or dirty. So an arriving dirty writeback sees a local writeback
> > > in the write buffer, and the former could be (but is not necessarily)
> > > newer than the latter. (One such scenario: the cpu core write-hits
> > > block A in the L1 data cache and then writes it back to L2. Then the
> > > core reads it into L1 again. Next, the dirty A is put into the Write
> > > Buffer in L2. After that, the cpu core could "write back A to L2
> > > again" or "write A (a second write) and then write back A to L2
> > > again". The latter makes the arriving dirty A have a different value
> > > from the dirty A in L2's write buffer.)
> > >
> >
> > In your example, I believe that the 2nd ReadEx that hits in L2 and
> > finds the block dirty will clear the dirty bit and respond with the
> > flag cacheResponding, which means that the L1 will fill in and mark the
> > block as dirty. In this particular case, I am not sure the L2 can have
> > the block dirty.

Re: [gem5-users] Questions about the detailed and useful comments in the Cache access related code

2018-08-29 Thread Gongjin Sun
Thank you for clear explanations, Nikos. But I still have several follow-up
discussions. Please see them below.


On Tue, Aug 28, 2018 at 5:32 AM, Nikos Nikoleris 
wrote:

> Hi Gjins,
>
> Please see below for my response.
>
> On 27/08/2018 07:28, Gongjin Sun wrote:
> >
> > 1 BaseCache::access(PacketPtr pkt, CacheBlk *&blk, Cycles &lat,
> > PacketList &writebacks) (src/mem/cache/base.cc)
> >
> > (1) In the segment "if (pkt->isEviction()) { ...}",  if I understand it
> > correctly, this code segment checks whether arriving requests (Writeback
> > and CleanEvict) have already had their copies (for the same block
> > address) in the Write Buffer and handle them accordingly.
> >
> > But I notice the comments
> > "// We check for presence of block in above caches before issuing
> > // Writeback or CleanEvict to write buffer. Therefore the only
> > ...
> > ", it is confusing to say here "in above caches". Shouldn't it be "for
> > presence of block in this Write Buffer"?
>
> At this point, a cache above performed an eviction and this cache has
> received the packet pkt. Before anything else, we search the write
> buffer of this cache for any packet wbPkt for the same block. If we find
> a matching wbPkt, then wbPkt has to be a writeback (can't be a CleanEvict).
>
> When we add a packet (wbPkt) to the write buffer we check if the block
> is cached above (see Cache::doWritebacks()). If it is cached above and
> the packet is a CleanEvict or a WritebackClean then we just squash it
> and we don't add it to the write buffer.
>
> In this case, we just received an eviction from a cache above (pkt),
> which means that wbPkt can't be a CleanEvict since it would have been
> squashed.
>
> I agree though the comment here is not crystal clear. We should probably
> update it.
>
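The squash rule described above can be sketched as a small model. This is an illustrative plain-Python sketch of the decision in Cache::doWritebacks(), not the actual gem5 source; the command names mirror gem5's MemCmd values:

```python
# Hedged model of the squash rule: when a clean eviction (CleanEvict or
# WritebackClean) is about to be queued in this cache's write buffer, it
# is squashed if the block is still cached above. WritebackDirty carries
# the only up-to-date data, so it is never squashed.

CLEAN_EVICTIONS = {"CleanEvict", "WritebackClean"}

def should_squash(cmd, is_cached_above):
    """Return True if the eviction packet is dropped instead of queued."""
    return cmd in CLEAN_EVICTIONS and is_cached_above

# Consequence discussed in the thread: any wbPkt already sitting in the
# write buffer when an eviction arrives from a cache above cannot be a
# CleanEvict, because a CleanEvict would have been squashed while the
# block was still cached above.
assert should_squash("CleanEvict", True)
assert should_squash("WritebackClean", True)
assert not should_squash("WritebackDirty", True)   # dirty data must persist
assert not should_squash("CleanEvict", False)      # nothing above: queue it
```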

Thanks. For example, in the "a Writeback generated in this cache peer cache
...", does "this cache peer cache" mean "this cache" or "this cache's peer
cache (in another core)"?

In addition, for "Cases of upper level peer caches ... simultaneously", it
says two scenarios: 1) upper level peer caches (they should be multiple
cores' L1 cache assuming this cache is shared L2) generate CleanEvict and
Writeback respectively, 2) upper level peer caches only generate
CleanEvict, is my understanding correct?


> >
> > Also, about the comments
> > "// Dirty writeback from above trumps our clean writeback... discard
> > here", why is the local found writeback is clean? I think it could be
> > clean or dirty. So arriving dirty writeback sees local writeback in the
> > write buffer and the former could be (but not necessarily) newer than
> > the latter. (One such scenario is: cpu core write hit block A in L1 data
> > cache and then write it back to L2. Then core read it into L1 again.
> > Next, the dirty A is put into Write Buffer in L2. After that, the cpu
> > core could "write back A to L2 again" or "write A (the second write) and
> > then write back A to L2 again". The latter makes arriving dirty A has
> > different value from the dirty A in L2's write buffer.)
> >
>
> In your example, I believe that the 2nd ReadEx that hits in L2 and finds
> the block dirty will clear the dirty bit and respond with the flag
> cacheResponding which means that the L1 will fill-in and mark the block
> as dirty. In this particular case, I am not sure the L2 can have the
> block dirty.


Yea, you are right. The 2nd ReadExReq will clear the dirty bit and set the
CacheResponding flag (in Cache::satisfyRequest(...), cache.cc). But this
block still holds dirty data even if it is not marked "dirty" any more ...


>

> I think the local writeback has to be clean, but I might be wrong; in any
> case we should add an assertion here:
> assert(wbPkt->isCleanEviction());
> or better:
> assert(wbPkt->cmd == MemCmd::WritebackClean);


I agree with you. I cannot think of any scenario that allows an incoming
WritebackDirty from an above cache to see a second local WritebackDirty.
Actually, it looks like this is guaranteed by gem5's MOESI implementation,
which only allows one dirty block to exist in the whole cache hierarchy.
The scenario I mentioned could only happen if multiple dirty blocks were
allowed to exist. Speaking of this, I have a relevant question below about
gem5's own MOESI. (see below: why is only one dirty block allowed?)
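The invariant discussed here can be sanity-checked with a toy model. The following is an illustrative Python sketch of the write-buffer conflict rules from the thread, not gem5 code; the command names follow MemCmd, everything else is made up for the example:

```python
# Illustrative model of resolving an incoming eviction from above against
# a writeback already waiting in this cache's write buffer. Rules from
# the thread:
#  - the queued wbPkt can only be a WritebackDirty or WritebackClean;
#  - an incoming WritebackDirty from above trumps a queued clean
#    writeback (the queued one is discarded);
#  - at most one dirty copy exists under gem5's MOESI-like protocol, so
#    "dirty meets dirty" should be impossible (hence the proposed assert).

def resolve(incoming, queued):
    assert queued in ("WritebackDirty", "WritebackClean")
    if incoming == "WritebackDirty":
        # mirrors the proposed assert(wbPkt->cmd == MemCmd::WritebackClean)
        assert queued == "WritebackClean", "two dirty copies should not exist"
        return "discard_queued"  # newer dirty data replaces the clean writeback
    # an incoming CleanEvict/WritebackClean adds no new data
    return "keep_queued"

assert resolve("WritebackDirty", "WritebackClean") == "discard_queued"
assert resolve("CleanEvict", "WritebackDirty") == "keep_queued"
```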




> > About the comments
> > "// The CleanEvict and WritebackClean snoops into other
> > // peer caches of the same level while traversing the",
> >
> > Do here "peer caches of the same level" mean the caches of the same
> > level in other cpus?

[gem5-users] Questions about the detailed and useful comments in the Cache access related code

2018-08-27 Thread Gongjin Sun
Hi,

First, I really thank the maintainer(s) of the cache code who wrote so many
detailed comments for almost all key code in the cache access path to help
the readers (especially the beginners) understand how the cache hierarchy
works.

I read these useful comments again and again and understand most of them.
But still some are not so easy to understand. I list them as follows and
hopefully can get some answers. I appreciate it a lot!

1 BaseCache::access(PacketPtr pkt, CacheBlk *&blk, Cycles &lat, PacketList
&writebacks) (src/mem/cache/base.cc)

(1) In the segment "if (pkt->isEviction()) { ...}",  if I understand it
correctly, this code segment checks whether arriving requests (Writeback
and CleanEvict) have already had their copies (for the same block address)
in the Write Buffer and handle them accordingly.

But I notice the comments
"// We check for presence of block in above caches before issuing
// Writeback or CleanEvict to write buffer. Therefore the only
...
", it is confusing to say here "in above caches". Shouldn't it be "for
presence of block in this Write Buffer"?

Also, about the comments
"// Dirty writeback from above trumps our clean writeback... discard here",
why is the locally found writeback clean? I think it could be clean or
dirty. So an arriving dirty writeback sees a local writeback in the write
buffer, and the former could be (but is not necessarily) newer than the
latter. (One such scenario: the cpu core write-hits block A in the L1 data
cache and then writes it back to L2. Then the core reads it into L1 again.
Next, the dirty A is put into the Write Buffer in L2. After that, the cpu
core could "write back A to L2 again" or "write A (a second write) and then
write back A to L2 again". The latter makes the arriving dirty A have a
different value from the dirty A in L2's write buffer.)

About the comments
"// The CleanEvict and WritebackClean snoops into other
// peer caches of the same level while traversing the",

Do here "peer caches of the same level" mean the caches of the same level
in other cpus?

(2) About the comments
"// we could get a clean writeback while we are having outstanding accesses
to a block, ..."
How does this happen? I just cannot understand it. If we see an outstanding
access in the local cache, that means it must have missed in the above
caches for the same cpu. How can the above cache still evict a clean block
(it is a miss) and write it back to the next cache level? Could you show
one scenario for this?

2 BaseCache::handleFill(PacketPtr pkt, CacheBlk *blk, PacketList
&writebacks, bool allocate)

(1) About the comments
"// existing block... probably an upgrade
// either we're getting new data or the block should already be valid"

How does the block become valid from its previously invalid status (at the
time when "pkt", as a request packet, was accessing the cache but was not
satisfied)?

(2) About the comments
"// we got the block in Modified state, and invalidated the owners copy"

After it, there is a statement "blk->status |= BlkDirty;", but I don't find
any statement about "invalidated the owners copy" as mentioned in the above
comment. Where is it?
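One plausible reading, sketched below as a toy Python model (illustrative names, not gem5 source): the owner's copy is invalidated on the snoop path, e.g. around Cache::satisfyRequest(), before the response reaches handleFill, which is why handleFill itself only needs to set BlkDirty.

```python
# Hedged model of how Modified ownership moves on a ReadExReq: the
# previous owner hands over its dirty data and invalidates its copy
# during the snoop, and the filling cache then sets the dirty bit in
# handleFill. So "invalidated the owners copy" already happened at the
# responding cache by the time handleFill runs.

class Blk:
    def __init__(self, valid=False, dirty=False):
        self.valid, self.dirty = valid, dirty

def snoop_read_ex(owner_blk):
    """Previous owner responds with dirty data and invalidates its copy."""
    responding = owner_blk.valid and owner_blk.dirty
    owner_blk.valid = owner_blk.dirty = False
    return responding  # models the cacheResponding flag on the packet

def handle_fill(blk, cache_responding):
    blk.valid = True
    if cache_responding:
        blk.dirty = True  # models: blk->status |= BlkDirty

owner, requester = Blk(valid=True, dirty=True), Blk()
handle_fill(requester, snoop_read_ex(owner))
assert not owner.valid and requester.dirty
```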

Thank you in advance!

gjins
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] snoop filter exceeded capacity of 131072 cache blocks

2018-08-14 Thread Gongjin Sun
Hi,

I'm running a single thread SPEC CPU2006 benchmark (410.bwaves) on the
latest Gem5 version with a generated checkpoint (at 100th
instructions). Unfortunately, I encountered the error as the title says:

panic: panic condition !is_hit && (cachedLocations.size() >= maxEntryCount)
occurred: snoop filter exceeded capacity of 131072 cache blocks
Memory Usage: 4446868 KBytes
Program aborted at tick 74742779000
--- BEGIN LIBC BACKTRACE ---
/home/gongjins/gem5/build/X86/gem5.opt(_Z15print_backtracev+0x28)[0x12ec698]
/home/gongjins/gem5/build/X86/gem5.opt(_Z12abortHandleri+0x46)[0x12ff1e6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x2ab45fca9390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x2ab460dde428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x2ab460de002a]
/home/gongjins/gem5/build/X86/gem5.opt[0x6eeeff]
/home/gongjins/gem5/build/X86/gem5.opt(_ZN11SnoopFilter11lookupSnoopEPK6Packet+0x153)[0x104a273]
/home/gongjins/gem5/build/X86/gem5.opt(_ZN12CoherentXBar18recvTimingSnoopReqEP6Packets+0x12e)[0x10149fe]
/home/gongjins/gem5/build/X86/gem5.opt(_ZN5Cache12doWritebacksERNSt7__cxx114listIP6PacketSaIS3_EEEm+0xa4)[0x13ccf54]
/home/gongjins/gem5/build/X86/gem5.opt(_ZN9BaseCache14recvTimingRespEP6Packet+0x59a)[0x13c712a]
/home/gongjins/gem5/build/X86/gem5.opt(_ZN9BaseCache11MemSidePort14recvTimingRespEP6Packet+0x14)[0x13b42e4]
/home/gongjins/gem5/build/X86/gem5.opt[0x103ec82]
/home/gongjins/gem5/build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xc5)[0x12f2e85]
/home/gongjins/gem5/build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x50)[0x130aee0]
/home/gongjins/gem5/build/X86/gem5.opt(_Z8simulatem+0xd1b)[0x130bfcb]
/home/gongjins/gem5/build/X86/gem5.opt[0x10e4b3a]
/home/gongjins/gem5/build/X86/gem5.opt[0x86ead7]

...


Any explanation or solution is appreciated!

some command line options are as follows:
"-n 1 --sys-clock=1GHz
--cpu-clock=4GHz
--checkpoint-dir=/home/gjins/benchmark/cpu2006-gcc/single/410_run/m5out
--at-instruction -r 100 -s 1 -W 1000 -I 1000
--mem-size=4294967296 --mem-type=DDR3_1600_8x8 --caches --l2cache
--ctx-swit-intval 10 --l1i_size=32kB --l1d_size=32kB --l2_size=256kB
--l1i_assoc=8 --l1d_assoc=8 --l2_assoc=8 --cacheline_size=64 -c
/home/gjins/benchmark/cpu2006-gcc/single/410_run/bwaves_base.none -o 2"
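For what it's worth, the "131072 cache blocks" figure follows from the snoop filter's capacity divided by the cache line size, so one common workaround is to raise the SnoopFilter's max_capacity parameter on the memory bus. The arithmetic below assumes the default capacity is 8 MiB (worth verifying in src/mem/SnoopFilter.py for your gem5 version):

```python
# Where "131072 cache blocks" comes from: the snoop filter's assumed
# default max_capacity (8 MiB) divided by the 64-byte line size used in
# the command line above (--cacheline_size=64).
max_capacity_bytes = 8 * 1024 * 1024  # assumed default max_capacity
line_size = 64
max_entries = max_capacity_bytes // line_size
print(max_entries)  # 131072, matching the panic message
```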

gjins

Re: [gem5-users] The recent change "Don't override ISA if provided by user" probably prevents the checkpoint from being restored

2017-12-28 Thread Gongjin Sun
Solved. Just saw that this change (Fix need to set ISA of switch cpus)
fixed this problem.

gjins

On Thu, Dec 28, 2017 at 4:57 PM, Gongjin Sun <gongj...@uci.edu> wrote:

> Hi,
>
> When I was trying to restore my checkpoint (single thread run, SE model),
> I got the following error message:
>
> fatal: Number of ISAs (0) assigned to the CPU does not equal number of
> threads (1).
>
> I searched for this message and found it at the end of the function
> "BaseCPU::BaseCPU" in "src/cpu/base.cc". Then I tried to track where the
> value isa is assigned and found it is assigned in the file
> "src/cpu/BaseCpu.py". I noticed the isa is initialized to an empty list,
> as in the following statement (take x86 for example):
>
> isa = VectorParam.X86ISA([], "ISA instance")
>
> Then I searched for the changes related to the above statement and found
> the change called "Don't override ISA if provided by user" did this change (
> https://www.mail-archive.com/gem5-dev@gem5.org/msg23775.html):
>
> -isa = VectorParam.X86ISA([ isa_class() ], "ISA instance")
> +isa = VectorParam.X86ISA([], "ISA instance")
>
> Obviously, this change made the initial value an empty list. After I
> changed the above assignment statement back to the previous version, that
> is, "isa = VectorParam.X86ISA([ isa_class() ], "ISA instance")", the above
> error message was gone and my checkpoint was restored successfully.
>
> So I'll appreciate it if any maintainer would like to fix it.
>
> gjins
>

[gem5-users] The recent change "Don't override ISA if provided by user" probably prevents the checkpoint from being restored

2017-12-28 Thread Gongjin Sun
Hi,

When I was trying to restore my checkpoint (single thread run, SE model), I
got the following error message:

fatal: Number of ISAs (0) assigned to the CPU does not equal number of
threads (1).

I searched for this message and found it at the end of the function
"BaseCPU::BaseCPU" in "src/cpu/base.cc". Then I tried to track where the
value isa is assigned and found it is assigned in the file
"src/cpu/BaseCpu.py". I noticed the isa is initialized to an empty list, as
in the following statement (take x86 for example):

isa = VectorParam.X86ISA([], "ISA instance")

Then I searched for the changes related to the above statement and found
the change called "Don't override ISA if provided by user" did this change (
https://www.mail-archive.com/gem5-dev@gem5.org/msg23775.html):

-isa = VectorParam.X86ISA([ isa_class() ], "ISA instance")
+isa = VectorParam.X86ISA([], "ISA instance")

Obviously, this change made the initial value an empty list. After I
changed the above assignment statement back to the previous version, that
is, "isa = VectorParam.X86ISA([ isa_class() ], "ISA instance")", the above
error message was gone and my checkpoint was restored successfully.
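The check that produces the fatal error can be modeled in a few lines. This is a plain-Python sketch of the validation at the end of BaseCPU::BaseCPU, not gem5 code; the function name is made up for the example:

```python
# Model of the check behind "fatal: Number of ISAs (0) assigned to the
# CPU does not equal number of threads (1).": the number of ISA
# instances handed to the CPU must match its thread count.
def check_isas(isa_list, num_threads):
    if len(isa_list) != num_threads:
        raise RuntimeError(
            "fatal: Number of ISAs (%d) assigned to the CPU does not "
            "equal number of threads (%d)." % (len(isa_list), num_threads))

# With the new empty default, any config path that never fills in the isa
# parameter (e.g. when restoring onto switch cpus) trips the check:
try:
    check_isas([], 1)
    hit = False
except RuntimeError:
    hit = True
assert hit

check_isas(["X86ISA_instance"], 1)  # the old default: one ISA per thread
```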

So I'll appreciate it if any maintainer would like to fix it.

gjins

[gem5-users] About multiple cache copies and a related comment called "on the same path to memory"

2017-12-18 Thread Gongjin Sun
Hi All,

I noticed that there are some useful comments in the file cache.cc, like
these:

"// if a cache is responding, and it had the line in Owned
// rather than Modified state, we need to invalidate any
// copies that are not on the same path to memory
"
"// we get away with multiple caches (on
// the same path to memory) considering
// the block writeable as we always enter
// the cache hierarchy through a cache,
// and first snoop upwards in all other
// branches"

What is the meaning of "on the same path to memory"?

Assume we have two cpus and 3 cache levels. The third level is shared by
these two cpus. And packet is in the level 2 (L2) of cpu1 currently. If I
understand it correctly, taking the first comment for example,

the cache which is responding is L1 of cpu1, "on the same path to memory"
means the path from the current L2 of cpu1, to L3, to memory, and the
copies which need to be invalidated should be the ones in L1 and L2 of cpu2
(not including the copies in L3). Is my understanding correct? Please
correct and clarify it if not.

Thanks in advance!

gjins

Re: [gem5-users] Multiprogrammed checkpoint not working.

2016-07-10 Thread Gongjin Sun
Three ways:

1 Open the m5.cpt in your checkpoint dir and change the "cur_ver" in the
section "[root]" to 11 ( 0xb). Then apply the cpt_upgrader.py

2 Add the "numMainEventQueues=1" in the section "[Globals]" (m5.cpt)
manually.

3 Change "0x000e" in serialized.hh into "0x000b",
then recompile gem5 and regenerate the checkpoint.

Hope this can help you.
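Options 1 and 2 can be scripted; below is a hedged sketch that assumes m5.cpt is in INI format (which gem5 checkpoints are) and is illustrative only — back up your checkpoint first:

```python
# Sketch of options 1 and 2 above: bump cur_ver in [root] to 11 (0xb) and
# add numMainEventQueues to [Globals] in an m5.cpt file.
import configparser
import io

def patch_checkpoint(text):
    cpt = configparser.ConfigParser()
    cpt.optionxform = str  # preserve key case; gem5 keys are mixed-case
    cpt.read_string(text)
    cpt.set("root", "cur_ver", str(0xb))           # option 1: version 11
    if not cpt.has_section("Globals"):
        cpt.add_section("Globals")
    cpt.set("Globals", "numMainEventQueues", "1")  # option 2
    out = io.StringIO()
    cpt.write(out)
    return out.getvalue()

patched = patch_checkpoint("[root]\ncur_ver=10\n")
assert "cur_ver = 11" in patched
assert "numMainEventQueues = 1" in patched
```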

On Wed, Jun 15, 2016 at 1:28 AM, Xin He  wrote:

> Hi,
>
> I am using gem5 stable version from official website and I've created
> individual checkpoints in SE mode.
> These checkpoints can be restored and function well individually.
>
> Then I use util/checkpoint_aggregator.py to merge these checkpoints and no
> error reported so far.
> However, when restoring the aggregated checkpoints, error "fatal: Can't
> unserialize 'Globals:numMainEventQueues'" occurs.
>
> After this, I also tried the script util/cpt_upgrader.py, but this error
> still exists.
> From http://reviews.gem5.org/r/2324/, I know the cpt_upgrader.py and
> serialized.hh had been already patched.
> In the latest stable version,
> In serialized.hh, gem5CheckpointVersion is 0x000e;
> In cpt_upgrader.py, migrations.append the "from_D" version.
>
> Does anyone solve this problem?
>
> Thanks!
>
> Xin He
>

[gem5-users] Weird! Address conflict between demand requests and l1-dcache prefetcher's requests

2016-07-09 Thread Gongjin Sun
Hi Everyone,

I am working on the stride prefetcher and tagged prefetcher in gem5
(gem5-stable-629fe6e6c781, the stable version last year). My benchmarks are
spec2006 and the configuration of prefetcher is:

L1 dcache stride prefetcher   degree=2
L1 dcache tagged prefetcher   degree=2
L2 tagged prefetcher          degree=2

No L1 icache prefetcher

I use SE mode and start the benchmarks from a checkpoint at 40 billion
instructions, then warm up for 1B and run 1B instructions.

While doing the experiments, I found that sometimes the icache issues a
ReadReq which has the SAME cache block address as an earlier HardPFReq from
the L1 dcache's prefetcher. This phenomenon confused me a lot because, as
we all know, the code's address range is different from the data's.
Considering that gem5's prefetcher doesn't fetch addresses across the page
boundary, such an address conflict looks very weird. The code page should
be far away from the data page, right?

In addition, it seems that no TLB is available in SE mode. So the memory
access addr should be a physical address, right? (like pkt->getAddr())
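The page-boundary rule mentioned above boils down to a same-page check. A minimal sketch, assuming 4 KiB pages (as on x86):

```python
# Minimal sketch of the "don't prefetch across a page boundary" check:
# two addresses are on the same page iff their page numbers match.
PAGE_SHIFT = 12  # 4096-byte pages assumed

def same_page(a, b):
    return (a >> PAGE_SHIFT) == (b >> PAGE_SHIFT)

base = 0x7F0000
assert same_page(base, base + 0xFFF)       # still inside the same page
assert not same_page(base, base + 0x1000)  # crosses into the next page
```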

Thanks in advance.

gjins

[gem5-users] Do all the memory access requests from cpu have pc address?

2016-03-27 Thread Gongjin Sun
Hi All,

As the title says, I'm not sure about this. Is this true?

Thanks in advance.

gjins

Re: [gem5-users] What's the exact meaning of MasterId?

2016-03-26 Thread Gongjin Sun
Hi Debiprasanna,

I forgot to mention one thing: I used the Hardware Prefetcher. Not only
that, I also changed the default handling of HardPFReq requests in the
function getBusPacket(); that is, I let a packet with a HardPFReq keep its
original cmd (MemCmd::HardPFReq) and go through the lower cache levels. The
default behavior is to change the cmd of the HardPFReq request packet into
ReadReq.

In my first email, the strange request whose MasterId is 13 is a ReadReq,
but its isPrefetch() is True. That's why I feel so confused. Based on my
above modification, a ReadReq should definitely not be a HardPFReq. It
should be a regular demand request from the CPU.

isPrefetch() is defined as:

bool isPrefetch() const { return _flags.isSet(PREFETCH); }

I searched all the code but can't find where the flag "PREFETCH" is set.
It's so strange.

On the other hand,

About _dataMasterId, it is defined and used as follows:

MasterID _dataMasterId;  // this is the original definition

BaseCPU::BaseCPU(Params *p, bool is_checker): ...
_dataMasterId(p->system->getMasterId(name() + ".data"))
  // this is its initialization

/** Reads this CPU's unique data requestor ID */
MasterID dataMasterId() { return _dataMasterId; }   // this is its wrapping

/** Read this CPU's data requestor ID */
MasterID masterId() const { return cpu->dataMasterId(); }  // this is one
of its uses

Fault TimingSimpleCPU::readMem(...)
{
    RequestPtr req = new Request(asid, addr, size, flags, dataMasterId(),
                                 pc, _cpuId, tid); // another typical use:
}   // dataMasterId() is used to generate a read request


Thank you all the same.

gjins





On Sat, Mar 26, 2016 at 7:10 PM, Debiprasanna Sahoo <
debiprasanna.sa...@gmail.com> wrote:

> Hi gjins,
>
> 18 Master IDs with a single-core system simulation? Sounds different from
> my experience. Let the devs answer this.
>
> But if I just do a grep for MasterID, this is the important thing I get:
>
> src/cpu/base.hh:MasterID _instMasterId;
> src/cpu/base.hh:MasterID _dataMasterId;
>
>
>
> Regards,
> Debiprasanna Sahoo
>
> On Sat, Mar 26, 2016 at 7:00 PM, Gongjin Sun <gongj...@uci.edu> wrote:
>
>> Hi Debiprasanna,
>>
>> Thank you for your answer.  I simulated CPU2006 with a single X86 core
>> and SE mode. I started my simulation from a checkpoint I created before and
>> used a switch "--standard-switch".
>>
>> Why do you think MasterId is the "port number" of a cpu which issued a
>> request? Do you find any comments about this?
>>
>> I thought each MasterId represents one unique kind of request, simply, it
>> can differentiate different request types. But you mentioned the
>> terminology "port number", it looks like a physical concept. Could you
>> please explain it in more detail? I read all code related to MasterId, but
>> can't find its any relation to a "port". Also, I remember port is a
>> specific concept and is only used to connect different MemObject s, right?
>>
>> Thank you
>>
>> gjins
>>
>> On Sat, Mar 26, 2016 at 6:41 PM, Debiprasanna Sahoo <
>> debiprasanna.sa...@gmail.com> wrote:
>>
>>> Hi gjins,
>>>
>>> MasterId is the port number from which requested from the CPU. A CPU can
>>> have a master ID for instruction (switch_cpu.inst) or data
>>> (switch_cpu.data).
>>>
>>> Writebacks(wbMasterId) are also handled by a separate MasterId. The
>>> number of master ids depends on the number of cores you are using for
>>> simulation.
>>>
>>> How many cores you are simulating? Are you using fast-forward options ?
>>>
>>> Regards,
>>> Debiprasanna Sahoo
>>>
>>> On Sat, Mar 26, 2016 at 1:08 PM, Gongjin Sun <gongj...@uci.edu> wrote:
>>>
>>>> Hi All,
>>>>
>>>> The MasterId is defined in src/mem/request.hh. There are four specific
>>>> MasterIds: wbMasterId, funcMasterId, intMasterId and invldMasterId.
>>>> According to some comments, MasterID is used to generate request.
>>>>
>>>> However, during my many simulations, I printed the size of MasterId
>>>> (masterIds.size(), that is, the total of MasterIds) and found that the
>>>> total number of MasterId in system are more than 4. For example, in one of
>>>> my simulations, there are 18 MasterIds. One of them is 13, its MasterId's
>>>> name is switch_cpus.data, and its request type is ReadReq. I am curious
>>>> what kind of request is this.
>>>>
>>>> I thought most common requests we see should be covered by the above
>>>> specific ones, like funcMasterId which should cover the usual demand
>>>> read/write requests from cpu.

Re: [gem5-users] What's the exact meaning of MasterId?

2016-03-26 Thread Gongjin Sun
Hi Debiprasanna,

Thank you for your answer.  I simulated CPU2006 with a single X86 core and
SE mode. I started my simulation from a checkpoint I created before and
used a switch "--standard-switch".

Why do you think MasterId is the "port number" of the cpu that issued a
request? Did you find any comments about this?

I thought each MasterId represents one unique kind of request; simply put,
it can differentiate different request types. But you mentioned the
terminology "port number", which looks like a physical concept. Could you
please explain it in more detail? I read all the code related to MasterId,
but can't find any relation to a "port". Also, I remember a port is a
specific concept that is only used to connect different MemObjects, right?

Thank you

gjins

On Sat, Mar 26, 2016 at 6:41 PM, Debiprasanna Sahoo <
debiprasanna.sa...@gmail.com> wrote:

> Hi gjins,
>
> MasterId is the number of the port from which the CPU issued the request.
> A CPU can have a master ID for instruction (switch_cpu.inst) or data
> (switch_cpu.data).
>
> Writebacks (wbMasterId) are also handled by a separate MasterId. The
> number of master IDs depends on the number of cores you are using for
> simulation.
>
> How many cores are you simulating? Are you using fast-forward options?
>
> Regards,
> Debiprasanna Sahoo
>
> On Sat, Mar 26, 2016 at 1:08 PM, Gongjin Sun <gongj...@uci.edu> wrote:
>
>> Hi All,
>>
>> The MasterId is defined in src/mem/request.hh. There are four specific
>> MasterIds: wbMasterId, funcMasterId, intMasterId and invldMasterId.
>> According to some comments, MasterID is used to generate request.
>>
>> However, during my many simulations, I printed the size of MasterId
>> (masterIds.size(), that is, the total of MasterIds) and found that the
>> total number of MasterId in system are more than 4. For example, in one of
>> my simulations, there are 18 MasterIds. One of them is 13, its MasterId's
>> name is switch_cpus.data, and its request type is ReadReq. I am curious
>> what kind of request is this.
>>
>>  I thought most common requests we see should be covered by the above
>> specific ones, like funcMasterId which should cover the usual demand
>> read/write requests from cpu. So my confusion is why I still see so many
>> (up to 18) MasterIds in my simulation? Besides the four specific ones, why
>> do we still need other MasterIds and what are the rest uses for?
>>
>> In addition, are "all the ReadReq/WriteReq which are generated by cpu"
>> demand requests just from load/store instructions?
>>
>> Any help are very appreciated.
>>
>> gjins
>>
>>
>>
>>
>
>
>

[gem5-users] What's the exact meaning of MasterId?

2016-03-26 Thread Gongjin Sun
Hi All,

The MasterId is defined in src/mem/request.hh. There are four specific
MasterIds: wbMasterId, funcMasterId, intMasterId and invldMasterId.
According to some comments, MasterID is used to generate request.

However, during my many simulations, I printed the size of masterIds
(masterIds.size(), that is, the total number of MasterIds) and found that
the total number of MasterIds in the system is more than 4. For example, in
one of my simulations, there are 18 MasterIds. One of them is 13; its
MasterId's name is switch_cpus.data, and its request type is ReadReq. I am
curious what kind of request this is.

I thought most common requests we see should be covered by the above
specific ones, like funcMasterId, which should cover the usual demand
read/write requests from the cpu. So my confusion is: why do I still see so
many (up to 18) MasterIds in my simulation? Besides the four specific ones,
why do we still need other MasterIds, and what are the rest used for?
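The count above can be understood with a toy model of the ID registry. This is an illustrative Python sketch (names and registration order are assumptions, not gem5 source): every named requestor that calls the system's getMasterId() gets a fresh ID, so even a single-core system quickly exceeds the four special IDs once per-CPU instruction/data ports, switch cpus, prefetchers, and so on register too.

```python
# Hedged model of a MasterID registry: each distinct requestor name gets
# the next free ID. The four "special" IDs plus the per-CPU .inst/.data
# requestors (and switch_cpus duplicates under --standard-switch) already
# push the count well past 4; the registration order here is illustrative.
class System:
    def __init__(self):
        self.master_ids = {}

    def get_master_id(self, name):
        return self.master_ids.setdefault(name, len(self.master_ids))

sys_ = System()
for name in ["writebacks", "functional", "interrupt", "invalidation",
             "cpu.inst", "cpu.data", "switch_cpus.inst", "switch_cpus.data"]:
    sys_.get_master_id(name)

assert sys_.get_master_id("switch_cpus.data") == 7  # a non-special ID
assert len(sys_.master_ids) > 4
```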

In addition, are "all the ReadReq/WriteReq which are generated by cpu"
demand requests just from load/store instructions?

Any help is very appreciated.

gjins

Re: [gem5-users] Why shouldn't WriteReq arrive at the cache levels below L1?

2016-02-15 Thread Gongjin Sun
solved.

On Sun, Feb 14, 2016 at 12:26 PM, Gongjin Sun <gongj...@uci.edu> wrote:

> Hi All,
>
> I'm doing some experiments to observe and understand the multi-level
> access mechanism in gem5.
>
> I understand that gem5 uses write-allocate at the L1 level for write
> misses. In order to fill the fetched data from other levels, the cmd of
> all regular WriteReqs (except for WriteInvalidate) will be changed into
> ReadReq or ReadExReq at the L1 cache when these write requests are ready
> to be sent to the next level. The related code is as follows:
>
>
> Cache::recvTimingResp
> ---------------------
> bool is_fill = !mshr->isForward &&
>     (pkt->isRead() || pkt->cmd == MemCmd::UpgradeResp);
>
> Cache::getBusPacket
> -------------------
> else {
>     // block is invalid
>     cmd = needsExclusive ? MemCmd::ReadExReq : MemCmd::ReadReq;
> }
>
> But now I want to implement a new access mode: L1 is NOT write-allocate
> on a write miss, while L2 IS write-allocate on a write miss. (Doing so is
> just to understand gem5's cache mechanism and the related code better.)
>
> So I made two minor modifications:
> 1. Add a new "else if" branch below "else if (cpu_pkt->isWriteInvalidate())"
> in Cache::getBusPacket:
>
>     else if (cpu_pkt->cmd == MemCmd::WriteReq && isTopLevel) {
>         cmd = cpu_pkt->cmd;
>
> This change allows a WriteReq to go through L1 and arrive at L2 (just L2;
> below L2 it will be changed into ReadReq or ReadExReq as in the original
> case).
>
> 2. Add a new "else if" branch below "else if (pkt->cmd ==
> MemCmd::UpgradeFailResp) {" in Cache::recvTimingResp:
>
>     else if (pkt->cmd == MemCmd::WriteResp) {
>         completion_time += clockEdge(responseLatency) + pkt->payloadDelay;
>
> This means that when L1 receives the WriteResp packet from L2, L1 does not
> fill the data into its cache. Other than this, nothing else is changed.
>
> Unfortunately, after doing so, I got a panic:
>
> panic: Tried to read unmapped address 0x2d2d2d30d46bcd25.
>
> I checked my code again and again, checked the debug info, and found that
> this WriteReq is returned correctly. After it returns to the CPU, gem5
> continues advancing instructions, i.e., six instruction fetch accesses.
> The last one looks like:
>
> 1284000: system.cpu: Fetch
> 1284000: system.cpu: Complete ICache Fetch for addr 0
>
> Then I also used gdb to get the backtrace info:
>
>
> --
> #0  0x75e7d657 in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:55
>
> #1  0x75e7ea2a in __GI_abort () at abort.c:89
>
> #2  0x010ae00e in __exit_epilogue (code=-1,
> func=0x1dae81e <X86ISA::PageFault::invoke(ThreadContext*,
> RefCountingPtr const&)::__FUNCTION__> "invoke",
> file=0x1dae60d "build/X86/arch/x86/faults.cc", line=160,
> format=0x1dae6a0 "Tried to %s unmapped address %#x.\n")
> at build/X86/base/misc.cc:94
>
> #3  0x00d44923 in __exit_message
> (prefix=0x1dae62a "panic", code=-1,
> func=0x1dae81e <X86ISA::PageFault::invoke(ThreadContext*,
> RefCountingPtr const&)::__FUNCTION__> "invoke",
> file=0x1dae60d "build/X86/arch/x86/faults.cc", line=160,
> format=0x1dae6a0 "Tried to %s unmapped address %#x.\n")
> at build/X86/base/misc.hh:81
>
> #4  0x00d4308f in X86ISA::PageFault::invoke (this=0x30c7910,
> tc=0x31ac3d0, inst=...) at build/X86/arch/x86/faults.cc:160
>
> #5  0x00dcef87 in BaseSimpleCPU::advancePC (this=0x3122ad0,
> fault=std::shared_ptr (count 3, weak 0) 0x30c7910)
> at build/X86/cpu/simple/base.cc:532
>
> #6  0x00dc7308 in TimingSimpleCPU::advanceInst (this=0x3122ad0,
> fault=std::shared_ptr (count 3, weak 0) 0x30c7910)
> at build/X86/cpu/simple/timing.cc:578
>
> #7  0x00dc5c7b in TimingSimpleCPU::translationFault
> (this=0x3122ad0, fault=std::shared_ptr (count 3, weak 0) 0x30c7910)
> at build/X86/cpu/simple/timing.cc:331
>
> #8  0x00dc6a24 in TimingSimpleCPU::finishTranslation
> (this=0x3122ad0, state=0x30c7ae0) at build/X86/cpu/simple/timing.cc:497
>
> #9  0x00dcaacd in DataTranslation<TimingSimpleCPU*>::finish
> (this=0x30ce300, fault=std::shared_ptr (count 3, weak 0) 0x30c7910,
> req=0x30c6aa0, tc=0x31ac3

[gem5-users] Why shouldn't WriteReq arrive at the cache levels below L1?

2016-02-14 Thread Gongjin Sun
Hi All,

I'm doing some experiments to observe and understand the multi-level access
mechanism in gem5.

I understand that gem5 uses write-allocate at the L1 level for write
misses. In order to fill the fetched data from other levels, every regular
WriteReq's cmd (except for WriteInvalidate) is changed into ReadReq or
ReadExReq at the L1 cache when the write request is ready to be sent to the
next level. The related code is as follows:

Cache::recvTimingResp:

    bool is_fill = !mshr->isForward &&
        (pkt->isRead() || pkt->cmd == MemCmd::UpgradeResp);

Cache::getBusPacket:

    else {
        // block is invalid
        cmd = needsExclusive ? MemCmd::ReadExReq : MemCmd::ReadReq;
    }
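The conversion described above can be condensed into a tiny decision function (an illustrative sketch with made-up names, not gem5's actual getBusPacket): on a miss with no valid copy, a demand access becomes a whole-block fetch, ReadExReq when write permission is needed (write-allocate) and plain ReadReq otherwise; a valid read-only copy only needs an upgrade.

```cpp
#include <cassert>

// Illustrative command set -- not gem5's real MemCmd class.
enum class MemCmd { ReadReq, WriteReq, ReadExReq, UpgradeReq };

// Pick the command to put on the bus for a miss, mirroring the snippet
// quoted above: invalid block -> fetch the whole line; valid but
// read-only block -> just upgrade its permissions.
MemCmd busCmdForMiss(bool blkValid, bool needsExclusive) {
    if (blkValid)
        return MemCmd::UpgradeReq;   // have data, need write permission
    return needsExclusive ? MemCmd::ReadExReq   // write miss, no copy
                          : MemCmd::ReadReq;    // read miss, no copy
}
```

So a write miss with no copy of the block never sends a WriteReq downstream in the stock code; it sends ReadExReq, which is exactly the behavior the modification below tries to bypass.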

But now I want to implement a new access mode: L1 is NOT write-allocate on
a write miss, while L2 IS write-allocate on a write miss. (Doing so is just
to understand gem5's cache mechanism and the related code better.)

So I made two minor modifications:
1. Add a new "else if" branch below "else if (cpu_pkt->isWriteInvalidate())"
in Cache::getBusPacket:

    else if (cpu_pkt->cmd == MemCmd::WriteReq && isTopLevel) {
        cmd = cpu_pkt->cmd;

This change allows a WriteReq to go through L1 and arrive at L2 (just L2;
below L2 it will be changed into ReadReq or ReadExReq as in the original
case).

2. Add a new "else if" branch below "else if (pkt->cmd ==
MemCmd::UpgradeFailResp) {" in Cache::recvTimingResp:

    else if (pkt->cmd == MemCmd::WriteResp) {
        completion_time += clockEdge(responseLatency) + pkt->payloadDelay;

This means that when L1 receives the WriteResp packet from L2, L1 does not
fill the data into its cache. Other than this, nothing else is changed.

Unfortunately, after doing so, I got a panic:

panic: Tried to read unmapped address 0x2d2d2d30d46bcd25.

I checked my code again and again, checked the debug info, and found that
this WriteReq is returned correctly. After it returns to the CPU, gem5
continues advancing instructions, i.e., six instruction fetch accesses. The
last one looks like:

1284000: system.cpu: Fetch
1284000: system.cpu: Complete ICache Fetch for addr 0

Then I also used gdb to get the backtrace info:

--
#0  0x75e7d657 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:55

#1  0x75e7ea2a in __GI_abort () at abort.c:89

#2  0x010ae00e in __exit_epilogue (code=-1,
func=0x1dae81e <X86ISA::PageFault::invoke(ThreadContext*,
RefCountingPtr const&)::__FUNCTION__> "invoke",
file=0x1dae60d "build/X86/arch/x86/faults.cc", line=160,
format=0x1dae6a0 "Tried to %s unmapped address %#x.\n")
at build/X86/base/misc.cc:94

#3  0x00d44923 in __exit_message
(prefix=0x1dae62a "panic", code=-1,
func=0x1dae81e <X86ISA::PageFault::invoke(ThreadContext*,
RefCountingPtr const&)::__FUNCTION__> "invoke",
file=0x1dae60d "build/X86/arch/x86/faults.cc", line=160,
format=0x1dae6a0 "Tried to %s unmapped address %#x.\n")
at build/X86/base/misc.hh:81

#4  0x00d4308f in X86ISA::PageFault::invoke (this=0x30c7910,
tc=0x31ac3d0, inst=...) at build/X86/arch/x86/faults.cc:160

#5  0x00dcef87 in BaseSimpleCPU::advancePC (this=0x3122ad0,
fault=std::shared_ptr (count 3, weak 0) 0x30c7910)
at build/X86/cpu/simple/base.cc:532

#6  0x00dc7308 in TimingSimpleCPU::advanceInst (this=0x3122ad0,
fault=std::shared_ptr (count 3, weak 0) 0x30c7910)
at build/X86/cpu/simple/timing.cc:578

#7  0x00dc5c7b in TimingSimpleCPU::translationFault
(this=0x3122ad0, fault=std::shared_ptr (count 3, weak 0) 0x30c7910)
at build/X86/cpu/simple/timing.cc:331

#8  0x00dc6a24 in TimingSimpleCPU::finishTranslation
(this=0x3122ad0, state=0x30c7ae0) at build/X86/cpu/simple/timing.cc:497

#9  0x00dcaacd in DataTranslation<TimingSimpleCPU*>::finish
(this=0x30ce300, fault=std::shared_ptr (count 3, weak 0) 0x30c7910,
req=0x30c6aa0, tc=0x31ac3d0, mode=BaseTLB::Read) at
build/X86/cpu/translation.hh:244

#10 0x00d8223f in X86ISA::TLB::translateTiming (this=0x31109d0,
req=0x30c6aa0, tc=0x31ac3d0, translation=0x30ce300,
mode=BaseTLB::Read) at build/X86/arch/x86/tlb.cc:429

#11 0x00dc6311 in TimingSimpleCPU::readMem (this=0x3122ad0,
addr=3255307793404251429, data=0x7fffcde0 "", size=8, flags=3)
at build/X86/cpu/simple/timing.cc:409

#12 0x01d4c8ac in X86ISA::readMemTiming (xc=0x3122c30,
traceData=0x0, addr=3255307793404251429,
mem=@0x7fffcde0: 0, dataSize=8, flags=3) at
build/X86/arch/x86/memhelpers.hh:46

#13 0x01d3d3f5 in X86ISAInst::LdBig::initiateAcc (this=0x30c7e10,
xc=0x3122c30, traceData=0x0)
at 

Re: [gem5-users] About UpgradeReq and write hit

2016-02-08 Thread Gongjin Sun
Really thank you, Steve. Next I'll read the comment and related code again,
and I hope to understand more about the working mechanism of multi-level
coherence.

By the way, I found a possible bug again, please help verify it. (I use
se.py)

--Cache::recvTimingResp

 } else {
     // not a cache fill, just forwarding response
     // responseLatency is the latency of the return path
     // from lower level caches/memory to the core.
     completion_time += clockEdge(responseLatency) + pkt->payloadDelay;
     if (pkt->isRead() && !is_error) {
         // sanity check
         assert(pkt->getAddr() == tgt_pkt->getAddr());
         assert(pkt->getSize() >= tgt_pkt->getSize());

         tgt_pkt->setData(pkt->getConstPtr<uint8_t>());



The problematic line is:
assert(pkt->getAddr() == tgt_pkt->getAddr());

At first I never hit this assert failure, because none of my applications
enters this "else" branch; gem5's default behavior is usually to fill (that
is, is_fill is true). But when I modified some code so that gem5 does not
fill data at some cache level (for example, when a ReadReq misses in L1 but
hits in L2, the returned data is sent to the CPU directly and not filled
into L1), execution enters this branch, and I got the assert failure. After
debugging the request and response path, I found the following (I have a
three-level cache hierarchy):

When the CPU's ReadReq arrives at L1 and misses, it is allocated into L1's
MSHR. The MSHR entry uses the BLOCK address; however, the MSHR target's
member 'pkt' does not necessarily hold a block-aligned address. The key
point is that when we then call getBusPacket(), we generate a new request
packet with a block-aligned address:

--getBusPacket()--
PacketPtr pkt = new Packet(cpu_pkt->req, cmd, blkSize);

 // the packet should be block aligned
 assert(pkt->getAddr() == blockAlign(pkt->getAddr()));



See, that's why when the returned response packet arrives at L1, its
address (pkt->getAddr()) cannot equal the target packet's address
(tgt_pkt->getAddr()).

This is just what I observed, so please help verify it. Thank you.
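The mismatch can be reproduced with plain arithmetic (an illustrative sketch; the 64-byte line size and the helper name are assumptions, not gem5 code): the bus packet built in getBusPacket() carries the block address, while the CPU-side target packet keeps the original byte address of the load/store.

```cpp
#include <cassert>
#include <cstdint>

// Assumed cache line size for the example.
constexpr uint64_t kBlkSize = 64;

// Clear the low log2(kBlkSize) bits, like gem5's blockAlign().
constexpr uint64_t blockAlign(uint64_t addr) {
    return addr & ~(kBlkSize - 1);
}
```

For a CPU read at 0x1234 with 64-byte lines, the MSHR's bus packet carries 0x1200, so when the response comes back its address differs from the target packet's 0x1234 and the strict equality assert fires.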

Another thing: I noticed that gem5 accesses the cache with a physical
address. Why doesn't it use a virtual one? As I remember, Virtually
Indexed, Physically Tagged (VIPT) seems to be a common implementation. If I
want to observe the cache's access behavior by virtual address, how can I
change the configuration? I didn't find a way. (I noticed the class
'Request' has four constructors, but only one is related to virtual
addresses, and I didn't see much use of that constructor.)
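One reason a physically indexed model usually behaves the same as VIPT hardware: when the index bits fit entirely inside the page offset, virtual and physical addresses select the same set, so the indexing choice is invisible. A back-of-the-envelope check (assumed 4 KiB pages; illustrative, not gem5 code):

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size for the example.
constexpr uint64_t kPageSize = 4096;

// A VIPT cache is alias-free when sets * line_size <= page_size,
// i.e. all index bits lie within the page offset, so virtual and
// physical indexing pick the same set.
constexpr bool viptAliasFree(uint64_t numSets, uint64_t lineSize) {
    return numSets * lineSize <= kPageSize;
}
```

For example, a 32 KiB 8-way cache with 64-byte lines has 64 sets, and 64 * 64 = 4096 bytes fits in a 4 KiB page, so modeling it with physical addresses gives the same set selection as VIPT hardware.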

Best regards
gjins

On Mon, Feb 8, 2016 at 10:53 AM, Steve Reinhardt <ste...@gmail.com> wrote:

> The O state normally is not a writable state (that's what differentiates
> it from M).  The description in the wikipedia article is not very good; I
> suggest reading about MOESI from a textbook or some other source you may
> have access to.
>
> The gem5 protocol is a little unusual in how it handles states across
> different levels in a multi-level hierarchy, but that's covered in the
> comment I pointed you at previously.
>
> Steve
>
> On Sun, Feb 7, 2016 at 11:14 PM Gongjin Sun <gongj...@uci.edu> wrote:
>
>> Sorry, there is a typo: "on onwer exists" should be "no owner exists".
>>
>> I think more, and still can't understand why 'O' state has a "dirty" set
>> but can't be "writable". This owner has made changes to this line, but is
>> not "writable". That sounds like a contradiction. Or did I miss something?
>>
>> Thanks
>>
>> On Sun, Feb 7, 2016 at 11:03 PM, Gongjin Sun <gongj...@uci.edu> wrote:
>>
>>> Thank you, Steve.  But I'm still a little confused.
>>>
>>> For the "A write hit implies that a cache has an exclusive copy". If a
>>> miss happens at all cache levels, gem5 will bring this data line from
>>> memory to L3 to L2 to L1, level by level. Now this line has three copies
>>> and its state should be shared (clean). Next if a demand write request
>>> arrives at L1, it will hit. So now how can we handle the copies in L2 and
>>> L3? We can invalidate them, or propagate this line from L1 to l2 and l3 and
>>> make its state become shared(dirty) ??
>>>
>>> Also after I read the comments in CacheBlk::print(), I think gem5's
>>> MOESI looks like not a standard one compared with the MOESI from wikipedia:
>>> http

Re: [gem5-users] About UpgradeReq and write hit

2016-02-08 Thread Gongjin Sun
Oh no... That sounds like bad news to me ...

So does that mean I have to add a new path to handle the cache-bypass
situation? Is there an easy way to do so? Steve, you are so experienced,
and I'd really appreciate it if you could give me some suggestions or hints
about this.

Best regards
gjins

On Mon, Feb 8, 2016 at 1:03 PM, Steve Reinhardt <ste...@gmail.com> wrote:

> The "not fill data" path is really intended only for accesses to
> uncacheable memory.  It's not designed to be used for cache bypass
> operations for coherent cacheable data.
>
> Steve
>
>
> On Mon, Feb 8, 2016 at 11:40 AM Gongjin Sun <gongj...@uci.edu> wrote:
>
>> I have to supplement an important thing:
>> After I changed "assert(pkt->getAddr() == tgt_pkt->getAddr());" into
>> "assert(pkt->getAddr() == blockAlign(tgt_pkt->getAddr()));" to pass this
>> assert check, I got a 'SIGABRT' signal (after a while of running) and my
>> program exited:
>>
>> Program received signal SIGABRT, Aborted.
>>
>> I analysed this problem but still feel it should not be caused by my
>> change, because logically I didn't make any mistake in the address
>> analysis of pkt and tgt_pkt above. So I believe my change merely exposes
>> some other problem. Please help check that. Thank you so much.
>>
>>
>> 
>>
>> Some of the back trace are:
>>
>> 0x75e7d657 in __GI_raise (sig=sig@entry=6) at
>> ../sysdeps/unix/sysv/linux/raise.c:55
>> 0x75e7ea2a in __GI_abort () at abort.c:89
>> 0x010a9eb6 in __exit_epilogue (code=-1,
>> func=0x1da9bbe <X86ISA::PageFault::invoke(ThreadContext*,
>> RefCountingPtr const&)::__FUNCTION__> "invoke",
>> file=0x1da99ad "build/X86/arch/x86/faults.cc", line=160, format=0x1da9a40
>> "Tried to %s unmapped address %#x.\n")
>> at build/X86/base/misc.cc:94
>> 0x00d42d8f in __exit_message
>> (prefix=0x1da99ca "panic", code=-1,
>> func=0x1da9bbe <X86ISA::PageFault::invoke(ThreadContext*,
>> RefCountingPtr const&)::__FUNCTION__> "invoke",
>> file=0x1da99ad "build/X86/arch/x86/faults.cc", line=160, format=0x1da9a40
>> "Tried to %s unmapped address %#x.\n")
>> at build/X86/base/misc.hh:81
>> 0x00d414fb in X86ISA::PageFault::invoke (this=0x3c29d70,
>> tc=0x31ab0e0, inst=...) at build/X86/arch/x86/faults.cc:160
>> 0x00dcd3f3 in BaseSimpleCPU::advancePC (this=0x3196340,
>> fault=...) at build/X86/cpu/simple/base.cc:532
>> 0x00dc5774 in TimingSimpleCPU::advanceInst (this=0x3196340,
>> fault=...) at build/X86/cpu/simple/timing.cc:578
>> 0x00dc40e7 in TimingSimpleCPU::translationFault (this=0x3196340,
>> fault=...) at build/X86/cpu/simple/timing.cc:331
>> 0x00dc4e90 in TimingSimpleCPU::finishTranslation (this=0x3196340,
>> state=0x3c2a040) at build/X86/cpu/simple/timing.cc:497
>> 0x00dc8f39 in DataTranslation<TimingSimpleCPU*>::finish
>> (this=0x3c29e80, fault=..., req=0x3c29cf0, tc=0x31ab0e0,
>> mode=BaseTLB::Read) at build/X86/cpu/translation.hh:244
>> 0x00d806ab in X86ISA::TLB::translateTiming (this=0x312f380,
>> req=0x3c29cf0, tc=0x31ab0e0, translation=0x3c29e80,
>> mode=BaseTLB::Read) at build/X86/arch/x86/tlb.cc:429
>> 0x00dc477d in TimingSimpleCPU::readMem (this=0x3196340, addr=16,
>> data=0x7fffd2c0 "", size=8, flags=4)
>> at build/X86/cpu/simple/timing.cc:409
>> 0x01d47c5c in X86ISA::readMemTiming (xc=0x31964a0,
>> traceData=0x0, addr=16, mem=@0x7fffd2c0: 0, dataSize=8,
>> flags=4) at build/X86/arch/x86/memhelpers.hh:46
>> 0x01d387a5 in X86ISAInst::LdBig::initiateAcc (this=0x3c29f70,
>> xc=0x31964a0, traceData=0x0)
>> at build/X86/arch/x86/generated/exec-ns.cc.inc:19231
>> 0x00dc5a86 in TimingSimpleCPU::completeIfetch (this=0x3196340,
>> pkt=0x3c29df0) at build/X86/cpu/simple/timing.cc:619
>> 0x00dc5e89 in TimingSimpleCPU::IcachePort::ITickEvent::process
>> (this=0x3196708) at build/X86/cpu/simple/timing.cc:666
>> 0x00ebf808 in EventQueue::serviceOne (this=0x2fc1cf0) at
>> build/X86/sim/eventq.cc:221
>>
>> ...
>>
>> Best
>>
>>
>>
>>
>> On Mon, Feb 8, 2016 at 11:29 AM, Gongjin Sun <gongj...@uci.edu> wrote:
>>
>>> Really thank you Steve, next I'll read the comment and related code
>>> again, and hope can underst

Re: [gem5-users] About UpgradeReq and write hit

2016-02-08 Thread Gongjin Sun
I have to supplement an important thing:
After I changed "assert(pkt->getAddr() == tgt_pkt->getAddr());" into
"assert(pkt->getAddr() == blockAlign(tgt_pkt->getAddr()));" to pass this
assert check, I got a 'SIGABRT' signal (after a while of running) and my
program exited:

Program received signal SIGABRT, Aborted.

I analysed this problem but still feel it should not be caused by my
change, because logically I didn't make any mistake in the address analysis
of pkt and tgt_pkt above. So I believe my change merely exposes some other
problem. Please help check that. Thank you so much.



Some of the back trace are:

0x75e7d657 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:55
0x75e7ea2a in __GI_abort () at abort.c:89
0x010a9eb6 in __exit_epilogue (code=-1,
func=0x1da9bbe <X86ISA::PageFault::invoke(ThreadContext*,
RefCountingPtr const&)::__FUNCTION__> "invoke",
file=0x1da99ad "build/X86/arch/x86/faults.cc", line=160, format=0x1da9a40
"Tried to %s unmapped address %#x.\n")
at build/X86/base/misc.cc:94
0x00d42d8f in __exit_message
(prefix=0x1da99ca "panic", code=-1,
func=0x1da9bbe <X86ISA::PageFault::invoke(ThreadContext*,
RefCountingPtr const&)::__FUNCTION__> "invoke",
file=0x1da99ad "build/X86/arch/x86/faults.cc", line=160, format=0x1da9a40
"Tried to %s unmapped address %#x.\n")
at build/X86/base/misc.hh:81
0x00d414fb in X86ISA::PageFault::invoke (this=0x3c29d70,
tc=0x31ab0e0, inst=...) at build/X86/arch/x86/faults.cc:160
0x00dcd3f3 in BaseSimpleCPU::advancePC (this=0x3196340, fault=...)
at build/X86/cpu/simple/base.cc:532
0x00dc5774 in TimingSimpleCPU::advanceInst (this=0x3196340,
fault=...) at build/X86/cpu/simple/timing.cc:578
0x00dc40e7 in TimingSimpleCPU::translationFault (this=0x3196340,
fault=...) at build/X86/cpu/simple/timing.cc:331
0x00dc4e90 in TimingSimpleCPU::finishTranslation (this=0x3196340,
state=0x3c2a040) at build/X86/cpu/simple/timing.cc:497
0x00dc8f39 in DataTranslation<TimingSimpleCPU*>::finish
(this=0x3c29e80, fault=..., req=0x3c29cf0, tc=0x31ab0e0,
mode=BaseTLB::Read) at build/X86/cpu/translation.hh:244
0x00d806ab in X86ISA::TLB::translateTiming (this=0x312f380,
req=0x3c29cf0, tc=0x31ab0e0, translation=0x3c29e80,
mode=BaseTLB::Read) at build/X86/arch/x86/tlb.cc:429
0x00dc477d in TimingSimpleCPU::readMem (this=0x3196340, addr=16,
data=0x7fffd2c0 "", size=8, flags=4)
at build/X86/cpu/simple/timing.cc:409
0x01d47c5c in X86ISA::readMemTiming (xc=0x31964a0,
traceData=0x0, addr=16, mem=@0x7fffd2c0: 0, dataSize=8,
flags=4) at build/X86/arch/x86/memhelpers.hh:46
0x01d387a5 in X86ISAInst::LdBig::initiateAcc (this=0x3c29f70,
xc=0x31964a0, traceData=0x0)
at build/X86/arch/x86/generated/exec-ns.cc.inc:19231
0x00dc5a86 in TimingSimpleCPU::completeIfetch (this=0x3196340,
pkt=0x3c29df0) at build/X86/cpu/simple/timing.cc:619
0x00dc5e89 in TimingSimpleCPU::IcachePort::ITickEvent::process
(this=0x3196708) at build/X86/cpu/simple/timing.cc:666
0x000000ebf808 in EventQueue::serviceOne (this=0x2fc1cf0) at
build/X86/sim/eventq.cc:221

...

Best




On Mon, Feb 8, 2016 at 11:29 AM, Gongjin Sun <gongj...@uci.edu> wrote:

> Really thank you, Steve. Next I'll read the comment and related code
> again, and I hope to understand more about the working mechanism of
> multi-level coherence.
>
> By the way, I found a possible bug again, please help verify it. (I use
> se.py)
>
>
> --Cache::recvTimingResp
>
>  } else {
>      // not a cache fill, just forwarding response
>      // responseLatency is the latency of the return path
>      // from lower level caches/memory to the core.
>      completion_time += clockEdge(responseLatency) + pkt->payloadDelay;
>      if (pkt->isRead() && !is_error) {
>          // sanity check
>          assert(pkt->getAddr() == tgt_pkt->getAddr());
>          assert(pkt->getSize() >= tgt_pkt->getSize());
>
>          tgt_pkt->setData(pkt->getConstPtr<uint8_t>());
>
>
> 
>
> The problematic line is :
> assert(pkt->getAddr() == tgt_pkt->getAddr());
>
> At first I never hit this assert failure, because none of my applications
> enters this "else" branch; gem5's default behavior is usually to fill
> (that is, is_fill is true).
> But when I modified some code and asked gem5 not to fill data at some
> cache level (for example, when a ReadReq misses in L1 but hits in L2, the
> returned 

[gem5-users] About UpgradeReq and write hit

2016-02-07 Thread Gongjin Sun
Hi All,

Does anyone know the function of the request called "UpgradeReq"? Under
what circumstances is this request generated? After it is sent to other
cache levels, what happens at those levels? There are very few comments
about it. According to its use, I guess it is related to write misses, but
I'm not sure about its specific function.

In addition, I noticed that when a "write hit" happens at a cache level,
that cache does NOT send an invalidate message to its lower levels (closer
to memory) to invalidate the line's other copies. Is that correct? (Note:
this cache's upper level (closer to the CPU) definitely doesn't contain the
line; otherwise there would have been a write hit at that upper level
rather than at this one.)

Thank you in advance

Best
gjins

Re: [gem5-users] About UpgradeReq and write hit

2016-02-07 Thread Gongjin Sun
Thank you, Steve.  But I'm still a little confused.

Regarding "A write hit implies that a cache has an exclusive copy": if a
miss happens at all cache levels, gem5 brings the data line from memory to
L3 to L2 to L1, level by level. Now the line has three copies and its state
should be shared (clean). Next, if a demand write request arrives at L1, it
hits. So how do we handle the copies in L2 and L3? Do we invalidate them,
or propagate the line from L1 to L2 and L3 and make its state shared
(dirty)?

Also, after reading the comments in CacheBlk::print(), I think gem5's MOESI
is not the standard MOESI described on Wikipedia:
https://en.wikipedia.org/wiki/MOESI_protocol

gem5's MOESI is:

state   writable   dirty   valid
M       1          1       1
O       0          1       1
E       1          0       1
S       0          0       1
I       0          0       0
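The table rows can be written directly as a function of the three status bits (an illustrative sketch of the mapping, not gem5's actual CacheBlk class):

```cpp
#include <cassert>
#include <string>

// Derive the MOESI state name from (writable, dirty, valid), exactly as
// in the table above: invalid -> I; writable splits M/E on dirty;
// non-writable splits O/S on dirty.
std::string moesiState(bool writable, bool dirty, bool valid) {
    if (!valid)
        return "I";
    if (writable)
        return dirty ? "M" : "E";
    return dirty ? "O" : "S";
}
```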

For a shared block, according to Wikipedia's explanation, it can be "dirty"
(here "dirty" is with respect to memory); we may have several modified
copies. But gem5 treats shared blocks as all clean and not writable. Does
this mean on onwer exists for shared blocks? In addition, why can't an
Owned block be "writable"? It's an owner, right?

I'm so confused. Hope you can help me more. Thank you so much.

gjins


On Sun, Feb 7, 2016 at 10:28 PM, Steve Reinhardt <ste...@gmail.com> wrote:

> Upgrade requests are used on a write to a shared copy, to upgrade that
> copy's state from shared (read-only) to writable. They're generally treated
> as invalidations.
>
> A write hit implies that a cache has an exclusive copy, so it knows that
> there's no need to send invalidations to lower levels.  There are some
> relevant comments on the block states in the CacheBlk::print() method
> definition in src/mem/cache/blk.hh.
>
> Steve
>
>
> On Sun, Feb 7, 2016 at 4:04 PM Gongjin Sun <gongj...@uci.edu> wrote:
>
>> Hi All,
>>
>> Does any know the function of the request called "UpgradeReq"?  Under
>> what circumstance will this request be generated? After this request is
>> sent to other cache levels, what will happen to that level? There are so
>> few comments about it. Accord to its use, I guess it is related to write
>> miss. But I'm not sure about the specific functions.
>>
>> In addition, I noticed that when a "write hit" happens in a cache level,
>> this cache will NOT send an invalidate message to its lower levels (closer
>> to mem) to invalidate this line's other copies. Is that correct? (Note: now
>> this cache's upper level (closer to cpu) definitely doesn't contain this
>> line, otherwise there must a write hit in that upper level rather than this
>> cache level.)
>>
>> Thank you in advance
>>
>> Best
>> gjins

Re: [gem5-users] About UpgradeReq and write hit

2016-02-07 Thread Gongjin Sun
Sorry, there is a typo: "on onwer exists" should be "no owner exists".

I thought about it more and still can't understand why the 'O' state has
"dirty" set but isn't "writable". This owner has made changes to the line
but is not "writable"; that sounds like a contradiction. Or did I miss
something?

Thanks

On Sun, Feb 7, 2016 at 11:03 PM, Gongjin Sun <gongj...@uci.edu> wrote:

> Thank you, Steve.  But I'm still a little confused.
>
> For the "A write hit implies that a cache has an exclusive copy". If a
> miss happens at all cache levels, gem5 will bring this data line from
> memory to L3 to L2 to L1, level by level. Now this line has three copies
> and its state should be shared (clean). Next if a demand write request
> arrives at L1, it will hit. So now how can we handle the copies in L2 and
> L3? We can invalidate them, or propagate this line from L1 to l2 and l3 and
> make its state become shared(dirty) ??
>
> Also after I read the comments in CacheBlk::print(), I think gem5's MOESI
> looks like not a standard one compared with the MOESI from wikipedia:
> https://en.wikipedia.org/wiki/MOESI_protocol
>
> gem5's MOESI is:
>
> state   writable   dirty   valid
> M       1          1       1
> O       0          1       1
> E       1          0       1
> S       0          0       1
> I       0          0       0
>
> For a shared block, according to the explanation of wikipedia, they can be
> "dirty" (Here the 'dirty" is with respect to memory), We probably have
> several modified copies. But gem5 think they are all clean and can't be
> written. Does this mean on onwer exists for shared blocks? .  In addition,
> why can't a Owned block be "writable"? It's a owner, right?
>
> I'm so confused. Hope you can help me more. Thank you so much.
>
> gjins
>
>
> On Sun, Feb 7, 2016 at 10:28 PM, Steve Reinhardt <ste...@gmail.com> wrote:
>
>> Upgrade requests are used on a write to a shared copy, to upgrade that
>> copy's state from shared (read-only) to writable. They're generally treated
>> as invalidations.
>>
>> A write hit implies that a cache has an exclusive copy, so it knows that
>> there's no need to send invalidations to lower levels.  There are some
>> relevant comments on the block states in the CacheBlk::print() method
>> definition in src/mem/cache/blk.hh.
>>
>> Steve
>>
>>
>> On Sun, Feb 7, 2016 at 4:04 PM Gongjin Sun <gongj...@uci.edu> wrote:
>>
>>> Hi All,
>>>
>>> Does any know the function of the request called "UpgradeReq"?  Under
>>> what circumstance will this request be generated? After this request is
>>> sent to other cache levels, what will happen to that level? There are so
>>> few comments about it. Accord to its use, I guess it is related to write
>>> miss. But I'm not sure about the specific functions.
>>>
>>> In addition, I noticed that when a "write hit" happens in a cache level,
>>> this cache will NOT send an invalidate message to its lower levels (closer
>>> to mem) to invalidate this line's other copies. Is that correct? (Note: now
>>> this cache's upper level (closer to cpu) definitely doesn't contain this
>>> line, otherwise there must a write hit in that upper level rather than this
>>> cache level.)
>>>
>>> Thank you in advance
>>>
>>> Best
>>> gjins

[gem5-users] What's the function of itb_walker_cache and dtb_walker_cache?

2016-01-27 Thread Gongjin Sun
Hi All,

I'm learning gem5's code structure but don't know what
"itb_walker_cache/dtb_walker_cache" are. According to their names, they do
not seem to be the itlb/dtlb themselves. So what exactly is the so-called
"walker_cache"?

Any explanations are appreciated.

gjins