[gem5-users] SLICC/Ruby (Mesh topology): L2 between directory and DRAM with same address ranges as the directory

2021-02-25 Thread tolausso--- via gem5-users
Hey there folks,

I am trying to add an L2 between the directory and DRAM in an (otherwise flat)
SLICC protocol I've been working on, but have been running into some issues. I
know that some of the example protocols in src/mem/ruby/protocol/ do have 
co-located L3s alongside the directories, but to simplify the state machine of 
the directory I would really like to keep the controllers separate. As far as 
I've been able to find, there aren't any example protocols which connect a 
cache directly to the memory. If anyone has worked on similar things and has 
pointers to examples I could have a look at, I would be very grateful!

Some more thorough information about what I'm trying to do and what I've tried 
so far:

-- Background --
The system I am hoping to create is a mesh of nodes, where each node has a CPU, 
a private L1, and a pair consisting of a directory and an L2 that is 
responsible for a subset of the address space. So, if a CPU makes a request 
that cannot be satisfied by its L1, it sends a message to a directory using the 
mapAddressToMachine function. Depending on the address of the request, the 
message will be routed through the mesh to the corresponding directory. So far 
so good: setting this up has been easy thanks to the Mesh_XY topology and the 
setup_memory_controllers() function in configs/ruby/Ruby.py.

My woes come from the fact that, instead of connecting the directory to memory, I now
want the directory to send its main-memory requests to an L2 instead, and to 
then have that L2 connected to main memory. The L2 has a simple Valid/Invalid 
state design, since it effectively just serves as a DRAM cache (i.e. the 
directory is responsible for upholding SWMR). Unlike a DRAM cache, however, I 
want the L2 to be co-located with the directory, so that if the directory at node
/n/ makes a request using mapAddressToMachine(..., MachineType:L2Cache) then 
the target L2 will also be on node /n/.
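
To make the intent concrete, the directory-side forwarding I have in mind looks
roughly like the sketch below. The out_port and request-type names are
placeholders rather than my actual code; the important part is the
mapAddressToMachine call, which (given matching addr_ranges) should resolve to
the L2 sitting on the same node as the directory.

```
// Placeholder names throughout -- this is a sketch of the intent, not my code.
// Because the L2s get the same addr_ranges as the directories,
// mapAddressToMachine should pick the L2 co-located with this directory.
action(forwardToL2, "fL2", desc="Forward a memory fetch to the co-located L2") {
  enqueue(memRequest_out, RequestMsg, 1) {     // memRequest_out: placeholder out_port
    out_msg.addr := address;
    out_msg.Type := CoherenceRequestType:GetM; // placeholder request type
    out_msg.Requestor := machineID;
    out_msg.Destination.add(mapAddressToMachine(address,
                                                MachineType:L2Cache));
    out_msg.MessageSize := MessageSizeType:Control;
  }
}
```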

-- Approach --
To set this up, I have taken some of the code from setup_memory_controllers() 
and modified it so that the memory controllers are connected to the L2s and so 
that the L2s and the directories have the same addr_ranges. I have tried both
manually generating the addr_ranges using the m5.objects.AddrRange()
constructor and setting them equal to the addr_ranges of the constructed
DRAM controllers, without success in either case. This can be seen in my config file for the
protocol: 
https://gist.githubusercontent.com/theoxo/56d35e7a38a01155029748199c1ac7c9/raw/fe031542188ecfbfc41a791b91756d975777dae9/gistfile1.txt
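
For convenience, the core of what I changed boils down to something like the
following. This is a heavily simplified sketch with made-up variable names
(dir_cntrls, l2_cntrls, options, system), and it assumes the gem5 20.x-style
split between MemCtrl and the DRAM interface; the full version is in the gist
above.

```
# Simplified sketch: interleave memory across the nodes exactly as the
# directories do, give each L2 the same addr_ranges as its directory, and hang
# the DRAM controller off the L2 instead of the directory.
import math
from m5.objects import AddrRange, MemCtrl, DDR3_1600_8x8

block_size_bits = 6                              # 64-byte cache lines
intlv_bits = int(math.log(len(dir_cntrls), 2))   # dir_cntrls/l2_cntrls: my controller lists

mem_ctrls = []
for i, (dir_cntrl, l2_cntrl) in enumerate(zip(dir_cntrls, l2_cntrls)):
    # Node i owns every cache line whose interleaving bits equal i.
    addr_range = AddrRange(0, size=options.mem_size,
                           intlvHighBit=block_size_bits + intlv_bits - 1,
                           intlvBits=intlv_bits,
                           intlvMatch=i)
    dir_cntrl.addr_ranges = [addr_range]
    l2_cntrl.addr_ranges = [addr_range]          # same slice as the directory

    # The memory controller now hangs off the L2's memory port rather than the
    # directory's.
    mem_ctrl = MemCtrl(dram=DDR3_1600_8x8(range=addr_range))
    l2_cntrl.memory = mem_ctrl.port
    mem_ctrls.append(mem_ctrl)

system.mem_ctrls = mem_ctrls
```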

-- Problem --
Unfortunately, this doesn't really seem to work. I've been testing in SE mode 
with the "threads" test program and while it does successfully run for some 
time, I eventually encounter the following error:

> panic: Tried to read unmapped address 0.
> PC: 0x7890, Instr:   ADD_M_R : ldst   t1b, DS:[rax]

As far as I understand, this means that my attempt at setting up the 
addr_ranges is failing? My understanding of gem5 internals is unfortunately 
quite shallow, so I am struggling to decode more than that from the error 
message.

Sorry about the long email – if you recognize any of these issues from
similar systems you've configured yourself, or know of any pointers to example
protocols that are at all similar, please do let me know!

Best,
Theo Olausson
Univ. of Edinburgh

[gem5-users] Re: SLICC: Main memory overwhelmed by requests?

2020-08-24 Thread tolausso--- via gem5-users
Thank you (once again) for your helpful answers, Jason!

After having done some more experimenting following your suggestions, I've 
found that increasing the deadlock threshold (by several orders of magnitude) 
does not make the problem go away, nor does only increasing the number of 
memory channels to 4. Nonetheless, I suspect your gut feeling that bandwidth
problems are to blame still holds, as increasing the DRAM size to 8192MB
makes the "deadlock" go away and is accompanied by the following message:

> warn: Physical memory size specified is 8192MB which is greater than 3GB.  
> *Twice the number of memory controllers would be created.*

More memory controllers => less congestion in the memory controller(s) – makes 
sense to me!

I wonder if the underlying issue has more to do with the cache hierarchy of my 
system (e.g. no L2 cache, only small L1s) than with the protocol itself. Either
way, the band-aid solution I've found is good enough for my current purposes :)

Thanks again for your help Jason!

Best,
Theo

[gem5-users] SLICC: Main memory overwhelmed by requests?

2020-08-21 Thread tolausso--- via gem5-users
Hi all,

I am trying to run a Linux kernel in FS mode, with a custom-rolled SLICC/Ruby 
directory-based cache coherence protocol, but it seems like the memory 
controller is dropping some requests in rare circumstances -- possibly due to 
it being overwhelmed with requests.

The protocol seems to work fine for a long time but about 90% of the way into 
booting the kernel, around the same time as the "mounting filesystems..." 
message appears, gem5 crashes and reports a deadlock.
Inspecting the trace, it seems that the deadlock occurs during a period of very 
high main memory traffic; the trace looks something like this:
> Directory receives DMA read request for Address 1, sends MEMORY_READ to memory controller
> Directory receives DMA read request for Address 2, sends MEMORY_READ to memory controller
> ...
> Directory receives DMA read request for Address N, sends MEMORY_READ to memory controller
> Directory receives CPU read request for Address A, sends MEMORY_READ to memory controller

After some time, the Directory receives responses for all of the DMA-induced 
requests (Address 1...N). However, it never hears back about the MEMORY_READ to 
Address A, and so eventually gem5 calls it a day and reports a deadlock. 
Address A is distinct from addresses 1..N and its read should therefore not be 
affected by the requests to the other addresses.

I have tried:
* Using the same kernel with one of the example SLICC protocols 
(MOESI_CMP_directory). No error occurred, so the underlying issue must be with 
my protocol.
* Upping the memory size to 8192MB (from 512MB) and increasing the number of 
channels to 4 (from 1). Under this configuration the above issue does not 
occur, and the Linux kernel happily finishes booting. This, combined with the
fact that it takes so long for any issues to occur, makes me think that my
protocol is somehow overwhelming the memory controller, causing it to drop the
request to read Address A. In other words, I am pretty confident that the error
is not something as simple as, say, forgetting to pop the memory queue.

If anyone has any clues as to what might be going on I would very much 
appreciate your comments.
I was especially wondering about the following:
* Is it even possible for requests to main memory to fail due to, for example,
network congestion? If so, is there any way to catch this and retry the request?
* (Noob question): Where in gem5 do the main memory requests "go to"? Is there 
a debugging flag I could use to check whether the main memory receives the 
request?

Best,
Theo Olausson
Univ. of Edinburgh


[gem5-users] Re: Stores always cause SC_Failed in Ruby/SLICC protocol

2020-07-04 Thread tolausso--- via gem5-users
Hi Jason,

Thank you for your very helpful (and prompt) reply!

You were right that the SC_Failed was a red herring.
After playing around with my protocol a bit more, the issue seems to have been 
that I was making the callback for load and store hits (e.g. 
`sequencer.{x}Callback(address, entry, false)`) directly in the 
mandatoryQueue_in definition, rather than invoking a transition which then
made the callback -- it appears this makes the callback silently fail.
What's a bit strange is that callbacks for external hits (e.g.
`sequencer.{x}Callback(address, entry, true, {data-source})`) seem to work just
fine when you declare them directly in an in_port rather than as part of an
invoked transition... I'm not sure whether this is because the mandatoryQueue
is a bit special, or because of the "initial access was a miss" flag.
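
In case it helps anyone searching the archives later, the change was
essentially the following. This is a sketch using the gem5-book-style MSI names
(mandatory_request_type_to_event, getCacheEntry, popMandatoryQueue, DataBlk)
rather than my exact code:

```
// Before: sequencer.writeCallback(...) was issued straight from the in_port
// body, which is what appeared to silently fail for local load/store hits.
// After: the in_port only triggers an event, and the transition's action
// performs the callback.
in_port(mandatoryQueue_in, RubyRequest, mandatoryQueue) {
  if (mandatoryQueue_in.isReady(clockEdge())) {
    peek(mandatoryQueue_in, RubyRequest, block_on="LineAddress") {
      Entry cache_entry := getCacheEntry(in_msg.LineAddress);
      TBE tbe := TBEs[in_msg.LineAddress];
      trigger(mandatory_request_type_to_event(in_msg.Type),
              in_msg.LineAddress, cache_entry, tbe);
    }
  }
}

action(localStoreHit, "lsh", desc="Store hit: write line and notify sequencer") {
  assert(is_valid(cache_entry));
  cache.setMRU(cache_entry);
  sequencer.writeCallback(address, cache_entry.DataBlk, false);
}

// The transition for a store hitting in M simply invokes the action (plus the
// usual action that pops the mandatory queue).
transition(M, Store) {
  localStoreHit;
  popMandatoryQueue;
}
```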

Thank you once again for taking the time to help out a less experienced 
fisherman :)

Best,
Theo


[gem5-users] Stores always cause SC_Failed in Ruby/SLICC protocol

2020-07-03 Thread tolausso--- via gem5-users
Hi all,

I am trying to learn how to implement cache coherence protocols in gem5 using 
SLICC.
I am currently working on an MSI protocol, similar to the one described in the 
gem5 book.
The protocol passes the random tester for X86 
(`configs/learning_gem5/part3/ruby_test.py`), even when faced with a very large 
workload (4+ cores, 100k+ accesses).
It however does not pass the tester which executes the pre-compiled "threads" 
binary (`configs/learning_gem5/part3/simple_ruby.py`), citing a deadlock.

Inspecting the generated error trace, I find no obvious reason for a deadlock
(e.g. repeating sequences of messages). This, combined with the fact that the
random tester is unable to find any issues, leads me to think the error is not
caused by something like improper allocation of messages to the different
networks creating circular dependencies. Instead, the error trace shows that
Store events are always followed by "SC_Failed" instead of "Done", which I
presume means "Store Conditional Failed". I take it that the X86 "threads"
binary uses Load-Link/Store-Conditional to implement some
mutex/synchronization.

Consider the following section of the error trace:
```
533000   0   Seq   Begin       >   [0x2b9a8, line 0x2b980] ST
534000: system.caches.controllers0: MSI-cache.sm:1072: Store in state I at line 0x2b980
... *cache0 and directory transition to M* ...
585000   0   Seq   SC_Failed   >   [0x2b9a8, line 0x2b980] 0 cycles
586000   0   Seq   Begin       >   [0x9898, line 0x9880] IFETCH   -- Note this load is to a line separate from the stores
587000   0   Seq   Done        >   [0x9898, line 0x9880] 0 cycles
588000   0   Seq   Begin       >   [0x2b998, line 0x2b980] ST   -- Store to same line as before
589000: system.caches.controllers0: MSI-cache.sm:1072: Store in state M at line 0x2b980
589000   0   Seq   SC_Failed   >   [0x2b998, line 0x2b980] 0 cycles
589000   0   L1Cache   store_hit   M>M   [0x2b980, line 0x2b980]
```

In this short trace we first see a store to line 0x2b980, which is not present 
in the cache. This finishes with the event "SC_Failed", which seems reasonable 
to me given that the store required a coherence transaction. We then see a load 
to an irrelevant line, which does not evict the line 0x2b980. Finally we see 
another store to line 0x2b980, which this time hits in M state, yet it is once 
again followed by SC_Failed instead of Done. I also find it a bit weird that
SC_Failed is reported before the store_hit event (which is the only event
triggered when the cache receives a ST request to a line in M state) is
reported as having taken place.

My code for handling the store_hit in M state is as follows:
```
assert(is_valid(cache_entry));
cache.setMRU(cache_entry);
sequencer.writeCallback(in_msg.LineAddress, cache_entry.cache_line, false);
mandatory_in.dequeue(clockEdge());
```

I realise my question thus far is a bit vague, which I apologise for. What I am 
hoping is that someone more knowledgeable than me could help me understand the 
following:
1. Is my interpretation of SC_Failed as "Store Conditional Failed" correct? (I 
thought x86 didn't support LL/SC, so this seems a bit fishy to me...)
2. Am I right in thinking that if stores are always followed by SC_Failed, this
might cause a deadlock when executing the "threads" 
(`tests/test-progs/threads/bin/X86/linux/threads`) binary?
3. Any suggestions as to why I might always get SC_Failed, given that, for
example, stores hitting in M only invoke setMRU and writeCallback?

Apologies for the lengthy question!

Best Regards,
Theo Olausson