[m5-dev] Fixing MESI CMP directory protocol

2011-01-04 Thread Nilay Vaish

What threshold do you use?

On Tue, 4 Jan 2011, Arkaprava Basu wrote:


Hi Nilay,

  On the deadlock issue with MESI_CMP_directory:
  Yes, this can happen, as ruby_tester or the Sequencer only reports *possible*
deadlocks. With a higher number of processors there is more contention (and
thus latency), and it can mistakenly report a deadlock. I generally look at the
protocol trace to figure out whether there is actually a deadlock or not.
You can also try doubling the Sequencer deadlock threshold and see if the
problem goes away. If it's a true deadlock, it will break again.
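
To illustrate why a latency-based check can only report *possible* deadlocks, here is a minimal sketch of a timeout heuristic. It is not the actual Sequencer code; the function name and the cycle counts are hypothetical and chosen only for illustration.

```python
def possible_deadlock(issue_cycle: int, current_cycle: int, threshold: int) -> bool:
    """Flag a request that has been outstanding for at least `threshold` cycles.

    The check cannot tell a stuck request from one that is merely slow.
    """
    return current_cycle - issue_cycle >= threshold


# Hypothetical numbers: a request issued at cycle 0 that would complete at
# cycle 600_000 because of heavy contention. It is slow, not deadlocked.
threshold = 500_000
slow_latency = 600_000

# Just before completion the request is flagged under the original threshold...
flagged_default = possible_deadlock(0, slow_latency - 1, threshold)
# ...but not under the doubled threshold, so the report was a false alarm.
flagged_doubled = possible_deadlock(0, slow_latency - 1, 2 * threshold)
```

A request that never completes would still be flagged after the threshold is doubled, which is why a true deadlock "breaks again".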


On a related note, as Brad has pointed out, MESI_CMP_directory has its
share of issues. Recently one of Prof. Sarita Adve's students e-mailed us
(Multifacet) about 6 bugs he found while model checking
MESI_CMP_directory (including a major one). I took some time to look at them,
and it seems MESI_CMP_directory is now fixed (hopefully). The modified
protocol is now passing 1M checks with 16 processors with multiple random
seeds. I can coordinate with you locally on this, if you want.


Thanks
Arka


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Fixing MESI CMP directory protocol

2011-01-04 Thread Arkaprava Basu

These are the steps I use:

1. First run with whatever the default threshold values are.
2. If it deadlocks, take a trace and try to find out whether there is an
evident reason for the deadlock.
3. If not, double the default threshold value and run again.
4. If the same test passes with the larger threshold, then the
deadlock was not actually there. So life is good. If not, I need to dig
further into the trace to see what's going on.
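
The steps above can be sketched as a small triage routine. This is only an illustration under stated assumptions: `run_test` stands in for launching the tester with a given deadlock threshold and returning True when a possible deadlock is reported, and `make_toy_run` is a made-up stand-in for a real simulation. The manual trace inspection in step 2 is not modeled.

```python
from typing import Callable, Optional


def make_toy_run(latency: Optional[int]) -> Callable[[int], bool]:
    """Build a fake test run (hypothetical helper).

    `latency` is the worst-case request latency in cycles; None models a
    request that never completes, i.e. a true deadlock.
    """
    def run(threshold: int) -> bool:
        # True means the run reported a possible deadlock.
        return latency is None or latency >= threshold
    return run


def triage(run_test: Callable[[int], bool], default_threshold: int) -> str:
    """Apply the run, double, re-run procedure and classify the outcome."""
    # Step 1: run with the default threshold.
    if not run_test(default_threshold):
        return "passed"
    # Step 2 (looking at the trace for an evident cause) is manual.
    # Step 3: double the threshold and run again.
    if not run_test(2 * default_threshold):
        # Step 4: the report was caused by latency, not by a deadlock.
        return "false alarm"
    # Still reported: dig further into the trace.
    return "inspect trace"


results = [
    triage(make_toy_run(100), 500),   # completes well within the threshold
    triage(make_toy_run(700), 500),   # slow, but passes once doubled
    triage(make_toy_run(None), 500),  # never completes
]
```

Note that a request slow enough to exceed even the doubled threshold also ends up in the "inspect trace" case, so that outcome means "not yet explained" rather than "proven deadlock".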


@Nilay:
By the end of today, I will share with you the patch that seems to have
fixed the protocol.


Thanks
Arka

On 01/04/2011 12:51 PM, Nilay Vaish wrote:

What threshold do you use?
