[gem5-dev] The memory dependence predictor misses dependencies following fences

ASHKAN ASGHARZADEH DONIGHI Tue, 14 Jan 2020 04:04:09 -0800

Hi Everybody,
We are working on atomic instructions in the x86 ISA and how they are handled by gem5.
While running a simple benchmark that executes an Atomic_Increment within a loop for 1M iterations,
we encountered 2 MILLION MEMORY_ORDER_VIOLATIONS, which result in 2 million squashes of a LD_Inst _because of a missed memory dependence with a previous store_!
After delving into the code of "src/cpu/o3/mem_dep_unit_impl.hh", we made the following observations, which explain the above-mentioned problem:
1) The function insert(DynInstPtr &inst) is responsible for inserting the new Inst into the Inst_Queue.
2) If the incoming Inst is a LD_Inst, the insert function checks whether the LD_Inst depends on an in-flight Mem_Barrier or on a preceding ST_Inst.
3) If it does, the function adds the LD_Inst to the dependent_vector_list of that Mem_Barrier or ST_Inst. (THE PROBLEM IS HERE)
THE PROBLEM: by default, when gem5 looks for a producing_store for the LD_Inst, it gives priority to Mem_Barriers, and
only if there is no in-flight Mem_Barrier does it consult the store_set mem_predictor to find the latest preceding store associated with the LD_Inst.
This lookup order results in a large number of Mem_Order_Violations in the following example:
Example benchmark:

for (int i = 0; i < 1M; i++)
{
    /* This assembly represents a simple atomic increment in a loop */

    -- Mem_Fence
    -- Store x
    -- Load x
}
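
For reference, a minimal C++ sketch of such a benchmark could look like the code below. This is an assumption for illustration only: the actual benchmark source is not part of this email, and the name run_benchmark is hypothetical. On x86, compilers typically turn the atomic increment into a LOCK-prefixed instruction, which corresponds to the fence-plus-memory-access pattern shown above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the benchmark: performs `iterations` atomic
// increments on a shared counter x and returns its final value.
int run_benchmark(int iterations) {
    assert(iterations >= 0);
    std::atomic<int> x{0};
    for (int i = 0; i < iterations; ++i) {
        // Sequentially consistent read-modify-write on x.
        x.fetch_add(1, std::memory_order_seq_cst);
    }
    return x.load();
}
```

Calling run_benchmark(1000000) from main corresponds to the 1M-iteration run described here.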

 
Regarding the above snippet, _in theory_ Load x should be _dependent_ on Store x. However, _according to the gem5 implementation_, the Mem_Fence is selected as the only instruction the load depends on, and _the dependence between the load and the store is dropped_.
Making the Load dependent only on the Mem_Fence in the above code, which is our simple benchmark, causes 2 million squashes (i.e., Mem_Order_Violations), which degrades performance significantly.
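
The lookup order described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyMemDepUnit, producerForLoad and loadDependsOn are hypothetical names and not gem5 identifiers, and the store-set predictor is replaced by a simple map keyed by address (the real predictor is indexed differently).

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>

// Toy model only: none of these names exist in gem5.
using SeqNum = unsigned long;
using Addr = unsigned long;

struct ToyMemDepUnit {
    std::optional<SeqNum> pendingBarrier;        // youngest in-flight fence, if any
    std::unordered_map<Addr, SeqNum> lastStore;  // simplified stand-in for the store-set predictor

    void insertBarrier(SeqNum sn) { pendingBarrier = sn; }
    void insertStore(SeqNum sn, Addr a) { lastStore[a] = sn; }

    // Barrier first; the predictor is consulted only when no barrier is in flight.
    std::optional<SeqNum> producerForLoad(Addr a) const {
        if (pendingBarrier) return pendingBarrier;
        auto it = lastStore.find(a);
        if (it != lastStore.end()) return it->second;
        return std::nullopt;
    }
};

// Replays one loop iteration (fence, store x, load x) and returns the
// sequence number of the single instruction the load is made to wait on.
SeqNum loadDependsOn() {
    ToyMemDepUnit unit;
    const Addr x = 0x1000;
    unit.insertBarrier(1);   // Mem_Fence
    unit.insertStore(2, x);  // Store x
    auto producer = unit.producerForLoad(x);  // Load x
    assert(producer.has_value());
    return *producer;
}
```

In this model loadDependsOn() returns the fence's sequence number (1), so the store (2) is never recorded as a producer of the load.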

THE SOLUTION: 
- _Adding the dependence between the store and the load solves the problem_: the number of squashes (i.e., Mem_Order_Violations) drops from 2 million to only 900, and execution time is reduced.
We would like to ask whether our observation regarding how dependencies are added for a LD_Inst in "src/cpu/o3/mem_dep_unit_impl.hh" is correct.
In other words, can the dependence on the store also be added (in addition to the one on the fence) in gem5 with minor modifications?
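
To make the question concrete, here is a toy sketch of the behaviour we are asking about, in which a load records both the in-flight fence and the predicted store as producers. It is not a gem5 patch: ToyMemDepUnitBoth, producersForLoad and loadProducers are hypothetical names, and the store-set predictor is again simplified to a map keyed by address.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>
#include <vector>

// Toy model only: none of these names exist in gem5.
using SeqNum = unsigned long;
using Addr = unsigned long;

struct ToyMemDepUnitBoth {
    std::optional<SeqNum> pendingBarrier;        // youngest in-flight fence, if any
    std::unordered_map<Addr, SeqNum> lastStore;  // simplified stand-in for the store-set predictor

    void insertBarrier(SeqNum sn) { pendingBarrier = sn; }
    void insertStore(SeqNum sn, Addr a) { lastStore[a] = sn; }

    // Collects every producer instead of stopping at the barrier.
    std::vector<SeqNum> producersForLoad(Addr a) const {
        std::vector<SeqNum> producers;
        if (pendingBarrier) producers.push_back(*pendingBarrier);
        auto it = lastStore.find(a);
        if (it != lastStore.end()) producers.push_back(it->second);
        return producers;
    }
};

// Replays one loop iteration (fence, store x, load x) and returns all
// instructions the load would wait on.
std::vector<SeqNum> loadProducers() {
    ToyMemDepUnitBoth unit;
    const Addr x = 0x1000;
    unit.insertBarrier(1);   // Mem_Fence
    unit.insertStore(2, x);  // Store x
    return unit.producersForLoad(x);  // Load x
}
```

Here loadProducers() yields both the fence (1) and the store (2), so the load would not issue before the store it reads from.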

Thanks a lot for reading our email; we appreciate your consideration.

Sincerely,
Ashkan Asgharzadeh, 
  Ph.D. Student at the CS Faculty, 
University of Murcia, Spain

_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
