Hi Evangele,

If you're feeling impatient and don't want to wait for Stavros to check in the changes, I've attached a patch (based on Stavros' fix) that will eliminate the evictBuffer assertion that you're seeing.

You can apply the patch from the top level of the Flexus source tree with the following command:

patch -p0 < xxxxx.patch




Jason



On 13/10/2010 4:28 PM, Volos Stavros wrote:
Hi Evangele,

During the previous two weeks I ran into the same problems that you
observe. I suspect that the evictBuffer bug is related to a race between
an eviction and an invalidate. I have fixed this bug, but I didn't have
time to check it in because I left for a week off. I will commit my code
on Monday or Tuesday when I return. The Watchdog bug must be related to
the network deadlock that I experienced with the torus topology. If you
try with a mesh, you will not hit this bug. I will work on it when I
return.

Note that I hit these bugs when running some virtualized workloads, not
the usual ones. However, both bugs can be hidden by a different
configuration file, which is why we haven't seen them in the past with
the usual workloads. For example, if you use a bigger evict buffer for
the L1 cache, the bug will be hidden.
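
To illustrate the kind of race described above, here is a minimal toy
sketch. It is not Flexus code: the class, enum, and method names are all
hypothetical, and it only models the idea that an invalidate can arrive
for a line whose eviction is still pending in the evict buffer, in which
case the cache can acknowledge at once instead of treating the hit as an
error.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Toy model only: every name here is hypothetical; none of it is Flexus code.
enum class InvalidateOutcome {
  AckedFromEvictBuffer,  // line was already on its way out; acknowledge at once
  InvalidatedInArray,    // line was resident; drop it and acknowledge
  NotPresent             // nothing to invalidate
};

class ToyL1 {
public:
  // Install a line in the cache array.
  void fill(std::uint64_t addr) { array_.insert(addr); }

  // Move a resident line into the evict buffer and mark the eviction as
  // pending (sent to the next level, not yet acknowledged).
  void startEviction(std::uint64_t addr) {
    if (array_.erase(addr) > 0) evictBuffer_[addr] = true;
  }

  // The next level acknowledged the eviction; free the buffer entry.
  void evictionAcked(std::uint64_t addr) { evictBuffer_.erase(addr); }

  // An invalidate may race with an eviction that is still pending.
  InvalidateOutcome onInvalidate(std::uint64_t addr) {
    auto ev = evictBuffer_.find(addr);
    if (ev != evictBuffer_.end() && ev->second) {
      // Race case: reply immediately rather than asserting.
      return InvalidateOutcome::AckedFromEvictBuffer;
    }
    if (array_.erase(addr) > 0) return InvalidateOutcome::InvalidatedInArray;
    return InvalidateOutcome::NotPresent;
  }

private:
  std::unordered_set<std::uint64_t> array_;              // resident lines
  std::unordered_map<std::uint64_t, bool> evictBuffer_;  // addr -> pending?
};
```

In this sketch, calling startEviction() and then onInvalidate() on the
same address before evictionAcked() exercises the race path.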

Regards,
-Stavros


On Oct 13, 2010, at 10:03 PM, Evangelos Vlachos wrote:

Hi all,

I am in the process of migrating to Flexus 4 and I am currently trying
to verify that this has been done successfully. I wanted to compare some
timing results I got against the results your version is reporting for
the same workload (tpcc_nort) and the same configuration (Pejman
provided one for me, so we can have the same point of reference).

After generating flexpoints and running the timing simulation for 80
flexpoints (0-3:5-24), I observed that 9 of them aborted, with two kinds
of errors in total (see below). Do you have any idea what could be wrong?

One interesting thing to note is that CMU's Simics checkpoint (at
least the one for tpcc_nort's phase000) is slightly "behind" the
one you have at EPFL (the cpu_cycles I observe here is about
200K cycles lower than at EPFL), which means a case that hasn't
been debugged may have occurred.

Any ideas/suggestions?

Regards,
Evangelos

---------

000_013
738<mai_api.cpp:58>  {34101}- CPU[10] Interrupt 4d
739<cycle.cpp:25>  {34106}- 10-uarch Interrupt: 4d pending for 4
740<InclusiveMESI.cpp:1735>  {37152}- 11-L1d Assertion failed: ((!
(evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread

000_015
822<flexus.cpp:231>  {100000}- Timestamp: 2010-Oct-09 20:25:46
823<microArch.cpp:321>  {100000}- Timestamp: 2010-Oct-09 20:25:46
824<InclusiveMESI.cpp:1735>  {106843}- 08-L1d Assertion failed: ((!
(evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread

000_016
755<cycle.cpp:25>  {42224}- 13-uarch Interrupt: 4d pending for 7
756<accounting.cpp:644>  {44243}- Unaccounted load stall cycles.
Invalid load level: eDirectory for load #65689[15] @v:010046434 |
fffffffff4572098| ldsh [%i4 + 152], %i2           {retired}
757<InclusiveMESI.cpp:1735>  {48535}- 00-L1d Assertion failed: ((!
(evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread

000_023
842<microArch.cpp:321>  {140000}- Timestamp: 2010-Oct-09 20:27:48
843<flexus.cpp:311>  {146432}- Watchdog timer expired.  No progress by
CPU 12 for  90015cycles
844<flexus.cpp:315>  {146432}-<undefined>  Assertion failed: ((!
(theWatchdogCounts[i]<  theWatchdogTimeout + 10))) : Watchdog timer
expired.  No progress by CPU 12 for  90015cycles

001_010
756<cycle.cpp:1164>  {24287}- Forced Resync:#12552[03] @v:01024d9dc |
ffffffffc070a000| stx %g0, [%g2 + 0]             {force-resync}
{retired}
757<accounting.cpp:87>  {24287}- Unknown resync for instruction code:
SideEffectStore
758<InclusiveMESI.cpp:1735>  {24394}- 12-L1d Assertion failed: ((!
(evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread

002_008
753<mai_api.cpp:58>  {61144}- CPU[3] Interrupt 44
754<cycle.cpp:25>  {61150}- 03-uarch Interrupt: 44 pending for 5
755<InclusiveMESI.cpp:1735>  {61613}- 13-L1d Assertion failed: ((!
(evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread

002_022
886<cycle.cpp:25>  {140006}- 05-uarch Interrupt: 4d pending for 4
887<flexus.cpp:311>  {141312}- Watchdog timer expired.  No progress by
CPU 10 for  90015cycles
888<flexus.cpp:315>  {141312}-<undefined>  Assertion failed: ((!
(theWatchdogCounts[i]<  theWatchdogTimeout + 10))) : Watchdog timer
expired.  No progress by CPU 10 for  90015cycles
Abort (SIGABRT) in main thread

003_021
811<cycle.cpp:1164>  {84198}- Forced Resync:#55680[03] @v:01024d9dc |
ffffffffc070a000| stx %g0, [%g2 + 0]             {force-resync}
{retired}
812<accounting.cpp:87>  {84198}- Unknown resync for instruction code:
SideEffectStore
813<InclusiveMESI.cpp:1735>  {89235}- 07-L1d Assertion failed: ((!
(evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread

003_024
812<microArch.cpp:321>  {130000}- Timestamp: 2010-Oct-09 20:33:30
813<flexus.cpp:311>  {138496}- Watchdog timer expired.  No progress by
CPU 0 for  90015cycles
814<flexus.cpp:315>  {138496}-<undefined>  Assertion failed: ((!
(theWatchdogCounts[i]<  theWatchdogTimeout + 10))) : Watchdog timer
expired.  No progress by CPU 0 for  90015cycles
Abort (SIGABRT) in main thread
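
For readers unfamiliar with this kind of check, the watchdog failures
above follow a common pattern: a per-CPU counter is incremented every
cycle, reset whenever that CPU makes progress, and the run is aborted
once a counter reaches a timeout. The sketch below is a toy illustration
of that pattern only. The names are hypothetical and it is not the code
in flexus.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy watchdog: names are illustrative and not taken from flexus.cpp.
class ToyWatchdog {
public:
  ToyWatchdog(std::size_t cpus, unsigned long timeout)
      : counts_(cpus, 0), timeout_(timeout) {}

  // Called when a CPU makes forward progress (e.g. retires an instruction).
  void noteProgress(std::size_t cpu) { counts_[cpu] = 0; }

  // Advance one cycle. Returns the index of the first CPU whose counter
  // has reached the timeout, or -1 if every CPU is still within bounds.
  long tick() {
    long stalled = -1;
    for (std::size_t i = 0; i < counts_.size(); ++i) {
      ++counts_[i];
      if (stalled < 0 && counts_[i] >= timeout_) {
        stalled = static_cast<long>(i);
      }
    }
    return stalled;
  }

private:
  std::vector<unsigned long> counts_;  // cycles since last progress, per CPU
  unsigned long timeout_;              // cycles without progress tolerated
};
```

A simulator loop would call tick() once per cycle and abort with a
diagnostic when it returns a CPU index, which is why a network deadlock
that stalls one CPU shows up as a watchdog abort some time later.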




Index: components/Cache/InclusiveMESI.cpp
===================================================================
--- components/Cache/InclusiveMESI.cpp  (revision 826)
+++ components/Cache/InclusiveMESI.cpp  (working copy)
@@ -762,6 +762,23 @@
                                        act.theBackTransport = transport;
                                        return act;
                                }
+                               
+                               if (from_eb && evictee->pending()){
+
+                                       tracker->setNetworkTrafficRequired(true);
+                                       tracker->setResponder(theNodeId);
+                                       tracker->setFillLevel(theCacheLevel);
+
+                                       Action act(kSend, tracker, true /*Write Fwd needs to contain data*/);
+                                       if (state == State::Modified)
+                                               msg->type() = MemoryMessage::FwdReplyDirty;
+                                       else
+                                               msg->type() = MemoryMessage::FwdReplyWritable;
+                                       act.theBackMessage = true;
+                                       act.theBackTransport = transport;
+                                       return act;
+                               }
+
                                // First, wait for outstanding Read/Fetch downgrade requests
                                if (!theSnoopBuffer.hasEntry(msg->address())) {
 
@@ -850,6 +867,16 @@
                                        return act;
                                }
 
+
+                               if (from_eb && evictee->pending()){
+                                       Action act(kSend, tracker, false);
+                                       msg->type() = MemoryMessage::InvalidateAck;
+                                       act.theBackMessage = true;
+                                       act.theBackTransport = transport;
+                                       return act;
+                               }
+
+
                                // First, look for outstanding snoops, and wait for them to complete
                                if (!theSnoopBuffer.hasEntry(msg->address())) {
 
