Hi Evangele,
If you're feeling impatient and don't want to wait for Stavros to check
in the changes, I've attached a patch (based on Stavros' fix) that will
eliminate the evictBuffer assertion that you're seeing.
You can apply the patch from the top level of the Flexus source tree
using the following command:
patch -p0 < xxxxx.patch
Jason
On 13/10/2010 4:28 PM, Volos Stavros wrote:
Hi Evangele,
During the previous two weeks I experienced the same problems that you
observe. I suspect that the evictBuffer bug is related to a race between
an eviction and an invalidate. I fixed this bug but didn't have time to
check it in because I left for a week off. I will commit my code on
Monday or Tuesday when I return. The Watchdog bug must be related to the
network deadlock that I experienced with the torus topology. If you try
a mesh, you will not hit this bug. I will work on it when I return.
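A common textbook explanation for why a torus can deadlock when a mesh does not is the channel-dependency argument. Under deterministic routing, an acyclic channel dependency graph rules out deadlock, while a cycle makes deadlock possible. The sketch below is only a generic illustration; this thread does not establish that this is the actual cause of the Watchdog expiry in Flexus. It models a single ring dimension with packets routed in one direction only, and the helpers `channelDependencies` and `hasCycle` are hypothetical, not part of Flexus.

```cpp
#include <functional>
#include <vector>

// Hypothetical illustration, not Flexus code: one dimension of n routers,
// packets routed only in the +1 direction. Channel i is the link i -> i+1
// (mod n when the torus wraparound link is present).

// Builds the channel dependency graph: an edge a -> b means some packet
// can hold channel a while waiting to acquire channel b.
std::vector<std::vector<int>> channelDependencies(int n, bool wraparound) {
    int channels = wraparound ? n : n - 1;
    std::vector<std::vector<int>> deps(channels);
    for (int src = 0; src < n; ++src) {
        for (int dst = 0; dst < n; ++dst) {
            if (src == dst) continue;
            // Without wraparound (mesh), +1 routing only reaches dst > src.
            if (!wraparound && dst < src) continue;
            int node = src;
            int prev = -1;
            while (node != dst) {
                int ch = node;  // channel leaving `node`
                if (prev >= 0) deps[prev].push_back(ch);
                prev = ch;
                node = (node + 1) % n;
            }
        }
    }
    return deps;
}

// Depth-first search for a back edge, i.e. a cycle in the graph.
bool hasCycle(const std::vector<std::vector<int>>& g) {
    std::vector<int> color(g.size(), 0);  // 0 = unvisited, 1 = on stack, 2 = done
    std::function<bool(int)> dfs = [&](int u) {
        color[u] = 1;
        for (int v : g[u]) {
            if (color[v] == 1) return true;  // back edge: cycle found
            if (color[v] == 0 && dfs(v)) return true;
        }
        color[u] = 2;
        return false;
    };
    for (int u = 0; u < static_cast<int>(g.size()); ++u)
        if (color[u] == 0 && dfs(u)) return true;
    return false;
}
```

With four routers, the ring's wraparound link closes the cycle 0 -> 1 -> 2 -> 3 -> 0. The same routing on a line (mesh) only produces 0 -> 1 -> 2, which is acyclic. Real torus routers usually break this cycle with extra virtual channels.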
I should mention that I experienced these bugs when running some
virtualized workloads, not the usual ones. Both bugs can be hidden by a
different configuration file, which is why we haven't seen them in the
past with the usual workloads. For example, a bigger evict buffer for
the L1 cache hides the evictBuffer bug.
Regards,
-Stavros
On Oct 13, 2010, at 10:03 PM, Evangelos Vlachos wrote:
Hi all,
I am in the process of migrating to Flexus 4 and am currently trying
to verify that the migration was successful. I wanted to compare some
timing results I got against the results your version reports for the
same workload (tpcc_nort) and the same configuration (Pejman provided
one for me, so that we have the same point of reference).
After generating flexpoints and running the timing simulation for 80
flexpoints (0-3:5-24), I observed that 9 of them aborted with two kinds
of errors in total (see below). Do you have any idea what could be wrong?
Interestingly, the CMU Simics checkpoint (at least the one for
tpcc_nort's phase000) is slightly "behind" the one you have at EPFL
(the cpu_cycles I observe here are about 200K cycles lower than at
EPFL), which means that a case that hasn't been debugged may have
occurred.
Any ideas/suggestions?
Regards,
Evangelos
---------
000_013
738<mai_api.cpp:58> {34101}- CPU[10] Interrupt 4d
739<cycle.cpp:25> {34106}- 10-uarch Interrupt: 4d pending for 4
740<InclusiveMESI.cpp:1735> {37152}- 11-L1d Assertion failed: ((! (evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread
000_015
822<flexus.cpp:231> {100000}- Timestamp: 2010-Oct-09 20:25:46
823<microArch.cpp:321> {100000}- Timestamp: 2010-Oct-09 20:25:46
824<InclusiveMESI.cpp:1735> {106843}- 08-L1d Assertion failed: ((! (evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread
000_016
755<cycle.cpp:25> {42224}- 13-uarch Interrupt: 4d pending for 7
756<accounting.cpp:644> {44243}- Unaccounted load stall cycles. Invalid load level: eDirectory for load #65689[15] @v:010046434 | fffffffff4572098| ldsh [%i4 + 152], %i2 {retired}
757<InclusiveMESI.cpp:1735> {48535}- 00-L1d Assertion failed: ((! (evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread
000_023
842<microArch.cpp:321> {140000}- Timestamp: 2010-Oct-09 20:27:48
843<flexus.cpp:311> {146432}- Watchdog timer expired. No progress by CPU 12 for 90015cycles
844<flexus.cpp:315> {146432}-<undefined> Assertion failed: ((! (theWatchdogCounts[i]< theWatchdogTimeout + 10))) : Watchdog timer expired. No progress by CPU 12 for 90015cycles
001_010
756<cycle.cpp:1164> {24287}- Forced Resync:#12552[03] @v:01024d9dc | ffffffffc070a000| stx %g0, [%g2 + 0] {force-resync} {retired}
757<accounting.cpp:87> {24287}- Unknown resync for instruction code: SideEffectStore
758<InclusiveMESI.cpp:1735> {24394}- 12-L1d Assertion failed: ((! (evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread
002_008
753<mai_api.cpp:58> {61144}- CPU[3] Interrupt 44
754<cycle.cpp:25> {61150}- 03-uarch Interrupt: 44 pending for 5
755<InclusiveMESI.cpp:1735> {61613}- 13-L1d Assertion failed: ((! (evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread
002_022
886<cycle.cpp:25> {140006}- 05-uarch Interrupt: 4d pending for 4
887<flexus.cpp:311> {141312}- Watchdog timer expired. No progress by CPU 10 for 90015cycles
888<flexus.cpp:315> {141312}-<undefined> Assertion failed: ((! (theWatchdogCounts[i]< theWatchdogTimeout + 10))) : Watchdog timer expired. No progress by CPU 10 for 90015cycles
Abort (SIGABRT) in main thread
003_021
811<cycle.cpp:1164> {84198}- Forced Resync:#55680[03] @v:01024d9dc | ffffffffc070a000| stx %g0, [%g2 + 0] {force-resync} {retired}
812<accounting.cpp:87> {84198}- Unknown resync for instruction code: SideEffectStore
813<InclusiveMESI.cpp:1735> {89235}- 07-L1d Assertion failed: ((! (evEntry != theEvictBuffer.end()))) :<undefined>
Abort (SIGABRT) in main thread
003_024
812<microArch.cpp:321> {130000}- Timestamp: 2010-Oct-09 20:33:30
813<flexus.cpp:311> {138496}- Watchdog timer expired. No progress by CPU 0 for 90015cycles
814<flexus.cpp:315> {138496}-<undefined> Assertion failed: ((! (theWatchdogCounts[i]< theWatchdogTimeout + 10))) : Watchdog timer expired. No progress by CPU 0 for 90015cycles
Abort (SIGABRT) in main thread
Index: components/Cache/InclusiveMESI.cpp
===================================================================
--- components/Cache/InclusiveMESI.cpp (revision 826)
+++ components/Cache/InclusiveMESI.cpp (working copy)
@@ -762,6 +762,23 @@
act.theBackTransport = transport;
return act;
}
+
+ if (from_eb && evictee->pending()){
+
+ tracker->setNetworkTrafficRequired(true);
+ tracker->setResponder(theNodeId);
+ tracker->setFillLevel(theCacheLevel);
+
+ Action act(kSend, tracker, true /*Write Fwd needs to contain data*/);
+ if (state == State::Modified)
+ msg->type() = MemoryMessage::FwdReplyDirty;
+ else
+ msg->type() = MemoryMessage::FwdReplyWritable;
+ act.theBackMessage = true;
+ act.theBackTransport = transport;
+ return act;
+ }
+
// First, wait for outstanding Read/Fetch downgrade requests
if (!theSnoopBuffer.hasEntry(msg->address())) {
@@ -850,6 +867,16 @@
return act;
}
+
+ if (from_eb && evictee->pending()){
+ Action act(kSend, tracker, false);
+ msg->type() = MemoryMessage::InvalidateAck;
+ act.theBackMessage = true;
+ act.theBackTransport = transport;
+ return act;
+ }
+
+
// First, look for outstanding snoops, and wait for them to complete
if (!theSnoopBuffer.hasEntry(msg->address())) {
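The logic of the two added hunks can be modeled in isolation. This is a hedged sketch, not Flexus code. `SnoopType`, `ReplyType`, `LineState`, `EvictEntry`, and `replyFromEvictBuffer` are hypothetical names. The sketch also assumes that a "pending" evict-buffer entry is a victim whose eviction has started but not completed. Under that assumption, a snoop that finds the block in the evict buffer (`from_eb && evictee->pending()`) is answered immediately instead of falling through to the normal snoop handling. A write forward gets FwdReplyDirty or FwdReplyWritable, and an invalidate gets InvalidateAck.

```cpp
#include <optional>

// Hypothetical, simplified model of the two added hunks; these names are
// illustrative only and are NOT the Flexus API.
enum class SnoopType { WriteForward, Invalidate };
enum class ReplyType { FwdReplyDirty, FwdReplyWritable, InvalidateAck };
enum class LineState { Modified, Exclusive };

// Assumption: a "pending" evict-buffer entry is a victim whose eviction has
// been started but has not yet completed.
struct EvictEntry {
    LineState state;
    bool pending;
};

// Returns the reply to send right away when a snoop finds the block in the
// evict buffer with its eviction still pending. std::nullopt means "not that
// case; fall through to the normal snoop handling".
std::optional<ReplyType> replyFromEvictBuffer(SnoopType snoop,
                                              const EvictEntry* entry) {
    if (entry == nullptr || !entry->pending) {
        return std::nullopt;
    }
    if (snoop == SnoopType::WriteForward) {
        // The forward reply must carry the data; it is dirty only if the
        // victim was modified.
        return entry->state == LineState::Modified ? ReplyType::FwdReplyDirty
                                                   : ReplyType::FwdReplyWritable;
    }
    // An invalidate racing with the eviction is simply acknowledged.
    return ReplyType::InvalidateAck;
}
```

For example, an invalidate that hits a pending modified victim yields InvalidateAck. An entry that is no longer pending returns std::nullopt and takes the normal path.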