When Massif culls "old" snapshots, it drops about half of them, and then resets 
its period to the MIN() of the inter-snapshots times.

For long runs, that has somewhat unexpected consequences. You can easily end up 
with 75 snapshots that are about a minute apart, and then have the last 25 
about 20 seconds apart for example (or worse). Graphs which don't look at the 
"time=X" can then be a bit misleading -- and that includes ms_print's I 
believe, Martijn's massifplot, and I'd expect most homegrown tools. 

You could say RTFM of course, but it might be more desirable, at least when the 
inter-snapshot time is "long", to only decimate (cull 10% of snapshots). I try 
to do something like that in the following, YMMV. A couple of things had to 
move, the diff is "morally" a lot smaller than it looks.

It might be better to have an explicit option to set the decimation fraction 
(which dictates the frequency). It might also be better to take an average of 
the time between existing snapshots, rather than the MIN() -- which makes it 
very likely these new snapshots will be among the first ones to be discarded at 
the next culling.

A mighty fine tool!


diff --git a/massif/ms_main.c b/massif/ms_main.c
index a6062fc..530b461 100644
--- a/massif/ms_main.c
+++ b/massif/ms_main.c
@@ -220,7 +220,10 @@ Number of snapshots: 50
       VG_(message)(Vg_DebugMsg, "Massif: " format, ##args); \
    }
 
-
+// Time is measured either in i or ms or bytes, depending on the --time-unit
+// option.  It's a Long because it can exceed 32-bits reasonably easily, and
+// because we need to allow negative values to represent unset times.
+typedef Long Time;
 
 //------------------------------------------------------------//
 //--- Statistics                                           ---//
@@ -283,6 +286,13 @@ static ULong total_allocs_deallocs_szB = 0;
 //
 static Bool have_started_executing_code = False;
 
+// 'current_min_time_interval' is the minimum time interval between snapshots.
+// If we try to take a snapshot and less than this much time has passed,
+// we don't take it.  It gets larger as the program runs longer.  It's
+// initialised to zero so that we begin by taking snapshots as quickly as
+// possible.
+static Time current_min_time_interval = 0;
+
 //------------------------------------------------------------//
 //--- Alloc fns                                            ---//
 //------------------------------------------------------------//
@@ -966,11 +976,6 @@ static void update_XCon(XPt* xpt, SSizeT space_delta)
 // limit again, we again cull and then take them even more slowly, and so
 // on.
 
-// Time is measured either in i or ms or bytes, depending on the --time-unit
-// option.  It's a Long because it can exceed 32-bits reasonably easily, and
-// because we need to allow negative values to represent unset times.
-typedef Long Time;
-
 #define UNUSED_SNAPSHOT_TIME  -333  // A conspicuous negative number.
 
 typedef
@@ -1091,12 +1096,37 @@ static void VERB_snapshot(Int verbosity, Char* prefix, 
Int i)
    );
 }
 
+// Given the current conditions, returns the fraction of snapshots to
+// cull (i.e. returns 19 to mean "cull 1/10th")
+//
+// Initially we discard N/2 snapshots, but when the
+// inter-snapshot-time grows large enough, we can afford to cull
+// fewer. The rationale being that, when snapshots are taken once a
+// minute, we can afford to cull every tenth snapshot to get a more
+// uniform distribution.
+//
+// A non-uniform distribution can give misleading graphs (from
+// ms_print or tools like massifplot), unless they do the right thing
+// and use an X-Y plot type (being ASCII art, ms_print would be hard-
+// pressed to do this anyway).
+
+static Int fraction_of_snapshots_to_discard(void)
+{
+    if ((clo_time_unit == TimeMS && current_min_time_interval > 10000) ||
+        (clo_time_unit == TimeI  && current_min_time_interval > 10000000000LL))
+    {
+        return 10;
+    }
+
+    return 2; // also TimeB case, used in short-lived programs anyway.
+}
+
 // Cull half the snapshots;  we choose those that represent the smallest
 // time-spans, because that gives us the most even distribution of snapshots
 // over time.  (It's possible to lose interesting spikes, however.)
 //
 // Algorithm for N snapshots:  We find the snapshot representing the smallest
-// timeframe, and remove it.  We repeat this until (N/2) snapshots are gone.
+// timeframe, and remove it.  We repeat this until e.g. (N/2) snapshots are 
gone.
 // We have to do this one snapshot at a time, rather than finding the (N/2)
 // smallest snapshots in one hit, because when a snapshot is removed, its
 // neighbours immediately cover greater timespans.  So it's O(N^2), but N is
@@ -1109,6 +1139,7 @@ static UInt cull_snapshots(void)
    Int  i, jp, j, jn, min_timespan_i;
    Int  n_deleted = 0;
    Time min_timespan;
+   Int  fraction_to_discard = fraction_of_snapshots_to_discard();
 
    n_cullings++;
 
@@ -1122,7 +1153,7 @@ static UInt cull_snapshots(void)
 
    // First we remove enough snapshots by clearing them in-place.  Once
    // that's done, we can slide the remaining ones down.
-   for (i = 0; i < clo_max_snapshots/2; i++) {
+       for (i = 0; i < (1+clo_max_snapshots/fraction_to_discard); i++) {
       // Find the snapshot representing the smallest timespan.  The timespan
       // for snapshot n = d(N-1,N)+d(N,N+1), where d(A,B) is the time between
       // snapshot A and B.  We don't consider the first and last snapshots for
@@ -1300,12 +1331,6 @@ take_snapshot(Snapshot* snapshot, SnapshotKind kind, 
Time time,
 static void
 maybe_take_snapshot(SnapshotKind kind, Char* what)
 {
-   // 'min_time_interval' is the minimum time interval between snapshots.
-   // If we try to take a snapshot and less than this much time has passed,
-   // we don't take it.  It gets larger as the program runs longer.  It's
-   // initialised to zero so that we begin by taking snapshots as quickly as
-   // possible.
-   static Time min_time_interval = 0;
    // Zero allows startup snapshot.
    static Time earliest_possible_time_of_next_snapshot = 0;
    static Int  n_snapshots_since_last_detailed         = 0;
@@ -1390,11 +1415,11 @@ maybe_take_snapshot(SnapshotKind kind, Char* what)
    // Cull the entries, if our snapshot table is full.
    next_snapshot_i++;
    if (clo_max_snapshots == next_snapshot_i) {
-      min_time_interval = cull_snapshots();
+      current_min_time_interval = cull_snapshots();
    }
 
    // Work out the earliest time when the next snapshot can happen.
-   earliest_possible_time_of_next_snapshot = time + min_time_interval;
+   earliest_possible_time_of_next_snapshot = time + current_min_time_interval;
 }
 


--
Stephane Belmon





 Protected by Websense Hosted Email Security -- www.websense.com 

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Valgrind-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Reply via email to