[Puppet-dev] [PATCH 04/10] Feature #2597 -- use iterative DFS in Tarjan clustering.

Daniel Pittman Thu, 03 Feb 2011 18:04:31 -0800

From: Daniel Pittman <[email protected]>

In order to bypass the limitations of the C stack, which is also the Ruby
stack, we replace the simple and clear recursive Trajan implementation with an
iterative version that uses the heap as the stack.


This is somewhat harder to read, but can now run a 10,000 vertex deep linear
dependency relationship where, previously, 1,250 was about the limit on my
machine.

This should now be bounded by the size of the heap rather than the stack on
all platforms -- though it would be nice to get rid of the magic and return to
the recursive version if Ruby ever follows Perl down the sensible path of
essentially unlimited recursion by writing that code for us in the
interpreter...
---
 lib/puppet/simple_graph.rb     |   94 +++++++++++++++++++++++++---------------
 spec/unit/simple_graph_spec.rb |    2 +-
 2 files changed, 60 insertions(+), 36 deletions(-)

diff --git a/lib/puppet/simple_graph.rb b/lib/puppet/simple_graph.rb
index b900793..064f6be 100644
--- a/lib/puppet/simple_graph.rb
+++ b/lib/puppet/simple_graph.rb
@@ -101,42 +101,66 @@ class Puppet::SimpleGraph
   # This method has an unhealthy relationship with the find_cycles_in_graph
   # method below, which contains the knowledge of how the state object is
   # maintained.
-  def tarjan(v, s)
-    s.index[v]   = s.n
-    s.lowlink[v] = s.n
-    s.n = s.n + 1
-
-    s.s.push v
-
-    @out_from[v].each do |edge|
-      to = edge[0]
-
-      if ! s.index[to] then
-        tarjan(to, s)
-        s.lowlink[v] = [s.lowlink[v], s.lowlink[to]].min
-      elsif s.s.member? to then
-        # Performance note: the stack membership test *should* be done with a
-        # constant time check, but I was lazy and used something that is
-        # likely to be O(N) where N is the stack depth; this will bite us
-        # eventually, and should be improved before the change lands.
-        #
-        # OTOH, this is only invoked on a very cold path, when things have
-        # gone wrong anyhow, right now.  I feel that getting the code out is
-        # worth more than that final performance boost. --daniel 2011-01-22
-        s.lowlink[v] = [s.lowlink[v], s.index[to]].min
-      end
-    end
+  def tarjan(root, s)
+    # initialize the recursion stack we use to work around the nasty lack of a
+    # decent Ruby stack.
+    recur = [OpenStruct.new :node => root]
+
+    while not recur.empty? do
+      frame = recur.last
+      v = frame.node
+      case frame.step
+      when nil then
+        s.index[v]   = s.n
+        s.lowlink[v] = s.n
+        s.n          = s.n + 1
+
+        s.s.push v
+
+        frame.children = adjacent(v)
+        frame.step = :children
+
+      when :children then
+        if frame.children.length > 0 then
+          child = frame.children.shift
+          if ! s.index[child] then
+            # Never seen, need to recurse.
+            frame.step = :after_recursion
+            frame.child = child
+            recur.push OpenStruct.new :node => child
+          elsif s.s.member? child then
+            # Performance note: the stack membership test *should* be done 
with a
+            # constant time check, but I was lazy and used something that is
+            # likely to be O(N) where N is the stack depth; this will bite us
+            # eventually, and should be improved before the change lands.
+            #
+            # OTOH, this is only invoked on a very cold path, when things have
+            # gone wrong anyhow, right now.  I feel that getting the code out 
is
+            # worth more than that final performance boost. --daniel 2011-01-22
+            s.lowlink[v] = [s.lowlink[v], s.index[child]].min
+          end
+        else
+          if s.lowlink[v] == s.index[v] then
+            # REVISIT: Surely there must be a nicer way to partition this 
around an
+            # index, but I don't know what it is.  This works. :/ --daniel 
2011-01-22
+            #
+            # Performance note: this might also suffer an O(stack depth) 
performance
+            # hit, better replaced with something that is O(1) for splitting 
the
+            # stack into parts.
+            tmp = s.s.slice!(0, s.s.index(v))
+            s.scc.push s.s
+            s.s = tmp
+          end
+          recur.pop               # done with this node, finally.
+        end
+
+      when :after_recursion then
+        s.lowlink[v] = [s.lowlink[v], s.lowlink[frame.child]].min
+        frame.step = :children
 
-    if s.lowlink[v] == s.index[v] then
-      # REVISIT: Surely there must be a nicer way to partition this around an
-      # index, but I don't know what it is.  This works. :/ --daniel 2011-01-22
-      #
-      # Performance note: this might also suffer an O(stack depth) performance
-      # hit, better replaced with something that is O(1) for splitting the
-      # stack into parts.
-      tmp = s.s.slice!(0, s.s.index(v))
-      s.scc.push s.s
-      s.s = tmp
+      else
+        fail "#{frame.step} is an unknown step"
+      end
     end
   end
 
diff --git a/spec/unit/simple_graph_spec.rb b/spec/unit/simple_graph_spec.rb
index a8e0c44..3b0fe9a 100755
--- a/spec/unit/simple_graph_spec.rb
+++ b/spec/unit/simple_graph_spec.rb
@@ -350,7 +350,7 @@ describe Puppet::SimpleGraph do
     end
 
     it "cycle discovery should not fail with large data sets" do
-      limit = 1000
+      limit = 3000
       (1..(limit - 1)).each do |n| add_edges n.to_s => (n+1).to_s end
 
       cycles = nil
-- 
1.7.3.5

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en.

[Puppet-dev] [PATCH 04/10] Feature #2597 -- use iterative DFS in Tarjan clustering.

Reply via email to