On Sun, Jan 23, 2011 at 12:39 AM, Daniel Pittman <[email protected]> wrote: > From: Daniel Pittman <[email protected]> > > This implements Tarjan's algorithm for finding strongly connected components > in a directed graph, and leverages that to find cycles. > > This allows us to report the minimum set of nodes in each cycle, as well as > reporting each cycle discretely if there are multiple of them. > > While this can still produce overwhelming and/or unhelpful output, it > represents a large step forward in delivering useful information when a cycle > is detected. > > This presently reports the set of nodes that contain the cycle, in no > particular order, rather than the set of edges connecting those nodes. > > Sadly, it also suffers a limitation: the implementation of Tarjan's algorithm > used to find strongly connected components is recursive, so is limited by the > maximum Ruby stack depth to dependency chains less than 1,000 nodes deep. > > While this is probably not a limit in practice, it is a nasty limitation, and > other considerations (like Ruby stack consumption across versions) could > trigger this much sooner than is desirable.
No code comment, but I'm very happy to see active work on this. It's very frustrating to debug these situations. > > Signed-off-by: Daniel Pittman <[email protected]> > --- > Local-branch: feature/next/2597-better-cycle-reporting > lib/puppet/simple_graph.rb | 103 > ++++++++++++++++++++++++++++------------ > spec/unit/simple_graph_spec.rb | 51 +++++++++++++++++++- > 2 files changed, 122 insertions(+), 32 deletions(-) > > diff --git a/lib/puppet/simple_graph.rb b/lib/puppet/simple_graph.rb > index f9a665d..b900793 100644 > --- a/lib/puppet/simple_graph.rb > +++ b/lib/puppet/simple_graph.rb > @@ -1,9 +1,11 @@ > -# Created by Luke A. Kanies on 2007-11-07. > -# Copyright (c) 2007. All rights reserved. > +# Created by Luke A. Kanies on 2007-11-07. > +# Cycle detection and reporting by Daniel Pittman 2011-01-22 > +# Copyright (c) 2007, 2010. All rights reserved. > > require 'puppet/external/dot' > require 'puppet/relationship' > require 'set' > +require 'ostruct' > > # A hopefully-faster graph class to replace the use of GRATR. > class Puppet::SimpleGraph > @@ -92,40 +94,79 @@ class Puppet::SimpleGraph > vertices > end > > - # Provide a topological sort with cycle reporting > - def topsort_with_cycles > - degree = {} > - zeros = [] > - result = [] > - > - # Collect each of our vertices, with the number of in-edges each has. > - vertices.each do |v| > - edges = @in_to[v].dup > - zeros << v if edges.empty? > - degree[v] = edges > + # This is a simple implementation of Tarjan's algorithm to find strongly > + # connected components in the graph; this is a fairly ugly implementation, > + # because I can't just decorate the vertices themselves. > + # > + # This method has an unhealthy relationship with the find_cycles_in_graph > + # method below, which contains the knowledge of how the state object is > + # maintained. > + def tarjan(v, s) > + s.index[v] = s.n > + s.lowlink[v] = s.n > + s.n = s.n + 1 > + > + s.s.push v > + > + @out_from[v].each do |edge| > + to = edge[0] > + > + if ! s.index[to] then > + tarjan(to, s) > + s.lowlink[v] = [s.lowlink[v], s.lowlink[to]].min > + elsif s.s.member? to then > + # Performance note: the stack membership test *should* be done with a > + # constant time check, but I was lazy and used something that is > + # likely to be O(N) where N is the stack depth; this will bite us > + # eventually, and should be improved before the change lands. > + # > + # OTOH, this is only invoked on a very cold path, when things have > + # gone wrong anyhow, right now. I feel that getting the code out is > + # worth more than that final performance boost. --daniel 2011-01-22 > + s.lowlink[v] = [s.lowlink[v], s.index[to]].min > + end > end > > - # Iterate over each 0-degree vertex, decrementing the degree of > - # each of its out-edges. > - while v = zeros.pop > - result << v > - @out_from[v].each { |v2,es| > - degree[v2].delete(v) > - zeros << v2 if degree[v2].empty? > - } > + if s.lowlink[v] == s.index[v] then > + # REVISIT: Surely there must be a nicer way to partition this around an > + # index, but I don't know what it is. This works. :/ --daniel > 2011-01-22 > + # > + # Performance note: this might also suffer an O(stack depth) > performance > + # hit, better replaced with something that is O(1) for splitting the > + # stack into parts. > + tmp = s.s.slice!(0, s.s.index(v)) > + s.scc.push s.s > + s.s = tmp > end > + end > > - # If we have any vertices left with non-zero in-degrees, then we've > found a cycle. > - if cycles = degree.values.reject { |ns| ns.empty? } and cycles.length > 0 > - message = cycles.collect { |edges| > - '(' + edges.collect { |e| e[1].to_s }.join(", ") + ')' > - }.join("\n") > - raise Puppet::Error, "Found dependency cycles in the following > relationships:\n" + > - message + "\n" + > - "Try the '--graph' option and opening the '.dot' file in OmniGraffle > or GraphViz" > + # Find all cycles in the graph by detecting all the strongly connected > + # components, then eliminating everything with a size of one as > + # uninteresting - which it is, because it can't be a cycle. :) > + # > + # This has an unhealthy relationship with the 'tarjan' method above, which > + # it uses to implement the detection of strongly connected components. > + def find_cycles_in_graph > + state = OpenStruct.new :n => 0, :index => {}, :lowlink => {}, :s => [], > :scc => [] > + > + # we usually have a disconnected graph, must walk all possible roots > + vertices.each do |vertex| > + if ! state.index[vertex] then > + tarjan vertex, state > + end > end > > - result > + return state.scc.select { |c| c.length > 1 } > + end > + > + def report_cycles_in_graph > + cycles = find_cycles_in_graph > + n = cycles.length # where is "pluralize"? --daniel 2011-01-22 > + s = n == 1 ? '' : 's' > + > + raise Puppet::Error, "Found #{n} dependency cycle#{s}:\n" + > + cycles.collect { |c| '(' + c.join(", ") + ')' }.join("\n") + "\n" + > + "Try the '--graph' option and opening the '.dot' file in OmniGraffle > or GraphViz" > end > > # Provide a topological sort. > @@ -152,7 +193,7 @@ class Puppet::SimpleGraph > > # If we have any vertices left with non-zero in-degrees, then we've found > a cycle. > if cycles = degree.values.reject { |ns| ns == 0 } and cycles.length > 0 > - topsort_with_cycles > + report_cycles_in_graph > end > > result > diff --git a/spec/unit/simple_graph_spec.rb b/spec/unit/simple_graph_spec.rb > index 58978f2..92d370c 100755 > --- a/spec/unit/simple_graph_spec.rb > +++ b/spec/unit/simple_graph_spec.rb > @@ -305,10 +305,59 @@ describe Puppet::SimpleGraph do > > it "should produce the correct relationship text" do > add_edges :a => :b, :b => :a > - want = %r{following relationships:\n\(b => a\)\n\(a => b\)} > + want = %r{Found 1 dependency cycle:\n\(a, b\)\nTry} > expect { @graph.topsort }.to raise_error(Puppet::Error, want) > end > > + it "cycle reporting should be the minimum cycle for a simple graph" do > + add_edges "a" => "b" > + add_edges "b" => "a" > + add_edges "b" => "c" > + > + cycles = nil > + expect { cycles = @graph.find_cycles_in_graph.sort }.should_not > raise_error > + cycles.should be == [["a", "b"]] > + end > + > + it "cycle reporting should handle two distinct cycles" do > + add_edges "a" => "a1", "a1" => "a" > + add_edges "b" => "b1", "b1" => "b" > + > + cycles = nil > + expect { cycles = @graph.find_cycles_in_graph.sort }.should_not > raise_error > + cycles.should be == [["a", "a1"], ["b", "b1"]] > + end > + > + it "cycle reporting should handle two cycles in a connected graph" do > + add_edges "a" => "b", "b" => "c", "c" => "d" > + add_edges "a" => "a1", "a1" => "a" > + add_edges "c" => "c1", "c1" => "c2", "c2" => "c3", "c3" => "c" > + > + cycles = nil > + expect { cycles = @graph.find_cycles_in_graph.sort }.should_not > raise_error > + cycles.should be == [%w{a a1}, %w{c c1 c2 c3}] > + end > + > + it "cycle reporting should handle a complicated cycle" do > + add_edges "a" => "b", "b" => "c" > + add_edges "a" => "c" > + add_edges "c" => "c1", "c1" => "a" > + add_edges "c" => "c2", "c2" => "b" > + > + cycles = nil > + expect { cycles = @graph.find_cycles_in_graph.sort }.should_not > raise_error > + cycles.should be == [%w{a b c c1 c2}] > + end > + > + it "cycle reporting should not fail with large data sets" do > + limit = 1000 > + (1..(limit - 1)).each do |n| add_edges n.to_s => (n+1).to_s end > + > + cycles = nil > + expect { cycles = @graph.find_cycles_in_graph.sort }.should_not > raise_error > + cycles.should be == [] > + end > + > # Our graph's add_edge method is smart enough not to add > # duplicate edges, so we use the objects, which it doesn't > # check. > -- > 1.7.3.5 > > -- > You received this message because you are subscribed to the Google Groups > "Puppet Developers" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/puppet-dev?hl=en. > > -- You received this message because you are subscribed to the Google Groups "Puppet Developers" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/puppet-dev?hl=en.
