1] [#4590] SimpleGraph is slow

Markus Roberts Wed, 06 Oct 2010 10:32:50 -0700

On Mon, Oct 4, 2010 at 10:24 AM, Paul Berry <[email protected]> wrote:
> Since this patch has been sitting on the list with no comments for a while,
> Jesse and I took a stab at reviewing it last week.  Comments below:
>
> On Wed, Sep 8, 2010 at 11:53 AM, Paul Berry <[email protected]> wrote:
>>
>>   # Clear our graph.
>>   def clear
>> -   �[email protected] { |vertex, wrapper| wrapper.clear }
>> -   �[email protected]
>> -   �[email protected]
>> -  end
>> -
>> -  # Which resources a given resource depends upon.
>> -  def dependents(resource)
>> -    tree_from_vertex(resource).keys
>> +   �...@in_to.each { |k,v| v.clear }
>> +   �...@out_from.each { |k,v| v.clear }
>
> The above two lines seem unnecessary since they are followed by clearing
> @in_to and @out_from.  Was this an attempt to avoid shedding GC loops?
>  Because if so there is actually no benefit, since the sub-hashes contained
> within @in_to and @out_from are not referred to by anything else.





>
>>
>> +   �...@in_to.clear
>> +   �...@out_from.clear
>> +   �...@upstream_from.clear
>> +   �...@downstream_from.clear
>>   end
>>
>>   # Which resources depend upon the given resource.
>>   def dependencies(resource)
>> -    # Cache the reversal graph, because it's somewhat expensive
>> -    # to create.
>> -   �...@reversal ||= reversal
>> -    # Strangely, it's significantly faster to search a reversed
>> -    # tree in the :out direction than to search a normal tree
>> -    # in the :in direction.
>> -   �[email protected]_from_vertex(resource, :out).keys
>> +   vertex?(resource) ? upstream_from_vertex(resource).keys : []
>> +  end
>
> Looks like an indentation error--there should be one more space before
> "vertex".

Yep.  I started to say it looked correct to me, then switched fonts.

>>
>> +
>> +  def dependents(resource)
>> +   vertex?(resource) ? downstream_from_vertex(resource).keys : []
>
> Indentation error here too.

Yep.


>
>>
>>   end
>>
>>   # Whether our graph is directed.  Always true.  Used to produce dotp.
>> files.
>> @@ -209,103 +158,87 @@ class Puppet::SimpleGraph
>>
>>   # Add a new vertex to the graph.
>>   def add_vertex(vertex)
>> -   �...@reversal = nil
>> -    return false if vertex?(vertex)
>> -    setup_vertex(vertex)
>> -    true # don't return the VertexWrapper instance.
>> +   �...@upstream_from.clear
>> +   �...@downstream_from.clear
>
> There's no need to clear @upstream_from and @downstream_from, because adding
> a lone vertex can't change those relationships.

Doh!


>>
>> +   �...@in_to[vertex]    ||= {}
>> +   �...@out_from[vertex] ||= {}
>>   end
>>
>>   # Find a matching edge.  Note that this only finds the first edge,
>>   # not all of them or whatever.
>>   def edge(source, target)
>> -   �[email protected]_with_index { |test_edge, index| return test_edge if
>> test_edge.source == source and test_edge.target == target }
>> +    edge?(source,target) && @out_from[source][target].first
>>   end
>>
>>   def edge_label(source, target)
>> -    return nil unless edge = edge(source, target)
>> -    edge.label
>> +    (e = edge(source, target)) && e.label
>>   end
>
> It's hard to imagine circumstances in which the above two methods would be
> useful, because if there are edges between the source and target, they only
> return information about one of the edges.  Also, it looks like nobody calls
> these methods.  Can we eliminate them?

I started out determining the API by grepping our code base for all
the method names and eliminating those for which there were no static
calls.  That didn't suffice (there were constructed calls) so I
switched to my wrapper & hot spot analysis leaving in anything I
hadn't proven unnecessary.  That means I possibly left in dead code
but (if I didn't make any goofs) didn't remove any unneeded code.

I have no objection to removing edge_lable, provided it really isn't
used anywhere.  I'm more reluctant about edge since 1) the edges have
state the user may want to access and 2) if we don't provide an access
method (and don't reveal more of the internals than is prudent) to
user would need to do some pretty hackish / slow things to get
information we can provide cheaply and 3) it's just not that much
code.

That said, if it isn't used rather I'd replace it with something like:

  def edges(source, target)
    (edge?(source,target) && @out_from[source][target]) || []
  end

or

  def edges(source, target)
    ((@out_from[source] || {})[target]) || []
  end


>
>>
>> -
>> -  public
>> -
>> -#    # For some reason, unconnected vertices do not show up in
>> -#    # this graph.
>> -#    def to_jpg(path, name)
>> -#        gv = vertices
>> -#        Dir.chdir(path) do
>> -#            induced_subgraph(gv).write_to_graphic_file('jpg', name)
>> -#        end
>> -#    end
>
> Since this was the only method that called write_to_graphic_file(), we
> should eliminate it too.

Same reasoning as above; I was doing a performance pass & not trying
to sort out what is and isn't called.  If it truly isn't used, sure,
remove it, but not that's an orthogonal change.

>>
>> @@ -381,6 +314,14 @@ class Puppet::SimpleGraph
>>     predecessor
>>   end
>>
>> +  def downstream_from_vertex(v)
>> +   �...@downstream_from[v] || @out_from[v].keys.inject(@downstream_from[v] =
>> {}) { |result,node| result.update(node=>1).update
>> downstream_from_vertex(node) }
>> +  end
>> +
>> +  def upstream_from_vertex(v)
>> +   �...@upstream_from[v]   || @in_to[v].keys.inject(   @upstream_from[v]   =
>> {}) { |result,node| result.update(node=>1).update upstream_from_vertex(node)
>> }
>> +  end
>> +
>
> These two methods are very difficult to read.  Is there a good reason not to
> do something like this?
> def downstream_from_vertex(v)
>   return @downstream_from[v] if @downstream_from[v]
>   result = @downstream_from[v] = {}
> �...@out_from.keys.each do |node|
>     result[node] = 1
>     result.update(downstream_from_vertex(node))
>   end
>   result
> end

Because I find early returns & mucking about with temporary variables
like "result" much more difficult to read?  The first one tells you
explicitly what the result is and the second just tells you that it's
what you get if you follow some procedure.  Also, the second form is
longer and, I suspect, slower.

Taking the idioms individually:

def foo
  @foo || (exp)
end

is idiomatic ruby; the alternative:

def foo
  return @foo if @foo
  result = @foo = (exp)
  result
end

doesn't look like ruby at all, and has way too many moving parts.

As for using inject to build a hash in a manner analogous to the way
collect builds an array, it isn't a common but it's still (IMHO)
better than the procedural version.  I'd be thrilled if ruby had hash
analogs for collect, flatten, etc. but I don't think using inject is
that much less readable than using map instead of collect for arrays.

The present version (with "node" changed to "x" to avoid a grammatical issue):

  def downstream_from_vertex(v)
    @downstream_from[v] ||
@out_from[v].keys.inject(@downstream_from[v] = {})  { |result,x|
result.update(x=>1).update downstream_from_vertex(x) }
  end

can be read off in English as: the nodes downstream from vertex v
(memoized) is a hash containing x and the nodes downstream from x for
each node x on an edge out from v.

> Also, these methods are only used by dependencies and dependents, above, and
> those methods only care about the keys of the returned hashes.  It seems
> like it would be clearer to implement them using Set objects rather than
> hashes.

The performance of Sets isn't that great in some versions of ruby.
IIRC it killed about 50-80% of the speed improvement.

-- Markus
-----------------------------------------------------------
The power of accurate observation is
commonly called cynicism by those
who have not got it.  ~George Bernard Shaw
------------------------------------------------------------

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en.

Re: [Puppet-dev] [PATCH/puppet 1/1] [#4590] SimpleGraph is slow

Reply via email to