Re: Graph library for Python

M.-A. Lemburg Wed, 09 Dec 2009 16:28:17 -0800

geremy condra wrote:
> On Mon, Dec 7, 2009 at 6:28 PM, M.-A. Lemburg <[email protected]> wrote:
>>>>  * Graph.__init__ should be able to take a list or set
>>>>   of nodes and edges as initializer
>>>
>>> The format of this will need to be thought all the way
>>> through before being implemented. To date, we haven't
>>> come up with anything completely satisfactory, but
>>> AFAIK everybody involved is still open to suggestions
>>> on this.
>>
>> I wasn't thinking of anything clever :-) ...
>>
>> g = Graph(
>>      [Node("a"), Node("b"), Node("c")],
>>      [Edge(Node("a"), Node("b"), "ab"),
>>       Edge(Node("a"), Node("c"), "ac"),
>>       Edge(Node("b"), Node("c"), "bc"),
>>      ])
>>
>> The main motivation here is to get lists, sets and dicts
>> play nice together.
> 
> Generally, we've tried to discourage people from instantiating
> nodes and edges directly, in favor of having them controlled
> through the graph. Maybe something along the lines of:
> 
> g = Graph(nodes=['a', 'b', 'c'], edges=[('a', 'b'), ('a', 'c'), ('b', 'c')])
> 
> ?


That would work as well, but you then miss out on the extra
parameters you can pass to Edge() and Node().

In another email you wrote that Edge() and Node() constructors
should not be used directly, since they have to know the Graph
to which they belong.

Wouldn't it be possible to implement this kind of parent-
referencing in a lazy way, ie. by postponing all the init-
processing until the Graph instance is known ?

>>>>  * Graph.__setitem__ could be mapped to .add_node() for
>>>>   convenience
>>>
>>> This may be a question of playing around with it. ATM I'm
>>> not sold on the idea, but I'll implement it and see how it
>>> turns out in practice.
>>
>> Thinking about it some more, I agree, it's not all that useful.
>>
>>>>  * Graph.__length__ could be mapped to .size() for
>>>>   convenience
>>>
>>> We decided not to do this due to the ambiguity between
>>> whether .size() or .order() was the intended operation,
>>> and looking back I'm not sure that was entirely unjustified.
>>> Do you see there being any confusion on that score?
>>
>> There is an ambiguity here, indeed. My thinking was that
>> the edges define the graph and can be mapped to a dictionary,
>> e.g.
>>
>> d = {"ab": ("a", "b"),
>>     "ac": ("a", "c"),
>>     "bc": ("b", "c")}
>>
>> len(d) == 3
>> len(g) == 3
>>
>> A graph without edges is also what you typically call an empty
>> graph, ie.
>>
>> if not g:
>>    print 'empty'
>>
>> http://mathworld.wolfram.com/EmptyGraph.html
>>
>> Still, perhaps it's better to just not go down this route.
> 
> I'd rather avoid it if possible, since there are many
> potential interpretations of this in different contexts.
> Again, I'm open to to the idea, but not convinced.
> 
>>>>  * Graph.__iter__ could be mapped to an iterator using
>>>>   the fastest traversal method for the graph nodes
>>>>   (ie. order does not matter, it's only important that
>>>>   all nodes are found as fast as possible)
>>>
>>> Again, it seems ambiguous as to whether nodes or
>>> edges are the intended target here, and while the
>>> API can obviously dictate that, it seems a bit like
>>> a case of "in the face of ambiguity, refuse the
>>> temptation to guess" to me.
>>
>> Right, but sometimes "practicalty beats purity" ;-) We had
>> the same situation for dictionaries and then decided that
>> iteration over keys would be more natural than iterating over
>> items (key, value) or values.
>>
>> It's also important to note that:
>>
>>        for n in g: print n
>>
>> and
>>
>>        n in g
>>
>> match up in terms of semantics.
>>
>> Since n in g already uses the "node in graph" approach,
>> the for-loop should use the same logic.
> 
> Beware, that doesn't just match nodes.
> 
> g = Graph()
> g.add_node('a')
> g.add_node('b')
> g.add_edge('a', 'b', 'ab')
> 'ab' in g # returns true

I'd avoid such an ambiguity. It could easily hide programming
errors (testing for edges instead of nodes).

OTOH, you could also regard the graph as a set of nodes
and edges (as you apparently already do). In that case,
you'd define

for x in g: print x

as iteration of both nodes and edges in some arbitrary
order and then use the more specific:

for n in g.nodes: print x
for e in g.edges: print x

for iteration over just the nodes or edges.

>>>>  * Graph could also benefit from some bulk update methods,
>>>>   e.g. to update weights on all edges or nodes by passing
>>>>   in a dictionary mapping names to attribute values
>>>
>>> Sounds good. Again, the format will need some careful
>>> thought, but you're right that this would improve it
>>> greatly.
>>
>> This is only an optimization, but could lead to some great
>> performance improvements by avoiding function call overhead.
> 
> Agreed.
> 
>>>>  * Graph could benefit from some bulk query methods,
>>>>   such as .get_node_attributes() and .add_edges().
>>>
>>> I'm not sure how the first is different from the existing
>>> .data, what did you have in mind?
>>
>> The second one was a typo. It should have been
>> .get_edge_attributes().
>>
>> In both cases I was thinking of being able to quickly
>> extract a number of node/edge attributes, e.g.
>>
>>>>> g.get_edge_attributes('weight', 'color')
>> {"ab": {"weight": 3, "color": "blue"},
>>  "ac": {"weight": 2, "color": "green"},
>>  "bc": {"weight": 1, "color": "red"}}
>>
>> Again, the idea is to reduce call overhead and later
>> on be able to move these lookups to C.
> 
> Entirely doable, but I'm not sure I see the use case.
> Would you mind providing a more realistic example?

The idea is that you use the Graph representation of the
data to implement some algorithm e.g. optimizing weights
for example.

The algorithm could be implemented in C to be fast enough
for large data sets.

The above two methods would then be used to quickly push the
original data into the graph and extract it again after
the algorithm has run.

>>>>  * Node.__getitem__ could be mapped to .data.__getitem__
>>>>   (following the approach taken by ElementTree); same
>>>>   for Edge.__getitem__
>>>
>>>>  * Node.__setitem__ could be mapped to .data.__setitem__
>>>>   (following the approach taken by ElementTree); same
>>>>   for Edge.__setitem__
>>>
>>> .data is just a property that returns a dictionary of non
>>> private members right now, so if you're wanting to just say
>>> node.this = 'stuff', you can already do that. Or am I
>>> misreading your intent?
>>
>> Right, but with attributes you are restricted to Python
>> identifiers, e.g. keywords (e.g. "from", "pass") are not allowed.
>> The above approach bypasses this restriction.
> 
> Hmm. Sounds like a plausible use case to me, although I'm
> not sure its one that should be encouraged. The bigger
> question in my mind is whether all attribute lookups should
> have to pay the extra lookup cost to support a somewhat
> narrow need. I'll definitely have to talk with the other devs
> about this one, and run a little bit of timing besides.

This would only be an optional second way of accessing
the .data dictionary.

node.data['pass'] = 1
node.data['from'] = 'Las Vegas'
node.data['to'] = 'New York'

would work without such modifications.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 10 2009)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Graph library for Python

Reply via email to