Re: [Pharo-project] [squeak-dev] Re: Graph library in Smalltalk - Need for advices

YossiDM Tue, 21 Dec 2010 15:25:37 -0800

Sorry for the late reply, I've been away on business. I'm going to have to
make one of those unfortunate long posts, so hopefully I can hold the
attention span of those already interested anyway.

Cédrick - I found the main limitation with Lemon to be that it was a pain to
extend as well as to integrate a nice persistence layer. 

Stéphane, regarding Mondrian - I've looked over the code and while it's ok
for what it does, I think everyone would admit it's not designed to be a
real graph library. It does what it needs to do and no more, so I would keep
it that way for now. You really do need to design a good graph lib from the
ground up.

It's been my experience after using probably 20 different graph libraries in
real projects over the years that the following themes reoccur:

1. Library is good for visualization, but not for calculations - shortest
path, centrality, MST, etc.

2. Library is good for algorithms but has no visualization support

3. Graph library relies too much on other libraries that are not that
impressive either. Graphviz comes to mine. Great tool, but frankly the
visualizations looks awful by today's standards and it's harder to do
something fully dynamic. Piping a dot file to a unix process to produce a
static image is not terribly useful in things like web applications.

4. Library does not scale predictably or to a certain point. 

5. Library does not support persistence properly.

6. Library forces an API that takes over your data too much or doesn't play
well with existing data. Any model that forces inheritance for Vertex and
Edge classes comes to mind. See neo4j for some examples of a poor API on
this point.

7. Library is good all around but lacks depth (missing algorithms, measures,
etc) and does not play well with others.

8. Every library has its own idea of what a vertex and edge should be and
there's no good way of passing data between them in a nice manner.

9. Visualization libraries have a multitude of issues - specific to desktop
or web only, static output, no capability to annotate vertex or edges, no
pruning... I have yet to find one that is both pretty and useful to this day
without nearly recreating a new library each time.

The reality is every library will have to make compromises at a minimum in
the following areas:

1. Scalability
2. Data structures
3. Algorithms
4. Performance
5. Visualization

I think you need to at least consider all the above when creating a library.
In fact, you probably should break it up as many libraries do into several
components:

1. Common, useful data structures (Vertex, Edge, Adjacency List, Adjaceny
Matrix, etc.) that are optimized for your language - hopefully smalltalk

2. Computations / Measurements

3. Algorithms - layout, common graph problems, etc.

4. Persistence - database or otherwise, likely several supporting libs
specific to the persistence mechanism or implementing some kind of pattern
to be agnostic. The data structures should at least be designed in such a
way that they are persistence friendly to put in something like Gemstone,
Magma, Image, etc. See for example InfiniteGraph which has a Java interface
built on Objectivity.

5. Visualization - desktop, web, and static output.

The problem here is that creating a graph library is a huge undertaking. It
might not sound like it, but they can grow to epic proportions. From my
comments, you can see that it is really not the fault of any particular lib,
but rather the subject matter. You could spend a lifetime packing in
features. The real key is just to create a series of libs that work well
together and not constantly reinvent the wheel with node classes in each
library. 

A secondary problem is doing graph analysis and even drawing graphs can take
a lot of horse power. Parallelism is a tough issue with graph libraries and
there's a multitude of approaches from pure threads to map/reduce to random
walking and stream processing.  This is further compounded by persistence
where you need to start doing things like graph healing, partitioning, and
sharding to scale to massive levels and maintain performance.

I just want to mention to others that graphs are hugely useful in general to
solve a variety of problems from recommendations to meta programming. It's
not just pretty pictures and experiments with molecules. There's a lot of
potential in Smalltalk to do something since generally I feel Smalltalkers
aren't bound by or afraid of new approaches to old problems. Graph databases
in many cases could replace relational DBs for example and let your app do
all kinds of stuff you might never have thought of otherwise.

I could go on and on, but I'll stop myself here. Comments or thoughts?

Stéphane Ducasse wrote:
> 
>> 
>> I've started looking at the exemples YossiDM gave to me and in particular
>> Lemon which was according to him his best experience. I found the model
>> quite clear and covering all what I expect for a generic graph lib
>> (directed, undirected, mapping concept, iterators, and algorithms of
>> course). Moreover and contrary to Boost, it's still developed in 2010.
>> 
>> To be more precise, here's what I expect for a generic graph lib in
>> smalltalk (note all in Lemon except visualization):
>> 
>> - data structure: directed graphs, undirected graphs, possible loop and
>> parallel edged, ..., trees (?)
>> - mapping: easily map objects, informations on nodes and/or edges (here,
>> don't know if I'd like subclassing nodes/eges instead...)
>> - iterators: efficient way to iterate over nodes and edges
>> - algorithms: basic algorithms implementation (bfs, dfs, ..., shortest
>> paths, ...), and plug-ability for specific ones...
>> - visualization: having an interactive graph visualization web/SVG and
>> eventually morphic (... graphviz, mondrian, .......)
>> 
>> then, I could use this for my research work...
>> - I need "belief" nodes with associated conditional beliefs tables
>> - I need to implement algorithms to propagate an information change
>> (change of a node state) in any nodes... 
>> ****mainly, I'd like to get junction trees from a graph [1] which are
>> rely useful for several domains in machine learning field ***
>> 
>> Actually, I don't know if I really need a graph lib as a simple
>> implementation directed to bayesian should be enough but it's the second
>> time I need graphs (last time was for planification) and I think that
>> would be great to have a nice and clean basic implementation.
>> 
>> Couldn't we start developing something similar to Lemon (regarding "API",
>> enitites, etc...) that would work for small scale project project in
>> smalltalk ?
> 
> It would be excellent.
> Because now that you have a full time permanent position you can invest a
> bit and in 2 years you can get something really sexy....
> This is what we are doing all the time around pharo.
> 
>> Yossi, what were the limitations you found with Lemon ?
>> 
>> Cheers,
>> 
>> Cédrick
>> 
> 
> 
> 
> 

-- 
View this message in context: 
http://forum.world.st/Graph-library-in-Smalltalk-Need-for-advices-tp3092747p3159886.html
Sent from the Pharo Smalltalk mailing list archive at Nabble.com.

Re: [Pharo-project] [squeak-dev] Re: Graph library in Smalltalk - Need for advices

Reply via email to