Re: [Neo] Introduction and some thoughts about the coming algorithm component

Patrik Larsson Fri, 23 May 2008 11:50:44 -0700

Hi Marko!

Marko A. Rodriguez wrote:

Hi Patrick,


My name is Marko Rodriguez.

Your problem of calculating N^(2logN) algorithms (e.g. betweenness) on a
massive graph will not go away by simply placing things in memory :).
Also, a relaxation of your depth isn't going to solve much as natural
network have very small diameters -- the entire World Wide Web citation
graph has a diameter of 16 (and thats considered quite big). As we all
know from the Millgram experiments, 6 degrees is all you get from one
person to another in a social network. Thus, once your traverser hits a
super node (highly connected node) ..... slooooooooooowwww...

Yes, this is one of the greater problems we face if we want to do thesekinds of calculations. The point of copying things into memory is toonly copy a very small portion of the network. But the main problemregardless if we do that or not, assuming doing the calculation on theentire network is impossible, is how to figure out how much/what partsof the network is relevant to include. Any ideas on this matter would beappreciated.


Furthermore, you are also dealing with the problem of how to implement a
single-relational network analysis algorithms (i.e. unlabeled graph
algorithms -- the standard in nearly all textbooks) on a multi-relational
network (i.e. labeled or semantic graph). Neo being multi-relational and
all popular network analysis algorithms being designed for
single-relational networks.

While a multi-relational network is indeed a collection of
single-relational network, where each single-relational "slice" has its
own meaning (e.g. KNOWS, WORKS_WITH, etc.), you simply can't calculate a
centrality measure on a single single-relational "slice" if you expect to
do any sort of interesting computation. For instance, lets say you want to
calculate the centrality of a cocitation network between scholars but all
you have are WROTE (maps human to article) and CITES (maps article to
article) edges. In such cases, you will need to move in the graph in more
interesting ways... You will need to take a WROTE, CITES, WROTE^-1 and
loop on that pattern.... This is a simple example, and for more
complicated semantics, you will need a traverser that is nearly Turing
complete.

We would thus like to loop over patterns instead of just singlerelationships? Very good point.

Take care,
Marko.

Thank you very much!
/Patrik

Peter Neubauer wrote:

Hi Patrik,
there is a lot of commits in your lab, maybe it would be cool to have
a small intro on what you are doing and what the status of the algos
are?

/peter

Hi all!

I actually intended to properly introduce myself when my work is added
as a component (which was supposed to happen soon), anyway...

For everyone who does not know about my work, I am working on
implementing some graph algorithms for Neo. So far the focus has been on
finding shortest paths and using that ability to calculate different
centrality measures for nodes in a Neo network.

So far regarding shortest paths, there are implementations of Dijkstra
to find paths between two given nodes (uses bidirectional search) and
also the classic problem of finding the shortest paths from one source
node to all others. For unweighted networks, where the relationships
have an implicit cost of 1, there is a breadth first search which only
differs from Neos BFS traverser in the way that it can find several
paths to each node.

Regarding measures and centrality there is so far support for stress
centrality, betweenness centrality, closeness centrality, eccentricity,
network radius and network diameter. (For definitions of these, I refer
to wikipedia).

What I am working on right now, except for properly documenting what i
have done so far, is experimenting with cutting out parts of a network
(like a subgraph). Since I would appreciate some thoughts/feedback on
this I will here describe it in more detail.

The issue is this. When doing calculations, doing it on the entire
network would just be too much data to handle and take too much time so
we need some way of limiting what part of the network we run the
calculation on. One way of doing this (partly already supported) is to
give each algorithm some kind of limit, for example we can provide our
BFS search with a maximum depth to traverse or our Dijkstra with a
maximum number of relationships to relax/consider.

The other way of doing this is to first create an in-memory copy of the
relevant part of the network, which is what I am working on, and then
release the calculation on that.

A number of questions arise, like if we would like to keep relationship
types (makes most sense) or be able to rearrange/rename them or truncate
them into one combined type.

The most important question however is; how much should we detach this
copy from the real network?

Should we make an entirely detached copy so we can make changes to it
without affecting the "real" network, for example be able to create the
transitive closure or similar? Or should we assume that the algorithm
beeing used is supposed to make some change to the real network so we
need to reflect these changes down to it?



Any feedback and thoughts on this or any other part of my work
(currently in
https://svn.neo4j.org/laboratory/users/patrik/neoalgo/trunk/) is much
appreciated.

This mail became a bit longer than expected, sorry, but thats the way it
happened ;)

Regards
Patrik Larsson

_______________________________________________
Neo mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user


_______________________________________________
Neo mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user


_______________________________________________
Neo mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo] Introduction and some thoughts about the coming algorithm component

Reply via email to