Re: Graph clustering via LinLog force directed layout
In it's ideal form (no artificial information decay) the algo is n*n Every node communicates it's position to every other node From the node's perspective, it considers the positions of every other node and it's relationship to the node before making a move If n*n will scale to 5M nodes on giraph then it's simple If not, we have to condense the messaging -- Barnes-Hut simulation using an Octree or Quadtree structure is one option -- http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation -- this reduces the algo to n*log(n), but adds an additional graph to consider On Wed, Mar 7, 2012 at 11:57 AM, Claudio Martella claudio.marte...@gmail.com wrote: Ok, but why a bipartite graph? I can imagine you can have 2 or 3 coordinates associated with each vertices without actually having multiple vertex types. On Wednesday, March 7, 2012, Timmy Wilson tim...@smarttypes.org wrote: while i'd expect from linlog something like an (x,y) coordinate. is that correct? (x,y,z) is also common -- i think of it as extreme dimension reduction -- after which you're free to use your favorite gis tool -- http://postgis.refractions.net/ -- you can also pull in tools from statistical mechanics/thermodynamics Without complicating the algo for efficiency -- the essence of the energy model is simple -- all nodes repulse each other -- connected nodes attract each other -- this works for any type of graph -- i'm using a weighted directed graph -- in which case the weighs influence the attractive force It's all very continuous/natural -- living in a 3d world we've build a lot of tools/methods to process this kind of information The problem is without efficiency methods like Barnes-Hut we have Cartesian(x,y) or Cartesian(x,y,z) -- because every vertex influences every other vertex An efficient giraph implementation would need 2 layers -- a bipartite graph -- assuming (x,y,z) reduction -- we would need an Octree graph interacting dynamically w/ our graph of interest On Wed, Mar 7, 2012 at 5:00 AM, Claudio Martella claudio.marte...@gmail.com wrote: My personal take is that they do have similar function (they extract communities), but they have a general different type of output. in label propagation you'd end up with an id for each vertex (the vertex id that is the centroid for the community each vertex belongs to), while i'd expect from linlog something like an (x,y) coordinate. is that correct? On Wed, Mar 7, 2012 at 2:45 AM, Timmy Wilson tim...@smarttypes.org wrote: Thank you everyone! I would love to see a comparison of force directed layouts (specifically LinLog) and label propagation. I searched but alas nothing -- they seem to be oddly similar? The current, serial LinLog implementation -- http://code.google.com/p/linloglayout/ -- uses Barnes-Hut simulation -- n*log(n): http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation I guess the root question is -- do you think it's reasonable to use giraph for Barnes-Hut simulation? On Tue, Mar 6, 2012 at 11:34 AM, Claudio Martella claudio.marte...@gmail.com wrote: Hi, I'm not definitely familiar with the algorithm or implementation of LinLog, I've been just a user. It should be doable with Giraph if you can express it in terms of message-passing between vertices and without a dependency on a global view of the graph (except for the convergence criteria, such as total energy). Please consider that Giraph's data model is based on a directed graph, this should be a quite interesting constraint for you, if your implementation is going to modify energy associated with edges (you'd have two views over the undirected edge, one in each endpoint). In general, a good way of doing community analysis would be to look at algorithms that belong to the family of label-propagation clustering algorithms. Hope this helps, Claudio On Tue, Mar 6, 2012 at 3:28 PM, Timmy Wilson tim...@smarttypes.org wrote: Hi giraph community, I'm interested in using giraph for distributed n-body simulation. Initially, i'm interested in force directed layouts -- ie, graph drawing: http://en.wikipedia.org/wiki/Force-based_algorithms_(graph_drawing) I'm interested specifically in Dr. Andreas Noack's LinLog energy model -- which performs well w/ community detection: http://www.informatik.tu-cottbus.de/~an/GD/linlog.html I have a few examples of a serial implementation here: http://www.smarttypes.org/ The model maximizes the distance between all nodes while minimizing the distance between connected nodes. Without getting into too much detail, i'm curious if anyone has considered using giraph for force directed graph embedding (yet another name for it)? I'm also considering something like http://www.mcs.anl.gov/petsc/ or http://www.cs.cmu.edu/~scandal/alg/nbody.html -- which have fast -- Claudio Martella claudio.marte...@gmail.com
Re: Graph clustering via LinLog force directed layout
while i'd expect from linlog something like an (x,y) coordinate. is that correct? (x,y,z) is also common -- i think of it as extreme dimension reduction -- after which you're free to use your favorite gis tool -- http://postgis.refractions.net/ -- you can also pull in tools from statistical mechanics/thermodynamics Without complicating the algo for efficiency -- the essence of the energy model is simple -- all nodes repulse each other -- connected nodes attract each other -- this works for any type of graph -- i'm using a weighted directed graph -- in which case the weighs influence the attractive force It's all very continuous/natural -- living in a 3d world we've build a lot of tools/methods to process this kind of information The problem is without efficiency methods like Barnes-Hut we have Cartesian(x,y) or Cartesian(x,y,z) -- because every vertex influences every other vertex An efficient giraph implementation would need 2 layers -- a bipartite graph -- assuming (x,y,z) reduction -- we would need an Octree graph interacting dynamically w/ our graph of interest On Wed, Mar 7, 2012 at 5:00 AM, Claudio Martella claudio.marte...@gmail.com wrote: My personal take is that they do have similar function (they extract communities), but they have a general different type of output. in label propagation you'd end up with an id for each vertex (the vertex id that is the centroid for the community each vertex belongs to), while i'd expect from linlog something like an (x,y) coordinate. is that correct? On Wed, Mar 7, 2012 at 2:45 AM, Timmy Wilson tim...@smarttypes.org wrote: Thank you everyone! I would love to see a comparison of force directed layouts (specifically LinLog) and label propagation. I searched but alas nothing -- they seem to be oddly similar? The current, serial LinLog implementation -- http://code.google.com/p/linloglayout/ -- uses Barnes-Hut simulation -- n*log(n): http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation I guess the root question is -- do you think it's reasonable to use giraph for Barnes-Hut simulation? On Tue, Mar 6, 2012 at 11:34 AM, Claudio Martella claudio.marte...@gmail.com wrote: Hi, I'm not definitely familiar with the algorithm or implementation of LinLog, I've been just a user. It should be doable with Giraph if you can express it in terms of message-passing between vertices and without a dependency on a global view of the graph (except for the convergence criteria, such as total energy). Please consider that Giraph's data model is based on a directed graph, this should be a quite interesting constraint for you, if your implementation is going to modify energy associated with edges (you'd have two views over the undirected edge, one in each endpoint). In general, a good way of doing community analysis would be to look at algorithms that belong to the family of label-propagation clustering algorithms. Hope this helps, Claudio On Tue, Mar 6, 2012 at 3:28 PM, Timmy Wilson tim...@smarttypes.org wrote: Hi giraph community, I'm interested in using giraph for distributed n-body simulation. Initially, i'm interested in force directed layouts -- ie, graph drawing: http://en.wikipedia.org/wiki/Force-based_algorithms_(graph_drawing) I'm interested specifically in Dr. Andreas Noack's LinLog energy model -- which performs well w/ community detection: http://www.informatik.tu-cottbus.de/~an/GD/linlog.html I have a few examples of a serial implementation here: http://www.smarttypes.org/ The model maximizes the distance between all nodes while minimizing the distance between connected nodes. Without getting into too much detail, i'm curious if anyone has considered using giraph for force directed graph embedding (yet another name for it)? I'm also considering something like http://www.mcs.anl.gov/petsc/ or http://www.cs.cmu.edu/~scandal/alg/nbody.html -- which have fast n-body simulation implementations (Barnes-Hut + Fast Multipole). That said, i think giraph may be a good fit -- curious what the community thinks? Thanks, Timmy Wilson Cleveland, OH -- Claudio Martella claudio.marte...@gmail.com -- Claudio Martella claudio.marte...@gmail.com
Graph clustering via LinLog force directed layout
Hi giraph community, I'm interested in using giraph for distributed n-body simulation. Initially, i'm interested in force directed layouts -- ie, graph drawing: http://en.wikipedia.org/wiki/Force-based_algorithms_(graph_drawing) I'm interested specifically in Dr. Andreas Noack's LinLog energy model -- which performs well w/ community detection: http://www.informatik.tu-cottbus.de/~an/GD/linlog.html I have a few examples of a serial implementation here: http://www.smarttypes.org/ The model maximizes the distance between all nodes while minimizing the distance between connected nodes. Without getting into too much detail, i'm curious if anyone has considered using giraph for force directed graph embedding (yet another name for it)? I'm also considering something like http://www.mcs.anl.gov/petsc/ or http://www.cs.cmu.edu/~scandal/alg/nbody.html -- which have fast n-body simulation implementations (Barnes-Hut + Fast Multipole). That said, i think giraph may be a good fit -- curious what the community thinks? Thanks, Timmy Wilson Cleveland, OH
Re: Graph clustering via LinLog force directed layout
Thank you everyone! I would love to see a comparison of force directed layouts (specifically LinLog) and label propagation. I searched but alas nothing -- they seem to be oddly similar? The current, serial LinLog implementation -- http://code.google.com/p/linloglayout/ -- uses Barnes-Hut simulation -- n*log(n): http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation I guess the root question is -- do you think it's reasonable to use giraph for Barnes-Hut simulation? On Tue, Mar 6, 2012 at 11:34 AM, Claudio Martella claudio.marte...@gmail.com wrote: Hi, I'm not definitely familiar with the algorithm or implementation of LinLog, I've been just a user. It should be doable with Giraph if you can express it in terms of message-passing between vertices and without a dependency on a global view of the graph (except for the convergence criteria, such as total energy). Please consider that Giraph's data model is based on a directed graph, this should be a quite interesting constraint for you, if your implementation is going to modify energy associated with edges (you'd have two views over the undirected edge, one in each endpoint). In general, a good way of doing community analysis would be to look at algorithms that belong to the family of label-propagation clustering algorithms. Hope this helps, Claudio On Tue, Mar 6, 2012 at 3:28 PM, Timmy Wilson tim...@smarttypes.org wrote: Hi giraph community, I'm interested in using giraph for distributed n-body simulation. Initially, i'm interested in force directed layouts -- ie, graph drawing: http://en.wikipedia.org/wiki/Force-based_algorithms_(graph_drawing) I'm interested specifically in Dr. Andreas Noack's LinLog energy model -- which performs well w/ community detection: http://www.informatik.tu-cottbus.de/~an/GD/linlog.html I have a few examples of a serial implementation here: http://www.smarttypes.org/ The model maximizes the distance between all nodes while minimizing the distance between connected nodes. Without getting into too much detail, i'm curious if anyone has considered using giraph for force directed graph embedding (yet another name for it)? I'm also considering something like http://www.mcs.anl.gov/petsc/ or http://www.cs.cmu.edu/~scandal/alg/nbody.html -- which have fast n-body simulation implementations (Barnes-Hut + Fast Multipole). That said, i think giraph may be a good fit -- curious what the community thinks? Thanks, Timmy Wilson Cleveland, OH -- Claudio Martella claudio.marte...@gmail.com