The union find algorithm would be appropriate for labeling connected components as you build the graph.
Andrew Baine On Thursday, July 22, 2010, Arijit Mukherjee <ariji...@gmail.com> wrote: > Thanx to both of you. Yes, I can just check whether the label exists > on the node or not. In my case checking for Integer.MIN_VALUE which is > what is assigned when the subscriber node is created. > > BTW - is it ever possible to label the components while creating the > graph? I can't think of any way of doing this - but I might be missing > something... > > Regards > Arijit > > On 22 July 2010 20:54, Vitor De Mario <vitordema...@gmail.com> wrote: >> As far as the algorithm goes, I see nothing wrong. Connected components is a >> well known problem in graph theory, and you're doing just fine. >> >> I second the recommendations of Tobias, specially the second one, as you >> would get rid of the labelled collection completely, and that improves you >> both in time and memory. >> >> []'s >> Vitor >> >> >> On Thu, Jul 22, 2010 at 11:35 AM, Tobias Ivarsson < >> tobias.ivars...@neotechnology.com> wrote: >> >>> The first obvious thing is that labelled.contains(currentNode.getId()) is >>> going to take more time as your dataset grows, since it's a linear search >>> for the element in an ArrayList. A HashSet would be a much more appropriate >>> data structure for your application. >>> >>> The other thing that comes to mind is the memory overhead of the >>> labelled-collection. Eventually it is going to contain every node in the >>> graph, and be very large. This "steals" some of the memory that could have >>> been used for caching the graph, forcing Neo4j to do more I/O than it would >>> have if it could have used that memory for cache. >>> >>> Would it be possible for you to replace >>> the !labelled.contains(currentNode.getId())-check with >>> currentNode.getProperty("componentID",null) == null? Or are there >>> situations >>> where the node could have that property and not be considered labeled? >>> >>> Cheers, >>> Tobias >>> >>> On Thu, Jul 22, 2010 at 3:35 PM, Arijit Mukherjee <ariji...@gmail.com >>> >wrote: >>> >>> > Hi All >>> > >>> > I'm trying to label all connected components in a graph - i.e. all >>> > nodes that are connected will have a common "componentID" property >>> > set. I'm using the Traverser to do this. For each node in the graph >>> > (unless it is already labelled, which I track by inserting the node ID >>> > in a list), the traverser finds out all the neighbours using BFS, and >>> > then the node and all the neighbours are labelled with a certain >>> > value. The code is something like this - >>> > >>> > Iterable<Node> allNodes = graphDbService.getAllNodes(); >>> > ArrayList labelled = new ArrayList(); >>> > for (Node currentNode : allNodes) { >>> > if (currentNode.hasProperty("number") && >>> > !labelled.contains(currentNode.getId())) { >>> > Traverser traverser = >>> > currentNode.traverse(Order.BREADTH_FIRST, >>> > StopEvaluator.END_OF_GRAPH, >>> > ReturnableEvaluator.ALL_BUT_START_NODE, >>> > RelTypes.CALLS, Direction.BOTH); >>> > int currentID = initialID; >>> > initialID++; >>> > currentNode.setProperty("componentID", currentID); >>> > labelled.add(currentNode.getId()); >>> > for (Node friend : traverser) { >>> > friend.setProperty("componentID", currentID); >>> > // mark each node as labelled >>> > labelled.add(friend.getId()); >>> > } >>> > } >>> > } >>> > >>> > -- > "And when the night is cloudy, > There is still a light that shines on me, > Shine on until tomorrow, let it be." > _______________________________________________ > Neo4j mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user > -- Andrew Baine cell: 202-494-8716 _______________________________________________ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user