Hi Paul > My questions for the group are: > > 1) Are there any suggestions for how subgraphs might be algorithmically > determined? Are there "clustering" algorithms that might be leveraged? > What you describe brings to my mind the field of "community detection" techniques, i.e. methods that identify groups of nodes on a graph that are more densely connected to each other than to the rest of the network. There is a real abundance of methods for solving this problem, each with its strengths and weaknesses. Perhaps the most widely known methods are the ones based on the concept of modularity (introduced by prof. M.E.J. Newman). Those methods, however, tend to produce communities that are highly skewed in size that is you get few gigantic communities and numerous smaller ones. There are also hierarchical methods that result in a community dendrogram and enable you to cut in a way such that communities have more balanced sizes. My personal favorites are variants of the SCAN method (by Xu et al., KDD 2007) which do not assign all graph nodes to communities and are scalable to graphs of the sizes you mention (million+ nodes, tens of millions of edges).
Best regards, Symeon _______________________________________________ Neo4j mailing list [email protected] https://lists.neo4j.org/mailman/listinfo/user

