I have an existing graph dataset in edge-list format, one edge per line:
node_i node_j weight
The graph has around 3.6M nodes and around 72M edges.
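
For concreteness, here is a minimal sketch of how I would load such a file into a sparse adjacency matrix. It assumes zero-based integer node ids, whitespace-separated columns, and that each undirected edge appears once; the helper name `load_edge_list` is my own, and the in-memory string stands in for the real file.

```python
import io

import numpy as np
import scipy.sparse as sp


def load_edge_list(source, n_nodes=None):
    """Read 'node_i node_j weight' lines into a symmetric CSR adjacency matrix.

    Hypothetical helper: assumes zero-based integer node ids.
    """
    edges = np.loadtxt(source, dtype=np.float64, ndmin=2)
    rows = edges[:, 0].astype(np.int64)
    cols = edges[:, 1].astype(np.int64)
    weights = edges[:, 2]
    if n_nodes is None:
        n_nodes = int(max(rows.max(), cols.max())) + 1
    adj = sp.coo_matrix((weights, (rows, cols)), shape=(n_nodes, n_nodes)).tocsr()
    # Assumption: each undirected edge is listed once, so mirror it
    # to make the matrix symmetric.
    return adj.maximum(adj.T).tocsr()


# Tiny stand-in for the real edge file.
demo_file = io.StringIO("0 1 0.5\n1 2 2.0\n2 3 1.0\n")
W = load_edge_list(demo_file)
print(W.shape, W.nnz)
```

At full size the mirrored matrix would hold about 144M nonzeros, which at 8-byte values and 4-byte column indices comes to roughly 1.7 GB before any sparsification. `np.loadtxt` is likely too slow for a 72M-line file, so a faster reader would probably be needed in practice.
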
I also have some labeled data (around a dozen per class with 16 classes in
total), so overall, a perfect setting for label propagation or its
variants. In particular, I want to try the LabelSpreading implementation
for the regularization. I looked at the documentation and can't find a way
to plug in a pre-computed graph (or adjacency matrix). So two questions:
1. What scaling issues should I be aware of for a dataset of this size? I
can try sparsifying the graph, but would love to learn about any other
knobs I should know about.
2. How do I plug in an existing weighted graph with the current API? Happy
to use any undocumented features.
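
To make question 2 concrete, this is roughly the computation I would like to run on the pre-computed graph. It is a from-scratch sketch using scipy.sparse, not scikit-learn's API: it iterates F <- alpha * S * F + (1 - alpha) * Y with S = D^(-1/2) W D^(-1/2), which is my understanding of the normalized-Laplacian label spreading scheme. The function name `spread_labels` and its parameters are my own; unlabeled nodes are marked with -1, and the default values are picked to mirror LabelSpreading's documented defaults.

```python
import numpy as np
import scipy.sparse as sp


def spread_labels(W, y, n_classes, alpha=0.2, max_iter=30, tol=1e-3):
    """Label spreading on a pre-computed symmetric sparse affinity matrix W.

    Hypothetical helper, not part of scikit-learn. y holds class ids,
    with -1 for unlabeled nodes.
    """
    n = W.shape[0]
    degree = np.asarray(W.sum(axis=1)).ravel()
    inv_sqrt = np.zeros(n)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    D = sp.diags(inv_sqrt)
    S = (D @ W @ D).tocsr()  # symmetrically normalized affinity matrix

    # One-hot encode the known labels; unlabeled rows stay all zero.
    Y = np.zeros((n, n_classes))
    labeled = y >= 0
    Y[labeled, y[labeled]] = 1.0

    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * (S @ F) + (1.0 - alpha) * Y
        converged = np.abs(F_next - F).sum() < tol
        F = F_next
        if converged:
            break
    return F.argmax(axis=1)


# Toy graph: two 3-node chains joined by one weak edge (2-3).
rows = np.array([0, 1, 2, 3, 4])
cols = np.array([1, 2, 3, 4, 5])
vals = np.array([1.0, 1.0, 0.1, 1.0, 1.0])
upper = sp.coo_matrix((vals, (rows, cols)), shape=(6, 6))
W_demo = (upper + upper.T).tocsr()

y_demo = np.array([0, -1, -1, -1, -1, 1])  # only nodes 0 and 5 are labeled
pred = spread_labels(W_demo, y_demo, n_classes=2)
print(pred)
```

Each iteration here is one sparse-times-dense product, and the dense label matrix for 3.6M nodes and 16 classes is about 460 MB in float64, so by my estimate the iteration itself should fit in memory as long as the graph stays sparse.
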
Thanks in advance!
scikit-learn mailing list