Heya, sorry for not responding sooner. Running those algorithms is expensive (O(n^3), from memory), so that's going to be a big limiting factor, and I worry that your graph may be too big for them. The max_iter param is certainly available for tuning; it trades off the accuracy of the result against running time.

Totally speculating: I don't think sparsifying would help much with these implementations, since both create fully connected graphs as part of the graph construction step. I think sparsification would help a lot if you instead directly simulated the particle movements through the graph, rather than using these exact solutions.
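To make the "simulate it directly on the sparse graph" idea a bit more concrete, here is a minimal sketch. It is not scikit-learn's code; it is the standard label spreading iteration F <- alpha * S * F + (1 - alpha) * Y, with S = D^-1/2 W D^-1/2, run on a scipy.sparse matrix built from an edge list. The function name spread_labels and its arguments are made up for illustration, and I'm assuming labels are integer class ids with -1 meaning "unlabeled".

```python
import numpy as np
from scipy import sparse


def spread_labels(rows, cols, weights, y, n_nodes, alpha=0.2, max_iter=30, tol=1e-3):
    """Iterative label spreading on a precomputed sparse graph (illustrative only).

    rows, cols, weights: the edge list (node_i, node_j, weight).
    y: one class id per node, -1 for unlabeled nodes.
    """
    # Build a sparse affinity matrix from the edge list and make it symmetric.
    W = sparse.coo_matrix((weights, (rows, cols)), shape=(n_nodes, n_nodes)).tocsr()
    W = W.maximum(W.T)

    # Symmetric normalisation S = D^-1/2 W D^-1/2; isolated nodes keep a zero row.
    degree = np.asarray(W.sum(axis=1)).ravel()
    inv_sqrt = np.zeros(n_nodes, dtype=float)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    D = sparse.diags(inv_sqrt)
    S = (D @ W @ D).tocsr()

    # One-hot matrix of the known labels; unlabeled rows stay all zero.
    classes = np.unique(y[y >= 0])
    Y = np.zeros((n_nodes, classes.size))
    for k, c in enumerate(classes):
        Y[y == c, k] = 1.0

    # Each step costs one sparse matrix product, i.e. time proportional to the edge count.
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * (S @ F) + (1.0 - alpha) * Y
        delta = np.abs(F_next - F).sum()
        F = F_next
        if delta < tol:
            break
    return classes[F.argmax(axis=1)]
```

The point of doing it this way is that nothing dense of size n x n is ever materialised: memory is roughly the edge list plus an n x n_classes array, and max_iter directly bounds the work.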
For #2, what if you subclassed the LabelSpreading class and overrode _build_graph <https://github.com/scikit-learn/scikit-learn/blob/a5ab948/sklearn/semi_supervised/label_propagation.py#L449> to inject the graph that you set up? May be a big hack.

On Thu, Dec 1, 2016 at 7:33 PM, Delip Rao <delip...@gmail.com> wrote:
> Hello,
>
> I have an existing graph dataset in the edge format:
>
> node_i node_j weight
>
> The number of nodes are around 3.6M, and the number of edges are around
> 72M.
>
> I also have some labeled data (around a dozen per class with 16 classes in
> total), so overall, a perfect setting for label propagation or its
> variants. In particular, I want to try the LabelSpreading implementation
> for the regularization. I looked at the documentation and can't find a way
> to plug in a pre-computed graph (or adjacency matrix). So two questions:
>
> 1. What are any scaling issues I should be aware of for a dataset of this
> size? I can try sparsifying the graph, but would love to learn any knobs I
> should be aware of.
> 2. How do I plugin an existing weighted graph with the current API? Happy
> to use any undocumented features.
>
> Thanks in advance!
> Delip
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
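Coming back to the subclassing idea for #2, a rough sketch of what I had in mind is below. Big caveats: _build_graph is a private method, so this may break between scikit-learn releases; the class name PrecomputedLabelSpreading and the affinity parameter are made up here; and I'm assuming that fit only uses X for input validation once _build_graph is overridden, which is why a dummy X is passed in.

```python
import numpy as np
from scipy import sparse
from sklearn.semi_supervised import LabelSpreading


class PrecomputedLabelSpreading(LabelSpreading):
    """Hypothetical subclass that injects a caller-supplied affinity matrix."""

    def __init__(self, affinity=None, alpha=0.2, max_iter=30, tol=1e-3):
        super().__init__(alpha=alpha, max_iter=max_iter, tol=tol)
        # Symmetric (n_nodes, n_nodes) weight matrix, dense or scipy.sparse.
        self.affinity = affinity

    def _build_graph(self):
        # Private hook, not public API: ignore self.X_ and return the
        # normalised graph S = D^-1/2 W D^-1/2 built from the given weights.
        W = sparse.csr_matrix(self.affinity, dtype=float)
        degree = np.asarray(W.sum(axis=1)).ravel()
        inv_sqrt = np.zeros_like(degree)
        nonzero = degree > 0
        inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
        D = sparse.diags(inv_sqrt)
        return (D @ W @ D).tocsr()
```

Usage would be something like PrecomputedLabelSpreading(affinity=W).fit(np.zeros((n_nodes, 1)), y), with y set to -1 for unlabeled nodes, and then reading transduction_. predict on new points would not be meaningful, since there are no real features behind the graph.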