Hi Eric,
I think this sort of enhancement might fit better in scipy: much of the
sparse graph package is back-ported from there.
Regarding your specific problem: you might try using the ``dijkstra()``
function directly and passing the list of nodes you're interested in to the
``indices`` parameter. This will use Dijkstra's algorithm to compute
shortest paths for only those nodes (though for each of them it computes
shortest paths to all other nodes in the graph).
Hope that helps,
Jake
On Thu, Oct 10, 2013 at 5:05 PM, Eric Larson <[email protected]>wrote:
> Hello,
>
> I'm doing some work involving finding the shortest Dijkstra distances
> between nodes in a graph. I'm interested in contributing to sklearn to
> overcome an issue I ran into.
>
> My problem is that I have over N=100,000 nodes defining the graph, but I'm
> only interested in getting distances between all pairs within a subset of
> them (e.g., between the M=10,000 nodes of interest). Not only would
> calculating /all/ pairs be incredibly inefficient, but it causes a memory
> error when using `graph_shortest_path.py`. Unfortunately
> `single_source_shortest_path` doesn't do Dijkstra distances (only
> unweighted), so I can't use that -- it's also not anywhere near as
> efficient. I can get away with using networkx functions, but they are also
> much slower than the sklearn Cython code.
>
> It seems the following fairly simple changes to `graph_shortest_path.pyx`
> would enable my desired behavior of subselecting a set of nodes to
> calculate pairwise distances for:
>
> 1. Add argument `nodes=None` to allow specification of nodes of interest.
> A user passing `None` would basically be translated into being
> `np.arange(N)`, where N is the number of nodes in the graph, to maintain
> current behavior. When argument != None, `dijkstra` must be the selected
> method, unless there's an easy way to modify `floyd_warshall` to operate on
> selected rows (I don't immediately see how it's possible).
> 2. Modify `dijkstra` to take and pass the additional `nodes` argument (now
> actually an array).
> 3. Modify `dijkstra` to loop over only selected indices (line 213 and 229:
> `for i from 0 <= i < N:`).
> 4. Modify the output type of `dijkstra` and `graph_shortest_path` to be an
> MxM array, where M is the number of nodes for which distances are
> calculated.
>
> Instead of changing `graph_shortest_path`, a second option is to overhaul
> `single_source_shortest_path_length` to make use of the `dijkstra` code, as
> suggested on the developers utilities page. The issue with changing the
> behavior of that function, though, is that it doesn't currently do weighted
> path lengths, so it might cause an effective API change. It also wouldn't
> be as efficient to loop over my 10,000 vertices in python, select the
> appropriate 10,000-element subset from the 100,000 that were computed, and
> make an array of those (compared to doing everything in C).
>
> A third option is to make another function entirely, perhaps
> `multiple_source_shortest_path`. It would borrow heavily from
> `graph_shortest_path.pyx`. This option doesn't sound that great to me due
> to code duplication.
>
> Thoughts?
>
> Cheers,
> Eric
>
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general