Hello,

Is there a way to remove points from a BallTree or KDtree object?
Alternatively, does anyone know of an Rtree implementation in either
sklearn, or a sklearn dependency such as scipy or numpy?

I have pending pull request (#1984) that has run time mainly constrained by
nearest neighbor lookups. These lookups are a function of tree size; so for
example, a 'large' tree may have a per call query time of 0.07 seconds,
while a 'small' tree may have a per call query time of 0.0001 seconds. Once
I execute a neighborhood query, I don't care about that point anymore-- I
can exclude it from future results. In other words, I'm looking for a way
to 'shrink' the query trees as I process a dataset, such that the first
query has a query time of 0.07, and the n/2 query has a query time between
0.07 and 0.0001 seconds query time. Being able to do this would be a
substantial speedup in run time (over an order of magnitude in many cases).

I can rebuild the query tree using only unprocessed points, but it takes
over a second to rebuild from scratch on a large data set (although the
rebuild time does drop as the set shrinks). I don't want to pause and wait
for for the rebuild-- unless there is a faster way to delete and
'rebalance' the tree as opposed to building from scratch.

I could also asynchronously rebuild the tree -- i.e., use a previously
built tree to execute queries, while simultaneously rebuilding a new tree
and then swap the new tree in for queries once built. The idea here would
be to have one process running the queries, while a second continuously
rebuilt the tree using only the currently unprocessed points. This is a bit
of an ugly hack, and I'm honesty not sure a.) how exactly to implement it,
and b.) if it would be possible to keep python 2.x compatibility for this
type of approach ('asyncio' is only included in the standard library from
python 3).

The best solution I can think of would be either an Rtree, or a call to
delete points from an existing query tree. Does anyone know of a way to
accomplish this?

Are there other suggestions/solutions to the above?

Thanks,
Shane
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to