Hello, Is there a way to remove points from a BallTree or KDtree object? Alternatively, does anyone know of an Rtree implementation in either sklearn, or a sklearn dependency such as scipy or numpy?
I have pending pull request (#1984) that has run time mainly constrained by nearest neighbor lookups. These lookups are a function of tree size; so for example, a 'large' tree may have a per call query time of 0.07 seconds, while a 'small' tree may have a per call query time of 0.0001 seconds. Once I execute a neighborhood query, I don't care about that point anymore-- I can exclude it from future results. In other words, I'm looking for a way to 'shrink' the query trees as I process a dataset, such that the first query has a query time of 0.07, and the n/2 query has a query time between 0.07 and 0.0001 seconds query time. Being able to do this would be a substantial speedup in run time (over an order of magnitude in many cases). I can rebuild the query tree using only unprocessed points, but it takes over a second to rebuild from scratch on a large data set (although the rebuild time does drop as the set shrinks). I don't want to pause and wait for for the rebuild-- unless there is a faster way to delete and 'rebalance' the tree as opposed to building from scratch. I could also asynchronously rebuild the tree -- i.e., use a previously built tree to execute queries, while simultaneously rebuilding a new tree and then swap the new tree in for queries once built. The idea here would be to have one process running the queries, while a second continuously rebuilt the tree using only the currently unprocessed points. This is a bit of an ugly hack, and I'm honesty not sure a.) how exactly to implement it, and b.) if it would be possible to keep python 2.x compatibility for this type of approach ('asyncio' is only included in the standard library from python 3). The best solution I can think of would be either an Rtree, or a call to delete points from an existing query tree. Does anyone know of a way to accomplish this? Are there other suggestions/solutions to the above? Thanks, Shane
------------------------------------------------------------------------------
_______________________________________________ Scikit-learn-general mailing list Scikit-learn-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/scikit-learn-general