On Tue, Jan 28, 2014 at 9:25 PM, Olivier Grisel <olivier.gri...@ensta.org>wrote:
> While vanilla LSH is an interesting baseline for Approximate Nearest
> Neighbors search, it is often too error-prone to be practically
> useful. There exists alternative data-driven ANN methods that can have
> a much lower error rates (depending on the data). Among the top
> implementations there are FLANN [1] and Spotify's Annoy [2]. Both are
> written in C++ with Python bindings: there is a bench here by Radim:
>
>
> http://radimrehurek.com/2014/01/performance-shootout-of-nearest-neighbours-querying/
>
> It would be interesting to implement the baseline vanilla LSH (either
> with random projections or min-hash) and / or Cython versions of the
> random projection forests from Annoy and / or the Hierarchical k-means
> trees and randomized K-D trees from FLANN.
>
As always, I think the rule of thumb for inclusion in scikit-learn should
be that the algorithm is standard in the field and have a fairly high
citation count. Is this the case for the algorithms you mention? What are
the 2 or 3 most famous LSH algorithms?
> All approaches could be probably be implemented quite effiently by
> reusing existing functions and classes from other sklearn modules. For
> instance building 100 FLANN-style randomized kd-trees would look very
> similar to:
>
> ExtraTreesRegressor(n_estimators=100, max_features=1).fit(data,
> np.zeros(data.shape[0]))
>
> That sounds similar to
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomTreesEmbedding.html
Mathieu
------------------------------------------------------------------------------
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import
a virtual appliance and go from zero to informed in seconds.
http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general