On Fri, Mar 14, 2014 at 12:49:14PM +0100, Olivier Grisel wrote:
> It is my understanding that the canonical application of Approx
> Nearest Neighbors methods is for multi-media similarity search with
> dense, medium dimensional descriptor vectors. For instance find the
> top 100 similar images in a database of at least several millions
> images summarized by feature vectors ranging from 100s to 1000s of
> descriptors based on possibly pooled BoW encodings of histogram based
> local descriptors.

Yes, agreed. However, I have the impression that making this work
requires more than algorithmic settings, in particular some form of
adapted storage.

> Spotify is using ANN queries for a very different use case: they do
> matrix factorization of User-Item interaction matrix and perform
> similarity queries between items in the latent space (I would say in
> the order of 100 components).

> See the readme for slightly more details: https://github.com/spotify/annoy

Indeed, but they use random projections rather than LSH.

> Another user case would be to implement k-Nearest Neighbors
> classification on datasets with a dimension high enough to render
> exact methods such as KD-tree and ball-tree inefficient (I would say
> 500+ features).

We do this quite often in the lab, and we simply use a randomized PCA on
the train set. It works very well.

For simple KNN on numerical features, what is the evidence that LSH works
better than random projections? Forgive me for asking this question; I
may be unaware of the literature.

Gaël

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
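
To make the question above concrete, here is a minimal sketch of the
"reduce the dimension, then run exact KNN" idea, using a plain Gaussian
random projection. It is neither Annoy's algorithm nor the randomized PCA
pipeline mentioned above. The data is synthetic, the sizes are arbitrary,
and the helper names (gaussian_projection_matrix, project, knn_indices)
are made up for the illustration. It uses only the Python standard
library so that it runs anywhere.

```python
import math
import random


def gaussian_projection_matrix(n_features, n_components, rng):
    """Random matrix with i.i.d. N(0, 1/n_components) entries."""
    scale = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, scale) for _ in range(n_features)]
            for _ in range(n_components)]


def project(X, R):
    """Map each row of X into the low-dimensional space defined by R."""
    return [[sum(r * v for r, v in zip(row, x)) for row in R] for x in X]


def knn_indices(X, query, k):
    """Exact brute-force k nearest neighbours (squared Euclidean distance)."""
    dists = [(sum((a - b) ** 2 for a, b in zip(x, query)), i)
             for i, x in enumerate(X)]
    return [i for _, i in sorted(dists)[:k]]


rng = random.Random(0)
# Arbitrary toy sizes, chosen only to keep the pure-Python loops fast.
n_samples, n_features, n_components, k = 200, 200, 30, 10

# Synthetic dense vectors standing in for real descriptors (assumption).
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
     for _ in range(n_samples)]
# The query is a slightly noisy copy of the first sample.
query = [v + rng.gauss(0.0, 0.1) for v in X[0]]

R = gaussian_projection_matrix(n_features, n_components, rng)
X_low = project(X, R)
query_low = project([query], R)[0]

exact = knn_indices(X, query, k)           # neighbours in the original space
approx = knn_indices(X_low, query_low, k)  # neighbours after projection
recall = len(set(exact) & set(approx)) / k
print(f"recall@{k} of projected vs. exact neighbours: {recall:.2f}")
```

The recall figure only measures how many of the exact neighbours survive
the projection. On unstructured Gaussian data like this it is likely to
be pessimistic, so a meaningful comparison with LSH would need real
descriptors and a proper benchmark.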
