On 08/06/2014 07:25 PM, Maheshakya Wijewardena wrote:
> Actually in our implementation of LSH Forest, we have an extra parameter
> to control the candidate acquisition (to avoid having the candidates with
> very small hash length matches - lower bound for max_depth) for
> `kneighbors` queries. But that too could be controlled by some heuristic
> method.

I have not seen any evidence that that parameter is ever beneficially
changed from the first value you tried for it (4 IIRC?); until we have
some, there is no point in confusing users with it.
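To make the parameter concrete, here is a toy sketch of candidate acquisition with a lower bound on hash-prefix length. It is not the LSH Forest code under discussion. It assumes random-hyperplane hashes stored as '0'/'1' strings, one sorted list per tree, and the hypothetical names `TinyLSHForest`, `hash_point` and `min_hash_match`. The matched prefix is shortened one bit at a time across all trees, and the search stops once `n_candidates` points are collected or when the prefix would drop below `min_hash_match`. In the second case fewer candidates are returned.

```python
import bisect
import random


def hash_point(x, planes):
    """Random-hyperplane LSH: one '0'/'1' character per hyperplane."""
    return "".join(
        "1" if sum(p_i * x_i for p_i, x_i in zip(p, x)) >= 0 else "0"
        for p in planes
    )


class TinyLSHForest:
    """Hypothetical toy LSH Forest: each tree is a sorted list of hash strings."""

    def __init__(self, n_trees=10, max_depth=16, min_hash_match=4, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.min_hash_match = min_hash_match  # lower bound on prefix length
        self.rng = random.Random(seed)

    def fit(self, X):
        dim = len(X[0])
        # One set of max_depth random hyperplanes per tree.
        self.planes = [
            [[self.rng.gauss(0, 1) for _ in range(dim)] for _ in range(self.max_depth)]
            for _ in range(self.n_trees)
        ]
        self.trees = []
        for planes in self.planes:
            pairs = sorted((hash_point(x, planes), i) for i, x in enumerate(X))
            self.trees.append(([h for h, _ in pairs], [i for _, i in pairs]))
        return self

    def candidates(self, q, n_candidates):
        """Shorten the matched prefix step by step, never below min_hash_match."""
        q_hashes = [hash_point(q, planes) for planes in self.planes]
        found = set()
        for depth in range(self.max_depth, self.min_hash_match - 1, -1):
            pad = self.max_depth - depth
            for (hashes, ids), qh in zip(self.trees, q_hashes):
                # Stored hashes sharing qh's first `depth` bits form one
                # contiguous block in sorted order.
                lo = bisect.bisect_left(hashes, qh[:depth] + "0" * pad)
                hi = bisect.bisect_right(hashes, qh[:depth] + "1" * pad)
                found.update(ids[lo:hi])
            if len(found) >= n_candidates:
                break
        return found
```

Setting `min_hash_match` equal to `max_depth` returns only exact hash collisions. Setting it to 0 lets the search fall back to the whole database.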
> But in DBSCAN, we're considering the radius neighbors, which is quite
> different from finding K nearest neighbors. The parameter we have to
> consider here is the actual_radius_neighbors to candidates ratio (here
> also that lower bound parameter is required). Can we use a heuristic-based
> method to determine that ratio parameter as well?

Good point. Note that this affects query-time behavior, not index-time
behavior. I guess that a query-time behavior of

1. Look at at least 1% of the DB.
2. Stop as soon as you've looked at 10% of the DB, or if fewer than 10% of
   the candidates are within the radius.

should be at least as good as any constant value of that parameter.

> And what about the number of trees?

That is exactly the "number of copies of index" that I talked about in
the previous message.

Daniel

> On Wed, Aug 6, 2014 at 9:38 PM, Daniel Vainsencher
> <[email protected]> wrote:
>
> > LSHForest, as opposed to vanilla LSH, has essentially one index-time
> > parameter: number of copies of index. It is a rather easy space/time
> > vs. precision parameter. We could set it heuristically to increase
> > slowly with data dimension, so the relative overhead decreases, and
> > then users shouldn't really care about parameters.
> >
> > On Aug 6, 2014 5:32 PM, "Joel Nothman" <[email protected]> wrote:
> >
> > > On 6 August 2014 20:04, Lars Buitinck <[email protected]> wrote:
> > >
> > > > 2014-08-06 7:52 GMT+02:00 Joel Nothman <[email protected]>:
> > > >
> > > > > Instead, could we have an interface in which the `algorithm`
> > > > > parameter could take any object supporting `fit(X)`, `query(X)`
> > > > > and `query_radius(X)`, such as an LSHForest instance? Indeed you
> > > > > could also make 'lsh' an available algorithm using reasonable
> > > > > parameters automatically inferred from the data, but you
> > > > > certainly want the user to be able to control the LSH parameters.
> > > >
> > > > I'd prefer just passing strings here, though.
> > > > There aren't too many choices for the NN implementation, and
> > > > accepting NN estimators complicates the client code and the
> > > > documentation. Users would have to import from various places to
> > > > assemble their own DBSCAN.
> > >
> > > It's true, that is annoying, and it is one reason I suggest that LSH
> > > with reasonable automatic parameters be available as a string.
> > >
> > > > If we denote NN implementations by strings + a dict for parameters,
> > > > we can just enumerate the various options in the docstring without
> > > > the need to introduce yet more conventions. This is obviously less
> > > > generic, but I find it unlikely that we will add large numbers of NN
> > > > implementations or that users will roll their own.
> > >
> > > A dict is not so friendly with set_params and the grid search
> > > interface, if that's an issue. And I don't see why users shouldn't be
> > > able to use an ANN model that is more tuned to their data
> > > distribution or metric.
> > >
> > > ------------------------------------------------------------------------------
> > > Infragistics Professional
> > > Build stunning WinForms apps today!
> > > Reboot your WinForms applications with our WinForms controls.
> > > Build a bridge from your legacy apps to the future.
> > >
> > > http://pubads.g.doubleclick.net/gampad/clk?id=153845071&iu=/4140/ostg.clktrk
> > > _______________________________________________
> > > Scikit-learn-general mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
> --
> Undergraduate,
> Department of Computer Science and Engineering,
> Faculty of Engineering.
> University of Moratuwa,
> Sri Lanka
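The query-time rule proposed above for radius queries can be sketched as a scan over candidates in best-first order (longest hash-prefix match first). The following are assumptions, not from the thread: the function name and arguments, integer percentages (used to avoid float rounding in the cutoffs), and reading "fewer than 10% of the candidates are within the radius" as the running fraction over the candidates examined so far.

```python
def radius_query_with_stopping(candidates, dist_to_query, radius, n_db,
                               min_pct=1, max_pct=10, min_hit_pct=10):
    """Scan candidates best-first and stop according to the proposed rule.

    candidates: iterable of DB indices, best (longest prefix match) first.
    dist_to_query: callable giving the exact distance from the query to a DB point.
    Returns (indices within radius, number of candidates examined).
    """
    # Integer ceiling division: examine at least min_pct% of the DB ...
    min_examined = -(-n_db * min_pct // 100)
    # ... and never more than max_pct% of it.
    max_examined = -(-n_db * max_pct // 100)
    hits = []
    examined = 0
    for idx in candidates:
        examined += 1
        if dist_to_query(idx) <= radius:
            hits.append(idx)
        if examined >= max_examined:
            break
        # Past the minimum, stop once fewer than min_hit_pct% of the
        # examined candidates have turned out to be within the radius.
        if examined >= min_examined and 100 * len(hits) < min_hit_pct * examined:
            break
    return hits, examined
```

The 1% and 10% thresholds are only the starting values suggested in the message, not tuned results.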
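Joel's suggestion of letting `algorithm` be either a string or any object exposing the needed methods might look roughly like the following. Everything here is hypothetical and is not scikit-learn API: `resolve_neighbors`, the `_BACKENDS` registry, the brute-force stand-in and `region_query` are all illustrations. Only `fit` and `query_radius` are checked, because a DBSCAN-style region query needs nothing else.

```python
import math


class BruteForceNeighbors:
    """Exact radius search by linear scan; stands in for a built-in backend."""

    def fit(self, X):
        self._X = [list(x) for x in X]
        return self

    def query_radius(self, X, r):
        # One list of neighbor indices (including the point itself) per query.
        return [[i for i, y in enumerate(self._X) if math.dist(x, y) <= r]
                for x in X]


# Hypothetical registry of string names; a real library would list its own.
_BACKENDS = {"brute": BruteForceNeighbors}


def resolve_neighbors(algorithm):
    """Return a neighbors object from a string name or a duck-typed instance."""
    if isinstance(algorithm, str):
        if algorithm not in _BACKENDS:
            raise ValueError(f"unknown algorithm {algorithm!r}")
        return _BACKENDS[algorithm]()
    if all(callable(getattr(algorithm, name, None)) for name in ("fit", "query_radius")):
        return algorithm
    raise TypeError("algorithm must be a string or an object with fit() and query_radius()")


def region_query(X, eps, algorithm="brute"):
    """The eps-neighborhood lookup a DBSCAN-style client needs."""
    index = resolve_neighbors(algorithm).fit(X)
    return index.query_radius(X, eps)
```

Under Lars's proposal the duck-typed branch would go away and parameters would arrive through a separate dict. Under Joel's, a pre-configured instance carries its own parameters, which also keeps them reachable through set_params.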
