On 08/06/2014 07:25 PM, Maheshakya Wijewardena wrote:
> Actually, in our implementation of LSH Forest we have an extra parameter
> to control candidate acquisition for `kneighbors` queries: a lower bound
> on max_depth, to avoid acquiring candidates whose hashes match the query
> only on a very short prefix. But that, too, could be set by a heuristic
> method.
I have not seen any evidence that that parameter is ever beneficially 
changed from the first value you tried for it (4 IIRC?); until we have 
some, there is no point in confusing users with it.
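A minimal sketch of the candidate-acquisition step described above. It assumes each tree stores one bit-string hash per indexed point, and that candidates are gathered by shortening the shared prefix from `max_depth` down to a lower bound. The names `acquire_candidates` and `min_hash_match` are hypothetical. The default of 4 only mirrors the "4 IIRC" above. The linear prefix scan stands in for the binary search over sorted hashes that a real LSH Forest index would use:

```python
def acquire_candidates(query_hashes, tree_hashes, n_candidates, max_depth,
                       min_hash_match=4):
    """Collect candidate indices for one query across all trees (sketch).

    query_hashes[t] is the query's bit-string hash in tree t, and
    tree_hashes[t] lists the indexed points' hashes in that tree.
    The shared-prefix length starts at max_depth and shrinks one bit at a
    time, but never below min_hash_match (the hypothetical lower bound).
    """
    if not 0 <= min_hash_match <= max_depth:
        raise ValueError("need 0 <= min_hash_match <= max_depth")
    candidates = set()
    for depth in range(max_depth, min_hash_match - 1, -1):
        for q_hash, hashes in zip(query_hashes, tree_hashes):
            prefix = q_hash[:depth]
            # Linear scan for clarity; a real index would binary-search sorted hashes.
            candidates.update(i for i, h in enumerate(hashes) if h.startswith(prefix))
        if len(candidates) >= n_candidates:
            break  # enough candidates found at this prefix length
    return candidates


# One tree, three indexed points; the query shares a 2-bit prefix with two of them.
trees = [["0000", "0011", "1111"]]
print(acquire_candidates(["0001"], trees, n_candidates=2, max_depth=4,
                         min_hash_match=2))
# → {0, 1}
```

Raising `min_hash_match` to 3 in the same call stops the descent one level earlier and returns only `{0}`. That is exactly the behavior the lower bound is meant to control.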

> But DBSCAN needs radius neighbors, which is quite different from finding
> the k nearest neighbors. The parameter to consider here is the ratio of
> actual radius neighbors to candidates (the lower-bound parameter is
> required here as well). Can we use a heuristic method to determine that
> ratio parameter too?
Good point. Note that this affects query-time behavior, not index-time
behavior. I guess that a query-time rule like
1. Examine at least 1% of the DB.
2. Stop as soon as you've examined 10% of the DB, or if fewer than 10% of
the candidates are within the radius.

should be at least as good as any constant value of that parameter.
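The two-step query-time rule above could look roughly like this. The sketch assumes candidates arrive in batches (for example, one batch per prefix length, longest match first) and that distances are Euclidean. It also reads "fewer than 10% of candidates in radius" as the cumulative hit rate over everything examined so far, which is an interpretation. `radius_query` and its parameter names are hypothetical:

```python
import math


def radius_query(query, candidate_batches, data, radius,
                 min_frac=0.01, max_frac=0.10, min_hit_rate=0.10):
    """Return sorted indices of examined points within `radius` of `query`.

    Stopping rule (sketch of the one proposed above):
      1. always examine at least min_frac of the database;
      2. stop once max_frac of the database has been examined, or once
         fewer than min_hit_rate of the examined candidates are in radius.
    """
    n = len(data)
    min_checked = max(1, math.ceil(min_frac * n))
    max_checked = max(1, math.ceil(max_frac * n))
    seen = set()
    neighbors = []
    for batch in candidate_batches:
        for i in batch:
            if i in seen:
                continue  # the same point can be proposed by several trees
            seen.add(i)
            if math.dist(query, data[i]) <= radius:
                neighbors.append(i)
            if len(seen) >= max_checked:
                return sorted(neighbors)  # budget of max_frac * n exhausted
        checked = len(seen)
        # Hit rate is checked between batches, and only after the minimum.
        if checked >= min_checked and len(neighbors) < min_hit_rate * checked:
            break
    return sorted(neighbors)


data = [(float(i), 0.0) for i in range(100)]
print(radius_query((0.0, 0.0), [[0, 1, 2, 3], list(range(4, 20))], data, 2.5))
# → [0, 1, 2]
```

The rule has a trade-off. If the first batch contains no points inside the radius, it stops right away, even though later batches might still have held true neighbors.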

> And what about the number of trees?
That is exactly the "number of copies of index" that I talked about in 
the previous message.
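One hypothetical way to make the number of trees "increase slowly with data dimension", as suggested in the earlier message, is a logarithmic rule. The `base` and `max_trees` values below are illustrative assumptions, not tuned defaults. The point is that the per-point index cost (roughly one hash per tree) shrinks relative to the per-point data size (`n_features` values) as dimension grows:

```python
import math


def default_n_trees(n_features, base=4, max_trees=32):
    """Hypothetical heuristic: number of trees grows like log2(n_features)."""
    if n_features < 1:
        raise ValueError("n_features must be >= 1")
    return min(max_trees, base + math.ceil(math.log2(n_features)))


def relative_index_overhead(n_features):
    """Hashes stored per point (one per tree) divided by values stored per point."""
    return default_n_trees(n_features) / n_features


for d in (10, 100, 1000, 10000):
    print(d, default_n_trees(d), round(relative_index_overhead(d), 4))
```

Any rule that grows more slowly than linearly in `n_features` would have the same decreasing-overhead property. The logarithm is just one simple choice.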

Daniel

> On Wed, Aug 6, 2014 at 9:38 PM, Daniel Vainsencher
> <[email protected] <mailto:[email protected]>> wrote:
>
>     LSHForest, as opposed to vanilla LSH, has essentially one index-time
>     parameter: the number of copies of the index. It is a fairly easy
>     space/time vs. precision trade-off. We could set it heuristically to
>     increase slowly with data dimension, so the relative overhead
>     decreases, and then users shouldn't really need to care about
>     parameters.
>
>     On Aug 6, 2014 5:32 PM, "Joel Nothman" <[email protected]
>     <mailto:[email protected]>> wrote:
>
>         On 6 August 2014 20:04, Lars Buitinck <[email protected]
>         <mailto:[email protected]>> wrote:
>
>             2014-08-06 7:52 GMT+02:00 Joel Nothman
>             <[email protected] <mailto:[email protected]>>:
>              > Instead, could we have an interface in which the `algorithm`
>              > parameter could take any object supporting `fit(X)`,
>              > `query(X)` and `query_radius(X)`, such as an LSHForest
>              > instance? Indeed you could also make 'lsh' an available
>              > algorithm using reasonable parameters automatically inferred
>              > from the data, but you certainly want the user to be able to
>              > control the LSH parameters.
>
>             I'd prefer just passing strings here, though. There aren't too
>             many choices for the NN implementation, and accepting NN
>             estimators complicates the client code and the documentation.
>             Users would have to import from various places to assemble
>             their own DBSCAN.
>
>
>         It's true that that is annoying, and it is one reason I suggest
>         that LSH with reasonable automatic parameters be available as a
>         string.
>
>             If we denote NN implementations by strings + a dict for
>             parameters, we can just enumerate the various options in the
>             docstring without the need to introduce yet more conventions.
>             This is obviously less generic, but I find it unlikely that we
>             will add large numbers of NN implementations or that users will
>             roll their own.
>
>
>         A dict is not so friendly with set_params and the grid search
>         interface, if that's an issue. And I don't see why users
>         shouldn't be able to use an ANN model that is more tuned to
>         their data distribution or metric.
>
>
>         
> ------------------------------------------------------------------------------
>         Infragistics Professional
>         Build stunning WinForms apps today!
>         Reboot your WinForms applications with our WinForms controls.
>         Build a bridge from your legacy apps to the future.
>         
> http://pubads.g.doubleclick.net/gampad/clk?id=153845071&iu=/4140/ostg.clktrk
>         _______________________________________________
>         Scikit-learn-general mailing list
>         [email protected]
>         <mailto:[email protected]>
>         https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
>
> --
> Undergraduate,
> Department of Computer Science and Engineering,
> Faculty of Engineering,
> University of Moratuwa,
> Sri Lanka
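The quoted thread proposes two interfaces: Lars's strings plus a parameter dict, and Joel's duck-typed object supporting `fit`, `query` and `query_radius`. The two can coexist. The minimal sketch below assumes a hypothetical `resolve_algorithm` helper and a toy `BruteForceIndex` backend. Neither is an existing scikit-learn API:

```python
import math


class BruteForceIndex:
    """Toy exact backend exposing the duck-typed interface from the thread."""

    def fit(self, X):
        self._X = [tuple(x) for x in X]
        return self

    def query(self, X, k=1):
        # Indices of the k closest fitted points for each query row.
        return [sorted(range(len(self._X)),
                       key=lambda i: math.dist(q, self._X[i]))[:k]
                for q in X]

    def query_radius(self, X, r):
        # Indices of fitted points within distance r of each query row.
        return [[i for i, x in enumerate(self._X) if math.dist(q, x) <= r]
                for q in X]


# Hypothetical registry of string-named backends.
_STRING_BACKENDS = {"brute": BruteForceIndex}


def resolve_algorithm(algorithm, algorithm_params=None):
    """Accept either a backend name (plus optional params) or a ready object."""
    if isinstance(algorithm, str):
        try:
            backend_cls = _STRING_BACKENDS[algorithm]
        except KeyError:
            raise ValueError(f"unknown algorithm {algorithm!r}") from None
        return backend_cls(**(algorithm_params or {}))
    for method in ("fit", "query", "query_radius"):
        if not callable(getattr(algorithm, method, None)):
            raise TypeError(f"algorithm object lacks a callable {method}()")
    return algorithm


index = resolve_algorithm("brute").fit([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
print(index.query_radius([[0.0, 0.0]], 1.5))
# → [[0, 1]]
```

The string route keeps the common case import-free, as Lars prefers. The object route keeps backend parameters on the backend itself, which may sit more naturally with `set_params` and grid search than a separate dict, per Joel's concern.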

