In our implementation of LSH Forest, we actually have an extra parameter
that controls candidate acquisition for `kneighbors` queries: a lower bound
for max_depth, so that candidates matching the query on only a very short
hash length are not considered. That too could be set by some heuristic
method.

DBSCAN, however, uses radius neighbors, which is quite different from
finding the k nearest neighbors. The parameter we have to consider there is
the ratio of actual radius neighbors to candidates (the lower bound
parameter is required here as well). Can we use a heuristic-based method to
determine that ratio parameter too?
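
To make the ratio idea concrete, here is a minimal sketch of how such a
radius query might look. It is not our LSH Forest code: `ToyLSHIndex` is a
toy random-hyperplane index using cosine distance, and the names
`min_hash_length` and `cutoff_ratio` are hypothetical stand-ins for the
lower bound and the ratio parameter discussed above. The query starts from
the full hash length and shortens the matched prefix while most candidates
still fall inside the radius, stopping when the ratio drops below the
cutoff or the lower bound is reached.

```python
import numpy as np


def cosine_distance(A, q):
    """Cosine distance between each row of A and the vector q."""
    A = np.atleast_2d(A)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(q)
    return 1.0 - (A @ q) / den


class ToyLSHIndex:
    """Toy random-hyperplane LSH index (illustrative stand-in only)."""

    def __init__(self, n_trees=5, max_depth=16, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.rng = np.random.default_rng(seed)

    def fit(self, X):
        self.X = np.asarray(X, dtype=float)
        n_features = self.X.shape[1]
        # One independent set of random hyperplanes per tree.
        self.planes = self.rng.standard_normal(
            (self.n_trees, n_features, self.max_depth))
        # bits[t, i, j] is hash bit j of point i in tree t.
        self.bits = (self.X @ self.planes) > 0
        return self

    def candidates(self, q, depth):
        """Points sharing the first `depth` hash bits with q in any tree."""
        q_bits = (q @ self.planes) > 0  # shape (n_trees, max_depth)
        match = (self.bits[:, :, :depth] == q_bits[:, None, :depth]).all(axis=2)
        return np.flatnonzero(match.any(axis=0))

    def radius_neighbors(self, q, radius, min_hash_length=4, cutoff_ratio=0.9):
        q = np.asarray(q, dtype=float)
        depth = self.max_depth
        while True:
            cand = self.candidates(q, depth)
            inside = cand[cosine_distance(self.X[cand], q) <= radius]
            # With no candidates yet, keep widening the search.
            ratio = inside.size / cand.size if cand.size else 1.0
            # Stop once candidates are mostly outside the radius, or when
            # the hash-length lower bound is reached.
            if ratio < cutoff_ratio or depth <= min_hash_length:
                return inside, depth
            depth -= 1


# Usage on random data (values are arbitrary).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))
index = ToyLSHIndex(n_trees=4, max_depth=12, seed=0).fit(X)
neighbors, stop_depth = index.radius_neighbors(
    X[0], radius=0.3, min_hash_length=3, cutoff_ratio=0.5)
print(len(neighbors), stop_depth)
```

In this sketch a high `cutoff_ratio` stops the search early (fewer distance
computations, possibly missed neighbors), while a low one keeps widening
the candidate set until the lower bound is hit. A heuristic would have to
pick a point on that trade-off without looking at the data distribution.
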
On Wed, Aug 6, 2014 at 9:38 PM, Daniel Vainsencher <
[email protected]> wrote:
> LSH Forest, as opposed to vanilla LSH, has essentially one index-time
> parameter: the number of copies of the index. It is a fairly simple
> space/time vs. precision trade-off. We could set it heuristically to
> increase slowly with the data dimension, so that the relative overhead
> decreases, and then users shouldn't really have to care about parameters.
> On Aug 6, 2014 5:32 PM, "Joel Nothman" <[email protected]> wrote:
>
>> On 6 August 2014 20:04, Lars Buitinck <[email protected]> wrote:
>>
>>> 2014-08-06 7:52 GMT+02:00 Joel Nothman <[email protected]>:
>>> > Instead, could we have an interface in which the `algorithm` parameter
>>> > could take any object supporting `fit(X)`, `query(X)` and
>>> > `query_radius(X)`, such as an LSHForest instance? Indeed you could also
>>> > make 'lsh' an available algorithm using reasonable parameters
>>> > automatically inferred from the data, but you certainly want the user
>>> > to be able to control the LSH parameters.
>>>
>>> I'd prefer just passing strings here, though. There aren't too many
>>> choices for the NN implementation, and accepting NN estimators
>>> complicates the client code and the documentation. Users would have to
>>> import from various places to assemble their own DBSCAN.
>>>
>>
>> It's true that is annoying, and is one reason I suggest LSH with
>> reasonable automatic parameters be available as a string.
>>
>>
>>> If we denote NN implementations by strings + a dict for parameters, we
>>> can just enumerate the various options in the docstring without the
>>> need to introduce yet more conventions. This is obviously less
>>> generic, but I find it unlikely that we will add large numbers of NN
>>> implementations or that users will roll their own.
>>>
>>
>> A dict is not so friendly with set_params and the grid search interface,
>> if that's an issue. And I don't see why users shouldn't be able to use an
>> ANN model that is more tuned to their data distribution or metric.
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Infragistics Professional
>> Build stunning WinForms apps today!
>> Reboot your WinForms applications with our WinForms controls.
>> Build a bridge from your legacy apps to the future.
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=153845071&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
--
Undergraduate,
Department of Computer Science and Engineering,
Faculty of Engineering.
University of Moratuwa,
Sri Lanka