It seems to me that LSH Forest is standing in for one of the choices of the
`algorithm` parameter, which selects between ball_tree, kd_tree and brute-force
search for nearest neighbour search. Those choices are designed not to take
additional parameters.

LSH Forest has parameters of its own, so you need some way to accept them.
You could indeed create another estimator like ApproximateNeighborsDBSCAN,
but you'd need to do the same for KNeighborsClassifier,
RadiusNeighborsClassifier, KNeighborsRegressor and RadiusNeighborsRegressor.
That proliferation seems out of hand.

Instead, could we have an interface in which the `algorithm` parameter
could take any object supporting `fit(X)`, `query(X)` and
`query_radius(X)`, such as an LSHForest instance? Indeed you could also
make 'lsh' an available algorithm using reasonable parameters automatically
inferred from the data, but you certainly want the user to be able to
control the LSH parameters.
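
To make the proposal concrete, here is a rough sketch in plain Python (no
scikit-learn imports). Everything named in it is hypothetical: `BruteIndex`
is a stand-in for an LSHForest-like object, `resolve_algorithm` and
`region_query` are illustrative helpers, and the return shapes are my
assumption rather than the actual sklearn.neighbors API.

```python
import math


class BruteIndex:
    """Hypothetical stand-in for an index object such as an LSHForest instance.

    It only demonstrates the duck-typed protocol: fit(X), query(X), query_radius(X).
    """

    def __init__(self, n_neighbors=1, radius=1.0):
        # Index-specific parameters live on the object, not on the outer estimator.
        self.n_neighbors = n_neighbors
        self.radius = radius

    def fit(self, X):
        self._data = [tuple(x) for x in X]
        return self

    def _distances(self, x):
        # (distance, index) pairs from x to every indexed point.
        return [(math.dist(x, p), i) for i, p in enumerate(self._data)]

    def query(self, X):
        # Indices of the n_neighbors closest indexed points, per query point.
        return [[i for _, i in sorted(self._distances(x))[: self.n_neighbors]]
                for x in X]

    def query_radius(self, X):
        # Indices of all indexed points within self.radius, per query point.
        return [[i for d, i in self._distances(x) if d <= self.radius]
                for x in X]


def resolve_algorithm(algorithm):
    """Accept either a known string or any object exposing the protocol."""
    if isinstance(algorithm, str):
        if algorithm != "brute":
            raise ValueError("this sketch only knows the string 'brute'")
        return BruteIndex()
    missing = [m for m in ("fit", "query", "query_radius")
               if not callable(getattr(algorithm, m, None))]
    if missing:
        raise TypeError("algorithm object lacks: " + ", ".join(missing))
    return algorithm


def region_query(X, algorithm="brute"):
    """Roughly what an outer estimator such as DBSCAN would do internally."""
    index = resolve_algorithm(algorithm).fit(X)
    return index.query_radius(X)


X = [(0.0, 0.0), (0.0, 0.5), (3.0, 3.0)]
neighborhoods = region_query(X, algorithm=BruteIndex(radius=0.6))
```

The point of the design is that the outer estimator never sees the index's
own parameters; the user configures the object and hands it over.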

(Note that BallTree and KDTree currently don't support fit(); the index data
is passed into their constructors instead.)
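
One possible way around the constructor issue is a thin adapter that defers
building the index until fit() is called. The sketch below is only an
illustration: `FitAdapter` is a hypothetical name, and `ToyTree` is a toy
stand-in for any class that takes its data in the constructor, not the real
BallTree or KDTree. Here the radius is passed at query time, which is a
choice made for this sketch.

```python
class ToyTree:
    """Toy stand-in for a constructor-indexed structure (data goes to __init__)."""

    def __init__(self, X, metric="euclidean"):
        self.data = list(X)
        self.metric = metric

    def query_radius(self, X, r):
        # 1-D data keeps the example short: distance is just abs(p - x).
        return [[i for i, p in enumerate(self.data) if abs(p - x) <= r]
                for x in X]


class FitAdapter:
    """Hypothetical wrapper giving a constructor-indexed class a fit(X) method."""

    def __init__(self, tree_cls, **tree_params):
        self.tree_cls = tree_cls
        self.tree_params = tree_params
        self.tree_ = None

    def fit(self, X):
        # Index construction is deferred until the data is known.
        self.tree_ = self.tree_cls(X, **self.tree_params)
        return self

    def query_radius(self, X, r):
        if self.tree_ is None:
            raise RuntimeError("call fit(X) before querying")
        return self.tree_.query_radius(X, r)


adapter = FitAdapter(ToyTree, metric="euclidean").fit([0.0, 1.0, 5.0])
result = adapter.query_radius([0.5], r=0.6)
```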

There is also the caveat that BallTree and KDTree are currently passed an
effective_metric parameter in their constructors; not all metrics are
possible with a particular LSH implementation, and only euclidean is
currently supported. So the outer estimator (e.g. DBSCAN) could either set
an effective_metric parameter on the algorithm object, or leave the choice
of metric entirely to that object.
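
As a sketch of the first option (the outer estimator pushing its metric onto
the algorithm object), something like the following could work. The
`supported_metrics` attribute, the `EuclideanOnlyIndex` class and the
`configure_metric` helper are all hypothetical; none of them exist in
scikit-learn.

```python
class EuclideanOnlyIndex:
    """Hypothetical index that only supports the euclidean metric."""

    # Assumed convention: the index advertises which metrics it can handle.
    supported_metrics = ("euclidean",)

    def __init__(self):
        self.effective_metric = "euclidean"


def configure_metric(index, metric):
    """Set the outer estimator's metric on the index, or fail early."""
    supported = getattr(index, "supported_metrics", None)
    if supported is not None and metric not in supported:
        raise ValueError(
            "metric %r is not supported by %s" % (metric, type(index).__name__)
        )
    index.effective_metric = metric
    return index


index = configure_metric(EuclideanOnlyIndex(), "euclidean")
```

Under the second option the outer estimator would skip this step and simply
trust whatever metric the object was constructed with.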

WDYT?


On 6 August 2014 15:33, Maheshakya Wijewardena <[email protected]>
wrote:

> Hi,
>
> I'm trying to use the LSH Forest approximate neighbor search method to
> obtain radius neighbors in DBSCAN. It adheres to the API of
> sklearn.neighbors (at least the radius_neighbors method at the moment). But
> LSH Forest itself has a set of parameters, so they need to be initialized.
>
> I'm thinking about passing an argument to the DBSCAN init method, such as
> `approximate_neighbors=True` (or something suitable), and having the LSH
> Forest parameters in the DBSCAN init method as well.
>
> The other method, which Robert suggested, is to subclass DBSCAN to use
> approximate neighbors.
>
> Once LSH Forest is initialized, it's just a matter of applying it in
> place of `NearestNeighbors`. Are the above methods appropriate, or are there
> better ways?
>
> PR to LSH Forest: https://github.com/scikit-learn/scikit-learn/pull/3304
>
> Best Regards,
> Maheshakya
>
> --
> Undergraduate,
> Department of Computer Science and Engineering,
> Faculty of Engineering.
> University of Moratuwa,
> Sri Lanka
>
>
> ------------------------------------------------------------------------------
> Infragistics Professional
> Build stunning WinForms apps today!
> Reboot your WinForms applications with our WinForms controls.
> Build a bridge from your legacy apps to the future.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=153845071&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>