Re: [Scikit-learn-general] mapping liblinear wrapper with LinearSVC

2014-08-06 Thread Manoj Kumar
Hi, It looks to me that only the bias parameter is passed to liblinear. It is set to self.intercept_scaling if fit_intercept is set to True and -1 otherwise. The rest (I think) are default parameters. You can have a look at the function signature of train_wrap in liblinear.pyx in order to clarify

Re: [Scikit-learn-general] mapping liblinear wrapper with LinearSVC

2014-08-06 Thread Pagliari, Roberto
Probably my question was not clear. What I meant to ask is this: when using sklearn iblinear with default parameters, what would be the options passed to liblinear library for training, which is the "options" field in liblinear python wrapper? Thank you! From: Pagliari, Roberto [mailto:rpagli.

[Scikit-learn-general] Getting started with BernoulliRBM

2014-08-06 Thread Ingo Fründ
Dear scikit-learners, Thank you first of all for writing such a wonderful machine learning package for python. I've used scikit-learn quite a lot in the past and it seemed to always would right away. Yet, now I'm trying to get started with the BernoulliRBM in scikit-learn, and I seem to be missin

Re: [Scikit-learn-general] Using LSH Forest approximate neibghbor search in DBSCAN[GSoC]

2014-08-06 Thread Maheshakya Wijewardena
And what about the number of trees? On Wed, Aug 6, 2014 at 9:55 PM, Maheshakya Wijewardena < pmaheshak...@gmail.com> wrote: > Actually in our implementation of LSH Forest, we have an extra parameter > to control the candidate acquisition(to avoid having the candidates with > very small hash leng

Re: [Scikit-learn-general] Using LSH Forest approximate neibghbor search in DBSCAN[GSoC]

2014-08-06 Thread Maheshakya Wijewardena
Actually in our implementation of LSH Forest, we have an extra parameter to control the candidate acquisition(to avoid having the candidates with very small hash length matches - lower bound for max_depth) for `kneighbors` queries. But that too could be controlled by some heuristic method. But in

Re: [Scikit-learn-general] Using LSH Forest approximate neibghbor search in DBSCAN[GSoC]

2014-08-06 Thread Daniel Vainsencher
Lshforest as opposed to vanilla lsh, has essentially one index time parameter: number of copies of index. It is a rather easy space,time vs precision parameter. We could set it heuristically to increase slowly with data dimension, so the relative overhead decreases, and then users shouldn't really

Re: [Scikit-learn-general] Using LSH Forest approximate neibghbor search in DBSCAN[GSoC]

2014-08-06 Thread Joel Nothman
On 6 August 2014 20:04, Lars Buitinck wrote: > 2014-08-06 7:52 GMT+02:00 Joel Nothman : > > Instead, could we have an interface in which the `algorithm` parameter > could > > take any object supporting `fit(X)`, `query(X)` and `query_radius(X)`, > such > > as an LSHForest instance? Indeed you cou

Re: [Scikit-learn-general] Using LSH Forest approximate neibghbor search in DBSCAN[GSoC]

2014-08-06 Thread Joel Nothman
On 6 August 2014 20:04, Lars Buitinck wrote: > 2014-08-06 7:52 GMT+02:00 Joel Nothman : > > Instead, could we have an interface in which the `algorithm` parameter > could > > take any object supporting `fit(X)`, `query(X)` and `query_radius(X)`, > such > > as an LSHForest instance? Indeed you cou

Re: [Scikit-learn-general] Using LSH Forest approximate neibghbor search in DBSCAN[GSoC]

2014-08-06 Thread Maheshakya Wijewardena
Joel, I thought the interface you mentioned should accept only instances of estimators. Sorry, My bad. I think Lars have a good idea. Having extra parameters in the DBSCAN to use approximate neighbors(approximate_neighbors=True) and a dict for its parameters seems less complex and suitable at mome

Re: [Scikit-learn-general] Using LSH Forest approximate neibghbor search in DBSCAN[GSoC]

2014-08-06 Thread Lars Buitinck
2014-08-06 7:52 GMT+02:00 Joel Nothman : > Instead, could we have an interface in which the `algorithm` parameter could > take any object supporting `fit(X)`, `query(X)` and `query_radius(X)`, such > as an LSHForest instance? Indeed you could also make 'lsh' an available > algorithm using reasonabl

Re: [Scikit-learn-general] Using LSH Forest approximate neibghbor search in DBSCAN[GSoC]

2014-08-06 Thread Joel Nothman
I don't understand the problem. The default DBSCAN will still have algorithm='auto'. On 6 August 2014 17:01, Maheshakya Wijewardena wrote: > I too considered passing the estimator instance as a parameter to DBSCAN. > If we want to use KDTree or BallTree, NearestNeighbor instances created > with

Re: [Scikit-learn-general] GSoC - Review LSHForest approximate nearest neighbor search implementation

2014-08-06 Thread Kyle Kastner
As far as I know, the typical idea is to keep things as readable as possible, and only optimize the "severe/obvious" type bottlenecks (things like memory explosions, really bad algorithmic complexity, unnecessary data copy, etc). I can't really comment on your "where do the bottlenecks go" questio

Re: [Scikit-learn-general] GSoC - Review LSHForest approximate nearest neighbor search implementation

2014-08-06 Thread Daniel Vainsencher
Hi, As one of Maheshakya's mentors, I want to thank Arnaud, Noel and Lars for reviewing this code! I've learned a lot from your suggestions about scikit learn, stuff like numpy.packbits etc, and they have certainly helped Maheshakya improve the code. I have a couple of questions, mostly for m

Re: [Scikit-learn-general] Using LSH Forest approximate neibghbor search in DBSCAN[GSoC]

2014-08-06 Thread Maheshakya Wijewardena
I too considered passing the estimator instance as a parameter to DBSCAN. If we want to use KDTree or BallTree, NearestNeighbor instances created with algorithm=kdtree or ball_tree can be passed. But Robert mentioned that it would fail the unit test cases- the base test that ensures that all BaseEs