Re: [Scikit-learn-general] GSOC - Locality sensitive Hashing

2014-02-26 Thread Maheshakya Wijewardena
The method Bit sampling for Hamming distance is already included in the brute algorithm as the hamming metric in nearest neighbor search. Hence, I think it does not need to be implemented as an LSH algorithm. On Wed, Feb 26, 2014 at 12:46 AM, Maheshakya Wijewardena pmaheshak...@gmail.com wrote:
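For readers following the thread, here is a minimal sketch of what brute-force nearest neighbor search under a Hamming metric amounts to. It is plain Python rather than scikit-learn code; the helper names `hamming` and `brute_force_neighbors` and the toy data are hypothetical and only for illustration. Note that this is exact search over all points, not a hashing scheme.

```python
def hamming(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def brute_force_neighbors(query, data, k=1):
    """Return the indices of the k rows of `data` closest to `query`.

    Every row is compared against the query, which is what "brute" means here.
    """
    order = sorted(range(len(data)), key=lambda i: hamming(query, data[i]))
    return order[:k]


# Toy binary data set (made up for this example).
data = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
nearest = brute_force_neighbors([1, 1, 1, 0], data, k=2)
print(nearest)
```

In scikit-learn the corresponding call would presumably be along the lines of NearestNeighbors(algorithm='brute', metric='hamming'); check the documentation of your installed version for the exact options.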

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread Lars Buitinck
2014-02-25 7:52 GMT+01:00 Gael Varoquaux gael.varoqu...@normalesup.org: Extreme learning machine: theory and applications has 1285 citations and it got published in 2006; a large number of citations for a fairly recent article. I believe scikit-learn could add such an interesting learning

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread Gael Varoquaux
On Wed, Feb 26, 2014 at 01:29:43PM +0100, Lars Buitinck wrote: I recently implemented baseline RBF networks in pretty much the same way: k-means + RBF kernel + linear classifier. I didn't submit a PR because it's just a pipeline of existing components. All your points about transformers and

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread Vlad Niculae
On Wed Feb 26 13:32:08 2014, Gael Varoquaux wrote: documentation and example This was exactly my thought. Many such (near-)equivalences are not obvious, especially for beginners. If Lars's hinge ELM and RBF network would work well (or provide interesting feature visualisations) on some

[Scikit-learn-general] scikits.mixture.GMM.fit issue

2014-02-26 Thread Dmitry Svinkin
To scikit-learn-general, I fit a bimodal 1D distribution with strongly overlapping Gaussian components using scikits.mixture.GMM. scikits.mixture.GMM.fit gives a result that is inconsistent with the parameters of the input distribution. The code below demonstrates the issue. In case the two

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread Lars Buitinck
2014-02-26 13:40 GMT+01:00 Vlad Niculae zephy...@gmail.com: This was exactly my thought. Many such (near-)equivalences are not obvious, especially for beginners. If Lars's hinge ELM and RBF network would work well (or provide interesting feature visualisations) on some sklearn.dataset, an

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread Gael Varoquaux
On Wed, Feb 26, 2014 at 03:42:50PM +0300, Issam wrote: Or perhaps special pipelines to simplify such common tasks. I'd rather avoid special pipelines. For me, that would mean that we have an API problem with the pipeline that needs to be identified and solved. G

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread Lars Buitinck
2014-02-26 13:51 GMT+01:00 Gael Varoquaux gael.varoqu...@normalesup.org: On Wed, Feb 26, 2014 at 03:42:50PM +0300, Issam wrote: Or perhaps special pipelines to simplify such common tasks. I'd rather avoid special pipelines. For me, that would mean that we have an API problem with the

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread Gael Varoquaux
On Wed, Feb 26, 2014 at 01:55:11PM +0100, Lars Buitinck wrote: I'd rather avoid special pipelines. For me, that would mean that we have an API problem with the pipeline that needs to be identified and solved. Well, for deep learning, you'd want a generalized backprop on the final N

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread Mathieu Blondel
+1 for an RBF network transformer (with an option to choose between k-means and random sampling). Mathieu On Wed, Feb 26, 2014 at 9:40 PM, Vlad Niculae zephy...@gmail.com wrote: On Wed Feb 26 13:32:08 2014, Gael Varoquaux wrote: documentation and example This was exactly my thought. Many
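To make the idea concrete, here is a rough sketch of the feature map such a transformer might compute, using the random-sampling option for choosing centers. It is plain Python, not scikit-learn code and not anyone's actual implementation from this thread; the function names, the `gamma` parameter, and the toy data are assumptions made for illustration. The resulting features would then be fed to a linear classifier.

```python
import math
import random


def sample_centers(X, n_centers, seed=0):
    """Pick RBF centers by sampling training points without replacement."""
    rng = random.Random(seed)
    return rng.sample(X, n_centers)


def rbf_features(X, centers, gamma=1.0):
    """Map each sample to exp(-gamma * squared distance) for every center."""
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, c)))
            for c in centers
        ]
        for x in X
    ]


# Toy 2-D data (made up for this example).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]]
centers = sample_centers(X, 2)
features = rbf_features(X, centers, gamma=0.5)
print(features)
```

Replacing sample_centers with the cluster centers from k-means would give the other variant mentioned above.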

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread federico vaggi
As an aside, Lars: I'd actually love to see the recipe, if you don't mind putting up a gist or notebook. On Wed, Feb 26, 2014 at 1:29 PM, Lars Buitinck larsm...@gmail.com wrote: 2014-02-25 7:52 GMT+01:00 Gael Varoquaux gael.varoqu...@normalesup.org: Extreme learning machine: theory and

[Scikit-learn-general] Saving Huge Models

2014-02-26 Thread Lorenzo Isella
Dear All, I am using RandomForest on a data set which has fewer than 20 features, but about 40 lines. The point is that, even if I work on a subset of about 3 lines to train my model, when I save it using pickle, I get a large file on the order of several hundred MB (see

Re: [Scikit-learn-general] Saving Huge Models

2014-02-26 Thread Olivier Grisel
You can control the size of your random forest by adjusting the parameters n_estimators, min_samples_split and even max_depth (read the documentation for more details). It's up to you to find parameter values that match your constraints in terms of accuracy vs model size in RAM and prediction

Re: [Scikit-learn-general] Saving Huge Models

2014-02-26 Thread Andy
On 02/26/2014 05:55 PM, Peter Prettenhofer wrote: please make sure to pickle with the highest protocol - otherwise pickle uses a textual serialization format which is quite inefficient: pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL) Or simply protocol=-1. This usually makes a huge
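A small sketch of the effect described above, using only the standard library. The list of floats is a hypothetical stand-in for a fitted model, not a real estimator. Protocol 0 is the old text-based format; in Python 3 the default protocol is already binary, so the difference mainly matters when the text protocol is in use, as it was by default in Python 2.

```python
import pickle

# Hypothetical stand-in for a fitted model: a plain list of floats.
model_like = [i / 7.0 for i in range(10000)]

# Protocol 0 is the original ASCII-based serialization format.
text_bytes = pickle.dumps(model_like, protocol=0)

# The highest protocol is a compact binary format.
binary_bytes = pickle.dumps(model_like, protocol=pickle.HIGHEST_PROTOCOL)

# protocol=-1 is shorthand for the highest available protocol.
shorthand_bytes = pickle.dumps(model_like, protocol=-1)

print(len(text_bytes), len(binary_bytes))
```

When writing to disk, the same argument applies to pickle.dump with a file opened in binary mode ("wb").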

Re: [Scikit-learn-general] GSOC - Locality sensitive Hashing

2014-02-26 Thread Andy
On 02/26/2014 10:13 AM, Maheshakya Wijewardena wrote: The method Bit sampling for Hamming distance is already included in the brute algorithm as the hamming metric in nearest neighbor search. Hence, I think it does not need to be implemented as an LSH algorithm. I would also rather focus on

Re: [Scikit-learn-general] GSOC - Locality sensitive Hashing

2014-02-26 Thread Maheshakya Wijewardena
I would also rather focus on non-binary representations. Even when using the random projection method for hashing, only the sign of the dot product is considered. So in that situation, too, there will be a binary representation (or +1s and -1s). What do you think about this method?
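The point about signs can be illustrated with a minimal sketch of sign-based random projection hashing. This is plain Python with hypothetical helper names (`make_hyperplanes`, `sign_hash`) and made-up inputs, not the proposed scikit-learn implementation. Each random hyperplane contributes one bit: 1 if the dot product is non-negative, 0 otherwise.

```python
import random


def make_hyperplanes(n_bits, n_features, seed=0):
    """Draw one Gaussian random vector per hash bit."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for _ in range(n_bits)
    ]


def sign_hash(x, hyperplanes):
    """Keep only the sign of each dot product, giving a binary code."""
    return tuple(
        1 if sum(w * v for w, v in zip(plane, x)) >= 0 else 0
        for plane in hyperplanes
    )


planes = make_hyperplanes(n_bits=8, n_features=3)
x = [1.0, 2.0, -0.5]

code = sign_hash(x, planes)
scaled = sign_hash([2.0 * v for v in x], planes)  # same direction as x
flipped = sign_hash([-v for v in x], planes)      # opposite direction
print(code, scaled, flipped)
```

Because only the sign is kept, the code depends on the direction of the vector and not on its length: scaling x by a positive factor leaves the code unchanged, while negating x flips every bit (barring a dot product of exactly zero).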

[Scikit-learn-general] extra trees, oob score vs shufflesplit

2014-02-26 Thread Satrajit Ghosh
hi folks, when using extra trees, one can compute an oob score. has anybody looked at comparing the oob_score to performing a shufflesplit iteration on the data? are these in some ways equivalent, or would they converge to the same mean? cheers, satra

[Scikit-learn-general] marking review status of PRs

2014-02-26 Thread Joel Nothman
We seem to have a lot of PRs waiting for review in some form or another. I think they could do with better management. Can we use github features to make it more apparent that a PR has received +1 (i.e. needs another reviewer) or +2 (i.e. waiting for merge)? At the moment, [WIP] and [MRG] are

Re: [Scikit-learn-general] marking review status of PRs

2014-02-26 Thread Alexandre Gramfort
Hi, I like the [MRG+1] and [MRG+2] idea. Let's see if it can help... Best, A

Re: [Scikit-learn-general] Combine functionality for text feature/image feature pipeline

2014-02-26 Thread Alexandre Gramfort
hi, do you know http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html ? it might already do what you want A On Thu, Feb 27, 2014 at 8:33 AM, michael kneier michael.kne...@gmail.com wrote: Hi all, I would like to add a combiner class which would work with