Re: [Scikit-learn-general] Optimal Subset Selection Code Contribution

2014-08-19 Thread Gael Varoquaux
Hi Giuseppe, Is there a specific highly-cited reference for these methods? I did a quick search on Google Scholar, and it seemed that I could mostly find them used in chemistry. Cheers, Gaël On Tue, Aug 19, 2014 at 12:05:13PM +0200, Giuseppe Marco Randazzo wrote: > Hello, > i'm interested to c

Re: [Scikit-learn-general] classification algorithms that return probabilities?

2014-08-19 Thread Adamantios Corais
I see. Would it be a good idea to select only those with very high or very low probability (class 0 and 1 respectively), or does this not make sense at all? In addition, may I ask for some hints on how to implement that? Thanks in advance! On Tue, Aug 19, 2014 at 9:10 PM, Lars Buitinck wro
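A minimal sketch of the selection asked about above, not taken from the thread. It assumes the positive-class probabilities have already been obtained, for example as the second column of a binary classifier's predict_proba output. The helper name select_confident and the 0.05 / 0.95 cut-offs are hypothetical choices for illustration.

```python
def select_confident(probabilities, low=0.05, high=0.95):
    """Return indices of samples whose positive-class probability is
    at most `low` or at least `high` (both thresholds are assumptions)."""
    return [i for i, p in enumerate(probabilities) if p <= low or p >= high]


# Hypothetical positive-class probabilities for six samples.
proba_class_1 = [0.02, 0.40, 0.97, 0.55, 0.99, 0.10]

confident = select_confident(proba_class_1)
print(confident)  # indices of the samples kept: [0, 2, 4]
```

Samples between the two cut-offs are simply left unclassified. Whether the reported probabilities are trustworthy enough for this depends on how well calibrated the classifier is, which this sketch does not address.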

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-19 Thread Olivier Grisel
2014-08-18 20:44 GMT+02:00 Sebastian Raschka : > > On Aug 18, 2014, at 12:15 PM, Olivier Grisel > wrote: > >> since it would make the "estimate" and "error" calculation more >> convenient, right? > > I don't understand what you mean "estimate" by "error". Both the model > parameters, its individua

Re: [Scikit-learn-general] classification algorithms that return probabilities?

2014-08-19 Thread Lars Buitinck
2014-08-19 20:07 GMT+02:00 Adamantios Corais : > Great. And what about the confidence error? I mean, how should I select a > subset of classified data points such that the probability they belong to > any class is high whereas the confidence error is 95% or above? Sorry, I hadn't seen that in your

Re: [Scikit-learn-general] classification algorithms that return probabilities?

2014-08-19 Thread Adamantios Corais
Great. And what about the confidence error? I mean, how should I select a subset of classified data points such that the probability they belong to any class is high whereas the confidence error is 95% or above? On Tue, Aug 19, 2014 at 7:53 PM, Lars Buitinck wrote: > 2014-08-19 18:03 GMT+02:00

Re: [Scikit-learn-general] classification algorithms that return probabilities?

2014-08-19 Thread Lars Buitinck
2014-08-19 18:03 GMT+02:00 Adamantios Corais : > I am looking for implementations \ configurations of machine learning > algorithms that, instead of a boolean value (class), they return a > probability along with the corresponding confidence error. Any hints? Any scikit-learn classifier that has a

[Scikit-learn-general] classification algorithms that return probabilities?

2014-08-19 Thread Adamantios Corais
Hi everyone, I am looking for implementations \ configurations of machine learning algorithms that, instead of a boolean value (class), they return a probability along with the corresponding confidence error. Any hints? // Adamantios

Re: [Scikit-learn-general] PR about topic models

2014-08-19 Thread chyi-kwei yau
Hi guys, I know this is an old email thread, but I got a reply from Matt Hoffman last night, and he has now relicensed his onlineLDA code under BSD. I put it at the following link, and it would be great if someone could help double-check it. (Link: https://www.dropbox.com/s/wnkro3xtqjm7bli/onlineldavb_bsd.t

Re: [Scikit-learn-general] make_multilabel_classification n_labels

2014-08-19 Thread Joel Nothman
Hi Krishna, I have no problem seeing the difference between n_labels=2 and n_labels=10. However, the number of labels per sample can never exceed n_classes, so it is not really the mean number of labels per sample, but the expected value of the Poisson distribution from which the number of labels i

[Scikit-learn-general] make_multilabel_classification n_labels

2014-08-19 Thread km
Hi all, I am using the make_multilabel_classification function to generate X and Y. The function help reads: ... n_labels : int, optional (default=2) The average number of labels per instance. Number of labels follows a Poisson distribution that never takes the value 0. ... but the

[Scikit-learn-general] Optimal Subset Selection Code Contribution

2014-08-19 Thread Giuseppe Marco Randazzo
Hello, I'm interested in contributing to scikit-learn by implementing some algorithms to make an optimal selection of objects in an N-dimensional space. These techniques are used when sampling is needed in large data and when the sampling must be done with a specific criterion: - Most Descriptive Com

Re: [Scikit-learn-general] Large dataset causing Array can't be memory-mapped. Python objects in dtype.

2014-08-19 Thread Anders Aagaard
I mail in a question about something I think might be a bug before calling it a night, and the next day I have a workaround and it gets fixed upstream. Thanks a lot! :) On Tue, Aug 19, 2014 at 10:51 AM, Olivier Grisel wrote: > I have issued a PR to fix the bug in joblib: > https://github.com/joblib/jobl

Re: [Scikit-learn-general] Large dataset causing Array can't be memory-mapped. Python objects in dtype.

2014-08-19 Thread Olivier Grisel
I have issued a PR to fix the bug in joblib: https://github.com/joblib/joblib/pull/163 Thanks for the report. -- Olivier

Re: [Scikit-learn-general] TdidfTransformer when applied to test dataset

2014-08-19 Thread ZORAIDA HIDALGO SANCHEZ
You're right, Joel. The options will be: * Use the last vocabulary built. (Only including vocabulary for the last train fold) -> Only vocabulary in the last train fold. Underfitting? * Use the whole vocabulary (as I proposed in the previous email: train + test folds) -> Whole vocabulary in t

Re: [Scikit-learn-general] Large dataset causing Array can't be memory-mapped. Python objects in dtype.

2014-08-19 Thread Anders Aagaard
Oh well. I'm not a very experienced monkey-patcher. There may be a better way to do it (*make sure you apply the monkey patch before importing any other scikit-learn modules*). That part seems pretty obvious now that you mention it ;). Works now, thank you very much! On Tue, Aug 19, 2014 at 8:55

Re: [Scikit-learn-general] Large dataset causing Array can't be memory-mapped. Python objects in dtype.

2014-08-19 Thread Joel Nothman
(or better, a.dtype.hasobject) On 19 August 2014 16:59, Joel Nothman wrote: > You can also modify that line in sklearn/externals/joblib/pool.py in your > local copy of scikit-learn to include an additional condition: > and a.dtype.kind != 'O' > > > On 19 August 2014 16:55, Joel Nothman wrote:
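A short sketch of why a.dtype.hasobject is the safer of the two checks mentioned above. The actual condition in sklearn/externals/joblib/pool.py is not shown in this listing, so the helper can_memmap below is hypothetical and only illustrates the dtype test. It assumes NumPy is installed.

```python
import numpy as np


def can_memmap(a):
    """Hypothetical check: an array is a memory-mapping candidate only
    if its dtype holds no Python objects anywhere."""
    return isinstance(a, np.ndarray) and not a.dtype.hasobject


floats = np.zeros(3)                              # plain float64 array
objects = np.array([{"a": 1}, None], dtype=object)  # object array, kind 'O'
# Structured array with one object field: its kind is 'V', not 'O',
# so a check on dtype.kind alone would miss it.
structured = np.zeros(2, dtype=[("name", "O"), ("value", "f8")])

print(can_memmap(floats))       # True
print(can_memmap(objects))      # False
print(structured.dtype.kind)    # V
print(can_memmap(structured))   # False
```

The structured case is the reason to prefer hasobject: the test a.dtype.kind != 'O' passes for it even though the array still contains Python objects.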