Re: [Scikit-learn-general] Critical Difference Diagram

2015-10-29 Thread Arnaud Joly
SciPy can perform the Friedman test. Orange has a tool to draw the critical difference diagram. And you can easily compute the critical difference using statsmodels: from statsmodels.stats.libqsturng import qsturng q_alpha = qsturng(1 - alpha, n_methods, np.inf) / np.sqrt(2) cd = q_alpha *
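
The snippet above is cut off before the final formula. As a rough sketch, assuming the intended quantity is the Nemenyi critical difference as it is commonly defined, CD = q_alpha * sqrt(k (k + 1) / (6 N)) for k methods compared on N datasets, it could be computed as below. The function name and the example value q_alpha = 2.569 (k = 4, alpha = 0.05, taken from published tables and not from the thread) are illustrative assumptions.

```python
import math


def critical_difference(q_alpha, n_methods, n_datasets):
    """Nemenyi critical difference on average ranks (illustrative sketch).

    q_alpha is assumed to be the studentized range quantile already divided
    by sqrt(2), as in the snippet above.
    """
    return q_alpha * math.sqrt(n_methods * (n_methods + 1) / (6.0 * n_datasets))


# Hypothetical setting: 4 methods compared on 14 datasets.
cd = critical_difference(2.569, n_methods=4, n_datasets=14)
print(round(cd, 3))
```

Two methods would then be considered significantly different when their average ranks differ by more than cd.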

Re: [Scikit-learn-general] Utility of random_state parameter for decision trees

2015-10-15 Thread Arnaud Joly
Your intuition is correct. For a decision tree with max_features=None, the random_state is used to break ties randomly. Cheers, Arnaud > On 14 Oct 2015, at 17:33, Kevin Markham wrote: > > Hello, > > I'm a data science instructor who uses scikit-learn extensively in

Re: [Scikit-learn-general] new commiters

2015-09-23 Thread Arnaud Joly
Congratulations and welcome!!! Arnaud > On 23 Sep 2015, at 08:59, Gael Varoquaux > wrote: > > Welcome to the team. You've been doing awesome work. We are very much looking > forward to having you in the core devs. > > Gaël > > On Tue, Sep 22, 2015 at 07:16:59PM

Re: [Scikit-learn-general] Estimators of RAKEL and (Ensemble) Classifier Chain for multilabel proposal

2015-07-13 Thread Arnaud Joly
The vanilla RAKEL and vanilla classifier chain would be a great addition to scikit-learn. FYI, for the classifier chain, there is a stalled pull request (https://github.com/scikit-learn/scikit-learn/pull/3727). For the RAKEL classifier,

Re: [Scikit-learn-general] Fwd: multilabel and multiclass classification.

2015-07-03 Thread Arnaud Joly
-label problem, you use a 2 dimensional array of shape (n_samples, n_labels). Best regards, Arnaud Joly On 03 Jul 2015, at 14:48, Prabhanshu Abhishek prabhans...@gmail.com wrote: Sir, I am using Scikit-learn for large scale item classification, which is multiclass and multilabel. I
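
To illustrate the (n_samples, n_labels) target shape mentioned above, here is a minimal sketch in plain Python that turns per-sample label sets into a binary indicator matrix. The helper name and the toy labels are hypothetical and only serve the illustration.

```python
def to_indicator_matrix(label_sets, all_labels):
    """Build a binary matrix with one row per sample and one column per label."""
    column = {label: j for j, label in enumerate(all_labels)}
    matrix = [[0] * len(all_labels) for _ in label_sets]
    for i, labels in enumerate(label_sets):
        for label in labels:
            matrix[i][column[label]] = 1
    return matrix


# Three samples, three possible labels; the last sample has no label at all.
y = to_indicator_matrix([{"red"}, {"red", "blue"}, set()], ["blue", "green", "red"])
print(y)
```

Each row may contain zero, one or several ones, which is what distinguishes a multilabel target from a multiclass one.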

Re: [Scikit-learn-general] Scitkit-learn Random Forest Classifier

2015-05-27 Thread Arnaud Joly
Hi, You can control the number of attributes that are drawn (tested) at each node with the max_features parameter. Best regards, Arnaud Joly On 27 May 2015, at 11:47, Herbert Schulz hrbrt@gmail.com wrote: Hello everyone, I'm using the Random Forest Classifier to predict the toxicity

Re: [Scikit-learn-general] RFC (also by users) on interpreting 1d X

2015-05-04 Thread Arnaud Joly
I am in favour of raising an error. Arnaud On 01 May 2015, at 19:58, Gael Varoquaux gael.varoqu...@normalesup.org wrote: I strongly advise raising an error. Very very very strongly. Being lax about ambiguous inputs makes prototyping and interactive usage easier: less typing, and the

Re: [Scikit-learn-general] sample weights for RandomForestClassifier to compute cross_val_score with roc_auc metric

2015-04-27 Thread Arnaud Joly
If you set sample_weight[i] = 2 for the i-th sample, this sample will be counted twice in the tree growing procedure (impurity computation, leaf labelling, …). Best regards, Arnaud On 26 Apr 2015, at 16:00, Luca Puggini lucapug...@gmail.com wrote: Ok thanks a
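
The effect of a weight of 2 on the impurity computation can be checked with a small sketch. This is a plain-Python weighted Gini impurity written for illustration, not scikit-learn's actual (Cython) implementation; the function name and toy labels are assumptions.

```python
from collections import defaultdict


def weighted_gini(labels, sample_weight):
    """Gini impurity where each sample contributes its weight to its class."""
    totals = defaultdict(float)
    for label, weight in zip(labels, sample_weight):
        totals[label] += weight
    total = sum(totals.values())
    return 1.0 - sum((w / total) ** 2 for w in totals.values())


# Giving the first sample a weight of 2 ...
weighted = weighted_gini(["a", "b", "b"], [2, 1, 1])
# ... yields the same impurity as duplicating that sample with unit weights.
duplicated = weighted_gini(["a", "a", "b", "b"], [1, 1, 1, 1])
print(weighted, duplicated)
```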

Re: [Scikit-learn-general] [ANN] scikit-learn 0.16.0 is out!

2015-03-27 Thread Arnaud Joly
Awesome !!! Thanks to all who contributed to this release!! Arnaud On 27 Mar 2015, at 18:22, Gael Varoquaux gael.varoqu...@normalesup.org wrote: Congratulations Olivier and the whole team (thanks a lot to Andy for a lot of work on the issues and the release. This is awesome! Releasing

Re: [Scikit-learn-general] Partial dependence plots for Random Forests.

2015-03-21 Thread Arnaud Joly
No, scikit-learn doesn’t have partial dependence plots for random forests. Best regards, Arnaud On 21 Mar 2015, at 03:43, Shubham Singh Tomar tomarshubha...@gmail.com wrote: Does scikit-learn have any capacity for partial dependence plots and associated data arrays for random forest

Re: [Scikit-learn-general] GSoC2015 topics

2015-03-06 Thread Arnaud Joly
Hi, Sadly this year, I won’t have time for mentoring. However, I will try to find some spare time for reviewing! Best regards, Arnaud On 05 Mar 2015, at 22:43, Andreas Mueller t3k...@gmail.com wrote: Hi Wei Xue. Thanks for your interest. For the GMM project being familiar with DPGMM and

Re: [Scikit-learn-general] Samples per estimator on Random Forests

2014-12-16 Thread Arnaud Joly
. If you want to perform random subspace, you can have a look at BaggingClassifier and BaggingRegressor. 3) It’s possible to achieve 1 and 2 using both the bagging and random forest estimators. Best regards, Arnaud Joly On 16 Dec 2014, at 09:06, Miquel Camprodon miquel.campro...@kernel

Re: [Scikit-learn-general] Fast Johnson-Lindenstrauss Transform

2014-10-29 Thread Arnaud Joly
Can you comment a bit on how they combine the random sign matrix and the randomly subsampled Fourier basis? Best regards, Arnaud Joly On 29 Oct 2014, at 14:24, Michal Romaniuk michal.romaniu...@imperial.ac.uk wrote: Hi everyone, I'm thinking of adding the Unrestricted Fast Johnson

Re: [Scikit-learn-general] Suggestion: break up the metrics module

2014-10-15 Thread Arnaud Joly
I totally agree with Gael. I would welcome improvements in the narrative documentation of http://scikit-learn.org/stable/modules/metrics.html about distances and kernels. It feels empty compared to http://scikit-learn.org/stable/modules/model_evaluation.html Best regards, Arnaud On 14 Oct 2014,

Re: [Scikit-learn-general] Welcome new core contributors

2014-10-13 Thread Arnaud Joly
Congratulations!!! Arnaud On 13 Oct 2014, at 03:13, Kyle Kastner kastnerk...@gmail.com wrote: Thanks everyone! There are some nice new extensions for that algorithm planned (randomized SVD!) once I get a moment to submit the proper PR. I am happy to be able to contribute for such an awesome

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-16 Thread Arnaud Joly
I would add to this list: - check_array; - check_consistent_length; - check_X_y. Those are very useful. Arnaud On 15 Sep 2014, at 20:03, Olivier Grisel olivier.gri...@ensta.org wrote: 2014-09-15 6:40 GMT-07:00 Mathieu Blondel math...@mblondel.org: lightning is using the

Re: [Scikit-learn-general] Sparse Gradient Boosting Fully Corrective Gradient Boosting

2014-09-16 Thread Arnaud Joly
Hi, There is a very advanced pull request which adds sparse matrix support to decision trees: https://github.com/scikit-learn/scikit-learn/pull/3173 Based on this, it could be possible to have gradient tree boosting working on sparse data. Note that AdaBoost already supports sparse matrices with

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
Hi, To get a reproducible model, you have to set the random_state. Best regards, Arnaud On 16 Sep 2014, at 12:08, Debanjan Bhattacharyya b.deban...@gmail.com wrote: Hi I recently participated in the Atlas (Higgs Boson Machine Learning Challenge) One of the models I tried was

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
rounds of CV and averaging it ? What exactly goes behind random_state from a Gradient Boosting approach ? Regards Deb On Tue, Sep 16, 2014 at 3:52 PM, Arnaud Joly a.j...@ulg.ac.be wrote: Hi, To get reproducible model, you have to set the random_state. Best regards, Arnaud

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
, 2014 at 6:07 PM, Arnaud Joly a.j...@ulg.ac.be wrote: During the growth of the decision tree, the best split is searched in a subset of max_features sampled among all features. Setting the random_state makes it possible to draw the same subsets of features each time. Note that if several candidate
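
The role of the seed described above can be sketched in plain Python. This only mimics the idea of drawing max_features candidate features per node from a seeded generator; it is not scikit-learn's code, and the function name and parameters are hypothetical.

```python
import random


def draw_feature_subsets(n_features, max_features, n_nodes, seed):
    """Draw one random subset of candidate features for each node."""
    rng = random.Random(seed)  # a fixed seed gives a fixed sequence of draws
    return [
        sorted(rng.sample(range(n_features), max_features))
        for _ in range(n_nodes)
    ]


first = draw_feature_subsets(n_features=10, max_features=3, n_nodes=5, seed=0)
second = draw_feature_subsets(n_features=10, max_features=3, n_nodes=5, seed=0)
print(first == second)  # same seed, same subsets at every node
```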

Re: [Scikit-learn-general] oob_score_ for random forests for regression

2014-09-12 Thread Arnaud Joly
Hi, The r2_score metric is used. Best regards, Arnaud On 12 Sep 2014, at 16:04, Josh Wasserstein ribonucle...@gmail.com wrote: What error metric is used for this? Josh
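
For readers unfamiliar with the metric named above, here is a minimal plain-Python sketch of the coefficient of determination, R^2 = 1 - SS_res / SS_tot. It is written for illustration only (no handling of a constant target) and the toy values are hypothetical.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
score = r2(y_true, y_pred)
print(round(score, 4))
```

A perfect prediction gives 1.0, and a model no better than predicting the mean gives 0.0.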

Re: [Scikit-learn-general] oob_score_ for random forests for regression

2014-09-12 Thread Arnaud Joly
, Arnaud Joly a.j...@ulg.ac.be wrote: Hi, The r2_score metric is used. Best regards, Arnaud On 12 Sep 2014, at 16:04, Josh Wasserstein ribonucle...@gmail.com wrote: What error metric is used for this? Josh

Re: [Scikit-learn-general] oob_score_ for random forests for regression

2014-09-12 Thread Arnaud Joly
Here the link to the issue https://github.com/scikit-learn/scikit-learn/issues/3455 Arnaud On 12 Sep 2014, at 20:01, Arnaud Joly a.j...@ulg.ac.be wrote: If you want to work on custom oob scoring, there is an issue opened for it. Best regards, Arnaud On 12 Sep 2014, at 19:01, Josh

Re: [Scikit-learn-general] Dynamic Multiple Classifier Systems

2014-08-28 Thread Arnaud Joly
Hi, Which algorithm do you want to bring into scikit-learn? Note that algorithms that are OK for inclusion in scikit-learn are at least 3 years old (since publication), have 1000+ citations, and are widely used and useful. [1] Best regards, Arnaud [1]

Re: [Scikit-learn-general] Unpickle doesn't work when upgrading from 14.1 to 15.1.

2014-08-26 Thread Arnaud Joly
Note that most (if not all) speed improvements have been made to fit faster trees. Arnaud On 26 Aug 2014, at 06:56, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Tue, Aug 26, 2014 at 02:42:02AM +, Pranav Sharma wrote: I just upgraded scikit from 14.1 to 15.1 to take advantage of

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Arnaud Joly
If you set n_jobs to XXX, it will spawn XXX threads or processes. Thus, you will need to ask for XXX cores. Note that it’s often possible to retrieve XXX in your script using os.environ. If you use less than the XXX cores, then you won’t use all the available cpu. If you ask for more than XXX
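
As a sketch of retrieving the allocated core count through os.environ, the helper below looks up a couple of scheduler variables. The variable names (SLURM_CPUS_PER_TASK, NSLOTS) are assumptions about the cluster scheduler and are not taken from the thread; adapt them to your environment.

```python
import os


def cores_from_environment(environ=os.environ,
                           names=("SLURM_CPUS_PER_TASK", "NSLOTS"),
                           default=1):
    """Return the first positive integer found among the given variables."""
    for name in names:
        value = environ.get(name)
        if value is not None and value.isdigit() and int(value) > 0:
            return int(value)
    return default  # nothing set: fall back to a single job


n_jobs = cores_from_environment()
print(n_jobs)
```

The resulting value could then be passed as n_jobs so that the number of workers matches the number of cores requested from the scheduler.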

Re: [Scikit-learn-general] Sparse Random Projection negative weights

2014-08-08 Thread Arnaud Joly
Have you tried to increase the number of components or epsilon parameter and density of the SparseRandomProjection? Have you tried to normalise X prior to the random projection? Best regards, Arnaud On 08 Aug 2014, at 12:19, Philipp Singer kill...@gmail.com wrote: Just another remark regarding

Re: [Scikit-learn-general] Sparse Random Projection negative weights

2014-08-08 Thread Arnaud Joly
to the Li et al paper. Could you recommend some value? I think I will be more effective with LSA for now. Are there any specific recommendations for the number of components? Chose 300 for now. Best, Philipp Am 08.08.2014 um 13:14 schrieb Arnaud Joly a.j...@ulg.ac.be: Have you tried

Re: [Scikit-learn-general] [ANN] scikit-learn 0.15.1 is out

2014-08-04 Thread Arnaud Joly
Thanks Olivier! Arnaud On 01 Aug 2014, at 17:55, Olivier Grisel olivier.gri...@ensta.org wrote: This is a bugfix release. The list of fixes of this release can be found on: http://scikit-learn.org/stable/whats_new.html You can install from source or binary packages available here

Re: [Scikit-learn-general] About scoring functions with GridSearchCV

2014-07-24 Thread Arnaud Joly
, are the values of ‘C’ visited in order or at random? Or, in other words, if two or more values of ‘C’ lead to similar results, say very close or identical, will the smallest ‘C’ be the output ? Thank you! From: Arnaud Joly [mailto:arnaud.v.j...@gmail.com] Sent: Thursday, July 24, 2014 4:24 PM

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-16 Thread Arnaud Joly
Hi, this looks like a regression. Can you open an issue on GitHub? I am not sure that it would make sense to add an unknown-label column with an optional parameter. But you could easily add one with some numpy operations: np.hstack([y, y.sum(axis=1, keepdims=True) == 0]) Best regards, Arnaud
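
The one-liner above can be tried on a small binarized label matrix. The toy matrix below is hypothetical; the extra column is set to 1 only for rows where no known label is active.

```python
import numpy as np

# Hypothetical binarized labels: the last row matches none of the known labels.
y = np.array([[1, 0, 0],
              [0, 1, 1],
              [0, 0, 0]])

# Column flagging rows whose label sum is zero, i.e. "unknown" samples.
unknown = y.sum(axis=1, keepdims=True) == 0
y_with_unknown = np.hstack([y, unknown])
print(y_with_unknown)
```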

Re: [Scikit-learn-general] sparse matrix input support for GradientBoostingClassifiers or AdaBoostClassifier

2014-07-02 Thread Arnaud Joly
Hi, There is sparse input support with AdaBoost for weak learners that support sparse input (such as SGD). For AdaBoost with a decision tree as weak learner, this is in progress, see the pull request https://github.com/scikit-learn/scikit-learn/pull/3173 For gradient tree boosting, nothing has

Re: [Scikit-learn-general] Multi-class AND multilabel learning/prediction

2014-07-01 Thread Arnaud Joly
Hi, Can you describe your problem? Do you mean multi-output multi-class? Best, Arnaud On 01 Jul 2014, at 11:13, Gundala Viswanath gunda...@gmail.com wrote: According to this documentation here: http://scikit-learn.org/stable/modules/multiclass.html The API listed there does EITHER

Re: [Scikit-learn-general] Classifiers that handle instance weighting in sklearn

2014-06-17 Thread Arnaud Joly
Hi, Without being exhaustive: random forest, extra trees, bagging, AdaBoost, naive Bayes and several linear models support sample weights. Best regards, Arnaud On 17 Jun 2014, at 11:27, Mohamed-Rafik Bouguelia bouguelia.med.ra...@gmail.com wrote: Hello all, I've tried to associate

Re: [Scikit-learn-general] Multilabel and differences betweeen 0.14 and Master

2014-06-11 Thread Arnaud Joly
Hi, Could you provide some minimal data to reproduce this behavior? Best regards, Arnaud On 10 Jun 2014, at 16:53, Miguel Fernando Cabrera mfcabr...@gmail.com wrote: Hi Everyone, This is my first post in the list. I have been using scikit-learn actively for the last six months in my

Re: [Scikit-learn-general] [ANN] scikit-learn 0.15.0b1 is on PyPI (first beta release for 0.15.0)

2014-06-06 Thread Arnaud Joly
Hi all, Thanks Olivier for taking care of the release!! Best regards, Arnaud On 06 Jun 2014, at 15:14, Olivier Grisel olivier.gri...@ensta.org wrote: Hi all, I just pushed a first beta release (0.15.0b1) of the new 0.15.X branch to PyPI. This releases includes (experimental) wheel

Re: [Scikit-learn-general] My talk was approved for EuroScipy'14

2014-05-22 Thread Arnaud Joly
Congratulations! :-) Cheers, Arnaud On 22 May 2014, at 10:50, Peter Prettenhofer peter.prettenho...@gmail.com wrote: congrats Gilles -- looking forward to your talk -- you should definitely make a blog post from your material (and benchmarks)! 2014-05-22 8:50 GMT+02:00 Vlad Niculae

Re: [Scikit-learn-general] KFold cross validation strangely defaults to not shuffle

2014-04-25 Thread Arnaud Joly
On 23 Apr 2014, at 08:17, Mathieu Blondel math...@mblondel.org wrote: One solution would be to deprecate the shuffle option from KFold and add a new class ShuffleKFold. The documentation should clarify the difference between ShuffleKFold and ShuffleSplit: in the latter you need to specify

Re: [Scikit-learn-general] confused with KNN

2014-04-25 Thread Arnaud Joly
Hi Chengxuan Wan, Without more details and a code example, it’s difficult to help you. Furthermore, it’s better to ask for help on the scikit-learn mailing list or on stack overflow. Best regards, Arnaud Joly On 25 Apr 2014, at 19:04, Chengxuan Wang cw1...@nyu.edu wrote: Hi, Arnaud Joly

Re: [Scikit-learn-general] GSoC acceptance - Sparse Support

2014-04-23 Thread Arnaud Joly
Congratulations Hamzeh!!! I am looking forward to working with you! Arnaud On 23 Apr 2014, at 03:57, Hamzeh Alsalhi ha...@cornell.edu wrote: Thank you to Gael and Arnaud for the support and criticism on my early proposal. I am a big fan of the high coding and collaboration standards at

Re: [Scikit-learn-general] Welcome to GSoC students

2014-04-23 Thread Arnaud Joly
Welcome and congratulations to Issam, Hamzeh, Manoj and Maheshakya! Arnaud On 23 Apr 2014, at 07:51, Robert Layton robertlay...@gmail.com wrote: Thanks Gaël. The fact we received four students is testament to the hard work everyone has done before me! On 23 April 2014 15:46, Gael

Re: [Scikit-learn-general] GSoC

2014-04-08 Thread Arnaud Joly
not be memory/time inefficient. But my question is, is this acceptable? On Mon, Mar 17, 2014 at 6:49 PM, Arnaud Joly a.j...@ulg.ac.be wrote: Hi, The support for sparse matrices should exploit as much as possible the sparsity structure of the matrix without blowing up memory

Re: [Scikit-learn-general] LabelBinarizer changes complete?

2014-03-25 Thread Arnaud Joly
Gollamudi a...@rice.edu wrote: Quoting Arnaud Joly a.j...@ulg.ac.be: Can you provide a gist of your code as to help you? I have an implementation that mimics OnevsRestClassifier I want to eventually try partial_fit since the number of samples is large. Here is the rough outline

Re: [Scikit-learn-general] LabelBinarizer changes complete?

2014-03-24 Thread Arnaud Joly
Hi, Can you provide a gist of your code so that we can help you? PR 2458 isn't finished yet and there may be some corner cases where it fails. However, in the branch https://github.com/arjoly/scikit-learn/commits/sparse-label_binarizer, I almost finished the label binarizer part. I can try

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-03-21 Thread Arnaud Joly
Hi Issam, Why not start by improving the multilayer neural network before adding new algorithms? To the neural network experts: would it be interesting to have layer configuration à la Torch https://github.com/torch/nn/blob/master/README.md ? Best, Arnaud On 21 Mar 2014, at 10:18, Issam

Re: [Scikit-learn-general] GSOC 2014 scipy.sparse matrix support to DT

2014-03-18 Thread Arnaud Joly
/scikit-learn/issues/655 https://github.com/scikit-learn/scikit-learn/issues/2399 And our mentor is Arnaud Joly, you can ask him for help One of the implementation in progress is https://github.com/scikit-learn/scikit-learn/pull/2848 And refer to https://github.com/fest/fest/blob/master/tree.c

Re: [Scikit-learn-general] GSoC

2014-03-17 Thread Arnaud Joly
for sparse matrices, why not exploit their structure as much as we can. Arnaud is this feasible ? Eltermann anything wrong in my thinking ? cheers, kaushik varanasi On Wed, Mar 12, 2014 at 8:29 PM, Arnaud Joly a.j...@ulg.ac.be wrote: For the number of contributions, I would advise you to do

Re: [Scikit-learn-general] GSoC

2014-03-12 Thread Arnaud Joly
will work on the issues a bit more. On Tue, Mar 11, 2014 at 1:12 PM, Arnaud Joly a.j...@ulg.ac.be wrote: Thanks for your contribution. Keep up! Arnaud On 10 Mar 2014, at 23:51, vamsi kaushik kaushik.varana...@gmail.com wrote: My name is actually Varanasi Vamsi Kaushik(yeah its

Re: [Scikit-learn-general] GSoC

2014-03-10 Thread Arnaud Joly
Hi, Anything concerning the GSOC should go through the scikit-learn mailing list. Thanks for your interest in the subject. If you intend to apply for a GSOC, I suggest you read https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-%28GSOC%29-2014 and start contributing to

Re: [Scikit-learn-general] proposal

2014-03-10 Thread Arnaud Joly
Hi, The model representation (the tree structure) shouldn’t be affected by the fact that the input data is sparse or dense. You might be interested in this issue https://github.com/scikit-learn/scikit-learn/issues/655. Best, Arnaud On 07 Mar 2014, at 18:13, vamsi kaushik

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-02-08 Thread Arnaud Joly
/8388bedff4e225cda9a1b2b6e3fc250bb7d22276#diff-a2cead4f3702cc4b9f76562bb2777edbL2297 [3] https://github.com/eltermann/scikit-learn/commit/5ba9c367661446c3eba7e6ea54adc1ff5cdfd39f#diff-a2cead4f3702cc4b9f76562bb2777edbR1281 On Wed, Feb 5, 2014 at 10:34 AM, Arnaud Joly a.j...@ulg.ac.be wrote: I think that I would go

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-02-05 Thread Arnaud Joly
I think that I would go for the option that minimizes the amount of code duplication. I would probably start with 2. Since we no longer pickle the Splitter and Criterion, the constructor arguments could be used to pass the X and y matrices. Cheers, Arnaud On 04 Feb 2014, at 17:38,

Re: [Scikit-learn-general] Google Summer of Code - ideas

2014-01-31 Thread Arnaud Joly
Hello, Your contributions to scikit-learn are highly appreciated. However, we use only the scikit-learn mailing list to discuss GSOC ideas. At the moment, I don’t want to give any, but might give some in the near future. We should definitely remove the old list. Since it biases applicants

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-31 Thread Arnaud Joly
Here are some results on the 20 newsgroups dataset:

Classifier     train-time  test-time  error-rate
5-nn           0.0047s     13.6651s   0.5916
random forest  263.3146s   3.9985s    0.2459
sgd            0.2265s     0.0657s

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-28 Thread Arnaud Joly
You can also reduce the dimensionality using random projections. Arnaud On 28 Jan 2014, at 11:39, Nick Pentreath nick.pentre...@gmail.com wrote: Another important and related use case is to reduce the search space, for example, in recommendation systems one often has to do the dot

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-28 Thread Arnaud Joly
On 28 Jan 2014, at 15:31, Olivier Grisel olivier.gri...@ensta.org wrote: 2014/1/28 Mathieu Blondel math...@mblondel.org: On Tue, Jan 28, 2014 at 9:25 PM, Olivier Grisel olivier.gri...@ensta.org wrote: While vanilla LSH is an interesting baseline for Approximate Nearest Neighbors

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-24 Thread Arnaud Joly
On 23 Jan 2014, at 07:18, Maheshakya Wijewardena pmaheshak...@gmail.com wrote: Arnaud, I've gone through those messages and I've already started working on patches. Last year I've done a project of a module in our university. It was to implement Bagging in Scikit-learn. As Gilles had

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-22 Thread Arnaud Joly
Hi Maheshakya, I could be one of the mentors for this GSOC. If you want to apply for a GSOC, I think that this message from Gael and Mathieu is worth reading http://sourceforge.net/mailarchive/message.php?msg_id=31864881 Best, Arnaud On 22 Jan 2014, at 06:13, Maheshakya Wijewardena

[Scikit-learn-general] A poster about scikit-learn at Giga-day

2014-01-17 Thread Arnaud Joly
Hi everyone, There is a local event at my university which is called Giga-day (http://www.giga.ulg.ac.be/jcms/prod_207504/fr/giga-day-2014) and I decided to present scikit-learn with a poster. The poster is largely inspired by the last NIPS talk about scikit-learn.

Re: [Scikit-learn-general] A poster about scikit-learn at Giga-day

2014-01-17 Thread Arnaud Joly
: Firstly, I doubt it matters, but some of the links are mangled. Then, I think it should say students' master's theses or something like this (plural). Also the chromosome 15 sounds strange to me compared to chromosome 15. Cheers, Vlad On 17/1/2014 14:39 , Arnaud Joly wrote: Hi

Re: [Scikit-learn-general] macro and micro average output

2013-12-16 Thread Arnaud Joly
Hi, Your problem is a binary classification task. In that case, the f1 score function returns the binary classification f1 score. In order to get multi class classification score, you have to set pos_label to None. For example, In [2]: gt = [0, 0, 1, 1, 0, 0, 1, 1, 0] In [3]: from
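
The difference between the binary F1 score and a multi-class (macro-averaged) one can be sketched in plain Python. The gt list is the one from the snippet above; the predictions are hypothetical since the original example is cut off, and the helper functions are illustrative, not scikit-learn's API.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


gt = [0, 0, 1, 1, 0, 0, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical predictions

binary = f1_for_label(gt, pred, label=1)  # positive class only
macro = macro_f1(gt, pred)                # both classes averaged
print(binary, macro)
```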

Re: [Scikit-learn-general] Contributing ensemble selection

2013-12-09 Thread Arnaud Joly
Hi, Thanks for your interest in contributing to scikit-learn. I agree that it's not a `major tool` and I would appreciate if you could guide me to any new `valuable` paper about forming an ensemble from library of models, or in general any paper that's `valuable` related to ensemble

Re: [Scikit-learn-general] LabelBinarizer for large data

2013-10-21 Thread Arnaud Joly
It sounds like you don't have enough memory to store a dense matrix of binarized labels. There is already one PR that tries to alleviate this problem: see https://github.com/scikit-learn/scikit-learn/pull/2458 Best, Arnaud On 20 Oct 2013, at 20:20, Olivier Grisel olivier.gri...@ensta.org

Re: [Scikit-learn-general] LabelBinarizer for large data

2013-10-21 Thread Arnaud Joly
, Mahendra Kariya On Monday, 21 October 2013 1:55 PM, Arnaud Joly arnaud4...@gmail.com wrote: It sounds like you haven't enough memory to store a dense matrix of binarized labels. There is already one pr that tries to alleviate this problem : see https://github.com/scikit-learn/scikit-learn

Re: [Scikit-learn-general] Release 0.14: tagged and pushed!

2013-08-09 Thread Arnaud Joly
Impressive change log! Congratulations!! Arnaud

Re: [Scikit-learn-general] GridSearchCV with multi-label: ROC-AUC-equivalent metrics

2013-07-31 Thread Arnaud Joly
It's what they have done in the mulan library. Arnaud On 19 Jul 2013, at 13:24, Olivier Grisel olivier.gri...@ensta.org wrote: 2013/7/19 Arnaud Joly arnaud4...@gmail.com: You can probably average the precision recall curve or use some ranking metrics [1]. Arnaud [1] Mining Multi-label

Re: [Scikit-learn-general] GridSearchCV with multi-label: ROC-AUC-equivalent metrics

2013-07-19 Thread Arnaud Joly
You can probably average the precision recall curve or use some ranking metrics [1]. Arnaud [1] Mining Multi-label Data http://lkm.fri.uni-lj.si/xaigor/slo/pedagosko/dr-ui/tsoumakas09-dmkdh.pdf On 19 Jul 2013, at 08:56, Eustache DIEMERT eusta...@diemert.fr wrote: I'm no expert, but I know

Re: [Scikit-learn-general] Python 3 port

2013-07-15 Thread Arnaud Joly
Is the py3k branch https://github.com/scikit-learn/scikit-learn/tree/py3k still useful? Arnaud On 09 Jul 2013, at 16:26, Olivier Grisel olivier.gri...@ensta.org wrote: The REAMDE-Py3k.rst was not reflecting the current situation. I just updated it. We don't use 2to3 anymore but a single code

Re: [Scikit-learn-general] mean(scores) vs score(concatenation). E.g. AUC with LOO validation

2013-07-15 Thread Arnaud Joly
The 0.13.X version of scikit-learn doesn't support grid search with an AUC score. In the master branch, this is possible thanks to Andreas (see https://github.com/scikit-learn/scikit-learn/pull/1381). However, there is still work in progress on this subject, see