Re: [Scikit-learn-general] Critical Difference Diagram

2015-10-29 Thread Arnaud Joly
scipy lets you perform the Friedman test. Orange has a tool to draw the critical distance diagram. And you can easily compute the critical distance using statsmodels: from statsmodels.stats.libqsturng import qsturng q_alpha = qsturng(1 - alpha, n_methods, np.inf) / np.sqrt(2) cd = q_alpha * n
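The snippet above is cut off after `cd = q_alpha * n`. As a rough sketch, assuming the intended quantity is the Nemenyi critical difference in its usual form, CD = q_alpha * sqrt(k (k + 1) / (6 N)) for k methods and N datasets, it can be computed without any dependency once q_alpha is known. The function name, the number of datasets and the example value of q_alpha (2.569, the commonly tabulated value for k = 4 at alpha = 0.05, already divided by sqrt(2)) are assumptions, not part of the original message.

```python
import math


def critical_difference(q_alpha, n_methods, n_datasets):
    """Nemenyi critical difference (assumed formula, see the note above).

    q_alpha is the studentized range quantile already divided by sqrt(2).
    """
    return q_alpha * math.sqrt(n_methods * (n_methods + 1) / (6.0 * n_datasets))


# Hypothetical setting: 4 methods compared on 14 datasets.
cd = critical_difference(2.569, n_methods=4, n_datasets=14)
```

Two methods whose average ranks differ by more than `cd` would then be considered significantly different.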

Re: [Scikit-learn-general] Utility of random_state parameter for decision trees

2015-10-15 Thread Arnaud Joly
Your intuition is correct. For a decision tree with max_features=None, the random_state is used to break ties randomly. Cheers, Arnaud > On 14 Oct 2015, at 17:33, Kevin Markham wrote: > > Hello, > > I'm a data science instructor that uses scikit-learn extensively in the > classroom. Yesterda
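A toy sketch of what breaking ties randomly means. This is not scikit-learn's code; the helper and the scores are made up for illustration. When several candidate splits have the same score, a seeded generator decides which one is kept, so the same seed always yields the same choice.

```python
import random


def pick_best_split(scores, seed):
    """Return the index of a best-scoring split, breaking ties at random."""
    rng = random.Random(seed)
    best = max(scores)
    tied = [i for i, s in enumerate(scores) if s == best]
    return rng.choice(tied)


scores = [0.30, 0.42, 0.42, 0.10]  # hypothetical impurity decreases
# Repeating the call with the same seed always selects the same split.
same_seed = {pick_best_split(scores, seed=0) for _ in range(5)}
```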

Re: [Scikit-learn-general] new commiters

2015-09-23 Thread Arnaud Joly
Congratulations and welcome!!! Arnaud > On 23 Sep 2015, at 08:59, Gael Varoquaux > wrote: > > Welcome to the team. You've been doing awesome work. We are very much looking > forward to having you in the core devs. > > Gaël > > On Tue, Sep 22, 2015 at 07:16:59PM +0200, Alexandre Gramfort wrote: >

Re: [Scikit-learn-general] Estimators of RAKEL and (Ensemble) Classifier Chain for multilabel proposal

2015-07-13 Thread Arnaud Joly
The vanilla rakel and vanilla classifier chain would be great additions to scikit-learn. FYI, for the classifier chain, there is a stalled pull request https://github.com/scikit-learn/scikit-learn/pull/3727 . For the rakel classifier, t

Re: [Scikit-learn-general] Fwd: multilabel and multiclass classification.

2015-07-03 Thread Arnaud Joly
have a multi-label problem, you use a 2 dimensional array of shape (n_samples, n_labels). Best regards, Arnaud Joly > On 03 Jul 2015, at 14:48, Prabhanshu Abhishek wrote: > > > Sir, > I am using Scikit-learn for large scale item classification, which is > multiclass and multila
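For illustration, a multi-label target of shape (n_samples, n_labels) can be built by hand as a 0/1 indicator matrix. The helper and the label names below are hypothetical and only meant to show the layout.

```python
def to_indicator(label_sets, classes):
    """Build a (n_samples, n_labels) 0/1 matrix as a list of rows."""
    column = {c: j for j, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in label_sets]
    for i, labels in enumerate(label_sets):
        for label in labels:
            matrix[i][column[label]] = 1
    return matrix


classes = ["books", "music", "toys"]  # hypothetical label names
# Three samples: one label, two labels, and no label at all.
y = to_indicator([{"books"}, {"music", "toys"}, set()], classes)
```

Each row is one sample and each column one label, so a sample may have several ones or none.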

Re: [Scikit-learn-general] Scitkit-learn Random Forest Classifier

2015-05-27 Thread Arnaud Joly
Hi, You can control the number of attributes that are drawn (tested) at each node with the max_features parameter. Best regards, Arnaud Joly > On 27 May 2015, at 11:47, Herbert Schulz wrote: > > Hello everyone, > > I'm using the "Random Forest Classifier"
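A minimal sketch of the idea, not the library's implementation: at each node a subset of `max_features` feature indices is drawn, and only those are tested for the best split. The helper name and the sizes are assumptions chosen for the example.

```python
import random


def candidate_features(n_features, max_features, rng):
    """Draw the feature indices examined at one node, without replacement."""
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(42)  # plays the role of a fixed random_state
node_a = candidate_features(n_features=10, max_features=3, rng=rng)
node_b = candidate_features(n_features=10, max_features=3, rng=rng)
```

Each node gets its own draw from the same generator, and re-creating the generator with the same seed reproduces the sequence of draws.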

Re: [Scikit-learn-general] RFC (also by users) on interpreting 1d X

2015-05-04 Thread Arnaud Joly
I am in favour of raising an error. Arnaud > On 01 May 2015, at 19:58, Gael Varoquaux > wrote: > > I strongly advise raising an error. Very very very strongly. > > Being lax about ambiguous inputs makes prototyping and interactive usage > easier: less typing, and the system gets it right mos

Re: [Scikit-learn-general] sample weights for RandomForestClassifier to compute cross_val_score with roc_auc metric

2015-04-26 Thread Arnaud Joly
If you set sample_weight[i] = 2 for the i-th sample, that sample is counted twice in the tree growing procedure (impurity computation, leaf labelling, …). Best regards, Arnaud > On 26 Apr 2015, at 16:00, Luca Puggini wrote: > > Ok thanks a lot, a last question
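To make this concrete, here is a small sketch with a hand-written weighted Gini impurity (a standard impurity measure, used here as an assumption; it is not taken from the library's source). Giving a sample a weight of 2 yields the same impurity as listing it twice.

```python
from collections import defaultdict


def weighted_gini(labels, weights):
    """Gini impurity where each sample counts `weight` times."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    n = sum(totals.values())
    return 1.0 - sum((t / n) ** 2 for t in totals.values())


# First sample weighted twice versus the same sample duplicated.
weighted = weighted_gini([0, 1, 1], [2, 1, 1])
duplicated = weighted_gini([0, 0, 1, 1], [1, 1, 1, 1])
```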

Re: [Scikit-learn-general] [ANN] scikit-learn 0.16.0 is out!

2015-03-27 Thread Arnaud Joly
Awesome !!! Thanks to all who contributed to this release!! Arnaud > On 27 Mar 2015, at 18:22, Gael Varoquaux > wrote: > > Congratulations Olivier and the whole team (thanks a lot to Andy for a > lot of work on the issues and the release. > > This is awesome! Releasing and quality assurance

Re: [Scikit-learn-general] Partial dependence plots for Random Forests.

2015-03-21 Thread Arnaud Joly
No, scikit-learn doesn’t have partial dependence plots for random forests. Best regards, Arnaud > On 21 Mar 2015, at 03:43, Shubham Singh Tomar > wrote: > > Does scikit-learn have any capacity for partial dependence plots and > associated data arrays for random forest analyses? > I can find t

Re: [Scikit-learn-general] GSoC2015 topics

2015-03-06 Thread Arnaud Joly
Hi, Sadly this year, I won’t have time for mentoring. However, I will try to find some spare time for reviewing! Best regards, Arnaud > On 05 Mar 2015, at 22:43, Andreas Mueller wrote: > > Hi Wei Xue. > Thanks for your interest. > For the GMM project being familiar with DPGMM and VB should b

Re: [Scikit-learn-general] Score function in Extra-Trees

2015-02-25 Thread Arnaud Joly
Hi Pierre-Luc, This is the same criterion, but with a different name. Maximising the reduction of variance at each split minimises the mean squared error. Cheers, Arnaud > On 24 Feb 2015, at 01:53, Pierre-Luc Bacon wrote: > > In the original Extra-Tree papers, the authors
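The equivalence can be checked numerically. In the sketch below (helper names and data are made up), each node predicts the mean of its samples, so its mean squared error equals its variance. For a fixed parent node, variance reduction plus the weighted MSE of the children is constant, so the split with the largest reduction is also the one with the smallest MSE.

```python
def mse_of_mean(values):
    """MSE when predicting the node mean, i.e. the variance of the node."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def split_scores(left, right):
    """Return (variance reduction, weighted MSE of the children)."""
    parent = left + right
    n = len(parent)
    children_mse = (len(left) * mse_of_mean(left)
                    + len(right) * mse_of_mean(right)) / n
    return mse_of_mean(parent) - children_mse, children_mse


# Two ways of splitting the same four target values.
good = split_scores([1.0, 1.2], [3.0, 3.4])
bad = split_scores([1.0, 3.0], [1.2, 3.4])
```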

Re: [Scikit-learn-general] Samples per estimator on Random Forests

2014-12-16 Thread Arnaud Joly
. If you want to perform random subspace, you can have a look at BaggingClassifier and BaggingRegressor. 3) It’s possible to achieve 1 and 2 using both the bagging and random forest estimators. Best regards, Arnaud Joly On 16 Dec 2014, at 09:06, Miquel Camprodon wrote: > Hi all, > >

Re: [Scikit-learn-general] Fast Johnson-Lindenstrauss Transform

2014-10-29 Thread Arnaud Joly
Can you comment a bit on how they combine the random sign matrix and the randomly subsampled Fourier basis? Best regards, Arnaud Joly On 29 Oct 2014, at 14:24, Michal Romaniuk wrote: > Hi everyone, > > I'm thinking of adding the Unrestricted Fast Johnson-Lindenstrauss >

Re: [Scikit-learn-general] Suggestion: break up the metrics module

2014-10-15 Thread Arnaud Joly
I totally agree with Gael. I would welcome improvements in the narrative documentation of http://scikit-learn.org/stable/modules/metrics.html about distances and kernels. It feels empty compared to http://scikit-learn.org/stable/modules/model_evaluation.html Best regards, Arnaud On 14 Oct 2014,

Re: [Scikit-learn-general] Welcome new core contributors

2014-10-13 Thread Arnaud Joly
Congratulations!!! Arnaud On 13 Oct 2014, at 03:13, Kyle Kastner wrote: > Thanks everyone! There are some nice new extensions for that algorithm > planned (randomized SVD!) once I get a moment to submit the proper PR. > I am happy to be able to contribute for such an awesome group :) > > On Su

Re: [Scikit-learn-general] Issues for 1.0

2014-09-22 Thread Arnaud Joly
Would it be possible that the issue with labels is https://github.com/scikit-learn/scikit-learn/issues/2451 ? Best regards, Arnaud On 21 Sep 2014, at 19:12, Andy wrote: > Hi all. > I remember that we had a couple of things we wanted to do for 1.0, but I > didn't really find a good list. > I k

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
or large - right ? > > Regards > Deb > > On Tue, Sep 16, 2014 at 6:07 PM, Arnaud Joly wrote: > During the growth of the decision tree, the best split is searched in a subset > of max_features sampled among all features. > > Setting the random_state allows to draw the sa

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
proach on having stable CV ? Not using > > random_state and doing several rounds of CV and averaging it ? or using > > different random_states > > and doing several rounds of CV and averaging it ? > > > > What exactly goes behind random_state from a Gradient

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
Hi, To get a reproducible model, you have to set the random_state. Best regards, Arnaud On 16 Sep 2014, at 12:08, Debanjan Bhattacharyya wrote: > Hi I recently participated in the Atlas (Higgs Boson Machine Learning > Challenge) > > One of the models I tried was GradientBoostingClassifier. I

Re: [Scikit-learn-general] Sparse Gradient Boosting & Fully Corrective Gradient Boosting

2014-09-16 Thread Arnaud Joly
Hi, There is a very advanced pull request which adds sparse matrix support to decision trees: https://github.com/scikit-learn/scikit-learn/pull/3173 Based on this, it could be possible to have gradient tree boosting working on sparse data. Note that adaboost already supports sparse matrices with non-

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-16 Thread Arnaud Joly
I would add to this list: - check_array; - check_consistent_length; - check_X_y. Those are very useful. Arnaud On 15 Sep 2014, at 20:03, Olivier Grisel wrote: > 2014-09-15 6:40 GMT-07:00 Mathieu Blondel : >> lightning is using the following utils: >> >> - check_random_st

Re: [Scikit-learn-general] oob_score_ for random forests for regression

2014-09-12 Thread Arnaud Joly
Here is the link to the issue https://github.com/scikit-learn/scikit-learn/issues/3455 Arnaud On 12 Sep 2014, at 20:01, Arnaud Joly wrote: > If you want to work on custom oob scoring, there is an issue opened > for it. > > Best regards, > Arnaud > > On 12 Sep 2014, at 19

Re: [Scikit-learn-general] oob_score_ for random forests for regression

2014-09-12 Thread Arnaud Joly
If you want to work on custom oob scoring, there is an issue open for it. Best regards, Arnaud On 12 Sep 2014, at 19:01, Josh Wasserstein wrote: > Thanks! Couldn't find it on the documentation. I may try adding that to a PR. > > Josh > > On Fri, Sep 12, 2014 at 10:07 AM

Re: [Scikit-learn-general] oob_score_ for random forests for regression

2014-09-12 Thread Arnaud Joly
Hi, The r2_score metric is used. Best regards, Arnaud On 12 Sep 2014, at 16:04, Josh Wasserstein wrote: > What error metric is used for this? > > Josh
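For reference, the coefficient of determination that r2_score computes can be written in a few lines. The helper below is a dependency-free sketch with made-up data, not the library function; for the out-of-bag score, `y_pred` would hold the out-of-bag predictions.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


perfect = r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # exact predictions
baseline = r2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predict the mean
```

A perfect model scores 1, and always predicting the mean of the targets scores 0.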

Re: [Scikit-learn-general] Dynamic Multiple Classifier Systems

2014-08-28 Thread Arnaud Joly
Hi, Which algorithm do you want to bring into scikit-learn? Note that algorithms that are ok for inclusion in scikit-learn are at least 3 years old (since publication), with 1000+ citations and wide use and usefulness. [1] Best regards, Arnaud [1] http://scikit-learn.org/stable/faq.html#can-i-add-thi

Re: [Scikit-learn-general] Unpickle doesn't work when upgrading from 14.1 to 15.1.

2014-08-26 Thread Arnaud Joly
Note that most (if not all) speed improvements have been made to fit trees faster. Arnaud On 26 Aug 2014, at 06:56, Gael Varoquaux wrote: > On Tue, Aug 26, 2014 at 02:42:02AM +, Pranav Sharma wrote: >> I just upgraded scikit from 14.1 to 15.1 to take advantage of the speed >> improvements i

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Arnaud Joly
If you set n_jobs to XXX, it will spawn XXX threads or processes. Thus, you will need to ask for XXX cores. Note that it’s often possible to retrieve XXX in your script using os.environ. If you use fewer than the XXX cores, then you won’t use all the available CPUs. If you ask for more than XXX cor
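A small sketch of reading the core count from the environment. The variable name `NSLOTS` is only an assumption for the example; the actual name depends on the scheduler in use, and the helper is hypothetical.

```python
import os


def n_jobs_from_env(var_name="NSLOTS", default=1):
    """Read the allotted core count from an environment variable.

    Falls back to `default` when the variable is unset or not a positive integer.
    """
    value = os.environ.get(var_name, "")
    return int(value) if value.isdigit() and int(value) > 0 else default


os.environ["NSLOTS"] = "4"  # simulate what a scheduler would export
n_jobs = n_jobs_from_env()
```

The resulting `n_jobs` can then be passed to the estimator or to the grid search.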

Re: [Scikit-learn-general] Sparse Random Projection negative weights

2014-08-08 Thread Arnaud Joly
rding to the > Li et al paper. Could you recommend some value? > > I think I will be more effective with LSA for now. Are there any specific > recommendations for the number of components? Chose 300 for now. > > Best, > Philipp > > Am 08.08.2014 um 13:14 schrieb Arnau

Re: [Scikit-learn-general] Sparse Random Projection negative weights

2014-08-08 Thread Arnaud Joly
Have you tried to increase the number of components or epsilon parameter and density of the SparseRandomProjection? Have you tried to normalise X prior to the random projection? Best regards, Arnaud On 08 Aug 2014, at 12:19, Philipp Singer wrote: > Just another remark regarding this: > > I guess

Re: [Scikit-learn-general] [ANN] scikit-learn 0.15.1 is out

2014-08-04 Thread Arnaud Joly
Thanks Olivier! Arnaud On 01 Aug 2014, at 17:55, Olivier Grisel wrote: > This is a bugfix release. > > The list of fixes of this release can be found on: > > http://scikit-learn.org/stable/whats_new.html > > You can install from source or binary packages available here > > https://pypi.p

Re: [Scikit-learn-general] About scoring functions with GridSearchCV

2014-07-24 Thread Arnaud Joly
mator, are the values of ‘C’ > visited in order or at random? Or, in other words, if two or more values of > ‘C’ lead to similar results, say very close or identical, will the smallest > ‘C’ be the output ? > > Thank you! > > > From: Arnaud Joly [mailto:arnaud.v.j...@gmail

Re: [Scikit-learn-general] About scoring functions with GridSearchCV

2014-07-24 Thread Arnaud Joly
Hi, 1) Indeed, the default scoring metric in classification is accuracy. 2) True, the best score will be given by the best average accuracy over the folds. 3) It should raise an error as it is not a possible scorer. Hope it helps, Arnaud Joly On 24 Jul 2014, at 22:09, Pagliari, Roberto
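Point 2 can be sketched as follows: the accuracies are averaged over the folds for each parameter value, and the value with the highest mean is kept. The numbers and the helper are made up; this illustrates the selection rule only, not the library's code.

```python
def best_by_mean_score(fold_scores):
    """Pick the parameter value with the highest mean score over the folds."""
    means = {param: sum(s) / len(s) for param, s in fold_scores.items()}
    best = max(means, key=means.get)
    return best, means[best]


# Hypothetical per-fold accuracies for three values of C.
fold_scores = {
    0.1: [0.80, 0.82, 0.78],
    1.0: [0.85, 0.83, 0.87],
    10.0: [0.90, 0.70, 0.80],
}
best_C, best_score = best_by_mean_score(fold_scores)
```

Note that C = 10.0 has the best single fold but not the best average, so it is not selected.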

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-16 Thread Arnaud Joly
Hi, This looks like a regression. Can you open an issue on github? I am not sure that it would make sense to add an unknown-label column with an optional parameter. But you could easily add one with some numpy operations: np.hstack([y, y.sum(axis=1, keepdims=True) == 0]) Best regards, Arnaud On
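What the one-liner computes can be spelled out without numpy: a column is appended that is 1 exactly for the rows where no label is set. The helper name and the data are assumptions for illustration.

```python
def add_unknown_column(y):
    """Append a column that is 1 when a row has no label set, else 0."""
    return [row + [int(sum(row) == 0)] for row in y]


y = [[1, 0, 0], [0, 0, 0], [0, 1, 1]]  # hypothetical binarized labels
y_ext = add_unknown_column(y)
```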

Re: [Scikit-learn-general] sparse matrix input support for GradientBoostingClassifiers or AdaBoostClassifier

2014-07-02 Thread Arnaud Joly
Hi, There is sparse input support with adaboost for weak learners that support sparse input (such as sgd). For adaboost with decision trees as weak learners, this is in progress; see the pull request https://github.com/scikit-learn/scikit-learn/pull/3173 For gradient tree boosting, nothing has b

Re: [Scikit-learn-general] Multi-class AND multilabel learning/prediction

2014-07-01 Thread Arnaud Joly
Hi, Can you describe your problem? Do you mean multi-output multi-class? Best, Arnaud On 01 Jul 2014, at 11:13, Gundala Viswanath wrote: > According to this documentation here: > http://scikit-learn.org/stable/modules/multiclass.html > > The API listed there does EITHER multi-class OR multi-l

Re: [Scikit-learn-general] Classifiers that handle instance weighting in sklearn

2014-06-17 Thread Arnaud Joly
Hi, Without being exhaustive: random forest, extra trees, bagging, adaboost, naive bayes and several linear models support sample weights. Best regards, Arnaud On 17 Jun 2014, at 11:27, Mohamed-Rafik Bouguelia wrote: > Hello all, > > I've tried to associate weights to instances when trainin

Re: [Scikit-learn-general] Multilabel and differences betweeen 0.14 and Master

2014-06-10 Thread Arnaud Joly
Hi, Could you provide some minimal data to reproduce this behavior? Best regards, Arnaud On 10 Jun 2014, at 16:53, Miguel Fernando Cabrera wrote: > Hi Everyone, > > This is my first post in the list. I have been using scikit-learn actively > for the last six months in my M.Sc. thesis and

Re: [Scikit-learn-general] [ANN] scikit-learn 0.15.0b1 is on PyPI (first beta release for 0.15.0)

2014-06-06 Thread Arnaud Joly
Hi all, Thanks Olivier for taking care of the release!! Best regards, Arnaud On 06 Jun 2014, at 15:14, Olivier Grisel wrote: > Hi all, > > I just pushed a first beta release (0.15.0b1) of the new 0.15.X branch to > PyPI. > > This releases includes (experimental) wheel packages for the firs

Re: [Scikit-learn-general] My talk was approved for EuroScipy'14

2014-05-22 Thread Arnaud Joly
Congratulations! :-) Cheers, Arnaud On 22 May 2014, at 10:50, Peter Prettenhofer wrote: > congrats Gilles -- looking forward to your talk -- you should definitely make > a blog post from your material (and benchmarks)! > > > 2014-05-22 8:50 GMT+02:00 Vlad Niculae : > This is great news, c

Re: [Scikit-learn-general] confused with KNN

2014-04-25 Thread Arnaud Joly
Hi Chengxuan Wan, Without more details and a code example, it’s difficult to help you. Furthermore, it’s better to ask for help on the scikit-learn mailing list or on stack overflow. Best regards, Arnaud Joly On 25 Apr 2014, at 19:04, Chengxuan Wang wrote: > Hi, Arnaud Joly, > >

Re: [Scikit-learn-general] KFold cross validation strangely defaults to not shuffle

2014-04-25 Thread Arnaud Joly
On 23 Apr 2014, at 08:17, Mathieu Blondel wrote: > One solution would be to deprecate the "shuffle" option from KFold and add a > new class ShuffleKFold. > The documentation should clarify the difference between ShuffleKFold and > ShuffleSplit: in the latter you need to specify the split size
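To illustrate what the `shuffle` option changes, here is a dependency-free sketch of k-fold test indices. It is not scikit-learn's KFold; the helper and its arguments are hypothetical. Without shuffling the folds are contiguous blocks, which matters when the data is ordered (for example, sorted by class).

```python
import random


def kfold_indices(n_samples, n_folds, shuffle=False, seed=None):
    """Return the test indices of each fold as a list of lists."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # The first (n_samples % n_folds) folds get one extra sample.
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds


ordered = kfold_indices(6, 3)
shuffled = kfold_indices(6, 3, shuffle=True, seed=0)
```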

Re: [Scikit-learn-general] Welcome to GSoC students

2014-04-23 Thread Arnaud Joly
Welcome and congratulations to Issam, Hamzeh, Manoj and Maheshakya! Arnaud On 23 Apr 2014, at 07:51, Robert Layton wrote: > Thanks Gaël. The fact we received four students is testament to the hard work > everyone has done before me! > > > On 23 April 2014 15:46, Gael Varoquaux wrote: > Hi, >

Re: [Scikit-learn-general] GSoC acceptance - Sparse Support

2014-04-23 Thread Arnaud Joly
Congratulations Hamzeh!!! I am looking forward to working with you! Arnaud On 23 Apr 2014, at 03:57, Hamzeh Alsalhi wrote: > Thank you to Gael and Arnaud for the support and criticism on my early > proposal. I am a big fan of the high coding and collaboration standards at > scikit-learn. I hop

Re: [Scikit-learn-general] GSoC

2014-04-08 Thread Arnaud Joly
ree, since we are working with non-zero element pointers, this should > not be memory/time inefficient. But my question is, is this acceptable? > > > On Mon, Mar 17, 2014 at 6:49 PM, Arnaud Joly wrote: > Hi, > > > The support for sparse matrices should exploit as much as

Re: [Scikit-learn-general] LabelBinarizer changes complete?

2014-03-25 Thread Arnaud Joly
Gollamudi wrote: > Quoting Arnaud Joly : >> Can you provide a gist of your code as to help you? > > I have an implementation that mimics OnevsRestClassifier I want to > eventually try partial_fit since the number of samples is large. Here >

Re: [Scikit-learn-general] LabelBinarizer changes complete?

2014-03-24 Thread Arnaud Joly
Hi, Can you provide a gist of your code so that we can help you? PR 2458 isn't finished yet and there are possibly some corner cases where it might fail. However, in the branch https://github.com/arjoly/scikit-learn/commits/sparse-label_binarizer, I almost finished the label binarizer part. I can try t

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-03-21 Thread Arnaud Joly
Hi Issam, Why not start by improving the multilayer neural network before adding new algorithms? To neural network experts: is it interesting to have layer configuration à la Torch https://github.com/torch/nn/blob/master/README.md ? Best, Arnaud On 21 Mar 2014, at 10:18, Issam wrote: > Hi

Re: [Scikit-learn-general] GSOC 2014 scipy.sparse matrix support to DT

2014-03-18 Thread Arnaud Joly
t-learn/issues/655 > https://github.com/scikit-learn/scikit-learn/issues/2399 > > And our mentor is Arnaud Joly, you can ask him for help > One of the implementation in progress is > https://github.com/scikit-learn/scikit-learn/pull/2848 > > And refer to https://github.com/fest

Re: [Scikit-learn-general] GSoC

2014-03-17 Thread Arnaud Joly
are adding support for sparse matrices, why not exploit their > structure as much as we can. Arnaud is this feasible ? Eltermann anything > wrong in my thinking ? > > > cheers, > kaushik varanasi > On Wed, Mar 12, 2014 at 8:29 PM, Arnaud Joly wrote: > For the number of c

Re: [Scikit-learn-general] GSoC

2014-03-12 Thread Arnaud Joly
>> this is a possible implementation criterion ?. And i would also like to know >>> the implementation plan that you have in mind ? >>> >>> Secondly, In my previous mail i have shown you my contributions. I would >>> just like to know if the patches are

Re: [Scikit-learn-general] GSoC

2014-03-11 Thread Arnaud Joly
Vamsi kaushik is actually me. > Thanks for your reply, i'll get to the issue soon > > cheers, > vamsi kaushik > > > On Mon, Mar 10, 2014 at 3:16 PM, Arnaud Joly wrote: > Hi, > > Anything concerning the GSOC should pass by the scikit-learn > mailing list. >

Re: [Scikit-learn-general] proposal

2014-03-10 Thread Arnaud Joly
Hi, The model representation (the tree structure) shouldn’t be affected by the fact that the input data is sparse or dense. You might be interested in this issue https://github.com/scikit-learn/scikit-learn/issues/655. Best, Arnaud On 07 Mar 2014, at 18:13, vamsi kaushik wrote: > hi Gael,

Re: [Scikit-learn-general] GSoC

2014-03-10 Thread Arnaud Joly
Hi, Anything concerning the GSOC should go through the scikit-learn mailing list. Thanks for your interest in the subject. If you intend to apply for a GSOC, I suggest you read https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-%28GSOC%29-2014 and start contributing to scik

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-02-08 Thread Arnaud Joly
://github.com/eltermann/scikit-learn/commits/DT-sparse > [2] > https://github.com/eltermann/scikit-learn/commit/8388bedff4e225cda9a1b2b6e3fc250bb7d22276#diff-a2cead4f3702cc4b9f76562bb2777edbL2297 > [3] > https://github.com/eltermann/scikit-learn/commit/5ba9c367661446c3eba7e6ea54adc1ff5cdfd39f#diff-a2cead4f3702

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-02-05 Thread Arnaud Joly
I think that I would go for the option that minimizes the amount of code duplication. I would probably start with 2. Since we no longer pickle the Splitter and Criterion, the constructor arguments could be used to pass the X and the y matrix. Cheers, Arnaud On 04 Feb 2014, at 17:38, Feli

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-31 Thread Arnaud Joly
Here are some results on the 20 newsgroups dataset:
Classifier      train-time   test-time   error-rate
5-nn            0.0047s      13.6651s    0.5916
random forest   263.3146s    3.9985s     0.2459
sgd             0.2265s      0.0657s

Re: [Scikit-learn-general] Google Summer of Code - ideas

2014-01-31 Thread Arnaud Joly
Hello, Your contributions to scikit-learn are highly appreciated. However, we use only the scikit-learn mailing list to discuss GSOC ideas. At the moment, I don’t want to give any, but might give some in the near future. We should definitely remove the old list, since it biases applicants to

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-28 Thread Arnaud Joly
On 28 Jan 2014, at 15:31, Olivier Grisel wrote: > 2014/1/28 Mathieu Blondel : >> >> >> >> On Tue, Jan 28, 2014 at 9:25 PM, Olivier Grisel >> wrote: >>> >>> While vanilla LSH is an interesting baseline for Approximate Nearest >>> Neighbors search, it is often too error-prone to be practicall

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-28 Thread Arnaud Joly
You can also reduce the dimensionality using random projections. Arnaud On 28 Jan 2014, at 11:39, Nick Pentreath wrote: > Another important and related use case is to reduce the search space, for > example, in recommendation systems one often has to do the dot product, or > cosine similari

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-24 Thread Arnaud Joly
On 23 Jan 2014, at 07:18, Maheshakya Wijewardena wrote: > Arnaud, > I've gone through those messages and I've already started working on patches. > Last year I've done a project of a module in our university. It was to > implement Bagging in Scikit-learn. As Gilles had already begun that, I

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-22 Thread Arnaud Joly
Hi Maheshakya, I could be one of the mentors for this GSOC. If you want to apply for a GSOC, I think that this message from Gael and Mathieu is worth reading http://sourceforge.net/mailarchive/message.php?msg_id=31864881 Best, Arnaud On 22 Jan 2014, at 06:13, Maheshakya Wijewardena wrote:

Re: [Scikit-learn-general] A poster about scikit-learn at Giga-day

2014-01-17 Thread Arnaud Joly
irstly, I doubt it matters, but some of the links are mangled. > Then, I think it should say "students' master's theses" or something > like this (plural). Also "the chromosome 15" sounds strange to me > compared to "chromosome 15". > > C

[Scikit-learn-general] A poster about scikit-learn at Giga-day

2014-01-17 Thread Arnaud Joly
Hi everyone, There is a local event at my university which is called Giga-day (http://www.giga.ulg.ac.be/jcms/prod_207504/fr/giga-day-2014) and I decided to present scikit-learn with a poster. The poster is largely inspired by the last NIPS talk about scikit-learn. http://static.ajoly.org/files

Re: [Scikit-learn-general] macro and micro average output

2013-12-16 Thread Arnaud Joly
Hi, Your problem is a binary classification task. In that case, the f1 score function returns the binary classification f1 score. In order to get multi class classification score, you have to set pos_label to None. For example, In [2]: gt = [0, 0, 1, 1, 0, 0, 1, 1, 0] In [3]: from sklearn.metr
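The difference between the two scores can be illustrated by hand. The sketch below does not call scikit-learn; it reuses the `gt` list from the message, while the predictions are made up and the macro average is chosen for simplicity (the library's own averaging option may differ). The binary score only looks at the positive class, whereas the averaged score treats every class in turn as the positive one.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score obtained when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


gt = [0, 0, 1, 1, 0, 0, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical predictions

binary_f1 = f1_for_class(gt, pred, positive=1)
macro_f1 = sum(f1_for_class(gt, pred, c) for c in (0, 1)) / 2
```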

Re: [Scikit-learn-general] Contributing ensemble selection

2013-12-09 Thread Arnaud Joly
Hi, Thanks for your interest in contributing to scikit-learn. > > I agree that it's not a `major tool` and I would appreciate if you could > guide me to any new `valuable` paper about forming an ensemble from library > of models, or in general any paper that's `valuable` related to ensemble

Re: [Scikit-learn-general] LabelBinarizer for large data

2013-10-21 Thread Arnaud Joly
> > > Regards, > Mahendra Kariya > > > On Monday, 21 October 2013 1:55 PM, Arnaud Joly wrote: > > It sounds like you haven't enough memory to store a dense matrix of binarized > labels. > > There is already one pr that tries to alleviate this problem

Re: [Scikit-learn-general] LabelBinarizer for large data

2013-10-21 Thread Arnaud Joly
It sounds like you don't have enough memory to store a dense matrix of binarized labels. There is already one PR that tries to alleviate this problem: see https://github.com/scikit-learn/scikit-learn/pull/2458 Best, Arnaud On 20 Oct 2013, at 20:20, Olivier Grisel wrote: > 2013/10/20 Mahendra

Re: [Scikit-learn-general] PyStruct 0.1 released

2013-08-24 Thread Arnaud Joly
Congratulations !!! :-) Arnaud On 11 Aug 2013, at 19:55, Andreas Mueller wrote: > Hey everybody. > > I just wanted to spam the ML again and say I just "released" PyStruct 0.1. > It contains structured support vector machines, structured perceptrons > and models for multi-label prediction, grap

Re: [Scikit-learn-general] Unable to test a dummy classifier with a score function that requires a probability estimate

2013-08-15 Thread Arnaud Joly
The class _ThresholdScorer in sklearn.metrics.scorer needs to be patched to accept multi-label input. A pull request is welcome! Best regards, Arnaud On 14 Aug 2013, at 17:35, Josh Wasserstein wrote: > Say I define the following scoring function: > > def multi_label_macro_auc(y_gt, y_pred):

Re: [Scikit-learn-general] Getting training score in multi-label classification problems

2013-08-09 Thread Arnaud Joly
Hi, The AUC score doesn't support multi-label input data. But a pull request is welcome! Best regards, Arnaud Joly On 05 Aug 2013, at 04:05, Issam wrote: > Hi, > > Does scikit have an implemented metrics that computes the auc score for > multi-label classification? That i
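Until such support exists, one possible workaround is to compute the AUC separately for each label and average the results (a macro average). The sketch below is an assumption about how this could be done, not an existing library function; it uses the pairwise-ranking definition of the AUC and requires every label to have at least one positive and one negative sample. The data is made up.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count for one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(Y_true, Y_score):
    """Average of the per-label AUCs for (n_samples, n_labels) inputs."""
    n_labels = len(Y_true[0])
    per_label = [auc([row[j] for row in Y_true], [row[j] for row in Y_score])
                 for j in range(n_labels)]
    return sum(per_label) / n_labels


Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y_score = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.3, 0.6]]
score = macro_auc(Y_true, Y_score)
```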

Re: [Scikit-learn-general] Release 0.14: tagged and pushed!

2013-08-09 Thread Arnaud Joly
Impressive change log! Congratulations!! Arnaud

Re: [Scikit-learn-general] GridSearchCV with multi-label: ROC-AUC-equivalent metrics

2013-07-31 Thread Arnaud Joly
It's what they have done in the mulan library. Arnaud On 19 Jul 2013, at 13:24, Olivier Grisel wrote: > 2013/7/19 Arnaud Joly : >> You can probably average the precision recall curve >> or use some ranking metrics [1]. >> >> Arnaud >> >> [1] M

Re: [Scikit-learn-general] GridSearchCV with multi-label: ROC-AUC-equivalent metrics

2013-07-19 Thread Arnaud Joly
You can probably average the precision recall curve or use some ranking metrics [1]. Arnaud [1] Mining Multi-label Data http://lkm.fri.uni-lj.si/xaigor/slo/pedagosko/dr-ui/tsoumakas09-dmkdh.pdf On 19 Jul 2013, at 08:56, Eustache DIEMERT wrote: > I'm no expert, but I know this paper which prop

Re: [Scikit-learn-general] Python 3 port

2013-07-15 Thread Arnaud Joly
Is the py3k branch https://github.com/scikit-learn/scikit-learn/tree/py3k still useful? Arnaud On 09 Jul 2013, at 16:26, Olivier Grisel wrote: > The REAMDE-Py3k.rst was not reflecting the current situation. I just > updated it. We don't use 2to3 anymore but a single code base with > helpers in

Re: [Scikit-learn-general] mean(scores) vs score(concatenation). E.g. AUC with LOO validation

2013-07-15 Thread Arnaud Joly
The 0.13.X version of scikit-learn doesn't support grid search with an AUC score. In the master branch, this is possible thanks to Andreas (see https://github.com/scikit-learn/scikit-learn/pull/1381). However, there is still work in progress on this subject, see https://github.com/scikit-learn/sci