Re: [scikit-learn] Scikit-learn got a prize in France

2022-02-06 Thread Joel Nothman
Do we suddenly get recognition when we release 1.0?! :D Well done everyone for getting us here :) Joel On Sun, 6 Feb 2022 at 05:25, bthirion wrote: > Congrats ! B > > On 05/02/2022 at 16:23, Gael Varoquaux wrote: > > Hi everyone, > > > > It has just been announced that scikit-learn has

Re: [scikit-learn] [ANNOUNCEMENT] scikit-learn 1.0 release

2021-09-26 Thread Joel Nothman
Thanks to some amazing work from the core development team, as well as our triagers, and other contributors. We finally got here! On Sat, 25 Sept 2021 at 03:13, Olivier Grisel wrote: > Yeah! > > Thank you so much Adrin for all your efforts in getting this release out! > > Congratulations

Re: [scikit-learn] pipeline diagram

2021-08-29 Thread Joel Nothman
Hi Reshama, You can click the nodes in the diagram (obviously the screenshot loses this). Is there some way we can make that more obvious? Passing your mouse (if you're on an appropriate device) over it shows the hand cursor, which is some indication. Would it be helpful if when the user put

Re: [scikit-learn] [TC Vote] Technical Committee vote: line length

2021-07-27 Thread Joel Nothman
> > Keep current 88 characters > Joel Nothman (though admittedly not strong!) > > Revert to 79 characters: > Alex Gramfort Adrin Jalali ___ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn

Re: [scikit-learn] ANN: scikit-learn-extra 0.2.0 released

2021-04-17 Thread Joel Nothman
Agreed. Well done on making Extra happen! On Thu, 15 Apr 2021, 7:00 pm Guillaume Lemaître, wrote: > Cool work guys. Thanks to the team that takes care about these extensions. > > On Wed, 14 Apr 2021 at 22:54, Timothee Mathieu < > timothee.math...@universite-paris-saclay.fr> wrote: > >> Hello,

Re: [scikit-learn] [Vote] SLEP006: Routing sample-aligned metadata

2021-03-09 Thread Joel Nothman
pted. Thanks all for your considered critique and contributions. On Sat, 27 Feb 2021 at 20:42, Joel Nothman wrote: > Hi all, > > Just a reminder that we are ten days into the month-long voting period, > with one vote on record. Core devs, please find time to consider this > pr

Re: [scikit-learn] [Vote] SLEP006: Routing sample-aligned metadata

2021-02-27 Thread Joel Nothman
metadata={'sample_weight': my_weights,... >>> 'groups': my_groups},... >>> scoring=weighted_acc) On Thu, 18 Feb 2021 at 00:08, Joel Nothman wrote: > With thanks to Alex, Adrin and Christian, we have a proposal to implement > w

[scikit-learn] [Vote] SLEP006: Routing sample-aligned metadata

2021-02-17 Thread Joel Nothman
With thanks to Alex, Adrin and Christian, we have a proposal to implement what we used to call "sample props" that should be expressive enough for us to resolve tens of issues and PRs, but will be largely unobtrusive for most current users. Core developers, please cast your vote in this PR

Re: [scikit-learn] ANN scikit-learn 0.24.0 release

2020-12-22 Thread Joel Nothman
Thanks and congrats to all involved! Some very helpful features. And rumour has it we might have version 1.0 in 2021... :-o On Wed, 23 Dec 2020, 4:12 am Guillaume Lemaître, wrote: > We're happy to announce the 0.24.0 release and already out on PyPI and > conda-forge. > > You can read the

Re: [scikit-learn] major league hacking summer internship program

2020-06-01 Thread Joel Nothman
I put together that inappropriate-for-purpose list of things with distance metrics when you asked re gblomier! But maybe still not fit for this purpose. On Sat, 30 May 2020 at 00:23, Andreas Mueller wrote: > Thanks folks! That gives us a good start I think! > > Re documentation: honestly I'm

Re: [scikit-learn] Why the default max_samples of Random Forest is X.shape[0]?

2020-05-10 Thread Joel Nothman
A bootstrap is very commonly a random draw with replacement of equal size to the original sample.
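A minimal NumPy sketch of that definition: draw `n` indices with replacement from a sample of size `n`. Because of the duplicates, only about 63% of the original points appear in a typical draw. The helper `bootstrap_indices` is hypothetical. In scikit-learn's forest estimators, `max_samples` (used only when `bootstrap=True`) changes the draw size away from the default of `X.shape[0]`.

```python
import numpy as np


def bootstrap_indices(n_samples, rng):
    """Hypothetical helper: draw n_samples indices uniformly *with replacement*."""
    return rng.integers(0, n_samples, size=n_samples)


rng = np.random.default_rng(0)
X = np.arange(1000)
idx = bootstrap_indices(len(X), rng)
sample = X[idx]

# Same size as the original sample, but with repeats, so fewer unique points.
print(len(sample), len(np.unique(sample)) / len(X))  # unique fraction roughly 0.63
```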

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-05-06 Thread Joel Nothman
When it comes to trees, the API for handling categoricals is simpler than the implementation. Traditionally, tree-based models' handling of categorical variables differs from both ordinal and one-hot encoding, while both of those will work reasonably well for many problems. We are working on
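For reference, a short sketch of the two generic encodings mentioned, on a made-up string feature. `OrdinalEncoder` maps each category to one integer (categories sorted by default), while `OneHotEncoder` produces one indicator column per category. Neither reproduces the native categorical splitting that some tree implementations use.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# One categorical feature with three levels (toy data).
X = np.array([["red"], ["green"], ["blue"], ["green"]])

# Categories are sorted: blue -> 0, green -> 1, red -> 2.
ordinal = OrdinalEncoder().fit_transform(X)

# One column per category in the same sorted order; .toarray() densifies
# the sparse matrix that OneHotEncoder returns by default.
onehot = OneHotEncoder().fit_transform(X).toarray()

print(ordinal.ravel())  # [2. 1. 0. 1.]
print(onehot)
```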

Re: [scikit-learn] Vote: Add Adrin Jalali to the scikit-learn technical committee

2020-04-27 Thread Joel Nothman
+1 On Tue, 28 Apr 2020 at 02:23, Tom DLT wrote: > +1 > > On Mon, 27 Apr 2020 at 07:00, Alexandre Gramfort < > alexandre.gramf...@inria.fr> wrote: > >> +1

[scikit-learn] distances

2020-03-03 Thread Joel Nothman
I noticed a comment by @amueller on Gitter re considering a project on our distances implementations. I think there's a lot of work that can be done in unifying distances implementations... (though I'm not always sure the benefit.) I thought I would summarise some of the issues below, as I was

Re: [scikit-learn] Monthly meetings

2020-02-23 Thread Joel Nothman
Helpful, thank you Guillaume On Sat., 22 Feb. 2020, 10:14 am Guillaume Lemaître, wrote: > Hi all, > > I attached the notes that I prepared: notes > We might have to prioritize if we want to make the meeting in an hour. > > Cheers, > > > > On Fri, 21 Feb 2020 at 17:15,

Re: [scikit-learn] How to make sure stop words are matched when lowercase=False?

2020-01-28 Thread Joel Nothman
There is no such code. You need to make sure that the normalisation you use matches the normalisation applied when constructing a stop word list. Unfortunately we do not provide for this directly, and it is not easy to do so in the general case.
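A minimal sketch of the mismatch and one manual workaround. The workaround assumes English text where only lower, Capitalised and UPPER forms matter, and `case_variants` is a hypothetical helper rather than a scikit-learn feature. With `lowercase=False` the tokens keep their case, but the built-in list is all lower case, so only exact lower-case matches are removed.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer

docs = ["The Cat sat on THE mat", "A cat and The dog"]

# Built-in list with lowercase=False: "The" and "THE" are not removed.
default_vec = CountVectorizer(lowercase=False, stop_words="english").fit(docs)
print(sorted(default_vec.vocabulary_))
# ['Cat', 'THE', 'The', 'cat', 'dog', 'mat', 'sat']


def case_variants(words):
    """Hypothetical helper: add Capitalised and UPPER forms of each stop word."""
    variants = set()
    for w in words:
        variants.update({w, w.capitalize(), w.upper()})
    return sorted(variants)  # stop_words expects a list (or "english")


cased_vec = CountVectorizer(
    lowercase=False, stop_words=case_variants(ENGLISH_STOP_WORDS)
).fit(docs)
print(sorted(cased_vec.vocabulary_))
# ['Cat', 'cat', 'dog', 'mat', 'sat']
```

Mixed forms such as "tHe" would still slip through, which illustrates why the general case is hard.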

Re: [scikit-learn] Recommended way of distributing persisted models so they work on different architectures

2020-01-28 Thread Joel Nothman
Yes, ONNX is an appropriate solution when exporting models for prediction. See http://scikit-learn.org/stable/modules/model_persistence.html On Tue, 28 Jan 2020 at 23:03, Christopher.samiullah via scikit-learn < scikit-learn@python.org> wrote: > Dear admins, > > > I recently encountered an issue

Re: [scikit-learn] Memory efficient TfidfVectorizer

2020-01-28 Thread Joel Nothman
Are you concerned about storing the whole corpus text in memory, or the whole corpus' statistics? If the text, use input='file' or input='filename' (or a generator of texts). On Tue, 28 Jan 2020 at 18:01, Peng Yu wrote: > Hi, > > To use TfidfVectorizer, the whole corpus must be used into
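A small sketch of the `input='filename'` option, using temporary files as stand-ins for a real corpus on disk. The vectorizer opens each path itself, so the raw texts never have to sit together in one Python list. The vocabulary and the resulting sparse matrix are still held in memory.

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["the cat sat", "the dog barked", "a cat and a dog"]

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, text in enumerate(texts):
        path = os.path.join(tmp, f"doc{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)

    # The vectorizer receives file paths and reads each file as it goes.
    vec = TfidfVectorizer(input="filename")
    X = vec.fit_transform(paths)

print(X.shape)  # (3, 6): single-character tokens such as "a" are dropped
```

With the default `input='content'`, passing a generator that yields one text at a time serves the same purpose.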

Re: [scikit-learn] What are the stopwords used by CountVectorizer?

2020-01-27 Thread Joel Nothman
See also https://www.aclweb.org/anthology/W18-2502/ for a critique of this and other stop word lists.

Re: [scikit-learn] Why is subset invariance necessary for transform()?

2020-01-20 Thread Joel Nothman
I think allowing subset invariance to not hold is making stronger assumptions than we usually do about what it means to have a "test set". Having a transformation like this that relies on test set statistics implies that the test set is more than just selected samples, but rather that a large
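To make "subset invariance" concrete, here is a sketch on random data. A transformer whose statistics come only from `fit` gives the same output for a row regardless of which other rows are transformed with it. The hypothetical `standardize_with_batch_stats` uses statistics of whatever batch it receives and so breaks that property.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
X_test = rng.normal(loc=2.0, size=(20, 3))

# Statistics are learned from the training data only.
scaler = StandardScaler().fit(X_train)
full = scaler.transform(X_test)
part = scaler.transform(X_test[:5])
print(np.allclose(full[:5], part))  # True: subset invariant


def standardize_with_batch_stats(X):
    """Hypothetical transform using statistics of the batch being transformed."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


batch_full = standardize_with_batch_stats(X_test)[:5]
batch_part = standardize_with_batch_stats(X_test[:5])
print(np.allclose(batch_full, batch_part))  # False in general
```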

Re: [scikit-learn] ANN: DIPY 1.1.1 - a powerful release

2020-01-18 Thread Joel Nothman
If the Scikit-learn mailing list is going to include announcements of related package releases, could we please get a line or two describing that package? I expect most readers here don't know of DIPY, or of its relevance to Scikit-learn users. (I'm still not sure why it's generally relevant to

Re: [scikit-learn] Time for Roadmap for the coming years?

2020-01-07 Thread Joel Nothman
The roadmap includes a statement of purpose as at 2018. I don't think the core developers think the roadmap itself is very outdated. But thanks for the reminder. Joel

Re: [scikit-learn] Heisenbug?

2019-12-16 Thread Joel Nothman
Hi Dan, this kind of error can come from overflow. Are all of your test systems the same architecture? On Tue., 17 Dec. 2019, 12:03 pm Dan Stromberg, wrote: > Hi folks. > > I'm new to Scikit-learn. > > I have a very large Python project that seems to have a heisenbug which is > manifesting in
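A minimal illustration of the kind of silent, platform-dependent overflow meant here. It assumes a 32-bit integer dtype, which was, for example, NumPy's default integer on Windows before NumPy 2.0. The thread does not establish whether this caused the reported bug.

```python
import numpy as np

# A fixed-width 32-bit integer at its maximum value.
counts = np.array([2**31 - 1], dtype=np.int32)

# Integer array arithmetic wraps around silently instead of raising.
wrapped = counts + 1
print(wrapped)  # [-2147483648]

# Widening the dtype first avoids the overflow.
safe = counts.astype(np.int64) + 1
print(safe)  # [2147483648]
```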

Re: [scikit-learn] Vote on SLEP010: n_features_in_ attribute

2019-12-04 Thread Joel Nothman
I am +1 for this, but I think we should look at how to make these new validation methods usable by external developers ideally supporting multiple Scikit-learn versions (i.e. we need something in stable public or protected API). A simple solution is to make default implementations of

Re: [scikit-learn] Vote on SLEP010: n_features_in_ attribute

2019-12-04 Thread Joel Nothman
Oh... I remember what we landed up on, actually... we've made _validate_data private so downstream estimators can't technically expect to use it reliably across any versions...

Re: [scikit-learn] ANN: scikit-learn 0.22 final release

2019-12-04 Thread Joel Nothman
The stacked estimators were certainly a team effort! I am excited that we've finally got a consistent solution to using approximate nearest neighbors with our neighbors-based learners. Why is it still version <1? Perhaps it shouldn't be. But it can be hard to set aside perfectionism! And there's
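A sketch of the precomputed-neighbours pattern that, as far as I understand, is the solution referred to here. A neighbours transformer outputs a sparse graph of distances, and the downstream estimator reads it with `metric="precomputed"`. The exact `KNeighborsTransformer` stands in below. An approximate library would need to be wrapped as a compatible transformer, which is not shown.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier, KNeighborsTransformer
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Step 1 builds a sparse neighbours graph; step 2 consumes it instead of
# recomputing distances, so it must not ask for more neighbours than step 1 stores.
pipe = make_pipeline(
    KNeighborsTransformer(n_neighbors=10, mode="distance"),
    KNeighborsClassifier(n_neighbors=5, metric="precomputed"),
)
pipe.fit(X, y)
print(pipe.score(X, y))
```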

Re: [scikit-learn] Vote on SLEP010: n_features_in_ attribute

2019-12-04 Thread Joel Nothman
We are looking to have n_features_out_ for transformers. This naming makes the difference explicit. I would like to see some guidance on how an estimator implementation (e.g. in scikit-learn-contrib) is advised to maintain compatibility with Scikit-learn pre- and post- SLEP010. That is, we want
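One way a third-party estimator might expose `n_features_in_` without the private `_validate_data` is sketched below. It uses only the public `check_array` and `check_is_fitted` helpers. `MeanCenterer` is a hypothetical example and not guidance adopted by the project.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted


class MeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that records n_features_in_ itself."""

    def fit(self, X, y=None):
        X = check_array(X)
        self.n_features_in_ = X.shape[1]  # attribute described in SLEP010
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        check_is_fitted(self)
        X = check_array(X)
        # Consistency check done by hand rather than via private helpers.
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} features, but {type(self).__name__} "
                f"was fitted with {self.n_features_in_} features."
            )
        return X - self.mean_


est = MeanCenterer().fit(np.arange(12.0).reshape(4, 3))
print(est.n_features_in_)  # 3
```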

Re: [scikit-learn] CutEncoder - simple suggestion for sklearn.preprocessing

2019-10-31 Thread Joel Nothman
id not know it before :(. >> >> thanks. >> >> On Fri, Nov 1, 2019 at 1:46 AM Joel Nothman >> wrote: >> >>> >>> Why is this preferable to KBinsDiscretizer? >>> >>> Where the bin edges are fixed, FunctionTransformer can be used with >

Re: [scikit-learn] CutEncoder - simple suggestion for sklearn.preprocessing

2019-10-31 Thread Joel Nothman
Why is this preferable to KBinsDiscretizer? Where the bin edges are fixed, FunctionTransformer can be used with pandas.cut.
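A minimal sketch of the FunctionTransformer plus pandas.cut approach. The bin edges are hypothetical. Because `pandas.cut` works on 1-D input, the wrapper function `cut_columns` (also hypothetical) applies it column by column. `labels=False` returns integer bin indices, and values outside the edges would become NaN.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

BIN_EDGES = [0, 18, 65, 120]  # hypothetical fixed edges: (0, 18], (18, 65], (65, 120]


def cut_columns(X):
    """Hypothetical helper: bin every column of X with the same fixed edges."""
    X = np.asarray(X, dtype=float)
    return np.column_stack(
        [pd.cut(X[:, j], bins=BIN_EDGES, labels=False) for j in range(X.shape[1])]
    )


cutter = FunctionTransformer(cut_columns)
ages = np.array([[5], [30], [70]])
binned = cutter.fit_transform(ages)
print(binned.ravel())  # [0 1 2]
```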

Re: [scikit-learn] Reminder: Monday October 28th meeting

2019-10-26 Thread Joel Nothman
Reminder: time is 12:00Z. https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019=10=28=12=0=0=240=33=37=179=195 On Fri., 25 Oct. 2019, 2:15 am Adrin, wrote: > Hi Scikit-learn people, > > This is a reminder that we'll be having our monthly call on Monday. > > Please put your

[scikit-learn] Website redesign

2019-09-22 Thread Joel Nothman
Hi scikit-learn users, Scikit-learn developer Thomas Fan recently gave our documentation and web site a refresh, targeting desktop and mobile devices. Please give it a try at https://scikit-learn.org/dev/ and raise usability issues at https://github.com/scikit-learn/scikit-learn/issues/new to

Re: [scikit-learn] Vote on SLEP009: keyword only arguments

2019-09-17 Thread Joel Nothman
I think you mean keyword-only, Alex On Tue., 17 Sep. 2019, 4:11 pm Alexandre Gramfort, < alexandre.gramf...@inria.fr> wrote: > Yes I am +1 for positional arguments for the __init__ of the estimators. > > Alex > Albert: my position when reviewing changes in accordance with this SLEP would be to

Re: [scikit-learn] Vote on SLEP009: keyword only arguments

2019-09-16 Thread Joel Nothman
5/09/2019 00:21, Thomas J Fan wrote: > > +1 from me > > > > On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman > wrote: > > > > I am +1 for this change. > > > > I agree that users will ac

Re: [scikit-learn] Vote on SLEP009: keyword only arguments

2019-09-14 Thread Joel Nothman
might be unknown, it will remain unknown until > projects > from the ecosystem are not using it. > > To the question: which methods should be impacted? > > I think we should be as gentle as possible at first. I am a little > concerned about > breaking some codes which wer

Re: [scikit-learn] Vote on SLEP009: keyword only arguments

2019-09-11 Thread Joel Nothman
There are details of specific API changes to be decided: The question being put, as per the SLEP, is: do we want to utilise Python 3's force-keyword-argument syntax and to change existing APIs which support arguments positionally to use this syntax, via a deprecation period?
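For readers unfamiliar with the syntax under discussion, here is a minimal sketch. A bare `*` in the signature makes every following parameter keyword-only. `ToyEstimator` is hypothetical, and the deprecation mechanism that the SLEP would use to get there is not shown.

```python
class ToyEstimator:
    """Hypothetical estimator whose constructor parameters are keyword-only."""

    # Everything after the bare * must be passed by name.
    def __init__(self, *, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept


ok = ToyEstimator(alpha=0.5)  # accepted

try:
    ToyEstimator(0.5)  # positional use is rejected
    positional_rejected = False
except TypeError as exc:
    positional_rejected = True
    print("TypeError:", exc)
```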

Re: [scikit-learn] Outreachy program

2019-09-08 Thread Joel Nothman
I'm broadly supportive, but just wanted to note our challenges with mentoring GSoC in the past: - Limited mentor availability should not be a big issue now. - Need to focus on a single project may not be well aligned with Scikit-learn's goals, or may not yield optimal code results. -

Re: [scikit-learn] Fwd: [Nairobi, Kenya WiMLDS] OS sprint (June 2019): Impact Report

2019-08-11 Thread Joel Nothman
Awesome work and great write-up, Reshama. Thanks Andy and Adrin especially, for bringing us along in your commitment to such causes.

Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-08-05 Thread Joel Nothman
Yay for technology! Awesome to see you all and have some matters clarified. Adrin is right that the issue tracker is increasingly overwhelming (because there are more awesome people hired to work on the project, more frequent sprints, etc). This meeting is a useful summary. The meeting mostly

Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-08-01 Thread Joel Nothman
Such meetings could go quite long. (Think those sprint meetings in Paris...) How do we time box discussions, or ensure that the most important/urgent/low-hanging things are covered? Should the objective of the meeting be firstly to prioritise or assign work, and secondarily to discuss issues? To

Re: [scikit-learn] SVM-RFE with scoring = 'f1'

2019-08-01 Thread Joel Nothman
Or use scoring=make_scorer(f1_score, pos_label='n.pre') On Fri, 2 Aug 2019 at 06:15, Malik Yousef wrote: > Hello > When in using the scoring to be 'f1' then i get an error. > Here is the code and the error > > X=data > y=target_column > classifier = LinearSVC() > rfecv =
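A self-contained sketch of that fix. The data here is synthetic, and the string labels "n.pre" and "other" are assumptions modelled on the thread. `scoring='f1'` assumes the positive label is 1, which fails with string labels, so the scorer is built with an explicit `pos_label`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.metrics import f1_score, make_scorer
from sklearn.svm import LinearSVC

X, y_int = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=0)
y = np.where(y_int == 1, "n.pre", "other")  # hypothetical string labels

# F1 computed with "n.pre" treated as the positive class.
scorer = make_scorer(f1_score, pos_label="n.pre")
rfecv = RFECV(LinearSVC(), step=1, cv=5, scoring=scorer)
rfecv.fit(X, y)
print(rfecv.n_features_)
```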

[scikit-learn] ANN Scikit-learn 0.21.3 and 0.20.4 released

2019-07-29 Thread Joel Nothman
We have released patches to Scikit-learn 0.21 (python >=3.5) and 0.20 (python 2 and 3) series including several bug fixes. See their respective change logs at https://scikit-learn.org/dev/whats_new/v0.21.html#version-0-21-3 and https://scikit-learn.org/dev/whats_new/v0.20.html#version-0-20-4.

Re: [scikit-learn] Continues monitoring of benchmark performances

2019-07-22 Thread Joel Nothman
Isn't Jérémie's project at https://github.com/jeremiedbb/scikit-learn_benchmarks meant to be doing this? What's its status? How does it relate to Tom's work? (Can we please take http://scikit-learn.org/ml-benchmarks/ offline?) On Tue, 23 Jul 2019 at 00:17, Nicolas Hug wrote: > I agree having

Re: [scikit-learn] Monthly meetings between core developers

2019-07-18 Thread Joel Nothman
I'm away on a holiday at the moment (in case you hadn't identified my silence). I'd be keen to join in but might not be able to move schedules around it. I like the idea of prioritising together, though I'm not sure how to keep the meetings clipped. I'm also going to be quite lost on the issue

Re: [scikit-learn] How is linear regression in scikit-learn done? Do you need train and test split?

2019-06-01 Thread Joel Nothman
You're right that you don't need to use CV for hyperparameter estimation in linear regression, but you may want it for model evaluation. As far as I understand: Holding out a test set is recommended if you aren't entirely sure that the assumptions of the model are held (Gaussian error on a linear
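A sketch of using cross-validation purely for evaluation, on synthetic data generated to satisfy the linear-Gaussian assumptions. Ordinary least squares has nothing to tune, but the fold scores still estimate performance on unseen data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

# One R^2 per held-out fold; no hyperparameter search involved.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())
```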

Re: [scikit-learn] Google code reviews

2019-05-25 Thread Joel Nothman
For some of the larger PRs, this might be helpful. Not going to help where the intricacies of Scikit-learn API come in play. On Sat, 25 May 2019 at 04:17, Andreas Mueller wrote: > Hi All. > What do you think of https://www.pullrequest.com/googleserve/? > It's sponsored code reviews. Could be

[scikit-learn] ANN: Scikit-learn 0.21.2 released

2019-05-25 Thread Joel Nothman
We've released 0.21.2 primarily to fix an issue with euclidean_distances (and pairwise_distances). It should be available on PyPI and Conda-Forge. Full list of changes at https://scikit-learn.org/0.21/whats_new/v0.21.html Thanks to all who helped fix these issues so quickly after 0.21.1.

Re: [scikit-learn] ANN: scikit-learn 0.21.2 released

2019-05-25 Thread Joel Nothman
Sorry, didn't see this one already went through! Whoops. On Fri, 24 May 2019 at 17:41, Olivier Grisel wrote: > A quick bugfix release to fix a critical regression in the computation > of the euclidean distances returning incorrect values silently. > > This release also includes other bugfixes

[scikit-learn] ANN: scikit-learn 0.21 released

2019-05-16 Thread Joel Nothman
Thanks to the work of many, many contributors, we have released Scikit-learn 0.21. It is available from GitHub, PyPI and Conda-forge, but is not yet available on the Anaconda defaults channel. * Documentation at https://scikit-learn.org/0.21 * Release Notes at

Re: [scikit-learn] Can I evaluate clustering efficiency incrementally?

2019-05-16 Thread Joel Nothman
tics alone, but it's not come to fruition. On Thu, 16 May 2019 at 11:47, lampahome wrote: > On Wed, 15 May 2019 at 12:16 pm, Joel Nothman wrote: > >> Evaluating on large datasets is easy if the sufficient statistics are >> just the contingency matrix. >> >> > Sorry, I don't und

Re: [scikit-learn] Can I evaluate clustering efficiency incrementally?

2019-05-14 Thread Joel Nothman
Evaluating on large datasets is easy if the sufficient statistics are just the contingency matrix. On Tue., 14 May 2019, 11:19 pm Tom Augspurger, wrote: > If anyone is interested in implementing these, dask-ml would welcome > additional > metrics that work well with Dask arrays: >
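To illustrate "the sufficient statistics are just the contingency matrix", this sketch accumulates the matrix chunk by chunk and then computes the adjusted Rand index from it alone. The helper functions are hypothetical, and the sketch assumes labels are integers in a known range. The final comparison uses scikit-learn's `adjusted_rand_score` on the full label arrays.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score


def update_contingency(C, true_chunk, pred_chunk):
    """Hypothetical helper: add one chunk's (true, predicted) pair counts to C."""
    np.add.at(C, (true_chunk, pred_chunk), 1)


def ari_from_contingency(C):
    """Hypothetical helper: adjusted Rand index computed only from counts."""
    def comb2(x):
        return x * (x - 1) / 2

    n = C.sum()
    sum_comb = comb2(C).sum()
    sum_a = comb2(C.sum(axis=1)).sum()
    sum_b = comb2(C.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    return (sum_comb - expected) / (max_index - expected)


rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=1000)
y_pred = np.where(rng.random(1000) < 0.8, y_true, rng.integers(0, 4, size=1000))

C = np.zeros((3, 4), dtype=np.int64)  # 3 true classes, 4 predicted clusters
for start in range(0, 1000, 250):  # pretend each chunk arrives separately
    update_contingency(C, y_true[start:start + 250], y_pred[start:start + 250])

ari_inc = ari_from_contingency(C)
ari_full = adjusted_rand_score(y_true, y_pred)
print(ari_inc, ari_full)  # the two values agree
```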

Re: [scikit-learn] Fwd: Proposing Encoder class to encode Ordinal attributes

2019-05-13 Thread Joel Nothman
There has been an issue and a pull request for something similar in DictVectorizer. https://github.com/scikit-learn/scikit-learn/pull/8750 got close to merging and I'm not really sure why it was closed rather than completed.

[scikit-learn] Release Candidate for Scikit-learn 0.21

2019-04-30 Thread Joel Nothman
PyPI now has source and binary releases for Scikit-learn 0.21rc2. * Documentation at https://scikit-learn.org/0.21 * Release Notes at https://scikit-learn.org/0.21/whats_new * Download source or wheels at https://pypi.org/project/scikit-learn/0.21rc2/ Please try out the software and help us edit

Re: [scikit-learn] Any other clustering algo cluster incrementally?

2019-04-30 Thread Joel Nothman
I think it would be possible to implement an incremental extension to dbscan. But it's been years since I looked at what is involved and it might require storing the training data, unlike those out of core methods.

Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Joel Nothman
I think it's a bit weird if we're returning sparse output from OneVsRestClassifier.predict if it wasn't fit on sparse Y. Actually, I would be in favour of deprecating multilabel support in OneVsRestClassifier, since it is performing "binary relevance method" for multilabel, not actually OvR.

Re: [scikit-learn] API Discussion: Where shall we put the plotting functions?

2019-04-04 Thread Joel Nothman
Well it would certainly be a low-cost effort improvement if we demonstrated yellowbrick in our examples.

Re: [scikit-learn] Why is cross_val_predict discouraged?

2019-04-04 Thread Joel Nothman
> I assume that you want to tell that it is not wise to compute TP, FP, FN and then precision and recall using cross_val_predict. If this is what you mean, I'd like you to explain why. Because if there is high variance as a function of training set rather than test sample I'd like to know. > The

[scikit-learn] New core developers: thomasjpfan and nicolashug

2019-04-03 Thread Joel Nothman
The core developers of Scikit-learn have recently voted to welcome Thomas Fan and Nicolas Hug to the team, in recognition of their efforts and trustworthiness as contributors. Both happen to be working with Andy Mueller at Columbia University at the moment. Congratulations and thanks to them both!

Re: [scikit-learn] Why is cross_val_predict discouraged?

2019-04-03 Thread Joel Nothman
at 13:59, Joel Nothman wrote: > > The equations in Murphy and Hastie very clearly assume a metric > decomposable over samples (a loss function). Several popular metrics > are not. > > For a metric like MSE it will be almost identical assuming the test > sets have almost the same

Re: [scikit-learn] API Discussion: Where shall we put the plotting functions?

2019-04-03 Thread Joel Nothman
> >> On Tue, Apr 2, 2019 at 3:40 PM Hanmin Qin >> wrote: > >> > >> See https://github.com/scikit-learn/scikit-learn/issues/13448 > >> > >> We've int

Re: [scikit-learn] Why is cross_val_predict discouraged?

2019-04-03 Thread Joel Nothman
The equations in Murphy and Hastie very clearly assume a metric decomposable over samples (a loss function). Several popular metrics are not. For a metric like MSE it will be almost identical assuming the test sets have almost the same size. For something like Recall (sensitivity) it will be
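A sketch of the two quantities on synthetic, imbalanced data. One is a metric computed once over pooled `cross_val_predict` output, and the other is the mean of the same metric computed per fold. Precision is used here because its denominator depends on each fold's predictions, so it is not an average of per-sample losses. The two numbers generally differ, and how much they differ depends on the data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = LogisticRegression()

# One precision over all out-of-fold predictions pooled together.
pooled = precision_score(y, cross_val_predict(clf, X, y, cv=cv))

# Precision per test fold, then averaged: what CV is meant to estimate.
per_fold = cross_val_score(clf, X, y, cv=cv, scoring="precision")

print(pooled, per_fold.mean())
```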

Re: [scikit-learn] F1 score weirdness

2019-03-28 Thread Joel Nothman
No, it is the macro average of the per-class F1, i.e. an arithmetic mean over harmonic means of P & R per class On Fri., 29 Mar. 2019, 9:53 am Max Halford, wrote: > Hey everyone, > > I've stumbled upon an inconsistency with the F1 score and I can't seem to > get around it. I have two lists
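A small made-up example of the distinction. `f1_score(average="macro")` equals the mean of the per-class F1 values. That differs from taking the harmonic mean of macro-averaged precision and recall.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 0, 1, 2, 2, 2, 2]

per_class = f1_score(y_true, y_pred, average=None)  # [6/7, 2/3, 1]
macro_f1 = f1_score(y_true, y_pred, average="macro")  # mean of per_class, about 0.841

# A different quantity: F1 computed from macro precision and macro recall.
p = precision_score(y_true, y_pred, average="macro")
r = recall_score(y_true, y_pred, average="macro")
f1_of_macro_pr = 2 * p * r / (p + r)  # about 0.873

print(macro_f1, per_class.mean(), f1_of_macro_pr)
```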

Re: [scikit-learn] Difference in prediction accuracy using SGDClassifier and Cross validation scores.

2019-03-12 Thread Joel Nothman
You are calculating recall, not accuracy. On Sun, 10 Mar 2019 at 05:36, Rajnish kamboj wrote: > > Hi > > I have recently started machine learning and it is my first query regarding > prediction accuracy. > > There is difference in prediction accuracy using SGDClassifier and Cross > validation

[scikit-learn] ANN: Scikit-learn 0.20.3 released

2019-03-02 Thread Joel Nothman
A bug fix release of Scikit-learn, version 0.20.3, has been released. It is not yet on Conda default channel, but should be available on pypi and conda-forge. Thank you to all who contributed. Substantive changes are listed at https://scikit-learn.org/0.20/whats_new.html#version-0-20-3 And after

Re: [scikit-learn] Sprint discussion points?

2019-02-26 Thread Joel Nothman
What do you think needs to be raised for discussion? On Tue., 26 Feb. 2019, 12:06 pm Jeremie du Boisberranger, < jeremie.du-boisberran...@inria.fr> wrote: > Not the same, although there are similarities. However asv provides > tools to compare benchmarks across commits, and to publish them in

Re: [scikit-learn] Sprint discussion points?

2019-02-25 Thread Joel Nothman
I'm all for the decorator if you can get numpydoc working with it!

Re: [scikit-learn] Sprint discussion points?

2019-02-23 Thread Joel Nothman
Something else worth discussing might be the maintenance of scikit-learn-contrib

Re: [scikit-learn] Sprint discussion points?

2019-02-20 Thread Joel Nothman
@Hanmin are there particular conversations you are keen to take part in, and particular times that suit you? On Thu., 21 Feb. 2019, 9:13 am Andreas Mueller, wrote: > > > On 2/20/19 4:40 PM, Gael Varoquaux wrote: > > On Tue, Feb 19, 2019 at 06:16:20PM -0500, Andreas Mueller wrote: > >> I put a

Re: [scikit-learn] Sprint discussion points?

2019-02-19 Thread Joel Nothman
schedule, but doing some google form >> or similar seems a bit heavy-handed? >> Not sure if Guillaume had ideas about the schedule, given that he seems >> to be running the show? >> >> On 2/19/19 4:17 PM, Joel Nothm

Re: [scikit-learn] Sprint discussion points?

2019-02-19 Thread Joel Nothman
I don't think OPTICS requires a large meeting, just a few people. I'm happy with your proposal generally, Andy. Do we schedule specific topics at this point?

Re: [scikit-learn] VOTE: scikit-learn governance document

2019-02-19 Thread Joel Nothman
Uhh... I forgot to vote. +1 :) It seems there's some consensus.

Re: [scikit-learn] Sprint discussion points?

2019-02-18 Thread Joel Nothman
And here I was thinking we'd better just push out 0.20.3 this week with what's been listed for it.

Re: [scikit-learn] Sprint discussion points?

2019-02-13 Thread Joel Nothman
about it. > > How should we prioritize things? > > > On 2/13/19 8:08 PM, Joel Nothman wrote: > > Yes, I was thinking the same. I think there are some other core issues to > solve, such as: > > * euclidean_distances numerical issues > * commitment to ARM testing and d

Re: [scikit-learn] Sprint discussion points?

2019-02-13 Thread Joel Nothman
Yes, I was thinking the same. I think there are some other core issues to solve, such as: * euclidean_distances numerical issues * commitment to ARM testing and debugging * logistic regression stability We should also nut out OPTICS issues or remove it from 0.21. I'm still keen on trying to work

Re: [scikit-learn] Scikit-learn porting strategy

2019-02-05 Thread Joel Nothman
If you count things in Scipy and NumPy (and Joblib and Cython?) that Scikit-learn depends on and which may be lacking or hard to find in SciRuby, it's much, much more than 39 years. PyCall, and potentially some Scikit-learn-specific wrappers around it, seems a much more sensible approach.

Re: [scikit-learn] Bounded logistical regression in Python

2019-01-31 Thread Joel Nothman
I don't quite get your terminology, to "add a variable c to center an independent variable Xk", and you've got an extra ) in your equation, so I'm not sure exactly where you want it... If you mean P(X) = a / (1 + exp(b0 + b1*X1 + .. + bn*Xn) * (Xk - c)) then that's the same as P(X) = a / (1 +

Re: [scikit-learn] Can y of datasets be increasing/decreasing ratio when train regression model?

2019-01-30 Thread Joel Nothman
Particular regressors may make assumptions about the distribution of y, or its relationship with the features X. You should be aware of those assumptions and reason about whether they are held well enough. A TransformedTargetRegressor may be used to make your target better match those assumptions,
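A minimal sketch of TransformedTargetRegressor with a log transform, on synthetic data built so that log(y), not y, is linear in X. The regressor is fit on `func(y)`, and predictions are mapped back through `inverse_func`.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(200, 1))
# Synthetic positive target: linear in X only after taking the log.
y = np.exp(1.5 * X[:, 0] + rng.normal(scale=0.1, size=200))

reg = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
reg.fit(X, y)
print(reg.regressor_.coef_)  # close to 1.5, the slope on the log scale
print(reg.predict(X[:3]))  # predictions on the original scale of y
```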

Re: [scikit-learn] How GridSearchCV to get best_params?

2019-01-05 Thread Joel Nothman
See cv_results_['mean_test_score'] (or 'mean_test_x' where 'x' is the scorer named in the refit parameter).
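A sketch of reading those keys with two named scorers. The dataset, estimator and grid are arbitrary choices. Here `refit="acc"` means `best_params_` is chosen by `mean_test_acc`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10]},
    scoring={"acc": "accuracy", "f1": "f1_macro"},  # named scorers
    refit="acc",  # the scorer used to choose best_params_
    cv=5,
)
search.fit(X, y)

# With a dict of scorers the keys are mean_test_<name>; with a single
# scorer the key is mean_test_score.
acc = search.cv_results_["mean_test_acc"]
for params, score in zip(search.cv_results_["params"], acc):
    print(params, round(score, 3))
print("best:", search.best_params_)
```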

[scikit-learn] ANN: Scikit-learn 0.20.2 released

2018-12-31 Thread Joel Nothman
A bug fix release of scikit-learn, version 0.20.2, was released a couple of weeks ago. It is not yet on Conda default channel, but should be available on pypi and conda-forge. Thank you to all who contributed. As well as the changes listed at

Re: [scikit-learn] plan to add the association rule classification algorithm in scikit learn

2018-12-16 Thread Joel Nothman
Hi Rui, This has been discussed several times on the mailing list and issue tracker. We are not interested in association rule mining in Scikit-learn for its own purposes. We would be interested in association rule mining only as part of a classification algorithm. Are there such algorithms which

Re: [scikit-learn] check_estimator and score_samples method

2018-12-10 Thread Joel Nothman
We're trying to make check_estimator more flexible ( https://github.com/scikit-learn/scikit-learn/pull/8022) but this is certainly not something we had considered yet. Perhaps suggest it there? Or for now we could just make the check pass if score_samples yields a TypeError with only X...

Re: [scikit-learn] Question about contributing to scikit-learn

2018-12-08 Thread Joel Nothman
Hi Parker, We strongly urge new contributors to start with small issues (documentation, small fixes, etc.) to gain confidence in the contribution procedure, etc. Once you've worked on small issues and understand better what comes through the issue tracker, you can consider bigger contributions.

[scikit-learn] New core dev: Adrin Jalali

2018-12-05 Thread Joel Nothman
The Scikit-learn core development team has welcomed a new member, Adrin Jalali, who has been doing some really amazing work in contributing code and reviews since July (aside from occasional contributions since 2014). Congratulations and welcome, Adrin!

Re: [scikit-learn] Next Sprint

2018-11-18 Thread Joel Nothman
and which would be better to aspire to? Paris in Feb, or Austin in July? On Sun, 18 Nov 2018 at 21:07, Joel Nothman wrote: > When in Feb would we be talking? I'll start mooting it with stakeholders > :) I'm hopeful, but not overly optimistic, that it could work. > > I shou

Re: [scikit-learn] make all new parameters keyword-only?

2018-11-18 Thread Joel Nothman
good to push this change in existing models. We > > should probably announce it strongly well in advance, make sure that all > > our examples are changed (people copy-paste), wait a lot, and find a > > moment to squeeze this in. > > > > Gaël > > > > On Thu, Nov

Re: [scikit-learn] Next Sprint

2018-11-18 Thread Joel Nothman
who > else to invite. I have some funds we could use for paying for travel or > anything else that might be useful. > > > On 11/15/18 10:32 PM, Joel Nothman wrote: > > > Ha! Well, it looks like I won't be teaching the NLP unit at my uni next > year (would usually o

Re: [scikit-learn] Next Sprint

2018-11-15 Thread Joel Nothman
Ha! Well, it looks like I won't be teaching the NLP unit at my uni next year (would usually occupy me March-July), so there is no fundamental problem with disappearing in February, if I can get babysitters, and my boss, on board. (Although I am trying to plan another overseas trip for April, but

Re: [scikit-learn] make all new parameters keyword-only?

2018-11-15 Thread Joel Nothman
ll in advance, make sure that all > our examples are changed (people copy-paste), wait a lot, and find a > moment to squeeze this in. > > Gaël > > On Thu, Nov 15, 2018 at 06:12:35PM +1100, Joel Nothman wrote: > > We could just announce that we will be making this a syntacti

Re: [scikit-learn] make all new parameters keyword-only?

2018-11-14 Thread Joel Nothman
We could just announce that we will be making this a syntactic constraint from version X and make the change wholesale then. It would be less formal backwards compatibility than we usually hold by, but we already are loose with parameter ordering when adding new ones. It would be great if after

Re: [scikit-learn] Random Forest Regressor -- Implementation in C++

2018-11-06 Thread Joel Nothman
See https://github.com/ajtulloch/sklearn-compiledtrees/ and https://github.com/nok/sklearn-porter

Re: [scikit-learn] Strange code but that works

2018-10-28 Thread Joel Nothman
Be careful: that @property is very significant here. It means that this is a description of how to *get* the method, not how to *run* the method. You will notice, for instance, that it says `def transform(self)`, not `def transform(self, X)`.
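A minimal sketch of the pattern, using hypothetical `Wrapper`, `Doubler` and `NoTransform` classes (not scikit-learn code). The property only *looks up* the method and returns a callable, and the caller then invokes that callable with `X`. Because a missing method raises `AttributeError` during the lookup, `hasattr(obj, "transform")` reports whether the method is actually usable.

```python
class Wrapper:
    """Hypothetical wrapper exposing transform only if its step has one."""

    def __init__(self, step):
        self.step = step

    @property
    def transform(self):
        # Getting the attribute runs no transformation: it checks that the
        # step has a transform (AttributeError otherwise, so hasattr() is
        # False) and returns a bound method for the caller to invoke.
        self.step.transform
        return self._transform

    def _transform(self, X):
        # The real work happens only when the returned callable is called.
        return self.step.transform(X)


class Doubler:
    def transform(self, X):
        return [2 * x for x in X]


class NoTransform:
    pass


w = Wrapper(Doubler())
print(w.transform([1, 2, 3]))                        # [2, 4, 6]
print(hasattr(Wrapper(NoTransform()), "transform"))  # False
```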

Re: [scikit-learn] Error with Kfold cross vailidation

2018-10-24 Thread Joel Nothman
Yes, it is not iterable. You are copying a tutorial or code that describes the usage of sklearn.cross_validation.KFold, which no longer exists in version 0.20. Find an example with the newer sklearn.model_selection.KFold. On Thu, 25 Oct 2018 at 00:36, bright silas Aboh wrote: > Okey. I did
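A minimal sketch of the newer API, on made-up toy data. Unlike the removed `sklearn.cross_validation.KFold`, which was constructed from the number of samples and iterated directly, `sklearn.model_selection.KFold` takes `n_splits` and produces `(train_index, test_index)` pairs from its `split(X)` method. Iterating the `KFold` object itself is what raises the "not iterable" error.

```python
import numpy as np
from sklearn.model_selection import KFold  # replaces sklearn.cross_validation.KFold

# Toy data, purely for illustration.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

kf = KFold(n_splits=5)
for train_index, test_index in kf.split(X):  # iterate split(X), not kf itself
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print("train:", train_index, "test:", test_index)
```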

Re: [scikit-learn] Micro average in classification report

2018-10-07 Thread Joel Nothman
A lot of this is discussed in http://scikit-learn.org/dev/modules/model_evaluation.html If you passed only a limited set of labels in, micro average would not necessarily be identical across P/R/F. This allows for a "negative label", often an experimentally uninteresting majority class. Try
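To see why restricting the labels breaks the equality, here is a hand-rolled sketch on made-up data, where "O" plays the uninteresting majority ("negative") class. With all labels included, every error in single-label classification counts once as a false positive and once as a false negative, so micro P = R = F1 (= accuracy). Once "O" is excluded, an instance wrongly predicted as "O" is still a false negative, but it is no longer counted as a false positive. In scikit-learn, the corresponding call would be `precision_recall_fscore_support(..., labels=..., average='micro')`. The helper below is only an illustration, not the library's code.

```python
def micro_prf(y_true, y_pred, labels):
    """Micro-averaged precision, recall and F1 over the given labels only."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p and p in labels)
    n_pred = sum(1 for p in y_pred if p in labels)  # tp + fp
    n_true = sum(1 for t in y_true if t in labels)  # tp + fn
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


y_true = ["O", "O", "O", "A", "A", "B", "B", "O"]
y_pred = ["O", "A", "O", "A", "O", "B", "O", "O"]

print(micro_prf(y_true, y_pred, {"O", "A", "B"}))  # all three equal 0.625
print(micro_prf(y_true, y_pred, {"A", "B"}))       # P=2/3, R=0.5, F1=4/7
```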

Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread Joel Nothman
And for those interested in what's in the pipeline, we are trying to draft a roadmap... https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018 But there are no doubt many features that are absent there too.

Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread Joel Nothman
Wow. It's finally out!! Thank you to the cast of thousands, but to also some individuals for real dedication and insight! Yet there's so much more still in the pipeline. If we're clever about things, we'll make the next release cycle shorter and the release more manageable.

Re: [scikit-learn] GMM.fit and GaussianMixture.fit()

2018-08-14 Thread Joel Nothman
GMM is deprecated because it did inappropriate things. Use GaussianMixture. On Fri, 10 Aug 2018 11:37 am Dixeena Lopez, wrote: > Dear Sir/Madam, > > I have used GMM.fit() instead of GaussianMixture.fit() and got different > answers. Please gives the advantage and disadvantage of these two.

Re: [scikit-learn] pipeline for modifying target and number of samples

2018-08-01 Thread Joel Nothman
But you can't use cross_validate(seglearn.Pype(...), X, y) in general, can you, if the Pype changes the samples and their correspondence to the input y arbitrarily at both train and predict time?

Re: [scikit-learn] Would love to contribute to this library that I fell in love with. I have a question! FIRST TIMER

2018-07-24 Thread Joel Nothman
Hi Abishek, In case you can't tell from the response, this is not a straightforward question to answer. I hope you have looked at our contributor guidelines: http://scikit-learn.org/dev/developers/contributing.html. We encourage contributors to start with changes that focus on things like

Re: [scikit-learn] compiling issue

2018-07-09 Thread Joel Nothman
Homebrew has pushed a lot of users onto Python 3.7 arguably prematurely: several packages weren't ready to support it. A compatibility release, Scikit-learn 0.19.2, is basically ready to be released, but it may take another couple of days. See

Re: [scikit-learn] imbalanced classes: class_weight

2018-06-20 Thread Joel Nothman
the open issue on post-processing / prior adjustment to adjust for class_weight: https://github.com/scikit-learn/scikit-learn/issues/10613

Re: [scikit-learn] imbalanced classes: class_weight

2018-06-20 Thread Joel Nothman
We don't usually do any postprocessing for class weight (although there is an open issue:). In the second taxonomy, I'd say Data Pre-processing ("weighting the data space"), but maybe there are exceptions in some estimators. The classification in the first taxonomy is correct, IMO. In the
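A minimal sketch of the "weighting the data space" reading, under the assumption that an estimator treats `class_weight` as a multiplier on each sample's weight (as noted above, some estimators may be exceptions). The "balanced" heuristic shown, n_samples / (n_classes * count), is the formula scikit-learn documents for `class_weight='balanced'`. The helper functions are hypothetical illustrations, not library code.

```python
from collections import Counter


def balanced_class_weight(y):
    """The 'balanced' heuristic: n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def to_sample_weight(y, class_weight, sample_weight=None):
    """Fold class weights into per-sample weights (hypothetical helper)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    return [w * class_weight[c] for c, w in zip(y, sample_weight)]


y = [0, 0, 0, 0, 0, 0, 1, 1]          # imbalanced toy labels
cw = balanced_class_weight(y)         # {0: 8 / 12, 1: 8 / 4 = 2.0}
sw = to_sample_weight(y, cw)
# Each class now carries (about) the same total weight, 4.0.
totals = {c: sum(w for yi, w in zip(y, sw) if yi == c) for c in cw}
print(cw, totals)
```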
