Thank you very much, Jérémie and everyone else involved in making this
release happen!
--
Olivier
On Fri, Jan 10, 2025 at 11:59, Jérémie du Boisberranger <
jeremie.du-boisberran...@inria.fr> wrote:
> Hello everyone,
>
> We're happy to announce the 1.6.1 release !
>
>
> It contains fixes for
Thanks Jeremie for pushing this release out!
Now is the time to test downstream projects against this to make sure
it will not break too many things when we publish the 1.2.0 final
release in a week or two !
--
Olivier
Thank you so much Guillaume for getting this release out and to Chiara
for pushing forward with the Python 3.11 wheel building infrastructure
update and related fixes!
--
Olivier
BTW, this is now stable, so the URL
https://scikit-learn.org/stable/whats_new/v1.1.html#version-1-1-1 also
works :)
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
Thank you to all the contributors who reported bugs, minimal
reproducers and fixes, and thank you Guillaume for getting this bugfix
release out so timely \o/
--
Olivier
I agree with Guillaume's answers.
I think it was a net benefit, even though it might be a bit annoying
to get the tooling right for first time contributors. We can probably
improve this by making the error messages on the CI more directive on
how to fix formatting issues by giving copy-pastable com
Congrats Jeremie and everybody who contributed to this release! This
is a great achievement.
--
Olivier
Thanks Jeremie for leading the efforts to get this release out!
--
Olivier
Maybe you can try to use faulthandler.dump_traceback_later
https://docs.python.org/3/library/faulthandler.html#faulthandler.dump_traceback_later
to get a traceback of all the threads of the main process.
But the fact that you are using the default `p =
multiprocessing.Pool()` makes me think that i
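A minimal sketch of the faulthandler suggestion above, using only the standard library. The helper name run_with_watchdog and the temporary log file are illustrative assumptions, not part of the original message; only faulthandler.dump_traceback_later and faulthandler.cancel_dump_traceback_later are the actual standard library calls.

```python
import faulthandler
import tempfile


def run_with_watchdog(func, timeout=30.0):
    """Run func(); if it is still running after `timeout` seconds,
    faulthandler writes the traceback of every thread of this process
    to a temporary log file (hypothetical helper, for illustration)."""
    with tempfile.TemporaryFile(mode="w+") as log:
        # Arm the watchdog: the dump happens in a background thread
        # and does not stop the program (exit defaults to False).
        faulthandler.dump_traceback_later(timeout, file=log)
        try:
            result = func()
        finally:
            # Disarm the watchdog once the call has returned.
            faulthandler.cancel_dump_traceback_later()
        log.seek(0)
        return result, log.read()


result, dump = run_with_watchdog(lambda: sum(range(1000)), timeout=30.0)
```

If the call returns before the timeout, the dump is empty. For a call that hangs, the dump shows where each thread of the main process is blocked, which is the information asked for above.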
To summarize, the office hours for today are:
- 15:00-16:00 UTC / 17:00-18:00 CEST (this one starts in less than 10min)
- 18:00-19:00 UTC / 20:00-21:00 CEST (with Guillaume)
Sorry for the confusion and see you soon.
--
Olivier
Hi all,
Some of us will be online on the scikit-learn discord this Friday at
15:00 UTC and 20:00 UTC.
First time and occasional contributors are welcome to join us to
discord using this invitation link:
https://discord.gg/YBdN45kD
The focus of these office hour sessions is to answer questions a
Yeah!
Thank you so much Adrin for all your efforts in getting this release out!
Congratulations everyone, time to celebrate!
--
Olivier
Hi all,
This is an email to notify everybody interested that the discussion on
interoperability of Python dataframe libraries has moved to an
official repo under the data-apis.org initiative:
https://data-apis.org/blog/dataframe_protocol_rfc/
https://github.com/data-apis/dataframe-api
and they a
Thanks for the heads up! This is interesting. We rarely update
dataframe values in-place in scikit-learn, but it is good to
know that we could leverage this for more efficient pandas-in
pandas-out support, for instance for missing value imputation.
Many very active core devs not represented in the TC voted for 88 and
my previous vote for 79 was not that strong. So I feel that I should
now vote for 88:
Keep current 88 characters:
Olivier
Revert to 79 characters:
--
Olivier
Dear all,
The scikit-learn developer monthly meeting will take place on Monday
June 28th at
3PM UTC.
- Video call link: https://meet.google.com/qbg-ucpe-ngz
- Meeting notes / agenda: https://hackmd.io/0yokz72CTZSny8y3Re648Q
- Local times:
https://www.timeanddate.com/worldclock/meetingdetails.html
> I have only one question related to scikit-learn.
> how to compute topic coherence of lda models in scikit-lean. I don't find
> any function that calculate a coherence value.
> please, reply me.
We don't have such a metric in scikit-learn. I assume you are referring to:
http://svn.aksw.org/pap
I am a bit late but I am very happy to see Norbert joining the triage
team! Welcome!
Alternatively, you can edit the code to use fetch_openml(...,
as_frame=False) to use a numpy array instead of a pandas dataframe for
this example.
--
Olivier
Please help us test the first release candidate for scikit-learn 0.24.0:
pip install scikit-learn==0.24.0rc1
Changelog: https://scikit-learn.org/0.24/whats_new/v0.24.html
In particular, if you maintain a project with a dependency on
scikit-learn, please let us know about any regression.
Feel
> Shall I contact them? Any other volunteers?
+1.
I think we still depend on Travis for ARM-based release builds
and cron jobs. I believe we can move the rest to Azure Pipelines or
GitHub Actions.
--
Olivier
On Tue, Oct 13, 2020 at 16:19, Adrin wrote:
>
> Isn't the Boston dataset available through openml? Maybe here:
> https://www.openml.org/d/531
>
> I'm happy to have the dataset out there on openml, and for any material that
> addresses some of the issues with it.
> But for educational purposes,
Thanks for your input, this is also an extension I was thinking of.
Hi all,
Thanks to the sustained effort of several contributors (thanks Maria
and Lucy in particular), the Boston housing price dataset is no longer
used in the examples of scikit-learn (nor in the test suite) in the
master branch.
To give some context on why this dataset is problematic, please ha
Shall we start rolling meetings with a switch between 2 or 3 time slots?
--
Olivier
Hi Sole,
I personally support climate change actions very much and I am
convinced climate change is the number 1 challenge of our time. In an
attempt to act in a consistent way with that belief, I declined
several times to keynote at conferences either organized by the fossil
fuel industry or to c
Congrats on the release! And thank you very much to all those who were
involved in making it happen (and Adrin in particular)!
--
Olivier
I get a message for an invalid meeting id.
--
Olivier
This is a minor release that includes many bug fixes and solves a
number of packaging issues with Windows wheels in particular. Here is
the full changelog:
https://scikit-learn.org/stable/whats_new/v0.22.html#version-0-22-1
The conda package will follow soon (hopefully).
Thank you very much to a
Indeed I do not see the "circle add" button in the tweetdeck UI anymore.
But it's ok not to prepare the threads before tweeting the first
tweet. We can build the thread progressively by publishing the first
tweet and then replying one tweet after the other by hitting the reply
button of the last p
Ok the twitter accounts are now switched:
https://twitter.com/scikit_learn/status/1201794032650932224
The notifications for commits pushed to master are live:
https://twitter.com/sklearn_commits
Ready for the release :)
--
Olivier
Alright, I have configured the new github action for the tweets on
@sklearn_commits:
https://github.com/scikit-learn/scikit-learn/pull/15758
I tested it from my repo and it worked fine (I deleted the test tweet though).
We can do the switch as soon as this PR is merged.
--
Olivier
It might actually be possible to use GitHub Actions with
https://github.com/xorilog/twitter-action for instance. I will
give it a try with a test repo.
--
Olivier
Alright, it seems that I can create Twitter apps (and generate API
tokens) for the @sklearn_commits account. However,
https://github.com/filearts/tweethook does not work as it relies on a
third-party webtask.io service that does not accept any new
subscriptions...
I am looking for an alternative way
I have created the https://twitter.com/sklearn_commits twitter account.
I have applied to make this account a "Twitter Developer" account to
be able to use https://github.com/filearts/tweethook to register it as
a webhook for the main scikit-learn github repo.
Once ready, I will remove the old we
On Fri, Nov 22, 2019 at 17:24, Gael Varoquaux
wrote:
>
> > I would like to create @sklearn_commits instead of
> > @scikit_learn_commits that is too long to my taste. Any opinion?
>
> Some people do not make the link between "sklearn" and "scikit-learn" :)
People who are likely to follow a twitt
Ok, I have sent some invites.
I would like to create @sklearn_commits instead of
@scikit_learn_commits, which is too long for my taste. Any opinion?
--
Olivier
Thanks Tom, let me try to configure this.
--
Olivier
I am not sure who has the rights to manage the twitter account. I just
sent a password reset request to "sc**@a..***"
I suspect that this is Andreas but I am not so sure.
On Fri, Nov 15, 2019 at 17:31, Nicolas Hug wrote:
>
> What's the status of this? Would be great to have it for the 0.22 release :) !
>
+1 and we could also announce / thank / RT new sources of funding (CZI
and Fujitsu).
On Tue, Nov 5, 2019 at 12:46, Gael Varoquaux
wrote:
>
> On Mon, Nov 04, 2019 at 10:14:26PM -0700, Andreas Mueller wrote:
> > Should we re-purpose the existing twitter account or make a new one?
> > https://twitter.com/scikit_learn
>
> I think that we should repurpose it:
>
> - Make a "scikit-lea
I just found this planner to give it a try:
https://www.timeanddate.com/worldclock/meetingtime.html?day=29&month=7&year=2019&p1=240&p2=33&p3=37&p4=179&iv=0
(Berlin and Paris are on the same timezone so I did not put only Berlin).
It's going to be challenging to find a timeslot for everybody. Th
On Thu, Jul 18, 2019 at 08:29, Adrin wrote:
>
> BTW, where was the meeting for last Monday organized? I don't think I knew it
> was happening.
I do not understand what you are referring to. My email was about the
organization of future meetings as suggested by Andreas.
+1 for last Monday of each month. How about the duration? 1h max + breakout
in smaller groups on more specific topics if needed?
The core developers of scikit-learn have recently voted to welcome
Jérémie Du Boisberranger to the team, in recognition of his efforts
and trustworthiness as a contributor. Jérémie works at Inria Saclay
and is supported by the scikit-learn initiative at Fondation Inria and
its partners.
Congratula
You have to use a dedicated framework to distribute the computation on a
cluster like your Cray system.
You can use MPI, or dask with dask-jobqueue, but you also need to run
parallel algorithms that are efficient when running in a distributed setting with a
high cost for communication between distributed wo
How many cores do you have on this machine?
joblib.cpu_count()
I think it's ok to do as you said.
--
Olivier
A quick bugfix release to fix a critical regression in the computation
of the euclidean distances returning incorrect values silently.
This release also includes other bugfixes listed in the changelog:
https://scikit-learn.org/0.21/whats_new.html#version-0-21-2
The PyPI.org wheels and conda-forg
\o/
+1
I would also add generalizing early stopping options to most estimators.
This is a bit related to Joel's point on max_iter consistency in
LogisticRegression.
--
Olivier
ople say that they
> >> > might be available at this time. It is good for many people, or
> should we
> >> > organize a doodle?
> >> >
> >> > G
> >> >
> >> > On Wed, Dec 19, 2018 at 05:27:21PM -0500, Andreas Mueller wrote:
You should probably just "conda update scikit-learn":
scikit-learn 0.20.1 is available on the official anaconda channel for all
supported operating systems:
https://anaconda.org/anaconda/scikit-learn
--
Olivier
They are very different statistical models from a mathematical point of
view. See the online scikit-learn documentation or reference text books
such as "Elements of Statistical Learning" for more details.
In practice, linear model tends to be faster to fit on large data,
especially when the number
Congrats and welcome Adrin!
--
Olivier
Maybe a subset of the criteo TB dataset?
+1 on the ideal in general (and to enforce this on new classes / params).
+1 to be conservative and not break existing code.
On Tue, Nov 20, 2018 at 21:09, Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:
> On Sun, Nov 18, 2018 at 11:14, Joel Nothman wrote:
>
>> I think we're all ag
We can also do Paris in April / May or June if that's ok with Joel and
better for Andreas.
I am teaching on Fridays from end of January to March. But I can miss half
a day of sprint to teach my class.
--
Olivier
You might also want to have a look at https://github.com/onnx/onnxmltools
although I am not sure if there are RF optimized ONNX runtimes at this
point.
--
Olivier
>
>
> > I think model serialization should be a priority.
>
There is also the ONNX specification that is gaining industrial adoption
and that already includes open source exporters for several families of
scikit-learn models:
https://github.com/onnx/onnxmltools
--
Olivier
On Wed, Sep 26, 2018 at 23:02, Joel Nothman
wrote:
> And for those interested in what's in the pipeline, we are trying to draft
> a roadmap...
> https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018
>
> But there are no doubt many features that are absent there too.
>
Indeed, i
Joy !
I believe it would fit in sklearn-contrib even if it's more for statistical
inference rather than machine learning style prediction.
Others might disagree.
Anyways, joining efforts to improve documentation, CI, testing and so on is
always a good thing for your future users.
--
Olivier
This looks like a very useful project.
There is also scikits-bootstraps [1]. Personally I prefer the flat package
namespace of resample (I am not a fan of the 'scikits' namespace package)
but I still think it would be great to contact the author to know if he
would be interested in joining efforts
That's a cool trick but I am worried it would render our API too
"frameworky" for my taste.
I think the FunctionTransformer is enough:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html
Hi everyone!
Let's welcome Joris Van den Bossche (@jorisvdbossche) officially as a
scikit-learn core developer!
Joris is one of the maintainers of the pandas project and recently
contributed many new great PRs to scikit-learn (notably the
ColumnTransformer and a refactoring of the categorical var
It looks nice, thanks for sharing.
Do you plan to couple the active learner with a UX-optimized labeling
interface (for instance with a react.js or similar frontend and a flask or
similar backend)?
--
Olivier
Have you had a look at BIRCH?
http://scikit-learn.org/stable/modules/clustering.html#birch
--
Olivier
Interesting project!
BTW, do you know about dask-ml [1]?
It might be interesting to think about generalizing the input validation of
fit and predict / transform as a private method of the BaseEstimator class
instead of directly calling into sklearn.utils.validation functions so as
to make it eas
Maybe update your version of Cython?
--
Olivier
> Do I need to write object oriented or are functions also ok?
If you want to contribute an implementation as a new project on scikit-learn
contrib, you should be careful to follow the scikit-learn estimators API:
http://scikit-learn.org/dev/developers/contributing.html#apis-of-scikit-learn-object
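To illustrate what following the estimators API means in practice, here is a minimal sketch written in plain Python. The class MajorityClassifier and its tie_break parameter are hypothetical and only serve the illustration. The conventions shown are the documented ones: the constructor only stores hyperparameters, fit returns self, and learned attributes end with an underscore.

```python
class MajorityClassifier:
    """Toy classifier that always predicts the most frequent training label."""

    def __init__(self, tie_break="first"):
        # The constructor only stores hyperparameters; no work happens here.
        self.tie_break = tie_break  # hypothetical parameter

    def fit(self, X, y):
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        best = max(counts.values())
        # dicts keep insertion order, so candidates are in first-seen order.
        candidates = [label for label, n in counts.items() if n == best]
        # Attributes learned from data carry a trailing underscore.
        self.majority_ = candidates[0] if self.tie_break == "first" else candidates[-1]
        return self  # fit returns self so that calls can be chained

    def predict(self, X):
        return [self.majority_ for _ in X]

    def get_params(self, deep=True):
        return {"tie_break": self.tie_break}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


clf = MajorityClassifier().fit([[0], [1], [2]], ["a", "b", "b"])
predictions = clf.predict([[5], [6]])
```

In a real contribution, one would typically inherit from sklearn.base.BaseEstimator (and a mixin such as ClassifierMixin) rather than writing get_params and set_params by hand.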
Congrats to all three of you! Thank you very much for your contributions
and in particular in reviewing contributions by others.
--
Olivier
+1 for python.org if they accept this kind of mailing lists.
+1
Grab it with pip or conda !
Quoting the release highlights from the website:
We are excited to release a number of great new features including
neighbors.LocalOutlierFactor for anomaly detection,
preprocessing.QuantileTransformer for robust feature transformation, and
the multioutput.ClassifierCh
I have no idea whether the randomized SVD method is supposed to work for
complex data or not (from a mathematical point of view). I think that all
scikit-learn estimators assume real data (or integer data for class labels)
and our input validation utilities will cast numeric values to float64 by
de
I believe so even though it's always better to check in the code to see how
this parameter is actually used.
--
Olivier
The new release is coming and we are seeking feedback from beta testers!
pip install scikit-learn==0.19b2
conda-forge packages should follow in the coming hours / days.
Note that many models have changed behaviors and some things have been
deprecated, see the full changelog at:
http://scikit-
If this is the first time you contribute, please make sure to
carefully read the contributors guide till the end:
http://scikit-learn.org/stable/developers/contributing.html
In particular, make sure to follow the estimators API conventions for
your PR to get a chance to be reviewed. In particular
Please use this mailing list only for questions that specifically
target scikit-learn. Otherwise, you are better off asking a specific
question on an NLP and data science community platform such as:
https://datascience.stackexchange.com/questions/tagged/nlp
or if you have a programming related question
The name of the algorithm / model would be "L2-penalized linear model
with modified Huber loss trained with Stochastic Gradient Descent".
SVM is traditionally used to describe models that use the hinge loss
only (or sometimes the squared hinge loss too).
Only the log loss can lead to a probabi
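For reference, the losses named above can be written out as follows. This is a sketch for a single sample with label y in {-1, +1} and a raw decision score; the function names are illustrative, and the modified Huber formula is the one given in the scikit-learn SGD documentation.

```python
import math


def hinge_loss(y, score):
    # Loss traditionally associated with linear SVMs.
    return max(0.0, 1.0 - y * score)


def modified_huber_loss(y, score):
    # Squared hinge for margins >= -1, linear for badly misclassified points.
    margin = y * score
    if margin >= -1.0:
        return max(0.0, 1.0 - margin) ** 2
    return -4.0 * margin


def log_loss(y, score):
    # Logistic loss: the negative log-likelihood of a logistic model.
    return math.log1p(math.exp(-y * score))


losses_at_zero = (hinge_loss(1, 0.0), modified_huber_loss(1, 0.0), log_loss(1, 0.0))
```

The two branches of the modified Huber loss meet at a margin of -1, where both evaluate to 4, so the loss is continuous.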
I think the documentation is correct. U, a.k.a. "the code" or "the
activations" has shape (n_samples, n_components) and V a.k.a. "the
dictionary" or "the components" has shape (n_components, n_features) in
both cases.
We could use n_components uniformly instead of n_atoms for consistency's
sake (an
2017-07-06 15:10 GMT+02:00 Olivier Grisel :
> (and just make sure that the "components" is a synonym for "dictionary
> atoms" in the literature).
Actually I meant: and just make sure that our documentation states explicitly
that the "components" is a s
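The shape convention discussed above can be checked with a small sketch in plain Python. The matrices hold made-up values and the helpers matmul and shape are illustrative; the point is only that a code U of shape (n_samples, n_components) times a dictionary V of shape (n_components, n_features) gives an approximation of X with shape (n_samples, n_features).

```python
def matmul(A, B):
    # Plain nested-list matrix product: rows of A against columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def shape(M):
    return (len(M), len(M[0]))


n_samples, n_components, n_features = 4, 2, 3

# U: "the code" / "the activations", one row per sample.
U = [[float(i + k) for k in range(n_components)] for i in range(n_samples)]
# V: "the dictionary" / "the components", one row per atom.
V = [[float(k + j) for j in range(n_features)] for k in range(n_components)]

X_approx = matmul(U, V)
```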
I am pretty sure this is exactly the kind of presentation that the
EuroScipy audience would enjoy. Please submit!
--
Olivier
Thanks for this report!
--
Olivier
You can have a look at the test named "test_agglomerative_clustering" in:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/tests/test_hierarchical.py
--
Olivier
Hi Tim,
Thanks for the help.
I was planning to do a quick sklearn intro based on slides such as the
first part of:
https://speakerdeck.com/ogrisel/intro-to-scikit-learn-and-whats-new-in-0-dot-17
(but I would like to re-do them in HTML with remark.js as I do here:
https://github.com/ogrisel/decks
Hi all,
FYI I have just submitted a 90 min tutorial on scikit-learn to the
EuroScipy CFP. If anybody is interested in co-teaching / TA-ing this
workshop please let me know.
I also plan to stay for the one-day sprint to help people make their
first contribution to the project. Last year we had gre
+1 for changing this example to have error bars represent 5 & 95
percentiles or 25 and 75 percentiles (quartiles).
Or even bootstrapped confidence intervals of the mean feature
importance for each variable. This might be a bit too verbose for an
example though.
> Perhaps more importantly - is a
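The two kinds of error bars suggested above can be sketched with the standard library only. The importance values are made up for illustration (one value per tree for a single feature), and the helper names percentile and bootstrap_mean_ci are hypothetical.

```python
import random
import statistics


def percentile(values, q):
    """Percentile with linear interpolation, for q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def bootstrap_mean_ci(values, n_boot=1000, alpha=0.1, seed=0):
    """Percentile bootstrap confidence interval of the mean of `values`."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size.
        sample = [rng.choice(values) for _ in values]
        means.append(statistics.fmean(sample))
    return percentile(means, 100 * alpha / 2), percentile(means, 100 * (1 - alpha / 2))


# Made-up per-tree importances of one feature.
importances = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.10]

# Option 1: error bars from the 5th and 95th percentiles across trees.
spread = (percentile(importances, 5), percentile(importances, 95))

# Option 2: bootstrapped confidence interval of the mean importance.
ci_low, ci_high = bootstrap_mean_ci(importances)
```

The first interval describes how much the importance varies from tree to tree, while the second describes the uncertainty on the mean importance, so the second is usually much narrower.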
+1 for recommending to use `pip install --editable .`.
--
Olivier
Thanks Matthew,
I have uploaded your Python 3.6 wheel for MacOSX to PyPI.
--
Olivier
Please provide the full traceback. Without it it's impossible to tell
whether the problem is in scikit-learn or xgboost.
Also, please provide a minimal reproduction script as explained in:
http://scikit-learn.org/stable/faq.html#what-s-the-best-way-to-get-help-on-scikit-learn-usage
--
Olivier
Integer coding will indeed make the DT assume an arbitrary ordering
while one-hot encoding does not force the tree model to make that
assumption.
However in practice when the depth of the trees is not too limited (or
if you use a large enough ensemble of trees), the model will have
enough flexibil
For large enough models (e.g. random forests or gradient boosted trees
ensembles) I would definitely recommend arbitrary integer coding for
the categorical variables.
Try both, use cross-validation and see for yourself.
--
Olivier
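To make the two encodings discussed above concrete, here is a small sketch in plain Python. The helper names and the toy color column are illustrative; in scikit-learn one would typically use OrdinalEncoder and OneHotEncoder from sklearn.preprocessing.

```python
def ordinal_encode(values):
    """Map each category to an arbitrary integer code (alphabetical order here)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Map each category to its own 0/1 indicator column."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


colors = ["red", "green", "blue", "green"]
codes, categories = ordinal_encode(colors)
one_hot, _ = one_hot_encode(colors)
```

With integer codes, a single split such as `code <= 0.5` separates "blue" from the rest, but isolating "green" takes two splits on the same feature. This is why the arbitrary ordering matters less once the trees are deep enough or the ensemble is large enough.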
From a generalization point of view (test accuracy), the optimal
sparsity support should not matter much though, but it can be helpful
to find the sparsest optimal solution either for computational
constraints (smaller models with a lower prediction latency) or for
interpretation of the weights (
Note that SGD is not very good at optimizing finely with a non-smooth
penalty (e.g. l1 or elasticnet). The future SAGA solver is going to be
much better at finding the optimal sparsity support (although this
support is not guaranteed to be stable across re-sampling of the
training set if the traini
Personally I don't feel like mentoring this year. I would really like
to focus my scikit-learn time on finishing the joblib process
refactoring with Thomas Moreau and the binning / thread-based
parallelization of boosted trees with Guillaume and Raghav.
--
Olivier
I don't think we have any model dedicated to this, but it's possible
that expressive non-parametric models such as RF and GBRT or richly
parameterized models such as MLP with a regression loss can do a good
enough job at giving you a point estimate.
--
Olivier
It's ok to work on a bug if the original contributor has not replied
to the reviewers' comments in a while (e.g. a couple of weeks).
--
Olivier