Re: [scikit-learn] [ANN] scikit-learn 1.2.0rc1 is online!
Thanks Jeremie for pushing this release out! Now is the time to test downstream projects against this to make sure it will not break too many things when we publish the 1.2.0 final release in a week or two! -- Olivier ___ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
Re: [scikit-learn] [ANN] scikit-learn 1.1.3 is online!
Thank you so much Guillaume for getting this release out and to Chiara for pushing forward with the Python 3.11 wheel building infrastructure update and related fixes! -- Olivier
Re: [scikit-learn] [ANN] scikit-learn 1.1.1 is online!
BTW, this is now stable so the URL https://scikit-learn.org/stable/whats_new/v1.1.html#version-1-1-1 also works :)
Re: [scikit-learn] [ANN] scikit-learn 1.1.1 is online!
Thank you to all the contributors who reported bugs, minimal reproducers and fixes, and thank you Guillaume for getting this bugfix release out so timely \o/ -- Olivier
Re: [scikit-learn] Experience with black formatting in scikit-learn for astropy
I agree with Guillaume's answers. I think it was a net benefit, even though it might be a bit annoying to get the tooling right for first time contributors. We can probably improve this by making the error messages on the CI more directive on how to fix formatting issues by giving copy-pastable commands to install and run black in your branch. Otherwise, I really like just pressing shift-ctrl-i to fix the formatting when editing code in VS Code. -- Olivier
Re: [scikit-learn] [ANN] scikit-learn 1.1 release
Congrats Jeremie and everybody who contributed to this release! This is a great achievement. -- Olivier
Re: [scikit-learn] [ANN] scikit-learn 1.1.0rc1 is online!
Thanks Jeremie for leading the efforts to get this release out! -- Olivier
Re: [scikit-learn] scikit-learn 1 - pytest - multiprocessing Pool - hangs?
Maybe you can try to use faulthandler.dump_traceback_later https://docs.python.org/3/library/faulthandler.html#faulthandler.dump_traceback_later to get a traceback of all the threads of the main process.

But the fact that you are using the default `p = multiprocessing.Pool()` makes me think that it might be related to the lack of fork-safety of the OpenMP runtime library of GCC (libgomp) [1].

There are several ways to check this:

- print the output of threadpoolctl.threadpool_info() before calling the code that freezes to confirm (or not) that the libgomp runtime has been loaded before creating the MP Pool.
- use a multiprocessing Pool with a forkserver context instead of the default fork context: multiprocessing.get_context("forkserver").Pool()
- alternatively, use loky.get_reusable_executor() instead of multiprocessing.Pool() (with a slightly different API)
- alternatively, use joblib, which uses loky internally, with an even more different API.
- alternatively, recompile scikit-learn from source with clang instead of gcc so as to link scikit-learn to llvm-openmp instead of gcc's libgomp runtime. llvm-openmp is fork-safe.
- alternatively, install scikit-learn from conda-forge (conda install -c conda-forge scikit-learn) as the conda-forge distribution relinks all OpenMP compiled extensions of its packaged libraries to llvm-openmp transparently at install time, even if they were built with GCC (maybe we should do that for our linux wheels).

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2014-02/msg00979.html

If that does not work or if you need more help, please feel free to open an issue with a minimal reproducer and ping me on gitter or discord.

On Thu, Dec 9, 2021 at 05:59, Norbert Preining wrote:
>
> Dear all,
>
> I am trying to track down a strange behaviour in one of our (Fujitsu)
> library we are planning to open source. In preparation for that, I am
> trying to bring it into a state that it works with scikit-learn >= 1.
>
> But, some of our tests fail when running in parallel mode. But they
> only fail when running under pytest, but NOT when running under python.
>
> The library code contains
>
>     def fit(self, X, y=None):
>         ...
>         p = multiprocessing.Pool()
>         ret = _reduce(
>             p.map())
>
> Now what happens is that with scikit-learn 1(.0.1), the code hangs
> forever. I adjusted the code also so that the pool definition is not in
> the fit function, but in the __init__ function, and saved into self, but
> that didn't help either.
>
> When interrupted, pytest gives:
>
> KeyboardInterrupt
> !
> /home/norbert/.pyenv/versions/3.9.6/lib/python3.9/threading.py:312: KeyboardInterrupt
> (to show a full traceback on KeyboardInterrupt use --full-trace)
> 1 passed, 2 warnings in 273.84s (0:04:33) =
> Exception ignored in:
> Traceback (most recent call last):
>   File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/pool.py", line 268, in __del__
>     self._change_notifier.put(None)
>   File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/queues.py", line 378, in put
>     self._writer.send_bytes(obj)
>   File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes
>     self._send_bytes(m[offset:offset + size])
>   File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
>     self._send(header + buf)
>   File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/connection.py", line 373, in _send
>     n = write(self._handle, buf)
>
> While when running under python testfile.py all goes well.
>
> I have tested the following combinations:
> * scikit-learn 0.23.*, python 3.8 and python 3.9 => works
> * scikit-learn 0.24.*, python 3.8 and python 3.9 => works
> * scikit-learn 1.0.1, python 3.8 and python 3.9 => fails
>
> I don't really understand where scikit-learn comes into the play here,
> so I wanted to ask whether someone here has an idea.
>
> Thanks for any suggestion
>
> Norbert
>
> --
> PREINING Norbert https://www.preining.info
> Fujitsu Research + IFMGA Guide + TU Wien + TeX Live + Debian Dev
> GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13

-- Olivier
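Two of the suggestions in the reply above (the faulthandler watchdog and the forkserver context) can be combined in a few lines of standard-library Python. This is only a sketch under assumptions: the helper names make_pool_context and map_with_watchdog, the 60 second timeout and the worker count are illustrative and do not come from the thread, and the forkserver start method is assumed to be available (it is not on Windows).

```python
import faulthandler
import multiprocessing
import sys


def make_pool_context(method="forkserver"):
    # Hypothetical helper: returns a context whose workers are started from
    # a clean server process rather than by forking the current process.
    return multiprocessing.get_context(method)


def map_with_watchdog(func, values, n_workers=2, timeout=60):
    # func must be picklable, i.e. defined at the top level of an importable
    # module, because forkserver workers do not inherit the parent's memory.
    # If the call is still running after `timeout` seconds, the tracebacks of
    # all threads of this process are written to stderr (without exiting).
    faulthandler.dump_traceback_later(timeout, exit=False, file=sys.stderr)
    try:
        with make_pool_context().Pool(processes=n_workers) as pool:
            return pool.map(func, values)
    finally:
        # Disarm the watchdog once the work has completed or failed.
        faulthandler.cancel_dump_traceback_later()
```

As with any non-fork start method, the call site should sit under an `if __name__ == "__main__":` guard, for instance `map_with_watchdog(math.sqrt, [0.0, 1.0, 4.0])` after `import math`.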
Re: [scikit-learn] scikit-learn office hours on Friday Oct. 8 2021
To summarize, the office hours for today are: - 15:00-16:00 UTC / 17:00-18:00 CEST (this one starts in less than 10min) - 18:00-19:00 UTC / 20:00-21:00 CEST (with Guillaume) Sorry for the confusion and see you soon. -- Olivier
[scikit-learn] scikit-learn office hours on Friday Oct. 8 2021
Hi all, Some of us will be online on the scikit-learn discord this Friday at 15:00 UTC and 20:00 UTC. First time and occasional contributors are welcome to join us on discord using this invitation link: https://discord.gg/YBdN45kD The focus of these office hour sessions is to answer questions about contributing to scikit-learn. We can also split into break out audio/text channels and do pair programming or live reviewing of forgotten pull requests with screen sharing. We can also try to assist you in crafting minimal reproduction cases for bug reports to get a higher likelihood of resolution (e.g. https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports). If this experiment is successful, we will probably hold this kind of office hours on a regular basis. See you soon on discord! -- Olivier
Re: [scikit-learn] [ANNOUNCEMENT] scikit-learn 1.0 release
Yeah! Thank you so much Adrin for all your efforts in getting this release out! Congratulations everyone, time to celebrate! -- Olivier
[scikit-learn] Dataframe protocol RFC
Hi all, This is an email to notify everybody interested that the discussion on interoperability of Python dataframe libraries has moved to an official repo under the data-apis.org initiative: https://data-apis.org/blog/dataframe_protocol_rfc/ https://github.com/data-apis/dataframe-api and they are requesting feedback from library authors (both dataframe providers and consumers). -- Olivier
Re: [scikit-learn] Pandas copy-on-write proposal
Thanks for the heads up! This is interesting. We rarely update dataframe values in-place in scikit-learn but it is good to know that we could leverage this for more efficient pandas-in pandas-out support, for instance for missing value imputation.
Re: [scikit-learn] [TC Vote] Technical Committee vote: line length
Many very active core devs not represented in the TC voted for 88 and my previous vote for 79 was not that strong. So I feel that I should now vote for 88: Keep current 88 characters: Olivier Revert to 79 characters: -- Olivier
[scikit-learn] scikit-learn monthly developer meeting: Monday June 28 2021
Dear all, The scikit-learn developer monthly meeting will take place on Monday June 28th at 3PM UTC. - Video call link: https://meet.google.com/qbg-ucpe-ngz - Meeting notes / agenda: https://hackmd.io/0yokz72CTZSny8y3Re648Q - Local times: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021=6=28=15=0=0=1440=240=248=195=179=224 The goal of this meeting is to discuss ongoing development topics for the project. Everybody is welcome. As usual, please follow the code of conduct of the project: https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md Regards, -- Olivier
Re: [scikit-learn] New member of the triage team: Norbert
> I have only one question related to scikit-learn. > how to compute topic coherence of lda models in scikit-learn. I don't find > any function that calculates a coherence value. > please, reply me. We don't have such a metric in scikit-learn. I assume you are referring to: http://svn.aksw.org/papers/2015/WSDM_Topic_Evaluation/public.pdf which is implemented in Gensim as: https://radimrehurek.com/gensim/models/coherencemodel.html If I understand correctly, this metric needs to compute relative frequencies of occurrences and co-occurrences of words in the documents of the training set. This feels very domain specific compared to the more domain agnostic metrics that we have in scikit-learn.
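To make the "relative frequencies of occurrences and co-occurrences" concrete, here is a small pure-Python sketch of those raw statistics. It is not the coherence measure from the linked paper nor Gensim's implementation; the function name cooccurrence_frequencies and the toy documents are made up for illustration, and documents are assumed to be already tokenized.

```python
from itertools import combinations


def cooccurrence_frequencies(documents, topic_words):
    """Relative document frequencies of topic words and of word pairs."""
    n_docs = len(documents)
    doc_sets = [set(doc) for doc in documents]
    # Fraction of documents that contain each word.
    word_freq = {
        word: sum(word in doc for doc in doc_sets) / n_docs
        for word in topic_words
    }
    # Fraction of documents that contain both words of each pair.
    pair_freq = {
        (a, b): sum(a in doc and b in doc for doc in doc_sets) / n_docs
        for a, b in combinations(topic_words, 2)
    }
    return word_freq, pair_freq


# Toy corpus, already tokenized (purely illustrative).
docs = [
    ["cat", "dog", "pet"],
    ["dog", "bone"],
    ["cat", "pet", "food"],
    ["stock", "market"],
]
word_freq, pair_freq = cooccurrence_frequencies(docs, ["cat", "dog", "pet"])
print(word_freq)
print(pair_freq)
```

A coherence score would then combine these frequencies over the top words of each topic, which is why it depends on a text corpus rather than on generic arrays of predictions and targets.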
Re: [scikit-learn] New member of the triage team: Norbert
I am a bit late but I am very happy to see Norbert joining the triage team! Welcome!
Re: [scikit-learn] running examples
Alternatively, you can edit the code to use fetch_openml(..., as_frame=False) to use a numpy array instead of a pandas dataframe for this example. -- Olivier
[scikit-learn] [ANN] scikit-learn 0.24.0rc1 is online!
Please help us test the first release candidate for scikit-learn 0.24.0: pip install scikit-learn==0.24.0rc1 Changelog: https://scikit-learn.org/0.24/whats_new/v0.24.html In particular, if you maintain a project with a dependency on scikit-learn, please let us know about any regression. Feel free to also retweet the announcement to get more people to test it before the final release (probably in 1 week or 2): https://twitter.com/scikit_learn/status/1334562221498753026 Thanks to anybody who helped make this happen! -- Olivier
Re: [scikit-learn] Changes in Travis billing
> Shall I contact them? Any other volunteers? +1. I think we are still dependent on travis for ARM-based release builds and cron-jobs. The rest we can move to Azure Pipelines or github actions I believe. -- Olivier
Re: [scikit-learn] About the Boston housing prices dataset
On Tue, Oct 13, 2020 at 16:19, Adrin wrote: > > Isn't the Boston dataset available through openml? Maybe here: > https://www.openml.org/d/531 > > I'm happy to have the dataset out there on openml, and for any material that > addresses some of the issues with it. > But for educational purposes, we don't need to have the dataset in the > package as long as users can still download it > with a oneliner using fetch_openml. That would be an argument in favor of a deprecation warning with a message stating the motivation for the deprecation and pointing to fetch_openml. However it's going to break examples written in slow-to-update tutorials or books once the deprecation period is over. But one could argue that this is also the case for any other deprecation in scikit-learn. It's just that sklearn.datasets.load_boston is used A LOT: https://github.com/search?q=load_boston=code -- Olivier
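The kind of deprecation warning discussed in the message above could look roughly like the sketch below. It uses only the standard warnings module; the function load_legacy_dataset, its return value and the wording of the message are hypothetical and are not scikit-learn's actual code.

```python
import warnings


def load_legacy_dataset():
    """Hypothetical loader kept only during a deprecation period."""
    warnings.warn(
        "load_legacy_dataset is deprecated because the dataset has known "
        "ethical problems; it will be removed after the deprecation period. "
        "If you still need the data, download it with fetch_openml instead.",
        # FutureWarning is displayed by default, so end users running an
        # old tutorial would see the message, not only library developers.
        FutureWarning,
        # Attribute the warning to the caller's line, not to this function.
        stacklevel=2,
    )
    # Placeholder payload; a real loader would return the actual data.
    return {"data": [], "target": []}
```

The message carries both points raised in the thread: the motivation for the deprecation and a pointer to the replacement.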
Re: [scikit-learn] About the Boston housing prices dataset
Thanks for your input, this is also an extension I was thinking of.
[scikit-learn] About the Boston housing prices dataset
Hi all, Thanks to the sustained effort of several contributors (thanks Maria and Lucy in particular), the Boston housing price dataset is no longer used in the examples of scikit-learn (nor in the test suite) in the master branch. To give some context on why this dataset is problematic, please have a look at this discussion and the blog post linked in it: https://github.com/scikit-learn/scikit-learn/issues/16155 Now that we no longer use sklearn.datasets.load_boston internally, we have to make a decision about what to do with the loader function itself: deprecate it? just silently hide it from our documentation (probably a bad idea)? keep it but educate our users about its ethical problem? Personally, I would be slightly in favor of the latter option and I drafted a short paragraph here: https://github.com/scikit-learn/scikit-learn/pull/18594#issuecomment-707601448 Please feel free to share your thoughts so that we can hopefully make a consensual decision before the 0.24 release. Regards, -- Olivier
Re: [scikit-learn] scikit-learn monthly meeting September 28th 2020
Shall we start rolling meetings with a switch between 2 or 3 time slots? -- Olivier
Re: [scikit-learn] climate friendly software licence
Hi Sole, I personally support climate change actions very much and I am convinced climate change is the number 1 challenge of our time. In an attempt to act in a consistent way with that belief, I declined several times to keynote at conferences either organized by the fossil fuel industry or that would have required me to fly a long distance to give a presentation. However, I don't think software licensing is the right tool to advance this cause. How would we enforce it? What would happen if we don't enforce it? Who is "we", especially when our library is embedded in third-party software products and the end-users are not necessarily aware of all the upstream dependencies? What about gray cases, e.g. a company that does not directly extract fossil fuels per se but works as a consultancy with a majority of customers in the fossil fuel extraction industry? What if a significant part of their consultancy is to help them detect methane leaks in satellite data? How would we audit this? With which resources? How would we get a consensual decision on those gray cases? What about the hypocrisy of using or contributing to software under that license while regularly using fossil fuel powered transportation or working or living in a building heated with fossil fuels? Or buying goods transported this way over long distances? Instead, I would rather encourage everyone to vote for legislators and governments that progressively set bans on the development and commercialization of fossil fuel based technologies and to voice their support for such legislation in public debates. I encourage everybody to look twice before accepting to work for a company involved in fossil fuel extraction one way or another or involved in fossil-fuel intensive activities. -- Olivier
Re: [scikit-learn] ANN scikit-learn 0.23.0 release
Congrats on the release! And thank you very much to all those who were involved in making it happen (and Adrin in particular)! -- Olivier
Re: [scikit-learn] Monthly meetings
I get a message for an invalid meeting id. -- Olivier
[scikit-learn] scikit-learn 0.22.1 is out!
This is a minor release that includes many bug fixes and solves a number of packaging issues with Windows wheels in particular. Here is the full changelog: https://scikit-learn.org/stable/whats_new/v0.22.html#version-0-22-1 The conda package will follow soon (hopefully). Thank you very much to all who contributed to this release! Cheers and happy new year! -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [scikit-learn] Paris Sprint in January and wiki update
Indeed I do not see the "circle add" button in the tweetdeck UI anymore. But it's ok not to prepare the threads before tweeting the first tweet. We can build the thread progressively by publishing the first tweet and then replying one tweet after the other by hitting the reply button of the last published tweet in the thread.
Re: [scikit-learn] scikit-learn twitter account
Ok the twitter accounts are now switched: https://twitter.com/scikit_learn/status/1201794032650932224 The notifications for commits pushed to master are live: https://twitter.com/sklearn_commits Ready for the release :) -- Olivier
Re: [scikit-learn] scikit-learn twitter account
Alright, I have configured the new github action for the tweets on @sklearn_commits: https://github.com/scikit-learn/scikit-learn/pull/15758 I tested it from my repo and it worked fine (I deleted the test tweet though). We can do the switch as soon as this PR is merged. -- Olivier
Re: [scikit-learn] scikit-learn twitter account
It might actually be possible to use github actions with https://github.com/xorilog/twitter-action for instance. I will give it a try with a test repo. -- Olivier
Re: [scikit-learn] scikit-learn twitter account
Alright, it seems that I can create twitter apps (and generate API tokens) for the @sklearn_commits account. However, https://github.com/filearts/tweethook does not work as it relies on a third party webtask.io service that does not accept any new subscription... I am looking for an alternative way to do this but I am not sure how to do so...
Re: [scikit-learn] scikit-learn twitter account
I have created the https://twitter.com/sklearn_commits twitter account. I have applied to make this account a "Twitter Developer" account to be able to use https://github.com/filearts/tweethook to register it as a webhook for the main scikit-learn github repo. Once ready, I will remove the old webhook currently registered on @scikit_learn account and would like to tweet about the transfer as drafted here: https://hackmd.io/@4rHCRgfySZSdd5eMtfUJiA/H1CSpuF2S/edit Please feel free to let me know if you have any comment / suggestion about this plan. -- Olivier
Re: [scikit-learn] scikit-learn twitter account
On Fri, Nov 22, 2019 at 17:24, Gael Varoquaux wrote: > > > I would like to create @sklearn_commits instead of > > @scikit_learn_commits that is too long to my taste. Any opinion? > > Some people do not make the link between "sklearn" and "scikit-learn" :) People who are likely to follow a twitter account that automatically tweets github commits from the scikit-learn github repo are likely to know about "sklearn". And as you said the bio / full name can be more explicit. The main twitter account with general announcements stays @scikit_learn. -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [scikit-learn] scikit-learn twitter account
Ok, I have sent some invites. I would like to create @sklearn_commits instead of @scikit_learn_commits, which is too long for my taste. Any opinion? -- Olivier
Re: [scikit-learn] scikit-learn twitter account
Thanks Tom, let me try to configure this. -- Olivier
Re: [scikit-learn] scikit-learn twitter account
I am not sure who has the rights to manage the twitter account. I just sent a password reset request to "sc**@a..***" I suspect that this is Andreas but I am not so sure.
Re: [scikit-learn] scikit-learn twitter account
On Fri, Nov 15, 2019 at 17:31, Nicolas Hug wrote: > > What's the status of this? Would be great to have it for the 0.22 release :) ! > +1 and we could also announce / thank / RT new sources of funding (CZI and Fujitsu).
Re: [scikit-learn] scikit-learn twitter account
On Tue, Nov 5, 2019 at 12:46, Gael Varoquaux wrote: > > On Mon, Nov 04, 2019 at 10:14:26PM -0700, Andreas Mueller wrote: > > Should we re-purpose the existing twitter account or make a new one? > > https://twitter.com/scikit_learn > > I think that we should repurpose it: > > - Make a "scikit-learn-commits" twitter account that does what the > current one does > - Use the current one. +1 -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [scikit-learn] Monthly meetings between core developers
I just found this planner to give it a try: https://www.timeanddate.com/worldclock/meetingtime.html?day=29=7=2019=240=33=37=179=0 (Berlin and Paris are in the same timezone so I only put Berlin). It's going to be challenging to find a timeslot for everybody. The least extreme timeslot for everybody to attend at the same time would be: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019=7=29=11=0=0=240=33=37=179 We could also arrange for a second timeslot later (that would be Tuesday morning in Australia and China): https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019=7=29=21=0=0=240=33=37=179 I wouldn't mind doing a meeting around 11pm on Monday evening from time to time but it would still be very early for Beijing. Just to let you know, I will be off from next Saturday till Monday August 19 (big summer break :) so don't count on me for the first meeting if you start the meetings in the meantime. On Thu, Jul 18, 2019 at 00:15, Andreas Mueller wrote: > > > > On 7/17/19 2:17 PM, Guillaume Lemaître wrote: > > I am +1. This is a great initiative. > > > > IMO, we could make it really regular (i.e., a specific week-day of a > > specific week in a month), with a rolling time (for the time-zone issue). > > In this matter, we could maybe clear more in advance our agenda > > instead of trying to find a date which accommodates everyone. > > > I agree, we could do something like the last Monday every month and > alternate between two (or three) different time zones. > We have CET (UTC+1), EST (UTC-5), CT (UTC+08), AEDT (UTC+11) so that > seems super easy, right? > (TIL CST can stand for "Central"/US, China, and Cuba! not confusing at all) > > I agree that we should be as inclusive as possible, but I also don't > want to create the expectation that some people (not thinking of any > Australian in particular) > who already sacrifice a lot of their free time have to invest even more > time to keep up with the rest. 
> > I think the idea of posting write-ups will help being more inclusive in > that regard. -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
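The "last Monday every month, alternating between time slots" idea from the quoted message can be sketched with the standard library alone. The 11:00 and 21:00 UTC start hours below are hypothetical placeholders rather than agreed meeting times, and the function names are made up for illustration.

```python
import calendar
from datetime import date, datetime, time, timezone


def last_monday(year, month):
    """Return the date of the last Monday of the given month."""
    # Each week is a list of 7 day numbers starting on Monday by default;
    # days that fall outside the month are reported as 0.
    weeks = calendar.monthcalendar(year, month)
    day = max(week[calendar.MONDAY] for week in weeks)
    return date(year, month, day)


def rolling_schedule(year, months, utc_hours):
    """Pair each month's last Monday with a start hour taken in rotation."""
    meetings = []
    for i, month in enumerate(months):
        hour = utc_hours[i % len(utc_hours)]
        day = last_monday(year, month)
        meetings.append(datetime.combine(day, time(hour), tzinfo=timezone.utc))
    return meetings


# Hypothetical slots: 11:00 and 21:00 UTC, alternating month to month.
for meeting in rolling_schedule(2019, [7, 8, 9], utc_hours=[11, 21]):
    print(meeting.isoformat())
```

Keeping the schedule in UTC and converting to local times only for display avoids the ambiguity of abbreviations such as CST mentioned in the thread.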
Re: [scikit-learn] Monthly meetings between core developers
On Thu, Jul 18, 2019 at 08:29, Adrin wrote: > > BTW, where was the meeting for last Monday organized? I don't think I knew it > was happening. I do not understand what you are referring to. My email was about the organization of future meetings as suggested by Andreas.
[scikit-learn] New core developer: jeremiedbb
The core developers of Scikit-learn have recently voted to welcome Jérémie Du Boisberranger to the team, in recognition of his efforts and trustworthiness as contributor. Jérémie works at Inria Saclay and is supported by the scikit-learn initiative at Fondation Inria and its partners. Congratulations and welcome to the team Jérémie! -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [scikit-learn] Scikit Learn in a Cray computer
You have to use a dedicated framework to distribute the computation on a cluster like your Cray system. You can use mpi, or dask with dask-jobqueue, but you also need to run parallel algorithms that remain efficient in a distributed setting with a high cost of communication between distributed worker nodes. I am not sure that the dbscan implementation in scikit-learn would benefit much from naively running in distributed mode. On Fri, Jun 28, 2019 at 22:06, Mauricio Reis wrote: > Sorry, but just now I reread your answer more closely. > > It seems that the "n_jobs" parameter of the DBScan routine brings no > benefit to performance. If I want to improve the performance of the > DBScan routine I will have to redesign the solution to use MPI > resources. > > Is it correct? > > --- > Regards, > Mauricio Reis > > On 28/06/2019 16:47, Mauricio Reis wrote: > > My laptop has Intel I7 processor with 4 cores. When I run the program > > on Windows 10, the "joblib.cpu_count()" routine returns "4". In these > > cases, the same test I did on the Cray computer caused a 10% increase > > in the processing time of the DBScan routine when I used the "n_jobs = > > 4" parameter compared to the processing time of that routine without > > this parameter. Do you know what is the cause of the longer processing > > time when I use "n_jobs = 4" on my laptop? > > > > --- > > Regards, > > Mauricio Reis > > > > On 28/06/2019 06:29, Brown J.B. via scikit-learn wrote: > >>> where you can see "ncpus = 1" (I still do not know why 4 lines were > >>> printed - > >>> > >>> (total of 40 nodes) and each node has 1 CPU and 1 GPU! > >> > >>> #PBS -l select=1:ncpus=8:mpiprocs=8 > >>> aprun -n 4 p.sh ./ncpus.py > >> > >> You can request 8 CPUs from a job scheduler, but if each node the > >> script runs on contains only one virtual/physical core, then > >> cpu_count() will return 1. > >> If that CPU supports multi-threading, you would typically get 2. 
> >> > >> For example, on my workstation: > >> `--> egrep "processor|model name|core id" /proc/cpuinfo > >> processor : 0 > >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz > >> core id : 0 > >> processor : 1 > >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz > >> core id : 1 > >> processor : 2 > >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz > >> core id : 0 > >> processor : 3 > >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz > >> core id : 1 > >> `--> python3 -c "from sklearn.externals import joblib; > >> print(joblib.cpu_count())" > >> 4 > >> > >> It seems that in this situation, if you're wanting to parallelize > >> *independent* sklearn calculations (e.g., changing dataset or random > >> seed), you'll ask for the MPI by PBS processes like you have, but > >> you'll need to place the sklearn computations in a function and then > >> take care of distributing that function call across the MPI processes. > >> > >> Then again, if the runs are independent, it's a lot easier to write a > >> for loop in a shell script that changes the dataset/seed and submits > >> it to the job scheduler to let the job handler take care of the > >> parallel distribution. > >> (I do this when performing 10+ independent runs of sklearn modeling, > >> where models use multiple threads during calculations; in my case, > >> SLURM then takes care of finding the available nodes to distribute the > >> work to.) > >> > >> Hope this helps. > >> J.B.
Re: [scikit-learn] Scikit Learn in a Cray computer
How many cores do you have on this machine? joblib.cpu_count()
Re: [scikit-learn] [Copyright] Skicit-learn graphic
I think it's ok to do as you said. -- Olivier
Re: [scikit-learn] Release Candidate for Scikit-learn 0.21
\o/
Re: [scikit-learn] VOTE: scikit-learn governance document
+1
Re: [scikit-learn] Sprint discussion points?
I would also add generalizing early stopping options to most estimators. This is a bit related to Joel's point on max_iter consistency in LogisticRegression. -- Olivier
Re: [scikit-learn] Next Sprint
Ok for me. The last 3 weeks of February are fine for me.

On Thu, Dec 20, 2018 at 21:21, Alexandre Gramfort <alexandre.gramf...@inria.fr> wrote:
> ok for me
>
> Alex
>
> On Thu, Dec 20, 2018 at 8:35 PM Adrin wrote:
> >
> > It'll be the least favourable week of February for me, but I can make do.
> >
> > On Thu, 20 Dec 2018 at 18:45 Andreas Mueller wrote:
> >>
> >> Works for me!
> >>
> >> On 12/19/18 5:33 PM, Gael Varoquaux wrote:
> >> > I would propose the week of Feb 25th, as I heard people say that they
> >> > might be available at this time. It is good for many people, or should we
> >> > organize a doodle?
> >> >
> >> > G
> >> >
> >> > On Wed, Dec 19, 2018 at 05:27:21PM -0500, Andreas Mueller wrote:
> >> >> Can we please nail down dates for a sprint?
> >> >> On 11/20/18 2:25 PM, Gael Varoquaux wrote:
> >> >>> On Tue, Nov 20, 2018 at 08:15:07PM +0100, Olivier Grisel wrote:
> >> >>>> We can also do Paris in April / May or June if that's ok with Joel and better
> >> >>>> for Andreas.
> >> >>> Absolutely.
> >> >>> My thoughts here are that I want to minimize transportation, partly
> >> >>> because flying has a large carbon footprint. Also, for personal reasons,
> >> >>> I am not sure that I will be able to make it to Austin in July, but I
> >> >>> realize that this is a pretty bad argument.
> >> >>> We're happy to try to host in Paris whenever it's most convenient and to
> >> >>> try to help with travel for those not in Paris.
> >> >>> Gaël
Re: [scikit-learn] MLPClassifier on WIndows 10 is 4 times slower than that on macOS?
You should probably just "conda update scikit-learn": scikit-learn 0.20.1 is available on the official anaconda channel for all supported operating systems: https://anaconda.org/anaconda/scikit-learn -- Olivier
Re: [scikit-learn] Difference between linear model and tree-based regressor?
They are very different statistical models from a mathematical point of view. See the online scikit-learn documentation or reference textbooks such as "Elements of Statistical Learning" for more details.

In practice, linear models tend to be faster to fit on large data, especially when the number of features is large (although it depends on the solver, loss, penalty, data scaling...). By definition, a linear model cannot fit a prediction task when the data is not linearly separable, while tree-based models do not have this restriction. Tree-based models can still underfit in some cases but for different reasons (e.g. when we limit the depth of the trees).

Linear models can be made more expressive via feature engineering (e.g. k-bins discretizer, polynomial feature expansion, Nystroem kernel approximation...) and thereafter sometimes be competitive with tree-based models even on tasks that were originally not linearly separable. However this is not guaranteed either. Cross-validation and parameter tuning are still required to tell which class of model works best for a specific task.

As you said, tree-based models "cannot extrapolate" in the sense that their decision function is piecewise constant while the decision function of a linear model is a hyperplane. Depending on the task, the lack of extrapolation can be considered either a limitation or a benefit (for instance to avoid unrealistic extrapolations like people with a negative age or size, predicting negative mechanical energy loss via heat dissipation, fractions that are larger than 100%, 6 stars out of 5 recommendations...).

-- Olivier
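To make the extrapolation point concrete, here is a small sketch on made-up data (the toy relation y = 2 * x and all variable names are assumptions for illustration only). Both models are fit on x in [0, 10] and then queried at x = 20:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Toy data (assumption for illustration): y = 2 * x observed for x in [0, 10].
X_train = np.arange(0, 11, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()

linear = LinearRegression().fit(X_train, y_train)
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Query a point far outside the training range.
X_new = np.array([[20.0]])
linear_pred = linear.predict(X_new)[0]
tree_pred = tree.predict(X_new)[0]

print("linear:", linear_pred)  # follows the fitted line beyond the data
print("tree:  ", tree_pred)    # a leaf value, so never above max(y_train)
```

The tree prediction is the value of one of its leaves, so it cannot exceed the largest target seen during training, while the linear model keeps following its fitted line.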
Re: [scikit-learn] New core dev: Adrin Jalali
Congrats and welcome Adrin! -- Olivier
Re: [scikit-learn] benchmarking TargetEncoder Was: ANN Dirty_cat: learning on dirty categories
Maybe a subset of the criteo TB dataset?
Re: [scikit-learn] Next Sprint
We can also do Paris in April / May or June if that's ok with Joel and better for Andreas. I am teaching on Fridays from end of January to March. But I can miss half a day of sprint to teach my class. -- Olivier
Re: [scikit-learn] Random Forest Regressor -- Implementation in C++
You might also want to have a look at https://github.com/onnx/onnxmltools although I am not sure if there are RF optimized ONNX runtimes at this point. -- Olivier
Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
> I think model serialization should be a priority.

There is also the ONNX specification that is gaining industrial adoption and that already includes open source exporters for several families of scikit-learn models: https://github.com/onnx/onnxmltools -- Olivier
Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
On Wed, Sep 26, 2018 at 23:02, Joel Nothman wrote:
> And for those interested in what's in the pipeline, we are trying to draft
> a roadmap...
> https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018
>
> But there are no doubt many features that are absent there too.

Indeed, it would be great to get some feedback on this roadmap from heavy scikit-learn users: which points do you think are the most important? What is missing from this roadmap? Feel free to reply to this thread. -- Olivier
Re: [scikit-learn] [ANN] Scikit-learn 0.20.0
Joy !
Re: [scikit-learn] Bootstrapping in sklearn
I believe it would fit in sklearn-contrib even if it's more for statistical inference rather than machine learning style prediction. Others might disagree. Anyways, joining efforts to improve documentation, CI, testing and so on is always a good thing for your future users. -- Olivier
Re: [scikit-learn] Bootstrapping in sklearn
This looks like a very useful project. There is also scikits-bootstrap [1]. Personally I prefer the flat package namespace of resample (I am not a fan of the 'scikits' namespace package) but I still think it would be great to contact the author to know if he would be interested in joining efforts.

What is currently lacking in both projects is good sphinx-based documentation that explains, in a couple of paragraphs with examples, what the different non-parametric inference methods are and what the pros and cons of each of them are (sample complexity, computational complexity, kinds of inference, bias, theoretical asymptotic results, practical discrepancies observed in the finite sample setting, assumptions made on the distribution of the data...). Ideally the doc would link to examples (using sphinx-gallery) that highlight the behavior of the tools in both nominal and pathological cases.

[1] https://github.com/cgevans/scikits-bootstrap

-- Olivier
[scikit-learn] New core dev: Joris Van den Bossche
Hi everyone! Let's welcome Joris Van den Bossche (@jorisvdbossche) officially as a scikit-learn core developer! Joris is one of the maintainers of the pandas project and recently contributed many new great PRs to scikit-learn (notably the ColumnTransformer and a refactoring of the categorical variable preprocessing tools). Cheers! -- Olivier
Re: [scikit-learn] Announcing modAL: a modular active learning framework
It looks nice, thanks for sharing. Do you plan to couple the active learner with a UX-optimized labeling interface (for instance with a react.js or similar frontend and a flask or similar backend)? -- Olivier
Re: [scikit-learn] clustering on big dataset
Have you had a look at BIRCH? http://scikit-learn.org/stable/modules/clustering.html#birch -- Olivier
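As a rough sketch of how BIRCH can be fed a large dataset in chunks, assuming synthetic blob data in place of the real dataset (the chunk count, `threshold` and `n_clusters` values are illustrative guesses, not tuned recommendations):

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic stand-in for a dataset too large to process in one go.
X, _ = make_blobs(n_samples=3000, centers=3, cluster_std=0.5, random_state=0)

model = Birch(threshold=0.5, n_clusters=3)

# Feed the data chunk by chunk; only the compact CF-tree summary is kept.
for chunk in np.array_split(X, 6):
    model.partial_fit(chunk)

# Assign every sample to one of the final clusters.
labels = model.predict(X)
print(np.bincount(labels))
```

In a real out-of-core setting each chunk would be loaded from disk inside the loop instead of being sliced from an in-memory array.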
Re: [scikit-learn] Announcing sklearn-xarray
Interesting project! BTW, do you know about dask-ml [1]?

It might be interesting to think about generalizing the input validation of fit and predict / transform as a private method of the BaseEstimator class instead of directly calling into sklearn.utils.validation functions, so as to make it easier for third-party projects such as sklearn-xarray and dask-ml to subclass and override those methods to allow for specific input data structures without converting everything to a numpy array.

[1] https://github.com/dask/dask-ml

2017-12-04 15:21 GMT+01:00 Peter Hausamann:
> Hi all,
>
> I'd like to announce *sklearn-xarray*, a new package that provides a
> scikit-learn interface for xarray users. For those not familiar with xarray
> (http://xarray.pydata.org), it is a "pandas-like and pandas-compatible
> toolkit for analytics on multi-dimensional arrays".
>
> The package makes it possible to apply sklearn estimators to xarray
> DataArrays and Datasets while keeping the labels (called coordinates in
> xarray) intact wherever possible.
>
> You can install the package via pip:
>
> pip install sklearn-xarray
>
> To get started, you can:
>
> - read the documentation: https://phausamann.github.io/sklearn-xarray
>   and
> - check out the repository: https://github.com/phausamann/sklearn-xarray
>
> Note that the package is still in a very early development stage and there
> will probably be some major API changes in upcoming releases. Most notably,
> I'd like to replicate the complete sklearn module structure at some point
> by decorating all available estimators with the necessary wrappers.
>
> Feedback of any kind is appreciated.
>
> Peter

-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [scikit-learn] Error while running 'python setup.py build_ext --inplace'
Maybe update your version of Cython? -- Olivier
Re: [scikit-learn] Rapid Outlier Detection via Sampling
> Do I need to write object oriented or are functions also ok?

If you want to contribute an implementation as a new project on scikit-learn contrib, you should be careful to follow the scikit-learn estimators API: http://scikit-learn.org/dev/developers/contributing.html#apis-of-scikit-learn-objects

For outlier detection in particular, you should make sure your new estimator is consistent with the API conventions of other methods already in scikit-learn: http://scikit-learn.org/dev/modules/outlier_detection.html

One of the primary goals of the scikit-learn ecosystem is to provide a simple homogeneous API to a very heterogeneous set of methods. -- Olivier
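To give an idea of what those conventions look like, here is a minimal sketch of an outlier detector class. The class name `DistanceToMeanDetector` and its distance-to-mean rule are hypothetical and only stand in for the actual algorithm (this is not the sampling-based method discussed in this thread). It assumes a scikit-learn version that provides `OutlierMixin`.

```python
import numpy as np
from sklearn.base import BaseEstimator, OutlierMixin


class DistanceToMeanDetector(OutlierMixin, BaseEstimator):
    """Hypothetical detector: flags points far from the training mean."""

    def __init__(self, contamination=0.1):
        # Hyper-parameters are stored unchanged in __init__.
        self.contamination = contamination

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned attributes end with an underscore.
        self.mean_ = X.mean(axis=0)
        distances = np.linalg.norm(X - self.mean_, axis=1)
        self.threshold_ = np.quantile(distances, 1 - self.contamination)
        return self

    def decision_function(self, X):
        # Convention: negative values for outliers, positive for inliers.
        X = np.asarray(X, dtype=float)
        return self.threshold_ - np.linalg.norm(X - self.mean_, axis=1)

    def predict(self, X):
        # Convention: +1 for inliers, -1 for outliers.
        return np.where(self.decision_function(X) >= 0, 1, -1)


rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(100, 2)), [[10.0, 10.0]]])

detector = DistanceToMeanDetector(contamination=0.05)
labels = detector.fit_predict(X)  # fit_predict is provided by OutlierMixin
print(labels[-1])
```

Plain functions would not plug into pipelines, cloning or model selection tools, which is the main reason for wrapping the method in a class like this.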
Re: [scikit-learn] New core devs: Hanmin Qin, Guillaume Lemaître, and Roman Yurchak
Congrats to all three of you! Thank you very much for your contributions and in particular in reviewing contributions by others. -- Olivier
Re: [scikit-learn] scikit-learn-commits mailing list defunct?
+1 for python.org if they accept this kind of mailing lists.
Re: [scikit-learn] scikit-learn-commits mailing list defunct?
+1
[scikit-learn] scikit-learn 0.19.0 is out!
Grab it with pip or conda ! Quoting the release highlights from the website: We are excited to release a number of great new features including neighbors.LocalOutlierFactor for anomaly detection, preprocessing.QuantileTransformer for robust feature transformation, and the multioutput.ClassifierChain meta-estimator to simply account for dependencies between classes in multilabel problems. We have some new algorithms in existing estimators, such as multiplicative update in decomposition.NMF and multinomial linear_model.LogisticRegression with L1 loss (use solver='saga'). Cross validation is now able to return the results from multiple metric evaluations. The new model_selection.cross_validate can return many scores on the test data as well as training set performance and timings, and we have extended the scoring and refit parameters for grid/randomized search to handle multiple metrics. You can also learn faster. For instance, the new option to cache transformations in pipeline.Pipeline makes grid search over pipelines including slow transformations much more efficient. And you can predict faster: if you’re sure you know what you’re doing, you can turn off validating that the input is finite using config_context. We’ve made some important fixes too. We’ve fixed a longstanding implementation error in metrics.average_precision_score, so please be cautious with prior results reported from that function. A number of errors in the manifold.TSNE implementation have been fixed, particularly in the default Barnes-Hut approximation. semi_supervised.LabelSpreading and semi_supervised.LabelPropagation have had substantial fixes. LabelPropagation was previously broken. LabelSpreading should now correctly respect its alpha parameter. Please see the full changelog at: http://scikit-learn.org/0.19/whats_new.html#version-0-19 Notably some models have changed behaviors (bug fixes) and some methods or parameters part of the public API have been deprecated. 
A big thank you to anyone who made this release possible and Joel in particular. -- Olivier
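As a quick illustration of the multiple-metric evaluation mentioned in the highlights, here is a sketch using `model_selection.cross_validate`. The dataset, estimator and metric names are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)

# Several scorers are evaluated on each split in a single call.
results = cross_validate(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=5,
    scoring=["accuracy", "f1_macro"],
    return_train_score=True,
)

# One array of length cv per entry: timings, test scores and train scores.
for name in sorted(results):
    print(name, results[name].mean())
```

The returned dictionary contains `fit_time` and `score_time` along with one `test_<metric>` and one `train_<metric>` entry per requested metric.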
Re: [scikit-learn] Truncated svd not working for complex matrices
I have no idea whether the randomized SVD method is supposed to work for complex data or not (from a mathematical point of view). I think that all scikit-learn estimators assume real data (or integer data for class labels) and our input validation utilities will cast numeric values to float64 by default. This might be the cause of your problem. Have a look at the source code to confirm. The reference to the paper can also be found in the docstring of those functions. -- Olivier
Re: [scikit-learn] Extra trees tuning parameters
I believe so even though it's always better to check in the code to see how this parameter is actually used. -- Olivier
[scikit-learn] scikit-learn 0.19b2 is available for testing
The new release is coming and we are seeking feedback from beta testers!

pip install scikit-learn==0.19b2

conda-forge packages should follow in the coming hours / days.

Note that many models have changed behaviors and some things have been deprecated, see the full changelog at: http://scikit-learn.org/dev/whats_new.html#version-0-19

As usual please report any regression or other bugs as an issue on github. Thanks to anyone who contributed to the release!

-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [scikit-learn] Which algorithm is used in sklearn SGDClassifier when modified huber loss is used?
The name of the algorithm / model would be "L2-penalized linear model with modified Huber loss trained with Stochastic Gradient Descent". SVM is traditionally used to describe models that use the hinge loss only (or sometimes the squared hinge loss too). Only the log loss leads to a probabilistic linear binary classifier in scikit-learn. -- Olivier
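For concreteness, the model described above corresponds to a configuration along these lines (the synthetic data and the `alpha` value are assumptions for the example):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Linear model, modified Huber loss, L2 penalty, trained with SGD.
clf = SGDClassifier(loss="modified_huber", penalty="l2", alpha=1e-4, random_state=0)
clf.fit(X, y)

# A single weight vector for the binary problem.
print(clf.coef_.shape)
```

Swapping `loss` for the hinge loss is what would make the same estimator a linear SVM trained with SGD.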
Re: [scikit-learn] Typo in online documentation on Matrix Factorization
I think the documentation is correct. U, a.k.a. "the code" or "the activations", has shape (n_samples, n_components) and V, a.k.a. "the dictionary" or "the components", has shape (n_components, n_features) in both cases. We could use n_components uniformly instead of n_atoms for consistency's sake (and just make sure that "components" is a synonym for "dictionary atoms" in the literature). I think V_k is fine because the dimension with size n_components is the first dimension of V.

If you spot issues or other things that are unclear or incomplete in the doc, please feel free to open an issue on github. You can also directly submit a pull request if you are familiar with git. The website is built from the docs that live in the "doc/" subfolder of the repo. -- Olivier
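The shape convention can be checked quickly. The sketch below uses PCA purely as a fast stand-in for the decomposition estimators discussed in that section of the documentation. The names `U` and `V` mirror the notation above and the random data is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
n_samples, n_features, n_components = 50, 8, 3
X = rng.normal(size=(n_samples, n_features))

model = PCA(n_components=n_components)
U = model.fit_transform(X)  # "the code" / "the activations"
V = model.components_       # "the dictionary" / "the components"

print(U.shape, V.shape)
```

Row k of `V` is the k-th component, which is why indexing it as V_k along the first dimension is consistent.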
Re: [scikit-learn] Fwd: [SciPy-User] EuroSciPy 2017 call for contributions - extension of deadline
I am pretty sure this is exactly the kind of presentation that the EuroScipy audience would enjoy. Please submit! -- Olivier
Re: [scikit-learn] Scikit-learn at Data Intelligence this past weekend
Thanks for this report! -- Olivier
Re: [scikit-learn] Agglomerative clustering
You can have a look at the test named "test_agglomerative_clustering" in: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/tests/test_hierarchical.py -- Olivier
[scikit-learn] Scikit-learn workshop and sprint at EuroScipy 2017 in Erlangen
Hi all,

FYI I have just submitted a 90 min tutorial on scikit-learn to the EuroScipy CFP. If anybody is interested in co-teaching / TA-ing this workshop please let me know. I also plan to stay for the one-day sprint to help people make their first contribution to the project. Last year we had great fun and the sprint was very productive.

Registration is now open: https://www.euroscipy.org/2017/

10th European Conference on Python in Science
Location: Erlangen
August 28-29 (Mon, Tue) Tutorials / Workshops
August 30 - 31 (Wed, Thu) Main conference and posters
September 1 (Fri) Sprints

See you in Erlangen!

-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [scikit-learn] XGboost Classifier error
Please provide the full traceback. Without it it's impossible to tell whether the problem is in scikit-learn or xgboost. Also, please provide a minimal reproduction script as explained in: http://scikit-learn.org/stable/faq.html#what-s-the-best-way-to-get-help-on-scikit-learn-usage -- Olivier
Re: [scikit-learn] Logistic regression with elastic net regularization
From a generalization point of view (test accuracy), the optimal sparsity support should not matter much, but it can be helpful to find the sparsest optimal solution, either for computational constraints (smaller models with a lower prediction latency) or for interpretation of the weights (domain specific). -- Olivier
Re: [scikit-learn] Logistic regression with elastic net regularization
Note that SGD is not very good at optimizing finely with a non-smooth penalty (e.g. l1 or elasticnet). The future SAGA solver is going to be much better at finding the optimal sparsity support (although this support is not guaranteed to be stable across re-sampling of the training set if the training set is small). -- Olivier
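For readers finding this thread later: the SAGA solver has since been released as part of LogisticRegression. The sketch below assumes a scikit-learn release that accepts `penalty="elasticnet"` with `l1_ratio` (newer releases may spell these options differently). The data and the `C` value are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
# SAGA converges much faster on features with comparable scales.
X = StandardScaler().fit_transform(X)

clf = LogisticRegression(
    penalty="elasticnet",
    solver="saga",
    l1_ratio=0.5,   # mix between l1 (1.0) and l2 (0.0)
    C=0.1,          # inverse regularization strength
    max_iter=5000,
    random_state=0,
)
clf.fit(X, y)

# Count the coefficients driven exactly to zero by the l1 part.
n_zero = int(np.sum(clf.coef_ == 0))
print(n_zero, "of", clf.coef_.size, "coefficients are exactly zero")
```

Refitting on bootstrap resamples of the training set and comparing which coefficients are zero is one way to probe the stability caveat mentioned above.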
Re: [scikit-learn] GSOC call for mentors
Personally I don't feel like mentoring this year. I would really like to focus my scikit-learn time on finishing the joblib process refactoring with Thomas Moreau and the binning / thread-based parallelization of boosted trees with Guillaume and Raghav. -- Olivier
Re: [scikit-learn] Modelling event rates
I don't think we have any model dedicated to this, but it's possible that expressive non-parametric models such as RF and GBRT or richly parameterized models such as MLP with a regression loss can do a good enough job at giving you a point estimate. -- Olivier
Re: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release
Ideally I would like to get it out before April. Instead of setting up a roadmap, I would rather just identify the bugs that are blockers, fix only those, and not wait for any feature before cutting 0.19.X. -- Olivier
Re: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release
In retrospect, making a small 0.19 release is probably a good idea. I would like to get https://github.com/scikit-learn/scikit-learn/pull/8002 in before cutting the 0.19.X branch. -- Olivier Grisel
[scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release
Hi all,

I think we should release 0.18.2 to get some important fixes out and make it easy to release Python 3.6 wheel packages for all the operating systems using the automated procedure. I identified a couple of PRs to backport to 0.18.X to prepare the 0.18.2 release. Are there any other important recently fixed bugs people would like to see backported in this release? https://github.com/scikit-learn/scikit-learn/milestone/23?closed=1

Best,

-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [scikit-learn] modifying CV score
You can indeed derive from BaseEstimator and implement fit, predict and optionally score. Here is the documentation for the expected estimator API: http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects

As this is a linear regression model, you may also want to have a look at the LinearModel and RegressorMixin base classes for inspiration: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/base.py#L401

Note that the score function should always be "higher is better". The explained variance ratio and negative mean squared error are valid scoring functions for model selection in scikit-learn while raw MSE is not. -- Olivier
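A minimal sketch of what this could look like. The `LeastSquaresRegressor` class is a hypothetical example rather than the model from the original question, and it returns the negative MSE from `score` so that higher is better.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_score


class LeastSquaresRegressor(BaseEstimator):
    """Hypothetical ordinary least squares model with a custom score."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def _design(self, X):
        # Append a constant column when an intercept is requested.
        X = np.asarray(X, dtype=float)
        if self.fit_intercept:
            X = np.hstack([X, np.ones((X.shape[0], 1))])
        return X

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.coef_, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._design(X) @ self.coef_

    def score(self, X, y):
        # Negative MSE: larger (closer to zero) means a better fit.
        return -np.mean((np.asarray(y, dtype=float) - self.predict(X)) ** 2)


rng = np.random.RandomState(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

# With no explicit scoring argument, the estimator's own score method is used.
scores = cross_val_score(LeastSquaresRegressor(), X, y, cv=3)
print(scores)
```

Because the sign is flipped inside `score`, grid search and cross-validation utilities that maximize the score will pick the model with the lowest error.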
Re: [scikit-learn] HashingVectorizer slow in version 0.18
I cannot reproduce such a degradation on my machine:

(sklearn-0.17)ogrisel@is146148:~/code/scikit-learn$ python ~/tmp/bench_vectorizer.py
scikit-learn 0.17.1. Numpy 1.11.2. Python 3.5.0 x86_64
Vectorizing 20newsgroup 11314 documents
Vectorization completed in 4.033604383468628 seconds, resulting shape (11314, 1048576)

(sklearn-0.18) ogrisel@is146148:~/code/scikit-learn$ python ~/tmp/bench_vectorizer.py
scikit-learn 0.18. Numpy 1.11.2. Python 3.5.0 x86_64
Vectorizing 20newsgroup 11314 documents
Vectorization completed in 4.990509510040283 seconds, resulting shape (11314, 1048576)

Which operating system are you using? Please feel free to open an issue on the tracker anyway. -- Olivier
Re: [scikit-learn] Latent Semantic Analysis (LSA) and TrucatedSVD
BTW Roman, the examples in your gist would make a great non-regression test for this new feature. Please feel free to submit a PR. -- Olivier
Re: [scikit-learn] 0.18?
Sorry for the late reply.

Before working on this release I would like to automate the wheel generation process (for the release wheels) in a single repo that will generate wheels for linux, osx and windows based on https://github.com/matthew-brett/multibuild

I plan to put that repo under https://github.com/scikit-learn/scikit-learn-wheels and deprecate https://github.com/MacPython/scikit-learn-wheels that we used for the OSX wheels.

There is also some issue triaging to do: it would be great to identify blocker bugs that we would like to get fixed before releasing 0.18. We can aim to do a beta mid-August and the final release after euroscipy (first week of September). -- Olivier
Re: [scikit-learn] How to test on PYTHON_ARCH=32 with mac?
> I believe this `arch -i386` only works as a prefix for Python.org Python,
> but I'm happy to be corrected.

Then the following should work:

arch -i386 python -c "import nose; nose.main()" sklearn
Re: [scikit-learn] NB-SVM Implementation
I think it could be implemented as a preprocessing step: this is the approach followed by: https://github.com/ryankiros/skip-thoughts/blob/master/eval_classification.py

Note that in that case LogisticRegression is used as the final classifier instead of a squared hinge loss SVM but that should not change much in practice.

If you want to make this approach scikit-learn compatible (to work with the Pipeline and sklearn's model selection tools for instance) be sure to implement the Transformer API as documented here: http://scikit-learn.org/dev/developers/contributing.html#apis-of-scikit-learn-objects

Read the rest of the contributions guide: http://scikit-learn.org/dev/developers

NBSVM is quite recent and might not strictly follow the conditions for inclusion as stated in: http://scikit-learn.org/stable/faq.html#can-i-add-this-new-algorithm-that-i-or-someone-else-just-published

It already has 163 citations though: https://scholar.google.com/scholar?oi=bibs=en=1710642630990759287

As this is a really strong baseline and the model is not complex and should blend well within the scikit-learn API I would be +1 for inclusion in sklearn. -- Olivier
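A rough sketch of the preprocessing-step idea, assuming binary labels encoded as 0 / 1 and sparse count features. The `NBLogCountRatio` class name, the smoothing default and the toy corpus are all made up for illustration, and the interpolation step of the NBSVM paper is left out.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


class NBLogCountRatio(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: scale features by the NB log-count ratio."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        y = np.asarray(y)
        # Smoothed per-feature counts for the positive and negative class.
        p = self.alpha + np.asarray(X[y == 1].sum(axis=0)).ravel()
        q = self.alpha + np.asarray(X[y == 0].sum(axis=0)).ravel()
        # Log ratio of the L1-normalized count vectors.
        self.r_ = np.log((p / p.sum()) / (q / q.sum()))
        return self

    def transform(self, X):
        # Multiply every (sparse) row element-wise by the ratio vector.
        return X.multiply(self.r_).tocsr()


docs = [
    "good movie", "good fun film", "great good plot",
    "bad movie", "bad boring film", "awful bad plot",
]
labels = np.array([1, 1, 1, 0, 0, 0])

pipe = make_pipeline(
    CountVectorizer(binary=True), NBLogCountRatio(), LogisticRegression()
)
pipe.fit(docs, labels)

vocabulary = pipe[0].vocabulary_
ratios = pipe[1].r_
print(ratios[vocabulary["good"]], ratios[vocabulary["bad"]])
print(pipe.predict(docs))
```

Because the ratio is computed inside `fit`, it is re-estimated on each training fold when the pipeline is passed to cross-validation or grid search, which avoids leaking test labels into the features.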