Re: [scikit-learn] [ANN] scikit-learn 1.2.0rc1 is online!

2022-11-29 Thread Olivier Grisel
Thanks Jeremie for pushing this release out!

Now is the time to test downstream projects against this to make sure
it will not break too many things when we publish the 1.2.0 final
release in a week or two!
--
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] [ANN] scikit-learn 1.1.3 is online!

2022-10-30 Thread Olivier Grisel
Thank you so much Guillaume for getting this release out and to Chiara
for pushing forward with the Python 3.11 wheel building infrastructure
update and related fixes!

-- 
Olivier


Re: [scikit-learn] [ANN] scikit-learn 1.1.1 is online!

2022-05-19 Thread Olivier Grisel
BTW, this is now the stable version, so the URL
https://scikit-learn.org/stable/whats_new/v1.1.html#version-1-1-1 also
works :)


Re: [scikit-learn] [ANN] scikit-learn 1.1.1 is online!

2022-05-19 Thread Olivier Grisel
Thank you to all the contributors who reported bugs and provided minimal
reproducers and fixes, and thank you Guillaume for getting this bugfix
release out in such a timely manner \o/

-- 
Olivier


Re: [scikit-learn] Experience with black formatting in scikit-learn for astropy

2022-05-19 Thread Olivier Grisel
I agree with Guillaume's answers.

I think it was a net benefit, even though it might be a bit annoying
to get the tooling right for first time contributors. We can probably
improve this by making the error messages on the CI more directive on
how to fix formatting issues by giving copy-pastable commands to
install and run black in your branch.

Otherwise, I really like just pressing shift-ctrl-i to fix the
formatting when editing code in VS Code.

-- 
Olivier


Re: [scikit-learn] [ANN] scikit-learn 1.1 release

2022-05-12 Thread Olivier Grisel
Congrats Jeremie and everybody who contributed to this release! This
is a great achievement.

-- 
Olivier


Re: [scikit-learn] [ANN] scikit-learn 1.1.0rc1 is online!

2022-04-28 Thread Olivier Grisel
Thanks Jeremie for leading the efforts to get this release out!

-- 
Olivier


Re: [scikit-learn] scikit-learn 1 - pytest - multiprocessing Pool - hangs?

2021-12-09 Thread Olivier Grisel
Maybe you can try to use faulthandler.dump_traceback_later
https://docs.python.org/3/library/faulthandler.html#faulthandler.dump_traceback_later
to get a traceback of all the threads of the main process.
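
A minimal sketch of that suggestion, using only the standard library. The
helper name dump_tracebacks_if_stuck and the 30 second timeout are made up
for illustration; only faulthandler.dump_traceback_later and
faulthandler.cancel_dump_traceback_later come from the Python documentation
linked above.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def dump_tracebacks_if_stuck(timeout_seconds, file=None):
    """Dump the traceback of all threads if the block exceeds the timeout.

    Hypothetical helper for illustration. `file` must expose a real file
    descriptor; it defaults to the current sys.stderr.
    """
    target = sys.stderr if file is None else file
    # Arm a watchdog thread: after `timeout_seconds` it writes the traceback
    # of every thread to `target` without stopping the process (exit=False).
    faulthandler.dump_traceback_later(timeout_seconds, exit=False, file=target)
    try:
        yield
    finally:
        # Disarm the watchdog when the block finishes (or raises) in time.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    with dump_tracebacks_if_stuck(30):
        sum(range(1000))  # stand-in for the call that appears to hang
```

Wrapping only the call that freezes (here, the fit call under test) should
keep the dump focused on the moment of the hang.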

But the fact that you are using the default `p =
multiprocessing.Pool()` makes me think that it might be related to the
lack of fork-safety of the OpenMP runtime library of GCC (libgomp)
[1]. There are several ways to check this:

- print the output of threadpoolctl.threadpool_info() before calling
the code that freezes to confirm (or not) that the libgomp runtime has
been loaded before creating the MP Pool.
- use a multiprocessing Pool with a forkserver context instead of the
default fork context: multiprocessing.get_context("forkserver").Pool()
- alternatively, use loky.get_reusable_executor() instead of
multiprocessing.Pool() (with a slightly different API)
- alternatively, use joblib, which uses loky internally, with an even
more different API.
- alternatively, recompile scikit-learn from source with clang instead
of gcc so as to link scikit-learn to llvm-openmp instead of gcc's
libgomp runtime. llvm-openmp is fork-safe.
- alternatively, install scikit-learn from conda-forge (conda install
-c conda-forge scikit-learn) as the conda-forge distribution relinks
all OpenMP compiled extensions of its packaged libraries to
llvm-openmp transparently at install time, even if they were built
with GCC (maybe we should do that for our linux wheels).

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2014-02/msg00979.html
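
The forkserver option from the list above can be sketched as follows. This is
an illustration under assumptions, not a drop-in fix: the helper name
get_fork_safe_context is hypothetical, the fallback to "spawn" is an addition
for platforms without forkserver, and math.sqrt merely stands in for the real
work function.

```python
import math
import multiprocessing


def get_fork_safe_context():
    """Return a multiprocessing context that avoids a plain fork().

    Hypothetical helper for illustration.
    """
    methods = multiprocessing.get_all_start_methods()
    # "forkserver" is not available on every platform (e.g. Windows),
    # so fall back to "spawn", which also avoids forking the parent.
    method = "forkserver" if "forkserver" in methods else "spawn"
    return multiprocessing.get_context(method)


if __name__ == "__main__":
    ctx = get_fork_safe_context()
    # Workers are not forked from this process, so they do not inherit the
    # state of an OpenMP runtime that was already initialized here.
    with ctx.Pool(processes=2) as pool:
        print(pool.map(math.sqrt, [1.0, 4.0, 9.0]))
```

Unlike the default fork context, this requires the mapped function and its
arguments to be picklable, and the pool creation to sit behind the
`if __name__ == "__main__":` guard when run as a script.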

If that does not work or you need more help, please feel free to open an
issue with a minimal reproducer and ping me on gitter or discord.

On Thu, Dec 9, 2021 at 05:59, Norbert Preining wrote:
>
> Dear all,
>
> I am trying to track down a strange behaviour in one of our (Fujitsu)
> libraries that we are planning to open source. In preparation for that, I am
> trying to bring it into a state that it works with scikit-learn >= 1.
>
> However, some of our tests fail when running in parallel mode, and they
> only fail when running under pytest, NOT when running under python.
>
> The library code contains
>
> def fit(self, X, y=None):
>     ...
>     p = multiprocessing.Pool()
>     ret = _reduce(
>         p.map())
>
> Now what happens is that with scikit-learn 1(.0.1), the code hangs
> forever. I adjusted the code also so that the pool definition is not in
> the fit function, but in the __init__ function, and saved into self, but
> that didn't help either.
>
> When interrupted, pytest gives:
>
>  
> KeyboardInterrupt
> /home/norbert/.pyenv/versions/3.9.6/lib/python3.9/threading.py:312: KeyboardInterrupt
> (to show a full traceback on KeyboardInterrupt use --full-trace)
> 1 passed, 2 warnings in 273.84s (0:04:33)
> Exception ignored in:
> Traceback (most recent call last):
>   File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/pool.py", line 268, in __del__
>     self._change_notifier.put(None)
>   File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/queues.py", line 378, in put
>     self._writer.send_bytes(obj)
>   File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes
>     self._send_bytes(m[offset:offset + size])
>   File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
>     self._send(header + buf)
>   File "/home/norbert/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/connection.py", line 373, in _send
>     n = write(self._handle, buf)
>
>
> While when running under python testfile.py all goes well.
>
>
> I have tested the following combinations:
> * scikit-learn 0.23.*, python 3.8 and python 3.9 => works
> * scikit-learn 0.24.*, python 3.8 and python 3.9 => works
> * scikit-learn 1.0.1,  python 3.8 and python 3.9 => fails
>
> I don't really understand where scikit-learn comes into the play here,
> so I wanted to ask whether someone here has an idea.
>
> Thanks for any suggestion
>
>
> Norbert
>
> --
> PREINING Norbert  https://www.preining.info
> Fujitsu Research  +  IFMGA Guide  +  TU Wien  +  TeX Live  + Debian Dev
> GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13



-- 
Olivier


Re: [scikit-learn] scikit-learn office hours on Friday Oct. 8 2021

2021-10-08 Thread Olivier Grisel
To summarize, the office hours for today are:

- 15:00-16:00 UTC / 17:00-18:00 CEST (this one starts in less than 10min)
- 18:00-19:00 UTC / 20:00-21:00 CEST (with Guillaume)

Sorry for the confusion and see you soon.

-- 
Olivier


[scikit-learn] scikit-learn office hours on Friday Oct. 8 2021

2021-10-06 Thread Olivier Grisel
Hi all,

Some of us will be online on the scikit-learn discord this Friday at
15:00 UTC and 20:00 UTC.

First time and occasional contributors are welcome to join us on
Discord using this invitation link:

https://discord.gg/YBdN45kD

The focus of these office hour sessions is to answer questions about
contributing to scikit-learn. We can also split into break out
audio/text channels and do pair programming or live reviewing of
forgotten pull requests with screen sharing.

We can also try to assist you in crafting minimal reproduction cases
for bug reports to get a higher likelihood of resolution (e.g.
https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).

If this experiment is successful, we will probably hold this kind of
office hours on a regular basis.

See you soon on discord!

-- 
Olivier


Re: [scikit-learn] [ANNOUNCEMENT] scikit-learn 1.0 release

2021-09-24 Thread Olivier Grisel
Yeah!

Thank you so much Adrin for all your efforts in getting this release out!

Congratulations everyone, time to celebrate!

-- 
Olivier


[scikit-learn] Dataframe protocol RFC

2021-08-25 Thread Olivier Grisel
Hi all,

This is an email to notify everybody interested that the discussion on
interoperability of Python dataframe libraries has moved to an
official repo under the data-apis.org initiative:

https://data-apis.org/blog/dataframe_protocol_rfc/
https://github.com/data-apis/dataframe-api

and they are requesting feedback from library authors (both dataframe
providers and consumers).

-- 
Olivier


Re: [scikit-learn] Pandas copy-on-write proposal

2021-08-25 Thread Olivier Grisel
Thanks for the heads up! This is interesting. We rarely update
dataframe values in-place in scikit-learn but it is good to
know that we could leverage this for more efficient pandas-in
pandas-out support, for instance for missing value imputation.


Re: [scikit-learn] [TC Vote] Technical Committee vote: line length

2021-07-28 Thread Olivier Grisel
Many very active core devs not represented in the TC voted for 88 and
my previous vote for 79 was not that strong. So I feel that I should
now vote for 88:

Keep current 88 characters:

Olivier

Revert to 79 characters:
-- 
Olivier


[scikit-learn] scikit-learn monthly developer meeting: Monday June 28 2021

2021-06-25 Thread Olivier Grisel
Dear all,

The scikit-learn developer monthly meeting will take place on Monday
June 28th at 3PM UTC.

- Video call link: https://meet.google.com/qbg-ucpe-ngz
- Meeting notes / agenda: https://hackmd.io/0yokz72CTZSny8y3Re648Q
- Local times:
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021=6=28=15=0=0=1440=240=248=195=179=224

The goal of this meeting is to discuss ongoing development topics for
the project. Everybody is welcome.

As usual, please follow the code of conduct of the project:
https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md

Regards,

-- 
Olivier


Re: [scikit-learn] New member of the triage team: Norbert

2021-06-21 Thread Olivier Grisel
> I have only one question related to scikit-learn.
> how to compute topic coherence of lda models in scikit-learn.  I don't find
> any function that calculates a coherence value.
> please, reply me.

We don't have such a metric in scikit-learn. I assume you are referring to:
http://svn.aksw.org/papers/2015/WSDM_Topic_Evaluation/public.pdf

which is implemented in Gensim as:
https://radimrehurek.com/gensim/models/coherencemodel.html

If I understand correctly this metric needs to compute relative
frequencies of occurrences and co-occurrences of words in the
documents of the training set. This feels very domain specific
compared to the more domain agnostic metrics that we have in
scikit-learn.


Re: [scikit-learn] New member of the triage team: Norbert

2021-06-21 Thread Olivier Grisel
I am a bit late but I am very happy to see Norbert joining the triage
team! Welcome!


Re: [scikit-learn] running examples

2021-03-24 Thread Olivier Grisel
Alternatively, you can edit the code to use fetch_openml(...,
as_frame=False) to use a numpy array instead of a pandas dataframe for
this example.

-- 
Olivier


[scikit-learn] [ANN] scikit-learn 0.24.0rc1 is online!

2020-12-03 Thread Olivier Grisel
Please help us test the first release candidate for scikit-learn 0.24.0:

   pip install scikit-learn==0.24.0rc1

Changelog: https://scikit-learn.org/0.24/whats_new/v0.24.html

In particular, if you maintain a project with a dependency on
scikit-learn, please let us know about any regression.

Feel free to also retweet the announcement to get more people to test
it before the final release (probably in 1 week or 2):

  https://twitter.com/scikit_learn/status/1334562221498753026

Thanks to anybody who helped make this happen!

-- 
Olivier


Re: [scikit-learn] Changes in Travis billing

2020-11-05 Thread Olivier Grisel
> Shall I contact them? Any other volunteers?

+1.

I think we are still dependent on travis for ARM-based release builds
and cron-jobs. I believe we can move the rest to Azure Pipelines or
GitHub Actions.

-- 
Olivier


Re: [scikit-learn] About the Boston housing prices dataset

2020-10-14 Thread Olivier Grisel
On Tue, Oct 13, 2020 at 16:19, Adrin wrote:
>
> Isn't the Boston dataset available through openml? Maybe here: 
> https://www.openml.org/d/531
>
> I'm happy to have the dataset out there on openml, and for any material that
> addresses some of the issues with it.
> But for educational purposes, we don't need to have the dataset in the 
> package as long as users can still download it
> with a oneliner using fetch_openml.

That would be an argument in favor of a deprecation warning with a
message stating the motivation for the deprecation and pointing to
fetch_openml.

However, it's going to break examples written in slow-to-update
tutorials or books once the deprecation period is over. But one could
argue that this is also the case for any other deprecation in
scikit-learn. It's just that sklearn.datasets.load_boston is used A
LOT: https://github.com/search?q=load_boston=code

-- 
Olivier


Re: [scikit-learn] About the Boston housing prices dataset

2020-10-13 Thread Olivier Grisel
Thanks for your input, this is also an extension I was thinking of.


[scikit-learn] About the Boston housing prices dataset

2020-10-13 Thread Olivier Grisel
Hi all,

Thanks to the sustained effort of several contributors (thanks Maria
and Lucy in particular), the Boston housing price dataset is no longer
used in the examples of scikit-learn (nor in the test suite) in the
master branch.

To give some context on why this dataset is problematic, please have a
look at this discussion and the blog post linked in it:

https://github.com/scikit-learn/scikit-learn/issues/16155

Now that we no longer use sklearn.datasets.load_boston internally, we
have to make a decision about what to do with the loader function
itself: deprecate it? just silently hide it from our documentation
(probably a bad idea)? keep it but educate our
users about its ethical problem?

Personally, I would be slightly in favor of the latter option and I
drafted a short paragraph here:

https://github.com/scikit-learn/scikit-learn/pull/18594#issuecomment-707601448

Please feel free to share your thoughts so that we can hopefully make
a consensual decision before the 0.24 release.

Regards,

-- 
Olivier


Re: [scikit-learn] scikit-learn monthly meeting September 28th 2020

2020-09-28 Thread Olivier Grisel
Shall we start rolling meetings with a switch between 2 or 3 time slots?

-- 
Olivier


Re: [scikit-learn] climate friendly software licence

2020-06-29 Thread Olivier Grisel
Hi Sole,

I personally support climate change actions very much and I am
convinced climate change is the number 1 challenge of our time. In an
attempt to act in a consistent way with that belief, I declined
several times to keynote at conferences that were either organized by
the fossil fuel industry or would have required me to fly a long
distance to give a presentation.

However, I don't think software licensing is the right tool to advance this cause.

How would we enforce it? What would happen if we don't enforce it? Who
is "we", especially when our library is embedded in third-party
software products and the end-users are not necessarily aware of all
the upstream dependencies?

What about gray cases, e.g. a company that does not do fossil fuel
extraction directly per se but works as a consultancy with a majority of
customers in the fossil fuel extraction industry? What if a
significant part of their consultancy is to help them detect methane
leaks in satellite data? How would we audit this? With which
resources? How would we get a consensual decision on those gray cases?

What about the hypocrisy of using or contributing to software under
that license while regularly using fossil fuel powered transportation
or working or living in a building heated with fossil fuels? Or
buying goods transported this way over long distances?

Instead, I would rather encourage everyone to vote for legislators and
governments that progressively set bans on the development and
commercialization of fossil fuel based technologies and to voice your
support for such legislations in public debates. I encourage everybody
to look twice before accepting to work for a company involved in
fossil fuel extraction one way or another or involved in fossil-fuel
intensive activities.

-- 
Olivier


Re: [scikit-learn] ANN scikit-learn 0.23.0 release

2020-05-13 Thread Olivier Grisel
Congrats on the release! And thank you very much to all those who were
involved in making it happen (and Adrin in particular)!

-- 
Olivier


Re: [scikit-learn] Monthly meetings

2020-03-30 Thread Olivier Grisel
I get a message for an invalid meeting id.

-- 
Olivier


[scikit-learn] scikit-learn 0.22.1 is out!

2020-01-02 Thread Olivier Grisel
This is a minor release that includes many bug fixes and solves a
number of packaging issues with Windows wheels in particular. Here is
the full changelog:

https://scikit-learn.org/stable/whats_new/v0.22.html#version-0-22-1

The conda package will follow soon (hopefully).

Thank you very much to all who contributed to this release!

Cheers and happy new year!

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel


Re: [scikit-learn] Paris Sprint in January and wiki update

2019-12-17 Thread Olivier Grisel
Indeed I do not see the "circle add" button in the tweetdeck UI anymore.

But it's ok not to prepare the threads before tweeting the first
tweet. We can build the thread progressively by publishing the first
tweet and then replying one tweet after the other by hitting the reply
button of the last published tweet in the thread.


Re: [scikit-learn] scikit-learn twitter account

2019-12-03 Thread Olivier Grisel
Ok the twitter accounts are now switched:

https://twitter.com/scikit_learn/status/1201794032650932224

The notifications for commits pushed to master are live:

https://twitter.com/sklearn_commits

Ready for the release :)

-- 
Olivier


Re: [scikit-learn] scikit-learn twitter account

2019-12-02 Thread Olivier Grisel
Alright, I have configured the new github action for the tweets on
@sklearn_commits:

https://github.com/scikit-learn/scikit-learn/pull/15758

I tested it from my repo and it worked fine (I deleted the test tweet though).
We can do the switch as soon as this PR is merged.

-- 
Olivier


Re: [scikit-learn] scikit-learn twitter account

2019-12-02 Thread Olivier Grisel
It might actually be possible to use github actions with
https://github.com/xorilog/twitter-action for instance. I will
give it a try with a test repo.

-- 
Olivier


Re: [scikit-learn] scikit-learn twitter account

2019-12-02 Thread Olivier Grisel
Alright, it seems that I can create twitter apps (and generate API
tokens) for the @sklearn_commits account. However,
https://github.com/filearts/tweethook does not work as it relies on a
third-party webtask.io service that does not accept any new
subscriptions...

I am looking for an alternative way to do this but I am not sure how to do so...


Re: [scikit-learn] scikit-learn twitter account

2019-11-25 Thread Olivier Grisel
I have created the https://twitter.com/sklearn_commits twitter account.

I have applied to make this account a "Twitter Developer" account to
be able to use https://github.com/filearts/tweethook to register it as
a webhook for the main scikit-learn github repo.

Once ready, I will remove the old webhook currently registered on
the @scikit_learn account and would like to tweet about the transfer as
drafted here:

https://hackmd.io/@4rHCRgfySZSdd5eMtfUJiA/H1CSpuF2S/edit

Please feel free to let me know if you have any comment / suggestion
about this plan.

-- 
Olivier


Re: [scikit-learn] scikit-learn twitter account

2019-11-22 Thread Olivier Grisel
On Fri, Nov 22, 2019 at 17:24, Gael Varoquaux wrote:
>
> > I would like to create @sklearn_commits instead of
> > @scikit_learn_commits that is too long to my taste. Any opinion?
>
> Some people do not make the link between "sklearn" and "scikit-learn" :)

People who are likely to follow a twitter account that automatically
tweets github commits from the scikit-learn github repo are likely to
know about "sklearn". And as you said the bio / full name can be more
explicit.

The main twitter account with general announcements stays @scikit_learn.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel


Re: [scikit-learn] scikit-learn twitter account

2019-11-22 Thread Olivier Grisel
Ok, I have sent some invites.

I would like to create @sklearn_commits instead of
@scikit_learn_commits that is too long for my taste. Any opinion?

-- 
Olivier


Re: [scikit-learn] scikit-learn twitter account

2019-11-22 Thread Olivier Grisel
Thanks Tom, let me try to configure this.

-- 
Olivier


Re: [scikit-learn] scikit-learn twitter account

2019-11-15 Thread Olivier Grisel
I am not sure who has the rights to manage the twitter account. I just
sent a password reset request to "sc**@a..***"
I suspect that this is Andreas but I am not so sure.


Re: [scikit-learn] scikit-learn twitter account

2019-11-15 Thread Olivier Grisel
On Fri, Nov 15, 2019 at 17:31, Nicolas Hug wrote:
>
> What's the status of this? Would be great to have it for the 0.22 release :) !
>

+1 and we could also announce / thank / RT new sources of funding (CZI
and Fujitsu).


Re: [scikit-learn] scikit-learn twitter account

2019-11-15 Thread Olivier Grisel
On Tue, Nov 5, 2019 at 12:46, Gael Varoquaux wrote:
>
> On Mon, Nov 04, 2019 at 10:14:26PM -0700, Andreas Mueller wrote:
> > Should we re-purpose the existing twitter account or make a new one?
> > https://twitter.com/scikit_learn
>
> I think that we should repurpose it:
>
> - Make a "scikit-learn-commits" twitter account that does what the
>   current one does
> - Use the current one.

+1

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel


Re: [scikit-learn] Monthly meetings between core developers

2019-07-18 Thread Olivier Grisel
I just found this planner to give it a try:

https://www.timeanddate.com/worldclock/meetingtime.html?day=29=7=2019=240=33=37=179=0

(Berlin and Paris are in the same timezone so I only put Berlin).

It's going to be challenging to find a timeslot for everybody. The
least extreme timeslot for everybody to attend at the same time would
be:

https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019=7=29=11=0=0=240=33=37=179

We could also arrange for a second timeslot later (that would be
Tuesday morning in Australia and China):

https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019=7=29=21=0=0=240=33=37=179

I wouldn't mind doing a meeting around 11pm on Monday evening from
time to time but it would still be very early for Beijing.

Just to let you know, I will be off from next Saturday till Monday
August 19 (big summer break :) so don't count on me for the first
meeting if you start the meetings in the meantime.

On Thu, Jul 18, 2019 at 00:15, Andreas Mueller wrote:
>
>
>
> On 7/17/19 2:17 PM, Guillaume Lemaître wrote:
> > I am +1. This is a great initiative.
> >
> > IMO, we could make it really regular (i.e., a specific week-day of a
> > specific week in a month), with a rolling time (for the time-zone issue).
> > In this matter, we could maybe clear more in advance our agenda
> > instead of trying to find a date which accommodates everyone.
> >
> I agree, we could do something like the last Monday every month and
> alternate between two (or three) different time zones.
> We have CET (UTC+1), EST (UTC-5), CT (UTC+08), AEDT (UTC+11) so that
> seems super easy, right?
> (TIL CST can stand for "Central"/US, China, and Cuba! not confusing at all)
>
> I agree that we should be as inclusive as possible, but I also don't
> want to create the expectation that some people (not thinking of any
> Australian in particular)
> who already sacrifice a lot of their free time have to invest even more
> time to keep up with the rest.
>
> I think the idea of posting write-ups will help being more inclusive in
> that regard.



-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel


Re: [scikit-learn] Monthly meetings between core developers

2019-07-18 Thread Olivier Grisel
On Thu, Jul 18, 2019 at 08:29, Adrin wrote:
>
> BTW, where was the meeting for last Monday organized? I don't think I knew it 
> was happening.

I do not understand what you are referring to. My email was about the
organization of future meetings as suggested by Andreas.


[scikit-learn] New core developer: jeremiedbb

2019-07-03 Thread Olivier Grisel
The core developers of Scikit-learn have recently voted to welcome
Jérémie Du Boisberranger to the team, in recognition of his efforts
and trustworthiness as a contributor. Jérémie works at Inria Saclay
and is supported by the scikit-learn initiative at Fondation Inria and
its partners.

Congratulations and welcome to the team Jérémie!

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel


Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-29 Thread Olivier Grisel
You have to use a dedicated framework to distribute the computation on a
cluster like your Cray system.

You can use MPI, or dask with dask-jobqueue, but you also need to run
parallel algorithms that stay efficient in a distributed setting with a
high cost for communication between distributed worker nodes.

I am not sure that the DBSCAN implementation in scikit-learn would benefit
much from naively running in distributed mode.

On Fri, Jun 28, 2019 at 22:06, Mauricio Reis wrote:

> Sorry, but just now I reread your answer more closely.
>
> It seems that the "n_jobs" parameter of the DBScan routine brings no
> benefit to performance. If I want to improve the performance of the
> DBScan routine I will have to redesign the solution to use MPI
> resources.
>
> Is it correct?
>
> ---
> Regards,
> Mauricio Reis
>
> On 28/06/2019 16:47, Mauricio Reis wrote:
> > My laptop has Intel I7 processor with 4 cores. When I run the program
> > on Windows 10, the "joblib.cpu_count()" routine returns "4". In these
> > cases, the same test I did on the Cray computer caused a 10% increase
> > in the processing time of the DBScan routine when I used the "n_jobs =
> > 4" parameter compared to the processing time of that routine without
> > this parameter. Do you know what is the cause of the longer processing
> > time when I use "n_jobs = 4" on my laptop?
> >
> > ---
> > Regards,
> > Mauricio Reis
> >
> > On 28/06/2019 06:29, Brown J.B. via scikit-learn wrote:
> >>> where you can see "ncpus = 1" (I still do not know why 4 lines were
> >>> printed -
> >>>
> >>> (total of 40 nodes) and each node has 1 CPU and 1 GPU!
> >>
> >>> #PBS -l select=1:ncpus=8:mpiprocs=8
> >>> aprun -n 4 p.sh ./ncpus.py
> >>
> >> You can request 8 CPUs from a job scheduler, but if each node the
> >> script runs on contains only one virtual/physical core, then
> >> cpu_count() will return 1.
> >> If that CPU supports multi-threading, you would typically get 2.
> >>
> >> For example, on my workstation:
> >> `--> egrep "processor|model name|core id" /proc/cpuinfo
> >> processor : 0
> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
> >> core id : 0
> >> processor : 1
> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
> >> core id : 1
> >> processor : 2
> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
> >> core id : 0
> >> processor : 3
> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
> >> core id : 1
> >> `--> python3 -c "from sklearn.externals import joblib;
> >> print(joblib.cpu_count())"
> >> 4
> >>
> >> It seems that in this situation, if you're wanting to parallelize
> >> *independent* sklearn calculations (e.g., changing dataset or random
> >> seed), you'll ask for the MPI by PBS processes like you have, but
> >> you'll need to place the sklearn computations in a function and then
> >> take care of distributing that function call across the MPI processes.
> >>
> >> Then again, if the runs are independent, it's a lot easier to write a
> >> for loop in a shell script that changes the dataset/seed and submits
> >> it to the job scheduler to let the job handler take care of the
> >> parallel distribution.
> >> (I do this when performing 10+ independent runs of sklearn modeling,
> >> where models use multiple threads during calculations; in my case,
> >> SLURM then takes care of finding the available nodes to distribute the
> >> work to.)
> >>
> >> Hope this helps.
> >> J.B.
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-19 Thread Olivier Grisel
How many cores do you have on this machine?

joblib.cpu_count()
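
For reference, a minimal way to run this check as a standalone script. It assumes joblib is installed as its own package and imported directly; the `sklearn.externals.joblib` alias quoted elsewhere in this thread only existed in older scikit-learn releases.

```python
import os

import joblib

# Number of CPUs joblib believes it can use from this process. On a
# cluster this describes the node the script actually runs on, not the
# total number of CPUs requested from the job scheduler.
n_cpus = joblib.cpu_count()

print("joblib.cpu_count():", n_cpus)
print("os.cpu_count():    ", os.cpu_count())
```

Running this through the same launcher as the real job (for instance the aprun command quoted above) shows what each worker process actually sees.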
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] [Copyright] Skicit-learn graphic

2019-05-24 Thread Olivier Grisel
I think it's ok to do as you said.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Release Candidate for Scikit-learn 0.21

2019-05-01 Thread Olivier Grisel
\o/
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] VOTE: scikit-learn governance document

2019-02-20 Thread Olivier Grisel
+1
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Sprint discussion points?

2019-02-15 Thread Olivier Grisel
I would also add generalizing early stopping options to most estimators.

This is a bit related to Joel's point on max_iter consistency in
LogisticRegression.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Next Sprint

2018-12-21 Thread Olivier Grisel
Ok for me. The last 3 weeks of February are fine for me.

On Thu, 20 Dec 2018 at 21:21, Alexandre Gramfort <alexandre.gramf...@inria.fr> wrote:

> ok for me
>
> Alex
>
> On Thu, Dec 20, 2018 at 8:35 PM Adrin  wrote:
> >
> > It'll be the least favourable week of February for me, but I can make do.
> >
> > On Thu, 20 Dec 2018 at 18:45 Andreas Mueller  wrote:
> >>
> >> Works for me!
> >>
> >> On 12/19/18 5:33 PM, Gael Varoquaux wrote:
> >> > I would propose the week of Feb 25th, as I heard people say that they
> >> > might be available at this time. Is it good for many people, or should we
> >> > organize a doodle?
> >> >
> >> > G
> >> >
> >> > On Wed, Dec 19, 2018 at 05:27:21PM -0500, Andreas Mueller wrote:
> >> >> Can we please nail down dates for a sprint?
> >> >> On 11/20/18 2:25 PM, Gael Varoquaux wrote:
> >> >>> On Tue, Nov 20, 2018 at 08:15:07PM +0100, Olivier Grisel wrote:
> >> >>>> We can also do Paris in April / May or June if that's ok with Joel and
> >> >>>> better for Andreas.
> >> >>> Absolutely.
> >> >>> My thoughts here are that I want to minimize transportation, partly
> >> >>> because flying has a large carbon footprint. Also, for personal reasons,
> >> >>> I am not sure that I will be able to make it to Austin in July, but I
> >> >>> realize that this is a pretty bad argument.
> >> >>> We're happy to try to host in Paris whenever it's most convenient and to
> >> >>> try to help with travel for those not in Paris.
> >> >>> Gaël
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] MLPClassifier on WIndows 10 is 4 times slower than that on macOS?

2018-12-18 Thread Olivier Grisel
You should probably just "conda update scikit-learn":

scikit-learn 0.20.1 is available on the official anaconda channel for all
supported operating systems:
https://anaconda.org/anaconda/scikit-learn
-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Difference between linear model and tree-based regressor?

2018-12-13 Thread Olivier Grisel
They are very different statistical models from a mathematical point of
view. See the online scikit-learn documentation or reference text books
such as "Elements of Statistical Learning" for more details.

In practice, linear models tend to be faster to fit on large data,
especially when the number of features is large (although it depends on the
solver, loss, penalty, data scaling...).

Linear models cannot fit prediction tasks when the data is not linearly
separable (by definition), while tree-based models do not have this
restriction.

Tree-based models can still underfit in some cases but for different
reasons (e.g. when we limit the depth of the trees).

Linear models can be made more expressive via feature engineering (e.g.
k-bins discretizer, polynomial features expansion, Nystroem kernel
approximation...) and thereafter sometimes be competitive with tree-based
models even on tasks that were originally not linearly separable.
However this is not guaranteed either. Cross-validation and parameter
tuning are still required to tell which class of model works best for a
specific task.

As you said, tree-based models "cannot extrapolate" in the sense that their
decision function is piecewise constant while the decision function of a
linear model is a hyperplane. Depending on the task, the lack of
extrapolation can either be considered a limitation or a benefit (for
instance to avoid unrealistic extrapolations like people with a negative
age or size, predicting negative mechanical energy loss via heat
dissipation, fractions that are larger than 100%, 6 stars out of 5
recommendations...).
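
To make the extrapolation point concrete, here is a small sketch on a toy 1D problem. The data, the depth limit and the query points are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Toy 1D regression problem: y = 2 * x, observed only for x in [0, 10].
X_train = np.linspace(0, 10, 50).reshape(-1, 1)
y_train = 2 * X_train.ravel()

linear = LinearRegression().fit(X_train, y_train)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

# Query points far outside the training range.
X_far = np.array([[100.0], [1000.0]])

linear_pred = linear.predict(X_far)  # keeps following the fitted line
tree_pred = tree.predict(X_far)      # both points fall in the same leaf

print("linear:", linear_pred)
print("tree:  ", tree_pred)
```

The tree returns the value of its right-most leaf for both query points, so its predictions never leave the range of targets seen during training, while the linear model keeps growing with x.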

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] New core dev: Adrin Jalali

2018-12-06 Thread Olivier Grisel
Congrats and welcome Adrin!

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] benchmarking TargetEncoder Was: ANN Dirty_cat: learning on dirty categories

2018-11-23 Thread Olivier Grisel
Maybe a subset of the criteo TB dataset?
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Next Sprint

2018-11-20 Thread Olivier Grisel
We can also do Paris in April / May or June if that's ok with Joel and
better for Andreas.

I am teaching on Fridays from the end of January to March. But I can miss
half a day of sprint to teach my class.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Random Forest Regressor -- Implementation in C++

2018-11-07 Thread Olivier Grisel
You might also want to have a look at https://github.com/onnx/onnxmltools
although I am not sure if there are RF optimized ONNX runtimes at this
point.
-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Olivier Grisel
>
>
> > I think model serialization should be a priority.
>

There is also the ONNX specification that is gaining industrial adoption
and that already includes open source exporters for several families of
scikit-learn models:

https://github.com/onnx/onnxmltools

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-27 Thread Olivier Grisel
On Wed, 26 Sept 2018 at 23:02, Joel Nothman wrote:

> And for those interested in what's in the pipeline, we are trying to draft
> a roadmap...
> https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018
>
> But there are no doubt many features that are absent there too.
>

Indeed, it would be great to get some feedback on this roadmap from heavy
scikit-learn users: which points do you think are the most important? What
is missing from this roadmap?

Feel free to reply to this thread.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-27 Thread Olivier Grisel
Joy !
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Bootstrapping in sklearn

2018-09-20 Thread Olivier Grisel
I believe it would fit in sklearn-contrib even if it's more for statistical
inference rather than machine learning style prediction.

Others might disagree.

Anyways, joining efforts to improve documentation, CI, testing and so on is
always a good thing for your future users.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Bootstrapping in sklearn

2018-09-18 Thread Olivier Grisel
This looks like a very useful project.

There is also scikits-bootstrap [1]. Personally I prefer the flat package
namespace of resample (I am not a fan of the 'scikits' namespace package)
but I still think it would be great to contact the author to know if he
would be interested in joining efforts.

What is currently missing from both projects is good sphinx-based
documentation that explains in a couple of paragraphs, with examples, what
the different non-parametric inference methods are and what the pros and
cons of each of them are (sample complexity, computational complexity,
kinds of inference, bias, theoretical asymptotic results, practical
discrepancies observed in the finite sample setting, assumptions made on
the distribution of the data...). Ideally the doc would also reference
examples (using sphinx-gallery) that highlight the behavior of the tools in
both nominal and pathological cases.

[1] https://github.com/cgevans/scikits-bootstrap

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] New core dev: Joris Van den Bossche

2018-06-23 Thread Olivier Grisel
Hi everyone!

Let's welcome Joris Van den Bossche (@jorisvdbossche) officially as a
scikit-learn core developer!

Joris is one of the maintainers of the pandas project and recently
contributed many new great PRs to scikit-learn (notably the
ColumnTransformer and a refactoring of the categorical variable
preprocessing tools).

Cheers!

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Announcing modAL: a modular active learning framework

2018-02-19 Thread Olivier Grisel
It looks nice, thanks for sharing.

Do you plan to couple the active learner with a UX-optimized labeling
interface (for instance with a react.js or similar frontend and a flask or
similar backend)?

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] clustering on big dataset

2018-01-02 Thread Olivier Grisel
Have you had a look at BIRCH?

http://scikit-learn.org/stable/modules/clustering.html#birch
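
A minimal sketch of how BIRCH can consume a dataset chunk by chunk with `partial_fit`. The synthetic blobs, the chunk size and the `threshold` value are assumptions chosen for the illustration, not tuned recommendations.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])

birch = Birch(threshold=1.0, n_clusters=3)

# Feed the data chunk by chunk, as if it were too large to load at once.
for _ in range(5):
    chunk = np.vstack(
        [c + rng.normal(scale=0.5, size=(200, 2)) for c in centers]
    )
    birch.partial_fit(chunk)

# Each blob center should end up in its own cluster.
labels = birch.predict(centers)
print(labels)
```

Only the compact tree of subclusters is kept in memory between chunks, which is what makes the method attractive for large datasets.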

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Announcing sklearn-xarray

2017-12-04 Thread Olivier Grisel
Interesting project!

BTW, do you know about dask-ml [1]?

It might be interesting to think about generalizing the input validation of
fit and predict / transform as a private method of the BaseEstimator class
instead of directly calling into sklearn.utils.validation functions, so as
to make it easier for third party projects such as sklearn-xarray and
dask-ml to subclass and override those methods to allow for specific input
data structures without converting everything to a numpy array.

[1] https://github.com/dask/dask-ml
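
A rough sketch of what such an overridable validation hook could look like. Everything here is hypothetical: `SketchBaseEstimator`, `_validate_input` and `LabeledArray` are illustration names, not the actual scikit-learn, sklearn-xarray or dask-ml APIs.

```python
import numpy as np


class SketchBaseEstimator:
    """Hypothetical base class, NOT the actual scikit-learn BaseEstimator."""

    def _validate_input(self, X):
        # Default behavior: convert everything to a 2D float NumPy array.
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError("Expected a 2D array")
        return X


class MeanCenterer(SketchBaseEstimator):
    def fit(self, X):
        X = self._validate_input(X)
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        X = self._validate_input(X)
        return X - self.mean_


class LabeledArray:
    """Stand-in for a labeled container such as an xarray DataArray."""

    def __init__(self, values, dims):
        self.values = np.asarray(values, dtype=float)
        self.dims = dims

    def mean(self, axis):
        return self.values.mean(axis=axis)

    def __sub__(self, other):
        return LabeledArray(self.values - other, self.dims)


class LabeledMeanCenterer(MeanCenterer):
    def _validate_input(self, X):
        # Third-party override: accept the labeled container as-is.
        if isinstance(X, LabeledArray):
            return X
        return super()._validate_input(X)


data = LabeledArray([[1.0, 2.0], [3.0, 6.0]], dims=("sample", "feature"))
out = LabeledMeanCenterer().fit(data).transform(data)
print(type(out).__name__, out.dims)
```

The subclass only overrides the validation hook; fit and transform are inherited unchanged and the dimension labels survive the round trip.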



2017-12-04 15:21 GMT+01:00 Peter Hausamann :

> Hi all,
>
> I'd like to announce *sklearn-xarray*, a new package that provides a
> scikit-learn interface for xarray users. For those not familiar with xarray
> (http://xarray.pydata.org), it is a "pandas-like and pandas-compatible
> toolkit for analytics on multi-dimensional arrays".
>
> The package makes it possible to apply sklearn estimators to xarray
> DataArrays and Datasets while keeping the labels (called coordinates in
> xarray) intact wherever possible.
>
> You can install the package via pip:
>
> pip install sklearn-xarray
>
> To get started, you can:
>
>- read the documentation: https://phausamann.github.io/sklearn-xarray and
>- check out the repository: https://github.com/phausamann/sklearn-xarray
>
> Note that the package is still in a very early development stage and there
> will probably be some major API changes in upcoming releases. Most notably,
> I'd like to replicate the complete sklearn module structure at some point
> by decorating all available estimators with the necessary wrappers.
>
> Feedback of any kind is appreciated.
>
> Peter


-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Error while running 'python setup.py build_ext --inplace'

2017-12-04 Thread Olivier Grisel
Maybe update your version of Cython?

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Rapid Outlier Detection via Sampling

2017-11-27 Thread Olivier Grisel
> Do I need to write object oriented or are functions also ok?

If you want to contribute an implementation as a new project on scikit-learn
contrib, you should be careful to follow the scikit-learn estimators API:

http://scikit-learn.org/dev/developers/contributing.html#apis-of-scikit-learn-objects

For outlier detection in particular, you should make sure your new
estimator is consistent with the API conventions of other methods already
in scikit-learn:

http://scikit-learn.org/dev/modules/outlier_detection.html

One of the primary goals of the scikit-learn ecosystem is to provide a
simple homogeneous API to a very heterogeneous set of methods.
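
As a rough illustration of these conventions, here is a toy detector written as a class. The z-score rule and the name `ZScoreOutlierDetector` are assumptions made for the example, not the method discussed in this thread; the point is only the shape of the API (hyper-parameters in `__init__`, learned attributes with a trailing underscore, `predict` returning +1 / -1).

```python
import numpy as np
from sklearn.base import BaseEstimator, OutlierMixin


class ZScoreOutlierDetector(OutlierMixin, BaseEstimator):
    """Toy detector: flags samples far from the per-feature training mean."""

    def __init__(self, threshold=3.0):
        # Hyper-parameters are stored unchanged in __init__.
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned attributes end with a trailing underscore.
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0) + 1e-12
        return self

    def decision_function(self, X):
        X = np.asarray(X, dtype=float)
        max_abs_z = np.abs((X - self.mean_) / self.scale_).max(axis=1)
        # Positive for inliers, negative for outliers.
        return self.threshold - max_abs_z

    def predict(self, X):
        # Same convention as the built-in detectors: +1 inlier, -1 outlier.
        return np.where(self.decision_function(X) >= 0, 1, -1)


rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 2))
detector = ZScoreOutlierDetector(threshold=4.0).fit(X_train)
pred = detector.predict([[0.0, 0.0], [10.0, 10.0]])
print(pred)
```

A plain function could compute the same scores, but the class form is what lets the detector be cloned, tuned and combined with the rest of the scikit-learn tooling.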

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] New core devs: Hanmin Qin, Guillaume Lemaître, and Roman Yurchak

2017-11-09 Thread Olivier Grisel
Congrats to all three of you! Thank you very much for your contributions
and in particular in reviewing contributions by others.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] scikit-learn-commits mailing list defunct?

2017-08-28 Thread Olivier Grisel
+1 for python.org if they accept this kind of mailing list.
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] scikit-learn-commits mailing list defunct?

2017-08-28 Thread Olivier Grisel
+1
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] scikit-learn 0.19.0 is out!

2017-08-11 Thread Olivier Grisel
Grab it with pip or conda !

Quoting the release highlights from the website:

We are excited to release a number of great new features including
neighbors.LocalOutlierFactor for anomaly detection,
preprocessing.QuantileTransformer for robust feature transformation, and
the multioutput.ClassifierChain meta-estimator to simply account for
dependencies between classes in multilabel problems. We have some new
algorithms in existing estimators, such as multiplicative update in
decomposition.NMF and multinomial linear_model.LogisticRegression with L1
loss (use solver='saga').

Cross validation is now able to return the results from multiple metric
evaluations. The new model_selection.cross_validate can return many scores
on the test data as well as training set performance and timings, and we
have extended the scoring and refit parameters for grid/randomized search
to handle multiple metrics.
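
As an illustration of the multiple metric support, a short sketch using `cross_validate`. The dataset, the estimator and the two metrics are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)

results = cross_validate(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=5,
    scoring=["accuracy", "neg_log_loss"],  # several metrics in one run
    return_train_score=True,
)

# One array of length cv per key: timings plus test/train score per metric.
for key in sorted(results):
    print(key, results[key].round(3))
```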

You can also learn faster. For instance, the new option to cache
transformations in pipeline.Pipeline makes grid search over pipelines
including slow transformations much more efficient. And you can predict
faster: if you’re sure you know what you’re doing, you can turn off
validating that the input is finite using config_context.
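
A small sketch of both options, assuming a temporary directory as the cache location and random data that is known to be finite. The pipeline steps are arbitrary choices for the example.

```python
import tempfile

import numpy as np
from sklearn import config_context
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10))
y = (X[:, 0] > 0).astype(int)

with tempfile.TemporaryDirectory() as cache_dir:
    # memory= caches fitted transformers on disk so that repeated fits with
    # identical transformer parameters and data can be reused.
    pipe = Pipeline(
        [("pca", PCA(n_components=5)), ("clf", LogisticRegression())],
        memory=cache_dir,
    )
    pipe.fit(X, y)

    # Skip the finiteness check on inputs. Only safe if X is known to
    # contain no NaN or infinite values.
    with config_context(assume_finite=True):
        pred = pipe.predict(X)

print(pred[:10])
```

The cache mostly pays off inside a grid search, where the same slow transformer would otherwise be refit for every candidate of the final estimator.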

We’ve made some important fixes too. We’ve fixed a longstanding
implementation error in metrics.average_precision_score, so please be
cautious with prior results reported from that function. A number of errors
in the manifold.TSNE implementation have been fixed, particularly in the
default Barnes-Hut approximation. semi_supervised.LabelSpreading and
semi_supervised.LabelPropagation have had substantial fixes.
LabelPropagation was previously broken. LabelSpreading should now correctly
respect its alpha parameter.

Please see the full changelog at:

http://scikit-learn.org/0.19/whats_new.html#version-0-19

Notably some models have changed behaviors (bug fixes) and some methods or
parameters part of the public API have been deprecated.

A big thank you to everyone who made this release possible and to Joel in
particular.

--
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Truncated svd not working for complex matrices

2017-08-10 Thread Olivier Grisel
I have no idea whether the randomized SVD method is supposed to work for
complex data or not (from a mathematical point of view). I think that all
scikit-learn estimators assume real data (or integer data for class labels)
and our input validation utilities will cast numeric values to float64 by
default. This might be the cause of your problem. Have a look at the source
code to confirm. The reference to the paper can also be found in the
docstring of those functions.
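
A quick way to probe what the input validation does on a given installation. This is only a sketch: the exact behavior on complex input depends on the scikit-learn version, so the code reports what happens instead of assuming one outcome.

```python
import numpy as np
from sklearn.utils import check_array

X_int = np.array([[1, 2], [3, 4]])
X_complex = np.array([[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]])

# Many estimators validate their input with an explicit float dtype,
# which converts integer data to float64.
X_checked = check_array(X_int, dtype=np.float64)
print(X_checked.dtype)

# Complex input: some versions reject it with a ValueError, others may
# convert it. Report which of the two happens here.
try:
    check_array(X_complex)
    complex_rejected = False
except ValueError as exc:
    complex_rejected = True
    print("rejected:", exc)

print("complex input rejected:", complex_rejected)
```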

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Extra trees tuning parameters

2017-08-04 Thread Olivier Grisel
I believe so even though it's always better to check in the code to see how
this parameter is actually used.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] scikit-learn 0.19b2 is available for testing

2017-07-17 Thread Olivier Grisel
The new release is coming and we are seeking feedback from beta testers!

  pip install scikit-learn==0.19b2

conda-forge packages should follow in the coming hours / days.

Note that many models have changed behaviors and some things have been
deprecated, see the full changelog at:

http://scikit-learn.org/dev/whats_new.html#version-0-19

As usual please report any regression or other bugs as an issue on github.

Thanks to everyone who contributed to the release!

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Which algorithm is used in sklearn SGDClassifier when modified huber loss is used?

2017-07-07 Thread Olivier Grisel
The name of the algorithm / model would be "L2-penalized linear model
with modified Huber loss trained with Stochastic Gradient Descent".

SVM is traditionally used to describe models that use the hinge loss
only (or sometimes the squared hinge loss too).

Only the log loss can lead to a probabilistic linear binary
classifier in scikit-learn.
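
To make the naming concrete, a short sketch comparing the two configurations on synthetic data. The dataset and the random seeds are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hinge loss: the model usually referred to as a linear SVM.
svm_like = SGDClassifier(loss="hinge", penalty="l2", random_state=0).fit(X, y)

# Same optimizer and penalty, different loss: an L2-penalized linear model
# with modified Huber loss, not an SVM in the traditional sense.
huber_like = SGDClassifier(
    loss="modified_huber", penalty="l2", random_state=0
).fit(X, y)

print(svm_like.score(X, y), huber_like.score(X, y))
```

Both objects are the same estimator class trained with the same optimizer; only the `loss` argument changes what model is being fitted.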

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Typo in online documentation on Matrix Factorization

2017-07-06 Thread Olivier Grisel
I think the documentation is correct. U, a.k.a. "the code" or "the
activations" has shape (n_samples, n_components) and V a.k.a. "the
dictionary" or "the components" has shape (n_components, n_features) in
both cases.

We could use n_components uniformly instead of n_atoms for consistency's
sake (and just make sure that "components" is a synonym for "dictionary
atoms" in the literature).

I think V_k is fine because the dimension with size n_components is the
first dimension of V.
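
These shapes can be checked quickly in code. NMF is used below only as a convenient stand-in for a factorization X ≈ UV; the sizes and the random data are arbitrary.

```python
import numpy as np
from sklearn.decomposition import NMF

n_samples, n_features, n_components = 20, 8, 3
rng = np.random.RandomState(0)
X = rng.uniform(size=(n_samples, n_features))  # NMF needs non-negative data

model = NMF(
    n_components=n_components, init="random", random_state=0, max_iter=500
)
U = model.fit_transform(X)  # the code / activations
V = model.components_       # the dictionary / components

print(U.shape, V.shape)
# Row k of V (V[k]) is the k-th component, a vector of length n_features.
print(V[0].shape)
```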
​
If you spot issues or other things that are unclear or incomplete in the
doc, please feel free to open an issue on github. You can also directly
submit a pull request if you are familiar with git. The website is built
from the docs that live in the "doc/" subfolder of the repo.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Fwd: [SciPy-User] EuroSciPy 2017 call for contributions - extension of deadline

2017-06-30 Thread Olivier Grisel
I am pretty sure this is exactly the kind of presentation that the
EuroScipy audience would enjoy. Please submit!

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Scikit-learn at Data Intelligence this past weekend

2017-06-30 Thread Olivier Grisel
Thanks for this report!

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Agglomerative clustering

2017-06-30 Thread Olivier Grisel
You can have a look at the test named "test_agglomerative_clustering" in:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/tests/test_hierarchical.py

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Scikit-learn workshop and sprint at EuroScipy 2017 in Erlangen

2017-06-23 Thread Olivier Grisel
Hi all,

FYI I have just submitted a 90 min tutorial on scikit-learn to the
EuroScipy CFP. If anybody is interested in co-teaching / TA-ing this
workshop please let me know.

I also plan to stay for the one-day sprint to help people make their
first contribution to the project. Last year we had great fun and the
sprint was very productive.

Registration is now open:
https://www.euroscipy.org/2017/

10th European Conference on Python in Science

Location: Erlangen
August 28-29 (Mon, Tue) Tutorials / Workshops
August 30 - 31 (Wed, Thu) Main conference and posters
September 1 (Fri) Sprints

See you in Erlangen!

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] XGboost Classifier error

2017-04-19 Thread Olivier Grisel
Please provide the full traceback. Without it it's impossible to tell
whether the problem is in scikit-learn or xgboost.

Also, please provide a minimal reproduction script as explained in:

http://scikit-learn.org/stable/faq.html#what-s-the-best-way-to-get-help-on-scikit-learn-usage

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Logistic regression with elastic net regularization

2017-03-14 Thread Olivier Grisel
From a generalization point of view (test accuracy), the optimal
sparsity support should not matter much though, but it can be helpful
to find the sparsest optimal solution, either for computational
constraints (smaller models with a lower prediction latency) or for
interpretation of the weights (domain specific).

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Logistic regression with elastic net regularization

2017-03-14 Thread Olivier Grisel
Note that SGD is not very good at optimizing finely with a non-smooth
penalty (e.g. l1 or elasticnet). The future SAGA solver is going to be
much better at finding the optimal sparsity support (although this
support is not guaranteed to be stable across re-sampling of the
training set if the training set is small).

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] GSOC call for mentors

2017-02-18 Thread Olivier Grisel
Personally I don't feel like mentoring this year. I would really like
to focus my scikit-learn time on finishing the joblib process
refactoring with Thomas Moreau and the binning / thread-based
parallelization of boosted trees with Guillaume and Raghav.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Modelling event rates

2017-02-17 Thread Olivier Grisel
I don't think we have any model dedicated to this, but it's possible
that expressive non-parametric models such as RF and GBRT or richly
parameterized models such as MLP with a regression loss can do a good
enough job at giving you a point estimate.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release

2017-01-09 Thread Olivier Grisel
Ideally I would like to get it out before April. Instead of
setting up a roadmap, I would rather just identify the bugs that are
blockers, fix only those, and not wait for any feature before
cutting 0.19.X.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release

2017-01-09 Thread Olivier Grisel
In retrospect, making a small 0.19 release is probably a good idea.

I would like to get
https://github.com/scikit-learn/scikit-learn/pull/8002 in before
cutting the 0.19.X branch.

-- 
Olivier Grisel
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Preparing a scikit-learn 0.18.2 bugfix release

2017-01-09 Thread Olivier Grisel
Hi all,

I think we should release 0.18.2 to get some important fixes and make
it easy to release Python 3.6 wheel packages for all the operating
systems using the automated procedure.

I identified a couple of PRs to backport to 0.18.X to prepare the
0.18.2 release. Are there any other important recently fixed bugs
people would like to see backported in this release?

https://github.com/scikit-learn/scikit-learn/milestone/23?closed=1

Best,

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] modifying CV score

2017-01-04 Thread Olivier Grisel
You can indeed derive from BaseEstimator and implement fit, predict
and optionally score.

Here is the documentation for the expected estimator API:

http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects

As this is a linear regression model, you may also want to have a look
at the LinearModel and RegressorMixin base classes for inspiration:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/base.py#L401

Note that the score function should always be "higher is better". The
explained variance ratio and negative mean squared error are valid
scoring functions for model selection in scikit-learn while raw MSE is
not.
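
A minimal sketch of such a custom estimator, evaluated with a "higher is better" scorer. `LeastSquaresRegressor` is a hypothetical toy class written for this example, not an existing scikit-learn estimator.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score


class LeastSquaresRegressor(RegressorMixin, BaseEstimator):
    """Toy ordinary least squares regressor with an optional intercept."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def _design(self, X):
        X = np.asarray(X, dtype=float)
        if self.fit_intercept:
            # Append a constant column to model the intercept.
            X = np.hstack([X, np.ones((X.shape[0], 1))])
        return X

    def fit(self, X, y):
        A = self._design(X)
        y = np.asarray(y, dtype=float)
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return self._design(X) @ self.coef_


rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# "neg_mean_squared_error" is minus the MSE, so that higher is better.
neg_mse = cross_val_score(
    LeastSquaresRegressor(), X, y, cv=5, scoring="neg_mean_squared_error"
)
# Without scoring=, the score method is used (R^2, from RegressorMixin).
r2 = cross_val_score(LeastSquaresRegressor(), X, y, cv=5)
print(neg_mse.mean(), r2.mean())
```

To change the cross-validation score, either pass a different `scoring` argument or override `score` on the class, keeping the convention that larger values mean a better model.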

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] HashingVectorizer slow in version 0.18

2016-10-11 Thread Olivier Grisel
I cannot reproduce such a degradation on my machine:

(sklearn-0.17)ogrisel@is146148:~/code/scikit-learn$ python
~/tmp/bench_vectorizer.py
scikit-learn 0.17.1. Numpy 1.11.2. Python 3.5.0 x86_64
Vectorizing 20newsgroup 11314 documents
Vectorization completed in  4.033604383468628  seconds, resulting
shape  (11314, 1048576)

(sklearn-0.18) ogrisel@is146148:~/code/scikit-learn$ python
~/tmp/bench_vectorizer.py
scikit-learn 0.18. Numpy 1.11.2. Python 3.5.0 x86_64
Vectorizing 20newsgroup 11314 documents
Vectorization completed in  4.990509510040283  seconds, resulting
shape  (11314, 1048576)

Which operating system are you using?

Please feel free to open an issue on the tracker anyway.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Latent Semantic Analysis (LSA) and TrucatedSVD

2016-08-27 Thread Olivier Grisel
BTW Roman, the examples in your gist would make a great non-regression
test for this new feature. Please feel free to submit a PR.

--
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] 0.18?

2016-07-25 Thread Olivier Grisel
Sorry for the late reply,

Before working on this release I would like to automate the wheel
generation process (for the release wheels) in a single repo that will
generate wheels for linux, osx and windows based on
https://github.com/matthew-brett/multibuild

I plan to put that repo under
https://github.com/scikit-learn/scikit-learn-wheels and deprecate
https://github.com/MacPython/scikit-learn-wheels that we used for the
OSX wheels.

There is also some issue triaging to do: it would be great to identify
blocker bugs that we would like to get fixed before releasing 0.18.

We can aim to do a beta mid-August and the final release after
EuroSciPy (first week of September).
-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] How to test on PYTHON_ARCH=32 with mac?

2016-07-20 Thread Olivier Grisel
> I believe this `arch -i386`  only works as a prefix for Python.org Python, 
> but I'm happy to be corrected.

Then the following should work:

arch -i386 python -c "import nose; nose.main()" sklearn
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] NB-SVM Implementation

2016-06-07 Thread Olivier Grisel
I think it could be implemented as a preprocessing step: this is the
approach followed by:
https://github.com/ryankiros/skip-thoughts/blob/master/eval_classification.py

Note that in that case LogisticRegression is used as the final
classifier instead of a squared hinge loss SVM but that should not
change much in practice.

If you want to make this approach scikit-learn compatible (to work
with the Pipeline and sklearn's model selection tools for instance) be
sure to implement the Transformer API as documented here:

http://scikit-learn.org/dev/developers/contributing.html#apis-of-scikit-learn-objects
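
To give an idea of what such a preprocessing step could look like, here is a rough sketch of a transformer that rescales features by the naive Bayes log-count ratio. It is an assumption-laden simplification (binary labels only, dense arrays, no interpolation between the NB and discriminative weights) and `LogCountRatioScaler` is a hypothetical name, not the code from the repository linked above.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


class LogCountRatioScaler(TransformerMixin, BaseEstimator):
    """Scale each feature by the naive Bayes log-count ratio (binary labels)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing of the feature counts

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        if len(self.classes_) != 2:
            raise ValueError("This sketch only handles binary problems")
        # Smoothed per-feature counts for the positive and negative class.
        p = self.alpha + X[y == self.classes_[1]].sum(axis=0)
        q = self.alpha + X[y == self.classes_[0]].sum(axis=0)
        self.log_ratio_ = np.log((p / p.sum()) / (q / q.sum()))
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) * self.log_ratio_


# Tiny count matrix: feature 0 is typical of class 1, feature 1 of class 0.
X = np.array([[3, 0, 1], [2, 1, 1], [0, 3, 1], [1, 2, 1]])
y = np.array([1, 1, 0, 0])

model = make_pipeline(LogCountRatioScaler(), LogisticRegression())
model.fit(X, y)
print(model.named_steps["logcountratioscaler"].log_ratio_)
print(model.predict(X))
```

Because the ratio is learned in `fit` from the training labels only, the step can be cross-validated inside a Pipeline without leaking information from the validation folds.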

Read the rest of the contributions guide:

http://scikit-learn.org/dev/developers

NBSVM is quite recent and might not strictly follow the conditions for
inclusion as stated in:

http://scikit-learn.org/stable/faq.html#can-i-add-this-new-algorithm-that-i-or-someone-else-just-published

It already has 163 citations though:

https://scholar.google.com/scholar?oi=bibs&hl=en&cites=1710642630990759287

As this is a really strong baseline and the model is not complex and
should blend well within the scikit-learn API I would be +1 for
inclusion in sklearn.

-- 
Olivier
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn