Re: [scikit-learn] Continuous monitoring of benchmark performances

2019-07-22 Thread Joel Nothman
Isn't Jérémie's project at
https://github.com/jeremiedbb/scikit-learn_benchmarks meant to be doing
this? What's its status? How does it relate to Tom's work?

(Can we please take http://scikit-learn.org/ml-benchmarks/ offline?)

On Tue, 23 Jul 2019 at 00:17, Nicolas Hug wrote:

> I agree that having non-regression benchmarks would be very helpful. A
> seemingly simple change in Cython code can lead to a drastic performance drop.
>
> I can't find it now, but I think Jérémie submitted an issue about this?
>
> On 7/22/19 9:59 AM, Tom Augspurger wrote:
>
> Thanks Adrin,
>
> A month or so ago I started running scikit-learn benchmarks, but I had to
> disable them since they were taking too long (longer than a day).
> I haven't had time to investigate why, but I assume it was an issue with
> how I set them up.
>
> Just FYI, I'm planning to include "maintain and improve the benchmark
> running tools" as part of the pandas' application for the CZI grant.
> All that is in https://github.com/asv-runner (a mix of Ansible, Airflow,
> and GitHub bots). If anyone is interested in (possibly) having funding
> to work on this, feel free to reach out to me off list and we can discuss
> things.
>
> Tom
>
> On Mon, Jul 22, 2019 at 8:53 AM Adrin wrote:
>
>> Hi,
>>
>> There is this [page](https://pandas.pydata.org/speed/scikit-learn/)
>> maintained by some of the pandas maintainers (@TomAugspurger in
>> particular), and it seems like a really good idea to have an eye on the
>> performance of different benchmarks through time just in case a PR
>> introduces some major drawbacks.
>>
>> However, he doesn't have the bandwidth to maintain it much longer, nor
>> really the hardware. I think it'd be a good idea for us to have that, so
>> I wanted to bring it up and see what you think!
>>
>> Cheers,
>> Adrin.
>> ___
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>


Re: [scikit-learn] Continuous monitoring of benchmark performances

2019-07-22 Thread Nicolas Hug
I agree that having non-regression benchmarks would be very helpful. A
seemingly simple change in Cython code can lead to a drastic performance drop.
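
A minimal sketch of such a non-regression check, using only the standard library's `timeit`. The `workload` function, the helper names, and the `tolerance` factor of 1.5 are assumptions for illustration; a real setup would time scikit-learn estimators and compare against timings stored from an earlier commit.

```python
import timeit


def best_time(func, repeat=5, number=100):
    """Return the best per-call wall time in seconds over several repeats."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def is_regression(baseline_seconds, current_seconds, tolerance=1.5):
    """Flag a regression when the current time exceeds baseline * tolerance."""
    return current_seconds > baseline_seconds * tolerance


def workload():
    # Hypothetical stand-in for the code path being benchmarked.
    return sum(i * i for i in range(1000))


current = best_time(workload)
baseline = current  # Assumption: normally loaded from a stored earlier run.
print(is_regression(baseline, current))
```

Taking the minimum over repeats reduces noise from other processes on the machine, which matters when the check is meant to catch slowdowns rather than fluctuations.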


I can't find it now, but I think Jérémie submitted an issue about this?


On 7/22/19 9:59 AM, Tom Augspurger wrote:

> Thanks Adrin,
>
> A month or so ago I started running scikit-learn benchmarks, but I had to
> disable them since they were taking too long (longer than a day).
> I haven't had time to investigate why, but I assume it was an issue with
> how I set them up.
>
> Just FYI, I'm planning to include "maintain and improve the benchmark
> running tools" as part of the pandas' application for the CZI grant.
> All that is in https://github.com/asv-runner (a mix of Ansible, Airflow,
> and GitHub bots). If anyone is interested in (possibly) having funding
> to work on this, feel free to reach out to me off list and we can discuss
> things.
>
> Tom
>
> On Mon, Jul 22, 2019 at 8:53 AM Adrin wrote:
>
>> Hi,
>>
>> There is this [page](https://pandas.pydata.org/speed/scikit-learn/)
>> maintained by some of the pandas maintainers (@TomAugspurger in
>> particular), and it seems like a really good idea to have an eye on the
>> performance of different benchmarks through time just in case a PR
>> introduces some major drawbacks.
>>
>> However, he doesn't have the bandwidth to maintain it much longer, nor
>> really the hardware. I think it'd be a good idea for us to have that, so
>> I wanted to bring it up and see what you think!
>>
>> Cheers,
>> Adrin.


Re: [scikit-learn] Continuous monitoring of benchmark performances

2019-07-22 Thread Tom Augspurger
Thanks Adrin,

A month or so ago I started running scikit-learn benchmarks, but I had to
disable them since they were taking too long (longer than a day).
I haven't had time to investigate why, but I assume it was an issue with
how I set them up.

Just FYI, I'm planning to include "maintain and improve the benchmark
running tools" as part of the pandas' application for the CZI grant.
All that is in https://github.com/asv-runner (a mix of Ansible, Airflow,
and GitHub bots). If anyone is interested in (possibly) having funding
to work on this, feel free to reach out to me off list and we can discuss
things.

Tom

On Mon, Jul 22, 2019 at 8:53 AM Adrin wrote:

> Hi,
>
> There is this [page](https://pandas.pydata.org/speed/scikit-learn/)
> maintained by some of the pandas maintainers (@TomAugspurger in
> particular), and it seems like a really good idea to have an eye on the
> performance of different benchmarks through time just in case a PR
> introduces some major drawbacks.
>
> However, he doesn't have the bandwidth to maintain it much longer, nor
> really the hardware. I think it'd be a good idea for us to have that, so
> I wanted to bring it up and see what you think!
>
> Cheers,
> Adrin.


Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-07-22 Thread Andreas Mueller


On 7/22/19 9:22 AM, Adrin wrote:

> Awesome, excited to have your help around :)
>
> We already have the @core-devs team on github, we can use it more
> often and in a more organized way.

Why wouldn't we just use the scikit-learn repo projects?

> On Fri, Jul 19, 2019 at 2:48 PM Chiara Marmo wrote:
>
>> Dear list,
>>
>> I'm Chiara; in September I will start working full time for the
>> Scikit-Learn Consortium at INRIA (France). My background is in
>> Astronomy and Planetary Science: I've worked there as a Research
>> Engineer for around 15 years, writing code, mining data and
>> managing some projects.
>>
>> One of my tasks at the Consortium will be to take care of our
>> connection with the developer community, so let me know if I can
>> help in managing those monthly meetings in some way.
>> In the meantime, may I suggest creating a github team for core
>> developers in the scikit-learn organization? As Alexandre said,
>> team specific projects and discussions on github could be a way to
>> efficiently prepare meetings and prioritize issues.
>>
>> Thanks for listening,
>> have a nice day.
>> Chiara


[scikit-learn] Continuous monitoring of benchmark performances

2019-07-22 Thread Adrin
Hi,

There is this [page](https://pandas.pydata.org/speed/scikit-learn/)
maintained by some of the pandas maintainers (@TomAugspurger in
particular), and it seems like a really good idea to have an eye on the
performance of different benchmarks through time just in case a PR
introduces some major drawbacks.

However, he doesn't have the bandwidth to maintain it much longer, nor
really the hardware. I think it'd be a good idea for us to have that, so
I wanted to bring it up and see what you think!

Cheers,
Adrin.


Re: [scikit-learn] Test Sample Size

2019-07-22 Thread Brown J.B. via scikit-learn
Dear Milton,

It is just my opinion based on many experiences, but if you want to
stress-test your estimator, make your test set at least as big as, if not
bigger than, the training set.
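
One way to make such a split, sketched with the standard library only. The function name `stratified_split`, the toy label list, and the 50% test fraction are assumptions for illustration; the split is done class by class so that the rare classes of an imbalanced dataset still appear in the test set.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.5, seed=0):
    """Return (train_indices, test_indices), split class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Round up so every class contributes at least one test sample.
        n_test = math.ceil(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Toy imbalanced labels (hypothetical data): 90 / 9 / 1 samples per class.
labels = ["a"] * 90 + ["b"] * 9 + ["c"] * 1
train_idx, test_idx = stratified_split(labels, test_fraction=0.5)

print(len(train_idx), len(test_idx))
print(Counter(labels[i] for i in test_idx))
```

With scikit-learn itself, `train_test_split(X, y, test_size=0.5, stratify=y)` gives a comparable stratified split, though it expects every class to have at least two samples.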

Sincerely,
J.B.

On Mon, Jul 22, 2019 at 22:18, Milton Pifano wrote:

> Dear scikit-learn subscribers.
>
> I am working on a multiclass classification project and I have found
> many resources about how to deal with an imbalanced dataset for training,
> but I have not been able to find any reference on the test dataset size.
> Can anyone send some references?
>
> Thanks,
> Milton Pifano


Re: [scikit-learn] Monthly meetings between core developers + "Hello World"

2019-07-22 Thread Adrin
Awesome, excited to have your help around :)

We already have the @core-devs team on github, we can use it more
often and in a more organized way.

On Fri, Jul 19, 2019 at 2:48 PM Chiara Marmo wrote:

> Dear list,
>
> I'm Chiara; in September I will start working full time for the
> Scikit-Learn Consortium at INRIA (France). My background is in Astronomy
> and Planetary Science: I've worked there as a Research Engineer for around
> 15 years, writing code, mining data and managing some projects.
>
> One of my tasks at the Consortium will be to take care of our connection
> with the developer community, so let me know if I can help in managing
> those monthly meetings in some way.
> In the meantime, may I suggest creating a github team for core
> developers in the scikit-learn organization? As Alexandre said, team
> specific projects and discussions on github could be a way to efficiently
> prepare meetings and prioritize issues.
>
> Thanks for listening,
> have a nice day.
> Chiara


[scikit-learn] Test Sample Size

2019-07-22 Thread Milton Pifano
Dear scikit-learn subscribers.

I am working on a multiclass classification project and I have found many
resources about how to deal with an imbalanced dataset for training, but I
have not been able to find any reference on the test dataset size.
Can anyone send some references?

Thanks,
Milton Pifano