Hi Vlad,

It's on my to-do list to create a stress testing module for
sklearn, so we should work together a bit on this. Unfortunately I'm very
busy for the next two weeks, but I should be available after that. I'm happy
to work with you on this.

As well as size, there are other aspects that need to be considered for
general testing, including local minima and noisy features. I'm not sure
all of them will be particularly relevant in your case.

As for where these should go, I'll have to let someone else answer that
question.

Thanks,
Robert

On May 6, 2012 8:29 AM, "Vlad Niculae" <[email protected]> wrote:

> Hello everybody,
>
> I will start my effort for my GSoC project for this year, as discussed,
> with making the linear models faster where applicable, most importantly in
> multi-task regression problems.
>
> The plan (which will be piloted now and will hopefully get nailed down
> towards the middle of the summer) is:
>
> 1. Choose some datasets for benchmarking the regression problem.
> These need to explore as many of the possible gotchas as we can: wide X,
> tall X, sparse X, etc. Maybe use our generators.
>
> 2. Set up a (pilot) benchmark runner using these datasets.
> This will slowly build up into a nice speed.pypy.org-like (but hopefully
> cleaner) interface so we can monitor the overall performance of the scikit.
>
> 3. Lose nights obsessing over getting the plot to go lower and lower.
>
> 4. ???
>
> 5. Profit
>
>
> When I get to 3, I know exactly what to do. However, I feel that 1 and
> especially 2 are pretty novel and should be discussed. Things will start
> moving smoothly once we have such a system up and running, but it does not
> appear to be trivial, so I would like to hear your suggestions and what you
> expect from this project.
>
> Olivier suggested using the buildbot to run the benchmarks and maybe using
> and improving Wes's vbench. A key ingredient is performance stability: if
> the machine we run the benchmarks on is under variable load or goes through
> an upgrade, our plots could go wild. The first issue can be solved with
> best-out-of-k timeit-style benchmarking.
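
To make the best-out-of-k idea concrete, here is a minimal sketch using only
the standard library's timeit module. The helper name best_of_k and the
workload function are hypothetical stand-ins, not part of scikit-learn or
vbench; a real benchmark would time something like a model fit instead.

```python
import timeit


def best_of_k(func, k=5, number=10):
    """Return the best (minimum) per-call time in seconds over k repeats."""
    # Each repeat calls func() `number` times and reports the total time.
    totals = timeit.repeat(func, repeat=k, number=number)
    # Keep the minimum: it is the repeat least disturbed by other load.
    return min(totals) / number


def workload():
    # Hypothetical stand-in for fitting a model on a benchmark dataset.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"best of 5: {best_of_k(workload):.6f} s per call")
```

Taking the minimum rather than the mean filters out slowdowns caused by
variable load, but it does nothing about a hardware or system upgrade
shifting the whole curve.
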
>
> Another point is whether you think this should take the place of the
> ml-benchmarks project and if not, what exactly should be its home?
>
> Here's to putting a stone at the foundation of future scikit-learn
> performance development.
>
> Cheers,
> Vlad
>
> ------------------
> Vlad N.
> http://vene.ro
>
>
>
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>