Hello everybody,

I will start my effort for my GSoC project for this year, as discussed, with 
making the linear models faster where applicable, most importantly in 
multi-task regression problems.

The plan, which I will pilot now and hopefully nail down towards the middle of 
the summer, is:

1. Choose some datasets for benchmarking the regression problem.
These need to explore as many of the possible gotchas as we can: wide X, tall 
X, sparse X, etc. Maybe use our generators.

2. Set up a (pilot) benchmark runner using these datasets.
This will slowly build up into a nice speed.pypy.org-like (but hopefully 
cleaner) interface so we can monitor the overall performance of the scikit.

3. Lose nights obsessing over getting the plot to go lower and lower.

4. ???

5. Profit


When I get to 3, I know exactly what to do. However, I feel that 1 and 
especially 2 are pretty novel and should be discussed. Things will start moving 
smoothly once we have such a system up and running, but setting it up does not 
appear to be trivial, so I would like to hear your suggestions and what you 
expect from this project.
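
To make item 1 a bit more concrete, here is a rough sketch of how the datasets 
could be produced with our generators. The function name, the shapes, the 
density and the noise levels are all placeholders I made up for discussion, not 
a settled choice:

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_regression


def make_benchmark_datasets(n_targets=5, seed=0):
    """Return {name: (X, Y)} for a few shapes of X.

    Hypothetical helper: all sizes below are placeholders.
    """
    # (n_samples, n_features) for the dense cases
    shapes = {"tall": (2000, 50), "wide": (50, 2000)}
    datasets = {}
    for name, (n_samples, n_features) in shapes.items():
        # n_targets > 1 gives a multi-task regression problem
        X, Y = make_regression(n_samples=n_samples, n_features=n_features,
                               n_informative=10, n_targets=n_targets,
                               noise=1.0, random_state=seed)
        datasets[name] = (X, Y)

    # Sparse case: random CSR matrix with about 1% nonzero entries,
    # targets built from a random dense coefficient matrix plus noise
    rng = np.random.RandomState(seed)
    X_sparse = sparse.random(1000, 500, density=0.01, format="csr",
                             random_state=rng)
    W = rng.randn(500, n_targets)
    Y_sparse = X_sparse.dot(W) + 0.1 * rng.randn(1000, n_targets)
    datasets["sparse"] = (X_sparse, Y_sparse)
    return datasets


if __name__ == "__main__":
    for name, (X, Y) in make_benchmark_datasets().items():
        print(name, X.shape, Y.shape)
```

Fixing the seed keeps the data identical across runs, so that a change in the 
timings reflects a change in the code rather than in the data.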

Olivier suggested using the buildbot to run the benchmarks, and maybe using and 
improving Wes's vbench. A key ingredient is performance stability: if the 
machine we run the benchmarks on is under variable load or goes through an 
upgrade, our plots could go wild. The first issue can be solved with 
best-out-of-k timeit-style benchmarking.
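
By best-out-of-k I mean something along these lines, using the standard 
library's timeit module. The helper name and the toy workload are made up for 
illustration; in practice the workload would be fitting an estimator:

```python
import timeit


def best_of_k(func, k=5, number=1):
    """Return the fastest per-call time in seconds over k repeats.

    Each repeat calls func `number` times. Taking the minimum discards
    repeats that were slowed down by other load on the machine.
    """
    timings = timeit.repeat(func, repeat=k, number=number)
    return min(timings) / number


def workload():
    # Hypothetical stand-in for the code being benchmarked
    return sum(i * i for i in range(10000))


if __name__ == "__main__":
    print("best of 5: %.6f s" % best_of_k(workload, k=5, number=10))
```

The minimum is used rather than the mean because background load can only make 
a run slower, never faster. This does nothing for the second issue, a machine 
upgrade, which shifts all the timings at once.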

Another point is whether you think this should take the place of the 
ml-benchmarks project and, if not, what exactly its home should be.

Here's to putting a stone at the foundation of future scikit-learn performance 
development.

Cheers,
Vlad

------------------
Vlad N.
http://vene.ro



