Hello everybody,

I will start work on this year's GSoC project, as discussed, by making the linear models faster where applicable, most importantly in multi-task regression problems.
The plan (which will be piloted now and, hopefully, nailed down towards the middle of the summer) is:

1. Choose some datasets for benchmarking the regression problem. These need to explore as many of the possible gotchas as we can: wide X, tall X, sparse X, etc. Maybe use our generators.
2. Set up a (pilot) benchmark runner using these datasets. This will slowly build up into a nice speed.pypy.org-like (but hopefully cleaner) interface so we can monitor the overall performance of the scikit.
3. Lose nights obsessing over getting the plot to go lower and lower.
4. ???
5. Profit

When I get to 3, I know exactly what to do. However, I feel that 1 and especially 2 are pretty novel and should be discussed. Things will start moving smoothly once we have such a system up and running, but setting it up does not appear to be trivial, so I would like to hear your suggestions and what you expect from this project.

Olivier suggested using the buildbot to run the benchmarks, and maybe using and improving Wes's vbench. A key ingredient is performance stability: if the machine we run the benchmarks on is under variable load or goes through an upgrade, our plots could go wild. The first issue can be solved with best-out-of-k timeit-style benchmarking.

Another point is whether you think this should take the place of the ml-benchmarks project and, if not, what exactly its home should be.

Here's to putting a stone at the foundation of future scikit-learn performance development.

Cheers,
Vlad

------------------
Vlad N.
http://vene.ro
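P.S. To make the best-out-of-k idea concrete, here is a minimal sketch using only the standard library's timeit module. The names best_of_k and toy_workload are made up for illustration; toy_workload is just a stand-in for fitting an estimator, and the real runner may well end up looking different.

```python
import timeit


def best_of_k(func, k=5, number=3):
    """Return the best (smallest) per-call time in seconds over k repeats.

    Hypothetical helper: each repeat times `number` consecutive calls of
    `func`, and only the fastest repeat is kept.
    """
    # timeit.repeat returns a list of k total times, one per repeat.
    totals = timeit.repeat(func, repeat=k, number=number)
    # The minimum is the repeat least disturbed by other load on the machine.
    return min(totals) / number


def toy_workload():
    # Placeholder for something like fitting a linear model (assumption).
    return sum(i * i for i in range(10000))
```

Calling best_of_k(toy_workload) gives one number per benchmark run that can be plotted over time. Taking the minimum rather than the mean is what filters out slowdowns caused by variable load; it does nothing for the machine-upgrade case.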
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
