[Scikit-learn-general] Automatic benchmarking: GSoC cross-project idea

Ryan Curtin Tue, 11 Mar 2014 12:49:28 -0700

Hello there,

My name is Ryan and I work on the mlpack machine learning library.  Last
summer, we were participants in the Google Summer of Code, and we had an
excellent student who created an automatic benchmarking system, seen
here:


http://www.mlpack.org/benchmark.html
(you have to hit the buttons on the right to open panels to get the
benchmarks to show; it needs better documentation, but in part that's
why I'm writing this message)

These benchmarks take a long time to run and were originally intended
for internal mlpack use, to answer the questions "Where is mlpack slow?
Where is mlpack fast?  Where do we need to improve?"

However, it quickly became clear after talking to the Shogun developers
that this system could be improved, and also that they were interested
in improving it and deploying it for their own set of questions.

I came into your IRC channel some time ago and spoke with Andy, who
suggested that the scikit-learn community would be interested in this
system and potentially interested in a cross-library GSoC project to
improve the system, which is now hosted on GitHub:

https://github.com/zoq/benchmarks

Currently, the system is run on this ridiculous contraption I set up in
my free time (note the sparc and sparc64 boxes, very important for
testing!):

http://www.ratml.org/misc_img/build_farm.jpg

So, the point of this email is to see if scikit-learn finds this system
interesting and might want to help out with an effort to improve the
project.  Here are ideas from both the mlpack ideas list and the Shogun
ideas list:

http://www.mlpack.org/trac/wiki/SummerOfCodeIdeas#ImprovementofautomaticbenchmarkingsystemcollaborationwithShogun
http://www.shogun-toolbox.org/page/Events/gsoc2014_ideas#mlpack

Given the interest we have had so far, I think that a good place to
start might be improving the system so that it can answer more than
"Which library is fastest for this task?"  It could also answer
questions like "Which library is most accurate with default options for
this task?" or "Which particular algorithm provides the best
accuracy/runtime tradeoff?"  At the same time, I think we have to be
careful not to make the system so complex that it can answer any
question at all, but nobody knows what questions to ask, or the
questions are simply too difficult to ask.  (I think http://mlcomp.org/
suffers from this problem; when I go to their website, I'm so overloaded
with information that I don't really even know how to begin asking the
system questions.)

I'm not on the list, so please CC me in any responses to this thread
that go to the list.  If there is sufficient interest, we can involve
the Shogun guys in a discussion to decide what might be interesting to
do over a summer.

Thanks,

Ryan

-- 
Ryan Curtin    | "In honor of the last American hero, to whom speed
[email protected] | means freedom of the soul."  - Super Soul

