Hello there,

My name is Ryan and I work on the mlpack machine learning library. Last summer, we were participants in the Google Summer of Code, and we had an excellent student who created an automatic benchmarking system, seen here:
http://www.mlpack.org/benchmark.html

(You have to hit the buttons on the right to open panels to get the benchmarks to show; it needs better documentation, but in part that's why I'm writing this message.)

These benchmarks take a long time to run and were originally intended for internal mlpack use, to answer the questions "Where is mlpack slow? Where is mlpack fast? Where do we need to improve?" However, it quickly became clear after talking to the Shogun library that this system could be improved, and also that they were interested in improving it and deploying it for their own set of questions.

I came into your IRC channel some time ago and spoke with Andy, who suggested that the scikit-learn community would be interested in this system and potentially interested in a cross-library GSoC project to improve the system, which is now hosted at GitHub:

https://github.com/zoq/benchmarks

Currently, the system is run on this ridiculous contraption I set up in my free time (note the sparc and sparc64 boxes, very important for testing!):

http://www.ratml.org/misc_img/build_farm.jpg

So, the point of this email is to see if scikit-learn finds this system interesting and might want to help out with an effort to improve the project. Here are ideas from both the mlpack ideas list and the Shogun ideas list:

http://www.mlpack.org/trac/wiki/SummerOfCodeIdeas#ImprovementofautomaticbenchmarkingsystemcollaborationwithShogun
http://www.shogun-toolbox.org/page/Events/gsoc2014_ideas#mlpack

Given the interest we have had so far, I think that a good place to start might be improving the system so it can answer more than "Which library is fastest for this task?" and maybe it can answer questions more like "Which library is most accurate with default options for this task?" or "Which particular algorithm provides the best accuracy/runtime tradeoff?"
At the same time, I think we have to be careful so as to not make the system so incredibly complex that it can answer any question at all, but nobody knows what questions to ask, or the questions are simply too difficult to ask. (I think http://mlcomp.org/ suffers from this problem; when I go to their website, I'm so overloaded with information that I don't really even know how to begin asking the system questions.)

I'm not on the list, so please CC me in any responses to this thread that go to the list. If there is sufficient interest, we can involve the Shogun guys in a discussion to decide what might be interesting to do over a summer.

Thanks,

Ryan

-- 
Ryan Curtin      | "In honor of the last American hero, to whom speed
[email protected] | means freedom of the soul."  - Super Soul

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
