On Sat, Jul 21, 2012 at 12:18:46AM +0200, Olivier Grisel wrote:
> Recent merges have slowly decreased the test coverage ratio of the
> code base (from nearly 90% down to 85% now):
This email discusses how to do something about this.

I am not sure that everybody knows the efficient workflow to improve
test coverage. It's easy:

1. Run 'make test-coverage'. The output lists, for each file, the line
   numbers that are not tested.

2. Find a low-hanging fruit, look at which lines are not tested, and
   write or adapt a test specifically for these lines (see the sketch
   in the PS at the end of this mail).

3. Loop.

It's also a great way to improve your knowledge of the scikit-learn
codebase.

> https://jenkins.shiningpanda.com/scikit-learn/job/python-2.7-numpy-1.5.1-scipy-0.10.0/

As a side note, I get different coverage numbers when running
test-coverage on my box than on jenkins. No big deal.

Non-algorithmic code that needs to be tested
=============================================

There is a lot of miscellaneous code that is not tested, such as the
setup.py files. This is something that I don't believe we can ignore,
for two reasons: i) the broken-window effect and ii) the fact that this
is code that does need to be supported and maintained in the long run.

* Setup.py: reason ii) does not really apply to the setup.py files, as
  they are tested during the build. Actually, I am wondering: is it
  possible to rig up jenkins so that it performs and tests an install,
  rather than an in-place build? The install would then be tested,
  which is an additional benefit.

* Datasets: a lot of the dataset-downloading code does not get tested.
  I am a bit uneasy about this. It would be great to write some testing
  code, for instance based on mocking urllib (see the sketch in the PPS
  at the end of this mail).

* I have checked in a refactor and tests for __check_build:
  https://github.com/scikit-learn/scikit-learn/commit/f29363092c456d87e9d344f018938f8f0bb6ae23
  This is an example of how code can be engineered to be tested as much
  as possible.

* The joblib tests do not seem to be running. I'll see what I can do
  about it.

Algorithmic code that needs more tests
========================================

Here is what I believe to be low-hanging fruit that could use some
love. It is probably not a huge amount of work to improve their
testing, and I wouldn't be surprised if, once we do this, we uncover
bugs or inconsistent behaviors:

* Gaussian processes:
  https://jenkins.shiningpanda.com/scikit-learn/job/python-2.7-numpy-1.5.1-scipy-0.10.0/902/cobertura/sklearn_gaussian_process

* Spectral clustering:
  https://jenkins.shiningpanda.com/scikit-learn/job/python-2.7-numpy-1.5.1-scipy-0.10.0/902/cobertura/sklearn_cluster/spectral_py/

* Some feature selection code:
  https://jenkins.shiningpanda.com/scikit-learn/job/python-2.7-numpy-1.5.1-scipy-0.10.0/902/cobertura/sklearn_feature_selection/

* sklearn.utils.bench

Now it's just a simple matter of programming :)

Gael
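
PS: to make point 2 of the workflow above concrete, here is the kind of
targeted test I mean. This is only a sketch: SelectKBest is an
arbitrary example, and in practice you would pick the estimator and the
code path straight from the coverage report.

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif


    def test_select_kbest_smoke():
        # Tiny synthetic problem with a fixed seed for reproducibility.
        X, y = make_classification(n_samples=50, n_features=10,
                                   random_state=0)
        # Exercise the fit/transform code path and check the output shape.
        X_reduced = SelectKBest(f_classif, k=3).fit(X, y).transform(X)
        assert X_reduced.shape == (50, 3)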

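PPS: a minimal sketch of the urllib-mocking idea for the dataset
downloaders, using the 'mock' package. fetch_foo below is a made-up
stand-in for a real sklearn.datasets fetcher, and I am assuming the
fetcher goes through urllib2.urlopen; adapt the patch target to
whatever the real code calls.

    from StringIO import StringIO
    import urllib2
    import mock


    def fetch_foo(url):
        # Made-up stand-in for a sklearn.datasets download helper.
        return urllib2.urlopen(url).read()


    def test_fetch_foo_without_network():
        fake_payload = StringIO("1,2\n3,4\n")
        # Replace urllib2.urlopen so that no network connection is made.
        with mock.patch('urllib2.urlopen',
                        return_value=fake_payload) as patched:
            data = fetch_foo('http://example.com/data.csv')
        assert patched.called
        assert data == "1,2\n3,4\n"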