On Sat, Jul 21, 2012 at 12:18:46AM +0200, Olivier Grisel wrote:
> Recent merges have slowly decreased the test coverage ratio of the
> code base (from nearly 90% down to 85% now):

Let's discuss how to do something about it.

I am not sure that everybody knows an efficient workflow for improving
test coverage. It's easy:

1. Run 'make test-coverage'. The output lists for each file the line
   numbers that are not tested.

2. Find a low-hanging fruit, look at which lines are not tested, and
   write or adapt a test specifically for these lines (see the sketch
   below).

3. Loop.

It's also a great way to improve your knowledge of the scikit-learn
codebase.
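
To make the loop concrete, here is a minimal sketch of steps 1 and 2.
The helper and its test are made up for illustration, not actual
scikit-learn code; suppose the coverage report flags the error branch
of the helper as never executed:

    def check_fraction(x):
        # Hypothetical helper somewhere in the code base.
        if not 0.0 <= x <= 1.0:  # <- line listed as untested
            raise ValueError("x should be in [0, 1], got %r" % x)
        return x

    # A test written specifically to exercise that line, in the nose
    # style used by the scikit-learn test suite:
    from nose.tools import assert_raises

    def test_check_fraction_rejects_out_of_range():
        assert_raises(ValueError, check_fraction, 1.5)
        assert check_fraction(0.5) == 0.5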

> https://jenkins.shiningpanda.com/scikit-learn/job/python-2.7-numpy-1.5.1-scipy-0.10.0/
As a side note, I get different coverage numbers when running
test-coverage on my box than on jenkins. No big deal.


Non-algorithmic code that needs to be tested
=============================================

There is a lot of miscellaneous code that is not tested, such as the
setup.py files. I don't believe this is something we can ignore, for
two reasons: i) the broken-window effect, and ii) the fact that this
is code that does need to be supported and maintained in the long run.

* Setup.py

  Reason ii) does not really apply to the setup.py files, as they are
  exercised during the build. Actually, I am wondering: is it possible
  to rig up jenkins so that it performs and tests an install rather
  than an in-place build? The install path would then be tested as
  well, which is an additional benefit.

* Datasets: a lot of dataset downloading code does not get tested. I am
  a bit uneasy about this. It would be great to write some testing
  code, for instance based on mocking urllib (see the sketch after
  this list).

* I have checked in a refactor and tests for __check_build:
  
https://github.com/scikit-learn/scikit-learn/commit/f29363092c456d87e9d344f018938f8f0bb6ae23
  This is an example of how code can be engineered to be as testable
  as possible.

* The joblib tests do not seem to be running. I'll see what I can do about
  it.
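
Here is a hedged sketch of that mocking idea, as promised above:
'_download' is a made-up stand-in for the download helper of a
fetcher, but an actual sklearn.datasets fetcher could be patched the
same way:

    from StringIO import StringIO
    import urllib2
    import mock  # the 'mock' package from PyPI

    def _download(url):
        # Hypothetical helper: fetch a URL and return its contents.
        return urllib2.urlopen(url).read()

    @mock.patch('urllib2.urlopen')
    def test_download_without_network(urlopen):
        # Serve a canned payload instead of hitting the network.
        urlopen.return_value = StringIO("0,1\n2,3\n")
        assert _download("http://example.com/x.csv") == "0,1\n2,3\n"
        urlopen.assert_called_once_with("http://example.com/x.csv")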

Algorithmic code that needs more tests
========================================

Here are some low-hanging fruit that I believe could use some love.
Improving their testing is probably not a huge amount of work, and I
wouldn't be surprised if, once we do, we uncover bugs or inconsistent
behaviors (a smoke-test sketch follows the list):

* Gaussian processes:
  
https://jenkins.shiningpanda.com/scikit-learn/job/python-2.7-numpy-1.5.1-scipy-0.10.0/902/cobertura/sklearn_gaussian_process

* Spectral clustering:
  
https://jenkins.shiningpanda.com/scikit-learn/job/python-2.7-numpy-1.5.1-scipy-0.10.0/902/cobertura/sklearn_cluster/spectral_py/

* Some feature selection code:
  
https://jenkins.shiningpanda.com/scikit-learn/job/python-2.7-numpy-1.5.1-scipy-0.10.0/902/cobertura/sklearn_feature_selection/

* sklearn.utils.bench
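
As announced above, here is a hedged smoke-test sketch for the first
of these, assuming the current fit/predict estimator API of the
Gaussian process module. Such a test mostly buys line coverage and
catches outright crashes; correctness checks would come next:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcess

    def test_gaussian_process_smoke():
        # Tiny random regression problem with distinct inputs.
        rng = np.random.RandomState(0)
        X = rng.rand(20, 2)
        y = np.sin(X[:, 0]) + X[:, 1]
        gp = GaussianProcess().fit(X, y)
        y_pred = gp.predict(X)
        # Smoke assertions only: right size, finite values.
        assert np.ravel(y_pred).shape == (20,)
        assert np.all(np.isfinite(y_pred))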

Now it's just a simple matter of programming :)

Gael
