2012/7/3 Jake Vanderplas <[email protected]>:
> Hi folks,
> I turned in the first draft of my PhD thesis yesterday,

Congrats :)

> 1) The tutorial examples make use of several astronomy-specific
> datasets.  These primarily come from publicly available data at the
> Sloan Digital Sky Survey [2], but I've done some preprocessing and
> loaded the datasets onto my own web site.  This location for the data
> files should be long-lived: the website is associated with a
> python-based statistics textbook I'm coauthoring, which will be
> published early 2013.  Currently, I've put the loaders from these
> datasets in the example plotting scripts themselves.  Should the loaders
> be moved to sklearn.datasets, so the data can be used for general
> examples which are not associated with the tutorial? Or do you think
> it's OK to have tutorial-specific loaders left out of the general
> sklearn package?

+1 for making them available as fetch_* functions. It would be even
better to publish them on mldata.org and use the default mldata loader
from scikit-learn.
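
To make the fetch_* idea concrete, here is a minimal sketch of a loader
that downloads the file once and reuses the local copy afterwards. The
URL, file name, cache folder and function names are all hypothetical
placeholders, not the real location of the data, and the real loader
would presumably parse the file into arrays rather than return a path.

```python
import os
from urllib.request import urlretrieve

# Hypothetical URL and file name, for illustration only.
DATA_URL = "http://example.com/astro/sdss_sample.npy"
ARCHIVE_NAME = "sdss_sample.npy"


def get_data_home(data_home=None):
    """Return the cache folder, creating it if it does not exist yet."""
    if data_home is None:
        # Hypothetical default cache folder in the user's home directory.
        data_home = os.path.join(os.path.expanduser("~"), "astro_tutorial_data")
    os.makedirs(data_home, exist_ok=True)
    return data_home


def fetch_sdss_sample(data_home=None, download_if_missing=True):
    """Return the local path of the dataset, downloading it at most once."""
    data_home = get_data_home(data_home)
    local_path = os.path.join(data_home, ARCHIVE_NAME)
    if not os.path.exists(local_path):
        if not download_if_missing:
            raise IOError("Data not found and download_if_missing is False")
        urlretrieve(DATA_URL, local_path)
    return local_path
```

Passing download_if_missing=False lets tests and doc builds fail fast
instead of triggering a large download.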

> 2) In the tutorial files, I'd like to show in-line example code for the
> loading, processing, and plotting of these datasets.  I think this may
> pose a problem for doctests, because it could result in large downloads
> and/or matplotlib plotting when nosetests are run. My gut feeling is
> that it would be better to have nosetests ignore the code snippets in
> the tutorial.  Any input on this?  What's the best way to tell doctests
> to ignore these code blocks (short of an ignore directive on each line)?

It is possible to set up fixtures for nose tests. You could skip
those doctests (raise SkipTest) if the data is not already available
in the local cache:

https://github.com/scikit-learn/scikit-learn/blob/master/doc/datasets/labeled_faces_fixture.py
https://github.com/scikit-learn/scikit-learn/blob/master/Makefile#L33
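
A fixture along those lines could look roughly like the sketch below.
The cache folder, the environment variable and the file name are
hypothetical; the only real mechanism used is that a setup_module
function raising SkipTest causes the associated doctests to be skipped.
unittest.SkipTest is used here so that the sketch runs without nose
installed.

```python
import os
from unittest import SkipTest

# Hypothetical cache location and file name; adjust to the real loader.
DATA_HOME = os.environ.get(
    "ASTRO_TUTORIAL_DATA",
    os.path.join(os.path.expanduser("~"), "astro_tutorial_data"),
)
CACHED_FILE = "sdss_sample.npy"


def check_cached(data_home, filename=CACHED_FILE):
    """Raise SkipTest unless the dataset file is already in the cache."""
    if not os.path.exists(os.path.join(data_home, filename)):
        raise SkipTest("Skipping dataset loading doctests: data not cached")


def setup_module(module=None):
    """Called by the test runner before the doctests of the matching file."""
    check_cached(DATA_HOME)
```

With the nose doctest plugin, such a file is picked up through the
--doctest-fixtures option, which takes the suffix used to name the
fixture module that sits next to the documentation file.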

> 3) Currently the exercises follow the format that Olivier set up in his
> tutorials, with a "skeleton" and a "solution" for each example script.
> On Fernando's suggestion, I'd like to move to using ipython notebooks
> for these examples.  I think it leads to a much smoother interface,
> especially the ability to try-out code snippets one-by-one, avoiding
> errors associated with running incomplete code.  This may lead to a
> problem: ipython notebook is still relatively new, and not everyone can
> read *.ipynb files.  Any input on whether I should remove the current
> skeleton/solution scripts in favor of ipython notebooks, or try to
> retain both versions of each example for broader compatibility?

I also think that partially filled notebooks would be a very good
teaching support, especially for tutored exercise sessions, along
with a conversion script to turn a notebook-prototyped session into
a runnable program suitable for deployment in a real application.
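
As a rough idea of what such a conversion script involves: a notebook
file is JSON, so the code cells can be extracted with the standard
library alone. The sketch below assumes a top-level "cells" list whose
entries carry "cell_type" and "source" keys; other notebook format
versions nest the cells differently and would need adapting. The
function names are made up for illustration.

```python
import json


def notebook_to_script(notebook):
    """Concatenate the cells of a parsed notebook into one script string."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):
            # The source may be stored as a list of lines.
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            # Keep the explanatory text, turned into comments.
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


def convert_file(ipynb_path, py_path):
    """Read a notebook file and write the extracted script next to it."""
    with open(ipynb_path) as f:
        notebook = json.load(f)
    with open(py_path, "w") as f:
        f.write(notebook_to_script(notebook))
```

Keeping the text cells as comments means the generated script still
reads as a walkthrough, which suits the skeleton/solution style of the
current exercises.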

> 4) Any other issues I'm not thinking of?
>
> Thanks for taking the time to read through all this.  I should mention
> that I'm working on this now in preparation for the scikit-learn
> tutorial at Scipy 2012 in Austin two weeks from now.  I hope to see some
> of you there!

Enjoy!

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
