2012/7/3 Jake Vanderplas <[email protected]>:
> Hi folks,
> I turned in the first draft of my PhD thesis yesterday,
Congrats :)

> 1) The tutorial examples make use of several astronomy-specific
> datasets. These primarily come from publicly available data at the
> Sloan Digital Sky Survey [2], but I've done some preprocessing and
> loaded the datasets onto my own web site. This location for the data
> files should be long-lived: the website is associated with a
> python-based statistics textbook I'm coauthoring, which will be
> published early 2013. Currently, I've put the loaders from these
> datasets in the example plotting scripts themselves. Should the loaders
> be moved to sklearn.datasets, so the data can be used for general
> examples which are not associated with the tutorial? Or do you think
> it's OK to have tutorial-specific loaders left out of the general
> sklearn package?

+1 for making them available as fetch_* functions. It would be even
better to publish them on mldata.org and use the default mldata loader
from scikit-learn.

> 2) In the tutorial files, I'd like to show in-line example code for the
> loading, processing, and plotting of these datasets. I think this may
> pose a problem for doctests, because it could result in large downloads
> and/or matplotlib plotting when nosetests are run. My gut feeling is
> that it would be better to have nosetests ignore the code snippets in
> the tutorial. Any input on this? What's the best way to tell doctests
> to ignore these code blocks (short of an ignore directive on each line)?

It is possible to set up fixtures for nose tests. You could skip those
doctests (raise SkipTest) if the data is not already available in the
local cache:

https://github.com/scikit-learn/scikit-learn/blob/master/doc/datasets/labeled_faces_fixture.py
https://github.com/scikit-learn/scikit-learn/blob/master/Makefile#L33

> 3) Currently the exercises follow the format that Olivier set up in his
> tutorials, with a "skeleton" and a "solution" for each example script.
> On Fernando's suggestion, I'd like to move to using ipython notebooks
> for these examples. I think it leads to a much smoother interface,
> especially the ability to try-out code snippets one-by-one, avoiding
> errors associated with running incomplete code. This may lead to a
> problem: ipython notebook is still relatively new, and not everyone can
> read *.ipynb files. Any input on whether I should remove the current
> skeleton/solution scripts in favor of ipython notebooks, or try to
> retain both versions of each example for broader compatibility?

I also think that partially filled notebooks would be a very good
teaching support, especially for tutored exercise sessions. A conversion
script would go well with them, to turn a notebook-prototyped session
into a runnable program suitable for deployment in a real application.

> 4) Any other issues I'm not thinking of?
>
> Thanks for taking the time to read through all this. I should mention
> that I'm working on this now in preparation for the scikit-learn
> tutorial at Scipy 2012 in Austin two weeks from now. I hope to see some
> of you there!

Enjoy!

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
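
For the fixture idea in point 2, here is a minimal sketch of what such a
fixture module could look like. It is not the content of the linked
labeled_faces_fixture.py: the cache directory, the dataset folder name and
the data_home parameter are hypothetical, chosen only for illustration.
The idea is that nose runs setup_module before the doctests of the
matching document (assuming the doctest plugin is pointed at fixture
modules, e.g. via its --doctest-fixtures option), and a SkipTest raised
there skips them instead of triggering a download.

```python
import os
from unittest import SkipTest

# Hypothetical cache location and dataset folder, for illustration only.
DEFAULT_DATA_HOME = os.path.join(os.path.expanduser("~"), "astro_tutorial_data")
DATASET_DIR = "sdss_spectra"


def dataset_is_cached(data_home=DEFAULT_DATA_HOME):
    """Return True if the dataset folder already exists in the local cache."""
    return os.path.isdir(os.path.join(data_home, DATASET_DIR))


def setup_module(module=None, data_home=DEFAULT_DATA_HOME):
    """Module-level fixture: skip the doctests rather than download data."""
    if not dataset_is_cached(data_home):
        raise SkipTest(
            "Skipping dataset doctests: no cached data in %s" % data_home
        )
```

With this layout the tutorial snippets stay as ordinary doctests: they run
on machines where the data has already been fetched and are reported as
skipped everywhere else.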

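
For the conversion script mentioned in point 3, a rough sketch follows. It
assumes the notebook file is JSON with a top-level "cells" list whose
entries carry "cell_type" and "source" keys; that is the layout of later
nbformat versions and may not match the files written by the IPython
release current at the time of this thread. The function names are
hypothetical.

```python
import json


def notebook_to_script(notebook):
    """Return the code cells of a notebook dict as one Python source string."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # drop markdown and other non-code cells
        source = cell.get("source", "")
        # The source may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip() + "\n")
    # Separate consecutive cells with a blank line.
    return "\n".join(chunks)


def convert_file(ipynb_path, script_path):
    """Read a notebook file and write its code cells to a plain script."""
    with open(ipynb_path) as f:
        notebook = json.load(f)
    with open(script_path, "w") as f:
        f.write(notebook_to_script(notebook))
```

Calling convert_file("exercise.ipynb", "exercise.py") (both file names are
placeholders) would give a script that people without the notebook
interface can still read and run, which also speaks to the compatibility
concern raised in the question.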