On Tue, Jul 03, 2012 at 12:24:43PM -0700, Jake Vanderplas wrote:
> I turned in the first draft of my PhD thesis yesterday,
Congratulations!

> Should the loaders be moved to sklearn.datasets, so the data can be
> used for general examples which are not associated with the tutorial?
> Or do you think it's OK to have tutorial-specific loaders left out of
> the general sklearn package?

I haven't looked at this specific code, but I think that it is fine to
leave it in the examples unless it is fairly complex and shared across
several examples. Downloading code in the examples is what we do in the
Mayavi examples, and it has worked out well.

> 2) In the tutorial files, I'd like to show in-line example code for the
> loading, processing, and plotting of these datasets. I think this may
> pose a problem for doctests, because it could result in large downloads
> and/or matplotlib plotting when nosetests are run. My gut feeling is
> that it would be better to have nosetests ignore the code snippets in
> the tutorial. Any input on this? What's the best way to tell doctests
> to ignore these code blocks (short of an ignore directive on each line)?

A combination of ignore directives and mock code in reST comments is what
I have used successfully (see for instance
https://raw.github.com/nisl/tutorial/master/doc/haxby_decoding.rst ).

By the way, that tutorial is also a possible example of how to easily
blend example and text together, as it grabs portions of its code from
an example file.

> 3) Currently the exercises follow the format that Olivier set up in his
> tutorials, with a "skeleton" and a "solution" for each example script.
> On Fernando's suggestion, I'd like to move to using ipython notebooks
> for these examples. I think it leads to a much smoother interface,
> especially the ability to try-out code snippets one-by-one, avoiding
> errors associated with running incomplete code. This may lead to a
> problem: ipython notebook is still relatively new, and not everyone can
> read *.ipynb files.
> Any input on whether I should remove the current
> skeleton/solution scripts in favor of ipython notebooks, or try to
> retain both versions of each example for broader compatibility?

I am very clearly -1 on this suggestion, for several reasons:

a. I worry very much about leaving the tried and tested notion of a
   source code file. We have a complete development and maintenance
   workflow that is based upon it, and that this change would break, in
   particular:

   1. Version control: I don't know how people do version control of
      notebooks, but I am a bit worried about what the diffs will look
      like. I think that we currently have a great workflow with git and
      github.

   2. Testing: with the scipy-lecture notes and the NISL tutorial
      (https://github.com/nisl/tutorial) we really found it hard to make
      sure that the tutorials did not break over time. We now have a
      policy that all code must be doctested, and all figures must be
      generated from an example (pretty much the policy that we have in
      scikit-learn). Doctesting can be tricky; I am quite happy to rely
      on the existing best practices that work for rst and source code
      files.

b. The Notebook is still bleeding edge. We have a clear policy of not
   depending on any package that is not at least a few years old, and I
   think that it has paid off. We don't want to lose people over such a
   feature.

c. Not everybody likes a Notebook interface. I was recently giving a
   tutorial, and we had a discussion in the classroom about the Notebook:
   it turned out that about 50% of the people in that classroom preferred
   a Matlab-like environment, in which you have a separate editor, and
   the remaining 50% preferred a notebook-like interface. I personally am
   a very unhappy man if I have to edit code in a browser.

To sum up, I think that it is a good idea, but here we must choose where
we innovate. My concerns will clearly be addressed in the near future,
but we can wait and focus on innovating on the machine learning side.
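Coming back to question 2, here is a minimal sketch of the "ignore directive plus mock code in reST comments" approach, using only the standard-library `doctest` module. The reST fragment, `fetch_tutorial_data` and the `plt.plot` call are hypothetical stand-ins, not code taken from the NISL tutorial. The `..` block is a reST comment, so Sphinx does not render it, but doctest still finds and runs the `>>>` lines inside it; the expensive download and plotting lines carry `# doctest: +SKIP`, so they are shown to the reader but never executed.

```python
import doctest

# A fragment of a hypothetical tutorial .rst file. The ".." block is a
# reST comment: Sphinx does not render it, but doctest still runs the
# ">>>" lines inside it, so it can quietly set up mock data.
RST_SOURCE = """
..
    >>> # Hidden setup: stand-in for the real download
    >>> data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

Load the data (this would download it, so doctest skips it):

    >>> data = fetch_tutorial_data()  # doctest: +SKIP

Plotting is skipped too, so no figure window is opened:

    >>> plt.plot(data)  # doctest: +SKIP

Cheap lines still run against the mock data and are checked:

    >>> len(data)
    3
"""


def run_rst_doctests(text, name="tutorial.rst"):
    """Run the doctest examples embedded in a string of reST text.

    Returns doctest's TestResults; skipped examples are not counted
    as attempted.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name=name,
                              filename=name, lineno=0)
    runner = doctest.DocTestRunner()
    return runner.run(test)


if __name__ == "__main__":
    results = run_rst_doctests(RST_SOURCE)
    print(results.failed, results.attempted)
```

Running `doctest.testfile` on a real .rst file should behave the same way; the string-based runner above just keeps the sketch self-contained.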
That said, there is _huge_ value in your suggestion: it would provide
users with a great environment to interact with the exercise and the
solution, and I would love to see IPython notebooks popping up alongside
the documentation in the future. I think that a technical solution to
most of my concerns would be a way to generate a notebook from a Python
source code file. Fernando's suggestion of using nbconvert partly
addresses points b and c, but not point a, which is the most important
in my eyes. Besides, it would still force our developers to use a recent
version of the notebook. I am quite convinced that we should make things
as easy and uneventful as possible for our developers, and that entails
trying as much as possible not to force a choice of tools upon them.

> 4) Any other issues I'm not thinking of?

I need to review this. I am completely exhausted after one month straight
of traveling (including a scikit-learn tutorial at a NeuroImaging
conference) and a bit sick. Tomorrow I have to catch up with things at
work, and I was planning to stay far away from my computer tonight. What
are your time constraints?

Cheers,

Gaël

PS: thanks for the tutorial, it's really cool.

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
