Re: [Scikit-learn-general] Astronomy Tutorial

Gael Varoquaux Thu, 05 Jul 2012 09:35:02 -0700

On Tue, Jul 03, 2012 at 12:24:43PM -0700, Jake Vanderplas wrote:
> I turned in the first draft of my PhD thesis yesterday,


Congratulations!

> Should the loaders be moved to sklearn.datasets, so the data can be
> used for general examples which are not associated with the tutorial?
> Or do you think it's OK to have tutorial-specific loaders left out of
> the general sklearn package?

I haven't looked at this specific code, but I think that it is fine to
leave it in the examples unless it is fairly complex and shared across
several examples. Downloading code in the examples is what we do in the
Mayavi examples, and it has worked out well.

> 2) In the tutorial files, I'd like to show in-line example code for the 
> loading, processing, and plotting of these datasets.  I think this may 
> pose a problem for doctests, because it could result in large downloads 
> and/or matplotlib plotting when nosetests are run. My gut feeling is 
> that it would be better to have nosetests ignore the code snippets in 
> the tutorial.  Any input on this?  What's the best way to tell doctests 
> to ignore these code blocks (short of an ignore directive on each line)?

A combination of ignore directive and mock code in rest comments is what
I have used successfully (see for instance
https://raw.github.com/nisl/tutorial/master/doc/haxby_decoding.rst )

By the way, that tutorial is also a possible example of how to easily
blend example and text together, as it grabs portions of its code from an
example file.
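
To make the "ignore directive plus mock code in reST comments" idea
concrete, here is a minimal sketch. It is an illustration only: the
document fragment and the names fetch_spectra and plot_spectra are
hypothetical and are not taken from the NISL tutorial. It relies on the
fact that doctest scans the raw text, so examples placed inside a reST
comment are executed even though Sphinx does not render them.

```python
import doctest

# Hypothetical tutorial fragment. The indented block under ".." is a reST
# comment: it is hidden from the rendered page, but doctest still runs it,
# so it can define a cheap mock in place of a large download.
TUTORIAL_RST = """
Loading the data
================

..
    >>> def fetch_spectra():
    ...     return [[0.0, 1.0], [2.0, 3.0]]

The visible example uses the mocked loader:

>>> spectra = fetch_spectra()
>>> len(spectra)
2

Plotting is shown to the reader but skipped under test:

>>> plot_spectra(spectra)  # doctest: +SKIP
"""


def run_tutorial_doctests(text):
    """Run every doctest example found in `text`; return the failure count."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, "tutorial", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test).failed


if __name__ == "__main__":
    print("failures:", run_tutorial_doctests(TUTORIAL_RST))
```

The skipped line never runs, so plot_spectra does not need to exist when
the tests are executed.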

> 3) Currently the exercises follow the format that Olivier set up in his 
> tutorials, with a "skeleton" and a "solution" for each example script.  
> On Fernando's suggestion, I'd like to move to using ipython notebooks 
> for these examples.  I think it leads to a much smoother interface, 
> especially the ability to try-out code snippets one-by-one, avoiding 
> errors associated with running incomplete code.  This may lead to a 
> problem: ipython notebook is still relatively new, and not everyone can 
> read *.ipynb files.  Any input on whether I should remove the current 
> skeleton/solution scripts in favor of ipython notebooks, or try to 
> retain both versions of each example for broader compatibility?

I am very clearly -1 on this suggestion for several reasons:

a. I worry very much about leaving the tried and tested notion of a source
   code file. We have a complete development and maintenance flow that is
   based upon it and that this change would break, in particular:
    1. Version control: I don't know how people do version control of
       notebooks, but I am a bit worried about what the diffs will look
       like. I think that we currently have a great workflow with git and
       github.
    2. Testing: with the scipy-lecture notes and the NISL tutorial
       (https://github.com/nisl/tutorial) we really found it hard to
       make sure that the tutorials did not break over time. We now have
       a policy that all code must be doctested, and all figures must be
       generated from an example (pretty much the policy that we have in
       scikit-learn). Doctesting can be tricky, and I am quite happy to
       rely on the existing best practices that work for rst and source
       code files.

b. The Notebook is still bleeding edge. We have a clear policy of not
   depending on any package that is not a few years old, and I think that
   it has paid off. We don't want to lose people for such a feature.

c. Not everybody likes a Notebook interface. I was recently giving a
   tutorial and we had a discussion in the classroom on the Notebook, and
   it turns out that about 50% of the people in that classroom preferred a
   Matlab-like environment in which you have a separate editor, and the
   remaining 50% preferred a notebook-like interface. I personally am a
   very unhappy man if I edit code in a browser.

To sum up, I think that it is a good idea, but here we must choose where
we innovate. My concerns will clearly be addressed in the near future,
but we can wait and focus on innovating on the machine learning side.

That said, there is a _huge_ value in your suggestion: it would provide
users a great environment to interact with the exercise and the solution,
and I would love to see IPython notebooks popping up alongside documents in
the future. I think that a technical solution to most of my concerns
would be a way to generate a notebook from a Python source code file.
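
A rough sketch of what such a generator could look like, using only the
standard library. Everything here is an assumption made for illustration:
the "# %%" cell marker is a hypothetical convention, and the output
follows the JSON layout of notebook format version 4, which may not match
the format of any given IPython release.

```python
import json

# Hypothetical convention: a line starting with this marker opens a new cell.
CELL_MARKER = "# %%"


def _make_code_cell(lines):
    """Build one code cell (assumed notebook format 4 layout)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": "\n".join(lines).strip("\n"),
    }


def source_to_notebook(source):
    """Split Python source text on CELL_MARKER lines into notebook cells."""
    chunks = [[]]
    for line in source.splitlines():
        if line.startswith(CELL_MARKER):
            chunks.append([])  # start collecting the next cell
        else:
            chunks[-1].append(line)
    # Drop chunks that contain only blank lines.
    cells = [_make_code_cell(c) for c in chunks if any(l.strip() for l in c)]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    example = "import math\n# %%\nx = math.pi\nprint(x)\n"
    print(json.dumps(source_to_notebook(example), indent=1))
```

With something along these lines, the source file would stay the
version-controlled, testable artifact, and the notebook would be a
generated by-product.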

Fernando's suggestion of using nbconvert partly addresses points b and c,
but not point a, which is the most important in my eyes. Besides, it would
still force our developers to use a recent version of the notebook. I am
quite convinced that we should make things as easy and uneventful as
possible for our developers, and that entails trying as much as possible
not to force a choice of tools upon them.

> 4) Any other issues I'm not thinking of?

I need to review this. I am completely exhausted after one month straight
of traveling (including a scikit-learn tutorial at a NeuroImaging
conference) and a bit sick. Tomorrow I have to catch up with things at
work, and I was planning to stay far away from my computer tonight. What
are your time constraints?

Cheers,

Gaël

PS: thanks for the tutorial, it's really cool.

