Thanks for that explanation, Josef. sklearn uses each package's __init__
and its __all__ to define a public API for which compatibility across
versions is maintained. It decouples the path on disk from the import path:
each top-level submodule exports all public names within that submodule
(whether it's a deeper package or not). So generally there's at most a
single namespace within sklearn that one needs to recall for each
class/function. It's just not all exported to the root. (The file-import
path abstraction still has problems when things are moved, or when issues
of design and dependency topology leave LinearSVC in sklearn.svm rather
than sklearn.linear_model, which is where it belongs IMHO.)
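
To make the re-export mechanism concrete, here is a minimal sketch of a package whose submodule __init__ exposes a class defined in a deeper file. The package name toypkg and its file layout are made up for illustration; they only loosely mimic the arrangement described above and are not sklearn's actual code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical toy package written to a temporary directory.
root = Path(tempfile.mkdtemp())
pkg = root / "toypkg" / "svm"
pkg.mkdir(parents=True)

# The root __init__ is empty: nothing is exported at the top level.
(root / "toypkg" / "__init__.py").write_text("")

# The class lives in a deeper file, next to a private helper.
(pkg / "classes.py").write_text(
    "class LinearSVC:\n"
    "    pass\n"
    "\n"
    "class _Helper:\n"
    "    pass\n"
)

# The submodule __init__ re-exports only the public name.
(pkg / "__init__.py").write_text(
    "from .classes import LinearSVC\n"
    "__all__ = ['LinearSVC']\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import toypkg.svm
from toypkg.svm import LinearSVC  # import path hides classes.py

public_names = toypkg.svm.__all__
defined_in = LinearSVC.__module__  # where the class really lives
helper_hidden = not hasattr(toypkg.svm, "_Helper")
```

Users only ever write toypkg.svm.LinearSVC, so classes.py could be renamed or split without breaking them, provided the __init__ keeps exporting the same name.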

The idea of keeping everything namespaced has advantages, although it is
the most verbose. Here are some options:

import sklearn.api as skl
skl.grid_search.GridSearchCV(skl.pipeline.Pipeline([
    ('sel', skl.feature_selection.SelectKBest(skl.feature_selection.chi2)),
    ('clf', skl.svm.LinearSVC())
]), {'clf__C': [.1, 1.]})

vs (currently supported)

from sklearn import pipeline, feature_selection, grid_search, svm
grid_search.GridSearchCV(pipeline.Pipeline([
    ('sel', feature_selection.SelectKBest(feature_selection.chi2)),
    ('clf', svm.LinearSVC())
]), {'clf__C': [.1, 1.]})

vs

import sklearn.api as skl
skl.GridSearchCV(skl.Pipeline([
    ('sel', skl.SelectKBest(skl.chi2)),
    ('clf', skl.LinearSVC())
]), {'clf__C': [.1, 1.]})

vs

from sklearn.api import GridSearchCV, Pipeline, SelectKBest, chi2, LinearSVC
GridSearchCV(Pipeline([
    ('sel', SelectKBest(chi2)),
    ('clf', LinearSVC())
]), {'clf__C': [.1, 1.]})

The first alternative helps the use-case where presently you would import
one module, then realise you need another and import it too. The second
assumes you know all required modules up front, while the third and fourth
provide minimal namespacing, which is good for conciseness, but perhaps bad
for zen. (They happen to avoid problems of remembering that LinearSVC is in
svm rather than linear_model, but I'm not sure that's what we should be
solving here.)
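
As a rough sketch of what separates the namespaced option from the flat ones, the snippet below assembles both kinds of api namespace from the same submodules. Everything in it is hypothetical: the toy.* modules are stand-ins built with types.ModuleType, and build_api is an illustrative helper, not an existing or proposed sklearn function.

```python
import types


def make_module(name, **members):
    """Build a stand-in module whose __all__ lists its public members."""
    mod = types.ModuleType(name)
    mod.__dict__.update(members)
    mod.__all__ = sorted(members)
    return mod


# Placeholder classes and function; they do no real work.
class Pipeline: ...
class LinearSVC: ...
class SelectKBest: ...
def chi2(X, y): ...


submodules = [
    make_module("toy.pipeline", Pipeline=Pipeline),
    make_module("toy.svm", LinearSVC=LinearSVC),
    make_module("toy.feature_selection", SelectKBest=SelectKBest, chi2=chi2),
]


def build_api(name, submodules, flat):
    """Collect submodules into one namespace, flattened or not."""
    api = types.ModuleType(name)
    for sub in submodules:
        if flat:
            # Flat: copy every public name to the top level.
            for public in sub.__all__:
                if hasattr(api, public):
                    raise ImportError("name clash in flat api: " + public)
                setattr(api, public, getattr(sub, public))
        else:
            # Namespaced: expose the submodule under its short name.
            setattr(api, sub.__name__.rpartition(".")[2], sub)
    return api


namespaced = build_api("toy.api", submodules, flat=False)
flat = build_api("toy.api", submodules, flat=True)

clf_a = namespaced.svm.LinearSVC()  # reader must know it lives in svm
clf_b = flat.LinearSVC()            # no module to remember
```

The flat variant has to guard against two submodules exporting the same name, a cost the namespaced variant never pays.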

I would think the third and fourth are to be frowned upon in application
code but useful in scripting, while the first two (and certainly the
second) might be considered admissible.

~J


On Tue, Dec 3, 2013 at 8:57 AM, <josef.p...@gmail.com> wrote:

> On Mon, Dec 2, 2013 at 4:17 PM, Gael Varoquaux
> <gael.varoqu...@normalesup.org> wrote:
> > On Tue, Dec 03, 2013 at 06:56:14AM +1100, Joel Nothman wrote:
> >> As for "There should be one-- and preferably only one --obvious way to
> >> do it," Gaël, I feel there are times where the one obvious way to do it
> >> should be conditioned on whether you're building an application or
> >> writing a quick script / interactive play.
> >
> > The problem with pushing that logic too far is that it ends up creating a
> > jump in difficulty going from the interactive use to the proper one. It's
> > a mistake that we did with Mayavi.
> >
> > Ideally, the "correct usage pattern" should be designed so that it is
> > reasonably easy to use interactively.
> >
> > Going back to the topic of the discussion: if we add an 'api' module,
> > which pattern do we want to document and encourage?
>
> To explain a bit why statsmodels got its APIs:
>
> One reason was that we can put the main models, OLS, WLS, GLM, RLM,
> .... into the main namespace for easy access
> instead of the actual path, for example sm.OLS instead of
> statsmodels.regression.linear_model.OLS
>
> The main reason for me was that we don't import anything with `import
> statsmodels`. I don't like it at all that if I want a simple function
> from scipy.stats, then I have to import half or two thirds of scipy.
> (slooow)
> As a consequence, almost all our __init__.py are empty.
> (Although `from statsmodels.regression.linear_model import OLS` still
> imports large parts of scipy.)
>
> The other consequence is that we can, to some extent, separate actual
> code paths from the import paths. The code has in several cases deeper
> levels than the recommended imports. We shorten and pool the import
> paths in the api.
>
> Almost all our documentation examples use the API imports. Almost all
> code imports from the individual modules.
>
> (To make it more tricky we also recommend `import statsmodels.formula.api
> as smf` to get the formula interface.
> smf.ols, smf.poisson, .... or the slightly longer `sm.formula.ols` but
> that's less well established as a single recommended pattern.)
>
> http://statsmodels.sourceforge.net/devel/importpaths.html
>
> Josef
>
>
> >
> > G
> >
> >
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>