A reflective response without a clear opinion:

I'll admit to rarely-if-ever using function versions, and suspect they
frequently have limited utility over the estimator interface. Occasionally
they even wrap the estimator interface, so they're not going to provide the
efficiency advantages Gaël talks about.

While "People writing algorithms are not used to think in terms of
objects.", such people still know how to wrap an object to make it look
like a function. Seeing as there has been no consistent approach to
developing functional learners, I think that there are many functions that
effectively provide (data, estimator parameters) -> model attributes. This
is clearly a nice functional abstraction, but in truth, only those
functions that accept more/different parameters than their estimator
cousins, or that, for instance, solve only part of the learning problem,
are distinctively useful.
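
To make the (data, estimator parameters) -> model attributes pattern
concrete, here is a minimal sketch. RidgeLike and ridge_like are toy names
made up for illustration, not anything in scikit-learn; the point is only
that such a function version is a thin wrapper and offers nothing the
estimator lacks.

```python
class RidgeLike:
    """Hypothetical toy estimator: one-feature ridge regression, no intercept."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, x, y):
        # Closed form for a single feature: sum(x*y) / (sum(x*x) + alpha)
        sxy = sum(xi * yi for xi, yi in zip(x, y))
        sxx = sum(xi * xi for xi in x)
        self.coef_ = sxy / (sxx + self.alpha)
        return self


def ridge_like(x, y, alpha=1.0):
    """Function version: (data, estimator parameters) -> model attributes."""
    # Simply wraps the estimator, so there is no efficiency gain here.
    return RidgeLike(alpha=alpha).fit(x, y).coef_


coef = ridge_like([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], alpha=0.0)
```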

From an API development perspective, functions that return model parameters
can be frustrating; they land up accumulating return_something flags in
order to fit changing/expanding output needs, while estimators act as a
namespace where diagnostic output can be dumped, usually at very little
cost. As with output, users may expect function input (i.e. argument
ordering) to be more fixed, in comparison to estimators where separating
data from parameters means it is more natural to use kwargs in
construction, or simply use set_params or attribute setting. So from the
perspective of version compatibility the function versions are harder to
maintain, and we've not yet really ascertained their benefit.
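
A rough illustration of the return_something problem, using made-up names
(fit_mean and MeanEstimator are hypothetical, not scikit-learn API). The
function's return shape changes with each flag, while the estimator can
gain a new attribute without breaking existing callers.

```python
def fit_mean(values, return_n_samples=False, return_variance=False):
    """Hypothetical function version; its return shape depends on the flags."""
    n = len(values)
    mean = sum(values) / n
    result = (mean,)
    if return_n_samples:
        result += (n,)
    if return_variance:
        result += (sum((v - mean) ** 2 for v in values) / n,)
    # A bare value when no flag is set, a tuple otherwise.
    return result if len(result) > 1 else mean


class MeanEstimator:
    """Hypothetical estimator version; diagnostics are just attributes."""

    def fit(self, values):
        self.n_samples_ = len(values)
        self.mean_ = sum(values) / self.n_samples_
        self.variance_ = (
            sum((v - self.mean_) ** 2 for v in values) / self.n_samples_
        )
        return self


data = [1.0, 2.0, 3.0, 4.0]
mean, variance = fit_mean(data, return_variance=True)  # caller must unpack
est = MeanEstimator().fit(data)  # caller reads whichever attributes it needs
```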

Their presence in the public API often doubles the cost of maintaining
docstrings. But we could fairly disregard this issue, in part because even
if they were private we'd appreciate clear and explicit parameter/returns
documentation.

@Andy, the documentation implies these are for advanced use by (generally)
not referencing them in the narrative documentation. I think that's a fair
way to keep them visible only to those who dig deeper, but this
implicitness leaves some maintenance risks. While I don't think a note in
the docstring of each function version is the right solution, "See Also"
could be used to indicate the relationship. Additionally, or alternatively,
we could split classes.rst into "Estimators", "Low-level learning
functions" and "Utilities".
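
A sketch of how the "See Also" cross-reference might look in numpydoc
style. toy_mean and ToyMean are hypothetical names, and the wording of the
entries is only a suggestion.

```python
class ToyMean:
    """Hypothetical estimator that stores the mean of its training data.

    See Also
    --------
    toy_mean : Low-level function computing the same quantity.
    """

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self


def toy_mean(values):
    """Hypothetical low-level function returning the mean of ``values``.

    See Also
    --------
    ToyMean : Estimator interface to the same computation.
    """
    return ToyMean().fit(values).mean_
```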

On 11 September 2015 at 01:21, Andreas Mueller <t3k...@gmail.com> wrote:

>
>
> On 09/10/2015 10:08 AM, Gael Varoquaux wrote:
> >> And your statement "they are for advanced users" is not manifested in
> >> the API or documentation.
> > OK, but that's a bug of the documentation.
> So you suggest adding to the docstring of every function "this is for
> advanced users only"?
> That is kind of like making them private, only that private is much more
> explicit.
> >> There is no reason a user would expect one to act different from the
> >> other.
> > Users who don't code algorithms probably don't have any reason to be
> > using them.
> >
> Well the reason would be they find them in the API docs and they don't
> know whether to use the class or the function.
>
> Is it fair to summarize your opinion as
> "functions don't need input validation or a consistent interface, the
> documentation should make clear they
> are for advanced users"?
>
> FWIW many of the functions do input validation at the moment, it is just
> inconsistent.
>
>
> ------------------------------------------------------------------------------
> Monitor Your Dynamic Infrastructure at Any Scale With Datadog!
> Get real-time metrics from all of your servers, apps and tools
> in one place.
> SourceForge users - Click here to start your Free Trial of Datadog now!
> http://pubads.g.doubleclick.net/gampad/clk?id=241902991&iu=/4140
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>