Re: [Scikit-learn-general] RFC (also by users) on interpreting 1d X

Gael Varoquaux Mon, 04 May 2015 04:48:07 -0700

On Mon, May 04, 2015 at 01:32:02PM +0200, federico vaggi wrote:
> I think Gael makes a very strong argument, but I think the error should be as
> explicit and informative as possible (for new users).


+1. Including suggesting the syntax X[:, np.newaxis], which is not
trivial.
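
To make this concrete, here is a minimal sketch of what such an error
could look like. The helper name check_2d and the message wording are
hypothetical, not existing scikit-learn API; only the NumPy indexing is
real.

```python
import numpy as np


def check_2d(X):
    """Hypothetical validator: reject 1d input and say how to fix it."""
    X = np.asarray(X)
    if X.ndim == 1:
        raise ValueError(
            "Expected a 2d array of shape (n_samples, n_features), got a 1d "
            "array of shape %r. Use X[:, np.newaxis] if X holds one feature, "
            "or X[np.newaxis, :] if it holds one sample." % (X.shape,)
        )
    return X


x = np.arange(5)               # shape (5,), ambiguous
as_feature = x[:, np.newaxis]  # shape (5, 1): 5 samples, 1 feature
as_sample = x[np.newaxis, :]   # shape (1, 5): 1 sample, 5 features
```

Calling check_2d(x) raises, while check_2d(as_feature) and
check_2d(as_sample) pass through unchanged.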

G

> On Fri, May 1, 2015 at 7:58 PM, Gael Varoquaux <gael.varoqu...@normalesup.org>
> wrote:

>     I strongly advise raising an error. Very very very strongly.

>     Being lax about ambiguous inputs makes prototyping and interactive usage
>     easier: less typing, and the system gets it right most of the time.
>     However, it makes production use and debugging complex code much harder.
>     Indeed, errors that are not simple user mistakes but are generated by a
>     complex framework do not lead to exceptions, but to problems down the
>     line.

>     We are not R. We require a bit more typing, and we don't have as many
>     shortcuts or as much magic syntax. But we can be used in production, on
>     big datasets. We can be used by people like Airbus to monitor failures of
>     parts in planes [*], or by many others.

>     Yes, beginners want things to 'just work', but in the long run they are
>     thankful for a well-thought-out and strict specification.

>     Gaël


>     [*] http://www.pyvideo.org/video/3519/scikit-learn-for-predictive-maintenance-at-airbus

>     On Fri, May 01, 2015 at 06:51:00PM +0100, Luca Puggini wrote:
>     > I vote for 3.

>     > On Fri, May 1, 2015 at 6:27 PM, Andreas Mueller <t3k...@gmail.com> wrote:

>     >     Hi all.
>     >     A quick question on the future API.
>     >     What should happen if a user passes an X with shape (N,), in other
>     >     words X.ndim == 1?

>     >     This is unfortunately not really consistent in scikit-learn right
>     >     now.
>     >     Three things are possible:
>     >     1) Raise an error
>     >     2) N = n_features, that is X contains a single sample
>     >     3) N = n_samples, that is X has a single feature
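
In NumPy terms, options 2 and 3 are the two possible ways to turn a 1d
array into a 2d one. A small sketch, with made-up data and variable names
chosen purely for illustration:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])   # shape (N,) with N = 3

# Option 2: N = n_features, a single sample with three features
one_sample = X.reshape(1, -1)   # shape (1, 3)

# Option 3: N = n_samples, three samples with a single feature
one_feature = X.reshape(-1, 1)  # shape (3, 1)
```

Both results hold the same numbers, so nothing in the data itself tells an
estimator which reading the user meant.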

>     >     I would think it should be N=n_samples. Gael thinks (iirc) we should
>     >     raise an error.
>     >     In the code, we currently take N=n_features in predict,
>     >     decision_function, predict_proba and transform, basically
>     >     everywhere.
>     >     This is in part due to using ``check_array`` everywhere, which used
>     >     the backward-compatible (but odd) behavior of np.atleast_2d.
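
For reference, a short sketch of the NumPy behavior in question. It
illustrates np.atleast_2d only and does not reproduce check_array itself:

```python
import numpy as np

X = np.arange(4)        # shape (4,)
X2 = np.atleast_2d(X)   # a new axis is prepended, not appended

print(X2.shape)         # (1, 4): read as 1 sample with 4 features

# Equivalent to adding the new axis in front
assert np.array_equal(X2, X[np.newaxis, :])
```

So a 1d X silently becomes a single row, which matches the N=n_features
reading described above.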

>     >     In ``fit`` it looks like all estimators assume N=n_features, apart
>     >     from DictionaryLearning, MinMaxScaler, StandardScaler, which assume
>     >     N=n_samples.

>     >     See https://github.com/scikit-learn/scikit-learn/pull/4511 for more
>     >     discussion

>     >     Obviously any change we make would mean a deprecation cycle, which
>     >     will mean warning in 0.17 and 0.18, when someone gives a 1-dim X,
>     >     that we'll change something soon, and then actually change it in
>     >     0.19 (1.0?).

>     >     Andy

-- 
    Gael Varoquaux
    Researcher, INRIA Parietal
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
