Hi Immanuel,

Yes, centering the data can take some time, and as_float_array returns a
C-contiguous array when copy=True.

What I would do is add support for an order argument in as_float_array so
that we avoid an unnecessary copy.
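
To illustrate the idea, here is a minimal sketch in plain NumPy. The helper
name as_float_array_keep_order is hypothetical and this is not the actual
scikit-learn code; it only assumes that the extra cost comes from a copy that
falls back to C order (ndarray.copy() defaults to order='C').

```python
import numpy as np


def as_float_array_keep_order(X, copy=True):
    """Hypothetical helper: convert X to floats without changing its layout."""
    X = np.asarray(X)
    # Keep float32/float64 inputs as they are; promote everything else to float64.
    dtype = X.dtype if X.dtype in (np.float32, np.float64) else np.float64
    # order='K' asks NumPy to match the memory layout of the input.
    return X.astype(dtype, order='K', copy=copy)


X = np.asfortranarray(np.arange(12, dtype=np.float64).reshape(4, 3))

# ndarray.copy() defaults to order='C', so the Fortran layout is lost.
X_c = X.copy()

# The sketch above keeps the Fortran layout, so a later
# np.asfortranarray(X_f) should have nothing left to convert.
X_f = as_float_array_keep_order(X, copy=True)
```

With something along these lines, an array that is already F-contiguous would
stay that way after centering, and the second conversion in _dense_fit should
no longer need to move any data.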

Nice catch!

Alex

On Thu, Aug 16, 2012 at 12:06 PM, iBayer <[email protected]> wrote:
> Hello,
>
> I just got the impression that linear_model.base.center_data always returns
> a C_CONTIGUOUS array, even if the input was made F-contiguous beforehand with
> X = np.asfortranarray(X) (see the IPython session below). It looks to me like
> this causes an expensive memory operation when fitting some models (see the
> line profile below).
> Am I missing something here?
>
> best,
> Immanuel
>
> -----
> In [10]: from sklearn.linear_model.base import center_data
>
> In [12]: from sklearn.datasets.samples_generator import make_regression
>
> In [13]: %paste
> X, y, coef = make_regression(n_samples=10000, n_features=5000,
>                              n_informative=1000, random_state=0, coef=True)
> X = np.asfortranarray(X)
> ## -- End pasted text --
>
> In [14]: X.flags
> Out[14]:
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> In [17]: %paste
>         X, y, X_mean, y_mean, X_std = center_data(X, y,
>                 fit_intercept=False, normalize=False, copy=True)
> ## -- End pasted text --
>
> In [18]: X.flags
> Out[18]:
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
> ----
>
> ---
> Timer unit: 1e-06 s
>
> File: /home/mane/git/enet_strong_rules/sklearn/linear_model/coordinate_descent.py
> Function: _dense_fit at line 167
> Total time: 7.60257 s
>
> Line #      Hits         Time  Per Hit   % Time  Line Contents
> ==============================================================
>    167                                               def _dense_fit(self, X, y, Xy=None, coef_init=None,
>    168                                                   active_set_init=None, alpha_init=None, R_init=None):
>
>    177         1            2      2.0      0.0          X, y, X_mean, y_mean, X_std = self._center_data(X, y,
>    178         1      4457984 4457984.0     58.6              self.fit_intercept, self.normalize, copy=self.copy_X)
>    200         1      1273583 1273583.0     16.8          X = np.asfortranarray(X)  # make data contiguous in memory
>    213         1            5      5.0      0.0          self._fit_enet_with_strong_rule(X, y, Xy,
>    214         1            4      4.0      0.0              active_set_init=active_set_init, coef_init=coef_init,
>    215         1      1867135 1867135.0     24.6              alpha_init=alpha_init, R_init=R_init)
> ----
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>

