Hi Immanuel,

Yes, centering the data can take some time, and as_float_array returns a C-contiguous array with copy=True.
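
To illustrate the layout change, here is a minimal NumPy-only sketch. It assumes the copy is made with ndarray.copy(), whose default is order='C'; that is an assumption about the internals of as_float_array, not something checked here. The helper name layout and the small array are made up for the example.

```python
import numpy as np


def layout(a):
    """Return 'F' or 'C' depending on the memory layout of a 2-D array."""
    return "F" if a.flags["F_CONTIGUOUS"] and not a.flags["C_CONTIGUOUS"] else "C"


# Small stand-in for the Fortran-ordered design matrix from the report.
rng = np.random.RandomState(0)
X = np.asfortranarray(rng.rand(4, 3))

# ndarray.copy() defaults to order='C', so the Fortran layout is lost.
default_copy = X.copy()

# order='K' keeps the layout of the input; order='F' forces Fortran order.
keep_copy = X.copy(order="K")
fortran_copy = X.copy(order="F")

print(layout(X), layout(default_copy), layout(keep_copy), layout(fortran_copy))
```

With a copy that keeps the input order, a later np.asfortranarray(X) call returns its input without copying, since the array is already Fortran-contiguous.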
What I would do is add support for order in as_float_array so we avoid an
unnecessary copy.

Nice catch!

Alex

On Thu, Aug 16, 2012 at 12:06 PM, iBayer <[email protected]> wrote:
> Hello,
>
> I just got the impression that linear_model.base.center_data always returns
> a C_CONTIGUOUS array, even if the input has been made F-contiguous
> beforehand with X = np.asfortranarray(X) (see IPython session below).
> It looks to me like this causes some expensive memory operations when
> fitting some models (see line profile below).
> Am I missing something here?
>
> best,
> Immanuel
>
> -----
> In [10]: from sklearn.linear_model.base import center_data
>
> In [12]: from sklearn.datasets.samples_generator import make_regression
>
> In [13]: %paste
> X, y, coef = make_regression(n_samples=10000, n_features=5000,
>                              n_informative=1000,
>                              random_state=0, coef=True)
> X = np.asfortranarray(X)
> ## -- End pasted text --
>
> In [14]: X.flags
> Out[14]:
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> In [17]: %paste
> X, y, X_mean, y_mean, X_std = center_data(X, y,
>     fit_intercept=False, normalize=False, copy=True)
> ## -- End pasted text --
>
> In [18]: X.flags
> Out[18]:
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
> ----
>
> ---
> Timer unit: 1e-06 s
>
> File: /home/mane/git/enet_strong_rules/sklearn/linear_model/coordinate_descent.py
> Function: _dense_fit at line 167
> Total time: 7.60257 s
>
> Line #   Hits     Time    Per Hit  % Time  Line Contents
> ==============================================================
>    167                                     def _dense_fit(self, X, y, Xy=None, coef_init=None,
>    168                                             active_set_init=None, alpha_init=None, R_init=None):
>
>    177      1        2        2.0     0.0      X, y, X_mean, y_mean, X_std = self._center_data(X, y,
>    178      1  4457984  4457984.0    58.6          self.fit_intercept, self.normalize, copy=self.copy_X)
>    200      1  1273583  1273583.0    16.8      X = np.asfortranarray(X)  # make data contiguous in memory
>    213      1        5        5.0     0.0      self._fit_enet_with_strong_rule(X, y, Xy,
>    214      1        4        4.0     0.0          active_set_init=active_set_init, coef_init=coef_init,
>    215      1  1867135  1867135.0    24.6          alpha_init=alpha_init, R_init=R_init)
> ----
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
