2012/1/8 Mathieu Blondel <[email protected]>:
> If I'm not mistaken (I just read the source code on github), the copy
> that Peter is experiencing is due to ravel() in this method:
> https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py#L264
>
> This method in turn invokes csr_matvecs which is implemented here:
> https://github.com/scipy/scipy/blob/master/scipy/sparse/sparsetools/csr.h#L1010
>
> This method takes a sparse matrix and a flat array (C-style ordered)
> as inputs. The advantage of using ravel() here is that another
> implementation is not needed to handle Fortran-style arrays. However,
> it does result in a copy.
>
> In predict, SGDClassifier does a safe_sparse_dot(X, self.coef_.T).
> Therefore, if coef_ is Fortran-style, coef_.T becomes C-style, which
> is the format expected by ravel() to avoid a copy.
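
For illustration, a minimal sketch of the copy behaviour described in the
quoted message (the shapes and random data are made up; only the memory
layout matters, and the scipy internals are as described above):

    import numpy as np
    import scipy.sparse as sp

    X = sp.rand(1000, 50, density=0.1, format='csr')
    coef_C = np.random.rand(3, 50)          # C-ordered coef_
    coef_F = np.asfortranarray(coef_C)      # same values, Fortran-ordered

    # ravel() returns a view only when the array is already C-contiguous:
    print(np.may_share_memory(coef_F.T.ravel(), coef_F))   # True  -> no copy
    print(np.may_share_memory(coef_C.T.ravel(), coef_C))   # False -> ravel() copies

    # Hence safe_sparse_dot(X, coef_.T), i.e. X * coef_.T, only avoids the
    # extra copy when coef_ is stored Fortran-ordered:
    y_fast = X * coef_F.T    # coef_F.T is C-contiguous, no copy inside scipy
    y_slow = X * coef_C.T    # coef_C.T is Fortran-ordered, copied by ravel()
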
Thanks for the analysis.

> Olivier's solution sounds good.

And it's easy to implement too :) @pprett can you confirm it solves
your perf issue on your data?

> Another would be to implement a
> routine that can handle the dot product with a Fortran-style array
> directly in utils/sparsefuncs.pyx.

+1 for opening an issue so as to make safe_sparse_dot smarter and
avoid naive callers getting bitten (a rough sketch of such a check is
appended below).

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
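
Appended sketch: a rough illustration of the kind of contiguity check a
"smarter" safe_sparse_dot could perform; the function name and the warning
are hypothetical and not existing scikit-learn API:

    import warnings
    import numpy as np
    from scipy import sparse

    def smarter_safe_sparse_dot(a, b):
        # Hypothetical variant of safe_sparse_dot: same result, but it warns
        # when the dense operand of a sparse * dense product is not
        # C-contiguous, since scipy will then silently copy it via ravel().
        if sparse.issparse(a) or sparse.issparse(b):
            dense = b if sparse.issparse(a) else a
            if isinstance(dense, np.ndarray) and not dense.flags['C_CONTIGUOUS']:
                warnings.warn("dense operand is not C-contiguous: "
                              "scipy will copy it before the sparse dot product")
            return a * b
        return np.dot(a, b)
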
