I've just sent a PR which implements Olivier's solution. The overhead
(converting to Fortran-style) only applies to multi-class classification,
so I think it's OK to do it this way. SGDClassifier currently does not
support `partial_fit`; initial weights can only be passed via the
`init_coef` and `init_intercept` arguments, and these are automatically
converted to C-style before they are used in model fitting, so
everything is fine.
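
For anyone following the thread, here is a minimal NumPy sketch of the
memory-layout point under discussion. It is only an illustration: the
array shape and the variable names are made up, and it is not code from
the PR. It shows that the transpose of a Fortran-ordered 2-D array is
C-contiguous (so `ravel()` can return a view), that `ravel()` on the
Fortran-ordered array itself has to copy, and that a 1-D array is both
C- and Fortran-contiguous.

```python
import numpy as np

# Hypothetical multi-class weight matrix, shape (n_classes, n_features),
# stored in Fortran (column-major) order.
coef = np.asfortranarray(np.arange(12, dtype=np.float64).reshape(3, 4))

# Transposing only swaps the strides, so the result is C-contiguous.
coef_t = coef.T
t_is_c_contiguous = bool(coef_t.flags["C_CONTIGUOUS"])

# ravel() defaults to C order: it returns a view when the data is already
# C-contiguous and has to copy otherwise.
shares_t = np.shares_memory(coef_t.ravel(), coef)  # view of the same buffer
shares_f = np.shares_memory(coef.ravel(), coef)    # fresh copy

# Binary case: a 1-D weight vector is both C- and Fortran-contiguous.
w = np.zeros(4)
one_d_is_both = bool(w.flags["C_CONTIGUOUS"] and w.flags["F_CONTIGUOUS"])

print(t_is_c_contiguous, shares_t, shares_f, one_d_is_both)
```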

As soon as I work on implementing `partial_fit` I'll refactor the whole
thing to avoid memory copies whenever possible.

https://github.com/scikit-learn/scikit-learn/pull/545

best,
 Peter

2012/1/9 Peter Prettenhofer <[email protected]>:
> 2012/1/9 Peter Prettenhofer <[email protected]>:
>> 2012/1/8 Mathieu Blondel <[email protected]>:
>>> If I'm not mistaken (I just read the source code on github), the copy
>>> that Peter is experiencing is due to ravel() in this method:
>>> https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py#L264
>>>
>>> This method in turn invokes csr_matvecs which is implemented here:
>>> https://github.com/scipy/scipy/blob/master/scipy/sparse/sparsetools/csr.h#L1010
>>>
>>> This method takes a sparse matrix and a flat array (C-style ordered)
>>> as inputs. The advantage of using ravel() here is that another
>>> implementation is not needed to handle Fortran-style arrays. However,
>>> it does result in a copy.
>>>
>>> In predict, SGDClassifier does a safe_sparse_dot(X, self.coef_.T).
>>> Therefore, if coef_ is Fortran-style, coef_.T becomes C-style, which
>>> is the format expected by ravel() to avoid a copy.
>>
>> I just checked: the issue only applies to multi-class classification!
>> For binary classification ``coef_`` is a one-dimensional array, which
>> is both C- and Fortran-style.
>>
>> ``ravel`` is only used in the binary case so it is not responsible for
>> the copy.
>>
>
> BTW: I just looked at csr_matrix.__mul__ and
> sparse.compressed._mul_multivector - there are ravels all over the
> place that trigger a copy of the view.
>
>
> --
> Peter Prettenhofer



-- 
Peter Prettenhofer
