2012/1/8 Mathieu Blondel <[email protected]>:
> If I'm not mistaken (I just read the source code on github), the copy
> that Peter is experiencing is due to ravel() in this method:
> https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py#L264
>
> This method in turn invokes csr_matvecs which is implemented here:
> https://github.com/scipy/scipy/blob/master/scipy/sparse/sparsetools/csr.h#L1010
>
> This method takes a sparse matrix and a flat array (C-style ordered)
> as inputs. The advantage of using ravel() here is that another
> implementation is not needed to handle Fortran-style arrays. However,
> it does result in a copy.
>
> In predict, SGDClassifier does a safe_sparse_dot(X, self.coef_.T).
> Therefore, if coef_ is Fortran-style, coef_.T becomes C-style, which
> is the format expected by ravel() to avoid a copy.

I just checked: the issue only applies to multi-class classification!
For binary classification ``coef_`` is a one-dimensional array, which
is both C- and Fortran-style.

``ravel`` is only used in the binary case so it is not responsible for
the copy.
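
To illustrate the layout point with plain NumPy (a minimal sketch; the
array names and shapes below are made up for illustration and are not
taken from the scikit-learn or scipy sources):

```python
import numpy as np

# Binary case: a 1-D array is both C- and Fortran-contiguous,
# so ravel() can return a view instead of a copy.
coef_1d = np.zeros(5)
both_flags = (coef_1d.flags['C_CONTIGUOUS'], coef_1d.flags['F_CONTIGUOUS'])
ravel_1d_is_view = np.shares_memory(coef_1d.ravel(), coef_1d)

# Multi-class case (assumed shape: n_classes x n_features, C-ordered).
# The transpose is a Fortran-ordered view, and ravel() with its default
# C order has to copy it.
coef_2d = np.zeros((3, 5))
coef_T = coef_2d.T
ravel_T_is_view = np.shares_memory(coef_T.ravel(), coef_2d)

print(both_flags, ravel_1d_is_view, ravel_T_is_view)
```

This only shows NumPy's own behaviour; whether a given scipy code path
actually calls ``ravel`` on the 2-D operand is a separate question.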

>
> Olivier's solution sounds good. Another would be to implement a
> routine that can handle the dot product with a Fortran-style array
> directly in utils/sparsefuncs.pyx.

Such a utility would be great indeed - do you think we would need to
write this from scratch or are there some scipy/numpy convenience
methods that we could utilize?

AFAIK the alternatives so far are:

1) Change ``coef_`` to Fortran-style
    If I change ``coef_`` to Fortran style I have to make some changes
to the `sgd_fast` ext module since it is not stride-aware.

2) Make ``coef_`` Fortran-style after fit (Olivier's suggestion)
   Convenient, but requires casting back to C-style for partial_fit,
plus some constant runtime overhead.

3) Create a dot product utility function that handles the Fortran-style
case directly
   Handy solution because it can be reused in a couple of places (e.g.
NaiveBayes); largest development effort.
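
A rough sketch of what alternative 2 would involve, using only NumPy
(the helper names ``to_predict_layout`` and ``to_fit_layout`` are
hypothetical and exist nowhere in scikit-learn; the shape is made up):

```python
import numpy as np


def to_predict_layout(coef):
    """Hypothetical helper: Fortran-ordered copy, so coef.T is C-contiguous."""
    return np.asfortranarray(coef)


def to_fit_layout(coef):
    """Hypothetical helper: back to C order, e.g. before partial_fit."""
    return np.ascontiguousarray(coef)


# Assumed shape: (n_classes, n_features), C-ordered as produced by fit.
coef_c = np.arange(15, dtype=np.float64).reshape(3, 5)

coef_f = to_predict_layout(coef_c)   # one copy at the end of fit
coef_back = to_fit_layout(coef_f)    # another copy before partial_fit

print(coef_f.T.flags['C_CONTIGUOUS'], coef_back.flags['C_CONTIGUOUS'])
```

Both conversions copy the array once, which is the constant cost
mentioned in 2); the values themselves are unchanged.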

>
> Mathieu
>
> On Mon, Jan 9, 2012 at 5:21 AM, Olivier Grisel <[email protected]> 
> wrote:
>> If the only change would be to do a:
>>
>> self.coef_ = np.asfortranarray(coef_)
>>
>> at the end of the fit method of the SGDClassifier and SGDRegressor
>> then I am all for it.
>>
>> We should just check that this indeed solves the memory copy issue you 
>> suspect.
>>
>> --
>> Olivier
>>
>> ------------------------------------------------------------------------------
>> Ridiculously easy VDI. With Citrix VDI-in-a-Box, you don't need a complex
>> infrastructure or vast IT resources to deliver seamless, secure access to
>> virtual desktops. With this all-in-one solution, easily deploy virtual
>> desktops for less than the cost of PCs and save 60% on VDI infrastructure
>> costs. Try it free! http://p.sf.net/sfu/Citrix-VDIinabox
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
Peter Prettenhofer

