2013/1/4 Olivier Grisel <[email protected]>:
> I don't think it is an oversight. In one case it was easier to
> generate a CSC layouted datastructure and a COO in the other.

I think you mean CSR here?

> One does not want to trigger a memory copy by calling `.tocsr` in
> advance if the next estimator in the pipeline needs a CSC layout.
>
> CSC representation is more efficient for coordinate descent based
> algorithms (right now we just have linear regression models) or
> (ensembles of) decision trees (currently the sparse input is not
> implemented but it might in the future and at that point CSC will be
> the most adapted memory layout).

But COO->CSC makes a copy as well, right? So we could just as well
build a CSR matrix directly to avoid a copy in the extremely common
CountVectorizer->TfidfTransformer and CountVectorizer->atleast2d_or_csr
pipelines. CSR->CSC shouldn't be more expensive than COO->CSC.

We build CSR matrices in other places: SVMlight loader, hashing trick.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
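
To make the copy argument above concrete, here is a rough pure-Python sketch of the two steps under discussion: filling CSR arrays directly while iterating over documents, and converting CSR to CSC afterwards. The names `build_csr` and `csr_to_csc`, the toy documents, and the toy vocabulary are all made up for illustration; this is not CountVectorizer's or SciPy's actual code, only the general technique as I understand it.

```python
from collections import Counter


def build_csr(docs, vocabulary):
    """Fill CSR arrays (data, indices, indptr) one document (row) at a time."""
    data, indices, indptr = [], [], [0]
    for tokens in docs:
        # Count occurrences of each known term, keyed by column index.
        counts = Counter(vocabulary[t] for t in tokens if t in vocabulary)
        for col in sorted(counts):
            indices.append(col)
            data.append(counts[col])
        # indptr[i + 1] marks where row i ends in data/indices.
        indptr.append(len(indices))
    return data, indices, indptr


def csr_to_csc(data, indices, indptr, n_cols):
    """Convert CSR arrays to CSC arrays with a counting sort over columns."""
    n_rows = len(indptr) - 1
    # Pass 1: count entries per column, then prefix-sum into column pointers.
    col_ptr = [0] * (n_cols + 1)
    for col in indices:
        col_ptr[col + 1] += 1
    for j in range(n_cols):
        col_ptr[j + 1] += col_ptr[j]
    # Pass 2: scatter every entry into its column slot (this is the copy).
    out_data = [0] * len(data)
    out_rows = [0] * len(data)
    next_slot = col_ptr[:-1]  # slicing copies, so col_ptr is left intact
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            dest = next_slot[indices[k]]
            out_data[dest] = data[k]
            out_rows[dest] = row
            next_slot[indices[k]] += 1
    return out_data, out_rows, col_ptr


# Hypothetical toy input: three tokenized documents, three terms.
docs = [["a", "b", "a"], ["b", "c"], ["c", "c", "a"]]
vocabulary = {"a": 0, "b": 1, "c": 2}

data, indices, indptr = build_csr(docs, vocabulary)
csc_data, csc_rows, csc_ptr = csr_to_csc(data, indices, indptr, len(vocabulary))
```

Since documents arrive row by row, the CSR arrays fall out of the loop with no extra conversion. Going to CSC later costs one counting pass and one scatter pass over the nonzeros, which is the same kind of work a COO->CSC conversion has to do.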
