When using cross_validation.X, all arrays are checked in the normal way --
using check_arrays.
I am developing code that takes string documents as input, so my "data" is
a list of strings and my classes are a numpy array of labels, as usual.
(In case anyone doesn't know, my research area is authorship analysis.)
I have classes that use the Classifier mixins etc, so they work well with
cross validation, except that a copy of the data is made to create the
numpy array.
Normally this is fine, but I'm now working with a really large dataset that
fits into memory only once.
The copy that gets made by check_arrays causes a memory error.
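
To illustrate where the extra memory goes, here is a minimal sketch. It
assumes the conversion behaves roughly like np.asarray on a list of
strings (an assumption about the check, not a quote of its code) and it
uses Python 3 with a current NumPy, where strings become a fixed-width
unicode array. The sample documents are made up.

```python
import numpy as np

# Made-up documents of very different lengths.
docs = ["short", "a much longer document " * 10, "mid-length text"]

# Assumption: the input check does something equivalent to this.
arr = np.asarray(docs)

longest = max(len(d) for d in docs)

# The result is a new buffer, separate from the original list.
# Every entry is padded to the longest document, at 4 bytes per character.
bytes_per_entry = arr.dtype.itemsize
total_bytes = arr.nbytes

print(arr.dtype, bytes_per_entry, total_bytes)
```

Because every entry is padded to the length of the longest document, the
array can end up considerably larger than the strings it was built from,
and the original list stays alive as well.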
My question: converting to numpy arrays is the intended behaviour, and it
fits with the rest of the project, but should there be a way to turn it
off? For example, a "respect_input_type=True" argument?
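
As a rough sketch of what such an option might look like: the function
below is hypothetical (check_inputs is not a scikit-learn function, and
respect_input_type is only the name proposed above). It keeps the length
consistency check but skips the conversion when asked to.

```python
import numpy as np


def check_inputs(*arrays, respect_input_type=False):
    """Hypothetical stand-in for an input check; not scikit-learn code."""
    # All inputs must have the same number of samples.
    lengths = {len(a) for a in arrays}
    if len(lengths) > 1:
        raise ValueError("inconsistent numbers of samples: %s" % sorted(lengths))

    if respect_input_type:
        # Hand back the very same objects, so no copy is made.
        return list(arrays)

    # Default path: convert everything to numpy arrays.
    return [np.asarray(a) for a in arrays]


docs = ["first document", "second document", "third document"]
y = np.array([0, 1, 0])

same_docs, same_y = check_inputs(docs, y, respect_input_type=True)
conv_docs, conv_y = check_inputs(docs, y)
```

With the flag set, same_docs is the original list object. Any code that
then indexes the data with an array of indices would need to handle plain
lists itself, which is the main cost of this approach.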
- Robert
--
Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general