On 14 January 2013 16:10, Kenneth C. Arnold <[email protected]> wrote:

> Why not use numpy arrays of strings all along? Their importance here is
> fancy indexing... Or use X=np.arange(N) and do the fancy indexing yourself
> on demand?
>
> -Ken
> On Jan 13, 2013 11:04 PM, "Robert Layton" <[email protected]> wrote:
>
>> When using cross_validation.X, all arrays are checked in the normal way
>> -- using check_arrays.
>> I am developing code that uses string documents as input, so I have a
>> list of strings as the "data" and a numpy array of classes as normal.
>> (In case anyone doesn't know, my research area is authorship analysis.)
>> I have classes that use the Classifier mixins etc, so they work well with
>> cross validation, except that a copy of the data is made to create the
>> numpy array.
>> Normally this is fine, but I'm now working with a really large dataset
>> that fits into memory only once.
>> The copy that gets made by check_arrays causes a memory error.
>>
>> My question: converting to numpy arrays is intended behaviour, and fits
>> with the rest of the project. Should there be a way to turn it off? i.e.
>> "respect_input_type=True" argument?
>>
>>
>> - Robert
>>
>>
>> --
>>
>> Public key at: http://pgp.mit.edu/ Search for this email address and
>> select the key from "2011-08-19" (key id: 54BA8735)
>>
>>
>> ------------------------------------------------------------------------------
>> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
>> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
>> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
>> MVPs and experts. SALE $99.99 this month only -- learn more at:
>> http://p.sf.net/sfu/learnmore_122412
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>

I've been having a lot of trouble loading the data as a numpy array. I know
generally how to do it, but I must be doing something wrong, since the numpy
array can't fit in memory while the "list of strings" representation does.
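
One possible explanation, offered as an assumption rather than a diagnosis of
this dataset: by default numpy stores strings in a fixed-width dtype, so every
element is padded to the length of the longest document, while dtype=object
keeps only references to the existing Python strings. A minimal sketch on
Python 3, with a made-up toy corpus:

```python
import numpy as np

# Hypothetical toy corpus: one long document among short ones.
docs = ["short", "tiny", "x" * 1000]

# Default conversion: fixed-width unicode dtype sized to the longest string,
# so every slot reserves room for 1000 characters.
fixed = np.array(docs)

# dtype=object: the array holds references to the existing str objects,
# so the text itself is not copied.
refs = np.array(docs, dtype=object)

print(fixed.dtype, fixed.itemsize, fixed.nbytes)
print(refs.dtype, refs.itemsize, refs.nbytes)
```

With documents of very uneven length, the fixed-width array can be far larger
than the list it was built from, whereas the object array still supports fancy
indexing.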

I'll investigate that option a bit more.
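
The np.arange(N) suggestion quoted above could look roughly like the sketch
below. The helper names take and kfold_indices and the toy data are
hypothetical, and the folds are split by hand here rather than through
scikit-learn's own iterators:

```python
import numpy as np

def take(docs, indices):
    """Select documents from a plain list by integer index (references only)."""
    return [docs[i] for i in indices]

def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs over np.arange(n_samples)."""
    indices = np.arange(n_samples)
    for test_idx in np.array_split(indices, n_folds):
        train_idx = np.setdiff1d(indices, test_idx)
        yield train_idx, test_idx

# Hypothetical toy data: documents stay in a list, classes in a numpy array.
docs = ["doc a", "doc b", "doc c", "doc d", "doc e", "doc f"]
y = np.array([0, 0, 1, 1, 0, 1])

folds = list(kfold_indices(len(docs), n_folds=3))
for train_idx, test_idx in folds:
    train_docs, y_train = take(docs, train_idx), y[train_idx]
    test_docs, y_test = take(docs, test_idx), y[test_idx]
    # A hypothetical estimator would be fit on (train_docs, y_train)
    # and scored on (test_docs, y_test) here.
```

Only the small integer index array goes through numpy; the documents are looked
up on demand, so each fold builds a list of references rather than a copy of
the text.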

-- 

Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)