On 14 January 2013 17:42, Gael Varoquaux <[email protected]> wrote:

> > I've been having a lot of trouble loading as a numpy array. I know
> > generally how to do it, but I must be doing it wrong since the numpy
> > array can't fit in memory, while the "list of strings" representation
> > does....
>
> I believe that it's because the strings are stored in a fixed-width
> 'string representation', and thus each one is padded to the length of
> the longest string. Try passing 'dtype=object' to the constructor to
> avoid that problem.
>
> HTH,
>
> Gaël
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



That would be the culprit: the dataset has some outliers in terms of
string length.
I'll check tomorrow at work, but I would guess this fixes it.

(As an aside, I was wondering whether dtype=str was an actual in-array
string representation or just an API wrapper around string pointers; now I
know.)
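
For reference, here is a minimal sketch of the difference, assuming NumPy is
installed. The sample data is made up for illustration and is not the dataset
discussed in this thread. On Python 3 the default string dtype is a
fixed-width unicode type; on Python 2 it would be a fixed-width byte string,
but the padding behaviour is the same.

```python
import numpy as np

# Hypothetical data: many short strings plus a single long outlier.
strings = ["ab"] * 1000 + ["x" * 500]

# Default conversion: a fixed-width string dtype, where every slot is
# as wide as the longest string in the input.
fixed = np.array(strings)

# dtype=object: the array only stores references to the Python str objects.
boxed = np.array(strings, dtype=object)

print("fixed dtype:", fixed.dtype)
print("fixed bytes per element:", fixed.itemsize)
print("fixed total buffer bytes:", fixed.nbytes)
print("object dtype:", boxed.dtype)
print("object total buffer bytes:", boxed.nbytes)
```

Note that nbytes for the object array counts only the references held in the
array buffer, not the memory used by the string objects themselves, so the
real saving is smaller than the two numbers suggest, though still large when
a few outliers dominate the maximum length.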


-- 

Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)
