Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

Dimitrios Pritsos Mon, 23 Jan 2012 03:18:29 -0800

On 01/23/2012 12:24 PM, Olivier Grisel wrote:
> Have a look at `sklearn.linear_model.SGDClassifier` that supports a
> partial_fit method in master that you can call several times with
> slices of data.


Thanks for the reference, I will have a look right now.
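
For reference, the suggestion above (calling partial_fit several times with slices of data) might look roughly like the sketch below. This is only an illustration under assumptions: the helper name `fit_in_chunks`, the chunk size, and the synthetic demo data are hypothetical, and a small NumPy array stands in for the on-disk array (slicing a PyTables array also yields a NumPy array for just that slice).

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDClassifier


def fit_in_chunks(X, y, classes, chunk_size=1000):
    """Train an SGDClassifier incrementally on row slices of X.

    X only needs to support X[start:stop] slicing and a .shape attribute,
    so an on-disk array can be used without loading it all at once.
    (Hypothetical helper, not part of scikit-learn.)
    """
    clf = SGDClassifier(random_state=0)
    n_samples = X.shape[0]
    for start in range(0, n_samples, chunk_size):
        stop = min(start + chunk_size, n_samples)
        # Only this slice is held in memory; store it as sparse CSR.
        X_chunk = sparse.csr_matrix(X[start:stop])
        # All class labels must be declared, since a single chunk
        # may not contain every class.
        clf.partial_fit(X_chunk, y[start:stop], classes=classes)
    return clf


# Synthetic stand-in data (assumption): 300 samples, 20 mostly-zero features.
rng = np.random.RandomState(0)
X_demo = rng.rand(300, 20) * (rng.rand(300, 20) < 0.1)
y_demo = rng.randint(0, 2, size=300)

clf = fit_in_chunks(X_demo, y_demo, classes=np.array([0, 1]), chunk_size=100)
```

With real data, `X` would be the on-disk array and `chunk_size` would be chosen so that one slice fits comfortably in RAM.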

> BTW: what is the structure of your data in PyTables? Is it mapped to a
> scipy.sparse Compressed Sparse Row data structure? How many features do
> you have in your dataset?
>

The training data are in an EArray (compressed per row, since it contains
lots of zeros).
I have 34,000 samples, and the length of my dictionary, which depends on
the training set, is about 1,500,000.
However, using about 30,000 features seems satisfactory for a
proof-of-concept case. The number of samples, though, needs to be
approximately 30-50k.
I am not very experienced in either PyTables or numpy/scipy, but I
don't think a scipy.sparse matrix can fit my data in RAM, even with a
smaller dictionary. At least this was my conclusion from some of my
preliminary tests while building the preprocessing phase for the
evaluation tests.
By the way, my RAM is 4 GB.
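
As a rough back-of-the-envelope check of the sizes mentioned above, the sketch below estimates the memory needed for a dense array versus a CSR matrix. The density of the data is not stated in this message, so the 1% figure is purely an assumption, as are the 8-byte values and 4-byte indices; the function names are hypothetical.

```python
def dense_bytes(n_samples, n_features, itemsize=8):
    """Bytes needed for a dense array of 8-byte floats (by default)."""
    return n_samples * n_features * itemsize


def csr_bytes(n_samples, n_features, density, itemsize=8, index_size=4):
    """Approximate bytes for a CSR matrix: one value and one column index
    per non-zero entry, plus a row-pointer array of n_samples + 1 entries."""
    nnz = round(n_samples * n_features * density)
    return nnz * (itemsize + index_size) + (n_samples + 1) * index_size


GIB = 1024 ** 3

# 34,000 samples with the reduced dictionary of 30,000 features.
dense_small = dense_bytes(34000, 30000)
# Same shape stored as CSR, ASSUMING 1% of the entries are non-zero.
csr_small = csr_bytes(34000, 30000, density=0.01)
# 34,000 samples with the full dictionary of 1,500,000 features.
dense_full = dense_bytes(34000, 1500000)

print("dense, 30k features:  %.2f GiB" % (dense_small / GIB))
print("CSR, 30k features:    %.2f GiB" % (csr_small / GIB))
print("dense, 1.5M features: %.2f GiB" % (dense_full / GIB))
```

Under these assumptions the dense array does not fit in 4 GB even with 30,000 features, while the size of the sparse representation depends almost entirely on the actual density of the data.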

Thank you very much for your quick response!


