[Scikit-learn-general] Out of core/online

Gael Varoquaux Mon, 23 Jan 2012 05:41:06 -0800

On Mon, Jan 23, 2012 at 11:37:16AM +0200, Dimitrios Pritsos wrote:
> So, is there any tip for me to fit() the model in stages, i.e. not to
> bring the whole data set into memory during the learning process? As far
> as I can see in my code, when I give an EArray as an argument to fit() it
> seems to load everything into memory in order to train the model, so I
> cannot exploit the PyTables feature, i.e. arrays that "live" on disk and
> not in RAM.


That's called 'out-of-core computing', and it can be implemented using
on-line or mini-batch algorithms. The scikit doesn't yet have a complete
framework to do this, but a few estimators expose a 'partial_fit' method
that will get you part of the way. Namely, you can try the SGD
estimators (e.g. SGDClassifier) for this.
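
To make the 'partial_fit' idea concrete, here is a minimal sketch rather
than a definitive recipe. The helper iter_minibatches, the batch size and
the synthetic data are illustrative assumptions. In practice X and y would
be disk-backed arrays (such as the EArray mentioned above) that return a
NumPy array when sliced.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_minibatches(X, y, batch_size):
    """Yield consecutive (X_batch, y_batch) slices.

    Hypothetical helper: only one slice is materialised in memory at a
    time, provided X and y load data lazily when sliced.
    """
    for start in range(0, X.shape[0], batch_size):
        stop = start + batch_size
        yield np.asarray(X[start:stop]), np.asarray(y[start:stop])


# Synthetic in-memory stand-in for data that would normally live on disk.
rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(random_state=0)

# The full set of labels has to be given on the first partial_fit call,
# so it must be known in advance (assumed here to be 0 and 1).
classes = np.array([0, 1])

for X_batch, y_batch in iter_minibatches(X, y, batch_size=100):
    clf.partial_fit(X_batch, y_batch, classes=classes)

train_accuracy = clf.score(X, y)
```

Each call to partial_fit updates the model using only the current batch,
so peak memory is governed by the batch size and not by the size of the
full data set.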

Please note that your data should be roughly i.i.d. along the sample
direction, because an on-line algorithm updates the model with whatever
it sees first. In other words, if your first samples all look similar
to each other, and look very different from the last ones, you'll be in
trouble.
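
One way to reduce this ordering problem, sketched here under assumptions,
is to draw each mini-batch from a random permutation of the row indices
instead of reading the rows in storage order. The helper
iter_shuffled_minibatches and the synthetic data sorted by class are
hypothetical. Random row access can be slow on disk-backed arrays, so
take this as an illustration of the idea and not as a tuned solution.

```python
import numpy as np


def iter_shuffled_minibatches(X, y, batch_size, rng):
    """Yield mini-batches whose rows are drawn in random order.

    Hypothetical helper: indices inside each batch are sorted so that a
    disk-backed array is read in increasing row order.
    """
    order = rng.permutation(X.shape[0])
    for start in range(0, len(order), batch_size):
        idx = np.sort(order[start:start + batch_size])
        yield X[idx], y[idx]


# Synthetic worst case: the data is stored sorted by class.
rng = np.random.RandomState(0)
y = np.repeat([0, 1], 500)
X = rng.randn(1000, 3) + y[:, None]

# Reading in storage order, the first batch contains a single class.
first_sequential = y[:100]

# Drawing rows from a permutation mixes both classes into the batch.
_, first_shuffled = next(iter_shuffled_minibatches(X, y, 100, rng))
```

Here first_sequential holds only one label, which is exactly the
situation described above, while first_shuffled mixes both.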

Hope this helps,

Gael

PS: please do not reply to a thread with a totally different topic: start
a new thread.


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
