On Mon, Jan 23, 2012 at 11:37:16AM +0200, Dimitrios Pritsos wrote:
> So, is there any tip for me to fit() the model in stages, i.e. not to
> bring the whole data set in Memory during the learning process. As I can
> see in my code when I am giving an EArray as an argument to fit() it
> seems to load everything in memory in order to train the model, so I
> cannot exploit the Pytables feature, i.e. Arrays to "Live" on the Disk and
> not on the Ram.

That's called 'out-of-core computing', and it can be implemented using
on-line or mini-batch algorithms. The scikit doesn't yet have a complete
framework for this, but a few estimators expose a 'partial_fit' method
that will get you part of the way. Namely, you can try the SGD estimators
for this.

Please note that your data should be roughly i.i.d. along the sample
direction. In other words, if your first samples all look similar to each
other and look very different from the last ones, you'll be in trouble,
because the model is updated one batch at a time and each batch would then
give a biased picture of the whole data set.

Hope this helps,

Gael

PS: please do not reply to a thread with a totally different topic: start
a new thread.

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
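
For illustration, here is a minimal sketch of the 'partial_fit' approach mentioned above, using SGDClassifier. The helper name iter_batches, the batch size, the number of passes and the synthetic data are all assumptions made for this example, not part of the original advice. The in-memory NumPy arrays only stand in for an on-disk array; the loop touches the data through slicing alone, so an array type that reads just the requested rows when sliced should only bring one batch into memory at a time.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_batches(X, y, batch_size):
    """Yield consecutive (X_batch, y_batch) slices of at most batch_size rows.

    Hypothetical helper for this sketch; X and y only need to support
    slicing and expose a shape attribute.
    """
    for start in range(0, X.shape[0], batch_size):
        stop = start + batch_size
        # Slicing is the only access to X and y, so only these rows
        # need to be held in memory for the current update.
        yield np.asarray(X[start:stop]), np.asarray(y[start:stop])


# Synthetic stand-in for a data set stored on disk. The rows are generated
# in random order, so every batch looks like the whole data set (the i.i.d.
# requirement from the message above).
rng = np.random.RandomState(0)
X = rng.randn(2000, 10)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(random_state=0)

# partial_fit must be told the complete set of labels on its first call.
# Here it is computed from y; for a truly out-of-core data set the labels
# would have to be known in advance (assumption of this sketch).
classes = np.unique(y)

for epoch in range(5):  # several passes over the data; 5 is arbitrary
    for X_batch, y_batch in iter_batches(X, y, batch_size=200):
        clf.partial_fit(X_batch, y_batch, classes=classes)

train_accuracy = clf.score(X, y)
```

If the stored rows are sorted (for example by class), visiting the batches in sequence as above would break the i.i.d. assumption; shuffling the data once on disk, or drawing the batch start offsets in random order, is one way around that.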
