On 01/23/2012 03:46 PM, Dimitrios Pritsos wrote:
> On 01/23/2012 03:20 PM, Olivier Grisel wrote:
>> 2012/1/23 Dimitrios Pritsos<[email protected]>:
>>> On 01/23/2012 02:20 PM, Lars Buitinck wrote:
>>>> 2012/1/23 Dimitrios Pritsos<[email protected]>:
>>>>> On 01/23/2012 12:24 PM, Olivier Grisel wrote:
>>>>>> BTW: what is the structure of your data in PyTables? Is it mapped to a
>>>>>> scipy.sparse Compressed Sparse Row data structure? How many features do
>>>>>> you have in your dataset?
>>>>> The training data are in an EArray (compressed per row due to lots of
>>>>> zeros).
>>>>> I have 34000 samples, and the length of my dictionary, depending on the
>>>>> training set, is about 1,500,000.
>>>>> However, using about 30,000 features seems satisfactory for a
>>>>> proof-of-concept case. The number of samples, though, needs to be
>>>>> about 30-50k.
>>>> That would be doable. 30k features × 50k samples in a CSR matrix with
>>>> dtype=float32, assuming it's 90% zeros (a pessimistic guess for topic
>>>> spotting) would take just over 2GB.
>>>>
>>> I will give it a try; however, in some of my tests I had a memory management
>>> problem. As I recall, it was mostly because of a numpy function that
>>> might ask PyTables to load everything into main memory. I guess some
>>> loops and some slicing might solve the problem.
>>>
>>> However, I will first try to figure out how to use linear_model.SGDClassifier,
>>> which is supposed to be capable of being trained in stages. Plus, since I am
>>> using a linear kernel, it won't affect my results.
>>>
>>> Still, I will give the sparse structure a try.
>> BTW, if you find a way to load your data into a
>> scipy.sparse.csr_matrix that fits in memory at once then you don't
>> need to bother with the `partial_fit` method of SGDClassifier. Just
>> use the regular fit method and you will be fine.
>>
> Oops, that was the method missing from the reference documentation. Thanks!
> (i.e. partial_fit())
I guess I misunderstood something here. There is no partial_fit(). Plus, I haven't managed to figure out how to do the partial fit. I have the latest scikit-learn, which I retrieved via git. Am I missing something?

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
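For reference, the staged training discussed in this thread might look roughly like the sketch below. It assumes a scikit-learn version in which SGDClassifier exposes partial_fit (the checkout mentioned above apparently did not). The data is a small synthetic stand-in for the real PyTables EArray, and iter_chunks is a hypothetical helper, not part of any library. With real data, each chunk would presumably be a block of rows read from disk and converted to a CSR matrix.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import SGDClassifier


def iter_chunks(X, y, chunk_size):
    """Yield consecutive row slices of X and y (hypothetical helper)."""
    for start in range(0, X.shape[0], chunk_size):
        stop = start + chunk_size
        yield X[start:stop], y[start:stop]


# Synthetic stand-in for the real data: 1000 samples, 200 features, ~90% zeros.
rng = np.random.RandomState(0)
X = sp.random(1000, 200, density=0.1, format="csr",
              dtype=np.float32, random_state=rng)
w_true = rng.randn(200)
y = (X @ w_true > 0).astype(int)  # binary labels from a random linear rule

# Hinge loss gives a linear SVM trained by stochastic gradient descent.
clf = SGDClassifier(loss="hinge", random_state=0)

# partial_fit needs the full set of labels up front, because a single
# chunk is not guaranteed to contain every class.
classes = np.array([0, 1])

# Train in stages: only one chunk has to be in memory at a time.
for X_chunk, y_chunk in iter_chunks(X, y, chunk_size=100):
    clf.partial_fit(X_chunk, y_chunk, classes=classes)

pred = clf.predict(X)
```

Each partial_fit call performs one pass over the chunk it is given, so several sweeps over all chunks may be needed to approach what a single fit call would reach on in-memory data.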
