On 01/23/2012 03:46 PM, Dimitrios Pritsos wrote:
> On 01/23/2012 03:20 PM, Olivier Grisel wrote:
>> 2012/1/23 Dimitrios Pritsos<[email protected]>:
>>> On 01/23/2012 02:20 PM, Lars Buitinck wrote:
>>>> 2012/1/23 Dimitrios Pritsos<[email protected]>:
>>>>> On 01/23/2012 12:24 PM, Olivier Grisel wrote:
>>>>>> BTW: what is the structure of your data in PyTables? Is it mapped to a
>>>>>> scipy.sparse Compressed Sparse Row data structure? How many features do
>>>>>> you have in your dataset?
>>>>> The training data are in an EArray (compressed per row because of the
>>>>> many zeros).
>>>>> I have 34,000 samples, and the dictionary built from the training set
>>>>> has about 1,500,000 entries.
>>>>> However, using about 30,000 features seems satisfactory for a
>>>>> proof-of-concept case. The number of samples, though, needs to be
>>>>> about 30-50k.
>>>> That would be doable. 30k features × 50k samples in a CSR matrix with
>>>> dtype=float32, assuming it's 90% zeros (a pessimistic guess for topic
>>>> spotting) would take just over 2GB.
>>>>
>>> I will give it a try; however, in some of my tests I had a memory
>>> management problem. As far as I recall, it was mostly because of a numpy
>>> function that may ask PyTables to load everything into main memory. I
>>> guess some loops and some slicing might solve the problem.
>>>
>>> However, I will first try to figure out how to use
>>> linear_model.SGDClassifier, which is supposed to be capable of being
>>> trained in stages. Plus, since I am using a linear kernel, it won't
>>> affect my results.
>>>
>>> Still, I will give the sparse structure a try.
>> BTW, if you find a way to load your data into a
>> scipy.sparse.csr_matrix that fits in memory at once then you don't
>> need to bother with the `partial_fit` method of SGDClassifier. Just
>> use the regular fit method and you will be fine.
>>
> Oops, that was the method missing from the reference documentation
> (i.e. partial_fit()). Thanks!

I guess I misunderstood something here. There is no partial_fit(). Plus,
I haven't managed to figure out how to do the partial fit.

I have the latest scikit-learn, retrieved via git. Am I missing something?
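
For reference, the staged training discussed above could look roughly like
the minimal sketch below. It assumes a scikit-learn version in which
SGDClassifier exposes partial_fit. The random CSR matrix, the labels, the
batch size of 100 and all variable names are made up for illustration; the
row slices stand in for chunks read from the PyTables EArray.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDClassifier

# Hypothetical stand-in data: a small random sparse matrix and binary labels.
rng = np.random.RandomState(0)
n_samples, n_features = 1000, 200
X = sparse.random(n_samples, n_features, density=0.1, format="csr",
                  dtype=np.float32, random_state=rng)
y = rng.randint(0, 2, size=n_samples)

# partial_fit must be told the full set of class labels up front, because
# a single batch is not guaranteed to contain every class.
classes = np.array([0, 1])

clf = SGDClassifier(loss="hinge", random_state=0)  # linear SVM-style loss

batch_size = 100  # illustrative value, not a recommendation
for start in range(0, n_samples, batch_size):
    stop = start + batch_size
    # Only this row slice has to be in memory for the update.
    clf.partial_fit(X[start:stop], y[start:stop], classes=classes)

predictions = clf.predict(X[:5])
```

If the whole matrix fits in memory as a scipy.sparse.csr_matrix, the loop
can be replaced by a single clf.fit(X, y) call, as suggested in the quoted
reply.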

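The size estimate quoted above (30k features × 50k samples, 90% zeros,
float32) can be checked with a few lines of arithmetic. This is only a
back-of-the-envelope sketch: it counts the three arrays of a CSR matrix
(data, indices, indptr) and assumes 32-bit indices by default. The function
name csr_nbytes is hypothetical and not part of scipy.

```python
def csr_nbytes(n_rows, n_cols, density, data_itemsize=4, index_itemsize=4):
    """Approximate bytes used by the data, indices and indptr arrays of CSR."""
    nnz = int(round(n_rows * n_cols * density))   # number of stored non-zeros
    data_bytes = nnz * data_itemsize              # one value per non-zero
    indices_bytes = nnz * index_itemsize          # one column index per non-zero
    indptr_bytes = (n_rows + 1) * index_itemsize  # one row pointer per row, plus one
    return data_bytes + indices_bytes + indptr_bytes

# 50k samples (rows) x 30k features (columns), 10% non-zero, float32 values.
estimate = csr_nbytes(n_rows=50000, n_cols=30000, density=0.10)
print("%.2f GB" % (estimate / 1e9))
```

Under these assumptions the result is about 1.2 GB with 32-bit indices and
about 1.8 GB with 64-bit indices (index_itemsize=8), so the 2GB figure reads
as a safe upper bound rather than an exact number.
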
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

