On Fri, Oct 7, 2011 at 5:07 AM, Olivier Grisel <[email protected]> wrote:
> 2011/10/7 Ian Goodfellow <[email protected]>:
>> I understand that LinearSVC is implemented using liblinear, which I thought
>> should work well with large datasets. However, when I pass LinearSVC.fit a
>> design matrix of size 40,000 x 14,400 (in float32 format, so 2.3 gigabytes)
>> it ends up using at least 8 additional gigabytes of RAM!
>
> I am pretty sure that liblinear uses a sparse format based on 64bit
> floats and integer indices internally. So if your data is dense (very
> few zeros) that means:

Also, the wrappers always make a copy of the data when calling
liblinear functions. This means that even if your data already has the
optimal layout, you will need twice as much memory as your original dataset.

We avoided this in the libsvm wrappers, but no one has implemented it
for the liblinear ones ...
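
For a rough sense of the numbers, here is a back-of-the-envelope sketch. It
assumes each stored entry in the sparse copy is an (int index, double value)
pair padded to 16 bytes, plus one terminator entry per row; both figures are
assumptions about a typical 64-bit build, not measurements from this thread.

```python
# Rough memory estimate for a dense float32 matrix that gets copied
# into a liblinear-style sparse representation.
n_samples, n_features = 40_000, 14_400

FLOAT32_BYTES = 4
# Assumption: one (int index, double value) pair per stored entry,
# padded to 16 bytes on a typical 64-bit platform.
NODE_BYTES = 16

# Size of the original dense float32 array.
dense_bytes = n_samples * n_features * FLOAT32_BYTES

# Assumption: every value is non-zero, and each row carries one extra
# terminator entry.
sparse_copy_bytes = n_samples * (n_features + 1) * NODE_BYTES

print("dense float32 input: %.1f GB" % (dense_bytes / 1e9))
print("sparse copy:         %.1f GB" % (sparse_copy_bytes / 1e9))
```

Under those assumptions the copy alone is about four times the size of the
float32 input, which is in the same range as the extra usage reported above.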

Fabian.

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
