Re: [Scikit-learn-general] : SVM and Sparse Data in (latest) version of sklearn

Dimitrios Pritsos Fri, 14 Sep 2012 03:09:26 -0700

On 09/13/2012 09:27 PM, Lars Buitinck wrote:
> 2012/9/13 Dimitrios Pritsos<[email protected]>:
>> There is a great difference in the performance of the SVM.fit() method
>> (OneClassSVM in particular) depending on the input. When the input is a
>> sparse matrix, training is extremely slow even for a very small amount of
>> data, i.e. a 180x1000 matrix, where 1000 is the number of features and 180
>> the number of samples. On the other hand, an array input of the same size
>> is quite fast, even faster than the LibSVM Python API as far as I recall.
>>
>> Is that normal, or have I encountered some sort of bug?
> No, that's not normal. Good you tell us...
>
> 1. How slow is slow?
More than 10 minutes of training per class. In some cases it ran for more
than 30 minutes, so I stopped it. With dense arrays, the same matrix needs
only a few seconds of training per class.
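
For anyone who wants to reproduce the comparison, here is a minimal timing
sketch. It does not use my real data: the matrix is synthetic (180x1000, about
30% non-zero), and the kernel and nu values are assumptions picked only for
illustration, so the timings it prints may not match what I see on the corpus.

```python
import time

import numpy as np
import scipy.sparse as sp
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the real data: 180 samples x 1000 features,
# roughly 30% non-zero entries (assumed values, not the original corpus).
rng = np.random.RandomState(0)
X_sparse = sp.random(180, 1000, density=0.3, format="csr", random_state=rng)
X_dense = X_sparse.toarray()


def time_fit(X):
    """Return the wall-clock seconds taken by one OneClassSVM.fit call."""
    # kernel and nu are illustrative choices, not the settings from my runs.
    clf = OneClassSVM(kernel="rbf", nu=0.1)
    start = time.perf_counter()
    clf.fit(X)
    return time.perf_counter() - start


dense_seconds = time_fit(X_dense)
sparse_seconds = time_fit(X_sparse)
print("dense:  %.3f s" % dense_seconds)
print("sparse: %.3f s" % sparse_seconds)
```

Both calls fit the same values, so any large gap between the two numbers
comes from the input representation rather than from the data.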
> 2. Is it equally slow when you use the deprecated
> sklearn.svm.sparse.OneClassSVM?
I don't know. I can test it and let you know once the test that is
currently running has finished.
> 3. How sparse is your data? I.e., how many zeros are there in X?
>
It is normally very sparse: maybe 30% or less of the variables are non-zero,
because they derive from documents (in particular web pages) with a
corpus vocabulary of 1,000 to 120,000 terms/tokens.
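
To give an exact figure instead of an estimate, the non-zero fraction can be
measured directly. The helper below is a hypothetical sketch (the name
density is mine, not part of sklearn or scipy), shown on a toy matrix rather
than on the real corpus.

```python
import numpy as np
import scipy.sparse as sp


def density(X):
    """Fraction of entries in X that are non-zero (between 0.0 and 1.0)."""
    n_rows, n_cols = X.shape
    if sp.issparse(X):
        # Counts entries whose value is non-zero, ignoring stored zeros.
        nonzero = X.count_nonzero()
    else:
        nonzero = np.count_nonzero(X)
    return nonzero / float(n_rows * n_cols)


# Toy example: 3 non-zero values out of 8 entries.
X = sp.csr_matrix(np.array([[0, 2, 0, 0], [1, 0, 0, 3]]))
print(density(X))
```

The same function accepts the dense array as well, so the sparse and dense
inputs can be checked against each other before timing them.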


Regards
