I am currently playing with the RandomTreesEmbedding in scikit-learn on the
MNIST data. If I train the RandomTreesEmbedding on a random matrix of shape
(1000, 784), use it to transform the MNIST data, and then feed the result into
a linear SVM, the SVM training time is about 37 s and the accuracy is about 93%.
On the other hand, if I train the RandomTreesEmbedding on the first 1000
samples of the dataset and then feed the transformed data into the linear SVM,
the training time drops to about 3 s and the accuracy rises to about 97%.
Both transformed datasets have the same level of sparsity. The increase in
accuracy might not be surprising, but I wonder why the training time drops so
drastically. Any ideas?
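For reference, a minimal sketch of the two-stage setup described above, using
scikit-learn's RandomTreesEmbedding and LinearSVC. The array sizes and
n_estimators=10 are illustrative assumptions, and small random arrays stand in
for MNIST so the snippet is self-contained; this is not the original
experiment's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X_fit = rng.rand(1000, 784)            # random matrix used only to fit the embedding
X_data = rng.rand(200, 784)            # stand-in for the actual MNIST samples
y_data = rng.randint(0, 10, size=200)  # stand-in labels

# Stage 1: fit the unsupervised embedding, then transform the data into a
# sparse one-hot encoding of the leaf each sample lands in, one per tree.
embedder = RandomTreesEmbedding(n_estimators=10, random_state=0)
embedder.fit(X_fit)
X_sparse = embedder.transform(X_data)  # scipy CSR matrix of leaf indicators

# Stage 2: train a linear SVM on the sparse leaf indicators.
clf = LinearSVC(random_state=0)
clf.fit(X_sparse, y_data)

# Each sample activates exactly one leaf per tree, so nnz = n_samples * n_trees.
print(X_sparse.shape, X_sparse.nnz)
```

Because each row has exactly n_estimators non-zero entries regardless of what
the embedding was fitted on, the sparsity level alone does not distinguish the
two setups.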
--
Caleb
On Sunday, March 2, 2014 11:34 PM, Olivier Grisel <olivier.gri...@ensta.org>
wrote:
Inverting 0 and 1 does not change the problem mathematically, but it does
change the number of multiply-add operations performed when computing the dot
products or Euclidean norms involved in building the columns of the kernel
matrix, whenever a sparse representation of the input data is used.
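This can be illustrated with a toy binary matrix (the shape and density below
are made-up for illustration): flipping 0s and 1s leaves the learning problem
equivalent, but it drastically changes the number of stored entries (nnz) in a
CSR representation, and the cost of sparse dot products scales with nnz.

```python
import numpy as np
from scipy import sparse

rng = np.random.RandomState(0)
# A mostly-zero binary matrix: only ~5% of entries are 1.
dense = (rng.rand(100, 50) < 0.05).astype(np.float64)
A = sparse.csr_matrix(dense)

# Inverting 0 and 1 is mathematically equivalent for the problem, but now
# almost every entry is non-zero and must be stored and multiplied explicitly.
A_inv = sparse.csr_matrix(1.0 - dense)

print(A.nnz, A_inv.nnz)  # sparse arithmetic cost is proportional to nnz
```

Since every entry is non-zero in exactly one of the two matrices, the two nnz
counts always sum to the total number of entries.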
--
Olivier
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general