On Thu, Nov 10, 2011 at 2:31 AM, Olivier Grisel <[email protected]> wrote:
> Maybe it would be better to have a dedicated method for this use case
> rather than using complex kwargs in the `fit` method. What about using
> a `fit_from_kernel` or `fit_from_affinity` method (if you find a
> better name)?

For me, the main problem with precomputed kernels is the data
representation rather than the method names:

1) storing a symmetric matrix in a 2d dense array is inefficient
   memory-wise
2) some values in the matrix may never be needed during training (this
   is the case for SVMs, for example), so computing them all is a waste
3) for SVC.predict, requiring an n_test x n_train dense array is a huge
   waste

For 3), I'd suggest using an n_test x n_SV array instead.

For 1), using an upper triangular packed format (i.e. storing the values
in a 1d array) would be a solution and would be easy to use from Cython.
Combined with mmap arrays, it would allow storing large Gram matrices
(but wouldn't solve problem 2)).

User-defined kernel functions could be an answer to all of 1), 2) and 3)
if the function were called on demand, but I think the current
implementation just precomputes the entire Gram matrix.

Currently, people who want to learn SVMs on large-scale datasets with
custom kernels should probably use libsvm's C++ API directly.

Mathieu
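As a rough illustration of the packed format suggested for 1), here is a minimal NumPy sketch. The helper names `packed_index` and `pack_upper` are hypothetical (nothing like them exists in scikit-learn), and the row-major layout with the diagonal included is just one possible convention:

```python
import numpy as np


def packed_index(i, j, n):
    """Position of entry (i, j) of an n x n symmetric matrix in the
    packed 1d array (upper triangle, diagonal included, row by row)."""
    if i > j:
        i, j = j, i  # symmetry: K[i, j] == K[j, i]
    # Row i starts after the i previous rows of lengths n, n-1, ..., n-i+1.
    return i * n - i * (i - 1) // 2 + (j - i)


def pack_upper(K):
    """Copy the upper triangle of K into a 1d array, row by row."""
    n = K.shape[0]
    return K[np.triu_indices(n)]


# Toy Gram matrix for a linear kernel on random data.
rng = np.random.RandomState(0)
X = rng.rand(5, 3)
K = np.dot(X, X.T)
n = K.shape[0]

packed = pack_upper(K)  # holds n * (n + 1) / 2 values instead of n * n

# Any entry, above or below the diagonal, is recovered from the 1d array.
value = packed[packed_index(3, 1, n)]
```

The packed array holds n * (n + 1) / 2 values, roughly half of the dense storage. Since it is a flat 1d buffer, it could presumably be backed by `np.memmap` as well, though that part is not shown here.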
