Re: [R] SVM classification based on pairwise distance matrix
Hi Steve,

thanks a lot, I will have a look at the kernel approach, that looks promising. I guess I will first have to study the theory behind it before I use it.

Cheers
M.

On 10/21/2010 5:42 PM, Steve Lianoglou wrote:
> [...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SVM classification based on pairwise distance matrix
Dear all,

I am exploring the possibilities for automated classification of my data. I have successfully used KNN, but was thinking about looking at SVM (which I have not used before). I have a pairwise distance matrix of training observations, which are classified into set classes, and a distance matrix of new observations to the training ones.

Is it possible to use distance matrices for SVM, and if yes, which package would do so (e1071?).

I have little experience with SVM, and I had the impression that it is
a/ usually used with data that have observations in terms of a number of variables (hence, not pairwise distances);
b/ not well suited for large multidimensional spaces (I have a distance matrix of 200*200 observations, a part of this could be used as training data, but still, we are looking at say 50 distances per observation).

Thanks
Martin
Re: [R] SVM classification based on pairwise distance matrix
Hi,

On Thu, Oct 21, 2010 at 9:42 AM, Martin Tomko martin.to...@geo.uzh.ch wrote:
> Dear all,
> I am exploring the possibilities for automated classification of my data. I have successfully used KNN, but was thinking about looking at SVM (which I have not used before). I have a pairwise distance matrix of training observations, which are classified into set classes, and a distance matrix of new observations to the training ones.

It seems to me that since you have some pairwise distance metric, your original data is in some vector form. Why not just try using your original data (forget the pairwise distances for now) and try a few different kernels for the SVM, such as a linear kernel or an RBF/Gaussian?

> Is it possible to use distance matrices for SVM, and if yes, which package would do so (e1071?).

I guess you can think of a kernel matrix as something like a distance matrix -- actually, it's more like a similarity matrix. I don't recall if e1071 allows you to use a kernel matrix as input, but I'm pretty sure the svm functions from kernlab do. It was a pain to use, though. But anyway -- don't use your distance matrix :-)

> I have little experience with SVM, and I had the impression that it is
> a/ usually used with data that have observations in terms of a number of variables (hence, not pairwise distances);

With the exception of plugging in a kernel matrix (which was calculated from data in its original feature space), that's pretty much correct.

> b/ not well suited for large multidimensional spaces (I have a distance matrix of 200*200 observations, a part of this could be used as training data, but still, we are looking at say 50 distances per observation).

But your distance matrix isn't really the same multidimensional space your data lives in, right?

Anyway, like I said before, try the SVM on your original data with some different kernels. I think the RBF kernel should be closest in spirit to your distance matrix, and will likely perform better than your kNN ;-).
Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
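As a rough illustration of the advice above (fit an SVM on the original feature vectors and compare a linear kernel with an RBF kernel), here is a minimal sketch. It assumes the kernlab package is installed and uses the built-in iris data purely as a stand-in for the real observations; the cost value and the 5-fold cross-validation are arbitrary choices, not recommendations from the thread.

```r
library(kernlab)  # assumption: kernlab is installed

# iris stands in for the original feature vectors and class labels.
set.seed(42)
fit_linear <- ksvm(Species ~ ., data = iris, type = "C-svc",
                   kernel = "vanilladot", kpar = list(),
                   C = 1, cross = 5)

set.seed(42)
fit_rbf <- ksvm(Species ~ ., data = iris, type = "C-svc",
                kernel = "rbfdot", kpar = "automatic",  # sigma estimated from the data
                C = 1, cross = 5)

# cross() returns the cross-validation error of each fitted model.
cv_error <- c(linear = cross(fit_linear), rbf = cross(fit_rbf))
print(cv_error)
```

Comparing the cross-validation errors (and the kNN error on the same folds) gives a first impression of whether an SVM is worth pursuing for a given data set.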
Re: [R] SVM classification based on pairwise distance matrix
Hi Steve,

thanks for the hints and clarifications. Unfortunately, I will not be able to use the approach you suggest. The distances I generate are distances between VERY large matrices (say 10x10 and more), each of different dimensions (not necessarily square either), and there is no significance in terms of column properties; they are basically graphs of a sort.

Is there a way out with the SVM, or should I just forget about it?

Martin

On 10/21/2010 5:42 PM, Steve Lianoglou wrote:
> [...]
Re: [R] SVM classification based on pairwise distance matrix
Hi,

On Thu, Oct 21, 2010 at 12:12 PM, Martin Tomko martin.to...@geo.uzh.ch wrote:
> Hi Steve,
> thanks for the hints and clarifications. Unfortunately, I will not be able to use the approach you suggest. The distances I generate are distances between VERY large matrices (say 10x10 and more), each of different dimensions (not necessarily square either), and there is no significance in terms of column properties; they are basically graphs of a sort.
>
> Is there a way out with the SVM, or should I just forget about it?

Well, it's not clear to me what type of data you are working with. You say they are graphs of a sort. There are principled ways of working with graphs in SVMs -- namely using graph kernels. You can find information about them if you search Google (Karsten Borgwardt does a lot of work in this area). Unfortunately, I don't think there are any public-domain implementations out there for you to consume easily.

But still -- you're able to calculate a distance metric over your data -- how are you doing that?

Here's a shot in the dark, and probably not so correct, so read at your own risk: what if you try to create a kernel matrix by plugging your distance metric into the appropriate place in something like an RBF kernel function? For instance, the value of the RBF kernel between two points is:

exp(-|X_1 - X_2|^2 / sigma^2)

What if you plugged your distance measure between samples X_1 and X_2 into the |X_1 - X_2| slot and kept the rest the same?

You have to verify that this is a valid kernel (Gram) matrix -- I think it just needs to be symmetric positive semi-definite. See a quick review here: http://www.support-vector.net/icml-tutorial.pdf

Now you're just left to figure out how to use ksvm (from kernlab) with kernel matrices, and maybe you have something that can work.
--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
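The "shot in the dark" from the last message can be sketched roughly as follows. This is only an illustration under stated assumptions. The toy points and the Euclidean `dist()` call stand in for the real objects and the custom distance, and the median-distance choice of sigma is a common heuristic rather than anything suggested in the thread. With a non-Euclidean distance the resulting matrix is not guaranteed to be positive semi-definite, which is why the eigenvalue check matters. The kernlab part runs only if that package is installed.

```r
# Hypothetical toy data: two groups of 2-D points standing in for the real objects.
set.seed(1)
n <- 40
x <- rbind(matrix(rnorm(n, mean = 0), ncol = 2),
           matrix(rnorm(n, mean = 3), ncol = 2))
y <- factor(rep(c("a", "b"), each = n / 2))

# Stand-in for the precomputed pairwise distance matrix (Euclidean here).
D <- as.matrix(dist(x))

# Plug the distances into the RBF form from the message: exp(-d^2 / sigma^2).
sigma <- median(D[upper.tri(D)])  # heuristic choice, an assumption
K <- exp(-D^2 / sigma^2)

# Validity check: symmetric, and no meaningfully negative eigenvalues.
is_symmetric <- isSymmetric(unname(K))
min_eig <- min(eigen(K, symmetric = TRUE, only.values = TRUE)$values)
is_psd <- min_eig > -1e-8

if (is_symmetric && is_psd && requireNamespace("kernlab", quietly = TRUE)) {
  library(kernlab)
  train <- sample(nrow(K), 30)
  test <- setdiff(seq_len(nrow(K)), train)

  # Fit on the train-by-train block of the kernel matrix.
  model <- ksvm(as.kernelMatrix(K[train, train]), y[train], type = "C-svc")

  # New observations need kernel values against the support vectors only.
  K_test <- as.kernelMatrix(K[test, train][, SVindex(model), drop = FALSE])
  pred <- predict(model, K_test)
  print(table(predicted = pred, actual = y[test]))
}
```

If the check fails for a real distance matrix, the matrix is not a valid kernel as it stands, and a different sigma or a proper graph kernel would be the next thing to look at.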