Dear list,

my problem seems to be primarily a statistical one, but maybe I have misspecified something in R (and hopefully there is a solution).
I have two groups with two measured variables as training data. On both variables the groups are completely separated. I know this is a very simple situation, but the later analysis will use the same principle (with more groups and more possible values). The following example should be enough to illustrate my problem:

library(MASS)  # provides lda()

matrix <- matrix(rep(c(0,0,0,0,0,1,1,1,1,1),3), ncol = 3, byrow = FALSE)
matrix[,2:3] <- jitter(matrix[,2:3], .001)
lda <- lda(matrix[,2:3],matrix[,1], prior = c(5,5)/10)

I added some jitter to obtain a little within-group variance; the LDA would fail otherwise. When I try to predict the posterior probabilities for new values, I get some strange results:

testmatrix <- matrix(c(0,0,1,1,0,1,1,0), ncol = 2, byrow = TRUE)

> predict(lda,testmatrix)$posterior
     0 1
[1,] 1 0
[2,] 0 1
[3,] 0 1
[4,] 1 0

Rows 1 and 2 are about right, although the probabilities should be close to 1 rather than exactly 1. But rows 3 and 4 really bother me: I would expect a probability of .5 for each group there. In addition, the coefficients seem to be way too high:

> lda[["scaling"]]
          LD1
[1,] 5835.805
[2,] 7000.393

When I introduce one error per group (one value placed on the other group's side), the results look right (jitter is not needed in this case):

matrix <- matrix(rep(c(0,0,0,0,0,1,1,1,1,1),3), ncol = 3, byrow = FALSE)
matrix[3,2] <- c(1)
matrix[8,3] <- c(0)
lda <- lda(matrix[,2:3],matrix[,1], prior = c(5,5)/10)

> predict(lda,testmatrix)$posterior
                0            1
[1,] 0.9996646499 0.0003353501
[2,] 0.0003353501 0.9996646499
[3,] 0.5000000000 0.5000000000
[4,] 0.5000000000 0.5000000000

My question is now: Is my data "too good", or did I make a mistake in my code?

Best regards,
Arne Schulz
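
P.S. A back-of-the-envelope check on the numbers reported above, offered as a rough sketch rather than a definitive derivation. It assumes (as I understand the MASS documentation) that the scaling vector is normalized so that the within-group variance of LD1 is 1, that the jittered group means are approximately (0,0) and (1,1), and that the priors are equal. The symbols a, z, z_0 and z_1 are my own notation, not anything returned by lda().

```latex
\begin{align*}
z(x) &= a^\top x, \qquad a = (5835.805,\ 7000.393)^\top \\
z_0 &= z(0,0) = 0, \qquad z_1 = z(1,1) \approx 12836.2 \\
\log\frac{P(1 \mid x)}{P(0 \mid x)}
  &= (z_1 - z_0)\left(z(x) - \frac{z_0 + z_1}{2}\right) \\
x = (0,1):\quad & 12836.2 \times (7000.4 - 6418.1) \approx +7.5 \times 10^{6} \\
x = (1,0):\quad & 12836.2 \times (5835.8 - 6418.1) \approx -7.5 \times 10^{6}
\end{align*}
```

Under these assumptions, log-odds of this size give posteriors that are numerically indistinguishable from 1 and 0, which would match rows 3 and 4 of the first output. The small difference between the two coefficients, presumably caused by the random jitter, seems to be enough to tip (0,1) and (1,0) to one side when the within-group spread is this small.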