[R] tree()
Hi, I am trying to use tree() to classify movements in a futures contract. My data looks like this:

   diff       dip       dim       adx
1     0 100.0865      0.10      0.0
2     0  93.18540 2044.5455  93.18540
3     0  90.30995 1549.1169  90.30995
4     1  85.22030  927.0419  85.22030
5     1  85.36084  785.6480  85.36084
6     0  85.72627  663.3814  85.72627
7     0  78.06721  500.1113  78.06721
8     1  69.59398  376.7558  69.59398
9     1  71.15429  307.4533  71.15429
10    1  71.81023  280.6238  71.81023

plus another 6000 lines. The cpus example works fine, and I am trying this:

tree.model <- tree(as.factor(indi$diff) ~ indi$dim + indi$dip + indi$adx, indi[1:4000,])
tree.model
summary(tree.model)
plot(tree.model); text(tree.model)

but I get this:

> tree.model <- tree(as.factor(indi$diff) ~ indi$dim + indi$dip + indi$adx, indi[1:4000,])
> tree.model
node), split, n, deviance, yval, (yprob)
      * denotes terminal node
1) root 6023 8346 0 ( 0.513 0.487 ) *
> summary(tree.model)
Classification tree:
tree(formula = as.factor(indi$diff) ~ indi$dim + indi$dip + indi$adx,
    data = indi[1:4000, ])
Variables actually used in tree construction: character(0)
Number of terminal nodes: 1
Residual mean deviance: 1.386 = 8346 / 6022
Misclassification error rate: 0.487 = 2933 / 6023
> plot(tree.model); text(tree.model)
Error in plot.tree(tree.model) : cannot plot singlenode tree

I'm not getting any sort of tree formed. I wondered if anyone could point me in the right direction. Thanks.

Stephen Choularton

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
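For what it's worth, a likely cause (a guess, since the full data frame isn't shown) is that the formula refers to indi$diff etc., which bypasses the data argument entirely; note the fit reports n = 6023 even though data = indi[1:4000,] was supplied. A sketch of the usual fix, assuming indi really has columns diff, dip, dim and adx:

```r
# Sketch: refer to columns by bare name so the `data` argument
# (and hence the row subset) is actually honoured.
library(tree)

train <- indi[1:4000, ]
tree.model <- tree(as.factor(diff) ~ dim + dip + adx, data = train)
summary(tree.model)               # should now report 4000 observations
plot(tree.model); text(tree.model)

# If the tree still collapses to the root node, loosening the default
# stopping rules may help the first split get made:
tree.model2 <- tree(as.factor(diff) ~ dim + dip + adx, data = train,
                    control = tree.control(nobs = nrow(train),
                                           mindev = 0.001))
```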
[R] ripper
Is there a decision tree method available in R, like RIPPER, that produces a list of rules and can be used for prediction?

Stephen Choularton
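One possibility (a sketch, assuming the RWeka package and its Weka back end are installed): JRip() is Weka's implementation of RIPPER, prints the induced rule list, and supports predict():

```r
# Sketch: rule induction with Weka's RIPPER via RWeka.
library(RWeka)

fit <- JRip(Species ~ ., data = iris)  # iris used purely for illustration
fit                                    # printing shows the rule list
predict(fit, iris[1:5, ])              # rules can be used for prediction
```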
[R] futures, investment, etc
Hi, I am just starting to look at R and trading in futures, stocks, etc. Can anyone point me to useful background material?
[R] font size for xlab
Hi, I am trying to set the xlab font size. I have this code:

attach(errorsBySpeakers)
postscript("pic2.ps", width=4, height=4, paper="a4", horizontal=FALSE,
           pointsize=0, family="Times")
plot(prattpercent, uttspercent, xlab="Testing")
abline(z)
dev.off()
detach(errorsBySpeakers)

but I cannot find the correct form of words to make the xlab "Testing" an 11-point font. Does anyone know the wording? Also for ylab. Thanks

Stephen Choularton
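A sketch of the usual approach (with hypothetical data): in base graphics the axis-label size is the device pointsize scaled by cex.lab, so pointsize = 10 together with cex.lab = 1.1 gives roughly 11-pt labels, and the same setting covers ylab:

```r
# Sketch: 10 pt device font * cex.lab 1.1 = ~11 pt axis labels.
postscript("pic2.ps", width = 4, height = 4, paper = "special",
           horizontal = FALSE, pointsize = 10, family = "Times")
plot(1:10, (1:10)^2, xlab = "Testing", ylab = "uttspercent",
     cex.lab = 1.1)    # applies to both xlab and ylab
dev.off()
```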
[R] stepAIC
Hi, I hope this isn't off topic, but I have always found that when I run stepAIC() on some glm I get an improvement in accuracy and kappa. I have just done a case where I got a marginal deterioration. Is this possible, or should I go through my figures carefully to see if I have messed up?

Stephen Choularton
[R] boosting - second posting
Hi, I am using boosting for a classification and prediction problem. For some reason it is giving me an outcome that doesn't fall between 0 and 1 for the predictions. I have tried type="response" but it made no difference. Can anyone see what I am doing wrong? Screen output shown below:

boost.model <- gbm(as.factor(train$simNuance) ~ .,  # formula
+   data=train,              # dataset
+   # +1: monotone increase,
+   # 0: no monotone restrictions
+   distribution="gaussian", # bernoulli, adaboost, gaussian,
+                            # poisson, and coxph available
+   n.trees=3000,            # number of trees
+   shrinkage=0.005,         # shrinkage or learning rate,
+                            # 0.001 to 0.1 usually work
+   interaction.depth=3,     # 1: additive model, 2: two-way interactions, etc.
+   bag.fraction = 0.5,      # subsampling fraction, 0.5 is probably best
+   train.fraction = 0.5,    # fraction of data for training,
+                            # first train.fraction*N used for training
+   n.minobsinnode = 10,     # minimum total weight needed in each node
+   cv.folds = 5,            # do 5-fold cross-validation
+   keep.data=TRUE,          # keep a copy of the dataset with the object
+   verbose=FALSE)           # print out progress
best.iter = gbm.perf(boost.model, method="cv")
pred = predict.gbm(boost.model, test, best.iter)
summary(pred)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.4772  1.5140  1.6760  1.5100  1.7190  1.9420
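A guess at the cause, with a sketch of a fix (assuming simNuance is a two-class outcome): distribution="gaussian" fits a regression on the internal factor codes, so predictions land outside [0, 1]. gbm wants a numeric 0/1 response with distribution="bernoulli"; predict(..., type = "response") then returns probabilities:

```r
# Sketch, assuming `train` and `test` data frames as in the post.
library(gbm)

train$y01 <- as.integer(as.factor(train$simNuance)) - 1  # recode to 0/1
boost.model <- gbm(y01 ~ . - simNuance, data = train,
                   distribution = "bernoulli",           # two-class loss
                   n.trees = 3000, shrinkage = 0.005,
                   interaction.depth = 3, cv.folds = 5)
best.iter <- gbm.perf(boost.model, method = "cv")
pred <- predict(boost.model, test, n.trees = best.iter,
                type = "response")                       # now in [0, 1]
```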
[R] products and polynomials in formulae
Hi, I can do this:

formula = as.factor(outcome) ~ .

in glm and other model-building functions. I think there is a way to get the products of the predictors (that is, d1 * d2, d1 * d3, etc.) and also a way to get all the polynomials (like poly(d1, 2) would produce for a single predictor). Can anyone tell me how to write them?

Stephen
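A sketch of the standard formula shorthand, using made-up predictors d1, d2, d3:

```r
# Hypothetical data purely to make the formulas runnable.
set.seed(1)
d <- data.frame(d1 = rnorm(50), d2 = rnorm(50), d3 = rnorm(50))
d$y <- rbinom(50, 1, 0.5)

# (d1 + d2 + d3)^2 expands to all main effects plus pairwise products:
m1 <- glm(y ~ (d1 + d2 + d3)^2, family = binomial, data = d)

# d1 * d2 * d3 gives main effects plus all interactions up to the 3-way:
m2 <- glm(y ~ d1 * d2 * d3, family = binomial, data = d)

# poly() supplies polynomial terms per predictor:
m3 <- glm(y ~ poly(d1, 2) + poly(d2, 2) + poly(d3, 2),
          family = binomial, data = d)
```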
[R] error message
Hi, does anyone know what this means:

glm.model = glm(formula = as.factor(nextDay) ~ ., family=binomial, data=spi[1:1000,])
pred <- predict(glm.model, spi[1001:1250,-9], type="response")
Warning message:
prediction from a rank-deficient fit may be misleading in: predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type ==

Column 9 is my predictor, and I still get this message even when I remove the 9.

Stephen
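A sketch of how one might track down the rank deficiency (the warning means some predictors are perfectly collinear, so their coefficients were dropped; the names below come from the post):

```r
# Aliased (collinear) predictors show up as NA coefficients in the fit:
coef(glm.model)[is.na(coef(glm.model))]

# alias() on an equivalent lm fit names the exact linear dependencies:
alias(lm(as.numeric(as.factor(nextDay)) ~ ., data = spi[1:1000, ]))
```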
[R] There were 25 warnings (use warnings() to see them)
I am trying to use bagging like this:

bag.model <- bagging(as.factor(nextDay) ~ ., data = spi[1:1250,])
pred = predict(bag.model, spi[1251:13500,-9])
There were 25 warnings (use warnings() to see them)
t = table(pred, spi[1251:13500,9])
t
pred  0  1
   0 42 40
   1 12 22
classAgreement(t)

but I get the warning. The warnings all run like this (the same message is repeated 25 times):

1: number of rows of result is not a multiple of vector length (arg 2) in:
   cbind(1, 1:N, predict(object$mtrees[[i]], newdata, type = "class"))

Can anyone tell me what is going wrong?
Stephen
[R] There were 25 warnings (use warnings() to see them)
Don't worry, I can see my typo. Sorry for the posting!

bag.model <- bagging(as.factor(nextDay) ~ ., data = spi[1:1250,])
pred = predict(bag.model, spi[1251:13500,-9])
There were 25 warnings (use warnings() to see them)
t = table(pred, spi[1251:13500,9])
t
[R] graphs - saving and multiple
Hi, I am doing something like this:

hist(maximumPitch, xlab="Maximum Pitch in Hertz")

which produces a nice histogram, but what do I do to get two or three, etc., on one page? I also want to save the resulting figure to an EPS file. I can find:

postscript("ex.eps")

which I run, then something like my hist above, and then dev.off(), but I don't get anything in my EPS file! Thanks.

Stephen
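A sketch combining both pieces, with made-up data: par(mfrow = ...) tiles several plots on one page, and onefile = FALSE with horizontal = FALSE makes postscript() write a proper single-page EPS (a common reason the file comes out empty is plotting before the device is open, or never calling dev.off()):

```r
# Sketch: two histograms on one EPS page.
postscript("ex.eps", onefile = FALSE, horizontal = FALSE,
           paper = "special", width = 6, height = 8)
par(mfrow = c(2, 1))                     # 2 rows, 1 column of plots
hist(rnorm(200, 120, 20), xlab = "Maximum Pitch in Hertz", main = "")
hist(runif(200), xlab = "Another variable", main = "")
dev.off()                                # file is written on close
```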
[R] tuning SVM's
Hi, I am doing this sort of thing:

POLY:

obj <- best.tune(svm, similarity ~ ., data = training, kernel = "polynomial")
summary(obj)
Call:
best.tune(svm, similarity ~ ., data = training, kernel = "polynomial")
Parameters:
   SVM-Type:  eps-regression
 SVM-Kernel:  polynomial
       cost:  1
     degree:  3
      gamma:  0.04545455
     coef.0:  0
    epsilon:  0.1
Number of Support Vectors:  754

svm.model <- svm(similarity ~ ., data = training, kernel = "polynomial",
                 cost = 1, degree = 3, gamma = 0.04545455, coef.0 = 0, epsilon = 0.1)
pred <- predict(svm.model, testing)
pred[pred > .5] = 1
pred[pred <= .5] = 0
table(testing$similarity, pred)
   pred
      0   1
  0  30   8
  1  70  63

LINEAR:

obj <- best.tune(svm, similarity ~ ., data = training, kernel = "linear")
summary(obj)
Call:
best.tune(svm, similarity ~ ., data = training, kernel = "linear")
Parameters:
   SVM-Type:  eps-regression
 SVM-Kernel:  linear
       cost:  1
      gamma:  0.04545455
    epsilon:  0.1
Number of Support Vectors:  697

svm.model <- svm(similarity ~ ., data = training, kernel = "linear",
                 cost = 1, gamma = 0.04545455, epsilon = 0.1)
pred <- predict(svm.model, testing)
pred[pred > .5] = 1
pred[pred <= .5] = 0
table(testing$similarity, pred)
   pred
      0   1
  0   6  32
  1   4 129

RADIAL:

obj <- best.tune(svm, similarity ~ ., data = training, kernel = "radial")
summary(obj)
Call:
best.tune(svm, similarity ~ ., data = training, kernel = "linear")
Parameters:
   SVM-Type:  eps-regression
 SVM-Kernel:  linear
       cost:  1
      gamma:  0.04545455
    epsilon:  0.1
Number of Support Vectors:  697

svm.model <- svm(similarity ~ ., data = training, kernel = "radial",
                 cost = 1, gamma = 0.04545455, epsilon = 0.1)
pred <- predict(svm.model, testing)
pred[pred > .5] = 1
pred[pred <= .5] = 0
table(testing$similarity, pred)
   pred
      0   1
  0  27  11
  1  64  69

SIGMOID:

obj <- best.tune(svm, similarity ~ ., data = training, kernel = "sigmoid")
summary(obj)
Call:
best.tune(svm, similarity ~ ., data = training, kernel = "sigmoid")
Parameters:
   SVM-Type:  eps-regression
 SVM-Kernel:  sigmoid
       cost:  1
      gamma:  0.04545455
     coef.0:  0
    epsilon:  0.1
Number of Support Vectors:  986

svm.model <- svm(similarity ~ ., data = training, kernel = "sigmoid",
                 cost = 1, gamma = 0.04545455, coef.0 = 0, epsilon = 0.1)
pred <- predict(svm.model, testing)
pred[pred > .5] = 1
pred[pred <= .5] = 0
table(testing$similarity, pred)
   pred
      0   1
  0   8  30
  1  26 107

and then taking out the kappa statistic to see if I am getting anything significant. I get kappas of 15-17%; I don't think that is very good. I know kappa is really for comparing the outcomes of two taggers, but it seems a good way to measure whether your results might be by chance. Two questions: any comments on kappa and what it might be telling me? And what can I do to tune my kernels further?

Stephen
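A sketch of a fuller grid search (tune.svm from e1071; the powers-of-two ranges are illustrative, not prescriptive). Making the response a factor also gets C-classification rather than the eps-regression shown above, so no 0.5 threshold is needed afterwards:

```r
# Sketch: cross-validated grid search over cost and gamma.
library(e1071)

tuned <- tune.svm(as.factor(similarity) ~ ., data = training,
                  kernel = "radial",
                  cost  = 2^(-2:6),
                  gamma = 2^(-6:2))
summary(tuned)                     # CV error over the whole grid
best <- tuned$best.model
pred <- predict(best, testing)     # already class labels, no thresholding
table(testing$similarity, pred)
```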
[R] tune()
Hi, I am trying to tune an svm by doing the following:

tune(svm, similarity ~ ., data = training, degree = 2^(1:2), gamma = 2^(-1:1),
     coef0 = 2^(-1:1), cost = 2^(2:4), type = "polynomial")

but I am getting:

Error in svm.default(x, y, scale = scale, ...) : wrong type specification!

I have to admit I am not sure what I am doing wrong. Could anyone tell me why the parameters I am using are wrong? Plus, could anyone tell me how to go about picking the correct ranges for my tuning? Thanks

S
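For what it's worth, a sketch of what I believe the intended call looks like: in svm(), `type` means the machine type (C-classification, eps-regression, ...), so the kernel is selected with `kernel =`, and tune() expects the search grid inside `ranges = list(...)`:

```r
# Sketch, assuming `training` and a `similarity` column as in the post.
library(e1071)

tuned <- tune(svm, similarity ~ ., data = training,
              kernel = "polynomial",
              ranges = list(degree = 2^(1:2),
                            gamma  = 2^(-1:1),
                            coef0  = 2^(-1:1),
                            cost   = 2^(2:4)))
summary(tuned)      # best parameters and CV performance per combination
```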
[R] support vector machine
Hi everyone, thanks to those who responded last time. I am still having problems. I really want to find one of those tutorials on how to use svm() so I can get going using it myself. The issues are which kernel to choose and how to tune the parameters. If anyone knows of a tutorial, please let me know.

Stephen
[R] SVMs
Hi everyone, I am struggling to get going with support vector machines in R, i.e. svm(), predict(), etc. Does anyone know of a good tutorial covering R and these things?

Stephen