[R] how to pass weka classifier options with a meta classifier in RWeka?
Hi,

I am trying to replicate in R, using RWeka, the training of an AttributeSelectedClassifier with CfsSubsetEval, BestFirst and NaiveBayes that I originally did in Weka. My problem is passing arguments to CfsSubsetEval, BestFirst and NaiveBayes. I first created an interface for the classifier with:

  AS <- make_Weka_classifier("weka/classifiers/meta/AttributeSelectedClassifier")

And then I am trying to run the classifier with:

  nb.model <- AS(class ~ ., data = ex,
                 control = Weka_control(
                   E = "weka.attributeSelection.CfsSubsetEval",
                   S = "weka.attributeSelection.BestFirst -D 1",
                   W = "weka.classifiers.bayes.NaiveBayes -D"))

But now I get an error saying:

  Error in .jcall(classifier, "V", "buildClassifier", instances) :
    java.lang.Exception: Can't find class called: weka.classifiers.bayes.NaiveBayes -D

indicating that the way I am passing the argument -D to NaiveBayes is incorrect. It is not clear to me from the RWeka documentation how the passing mechanism of Weka_control is supposed to work with meta classifiers.

All help is greatly appreciated.

Many thanks,
Kari

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tr: Re: how to pass weka classifier options with a meta classifier in RWeka?
On Thu, 2012-02-09 at 14:52 +0100, Milan Bouchet-Valat wrote:

On Thursday 09 February 2012 at 15:31 +0200, Kari Ruohonen wrote:

[snip]

And then I am trying to run the classifier with:

  nb.model <- AS(class ~ ., data = ex,
                 control = Weka_control(
                   E = "weka.attributeSelection.CfsSubsetEval",
                   S = "weka.attributeSelection.BestFirst -D 1",
                   W = "weka.classifiers.bayes.NaiveBayes -D"))

But now I get an error saying:

  Error in .jcall(classifier, "V", "buildClassifier", instances) :
    java.lang.Exception: Can't find class called: weka.classifiers.bayes.NaiveBayes -D

indicating that the way I am passing the argument -D to NaiveBayes is incorrect. It is not clear to me from the RWeka documentation how the passing mechanism of Weka_control is supposed to work with meta classifiers. All help is greatly appreciated.

I've never tried it myself, but ?Weka_control says:

  One can use lists for options taking multiple arguments, see the documentation for 'SMO' for an example.

So maybe:

  nb.model <- AS(class ~ ., data = ex,
                 control = Weka_control(
                   E = "weka.attributeSelection.CfsSubsetEval",
                   S = list("weka.attributeSelection.BestFirst", D = 1),
                   W = list("weka.classifiers.bayes.NaiveBayes", D = 1)))

Cheers

Hi and thanks for the suggestion. Unfortunately, it results in a similar error:

  Error in .jcall(classifier, "V", "buildClassifier", instances) :
    java.lang.Exception: Can't find class called: weka.classifiers.bayes.NaiveBayes -D 1

regards,
Kari
[R] gam predictions with negbin model
Hi,

I wonder if predict.gam is supposed to work with a family=negbin() definition? It seems to me that the values returned by type="response" are far off the observed values. Here is an example output from the negbin examples:

  set.seed(3)
  n <- 400
  dat <- gamSim(1, n = n)
  g <- exp(dat$f/5)
  dat$y <- rnbinom(g, size = 3, mu = g)
  b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), family = negbin(3), data = dat)

  summary(y)
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.6061  1.6340  2.8120  2.7970  3.9250  4.9830

  summary(predict(b, type = "response"))
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.8972  3.1610  4.8140  6.1170  8.1300 28.0100

I.e. the range and mean of the observed values (y) are smaller than those of the predictions from the gam model. Should I somehow apply the estimated theta to these predictions?

regards,
Kari
Re: [R] gam predictions with negbin model
On 26/10/11 12:10, Achim Zeileis wrote:

On Wed, 26 Oct 2011, Kari Ruohonen wrote:

Hi, I wonder if predict.gam is supposed to work with a family=negbin() definition? It seems to me that the values returned by type="response" are far off the observed values. Here is an example output from the negbin examples:

  set.seed(3)
  n <- 400
  dat <- gamSim(1, n = n)
  g <- exp(dat$f/5)
  dat$y <- rnbinom(g, size = 3, mu = g)
  b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), family = negbin(3), data = dat)

  summary(y)
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.6061  1.6340  2.8120  2.7970  3.9250  4.9830

  summary(predict(b, type = "response"))
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.8972  3.1610  4.8140  6.1170  8.1300 28.0100

I.e. the range and mean of observed values (y)

What exactly is y in the code above? I guess you mean dat$y:

  R> summary(dat$y)
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.000   2.000   4.000   6.235   8.000  68.000

which looks rather reasonable...

Z

Thanks - what a stupid mistake: an old .RData was hanging around even though I had started a new R instance. Terribly sorry and many apologies.

Kari
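The mix-up resolved above (a bare `y` picking up a leftover workspace object instead of the column `dat$y`) can be sketched in a few lines of base R. The values below are made up purely for illustration and are not the data from the thread; starting R with `--vanilla` or `--no-restore` is one way to avoid restoring an old .RData in the first place.

```r
# A stale vector `y`, standing in for one restored from an old .RData
# (hypothetical values, for illustration only).
set.seed(1)
y <- runif(20, 0.5, 5)

# The data actually being modelled lives in a data frame column.
dat <- data.frame(y = rnbinom(20, size = 3, mu = 6))

# A bare `y` refers to the stale vector, not to the column.
summary(y)

# Naming the data frame removes the ambiguity.
summary(dat$y)

# with() looks `y` up inside `dat` first, so this is also the column.
in_dat <- with(dat, y)
summary(in_dat)
```

Checking `ls()` at the start of a session is a quick way to see whether anything was restored from a saved workspace.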
[R] location of Tisean executables when using RTisean and jumping between linux and windows
Hi,

I wonder if someone could help. I needed to transfer (copy) a workspace file that had been generated in Linux (R 2.11) to Windows running the same version of R 2.11 (the Windows binary, of course). Usually there is no problem in doing this and all objects work as expected. I often do this to be able to produce the wmf or emf graphics files that I need.

This time I had some spectra of which I had taken the first derivative with the sav_gol function in the RTisean package. I know RTisean is just an interface to the Tisean executables. The trouble I am facing is that the location of the Tisean executables seems to be somehow hard coded in the R workspace file. I assume this because when I try to rerun sav_gol on the Windows machine, after copying the workspace file from Linux and opening it in Windows, RTisean searches for the Tisean executables in the location that is valid for Linux, not Windows.

The RTisean package help says that RTisean asks for the location of the executables the first time a function is called and that this location is saved in the user's home directory for future use. There is no specific information on how this works in Windows, where there is no obvious home directory. However, I have run the R console on Windows and it asked for this location, but I don't know where the information was stored. In Linux it is in the .RTiseanSettings file in the user's home directory, as explained.

My questions are:

1) Is there a way I could break the link between the Tisean executables and the Linux location, so that when run in Windows the Windows executables will be used?

2) Is the hard coding of the location of the Tisean executables in the workspace image deliberate and necessary?

Many thanks,
Kari Ruohonen
[R] scores for a new observation from PCAgrid() in pcaPP
Hi,

I am trying to compute scores for a new observation based on a PCA previously computed by the PCAgrid() function in the pcaPP package. My data has more variables than observations. Here is an imaginary data set to show the case:

  library(MASS)   # for mvrnorm()
  library(pcaPP)  # for PCAgrid()
  n.samples <- 30
  n.bins <- 1000
  x.sim <- rep(0, n.bins)
  V.sim <- diag(n.bins)
  mtx <- array(dim = c(n.samples, n.bins))
  for(i in 1:n.samples) mtx[i,] <- mvrnorm(1, x.sim, V.sim)

With prcomp() I can do the following:

  pc.pr2 <- prcomp(mtx, scale = TRUE)
  newscr.pr2 <- scale(t(mtx[1,]), pc.pr2$center, pc.pr2$scale) %*% pc.pr2$rotation

The latter computes the scores for the first row of mtx. I can verify that the scores are the same as those computed originally by comparing with

  pc.pr2$x[1,]  # prints the scores for the first observation

Now, if I try the same with PCAgrid() as follows:

  pc.pp2 <- PCAgrid(mtx, k = min(dim(mtx)), scale = mad)
  newscr.pp2 <- scale(t(mtx[1,]), pc.pp2$center, pc.pp2$scale) %*% pc.pp2$loadings

the values in newscr.pp2 do not match the scores in the pc.pp2 object, as can be verified by comparing with:

  pc.pp2$x[1,]

I wonder what I am missing? Or is it the case that for the grid method such a computation of scores from the loadings and the original observations is not possible?

For the case p<n, i.e. when there are more observations than variables, the scores computed from the loadings and the scores from the model object also match for the PCAgrid() method, i.e. the behaviour described above seems to relate to cases where p>n.

Many thanks for any help,
Kari
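The prcomp() part of the check described above can be sketched as follows. This is only an illustration in base R: it uses `rnorm()` instead of `mvrnorm()` and fewer variables than in the post (both are assumptions made to keep it quick), and it says nothing about why PCAgrid() behaves differently when p>n. The built-in `predict()` method for prcomp objects performs the same centring, scaling and projection.

```r
set.seed(42)
n.samples <- 30
n.bins <- 200   # fewer variables than in the post, still more than observations
mtx <- matrix(rnorm(n.samples * n.bins), nrow = n.samples)

pc <- prcomp(mtx, scale. = TRUE)

# Treat the first row as the "new" observation (kept as a 1-row matrix).
new.obs <- mtx[1, , drop = FALSE]

# Manual projection: centre and scale with the stored values, then rotate.
manual <- scale(new.obs, center = pc$center, scale = pc$scale) %*% pc$rotation

# The same projection through the predict() method for prcomp objects.
via.predict <- predict(pc, newdata = new.obs)

# Both should agree with the scores stored in the fitted object.
all.equal(as.vector(manual), as.vector(pc$x[1, ]))
all.equal(as.vector(via.predict), as.vector(pc$x[1, ]))
```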
Re: [R] font question on pdf device
On Fri, 2010-10-08 at 14:19 +0100, Ted Harding wrote:

On 08-Oct-10 12:44:12, Kari Ruohonen wrote:

Hi, I wonder if this is something on my machine locally or R in general. When I do the following:

  plot(c(0,1), c(0,1), main = expression(paste(symbol("D"), "D", sep = "")))

I get a plot with a title having an uppercase delta followed by D. But in the following

  pdf(file = "deltaTest.pdf")
  plot(c(0,1), c(0,1), main = expression(paste(symbol("D"), "D", sep = "")))
  dev.off()

the uppercase delta looks like an O with an overstrike slash, i.e. Ø.

[snip]

  [1] stats graphics grDevices utils datasets methods base

which is the same as yours (except that I'm using a slightly earlier version of R, and on i486 rather than x86_64. Debian Etch by the way).

Ted.

Hi and thanks for the suggestions. Based on these I installed acroread and found that when viewed with acroread the Delta in the pdf file is displayed correctly, but when viewed with evince, the document viewer, I get the error. So it seems not to be an R issue at all. I am running 64-bit Ubuntu 9.10, for those who are interested in testing this.

Many thanks for all help.

Kari
[R] font question on pdf device
Hi,

I wonder if this is something on my machine locally or R in general. When I do the following:

  plot(c(0,1), c(0,1), main = expression(paste(symbol("D"), "D", sep = "")))

I get a plot with a title having an uppercase delta followed by D. But in the following

  pdf(file = "deltaTest.pdf")
  plot(c(0,1), c(0,1), main = expression(paste(symbol("D"), "D", sep = "")))
  dev.off()

the uppercase delta looks like an O with an overstrike slash, i.e. Ø. Other Greek letters, such as Gamma, seem to work fine for pdf as well.

My sessionInfo() for this is:

  R version 2.11.1 (2010-05-31)
  x86_64-pc-linux-gnu

  locale:
   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
   [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
   [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

Many thanks,
Kari
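As an aside on the plotting call itself, the same title can be written with the plotmath symbol `Delta` rather than `symbol("D")`. The sketch below writes to a temporary file (the file name is an assumption for illustration). Both forms rely on the Symbol font in the pdf device, so this is not expected to change how a particular viewer substitutes fonts; it is only a more conventional way to ask for the glyph.

```r
# Write the plot to a temporary PDF rather than the working directory.
out <- tempfile(fileext = ".pdf")

pdf(file = out)
# In plotmath, `Delta` is the uppercase Greek letter; `*` juxtaposes terms.
plot(c(0, 1), c(0, 1), main = expression(Delta * D))
dev.off()

# Path of the file to open in the viewer of choice.
out
```

Opening the resulting file in more than one viewer is a quick way to separate an R problem from a viewer font problem.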
[R] merging data frames with matrix objects when missing cases
Hi,

I have run into a problem with the merge() function when trying to merge two data frames that have a common index, where the second one does not have cases for all indexes in the first one. With ordinary variables R fills in the missing cases with NA if all=T is requested. But if the variable is a matrix, R seems to insert NA only into the first column of the matrix and fills in the rest of the columns by recycling the values. Here is a toy example:

  df1 <- data.frame(a = 1:3, X1 = I(matrix(1:6, ncol = 2)))
  df2 <- data.frame(a = 1:2, X2 = I(matrix(11:14, ncol = 2)))

  merge(df1, df2)
    a X1.1 X1.2 X2.1 X2.2
  1 1    1    4   11   13
  2 2    2    5   12   14
  # no all=T, missing cases are dropped

  merge(df1, df2, all = T)
    a X1.1 X1.2 X2.1 X2.2
  1 1    1    4   11   13
  2 2    2    5   12   14
  3 3    3    6   NA   13
  # X2.1 set to NA correctly but X2.2 set to 13 by recycling.

Can I somehow get the behaviour that the third row of the second matrix X2 in the above example is filled with NA in all columns? None of the merge() options seems to provide a solution.

regards,
Kari
Re: [R] merging data frames with matrix objects when missing cases
Yes, that was the original question: when a variable in a data frame is a matrix instead of an ordinary variable, merge() handles the missing cases so that only the first column of the matrix gets NA and the rest are recycled. If the matrix is broken up into several variables everything works fine.

Why then have a matrix in a data frame as a variable? In chemometrics, for example, it is usual to have e.g. NIR spectra stored in the data frame in this way. This eases the use of such spectra as a predictor in the model formula (a spectrum may contain hundreds of variables, depending on the wavelength binning used). It is also helpful in grouping the variables in a data frame into different predictor sets. See the examples in the pls package.

There is a workaround: search for NA in the first column and set all other columns on that row to NA as well. But my question was more of a caution about unexpected behaviour that someone might consider an unwanted feature.

Kari

On Fri, 2009-09-18 at 20:41 +0300, johannes rara wrote:

This has something to do with your data.frame structure, see

  str(df1)
  'data.frame': 3 obs. of 2 variables:
   $ a : int 1 2 3
   $ X1: 'AsIs' int [1:3, 1:2] 1 2 3 4 5 6

  str(df2)
  'data.frame': 2 obs. of 2 variables:
   $ a : int 1 2
   $ X2: 'AsIs' int [1:2, 1:2] 11 12 13 14

This seems to work:

  df1 <- data.frame(a = 1:3, b = 1:3, c = 4:6)
  str(df1)
  'data.frame': 3 obs. of 3 variables:
   $ a: int 1 2 3
   $ b: int 1 2 3
   $ c: int 4 5 6

  df2 <- data.frame(a = 1:2, d = 11:12, e = 13:14)
  str(df2)
  'data.frame': 2 obs. of 3 variables:
   $ a: int 1 2
   $ d: int 11 12
   $ e: int 13 14

  merge(df1, df2)
    a b c  d  e
  1 1 1 4 11 13
  2 2 2 5 12 14

  merge(df1, df2, all = T)
    a b c  d  e
  1 1 1 4 11 13
  2 2 2 5 12 14
  3 3 3 6 NA NA

2009/9/18 Kari Ruohonen kari.ruoho...@utu.fi:

[snip]
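One way to get a full row of NA for the unmatched case, without relying on how merge() treats matrix columns, is to look the keys up with `match()` and index the matrix by row. This is a minimal sketch, assuming a left join on the key `a` is what is wanted (in the toy example this coincides with all=T, because every key in df2 also occurs in df1); the name `merged` is introduced here for illustration.

```r
df1 <- data.frame(a = 1:3, X1 = I(matrix(1:6, ncol = 2)))
df2 <- data.frame(a = 1:2, X2 = I(matrix(11:14, ncol = 2)))

# For every key in df1, find the matching row of df2 (NA if there is none).
idx <- match(df1$a, df2$a)

# Indexing a matrix with an NA row index yields NA in every column,
# so the unmatched case is missing across the whole matrix.
merged <- df1
merged$X2 <- df2$X2[idx, , drop = FALSE]

merged
```

The matrix columns stay as single matrix-valued variables in `merged`, so they can still be used as one term in a model formula.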
[R] residual standard error in rlm (MASS package)
Hi,

I would appreciate it if someone could explain how the residual standard error is computed for rlm models (MASS package). Usually, one would expect to get the residual standard error by

  sqrt(sum((y-fitted(fm))^2)/(n-2))

where y is the response, fm a linear model with an intercept and a slope for x, and n the number of observations. This does not seem to work for rlm models and I am wondering what obvious thing I am missing here? Here is an example:

  library(MASS)
  x <- 1:100
  y <- c(2.37156056743079, 1.66644749462933, 6.33155723966817,
         12.7709430358167, 11.124950273, 19.7839679181322,
         15.4923741347280, 18.702397068068, 18.7599963836891,
         16.5916430986993, 16.0653054434192, 25.4517287910774,
         19.9306544701024, 25.3581170063305, 35.6823980984208,
         25.8293557856092, 34.7021243077337, 31.5336533511445,
         36.3599764020412, 44.6000402205419, 41.9899219097128,
         45.4564141342995, 43.6061038794823, 48.7566542867736,
         47.5504015095432, 54.8120780105412, 55.2620894365424,
         53.223516997263, 59.5477081631011, 61.2390445046623,
         62.3106323086734, 68.1104058608567, 62.399184797047,
         73.9413640517595, 70.6710955288097, 74.5456476513766,
         64.968260562374, 73.2318014155102, 73.7335636549196,
         76.9362454490887, 80.2579421621043, 80.945827481932,
         87.7805234941603, 90.0909966936097, 86.0620664696943,
         90.3640690887434, 98.0965832886435, 96.789139334781,
         102.114606626867, 98.3302535449148, 103.107825932103,
         109.942412367491, 106.868253017023, 109.808738425258,
         110.136050155862, 108.846488332796, 118.442973085485,
         117.276921857816, 118.640871017018, 119.263784892266,
         123.100214564588, 123.860590728955, 128.712228721465,
         131.297848895423, 123.283516322512, 134.012585073241,
         132.665302554315, 138.673423711638, 143.687124396642,
         139.159598404340, 142.012045172451, 146.480644634549,
         145.429104228138, 144.503524323636, 152.348091257061,
         149.237135977337, 159.803973361884, 153.195835890301,
         158.921034703569, 163.479578254736, 159.591944778941,
         163.185119145309, 165.890510577093, 164.573471319534,
         173.549321320816, 169.520130741843, 170.439532597426,
         174.477604263110, 178.059609946662, 177.828073866105,
         185.005760822296, 184.280998437732, 196.085419590290,
         187.125508176825, 190.524627542992, 196.849299652848,
         197.830377226055, 197.973198490102, 198.59328678419,
         199.450725602621)
  # y originally generated with y <- 2*x + rnorm(100, 0, 2)
  n <- length(y)  # number of observations
  fm <- lm(y ~ x)
  rm <- rlm(y ~ x)
  fm.r <- sqrt(sum((y-fitted(fm))^2)/(n-2))
  rm.r <- sqrt(sum((y-fitted(rm))^2)/(n-2))
  print(matrix(c(fm.r, summary(fm)$sigma, rm.r, summary(rm)$sigma),
               ncol = 2, byrow = T))

Output of this is:

           [,1]     [,2]
  [1,] 1.900033 1.900033
  [2,] 1.905847 1.595128

I.e. for the lm model the residual standard error from the summary.lm method matches sqrt(sum((y-fitted(fm))^2)/(n-2)) exactly, but that from the summary.rlm method is somewhat smaller than sqrt(sum((y-fitted(rm))^2)/(n-2)). I am curious what causes this difference?

My sessionInfo():

  R version 2.7.1 (2008-06-23)
  x86_64-pc-linux-gnu

  locale:
  LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

  other attached packages:
  [1] MASS_7.2-44

regards,
Kari Ruohonen
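A likely explanation, going by ?summary.rlm, is that the `sigma` component of an rlm summary is the robust scale estimate stored in the fit (component `s`), not the root mean square of the residuals. The sketch below compares the quantities on simulated data generated the way the post describes (the seed is an assumption, so the numbers will differ from those above). With the default settings the scale is MAD-based, so the last value should be close to, though not necessarily identical to, the reported one.

```r
library(MASS)

set.seed(1)
x <- 1:100
y <- 2 * x + rnorm(100, 0, 2)
n <- length(y)

rfit <- rlm(y ~ x)

# Classical residual standard error computed from the robust fit.
rms <- sqrt(sum(residuals(rfit)^2) / (n - 2))

# What summary() reports, and the scale estimate stored in the fit.
reported <- summary(rfit)$sigma
robust.scale <- rfit$s

# MAD-type scale of the final residuals (the constant follows the MASS default).
mad.scale <- median(abs(residuals(rfit))) / 0.6745

c(rms = rms, reported = reported, s = robust.scale, mad = mad.scale)
```

Because the robust scale down-weights large residuals, it need not agree with the classical value, which is consistent with the difference seen in the post.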
[R] error in display function of the ARM package
Hi,

I get the following error message when trying to use the display function in the ARM package:

  display(model)
  Error in .Internal(round(x, digits)) : no internal function round

Looks like some kind of mismatch between the ARM package and some others? Can I somehow get around it? I have learned to like the display function for printing model summaries.

Here is my sessionInfo():

  R version 2.6.0 (2007-10-03)
  i486-pc-linux-gnu

  locale:
  LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

  attached base packages:
  [1] stats graphics grDevices utils datasets methods base

  other attached packages:
  [1] foreign_0.8-23    car_1.2-5         arm_1.0-34        R2WinBUGS_2.1-6
  [5] coda_0.12-1       lme4_0.99875-9    Matrix_0.999375-3 lattice_0.17-2
  [9] MASS_7.2-37

  loaded via a namespace (and not attached):
  [1] grid_2.6.0

Thanks,
Kari
Re: [R] error in display function of the ARM package
Thank you very much for the answer. Re-installation (I did a full reinstall of R and the packages I use) cured the problem. I had somehow missed the advice to re-install packages when upgrading to 2.6.0 and had only used update.packages().

regards,
Kari

On Wed, 2007-10-31 at 09:30 +, Prof Brian Ripley wrote:

On Wed, 31 Oct 2007, Kari Ruohonen wrote:

Hi, I get the following error message when trying to use the display function in the ARM package:

You seem to mean 'arm' not 'ARM'.

  display(model)
  Error in .Internal(round(x, digits)) : no internal function round

Looks like some kind of mismatch between the ARM package and some others?

You may need to reinstall your packages under R 2.6.0, in particular Matrix: see

  https://stat.ethz.ch/pipermail/r-help/2007-October/142367.html

This is a symptom of not doing so. Without reproducible code we can't tell if there is anything else amiss.

Can I somehow get around it? I have learned to like the display function for printing model summaries. Here is my sessionInfo():

[snip]

Thanks,
Kari
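To see which installed packages were built under an older R than the one currently running, the "Built" field reported by `installed.packages()` can be compared with `getRversion()`. This is a rough sketch: it flags any package built under an earlier version, including earlier patch releases, which is stricter than usually necessary. The `update.packages()` call is left commented out because it downloads and reinstalls packages.

```r
# Installed packages and the R version each one was built under.
ip <- installed.packages()

# strict = FALSE turns unparsable entries into NA instead of raising an error.
built <- numeric_version(ip[, "Built"], strict = FALSE)

# Packages built under an R version older than the running one.
stale <- rownames(ip)[which(built < getRversion())]
stale

# Reinstall those packages (needs a repository and network access):
# update.packages(checkBuilt = TRUE, ask = FALSE)
```

With `checkBuilt = TRUE`, `update.packages()` treats a package built under an earlier version of R as out of date even when its own version number is current.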