[R] question about non-linear least squares in R
Hi, everyone,

My question is: you cannot always get a converged result from the nls function. Is there any way for me to get a reasonable result? For example:

x <- c(-0.06,-0.04,-0.025,-0.015,-0.005,0.005,0.015,0.025,0.04,0.06)
y <- c(1866760,1457870,1314960,1250560,1184850,1144920,1158850,1199910,1263850,1452520)
fitOup <- nls(y ~ constant + A*(x-MA)^4 + B*(x-MA)^2,
              start=list(constant=1000, A=1, B=-100, MA=0),
              control=nls.control(maxiter=100, minFactor=1/4096), trace=TRUE)

For this one, I cannot get a converged result; how can I reach it? Should I use another function or modify some settings for nls? Thank you very much!

Yours,
Warren

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] question about non-linear least squares in R
Below is one possibility: if you knew MA, you would have an ordinary linear least-squares problem in the parameters A, B and constant, which is easily solved. So you can define a function f(MA) which returns the value of that fit. Then you only need to minimize f, a function of one argument. It can have several local minima, so you must be careful, but I believe that minimizing even a badly behaved function of one argument should be easier than your original problem.

Regards,
Moshe.

P.S. If you do this, I would be interested to know whether it works.
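Moshe's profiling idea can be sketched in a few lines of base R. The search interval for MA below is an assumption chosen from the range of x, not something given in the thread:

```r
x <- c(-0.06,-0.04,-0.025,-0.015,-0.005,0.005,0.015,0.025,0.04,0.06)
y <- c(1866760,1457870,1314960,1250560,1184850,1144920,1158850,1199910,1263850,1452520)

## For a fixed MA the model is linear in constant, A and B, so lm() solves it
## exactly; f(MA) returns the residual sum of squares of that conditional fit.
f <- function(MA) {
  fit <- lm(y ~ I((x - MA)^4) + I((x - MA)^2))
  sum(resid(fit)^2)
}

## One-dimensional minimisation of the profiled objective.
opt <- optimize(f, interval = c(-0.05, 0.05))
opt$minimum    # the profiled estimate of MA
```

With the profiled MA in hand, the remaining linear coefficients drop out of one final lm() fit.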
Re: [R] question about non-linear least squares in R
Here is one way of getting a reasonable fit:

1. Scale your y's by dividing all values by 1e6.
2. Plot x vs. y. The plot looks like a quadratic function.
3. Fit a quadratic const. + B*x^2; this is a linear regression problem, so use lm.
4. Plot the predictions.
5. Eyeball the necessary shift: MA is around 0.01. Refit const. + B*(x-.01)^2. You should get const.=1.147 and B=139.144.
6. Use start=list(const.=1.147, A=0, B=139.144, MA=.01). nls should converge in 4 iterations.

In general, good starting points may be crucial to nls convergence. Scaling the y's to reasonable values also helps.

Hope this helps,
Andy

Andy Jaworski
518-1-01 Process Laboratory
3M Corporate Research Laboratory
E-mail: [EMAIL PROTECTED]
Tel: (651) 733-6092
Fax: (651) 736-3122
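Andy's recipe, written out as runnable code; this is a sketch in which the lm-derived coefficients stand in for the eyeballed starting values:

```r
x <- c(-0.06,-0.04,-0.025,-0.015,-0.005,0.005,0.015,0.025,0.04,0.06)
y <- c(1866760,1457870,1314960,1250560,1184850,1144920,1158850,1199910,1263850,1452520)

y2 <- y / 1e6                            # step 1: rescale y
quad <- lm(y2 ~ I((x - 0.01)^2))         # steps 3-5: quadratic with the eyeballed shift
st <- list(const = unname(coef(quad)[1]), A = 0,
           B = unname(coef(quad)[2]), MA = 0.01)

## step 6: hand the linear-fit coefficients to nls as starting values
fit <- nls(y2 ~ const + A*(x - MA)^4 + B*(x - MA)^2, start = st)
coef(fit)
```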
[R] Question about making array dataset inside a package
Hello, I have a question about how to include an array dataset inside a package. In our package, one R function is designed to take as input an array of 2 rows * 2 columns * k levels entered by the user, where k is a positive integer. I am trying to include a 3-D array of 2-by-2-by-8 as a dataset in the package, so users can load the data using data(dataname). We prefer to ship the data as a dataset rather than include the long syntax in the help file for users to copy and paste. I can generate the array in the R console using a long chain of syntax like: arraydat <- array(.omitted...) However, I cannot figure out a way to save the data in 3-D array format using write(), write.table(), or data.frame() etc. If I directly copy-paste the screen output to a text file, I cannot read it back into R using something like: arraydat <- read.table("array.txt", header=TRUE). This gives me a 2-row by (2*k)-column two-dimensional data set and loses all the level (or depth) structure needed by our R function. The header is messed up too. I would appreciate it if anyone could tell me how to include/save an array structure as a dataset in R, and also how to read it back in so as to preserve the header and array structure. Thank you for your time.
Re: [R] Question about making array dataset inside a package
Please read http://cran.r-project.org/doc/manuals/R-exts.html especially Section 1.1.3. Use save() to create an .RData file.

Max

-----Original Message-----
From: Wallace Hui
Sent: Wednesday, September 05, 2007 4:18 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Question about making array dataset inside a package
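Max's suggestion in code form; the array contents and file name here are made-up placeholders:

```r
## build a 2 x 2 x 8 array (placeholder values) and save it as an .RData file
arraydat <- array(seq_len(2 * 2 * 8), dim = c(2, 2, 8))
f <- file.path(tempdir(), "arraydat.RData")
save(arraydat, file = f)

## reload it: the full array structure, including dim, survives the round trip
rm(arraydat)
load(f)
dim(arraydat)

## In a package, the .RData file goes in the package's data/ directory,
## after which users can load it with data(arraydat).
```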
[R] question: randomization t-test function already defined in R?
Dear R Users, I am hoping you can help me. I have received code from a colleague who uses Matlab. I need to translate it into R. I am wondering if there is a randomization t-test (from non-parametric statistics) function already defined in R. (In Matlab the function is randtest.m.) ** QUESTION: Is anyone aware of a randomization t-test function in R? ** Thank you for your help, Karen --- Karen M. Green, Ph.D. Research Investigator Drug Design Group Sanofi Aventis Pharmaceuticals 1580 E. Hanley Blvd. Tucson, AZ 85737-9525 USA
Re: [R] question: randomization t-test function already defined in R?
On Wed, 5 Sep 2007, [EMAIL PROTECTED] wrote: I am wondering if there is a randomization t-test (from non-parametric statistics) function already defined in R. (In Matlab the function is randtest.m.)

I don't know what randtest in MATLAB does exactly, but I guess that you might want to look at the function independence_test() in package coin. The distribution argument should probably be set to use either exact or approximate p values.

hth,
Z
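For readers without the coin package, the idea behind a two-sample randomization t-test can be sketched in base R; this is a toy illustration with made-up data, not MATLAB's randtest.m or coin's implementation:

```r
set.seed(1)
a <- c(5.1, 4.9, 6.0, 5.5, 5.8)          # made-up group A
b <- c(4.2, 4.5, 4.0, 4.8, 4.3)          # made-up group B
obs <- mean(a) - mean(b)                 # observed mean difference

## reference distribution: reshuffle the group labels many times
pool <- c(a, b)
perm <- replicate(9999, {
  idx <- sample(length(pool), length(a))
  mean(pool[idx]) - mean(pool[-idx])
})

## two-sided permutation p-value (with the usual +1 adjustment)
p <- (sum(abs(perm) >= abs(obs)) + 1) / (length(perm) + 1)
p
```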
[R] Question on graphs and simple if statements
Hi all, I'm a relatively new user to the R interface, and am trying to do some basic operations. So far I've used the curve() command to plot graphs. This has a couple of limitations, namely with piecewise graphs. Basically, I'm looking for how to do if statements and while loops when making a graph. In simple code, something like:

if x is on the interval [-1,1], x=x^2+1
otherwise, x=x^2

something along those lines. I also am wondering if it's possible to graph something like this:

while (n < 20) {
  curve(1/n^2, -1, 1)
  n++
}

Along these same lines, is there a way to graph finite series? Thank you for your time and your replies.
Re: [R] Question on graphs and simple if statements
The answer to your first question is

curve(ifelse((x > -1) & (x < 1), x^2 + 1, x^2), from = -10, to = 10)

Sorry, I cannot understand the next part of your query.

Ross Darnell
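The while loop in the question can be written in valid R with curve(..., add = TRUE) so that successive curves land on one plot; a sketch, not from the thread:

```r
## draw the horizontal lines y = 1/n^2 for n = 1..19 on one plot;
## curve() needs an expression in x, hence the 0 * x + constant trick
n <- 1
curve(0 * x + 1/n^2, from = -1, to = 1, ylim = c(0, 1), ylab = "1/n^2")
while (n < 20) {
  n <- n + 1                       # R has no n++ operator
  curve(0 * x + 1/n^2, add = TRUE)
}

## a finite series: partial sums of sum(1/n^2), plotted as a step function
s <- cumsum(1 / (1:20)^2)
plot(1:20, s, type = "s", xlab = "n", ylab = "partial sum")
```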
[R] Question on shardsplot
Dear All, Would you please tell me how to display the sample numbers on the map? The commands below don't display the sample numbers (from 1 to 150).

library(som)
library(klaR)
iris.som3 <- som(iris[,1:4], xdim = 14, ydim = 6)
opar <- par(xpd = NA)
shardsplot(iris.som3, data.or = iris, label = TRUE)
legend(3.5, 14.3, col = rainbow(3), xjust = 0.5, yjust = 0,
       legend = levels(iris[, 5]), pch = 16, horiz = TRUE)
par(opar)

Ebi
Re: [R] Question on shardsplot
Ebi, keisyu, or whatever your name is: I know that this question has already been answered by the shardsplot author in a private thread, where it was posted under a different name. Why do you obscure your real name on the list?

The answer by Nils Raabe was that shardsplot is intended to draw the cluster map, and adding sample numbers is more difficult. His quick proposal, which needs some minor tweaking, is:

xycords <- cbind(kronecker(1:14, rep(1, 6)), rep(1:6, 14))
labs <- sapply(1:nrow(xycords), function(x)
  paste(as.character(which(
    (iris.som3$visual[,1] + 1 == xycords[x,1]) *
    (iris.som3$visual[,2] + 1 == xycords[x,2]) == 1)),
    collapse = ";"))
text(cbind(xycords[,2], xycords[,1]), labs, cex = 0.75)

Uwe Ligges
[R] Question about writing some code
Just a short question: Why does this work:

x <- c(1.6,1.8,2.4,2.7,2.9,3.3,3.4,3.4,4,5.2)
F26 <- boxplot(x, add=FALSE, horizontal=TRUE, main="Figur 2.6 Boxplot", axes=FALSE)

...but this doesn't...

x <- c(1.6,1.8,2.4,2.7,2.9,3.3,3.4,3.4,4,5.2)
F26 <- boxplot(x)
plot(F26, add=FALSE, horizontal=TRUE, main="Figur 2.6 Boxplot", axes=FALSE)

Thanks for any help, Squall44
Re: [R] Question about writing some code
squall44 wrote: Why does this work ... but this doesn't ...

boxplot returns some information on the boxplot, but that result cannot be plotted by boxplot itself. You can use bxp() to do so.

Uwe Ligges
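Uwe's bxp() suggestion spelled out; a sketch in which plot = FALSE computes the statistics without drawing:

```r
x <- c(1.6, 1.8, 2.4, 2.7, 2.9, 3.3, 3.4, 3.4, 4, 5.2)

## compute the boxplot statistics without drawing anything
F26 <- boxplot(x, plot = FALSE)

## bxp() draws a boxplot from such a stats object
bxp(F26, horizontal = TRUE, main = "Figur 2.6 Boxplot", axes = FALSE)
```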
Re: [R] Question on R server and TinnR
So if you go to the TinnR menu in R on the menu bar and choose 'R server: connections and tests', what does that do exactly? It seems to me that there is a way to have R listen for incoming requests from a remote machine running TinnR. If anyone could share your knowledge, I'd really appreciate it. Thank you again. - adschai

----- Original Message -----
From: Adaikalavan Ramasamy
Date: Monday, August 20, 2007 8:19 am
Subject: Re: [R] Question on R server and TinnR
To: [EMAIL PROTECTED]

I don't understand your question on how to make R act as a server. If you want to execute different pre-written R scripts, then have a look at help(BATCH) and help(commandArgs). Regards, Adai
Re: [R] Question about lme and AR(1)
On Sat, 18 Aug 2007, Francisco Redelico wrote: As far as I know, the EM algorithm can only be applied to estimate parameters from a regular exponential family.

No. The EM algorithm will converge to a stationary point (and, except in artificial cases, to a local maximum) for any likelihood. The special case you may be thinking of is that in some problems the E-step is equivalent to computing E[missing data | observed data] rather than the more general E[log-likelihood | observed data].

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle
[R] Question on R server and TinnR
Hi - Classic question that I tried to look up online without finding a clear answer. It seems that I can have R act as a server, but I have never known how this works. Would anyone please provide an example or introductory material where I can learn about this? I'm trying to build an environment where computations are distributed/delegated among different servers on request whenever I need. Thank you. - adschai
Re: [R] Question about unicode characters in tcltk
R Help wrote: Can someone help me figure out why the following code doesn't work? I'm trying to put both Greek letters and subscripts into a tcltk menu. The code creates all the mu's, and the 1 and 2 subscripts, but it won't create the 0.

Which OS was this? I can reproduce the issue on SuSE, but NOT Fedora 7.
Re: [R] Question about unicode characters in tcltk
On Sat, 2007-08-18 at 14:40 +0200, Peter Dalgaard wrote: Which OS was this? I can reproduce the issue on SuSE, but NOT Fedora 7.

I can reproduce this on Fedora 7, in that the \u2080 is reproduced as-is and not as a subscript, unlike the other \u characters, which appear as subscripted characters.

sessionInfo()
R version 2.5.1 Patched (2007-08-02 r42389)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] tcltk stats graphics grDevices utils datasets
[7] methods base

If there is something specific to my Fedora installation that differs from Peter's that I can ascertain from installed packages/fonts etc., then let me know and I can provide the output from my laptop.
G
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
[R] Question about lme and AR(1)
Dear R users, As far as I know, the EM algorithm can only be applied to estimate parameters from a regular exponential family. A multivariate normal distribution with an AR(1) matrix as covariance matrix does not belong to a regular exponential family; it belongs to a curved exponential family, so the EM algorithm cannot be applied to estimate parameters of this kind of distribution. I have used the lme function from the nlme package to estimate variance components with correlation=corAR1(); this function first uses the EM algorithm to refine the initial estimates of the random-effects variance-covariance coefficients and then feeds them into a Newton-Raphson algorithm. Does anyone know what kind of modification of the EM algorithm the lme function uses to solve the problem mentioned above? Thank you in advance for your help. Francisco
Re: [R] Question about sm.options sm.survival
On Thu, 16 Aug 2007, Rachel Jia wrote: But the xlim option does not work. Can anyone help me with this problem?

The help page suggests that you need to use xlim as an inline option (part of ...). Following the help page example,

sm.survival(x, y, status, h=2, xlim=c(0,4))

works. So I think you need to follow the help page exactly.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
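A runnable sketch of the fix, with simulated data standing in for the poster's min3.base.prog.cen; the sm.survival() call is guarded because the sm package may not be installed:

```r
set.seed(42)
x <- runif(50, 0, 10)            # simulated covariate
y <- rexp(50)                    # simulated survival times
status <- rbinom(50, 1, 0.7)     # simulated event indicator

if (requireNamespace("sm", quietly = TRUE)) {
  ## xlim is passed directly to sm.survival() (through ...), not via sm.options()
  sm::sm.survival(x, y, status, h = 2, xlim = c(0, 7))
}
```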
[R] Question about sm.options sm.survival
Hi, there: It's my first time posting a question in this forum, so thanks for your tolerance if my question is too naive. I am using a nonparametric smoothing procedure from the sm package to generate smoothed survival curves for a continuous covariate. I want to truncate the survival curve and only display the part with covariate values between 0 and 7. The following is the code I wrote:

sm.options(list(xlab="log_BSI_min3_to_base", xlim=c(0,7), ylab="Median Progression Prob"))
sm.survival(min3.base.prog.cen[,3], min3.base.prog.cen[,2], min3.base.prog.cen[,1],
            h=sd(min3.base.prog.cen[,3]), status.code=1)

But the xlim option does not work. Can anyone help me with this problem? Thanks a lot. Rachel
[R] Question on output of vectors from a for loop into a matrix
Hello R-help, I am a recent convert to R from SAS and I am having some difficulty with output of a for loop into a matrix. I have searched the help manuals and the archives, but I can't get my code to work. It is probably a syntax error that I am not spotting. I am trying to make a distance matrix by extracting the distances from each point to all others in a vector (the for loop). I can get them all to print and can save each individually to a file, but I cannot get them to be bound into a matrix. Here is the code that I have tried:

m <- length(Day.1.flower.coords$x) # 31 grid points
output.matix <- matrix(0.0, m, m)
for (i in 1:m) {
  dist <- spDistsN1(Day.1.coords.matrix, Day.1.coords.matrix[i,])
  dist <- as.vector(dist)
  output.matrix[i,] <- dist
  print(dist)
}

The error message that I get is: Error in output.matrix[i,] <- dist : incorrect number of subscripts on matrix

Thanks for your help.

Ryan
~~
Ryan D. Briscoe Runquist
Population Biology Graduate Group
University of California, Davis
[EMAIL PROTECTED]
Re: [R] Question on output of vectors from a for loop into a matrix
On Thursday 16 August 2007 15:35, Ryan Briscoe Runquist wrote: I am trying to make a distance matrix by extracting the distances from each point to all others in a vector (the for loop) ... but I cannot get them to be bound into a matrix.

Hi Ryan, for all UCD students you have the luxury of not using a loop! :) Would something like this work for the creation of a 'distance matrix'?

# example dataset: 2 x 2 grid
d <- expand.grid(x=1:2, y=1:2)

# an instructive plot
plot(d, type='n')
text(d, rownames(d))

# create distance object and convert to matrix:
m <- as.matrix(dist(d))
m
         1        2        3        4
1 0.000000 1.000000 1.000000 1.414214
2 1.000000 0.000000 1.414214 1.000000
3 1.000000 1.414214 0.000000 1.000000
4 1.414214 1.000000 1.000000 0.000000

# is that the kind of distance matrix you are looking for?
?dist

Distance Matrix Computation

Description: This function computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix. [...]
cheers, Dylan
--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341
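For completeness, the original loop also works once the object name is spelled consistently; a base-R sketch with made-up coordinates standing in for Day.1.coords.matrix, and plain Euclidean distance in place of sp's spDistsN1():

```r
set.seed(1)
coords <- cbind(x = runif(5), y = runif(5))   # stand-in for Day.1.coords.matrix
m <- nrow(coords)

output.matrix <- matrix(0, m, m)   # one consistent name, unlike the post's output.matix
for (i in 1:m) {
  ## Euclidean distances from point i to every point
  output.matrix[i, ] <- sqrt(colSums((t(coords) - coords[i, ])^2))
}

## agrees with the loop-free dist() version
all.equal(as.vector(output.matrix), as.vector(as.matrix(dist(coords))))
```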
[R] Question about unicode characters in tcltk
hello list, Can someone help me figure out why the following code doesn't work? I'm trying to put both Greek letters and subscripts into a tcltk menu. The code creates all the mu's, and the 1 and 2 subscripts, but it won't create the 0. Is there a certain set of characters whose unicode R won't recognize? Or am I entering the \u2080 incorrectly?

library(tcltk)
m <- tktoplevel()
frame1 <- tkframe(m)
frame2 <- tkframe(m)
frame3 <- tkframe(m)
entry1 <- tkentry(frame1, width=5, bg='white')
entry2 <- tkentry(frame2, width=5, bg='white')
entry3 <- tkentry(frame3, width=5, bg='white')
tkpack(tklabel(frame1, text='\u03bc\u2080'), side='left')
tkpack(tklabel(frame2, text='\u03bc\u2081'), side='left')
tkpack(tklabel(frame3, text='\u03bc\u2082'), side='left')
tkpack(frame1, entry1, side='top')
tkpack(frame2, entry2, side='top')
tkpack(frame3, entry3, side='top')

thanks -- Sam
Re: [R] Question about unicode characters in tcltk
R Help wrote:

hello list, Can someone help me figure out why the following code doesn't work? I'm trying to put both Greek letters and subscripts into a tcltk menu. The code creates all the mu's, and the 1 and 2 subscripts, but it won't create the 0. [...]

tkpack(tklabel(frame1, text='\u03bc\u2080'), side='left')
tkpack(tklabel(frame2, text='\u03bc\u2081'), side='left')
tkpack(tklabel(frame3, text='\u03bc\u2082'), side='left')

Odd, but I think not an R issue. I get weirdness in wish too. Try this:

% toplevel .a
.a
% label .a.b -text \u03bc\u2080 -font {Roman -10}
.a.b
% pack .a.b
% .a.b configure
{-activebackground [] {-text text Text {} μ₀} {-textvariable textVariable Variable {} {}} {-underline underline Underline -1 -1} {-width width Width 0 0} {-wraplength wrapLength WrapLength 0 0}
% .a.b configure -font {Helvetica -12 bold}
# the default; now shows \u2080
% .a.b configure -font {Roman -10}
# back to Roman, *still* shows \u2080 ???!!!

-- O__ Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark. Ph: (+45) 35327918, ([EMAIL PROTECTED]), FAX: (+45) 35327907
[R] question regarding is.factor()
Dear all, please help with what must be a straightforward question which I can't answer. I add a column to my dataframe as a factor of an existing column, e.g. df[,5] <- factor(df[,2]), and can test that it is one with is.factor(). But if I did not know in advance what types the columns were, is there a function to tell me what they are? i.e. instead of is.factor(), is.matrix(), is.list(), a function more like what.is()?
Re: [R] question regarding is.factor()
typeof() for 'types'. However, factor is not a type but a class, so class() is probably what you want.

On Mon, 13 Aug 2007, Jabez Wilson wrote:

Dear all, please help with what must be a straightforward question which I can't answer.

But 'An Introduction to R' could.

I add a column to my dataframe as a factor of an existing column, e.g. df[,5] <- factor(df[,2]), and can test that it is one with is.factor(). But if I did not know in advance what types the columns were, is there a function to tell me what they are? i.e. instead of is.factor(), is.matrix(), is.list(), a function more like what.is()?

-- Brian D. Ripley, [EMAIL PROTECTED], Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/, University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595, 1 South Parks Road, Oxford OX1 3TG, UK
Re: [R] question regarding is.factor()
See ?class

-- Henrique Dallazuanna, Curitiba-Paraná-Brasil, 25°25'40"S 49°16'22"W

On 13/08/07, Jabez Wilson [EMAIL PROTECTED] wrote:

Dear all, please help with what must be a straightforward question which I can't answer. I add a column to my dataframe as a factor of an existing column, e.g. df[,5] <- factor(df[,2]), and can test that it is one with is.factor(). But if I did not know in advance what types the columns were, is there a function to tell me what they are? i.e. instead of is.factor(), is.matrix(), is.list(), a function more like what.is()?
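A small sketch of the distinction the replies draw between class() and typeof() (hypothetical two-row data frame; note in particular that a factor's class is "factor" while its underlying type is "integer"):

```r
df <- data.frame(x = c(1.5, 2.5), g = c("a", "b"), stringsAsFactors = FALSE)
df$f <- factor(df$g)   # add a factor column, as in the question

# class() answers "what kind of object is this?", per column:
sapply(df, class)    # x: "numeric",  g: "character", f: "factor"

# typeof() answers "how is it stored internally?":
sapply(df, typeof)   # x: "double",   g: "character", f: "integer"
```

So a single sapply over the data frame replaces guessing with is.factor(), is.numeric(), and so on, one predicate at a time.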
[R] question on glmmML compared to NLMIXED
Hello! Can anyone help me? I am using the posterior.mode from the result of glmmML. It appears to be different from the BLUE estimate of the RANDOM statement in PROC NLMIXED in SAS. Why is that? Thank you, Ronen
Re: [R] QUESTION ON R!!!!!!!!!!!1
I think I asked you yesterday on the [EMAIL PROTECTED] account to read the posting guide *before* posting to R-help, and I asked you not to SHOUT!!!

Uwe Ligges

lecastil wrote:

Good day. I work at a public entity that handles millions of records on many variables, spread across several topics. For our statistical analyses we use a statistical package that lets us pull the data directly from an ORACLE database and run the analysis, all in a short time. We would like to know whether R can do the same thing as the package we currently work with; that is, can R import data from a database with millions of records at full speed and then run statistical analyses on the imported data? I would appreciate a prompt reply, as we urgently need this information.

Luis Eduardo Castillo Méndez.
[R] QUESTION ON R!!!!!!!!!!!1
Good day. I work at a public entity that handles millions of records on many variables, spread across several topics. For our statistical analyses we use a statistical package that lets us pull the data directly from an ORACLE database and run the analysis, all in a short time. We would like to know whether R can do the same thing as the package we currently work with; that is, can R import data from a database with millions of records at full speed and then run statistical analyses on the imported data? I would appreciate a prompt reply, as we urgently need this information.

Luis Eduardo Castillo Méndez.
[R] Question regarding QT device
Hi, After a few modifications to the makefiles, I successfully compiled the Qt device (written by Deepayan Sarkar) for OS X 10.4.9 on a Powerbook. However, when loading it into R:

If I remove this line from zzz.R in qtutils/R

grDevices::deviceIsInteractive(QT)

and then install, library(qtutils) loads fine and the QT() call returns a QT window. However, if I switch to another application and then switch back to the R GUI, the menubar has disappeared.

If I do not remove the line grDevices::deviceIsInteractive(QT), the following error appears and qtutils does not load:

Error : 'deviceIsInteractive' is not an exported object from 'namespace:grDevices'
Error : .onLoad failed in 'loadNamespace' for 'qtutils'
Error: package/namespace load failed for 'qtutils'

Could anyone provide some pointers to get that deviceIsInteractive to work? Thanks for your time, Saptarshi

Saptarshi Guha | [EMAIL PROTECTED] | http://www.stat.purdue.edu/~sguha
Re: [R] Question regarding QT device
grDevices::deviceIsInteractive is only in the unreleased R-devel version of R: which version are you using? Please do study the R posting guide: we ask for basic information for a good reason, and we ask for questions on packages (especially unreleased packages) to be sent to the maintainer.

On Sun, 5 Aug 2007, Saptarshi Guha wrote:

Hi, After a few modifications to the makefiles, I successfully compiled the Qt device (written by Deepayan Sarkar) for OS X 10.4.9 on a Powerbook. [...] If I do not remove the line grDevices::deviceIsInteractive(QT), the following error appears and qtutils does not load: Error : 'deviceIsInteractive' is not an exported object from 'namespace:grDevices' [...] Could anyone provide some pointers to get that deviceIsInteractive to work? Thanks for your time, Saptarshi

-- Brian D. Ripley, [EMAIL PROTECTED], Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/, University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595, 1 South Parks Road, Oxford OX1 3TG, UK
Re: [R] Question regarding QT device
On 8/5/07, Saptarshi Guha [EMAIL PROTECTED] wrote:

Hi, After a few modifications to the makefiles, I successfully compiled the Qt device (written by Deepayan Sarkar) for OS X 10.4.9 on a Powerbook.

Cool, can you send me the modifications? I haven't managed to compile qtutils on OS X yet (not that I've tried too hard).

However, when loading it into R: if I remove the line grDevices::deviceIsInteractive(QT) from zzz.R in qtutils/R and then install, library(qtutils) loads fine and the QT() call returns a QT window. However, if I switch to another application and then switch back to the R GUI, the menubar has disappeared.

I can't test this, but Qt is designed to run as the main application, so it's possible that it is overriding whatever the GUI is doing.

If I do not remove the line grDevices::deviceIsInteractive(QT), the following error appears and qtutils does not load:

Error : 'deviceIsInteractive' is not an exported object from 'namespace:grDevices'
Error : .onLoad failed in 'loadNamespace' for 'qtutils'
Error: package/namespace load failed for 'qtutils'

Could anyone provide some pointers to get that deviceIsInteractive to work?

What's your R version? Do you see this in 2.5.1?

-Deepayan
[R] question about logistic models (AIC)
Dear fellow R users, I have a question about the Akaike Information Criterion in the R output. Normally you would want it to be as small as possible, yet when I check the information in my study books, the AIC is usually displayed as a negative number; in the exercises it is given as -AIC. R displays it as a positive number. Does this mean that a large AIC gives a small -AIC, so the bigger the better? Kind regards, Tom.

Tom Willems, CODA-CERVA-VAR, Department of Virology, Epizootic Diseases Section, Groeselenberg 99, B-1180 Ukkel. Tel.: +32 2 3790522, Fax: +32 2 3790666, E-mail: [EMAIL PROTECTED], www.var.fgov.be
Re: [R] question about logistic models (AIC)
Tom: That's a good question. AIC, as I'm sure you know, is usually calculated as 2k - 2ln(L), where k is the number of free parameters in the model and L is the likelihood of the model. The goal of AIC, and the reason one normally tries to select a model with minimal AIC, is to minimize the Kullback-Leibler distance between your data's density and the true density as defined by the model. More concisely, the AIC is an index of the amount of information about your data that is lost when your model is used to describe it. To get back to your question, I can't say without a little more information why the AICs you're referring to are negative (perhaps it's an issue of using the log-likelihood instead of the negative log-likelihood), but my first instinct is that it doesn't matter: I would go with the AIC that is closest to zero. I hope that helps.

Kyle H. Ambert, Graduate Student, Dept. Behavioral Neuroscience, Oregon Health & Science University, [EMAIL PROTECTED]

On Aug 3, 2007, at 8:41 AM, Tom Willems wrote:

Dear fellow R users, I have a question about the Akaike Information Criterion in the R output. Normally you would want it to be as small as possible, yet when I check the information in my study books, the AIC is usually displayed as a negative number; in the exercises it is given as -AIC. R displays it as a positive number. Does this mean that a large AIC gives a small -AIC, so the bigger the better? Kind regards, Tom.
Re: [R] question about logistic models (AIC)
Tom Willems <Tom.Willems at var.fgov.be> writes:

I have a question about the Akaike Information Criterion in the R output. Normally you would want it to be as small as possible, yet when I check the information in my study books, the AIC is usually displayed as a negative number. [...]

I don't know about your books, but I can confirm that smaller AIC is better. AIC is usually positive (the likelihood is between 0 and 1, therefore the log-likelihood is negative, therefore -2*log(L) + 2k is positive), although continuous probability densities or neglected normalization coefficients can lead to negative AICs; but smaller (more negative, if AIC < 0) is still better.

Ben Bolker
Re: [R] question about logistic models (AIC)
On 03-Aug-07 16:42:33, Kyle wrote:

Tom: That's a good question. AIC, as I'm sure you know, is usually calculated as 2k - 2ln(L), where k is the number of free parameters in the model and L is the likelihood of the model. [...] my first instinct is that it doesn't matter: I would go with the AIC that is closest to zero. I hope that helps.

That could be wrong! Don't forget that ln(L) is indeterminate to within an additive constant (for given data), so one man's negative AIC could be another man's positive AIC; but if each compared their AICs across different models (the same set of models for each), then they should get the same *difference* of AIC. The correct approach is to compare AICs on the basis of the algebraic difference: AIC1 - AIC2 when comparing Model1 with Model2. If this is positive, then Model2 is better than Model1 (on the AIC criterion). Taking the AIC that is closest to zero would be the wrong way round for negative AICs.

Ted.

On Aug 3, 2007, at 8:41 AM, Tom Willems wrote:

Dear fellow R users, I have a question about the Akaike Information Criterion in the R output. [...]

E-Mail: (Ted Harding) [EMAIL PROTECTED], Fax-to-email: +44 (0)870 094 0861. Date: 03-Aug-07, Time: 18:31:51 -- XFMail
Re: [R] question about logistic models (AIC)
Ah. Very good point. I stand corrected.

---Kyle.

[EMAIL PROTECTED] 08/03/07 10:32 AM:

That could be wrong! Don't forget that ln(L) is indeterminate to within an additive constant (for given data), so one man's negative AIC could be another man's positive AIC; but if each compared their AICs across different models (the same set of models for each), then they should get the same *difference* of AIC. The correct approach is to compare AICs on the basis of the algebraic difference: AIC1 - AIC2 when comparing Model1 with Model2. If this is positive, then Model2 is better than Model1 (on the AIC criterion). Taking the AIC that is closest to zero would be the wrong way round for negative AICs. [...]
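Ted's point, that only AIC *differences* are meaningful, can be checked numerically. A hedged sketch (simulated data; the models m0/m1 are hypothetical) that computes AIC by hand from the log-likelihood and compares two nested logistic models:

```r
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + 1.2 * x))

m0 <- glm(y ~ 1, family = binomial)   # intercept-only model
m1 <- glm(y ~ x, family = binomial)   # includes the real predictor

# AIC by hand: -2 * log-likelihood + 2 * (number of free parameters)
ll <- logLik(m1)
aic_by_hand <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
all.equal(aic_by_hand, AIC(m1))   # TRUE

# Only the difference matters: a positive AIC(m0) - AIC(m1) favours m1,
# regardless of whether the individual AICs are positive or negative.
delta <- AIC(m0) - AIC(m1)
```

Adding any constant to both log-likelihoods shifts both AICs by the same amount and leaves delta unchanged, which is exactly why "closest to zero" is not a usable rule.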
Re: [R] question on using gl1ce from lasso2 package
G'day Li,

On Wed, 25 Jul 2007 00:50:43 -0400, Li Li [EMAIL PROTECTED] wrote:

I tried several settings by using family=gaussian in gl1ce, but none of them works. [...]

gl1ce(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data=iris, family=gaussian())
Error in eval(expr, envir, enclos) : object 'etastart' not found

Does anyone have experience with this function? Any help will be appreciated,

Actually,

gl1ce(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data=iris, family=gaussian)

should work; there is no need to say `family=gaussian()'. However, omitting the brackets leads to an even more obscure error message, and using traceback() makes me believe that the function `family()' must have been changed since this code was ported from S-PLUS to R. The version with brackets does not seem to work because the function that tries to determine initial values for the iterative process of fitting a GLM tries to access an object called `etastart', which exists in the list of formal parameters of glm() but is not a formal parameter of gl1ce(). I am not sure whether this problem always existed or is also new due to changes in glm() and accompanying functions. (Time to re-work the whole lasso2 package, I guess.) A way to solve your problem is to issue the following commands:

etastart <- NULL
gl1ce(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data=iris, family=gaussian())

However, since you are using the gaussian family, why not use l1ce() directly? The following command leads to the same output:

l1ce(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data=iris, absolute=TRUE)

Hope this helps.

Cheers, Berwin

=== Full address: Berwin A Turlach, Dept of Statistics and Applied Probability, Faculty of Science, National University of Singapore, 6 Science Drive 2, Blk S16, Level 7, Singapore 117546. Tel.: +65 6515 4416 (secr), +65 6515 6650 (self). FAX: +65 6872 3919. e-mail: [EMAIL PROTECTED], http://www.stat.nus.edu.sg/~statba
[R] question on using gl1ce from lasso2 package
Hi, I tried several settings using family=gaussian in gl1ce, but none of them works, even in a case where glm works. Here is the error message I got:

glm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data=iris, family=gaussian())   # works
gl1ce(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data=iris, family=gaussian())
Error in eval(expr, envir, enclos) : object 'etastart' not found

Does anyone have experience with this function? Any help will be appreciated,

Li
[R] Question about cmdscale function
Hi. I know that matrices of distances between places work fine with cmdscale. However, what about matrices such as:

   A  B  C  D  E
A  0  1 23 12  9
B  1  0 10 12  3
C 23 10  0 23  4
D 12 12 23  0 21
E  9  3  4 21  0

i.e. matrices which do not represent physical distances between places (the values above would not make sense as real distances) but other statistics instead?

Thanks, Andy
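cmdscale() should accept such a matrix as long as it is a symmetric dissimilarity matrix with a zero diagonal; the entries need not be physical distances, and negative eigenvalues in the output flag non-Euclidean input. A hedged sketch with hypothetical values in the spirit of the question:

```r
# Hypothetical symmetric dissimilarity matrix (not real distances:
# e.g. A-C = 23 exceeds A-E + E-C = 13, violating the triangle inequality)
m <- matrix(c( 0,  1, 23, 12,  9,
               1,  0, 10, 12,  3,
              23, 10,  0, 23,  4,
              12, 12, 23,  0, 21,
               9,  3,  4, 21,  0),
            nrow = 5, byrow = TRUE,
            dimnames = list(LETTERS[1:5], LETTERS[1:5]))

fit <- cmdscale(m, k = 2, eig = TRUE)
fit$points   # low-dimensional configuration of the five objects
fit$eig      # negative eigenvalues indicate non-Euclidean dissimilarities
```

If the dissimilarities are strongly non-Euclidean, non-metric alternatives such as isoMDS() in package MASS may be more appropriate.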
[R] Question about genalg in R
Hi, I am looking at genalg as a starting point for a GA implementation in R. I have a couple of questions regarding this package and GA implementation in general. If you could provide some hints on these questions, I would really appreciate it. Thank you in advance.

1. I found two methods for performing optimization based on the type of chromosome encoding, i.e. binary and float. I'm wondering, if I have a representation that mixes the two, is there any other method or function that I can use? If there is none in genalg, is there such an implementation in another package?

2. This is a question on multiobjective optimization based on GA. I'm wondering if there is any implementation of it in R or a known package that I can use?

Thank you.

- adschai
[R] question about ar1 time series
Hello everybody, I recently wrote a program to generate AR1 time series; here is the code:

# By Jomopo. June 2007, Leioa, Vizcaya
# This program creates the AR1 synthetic series (one by one),
# where the mean is zero, but the variance of the AR1 series and
# the AR1 coefficient can be changed. If the variance of the AR1 series = 1, it is standardized!
# Final version for the AR1 time series program
# Mon Jul 16 11:58:03 CEST 2007  Checked again at the R prompt, and it's OK!

# Creating the synthetic AR1 series, where the white noise
# has mean = 0 and var = sigmaz_c = stand_dev^2, which can be any value;
# if sigmaz_c = 1 then this white noise is a Gaussian noise.
# |rho1| < 1 (rho1 is called alpha in other textbooks; in fact 0 < rho1 < 1) so that
# the system can be stationary.
# var_serie is the variance of the series.

cat("\n Hello, this is creat_AR1_synt_ser.R. \n These are the input parameters: synt_series(ar1_length, rho1, ...), where rho1 is the correlation coef.\n")

ar1 <- function(x, rho1, af) {
  return(x * rho1 + runif(1, -af, af))
}

# Spin-up for the AR1 series. For this case, 100 iterations are enough.
spinup <- function(x0, rho1, af) {
  xt <- x0
  for (i in 1:100) {
    xtp1 <- ar1(xt, rho1, af)
    xt <- xtp1
  }
  return(xt)
}

# ar1_length is the number of data points in the AR1 series,
# rho1 is the correlation coefficient,
# var_serie is the variance of the series.
synt_series <- function(ar1_length, rho1, var_serie) {
  if (var_serie <= 0 || rho1 <= -1 || rho1 >= 1)
    stop("The variance of the series should be > 0, and the rho1 parameter should be in (-1, 1) so that the series can be stationary; be careful with this, bye.\n")
  syntdata <- rep(0, ar1_length)
  # af = adjustment factor, so that the variance of the uniform random numbers
  # equals the variance of the white noise (check the manual of runif)
  af <- sqrt(3 * var_serie * (1 - rho1) * (1 + rho1))
  x0 <- runif(1, -af, af)
  syntdata[1] <- spinup(x0, rho1, af)
  for (i in 2:ar1_length) {
    syntdata[i] <- ar1(syntdata[i - 1], rho1, af)
  }
  return(syntdata)
}

I would like some suggestions and hints. Thanks a lot for your help!
-- Josué Mosés Polanco Martínez. Alternative e-mail: [EMAIL PROTECTED]. "It is a wasted day unless you have learned something new and made someone smile" - Mark Weingartz.
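One quick sanity check on such a generator (a hedged, self-contained sketch, independent of the code above): for a long AR(1) series with coefficient rho and uniform innovations scaled so the stationary variance is 1 (the same role the af factor plays above, since var(Uniform(-af, af)) = af^2/3), the sample variance should be near 1 and the lag-1 autocorrelation near rho. arima.sim() offers a built-in comparison, although with Gaussian rather than uniform innovations:

```r
set.seed(123)
rho <- 0.6
n <- 50000

# uniform innovations with variance (1 - rho^2), so the stationary
# variance of the AR(1) process is (1 - rho^2) / (1 - rho^2) ... = 1
af <- sqrt(3 * (1 - rho^2))
z <- numeric(n)
z[1] <- runif(1, -af, af)
for (t in 2:n) z[t] <- rho * z[t - 1] + runif(1, -af, af)

var(z)                        # near 1
acf(z, plot = FALSE)$acf[2]   # lag-1 autocorrelation, near rho = 0.6

# reference implementation from stats, with Gaussian innovations
z2 <- arima.sim(model = list(ar = rho), n = n, sd = sqrt(1 - rho^2))
```

This kind of check would also catch a mis-typed comparison operator in the stationarity guard, since rho outside (-1, 1) makes the sample variance explode rather than settle near 1.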
Re: [R] Question about acceptance rejection sampling - NOT an R question
Not a strict proof, but think of it this way: the likelihood of getting a particular value of x has two parts. First, x has to be generated from h; the likelihood of this happening is h(x). Second, the point has to be accepted, which happens with conditional probability f(x)/(c*h(x)). If we multiply, we get h(x) * f(x)/(c*h(x)), and the two h(x)'s cancel out, leaving the likelihood of getting x as f(x)/c. The /c just indicates that approximately a fraction 1 - 1/c of the points will be rejected and thrown out, and the final normalized distribution is f(x), which was the goal. Hope this helps,

-- Gregory (Greg) L. Snow Ph.D., Statistical Data Center, Intermountain Healthcare, [EMAIL PROTECTED], (801) 408-8111

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Leeds, Mark (IED). Sent: Friday, July 13, 2007 2:45 PM. To: r-help@stat.math.ethz.ch. Subject: [R] Question about acceptance rejection sampling - NOT an R question

This is not related to R, but I was hoping that someone could help me. I am reading the paper "Understanding the Metropolis-Hastings Algorithm" by Chib and Greenberg, The American Statistician, 1995, Vol. 49, No. 4. Right at the beginning they explain the algorithm for basic acceptance-rejection sampling, in which you want to simulate from a density f(x) but it's not easy, and you are able to generate from another density h(x). The assumption is that there exists some c such that f(x) <= c*h(x) for all x. They clearly explain the steps as follows:

1) generate Z from h(x);
2) generate u from a Uniform(0,1);
3) if u is less than or equal to f(Z)/(c*h(Z)), then return Z as the generated RV; otherwise reject it and try again.

I think that, since f(Z)/(c*h(Z)) is between 0 and 1, u can be compared with it. But I don't understand why the generated and accepted Z's have the same density as f(x)? If someone knows where there is a proof of this, or if it's reasonable to explain, please feel free to explain it. The authors evidently consider it too trivial, because they don't.
The reason I ask is because, if I don't understand this, then I definitely won't understand the rest of the paper, because it gets much more complicated. I am willing to track down the proof but I don't know where to look. Thanks. This is not an offer (or solicitation of an offer) to buy/se...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
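Greg's two-part argument is easy to check numerically. Below is a minimal accept-reject sketch in R; the target f, the envelope h and the constant cee are illustrative choices (a standard normal target sampled through a N(0, 2^2) envelope), not anything from the original paper.

```r
# Accept-reject sampling sketch.
# Target f = dnorm, envelope h = dnorm(sd = 2).  The constant must satisfy
# f(x) <= cee * h(x) for all x; here max_x f(x)/h(x) is attained at 0 and
# equals dnorm(0)/dnorm(0, sd = 2) = 2, so cee = 2 works.
set.seed(1)
f   <- function(x) dnorm(x)
h   <- function(x) dnorm(x, sd = 2)
cee <- 2

rejsamp <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    Z <- rnorm(n, sd = 2)                        # step 1: generate Z from h
    u <- runif(n)                                # step 2: generate u ~ U(0,1)
    out <- c(out, Z[u <= f(Z) / (cee * h(Z))])   # step 3: keep accepted Z's
  }
  out[1:n]
}

x <- rejsamp(10000)
mean(x)   # close to 0
sd(x)     # close to 1
```

About half of the proposals are rejected here (acceptance probability 1/cee = 1/2), matching Greg's remark that roughly 1 - 1/c of the points get thrown out.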
[R] Question about Network library
Hello, I'm a bit new to R and am trying to plot a network using the network package. Based on the documentation, if I do the following steps, things work fine: test = c(0,1,1,1,0,0,0,1,0) m = matrix(test, 3) g = network(m) plot(g) However, if I try to read in the above nine numbers (space-delimited) from a file into a table using read.table(), things don't work: test = read.table("test.dat", header=FALSE) m = matrix(test, 3) g = network(m) # doesn't work - g is not created. Can someone please offer a suggestion about what to do? I'm guessing that I maybe shouldn't be using read.table() for this, or maybe I need to do something to convert the table into something compatible with matrix() and/or network()? Thank you, Joe
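A likely fix (a sketch, assuming test.dat simply holds the nine space-delimited numbers): read.table() returns a data frame, not a numeric vector, so matrix() does not produce the numeric matrix that network() expects. scan() reads the raw numbers directly:

```r
library(network)   # assumes the network package is installed

# scan() returns a plain numeric vector, unlike read.table(), which
# returns a data frame that matrix()/network() cannot use directly.
test <- scan("test.dat")
m <- matrix(test, 3)
g <- network(m)
plot(g)

# Alternatively, flatten the data frame that read.table() produced:
# m <- matrix(unlist(read.table("test.dat", header = FALSE)), 3)
```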
Re: [R] question about ar1 time series
On 2007-07-16 at 09:50, Josué Polanco wrote: Hello everybody, I recently wrote a program to generate AR1 time series; here is the code:

# By Jomopo. June 2007, Leioa, Vizcaya
# This program creates the AR1 synthetic series (one by one),
# where the mean is zero, but the variance of the AR1 series and
# the AR1 coefficient can be changed. If var of the AR1 series = 1, it is standardized!
# Final version for the AR1 time series program
# Mon Jul 16 11:58:03 CEST 2007. Checked again at the R prompt, and it's OK!
# Creating the synthetic AR1 series... where the white noise
# has mean = 0, and var = sigmaz_c = stand_dev^2 can be any value;
# if sigmaz_c = 1 then this white noise is a Gaussian noise.
# rho1 (or alpha in other textbooks ;-)) must satisfy |rho1| < 1 (in fact -1 < rho1 < 1)
# so that the system can be stationary.
# Where var_serie is the variance of the series

cat("\n Hello, this is creat_AR1_synt_ser.R. \n These are the input parameters: synt_series(ar1_length, rho1, ...), where rho1 is the correlat. coef.\n")

ar1 <- function(x, rho1, af) {
  return(x*rho1 + runif(1, -af, af))
}

# Spin-up for the AR1 series. For this case this amount is enough
spinup <- function(x0, rho1, af) {
  xt <- x0
  for (i in 1:100) {
    xtp1 <- ar1(xt, rho1, af)
    xt <- xtp1
  }
  return(xt)
}

# Wherein ar1_length is the number of data in the AR1 series,
# rho1 is the correlation coef., and sigmaz_c is optional
synt_series <- function(ar1_length, rho1, var_serie) {
  if (var_serie <= 0 || rho1 <= -1 || rho1 >= 1)
    stop("The variance of the series should be > 0, and the rho1 parameter should be between (-1, 1) so that the series can be stationary; be careful with this, bye. \n")
  syntdata <- rep(0, ar1_length)
  # af = adjustment factor, i.e. so that the var of the uniform random numbers
  # equals the var of the white noise (check the manual of runif)
  af <- sqrt(3 * var_serie * (1 - rho1) * (1 + rho1))
  x0 <- runif(1, -af, af)
  syntdata[1] <- spinup(x0, rho1, af)
  for (i in 2:ar1_length) {
    syntdata[i] <- ar1(syntdata[i - 1], rho1, af)
  }
  return(syntdata)
}

I would like some suggestions and hints.
Here's one: look at arima.sim() and ease your life a lot. ;-) Thanks a lot for your help! -- Josué Mosés Polanco Martínez Correo-e alternativo [EMAIL PROTECTED] "It is a wasted day unless you have learned something new and made someone smile" -Mark Weingartz. --- Vincent Goulet, Associate Professor École d'actuariat Université Laval, Québec [EMAIL PROTECTED] http://vgoulet.act.ulaval.ca
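For comparison, the arima.sim() route Vincent suggests might look like the sketch below (parameter names mirror the original script; note that arima.sim() uses Gaussian innovations by default, and the innovation s.d. is scaled so the stationary series variance equals var_serie, since for an AR(1) var = sigma^2/(1 - rho1^2)):

```r
set.seed(42)
rho1      <- 0.7
var_serie <- 1
n         <- 1000

# innovation s.d. giving a stationary AR(1) with variance var_serie
sd_innov <- sqrt(var_serie * (1 - rho1^2))

x <- arima.sim(model = list(ar = rho1), n = n, sd = sd_innov)

var(x)                        # close to var_serie
acf(x, plot = FALSE)$acf[2]   # close to rho1
```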
[R] Question about acceptance rejection sampling - NOT R question
This is not related to R but I was hoping that someone could help me. I am reading the Understanding the Metropolis-Hastings Algorithm paper from The American Statistician by Chib and Greenberg, 1995, Vol 49, No 4. Right at the beginning they explain the algorithm for basic acceptance-rejection sampling, in which you want to simulate from a density f(x) but it's not easy, and you are able to generate from another density called h(x). The assumption is that there exists some c such that f(x) <= c*h(x) for all x. They clearly explain the steps as follows: 1) generate Z from h(x); 2) generate u from a Uniform(0,1); 3) if u is less than or equal to f(Z)/(c*h(Z)), then return Z as the generated RV; otherwise reject it and try again. I think that, since f(Z)/(c*h(Z)) lies in (0,1), u can be compared with f(Z)/(c*h(Z)). But I don't understand why the generated and accepted Z's have the same density as f(x)? Does someone know where there is a proof of this, or if it's reasonable to explain, please feel free to explain it. The authors apparently consider it too trivial to include, because they don't. The reason I ask is because, if I don't understand this, then I definitely won't understand the rest of the paper, because it gets much more complicated. I am willing to track down the proof but I don't know where to look. Thanks. This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
Re: [R] Question about acceptance rejection sampling - NOT R question
On Fri, 13 Jul 2007, Leeds, Mark (IED) wrote: This is not related to R but I was hoping that someone could help me. I am reading the Understanding the Metropolis-Hastings Algorithm paper from The American Statistician by Chib and Greenberg, 1995, Vol 49, No 4. Right at the beginning they explain the algorithm for basic acceptance-rejection sampling, in which you want to simulate from a density f(x) but it's not easy, and you are able to generate from another density called h(x). The assumption is that there exists some c such that f(x) <= c*h(x) for all x. They clearly explain the steps as follows: 1) generate Z from h(x); 2) generate u from a Uniform(0,1); 3) if u is less than or equal to f(Z)/(c*h(Z)), then return Z as the generated RV; otherwise reject it and try again. I think that, since f(Z)/(c*h(Z)) lies in (0,1), u can be compared with f(Z)/(c*h(Z)). But I don't understand why the generated and accepted Z's have the same density as f(x)? Does someone know where there is a proof of this, or if it's reasonable to explain, please feel free to explain it.

The original reference to J. von Neumann's work, which no doubt has a proof, is in http://en.wikipedia.org/wiki/Rejection_sampling along with a suggestive graph. Here is another graph that may give your intuition a boost:

f <- function(x) dnorm(x)/2 + dnorm(x, sd = 0.1)/2
h <- dnorm
Z <- rnorm(100)
u <- runif(100)
cee <- f(0)/h(0)   # as is obvious
u.lt.f.over.ch <- u < f(Z)/cee/h(Z)
cutpts <- seq(-5, 5, by = 0.1)
midpts <- head(cutpts, n = -1)/2 + tail(cutpts, n = -1)/2
tab <- table(!u.lt.f.over.ch, cut(Z, cutpts))
bp <- barplot(tab)
lines(bp, f(midpts) * sum(u.lt.f.over.ch) * 0.1, col = 'red', lwd = 2)

Of course, there is some discreteness in this plot. If you have a 1280x1024 or wider screen or want to zoom in on a pdf(), you might try 0.025 or even 0.0125 in place of 0.1. Turn this graph on its side and think about what u.lt.f.over.ch is doing conditionally on Z, i.e. look at one bar and think about why it is split where it is.
HTH, Chuck

The authors apparently consider it too trivial to include, because they don't. The reason I ask is because, if I don't understand this, then I definitely won't understand the rest of the paper, because it gets much more complicated. I am willing to track down the proof but I don't know where to look. Thanks. This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
[R] question about gamm models
Dear R-Users, I have a question concerning mixed models in R. The strata of my model are the counties of Germany. The differences between these counties should be modelled as realizations of a normally distributed random variable X. Moreover, the model contains a 0/1 variable A that enters as a fixed effect. The only special feature that should additionally be in the model is the following: in a usual mixed model the (constant) variance of X will be chosen in an optimal way, but I want to fit 2 constant variances, one for the subset {A=0} and the other one for the subset {A=1}. Nevertheless, it should be one and the same random variable X. I know that it is possible to fit a model with two independent random variables X1 and X2 for the subsets {A=0} and {A=1} respectively. But I want it to be the same! Equivalently, the correlation between X1 and X2 should be 1. Can anybody help me in this respect? Yours sincerely Matthias an der Heiden
Re: [R] Question for svm function in e1071
thank you. This is definitely doable. - adschai

-----Original Message----- From: David Meyer Date: Friday, July 6, 2007 6:34 am Subject: Re: [R] Question for svm function in e1071 To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch

Adschai: The function is written in C++, so debugging the source code of the R svm() function will not really help. What you can do to make the C code more verbose is the following:
- get the sources of e1071
- in the src/ directory, look up the svm.cpp file
- in line 37, there is: #if 0 void info(char *fmt,...) [...] replace the first line by: #if 1
- build and install the package again.
Best David

Sorry that I have many questions today. I am using the svm function on about 180,000 points of training set. It takes a very long time to run. However, I would like it to spit out something to make sure that the run is not dead in between. Would you please suggest any way to do so?
Re: [R] Question for svm function in e1071
Adschai: The function is written in C++, so debugging the source code of the R svm() function will not really help. What you can do to make the C code more verbose is the following:
- get the sources of e1071
- in the src/ directory, look up the svm.cpp file
- in line 37, there is: #if 0 void info(char *fmt,...) [...] replace the first line by: #if 1
- build and install the package again.
Best David

Sorry that I have many questions today. I am using the svm function on about 180,000 points of training set. It takes a very long time to run. However, I would like it to spit out something to make sure that the run is not dead in between. Would you please suggest any way to do so?
Re: [R] Question for svm function in e1071
adschai: this isn't particularly helpful, but when I am using a function from a package called xxx that I have little knowledge about, I take the source as is and create my own function out of it, called my.xxx, and then put print statements inside it to see what's going on. This is probably an extremely kludgy way of dealing with that problem, but it is a way. I'm not R expert enough to know any other way, but I would be interested if you get a better private reply.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Thursday, July 05, 2007 12:36 AM To: r-help@stat.math.ethz.ch Subject: [R] Question for svm function in e1071

Hi, Sorry that I have many questions today. I am using the svm function on about 180,000 points of training set. It takes a very long time to run. However, I would like it to spit out something to make sure that the run is not dead in between. Would you please suggest any way to do so? And is there any way to speed up the performance of this svm function? Thank you. - adschai

This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
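Mark's wrap-and-print kludge, sketched below; my.svm and the messages are illustrative names, not part of e1071. For svm() itself this only brackets the whole call (per-iteration progress needs the svm.cpp edit described elsewhere in this thread), but it at least shows whether the run has returned.

```r
library(e1071)

# Wrap the packaged function instead of editing its source:
# print timestamps around the call so a long run is visibly alive/finished.
my.svm <- function(...) {
  cat("svm: started at", format(Sys.time()), "\n")
  fit <- svm(...)
  cat("svm: finished at", format(Sys.time()), "\n")
  fit
}

data(iris)
model <- my.svm(Species ~ ., data = iris)
```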
Re: [R] Question about framework to weighting different classes in SVM
Adschai: here is an example for class.weights (isn't it on the help page?):

library(e1071)
data(iris)
i2 <- iris
levels(i2$Species)[3] <- "versicolor"
summary(i2$Species)
wts <- 100 / table(i2$Species)
wts
m <- svm(Species ~ ., data = i2, class.weights = wts)

Cheers, David
[R] Question on Rmpi looping
Dear R list, In the course of learning to work with Rmpi, we are confused about a few points. The following simple program is based on some examples we retrieved from the web. Each slave is writing the same output line multiple times (a multiple equal to the number of slaves). In other words, the write statements are being executed a number of times equal to the number of slaves. I am trying to print out the slave number to a file (once), but it is printing it 4 times (since the number of slaves is 4). The code is as follows:

# Initialize the Rmpi environment
library(Rmpi)
# We are spawning four slaves
mpi.spawn.Rslaves(nslaves=4)
if (mpi.comm.size() != 5) {
    print("Please initialize an MPI cluster of at least 5 processors.")
    print("Then, try again")
    mpi.quit()
}
.Last <- function() {
    if (is.loaded("mpi_initialize")) {
        if (mpi.comm.size(1) > 0) {
            print("Please use mpi.close.Rslaves() to close slaves.")
            mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R")
        .Call("mpi_finalize")
    }
}
# Define the base directory
basedir <- "/home/user/directory/"
fcnfishtest <- function() {
    wrout <- paste(basedir, paste("processor", my_rank, sep=""), sep="")
    write(my_rank, wrout, append=TRUE)
}
## END OF SLAVES ##
# We're in the parent.
# Have each slave get its rank
mpi.bcast.cmd(my_rank <- mpi.comm.rank())
mpi.bcast.Robj2slave(basedir)
# Send the function to the slaves
mpi.bcast.Robj2slave(fcnfishtest)
# Call the function
x <- mpi.remote.exec(fcnfishtest())
x
# close slaves and exit
mpi.close.Rslaves()
mpi.quit(save = "no")
# end code

The output is as follows: file 1: 1 1 1 1; file 2: 2 2 2 2; file 3: 3 3 3 3; file 4: 4 4 4 4. We would like to call four slaves with output files like: file 1: 1; file 2: 2; file 3: 3; file 4: 4. Any help would be greatly appreciated. Thank you!
Kathy Gerber University of Virginia - ITC - Research Computing Support [EMAIL PROTECTED] 434-982-4986
[R] Question about framework to weighting different classes in SVM
Hi gurus, I have a doubt about multiclass classification SVM. The population in my data includes a couple of class labels that have a relatively small proportion of the entire population compared to other classes. I would like SVM to pay more attention to these classes. However, the question I am having here is: is there any systematic/theoretic framework to determine the weights for each class? My second question is directly related to R. I would like to use the class.weights attribute in the svm function. However, I'm a bit confused about how to use it from the description I got from ?svm. Below is the quote. 'a named vector of weights for the different classes, used for asymetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named.' Does the name of the vector have to match the levels in my factor used as target labels for my classification? Any simple example would be really appreciated. Thank you! - adschai
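On the second question: yes, the names of the weight vector correspond to the levels of the response factor. A minimal illustration using the iris data (the weight values here are arbitrary, for demonstration only):

```r
library(e1071)
data(iris)

# names(wts) must be (a subset of) levels(iris$Species);
# the heavier weight on versicolor makes errors on that class costlier.
wts <- c(setosa = 1, versicolor = 5, virginica = 1)
fit <- svm(Species ~ ., data = iris, class.weights = wts)
```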
[R] Question for svm function in e1071
Hi, Sorry that I have many questions today. I am using the svm function on about 180,000 points of training set. It takes a very long time to run. However, I would like it to spit out something to make sure that the run is not dead in between. Would you please suggest any way to do so? And is there any way to speed up the performance of this svm function? Thank you. - adschai
[R] Statistics Question not R question: competing risks and non-informative censoring
All, I am working with Emergency Department (ED) Length of Stay data. The ED visit can end in one of a variety of ways (admit, discharge, transfer, etc.). Initially, I have modeled the time to event by fitting a survival model to the time to the outcome of interest and treating all other outcomes as censoring. However, I recently came across the cmprsk package in R, which seems to be developed specifically for competing risks survival models. I have been able to gather that the key issue in deciding whether to model each outcome separately or to use something like the cmprsk package is to determine whether or not the censoring is non-informative. I have read up a little about informative vs. non-informative censoring, but remain confused. Is there a standard approach to determining whether or not censoring is informative? Can anyone suggest some good (approachable/non-technical) references on the subject? best, Spencer
[R] Question about PCA with prcomp
Hello All, The basic premise of what I want to do is the following: I have 20 entities for which I have ~500 measurements each. So, I have a matrix of 20 rows by ~500 columns. The 20 entities fall into two classes: good and bad. I eventually would like to derive a model that would then be able to classify new entities as being in good territory or bad territory based upon my existing data set. I know that not all ~500 measurements are meaningful, so I thought the best place to begin would be to do a PCA in order to reduce the amount of data with which I have to work. I did this using the prcomp function and found that nearly 90% of the variance in the data is explained by PC1 and 2. So far, so good. I would now like to find out which of the original ~500 measurements contribute to PC1 and 2 and by how much. Any tips would be greatly appreciated! And apologies in advance if this turns out to be an idiotic question. james
Re: [R] Question about PCA with prcomp
Hi James, Have a look at Cadima et al.'s subselect package [Cadima worked with/was a student of Prof Jolliffe, one of _the_ experts on PCA; Jolliffe devotes part of a chapter to this question in his text (Principal Component Analysis, pub. Springer)]. Then you should look at psychometric stuff: a good place to start would be Professor Revelle's psych package. Best, Mark. James R. Graham wrote: Hello All, The basic premise of what I want to do is the following: I have 20 entities for which I have ~500 measurements each. So, I have a matrix of 20 rows by ~500 columns. The 20 entities fall into two classes: good and bad. I eventually would like to derive a model that would then be able to classify new entities as being in good territory or bad territory based upon my existing data set. I know that not all ~500 measurements are meaningful, so I thought the best place to begin would be to do a PCA in order to reduce the amount of data with which I have to work. I did this using the prcomp function and found that nearly 90% of the variance in the data is explained by PC1 and 2. So far, so good. I would now like to find out which of the original ~500 measurements contribute to PC1 and 2 and by how much. Any tips would be greatly appreciated! And apologies in advance if this turns out to be an idiotic question. james
Re: [R] Question about PCA with prcomp
Mark, What you are referring to deals with the selection of covariates, since PC doesn't do dimensionality reduction in the sense of covariate selection. But what Mark is asking for is to identify how much each data point contributes to individual PCs. I don't think that Mark's query makes much sense, unless he meant to ask: which individuals have high/low scores on PC1/PC2. Here are some comments that may be tangentially related to Mark's question: 1. If one is worried about a few data points contributing heavily to the estimation of PCs, then one can use robust PCA, for example, using robust covariance matrices. MASS has some tools for this. 2. The biplot for the first 2 PCs can give some insights 3. PCs, especially, the last few PCs, can be used to identify outliers. Hope this is helpful, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Difford Sent: Monday, July 02, 2007 1:55 PM To: r-help@stat.math.ethz.ch Subject: Re: [R] Question about PCA with prcomp Hi James, Have a look at Cadima et al.'s subselect package [Cadima worked with/was a student of Prof Jolliffe, one of _the_ experts on PCA; Jolliffe devotes part of a Chapter to this question in his text (Principal Component Analysis, pub. Springer)]. Then you should look at psychometric stuff: a good place to start would be Professor Revelle's psych package. BestR, Mark. James R. Graham wrote: Hello All, The basic premise of what I want to do is the following: I have 20 entities for which I have ~500 measurements each. So, I have a matrix of 20 rows by ~500 columns. The 20 entities fall into two classes: good and bad. 
I eventually would like to derive a model that would then be able to classify new entities as being in good territory or bad territory based upon my existing data set. I know that not all ~500 measurements are meaningful, so I thought the best place to begin would be to do a PCA in order to reduce the amount of data with which I have to work. I did this using the prcomp function and found that nearly 90% of the variance in the data is explained by PC1 and 2. So far, so good. I would now like to find out which of the original ~500 measurements contribute to PC1 and 2 and by how much. Any tips would be greatly appreciated! And apologies in advance if this turns out to be an idiotic question. james
Re: [R] Question about PCA with prcomp
On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote: | Mark, | | What you are referring to deals with the selection of covariates, since PC | doesn't do dimensionality reduction in the sense of covariate selection. | But what Mark is asking for is to identify how much each data point | contributes to individual PCs. I don't think that Mark's query makes much | sense, unless he meant to ask: which individuals have high/low scores on | PC1/PC2. Here are some comments that may be tangentially related to Mark's | question: | | 1. If one is worried about a few data points contributing heavily to the | estimation of PCs, then one can use robust PCA, for example, using robust | covariance matrices. MASS has some tools for this. | 2. The biplot for the first 2 PCs can give some insights | 3. PCs, especially, the last few PCs, can be used to identify outliers. What is meant by last few PCs? -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_Middle minds discuss events (:_~*~_:)Small minds discuss people (_)-(_) . Anon ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
Re: [R] Question about PCA with prcomp
The PCs that are associated with the smaller eigenvalues. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Patrick Connolly [mailto:[EMAIL PROTECTED] Sent: Monday, July 02, 2007 4:23 PM To: Ravi Varadhan Cc: 'Mark Difford'; r-help@stat.math.ethz.ch Subject: Re: [R] Question about PCA with prcomp On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote: | Mark, | | What you are referring to deals with the selection of covariates, since PC | doesn't do dimensionality reduction in the sense of covariate selection. | But what Mark is asking for is to identify how much each data point | contributes to individual PCs. I don't think that Mark's query makes much | sense, unless he meant to ask: which individuals have high/low scores on | PC1/PC2. Here are some comments that may be tangentially related to Mark's | question: | | 1. If one is worried about a few data points contributing heavily to the | estimation of PCs, then one can use robust PCA, for example, using robust | covariance matrices. MASS has some tools for this. | 2. The biplot for the first 2 PCs can give some insights | 3. PCs, especially, the last few PCs, can be used to identify outliers. What is meant by last few PCs? -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_Middle minds discuss events (:_~*~_:)Small minds discuss people (_)-(_) . Anon ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
Re: [R] Question about PCA with prcomp
Hi James, Ravi: James wrote: ... I have 20 entities for which I have ~500 measurements each. So, I have a matrix of 20 rows by ~500 columns. ... Perhaps I misread James' question, but I don't think so. As James described it, we have ~500 measurements made on 20 objects. A PCA on this [20 rows/observations by ~500 columns/descriptors/variables] should return ~500 eigenvalues. And each of these columns/descriptors/variables will have a loading on each PC. James wants to reduce his descriptors/measurements/variables to the most important (variance). A primitive way of doing this would be to examine the loadings on the first 2--3 PCs and choose those columns/descriptors/variables with the highest loadings, and throw away the rest. [He has already decided that he can throw away all but the first two PCs.] In fact, it would be a very good idea to do a coinertia analysis on the pre- and post-selected sets, and look at the RV value. If this is above [thumbsuck] 0.9, then you're doing very well (there's a good plot method for this in ade4, cf. coinertia). But see Cadima et al. (+refs for caution; and elsewhere) for more sophisticated methods of subsetting. Regards, Mark. Ravi Varadhan wrote: Mark, What you are referring to deals with the selection of covariates, since PC doesn't do dimensionality reduction in the sense of covariate selection. But what Mark is asking for is to identify how much each data point contributes to individual PCs. I don't think that Mark's query makes much sense, unless he meant to ask: which individuals have high/low scores on PC1/PC2. Here are some comments that may be tangentially related to Mark's question: 1. If one is worried about a few data points contributing heavily to the estimation of PCs, then one can use robust PCA, for example, using robust covariance matrices. MASS has some tools for this. 2. The biplot for the first 2 PCs can give some insights 3. PCs, especially, the last few PCs, can be used to identify outliers.
Hope this is helpful, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Difford Sent: Monday, July 02, 2007 1:55 PM To: r-help@stat.math.ethz.ch Subject: Re: [R] Question about PCA with prcomp Hi James, Have a look at Cadima et al.'s subselect package [Cadima worked with/was a student of Prof Jolliffe, one of _the_ experts on PCA; Jolliffe devotes part of a Chapter to this question in his text (Principal Component Analysis, pub. Springer)]. Then you should look at psychometric stuff: a good place to start would be Professor Revelle's psych package. BestR, Mark. James R. Graham wrote: Hello All, The basic premise of what I want to do is the following: I have 20 entities for which I have ~500 measurements each. So, I have a matrix of 20 rows by ~500 columns. The 20 entities fall into two classes: good and bad. I eventually would like to derive a model that would then be able to classify new entities as being in good territory or bad territory based upon my existing data set. I know that not all ~500 measurements are meaningful, so I thought the best place to begin would be to do a PCA in order to reduce the amount of data with which I have to work. I did this using the prcomp function and found that nearly 90% of the variance in the data is explained by PC1 and 2. So far, so good. I would now like to find out which of the original ~500 measurements contribute to PC1 and 2 and by how much. Any tips would be greatly appreciated! And apologies in advance if this turns out to be an idiotic question. 
james

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
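The loading-inspection step Mark describes can be sketched with prcomp, whose rotation component holds the loadings of the original variables on each PC. The data below are simulated (a 20 x 50 matrix as a stand-in for James's 20 x ~500 data); variable counts and names are illustrative only:

```r
set.seed(1)
m <- matrix(rnorm(20 * 50), nrow = 20)      # 20 entities, 50 measurements
pc <- prcomp(m, scale. = TRUE)
# pc$rotation holds the loading of each original variable on each PC
head(pc$rotation[, 1:2])
# indices of the variables with the largest absolute loadings on PC1
head(order(abs(pc$rotation[, 1]), decreasing = TRUE))
```

Selecting the top-loading columns this way is exactly the "primitive" subsetting described above; the subselect package offers more principled alternatives.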
Re: [R] Question about PCA with prcomp
...but with 500 variables and only 20 'entities' (observations) you will have 481 PCs with dead zero eigenvalues. How small is 'smaller' and how many is a few? Everyone who has responded to this seems to accept the idea that PCA is the way to go here, but that is not clear to me at all. There is a 2-sample structure in the 20 observations that you have. If you simply ignore that in doing your PCA you are making strong assumptions about sampling that would seem to me unlikely to be met. If you allow for the structure and project orthogonal to it then you are probably throwing the baby out with the bathwater - you want to choose variables which maximise separation between the 2 samples (and now you are up to 482 zero principal variances, if that matters...). I think this problem probably needs a bit of a re-think. Some variant on singular LDA, for example, may be a more useful way to think about it. Bill Venables. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan Sent: Monday, 2 July 2007 1:29 PM To: 'Patrick Connolly' Cc: r-help@stat.math.ethz.ch; 'Mark Difford' Subject: Re: [R] Question about PCA with prcomp The PCs that are associated with the smaller eigenvalues. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: Patrick Connolly [mailto:[EMAIL PROTECTED] Sent: Monday, July 02, 2007 4:23 PM To: Ravi Varadhan Cc: 'Mark Difford'; r-help@stat.math.ethz.ch Subject: Re: [R] Question about PCA with prcomp On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote: | Mark, | | What you are referring to deals with the selection of covariates, | since PC | doesn't do dimensionality reduction in the sense of covariate selection. 
| But what Mark is asking for is to identify how much each data point | contributes to individual PCs. I don't think that Mark's query makes much | sense, unless he meant to ask: which individuals have high/low scores | on PC1/PC2. Here are some comments that may be tangentially related | to Mark's | question: | | 1. If one is worried about a few data points contributing heavily to | the estimation of PCs, then one can use robust PCA, for example, | using robust covariance matrices. MASS has some tools for this. | 2. The biplot for the first 2 PCs can give some insights 3. PCs, | especially, the last few PCs, can be used to identify outliers. What is meant by last few PCs? -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_Middle minds discuss events (:_~*~_:)Small minds discuss people (_)-(_) . Anon ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
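Bill's point about dead-zero eigenvalues follows from rank: with n observations on p > n variables, a (centered) PCA can yield at most n - 1 non-zero principal variances. A quick simulated check (dimensions chosen to match the thread's 20 x 500 example):

```r
set.seed(1)
m <- matrix(rnorm(20 * 500), nrow = 20)   # 20 observations, 500 variables
pc <- prcomp(m)
# centering the 20 rows leaves rank 19, so 19 non-zero principal variances
sum(pc$sdev > 1e-8)
```

That is where the "481 PCs with dead zero eigenvalues" figure comes from: 500 variables minus the 19 estimable directions.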
Re: [R] Question about PCA with prcomp
To all ..., Bill's lateral wisdom is almost certainly a better solution. So thanks for the advice (and everything else that went before it [Bill: apropos of termplot, what happened to tplot ?]). And I will [almost] desist from asking the obvious: and if there were 10 000 observations ? BestR, Mark.

Bill.Venables wrote: ...but with 500 variables and only 20 'entities' (observations) you will have 481 PCs with dead zero eigenvalues. ...
[R] Evaluating predictive power with no intercept-statistics question - not R question
I realize that the following has been talked about on this list many times before in some related way, but I am going to ask for help anyway because I still don't know what to do. Suppose I have no-intercept models such as the following:

Y = B*X_1 + error
Y = B*X_2 + error
Y = B*X_3 + error
Y = B*X_4 + error

and I run regressions on each (over the same sample of Y), and now I want to evaluate which X has the greatest predictive power. I'm fairly certain that R squared is not applicable because of the lack of an intercept, but I was wondering what is? Any references to this particular problem or suggestions are appreciated. I honestly believe that including an intercept is incorrect for my particular problem. Maybe I could put all the X's in one regression and use some kind of top-down selection or stepAIC algorithm, for example? Thanks. This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
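One way to compare such fits is via a criterion that does not depend on the intercept baseline, such as the residual standard error or AIC (note that summary.lm reports a differently-defined R-squared when the intercept is suppressed). A sketch on simulated data, not a full answer to the statistics question:

```r
set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100)
y <- 2 * x1 + rnorm(100)
fit1 <- lm(y ~ 0 + x1)    # "~ 0 + x" suppresses the intercept
fit2 <- lm(y ~ 0 + x2)
# residual standard error and AIC are comparable across the two fits
c(sigma1 = summary(fit1)$sigma, sigma2 = summary(fit2)$sigma)
AIC(fit1, fit2)
```

Here the model using x1 should show the smaller residual standard error and lower AIC, since y was generated from x1.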
[R] question
Hi, I have two trees and I want to compute the error of each tree and then compare them with t.test. I can't work out how to code this problem. Could you help me? Regards
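One possible reading of the question, sketched below: if both trees are evaluated on the same observations, collect a per-observation error from each model and run a paired t-test. The error vectors here are simulated placeholders, not output from any real tree fit:

```r
set.seed(42)
# hypothetical per-observation prediction errors from two tree models,
# both evaluated on the same 30 test observations
err_tree1 <- abs(rnorm(30, mean = 1.0))
err_tree2 <- abs(rnorm(30, mean = 1.2))
# paired = TRUE because the two error vectors come from the same cases
t.test(err_tree1, err_tree2, paired = TRUE)
```

With real trees (e.g. from rpart), the error vectors would come from comparing predict() output against the observed responses.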
[R] Question on package building
Hi dear R-Help members: When I was building my R package, using

$ R CMD build --binary pkg

it gave the following error:

** libs Error: package 'methods' was built for x86_64-unknown-linux-gnu Execution halted

The information about $R_HOME/library/methods/libs/methods.so is:

$ file methods.so
methods.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), not stripped

My machine is:

$ uname -a
Linux whinev 2.4.21-37.EL #1 SMP Wed Sep 7 13:32:18 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux

and I used the options below to build the R source:

../configure --prefix=/home/whinev/R --enable-R-shlib CC="gcc -m32" CXXFLAGS="-m32 -O2 -g" FFLAGS="-m32 -O2 -g" FCFLAGS="-m32 -O2 -g" LDFLAGS="-L/usr/local/lib" LIBnn=lib --with-x=no

Would you please tell me what the matter is? Thank you all! Whinev
[R] Question about lmer
Hello, We have a problem with function lmer. This is our code:

Get_values <- function(ff_count, fixed_factors, rf_count, random_factors, y_values) {
  SA <- matrix(as.array(c(fixed_factors, random_factors)), ncol=3)
  data <- as.data.frame(SA)
  y <- as.array(y_values)
  dd <- data.frame(SA)
  for(i in 1:(ff_count+rf_count)){
    dd[,i] <- as.factor(data[,i])
  }
  fit_full <- lmer(y ~ dd[,1]+dd[,2]+(1|dd[,3]), method="ML")
  fit_full
}
A <- c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1)
B <- c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)
C <- c(0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1)
Y <- c(5,3,4,1,1,2,6,1,5,3,7,1,2,3,1,1,5,3,4,1,1,2,6,1,5,3,7,1,2,3,1,1)
r <- Get_values(2, c(A,B), 1, c(C), Y)
r

R output: Error in inherits(x, "factor") : object "dd" not found

Can this function work with a random array? Because this code is working:

D <- as.factor(data[,3])
fit_full <- lmer(y ~ dd[,1]+dd[,2]+(1|D), method="ML")

-- Truly yours, Julia mailto:[EMAIL PROTECTED]
Re: [R] Question about lmer
On 6/18/07, Julia Proudnikova [EMAIL PROTECTED] wrote: Hello, We have a problem with function lmer. ... Can this function work with a random array? ...

The full explanation of why lmer fails to find dd has to do with the way names are resolved in a call to model.frame. However, there may be a way to solve your problem by redesigning your function so you don't need to worry about what model.frame does. Why not pass the data as a data frame and pass the names of the fixed factors, random factors and response variable as character strings? Your current design of creating a matrix, then converting it to a data frame, then converting numeric variables back to factors is a bit convoluted. If you knew that you were only going to have one random factor you could generate the formula as

substitute(y ~ ff + (1|rf),
           list(y = as.name(y_name),
                ff = parse(text = paste(ff_names, collapse = "+"))[[1]],
                rf = as.name(rf_name)))

It gets a bit trickier with multiple random factors. Having said all this, it does appear that the call to model.frame inside lmer is getting the wrong environment from the formula and I will correct that. If you need more detail about the redesign I am suggesting, feel free to contact me off-list.
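Building the formula from character strings, as suggested above, can also be done with base R's paste()/as.formula(); a minimal sketch (the variable names y_name, ff_names, rf_name are illustrative, matching the redesign idea rather than any lme4 API):

```r
y_name   <- "Y"
ff_names <- c("A", "B")
rf_name  <- "C"
# right-hand side: fixed factors plus one random-intercept term
rhs  <- paste(c(ff_names, sprintf("(1 | %s)", rf_name)), collapse = " + ")
form <- as.formula(paste(y_name, "~", rhs))
form
# Y ~ A + B + (1 | C)
```

The resulting formula object can then be passed to lmer along with the data frame, so that model.frame can resolve the variable names in the data rather than inside the function body.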
Re: [R] [Not R question]: Better fit for order probit model
Thank you so much Robert. Please find the information below. The scale 1-10 are subjective physical condition ratings scored by inspection engineers at the site. 1-5 are in very bad condition (bridge closed down to seriously deteriorated). The rest, from 6-10, are categorized as good condition. However, the proportions of samples in my data are as follows. Bridges with a 1-5 scale cover about 2-3% of the total population. The majority of the bridges are in 7-8. Certainly, I have enough data for model estimation, but the mix of the proportions is not good. I attached the detail of the condition rating scale at the end of this message. As a result, my ordered probit model yields cutting points that cannot capture levels 1-5 well. Even in my in-sample population, the model cannot capture levels 2-5 at all. In other words, with the estimated cutting points for levels 1-5, I have zero bridges that belong to levels 2-5. Unfortunately, my objective is especially to analyze statistically what kind of design attributes lead to such a bad condition. So I would like my model to be able to capture bad-condition bridges as well as it can.

9 EXCELLENT CONDITION
8 VERY GOOD CONDITION - no problems noted.
7 GOOD CONDITION - some minor problems.
6 SATISFACTORY CONDITION - structural elements show some minor deterioration.
5 FAIR CONDITION - all primary structural elements are sound but may have minor section loss, cracking, spalling or scour.
4 POOR CONDITION - advanced section loss, deterioration, spalling or scour.
3 SERIOUS CONDITION - loss of section, deterioration of primary structural elements. Fatigue cracks in steel or shear cracks in concrete may be present.
2 CRITICAL CONDITION - advanced deterioration of primary structural elements. Fatigue cracks in steel or shear cracks in concrete may be present or scour may have removed substructure support. Unless closely monitored it may be necessary to close the bridge until corrective action is taken.
1 IMMINENT FAILURE CONDITION - major deterioration or section loss present in critical structural components or obvious vertical or horizontal movement affecting structure stability. Bridge is closed to traffic but corrective action may put it back in light service.
0 FAILED CONDITION - out of service; beyond corrective action.

- Original Message - From: Robert A LaBudde Date: Saturday, June 16, 2007 9:59 am Subject: Re: [R] [Not R question]: Better fit for order probit model To: R-help@stat.math.ethz.ch

At 03:17 AM 6/16/2007, adschai wrote: Thank you so much Robert. I haven't thought about the idea of clumping categories together. One of the reasons is because these categories are bridge condition rating scores. They indeed represent different meaning and serviceability conditions. They vary from 0-9. I have about 300,000 data points, in which the first 5 labels, i.e. 0-4, are bad-condition bridges and there are only less than 1000 instances in total. The worst case is, for example, score 0 (meaning the bridge is not operable): I have 60 instances. Score 1 I have about 100. I would appreciate if you could provide some opinion as to how you would make the ordered probit fit better in this case? Thank you so much in advance.

You certainly appear to have enough data to populate these categories. Your problems in getting a good fit may relate to other problems. You need to supply more information in order to say more. What are the definitions of each category? Is the ordering consistent, or are there really two different scales, one for bridges with essentially no problems, and another for those with serious damage? What evidence do you have that your fit is poor? What model are you fitting? Etc. Robert A. LaBudde, PhD, PAS, Dpl.
ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 Vere scire est per causas scire
Re: [R] [Not R question]: Better fit for order probit model
At 01:29 PM 6/17/2007, adschai wrote: Thank you so much Robert. Please find the information below. ... snip

My guess is that you really have two distributions here: 1) bridges that are basically under proper supervision and repair (Categories 6-10), and 2) those that are not (Categories 1-5). These two classes would probably have dramatically different relations to the covariates you are using. So my recommendation would be to consider splitting your response into two classes (0/1), each with 5 subcategories. Keeping the two classes merged together in a single group is equivalent to merging two different distributions with a weighting factor. This may be causing a bimodal distribution which is giving you problems. You could try your modeling on each of the two classes separately before continuing with your full dataset modeling. You may learn something useful to help you with your problems. For the full model, you would need to include a full set of interaction terms with class. Robert A. LaBudde, PhD, PAS, Dpl.
ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 Vere scire est per causas scire
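Robert's two-step suggestion could be sketched as follows. The data frame, variable names, and category proportions below are entirely simulated and illustrative, not from the poster's bridge dataset:

```r
set.seed(1)
# illustrative data: an ordered 0-9 rating and one covariate
ratings <- data.frame(
  rating = sample(0:9, 200, replace = TRUE,
                  prob = c(rep(0.01, 5), rep(0.19, 5))),
  age    = runif(200, 0, 80)
)
# step 1: a 0/1 class indicator (bad = ratings 0-4 vs good = 5-9)
ratings$bad <- as.integer(ratings$rating <= 4)
class_fit <- glm(bad ~ age, family = binomial, data = ratings)
# step 2 would then model the 5 ordered subcategories within each class
# separately, e.g. with MASS::polr on subset(ratings, bad == 1)
summary(class_fit)
```

Keeping both steps in one full model, as Robert notes, would instead require interaction terms between the class indicator and the covariates.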
Re: [R] [Not R question]: Better fit for order probit model
Thank you so much Robert. I haven't thought about the idea of clumping categories together. One of the reasons is because these categories are bridge condition rating scores. They indeed represent different meaning and serviceability conditions. They vary from 0-9. I have about 300,000 data points, in which the first 5 labels, i.e. 0-4, are bad-condition bridges and there are only less than 1000 instances in total. The worst case is, for example, score 0 (meaning the bridge is not operable): I have 60 instances. Score 1 I have about 100. I would appreciate if you could provide some opinion as to how you would make the ordered probit fit better in this case? Thank you so much in advance. - adschai

- Original Message - From: Robert A LaBudde Date: Friday, June 15, 2007 9:52 pm Subject: Re: [R] [Not R question]: Better fit for order probit model To: R-help@stat.math.ethz.ch

At 09:31 PM 6/15/2007, adschai wrote: I have a model which tries to fit a set of data with 10-level ordered responses. Somehow, in my data, the majority of the observations are from levels 6-10, leaving only about 1-5% of total observations contributing to levels 1-5. As a result, my model tends to perform badly on points that have a lower level than 6. I would like to ask if there's any way to circumvent this problem or not. I was thinking of the following ideas, but I am open to any suggestions if you could please. 1. Bootstrapping with a small size of samples each time. However, in each sample basket, I intentionally sample in such a way that there is a good mix between observations from each level. Then I have to do this many times. But I don't know how to obtain the true standard error of the estimated parameters after all bootstrapping has been done. Is it going to be simply the average of all standard errors estimated each time? 2. Weighting points with levels 1-6 more. But it's unclear to me how to put this weight back into the maximum likelihood when estimating parameters. It's unlike OLS where your objective is to minimize error or, if you'd like, a penalty function. But MLE is obviously not a penalty function. 3. Do step-wise regression. I will segment the data into two regions, first points with response less than 6 and the rest with those above 6. The first step is a binary regression to determine which of the two groups the point belongs to. Then in the second step, estimate an ordered probit model for each group separately. The question here is then, why am I choosing 6 as a cutting point instead of others? Any suggestions would be really appreciated. Thank you.

You could do the obvious, and lump categories such as 1-6 or 1-7 together to make a composite category. You don't mention the size of your dataset. If there are 10,000 data, you might live with a 1% category. If you only have 100 data, you have too many categories. Also, next time plan your study and training better so that your categories are fully utilized. And don't use so many categories. People have trouble even selecting responses on a 5-level scale. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 Vere scire est per causas scire
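On the weighting question (item 2): case weights can enter the likelihood directly in, e.g., MASS::polr, which accepts a weights argument and a probit link. A hedged sketch on simulated data; the weight values and category proportions are arbitrary, for illustration only:

```r
library(MASS)   # for polr
set.seed(1)
d <- data.frame(
  score = factor(sample(1:4, 300, replace = TRUE,
                        prob = c(0.05, 0.05, 0.45, 0.45)),
                 levels = 1:4, ordered = TRUE),
  x = rnorm(300)
)
# upweight the rare low categories in the (weighted) likelihood
w <- ifelse(as.integer(d$score) <= 2, 5, 1)
fit <- polr(score ~ x, data = d, weights = w, method = "probit")
fit
```

Whether such upweighting is statistically appropriate is a separate question (it changes the implied sampling model), but mechanically this is how weights feed back into the MLE.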
Re: [R] question about formula for lm
Yet another solution:

with(X, lm(get(Ytext) ~ Xvar))

On 14-Jun-07, at 5:18 PM, Greg Snow wrote: Try: lm( formula( paste( Ytext, '~ Xvar' ) ), data=X) -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pedro Mardones Sent: Thursday, June 14, 2007 1:14 PM To: R-help@stat.math.ethz.ch Subject: [R] question about formula for lm

Dear all; Is there any way to make this work?:

.x <- rnorm(50,10,3)
.y <- .x + rnorm(50,0,1)
X <- data.frame(.x,.y)
colnames(X) <- c("Xvar","Yvar")
Ytext <- "Yvar"
lm(Ytext ~ Xvar, data=X)  # doesn't run
lm(Yvar ~ Xvar, data=X)   # does run

The main idea is to use Ytext as input in a function, so you just type "Yvar" and the model should fit. Also, I need to avoid the expression X$Yvar ~ X$Xvar. Thanks for any idea PM

-- Mike Lawrence Graduate Student, Department of Psychology, Dalhousie University Website: http://myweb.dal.ca/mc973993 Public calendar: http://icalx.com/public/informavore/Public The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less. - Piet Hein
Re: [R] [Not R question]: Better fit for order probit model
At 03:17 AM 6/16/2007, adschai wrote: Thank you so much Robert. I haven't thought about the idea of clumping categories together. ...

You certainly appear to have enough data to populate these categories. Your problems in getting a good fit may relate to other problems. You need to supply more information in order to say more. What are the definitions of each category? Is the ordering consistent, or are there really two different scales, one for bridges with essentially no problems, and another for those with serious damage? What evidence do you have that your fit is poor? What model are you fitting? Etc. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 Vere scire est per causas scire
Re: [R] question about formula for lm
On 6/14/07, Greg Snow [EMAIL PROTECTED] wrote: Try: lm( formula( paste( Ytext, '~ Xvar' ) ), data=X)

That type of construction is perilously close to parse(paste(...)) and we know what Thomas said about that (see fortune("parse")). A safer way of constructing a formula from names stored in a character variable is

substitute(foo ~ Xvar, list(foo = as.name(Ytext)))
Re: [R] question about formula for lm
A couple of easy ways are to create the calling sequence as a call or character string and then evaluate it:

eval(bquote(lm(.(as.name(Ytext)) ~ Xvar, X)))
eval(parse(text = paste("lm(", Ytext, "~ Xvar, X)")))

Note that these solutions both have the advantage over some of the prior solutions that the output from print.lm shows which variable was actually used after Call:

eval(bquote(lm(.(as.name(Ytext)) ~ Xvar, X)))
Call:
lm(formula = Yvar ~ Xvar, data = X)   --- This line comes out meaningfully!!!

Coefficients:
(Intercept)         Xvar
     0.3300       0.9729

On 6/14/07, Pedro Mardones [EMAIL PROTECTED] wrote: Dear all; Is there any way to make this work?: ...
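A further base-R option, not mentioned in the thread, is reformulate(), which builds the formula from character strings without any pasting or parsing. A sketch using Pedro's example data:

```r
set.seed(1)
X <- data.frame(Xvar = rnorm(50, 10, 3))
X$Yvar <- X$Xvar + rnorm(50, 0, 1)
Ytext <- "Yvar"
# reformulate(termlabels, response) constructs a formula object directly
form <- reformulate("Xvar", response = Ytext)   # Yvar ~ Xvar
fit <- lm(form, data = X)
coef(fit)
```

Like the substitute/bquote approaches, this keeps the resulting call readable, since a real formula object (not a character string) is passed to lm.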
[R] [Not R question]: Better fit for order probit model
Hi, I have a model which tries to fit a set of data with 10-level ordered responses. In my data, the majority of the observations are from levels 6-10, and only about 1-5% of the total observations fall in levels 1-5. As a result, my model tends to perform badly on points with a level lower than 6. I would like to ask if there's any way to circumvent this problem. I was thinking of the following ideas, but I am open to any other suggestions.

1. Bootstrapping with a small sample size each time. However, in each sample basket I would intentionally sample so that there is a good mix of observations from each level, and then repeat this many times. But I don't know how to obtain the true standard error of the estimated parameters after all the bootstrapping has been done. Is it simply the average of the standard errors estimated each time?

2. Weighting points with levels 1-5 more heavily. But it's unclear to me how to put this weight back into the maximum likelihood when estimating parameters. It's unlike OLS, where the objective is to minimize error or, if you'd like, a penalty function; the MLE objective is not a penalty function.

3. Do a step-wise regression. I would segment the data into two regions: points with response less than 6, and the rest with 6 or above. The first step is a binary regression to determine which of the two groups a point belongs to. Then, in the second step, estimate an ordered probit model for each group separately. The question here is why I would choose 6 as the cutting point instead of some other value.

Any suggestions would be really appreciated. Thank you.
- adschai
Re: [R] [Not R question]: Better fit for order probit model
At 09:31 PM 6/15/2007, adschai wrote:
> [original question snipped]

You could do the obvious and lump categories such as 1-6 or 1-7 together to make a composite category.

You don't mention the size of your dataset. If you have 10,000 data points, you might live with a 1% category. If you only have 100, you have too many categories.

Also, plan your next study and training better so that your categories are fully utilized. And don't use so many categories: people have trouble even selecting responses on a 5-level scale.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS    e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.              URL: http://lcfltd.com/
824 Timberlake Drive                       Tel: 757-467-0954
Virginia Beach, VA 23464-3239              Fax: 757-467-2947

"Vere scire est per causas scire"
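The category-lumping suggestion can be sketched as follows. This is a hypothetical illustration on simulated data (not the poster's dataset), using polr() from MASS for the ordered probit fit; collapsing sparse levels is done by assigning duplicated level names, which merges factor levels in R:

```r
library(MASS)  # for polr()

set.seed(1)
# Hypothetical data: a 10-level ordered response concentrated in levels 6-10
x <- rnorm(500)
latent <- 6 + 1.5 * x + rnorm(500)
y <- factor(pmin(pmax(round(latent), 1), 10), levels = 1:10, ordered = TRUE)
table(y)  # lower levels are sparse, as in the original post

# Collapse the sparse levels 1-6 into one composite category
y2 <- y
levels(y2) <- c(rep("<=6", 6), "7", "8", "9", "10")

fit <- polr(y2 ~ x, method = "probit")
summary(fit)
```

The composite category now has enough observations to support stable cut-point estimates, at the cost of no longer distinguishing among the rare low levels.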
[R] question about formula for lm
Dear all;
Is there any way to make this work?:

.x <- rnorm(50, 10, 3)
.y <- .x + rnorm(50, 0, 1)
X <- data.frame(.x, .y)
colnames(X) <- c("Xvar", "Yvar")
Ytext <- "Yvar"
lm(Ytext ~ Xvar, data = X)  # doesn't run
lm(Yvar ~ Xvar, data = X)   # does run

The main idea is to use Ytext as input in a function, so you just type "Yvar" and the model should fit. Also, I need to avoid the expression X$Yvar ~ X$Xvar.
Thanks for any idea
PM
Re: [R] question about formula for lm
Why not use:

lm(X[[Ytext]] ~ Xvar, data = X)

ThPe

Pedro Mardones wrote:
> [original question snipped]
Re: [R] question about formula for lm
First check the value of Ytext. Try

Ytext <- X$Yvar

--- Pedro Mardones [EMAIL PROTECTED] wrote:
> [original question snipped]
Re: [R] question about formula for lm
Try:

lm( formula( paste( Ytext, '~ Xvar' ) ), data = X )

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

-----Original Message-----
From: Pedro Mardones
Sent: Thursday, June 14, 2007 1:14 PM
To: R-help@stat.math.ethz.ch
Subject: [R] question about formula for lm

> [original question snipped]
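A related base-R route not mentioned in the thread is reformulate(), which assembles the formula from character vectors without hand-pasting a string; a minimal sketch using the thread's variable names on simulated data:

```r
set.seed(42)
X <- data.frame(Xvar = rnorm(50, 10, 3))
X$Yvar <- X$Xvar + rnorm(50, 0, 1)
Ytext <- "Yvar"

# reformulate(termlabels, response) builds response ~ terms
f <- reformulate("Xvar", response = Ytext)
f                  # Yvar ~ Xvar
fit <- lm(f, data = X)
coef(fit)
```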
[R] Question with nlm
Hi, I would really appreciate it if I could get some help here. I'm using nlm to minimize my negative log likelihood function. What I did is as follows.

My log likelihood function (it returns the negative log likelihood) with the 'gradient' attribute defined inside:

# ====== Method definition ======
logLikFunc3 <- function(sigma, object, totalTime) {
  y <- as.matrix([EMAIL PROTECTED]:totalTime,1]);
  x <- as.matrix([EMAIL PROTECTED]:totalTime,]);

  # compute necessary matrices
  M <- as.matrix([EMAIL PROTECTED]);
  P <- diag(sigma*sigma);
  A <- AMatrix(totalTime, M, [EMAIL PROTECTED]:totalTime,]);
  Q <- IMatrix(totalTime) + A %*% outerM(IMatrix(totalTime-1), P) %*% t(A);
  invQ <- solve(Q, IMatrix(dim(Q)[1]));
  xM <- matrix(rep(0, dim(M)[2]*totalTime), ncol=dim(M)[2], nrow=totalTime);
  for (i in 1:totalTime) {
    xM[i,] <- x[i,] %*% powerM(M, -totalTime+i);
  }
  tmp <- solve((t(xM) %*% invQ %*% xM), IMatrix(dim(xM)[2]));
  Bt <- (tmp %*% t(xM)) %*% (invQ %*% y);
  N <- IMatrix(totalTime) - (xM %*% tmp %*% t(xM) %*% invQ);
  sigma2 <- (1/totalTime) * t(y - xM %*% Bt) %*% invQ %*% (y - xM %*% Bt);

  # log likelihood function
  loglik <- -0.5*log(abs(det(diag(rep(sigma2, totalTime))))) -
            0.5*log(abs(det(Q))) -
            (0.5/sigma2) * (t(y - (xM %*% Bt)) %*% invQ %*% (y - (xM %*% Bt)));
  sgm <- sigma;

  # gradients eq. (4.16)
  gr <- function(sgm) {
    gradVecs <- c();
    # sgm <- c(sigma1, sigma2);
    sgm <- sgm*sgm;
    for (i in 1:length(sgm)) {
      Eij <- matrix(rep(0, length(sgm)^2), nrow=length(sgm), ncol=length(sgm));
      Eij[i,i] <- 1.0;
      # trace term
      term1 <- -sum(diag((invQ %*% A) %*% outerM(IMatrix(totalTime-1), Eij) %*% t(A)));
      # very long term
      term2 <- (1/totalTime) * solve((t(y) %*% t(N) %*% invQ %*% y), IMatrix(dim(y)[2]));
      term3 <- (t(y) %*% t(N) %*% invQ %*% A) %*% outerM(IMatrix(totalTime-1), Eij) %*%
               (t(A) %*% invQ %*% N %*% y);
      gradVecs <- -1 * c(gradVecs, term1 + (term2 %*% term3));
    } # end for
    print(paste("Gradient has length:", length(gradVecs)));
    return(gradVecs);
  }

  res <- -loglik;
  attr(res, "gradient") <- gradVecs;
  return(res);
}
# ====== end method definition ======

Then when I call nlm on this function, i.e.

nlm(f=logLikFunc3, p=as.numeric(c(1,1)), object=this, totalTime=200, print.level=2)

it complains that my analytic gradient returns a vector whose length differs from the number of my unknowns. I tried printing the length of the gradient vector that I return (as you can see in the code); it has the same length as my input parameter vector. Did I do anything wrong here?

Also, I would like to be able to put some constraints on this optimization. I tried constrOptim with:

ui <- diag(c(1,1));
ci <- matrix(rep(0,2), ncol=1, nrow=2);

using the same parameters passed to nlm above. constrOptim gives me an error that the initial value is in the infeasible region, which I don't quite understand, as my constraints simply say that the two parameters must be greater than zero and my assigned initial values are both 1, so it should be OK.

Any help would be really appreciated. Thank you.
- adschai
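No reply to the constrOptim question appears in this digest. For reference, constrOptim() requires the starting value to satisfy ui %*% theta - ci > 0 strictly, and it minimizes by default; the mechanics of a positivity constraint on two parameters can be sketched on a toy objective (a hypothetical stand-in, not the poster's likelihood):

```r
# Toy objective standing in for the negative log-likelihood (hypothetical);
# its unconstrained minimum is at c(2, 3), which is in the feasible interior
f <- function(theta) sum((theta - c(2, 3))^2)

# Constraints theta1 > 0 and theta2 > 0, encoded as ui %*% theta - ci >= 0;
# the starting value must satisfy them strictly, which c(1, 1) does
ui <- diag(2)
ci <- c(0, 0)

res <- constrOptim(theta = c(1, 1), f = f, grad = NULL, ui = ui, ci = ci)
res$par   # close to c(2, 3)
```

An alternative that avoids constrained optimization altogether is to reparameterize, e.g. optimize over log(sigma) with nlm() and exponentiate inside the objective, so positivity holds by construction.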
[R] question about data availale in .RData file using the biobase package
Hi all,

I am analyzing microarray data, and I have R workspace images (.RData format) as my source of the data. The data are in the Biobase package format, so I used some commands from the Biobase package manual and could write the data into Excel files. The data I am working on are cancer data. I could get the microarray information and recurrence information by using commands like

x <- pData(oncogene)
y <- exprs(oncogene)

I think the survival information should also be in the .RData file. How can I find out what information is available in the given file? Please let me know any commands that show what type of information is available in a given file from a Biobase package object.

Thank You
rama kanth
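Although no reply is recorded in this digest, the usual way to see what a .RData image contains is to load it into a fresh environment and list the objects; a sketch (the file name is hypothetical, and `oncogene` is the object name from the post above):

```r
e <- new.env()
loaded <- load("oncogene.RData", envir = e)  # hypothetical file name
loaded        # character vector of the object names stored in the image
sapply(loaded, function(nm) class(get(nm, envir = e)))

# For a Biobase ExpressionSet, the per-sample covariates (possibly
# including survival columns) live in the phenotype data; inspect with:
# str(pData(e$oncogene)); varLabels(e$oncogene)
```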
[R] Question on weighted Kaplan-Meier analysis of case-cohort design
I have a study best described as a retrospective case-cohort design: the cases were all the events in a given time span surveyed, and the controls (event-free during the follow-up period) were selected in a 2:1 ratio (2 controls per case). The sampling frequency for the controls was about 0.27, so I used a weight vector consisting of 1 for cases and 1/0.27 for controls in coxph to adjust for sampling bias. Using the same weights in a Kaplan-Meier analysis (survfit) gave very inaccurate survival curves (a much lower event rate than expected from the population). Are weights handled differently between coxph and survfit? How should I conduct a weighted Kaplan-Meier analysis (given that survfit doesn't accept a weighted Cox model) for such a design? Any explanations or suggestions are highly appreciated,

xiaojun
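No answer to this question appears in the digest. For reference, survfit() does accept case weights directly, so the inverse-sampling-probability weighting the poster describes can at least be expressed as follows. This is a mechanical sketch on simulated data with invented names; it illustrates the call, not a validated case-cohort estimator:

```r
library(survival)

set.seed(7)
n <- 300
time <- rexp(n, 0.1)
status <- rbinom(n, 1, 0.3)
# Inverse-probability weights: 1 for cases, 1/0.27 for sampled controls
w <- ifelse(status == 1, 1, 1 / 0.27)

d <- data.frame(time, status, w)
fit <- survfit(Surv(time, status) ~ 1, data = d, weights = w)
summary(fit, times = c(5, 10))
```

Note that weighting changes the effective risk sets in the Kaplan-Meier product limit, which is why weighted survfit curves can differ sharply from a weighted coxph baseline; diagnosing the poster's discrepancy would need the actual data.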
[R] Svar: Re: help with simple R-question
thanks, that works great!!

Just one more thing in the same area: what if the class is list instead of array; how can you name the first unrecognized column?

Rina

John Kane [EMAIL PROTECTED] 06/05/07 3:17

--- Rina Miehs [EMAIL PROTECTED] wrote:
> hello
> what do i write for R to recognize both columns? In the R-script below you can see that I use tapply to get my information out of my data, and then I need to use it as a dataframe with both columns! It is like R is using the first column as an observation number or something; how can I change that??

It is using the names of the variables as rownames. Try

n.antal1 <- names(antal1)
antal1 <- data.frame(n.antal1, antal1)

> antal1 <- tapply(l1$omlob1, l1$farid, length)
> antal1
 1437987     1100 10007995 10008295 10008792 10010203 10018703 10033401
       2        3        3        2        3        1        1        2
10048900 10050492 10055897 10076495 10081892 10094801 10100692 10101395
       3        1        3        3        6        2        5       20
10101495 10101595 10104692 10113592 10113697 10114297 10120797 10120897
       1        5        4        2        6       11        1        4
10121697 10121897 10121997 10133592 10142892 10142995 10146495 10150497
      16        3        6        1        1        6        4        4
10150692 10157092 10157292 10164792 10170892 10171795 10171895 10172300
       5        2        4        4        4        4        4        1
10175195 10187802 10192499 10192897 10198295 10200493 10201693 10211593
       1        2        2        3        5        1        3        5
> antal1 <- data.frame(antal1)
> antal1
         antal1
1437987       2
1100          3
10007995      3
10008295      2
10008792      3
10010203     NA
10018703     NA
10033401      2
10048900      3
10050492      1
10055897      3
10076495      3
10081892      6
10094801      2
10100692      5

> Thanks
> Rina
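The fix suggested above (promoting the tapply names to a real column) can be shown end to end on a small made-up dataset; the ids and counts here are invented for illustration:

```r
# Hypothetical litter data: one row per litter, keyed by boar id
l1 <- data.frame(
  farid  = c(1437987, 1437987, 10007995, 10008295, 10008295, 10008295),
  omlob1 = 1:6
)

antal1 <- tapply(l1$omlob1, l1$farid, length)
antal1   # a named array: the ids are *names*, not a column

# Promote the names to a real column so merge() can use them as a key
antal1 <- data.frame(farid = as.numeric(names(antal1)),
                     n     = as.vector(antal1))
antal1
```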
Re: [R] Svar: Re: help with simple R-question
--- Rina Miehs [EMAIL PROTECTED] wrote:
> thanks, that works great!!
> Just one more thing in the same area: what if the class is list instead of array; how can you name the first unrecognized column?

I am not sure that I understand the question. You don't really have an unrecognised column in the dataframe, but an array of rownames (I don't know exactly how they are stored). I think you can do the same thing as you did for the data.frame but, as I say, I am not sure I understand the question. Would you post a little example?

[earlier messages snipped]
Re: [R] Svar: Re: help with simple R-question
The left column is the boar id number, and the right is the random effect estimate. I need the numbers in the left column when I merge far1 with other data.frames based on the id numbers. When I use ranef, the output is of class list and R only sees the intercepts, but I need a data.frame with 'boar id' and 'niveau', two columns, not just one. For example:

> far1 <- ranef(resultat1)[1]
> far1
$farid
          (Intercept)
1437987  -3.775851e-03
1100     -3.220044e-03
10007995  1.848914e-02
10008295 -4.583903e-03
10008792 -9.518371e-03
10033401 -7.538132e-03
10048900  1.540309e-02

> far1 <- as.data.frame(far1)
> far1
          X.Intercept.
1437987  -3.775851e-03
1100     -3.220044e-03
10007995  1.848914e-02
10008295 -4.583903e-03
10008792 -9.518371e-03
10033401 -7.538132e-03
10048900  1.540309e-02

Thanks again
Rina

[earlier messages snipped]
[R] Odp: Svar: Re: help with simple R-question
[EMAIL PROTECTED] wrote on 06.06.2007 10:56:18:

> thanks, that works great!!
> Just one more thing in the same area: what if the class is list instead of array; how can you name the first unrecognized column?

Hi,

look in some intro manual and learn about R data structures. Matrix, array, vector, data.frame, list and some others have distinct features; they can sometimes be interchanged and sometimes not. To learn how objects are organised, see str(), class(), typeof() and mode().

Regards
Petr

[earlier messages snipped]
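Petr's inspection functions can be demonstrated on the two shapes that came up in this thread, a named vector and a one-element list; the objects below are illustrative stand-ins, not Rina's actual data:

```r
v <- c(a = 1, b = 2)                      # a named numeric vector (tapply output is similar)
l <- list(farid = data.frame(x = 1:2))    # a one-element list, like ranef(...)[1]

class(v);  typeof(v)   # "numeric", "double"
class(l);  typeof(l)   # "list", "list"
str(l)                 # reveals the data.frame nested inside the list
```

str() is usually the quickest way to see why "the first column" is really names or rownames rather than data.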
Re: [R] Svar: Re: help with simple R-question
I think that you have the same situation as before, though I have never used ranef(). The boar ids are acting as the row names and are not really part of the data.frame; it just looks like that when R prints the data.frame. Try

boars <- rownames(far1)
far1 <- cbind(boars, far1)

The results may look a bit funny, with two printed columns looking the same, but one will be a column in the data.frame and the other will be the row-names column. I hope :)

--- Rina Miehs [EMAIL PROTECTED] wrote:
> [earlier message with the ranef() output snipped]
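The rownames-to-column trick above, run on a small mock random-effects table (the values are invented, patterned on the output earlier in the thread):

```r
far1 <- data.frame(`(Intercept)` = c(-3.78e-03, -3.22e-03, 1.85e-02),
                   row.names = c("1437987", "10007995", "10008295"),
                   check.names = FALSE)

far1$farid <- as.numeric(rownames(far1))   # promote the ids to a real column
far1

# farid can now serve as a merge key against other data.frames:
other <- data.frame(farid = c(1437987, 10008295), weight = c(250, 310))
merge(far1, other, by = "farid")
```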
Re: [R] Svar: Re: help with simple R-question
Thanks, that helped in the right way :D Thanks a lot!

Rina

John Kane [EMAIL PROTECTED] 06/06/07 1:46

> [earlier messages snipped]
[R] Question on RandomForest in unsupervised mode
Hi, I attempted to run the randomForest() function on a dataset without predefined classes. According to the manual, running randomForest() without a response variable/class labels should make the function assume you are running in unsupervised mode. In this case, I understand that my data is all assigned to one class, while a second, synthetic dataset is made up and assigned to a second class. The online manual suggests that an oob misclassification error of ~40% or more in this two-class problem would indicate that the x-variables look like independent variables to random forests (and I assume that in this case the proximities obtained by randomForest would not be informative for clustering).

When I run randomForest() in unsupervised mode, my first problem is that I get NULL entries for the confusion matrix and the err.rate, but I suppose this is normal behaviour. My only information (apart from the proximities, of course) seems to be the votes, from which I can deduce whether the variables are meaningful or not. The second problem is that, in my case, all my observations seem to get about 20-40% of the votes for class 1 and the rest for class 2 (i.e. class 2 always wins). Assuming that class 1 was assigned to my original data, I'd say this is rather surprising. Initially I thought this was simply a problem of my data not being meaningful, but I repeated the forest with the iris example data and got more or less the same result. I did simply:

iris.urf <- randomForest(iris[,-5])
iris.urf$votes

and again most of the votes went to class 2, although here the vote percentages are slightly more balanced than with my data (approximately 40 to 60% most of the time). Has anyone got experience with unsupervised randomForest() in R who can explain why I'm observing this behaviour? In general, any hints about pitfalls regarding random forests in unsupervised mode would be very much appreciated.
Many thanks in advance, Irilenia

-
Irilenia (Irene) Nobeli
Randall Division of Cell and Molecular Biophysics
New Hunt's House (room 3.14)
King's College London, Guy's Campus
London, SE1 1UL
U.K.
[EMAIL PROTECTED]
+44 (0)207-8486329
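For reference, a minimal sketch of the unsupervised run on iris, assuming the randomForest package is installed. In this mode the original rows are labelled class 1 and a synthetic permuted copy class 2, and it is the proximities, not the raw votes, that are normally fed into clustering; the clustering step below is an illustrative choice, not something prescribed in the thread.

```r
library(randomForest)   # assumes the package is installed

set.seed(1)
## no response given, so randomForest runs unsupervised: real data vs. a
## synthetic second class; confusion and err.rate are NULL in this mode
iris.urf <- randomForest(iris[, -5], proximity = TRUE)

head(iris.urf$votes)    # per-observation vote fractions for the two classes

## cluster on 1 - proximity rather than interpreting the votes directly
hc <- hclust(as.dist(1 - iris.urf$proximity), method = "average")
```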
[R] Question: RMySQL bulk load/update one column, dbWriteTable()?
Hi, I have a question about using RMySQL to load one R vector into a table column. To be more specific, the table is already there and populated. Now I add a new column and want to populate it. Can some colleagues on this list teach me how to do this? I know how to write one R object/table into a MySQL table using dbWriteTable(), but in this situation I just want to write one column. Thanks much in advance. -- Waverley
Re: [R] Question: RMySQL bulk load/update one column, dbWriteTable()?
I have a question about using RMySQL to load one R vector into a table column. To be more specific, the table is already populated. Now I add a new column and want to populate it.

Okay, this is more of an SQL question now, but you could just use dbWriteTable and then do a multi-table update.

dbGetQuery(con, "select * from tmp")
  id name
1  1    A
2  2    B
3  3    C
4  4    D
5  5    E

dbSendQuery(con, "alter table tmp add column r2 float")

## calculate some statistic for all or some ids in the table
x <- data.frame(id = 1:5, r2 = c(.1, .4, .9, .4, .7))
dbWriteTable(con, "r2tmp", x)
dbSendQuery(con, "update tmp t, r2tmp r set t.r2 = r.r2 where t.id = r.id")

dbGetQuery(con, "select * from tmp")
  id name  r2
1  1    A 0.1
2  2    B 0.4
3  3    C 0.9
4  4    D 0.4
5  5    E 0.7

Chris
Re: [R] Question: RMySQL bulk load/update one column, dbWriteTable()?
Thanks Chris. I am trying almost the same solution, though dbWriteTable has failed for me. The problem with using update is that it is way TOO slow when the row size is ~20. That is why I hope I can still use the dbWriteTable way to add one column; dbWriteTable is very efficient and fast. The problem with dbWriteTable, as far as I know and as far as I have read, is that you have to load one data frame that covers all the columns of one table. What I want to do is bulk-load one column instead of ALL columns. Supposedly underneath dbWriteTable is "load data infile", which according to my reading should allow you to load a data file into one table column. Can someone help? Thanks.

On 6/6/07, Chris Stubben [EMAIL PROTECTED] wrote:
Okay, this is more of an SQL question now, but you could just use dbWriteTable and then do a multi-table update.

dbGetQuery(con, "select * from tmp")
  id name
1  1    A
2  2    B
3  3    C
4  4    D
5  5    E

dbSendQuery(con, "alter table tmp add column r2 float")

## calculate some statistic for all or some ids in the table
x <- data.frame(id = 1:5, r2 = c(.1, .4, .9, .4, .7))
dbWriteTable(con, "r2tmp", x)
dbSendQuery(con, "update tmp t, r2tmp r set t.r2 = r.r2 where t.id = r.id")

dbGetQuery(con, "select * from tmp")
  id name  r2
1  1    A 0.1
2  2    B 0.4
3  3    C 0.9
4  4    D 0.4
5  5    E 0.7

Chris

-- Waverley @ Palo Alto
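One possible sketch, under the assumption that the slowness comes from an unindexed join key rather than from dbWriteTable itself: bulk-load the new values into a small staging table, index it, and run the multi-table UPDATE once. Table and column names follow Chris's example; note that LOAD DATA INFILE can fill chosen columns of *new* rows, but it cannot update a column of existing rows, so some form of join update is still needed.

```r
## assumes an open RMySQL connection `con`, an existing table `tmp`
## with key column `id`, and a data frame x with columns id and r2
dbWriteTable(con, "r2tmp", x, row.names = FALSE)        # fast bulk load of the new values
dbSendQuery(con, "alter table r2tmp add index (id)")    # index the join key
dbSendQuery(con, "update tmp t, r2tmp r set t.r2 = r.r2 where t.id = r.id")
dbSendQuery(con, "drop table r2tmp")                    # clean up the staging table
```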
[R] Question about parse and expression
Dear R-users, In the following example, I would like to see my y label as: beta[3] * x[3] + epsilon (where beta and epsilon are replaced by their mathematical symbols). Please advise. Thanks. Nitin

i <- 3
ee <- expression(beta[i] * x[i] + epsilon)
xyplot(1:10 ~ 11:20, ylab = parse(text = ee))
Re: [R] Question about parse and expression
On Wed, 6 Jun 2007, Nitin Jain wrote:
Dear R-users, In the following example, I would like to see my y label as: beta[3] * x[3] + epsilon (where beta and epsilon are replaced by their mathematical symbols). Please advise.

?plotmath
?bquote
example(plotmath)

Thanks. Nitin

i <- 3
ee <- expression(beta[i] * x[i] + epsilon)
xyplot(1:10 ~ 11:20, ylab = parse(text = ee))

Charles C. Berry  (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]
UC San Diego  http://biostat.ucsd.edu/~cberry/
La Jolla, San Diego 92093-0901
Re: [R] Question about parse and expression
Try this:

library(lattice)
x <- 1:10
i <- 3
xlab <- as.expression(substitute(beta[i] * x[i] + epsilon, list(i = i)))
xyplot(x ~ x, xlab = xlab)

On 6/6/07, Nitin Jain [EMAIL PROTECTED] wrote:
Dear R-users, In the following example, I would like to see my y label as: beta[3] * x[3] + epsilon (where beta and epsilon are replaced by their mathematical symbols). Please advise. Thanks. Nitin

i <- 3
ee <- expression(beta[i] * x[i] + epsilon)
xyplot(1:10 ~ 11:20, ylab = parse(text = ee))
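An equivalent sketch using bquote(), which splices the current value of i into the expression via .( ):

```r
library(lattice)
i <- 3
## .(i) is replaced by the value 3 before plotmath renders the label
xyplot(1:10 ~ 11:20,
       ylab = as.expression(bquote(beta[.(i)] * x[.(i)] + epsilon)))
```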