Re: [R] convert a tar.gz to a Windows zip
Uwe Ligges seems to be providing a simple way to make the conversion (though I have not used it): http://win-builder.r-project.org/ On Mon, 1 Oct 2007, Edna Bell wrote: Dear R Gurus; Is there a simple way to convert a Linux produced tar.gz file (a package) to a Windows binary zip package, please? Thanks in advance __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function for special matrix design
Lookup: ?diag ?upper.tri ?lower.tri Example: my.mat-matrix(1:16,ncol=4) my.mat my.mat[upper.tri(my.mat)]-3 my.mat -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of kevinchang Sent: 01 October 2007 17:37 To: r-help@r-project.org Subject: [R] function for special matrix design Hi All, I am trying to make a matrix with a particular value for all the elements above and including diagonal and another particular value for all the elements below diagonal. Obviously , Matrix function in Matrix package does not work since it only allow filling by either row or column. So I am wondering if there is a built-in function allowing me to do this ? please help. Thanks -- View this message in context: http://www.nabble.com/function-for-special-matrix-design-tf4549437.html#a12982709 Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Disentagling formulas
I tried formula-y~X1+X2+X3 update.formula(formula,.-NewRandomEffects~.) and this runs without an error. So I still didn't get the point. You may also need to insulate your new response variable if you want to estimate a model with in fact only one response variable, which ist done by I() as in update.formula(formula,I(.-NewRandomEffects)~.) hrth. Rebecca Sela schrieb: Update looks like the function I want to use, but it still isn't working, possibly because I am doing something more complicated than my message implied. What I want to do is subtract a vector of random effects (determined within the function) from Y. I hoped that something like this: tree - rpart(update(formula,.-NewRandomEffects~.)) would work, but I got the message Error in update.default(formula, . - NewRandomEffects ~ .) : need an object with call component I have considered creating a function that would subtract the random effect from each Y individually, but there is no identifier on Y. Is there an obvious fix I'm missing? Thanks for your suggestions! Rebecca - Original Message - From: Eik Vettorazzi [EMAIL PROTECTED] To: Rebecca Sela [EMAIL PROTECTED] Cc: r-help@r-project.org Sent: Monday, October 1, 2007 11:34:13 AM (GMT-0500) America/New_York Subject: Re: [R] Disentagling formulas maybe I didn't understand the problem, but you can do sth like formula-y~X1+X2+X3 lm(update.formula(formula,f(Y)~.)) maybe ?all.vars or ?terms will help you too. hth. Rebecca Sela schrieb: I am writing a program in which I would like to take in a formula, change the response (Y) variable into something else, and then pass the formula, with the new Y variable to another function. That is, I am starting with formula - Y~X1+X2+X3 and I'd like to do something like Y - formula$Y newY - f(Y) lm(newY~X1+X2+X3) So far, it seems that my only option will be a very complicated sequence of steps involving match.call(). Is there a simpler way to change the response variable in a formula? Thanks in advance! Rebecca -- Rebecca Sela Statistics Department Stern School of Business New York University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Institut für Medizinische Biometrie und Epidemiologie Universitätsklinikum Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/42803-8243 F ++49/40/42803-7790 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] problem with ARES
Dear mailing list: I am a new user of R and I would like to use the new ARES package. I have followed the procedure to import a genepopfile and everything seemed to work. However when I ran aresCalc function, I got a message error and I don't know what it means. The error message is object alri.tot not found. Can anyone help me? Thank you, Malia -- View this message in context: http://www.nabble.com/problem-with-ARES-tf4553922.html#a12995734 Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Odp: Calculating group means using self-written function
Hi [EMAIL PROTECTED] napsal dne 02.10.2007 10:44:20: Hi R-users, Suppose I have a following data set. y1 - rnorm(20) + 6.8 y2 - rnorm(20) + (1:20*1.7 + 1) y3 - rnorm(20) + (1:20*6.7 + 3.7) y - c(y1,y2,y3) var1 - rep(1:5,12) z - rep(1:6,10) f - gl(3,20, labels=paste(lev, 1:3, sep=)) d - data.frame(var1=var1, z=z,y=y, f=f) Using following code I can calculate group means library(doBy) summaryBy(y ~ f + var1, data=d, FUN=mean) How do I have to modify the FUN argument if I want to calculate mean using unique values for instance fun - function(x, y) sum(x)/length(unique(y)) summaryBy(y ~ f + var1, data=d, FUN=fun(y, z) Error in get(x, envir, mode, inherits) : variable currFUN of mode function was not found Not sure how to do it in doBy but using aggregate aggregate(d$y, list(d$var1,d$f), fun, y=z) probably do what you want Regards Petr Best regards LN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mean of subset of rows
Thankyou all for your answers, I have decided using aggregate() but I will keep in mind tapply(). I was wondering if it is possible to tell aggregate to use two functions at the same time, i.e., mean() and sd (), or is it better to call aggregate() two times, one for mean, and another for sd and then cbind both results. Also, if the data.frame has now three columns, ID-size-Town, as follows: data-data.frame(ID=c(rep(letters[1:4],2),rep(f,12)), size=runif (20),Town=c(rep(LETTERS[1:2],4),rep(C,12))) And I want to produce a table with the mean size for each ID keeping Town in the result table, I cannot get to add the Town for each ID avgs -aggregate(data$size, by = list(data$ID), mean) SDs -aggregate(data$size, by = list(data$ID), sd) results = cbind(avgs,SDs[2],data$Town) Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 5, 20 Thanks again! David You were on the right track with the for loop, but often you can do the same thing looplessly (I know, it's not really a word) in R: If your data is like this: data-data.frame(ID=rep(letters[1:4], 5), size=runif(20)) then apply either tapply(data$size, data$ID, mean) or aggregate(data$size, list(data$ID), mean) For further reference, section 4.2 in An Introduction to R describes using tapply in this way. Jeff. On Oct 1, 2007, at 11:57 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Dear list, this must be an easy one: I have a data.frame of two columns, ID with four different levels (A to D) and numerical size, and each of the 4 different IDs is repeated a different number of times. I would like to get the mean size for each ID as another data.frame. I have tried the following: ID= as.character(unique(data[,1])) # I use unique() because data will be larger in future nIDs = length(ID) for(i in 1:nIDs){ + subdata = subset(data,V1==ID[i]) + average = as.data.frame(cbind(1:i,ID[i],mean(subdata[,2])) + } Unfortunately, my output only gets the last level of ID four times: average V1 V2 V3 1 1 D 179.7778 2 2 D 179.7778 3 3 D 179.7778 4 4 D 179.7778 How can I get what I need? there might be an easier way to do it, but I guess my skills aren´t that good. Any suggestions are welcome Regards, David __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] building packages on Windows
On 01/10/2007 11:45 PM, Edna Bell wrote: Hi again. I'm sure that this is really simple. I'm trying to build a package on a Windows Vista machine. I use Rcmd build --binary test but I get the Please set TMPDIR to a valid temporary directory I tried TMPDIR=c:\temp but to no avail. I suspect you don't have write permission on c:\temp. Windows systems typically have TEMP set to a writeable directory, and recent builds of R 2.6.0 will use that directory; in older versions you should be able to manually set TMPDIR to the same directory. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sample fromt he real line
On 01/10/2007 3:50 AM, Daniel Polhamus wrote: Hello R Gurus, This is a simple enough question, but I am curious as to whether there's an answer... Can R generate a random variable uniformly distributed on -Inf to Inf? Philosophically this doesn't seem possible, and if not, as I imagine so, is there some sort of generally accepted factor I should be multiplying by a Unif(-1,1) rv to sample from the real line? There's no distribution that's uniform on the real line, but there are a lot of ways to sample non-uniformly from it. A common approximation (e.g. to simulate a non-informative prior in Bayesian work) is to use a normal distribution with a large variance. AFAIK there isn't any agreement on what large should be, because it depends so much on context. Duncan Murdoch Thanks, Dan Polhamus [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] splom pairs and groups argument
On 10/2/07, GOUACHE David [EMAIL PROTECTED] wrote: Hello, I'm trying to pull off a certain graph using splom, and can't quite get my panel functions right. Basically, the equivalent using pairs would be something like this (using iris data set as an example): panel.corval - function(x, y, digits=2, prefix=, cex.cor,col,pch) { usr - par(usr); on.exit(par(usr)) par(usr = c(0, 1, 0, 1)) r - abs(cor(x, y,use=complete.obs)) txt - format(c(r, 0.123456789), digits=digits)[1] txt - paste(prefix, txt, sep=) if(missing(cex.cor)) cex - 0.8/strwidth(txt) text(0.5, 0.5, txt, cex = cex ) } pairs(iris[,1:4], lower.panel=panel.smooth, upper.panel=panel.corval,col=rainbow(nlevels(iris$Species))[iris$Species],pch=19) Here's a more or less direct translation: panel.corval2 - function(x, y, digits=2, prefix=, cex.cor, ...) { require(grid) r - abs(cor(x, y, use = complete.obs)) txt - format(c(r, 0.123456789), digits=digits)[1] txt - paste(prefix, txt, sep=) if (missing(cex.cor)) cex.cor - 10 / nchar(txt) grid.text(txt, 0.5, 0.5, gp = gpar(cex = cex.cor)) } splom(iris[1:4], groups = iris$Species, pch = 16, lower.panel = function(...) { panel.xyplot(...) panel.loess(..., col = 1, lwd = 3) }, upper.panel = panel.corval2) My goals are: 1) to have in one panel (above or below the diagonal) the points themselves with a smoothing line over all the points and in the other the (absolute) value of the correlation coefficient, so as to get an overall idea of my correlations 2) to be able to identify points depending on levels of one or more factors (hence the groups argument that is giving me a hard time in writing a succesfull panel function) 3) I want to do this in splom as I want to build several scatterplot matrices subsetting my data over the levels of another factor... Could anyone please help me out with this ? Also, while trying to set up an example, I landed upon this behavior I couldn't quite make sense of: panel.corval - function(x, y, digits=2, prefix=, cex.cor) { usr - par(usr); on.exit(par(usr)) par(usr = c(0, 1, 0, 1)) r - abs(cor(x, y,use=complete.obs)) txt - format(c(r, 0.123456789), digits=digits)[1] txt - paste(prefix, txt, sep=) if(missing(cex.cor)) cex - 0.8/strwidth(txt) text(0.5, 0.5, txt, cex = cex ) } pairs(iris[,1:4], lower.panel=panel.smooth, upper.panel=panel.corval,col=rainbow(nlevels(iris$Species))[iris$Species],pch=19) Erreur dans lower.panel(...) : unused argument(s) (col = c(#FF, #FF, #FF, I can't understand why not specifying col and pch arguments in the function I use for upper.panel (which dosen't at all need them in this case) is a problem. upper.panel will be called with pch, col, etc., even if they don't ``need'' them. Having a ... argument should be enough to capture these. It's all the more confusing that the error message says the problem is with lower.panel... If anyone has a clue as to what's happening, please tell. pairs.default is renaming things internally; hint: try pairs(iris[,1:4], upper.panel=panel.corval,pch=19, row1attop=FALSE) -Deepayan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculating group means using self-written function
Thanks Petr, Yes, your code seems to work. But when I try to reproduce it with my original data set fun - function(x, y) sum(x)/length(unique(y)) aggregate(vsid$lev, list(vsid$month, vsid$yeari), fun, vsid$lev=vsid$date) I get Error: syntax error, unexpected EQ_ASSIGN, expecting ',' in aggregate(vsid$lev, list(vsid$month, vsid$year), fun, vsid$lev= Can you intepret what is wrong? vsid$date is $ date :Class 'Date' num [1:637] 13695 13695 13695 13695 13695 ... Cheers, Lauri 2007/10/2, Petr PIKAL [EMAIL PROTECTED]: Hi [EMAIL PROTECTED] napsal dne 02.10.2007 10:44:20: Hi R-users, Suppose I have a following data set. y1 - rnorm(20) + 6.8 y2 - rnorm(20) + (1:20*1.7 + 1) y3 - rnorm(20) + (1:20*6.7 + 3.7) y - c(y1,y2,y3) var1 - rep(1:5,12) z - rep(1:6,10) f - gl(3,20, labels=paste(lev, 1:3, sep=)) d - data.frame(var1=var1, z=z,y=y, f=f) Using following code I can calculate group means library(doBy) summaryBy(y ~ f + var1, data=d, FUN=mean) How do I have to modify the FUN argument if I want to calculate mean using unique values for instance fun - function(x, y) sum(x)/length(unique(y)) summaryBy(y ~ f + var1, data=d, FUN=fun(y, z) Error in get(x, envir, mode, inherits) : variable currFUN of mode function was not found Not sure how to do it in doBy but using aggregate aggregate(d$y, list(d$var1,d$f), fun, y=z) probably do what you want Regards Petr Best regards LN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear Regression
look at ?summary.lm(), and specifically at the `Value' section, e.g., try this: lmFit - lm(weight ~ group - 1) summ.lmFit - summary(lmFit) summ.lmFit$coefficients summ.lmFit$r.squared I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: livia [EMAIL PROTECTED] To: r-help@r-project.org Sent: Tuesday, October 02, 2007 1:04 PM Subject: [R] Linear Regression Hello, I would like to fit a linear regression and when I use summary(), I got the following result: Call: lm(formula = weight ~ group - 1) Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(|t|) groupCtl 5.0320 0.2202 22.85 9.55e-15 *** groupTrt 4.6610 0.2202 21.16 3.62e-14 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-Squared: 0.9818, Adjusted R-squared: 0.9798 F-statistic: 485.1 on 2 and 18 DF, p-value: 2.2e-16 In fact, I do not need them all. Is there a way of exacting part of the infomation, like the Coefficient or Multiple R-Squared? -- View this message in context: http://www.nabble.com/Linear-Regression-tf4554258.html#a12996725 Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear Regression
Livia, Try the following: fit1-summary(lm(weight~group-1) summary(fit1) names(fit1) #The code above wil fit the model and print the results. #The statement names(fit1) will give you the components of #the summary such as #coefficients, R.squared adj.R.squared, etc. #You can then access the components, viz. coeffs-summary(fit1)$coefficients #or RSq-summary(fit1)$r.squared John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) livia [EMAIL PROTECTED] 10/2/2007 7:04 AM Hello, I would like to fit a linear regression and when I use summary(), I got the following result: Call: lm(formula = weight ~ group - 1) Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(|t|) groupCtl 5.0320 0.2202 22.85 9.55e-15 *** groupTrt 4.6610 0.2202 21.16 3.62e-14 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-Squared: 0.9818, Adjusted R-squared: 0.9798 F-statistic: 485.1 on 2 and 18 DF, p-value: 2.2e-16 In fact, I do not need them all. Is there a way of exacting part of the infomation, like the Coefficient or Multiple R-Squared? -- View this message in context: http://www.nabble.com/Linear-Regression-tf4554258.html#a12996725 Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Confidentiality Statement: This email message, including any attachments, is for the so...{{dropped}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear Regression
OOPPSS, I forgot a right parenthesis fit1-summary(lm(weight~group-1) ) John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) John Sorkin [EMAIL PROTECTED] 10/2/2007 7:40 AM Livia, Try the following: fit1-summary(lm(weight~group-1) summary(fit1) names(fit1) #The code above wil fit the model and print the results. #The statement names(fit1) will give you the components of #the summary such as #coefficients, R.squared adj.R.squared, etc. #You can then access the components, viz. coeffs-summary(fit1)$coefficients #or RSq-summary(fit1)$r.squared John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) livia [EMAIL PROTECTED] 10/2/2007 7:04 AM Hello, I would like to fit a linear regression and when I use summary(), I got the following result: Call: lm(formula = weight ~ group - 1) Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(|t|) groupCtl 5.0320 0.2202 22.85 9.55e-15 *** groupTrt 4.6610 0.2202 21.16 3.62e-14 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-Squared: 0.9818, Adjusted R-squared: 0.9798 F-statistic: 485.1 on 2 and 18 DF, p-value: 2.2e-16 In fact, I do not need them all. Is there a way of exacting part of the infomation, like the Coefficient or Multiple R-Squared? -- View this message in context: http://www.nabble.com/Linear-Regression-tf4554258.html#a12996725 Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Confidentiality Statement: This email message, including any attachments, is for the so...{{dropped}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Confidentiality Statement: This email message, including any attachments, is for the so...{{dropped}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Apparently Conflicting Results with coxph
Kevin E. Thorpe wrote: Peter Dalgaard wrote: Kevin E. Thorpe wrote: Dear List: I have a data frame prepared in the couting process style for including a binary time-dependent covariate. The first few rows look like this. PtNo StartEnd Status Imp 1 1 0 608.0 0 0 2 2 0 513.0 0 0 3 2 513 887.0 0 1 4 3 0 57.0 0 0 5 357 604.0 0 1 6 4 0 150.0 1 0 The outcome is mortality and the covariate is for an implantable defibrillator, so it is expected that the implant would reduce the risk of death. The results of fitting coxph (survival package) are: Call: coxph(formula = Surv(Start, End, Status) ~ Imp, data = nina.excl) coef exp(coef) se(coef) zp Imp 0.163 1.180.485 0.337 0.74 Likelihood ratio test=0.11 on 1 df, p=0.738 n= 335 Since this was unexpected, I created a non-counting process data frame with an indicator variable representing received an implant or not. Here are the results: Call: coxph(formula = Surv(Days, Dead) ~ Implant, data = nina.excl0) coef exp(coef) se(coef) z p Implant -1.77 0.1710.426 -4.15 3.3e-05 Likelihood ratio test=19.1 on 1 df, p=1.21e-05 n= 197 I found this degree of discrepancy surprising, especially the point estimate of the coefficient. I have verified the data frames are set up correctly. Here is what I have tried to understand what is going on. I tried fitting models adjusted for other covariates that I have in the data frame. This did not appreciably affect the coefficients for the implant variable. I ran cox.zph on the two models shown above and plotted the results. In both cases, the point estimate of Beta(t) is sort of parabolic in that the curves are monotonically increasing to a local maximum after which they are monotonically decreasing (the CIs are a bit more wiggly). I would interpret this to mean that the effect of implant is probably time-dependent. If so, how do I actually get a proper estimate of beta(t) for a variable like this? Are there some other things I should look at to understand what's going on? If you want to play with time-dependent regression coefficients have a look at the timereg package and the book that it supports. However, first you need to consider the possibility of selection effects that can take place even with non-varying effects. In the case at hand I would suspect a bias created by the fact that you don't implant devices into people who are already dead. Thanks. The point in your last paragraph did cross my mind too. I thought about this some more, and I'm not sure that possibility is to blame. In my time-dependent model, I don't think I'm doing anything different than is done for transplant in the Stanford Heart Study (the often used example for this kind of time-dependent covariate). As in my case, you would not transplant a dead patient. So, I remain puzzled as to why my model is misbehaving. -- Kevin E. Thorpe Biostatistician/Trialist, Knowledge Translation Program Assistant Professor, Department of Public Health Sciences Faculty of Medicine, University of Toronto email: [EMAIL PROTECTED] Tel: 416.864.5776 Fax: 416.864.6057 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculating group means using self-written function
Well, I finally found a roundabout fun - function(x, y) sum(x)/max(y) aggregate(vsid$lev, list(vsid$month, vsid$year), fun, y=by(vsid$date, vsid$month, function(x) length(unique(x Thanks, Lauri 2007/10/2, Lauri Nikkinen [EMAIL PROTECTED]: Thanks Petr for your kind answer. I got it now but it seems that argument y will not be split by list(vsid$month, vsid$year) in the aggregate function. I should get number of days in each month in the denominator with length(unique(y)) but instead I get sum of days in months in the denominator. So I will not get correct answers. Should I modify my fun in some way? Best regards, Lauri 2007/10/2, Petr PIKAL [EMAIL PROTECTED]: Hi [EMAIL PROTECTED] napsal dne 02.10.2007 13:19:09: Thanks Petr, Yes, your code seems to work. But when I try to reproduce it with my original data set fun - function(x, y) sum(x)/length(unique(y)) aggregate(vsid$lev, list(vsid$month, vsid$yeari), fun, vsid$lev=vsid$date) Shall be aggregate(vsid$lev, list(vsid$month, vsid$yeari), fun, y=vsid$date) From help page ## S3 method for class 'data.frame': aggregate(x, by, FUN, ...) ... further arguments passed to or used by methods. Your function has 2 arguments one is x which is assigned vsid$lev and the other is y which you want to assign vsid$date. You can imagine that aggregate splits your x according to the levels mentioned in by and applies to each split a function fun together with any other argument, in your case y. So you need to provide a correct name to your function otherwise it does not know what to do. Regards Petr I get Error: syntax error, unexpected EQ_ASSIGN, expecting ',' in aggregate(vsid$lev, list(vsid$month, vsid$year), fun, vsid$lev= Can you intepret what is wrong? vsid$date is $ date :Class 'Date' num [1:637] 13695 13695 13695 13695 13695 ... Cheers, Lauri 2007/10/2, Petr PIKAL [EMAIL PROTECTED]: Hi [EMAIL PROTECTED] napsal dne 02.10.2007 10:44:20: Hi R-users, Suppose I have a following data set. y1 - rnorm(20) + 6.8 y2 - rnorm(20) + (1:20*1.7 + 1) y3 - rnorm(20) + (1:20*6.7 + 3.7) y - c(y1,y2,y3) var1 - rep(1:5,12) z - rep(1:6,10) f - gl(3,20, labels=paste(lev, 1:3, sep=)) d - data.frame(var1=var1, z=z,y=y, f=f) Using following code I can calculate group means library(doBy) summaryBy(y ~ f + var1, data=d, FUN=mean) How do I have to modify the FUN argument if I want to calculate mean using unique values for instance fun - function(x, y) sum(x)/length(unique(y)) summaryBy(y ~ f + var1, data=d, FUN=fun(y, z) Error in get(x, envir, mode, inherits) : variable currFUN of mode function was not found Not sure how to do it in doBy but using aggregate aggregate(d$y, list(d$var1,d$f), fun, y=z) probably do what you want Regards Petr Best regards LN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculating group means using self-written function
Lauri Nikkinen wrote: Suppose I have a following data set. y1 - rnorm(20) + 6.8 y2 - rnorm(20) + (1:20*1.7 + 1) y3 - rnorm(20) + (1:20*6.7 + 3.7) y - c(y1,y2,y3) var1 - rep(1:5,12) z - rep(1:6,10) f - gl(3,20, labels=paste(lev, 1:3, sep=)) d - data.frame(var1=var1, z=z,y=y, f=f) Using following code I can calculate group means library(doBy) summaryBy(y ~ f + var1, data=d, FUN=mean) How do I have to modify the FUN argument if I want to calculate mean using unique values for instance fun - function(x, y) sum(x)/length(unique(y)) summaryBy(y ~ f + var1, data=d, FUN=fun(y, z) Error in get(x, envir, mode, inherits) : variable currFUN of mode function was not found Best regards LN Your first call to summaryBy suggests, that the argument FUN must be a name of a function. fun(y,z) in the second call is evaluated before call to summaryBy. fun(y,z) returns a single number, so you have summaryBy(y ~ f + var1, data=d, FUN=331.9017) summaryBy wants to call FUN(x) and use its result and check if it is valid. It cannot call 331.9017 and gives the error. Probably, you might want to change your code for something like following fun - function(x, y=1) sum(x)/length(unique(y)) summaryBy(y ~ f + var1, data=d, FUN=fun, y=z) More details should be given in ?summaryBy -- View this message in context: http://www.nabble.com/Calculating-group-means-using-self-written-function-tf4553726.html#a12998681 Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] using sprintf with dates
hello, Please help with using sprintf with character variables: The following does not produce what i intended foot=function(){ str1=format(Sys.Date,%Y%m%d) sprintf(99%-4s%s,nm,str1) } I wanted to have 99nm 20071002 as the output. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using sprintf with dates
stephen bond wrote on 10/02/2007 08:36 AM: hello, Please help with using sprintf with character variables: The following does not produce what i intended foot=function(){ str1=format(Sys.Date,%Y%m%d) sprintf(99%-4s%s,nm,str1) } I wanted to have 99nm 20071002 as the output. You forgot the parens after Sys.Date: foot=function(){ str1=format(Sys.Date(),%Y%m%d) sprintf(99%-4s%s,nm,str1) } Jeff -- http://biostat.mc.vanderbilt.edu/JeffreyHorner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using sprintf with dates
stephen bond wrote: foot=function(){ str1=format(Sys.Date,%Y%m%d) sprintf(99%-4s%s,nm,str1) } I wanted to have 99nm 20071002 as the output. Sys.Date is a function. It's perfectly possible to write string - format(Sys.Date, %s) (or, generically, string - format(sin, %s), etc), but it will just put some description of the function in the string. Also, the assignment in R is -, not =. Try: foot - function(){ str1 - format(Sys.Date(),%Y%m%d) sprintf(99%-4s%s,nm,str1) } Alberto Monteiro __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SVM: stratified cross-validation
Hi, Is there a way to do stratified cross-validation of Support Vector Machines? Regards, Kaustubh the tools to get online. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Cannot Install rimage
I'm trying to install rimage in R version 2.5.1 running on Fedora 6 (kernel 2.6.22.7-57.fc6 with the headers and gcc installed, along with fftw2 and libjpeg and headers): install.packages(rimage) Warning in install.packages(rimage) : argument 'lib' is missing: using '/usr/lib/R/library' trying URL 'http://lib.stat.cmu.edu/R/CRAN/src/contrib/rimage_0.5-7.tar.gz' Content type 'application/x-gzip' length 331029 bytes opened URL == downloaded 323Kb * Installing *source* package 'rimage' ... checking for g++... no checking for c++... no checking for gpp... no checking for aCC... no checking for CC... no checking for cxx... no checking for cc++... no checking for cl... no checking for FCC... no checking for KCC... no checking for RCC... no checking for xlC_r... no checking for xlC... no checking for C++ compiler default output... configure: error: C++ compiler cannot create executables See `config.log' for more details. ERROR: configuration failed for package 'rimage' ** Removing '/usr/lib/R/library/rimage' The downloaded packages are in /tmp/RtmpGDx12u/downloaded_packages Warning message: installation of package 'rimage' had non-zero exit status in: install.packages(rimage) I cannot find config.log that is mentioned. Can someone please tell me what is missing from my configuration? rimage installed on another system running Fedora 7 once I installed fftw2. Rick B. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] mcv package gamm function Error in chol(XVX + S)
Hi all R users ! I'm using gamm function from Simon Wood's mgcv package, to fit a spatial regression Generalized Additive Mixed Model, as covariates I have the geographical longitude and latitude locations of indexed data. I include a random effect for each district (dist) so the code is fit - gamm(y~s(lon,lat,bs=tp, m=2)+offset(log(exp.)), random=list(dist=~1), family=poisson, niterPQL=30) when I run the gamm function I obtain the next error message: %%% Maximum number of PQL iterations: 30 iteration 1 iteration 2 ... iteration 8 iteration 9 iteration 10 Error in chol(XVX + S) : the leading minor of order 29 is not positive definite %%% Could be any problem in gamm() ??? -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to view the code of a method?
Dear All I am a biginner of R. I have difficulty with reading the code of a method. I am using the vars package to estimate a VAR model and I want to view the code of predict method for objects with class attribute varest. I thougt I could just type the name predict without anything to display the code of the method as I often do with generic function. However, I got the following messages: function (object, ...) UseMethod(predict) environment: namespace:stats I cannot figure out the meaning that the above message want to deliver to me and thus cannot find a way to view the code of the predict method with class varest. Can anyone help me out? Thanks in advance. Strong Chen __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to view the code of a method?
For S3 methods I'd try something like methods(predict) [1] predict.ar*predict.Arima* predict.arima0* predict.glm [5] predict.HoltWinters* predict.lm predict.loess* predict.mlm [9] predict.nls* predict.poly predict.ppr* predict.prcomp* [13] predict.princomp* predict.smooth.spline* predict.smooth.spline.fit* predict.StructTS* Non-visible functions are asterisked and then e.g. predict.lm or getAnywhere(predict.ar) for those Non-visible HIH Stef On Tue, Oct 02, 2007 at 11:52:34PM +0900, Strong wrote: StrongDear All Strong StrongI am a biginner of R. Strong StrongI have difficulty with reading the code of a method. Strong StrongI am using the vars package to estimate a VAR model and I want to view the code of predict method for objects with class attribute varest. Strong StrongI thougt I could just type the name predict without anything to display the code of the method as I often do with generic function. However, I got the following messages: Strong Strongfunction (object, ...) StrongUseMethod(predict) Strongenvironment: namespace:stats Strong StrongI cannot figure out the meaning that the above message want to deliver to me and thus cannot find a way to view the code of the predict method with class varest. Strong StrongCan anyone help me out? Strong StrongThanks in advance. Strong Strong StrongStrong Chen Strong Strong__ StrongR-help@r-project.org mailing list Stronghttps://stat.ethz.ch/mailman/listinfo/r-help StrongPLEASE do read the posting guide http://www.R-project.org/posting-guide.html Strongand provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] permutations of a binary matrix with fixed margins
Jérôme, As a first attempt, how about the function below. It works (or not) by randomly sorting the rows and columns, then searching the table for squares with the corners = matrix(c(1,0,0,1),ncol=2) and subtracting them from 1 to give matrix(c(0,1,1,0),ncol=2) (and vice versa). Randomized matrices can be produced as a chain where each permutation is seeded with the previous one. Potential problems: 1. Does it really explore all possible permutations? Does it do it in an unbiased way? 2. Related to above: there is potential autocorrelation (although I haven't found it with my data set), so might need some dememorisation steps. 3. It's slow and dememorisation makes it slower. 4. It isn't clear to me whether it needs the added stochastic element, i.e. squares are only flipped if runif(1)0.5. In practice it seems to work without it (as far as I can tell, i.e. it isn't autocorrelated using my data set). I suspect there's a much better way of doing this, which might take the margins as an input, and therefore wouldn't be directly derived from any particular matrix structure. Paul ### # function to permute binary matrix keeping margins fixed. # the input x is the binary matrix to be permuted permute.struct- function(x) { x-x[rownames(x)[sample(1:nrow(x))],colnames(x)[sample(1:ncol(x))]] pattern-c(1,0,0,1) for(i in 1:(nrow(x)-1)) for(j in 1:(ncol(x)-1)) for(ii in (i+1):nrow(x)) for(jj in (j+1):ncol(x)) { query-c(x[c(i,ii),c(j,jj)]) if((all(query-pattern==0) || all(query==1-pattern)) runif(1)0.5) x[c(i,ii),c(j,jj)] - 1 - x[c(i,ii),c(j,jj)] } x } ### From: Mathieu Jérôme jerome.mathieu_at_snv.jussieu.fr Date: Thu 05 Apr 2007 - 13:34:01 GMT Dear R Users, How can I randomize a matrix (with contains only 0 and 1) with the constraint that margin totals (columns and rows sums) are kept constant? Some have called this swap permutation or something like that. The principle is to find bloc of 10 01 and change it for 01 10 there can be several rows or columns between these numbers. It probably exists in R, but I can't find the function. I am aware of permcont, but it works only for 2x2 matrices thanks in advance Jerome -- Jérôme Mathieu Laboratory of soil tropical ecology UPMC [EMAIL PROTECTED] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to add legend for image()?
Dear friends, The following is an example to explain my question. I want to get a legend which will show the z-values according to different colors in image() function. x-sort(runif(10)) #x-coordinates y-sort(runif(10)) #y-coordinates z-matrix(runif(100),nrow=10) #attributes values image(x,y,z,col=gray((6:3)/6)) # legend(x,y,legend=z,col=gray((6:3)/6)) #error. the colors should denote different z-values, i want to get this legend. Thanks. -- With Kind Regards, oooO: (..): :\.(:::Oooo:: ::\_)::(..):: :::)./::: ::(_/ : [***] Zhi Jie,Zhang ,PHD Tel:86-21-54237149 Dept. of Epidemiology,School of Public Health,Fudan University Address:No. 138 Yi Xue Yuan Road,Shanghai,China Postcode:200032 Email:[EMAIL PROTECTED] Website: www.statABC.com [***] oooO: (..): :\.(:::Oooo:: ::\_)::(..):: :::)./::: ::(_/ : [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lpSolve doesn't compile because of a malloc.h error
Yes, it was very clear but being an absolute beginner as a programmer I was misled by the presence of both stdlib.h and malloc.h in lpslink55.c (in a nutshell, I thought Why s it complaining if stdlib.h is there, in the code??). Anyway, thanks to your help I made it in the end. Ciao Vittorio Monday 01 October 2007 20:47:36 Prof Brian Ripley wrote: Isn't the message rather clear? You need to edit the package sources replace malloc.h by stdlib.h (or remove the former if the latter is already there, as in this case). And BTW, 'gcc-4.2.2' is unreleased, but imminent (as R 2.6.0 is). On Mon, 1 Oct 2007, vittorio wrote: Under freebsd 6.2-p7 i386, R 2.5.1,gcc-4.2.2 I'm unable to compile package lpSolve because: hpbsd# R CMD INSTALL lpSolve_5.5.8.tar.gz * Installing to library '/usr/local/lib/R/library' * Installing *source* package 'lpSolve' ... ** libs cc -std=gnu99 -I/usr/local/lib/R/include -I/usr/local/lib/R/include -I . -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE -DUSRDLL -DCLOCKTIME -DRoleIsExternalInvEngine -DINVERSE_ACTIVE=INVERSE_LUSOL -DINLINE=static -DParanoia -I/usr/local/include -D__NO_MATH_INLINES -fpic -O2 -fno-strict-aliasing -pipe -march=prescott -c colamd.c -o colamd.o cc -std=gnu99 -I/usr/local/lib/R/include -I/usr/local/lib/R/include -I . -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE -DUSRDLL -DCLOCKTIME -DRoleIsExternalInvEngine -DINVERSE_ACTIVE=INVERSE_LUSOL -DINLINE=static -DParanoia -I/usr/local/include -D__NO_MATH_INLINES -fpic -O2 -fno-strict-aliasing -pipe -march=prescott -c commonlib.c -o commonlib.o cc -std=gnu99 -I/usr/local/lib/R/include -I/usr/local/lib/R/include -I . -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE -DUSRDLL -DCLOCKTIME -DRoleIsExternalInvEngine -DINVERSE_ACTIVE=INVERSE_LUSOL -DINLINE=static -DParanoia -I/usr/local/include -D__NO_MATH_INLINES -fpic -O2 -fno-strict-aliasing -pipe -march=prescott -c lp_wlp.c -o lp_wlp.o cc -std=gnu99 -I/usr/local/lib/R/include -I/usr/local/lib/R/include -I . -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE -DUSRDLL -DCLOCKTIME -DRoleIsExternalInvEngine -DINVERSE_ACTIVE=INVERSE_LUSOL -DINLINE=static -DParanoia -I/usr/local/include -D__NO_MATH_INLINES -fpic -O2 -fno-strict-aliasing -pipe -march=prescott -c lpslink55.c -o lpslink55.o In file included from lpslink55.c:11: /usr/include/malloc.h:3:2: #error malloc.h has been replaced by stdlib.h *** Error code 1 Stop in /tmp/R.INSTALL.zVvtvK/lpSolve/src. ERROR: compilation failed for package 'lpSolve' ** Removing '/usr/local/lib/R/library/lpSolve' === Now, both malloc.h and stdlib.h are under /usr/include. What's wrong with it? Ciao Vittorio __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Calculating proportions from a data frame rather than a table
When one has raw data it is easy to create a table of one variable against another and then calculate proportions For example a.nice.table-table(a,b) prop.table(a.nice.table,1) However, I looked at several papers and created a data frame of the aggregate data. That means I acually created a table except it is a data frame. The first column lists the name of the first author and the year. I cannot find how to convert the data frame to a table so I can use great functions such as prop.table and margin.table. Alas I tried, rowSums but it provides lousy output without listing the names of th papers. I have thought of going through the reshape package but I suspect that there is an easier way to convert a data frame to a table. Is there? -- Farrel Buchinsky [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] mcv package gamm function Error in chol(XVX + S)
Well, it's certainly a problem detected by gamm, but whether it's a fundamental statistical problem with this model/data combination or something fixable by taking an alternative approach within gamm I can't tell. Would you be able to send me the data used here (I promise not to use it for anything else, of course), so I can figure out where the problem is? best, Simon I'm using gamm function from Simon Wood's mgcv package, to fit a spatial regression Generalized Additive Mixed Model, as covariates I have the geographical longitude and latitude locations of indexed data. I include a random effect for each district (dist) so the code is fit - gamm(y~s(lon,lat,bs=tp, m=2)+offset(log(exp.)), random=list(dist=~1), family=poisson, niterPQL=30) when I run the gamm function I obtain the next error message: %%% Maximum number of PQL iterations: 30 iteration 1 iteration 2 ... iteration 8 iteration 9 iteration 10 Error in chol(XVX + S) : the leading minor of order 29 is not positive definite %%% Could be any problem in gamm() ??? -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK +44 1225 386603 www.maths.bath.ac.uk/~sw283 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] zoo timeserie continuous? complete with NaN
dear r-list I have a zoo object with 2 objects and time: looks like: 2005-12-31 12:00:00 NA NaN 2005-12-31 13:00:00 NA NaN 2005-12-31 14:00:00 NA NaN 2005-12-31 15:00:00 NA NaN 2005-12-31 16:00:00 NA NaN 2005-12-31 18:00:00 NA NaN 2005-12-31 19:00:00 NA NaN 2005-12-31 20:00:00 NA NaN 2005-12-31 21:00:00 NA NaN 2005-12-31 22:00:00 NA NaN 2005-12-31 23:00:00 NA NaN its much longer though... As you can see, the values for 2005-12-31 17:00:00 miss. How can I let R show me if the timeseries is continuous and if not, to fill up with NaN values (i need a continuous timeseries). is there a function like complete.timeseries(missing=NA)? Thanks fpr your help Marc -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] plot question
Hello, I have a question about how to plot a series of data. The folloqing is my data matrix of n n 25p5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 16B-E06.g 47740 4676 6769 48 37B-B02.g 42860 6152 19276 72 35B-A02.g 48325 12863 38274 143 35B-A02.g 48410 12806 39013 175 35B-A02.g 48417 9057 40923 176 35B-A02.g 51403 13865 43338 161 45B-C12.g 50939 3656 5783 43 45B-C12.g 52356 5524 6041 55 45B-C12.g 49338 5141 5266 41 45B-C12.g 51567 3915 5677 43 35A-G04.g 40365 5513 6971 32 35B-D01.g 54217 12607 13067 93 35B-D01.g 55283 11441 14964 101 35B-D01.g 55041 9626 14928 94 35B-D01.g 54058 9465 14912 88 35B-A04.g 42745 12080 34271 105 35B-A04.g 41055 12423 34874 126 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is Intensity. I want to plot the data this way. x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p. y-axis is Intensity Inside of plot is the points of intensity over 4 concentrations, points from different genes have different color or shape. A regression line of each genes crosss different concetrations, and at the end of line is gene IDs. Thanks, Tiandao __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fitted values in LMER for the fixed-effects only
On 9/29/07, Dieter Menne [EMAIL PROTECTED] wrote: On 9/28/07, Anouk Simard Anouk.Simard at bio.ulaval.ca wrote: I would like to extract the fitted values from a model using LMER but only for the fix portion of the model and not for the fix and random portion (e.g it is the procedure outpm or outp in SAS). .. Quoting Douglas Bates bates at stat.wisc.edu The calculation of the level = 0 fitted values in the new representation of the fitted model is quite easy. It is [EMAIL PROTECTED] %*% fixef(fm1) Douglas Bates replied Yes. Mmm.. If I consider a few hundreds of messages in this list on the subject, it's considered dirty to use the [EMAIL PROTECTED] or fm1$X construct. So should we expect that this is final (no longer fitted(), augPred etc), or is it work in prooogres? This is to indicate that I am rather slow in getting the lme4 package finished? :-) I agree that I am and no one is more impatient with the progress than I. You are correct that it is considered bad form to reach into a fitted model structure and pull out individual components or slots. My brief answer agreeing that this was still the best way of obtaining these particular values should have emphasized the still. I haven't thought of a good way of specifying this particular form in, say, the fitted method. There is a clean way of specifying the levels of random effects incorporated into the fitted values for models with nested random effects only. However, for crossed or partially crossed random effects this is less clearly defined. Perhaps a better alternative would be to define a method for the model.matrix generic for lmer models. Then the calculation could be written as model.matrix(fm1, type = fixed) %*% fixef(fm1) using only extractor functions. It may be surprising that sometimes the most difficult part of the design of the code in a package is deciding on the names and arguments of functions or methods. In my experience rushing such decisions often leads to a lot of maintenance headaches. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to add legend for image()?
Have you tried filled.contour()? It automatically generates a legend. However, I'm not sure what is going on with your x's and y's being sorted random numbers. I assume your actual data are not like this. If you do not have data in an equally spaced grid you may need to create one by some sort of interpolation (e.g. Kriging or inverse distance weighting). On 10/2/07, zhijie zhang [EMAIL PROTECTED] wrote: Dear friends, The following is an example to explain my question. I want to get a legend which will show the z-values according to different colors in image() function. x-sort(runif(10)) #x-coordinates y-sort(runif(10)) #y-coordinates z-matrix(runif(100),nrow=10) #attributes values image(x,y,z,col=gray((6:3)/6)) # legend(x,y,legend=z,col=gray((6:3)/6)) #error. the colors should denote different z-values, i want to get this legend. Thanks. -- With Kind Regards, oooO: (..): :\.(:::Oooo:: ::\_)::(..):: :::)./::: ::(_/ : [***] Zhi Jie,Zhang ,PHD Tel:86-21-54237149 Dept. of Epidemiology,School of Public Health,Fudan University Address:No. 138 Yi Xue Yuan Road,Shanghai,China Postcode:200032 Email:[EMAIL PROTECTED] Website: www.statABC.com [***] oooO: (..): :\.(:::Oooo:: ::\_)::(..):: :::)./::: ::(_/ : [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Apparently Conflicting Results with coxph
From my experience, what you are seeing is almost certainly a patient selection effect. (The number 1 reason for puzzling results is incorrect coding of a time-dependent covariate, but you appear to have been quite careful). Assigning the implant as a non-time dependent covariate almost guarrantees that the estimated effect will be beneficial. The only people who get an implant are those who live longer than average (long enough to get an implant). The size of such a bias is surprisingly large. The problem is rediscovered in the cancer field every few years, in comparisons of responders to non-responders. As a time-dependent covariate, you have the problem of indication for treatment. Say for instance that the devices were very expensive, and were only used for patients in immenent danger of death. For a device that was a placebo you would find, not surprisingly, that being selected for implantation carried a major risk. The device may need to be extremely effective to overcome this type of bias. As a simple example, if you compare the death rate of those who have seen a oncologist (cancer doc) in the last month to those who have not done so, you find that the former group has a much higher death rate. Terry Therneau Kevin E. Thorpe wrote: Dear List: I have a data frame prepared in the couting process style for including a binary time-dependent covariate. The first few rows look like this. PtNo StartEnd Status Imp 1 1 0 608.0 0 0 2 2 0 513.0 0 0 3 2 513 887.0 0 1 4 3 0 57.0 0 0 5 357 604.0 0 1 6 4 0 150.0 1 0 The outcome is mortality and the covariate is for an implantable defibrillator, so it is expected that the implant would reduce the risk of death. The results of fitting coxph (survival package) are: Call: coxph(formula = Surv(Start, End, Status) ~ Imp, data = nina.excl) coef exp(coef) se(coef) zp Imp 0.163 1.180.485 0.337 0.74 Likelihood ratio test=0.11 on 1 df, p=0.738 n= 335 Since this was unexpected, I created a non-counting process data frame with an indicator variable representing received an implant or not. Here are the results: Call: coxph(formula = Surv(Days, Dead) ~ Implant, data = nina.excl0) coef exp(coef) se(coef) z p Implant -1.77 0.1710.426 -4.15 3.3e-05 Likelihood ratio test=19.1 on 1 df, p=1.21e-05 n= 197 I found this degree of discrepancy surprising, especially the point estimate of the coefficient. I have verified the data frames are set up correctly. Here is what I have tried to understand what is going on. I tried fitting models adjusted for other covariates that I have in the data frame. This did not appreciably affect the coefficients for the implant variable. I ran cox.zph on the two models shown above and plotted the results. In both cases, the point estimate of Beta(t) is sort of parabolic in that the curves are monotonically increasing to a local maximum after which they are monotonically decreasing (the CIs are a bit more wiggly). I would interpret this to mean that the effect of implant is probably time-dependent. If so, how do I actually get a proper estimate of beta(t) for a variable like this? Are there some other things I should look at to understand what's going on? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Apparently Conflicting Results with coxph
On Tue, 2 Oct 2007, Kevin E. Thorpe wrote: Kevin E. Thorpe wrote: Peter Dalgaard wrote: Kevin E. Thorpe wrote: Dear List: I have a data frame prepared in the couting process style for including a binary time-dependent covariate. The first few rows look like this. PtNo StartEnd Status Imp 1 1 0 608.0 0 0 2 2 0 513.0 0 0 3 2 513 887.0 0 1 4 3 0 57.0 0 0 5 357 604.0 0 1 6 4 0 150.0 1 0 The outcome is mortality and the covariate is for an implantable defibrillator, so it is expected that the implant would reduce the risk of death. The results of fitting coxph (survival package) are: Call: coxph(formula = Surv(Start, End, Status) ~ Imp, data = nina.excl) coef exp(coef) se(coef) zp Imp 0.163 1.180.485 0.337 0.74 Likelihood ratio test=0.11 on 1 df, p=0.738 n= 335 Since this was unexpected, I created a non-counting process data frame with an indicator variable representing received an implant or not. Here are the results: Call: coxph(formula = Surv(Days, Dead) ~ Implant, data = nina.excl0) coef exp(coef) se(coef) z p Implant -1.77 0.1710.426 -4.15 3.3e-05 Likelihood ratio test=19.1 on 1 df, p=1.21e-05 n= 197 I found this degree of discrepancy surprising, especially the point estimate of the coefficient. I have verified the data frames are set up correctly. Here is what I have tried to understand what is going on. I tried fitting models adjusted for other covariates that I have in the data frame. This did not appreciably affect the coefficients for the implant variable. I ran cox.zph on the two models shown above and plotted the results. In both cases, the point estimate of Beta(t) is sort of parabolic in that the curves are monotonically increasing to a local maximum after which they are monotonically decreasing (the CIs are a bit more wiggly). I would interpret this to mean that the effect of implant is probably time-dependent. If so, how do I actually get a proper estimate of beta(t) for a variable like this? Are there some other things I should look at to understand what's going on? If you want to play with time-dependent regression coefficients have a look at the timereg package and the book that it supports. However, first you need to consider the possibility of selection effects that can take place even with non-varying effects. In the case at hand I would suspect a bias created by the fact that you don't implant devices into people who are already dead. Thanks. The point in your last paragraph did cross my mind too. I thought about this some more, and I'm not sure that possibility is to blame. In my time-dependent model, I don't think I'm doing anything different than is done for transplant in the Stanford Heart Study (the often used example for this kind of time-dependent covariate). As in my case, you would not transplant a dead patient. So, I remain puzzled as to why my model is misbehaving. Indeed, the bias created that Peter alludes to is precisely because you would not implant a device in a dead patient that the implantation of such a device has an apparent negative effect on hazard of mortality. If a patient is enrolled and a date is set (perhaps implicitly) for implanting the device, then a patient who survives longer is more likely to receive the device. Here is a setup in which patients have exponential hazard and are scheduled to receive devices uniformly over a fixed time interval following their enrollment. As you see the time of intended implant is independent of the time of death by construction, but the apparent effect of implantation is negative. time.of.death - rexp(1000) time.of.implant - runif(1000,0,2) implant.occurred - time.of.death time.of.implant coxph( Surv( time.of.death ) ~ implant.occurred ) Call: coxph(formula = Surv(time.of.death) ~ implant.occurred) coef exp(coef) se(coef) z p implant.occurredTRUE -1.80 0.1650.084 -21.5 0 Likelihood ratio test=519 on 1 df, p=0 n= 1000 HTH, Chuck -- Kevin E. Thorpe Biostatistician/Trialist, Knowledge Translation Program Assistant Professor, Department of Public Health Sciences Faculty of Medicine, University of Toronto email: [EMAIL PROTECTED] Tel: 416.864.5776 Fax: 416.864.6057 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
Re: [R] Plot for two Series
Here is an example: ts.plot(ts(1:10, start = 2000), ts(2:8, start = 2005)) On 10/2/07, amna khan [EMAIL PROTECTED] wrote: Sir the data used in function ts.plot() is of the form Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov 1974 2134 1863 1877 1877 1492 1249 1280 1131 1209 1492 1621 1975 2103 2137 2153 1833 1403 1288 1186 1133 1053 1347 1545 1976 2020 2750 2283 1479 1189 1160 1113 970 999 1208 1467 1977 2240 1634 1722 1801 1246 1162 1087 1013 959 1179 1229 1978 2019 2284 1942 1423 1340 1187 1098 1004 970 1140 1110 1979 2263 1820 1846 1531 1215 1075 1056 975 940 1081 1294 But I have the data series of the form YEAR Khanpur 1 195252.1 2 195322.4 3195426.4 41955 132.8 5 195671.9 6 195710.4 7 1958 9.1 8 195949.8 because of which the sacle on x-axis are the numbers 1,2 3,..etc. Please guid in this regard Thank you On 10/1/07, Gabor Grothendieck [EMAIL PROTECTED] wrote: See ?ts.plot Also ?plot.zoo and ?xyplot.zoo in the zoo package. On 10/1/07, amna khan [EMAIL PROTECTED] wrote: Dear Sir I want to plot the two time series having unequal number of observations, say, one series having values from year 1960 to year 2000 and other having values from year 1975 to year 2000. How to plot them on a single graph sheet. Regards -- AMINA SHAHZADI Department of Statistics GC University Lahore, Pakistan. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- AMINA SHAHZADI Department of Statistics GC University Lahore, Pakistan. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot question
If I've correctly interpreted what you want, you first need to get the x values: x - colnames(n) x - as.numeric(substr(x, 1, nchar(x) - 1)) Then it seems fairly easy to use matplot to get the values with different colors for each concentration dim(x) - c(length(x), 1) matplot(x, t(n), pch = 1) But this does not look like a simple line will fit the data for each gene well, so perhaps I've misunderstood something. You will have to decide how you want to do the regression. It will also get very messy and difficult to read with 20 lines (a different regression for each gene). To do the regressions, plot the lines, and label with the gene ID, see ?lm ?predict ?abline ?text On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Hello, I have a question about how to plot a series of data. The folloqing is my data matrix of n n 25p5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 16B-E06.g 47740 4676 6769 48 37B-B02.g 42860 6152 19276 72 35B-A02.g 48325 12863 38274 143 35B-A02.g 48410 12806 39013 175 35B-A02.g 48417 9057 40923 176 35B-A02.g 51403 13865 43338 161 45B-C12.g 50939 3656 5783 43 45B-C12.g 52356 5524 6041 55 45B-C12.g 49338 5141 5266 41 45B-C12.g 51567 3915 5677 43 35A-G04.g 40365 5513 6971 32 35B-D01.g 54217 12607 13067 93 35B-D01.g 55283 11441 14964 101 35B-D01.g 55041 9626 14928 94 35B-D01.g 54058 9465 14912 88 35B-A04.g 42745 12080 34271 105 35B-A04.g 41055 12423 34874 126 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is Intensity. I want to plot the data this way. x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p. y-axis is Intensity Inside of plot is the points of intensity over 4 concentrations, points from different genes have different color or shape. A regression line of each genes crosss different concetrations, and at the end of line is gene IDs. Thanks, Tiandao __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to create this graph?
You could create this graph using barplot (give a matrix with the input), then use the text and possibly lines or segments functions to add the text and any additional lines needed. However, this graph is not clear to me, the box with 41 in it looks bigger than the box with 59, are those supposed to be the values of the boxes? Or somethning else. If you can give a better description of what you are trying to accomplish, then maybe we can give better advice (including possibly a better graph to create). -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of squall44 Sent: Monday, October 01, 2007 1:42 AM To: r-help@r-project.org Subject: [R] how to create this graph? Hello, I'm trying to create the following graph: http://www.nabble.com/file/p12974686/bar.gif Is there a command (like pie(...) or barplot(...)) or is this just a special case of eg barplot? I'd be glad if someone could give me a hint on where to start. Regards, Tobias -- View this message in context: http://www.nabble.com/how-to-create-this-graph--tf4546742.html #a12974686 Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] continuous boxplot?
Here are a couple of things that might help: Look at the hexbin package (I think it is on bioconductor rather than cran) Look at the quantreg package (for estimating the quantiles to plot) Look at the running and wapply functions in the gtools package for another option to estimate the quantiles. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Karin Lagesen Sent: Monday, October 01, 2007 2:32 AM To: r-help@r-project.org Subject: [R] continuous boxplot? I have two vectors x and y, which I would like to plot against each other. I am also displaying other data in this plot. However, I have about 1 million points to plot, and just plotting them x againt y is not very informative. What I'd like to do is to do sort of a continuous box plot. My x values goes from -1 to 1 and my y values from 0 to 1, so I´d like to plot the median and quantiles, and possibly also all of the outliers somehow. Are there any facilities in R for doing something like this, or would I need to do this the hard coded way? Thankyou very much for your help! Karin -- Karin Lagesen, PhD student [EMAIL PROTECTED] http://folk.uio.no/karinlag __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] phyper returns negative results
On Tue, 2 Oct 2007, Bastian Angermann wrote: Dear R-users, In R 2.5.1 on Windows XP, SP2 the call to phyper(0,0,74,3,lower.tail=FALSE) returns -4.195862e-17. This does not happen with R2.5.1 on Linux, 0 is returned. Is this a bug (and should be reported as such), since phyper should not return negative values, or is this considered as one of the annoyances of finite precision arithmetic. I tend to think of this as a bug since it breaks the assumption phyper()=0. You are specifically asked to try out the latest version before calling 'bug' (in the R FAQ, in the R posting guide ...). R version 2.6.0 RC (2007-10-01 r43049) ... phyper(0,0,74,3,lower.tail=FALSE) [1] 0 on Windows. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] continuous boxplot?
Karin, I like to use bagplots in these cases where there are a lot of cases and scatter plots become one big smudge. See http://www.wiwi.uni-bielefeld.de/~wolf/software/R-wtools/bagplot/bagplot.pdf And some further examples on slides 36 - 39 of http://www.porzak.com/JimArchive/JimPorzak_CIwithR_useR2006_tutorial.pdf -- HTH, Jim Porzak Responsys, Inc. San Francisco, CA http://www.linkedin.com/in/jimporzak On 10/1/07, Karin Lagesen [EMAIL PROTECTED] wrote: I have two vectors x and y, which I would like to plot against each other. I am also displaying other data in this plot. However, I have about 1 million points to plot, and just plotting them x againt y is not very informative. What I'd like to do is to do sort of a continuous box plot. My x values goes from -1 to 1 and my y values from 0 to 1, so I´d like to plot the median and quantiles, and possibly also all of the outliers somehow. Are there any facilities in R for doing something like this, or would I need to do this the hard coded way? Thankyou very much for your help! Karin -- Karin Lagesen, PhD student [EMAIL PROTECTED] http://folk.uio.no/karinlag __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Variable selection in R
Disclaimer : Short of having local statistical expertise at hand, I'm using this list because I use R for variable selection in the context of linear multiple regression but the questions I have relate more to basic statistics than to R per se. Please redirect me to another appropriate list if such a list exists. I have the very common problem of identifying which (subset of) variables are important in a multiple linear regression problem. Googling and browsing around, I ended up with four methods I could easily access through R packages. My aim was to use those methods with continuous variables in a bid to find out which had an effect on my dependent variable, but more so I thought this would lead me to identify variables amenable to categorization. Example : I have population as a continuous variable so I figured if some variable selection algorithm flagged population as significant, then I could try and apply a priori knowledge of the classes of population ranges I suspect could show different behaviours in the form of statistically different means as per ANOVA analysis. Coming back to my original variable selection need, here's what I did. First : regular lm Second : step Third : all-subsets (regsubsets, package leaps) Fourth : lasso (l1ce, package lasso2) Fifth : (I meant to use the lars package, but it does not allow for formulas; I know I could cast my dataset as matrices, but I didn't find an easy way of doing this and I figured I had enough options already) I'm trying to make sense of the information that is sent back at me from the summary calls. I'm looking for what variables are identified by each method, hoping to find comparable/complementary results. Given I'm a transient user of statistics, I find many of the Details sections on the help files lack specific instructions as to exactly how to interpret the results. Here we go *lm* anova(tonlm) Analysis of Variance Table Response: Cout.ton Df Sum Sq Mean Sq F valuePr(F) Tonnage 197209720 1.3497 0.2470437 Popul 1 112361 112361 15.6014 0.0001164 *** DensiteOcc1 173350 173350 24.0699 2.245e-06 *** NbmRues.hab 1 280 280 0.0389 0.8438903 RFU.hab 1 183 183 0.0254 0.8734816 UO.MAMR.Precis1 67161 67161 9.3254 0.0026428 ** t.CS.t.déchets.MAMR 1 24188 24188 3.3586 0.0686925 . Pct.Pot.CS.hab1 78764 78764 10.9365 0.0011614 ** NbCentresTriDs100km 1 218725 218725 30.3702 1.380e-07 *** DistMarche1 54114 54114 7.5137 0.0068110 ** Residuals 162 11667177202 *ANOVA applied on step(lm) * Response: Cout.ton Df Sum Sq Mean Sq F valuePr(F) Popul 129172917 0.4038 0.525990 DensiteOcc1 163741 163741 22.6730 4.179e-06 *** RFU.hab 1 83 83 0.0115 0.914899 UO.MAMR.Precis1 168407 168407 23.3191 3.112e-06 *** Pct.Pot.CS.hab1 122685 122685 16.9880 5.943e-05 *** NbCentresTriDs100km 1 190659 190659 26.4002 7.761e-07 *** DistMarche1 65467 65467 9.0652 0.003015 ** Residuals 165 11916067222 Questions compared to lm above : What tells me which variable was selected first in the stepwise process ? Do I sort Pr(F), the lowest value of which corresponds to the first variable ? Popul has a 3 star rating in lm and nothing in step. How do I interpret that ? * all-subsets regression * summary(tonall)$cp [1] 37.338138 31.452375 25.300272 17.965950 13.751043 10.021910 8.455827 8.810498 9.273599 11.00 summary(tonall)$adjr2 [1] 0.2155976 0.2411378 0.2680048 0.2997661 0.3197652 0.3381030 0.3481410 0.3506880 0.3528339 0.3499369 summary(tonall)$which[1,][summary(tonall)$which[1,]] (Intercept) DistMarche summary(tonall)$which[2,][summary(tonall)$which[2,]] (Intercept), NbCentresTriDs100km, DistMarche summary(tonall)$which[3,][summary(tonall)$which[3,]] (Intercept), Tonnage, UO.MAMR.Precis, NbCentresTriDs100km summary(tonall)$which[4,][summary(tonall)$which[4,]] (Intercept), Tonnage, DensiteOcc, UO.MAMR.Precis, NbCentresTriDs100km summary(tonall)$which[5,][summary(tonall)$which[5,]] (Intercept), Tonnage, DensiteOcc, UO.MAMR.Precis NbCentresTriDs100km, DistMarche omitting the remaining values up to which[10,] Questions w/r to lm and step : all-subsets says DistMarche is the single most important. That makes some sense as that variable had a two-star rating in both lm and step. But shouldn't the 3-star ratings in step be close to those in all-subsets ? For example, the best 3-variable model shows Tonnage popping up. Tonnage has no rating in lm and doesn't even show in step. Is it a matter of step being initialized with a variable such that Tonnage will never be considered whereas it is in an exhaustive all-subsets regression ? I'm puzzled. * lasso *
Re: [R] plot question
Thanks for your quick reply, Eric. I want plot colnames(n) as string on x-axis. If the regression lines don't fit the data very well, it is OK, the plot is only for quality check. On Tue, 2 Oct 2007, Eric Thompson wrote: If I've correctly interpreted what you want, you first need to get the x values: x - colnames(n) x - as.numeric(substr(x, 1, nchar(x) - 1)) Then it seems fairly easy to use matplot to get the values with different colors for each concentration dim(x) - c(length(x), 1) matplot(x, t(n), pch = 1) But this does not look like a simple line will fit the data for each gene well, so perhaps I've misunderstood something. You will have to decide how you want to do the regression. It will also get very messy and difficult to read with 20 lines (a different regression for each gene). To do the regressions, plot the lines, and label with the gene ID, see ?lm ?predict ?abline ?text On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Hello, I have a question about how to plot a series of data. The folloqing is my data matrix of n n 25p5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 16B-E06.g 47740 4676 6769 48 37B-B02.g 42860 6152 19276 72 35B-A02.g 48325 12863 38274 143 35B-A02.g 48410 12806 39013 175 35B-A02.g 48417 9057 40923 176 35B-A02.g 51403 13865 43338 161 45B-C12.g 50939 3656 5783 43 45B-C12.g 52356 5524 6041 55 45B-C12.g 49338 5141 5266 41 45B-C12.g 51567 3915 5677 43 35A-G04.g 40365 5513 6971 32 35B-D01.g 54217 12607 13067 93 35B-D01.g 55283 11441 14964 101 35B-D01.g 55041 9626 14928 94 35B-D01.g 54058 9465 14912 88 35B-A04.g 42745 12080 34271 105 35B-A04.g 41055 12423 34874 126 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is Intensity. I want to plot the data this way. x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p. y-axis is Intensity Inside of plot is the points of intensity over 4 concentrations, points from different genes have different color or shape. A regression line of each genes crosss different concetrations, and at the end of line is gene IDs. Thanks, Tiandao __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot question
Okay. If you want to customize the axis labels, you can suppress the defaults by changing the matplot call to matplot(x, t(n), pch = 1, axes = FALSE) And then adding them how you want: axis(side = 2) axis(side = 3, at = x, lab = colnames(n)) box() On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Thanks for your quick reply, Eric. I want plot colnames(n) as string on x-axis. If the regression lines don't fit the data very well, it is OK, the plot is only for quality check. On Tue, 2 Oct 2007, Eric Thompson wrote: If I've correctly interpreted what you want, you first need to get the x values: x - colnames(n) x - as.numeric(substr(x, 1, nchar(x) - 1)) Then it seems fairly easy to use matplot to get the values with different colors for each concentration dim(x) - c(length(x), 1) matplot(x, t(n), pch = 1) But this does not look like a simple line will fit the data for each gene well, so perhaps I've misunderstood something. You will have to decide how you want to do the regression. It will also get very messy and difficult to read with 20 lines (a different regression for each gene). To do the regressions, plot the lines, and label with the gene ID, see ?lm ?predict ?abline ?text On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Hello, I have a question about how to plot a series of data. The folloqing is my data matrix of n n 25p5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 16B-E06.g 47740 4676 6769 48 37B-B02.g 42860 6152 19276 72 35B-A02.g 48325 12863 38274 143 35B-A02.g 48410 12806 39013 175 35B-A02.g 48417 9057 40923 176 35B-A02.g 51403 13865 43338 161 45B-C12.g 50939 3656 5783 43 45B-C12.g 52356 5524 6041 55 45B-C12.g 49338 5141 5266 41 45B-C12.g 51567 3915 5677 43 35A-G04.g 40365 5513 6971 32 35B-D01.g 54217 12607 13067 93 35B-D01.g 55283 11441 14964 101 35B-D01.g 55041 9626 14928 94 35B-D01.g 54058 9465 14912 88 35B-A04.g 42745 12080 34271 105 35B-A04.g 41055 12423 34874 126 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is Intensity. I want to plot the data this way. x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p. y-axis is Intensity Inside of plot is the points of intensity over 4 concentrations, points from different genes have different color or shape. A regression line of each genes crosss different concetrations, and at the end of line is gene IDs. Thanks, Tiandao __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem loading a txt file as a data.frame
Hello all, I know that this is a terribly banal question but I cannot seem to solve it. I am trying to load in data from a tab-delimited text file. Some columns are mixed text-numbers and other columns are strictly numbers. Some cells are blank. My command is: MDMT_RPup - read.table(GCRMA_MDM-T_RPup.txt, header=T, sep=\t, row.names=NULL, fill=TRUE) The problem is that this text file should load with 118 rows, 7 columns of data. But when I execute the following command: dim(MDMT_RPup) I only get: 40 7 Other similarly generated text files of varying sizes (9 rows, 72 rows, etc.) seem to load just fine. I have tried opening the text file in MS Excel and deleting all columns to the right of the data in case there was some kind of hidden character. No help. Any ideas? --Nick __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] continuous boxplot?
Folks: I found the references in the previous replies to this vexing data visualization issue to be quite interesting and useful. I think it fair to say that there is no single best way to do this -- it all depends on what you need to learn , and probably several alternative displays will be necessary to get the important information the data have to convey. However,as always, this issue has been considered before, and it may be worthwhile to at least consider an already available standard approach using shingles and a trellis-type plot. ?xyplot and ?shingle should get you started (you probably want to shingle or bin on quantiles of y). The canonical reference is Bill Cleveland's VISUALIZING DATA (see coplots). Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jim Porzak Sent: Tuesday, October 02, 2007 11:19 AM To: Karin Lagesen Cc: r-help@r-project.org Subject: Re: [R] continuous boxplot? Karin, I like to use bagplots in these cases where there are a lot of cases and scatter plots become one big smudge. See http://www.wiwi.uni-bielefeld.de/~wolf/software/R-wtools/bagplot/bagplot.pdf And some further examples on slides 36 - 39 of http://www.porzak.com/JimArchive/JimPorzak_CIwithR_useR2006_tutorial.pdf -- HTH, Jim Porzak Responsys, Inc. San Francisco, CA http://www.linkedin.com/in/jimporzak On 10/1/07, Karin Lagesen [EMAIL PROTECTED] wrote: I have two vectors x and y, which I would like to plot against each other. I am also displaying other data in this plot. However, I have about 1 million points to plot, and just plotting them x againt y is not very informative. What I'd like to do is to do sort of a continuous box plot. My x values goes from -1 to 1 and my y values from 0 to 1, so I4d like to plot the median and quantiles, and possibly also all of the outliers somehow. Are there any facilities in R for doing something like this, or would I need to do this the hard coded way? Thankyou very much for your help! Karin -- Karin Lagesen, PhD student [EMAIL PROTECTED] http://folk.uio.no/karinlag __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Strange names when creating a data.frame: Difference between - and =
This is just a curiosity question. Why do the two different syntaxes for df1 and df2 give such different results in the names(dfx)? Thanks df1 - data.frame(nas = c(A, B , B ,C ,B, A, D),nums = c(3,2,1, 1, 2,3, 7)) df2 - data.frame(nas - c(A, B , B ,C ,B, A, D),nums - c(3,2,1, 1, 2,3, 7)) df1; df2 df1 nas nums 1 A3 2 B2 . . df2 nasc..ABBCBAD.. numsc.3..2..1..1..2..3..7. 1 A 3 2 B 3 . . __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Adehabitat package question - trying to generate animal home ranges
Hello, I'm new to R and am trying to use the adehabitat package for home range and habitat selection analysis for some animal radiotelemetry data sets I have, but I can't get the home range functions (mcp kernelUD) to work with my data. I think that my problem is that I don't properly understand how the package uses data frames and factors, and I may not be defining the id factor correctly. I've started out by trying to imitate the code in the homerange.r demo that comes with the ade package. The code I'm trying to repeat with my data is: ### ### ### ### loads the data library(adehabitat) data(puechabon) xy-puechabon$locs[,c(X,Y)] id-puechabon$locs$Name ## The data are: xy[1:4,] ## relocations coordinates id[1:4] ## ID ### ### ### ### Home ranges ## MCP hr-mcp(xy, id) ## home range estimation plot(hr)## displays the MCP (jj-mcp.area(xy, id)) ## home range size plot(jj)## plots home-range size ## Kernel home range (hr-kernelUD(xy, id, h=LSCV, grid=100)) ## UD estimation plotLSCV(hr) ## LSCV criterion image(hr) ## displays the UD (jj-kernel.area(xy, id)) ## home range size plot(jj) ## Plots home range size ver - getverticeshr(hr) ## home-range contours plot(ver) ## Plots contours == The following is as far as I have gotten with my attempt to repeat the above: require(adehabitat) ... ## I can get my data into the data frame object test test = read.csv(c:/rjunk/testdata.csv, header=T) test BIRD_LOC BIRD SEX DATE TIME DATEVALUE SEASON X Y 1 CBF1001 CBF F 5/18/2004 1756 38125 04BR 675022 5302087 2 CBF1002 CBF F 5/19/2004 1817 38126 04BR 674988 5302111 ... 90 CBM1043 CBM M 9/28/2004 739 38258 04BR 675305 5302229 91 CBM1044 CBM M 10/1/2004 738 38261 04BR 675005 5302109 ## What does the $locs do? Am I correct that this is referring to something that I don't have defined in my data frame? xy-test$locs[,c(X,Y)] id-test$locs$bird xy NULL id NULL ## I figured out how to subset the data into my XY coordinates and the BIRD id lables, but the mcp and kernelUD functions don't work with these subsets. Do I need to do something else to identify BIRD as a factor? Is there a data prep step that I'm missing? xy - subset(test, select = c(X, Y)) xy X Y 1 675022 5302087 2 674988 5302111 ... 90 675305 5302229 91 675005 5302109 id - subset(test, select = c(BIRD)) id BIRD 1 CBF 2 CBF ... 90 CBM 91 CBM hr - mcp(xy, id) Error in mcp(xy, id) : xy and id should be of the same length hr - kernelUD(xy, id, h=LSCV, grid=100) Error in kernelUD(xy, id, h = LSCV, grid = 100) : id should have the same length as xy --- Peter Singleton Wildlife Ecologist USFS Pacific Northwest Research Station 1133 N. Western Ave. Wenatchee WA 98801 Phone: (509)664-1732 Fax: (509)665-8362 E-mail: [EMAIL PROTECTED] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot question
However, I got the following msg. matplot(colnames(n), t(n), pch = 1, axes = FALSE) Error in plot.window(xlim, ylim, log, asp, ...) : need finite 'xlim' values In addition: Warning messages: 1: NAs introduced by coercion in: as.double.default(x) 2: no non-missing arguments to min; returning Inf in: min(x) 3: no non-missing arguments to max; returning -Inf in: max(x) 4: NAs introduced by coercion in: as.double.default(x) On Tue, 2 Oct 2007, Eric Thompson wrote: Okay. If you want to customize the axis labels, you can suppress the defaults by changing the matplot call to matplot(x, t(n), pch = 1, axes = FALSE) And then adding them how you want: axis(side = 2) axis(side = 3, at = x, lab = colnames(n)) box() On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Thanks for your quick reply, Eric. I want plot colnames(n) as string on x-axis. If the regression lines don't fit the data very well, it is OK, the plot is only for quality check. On Tue, 2 Oct 2007, Eric Thompson wrote: If I've correctly interpreted what you want, you first need to get the x values: x - colnames(n) x - as.numeric(substr(x, 1, nchar(x) - 1)) Then it seems fairly easy to use matplot to get the values with different colors for each concentration dim(x) - c(length(x), 1) matplot(x, t(n), pch = 1) But this does not look like a simple line will fit the data for each gene well, so perhaps I've misunderstood something. You will have to decide how you want to do the regression. It will also get very messy and difficult to read with 20 lines (a different regression for each gene). To do the regressions, plot the lines, and label with the gene ID, see ?lm ?predict ?abline ?text On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Hello, I have a question about how to plot a series of data. The folloqing is my data matrix of n n 25p5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 16B-E06.g 47740 4676 6769 48 37B-B02.g 42860 6152 19276 72 35B-A02.g 48325 12863 38274 143 35B-A02.g 48410 12806 39013 175 35B-A02.g 48417 9057 40923 176 35B-A02.g 51403 13865 43338 161 45B-C12.g 50939 3656 5783 43 45B-C12.g 52356 5524 6041 55 45B-C12.g 49338 5141 5266 41 45B-C12.g 51567 3915 5677 43 35A-G04.g 40365 5513 6971 32 35B-D01.g 54217 12607 13067 93 35B-D01.g 55283 11441 14964 101 35B-D01.g 55041 9626 14928 94 35B-D01.g 54058 9465 14912 88 35B-A04.g 42745 12080 34271 105 35B-A04.g 41055 12423 34874 126 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is Intensity. I want to plot the data this way. x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p. y-axis is Intensity Inside of plot is the points of intensity over 4 concentrations, points from different genes have different color or shape. A regression line of each genes crosss different concetrations, and at the end of line is gene IDs. Thanks, Tiandao __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem loading a txt file as a data.frame
On Tue, 2 Oct 2007, Ettinger, Nicholas wrote: Hello all, I know that this is a terribly banal question but I cannot seem to solve it. I am trying to load in data from a tab-delimited text file. Some columns are mixed text-numbers and other columns are strictly numbers. Some cells are blank. My command is: MDMT_RPup- read.table(GCRMA_MDM-T_RPup.txt, header=T, sep=\t, row.names=NULL, fill=TRUE) You have specified neither quote nor comment.char. The default for quote is \'. You might try changing it to or \. The problem is that this text file should load with 118 rows, 7 columns of data. But when I execute the following command: dim(MDMT_RPup) I only get: 40 7 Look at each of the rows. Where things start going crazy you need to inspect the data. You might use readLines() and cat() to read in and inspect the data rather than a text editor. HTH, Chuck Other similarly generated text files of varying sizes (9 rows, 72 rows, etc.) seem to load just fine. I have tried opening the text file in MS Excel and deleting all columns to the right of the data in case there was some kind of hidden character. No help. Any ideas? --Nick __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot question
But you did not use the command I suggested: you replaced x with colnames(n), which is a vector of characters. Characters such as 25p has little meaning for plotting: as.numeric(25p) [1] NA So you've tried to plot a bunch of NA's, which is why it doesn't know what limits to use for xlim. That is why I gave you the commands to convert that character array colnames(n) to a meaningful numeric array. It is unreasonable to expect R to guess that 25p = 25. x - colnames(n) x - as.numeric(substr(x, 1, nchar(x) - 1)) matplot(x, t(n), pch = 1, axes = FALSE) axis(side = 2) axis(side = 3, at = x, lab = colnames(n)) box() On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: However, I got the following msg. matplot(colnames(n), t(n), pch = 1, axes = FALSE) Error in plot.window(xlim, ylim, log, asp, ...) : need finite 'xlim' values In addition: Warning messages: 1: NAs introduced by coercion in: as.double.default(x) 2: no non-missing arguments to min; returning Inf in: min(x) 3: no non-missing arguments to max; returning -Inf in: max(x) 4: NAs introduced by coercion in: as.double.default(x) On Tue, 2 Oct 2007, Eric Thompson wrote: Okay. If you want to customize the axis labels, you can suppress the defaults by changing the matplot call to matplot(x, t(n), pch = 1, axes = FALSE) And then adding them how you want: axis(side = 2) axis(side = 3, at = x, lab = colnames(n)) box() On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Thanks for your quick reply, Eric. I want plot colnames(n) as string on x-axis. If the regression lines don't fit the data very well, it is OK, the plot is only for quality check. On Tue, 2 Oct 2007, Eric Thompson wrote: If I've correctly interpreted what you want, you first need to get the x values: x - colnames(n) x - as.numeric(substr(x, 1, nchar(x) - 1)) Then it seems fairly easy to use matplot to get the values with different colors for each concentration dim(x) - c(length(x), 1) matplot(x, t(n), pch = 1) But this does not look like a simple line will fit the data for each gene well, so perhaps I've misunderstood something. You will have to decide how you want to do the regression. It will also get very messy and difficult to read with 20 lines (a different regression for each gene). To do the regressions, plot the lines, and label with the gene ID, see ?lm ?predict ?abline ?text On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Hello, I have a question about how to plot a series of data. The folloqing is my data matrix of n n 25p5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 16B-E06.g 47740 4676 6769 48 37B-B02.g 42860 6152 19276 72 35B-A02.g 48325 12863 38274 143 35B-A02.g 48410 12806 39013 175 35B-A02.g 48417 9057 40923 176 35B-A02.g 51403 13865 43338 161 45B-C12.g 50939 3656 5783 43 45B-C12.g 52356 5524 6041 55 45B-C12.g 49338 5141 5266 41 45B-C12.g 51567 3915 5677 43 35A-G04.g 40365 5513 6971 32 35B-D01.g 54217 12607 13067 93 35B-D01.g 55283 11441 14964 101 35B-D01.g 55041 9626 14928 94 35B-D01.g 54058 9465 14912 88 35B-A04.g 42745 12080 34271 105 35B-A04.g 41055 12423 34874 126 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is Intensity. I want to plot the data this way. x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p. y-axis is Intensity Inside of plot is the points of intensity over 4 concentrations, points from different genes have different color or shape. A regression line of each genes crosss different concetrations, and at the end of line is gene IDs. Thanks, Tiandao __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unique rows in data frame
Here is one way to get the means: x x1 x2 1 A 1 2 B 2 3 B 3 aggregate(x$x2, list(x$x1),mean) Group.1 x 1 A 1.0 2 B 2.5 On 10/2/07, Dieter Best [EMAIL PROTECTED] wrote: Hello there, I have a data frame a small version of which could look like the following: x1 x2 1 A 1 2 B 2 3 B 3 Now I need to remove rows which are duplicate in x1, i.e. in the example above I would remove row 3. I have an ugly solution with for and while loops and ifs. ... And of course my data set is much larger and my solution takes along time. Any ideas what could be the best way to do this in R? Better yet: I actually would like to sort of collapse row 2 and 3 in the example above by replacing 2 and 3 with a new row 2 which has in x2 the mean of old x2 of row 2 and 3 (maybe this is poorly said). Anyways, thanks a lot in advance for suggestions. -- D - [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange names when creating a data.frame: Difference between - and =
your two dataframe defintions are different. In the case of df1, you have 'names' the elements, and in the case of df2,you have done an assignment to 'nam' and 'num' (look in the workspace and you will see the object. Since you haven't names the elements in df2, is tries to constuct a name from the expressions that is given (e.g., nums = c(3, 2,1, 1, 2,3, 7)), so you get. This is what happens: make.names(nums - c(3,2,1, 1, 2,3, 7)) [1] nums.c.3.2.112.3..7. It tries to make a name from the expression. On 10/2/07, John Kane [EMAIL PROTECTED] wrote: This is just a curiosity question. Why do the two different syntaxes for df1 and df2 give such different results in the names(dfx)? Thanks df1 - data.frame(nas = c(A, B , B ,C ,B, A, D),nums = c(3,2,1, 1, 2,3, 7)) df2 - data.frame(nas - c(A, B , B ,C ,B, A, D),nums - c(3,2,1, 1, 2,3, 7)) df1; df2 df1 nas nums 1 A3 2 B2 . . df2 nasc..ABBCBAD.. numsc.3..2..1..1..2..3..7. 1 A 3 2 B 3 . . __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] continuous boxplot?
A couple of other nice references for dealing with many points in a scatterplot are: D. B. Carr, R. J. Littlefield, W. L. Nicholson, and J. S. Littlefield. Scatterplot matrix techniques for large n. Journal of the American Statistical Association, 82(398):424–436, 1987. W. S. Cleveland and R. McGill. The many faces of a scatterplot. Journal of the American Statistical Association, 79(388):807–822, 1984. A. Unwin, M. Theus, and H. Hofmann. Graphics of Large Datasets. Springer, 2006. Another technique is to overlay contours of a 2d kernel density estimate - this is somewhat similar to a bag-plot, although with different underlying assumptions. It's also important to think about whether you're interested in the joint (e.g. bag plot) or conditional (e.g. quantile regression) density. Hadley On 10/2/07, Bert Gunter [EMAIL PROTECTED] wrote: Folks: I found the references in the previous replies to this vexing data visualization issue to be quite interesting and useful. I think it fair to say that there is no single best way to do this -- it all depends on what you need to learn , and probably several alternative displays will be necessary to get the important information the data have to convey. However,as always, this issue has been considered before, and it may be worthwhile to at least consider an already available standard approach using shingles and a trellis-type plot. ?xyplot and ?shingle should get you started (you probably want to shingle or bin on quantiles of y). The canonical reference is Bill Cleveland's VISUALIZING DATA (see coplots). Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jim Porzak Sent: Tuesday, October 02, 2007 11:19 AM To: Karin Lagesen Cc: r-help@r-project.org Subject: Re: [R] continuous boxplot? Karin, I like to use bagplots in these cases where there are a lot of cases and scatter plots become one big smudge. See http://www.wiwi.uni-bielefeld.de/~wolf/software/R-wtools/bagplot/bagplot.pdf And some further examples on slides 36 - 39 of http://www.porzak.com/JimArchive/JimPorzak_CIwithR_useR2006_tutorial.pdf -- HTH, Jim Porzak Responsys, Inc. San Francisco, CA http://www.linkedin.com/in/jimporzak On 10/1/07, Karin Lagesen [EMAIL PROTECTED] wrote: I have two vectors x and y, which I would like to plot against each other. I am also displaying other data in this plot. However, I have about 1 million points to plot, and just plotting them x againt y is not very informative. What I'd like to do is to do sort of a continuous box plot. My x values goes from -1 to 1 and my y values from 0 to 1, so I4d like to plot the median and quantiles, and possibly also all of the outliers somehow. Are there any facilities in R for doing something like this, or would I need to do this the hard coded way? Thankyou very much for your help! Karin -- Karin Lagesen, PhD student [EMAIL PROTECTED] http://folk.uio.no/karinlag __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange names when creating a data.frame: Difference between - and =
On Tue, 2 Oct 2007, John Kane wrote: This is just a curiosity question. Why do the two different syntaxes for df1 and df2 give such different results in the names(dfx)? Because only in the second case did you specify the names in the call. When you do nas - c(A ...) the argument has no name but you create an object called 'nas' in the calling environment (the workspace, it looks like). Perhaps you are confused by thinking '=' is an assignment operator in R, but it is so only when it is not used in any other sense. Thanks df1 - data.frame(nas = c(A, B , B ,C ,B, A, D),nums = c(3,2,1, 1, 2,3, 7)) df2 - data.frame(nas - c(A, B , B ,C ,B, A, D),nums - c(3,2,1, 1, 2,3, 7)) df1; df2 df1 nas nums 1 A3 2 B2 . . df2 nasc..ABBCBAD.. numsc.3..2..1..1..2..3..7. 1 A 3 2 B 3 . . __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unique rows in data frame
?aggregate x - data.frame(a= as.factor(c(A, B , B ,C ,B, A, D)), b = c(3,2,1, 1, 2,3, 7)) aggregate(x[,2], list(x[,1]), mean) --- Dieter Best [EMAIL PROTECTED] wrote: Hello there, I have a data frame a small version of which could look like the following: x1 x2 1 A 1 2 B 2 3 B 3 Now I need to remove rows which are duplicate in x1, i.e. in the example above I would remove row 3. I have an ugly solution with for and while loops and ifs. ... And of course my data set is much larger and my solution takes along time. Any ideas what could be the best way to do this in R? Better yet: I actually would like to sort of collapse row 2 and 3 in the example above by replacing 2 and 3 with a new row 2 which has in x2 the mean of old x2 of row 2 and 3 (maybe this is poorly said). Anyways, thanks a lot in advance for suggestions. -- D - [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot question
Hello Eric, I got it right this time. You are right, regression line is way too messy. For each gene, plot the lines connecting the points of the average intensity at different concentrations, and write gene ID at the end of line. I am new to the R graphics, any help is appreciated. On Tue, 2 Oct 2007, Eric Thompson wrote: But you did not use the command I suggested: you replaced x with colnames(n), which is a vector of characters. Characters such as 25p has little meaning for plotting: as.numeric(25p) [1] NA So you've tried to plot a bunch of NA's, which is why it doesn't know what limits to use for xlim. That is why I gave you the commands to convert that character array colnames(n) to a meaningful numeric array. It is unreasonable to expect R to guess that 25p = 25. x - colnames(n) x - as.numeric(substr(x, 1, nchar(x) - 1)) matplot(x, t(n), pch = 1, axes = FALSE) axis(side = 2) axis(side = 3, at = x, lab = colnames(n)) box() On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: However, I got the following msg. matplot(colnames(n), t(n), pch = 1, axes = FALSE) Error in plot.window(xlim, ylim, log, asp, ...) : need finite 'xlim' values In addition: Warning messages: 1: NAs introduced by coercion in: as.double.default(x) 2: no non-missing arguments to min; returning Inf in: min(x) 3: no non-missing arguments to max; returning -Inf in: max(x) 4: NAs introduced by coercion in: as.double.default(x) On Tue, 2 Oct 2007, Eric Thompson wrote: Okay. If you want to customize the axis labels, you can suppress the defaults by changing the matplot call to matplot(x, t(n), pch = 1, axes = FALSE) And then adding them how you want: axis(side = 2) axis(side = 3, at = x, lab = colnames(n)) box() On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Thanks for your quick reply, Eric. I want plot colnames(n) as string on x-axis. If the regression lines don't fit the data very well, it is OK, the plot is only for quality check. On Tue, 2 Oct 2007, Eric Thompson wrote: If I've correctly interpreted what you want, you first need to get the x values: x - colnames(n) x - as.numeric(substr(x, 1, nchar(x) - 1)) Then it seems fairly easy to use matplot to get the values with different colors for each concentration dim(x) - c(length(x), 1) matplot(x, t(n), pch = 1) But this does not look like a simple line will fit the data for each gene well, so perhaps I've misunderstood something. You will have to decide how you want to do the regression. It will also get very messy and difficult to read with 20 lines (a different regression for each gene). To do the regressions, plot the lines, and label with the gene ID, see ?lm ?predict ?abline ?text On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Hello, I have a question about how to plot a series of data. The folloqing is my data matrix of n n 25p5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 16B-E06.g 47740 4676 6769 48 37B-B02.g 42860 6152 19276 72 35B-A02.g 48325 12863 38274 143 35B-A02.g 48410 12806 39013 175 35B-A02.g 48417 9057 40923 176 35B-A02.g 51403 13865 43338 161 45B-C12.g 50939 3656 5783 43 45B-C12.g 52356 5524 6041 55 45B-C12.g 49338 5141 5266 41 45B-C12.g 51567 3915 5677 43 35A-G04.g 40365 5513 6971 32 35B-D01.g 54217 12607 13067 93 35B-D01.g 55283 11441 14964 101 35B-D01.g 55041 9626 14928 94 35B-D01.g 54058 9465 14912 88 35B-A04.g 42745 12080 34271 105 35B-A04.g 41055 12423 34874 126 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is Intensity. I want to plot the data this way. x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p. y-axis is Intensity Inside of plot is the points of intensity over 4 concentrations, points from different genes have different color or shape. A regression line of each genes crosss different concetrations, and at the end of line is gene IDs. Thanks, Tiandao __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal,
Re: [R] Strange names when creating a data.frame: Difference between - and =
John Kane-2 wrote: This is just a curiosity question. Why do the two different syntaxes for df1 and df2 give such different results in the names(dfx)? Thanks df1 - data.frame(nas = c(A, B , B ,C ,B, A, D),nums = c(3,2,1, 1, 2,3, 7)) df2 - data.frame(nas - c(A, B , B ,C ,B, A, D),nums - c(3,2,1, 1, 2,3, 7)) df1; df2 df1 nas nums 1 A3 2 B2 . . df2 nasc..ABBCBAD.. numsc.3..2..1..1..2..3..7. 1 A 3 2 B 3 . . You specify argument names in the first function call in the form agrument.name=value Argument names become column names in the data frame. - is the assignment operator, it returns the assigned value, it is not the specification of a function argument. Second function call does not contain argument names, so, they are constructed from expressions. You should have the variables nas and nums in your workspace now. ls() should show them. If you type data.frame (without parentheses, ( and )) on the R command prompt it will print the body of this function. You can study it and discover how it works. -- View this message in context: http://www.nabble.com/Strange-names-when-creating-a-data.frame%3A-Difference-between-%3C--and-%3D-tf4557141.html#a13006205 Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot question
On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Hello, I have a question about how to plot a series of data. The folloqing is my data matrix of n n 25p5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 16B-E06.g 47740 4676 6769 48 37B-B02.g 42860 6152 19276 72 35B-A02.g 48325 12863 38274 143 35B-A02.g 48410 12806 39013 175 35B-A02.g 48417 9057 40923 176 35B-A02.g 51403 13865 43338 161 45B-C12.g 50939 3656 5783 43 45B-C12.g 52356 5524 6041 55 45B-C12.g 49338 5141 5266 41 45B-C12.g 51567 3915 5677 43 35A-G04.g 40365 5513 6971 32 35B-D01.g 54217 12607 13067 93 35B-D01.g 55283 11441 14964 101 35B-D01.g 55041 9626 14928 94 35B-D01.g 54058 9465 14912 88 35B-A04.g 42745 12080 34271 105 35B-A04.g 41055 12423 34874 126 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is Intensity. I want to plot the data this way. x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p. y-axis is Intensity Inside of plot is the points of intensity over 4 concentrations, points from different genes have different color or shape. A regression line of each genes crosss different concetrations, and at the end of line is gene IDs. I might do it something like this: df - structure(list(gene = structure(c(1L, 1L, 1L, 1L, 6L, 3L, 3L, 3L, 3L, 7L, 7L, 7L, 7L, 2L, 5L, 5L, 5L, 5L, 4L, 4L), .Label = c(16B-E06.g, 35A-G04.g, 35B-A02.g, 35B-A04.g, 35B-D01.g, 37B-B02.g, 45B-C12.g), class = factor), X25p = c(45379L, 45138L, 48457L, 47740L, 42860L, 48325L, 48410L, 48417L, 51403L, 50939L, 52356L, 49338L, 51567L, 40365L, 54217L, 55283L, 55041L, 54058L, 42745L, 41055L), X5p = c(4383L, 4028L, 4267L, 4676L, 6152L, 12863L, 12806L, 9057L, 13865L, 3656L, 5524L, 5141L, 3915L, 5513L, 12607L, 11441L, 9626L, 9465L, 12080L, 12423L), X2.5p = c(5123L, 6249L, 5470L, 6769L, 19276L, 38274L, 39013L, 40923L, 43338L, 5783L, 6041L, 5266L, 5677L, 6971L, 13067L, 14964L, 14928L, 14912L, 34271L, 34874L), X0.5p = c(45L, 52L, 54L, 48L, 72L, 143L, 175L, 176L, 161L, 43L, 55L, 41L, 43L, 32L, 93L, 101L, 94L, 88L, 105L, 126L )), .Names = c(gene, X25p, X5p, X2.5p, X0.5p), class = data.frame, row.names = c(NA, -20L)) library(reshape) library(ggplot2) dfm - melt(df, id=1) names(dfm) - c(gene, conc, intensity) dfm$conc - as.numeric(gsub([Xp], , as.character(dfm$conc))) qplot(conc, intensity, data=dfm, colour=gene, log=xy) + geom_smooth(method=lm) Note that I've converted the concentrations to numeric values and plotted them on a log scale. If you want to treat concentration as a factor, then you'll need the following code: dfm$conc - factor(dfm$conc) qplot(conc, intensity, data=dfm, colour=gene, group=gene, log=y) + geom_smooth(method=lm, xseq=levels(dfm$conc)) But in that case, fitting a linear model seems a bit dubious. Note that you can also use this format of data with lattice: library(lattice) xyplot(intensity ~ conc, data=dfm, type=c(p,r), group=gene, auto.key=T) Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Trouble obtaining results from a loop
#Hello, #I have a question about obtaining results from a loop I have written. #Below is a sample of individual genotypes from a genetic question I am working on called P.genotype.sample . P.genotype.sample-matrix(10,10,10) P.genotype.sample[,1]-c(2,2,1,5,1,1,5,6,1,3) P.genotype.sample[,2]-c(6,3,3,6,8,1,6,7,2,3) P.genotype.sample[,3]-c(2,2,2,3,3,2,2,2,3,3) P.genotype.sample[,4]-c(2,8,8,3,8,2,8,3,4,3) P.genotype.sample[,5]-c(3,3,8,3,6,1,1,1,1,3) P.genotype.sample[,6]-c(6,3,8,8,6,8,7,3,1,7) P.genotype.sample[,7]-c(1,5,5,1,5,1,1,5,5,5) P.genotype.sample[,8]-c(5,5,5,5,7,6,7,5,5,8) P.genotype.sample[,9]-c(5,5,8,5,5,5,5,2,5,2) P.genotype.sample[,10]-c(5,8,8,8,8,5,8,5,8,2) P.genotype.sample # I would like to count the number of times the same number ( 1 through 8) occurs in each row # and only in pairs of columns; 1:2, 3:4, 5:6, 7:8, 9:10. below is the loop that i have written # to perform this operation and put in the results in a matrix that is 5 columns wide and 8 rows long (called result). # I can get the operation to run but it gives me the results for the last pair of columns and overwrites # the results into every column of my result matrix. I am puzzled as to why this is occurring. Loci-5 Locus.Columns- Loci*2 x-seq(1,Locus.Columns,2) result-matrix(8,8,Loci) for (i in 1:Loci){ for (j in 1:8){ for (k in x){ result[j,i] - sum(P.genotype.sample[,k]==j P.genotype.sample[,k+1]==j) } } } result # Below is a long and drawn out way to achieve the result I am looking for in the loop above. true.result-matrix(8,8,Loci) for(j in 1:8){ true.result[j,1] -sum(P.genotype.sample[,1]==j P.genotype.sample[,2]==j) } for(j in 1:8){ true.result[j,2] -sum(P.genotype.sample[,3]==j P.genotype.sample[,4]==j) } for(j in 1:8){ true.result[j,3] -sum(P.genotype.sample[,5]==j P.genotype.sample[,6]==j) } for(j in 1:8){ true.result[j,4] -sum(P.genotype.sample[,7]==j P.genotype.sample[,8]==j) } for(j in 1:8){ true.result[j,5] -sum(P.genotype.sample[,9]==j P.genotype.sample[,10]==j) } true.result # Any suggestions or help on how to make this loop work would be greatly appreciated. # Thanks in advance Luke Neraas [EMAIL PROTECTED] University of Alaska Fairbanks School of Fisheries and Ocean Sciences 11120 Glacier Highway UAF Fisheries Division Juneau, AK 99801 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot question
Thanks for your help, Hadley. I want to treat concentration as factor, and the 2nd and 3rd part of codes are what I wanted. However, how to draw the lines to connect the points of average intensity of each gene at different concentrations? On Tue, 2 Oct 2007, hadley wickham wrote: On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Hello, I have a question about how to plot a series of data. The folloqing is my data matrix of n n 25p5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is Intensity. I want to plot the data this way. x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p. y-axis is Intensity Inside of plot is the points of intensity over 4 concentrations, points from different genes have different color or shape. A regression line of each genes crosss different concetrations, and at the end of line is gene IDs. I might do it something like this: df - structure(list(gene = structure(c(1L, 1L, 1L, 1L, 6L, 3L, 3L, 3L, 3L, 7L, 7L, 7L, 7L, 2L, 5L, 5L, 5L, 5L, 4L, 4L), .Label = c(16B-E06.g, 35A-G04.g, 35B-A02.g, 35B-A04.g, 35B-D01.g, 37B-B02.g, 45B-C12.g), class = factor), X25p = c(45379L, 45138L, 48457L, 47740L, 42860L, 48325L, 48410L, 48417L, 51403L, 50939L, 52356L, 49338L, 51567L, 40365L, 54217L, 55283L, 55041L, 54058L, 42745L, 41055L), X5p = c(4383L, 4028L, 4267L, 4676L, 6152L, 12863L, 12806L, 9057L, 13865L, 3656L, 5524L, 5141L, 3915L, 5513L, 12607L, 11441L, 9626L, 9465L, 12080L, 12423L), X2.5p = c(5123L, 6249L, 5470L, 6769L, 19276L, 38274L, 39013L, 40923L, 43338L, 5783L, 6041L, 5266L, 5677L, 6971L, 13067L, 14964L, 14928L, 14912L, 34271L, 34874L), X0.5p = c(45L, 52L, 54L, 48L, 72L, 143L, 175L, 176L, 161L, 43L, 55L, 41L, 43L, 32L, 93L, 101L, 94L, 88L, 105L, 126L )), .Names = c(gene, X25p, X5p, X2.5p, X0.5p), class = data.frame, row.names = c(NA, -20L)) library(reshape) library(ggplot2) dfm - melt(df, id=1) names(dfm) - c(gene, conc, intensity) dfm$conc - as.numeric(gsub([Xp], , as.character(dfm$conc))) qplot(conc, intensity, data=dfm, colour=gene, log=xy) + geom_smooth(method=lm) Note that I've converted the concentrations to numeric values and plotted them on a log scale. If you want to treat concentration as a factor, then you'll need the following code: dfm$conc - factor(dfm$conc) qplot(conc, intensity, data=dfm, colour=gene, group=gene, log=y) + geom_smooth(method=lm, xseq=levels(dfm$conc)) But in that case, fitting a linear model seems a bit dubious. Note that you can also use this format of data with lattice: library(lattice) xyplot(intensity ~ conc, data=dfm, type=c(p,r), group=gene, auto.key=T) Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] continuous boxplot?
On 10/2/07, Bert Gunter [EMAIL PROTECTED] wrote: Folks: I found the references in the previous replies to this vexing data visualization issue to be quite interesting and useful. I think it fair to say that there is no single best way to do this -- it all depends on what you need to learn , and probably several alternative displays will be necessary to get the important information the data have to convey. However,as always, this issue has been considered before, and it may be worthwhile to at least consider an already available standard approach using shingles and a trellis-type plot. ?xyplot and ?shingle should get you started (you probably want to shingle or bin on quantiles of y). The canonical reference is Bill Cleveland's VISUALIZING DATA (see coplots). A better reference for that is Elements of Graphing Data, where boxplots created after discretizing the x variable is used to motivate lowess. It's pretty much the only legitimate use I have seen of boxplots with shingles (it's a cheap way to get both a sense of location and variability non-parametrically). I vaguely remember seeing an even better reference, but can't find it now. -Deepayan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot question
The last example does connect the points of average intensity, doesn't it? Hadley On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Thanks for your help, Hadley. I want to treat concentration as factor, and the 2nd and 3rd part of codes are what I wanted. However, how to draw the lines to connect the points of average intensity of each gene at different concentrations? On Tue, 2 Oct 2007, hadley wickham wrote: On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote: Hello, I have a question about how to plot a series of data. The folloqing is my data matrix of n n 25p5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is Intensity. I want to plot the data this way. x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p. y-axis is Intensity Inside of plot is the points of intensity over 4 concentrations, points from different genes have different color or shape. A regression line of each genes crosss different concetrations, and at the end of line is gene IDs. I might do it something like this: df - structure(list(gene = structure(c(1L, 1L, 1L, 1L, 6L, 3L, 3L, 3L, 3L, 7L, 7L, 7L, 7L, 2L, 5L, 5L, 5L, 5L, 4L, 4L), .Label = c(16B-E06.g, 35A-G04.g, 35B-A02.g, 35B-A04.g, 35B-D01.g, 37B-B02.g, 45B-C12.g), class = factor), X25p = c(45379L, 45138L, 48457L, 47740L, 42860L, 48325L, 48410L, 48417L, 51403L, 50939L, 52356L, 49338L, 51567L, 40365L, 54217L, 55283L, 55041L, 54058L, 42745L, 41055L), X5p = c(4383L, 4028L, 4267L, 4676L, 6152L, 12863L, 12806L, 9057L, 13865L, 3656L, 5524L, 5141L, 3915L, 5513L, 12607L, 11441L, 9626L, 9465L, 12080L, 12423L), X2.5p = c(5123L, 6249L, 5470L, 6769L, 19276L, 38274L, 39013L, 40923L, 43338L, 5783L, 6041L, 5266L, 5677L, 6971L, 13067L, 14964L, 14928L, 14912L, 34271L, 34874L), X0.5p = c(45L, 52L, 54L, 48L, 72L, 143L, 175L, 176L, 161L, 43L, 55L, 41L, 43L, 32L, 93L, 101L, 94L, 88L, 105L, 126L )), .Names = c(gene, X25p, X5p, X2.5p, X0.5p), class = data.frame, row.names = c(NA, -20L)) library(reshape) library(ggplot2) dfm - melt(df, id=1) names(dfm) - c(gene, conc, intensity) dfm$conc - as.numeric(gsub([Xp], , as.character(dfm$conc))) qplot(conc, intensity, data=dfm, colour=gene, log=xy) + geom_smooth(method=lm) Note that I've converted the concentrations to numeric values and plotted them on a log scale. If you want to treat concentration as a factor, then you'll need the following code: dfm$conc - factor(dfm$conc) qplot(conc, intensity, data=dfm, colour=gene, group=gene, log=y) + geom_smooth(method=lm, xseq=levels(dfm$conc)) But in that case, fitting a linear model seems a bit dubious. Note that you can also use this format of data with lattice: library(lattice) xyplot(intensity ~ conc, data=dfm, type=c(p,r), group=gene, auto.key=T) Hadley -- http://had.co.nz/ -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unique rows in data frame
Hi guys, thanks for all your help. I didn't know about aggregate yet. John Kane [EMAIL PROTECTED] wrote: ?aggregate x - data.frame(a= as.factor(c(A, B , B ,C ,B, A, D)), b = c(3, 2, 1, 1, 2, 3, 7)) aggregate(x[,2], list(x[,1]), mean) --- Dieter Best wrote: Hello there, I have a data frame a small version of which could look like the following: x1 x2 1 A 1 2 B 2 3 B 3 Now I need to remove rows which are duplicate in x1, i.e. in the example above I would remove row 3. I have an ugly solution with for and while loops and ifs. ... And of course my data set is much larger and my solution takes along time. Any ideas what could be the best way to do this in R? Better yet: I actually would like to sort of collapse row 2 and 3 in the example above by replacing 2 and 3 with a new row 2 which has in x2 the mean of old x2 of row 2 and 3 (maybe this is poorly said). Anyways, thanks a lot in advance for suggestions. -- D - [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble obtaining results from a loop
I think this gives you what you are looking for: P.genotype.sample [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,]262236155 5 [2,]232833555 8 [3,]132888558 8 [4,]563338155 8 [5,]183866575 8 [6,]112218165 5 [7,]562817175 8 [8,]672313552 5 [9,]123411555 8 [10,]333337582 2 # create result matrix result - matrix(0, ncol=5, nrow=8) # iterate for pairs of columns for (i in seq(1, ncol(P.genotype.sample), by=2)){ + # get the rows that match + match.row - P.genotype.sample[,i] == P.genotype.sample[, i + 1] + # get the values from the rows that match + same - P.genotype.sample[match.row, i] + # now increment the 'result' matrix + for (j in same) result[j, (i + 1) / 2] - result[j, (i + 1) / 2] + 1 + } result [,1] [,2] [,3] [,4] [,5] [1,]10100 [2,]02001 [3,]12100 [4,]00000 [5,]00042 [6,]00100 [7,]00000 [8,]00101 On 10/2/07, Luke Neraas [EMAIL PROTECTED] wrote: #Hello, #I have a question about obtaining results from a loop I have written. #Below is a sample of individual genotypes from a genetic question I am working on called P.genotype.sample . P.genotype.sample-matrix(10,10,10) P.genotype.sample[,1]-c(2,2,1,5,1,1,5,6,1,3) P.genotype.sample[,2]-c(6,3,3,6,8,1,6,7,2,3) P.genotype.sample[,3]-c(2,2,2,3,3,2,2,2,3,3) P.genotype.sample[,4]-c(2,8,8,3,8,2,8,3,4,3) P.genotype.sample[,5]-c(3,3,8,3,6,1,1,1,1,3) P.genotype.sample[,6]-c(6,3,8,8,6,8,7,3,1,7) P.genotype.sample[,7]-c(1,5,5,1,5,1,1,5,5,5) P.genotype.sample[,8]-c(5,5,5,5,7,6,7,5,5,8) P.genotype.sample[,9]-c(5,5,8,5,5,5,5,2,5,2) P.genotype.sample[,10]-c(5,8,8,8,8,5,8,5,8,2) P.genotype.sample # I would like to count the number of times the same number ( 1 through 8) occurs in each row # and only in pairs of columns; 1:2, 3:4, 5:6, 7:8, 9:10. below is the loop that i have written # to perform this operation and put in the results in a matrix that is 5 columns wide and 8 rows long (called result). # I can get the operation to run but it gives me the results for the last pair of columns and overwrites # the results into every column of my result matrix. I am puzzled as to why this is occurring. Loci-5 Locus.Columns- Loci*2 x-seq(1,Locus.Columns,2) result-matrix(8,8,Loci) for (i in 1:Loci){ for (j in 1:8){ for (k in x){ result[j,i] - sum(P.genotype.sample[,k]==j P.genotype.sample[,k+1]==j) } } } result # Below is a long and drawn out way to achieve the result I am looking for in the loop above. true.result-matrix(8,8,Loci) for(j in 1:8){ true.result[j,1] -sum(P.genotype.sample[,1]==j P.genotype.sample[,2]==j) } for(j in 1:8){ true.result[j,2] -sum(P.genotype.sample[,3]==j P.genotype.sample[,4]==j) } for(j in 1:8){ true.result[j,3] -sum(P.genotype.sample[,5]==j P.genotype.sample[,6]==j) } for(j in 1:8){ true.result[j,4] -sum(P.genotype.sample[,7]==j P.genotype.sample[,8]==j) } for(j in 1:8){ true.result[j,5] -sum(P.genotype.sample[,9]==j P.genotype.sample[,10]==j) } true.result # Any suggestions or help on how to make this loop work would be greatly appreciated. # Thanks in advance Luke Neraas [EMAIL PROTECTED] University of Alaska Fairbanks School of Fisheries and Ocean Sciences 11120 Glacier Highway UAF Fisheries Division Juneau, AK 99801 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Protect your brand online
This is a text part of the message. It is shown for the users of old-style e-mail clients __ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email __ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R routines vs. MATLAB/SPSS Routines
Hi all, I've become quite enamored of R lately, and have decided to try to teach some of its basics (reading in data, manipulation and classical stats analyses) to my fellow grad students at the University of Toronto. I sent out a mass email and have already received some positive responses. One student, however, wanted to know what differentiates the routines that R uses, from those that MATLAB and SPSS use. In other words, in what respects do R routines work faster/more efficiently/more accurately than those of MATLAB/SPSS. I thank you in advance for any answer you can give me (or rather, the inquiring student). Cheers, Matthew Dubins __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] GARCH simulations
Hey, Is there any way to simulate a GARCH(1,1) data set? Thanks Tharanga __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculating proportions from a data frame rather than a table
How do you create a table from a data frame? I tried as.table( name.of.data.frame) but it bombed out. I will include the exact error message in my next posting. If I recall correctly, it said that the data.frame could not be coerced to a table. On 10/2/07, John Kane [EMAIL PROTECTED] wrote: What am I missing here? Cannot you just create the table from the data.frame and apply prop.table()to it? --- Farrel Buchinsky [EMAIL PROTECTED] wrote: When one has raw data it is easy to create a table of one variable against another and then calculate proportions For example a.nice.table-table(a,b) prop.table(a.nice.table,1) However, I looked at several papers and created a data frame of the aggregate data. That means I acually created a table except it is a data frame. The first column lists the name of the first author and the year. I cannot find how to convert the data frame to a table so I can use great functions such as prop.table and margin.table. Alas I tried, rowSums but it provides lousy output without listing the names of th papers. I have thought of going through the reshape package but I suspect that there is an easier way to convert a data frame to a table. Is there? -- Farrel Buchinsky Be smarter than spam. See how smart SpamGuard is at giving junk email the boot with the All-new Yahoo! Mail at http://mrd.mail.yahoo.com/try_beta?.intl=ca -- Farrel Buchinsky GrandCentral Tel: (412) 567-7870 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Design package: plot summary
Cary Dehing-Oberije wrote: Hi everybody, I am a new user of R, design package. I am trying to plot the estimated hazard ratio's of my cox regression model with the confidence intervals. But I keep getting the message:Error in `contrasts-`(`*tmp*`, value = contr.treatment) : contrasts can be applied only to factors with 2 or more levels I have dichotomous, categorical as well as continuous variables in the model, but I can only plot the dichotomous variable. I started with the following syntax: dd - datadist(x1, x2, x3) Please follow the posting guide. Please provide R commands that generate values of x1, x2, x3 that cause the problem you reported below. options(datadist='dd') fit-cph(S~ x1 +x2 +x3, x=T,y=T) plothr- summary(fit) plot(plothr, log=T) or just plot(summary(fit), log=TRUE) Because the levels defined by datadist didn't seem to work I tried to adjust the levels after fitting the model: dd$limits[Adjust to,x1] - 1 dd$limits[Adjust to,x2] - 30 Don't do that. Frank fit - update(fit) plothr- summary(fit) plot(plothr, log=T) Finally I have tried: summary(fit, x1=1, x2=30, x3=3) But this option didn't work either. Can anybody tell me what goes wrong and why? Thanks for your help, Cary Cary Dehing-Oberije Researcher CAT project p/a MAASTRO clinic Dr. Tanslaan 12 6229 ET Maastricht The Netherlands phone: 088 - 4455666 / 088 - 4455767 email: [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unique rows in data frame
Try this: DF[!duplicated(DF$x1), ] x1 x2 1 A 1 2 B 2 # or subset(DF, !duplicated(x1)) x1 x2 1 A 1 2 B 2 On 10/2/07, Dieter Best [EMAIL PROTECTED] wrote: Hello there, I have a data frame a small version of which could look like the following: x1 x2 1 A 1 2 B 2 3 B 3 Now I need to remove rows which are duplicate in x1, i.e. in the example above I would remove row 3. I have an ugly solution with for and while loops and ifs. ... And of course my data set is much larger and my solution takes along time. Any ideas what could be the best way to do this in R? Better yet: I actually would like to sort of collapse row 2 and 3 in the example above by replacing 2 and 3 with a new row 2 which has in x2 the mean of old x2 of row 2 and 3 (maybe this is poorly said). Anyways, thanks a lot in advance for suggestions. -- D - [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R routines vs. MATLAB/SPSS Routines
I would stress the advantages of the free and open source nature of R over the proprietary programs you mention. Because R is free (as in beer), your student will have access to it even when they are free of the university that I presume buys a MATLAB/SPSS license for them. And because R is open source, when your student has an idea regarding how to make R work faster/more efficiently/more accurately, they can modify the source code at whatever level they like. They can also examine at whatever level they like what R is doing, so there are no black boxes involved. On Tue, 2 Oct 2007, Matthew Dubins wrote: Hi all, I've become quite enamored of R lately, and have decided to try to teach some of its basics (reading in data, manipulation and classical stats analyses) to my fellow grad students at the University of Toronto. I sent out a mass email and have already received some positive responses. One student, however, wanted to know what differentiates the routines that R uses, from those that MATLAB and SPSS use. In other words, in what respects do R routines work faster/more efficiently/more accurately than those of MATLAB/SPSS. I thank you in advance for any answer you can give me (or rather, the inquiring student). Cheers, Matthew Dubins __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to modify function for list
Dear R Gurus, I have a function which calculate the BMI standard deviation score based on the age and gender specific L, M, S value. I have written a function like this bmisds - function (age, gender, bmi){ if (age ==1 gender ==1) { bmif - c(-1.013,16.133,0.07656) } else if (age ==1 gender ==0) { bmif - c(-1.79,16.421,0.07149) } else if (age == 2 gender == 1) { bmif - c(-1.403,15.58,0.07342) } else if (age == 2 gender == 0) { bmif - c(-2.045,15.861,0.07116) } (((bmi/bmif[2])**bmif[1] -1)/(bmif[1]*bmif[3])) } This function work when I supply a single variable, e.g. bmisds(1,0,16) When I try it with a list, e.g. age - c(12,13,14) gender - c(0,1,1) bmi - c(14,15,16) bmisds(age,gender,bmi) This function give me wrong answer and error msg. Warning messages: 1: the condition has length 1 and only the first element will be used in: if (age == 1 gender == 1) { How can I midify this function to work with list/vector? regards, C -- CH Chan Research Assistant - KWH http://www.macgrass.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Factor levels.
I have factors with levels ``Unit, Achieved, and Scholarship; I wish to replace these with U, A, and S. So I do fff - factor(fff,labels=c(U,A,S)) This works as long as all of the levels are actually present in the factor. But if ``Scholarship'' is absent (as if often is) then I get an error. I can do a workaround such as fff - factor(c(U,A,S)[fff],levels=c(U,A,S)) but this seems kludgy to me. Is there a sexier way? cheers, Rolf Turner ## Attention:\ This e-mail message is privileged and confidenti...{{dropped}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.