[R] Rounding and printing
Hello,

I am trying to print a table with numbers all rounded to the same number of digits (one after the decimal), but R seems to want to not print ".0" for integers. I can go in and fix it one number at a time, but I'd like to understand the principle. Here's an example of the code. The problem is the 13th element, 21 or 21.0:

> nvb_deaths <- round(ss[,10]/100,digits=1)
> nvb_deaths
 [1] 56.5  1.6  0.2  3.9  0.1  2.2  0.2  2.6  1.5  4.1  1.1  6.1 21.0
> nvb_dths <- paste(nvb_deaths," (",round(100*nvb_deaths/nvb_deaths[1],digits=1),"%)",sep="")
> nvb_dths
 [1] "56.5 (100%)" "1.6 (2.8%)"  "0.2 (0.4%)"  "3.9 (6.9%)"  "0.1 (0.2%)"  "2.2 (3.9%)"
 [7] "0.2 (0.4%)"  "2.6 (4.6%)"  "1.5 (2.7%)"  "4.1 (7.3%)"  "1.1 (1.9%)"  "6.1 (10.8%)"
[13] "21 (37.2%)"
> print(nvb_deaths,digits=1)
 [1] 56.5  1.6  0.2  3.9  0.1  2.2  0.2  2.6  1.5  4.1  1.1  6.1 21.0
> paste(print(nvb_deaths,digits=1)," (",round(100*nvb_deaths/nvb_deaths[1],digits=1),"%)",sep="")
 [1] 56.5  1.6  0.2  3.9  0.1  2.2  0.2  2.6  1.5  4.1  1.1  6.1 21.0
 [1] "56.5 (100%)" "1.6 (2.8%)"  "0.2 (0.4%)"  "3.9 (6.9%)"  "0.1 (0.2%)"  "2.2 (3.9%)"
 [7] "0.2 (0.4%)"  "2.6 (4.6%)"  "1.5 (2.7%)"  "4.1 (7.3%)"  "1.1 (1.9%)"  "6.1 (10.8%)"
[13] "21 (37.2%)"

I'm running R v2.8.1 on Windows. Any help is much appreciated.

Cheers,
Alan Cohen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
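A possible explanation, offered as a sketch rather than a definitive answer: paste() converts numbers with as.character(), which drops trailing zeros, whereas print() formats the whole vector to a common number of decimals. print() also returns its argument invisibly, so wrapping it in paste() still pastes the raw numbers. Formatting explicitly with sprintf() or formatC() keeps the ".0". The vector below is typed in by hand from the output in the post, and pct and alt are names introduced here for illustration only.

```r
# Values copied by hand from the printed output in the post
nvb_deaths <- c(56.5, 1.6, 0.2, 3.9, 0.1, 2.2, 0.2, 2.6, 1.5, 4.1, 1.1, 6.1, 21.0)

# Percentage of the first element (pct is a hypothetical helper name)
pct <- 100 * nvb_deaths / nvb_deaths[1]

# "%.1f" always prints one decimal; "%%" is a literal percent sign
nvb_dths <- sprintf("%.1f (%.1f%%)", nvb_deaths, pct)
print(nvb_dths)

# formatC() gives the same fixed-decimal strings if paste() is preferred
alt <- paste(formatC(nvb_deaths, format = "f", digits = 1), " (",
             formatC(pct, format = "f", digits = 1), "%)", sep = "")
```

With this approach the 13th element comes out as "21.0 (37.2%)"; note that the first element also becomes "56.5 (100.0%)" rather than "56.5 (100%)".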
[R] Drawing lines in margins
Hi all,

Quick question: What function can I use to draw a line in the margin of a plot? segments() and lines() both stop at the margin.

In case the answer depends on exactly what I'm trying to do, see below. I'm using R v. 2.8.1 on Windows XP.

Cheers,
Alan

I'm trying to make a horizontal barplot with a column of numbers on the right side. I'd like to put a line between the column header and the numbers. The following reconstructs the idea - just copy and paste it in:

aa <- 1:10
plot.mtx2<-cbind(aa,aa+1)
colnames(plot.mtx2)<-c("Male","Female")
lci2<- cbind(aa-1,aa)
uci2<- cbind(aa+1,aa+2)
par(mar=c(5,6,4,5))
cols <- c("grey79","grey41")
bplot2<-barplot(t(plot.mtx2),beside=TRUE,xlab="Malaria death rates per 100,000",
    names.arg=paste("state",aa,sep=""),legend.text=F,las=1,xlim=c(0,13), horiz=T, col=cols,
    main="Malaria death rates by state and sex")
legend(8,6,legend=c("Female","Male"),fill=cols[order(2:1)])
segments(y0=bplot2, y1=bplot2, x0=t(lci2), x1=t(uci2))
mtext(10*(aa+1),side=4,line=4,at=seq(3,3*length(aa),by=3)-0.35,padj=0.5,adj=1,las=1,cex=0.85)
mtext(10*aa,side=4,line=4,at=seq(2,3*length(aa)-1,by=3)-0.65,padj=0.5,adj=1,las=1,cex=0.85)
mtext("Estimated",side=4,line=3,at=3*length(aa)+2.75,padj=0.5,adj=0.5,las=1,cex=0.85)
mtext("Deaths",side=4,line=3,at=3*length(aa)+1.25,padj=0.5,adj=0.5,las=1,cex=0.85)
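One way this is commonly handled, sketched here under assumptions: the graphical parameter xpd controls clipping, and setting par(xpd=TRUE) lets segments() and lines() draw in the figure margins (xpd=NA extends this to the whole device). The simplified barplot, the variable names bp, usr, x_left, x_right and y_rule, and the position of the rule are all invented for illustration and would need adjusting to the real plot.

```r
pdf(NULL)                        # off-screen device so the sketch runs without a display
par(mar = c(5, 6, 4, 5))         # same margins as in the post
bp <- barplot(1:10, horiz = TRUE, xlim = c(0, 13))

usr <- par("usr")                # plot-region limits in user coordinates: x1, x2, y1, y2
old <- par(xpd = TRUE)           # stop clipping at the plot region; returns the old setting

# Hypothetical position: a short horizontal rule in the right margin, above the top bar
x_left  <- usr[2] + 0.02 * (usr[2] - usr[1])
x_right <- usr[2] + 0.15 * (usr[2] - usr[1])
y_rule  <- max(bp) + 1
segments(x0 = x_left, x1 = x_right, y0 = y_rule, y1 = y_rule)

par(old)                         # restore the previous clipping behaviour
dev.off()
```

Coordinates passed to segments() are still in user units, so positions in the margin are simply values beyond the axis limits reported by par("usr").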
[R] Long to wide format without time variable
Hi all,

I am trying to convert a data set of physician death codings (each individual's cause of death is coded by multiple physicians) from long to wide format, but the "reshape" function doesn't seem to work because it requires a "time" variable to identify the sequence among the repeated observations within individuals. My data set has no order, and different numbers of physicians code each death, up to 23. It is also quite large, so for-loops are very slow, and I'll need to repeat the procedure multiple times. So I'm looking for a processor-efficient way to replicate "reshape" without a time variable.

Thanks in advance for any help you can provide. A worked example and some code I've tried are below. I'm working with R v2.8.1 on Windows XP Professional.

Cheers,
Alan Cohen

Here's what my data look like now:

> id <- rep(1:5,2)
> COD <- c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05")
> MDid <- c(1:6,3,5,7,2)
> data <- as.data.frame(cbind(id,COD,MDid))
> data
   id COD MDid
1   1 A01    1
2   2 A02    2
3   3 A03    3
4   4 A04    4
5   5 A05    5
6   1 B01    6
7   2 A02    3
8   3 B03    5
9   4 B04    7
10  5 A05    2

And here's what I'd like them to look like:

> id2 <- 1:5
> COD.1 <- c("A01","A02","A03","A04","A05")
> COD.2 <- c("B01","A02","B03","B04","A05")
> MDid.1 <- 1:5
> MDid.2 <-c(6,3,5,7,2)
> data.wide <- as.data.frame(cbind(id2,COD.1,COD.2,MDid.1,MDid.2))
> data.wide
  id2 COD.1 COD.2 MDid.1 MDid.2
1   1   A01   B01      1      6
2   2   A02   A02      2      3
3   3   A03   B03      3      5
4   4   A04   B04      4      7
5   5   A05   A05      5      2

Here's the for-loop that's very slow (with or without the if-clauses activated):

ids<-unique(data$id)
ct<-length(ids)
codes<-matrix(0,ct,11)
colnames(codes)<-c("ID","ICD1","Coder1","ICD2","Coder2","ICD3","Coder3","ICD4","Coder4","ICD5","Coder5")
j<-0
for (i in 1:ct){
  kkk <- ids[i]
  rpt<-data[data$id==kkk,]
  j<-max(j,nrow(rpt))
  codes[i,1]<-kkk
  codes[i,2]<-rpt$ICDCode[1]
  codes[i,3]<-rpt$T_Physician_ID[1]
  #if (nrow(rpt)>=2){
  codes[i,4]<-rpt$ICDCode[2]
  codes[i,5]<-rpt$T_Physician_ID[2]
  #if (nrow(rpt)>=3) {
  codes[i,6]<-rpt$ICDCode[3]
  codes[i,7]<-rpt$T_Physician_ID[3]
  #if (nrow(rpt)>=4) {
  codes[i,8]<-rpt$ICDCode[4]
  codes[i,9]<-rpt$T_Physician_ID[4]
  #if (nrow(rpt)>=5) {
  codes[i,10]<-rpt$ICDCode[5]
  codes[i,11]<-rpt$T_Physician_ID[5]
  # }
}
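A possible vectorised route, given as a sketch and not as the definitive answer: since the codings have no inherent order, an arbitrary within-individual counter can stand in for the "time" variable, after which reshape() works as usual. The counter column name seq and the object names dat and wide are invented for this example; the toy data are the ones from the post, built with data.frame() rather than cbind() so that MDid stays numeric.

```r
# Toy data from the post
id   <- rep(1:5, 2)
COD  <- c("A01", "A02", "A03", "A04", "A05", "B01", "A02", "B03", "B04", "A05")
MDid <- c(1:6, 3, 5, 7, 2)
dat  <- data.frame(id, COD, MDid, stringsAsFactors = FALSE)

# Number the rows within each id in order of appearance.
# 'seq' is a made-up column name serving as the surrogate time variable.
dat$seq <- ave(seq_along(dat$id), dat$id, FUN = seq_along)

# Standard long-to-wide reshape using the surrogate counter
wide <- reshape(dat, idvar = "id", timevar = "seq", direction = "wide")
print(wide)
```

When individuals have different numbers of codings, reshape() fills the missing cells with NA, so the widest individual determines the number of COD and MDid columns.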
[R] Problem with "apply"
Hi R users, I am trying to assign ages to age classes for a large data set (123,000 records), and using a for-loop was too slow, so I wrote a function and used apply. However, the function does not properly assign the first two classes (the rest are fine). It appears that when age is one digit, it does not get assigned properly. I tried to provide a small-scale work-up (at the end of the email) but it does not reproduce the problem; the best I can do is to provide my code and the output below. As you can see, I've confirmed that age is numeric, that all values are integers, and that pieces of the code work independently. Any thoughts would be appreciated. To add to the mystery, depending which rows of my data set I select, I get different problems. mds[1:100,] gives the problem above, as do mds[100:200,] , mds[150:250,] and mds[1:10100,]. However, with mds[200:300,], mds[250:350,] and mds[1000:1100,], only ages with 3 digits are correctly assigned - all ages <100 are returned as NA. I'm using R v 2.8.1 on Windows XP. 
Cheers,
Alan Cohen
Centre for Global Health Research, Toronto, ON

> ageassign <- function(x){
+ y <- NA
+ if (x[11] %in% c(0:4)) {y <- "0-4"}
+ else if (x[11] %in% c(5:14)) {y <- "5-14" }
+ else if (x[11] %in% c(15:29)) {y <- "15-29" }
+ else if (x[11] %in% c(30:69)) {y <- "30-69"}
+ else if (x[11] %in% c(70:79)) {y <- "70-79"}
+ else if (x[11] %in% c(80:125)) {y <- "80+"}
+ return(y)
+ }
> jj <- apply(mds[1:100,],1,FUN=ageassign)
> jj
      1       2       3       4       5       6       7       8       9      10      11      12      13
     NA   "80+" "30-69" "30-69"   "80+"      NA "30-69" "30-69" "70-79" "15-29" "15-29" "30-69" "70-79"
     14      15      16      17      18      19      20      21      22      23      24      25      26
  "80+"      NA "30-69" "30-69" "30-69"   "80+"   "80+" "15-29" "70-79" "30-69" "70-79" "70-79" "30-69"
     27      28      29      30      31      32      33      34      35      36      37      38      39
"70-79"   "80+"      NA   "80+" "70-79"      NA "15-29" "15-29"      NA      NA "70-79" "30-69" "30-69"
     40      41      42      43      44      45      46      47      48      49      50      51      52
"70-79" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "70-79" "15-29" "30-69"      NA "15-29" "30-69"
     53      54      55      56      57      58      59      60      61      62      63      64      65
"30-69"      NA "70-79" "30-69" "30-69" "30-69" "30-69" "15-29" "30-69" "30-69" "70-79" "30-69"      NA
     66      67      68      69      70      71      72      73      74      75      76      77      78
"30-69" "30-69" "30-69" "30-69" "30-69"   "80+" "30-69"   "80+" "70-79" "30-69" "30-69" "30-69"      NA
     79      80      81      82      83      84      85      86      87      88      89      90      91
"30-69" "30-69" "30-69"      NA   "80+" "30-69" "30-69" "30-69"      NA "15-29" "30-69" "30-69" "30-69"
     92      93      94      95      96      97      98      99     100
"30-69" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "30-69" "30-69"
> mds[1:100,11]
  [1]  3 82 40 35 82  1 37 57 71 22 21 52 73 86  1 43 60 63 84 88 29 73 69 75 73 43 75 83  4 83 77  1 27
 [34] 15  1  6 76 51 45 71 54 64 69 70 48 38 74 26 37  4 18 63 59  8 78 63 67 62 50 21 66 69 75 57  4 50
 [67] 58 60 61 62 83 69 92 75 30 49 69  1 69 63 69  0 93 64 59 69  2 25 32 60 66 67 54 53 64 79 59 49 59
[100] 64
> table(mds[,11])

   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
3123 6441 3856 2884 1968 1615 1386 1088 1098  721  943  681  511  380  426  835  571  555  719  653
  20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39
 879  715  672  631  655  773  680  713  769  538  685  566  729  702  652  766  683  723  821  675
  40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57
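A likely cause, stated as a hypothesis since the full data set is not shown: apply() converts a data frame with as.matrix(), and if any column is non-numeric the result is a character matrix in which numeric columns are formatted to a common width. An age of 3 then arrives in the function as " 3", which does not match any of 0:4. That would also explain why subsets containing three-digit ages lose every age below 100. The data frame toy and its columns are invented stand-ins for mds; the cut() call is one vectorised alternative that avoids apply() altogether, and its open-ended top class (no cap at 125) is a small departure from the original function.

```r
# Hypothetical stand-in for mds: a numeric age column plus a character column
toy <- data.frame(age = c(3, 82, 40, 1),
                  sex = c("M", "F", "F", "M"),
                  stringsAsFactors = FALSE)

# apply() works on as.matrix(toy); with mixed types this is a character matrix
m <- as.matrix(toy)
print(m[, "age"])                      # ages are now strings padded to equal width

# The padded single-digit strings fail to match the integers 0:4
padded_match <- m[, "age"] %in% 0:4

# Vectorised alternative working directly on the numeric column
age_class <- cut(toy$age,
                 breaks = c(-Inf, 4, 14, 29, 69, 79, Inf),
                 labels = c("0-4", "5-14", "15-29", "30-69", "70-79", "80+"))
print(age_class)
```

cut() uses right-closed intervals by default, so for integer ages the breaks above reproduce the classes 0-4, 5-14, 15-29, 30-69, 70-79 and 80+.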
[R] Weighted principal components analysis?
Hello R-ers,

I'm trying to do a weighted principal components analysis. I couldn't find any such option with princomp or prcomp. Does anyone know of a package or way to do this?

More specifically, the observations I'm working with are averages from populations of varying sizes. I thus need to weight the observations by sample size. Ideally I could apply these weights at the cell level (i.e., allowing sample size to vary within observations across variables), but even applying them just to the observations would get me most of the way there.

I'm using R v2.8.1 on Windows XP. I've searched Help and the R site and had no luck. Thanks for any help you can provide.

Cheers,
Alan Cohen
Centre for Global Health Research
Toronto, Ontario
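For observation-level weights only, one possible approach is to compute a weighted covariance matrix with cov.wt() and pass it to princomp() through its covmat argument. This is a sketch under assumptions: the data matrix x, the sample sizes n and all object names are made up, and the cell-level weighting mentioned above is not covered.

```r
set.seed(1)

# Hypothetical data: 10 population averages on 4 variables
x <- matrix(rnorm(40), ncol = 4,
            dimnames = list(NULL, c("v1", "v2", "v3", "v4")))

# Hypothetical sample sizes, turned into weights that sum to 1
n <- c(12, 50, 8, 100, 33, 20, 75, 5, 60, 41)
w <- n / sum(n)

# Weighted covariance and weighted means
cw <- cov.wt(x, wt = w)

# princomp() accepts a covariance list as returned by cov.wt()
pca <- princomp(covmat = cw)
print(pca$sdev)

# Scores are not returned when only covmat is supplied, so compute them by hand:
# centre on the weighted means, then project onto the loadings
scores <- scale(x, center = cw$center, scale = FALSE) %*% unclass(loadings(pca))
```

Because the centring uses the weighted means, the weighted average of each score column is zero, which is a quick check that the weights were actually applied.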
[R] Using apply to get group means
Hi all,

I'm trying to improve my R skills and make my programming more efficient and succinct. I can solve the following question, but wonder if there's a better way to do it:

I'm trying to calculate mean by several variables and then put this back into the original data set as a new variable. For example, if I were measuring weight, I might want to have each individual's weight, and also the group mean by, say, race, sex, and geographic region. The following code works:

> x1<-rep(c("A","B","C"),3)
> x2<-c(rep(1,3),rep(2,3),1,2,1)
> x3<-c(1,2,3,4,5,6,2,6,4)
> x<-as.data.frame(cbind(x1,x2,x3))
> x3.mean<-rep(0,nrow(x))
> for (i in 1:nrow(x)){
+ x3.mean[i]<-mean(as.numeric(x[,3][x[,1]==x[,1][i]&x[,2]==x[,2][i]]))
+ }
> cbind(x,x3.mean)
  x1 x2 x3 x3.mean
1  A  1  1     1.5
2  B  1  2     2.0
3  C  1  3     3.5
4  A  2  4     4.0
5  B  2  5     5.5
6  C  2  6     6.0
7  A  1  2     1.5
8  B  2  6     5.5
9  C  1  4     3.5

However, I'd love to be able to do this with "apply" rather than a for-loop. Or is there a built-in function? Any suggestions? Also, any way to avoid the hassles with having to convert to a data frame and then again to numeric when one variable is character?

Cheers,
Alan Cohen
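One built-in candidate is ave(), which computes a summary within groups and returns a vector the same length as the input, so the result can be attached straight back to the data. The sketch below rebuilds the example with data.frame() instead of cbind(); that keeps x3 numeric and sidesteps the conversion to character, which is an assumption about what would suit the real data.

```r
# Same toy data as in the post, built column by column so types are preserved
x <- data.frame(x1 = rep(c("A", "B", "C"), 3),
                x2 = c(rep(1, 3), rep(2, 3), 1, 2, 1),
                x3 = c(1, 2, 3, 4, 5, 6, 2, 6, 4),
                stringsAsFactors = FALSE)

# Group mean of x3 within each x1/x2 combination; ave() defaults to FUN = mean
x$x3.mean <- ave(x$x3, x$x1, x$x2)
print(x)
```

For this toy data the new column matches the x3.mean values produced by the for-loop above.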
[R] Failure to subset in R v 2.8.0
Hello,

I've been using a pre-release version of R v 2.8.0 for Windows for the last couple months. I think that there have been consistent problems with subsetting data sets, but I had usually been able to find work-arounds or was unable to confirm this as a bug. I think now I have, and would love advice on what to do if I've made some error.

The data set in question ("c") has 500,000 observations and 44 variables. The problematic variable, "month," takes integer values 1:12, and all are present in the data set:

> unique(c$month)
 [1] 11 10  9  8 12  1  7  4  6  2  5  3

However, I can't select observations of c for certain values of month:

> c[c$month==11,]
 [1] STATE        DISTRICT     TALUK        VILLAGE      TYPE         SERIALNO     INTDATE      QH101P
 [9] QH114        QH115A1      QH115B1      QH115C1      QH115A2      QH115B2      QH115C2      QH115A3
[17] QH115B3      QH115C3      QH115A4      QH115B4      QH115C4      QH115A5      QH115B5      QH115C5
[25] QH116        QH117A1      QH117B1      QH117C1      QH117A2      QH117B2      QH117C2      QH117A3
[33] QH117B3      QH117C3      QH117A4      QH117B4      QH117C4      QH117A5      QH117B5      QH117C5
[41] phase        year         month        stdistid.rch
<0 rows> (or 0-length row.names)

I get the same result for c[c[,43]==11,], and

> length(c$month[c$month==11])
[1] 0

This is true for most values of month (1,2,4,5,7,8,10,11), but the multiples of 3 work, apparently correctly.

Other variables do not have this problem (the columns shift in the email, but these three observations have month=11):

> c[c$STATE==11,][1:3,]
STATE DISTRICT TALUK VILLAGE TYPE SERIALNO INTDATE QH101P QH114 QH115A1 QH115B1 QH115C1 QH115A2 QH115B2 QH115C2 QH115A3 QH115B3
87556112 1 1151187 6 0 0 0 0 0 0 0 0 0
87557112 1 11 101187 3 0 0 0 0 0 0 0 0 0
87558112 1 11 141187 5 0 0 0 0 0 0 0 0 0
QH115C3 QH115A4 QH115B4 QH115C4 QH115A5 QH115B5 QH115C5 QH116 QH117A1 QH117B1 QH117C1 QH117A2 QH117B2 QH117C2 QH117A3 QH117B3 QH117C3
87556 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
87557 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
87558 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      QH117A4 QH117B4 QH117C4 QH117A5 QH117B5 QH117C5 phase year month stdistid.rch
87556       0       0       0       0       0       0     1 1998    11         1102
87557       0       0       0       0       0       0     1 1998    11         1102
87558       0       0       0       0       0       0     1 1998    11         1102

The data set is read directly from a csv file, where all variables should be stored in the same way, and using as.numeric(as.character(c$month)) does not help. Nor does restarting R, restarting the computer, or trying the operation on smaller subsets of c.

I'd appreciate any help you can provide.

Sincerely,
Alan Cohen
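One thing that might be worth ruling out, purely as a guess because the post does not show how month is stored: values that print as whole numbers but carry a small floating-point error will fail an exact == comparison even though unique() appears to list them. The sketch below uses invented values to show the symptom and two tolerance-based checks; it is not a diagnosis of the actual data set.

```r
# Hypothetical values: 11 stored with a tiny floating-point error, and an exact 3
month <- c(11 + 1e-9, 3, 11 + 1e-9)

print(month)                       # default printing shows 7 significant digits
print(sprintf("%.15g", month))     # more digits reveal any hidden difference

exact   <- month == 11             # exact comparison
near    <- abs(month - 11) < 1e-6  # comparison with a tolerance
rounded <- round(month) == 11      # or round first if the values should be integers
```

If sprintf("%.15g", ...) on the real column shows digits after the decimal point, comparing after round() or with a tolerance would be one workaround.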
[R] Memory limits for large data sets
Hello,

I have several very large data sets (1-7 million observations, sometimes hundreds of variables) that I'm trying to work with in R, and memory seems to be a big issue. I'm currently using a 2 GB Windows setup, but might have the option to run R on a server remotely. Windows R seems basically limited to 2 GB memory if I'm right; is there the possibility to go much beyond that with server-based R? In other words, am I limited by R or by my hardware, and how much might R be able to handle if I get the hardware necessary?

Also, any possibility of using web-based R for this kind of thing?

Cheers,
Alan Cohen

Alan Cohen
Post-doctoral Fellow
Centre for Global Health Research
70 Richmond St. East, Suite 202A
Toronto, ON M5C 1N8
Canada
(416) 854-3121 (cell)
(416) 864-6060 ext. 3156 (office)