[R] Export multiple data files from R
Dear R-users,

I have 10 data files in gpr format (dat1.gpr, ..., dat10.gpr). I want to read these files into R one by one and then add one extra column (called log) to each data file, as below:

data.file <- sort(dir(path = "C:/Documents and Settings/ Mina dokument/data1", pattern = "\\.gpr$", full.names = TRUE))
num.data.files <- length(data.file)
num.data.files
i <- 1
### read one data file
data <- read.table(file = data.file[i], skip = 31, header = TRUE, sep = "\t", na.strings = "NA")
### define the log ratio using the values in columns 2 and 8
log <- as.matrix(log(data[, 2] / data[, 8]))
### append a column called log to the data frame data, for the file just read
data <- cbind(data, log)
### read the remaining data files
for (i in 2:num.data.files) {
  data <- read.table(file = data.file[i], header = TRUE, skip = 31, sep = "\t", na.strings = "NA")
  log <- as.matrix(log(data[, 2] / data[, 8]))
  data <- cbind(data, log)
}

Now I want to export these files (each with the extra column) as gpr files to a folder called data2, but I don't know exactly how to do it. Can you help me out?

Thanks for your help,
Jenny

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
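One way to get the files into data2 is to write each file out inside the loop, before `data` is overwritten by the next file (in the loop above only the last file survives). A minimal sketch, assuming tab-separated files with 31 header lines to skip, as in the code above; `add_log_and_export` is a hypothetical helper name. Note that the 31 skipped header lines are not copied, so each output file is a plain tab-delimited table with a .gpr extension rather than a copy of the original file with its header.

```r
# Hypothetical helper: for every .gpr file in in_dir, skip the `skip` header
# lines, read the tab-separated table, append a column "log" holding
# log(column 2 / column 8), and write the result to out_dir under the same
# file name. The skipped header lines are NOT copied to the new files.
add_log_and_export <- function(in_dir, out_dir, skip = 31) {
  files <- sort(dir(path = in_dir, pattern = "\\.gpr$", full.names = TRUE))
  if (!dir.exists(out_dir)) dir.create(out_dir, recursive = TRUE)
  for (f in files) {
    dat <- read.table(file = f, skip = skip, header = TRUE, sep = "\t",
                      na.strings = "NA", check.names = FALSE)
    # log ratio of columns 2 and 8, added as a new column
    dat$log <- log(dat[, 2] / dat[, 8])
    # write into out_dir, keeping the original file name (e.g. dat1.gpr)
    write.table(dat, file = file.path(out_dir, basename(f)), sep = "\t",
                quote = FALSE, row.names = FALSE, na = "NA")
  }
  # return the paths of the files written
  invisible(file.path(out_dir, basename(files)))
}
```

Usage would look like `add_log_and_export("path/to/data1", "path/to/data2")`, where both paths are placeholders for the real folders.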
[R] keep track of selected observations over time
Dear all,

Attached is a description of my data, a graph, and the problem I need help with. I hope you have time to open the file and help me out.

Many thanks,
Jenny
[R] identify selected substances across individuals
Dear R-users,

Below is some example data, which contains three id numbers. For each id there are three substances in each of the three blocks, and some substances are repeated twice. The substances are the same for all ids. Id number 3 is actually a control, so all responses (y) that are equal to or greater than 4 are supposed to be removed from this id. This I can do easily in R, but what I need help with is this: I want the substances that are removed from id number 3 to be removed from the other ids as well.

I could write an algorithm like: for id in 1:2, if substance is in c("abc", "dgf") then delete. But if the substance names to be removed are long strings and there are more than two of them (for example 20 substances), then it would take a long time to list them manually. Can you please show me a clever way to do what I described above?

Another problem: let's assume that we use the same data, make a box plot for each id, and that the box plot for id number 3 shows two extreme outliers (probably the substances abc and dgf). I want to list the responses for these two substances for all ids and then make a profile plot for each substance across ids to check the variation in response. Again, if there are more than two substances and three ids, then writing an algorithm like "if substance is in c("abc", "dgf"), then list y" may be inefficient. Any suggestions, please?

Thanks a lot for your help,
Jenny

substance id block   y
abc        1     1 2.5
ade        1     1 2.2
dfg        1     1 3.0
ade        1     2 3.9
glf        1     2 2.4
wdg        1     2 2.8
abc        1     3 3.4
whl        1     3 4.2
dgf        1     3 2.7
abc        2     1 4.5
ade        2     1 3.2
dfg        2     1 3.0
ade        2     2 3.2
glf        2     2 2.0
wdg        2     2 2.3
abc        2     3 3.4
whl        2     3 4.1
dgf        2     3 2.5
abc        3     1 4.0
ade        3     1 3.0
dfg        3     1 3.0
ade        3     2 2.9
glf        3     2 2.3
wdg        3     2 2.8
abc        3     3 3.1
whl        3     3 2.8
dgf        3     3 5.0
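A minimal sketch of a vectorized approach using the example data above: the flagged substances are computed from the control (id 3) once, and `%in%` then tests every row against that set, so it does not matter how many substances are flagged or how long their names are. Averaging repeated measurements of a substance within an id for the profile plot is an assumption of this sketch; plotting the individual responses would work as well.

```r
# Example data from the post (stringsAsFactors = FALSE keeps substance as character)
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
substance id block y
abc 1 1 2.5
ade 1 1 2.2
dfg 1 1 3.0
ade 1 2 3.9
glf 1 2 2.4
wdg 1 2 2.8
abc 1 3 3.4
whl 1 3 4.2
dgf 1 3 2.7
abc 2 1 4.5
ade 2 1 3.2
dfg 2 1 3.0
ade 2 2 3.2
glf 2 2 2.0
wdg 2 2 2.3
abc 2 3 3.4
whl 2 3 4.1
dgf 2 3 2.5
abc 3 1 4.0
ade 3 1 3.0
dfg 3 1 3.0
ade 3 2 2.9
glf 3 2 2.3
wdg 3 2 2.8
abc 3 3 3.1
whl 3 3 2.8
dgf 3 3 5.0
")

# Substances with a response >= 4 in the control (id 3)
flagged <- unique(dat$substance[dat$id == 3 & dat$y >= 4])

# Question 1: drop those substances from every id, not only from the control
cleaned <- dat[!(dat$substance %in% flagged), ]

# Question 2: responses of the flagged substances for all ids
profile <- dat[dat$substance %in% flagged, ]

# Mean response per substance (rows) and id (columns); repeated
# measurements of a substance within an id are averaged
prof_means <- tapply(profile$y, list(profile$substance, profile$id), mean)

# Profile plot: one line per substance across the ids
matplot(t(prof_means), type = "b", pch = 1, lty = 1, xaxt = "n",
        xlab = "id", ylab = "mean response y")
axis(1, at = seq_len(ncol(prof_means)), labels = colnames(prof_means))
legend("topleft", legend = rownames(prof_means),
       col = seq_len(nrow(prof_means)), lty = 1, pch = 1)
```

For the box-plot scenario, the flagged set could instead be built from the outlying values that `boxplot.stats()` reports in its `out` component for id 3, for example `dat$substance[dat$id == 3 & dat$y %in% boxplot.stats(dat$y[dat$id == 3])$out]`; the rest of the code stays the same.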
[R] extracting data into different subsets
                  y Slide Block ID
441068 -0.020464103     1    15 GFASKTPANQA
448844  0.061400545     1    41 GFASKTPANQA
456620 -0.031026896    10    15 GFASKTPANQA
464396 -0.033166864    10    41 GFASKTPANQA
472172 -0.108148804    11    15 GFASKTPANQA
479948 -0.397759508    11    41 GFASKTPANQA
  4167  -0.67283526     1    13 ALPAFSPPAQA
 11943  -0.23982701     1    37 ALPAFSPPAQA
 19719  -0.10169540    10    13 ALPAFSPPAQA
 27495   0.70043972    10    37 ALPAFSPPAQA
 35271  -0.18807235    11    13 ALPAFSPPAQA
 43047  -0.17982104    11    37 ALPAFSPPAQA
  5264 -0.011681805     1    17 ATQAAGAGAVA
 13040 -0.073063462     1    41 ATQAAGAGAVA
 20816 -0.017996429    10    17 ATQAAGAGAVA
 28592  0.010159866    10    41 ATQAAGAGAVA
 36368 -0.056034035    11    17 ATQAAGAGAVA
 44144 -0.346175641    11    41 ATQAAGAGAVA
  5612   -0.7121977     1    18 GFASKTPANQA
 13388   -0.4076580     1    42 GFASKTPANQA
 21164   -0.1864131    10    18 GFASKTPANQA
 28940   -0.1140163    10    42 GFASKTPANQA
 36716   -0.3246222    11    18 GFASKTPANQA
 44492   -0.4355016    11    42 GFASKTPANQA

where there are 4 different IDs and each ID appears twice, in two blocks, on each of the 3 slides. I want to extract the data in such a way that every ID that appears for the first time is put in group 1, and the second time in group 2. For the data above, this means that the responses y in blocks 15, 13, 17 and 18 on each slide will be in group 1 and the rest will be in group 2. How can I do this in R?

Thanks for your help,
Jenny
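A minimal sketch using the excerpt above: within each Slide and ID, the blocks are ranked, and the lower half of the blocks in which that ID occurs is labelled group 1, the upper half group 2. For the excerpt this reproduces the rule stated in the post (GFASKTPANQA occurs in blocks 15, 18, 41 and 42 on each slide, so 15 and 18 go to group 1), but it assumes that every first copy lies in a lower-numbered block than every second copy, which should be checked on the full data. The column name `group` is a hypothetical choice.

```r
# The excerpt from the post; the header has one field fewer than the rows,
# so read.table uses the first column as row names
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
y Slide Block ID
441068 -0.020464103 1 15 GFASKTPANQA
448844 0.061400545 1 41 GFASKTPANQA
456620 -0.031026896 10 15 GFASKTPANQA
464396 -0.033166864 10 41 GFASKTPANQA
472172 -0.108148804 11 15 GFASKTPANQA
479948 -0.397759508 11 41 GFASKTPANQA
4167 -0.67283526 1 13 ALPAFSPPAQA
11943 -0.23982701 1 37 ALPAFSPPAQA
19719 -0.10169540 10 13 ALPAFSPPAQA
27495 0.70043972 10 37 ALPAFSPPAQA
35271 -0.18807235 11 13 ALPAFSPPAQA
43047 -0.17982104 11 37 ALPAFSPPAQA
5264 -0.011681805 1 17 ATQAAGAGAVA
13040 -0.073063462 1 41 ATQAAGAGAVA
20816 -0.017996429 10 17 ATQAAGAGAVA
28592 0.010159866 10 41 ATQAAGAGAVA
36368 -0.056034035 11 17 ATQAAGAGAVA
44144 -0.346175641 11 41 ATQAAGAGAVA
5612 -0.7121977 1 18 GFASKTPANQA
13388 -0.4076580 1 42 GFASKTPANQA
21164 -0.1864131 10 18 GFASKTPANQA
28940 -0.1140163 10 42 GFASKTPANQA
36716 -0.3246222 11 18 GFASKTPANQA
44492 -0.4355016 11 42 GFASKTPANQA
")

# Within each Slide x ID combination, rank the blocks and put the lower half
# into group 1 and the upper half into group 2 (assumes every first copy has
# a lower block number than every second copy)
dat$group <- ave(dat$Block, dat$Slide, dat$ID,
                 FUN = function(b) ifelse(rank(b) <= length(b) / 2, 1, 2))

# One data frame per group, named "1" and "2"
groups <- split(dat, dat$group)
groups[["1"]]
```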
[R] How to add characters on graph ?
Dear R-users,

I have the following data and plotting code:

# Plot coloured scatter plot
c <- dat[100:110, c(5, 7, 8)]
par(mfrow = c(3, 2))
plot(c$lb, c$index, pch = 1, col = 5, cex = 1, lwd = 2, xlab = "LB", ylab = "Index",
     cex.main = 1, font.main = 1, main = "scatterplot")

    ID                  index       lb
100 FLINDYTHNIPLI  1.84770221 9.087463
101 none           0.06657547 8.927778
102 GDDKVYSANGFTT -0.22922544 8.599913
103 GDFTQGPQSAKTR  0.01203925 8.483816
104 GDKEFSDALGYLQ -0.06264494 8.463524
105 GDPTETLRQCFDD -0.10011148 8.483816
106 GDSGGSFQNGHAQ -0.13460447 8.442943
107 GDVYSFAIIMQEV  1.91504700 8.413628
108 GLRSLYPPQ     -0.11224126 8.383704
109 GLWVTYKAQDAKT  0.03723291 8.257388
110 GMSQPLLDRTVPD -0.06580206 8.294621

When I plotted a scatter plot of index against lb, there were two extreme values. How can I make the plot so that these values are replaced by their IDs, or so that the IDs appear next to these values on the graph? I want to do something like: if index > 1.5, then plot the IDs instead of the indexes greater than 1.5, or place the IDs next to their indexes. The data above is a small part of my real data (which might have more than two extreme outliers).

Thanks for your help,
Jenny
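A sketch with base graphics, assuming the column names shown above and the threshold of 1.5 from the post: `text()` writes the ID of each point with index > 1.5 next to it (`pos = 4` places the label to the right of the point). `cdat` and `extreme` are hypothetical names; `cdat` stands in for the data frame the post calls `c`, which shares its name with the base function `c()`.

```r
# The example rows from the post
cdat <- data.frame(
  ID = c("FLINDYTHNIPLI", "none", "GDDKVYSANGFTT", "GDFTQGPQSAKTR",
         "GDKEFSDALGYLQ", "GDPTETLRQCFDD", "GDSGGSFQNGHAQ", "GDVYSFAIIMQEV",
         "GLRSLYPPQ", "GLWVTYKAQDAKT", "GMSQPLLDRTVPD"),
  index = c(1.84770221, 0.06657547, -0.22922544, 0.01203925, -0.06264494,
            -0.10011148, -0.13460447, 1.91504700, -0.11224126, 0.03723291,
            -0.06580206),
  lb = c(9.087463, 8.927778, 8.599913, 8.483816, 8.463524, 8.483816,
         8.442943, 8.413628, 8.383704, 8.257388, 8.294621),
  row.names = 100:110, stringsAsFactors = FALSE)

# Points to be labelled (threshold from the post)
extreme <- cdat$index > 1.5

plot(cdat$lb, cdat$index, pch = 1, col = 5, lwd = 2,
     xlab = "LB", ylab = "Index", main = "scatterplot")
# Write the ID to the right of each extreme point; xpd = TRUE lets a label
# extend past the plot region instead of being clipped
text(cdat$lb[extreme], cdat$index[extreme], labels = cdat$ID[extreme],
     pos = 4, cex = 0.8, xpd = TRUE)
```

To draw the IDs instead of the points, plot only the rows with `!extreme` and call `text()` for the extreme rows without `pos`, so each label is centred on the position of its point.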
[R] question about for loop
Dear R experts,

I have a dataset of 4 patients, and each patient has many records at four different time points. I have made 4 different qqnorm plots on the same graph, where each plot represents the records of one patient at one time point. I would like to make the same graph for the remaining patients, but instead of repeating the procedure below three more times I would like to use a for loop, so that when I run the loop I get four graphs for the four patients at the same time, each graph containing 4 different qqnorm plots representing the data at the four time points for that patient. I tried to write the loop but couldn't make it work. Below is the code used to make the graph for one patient. sli_4 to sli_7 contain the records at the four time points (dat = dataset, y = records, Slide = time point).

### Patient 24
sli_4 <- dat$y[dat$Slide == 4 & dat$Control == 0]
sli_5 <- dat$y[dat$Slide == 5 & dat$Control == 0]
sli_6 <- dat$y[dat$Slide == 6 & dat$Control == 0]
sli_7 <- dat$y[dat$Slide == 7 & dat$Control == 0]
### qq-plot of patient 24
q_sli4 <- qqnorm(sli_4, plot.it = FALSE)
q_sli5 <- qqnorm(sli_5, plot.it = FALSE)
q_sli6 <- qqnorm(sli_6, plot.it = FALSE)
q_sli7 <- qqnorm(sli_7, plot.it = FALSE)
plot(range(q_sli4, q_sli5, q_sli6, q_sli7), range(q_sli4, q_sli5, q_sli6, q_sli7), type = "n",
     xlab = "Theoretical Quantiles", col.main = "blue",
     main = "Normal Q-Q Plot of index for patient 24", ylab = "Sample Quantiles")
points(q_sli4, col = 4, pch = 0, cex = 1)
points(q_sli5, col = 3, pch = 1, cex = 1)
points(q_sli6, col = 2, pch = 2, cex = 1)
points(q_sli7, col = 1, pch = 3, cex = 1)
legend("topleft", c("Day 0", "56 days", "112 days", "252 days"), col = c(4, 3, 2, 1),
       text.col = c(4, 3, 2, 1), pch = c(0, 1, 2, 3), bg = "bisque")
abline(0, 0)

Thanks a lot for your help,
All the best,
Jenny
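A minimal sketch of the loop: the plotting code above is wrapped in a function of one patient's slide numbers, and a second function calls it once per patient inside `par(mfrow = c(2, 2))`, giving four panels at once. Only patient 24 (slides 4 to 7) appears in the post, so the mapping from the other patients to their slides is a placeholder to be filled in; `qq_patient`, `qq_all_patients` and `slides_by_patient` are hypothetical names.

```r
# Draw one Normal Q-Q plot for one patient, with one point series per
# time point (Slide); only records with Control == 0 are used, as in the post
qq_patient <- function(dat, patient, slides,
                       labels = c("Day 0", "56 days", "112 days", "252 days"),
                       cols = c(4, 3, 2, 1), pchs = c(0, 1, 2, 3)) {
  qs <- lapply(slides, function(s)
    qqnorm(dat$y[dat$Slide == s & dat$Control == 0], plot.it = FALSE))
  # common range of all theoretical and sample quantiles, used for both axes
  lim <- range(unlist(qs))
  plot(lim, lim, type = "n", xlab = "Theoretical Quantiles",
       ylab = "Sample Quantiles", col.main = "blue",
       main = paste("Normal Q-Q Plot of index for patient", patient))
  for (k in seq_along(qs)) points(qs[[k]], col = cols[k], pch = pchs[k], cex = 1)
  legend("topleft", labels, col = cols, text.col = cols, pch = pchs, bg = "bisque")
  abline(0, 0)
  invisible(qs)
}

# Loop over patients; slides_by_patient is a named list such as
# list("24" = 4:7, ...) giving the four slides of each patient
qq_all_patients <- function(dat, slides_by_patient) {
  op <- par(mfrow = c(2, 2))   # 2 x 2 panels, one per patient
  on.exit(par(op))
  for (p in names(slides_by_patient)) {
    qq_patient(dat, p, slides_by_patient[[p]])
  }
}
```

With the real data this would be called as `qq_all_patients(dat, slides_by_patient)` once the list holds all four patients.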
[R] How to plot two variables in the same qqnorm-plot?
Dear all,

I have two variables called c2 and c3 and want to plot them in the same qqnorm plot, with two different symbols or colors to distinguish them, so that I can easily compare the variables against each other. How can I do this in R? I only manage to make two separate qqnorm plots.

Thanks for your help,
All the best,
Jenny

c2 <- c(-0.1545775144, -0.0601161235, -0.1454710903,  0.1893182564, -0.0470586789,
        -0.0740945381, -0.0041386301,  0.0889232833, -0.0418779055, -0.0184595989,
        -0.1116784460,  0.5286719173, -0.0714560939, -0.1160750488,  0.2479689612,
        -0.0255424336, -0.1802256606, -0.1436590798,  0.2091955894, -0.0408695231,
        -0.0097490458,  0.4674886420,  0.1310178029,  0.2518403775)
c3 <- c(-0.1696564482, -0.1126714841, -0.1460504793,  0.1674967485, -0.0181011669,
        -0.0671367425,  0.3261235871,  0.0125372613, -0.0970306822,  0.0066345879,
        -0.0438274488,  0.8376670819, -0.1195411677, -0.0735540655,  0.2999832105,
        -0.0133914650, -0.1020235781, -0.0929364933,  0.1909337727, -0.0198168723,
         0.0544515704,  0.5744399944,  0.5022208978, -0.1494894501)
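A minimal sketch: compute both Q-Q point sets with `qqnorm(..., plot.it = FALSE)`, draw the first with axis limits that cover both, and add the second with `points()`. The reference lines from `qqline()` and the colours and symbols are choices of this sketch; `qq_two` is a hypothetical name.

```r
# Overlay the Normal Q-Q points of two samples on one set of axes
qq_two <- function(a, b, labels = c("c2", "c3"),
                   cols = c("blue", "red"), pchs = c(1, 2)) {
  qa <- qqnorm(a, plot.it = FALSE)
  qb <- qqnorm(b, plot.it = FALSE)
  # axis limits that cover both samples
  plot(qa, xlim = range(qa$x, qb$x), ylim = range(qa$y, qb$y),
       col = cols[1], pch = pchs[1], xlab = "Theoretical Quantiles",
       ylab = "Sample Quantiles", main = "Normal Q-Q Plot")
  points(qb, col = cols[2], pch = pchs[2])
  # one reference line per sample, in the matching colour
  qqline(a, col = cols[1])
  qqline(b, col = cols[2])
  legend("topleft", legend = labels, col = cols, pch = pchs)
  invisible(list(a = qa, b = qb))
}
```

With the vectors from the post, the call is `qq_two(c2, c3)`.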
[R] Question about position in row and column
Hi R-users,

I have 48 blocks arranged like below:

 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
17 18 19 20
21 22 23 24
25 26 27 28
29 30 31 32
33 34 35 36
37 38 39 40
41 42 43 44
45 46 47 48

In each block there are 18 columns and 18 rows, which gives 18*18*48 observations. At the moment every block has its own row and column numbers from 1 to 18, so the matrix looks like:

            1 ... 18    1 ... 18    1 ... 18    1 ... 18
 1 ... 18   block 1     block 2     block 3     block 4
 1 ... 18   block 5     block 6     block 7     block 8
 1 ... 18   block 9     block 10    block 11    block 12
 1 ... 18   block 13    block 14    block 15    block 16
 ...
 1 ... 18   block 45    block 46    block 47    block 48

What I want to do is to add column and row positions to blocks 2-48 so that the numbering runs continuously over the whole matrix:

              1 ... 18    19 ... 36   37 ... 54   55 ... 72
  1 ... 18    block 1     block 2     block 3     block 4
 19 ... 36    block 5     block 6     block 7     block 8
 37 ... 54    block 9     block 10    block 11    block 12
 55 ... 72    block 13    block 14    block 15    block 16
 ...
199 ... 216   block 45    block 46    block 47    block 48

How can I do this in R? I tried the following, where layout is a matrix containing the columns block, row, column, id and name:

## Column positions ###
blockcol <- rep(0, nrow(layout))
if (layout$Block %% 4 == 2) {blockcol = layout$Column + 18}
if (layout$Block %% 4 == 3) {blockcol = layout$Column + 36} else {blockcol = layout$Column + 54}
## Row positions ###
blockrow <- rep(0, nrow(layout))
if (layout$Block <= 4) {blockrow = layout$Row}
for (j in 2:12) {
  if ((j-1)*4 < layout$Block <= j*4) {blockrow = layout$Row + (j-1)*18}
}

but I didn't get the desired result.

Thanks for your help,
Jenny
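Because the blocks are numbered row by row with four blocks per block-row, each block's position follows from integer division: blockrow = (Block - 1) %/% 4 + 1 and blockcol = (Block - 1) %% 4 + 1. The position on the whole array is then the block's offset plus the position inside the block. This works on the whole column at once, whereas `if` expects a single TRUE/FALSE value rather than a column of them. A minimal sketch, assuming the column names Block, Row and Column used in the code above; `add_global_positions`, `globalrow` and `globalcol` are hypothetical names.

```r
# Hypothetical helper: add positions on the whole array to a layout with
# columns Block, Row and Column (within-block positions 1..block_size).
# Assumes blocks are numbered row-wise, blocks_per_row blocks per block-row.
add_global_positions <- function(layout, blocks_per_row = 4, block_size = 18) {
  blockrow <- (layout$Block - 1) %/% blocks_per_row + 1  # 1..12 for 48 blocks
  blockcol <- (layout$Block - 1) %%  blocks_per_row + 1  # 1..4
  layout$globalrow <- (blockrow - 1) * block_size + layout$Row     # 1..216
  layout$globalcol <- (blockcol - 1) * block_size + layout$Column  # 1..72
  layout
}
```

If layout is a matrix rather than a data frame, convert it first with `as.data.frame(layout)`, since `$` does not work on an atomic matrix.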
Re: [R] how to create a vector with different categories in a simple way?
Thank you so much, Marc and Phil. Unfortunately, I had misunderstood the problem myself and wasn't clear about how I wanted the variables to be. I will describe the issue again and hope you can help me out. Here is part of the data, called layout:

 Id Name block col row
  1    a     1   1   1
  2    b     1   2   1
  3    c     1   3   1
  4    a     1   4   1
  5    b     1   1   2
  6    c     1   2   2
  7    b     1   3   2
  8    c     1   4   2
  9    d     1   1   3
 10    e     1   2   3
 11    a     1   3   3
 12    d     1   4   3
 13    e     1   1   4
 14    a     1   2   4
 15    d     1   3   4
 16    c     1   4   4
 17    d     2   1   1
 18    c     2   2   1
 19    e     2   3   1
 20    d     2   4   1
 21    b     2   1   2
 22    e     2   2   2
 23    f     2   3   2
 24    d     2   4   2
...
 32    a     2   4   4
... and so on ...
768    f    48   4   4

As you can see, there are 4 columns for each row, so the total number of observations in each block is 16. My real data contains 48 blocks, which gives 768 observations in total. The blocks, numbered 1-48, are displayed four in a row, like below (note that there are 4 rows and 4 columns in each block):

 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
17 18 19 20
21 22 23 24
25 26 27 28
29 30 31 32
33 34 35 36
37 38 39 40
41 42 43 44
45 46 47 48

What I want is to create two variables, called blockrow and blockcol respectively, in such a way that blockrow has the value 1 for blocks 1, 2, 3 and 4, blockrow = 2 for blocks 5, 6, 7 and 8, and so on. Similarly, blockcol = 1 for blocks 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41 and 45, and so on. As you can see, there are 12 blockrows and 4 blockcols. The data should look like:

 Id Name block col row blockrow blockcol
  1    a     1   1   1        1        1
  2    b     1   2   1        1        1
  3    c     1   3   1        1        1
  4    a     1   4   1        1        1
  5    b     1   1   2        1        1
  6    c     1   2   2        1        1
  7    b     1   3   2        1        1
  8    c     1   4   2        1        1
  9    d     1   1   3        1        1
 10    e     1   2   3        1        1
 11    a     1   3   3        1        1
 12    d     1   4   3        1        1
 13    e     1   1   4        1        1
 14    a     1   2   4        1        1
 15    d     1   3   4        1        1
 16    c     1   4   4        1        1
 17    d     2   1   1        1        2
 18    c     2   2   1        1        2
 19    e     2   3   1        1        2
 20    d     2   4   1        1        2
 21    b     2   1   2        1        2
 22    e     2   2   2        1        2
 23    f     2   3   2        1        2
 24    d     2   4   2        1        2
...
 32    a     2   4   4        1        2
... and so on ...
768    f    48   4   4       12        4

I have an algorithm:

blockrow <- 1
if (layout$block <= 4) blockrow <- 1
if (5 <= layout$block <= 8) blockrow <- 2
if (9 <= layout$block <= 12) blockrow <- 3
and so on.

Can I write a for loop like:

#-- Append some more columns to matrix layout --
blockrow <- rep(0, nrow(layout))
blockcol <- rep(0, nrow(layout))
for (a in 1:12) {
  if (4*a+1 <= layout$block <= (a+1)*4) blockrow <- (a+1)
}

Similarly,

blockcol <- 1
if (layout$block = 5,9,13,17,21,25,29,33,37,41,45) blockcol <- 1
if (layout$block = 2,6,10,14,18,22,26,30,34,38,42,46) blockcol <- 2
and so on, which gives the for loop

blockcol <- 1
for (a in 1:12) {
  if (layout$block == (4*a+1)) blockrow <- 1
}

Or how can I do it in R so that I get blockrow and blockcol as I want?

Thanks again for your help,
Best regards,
Yen
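A loop-free alternative that mirrors the 12 x 4 arrangement of blocks shown above: store the block numbers in a matrix filled row by row, then look up each block's row and column in that matrix with `match()`, `row()` and `col()`. Because `match()` handles the whole block column at once, no `if` or loop is needed. A minimal sketch, assuming the column name block from the post; `add_block_position` is a hypothetical helper.

```r
# Hypothetical helper: add blockrow and blockcol to a data frame with a
# column "block", assuming blocks are numbered row-wise in an
# n_blockrows x n_blockcols grid (12 x 4 for 48 blocks)
add_block_position <- function(layout, n_blockrows = 12, n_blockcols = 4) {
  # grid[i, j] is the number of the block in block-row i, block-column j
  grid <- matrix(seq_len(n_blockrows * n_blockcols),
                 nrow = n_blockrows, byrow = TRUE)
  idx <- match(layout$block, grid)   # position of each block inside grid
  layout$blockrow <- row(grid)[idx]
  layout$blockcol <- col(grid)[idx]
  layout
}
```

Applied to the full layout, `add_block_position(layout)` returns the table with the two extra columns, provided layout is a data frame.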
[R] how to create a vector with different categories in a simple way?
Hi R-users,

I have a matrix called layout which contains 5 columns: id, name, row, column and block. The column called block contains 48 blocks in total, arranged like this:

 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
17 18 19 20
21 22 23 24
25 26 27 28
29 30 31 32
33 34 35 36
37 38 39 40
41 42 43 44
45 46 47 48

Each block (1-48) has 18 rows and 18 columns. I want to create 2 variables called blockrow and blockcol in such a way that blockrow has the value 1 for blocks 1, 2, 3 and 4, blockrow = 2 for blocks 5, 6, 7 and 8, and so on. Similarly, blockcol = 1 for blocks 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41 and 45, and so on. As you can see, there are 12 blockrows and 4 blockcols. I have written the following program, but it did not give the desired output. How can I make it work in a simpler way?

# the first five columns give the position identifiers
layout = a[, 1:5]
#-- Append some more columns to the identifiers --
blockrow <- rep(0, nrow(layout))
blockcol <- rep(0, nrow(layout))
blockrow[layout$Block == c(1,2,3,4)] = 1
blockrow[layout$Block == c(5,6,7,8)] = 2
blockrow[layout$Block == c(9,10,11,12)] = 3
blockrow[layout$Block == c(13,14,15,16)] = 4
blockrow[layout$Block == c(17,18,19,20)] = 5
blockrow[layout$Block == c(21,22,23,24)] = 6
blockrow[layout$Block == c(25,26,27,28)] = 7
blockrow[layout$Block == c(29,30,31,32)] = 8
blockrow[layout$Block == c(33,34,35,36)] = 9
blockrow[layout$Block == c(37,38,39,40)] = 10
blockrow[layout$Block == c(41,42,43,44)] = 11
blockrow[layout$Block == c(45,46,47,48)] = 12
blockcol[layout$Block == c(1,5,9,13,17,21,25,29,33,37,41,45)] = 1
blockcol[layout$Block == c(2,6,10,14,18,22,26,30,34,38,42,46)] = 2
blockcol[layout$Block == c(3,7,11,15,19,23,27,31,35,39,43,47)] = 3
slidecol[layout$Block == c(4,8,12,16,20,24,28,32,36,40,44,48)] = 4
#-- re-arrange the response (index) as a long vector and the
# layout matrix as a long matrix matching the response vector
y = c(index)
layout = data.frame(layout, Control, blockow, blockcol)
Block = as.factor(rep(layout$Block, ncol(index)))
Column = rep(layout$Column, ncol(index))
Row = rep(layout$Row, ncol(index))
Name = as.factor(rep(layout$Name, ncol(index)))
ID = rep(layout$ID, ncol(index))
blockrow = rep(layout$sliderow, ncol(index))
blockcol = rep(layout$slidecol, ncol(index))

Thanks for your help,
Best regards,
Jenny
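In the code above, `layout$Block == c(1,2,3,4)` compares element by element and recycles `c(1,2,3,4)` along the column, so a row is matched only when its block happens to sit at the same position as the equal value in the recycled vector. For example, `c(2, 1, 3, 4) == c(1, 2, 3, 4)` gives `FALSE FALSE TRUE TRUE`. The membership operator `%in%` tests each row against the whole set instead. A minimal sketch of the same lookup idea with `%in%` and two short loops, assuming 4 blocks per block-row and 12 block-rows; `set_block_position` is a hypothetical name. (The code above also assigns to `slidecol` where `blockcol` seems intended, and later uses `blockow`, `layout$sliderow` and `layout$slidecol`, which do not match the names created earlier.)

```r
# Hypothetical helper: blockrow and blockcol for a vector of block numbers,
# using %in% (set membership) instead of == (element-wise, recycled)
set_block_position <- function(Block, blocks_per_row = 4, n_blockrows = 12) {
  blockrow <- rep(0, length(Block))
  blockcol <- rep(0, length(Block))
  for (r in seq_len(n_blockrows)) {
    # blocks (r-1)*4 + 1, ..., r*4 form block-row r
    blockrow[Block %in% ((r - 1) * blocks_per_row + seq_len(blocks_per_row))] <- r
  }
  for (k in seq_len(blocks_per_row)) {
    # blocks k, k + 4, k + 8, ... form block-column k
    blockcol[Block %in% seq(k, n_blockrows * blocks_per_row, by = blocks_per_row)] <- k
  }
  data.frame(Block = Block, blockrow = blockrow, blockcol = blockcol)
}
```

The two new columns can then be attached to the layout, for example with `cbind(layout, set_block_position(layout$Block)[, c("blockrow", "blockcol")])`, before the `rep()` steps that build the long format.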