Re: [R] creating a reverse geometric sequence
Erik Iverson er...@ccbr.umn.edu writes:

Hello,

Can anyone think of a non-iterative way to generate a decreasing geometric sequence in R? For example, for a hypothetical function dg, I would like:

dg(20)
[1] 20 10  5  2  1

where I am using integer division by 2 to get each subsequent value in the sequence. There is of course:

dg <- function(x) {
    res <- integer()
    while(x >= 1) {
        res <- c(res, x)
        x <- x %/% 2
    }
    res
}

dg(20)
[1] 20 10  5  2  1

This implementation of 'dg' uses an iterative 'while' loop. I'm simply wondering if there is a way to vectorize this process?

Hi Erik,

How about

dg <- function(x) {
    maxi <- floor(log(x)/log(2))
    floor(x / (2^(0:maxi)))
}

I don't think the remainders cause a problem.

Dan

Thanks,
Erik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
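A quick check of the remark about remainders: for positive integers, applying %/% 2 repeatedly gives the same values as dividing by successive powers of two and rounding down once, so the loop and the vectorised version can be compared directly. This sketch uses log2() instead of log(x)/log(2) (our own substitution, meant to avoid rounding surprises at exact powers of two) and compares the two versions for x = 1, ..., 1000.

```r
## Loop version from the question
dg.loop <- function(x) {
  res <- integer()
  while (x >= 1) {
    res <- c(res, x)
    x <- x %/% 2
  }
  res
}

## Vectorised version: x %/% 2^k for k = 0, 1, ..., floor(log2(x))
dg.vec <- function(x) {
  maxi <- floor(log2(x))
  x %/% 2^(0:maxi)
}

dg.vec(20)
## [1] 20 10  5  2  1

## Compare the two on x = 1, ..., 1000 (as.numeric() because the loop may return integers)
same <- vapply(1:1000, function(x)
  identical(as.numeric(dg.loop(x)), as.numeric(dg.vec(x))), logical(1))
all(same)
## [1] TRUE
```

For very large x (near 2^50 and beyond) floor(log2(x)) can itself round up, so this is only intended for moderately sized integers.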
Re: [R] C function call in R
John Lande john.land...@gmail.com writes:

dear all,

we are trying to improve the performance of our R code by implementing some functions in custom C code. We found it difficult to import and export data structures such as matrices or data.frames into the external C functions.

Please give a *very simple* example of what you're trying and failing to do.

Use the .C() interface, forget about the .Call interface. Then it is not that hard. Start with the convolve example on p.69 and 70 of Writing R Extensions. Get that working and then turn it into your problem.

Forget about lists and data frames: everything is going to be a simple vector. That includes arrays and matrices: you can pass them in, but C will know nothing about their dimensions until you tell it. Of course, you can pass the dimension vectors in as a separate vector. So, if you use arrays, you need to understand the order in which R stores the elements of the array.

If your problem cannot be solved with the .C interface then you should consider whether it is worthwhile to proceed, as the .Call interface repays those who use it frequently but has a considerably steeper learning (and forgetting) curve.

Dan

we already tried the solution from Writing R Extensions from the R webpage. do you have any other solution or more advanced documentation on that point?

looking forward to your answer
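To make the storage-order point concrete: R stores matrices and arrays in column-major order, so a matrix reaches C through .C() as one flat vector, and element [i, j] of a matrix with nrow rows sits at 1-based position i + (j - 1) * nrow (offset (i - 1) + (j - 1) * nrow in 0-based C indexing). This sketch stays entirely on the R side with an arbitrary example matrix; it does not call any C code, and flat.index() is a hypothetical helper written for illustration.

```r
m <- matrix(1:6, nrow = 2)    # 2 x 3 matrix, filled column by column
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

flat <- as.vector(m)          # the plain vector a .C() routine would receive
flat
## [1] 1 2 3 4 5 6

## 1-based position of m[i, j] in the flat vector (subtract 1 for a C offset)
flat.index <- function(i, j, nrow) i + (j - 1) * nrow
flat[flat.index(2, 3, nrow(m))]   # same element as m[2, 3]
## [1] 6

## The dimensions do not travel with the data; pass them as a separate vector
dim(m)
## [1] 2 3
```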
Re: [R] reading in all files of a certain type
Dimitri Liakhovitski dimitri.liakhovit...@gmail.com writes:

Thanks again - and one follow-up question. When I do

do.call(rbind, lapply(dir(patt = "\\.csv$"), read.csv))

what is the right way to specify (probably under patt) that I only need to grab those .csv files that contain a certain string, e.g., "result"?

I assume you mean whose names contain a certain string, rather than the string being in the file contents. The pattern argument to dir() is a regular expression. They are a worthwhile thing to know a bit about, so you should have a look at some introductory material on regular expressions, but this might also help a bit:

dir()
[1] "1-result.csv" "result-2.csv" "resultcsv"    "result.csv"
dir(patt="result\\.csv$")
[1] "1-result.csv" "result.csv"
dir(patt="result.*\\.csv$")
[1] "1-result.csv" "result-2.csv" "result.csv"

Dan

I tried a couple of things, like patt="\\.csv$" pat="result" - but it does not seem to work

Thanks a lot!
Dimitri

On Wed, May 12, 2010 at 6:16 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:
Thanks a lot, Henrique, will try!
Dimitri

On Wed, May 12, 2010 at 3:41 PM, Henrique Dallazuanna www...@gmail.com wrote:
Try this:

do.call(rbind, lapply(dir(patt = "\\.csv$"), read.csv))

On Wed, May 12, 2010 at 4:32 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:
Hello,

I am wondering if it's possible to read in all files of a certain type - without specifying their names. For example, I have 10 .csv files in my working directory. I would like to read them in and bind them all together. I was thinking of writing a loop, read in all files, and then bind them. Is it possible?

Thanks a lot!

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
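A self-contained sketch of the whole recipe, assuming the files wanted are those whose names contain "result" and end in .csv. It writes three small throwaway CSV files (hypothetical names and contents) into a temporary directory so the working directory is untouched. list.files() is the same function as dir(); full.names = TRUE is needed here because the files are not in the working directory.

```r
## Throwaway directory with three small CSV files (hypothetical data)
tmp <- file.path(tempdir(), "csv-demo")
dir.create(tmp, showWarnings = FALSE)
for (f in c("1-result.csv", "result-2.csv", "other.csv")) {
  write.csv(data.frame(file = f, n = nchar(f)), file.path(tmp, f), row.names = FALSE)
}

## Keep only files whose names contain "result" and end in ".csv"
files <- list.files(tmp, pattern = "result.*\\.csv$", full.names = TRUE)
basename(files)

## Read each matching file and stack the rows
combined <- do.call(rbind, lapply(files, read.csv))
combined
```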
Re: [R] reading in all files of a certain type
jim holtman jholt...@gmail.com writes:

try: pattern="*result*\\.csv$"

Just for the record, that's not quite correct. The * doesn't behave like in a shell glob. Instead, * says "0 or more copies of the previous character". So the above pattern picks up "resul.csv", which I don't think was intended. I don't know what a * is defined to do when it is the first character of a regexp, but I believe it should be avoided.

Dan
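The difference is easy to see on a few made-up file names that include the "resul.csv" case. In a regular expression t* means "zero or more t characters", so result*\\.csv$ also matches resul.csv. The shell-style intent can be written as result.*\\.csv$, or generated from a glob with glob2rx() from the utils package. The leading * of the suggestion is left out here, as advised above.

```r
files <- c("1-result.csv", "result-2.csv", "resul.csv", "resultcsv", "result.csv")

## "t*" = zero or more t's: "resul.csv" slips through, "result-2.csv" does not
grepl("result*\\.csv$", files)
## [1]  TRUE FALSE  TRUE FALSE  TRUE

## ".*" = any run of characters, which is what the shell's "*" means
grepl("result.*\\.csv$", files)
## [1]  TRUE  TRUE FALSE FALSE  TRUE

## glob2rx() translates a shell glob into an equivalent regular expression
grepl(glob2rx("*result*.csv"), files)
## [1]  TRUE  TRUE FALSE FALSE  TRUE
```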
Re: [R] Variable variables using R ... e.g., looping over data frames with a numeric separator
Monte Shaffer monte.shaf...@gmail.com writes:

Hello,

I have programmed in PHP a lot, and wanted to know if anyone figured out "variable variables" using R. For example, I have several data frames of unequal sizes that relate to L treatments (1, 2, 3, 4, 5, 6, L) ... in this case L=7

You should create a list containing 7 data frames, rather than attempting to identify them with names containing integers. Then you can process your data frames in a for loop, or with lapply etc., and things should generally seem much better.

df.list <- list(fData.1, fData.2, fData.3, fData.4, fData.5, fData.6, fData.7)

Dan

fData.1 unique.1 fit.nls.1 summary.nls.1 fit.var.1 summary.var.1
...
fData.2 unique.2 fit.nls.2 summary.nls.2 fit.var.2 summary.var.2
...
fData.L unique.L fit.nls.L summary.nls.L fit.var.L summary.var.L

I want to do something like

for(i in 1:L-1) {
    dataStr = gsub(' ','',paste("fData.",i));
    dataVar = eval(dataStr);
    ## GOAL is to grab data frame 'fData.1' and do stuff with it, then in next loop grab data frame 'fData.2' and do stuff with it
}

# in PHP, I would define the string $dataStr = "final.1" and then $dataVar = $$dataStr which is a variable variables use.

Thanks in advance for any help you can offer or suggest. My current solution is to write code in PHP that generates lots of R code. I would like to do it all in R, so I don't have to rely on another language.

monte
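A minimal sketch of the list approach, using three small hypothetical data frames in place of fData.1 ... fData.7. If the objects already exist under numbered names, base R's mget() can gather them into a named list in one step; after that, sapply()/lapply() do the per-treatment work (a plain lm() fit stands in here for the nls and variance fits in the question). One aside on the loop in the question: ':' binds more tightly than '-', so 1:L-1 means (1:L) - 1, i.e. 0 to L-1.

```r
## Toy stand-ins for fData.1, fData.2, ... (hypothetical data)
fData.1 <- data.frame(x = 1:3, y = c(2, 4, 6))
fData.2 <- data.frame(x = 1:5, y = c(1, 3, 5, 7, 9))
fData.3 <- data.frame(x = 1:4, y = c(0, 1, 0, 1))

## Look the objects up by name once, producing a named list
df.list <- mget(paste0("fData.", 1:3), envir = environment())

## Per-treatment work without constructing variable names
n.rows <- sapply(df.list, nrow)
slopes <- sapply(df.list, function(d) coef(lm(y ~ x, data = d))[["x"]])

## The precedence trap in for(i in 1:L-1)
L <- 7
1:L - 1          # parsed as (1:L) - 1
## [1] 0 1 2 3 4 5 6
```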
Re: [R] applying quantile to a list using values of another object as probs
Lorenzo Cattarino l.cattar...@uq.edu.au writes:

Hi Jim,

Thanks for your reply. Your code does work but I was hoping to find a way to use lapply and avoid the for loop.

Lorenzo

-----Original Message-----
From: Jim Lemon [mailto:j...@bitwrit.com.au]
Sent: Monday, 17 May 2010 8:27 PM
To: Lorenzo Cattarino
Cc: r-help@r-project.org
Subject: Re: [R] applying quantile to a list using values of another object as probs

On 05/17/2010 06:01 PM, Lorenzo Cattarino wrote:
Hi r-users,

I have a matrix B and a list of 3x3 matrices (mylist). I want to calculate the quantiles in the list using each of the values of B as probabilities.

It's a little confusing, because it isn't clear why the elements of mylist are matrices, nor why B is a matrix. I.e. why aren't these things just dimensionless vectors? However if you really do want to ignore the row/column information then perhaps what you're looking for is

lapply(mylist, quantile, probs=B)
[[1]]
 26.55087%  37.21239%  57.28534%  90.82078%  20.16819%  89.83897%  94.46753%  66.07978%   62.9114%
-0.2191315  0.3738468  0.5389231  1.2277025 -0.4274793  1.1973174  1.3405621  0.6223309  0.5811310
 6.178627%  20.59746%  17.65568%
-1.4270686 -0.4166326 -0.4909661

[[2]]
   26.55087%    37.21239%    57.28534%    90.82078%    20.16819%    89.83897%    94.46753%
-0.004930323  0.072476814  0.703609732  0.925581428 -0.027300847  0.923628895  0.932833742
   66.07978%     62.9114%    6.178627%    20.59746%    17.65568%
 0.793329524  0.783422677 -1.028244961 -0.026313767 -0.033078300

[[3]]
 26.55087%  37.21239%  57.28534%  90.82078%  20.16819%  89.83897%  94.46753%  66.07978%   62.9114%
-0.1492189 -0.1040074  0.2025300  0.8161114 -0.2803999  0.7580782  1.0316644  0.3963404  0.3886679
 6.178627%  20.59746%  17.65568%
-0.9801188 -0.2693299 -0.3451936

Dan

The code I wrote is:

B <- matrix(runif(12, 0, 1), 3, 4)
mylist <- lapply(mylist, function(x) {matrix(rnorm(9), 3, 3)})
for (i in 1:length(B)) {
    quant <- lapply(mylist, quantile, probs=B[i])
}

But quant returned the quantiles calculated using only the last value ([3,4]) of the matrix B.

Hi Lorenzo,

This works for me:

B <- matrix(runif(12,0,1),3,4)
mylist <- list()
for(i in 1:3) mylist[[i]] <- matrix(rnorm(9),3,3)
myq <- list()
for(i in 1:3) myq[[i]] <- quantile(mylist[[i]],probs=B[i,])

Although looking at your example, I may have misunderstood what you want the result to be.

Jim
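Both readings of the question can be written with lapply() and no overwriting loop. This sketch uses random toy data shaped like the thread's (set.seed() is added only for reproducibility). Form (a) applies all 12 probabilities in B to every matrix, as in Dan's reply (as.vector(B) just makes the column-major flattening explicit); form (b) pairs matrix i with row i of B, which is what Jim's loop computes.

```r
set.seed(1)
B <- matrix(runif(12, 0, 1), 3, 4)
mylist <- lapply(1:3, function(i) matrix(rnorm(9), 3, 3))

## (a) every matrix, all 12 probabilities: a list of 3 vectors of length 12
quant.all <- lapply(mylist, quantile, probs = as.vector(B))

## (b) matrix i with the probabilities in row i of B: a list of 3 vectors of length 4
quant.rows <- lapply(seq_along(mylist), function(i) quantile(mylist[[i]], probs = B[i, ]))

sapply(quant.all, length)
## [1] 12 12 12
sapply(quant.rows, length)
## [1] 4 4 4
```

Because B is stored column by column, row 2 of B corresponds to elements 2, 5, 8 and 11 of as.vector(B), so quant.rows[[2]] is a subset of quant.all[[2]].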
[R] Using dev.copy
I'm working over an ssh connection without X11 graphics. I'm making a plot, the first stage of which takes a long time to draw. I want to experiment with adding details. Here is what I was hoping to do, which results in an error.

## Draw the master plot on png dev 2
png(file="master.png")
plot(1:10)

## Save a copy on png dev 3
png(file="copy1.png")
dev.set(2)
dev.copy(which=3)

## Add details to copy, write to disk and view
abline(v=5)
Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...) :
  plot.new has not been called yet

Can someone tell me how to do this correctly?

Thanks a lot,
Dan
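A likely cause: dev.copy() works by replaying the source device's display list, and R records a display list by default only on screen devices, not on file devices such as png() or pdf(). The copy then receives nothing, and abline() finds no plot. This minimal sketch turns recording on with dev.control("enable") before drawing. It uses pdf() files in tempdir() (an assumption, so it runs on builds without a working png() device); the same calls should apply to png().

```r
master.file <- file.path(tempdir(), "master.pdf")
copy.file <- file.path(tempdir(), "copy1.pdf")

pdf(master.file)
dev.control("enable")   # file devices do not record a display list unless asked
plot(1:10)              # stands in for the slow first stage of drawing
master <- dev.cur()

pdf(copy.file)          # opening a new device makes it the current device
copy <- dev.cur()

dev.set(master)         # make the master the source ...
dev.copy(which = copy)  # ... replay its display list onto the copy, which becomes current
abline(v = 5)           # the experimental detail goes on the copy only
dev.off(copy)           # writes copy1.pdf

dev.off(master)         # writes master.pdf
```

Because the master still holds its display list, further copies can be made from it for each experiment before it is closed.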
Re: [R] lapply to apply a function using a vector
Flana flana.bristo at gmail.com writes:

Hi,

First, thank you all for your help. Here is my problem (simplified):

Say I have a list:

a=list(matrix(50,nrow=5,ncol=5),
       matrix(25,nrow=5,ncol=5),
       matrix(10,nrow=5,ncol=5))

I'd like to use rbinom with a different probability for each matrix. I tried:

b=c(.8,.1,.9)
brep=rep(b,each=25)
lapply(a,function(a) rbinom(25,a,brep))

but that doesn't work -- it just uses the first value of b rather than applying it over that list.

Seeing as you want to index in to both the size and prob arguments of rbinom, you can use mapply, rather than lapply:

mapply(function(size, prob) matrix(rbinom(25, size=size, prob=prob), nrow=5, ncol=5),
       c(50,25,10), c(.8,.1,.9), SIMPLIFY=FALSE)

An lapply equivalent would have to use an explicit index variable, e.g.

lapply(1:3, function(i) matrix(rbinom(25, size=a[[i]], prob=b[i]), nrow=5))

However, it may be that neither of these is the most efficient way to do this, as they involve calling rbinom multiple times. For just 3 different parameter sets (prob and size) that's unlikely to be a problem, but if you were simulating for a large number of parameter sets then you might want to consider calling rbinom once and subsequently unpacking the results, e.g.

size <- rep(c(50,25,10), each=25)
prob <- rep(c(.8,.1,.9), each=25)
x <- rbinom(25*3, size=size, prob=prob)
lapply(split(x, rep(1:3, each=25)), matrix, nrow=5)

Dan

what I am currently doing is:

c=list()
for (i in 1:3){c[[i]]=rbinom(25,a[[i]],b[i])}
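The single-call approach works because rbinom() recycles its size and prob arguments element by element, so draw i uses size[i] and prob[i]. This sketch starts with a deterministic check (prob = 0 always gives 0 successes, prob = 1 always gives size successes), then sanity-checks the unpacked matrices. set.seed() is there only for reproducibility.

```r
## Parameters recycle element-wise: draws alternate between Bin(0, 0) and Bin(10, 1)
rbinom(6, size = c(0, 10), prob = c(0, 1))
## [1]  0 10  0 10  0 10

set.seed(1)
size <- rep(c(50, 25, 10), each = 25)
prob <- rep(c(.8, .1, .9), each = 25)
x <- rbinom(25 * 3, size = size, prob = prob)
mats <- lapply(split(x, rep(1:3, each = 25)), matrix, nrow = 5)

## Each 5 x 5 matrix can never exceed its own size parameter
sapply(mats, max) <= c(50, 25, 10)
```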
Re: [R] how to do calculations in data matrices?
Zoppoli, Gabriele (NIH/NCI) [G] zoppolig at mail.nih.gov writes:

Please give me just a reference where I can find something useful.

The others are right that rather than randomly googling, you should bite the bullet and sit down for a couple of hours with some introductory material on R (a book, or one of the freely available pdfs). Unless you are never going to use R again, it will be worth it. But seeing as you asked your question clearly, here's one way to do the steps you specify. Hopefully this will help as well.

First, make a matrix to work with:

mat1 <- matrix(sample(1:10, size=12, replace=TRUE), ncol=4)
mat1
     [,1] [,2] [,3] [,4]
[1,]    5    7   10    9
[2,]    4   10    8    2
[3,]    8   10    9    5

In summary, I need to:
- find the median of each row of a matrix

You can use apply for that:

row.medians <- apply(mat1, 1, median)
row.medians
[1] 8.0 6.0 8.5

- create a new matrix with each value in the first matrix divided by the median of its row

That's easy to *do*:

mat2 <- mat1 / row.medians
mat2
          [,1]     [,2]     [,3]      [,4]
[1,] 0.6250000 0.875000 1.250000 1.1250000
[2,] 0.6666667 1.666667 1.333333 0.3333333
[3,] 0.9411765 1.176471 1.058824 0.5882353

but it may take more time to understand why that worked. How come it knew that we wanted to divide each row by the median of the row? (Hint: understand the byrow argument in ?matrix and the mentions of the word "recycling" in ?Arithmetic.)

- if a value a in the second matrix is < 1, I need to substitute it with 1/a

First make a logical vector which identifies the elements of the matrix you want to operate on:

is.small <- mat2 < 1

Then perform the operation on those elements:

mat2[is.small] <- 1 / mat2[is.small]
mat2
       [,1]     [,2]     [,3]  [,4]
[1,] 1.6000 1.142857 1.250000 1.125
[2,] 1.5000 1.666667 1.333333 3.000
[3,] 1.0625 1.176471 1.058824 1.700

# you could also use ifelse(mat2 < 1, 1/mat2, mat2)

dan

I know that for some of you it must be overeasy, but I swear I googled for two hours with keywords "operations", "calculations", "data matrices", "data tables", and "CRAN", and I didn't find anything useful.

Thank you all

Gabriele Zoppoli, MD
Ph.D. Fellow, Experimental and Clinical Oncology and Hematology, University of Genova, Genova, Italy
Guest Researcher, LMP, NCI, NIH, Bethesda MD

Work: 301-451-8575
Mobile: 301-204-5642
Email: zoppolig at mail.nih.gov
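For readers who prefer the row-wise intent spelled out, sweep() divides each row by its own statistic explicitly, and for positive entries the "replace a by 1/a when a < 1" step equals pmax(a, 1/a). This sketch rebuilds mat1 from the values printed in the thread (so the numbers are reproducible, unlike sample()) and checks that these alternatives agree with the steps in the reply.

```r
## mat1 as printed in the thread, entered column by column
mat1 <- matrix(c(5, 4, 8, 7, 10, 10, 10, 8, 9, 9, 2, 5), nrow = 3)
row.medians <- apply(mat1, 1, median)      # 8.0 6.0 8.5

## Division relying on recycling down the columns
mat2 <- mat1 / row.medians

## Explicit row-wise version: MARGIN = 1 sweeps the statistics across rows
mat2.sweep <- sweep(mat1, 1, row.medians, "/")

## Replace a by 1/a where a < 1 ...
mat3 <- mat2
is.small <- mat3 < 1
mat3[is.small] <- 1 / mat3[is.small]

## ... which, for positive a, is the same as max(a, 1/a)
mat3.pmax <- pmax(mat2, 1 / mat2)
```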
Re: [R] NMDS ordination
Aisyah aisyah.faruk at ioz.ac.uk writes:

Hi,

I'm currently trying to plot my NMDS data together with fitted variables (envfit function) on an ordination plot. The plot function shows two displays, "sites" and "sp". I was wondering how to plot it so that the sites come up as different points for different sites but the species come up as actual names? It looks a little busy at the moment with everything in.

Please provide an example, i.e. working code creating the plot as you have it at the moment. Include an example data set and be explicit about what packages are needed. Don't post a large data set -- just create a minimal example demonstrating the problem you are having.

Sya
Re: [R] List to matrix or to vectors conversion
Ted.Harding at manchester.ac.uk writes:

On 12-Feb-10 13:14:29, Juan Tomas Sayago wrote:
Dear list,
I have a list with 1000 x 1000 lines and columns

Lists have neither lines nor columns. Can you explain exactly what you have? E.g. show us the code that created your list?

do you know how I can convert it to matrix or data.frame.
Thanks.
Juan

as.data.frame() will convert it to a dataframe. If you then apply as.matrix() to the result you will get a matrix:

L <- list(X=c(1,2,3),Y=c(4,5,6),Z=c(7,8,9))

If you want a matrix as opposed to a data.frame (e.g. your list entries are all numeric), and the data set is large, this more efficient method might be useful:

matrix(unlist(L), nrow=3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

If it's not obvious to you what that does, consider:

unlist(L)
X1 X2 X3 Y1 Y2 Y3 Z1 Z2 Z3
 1  2  3  4  5  6  7  8  9

matrix(unlist(L), nrow=3, byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

matrix(unlist(L), nrow=3, byrow=FALSE)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

L
# $X
# [1] 1 2 3
# $Y
# [1] 4 5 6
# $Z
# [1] 7 8 9

D <- as.data.frame(L)
D
#   X Y Z
# 1 1 4 7
# 2 2 5 8
# 3 3 6 9

M <- as.matrix(D)
M
#      X Y Z
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9

Note that applying as.matrix() directly to the original L will not work: it returns a 3 x 1 matrix whose elements are the list components, not a numeric matrix.

Ted.

E-Mail: (Ted Harding) Ted.Harding at manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 12-Feb-10  Time: 13:40:32
-- XFMail --
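A small check that the routes above agree, plus one more common idiom, do.call(cbind, L), which binds the list components as columns and keeps their names. All of these assume every list component has the same length; here nrow is taken from the data rather than hard-coded.

```r
L <- list(X = c(1, 2, 3), Y = c(4, 5, 6), Z = c(7, 8, 9))

M1 <- matrix(unlist(L), nrow = length(L[[1]]))   # column-major fill, no names
M2 <- do.call(cbind, L)                           # columns named X, Y, Z
M3 <- as.matrix(as.data.frame(L))                 # via a data frame

colnames(M2)
## [1] "X" "Y" "Z"

## Same numbers in the same places; only the dimnames differ
identical(unname(M2), M1)
identical(unname(M3), M1)

## as.matrix() on the list itself gives a 3 x 1 matrix of mode "list"
dim(as.matrix(L))
mode(as.matrix(L))
```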
Re: [R] paired wilcox test on each row of a large dataframe
gauravbhatti gaurav15984 at hotmail.com writes:

Hi,

I have to calculate the V statistic for each row of a large dataframe (28000 rows). I can not use the multtest package for a paired wilcox test. I have been using for loop which are. Is there a way to speed the computation with another method like using apply or tapply?

Using a for loop is fine here (and basically unavoidable). If you need it to be faster, use a matrix rather than a data.frame (i.e. make a matrix containing columns 1-12, which are all numeric and so do not need to be in a data frame).

Below are versions using apply, sapply and an explicit for loop. There's not much difference in speed. But the last one, in which the data is in a data.frame with rownames, is much slower.

d <- matrix(rnorm(12000), nrow=1000)

system.time(ans <- apply(d, 1, function(row) unlist(wilcox.test(row[1:6], row[7:12])[c("p.value","statistic")])))
   user  system elapsed
  2.660   0.064   2.730

system.time(ans2 <- sapply(1:nrow(d), function(i) unlist(wilcox.test(d[i,1:6], d[i,7:12])[c("p.value","statistic")])))
   user  system elapsed
  2.480   0.108   2.583

system.time({ans3 <- matrix(nrow=nrow(d), ncol=2) ; for(i in 1:nrow(d)) { ans3[i,] <- unlist(wilcox.test(d[i,1:6], d[i,7:12])[c("p.value","statistic")])}})
   user  system elapsed
  2.504   0.000   2.503

d <- as.data.frame(d)
rownames(d) <- paste(letters, 1:nrow(d))
system.time(ans2 <- sapply(1:nrow(d), function(i) unlist(wilcox.test(as.numeric(d[i,1:6]), as.numeric(d[i,7:12]))[c("p.value","statistic")])))
   user  system elapsed
  5.673   0.212   5.885

Dan

My data set looks like this:

         11573_MB   11911_MB   11966_MB   12091_MB  12168_MB   12420_MB
cg0292 0.62123125 0.82663502 0.74687013 0.61774927 0.7337809 0.73203721
cg2426 0.63631315 0.64408750 0.61975158 0.72500713 0.5753110 0.65146526
cg3994 0.05035499 0.05189776 0.05882848 0.11198073 0.1313330 0.03883439
cg5847 0.13936423 0.14967690 0.31874454 0.15876243 0.117     0.15070058
cg6414 0.09059770 0.09915681 0.09952658 0.13955982 0.1757718 0.07566312
cg7981 0.05622769 0.04143790 0.07167018 0.08051046 0.1378107 0.0543
..

         11573_CB   11911_CB   11966_CB   12091_CB   12168_CB  12420_CB
cg0292 0.83059018 0.65396035 0.74519819 0.76007659 0.70335691 0.7857631
cg2426 0.61450928 0.59160923 0.69857198 0.73028911 0.71808719 0.6741295
cg3994 0.04223668 0.07910444 0.05416764 0.06156407 0.06381321 0.0643354
cg5847 0.13897704 0.06407313 0.20449931 0.15683154 0.18936196 0.1610695
cg6414 0.06520757 0.12243180 0.11380134 0.10957321 0.15759518 0.1236715
cg7981 0.04789030 0.11699024 0.07143036 0.05996888 0.10829510 0.1069037
...

There are 12 columns and 27000 rows. I have to perform a paired test on each row (1:6 vs 7:12) and store the p value and statistic in two columns. What's the fastest way?

Gaurav Bhatti
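One detail worth flagging: the V statistic the question asks for is what wilcox.test() reports for the signed-rank test, i.e. with paired = TRUE; the calls above use the default paired = FALSE, which is the rank-sum test and reports W instead. Below is a sketch of the per-row paired version on toy data. It assumes column k of 1:6 is paired with column k + 6 (the sample names, e.g. 11573_MB and 11573_CB, suggest that ordering, but it should be checked against the real data); row.test() is a hypothetical helper.

```r
set.seed(42)
d <- matrix(rnorm(12 * 200), ncol = 12)   # toy stand-in: 200 rows, 12 numeric columns

## Paired (signed-rank) test on one row: columns 1:6 against columns 7:12
row.test <- function(row) {
  tt <- wilcox.test(row[1:6], row[7:12], paired = TRUE)
  c(statistic = unname(tt$statistic), p.value = tt$p.value)
}

## apply() over rows returns a 2 x n matrix; t() gives one row per input row
res <- t(apply(d, 1, row.test))
head(res)

## With 6 pairs, V is a sum of ranks, so it lies between 0 and 6 * 7 / 2 = 21
range(res[, "statistic"])
```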
Re: [R] Code working but too slow, any idea for how to speed it up ?(no loop in it)
anna lippelanna24 at hotmail.com writes:

Hello my friends,

here is a code I wrote with no loops on matrix that is taking too long (2 seconds and I call it 720 times -- 12 minutes):

mat1 and mat2 are both matrices with 103 columns and 164 rows.

Could you provide some example code creating matrices mat1 and mat2 which have exactly the same structure as the mat1 and mat2 you are using? We don't really want your exact data, but just toy matrices that have exactly the same form as your data matrices. Without that your question's hard to answer as we can't try out your code.

Failing that, please post the output of str(mat1) and str(mat2).
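The request above can be met in a few lines. A minimal sketch, assuming the real mat1 and mat2 are plain numeric matrices (the rnorm()/runif() contents are arbitrary): build toy matrices of the same shape, then show str() for the structure and dput() on a small corner, which prints R code others can paste straight into a session.

```r
set.seed(1)
## Toy matrices with the same shape as the real ones: 164 rows, 103 numeric columns
mat1 <- matrix(rnorm(164 * 103), nrow = 164, ncol = 103)
mat2 <- matrix(runif(164 * 103), nrow = 164, ncol = 103)

## str() summarises type and dimensions without printing every value
str(mat1)
str(mat2)

## dput() prints code that recreates an object; a small corner is enough to post
dput(round(mat1[1:3, 1:3], 3))
```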
Re: [R] Access variables by string
Philipp Rappold philipp.rappold at gmail.com writes:

Dear all,

[...]

(2) I need this functionality for a customized na.exclude() function that I am building, which should only exclude rows that have NA in certain columns. Maybe there is already a function which does exactly what I need, so I'd highly appreciate if someone could point me there ;)

I would use something like

naexclude <- function(data, varnames)
    data[rowSums(is.na(data[,varnames,drop=FALSE])) == 0,]

Dan

My current implementation looks like this:

naexlcude <- function(data, varnames) {
    for(v in varnames){
        data = subset(data, !is.na(v))
    }
    data
}

Best
Philipp
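A quick test of the row-filtering idea on a small made-up data frame, alongside complete.cases() from the stats package, which does the same job. It also shows why the loop in the question keeps every row: inside it v is the column *name* (a character string), is.na() of a non-missing string is FALSE, so subset() never drops anything. naexclude2 is a hypothetical name for the complete.cases() variant.

```r
## Keep rows with no NA in the named columns; drop = FALSE keeps a data frame
## even when only one column is named
naexclude <- function(data, varnames) {
  data[rowSums(is.na(data[, varnames, drop = FALSE])) == 0, , drop = FALSE]
}

## Same filter via complete.cases()
naexclude2 <- function(data, varnames) {
  data[complete.cases(data[, varnames, drop = FALSE]), , drop = FALSE]
}

df <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, "z"), d = c(NA, 2, 3, 4))

naexclude(df, c("a", "b"))    # rows 1 and 4 survive; the NA in column d is ignored
naexclude2(df, c("a", "b"))

## The loop from the question: v is a string, is.na("a") is FALSE, so nothing is removed
v <- "a"
nrow(subset(df, !is.na(v)))   # still 4
```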
Re: [R] matching matrix columns to a vector
Jagat.K.Sheth wrote:

How about which(colSums(t-v) == 0) ?

But what about v=c(2,1,3)? It needs to be something like

which(colSums((t - v)^2)) == 0

or

which(colSums(abs(t - v))) == 0

Dan

Jagat.K.Sheth wrote:

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Salas, Andria Kay
Sent: Monday, November 24, 2008 10:04 AM
To: r-help@r-project.org
Subject: [R] matching matrix columns to a vector

I need help with (hopefully) just one more thing. I have been fussing with this for quite some time and have decided just to give up and ask! I want to match a column in a matrix to a vector. I found a "which" command that I thought would be helpful as it does the following:

g=c(1,5,3,2,7)
which(g==5)
[1] 2

As the above gave which placement in the g vector corresponded to 5 (the second place), I need this command to give me which column in a matrix matches to a vector. This is just a toy example of what I am trying to do:

t=matrix(1:12,3,4)
v=c(1,2,3)
which(t[,j]==v)

This does not work, and with my real matrices and vectors, I was getting outputs that did not make sense. These examples are more to give an idea of what I am aiming to accomplish.

Thank you for all the help!!
Re: [R] matching matrix columns to a vector
Dan Davison wrote:

Jagat.K.Sheth wrote:

How about which(colSums(t-v) == 0) ?

But what about v=c(2,1,3)? It needs to be something like which(colSums((t - v)^2)) == 0 or which(colSums(abs(t - v))) == 0

Sorry, apparently I tried to write a line of R code without using emacs. Bad idea. I meant

which(colSums((t - v)^2) == 0)

Dan
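The counterexample can be checked directly. Because v has as many elements as the matrix has rows, t - v subtracts v from every column (recycling runs down the columns), so colSums((t - v)^2) is zero exactly for the columns equal to v, whereas plain colSums(t - v) can be zero when positive and negative differences cancel. The matrix is called m here only to avoid masking the t() function; with floating-point data an exact == 0 test may need a tolerance.

```r
m <- matrix(1:12, 3, 4)    # the toy matrix from the question (called t there)

## Without squaring, differences can cancel out
which(colSums(m - c(1, 2, 3)) == 0)      # column 1, correct
which(colSums(m - c(2, 1, 3)) == 0)      # also column 1: a false match

## Squared differences cannot cancel
which(colSums((m - c(1, 2, 3))^2) == 0)  # column 1
which(colSums((m - c(2, 1, 3))^2) == 0)  # integer(0): no column matches
which(colSums((m - c(4, 5, 6))^2) == 0)  # column 2

## A more literal alternative: compare each column with v
which(apply(m, 2, function(col) all(col == c(4, 5, 6))))
```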
Re: [R] Is there a way to not use an explicit loop?
Both shape parameters of rbeta can be vectors; for

x <- rbeta(n, shape1, shape2)

x[i] ~ Beta(shape1[i], shape2[i])

so

bbsim <- function(m=1000, num.post.draws=1e4, size.a=100, prob.a=.27, prior.count=1) {
    data.count <- rbinom(m, size.a, prob.a)
    shape1 <- rep(prior.count + data.count, each=num.post.draws)
    shape2 <- rep(prior.count + size.a - data.count, each=num.post.draws)
    matrix(rbeta(m * num.post.draws, shape1, shape2), num.post.draws, m)
}

Then you can do

beta.draws <- bbsim()
means <- apply(beta.draws, 2, mean)
medians <- apply(beta.draws, 2, median)

etc.

Dan

On Wed, Sep 17, 2008 at 11:56:36AM -0700, Juancarlos Laguardia wrote:
I have a problem where I generate m independent draws from a binomial distribution, say

draw1 = rbinom( m , size.a, prob.a )

then I need to use each draw to generate a beta distribution. So, like using a beta prior, binomial likelihood, and obtain beta posterior, m many times. I have not found out a way to vectorize draws from a beta distribution, so I have an explicit for loop within my code

for( i in 1: m ) {
    beta.post = rbeta( 1, draw1[i] + prior.constant , prior.constant + size.a - draw1[i] )
    beta.post.mean[i] = mean(beta.post)
    beta.post.median[i] = median(beta.post)
    etc.. for other info
}

Is there a way to vectorize draws from a beta distribution?

UC Slug

--
http://www.stats.ox.ac.uk/~davison
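A quick way to see the element-wise behaviour: with very lopsided shape parameters each draw is pinned near 0 or near 1, so alternating parameter pairs give alternating values. The second part repeats the posterior simulation inline with small, arbitrary sizes (5 data sets, 2000 draws each) and uses colMeans(), a faster equivalent of apply(draws, 2, mean); medians still need apply(). The exact posterior mean (prior + k) / (2 * prior + n) is computed for comparison. set.seed() is there only for reproducibility.

```r
set.seed(1)
## Odd draws ~ Beta(1, 1e6) (near 0), even draws ~ Beta(1e6, 1) (near 1)
x <- rbeta(6, shape1 = c(1, 1e6), shape2 = c(1e6, 1))
round(x, 3)
## [1] 0 1 0 1 0 1

## Small posterior simulation in the same style as above
m <- 5; n.draws <- 2000; size.a <- 100; prior.count <- 1
data.count <- rbinom(m, size.a, 0.27)
draws <- matrix(rbeta(m * n.draws,
                      rep(prior.count + data.count, each = n.draws),
                      rep(prior.count + size.a - data.count, each = n.draws)),
                nrow = n.draws, ncol = m)

## One column per simulated data set
means <- colMeans(draws)
medians <- apply(draws, 2, median)

## Exact posterior means of Beta(prior + k, prior + n - k)
exact <- (prior.count + data.count) / (2 * prior.count + size.a)
```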
Re: [R] Scripting in R -- pattern matching, logic, system calls, the works!
Instead of writing some long, ugly script, the way to use R is to break problems down into distinct tasks. Reading data is one task, and performing regressions on the data, plotting, and summarising are different tasks. Write functions to do each task in general, and then use those functions.

So one task is reading the data from a Coverage dir. You want to do a linear regression on the data, so you want to have the data stored as a data frame. Following on from Don MacQueen's good advice, here's a function that does the job:

read.data.from.coverage.dir <- function(dir, pattern="Length_[0-9]+", min.length=0, max.length=Inf) {
    ## return a data frame with lengths in first column and means of
    ## file contents in second column
    files <- list.files(dir, pattern)
    lengths <- as.numeric(gsub("Length_", "", files, perl=TRUE))
    keep <- lengths >= min.length & lengths <= max.length
    files <- files[keep]
    lengths <- lengths[keep]   # keep lengths in step with files
    get.mean.from.file <- function(file) mean(scan(file.path(dir,file), quiet=TRUE))
    data.frame(x=lengths, y=sapply(files, get.mean.from.file))
}

And here's a function that uses the first one to get all the data from your various coverage dirs:

get.all.data <- function(topdir) {
    coverage.dirs <- list.files(path=topdir, pattern="Coverage", full.names=TRUE)
    lapply(coverage.dirs, read.data.from.coverage.dir)
}

So now you can do

## read all the data
all.data <- get.all.data(topdir="~")
## perform all the regressions
regression.fits <- lapply(all.data, function(df) lm(y ~ x, data=df))
## summarise them
summaries <- lapply(regression.fits, summary)
## etc

All those commands are generating lists of objects; lapply is a shorthand for doing a for loop over a list. You can use sink() to redirect output, but it would probably be better to create tables and/or figures in R first, then write them to files.

Dan

On Tue, Sep 16, 2008 at 07:01:42AM -0700, bioinformatics_guy wrote:
Don, Excellent advice.
I've gone back and done a bit of coding and wanted to see what you think, and possibly shore up some of the technical stuff I am still having a bit of difficulty with. I'll paste the code I have to date with any important annotations:

topdir="~"
library(gmodels)
setwd(topdir)
### Will probably want to do two for loops as opposed to recursive
files=list.files(path=topdir,pattern="Coverage")
for (i in files) {
    dir=paste("~/hangers/",i,sep="")
    files2=list.files(path=dir,pattern="Length")
    ### Make an empty matrix that will have the independent variable as the filenum and the dependent variable
    ### as the mean of the length, or should I have two vectors for the regression?
    ### Basically the Length_(\d+) is the independent variable (which is taken from the filename), which all
    ### the regressions will have, and inside each Length_(\d+) file is a 1d set of numbers whose mean
    ### becomes the dependent variable. So in essence the points are:
    ### f(length)=mean(length$V1)
    ### f(45)=50
    ### f(50)=60
    ### etc ...
    for (j in files2) {
        ## I just rearranged the following line but I'm not sure what the command is doing
        ## I am assuming 'as.numeric' means take the input as a number instead of a string, and the gsub has me stumped
        filenum=as.numeric(gsub('Length_','',j))
        ## Can I assign variables at the top instead of hardcoding? like upper=50 , lower=30?
        ## And I don't need to put brackets for this if statement do I? Does it basically just
        ## say that if the filenum is outside those parameters, just go to the next j in files2?
        if (filenum > 200 | filenum < -10) next
        dir2=paste("~/hangers",i,j,sep="/")
        tmp=read.table(dir2)
        mean(tmp$V1)
        ## Now should I put these in a matrix or a vector (all j values (length vs mean(tmp$V1)) for each i iteration)?
    }
}

I think lastly, I'd like to get a printout of each of the regressions (each iteration of i). Is that when I use the summary command? And, like in Unix, can I redirect the output to a file?
Best

Don MacQueen wrote:
I can't go through all the details, but hopefully this will help get you started. If you look at the help page for the list.files() function, you will see this:

list.files(path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE)

The "." in path means to start at your current working directory. Assuming your 5 Coverage directories are subdirectories of your current working directory, that's what you want. Then, setting recursive to TRUE will cause it to also list the contents of all subdirectories. Since your Length files are in the Coverage subdirectories, that's what you want. Finally, the pattern argument returns only files that match the pattern, so something like pattern="Length" should get you
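To try the read-then-regress pipeline without the real ~/hangers data, one can build a throwaway directory tree. The sketch below is an illustration only: the "coverage-demo" and "Coverage_1" directory names are made up, and each fake Length_k file holds two numbers whose mean is 2k + 1, so the fitted line should have intercept 1 and slope 2. It uses the corrected reader that filters files and lengths together.

```r
read.data.from.coverage.dir <- function(dir, pattern="Length_[0-9]+",
                                        min.length=0, max.length=Inf) {
  files <- list.files(dir, pattern)
  lengths <- as.numeric(gsub("Length_", "", files))
  keep <- lengths >= min.length & lengths <= max.length   # filter both together
  files <- files[keep]
  lengths <- lengths[keep]
  get.mean.from.file <- function(file) mean(scan(file.path(dir, file), quiet=TRUE))
  data.frame(x=lengths, y=sapply(files, get.mean.from.file))
}

# fake layout: <tempdir>/coverage-demo/Coverage_1/Length_k,
# where the numbers in Length_k have mean 2*k + 1
topdir <- file.path(tempdir(), "coverage-demo")
covdir <- file.path(topdir, "Coverage_1")
dir.create(covdir, recursive=TRUE, showWarnings=FALSE)
for (k in c(10, 20, 30, 40)) {
  writeLines(as.character(c(2*k, 2*k + 2)), file.path(covdir, paste0("Length_", k)))
}

# extra arguments to lapply() are passed on to the reader
all.data <- lapply(list.files(topdir, pattern="Coverage", full.names=TRUE),
                   read.data.from.coverage.dir, max.length=30)
fits <- lapply(all.data, function(df) lm(y ~ x, data=df))
print(coef(fits[[1]]))
```

With max.length=30 only Length_10, Length_20 and Length_30 are used, which is a cheap way to confirm that the filter keeps x and y in step.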
Re: [R] Spatial join ? optimizing code
Hi Monica,

I think the key to speeding this up is, for every point in 'track', to compute the distance to all points in 'classif' 'simultaneously', using vectorized calculations. Here's my function. On my laptop it's about 160 times faster than the original for the case I looked at (10,000 observations in track and 500 in classif). I get around 18 seconds for the 30,000 and 4,000 example (2 GHz processor running Linux).

Dan

dist.merge2 <- function(x, y, xeast, xnorth, yeast, ynorth) {
    ## construct data frame d in which d[i,] contains information
    ## associated with the closest point in y to x[i,]
    xpos <- as.matrix(x[,c(xeast, xnorth)])
    xposl <- lapply(seq.int(nrow(x)), function(i) xpos[i,])
    ypos <- t(as.matrix(y[,c(yeast, ynorth)]))
    ## drop=FALSE keeps yinfo a data frame even if only one attribute column remains
    yinfo <- y[, !colnames(y) %in% c(yeast, ynorth), drop=FALSE]
    get.match.and.dist <- function(point) {
        sqdists <- colSums((point - ypos)^2)
        ind <- which.min(sqdists)
        c(ind, sqrt(sqdists[ind]))
    }
    match <- sapply(xposl, get.match.and.dist)
    cbind(xpos, mindist=match[2,], yinfo[match[1,], , drop=FALSE])
}

It's marginally faster to convert xpos to a list followed by sapply, as I do here, than to leave it as a matrix and use apply to get the matches.

On Tue, Sep 16, 2008 at 04:23:33PM +, Monica Pisica wrote:
Hi, A few days ago I asked about a spatial join on the minimum distance between 2 sets of points with coordinates and attributes in 2 different data frames. Simon Knapp sent code to do it when calculating distance on a sphere using lat, long coordinates, and I've changed his code to use Euclidean distances since my data had UTM coordinates. Typically one data frame has around 30 000 points and the classification data frame has around 4000 points, and the aim is to add to each point from the first data frame all the attributes from the second data frame of the point that is closest to it. On my PC (Dell, OptiPlex GX620, x86-based PC, 4 GB RAM, 3192 MHz processor) it took quite a long time to do the join:

   user  system elapsed
8166.07    2.98 8194.43

Sys.info()
sysname: Windows; release: XP; version: build 2600, Service Pack 2; machine: x86

I am running R 2.7.1 patched. I wonder if any of you can suggest or help (or have time) in optimizing this code to make it run faster. My programming skills are not high enough to do it.

Thanks, Monica

code follows:

# x: a data frame with over 3 points with coord in UTM, xeast, xnorth
# y: a data frame with over 4000 points with UTM coord (yeast, ynorth) and
# classification

### calculating Euclidean distance
dist <- function(xeast, xnorth, yeast, ynorth) {
    ((xeast-yeast)^2 + (xnorth-ynorth)^2)^0.5
}

### doing the merge by location with minimum distance
dist.merge <- function(x, y, xeast, xnorth, yeast, ynorth){
    tmp <- t(apply(x[,c(xeast, xnorth)], 1, function(x, y){
        dists <- apply(y, 1, function(x, y) dist(x[2], x[1], y[2], y[1]), x)
        cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,]
    } , y[,c(yeast, ynorth)]))
    tmp <- cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(yeast, ynorth), names(y))])
    row.names(tmp) <- NULL
    tmp
}

code end
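Before trusting a faster join on 30,000 points, it is worth checking it against a brute-force answer on a small made-up data set. The sketch below copies dist.merge2() (with the drop=FALSE guard) and compares it with a full distance matrix built by outer(); the column names east/north/e/n/label and the random data are assumptions for the test only.

```r
dist.merge2 <- function(x, y, xeast, xnorth, yeast, ynorth) {
  xpos <- as.matrix(x[, c(xeast, xnorth)])
  xposl <- lapply(seq.int(nrow(x)), function(i) xpos[i, ])
  ypos <- t(as.matrix(y[, c(yeast, ynorth)]))
  yinfo <- y[, !colnames(y) %in% c(yeast, ynorth), drop=FALSE]
  get.match.and.dist <- function(point) {
    sqdists <- colSums((point - ypos)^2)   # squared distances to every y point
    ind <- which.min(sqdists)
    c(ind, sqrt(sqdists[ind]))
  }
  match <- sapply(xposl, get.match.and.dist)
  cbind(xpos, mindist=match[2, ], yinfo[match[1, ], , drop=FALSE])
}

# made-up test data: 200 track points, 50 classified points
set.seed(42)
track <- data.frame(east=runif(200, 0, 1000), north=runif(200, 0, 1000))
classif <- data.frame(e=runif(50, 0, 1000), n=runif(50, 0, 1000),
                      label=sample(letters[1:4], 50, replace=TRUE),
                      stringsAsFactors=FALSE)

res <- dist.merge2(track, classif, "east", "north", "e", "n")

# brute-force reference: full 200 x 50 distance matrix
dmat <- sqrt(outer(track$east, classif$e, "-")^2 +
             outer(track$north, classif$n, "-")^2)
ref.ind <- apply(dmat, 1, which.min)
ref.dist <- apply(dmat, 1, min)
```

The full matrix is fine at this size, but at 30,000 x 4,000 it would need roughly a gigabyte of memory, which is one reason the row-by-row vectorised approach above is a sensible compromise.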
Re: [R] Very confused with class
Hi Rich,

Richard M. Heiberger wrote:
Dan, The real problem is the use of csv files. csv files don't handle missing values (#VALUE is most likely from Excel), dates, or other complications very well. Read your Excel file directly into R with one of the packages designed specifically for that purpose. I recommend RExcel (Windows only), which allows complete two-way communication between R and Excel. Missing values and dates are handled correctly. You can download the RExcelInstaller package from CRAN.

I'm sure RExcel is an excellent technology. However, it is an unnecessarily complex technology in this instance. What I was trying to do was help the original poster read in tabular data stored in a standard text format, which is a fundamental skill for any R programmer. In general, I would encourage people (beginners especially) to avoid hi-tech solutions when simple text-based solutions suffice. But when people do need more sophisticated integration of R and e.g. Excel, it's nice that the tools exist.

Dan
Re: [R] [help] simulation of a simple Marcov Stochastic process for population genetics
On Thu, Aug 21, 2008 at 03:00:51AM -0700, z3mao wrote:
Hi, this is my first time using R. I want to simulate the following process: in a population of size N, there are i individuals bearing genotype A; the number of those bearing A in the next generation is j, which follows a binomial distribution (choose j from 2*N, with p = i/(2*N)). To plot the probability over the next generations, my script is as follows. It cannot run successfully, declaring that the ylim should be limited.

In a situation like this, try using options(error=recover) to debug.

I wonder where the bug is. Thanks very much!

There are several bugs... The most serious is that your homemade binomial random number generator is wrong. (For example, look at what happens when it is given a probability parameter of 0: it returns 1 rather than 0. Your alleles aren't going to be lost from the population very often!) So, if someone has set you the task of simulating drift without using a built-in binomial RNG, then you'll need to think through your RNG code again. But if you are free to do what you want, then you should use the R function rbinom to generate binomial RVs.

Here are comments on the other bugs, with a cleaned-up (but still probabilistically wrong) version below.

generation <- function(i,N) {
    m <- 1;
    ## Don't initialise m here; it gets initialised in the for loop
    gen <- numeric();
    ## gen <- rep(NA, 50) is better
    for(m in 1:50) {
        testp <- runif(1,0,1);
        j <- 0;
        sump <- 0;
        while(sump < testp) {
            sump <- sump+dbinom(j,2*N,i/(2*N));
            j <- j+1;
        }
        ## I've already said that the above is wrong
        i <- j;
        gen[m] <- j/(2*N);
        m <- m+1;
        ## The for loop deals with incrementing m; don't do it yourself!
    }
    plot(m, gen[m]);
    ## You want plot(1:50, gen, type="l")
    ## You don't need semicolons at the end of lines in R!
}

Here's a version of your code that corrects the other bugs, but still has your incorrect binomial RNG code in it.
generation <- function(i,N) {
    warning("binomial RNG code is wrong")
    mvals <- 1:50;
    gen <- numeric();
    for(m in mvals) {
        testp <- runif(1,0,1);
        j <- 0;
        sump <- 0;
        while(sump < testp) {
            sump <- sump+dbinom(j,2*N,i/(2*N));
            j <- j+1;
        }
        i <- j;
        gen[m] <- j/(2*N);
        ## m <- m+1;
    }
    plot(mvals, gen, type="l");
}

Dan
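For comparison, here is a minimal sketch of the same drift process using rbinom(), as recommended above. The function name drift and the n.gen argument are hypothetical (the original hard-codes 50 generations); the result vector is preallocated and plotting is left to the caller.

```r
drift <- function(i, N, n.gen=50) {
  freq <- numeric(n.gen)                 # preallocate the result
  for (m in seq_len(n.gen)) {
    # copies of A in the next generation ~ Binomial(2N, i/(2N))
    i <- rbinom(1, 2 * N, i / (2 * N))
    freq[m] <- i / (2 * N)
  }
  freq
}

set.seed(1)
freq <- drift(i=20, N=50)
# then e.g. plot(seq_along(freq), freq, type="l", ylim=c(0, 1))
```

Unlike the hand-rolled generator, this respects the absorbing states: starting from i=0 the allele stays lost, and starting from i=2N it stays fixed.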
Re: [R] Very confused with class
Hi Robin,

You haven't said where you're getting the data from. But if the answer is that you're using read.table, read.csv or similar to read the data into R, then I advise you to go back to that stage and get it right from the outset. It's very, very common to see people who are relatively new to R splattering their code with calls to as.numeric, just because they haven't read the data in properly in the first place. It's also common in those who aren't new to R...

So e.g. if you are using read.table, then use the colClasses argument to specify the classes of your columns, and use str() on the result until you're happy with the data frame produced.

It's not entirely clear why you would have ended up with factors if your data are numeric. That often happens when people mix characters with numbers. Perhaps you have mixed the header row up with the data?

Anyway, what you are seeing are the integer encodings of the factors. E.g.

f <- factor(11:20)
str(f)
 Factor w/ 10 levels "11","12","13",..: 1 2 3 4 5 6 7 8 9 10
as.numeric(f)
 [1]  1  2  3  4  5  6  7  8  9 10

But don't mess with them. Just make sure that things which shouldn't be factors never become factors.

Dan

On Thu, Aug 21, 2008 at 03:40:58PM +0100, Williams, Robin wrote:
Hi all, I am very confused with class. I am looking at some weather data which I want to use as explanatory variables in an lm. R has treated these variables as factors (i.e. with different levels), whereas I want them treated as discretely measured continuous variables. So I need to reassign the class of these variables, right? Indeed, doing class(southwest$pressure) (pressure being air pressure), I get # factor. Now what class should I use to reassign them so that my model fitting process goes as I want it to? I have obviously done something wrong. I did

southwest$pressure <- as(southwest$pressure, "numeric")

numeric seeming like a reasonable class to assign to this variable.
However, doing some summary stats like mean(southwest$pressure) # 341, max(southwest$pressure) # 761, which is clearly nonsense, as my maximum value is around 1040. Something similar has happened to maxtemp (maximum temperature), which I also reassigned from a factor to class numeric, and which now apparently has a maximum value of 147! Clearly it must be the reassignment of class that has caused these problems, as summary stats on the data before I reassigned the classes were fine. What is wrong with the class numeric? Reading the numeric help page didn't reveal anything to me. Can someone suggest the correct class? Many thanks for any help.

Robin Williams
Met Office summer intern - Health Forecasting
[EMAIL PROTECTED]
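If a column has already arrived as a factor and re-reading is not an option, the usual remedy (described in R FAQ 7.10) is to convert via the level labels rather than the internal codes. The pressure values below are made up for illustration; they show why as.numeric() alone gave nonsense like 341 or 761.

```r
# made-up pressure readings that were read in as a factor
pressure <- factor(c("1028", "1023", "1015", "1001", "989"))

wrong  <- as.numeric(pressure)                    # internal level codes, not the data
right  <- as.numeric(as.character(pressure))      # the actual values
right2 <- as.numeric(levels(pressure))[pressure]  # same result, converts each level once
print(wrong)
print(right)
```

The levels are sorted as character strings ("1001" < "1015" < ... < "989"), so the codes bear no numeric relation to the readings. Fixing the import with colClasses and na.strings, as advised in this thread, avoids the problem altogether.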
Re: [R] Very confused with class
On Thu, Aug 21, 2008 at 04:20:57PM +0100, Williams, Robin wrote:
Hi Dan, Thanks for the reply, yes, I am using read.csv on the attached file.

OK, so how about using the colClasses argument. Your problem is that some malfunctioning software has inserted the value #VALUE! into some of your supposedly numeric cells. So deal with that with the na.strings argument. Like I said, when reading in data, it's worth spending a minute looking at the documentation for read.table/read.csv rather than spending an hour messing about with the results of not doing so.

Southwest <- read.csv("southwest.csv", colClasses=c("character", rep("numeric",10), "character"), na.strings="#VALUE!")
str(Southwest)
'data.frame':   1530 obs. of  12 variables:
 $ date      : chr  "5/1/1997" "5/2/1997" "5/3/1997" "5/4/1997" ...
 $ maxtemp   : num  18.8 21.8 16.6 14.9 14.2 9.3 9.9 12.7 12.8 13.2 ...
 $ mintemp   : num  7.7 9.8 11 12.2 11.3 4.5 2.1 5.7 6.7 7.3 ...
 $ pressure  : num  1028 1023 1015 1001 989 ...
 $ humid     : num  59 44 83 80 87 57 64 83 70 69 ...
 $ wind      : num  8.4 11.1 8.2 17.4 13.8 16.2 11.1 14.9 12.7 16.6 ...
 $ rain      : num  0 0 6 1 3.3 2.6 4.3 6 3.2 1.6 ...
 $ index     : num  1 2 3 4 5 6 7 8 9 10 ...
 $ admissions: num  5.00 4.72 5.16 3.67 3.62 ...
 $ detrended : num  4.79 4.47 5.30 3.91 3.51 ...
 $ detrended2: num  4.79 4.47 5.30 3.91 3.51 ...
 $ d.o.w.    : chr  "Thu" "Fri" "Sat" "Sun" ...

NB you could coerce those dates to a date class rather than character, but I'll leave that up to you. str() is your friend.

Dan

However, when I do

Southwest <- data.frame(read.csv("southwest.csv")

read.csv returns a data frame; no need to wrap it in data.frame()

names(southwest)

the output is the column headings (i.e. the variables), and looking at the data I only get the numbers, so I assume the column headings haven't become confused with the data. I.e. if I just do

Southwest$pressure

the output is correct, i.e. the values contained in the pressure column.
Apologies for my repeated question, but I'm somewhat confused on this one and my lack of experience with R isn't helping matters. I don't even understand why R is interpreting these figures as factors in the first place; doesn't this imply that any similar data would be interpreted as factors? Thanks for any further help.

Robin Williams
Met Office summer intern - Health Forecasting
[EMAIL PROTECTED]
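The colClasses/na.strings fix can be tried on a tiny inline CSV via read.csv's text= argument. The three rows below are made up in the same shape as the poster's file; the as.Date() step assumes the dates are month/day/year, which is consistent with 5/1/1997 being a Thursday.

```r
# made-up three-row extract in the same shape as the poster's file
csv.text <- "date,pressure,maxtemp,d.o.w.
5/1/1997,1028.2,18.8,Thu
5/2/1997,#VALUE!,21.8,Fri
5/3/1997,1015.0,16.6,Sat"

sw <- read.csv(text=csv.text,
               colClasses=c("character", "numeric", "numeric", "character"),
               na.strings="#VALUE!")   # treat the spreadsheet error code as missing
str(sw)

# optional: turn the month/day/year strings into a Date column
sw$date <- as.Date(sw$date, format="%m/%d/%Y")
```

Without na.strings, the single "#VALUE!" cell would make read.csv fail to convert that column to numeric, which is exactly how the original columns ended up as factors.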
Re: [R] Quickly calculating the mean results over a collection of data sets?
On Tue, Aug 12, 2008 at 04:47:14AM -0400, Michael R. Head wrote:
I have a collection of datasets in separate data frames which have 3 independent test parameters (w, x, y) and one dependent variable (z), together with some additional static test data on each row. What I want is a data frame which contains the test data, the parameters (w, x, y) and the mean value of all (z)s in the Z column.

Each dataset has around 6000 rows and around 7 columns, which doesn't seem outrageously large, so it seems like this shouldn't be too time-consuming, but the way I've been approaching it seems to take way too long (20 seconds for datasets over 4 runs, longer for my datasets over 10 runs). My imperative-coding brain led me to use for loops, which seem to be particularly problematic for R performance. My first attempt at this looked like the following, which takes roughly 60 seconds to complete. I rewrote it a little, but the code was much longer and effectively replaces one of the for loops with an lapply(). I could paste the other code, but it's much longer and less clear about its intent.

Hi Michael,

###
# Start code snippet
###
### inputFiles just a list of paths to the test runs
testRuns <- lapply(inputFiles, function(x) { read.table(x, header=TRUE)})

(Just BTW, lapply(inputFiles, read.table, header=TRUE) is slightly nicer to look at.)

### W, X, Y have (small) natural values
w <- unique(testRuns[[1]]$W)
x <- unique(testRuns[[1]]$X)
y <- unique(testRuns[[1]]$Y)
### All runs have the same values for all columns
### with the exception of the Z values, so just
### copy the first test run data
testMeans <- data.frame(testRuns[[1]])

How about rbind()ing all the data frames together, and working with the combined data frame?
Say that testRuns is

testRuns
[[1]]
  W X Y          Z
1 1 5 5 -0.5251156
2 5 1 3  1.1761139
3 2 4 4 -0.8934380
4 5 1 1  1.4076303
5 5 3 1  0.4679745

[[2]]
  W X Y          Z
1 1 5 5 -0.8556862
2 5 1 3  0.3517671
3 2 4 4 -1.0202064
4 5 1 1  1.2152349
5 5 3 1  0.4340249

allRuns <- do.call(rbind, testRuns)
aggregate(allRuns$Z, by=allRuns[c("W","X","Y")], mean)
  W X Y          x
1 5 1 1  1.3114326
2 5 3 1  0.4509997
3 5 1 3  0.7639405
4 2 4 4 -0.9568222
5 1 5 5 -0.6904009

Dan

for(w0 in w) {
    for(y0 in y) {
        for (x0 in x) {
            row <- which(testMeans$W == w0 & testMeans$Y == y0 & testMeans$X == x0)
            meanValues <- sapply(testRuns, function(r) {mean( subset(r, r$W == w0 & r$Y == y0 & r$X == x0)$Z )})
            testMeans[row,]$Z = mean(meanValues)
        }
    }
}
### I will then want to plot certain values over (X, Z),
### so ultimately, I'm going to subset the data further.
### Code which gives me a list of W tables with mean Z values
### works, too.
###
# End code snippet
###

Thanks, mike
-- 
Michael R. Head
[EMAIL PROTECTED]
http://www.cs.binghamton.edu/~mike/
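The rbind-then-aggregate idea can be reproduced with the two toy runs shown above. The sketch uses aggregate's formula interface (an alternative to the by= form in the reply) and then, as an assumption about what "keep the static test data" means, merges the means back onto the first run's non-Z columns; here those are only W, X and Y, but extra static columns would come along the same way.

```r
testRuns <- list(
  data.frame(W=c(1,5,2,5,5), X=c(5,1,4,1,3), Y=c(5,3,4,1,1),
             Z=c(-0.5251156, 1.1761139, -0.8934380, 1.4076303, 0.4679745)),
  data.frame(W=c(1,5,2,5,5), X=c(5,1,4,1,3), Y=c(5,3,4,1,1),
             Z=c(-0.8556862, 0.3517671, -1.0202064, 1.2152349, 0.4340249)))

allRuns <- do.call(rbind, testRuns)

# mean Z for every (W, X, Y) combination
zMeans <- aggregate(Z ~ W + X + Y, data=allRuns, FUN=mean)

# attach the means to the first run's non-Z columns
testMeans <- merge(testRuns[[1]][names(testRuns[[1]]) != "Z"], zMeans,
                   by=c("W", "X", "Y"))
print(testMeans)
```

This replaces the triple loop with one grouping pass, so its cost grows roughly linearly with the total number of rows rather than with rows times parameter combinations.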
Re: [R] Frequency vector
On Tue, Aug 12, 2008 at 01:21:29AM -0700, dennis11 wrote:
I want to create a vector with frequencies. I have tried this:

a <- c(1,1,1,1,2,3,4,5,5)
b <- table(a)
print (b[1])

which results in:

1 
4 

The only thing I want is the 4. So this seems obvious:

print (b[1,2])

No! The 1 is just a label. You're not looking at a matrix. (BTW, I think you meant b[2,1].)

First I would say don't get rid of the 1 label unless you need to. It's just a label telling you what the count is referring to, and it wouldn't be there if there weren't a good reason for it. It won't interfere with any numeric calculations you do, e.g.

b[1] * 2
1 
8 

But if you really want to extract the integer counts from an object of class table you could do

as.vector(b)
[1] 4 1 1 1 2

Remember that if an object is not behaving as you would expect, use str() and class() to see what you've really got:

class(b)
[1] "table"
str(b)
 'table' int [1:5(1d)] 4 1 1 1 2
 - attr(*, "dimnames")=List of 1
  ..$ a: chr [1:5] "1" "2" "3" "4" ...

Dan

but it does not work:

Error in b[1, 2] : incorrect number of dimensions

How do I get a vector, or how do I refer to the 4 without getting the 1 label as well?
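A few other ways to get at the counts in a one-dimensional table, shown as a small sketch on the same vector a. The [[ ]] operator returns a single element without its label, and indexing by the label string is often safer than by position when some values might be absent.

```r
a <- c(1,1,1,1,2,3,4,5,5)
b <- table(a)

counts <- as.vector(b)   # plain integer vector of all counts
first  <- b[[1]]         # [[ ]] drops the "1" label
ones   <- b[["1"]]       # look up by label instead of position
labels <- names(b)       # the labels themselves, as character
print(counts)
print(first)
```

Note that the labels are character strings, so as.numeric(names(b)) is needed to recover the original values.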
Re: [R] Between the values
On Tue, Aug 12, 2008 at 05:16:01PM +0530, Shubha Vishwanath Karanth wrote:
Hi R, This is a very trivial one

C=0.1

I want to check whether my value of C is between 0 and 1 exclusively. I don't want to use (C>0 && C<1). And I can't use a single statement like (0<C<1). Is there a between function? Or how do we specify from 0 to 1? Does %in% help me?

If you don't like (C > 0 && C < 1), then just write your own function is.between(x, low, high) (NB1 you've basically written it already; NB2 use a single '&' for the vectorised version 'are.between'). People's personal tastes about what's desirable will vary, and anyway it's good practice to build up your own personal library of functions. Ultimately, if you have a high-quality collection of related functions for working on a particular sort of problem, then you should publish them as an R package on CRAN.

Dan

Many Thanks, Shubha
Re: [R] Between the values
Shubha Vishwanath Karanth wrote:
Or is there at least any way of defining a vector (or something like that) which has all values between 0 and 1? For example: c(0,1) is incorrect, seq(0,1,0.2) is also incorrect, seq(0,1,0.1) is also incorrect. How does one specify this?

Hi Shubha,

What are you trying to do? The set of all real numbers between 0 and 1 is infinitely large. Obviously you can't explicitly construct an infinitely large vector in R. If you want to construct an implicit specification of that set, then I think I've already given you a good answer in R: define a predicate function and use it. E.g.

between <- function(x, low, high) x > low & x < high

I don't know much at all about symbolic mathematics packages like Maple and Mathematica, but maybe you're thinking of something you can do in those programs? R is not trying to be a competitor to them; they do lots of things R doesn't, and vice versa.

Dan

Shubha Vishwanath Karanth wrote:
Thanks, Shubha
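The predicate approach can be made concrete with both versions mentioned in the thread: a vectorised between() using a single &, and a scalar is.between() using && (the latter is meant for one value at a time; recent R versions complain if && receives longer vectors). The names follow Dan's suggestions; the test values are illustrative.

```r
# vectorised: works element-wise on any numeric vector
between <- function(x, low, high) x > low & x < high

# scalar: intended for a single value such as C
is.between <- function(x, low, high) x > low && x < high

C <- 0.1
print(between(C, 0, 1))
print(between(c(-1, 0, 0.5, 1, 2), 0, 1))  # endpoints excluded
print(is.between(C, 0, 1))
```

This also shows why (0 < C < 1) does not work in R: it is parsed as (0 < C) < 1, i.e. TRUE < 1, which is FALSE.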
Re: [R] dynamically extract data from a list
Dries Knapen-2 wrote:
Hi, Thanks for your reply. However, this didn't work exactly as I needed it to, since the expression is dynamically built as a character vector, i.e. not executed as

e <- expression(Sepal.Width > 4)

but as

e <- expression("Sepal.Width > 4")

in which case subset() throws an error (must evaluate to logical). Fortunately, a good night of sleep resulted in this workaround:

s <- "iris[Sepal.Width > 4,]"
execute.string <- function(string) {
    write(string, 'tmp.txt')
    out <- source('tmp.txt')
    unlink('tmp.txt')
    return(out$value)
}
execute.string(s)

Is this what you want?

eval(parse(text=s))
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
16          5.7         4.4          1.5         0.4  setosa
33          5.2         4.1          1.5         0.1  setosa
34          5.5         4.2          1.4         0.2  setosa

Dan

Dries Knapen-2 wrote:
On 12 Aug 2008, at 04:08, Gabor Grothendieck wrote:
Try this:

e <- expression(Sepal.Width > 4)
subset(iris, eval(e), select = Sepal.Length)
   Sepal.Length
16          5.7
33          5.2
34          5.5
subset(iris, eval(e))
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
16          5.7         4.4          1.5         0.4  setosa
33          5.2         4.1          1.5         0.1  setosa
34          5.5         4.2          1.4         0.2  setosa

On Mon, Aug 11, 2008 at 9:36 PM, Dries Knapen [EMAIL PROTECTED] wrote:
Hi, Based on user input, I wrote a function that creates a list which looks like:

str(list)
List of 4
 $ varieties: chr [1:12] "temp.26_time.5dagen_biorep.1" "time.5dagen_temp.26_biorep.2" "temp.18_time.5dagen_biorep.1" "temp.18_time.5dagen_biorep.2" ...
 $ temp     : Factor w/ 2 levels "18","26": 2 2 1 1 2 2 1 1 1 1 ...
 $ time     : Factor w/ 3 levels "14dagen","28dagen",..: 3 3 3 3 1 1 1 1 2 2 ...
 $ biorep   : Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...

Now, based on user input as well, I want to dynamically extract data from list$varieties.
Therefore, I wrote a function which generates a string containing the data extraction conditions, which looks like this:

query <- make.contrast.substring(negative.contrast, list)
Read 1 item
[1] "(list$temp=='18')&(list$time=='14dagen'|list$time=='28dagen'|list$time=='5dagen')&(list$biorep=='1'|list$biorep=='2')"

Now what I want to achieve is to extract data by doing:

list$varieties[query]

which doesn't work, since query is a string and object names are not expanded... Obviously, manually copying the string like so

list$varieties[(list$temp=='18')&(list$time=='14dagen'|list$time=='28dagen'|list$time=='5dagen')&(list$biorep=='1'|list$biorep=='2')]

works perfectly, but I need it to be automated. I'm quite new to R and used to programming in PHP, so I may just be conceptually confused about how to do this. Any help would be greatly appreciated.

thanks in advance,
Dries Knapen

Dr. Dries Knapen
University of Antwerp
Department of Biology
Ecophysiology, Biochemistry and Toxicology
Groenenborgerlaan 171 - U711, B-2020 Antwerp
Belgium
tel ++32 3 265 33 49
fax ++32 3 265 34 97
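An alternative to building R code as a string is to keep the user's choices as data and build the logical index directly with %in%. The sketch below uses a made-up six-element version of the poster's list (called lst, to avoid masking the base function list) and a hypothetical wanted list standing in for the user input; the eval(parse()) route from the reply is shown alongside for comparison.

```r
# made-up toy version of the poster's list
lst <- list(varieties=c("v1", "v2", "v3", "v4", "v5", "v6"),
            temp=factor(c("26", "26", "18", "18", "18", "26")),
            time=factor(c("5dagen", "5dagen", "5dagen", "14dagen", "28dagen", "14dagen")),
            biorep=factor(c("1", "2", "1", "2", "1", "2")))

# hypothetical user choices, stored as data rather than as code
wanted <- list(temp="18", time=c("14dagen", "28dagen", "5dagen"), biorep=c("1", "2"))

# one logical vector per variable, combined element-wise with &
keep <- Reduce(`&`, lapply(names(wanted), function(v) lst[[v]] %in% wanted[[v]]))
print(lst$varieties[keep])

# the string route, as suggested in the reply
query <- "(lst$temp=='18')&(lst$biorep=='1'|lst$biorep=='2')"
print(lst$varieties[eval(parse(text=query))])
```

Keeping the conditions as data avoids quoting problems and makes it impossible for user input to inject arbitrary code, which eval(parse()) would happily run.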
Re: [R] Scripting - query
On Sun, Aug 10, 2008 at 02:44:00PM +1200, Gareth Campbell wrote:

I have a vector:

alleles.present <- c("D3", "D16", ... )

The alleles present change depending on the case I'm dealing with, i.e. either all of the alleles I use for my calculations are present, or only some of them. Depending on which alleles are present, I need to build matrices and do calculations on those alleles, and completely disregard any formula or other use of the alleles that are not present. I'm trying to figure out the best way to do this. Basically I'm trying to use if() statements (with no success so far) to query alleles.present for each allele I use, and then let the result dictate which formula to use, etc. Does anyone have a good way to do this? I've been fiddling with grep() etc., but I can't get it to do what I need! Very frustrating.

It's going to be hard for people to make good suggestions here without a concrete example. Can you provide a toy example that is as simple as possible, while illustrating (some of) the problems you are trying to solve?

Dan

p.s. Are you familiar with %in%? E.g.

if ("D3" %in% alleles.present) do.something() else do.something.else()

See help("%in%").

Thanks very much
--
Gareth Campbell
PhD Candidate
The University of Auckland
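Building on the %in% suggestion, a per-allele if() can often be replaced by one vectorised lookup: store each allele's calculation under the allele's name, then run only those whose names are present. A minimal sketch; the allele list, `allele.formulas` and the arithmetic inside it are hypothetical placeholders for Gareth's real calculations.

```r
# Hypothetical full set of alleles used in the calculations
all.alleles <- c("D3", "D16", "D18", "vWA")

# Hypothetical per-allele calculations, stored in a named list of functions
allele.formulas <- list(
  D3  = function(x) x * 2,
  D16 = function(x) x + 1,
  D18 = function(x) x - 1,
  vWA = function(x) x / 2
)

alleles.present <- c("D16", "vWA")

# Logical vector: which of the known alleles occur in this case?
present <- all.alleles %in% alleles.present

# Apply only the formulas for alleles that are present (10 is a dummy input)
results <- lapply(allele.formulas[all.alleles[present]], function(f) f(10))
results
```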
Re: [R] Converting nested for loops to an apply function(s)
On Sat, Aug 09, 2008 at 08:53:00PM -0400, Kurt Newman wrote:

Resending. Previous message was truncated. Sorry for possible confusion.

To: r-help@r-project.org
Date: Sat, 9 Aug 2008 18:25:47 -0400
Subject: [R] Converting nested for loops to an apply function(s)

Hello,

I would like to know more about how to use the apply family, and have attempted to convert the nested for loops in some example code from Contributed Documentation ("The Friendly Beginners' R Course" by Toby Marthews (ZIP, 2007-03-01)) to an apply function(s). The relevant code is:

distances=c(51,65,175,196,197,125,10,56) #distances of 8 houses from the town centre in m
bearings=c(10,8,210,25,74,128,235,335) #bearings of the houses in degrees
xpos=distances*sin(bearings*pi/180) #in sin and cos the argument MUST be in radians
ypos=distances*cos(bearings*pi/180)
numpoints=length(distances)
nnd=rep(sqrt(2*400*400),times=numpoints) #start with the maximum possible distance
for (i in 1:numpoints) {
  for (j in 1:numpoints) {
    if (i!=j) {
      diffx=abs(xpos[i]-xpos[j])
      diffy=abs(ypos[i]-ypos[j])
      nd=sqrt((diffx^2)+(diffy^2))
      if (nd < nnd[i]) {nnd[i]=nd}
    }
  }
}
print(data.frame(xpos,ypos,nnd))

My attempts to convert the nested for loops to an apply function(s) have not been successful. I would like to know how to convert the code, both to increase my knowledge of R programming and to evaluate the operational efficiency of the different strategies.

Hi Kurt,

It's not just the apply() family that helps in vectorising problems. In this case, outer() is also going to be helpful, as is remembering that all the standard arithmetic operators automatically vectorise. I would use something like this:

nearest.neighbour.distance <- function(xpos, ypos) {
  xdist <- abs(outer(xpos, xpos, "-"))
  ydist <- abs(outer(ypos, ypos, "-"))
  dist <- sqrt(xdist^2 + ydist^2)
  diag(dist) <- NA
  apply(dist, 1, min, na.rm=TRUE)
}

Dan

Thank you in advance for your comments / suggestions.
Kurt Newman
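The same pairwise-distance matrix can also be obtained from the standard dist() function in the stats package, which computes Euclidean distances between the rows of a matrix. A short sketch using Kurt's data; it mirrors Dan's outer() version rather than replacing it.

```r
distances <- c(51, 65, 175, 196, 197, 125, 10, 56)
bearings  <- c(10, 8, 210, 25, 74, 128, 235, 335)
xpos <- distances * sin(bearings * pi / 180)
ypos <- distances * cos(bearings * pi / 180)

# All pairwise Euclidean distances between houses, as a full square matrix
d <- as.matrix(dist(cbind(xpos, ypos)))
# A house is not its own nearest neighbour
diag(d) <- NA
# Smallest remaining distance in each row
nnd <- apply(d, 1, min, na.rm = TRUE)

print(data.frame(xpos, ypos, nnd))
```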
Re: [R] help using outer function
On Sun, Aug 10, 2008 at 09:02:59AM -0700, warthog29 wrote:

Hi,

I would like to use R's outer function on y below so that I can subtract elements from each other. The resulting dataframe is symmetric, save for the negative signs on the other half of the numbers.
                 ^^ outer() returns a matrix, not a data frame.
I would like to get only half of the dataframe. Here is the code I wrote (it is returning only the first line of all the elements I want. Please help).

y <- c(4,4,3.9,3.8,3.7,3.6,3.5,3.5,3.5,3.3,3.2,3.2)
b <- outer(y, y, "-")
b <- as.matrix(by)

I assume that line was supposed to be b <- as.matrix(by). In any case you don't need it; b is a matrix already.

# I want to keep the elements:
#b[1,2:12],
#b[2,3:12],
#...until
#b[11,12:12].

Use upper.tri() to get the upper triangle:

b[upper.tri(b, diag=FALSE)]
 [1] 0.0 0.1 0.1 0.2 0.2 0.1 0.3 0.3 0.2 0.1 0.4 0.4 0.3 0.2 0.1 0.5 0.5 0.4 0.3
[20] 0.2 0.1 0.5 0.5 0.4 0.3 0.2 0.1 0.0 0.5 0.5 0.4 0.3 0.2 0.1 0.0 0.0 0.7 0.7
[39] 0.6 0.5 0.4 0.3 0.2 0.2 0.2 0.8 0.8 0.7 0.6 0.5 0.4 0.3 0.3 0.3 0.1 0.8 0.8
[58] 0.7 0.6 0.5 0.4 0.3 0.3 0.3 0.1 0.0

Or perhaps you want to knock out the negative entries, but still keep the matrix structure:

b[lower.tri(b)] <- NA

or perhaps you wanted

b <- abs(outer(y, y, "-"))

in the first place?

#Here is the function I wrote to get half of matrix:
wk <- function(p){
  for (i in 2:p){
    ri <- b[i-1, i:p]
    return(ri)
  }
}
wk(12)
#[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.5 0.5 0.7 0.8 0.8

I think you were intending this function to be something like this:

wk <- function(p){
  ri <- NULL
  for (i in 2:p){
    ri <- c(ri, b[i-1, i:p])
  }
  return(ri)
}

Note that this function will give a different result from upper.tri(), because you are concatenating elements in the *rows* of the matrix, whereas the way matrices are represented in R has consecutive elements running down the columns. I.e. look at

A <- matrix(nrow=2, ncol=2)
A
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
A[] <- 1:4
A
     [,1] [,2]
[1,]    1    3
[2,]    2    4

Dan

As you can see, it is only returning the first line. I would like the other corresponding elements too, found in rows 2 to 12.

Thanks.
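If the row-by-row order of the corrected wk() is what is wanted, the loop can be avoided by transposing: reading the upper triangle of b by rows gives the same elements, in the same order, as reading the lower triangle of t(b) by columns. A short sketch using warthog29's data.

```r
y <- c(4, 4, 3.9, 3.8, 3.7, 3.6, 3.5, 3.5, 3.5, 3.3, 3.2, 3.2)
b <- outer(y, y, "-")

# Column-major order: b[1,2], b[1,3], b[2,3], b[1,4], ...
by.column <- b[upper.tri(b)]

# Row-major order: b[1,2], b[1,3], ..., b[1,12], b[2,3], ...
# t(b)[i, j] is b[j, i], so the lower triangle of t(b), read down its
# columns, visits the upper triangle of b along its rows.
by.row <- t(b)[lower.tri(b)]
```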
Re: [R] help using outer function
On Sun, Aug 10, 2008 at 06:00:21PM +0100, Dan Davison wrote:
On Sun, Aug 10, 2008 at 09:02:59AM -0700, warthog29 wrote:

y <- c(4,4,3.9,3.8,3.7,3.6,3.5,3.5,3.5,3.3,3.2,3.2)
b <- outer(y, y, "-")
b <- as.matrix(by)

I assume that line was supposed to be b <- as.matrix(by). In any case

Hmm, I didn't really clarify things there. I meant b <- as.matrix(b). But anyway, not needed:

you don't need it; b is a matrix already.

[...]
Re: [R] import/export txt file
On Fri, Aug 08, 2008 at 04:44:13PM -0700, Alessandro wrote:

Hi All,

I have 2 questions:

1. Import: when I import my txt file (X, Y and Z) into R with

testground <- read.table(file="c:/work_LIDAR_USA/R_kriging/ground26841492694149.txt", header=T)

I lose the 4 digits after the decimal point. Is it possible to read the 4 digits after the "."?

I think the problem is simply that you have options()$digits set to 7 (the default). Read the 'digits' section in help(options) and try

options(digits=11)

2. Is it possible to write an X, Y, Z *.txt file from R without the row IDs, with "," as the separator?

write.csv(your.data.frame, row.names=FALSE, quote=FALSE)

x <- read.table("path/to/your/file.txt", header=T)
x
#          X       Y       Z
# 1 26800.47 4149984 1543.39
# 2 26800.47 4149984 1543.39
options(digits=11)
x
#          X          Y       Z
# 1 26800.47 4149983.94 1543.39
# 2 26800.47 4149983.94 1543.39
write.csv(x, row.names=FALSE, quote=FALSE)
# X,Y,Z
# 26800.47,4149983.94,1543.39
# 26800.47,4149983.94,1543.39

Dan

Example:

Original data:

X          Y            Z
26800.4700 4149983.9400 1543.3900
...        .            ..

I wish to create a txt file (with "," as separator):

X, Y, Z
26800.4700, 4149983.9400, 1543.3900
..., ., ..

Thanks (It's Friday night, sorry, I am tired)
Ale
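As Dan's output shows, write.csv() drops trailing zeros (26800.4700 is written as 26800.47). If the file must show exactly four decimals, as in Ale's example, one option is to format the columns with sprintf() before writing. A minimal sketch; the one-row data frame and the temporary output file are stand-ins for Ale's data and path.

```r
x <- data.frame(X = 26800.47, Y = 4149983.94, Z = 1543.39)

# Printing rounds to getOption("digits") significant digits (7 by default);
# the stored doubles keep their full precision.
print(x, digits = 11)

# Turn every column into text with exactly four decimals
x.out <- data.frame(lapply(x, sprintf, fmt = "%.4f"))

outfile <- tempfile(fileext = ".txt")  # stand-in for the real output path
write.csv(x.out, outfile, row.names = FALSE, quote = FALSE)
readLines(outfile)
```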
Re: [R] read.table question
On Fri, Aug 08, 2008 at 07:27:13PM -0700, Alessandro wrote:

Hi All.

I have a txt file with 3 columns (X, Y and Z). Every row has 4 decimal places (i.e. x.). I use read.table to import the data into R, but with summary() I don't see the decimal places after the dot. Is there any way for me to preserve the information?

I hope I've answered this in the first thread on the subject.
https://stat.ethz.ch/pipermail/r-help/2008-August/170422.html

Dan

p.s. People don't like it if you submit the same question twice.

testground <- read.table(file="c:/work_LIDAR_USA/R_kriging/ground26841492694149.txt", header=T)

thanks
Ale
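To check that nothing is lost on import, it helps to compare the stored value directly instead of looking at printed output: read.table() keeps the full number, and only the display (including summary()) is rounded. A small sketch, with two made-up rows standing in for Ale's file and read via the `text` argument of read.table().

```r
# Two made-up rows in the same layout as Ale's file
txt <- "X Y Z
26800.4700 4149983.9400 1543.3900
26800.4800 4149983.9500 1543.4000"
testground <- read.table(text = txt, header = TRUE)

# The decimals are stored: the difference is far below 0.0001
abs(testground$Y[1] - 4149983.94) < 1e-6

summary(testground)               # rounded for display
summary(testground, digits = 12)  # ask for more significant digits
sprintf("%.4f", testground$Y)     # explicit four-decimal formatting
```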
Re: [R] effective matrix subset
On Sat, Aug 09, 2008 at 06:29:59AM -0500, Marc Schwartz wrote:
on 08/09/2008 06:01 AM [EMAIL PROTECTED] wrote:

Hi;

If we have a matrix A and a vector X, where length(X) = nrow(A), and X contains the wanted column for each row in A, in ascending row order, what would be the most efficient way to extract the desired vector V (with length(V) = nrow(A))?

A <- matrix(1:20, 4, 5)
A
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

# Create an arbitrary set of indices, one for each row in A
X <- c(2, 5, 1, 4)
X
[1] 2 5 1 4

Presumably you want:

V <- c(A[1, 2], A[2, 5], A[3, 1], A[4, 4])
V
[1]  5 18  3 16

If so, then:

sapply(seq(nrow(A)), function(i) A[i, X[i]])
[1]  5 18  3 16

Or

A[cbind(seq(nrow(A)), X)]
[1]  5 18  3 16

Dan

Is that what you were looking for? BTW, see ?diag for a special case:

diag(A)
[1]  1  6 11 16

HTH,
Marc Schwartz
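Dan's one-liner relies on matrix indexing: when a matrix is indexed by a two-column matrix, each row of the index is read as a (row, column) pair, so one element per row is extracted in a single vectorised call. A short sketch comparing it with the sapply() version and with diag().

```r
A <- matrix(1:20, 4, 5)
X <- c(2, 5, 1, 4)

# Each row of 'idx' is a (row, column) pair: (1,2), (2,5), (3,1), (4,4)
idx <- cbind(seq_len(nrow(A)), X)
V <- A[idx]

# Equivalent, but loops over the rows at the R level
V.loop <- sapply(seq_len(nrow(A)), function(i) A[i, X[i]])

# diag(A) is the special case where the column index equals the row index
A[cbind(1:4, 1:4)]
```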
Re: [R] Index alternative to nasty FOR loop?
On Wed, Aug 06, 2008 at 05:42:21PM +, zack holden wrote:

Dear R wizards,

I have a folder containing 1000 files. For each file, I need to extract the first row, paste it into a new file, then write out that file. Then I need to repeat this operation for each additional row (row 2, then row 3, etc.) for 23 rows in each file. I can do this with a for loop (as below).

Hi Zack,

There are a few problems with your sketched-out for loop (see below), but if I've understood your problem, then here are a couple of solutions that use for loops in the way you were intending. They both take line i from file 1, line i from file 2, ..., and write them to a file called lines_i, for i in 1:23.

The first one is for the case when you have tabular data, so it uses read.table and write.table. You might want to adjust the arguments to read.table and write.table, specifying whether you have a header, whether you want the row.names printed out, etc. The second one is similar but just works line by line, regardless of what the line looks like (i.e. it doesn't assume you have tabular data in the files).

collate.lines.1 <- function(folder, nrows=23) {
  files <- list.files(folder, full.names=TRUE)
  for(file in files) {
    file.as.data.frame <- read.table(file)
    for(row in 1:nrows) {
      outfile <- paste("lines_", row, ".csv", sep="")
      write.table(file.as.data.frame[row,], file=outfile, append=TRUE,
                  row.names=FALSE, col.names=FALSE, sep=",")
    }
  }
}

collate.lines.2 <- function(folder, nrows=23) {
  files <- list.files(folder, full.names=TRUE)
  for(file in files) {
    file.as.character.vector <- scan(file, what="", sep="\n")
    for(row in 1:nrows) {
      outfile <- paste("lines", row, sep="_")
      cat(file.as.character.vector[row], "\n", sep="", file=outfile, append=TRUE)
    }
  }
}

Is there a way to use some of the indexing power of R to get around this nasty loop?

If you really mean that you want a solution without explicit for loops in R, then that is possible. But I would recommend that you stick to a straightforward solution until you're completely comfortable with programming in that style. It's conceivable that the no-for-loop versions might be faster if you have lots of files / rows, but don't worry about speed until it's a problem. Here's my effort at doing it without for loops; it's a bit of a stretch and wasn't as easy to write down as the first two. I've probably missed a cleaner solution.

collate.lines.1.fancy <- function(folder, nrows=23) {
  outfiles <- paste("lines_", 1:nrows, ".csv", sep="")
  files <- list.files(folder, full.names=TRUE)
  files.as.data.frames <- lapply(files, read.table)
  ## split all rows apart
  x <- lapply(files.as.data.frames, function(df) split(df, f=factor(1:nrow(df))))
  ## collate rows from different data frames
  x <- do.call(mapply, c(x, list(FUN=function(...) rbind(...), SIMPLIFY=FALSE)))
  write.function <- function(dataframe, outfile)
    write.table(dataframe, file=outfile, row.names=FALSE, col.names=FALSE, sep=",")
  invisible(mapply(write.function, x, outfiles))
}

Thank you in advance for any suggestions

newoutfile <- data.frame()
list <- list.files("c:/data")  ## 'list' not such a good name as it's a built-in function
file = 1                       ## you don't need this
for(file in list) {
  row <- file[1, ]  ## that's not going to work; 'list' is a character vector, you haven't got the files as data.frames yet
  newoutfile <- rbind(row, newoutfile)
  file = file + 1
  write.csv(outfile, file = "output.csv")
}
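A variation on collate.lines.2 reads each file only once and then writes each output file in a single call, so the inner loop runs 23 times in total rather than 23 times per input file. A sketch under two assumptions: the output goes to a separate folder (so a rerun does not pick up its own output files as input), and `collate.lines.3` and the `outdir` argument are hypothetical names, not part of Dan's reply.

```r
collate.lines.3 <- function(folder, outdir, nrows = 23) {
  files <- list.files(folder, full.names = TRUE)
  # Read every file once: a list holding one character vector of lines per file
  contents <- lapply(files, readLines)
  for (i in seq_len(nrows)) {
    # Line i of every file, in the order returned by list.files();
    # a file with fewer than i lines contributes NA
    lines.i <- vapply(contents, function(x) x[i], character(1))
    writeLines(lines.i, file.path(outdir, paste("lines_", i, ".csv", sep = "")))
  }
  invisible(files)
}

# Hypothetical usage:
# collate.lines.3("c:/data", "c:/data_collated")
```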
Re: [R] Union of columns of two matrices
On Wed, Aug 06, 2008 at 06:32:43PM -0400, Giuseppe Paleologo wrote:

I was posed the following problem/teaser: given two matrices, come up with an elegant (= fast and short) function that returns a matrix with all and only the non-duplicated columns of both matrices; the column order does not matter. In essence, a matrix equivalent of union(x, y), where x and y are vectors. I could not come up with anything nice. Any ideas?

union.matrices <- function(a, b) {
  u <- cbind(a, b)
  u[, !duplicated(u, MARGIN=2)]
}

?

(Obviously not attempting to deal with issues of identity of columns containing real numbers)

Dan

Giuseppe
--
Giuseppe A. Paleologo
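One edge case in the function above: if only one distinct column survives, `u[, ...]` drops the result to a plain vector. Adding `drop = FALSE` keeps it a matrix. A small sketch with toy matrices; like union() for vectors, duplicate columns within a single input are removed too.

```r
union.matrices <- function(a, b) {
  u <- cbind(a, b)
  # duplicated(..., MARGIN = 2) flags columns identical to an earlier column;
  # drop = FALSE keeps a matrix even when a single column remains
  u[, !duplicated(u, MARGIN = 2), drop = FALSE]
}

a <- matrix(c(1, 2, 3, 4), nrow = 2)  # columns (1,2) and (3,4)
b <- matrix(c(3, 4, 5, 6), nrow = 2)  # columns (3,4) and (5,6)
union.matrices(a, b)                  # columns (1,2), (3,4), (5,6)
```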
Re: [R] List of occurrence matrices
Lauri Nikkinen wrote:

R users,

I don't know if I can make myself clear, but I'll give it a try. I have a data.frame like this

x <- "var1,var2,var3,var4
a,b,b,a
b,b,c,b
c,a,a,a
a,b,c,c
b,a,c,a
c,c,b,b
a,c,a,b
b,c,a,c
c,a,b,c"
DF <- read.table(textConnection(x), header=T, sep=",")
DF

and I would like to sum all the combinations/occurrences by factor level (letter, in this case) between variables, and produce a list of occurrence matrices. For example, in this case the occurrence matrix (first element of the list) for level "a" should look like this

occulist
$a
     var1 var2 var3 var4
var1    x    0    1    1
var2    0    x    1    2
var3    1    1    x    1
var4    1    2    1    x

$b
etc.

because there are two rows where both var2 and var4 have "a":

DF[DF$var2=="a" & DF$var4=="a",]

Can you give any advice on how to achieve this kind of list of matrices?

-Lauri

I think this does it:

occur.matrices <- function(df) {
  levels <- levels(unlist(df))
  ans <- lapply(levels, function(level) crossprod(df == level))
  structure(ans, names=levels)
}

occur.matrices(DF)
$a
     var1 var2 var3 var4
var1    3    0    1    1
var2    0    3    1    2
var3    1    1    3    1
var4    1    2    1    3

$b
     var1 var2 var3 var4
var1    3    1    0    1
var2    1    3    1    1
var3    0    1    3    1
var4    1    1    1    3

$c
     var1 var2 var3 var4
var1    3    1    0    1
var2    1    3    0    1
var3    0    0    3    1
var4    1    1    1    3

Dan
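The crossprod() trick works because, for a 0/1 indicator matrix M (rows = records, columns = variables), crossprod(M) is t(M) %*% M, whose [j, k] entry counts the rows where both column j and column k equal the given letter. One caveat: since R 4.0.0, read.table() no longer converts strings to factors by default, so levels(unlist(DF)) returns NULL and the function above returns an empty list. The sketch below is a variant that works whether the columns are factors or character.

```r
x <- "var1,var2,var3,var4
a,b,b,a
b,b,c,b
c,a,a,a
a,b,c,c
b,a,c,a
c,c,b,b
a,c,a,b
b,c,a,c
c,a,b,c"
DF <- read.table(text = x, header = TRUE, sep = ",")

occur.matrices <- function(df) {
  m <- as.matrix(df)                   # character matrix, factors or not
  values <- sort(unique(as.vector(m)))
  # (m == v) * 1 is a 0/1 indicator matrix; crossprod() counts, for each
  # pair of columns, the rows in which both columns equal v
  ans <- lapply(values, function(v) crossprod((m == v) * 1))
  names(ans) <- values
  ans
}

occur.matrices(DF)$a
```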
Re: [R] Creating an array of lists
Gang Chen-4 wrote:

Hi,

I want to store the outputs from running a number of analyses, such as lm(), in an array. I know how to do this with a one-dimensional array (vector) by creating

myArray <- vector(mode='list', length=10)

Note that in R terminology, 'myArray' is a list, not an array. You are right to store things like lm() output in a list. If you want to store multiple lm outputs in a way that is conceptually multi-dimensional, I would suggest using lists of lists. Then you can use

rapply(lm.fits, some.function, how="replace")

to process the model fits while keeping the multi-dimensional structure.

Dan

Gang Chen-4 wrote:

and storing each lm() result in a component of myArray. My question is, how can I do this for a multi-dimensional array? It seems array() does not have a 'mode' option like vector() does. Any alternatives?

Thanks in advance,
Gang
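Another option, alongside lists of lists: a list can itself carry a dim attribute, which gives a matrix (or higher-dimensional array) whose cells are arbitrary objects such as lm fits. A minimal sketch; the mtcars models are made-up examples, and array(vector("list", n), dim = ...) works the same way for more than two dimensions.

```r
responses  <- c("mpg", "qsec")
predictors <- c("wt", "hp")

# An empty list with a dim attribute: a 2 x 2 "array of lists"
fits <- vector(mode = "list", length = length(responses) * length(predictors))
dim(fits) <- c(length(responses), length(predictors))
dimnames(fits) <- list(responses, predictors)

for (r in responses) {
  for (p in predictors) {
    # [[ with two indices stores one lm object in one cell
    fits[[r, p]] <- lm(reformulate(p, response = r), data = mtcars)
  }
}

# Extract the slope from every fit, keeping the 2 x 2 layout
slopes <- matrix(sapply(fits, function(f) coef(f)[[2]]),
                 nrow = nrow(fits), dimnames = dimnames(fits))
slopes
```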
Re: [R] Font size in plots (I do NOT understand par help)
On Wed, Aug 06, 2008 at 03:37:48PM +0100, Stephane Bourgeois wrote:

Hi,

I do not get how par works, help please. Let's say I have a simple plot:

plot(1:10)

I want to change the font size for the x axis... how do I do that?

OK, so first go to the help page for par by typing

?par

I'm not saying you should read the whole thing right now; there are quite a lot of options. But you want to change something to do with axes, so search for the word 'axis'. The 3rd hit I get shows the following lines.

'cex.axis' The magnification to be used for axis annotation relative to the current setting of 'cex'.
'cex.lab' The magnification to be used for x and y labels relative to the current setting of 'cex'.

Note that one of those refers to the axis annotation (i.e. the numbers along the axis), whereas the other refers to the axis labels.

Now there are two ways to proceed. First, note that par() is a function. When you call it, it changes the values of the graphics parameters you specify. So say you want to make the axis label font twice as big. The first method would be

par(cex.lab=2)
plot(1:10)

An alternative method is as follows:

plot(1:10, cex.lab=2)

If you don't know why that works, look at the help page for plot by typing ?plot, and read the part about the three dots (...).

If you go for the first method, one useful trick is to save the previous values, so you can restore them. You would do that like this:

old.par.settings <- par(cex.lab=2)
plot(1:10)
## now restore them
par(old.par.settings)

That works because par() returns the old values (invisibly), even though its effect is to change them.

To be fair, you actually asked how to change the font size on the x-axis, whereas the above changes it on both axes. AFAIK there are no par() options that do exactly that, so the way I'd do it would be to first plot without any axis labels, and then add the x- and y-labels independently using the title() function, passing extra 'cex.lab=' arguments in the same way as the second method above:

plot(1:10, xlab="", ylab="")
title(xlab="xlab title", cex.lab=3)
title(ylab="ylab title", cex.lab=.5)

Dan

Thank you,
Stephane
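The same per-axis idea extends to the axis annotation (the tick numbers): suppress the default x axis with xaxt = "n" and redraw it with axis(1, cex.axis = ...), so only the x axis is enlarged. A sketch; plot.big.x() and plot.with.par() are hypothetical helper names, and the second shows the par() save-and-restore trick wrapped in on.exit() so the settings come back even if plotting fails.

```r
# Enlarge only the x-axis tick labels and title; global par() is untouched
plot.big.x <- function(x, cex.x = 2, xlab = "x", ylab = "y") {
  plot(x, xaxt = "n", xlab = "", ylab = "")  # no default x axis or labels
  axis(1, cex.axis = cex.x)                  # redraw the x axis, enlarged
  title(xlab = xlab, cex.lab = cex.x)        # enlarged x-axis title
  title(ylab = ylab)                         # y-axis title at default size
}

# Change par() temporarily; on.exit() restores the old values afterwards
plot.with.par <- function(x) {
  old.par <- par(cex.lab = 2)
  on.exit(par(old.par))
  plot(x)
}

# Usage:
# plot.big.x(1:10)
# plot.with.par(1:10)
```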