Re: [R] Seeking help with a loop
> x <- data.frame(q33a=3:4, q33b=5:6, q35a=1:2, q35b=2:1)
> y <- list()
> for (i in grep("q33", colnames(x), value=TRUE))
+     y[[sub("q33", "", i)]] <- ifelse(x[[sub("q33", "q35", i)]]==1, x[[i]], NA)
> as.data.frame(y)
   a  b
1  3 NA
2 NA  6
> # if you really want to create new variables rather
> # than have them in a data frame:
> # (use paste() or sub() to modify the names if you
> # want something like "newfielda")
> for (i in names(y)) assign(i, y[[i]])
> a
[1]  3 NA
> b
[1] NA  6

hope this helps,

Tony Plate

Greg Blevins wrote:
> Hello R Helpers,
>
> After spending considerable time attempting to write a loop (and searching the help archives) I have decided to post my problem.
>
> In a dataframe I have columns labeled:
>
> q33a q33b q33c...q33r q35a q35b q35c...q35r
>
> What I want to do is create new variables based on the following logic:
>
> newfielda <- ifelse(q35a==1, q33a, NA)
> newfieldb <- ifelse(q35b==1, q33b, NA)
> ...
> newfieldr
>
> What I did was create two new dataframes, one containing q33a-r, the other q35a-r, and tried to loop over both, but I could not get any of the loop syntax I tried to give me the result I was seeking.
>
> Any help would be much appreciated.
>
> Greg Blevins
> Partner
> The Market Solutions Group, Inc.
> Minneapolis, MN
>
> Windows XP, R 2.1.1

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
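A loop-free variant of the same idea, offered only as a sketch: it assumes every q33 column has a q35 partner with the same letter suffix, and the `newfield` prefix is taken from the question. The toy data frame mirrors the one in the reply.

```r
# Toy data in the same shape as the example in the thread.
x <- data.frame(q33a = 3:4, q33b = 5:6, q35a = 1:2, q35b = 2:1)

q33 <- grep("^q33", names(x), value = TRUE)   # value columns
q35 <- sub("^q33", "q35", q33)                # matching flag columns (assumed to exist)

# Pair each q33 column with its q35 partner; keep the value where the flag is 1.
new <- Map(function(value, flag) ifelse(flag == 1, value, NA), x[q33], x[q35])
new <- as.data.frame(new)
names(new) <- sub("^q33", "newfield", q33)    # "newfielda", "newfieldb", ...

# Attach the derived columns to the original data frame.
x <- cbind(x, new)
```

Keeping the results as columns of the data frame, rather than calling assign() for each one, means they stay aligned with the rows they came from.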
Re: [R] Why only a "" string for heading for row.names with write.csv with a matrix?
Here's a relatively easy way to get what I think you want. Note that converting x to a data frame before cbind'ing allows the type of the elements of x to be preserved:

> x <- matrix(1:6, 2, 3)
> rownames(x) <- c("ID1", "ID2")
> colnames(x) <- c("Attr1", "Attr2", "Attr3")
> x
    Attr1 Attr2 Attr3
ID1     1     3     5
ID2     2     4     6
> write.table(cbind(id=row.names(x), as.data.frame(x)), row.names=FALSE, sep=",")
"id","Attr1","Attr2","Attr3"
"ID1",1,3,5
"ID2",2,4,6

As to why you can't get this via an argument to write.table (or write.csv), I suspect that part of the answer is a wish to avoid "creeping featuritis". Transferring data between programs is notoriously infuriating. There are more data formats than there are programs, but few programs use the same format as their default preferred format. So to accommodate everyone's preferred format would require an extremely large number of features in the data import/export functions. Maintaining software that contains a large number of features is difficult -- it's easy for errors to creep in because there are so many combinations of how different features can be used on different functions.

The alternative to having lots of features on each function is to have a relatively small set of powerful functions that can be used to construct the behavior you want. This type of software is thought by many to be easier to maintain and extend. I think this is pretty much the preferred approach in R. The above one-liner for writing the data in the form you want is really not much more complex than using an additional argument to write.table(). (And if you need to do this kind of thing frequently, then it's easy in R to create your own wrapper function for 'write.table'.)

One might object to this line of explanation by noting that many functions already have many arguments and lots of features.
I think the situation is that the original author of any particular function gets to decide what features the function will have, and after that there is considerable reluctance (justifiably) to add new features, especially in cases where the desired functionality can be easily achieved in other ways with existing functions.

-- Tony Plate

Earl F. Glynn wrote:
> Consider:
>
> > x <- matrix(1:6, 2, 3)
> > rownames(x) <- c("ID1", "ID2")
> > colnames(x) <- c("Attr1", "Attr2", "Attr3")
> > x
>     Attr1 Attr2 Attr3
> ID1     1     3     5
> ID2     2     4     6
> > write.csv(x, file="x.csv")
>
> "","Attr1","Attr2","Attr3"
> "ID1",1,3,5
> "ID2",2,4,6
>
> Have I missed an easy way to get the "" string to be something meaningful? There is no information in the "" string. This column heading for the row names often could be used as a database key, but the "" entry would need to be manually edited first. Why not provide a way to specify the string instead of putting "" as the heading for the rownames?
>
> From http://finzi.psych.upenn.edu/R/doc/manual/R-data.html
>
>   Header line
>   R prefers the header line to have no entry for the row names, . . . Some other systems require a (possibly empty) entry for the row names, which is what write.table will provide if argument col.names = NA is specified. Excel is one such system.
>
> Why is an "empty" entry the only option here?
>
> A quick solution that comes to mind seems a bit kludgy:
>
> > y <- cbind(rownames(x), x)
> > colnames(y)[1] <- "ID"
> > y
>     ID    Attr1 Attr2 Attr3
> ID1 "ID1" "1"   "3"   "5"
> ID2 "ID2" "2"   "4"   "6"
> > write.table(y, row.names=F, col.names=T, sep=",", file="y.csv")
>
> "ID","Attr1","Attr2","Attr3"
> "ID1","1","3","5"
> "ID2","2","4","6"
>
> Now the rownames have an "ID" header, which could be used as a key in a database if desired without editing (but all the numbers are now character strings, too).
>
> It's also not clear why I had to use write.table above, instead of write.csv:
>
> > write.csv(y, row.names=F, col.names=T, file="y.csv")
> Error in write.table(..., col.names = NA, sep = ",", qmethod = "double") :
>         col.names = NA makes no sense when row.names = FALSE
>
> Thanks for any insight about this.
>
> efg
> --
> Earl F. Glynn
> Bioinformatics
> Stowers Institute
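The reply mentions that a wrapper around write.table() is easy to write. A minimal sketch of one follows; the function name `write_keyed` and its `key` argument are hypothetical names chosen for illustration, not part of R.

```r
# Hypothetical wrapper: write a matrix or data frame as CSV with the row
# names in a named first column instead of under an empty "" header.
write_keyed <- function(x, file = "", key = "id", ...) {
  df <- as.data.frame(x)                      # keeps the column types of x
  keys <- data.frame(rownames(df), stringsAsFactors = FALSE)
  names(keys) <- key
  # row.names and sep are fixed here, so they must not be passed again via ...
  write.table(cbind(keys, df), file = file, row.names = FALSE, sep = ",", ...)
}

x <- matrix(1:6, 2, 3,
            dimnames = list(c("ID1", "ID2"), c("Attr1", "Attr2", "Attr3")))

# file = "" writes to the console, so the lines can be captured for inspection.
out <- capture.output(write_keyed(x, key = "ID"))
```

Because the numeric columns go through as.data.frame() rather than a character matrix, the numbers are written unquoted.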
Re: [R] queer data set
Here's one way of working with the data you gave:

> x <- read.table(file("clipboard"), fill=T, header=T)
> x
  HEADER1 HEADER2 HEADER3           HEADER3.1
1      A1      B1      C1         X11;X12;X13
2      A2      B2      C2 X21;X22;X23;X24;X25
3      A3      B3      C3
4      A4      B4      C4         X41;X42;X43
5      A5      B5      C5                 X51
> apply(x, 1, function(x) strsplit(x[4], ";")[[1]])
$"1"
[1] "X11" "X12" "X13"

$"2"
[1] "X21" "X22" "X23" "X24" "X25"

$"3"
character(0)

$"4"
[1] "X41" "X42" "X43"

$"5"
[1] "X51"

> do.call("rbind", apply(x, 1, function(x) {
+     y <- strsplit(x[4], ";")[[1]]
+     x3 <- matrix(x[1:3], ncol=3, nrow=max(1,length(y)), byrow=T)
+     return(cbind(x3, if (length(y)) y else NA))
+ }))
      [,1] [,2] [,3] [,4]
 [1,] "A1" "B1" "C1" "X11"
 [2,] "A1" "B1" "C1" "X12"
 [3,] "A1" "B1" "C1" "X13"
 [4,] "A2" "B2" "C2" "X21"
 [5,] "A2" "B2" "C2" "X22"
 [6,] "A2" "B2" "C2" "X23"
 [7,] "A2" "B2" "C2" "X24"
 [8,] "A2" "B2" "C2" "X25"
 [9,] "A3" "B3" "C3" NA
[10,] "A4" "B4" "C4" "X41"
[11,] "A4" "B4" "C4" "X42"
[12,] "A4" "B4" "C4" "X43"
[13,] "A5" "B5" "C5" "X51"

This of course is a matrix; you can convert it back to a dataframe using as.data.frame() if you desire. Use either "NA" (with quotes) or NA (without quotes) to control whether you get just the string "NA" or an actual character NA value in column 4.

If you're processing a huge amount of data, you can probably do better by rewriting the above code to avoid implicit coercions of data types.

hope this helps,

Tony Plate

S.O. Nyangoma wrote:
> I have a dataset that is basically structureless. Its dimension varies from row to row, and the separators are a mixture of tab and semicolon (;). An example is
>
> HEADER1 HEADER2 HEADER3 HEADER3
> A1      B1      C1      X11;X12;X13
> A2      B2      C2      X21;X22;X23;X24;X25
> A3      B3      C3
> A4      B4      C4      X41;X42;X43
> A5      B5      C5      X51
>
> etc., say. Note that a blank under HEADER3 corresponds to non-occurrence, and all semicolon (;) delimited variables are under HEADER3. These values run into tens of thousands.
> I want to give some order to this queer matrix, to get something like:
>
> HEADER1 HEADER2 HEADER3 HEADER3
> A1      B1      C1      X11
> A1      B1      C1      X12
> A1      B1      C1      X13
> A1      B1      C1      X14
> A2      B2      C2      X21
> A2      B2      C2      X22
> A2      B2      C2      X23
> A2      B2      C2      X24
> A2      B2      C2      X25
> A2      B2      C2      X26
> A3      B3      C3      NA
> A4      B4      C4      X41
> A4      B4      C4      X42
> A4      B4      C4      X43
>
> Is there a brilliant R-way of doing such a task?
>
> Goodday. Stephen.
>
> ----- Original Message -----
> From: Prof Brian Ripley [EMAIL PROTECTED]
> Date: Monday, August 15, 2005 11:13 pm
> Subject: Re: [R] How to get a list work in RData file
>
> > On Mon, 15 Aug 2005, Xiyan Lon wrote:
> >
> > > Dear R-Helper,
> >
> > (There are quite a few of us.)
> >
> > > I want to know how I get a list work which I saved in RData file. For example,
> >
> > I don't understand that at all, but it looks as if you want to save an unevaluated call, in which case see ?quote and use something like
> >
> > xyadd <- quote(test.xy(x=2, y=3))
> >
> > load and saving has nothing to do with this: it doesn't change the meaning of objects in the workspace.
> >
> > > > test.xy <- function(x,y) {
> > > +     xy <- x+y
> > > +     xy
> > > + }
> > > > xyadd <- test.xy(x=2, y=3)
> > > > xyadd
> > > [1] 5
> > > > x1 <- c(2,43,60,8)
> > > > y1 <- c(91,7,5,30)
> > > > xyadd1 <- test.xy(x=x1, y=y1)
> > > > xyadd1
> > > [1] 93 50 65 38
> > > > save(list = ls(all=TRUE), file = "testxy.RData")
> > > > rm(list=ls(all=TRUE))
> > > > load("C:/R/useR/testxy.RData")
> > > > ls()
> > > [1] "test.xy" "x1"      "xyadd"   "xyadd1"  "y1"
> > > > ls.str(pat="xyadd")
> > > xyadd : num 5
> > > xyadd1 : num [1:4] 93 50 65 38
> > >
> > > When I run, I know the result like above
> > > > xyadd
> > > [1] 5
> > > > xyadd1
> > > [1] 93 50 65 38
> > >
> > > what I want to know, is there any function to make the result like:
> > >
> > > > xyadd
> > > test.xy(x=2, y=3)
> > >
> > > and
> > >
> > > > xyadd1
> > > test.xy(x=x1, y=y1)
> >
> > --
> > Brian D. Ripley, [EMAIL PROTECTED]
> > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford, Tel: +44 1865 272861 (self)
> > 1 South Parks Road, +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK Fax: +44 1865 272595
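The reply in this thread suggests that avoiding implicit coercions may help on large data. One possible way to do that, sketched under assumptions: the columns are read as character (not factor), and the fourth column is called `HEADER4` here purely for illustration. The data are a shortened, made-up version of the example.

```r
# Shortened, made-up data; the 4th column holds ";"-separated values or "".
x <- data.frame(HEADER1 = c("A1", "A2", "A3"),
                HEADER2 = c("B1", "B2", "B3"),
                HEADER3 = c("C1", "C2", "C3"),
                HEADER4 = c("X11;X12;X13", "", "X31"),
                stringsAsFactors = FALSE)

# Split once for the whole column; an empty cell splits to character(0).
parts <- strsplit(x$HEADER4, ";", fixed = TRUE)
parts[lengths(parts) == 0] <- NA_character_   # keep one row for empty cells
n <- lengths(parts)                           # rows needed per input row

# Repeat the first three columns n times each and attach the split values.
long <- data.frame(x[rep(seq_len(nrow(x)), n), 1:3],
                   HEADER4 = unlist(parts),
                   stringsAsFactors = FALSE)
rownames(long) <- NULL
```

The row index is built once with rep(), so the function passed to apply() and the per-row matrix construction are not needed.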
Re: [R] Regular expressions sub
> x <- scan("clipboard", what="")
Read 7 items
> x
[1] "1.11"   "10.11"  "11.11"  "113.31" "114.2"  "114.3"  "114.8"
> gsub("[0-9]*\\.", "", x)
[1] "11" "11" "11" "31" "2"  "3"  "8"

Bernd Weiss wrote:
> Dear all,
>
> I am struggling with the use of regular expressions. I got
>
> > as.character(test$sample.id)
> [1] "1.11"   "10.11"  "11.11"  "113.31" "114.2"  "114.3"  "114.8"
>
> and need
>
> [1] "11" "11" "11" "31" "2"  "3"  "8"
>
> I.e. remove everything before the ".".
>
> TIA,
>
> Bernd
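Two anchored alternatives to the pattern in the reply, shown as a sketch. They give the same result on the data in this thread and differ only when a string contains more than one dot; the input `"1.2.3"` is a made-up case to show that difference.

```r
x <- c("1.11", "10.11", "11.11", "113.31", "114.2", "114.3", "114.8")

# Drop everything up to and including the FIRST dot.
after_first <- sub("^[^.]*\\.", "", x)

# Drop everything up to and including the LAST dot (".*" is greedy).
after_last <- sub(".*\\.", "", x)

# The two differ only for strings with several dots.
multi_first <- sub("^[^.]*\\.", "", "1.2.3")   # "2.3"
multi_last  <- sub(".*\\.", "", "1.2.3")       # "3"
```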
Re: [R] books about MCMC to use MCMC R packages?
I've found "Bayesian Data Analysis" by Gelman, Carlin, Stern & Rubin (2nd ed) to be quite useful for understanding how MCMC can be used for Bayesian models. It has a little bit of R code in it too.

-- Tony Plate

Molins, Jordi wrote:
> Dear list users,
>
> I need to learn about MCMC methods, and since there are several packages in R that deal with this subject, I want to use them.
>
> I want to buy a book (or more than one, if necessary) that satisfies the following requirements:
>
> - it teaches MCMC methods well;
> - it is easy to implement numerically the ideas of the book, and notation and concepts are similar to the corresponding R packages that deal with MCMC methods.
>
> I have done a search and 2 books seem to satisfy my requirements:
>
> - Markov Chain Monte Carlo In Practice, by W.R. Gilks and others.
> - Monte Carlo Statistical Methods, Robert and Casella.
>
> What do people think about these books? Is there a suggestion of some other book that could better satisfy my requirements?
>
> Thank you very much in advance.
Re: [R] Assign references
Looking at what objects exist after the call to myFunk() should give you a clue as to what happened:

> remove(list=objects())
> myFunk <- function(a,b,foo,bar) {foo<<-a+b; bar<<-a*b;}
> x<-0; y<-0;
> myFunk(4,5,x,y)
> x
[1] 0
> y
[1] 0
> objects()
[1] "bar"    "foo"    "myFunk" "x"      "y"
> bar
[1] 20
> foo
[1] 9

I suspect that you might have slightly misinterpreted Thomas Lumley's explanations of how the <<- operator works in different situations (the LHS must exist if you are assigning using a replacement operator, e.g., as in foo[1] <<- ..., but not when you are assigning the whole object as in foo <<- ...).

But I really would suggest careful consideration of what might be the best way to approach your problem -- modifying global data from within a function is not the standard way of using R. Unless you are very careful about how you do it, it is likely to cause headaches for yourself and/or others down the road (because R is just not intended to be used that way). The standard way of doing this sort of thing in R is to modify a local copy of the dataframe and return that, or if you have to return several dataframes, then return a list of dataframes.

-- Tony Plate

[EMAIL PROTECTED] wrote:
> Folks,
>
> I've run into trouble while writing functions that I hope will create and modify a dataframe or two. To that end I've written a toy function that simply sets a couple of variables (well, tries but fails). Searching the archives, Thomas Lumley recently explained the <<- operator, showing that it was necessary for x and y to exist prior to the function call, but I haven't the faintest why this isn't working:
>
> myFunk<-function(a,b,foo,bar) {foo<<-a+b; bar<<-a*b;}
> x<-0; y<-0;
> myFunk(4,5,x,y)
>
> > x<-0; y<-0;
> > myFunk(4,5,x,y)
> > x
> [1] 0
> > y
> [1] 0
>
> What (no doubt simple) reason is there for x and y not changing?
>
> Thank you,
> cur
> --
> Curt Seeliger, Data Ranger
> CSC, EPA/WED contractor
> 541/754-4638
> [EMAIL PROTECTED]
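The reply recommends returning results rather than assigning into the global environment. A minimal sketch of that style, reusing the toy function name from the thread; the list element names `foo` and `bar` are kept from the original for continuity.

```r
# Return both results in a named list instead of assigning with <<-.
myFunk <- function(a, b) {
  list(foo = a + b, bar = a * b)
}

res <- myFunk(4, 5)

# The caller decides where the results go.
x <- res$foo
y <- res$bar
```

The same pattern extends to data frames: modify a local copy inside the function, then return it (or a list of several) and assign the result at the call site.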
Re: [R] R on a supercomputer
In general, R is not written in such a way that data remain in cache. However, R can use optimized BLAS libraries, and these are written that way. So if your version of R is compiled to use an optimized BLAS library appropriate to the machine (e.g., ATLAS, or Prof. Goto's BLAS), AND a considerable amount of the computation done in your R program involves basic linear algebra (matrix multiplication, etc.), then you might see a good speedup.

-- Tony Plate

Kimpel, Mark William wrote:
> I am using R with Bioconductor to perform analyses on large datasets using bootstrap methods. In an attempt to speed up my work, I have inquired about using our local supercomputer and asked the administrator if he thought R would run faster on our parallel network. I received the following reply:
>
> "The second benefit is that the processors have large caches. Briefly, everything is loaded into cache before going into the processor. With large caches, there is less movement of data between memory and cache, and this can save quite a bit of time. Indeed, when programmers optimize code they usually think about how to do things to keep data in cache as long as possible.
>
> Whether you would receive any benefit from larger cache depends on how R is written. If it's written such that data remain in cache, the speed-up could be considerable, but I have no way to predict it."
>
> My question is, "is R written such that data remain in cache?"
>
> Thanks,
>
> Mark W. Kimpel MD
> Indiana University School of Medicine
Re: [R] transformation matrice of vector into array
Here's a way to convert a matrix of vectors like you have into an array:

> x <- array(lapply(seq(0,len=6,by=4), "+", c(a=1,b=2,c=3,d=4)), dim=c(2,3), dimnames=list(c("X","Y"),c("e","f","g")))
> x
  e         f         g
X Numeric,4 Numeric,4 Numeric,4
Y Numeric,4 Numeric,4 Numeric,4
> x[["Y","e"]]
a b c d
5 6 7 8
> xa <- array(unlist(x, use.names=F), dim=c(length(x[[1,1]]),dim(x)), dimnames=c(list(names(x[[1,1]])),dimnames(x)))
> x["Y","e"]
[[1]]
a b c d
5 6 7 8

> xa[,"Y","e"]
a b c d
5 6 7 8

Then you can do whatever sums you want over the array.

I have not extensively checked the above code, and if I were going to use it, I would do numerous spot checks of elements to make sure all the elements are going to the right places -- it's not too difficult to make mistakes when pulling apart and reassembling arrays like this.

(For simpler cases involving lists of vectors or matrices, the abind() function can help.)

-- Tony Plate

Jessica Gervais wrote:
> Hi,
>
> I need some help.
>
> I have a matrix M(m,n) in which each element is a vector V of length 6:
>
>   1      2      3      4      5      6      7
> 1 List,6 List,6 List,6 List,6 List,6 List,6 List,6
> 2 List,6 List,6 List,6 List,6 List,6 List,6 List,6
> 3 List,6 List,6 List,6 List,6 List,6 List,6 List,6
> 4 List,6 List,6 List,6 List,6 List,6 List,6 List,6
>
> I would like to sum, over the matrix, each element of the vectors, that is to say
>
> sum(on the matrix)(M[j,][[j]][[1]])
> sum(on the matrix)(M[j,][[j]][[2]])
> ...
> sum(on the matrix)(M[j,][[j]][[6]])
>
> I don't really know how to do this.
>
> I thought it was possible to transform the matrix M into an array A of dimension (m,n,6), and then use the command sum(colsums(A[,,1]), which seems to be possible and quite fast.
>
> ...but I don't know how to convert a matrix of vectors into an array.
>
> Has anyone any idea about that?
>
> Thanks in advance
>
> Jessica

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
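The reply stops at "do whatever sums you want over the array". A sketch of that last step, rebuilding the same toy objects as in the reply; the cross-check with Reduce() is an addition for illustration and skips the array conversion entirely.

```r
# Same toy list-matrix as in the reply: 2 x 3 cells, each a named vector of 4.
x <- array(lapply(seq(0, by = 4, length.out = 6), "+", c(a = 1, b = 2, c = 3, d = 4)),
           dim = c(2, 3), dimnames = list(c("X", "Y"), c("e", "f", "g")))

# Convert to a 4 x 2 x 3 numeric array: component, then row, then column.
xa <- array(unlist(x, use.names = FALSE),
            dim = c(length(x[[1, 1]]), dim(x)),
            dimnames = c(list(names(x[[1, 1]])), dimnames(x)))

# Sum each component over the whole matrix (over dimensions 2 and 3).
totals <- apply(xa, 1, sum)

# Cross-check: add the cell vectors together element by element.
check <- Reduce("+", x)
```

A spot check such as comparing `totals` with `check` is in the spirit of the caution given in the reply about reassembling arrays.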
Re: [R] Functions ,Optim, Dataframe
Supply your additional arguments to optim() and they will get passed to your function:

> mydat <- data.frame(d1=c(3,5), d2=c(6,10), p1=c(.55,.05), p2=c(.85,.35))
> fr <- function(x, d) {
+     # d is a vector of d1, d2, p1 & p2
+     u <- x[1]
+     v <- x[2]
+     d1 <- d[1]
+     d2 <- d[2]
+     p1 <- d[3]
+     p2 <- d[4]
+     sqrt(sum((plnorm(c(d1,d2,u,v)-c(p1,p2))^2)))
+ }
> x0 <- c(1,1)   # starting values for two unknown parameters
> y1 <- optim(x0, fr, d=unlist(mydat[1,]))
> y2 <- optim(x0, fr, d=unlist(mydat[2,]))
> y1$par
[1] 0.462500 0.828125
> y2$par
[1] -1.0937500  0.2828125
> yall <- apply(mydat, 1, function(d) optim(x0, fr, d=d))
> yall[[1]]$par
[1] 0.462500 0.828125
> yall[[2]]$par
[1] -1.0937500  0.2828125

One thing you must be careful of is that none of the arguments to your function match or partially match the named arguments of optim(), which are:

> names(formals(optim))
[1] "par"     "fn"      "gr"      "method"  "lower"   "upper"   "control"
[8] "hessian" "..."

For example, if your function has an argument 'he=', you will not be able to pass it, because if you say optim(x0, fr, he=3), the 'he' will match the 'hessian=' argument of optim(), and it will not be interpreted as being a '...' argument.

-- Tony Plate

Michael Papenfus wrote:
> I think I need to clarify a little further on my original question.
>
> I have the following two rows of data:
>
> mydat<-data.frame(d1=c(3,5),d2=c(6,10),p1=c(.55,.05),p2=c(.85,.35))
> > mydat
>   d1 d2   p1   p2
> 1  3  6 0.55 0.85
> 2  5 10 0.05 0.35
>
> I need to optimize the following function using optim for each row in mydat:
>
> fr<-function(x) {
>     u<-x[1]
>     v<-x[2]
>     sqrt(sum((plnorm(c(d1,d2,u,v)-c(p1,p2))^2))
> }
> x0<-c(1,1)   # starting values for two unknown parameters
> y<-optim(x0,fr)
>
> In my defined function fr, (d1 d2 p1 p2) are known values which I need to read in from my dataframe, and u & v are the TWO unknown parameters. I want to solve this equation for each row of my dataframe. I can get this to work when I manually plug in the known values (d1 d2 p1 p2).
> However, I would like to apply this to each row in my dataframe, where the known values are automatically passed to my function, which then is sent to optim, which solves for the two unknown parameters for each row in the dataframe.
>
> thanks again,
> mike
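Another way to get the known values into the objective is to wrap it in a closure, which sidesteps any argument-name matching with optim(). The sketch below rests on a loud assumption: it guesses that the intended objective compares plnorm(c(d1, d2), meanlog, sdlog) with c(p1, p2), which is not what the code posted in the thread computes. The name `fr2` and the log-scale parameterisation of sdlog are illustrative choices only.

```r
mydat <- data.frame(d1 = c(3, 5), d2 = c(6, 10), p1 = c(.55, .05), p2 = c(.85, .35))

# ASSUMPTION: x[1] is meanlog and exp(x[2]) is sdlog (kept positive via exp()).
fr2 <- function(x, d1, d2, p1, p2) {
  fitted <- plnorm(c(d1, d2), meanlog = x[1], sdlog = exp(x[2]))
  sqrt(sum((fitted - c(p1, p2))^2))
}

# One optim() call per row; the closure captures that row's known values.
fits <- lapply(seq_len(nrow(mydat)), function(i) {
  row <- mydat[i, ]
  optim(c(1, 0), function(x) fr2(x, row$d1, row$d2, row$p1, row$p2))
})

# Collect the estimates and the achieved objective value for each row.
est <- t(sapply(fits, function(f) {
  c(meanlog = f$par[1], sdlog = exp(f$par[2]), value = f$value)
}))
```

With two equations and two unknowns per row, an objective value close to zero would indicate that both target probabilities are matched.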
Re: [R] Functions ,Optim, Dataframe
I added an example of passing additional arguments through optim() to the objective and gradient functions to the "Discussion" section of the Wiki-fied R documentation. See it at http://wiki.r-project.org/rwiki/doku.php?id=rdoc:stats:optim

-- Tony Plate

PS. I had to add "&purge=true" to the end of the URL, i.e., http://wiki.r-project.org/rwiki/doku.php?id=rdoc:stats:optim&purge=true in order to see the original documentation the first time -- it's something to do with bad cache entries for the page.
Re: [R] deleting a directory
?unlink says that unlink() can remove directories (and has a 'recursive' argument). 'unlink' is in the SEE ALSO section in ?file.remove.

-- Tony Plate

Sundar Dorai-Raj wrote:
> Hi, all,
>
> I'm looking for a utility for removing a directory from within R. Currently, I'm using:
>
> foo <- function(...) {
>     mydir <- tempdir()
>     dir.create(mydir, showWarnings = FALSE, recursive = TRUE)
>     on.exit(system(sprintf("rm -rf %s", mydir)))
>     ## do some stuff in mydir
>     invisible()
> }
>
> However, this assumes "rm" is available. I know of ?dir.create, but there is no opposite. And ?file.remove appears to work only on files and not directories.
>
> Any advice? Or is my current approach the only solution?
>
> > R.version
>                _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          2
> minor          3.1
> year           2006
> month          06
> day            01
> svn rev        38247
> language       R
> version.string Version 2.3.1 (2006-06-01)
>
> Thanks,
>
> --sundar
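A sketch of the function from the question rewritten with unlink(), as the reply suggests. One change is an assumption on my part rather than something stated in the thread: the work is done in a fresh directory made with tempfile(), so that cleanup does not remove the session's own tempdir().

```r
foo <- function() {
  # A fresh scratch directory (tempfile() only generates the path).
  mydir <- tempfile("scratch")
  dir.create(mydir, showWarnings = FALSE, recursive = TRUE)

  # Remove the directory and its contents on exit, without relying on "rm".
  on.exit(unlink(mydir, recursive = TRUE))

  ## do some stuff in mydir
  writeLines("hello", file.path(mydir, "a.txt"))
  had_file <- file.exists(file.path(mydir, "a.txt"))

  invisible(list(dir = mydir, had_file = had_file))
}

res <- foo()
gone <- !file.exists(res$dir)   # the directory should no longer exist
```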
Re: [R] meta characters in file path
What is the problem you are having? Seems to work fine for me running under Windows 2000:

> write.table(data.frame(a=1:3,b=4:6), file="@# x.csv", sep=",")
> read.csv(file="@# x.csv")
  a b
1 1 4
2 2 5
3 3 6
> sessionInfo()
Version 2.3.1 (2006-06-01)
i386-pc-mingw32

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

other attached packages:
     XML
"0.99-8"

Li,Qinghong,ST.LOUIS,Molecular Biology wrote:
> Hi,
>
> I need to read in some files. The file names contain some meta characters such as @, #, and white spaces etc. In read.csv, "file=" option, is there any way that one can make the function recognize a file path with those characters?
>
> Thanks
> Johnny
Re: [R] regex scares me
I think this does the trick. Note that it is case sensitive.

> x <- c("lad.tab", "xxladyy.tab", "xxyy.tab", "lad.tabx", "LAD.tab", "lad.TAB")
> grep("lad.*\\.tab$", x, value=T)
[1] "lad.tab"     "xxladyy.tab"

Jon Minton wrote:
> Hi, apologies if this is too simple but I've been stuck on the following for a while:
>
> I have a vector of strings: filenames with a name before the extension and a variety of possible extensions.
>
> I want to select only those files with:
>
> 1) a .tab extension
> AND
> 2) the character sequence "lad" anywhere in the name of the file before the extension.
>
> "Surely this won't take long to do," I thought. (But I was wrong.)
>
> What's the regexp pattern to specify here?
>
> Thanks,
>
> Jon Minton
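Since the reply notes that the pattern is case sensitive, here is a sketch of the case-insensitive variant using the `ignore.case` argument of grep(), on the same test vector.

```r
x <- c("lad.tab", "xxladyy.tab", "xxyy.tab", "lad.tabx", "LAD.tab", "lad.TAB")

# Case-sensitive match, as in the reply.
strict <- grep("lad.*\\.tab$", x, value = TRUE)

# Same pattern, but ignoring case in both the name and the extension.
loose <- grep("lad.*\\.tab$", x, value = TRUE, ignore.case = TRUE)
```

In both cases the trailing `$` is what rejects "lad.tabx", and "xxyy.tab" is rejected because "lad" does not occur before the extension.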
Re: [R] Cannot get simple data.frame binding.
Maybe I'm missing something, but your "Real life" code looks like it should work.

What happens when you do:

> ire1 <- data.frame(md1[, 1:11], other)
Error in data.frame(md1[, 1:11], other) :
        arguments imply differing number of rows: 11, 75
> str(md1[, 1:11])
> str(other)

?

Maybe the "labelled" data frame is causing the problem? Did you try as.data.frame(md1[,1:11])? (I'm guessing that will strip off extra attributes.)

-- Tony Plate

John Kane wrote:
> I am stuck on a simple problem where an example works fine but the real one does not.
>
> I have a data.frame where I wish to sum up some values across the rows and create a new data.frame with some of the old data.frame variables and the new summed variable.
>
> It works fine in my simple example but I am doing something wrong in the real world. In the real world I am loading a labeled data.frame. The original data comes from an spss file imported using spss.get, but the current data.frame is a subset of the original spss file.
>
> EXAMPLE
>
> cata <- c( 1,1,6,1,1,NA)
> catb <- c( 1,2,3,4,5,6)
> doga <- c(3,5,3,6,4, 0)
> dogb <- c(2,4,6,8,10, 12)
> rata <- c (NA, 9, 9, 8, 9, 8)
> ratb <- c( 1,2,3,4,5,6)
> bata <- c( 12, 42,NA, 45, 32, 54)
> batb <- c( 13, 15, 17,19,21,23)
> id <- c('a', 'b', 'b', 'c', 'a', 'b')
> site <- c(1,1,4,4,1,4)
> mat1 <- cbind(cata, catb, doga, dogb, rata, ratb, bata, batb)
>
> data1 <- data.frame(site, id, mat1)
> attach(data1)
> data1
> aa <- which(names(data1)=="rata")
> bb <- length(names(data1))
> mat1 <- as.matrix(data1[,aa:bb])
> food <- apply( mat1, 1, sum , na.rm=T)
> food
> abba <- data.frame(data1[, 1:6], food)
> abba
>
> --
> Real life problem
>
> load("C:/start/R.objects/partly.corrected.materials.Rdata")
> md1<-partly.corrected.materials
> aa <- which(names(md1)=="oaks")
> bb <- length(names(md1))
>
> # sum the values of the other variables
> mat1 <- as.matrix( md1[, aa:bb] )
> other <- apply(mat1,1, sum, na.rm=T)
>
> ire1 <- data.frame(md1[, 1:11], other)
>
> Error in data.frame(md1[, 1:11], other) :
>         arguments imply differing number of rows: 11, 75
> -
>
> I have simply worked around the problem by using
>
> ire1 <- data.frame(md1$site, md1$colour, md1$ss1 ... , other)
>
> but I would like to know what stupid thing I am doing.
>
> Thanks
Re: [R] problem with putting objects in list
I suspect you are not thinking about the list and the subsetting/extraction operators in the right way.

A list contains a number of components.

To get a subset of the list, use the '[' operator. The subset can contain zero or more components of the list, and it is a list itself. So, if x is a list, then x[2] is a list containing a single component.

To extract a component from the list, use the '[[' operator. You can only extract one component at a time. If you supply a vector index with more than one element, it will index recursively.

> x <- list(1,2:3,letters[1:3])
> x
[[1]]
[1] 1

[[2]]
[1] 2 3

[[3]]
[1] "a" "b" "c"

> # a subset of the list
> x[2:3]
[[1]]
[1] 2 3

[[2]]
[1] "a" "b" "c"

> # a list with one component:
> x[2]
[[1]]
[1] 2 3

> # the second component itself
> x[[2]]
[1] 2 3
> # recursive indexing
> x[[c(2,1)]]
[1] 2
> x[[c(3,2)]]
[1] "b"

Rainer M Krug wrote:
> Hi
>
> I use the following code and it stores the results of density() in the list dr:
>
> dens <- function(run) { density( positions$X[positions$run==run], bw=3, cut=-2 ) }
> dr <- lapply(1:5, dens)
>
> but the results are stored in dr[[i]] and not dr[i], i.e. plot(dr[[1]]) works, but plot(dr[1]) doesn't.
>
> Is there any way that I can store them in dr[i]?
>
> Thanks a lot,
>
> Rainer
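Applied to the situation in the question, the distinction looks roughly like this. The `positions` data frame below is made up for illustration, and density() is called with its defaults rather than the `bw` and `cut` values from the question.

```r
set.seed(1)
# Made-up stand-in for the poster's data: 5 runs of 20 positions each.
positions <- data.frame(X = rnorm(100), run = rep(1:5, each = 20))

dens <- function(r) density(positions$X[positions$run == r])
dr <- lapply(1:5, dens)

first_obj  <- dr[[1]]   # the density object itself; this is what plot() needs
first_list <- dr[1]     # a list of length 1 that contains the density object
```

So the results already are stored in the list; `dr[[i]]` is simply the way to take one of them out, and something like `for (d in dr) plot(d)` would visit each density object in turn.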
Re: [R] rename cols
The following works for data frames and matrices (you didn't say which you were working with).

> x <- data.frame(V1=1:3,V2=4:6)
> x
  V1 V2
1  1  4
2  2  5
3  3  6
> colnames(x) <- c("Apple", "Orange")
> x
  Apple Orange
1     1      4
2     2      5
3     3      6

For a data frame, 'names(x) <- c("Apple", "Orange")' also works, because a dataframe is stored internally as a list of columns.

-- Tony Plate

Ethan Johnsons wrote:
> A quick question please!
>
> How do you rename column names? i.e. V1 --> Apple; V2 --> Orange, etc.
>
> thx much
>
> ej
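When only some columns need new names, the replacement can be done by matching on the old names instead of supplying the full vector. A small sketch; the third column and the "Pear" name are made up for illustration.

```r
x <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)

# Rename a single column by its old name.
names(x)[names(x) == "V2"] <- "Orange"

# Rename several columns through an old-name -> new-name lookup vector.
lookup <- c(V1 = "Apple", V3 = "Pear")
hit <- names(x) %in% names(lookup)
names(x)[hit] <- lookup[names(x)[hit]]
```

Matching by name avoids depending on column positions, which helps when the column order may change.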
Re: [R] Access Rows in a Data Frame by Row Name
Matrix-style indexing works for both columns and rows of data frames.

E.g.:

> x <- data.frame(a=1:5, b=6:10, d=11:15)
> x
  a  b  d
1 1  6 11
2 2  7 12
3 3  8 13
4 4  9 14
5 5 10 15
> x[2:4,c(1,3)]
  a  d
2 2 12
3 3 13
4 4 14

Time spent reading the help document "An Introduction to R" will probably be well worth it. The relevant sections are "5 Arrays and matrices" and "6.3 Data frames".

-- Tony Plate

Michael Gormley wrote:
> I have created a data frame using the read.table command. I want to be able to access the rows by the row name, or a vector of row names. I know that you can access columns by using the data.frame.name$col.name. Is there a way to access row names in a similar manner?
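The same matrix-style indexing accepts row names as well as positions, which is what the question asked about. A sketch with made-up row names `r1` to `r5`:

```r
x <- data.frame(a = 1:5, b = 6:10, d = 11:15,
                row.names = c("r1", "r2", "r3", "r4", "r5"))

one_row   <- x["r2", ]                  # a one-row data frame
some_rows <- x[c("r4", "r1"), ]         # rows come back in the order requested
values    <- x[c("r4", "r1"), "d"]      # a plain vector: one column, two rows
```

Note that `x$name` only works for columns; for rows the character index inside `[ , ]` plays the equivalent role.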
Re: [R] symbolic matrix elements...
If I construct the matrix by list()ing together the expressions rather than c()ing, then it works OK:

    > x <- matrix(list(expression(x^3-5*x+4), expression(log(x^2-4*x))))
    > x[1,1]
    [[1]]
    expression(x^3 - 5 * x + 4)

    > x[[1,1]]
    expression(x^3 - 5 * x + 4)
    > D(x[[1,1]], "x")
    3 * x^2 - 5

The reason c() doesn't work properly here might have something to do with it creating a language object of an unconventional type:

    > c(expression(x^3-5*x+4), expression(log(x^2-4*x)))
    expression(x^3 - 5 * x + 4, log(x^2 - 4 * x))
    > expression(x^3-5*x+4)
    expression(x^3 - 5 * x + 4)

Using list() with language objects is much safer if you just want to make lists of them.

-- Tony Plate

Evan Cooch wrote:
> Eik Vettorazzi wrote:
> test=matrix(c(expression(x^3-5*x+4), expression(log(x^2-4*x))))
> works.

Well, not really (or I'm misunderstanding). Your code enters fine (no errors), but I can't access individual elements - e.g., test[1,1] gives me an error:

    > test=matrix(c(expression(x^3-5*x+4), expression(log(x^2-4*x))))
    > test[1,1]
    Error: matrix subscripting not handled for this type

Meaning...what?

> btw. you received an error because D expects an expression and you offered a list

OK - so why then is each of the elements identified as an expression when I print out the vector? Each element is reported to be an expression. OK, if so, then I remain puzzled as to how this is a 'list'.
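A variation on the list() approach, as a sketch only: storing unevaluated calls made with quote() instead of expression() vectors, so that each matrix cell holds a single call. The 1 x 2 layout and the evaluation points are arbitrary choices for the example.

```r
# a list-matrix whose cells are calls rather than expression vectors
exprs <- matrix(list(quote(x^3 - 5*x + 4), quote(log(x^2 - 4*x))), nrow = 1)

d11 <- D(exprs[[1, 1]], "x")   # symbolic derivative of the first cell
d12 <- D(exprs[[1, 2]], "x")   # symbolic derivative of the second cell

# derivatives are themselves calls and can be evaluated at a point
val11 <- eval(d11, list(x = 2))
val12 <- eval(d12, list(x = 5))
```

Note the double brackets: `exprs[1, 1]` would return a length-one list, while `exprs[[1, 1]]` returns the call that D() needs.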
Re: [R] List-manipulation
Does this do what you want?

    > x <- list(1, 2, 3:7, 8, 9:10)
    > sapply(x, function(xx) xx[1])
    [1] 1 2 3 8 9

-- Tony Plate

Benjamin Otto wrote:
Hi,
Sorry for the question, I know it should be basic knowledge but I've been struggling for two hours now. How do I select only the first entry of each list member and ignore the rest? So for

    $121_at -113691170
    $1255_g_at 42231151
    $1316_at 35472685 35472588
    $1320_at -88003869

I only want to select -113691170, 42231151, 35472685 and -88003869?

Regards
Benjamin

-- Benjamin Otto
Universitaetsklinikum Eppendorf Hamburg
Institut fuer Klinische Chemie
Martinistrasse 52
20246 Hamburg
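The same idea applied to a named list shaped like the one in the question, as a sketch; the list is retyped here from the values quoted in the post. Passing "[" as the function is just a shorter spelling of `function(xx) xx[1]`.

```r
probes <- list("121_at"    = -113691170,
               "1255_g_at" = 42231151,
               "1316_at"   = c(35472685, 35472588),
               "1320_at"   = -88003869)

# "[" is the subsetting function; the extra argument 1 picks the first entry
first <- sapply(probes, "[", 1)
```

The result keeps the list names, so each value can still be looked up by its identifier.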
Re: [R] how ot replace the diagonal of a matrix
You are indexing with numeric 0's and 1's, which will refer to only the matrix element 1,1 (multiple times), cf:

    > matrix(1:9,3)[diag(3)]
    [1] 1 1 1

Try one of these:

    idx <- diag(3) > 0
    idx <- which(diag(3) > 0)
    idx <- cbind(seq(len=n), seq(len=n))

(For very large matrices, the third will be more efficient, I believe.)

-- Tony Plate

roger bos wrote:
Dear useRs,
Trying to replace the diagonal of a matrix is not working for me. I want a matrix with .6 on the diag and .4 elsewhere. The following code looks like it should work--when I look at mps and idx they look how I want them to--but it only replaces the first element, not each element on the diagonal.

    mps <- matrix(rep(.4, 3*3), nrow=n, byrow=TRUE)
    idx <- diag(3)
    mps
    idx
    mps[idx] <- rep(.6,3)

I also tried something along the lines of diag(mps=.6, ...) but it didn't know what mps was.
Thanks,
Roger
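For completeness, a short sketch of two ways to get 0.6 on the diagonal and 0.4 elsewhere: the replacement form of diag(), and the two-column index matrix from the third suggestion above. The size n = 3 is taken from the question.

```r
n <- 3

# replacement form of diag(): assigns straight onto the diagonal
mps <- matrix(0.4, nrow = n, ncol = n)
diag(mps) <- 0.6

# equivalent, using a two-column (row, column) index matrix
mps2 <- matrix(0.4, nrow = n, ncol = n)
mps2[cbind(seq_len(n), seq_len(n))] <- 0.6
```

Both avoid building a full n x n logical mask, which matters only when n is large.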
Re: [R] shifting a huge matrix left or right efficiently ?
If you're able to work with the transpose of your matrix, you might consider the function 'filter()', e.g.:

    > filter(diag(1:5), c(2,3), sides=1)
    Time Series:
    Start = 1
    End = 5
    Frequency = 1
      [,1] [,2] [,3] [,4] [,5]
    1   NA   NA   NA   NA   NA
    2    3    4    0    0    0
    3    0    6    6    0    0
    4    0    0    9    8    0
    5    0    0    0   12   10

I don't know if the conversion to and from a time-series class will impact the timing, but if this might serve your purposes, it's easy to do some experiments to find out.

- Tony Plate

Huang-Wen Chen wrote:
I'm wondering what's the best way to shift a huge matrix left or right. My current implementation is the following:

    shiftMatrixL <- function(X, shift, padding=0) {
      cbind(X[, -1:-shift], matrix(padding, dim(X)[1], shift))
    }
    X <- shiftMatrixL(X, 1)*3 + shiftMatrixL(X,2)*5...

However, it's still slow due to heavy use of this function. The resulting matrix will only be read once and then discarded, so I believe the best implementation of this function is in C, manipulating the internal data structure of this matrix. Does anyone know of a package for doing this job?

Huang-Wen
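To make the connection between filter() and shifting explicit, here is a sketch that rebuilds the filter() result above from a row shift. The helper shift_down() is hypothetical (it is not from the thread) and pads with NA to match what filter() does at the start of the series.

```r
# hypothetical helper: move rows down by k, filling the top with `padding`
shift_down <- function(X, k, padding = NA) {
  n <- nrow(X)
  rbind(matrix(padding, k, ncol(X)), X[seq_len(n - k), , drop = FALSE])
}

X <- diag(1:5)

# one-sided filter: 2 * current row + 3 * previous row
by_filter <- matrix(as.numeric(stats::filter(X, c(2, 3), sides = 1)), nrow(X))

# the same weighted sum written with an explicit shift
by_shift <- 2 * X + 3 * shift_down(X, 1)
```

Because filter() works down the rows (along time), the left/right shifts in the question correspond to filtering the transposed matrix, which is why the transpose is mentioned.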
Re: [R] Fwd: rarefy a matrix of counts
Here's a way using apply(), and the prob= argument of sample():

    > df <- data.frame(sample1=c(red=400,green=100,black=300), sample2=c(300,0,1000), sample3=c(2500,200,500))
    > df
          sample1 sample2 sample3
    red       400     300    2500
    green     100       0     200
    black     300    1000     500
    > set.seed(1)
    > apply(df, 2, function(counts) sample(seq(along=counts), rep=T, size=7, prob=counts))
         sample1 sample2 sample3
    [1,]       1       3       1
    [2,]       1       3       1
    [3,]       3       3       1
    [4,]       2       3       2
    [5,]       1       3       1
    [6,]       2       3       1
    [7,]       2       3       3

Note that this does sampling WITH replacement. AFAIK, sampling without replacement requires enumerating the entire population to be sampled from. I.e., you cannot do

    sample(1:3, prob=1:3, rep=F, size=4)

instead of

    sample(c(1,2,2,3,3,3), rep=F, size=4)

-- Tony Plate

From reading ?sample, I was a little unclear on whether sampling without replacement could work

Petr Pikal wrote:
Hi
a little bit different story. But

    x1 <- sample(c(rep("red",400), rep("green",100), rep("black",300)), 100)

is maybe close. With a data frame (if it is not big)

    > DF
      color sample1 sample2 sample3
    1   red     400     300    2500
    2 green     100       0     200
    3 black     300    1000     500

    x <- data.frame(matrix(NA,100,3))
    for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]), 100)

if you want the result in a data frame, or

    x <- vector("list", 3)
    for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]), 100)

if you want it in a list. Maybe somebody is clever enough to discard the for loop, but you said you have 80 columns, which should be no problem.
HTH
Petr

On 11 Oct 2006 at 10:11, Brian Frappier wrote:
Date sent: Wed, 11 Oct 2006 10:11:33 -0400
From: Brian Frappier [EMAIL PROTECTED]
To: Petr Pikal [EMAIL PROTECTED]
Subject: Fwd: [R] rarefy a matrix of counts

-- Forwarded message --
From: Brian Frappier [EMAIL PROTECTED]
Date: Oct 11, 2006 10:10 AM
Subject: Re: [R] rarefy a matrix of counts
To: r-help@stat.math.ethz.ch

Hi Petr,
Thanks for your response.
I have data that looks like the following:

                 sample 1  sample 2  sample 3
    red candy         400       300      2500
    green candy       100         0       200
    black candy       300      1000       500

I don't want to randomly select either the samples (columns) or the candy types (rows), which sample, as you state, would allow me to do. Instead, I want to randomly sample 100 candies from each sample and retain info on their associated type. I could make a list of all the candies in each sample:

    sample 1
    red
    red
    red
    red
    green
    green
    black
    red
    black
    ...

and then randomly sample those rows. Repeat for each sample. But I am not sure how to do that without a lot of loops, and am wondering if there is an easier way in R. Thanks! I should have laid this out in the first email...sorry.

On 10/11/06, Petr Pikal [EMAIL PROTECTED] wrote:
Hi
I am not experienced in Matlab and from your explanation I do not understand what exactly you want. It seems that you want to randomly choose a sample of 100 rows from your matrix, which can be achieved with sample.

    DF <- data.frame(rnorm(100), 1:100, 101:200, 201:300)
    DF[sample(1:100, 10),]

If you want to do this several times, you need to save your result, and then it depends on what you want to do next. One suitable form is a list of matrices, the other is an array, and you can use a for loop for completing it.
HTH
Petr

On 10 Oct 2006 at 17:40, Brian Frappier wrote:
Date sent: Tue, 10 Oct 2006 17:40:47 -0400
From: Brian Frappier [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Subject: [R] rarefy a matrix of counts

Hi all,
I have a matrix of counts for objects (rows) by samples (columns). I aimed for about 500 counts in each sample (I have about 80 samples) and would now like to rarefy these down to 100 counts in each sample using simple random sampling without replacement. I plan on rarefying several times for each sample.
I could do the tedious looping task of making a list of all objects (with its associated identifier) in each sample and then use the wonderful sampling package to select a sub-sample of 100 for each sample and thereby get a logical vector of inclusions. I would then regroup the resulting logical vector into a vector of counts by object, rinse and repeat several times for each sample. Alternately, using the same list, I could create a random index of integers between 1 and the number of objects for a sample (without repeats) and then select those objects from the list. Again, rinse and repeat several times for each sample. Is there a way to directly rarefy a matrix of counts without having to create a list of objects first? I am
Re: [R] Fwd: rarefy a matrix of counts
Two things to note:

(1) rep() can be vectorized:

    > rep(1:3, 2:4)
    [1] 1 1 2 2 2 3 3 3 3

(2) you will likely get much better performance if you work with integers and convert to strings after sampling (or use factors), e.g.:

    > c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)]
    [1] "red"  "blue" "red"  "red"  "red"

-- Tony Plate

Brian Frappier wrote:
I tried all of the approaches below. The problem with:

    x <- data.frame(matrix(NA,100,3))
    for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]), 100)

if you want the result in a data frame, or

    x <- vector("list", 3)
    for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]), 100)

is that this code still samples the rows, not the elements, i.e. it returns 100 or 300 in the matrix cells instead of "red" or a matrix of counts by color (object type) like:

          x1   x2   x3
    red   32    5   60
    gr    68   95   40
    sum  100  100  100

It looks like Tony is right: sampling without replacement requires listing all elements to be sampled. But the code Petr provided

    x1 <- sample(c(rep("red",400), rep("green",100), rep("black",300)), 100)

did give me a clue of how to quickly make such a list using the 'rep' command. I will for-loop a rep statement using my original matrix to create a list of elements for each sample.

Thanks Petr and Tony for your help!
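Putting the two notes together (vectorized rep() and integer codes), here is one possible sketch of rarefying every column of a count matrix without replacement and getting counts back. The function name rarefy() and the use of tabulate() are my own choices for illustration, not something proposed in the thread.

```r
counts <- cbind(sample1 = c(red = 400, green = 100, black = 300),
                sample2 = c(300, 0, 1000),
                sample3 = c(2500, 200, 500))

# hypothetical helper: draw `size` individuals per column without
# replacement and return the rarefied counts per row (object type)
rarefy <- function(counts, size) {
  apply(counts, 2, function(col) {
    pool <- rep(seq_along(col), col)   # one integer code per individual
    tabulate(sample(pool, size), nbins = length(col))
  })
}

set.seed(1)
rare <- rarefy(counts, 100)
rownames(rare) <- rownames(counts)
```

This still enumerates each column's population via rep(), as discussed above, but only as an integer vector, and it can simply be called again for each repeat of the rarefaction.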
Re: [R] enter browser on error
use options(error=recover), e.g.:

    > remove(x)
    NULL
    Warning message:
    remove: variable "x" was not found
    > (function() {x})()
    Error in (function() { : Object "x" not found
    > options(error=recover)
    > (function(y=1) {x})(2)
    Error in (function(y = 1) { : Object "x" not found

    Enter a frame number, or 0 to exit
    1:(function(y = 1) {
    Selection: 1
    Called from: eval(expr, envir, enclos)
    Browse[1]> y
    [1] 2
    Browse[1]>
    Enter a frame number, or 0 to exit
    1:(function(y = 1) {
    Selection: 0

At Tuesday 11:26 AM 8/31/2004, Bickel, David wrote:
Is there a way I can get R to automatically enter the browser inside a user-defined function on the generation of an error? Specifically, I'm trying to debug this:

    Error in as.double.default(sapply(lis, FUN)) :
            (list) object cannot be coerced to double
    In addition: There were 38 warnings (use warnings() to see them)
    > traceback()
    8: as.double.default(sapply(lis, FUN))
    7: as.numeric(sapply(lis, FUN))
    6: numeric.sapply(function(x) { [EMAIL PROTECTED] })

On detection of the error, I would like browser() to be called at the level of numeric.sapply(), so that I can examine x. I'm wondering if this can be done by modifying the default error handling. Using try() with browser() didn't work.

Thanks,
David
_
David Bickel http://davidbickel.com
Research Scientist
Pioneer Hi-Bred International
Bioinformatics Exploratory Research
7250 NW 62nd Ave., PO Box 552
Johnston, Iowa 50131-0552
515-334-4739 Tel
515-334-6634 Fax
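options(error=recover) only helps in an interactive session. As a rough, non-interactive sketch of why try() with browser() comes too late, the calling handler below runs while the failing frames still exist and records the call stack; the names stack_at_error, outcome and undefined_object are hypothetical and chosen just for this example.

```r
stack_at_error <- NULL

outcome <- tryCatch(
  withCallingHandlers(
    (function(y = 1) undefined_object + y)(2),
    error = function(e) {
      # runs before the stack is unwound; in an interactive session
      # browser() or recover() could be called here instead
      stack_at_error <<- sys.calls()
    }
  ),
  # the outer handler runs after unwinding and just keeps the message
  error = function(e) conditionMessage(e)
)
```

By the time try() or tryCatch() hands back the error, the frames are gone, which is presumably why the try()-plus-browser() attempt showed nothing useful.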
Re: [R] Signs of loadings from princomp on Windows
FWIW, I see the same behavior as Francisco on my Windows machine (also an installation of the windows binary without trying to install any special BLAS libraries):

    > library(MASS)
    > data(painters)
    > pca.painters <- princomp(painters[ ,1:4])
    > loadings(pca.painters)

    Loadings:
                Comp.1 Comp.2 Comp.3 Comp.4
    Composition  0.484 -0.376  0.784 -0.101
    Drawing      0.424  0.187 -0.280 -0.841
    Colour      -0.381 -0.845 -0.211 -0.310
    Expression   0.664 -0.330 -0.513  0.432

                   Comp.1 Comp.2 Comp.3 Comp.4
    SS loadings      1.00   1.00   1.00   1.00
    Proportion Var   0.25   0.25   0.25   0.25
    Cumulative Var   0.25   0.50   0.75   1.00
    > pca.painters <- princomp(painters[ ,1:4])
    > loadings(pca.painters)

    Loadings:
                Comp.1 Comp.2 Comp.3 Comp.4
    Composition -0.484 -0.376  0.784 -0.101
    Drawing     -0.424  0.187 -0.280 -0.841
    Colour       0.381 -0.845 -0.211 -0.310
    Expression  -0.664 -0.330 -0.513  0.432

                   Comp.1 Comp.2 Comp.3 Comp.4
    SS loadings      1.00   1.00   1.00   1.00
    Proportion Var   0.25   0.25   0.25   0.25
    Cumulative Var   0.25   0.50   0.75   1.00
    > R.version
             _
    platform i386-pc-mingw32
    arch     i386
    os       mingw32
    system   i386, mingw32
    status
    major    1
    minor    9.1
    year     2004
    month    06
    day      21
    language R

My machine is a dual-processor hp xw8000. I also get the same results with R 2.0.0 dev as in

    > R.version
             _
    platform i386-pc-mingw32
    arch     i386
    os       mingw32
    system   i386, mingw32
    status   Under development (unstable)
    major    2
    minor    0.0
    year     2004
    month    09
    day      13
    language R

-- Tony Plate

At Tuesday 10:25 AM 9/14/2004, Prof Brian Ripley wrote:
On Tue, 14 Sep 2004, Francisco Chamu wrote:
> I have run this on both Windows 2000 and XP. All I did was install the binaries from CRAN so I think I am using the standard Rblas.dll.
> To reproduce what I see you must run the code at the beginning of the R session.

We did, as you said `start a clean session'. I think to reproduce what you see we have to be using your account on your computer.

> After the second run, all subsequent runs give the same result as the second set.
> Thanks,
> Francisco

On Tue, 14 Sep 2004 08:29:25 +0200, Uwe Ligges [EMAIL PROTECTED] wrote:
Prof Brian Ripley wrote:
I get the second set each time, on Windows, using the build from CRAN. Which BLAS are you using?

Works also well for me with a self compiled R-1.9.1 (both with standard Rblas as well as with the Rblas.dll for Athlon CPU from CRAN). Is this a NT-based version of Windows (NT, 2k, XP)?
Uwe

On Tue, 14 Sep 2004, Francisco Chamu wrote:
I start a clean session of R 1.9.1 on Windows and I run the following code:

    > library(MASS)
    > data(painters)
    > pca.painters <- princomp(painters[ ,1:4])
    > loadings(pca.painters)

    Loadings:
                Comp.1 Comp.2 Comp.3 Comp.4
    Composition  0.484 -0.376  0.784 -0.101
    Drawing      0.424  0.187 -0.280 -0.841
    Colour      -0.381 -0.845 -0.211 -0.310
    Expression   0.664 -0.330 -0.513  0.432

                   Comp.1 Comp.2 Comp.3 Comp.4
    SS loadings      1.00   1.00   1.00   1.00
    Proportion Var   0.25   0.25   0.25   0.25
    Cumulative Var   0.25   0.50   0.75   1.00

However, if I rerun the same analysis, the loadings of the first component have the opposite sign (see below), why is that? I have read the note in the princomp help that says "The signs of the columns of the loadings and scores are arbitrary, and so may differ between different programs for PCA, and even between different builds of R." However, I still would expect the same signs for two runs in the same session.

    > pca.painters <- princomp(painters[ ,1:4])
    > loadings(pca.painters)

    Loadings:
                Comp.1 Comp.2 Comp.3 Comp.4
    Composition -0.484 -0.376  0.784 -0.101
    Drawing     -0.424  0.187 -0.280 -0.841
    Colour       0.381 -0.845 -0.211 -0.310
    Expression  -0.664 -0.330 -0.513  0.432

                   Comp.1 Comp.2 Comp.3 Comp.4
    SS loadings      1.00   1.00   1.00   1.00
    Proportion Var   0.25   0.25   0.25   0.25
    Cumulative Var   0.25   0.50   0.75   1.00
    > R.version
             _
    platform i386-pc-mingw32
    arch     i386
    os       mingw32
    system   i386, mingw32
    status
    major    1
    minor    9.1
    year     2004
    month    06
    day      21
    language R

BTW, I have tried the same in R 1.9.1 on Debian and I can't reproduce what I see on Windows.
In fact all the runs give the same as the second run on Windows.
-Francisco

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road
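Since the signs are arbitrary, code that compares loadings across runs or machines can impose its own convention. The sketch below uses one such convention that I am assuming for illustration (make the largest-magnitude entry of each component positive); it is not something princomp() does, and it uses the built-in USArrests data rather than the painters data so that it needs no extra package.

```r
pca <- princomp(USArrests)
L <- unclass(loadings(pca))   # plain matrix of loadings

# sign of the largest-magnitude entry in each column (component)
pivot <- apply(L, 2, function(v) sign(v[which.max(abs(v))]))

# flip whole columns so that entry is positive; flip the scores to match
L_fixed      <- sweep(L, 2, pivot, "*")
scores_fixed <- sweep(pca$scores, 2, pivot, "*")
```

Flipping a loading column together with its score column leaves the fitted decomposition unchanged, so the two sets of loadings shown in the thread describe the same components.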
Re: [R] efficient submatrix extraction
I think you should be able to do something with reassigning the dim attribute, and then using apply(), something along the lines of the following (which doesn't do your computation on the data in the subarrays, but merely illustrates how to create and access them):

    > x <- matrix(1:64,ncol=8)
    > x
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
    [1,]    1    9   17   25   33   41   49   57
    [2,]    2   10   18   26   34   42   50   58
    [3,]    3   11   19   27   35   43   51   59
    [4,]    4   12   20   28   36   44   52   60
    [5,]    5   13   21   29   37   45   53   61
    [6,]    6   14   22   30   38   46   54   62
    [7,]    7   15   23   31   39   47   55   63
    [8,]    8   16   24   32   40   48   56   64
    > x2 <- x
    > dim(x2) <- c(2,4,2,4)
    > x2[,1,,1]
         [,1] [,2]
    [1,]    1    9
    [2,]    2   10
    > x2[,2,,1]
         [,1] [,2]
    [1,]    3   11
    [2,]    4   12
    > x2[,1,,2]
         [,1] [,2]
    [1,]   17   25
    [2,]   18   26
    > x4 <- x
    > dim(x4) <- c(4,2,4,2)
    > x4[,1,,1]
         [,1] [,2] [,3] [,4]
    [1,]    1    9   17   25
    [2,]    2   10   18   26
    [3,]    3   11   19   27
    [4,]    4   12   20   28
    > invisible(apply(x4, c(2,4), print))
         [,1] [,2] [,3] [,4]
    [1,]    1    9   17   25
    [2,]    2   10   18   26
    [3,]    3   11   19   27
    [4,]    4   12   20   28
         [,1] [,2] [,3] [,4]
    [1,]    5   13   21   29
    [2,]    6   14   22   30
    [3,]    7   15   23   31
    [4,]    8   16   24   32
         [,1] [,2] [,3] [,4]
    [1,]   33   41   49   57
    [2,]   34   42   50   58
    [3,]   35   43   51   59
    [4,]   36   44   52   60
         [,1] [,2] [,3] [,4]
    [1,]   37   45   53   61
    [2,]   38   46   54   62
    [3,]   39   47   55   63
    [4,]   40   48   56   64

hope this helps,
Tony Plate

At Wednesday 03:10 PM 9/15/2004, Rajarshi Guha wrote:
Hi,
I have a matrix of say 1024x1024 and I want to look at it in chunks. That is, I'd like to divide it into a series of submatrices of order 2x2.

    | 1 2 3 4 5 6 7 8 ... |
    | 1 2 3 4 5 6 7 8 ... |
    | 1 2 3 4 5 6 7 8 ... |
    | 1 2 3 4 5 6 7 8 ... |
    ...

So the first submatrix would be

    | 1 2 |
    | 1 2 |

the second one would be

    | 3 4 |
    | 3 4 |

and so on. That is, I want the matrix to be evenly divided into 2x2 submatrices. Now I'm also doing this subdivision into 4x4, 8x8 ... 256x256 submatrices.
Currently I'm using loops and I'm sure there is a more efficient way to do it:

    m <- matrix(runif(1024*1024), nrow=1024)
    boxsize <- 2^(1:8)
    for (b in boxsize) {
      bcount <- 0
      bstart <- seq(1, 1024, by=b)
      for (x in bstart) {
        for (y in bstart) {
          xend <- x + b - 1
          yend <- y + b - 1
          if (length(which(m[x:xend, y:yend] > 0.7)) > 0) {
            bcount <- bcount + 1
          }
        }
      }
    }

Is there any way to vectorize the two inner loops?

Thanks,
---
Rajarshi Guha [EMAIL PROTECTED] http://jijo.cjb.net
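One way the dim-reassignment idea above could replace the two inner loops, as a sketch only. It assumes a square matrix whose side is a multiple of the box size, and it assumes the test in the loop is "any element greater than 0.7" (the comparison operators were lost from the archived post). The name count_boxes() is made up for this example, and a smaller matrix is used to keep it quick.

```r
# hypothetical helper: number of b x b boxes containing any value > threshold
count_boxes <- function(m, b, threshold = 0.7) {
  n <- nrow(m)
  stopifnot(ncol(m) == n, n %% b == 0)
  hit <- m > threshold
  # dims: (row within box, box row, column within box, box column)
  dim(hit) <- c(b, n / b, b, n / b)
  sum(apply(hit, c(2, 4), any))
}

set.seed(42)
m <- matrix(runif(64 * 64), nrow = 64)
counts <- sapply(2^(1:5), function(b) count_boxes(m, b))
```

The thresholding is done once on the whole matrix; apply() still visits every box, but without the index arithmetic and repeated subsetting of the original loops.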
Re: [R] Signs of loadings from princomp on Windows
You could investigate this yourself by looking at the code of princomp (try getAnywhere(princomp.default)). I'd suggest making a file that in-lines the body of princomp.default into the commands you had below. See if you still get the difference. (I'd be surprised if you didn't.) Then try commenting out lines until the second pass through the commands produces the same results as the first. The very last thing you commented out might help to answer your question "What would be causing the difference?"

(The fact that various people chimed in to say they could reproduce the behavior that bothered you, but didn't bother to dig deeper, suggests it didn't bother them that much, which further suggests that you are the person most motivated by this and thus the best candidate for investigating it further...)

-- Tony Plate

At Wednesday 05:07 PM 9/15/2004, Francisco Chamu wrote:
I am sorry to insist, but we have three other people who were able to reproduce the behavior I mentioned. I have also installed R 1.9.1 from the CRAN binaries on a different Windows machine and again I see the different signs as mentioned before. What would be causing the difference?
-Francisco
Re: [R] There were 50 or more warnings (use warnings() to see the first 50)
Try putting options(warn=1) at the start of your R code. This should cause the warnings to be printed as they occur, instead of the default of being saved up until the top-level command terminates. See ?warning and ?options.

-- Tony Plate

At Thursday 08:52 AM 9/16/2004, Mag. Ferri Leberl wrote:
I employ R in the Slave-Mode. The slave returns me the following feedback:

    There were 50 or more warnings (use warnings() to see the first 50)

I have found no way so far to get the warnings viewed. Which command would be appropriate? warnings() (without an argument) returns NULL.
Thank you in advance.
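A small sketch of what that looks like in a script; the function f() and its inputs are invented for the example. Saving the value returned by options() makes it easy to restore the previous setting afterwards.

```r
old <- options(warn = 1)   # print each warning as soon as it is raised

# toy function that warns on negative input
f <- function(x) {
  if (x < 0) warning("negative input: ", x)
  sqrt(abs(x))
}

res <- vapply(c(4, -9), f, numeric(1))

options(old)               # restore the previous warn setting
```

With warn=1 the message appears next to the call that caused it, which is usually what is wanted when R is driven non-interactively and the deferred summary is hard to get at.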
Re: [R] read 4-jan-02 as date
Works fine when you give as.Date() a character vector. I suspect the Date column in your data frame is a factor.

    > d <- c("12-Jan-01", "11-Jan-01", "10-Jan-01", "9-Jan-01", "8-Jan-01", "5-Jan-01")
    > d
    [1] "12-Jan-01" "11-Jan-01" "10-Jan-01" "9-Jan-01"  "8-Jan-01"  "5-Jan-01"
    > as.Date(d, format="%d-%b-%y")
    [1] "2001-01-12" "2001-01-11" "2001-01-10" "2001-01-09" "2001-01-08"
    [6] "2001-01-05"
    > as.Date(factor(d), format="%d-%b-%y")
    Error in fromchar(x) : character string is not in a standard unambiguous format

Hope this helps,
Tony Plate

At Monday 09:04 AM 10/11/2004, bogdan romocea wrote:
Dear R users,
I have a column with dates (character) in a data frame:

    12-Jan-01
    11-Jan-01
    10-Jan-01
    9-Jan-01
    8-Jan-01
    5-Jan-01

and I need to convert them to (Julian) dates so that I can sort the whole data frame by date. I thought it would be very simple, but after checking the documentation and the list I still don't have something that works.

1. as.Date returns the error below. What am I doing wrong? As far as I can see the character strings are in standard format.

    > d$Date <- as.Date(d$Date, format="%d-%b-%y")
    Error in fromchar(x) : character string is not in a standard unambiguous format

2. as.date {Survival} produces this error,

    > d$Date <- as.date(d$Date, order = "dmy")
    Error in as.date(d$Date, order = "dmy") : Cannot coerce to date format

3. Assuming all else fails, is there a text function similar to SCAN in SAS? Given a string like "9-Jan-01" and "-" as separator, I'd like a function that can read the first, second and third values (9, Jan, 01), so that I can get Julian dates with mdy.date {survival}.

Thanks in advance,
b.
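If the column is indeed a factor, converting it with as.character() first should sidestep the error, and the data frame can then be sorted with order(). The sketch below uses a made-up three-row data frame, and the last line shows strsplit() as a possible answer to the SCAN-like question. Note that %b matches month abbreviations in the current locale, so an English locale is assumed here.

```r
d <- data.frame(Date  = factor(c("12-Jan-01", "5-Jan-01", "9-Jan-01")),
                value = c(3, 1, 2))

# go through character in case Date was read in as a factor
d$Date <- as.Date(as.character(d$Date), format = "%d-%b-%y")

# sort the whole data frame by date
d <- d[order(d$Date), ]

# splitting a string on a separator, roughly like SCAN in SAS
parts <- strsplit("9-Jan-01", "-", fixed = TRUE)[[1]]
```

Recent versions of R have a factor method for as.Date(), so the explicit as.character() may be unnecessary there, but it does no harm.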
Re: [R] How might one write this better?
The trick to vectorizing

    asset <- numeric(T+1)
    for (t in 1:T) asset[t+1] <- cont[t] + ret[t]*asset[t]

is to expand it algebraically into a sum of terms like:

    asset[4] = cont[3] + ret[3] * cont[2] + ret[3] * ret[2] * cont[1]

(where the general case should be reasonably obvious, but is more work to write down)

Then recognize that this is a sum of the elementwise product of a pair of vectors, one of which can be constructed with careful use of rev() and cumprod():

    > set.seed(1)
    > ret <- (rnorm(5)+1)/10
    > cont <- seq(along=ret)+100
    > asset <- numeric(length(ret)+1)
    > # loop way of computing assets -- final asset value is in the last element of asset[]
    > for (i in seq(along=ret)) asset[i+1] <- cont[i] + (1+ret[i]) * asset[i]
    > asset
    [1]   0.0000 101.0000 214.9548 321.4880 508.9232 681.5849
    > # vectorized way of computing final asset value
    > sum(cumprod(rev(c(1+ret[-1],1))) * rev(cont))
    [1] 681.585
    > # compare the two
    > sum(cumprod(rev(c(1+ret[-1],1))) * rev(cont)) - asset[length(ret)+1]
    [1] 0

At Sunday 05:35 AM 10/3/2004, you wrote:
I am trying to simulate the trajectory of the pension assets of one person. In C-like syntax, it looks like this:

    daily.wage.growth = 1.001                     # deterministic
    contribution.rate = 0.08                      # deterministic 8%
    Wage = 10                                     # initial
    Asset = 0                                     # initial
    for (10,000 days) {
      Asset += contribution.rate * Wage           # accreting contributions
      Wage *= daily.wage.growth * Wage            # wage growth
      Asset *= draw from a normal distribution    # Asset returns
    }
    cat("Terminal asset = ", Asset, "\n")

How can one do this well in R? What I tried so far is to notice that the wage trajectory is deterministic, it does not change from one run to the next, and it can be done in one line. The asset returns trajectory can be obtained using a single call to rnorm(). Both these can be done nicely using R functions (if you're curious, I can give you my code). Using these, I efficiently get a vector of contributions c[] and a vector of returns r[].
But that still leaves the loop: Asset - 0 for (t in 1:T) { Asset - c[t] + r[t]*Asset } How might one do this better? I find that using this code, it takes roughly 0.3 seconds per computation of Asset (on my dinky 500 MHz Celeron). I need to do 50,000 of these every now and then, and it's a pain to have to wait 3 hours. It'll be great if there is some neat R way to rewrite the little loop above. -- Ajay Shah Consultant [EMAIL PROTECTED] Department of Economic Affairs http://www.mayin.org/ajayshah Ministry of Finance, New Delhi __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
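As a sketch of the rev()/cumprod() idea in this thread, the two hypothetical helpers below (the function names and the growth argument are illustrative, not from the original posts) compute the final asset value by loop and by vectorized sum, so the two can be compared. Here growth[i] is the per-period multiplier applied to the running asset (for example 1+ret[i]).

```r
# Loop version: asset <- cont[i] + growth[i] * asset, starting from 0
final_asset_loop <- function(cont, growth) {
  asset <- 0
  for (i in seq_along(cont)) asset <- cont[i] + growth[i] * asset
  asset
}

# Vectorized version: cont[i] is multiplied by the product of
# growth[(i+1):n]; the last contribution gets weight 1
final_asset_vec <- function(cont, growth) {
  w <- rev(cumprod(rev(c(growth[-1], 1))))
  sum(w * cont)
}

set.seed(1)
ret  <- (rnorm(5) + 1) / 10
cont <- seq_along(ret) + 100
loop_value <- final_asset_loop(cont, 1 + ret)
vec_value  <- final_asset_vec(cont, 1 + ret)
print(all.equal(loop_value, vec_value))
```

Note that growth[1] never enters the result because the starting asset is zero; with a nonzero starting asset an extra term would be needed.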
Re: [R] Equivalents of Matlab's 'find' and 'end'
At Thursday 08:10 AM 10/7/2004, Bryan L. Brown wrote:
> Sorry if these questions have been asked recently--I'm new to this list. I'm primarily a Matlab user who is attempting to learn R and I'm searching for possible equivalents of commands that I found very handy in Matlab. So that I don't seem ungrateful to those who may answer, I HAVE determined ways to carry out these processes in 'brute force' sorts of ways in R code, but they lack the elegance and simplicity of the Matlab commands. Also, if you know that no such commands exist, that bit of knowledge would be helpful to know so that I don't continue fruitless searches.
> The first is Matlab's 'find' command. This is one of the most useful commands in Matlab. Basically, if X is the vector X=[3, 2, 1, 1, 2, 3] the command 'find(X==1)' would return the vector [3, 4] which would indicate that the vector X had the value of 1 at the 3 and 4 positions. This was an extremely useful command for subsetting in Matlab. The closest thing I've found in R has been 'match' but match only returns the first value as opposed to the position of all matching values.

For this specific case, you can use which(). Also note that sometimes it can be useful to use match() with the arguments swapped, which can return you the positions of all matching values. Also, the operator %in% can be useful:

  > X <- c(3, 2, 1, 1, 2, 3)
  > which(X==1)
  [1] 3 4
  > match(1, X)
  [1] 3
  > match(X, 1)
  [1] NA NA  1  1 NA NA
  > which(!is.na(match(X, 1)))
  [1] 3 4
  > which(X %in% 1)
  [1] 3 4

> The second Matlab command that I'd like to find an R equivalent for is 'end'. 'end' is just a simple little command that indicates the end of a row/column. It is incredibly handy when used to subset matrices like Y = X(2:end) and produces Y=[2, 1, 1, 2, 3] if the X is the same as in the previous example. This cutesy little command was extremely useful for composing programs that were flexible and could use input matrices of any size without modifying the code. I realize that you can accomplish the same by Y <- X[2:length(X)] in R, but this method is ungainly, particularly when subsetting matrices rather than vectors.

Yep, that is a handy feature, and I often wish for something like it, but in my 10 years of using R/S-PLUS I've not come across anything better than using length(X) (or nrow(X)/ncol(X)) for the general case. (But I do sometimes still discover useful things that I didn't know about.) For your specific case of Y = X(2:end) in R/S-PLUS you can do:

  Y = X[-1]

> If anyone has advice, I'd be grateful,
> Bryan L. Brown
> Integrative Biology, University of Texas at Austin, Austin, TX 78712
> 512-965-0678 [EMAIL PROTECTED]
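A short sketch collecting the idioms from this thread in one place, with a hypothetical matrix M added to illustrate the nrow()/ncol() case. tail() with a negative count and which(..., arr.ind = TRUE) are standard base/utils functions, though neither is a true equivalent of Matlab's 'end'.

```r
X <- c(3, 2, 1, 1, 2, 3)

Y1 <- X[2:length(X)]   # explicit "2:end"; misbehaves if length(X) < 2
Y2 <- X[-1]            # drop the first element
Y3 <- tail(X, -1)      # negative n drops the first n elements

# Matrix case: nrow()/ncol() play the role of 'end'
M <- matrix(1:12, nrow = 3)
M_sub <- M[2:nrow(M), 3:ncol(M), drop = FALSE]

# 'find' on a matrix: row/column positions of the matching entries
pos <- which(M > 10, arr.ind = TRUE)
print(pos)
```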
Re: [R] R-(wiki)-pedia?
At Thursday 11:29 AM 10/7/2004, Dan Bolser wrote:
> [snip] I just added some pages... I think it would be great if people could get motivated to contribute to something like this. It's one of those cases of just getting the ball rolling... Do you think you can dump the existing R-docs into this wiki as a framework to get things going?

If the existing R-docs are dumped into a wiki, won't the copy in the Wiki quickly get out of date? How does one get around this problem?

-- Tony Plate
Re: [R] gsub() on Matrix
Many more recent regular expression implementations have ways of indicating a match on a word boundary. It's usually \b. Here's what you did:

  > gsub("x1", "i1", "x1 + x2 + x10 + xx1")
  [1] "i1 + x2 + i10 + xi1"

The following worked for me to just change x1 to i1, while leaving alone any larger word that contains x1:

  > gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1")
  [1] "i1 + x2 + x10 + xx1"

Note that the backslash must be escaped itself to get past the R lexical analyser, which is independent of the regexp processor. What the regexp processor sees is just a single backslash. For more on this, look for perl documentation of regular expressions. Be aware that to use full perl regexps, you must supply the perl=T argument to gsub(). Also note that \b seems to be part of the most basic regular expression language in R; it even works with extended=F:

  > gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=T)
  [1] "i1 + x2 + x10 + xx1"
  > gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F)
  [1] "i1 + x2 + x10 + xx1"
  > gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F, ext=F)
  [1] "i1 + x2 + x10 + xx1"

(I assumed the fact that you have a matrix of strings is not relevant.)

Hope this helps,

Tony Plate

At Wednesday 09:07 PM 10/27/2004, Kevin Wang wrote:
> Hi, Suppose I've got a matrix, and the first few elements look like
>   x1 + x3 + x4 + x5 + x1:x3 + x1:x4
>   x1 + x2 + x3 + x5 + x1:x2 + x1:x5
>   x1 + x3 + x4 + x5 + x1:x3 + x1:x5
> and so on (have got terms from x1 ~ x14). If I want to replace all the x1 with i7, all x2 with i14, all x3 with i13, for example. Is there an easy way? I tried to put what I want to replace in a vector, like:
>   repl = c("i7", "i14", "i13", "d2", "i8", "i5", "i6", "i3", "A", "i9", "i2", "i4", "i15", "i21")
> and have another vector, say:
>   orig
>    [1] "x1"  "x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10"
>   [11] "x11" "x12" "x13" "x14"
> Then I tried something like
>   gsub(orig, repl, mat)  ## mat is the name of my matrix
> but it didn't work *_*. It would replace terms like x10 with i70. (I know it may be an easy question...but I haven't done much regular expression)
> Cheers, Kevin
> Ko-Kang Kevin Wang, PhD Student
> Centre for Mathematics and its Applications, Building 27, Room 1004
> Mathematical Sciences Institute (MSI), Australian National University, Canberra, ACT 0200, Australia
> Homepage: http://wwwmaths.anu.edu.au/~wangk/
> Ph (W): +61-2-6125-2431 Ph (H): +61-2-6125-7407 Ph (M): +61-40-451-8301
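Since gsub() takes a single pattern, one way to apply the word-boundary idea to a whole vector of replacements is a small loop. The helper below is a sketch only: replace_terms and the sample matrix are made up for illustration. It also assumes that no replacement value is itself one of the later patterns, since the substitutions are applied one after another.

```r
# Replace each whole-word occurrence of orig[k] with repl[k]
replace_terms <- function(strings, orig, repl) {
  stopifnot(length(orig) == length(repl))
  for (k in seq_along(orig)) {
    pattern <- paste0("\\b", orig[k], "\\b")
    # assigning into strings[] keeps the matrix dimensions
    strings[] <- gsub(pattern, repl[k], strings, perl = TRUE)
  }
  strings
}

mat <- matrix(c("x1 + x3 + x1:x3", "x1 + x10"), nrow = 1)
out <- replace_terms(mat, orig = c("x1", "x3", "x10"),
                          repl = c("i7", "i13", "i4"))
print(out)
```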
Re: [R] why should you set the mode in a vector?
It's useful when you need to be certain of the mode of a vector. One such situation is when you are about to call a C-language function using the .C() interface. As you point out, some assignments (even just to vector elements) can change the mode of the entire vector. This is why it's important to check the mode of vectors passed to external language functions immediately before the call.

As to what assigning the mode does, it specifies (or changes, if necessary) the underlying type of storage of the vector. In R, all the elements in a vector have the same storage mode. In the example below, the storage is initially double-precision floats, but after the assignment of character data to element 2, the vector is stored as character data (with suitably coerced values of the other elements). After assignment of list data to element 1, the entire vector becomes a list (i.e., a vector of pointers to general objects). [The terminology I'm using here is a little loose, but someone please correct me if it is outright wrong.] Finally, the assigning of mode numeric to the list fails because not all elements can be coerced. (And I'm not sure why the last assignment succeeds and produces the results it does.)

  > v <- vector(mode="numeric", length=4)
  > v[3:4] <- 3:4
  > storage.mode(v)
  [1] "double"
  > v[2] <- "foo"
  > v
  [1] "0"   "foo" "3"   "4"
  > storage.mode(v)
  [1] "character"
  > v[1] <- list(1:3)
  > v
  [[1]]
  [1] 1 2 3

  [[2]]
  [1] "foo"

  [[3]]
  [1] "3"

  [[4]]
  [1] "4"

  > mode(v) <- "numeric"
  Error in as.double.default(list(as.integer(c(1, 2, 3)), "foo", "3", "4")) : (list) object cannot be coerced to double
  > x <- v[2:4]
  > mode(x) <- "numeric"
  > x
  [1] NA NA NA

-- Tony Plate

At Friday 03:41 PM 10/29/2004, Joel Bremson wrote:
> Hi all, If I write
>   v = vector(mode="numeric", length=10)
> I'm still allowed to assign non-numerics to v. Furthermore, R figures out what kind of vector I've got anyway when I use the mode() function. So what is it that assigning a mode does?
> Joel
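A small sketch of the "check the storage mode right before the call" advice from this thread. No compiled code is called here; the vectors are hypothetical and only show how element assignment can change storage and how storage.mode<- pins it down again.

```r
x <- 1:4                      # integer storage
x[2] <- 2.5                   # assignment silently promotes x to double
print(storage.mode(x))

y <- c(1, 2, 3)               # double storage
storage.mode(y) <- "integer"  # force the type a C routine might expect
print(storage.mode(y))

# Defensive check of the kind one might place just before a .C() call
stopifnot(is.integer(y))
```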
Re: [R] make apply() return a list
for()-loops aren't so bad. Look inside the code of apply() and see what it uses! The important thing is that you use vectorized functions to manipulate vectors. It's often fine to use for-loops to manipulate the rows or columns of a matrix, but once you've extracted a row or a column, then use a vectorized function to manipulate that data.

In any case, one way to get apply() to return a list is to wrap the result from the subfunction inside a list, e.g.:

  > x <- apply(matrix(1:6,2), 1, function(x) list(c(mean=mean(x), sd=sd(x))))
  > x
  [[1]]
  [[1]][[1]]
  mean   sd
     3    2


  [[2]]
  [[2]][[1]]
  mean   sd
     4    2


  > # to remove the extra level of listing here, do:
  > lapply(x, "[[", 1)
  [[1]]
  mean   sd
     3    2

  [[2]]
  mean   sd
     4    2

At Monday 11:37 AM 11/1/2004, Arne Henningsen wrote:
> Hi, I have a dataframe (say myData) and want to get a list (say myList) that contains a matrix for each row of the dataframe myData. These matrices are calculated based on the corresponding row of myData. Using a for()-loop to do this is very slow. Thus, I tried to use apply(). However, afaik apply() does only return a list if the matrices have different dimensions, while my matrices have all the same dimension. To get a list I could change the dimension of one matrix artificially and restore it after apply(). This is a (very much) simplified example of what I did:
>   myData <- data.frame( a = c( 1,2,3 ), b = c( 4,5,6 ) )
>   myFunction <- function( values ) {
>     myMatrix <- matrix( values, 2, 2 )
>     if( all( values == myData[ 1, ] ) ) {
>       myMatrix <- cbind( myMatrix, rep( 0, 2 ) )
>     }
>     return( myMatrix )
>   }
>   myList <- apply( myData, 1, myFunction )
>   myList[[ 1 ]] <- myList[[ 1 ]][ 1:2, 1:2 ]
>   myList
>   $1
>        [,1] [,2]
>   [1,]    1    1
>   [2,]    4    4
>
>   $2
>        [,1] [,2]
>   [1,]    2    2
>   [2,]    5    5
>
>   $3
>        [,1] [,2]
>   [1,]    3    3
>   [2,]    6    6
> This exactly does what I want and really speeds up the calculation, but I wonder if there is an easier way to make apply() return a list.
> Thanks for your help, Arne
> --
> Arne Henningsen, Department of Agricultural Economics, University of Kiel
> Olshausenstr. 40, D-24098 Kiel (Germany)
> Tel: +49-431-880 4445 Fax: +49-431-880 1397
> [EMAIL PROTECTED] http://www.uni-kiel.de/agrarpol/ahenningsen/
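Another way to get one matrix per row without the dimension trick is to lapply() over the row indices, which always returns a list. This is only a sketch: rowToMatrix is a hypothetical stand-in for the real per-row computation.

```r
myData <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Hypothetical per-row computation: build a 2x2 matrix from the row values
rowToMatrix <- function(values) matrix(values, 2, 2)

# lapply() over row indices returns a list regardless of result dimensions
myList <- lapply(seq_len(nrow(myData)),
                 function(i) rowToMatrix(unlist(myData[i, ])))
print(myList[[2]])
```

Whether this is faster than an explicit for()-loop depends on the per-row work; it mainly avoids apply()'s simplification of equal-sized results into an array.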
Re: [R] Reading word by word in a dataset
Trying to make it work when not all rows have the same numbers of fields seems like a good place to use the flush argument to scan() (to skip everything after the first field on the line). With the following copied to the clipboard:

  i1-apple 10$ New_York
  i2-banana
  i3-strawberry 7$Japan

do:

  > scan("clipboard", "", flush=T)
  Read 3 items
  [1] "i1-apple"      "i2-banana"     "i3-strawberry"
  > sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=T))
  Read 3 items
  [1] "apple"      "banana"     "strawberry"

-- Tony Plate

At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:
> Uwe and Andy's solutions are great for many applications but won't work if not all rows have the same numbers of fields. Consider for example the following modification of Lee's example:
>   i1-apple 10$ New_York
>   i2-banana
>   i3-strawberry 7$Japan
> If I copy this to clipboard and run Andy's code, I get the following:
>   read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
>   Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 2 did not have 3 elements
> We can get around this using scan, then splitting things apart similar to the way Uwe described:
>   dat <- scan("clipboard", character(0), sep="\n")
>   Read 3 items
>   dash <- regexpr("-", dat)
>   dat2 <- substring(dat, pmax(0, dash)+1)
>   blank <- regexpr(" ", dat2)
>   if(any(blank<0))
>     blank[blank<0] <- nchar(dat2[blank<0])
>   substring(dat2, 1, blank)
>   [1] "apple "      "banana"      "strawberry "
> hope this helps. spencer graves
>
> Uwe Ligges wrote:
>> Liaw, Andy wrote:
>>> Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:
>>>   read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
>>>                V1
>>>   1      i1-apple
>>>   2     i2-banana
>>>   3 i3-strawberry
>> ... and if only the words after - are of interest, the statement can be followed by
>>   sapply(strsplit(, "-"), "[", 2)
>> Uwe Ligges
>>> HTH, Andy
>>> From: j lee
>>>> Hello All, I'd like to read first words in lines into a new file. If I have a data file the following, how can I get the first words: apple, banana, strawberry?
>>>>   i1-apple 10$ New_York
>>>>   i2-banana 5$London
>>>>   i3-strawberry 7$Japan
>>>> Is there any similar question already posted to the list? I am a bit new to R, having a few months of experience now.
>>>> Cheers, John
> --
> Spencer Graves, PhD, Senior Development Engineer
> O: (408)938-4420; mobile: (408)655-4567
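The same extraction can be sketched without the clipboard, working on a character vector of lines. The input below is an assumed version of the sample data, with whitespace separating the first field from the rest; the two sub() calls first keep the leading field and then strip the prefix up to the dash.

```r
# Assumed sample lines (whitespace after the first field)
lines <- c("i1-apple 10$ New_York", "i2-banana", "i3-strawberry 7$Japan")

first <- sub("[[:space:]].*$", "", lines)   # keep only the first field
words <- sub("^[A-Za-z0-9]*-", "", first)   # drop everything up to the dash
print(words)
```

In practice lines could come from readLines() on a file rather than from a literal vector.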
Re: [R] Resources for optimizing code
Have you tried reading the manual "An Introduction to R", with special attention to "Array Indexing" (indexing for data frames is pretty similar to indexing for matrices)? Unless I'm misunderstanding, what you want to do is very simple. It is possible to use numeric vectors with 0 and 1 to indicate whether you want to keep the row, but it's a little easier with logical vectors. Here's an example:

  > x <- data.frame(a=1:5, b=letters[1:5])
  > keep.num <- ifelse(x$a %% 2 == 1, 1, 0)
  > keep.num
  [1] 1 0 1 0 1
  > keep.logical <- (x$a %% 2) == 1
  > keep.logical
  [1]  TRUE FALSE  TRUE FALSE  TRUE
  > x[keep.num==1,,drop=F]
    a b
  1 1 a
  3 3 c
  5 5 e
  > x[keep.logical,,drop=F]
    a b
  1 1 a
  3 3 c
  5 5 e

At Friday 10:34 AM 11/5/2004, Janet Elise Rosenbaum wrote:
> I want to eliminate certain observations in a large dataframe (21000x100). I have written code which does this using a binary vector (0=delete obs, 1=keep), but it uses for loops, and so it's slow and in the extreme it causes R to hang for indefinite time periods. I'm looking for one of two things:
> 1. A document which discusses how to avoid for loops and situations in which it's impossible to avoid for loops. or
> 2. A function which can do the above better than mine.
> My code is pasted below. Thanks so much, Janet
>   # asst is a binary vector of length= nrow(DATAFRAME).
>   # 1= observations you want to keep. 0= observation to get rid of.
>   remove.xtra.f <- function(asst, DATAFRAME) {
>     n <- sum(asst, na.rm=T)
>     newdata <- matrix(nrow=n, ncol=ncol(DATAFRAME))
>     j <- 1
>     for(i in 1:length(data)) {
>       if (asst[i]==1) {
>         newdata[j,] <- DATAFRAME[i,]
>         j <- j+1
>       }
>     }
>     newdata.f <- as.data.frame(newdata)
>     names(newdata.f) <- names(DATAFRAME)
>     return(newdata.f)
>   }
> --
> Janet Rosenbaum [EMAIL PROTECTED]
> PhD Candidate in Health Policy, Harvard GSAS
> Harvard Injury Control Research Center, Harvard School of Public Health
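A sketch of a loop-free replacement for the row-filtering function in this thread. The name keep_rows and the sample data are hypothetical; the only addition over plain logical indexing is that NA entries in the 0/1 vector are treated as "drop" rather than producing NA rows.

```r
# Keep the rows of df where the 0/1 vector asst equals 1; NA means drop
keep_rows <- function(asst, df) {
  stopifnot(length(asst) == nrow(df))
  df[!is.na(asst) & asst == 1, , drop = FALSE]
}

x <- data.frame(a = 1:5, b = letters[1:5])
asst <- c(1, 0, NA, 0, 1)
kept <- keep_rows(asst, x)
print(kept)
```

Unlike the matrix-based loop, this keeps each column's type and the original row names.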
Re: [R] hashing using named lists
Use match() for exact matching, i.e.,

  test[[match("name", names(test))]]

Yes, it is more cumbersome. This partial matching is considered by some to be a design fault, but changing it would break too many programs that depend upon it.

I don't understand your question about all.equal.list() -- it does seem to require exact matches on names, e.g.:

  > all.equal(list(a=1:3), list(aa=1:3))
  [1] "Names: 1 string mismatches"
  > all.equal(list(aa=1:3), list(a=1:3))
  [1] "Names: 1 string mismatches"

(the above run in R 2.0.0)

-- Tony Plate

(BTW, in R this operation is generally called "indexing" or "subscripting" or "extraction", but not "hashing". "Hashing" is a specific technique for managing and looking up indices, which is why some other programming languages refer to list-like objects that are indexed by character strings as "hashes". I don't think hashing is used for list names in R, but someone please correct me if I'm wrong!)

At Thursday 09:29 AM 11/18/2004, ulas karaoz wrote:
> hi all, I am trying to use named list to hash a bunch of vector by name, for instance:
>   test = list()
>   test$name = c(1,2,3)
> the problem is that when i try to get the values back by using the name, the matching isn't done in an exact way, so test$na is not NULL. is there a way around this? Why by default all.equal.list doesn't require an exact match? How can I do hashing in R? thanks. ulas.
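For comparison, here is a sketch of exact-name lookup as it behaves in current versions of R, which may differ from the R 2.0.0 behaviour discussed in this thread. It contrasts $ (partial matching on lists), [[ ]] with a string (exact by default in current R), and an environment used as a hashed name-to-value store.

```r
test <- list(name = c(1, 2, 3))

partial <- test$na       # $ partially matches "na" to "name"
exact   <- test[["na"]]  # [[ ]] matches exactly by default in current R: NULL

# An environment created with hash = TRUE looks names up exactly
h <- new.env(hash = TRUE)
assign("name", c(1, 2, 3), envir = h)
has_na   <- exists("na",   envir = h, inherits = FALSE)
has_name <- exists("name", envir = h, inherits = FALSE)
print(c(has_na, has_name))
```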
Re: [R] Re: Protocol for answering basic questions
Perhaps something like the following paragraph should be added to the start of the Posting Guide (as a new paragraph right after the existing first paragraph):

  Note that R-help is *not* intended for questions that are easily answered by consulting one of the FAQs or other introductory material (see "Do your homework before posting" below). Such questions are actively discouraged and are likely to evoke a brusque response. Questions about seemingly simple matters that are mentioned in the FAQs or other introductory material *are welcomed* on R-help *when the questioner obviously has done their homework and the question is accompanied by an explanation* like "FAQ 7.2.1 seems to be relevant to this but I couldn't understand/apply the answer because ..."

Something like this would make it very clear up front what type of questions are not appropriate. (I'm not trying at all to dictate the policy, but as far as I can tell, the above summarizes the attitude of the majority of very knowledgeable helpers that respond to questions on R-help.)

Also, I think that John Maindonald's idea of an "I am new to R, where do I start?" page, with a link from the posting guide, is an excellent idea.

I'm aware that some feel that the posting guide is already too long, but my feeling is that if users don't read a very easily accessible posting guide AND post inappropriate questions AND become offended by brusque responses, then they are beyond where they can easily be helped. The most important thing is to make it very clear what types of questions are and are not considered appropriate, so that beginning users know what they are getting into.

And the following might merit inclusion in the FAQ:

Why is R-help not for hand-holding beginner questions?

R-help is a high traffic list and the general sentiment is that too many very simple questions will overwhelm everyone and most importantly result in the knowledgeable helpers ceasing to participate.
The reason that there is no R-help-me-quickly-I-dont-want-to-read-the-documentation list is that no-one has felt that it would work well -- it is unlikely that many knowledgeable users of R would be willing to participate. Without such users participating, it is likely that sometimes bad advice would be offered and stand uncorrected, because R is a complex language with many ways of doing things, some markedly inferior to others. For these reasons, some feel it would be a very bad idea to create such a list. (However, anyone who believes otherwise and wishes to start and maintain such a list or other similar service is free to do so.) One reason for this overall state of affairs is that R is free software and consequently there is no revenue stream to support a hand-holding support service with paid employees. So although the actual software is free, some investment in terms of time spent reading documentation is required in order to use it. Furthermore, many of the frequent helpers on R-help have written introductory documents intended to help beginners with many aspects of learning and using R (e.g., An Introduction to R, and the various FAQs). Consequently they sometimes get fed up getting asked again and again the same question they have already written a document to explain. Nonetheless, the general sentiment on R-help is very helpful -- a quote summarizes it well: It's OK if you need some spoonfeeding (I need that quite often myself), but at least show how you have tried to use the spoon yourself, instead of just showing us your open mouth. [Attribution to Andy Liaw, or remain anonymous?] As some feel that sufficient time and bandwidth has already been spent on this issue, if anyone has any comments on this particular matter of an addition to the posting guide (or FAQ), feel free to choose to respond to me privately, and I will summarize as appropriate. 
-- Tony Plate
RE: [R] Percentages in contingency tables *warning trivial question*
The 'abind' function in the 'abind' package is a generalized binding function for arrays. (I've never tried it with tables.)

At Monday 04:36 AM 12/13/2004, BXC (Bendix Carstensen) wrote:
> [...snip...] The last step is necessary in the absence of a generalized cbind/rbind for tables/arrays. Please correct me if such a thing exists. If it does, it should be referenced under "see also" in the help page for cbind.
Re: [R] reading the seed from a simulation
With most modern random number generators you can't capture the current state in a single 32-bit integer. (I suspect the .Random.seed you are seeing is the state contained in 625 integers.) The easiest way to run reproducible simulations is to explicitly set the seed, using an integer, before each run. Then it's easy to put the random number generator into the same state again, e.g.:

  for (sim.num in 1:100) {
    set.seed(sim.num)
    ... run simulation ...
  }

If you can't do this, you can record the value of .Random.seed prior to the simulation, and then when you want to reproduce that simulation again, set .Random.seed to that value, e.g.:

  > set.seed(1)
  > sample(1:100, 5)
  [1] 27 37 57 89 20
  > sample(1:100, 5)
  [1] 90 94 65 62  6
  > set.seed(1)
  > sample(1:100, 5)
  [1] 27 37 57 89 20
  > saved.seed <- .Random.seed
  > sample(1:100, 5)
  [1] 90 94 65 62  6
  > .Random.seed <- saved.seed
  > sample(1:100, 5)
  [1] 90 94 65 62  6

This is not guaranteed to work with all random-number generators; see the NOTE section in ?set.seed

-- Tony Plate

At Friday 09:50 AM 12/17/2004, Suzette Blanchard wrote:
> Greetings, I have a simulation of a nonlinear model that is failing. But it does not fail til way into the simulation. I would like to look at the run that is failing and maybe I could if I could capture the seed for the failing run. The help file on set.seed says you can do it but when I tried
>   rs <- .Random.seed
>   print(paste("rs", rs, sep=" "))
> I got 626 of them so I don't know how to identify the right one. Please can you help?
> Thank you, Suzette
> Suzette Blanchard, Ph.D. UCSD-PPRU
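The save-and-restore idea can be sketched as a small script. run_sim is a hypothetical stand-in for one simulation run; the state is restored with assign() into the global environment, which is where R keeps .Random.seed, so the same code also works from inside a function. No expected draws are shown because the numbers depend on the R version and generator settings.

```r
# Hypothetical stand-in for one simulation run
run_sim <- function(n) sample(1:100, n)

set.seed(1)
invisible(run_sim(5))            # earlier runs advance the generator
saved <- .Random.seed            # state just before the run of interest
second <- run_sim(5)             # the run we may want to reproduce

# Later: put the generator back into the saved state and rerun
assign(".Random.seed", saved, envir = globalenv())
replay <- run_sim(5)
print(identical(second, replay))
```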
Re: [R] two dimensional array of object elements
Create your original matrix as a list datatype. When assigning elements, be careful with the list structure, as the example indicates.

  > m <- 2; n <- 3
  > a <- array(list(), c(m,n))
  > a[1,2] <- list(b=1,c=2)
  Error in "[<-"(`*tmp*`, 1, 2, value = list(b = 1, c = 2)) : number of items to replace is not a multiple of replacement length
  > a[1,2] <- list(list(b=1,c=2))

At Friday 11:36 AM 2/11/2005, Weijie Cai wrote:
> Hi list, I want to create a two (possibly three) dimensional array of objects. These objects are classes in object oriented style. I failed by using
>   a <- array(NA, c(m,n))
>   for (i in 1:m){
>     for (j in 1:n){
>       a[i,j] <- My.Obj
>     }
>   }
> The elements are still NA. Any suggestions? Thanks
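A sketch of filling such a list-matrix in a loop. It uses a[[i, j]] <- obj, which stores one object per cell without the extra list() wrapping; the stored objects here are hypothetical plain lists standing in for My.Obj.

```r
m <- 2; n <- 3

# A list with dimensions: every cell can hold an arbitrary object
a <- matrix(vector("list", m * n), nrow = m, ncol = n)

for (i in seq_len(m)) {
  for (j in seq_len(n)) {
    a[[i, j]] <- list(row = i, col = j)   # stand-in for My.Obj
  }
}

print(a[[1, 2]]$col)
```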
Re: [R] problem using uniroot with integrate
At Wednesday 09:27 AM 3/9/2005, Ken Knoblauch wrote:
> Hi, I'm trying to calculate the value of the variable, dp, below, in the argument to the integral of dnorm(x-dp) * pnorm(x)^(m-1). This corresponds to the estimate of the sensitivity of an observer in an m-alternative forced choice experiment, given the probability of a correct response, Pc, a Gaussian assumption for the noise and no bias. The function that I wrote below gives me an error:
>   Error in f(x, ...) : recursive default argument reference
> The problem seems to be at the statement using uniroot, because the function est.dp works fine outside of the main function. I've been using R for awhile but there are still many nuances about the scoping and the use of environments that I'm weak on and would like to understand better. I would appreciate any suggestions or solutions that anyone might offer for fixing my error. Thank you.
>   dprime.mAFC <- function(Pc, m) {
>     est.dp <- function(dp, Pc = Pc, m = m) {
>       pr <- function(x, dpt = dp, m0 = m) {
>         dnorm(x - dpt) * pnorm(x)^(m0 - 1)
>       }
>       Pc - integrate(pr, lower = -Inf, upper = Inf, dpt = dp, m0 = m)$value
>     }
>     dp.res <- uniroot(est.dp, interval = c(0,5), Pc = Pc, m = m)
>     dp.res$root
>   }

You've got several problems here:

* recursive argument defaults: these are unnecessary but result in the particular error message you are seeing (e.g., in the def of est.dp, the default value for the argument 'm' is the value of the argument 'm' itself -- default values for arguments are interpreted in the frame of the function itself)

* the argument m=m you supply to uniroot() is being interpreted as specifying the 'maxiter' argument to uniroot()

I think you can fix it by changing the 'm' argument of function est.dp to be named 'm0', and specifying 'm0' in the call to uniroot. (But I can't tell for sure because you didn't supply a working example -- when I just guess at values to pass in I get numerical errors.) Also, it would be best to remove the incorrect recursive default arguments for the functions est.dp and pr.

-- Tony Plate
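One possible reworking along the lines suggested in this thread, as a sketch only: the argument names Pc0 and m0 are chosen here for illustration, the recursive defaults are removed, and est.dp returns the difference between the target Pc and the integral so that uniroot() has a root to find. The check against sqrt(2)*qnorm(Pc) uses the standard closed form for the two-alternative case.

```r
dprime.mAFC <- function(Pc, m) {
  # Difference between the target proportion correct and the model value;
  # no default arguments, so nothing refers to itself
  est.dp <- function(dp, Pc0, m0) {
    pr <- function(x) dnorm(x - dp) * pnorm(x)^(m0 - 1)
    Pc0 - integrate(pr, lower = -Inf, upper = Inf)$value
  }
  # Pc0 and m0 are passed through to est.dp; neither can be
  # mistaken for uniroot's own 'maxiter' argument
  uniroot(est.dp, interval = c(0, 5), Pc0 = Pc, m0 = m)$root
}

dp2 <- dprime.mAFC(0.76, 2)
print(c(estimate = dp2, closed_form = sqrt(2) * qnorm(0.76)))
```

The search interval c(0, 5) is taken from the original post; it only brackets a root when Pc lies between chance (1/m) and the value reached at dp = 5.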
Re: [R] how to multiply a constant to a matrix?
I still can't see why this is a problem. If a 1x1 matrix should be treated as a scalar, then it can just be wrapped in drop(), and the arithmetic will be computed correctly by R. Are there any cases where this cannot be done? More specifically, are there any matrix algebra expressions where, depending on the particular dimensions of the variables used, drop() must be used in some cases, and not in other cases? A related but different behavior is the default dropping dimensions with extent equal to one by indexing operations. This can be problematic because if one is not careful, incorrect results can be obtained for particular values used in the expression. For example, consider the following, in which we are trying to compute the cross product of some columns of x with some rows of y. If x has n rows and y has n columns, then the result should always be an nxn matrix. However, if we are not careful with using drop=F in the indexing expressions, we can inadvertently end up with a 1x1 inner product matrix result for the case where we just use one column of x and one row of y. The solution to this is to always use drop=F in indexing in situations where this can occur. x - matrix(1:9, ncol=3) y - matrix(-(1:9), ncol=3) i - 1:2 x[,i] %*% y[i,] [,1] [,2] [,3] [1,] -9 -24 -39 [2,] -12 -33 -54 [3,] -15 -42 -69 i - 1:3 x[,i] %*% y[i,] [,1] [,2] [,3] [1,] -30 -66 -102 [2,] -36 -81 -126 [3,] -42 -96 -150 # i has just one element -- the expression without drop=F # no longer computes an outer product i - 2 x[,i] %*% y[i,] [,1] [1,] -81 x[,i,drop=F] %*% y[i,,drop=F] [,1] [,2] [,3] [1,] -8 -20 -32 [2,] -10 -25 -40 [3,] -12 -30 -48 Cannot all cases in the situations you mention be handled in an analogous manner, by always wrapping appropriate quadratic expressions in drop(), or are there some cases where the result of the quadratic expression must be treated as a matrix, and other cases where the result of the quadratic expression must be treated as a scalar? 
-- Tony Plate Michael wrote: imagine when you have complicated matrix algebra computation using R, you cannot prevent some middle-terms become quadratic and absorbs into one scalar, right? if R cannot intelligently determine this, and you have to manually add drop everywhere, do you think it is reasonable? On 5/23/06, Patrick Burns [EMAIL PROTECTED] wrote: I think drop(B/D) * solve(A) would be a more transparent approach. It isn't that R can not do what you want, it is that it is saving you from shooting yourself in the foot in your attempt. What you are doing is not really a matrix computation. Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) Michael wrote: This is very strange: I want compute the following in R: g = B/D * solve(A) where B and D are quadratics so they are just a scalar number, e.g. B=t(a) %*% F %*% a; I want to multiply B/D to A^(-1), but R just does not allow me to do that and it keeps complaining that nonconformable array, etc. I tried the following two tricks and they worked: as.numeric(B/D) * solve(A) diag(as.numeric(B/D), 5, 5) %*% solve (A) But if R cannot intelligently do scalar and matrix multiplication, it is really problemetic. It basically cannot be used to do computations, since in complicated matrix algebras, you have to distinguish where is scalar, and scalars obtained from quadratics cannot be directly used to multiply another matrix, etc. It is going to a huge mess... Any thoughts? [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! 
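A minimal sketch of the two idioms discussed in this thread: drop() to turn a 1x1 quadratic form into a scalar, and drop=FALSE to keep matrix dimensions when indexing. The values of a, Fm and A below are made up for illustration and are not from the original posts (Fm stands in for the poster's F, to avoid masking the F shorthand for FALSE).

```r
# Hypothetical inputs, chosen only so the example runs.
a  <- c(1, 2)
Fm <- matrix(c(2, 0, 0, 3), 2, 2)
A  <- matrix(c(4, 1, 1, 3), 2, 2)

B <- t(a) %*% Fm %*% a   # 1x1 matrix (quadratic form)
D <- t(a) %*% a          # 1x1 matrix

# B / D * solve(A) fails: a 1x1 matrix is not conformable with a 2x2 matrix.
direct <- try(B / D * solve(A), silent = TRUE)

# drop() strips the extent-one dimensions, leaving an ordinary scalar.
g <- drop(B / D) * solve(A)

# Indexing a single column: default drops to a vector, drop=FALSE keeps a 3x1 matrix.
x <- matrix(1:9, ncol = 3)
v <- x[, 2]
m <- x[, 2, drop = FALSE]
```

Here g is a 2x2 matrix, v has no dim attribute, and m keeps dimensions 3x1.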
Re: [R] max / pmax
Here's an example of how I think you can do what you want. Play with the definition of the function highest.use() to get random selection of multiple maxima.

> drug.names <- c("marijuana", "crack", "cocaine", "heroin")
> drugs <- factor(drug.names, levels=drug.names)
> drugs
[1] marijuana crack     cocaine   heroin
Levels: marijuana crack cocaine heroin
> as.numeric(drugs)
[1] 1 2 3 4
> N <- 20
> set.seed(1)
> primary.drug <- sample(drugs, N, rep=T)
> primary.drug[sample(1:20, 10)] <- NA
> primary.drug
 [1] <NA>      crack     <NA>      <NA>      <NA>      <NA>      heroin
 [8] cocaine   cocaine   marijuana <NA>      <NA>      cocaine   crack
[15] heroin    <NA>      cocaine   heroin    <NA>      <NA>
Levels: marijuana crack cocaine heroin
> # usage frequencies
> marijuana <- sample(1:3, N, rep=T)
> crack <- sample(1:3, N, rep=T)
> cocaine <- sample(1:3, N, rep=T)
> heroin <- sample(1:3, N, rep=T)
> cbind(marijuana, crack, cocaine, heroin)
      marijuana crack cocaine heroin
 [1,]         2     2       2      1
 [2,]         2     3       3      1
 [3,]         2     2       2      2
 [4,]         1     1       2      3
 [5,]         3     1       2      3
 [6,]         3     1       3      3
 [7,]         3     1       3      2
 [8,]         1     2       2      2
 [9,]         3     2       3      3
[10,]         2     2       3      2
[11,]         3     3       2      2
[12,]         2     1       3      2
[13,]         3     2       2      1
[14,]         2     1       1      3
[15,]         2     2       3      2
[16,]         3     1       1      1
[17,]         1     2       3      1
[18,]         2     3       1      2
[19,]         3     1       1      3
[20,]         3     3       1      2
> highest.use <- function(x) {y <- which(x==max(x, na.rm=T)); if (length(y)==1) return(y) else return(NA)}
> apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use)
 [1] NA NA NA  4 NA NA NA NA NA  3 NA  3  1  4  3  1  3  2 NA NA
> impute.primary.drug <- drugs[ifelse(is.na(primary.drug), apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use), as.numeric(primary.drug))]
> data.frame(primary.drug, impute.primary.drug)
   primary.drug impute.primary.drug
1          <NA>                <NA>
2         crack               crack
3          <NA>                <NA>
4          <NA>              heroin
5          <NA>                <NA>
6          <NA>                <NA>
7        heroin              heroin
8       cocaine             cocaine
9       cocaine             cocaine
10    marijuana           marijuana
11         <NA>                <NA>
12         <NA>             cocaine
13      cocaine             cocaine
14        crack               crack
15       heroin              heroin
16         <NA>           marijuana
17      cocaine             cocaine
18       heroin              heroin
19         <NA>                <NA>
20         <NA>                <NA>

Brian Perron wrote:

Hello R users,

I am relatively new to R and cannot seem to crack a coding problem. I am working with substance abuse data, and I have a variable called primary.drug which is considered the drug of choice for each subject. I have just a few missing values on that variable. Instead of using a multiple imputation method like chained equations, I would prefer to derive these values from other survey responses. Specifically, I have a frequency of use (in days) for each of the major drugs, so I would like the missing values to be replaced by that drug with the highest level of use. I am starting with the ifelse and max statements, but I know it is wrong:

impute.primary.drug <- ifelse(is.na(primary.drug), max(marijuana, crack, cocaine, heroin), primary.drug)

Here are the problems. First, the max statement (should it be pmax?) returns the highest numeric quantity rather than the variable itself. In other words, I want to test which drug has the highest value, but return the variable name rather than the observed value. Second, if ties are observed, how can I specify the value to be NA? Or, how can I specify one of the values to be randomly selected?

Thanks in advance for your assistance.

Regards,
Brian
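One possible variant of highest.use() that picks a tied maximum at random instead of returning NA, as hinted at in the reply above. The name highest.use.random is hypothetical and this is only a sketch; it indexes with sample.int() to avoid the surprise that sample(y, 1) treats a single number y as 1:y.

```r
# Sketch: return the position of the maximum, choosing at random among ties.
highest.use.random <- function(x) {
  if (all(is.na(x))) return(NA)            # nothing to choose from
  y <- which(x == max(x, na.rm = TRUE))    # positions of all maxima
  if (length(y) == 1) return(y)
  y[sample.int(length(y), 1)]              # random pick among the ties
}

set.seed(1)
usage <- rbind(c(1, 1, 2, 3),   # unique maximum in column 4
               c(3, 1, 1, 3))   # tie between columns 1 and 4
picked <- apply(usage, 1, highest.use.random)
```

The result can be used in place of the apply(..., highest.use) call when building impute.primary.drug.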
Re: [R] References verifying accuracy of R for basic statistical calculations and tests
This might be a place to start: http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

Among the references listed there are:

Assessing the Reliability of Statistical Software: Part I by B. D. McCullough (1998)
http://www.amstat.org/publications/tas/mccull-1.pdf

Assessing the Reliability of Statistical Software: Part II by B. D. McCullough (1999)
http://www.amstat.org/publications/tas/mccull.pdf

Those might have some relevance.

Then, doing within an R session:

> RSiteSearch("Assessing Reliability Statistical Software")

turns up 14 hits, many of them looking relevant [leaving "the" and "of" in the query results in the search engine timing out - odd?]

-- Tony Plate

Corey Powell wrote:

Do you know of any references that verify the accuracy of R for basic statistical calculations and tests? The results of these studies should indicate that R results are the same as the results of other statistical packages to a certain number of decimal places on some benchmark calculations.

Thanks,
Corey Powell
Clinical Data Analyst
Broncus Technologies
[EMAIL PROTECTED]
Re: [R] Subset dataframe based on condition
Works OK for me:

> x <- data.frame(a=10^(-2:7), b=10^(10:1))
> subset(x, a > 1)
       a     b
4  1e+01 1e+07
5  1e+02 1e+06
6  1e+03 1e+05
7  1e+04 1e+04
8  1e+05 1e+03
9  1e+06 1e+02
10 1e+07 1e+01
> subset(x, a > 1 & b < a)
       a    b
8  1e+05 1000
9  1e+06  100
10 1e+07   10

Do you get all numeric for the following?

> sapply(x, class)
        a         b
"numeric" "numeric"

If not, then your data frame is probably encoding the information in some way that you don't want (though if it was as factors, I would have expected a warning from the comparison operator). You might get more help by distilling your problem to a simple example that can be tried out by others.

-- Tony Plate

Sachin J wrote:

Hi,

I am trying to extract a subset of data from my original data frame based on some condition. For example (mydf - original data frame, submydf - subset data frame):

submydf = subset(mydf, a > 1 & b <= a)

Here column a contains values ranging from 0.01 to 10. I want to extract only those matching condition 1, i.e. a > 1. But when I execute this command it is not giving me the appropriate result. The subset df - submydf contains rows with 0.01 also. Please help me to resolve this problem.

Thanks in advance.
Sachin
Re: [R] test for end of file on connection
With the text of your message copied to the clipboard:

> con <- file("clipboard", "r")
> readLines(con, 1)
[1] "I am looking for a function to test for end-of-file on a connection."
> readLines(con, 1)
[1] "Apparently this question was already asked a couple of years ago and"
> readLines(con, 1)
[1] "then P. Dalgaard suggested to look at help(connections),"
> readLines(con, 1)
[1] "help(readLines). Unfortunately, I couldn't find such a function on those"
> readLines(con, 1)
[1] "pages, maybe I am missing something."
> readLines(con, 1)
character(0)

i.e., readLines() returns a zero length result upon reaching end of file. AFAIK the other file reading functions have similar behavior. It's still worth reading in detail the help for readLines().

hope this helps,
Tony Plate

At Tuesday 12:08 AM 5/11/2004, Vadim Ogranovich wrote:

Hi,

I am looking for a function to test for end-of-file on a connection. Apparently this question was already asked a couple of years ago and then P. Dalgaard suggested to look at help(connections), help(readLines). Unfortunately, I couldn't find such a function on those pages, maybe I am missing something. Did anyone figure this out?

Thanks,
Vadim
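The zero-length result described above can serve as an end-of-file test in a read loop. A minimal sketch, using a textConnection with made-up lines in place of the clipboard so that it runs anywhere:

```r
# A text connection stands in for a file or the clipboard (illustrative data).
con <- textConnection(c("first line", "second line", "third line"))

n.lines <- 0
repeat {
  line <- readLines(con, n = 1)
  if (length(line) == 0) break   # zero-length result: end of file reached
  n.lines <- n.lines + 1
}
close(con)
```

After the loop, n.lines holds the number of lines that were read before the connection was exhausted.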
Re: [R] privileged slots,
At Tuesday 03:44 AM 6/1/2004, Jari Oksanen wrote:

[snip]

There are several other things that were fully documented and still were removed. One of the latest cases was print.coefmat which was abruptly made Defunct without warning or grace period: code written for 1.8* didn't work in 1.9.0 and if corrected for 1.9.0 it wouldn't work in pre-1.9.0. Anything can change in R without warning, and your code may be broken anytime. Just be prepared.

This is true of many software packages. In our production environment we often (usually) run older versions of software, including statistical software, because of bugs or changed behaviors (or fears thereof) in new versions. We usually run the latest versions in our test and non-production systems and only upgrade our production systems when two conditions are satisfied: (1) we need the features in the upgrade and (2) we are comfortable that the upgraded package will run reliably.

From what I can see, R is only distinguished from other software packages in these regards by the extreme speed with which bug fixes for the latest version are made available (in contrast, we're still waiting more than a year for fixes for bugs in some commercial software that were described as critical bugs by the vendor's support team) and the high level of respect accorded to users by the core developers (changes are debated and effects on existing software seem to be taken seriously).

One very helpful tool to deal with software updates is automated testing. I highly recommend it. R comes with a testing framework.

-- Tony Plate

cheers, jari oksanen
--
Jari Oksanen [EMAIL PROTECTED]
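As a rough illustration of the automated-testing advice above (not a description of R's own testing framework), a small script of stopifnot() checks can be rerun after each upgrade. The function center() is a made-up stand-in for one's own code.

```r
# Hypothetical function under test.
center <- function(x) x - mean(x)

# Checks that stop with an error if behaviour changes after an upgrade.
stopifnot(isTRUE(all.equal(mean(center(c(1, 2, 6))), 0)))
stopifnot(identical(length(center(1:4)), 4L))
stopifnot(isTRUE(all.equal(center(c(1, 3)), c(-1, 1))))

checks.passed <- TRUE   # only reached if every check above succeeded
```

Running such a file with each new R version gives an early warning when documented behaviour has changed.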
Re: [R] Importing binary data
Probably the simplest way to improve the speed of your code would be to write the data so that all the data in a column is contiguous. Then you'll be able to read each column with a single call to readBin().

hope this helps,
Tony Plate

At Tuesday 04:02 AM 6/1/2004, Uli Tuerk wrote:

Hi everybody!

I have a large dataset, about 2 million entries of the format

<integer><integer><float><string><float><string><string>

which I would like to import into a data frame. Because of the huge amount of data I chose a binary format instead of a text format when exporting from Matlab. My import function is attached below. It works fine for only some entries but is deadly slow when trying to read the complete set. Does anybody have some pointers for me for improving the import or handling such large data sets?

Thanks in advance!
Uli

read.DET.data <- function ( f ) {
  counter <- 1
  spk.v <- c()
  imp.v <- c()
  score.v <- c()
  th.v <- c()
  ses.v <- c()
  rec.v <- c()
  type.v <- c()
  fid <- file( f ,"rb")
  tempi <- readBin(fid , integer(), size=1, signed=FALSE)
  while ( length(tempi) != 0) {
    spk.v[ counter ] <- tempi
    imp.v[ counter ] <- readBin(fid, integer(), size=1, signed=FALSE)
    score.v[ counter ] <- readBin(fid, numeric(), size=4)
    type.v[ counter ] <- readBin(fid, character())
    th.v[ counter ] <- readBin(fid, numeric(), size=4)
    ses.v[ counter ] <- readBin(fid, character())
    rec.v[ counter ] <- readBin(fid, character())
    counter <- counter + 1
    tempi <- readBin(fid, integer(), size=1, signed=FALSE)
  }
  close( fid )
  spkf <- factor ( spk.v )
  impf <- factor ( imp.v )
  det.f <- data.frame( spk=spkf, imp=impf, score=score.v, th=th.v, ses=ses.v, rec=rec.v, type=type.v)
  det.f
}
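A minimal sketch of the column-contiguous layout suggested above, assuming a made-up file format: a 4-byte row count, then all values of one column, then all values of the next. The data are invented, only two of the columns are shown, and the file is written from R here purely so the example is self-contained.

```r
# Invented example data: two columns of equal length.
spk   <- c(1L, 2L, 3L, 4L, 5L)
score <- c(0.5, 1.5, -2, 3.25, 4)

f <- tempfile()

# Write: row count first, then each column as one contiguous block.
con <- file(f, "wb")
writeBin(length(spk), con, size = 4)
writeBin(spk, con, size = 1)
writeBin(score, con, size = 4)
close(con)

# Read: one readBin() call per column instead of one call per value.
con <- file(f, "rb")
n      <- readBin(con, integer(), n = 1, size = 4)
spk2   <- readBin(con, integer(), n = n, size = 1, signed = FALSE)
score2 <- readBin(con, numeric(), n = n, size = 4)
close(con)
unlink(f)

det.f <- data.frame(spk = factor(spk2), score = score2)
```

The string columns could be handled the same way with readBin(con, character(), n = n), provided they are also written as one block.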
Re: [R] How to Describe R to Finance People
At Monday 07:58 PM 6/7/2004, Richard A. O'Keefe wrote:

[snip]

There are three perspectives on programming languages like the S/R family:

(1) The programming language perspective. I am sorry to tell you that the only excuse for R is S. R is *weird*. It combines error-prone C-like syntax with data structures that are APL-like but not *sufficiently* APL-like to have behaviour that is easy to reason about. The scope rules (certainly the scope rules for S) were obviously designed by someone who had a fanatical hatred of compilers and wanted to ensure that the language could never be usefully compiled.

What in particular about the scope rules for S makes it tough for compilers? The scope for ordinary variables seems pretty straightforward -- either local or in one of several global locations. (Or are you referring to the feature of the get() function that it can access variables in any frame?)

Thanks to 'with' the R scope rules are little better. The fact that (object)$name returns NULL instead of reporting an error when the object doesn't _have_ a $name property means that errors can be delayed to the point where debugging is harder than it needs to be.

Yup, that's why I proposed (and provided an implementation of) an alternative $$ operator that did report an error when object$$name didn't have a name component (and also didn't allow abbreviation), but there was no interest shown in incorporating this into R.

-- Tony Plate
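To illustrate the $ behaviour discussed above, here is a small sketch of a strict accessor. It is not the $$ implementation mentioned in the message; strict.get is a hypothetical helper written only for this example.

```r
obj <- list(value = 1)

partial <- obj$val     # partial matching silently succeeds: 1
missing <- obj$vlaue   # a misspelt name silently gives NULL

# Hypothetical strict accessor: exact names only, error if absent.
strict.get <- function(object, name) {
  if (!(name %in% names(object)))
    stop("object has no component named '", name, "'")
  object[[name]]
}

ok  <- strict.get(obj, "value")
bad <- try(strict.get(obj, "val"), silent = TRUE)   # abbreviation is rejected
```

With such a helper the misspelling is reported where it occurs instead of surfacing later as an unexpected NULL.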
Re: [R] direct data frame entry
easy to do it by column:

> d <- data.frame(name=c("obs1name","obs2name","obs3name"),val1=c(0.2,0.4,0.6),val2=c(0.3,1.0,2.0),row.names=c("r1","r2","r3"))
> d
       name val1 val2
r1 obs1name  0.2  0.3
r2 obs2name  0.4  1.0
r3 obs3name  0.6  2.0

(when you do it by row, you get the numbers as factors because c("obs1name", 0.2, 0.3) etc. are character vectors)

At Wednesday 01:29 PM 6/9/2004, ivo welch wrote:

hi: I searched the last 2 hours for a way to enter a data frame directly in my program. (I know how to read from a file.) that is, I would like to say something like

d <- this.is.a.data.frame( c("obs1name", 0.2, 0.3), c("obs2name", 0.4, 1.0), c("obs3name", 0.6, 2.0), varnames=c("name", "val1", "val2") );

everything I have tried so far (usually, building with rbind and then names(d)) has come out with factors for the numbers, which is obviously not what I want.

this must be a pretty elementary request, so it should probably be an example under data.frame (or read.table). of course, it is probably somewhere---I just do not remember it and could not find it after 2 hours of searching. I also tried the r-help archives---at the very least, I hope we will get the answer there for future lookups.

regards, /iaw
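If row-wise entry is still preferred, one possible approach (a sketch, not something proposed in the thread) is to write each row as a list rather than with c(), so the numbers are never coerced to character, and then bind the rows together.

```r
# c() coerces everything to character, which is the source of the problem.
as.row.vector <- c("obs1name", 0.2, 0.3)

# Each row as a list keeps the types of its elements.
rows <- list(
  list(name = "obs1name", val1 = 0.2, val2 = 0.3),
  list(name = "obs2name", val1 = 0.4, val2 = 1.0),
  list(name = "obs3name", val1 = 0.6, val2 = 2.0)
)

# One single-row data frame per list, then rbind them all.
d <- do.call(rbind, lapply(rows, function(r) data.frame(r, stringsAsFactors = FALSE)))
```

This is slower than column-wise construction for large tables, but for a handful of hand-entered rows it keeps the layout readable.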
Re: [R] a scope problem
This looks like it probably is a scope problem with non-standard evaluation rules for the argument subset= of nnet. Instead of subset=sub[-i], try data=dftc[-i,]

(I've not tested this since I don't have the data objects you used.)

hope this helps,
Tony Plate

At Thursday 04:38 PM 6/10/2004, you wrote:

Hi,

I have some code that looks like:

dftc <- df[sets$tcset,]
pt <- numeric(nrow(dftc))
sub <- 1:nrow(dftc)
for (i in 1:nrow(dftc)) {
  n <- nnet( fmla, data=dftc, weights=wts, subset=sub[-i], size=4, decay=0.01)
  pt[i] <- predict( n, dftc[ i, ], type='class' )
}

However running this gives me the error:

Error in eval(expr, envir, enclos) : Object "i" not found

I have noted this problem in some other instances. For example if I define a function

f <- function( dat, sets ) {
  # use sets
}

I sometimes get an error similar to that above. Does anybody know why this would happen? (R 1.9.0 on Fedora Core 2)

---
Rajarshi Guha [EMAIL PROTECTED] http://jijo.cjb.net
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
All laws are simulations of reality.
-- John C. Lilly
Re: [R] Elementary sapply question
At Monday 12:57 PM 6/21/2004, Ajay Shah wrote:

[...snip...]

I am aware of the ... in sapply(). I am unable to understand how sapply will know where to utilise the x[i] values: as the 1st arg or the 2nd arg for f(x, y)? That is, when I say:

sapply(x, f, 3)

how does sapply know that I mean:

for (i in 3:5) { f(i, 3) }

and not

for (i in 3:5) { f(3, i) }

How would we force sapply to use one or the other interpretation?

All the functions in the apply() family construct the call by just appending the additional arguments after the first. If you supply argument names for the additional arguments, those will be supplied to the function called. This can be used to force different interpretations of arguments. E.g.:

> sapply(3:5, function(x, y) {return(y)}, 1)
[1] 1 1 1
> sapply(3:5, function(x, y) {return(y)}, y=1)
[1] 1 1 1
> sapply(3:5, function(x, y) {return(y)}, x=1)
[1] 3 4 5
> sapply(3:5, function(x, y) {return(y)}, z=1)
Error in FUN(X[[as.integer(1)]], ...) : unused argument(s) (z ...)

In the third example, the actual set of arguments in the call to the anonymous function is something like (3, x=1), so the standard argument interpretation rules result in the arguments having the values y=3, x=1.

hope this helps,
Tony Plate

Thanks, -ans.
--
Ajay Shah, Consultant [EMAIL PROTECTED]
Department of Economic Affairs http://www.mayin.org/ajayshah
Ministry of Finance, New Delhi
Re: [R] Re: summaries (was: SUMMARY: elementary sapply question)
Posting summaries is customary (or used to be) on S-news, where it was customary to reply to the poster, and not always the whole list. (Whereas on R it is requested that replies be posted to the entire list, which makes summaries less necessary.) However, a good summary can be a very useful thing (and Ajay's summary was very nicely done).

What about making it the custom on R-list that the recipient of helpful responses post a summary on a Wiki? This could be a good way for recipients of help to give something back to the community, and it might provide a sufficient input of energy to take a wiki past critical mass, such as the one mentioned by Gabor Grothendieck last year:

From: Gabor Grothendieck [EMAIL PROTECTED]
Date: Wed, 17 Dec 2003 11:53:59 -0500 (EST)

[snip]

Actually someone did set up an R wiki some time ago at:

http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl?RwikiHome

yet no one really used it. Some critical mass of use is needed to get such a project off the ground.

Comments?

-- Tony Plate
Re: [R] naive question
As far as I know, read.table() in S-plus performs similarly to read.table() in R with respect to speed. So, I wouldn't put high hopes in finding much satisfaction there.

I do frequently read large tables in S-plus, and with a considerable amount of work was able to speed things up significantly, mainly by using scan() with appropriate arguments. It's possible that some of the add-on modules for S-plus (e.g., the data-mining module) have faster I/O, but I haven't investigated those. I get the best read performance out of S-plus by using a homegrown binary file format with each column stored in a contiguous block of memory and meta data (i.e., column types and dimensions) stored at the start of the file. The S-plus read function reads the columns one at a time using readRaw(). One would be able to do something similar in R.

If you have to read from a text file, then, as others have suggested, writing a C program wouldn't be that hard, as long as you make the format inflexible.

-- Tony Plate

At Tuesday 06:19 PM 6/29/2004, Igor Rivin wrote:

I was not particularly annoyed, just disappointed, since R seems like a much better thing than SAS in general, and doing everything with a combination of hand-rolled tools is too much work. However, I do need to work with very large data sets, and if it takes 20 minutes to read them in, I have to explore other options (one of which might be S-PLUS, which claims scalability as a major, er, PLUS over R).
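A small sketch of what "scan() with appropriate arguments" might look like in R: the column types are declared up front through what=, so scan() does not have to guess them. The file contents and column names are invented for the example, and no timing claims are made for this toy input.

```r
# Invented tab-delimited file with a header row.
tmp <- tempfile()
writeLines(c("id\tx\tlabel",
             "1\t0.5\ta",
             "2\t1.5\tb"), tmp)

# Declare one template value per column: integer, double, character.
cols <- scan(tmp,
             what  = list(id = 0L, x = 0, label = ""),
             sep   = "\t",
             skip  = 1,        # skip the header line
             quiet = TRUE)
unlink(tmp)

d <- as.data.frame(cols, stringsAsFactors = FALSE)
```

read.table() offers a similar hint through its colClasses argument.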
Re: [R] naive question
To be careful, there's lots more to I/O than the functions read.table() and scan() -- I was only commenting on those, and no inference should be made about other aspects of S-plus I/O based on those comments!

I suspect that what has happened is that memory, CPU speed, and I/O speed have evolved at different rates, so what used to be acceptable code in read.table() (in both R and S-plus) is now showing its limitations and has reached the point where it can take half an hour to read in, on a readily-available computer, the largest data table that can be comfortably handled. I'm speculating, but 10 years ago, on a readily available computer, did it take half an hour to read in the largest data table that could be comfortably handled in S-plus or R? People who encounter this now are surprised and disappointed, and IMHO, somewhat justifiably so. The fact that R is an open source volunteer project suggests that the time is ripe for one of those disappointed people to fix the matter and contribute the function read.table.fast()!

-- Tony Plate

At Wednesday 10:08 AM 6/30/2004, Igor Rivin wrote:

Thank you! It's interesting about S-Plus, since they apparently try to support work with much larger data sets by writing everything out to disk (thus getting around the, e.g., address space limitations, I guess), so it is a little surprising that they did not tweak the I/O more...

Thanks again,
Igor
Re: [R] Can R read data from stdin?
The easiest way would probably be to do the hack of creating a temporary file to hold stdin, then call R to process that file. That would be easy to do in a shell script. If this really won't suffice, this older message might lead to something useful:

[Rd] R scripting patches for R-1.8.0
Neil McKay mckay at repsac.gmr.com
Thu Oct 16 20:30:20 MEST 2003

I've updated my scripting patches to R-1.8.0. These patches allow you to write shell scripts in R (at least on *nix systems) by putting

#!/path/to/R.bin --script

on the first line of the script file. If you're interested in the patches, e-mail me at mckay at gmr.com

--
Neil D. McKay, Mail Code 480-106-359    Phone: (586)986-1470 (GM:8-226-1470)
Manufacturing Systems Research Lab      FAX: (586)986-0574 (GM:8-226-0574)
GM Research & Development Center        Internet e-mail: mckay at gmr.com
30500 Mound Road
Warren, Mich. 48090

At Friday 02:17 PM 7/9/2004, Hayashi Soichi - shayas wrote:

Is there any way I can write a script which feeds an input data source from stdin, lets R process it (maybe a frequency report), then outputs the report to stdout? I can't seem to find much info in the documentation or FAQ on this topic.

Thanks!
Soichi Hayashi
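For the stdin question above, a rough sketch of a frequency report written against an arbitrary connection. The function name freq.report is hypothetical; passing file("stdin") to it in a script is an assumption about the R version in use, so a text connection with made-up input stands in here.

```r
# Hypothetical helper: tabulate the lines read from any connection.
freq.report <- function(con) {
  values <- readLines(con)
  table(values)
}

# In a script run from the shell one would try freq.report(file("stdin"));
# here a text connection plays the role of standard input.
con <- textConnection(c("a", "b", "a", "c", "a"))
report <- freq.report(con)
close(con)

print(report)   # the report goes to standard output
```

The temporary-file approach suggested in the reply works with the same function by passing the file name instead.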
Re: [R] Regular Expressions
I'd suggest doing it with multiple regular expressions -- you could construct a single regular expression for this, but I expect it would get quite complicated and possibly very slow. The expression for y in the example below tabulates how many words matched for each line (i.e., line 2 matched 1 word, line 3 matched 3 words, and line 4 matched 2 words).

> x <- readLines("clipboard", -1)
> x
[1] "Is there a way to use regular expressions to capture two or more words in a"
[2] "sentence? For example, I wish to to find all the lines that have the words \"thomas\","
[3] "\"perl\", and \"program\", such as \"thomas uses a program called perl\", or \"perl is a"
[4] "program that thomas uses\", etc."
> sapply(c("perl","program","thomas"), function(re) grep(re, x))
$perl
[1] 3

$program
[1] 3 4

$thomas
[1] 2 3 4

> unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=F)
[1] 3 3 4 2 3 4
> y <- table(unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=F))
> y

2 3 4
1 3 2
> which(y>=2)
3 4
2 3

hope this helps,
Tony Plate

At Monday 05:59 PM 7/12/2004, Sangick Jeon wrote:

Hi,

Is there a way to use regular expressions to capture two or more words in a sentence? For example, I wish to to find all the lines that have the words "thomas", "perl", and "program", such as "thomas uses a program called perl", or "perl is a program that thomas uses", etc.

I'm sure this is a very easy task, I would greatly appreciate any help. Thanks!

Sangick
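A compact variation on the multiple-regular-expression idea above, for the case where a line must contain all of the words. It uses grepl(), which is available in more recent versions of R than the one in this thread; the sample lines are invented.

```r
# Invented sample lines.
x <- c("thomas uses a program called perl",
       "perl is a program that thomas uses",
       "thomas likes python",
       "no match here")
words <- c("perl", "program", "thomas")

# One logical vector per word, combined with AND across words.
hits <- Reduce(`&`, lapply(words, function(w) grepl(w, x)))
matching.lines <- which(hits)
```

Replacing `&` with `|` would instead select lines containing at least one of the words.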
Re: [R] Stumped with subsetting
Seems to work fine for me if I understand correctly what you're trying to do (there are some typos in your message, which may mean I'm not understanding):

> data <- data.frame(x=1:3,y=4:6,z=7:9)
> data[c("x","y")]
  x y
1 1 4
2 2 5
3 3 6
> mylist <- c("x","y")
> data[mylist]
  x y
1 1 4
2 2 5
3 3 6
> data[,mylist]
  x y
1 1 4
2 2 5
3 3 6

I'd generally use the second form of subsetting above (i.e., data[,mylist], because that will work with matrices as well).

hope this helps,
Tony Plate

At Thursday 01:22 PM 7/29/2004, Peter Wilkinson wrote:

This seems like such a trivial thing to do: given a data.frame DF and variables w, v, x, y, z I can do DF["x"] or DF[c("x","y")]. If I create a vector, mylist = c("x","y"), then I do DF[mylist] I am not getting x and y, I get something else. What is the correct way to subset a data.frame by columns using a vector, as if I were doing DF["x","y"]?

Peter
Re: [R] lapply drops colnames
If you were preferring to use lapply() rather than for() for reasons of efficiency, you might want to test whether there actually is any difference. In a little test case, involving a data frame with 10,000 columns, I see no big difference. The advantage of a for loop in your situation is that it makes it easy to get at the column names.

> x <- data.frame(sapply(1:10000, FUN=rnorm, n=100))
> system.time(x1 <- unlist(lapply(x, sum)))
[1] 0.31 0.01 0.33 NA NA
> system.time({x2 <- numeric(ncol(x)); for (i in seq(len=ncol(x))) x2[i] <- sum(x[[i]])})
[1] 0.27 0.00 0.27 NA NA
> all.equal(x1, x2)
[1] TRUE

hope this helps,
Tony Plate

At Monday 04:35 PM 8/2/2004, Jack Tanner wrote:

Wolski wrote:

What you can do is to extend the column (list) by an additional attribute

attr(mydataframe[i],"info")<-names(mydataframe)[i]

and store their names in it.

OK, that's brilliant. Any ideas on how to do this automatically for every column in my dataframe? lapply(dataframe... fails for the obvious reason. Should I do something like this, or is for() to be avoided even in this case?

for(i in 1:length(a)) {print(names(a)[i])}
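Another way to keep the column name available, sketched here as an alternative to the for() loop above: iterate over names(x) instead of over the columns themselves, and look each column up by name inside the function. The data frame is a made-up example.

```r
x <- data.frame(a = 1:3, b = 4:6)

# The function receives the column name; the column is fetched with x[[nm]].
res <- sapply(names(x), function(nm) paste(nm, sum(x[[nm]])))
```

Because sapply() is called on a character vector, the result is also named by column, so res["b"] gives "b 15".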
Re: [R] How to import specific column(s) using read.table?
At Tuesday 01:55 PM 8/10/2004, F Duan wrote:

Thanks a lot. Your way works perfectly. And one more tiny question related to your code: my data file has many columns to be omitted (suppose the first 20 ones), but I found

scan(myfile, what=list(rep(NULL, 20), rep(0, 5)))

doesn't work. I had to type NULL 20 times and 0 five times in the list(...).

That's because rep(NULL, 20) returns a single NULL -- it's not obvious what else it could sensibly return. What you need to do is replicate 20 times a list containing NULL (and a list containing NULL is quite a different object to NULL). E.g.:

> rep(NULL, 20)
NULL
> c(rep(list(NULL), 3), rep(list(0), 2))
[[1]]:
NULL

[[2]]:
NULL

[[3]]:
NULL

[[4]]:
[1] 0

[[5]]:
[1] 0

Tony Plate

But anyway, it works and saves a lot of memory for me. Thank you again.

Frank

Quoting Gabor Grothendieck [EMAIL PROTECTED]:

Gabor Grothendieck ggrothendieck at myway.com writes:
:
: F Duan f.duan at yale.edu writes:
: : I have a very big tab-delim txt file with header and I only want to import
: : several columns into R. I checked the options for read.table and only
:
: Try using scan with the what=list(...) and flush=TRUE arguments.
: For example, if your data looks like this:
:
: 1 2 3 4
: 5 6 7 8
: 9 10 11 12
: 13 14 15 16
:
: then you could read columns 2 and 4 into a list with:

oops. That should be 1 and 3.

:   scan(myfile, what = list(0, NULL, 0), flush = TRUE)
:
: or read in and convert to a data frame via:
:
:   do.call(cbind, scan(myfile, what = list(0, NULL, 0), flush = TRUE))
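The rep(list(NULL), n) idiom above can be tried on a small invented file. This sketch skips the first three of five columns and keeps the last two as numbers; the file contents are made up for the example.

```r
# Invented whitespace-delimited file with five columns.
tmp <- tempfile()
writeLines(c("1 2 3 4 5",
             "6 7 8 9 10"), tmp)

nothing <- rep(NULL, 20)                         # still just NULL
what <- c(rep(list(NULL), 3), rep(list(0), 2))   # 3 skipped fields, 2 numeric

cols <- scan(tmp, what = what, quiet = TRUE)
unlink(tmp)

kept <- cols[!sapply(cols, is.null)]             # drop the skipped columns
```

The skipped fields appear as NULL components in the result, which is why kept filters them out before further use.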
RE: [R] numerical accuracy, dumb question
At Friday 08:41 PM 8/13/2004, Marc Schwartz wrote:
> Part of that decision may depend upon how big the dataset is and what is intended to be done with the ID's:
>
>     > object.size(1011001001001)
>     [1] 36
>     > object.size("1011001001001")
>     [1] 52
>     > object.size(factor("1011001001001"))
>     [1] 244
>
> They will by default, as Andy indicates, be read and stored as doubles. They are too large for integers, at least on my system:
>
>     > .Machine$integer.max
>     [1] 2147483647
>
> Converting to a character might make sense, with only a minimal memory penalty. However, using a factor results in a notable memory penalty, if the attributes of a factor are not needed.

That depends on how long the vectors are. The memory overhead for factors is per vector, with only 4 bytes used for each additional element (if the level already appears). The memory overhead for character data is per element -- there is no amortization for repeated values.

    > object.size(factor("1011001001001"))
    [1] 244
    > object.size(factor(rep(c("1011001001001","111001001001","001001001001","011001001001"),1)))
    [1] 308
    > # bytes per element in factor, for length 4:
    > object.size(factor(rep(c("1011001001001","111001001001","001001001001","011001001001"),1)))/4
    [1] 77
    > # bytes per element in factor, for length 1000:
    > object.size(factor(rep(c("1011001001001","111001001001","001001001001","011001001001"),250)))/1000
    [1] 4.292
    > # bytes per element in character data, for length 1000:
    > object.size(as.character(factor(rep(c("1011001001001","111001001001","001001001001","011001001001"),250))))/1000
    [1] 20.028

So, for long vectors with relatively few different values, storage as factors is far more memory efficient (this is because the character data is stored only once per level, and each element is stored as a 4-byte integer). (The above was done on Windows 2000.)

-- Tony Plate

> If any mathematical operations are to be performed with the ID's then leaving them as doubles makes most sense.
>
> Dan, more information on the numerical characteristics of your system can be found by using:
>
>     .Machine
>
> See ?.Machine and ?object.size for more information.
>
> HTH,
> Marc Schwartz
>
> On Fri, 2004-08-13 at 21:02, Liaw, Andy wrote:
>> If I'm not mistaken, numerics are read in as doubles, so that shouldn't be a problem. However, I'd try using factor or character.
>>
>> Andy
>>
>> From: Dan Bolser
>>> I store an id as a big number, could this be a problem? Should I convert to a string when I use read.table(...
>>>
>>> example id's
>>> 1001001001001
>>> 1001001001002
>>> ...
>>> 1002001002005
>>>
>>> Biggest is probably 1011001001001
>>>
>>> Ta,
>>> Dan.
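The claim that such IDs are safe as doubles can be checked directly. The sketch below is a rough illustration: the variable names are made up, and the byte counts are deliberately not shown because object.size() results differ between R versions and platforms (the figures in the thread are from 2004):

```r
id <- 1011001001001                 # largest ID mentioned in the thread

# Too large for R's 32-bit integer type ...
id > .Machine$integer.max           # TRUE

# ... but far below 2^53, up to which doubles represent every whole
# number exactly, so neighbouring IDs stay distinct.
id < 2^53                           # TRUE
(id + 1) - id == 1                  # TRUE

# Character storage preserves leading zeros, which a numeric read drops.
ids_chr <- c("001001001001", "1011001001001")
as.numeric(ids_chr[1])              # 1001001001: the leading zeros are gone

# Per-element storage for a long vector with few distinct values.
long_ids    <- rep(ids_chr, 500)
size_factor <- as.numeric(object.size(factor(long_ids))) / length(long_ids)
size_char   <- as.numeric(object.size(long_ids)) / length(long_ids)
c(factor = size_factor, character = size_char)
```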
Re: [R] Suggestion for posting guide
When I originally compiled the posting guide many people felt that it should be kept as concise as possible, so that its length would not discourage people from reading it. (It probably ended up too long anyway.) So I wouldn't really recommend adding a section of this length to it.

That said, a question posted with a good example that can be cut and pasted directly into R is far easier to respond to, so it does seem like a good idea to help people create such things. If someone (Gabor?) wanted to create a page on how to provide good examples in posts, the people who control what gets put on the R-project site might be willing to put it up there, and a link to it from the posting guide would seem like a good idea.

Tony Plate

At Thursday 07:17 AM 8/19/2004, Gabor Grothendieck wrote:
> I have a suggestion for the posting guide.
>
> One problem with some posts is that they do not provide an example that can be reproduced. I think that many people just do not know how to easily specify some data, and some technical assistance should be provided in the posting guide.
>
> If the problem depends on specific data they should be made aware, in the posting guide, of:
>
>     dput(x)
>
> since that outputs object x as R code which can then be easily copied from the post and pasted into a session.
>
> If it's not dependent on particular data they can generate patterned or random data IF THEY KNOW HOW, but many might find it easier to just use one of the included datasets, so some guidance should be provided on the contents of a few of them, e.g.
>
> R comes with built in data sets. data() will list them, data(iris) will attach data set iris and ?iris, str(iris), summary(iris), head(iris) and dput(iris) will give more information on iris (after attaching it).
>
> The following are a few of the datasets that come with R:
>
>     iris - data frame with 4 numeric columns and one 3 level factor
>     nhtemp - a ts class time series
>     faithful - data frame with two numeric columns
>     warpbreaks - data with a numeric column, a 2-level factor and a 3-level factor
>
> Also letters, LETTERS, month.abb and month.name are built in character vectors that do not require a data statement to access.
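As a small illustration of the dput() suggestion, the sketch below writes a made-up data frame as R code and rebuilds it from that text, which is roughly what happens when a reader pastes dput() output from a post into a session. The object names are hypothetical:

```r
# A small made-up data frame standing in for a poster's real data.
x <- data.frame(id = 1:3, grp = c("a", "b", "a"), stringsAsFactors = FALSE)

# dput() emits R code that recreates the object; capture it as text here.
code <- capture.output(dput(x))
cat(code, sep = "\n")

# Pasting that text into another session amounts to parsing and evaluating it.
y <- eval(parse(text = code))
identical(x, y)
```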
Re: [R] Loss of rownames and colnames
At Friday 11:46 AM 8/20/2004, Min-Han Tan wrote:
> Hi,
>
> I am working on some microarray data, and have some problems with writing iterations. In essence, the problem is that objects with three dimensions don't have rownames and colnames. These colnames and rownames would otherwise still be there in 2 dimensional objects. I need to generate multiple iterations of a 2 means-clustering algorithm, and these objects thus probably need 3 dimensions.

What objects are you using that are three dimensional but don't have dimension names? Ordinary arrays have dimension names, and rownames() and colnames() extract the names on the first two dimensions:

    > x <- array(1:12,dim=c(2,3,2),dimnames=list(letters[1:2],LETTERS[24:26],letters[20:21]))
    > x
    , , t

      X Y Z
    a 1 3 5
    b 2 4 6

    , , u

      X  Y  Z
    a 7  9 11
    b 8 10 12

    > dimnames(x)[[1]]
    [1] "a" "b"
    > dimnames(x)[[2]]
    [1] "X" "Y" "Z"
    > x[,"Y",]
      t  u
    a 3  9
    b 4 10
    > rownames(x)
    [1] "a" "b"
    > colnames(x)
    [1] "X" "Y" "Z"

If you need a convenient way to construct three dimensional objects, you can use abind() from the abind package, e.g.:

    > library(abind) # you will have to install the package from CRAN first
    > x1 <- matrix(1:6,nrow=2,dimnames=list(letters[1:2],LETTERS[24:26]))
    > x2 <- matrix(7:12,nrow=2,dimnames=list(letters[1:2],LETTERS[24:26]))
    > abind(list(t=x1, u=x2), along=3)
    , , t

      X Y Z
    a 1 3 5
    b 2 4 6

    , , u

      X  Y  Z
    a 7  9 11
    b 8 10 12

(The objects to be bound don't have to be given to abind() in a list, but this manner of invocation is convenient when one happens to have a list of objects to be bound together, as one might get in the result from lapply().)

> My scripts are all written with heavy references to matching of colnames and rownames, so I am running into some problems here. (colnames = sample ids, and rownames = gene ids)

Last time I looked, subscripting matrices and arrays with strings was very slow (for large objects), so if you are using character subscripts and you're having problems with slowness, consider doing the indexing yourself using match(), e.g.:

    > x <- matrix(rnorm(26^4), ncol=26, dimnames=list(paste(rep(letters,each=26^2),rep(letters,each=26),letters,sep=""), LETTERS))
    > dim(x)
    [1] 17576    26
    > xr <- sample(rownames(x), 1)
    > length(xr)
    [1] 1
    > system.time(y <- x[xr, ])
    [1] 2.22 0.00 2.30   NA   NA
    > system.time(y <- x[match(xr, rownames(x)), ])
    [1] 0.09 0.00 0.09   NA   NA

HTH

-- Tony Plate

> My bad workaround solution so far has been to generate objects tagged with .2, and have multiple blocks of code, e.g.
>
>     test.1 <- ...
>     test.2 <- ...
>     test.x <- ..
>
> The obvious problem with this solution is that there does not seem to be an easy way of manipulating all these objects together without typing out their names individually.
>
> Thanks for any advice.
>
> Regards,
> Min-Han
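For readers without the abind package, a list of equally shaped matrices can also be stacked with base R alone. This is only a sketch under that assumption (the names runs, x1 and x2 are illustrative, and it is not the abind() implementation), and it also shows the match() idea on a named dimension:

```r
# Two small matrices with shared row and column names (made-up values).
x1 <- matrix(1:6,  nrow = 2, dimnames = list(c("a", "b"), c("X", "Y", "Z")))
x2 <- matrix(7:12, nrow = 2, dimnames = list(c("a", "b"), c("X", "Y", "Z")))
runs <- list(t = x1, u = x2)

# Stack the list into a 2 x 3 x 2 array, carrying all names along.
x <- array(unlist(runs),
           dim = c(dim(x1), length(runs)),
           dimnames = c(dimnames(x1), list(names(runs))))

rownames(x)          # "a" "b"
colnames(x)          # "X" "Y" "Z"
x["a", "Y", "u"]     # 9

# Character subscripts can be converted to positions with match() first.
rows <- match(c("b", "a"), rownames(x))
x[rows, "Z", "t"]    # b = 6, a = 5
```

Keeping each iteration as one slice of a single array avoids the test.1, test.2, ... naming scheme, since all iterations can then be addressed through the third index.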
Re: [R] apply ( , , table)
apply() tries to be a bit smart about what it does (sometimes maybe too smart), but it actually is pretty useful a lot of the time. It's extremely widely used, so changing the behavior is not an option -- changing the behavior would break a lot of existing code. (Personally, I'd prefer it if apply() put its dimensions back together in a slightly more intelligent way, i.e., if apply(x, 1, c) and apply(x, 2, c) returned the same thing, but apply is how it is.)

In situations where you don't want apply() to try to construct a matrix from your results, you can wrap the results in a list, to force apply() to return just a list of results, e.g. (the outer lapply() strips off an unnecessary level of list depth):

    > b2 <- lapply(apply (a, 1, function(x) list(table(x))), "[[", 1)
    > length(b2)
    [1] 4
    > b2[[1]]
    x
    1 2 6 7
    2 1 1 1
    > attributes(b2[[1]])
    $dim
    [1] 4

    $dimnames
    $dimnames$x
    [1] "1" "2" "6" "7"


    $class
    [1] "table"

Your particular case might benefit from more information given to table, which allows it to provide results in a more uniform format, e.g.:

    > b1 <- apply (a, 1, function(x) table(factor(x, levels=0:9)))
    > b1
      [,1] [,2] [,3] [,4]
    0    0    1    0    0
    1    2    1    1    2
    2    1    0    0    1
    3    0    1    0    0
    4    0    2    2    0
    5    0    0    1    1
    6    1    0    0    1
    7    1    0    0    0
    8    0    0    1    0
    9    0    0    0    0

hope this helps,

Tony Plate

At Tuesday 10:42 AM 8/24/2004, [EMAIL PROTECTED] wrote:
>     a <- matrix (c(
>     7, 1, 1, 2, 6,
>     3, 4, 0, 1, 4,
>     5, 1, 8, 4, 4,
>     6, 1, 1, 2, 5), nrow=4, byrow=TRUE)
>
>     b <- apply (a, 1, table)
>
> apply documentation says clearly that if the rows of the result of FUN are the same length, then an array will be returned. And column-major would be the appropriate order in R. But b above is pretty opaque compared to what one would expect, and what one would get from apply ( , , table) if the rows were not of equal length. One needs to do something like
>
>     n <- matrix (apply (a, 1, function (x) unique (sort (x))), nrow=nrow(a))
>
> to get the corresponding names of b to figure out the counts.
>
> Denis White
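The fixed-levels approach can be taken one small step further so that counts are looked up by value label. The sketch below reuses the matrix from the question; the names counts and counts_by_row are made up for the example:

```r
a <- matrix(c(7, 1, 1, 2, 6,
              3, 4, 0, 1, 4,
              5, 1, 8, 4, 4,
              6, 1, 1, 2, 5), nrow = 4, byrow = TRUE)

# Fixing the levels makes every row's table the same length (0..9),
# so apply() can assemble the results into a labelled matrix.
counts <- apply(a, 1, function(x) table(factor(x, levels = 0:9)))
dim(counts)               # 10 4: one row per value, one column per row of 'a'

# Transpose so that rows of the result line up with rows of 'a'.
counts_by_row <- t(counts)
counts_by_row[1, "1"]     # 2: the value 1 occurs twice in the first row of 'a'
counts_by_row[2, "4"]     # 2: the value 4 occurs twice in the second row
```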
Re: [R] (no subject)
Looks like there might have been some truncation of Jonathan Baron's message. Here's one way of computing the sample mode of a vector.

    > set.seed(1)
    > x <- sample(1:5,20,rep=T)
    > x
     [1] 2 2 3 5 2 5 5 4 4 1 2 1 4 2 4 3 4 5 2 4
    > table(x)
    x
    1 2 3 4 5
    2 6 2 6 4
    > names(which.max(table(x)))
    [1] "2"

Note that this method returns the first max value in the case of ties.

hope this helps,

Tony Plate

At Tuesday 11:01 AM 8/24/2004, Jonathan Baron wrote:
> On 08/24/04 13:50, Paolo Tommasini wrote:
>> Hi my name is Paolo Tommasini does anyone know how to compute a mode (most frequent element) for a distribution?
>
> which.max
>
> --
> Jonathan Baron, Professor of Psychology, University of Pennsylvania
> Home page: http://www.sas.upenn.edu/~baron
> R search page: http://finzi.psych.upenn.edu/
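If ties matter, one possible variation is to return every value that reaches the maximum count. The helper name all_modes below is hypothetical (it is not a base R function), and the vector is the sample printed in the thread, typed in directly so the result does not depend on the random number generator:

```r
x <- c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2, 4, 3, 4, 5, 2, 4)

# all_modes() is a hypothetical helper, not part of base R.
all_modes <- function(v) {
  tab <- table(v)
  names(tab)[tab == max(tab)]
}

names(which.max(table(x)))   # "2": only the first of the tied values
all_modes(x)                 # "2" "4": every value with the maximum count
```

Both functions return character labels; wrap the result in as.numeric() if numeric values are needed.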
Re: [R] S - R
Have you tried following the advice in the R Data Import/Export manual? It suggests the following:

    Function data.restore reads S-PLUS data dumps (created by data.dump) with the same restrictions (except that dumps from the Alpha platform can also be read). It should be possible to read data dumps from S-PLUS 5.x and 6.x written with data.dump(oldStyle=T).

-- Tony Plate

At Wednesday 10:29 AM 8/25/2004, Zachary Skrivanek wrote:
> Hello! I would like to be able to read in list data objects in R/S created in R/S. (Ie R-S or S-R.) I have tried 'dput' and 'dump' in S, but neither of the created files could be read into R (with 'dget' nor 'source'). Is there any way that I can save a list object in S that can be read into R?
>
> Sincerely,
> Zachary Skrivanek, PhD
> Research Scientist
> Program Phase Statistics-Endocrine
Re: [R] S - R
I think the issue is that dput() and dget() don't work for some more complex structures (as you point out, they do appear to work for simple structures). The R Data Import/Export manual doesn't mention using dput and dget to transfer objects between R and S-PLUS, perhaps because these functions have limited coverage? E.g.:

    S-PLUS6.1> junk <- list(f=as.name("g"))
    S-PLUS6.1> dput(junk, "junk1.dat")
    S-PLUS6.1> data.dump("junk", file="junk2.dat", oldStyle=F)
    S-PLUS6.1> data.dump("junk", file="junk3.dat", oldStyle=T)

    R> dget("junk1.dat")
    Error in eval(expr, envir, enclos) : Object g not found
    R> # with package foreign loaded
    R> data.restore("junk2.dat")
    Error in ReadSdump(TRUE, ) : S mode junk (near byte offset 45) not supported
    In addition: Warning message:
    NAs introduced by coercion
    R> data.restore("junk3.dat")
    [1] "junk3.dat"
    R> junk
    $f
    g

-- Tony Plate

At Wednesday 11:45 AM 8/25/2004, Rolf Turner wrote:
> I'm puzzled by the discourse in this thread. Briefly, dput() and dget() seem to work just fine for me. I tried
>
>     junk <- list(x=rnorm(20),y=sample(1:100,12,TRUE))
>     dput(junk,"junk.dat")
>
> in Splus (Version 6.1.2 Release 2 for Sun SPARC, SunOS 5.6 : 2002) and then in R
>
>     junk <- dget("junk.dat")
>
> R version:
>
>     platform sparc-sun-solaris2.9
>     arch     sparc
>     os       solaris2.9
>     system   sparc, solaris2.9
>     status
>     major    1
>     minor    9.1
>     year     2004
>     month    06
>     day      21
>     language R
>
> There were no complaints, and typing ``junk'' in the R window and in the Splus window appeared to produce identical results.
>
> So what's the problem?
>
> cheers,
> Rolf Turner
> [EMAIL PROTECTED]
Re: [R] terminate R program when trying to access out-of-bounds array element?
One way could be to make a special class with an indexing method that checks for out-of-bounds numeric indices. Here's an example for vectors:

    > setOldClass(c("oobcvec"))
    > x <- 1:3
    > class(x) <- "oobcvec"
    > x
    [1] 1 2 3
    attr(,"class")
    [1] "oobcvec"
    > "[.oobcvec" <- function(x, ..., drop=T) {
    +    if (!missing(..1) && is.numeric(..1) && any(is.na(..1) | ..1 < 1 | ..1 > length(x)))
    +        stop("numeric vector out of range")
    +    NextMethod("[")
    + }
    > x[2:3]
    [1] 2 3
    > x[2:4]
    Error in "[.oobcvec"(x, 2:4) : numeric vector out of range

Then, for vectors for which you want out-of-bounds checks done when they are indexed, set the class to "oobcvec". This should work for simple vectors (I checked, and it works if the vectors have names).

If you want to write a method like this for indexing matrices, you can use ..1 and ..2 to refer to the i and j indices. If you want to also be able to check for missing character indices, you'll just need to add more code. Note that the above example disallows 0 and negative indices, which may or may not be what you want.

If you're extensively using other classes that you've defined, and you want out-of-bounds checking for them, then you need to integrate the checks into the subsetting methods for those classes -- you can't just use the above approach.

hope this helps,

Tony Plate

Vivek Rao wrote:
> I want R to stop running a script (after printing an error message) when an array subscript larger than the length of the array is used, for example
>
>     x = c(1)
>     print(x[2])
>
> rather than printing NA, since trying to access such an element may indicate an error in my program. Is there a way to get this behavior in R? Explicit testing with the is.na() function everywhere does not seem like a good solution. Thanks.
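The remark about character indices could be handled along the following lines. This is a rough sketch rather than the method from the post: it uses an explicit i argument instead of ..1, it makes up its own error messages, and it skips setOldClass(), which plain S3 dispatch does not need:

```r
# Sketch of a checking "[" method for class "oobcvec" (messages are made up).
"[.oobcvec" <- function(x, i, ..., drop = TRUE) {
  if (!missing(i)) {
    # Numeric indices must lie in 1..length(x); NA, 0 and negatives are rejected.
    if (is.numeric(i) && any(is.na(i) | i < 1 | i > length(x)))
      stop("numeric index out of range")
    # Character indices must all match an existing name.
    if (is.character(i) && !all(i %in% names(x)))
      stop("character index not found in names")
  }
  NextMethod("[")
}

x <- structure(c(a = 1, b = 2, c = 3), class = "oobcvec")

ok       <- unclass(x["b"])                                           # element named "b"
bad_name <- tryCatch(x["z"], error = function(e) conditionMessage(e))
bad_pos  <- tryCatch(x[4],   error = function(e) conditionMessage(e))
bad_name   # "character index not found in names"
bad_pos    # "numeric index out of range"
```

In a script the tryCatch() calls would be left out, so that the stop() terminates the run as requested in the question.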
Re: [R] terminate R program when trying to access out-of-bounds array element?
Oops. The message in the 'stop' should be something more like "numeric index out of range".

-- Tony Plate
[R] pointer to comments re Paul Murrell's new book, R, SAS on Andrew Gelman's blog
There are some interesting comments re Paul Murrell's new book, R, SAS on Andrew Gelman's blog: http://www.stat.columbia.edu/~cook/movabletype/archives/2005/04/a_new_book_on_r.html

-- Tony Plate
Re: [R] Proba( Ut+2=1 / ((Ut+1==1) & (Ut==1))) ?
table() can return all the n-gram statistics, e.g.:

    > v <- sample(c(-1,1), 1000, rep=TRUE)
    > table("v_{t-2}"=v[-seq(to=length(v), len=2)], "v_{t-1}"=v[-c(1,length(v))], "v_t"=v[-(1:2)])
    , , v_t = -1

           v_{t-1}
    v_{t-2}  -1   1
         -1 136 134
         1  131 112

    , , v_t = 1

           v_{t-1}
    v_{t-2}  -1   1
         -1 131 113
         1  115 126

This says that there were 136 cases in which a -1 followed two -1's (and 126 cases in which a 1 followed two 1's).

If you're really only interested in particular contexts, you can do something like:

    > table(v[-seq(to=length(v), len=2)]==1 & v[-c(1,length(v))]==1 & v[-(1:2)]==1)

    FALSE  TRUE
      872   126
    > table(v[-seq(to=length(v), len=2)]==-1 & v[-c(1,length(v))]==-1 & v[-(1:2)]==-1)

    FALSE  TRUE
      862   136

or

    > sum(v[-seq(to=length(v), len=2)]==-1 & v[-c(1,length(v))]==-1 & v[-(1:2)]==-1)
    [1] 136

vincent wrote:
> Dear all,
>
> First I apologize if my question is quite simple, but I'm very much a newbie with R.
>
> I have vectors of the form v = c(1,1,-1,-1,-1,1,1,1,1,-1,1) (longer than this one of course). The elements are only +1 or -1.
>
> I would like to calculate:
> - the frequencies of -1 occurrences after 2 consecutive -1
> - the frequencies of +1 occurrences after 2 consecutive +1
>
> It looks probably something like: Proba( Ut+2=1 / ((Ut+1==1) & (Ut==1)))
>
> Could someone please give me a little hint about how I should/could begin to proceed?
>
> Thanks
> (Thanks also to the R creators/contributors, this soft seems really great!)
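To turn such counts into the conditional frequencies asked about, the count of a symbol following two identical symbols can be divided by the number of times that two-symbol context occurs. A minimal sketch on the short vector from the question (cond_freq is a hypothetical helper name):

```r
v <- c(1, 1, -1, -1, -1, 1, 1, 1, 1, -1, 1)   # example vector from the question
n <- length(v)

prev2 <- v[1:(n - 2)]   # v_{t-2}
prev1 <- v[2:(n - 1)]   # v_{t-1}
cur   <- v[3:n]         # v_t

# Share of times the symbol s follows two consecutive s (hypothetical helper).
cond_freq <- function(s) {
  ctx <- prev2 == s & prev1 == s
  sum(cur[ctx] == s) / sum(ctx)
}

cond_freq(1)    # 0.5: the context (1, 1) occurs 4 times and is followed by 1 twice
cond_freq(-1)   # 0.5: the context (-1, -1) occurs twice and is followed by -1 once
```

If a context never occurs, sum(ctx) is 0 and the result is NaN, which a longer script may want to handle explicitly.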
Re: [R] Index matrix to pick elements from 3-dimensional matrix
I'm assuming what you want to do is randomly sample from slices of A selected on the 3rd dimension, as specified by J. Here's a way that uses indexing by a matrix. The cbind() builds a three column matrix of indices, the first two of which are randomly selected. The use of replace() is to make the result have the same attributes, e.g., dim and dimnames, as J.

    > A <- array(letters[1:12],c(2,2,3))
    > J <- matrix(c(1,2,3,3),2,2)
    > replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T), sample(dim(A)[2], length(J), rep=T), as.vector(J))])
         [,1] [,2]
    [1,] "b"  "l"
    [2,] "f"  "k"
    > replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T), sample(dim(A)[2], length(J), rep=T), as.vector(J))])
         [,1] [,2]
    [1,] "b"  "l"
    [2,] "h"  "i"
    > replace(J, TRUE, A[cbind(sample(dim(A)[1], length(J), rep=T), sample(dim(A)[2], length(J), rep=T), as.vector(J))])
         [,1] [,2]
    [1,] "c"  "l"
    [2,] "h"  "k"

-- Tony Plate

Robin Hankin wrote:
> Hello Juhana
>
> try this (but there must be a better way!)
>
>     stratified.select <- function(A,J){
>       out <- sapply(J,function(i){sample(A[,,i],1)})
>       attributes(out) <- attributes(J)
>       return(out)
>     }
>
>     A <- array(letters[1:12],c(2,2,3))
>     J <- matrix(c(1,2,3,3),2,2)
>
>     R> stratified.select(A,J)
>          [,1] [,2]
>     [1,] "b"  "i"
>     [2,] "g"  "k"
>     R> stratified.select(A,J)
>          [,1] [,2]
>     [1,] "d"  "j"
>     [2,] "f"  "l"
>     R>
>
> best wishes
>
> Robin
>
> On Apr 26, 2005, at 05:16 am, juhana vartiainen wrote:
>> Hi all
>>
>> Suppose I have a dim=c(2,2,3) matrix A, say:
>>
>>     A[,,1]=
>>     a b
>>     c d
>>
>>     A[,,2]=
>>     e f
>>     g h
>>
>>     A[,,3]=
>>     i j
>>     k l
>>
>> Suppose that I want to create a 2x2 matrix X, which picks elements from the above-mentioned submatrices according to an index matrix J referring to the depth dimension:
>>
>>     J=
>>     1 3
>>     2 3
>>
>> In other words, I want X to be
>>
>>     X=
>>     a j
>>     g l
>>
>> since the matrix J says that the (1,1)-element should be picked from A[,,1], the (1,2)-element should be picked from A[,,3], etc.
>>
>> I have A and I have J. Is there an expression in A and J that creates X?
>>
>> Thanks
>> Juhana
>> [EMAIL PROTECTED]
>>
>> --
>> Juhana Vartiainen
>> docent in economics
>> Director, FIEF (Trade Union Foundation for Economic Research, Stockholm), http://www.fief.se
>> gsm +46 70 360 9915
>> office +46 8 696 9915
>> email [EMAIL PROTECTED]
>> homepage http://www.fief.se/staff/Juhana/index.html
>
> --
> Robin Hankin
> Uncertainty Analyst
> Southampton Oceanography Centre
> European Way, Southampton SO14 3ZH, UK
> tel 023-8059-7743
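Read literally, the original question asks for a deterministic pick, X[i, j] = A[i, j, J[i, j]], rather than a random sample. Under that assumption, the same matrix-indexing idea works with row and column positions taken from J itself. Note that array() fills column by column, so the letters land differently from the row-wise layout drawn in the question:

```r
A <- array(letters[1:12], c(2, 2, 3))
J <- matrix(c(1, 2, 3, 3), 2, 2)

# One index row per cell of J: (row, column, slice chosen by J).
idx <- cbind(as.vector(row(J)), as.vector(col(J)), as.vector(J))

# Indexing a 3-d array by a 3-column matrix picks one element per index row.
X <- matrix(A[idx], nrow(J), ncol(J))
X
#      [,1] [,2]
# [1,] "a"  "k"
# [2,] "f"  "l"
```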
Re: [R] Summarizing factor data in table?
Do you want to count the number of non-NA divisions and organizations in the data for each year (where duplicates are counted as many times as they appear)?

    > tapply(!is.na(foo$div), foo$yr, sum)
    1998 1999 2000
       0    4    2
    > tapply(!is.na(foo$org), foo$yr, sum)
    1998 1999 2000
       4    4    2

Or perhaps the number of unique non-NA divisions and organizations in the data for each year?

    > tapply(foo$div, foo$yr, function(x) length(na.omit(unique(x))))
    1998 1999 2000
       0    4    2
    > tapply(foo$org, foo$yr, function(x) length(na.omit(unique(x))))
    1998 1999 2000
       4    4    2

(I don't understand where the 3 in your desired output comes from though, which maybe indicates I completely misunderstand your request.)

Andy Bunn wrote:
> I have a very simple query with regard to summarizing the number of factors present in a certain snippet of a data frame. Given the following data frame:
>
>     foo <- data.frame(yr = c(rep(1998,4), rep(1999,4), rep(2000,2)),
>                       div = factor(c(rep(NA,4),"A","B","C","D","A","C")),
>                       org = factor(c(1:4,1:4,1,2)))
>
> I want to get two new variables. Object ndiv would give the number of divisions by year:
>
>     1998 0
>     1999 3
>     2000 2
>
> Object norgs would give the number of organizations:
>
>     1998 4
>     1999 4
>     2000 2
>
> I figure xtabs should be able to do it, but I'm stuck without a for loop. Any suggestions?
>
> -Andy
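Since the question mentions cross-tabulation, the unique counts can also be obtained from a contingency table, as one possible alternative to tapply(). The sketch below uses the data frame from the question and relies on table() dropping NA values by default:

```r
foo <- data.frame(yr  = c(rep(1998, 4), rep(1999, 4), rep(2000, 2)),
                  div = factor(c(rep(NA, 4), "A", "B", "C", "D", "A", "C")),
                  org = factor(c(1:4, 1:4, 1, 2)))

# Each cell counts the non-missing occurrences of a level within a year.
tab_div <- table(foo$yr, foo$div)

# A level is present in a year when its cell count is positive.
ndiv  <- rowSums(tab_div > 0)
norgs <- rowSums(table(foo$yr, foo$org) > 0)
ndiv    # 1998: 0, 1999: 4, 2000: 2
norgs   # 1998: 4, 1999: 4, 2000: 2
```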
Re: [R] Defining binary indexing operators
It's not necessary to be that complicated, is it? AFAIK, the '$' operator is treated specially by the parser so that its RHS is treated as a string, not a variable name. Hence, a method for "$" can just take the indexing argument directly as given -- no need for any fancy language tricks (eval(), etc.)

    > x <- structure(3, class = "myclass")
    > y <- 5
    > foo <- function(x,y) paste(x, " indexed by '", y, "'", sep="")
    > foo(x, y)
    [1] "3 indexed by '5'"
    > "$.myclass" <- foo
    > x$y
    [1] "3 indexed by 'y'"

The point of the above example is that foo(x,y) behaves differently from x$y even when both call the same function: foo(x,y) uses the value of the variable 'y', whereas x$y uses the string "y". This is as desired for an indexing operator "$".

-- Tony Plate

Gabor Grothendieck wrote:
> On 4/27/05, Ali - [EMAIL PROTECTED] wrote:
>> Assume we have a function like:
>>
>>     foo <- function(x, y)
>>
>> how is it possible to define a binary indexing operator, denoted by $, so that
>>
>>     x$y
>>
>> functions the same as
>>
>>     foo(x, y)
>
> Here is an example. Note that $ does not evaluate y so you have to do it yourself:
>
>     x <- structure(3, class = "myclass")
>     y <- 5
>     foo <- function(x,y) x+y
>     "$.myclass" <- function(x, i) { i <- eval.parent(parse(text=i)); foo(x, i) }
>     x$y # structure(8, class = "myclass")
Re: [R] Defining binary indexing operators
Excuse me! I misunderstood the question, and indeed, it is necessary to be that complicated when you try to make x$y behave the same as foo(x,y), rather than foo(x,"y") (doing the former would be inadvisable, as I think someone else pointed out too).
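The two behaviours discussed in this thread can be put side by side. The sketch below uses made-up class names (byname, byvalue) and looks the variable up with get() in the caller's frame, which is an alternative to the eval.parent(parse()) approach quoted in the thread, not a recommendation of either:

```r
foo <- function(x, y) unclass(x) + y

# Variant 1: the name after `$` arrives as a literal string (the usual meaning).
"$.byname" <- function(x, name) paste0(unclass(x), " indexed by '", name, "'")

# Variant 2: treat the name as a variable in the caller's frame, so that
# x$y behaves like foo(x, y).
"$.byvalue" <- function(x, name) foo(x, get(name, envir = parent.frame()))

y  <- 5
x1 <- structure(3, class = "byname")
x2 <- structure(3, class = "byvalue")

x1$y   # "3 indexed by 'y'"
x2$y   # 8
```

Variant 2 fails with an error when no variable of that name is visible from the caller, which is one reason the thread calls this kind of behaviour inadvisable.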
Re: [R] Getting the name of an object as character
If you're trying to find the textual form of an actual argument, here's one way:

    > foo <- function(x) {
    +    xn <- substitute(x)
    +    if (is.name(xn) && !exists(as.character(xn)))
    +        as.character(xn)
    +    else
    +        x
    + }
    > foo(x)
    [1] 3
    > foo(xx)
    [1] "xx"
    > foo(list(xx))
    Error in foo(list(xx)) : Object "xx" not found

If you want the textual form of arguments that are expressions, use deparse() and a different test (beware that deparse() can return a vector of character data).

Although you can do this in R, it is not always advisable practice. Many people who have written functions with non-standard evaluation rules like this have come to regret it. One reason is that it makes these functions difficult to use in programs; another is that the behavior of the function can depend upon what global variables exist; another is that when the function works as intended, that's great, but when it doesn't, users can get quite confused trying to figure out what it's doing. The R function help() is an example of a commonly used function with a non-standard evaluation rule.

-- Tony Plate

Ali - wrote:
> This could be really trivial, but I cannot find the right function to get the name of an object as a character.
>
> Assume we have a function like:
>
>     getName <- function(obj)
>
> Now if we call the function like:
>
>     getName(blabla)
>
> and 'blabla' is not a defined object, I want getName to return "blabla". In other words, if
>
>     paste("blabla")
>
> returns
>
>     "blabla"
>
> I want to define a paste function which returns the same character by:
>
>     paste(blabla)
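For the deparse() route mentioned above, a minimal sketch might look like this. The name arg_text is hypothetical, and the paste(..., collapse = " ") step is there because deparse() can return several strings for a long expression:

```r
# Returns the source text of the argument without evaluating it,
# so the argument does not have to exist.
arg_text <- function(x) paste(deparse(substitute(x)), collapse = " ")

arg_text(blabla)          # "blabla"
arg_text(a + b)           # "a + b"
arg_text(list(xx, 1:3))   # "list(xx, 1:3)"
```

Unlike the foo() above, this version never evaluates its argument, so it returns text even when the object does exist.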
Re: [R] Reconstruction of a valid expression within a function
You are passing just a string to subset(). At the very least you need to parse it (but even then this does not work easily with subset() -- see below).

But are you sure you need to do this? subset() for dataframes already accepts subset expressions involving the columns of the dataframe, e.g.:

> df <- data.frame(x=1:10,y=rep(1:5,2))
> subset(df, y==2)
  x y
2 2 2
7 7 2

However, it's tricky to get subset() to work with an expression for its subset argument. This is because of the way it evaluates its subset expression (look at the code for subset.data.frame()).

> subset(df, parse(text="df$y==2"))
Error in subset.data.frame(df, parse(text = "df$y==2")) :
        'subset' must evaluate to logical
> subset(df, parse(text="y==2"))
Error in subset.data.frame(df, parse(text = "y==2")) :
        'subset' must evaluate to logical

It's a little tricky in general passing R language expressions around, because many functions that work with expressions work with the unevaluated form of the actual argument, rather than with an R language expression as the value of a variable. E.g.:

> with(df, y==2)
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
> cond <- parse(text="y==2")
> cond
expression(y == 2)
> with(df, cond)
expression(y == 2)

One way to make these types of functions work with R language expressions as the value of a variable is to use do.call():

> do.call("with", list(df, cond))
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE

So, returning to subset(), you can give it an expression that is stored in the value of a variable like this:

> do.call("subset", list(df, cond))
  x y
2 2 2
7 7 2

However, if you're a beginner at R, I suspect that you'll get much further if you avoid such meta-language constructs and just find a way to make subset() work for you without trying to paste together R language expressions.

Hope this helps,

-- Tony Plate

Pascal Boisson wrote:

Hello all,

I have some trouble in reconstructing a valid expression within a function, here is my question. I am building a function:

SUB <- function(DF, subset=TRUE) {
  # where DF is a data frame, with Var1, Var2, Fact1, Fact2, Fact3
  # and subset would be an expression, eg. Fact3 == 1
  # in a first time I want to build a subset from DF
  # I managed to, with an expression like eg. DF$Fact3,
  # but I would like to skip the DF$ for convenience
  # so I tried something like this:
  tabsub <- deparse(substitute(subset))
  dDF <- deparse(substitute(DF))
  if (tabsub[1]!="TRUE") { subset <- paste(dDF,"$",tabsub,sep="") }
  # At this point, I have a string that seems to be the expression that I want
  sDF <- subset(DF, subset)
}

# But I have an error message:
Error in r & !is.na(r) : operations are possible only for numeric or logical types

I cannot understand why that is, even after I've tried to convert the string properly into an expression. I've spent all day trying to sort out that problem ... Maybe this attempt is awkward and I have not understood what is really behind an expression. But if anyone could give me a tip concerning this problem or point me to relevant references, I would really appreciate it.

Thanks

Pascal Boisson
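If a condition really does arrive as a string, one alternative to do.call() is to evaluate the parsed text with the data frame as the evaluation environment and then index directly. The following is a minimal sketch under that assumption; `sub_by_text` is a hypothetical name, not something from the thread, and the advice above about avoiding pasted-together expressions still stands.

```r
# Hypothetical helper: subset a data frame using a condition given as text.
sub_by_text <- function(DF, cond_text) {
  # Evaluate the parsed condition with DF's columns visible;
  # anything not found in DF is looked up in the caller's frame.
  keep <- eval(parse(text = cond_text), DF, parent.frame())
  if (!is.logical(keep)) stop("condition must evaluate to logical")
  # Treat NA as "do not keep", as subset() does
  DF[!is.na(keep) & keep, , drop = FALSE]
}

df <- data.frame(x = 1:10, y = rep(1:5, 2))
res <- sub_by_text(df, "y == 2")
res
```

Note that no "df$" prefix is needed in the condition text, which was the convenience the original poster was after.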
Re: [R] Subarrays
Here's one way:

> subarray <- function(x, marginals, intervals) {
+     if (length(marginals) != length(intervals))
+         stop("marginals and intervals must be the same length (intervals can be a list)")
+     if (any(marginals<1 | marginals>length(dim(x))))
+         stop("marginals must contain values in 1:length(dim(x))")
+     ic <- Quote(x[, drop=T])
+     # ic has 4 elts with one empty index arg
+     ic2 <- ic[c(1, 2, rep(3, length(dim(x))), 4)]
+     # ic2 has an empty arg for each dim of x
+     ic2[marginals+2] <- intervals
+     eval(ic2)
+ }
> subarray(v, c(1,4), c(3,2))
     [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127
> subarray(v, c(1,4), list(3,2))
     [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127
> subarray(v, c(1,3,4), list(c(1,3,4),1,2))
     [,1] [,2] [,3] [,4]
[1,]   65   69   73   77
[2,]   67   71   75   79
[3,]   68   72   76   80

Question for language experts: is this the best way to create and manipulate R language expressions that contain empty arguments, or are there other preferred ways?

-- Tony Plate

Gunnar Hellmund wrote:

Define an array

> v<-1:256
> dim(v)<-rep(4,4)

Subarrays can be obtained as follows:

> v[3,2,,2]
[1]  71  87 103 119
> v[3,,,2]
     [,1] [,2] [,3] [,4]
[1,]   67   83   99  115
[2,]   71   87  103  119
[3,]   75   91  107  123
[4,]   79   95  111  127

In the general case this procedure is very tedious. Given an array A, dim(A)=(dim_1,dim_2,...,dim_d), and two vectors v1=(n_i1,...n_ik), v2=(int_1,...,int_k) ('marginals' and relevant 'interval numbers'), is there a smart way to obtain A[,...,int_1,,int_2,,,int_k,] ?

Best wishes

Gunnar Hellmund
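One possible way to sidestep empty arguments altogether is to index every unconstrained dimension with TRUE (which selects everything along that dimension) and call "[" through do.call(). This is a sketch of that alternative, not an answer that was given in the thread; `subarray2` is a hypothetical name.

```r
# Hypothetical alternative to subarray(): no language objects involved.
subarray2 <- function(x, marginals, intervals) {
  if (length(marginals) != length(intervals))
    stop("marginals and intervals must be the same length")
  # One index per dimension; TRUE means "take everything"
  idx <- rep(list(TRUE), length(dim(x)))
  idx[marginals] <- as.list(intervals)
  # Equivalent to x[idx[[1]], idx[[2]], ..., drop = TRUE]
  do.call("[", c(list(x), idx, list(drop = TRUE)))
}

v <- array(1:256, dim = rep(4, 4))
subarray2(v, c(1, 4), list(3, 2))                   # same as v[3, , , 2]
subarray2(v, c(1, 3, 4), list(c(1, 3, 4), 1, 2))    # same as v[c(1,3,4), , 1, 2]
```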
Re: [R] na.action
Maybe this does what you want:

> x <- as.matrix(read.table("clipboard"))
> x
  V1 V2 V3 V4
1 NA  0  0  0
2  0 NA  0 NA
3  0  0 NA  2
4  0  0  2 NA
> rowSums(x==2, na.rm=T)
1 2 3 4
0 0 1 1

There are probably at least 5 or 6 other quite sensible ways of doing this, but this is probably the fastest (and the least versatile). A more general building block is the sum() function, as in:

> sum(x[3,]==2, na.rm=T)
[1] 1

The key is the use of the 'na.rm=T' argument value.

hope this helps,

Tony Plate

Tim Smith wrote:

Hi,

I had the following code:

testp <- rcorr(t(datcm1),type = "pearson")
mat1 <- testp[[1]][,] > 0.6
mat2 <- testp[[3]][,] < 0.05
mat3 <- mat1 + mat2

The resulting mat3 (smaller version) matrix looks like:

NA  0  0  0
 0 NA  0 NA
 0  0 NA  2
 0  0  2 NA

To get the number of times a '2' appears in the rows, I was trying to run the following code:

numrow = nrow(mat3)
counter <- matrix(nrow = numrow,ncol =1)
for(i in 1:numrow){
  count = 0;
  for(j in 1:numrow){
    if(mat3[i,j] == 2){
      count = count + 1
    }
  }
  counter[i,1] = count
}

However, I get the following error:

'Error in if (mat3[i, j] == 2) { : missing value where TRUE/FALSE needed'

I also tried to use na.action, but couldn't get anything to work. I'm sure there must be a relatively easy fix for this. Is there a workaround for this problem?

thanks,

Tim
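For readers without the clipboard data, here is a self-contained sketch of the same idea, with the small matrix from the question typed in by hand. It also shows one way to repair the original loop: `NA == 2` is NA, which if() cannot handle, so the comparison is wrapped in isTRUE(). The variable names are reused from the question; nothing else is assumed.

```r
# The small example matrix from the question, entered row by row
mat3 <- matrix(c(NA, 0,  0,  0,
                 0,  NA, 0,  NA,
                 0,  0,  NA, 2,
                 0,  0,  2,  NA), nrow = 4, byrow = TRUE)

# Vectorised count of 2s per row, ignoring missing values
counts <- rowSums(mat3 == 2, na.rm = TRUE)

# Loop version: isTRUE() turns an NA comparison into FALSE
counter <- integer(nrow(mat3))
for (i in seq_len(nrow(mat3))) {
  for (j in seq_len(ncol(mat3))) {
    if (isTRUE(mat3[i, j] == 2)) counter[i] <- counter[i] + 1L
  }
}
```

Both `counts` and `counter` hold 0, 0, 1, 1 for this matrix.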
Re: [R] summary(as.factor(x) - force to not sort the result according factor levels
Christoph Lehmann wrote:

Hi

The result of a summary(as.factor(x)) call (see example below) is sorted according to the factor level. How can I get the result not sorted but in the original order of the levels in x?

By creating the factor with the levels in the order you want:

> test <- c(120402, 120402, 120402, 1323, 1323, 200393, 200393, 200393, 200393, 200393)
> summary(factor(test, levels=unique(test)))
120402   1323 200393
     3      2      5
Re: [R] function for cumulative occurrence of elements
I'm not entirely sure what you want, but is it "9 5 3" for this data? (9 new species occur at the first point, 5 new at the second, and 3 new at the third.) If this is right, then to get an accumulation curve when random Points are considered, you can probably just index rows of dt appropriately.

> dd <- read.table("clipboard", header=T)
> dd[,1:3]
   Point        species frequency
1      7   American_elm         7
2      7          apple         2
3      7   black_cherry         8
4      7      black_oak         1
5      7    chokecherry         1
6      7         oak_sp         1
7      7 pignut_hickory         1
8      7      red_maple         1
9      7      white_oak         5
10     9   black_spruce         2
11     9    blue_spruce         2
12     9        missing        12
13     9  Norway_spruce         8
14     9   white_spruce         3
15    12          apple         2
16    12   black_cherry         1
17    12   black_locust         1
18    12   black_walnut         1
19    12          lilac         3
20    12        missing         2
> # dt: table of which species occur at which Points
> dt <- table(dd$Point, dd$species)
> # doc: for each species, the index of the Point where
> # it first occurs
> doc <- apply(dt, 2, function(x) which(x==1)[1])
> doc
  American_elm          apple   black_cherry   black_locust      black_oak
             1              1              1              3              1
  black_spruce   black_walnut    blue_spruce    chokecherry          lilac
             2              3              2              1              3
       missing  Norway_spruce         oak_sp pignut_hickory      red_maple
             2              2              1              1              1
     white_oak   white_spruce
             1              2
> table(doc)
doc
1 2 3
9 5 3

hope this helps,

Tony Plate

Steven K Friedman wrote:

Hello,

I have a data set with 9700 records and 7 parameters. The data were collected for a survey of forest communities. Sample plots (1009) and species (139) are included in this data set. I need to determine how species are accumulated as new plots are considered. Basically, I want to develop a species area curve. I've included the first 20 records from the data set. Point represents the plot id. The other parameters are parts of the information statistic H'.

Using table(), I can construct a data set that lists the occurrence of a species at any Point (it produces a binary 0/1 data table). From there it gets confusing regarding the most efficient approach to determining the addition of new and/or repeated species occurrences.

ptcount <- table(sppoint.freq$species, sppoint.freq$Point)

From here I've played around with colSums to calculate the number of species at each Point. The difficulty is determining if a species is new or repeated. Also, since there are 1009 points, a function is needed to screen every Point.

Two goals are of interest: 1) the species accumulation curve, and 2) an accumulation curve when random Points are considered.

Any help would be greatly appreciated.

Thank you

Steve Friedman

   Point        species frequency point.list point.prop   log.prop point.hprime
1      7   American elm         7         27 0.25925926 -1.3499267    0.3499810
2      7          apple         2         27 0.07407407 -2.6026897    0.1927918
3      7   black cherry         8         27 0.29629630 -1.2163953    0.3604134
4      7      black oak         1         27 0.03703704 -3.2958369    0.1220680
5      7    chokecherry         1         27 0.03703704 -3.2958369    0.1220680
6      7         oak sp         1         27 0.03703704 -3.2958369    0.1220680
7      7 pignut hickory         1         27 0.03703704 -3.2958369    0.1220680
8      7      red maple         1         27 0.03703704 -3.2958369    0.1220680
9      7      white oak         5         27 0.18518519 -1.6863990    0.3122961
10     9   black spruce         2         27 0.07407407 -2.6026897    0.1927918
11     9    blue spruce         2         27 0.07407407 -2.6026897    0.1927918
12     9        missing        12         27 0.         -0.8109302    0.3604134
13     9  Norway spruce         8         27 0.29629630 -1.2163953    0.3604134
14     9   white spruce         3         27 0.         -2.1972246    0.2441361
15    12          apple         2         27 0.07407407 -2.6026897    0.1927918
16    12   black cherry         1         27 0.03703704 -3.2958369    0.1220680
17    12   black locust         1         27 0.03703704 -3.2958369    0.1220680
18    12   black walnut         1         27 0.03703704 -3.2958369    0.1220680
19    12          lilac         3         27 0.         -2.1972246    0.2441361
20    12        missing         2         27 0.07407407 -2.6026897    0.1927918
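The two goals in the question (an accumulation curve, and the same curve for a random ordering of Points) can be sketched along the lines of the reply. What follows is an illustration on a tiny made-up data set, not the poster's data; `accum` and its `point_order` argument are hypothetical names introduced here.

```r
# Toy data (made up for illustration): which species were seen at which Point
dd <- data.frame(
  Point   = c(7, 7, 9, 9, 12, 12),
  species = c("apple", "red_maple", "apple", "lilac", "lilac", "white_oak")
)

# Cumulative number of distinct species as Points are visited in point_order
accum <- function(dd, point_order = unique(dd$Point)) {
  # Position (in the visiting order) at which each species is first seen
  first_seen <- tapply(match(dd$Point, point_order), dd$species, min)
  # Count new species per position, then accumulate
  cumsum(tabulate(first_seen, nbins = length(point_order)))
}

accum(dd)                              # Points in the order given: 2 3 4
accum(dd, sample(unique(dd$Point)))    # Points in a random order
```

Averaging `accum()` over many random orderings would give a smoothed curve; whatever the ordering, the last value is always the total number of species.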
Re: [R] Generating correlated data from uniform distribution
Isn't this a little trickier with non-normal variables? It sounds like Menghui Chen wants variables that have uniform marginal distributions and a specified correlation. When I look at histograms (or just the quantiles) of the rows of dat2 in your example, I see something for dat2[2,] that does not look much like it comes from a uniform distribution.

> dat<-matrix(runif(2000),2,1000)
> rho<-.77
> R<-matrix(c(1,rho,rho,1),2,2)
> ch<-chol(R)
> dat2<-t(ch)%*%dat
> cor(dat2[1,],dat2[2,])
[1] 0.7513892
> hist(dat2[1,])
> hist(dat2[2,])
> quantile(dat2[1,])
         0%         25%         50%         75%        100%
0.000655829 0.246216035 0.507075912 0.745158441 0.16418
> quantile(dat2[2,])
       0%       25%       50%       75%      100%
0.0393046 0.4980066 0.7150426 0.9208855 1.3864704

-- Tony Plate

Jim Brennan wrote:

dat<-matrix(runif(2000),2,1000)
rho<-.77
R<-matrix(c(1,rho,rho,1),2,2)
ch<-chol(R)
dat2<-t(ch)%*%dat
cor(dat2[1,],dat2[2,])
[1] 0.7513892

dat<-matrix(runif(2),2,1)
rho<-.28
R<-matrix(c(1,rho,rho,1),2,2)
ch<-chol(R)
dat2<-t(ch)%*%dat
cor(dat2[1,],dat2[2,])
[1] 0.2681669

dat<-matrix(runif(20),2,10)
rho<-.28
R<-matrix(c(1,rho,rho,1),2,2)
ch<-chol(R)
dat2<-t(ch)%*%dat
cor(dat2[1,],dat2[2,])
[1] 0.2814035

See ?choleski

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Menghui Chen
Sent: July 1, 2005 4:49 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Generating correlated data from uniform distribution

Dear R users,

I want to generate two random variables (X1, X2) from uniform distribution (-0.5, 0.5) with a specified correlation coefficient r. Does anyone know how to do it in R?

Many thanks!

Menghui
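One approach that does keep the marginals uniform is to generate correlated normals and push each through pnorm() (a Gaussian copula). This was not proposed in the thread; it is offered here as a hedged sketch. It relies on the standard result that, for a bivariate normal with correlation rho, the transformed uniforms have Pearson correlation (6/pi) * asin(rho/2), so the normal correlation is chosen as 2 * sin(pi * r / 6). The names `r_target`, `z1`, `z2`, `x1`, `x2` are made up for the example.

```r
set.seed(1)
r_target <- 0.77                      # desired correlation of the uniforms
rho <- 2 * sin(pi * r_target / 6)     # correlation needed for the normals

n  <- 10000
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # cor(z1, z2) is about rho

# pnorm() maps each normal to U(0, 1); shift to U(-0.5, 0.5)
x1 <- pnorm(z1) - 0.5
x2 <- pnorm(z2) - 0.5

cor(x1, x2)        # close to r_target, up to sampling error
quantile(x2)       # quartiles near -0.25, 0, 0.25, as for U(-0.5, 0.5)
```

Unlike the Cholesky transform applied directly to uniforms, both x1 and x2 stay within (-0.5, 0.5).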
Re: [R] how as.numeric() !- factor
The problem is that the 2nd column in your data frame has been converted into a factor. This happened because you used cbind() with mixed character and numeric vectors. cbind() with these types of arguments will construct a character matrix. Then when you passed that character matrix to as.data.frame() it converted both columns to factors. Here's a simpler example of what happened:

> cbind(letters[1:2], c(1,3))
     [,1] [,2]
[1,] "a"  "1"
[2,] "b"  "3"
> x <- as.data.frame(cbind(letters[1:2], c(1,3)))
> x
  V1 V2
1  a  1
2  b  3
> as.numeric(x[,2])
[1] 1 2
> as.numeric(as.character(x[,2]))
[1] 1 3

With the data frame as you constructed it, you need an expression like

round(as.numeric(as.character(Np.occup97.98[,2])), 2)

to accomplish what you want. It would probably be better to construct a more felicitous data frame in the first place:

df <- data.frame(site = levels(sums$site),
                 Np.occup97.98 = sums$Ant.Nptrad97.98/Ant.trad$Ant.trad97.98)

(unless of course you had some unstated reason for constructing the data frame the way you did)

-- Tony Plate

At Thursday 10:03 AM 7/31/2003 +0200, Tord Snall wrote:

Dear all,

I have divided two vectors:

> Np.occup97.98 <- as.data.frame(cbind(site = levels(sums$site),
+     Np.occup97.98 = sums$Ant.Nptrad97.98/Ant.trad$Ant.trad97.98))
> Np.occup97.98
      site     Np.occup97.98
1  erken97 0.342592592592593
2  erken98             0.333
3 rormyran  0.48471615720524
4  valkror 0.286026200873362

However, at a later stage of the analysis I want

> round(Np.occup97.98[,2], 2)
Error in Math.factor(x, digits) : round not meaningful for factors

neither did this work:

> round(Np.occup97.98[,2], 2)
Error in Math.factor(x, digits) : round not meaningful for factors

or this:

> round(as.numeric(Np.occup97.98[,2]), 2)
[1] 3 2 4 1

because, as clearly written in the help file: as.numeric for factors yields the codes underlying the factor levels, not the numeric representation of the labels.

I've discovered this solution:

> Np.occup97.98 <- as.data.frame(cbind(site = levels(sums$site),
+     Np.occup97.98 = round(sums$Ant.Nptrad97.98/Ant.trad$Ant.trad97.98,2)))
> Np.occup97.98
      site Np.occup97.98
1  erken97          0.34
2  erken98          0.33
3 rormyran          0.48
4  valkror          0.29

However, I would like to do this rounding later. Could someone give me a tip? I think that I would have been helped by a sentence in help(as.numeric).

Thanks in advance.

Sincerely,
Tord

---
Tord Snäll
Avd. f växtekologi, Evolutionsbiologiskt centrum, Uppsala universitet
Dept. of Plant Ecology, Evolutionary Biology Centre, Uppsala University
Villavägen 14
SE-752 36 Uppsala, Sweden
Tel: 018-471 28 82 (int +46 18 471 28 82) (work)
Tel: 018-25 71 33 (int +46 18 25 71 33) (home)
Fax: 018-55 34 19 (int +46 18 55 34 19) (work)
E-mail: [EMAIL PROTECTED]
Check this: http://www.vaxtbio.uu.se/resfold/snall.htm!
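The difference between a factor's codes and its labels can be reproduced without the original data. The sketch below builds the factor explicitly, because since R 4.0.0 data.frame() and as.data.frame() no longer convert strings to factors by default, so the transcript above would behave differently in a current R. The values are the rounded ones from the question; `f` is just an illustrative name.

```r
# A factor whose labels look like numbers
f <- factor(c("0.34", "0.33", "0.48", "0.29"))

levels(f)                     # sorted labels: "0.29" "0.33" "0.34" "0.48"
as.numeric(f)                 # integer codes into levels(f): 3 2 4 1
as.numeric(as.character(f))   # the numbers themselves: 0.34 0.33 0.48 0.29
as.numeric(levels(f))[f]      # same result, converting each level only once
```

The codes 3 2 4 1 are exactly what the original poster saw from as.numeric() on the factor column.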
Re: [R] vectorization question
From ?data.frame:

Details:
     A data frame is a list of variables of the same length with unique row names, given class `data.frame'.

Your example constructs an object that does not conform to the definition of a data frame (the new column is not the same length as the old columns). Some data frame functions may work OK with such an object, but others will not. For example, the print function for data.frame silently handles such an illegal data frame (which could be described as unfortunate). It would probably be far easier to construct a correct data frame in the first place than to try to find and fix functions that don't handle illegal data frames.

For adding a new column to a data frame, the expressions x[,"new.column.name"] <- value and x[["new.column.name"]] <- value will replicate the value so that the new column is the same length as the existing ones, while the $ operator in an assignment will not replicate the value. (One could argue that this is a deficiency, but I think it has been that way for a long time, and the behavior is the same in the current version of S-plus.)

> x1 <- data.frame(a=1:3)
> x2 <- x1
> x3 <- x1
> x1$b <- 0
> x2[,"b"] <- 0
> x3[["b"]] <- 0
> sapply(x1, length)
a b
3 1
> sapply(x2, length)
a b
3 3
> sapply(x3, length)
a b
3 3
> as.matrix(x2)
  a b
1 1 0
2 2 0
3 3 0
> as.matrix(x1)
Error in as.matrix.data.frame(x1) : dim<- length of dims do not match the length of object

At Thursday 04:50 PM 8/14/2003 +, Alberto Murta wrote:

Dear all

I recently noticed the following error when coercing a data.frame into a matrix:

> example <- matrix(1:12,4,3)
> example <- as.data.frame(example)
> example$V4 <- 0
> example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0
> example <- as.matrix(example)
Error in as.matrix.data.frame(example) : dim<- length of dims do not match the length of object

However, if the column to be added has the right number of lines, there's no error:

> example <- matrix(1:12,4,3)
> example <- as.data.frame(example)
> example$V4 <- rep(0,4)
> example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0
> example <- as.matrix(example)
> example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0

Shouldn't it work well both ways? I checked the attributes and dims of the data frame and they are the same in both cases. Where's the difference that originates the error message?

Thanks in advance

Alberto

platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    7.1
year     2003
month    06
day      16
language R

--
Alberto G. Murta
Institute for Agriculture and Fisheries Research (INIAP-IPIMAR)
Av. Brasilia, 1449-006 Lisboa, Portugal | Phone: +351 213027062 Fax: +351 213015948 | http://www.ipimar-iniap.ipimar.pt/pelagicos/
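The behavior described in this thread is that of R 1.7.1. As far as I am aware, current versions of R recycle a length-one value in `$<-` for data frames as well, so the three assignment forms now agree. The sketch below is a quick check one could run to confirm this on a given installation; the object names follow the transcript above.

```r
x1 <- data.frame(a = 1:3)
x2 <- x1
x3 <- x1

x1$b      <- 0    # in current R this is recycled to length 3 too
x2[, "b"] <- 0
x3[["b"]] <- 0

sapply(list(x1 = x1, x2 = x2, x3 = x3), function(d) length(d$b))
as.matrix(x1)     # no error once the column has the full length
```

If your R version reports a length of 1 for x1, the explanation above still applies and rep(0, nrow(x1)) is the safe way to add the column.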
Re: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!
Perhaps you were trying for "as sample size increases, variance *of the mean* decreases" (at least when the variance is finite). The sample variance itself estimates the population variance (9 here) whatever the sample size, so it should not shrink. If you swap mean and var in your code, I think you will get what you are looking for.

-- Tony Plate

At Tuesday 05:42 PM 8/19/2003 +, Padmanabhan, Sudharsha wrote:

Hello,

I am running a few simulations for clinical trial analysis. I want some help regarding the following. We know that as the sample size increases, the variance should decrease, but I am getting some unexpected results. So I ran some code (shown below) to check the validity of this.

large<-array(1,c(1000,1000))
small<-array(1,c(100,1000))
for(i in 1:1000){large[i,]<-rnorm(1000,0,3)}
for(i in 1:1000){small[i,]<-rnorm(100,0,3)}}
yy<-array(1,100)
for(i in 1:100){yy[i]<-var(small[i,])}
y1y<-array(1,1000)
for(i in 1:1000){y1y[i]<-var(large[i,])}
mean(yy);mean(y1y);
[1] 8.944
[1] 9.098

This shows that on average, for 1000 such samples of 1000 Normal numbers, the variance is higher than that of 100 samples of 1000 random numbers. Why is this so? Can someone please help me out?

Thanks.

Regards

~S.
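The distinction between the variance of the data and the variance of the sample mean can be checked with a small simulation. This is a sketch written for this note rather than the poster's code; `sim`, `reps` and the returned names are made up, and the standard deviation of 3 is taken from the question.

```r
set.seed(42)

# Draw `reps` samples of size n from N(0, sd^2) and summarise them
sim <- function(n, reps = 2000, sd = 3) {
  samples <- matrix(rnorm(n * reps, mean = 0, sd = sd), nrow = reps)
  c(mean_of_vars = mean(apply(samples, 1, var)),   # average sample variance
    var_of_means = var(rowMeans(samples)))         # variance of sample means
}

small <- sim(100)
large <- sim(1000)
rbind(small, large)
```

The first column stays near sd^2 = 9 for both sample sizes, while the second is near 9/n, that is about 0.09 for n = 100 and about 0.009 for n = 1000.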
Re: [R] Using files as connections
You need to save the connection object returned by file() and then use that object in other functions. You need to change the appropriate lines to the following (at least):

con <- file("c:/data/perry/data.csv",open="r")
cline <- readLines(con,n=1)
close(con)

(I don't know if more changes are needed to get it working.)

Note that using the connection object in other functions can have side effects on the connection object (which is how a connection remembers its point in the file). (Perhaps more accurately, the side effect is on the internal system data referred to by the R connection object.)

> con <- textConnection(letters)
> con
     description            class             mode             text
       "letters" "textConnection"              "r"           "text"
          opened         can read        can write
        "opened"            "yes"             "no"
> readLines(con, 1)
[1] "a"
> readLines(con, 1)
[1] "b"
> con.saved <- con
> readLines(con, 1)
[1] "c"
> readLines(con.saved, 1)
[1] "d"
> readLines(con, 1)
[1] "e"
> identical(con, con.saved)
[1] TRUE
> showConnections()
  description class            mode text   isopen   can read can write
3 "letters"   "textConnection" "r"  "text" "opened" "yes"    "no"

hope this helps,

Tony Plate

At Thursday 11:19 AM 8/28/2003 +1200, you wrote:

I have been trying to read a random sample of lines from a file into a data frame using readLines(). The help indicates that readLines() will start from the current line if the connection is open, but presented with a closed connection it will open it, start from the beginning, and close it when finished.

In the code that follows I tried to open the file before reading but apparently without success, because the result was repeated copies of the first line:

flines <- 107165
slines <- 100
selected <- sort(sample(flines,slines))
strvec <- rep("",slines)
file("c:/data/perry/data.csv",open="r")
isel <- 0
for (iline in 1:slines) {
  isel <- isel + 1
  cline <- readLines("c:/data/perry/data.csv",n=1)
  if (iline == selected[isel])
    strvec[isel] <- cline
  else
    isel <- isel - 1
}
close("c:/data/perry/data.csv")
sel.flows <- read.table(textConnection(strvec), header=FALSE, sep=",")

There was also an error "no applicable method for close".

Comments gratefully received.

Murray Jorgensen
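Putting the advice together, reading a random sample of lines through a single open connection might look like the sketch below. It is only an illustration: it writes a small temporary file so that it can run anywhere, uses much smaller `flines` and `slines` than the question, and skips unwanted lines in blocks rather than one at a time, which is a design choice of this sketch rather than something from the thread.

```r
# Small stand-in for the poster's CSV file
tmp <- tempfile(fileext = ".csv")
writeLines(paste(1:50, letters[(0:49 %% 26) + 1], sep = ","), tmp)

flines <- 50                      # number of lines in the file
slines <- 5                       # number of lines to sample
set.seed(1)
selected <- sort(sample(flines, slines))

con <- file(tmp, open = "r")      # keep the connection object
strvec <- character(slines)
prev <- 0
for (k in seq_along(selected)) {
  skip <- selected[k] - prev - 1
  if (skip > 0) readLines(con, n = skip)   # advance past unwanted lines
  strvec[k] <- readLines(con, n = 1)       # the connection remembers its position
  prev <- selected[k]
}
close(con)                        # close the object, not the file name
unlink(tmp)

sel.flows <- read.table(text = strvec, header = FALSE, sep = ",")
sel.flows
```

The first column of `sel.flows` holds the sampled line numbers, which makes it easy to see that the right lines were read.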
Re: Just don't do it, surely? (was RE: [R] Retrieve ... argument values)
At Wednesday 11:19 AM 9/17/2003 +0100, Simon Fear wrote:

There have been various elegant solutions to test for the presence of a particular named parameter within a ... argument, such as

if (!is.null(list(...)$ylim))
if ("ylim" %in% names(list(...)))

I think I'd have to comment these lines pretty clearly if I wanted to easily follow the code in 6 months time. But I'm still not convinced it is ever a good idea to use this technique in preference to using explicit named arguments. If there is something special about ylim, why insist that it be passed within ... in the first place? Surely it's better to define the function as function(x,ylim=default,...) within which you do your special ylim stuff, then call plot(x, ylim=ylim,...)?? Can anyone come up with a good reason not to follow that principle? I think my earlier post may have been misconstrued: I'm not saying never write functions that use ..., I'm just saying never write functions that depend on a particular argument being passed via ...

One reason for not following that principle is the proliferation of defaults -- if the lower level functions have defaults, then those defaults must be repeated at the higher levels. This is a good reason for not following that principle, because it makes software maintenance more difficult.

Another reason for not following that principle is that if you have several lower level functions with different default values for an argument of the same name, it becomes impossible to get the lower-level default behavior.

-- Tony Plate
RE: Just don't do it, surely? (was RE: [R] Retrieve ... argument values)
Simon,

I agree, for some (maybe most) arguments it is good to know what defaults are being used. But there are some for which I really don't want to know. An example of the latter is arguments that control interaction with a database. Suppose I have a low-level interaction function that takes an argument 'db.mode', where this specifies a way of interacting with the database. Now, if I also have a higher level function that gets data from the database I might write:

db.get.high.level.data <- function(what, ...) {
    processed.what <- what   # do something to 'what' here
    db.get.low.level.data(processed.what, ...)
}

db.get.low.level.data <- function(what, db.mode=2) {
    # fetch the data
}

By using ... arguments I can specify a db.mode argument to the higher level function, or just get the default provided in the lower level function. If I then change the lower level function to provide a better mode of interaction I can make that mode the default in the lower level function, and be confident it will be used everywhere. But if I specify the defaults in both places, then changing defaults becomes a big task.

As for the second point regarding different functions having different defaults for an argument of the same name, it can certainly be handled as you describe by making different argument names in the higher level function.

-- Tony Plate

At Wednesday 05:25 PM 9/17/2003 +0100, Simon Fear wrote:

Tony, I don't understand what you mean. Could you give an example?

-Original Message-
From: Tony Plate [mailto:[EMAIL PROTECTED]

> ... I'm not saying never write functions that use ..., I'm just saying
> never write functions that depend on a particular argument being passed
> via ...
>
> Several reasons for not following that principle involve proliferation
> of defaults -- if the lower level functions have defaults, then those
> defaults must be repeated at the higher levels. This is a good reason
> for not following that principle, because it makes software maintenance
> more difficult.

I don't think I agree with that (though maybe I just didn't get it). I prefer to know what arguments a function is going to use.

> Another reason for not following that principle is that if you have
> several lower level functions with different default values for an
> argument of the same name, it becomes impossible to get the lower-level
> default behavior.

I'm lost there. When I choose which function to call it has its own default?? I often call a function of mine called timepoints.summary for which I want to pass graphical parameters to boxplots, matplots and confidence interval plots. So I name the arguments cex.boxplot, col.boxplot etc and then within the function I call boxplot(x, cex=cex.boxplot) and so on. I wouldn't expect a single argument cex to magically work out whether it was being used in a boxplot or matplot and change to a different default??

Simon Fear
Senior Statistician
Syne qua non Ltd
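The point about keeping a default in one place can be made concrete with a runnable toy version of the database example. Everything here is hypothetical: `get_low` and `get_high` stand in for the low-level and high-level functions discussed above, and the "fetch" simply returns a string.

```r
# Low-level function: the only place where the default db.mode lives
get_low <- function(what, db.mode = 2) {
  paste0("fetched ", what, " using mode ", db.mode)
}

# High-level function: passes anything it does not use down via ...
get_high <- function(what, ...) {
  processed <- toupper(what)      # stand-in for "do something to 'what'"
  get_low(processed, ...)
}

get_high("prices")                # "fetched PRICES using mode 2"
get_high("prices", db.mode = 1)   # "fetched PRICES using mode 1"
```

Changing the default in get_low() changes what every caller of get_high() gets, without get_high() having to repeat or even know the default.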
Re: AW: [R] Rank and extract data from a series
Using Thomas Unternährer's handy example, one could also do:

    > X <- c(1, 4.5, 2.3, 1, 7.3)
    > mean(order(X, decreasing=TRUE)[1:2])
    [1] 3.5

I think this will give the same results as Thomas Unternährer's suggested code in almost all cases, but it is perhaps more concise and direct (provided that you don't actually need the values of the top items). Of course, you have to change the 1:2 to 1:10 for your needs.

Note that this question gets tricky if there are ties such that there is no unique set of row numbers that identifies the N top items. For example, consider the following data:

    > X <- c(1,3,2,3,4)

Taking the top two, should the answer be 3.5 (avg of row numbers 2 and 5), 4.5 (avg of row numbers 4 and 5), or 3.67 (avg of row numbers 2, 4 and 5)?

    > mean(order(X, decreasing=TRUE)[1:2])
    [1] 3.5
    > order(X, decreasing=TRUE)[1:2]
    [1] 5 2
    > # Andy Liaw's suggestion:
    > mean(which(X %in% sort(X, decreasing=TRUE)[1:2]))
    [1] 3.67
    > which(X %in% sort(X, decreasing=TRUE)[1:2])
    [1] 2 4 5
    > # Thomas Unternährer's suggestion:
    > mean(match(sort(X, decreasing=TRUE)[1:2], X))
    [1] 3.5
    > match(sort(X, decreasing=TRUE)[1:2], X)
    [1] 5 2

hope this helps,

Tony Plate

At Tuesday 02:23 PM 9/23/2003 +0200, Unternährer Thomas, uth wrote:

Hi, I would like to rank a time-series of data, extract the top ten data items from this series, determine the corresponding row numbers for each value in the sample, and take a mean of these *row numbers* (not the data). I would like to do this in R, rather than pre-process the data on the UNIX command line if possible, as I need to calculate other statistics for the series. I understand that I can use 'sort' to order the data, but I am not aware of a function in R that would allow me to extract a given number of these data and then determine their positions within the original time series. e.g.

Time series:

    1.0 (row 1)
    4.5 (row 2)
    2.3 (row 3)
    1.0 (row 4)
    7.3 (row 5)

Sort would give me: 1.0 1.0 2.3 4.5 7.3

I would then like to extract the top two data items: 4.5 7.3

and determine their positions within the original (unsorted) time series:

    4.5 = row 2
    7.3 = row 5

then take a mean: 2 and 5 = 3.5

Thanks in advance.

James Brown

    X <- c(1, 4.5, 2.3, 1, 7.3)
    X1 <- sort(X, decreasing=TRUE)[1:2]
    X2 <- match(X1, X)
    mean(X2)

Hope this helps

Thomas

___
James Brown
Cambridge Coastal Research Unit (CCRU)
Department of Geography, University of Cambridge
Downing Place, Cambridge CB2 3EN, UK
Telephone: +44 (0)1223 339776
Mobile: 07929 817546
Fax: +44 (0)1223 355674
E-mail: [EMAIL PROTECTED]
http://www.geog.cam.ac.uk/ccru/CCRU.html
___

On Wed, 10 Sep 2003, Jerome Asselin wrote:

On September 10, 2003 04:03 pm, Kevin S. Van Horn wrote:

Your method looks like a naive reimplementation of integration, and won't work so well for distributions that have the great majority of the probability mass concentrated in a small fraction of the sample space. I was hoping for something that would retain the adaptability of integrate().

Yesterday, I suggested using approxfun(). Did you consider my suggestion? Below is an example.

    N <- 500
    x <- rexp(N)
    y <- rank(x)/(N+1)
    empCDF <- approxfun(x,y)
    xvals <- seq(0,4,.01)
    plot(xvals,empCDF(xvals),type="l",
         xlab="Quantile",ylab="Cumulative Distribution Function")
    lines(xvals,pexp(xvals),lty=2)
    legend(2,.4,c("Empirical CDF","Exact CDF"),lty=1:2)

It's possible to tune some of the parameters of approxfun() to better match your personal preferences. Have a look at help(approxfun) for details.

HTH,
Jerome Asselin

Tony Plate [EMAIL PROTECTED]

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
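The three tie-handling behaviours discussed at the top of this message can be compared side by side. The following is only a small sketch using the tied example data; the names n, pos_order, pos_all and pos_match are mine and do not come from any of the posts.

```r
# Tied data from the example: the value 3 occurs in rows 2 and 4.
X <- c(1, 3, 2, 3, 4)
n <- 2  # number of top items wanted (hypothetical name)

# order(): always exactly n positions; here the tie resolves to the earlier row
pos_order <- order(X, decreasing = TRUE)[1:n]

# %in%: every position whose value is among the top n values,
# so ties can yield more than n positions
pos_all <- which(X %in% sort(X, decreasing = TRUE)[1:n])

# match(): the first position of each of the top n values
pos_match <- match(sort(X, decreasing = TRUE)[1:n], X)

mean(pos_order)
mean(pos_all)
mean(pos_match)
```

Which of the three is "right" depends on how ties should count in the application; order() is the only one guaranteed to return n distinct positions.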
Re: [R] confusion about what to expect?
Have you investigated the drop= argument to "["? (As in the expression testdata[,2,drop=F], which will return a dataframe.)

"[.data.frame" has somewhat different behavior from "[" on matrices with respect to the drop argument. If the result would be a dataframe with a single column, the default behavior of "[.data.frame" is to return a vector (it always returns a dataframe if drop=F). But if the result would be a dataframe with a single row, the default behavior is to return a dataframe (it returns a list if drop=T). E.g.:

    > class(data.frame(a=1:3,b=4:6)[,1])
    [1] "integer"
    > class(data.frame(a=1:3,b=4:6)[,1,drop=F])
    [1] "data.frame"
    > class(data.frame(a=1:3,b=4:6)[1,])
    [1] "data.frame"
    > class(data.frame(a=1:3,b=4:6)[1,,drop=T])
    [1] "list"

The default behavior is often what you want, but when it isn't it can be confusing, especially as it's not that easy to find documentation for this (at least not in a quick look through the FAQ, ?"[", and An Introduction to R -- please excuse me if I overlooked something).

The thing you have going on with names(testdata[...]) is merely a consequence of whether the result of the subsetting operation is a dataframe or a vector.

hope this helps,

Tony Plate

At Tuesday 04:08 PM 9/23/2003 -0700, you wrote:

In playing around with data.frames (and wanting a simple, cheap way to use the variable and case names in plots; but I've solved that with some hacks, yech), I noticed the following behavior with subsetting.

    testdata <- data.frame(matrix(1:20,nrow=4,ncol=5))
    names(testdata)        ## expect labels, get them
    names(testdata[2,])    ## expect labels, get them
    names(testdata[,2])    ## expect labels, but NOT -- STRIPPED OFF??
    testdata[,2]           ## would have expected a name (X2) in the front? NOT EXPECTED
    testdata[2,]           ## get what I expect
    testdata[2,2]          ## just a number, not a sub-data.frame? unexpected
    testdata[2,2:3]        ## this is a data.frame
    testdata[2:3,2:3]      ## and this is, too.

    > version
    platform i386-pc-linux-gnu
    arch     i386
    os       linux-gnu
    system   i386, linux-gnu
    status   alpha
    major    1
    minor    8.0
    year     2003
    month    09
    day      20
    language R

I don't have 1.7.1 handy at this location to test, but I would've expected a data.frame-like object upon subsetting; should I have expected otherwise? (Granted, a data.frame with just a single variable could be thought of as silly, but it does have some extra information that might be worthwhile, on occasion?)

I'm not sure that it is a bug, but I was caught by surprise. If it isn't a bug, and someone has a concise way to think through this, for my future reference, I'd appreciate hearing about it.

best,
-tony

--
[EMAIL PROTECTED] http://www.analytics.washington.edu/
Biomedical and Health Informatics, University of Washington
Biostatistics, SCHARP/HVTN, Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email
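As a compact illustration of the drop= behaviour described in the reply above, here is a sketch built on the poster's testdata. The result names (col_vec, col_df, col_df2, row_df, cell) are my own labels, not anything from the thread.

```r
testdata <- data.frame(matrix(1:20, nrow = 4, ncol = 5))

col_vec <- testdata[, 2]                # single column, default drop: plain integer vector
col_df  <- testdata[, 2, drop = FALSE]  # single column kept as a data frame
col_df2 <- testdata[2]                  # list-style single index: also a data frame
row_df  <- testdata[2, ]                # single row: stays a data frame by default
cell    <- testdata[2, 2]               # single cell: one value

names(col_vec)  # NULL: a bare vector carries no column label
names(col_df)   # the column label survives
```

So when the column name is needed downstream (for plot labels, say), drop=FALSE or the single-index form testdata[2] seems the simplest way to keep it.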
Re: [R] Why does a[which(b == c[d])] not work?
At Wednesday 03:06 PM 10/8/2003 +0200, Martin Maechler wrote:

Your question has been answered by Achim and Peter Dalgaard (at least). Just a note: Using a[which(logic)] looks like a clumsy and inefficient way of writing a[ logic ] and I think you shouldn't propagate its use ...

What then is the recommended way of treating an NA in the logical subset as a FALSE? (Or were you just talking about the given example, which didn't have this issue? However, your admonition seemed more general.) As in:

    > x <- 1:4
    > y <- c(1,2,NA,4)
    > x[y %% 2 == 0]
    [1]  2 NA  4
    > x[which(y %% 2 == 0)]
    [1] 2 4

Sometimes one might want the first result, but more usually, I want the second, and using which() seems a convenient way to get it.

-- Tony Plate
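For reference, the NA-as-FALSE behaviour asked about above can also be obtained without which(), at the cost of being more verbose. This is just a sketch of the alternatives I am aware of; the variable keep is my own name.

```r
x <- 1:4
y <- c(1, 2, NA, 4)
keep <- y %% 2 == 0              # FALSE TRUE NA TRUE

with_na   <- x[keep]                   # the NA in the index yields an NA element
via_which <- x[which(keep)]            # which() silently drops the NA
via_isna  <- x[keep & !is.na(keep)]    # explicit: NA & FALSE is FALSE
via_in    <- x[keep %in% TRUE]         # %in% never returns NA
```

All three NA-free forms select the same elements here; which() is simply the shortest to type.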
Re: AW: [R] Getting rows from a dataframe
If you're having so much trouble, perhaps it's because you want to get a vector result? This requires a little more, and if so, perhaps one of the following provides what you are looking for:

    > x <- data.frame(a=1:3,b=4:6)
    > # row as a data frame
    > x[2,]
      a b
    2 2 5
    > # row as a list
    > x[2,,drop=T]
    $a
    [1] 2

    $b
    [1] 5

    > # row as a vector
    > unlist(x[2,,drop=T])
    a b
    2 5
    > # row as a vector again
    > unlist(x[2,])
    a b
    2 5
    > # row as a matrix (if x contains any non-numeric columns, this will be a character matrix)
    > as.matrix(x[2,])
      a b
    2 2 5
    > # row as a vector (if x contains any non-numeric columns, this will be a character vector)
    > as.matrix(x)[2,]
    a b
    2 5
    > # row as a numeric matrix (non-numeric columns in x will be converted to numeric data, see ?data.matrix for how)
    > data.matrix(x[2,])
      a b
    2 2 5

Tony Plate

At Thursday 05:40 PM 10/9/2003 +0100, Mark Lee wrote:

I have this right on the desk in front of me. I have gone through most of this actually and have been looking for the answer for several weeks now before resorting to this. The only reference I've found to this is on page 20 under array indexing but didn't see the relation to dataframes.

Thanks,
Mark
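The caveat about non-numeric columns in the comments above is easy to miss, so here is a small sketch with a mixed-type data frame. The data and the names row_chr, row_chr2 and row_list are made up for illustration.

```r
# A data frame with one numeric and one character column (hypothetical data)
x <- data.frame(a = 1:3, b = c("u", "v", "w"), stringsAsFactors = FALSE)

row_chr  <- as.matrix(x)[2, ]   # everything is coerced to character
row_chr2 <- unlist(x[2, ])      # unlist() coerces to a common type as well
row_list <- as.list(x[2, ])     # a list keeps each column's own type
```

If the column types matter, keeping the row as a list (or as a one-row data frame) looks like the safer choice.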
Re: [R] Subseting in a 3D array
One way would be:

    apply(ib5km.lincol.random[1:3,], 1, function(i) ib5km15.dbc[i[1],i[2],])

(untested)

-- Tony Plate

At Wednesday 06:47 PM 10/15/2003 +0200, Agustin Lobo wrote:

Hi! I have a 3d array:

    > dim(ib5km15.dbc)
    [1] 190 241  19

and a set of positions to extract:

    > ib5km.lincol.random[1:3,]
         [,1] [,2]
    [1,]   78   70
    [2,]   29  213
    [3,]  180   22

Getting the values of a 2D array for that set of positions would be:

    > ima <- ib5km15.dbc[,,1]
    > ima[ib5km.lincol.random[1:10,]]

but I can't find the way for the case of the 3D array:

    > ib5km15.dbc[ib5km.lincol.random[1:10,],]
    Error in ib5km15.dbc[ib5km.lincol.random[1:10, ], ] : incorrect number of dimensions

Could anyone suggest a way of subsetting the 3D array to get a vector of z values for each position recorded in ib5km.lincol.random (avoiding the use of for loops)?

Thanks
Agus

Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
[EMAIL PROTECTED]
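Since the suggestion above was posted untested, here is a small self-contained sketch that tries it on toy data, together with a matrix-indexing variant that avoids apply(). The array arr, the position matrix pos and all other names are stand-ins of mine for the poster's objects.

```r
# Toy stand-ins for the 3D array and the matrix of (row, column) positions
arr <- array(1:24, dim = c(2, 3, 4))
pos <- rbind(c(1, 2), c(2, 3), c(2, 1))

# apply() version: one column of z values per position
z_apply <- apply(pos, 1, function(i) arr[i[1], i[2], ])

# Matrix-indexing version: build a 3-column (row, col, z) index matrix
nz  <- dim(arr)[3]
idx <- cbind(pos[rep(seq_len(nrow(pos)), each = nz), , drop = FALSE],
             rep(seq_len(nz), times = nrow(pos)))
z_index <- matrix(arr[idx], nrow = nz)
```

Both results are nz-by-npositions matrices, so the z profile of the k-th position is column k. On a large array the index-matrix form may be faster, though I have not timed it.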
Re: [R] datetime data and plotting
At Friday 02:20 PM 10/17/2003 -0400, Gabor Grothendieck wrote:

[material deleted]

Time zones are not part of the problem yet POSIXt forces this extraneous complication on you. chron has no time zones in the first place and therefore allows you to work in the natural frame of the problem, avoiding subtle problems like this. This sort of thing has been discussed a number of times and I had previously suggested that chron be moved to the base or else that a timezone-less version of POSIXt be added to the base. See: https://stat.ethz.ch/pipermail/r-devel/2003-August/027269.html

I also see the usefulness of a time-zone-free time/date class, but why does chron need to be moved to the base to be useful here?

-- Tony Plate
Re: [R] do.call() and aperm()
I've also been thinking about how to specify that 'along' should be length(dim)+1. At the moment one can specify any number from 0 up to length(dim)+1, but as you point out you have to spell out length(dim)+1 as the value for the along argument. It would be possible to make abind() automatically calculate along=length(dim)+1 when given along=NA, or along=-1, or along=+1. Any preferences?

-- Tony Plate

At Tuesday 04:48 PM 10/21/2003 +0100, Robin Hankin wrote:

Hi everyone

I've been playing with do.call() but I'm having problems understanding it. I have a list of n elements, each one of which is d dimensional [actually an n-by-n-by ... by-n array]. Neither n nor d is known in advance. I want to bind the elements together in a higher-dimensional array. Toy example follows with d=n=3.

    f <- function(n){array(n,c(3,3,3))}
    x <- sapply(1:3,f,simplify=FALSE)

Then what I want is

    ans <- abind(x[[1]] , x[[2]] , x[[3]] , along=4)

[abind() is defined in library(abind)]. Note that dim(ans) is c(3,3,3,3), as required.

PROBLEM: how do I tell do.call() that I want to give abind() the extra argument along=4 (in general, I want along=length(dim(x[[1]]))+1)?

Oblig Attempt:

    jj <- function(...){abind(... , along=4)}
    do.call(jj , x)

This works, because I know that d=3 (and therefore use along=4), but it doesn't generalize easily to arbitrary d. I'm clearly missing something basic. Anyone?
Re: [R] do.call() and aperm()
    do.call(abind, c(list.of.arrays, list(along=4)))

This reminds me that I had been meaning to submit an enhancement of abind() that allows the first argument to be a list of arrays so that you could simply do abind(list.of.arrays, along=4), as I find this is a very common pattern.

-- Tony Plate

At Tuesday 04:48 PM 10/21/2003 +0100, Robin Hankin wrote:

Hi everyone

I've been playing with do.call() but I'm having problems understanding it. I have a list of n elements, each one of which is d dimensional [actually an n-by-n-by ... by-n array]. Neither n nor d is known in advance. I want to bind the elements together in a higher-dimensional array. Toy example follows with d=n=3.

    f <- function(n){array(n,c(3,3,3))}
    x <- sapply(1:3,f,simplify=FALSE)

Then what I want is

    ans <- abind(x[[1]] , x[[2]] , x[[3]] , along=4)

[abind() is defined in library(abind)]. Note that dim(ans) is c(3,3,3,3), as required.

PROBLEM: how do I tell do.call() that I want to give abind() the extra argument along=4 (in general, I want along=length(dim(x[[1]]))+1)?

Oblig Attempt:

    jj <- function(...){abind(... , along=4)}
    do.call(jj , x)

This works, because I know that d=3 (and therefore use along=4), but it doesn't generalize easily to arbitrary d. I'm clearly missing something basic. Anyone?
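To generalize the do.call() answer above to arbitrary d, the value of along can be computed from the first array. The sketch below also includes a base-R equivalent for the specific case of a new last dimension, which is my own addition and not something proposed in the thread; the abind call is guarded because the abind package may not be installed.

```r
# Toy data as in the question: three 3x3x3 arrays
f <- function(n) array(n, c(3, 3, 3))
x <- lapply(1:3, f)

d <- length(dim(x[[1]]))  # number of dimensions of each element

# Base-R equivalent for binding along a new last dimension (assumes all
# elements have identical dim): arrays are stored column-major, so
# concatenating the elements and appending one dimension is enough.
ans_base <- array(unlist(x), dim = c(dim(x[[1]]), length(x)))

# The do.call() pattern from the reply, with 'along' computed rather than hard-coded
if (requireNamespace("abind", quietly = TRUE)) {
  ans_abind <- do.call(abind::abind, c(x, list(along = d + 1)))
}
```

The key point is that do.call() takes a single list of arguments, so named extras such as along are just appended to the list of arrays with c().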
Re: [R] do.call() and aperm()
Thanks, I appreciate knowing that. abind() can currently take a fractional value for along, and behaves as per your description of 'catenation' in APL. Does APL supply any hints as to what sort of value to give 'along' to tell abind() to perform 'lamination'?

-- Tony Plate

At Tuesday 01:22 PM 10/21/2003 -0400, Gabor Grothendieck wrote:

I suggest following APL as that is a well thought out system. In APL terms there are two operations here called:

- catenation. In abind, this occurs when along = 1,2,...,length(dim)
- lamination. In abind, this occurs when along = length(dim) + 1

However, the latter is really only one case of lamination, in which the added dimension comes at the end. To do it in full generality would require that one can add the new dimension at any spot, including before the first, between the first and the second, ..., after the last.

In APL notation, if along has a fractional part then the new dimension is placed between floor(along) and ceiling(along). Thus along=1.1 would put the new dimension between the first and second. The actual value of the fractional part is not material.

--- From: Tony Plate [EMAIL PROTECTED]

I've also been thinking about how to specify that 'along' should be length(dim)+1. At the moment one can specify any number from 0 up to length(dim)+1, but as you point out you have to spell out length(dim)+1 as the value for the along argument. It would be possible to make abind() automatically calculate along=length(dim)+1 when given along=NA, or along=-1, or along=+1. Any preferences?

-- Tony Plate
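To make the idea of putting the new dimension "between the first and second" concrete, here is a base-R sketch that laminates at the end and then moves the new dimension with aperm(). It is only an illustration of the concept and makes no claim about how abind() handles a fractional along; all names are mine.

```r
# Three 2x4 matrices, filled with 1, 2 and 3 respectively
x <- lapply(1:3, function(n) array(n, c(2, 4)))

# Laminate after the last dimension: result is 2 x 4 x 3
last <- array(unlist(x), dim = c(dim(x[[1]]), length(x)))

# Move the new dimension (currently 3rd) to position 2: result is 2 x 3 x 4.
# This corresponds to APL-style lamination "between" dimensions 1 and 2.
between <- aperm(last, c(1, 3, 2))

dim(last)
dim(between)
```

After the permutation, between[, k, ] is the k-th original matrix.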
[R] what's going on here with substitute() ?
I was trying to create a function with a value computed at creation time, using substitute(), but I got results I don't understand:

    > this.is.R
    Error: Object "this.is.R" not found
    > substitute(this.is.R <- function() X, list(X=!is.null(options("CRAN")[[1]])))
    this.is.R <- function() TRUE
    > # the above expression as printed is what I want for the function definition
    > eval(substitute(this.is.R <- function() X, list(X=!is.null(options("CRAN")[[1]]))))
    > this.is.R
    function() X
    > this.is.R()
    [1] TRUE
    > X
    Error: Object "X" not found
    > rm(this.is.R)
    > # Try again a slightly different way
    > substitute(this.is.R <- function() X, list(X=!is.null(options("CRAN")[[1]])))
    this.is.R <- function() TRUE
    > .Last.value
    this.is.R <- function() TRUE
    > eval(.Last.value)
    > this.is.R
    function() X
    > this.is.R()
    [1] TRUE
    > rm(this.is.R)

Why is the body of the function X when I substituted a different expression for X? Also, given that the body of the function is X, how does the function evaluate to TRUE, since X is not defined anywhere (except in a list that should have been discarded)?

This happens with both R 1.7.1 and R 1.8.0 (under Windows 2000).

(Yes, I did discover the function is.R(), but I still want to discover what's going on here.)

-- Tony Plate

PS. In S-plus 6.1, things worked as I had expected:

    > substitute(this.is.R <- function() X, list(X=!is.null(options("CRAN")[[1]])))
    this.is.R <- function() F
    > eval(substitute(this.is.R <- function() X, list(X=!is.null(options("CRAN")[[1]]))))
    function() F
    > this.is.R
    function() F
    > this.is.R()
    [1] F
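One way to check what the function really contains, independent of how it prints, is to look at body(). My understanding (an assumption on my part, not something stated in this post) is that R can keep the original source text alongside a function and use it for printing, which would explain seeing "function() X" while the call returns TRUE. The sketch below uses a plain flag instead of options("CRAN"), and the names flag, f, make_flag_fun and g are hypothetical.

```r
flag <- TRUE  # stand-in for the value computed at creation time

# Same construction as in the post: substitute X, then evaluate the assignment
e <- substitute(f <- function() X, list(X = flag))
eval(e)

body(f)  # the actual body, regardless of what printing f shows
f()

# An alternative that avoids substitute(): capture the value in a closure
make_flag_fun <- function(val) {
  force(val)       # evaluate now, so the value is fixed at creation time
  function() val
}
g <- make_flag_fun(flag)
g()
```

With the closure version the printed function shows val rather than the literal value, but the value is still fixed when make_flag_fun() is called.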