Re: [R] Convert factor to numeric vector of labels
Hi all,

If we, the R community, are endeavoring to make R user-friendly (gasp!), I think that one of the first places to start would be setting stringsAsFactors = FALSE. Several times I've run into folks decrying R's ridiculous usage of memory in reading data, only to find out that they were unknowingly importing certain columns as factors. The fix is easy once you know it, but it isn't obvious to new users, and I'd bet that it turns some percentage of people off of the program. Factors are not used often enough to justify this default behavior, in my opinion. When factors are used, the user knows to treat the variable as a factor, and so it can be done on a case-by-case (or should I say variable-by-variable?) basis. Is this a default that should be changed?

Matt

On 8/13/07, John Kane [EMAIL PROTECTED] wrote:

This is one of R's rather _endearing_ little idiosyncrasies. I ran into it a while ago.
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98090.html

For some reason, possibly historical, the option stringsAsFactors is set to TRUE. As Prof Ripley says, FAQ 7.10 will tell you:

as.numeric(as.character(f)) # for a one-off conversion

From Gabor Grothendieck, a one-off solution for a complete data.frame:

DF <- data.frame(let = letters[1:3], num = 1:3, stringsAsFactors = FALSE)
str(DF) # to see what has happened

You can reset the option globally; see below. However, you might want to read Gabor Grothendieck's comment about this in the thread referenced above, since it could cause problems if you transfer files a lot. Personally, I went with the global option, since I don't tend to transfer programs to other people, and I was getting tired of tracking down errors in my programs caused by numeric and character variables suddenly deciding to become factors.

From Steven Tucker: You can also set this option globally with

options(stringsAsFactors = FALSE) # in \library\base\R\Rprofile

--- Falk Lieder [EMAIL PROTECTED] wrote:

Hi, I have imported a data file to R.
Unfortunately R has interpreted some numeric variables as factors. Therefore I want to reconvert these to numeric vectors whose values are the factor levels' labels. I tried as.numeric(factor), but it returns a vector of the internal factor codes (i.e. 1, 2, 3, ...) instead of the labels (i.e. 0.71, 1.34, 2.61, ...). What can I do instead?

Best wishes, Falk

--
Matthew C Keller
Postdoctoral Fellow
Virginia Institute for Psychiatric and Behavioral Genetics

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
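[Editor's note: a minimal, self-contained sketch of the FAQ 7.10 idiom quoted above; the factor values here are made up for illustration.]

```r
# A "numeric" factor must go through as.character(): as.numeric() alone
# returns the internal level codes, not the labels.
f <- factor(c(0.71, 1.34, 2.61))
as.numeric(f)                # internal codes: 1 2 3 -- not what we want
as.numeric(as.character(f))  # the labels as numbers: 0.71 1.34 2.61
# Slightly more efficient for long factors (also from the FAQ):
as.numeric(levels(f))[f]
```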
Re: [R] Speed up R
Robert,

I'm not exactly an expert, but here's what I think. If you have only 786 MB of RAM on your machine and you are using ~500 MB of it in an R session, that could slow things down considerably, because your machine is trying to find free blocks of memory that haven't been used yet. I would buy additional RAM.

As for Mike Prager's point about the type of hard drive being important, I'm not sure this is right (someone correct me if I'm misunderstanding). R stores and accesses objects in RAM - they aren't stored and accessed on the hard drive except when reading and writing. So hard drive type probably won't make much difference to speed in R.

Matt

On 6/20/07, Robert McFadden [EMAIL PROTECTED] wrote:

-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]

The advantage of dual processors is that you can use the machine for several things at once, including multiple R jobs. For example, when I am doing package checking I am typically checking 4 packages at once on a dual-processor machine to get continuous high utilization.

I would like to thank everybody taking part in the discussion very much. Does the answer above suggest that I can open two R consoles and run simulations simultaneously? If so, will the simulations take roughly half the time - or much less than doing them in turn? During our discussion someone mentioned that RAM is important. But in my computing I do not use up more than 500 MB. I have 786 MB, so it (probably) means that I have enough. Am I right?

Best, Rob

I have little doubt that a Pentium 4 would be much slower than the others. I've just bought an Intel Core 2 Duo E6600 primarily to run 64-bit Linux, but it also has Vista 64 and XP (32-bit) on it. I don't think the differences between the current dual-core chips are really enough to worry about: they will all look slow in less than a year.

--
Brian D.
Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,    Tel: +44 1865 272861 (self)
1 South Parks Road,           +44 1865 272866 (PA)
Oxford OX1 3TG, UK       Fax: +44 1865 272595
Re: [R] Speed up R
So Mike, let me ask you a question. If R runs out of RAM, does it begin to use virtual memory, and hence begin to swap to the hard drive? If so, I can see how a faster hard drive would speed R up when you don't have enough RAM...

On 6/20/07, Mike Prager [EMAIL PROTECTED] wrote:

Matthew Keller [EMAIL PROTECTED] wrote:

Robert, ... As for Mike Prager's point about the type of hard drive being important, I'm not sure this is right (someone correct me if I'm misunderstanding). R stores and accesses objects in RAM - they aren't stored and accessed on the hard drive except when reading and writing. So hard drive type probably won't make much difference to speed in R.

In my experience, it makes a substantial difference if any swapping to disk is going on. That will happen if, e.g., other processes or Windows itself need RAM. Though R keeps the data in RAM, under Windows non-SCSI disk I/O puts a noticeable load on the CPU. As SCSI controllers have CPUs of their own, they offload much of that work from the system CPU. I have compared dual-processor computers with equal RAM, one with a SCSI subsystem and one with fast (7200 RPM) ATA disks and slightly faster CPUs. One was my work machine, one my home machine. The difference was not subtle. For another example, think of how slow laptops seem when multitasking, compared to a good workstation. It is usually the poor disk subsystem that's the bottleneck, not the CPU.

Mike

--
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of trade names does not constitute a NOAA endorsement.
Re: [R] Speed up R
Hi Robert,

Here's my 2 cents. 64-bit is a memory issue, not a speed issue per se. If a concern is increasing RAM (which is important in R, since objects are stored in RAM), then you will want to get 64-bit if you plan on getting a computer with over 4 GB of RAM. I'm not sure about this (someone correct me if I'm wrong), but I think that Windows has problems addressing that much RAM (surely the 64-bit Vista is OK with it though... surely). Linux or Apple (the PowerMac) might be better bets if you want to work with programs that use a lot of RAM. BTW, Intel does make 64-bit chips now; they use them in Macs. As for speed, go with multicore processors with the highest clock speed you can get.

On 6/19/07, Robert McFadden [EMAIL PROTECTED] wrote:

Dear R Users, I hope that there is someone who has experience with the problem that I describe below and will help me. I must buy a new desktop computer, and I'm wondering which processor to choose if my only aim is to speed up R. I would like to reduce simulation time - sometimes it takes days. I am considering one of these (I'm working under Win XP 32-bit):

1. Intel Core2 Duo E6700, 2.67 GHz
2. Dual-Core Intel Xeon 3070, 2.66 GHz
3. AMD Athlon 64 X2 6000+

Or a simple Pentium 4? I'm very confused, because I'm not sure whether R takes advantage of dual cores or not. If not, the Athlon would probably be better, wouldn't it? I would appreciate any help.

Rob
Re: [R] Cause of error message in cov function?
Hi Uwe and other R-folks,

To answer your question, effects.cur is a matrix of real numbers. It turns out that, on updating her R version to the current one, the error disappeared. I'm posting this for posterity, just in case anyone else has a similar problem with the function cov in earlier R versions. I have no idea why the error was occurring, and I can't track it down because I don't have access to R 2.1...

Matt

On 6/13/07, Uwe Ligges [EMAIL PROTECTED] wrote:

Matthew Keller wrote:

Hi all, I have written a script in R that simulates genetically informative data - it is posted on my website and available to the public. This is my first time writing a script for use by others, and I am learning that it isn't as easy as it seems. To the issue: my script runs fine on my machine and on a server I have access to, but a user has written me saying that it crashes the first time the function cov is called. Below is her error message, followed by the version of R she's using. Can anyone help me out here? I can't recreate her error message. Does anyone know what this might have to do with? Is it a version issue (she's using R 2.1)? I'd appreciate any help!!

It may be a version issue, but hard to say, since we do not know what effects.cur is, nor do we have any data to reproduce this.

Uwe Ligges

Matt

ERROR MESSAGE:
cov.varcomp <- cov(t(effects.cur[c("A","AA","D","F","S","E","AGE","AGE.by.A"),]*beta2))
there is an argument missing.
error message:
Error in mean((a - mean(a)) * (b - mean(b))) : argument "b" is missing, with no default

SPECIFICS OF HER MACHINE:
> memory.size()
[1] 10985480
> R.Version()
$platform
[1] "i386-pc-mingw32"
$arch
[1] "i386"
$os
[1] "mingw32"
$system
[1] "i386, mingw32"
$status
[1] ""
$major
[1] "2"
$minor
[1] "1.0"
$year
[1] "2005"
$month
[1] "04"
$day
[1] "18"
$language
[1] "R"
> .Platform
$OS.type
[1] "windows"
$file.sep
[1] "/"
$dynlib.ext
[1] ".dll"
$GUI
[1] "Rgui"
$endian
[1] "little"
$pkgType
[1] "win.binary"
[R] Cause of error message in cov function?
Hi all,

I have written a script in R that simulates genetically informative data - it is posted on my website and available to the public. This is my first time writing a script for use by others, and I am learning that it isn't as easy as it seems. To the issue: my script runs fine on my machine and on a server I have access to, but a user has written me saying that it crashes the first time the function cov is called. Below is her error message, followed by the version of R she's using. Can anyone help me out here? I can't recreate her error message. Does anyone know what this might have to do with? Is it a version issue (she's using R 2.1)? I'd appreciate any help!!

Matt

ERROR MESSAGE:
cov.varcomp <- cov(t(effects.cur[c("A","AA","D","F","S","E","AGE","AGE.by.A"),]*beta2))
there is an argument missing.

error message:
Error in mean((a - mean(a)) * (b - mean(b))) : argument "b" is missing, with no default

SPECIFICS OF HER MACHINE:
> memory.size()
[1] 10985480
> R.Version()
$platform
[1] "i386-pc-mingw32"
$arch
[1] "i386"
$os
[1] "mingw32"
$system
[1] "i386, mingw32"
$status
[1] ""
$major
[1] "2"
$minor
[1] "1.0"
$year
[1] "2005"
$month
[1] "04"
$day
[1] "18"
$language
[1] "R"
> .Platform
$OS.type
[1] "windows"
$file.sep
[1] "/"
$dynlib.ext
[1] ".dll"
$GUI
[1] "Rgui"
$endian
[1] "little"
$pkgType
[1] "win.binary"
Re: [R] Error loading libraries in MAC
Hi Mayte and Gregory,

This is probably a question needing to be posted to r-sig-mac. A search for this problem on that forum turns up lots of hits. I think everyone is having these problems (which makes me pause about whether to switch to Mac, given how much I use R). Below is a message from the r-sig-mac forum from R/Mac guru Simon Urbanek:

On Apr 19, 2007, at 5:47 AM, Byron Ellis wrote:

Actually it appears that a variety of packages are doing this (gtools did it to me earlier today). Looks like there may be a general compilation change, perhaps in preparation for 2.5?

Yes, indeed, R 2.5.0 doesn't need the 4.0.3 compilers anymore and thus no path changes. Unfortunately this seems to have influenced the 2.4 build process. A hot fix is to install gcc 4.0.3 from the full image of R 2.4.1 (open the installation image, click on Packages and then gcc4.0.3 to install), or to use R 2.5.0 RC instead. I'll see if I can find the culprit in the meantime...

On 4/18/07, Gregory Warnes [EMAIL PROTECTED] wrote:

I have received a number of reports of problems with recent universal Mac packages from CRAN when used with R 2.4.1. Has something in the build script changed? -G

On Apr 18, 2007, at 4:49 PM, Mayte Suarez-Farinas wrote:

Hi, I just installed the gmodels package and the installation was successful, but when I tried to load the library I got an error (see below). Interestingly, yesterday I wrote to the maintainer of the RSQLite package because I got the same error. Does somebody know what is going on?
thanks, Mayte

> library(gmodels)
Error in dyn.load(x, as.logical(local), as.logical(now)) :
  unable to load shared library '/Library/Frameworks/R.framework/Versions/2.4/Resources/library/gtools/libs/i386/gtools.so':
  dlopen(/Library/Frameworks/R.framework/Versions/2.4/Resources/library/gtools/libs/i386/gtools.so, 6):
  Library not loaded: /usr/local/gcc4.0/i686-apple-darwin8/lib/libgcc_s.1.0.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/2.4/Resources/library/gtools/libs/i386/gtools.so
  Reason: image not found
Error: package/namespace load failed for 'gmodels'

> R.Version()
$platform
[1] "i386-apple-darwin8.8.1"
$arch
[1] "i386"
$os
[1] "darwin8.8.1"
$system
[1] "i386, darwin8.8.1"
$status
[1] ""
$major
[1] "2"
$minor
[1] "4.1"
$year
[1] "2006"
$month
[1] "12"
$day
[1] "18"
$`svn rev`
[1] 40228
$language
[1] "R"
$version.string
[1] "R version 2.4.1 (2006-12-18)"
Re: [R] Random Sequence
Hi Anup,

(runif(100) > .5)*1           # would give you 0's and 1's
sample(rep(c(-1,1), 50), 100) # a bit slower I think; gives you -1's and 1's

Best, Matt

On 4/12/07, Anup Nandialath [EMAIL PROTECTED] wrote:

Dear Friends, I'm trying to generate a sequence of 100 observations, each either 1 or -1. In other words, the sequence should look something like this:

y = 1 1 -1 1 -1 -1 -1 1 1 ...

Can somebody please give me some direction on how I can do this in R?

Thanks, Anup
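[Editor's note: a further option, not in the original reply - sample() can draw the ±1 values directly with replacement, avoiding the intermediate 0/1 step; a sketch:]

```r
# Draw 100 values, each independently -1 or 1 with equal probability
y <- sample(c(-1, 1), 100, replace = TRUE)
# Or map the runif() 0/1 trick onto {-1, 1} directly
y2 <- 2 * (runif(100) > .5) - 1
```

Note that sample(..., replace = TRUE) gives independent draws, whereas sample(rep(c(-1,1), 50), 100) permutes exactly fifty of each value.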
Re: [R] Rserve and R to R communication
Hi Ramon,

I've been interested in responses to your question. I have what I think is a similar issue: I have a very large simulation script and would like to modularize it by having a main script that calls lots of subscripts. I haven't done that yet, because the only way I could think to do it was to call a subscript, have it run, save the objects from the subscript, and then load those objects back into the main script - which seems like a very slow and onerous way to do it. Would Rserve do what I'm looking for?

On 4/7/07, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote:

Dear All,

The clients.txt file of the latest Rserve package, by Simon Urbanek, says, regarding its R client: "(...) a simple R client, i.e. it allows you to connect to Rserve from R itself. It is very simple and limited, because Rserve was not primarily meant for R-to-R communication (there are better ways to do that), but it is useful for quick interactive connection to an Rserve farm."

Which are those better ways to do it? I am thinking about using Rserve to have an R process send jobs to a bunch of Rserves on different machines. It is like what we could do with Rmpi (or PVM), but without the MPI layer. Therefore, presumably it'd be easier to deal with network problems, machine failures, using checkpoints, etc. (i.e., to try to get better fault tolerance). It seems that Rserve would provide the basic infrastructure for doing that and would save me from reinventing the wheel of using sockets, etc., directly from R. However, Simon's comment about better ways of R-to-R communication made me wonder if this idea really makes sense. What is the catch? Have other people tried similar approaches?

Thanks, R.
--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Re: [R] Rserve and R to R communication
Hi Paul,

Thanks for your message, but I'm not 100% clear on your meaning. Could you unpack your logic a bit? Is this because two (or more) sub-processes begun at precisely the same time will, by default, have the same seed value for any RNG?

Matt

On 4/9/07, Paul Gilbert [EMAIL PROTECTED] wrote:

Matthew Keller wrote:

Hi Ramon, I've been interested in responses to your question. I have what I think is a similar issue - I have a very large simulation script and would like to be able to modularize it by having a main script that calls lots of subscripts -

For simulations you need to worry about the random number generator sequence. I think snow has a scheme for handling this. If you devise your own system, then be sure to look after this (non-trivial) detail.

Paul Gilbert

but I haven't done that yet, because the only way I could think to do it was to call a subscript, have it run, save the objects from the subscript, and then load those objects back into the main script, which seems like a very slow and onerous way to do it. Would Rserve do what I'm looking for?

On 4/7/07, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote:

Dear All, The clients.txt file of the latest Rserve package, by Simon Urbanek, says, regarding its R client: "(...) a simple R client, i.e. it allows you to connect to Rserve from R itself. It is very simple and limited, because Rserve was not primarily meant for R-to-R communication (there are better ways to do that), but it is useful for quick interactive connection to an Rserve farm." Which are those better ways to do it? I am thinking about using Rserve to have an R process send jobs to a bunch of Rserves on different machines. It is like what we could do with Rmpi (or PVM), but without the MPI layer. Therefore, presumably it'd be easier to deal with network problems, machine failures, using checkpoints, etc. (i.e., to try to get better fault tolerance).
It seems that Rserve would provide the basic infrastructure for doing that and would save me from reinventing the wheel of using sockets, etc., directly from R. However, Simon's comment about better ways of R-to-R communication made me wonder if this idea really makes sense. What is the catch? Have other people tried similar approaches?

Thanks, R.

--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Re: [R] matrix similarity comparison
Hi Carlos,

It's not really clear what you're asking here. If all you want is to see which entries are the same and which are different between two matrices of the same dimensions, then this does it:

m1 == m2  # same
m1 != m2  # different

If you want to extract the ones that are the same:

indx <- m1 == m2
indx
      [,1] [,2]  [,3]  [,4]
[1,]  TRUE TRUE FALSE  TRUE
[2,]  TRUE TRUE  TRUE FALSE
[3,] FALSE TRUE FALSE  TRUE
m1[indx]
[1] 1 0 0 1 0 1 1 1

On 3/19/07, Carlos Guerra [EMAIL PROTECTED] wrote:

Good morning to you all,

I have a problem with a set of matrices that I want to compare. I want to see the similarity between them, and to be able to extract the differences between them. They all have the same number of columns and rows, and correspond to presence/absence data. For example:

m1 <- matrix(c(1,0,0,0,1,0,1,1,1,1,1,1), 3, 4)
m2 <- matrix(c(1,0,1,0,1,0,0,1,0,1,0,1), 3, 4)

I tried the function cor2m() [package ecodist], but it didn't work, and my matrices are much bigger than the ones from the example.

Thank you, Carlos

--
Carlos GUERRA
Gabinete de Sistemas de Informacao Geografica
Escola Superior Agraria de Ponte de Lima
Mosteiro de Refoios do Lima
4990-706 Ponte de Lima
Tlm: +351 91 2407109
Tlf: +351 258 909779
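[Editor's note: an addition, not in the original reply - to locate the differing cells by row and column, which() with arr.ind = TRUE works directly on the logical comparison, and the mean of the comparison gives a simple similarity score:]

```r
m1 <- matrix(c(1,0,0,0,1,0,1,1,1,1,1,1), 3, 4)
m2 <- matrix(c(1,0,1,0,1,0,0,1,0,1,0,1), 3, 4)
# Row/column indices of every cell where the matrices disagree
which(m1 != m2, arr.ind = TRUE)
# Proportion of matching cells: a simple similarity measure
mean(m1 == m2)
```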
[R] ideas to speed up code: converting a matrix of integers to a matrix of normally distributed values
Hi all,

[this is a bit hard to describe, so if my initial description is confusing, please try running my code below]

#WHAT I'M TRYING TO DO
I'd appreciate any help in trying to speed up some code. I've written a script that converts a matrix of integers (usually between 1 and 10,000 - these represent allele names) into two new matrices of normally distributed values (representing genetic effects), such that a given number in the integer (allele name) matrix always corresponds to the *same* randomly distributed (genetic) effects. For example, every time my function sees the allele name 3782, it converts that integer into the same two effects (e.g., -.372 1.727), which follow normal distributions (these effects can be correlated; below I've set their correlation to .5). I have an entire matrix of integers, and am converting those into two entire matrices of effects.

#HOW I'VE DONE IT SO FAR
To get the correlations between the effects, I've used the mvrnorm function from MASS. To convert the allele names to genetic effects, I've created a function (make.effects) that resets the seed via set.seed() to the allele name each time it's called. To get the matrix of genetic effects, I use sapply.

#THE PROBLEM
The problem is that I often need to convert matrices that have 500K cells, and do this over several hundred iterations, so it is quite slow. If anyone has ideas to speed this up (e.g., some clever way to convert a matrix of integers to a matrix of effects without using sapply), I would be forever indebted.
##MY CODE
library(MASS)

##run this example to see what I'm talking about above
make.effects <- function(x, mn = 0, var = 1, cov = .5){
  set.seed(x)
  return(mvrnorm(1, mu = c(mn, mn), Sigma = matrix(c(var, cov, cov, var), nrow = 2), empirical = FALSE))}

(alleles <- matrix(c(5400,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009), ncol = 4))
a12 <- array(sapply(alleles, make.effects), dim = c(2, nrow(alleles), ncol(alleles)))
(a1 <- a12[1,,])
(a2 <- a12[2,,])
#I've set the population correlation = .5; empirical corr = .4635
cor(as.vector(a1), as.vector(a2))

##run this to see that the code begins to get pretty slow with even a 3000x4 matrix
system.time({
  alleles <- matrix(rep(c(5400,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009), 1000), ncol = 4)
  a12 <- array(sapply(alleles, make.effects), dim = c(2, nrow(alleles), ncol(alleles)))
  a1 <- a12[1,,]
  a2 <- a12[2,,]
})
Re: [R] ideas to speed up code: converting a matrix of integers to a matrix of normally distributed values
Hi,

Many thanks to Jim and Martin for their suggestions. Using your ideas, I came up with a solution that indexes rather than uses sapply (which called mvrnorm separately for each cell in the matrix). The trick is to create a key matrix once, and then to use the match() function each time I need to take the results from the key matrix and place them in the appropriate spots in an 'effects' matrix. If anyone is interested, here is a solution which speeds up the process by a factor of 200 (!):

unique.allele.seq <- 1:10000

make.effects <- function(allele.seq, seed, mn = 0, var = 1, cov = .5) {
  set.seed(seed)
  return(mvrnorm(length(allele.seq), mu = c(mn, mn), Sigma = matrix(c(var, cov, cov, var), nrow = 2), empirical = FALSE))}

effects.key <- make.effects(unique.allele.seq, 123)
(alleles <- matrix(c(15,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009), ncol = 4))
(indx <- match(alleles, unique.allele.seq))
(a1 <- matrix(effects.key[indx, 1], ncol = ncol(alleles)))
(a2 <- matrix(effects.key[indx, 2], ncol = ncol(alleles)))

#to check timing
system.time({
  alleles <- matrix(rep(c(5400,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009), 1), ncol = 4)
  indx <- match(alleles, unique.allele.seq)
  a1 <- matrix(effects.key[indx, 1], ncol = ncol(alleles))
  a2 <- matrix(effects.key[indx, 2], ncol = ncol(alleles))})

On 3/16/07, jim holtman [EMAIL PROTECTED] wrote:

Considering that the vast majority of your time is spent in the function mvrnorm (on my system 5.7 out of 6.1 seconds), and in your example that is 12000 calls to the function, to improve your speed you have to cut down the number of calls to the function. For example, how many unique integers do you have? Can you do the calls for those and then substitute matching values?
Here is what Rprof showed:

             total.time total.pct self.time self.pct
system.time        6.12      99.7      0.00      0.0
as.vector          6.06      98.7      0.18      2.9
FUN                6.06      98.7      0.12      2.0
array              6.06      98.7      0.10      1.6
lapply             6.06      98.7      0.00      0.0
sapply             6.06      98.7      0.00      0.0
eval               6.04      98.4      0.06      1.0
mvrnorm            5.72      93.2      0.34      5.5
eigen              2.58      42.0      0.52      8.5

or another way of looking at it:

 0                            6.1 root
  1.                          6.1 system.time
  2. .                        6.0 eval
  3. . .                      6.0 eval
  4. . . .                    6.0 array
  5. . . . .                  6.0 as.vector
  6. . . . . .                6.0 sapply
  7. . . . . . .              6.0 lapply
  8. . . . . . . .            6.0 FUN
  9. . . . . . . . .          5.7 mvrnorm
 10. . . . . . . . . .        2.6 eigen
 11. . . . . . . . . . .      1.2 sort.list
 12. . . . . . . . . . . .    1.0 match.arg
 13. . . . . . . . . . . . .  0.7 eval

On 3/16/07, Matthew Keller [EMAIL PROTECTED] wrote:

Hi all, [this is a bit hard to describe, so if my initial description is confusing, please try running my code below]

#WHAT I'M TRYING TO DO
I'd appreciate any help in trying to speed up some code. I've written a script that converts a matrix of integers (usually between 1 and 10,000 - these represent allele names) into two new matrices of normally distributed values (representing genetic effects), such that a given number in the integer (allele name) matrix always corresponds to the *same* randomly distributed (genetic) effects. For example, every time my function sees the allele name 3782, it converts that integer into the same two effects (e.g., -.372 1.727), which follow normal distributions (these effects can be correlated; below I've set their correlation to .5). I have an entire matrix of integers, and am converting those into two entire matrices of effects.

#HOW I'VE DONE IT SO FAR
To get the correlations between the effects, I've used the mvrnorm function from MASS. To convert the allele names to genetic effects, I've created a function (make.effects) that resets the seed via set.seed() to the allele name each time it's called. To get the matrix of genetic effects, I use sapply.
#THE PROBLEM
The problem is that I often need to convert matrices that have 500K cells, and do this over several hundred iterations, so it is quite slow. If anyone has ideas to speed this up (e.g., some clever way to convert a matrix of integers to a matrix of effects without using sapply), I would be forever indebted.

##MY CODE
library(MASS)

##run this example to see what I'm talking about above
make.effects <- function(x, mn = 0, var = 1, cov = .5){
  set.seed(x)
  return(mvrnorm(1, mu = c(mn, mn), Sigma = matrix(c(var, cov, cov, var), nrow = 2), empirical = FALSE))}

(alleles <- matrix(c(5400,3309,2080,1080,2080,,389,9362,6372,3787,2798,1009), ncol = 4))
a12 <- array(sapply(alleles, make.effects), dim = c(2, nrow(alleles), ncol(alleles)))
(a1 <- a12[1,,])
(a2 <- a12[2,,])
#I've set the population correlation = .5; empirical corr = .4635
cor(as.vector
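[Editor's note: the speed-up in the solution above rests on a general lookup-table pattern - precompute one row of effects per possible allele, then use match() to index into that table instead of re-seeding the RNG per cell. A stripped-down sketch of just the lookup step, with toy sizes and plain rnorm() instead of MASS::mvrnorm():]

```r
# Toy version of the lookup-table pattern: one precomputed effect per
# possible allele name, then match() maps a whole matrix of alleles
# onto effects in one vectorized step.
possible.alleles <- 1:100                      # toy range; the post uses 1:10000
set.seed(123)
effect.key <- rnorm(length(possible.alleles))  # one effect per allele name
alleles <- matrix(sample(possible.alleles, 12, replace = TRUE), ncol = 4)
idx <- match(alleles, possible.alleles)
effects <- matrix(effect.key[idx], ncol = ncol(alleles))
# A given allele name always maps to the same effect:
stopifnot(all(effects == effect.key[alleles]))
```

The same idea scales to the two correlated effect columns in the post: the key simply becomes a two-column matrix, indexed by the same idx.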
Re: [R] How to plot two graphs on one single plot?
Hi Yun, If you're asking how to place new graphic material on the same plot (e.g., several lines/points/etc. in a single x-y region), this is covered in the Intro to R manual. E.g., you can do:

plot(x1, y1, type = 'p', xlim = range(x1, x2), ylim = range(y1, y2),
     xlab = 'x', ylab = 'y')
points(x2, y2, col = "red")

See ?lines, ?points, ?text, and ?abline. Also, see the 'new' option in ?par. Best, Matt

On 2/23/07, Yun Zhang [EMAIL PROTECTED] wrote: Hi, I am trying to plot two distribution graphs on one plot, but I don't know how. I set them to the same x, y limits, even the same x, y labels. Code:

x1 = rnorm(25, mean = 0, sd = 1)
y1 = dnorm(x1, mean = 0, sd = 1)
x2 = rnorm(25, mean = 0, sd = 1)
y2 = dnorm(x2, mean = 0, sd = 1)
plot(x1, y1, type = 'p', xlim = range(x1, x2), ylim = range(y1, y2),
     xlab = 'x', ylab = 'y')
plot(x2, y2, type = 'p', col = "red", xlab = 'x', ylab = 'y')

They just don't show up in one plot. Any hint will be very helpful. Thanks, Yun

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- Matthew C Keller Postdoctoral Fellow Virginia Institute for Psychiatric and Behavioral Genetics
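A self-contained sketch of the fix, using the same simulated data as Yun's code with the second plot() call replaced by points() (the set.seed call is added here only for reproducibility):

```r
set.seed(1)  # not in the original code; only makes the picture reproducible
x1 <- rnorm(25); y1 <- dnorm(x1)
x2 <- rnorm(25); y2 <- dnorm(x2)

plot(x1, y1, type = 'p',
     xlim = range(x1, x2), ylim = range(y1, y2),
     xlab = 'x', ylab = 'y')
points(x2, y2, col = "red")   # draws into the existing plot region
```

The second plot() call in the original code opens a fresh plot and erases the first; points() (or lines()) is what adds to an existing one.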
Re: [R] Google Custom Search Engine for R
Hi Sergio, There was a discussion on this board recently about the difficulty of searching the web for R-related material. I think the Google Custom Search Engine is a good idea. It would be helpful if we could have access to the full list of websites it is indexing, so that we could suggest other sites that are missing. As it is, it only tells us that there are 35 websites, and shows us the first several. Also, you might check out Sasha Goodman's Rseek: http://www.rseek.org/ Have you tried comparing the success of yours with Rseek's? All the Best, Matt

On 2/23/07, Sérgio Nunes [EMAIL PROTECTED] wrote: Hi, Since R is a (very) generic name, I've been having some trouble searching the web for this topic. Due to this, I've just created a Google Custom Search Engine that includes several of the most relevant sites that have information on R. See it in action at: http://google.com/coop/cse?cx=018133866098353049407%3Aozv9awtetwy This is really a preliminary test. Feel free to add yourself to the project and contribute with suggestions. Sérgio Nunes
Re: [R] matlab style plotting in R
Hi Maria, I'm interested in the responses you get. The way I do this is to use par(new=TRUE), which tells R not to clean the frame before drawing the next plot. So, e.g.:

###Overlaid plot
op <- par(mar = c(5, 4, 4, 5) + 0.1, las = 2)
plot(x = 1:5, y = rnorm(5) + 1:5, type = 'b',
     xlab = "xlab", ylab = "ylab", main = "Title")
par(new = TRUE)                      # next plot overlays this one
plot(x = 1:5, y = rnorm(5), axes = FALSE, ylab = "", xlab = "",
     type = 'b', col = "red")
axis(4, at = c(-2, -1, 0, 1, 2), labels = c(6, 7, 8, 9, 10))
par(las = 0)
mtext("Other Y", side = 4, line = 3)
par(op)

The trick is this: if your second set of Y variables is on a different scale than your first set, you'll need to transform the second set so that both are on the same scale - otherwise the second set won't show up on the old plot. You'll also, of course, need to back-transform the axis labels for the second set of Y variables so that they read appropriately. A terrific site for R graphics, with lots of worked examples (including ones similar to the one I just did), is here: http://zoonek2.free.fr/UNIX/48_R/all.html

On 2/13/07, Maria Vatapitakapha [EMAIL PROTECTED] wrote: Hello, I was wondering how I can achieve Matlab-style plotting in R, in the sense that Matlab allows you to plot multiple sets of variables within the same x-y axes. plot in R does not seem to cater for this. I tried 'overplot' from the gplots package, but this assumes different y axes for the variables. Any suggestions would be very appreciated. Maria
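Not mentioned in the thread, but when the series all share one scale, base R's matplot() does the Matlab-style overlay in a single call; a minimal sketch:

```r
x <- 1:5
Y <- cbind(rnorm(5) + 1:5, rnorm(5))   # two series as the columns of a matrix

# matplot plots each column of Y against x in one call
matplot(x, Y, type = 'b', col = c("black", "red"),
        pch = 1, lty = 1, xlab = "x", ylab = "y")
```

The par(new=TRUE) route above is still what you need when the two series require different y axes.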
Re: [R] R in Industry
Bob, Far from flaming you, I think you made a good point - one that I imagine most people who use R have come across. The name R is a big impediment to effective online searches. As a check, I entered "R software", "SAS software", "SPSS software", and "S+ software" into Google. The R 'hit rate' was only ten out of the first 20 results (I didn't look any further). For the other three software packages, the hit rates were all 100% (20/20). I do wonder if anything can/should be done about this. I generally search using the term CRAN, but of course that omits lots of stuff relevant to R. Any ideas about how to do effective online searches for R-related materials? Matt

On 2/6/07, Wensui Liu [EMAIL PROTECTED] wrote: I've been looking for a job that allows me to use R/S+ since I got out of graduate school 2 years ago, but with no success. I am wondering if there is something that can be done to promote the use of R in industry. It's been very frustrating to see people doing statistics using Excel/SPSS, and even more frustrating to see people paying $$$ for something much inferior to R.

On 2/6/07, Doran, Harold [EMAIL PROTECTED] wrote: The other day, CNN had a story on working at Google. Out of curiosity, I went to the Google employment web site (I'm not looking, just curious). In perusing their job posts for statisticians, preference is given to those who use R and Python. Other languages, S-Plus and something called SAS, were listed as lower priorities. When I started using Python, I noted they have a portion of the web site with job postings. CRAN does not have something similar, but I think it might be useful. I think R is becoming more widely used in industry, and I wonder if, to help it move along a bit, the maintainer of CRAN could create a section of the web site devoted to jobs where R is a requirement. Hence, we could have our own little monster.com kind of thing going on. Of the multitude of ways the gospel can be spread, this is small.
But, I think every small step forward is good. Anyone think this is useful? Harold

-- WenSui Liu A lousy statistician who happens to know a little programming (http://spaces.msn.com/statcompute/blog)
Re: [R] Download stock prices
Hi Mihai, You might check out the Rmetrics bundle, available on the CRAN website. I've used its fBasics library to download stock prices. Try the yahooImport() function and the keystats() function for downloading specific stock prices. I had to fiddle with the keystats function to get it to work properly, but I wrote to the author of the library and it may have been fixed by now. Best of luck, Matt

On 2/4/07, Mihai Nica [EMAIL PROTECTED] wrote: Greetings: Is there any way to download a (or a sample of a) cross-section of stock market prices? Or is it possible to use get.hist.quote with a *wild card*? Thanks, Mihai

Mihai Nica 170 East Griffith St. G5 Jackson, MS 39201 601-914-0361
Re: [R] Loading functions in R
Hi Jeff, The way I do this is to place all the options that I want, along with functions I've written that I always want available, into the Rprofile.site file, which R loads on startup. That file is located in the etc/ folder; e.g., on my computer it is at C:\Program Files\R\R-2.2.1\etc\Rprofile.site. This is explained in section 10.8 of the R-intro.pdf manual that comes with R. If you only want the functions available sometimes, then use Christos' suggestions. -- Matt

On 1/31/07, Christos Hatzis [EMAIL PROTECTED] wrote: The recommended approach is to make a package for your functions that will include documentation, error checks, etc. Another way to accomplish what you want is to start a new R session, 'source' your .R files, and then save the workspace in a .RData file, e.g. myFunctions.RData. Finally, attach("myFunctions.RData") should do the trick without cluttering your workspace. -Christos

Christos Hatzis, Ph.D. Nuvera Biosciences, Inc. 400 West Cummings Park Suite 5350 Woburn, MA 01801 Tel: 781-938-3830 www.nuverabio.com

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Forest Floor Sent: Wednesday, January 31, 2007 10:41 PM To: r-help@stat.math.ethz.ch Subject: [R] Loading functions in R

Hi all, This information must be out there, but I can't seem to find it. What I want to do is to store functions I've created (as .R files or in whatever form) and then load them when I need them (or on startup) so that I can access them without cluttering my program with the function code. This seems like it should be easy, but ... Thanks! Jeff
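A minimal sketch of Christos' save-and-attach approach (the file name and the stand-in function are hypothetical):

```r
# One-off: collect your functions into an .RData file
myfun <- function(x) x + 1          # stand-in for functions source()d from .R files
save(list = ls(), file = "myFunctions.RData")

# Later sessions: attach the file instead of load()ing it, so the
# functions sit on the search path rather than in your workspace
rm(list = ls())
attach("myFunctions.RData")
myfun(1)                            # found via the search path; ls() stays empty
```

attach()ing an .RData file puts its contents in a separate position on the search path, which is exactly what keeps ls() uncluttered.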
Re: [R] How to print the objects in the memory
Hi usstata, I think this will get you what you want: mget(ls(), globalenv())

On 1/31/07, usstata [EMAIL PROTECTED] wrote: Hi, all: Maybe a pointless question:

a <- 1:10
b <- matrix(1:8, nrow = 4)
c <- letters[4:8]
......
ls()
[1] "a" "b" "c"

ls() can print the names of the objects in memory, but I want to get this result:

a
 [1]  1  2  3  4  5  6  7  8  9 10
b
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8
c
[1] "d" "e" "f" "g" "h"
......

I tried the command print(noquote(ls())), but it doesn't help. Best regards, usstata
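For reference, a hedged alternative to mget() that prints each object's name above its value, closer to the layout usstata asked for:

```r
a <- 1:10
b <- matrix(1:8, nrow = 4)

for (nm in ls()) {       # walk every object in the global environment
  cat(nm, "\n")          # print the name ...
  print(get(nm))         # ... then its value
}
```

mget(ls(), globalenv()) returns the same information as a named list, which prints in a similar `$name` / value layout.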
[R] what is the purpose of an error message in uniroot?
Hi all, This is probably a blindingly obvious question: why does it matter in the uniroot function whether the f() values at the end points that you supply are of the same sign? For example:

f <- function(x, y) {y - x^2 + 1}

#this gives an error
uniroot(f, interval = c(-5, 5), y = 0)
Error in uniroot(f, interval = c(-5, 5), y = 0) :
  f() values at end points not of opposite sign

#this doesn't give an error
uniroot(f, interval = c(.1, 5), y = 0)
$root
[1] 1
$f.root
[1] 1.054e-05
$iter
[1] 9
$estim.prec
[1] 6.104e-05

If I comment out the two lines of script in the uniroot function that produce this error and create a new function, call it uniroot2, everything works as I'd like. But for didactic purposes, why did the creators of uniroot want the f() values at the endpoints to be of opposite sign? Thanks in advance, Matt
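As background to the question (not part of the original message): opposite signs at the endpoints guarantee, by the intermediate value theorem, that a continuous f crosses zero somewhere inside the interval, which is what uniroot's root-bracketing algorithm relies on. Same-sign endpoints carry no such guarantee, as a quick sketch shows:

```r
f <- function(x) -x^2 + 1   # roots at -1 and 1

f(-5) * f(5) > 0            # same sign, yet TWO roots inside (-5, 5)
f(2)  * f(5) > 0            # same sign, and NO root inside (2, 5)
```

Same-sign endpoints cannot distinguish these two cases, so the bracketing step has nothing to work with; that is presumably why uniroot refuses rather than silently returning something.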