[ESS] error merging R source files in ESS mode
Hi,

I'm a long-time Emacs and R user, but have never used ESS (at least not on purpose -- it's bundled with Aquamacs Emacs, which is what I use). I often use Emacs to merge other types of files, but when I selected two R source files to merge, I got the error message:

"Customise alist is not specified, nor ess-local-customize-alist is set."

Comparing the files, however, didn't trigger the error. I'd be grateful for any suggestions about how to get merging working, or for pointers to where to look for information about getting it working. Aside from disabling ESS mode, I'm not sure how to proceed.

Many thanks,
David Romano

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help
[R] ggplot2: how to jitter spaghetti plot so slopes are preserved
Hi,

Suppose I have the data frame given by:

dput(toy.df)
structure(list(id = c(1, 2, 1, 2), time = c(1L, 1L, 2L, 2L),
    value = c(1, 2, 2, 3)), .Names = c("id", "time", "value"),
    row.names = c(NA, 4L), class = "data.frame")

that is:

toy.df
  id time value
1  1    1     1
2  2    1     2
3  1    2     2
4  2    2     3

I can create a spaghetti plot with the command:

ggplot(toy.df, aes(x = time, y = value, group = id, color = factor(id))) + geom_line()

What I'd like to be able to do is jitter the lines themselves by translation so that their slopes are preserved, but so far my attempts to jitter -- within ggplot, as opposed to first jittering toy.df by hand -- seem to always jitter the two points for a given id independently, and thus change the slopes. I'd be grateful for any guidance!

Thanks,
David

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
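One way to translate whole lines rather than individual points is to draw one random offset per id and apply it to both of that id's points before plotting; the seed and offset width below are arbitrary illustrative choices, not something from the thread:

```r
library(ggplot2)

toy.df <- data.frame(id = c(1, 2, 1, 2),
                     time = c(1L, 1L, 2L, 2L),
                     value = c(1, 2, 2, 3))

# One offset per id, merged back in, so both endpoints of a line
# shift by the same amount and the slope is unchanged.
set.seed(1)
offsets <- data.frame(id = unique(toy.df$id),
                      shift = runif(length(unique(toy.df$id)), -0.1, 0.1))
jittered <- merge(toy.df, offsets, by = "id")
jittered$value <- jittered$value + jittered$shift

ggplot(jittered, aes(x = time, y = value, group = id, color = factor(id))) +
  geom_line()
```

This sidesteps ggplot's jitter machinery entirely, which is per-point by design.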
Re: [R] trouble using sapply to perform multiple t-test
Thanks Arun and Jim; this helps me sort out several points I hadn't been aware of! -David

On Sat, Feb 15, 2014 at 1:39 PM, arun smartpink...@yahoo.com wrote:

Hi David,

Check the output of:

lapply(mm, function(x) x)    # mm is a matrix
# and
lapply(as.data.frame(mm), function(x) x)

Try:

sapply(split(mm, col(mm)), function(x) {out <- t.test(x[1:15], x[16:25])$p.value})
#         1         2
# 0.1091573     1.000

# or
sapply(as.data.frame(mm), function(x) t.test(x[1:15], x[16:25])$p.value)
#        V1        V2
# 0.1091573     1.000

A.K.
[R] trouble using sapply to perform multiple t-test
Hi folks,

I'm having trouble with code that used to work, and I can't figure out what's going wrong. I'd be grateful for any help in sorting this out.

Suppose I define a matrix

mm <- matrix(1:15, 25, 2)

and compare the first 15 values of column 1 of mm to the remaining values in the same column, obtaining a p-value as follows:

c1 <- mm[,1]
out <- t.test(c1[1:15], c1[16:25]); out$p.value

This of course works fine, but if I try to embed this line in a call to sapply to repeat it for each column, I get the following:

mm.pvals <- sapply(mm, function(x) {out <- t.test(x[1:15], x[16:25]); out$p.value})
Error in t.test.default(x[1:15], x[16:25]) : not enough 'x' observations

What is baffling is that code like this has worked for me before, and I can't tell what's triggering the error.

Thanks in advance for your help!

Best,
David
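For what it's worth, sapply() loops over the individual elements of a matrix, not its columns, so each x above is a single number and t.test() sees too few observations. A column-wise loop via apply() sidesteps the error (a sketch reusing the thread's mm):

```r
mm <- matrix(1:15, 25, 2)  # the 1:15 values recycle to fill 25 x 2

# apply() with MARGIN = 2 hands each full column to the function,
# which is what the sapply() call was (incorrectly) assumed to do.
mm.pvals <- apply(mm, 2, function(x) t.test(x[1:15], x[16:25])$p.value)
```

sapply(as.data.frame(mm), ...) works for the same reason: a data frame is a list of columns, so sapply iterates column by column.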
Re: [R] how to parallelize 'apply' across multiple cores on a Mac
(I neglected to use reply-all.)

-- Forwarded message --
From: David Romano drom...@stanford.edu
Date: Sat, May 4, 2013 at 11:25 AM
Subject: Re: [R] how to parallelize 'apply' across multiple cores on a Mac
To: Charles Berry ccbe...@ucsd.edu

On Sat, May 4, 2013 at 9:32 AM, Charles Berry ccbe...@ucsd.edu wrote:

David,

If you insist on explicitly parallelizing this: The functions in the recommended package 'parallel' work on a Mac. I would not try to work on each tiny column as a separate function call -- too much overhead if you parallelize -- instead, bundle up 100-1000 columns to operate on.

The calcs you describe sound simple enough that I would just write them in C and use the .Call interface to invoke them. You only need enough working memory in C to operate on one column and space to save the result, so a MacBook with 8GB of memory will handle it with room to breathe. This is a good use case for the 'inline' package, especially if you are unfamiliar with the use of .Call.

===

But it might be as fast to forget about parallelizing this (explicitly). [detailed recommendations deleted] On a Mac, the vecLib BLAS will do crossprod using the multiple cores without your needing to do anything special, so you can forget about 'parallel', 'multicore', etc. Your remaining problem is then to reread steps 2-6 and figure out what 'minimal.matrix' and 'fill.rows' have to be.

===

You can also approach this problem using 'filter', but that can get 'convoluted' (pun intended -- see ?filter).

HTH,

Thanks, Charles, for all the helpful pointers! For the moment, I'll leave parallelization aside, and will explore using 'crossprod' and 'filter'.
Although, given your suggestion that 8 GB of memory should be sufficient if I went the parallel route, I also wonder whether I'm suffering not just from inefficient use of computing resources, but from a memory leak as well: the original 'apply' code would, in much less than a minute, take over the full 18 GB of memory available on my workstation, and then leave it functioning at a crawl for at least a half hour or so. I'll ask about this by reposting this message with a different subject, so no need to address it in this thread.

Thanks again,
David
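Charles's chunking suggestion can be sketched with the 'parallel' package; the chunk size, core count, and placeholder per-column work below are illustrative assumptions, not details from the thread:

```r
library(parallel)

m <- matrix(rnorm(1500 * 1000), nrow = 1500)  # stand-in for the real 1.5Kx165K matrix

# Split column indices into blocks so each worker processes many
# columns per call, instead of paying dispatch overhead per column.
chunks <- split(seq_len(ncol(m)), ceiling(seq_len(ncol(m)) / 250))

results <- mclapply(chunks, function(idx) {
  # placeholder for the real per-column computation
  apply(m[, idx, drop = FALSE], 2, function(col) mean(col))
}, mc.cores = 4)

out <- do.call(c, results)  # reassemble results in column order
```

mclapply forks the R process, which works on a Mac (as Charles notes) but not on Windows.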
[R] memory leak using 'apply'? [was: how to parallelize 'apply' across multiple cores on a Mac]
Hi everyone,

From the answers I've received to the question below, it occurs to me that there may be more than inefficient programming on my part involved: the 'apply' code described below quickly takes up the 18 GB of memory I have available, which leaves my machine functioning at a crawl for the at least 30 minutes (likely more) it takes R to complete its computations. Similar behavior arises when I try to add even a handful of columns to the matrix (data frame, really) I obtain from the 'apply' described below, the only difference being how long it takes to complete the task, which is more on the order of five minutes for adding four columns.

I'd be grateful for any suggestions about how to troubleshoot what's happening, or how to prevent R from taking up so much of the available memory (which is then not released until I restart R)!

Thanks in advance for your help,
David

On Fri, May 3, 2013 at 4:56 PM, David Romano drom...@stanford.edu wrote:

Hi everyone,

I'm trying to use apply (with a call to zoo's rollapply within) on the columns of a 1.5Kx165K matrix, and I'd like to make use of the other cores on my machine to speed it up. (And hopefully also leave more memory free: I find that after I create a big object like this, I have to save my workspace and then close and reopen R to be able to recover memory tied up by R, but maybe that's a separate issue -- if so, please let me know!)

It seems the package 'multicore' has a parallel version of 'lapply', which I suppose I could combine with a 'do.call' (I think) to gather the elements of the output list into a matrix, but I was wondering whether there might be another route. And, in case the particular way I constructed the call to 'apply' might be the source of the problem, here is a deconstructed version of what I did to each column, for easier parsing:

- begin call to 'apply'

Step 1: Identify several disjoint subsequences of fixed length, say length three, of a column.
column.values <- 1:16
desired.subseqs <- c(NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA)
# this vector is used for every column
desired.values <- desired.subseqs * column.values

Step 2: Find the average value of each subsequence.

desired.means <- rollapply(desired.values, 3, mean, fill = NA, align = "right", na.rm = FALSE)
# put mean in highest index of subsequence and retain original vector length
desired.means
[1] NA NA NA NA NA  5 NA NA NA  9 NA NA NA NA 14 NA

Step 3: Shift values forward by one index value, retaining original vector length.

desired.means <- zoo(desired.means)  # in order to be able to use lag.zoo
desired.means <- lag(desired.means, k = -1, na.pad = TRUE)
desired.means
[1] NA NA NA NA NA NA  5 NA NA NA  9 NA NA NA NA 14

Step 4: Use last-observation-carried-forward, retaining original vector length.

desired.means <- na.locf(desired.means, na.rm = FALSE)
desired.means
[1] NA NA NA NA NA NA  5  5  5  5  9  9  9  9  9 14

Step 5: Use next-observation-carried-backward to assign values to the initial sequence of NAs.

desired.means <- na.locf(desired.means, fromLast = TRUE)
desired.means
[1]  5  5  5  5  5  5  5  5  5  5  9  9  9  9  9 14

Step 6: Convert back to a vector (from a zoo object), and subtract from the column.

desired.column <- column.values - coredata(desired.means)
desired.column
[1] -4 -3 -2 -1  0  1  2  3  4  5  2  3  4  5  6  2

- end call to 'apply'

Thanks,
David
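The six steps collapse naturally into one helper function, which could then be applied column-wise; this is only a restatement of the thread's own zoo calls, with the expected result taken from the post itself:

```r
library(zoo)

detrend.column <- function(column.values, desired.subseqs) {
  desired.values <- desired.subseqs * column.values
  # right-aligned window mean, kept at the original length
  desired.means <- rollapply(desired.values, 3, mean,
                             fill = NA, align = "right", na.rm = FALSE)
  desired.means <- lag(zoo(desired.means), k = -1, na.pad = TRUE)  # shift forward
  desired.means <- na.locf(desired.means, na.rm = FALSE)           # carry forward
  desired.means <- na.locf(desired.means, fromLast = TRUE)         # carry backward
  column.values - coredata(desired.means)
}

desired.subseqs <- c(NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA)
detrend.column(1:16, desired.subseqs)
# per the thread: -4 -3 -2 -1 0 1 2 3 4 5 2 3 4 5 6 2
```

With the work factored out this way, the outer loop is just apply(m, 2, detrend.column, desired.subseqs), or its parallel equivalent.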
[R] how to parallelize 'apply' across multiple cores on a Mac
Hi everyone,

I'm trying to use apply (with a call to zoo's rollapply within) on the columns of a 1.5Kx165K matrix, and I'd like to make use of the other cores on my machine to speed it up. (And hopefully also leave more memory free: I find that after I create a big object like this, I have to save my workspace and then close and reopen R to be able to recover memory tied up by R, but maybe that's a separate issue -- if so, please let me know!)

It seems the package 'multicore' has a parallel version of 'lapply', which I suppose I could combine with a 'do.call' (I think) to gather the elements of the output list into a matrix, but I was wondering whether there might be another route. And, in case the particular way I constructed the call to 'apply' might be the source of the problem, here is a deconstructed version of what I did to each column, for easier parsing:

- begin call to 'apply'

Step 1: Identify several disjoint subsequences of fixed length, say length three, of a column.

column.values <- 1:16
desired.subseqs <- c(NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA)
# this vector is used for every column
desired.values <- desired.subseqs * column.values

Step 2: Find the average value of each subsequence.

desired.means <- rollapply(desired.values, 3, mean, fill = NA, align = "right", na.rm = FALSE)
# put mean in highest index of subsequence and retain original vector length
desired.means
[1] NA NA NA NA NA  5 NA NA NA  9 NA NA NA NA 14 NA

Step 3: Shift values forward by one index value, retaining original vector length.

desired.means <- zoo(desired.means)  # in order to be able to use lag.zoo
desired.means <- lag(desired.means, k = -1, na.pad = TRUE)
desired.means
[1] NA NA NA NA NA NA  5 NA NA NA  9 NA NA NA NA 14

Step 4: Use last-observation-carried-forward, retaining original vector length.
desired.means <- na.locf(desired.means, na.rm = FALSE)
desired.means
[1] NA NA NA NA NA NA  5  5  5  5  9  9  9  9  9 14

Step 5: Use next-observation-carried-backward to assign values to the initial sequence of NAs.

desired.means <- na.locf(desired.means, fromLast = TRUE)
desired.means
[1]  5  5  5  5  5  5  5  5  5  5  9  9  9  9  9 14

Step 6: Convert back to a vector (from a zoo object), and subtract from the column.

desired.column <- column.values - coredata(desired.means)
desired.column
[1] -4 -3 -2 -1  0  1  2  3  4  5  2  3  4  5  6  2

- end call to 'apply'

Thanks,
David
[R] how to best add columns to a matrix with many columns
Hi everyone,

I have a large data frame, say df1, with 165K columns, and all but the first four columns of df1 are numeric. I transformed the numeric data and obtained a matrix, call it data.m, with 165K - 4 columns, and then tried to create a second data frame by replacing the numeric columns of df1 with data.m. I did this in two ways, and both ways instantly used up all the available memory, so I was wondering whether there was a better way to do this. Here's what I tried:

df2 <- df1
df2[, 5:length(df1)] <- data.m

and

df2 <- cbind(df1[1:4], data.m)

Thanks,
David
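One memory-leaner route (my suggestion, not something from the thread) is to overwrite the columns by reference with data.table, so the 165K columns are never duplicated wholesale; the small frames below are stand-ins for df1 and the transformed data:

```r
library(data.table)

# Stand-in for df1: 4 leading id columns plus numeric columns.
df1 <- data.frame(a = letters[1:3], b = 1:3, c = 1:3, d = 1:3,
                  x1 = rnorm(3), x2 = rnorm(3))
data.m <- as.data.frame(scale(df1[, 5:6]))  # stand-in for the transform

dt <- as.data.table(df1)
num.cols <- names(dt)[5:ncol(dt)]
# set() assigns each column in place -- no full-table copy per assignment.
for (j in seq_along(num.cols)) {
  set(dt, j = num.cols[j], value = data.m[[j]])
}
```

Both of the original attempts force R to materialize a second full copy of the data, which is why memory fills instantly; in-place assignment pays only one column at a time.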
Re: [R] how to best add columns to a matrix with many columns
Sorry, Jeff, I misspoke: the 'matrix' data.m is really a data frame -- I was just thinking about it as a matrix since it's the numeric part of df1, and didn't realize the thought made its way into the message. So the memory issues are unrelated to converting between data frames and matrices.

-David

On Fri, May 3, 2013 at 8:20 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

I am not seeing any good justification in your description for converting to matrix if you are planning to convert it back to data frame. Memory is going to be inefficiently used if you do this.

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.
Re: [R] question about reproducibility/consistency of principal component and lda directions in R
On Sat, Feb 9, 2013 at 11:43 AM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:

On 08.02.2013 20:14, David Romano wrote:

Hi everyone,

I'm not exactly sure how to ask this question most clearly, but I hope that giving the context in which it occurs for me will help: I'm trying to compare the brain images of two patient populations; each image is composed of voxels (the 3D analogue of pixels), and I have two images per patient, one reflecting grey matter concentration at each voxel, and the other reflecting white matter concentration at each voxel. I determined the groups by means of an analysis that involved information from both types of images, and what I set out to do was to get a rough idea of where in the brain the two groups showed the most striking differences.

My first attempt was to replace -- on a voxel-by-voxel basis -- the bivariate grey/white data by a combined univariate measure, namely the first principal component score. From these principal component scores I calculated Cohen's d to obtain a rough estimate of the effect size at each voxel, and the resulting brain images show very nice separation into meaningful brain regions, some corresponding to negative effect sizes and some to positive ones.

What puzzles me about how nice the separation into brain regions is, is that the meaning of positive and negative is determined by the choice of the first principal component direction at each voxel, but this choice is -- in principle (no pun intended -- sorry!) -- arbitrary. (Meaning whether an eigenvector or its negative is chosen as the direction is in principle arbitrary.) So here are my questions: Does the algorithm used in R produce the same principal component directions if applied to the same data repeatedly?

Yes, but it may change if you execute it on another machine (depends on compiler, hence also 32-bit vs 64-bit, and OS).

And if so, should the directions chosen by the algorithm change continuously with the data?
For example, if one data set were obtained by applying a small amount of noise to another, should the resulting directions be close to each other (as opposed to close to the negative of each other)? (Assuming the data is far from being singular in some vague sense I'm not sure how to make precise.)

Noise means the sign can change again. Of course, you can define yourself e.g. the direction of the very first value and change signs otherwise.

My second attempt was to do the same, but with the first lda scores, so I have the same questions about lda directions, too.

Same for lda.

Best,
Uwe Ligges

Thanks, Uwe; all good to know.

Best,
David
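Uwe's suggestion of imposing a sign convention can be sketched as follows; the particular convention here (make the largest-magnitude loading positive) is one common choice, not something prescribed in the thread:

```r
set.seed(42)
X <- matrix(rnorm(200), ncol = 2)  # stand-in for one voxel's grey/white data

pc <- prcomp(X)
v <- pc$rotation[, 1]  # first principal direction (sign is arbitrary)

# Flip the eigenvector so its largest-magnitude entry is positive;
# repeated runs, other machines, or slightly perturbed data then
# agree on sign whenever the leading loading is well separated.
if (v[which.max(abs(v))] < 0) v <- -v
scores <- X %*% v
```

The same post-hoc flip applies to lda scalings, since they carry the same sign ambiguity.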
[R] different behavior of $ with string literal vs string variable as argument
Hi everyone,

I ran into the issue below while trying to execute a command of the form

apply(list.names, 1, function(x) F(favorite.list$x))

where list.names is a character vector containing the names of the elements of favorite.list and F is some function defined on a list element. Namely, the $ operator doesn't treat the string variable 'x' as the string it represents, so that, e.g.,

ll <- list(ss = "abc")
ll$ss
[1] "abc"
ll$"ss"
[1] "abc"

but

name <- "ss"
ll$name
NULL

I can get around this by using integers and the [[ and [ operators, but I'd like to be able to use names directly, too -- how would I go about doing this?

Thanks for your help in clarifying what might be going on here.

David
Re: [R] different behavior of $ with string literal vs string variable as argument
On Sun, Feb 10, 2013 at 1:40 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

You can use names with [[, e.g. ll[[name]] will work in your example. You can see more details in the help topic help("$"), in the section "Recursive (list-like) objects".

Duncan Murdoch

Thanks, Duncan (and Michael, earlier); this clears everything up. And just so the help topic language is included in this thread:

Recursive (list-like) objects

Indexing by [ is similar to atomic vectors and selects a list of the specified element(s). Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does.

-- which I take to mean that the argument to $ cannot require evaluation of any kind, and so must be a string literal.

Thanks again,
David
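Duncan's point in runnable form, mirroring the thread's example (the nchar call stands in for the unspecified function F):

```r
ll <- list(ss = "abc")
name <- "ss"

ll[[name]]  # [[ evaluates its argument: returns "abc"
ll$name     # $ uses the literal name "name": returns NULL

# So the original command over element names becomes, e.g.:
sapply(names(ll), function(x) nchar(ll[[x]]))
```

Since names(ll) is a plain character vector, sapply (rather than apply with a MARGIN) is the natural way to loop over it.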
Re: [R] different behavior of $ with string literal vs string variable as argument
Sorry, this was meant to go to the full list. -David

On Sun, Feb 10, 2013 at 2:15 PM, David Romano drom...@stanford.edu wrote:

On Sun, Feb 10, 2013 at 1:59 PM, Bert Gunter gunter.ber...@gene.com wrote:

Please read the Help before posting. ?"$" says:

It helps to know that "$" must be quoted, so thanks again goes to Duncan for pointing this out.

Both [[ and $ select a single element of the list. The main difference is that $ **does not allow computed indices**, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument. [emphasis added]

In other words, $ does not evaluate its argument.

This also appeared just a couple of days ago on this list, so please also search the Help archives before posting.

I did search, but as Ben points out in the next message in the thread, it's tricky to formulate the search to get hits, and, for example, I wouldn't have realized the post he refers to there involves the same issue unless I already knew the answer.

David

-- Bert
David

--
Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] how to extract tests for collinearity and constancy used in lda
Just posting to answer my own question, at least for the "variables constant" error: I hadn't noticed that lda has an argument called 'tol' that governs when variables are interpreted as constant within groups; it's right there in the help entry for lda, so I apologize for having missed it. As for the "variables collinear" warning, it's still not clear to me what level of correlation will trigger it.

My apologies,
David
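Given the tol argument, a pre-filter along these lines could drop slices whose within-group variation is too small before lda ever sees them; the grouping, the use of MASS::lda's default tol = 1e-4, and the exact comparison are illustrative assumptions rather than lda's internal test:

```r
groups <- factor(rep(1:2, each = 5))
tol <- 1e-4  # lda()'s default 'tol'

# TRUE if every variable varies within every group by more than tol.
slice.ok <- function(slice, groups, tol) {
  all(sapply(split(as.data.frame(slice), groups), function(g) {
    all(apply(g, 2, sd) > tol)
  }))
}

slices <- list(matrix(rnorm(20), 10, 2),
               matrix(c(rep(1, 10), rnorm(10)), 10, 2))  # 2nd: constant column
keep <- sapply(slices, slice.ok, groups = groups, tol = tol)
good.slices <- slices[keep]
```

For the exact criterion, the source of MASS::lda (it is plain R code, viewable with MASS:::lda.default) is the authoritative reference.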
[R] question about reproducibility/consistency of principal component and lda directions in R
Hi everyone,

I'm not exactly sure how to ask this question most clearly, but I hope that giving the context in which it occurs for me will help: I'm trying to compare the brain images of two patient populations; each image is composed of voxels (the 3D analogue of pixels), and I have two images per patient, one reflecting grey matter concentration at each voxel, and the other reflecting white matter concentration at each voxel. I determined the groups by means of an analysis that involved information from both types of images, and what I set out to do was to get a rough idea of where in the brain the two groups showed the most striking differences.

My first attempt was to replace -- on a voxel-by-voxel basis -- the bivariate grey/white data by a combined univariate measure, namely the first principal component score. From these principal component scores I calculated Cohen's d to obtain a rough estimate of the effect size at each voxel, and the resulting brain images show very nice separation into meaningful brain regions, some corresponding to negative effect sizes and some to positive ones.

What puzzles me about how nice the separation into brain regions is, is that the meaning of positive and negative is determined by the choice of the first principal component direction at each voxel, but this choice is -- in principle (no pun intended -- sorry!) -- arbitrary. (Meaning whether an eigenvector or its negative is chosen as the direction is in principle arbitrary.)

So here are my questions: Does the algorithm used in R produce the same principal component directions if applied to the same data repeatedly? And if so, should the directions chosen by the algorithm change continuously with the data? For example, if one data set were obtained by applying a small amount of noise to another, should the resulting directions be close to each other (as opposed to close to the negative of each other)?
(Assuming the data is far from being singular in some vague sense I'm not sure how to make precise.)

My second attempt was to do the same, but with the first lda scores, so I have the same questions about lda directions, too.

Any light you could shed on these questions would be very welcome!

Thanks in advance,
David Romano
Re: [R] how to multiply list of matrices by list of vectors
Thanks Rolf and Arun! -David

On Wed, Feb 6, 2013 at 6:13 AM, arun smartpink...@yahoo.com wrote:

Hi,

I got an error message with:

vlist <- apply(mm, list)
Error in match.fun(FUN) : argument "FUN" is missing, with no default

# assuming that vlist <- apply(mm, 2, list)
mapply("%*%", mlist, vlist[1:2], SIMPLIFY = FALSE)
# [[1]]
#      [,1]
# [1,]   19
# [2,]   22
# [3,]   25
# [4,]   28
#
# [[2]]
#      [,1]
# [1,]   67
# [2,]   74
# [3,]   81
# [4,]   88

A.K.
[R] how to extract tests for collinearity and constancy used in lda
Hi everyone,

I'm trying to vectorize an application of lda to each 2D slice of a 3D array, but am running into trouble: quite a few 2D slices trigger either the "variables are collinear" warning or, worse, the "variable appears to be constant within groups" error, which halts the computation entirely rather than skipping the bad slice. In some cases neither condition is literally true, so I expect the warning and error are triggered in a neighborhood of collinearity or within-group constancy, and I would like to be able to remove the offending slices in advance. Does anyone know where I can find the explicit tests that are used for these?

Thanks in advance for any light you can help shed on this question.

Best, David

P.S. The 3D array has roughly 40K 2D slices, so inspection by hand is not an option!
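MASS does not export these checks, but a pre-screen in the same spirit can flag fragile slices before lda ever sees them: treat a variable as constant within groups when its within-group standard deviation falls below a tolerance (1e-4 mirrors lda's documented default for tol), and as collinear when the centered slice is rank-deficient. This is a sketch that approximates, not reproduces, lda's internal tests, and screen_slice is a hypothetical helper name:

```r
# Hypothetical pre-screen: flag slices likely to upset MASS::lda.
# A variable whose within-group sd is below tol is treated as "constant
# within groups"; a rank-deficient centered slice as "collinear".
screen_slice <- function(x, grouping, tol = 1e-4) {
  g <- as.factor(grouping)
  # per-column group means, replicated to the rows of x
  group.means <- apply(x, 2, function(col) ave(col, g))
  # within-group spread of each variable
  f1 <- sqrt(apply(x - group.means, 2, var))
  constant  <- any(f1 < tol)
  collinear <- qr(scale(x, center = TRUE, scale = FALSE))$rank < ncol(x)
  !constant && !collinear
}

set.seed(1)
ok  <- screen_slice(matrix(rnorm(20), 10, 2), rep(1:2, each = 5))
bad <- screen_slice(cbind(rnorm(10), 0), rep(1:2, each = 5))  # constant column
```

Applied over the 40K slices with apply(..., 2, ...), the resulting logical vector selects the slices safe to feed to lda.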
[R] how to multiply list of matrices by list of vectors
Hi everyone,

I'd like to be able to apply lda to each 2D matrix slice of a 3D array, and then use the scalings to obtain the corresponding lda scores. I can use 'apply' to get a list of the lda output for each 2D slice, and can create a list of the resulting scalings, but I'm not sure how to multiply them in a vectorized way.

Here's how I made a list of 2D matrices (suggestions on improving this would be welcome, too!):

    aa <- array(1:24, c(4,2,3))
    mlist <- apply(aa, 2, list)
    mlist <- lapply(mlist, unlist)
    mlist <- lapply(mlist, function(x) matrix(x, 4, 2))

and here's how I made a list of vectors:

    mm <- matrix(1:6, 2, 3)
    vlist <- apply(mm, list)
    vlist <- lapply(vlist, unlist)

Now I'd like to make the list whose i-th element is mlist[[i]] %*% vlist[[i]], without having to loop through the indices. Any help would be appreciated!

Thanks, David
Re: [R] odd behavior of browser()
On Tue, Dec 4, 2012 at 2:12 PM, David Romano <drom...@stanford.edu> wrote:

On Tue, Dec 4, 2012 at 10:22 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

On 04/12/2012 12:54 PM, David Romano wrote:

Hi everyone, I normally include a call to browser() as I'm working out the kinks in my scripts, and I am always able to step through each line by hitting Return, but for some reason, in the scripts I'm working on now, hitting Return seems to cause execution of *all* the lines in my script. I've restarted R several times in case it was stuck in a bad state for some reason, but I'm consistently getting this behavior anyway. Has anyone run into this problem before? Maybe I inadvertently reset preferences?

I wouldn't have expected that to work. Calling browser() from within a function will let you step through the function, but calling it from within a script doesn't. Do you really have some scripts where this worked?

Duncan Murdoch

Hi Duncan (and this addresses Michael's earlier comment, too),

I've been using browser() in scripts since this summer, which is when I started using R, and until now it has always worked to step through the scripts, whether or not there were blank lines in the script...

David Romano

Hi everyone,

I forgot to cc r-help in the response above, and found out why browser() had been working in my scripts up to now: all of my scripts so far had consisted of a body of code that was applied over the same range of contexts via nested 'for' loops, so that each script had the form

    browser()
    for (c in context) {
      body
    }

in which case I could run through the body one line at a time. So, outside of when it's called from inside a function, I still can't make sense of exactly when browser() will do this, but I now have at least one way to step through a script. Thanks to Michael and Duncan for their skepticism, which kept me going in search of what happened!
David Romano

An example which produces this behavior is the following:

file bugcheck.r:

    browser()
    a <- 1
    b <- 2

> source("bugcheck.r")
Called from: eval(expr, envir, enclos)
Browse[1]> <Return>
> ls()
[1] "a" "b"
> a
[1] 1
> b
[1] 2

I'd be grateful for any help in resolving this!

Thanks, David Romano
[R] using 'apply' to apply princomp to an array of datasets
Hi everyone,

Suppose I have a 3D array of datasets, where say dimension 1 corresponds to cases, dimension 2 to datasets, and dimension 3 to observations within a dataset. As an example, suppose I do the following:

    x <- sample(1:20, 48, replace=TRUE)
    datasets <- array(x, dim=c(4,3,2))

Here, for each j=1,2,3, I'd like to think of datasets[,j,] as a single data matrix with four cases and two observations. Now, I'd like to be able to do the following: apply pca to each dataset, and create a matrix of the first principal component scores. In this example, I could do:

    pcl <- apply(datasets, 2, princomp)

which yields a list of princomp output, one for each dataset, so that the vector of first principal component scores for dataset 1 is obtained by

    score1set1 <- pcl[[1]]$scores[,1]

and I could then obtain the desired matrix by

    score1matrix <- cbind(score1set1, score1set2, score1set3)

So my first question is:

1) How could I use *apply to do this? I'm having trouble because pcl is a list of lists, so I can't use, say, do.call(cbind, ...) without first having a list of the first component score vectors, which I'm not sure how to produce.

My second question is:

2) Having answered question 1), now suppose there may be datasets containing NA values: how could I select the subset of values from dimension 2 corresponding to the datasets for which this is true (again using *apply)?

Thanks in advance for any light you might be able to shed on these questions!

David Romano
Re: [R] using 'apply' to apply princomp to an array of datasets
Sorry, I just realized I didn't send the message below in plain text. -David Romano

On Wed, Dec 12, 2012 at 9:14 AM, David Romano <drom...@stanford.edu> wrote:
Re: [R] using 'apply' to apply princomp to an array of datasets
Thank you, Rui! This is incredibly helpful -- anonymous functions are new to me, and I appreciate being shown how useful they are.

Best regards, David

On Wed, Dec 12, 2012 at 10:12 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:

Hello,

As for the first question try

    scoreset <- lapply(pcl, function(x) x$scores[, 1])
    do.call(cbind, scoreset)

As for the second question, you want to know which columns in 'datasets' have NA's?

    colidx <- apply(datasets, 2, function(x) any(is.na(x)))
    datasets[, colidx]  # These have NA's

For the column numbers you can do

    colnums <- which(colidx)

Hope this helps,
Rui Barradas

Em 12-12-2012 17:14, David Romano wrote:
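Putting the thread's pieces together, here is a minimal end-to-end sketch, keeping the toy dimensions from the question and assuming complete data (so the NA screen comes back all-FALSE):

```r
set.seed(42)
x <- sample(1:20, 48, replace = TRUE)
datasets <- array(x, dim = c(4, 3, 2))  # 4 cases x 3 datasets x 2 observations

# One princomp fit per dataset slice datasets[, j, ]:
pcl <- apply(datasets, 2, princomp)

# First-PC scores, one column per dataset (Rui's lapply/do.call in one sapply):
score1matrix <- sapply(pcl, function(p) p$scores[, 1])
dim(score1matrix)  # 4 x 3

# Which datasets (along dim 2) contain any NA:
has.na <- apply(datasets, 2, function(x) any(is.na(x)))
which(has.na)
```

apply with MARGIN = 2 hands each 4x2 slice to princomp, and because princomp returns a list object, apply returns a list rather than simplifying, which is exactly the structure sapply then reduces to the score matrix.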
[R] odd behavior of browser()
Hi everyone,

I normally include a call to browser() as I'm working out the kinks in my scripts, and I am always able to step through each line by hitting Return, but for some reason, in the scripts I'm working on now, hitting Return seems to cause execution of *all* the lines in my script. I've restarted R several times in case it was stuck in a bad state for some reason, but I'm consistently getting this behavior anyway. Has anyone run into this problem before? Maybe I inadvertently reset preferences?

An example which produces this behavior is the following:

file bugcheck.r:

    browser()
    a <- 1
    b <- 2

> source("bugcheck.r")
Called from: eval(expr, envir, enclos)
Browse[1]> <Return>
> ls()
[1] "a" "b"
> a
[1] 1
> b
[1] 2

I'd be grateful for any help in resolving this!

Thanks, David Romano
[R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers
Hi everyone,

I have a data frame one of whose columns is a character vector and the rest are numeric, and in debugging a script, I noticed that an ifelse call seems to be coercing the character column to a numeric column, and producing unintended values as a result. Roughly, here's what I tried to do:

df: a data frame with, say, the first column as a character column and the second and third columns numeric. Also: NA's occur only in the numeric columns, and if they occur in one, they occur in the other as well.

I wanted to replace the NA's in column 2 with 0's and the ones in column 3 with 1's, so first I did this:

    na.replacements <- ifelse(col(df) == 2, 0, 1)

Then I used a second ifelse call to try to remove the NA's as I wanted, first by doing this:

    clean.df <- ifelse(is.na(df), na.replacements, df)

which produced a list of lists vaguely resembling df, with the NA's mostly intact, and so then I tried this:

    clean.df <- ifelse(is.na(df), na.replacements, unlist(df))

which seems to work if all the columns are numeric, but otherwise changes strings to numbers. I can't make sense of the help documentation enough to clear this up, but my guess is that the yes and no values passed to ifelse need to be vectors, in which case it seems I'll have to use another approach entirely; but even if that is not the case and lists are acceptable, I'm not sure how to convert a mixed-mode data frame into a vector-like list of elements (which I would hope would work).

I'd be grateful for any suggestions!

Thanks, David Romano
Re: [R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers
Thanks for the suggestion, Bert; I just re-read the introduction with particular attention to the sections you mentioned, but I don't see how any of it bears on my question. Namely, to rephrase: what constraints are there on the form of the yes and no values required by ifelse? The introduction doesn't really speak to this, and the help documentation seems to suggest that as long as the shapes of the test, yes, and no values agree, that would be sufficient; I don't see anything that specifies that any of these should be of a particular data type. My example, however, seems to indicate that the yes and no values can't be a mixture of characters and numbers, and I'm trying to figure out what the underlying constraints on ifelse are.

Thanks again, David

On Thu, Nov 15, 2012 at 6:46 AM, Bert Gunter <gunter.ber...@gene.com> wrote:

David:

You seem to be getting lost in basic R tasks. Have you read the Intro to R tutorial? If not, do so, as this should tell you how to do what you need. If so, re-read the sections on indexing ("["), replacement, and NA's. Also read about character vectors and factors.

-- Bert

On Thu, Nov 15, 2012 at 3:19 AM, David Romano <drom...@stanford.edu> wrote:

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
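The underlying constraint is that ifelse coerces yes and no to a common atomic type (a data frame effectively gets unlisted), which is what mangles the character column. The usual idiom sidesteps ifelse entirely and replaces NA's column by column; a sketch on a small hypothetical data frame with the layout described in the post:

```r
df <- data.frame(id = c("a", "b", "c"),
                 x  = c(1, NA, 3),
                 y  = c(NA, 5, 6),
                 stringsAsFactors = FALSE)

# Replace NA's column by column: the character column is never touched,
# so no type coercion can occur.
df$x[is.na(df$x)] <- 0
df$y[is.na(df$y)] <- 1
df
```

Logical subscript replacement operates within one column at a time, so each column keeps its own type, which is exactly what the whole-data-frame ifelse call cannot guarantee.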
Re: [R] splitting character vectors into multiple vectors using strsplit
Hi again,

I just wanted to thank folks for their suggestions; they led me to understand the *apply family a little better, and to realize the issue was really a question of how to convert a list of equal-length vectors into a matrix. In this case sapply only needs to be asked to identify these vectors individually; I don't know if R has the equivalent of an identity function, but the following solution accomplishes this:

    splitvectors <- sapply(splitlist, function(x) x)
    splitvectors
         [,1] [,2]
    [1,] "a1" "a2"
    [2,] "b1" "b2"

or, by replacing the anonymous function by c, we obtain a more elegant but more wasteful solution.

Thanks again for everyone's help,
David Romano

On Fri, Sep 7, 2012 at 11:12 AM, David Romano <roma...@grinnell.edu> wrote:
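For the record, base R does ship an identity function, identity, and the row-per-string shape can also be had directly with do.call(rbind, ...); a small sketch reusing the thread's example:

```r
charvec <- c("a1.b1", "a2.b2")
splitlist <- strsplit(charvec, split = ".", fixed = TRUE)

m <- do.call(rbind, splitlist)  # one row per original string
a.part <- m[, 1]                # "a1" "a2"
b.part <- m[, 2]                # "b1" "b2"

# identity replaces the thread's anonymous function(x) x;
# the sapply result is the transpose of m:
identical(sapply(splitlist, identity), t(m))  # TRUE
```

do.call(rbind, splitlist) relies on every element of splitlist having the same length, which holds here because each input string contains exactly one separator.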
[R] splitting character vectors into multiple vectors using strsplit
Hi folks,

Suppose I create the character vector charvec by

    charvec <- c("a1.b1", "a2.b2")
    charvec
    [1] "a1.b1" "a2.b2"

and then I use strsplit on charvec as follows:

    splitlist <- strsplit(charvec, split=".", fixed=TRUE)
    splitlist
    [[1]]
    [1] "a1" "b1"

    [[2]]
    [1] "a2" "b2"

I was wondering whether there is already a function which can extract the "a" and "b" parts of the list splitlist; that is, that can return the same vectors as those created by c("a1","a2") and c("b1","b2").

Thanks, David Romano
[R] ways of getting around allocMatrix limit?
I need to multiply two very large, non-sparse matrices, and so get the error "allocMatrix: too many elements specified". Is there a way to raise the limit for allocMatrix? In my case, the two matrices, A and B, are n x m and m x p where m is small, so I could subdivide each into blocks of submatrices A = rbind(A1, A2, ...) and B = cbind(B1, B2, ...) and then multiply each pair of submatrices, but I was thinking there must be a better way to get around the allocMatrix limit. I'd be grateful for any suggestions!

Thanks, David
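The block scheme from the post can be sketched as follows: split A into row blocks and B into column blocks, compute each tile A_i %*% B_j, and reassemble. Small matrices stand in for the genuinely large ones, and the block size of 2 is purely illustrative:

```r
# Tiled matrix product: result tile (i, j) is A_i %*% B_j, where A_i is a
# row block of A and B_j a column block of B. Only one tile's worth of
# product is materialized at a time inside each lapply step.
tiled_multiply <- function(A, B, block = 2L) {
  row.starts <- seq(1L, nrow(A), by = block)
  col.starts <- seq(1L, ncol(B), by = block)
  do.call(rbind, lapply(row.starts, function(i) {
    rows <- i:min(i + block - 1L, nrow(A))
    do.call(cbind, lapply(col.starts, function(j) {
      cols <- j:min(j + block - 1L, ncol(B))
      A[rows, , drop = FALSE] %*% B[, cols, drop = FALSE]
    }))
  }))
}

set.seed(7)
A <- matrix(rnorm(12), 4, 3)
B <- matrix(rnorm(15), 3, 5)
all.equal(tiled_multiply(A, B), A %*% B)  # TRUE
```

Note the final do.call(rbind, ...) still allocates the full n x p result, so this only helps when the intermediate allocations, not the result itself, are what exceed the limit.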
Re: [R] using save() to work with objects that exceed memory capacity
On Sun, Jul 29, 2012 at 7:08 AM, R. Michael Weylandt <michael.weyla...@gmail.com> wrote:

On Sat, Jul 28, 2012 at 10:48 AM, David Romano <roma...@grinnell.edu> wrote:

Context: I'm relatively new to R and am working with very large datasets.

General problem: If working on a dataset requires that I produce more than two objects of roughly the size of the dataset, R quickly uses up its available memory and slows to a virtual halt.

My tentative solution: to save and remove objects as they're created, and load them when I need them. To do this I'm trying to automatically generate file names derived from these objects, and use these in save().

My specific question to the list: How do I capture the string that names an object I want to save, in such a way that I can use it in a function that calls save()?

For example, suppose I create a matrix and then save it as follows:

    mat <- matrix(1:9, 3, 3)
    save(mat, file="matfile")

Then I get a file of the kind I'd like: the command load("matfile") retrieves the correct matrix, with the original name 'mat'. Further, if I instead save it this way:

    objectname <- "mat"
    save(list=ls(pattern=objectname), file="matfile")

then I get the same positive result. But now suppose I create a function

    saveobj <- function(objectname, objectfile) {
      save(list=ls(pattern=objectname), file=objectfile)
      return()
    }

Then if I now try to save 'mat' by

    matname <- "mat"
    saveobj(matname, "matfile")

I do not get the same result; namely, the command 'load(mat)' retrieves no objects. Why is this?

load("matfile"), no?

Yes.
It seems to work for me:

R> x <- matrix(1:9, ncol = 3)
R> saveobj <- function(obj, file) {
+    save(list = obj, file = file)
+  }
R> saveobj("x", "amatrix.rdat")
R> rm(x)
R> exists("x")
[1] FALSE
R> load("amatrix.rdat")
R> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Cheers, Michael

Thanks, Michael, for locating the trouble in the unnecessary call to ls(), and thanks to Duncan Murdoch, too, for pointing out how ls() was causing the observed behavior: without including an argument like envir=parent.frame(), ls() only returns local objects created after the call to saveobj. Very helpful -- thanks to you both!

Best, David
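A variant in the spirit of the fixes discussed: pass the object itself and recover its name with deparse(substitute(...)), so there is no ls() call and hence no environment surprise. The helper name saveobj is the thread's; the body is a sketch, not the thread's final code:

```r
saveobj <- function(object, objectfile) {
  # Recover the name the caller used for `object` and save under that name,
  # via a scratch environment so save() finds exactly one binding.
  name <- deparse(substitute(object))
  e <- new.env()
  assign(name, object, envir = e)
  save(list = name, file = objectfile, envir = e)
  invisible(objectfile)
}

mat <- matrix(1:9, 3, 3)
f <- tempfile(fileext = ".RData")
saveobj(mat, f)
rm(mat)
load(f)
exists("mat")  # TRUE: restored under its original name
```

Because substitute() captures the unevaluated argument, the caller writes saveobj(mat, f) rather than saveobj("mat", f), and the file name can be derived from name just as easily as passed in.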
[R] using save() to work with objects that exceed memory capacity
Context: I'm relatively new to R and am working with very large datasets.

General problem: If working on a dataset requires that I produce more than two objects of roughly the size of the dataset, R quickly uses up its available memory and slows to a virtual halt.

My tentative solution: to save and remove objects as they're created, and load them when I need them. To do this I'm trying to automatically generate file names derived from these objects, and use these in save().

My specific question to the list: How do I capture the string that names an object I want to save, in such a way that I can use it in a function that calls save()?

For example, suppose I create a matrix and then save it as follows:

    mat <- matrix(1:9, 3, 3)
    save(mat, file="matfile")

Then I get a file of the kind I'd like: the command load("matfile") retrieves the correct matrix, with the original name 'mat'. Further, if I instead save it this way:

    objectname <- "mat"
    save(list=ls(pattern=objectname), file="matfile")

then I get the same positive result. But now suppose I create a function

    saveobj <- function(objectname, objectfile) {
      save(list=ls(pattern=objectname), file=objectfile)
      return()
    }

Then if I now try to save 'mat' by

    matname <- "mat"
    saveobj(matname, "matfile")

I do not get the same result; namely, the command 'load(mat)' retrieves no objects. Why is this?

I'd be grateful for any help on either my specific questions, or suggestions of better ways to address the issue of limited memory.

Thanks, David Romano