[ESS] error merging R source files in ESS mode

2020-06-21 Thread David Romano via ESS-help
Hi,

I'm a long-time Emacs and R user, but have never used ESS (at least
not on purpose -- it's bundled with Aquamacs Emacs, which is what I
use.)

I often use Emacs to merge other types of files, but when I just
selected two R source files to merge, I got the error message:

"Customise alist is not specified, nor  ess-local-customize-alist is set."

but comparing the files didn't trigger the error.

I'd be grateful for any suggestions about how to get merge working, or
where to look for information about how to get it working.  Aside from
disabling ESS-mode, I'm not sure how to proceed.

Many thanks,
David Romano



[R] ggplot2: how to jitter spaghetti plot so slopes are preserved

2014-08-20 Thread David Romano
Hi,

Suppose I have the data frame given by:
> dput(toy.df)
structure(list(id = c(1, 2, 1, 2), time = c(1L, 1L, 2L, 2L),
value = c(1, 2, 2, 3)), .Names = c("id", "time", "value"), row.names = c(NA,
4L), class = "data.frame")

that is:
> toy.df
  id time value
1  1    1     1
2  2    1     2
3  1    2     2
4  2    2     3

I can create a spaghetti plot with the command:
> ggplot(toy.df,aes(x=time,y=value,group=id,color=factor(id))) + geom_line()

What I'd like to be able to do is jitter the lines themselves by
translation so that their slopes are preserved, but so far my attempts
to jitter -- within ggplot, as opposed to first jittering toy.df by
hand -- seem to always jitter the two points for a given id
independently, and thus change the slopes.
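One way to get this effect is to jitter each line as a whole: draw one random
vertical offset per id and add it to that id's values before plotting, so both
points of a line move together and the slope is unchanged. A minimal sketch,
assuming the toy.df above:

library(ggplot2)

set.seed(1)
ids     <- unique(toy.df$id)
offsets <- data.frame(id = ids, dy = runif(length(ids), -0.1, 0.1))
toy.jit <- merge(toy.df, offsets, by = "id")   # one offset per id
toy.jit$value <- toy.jit$value + toy.jit$dy    # shift each whole line

ggplot(toy.jit, aes(x=time, y=value, group=id, color=factor(id))) + geom_line()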

I'd be grateful for any guidance!

Thanks,
David



Re: [R] trouble using sapply to perform multiple t-test

2014-02-16 Thread David Romano
Thanks Arun and Jim; this helps me sort out several points I hadn't been
aware of!   -David


On Sat, Feb 15, 2014 at 1:39 PM, arun smartpink...@yahoo.com wrote:



 Hi David,
 Try:
 Check the output of:
 lapply(mm,function(x) x) #mm is matrix
 #and
 lapply(as.data.frame(mm),function(x) x)




   sapply(split(mm,col(mm)),function(x){out <-
 t.test(x[1:15],x[16:25])$p.value})
 #        1         2
 #0.1091573 1.0000000


 #or
  sapply(as.data.frame(mm), function(x)  t.test(x[1:15],x[16:25])$p.value)
 #       V1        V2
 #0.1091573 1.0000000

 A.K.


 On Saturday, February 15, 2014 3:19 PM, David Romano drom...@stanford.edu
 wrote:
 Hi folks,

 I'm having trouble with code that used to work, and I can't figure out
 what's going wrong.  I'd be grateful for any help in sorting this out.


 Suppose I define a matrix
  mm <- matrix(1:15, 25,2)
 and compare the first 15 values of column 1 of mm to the values remaining
 in the same column and obtain p values as follows:
  c1 <- mm[,1]
  out <- t.test(c1[1:15],c1[16:25]) ; out$p.value

 This of course works fine, but if I try to embed this line in a call to
 sapply to repeat this for each column, I get the following:
  mm.pvals <- sapply(mm, function(x) {out <- t.test(x[1:15],x[16:25]) ;
 out$p.value})
 Error in t.test.default(x[1:15], x[16:25]) : not enough 'x' observations

 What is baffling is code like this has worked for me before, and I can't
 tell what's triggering the error.

 Thanks in advance for your help!

 Best,
 David



[R] trouble using sapply to perform multiple t-test

2014-02-15 Thread David Romano
Hi folks,

I'm having trouble with code that used to work, and I can't figure out
what's going wrong.  I'd be grateful for any help in sorting this out.


Suppose I define a matrix
> mm <- matrix(1:15, 25,2)
and compare the first 15 values of column 1 of mm to the values remaining
in the same column and obtain p values as follows:
> c1 <- mm[,1]
> out <- t.test(c1[1:15],c1[16:25]) ; out$p.value

This of course works fine, but if I try to embed this line in a call to
sapply to repeat this for each column, I get the following:
> mm.pvals <- sapply(mm, function(x) {out <- t.test(x[1:15],x[16:25]) ;
out$p.value})
Error in t.test.default(x[1:15], x[16:25]) : not enough 'x' observations

What is baffling is code like this has worked for me before, and I can't
tell what's triggering the error.
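A minimal sketch of what goes wrong, and of two column-wise alternatives:
sapply() treats a matrix as a plain vector of its cells, so the anonymous
function sees one number at a time; iterating over columns (via apply() or a
data frame) avoids that.

mm <- matrix(rnorm(50), 25, 2)
length(as.list(mm))   # 50: sapply(mm, ...) calls the function once per cell

apply(mm, 2, function(x) t.test(x[1:15], x[16:25])$p.value)               # by column
sapply(as.data.frame(mm), function(x) t.test(x[1:15], x[16:25])$p.value)  # also by column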

Thanks in advance for your help!

Best,
David



Re: [R] how to parallelize 'apply' across multiple cores on a Mac

2013-05-04 Thread David Romano
(I neglected to use reply-all.)

-- Forwarded message --
From: David Romano drom...@stanford.edu
Date: Sat, May 4, 2013 at 11:25 AM
Subject: Re: [R] how to parallelize 'apply' across multiple cores on a Mac
To: Charles Berry ccbe...@ucsd.edu


On Sat, May 4, 2013 at 9:32 AM, Charles Berry ccbe...@ucsd.edu wrote:
 David,

 If you insist on explicitly parallelizing this:

 The functions in the recommended package 'parallel' work on a Mac.

 I would not try to work on each tiny column as a separate function call -
 too much overhead if you parallelize - instead, bundle up 100-1000 columns
 to operate on.

 The calcs you describe sound simple enough that I would just write
 them in C and use the .Call interface to invoke them. You only need enough
 working memory in C to operate on one column and space to save the result.

 So a MacBook with 8GB of memory will handle it with room to breathe.

 This is a good use case for the 'inline' package, especially if you are
 unfamiliar with the use of .Call.


 ===

 But it might be as fast to forget about parallelizing this (explicitly).

[detailed recommendations deleted]

 On a Mac, the vecLib BLAS will do crossprod using the multiple
 cores without your needing to do anything special. So you can forget about
 'parallel', 'multicore', etc.


 So your remaining problem is to reread steps 2-6 and figure out what
 'minimal.matrix' and 'fill.rows' have to be.

 ===

 You can also approach this problem using 'filter', but that can get
 'convoluted' (pun intended - see ?filter).

 HTH,

Thanks, Charles, for all the helpful pointers!   For the moment, I'll
leave parallelization aside, and will explore using 'crossprod' and
'filter'.   Although, given your suggestion that 8 GB of memory should
be sufficient if I went the parallel route, I also wonder whether I'm
suffering not just from inefficient use of computing resources, but
that there's a memory leak as well:   The original 'apply' code would,
in much less than a minute, take over the full 18 GB of memory
available on my workstation, and then leave it functioning at a crawl
for at least a half hour or so.   I'll ask about this by reposting
this message again with a different subject, so no need to address it
in this thread.

Thanks again,
David



[R] memory leak using 'apply'? [was: how to parallelize 'apply' across multiple cores on a Mac]

2013-05-04 Thread David Romano
Hi everyone,

From the answers I've received to the question below, it occurs to me there
may be more than inefficient programming on my part involved:   The 'apply'
code described below quickly takes up the 18 GB of memory I have available,
which leaves my machine functioning at a crawl for the at least 30 minutes
(likely more) it takes for R to complete its computations.   Similar behavior
arises when I try to add even a handful of columns to the matrix (data frame,
really) I obtain from the 'apply' described below, the only difference
being how long it takes to complete the task, which is more on the order of
five minutes for adding four columns.

I'd be grateful for any suggestions about how to trouble-shoot what's
happening, or how to prevent R from taking up so much of the available
memory (which is then not released until I restart R)!
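A small diagnostic sketch along these lines (big_matrix and column_fn are
placeholders, not objects from the thread): reset R's memory counters, run the
step, and compare the size of the result with the peak that gc() reports, which
helps separate "the result is simply huge" from "memory is being held beyond
the result".

gc(reset = TRUE)                         # reset the "max used" counters
res <- apply(big_matrix, 2, column_fn)   # big_matrix / column_fn are placeholders
print(object.size(res), units = "Mb")    # size of the result itself
gc()                                     # "max used" column shows the peak during the call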

Thanks in advance for your help,
David

On Fri, May 3, 2013 at 4:56 PM, David Romano drom...@stanford.edu wrote:

 Hi everyone,

 I'm trying to use apply (with a call to zoo's rollapply within) on the
 columns of a 1.5Kx165K matrix, and I'd like to make use of the other cores
 on my machine to speed it up. (And hopefully also leave more memory free: I
 find that after I create a big object like this, I have to save my
 workspace and then close and reopen R to be able to recover memory tied up
 by R, but maybe that's a separate issue -- if so, please let me know!)

 It seems the package 'multicore' has a parallel version of 'lapply', which
 I suppose I could combine with a 'do.call' (I think) to gather the elements
 of the output list into a matrix, but I was wondering whether there might
 be another route.

 And, in case the particular way I constructed the call to 'apply' might be
 the source of the problem, here is a deconstructed version of what I did to
 each column, for easier parsing:
 -----  begin call to 'apply'  -----
 
 Step 1:  Identify several disjoint subsequences of fixed length, say
 length three, of a column.

 library(zoo)   # rollapply, zoo, lag, na.locf, coredata come from zoo
 column.values <- 1:16
 desired.subseqs <- c( NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1,1,1, NA
 )   # this vector is used for every column.
 desired.values <- desired.subseqs * column.values

 Step 2:  Find the average value of each subsequence.

 desired.means <- rollapply( desired.values, 3, mean, fill=NA, align =
 "right", na.rm = FALSE)  # put mean in highest index of subsequence and
 retain original vector length
 desired.means
 [1] NA NA NA NA NA 5 NA NA NA 9 NA NA NA NA 14 NA

 Step 3:   Shift values forward by one index value, retaining original
 vector length.

 desired.means <- zoo( desired.means )  # in order to be able to use lag.zoo
 desired.means <- lag( desired.means, k = -1, na.pad = TRUE)
 desired.means
 [1] NA NA NA NA NA NA 5 NA NA NA 9 NA NA NA NA 14

 Step 4:   Use last-observation-carried-forward, retaining original vector
 length.

 desired.means <- na.locf( desired.means, na.rm = FALSE )
 desired.means
 [1] NA NA NA NA NA NA 5 5 5 5 9 9 9 9 9 14

 Step 5:  Use next-observation-carried-backward to assign values to initial
 sequence of NAs.

 desired.means <- na.locf( desired.means, fromLast = TRUE)
 desired.means
 [1] 5 5 5 5 5 5 5 5 5 5 9 9 9 9 9 14

 Step 6:  Convert back to vector (from zoo object), and subtract from
 column.

 desired.column <- column.values - coredata(desired.means)
 desired.column
 [1] -4 -3 -2 -1 0 1 2 3 4 5 2 3 4 5 6 2
 -----  end call to 'apply'  -----

 Thanks,
 David





[R] how to parallelize 'apply' across multiple cores on a Mac

2013-05-03 Thread David Romano
Hi everyone,

I'm trying to use apply (with a call to zoo's rollapply within) on the
columns of a 1.5Kx165K matrix, and I'd like to make use of the other cores
on my machine to speed it up. (And hopefully also leave more memory free: I
find that after I create a big object like this, I have to save my
workspace and then close and reopen R to be able to recover memory tied up
by R, but maybe that's a separate issue -- if so, please let me know!)

It seems the package 'multicore' has a parallel version of 'lapply', which
I suppose I could combine with a 'do.call' (I think) to gather the elements
of the output list into a matrix, but I was wondering whether there might
be another route.
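One possible route, sketched under the assumption that the per-column work is
wrapped in a function column_fn and the matrix is big_matrix (both
placeholders): split the column indices into blocks and hand each block to
parallel::mclapply(), which forks on a Mac, then cbind the pieces back
together.

library(parallel)

col_blocks <- split(seq_len(ncol(big_matrix)),
                    ceiling(seq_len(ncol(big_matrix)) / 1000))  # ~1000 columns per block
block_results <- mclapply(col_blocks, function(idx) {
  apply(big_matrix[, idx, drop = FALSE], 2, column_fn)          # placeholder per-column work
}, mc.cores = detectCores() - 1)
out <- do.call(cbind, block_results)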

And, in case the particular way I constructed the call to 'apply' might be
the source of the problem, here is a deconstructed version of what I did to
each column, for easier parsing:
-----  begin call to 'apply'  -----

Step 1:  Identify several disjoint subsequences of fixed length, say length
three, of a column.

library(zoo)   # rollapply, zoo, lag, na.locf, coredata come from zoo
column.values <- 1:16
desired.subseqs <- c( NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1,1,1, NA
)   # this vector is used for every column.
desired.values <- desired.subseqs * column.values

Step 2:  Find the average value of each subsequence.

desired.means <- rollapply( desired.values, 3, mean, fill=NA, align =
"right", na.rm = FALSE)  # put mean in highest index of subsequence and
retain original vector length
desired.means
[1] NA NA NA NA NA 5 NA NA NA 9 NA NA NA NA 14 NA

Step 3:   Shift values forward by one index value, retaining original
vector length.

desired.means <- zoo( desired.means )  # in order to be able to use lag.zoo
desired.means <- lag( desired.means, k = -1, na.pad = TRUE)
desired.means
[1] NA NA NA NA NA NA 5 NA NA NA 9 NA NA NA NA 14

Step 4:   Use last-observation-carried-forward, retaining original vector
length.

desired.means <- na.locf( desired.means, na.rm = FALSE )
desired.means
[1] NA NA NA NA NA NA 5 5 5 5 9 9 9 9 9 14

Step 5:  Use next-observation-carried-backward to assign values to initial
sequence of NAs.

desired.means <- na.locf( desired.means, fromLast = TRUE)
desired.means
[1] 5 5 5 5 5 5 5 5 5 5 9 9 9 9 9 14

Step 6:  Convert back to vector (from zoo object), and subtract from column.

desired.column <- column.values - coredata(desired.means)
desired.column
[1] -4 -3 -2 -1 0 1 2 3 4 5 2 3 4 5 6 2
-----  end call to 'apply'  -----

Thanks,
David



[R] how to best add columns to a matrix with many columns

2013-05-03 Thread David Romano
Hi everyone,

I have a large data frame, say df1, with 165K columns, and all but the first
four columns of df1 are numeric.   I transformed the numeric data and
obtained a matrix, call it data.m, with 165K - 4 columns, and then tried to
create a second data frame by replacing the numeric columns of df1 by
data.m.  I did this in two ways, and both ways instantly used up all the
available memory, so I was wondering whether there was a better way to do
this.

Here's what I tried:

df2 <- df1
df2[ ,5:length(df1)] <- data.m

and

df2 <- cbind( df1[1:4], data.m)
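As a diagnostic rather than a fix, a quick sketch for seeing where the memory
goes: both forms above materialise a full second copy of the numeric block, so
peak usage is roughly df1 plus data.m plus the result, and object.size() on the
pieces shows how close that comes to the available RAM.

print(object.size(df1),    units = "Mb")
print(object.size(data.m), units = "Mb")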

Thanks,
David



Re: [R] how to best add columns to a matrix with many columns

2013-05-03 Thread David Romano
Sorry, Jeff, I misspoke:  the 'matrix' data.m is really a data frame -- I
was just thinking about it as a matrix since it's the numeric part of df1,
and didn't realize the thought made it's way in the message.   So the
memory issues are unrelated to converting between data frames and
matrices.  -David

On Fri, May 3, 2013 at 8:20 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

 I am not seeing any good justification in your description for converting
 to matrix if you are planning to convert it back to data frame. Memory is
 going to be inefficiently-used if you do this.
 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
 Go...
   Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 ---
 Sent from my phone. Please excuse my brevity.

 David Romano drom...@stanford.edu wrote:

 Hi everyone,
 
 I have a large data frame, say df1, with 165K columns, and all but the
 first
 four columns of df1 are numeric.   I transformed the numeric data and
 obtained a matrix, call it data.m, with 165K - 4 columns, and then
 tried to
 create a second data frame by replacing the numeric columns of df1 by
 data.m.  I did this in two ways, and both ways instantly used up all
 the
 available memory, so I was wondering whether there was a better way to
 do
 this.
 
 Here's what I tried:
 
 df2 <- df1
 df2[ ,5:length(df1)] <- data.m
 
 and
 
 df2 <- cbind( df1[1:4], data.m)
 
 Thanks,
 David
 


Re: [R] question about reproducibility/consistency of principal component and lda directions in R

2013-02-10 Thread David Romano
On Sat, Feb 9, 2013 at 11:43 AM, Uwe Ligges lig...@statistik.tu-dortmund.de
 wrote:



 On 08.02.2013 20:14, David Romano wrote:

 Hi everyone,

 I'm not exactly sure how to ask this question most clearly, but I hope
 that
 giving the context in which it occurs for me will help:   I'm trying to
 compare the brain images of two patient populations; each image is
 composed
 of voxels (the 3D analogue of pixels), and I have two images per patient,
 one reflecting grey matter concentration at each voxel, and the other
 reflecting white matter concentration at each voxel.

 I determined the groups by means of an analysis that involved information
 from both types of images, and what I set out to do was to get a rough
 idea
 of where in the brain the two groups showed the most striking differences.

 My first attempt was to replace -- on a voxel by voxel basis -- the
 bivariate grey/white data by a combined univariate measure, namely the
 first principal component score.   From these principal component scores I
 calculated Cohen's d to obtain a rough estimate of the effect size at each
 voxel, and the resulting brain images show very nice separation into
 meaningful brain regions, some corresponding to negative effect sizes and
 some to positive ones.

 What puzzles me about how nice the separation into brain regions is, is
 that the meaning of positive and negative is determined by the choice of
 the first principal component direction at each voxel, but this choice is
 -- in principle (no pun intended -- sorry!) -- arbitrary.  (Meaning
 whether
 an eigenvector or its negative is chosen as the direction is in principle
 arbitrary.)

 So here are my questions:   Does the algorithm used in R produce the same
 principal component directions if applied to the same data repeatedly?


 Yes, but it may change if you execute it on another machine (depends on
 compiler hence also 32-bit vs 64-bit and OS).



  And if so, should the directions chosen by the algorithm change
 continuously with the data?  For example, if one data set were obtained by
 applying a small amount of noise to another, should the resulting
 directions be close to each other (as opposed to close negative of each
 other)?  (Assuming the data is far from being singular in some vague
 sense I'm not sure how to make precise.)


 Noise means the sign can change again.

 Of course, you can define yourself e.g. the direction of the very first
 value and change signs otherwise.
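A minimal sketch of that sign convention (prcomp() and the built-in USArrests
data are used here purely for illustration; the same idea applies to princomp()
or lda() scalings): flip each component so that its first loading is positive,
which makes the reported direction reproducible.

pc    <- prcomp(USArrests, scale. = TRUE)
signs <- ifelse(pc$rotation[1, ] >= 0, 1, -1)     # sign of the first loading of each PC
pc$rotation <- sweep(pc$rotation, 2, signs, `*`)  # flip loadings ...
pc$x        <- sweep(pc$x,        2, signs, `*`)  # ... and scores consistently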



  My second attempt was to do the same, but with the first lda scores, so I
 have the same questions about lda directions, too.



 Same for lda.

 Best,
 Uwe Ligges


Thanks, Uwe; all good to know.

Best,
David






  Any light you could shed on these questions would be very welcome!

 Thanks in advance,
 David Romano



[R] different behavior of $ with string literal vs string variable as argument

2013-02-10 Thread David Romano
Hi everyone,

I ran into the issue below while trying to execute a command of the form

apply(list.names,1, function(x)  F(favorite.list$x) )

where list.names is a character vector containing the names of the elements
of favorite.list and F is some function defined on a list element.

Namely,  the $ operator doesn't treat the string variable 'x' as the string
it represents, so that, e.g.

> ll <- list(ss = "abc")
> ll$ss
[1] "abc"
> ll$"ss"
[1] "abc"

but

> name <- "ss"
> ll$name
NULL

I can get around this by using integers and the [[ and [ operators, but I'd
like to be able to use names directly, too -- how would I go about doing
this?

Thanks for your help in clarifying what might be going on here.

David



Re: [R] different behavior of $ with string literal vs string variable as argument

2013-02-10 Thread David Romano
On Sun, Feb 10, 2013 at 1:40 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 13-02-10 4:06 PM, David Romano wrote:

 Hi everyone,

 I ran into the issue below while trying to execute a command of the form

 apply(list.names,1, function(x)  F(favorite.list$x) )

 where list.names is a character vector containing the names of the
 elements
 of favorite.list and F is some function defined on a list element.

 Namely,  the $ operator doesn't treat the string variable 'x' as the
 string
 it represents, so that, e.g.

  ll <- list(ss = "abc")
 ll$ss

 [1] "abc"

 ll$"ss"

 [1] "abc"

 but

  name <- "ss"
 ll$name

 NULL

 I can get around this by using integers and the [[ and [ operators, but
 I'd
 like to be able to use names directly, too -- how would I go about doing
 this?

 Thanks for your help in clarifying what might be going on here.


 You can use names with [[, e.g.

 ll[[name]]

 will work in your example.  You can see more details in the help topic
 help("$"), in the section Recursive (list-like) objects.

 Duncan Murdoch



Thanks, Duncan (and Michael, earlier); this clear everything up.  And just
so the help topic language is included in this thread:

Recursive (list-like) objects

Indexing by [ is similar to atomic vectors and selects a list of the
specified element(s).

Both [[ and $ select a single element of the list. The main difference is
that $ does not allow computed indices, whereas [[ does.
-

which I take to mean that the argument to $ cannot require evaluation of
any kind, and so must be a string literal.
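To tie this back to the original apply() call: with [[ the name can come from a
variable, so something along these lines works (favorite.list here is a made-up
example list, and sapply() is used because list.names is a plain character
vector).

favorite.list <- list(a = 1:3, b = 4:6, c = 7:9)          # made-up example data
list.names    <- names(favorite.list)
sapply(list.names, function(x) sum(favorite.list[[x]]))   # a: 6, b: 15, c: 24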

Thanks again,
David



Re: [R] different behavior of $ with string literal vs string variable as argument

2013-02-10 Thread David Romano
Sorry, this was meant to go to the full list.  -David

On Sun, Feb 10, 2013 at 2:15 PM, David Romano drom...@stanford.edu wrote:



 On Sun, Feb 10, 2013 at 1:59 PM, Bert Gunter gunter.ber...@gene.com wrote:

 Please read the Help before posting.

 ?"$" says:


 It helps to know that $ must be quoted, so thanks again goes to Duncan for
 pointing this out.


  Both [[ and $ select a single element of the list. The main
 difference is that $ **does not allow computed indices** , whereas [[
 does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the
 partial matching behavior of [[ can be controlled using the exact
 argument.  [emphasis added]

 In other words, $ does not evaluate its argument.

 This also appeared just a couple of days ago on this list, so please
 also search Help archives before posting.


 I did search, but as Ben points out in the next message in the thread,
 it's tricky to formulate the search to get hits, and, for example, I
 wouldn't have realized the post he refers to there involves the same issue
 unless I already knew the answer.

 David

 -- Bert

 On Sun, Feb 10, 2013 at 1:06 PM, David Romano drom...@stanford.edu
 wrote:
  Hi everyone,
 
  I ran into the issue below while trying to execute a command of the form
 
  apply(list.names,1, function(x)  F(favorite.list$x) )
 
  where list.names is a character vector containing the names of the
 elements
  of favorite.list and F is some function defined on a list element.
 
  Namely,  the $ operator doesn't treat the string variable 'x' as the
 string
  it represents, so that, e.g.
 
  ll <- list(ss = "abc")
  ll$ss
  [1] "abc"
  ll$"ss"
  [1] "abc"
 
  but
 
  name <- "ss"
  ll$name
  NULL
 
  I can get around this by using integers and the [[ and [ operators, but
 I'd
  like to be able to use names directly, too -- how would I go about doing
  this?
 
  Thanks for your help in clarifying what might be going on here.
 
  David
 



 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:

 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm






Re: [R] how to extract test for collinearity and constancy used in lda

2013-02-08 Thread David Romano
Just posting to answer my own question, at least for the "variables
constant" error:  I hadn't noticed that lda has an argument called 'tol'
that governs when variables are interpreted as constant within groups; it's
right there in the help entry for lda, so I apologize for having missed
it.    As to the "variables collinear" warning,  it's still not clear to me
what level of correlation will trigger it.
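A rough pre-screening sketch in the same spirit (an approximation, not the
exact test lda() applies; it assumes each slice has two columns, as in the
grey/white setting, and a grouping vector g): flag slices whose columns are
nearly constant within a group or nearly collinear, and drop them before the
vectorized lda() call.

library(MASS)   # lda() lives here

bad_slice <- function(x, g, tol = 1e-4) {
  within_sd <- sapply(split(as.data.frame(x), g), function(d) apply(d, 2, sd))
  any(within_sd < tol) ||                        # a column ~constant within some group
    isTRUE(abs(cor(x[, 1], x[, 2])) > 1 - tol)   # the two columns ~collinear
}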

My apologies,
David

On Wed, Feb 6, 2013 at 12:21 PM, David Romano drom...@stanford.edu wrote:

 Hi everyone,

 I'm trying to vectorize an application of lda to each 2D slice of a 3D
 array, but am running into trouble:  It seems there are quite a few 2D
 slices that trigger either the variables are collinear warning, or worse,
 trigger a variable appears to be constant within groups error and fails
 (i.e., ceases computation rather than skips bad slice).

 There are cases where neither warning is literally true, so I expect the
 warning and error must be triggered in a neighborhood of collinearity and
 within-group-constancy, and I would like to be able to remove the offending
 slice in advance.   Does anyone know where I can find the explicit tests
 that are used for these?

 Thanks in advance for any light you can help shed on this question.

 Best,
 David

 P.S.  The 3D array has roughly 40K 2D slices, so inspection is not an
 option!




[R] question about reproducibility/consistency of principal component and lda directions in R

2013-02-08 Thread David Romano
Hi everyone,

I'm not exactly sure how to ask this question most clearly, but I hope that
giving the context in which it occurs for me will help:   I'm trying to
compare the brain images of two patient populations; each image is composed
of voxels (the 3D analogue of pixels), and I have two images per patient,
one reflecting grey matter concentration at each voxel, and the other
reflecting white matter concentration at each voxel.

I determined the groups by means of an analysis that involved information
from both types of images, and what I set out to do was to get a rough idea
of where in the brain the two groups showed the most striking differences.

My first attempt was to replace -- on a voxel by voxel basis -- the
bivariate grey/white data by a combined univariate measure, namely the
first principal component score.   From these principal component scores I
calculated Cohen's d to obtain a rough estimate of the effect size at each
voxel, and the resulting brain images show very nice separation into
meaningful brain regions, some corresponding to negative effect sizes and
some to positive ones.

What puzzles me about how nice the separation into brain regions is, is
that the meaning of positive and negative is determined by the choice of
the first principal component direction at each voxel, but this choice is
-- in principle (no pun intended -- sorry!) -- arbitrary.  (Meaning whether
an eigenvector or its negative is chosen as the direction is in principle
arbitrary.)

So here are my questions:   Does the algorithm used in R produce the same
principal component directions if applied to the same data repeatedly?
And if so, should the directions chosen by the algorithm change
continuously with the data?  For example, if one data set were obtained by
applying a small amount of noise to another, should the resulting
directions be close to each other (as opposed to close negative of each
other)?  (Assuming the data is far from being singular in some vague
sense I'm not sure how to make precise.)

My second attempt was to do the same, but with the first lda scores, so I
have the same questions about lda directions, too.

Any light you could shed on these questions would be very welcome!

Thanks in advance,
David Romano



Re: [R] how to multiply list of matrices by list of vectors

2013-02-06 Thread David Romano
Thanks Rolf and Arun!   -David

On Wed, Feb 6, 2013 at 6:13 AM, arun smartpink...@yahoo.com wrote:

 Hi,

 I got an error message with:
 vlist <- apply(mm, list)

 Error in match.fun(FUN) : argument FUN is missing, with no default
 #assuming that
 vlist <- apply(mm,2,list)

 mapply("%*%",mlist,vlist[1:2],SIMPLIFY=FALSE)
 #[[1]]
 #     [,1]
 #[1,]   19
 #[2,]   22
 #[3,]   25
 #[4,]   28
 #
 #[[2]]
 #     [,1]
 #[1,]   67
 #[2,]   74
 #[3,]   81
 #[4,]   88

 A.K.
 - Original Message -
 From: David Romano drom...@stanford.edu
 To: r-help@r-project.org
 Cc:
 Sent: Wednesday, February 6, 2013 12:50 AM
 Subject: [R] how to multiply list of matrices by list of vectors

 Hi everyone,

 I'd like to be able to apply lda to each 2D matrix slice of a 3D array, and
 then use the scalings to obtain the corresponding lda scores.

 I can use 'apply' to get a list of the lda output for each 2D slice, and
 can create a list of the resulting scalings, but I'm not sure how to
 multiply them in a vectorized way.


 Here's how I made a list of 2D matrices (suggestion on improving this would
 be welcome, too!):

  aa <- array(1:24,c(4,2,3))
  mlist <- apply(aa,2,list)
  mlist <- lapply(mlist, unlist)
  mlist <- lapply(mlist, function(x) matrix(x,4,2))

 and here's how I made a list of vectors:

  mm <- matrix(1:6,2,3)
  vlist <- apply(mm, list)
  vlist <- lapply(vlist, unlist)

 Now I'd like to make the list whose i-th element is mlist[[i]]%*%vlist[[i]]
 without having to loop through the indices.

 Any help would be appreciated!

 Thanks,
 David



[R] how to extract test for collinearity and constancy used in lda

2013-02-06 Thread David Romano
Hi everyone,

I'm trying to vectorize an application of lda to each 2D slice of a 3D
array, but am running into trouble:  It seems there are quite a few 2D
slices that trigger either the "variables are collinear" warning, or worse,
trigger a "variable appears to be constant within groups" error and fails
(i.e., ceases computation rather than skips bad slice).

There are cases where neither warning is literally true, so I expect the
warning and error must be triggered in a neighborhood of collinearity and
within-group-constancy, and I would like to be able to remove the offending
slice in advance.   Does anyone know where I can find the explicit tests
that are used for these?

Thanks in advance for any light you can help shed on this question.

Best,
David

P.S.  The 3D array has roughly 40K 2D slices, so inspection is not an
option!



[R] how to multiply list of matrices by list of vectors

2013-02-05 Thread David Romano
Hi everyone,

I'd like to be able to apply lda to each 2D matrix slice of a 3D array, and
then use the scalings to obtain the corresponding lda scores.

I can use 'apply' to get a list of the lda output for each 2D slice, and
can create a list of the resulting scalings, but I'm not sure how to
multiply them in a vectorized way.


Here's how I made a list of 2D matrices (suggestion on improving this would
be welcome, too!):

 aa <- array(1:24,c(4,2,3))
 mlist <- apply(aa,2,list)
 mlist <- lapply(mlist, unlist)
 mlist <- lapply(mlist, function(x) matrix(x,4,2))

and here's how I made a list of vectors:

 mm <- matrix(1:6,2,3)
 vlist <- apply(mm, list)
 vlist <- lapply(vlist, unlist)

Now I'd like to make the list whose i-th element is mlist[[i]]%*%vlist[[i]]
without having to loop through the indices.
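A hedged alternative sketch (slicing along the third dimension of aa, a choice
made here just for illustration): the per-slice lists can be built by direct
indexing, and Map() gives the element-wise products without an explicit loop.

aa    <- array(1:24, c(4, 2, 3))
mm    <- matrix(1:6, 2, 3)
mlist <- lapply(seq_len(dim(aa)[3]), function(k) aa[, , k])  # three 4x2 slices
vlist <- lapply(seq_len(ncol(mm)),   function(k) mm[, k])    # three length-2 vectors
Map(`%*%`, mlist, vlist)   # i-th element is mlist[[i]] %*% vlist[[i]]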

Any help would be appreciated!

Thanks,
David



Re: [R] odd behavior of browser()

2012-12-12 Thread David Romano
On Tue, Dec 4, 2012 at 2:12 PM, David Romano drom...@stanford.edu wrote:



 On Tue, Dec 4, 2012 at 10:22 AM, Duncan Murdoch 
 murdoch.dun...@gmail.comwrote:

 On 04/12/2012 12:54 PM, David Romano wrote:

 Hi everyone,

 I normally include a call to browser() as I'm working out the kinks in my
 scripts, and I am always able to step through each line by hitting
 Return, but for some reason, in the scripts I'm working on now, hitting
 Return seems to cause execution of *all* the lines in my script.  I've
 restarted R several times in case it was stuck in a bad state for some
 reason, but I'm consistently getting this behavior anyway.  Has anyone
 run
 into this problem before?  Maybe I inadvertently reset preferences?


 I wouldn't have expected that to work.  Calling browser() from within a
 function will let you step through the function, but calling it from within
 a script doesn't.  Do you really have some scripts where this worked?

 Duncan Murdoch



 Hi Duncan (and this addresses Michael's earlier comment, too),

 I've been using browser() in scripts since this summer, which is when I
 started using R, and -- until now -- it has always worked to step through
 the scripts, and -- in regards to Michael's comment -- whether or not there
 were blank lines in the script...

 David Romano


Hi everyone,

I forgot to cc r-help in the response above, and found out why browser()
had been working in my scripts up to now:  All of my scripts so far had
consisted of a body of code that was applied in the same range of contexts
via nested 'for' loops, so that each script had the form

browser()
for (c in context){
  body
}

in which case I could run through the body one line at a time.

So -- outside of when it's called from inside a function -- I still can't
make sense of exactly when browser() will do this, but I now have at least
one way to run through a script.
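A minimal illustration of the function route (the script body from bugcheck.r
wrapped in a function; the name run_script is arbitrary):

run_script <- function() {
  browser()    # single-stepping works here because browser() is inside a function
  a <- 1
  b <- 2
  a + b
}
run_script()   # at the Browse prompt, an empty Return (or 'n') steps one line at a time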

Thanks to Michael and Duncan for their skepticism, which kept me going in
search of what happened!

David Romano










 An example which produces this behavior is the following:

 file bugcheck.r:

 browser()

  a <- 1
  b <- 2

  > source("bugcheck.r")
  Called from: eval(expr, envir, enclos)
  Browse[1]> <Return>

  > ls()
  [1] "a" "b"
  > a
  [1] 1
  > b
  [1] 2

 I'd be grateful for any help in resolving this!

 Thanks,
 David Romano



[R] using 'apply' to apply princomp to an array of datasets

2012-12-12 Thread David Romano
Hi everyone,

Suppose I have a 3D array of datasets, where say dimension 1 corresponds to
cases, dimension 2 to datasets, and dimension 3 to observations within a
dataset.  As an example, suppose I do the following:

> x <- sample(1:20, 48, replace=TRUE)
> datasets <- array(x, dim=c(4,3,2))

Here, for each j=1,2,3, I'd like to think of datasets[,j,] as a single data
matrix with four cases and two observations.  Now, I'd like to be able to
do the following: apply pca to each dataset, and create a matrix of the
first principal component scores.

In this example, I could do:

> pcl<-apply(datasets,2,princomp)

which yields a list of princomp output, one for each dataset, so that the
vector of first principal component scores for dataset 1 is obtained by

> score1set1 <- pcl[[1]]$scores[,1]

and I could then obtain the desired matrix by

> score1matrix <- cbind( score1set1, score1set2, score1set3)


So my first question is: 1) how could I use *apply to do this?  I'm having
trouble because pcl is a list of lists, so I can't use, say, do.call(cbind,
...) without first having a list of the first component score vectors,
which I'm not sure how to produce.

My second question is: 2) Having answered question 1), now suppose there
may be datasets containing NA value -- how could I select the subset of
values from dimension 2 corresponding to the datasets for which this is
true (again using *apply?)?

Thanks in advance for any light you might be able to shed on these
questions!

David Romano



Re: [R] using 'apply' to apply princomp to an array of datasets

2012-12-12 Thread David Romano
Sorry, I just realized I didn't send the message below in plain text.
-David Romano

On Wed, Dec 12, 2012 at 9:14 AM, David Romano drom...@stanford.edu wrote:

 Hi everyone,

 Suppose I have a 3D array of datasets, where say dimension 1 corresponds
 to cases, dimension 2 to datasets, and dimension 3 to observations within a
 dataset.  As an example, suppose I do the following:

  x <- sample(1:20, 48, replace=TRUE)
  datasets <- array(x, dim=c(4,3,2))

 Here, for each j=1,2,3, I'd like to think of datasets[,j,] as a single
 data matrix with four cases and two observations.  Now, I'd like to be able
 to do the following: apply pca to each dataset, and create a matrix of the
 first principal component scores.

 In this example, I could do:

  pcl<-apply(datasets,2,princomp)

 which yields a list of princomp output, one for each dataset, so that the
 vector of first principal component scores for dataset 1 is obtained by

  score1set1 <- pcl[[1]]$scores[,1]

 and I could then obtain the desired matrix by

  score1matrix <- cbind( score1set1, score1set2, score1set3)


 So my first question is: 1) how could I use *apply to do this?  I'm having
 trouble because pcl is a list of lists, so I can't use, say, do.call(cbind,
 ...) without first having a list of the first component score vectors, which
 I'm not sure how to produce.

 My second question is: 2) Having answered question 1), now suppose there
 may be datasets containing NA value -- how could I select the subset of
 values from dimension 2 corresponding to the datasets for which this is true
 (again using *apply?)?

 Thanks in advance for any light you might be able to shed on these
 questions!

 David Romano



Re: [R] using 'apply' to apply princomp to an array of datasets

2012-12-12 Thread David Romano
Thank you, Rui!   This is incredibly helpful -- anonymous functions
are new to me, and I appreciate being shown how useful they are.

Best regards,
David

On Wed, Dec 12, 2012 at 10:12 AM, Rui Barradas ruipbarra...@sapo.pt wrote:
 Hello,

 As for the first question try

 scoreset <- lapply(pcl, function(x) x$scores[, 1])
 do.call(cbind, scoreset)


 As for the second question, you want to know which columns in 'datasets'
 have NA's?

 colidx <- apply(datasets, 2, function(x) any(is.na(x)))
 datasets[, colidx]  # These have NA's


 For the column numbers you can do

 colnums <- which(colidx)

 Hope this helps,

 Rui Barradas

 Em 12-12-2012 17:14, David Romano escreveu:

 Hi everyone,

 Suppose I have a 3D array of datasets, where say dimension 1 corresponds
 to
 cases, dimension 2 to datasets, and dimension 3 to observations within a
 dataset.  As an example, suppose I do the following:

 x <- sample(1:20, 48, replace=TRUE)
 datasets <- array(x, dim=c(4,3,2))

 Here, for each j=1,2,3, I'd like to think of datasets[,j,] as a single
 data
 matrix with four cases and two observations.  Now, I'd like to be able to
 do the following: apply pca to each dataset, and create a matrix of the
 first principal component scores.

 In this example, I could do:

 pcl<-apply(datasets,2,princomp)

 which yields a list of princomp output, one for each dataset, so that the
 vector of first principal component scores for dataset 1 is obtained by

 score1set1 <- pcl[[1]]$scores[,1]

 and I could then obtain the desired matrix by

 score1matrix <- cbind( score1set1, score1set2, score1set3)


 So my first question is: 1) how could I use *apply to do this?  I'm having
 trouble because pcl is a list of lists, so I can't use, say,
 do.call(cbind,
 ...) without first having a list of the first component score vectors,
 which I'm not sure how to produce.

 My second question is: 2) Having answered question 1), now suppose there
 may be datasets containing NA value -- how could I select the subset of
 values from dimension 2 corresponding to the datasets for which this is
 true (again using *apply?)?

 Thanks in advance for any light you might be able to shed on these
 questions!

 David Romano



[R] odd behavior of browser()

2012-12-04 Thread David Romano
Hi everyone,

I normally include a call to browser() as I'm working out the kinks in my
scripts, and I am always able to step through each line by hitting
Return, but for some reason, in the scripts I'm working on now, hitting
Return seems to cause execution of *all* the lines in my script.  I've
restarted R several times in case it was stuck in a bad state for some
reason, but I'm consistently getting this behavior anyway.  Has anyone run
into this problem before?  Maybe I inadvertently reset preferences?

An example which produces this behavior is the following:

file bugcheck.r:

browser()

a <- 1
b <- 2

> source("bugcheck.r")
Called from: eval(expr, envir, enclos)
Browse[1]> <Return>

> ls()
[1] "a" "b"
> a
[1] 1
> b
[1] 2

I'd be grateful for any help in resolving this!

Thanks,
David Romano



[R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers

2012-11-15 Thread David Romano
Hi everyone,

I have a data frame one of whose columns is a character vector and the rest
are numeric, and in debugging a script, I noticed that an ifelse call seems
to be coercing the character column to a numeric column, and producing
unintended values as a result.   Roughly, here's what I tried to do:

df: a data frame with, say, the first column as a character column and the
second and third columns numeric.

also: NA's occur only in the numeric columns, and if they occur in one,
they occur in the other as well.

I wanted to replace the NA's in column 2 with 0's and the ones in column 3
with 1's, so first I did this:

> na.replacements <-ifelse(col(df)==2,0,1).

Then I used a second ifelse call to try to remove the NA's as I wanted,
first by doing this:

> clean.df <- ifelse(is.na(df), na.replacements, df),

which produced a list of lists vaguely resembling df, with the NA's mostly
intact, and so then I tried this:

> clean.df <- ifelse(is.na(df), na.replacements, unlist(df)),

which seems to work if all the columns are numeric, but otherwise changes
strings to numbers.

I can't make sense of the help documentation enough to clear this up, but
my guess is that the "yes" and "no" values passed to ifelse need to be
vectors, in which case it seems I'll have to use another approach entirely,
but even if that is not the case and lists are acceptable, I'm not sure how to
convert a mixed-mode data frame into a vector-like list of elements (which
I would hope would work).

I'd be grateful for any suggestions!
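For what it's worth, a sketch that sidesteps ifelse() entirely by replacing the
NAs column by column (the small data frame is made up to match the shape
described):

df <- data.frame(label = c("a", "b", "c"),
                 x     = c(1, NA, 3),
                 y     = c(4, NA, 6),
                 stringsAsFactors = FALSE)   # made-up data in the shape described
df$x[is.na(df$x)] <- 0   # NAs in column 2 become 0
df$y[is.na(df$y)] <- 1   # NAs in column 3 become 1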

Thanks,
David Romano



Re: [R] using ifelse to remove NA's from specific columns of a data frame containing strings and numbers

2012-11-15 Thread David Romano
Thanks for the suggestion, Bert; I just re-read the introduction with
particular attention to the sections you mentioned, but I don't see how any
of it bears on my question.  Namely -- to rephrase:  What constraints are
there on the form of the "yes" and "no" values required by ifelse?   The
introduction doesn't really speak to this, and the help documentation seems
to suggest that as long as the shapes of the "test", "yes", and "no"
values agree, that would be sufficient -- I don't see anything that
specifies that any of these should be of a particular data type.   My
example, however, seems to indicate that the "yes" and "no" values can't be
a mixture of characters and numbers, and I'm trying to figure out what the
underlying constraints are on ifelse.

Thanks again,
David

On Thu, Nov 15, 2012 at 6:46 AM, Bert Gunter gunter.ber...@gene.com wrote:

 David:

 You seem to be getting lost in basic R tasks. Have you read the Intro
 to R tutorial? If not, do so, as this should tell you how to do what
 you need. If so, re-read the sections on indexing ([), replacement,
 and NA's. Also read about character vectors and factors.

 -- Bert

 On Thu, Nov 15, 2012 at 3:19 AM, David Romano drom...@stanford.edu
 wrote:
  Hi everyone,
 
  I have a data frame one of whose columns is a character vector and the
 rest
  are numeric, and in debugging a script, I noticed that an ifelse call
 seems
  to be coercing the character column to a numeric column, and producing
  unintended values as a result.   Roughly, here's what I tried to do:
 
  df: a data frame with, say, the first column as a character column and
 the
  second and third columns numeric.
 
  also: NA's occur only in the numeric columns, and if they occur in one,
  they occur in the other as well.
 
  I wanted to replace the NA's in column 2 with 0's and the ones in column
 3
  with 1's, so first I did this:
 
   na.replacements <-ifelse(col(df)==2,0,1).
 
  Then I used a second ifelse call to try to remove the NA's as I wanted,
  first by doing this:
 
   clean.df <- ifelse(is.na(df), na.replacements, df),
 
  which produced a list of lists vaguely resembling df, with the NA's
 mostly
  intact, and so then I tried this:
 
   clean.df <- ifelse(is.na(df), na.replacements, unlist(df)),
 
  which seems to work if all the columns are numeric, but otherwise changes
  strings to numbers.
 
  I can't make sense of the help documentation enough to clear this up, but
  my guess is that the yes and no values passed to ifelse need to be
  vectors, in which case it seems I'll have to use another approach
 entirely,
  but even if is not the case and lists are acceptable, I'm not sure how to
  convert a mixed-mode data frame into a vector-like list of elements
 (which
  I would hope would work).
 
  I'd be grateful for any suggestions!
 
  Thanks,
  David Romano
 



 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:

 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm




Re: [R] splitting character vectors into multiple vectors using strsplit

2012-11-02 Thread David Romano
Hi again,

I just wanted to thank folks for their suggestions; they led me to
understand the *apply family a little better, and to realize the issue was
really a question of how to convert a list of equal length vectors into a
matrix.  In this case sapply only needs to be asked to identify these
vectors individually; I don't know if R has the equivalent of an identity
function, but the following solution accomplishes this:

> splitvectors <- sapply(splitlist, function(x) x)
> splitvectors
     [,1] [,2]
[1,] "a1" "a2"
[2,] "b1" "b2"

or, by replacing the anonymous function by c, we obtain a more elegant but
more wasteful solution.
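As an aside, R does ship an identity() function, and the same reshaping can be
written a few other ways (a sketch using the splitlist from above):

splitvectors <- sapply(splitlist, identity)   # same 2x2 matrix as above
avec <- sapply(splitlist, `[`, 1)             # "a1" "a2"
bvec <- sapply(splitlist, `[`, 2)             # "b1" "b2"
do.call(rbind, splitlist)                     # one row per original string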

Thanks again for everyone's help,
David Romano

On Fri, Sep 7, 2012 at 11:12 AM, David Romano roma...@grinnell.edu wrote:

 Hi folks,

 Suppose I create the character vector charvec by

  charvec<-c("a1.b1","a2.b2")
  charvec
 [1] "a1.b1" "a2.b2"

 and then I use strsplit on charvec as follows:

  splitlist<-strsplit(charvec,split=".",fixed=TRUE)
  splitlist
 [[1]]
 [1] "a1" "b1"

 [[2]]
 [1] "a2" "b2"


 I was wondering whether there is already a function which can extract
 the "a" and "b" parts of the list splitlist; that is, that can return
 the same vectors as those created by c("a1","a2") and c("b1","b2").

 Thanks,
 David Romano




[R] splitting character vectors into multiple vectors using strsplit

2012-09-07 Thread David Romano
Hi folks,

Suppose I create the character vector charvec by

> charvec<-c("a1.b1","a2.b2")
> charvec
[1] "a1.b1" "a2.b2"

and then I use strsplit on charvec as follows:

> splitlist<-strsplit(charvec,split=".",fixed=TRUE)
> splitlist
[[1]]
[1] "a1" "b1"

[[2]]
[1] "a2" "b2"


I was wondering whether there is already a function which can extract
the "a" and "b" parts of the list splitlist; that is, that can return
the same vectors as those created by c("a1","a2") and c("b1","b2").

Thanks,
David Romano



[R] ways of getting around allocMatrix limit?

2012-07-31 Thread David Romano
I need to multiply two very large, nonsparse matrices, and so get the
error "allocMatrix: too many elements specified".

Is there a way to set the limit for allocMatrix?

In my case, the two matrices, A and B, are nxm and mxp where m is
small, so I could subdivide each into blocks of submatrices
A=rbind(A1,A2,...) and B=cbind(B1,B2,...) then multiply each pair of
submatrices, but I was thinking there must be a better way to get
around the allocMatrix limit.   I'd be grateful for any suggestions!
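A sketch of one variant of the block idea (row blocks of A against the whole of
B; it assumes each block product, and the combined result, still fit in memory
-- if the full product itself exceeds the element limit, the blocks would have
to be written out, e.g. to a file-backed matrix, instead of being rbind-ed):

block_multiply <- function(A, B, block_rows = 1000L) {
  idx <- split(seq_len(nrow(A)), ceiling(seq_len(nrow(A)) / block_rows))
  do.call(rbind, lapply(idx, function(i) A[i, , drop = FALSE] %*% B))
}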

Thanks,
David



Re: [R] using save() to work with objects that exceed memory capacity

2012-07-30 Thread David Romano
On Sun, Jul 29, 2012 at 7:08 AM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 On Sat, Jul 28, 2012 at 10:48 AM, David Romano roma...@grinnell.edu
 wrote:
  Context:  I'm relatively new to R and am working with very large
 datasets.
 
  General problem:  If working on a dataset requires that I produce more
 than
  two objects of roughly the size of the dataset, R quickly uses up its
  available memory and slows to a virtual halt.
 
  My tentative solution:  To save and remove objects as they're created,
 and
  load them when I need them.  To do this I'm trying to automatically
  generate file names derived from these objects, and use these in save().
 
  My specific question to the list:  How do I capture the string that names
  an object I want to save, in such a way that I can use it in a function
  that calls save()?
 
  For example, suppose I create a matrix and then save it as follows:
  mat<-matrix(1:9,3,3)
  save(mat, file="matfile")
  Then I get a file of the kind I'd like: the command 'load("matfile")'
  retrieves the correct matrix, with the original name 'mat'.
 
  Further, if I instead save it this way:
  objectname<-"mat"
  save(list=ls(pattern=objectname), file="matfile")
  then I get the same positive result.
 
  But now suppose I create a function
  saveobj <- function(objectname,objectfile)
  +   {
  + save(list=ls(pattern=objectname),file=objectfile);
  + return()};
  Then if I now try to save 'mat' by
  matname<-"mat"
  saveobj(matname, "matfile")
  I do not get the same result; namely, the command 'load(mat)' retrieves
  no objects.  Why is this?

 load("matfile") no?


Yes.


 It seems to work for me:

 R> x <- matrix(1:9, ncol = 3)
 R> saveobj <- function(obj, file){
 + save(list = obj, file = file)
 + }
 R> exists("x")
 [1] FALSE
 R> saveobj("x", "amatrix.rdat")
 R> rm(x)
 R> load("amatrix.rdat")
 R> x
      [,1] [,2] [,3]
 [1,]    1    4    7
 [2,]    2    5    8
 [3,]    3    6    9

 Cheers,
 Michael


Thanks, Michael, for locating the trouble in the unnecessary call to ls(), and
thanks to Duncan Murdoch, too, for pointing out how ls() was causing the
observed behavior: without including an argument like envir=parent.frame(),
ls() only returns local objects created after the call to saveobj.   Very
helpful -- thanks to you both!
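The two fixes, written out as a sketch (the envir arguments reflect Duncan's
point about where ls() should look; treat the second variant as an illustration
rather than a drop-in):

## simplest: pass the name straight to save(), no ls() involved
saveobj <- function(objectname, objectfile) {
  save(list = objectname, file = objectfile)
}

## keeping the pattern-matching behaviour, but looking in the caller's environment
saveobj2 <- function(objectname, objectfile) {
  e <- parent.frame()
  save(list = ls(pattern = objectname, envir = e), file = objectfile, envir = e)
}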

Best,
David




  
 
  I'd be grateful for any help on either my specific questions, or
  suggestions of a better ways to address the issue of limited memory.
 
  Thanks,
  David Romano
 


[R] using save() to work with objects that exceed memory capacity

2012-07-28 Thread David Romano
Context:  I'm relatively new to R and am working with very large datasets.

General problem:  If working on a dataset requires that I produce more than
two objects of roughly the size of the dataset, R quickly uses up its
available memory and slows to a virtual halt.

My tentative solution:  To save and remove objects as they're created, and
load them when I need them.  To do this I'm trying to automatically
generate file names derived from these objects, and use these in save().

My specific question to the list:  How do I capture the string that names
an object I want to save, in such a way that I can use it in a function
that calls save()?

For example, suppose I create a matrix and then save it as follows:
> mat<-matrix(1:9,3,3)
> save(mat, file="matfile")
Then I get a file of the kind I'd like: the command 'load("matfile")'
retrieves the correct matrix, with the original name 'mat'.

Further, if I instead save it this way:
> objectname<-"mat"
> save(list=ls(pattern=objectname), file="matfile")
then I get the same positive result.

But now suppose I create a function
> saveobj <- function(objectname,objectfile)
+   {
+ save(list=ls(pattern=objectname),file=objectfile);
+ return()};
Then if I now try to save 'mat' by
> matname<-"mat"
> saveobj(matname, "matfile")
I do not get the same result; namely, the command 'load(mat)' retrieves
no objects.  Why is this?


I'd be grateful for any help on either my specific questions, or
suggestions of a better ways to address the issue of limited memory.

Thanks,
David Romano
