Re: [R] Median computation

Benno Pütz Wed, 23 May 2012 10:27:19 -0700

I wonder how you do this (or maybe on what kind of machine you execute it).


I tried it out of curiosity and get

> library(matrixStats)  # provides rowMedians()
> df = as.data.frame(lapply(1:300, function(x) sample(200, 250000, replace=TRUE)))
> colnames(df) = sample(letters[1:20], 300, replace=TRUE)
> system.time(dfmed <- lapply(unique(colnames(df)), function(x)
+ rowMedians(as.matrix(df[,colnames(df) == x]), na.rm=TRUE)))
   user  system elapsed 
  5.680   0.952   7.171 

and those times are in seconds! The time-consuming part was building the 
data.frame, not the calculation.
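
If the repeated data.frame subsetting is a concern, one option is to convert to 
a matrix once and group the column indices by name. The following is only a 
minimal sketch with sizes reduced so it runs quickly; the names m, groups, 
row_med and med_by_name are illustrative and not from the original code, and 
the apply() fallback is an assumption added in case matrixStats is not 
installed:

```r
# Sketch: one matrix, column indices grouped by name, row medians per group.
set.seed(1)
n_row <- 1000   # the thread uses 250000
n_col <- 60     # the thread uses 300
m <- matrix(sample(200, n_row * n_col, replace = TRUE), nrow = n_row)
colnames(m) <- sample(letters[1:20], n_col, replace = TRUE)

# List of column indices, one element per distinct column name.
groups <- split(seq_len(ncol(m)), colnames(m))

# Use matrixStats::rowMedians() when installed; otherwise fall back to the
# slower apply(x, 1, median).
row_med <- if (requireNamespace("matrixStats", quietly = TRUE)) {
  function(x) matrixStats::rowMedians(x, na.rm = TRUE)
} else {
  function(x) apply(x, 1, median, na.rm = TRUE)
}

# drop = FALSE keeps a one-column group as a matrix.
timing <- system.time(
  med_by_name <- lapply(groups, function(idx) row_med(m[, idx, drop = FALSE]))
)
print(timing)
```

Each element of med_by_name is then a vector with one median per row, named 
after the column name it was computed from.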

The only thing I noticed is that my R process claims some 1.4 GB of memory. 
That should not be a problem on any recent hardware, but my guess is that this 
might be your problem, especially if you have other memory-hogging variables 
like this data frame lying around and you see severe memory swapping effects.
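
A rough way to check the memory side could look like the sketch below. It uses 
a reduced data frame (x is an illustrative name) and only base R functions; 
the 4 bytes per value assumes the data are stored as integers, as they are 
when generated with sample():

```r
# Sketch: measure a small data frame, then estimate the full-size one.
x <- as.data.frame(lapply(1:30, function(i) sample(200, 25000, replace = TRUE)))

# Size of the reduced object as reported by R.
size_bytes <- as.numeric(object.size(x))
print(format(object.size(x), units = "MB"))

# Integer data needs 4 bytes per value, so the raw data of a
# 250000 x 300 frame comes to about 286 MB (copies not included).
full_mb <- 250000 * 300 * 4 / 1024^2
print(full_mb)

# gc() reports what the R session currently uses overall.
print(gc())
```

Temporary copies, for example from as.matrix(), come on top of that figure.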

        Benno

> Hello Everybody,
> 
> The code:
> 
> dfmed<-lapply(unique(colnames(df)), function(x)
> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
> 
> takes a really long time to execute (hours). Is there a faster way to do
> this?
> 
> Thanks!
> 
> On Tue, May 22, 2012 at 3:46 PM, Preeti <pre...@sci.utah.edu> wrote:
> 
>> Thanks Henrik! Here is the one-liner that I wrote:
>> 
>> dfmed<-lapply(unique(colnames(df)), function(x)
>> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>> 
>> Thanks again!
>> 
>> 
>> On Tue, May 22, 2012 at 3:23 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
>> 
>>> See rowMedians() of the matrixStats package for replacing apply(x,
>>> MARGIN=1, FUN=median). /Henrik
>>> 
>>> On Tue, May 22, 2012 at 12:34 PM, Preeti <pre...@sci.utah.edu> wrote:
>>>> Hi,
>>>> 
>>>> I have a 250,000 by 300 matrix. I am trying to calculate the median of
>>>> those columns (by row) with column names that are identical. I would like
>>>> this to be efficient since apply(x,1,median) where x is created by choosing
>>>> only those columns with same column name and looping on this is taking a
>>>> really long time. Is there an efficient way to do this?
>>>> 
>>>> Thanks!
>>>> 
>>>>       [[alternative HTML version deleted]]
>>>> 
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>> 
> 

Benno Pütz
Statistical Genetics
MPI of Psychiatry
Kraepelinstr. 2-10
80804 Munich, Germany
T: ++49-(0)89-306 22 222
F: ++49-(0)89-306 22 601




