Thanks, Stefan.
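I tested the expressions on data frames of various shapes, each holding
10,000,000 values.
The results show that 2) and 3) are faster than 1), especially on a data frame
with a large number of columns: with 1,000,000 columns, 1) takes about 26.7
seconds, while 2) and 3) take less than 0.1 seconds. The third one is probably
the best.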
1) subset(df, V1 > 0.5, V2) or subset(df, V1 > 0.5, V2)$V2
2) df[df$V1 > 0.5, "V2"]
3) df$V2[df$V1 > 0.5]
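Apart from speed, the three forms differ in what they return and in how they
treat NA in the condition. The sketch below illustrates this on a small data
frame; the values are made up for illustration and are not part of the tests.

```r
# Small illustrative data frame (hypothetical values, not from the tests).
df <- data.frame(V1 = c(0.9, 0.2, NA, 0.7), V2 = c(10, 20, 30, 40))

r1 <- subset(df, V1 > 0.5, V2)   # one-column data frame; the NA row is dropped
r2 <- df[df$V1 > 0.5, "V2"]      # numeric vector; an NA condition yields NA
r3 <- df$V2[df$V1 > 0.5]         # numeric vector; an NA condition yields NA

str(r1)
print(r2)
print(r3)

# Wrapping the condition in which() drops NA conditions, as subset() does.
r4 <- df$V2[which(df$V1 > 0.5)]
print(r4)
```

If V1 may contain NA, 2) and 3) are therefore not drop-in replacements for 1)
unless the condition is wrapped in which().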
== TESTS ==
1. test over 1000000*10 matrix
> df <- as.data.frame.matrix(matrix(runif(10000000),1000000))
> system.time(subset(df, V1 > 0.5, V2), gcFirst=T)
user system elapsed
0.260 0.044 0.302
> system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=T)
user system elapsed
0.256 0.044 0.300
> system.time(df[df$V1 > 0.5, "V2"], gcFirst=T)
user system elapsed
0.100 0.016 0.117
> system.time(df$V2[df$V1 > 0.5], gcFirst=T)
user system elapsed
0.104 0.012 0.117
2. test over 100000*100 matrix
> df <- as.data.frame.matrix(matrix(runif(10000000),100000))
> system.time(subset(df, V1 > 0.5, V2), gcFirst=T)
user system elapsed
0.04 0.00 0.04
> system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=T)
user system elapsed
0.040 0.000 0.042
> system.time(df[df$V1 > 0.5, "V2"], gcFirst=T)
user system elapsed
0.012 0.000 0.011
> system.time(df$V2[df$V1 > 0.5], gcFirst=T)
user system elapsed
0.012 0.000 0.011
3. test over 10000*1000 matrix
> df <- as.data.frame.matrix(matrix(runif(10000000),10000))
> system.time(subset(df, V1 > 0.5, V2), gcFirst=T)
user system elapsed
0.008 0.000 0.008
> system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=T)
user system elapsed
0.004 0.000 0.005
> system.time(df[df$V1 > 0.5, "V2"], gcFirst=T)
user system elapsed
0.004 0.000 0.001
> system.time(df$V2[df$V1 > 0.5], gcFirst=T)
user system elapsed
0.004 0.000 0.001
4. test over 100*100000 matrix
> df <- as.data.frame.matrix(matrix(runif(10000000),100))
> system.time(subset(df, V1 > 0.5, V2), gcFirst=T)
user system elapsed
0.336 0.000 0.336
> system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=T)
user system elapsed
0.332 0.000 0.330
> system.time(df[df$V1 > 0.5, "V2"], gcFirst=T)
user system elapsed
0.004 0.000 0.005
> system.time(df$V2[df$V1 > 0.5], gcFirst=T)
user system elapsed
0 0 0
5. test over 10*1000000 matrix
> df <- as.data.frame.matrix(matrix(runif(10000000),10))
> system.time(subset(df, V1 > 0.5, V2), gcFirst=T)
user system elapsed
26.698 0.000 26.698
> system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=T)
user system elapsed
26.678 0.004 26.678
> system.time(df[df$V1>0.5, "V2"], gcFirst=T)
user system elapsed
0.060 0.000 0.057
> system.time(df$V2[df$V1>0.5], gcFirst=T)
user system elapsed
0 0 0
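The tests above can be rerun in one go with a small script. This is only a
sketch: time_subsetting is a hypothetical helper name, the shapes are scaled
down by a factor of 100 so that the run finishes quickly, and the timings will
depend on the R version and the machine.

```r
# Hypothetical helper: time the three expressions on an nrow x ncol data frame.
time_subsetting <- function(nrow, ncol) {
  df <- as.data.frame(matrix(runif(nrow * ncol), nrow))
  c(
    subset  = system.time(subset(df, V1 > 0.5, V2))[["elapsed"]],
    bracket = system.time(df[df$V1 > 0.5, "V2"])[["elapsed"]],
    dollar  = system.time(df$V2[df$V1 > 0.5])[["elapsed"]]
  )
}

# Shapes follow the tests above, scaled down (assumption: 100,000 values each).
shapes <- list(c(10000L, 10L), c(1000L, 100L), c(100L, 1000L), c(10L, 10000L))

# One row per shape, one column per expression; values are elapsed seconds.
results <- t(sapply(shapes, function(s) time_subsetting(s[1], s[2])))
rownames(results) <- sapply(shapes, paste, collapse = "x")
print(results)
```

A single system.time() call is noisy for runs this short, so repeating each
measurement a few times and comparing medians would give steadier numbers.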
2009/9/26 Stefan Grosse:
> On Sat, 26 Sep 2009 15:26:12 +0900 You Hyun Jo wrote:
>
> YHJ> Is there any (performance) difference (except the difference of
> YHJ> the return types)
> YHJ> between the following two computations?
>
> Try it yourself.
> ?system.time
> is useful for that purpose.
>
> Stefan
>