My previous post (quoted below) was in error because s3 was never assigned in the timed expression. The corrected timings follow. The as.numeric() in the calculation of s3 can be omitted if it's OK to have an integer rather than a double result, and in that case it is faster still.

> set.seed(1)
> C <- sample(c("a", "b"), 1000000, replace = TRUE)
> system.time({
+ s0 <- vector(length = length(C))
+ for(i in seq_along(C)) s0[i] <- if (C[i] == "a") 1 else -1
+ s0
+ })
   user  system elapsed
  21.32    0.02   26.10
> system.time(s1 <- ifelse(C == "a", 1, -1))
   user  system elapsed
   2.37    0.26    2.64
> system.time(s2 <- 2 * (C == "a") - 1)
   user  system elapsed
   0.32    0.02    0.35
> system.time({tmp <- C == "a"; s3 <- as.numeric(tmp - !tmp)})
   user  system elapsed
   0.28    0.02    0.31
> identical(s0, s1)
[1] TRUE
> identical(s0, s2)
[1] TRUE
> identical(s0, s3)
[1] TRUE
>
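Here is a short sketch of why s2 and s3 produce the right values. It also illustrates the integer-versus-double remark above. It uses a small made-up vector x rather than the 1e6-element C.

C == "a" returns a logical vector, and R's arithmetic treats TRUE as 1 and FALSE as 0. Subtracting one logical vector from another gives an integer vector. By contrast, 2 * tmp - 1 involves double constants, so it gives a double vector. identical() compares storage type as well as values, which is why s3 needs as.numeric() to be identical to s0 (a double vector).

```r
x <- c("a", "b", "a")     # hypothetical small input, for illustration only
tmp <- x == "a"           # logical: TRUE FALSE TRUE

# Logical minus logical is integer: TRUE - FALSE is 1L, FALSE - TRUE is -1L
s3_int <- tmp - !tmp
typeof(s3_int)            # "integer"

# The constants 2 and 1 are doubles, so this result is double
s2 <- 2 * tmp - 1
typeof(s2)                # "double"

# Same values, different storage type
all(s2 == s3_int)                   # TRUE
identical(s2, s3_int)               # FALSE: identical() also checks type
identical(s2, as.numeric(s3_int))   # TRUE once converted to double
```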


On 7/4/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> In thinking about this a bit more I have found a slightly faster one still.
> See s3.  Also I have added s0, the original solution, to the timings.
>
> > set.seed(1)
> > C <- sample(c("a", "b"), 1000000, replace = TRUE)
> > system.time({
> + s0 <- vector(length = length(C))
> + for(i in seq_along(C)) s0[i] <- if (C[i] == "a") 1 else -1
> + s0
> + })
>   user  system elapsed
>  21.75    0.02   25.99
> > system.time(s1 <- ifelse(C == "a", 1, -1))
>   user  system elapsed
>   2.32    0.17    2.54
> > system.time(s2 <- 2 * (C == "a") - 1)
>   user  system elapsed
>   0.29    0.02    0.32
> > system.time({tmp <- C == "a"; tmp - !tmp})
>   user  system elapsed
>   0.21    0.00    0.21
> > identical(s0, s1)
> [1] TRUE
> > identical(s0, s2)
> [1] TRUE
> > identical(s0, s3)
> [1] TRUE
>
> On 7/4/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> > Here are two ways.  The second way is more than 10x faster.
> >
> > > set.seed(1)
> > > C <- sample(c("a", "b"), 100000, replace = TRUE)
> > > system.time(s1 <- ifelse(C == "a", 1, -1))
> >   user  system elapsed
> >   0.37    0.01    0.38
> > > system.time(s2 <- 2 * (C == "a") - 1)
> >   user  system elapsed
> >   0.02    0.00    0.02
> > > identical(s1, s2)
> > [1] TRUE
> >
> > On 7/4/07, Keith Alan Chamberlain <[EMAIL PROTECTED]> wrote:
> > > Dear Rhelpers,
> > >
> > > Is there a faster way than below to set a vector based on values from
> > > another vector? I'd like to call a pre-existing function for this, but one
> > > which can also handle an arbitrarily large number of categories. Any 
> > > ideas?
> > >
> > > Cat=c('a','a','a','b','b','b','a','a','b')      # Categorical variable
> > > C1=vector(length=length(Cat))   # New vector for numeric values
> > >
> > > # Cycle through each element and set C1 according to the corresponding value of Cat.
> > > for(i in 1:length(C1)){
> > >        if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
> > > }
> > >
> > > C1
> > > [1] -1 -1 -1  1  1  1 -1 -1  1
> > > Cat
> > > [1] "a" "a" "a" "b" "b" "b" "a" "a" "b"
> > >
> > > Sincerely,
> > > KeithC.
> > > Psych Undergrad, CU Boulder (US)
> > > RE McNair Scholar
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
>
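On the original question of handling an arbitrarily large number of categories, one option is a named lookup vector indexed by the category labels. This sketch assumes that each category maps to a single number. The name codes is hypothetical, and only the a/b mapping comes from the question. The match() form is an equivalent alternative.

```r
Cat <- c('a','a','a','b','b','b','a','a','b')   # data from the original question

# Hypothetical lookup table: one named entry per category.
# Handling more categories only means adding more name = value pairs.
codes <- c(a = -1, b = 1)

# Character subscripts select elements by name; unname() drops the
# names that indexing copies onto the result.
C1 <- unname(codes[Cat])
C1
# [1] -1 -1 -1  1  1  1 -1 -1  1

# Equivalent: match() gives each label's position in names(codes)
C1b <- unname(codes[match(Cat, names(codes))])
```

To add a category, add one entry to codes, for example c(a = -1, b = 1, c = 0). Any label with no entry comes back as NA, and anyNA(C1) will flag it.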
