On Sun, 23 Sep 2018, Wensui Liu wrote:

What you measure is the "elapsed" time in the default setting. You
might need to take a closer look at the beautiful benchmark() function
and see what time I am talking about.

When I am waiting for the answer, elapsed time is what matters to me. Also, since each person usually has different hardware, running benchmark with multiple expressions as Ista did lets you pay attention to relative comparisons.

Keep in mind that parallel processing requires extra time just to distribute the calculations to the workers, so it doesn't pay to distribute tiny tasks like calculating the division of two numeric vector elements. That is the essence of vectorizing... bundle your simple calculations together so the processor can focus on getting answers rather than managing processes or even interpreting R for loops.
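As a rough illustration of the bundling point above, here is a minimal sketch that times an element-by-element loop against the vectorized expression. The input size and the helper names loop_version and vec_version are assumptions chosen for illustration; the timings will differ on every machine, so only the relative comparison is meaningful.

```r
# Small illustrative input (hypothetical size, chosen to run quickly).
c1 <- as.numeric(1:100000)
len <- length(c1)

# Element-by-element version: the interpreter handles every iteration.
loop_version <- function() {
  s <- numeric(len - 1)
  for (i in seq_len(len - 1)) s[i] <- log(c1[i + 1] / c1[i])
  s
}

# Vectorized version: a handful of calls into compiled code.
vec_version <- function() log(c1[-1] / c1[-len])

# system.time() reports user, system, and elapsed seconds;
# "elapsed" is the wall-clock time you actually wait.
t_loop <- system.time(s_loop <- loop_version())[["elapsed"]]
t_vec  <- system.time(s_vec  <- vec_version())[["elapsed"]]

print(c(loop = t_loop, vectorized = t_vec))
```

Both versions compute the same values; the difference is only in how much work the R interpreter has to do per element.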

I just provided a tentative solution for the person asking for it and
believe he has enough wisdom to decide what's best. Why bother to
judge others subjectively?

I would say that Ista has backed up his objections with measurable performance metrics, so while his initial reaction was pretty subjective I think your reaction at this point is really off the mark.

One confusing aspect of your response is that Ista reacted to your use of the Vectorize function, but you responded as though he reacted to your use of the pvec function. I mentioned drawbacks of using pvec above, but it really is important to stress that the Vectorize function is a usability facade and is in no way a performance enhancement to be associated with what we refer to as vectorized (lowercase) code.

The Vectorize function creates a wrapper function that evaluates its arguments and passes them to mapply, which in turn calls the C function do_mapply. That C code calls your R function with scalar inputs once per element, storing the results in a list, which mapply then simplifies into a vector or matrix result. This is clearly less efficient than a simple for loop would have been, rather than more efficient as a true vectorized solution such as log(c1[-1]/c1[-len]) will normally be. Vectorize is syntactic sugar with a performance penalty.
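To make the relationship concrete, the following sketch compares a Vectorize wrapper, a direct mapply call, and the vectorized expression on a small input. The helper name log_ratio and the input size are assumptions for illustration only; the point is that the first two do the same per-element work while the third does not.

```r
c1 <- as.numeric(1:1000)
len <- length(c1)

# Scalar-style helper (hypothetical name, not from the thread).
log_ratio <- function(i) log(c1[i + 1] / c1[i])

# Vectorize() returns a function wrapping a call to mapply().
v1 <- Vectorize(log_ratio)(1:(len - 1))

# Calling mapply() directly: one R-level call of log_ratio per index.
v2 <- mapply(log_ratio, 1:(len - 1))

# Truly vectorized: no per-element R function calls at all.
v3 <- log(c1[-1] / c1[-len])

print(all.equal(v1, v2))
print(all.equal(v1, v3))
```

All three agree on the result, so the choice between them is about readability and speed rather than correctness.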

Please pay attention to the comments offered by others on this list... being told your solution is inferior doesn't feel good but it is a very real opportunity for you to improve.

End comment.

On Sun, Sep 23, 2018 at 1:18 PM Ista Zahn <istaz...@gmail.com> wrote:

On Sun, Sep 23, 2018 at 1:46 PM Wensui Liu <liuwen...@gmail.com> wrote:

Actually, with the parallel pvec, the user time is a lot shorter. Or did
I somewhere miss your invaluable insight?

c1 <- 1:1000000
len <- length(c1)
rbenchmark::benchmark(log(c1[-1]/c1[-len]), replications = 100)
                  test replications elapsed relative user.self sys.self
1 log(c1[-1]/c1[-len])          100   4.617        1     4.484    0.133
  user.child sys.child
1          0         0
rbenchmark::benchmark(pvec(1:(len - 1), mc.cores = 4,
                           function(i) log(c1[i + 1] / c1[i])),
                      replications = 100)
                                                               test
1 pvec(1:(len - 1), mc.cores = 4, function(i) log(c1[i + 1]/c1[i]))
  replications elapsed relative user.self sys.self user.child sys.child
1          100   9.079        1     2.571    4.138      9.736     8.046

Your output is mangled in my email, but on my system your pvec
approach takes more than twice as long:

c1 <- 1:1000000
len <- length(c1)
library(parallel)
library(rbenchmark)

regular <- function() log(c1[-1]/c1[-len])
iterate.parallel <- function() {
  pvec(1:(len - 1), mc.cores = 4,
       function(i) log(c1[i + 1] / c1[i]))
}

benchmark(regular(), iterate.parallel(),
          replications = 100,
          columns = c("test", "elapsed", "relative"))
##                 test elapsed relative
## 2 iterate.parallel()   7.517    2.482
## 1          regular()   3.028    1.000

Honestly, just use log(c1[-1]/c1[-len]). The code is simple and easy
to understand and it runs pretty fast. There is usually no reason to
make it more complicated.
--Ista

On Sun, Sep 23, 2018 at 12:33 PM Ista Zahn <istaz...@gmail.com> wrote:

On Sun, Sep 23, 2018 at 10:09 AM Wensui Liu <liuwen...@gmail.com> wrote:

Why?

The operations required for this algorithm are vectorized, as are most
operations in R. There is no need to iterate through each element.
Using Vectorize to achieve the iteration is no better than using
*apply or a for-loop, and betrays the same basic lack of insight into
basic principles of programming in R.

And/or, if you want a more practical reason:

c1 <- 1:1000000
len <- 1000000
system.time( s1 <- log(c1[-1]/c1[-len]))
   user  system elapsed
  0.031   0.004   0.035
system.time(s2 <- Vectorize(function(i) log(c1[i + 1] / c1[i])) (1:len))
   user  system elapsed
  1.258   0.022   1.282
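One detail worth checking in the comparison above: the Vectorize call iterates over 1:len, so its last step reads c1[len + 1], which lies past the end of the vector. The sketch below uses a tiny made-up input to show the effect; it is an illustration under that assumption, not a rerun of the timings.

```r
c1 <- 1:10          # tiny illustrative input
len <- length(c1)

s1 <- log(c1[-1] / c1[-len])                                  # len - 1 values
s2 <- Vectorize(function(i) log(c1[i + 1] / c1[i]))(1:len)     # len values

print(length(s1))       # 9
print(length(s2))       # 10
print(is.na(s2[len]))   # TRUE, because c1[len + 1] is NA

# Restricting the index to 1:(len - 1) makes the two results agree.
s3 <- Vectorize(function(i) log(c1[i + 1] / c1[i]))(1:(len - 1))
print(all.equal(s1, s3))
```

The extra element does not change the conclusion about speed, but it does mean s1 and s2 are not the same vector.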

Best,
Ista


On Sun, Sep 23, 2018 at 7:54 AM Ista Zahn <istaz...@gmail.com> wrote:

On Sat, Sep 22, 2018 at 9:06 PM Wensui Liu <liuwen...@gmail.com> wrote:

or this one:

(Vectorize(function(i) log(c1[i + 1] / c1[i])) (1:len))

Oh dear god no.


On Sat, Sep 22, 2018 at 4:16 PM rsherry8 <rsher...@comcast.net> wrote:


It is my impression that good R programmers make very little use of the
for statement. Please consider the following R statement:
         for( i in 1:(len-1) )  s[i] = log(c1[i+1]/c1[i], base = exp(1) )
One problem I have found with this statement is that s must exist before
the statement is run. Can it be written without using a for
loop? Would that be better?
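A minimal sketch of both options raised in the question, using made-up values for c1: the loop runs once s has been created with the right length, and the vectorized form needs neither the loop nor a pre-existing s. Note that log() already defaults to base exp(1), so the base argument can be dropped.

```r
c1 <- c(100, 102, 101, 105, 110)   # made-up example values
len <- length(c1)

# Option 1: preallocate s, then fill it in the loop.
s <- numeric(len - 1)
for (i in 1:(len - 1)) s[i] <- log(c1[i + 1] / c1[i])

# Option 2: no loop and no preallocation.
s_vec <- log(c1[-1] / c1[-len])

print(all.equal(s, s_vec))
```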

Thanks,
Bob

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
