On Sun, 23 Sep 2018, Wensui Liu wrote:
What you measured is the "elapsed" time in the default setting. You
might need to take a closer look at the beautiful benchmark() function
and see which time I am talking about.
When I am waiting for the answer, elapsed time is what matters to me.
Also, since each person usually has different hardware, running benchmark()
with multiple expressions, as Ista did, lets you focus on relative
comparisons.
Keep in mind that parallel processing requires extra time just to
distribute the calculations to the workers, so it doesn't pay to
distribute tiny tasks like dividing one element of a numeric vector by
another. That is the essence of vectorizing: bundle your simple
calculations together so the processor can focus on getting answers rather
than on managing processes or even interpreting R for loops.
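To see what pvec actually hands its workers, here is a small sketch (the variable names are illustrative, not from the thread). According to the parallel package documentation, pvec splits the input vector into roughly one chunk per core and calls FUN once per chunk, so FUN already receives whole index vectors; the extra cost in the benchmarks below comes from forking workers and collecting their results rather than from per-element calls. The sketch assumes mc.cores = 2 on Unix-alikes and falls back to 1 on Windows, where forking is unavailable.

```r
library(parallel)

# Forking is not available on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each call of FUN reports the length of the chunk it received,
# repeated so that the output has the same length as the input
chunk_sizes <- pvec(1:10, function(i) rep(length(i), length(i)),
                    mc.cores = cores)
print(chunk_sizes)
```

With two cores this should show the value 5 ten times (two chunks of five elements), not 1 ten times, i.e. FUN is called once per chunk.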
I just provided a tentative solution for the person asking for it and
believe he has enough wisdom to decide what's best. Why bother to
judge others subjectively?
I would say that Ista has backed up his objections with measurable
performance metrics, so while his initial reaction was pretty subjective I
think your reaction at this point is really off the mark.
One confusing aspect of your response is that Ista reacted to your
use of the Vectorize function, but you responded as though he reacted
to your use of the pvec function. I mentioned drawbacks of using pvec
above, but it really is important to stress that the Vectorize function is
a usability facade and is in no way a performance enhancement to be
associated with what we refer to as vectorized (lowercase) code.
The Vectorize function creates a wrapper that uses lapply to evaluate its
arguments and then passes them to mapply, whose C code (do_mapply) calls
your R function once per element with scalar inputs, stores the results
in a list, and finally simplifies that list into a vector or matrix. This
is clearly less efficient than a simple for loop would have been, rather
than more efficient as a true vectorized solution such as
log(c1[-1]/c1[-len]) will normally be. Vectorize is syntactic sugar with
a performance penalty.
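A quick way to check where the per-element work happens is to look inside the wrapper that Vectorize returns. This sketch uses a small made-up vector and a throwaway function f (hypothetical, for illustration only); it confirms that the wrapper's body delegates to mapply and that calling it gives the same numbers as the plain vectorized expression.

```r
c1 <- c(2, 4, 8, 16)
len <- length(c1)

# Scalar-style function of an index, as in the thread
f <- function(i) log(c1[i + 1] / c1[i])
vf <- Vectorize(f)

# The wrapper built by Vectorize forwards its arguments to mapply,
# which then calls f once per element
print(body(vf))
uses_mapply <- any(grepl("mapply", deparse(body(vf)), fixed = TRUE))

# Same numbers as the truly vectorized expression, computed the slow way
same <- isTRUE(all.equal(vf(1:(len - 1)), log(c1[-1] / c1[-len])))
```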
Please pay attention to the comments offered by others on this list.
Being told your solution is inferior doesn't feel good, but it is a very
real opportunity for you to improve.
End comment.
On Sun, Sep 23, 2018 at 1:18 PM Ista Zahn <istaz...@gmail.com> wrote:
On Sun, Sep 23, 2018 at 1:46 PM Wensui Liu <liuwen...@gmail.com> wrote:
Actually, with the parallel pvec, the user time is a lot shorter. Or did
I miss your invaluable insight somewhere?
c1 <- 1:1000000
len <- length(c1)
rbenchmark::benchmark(log(c1[-1]/c1[-len]), replications = 100)
                  test replications elapsed relative user.self sys.self user.child sys.child
1 log(c1[-1]/c1[-len])          100   4.617        1     4.484    0.133          0         0
library(parallel)  # pvec() comes from the parallel package
rbenchmark::benchmark(pvec(1:(len - 1), mc.cores = 4, function(i) log(c1[i + 1] / c1[i])), replications = 100)
                                                                 test replications elapsed relative user.self sys.self user.child sys.child
1 pvec(1:(len - 1), mc.cores = 4, function(i) log(c1[i + 1]/c1[i]))          100   9.079        1     2.571    4.138      9.736     8.046
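The user.self column in the pvec run looks small because the real work happens in forked child processes, which rbenchmark reports separately under user.child. A minimal sketch with base system.time makes that split visible; it reuses c1 and len from the thread and assumes mc.cores = 2 (1 on Windows, where forking is unavailable and child times may be NA).

```r
library(parallel)

c1 <- 1:1000000
len <- length(c1)
cores <- if (.Platform$OS.type == "windows") 1L else 2L

t_par <- system.time(
  s_par <- pvec(1:(len - 1), function(i) log(c1[i + 1] / c1[i]),
                mc.cores = cores)
)

# proc_time keeps the parent's and the children's CPU time apart;
# printing t_par directly folds the child time into the "user" column
times <- unclass(t_par)
print(times[c("user.self", "user.child", "elapsed")])

# Total user CPU actually spent (child times can be NA without fork)
total_user <- sum(times[c("user.self", "user.child")], na.rm = TRUE)
```

Applied to the figures above, the pvec run used 2.571 + 9.736 = 12.307 s of user CPU in total, versus 4.484 s for the single-process version, while also taking longer in elapsed time.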
Your output is mangled in my email, but on my system your pvec
approach takes more than twice as long:
c1 <- 1:1000000
len <- length(c1)
library(parallel)
library(rbenchmark)
regular <- function() log(c1[-1]/c1[-len])
iterate.parallel <- function() {
  pvec(1:(len - 1), mc.cores = 4,
       function(i) log(c1[i + 1] / c1[i]))
}
benchmark(regular(), iterate.parallel(),
          replications = 100,
          columns = c("test", "elapsed", "relative"))
## test elapsed relative
## 2 iterate.parallel() 7.517 2.482
## 1 regular() 3.028 1.000
Honestly, just use log(c1[-1]/c1[-len]). The code is simple and easy
to understand and it runs pretty fast. There is usually no reason to
make it more complicated.
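If you prefer to think of the result as differences of logs, the identity log(a/b) = log(a) - log(b) gives an equivalent vectorized one-liner using base diff(). This is an alternative formulation offered as a sketch, not something proposed in the thread; the two forms agree up to floating-point rounding, so they are compared with all.equal rather than identical.

```r
c1 <- 1:1000000
len <- length(c1)

s_ratio <- log(c1[-1] / c1[-len])   # the expression recommended above
s_diff  <- diff(log(c1))            # log(c1[i + 1]) - log(c1[i]) for each i

# Compare with a tolerance: the two forms round differently
print(all.equal(s_ratio, s_diff))
```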
--Ista
On Sun, Sep 23, 2018 at 12:33 PM Ista Zahn <istaz...@gmail.com> wrote:
On Sun, Sep 23, 2018 at 10:09 AM Wensui Liu <liuwen...@gmail.com> wrote:
Why?
The operations required for this algorithm are vectorized, as are most
operations in R. There is no need to iterate through each element.
Using Vectorize to achieve the iteration is no better than using
*apply or a for-loop, and betrays the same lack of insight into the
basic principles of programming in R.
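A tiny worked case shows why no loop is needed: dropping the first element and dropping the last element produce two copies of the vector shifted by one position, so element-wise division pairs each value with its predecessor. The small vector below is made up for illustration.

```r
c1 <- c(2, 4, 8, 16)
len <- length(c1)

c1[-1]     # 4 8 16  (elements 2..len)
c1[-len]   # 2 4 8   (elements 1..len-1)

ratios <- c1[-1] / c1[-len]   # 2 2 2
s <- log(ratios)              # log(2) three times
```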
And/or, if you want a more practical reason:
c1 <- 1:1000000
len <- 1000000
system.time( s1 <- log(c1[-1]/c1[-len]))
user system elapsed
0.031 0.004 0.035
system.time(s2 <- Vectorize(function(i) log(c1[i + 1] / c1[i])) (1:(len - 1)))
user system elapsed
1.258 0.022 1.282
Best,
Ista
On Sun, Sep 23, 2018 at 7:54 AM Ista Zahn <istaz...@gmail.com> wrote:
On Sat, Sep 22, 2018 at 9:06 PM Wensui Liu <liuwen...@gmail.com> wrote:
or this one:
(Vectorize(function(i) log(c1[i + 1] / c1[i])) (1:(len - 1)))
Oh dear god no.
On Sat, Sep 22, 2018 at 4:16 PM rsherry8 <rsher...@comcast.net> wrote:
It is my impression that good R programmers make very little use of the
for statement. Please consider the following
R statement:
for( i in 1:(len-1) ) s[i] = log(c1[i+1]/c1[i], base = exp(1) )
One problem I have found with this statement is that s must exist before
the statement is run. Can it be written without using a for
loop? Would that be better?
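To answer both questions concretely: the loop works if s is preallocated to the right length (numeric(len - 1) in this sketch), and the vectorized form needs no preallocation at all. The sketch assumes c1 is a numeric or integer vector as in the thread; note that log() already uses base = exp(1) by default, so that argument can be dropped.

```r
c1 <- 1:1000000
len <- length(c1)

# Loop version: s must be created (and ideally sized) before the loop
s_loop <- numeric(len - 1)
for (i in seq_len(len - 1)) s_loop[i] <- log(c1[i + 1] / c1[i])

# Vectorized version: no preallocation, no explicit loop
s_vec <- log(c1[-1] / c1[-len])

print(all.equal(s_loop, s_vec))
```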
Thanks,
Bob
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k