On Apr 21, 2010, at 2:15 PM, Seth Falcon wrote:

> On 4/21/10 10:45 AM, Simon Urbanek wrote:
>> Won't that miss the last incomplete chunk? (And please don't use
>> DATAPTR on INTSXP even though the effect is currently the same.)
>>
>> In general it seems that it depends on nt whether this is efficient
>> or not, since calls to memcpy are expensive for short blocks (very
>> small nt, that is).
>>
>> I ran some empirical tests comparing memcpy vs. for() (x86_64, OS X)
>> and the results were encouraging - depending on the size of the
>> copied block the difference could be quite big:
>>
>>   tiny block (ca. n = 32 or less): for() is faster
>>   small block (n ~ 1k): memcpy is ca. 8x faster
>>   as the size increases the gap closes (presumably due to RAM
>>   bandwidth limitations), so for n = 512M it is ~30%
>>
>> Of course this is contingent on the implementation of memcpy, the
>> compiler, the architecture, etc., and it will only matter if copying
>> is what you do most of the time ...
>
> Copying of vectors is something that I would expect to happen fairly
> often in many applications of R.
>
> Is for() faster on small blocks by enough that one would want to
> branch based on size?
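The tests described above can be approximated with a small standalone
C program. The following is a minimal sketch, not the original test
harness: the timer, block sizes, and repetition counts are all
illustrative assumptions, and (as noted in the quoted message) the
results depend heavily on the memcpy implementation, compiler, and
flags - at -O2, for example, many compilers rewrite the plain loop as
a memcpy call, which would erase the difference being measured.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    /* clock() measures CPU time, which is adequate for timing a
     * single-threaded copy loop. */
    static double cpu_sec(void)
    {
        return (double)clock() / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        /* Tiny, small, and large blocks (sizes are assumptions). */
        const size_t sizes[] = { 8, 32, 1024, 1u << 24 };
        const size_t nsizes = sizeof sizes / sizeof sizes[0];

        for (size_t s = 0; s < nsizes; s++) {
            size_t n = sizes[s];
            /* Keep total bytes copied roughly constant across sizes. */
            long reps = (long)(1e8 / (double)n) + 1;

            int *src = malloc(n * sizeof *src);
            int *dst = malloc(n * sizeof *dst);
            if (!src || !dst) { perror("malloc"); return 1; }
            for (size_t i = 0; i < n; i++) src[i] = (int)i;

            double t0 = cpu_sec();
            for (long r = 0; r < reps; r++)
                memcpy(dst, src, n * sizeof *dst);
            double t_memcpy = cpu_sec() - t0;

            t0 = cpu_sec();
            for (long r = 0; r < reps; r++)
                for (size_t i = 0; i < n; i++)
                    dst[i] = src[i];
            double t_for = cpu_sec() - t0;

            /* Reading dst in the output keeps the copies from being
             * optimized away entirely. */
            printf("n = %9zu: memcpy %6.3fs  for() %6.3fs  (dst[0] = %d)\n",
                   n, t_memcpy, t_for, dst[0]);

            free(src);
            free(dst);
        }
        return 0;
    }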
Good question. Given that the branching itself adds overhead, possibly
not. In the best case for() can be ~40% faster (for single-digit n),
but the operation itself is so fast that it would take billions of
copies for that to make a difference. The break-even point on my test
machine is n = 32, and when I added the branching the routine took a
20% hit, so I suspect it's simply not worth it. The only case that may
be worth branching on is n = 1, since that is likely a fairly common
use. (The branching penalty inside a copy routine is lower than the
raw memcpy/for comparison suggests, since the branch can be done once,
before the outer for loop, so this may vary case by case.) A sketch of
what such a branching copy could look like follows below.

Cheers,
Simon
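For illustration, here is a hypothetical copy helper built around that
idea: special-case n = 1, use a plain loop below an assumed break-even
threshold, and fall back to memcpy above it. Neither the function nor
the threshold comes from R's sources; the threshold in particular is
machine dependent and would have to be measured.

    #include <stddef.h>
    #include <string.h>

    /* Assumed break-even point between for() and memcpy; machine
     * dependent, not a measured constant from R. */
    #define COPY_FOR_THRESHOLD 32

    static void copy_ints(int *dst, const int *src, size_t n)
    {
        if (n == 1) {
            /* The (likely common) single-element case: no call overhead. */
            dst[0] = src[0];
        } else if (n < COPY_FOR_THRESHOLD) {
            /* Tiny blocks: a plain loop avoids memcpy's call/setup cost. */
            for (size_t i = 0; i < n; i++)
                dst[i] = src[i];
        } else {
            /* Larger blocks: memcpy wins. */
            memcpy(dst, src, n * sizeof *dst);
        }
    }

The branch costs only a compare or two per call and sits outside the
copy loop itself, which is why its penalty in a routine like this is
lower than the raw memcpy-vs-for() timings would imply.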