Re: [Rd] suggestion how to use memcpy in duplicate.c

Simon Urbanek Sat, 24 Apr 2010 08:22:10 -0700

Herve,

I think you code just confirms what I said -- for small nt for() wins, 
otherwise memcpy wins. Taking your measurements (they are a bit crude since 
they measure overhead as well):


ginaz:sandbox$ time ./hmc.mc 1

real    0m7.294s
user    0m7.239s
sys     0m0.054s
ginaz:sandbox$ time ./hmc 1

real    0m3.773s
user    0m3.746s
sys     0m0.024s

so for() is about 2x faster

ginaz:sandbox$ time ./hmc 3

real    0m4.751s
user    0m4.718s
sys     0m0.023s
ginaz:sandbox$ time ./hmc.mc 3

real    0m3.098s
user    0m3.051s
sys     0m0.045s

memcpy is about 50% faster.

It also proves me right when I said we should only special-case the common case 
of scalar recycling and use memcpy for everything else.

Cheers,
Simon




On Apr 23, 2010, at 9:21 PM, Hervé Pagès wrote:

> Follow up...
> 
> Hervé Pagès wrote:
>> Hi Matthew,
>> Matthew Dowle wrote:
>>> Just to add some clarification, the suggestion wasn't motivated by speeding 
>>> up a length 3 vector being recycled 3.3 million times.  But its a good 
>>> point that any change should not make that case slower.  I don't know how 
>>> much vectorCopy is called really,  DUPLICATE_ATOMIC_VECTOR seems more 
>>> significant, which doesn't recycle, and already had the FIXME next to it.
>>> 
>>> Where copyVector is passed a large source though, then memcpy should be 
>>> faster than any of the methods using a for loop through each element 
>>> (whether recycling or not),  allowing for the usual caveats. What are the 
>>> timings like if you repeat the for loop 100 times to get a more robust 
>>> timing ?  It needs to be a repeat around the for loop only, not the 
>>> allocVector whose variance looks to be included in those timings below. 
>>> Then increase the size of the source vector,  and compare to memcpy.
>> On my system (DELL LATITUDE laptop with 64-bit 9.04 Ubuntu):
>> #include <stdio.h>
>> #include <string.h>
>> #include <stdlib.h>
>> void *memcpy2(char *dest, const char *src, size_t n)
>> {
>>        int i;
>>        for (i = 0; i < n; i++) *(dest++) = *(src++);
>>        return dest;
>> }
>> int main()
>> {
>>        int n, kmax, k;
>>        char *x, *y;
>>        n = 25000000;
>>    kmax = 100;
>>        x = (char *) malloc(n);
>>        y = (char *) malloc(n);
>>        for (k = 0; k < kmax; k++)
>>                //memcpy2(y, x, n);
>>                memcpy(y, x, n);
>>        return 0;
>> }
>> Benchmarks:
>> n = 25000000, kmax = 100, memcpy2:
>>  real    0m8.123s
>>  user    0m8.077s
>>  sys    0m0.040s
>> n = 25000000, k = 100, memcpy:
>>  real    0m1.076s
>>  user    0m1.004s
>>  sys    0m0.060s
>> n = 25000, kmax = 100000, memcpy2:
>>  real    0m8.033s
>>  user    0m8.005s
>>  sys    0m0.012s
>> n = 25000, kmax = 100000, memcpy:
>>  real    0m0.353s
>>  user    0m0.352s
>>  sys    0m0.000s
>> n = 25, kmax = 100000000, memcpy2:
>>  real    0m8.351s
>>  user    0m8.313s
>>  sys    0m0.008s
>> n = 25, kmax = 100000000, memcpy:
>>  real    0m0.628s
>>  user    0m0.624s
>>  sys    0m0.004s
>> So depending on the size of the memory area to copy, GNU memcpy() is
>> between 7.5x and 22x faster than using a for() loop. You can reasonably
>> expect that the authors of memcpy() have done their best to optimize
>> the code for most platforms they support, for big and small memory
>> areas, and that if there was a need to branch based on the size of the
>> area, that's already done *inside* memcpy() (I'm just speculating here,
>> I didn't look at memcpy's source code).
> 
> So for copying a vector of integer (with recycling of the source),
> yes, a memcpy-based implementation is much faster, for long and small
> vectors (even for a length 3 vector being recycled 3.3 million
> times ;-) ), at least on my system:
> 
> nt = 3; ns = 10000000; kmax = 100; copy_ints:
> 
>  real 0m1.206s
>  user 0m1.168s
>  sys  0m0.040s
> 
> nt = 3; ns = 10000000; kmax = 100; copy_ints2:
> 
>  real 0m6.326s
>  user 0m6.264s
>  sys  0m0.052s
> 
> 
> Code:
> =======================================================================
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> 
> void memcpy_with_recycling_of_src(char *dest, size_t dest_nblocks,
>                                 const char *src, size_t src_nblocks,
>                                 size_t blocksize)
> {
>       int i, imax, q;
>       size_t src_size;
> 
>       imax = dest_nblocks - src_nblocks;
>       src_size = src_nblocks * blocksize;
>       for (i = 0; i <= imax; i += src_nblocks) {
>               memcpy(dest, src, src_size);
>               dest += src_size;
>               i += src_nblocks;
>       }
>       q = dest_nblocks - i;
>       if (q > 0)
>               memcpy(dest, src, q * blocksize);
>       return;
> }
> 
> void copy_ints(int *dest, int dest_length,
>               const int *src, int src_length)
> {
>       memcpy_with_recycling_of_src((char *) dest, dest_length,
>                                    (char *) src, src_length,
>                                    sizeof(int));
> }
> 
> /* the copyVector() way */
> void copy_ints2(int *dest, int dest_length,
>                const int *src, int src_length)
> {
>       int i;
> 
>       for (i = 0; i < dest_length; i++)
>               dest[i] = src[i % src_length];
> }
> 
> int main()
> {
>       int nt, ns, kmax, k;
>       int *t, *s;
> 
>       nt = 3;
>       ns = 10000000;
>       kmax = 100;
>       t = (int *) malloc(nt * sizeof(int));
>       s = (int *) malloc(ns * sizeof(int));
>       for (k = 0; k < kmax; k++)
>               //copy_ints(s, ns, t, nt);
>               copy_ints2(s, ns, t, nt);
>       return 0;
> }
> 
> Note that the function that actually does the job is
> memcpy_with_recycling_of_src(). It can be reused for copying
> vectors with elements of an arbitrary size.
> 
> Cheers,
> H.
> 
>> Cheers,
>> H.
>>> 
>>> Matthew
>>> 
>>> "William Dunlap" <wdun...@tibco.com> wrote in message 
>>> news:77eb52c6dd32ba4d87471dcd70c8d70002ce6...@na-pa-vbe03.na.tibco.com...
>>> If I were worried about the time this loop takes,
>>> I would avoid using i%nt.  For the attached C code
>>> compile with gcc 4.3.3 with -O2 I get
>>>  > # INTEGER() in loop
>>>  > system.time( r1 <- .Call("my_rep1", 1:3, 1e7) )
>>>     user  system elapsed
>>>    0.060   0.012   0.071
>>> 
>>>  > # INTEGER() before loop
>>>  > system.time( r2 <- .Call("my_rep2", 1:3, 1e7) )
>>>     user  system elapsed
>>>    0.076   0.008   0.086
>>> 
>>>  > # replace i%src_length in loop with j=0 before loop and
>>>  > #    if(++j==src_length) j=0 ;
>>>  > # in the loop.
>>>  > system.time( r3 <- .Call("my_rep3", 1:3, 1e7) )
>>>     user  system elapsed
>>>    0.024   0.028   0.050
>>>  > identical(r1,r2) && identical(r2,r3)
>>>  [1] TRUE
>>> 
>>> The C code is:
>>> #define USE_RINTERNALS /* pretend we are in the R kernel */
>>> #include <R.h>
>>> #include <Rinternals.h>
>>> 
>>> 
>>> SEXP my_rep1(SEXP s_src, SEXP s_dest_length)
>>> {
>>>    int src_length = length(s_src) ;
>>>    int dest_length = asInteger(s_dest_length) ;
>>>    int i,j ;
>>>    SEXP s_dest ;
>>>    PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
>>>    if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
>>>    for(i=0;i<dest_length;i++) {
>>>        INTEGER(s_dest)[i] = INTEGER(s_src)[i % src_length] ;
>>>    }
>>>    UNPROTECT(1) ;
>>>    return s_dest ;
>>> }
>>> SEXP my_rep2(SEXP s_src, SEXP s_dest_length)
>>> {
>>>    int src_length = length(s_src) ;
>>>    int dest_length = asInteger(s_dest_length) ;
>>>    int *psrc = INTEGER(s_src) ;
>>>    int *pdest ;
>>>    int i ;
>>>    SEXP s_dest ;
>>>    PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
>>>    pdest = INTEGER(s_dest) ;
>>>    if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
>>>    /* end of boilerplate */
>>>    for(i=0;i<dest_length;i++) {
>>>        pdest[i] = psrc[i % src_length] ;
>>>    }
>>>    UNPROTECT(1) ;
>>>    return s_dest ;
>>> }
>>> SEXP my_rep3(SEXP s_src, SEXP s_dest_length)
>>> {
>>>    int src_length = length(s_src) ;
>>>    int dest_length = asInteger(s_dest_length) ;
>>>    int *psrc = INTEGER(s_src) ;
>>>    int *pdest ;
>>>    int i,j ;
>>>    SEXP s_dest ;
>>>    PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
>>>    pdest = INTEGER(s_dest) ;
>>>    if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
>>>    /* end of boilerplate */
>>>    for(j=0,i=0;i<dest_length;i++) {
>>>        *pdest++ = psrc[j++] ;
>>>        if (j==src_length) {
>>>            j = 0 ;
>>>        }
>>>    }
>>>    UNPROTECT(1) ;
>>>    return s_dest ;
>>> }
>>> 
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>> 
>>>> -----Original Message-----
>>>> From: r-devel-boun...@r-project.org
>>>> [mailto:r-devel-boun...@r-project.org] On Behalf Of Romain Francois
>>>> Sent: Wednesday, April 21, 2010 12:32 PM
>>>> To: Matthew Dowle
>>>> Cc: r-de...@stat.math.ethz.ch
>>>> Subject: Re: [Rd] suggestion how to use memcpy in duplicate.c
>>>> 
>>>> Le 21/04/10 17:54, Matthew Dowle a écrit :
>>>>>> From copyVector in duplicate.c :
>>>>> void copyVector(SEXP s, SEXP t)
>>>>> {
>>>>>     int i, ns, nt;
>>>>>     nt = LENGTH(t);
>>>>>     ns = LENGTH(s);
>>>>>     switch (TYPEOF(s)) {
>>>>> ...
>>>>>     case INTSXP:
>>>>>     for (i = 0; i<  ns; i++)
>>>>>         INTEGER(s)[i] = INTEGER(t)[i % nt];
>>>>>     break;
>>>>> ...
>>>>> 
>>>>> could that be replaced with :
>>>>> 
>>>>>     case INTSXP:
>>>>>     for (i=0; i<ns/nt; i++)
>>>>>         memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char
>>>> *)DATAPTR(t),
>>>>> nt*sizeof(int));
>>>>>     break;
>>>> or at least with something like this:
>>>> 
>>>> int* p_s = INTEGER(s) ;
>>>> int* p_t = INTEGER(t) ;
>>>> for( i=0 ; i < ns ; i++){
>>>> p_s[i] = p_t[i % nt];
>>>> }
>>>> 
>>>> since expanding the INTEGER macro over and over has a price.
>>>> 
>>>>> and similar for the other types in copyVector.  This won't
>>>> help regular
>>>>> vector copies, since those seem to be done by the
>>>> DUPLICATE_ATOMIC_VECTOR
>>>>> macro, see next suggestion below, but it should help
>>>> copyMatrix which calls
>>>>> copyVector, scan.c which calls copyVector on three lines,
>>>> dcf.c (once) and
>>>>> dounzip.c (once).
>>>>> 
>>>>> For the DUPLICATE_ATOMIC_VECTOR macro there is already a
>>>> comment next to it
>>>>> :
>>>>> 
>>>>>     <FIXME>: surely memcpy would be faster here?
>>>>> 
>>>>> which seems to refer to the for loop  :
>>>>> 
>>>>>     else { \
>>>>>     int __i__; \
>>>>>     type *__fp__ = fun(from), *__tp__ = fun(to); \
>>>>>     for (__i__ = 0; __i__<  __n__; __i__++) \
>>>>>       __tp__[__i__] = __fp__[__i__]; \
>>>>>   } \
>>>>> 
>>>>> Could that loop be replaced by the following ?
>>>>> 
>>>>>    else { \
>>>>>    memcpy((char *)DATAPTR(to), (char *)DATAPTR(from),
>>>> __n__*sizeof(type)); \
>>>>>    }\
>>>>> 
>>>>> In the data.table package, dogroups.c uses this technique,
>>>> so the principle
>>>>> is tested and works well so far.
>>>>> 
>>>>> Are there any road blocks preventing this change, or is
>>>> anyone already
>>>>> working on it ?  If not then I'll try and test it (on
>>>> Ubuntu 32bit) and
>>>>> submit patch with timings, as before.  Comments/pointers
>>>> much appreciated.
>>>>> Matthew
>>>>> 
>>>>> ______________________________________________
>>>>> R-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>> 
>>>>> 
>>>> 
>>>> -- 
>>>> Romain Francois
>>>> Professional R Enthusiast
>>>> +33(0) 6 28 91 30 30
>>>> http://romainfrancois.blog.free.fr
>>>> |- http://bit.ly/9aKDM9 : embed images in Rd documents
>>>> |- http://tr.im/OIXN : raster images and RImageJ
>>>> |- http://tr.im/OcQe : Rcpp 0.7.7
>>>> 
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>> 
>>> 
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
> 
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] suggestion how to use memcpy in duplicate.c

Reply via email to