Re: [Rd] Bugs? when dealing with contrasts
Gabor Grothendieck wrote:

On Wed, Apr 21, 2010 at 4:26 PM, Peter Dalgaard pda...@gmail.com wrote: ... I.e., that R reverts to using indicator variables when the intercept is absent.

Is there any nice way of getting contr.sum coding for the interaction as opposed to the ugly code in my post that I used to force it? i.e.

  cbind(1, model.matrix(~ fac)[,2:3] * scores)

I think not. In general, an interaction like ~fac:scores indicates three lines with a common intercept and three different slopes, and changing the parametrization is not supposed to change the model, whereas your model inserts a restriction that the slopes sum to zero (if I understand correctly). So if you want to fit ugly models, you get to do a little ugly footwork.

(A similar, simpler, issue arises if you want to have a 2x2 design with no effect in one column and/or one row (think clinical trial: placebo vs. active, baseline vs. treated). You can only do this using explicit dummy variables, not with the two classifications represented as factors.)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk   Priv: pda...@gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RUnit bug?
On 22/04/10 02:19, Dominick Samperi wrote:

There appears to be a bug in RUnit. Given a testsuite testsuite.math, say, when I run: runTestSuite(testsuite.math) this works fine, provided there are no extraneous files in the unit test subdirectory. But if there are any Emacs temp files (with names that end with '~') then runTestSuite gets confused and tries to run functions from the temp files as well.

How do you define 'testsuite.math'? The default value of the testFileRegexp argument in defineTestSuite should rule these files out.

Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9aKDM9 : embed images in Rd documents
|- http://tr.im/OIXN : raster images and RImageJ
|- http://tr.im/OcQe : Rcpp 0.7.7
Re: [Rd] suggestion how to use memcpy in duplicate.c
Is this a thumbs up for memcpy for DUPLICATE_ATOMIC_VECTOR at least? If there is further specific testing then let me know, happy to help, but you seem to have beaten me to it.

Matthew

Simon Urbanek simon.urba...@r-project.org wrote in message news:65d21b93-a737-4a94-bdf4-ad7e90518...@r-project.org...

On Apr 21, 2010, at 2:15 PM, Seth Falcon wrote: On 4/21/10 10:45 AM, Simon Urbanek wrote:

Won't that miss the last incomplete chunk? (and please don't use DATAPTR on INTSXP even though the effect is currently the same) In general it seems that it depends on nt whether this is efficient or not, since calls to short memcpy are expensive (very small nt, that is).

I ran some empirical tests to compare memcpy vs for() (x86_64, OS X) and the results were encouraging; depending on the size of the copied block the difference could be quite big:
- tiny block (ca. n = 32 or less): for() is faster
- small block (n ~ 1k): memcpy is ca. 8x faster
- as the size increases the gap closes (presumably due to RAM bandwidth limitations), so for n = 512M it is ~30%

Of course this is contingent on the implementation of memcpy, compiler, architecture etc. And will only matter if copying is what you do most of the time ...

Copying of vectors is something that I would expect to happen fairly often in many applications of R. Is for() faster on small blocks by enough that one would want to branch based on size?

Good question. Given that the branching itself adds overhead, possibly not. In the best case for() can be ~40% faster (for single-digit n) but that means billions of copies to make a difference (since the operation itself is so fast). The break-even point on my test machine is n=32 and when I added the branching it took a 20% hit, so I guess it's simply not worth it.
The only case that may be worth branching is n:1 since that is likely a fairly common use (the branching penalty in copy routines is lower than comparing memcpy/for implementations since the branching can be done before the outer for loop, so this may vary case-by-case).

Cheers, Simon
Re: [Rd] suggestion how to use memcpy in duplicate.c
Just to add some clarification, the suggestion wasn't motivated by speeding up a length 3 vector being recycled 3.3 million times. But it's a good point that any change should not make that case slower. I don't know how much copyVector is called really; DUPLICATE_ATOMIC_VECTOR seems more significant, which doesn't recycle, and already had the FIXME next to it. Where copyVector is passed a large source though, then memcpy should be faster than any of the methods using a for loop through each element (whether recycling or not), allowing for the usual caveats.

What are the timings like if you repeat the for loop 100 times to get a more robust timing? It needs to be a repeat around the for loop only, not the allocVector, whose variance looks to be included in those timings below. Then increase the size of the source vector, and compare to memcpy.

Matthew

William Dunlap wdun...@tibco.com wrote in message news:77eb52c6dd32ba4d87471dcd70c8d70002ce6...@na-pa-vbe03.na.tibco.com...

If I were worried about the time this loop takes, I would avoid using i%nt. For the attached C code compiled with gcc 4.3.3 with -O2 I get

  # INTEGER() in loop
  > system.time( r1 <- .Call("my_rep1", 1:3, 1e7) )
     user  system elapsed
    0.060   0.012   0.071

  # INTEGER() before loop
  > system.time( r2 <- .Call("my_rep2", 1:3, 1e7) )
     user  system elapsed
    0.076   0.008   0.086

  # replace i%src_length in loop with j=0 before loop and
  #   if(++j==src_length) j=0 ;
  # in the loop.
  > system.time( r3 <- .Call("my_rep3", 1:3, 1e7) )
     user  system elapsed
    0.024   0.028   0.050

  > identical(r1,r2)
  [1] TRUE
  > identical(r2,r3)
  [1] TRUE

The C code is:

  #define USE_RINTERNALS /* pretend we are in the R kernel */
  #include <R.h>
  #include <Rinternals.h>

  SEXP my_rep1(SEXP s_src, SEXP s_dest_length)
  {
      int src_length = length(s_src) ;
      int dest_length = asInteger(s_dest_length) ;
      int i,j ;
      SEXP s_dest ;
      PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
      if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
      for(i=0;i<dest_length;i++) {
          INTEGER(s_dest)[i] = INTEGER(s_src)[i % src_length] ;
      }
      UNPROTECT(1) ;
      return s_dest ;
  }

  SEXP my_rep2(SEXP s_src, SEXP s_dest_length)
  {
      int src_length = length(s_src) ;
      int dest_length = asInteger(s_dest_length) ;
      int *psrc = INTEGER(s_src) ;
      int *pdest ;
      int i ;
      SEXP s_dest ;
      PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
      pdest = INTEGER(s_dest) ;
      if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
      /* end of boilerplate */
      for(i=0;i<dest_length;i++) {
          pdest[i] = psrc[i % src_length] ;
      }
      UNPROTECT(1) ;
      return s_dest ;
  }

  SEXP my_rep3(SEXP s_src, SEXP s_dest_length)
  {
      int src_length = length(s_src) ;
      int dest_length = asInteger(s_dest_length) ;
      int *psrc = INTEGER(s_src) ;
      int *pdest ;
      int i,j ;
      SEXP s_dest ;
      PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
      pdest = INTEGER(s_dest) ;
      if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
      /* end of boilerplate */
      for(j=0,i=0;i<dest_length;i++) {
          *pdest++ = psrc[j++] ;
          if (j==src_length) { j = 0 ; }
      }
      UNPROTECT(1) ;
      return s_dest ;
  }

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Romain Francois
Sent: Wednesday, April 21, 2010 12:32 PM
To: Matthew Dowle
Cc: r-de...@stat.math.ethz.ch
Subject: Re: [Rd] suggestion how to use memcpy in duplicate.c

On 21/04/10 17:54, Matthew Dowle wrote:

From copyVector in duplicate.c:

  void copyVector(SEXP s, SEXP t)
  {
      int i, ns, nt;
      nt = LENGTH(t);
      ns = LENGTH(s);
      switch (TYPEOF(s)) {
      ...
      case INTSXP:
          for (i = 0; i < ns; i++)
              INTEGER(s)[i] = INTEGER(t)[i % nt];
          break;
      ...

could that be replaced with:

  case INTSXP:
      for (i = 0; i < ns/nt; i++)
          memcpy((char *)DATAPTR(s) + i*nt*sizeof(int),
                 (char *)DATAPTR(t), nt*sizeof(int));
      break;

or at least with something like this:

  int* p_s = INTEGER(s) ;
  int* p_t = INTEGER(t) ;
  for( i=0 ; i < ns ; i++){
      p_s[i] = p_t[i % nt];
  }

since expanding the INTEGER macro over and over has a price. And similar for the other types in copyVector. This won't help regular vector copies, since those seem to be done by the DUPLICATE_ATOMIC_VECTOR macro, see next suggestion below, but it should help copyMatrix which calls copyVector, scan.c which calls copyVector on three lines, dcf.c (once) and dounzip.c (once). For the DUPLICATE_ATOMIC_VECTOR macro there is already a comment next to it:

  FIXME: surely memcpy would be faster here?

which seems to refer to the for loop:

  else { \
      int __i__; \
      type *__fp__ = fun(from), *__tp__ = fun(to); \
      for (__i__ =
Re: [Rd] Bugs? when dealing with contrasts
On Thu, Apr 22, 2010 at 2:32 AM, Peter Dalgaard pda...@gmail.com wrote:

Gabor Grothendieck wrote: On Wed, Apr 21, 2010 at 4:26 PM, Peter Dalgaard pda...@gmail.com wrote: ... I.e., that R reverts to using indicator variables when the intercept is absent.

Is there any nice way of getting contr.sum coding for the interaction as opposed to the ugly code in my post that I used to force it? i.e.

  cbind(1, model.matrix(~ fac)[,2:3] * scores)

I think not. In general, an interaction like ~fac:scores indicates three lines with a common intercept and three different slopes, and changing the parametrization is not supposed to change the model, whereas your model inserts a restriction that the slopes sum to zero (if I understand correctly). So if you want to fit ugly models, you get to do a little ugly footwork.

OK. Thanks. I guess that's fair.

(A similar, simpler, issue arises if you want to have a 2x2 design with no effect in one column and/or one row (think clinical trial: placebo vs. active, baseline vs. treated). You can only do this using explicit dummy variables, not with the two classifications represented as factors.)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk   Priv: pda...@gmail.com
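For readers following along, the "ugly footwork" can be sketched in a few lines of R. This is a hypothetical example (fac, scores and y are made-up data, and the use of contrasts.arg is my own framing of Gabor's cbind() line), not code from the thread:

  set.seed(1)
  fac    <- gl(3, 10)     # three-level factor, invented data
  scores <- rnorm(30)
  y      <- rnorm(30)

  # Standard parametrization: common intercept, three free slopes.
  fit.free <- lm(y ~ fac:scores)

  # Forcing sum-to-zero coding on the slopes by hand. Note this imposes
  # the restriction that the slopes sum to zero, i.e. a genuinely
  # smaller model, which is why no contrasts= setting can produce it.
  X <- cbind(1, model.matrix(~ fac,
                 contrasts.arg = list(fac = "contr.sum"))[, 2:3] * scores)
  fit.sum <- lm(y ~ X - 1)

Both fits run three lines through a common intercept; fit.sum additionally constrains the three slopes to sum to zero, which is the restriction Peter points out.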
Re: [Rd] suggestion how to use memcpy in duplicate.c
On Apr 22, 2010, at 7:12 AM, Matthew Dowle wrote:

Is this a thumbs up for memcpy for DUPLICATE_ATOMIC_VECTOR at least? If there is further specific testing then let me know, happy to help, but you seem to have beaten me to it.

I was not volunteering to do anything - I was just looking at whether it makes sense to bother at all and pointing out the bugs in your code ;). I have a sufficiently long list of TODOs already :P

Cheers, Simon

Simon Urbanek simon.urba...@r-project.org wrote in message news:65d21b93-a737-4a94-bdf4-ad7e90518...@r-project.org...

On Apr 21, 2010, at 2:15 PM, Seth Falcon wrote: On 4/21/10 10:45 AM, Simon Urbanek wrote:

Won't that miss the last incomplete chunk? (and please don't use DATAPTR on INTSXP even though the effect is currently the same) In general it seems that it depends on nt whether this is efficient or not, since calls to short memcpy are expensive (very small nt, that is).

I ran some empirical tests to compare memcpy vs for() (x86_64, OS X) and the results were encouraging; depending on the size of the copied block the difference could be quite big:
- tiny block (ca. n = 32 or less): for() is faster
- small block (n ~ 1k): memcpy is ca. 8x faster
- as the size increases the gap closes (presumably due to RAM bandwidth limitations), so for n = 512M it is ~30%

Of course this is contingent on the implementation of memcpy, compiler, architecture etc. And will only matter if copying is what you do most of the time ...

Copying of vectors is something that I would expect to happen fairly often in many applications of R. Is for() faster on small blocks by enough that one would want to branch based on size?

Good question. Given that the branching itself adds overhead, possibly not. In the best case for() can be ~40% faster (for single-digit n) but that means billions of copies to make a difference (since the operation itself is so fast).
The break-even point on my test machine is n=32 and when I added the branching it took a 20% hit, so I guess it's simply not worth it. The only case that may be worth branching is n:1 since that is likely a fairly common use (the branching penalty in copy routines is lower than comparing memcpy/for implementations since the branching can be done before the outer for loop, so this may vary case-by-case).

Cheers, Simon
Re: [Rd] RUnit bug?
Romain has already given you the answer. As would have the help page ?defineTestSuite. Not a bug, but a user error, I assume.

Matthias

Dominick Samperi wrote, On 04/22/10 02:19:

There appears to be a bug in RUnit. Given a testsuite testsuite.math, say, when I run: runTestSuite(testsuite.math) this works fine, provided there are no extraneous files in the unit test subdirectory. But if there are any Emacs temp files (with names that end with '~') then runTestSuite gets confused and tries to run functions from the temp files as well.

--
Matthias Burger
Project Manager / Biostatistician
Epigenomics AG   Kleine Praesidentenstr. 1   10178 Berlin, Germany
phone: +49-30-24345-0   fax: +49-30-24345-555
http://www.epigenomics.com   matthias.bur...@epigenomics.com
--
Epigenomics AG Berlin   Amtsgericht Charlottenburg HRB 75861
Vorstand: Geert Nygaard (CEO/Vorsitzender), Oliver Schacht PhD (CFO)
Aufsichtsrat: Prof. Dr. Dr. hc. Rolf Krebs (Chairman/Vorsitzender)
[Rd] Rtools for building 64 bit windows packages
Hello R developers,

I sincerely apologize if the answer to this question is clearly documented somewhere, but I was unable to figure it out over my morning coffee. I just downloaded today's release of R 2.11.0 and installed it on my Windows 7 64 bit VM. I also downloaded the latest version of Rtools211 from Professor Murdoch's site. The first thing I attempted to do was build some of my packages from source to check that they work with the new version. I got the following error message:

  making DLL ...
  x86_64-w64-mingw32-gcc -IC:/PROGRA~1/R/R-211~1.0-X/include -O2 -Wall -std=gnu99 -c tikzDevice.c -o tikzDevice.o
  x86_64-w64-mingw32-gcc: not found

This does not surprise me, R 2.11.0 is hot out of the forge and Rtools probably hasn't been repacked to support the 64 bit version. I gathered from the Windows FAQ and the list archives that the MinGW-w64 project supplies the compilers and linkers used by the 64 bit version. I visited their site and found the selection of packages available for download... confusing. I guess what I'm asking:

* Do I use the Cygwin binaries?
* If not, is there an officially blessed binary distribution of Windows x86_64 compilers and binutils?
* If not, do I build the x86_64 toolchain from the current HEAD, or is there a specific revision that has been determined to be stable?

Thanks for your time and effort on maintaining and enhancing such a wonderful language!

-Charlie

-----
Charlie Sharpsteen
Undergraduate -- Environmental Resources Engineering
Humboldt State University

--
View this message in context: http://r.789695.n4.nabble.com/Rtools-for-building-64-bit-windows-packages-tp2021034p2021034.html
Sent from the R devel mailing list archive at Nabble.com.
Re: [Rd] Rtools for building 64 bit windows packages
On 22/04/2010 3:04 PM, Sharpie wrote:

Hello R developers, I sincerely apologize if the answer to this question is clearly documented somewhere, but I was unable to figure it out over my morning coffee. I just downloaded today's release of R 2.11.0 and installed it on my Windows 7 64 bit VM. I also downloaded the latest version of Rtools211 from Professor Murdoch's site. The first thing I attempted to do was build some of my packages from source to check that they work with the new version. I got the following error message: making DLL ... x86_64-w64-mingw32-gcc -IC:/PROGRA~1/R/R-211~1.0-X/include -O2 -Wall -std=gnu99 -c tikzDevice.c -o tikzDevice.o x86_64-w64-mingw32-gcc: not found This does not surprise me, R 2.11.0 is hot out of the forge and Rtools probably hasn't been repacked to support the 64 bit version. I gathered from the Windows FAQ and the list archives that the MinGW-w64 project supplies the compilers and linkers used by the 64 bit version- I visited their site and found the selection of packages available for download... confusing. I guess what I'm asking: * Do I use the Cygwin binaries?

You can use the Rtools for the stuff other than the compilers. You need the MinGW 64 bit versions of the compilers; they are not nicely packaged yet, but the instructions for finding them are in the new version of the R-admin manual, in section 3.3, Building R for 64 bit Windows.

Duncan Murdoch

* If not, is there an officially blessed binary distribution of Windows x86_64 compilers and binutils? * If not, do I build the x86_64 toolchain from the current HEAD, or is there a specific revision that has been determined to be stable? Thanks for your time and effort on maintaining and enhancing such a wonderful language! -Charlie - Charlie Sharpsteen Undergraduate-- Environmental Resources Engineering Humboldt State University
Re: [Rd] Rtools for building 64 bit windows packages
Duncan Murdoch-2 wrote: You can use the Rtools for the stuff other than the compilers. You need the MinGW 64 bit versions of the compilers; they are not nicely packaged yet, but the instructions for finding them are in the new version of the R-admin manual, in the section 3.3, Building R for 64 bit Windows.

Ahh, thank you Duncan- this was exactly the information I was looking for. When I looked in R-admin this morning, I skipped straight to Appendix D as I wasn't interested in building R, just packages. Thanks again!

-Charlie

-----
Charlie Sharpsteen
Undergraduate -- Environmental Resources Engineering
Humboldt State University
Re: [Rd] RUnit bug?
Thanks. With help from Matthias I discovered that I was using the wrong RUnit docs. I was using the Nov. 25, 2009 paper by Matthias and two others instead of the online RUnit package docs, where a more robust regular expression appears.

On Thu, Apr 22, 2010 at 3:51 AM, Romain Francois rom...@r-enthusiasts.com wrote:

On 22/04/10 02:19, Dominick Samperi wrote: There appears to be a bug in RUnit. Given a testsuite testsuite.math, say, when I run: runTestSuite(testsuite.math) this works fine, provided there are no extraneous files in the unit test subdirectory. But if there are any Emacs temp files (with names that end with '~') then runTestSuite gets confused and tries to run functions from the temp files as well.

How do you define 'testsuite.math'? The default value of the testFileRegexp argument in defineTestSuite should rule these files out.

Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9aKDM9 : embed images in Rd documents
|- http://tr.im/OIXN : raster images and RImageJ
|- http://tr.im/OcQe : Rcpp 0.7.7
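For completeness, the exclusion Romain refers to comes from defineTestSuite()'s testFileRegexp argument, whose default only matches files named runit*.R. A minimal sketch (the suite name and directory path below are hypothetical):

  library(RUnit)

  # The default testFileRegexp never matches Emacs backup files ending
  # in '~', so runTestSuite() will not source them:
  grepl("^runit.+\\.[rR]$", c("runit.math.R", "runit.math.R~"))
  # [1]  TRUE FALSE

  suite <- defineTestSuite("math",
                           dirs = "tests/unit",                 # hypothetical path
                           testFileRegexp = "^runit.+\\.[rR]$") # the default
  # result <- runTestSuite(suite)
  # printTextProtocol(result)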
[Rd] segfault with format.POSIXct()
Hi,

I'm getting a segmentation fault as follows:

---cut here---start--
R> begt <- as.POSIXct(strptime("10/01/2009 06:00:00",
+                              format="%d/%m/%Y %H:%M:%S"), tz="GMT")
R> tser <- seq(begt, by=5, length.out=91000)
R> tser.trunc <- format(tser)
Error: segfault from C stack overflow
---cut here---end

With the following set up:

---cut here---start--
R> sessionInfo()
R version 2.11.0 RC (2010-04-19 r51778)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] slmisc_0.7.3   lattice_0.18-3

loaded via a namespace (and not attached):
[1] grid_2.11.0
---cut here---end

Reducing the size of the sequence in seq.POSIXct() to 9 doesn't cause a segfault, so it seems to be a memory issue. Is this a bug?

Thanks,

--
Seb