Re: [Rd] Bugs? when dealing with contrasts

2010-04-22 Thread Peter Dalgaard
Gabor Grothendieck wrote:
 On Wed, Apr 21, 2010 at 4:26 PM, Peter Dalgaard pda...@gmail.com wrote:
...
 I.e., that R reverts to using indicator variables when the intercept is
 absent.
 
 Is there any nice way of getting contr.sum coding for the interaction
 as opposed to the ugly code in my post that I used to force it? i.e.
 cbind(1, model.matrix(~ fac)[,2:3] * scores)

I think not. In general, an interaction like ~fac:scores indicates three
lines with a common intercept and three different slopes, and changing
the parametrization is not supposed to change the model, whereas your
model inserts a restriction that the slopes sum to zero (if I understand
correctly). So if you want to fit ugly models, you get to do a little
ugly footwork.

(A similar, simpler issue arises if you want a 2x2 design with
no effect in one column and/or one row (think clinical trial: placebo
vs. active, baseline vs. treated). You can only do this using explicit
dummy variables, not with the two classifications represented as factors.)
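
As a rough illustration of the "ugly footwork", assuming a hypothetical
three-level factor fac and numeric covariate scores (made-up data, not from
the thread), the restricted model can be forced with hand-built columns along
the lines of Gabor's cbind() call:

set.seed(1)
fac    <- gl(3, 10, labels = c("a", "b", "c"))   # hypothetical factor
scores <- rnorm(30)                              # hypothetical covariate
y      <- rnorm(30)

## unrestricted: common intercept, three free slopes
m1 <- lm(y ~ fac:scores)

## restricted: slopes constrained to sum to zero, via contr.sum columns
## built by hand and multiplied into the covariate
contrasts(fac) <- contr.sum(3)
X  <- model.matrix(~ fac)[, 2:3] * scores
m2 <- lm(y ~ X)

m1 and m2 are genuinely different models, which is Peter's point: no choice of
contrasts alone will turn one into the other.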


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RUnit bug?

2010-04-22 Thread Romain Francois

On 22/04/10 02:19, Dominick Samperi wrote:


There appears to be a bug in RUnit.

Given a testsuite testsuite.math, say, when I run:

runTestSuite(testsuite.math)

this works fine, provided there are no extraneous files in the
unit test subdirectory.

But if there are any Emacs temp files (with names that
end with '~') then runTestSuite gets confused and tries to
run functions from the temp files as well.


How do you define 'testsuite.math'? The default value of the 
testFileRegexp argument in defineTestSuite should rule these files out.
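
For example, a minimal sketch (the directory name is hypothetical; the regexp
shown is, to my knowledge, the package default) that only picks up files named
like runit*.R and therefore skips Emacs backups ending in '~':

library(RUnit)

## "unitTests" is a hypothetical directory; with this testFileRegexp,
## backup files such as "runit.math.R~" are never sourced
testsuite.math <- defineTestSuite("math",
                                  dirs = "unitTests",
                                  testFileRegexp = "^runit.+\\.[rR]$",
                                  testFuncRegexp = "^test.+")
result <- runTestSuite(testsuite.math)
printTextProtocol(result)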


Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9aKDM9 : embed images in Rd documents
|- http://tr.im/OIXN : raster images and RImageJ
|- http://tr.im/OcQe : Rcpp 0.7.7

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-22 Thread Matthew Dowle

Is this a thumbs up for memcpy for DUPLICATE_ATOMIC_VECTOR at least?

If there is further specific testing to do then let me know, happy to help, but 
you seem to have beaten me to it.

Matthew


Simon Urbanek simon.urba...@r-project.org wrote in message 
news:65d21b93-a737-4a94-bdf4-ad7e90518...@r-project.org...

 On Apr 21, 2010, at 2:15 PM, Seth Falcon wrote:

 On 4/21/10 10:45 AM, Simon Urbanek wrote:
 Won't that miss the last incomplete chunk? (and please don't use
 DATAPTR on INTSXP even though the effect is currently the same)

 In general it seems that it depends on nt whether this is
 efficient or not, since calls to memcpy for short blocks are expensive
 (very small nt, that is).

 I ran some empirical tests to compare memcpy vs for() (x86_64, OS X)
 and the results were encouraging - depending on the size of the
 copied block the difference could be quite big:
 tiny block (ca. n = 32 or less) - for() is faster
 small block (n ~ 1k) - memcpy is ca. 8x faster
 as the size increases the gap closes (presumably due to RAM bandwidth
 limitations), so for n = 512M it is ~30%.


 Of course this is contingent on the implementation of memcpy,
 compiler, architecture etc. And will only matter if copying is what
 you do most of the time ...

 Copying of vectors is something that I would expect to happen fairly 
 often in many applications of R.

 Is for() faster on small blocks by enough that one would want to branch 
 based on size?


 Good question. Given that the branching itself adds overhead possibly not. 
 In the best case for() can be ~40% faster (for single-digit n) but that 
 means billions of copies to make a difference (since the operation itself 
 is so fast). The break-even point on my test machine is n=32 and when I 
 added the branching it took a 20% hit, so I guess it's simply not worth it. 
 The only case that may be worth branching is n:1 since that is likely a 
 fairly common use (the branching penalty in copy routines is lower than 
 comparing memcpy/for implementations since the branching can be done 
 before the outer for loop so this may vary case-by-case).
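
A self-contained sketch of the kind of comparison being discussed (not Simon's
actual harness; block size and repeat count are arbitrary choices), timing an
element-wise for() copy against memcpy for int blocks:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* copy n ints with an explicit loop, as copyVector's INTSXP case does */
static void copy_for(int *dst, const int *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) dst[i] = src[i];
}

int main(void)
{
    size_t n = 1024;        /* block size: vary this to locate the break-even point */
    long reps = 100000, r;  /* enough repeats to make the timing measurable */
    int *src = malloc(n * sizeof *src), *dst = malloc(n * sizeof *dst);
    size_t i;
    clock_t t0, t1, t2;

    for (i = 0; i < n; i++) src[i] = (int) i;

    t0 = clock();
    for (r = 0; r < reps; r++) copy_for(dst, src, n);
    t1 = clock();
    for (r = 0; r < reps; r++) memcpy(dst, src, n * sizeof *dst);
    t2 = clock();

    /* print dst[0] as well so the copies are not optimized away */
    printf("for(): %.3fs  memcpy: %.3fs  (dst[0]=%d)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, dst[0]);
    free(src); free(dst);
    return 0;
}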

 Cheers,
 Simon


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-22 Thread Matthew Dowle

Just to add some clarification, the suggestion wasn't motivated by speeding 
up a length 3 vector being recycled 3.3 million times.  But it's a good point 
that any change should not make that case slower.  I don't know how often 
copyVector is called really; DUPLICATE_ATOMIC_VECTOR seems more 
significant, which doesn't recycle, and already had the FIXME next to it.

Where copyVector is passed a large source, though, memcpy should be 
faster than any of the methods using a for loop through each element 
(whether recycling or not), allowing for the usual caveats. What are the 
timings like if you repeat the for loop 100 times to get a more robust 
timing?  It needs to be a repeat around the for loop only, not around the 
allocVector, whose variance looks to be included in the timings below. Then 
increase the size of the source vector, and compare to memcpy.
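
A minimal sketch of such a harness, assuming a hypothetical .Call entry point
(not part of R or of the thread): the destination is allocated once, and only
the copy loop is repeated, so allocation variance stays out of the timing.

#define USE_RINTERNALS
#include <R.h>
#include <Rinternals.h>

/* Hypothetical benchmark helper: repeat only the element-wise copy
   'reps' times; allocVector happens once, outside the repeat loop. */
SEXP copy_loop_timing(SEXP s_src, SEXP s_reps)
{
    int reps = asInteger(s_reps), r;
    R_len_t n = length(s_src), i;
    SEXP s_dest;
    int *src, *dest;
    if (TYPEOF(s_src) != INTSXP) error("src must be integer data");
    PROTECT(s_dest = allocVector(INTSXP, n));
    src = INTEGER(s_src);
    dest = INTEGER(s_dest);
    for (r = 0; r < reps; r++)
        for (i = 0; i < n; i++)
            dest[i] = src[i];
    UNPROTECT(1);
    return s_dest;
}

Timing it from R with, say, system.time(.Call("copy_loop_timing", x, 100L)),
and comparing against an otherwise identical function whose inner loop is a
single memcpy(dest, src, n * sizeof(int)), gives the comparison described
above.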

Matthew

William Dunlap wdun...@tibco.com wrote in message 
news:77eb52c6dd32ba4d87471dcd70c8d70002ce6...@na-pa-vbe03.na.tibco.com...
If I were worried about the time this loop takes,
I would avoid using i%nt.  For the attached C code,
compiled with gcc 4.3.3 with -O2, I get
   # INTEGER() in loop
   system.time( r1 <- .Call("my_rep1", 1:3, 1e7) )
 user  system elapsed
0.060   0.012   0.071

   # INTEGER() before loop
   system.time( r2 <- .Call("my_rep2", 1:3, 1e7) )
 user  system elapsed
0.076   0.008   0.086

   # replace i%src_length in loop with j=0 before loop and
   #   if(++j==src_length) j=0 ;
   # in the loop.
   system.time( r3 <- .Call("my_rep3", 1:3, 1e7) )
 user  system elapsed
0.024   0.028   0.050
   identical(r1,r2) && identical(r2,r3)
  [1] TRUE

The C code is:
#define USE_RINTERNALS /* pretend we are in the R kernel */
#include <R.h>
#include <Rinternals.h>


SEXP my_rep1(SEXP s_src, SEXP s_dest_length)
{
    int src_length = length(s_src) ;
    int dest_length = asInteger(s_dest_length) ;
    int i,j ;
    SEXP s_dest ;
    PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
    if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
    for(i=0;i<dest_length;i++) {
        INTEGER(s_dest)[i] = INTEGER(s_src)[i % src_length] ;
    }
    UNPROTECT(1) ;
    return s_dest ;
}
SEXP my_rep2(SEXP s_src, SEXP s_dest_length)
{
    int src_length = length(s_src) ;
    int dest_length = asInteger(s_dest_length) ;
    int *psrc = INTEGER(s_src) ;
    int *pdest ;
    int i ;
    SEXP s_dest ;
    PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
    pdest = INTEGER(s_dest) ;
    if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
    /* end of boilerplate */
    for(i=0;i<dest_length;i++) {
        pdest[i] = psrc[i % src_length] ;
    }
    UNPROTECT(1) ;
    return s_dest ;
}
SEXP my_rep3(SEXP s_src, SEXP s_dest_length)
{
    int src_length = length(s_src) ;
    int dest_length = asInteger(s_dest_length) ;
    int *psrc = INTEGER(s_src) ;
    int *pdest ;
    int i,j ;
    SEXP s_dest ;
    PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
    pdest = INTEGER(s_dest) ;
    if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
    /* end of boilerplate */
    for(j=0,i=0;i<dest_length;i++) {
        *pdest++ = psrc[j++] ;
        if (j==src_length) {
            j = 0 ;
        }
    }
    UNPROTECT(1) ;
    return s_dest ;
}
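
For completeness, one way these functions could be compiled and exercised from
R (the file name rep.c is illustrative, not from the original post):

## assuming the C code above is saved as rep.c
## in a shell:  R CMD SHLIB rep.c      # produces rep.so (rep.dll on Windows)
dyn.load("rep.so")                     # use "rep.dll" on Windows
r3 <- .Call("my_rep3", 1:3, 1e7)
head(r3)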

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

 -Original Message-
 From: r-devel-boun...@r-project.org
 [mailto:r-devel-boun...@r-project.org] On Behalf Of Romain Francois
 Sent: Wednesday, April 21, 2010 12:32 PM
 To: Matthew Dowle
 Cc: r-de...@stat.math.ethz.ch
 Subject: Re: [Rd] suggestion how to use memcpy in duplicate.c

 On 21/04/10 17:54, Matthew Dowle wrote:
 
  From copyVector in duplicate.c :
 
  void copyVector(SEXP s, SEXP t)
  {
   int i, ns, nt;
   nt = LENGTH(t);
   ns = LENGTH(s);
   switch (TYPEOF(s)) {
  ...
   case INTSXP:
   for (i = 0; i < ns; i++)
       INTEGER(s)[i] = INTEGER(t)[i % nt];
   break;
  ...
 
  could that be replaced with :
 
    case INTSXP:
    for (i=0; i < ns/nt; i++)
        memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t),
               nt*sizeof(int));
    break;

 or at least with something like this:

 int* p_s = INTEGER(s) ;
 int* p_t = INTEGER(t) ;
 for( i=0 ; i < ns ; i++){
     p_s[i] = p_t[i % nt];
 }

 since expanding the INTEGER macro over and over has a price.

  and similar for the other types in copyVector.  This won't help regular
  vector copies, since those seem to be done by the DUPLICATE_ATOMIC_VECTOR
  macro, see next suggestion below, but it should help copyMatrix which calls
  copyVector, scan.c which calls copyVector on three lines, dcf.c (once) and
  dounzip.c (once).

  For the DUPLICATE_ATOMIC_VECTOR macro there is already a comment next to it :

   FIXME: surely memcpy would be faster here?

  which seems to refer to the for loop :
 
   else { \
   int __i__; \
   type *__fp__ = fun(from), *__tp__ = fun(to); \
   for (__i__ = 
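
For orientation only, a rough sketch (mine, not taken from R's sources or from
the thread) of how the element-wise copy inside such a macro could be collapsed
into one memcpy; the macro name and plumbing are illustrative:

/* Illustrative only: 'fun' is an accessor like INTEGER or REAL, and
   'to' has already been allocated with the same length as 'from'. */
#define COPY_ATOMIC_PAYLOAD(to, from, type, fun) do {              \
    type *__fp__ = fun(from), *__tp__ = fun(to);                   \
    memcpy(__tp__, __fp__, (size_t) LENGTH(from) * sizeof(type));  \
} while (0)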

Re: [Rd] Bugs? when dealing with contrasts

2010-04-22 Thread Gabor Grothendieck
On Thu, Apr 22, 2010 at 2:32 AM, Peter Dalgaard pda...@gmail.com wrote:
 Gabor Grothendieck wrote:
 On Wed, Apr 21, 2010 at 4:26 PM, Peter Dalgaard pda...@gmail.com wrote:
 ...
 I.e., that R reverts to using indicator variables when the intercept is
 absent.

 Is there any nice way of getting contr.sum coding for the interaction
 as opposed to the ugly code in my post that I used to force it? i.e.
 cbind(1, model.matrix(~ fac)[,2:3] * scores)

 I think not. In general, an interaction like ~fac:scores indicates three
 lines with a common intercept and three different slopes, and changing
 the parametrization is not supposed to change the model, whereas your
 model inserts a restriction that the slopes sum to zero (if I understand
 correctly). So if you want to fit ugly models, you get to do a little
 ugly footwork.


OK. Thanks.  I guess that's fair.

 (A similar, simpler issue arises if you want a 2x2 design with
 no effect in one column and/or one row (think clinical trial: placebo
 vs. active, baseline vs. treated). You can only do this using explicit
 dummy variables, not with the two classifications represented as factors.)


 --
 Peter Dalgaard
 Center for Statistics, Copenhagen Business School
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-22 Thread Simon Urbanek

On Apr 22, 2010, at 7:12 AM, Matthew Dowle wrote:

 
 Is this a thumbs up for memcpy for DUPLICATE_ATOMIC_VECTOR at least?
 
 If there is further specific testing to do then let me know, happy to help, but 
 you seem to have beaten me to it.
 

I was not volunteering to do anything - I was just looking at whether it makes 
sense to bother at all and pointing out the bugs in your code ;). I have a 
sufficiently long list of TODOs already :P

Cheers,
Simon


 
 Simon Urbanek simon.urba...@r-project.org wrote in message 
 news:65d21b93-a737-4a94-bdf4-ad7e90518...@r-project.org...
 
 On Apr 21, 2010, at 2:15 PM, Seth Falcon wrote:
 
 On 4/21/10 10:45 AM, Simon Urbanek wrote:
 Won't that miss the last incomplete chunk? (and please don't use
 DATAPTR on INTSXP even though the effect is currently the same)
 
 In general it seems that it depends on nt whether this is
 efficient or not, since calls to memcpy for short blocks are expensive
 (very small nt, that is).
 
 I ran some empirical tests to compare memcpy vs for() (x86_64, OS X)
 and the results were encouraging - depending on the size of the
 copied block the difference could be quite big:
 tiny block (ca. n = 32 or less) - for() is faster
 small block (n ~ 1k) - memcpy is ca. 8x faster
 as the size increases the gap closes (presumably due to RAM bandwidth
 limitations), so for n = 512M it is ~30%.
 
 
 Of course this is contingent on the implementation of memcpy,
 compiler, architecture etc. And will only matter if copying is what
 you do most of the time ...
 
 Copying of vectors is something that I would expect to happen fairly 
 often in many applications of R.
 
 Is for() faster on small blocks by enough that one would want to branch 
 based on size?
 
 
 Good question. Given that the branching itself adds overhead possibly not. 
 In the best case for() can be ~40% faster (for single-digit n) but that 
 means billions of copies to make a difference (since the operation itself 
 is so fast). The break-even point on my test machine is n=32 and when I 
 added the branching it took a 20% hit, so I guess it's simply not worth it. 
 The only case that may be worth branching is n:1 since that is likely a 
 fairly common use (the branching penalty in copy routines is lower than 
 comparing memcpy/for implementations since the branching can be done 
 before the outer for loop so this may vary case-by-case).
 
 Cheers,
 Simon
 
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 
 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RUnit bug?

2010-04-22 Thread it-r-devel

Romain has already given you the answer, as would have the help page
?defineTestSuite.

Not a bug, but a user error, I assume.

  Matthias

Dominick Samperi wrote, On 04/22/10 02:19:
 There appears to be a bug in RUnit.
 
 Given a testsuite testsuite.math, say, when I run:
 
 runTestSuite(testsuite.math)
 
 this works fine, provided there are no extraneous files in the
 unit test subdirectory.
 
 But if there are any Emacs temp files (with names that
 end with '~') then runTestSuite gets confused and tries to
 run functions from the temp files as well.
 
   [[alternative HTML version deleted]]
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 


-- 
Matthias Burger   Project Manager/ Biostatistician
Epigenomics AG Kleine Praesidentenstr. 1 10178 Berlin, Germany
phone:+49-30-24345-0  fax:+49-30-24345-555
http://www.epigenomics.com matthias.bur...@epigenomics.com
--
Epigenomics AG Berlin Amtsgericht Charlottenburg HRB 75861
Vorstand: Geert Nygaard (CEO/Vorsitzender)
  Oliver Schacht PhD (CFO)
Aufsichtsrat: Prof. Dr. Dr. hc. Rolf Krebs (Chairman/Vorsitzender)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Rtools for building 64 bit windows packages

2010-04-22 Thread Sharpie

Hello R developers,

I sincerely apologize if the answer to this question is clearly documented
somewhere, but I was unable to figure it out over my morning coffee.

I just downloaded today's release of R 2.11.0 and installed it on my Windows
7 64 bit VM.  I also downloaded the latest version of Rtools211 from
Professor Murdoch's site.   The first thing I attempted to do was build some
of my packages from source to check that they work with the new version.  I
got the following error message:

  making DLL ...
x86_64-w64-mingw32-gcc -IC:/PROGRA~1/R/R-211~1.0-X/include -O2
-Wall  -std=gnu99 -c tikzDevice.c -o tikzDevice.o
x86_64-w64-mingw32-gcc: not found

This does not surprise me: R 2.11.0 is hot out of the forge and Rtools
probably hasn't been repackaged to support the 64 bit version.  I gathered
from the Windows FAQ and the list archives that the MinGW-w64 project
supplies the compilers and linkers used by the 64 bit version. I visited
their site and found the selection of packages available for download...
confusing.

I guess what I'm asking is:

  * Do I use the Cygwin binaries?

  * If not, is there an officially blessed binary distribution of Windows
x86_64 compilers and binutils?

  * If not, do I build the x86_64 toolchain from the current HEAD, or is
there a specific revision that has been determined to be stable?


Thanks for your time and effort on maintaining and enhancing such a
wonderful language!

-Charlie

-
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Rtools-for-building-64-bit-windows-packages-tp2021034p2021034.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rtools for building 64 bit windows packages

2010-04-22 Thread Duncan Murdoch

On 22/04/2010 3:04 PM, Sharpie wrote:

Hello R developers,

I sincerely apologize if the answer to this question is clearly documented
somewhere, but I was unable to figure it out over my morning coffee.

I just downloaded today's release of R 2.11.0 and installed it on my Windows
7 64 bit VM.  I also downloaded the latest version of Rtools211 from
Professor Murdoch's site.   The first thing I attempted to do was build some
of my packages from source to check that they work with the new version.  I
got the following error message:

  making DLL ...
x86_64-w64-mingw32-gcc -IC:/PROGRA~1/R/R-211~1.0-X/include -O2
-Wall  -std=gnu99 -c tikzDevice.c -o tikzDevice.o
x86_64-w64-mingw32-gcc: not found

This does not surprise me, R 2.11.0 is hot out of the forge and Rtools
probably hasn't been repacked to support the 64 bit version.  I gathered
from the Windows FAQ and the list archives that the MinGW-w64 project
supplies the compilers and linkers used by the 64 bit version- I visited
their site and found the selection of packages available for download...
confusing.

I guess what I'm asking: 


  * Do I use the Cygwin binaries?
  


You can use the Rtools for the stuff other than the compilers.  You need 
the MinGW 64 bit versions of the compilers; they are not nicely packaged 
yet, but the instructions for finding them are in the new version of the 
R-admin manual, in section 3.3, "Building R for 64 bit Windows". 


Duncan Murdoch

  * If not, is there an officially blessed binary distribution of Windows
x86_64 compilers and binutils?
  
  * If not, do I build the x86_64 toolchain from the current HEAD, or is

there a specific revision that has been determined to be stable?


Thanks for your time and effort on maintaining and enhancing such a
wonderful language!

-Charlie

-
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rtools for building 64 bit windows packages

2010-04-22 Thread Sharpie


Duncan Murdoch-2 wrote:
 
 You can use the Rtools for the stuff other than the compilers.  You need 
 the MinGW 64 bit versions of the compilers; they are not nicely packaged 
 yet, but the instructions for finding them are in the new version of the 
 R-admin manual, in the section 3.3, Building R for 64 bit Windows. 
 

Ahh, thank you Duncan- this was exactly the information I was looking for. 
When I looked in R-admin this morning, I skipped straight to Appendix D as I
wasn't interested in building R, just packages.

Thanks again!

-Charlie

-
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Rtools-for-building-64-bit-windows-packages-tp2021034p2022510.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RUnit bug?

2010-04-22 Thread Dominick Samperi
Thanks. With help from Matthias I discovered that I was using the wrong
RUnit docs.
I was using the Nov. 25, 2009 paper by Matthias and two others instead of
the online RUnit package docs, where a more robust regular expression
appears.

On Thu, Apr 22, 2010 at 3:51 AM, Romain Francois
rom...@r-enthusiasts.comwrote:

 On 22/04/10 02:19, Dominick Samperi wrote:


 There appears to be a bug in RUnit.

 Given a testsuite testsuite.math, say, when I run:

 runTestSuite(testsuite.math)

 this works fine, provided there are no extraneous files in the
 unit test subdirectory.

 But if there are any Emacs temp files (with names that
 end with '~') then runTestSuite gets confused and tries to
 run functions from the temp files as well.


 How do you define 'testsuite.math'? The default value of the testFileRegexp
 argument in defineTestSuite should rule these files out.

 Romain

 --
 Romain Francois
 Professional R Enthusiast
 +33(0) 6 28 91 30 30
 http://romainfrancois.blog.free.fr
 |- http://bit.ly/9aKDM9 : embed images in Rd documents
 |- http://tr.im/OIXN : raster images and RImageJ
 |- http://tr.im/OcQe : Rcpp 0.7.7



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] segfault with format.POSIXct()

2010-04-22 Thread Sebastian P. Luque
Hi,

I'm getting a segmentation fault as follows:

---cut here---start--
R> begt <- as.POSIXct(strptime("10/01/2009 06:00:00", format="%d/%m/%Y %H:%M:%S"),
+                     tz="GMT")
R> tser <- seq(begt, by=5, length.out=91000)
R> tser.trunc <- format(tser)
Error: segfault from C stack overflow
---cut here---end

With the following set up:

---cut here---start--
R> sessionInfo()
R version 2.11.0 RC (2010-04-19 r51778) 
x86_64-pc-linux-gnu 

locale:
 [1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C   LC_TIME=en_CA.UTF-8  
  LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_CA.UTF-8LC_PAPER=en_CA.UTF-8 
  LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C 
LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] slmisc_0.7.3   lattice_0.18-3

loaded via a namespace (and not attached):
[1] grid_2.11.0
---cut here---end


Reducing the size of the sequence in seq.POSIXct() to 9 doesn't
cause a segfault, so it seems to be a memory issue.  Is this a bug?

Thanks,

-- 
Seb

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel