Re: [Rd] Serial connections?

2010-04-21 Thread Philippe Grosjean
There is another option I have used for a couple of years to pilot 
scientific devices, to program my chemostats, etc., and it is 
platform-independent: Tcl.


Tcl's I/O design is excellent and very robust, and Tcl/Tk is integrated 
into R via the tcltk package.


It is really just a matter of a few lines of code to communicate 
through a serial port from R using Tcl. Something like:


require(tcltk)
.Tcl('set R_com1 [open com1 r+]') # Works on Windows too!

# There are many config parameters available here... just an example
.Tcl('fconfigure $R_com1 -mode 9600,n,8,1 -buffering none -blocking 0')

# Send a command to your device through the serial port
.Tcl('puts -nonewline $R_com1 {my_cmd}')

# Read a line of text from the serial port
line <- tclvalue(.Tcl('gets $R_com1'))

With a little bit more code, one can program Tcl to call R code 
automatically every time new data is pushed by the connected device 
through the serial port. This is done using something like:


.Tcl('fileevent $R_com1 readable [list Handler_com1 $R_com1]')

Here, Handler_com1 is a Tcl function. So, some care must be taken 
using tcltk's .Tcl.callback() to trigger the event on the R side. One 
way to deal with this easily is by using tclFun() from the tcltk2 package.


In tcltk2, there is also ?tclTaskSchedule, which can be of interest in the 
context of serial port communication: it can trigger an R function in the 
background regularly to collect data actively from the serial port.


All these tools give me a lot of flexibility to communicate through the 
serial port from R and, most importantly, to write my code in a 
portable way (tested on Windows XP and Ubuntu Linux).


If there is some interest in this approach, I could initiate a 'tclcom' 
R package on R-Forge and place the code I have there.


Best,

Philippe
..°}))
 ) ) ) ) )
( ( ( ( (Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons University, Belgium
( ( ( ( (
..

On 21/04/10 03:17, shotwelm wrote:

Simon is right of course, there are plenty of sensors that would work
just fine at 9600 baud (like a thermistor rigged to an ADC). There's a
theorem along these lines (Nyquist sampling theorem?). I think piping
the output to R is a clever solution. I added a few lines to the ttys.c
program so that the baud rate is a command-line option (e.g., -B9600),
http://biostatmatt.com/temp/ttys.c, and confirmed it will compile on
Linux (2.6.30). Maybe it will save a step. Microcontrollers really are
addictive!

For an ioctl package, I was originally thinking of using file
descriptors directly. However, I agree this feels like subverting what
could be an extension of the connections API. Given that everything is
a file in POSIX systems, there may be an argument for an ioctl package
that is independent of the connections implementation, say to do things
that connections were not designed to do. For example, interfacing with
V4L2 devices usually involves many ioctl calls, an mmap call, but rarely
read or write calls. But maybe it would just be better to pipe this type
of output to R also...

-Matt

On Tue, 2010-04-20 at 16:42 -0400, Simon Urbanek wrote:

On Apr 20, 2010, at 11:51 AM, shotwelm wrote:


I've done some microcontroller work over serial also. Unfortunately, 
interfacing with a serial port is system dependent, and the mechanisms can be 
quite different, as you probably know. It appears that Simon has a solution 
below that will work if you are willing to accept the default baud rate (9600 
is way too slow for good sensor data


[OT: define "good" ;) - "good" doesn't mean fast - besides, it won't be any good 
if it is too fast to be meaningfully processed -- that's a different story, though :P - 
and it is trivial to change, so the solution works in general]



), parity, etc., or use external tools. On POSIX systems, you would need access 
to the termios.h header and the ioctl() system call in order to change these 
settings. Although I'm not 100% sure, I don't think R has this capability ... 
yet.

I'm new to the list, but I'd be surprised if the R developers who have been 
around a while haven't already considered adding support for ioctls and the 
POSIX terminal interface. This makes me wonder why it's not there. If there is 
no good reason, I'm starting to see a series of R packages (or core extensions) 
developing.


Good luck ;). The issue is that connections are inherently backend-independent, 
which implies that packages have no access to connection internals, as they can 
change at any time. This means that you can't enhance them without putting the 
enhancements into R itself, which in turn implies that you have to make a strong 
case, since you need a volunteer in R-core to maintain that code, etc.



With a package for ioctls, we could use all sorts of cool stuff, like 
Video4Linux2 (webcams, HAM radio, tuners)...



Ioctls are 

[Rd] New CRAN test platform - Sparc Solaris

2010-04-21 Thread Prof Brian Ripley
The CRAN package check page now shows results on Sparc Solaris 10 
using a server donated by Sun.


This box tests several aspects that differ from the existing Intel-based 
test boxes:


- the CPU is big-endian.

- it does not have extended-precision doubles in FPU registers.  (It 
does support long doubles if selected explicitly.)


- the C/C++ header set is from a different tradition.

- the Sun Studio compiler is used (as has been on one of the Linux 
boxes, and we may phase the latter out in due course).


- the toolset is AT&T-based Unix rather than GNU or BSD.  In 
particular Solaris make, sed and tar and an actual Bourne shell are 
used.


There is only a limited set of additional software installed.  A few 
things have had to be compiled with gcc, notably the packages using 
Gtk+ (and Solaris' own installation of Gtk+ is too old for RGtk2) and 
MCMCpack.


The server does not have a graphics card and this is causing some 
problems with the X11 installation, including bitmap devices and rgl, 
that we are still working on.


Amongst the package issues shown up are

- unterminated final lines in src/Makevars[.in] cause failures.

- there are many packages which require GNU make and fail to declare 
it.  These include Cairo JavaGD Rcpp RcppExamples RInside farmR 
highlight phyclust png rJava.


- the assumption that 'tar' accepts compressed archives (gdata).

- 60-odd C++-using packages are not correct C++, often because they 
use GNU extensions or have missing headers (and C++ requires headers 
for prototypes), or use C99 functions that are not in C++ and hence 
not in the headers when used from C++ -- Solaris is rather strict 
about the latter.



Brian Ripley

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Serial connections?

2010-04-21 Thread Simon Urbanek
Philippe,

unfortunately that approach has one major drawback - it does not give you a 
connection. As I said in the previous e-mail, it is fairly easy to talk to a tty 
directly, but the challenge is to turn it into a connection. I don't like the 
Tcl approach for several reasons, one of them being that it's entirely 
unnatural, since you have to construct strings of code -- you could just as well 
use rJava and Java serial connections, which have a slightly friendlier syntax 
(but I wouldn't recommend that, either), or any other language R interfaces to.

I was thinking about it a bit and I may be tempted to have a stab at a tty 
connection, but I still would not want to mess with ioctl (the other part Matt 
mentioned).

Cheers,
Simon




Re: [Rd] Serial connections?

2010-04-21 Thread Philippe Grosjean

Simon,

I see what you mean, and I agree it would be nice to have a connection. 
That would be the natural way in S.


Yet, connections are driven by R. You cannot open a connection and let 
the connected device trigger some R code when data is available, can you?


Otherwise, I don't understand your problem with strings of code. A .R 
file also contains a series of strings of code interpreted by the R 
parser when you source() it. It's just that that code happens to be S; 
here the code is Tcl. You could just as well source a .tcl file with the 
same code if you prefer...


Best,

Philippe


[Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Matthew Dowle
From copyVector in duplicate.c :

void copyVector(SEXP s, SEXP t)
{
int i, ns, nt;
nt = LENGTH(t);
ns = LENGTH(s);
switch (TYPEOF(s)) {
...
case INTSXP:
        for (i = 0; i < ns; i++)
            INTEGER(s)[i] = INTEGER(t)[i % nt];
break;
...

could that be replaced with :

    case INTSXP:
        for (i = 0; i < ns/nt; i++)
            memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t),
                   nt*sizeof(int));
        break;

and similar for the other types in copyVector.  This won't help regular 
vector copies, since those seem to be done by the DUPLICATE_ATOMIC_VECTOR 
macro (see next suggestion below), but it should help copyMatrix, which calls 
copyVector; scan.c, which calls copyVector on three lines; dcf.c (once); and 
dounzip.c (once).

For the DUPLICATE_ATOMIC_VECTOR macro there is already a comment next to it:

FIXME: surely memcpy would be faster here?

which seems to refer to the for loop:

    else { \
        int __i__; \
        type *__fp__ = fun(from), *__tp__ = fun(to); \
        for (__i__ = 0; __i__ < __n__; __i__++) \
            __tp__[__i__] = __fp__[__i__]; \
    } \

Could that loop be replaced by the following?

    else { \
        memcpy((char *)DATAPTR(to), (char *)DATAPTR(from), __n__*sizeof(type)); \
    } \

In the data.table package, dogroups.c uses this technique, so the principle 
is tested and works well so far.

Are there any road blocks preventing this change, or is anyone already 
working on it?  If not, I'll try to test it (on 32-bit Ubuntu) and 
submit a patch with timings, as before.  Comments/pointers much appreciated.

Matthew



Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Simon Urbanek
Matt,

On Apr 21, 2010, at 11:54 AM, Matthew Dowle wrote:

 From copyVector in duplicate.c :
 
 void copyVector(SEXP s, SEXP t)
 {
int i, ns, nt;
nt = LENGTH(t);
ns = LENGTH(s);
switch (TYPEOF(s)) {
 ...
case INTSXP:
    for (i = 0; i < ns; i++)
        INTEGER(s)[i] = INTEGER(t)[i % nt];
break;
 ...
 
 could that be replaced with :
 
case INTSXP:
        for (i = 0; i < ns/nt; i++)
memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t), 
 nt*sizeof(int));
break;
 

Won't that miss the last incomplete chunk? (and please don't use DATAPTR on 
INTSXP even though the effect is currently the same)

In general it seems that it depends on nt whether this is efficient or not, 
since calls to memcpy with short lengths are expensive (very small nt, that is).

I ran some empirical tests to compare memcpy vs. for() (x86_64, OS X) and 
the results were encouraging - depending on the size of the copied block 
the difference could be quite big:

- tiny block (ca. n = 32 or less): for() is faster
- small block (n ~ 1k): memcpy is ca. 8x faster
- as the size increases the gap closes (presumably due to RAM bandwidth 
  limitations), so for n = 512M it is ~30%

Of course this is contingent on the implementation of memcpy, compiler, 
architecture etc. And will only matter if copying is what you do most of the 
time ...

Cheers,
Simon





[Rd] Bugs? when dealing with contrasts

2010-04-21 Thread Gabor Grothendieck
R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Below are two cases where I don't seem to be getting contr.sum
contrasts even though they were specified.  Are these bugs?

The first case is an interaction between continuous and factor variables.

In the second case, contrasts= was specified as an argument to lm.  The
second case works OK if we set the contrasts through options() but not if
we set them through an lm argument.

> # 1. In this case I don't seem to be getting contr.sum contrasts:

> options(contrasts = c("contr.sum", "contr.poly"))
> getOption("contrasts")
[1] "contr.sum"  "contr.poly"
> scores <- rep(seq(-2, 2), 3); scores
 [1] -2 -1  0  1  2 -2 -1  0  1  2 -2 -1  0  1  2
> fac <- gl(3, 5); fac
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Levels: 1 2 3

> # I get this:
> model.matrix(~ scores:fac)
   (Intercept) scores:fac1 scores:fac2 scores:fac3
1            1          -2           0           0
2            1          -1           0           0
3            1           0           0           0
4            1           1           0           0
5            1           2           0           0
6            1           0          -2           0
7            1           0          -1           0
8            1           0           0           0
9            1           0           1           0
10           1           0           2           0
11           1           0           0          -2
12           1           0           0          -1
13           1           0           0           0
14           1           0           0           1
15           1           0           0           2
attr(,"assign")
[1] 0 1 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.sum"


> # But I was expecting this since I am using contr.sum
> cbind(1, model.matrix(~ fac)[,2:3] * scores)
     fac1 fac2
1  1   -2    0
2  1   -1    0
3  1    0    0
4  1    1    0
5  1    2    0
6  1    0   -2
7  1    0   -1
8  1    0    0
9  1    0    1
10 1    0    2
11 1    2    2
12 1    1    1
13 1    0    0
14 1   -1   -1
15 1   -2   -2


> # 2.
> # here I don't get contr.sum but rather get contr.treatment
> options(contrasts = c("contr.treatment", "contr.poly"))
> getOption("contrasts")
[1] "contr.treatment" "contr.poly"
> model.matrix(lm(seq(15) ~ fac, contrasts = c("contr.sum", "contr.poly")))
   (Intercept) fac2 fac3
1            1    0    0
2            1    0    0
3            1    0    0
4            1    0    0
5            1    0    0
6            1    1    0
7            1    1    0
8            1    1    0
9            1    1    0
10           1    1    0
11           1    0    1
12           1    0    1
13           1    0    1
14           1    0    1
15           1    0    1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.treatment"

> model.matrix(lm(seq(15) ~ fac)) # same
   (Intercept) fac2 fac3
1            1    0    0
2            1    0    0
3            1    0    0
4            1    0    0
5            1    0    0
6            1    1    0
7            1    1    0
8            1    1    0
9            1    1    0
10           1    1    0
11           1    0    1
12           1    0    1
13           1    0    1
14           1    0    1
15           1    0    1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.treatment"


> # I was expecting the first one to give me this
> options(contrasts = c("contr.sum", "contr.poly"))
> model.matrix(lm(seq(15) ~ fac))
   (Intercept) fac1 fac2
1            1    1    0
2            1    1    0
3            1    1    0
4            1    1    0
5            1    1    0
6            1    0    1
7            1    0    1
8            1    0    1
9            1    0    1
10           1    0    1
11           1   -1   -1
12           1   -1   -1
13           1   -1   -1
14           1   -1   -1
15           1   -1   -1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.sum"


> R.version.string
[1] "R version 2.10.1 (2009-12-14)"
> win.version()
[1] "Windows Vista (build 6002) Service Pack 2"



Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Seth Falcon

On 4/21/10 10:45 AM, Simon Urbanek wrote:

Won't that miss the last incomplete chunk? (and please don't use
DATAPTR on INTSXP even though the effect is currently the same)

In general it seems that it depends on nt whether this is
efficient or not, since calls to short memcpy are expensive (very
small nt, that is).

I ran some empirical tests to compare memcpy vs. for() (x86_64, OS X)
and the results were encouraging - depending on the size of the
copied block the difference could be quite big:
- tiny block (ca. n = 32 or less): for() is faster
- small block (n ~ 1k): memcpy is ca. 8x faster
- as the size increases the gap closes (presumably due to RAM
  bandwidth limitations), so for n = 512M it is ~30%.




Of course this is contingent on the implementation of memcpy,
compiler, architecture etc. And will only matter if copying is what
you do most of the time ...


Copying of vectors is something that I would expect to happen fairly 
often in many applications of R.


Is for() faster on small blocks by enough that one would want to branch 
based on size?


+ seth

--
Seth Falcon | @sfalcon | http://userprimary.net/



Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Romain Francois

On 21/04/10 17:54, Matthew Dowle wrote:



From copyVector in duplicate.c :


void copyVector(SEXP s, SEXP t)
{
 int i, ns, nt;
 nt = LENGTH(t);
 ns = LENGTH(s);
 switch (TYPEOF(s)) {
...
 case INTSXP:
     for (i = 0; i < ns; i++)
         INTEGER(s)[i] = INTEGER(t)[i % nt];
 break;
...

could that be replaced with :

 case INTSXP:
     for (i = 0; i < ns/nt; i++)
 memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t),
nt*sizeof(int));
 break;


or at least with something like this:

int *p_s = INTEGER(s);
int *p_t = INTEGER(t);
for (i = 0; i < ns; i++) {
    p_s[i] = p_t[i % nt];
}

since expanding the INTEGER macro over and over has a price.







--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9aKDM9 : embed images in Rd documents
|- http://tr.im/OIXN : raster images and RImageJ
|- http://tr.im/OcQe : Rcpp 0.7.7



Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Simon Urbanek

On Apr 21, 2010, at 2:15 PM, Seth Falcon wrote:

 Copying of vectors is something that I would expect to happen fairly often in 
 many applications of R.
 
 Is for() faster on small blocks by enough that one would want to branch based 
 on size?
 

Good question. Given that the branching itself adds overhead, possibly not. In 
the best case for() can be ~40% faster (for single-digit n), but that means 
billions of copies to make a difference (since the operation itself is so 
fast). The break-even point on my test machine is n=32, and when I added the 
branching it took a 20% hit, so I guess it's simply not worth it. The only case 
that may be worth branching is n:1, since that is likely a fairly common use 
(the branching penalty in copy routines is lower than in the memcpy/for 
comparison, since the branching can be done before the outer for loop, so 
this may vary case-by-case).

Cheers,
Simon



Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Simon Urbanek

On Apr 21, 2010, at 3:32 PM, Romain Francois wrote:

 or at least with something like this:
 
 int *p_s = INTEGER(s);
 int *p_t = INTEGER(t);
 for (i = 0; i < ns; i++) {
     p_s[i] = p_t[i % nt];
 }
 
 since expanding the INTEGER macro over and over has a price.
 

... in packages, yes, but not in core R.

Cheers,
Simon


 and similar for the other types in copyVector.  This won't help regular
 vector copies, since those seem to be done by the DUPLICATE_ATOMIC_VECTOR
 macro, see next suggestion below, but it should help copyMatrix which calls
 copyVector, scan.c which calls copyVector on three lines, dcf.c (once) and
 dounzip.c (once).
 
 For the DUPLICATE_ATOMIC_VECTOR macro there is already a comment next to it
 :
 
 FIXME: surely memcpy would be faster here?
 
 which seems to refer to the for loop  :
 
 else { \
 int __i__; \
 type *__fp__ = fun(from), *__tp__ = fun(to); \
 for (__i__ = 0; __i__  __n__; __i__++) \
   __tp__[__i__] = __fp__[__i__]; \
   } \
 
 Could that loop be replaced by the following ?
 
else { \
memcpy((char *)DATAPTR(to), (char *)DATAPTR(from), __n__*sizeof(type)); \
}\
 
 In the data.table package, dogroups.c uses this technique, so the principle
 is tested and works well so far.
 
 Are there any road blocks preventing this change, or is anyone already
 working on it ?  If not then I'll try and test it (on Ubuntu 32bit) and
 submit patch with timings, as before.  Comments/pointers much appreciated.
 
 Matthew
 
 
 
 
 
 -- 
 Romain Francois
 Professional R Enthusiast
 +33(0) 6 28 91 30 30
 http://romainfrancois.blog.free.fr
 |- http://bit.ly/9aKDM9 : embed images in Rd documents
 |- http://tr.im/OIXN : raster images and RImageJ
 |- http://tr.im/OcQe : Rcpp 0.7.7
 
 
 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Romain Francois

On 21/04/10 21:39, Simon Urbanek wrote:



On Apr 21, 2010, at 3:32 PM, Romain Francois wrote:


On 21/04/10 17:54, Matthew Dowle wrote:



 From copyVector in duplicate.c :


void copyVector(SEXP s, SEXP t)
{
 int i, ns, nt;
 nt = LENGTH(t);
 ns = LENGTH(s);
 switch (TYPEOF(s)) {
...
 case INTSXP:
 for (i = 0; i < ns; i++)
 INTEGER(s)[i] = INTEGER(t)[i % nt];
 break;
...

could that be replaced with :

 case INTSXP:
 for (i=0; i < ns/nt; i++)
 memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t),
nt*sizeof(int));
 break;


or at least with something like this:

int* p_s = INTEGER(s) ;
int* p_t = INTEGER(t) ;
for( i=0 ; i < ns ; i++){
p_s[i] = p_t[i % nt];
}

since expanding the INTEGER macro over and over has a price.



... in packages, yes, but not in core R.


Hmm. I was not talking about the overhead of the INTEGER function:

int *(INTEGER)(SEXP x) {
    if(TYPEOF(x) != INTSXP && TYPEOF(x) != LGLSXP)
        error("%s() can only be applied to a '%s', not a '%s'",
              "INTEGER", "integer", type2char(TYPEOF(x)));
    return INTEGER(x);
}



but the one related to the macro.

#define INTEGER(x)  ((int *) DATAPTR(x))
#define DATAPTR(x)  (((SEXPREC_ALIGN *) (x)) + 1)

so the loop expands to :

for (i = 0; i < ns; i++)
  ((int *) (((SEXPREC_ALIGN *) (s)) + 1))[i] = ((int *) 
(((SEXPREC_ALIGN *) (t)) + 1))[i % nt];


I still believe grabbing the pointer just once for s and once for t is 
more efficient ...



Cheers,
Simon



and similar for the other types in copyVector.  This won't help regular
vector copies, since those seem to be done by the DUPLICATE_ATOMIC_VECTOR
macro, see next suggestion below, but it should help copyMatrix which calls
copyVector, scan.c which calls copyVector on three lines, dcf.c (once) and
dounzip.c (once).

For the DUPLICATE_ATOMIC_VECTOR macro there is already a comment next to it
:

 FIXME: surely memcpy would be faster here?

which seems to refer to the for loop  :

 else { \
 int __i__; \
 type *__fp__ = fun(from), *__tp__ = fun(to); \
 for (__i__ = 0; __i__ < __n__; __i__++) \
   __tp__[__i__] = __fp__[__i__]; \
   } \

Could that loop be replaced by the following ?

else { \
memcpy((char *)DATAPTR(to), (char *)DATAPTR(from), __n__*sizeof(type)); \
}\

In the data.table package, dogroups.c uses this technique, so the principle
is tested and works well so far.

Are there any road blocks preventing this change, or is anyone already
working on it ?  If not then I'll try and test it (on Ubuntu 32bit) and
submit patch with timings, as before.  Comments/pointers much appreciated.

Matthew




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Simon Urbanek

On Apr 21, 2010, at 4:13 PM, Romain Francois wrote:

 On 21/04/10 21:39, Simon Urbanek wrote:
 
 
 On Apr 21, 2010, at 3:32 PM, Romain Francois wrote:
 
 On 21/04/10 17:54, Matthew Dowle wrote:
 
 From copyVector in duplicate.c :
 
 void copyVector(SEXP s, SEXP t)
 {
 int i, ns, nt;
 nt = LENGTH(t);
 ns = LENGTH(s);
 switch (TYPEOF(s)) {
 ...
 case INTSXP:
 for (i = 0; i < ns; i++)
 INTEGER(s)[i] = INTEGER(t)[i % nt];
 break;
 ...
 
 could that be replaced with :
 
 case INTSXP:
 for (i=0; i < ns/nt; i++)
 memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t),
 nt*sizeof(int));
 break;
 
 or at least with something like this:
 
 int* p_s = INTEGER(s) ;
 int* p_t = INTEGER(t) ;
 for( i=0 ; i < ns ; i++){
 p_s[i] = p_t[i % nt];
 }
 
 since expanding the INTEGER macro over and over has a price.
 
 
 ... in packages, yes, but not in core R.
 
 Hmm. I was not talking about the overhead of the INTEGER function:
 
 int *(INTEGER)(SEXP x) {
    if(TYPEOF(x) != INTSXP && TYPEOF(x) != LGLSXP)
       error("%s() can only be applied to a '%s', not a '%s'",
             "INTEGER", "integer", type2char(TYPEOF(x)));
    return INTEGER(x);
 }
 
 
 
 but the one related to the macro.
 
 #define INTEGER(x)((int *) DATAPTR(x))
 #define DATAPTR(x)(((SEXPREC_ALIGN *) (x)) + 1)
 
 so the loop expands to :
 
 for (i = 0; i < ns; i++)
  ((int *) (((SEXPREC_ALIGN *) (s)) + 1))[i] = ((int *) 
 (((SEXPREC_ALIGN *) (t)) + 1))[i % nt];
 
 I still believe grabbing the pointer just once for s and once for t is more 
 efficient ...
 

Nope, since everything involved is loop invariant so the pointer values don't 
change (you'd have to declare s or t volatile to prevent that).

Try using gcc -s and you'll see that the code is the same (depending on the 
version of gcc the order of the first comparison can change so technically the 
INTEGER(x) version can save one add instruction in the degenerate case and be 
faster(!) in old gcc).

Cheers,
Simon



 
 and similar for the other types in copyVector.  This won't help regular
 vector copies, since those seem to be done by the DUPLICATE_ATOMIC_VECTOR
 macro, see next suggestion below, but it should help copyMatrix which calls
 copyVector, scan.c which calls copyVector on three lines, dcf.c (once) and
 dounzip.c (once).
 
 For the DUPLICATE_ATOMIC_VECTOR macro there is already a comment next to it
 :
 
 FIXME: surely memcpy would be faster here?
 
 which seems to refer to the for loop  :
 
 else { \
 int __i__; \
 type *__fp__ = fun(from), *__tp__ = fun(to); \
 for (__i__ = 0; __i__ < __n__; __i__++) \
   __tp__[__i__] = __fp__[__i__]; \
   } \
 
 Could that loop be replaced by the following ?
 
else { \
memcpy((char *)DATAPTR(to), (char *)DATAPTR(from), __n__*sizeof(type)); 
 \
}\
 
 In the data.table package, dogroups.c uses this technique, so the principle
 is tested and works well so far.
 
 Are there any road blocks preventing this change, or is anyone already
 working on it ?  If not then I'll try and test it (on Ubuntu 32bit) and
 submit patch with timings, as before.  Comments/pointers much appreciated.
 
 Matthew
 
 
 
 
 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bugs? when dealing with contrasts

2010-04-21 Thread Gabor Grothendieck
On Wed, Apr 21, 2010 at 4:26 PM, Peter Dalgaard pda...@gmail.com wrote:
 As for case #1, the rules are tricky in cases where interactions are
 present without main effects, but AFAICS, what you observe is
 essentially the same effect as

 model.matrix(~fac-1, contrasts=list(fac="contr.sum"))
   fac1 fac2 fac3
 1     1    0    0
 2     1    0    0
 3     1    0    0
 4     1    0    0
 5     1    0    0
 6     0    1    0
 7     0    1    0
 8     0    1    0
 9     0    1    0
 10    0    1    0
 11    0    0    1
 12    0    0    1
 13    0    0    1
 14    0    0    1
 15    0    0    1
 attr(,"assign")
 [1] 1 1 1
 attr(,"contrasts")
 attr(,"contrasts")$fac
 [1] "contr.sum"


 I.e., that R reverts to using indicator variables when the intercept is
 absent.

Is there any nice way of getting contr.sum coding for the interaction
as opposed to the ugly code in my post that I used to force it? i.e.
cbind(1, model.matrix(~ fac)[,2:3] * scores)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Simon Urbanek

On Apr 21, 2010, at 4:39 PM, Simon Urbanek wrote:

 
 On Apr 21, 2010, at 4:13 PM, Romain Francois wrote:
 
 On 21/04/10 21:39, Simon Urbanek wrote:
 
 
 On Apr 21, 2010, at 3:32 PM, Romain Francois wrote:
 
 On 21/04/10 17:54, Matthew Dowle wrote:
 
 From copyVector in duplicate.c :
 
 void copyVector(SEXP s, SEXP t)
 {
int i, ns, nt;
nt = LENGTH(t);
ns = LENGTH(s);
switch (TYPEOF(s)) {
 ...
case INTSXP:
 for (i = 0; i < ns; i++)
INTEGER(s)[i] = INTEGER(t)[i % nt];
break;
 ...
 
 could that be replaced with :
 
case INTSXP:
 for (i=0; i < ns/nt; i++)
memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t),
 nt*sizeof(int));
break;
 
 or at least with something like this:
 
 int* p_s = INTEGER(s) ;
 int* p_t = INTEGER(t) ;
 for( i=0 ; i < ns ; i++){
p_s[i] = p_t[i % nt];
 }
 
 since expanding the INTEGER macro over and over has a price.
 
 
 ... in packages, yes, but not in core R.
 
 Hmm. I was not talking about the overhead of the INTEGER function:
 
 int *(INTEGER)(SEXP x) {
    if(TYPEOF(x) != INTSXP && TYPEOF(x) != LGLSXP)
       error("%s() can only be applied to a '%s', not a '%s'",
             "INTEGER", "integer", type2char(TYPEOF(x)));
    return INTEGER(x);
 }
 
 
 
 but the one related to the macro.
 
 #define INTEGER(x)   ((int *) DATAPTR(x))
 #define DATAPTR(x)   (((SEXPREC_ALIGN *) (x)) + 1)
 
 so the loop expands to :
 
 for (i = 0; i < ns; i++)
 ((int *) (((SEXPREC_ALIGN *) (s)) + 1))[i] = ((int *) 
 (((SEXPREC_ALIGN *) (t)) + 1))[i % nt];
 
 I still believe grabbing the pointer just once for s and once for t is more 
 efficient ...
 
 
 Nope, since everything involved is loop invariant so the pointer values don't 
 change (you'd have to declare s or t volatile to prevent that).
 
 Try using gcc -s

Sorry, I meant gcc -S of course.


 and you'll see that the code is the same (depending on the version of gcc the 
 order of the first comparison can change so technically the INTEGER(x) 
 version can save one add instruction in the degenerate case and be faster(!) 
 in old gcc

[the reason being that INTEGER(x) does not need to be evaluated if the loop is 
not entered whereas p_s and p_t are populated unconditionally - smarter 
compilers will notice that p_s/p_t are not used in that case and thus generate 
identical code in both cases]

Cheers,
Simon


 
 
 
 and similar for the other types in copyVector.  This won't help regular
 vector copies, since those seem to be done by the DUPLICATE_ATOMIC_VECTOR
 macro, see next suggestion below, but it should help copyMatrix which 
 calls
 copyVector, scan.c which calls copyVector on three lines, dcf.c (once) and
 dounzip.c (once).
 
 For the DUPLICATE_ATOMIC_VECTOR macro there is already a comment next to 
 it
 :
 
FIXME: surely memcpy would be faster here?
 
 which seems to refer to the for loop  :
 
else { \
int __i__; \
type *__fp__ = fun(from), *__tp__ = fun(to); \
for (__i__ = 0; __i__ < __n__; __i__++) \
  __tp__[__i__] = __fp__[__i__]; \
  } \
 
 Could that loop be replaced by the following ?
 
   else { \
   memcpy((char *)DATAPTR(to), (char *)DATAPTR(from), __n__*sizeof(type)); 
 \
   }\
 
 In the data.table package, dogroups.c uses this technique, so the 
 principle
 is tested and works well so far.
 
 Are there any road blocks preventing this change, or is anyone already
 working on it ?  If not then I'll try and test it (on Ubuntu 32bit) and
 submit patch with timings, as before.  Comments/pointers much appreciated.
 
 Matthew
 
 
 -- 
 Romain Francois
 Professional R Enthusiast
 +33(0) 6 28 91 30 30
 http://romainfrancois.blog.free.fr
 |- http://bit.ly/9aKDM9 : embed images in Rd documents
 |- http://tr.im/OIXN : raster images and RImageJ
 |- http://tr.im/OcQe : Rcpp 0.7.7
 
 
 
 
 
 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread William Dunlap
If I were worried about the time this loop takes,
I would avoid using i%nt.  For the attached C code,
compiled with gcc 4.3.3 with -O2, I get:
   # INTEGER() in loop
   system.time( r1 <- .Call("my_rep1", 1:3, 1e7) )
     user  system elapsed
    0.060   0.012   0.071

   # INTEGER() before loop
   system.time( r2 <- .Call("my_rep2", 1:3, 1e7) )
     user  system elapsed
    0.076   0.008   0.086

   # replace i%src_length in loop with j=0 before loop and
   #    if(++j==src_length) j=0 ;
   # in the loop.
   system.time( r3 <- .Call("my_rep3", 1:3, 1e7) )
     user  system elapsed
    0.024   0.028   0.050
   identical(r1,r2) && identical(r2,r3)
   [1] TRUE

The C code is:
#define USE_RINTERNALS /* pretend we are in the R kernel */
#include <R.h>
#include <Rinternals.h>


SEXP my_rep1(SEXP s_src, SEXP s_dest_length)
{
int src_length = length(s_src) ;
int dest_length = asInteger(s_dest_length) ;
int i,j ;
SEXP s_dest ;
PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
for(i=0; i<dest_length; i++) {
INTEGER(s_dest)[i] = INTEGER(s_src)[i % src_length] ;
}
UNPROTECT(1) ;
return s_dest ;
}
SEXP my_rep2(SEXP s_src, SEXP s_dest_length)
{
int src_length = length(s_src) ;
int dest_length = asInteger(s_dest_length) ;
int *psrc = INTEGER(s_src) ;
int *pdest ;
int i ;
SEXP s_dest ;
PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
pdest = INTEGER(s_dest) ;
if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
/* end of boilerplate */
for(i=0; i<dest_length; i++) {
pdest[i] = psrc[i % src_length] ;
}
UNPROTECT(1) ;
return s_dest ;
}
SEXP my_rep3(SEXP s_src, SEXP s_dest_length)
{
int src_length = length(s_src) ;
int dest_length = asInteger(s_dest_length) ;
int *psrc = INTEGER(s_src) ;
int *pdest ;
int i,j ;
SEXP s_dest ;
PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
pdest = INTEGER(s_dest) ;
if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
/* end of boilerplate */
for(j=0, i=0; i<dest_length; i++) {
*pdest++ = psrc[j++] ;
if (j==src_length) {
j = 0 ;
}
}
UNPROTECT(1) ;
return s_dest ;
}

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 -Original Message-
 From: r-devel-boun...@r-project.org 
 [mailto:r-devel-boun...@r-project.org] On Behalf Of Romain Francois
 Sent: Wednesday, April 21, 2010 12:32 PM
 To: Matthew Dowle
 Cc: r-de...@stat.math.ethz.ch
 Subject: Re: [Rd] suggestion how to use memcpy in duplicate.c
 
 On 21/04/10 17:54, Matthew Dowle wrote:
 
  From copyVector in duplicate.c :
 
  void copyVector(SEXP s, SEXP t)
  {
   int i, ns, nt;
   nt = LENGTH(t);
   ns = LENGTH(s);
   switch (TYPEOF(s)) {
  ...
   case INTSXP:
   for (i = 0; i < ns; i++)
   INTEGER(s)[i] = INTEGER(t)[i % nt];
   break;
  ...
 
  could that be replaced with :
 
   case INTSXP:
   for (i=0; i < ns/nt; i++)
   memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char 
 *)DATAPTR(t),
  nt*sizeof(int));
   break;
 
 or at least with something like this:
 
 int* p_s = INTEGER(s) ;
 int* p_t = INTEGER(t) ;
 for( i=0 ; i < ns ; i++){
   p_s[i] = p_t[i % nt];
 }
 
 since expanding the INTEGER macro over and over has a price.
 
  and similar for the other types in copyVector.  This won't 
 help regular
  vector copies, since those seem to be done by the 
 DUPLICATE_ATOMIC_VECTOR
  macro, see next suggestion below, but it should help 
 copyMatrix which calls
  copyVector, scan.c which calls copyVector on three lines, 
 dcf.c (once) and
  dounzip.c (once).
 
  For the DUPLICATE_ATOMIC_VECTOR macro there is already a 
 comment next to it
  :
 
   FIXME: surely memcpy would be faster here?
 
  which seems to refer to the for loop  :
 
   else { \
   int __i__; \
   type *__fp__ = fun(from), *__tp__ = fun(to); \
   for (__i__ = 0; __i__ < __n__; __i__++) \
 __tp__[__i__] = __fp__[__i__]; \
 } \
 
  Could that loop be replaced by the following ?
 
  else { \
  memcpy((char *)DATAPTR(to), (char *)DATAPTR(from), 
 __n__*sizeof(type)); \
  }\
 
  In the data.table package, dogroups.c uses this technique, 
 so the principle
  is tested and works well so far.
 
  Are there any road blocks preventing this change, or is 
 anyone already
  working on it ?  If not then I'll try and test it (on 
 Ubuntu 32bit) and
  submit patch with timings, as before.  Comments/pointers 
 much appreciated.
 
  Matthew
 
 
 
 
 

[Rd] RUnit bug?

2010-04-21 Thread Dominick Samperi
There appears to be a bug in RUnit.

Given a testsuite testsuite.math, say, when I run:

runTestSuite(testsuite.math)

this works fine, provided there are no extraneous files in the
unit test subdirectory.

But if there are any Emacs temp files (with names that
end with '~') then runTestSuite gets confused and tries to
run functions from the temp files as well.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Question of R CMD check

2010-04-21 Thread rusers.sh
Hi all,
  Today I installed the newest R version, 2.10.1, and the other tools
necessary for building R packages under Windows (e.g. Rtools, perl), all in
their newest versions.
  After configuring everything under Windows (the configuration should be
correct), I used it to re-check my old package. I found the following problem
when checking the EXAMPLEs of each function, which did not exist before this
re-installation.

* checking examples ... ERROR
 Running examples in 'stam-Ex.R' failed.

  I used \dontrun{} % enddontrun in all the examples of my functions, which
should be no problem, I think. I checked my package before and did not find
errors. I also browsed the check results in 'stam-Ex.R'. It lists all
the example code in that file, something like this:
  cleanEx(); nameEx("stcdt")
  ### * stcdt
  flush(stderr()); flush(stdout())
  ### example codes

  I did not meet this problem before.  Any ideas on solving it?
  Thanks a lot.

-- 
-
Jane Chang
Queen's


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bugs? when dealing with contrasts

2010-04-21 Thread Berwin A Turlach
G'day Gabor,

On Wed, 21 Apr 2010 14:00:33 -0400
Gabor Grothendieck ggrothendi...@gmail.com wrote:

 Below are two cases where I don't seem to be getting contr.sum
 contrasts even though they were specified.  Are these bugs?

Short answer: no. :)

 The first case is an interaction between continuous and factor
 variables.
 
 The second case contrasts= was specified as an arg to lm.  The second
 works ok if we set the contrasts through options but not if we set it
 through an lm argument.
 
 
  # 1. In this case I don't seem to be getting contr.sum contrasts:
 
  options(contrasts = c("contr.sum", "contr.poly"))
  getOption("contrasts")
 [1] "contr.sum"  "contr.poly"
  scores <- rep(seq(-2, 2), 3); scores
  [1] -2 -1  0  1  2 -2 -1  0  1  2 -2 -1  0  1  2
  fac <- gl(3, 5); fac
  [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
 Levels: 1 2 3
 
  # I get this:
  model.matrix(~ scores:fac)
    (Intercept) scores:fac1 scores:fac2 scores:fac3
 1            1          -2           0           0
 2            1          -1           0           0
 3            1           0           0           0
 4            1           1           0           0
 5            1           2           0           0
 6            1           0          -2           0
 7            1           0          -1           0
 8            1           0           0           0
 9            1           0           1           0
 10           1           0           2           0
 11           1           0           0          -2
 12           1           0           0          -1
 13           1           0           0           0
 14           1           0           0           1
 15           1           0           0           2
 attr(,"assign")
 [1] 0 1 1 1
 attr(,"contrasts")
 attr(,"contrasts")$fac
 [1] "contr.sum"

When creating the model matrix, the various levels of a factor are
first coded as indicator variables (for creation of the main effects
and for creation of interactions).  The contrast matrix is then
applied to remove redundant columns from the design matrix; that is,
columns that are known to be redundant due to the *design* (i.e. model
formula as a whole), depending on whether data is available for all
levels of a factor (or combinations of levels if several factors are in
the model) you can still end up with a design matrix that does not have
full column rank.

In this case, since there is no main effect fac, the three columns that
are put into the design matrix due to the score:fac term do not create
a singularity, hence the contrast matrix is not applied to these three
columns.

The above description is probably a bit rough and lacking in detail (if
not even wrong in some details).  IMHO, the best explanation of how R
goes from the model formula to the actual design matrix (for linear
models) can still be found in MASS; though, if I am not mistaken,
current versions of R seem to proceed slightly different to that
description in more complicated models (i.e. several factors and
continuous explanatory variables).

  # But I was expecting this since I am using contr.sum
  cbind(1, model.matrix(~ fac)[,2:3] * scores)
      fac1 fac2
 1  1   -2    0
 2  1   -1    0
 3  1    0    0
 4  1    1    0
 5  1    2    0
 6  1    0   -2
 7  1    0   -1
 8  1    0    0
 9  1    0    1
 10 1    0    2
 11 1    2    2
 12 1    1    1
 13 1    0    0
 14 1   -1   -1
 15 1   -2   -2

If you wish to have the design matrix that contains the interaction
terms as if the model had the main effects too but without the columns
corresponding to the main effects, then just instruct R that you want
to have such a matrix:

R model.matrix(~ scores*fac)[,-(2:4)]
   (Intercept) scores:fac1 scores:fac2
1            1          -2           0
2            1          -1           0
3            1           0           0
4            1           1           0
5            1           2           0
6            1           0          -2
7            1           0          -1
8            1           0           0
9            1           0           1
10           1           0           2
11           1           2           2
12           1           1           1
13           1           0           0
14           1          -1          -1
15           1          -2          -2

Though, I am not sure why one wants to fit such a model.  

  # 2.
  # here I don't get contr.sum but rather get contr.treatment
  options(contrasts = c("contr.treatment", "contr.poly"))
  getOption("contrasts")
 [1] "contr.treatment" "contr.poly"
  model.matrix(lm(seq(15) ~ fac, contrasts = c("contr.sum",
  "contr.poly")))
    (Intercept) fac2 fac3
 1            1    0    0
 2            1    0    0
 3            1    0    0
 4            1    0    0
 5            1    0    0
 6            1    1    0
 7            1    1    0
 8            1    1    0
 9            1    1    0
 10           1    1    0
 11           1    0    1
 12           1    0    1
 13           1    0    1
 14           1    0    1
 15           1    0    1

Re: [Rd] Bugs? when dealing with contrasts

2010-04-21 Thread Gabor Grothendieck
On Wed, Apr 21, 2010 at 11:38 PM, Berwin A Turlach
ber...@maths.uwa.edu.au wrote:
  # But I was expecting this since I am using contr.sum
  cbind(1, model.matrix(~ fac)[,2:3] * scores)
      fac1 fac2
 1  1   -2    0
 2  1   -1    0
 3  1    0    0
 4  1    1    0
 5  1    2    0
 6  1    0   -2
 7  1    0   -1
 8  1    0    0
 9  1    0    1
 10 1    0    2
 11 1    2    2
 12 1    1    1
 13 1    0    0
 14 1   -1   -1
 15 1   -2   -2

 If you wish to have the design matrix that contains the interaction
 terms as if the model had the main effects too but without the columns
 corresponding to the main effects, then just instruct R that you want
 to have such a matrix:

 R model.matrix(~ scores*fac)[,-(2:4)]
   (Intercept) scores:fac1 scores:fac2
 1            1          -2           0
 2            1          -1           0
 3            1           0           0
 4            1           1           0
 5            1           2           0
 6            1           0          -2
 7            1           0          -1
 8            1           0           0
 9            1           0           1
 10           1           0           2
 11           1           2           2
 12           1           1           1
 13           1           0           0
 14           1          -1          -1
 15           1          -2          -2

 Though, I am not sure why one wants to fit such a model.

To save on degrees of freedom.

I had reduced it to the minimum needed to illustrate but in fact its
closer to this (except the coding is not the one needed):

options(contrasts = c("contr.sum", "contr.poly"))
tab <- as.table(matrix(1:21, 7))
dimnames(tab) = list(X = letters[1:7], Y = LETTERS[1:3])
rr <- factor(row(tab))
cc <- factor(col(tab))
scores <- rep(seq(-3,3), 3)
model.matrix( ~ rr + cc + scores:cc)

so the main effects are rr and cc but scores takes the place of rr in
the interaction.

Your description of the process seems right since it would predict
that the following gives the required coding and it does:

model.matrix(~ scores*cc + rr)[,-2]

Thanks.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel