Re: [Rd] function call overhead

2011-02-16 Thread Olaf Mersmann
Dear Hadley, dear list,

On Wed, Feb 16, 2011 at 9:53 PM, Hadley Wickham had...@rice.edu wrote:
 I wondered about this statement too but:

 system.time(replicate(1e4, base::print))
   user  system elapsed
  0.539   0.001   0.541
 system.time(replicate(1e4, print))
   user  system elapsed
  0.013   0.000   0.012

These timings are skewed. Because I too have wondered about this in
the past, I recently published the microbenchmark package, which tries
hard to accurately measure the time it takes to evaluate some
expression(s). Using this package I get:

 library(microbenchmark)
 res <- microbenchmark(print, base::print, times=1)
 res
Unit: nanoeconds  ## I've fixed the typo, but not pushed to CRAN
              min    lq  median    uq     max
print          57    65    68.0    69   48389
base::print 41763 43357 44278.5 48403 4749851

A better way to look at this is by converting to evaluations per second:

 print(res, unit="eps")
Unit: evaluations per second
                    min          lq      median          uq        max
print       17543859.65 15384615.38 14705882.35 14492753.62 20665.8538
base::print    23944.64    23064.33    22584.32    20659.88   210.5329
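The two tables are related by simple arithmetic: one second is 1e9 nanoseconds, so evaluations per second are 1e9 divided by the time per evaluation in nanoseconds. A small sketch (the helper name ns_to_eps is made up for illustration) reproduces the median column from the median timings:

```r
# Hypothetical helper: turn a time per evaluation in nanoseconds into
# evaluations per second (1 s = 1e9 ns)
ns_to_eps <- function(ns) 1e9 / ns

# Median timings in nanoseconds for print and base::print
medians <- c(print = 68, base_print = 44278.5)
round(ns_to_eps(medians), 2)

# How many times slower the qualified lookup is at the median
medians[["base_print"]] / medians[["print"]]
```

At the median, the base::print lookup comes out roughly 650 times slower than the plain symbol lookup.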

Resolving 23000 names per second versus ~15M is quite a dramatic
difference in my world. The timings obtained by

  system.time(replicate(1e4, base::print))
   user  system elapsed
  0.475   0.006   0.483
  system.time(replicate(1e4, print))
   user  system elapsed
  0.011   0.001   0.014

are skewed by the overhead of replicate() in this case because the
execution time of the expression under test is so short.

Cheers,
Olaf Mersmann

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Possible bug in R parser

2011-01-24 Thread Olaf Mersmann
Dear R developers,

A recent typo led me to discover that R is happy to accept

   20x2
  [1] 20

as input. This appears to be related to the parsing of hexadecimal
constants, since there must be a zero before the 'x' (e.g. 2x2 or
02x02 give the expected error). All this is under R 2.12.1 on both OS
X and Linux. Is this expected behavior?

Cheers,
Olaf Mersmann


Re: [Rd] list comprehension to create an arbitrary-sized list with arbitrary names/values

2010-10-13 Thread Olaf Mersmann
Hi,

On 13.10.2010, at 21:26, Steve Kim wrote:
 mydict = dict([(keyfun(x), valfun(x)) for x in mylist])
 
 to create a dictionary with whatever keys and values we want from an
 input list of arbitrary size. In R, I want to similarly create a list
 with names/values that are generated by some keyfun and valfun
 (assuming that keyfun is guaranteed to return something suitable as a
 name). How can I do this?

Try something like this:

  mydict <- lapply(mylist, valfun)
  names(mydict) <- sapply(mylist, keyfun)

or

  mydict <- structure(lapply(mylist, valfun), names=sapply(mylist, keyfun))
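A third variant uses setNames() from the stats package together with vapply(), which insists that every key is a single string. The inputs mylist, keyfun and valfun below are toy placeholders, not part of the original question:

```r
# Toy inputs standing in for the real list and key/value functions
mylist <- list(1, 2, 3)
keyfun <- function(x) paste0("key", x)
valfun <- function(x) x^2

# vapply() errors out if keyfun does not return exactly one string,
# so malformed names are caught early; setNames() attaches them in one step
mydict <- setNames(lapply(mylist, valfun),
                   vapply(mylist, keyfun, character(1)))
mydict[["key3"]]
```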

Cheers
Olaf


Re: [Rd] tabulate() does not check for input bounds

2010-10-03 Thread Olaf Mersmann
Dear Simone,

On 04.10.2010, at 01:01, Simone Giannerini wrote:
 it looks like that tabulate() does not check for the bounds of the input.
 Reproducible example:
 
 b <- 1:2
 tabulate(b[1:100])
 [1] 1 1

this looks perfectly reasonable. Consider the result of 

 b <- 1:2
 b[1:100]
  [1]  1  2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [76] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

and check the help page for tabulate (esp. the na.rm argument).

What was your expected result?
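A quick way to see what tabulate() is actually given is to count the NAs produced by the out-of-range subscript and compare with the result. This is only a sketch; the nbins value is picked for illustration:

```r
b <- 1:2
x <- b[1:100]            # subscripts beyond length(b) yield NA
sum(is.na(x))            # 98 of the 100 elements are NA

# tabulate() counts the positive integers 1..nbins and skips NA values
tabulate(x)              # 1 1
tabulate(x, nbins = 4)   # 1 1 0 0
```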

Cheers,
Olaf


[Rd] Speed improvement for Find() and Position()

2010-09-01 Thread Olaf Mersmann
Dear R-developers,

both Find() and Position() (as the documentation mentions) are currently not 
optimized in any way. I have rewritten both functions in a more efficient 
manner by replacing the sapply() with a for() loop that terminates early if a 
match is found. Here is a patch against the current Subversion HEAD:

  http://www.statistik.tu-dortmund.de/~olafm/temp/fp.patch

and here are some numbers to show that this change is worthwhile:

% cat fp_bench.R 
set.seed(42)
pred <- function(z) z == 1

for (n in c(10^(2:4))) {
  x <- sample(1:n, 2*n, replace=TRUE)
  
  tf <- system.time(replicate(1000L, Find(pred, x)))
  message(sprintf("Find: n=%5i user=%6.3f system=%6.3f",
                  2*n, tf[1], tf[2]))

  tp <- system.time(replicate(1000L, Position(pred, x)))
  message(sprintf("Position: n=%5i user=%6.3f system=%6.3f",
                  2*n, tp[1], tp[2]))
}

## Unpatched R:
% Rscript fp_bench.R 
Find: n=  200 user= 0.491 system= 0.015
Position: n=  200 user= 0.477 system= 0.014
Find: n= 2000 user= 4.450 system= 0.083
Position: n= 2000 user= 4.507 system= 0.094
Find: n=20000 user=63.435 system= 1.497
Position: n=20000 user=63.130 system= 1.328

## Patched R:
% ./bin/Rscript fp_bench.R
Find: n=  200 user= 0.101 system= 0.013
Position: n=  200 user= 0.085 system= 0.003
Find: n= 2000 user= 0.781 system= 0.002
Position: n= 2000 user= 0.809 system= 0.012
Find: n=20000 user=20.537 system= 0.394
Position: n=20000 user=20.502 system= 0.404
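The early-exit idea behind the patch can be sketched in plain R as follows. This is an illustrative reimplementation, not the code in fp.patch; the names position_early and find_early are hypothetical, and isTRUE() is used so that an NA from the predicate counts as "no match":

```r
# Scan with a for() loop and return as soon as the predicate holds,
# instead of evaluating f on every element first
position_early <- function(f, x, right = FALSE, nomatch = NA_integer_) {
  f <- match.fun(f)
  ind <- seq_along(x)
  if (right) ind <- rev(ind)
  for (i in ind) {
    if (isTRUE(f(x[[i]]))) return(i)   # stop at the first match
  }
  nomatch
}

# Find() expressed in terms of the early-exit Position()
find_early <- function(f, x, right = FALSE, nomatch = NULL) {
  pos <- position_early(f, x, right)
  if (is.na(pos)) nomatch else x[[pos]]
}

pred <- function(z) z == 1
x <- c(3, 1, 2, 1)
position_early(pred, x)                # 2
position_early(pred, x, right = TRUE)  # 4
find_early(pred, x)                    # 1
```

With a match near the front, the loop touches only a few elements, which is where the speedup in the numbers above comes from.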

Cheers,
Olaf Mersmann

Re: [Rd] transpose of complex matrices in R

2010-07-30 Thread Olaf Mersmann
Hi,

On 30.07.2010, at 11:35, Robin Hankin wrote:
 3.  Try to define a t.complex() function:
 t.complex <- function(x){t(Conj(x))}
 (also fails because of recursion)

Try this version:

  t.complex <- function(x) {
    xx <- Conj(x)
    .Internal(t.default(xx))
  }

You get infinite recursion in your example because you keep dispatching on the 
(complex) result of Conj(x) in t(Conj(x)). I'm not sure if the use of .Internal 
in user code is sanctioned but it does work for me.
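A variant that avoids .Internal altogether, sketched under the assumption that calling the default method directly is acceptable: t.default() is an ordinary base function, and calling it by name skips S3 dispatch, so the complex result of Conj(x) is never dispatched on again. Note that defining t.complex this way changes what t() does for every complex matrix in the session:

```r
# Conjugate transpose as an S3 method for complex matrices
t.complex <- function(x) {
  # t.default() is called by name, so no dispatch happens on the complex
  # result of Conj(x) and the infinite recursion cannot occur
  t.default(Conj(x))
}

m <- matrix(c(1+2i, 3-1i, 0+1i, 2+0i), nrow = 2)
t(m)   # element [1, 2] is Conj(m[2, 1]) = 3+1i
```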

Cheers,
Olaf


Re: [Rd] Attributes of 1st argument in ...

2010-07-03 Thread Olaf Mersmann
Hi Daniel,

On 02.07.2010, at 23:26, Daniel Murphy wrote:
 I am trying to get an attribute of the first argument in a call to a
 function whose formal arguments consist of dots only and do something, e.g.,
 call 'cbind', based on the attribute
 f <- function(...) {get first attribute; maybe or maybe not call 'cbind'}
 
 I thought of (ignoring deparse.level for the moment)
 
 f <- function(...) {x <- attr(list(...)[[1L]], "foo"); if (x == "bar")
 cbind(...) else x}

what about using the somewhat obscure ..1 syntax? This version runs quite a bit 
faster for me:

  g <- function(...) {
    x <- attr(..1, "foo")
    if (x == "bar")
      cbind(...)
    else
      x
  }

but it will be hard to quantify how this pans out for you unless we know how 
many arguments there are and what size and type they have.
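A slightly more defensive version of the same idea, as a sketch: if the first argument has no "foo" attribute, attr() returns NULL and x == "bar" yields logical(0), which makes if() fail. Comparing with identical() avoids that. The attribute name and value are the placeholders from the question:

```r
g <- function(...) {
  x <- attr(..1, "foo")           # ..1 is the first element of ...
  if (identical(x, "bar")) cbind(...) else x
}

a <- structure(1:3, foo = "bar")
g(a, 4:6)                         # a 3 x 2 matrix from cbind()
g(structure(1, foo = "baz"))      # returns the attribute value "baz"
g(1)                              # no attribute: returns NULL instead of erroring
```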

Cheers,
Olaf


Re: [Rd] Attributes of 1st argument in ...

2010-07-02 Thread Olaf Mersmann
Hi Daniel,

On 02.07.2010, at 23:26, Daniel Murphy wrote:
 I am trying to get an attribute of the first argument in a call to a
 function whose formal arguments consist of dots only and do something, e.g.,
 call 'cbind', based on the attribute
 f <- function(...) {get first attribute; maybe or maybe not call 'cbind'}
 
 I thought of (ignoring deparse.level for the moment)
 
 f <- function(...) {x <- attr(list(...)[[1L]], "foo"); if (x == "bar")
 cbind(...) else x}

what about using the somewhat obscure ..1 syntax? This version runs quite a bit 
faster for me:

 g <- function(...) {
   x <- attr(..1, "foo")
   if (x == "bar")
     cbind(...)
   else
     x
 }

but it will be hard to quantify how this pans out for you unless we know how 
many arguments there are and what size and type they have.

Cheers,
Olaf


[Rd] Bug in memDecompress()

2010-05-07 Thread Olaf Mersmann
Dear R developers,

I have discovered a bug in the implementation of lzma decompression in 
memDecompress(). It is only triggered if the uncompressed size of the content 
is more than 3 times as large as the compressed content. Here's a simple 
example to reproduce it:

  n <- 200
  
  char <- paste(replicate(n, 1234567890), collapse="")
  char.comp <- memCompress(char, type="xz")
  char.dec <- memDecompress(char.comp, type="xz", asChar=TRUE)
  nchar(char.dec) == nchar(char)

  raw <- serialize(char, connection=NULL)
  raw.comp <- memCompress(raw, type="xz")
  raw.dec <- memDecompress(raw.comp, type="xz")
  length(raw.dec) == length(raw)

  char.uns <- unserialize(raw.dec)

The root cause seems to be that lzma_code() will return LZMA_OK even if it 
could not decompress the whole content. In this case strm.avail_in will be 
greater than zero. The following patch changes the respective if statements:

  http://www.statistik.tu-dortmund.de/~olafm/temp/memdecompress.patch

It also contains a small fix from the xz upstream for an uninitialized field in 
lzma_stream.
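To check a given R build for this class of problem, a compact round-trip test can be run on input that compresses far better than 3:1. This is a sketch; the helper name xz_roundtrip_ok is made up, and on a build that includes a fix it is expected to return TRUE:

```r
# Hypothetical helper: does an xz compress/decompress round trip
# reproduce the raw input exactly?
xz_roundtrip_ok <- function(r) {
  comp <- memCompress(r, type = "xz")
  identical(memDecompress(comp, type = "xz"), r)
}

# Highly repetitive input, so the uncompressed size is many times the
# compressed size (the condition under which the bug was reported)
r <- charToRaw(strrep("1234567890", 2000))
length(r) / length(memCompress(r, type = "xz"))
xz_roundtrip_ok(r)
```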

Cheers,
Olaf


[Rd] Small documentation fix for [.data.frame

2010-02-19 Thread Olaf Mersmann
Hello,

in the manual page for [.data.frame it reads:

  ... There is a method for replacement which checks \code{value} for
  the corrupt number of row, and replicates it if necessary. ...

This should probably read

  ... There is a method for replacement which checks \code{value} for
  the correct number of rows, and replicates it if necessary. ...

A trivial patch making this change can be found here:

  http://www.statistik.tu-dortmund.de/~olafm/temp/edf_doc.patch
 
Cheers,
Olaf Mersmann


[Rd] Fix for incorrect use of restrict in xz third party code

2010-02-19 Thread Olaf Mersmann
Hello,

the included XZ Utils source code contains an incorrect use of the
restrict keyword. This leads to data corruption under certain
circumstances. For a short discussion of the problem see

  http://sourceforge.net/projects/lzmautils/forums/forum/708858/topic/3306733

This was fixed in the XZ Utils git repository in commit 

  commit 49cfc8d392cf535f8dd10233225b1fc726fec9ef
  Author: Lasse Collin lasse.col...@tukaani.org
  Date:   Tue Sep 15 21:07:23 2009 +0300

Fix incorrect use of restrict.

Since then, there has not been a proper release of the XZ Utils so I
have applied said patch to the sources included in R and added a note
to the R_changes file in the src/extra/xz/ directory detailing the
changes.

This 'bug' is only triggered if the Intel C compiler or gcc 4.4 is used to
compile R and the included liblzma is used instead of a system-wide
one, so it might not be worth the trouble of patching the sources
instead of waiting for a new release. If anyone wants to apply a fix,
I have prepared a patch with all the changes, which can be found here:

  http://www.statistik.tu-dortmund.de/~olafm/temp/xz_restrict.patch

Cheers,
Olaf Mersmann


Re: [Rd] Using svSocket with data.table

2009-07-25 Thread Olaf Mersmann
Hi Matthew,

Excerpts from Matthew Dowle's message of Sat Jul 25 09:07:44 +0200 2009:
 So I'm looking to do the same as the demo,  but with a binary socket.  Does 
 anyone have any ideas?  I've looked a bit at  Rserve, bigmemory, biocep, nws 
 but although all those packages are great,  I didn't find anything that 
 worked in exactly this way i.e.  i) R to R ii) CLI non-blocking and iii) no 
 need to startup R in a special way

Don't be fooled. R does not handle multiple requests in parallel
internally. Also I suspect that, depending on what you do on the CLI,
this will interact badly with svSocket. 

As far as binary transfer of R objects goes, you are probably looking
for serialize() and unserialize(). Not sure if these are guaranteed to
work across different versions of R and different word sizes. See the
Warnings section in the serialize manual page.
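A minimal sketch of the binary round trip: with connection = NULL, serialize() returns a raw vector that could be written to a binary socket, and unserialize() rebuilds the object on the receiving side. The data frame is only an example payload, and the socket transport itself is left out:

```r
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# connection = NULL returns the serialized bytes as a raw vector;
# the default xdr = TRUE writes the portable big-endian format
bytes <- serialize(df, connection = NULL)
class(bytes)

# Receiving side: rebuild the object from the bytes
copy <- unserialize(bytes)
identical(copy, df)
```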

Cheers
Olaf


Re: [Rd] model.matrix memory problem (PR#13838)

2009-07-17 Thread Olaf Mersmann
Hi,

Excerpts from Torsten.Hothorn's message of Thu Jul 16 17:20:10 +0200 2009:
 `model.matrix' might kill R with a segfault (on a illposed problem, but 
 anyway):
 
 mydf <- as.data.frame(sapply(1:40, function(i) gl(2, 100)))
 f <- as.formula(paste("~ - 1 + ", paste(names(mydf), collapse = ":"), sep = ""))
 X <- model.matrix(f, data = mydf)
 
   *** caught segfault ***
 address 0x18, cause 'memory not mapped'
 Segmentation fault

I've taken a look at this. The problem lies in lines 1784 - 1798 of
src/main/model.c. What happens is that 'k' (a signed int)
overflows. As a result k is 0 after the loop and nc is set to 0. That
means the allocated model matrix 'x' is too small, which results in
the observed segfault. 

I can provide a patch which checks for overflow and throws an error if
that is the desired behaviour.
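The arithmetic behind k ending up as 0 can be checked from R, assuming the column count is the product of the level counts (2^40 for 40 two-level factors, each fully dummy-coded) and that the 32-bit overflow wraps around as it typically does in practice (strictly, signed overflow is undefined behaviour in C). The helper safe_ncol is hypothetical and only illustrates the kind of check a patch could add:

```r
ncols <- 2^40                     # 40 two-level factors, full interaction
ncols > .Machine$integer.max      # TRUE: does not fit in a signed 32-bit int

# 2^40 is a multiple of 2^32, so a wrapping 32-bit product ends at 0
ncols %% 2^32

# Hypothetical overflow-checked product of the factor level counts
safe_ncol <- function(nlevels) {
  k <- 1                          # accumulate in double precision
  for (l in nlevels) {
    k <- k * l
    if (k > .Machine$integer.max)
      stop("model matrix would have too many columns")
  }
  as.integer(k)
}
safe_ncol(rep(2, 3))              # 8
```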

Greetings,
Olaf Mersmann


Re: [Rd] Logical Error? (PR#13516)

2009-02-10 Thread Olaf Mersmann
Excerpts from camey's message of Tue Feb 10 15:55:04 +0100 2009:
 Using the commands bellow I expected that the answer is TRUE, but it is FALSE!
 
 P_exposicao=.9
 (1-P_exposicao)==.1

Look at the difference between the two: it is much smaller than
.Machine$double.eps on my computer.

This is not a bug, it's due to the limited precision of floating point numbers.
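A short sketch of what this looks like in practice, and of the usual remedy of comparing with a tolerance via all.equal() instead of ==:

```r
P_exposicao <- 0.9
d <- (1 - P_exposicao) - 0.1
d                                  # a tiny negative number, about -2.8e-17
abs(d) < .Machine$double.eps       # TRUE

(1 - P_exposicao) == 0.1                   # FALSE: exact comparison
isTRUE(all.equal(1 - P_exposicao, 0.1))    # TRUE: comparison with tolerance
```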

Sincerely
Olaf Mersmann


Re: [Rd] Patch to fix small bug in do_External and do_dotcall

2008-12-28 Thread Olaf Mersmann
Excerpts from Prof Brian Ripley's message of Sun Dec 28 15:03:28 +0100 2008:
 Thank you.  You do realize that your example is not passing a 
 NativeSymbolInfo object, don't you?  I believe the intention is that you 
 pass 'sym'.

Yes, I had originally used sym and later changed it to sym$address to
verify that the fix also works for case b) described in the comment
above checkValidSymbolId(), namely passing in a pointer to a
function. I'm not sure if this is currently a 'supported' method of
calling a function, since it is only mentioned in the man page for
getNativeSymbolInfo() and not in the .Call() man page.

 I'll incoporate the patch once I have worked out an accuracy description 
 of what it does 

The behavior before the patch is to assume that the head of args is a
string. When it is anything else, the call to translateChar() when
deriving the function name for the error message fails.

Instead of dealing with each possible type, the patch reuses the
function name that was returned by resolveNativeRoutine() which in
turn calls checkValidSymbolId() (all defined in dotcode.c). If
CAR(args) is a string, checkValidSymbolId() simply returns and
resolveNativeRoutine() copies the name into buf. If CAR(args) is a
NativeSymbolInfo object, checkValidSymbolId() calls itself recursively
with the second element of the NativeSymbolInfo object (its address
member). Lastly, if CAR(args) is an EXTPTRSXP (the type of the address
member of a NativeSymbolInfo object), checkValidSymbolId() extracts
the symbol name from the EXTPTRSXP if it is a registered symbol. This
is the only loophole. If I were to pass an address to .Call() or
.External() which was a native symbol but not a registered native
symbol, the buffer holding the function name would never be
filled. I'm not sure how to deal with this corner case. One option
would be to copy something (un)descriptive like 'Unknown' into the
buffer. If this is acceptable I can add it and post a revised patch.

Greetings from Dortmund
Olaf Mersmann


[Rd] Patch to fix small bug in do_External and do_dotcall

2008-12-26 Thread Olaf Mersmann
I've stumbled upon a small bug/inconsistency in do_External and do_dotcall:

Here's an example:

  % LC_ALL=C R --vanilla < symname-bug.R

  R version 2.8.0 (2008-10-20)
  *snip*
   options(error=expression(0))
   ## Call 'R_GD_nullDevice' with incorrect parameter count:
   .Call("R_GD_nullDevice", 1)
  Error in .Call("R_GD_nullDevice", 1) : 
Incorrect number of arguments (1), expecting 0 for R_GD_nullDevice
   
   ## Same call made via a NativeSymbolInfo object:
   sym <- getDLLRegisteredRoutines("grDevices")$.Call[["R_GD_nullDevice"]]
   .Call(sym$address, 1)
  Error: 'getEncChar' must be called on a CHARSXP

The error stems from the fact that both do_External and do_dotcall
expect CAR(args) to be a string, while it might be a NativeSymbolInfo
object. checkValidSymbolId() already handles this, so the fix is to
use the symbol name returned from resolveNativeRoutine().

After applying the attached patch (against R-trunk revision 47348) the
output looks like this:

  % LC_ALL=C bin/R --vanilla < symname-bug.R
  
  R version 2.9.0 Under development (unstable) (2008-12-26 r47348)
  *snip*
   options(error=expression(0))
   ## Call 'R_GD_nullDevice' with incorrect parameter count:
   .Call("R_GD_nullDevice", 1)
  Error in .Call("R_GD_nullDevice", 1) : 
Incorrect number of arguments (1), expecting 0 for R_GD_nullDevice
   
   ## Same call made via a NativeSymbolInfo object:
   sym <- getDLLRegisteredRoutines("grDevices")$.Call[["R_GD_nullDevice"]]
   .Call(sym$address, 1)
  Error in .Call(sym$address, 1) : 
Incorrect number of arguments (1), expecting 0 for R_GD_nullDevice

Greetings from Dortmund
Olaf

Re: [Rd] Patch to fix small bug in do_External and do_dotcall

2008-12-26 Thread Olaf Mersmann
Excerpts from Prof Brian Ripley's message of Sat Dec 27 06:59:24 +0100 2008:
 Thank you, but can we see the patch please (no attachement arrived)?

I've posted them online:

  http://www.statistik.tu-dortmund.de/~olafm/files/symname-bug.R
  http://www.statistik.tu-dortmund.de/~olafm/files/symname-bug.patch

Sorry for the inconvenience, not sure why the attachments got lost.

Greetings from Dortmund,
Olaf


[Rd] DTrace probes for R

2008-12-18 Thread Olaf Mersmann
I've integrated some DTrace [1] probes into R, namely a probe which
fires on function entry and return and one which fires before / after a
garbage collection.

Is there any interest in merging something like this into R-devel? If
yes, I'd like to discuss which probes and what data would be useful /
interesting from a developer's standpoint.

Greetings from Dortmund,
Olaf Mersmann

[1] http://www.sun.com/bigadmin/content/dtrace/