Re: [Rd] Circumventing code/documentation mismatches ('R CMD check')

2011-07-05 Thread peter dalgaard

On Jul 5, 2011, at 08:00, Johannes Graumann wrote:

 Hello,
 
 As prompted by B. Ripley (see below), I am transferring this over from R-User 
 ...
 
 For a package I am writing a function that looks like
 
 test <- function(Argument1=NA){
   # Prerequisite testing
   if(!(is.na(Argument1))){
     if(!(is.character(Argument1))){
       stop("Wrong class.")
     }
   }
   # Function Body
   cat("Hello World\n")
 }
 
 Documentation of this is straightforward:
 
 ...
 \usage{test(Argument1=NA)}
 ...
 
 However writing the function could be made more concise like so:
 
 test2 <- function(Argument1=NA_character_){
   # Prerequisite testing
   if(!(is.character(Argument1))){
     stop("Wrong class.")
   }
   # Function Body
   cat("Hello World\n")
 }
 
 To prevent confusion I do not want to use 'NA_character_' in the user-
 exposed documentation and using 
 
 ...
 \usage{test2(Argument1=NA)}
 ...
 
 leads to a warning regarding a code/documentation mismatch.
 
 Is there any way to prevent that?

You don't want to do that... 

That strategy breaks if someone passes the documented default explicitly, 
which certainly _causes_ confusion rather than prevents it. I.e.

test2(NA) # fails

test3 <- function(a=NA) test2(a) # 3rd party code might build on your function
test3() # fails

If your function only accepts character values, even if NA, then that is what 
should be documented. In the end, you'll find that an explicit is.na() is the 
right thing to do. 
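
To make the failure mode concrete, a small sketch against the two definitions 
above (recall that plain NA is logical, so is.character(NA) is FALSE):

test(NA)              # works: is.na(NA) is TRUE, so the class check is skipped
test2(NA)             # Error: Wrong class.  (is.character(NA) is FALSE)
test2(NA_character_)  # works, but only with the undocumented default
test3 <- function(a=NA) test2(a)
test3()               # fails for the same reason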



-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] [datatable-help] speeding up perception

2011-07-05 Thread Matthew Dowle

Simon,

Thanks for the great suggestion. I've written a skeleton assignment
function for data.table which incurs no copies, which works for this
case. For completeness, if I understand correctly, this is for:
  i) convenience of new users who don't know how to vectorize yet
  ii) more complex examples which can't be vectorized.

Before:

> system.time(for (r in 1:R) DT[r,20] <- 1.0)
   user  system elapsed 
 12.792   0.488  13.340 

After :

> system.time(for (r in 1:R) DT[r,20] <- 1.0)
   user  system elapsed 
  2.908   0.020   2.935

Where this can be reduced further as follows :

> system.time(for (r in 1:R) `[<-.data.table`(DT,r,2,1.0))
   user  system elapsed 
  0.132   0.000   0.131 
 

Still working on it. When it doesn't break other data.table tests, I'll
commit to R-Forge ...

Matthew


On Mon, 2011-07-04 at 12:41 -0400, Simon Urbanek wrote:
 Timothée,
 
 On Jul 4, 2011, at 2:47 AM, Timothée Carayol wrote:
 
  Hi --
  
  It's my first post on this list; as a relatively new user with little
  knowledge of R internals, I am a bit intimidated by the depth of some
  of the discussions here, so please spare me if I say something
  incredibly silly.
  
  I feel that someone at this point should mention Matthew Dowle's
  excellent data.table package
  (http://cran.r-project.org/web/packages/data.table/index.html) which
  seems to me to address many of the inefficiencies of data.frame.
  data.tables have no row names; and operations that only need data from
  one or two columns are (I believe) just as quick whether the total
  number of columns is 5 or 1000. This results in very quick operations
  (and, often, elegant code as well).
  
 
 I agree that data.table is a very good alternative (for other reasons) that 
 should be promoted more. The only slight snag is that it doesn't help with 
 the issue at hand since it simply does a pass-through for subassignments to 
 data frame's methods and thus suffers from the same problems (in fact there 
 is a rather stark asymmetry in how it handles subsetting vs subassignment - 
 which is a bit surprising [if I read the code correctly you can't use the 
 same indexing in both]). In fact I would propose that it should not do that 
 but handle the simple cases itself more efficiently without unneeded copies. 
 That would make it indeed a very interesting alternative.
 
 Cheers,
 Simon
 
 
  
  On Mon, Jul 4, 2011 at 6:19 AM, ivo welch ivo.we...@gmail.com wrote:
  thank you, simon.  this was very interesting indeed.  I also now
  understand how far out of my depth I am here.
  
  fortunately, as an end user, obviously, *I* now know how to avoid the
  problem.  I particularly like the as.list() transformation and back to
  as.data.frame() to speed things up without loss of (much)
  functionality.
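   
   for example, a sketch of that round-trip (names and sizes made up):
   
   d <- data.frame(x = numeric(1000), y = numeric(1000))
   l <- as.list(d)                       # plain list of vectors: fast element access
   for (r in seq_len(1000)) l$x[r] <- r  # individual writes without data.frame overhead
   d <- as.data.frame(l)                 # back to a data frame when the loop is done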
  
  
  more broadly, I view the avoidance of individual access through the
  use of apply and vector operations as a mixed IQ test and knowledge
  test (which I often fail).  However, even for the most clever, there
  are also situations where the KISS programming principle makes
  explicit loops still preferable.  Personally, I would have preferred
  it if R had, in its standard statistical data set data structure,
  foregone the row names feature in exchange for retaining fast direct
  access.  R could have reserved its current implementation with row
  names but slow access for a less common (possibly pseudo-inheriting)
  data structure.
  
  
  If end users commonly do iterations over a data frame, which I would
  guess to be the case, then the impression of R by (novice) end users
  could be greatly enhanced if the extreme penalties could be eliminated
  or at least flagged.  For example, I wonder if modest special internal
  code could store data frames internally and transparently as lists of
  vectors UNTIL a row name is assigned to.  Easier and uglier, a simple
   but specific warning message could be issued with a suggestion if
   there is an individual read/write into a data frame ("Warning: data
   frames are much slower than lists of vectors for individual element
   access").
  
  
   I would also suggest changing the Introduction to R 6.3 from "A
   data frame may for many purposes be regarded as a matrix with columns
   possibly of differing modes and attributes. It may be displayed in
   matrix form, and its rows and columns extracted using matrix indexing
   conventions." to "A data frame may for many purposes be regarded as a
   matrix with columns possibly of differing modes and attributes. It may
   be displayed in matrix form, and its rows and columns extracted using
   matrix indexing conventions.  However, data frames can be much slower
   than matrices or even lists of vectors (which, like data frames, can
   contain different types of columns) when individual elements need to
   be accessed."  Reading about it immediately upon introduction could
   flag the problem in a more visible manner.
  
  
  regards,
  
  /iaw
  

Re: [Rd] Recent and upcoming changes to R-devel

2011-07-05 Thread Tobias Verbeke

L.S.

On 07/05/2011 02:16 AM, mark.braving...@csiro.au wrote:

I may have misunderstood, but:

Please could we have an optional installation that does *not* byte-compile 
base and recommended?

Reason: it's not possible to debug byte-compiled code-- at least not with the 
'debug' package, which is quite widely used. I quite often end up using 
'mtrace' on functions in base/recommended packages to figure out what they are 
doing. And sometimes I (and others) experiment with changing functions in 
base/recommended to improve functionality. That seems to be harder with BC 
versions (and might even be impossible, as best I can tell from hints in the 
documentation of 'compile').

Personally, if I had to choose only one, I'd rather live with the speed penalty 
from not byte-compiling. But of course, if both are available, I could install 
both.


I completely second this request. All speed improvements and the byte
compiler in particular are leaps forward and I am very grateful and
admiring towards the people that make this happen.

That being said, 'moving away' from the sources (with the lazy loading
files and byte-compilation) may be a step back for R package developers
that (during development, and maybe on separate development installations
[as opposed to production installations of R]) require the sources of all
packages in order to be efficient in their work.

As many of you know, there is an open source Eclipse/StatET visual
debugger ready, and for that application as well (similar to Mark's
request) the presence of non-compiled code is highly desirable.

For the particular purpose of debugging R packages, I would even plead
to go beyond the current options and support the addition of an
R package install option that allows including the sources (e.g. in
a standard folder Rsrc/) in installed packages.

I am fully aware that one can always fetch the source tarballs from
CRAN for that purpose, but it would be much easier if a simple
installation option could put the R sources of a package in a separate
folder [or archive inside an existing folder] such that R development
tools (such as the Eclipse/StatET IDE) can offer inspection of sources
or display them (e.g. during debugging) out of the box.

If one has the srcref, one can always load the absolutely correct source
code this way, even if one doesn't know the parent function with the
source attribute.
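
For instance, with sources kept, something along these lines should
recover the exact text (a sketch only; the package and function names
are made up):

f <- get("myFun", envir = asNamespace("somePkg"))   # hypothetical example
sr <- attr(f, "srcref")    # non-NULL only if source references were kept
if (!is.null(sr)) cat(as.character(sr), sep = "\n") # the exact source behind f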

Any comments?

Best,
Tobias

P.S. One could even consider a post-install option e.g. to add 'real'
R sources (and source references) to Windows packages (which are by
definition already 'installed' and for which such information is not
by default included in the CRAN binaries of these packages).


 Prof Brian Ripley wrote:
  There was an R-core meeting the week before last, and various planned
  changes will appear in R-devel over the next few weeks.

  These are changes planned for R 2.14.0 scheduled for Oct 31.  As we
  are sick of people referring to R-devel as '2.14' or '2.14.0', that
  version number will not be used until we reach 2.14.0 alpha.  You
  will be able to have a package depend on an svn version number when
  referring to R-devel rather than using R (>= 2.14.0).

  All packages are installed with lazy-loading (there were 72 CRAN
  packages and 8 BioC packages which opted out).  This means that the
  code is always parsed at install time which inter alia simplifies the
  descriptions.  R 2.13.1 RC warns on installation about packages which
  ask not to be lazy-loaded, and R-devel ignores such requests (with a
  warning).

  In the near future all packages will have a name space.  If the
  sources do not contain one, a default NAMESPACE file will be added.
  This again will simplify the descriptions and also a lot of internal
  code.  Maintainers of packages without name spaces (currently 42% of
  CRAN) are encouraged to add one themselves.

  R-devel is installed with the base and recommended packages
  byte-compiled (the equivalent of 'make bytecode' in R 2.13.x, but
  done less inefficiently).  There is a new option R CMD INSTALL
  --byte-compile to byte-compile contributed packages, but that remains
  optional.
  Byte-compilation is quite expensive (so you definitely want to do it
  at install time, which requires lazy-loading), and relatively few
  packages benefit appreciably from byte-compilation.  A larger number
  of packages benefit from byte-compilation of R itself: for example
  AER runs its checks 10% faster.  The byte-compiler technology is
  thanks to Luke Tierney.

  There is support for figures in Rd files: currently with a first-pass
  implementation (thanks to Duncan Murdoch).
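
For contributed packages the new --byte-compile option can also be passed
from within R via INSTALL_opts (the same mechanism used further down this
thread for --with-keep.source); a sketch, with a made-up package name:

install.packages("somePkg", type = "source",
                 INSTALL_opts = "--byte-compile")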



Re: [Rd] Recent and upcoming changes to R-devel

2011-07-05 Thread Duncan Murdoch

On 05/07/2011 6:52 AM, Tobias Verbeke wrote:

[...]

Any comments?


I think these requests have already been met.  If you modify the body of 
a closure (as trace() does), then the byte compiled version is 
discarded, and you go back to the regular interpreted code.  If you 
install packages with the R_KEEP_PKG_SOURCE=yes environment variable 
set, then you keep all source for all functions.  (It's attached to the 
function itself, not as a file that may be out of date.)  It's possible 
that byte compiling turns off R_KEEP_PKG_SOURCE, but that is something 
that is either easily fixed, or avoided by re-installing without byte 
compiling.
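
A small illustration of the first point (a sketch, using the compiler 
package that ships with R 2.13.0):

f <- compiler::cmpfun(function(x) x + 1)
f                    # prints a <bytecode: ...> line
body(f) <- body(f)   # modify the body (essentially what trace() does) ...
f                    # ... and the byte-compiled version is discarded again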


Duncan Murdoch



Re: [Rd] Recent and upcoming changes to R-devel

2011-07-05 Thread Tobias Verbeke

Dear Duncan,

On 07/05/2011 03:25 PM, Duncan Murdoch wrote:

[...]


I think these requests have already been met. If you modify the body of
a closure (as trace() does), then the byte compiled version is
discarded, and you go back to the regular interpreted code. If you
install packages with the R_KEEP_PKG_SOURCE=yes environment variable
set, then you keep all source for all functions. (It's attached to the
function itself, not as a file that may be out of date.) It's possible
that byte compiling turns off R_KEEP_PKG_SOURCE, but that is something
that is either easily fixed, or avoided by re-installing without byte
compiling.


Many thanks for your reaction. Is the R_KEEP_PKG_SOURCE=yes environment
variable also supported during R installation?

I hope I'm not overlooking anything, but when compiling

ftp://ftp.stat.math.ethz.ch/Software/R/R-devel.tar.gz

a few minutes ago I encountered the following issue:

[...]

building package 'tools'
mkdir -p -- ../../../library/tools
make[4]: Entering directory `/home/tobias/rAdmin/R-devel/src/library/tools'
mkdir -p -- ../../../library/tools/R
mkdir -p -- ../../../library/tools/po
make[4]: Leaving directory `/home/tobias/rAdmin/R-devel/src/library/tools'
make[4]: Entering directory `/home/tobias/rAdmin/R-devel/src/library/tools'
make[5]: Entering directory `/home/tobias/rAdmin/R-devel/src/library/tools/src'
making text.d from text.c
making init.d from init.c
making Rmd5.d from Rmd5.c
making md5.d from md5.c
gcc -std=gnu99 -I../../../../include -I/usr/local/include -fvisibility=hidden -fpic -g -O2 -c text.c -o text.o
gcc -std=gnu99 -I../../../../include -I/usr/local/include -fvisibility=hidden -fpic -g -O2 -c init.c -o init.o
gcc -std=gnu99 -I../../../../include -I/usr/local/include -fvisibility=hidden -fpic -g -O2 -c Rmd5.c -o Rmd5.o
gcc -std=gnu99 -I../../../../include -I/usr/local/include -fvisibility=hidden -fpic -g -O2 -c md5.c -o md5.o
gcc -std=gnu99 -shared -L/usr/local/lib64 -o tools.so text.o init.o Rmd5.o md5.o -L../../../../lib -lR
make[6]: Entering directory `/home/tobias/rAdmin/R-devel/src/library/tools/src'
make[6]: `Makedeps' is up to date.
make[6]: Leaving directory `/home/tobias/rAdmin/R-devel/src/library/tools/src'
make[6]: Entering directory `/home/tobias/rAdmin/R-devel/src/library/tools/src'
mkdir -p -- ../../../../library/tools/libs
make[6]: Leaving directory `/home/tobias/rAdmin/R-devel/src/library/tools/src'
make[5]: Leaving directory `/home/tobias/rAdmin/R-devel/src/library/tools/src'
make[4]: Leaving directory 

Re: [Rd] Recent and upcoming changes to R-devel

2011-07-05 Thread Duncan Murdoch

On 05/07/2011 10:17 AM, Tobias Verbeke wrote:

Dear Duncan,

[...]

Many thanks for your reaction. Is the R_KEEP_PKG_SOURCE=yes environment
variable also supported during R installation?


Yes, other than the error you saw below, which is a temporary problem.  
Not sure which function exceeded the length limit, but the length limit 
is going away before 2.14.0 is released.


Duncan Murdoch


I hope I'm not overlooking anything, but when compiling

ftp://ftp.stat.math.ethz.ch/Software/R/R-devel.tar.gz

a few minutes ago I encountered the following issue:

[...]


Re: [Rd] Recent and upcoming changes to R-devel

2011-07-05 Thread Duncan Murdoch

On 05/07/2011 11:20 AM, Stephan Wahlbrink wrote:

Dear developers,

Duncan Murdoch wrote [2011-07-05 15:25]:
[...]

I don’t know how the new installation works exactly, but would it be
possible to simply install both types, the old expression bodies and
the new byte-compiled ones, in a single package at the same time?


Yes, that's what is done.

This would allow the R user and developer to simply use the variant which
is best at the moment. If he wants to debug code, he can switch off the
use of byte-compiled code and use the old R expressions (with attached
srcrefs). If debugging is not required, he can profit from the
byte-compiled version. The best would be a toggle to switch it at
runtime, but a startup option would be sufficient too.

I think direct access to the code is one big advantage of open source
software. For developers it makes it easier to find and fix bugs if
something is wrong. But it can also help users a lot to understand how a
function or algorithm works and learn from code written by other persons
– if access to the sources is easy.

As long as byte-code doesn’t support the debugging features of R, it is
required for best debugging support to run the functions completely
without byte-compiled code. If I understood it correctly, byte-code
frames would disable srcrefs as well as features like “step return” into
those frames. Therefore I ask for a way to easily switch between both
execution types.


What gave you that impression?

Duncan Murdoch


Best,
Stephan




Re: [Rd] Recent and upcoming changes to R-devel

2011-07-05 Thread Stephan Wahlbrink

Dear developers,

Duncan Murdoch wrote [2011-07-05 15:25]:

[...]


I don’t know how the new installation works exactly, but would it be
possible to simply install both types, the old expression bodies and
the new byte-compiled ones, in a single package at the same time? This
would allow the R user and developer to simply use the variant which is
best at the moment. If he wants to debug code, he can switch off the use
of byte-compiled code and use the old R expressions (with attached
srcrefs). If debugging is not required, he can profit from the
byte-compiled version. The best would be a toggle to switch it at
runtime, but a startup option would be sufficient too.


I think direct access to the code is one big advantage of open source
software. For developers it makes it easier to find and fix bugs if
something is wrong. But it can also help users a lot to understand how a
function or algorithm works and learn from code written by other persons
– if access to the sources is easy.


As long as byte-code doesn’t support the debugging features of R, it is
required for best debugging support to run the functions completely
without byte-compiled code. If I understood it correctly, byte-code
frames would disable srcrefs as well as features like “step return” into
those frames. Therefore I ask for a way to easily switch between both
execution types.


Best,
Stephan





Re: [Rd] Recent and upcoming changes to R-devel

2011-07-05 Thread Tobias Verbeke

On 07/05/2011 04:21 PM, Duncan Murdoch wrote:

On 05/07/2011 10:17 AM, Tobias Verbeke wrote:

Dear Duncan,

[...]

 I think these requests have already been met. If you modify the body of
 a closure (as trace() does), then the byte compiled version is
 discarded, and you go back to the regular interpreted code. If you
 install packages with the R_KEEP_PKG_SOURCE=yes environment variable
 set, then you keep all source for all functions. (It's attached to the
 function itself, not as a file that may be out of date.) It's possible


Can you expand on when files put inside a package at install
time will be out of date compared to the source information
attached to a function?

I (naively) thought the source information was created and attached
at install time as well and that it did not change afterwards either.

I guess the argument for files is that they have precise
locations and allow for easy indexing by development tools
external to R (but I may be corrected here as well).


 that byte compiling turns off R_KEEP_PKG_SOURCE, but that is something
 that is either easily fixed, or avoided by re-installing without byte
 compiling.

Many thanks for your reaction. Is the R_KEEP_PKG_SOURCE=yes environment
variable also supported during R installation?


Yes, other than the error you saw below, which is a temporary problem.
Not sure which function exceeded the length limit, but the length limit
is going away before 2.14.0 is released.


Thanks again, Duncan, for the clarification.

Is it useful (or just whimsical) to have an R
function that would allow, for a given stock CRAN
Windows R installation with stock Windows CRAN binary
add-on packages, to add post factum the source
information that would be useful e.g. for a debugger?

I can imagine something like

update.packages(., checkSourcesKept = TRUE)

as I don't think this can currently be solved
with a combination of INSTALL_opts=--with-keep.source
and type=source given that there will not be a check
for the presence of source information to determine
which packages require being updated (or in this
case 'completed' with source information).
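
Today, as far as I can see, one can only force the matter one package
at a time, along these lines (a sketch; the package name is made up):

install.packages("somePkg", type = "source",
                 INSTALL_opts = "--with-keep.source")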

The alternative scenario would be to expect users
that want this functionality to compile R and all
add-on packages from source (also on Windows or
Mac).

Best,
Tobias




Re: [Rd] [datatable-help] speeding up perception

2011-07-05 Thread Matthew Dowle

Simon (and all),

I've tried to make assignment as fast as calling `[<-.data.table`
directly, for user convenience. Profiling shows (IIUC) that it isn't
dispatch, but x being copied. Is there a way to prevent '[<-' from
copying x?  Small reproducible example in vanilla R 2.13.0 :

> x = list(a=1:1,b=1:1)
> class(x) = "newclass"
> "[<-.newclass" = function(x,i,j,value) x  # i.e. do nothing
> tracemem(x)
[1] "<0xa1ec758>"
> x[1,2] = 42L
tracemem[0xa1ec758 -> 0xa1ec558]:    # but, x is still copied, why?
> 

I've tried returning NULL from "[<-.newclass" but then x gets assigned
NULL :

> "[<-.newclass" = function(x,i,j,value) NULL
> x[1,2] = 42L
tracemem[0xa1ec558 -> 0x9c5f318]: 
> x
NULL
> 

Any pointers much appreciated. If that copy is preventable it should
save the user needing to use `[<-.data.table`(...) syntax to get the
best speed (20 times faster on the small example used so far).

Matthew


On Tue, 2011-07-05 at 08:32 +0100, Matthew Dowle wrote:
[...]

Re: [Rd] Recent and upcoming changes to R-devel

2011-07-05 Thread Simon Urbanek

On Jul 5, 2011, at 1:45 PM, Tobias Verbeke wrote:

 [...]
 
 Can you expand on when files put inside a package at install time will be out
 of date compared to the source information attached to a function?
 

When you edit such files.


 I (naively) thought the source information was created and attached at 
 install time as well and that it did not change afterwards either.
 

... unless you edit it.


 I guess the argument for files is that they have precise locations and allow
 for easy indexing by development tools external to R (but I may be corrected
 here as well).
 

Yes, but the moment you change a file it is no longer reflected in R unless you 
re-source it.

This is usually not an issue if you have a separate installed copy, but if you 
edit the installed sources directly (something less frequent with lazy-loaded 
packages but more so in the old days), the files won't reflect what's actually 
parsed. This is a common problem, not specific to R, really. By keeping the 
sources with the objects, you guarantee that they match even if the source 
files have been edited - useful for debugging. It's not as esoteric as it 
sounds - just store a function in a workspace and then continue working on a 
project ...
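
A toy version of that scenario, as a sketch (assuming keep.source is on,
as it is by default in interactive use):

options(keep.source = TRUE)
f <- function(x) x + 1    # typed at the prompt; the source stays with f
# the file or history this came from may later change or vanish, but the
# srcref attached to f still shows exactly what was parsed:
cat(as.character(attr(f, "srcref")), sep = "\n")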



Re: [Rd] Recent and upcoming changes to R-devel

2011-07-05 Thread Duncan Murdoch

On 05/07/2011 1:45 PM, Tobias Verbeke wrote:

[...]

Can you expand on when files put inside a package at install
time will be out of date compared to the source information
attached to a function?


Suppose you're debugging.  You change a function, source it:  now it's 
not the same as the one in the package source, it's the one in your editor.



I (naively) thought the source information was created and attached
at install time as well and that it did not change afterwards either.


It won't change if the function doesn't change, but during debugging (or 
in some strange examples, during normal execution) the function might 
change.



I guess the argument for files is that they have precise
locations and allow for easy indexing by development tools
external to R (but I may be corrected here as well).


As in pre-2.13.0, it will keep the locations and time stamps of the 
files, but we were finding it was too unreliable not to have an actual 
copy of the contents, so 2.13.0 also keeps a copy of the file, and 
that's the main source of content to display.




Re: [Rd] [datatable-help] speeding up perception

2011-07-05 Thread Simon Urbanek

On Jul 5, 2011, at 2:08 PM, Matthew Dowle wrote:

 Simon (and all),
 
 I've tried to make assignment as fast as calling `[<-.data.table`
 directly, for user convenience. Profiling shows (IIUC) that it isn't
 dispatch, but x being copied. Is there a way to prevent '[<-' from
 copying x?

Good point, and conceptually, no. It's a subassignment after all - see R-lang 
3.4.4 - it is equivalent to 

`*tmp*` <- x
x <- `[<-`(`*tmp*`, i, j, value)
rm(`*tmp*`)

so there is always a copy involved.

Now, a conceptual copy doesn't mean real copy in R since R tries to keep the 
pass-by-value illusion while passing references in cases where it knows that 
modifications cannot occur and/or they are safe. The default subassign method 
uses that feature which means it can afford to not duplicate if there is only 
one reference -- then it's safe to not duplicate as we are replacing that only 
existing reference. And in the case of a matrix, that will be true at the 
latest from the second subassignment on.
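
As a minimal illustration of that last point in vanilla R (exactly when
tracemem falls silent can vary by version):

m <- matrix(0, 1000, 1000)
tracemem(m)
m[1, 1] <- 1   # the first subassignment may still duplicate
m[2, 2] <- 2   # typically no further tracemem output: done in place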

Unfortunately the method dispatch (AFAICS) introduces one more reference in the 
dispatch chain so there will always be two references so duplication is 
necessary. Since we have only 0 / 1 / 2+ information on the references, we 
can't distinguish whether the second reference is due to the dispatch or due to 
the passed object having more than one reference, so we have to duplicate in 
any case. That is unfortunate, and I don't see a way around (unless we handle 
subassignment methods in some special way).

Cheers,
Simon



  Small reproducible example in vanilla R 2.13.0 :
 
 x = list(a=1:10000,b=1:10000)
 class(x) = "newclass"
 "[<-.newclass" = function(x,i,j,value) x  # i.e. do nothing
 tracemem(x)
 [1] "<0xa1ec758>"
 x[1,2] = 42L
 tracemem[0xa1ec758 -> 0xa1ec558]: # but, x is still copied, why?
 
 
 I've tried returning NULL from "[<-.newclass" but then x gets assigned
 NULL :
 
 "[<-.newclass" = function(x,i,j,value) NULL
 x[1,2] = 42L
 tracemem[0xa1ec558 -> 0x9c5f318]: 
 x
 NULL
 
 
 Any pointers much appreciated. If that copy is preventable it should
 save the user needing to use `[<-.data.table`(...) syntax to get the
 best speed (20 times faster on the small example used so far).
 
 Matthew
 
 
 On Tue, 2011-07-05 at 08:32 +0100, Matthew Dowle wrote:
 Simon,
 
 Thanks for the great suggestion. I've written a skeleton assignment
 function for data.table which incurs no copies, which works for this
 case. For completeness, if I understand correctly, this is for : 
  i) convenience of new users who don't know how to vectorize yet
  ii) more complex examples which can't be vectorized.
 
 Before:
 
 system.time(for (r in 1:R) DT[r,20] <- 1.0)
   user  system elapsed 
 12.792   0.488  13.340 
 
 After :
 
 system.time(for (r in 1:R) DT[r,20] <- 1.0)
   user  system elapsed 
  2.908   0.020   2.935
 
 Where this can be reduced further as follows :
 
 system.time(for (r in 1:R) `[<-.data.table`(DT,r,2,1.0))
   user  system elapsed 
  0.132   0.000   0.131 
 
 
 Still working on it. When it doesn't break other data.table tests, I'll
 commit to R-Forge ...
 
 Matthew
 
 
 On Mon, 2011-07-04 at 12:41 -0400, Simon Urbanek wrote:
 Timothée,
 
 On Jul 4, 2011, at 2:47 AM, Timothée Carayol wrote:
 
 Hi --
 
 It's my first post on this list; as a relatively new user with little
 knowledge of R internals, I am a bit intimidated by the depth of some
 of the discussions here, so please spare me if I say something
 incredibly silly.
 
 I feel that someone at this point should mention Matthew Dowle's
 excellent data.table package
 (http://cran.r-project.org/web/packages/data.table/index.html) which
 seems to me to address many of the inefficiencies of data.frame.
 data.tables have no row names; and operations that only need data from
 one or two columns are (I believe) just as quick whether the total
 number of columns is 5 or 1000. This results in very quick operations
 (and, often, elegant code as well).
 
 
 I agree that data.table is a very good alternative (for other reasons) that 
 should be promoted more. The only slight snag is that it doesn't help with 
 the issue at hand since it simply does a pass-through for subassignments to 
 data frame's methods and thus suffers from the same problems (in fact there 
 is a rather stark asymmetry in how it handles subsetting vs subassignment - 
 which is a bit surprising [if I read the code correctly you can't use the 
 same indexing in both]). In fact I would propose that it should not do that 
 but handle the simple cases itself more efficiently without unneeded 
 copies. That would make it indeed a very interesting alternative.
 
 Cheers,
 Simon
 
 
 
 On Mon, Jul 4, 2011 at 6:19 AM, ivo welch ivo.we...@gmail.com wrote:
 thank you, simon.  this was very interesting indeed.  I also now
 understand how far out of my depth I am here.
 
 fortunately, as an end user, obviously, *I* now know how to avoid the
 problem.  I particularly like the as.list() transformation and back to
 as.data.frame() to speed things up without loss of (much) functionality.

Re: [Rd] Syntactically valid names

2011-07-05 Thread Hadley Wickham
 I wouldn't expect so. The basic structure might be handled using a regexp of 
 sorts, but even that is tricky because of the dot not followed by number 
 rule, and then there's the stop list of reserved words, which would make your 
 code clumsy whatever you do.

 How on Earth would you expect anything to be significantly more elegant than 
 your

 function(x) x == make.names(x)

 anyway??! (OK, if there was a wrapper for the C level isValidName() 
 function...)
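
For concreteness, a rough sketch of that clumsy regexp route
(is_valid_name is a made-up helper; the stop list follows ?Reserved,
and like make.names() it still waves through '...' and '..1'-style
names):

is_valid_name <- function(x) {
  reserved <- c("if", "else", "repeat", "while", "function", "for",
                "next", "break", "TRUE", "FALSE", "NULL", "Inf",
                "NaN", "NA", "NA_integer_", "NA_real_",
                "NA_character_")
  ok <- grepl("^[[:alpha:].][[:alnum:]._]*$", x) &  # allowed characters
        !grepl("^\\.[0-9]", x)                      # dot-then-digit rule
  ok & !(x %in% reserved)
}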

Good point.  Thanks!

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [datatable-help] speeding up perception

2011-07-05 Thread luke-tierney

On Tue, 5 Jul 2011, Matthew Dowle wrote:


Simon (and all),

I've tried to make assignment as fast as calling `[<-.data.table`
directly, for user convenience. Profiling shows (IIUC) that it isn't
dispatch, but x being copied. Is there a way to prevent '[<-' from
copying x?  Small reproducible example in vanilla R 2.13.0 :


x = list(a=1:10000,b=1:10000)
class(x) = "newclass"
"[<-.newclass" = function(x,i,j,value) x  # i.e. do nothing
tracemem(x)

[1] "<0xa1ec758>"

x[1,2] = 42L

tracemem[0xa1ec758 -> 0xa1ec558]: # but, x is still copied, why?




This one is a red herring -- the class(x) <- "newclass" assignment is
bumping up the NAMED value and as a result the following assignment
needs to duplicate. (The primitive class<- could be modified to avoid
the NAMED bump, but it's fairly intricate code so I'm not going to look
into it now.)
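
A quick way to see that bump, using the (unofficial, illustration-only)
inspect() internal -- exact output varies by build:

x <- list(a = 1:10000, b = 1:10000)
.Internal(inspect(x))   # shows NAM(1): a single reference
class(x) <- "newclass"
.Internal(inspect(x))   # now NAM(2): the bump described above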

[A bit more later in reply to Simon's message]

luke



I've tried returning NULL from "[<-.newclass" but then x gets assigned
NULL :


"[<-.newclass" = function(x,i,j,value) NULL
x[1,2] = 42L
tracemem[0xa1ec558 -> 0x9c5f318]: 

x

NULL




Any pointers much appreciated. If that copy is preventable it should
save the user needing to use `[<-.data.table`(...) syntax to get the
best speed (20 times faster on the small example used so far).

Matthew


On Tue, 2011-07-05 at 08:32 +0100, Matthew Dowle wrote:

Simon,

Thanks for the great suggestion. I've written a skeleton assignment
function for data.table which incurs no copies, which works for this
case. For completeness, if I understand correctly, this is for :
  i) convenience of new users who don't know how to vectorize yet
  ii) more complex examples which can't be vectorized.

Before:

 system.time(for (r in 1:R) DT[r,20] <- 1.0)
   user  system elapsed
 12.792   0.488  13.340 


After :

 system.time(for (r in 1:R) DT[r,20] <- 1.0)
   user  system elapsed
  2.908   0.020   2.935

Where this can be reduced further as follows :

 system.time(for (r in 1:R) `[<-.data.table`(DT,r,2,1.0))
   user  system elapsed
  0.132   0.000   0.131 
 


Still working on it. When it doesn't break other data.table tests, I'll
commit to R-Forge ...

Matthew


On Mon, 2011-07-04 at 12:41 -0400, Simon Urbanek wrote:
 Timothée,
 
 On Jul 4, 2011, at 2:47 AM, Timothée Carayol wrote:
 
  Hi --
  
  It's my first post on this list; as a relatively new user with little

  knowledge of R internals, I am a bit intimidated by the depth of some
  of the discussions here, so please spare me if I say something
  incredibly silly.
  
  I feel that someone at this point should mention Matthew Dowle's

  excellent data.table package
  (http://cran.r-project.org/web/packages/data.table/index.html) which
  seems to me to address many of the inefficiencies of data.frame.
  data.tables have no row names; and operations that only need data from
  one or two columns are (I believe) just as quick whether the total
  number of columns is 5 or 1000. This results in very quick operations
  (and, often, elegant code as well).
  
 
 I agree that data.table is a very good alternative (for other reasons) that
 should be promoted more. The only slight snag is that it doesn't help with
 the issue at hand since it simply does a pass-through for subassignments to
 data frame's methods and thus suffers from the same problems (in fact there
 is a rather stark asymmetry in how it handles subsetting vs subassignment -
 which is a bit surprising [if I read the code correctly you can't use the
 same indexing in both]). In fact I would propose that it should not do that
 but handle the simple cases itself more efficiently without unneeded
 copies. That would make it indeed a very interesting alternative.
 
 Cheers,

 Simon
 
 
  
  On Mon, Jul 4, 2011 at 6:19 AM, ivo welch ivo.we...@gmail.com wrote:

  thank you, simon.  this was very interesting indeed.  I also now
  understand how far out of my depth I am here.
  
  fortunately, as an end user, obviously, *I* now know how to avoid the

  problem.  I particularly like the as.list() transformation and back to
  as.data.frame() to speed things up without loss of (much)
  functionality.
  
  
  more broadly, I view the avoidance of individual access through the

  use of apply and vector operations as a mixed IQ test and knowledge
  test (which I often fail).  However, even for the most clever, there
  are also situations where the KISS programming principle makes
  explicit loops still preferable.  Personally, I would have preferred
  it if R had, in its standard statistical data set data structure,
  foregone the row names feature in exchange for retaining fast direct
  access.  R could have reserved its current implementation with row
  names but slow access for a less common (possibly pseudo-inheriting)
  data structure.
  
  
  If end users commonly do iterations over a data frame, which I would

  guess to be the case, then the impression of R by (novice) end users
  could be greatly enhanced if the extreme penalties could be eliminated
  or at least 

Re: [Rd] [datatable-help] speeding up perception

2011-07-05 Thread luke-tierney

On Tue, 5 Jul 2011, Simon Urbanek wrote:



On Jul 5, 2011, at 2:08 PM, Matthew Dowle wrote:


Simon (and all),

I've tried to make assignment as fast as calling `[<-.data.table`
directly, for user convenience. Profiling shows (IIUC) that it isn't
dispatch, but x being copied. Is there a way to prevent '[<-' from
copying x?


Good point, and conceptually, no. It's a subassignment after all - see R-lang 
3.4.4 - it is equivalent to

`*tmp*` <- x
x <- `[<-`(`*tmp*`, i, j, value)
rm(`*tmp*`)

so there is always a copy involved.

Now, a conceptual copy doesn't mean real copy in R since R tries to keep the 
pass-by-value illusion while passing references in cases where it knows that 
modifications cannot occur and/or they are safe. The default subassign method 
uses that feature which means it can afford to not duplicate if there is only 
one reference -- then it's safe to not duplicate as we are replacing that only 
existing reference. And in the case of a matrix, that will be true at the 
latest from the second subassignment on.

Unfortunately the method dispatch (AFAICS) introduces one more reference in the 
dispatch chain so there will always be two references so duplication is 
necessary. Since we have only 0 / 1 / 2+ information on the references, we 
can't distinguish whether the second reference is due to the dispatch or due to 
the passed object having more than one reference, so we have to duplicate in 
any case. That is unfortunate, and I don't see a way around (unless we handle 
subassignment methods in some special way).


I don't believe dispatch is bumping NAMED (and a quick experiment
seems to confirm this though I don't guarantee I did that right). The
issue is that a replacement function implemented as a closure, which
is the only option for a package, will always see NAMED on the object
to be modified as 2 (because the value is obtained by forcing the
argument promise) and so any R level assignments will duplicate.  This
also isn't really an issue of imprecise reference counting -- there
really are (at least) two legitimate references -- one through the
argument and one through the caller's environment.

It would be good if we could come up with a way for packages to be
able to define replacement functions that do not duplicate in cases
where we really don't want them to, but this would require coming up
with some sort of protocol, minimally involving an efficient way to
detect whether a replacement function is being called in a replacement
context or directly.
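
A minimal sketch of that situation ('val<-' is a made-up replacement
function; inspect() is unofficial and used here only to show NAMED):

"val<-" <- function(x, value) {
  .Internal(inspect(x))  # typically NAM(2): argument + caller binding
  x[1] <- value          # hence this R-level assignment duplicates
  x
}
y <- 1:5
tracemem(y)
val(y) <- 0L             # tracemem reports the copy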

There are some replacement functions that use C code to cheat, but
these may create problems if called directly, so I won't advertise
them.

Best,

luke



Cheers,
Simon




 Small reproducible example in vanilla R 2.13.0 :


x = list(a=1:10000,b=1:10000)
class(x) = "newclass"
"[<-.newclass" = function(x,i,j,value) x  # i.e. do nothing
tracemem(x)

[1] "<0xa1ec758>"

x[1,2] = 42L

tracemem[0xa1ec758 -> 0xa1ec558]: # but, x is still copied, why?




I've tried returning NULL from "[<-.newclass" but then x gets assigned
NULL :


"[<-.newclass" = function(x,i,j,value) NULL
x[1,2] = 42L

tracemem[0xa1ec558 -> 0x9c5f318]:

x

NULL




Any pointers much appreciated. If that copy is preventable it should
save the user needing to use `[<-.data.table`(...) syntax to get the
best speed (20 times faster on the small example used so far).

Matthew


On Tue, 2011-07-05 at 08:32 +0100, Matthew Dowle wrote:

Simon,

Thanks for the great suggestion. I've written a skeleton assignment
function for data.table which incurs no copies, which works for this
case. For completeness, if I understand correctly, this is for :
 i) convenience of new users who don't know how to vectorize yet
 ii) more complex examples which can't be vectorized.

Before:


system.time(for (r in 1:R) DT[r,20] <- 1.0)

  user  system elapsed
12.792   0.488  13.340

After :


system.time(for (r in 1:R) DT[r,20] <- 1.0)

  user  system elapsed
 2.908   0.020   2.935

Where this can be reduced further as follows :


system.time(for (r in 1:R) `[<-.data.table`(DT,r,2,1.0))

  user  system elapsed
 0.132   0.000   0.131




Still working on it. When it doesn't break other data.table tests, I'll
commit to R-Forge ...

Matthew


On Mon, 2011-07-04 at 12:41 -0400, Simon Urbanek wrote:

Timothée,

On Jul 4, 2011, at 2:47 AM, Timothée Carayol wrote:


Hi --

It's my first post on this list; as a relatively new user with little
knowledge of R internals, I am a bit intimidated by the depth of some
of the discussions here, so please spare me if I say something
incredibly silly.

I feel that someone at this point should mention Matthew Dowle's
excellent data.table package
(http://cran.r-project.org/web/packages/data.table/index.html) which
seems to me to address many of the inefficiencies of data.frame.
data.tables have no row names; and operations that only need data from
one or two columns are (I believe) just as quick whether the total
number of columns is 5 or 1000. This results in very quick operations
(and, often, elegant code as well).

Re: [Rd] Syntactically valid names

2011-07-05 Thread Hadley Wickham
On Tue, Jul 5, 2011 at 7:31 PM, steven mosher mosherste...@gmail.com wrote:
  regexp approach is kinda ugly
 http://www.r-bloggers.com/testing-for-valid-variable-names/

Hmm, I think that suggests a couple of small bugs in make.names:

> make.names("...")
[1] "..."
> make.names("..1")
[1] "..1"

and

> x <- paste(rep("x", 1e6), collapse = "")
> x == make.names(x)
[1] TRUE


Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Syntactically valid names

2011-07-05 Thread steven mosher
 regexp approach is kinda ugly

http://www.r-bloggers.com/testing-for-valid-variable-names/

On Tue, Jul 5, 2011 at 3:29 PM, Hadley Wickham had...@rice.edu wrote:

  I wouldn't expect so. The basic structure might be handled using a regexp
 of sorts, but even that is tricky because of the dot not followed by
 number rule, and then there's the stop list of reserved words, which would
 make your code clumsy whatever you do.
 
  How on Earth would you expect anything to be significantly more elegant
 than your
 
  function(x) x == make.names(x)
 
  anyway??! (OK, if there was a wrapper for the C level isValidName()
 function...)

 Good point.  Thanks!

 Hadley

 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Syntactically valid names

2011-07-05 Thread Davor Cubranic
On June 30, 2011 01:37:57 PM Hadley Wickham wrote:

 Is there any easy way to tell if a string is a syntactically valid name?
[...]
 
 One implementation would be:
 
is.syntactic <- function(x) x == make.names(x)
 
 but I wonder if there's a more elegant way.

This is without quoting, right? Because make.names replaces spaces with 
periods, and using quoting I can create syntactically valid names that do 
include spaces:

`x prime` <- 3
ls()

Davor

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Syntactically valid names

2011-07-05 Thread Hadley Wickham
 This is without quoting, right? Because make.names replaces spaces with
 periods, and using quoting I can create syntactically valid names that do
 include spaces:

    `x prime` <- 3
    ls()

That's not a syntactically valid name - you use backticks to refer to
names that are not syntactically valid.
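
Concretely, with the is.syntactic() helper from earlier in the thread:

is.syntactic <- function(x) x == make.names(x)
is.syntactic("x.prime")  # TRUE  - usable without backticks
is.syntactic("x prime")  # FALSE - only reachable as `x prime`
is.syntactic("break")    # FALSE - reserved word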

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Syntactically valid names

2011-07-05 Thread peter dalgaard

On Jul 6, 2011, at 01:40 , Hadley Wickham wrote:

 On Tue, Jul 5, 2011 at 7:31 PM, steven mosher mosherste...@gmail.com wrote:
  regexp approach is kinda ugly
 http://www.r-bloggers.com/testing-for-valid-variable-names/
 
 Hmm, I think that suggests a couple of small bugs in make.names:
 
 make.names("...")
 [1] "..."
 make.names("..1")
 [1] "..1"
 

What's wrong with that? They are names alright, just with special meanings.

> x <- quote(...)
> mode(x)
[1] "name"


 and
 
 x <- paste(rep("x", 1e6), collapse = "")
 x == make.names(x)
 [1] TRUE
 
 

Mildly insane, but technically OK, no?


 Hadley
 
 -- 
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Syntactically valid names

2011-07-05 Thread Davor Cubranic
On July 5, 2011 04:59:16 PM Hadley Wickham wrote:
 That's not a syntactically valid name - you use backticks to refer to
 names that are not syntactically valid.

I was too loose in my terminology: I meant that `x prime` is a valid name, but 
as you said, it is not syntactically valid.

Davor

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [datatable-help] speeding up perception

2011-07-05 Thread David Winsemius


On Jul 5, 2011, at 7:18 PM, luke-tier...@uiowa.edu wrote:



On Tue, 5 Jul 2011, Simon Urbanek wrote:



On Jul 5, 2011, at 2:08 PM, Matthew Dowle wrote:


Simon (and all),

I've tried to make assignment as fast as calling `[<-.data.table`
directly, for user convenience. Profiling shows (IIUC) that it isn't
dispatch, but x being copied. Is there a way to prevent '[<-' from
copying x?


Good point, and conceptually, no. It's a subassignment after all -  
see R-lang 3.4.4 - it is equivalent to


`*tmp*` <- x
x <- `[<-`(`*tmp*`, i, j, value)
rm(`*tmp*`)

so there is always a copy involved.

Now, a conceptual copy doesn't mean real copy in R since R tries to  
keep the pass-by-value illusion while passing references in cases  
where it knows that modifications cannot occur and/or they are  
safe. The default subassign method uses that feature which means it  
can afford to not duplicate if there is only one reference -- then  
it's safe to not duplicate as we are replacing that only existing  
reference. And in the case of a matrix, that will be true at the  
latest from the second subassignment on.


Unfortunately the method dispatch (AFAICS) introduces one more  
reference in the dispatch chain so there will always be two  
references so duplication is necessary. Since we have only 0 / 1 /  
2+ information on the references, we can't distinguish whether the  
second reference is due to the dispatch or due to the passed object  
having more than one reference, so we have to duplicate in any  
case. That is unfortunate, and I don't see a way around (unless we  
handle subassignment methods in some special way).


I don't believe dispatch is bumping NAMED (and a quick experiment
seems to confirm this though I don't guarantee I did that right). The
issue is that a replacement function implemented as a closure, which
is the only option for a package, will always see NAMED on the object
to be modified as 2 (because the value is obtained by forcing the
argument promise) and so any R level assignments will duplicate.  This
also isn't really an issue of imprecise reference counting -- there
really are (at least) two legitimate references -- one through the
argument and one through the caller's environment.

It would be good if we could come up with a way for packages to be
able to define replacement functions that do not duplicate in cases
where we really don't want them to, but this would require coming up
with some sort of protocol, minimally involving an efficient way to
detect whether a replacement function is being called in a replacement
context or directly.


Would $<- always satisfy that condition? It would be a big help to me
if it could be designed to avoid duplicating the rest of the data.frame.


--



There are some replacement functions that use C code to cheat, but
these may create problems if called directly, so I won't advertise
them.

Best,

luke



Cheers,
Simon





--
Luke Tierney
Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  l...@stat.uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


David Winsemius, MD
West Hartford, CT

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [datatable-help] speeding up perception

2011-07-05 Thread Simon Urbanek
No subassignment function satisfies that condition, because you can always call 
them directly. However, that doesn't stop the default method from making that 
assumption, so I'm not sure it's an issue.
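
A sketch of the two routes to the same method, using base R's own
data.frame method for $<- :

df <- data.frame(a = 1:3)
df$b <- 4:6                            # replacement context: R rebinds df
df2 <- `$<-.data.frame`(df, "b", 7:9)  # direct call: df itself untouched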

David, Just to clarify - the data frame content is not copied, we are talking 
about the vector holding columns.

Cheers,
Simon

Sent from my iPhone

On Jul 5, 2011, at 9:01 PM, David Winsemius dwinsem...@comcast.net wrote:

 
 On Jul 5, 2011, at 7:18 PM, luke-tier...@uiowa.edu 
 wrote:
 
 On Tue, 5 Jul 2011, Simon Urbanek wrote:
 
 
 On Jul 5, 2011, at 2:08 PM, Matthew Dowle wrote:
 
 Simon (and all),
 
 I've tried to make assignment as fast as calling `[<-.data.table`
 directly, for user convenience. Profiling shows (IIUC) that it isn't
 dispatch, but x being copied. Is there a way to prevent '[<-' from
 copying x?
 
 Good point, and conceptually, no. It's a subassignment after all - see 
 R-lang 3.4.4 - it is equivalent to
 
 `*tmp*` <- x
 x <- `[<-`(`*tmp*`, i, j, value)
 rm(`*tmp*`)
 
 so there is always a copy involved.
 
 Now, a conceptual copy doesn't mean real copy in R since R tries to keep 
 the pass-by-value illusion while passing references in cases where it knows 
 that modifications cannot occur and/or they are safe. The default subassign 
 method uses that feature which means it can afford to not duplicate if 
 there is only one reference -- then it's safe to not duplicate as we are 
 replacing that only existing reference. And in the case of a matrix, that 
 will be true at the latest from the second subassignment on.
 
 Unfortunately the method dispatch (AFAICS) introduces one more reference in 
 the dispatch chain so there will always be two references so duplication is 
 necessary. Since we have only 0 / 1 / 2+ information on the references, we 
 can't distinguish whether the second reference is due to the dispatch or 
 due to the passed object having more than one reference, so we have to 
 duplicate in any case. That is unfortunate, and I don't see a way around 
 (unless we handle subassignment methods in some special way).
 
 I don't believe dispatch is bumping NAMED (and a quick experiment
 seems to confirm this though I don't guarantee I did that right). The
 issue is that a replacement function implemented as a closure, which
 is the only option for a package, will always see NAMED on the object
 to be modified as 2 (because the value is obtained by forcing the
 argument promise) and so any R level assignments will duplicate.  This
 also isn't really an issue of imprecise reference counting -- there
 really are (at least) two legitimate references -- one through the
 argument and one through the caller's environment.
 
 It would be good if we could come up with a way for packages to be
 able to define replacement functions that do not duplicate in cases
 where we really don't want them to, but this would require coming up
 with some sort of protocol, minimally involving an efficient way to
 detect whether a replacement function is being called in a replacement
 context or directly.
 
 Would $<- always satisfy that condition? It would be a big help to me if it 
 could be designed to avoid duplicating the rest of the data.frame.
 
 -- 
 
 
 There are some replacement functions that use C code to cheat, but
 these may create problems if called directly, so I won't advertise
 them.
 
 Best,
 
 luke
 
 
 Cheers,
 Simon
 
 
 
 
 -- 
 Luke Tierney
 Statistics and Actuarial Science
 Ralph E. Wareham Professor of Mathematical Sciences
 University of Iowa                  Phone:  319-335-3386
 Department of Statistics and        Fax:    319-335-3017
    Actuarial Science
 241 Schaeffer Hall                  email:  l...@stat.uiowa.edu
 Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 
 David Winsemius, MD
 West Hartford, CT
 
 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Syntactically valid names

2011-07-05 Thread Hadley Wickham
 What's wrong with that? They are names alright, just with special meanings.

But you can't really use them for variables:

> ... <- 4
> ...
Error: '...' used in an incorrect context
> ..1 <- 4
> ..1
Error: 'nthcdr' needs a list to CDR down

And make.names generally protects you against that:

> make.names("function")
[1] "function."
> make.names("break")
[1] "break."
> make.names("TRUE")
[1] "TRUE."

 x <- paste(rep("x", 1e6), collapse = "")
 x == make.names(x)
 [1] TRUE

 Mildly insane, but technically OK, no?

I don't think so:

> x <- paste(rep("x", 1e6), collapse = "")
> assign(x, 1)
Error in assign(x, 1) : variable names are limited to 10000 bytes

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel